Z.AI

Z.AI API

Access Z.AI instantly with Puter.js, and add AI to any app in a few lines of code, without a backend or API keys.

// npm install @heyputer/puter.js
import { puter } from '@heyputer/puter.js';

puter.ai.chat("Explain AI like I'm five!", {
    model: "z-ai/glm-5"
}).then(response => {
    console.log(response);
});
<!DOCTYPE html>
<html>
<body>
    <script src="https://js.puter.com/v2/"></script>
    <script>
        puter.ai.chat("Explain AI like I'm five!", {
            model: "z-ai/glm-5"
        }).then(response => {
            console.log(response);
        });
    </script>
</body>
</html>

List of Z.AI Models

Chat

GLM 5 Turbo

z-ai/glm-5-turbo

GLM-5 Turbo is a foundation model by Z.ai optimized for fast inference and agent-driven workflows, excelling at tool invocation, complex instruction decomposition, and long-chain task execution in OpenClaw scenarios. It is built on top of the GLM-5 architecture (744B parameters, 40B active) with DeepSeek Sparse Attention for reduced deployment cost and up to 205K token context. GLM-5 Turbo supports reasoning/thinking mode and is designed for real-world multi-step agentic tasks including scheduled, persistent, and high-throughput operations.
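For a fast-inference model like GLM-5 Turbo, streaming the reply as it is generated keeps your UI responsive. A minimal sketch using Puter.js's `stream: true` option (the `part.text` chunk shape follows Puter's streaming examples; check the current Puter.js docs to confirm):

```javascript
// Stream a reply from GLM-5 Turbo chunk by chunk.
const stream = await puter.ai.chat(
    "Summarize the plot of Hamlet in three sentences.",
    { model: "z-ai/glm-5-turbo", stream: true }
);

for await (const part of stream) {
    if (part?.text) {
        // Append each chunk as it arrives instead of waiting
        // for the full response.
        document.body.append(part.text);
    }
}
```

Top-level `await` works inside a `<script type="module">` tag; otherwise wrap the snippet in an `async` function.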

Chat

GLM 5

z-ai/glm-5

GLM-5 is Zhipu AI's (Z.ai) fifth-generation flagship open-weight foundation model with 744B total parameters (40B active) in a Mixture of Experts architecture, designed for agentic engineering, complex systems coding, and long-horizon agent tasks. It achieves state-of-the-art performance among open-weight models on coding and agentic benchmarks like SWE-bench Verified and Terminal Bench 2.0, approaching Claude Opus 4.5-level capability.

Chat

GLM 4.7 Flash

z-ai/glm-4.7-flash

GLM 4.7 Flash is designed for speed and efficiency while maintaining strong performance. It features a 200K token context window, making it suitable for processing long documents and generating extended responses.

Chat

GLM 4.6V

z-ai/glm-4.6v

GLM-4.6V is a 106B vision-language model featuring native multimodal Function Calling—the first to directly pass images as tool inputs. It supports 128K context for processing 150+ page documents or 1-hour videos in a single pass.
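Because GLM-4.6V accepts images, you can pass an image URL as the second argument to `puter.ai.chat`, following Puter.js's documented vision signature (`prompt, imageURL, testMode, options`). The image URL below is a placeholder:

```javascript
// Ask a vision model to describe an image by URL.
puter.ai.chat(
    "Describe what you see in this image.",
    "https://example.com/photo.jpg", // placeholder image URL
    false,                           // testMode
    { model: "z-ai/glm-4.6v" }
).then(response => {
    console.log(response);
});
```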

Chat

GLM 4.7

z-ai/glm-4.7

GLM-4.7 is Zhipu AI's latest ~400B flagship released December 2025, optimized for coding with 200K context and 128K output. It scores 73.8% on SWE-bench and 95.7% on AIME 2025.

Chat

GLM 4.6

z-ai/glm-4.6

GLM-4.6 is Zhipu AI's 355B-parameter (32B active) flagship text model with 200K context, excelling at coding, agentic workflows, and search tasks. It's 15% more token-efficient than GLM-4.5 and ranks as the #1 domestic model in China.

Chat

GLM 4.5V

z-ai/glm-4.5v

GLM-4.5V is a 106B-parameter vision-language model achieving SOTA on 42 multimodal benchmarks, capable of image/video reasoning, GUI agent tasks, document parsing, and visual grounding. It features a thinking mode toggle and 64K multimodal context under MIT license.

Chat

GLM 4.5

z-ai/glm-4.5

GLM-4.5 is Zhipu AI's flagship 355B-parameter open-source model (32B active) designed for agentic AI applications with dual thinking/non-thinking modes. It excels at reasoning, coding, and tool use, ranking 3rd globally among all models on combined benchmarks under MIT license.

Chat

GLM 4.5 Air

z-ai/glm-4.5-air

GLM-4.5-Air is a compact 106B-parameter variant (12B active) of GLM-4.5, offering competitive agentic performance with significantly lower resource requirements. It supports the same dual reasoning modes and 128K context window as its larger sibling.

Chat

GLM 4 32B

z-ai/glm-4-32b

GLM-4-32B is a 32-billion parameter bilingual (Chinese-English) foundation model by Zhipu AI, pre-trained on 15TB of reasoning-focused data. It delivers performance comparable to GPT-4o on code generation, function calling, and Q&A tasks while remaining deployable on accessible hardware.

Frequently Asked Questions

What is this Z.AI API about?

The Z.AI API gives you access to models for AI chat. Through Puter.js, you can start using Z.AI models instantly with zero setup or configuration.

Which Z.AI models can I use?

Puter.js supports a variety of Z.AI models, including GLM 5 Turbo, GLM 5, GLM 4.7 Flash, and more. Find all AI models supported by Puter.js in the AI model list.

How much does it cost?

With Puter's User-Pays model, each user covers their own AI usage through their Puter account. You can build and ship apps without paying for AI infrastructure yourself.

What is Puter.js?

Puter.js is a JavaScript library that provides access to AI, storage, and other cloud services directly from a single API. It handles authentication, infrastructure, and scaling so you can focus on building your app.

Does this work with React / Vue / Vanilla JS / Node / etc.?

Yes — the Z.AI API through Puter.js works with any JavaScript framework, Node.js, or plain HTML. Just include the library and start building. See the documentation for more details.
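In a React app, for example, you can call `puter.ai.chat` from an event handler. This is a sketch that assumes the `https://js.puter.com/v2/` script tag is loaded so `puter` is available as a global, and that the response exposes its text via `message.content` as in Puter's chat examples:

```javascript
// Minimal React sketch: fetch a reply on button click.
function AskButton() {
    const [reply, setReply] = React.useState("");

    async function ask() {
        const response = await puter.ai.chat("Tell me a fun fact!", {
            model: "z-ai/glm-5"
        });
        // Response shape assumed from Puter's chat docs.
        setReply(response?.message?.content ?? "");
    }

    return (
        <div>
            <button onClick={ask}>Ask</button>
            <p>{reply}</p>
        </div>
    );
}
```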