IBM Granite API
Access IBM Granite instantly with Puter.js, and add AI to any app in a few lines of code, without a backend or API keys.
// npm install @heyputer/puter.js
import { puter } from '@heyputer/puter.js';
puter.ai.chat("Explain AI like I'm five!", {
    model: "ibm-granite/granite-4.0-h-micro"
}).then(response => {
    console.log(response);
});
<html>
<body>
    <script src="https://js.puter.com/v2/"></script>
    <script>
        puter.ai.chat("Explain AI like I'm five!", {
            model: "ibm-granite/granite-4.0-h-micro"
        }).then(response => {
            console.log(response);
        });
    </script>
</body>
</html>
List of IBM Granite Models
Granite 4.1 8B
ibm-granite/granite-4.1-8b
IBM Granite 4.1 8B is a dense, decoder-only language model from IBM, built for enterprise workloads like tool calling, RAG, code generation, summarization, and classification. It supports a 131K-token context window and 12 languages including English, German, Spanish, French, Japanese, and Chinese. Despite its compact size, the 8B model matches or outperforms IBM's previous-generation 32B Mixture-of-Experts model across benchmarks — scoring 69.0 on ArenaHard, 68.3 on BFCL V3 (tool calling), and 92.5 on GSM8K. It implements OpenAI-compatible tool calling and supports fill-in-the-middle for code completion. Its dense architecture makes it straightforward to fine-tune for downstream tasks. Released under the Apache 2.0 license, it's a strong pick for developers who need reliable enterprise capabilities at an efficient parameter count.
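Since the model implements OpenAI-compatible tool calling, you can pass a standard tool definition and run the requested function locally. The sketch below is illustrative, not official Puter.js documentation: it assumes a `tools` option on `puter.ai.chat` in the OpenAI format, and the `getStockPrice` helper with its placeholder data is entirely hypothetical.

```javascript
// Hypothetical local helper the model can ask us to run (placeholder data)
function getStockPrice(ticker) {
  const prices = { IBM: 214.5 };
  return prices[ticker] ?? null;
}

// OpenAI-compatible tool definition describing the helper to the model
const tools = [{
  type: "function",
  function: {
    name: "getStockPrice",
    description: "Get the latest stock price for a ticker symbol",
    parameters: {
      type: "object",
      properties: { ticker: { type: "string" } },
      required: ["ticker"]
    }
  }
}];

// Guarded so the sketch is a no-op outside a page where Puter.js is loaded
if (typeof puter !== "undefined") {
  puter.ai.chat("What is IBM trading at?", {
    model: "ibm-granite/granite-4.1-8b",
    tools
  }).then(response => {
    const call = response.message?.tool_calls?.[0];
    if (call) {
      // Run the requested tool locally with the model-supplied arguments
      const { ticker } = JSON.parse(call.function.arguments);
      console.log(getStockPrice(ticker));
    } else {
      console.log(response);
    }
  });
}
```

In a full loop you would send the tool's result back to the model in a follow-up message so it can compose a natural-language answer.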
Granite 4.0 Micro
ibm-granite/granite-4.0-h-micro
Granite 4.0 Micro is a 3B-parameter dense language model from IBM, built on a conventional transformer architecture and optimized for low-latency, cost-efficient workloads. Despite its compact size, it significantly outperforms its predecessor Granite 3.3 8B across the board — a model more than twice its size. It scores 16 on the Artificial Analysis Intelligence Index, placing ahead of Gemma 3 4B (15). In RAG benchmarks, it outperforms much larger models including Llama 3.3 70B and Qwen3 8B. The model natively supports tool calling, function calling, multilingual generation, fill-in-the-middle code completion, RAG, and structured JSON output, with a 128K token context window. It's a strong fit for agentic sub-tasks, API orchestration, and scenarios where speed and cost matter more than peak reasoning power.
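For the low-latency workloads this model targets, streaming the response chunk by chunk keeps the UI responsive. A minimal sketch, assuming a `stream: true` option on `puter.ai.chat` that yields an async iterable of parts with a `text` field:

```javascript
// Streaming sketch: print each chunk as it arrives instead of waiting
// for the full completion, and return the accumulated answer.
async function streamAnswer(prompt) {
  const response = await puter.ai.chat(prompt, {
    model: "ibm-granite/granite-4.0-h-micro",
    stream: true
  });
  let full = "";
  for await (const part of response) {
    if (part?.text) {
      full += part.text;      // accumulate the final answer
      console.log(part.text); // render the chunk immediately
    }
  }
  return full;
}

// Guarded so the sketch is a no-op outside a page where Puter.js is loaded
if (typeof puter !== "undefined") {
  streamAnswer("Summarize the Granite 4.0 family in one sentence.");
}
```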
Frequently Asked Questions
What is the IBM Granite API?
The IBM Granite API gives you access to IBM's Granite language models for AI chat. Through Puter.js, you can start using IBM Granite models instantly with zero setup or configuration.
Which IBM Granite models does Puter.js support?
Puter.js supports a variety of IBM Granite models, including Granite 4.1 8B and Granite 4.0 Micro. Find all AI models supported by Puter.js in the AI model list.
How does the User-Pays model work?
With the User-Pays model, users cover their own AI costs through their Puter account, so you can build apps without worrying about infrastructure expenses.
What is Puter.js?
Puter.js is a JavaScript library that provides access to AI, storage, and other cloud services through a single API. It handles authentication, infrastructure, and scaling so you can focus on building your app.
Can I use the IBM Granite API with my framework?
Yes. The IBM Granite API through Puter.js works with any JavaScript framework, Node.js, or plain HTML. Just include the library and start building. See the documentation for more details.