IBM Granite API
Access IBM Granite instantly with Puter.js, and add AI to any app in a few lines of code, without a backend or API keys.
// npm install @heyputer/puter.js
import { puter } from '@heyputer/puter.js';
puter.ai.chat("Explain AI like I'm five!", {
    model: "ibm-granite/granite-4.0-h-micro"
}).then(response => {
    console.log(response);
});
<html>
<body>
    <script src="https://js.puter.com/v2/"></script>
    <script>
        puter.ai.chat("Explain AI like I'm five!", {
            model: "ibm-granite/granite-4.0-h-micro"
        }).then(response => {
            console.log(response);
        });
    </script>
</body>
</html>
List of IBM Granite Models
Granite 4.1 8B
ibm-granite/granite-4.1-8b
IBM Granite 4.1 8B is a dense, decoder-only language model from IBM, built for enterprise workloads like tool calling, RAG, code generation, summarization, and classification. It supports a 131K-token context window and 12 languages including English, German, Spanish, French, Japanese, and Chinese. Despite its compact size, the 8B model matches or outperforms IBM's previous-generation 32B Mixture-of-Experts model across benchmarks — scoring 69.0 on ArenaHard, 68.3 on BFCL V3 (tool calling), and 92.5 on GSM8K. It implements OpenAI-compatible tool calling and supports fill-in-the-middle for code completion. Its dense architecture makes it straightforward to fine-tune for downstream tasks. Released under the Apache 2.0 license, it's a strong pick for developers who need reliable enterprise capabilities at an efficient parameter count.
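Since the model implements OpenAI-compatible tool calling, you can pass a standard tool definition and run the requested function locally. The sketch below is illustrative, not official Puter.js documentation: it assumes a `tools` option on `puter.ai.chat` in the OpenAI format, and the `getStockPrice` helper with its placeholder data is entirely hypothetical.

```javascript
// Hypothetical local helper the model can ask us to run (placeholder data)
function getStockPrice(ticker) {
  const prices = { IBM: 214.5 };
  return prices[ticker] ?? null;
}

// OpenAI-compatible tool definition describing the helper to the model
const tools = [{
  type: "function",
  function: {
    name: "getStockPrice",
    description: "Get the latest stock price for a ticker symbol",
    parameters: {
      type: "object",
      properties: { ticker: { type: "string" } },
      required: ["ticker"]
    }
  }
}];

// Guarded so the sketch is a no-op outside a page where Puter.js is loaded
if (typeof puter !== "undefined") {
  puter.ai.chat("What is IBM trading at?", {
    model: "ibm-granite/granite-4.1-8b",
    tools
  }).then(response => {
    const call = response.message?.tool_calls?.[0];
    if (call) {
      // Run the requested tool locally with the model-supplied arguments
      const { ticker } = JSON.parse(call.function.arguments);
      console.log(getStockPrice(ticker));
    } else {
      console.log(response);
    }
  });
}
```

In a full loop you would send the tool's result back to the model in a follow-up message so it can compose a natural-language answer.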
Granite 4.0 Micro
ibm-granite/granite-4.0-h-micro
Granite 4.0 Micro is a 3B-parameter dense language model from IBM, built on a conventional transformer architecture and optimized for low-latency, cost-efficient workloads. Despite its compact size, it significantly outperforms its predecessor Granite 3.3 8B across the board — a model more than twice its size. It scores 16 on the Artificial Analysis Intelligence Index, placing ahead of Gemma 3 4B (15). In RAG benchmarks, it outperforms much larger models including Llama 3.3 70B and Qwen3 8B. The model natively supports tool calling, function calling, multilingual generation, fill-in-the-middle code completion, RAG, and structured JSON output, with a 128K token context window. It's a strong fit for agentic sub-tasks, API orchestration, and scenarios where speed and cost matter more than peak reasoning power.
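For the low-latency workloads this model targets, streaming the response chunk by chunk keeps the UI responsive. A minimal sketch, assuming a `stream: true` option on `puter.ai.chat` that yields an async iterable of parts with a `text` field:

```javascript
// Streaming sketch: print each chunk as it arrives instead of waiting
// for the full completion, and return the accumulated answer.
async function streamAnswer(prompt) {
  const response = await puter.ai.chat(prompt, {
    model: "ibm-granite/granite-4.0-h-micro",
    stream: true
  });
  let full = "";
  for await (const part of response) {
    if (part?.text) {
      full += part.text;      // accumulate the final answer
      console.log(part.text); // render the chunk immediately
    }
  }
  return full;
}

// Guarded so the sketch is a no-op outside a page where Puter.js is loaded
if (typeof puter !== "undefined") {
  streamAnswer("Summarize the Granite 4.0 family in one sentence.");
}
```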
Frequently Asked Questions
What is the IBM Granite API?
The IBM Granite API gives you access to IBM's Granite language models for AI chat. Through Puter.js, you can start using IBM Granite models instantly with zero setup or configuration.
Which IBM Granite models does Puter.js support?
Puter.js supports a variety of IBM Granite models, including Granite 4.1 8B and Granite 4.0 Micro. Find all AI models supported by Puter.js in the AI model list.
How does the User-Pays model work?
With the User-Pays model, users cover their own AI costs through their Puter account, so you can build apps without worrying about infrastructure expenses.
What is Puter.js?
Puter.js is a JavaScript library that provides access to AI, storage, and other cloud services through a single API. It handles authentication, infrastructure, and scaling so you can focus on building your app.
Can I use the IBM Granite API with my framework?
Yes. The IBM Granite API through Puter.js works with any JavaScript framework, Node.js, or plain HTML. Just include the library and start building. See the documentation for more details.