Meta Llama API

Access Meta Llama models instantly with Puter.js and add AI to any app in a few lines of code, with no backend or API keys required.

// npm install @heyputer/puter.js
import { puter } from '@heyputer/puter.js';

puter.ai.chat("Explain AI like I'm five!", {
    model: "meta-llama/llama-4-maverick"
}).then(response => {
    console.log(response);
});

<html>
<body>
    <script src="https://js.puter.com/v2/"></script>
    <script>
        puter.ai.chat("Explain AI like I'm five!", {
            model: "meta-llama/llama-4-maverick"
        }).then(response => {
            console.log(response);
        });
    </script>
</body>
</html>
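
The snippets above log the raw response from a promise chain. A minimal sketch of the same call with async/await and error handling might look like the following. The helper names `extractText` and `askLlama` are hypothetical, and the response shape (`message.content`) is an assumption to check against the Puter.js documentation.

```javascript
// Hypothetical helper: reduces a chat response to plain text.
// Assumption: the response is either a string or an object whose
// message.content is a string or an array of { text } parts.
function extractText(response) {
    if (typeof response === "string") return response;
    const content = response?.message?.content;
    if (typeof content === "string") return content;
    if (Array.isArray(content)) {
        return content.map(part => part?.text ?? "").join("");
    }
    return String(response ?? "");
}

// `chat` is passed in so the helper works whether Puter.js was loaded
// from npm or from the script tag.
async function askLlama(chat, prompt, model = "meta-llama/llama-4-maverick") {
    try {
        const response = await chat(prompt, { model });
        return { ok: true, text: extractText(response) };
    } catch (error) {
        // Any rejection from the chat call is returned instead of thrown.
        return { ok: false, error };
    }
}
```

In a page that loads the script tag, this could be called as `askLlama((p, o) => puter.ai.chat(p, o), "Explain AI like I'm five!")`.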

List of Meta Llama Models

Chat

Llama Guard 4 12B

meta-llama/llama-guard-4-12b

Llama Guard 4 12B is Meta's 12 billion parameter multimodal safety model that moderates both text and image inputs across 12 languages. It was built from Llama 4 Scout and detects violations based on the MLCommons hazard taxonomy.
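
A Guard model returns a short verdict rather than a conversational reply. The sketch below parses such a verdict, assuming the format described in Meta's Llama Guard model cards: a first line of `safe` or `unsafe`, optionally followed by a line of comma-separated category codes such as `S1`. The function name `parseGuardVerdict` is hypothetical, and the exact output should be verified for the model you call.

```javascript
// Parses a Llama Guard style verdict into a structured result.
// Assumed input format: "safe" or "unsafe\nS1,S10".
function parseGuardVerdict(text) {
    const lines = String(text)
        .trim()
        .split(/\r?\n/)
        .map(line => line.trim())
        .filter(Boolean);
    const verdict = (lines[0] ?? "").toLowerCase();

    if (verdict === "safe") {
        return { safe: true, categories: [] };
    }
    if (verdict === "unsafe") {
        const categories = (lines[1] ?? "")
            .split(",")
            .map(code => code.trim())
            .filter(Boolean);
        return { safe: false, categories };
    }
    // Unrecognized output is treated as unsafe so that failures stay closed.
    return { safe: false, categories: [], unrecognized: true };
}
```

Treating unrecognized output as unsafe is a design choice for this sketch; an application might prefer to retry or escalate instead.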

Chat

Llama 4 Maverick

meta-llama/llama-4-maverick

Llama 4 Maverick is Meta's 400 billion total parameter MoE model with 17B active parameters and 128 experts, supporting 1M token context. It's natively multimodal with state-of-the-art performance on coding, reasoning, and image understanding tasks.

Chat

Llama 4 Scout

meta-llama/llama-4-scout

Llama 4 Scout is Meta's efficient 109 billion parameter MoE model with 17B active parameters and 16 experts, featuring an industry-leading 10M token context window. With Int4 quantization it fits on a single H100 GPU, and it handles multimodal text and image inputs.
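
Both Llama 4 entries quote a total and an active parameter count. As a small worked example, the share of parameters used per token can be computed from those quoted figures. The object layout and the function name `activeShare` are illustrative only.

```javascript
// Figures quoted in the two Llama 4 descriptions (billions of parameters).
const llama4 = {
    "meta-llama/llama-4-maverick": { totalB: 400, activeB: 17, experts: 128 },
    "meta-llama/llama-4-scout": { totalB: 109, activeB: 17, experts: 16 },
};

// Percentage of a model's parameters that are active for a single token.
function activeShare({ totalB, activeB }) {
    return (activeB * 100) / totalB;
}

for (const [id, spec] of Object.entries(llama4)) {
    console.log(`${id}: ${activeShare(spec).toFixed(2)}% of parameters active per token`);
}
```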

Chat

Llama Guard 3 8B

meta-llama/llama-guard-3-8b

Llama Guard 3 8B is Meta's enhanced safety moderation model providing content classification in 8 languages with support for tool call safety. It detects 14 hazard categories and integrates with Llama 3.1 for comprehensive AI safety.

Chat

Llama 3.3 70B Instruct

meta-llama/llama-3.3-70b-instruct

Llama 3.3 70B Instruct is Meta's refined 70 billion parameter multilingual model with improved instruction following and tool use capabilities. It supports 8 languages and offers enhanced reasoning performance over previous versions.

Chat

Meta Llama 3.3 70B Instruct Turbo

meta-llama/llama-3.3-70b-instruct-turbo

Llama 3.3 70B Instruct Turbo is Meta's 70-billion-parameter, instruction-tuned text model, served on Together AI's throughput-optimized Turbo endpoint. It delivers quality close to the much larger Llama 3.1 405B at a fraction of the cost, especially on instruction following and reasoning. Published benchmarks include IFEval 92.1%, MMLU 86.0%, HumanEval 88.4%, MGSM 91.1%, MATH 77.0%, and GPQA Diamond 50.5%. Its IFEval score is higher than the figures quoted for Llama 3.1 405B (88.6%) and Claude 3.5 Sonnet (89.3%). With a 128K context window and tool/function calling support, it suits developers building chat assistants, multilingual apps, coding helpers, and agentic workflows who want strong open-weight performance at a low price.

Chat

Llama 3.2 11B Vision Instruct

meta-llama/llama-3.2-11b-vision-instruct

Llama 3.2 11B Vision Instruct is Meta's multimodal model that processes both text and images with 11 billion parameters. It excels at visual recognition, image reasoning, captioning, and answering questions about images.

Chat

Llama 3.2 1B Instruct

meta-llama/llama-3.2-1b-instruct

Llama 3.2 1B Instruct is Meta's ultra-lightweight 1 billion parameter model designed for edge and mobile devices. It supports 128K context and handles summarization, instruction following, and rewriting tasks locally.

Chat

Llama 3.2 3B Instruct

meta-llama/llama-3.2-3b-instruct

Llama 3.2 3B Instruct is a compact 3 billion parameter model optimized for on-device use cases with 128K context support. It outperforms comparable models on instruction following, summarization, and tool-use tasks.

Chat

Llama 3.1 405B (base)

meta-llama/llama-3.1-405b

Llama 3.1 405B is Meta's flagship open-source large language model with 405 billion parameters, supporting 128K context length and 8 languages. It offers capabilities comparable to leading closed models for advanced reasoning, coding, and multilingual tasks.

Chat

Llama 3.1 405B Instruct

meta-llama/llama-3.1-405b-instruct

Llama 3.1 405B Instruct is the instruction-tuned version of Meta's largest open model, optimized for multilingual dialogue, tool use, and complex reasoning. It supports 8 languages with 128K context and serves as a foundation for enterprise-level AI applications.

Chat

Llama 3.1 70B Instruct

meta-llama/llama-3.1-70b-instruct

Llama 3.1 70B Instruct is a multilingual 70 billion parameter model with 128K context length, optimized for dialogue, tool use, and coding tasks. It balances strong performance with resource efficiency across 8 supported languages.

Chat

Llama 3.1 8B Instruct

meta-llama/llama-3.1-8b-instruct

Llama 3.1 8B Instruct is Meta's efficient 8 billion parameter multilingual model supporting 128K context and 8 languages. It's ideal for resource-constrained deployments requiring summarization, classification, and translation capabilities.
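
Context sizes such as 128K are token limits, so it can help to estimate whether a prompt fits before sending it. The sketch below uses a rough heuristic of about 4 characters per token, which is an assumption and not a tokenizer; the table reads the sizes quoted on this page as round numbers, and `fitsContext` is a hypothetical helper.

```javascript
// Context windows as quoted on this page, read as round token counts (assumption).
const CONTEXT_TOKENS = {
    "meta-llama/llama-3.1-8b-instruct": 128_000,
    "meta-llama/llama-4-maverick": 1_000_000,
    "meta-llama/llama-4-scout": 10_000_000,
};

// Rough heuristic, NOT a tokenizer: about 4 characters per token.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text) {
    return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// True when the estimated prompt size plus a reserved reply budget
// fits inside the model's context window.
function fitsContext(model, text, replyBudget = 1_000) {
    const limit = CONTEXT_TOKENS[model];
    if (limit === undefined) {
        throw new Error(`No context size recorded for ${model}`);
    }
    return estimateTokens(text) + replyBudget <= limit;
}
```

Real token counts vary by language and content, so a safety margin on top of this estimate is advisable.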

Chat

Llama 3 70B Instruct

meta-llama/llama-3-70b-instruct

Llama 3 70B Instruct is a 70 billion parameter instruction-tuned language model from Meta, optimized for dialogue and assistant-like chat in English. It uses an optimized transformer architecture with grouped-query attention and was trained on over 15 trillion tokens.

Chat

Llama 3 8B Instruct

meta-llama/llama-3-8b-instruct

Llama 3 8B Instruct is Meta's compact 8 billion parameter instruction-tuned model for dialogue use cases in English. It offers strong performance on common benchmarks while being more efficient to deploy than its larger sibling.

Chat

Llama Guard 2 8B

meta-llama/llama-guard-2-8b

Llama Guard 2 8B is Meta's 8 billion parameter safety classifier built on Llama 3, designed to moderate both user prompts and AI responses. It classifies content across 11 hazard categories based on the MLCommons taxonomy.

Chat

Meta Llama 3 8B Instruct Lite

meta-llama/meta-llama-3-8b-instruct-lite

Meta Llama 3 8B Instruct Lite is a cost-optimized serving tier for Meta's Llama 3 8B Instruct, an 8-billion-parameter open-weight chat model pretrained on over 15 trillion tokens. Despite its small size, it delivers strong reasoning, coding, and general-purpose conversation. On published benchmarks it scores around 67.4% on MMLU, 79.6% on GSM8K math, and 62.2% on HumanEval coding, making it highly competitive among models in its tier. The Lite tier offers the same quality at a lower per-token price, making it a great fit for developers who need fast, affordable responses for chat assistants, summarization, classification, and lightweight coding tasks at scale.

Frequently Asked Questions

What is this Meta Llama API about?

The Meta Llama API gives you access to Meta's Llama models for AI chat. Through Puter.js, you can start using these models instantly with zero setup or configuration.
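
For chat interfaces, replies are often shown as they arrive. The sketch below collects a streamed reply, assuming that passing `stream: true` makes the chat call resolve to an async iterable whose parts carry a `text` field. That behavior and the helper name `streamChat` are assumptions to confirm in the Puter.js documentation.

```javascript
// Collects a streamed reply into one string, calling onChunk for each piece.
// Assumptions: `stream: true` yields an async iterable, and each part
// exposes its text under `part.text`.
async function streamChat(chat, prompt, model, onChunk = () => {}) {
    const stream = await chat(prompt, { model, stream: true });
    let full = "";
    for await (const part of stream) {
        const text = part?.text ?? "";
        if (text) {
            full += text;
            onChunk(text);
        }
    }
    return full;
}
```

A call might pass `(p, o) => puter.ai.chat(p, o)` as `chat` and a function that appends each chunk to the page as `onChunk`.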

Which Meta Llama models can I use?

Puter.js supports a variety of Meta Llama models, including Llama Guard 4 12B, Llama 4 Maverick, Llama 4 Scout, and more. Find all AI models supported by Puter.js in the AI model list.
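
Because several models are available under the same call, an app can try a preferred model first and fall back to others. The following is a minimal sketch of that idea; `chatWithFallback` is a hypothetical helper, and the ordering in `preferred` is only an example built from model IDs listed on this page.

```javascript
// Tries each model ID in order and returns the first successful response.
async function chatWithFallback(chat, prompt, models) {
    const failures = [];
    for (const model of models) {
        try {
            const response = await chat(prompt, { model });
            return { model, response };
        } catch (error) {
            // Remember the failure and move on to the next candidate.
            failures.push({ model, error });
        }
    }
    const tried = failures.map(f => f.model).join(", ");
    throw new Error(`All models failed: ${tried}`);
}

// Example ordering using IDs from the list above.
const preferred = [
    "meta-llama/llama-4-maverick",
    "meta-llama/llama-4-scout",
    "meta-llama/llama-3.3-70b-instruct",
];
```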

How much does it cost?

With the User-Pays model, users cover their own AI costs through their Puter account. This means you can build apps without worrying about infrastructure expenses.

What is Puter.js?

Puter.js is a JavaScript library that provides access to AI, storage, and other cloud services through a single API. It handles authentication, infrastructure, and scaling so you can focus on building your app.

Does this work with React / Vue / Vanilla JS / Node / etc.?

Yes. The Meta Llama API through Puter.js works with any JavaScript framework, Node.js, or plain HTML. Just include the library and start building. See the documentation for more details.
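
Code shared between a script-tag page and a bundled app needs one way to obtain the library. The sketch below is one possible approach, assuming the script tag defines a global `puter` and the npm package exports `puter` as in the first snippet on this page. `loadPuter` is a hypothetical helper, and any further setup a given environment may need is not covered here.

```javascript
// Returns the Puter.js object from whichever source is available.
// `scope` defaults to the global object, where the script tag is
// assumed to define `puter`.
async function loadPuter(scope = globalThis) {
    if (scope.puter) return scope.puter;
    // Otherwise fall back to the npm package shown in the first snippet.
    const mod = await import("@heyputer/puter.js");
    return mod.puter;
}
```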