StepFun Step 3.7 Flash Is Now Available in Puter.js

Reynaldi Chernando

June 2, 2026

Puter.js now supports Step 3.7 Flash, StepFun's latest high-efficiency model: a multimodal Mixture-of-Experts system built for coding agents, search-augmented reasoning, and multimodal task automation. You can add it to your application for free, with no API keys required.

What is Step 3.7 Flash?

Step 3.7 Flash is a 198B sparse Mixture-of-Experts vision-language model that pairs a 196B-parameter language backbone with a 1.8B Vision Transformer encoder, activating only ~11B parameters per token. Released and open-sourced on May 29, 2026, it builds directly on Step 3.5 Flash and adds native multimodality along with stronger, more consistent agentic performance. Key highlights include:

Native Multimodality: Unlike the text-only Step 3.5 Flash, version 3.7 natively understands images through a dedicated vision encoder, with a Visual Search pathway for long-tail entity recognition and a Python tool pathway for fine-grained tasks like cropping, zooming, and bounding-box analysis
Frontier Coding Agent: Scores 56.3% on SWE-Bench Pro and 59.6% on Terminal-Bench 2.1, gains of 5.0 and 6.2 points over Step 3.5 Flash, plus 76.5% on SWE-Bench Verified
Selectable Reasoning Tiers: Exposes low, medium, and high reasoning depths, letting you trade latency and cost against answer depth on a per-call basis
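Cross-Harness Consistency: Where Step 3.5 Flash ranged from 43% to 73% across coding scaffolds (a 30-point spread), Step 3.7 Flash narrows that to 64.5–71.5% (a 7-point spread), making behavior far more predictable across different tool harnesses
256K Context, Up to 400 tokens/sec: A 256K-token context window for long-context reasoning, with generation speeds of up to 400 tokens/sec for real-time responsiveness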
Advisor Mode: Reaches ~97% of Claude Opus 4.6's coding performance at roughly one-ninth the per-task cost

Benchmark / Feature	Step 3.5 Flash	Step 3.7 Flash
SWE-Bench Pro	51.3%	56.3%
Terminal-Bench 2.1	53.4%	59.6%
SWE-Bench Verified	—	76.5%
Multimodal Input	Text only	Text + Image
Reasoning Tiers	—	Low / Medium / High
Context Window	256K	256K
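
The selectable reasoning tiers lend themselves to a small per-call helper. This post does not show how a tier is passed through Puter.js, so treat the following as a sketch only: the option key `reasoning_effort` and the helper name `buildChatOptions` are assumptions made for illustration, not confirmed parameters. Check the Puter.js documentation for the actual option before relying on it.

```javascript
// Hypothetical helper: builds puter.ai.chat options for a given reasoning tier.
// ASSUMPTION: `reasoning_effort` is an illustrative key; this post does not
// name the real parameter, so verify it against the Puter.js docs.
const REASONING_TIERS = ["low", "medium", "high"];

function buildChatOptions(tier = "medium") {
    if (!REASONING_TIERS.includes(tier)) {
        throw new RangeError(`Unknown reasoning tier: ${tier}`);
    }
    return {
        model: "stepfun/step-3.7-flash",
        reasoning_effort: tier, // assumed key, see note above
    };
}

// Only runs where Puter.js is loaded (for example, in the browser).
if (typeof puter !== "undefined") {
    puter.ai
        .chat("Summarize the trade-offs of sparse MoE models.", buildChatOptions("low"))
        .then((response) => puter.print(response));
}
```

Validating the tier up front keeps a typo from silently falling back to a default depth, which matters when the tier is the main lever for latency and cost.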

Examples

Multimodal image understanding

puter.ai.chat(
    "What do you see in this image? Describe it in detail.",
    "https://assets.puter.site/doge.jpeg",
    { model: "stepfun/step-3.7-flash" }
).then((response) => {
    // Print the model's description of the image.
    puter.print(response);
});

Agentic coding task

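puter.ai.chat(`Design a Python script that automatically monitors
a directory for new files, processes them based on file type,
and generates a summary report. Include error handling and tests.`,
    { model: "stepfun/step-3.7-flash", stream: true }
).then(async (resp) => {
    // With stream: true, the response is an async iterable of chunks.
    for await (const part of resp) if (part?.text) puter.print(part.text);
});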

Search-augmented reasoning

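puter.ai.chat(
    "Research the trade-offs between server-side and client-side rendering for a content-heavy web app, then recommend an approach with justification.",
    { model: "stepfun/step-3.7-flash", stream: true }
).then(async (resp) => {
    // Print each streamed text chunk as it arrives.
    for await (const part of resp) if (part?.text) puter.print(part.text);
});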

Step-by-step reasoning with streaming

puter.ai
    .chat(
        "A farmer has 17 sheep. All but 9 run away. How many sheep does the farmer have left? Explain your reasoning step by step.",
        { model: "stepfun/step-3.7-flash", stream: true }
    )
    .then(async (resp) => {
        for await (const part of resp) {
            // Reasoning chunks arrive separately from answer text.
            if (part?.reasoning) puter.print(part.reasoning);
            else if (part?.text) puter.print(part.text);
        }
    });
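
If you would rather keep the reasoning trace separate from the final answer than print both inline, the same loop can accumulate them into two strings. The sketch below assumes the stream yields parts shaped like those in the example above (an optional `reasoning` field and an optional `text` field); `collectStream` is a hypothetical helper name, not part of Puter.js.

```javascript
// Hypothetical helper: drains a streamed response into two strings.
// Assumes each part carries either `reasoning` or `text`, as in the
// streaming example above.
async function collectStream(stream) {
    let reasoning = "";
    let text = "";
    for await (const part of stream) {
        if (part?.reasoning) reasoning += part.reasoning;
        else if (part?.text) text += part.text;
    }
    return { reasoning, text };
}

// Only runs where Puter.js is loaded (for example, in the browser).
if (typeof puter !== "undefined") {
    puter.ai
        .chat("A farmer has 17 sheep. All but 9 run away. How many are left?", {
            model: "stepfun/step-3.7-flash",
            stream: true,
        })
        .then(collectStream)
        .then(({ reasoning, text }) => {
            puter.print(`Answer: ${text}`);
            console.log("Reasoning trace:", reasoning);
        });
}
```

Buffering trades away the live, token-by-token display, so it fits cases where only the final answer is shown to the user and the trace is kept for logging or debugging.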

Get Started Now

Just add one library to your project:

// npm install @heyputer/puter.js
import { puter } from '@heyputer/puter.js';

Or add one script tag to your HTML:

<script src="https://js.puter.com/v2/"></script>

No API keys needed. Start building with Step 3.7 Flash immediately.
