
Step 3.7 Flash Review

Step 3.7 Flash, developed by StepFun, is a high-efficiency, multimodal Mixture-of-Experts (MoE) vision-language model designed for real-world agentic workflows.

Shipped May 31, 2026 · ai · freemium


1. Released on May 28, 2026, Step 3.7 Flash is a 198-billion-parameter sparse MoE model.

2. It features a 256k context window and activates approximately 11 billion parameters per token during inference.

3. The model achieved a second-place finish on SWE-Bench PRO with a score of 56.3.

4. Step 3.7 Flash leads the ClawEval-1.1 benchmark with a score of 67.1 for workflow integrity and tool orchestration.

Step 3.7 Flash at a Glance

Best For

AI developers and enterprise teams building multimodal agentic workflows

Pricing

freemium


Alternatives

Google Gemini (as an agent), AskUI Vision Agent, Skygen, OpenAI Operator

About Step 3.7 Flash

Founded (developer StepFun)

2023

Similar Tools

Compare Alternatives

Other tools you might consider

Google Gemini (as an agent)

Gemini is a multimodal AI model capable of understanding and operating across various data types, including images, video, and text, enabling sophisticated reasoning and direct UI control.


AskUI Vision Agent

AskUI Vision Agent specializes in automating desktop and mobile workflows by visually understanding and interacting with graphical user interfaces at the operating system level.


Skygen

Skygen is an AI desktop automation agent that provides real-time visibility and runs tasks across various applications, websites, and cloud computers.


OpenAI Operator

OpenAI Operator is designed to execute multi-step actions directly within a web browser, enabling autonomous completion of complex web tasks.



What is Step 3.7 Flash?

Step 3.7 Flash is a high-efficiency, multimodal Mixture-of-Experts (MoE) vision-language model from StepFun, built for AI developers and enterprise users who want to build and deploy AI agents. It targets agentic workflows that combine multimodal perception, search, and multi-step reasoning at production scale. Released on May 28, 2026, the 198-billion-parameter sparse MoE model activates only about 11 billion parameters per token during inference, which keeps per-token compute far below that of a dense model of the same size. It pairs a 196B-parameter language backbone with a 1.8B-parameter vision encoder for native image and video understanding. The model supports a 256k context window and offers three selectable reasoning levels (low, medium, and high) that trade speed and cost against depth of reasoning.
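The efficiency claim follows from simple arithmetic on the quoted parameter counts. The sketch below assumes the common rule of thumb of roughly 2 FLOPs per active parameter per token (ignoring attention over the context); that is an approximation, not a figure StepFun publishes. Sparsity reduces compute per token, not model size: all 198B weights still have to be held in memory.

```python
# Back-of-the-envelope arithmetic for a sparse MoE model, using the parameter
# counts quoted in this review. The 2-FLOPs-per-parameter rule is an
# approximation that ignores attention cost over long contexts.
TOTAL_PARAMS = 198e9      # all experts plus shared layers
ACTIVE_PARAMS = 11e9      # parameters used for a single token (approx.)
BACKBONE_PARAMS = 196e9   # language backbone
VISION_PARAMS = 1.8e9     # vision encoder


def active_fraction(total: float, active: float) -> float:
    """Share of the weights that participate in one token's forward pass."""
    return active / total


def forward_flops_per_token(active: float) -> float:
    """Rule of thumb: ~2 FLOPs (one multiply, one add) per active parameter."""
    return 2 * active


print(f"active fraction: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")  # 5.6%
print(f"sparse forward pass: {forward_flops_per_token(ACTIVE_PARAMS) / 1e9:.0f} GFLOPs/token")  # 22
print(f"dense 198B equivalent: {forward_flops_per_token(TOTAL_PARAMS) / 1e9:.0f} GFLOPs/token")  # 396
# The two components add up to the quoted total (196 + 1.8 = 197.8, i.e. ~198B).
print(f"backbone + vision: {(BACKBONE_PARAMS + VISION_PARAMS) / 1e9:.1f}B")  # 197.8B
```

With about 5.6% of the weights active per token, the per-token compute is roughly one-eighteenth of a dense 198B model (198 / 11 = 18).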


Quick Facts

Attribute	Value
Developer	StepFun
Business Model	Freemium, usage-based
Pricing	Free tier; then $0.00020 per 1k input tokens and $0.00115 per 1k output tokens
Platforms	API, Web (StepFun Open Platform)
API Available	Yes
Integrations	NVIDIA NIM, SGLang, NVIDIA TensorRT-LLM, vLLM, Hugging Face, OpenRouter, ModelScope
Founded	2023
HQ	Shanghai, China


Key Features of Step 3.7 Flash

Step 3.7 Flash combines a multimodal Mixture-of-Experts architecture with features aimed at high-performance agentic applications, covering perception, reasoning, and action across data types and operating environments.

1. 198-billion-parameter sparse Mixture-of-Experts (MoE) model, activating approximately 11 billion parameters per token.
2. Native image and video understanding via an integrated 1.8B-parameter vision encoder.
3. Supports a 256k context window for extensive information processing.
4. Offers three selectable reasoning levels (low, medium, high) to optimize for speed, cost, or cognitive depth.
5. Reliable interaction with external APIs, browsers, terminals, and Office tools for complex task execution.
6. Open-source availability under the Apache 2.0 License on platforms like Hugging Face and ModelScope.
7. Full inference stack support from NVIDIA, including availability as an NVIDIA NIM inference microservice.
8. Advisor Mode functionality, allowing a smaller executor model to escalate complex tasks to a larger advisor model for cost efficiency.
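The review does not say how Advisor Mode decides when to escalate. The sketch below shows the general executor/advisor pattern under one assumed policy: the executor returns a self-reported confidence, and any task below a threshold goes to the advisor together with the executor's draft. `Draft`, `run_with_advisor`, the 0.7 threshold, and the stub models are all hypothetical illustrations, not StepFun's API.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Draft:
    answer: str
    confidence: float  # hypothetical self-reported score in [0, 1]


def run_with_advisor(
    task: str,
    executor: Callable[[str], Draft],
    advisor: Callable[[str, Draft], str],
    threshold: float = 0.7,  # assumed escalation cutoff
) -> Tuple[str, bool]:
    """Run the cheaper executor first; escalate to the (more expensive) advisor
    only when the executor's confidence is below `threshold`.

    Returns (final_answer, escalated)."""
    draft = executor(task)
    if draft.confidence >= threshold:
        return draft.answer, False
    # Pass the executor's draft along so the advisor can correct it
    # instead of solving the task from scratch.
    return advisor(task, draft), True


# Stub models standing in for real API calls (hypothetical).
def stub_executor(task: str) -> Draft:
    return Draft(answer=f"draft for {task}", confidence=0.9 if "easy" in task else 0.4)


def stub_advisor(task: str, draft: Draft) -> str:
    return f"revised {draft.answer}"


print(run_with_advisor("easy lookup", stub_executor, stub_advisor))    # ('draft for easy lookup', False)
print(run_with_advisor("hard refactor", stub_executor, stub_advisor))  # ('revised draft for hard refactor', True)
```

The savings come from calling the advisor only on the tasks that fall below the threshold. In a real system the escalation signal could just as well be a verifier, a failed tool call, or an explicit "ask for help" flag from the executor.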


Who Should Use Step 3.7 Flash?

Step 3.7 Flash is engineered for professionals and organizations requiring advanced multimodal AI capabilities for agentic workflows, particularly those focused on automation, complex data interpretation, and application development.

1. **AI Developers:** For building and deploying next-generation AI applications, including multimodal agents with reliable tool use and orchestration.
2. **Enterprise Users:** For parsing massive financial reports, running multi-step search loops with cross-source verification, and operating concurrent coding agents in high-throughput pipelines.
3. **Engineers/Researchers:** For agentic coding, independently tracing multi-file repositories, identifying bugs from issue reports, and generating functional code patches.
4. **Content Creators:** For applications requiring text-to-speech, voice cloning, creative writing, and advanced language learning functionalities.
5. **Individuals Seeking Personal AI Assistance:** For knowledge acquisition, information finding, and general multimodal interaction.


Step 3.7 Flash Pricing & Plans

Step 3.7 Flash uses a freemium, usage-based pricing model: users start on a free tier and then pay per token consumed. Rate limits apply to concurrency, requests per minute (RPM), and tokens per minute (TPM), and requests time out after 10 minutes. Users who need higher limits can contact platform@stepfun.com. StepFun's price list also covers other models, included below for comparison.

1. **Freemium:** A free tier is available for initial access and limited usage.
2. **Step 1 (32K):** Input: $0.00205 per 1k tokens, Output: $0.00959 per 1k tokens.
3. **Step 3.5 Flash:** Input: $0.000096 per 1k tokens, Output: $0.000288 per 1k tokens.
4. **Step 3.5 Flash 2603:** Input: $0.000100 per 1k tokens, Output: $0.000300 per 1k tokens.
5. **Step 3.7 Flash:** Input: $0.00020 per 1k tokens, Output: $0.00115 per 1k tokens.
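To compare the tiers on a concrete workload, multiply each token count by its per-1k price. The sketch below uses the list prices quoted above (assumed to be USD and to exclude any free-tier allowance, caching discounts, or rate-limit effects) and an illustrative workload of 2M input and 500k output tokens. `Decimal` avoids floating-point rounding in the currency arithmetic; `usage_cost` is a hypothetical helper, not part of any StepFun SDK.

```python
from decimal import Decimal

# (input, output) list prices per 1,000 tokens, as quoted in this review.
PRICES = {
    "Step 1 (32K)": (Decimal("0.00205"), Decimal("0.00959")),
    "Step 3.5 Flash": (Decimal("0.000096"), Decimal("0.000288")),
    "Step 3.5 Flash 2603": (Decimal("0.000100"), Decimal("0.000300")),
    "Step 3.7 Flash": (Decimal("0.00020"), Decimal("0.00115")),
}


def usage_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Pay-as-you-go cost for a token count, ignoring any free-tier allowance."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1000


# Illustrative workload: 2M input tokens, 500k output tokens.
for name in PRICES:
    print(f"{name:<20} ${usage_cost(name, 2_000_000, 500_000):.4f}")
```

On this workload Step 3.7 Flash comes to $0.975, roughly 2.9 times Step 3.5 Flash ($0.336), mostly because its output price is about four times higher; Step 1 (32K) comes to $8.895.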


Step 3.7 Flash vs Competitors

Step 3.7 Flash competes in the 'Flash' model segment against both established and emerging AI products. Its main strengths are native multimodal perception, robust tool orchestration, and competitive results on coding and visual-intelligence benchmarks.

Google Gemini (as an agent)

Gemini is a multimodal AI model capable of understanding and operating across various data types, including images, video, and text, enabling sophisticated reasoning and direct UI control.

Like Step 3.7 Flash, Gemini offers real-time perception and action, and is particularly strong at multimodal understanding and complex decision-making. Developers typically access it through a freemium API and can use it to build custom agents.

AskUI Vision Agent

AskUI Vision Agent specializes in automating desktop and mobile workflows by visually understanding and interacting with graphical user interfaces at the operating system level.

This is a direct competitor focusing on the 'see and act' aspect for digital interfaces, translating visual data into low-level commands. Its specialization in GUI automation provides a focused alternative to a general 'flash-speed' agent model.

Skygen

Skygen is an AI desktop automation agent that provides real-time visibility and runs tasks across various applications, websites, and cloud computers.

Skygen aligns closely with the idea of a 'flash-speed agent model that can see and act' in digital environments, emphasizing real-time operation and broad application coverage. Like Step 3.7 Flash, it offers a freemium model.

OpenAI Operator

OpenAI Operator is designed to execute multi-step actions directly within a web browser, enabling autonomous completion of complex web tasks.

While its pricing is listed as a paid 'Pro' tier rather than freemium, OpenAI Operator offers a direct functional comparison by focusing on agents that 'see' (perceive web interfaces) and 'act' (perform tasks) at speed within a browser environment.

Agno AI Agents

Agno AI Agents is a framework built for performance, enabling the creation of lightning-fast, production-ready AI agents with minimal startup times and a tiny footprint.

Agno directly addresses the 'flash-speed' aspect, offering a framework to build agents that are exceptionally fast and efficient. While its 'see' capability is more about perceiving digital states for action rather than explicit visual recognition, its emphasis on rapid, production-grade agent deployment makes it a strong competitor for high-performance autonomous tasks.


Frequently Asked Questions

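What is Step 3.7 Flash?

Step 3.7 Flash is StepFun's multimodal Mixture-of-Experts vision-language model for agentic workflows. It has 198 billion parameters, activates about 11 billion per token, supports a 256k context window, and natively understands images and video.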

Is Step 3.7 Flash free?

Step 3.7 Flash operates on a freemium model, offering a free tier. For usage beyond the free tier, it is usage-based, with input tokens priced at $0.00020 per 1k tokens and output tokens at $0.00115 per 1k tokens.

What are the main features of Step 3.7 Flash?

Key features of Step 3.7 Flash include its 198-billion-parameter sparse MoE architecture, native image and video understanding via a 1.8B-parameter vision encoder, a 256k context window, three selectable reasoning levels, and reliable interaction with external APIs and tools. It also supports NVIDIA inference stacks and offers an Advisor Mode for cost-efficient agentic operations.

Who should use Step 3.7 Flash?

Step 3.7 Flash is primarily intended for AI Developers, Enterprise Users, Engineers/Researchers, and Content Creators who require advanced multimodal AI agents for tasks such as building AI applications, automating complex workflows, agentic coding, and processing diverse data types.

How does Step 3.7 Flash compare to alternatives?

Step 3.7 Flash distinguishes itself with native multimodal support (images and video), an advantage over competitors such as DeepSeek V4 Flash. It also shows strong coding performance, placing second on SWE-Bench PRO with a score of 56.3, and leads the ClawEval-1.1 benchmark for tool orchestration. Its Advisor Mode is pitched as a lower-cost way to reach performance comparable to models such as Claude Opus 4.6.

Related AI Tools

Other tools in this category, ranked by community signal


Pounce

🤖 AI Tools

AI monitors X and Reddit for the right conversations — you just reply and build relationships.

Hermes

🤖 AI Tools

Self-hosted AI agent that remembers your projects, builds skills automatically, and reaches you on Telegram, Discord & more. MIT license. No tracking.

Upstash Agent Analytics

🤖 AI Tools

Upstash is a serverless data platform providing low latency and high scalability for real-time applications. Optimize your data infrastructure with Upstash's managed services for Redis, Vector, QStash, and other key data technologies.

Novu Connect

🤖 AI Tools

Novu is an open-source notification platform that empowers developers to create robust, multi-channel notifications for web and mobile apps. With powerful workflows, seamless integrations, and a flexible API-first approach, Novu enables product teams.

Tinfoil Pigeons

🤖 AI Tools

Tinfoil Pigeons is a live radar scope: enter your postcode and see the flights overhead right now, then tap one to find out what it is.

Verol

🤖 AI Tools

Real-time AI fact checker and hallucination detector for ChatGPT, Claude, Gemini & Grok. Automatically verifies responses.
