TL;DR
- Launch Scope: MiniMax has launched its M3 AI model as a broader long-context AI package for developers.
- Model Claims: M3 pairs bigger working memory, multimodal input, and faster prompt handling, but outside accuracy evidence is still missing.
- Next Test: API access, pricing, and the promised weight release must show whether M3 becomes practical beyond launch-day claims.
MiniMax has launched its M3 model and says it built the package around bigger working memory, broader input types, and faster handling of long tasks, but outside testing still has to show how much of that package carries into daily developer use.
M3 offers much larger working memory than a typical chatbot session and keep far larger blocks of text or code in view at once. MiniMax also presented M3 as a domestic model that combines frontier coding, agentic capabilities, a 1M-token context window, and native multimodal processing in one architecture.
M3 supports text image and video input with text output and reach developers through OpenAI-compatible endpoints. Teams working across source files, screenshots, diagrams, and other visual reference material will be able to keep more of that workflow inside one model instead of switching between separate tools.
Specs, rollout, and the market test
M3 also offers a 512000-token guaranteed minimum context. Developers planning around long codebases and chained agent tasks get a firmer lower-bound number from that reported floor than from a higher headline ceiling alone.
MiniMax promises the weights for M3 would be released within 10 days. Broader outside verification still depends on whether that downloadable package arrives on schedule, because direct access would let buyers run the model more directly.
M3 scores 59.0 on SWE-bench Pro and 66.0 percent on Terminal-Bench 2.1.
A coding interface at code.minimax.io already offers a fast path to hands-on testing, and MiniMax also presented the API as live at launch. Early access gives teams a way to probe prompt length, latency, and tool behavior before a broader download package arrives. Users who want to test code-heavy workflows before committing to a new model family can use it to find out whether the launch claims translate into usable tooling.
How M3 is claimed to speed long context
M3 uses a Grouped-Query Attention backbone with MiniMax Sparse Attention. MiniMax is framing that design as a way to lower the cost of processing very large prompts before the model starts answering, a bottleneck often called prefill in long-context systems.
At million-token scale, M3 delivers 15.6x faster decoding and 9.7x faster prefill versus M2. Faster prompt ingestion and output become more important when coding agents have to scan large repositories, long documents, or multi-step task histories before producing something useful.
Independent accuracy data was not published, so developers still lack the trade-off evidence they would need before trusting the latency pitch in production.
Where M3 fits in MiniMax’s run and the wider field
MiniMax’s record 4M token context models had already pushed the company into the long-context race in 2025 before the M3 model release.
MiniMax introduced M2.5 in February 2026. April 2026 brought a stronger move into developer tooling, which places M3 inside a shorter product cycle aimed at working coders rather than a benchmark-only announcement.
Anthropic already offers a 1M-token context window for Claude Opus and Sonnet.
M3 will be priced at $0.60 per million tokens for input and $2.40 per million for output up to 512000 tokens.

