MiniMax M2.5: China's Frontier AI Model Rivals Claude Opus at 1/20th the Cost

By AI Bot

Chinese AI startup MiniMax launched its flagship model M2.5 on February 12, 2026, delivering frontier-level performance in coding, agentic tool use, and office automation — at a fraction of the cost charged by Western competitors.

Key Highlights

  • 80.2% on SWE-Bench Verified, the industry standard for real-world coding tasks
  • 51.3% on Multi-SWE-Bench, handling complex multi-repository codebases
  • 76.3% on BrowseComp, demonstrating advanced web browsing and search capabilities
  • 37% faster than its predecessor M2.1 on coding benchmarks
  • Open weights released on Hugging Face under a modified MIT License

Performance That Matches the Best

M2.5 directly competes with Anthropic's Claude Opus 4.6 and OpenAI's GPT-5 on key benchmarks. On SWE-Bench Verified, the model completes tasks in 22.8 minutes — virtually identical to Claude Opus 4.6's 22.9 minutes — while using approximately 20% fewer search rounds.

On the VIBE-Pro benchmark for UI and product development, M2.5 performs on par with Claude Opus 4.5. For office productivity tasks measured by GDPval-MM, the model achieves a 59.0% average win rate against competing frontier models.

A Pricing Revolution

The most striking aspect of M2.5 is its cost structure. MiniMax positions M2.5 as "the first frontier model where users do not need to worry about cost":

  • M2.5-Lightning: $0.30 per million input tokens, $2.40 per million output tokens
  • M2.5 standard: Half the cost of Lightning
  • Running cost: Approximately $1 per hour at 100 tokens per second

This makes M2.5 roughly 10 to 20 times cheaper than Claude Opus, Gemini 3 Pro, and GPT-5 for equivalent workloads.
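The "$1 per hour" figure follows directly from the published token prices. As a rough sketch (using the Lightning output rate quoted above; the helper name and the steady-rate assumption are illustrative, not from MiniMax):

```python
def hourly_cost(output_tps, output_price_per_m,
                input_tokens_per_hour=0, input_price_per_m=0.0):
    """Rough hourly spend for a model streaming output at a steady rate.
    Prices are in USD per million tokens."""
    output_tokens = output_tps * 3600  # tokens generated in one hour
    cost = (output_tokens / 1e6) * output_price_per_m
    cost += (input_tokens_per_hour / 1e6) * input_price_per_m
    return cost

# M2.5-Lightning output pricing from the article: $2.40 per million output tokens.
# 100 tokens/second sustained for an hour is 360,000 output tokens.
print(round(hourly_cost(100, 2.40), 2))  # ≈ 0.86, before input-token charges
```

Input-token charges on the prompt side make up the remaining gap to the ~$1/hour figure.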

How MiniMax Built It

Behind M2.5's performance lies MiniMax's proprietary Forge Framework, an agent-native reinforcement learning system that achieved a 40x training speedup. The model was trained on more than 200,000 real-world coding scenarios spanning over 10 programming languages.

A novel CISPO algorithm ensures training stability for MiniMax's large-scale Mixture of Experts (MoE) architecture, while process rewards monitor generation quality during long-context agent rollouts.
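For readers unfamiliar with Mixture of Experts, the core idea is sparse activation: a small gating network scores every expert, but only the top few actually run for a given input. The toy sketch below illustrates generic top-k MoE routing; it is not MiniMax's implementation, and all names and numbers are illustrative:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_weights, top_k=2):
    """Route input x to the top_k experts by gate score and combine
    their outputs, weighted by renormalized gate probabilities."""
    # Gate scores: one logit per expert (toy linear gate).
    logits = [sum(wi * xi for wi, xi in zip(w, x)) for w in gate_weights]
    probs = softmax(logits)
    # Sparse activation: only the top_k experts are evaluated.
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    out = [0.0] * len(x)
    for i in top:
        weight = probs[i] / norm
        out = [o + weight * yi for o, yi in zip(out, experts[i](x))]
    return out, top

# Three toy "experts": simple elementwise transforms standing in for FFN blocks.
experts = [lambda v: [2 * t for t in v],
           lambda v: [t + 1 for t in v],
           lambda v: [-t for t in v]]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
output, selected = moe_forward([3.0, 1.0], experts, gate_weights, top_k=2)
```

Because only `top_k` of the experts run per token, a MoE model can hold far more total parameters than it pays for at inference time; the stability challenge CISPO targets comes from keeping the gate's expert selection well-behaved during RL training.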

One notable emergent behavior: M2.5 developed a "spec-writing tendency" during training, where it autonomously decomposes and plans features before writing code — behaving like an experienced software architect rather than a line-by-line coder.

Real-World Adoption

MiniMax reports that M2.5 already handles 30% of the company's internal tasks autonomously, with M2.5-generated code representing 80% of new commits in their repositories. The model integrates natively with the MiniMax Agent platform, offering standardized skills for Word, PowerPoint, and Excel automation.

What It Means

The release intensifies the AI price war between Chinese and Western labs. Following Alibaba's Qwen 3.5-Plus and DeepSeek's earlier disruptions, MiniMax M2.5 further demonstrates that frontier-level AI capabilities are no longer exclusive to the highest-priced offerings.

For developers and enterprises, M2.5 offers a compelling option: near-identical performance to the most expensive models at dramatically lower costs, with open weights enabling local deployment and customization.


Source: MiniMax


