On May 18, 2026, Cursor released Composer 2.5, its second-generation in-house coding model, built on Moonshot AI's open-source Kimi K2.5 checkpoint. The model comes within a point of Claude Opus 4.7 on the SWE-Bench Multilingual benchmark while costing roughly one-tenth as much per token, marking Cursor's transition from a wrapper around third-party models to a frontier-scale AI lab in its own right.
Key Highlights
- Composer 2.5 scores 79.8% on SWE-Bench Multilingual, essentially tied with Claude Opus 4.7 at 80.5%.
- On Cursor's internal CursorBench v3.1, it leads Opus 4.7 at default settings, 63.2% to 61.6%.
- Standard pricing is $0.50 per million input tokens and $2.50 per million output tokens. A Fast tier with priority routing costs $3 per million input tokens and $15 per million output tokens.
- The model was trained with 25 times more synthetic tasks than Composer 2 and uses a new targeted reinforcement learning technique with textual feedback.
Details
According to Cursor's launch post, 85% of the training compute went into post-training, not the base pre-training. The team designed a new technique called targeted RL with textual feedback that inserts localized hints at exact points where the model could improve, then uses on-policy distillation to nudge token probabilities. Cursor describes the result as a substantial improvement in intelligence and behavior, particularly on sustained work and complex instruction-following.
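The distillation step can be illustrated with a toy example. This is only a sketch under stated assumptions, not Cursor's implementation: it assumes the "teacher" distribution is what the same model would predict with a textual hint in its context, it looks at a single token position over a three-token vocabulary, and it uses the exact gradient of the reverse KL divergence rather than sampled rollouts. All names and numbers are hypothetical.

```python
import math

def softmax(logits):
    """Convert logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reverse_kl(student_probs, teacher_probs):
    """KL(student || teacher), the per-token loss commonly used in on-policy distillation."""
    return sum(s * math.log(s / t)
               for s, t in zip(student_probs, teacher_probs) if s > 0)

def distill_step(student_logits, teacher_probs, lr=0.5):
    """One gradient-descent step on the student logits.

    For softmax outputs s, the gradient of KL(s || t) with respect to
    logit i is s_i * (log(s_i / t_i) - KL(s || t)).
    """
    s = softmax(student_logits)
    kl = reverse_kl(s, teacher_probs)
    grad = [si * (math.log(si / ti) - kl) for si, ti in zip(s, teacher_probs)]
    return [z - lr * g for z, g in zip(student_logits, grad)]

# Hypothetical values: the student (no hint) prefers token 0, while the
# hinted teacher (an assumption of this sketch) prefers token 1.
student_logits = [2.0, 0.5, 0.0]
teacher_probs = [0.2, 0.7, 0.1]

initial_kl = reverse_kl(softmax(student_logits), teacher_probs)
one_step_kl = reverse_kl(softmax(distill_step(student_logits, teacher_probs)),
                         teacher_probs)

logits = student_logits
for _ in range(20):
    logits = distill_step(logits, teacher_probs)
final_kl = reverse_kl(softmax(logits), teacher_probs)

print(f"KL before: {initial_kl:.3f}, after 1 step: {one_step_kl:.3f}, "
      f"after 20 steps: {final_kl:.3f}")
```

Each step shifts probability mass toward the tokens the hinted teacher prefers, which is one plausible reading of "nudging token probabilities" at the exact position where the hint applies.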
Engineering details revealed in the post include a Sharded Muon optimizer that achieves a 0.2-second optimizer step on trillion-parameter models and a Dual mesh HSDP (hybrid sharded data parallel) layout that separates expert and non-expert weights for better GPU utilization. Cursor also confirmed a frontier-scale training partnership with SpaceXAI to build a substantially larger model using ten times more compute through Colossus 2 and its roughly one million H100-equivalent GPUs.
Benchmarks and Limitations
Composer 2.5 is not universally dominant. On Terminal-Bench 2.0, which measures long-horizon shell tasks, GPT-5.5 still leads at 82.7% versus Composer 2.5's 69.3%, a gap of about 13 points that Cursor acknowledges as the clearest performance limitation. Independent reviewers note that the model excels at parallel multi-file edits and in-IDE iteration loops, but trails on heavy autonomous terminal work.
Developer reactions on X and developer forums have focused on the price-performance ratio. Public benchmarks from third parties list Composer 2.5 as the third-best coding model in the world at roughly 55 cents per representative task, far below the running cost of Opus 4.7 Extra High or GPT-5.5 Fast on equivalent workloads.
Impact
The launch arrives in the middle of a multi-agent coding arms race. Google's Antigravity 2.0, xAI's Grok Build, OpenAI's Codex, and Anthropic's Claude Code are all shipping multi-agent orchestration features in the same window. The competitive question for IDE-based products is no longer whether the model can write code, but how cheap, fast, and coordinated a swarm of agents can be while keeping a codebase coherent.
For development teams already paying for premium frontier-model usage, Composer 2.5 changes the budget math. A task that previously cost a few dollars in Opus 4.7 tokens can now run on Composer 2.5 for cents, which lets product teams afford parallel agents on the same workflow without blowing up the bill.
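The budget arithmetic can be sketched with the two Composer 2.5 price tiers listed above. The per-token prices come from the article. The task size (400,000 input tokens and 40,000 output tokens) and the function and variable names are hypothetical, chosen only for illustration.

```python
# Prices in USD per million tokens, as listed in the article.
PRICES = {
    "standard": {"input": 0.50, "output": 2.50},
    "fast": {"input": 3.00, "output": 15.00},
}

def task_cost(tier, input_tokens, output_tokens):
    """Return the USD cost of one task at the given tier."""
    p = PRICES[tier]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Hypothetical task size (an assumption, not a figure from Cursor).
INPUT_TOKENS = 400_000
OUTPUT_TOKENS = 40_000

standard = task_cost("standard", INPUT_TOKENS, OUTPUT_TOKENS)
fast = task_cost("fast", INPUT_TOKENS, OUTPUT_TOKENS)

print(f"standard tier: ${standard:.2f} per task")
print(f"fast tier:     ${fast:.2f} per task")
print(f"four parallel standard-tier agents: ${4 * standard:.2f}")
```

At this assumed task size, four standard-tier agents running the same workload in parallel still cost less than a single fast-tier run.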
Background
Cursor began as an AI-native fork of Visual Studio Code that called out to OpenAI and Anthropic models for completions and chat. Composer 1 introduced the first in-house model focused on speed within the editor. Composer 2 expanded into longer agentic loops. Composer 2.5 is the first release where Cursor's own model is benchmark-competitive with the frontier closed-source models from Anthropic and OpenAI, rather than being positioned as a faster but weaker alternative.
What's Next
Cursor confirmed that the SpaceXAI partnership is aimed at training a model significantly larger than Composer 2.5 from scratch, with ten times more compute. The launch promotion doubles usage for the first week, suggesting Cursor wants to convert as many trial users to paid plans as possible before competitors respond with their own price cuts.
The broader pattern is clear: the AI coding tooling layer is consolidating around a handful of vertically integrated products that own both the IDE and the underlying model. The next twelve months will likely be defined by which of those products can coordinate multiple agents on a single codebase without producing semantic chaos.