writing/news/2026/06
NewsJun 10, 2026·6 min read

Cohere Open-Sources North Mini Code, an Agentic Coding Model That Runs on a Single H100

Cohere released North Mini Code on June 9, an Apache 2.0 mixture-of-experts coding model with 30B total and 3B active parameters, a 256K context window, and the ability to run agentic software engineering workflows on a single H100 GPU on-premises.

Cohere released North Mini Code on June 9, 2026, its first open-source coding model and the inaugural member of a new generation of models the company says are optimized specifically for software engineering. The mixture-of-experts (MoE) model ships under the permissive Apache 2.0 license and is designed to run agentic coding workflows on a single H100 GPU, making it a notable option for teams that want to keep their code and AI infrastructure on-premises.

The launch arrived in the middle of a busy week for frontier AI, but North Mini Code stakes out a different position: rather than chasing the largest managed model, Cohere is targeting developers who need sovereignty and flexibility over their agentic coding stack.

Key Highlights

  • 30B total parameters, 3B active — a sparse mixture-of-experts design that keeps inference cheap while preserving capability.
  • Apache 2.0 license — weights are freely downloadable and usable in commercial products without restriction.
  • 256K-token context window with a 64K-token maximum generation length.
  • Runs on a single H100 GPU at FP8 precision — no large GPU cluster required.
  • Built for agentic software engineering, including sub-agent orchestration, architecture mapping, code review, and terminal work.

Details

North Mini Code is purpose-built for agentic workflows rather than single-shot completions. According to Cohere, the model can understand and orchestrate sub-agents, map systems architecture across a codebase, conduct code reviews, and operate in the terminal — the kind of multi-step behavior that modern coding agents depend on.

On the Artificial Analysis Coding Index, North Mini Code scores 33.4, placing it well above GLM-4.7-Flash at 25.9 and just under Qwen3.6 35B A3B at 35.2. On the broader Artificial Analysis Intelligence Index it scores 27.6, landing above gpt-oss-20B at 24.5 and just below Mistral Small 4 at 27.8. The model is fast for its class: Cohere reports roughly 199 output tokens per second on its own API, with up to 2.8x higher throughput and a 30% inter-token latency advantage over Devstral Small 2.

The trade-off is focus. Independent benchmarking notes that North Mini Code underperforms on non-coding agentic tasks, scoring just 14% on GDPval-AA and 37% on the τ²-Bench Telecom evaluation, for an overall agentic index of 21.7. In other words, this is a model tuned for code, not a general-purpose reasoning system.

Impact

The release matters most for organizations that cannot or will not send their source code to a third-party API. A model that fits on a single H100 and carries an Apache 2.0 license is well within reach for an enterprise data center, a regional cloud provider, or a research lab — and it removes the per-token billing that makes large-scale agentic coding expensive on managed platforms.

For the MENA region in particular, where data residency and digital sovereignty are increasingly central to government and enterprise procurement, an open-weight coding model that can be self-hosted on local infrastructure is a meaningful building block. Teams can run code review, refactoring, and agentic development entirely inside their own perimeter, with zero per-token cost and full control over where data lives.

Background

North Mini Code is a departure for Cohere, which built its reputation on enterprise retrieval and the Command family of language models. Co-founder Nick Frosst announced the release as the first of a new model generation aimed at developers, signaling that Cohere intends to compete directly in the open-weight coding space alongside the likes of Qwen Coder, Devstral, and GLM.

The model is available now on Hugging Face, through the OpenCode agent platform, on OpenRouter, and via Cohere's own Model Vault and API, where a free trial is offered.

What's Next

The "Mini" in the name signals that larger members of this coding-focused generation are likely to follow. The open question is whether Cohere can scale the agentic coverage — particularly on non-coding tasks — without giving up the efficiency that makes North Mini Code attractive. For now, developers get a fast, permissively licensed, self-hostable coding model that fits on hardware many teams already own.


Source: Cohere