// models
Open models.
Frontier results.
The open-model frontier already covers the work enterprises actually run, retrieval, code, agents, extraction. More than 80% of tasks can be done with open models, same result at a much lower cost.
// the 80 / 20
Open models cover 80% of enterprise inference.
The work that runs your business (RAG, classification, text-to-speech, internal assistants) is solved today by open-weight models. Leave just that 20% for the frontier models.
Bleeding-edge reasoning at the very limit of capability. Real, but specific — and rarely the workload a regulated team needs to keep in-house. We are honest about that boundary instead of pretending it doesn't exist.
// benchmark · artificial analysis
Frontier-class intelligence, open-weight prices.
The Artificial Analysis Intelligence Index is a composite of nine hard evaluations — GPQA Diamond, SciCode, Terminal-Bench, Humanity's Last Exam and more. Open-weight models now sit just under the closed frontier — and the one Helmcode runs costs cents.
GLM-5.2 leads all 92 open-weight models at 51 — within reach of Claude Opus 4.8 (56) and GPT-5.5 (55). The models Helmcode runs sit just behind: MiMo V2.5 (42), DeepSeek V4 Flash (40), Qwen3.6 35B (32) and Gemma 4 26B (26). DeepSeek V4 Flash scores 40 at $0.14 / $0.28 per million tokens — against Opus 4.8's $5.00 / $25.00, that's ~35× cheaper to read and ~90× cheaper to write, for 67% of the index leader's intelligence.
Source: artificialanalysis.ai · Intelligence Index v4.1 · June 2026 · composite of 9 evaluations · price = first-party API, per 1M tokens (input · output). Rows marked on helmcode — MiMo V2.5, DeepSeek V4 Flash, Qwen3.6 35B and Gemma 4 26B — run on Helmcode. GLM-5.2 and DeepSeek V4 Pro are open-weight but not on the platform.
// proven in production
Benchmarks are one thing. Traffic is another.
The argument isn't theoretical. On our own platform, near-enough all inference already runs on open models — and most of it on a single 35B one.
333.8B
Tokens in production
cumulative
76%
Run on Qwen 3.6 (35B)
the open workhorse
99.5%
Of tokens on open models
LLM traffic
// the lineup
What actually runs the 80%.
The four language models, ordered by real production token share. One open 35B model carries most of the load — the rest step in for reasoning, scale and multimodal.
qwen3.6 35B MoE · 256K ctx 76.1% High-volume RAG, classification, code deepseek-v4-flash 284B MoE · 1M ctx 12.4% Reasoning, agents, long-context mimo-v2.5 310B MoE · 1M ctx 8.6% Multimodal — vision + audio + text gemma4 26B MoE · 256K ctx 2.4% Efficient assistants, document work Plus embeddings & reranking (qwen3-embedding, rerank) and speech (kokoro, whisper-large-v3) — nine models on one API. Full model reference →
// models faq
Open models, answered.
The questions every CTO asks before trusting open weights in production.
Are open models actually good enough?
For the work enterprises run day to day — yes. On Artificial Analysis’ Intelligence Index, GLM-5.2 ranks #1 of 92 open-weight models, just under the closed frontier — and the model we run, DeepSeek V4 Flash, delivers around two-thirds of leader-level intelligence for cents per million tokens. In production, 99.5% of all tokens on Helmcode already flow through open models. The gap that remains is a narrow set of frontier tasks most teams never hit.
Which model should I use?
Start with Qwen 3.6 — it carries three quarters of all production traffic and is the fastest, cheapest path for RAG, classification and code. Move to DeepSeek V4-Flash for hard reasoning, agents or 1M-token context, and MiMo for multimodal input. Same API, just change the model id.
What about the 20% that genuinely needs GPT-5?
It exists, and it is more specific than most assume — frontier-only reasoning at the very edge of capability. Helmcode is honest about that boundary: we cover the 80% that runs your business, privately and at a flat rate, not the last mile of the leaderboard.
How current are these benchmarks?
Figures are published scores as of June 2026 — open models served on Helmcode, closed-model numbers from vendor reports. Benchmarks move every release, so treat them as directional. What does not move is where your data is processed: always the EU, always zero logs.
Can I run a model that is not listed?
On Dedicated and On-premise plans, yes — custom or fine-tuned open-weight models on hardware reserved for you. The Shared cluster serves the curated lineup above.
// get started
START BURNING TOKENS
Skip the AI infra work. Deploy your first private inference endpoint today.
Flat rate. EU data. OpenAI API compatible.