// models

Open models.
Frontier results.

The open frontier already covers the work enterprises actually run, retrieval, code, agents, extraction. Benchmarked against the closed labs, proven in production, served only from the EU.

see_benchmarks

// the 80 / 20

Open models cover 80% of enterprise inference.

The work that runs your business, RAG, classification, code generation, internal assistants, is solved today by open-weight models. The 20% where you truly need a frontier closed model is narrower than it looks.

The 80%, open, private, flat-rate

RAG over internal knowledge
Classification & routing
Code generation & review
Internal assistants & copilots
Document extraction
Summarization
Translation
Autonomous agents

The 20%, frontier closed labs

Bleeding-edge reasoning at the very limit of capability. Real, but specific, and rarely the workload a regulated team needs to keep in-house. We are honest about that boundary instead of pretending it doesn't exist.

// benchmark · artificial analysis

Frontier-class intelligence, open-weight prices.

The Artificial Analysis Intelligence Index is a composite of nine hard evaluations, GPQA Diamond, SciCode, Terminal-Bench, Humanity's Last Exam and more. Open-weight models now sit just under the closed frontier.

Claude Fable 5 Anthropic 60 N/A

Claude Opus 4.8 Anthropic 56 $5.00 / $25.00

GPT-5.5 OpenAI 55 $5.00 / $30.00

GLM-5.2 Z.AI open 51 $1.40 / $4.40

DeepSeek V4 Pro DeepSeek open 44 $0.43 / $0.87

DeepSeek V4 Flash DeepSeek on helmcode 40 $0.14 / $0.28

Qwen3.6 35B Alibaba on helmcode 32 $0.25 / $1.49

Gemma 4 26B Google on helmcode 26 $0.13 / $0.40

GLM-5.2 leads all 92 open-weight models at 51, within reach of Claude Opus 4.8 (56) and GPT-5.5 (55). The models Helmcode runs sit just behind: DeepSeek V4 Flash (40), Qwen3.6 35B (32) and Gemma 4 26B (26). DeepSeek V4 Flash scores 40 at $0.14 / $0.28 per million tokens, against Opus 4.8's $5.00 / $25.00, that's ~35× cheaper to read and ~90× cheaper to write, for 67% of the index leader's intelligence.

Source: artificialanalysis.ai · Intelligence Index v4.1 · June 2026 · composite of 9 evaluations · price = first-party API, per 1M tokens (input · output). Rows marked on helmcode, DeepSeek V4 Flash, Qwen3.6 35B and Gemma 4 26B, run on Helmcode. GLM-5.2 and DeepSeek V4 Pro are open-weight but not on the platform.

// proven in production

The production numbers.

On our own platform, near-enough all inference already runs on open models, and most of it on a single 35B one.

333.8B

Tokens in production

cumulative

76%

Run on Qwen 3.6 (35B)

the open workhorse

99.5%

Of tokens on open models

LLM traffic

See the live numbers on OpenData

// the lineup

What actually runs the 80%.

The three language models, ordered by real production token share. One open 35B model carries most of the load, the rest step in for reasoning, scale and multimodal.

qwen3.6 35B MoE · 256K ctx High-volume RAG, classification, code

deepseek-v4-flash 284B MoE · 1M ctx Reasoning, agents, long-context

gemma4 26B MoE · 256K ctx Efficient assistants, document work

Plus embeddings & reranking (qwen3-embedding, rerank) and speech (kokoro, whisper-large-v3), eight models on one API. Full model reference →

// models faq

Open models, answered.

The questions everyone asks before trusting open models in production.

Are open models actually good enough?

For the work enterprises run day to day, yes. On Artificial Analysis’ Intelligence Index, GLM-5.2 ranks #1 of 92 open-weight models, just under the closed frontier, and the model we run, DeepSeek V4 Flash, delivers around two-thirds of leader-level intelligence for cents per million tokens. In production, 99.5% of all tokens on Helmcode already flow through open models. The gap that remains is a narrow set of frontier tasks most teams never hit.

Which model should I use?

Start with Qwen 3.6, it carries three quarters of all production traffic and is the fastest, cheapest path for RAG, classification and code. Move to DeepSeek V4-Flash for hard reasoning, agents or 1M-token context. For image and audio input, Qwen 3.6 and Gemma 4 are both multimodal. Same API, just change the model id.

What about the 20% that genuinely needs GPT-5?

It exists, and it is more specific than most assume, frontier-only reasoning at the very edge of capability. Helmcode is honest about that boundary: we cover the 80% that runs your business, privately and at a flat rate, not the last mile of the leaderboard.

How current are these benchmarks?

Figures are published scores as of June 2026, open models served on Helmcode, closed-model numbers from vendor reports. Benchmarks move every release, so treat them as directional. What does not move is where your data is processed: always the EU, always zero logs.

Can I run a model that is not listed?

On Dedicated and On-premise plans, yes, custom or fine-tuned open-weight models on hardware reserved for you. The Shared cluster serves the curated lineup above.

// get started

START BURNING TOKENS

Skip the AI infra work. Deploy your first private inference endpoint today.

Flat rate. EU data. OpenAI API compatible.

book_a_call

Open models.Frontier results.