// changelog

Every shipped
change.

A running log of what we ship, new models, API surface, performance and platform. No marketing, just the diffs.

Jun 12, 2026 platform

Improved

2× throughput on Qwen 3.6

Speculative decoding is now on by default for qwen3.6, roughly double the tokens per second at the same latency, no change on your side.

Jun 5, 2026 api

NewAPI

A dedicated /v1/rerank endpoint for cross-lingual semantic reranking, the missing middle step of RAG (embedding → rerank → LLM).

May 28, 2026 console

Security

Every API key now shows a live attestation that no prompt or completion content is stored, something your compliance team can screenshot.

May 19, 2026 platform

New

Exclusive NVIDIA B200 hardware inside Helmcode's EU infrastructure, guaranteed throughput, full network isolation and custom models.

May 8, 2026 platform

Improved

Reworked model loading and routing in the control plane. Cold starts are noticeably quicker and p95 latency is down across the board.

Apr 30, 2026 api

NewAPI

Kokoro text-to-speech (sub-second latency, 67 voices) and Whisper Large v3 speech-to-text (99+ languages), both behind the same key.

Apr 22, 2026 api

Fixed

Fixed an edge case where streamed responses could truncate when a tool call and content were interleaved. Streaming is solid across all chat models.

That's everything so far, updated as we ship.