// changelog

Every shipped
change.

A running log of what we ship — new models, API surface, performance and platform. No marketing, just the diffs.

platform
NewModel

MiMo V2.5 is now available

Full multimodal input — image, audio and text in, text out — in a single model, behind the same OpenAI-compatible API.

  • Call it with model id mimo-v2.5
  • 310B MoE · 1M context · vision + audio
platform
Improved

2× throughput on Qwen 3.6

Speculative decoding is now on by default for qwen3.6 — roughly double the tokens per second at the same latency, no change on your side.

api
NewAPI

Reranking endpoint

A dedicated /v1/rerank endpoint for cross-lingual semantic reranking — the missing middle step of RAG (embedding → rerank → LLM).

  • Powered by Qwen3-Reranker-8B
  • 100+ languages
console
Security

Zero-logs attestation in the console

Every API key now shows a live attestation that no prompt or completion content is stored — something your compliance team can screenshot.

platform
New

Dedicated GPU plans

Exclusive NVIDIA B200 hardware inside Helmcode's EU infrastructure — guaranteed throughput, full network isolation and custom models.

  • Custom models & fine-tuning
  • Custom SLA
platform
Improved

Faster cold starts, lower p95

Reworked model loading and routing in the control plane. Cold starts are noticeably quicker and p95 latency is down across the board.

api
NewAPI

Speech: TTS and STT

Kokoro text-to-speech (sub-second latency, 67 voices) and Whisper Large v3 speech-to-text (99+ languages) — both behind the same key.

  • /v1/audio/speech and /v1/audio/transcriptions
api
Fixed

Streaming with tool calls

Fixed an edge case where streamed responses could truncate when a tool call and content were interleaved. Streaming is solid across all chat models.

That's everything so far — updated as we ship.