step 01
Transcribe
whisper-large-v3 Turn calls and audio into text — 99+ languages, 3.2% WER on Spanish, up to 25MB per file. Recordings are processed only on EU infrastructure.
// use cases · voice
Speech-to-text, voicebots and text-to-speech from one provider on EU infrastructure, 99+ languages, sub-second synthesis.
// how it works
Transcription, an LLM and speech synthesis from a single OpenAI-compatible endpoint — so the audio takes one short trip, and only inside the EU.
step 01
whisper-large-v3 Turn calls and audio into text — 99+ languages, 3.2% WER on Spanish, up to 25MB per file. Recordings are processed only on EU infrastructure.
step 02
deepseek-v4-flash Summarize, route, answer or drive a voicebot with an LLM over the transcript — tool calling included, so the conversation actually does something.
step 03
kokoro Synthesize natural speech in under a second — 67 voices, Spanish included — for real-time voicebots, IVR and accessibility.
// drop-in
The OpenAI audio endpoints — transcriptions and speech — work as-is. Change the base URL and key and your existing voice code runs on private EU models.
read_the_docsfrom openai import OpenAI client = OpenAI( api_key="sk-...", base_url="https://api.helmcode.com/v1", # one line changes ) # 1 · transcribe a call — 99+ languages, stays in the EU text = client.audio.transcriptions.create( model="whisper-large-v3", file=open("call.mp3", "rb"), ) # 2 · synthesize the reply — sub-second, 67 voices speech = client.audio.speech.create( model="kokoro", voice="alba", input=reply, )
// why helmcode
Recordings are the most sensitive data you hold — full of PII, and a regulator's favourite. Voice on Helmcode keeps all of it in the EU.
Calls, transcripts and synthesized audio are never stored, and never train a model. The PII inside a recording stays your problem to no one.
Speech-to-text, the LLM and text-to-speech all run on EU infrastructure — not on US hyperscalers subject to the Cloud Act. GDPR and AI Act native.
The full voice stack — transcription, reasoning and synthesis — behind a single OpenAI-compatible endpoint. One vendor, one bill, one network hop.
Sub-second synthesis and fast transcription on dedicated GPUs — low enough latency for live voicebots and IVR, not just batch jobs.
Every minute of audio in and out is included. Limits are RPM and concurrency per key — never total tokens, so high call volume isn't a surprise bill.
// voice faq
What CX, operations and engineering teams ask before moving voice in-house.
whisper-large-v3 for transcription (99+ languages, 3.2% WER on Spanish, up to 25MB / ~2 min per file) and kokoro for text-to-speech (82M parameters, sub-second latency, 67 voices including Spanish).
No. Zero logs — audio, transcripts and synthesized speech are never persisted and never train a model. Transcribing recordings stops being a privacy problem.
Yes. On dedicated GPUs, kokoro synthesizes in under a second and transcription runs with low latency — enough for real-time voicebots and IVR, not just batch transcription.
Yes, from one provider. Transcribe with whisper-large-v3, reason and respond with an LLM (deepseek-v4-flash, including tool calling), then speak with kokoro — all behind one OpenAI-compatible API.
Yes. The audio.transcriptions and audio.speech endpoints are OpenAI-compatible — change the base URL and key and your existing code works.
Run on a dedicated GPU or fully on-premise inside your own datacenter — the same API and code, with audio that never leaves your network. Built for contact centers, healthcare and the public sector.
// get started
Skip the AI infra work. Deploy your first private inference endpoint today.
Flat rate. EU data. OpenAI API compatible.
// cookies
We use strictly necessary cookies to run the site and, only with your consent, Google Analytics to understand usage. No advertising, ever — see our Cookie Policy.
// preferences