step 01
Ground it in your domain
qwen3-embedding Retrieve over your docs, code and tickets so the copilot answers from your reality — not from whatever a generic model happened to memorise.
// use cases · copilots
Ship in-app and internal assistants on open models, tool calling, reasoning and your own domain context.
// how it works
Retrieval, tool calling and reasoning from a single OpenAI-compatible endpoint — so your copilot works on your data and only inside the EU.
step 01
qwen3-embedding Retrieve over your docs, code and tickets so the copilot answers from your reality — not from whatever a generic model happened to memorise.
step 02
deepseek-v4-flash Native tool and function calling lets the copilot query your systems, run actions and chain steps — the same JSON schema you already use with OpenAI.
step 03
qwen3.6 The model reasons over context and tool output, then streams its answer into your product — 2× throughput via speculative decoding, and zero logs.
// drop-in
Point the OpenAI SDK — or the Vercel AI SDK, LangChain, your own chat loop — at Helmcode. Same tools, same streaming, private models in the EU.
read_the_docsfrom openai import OpenAI client = OpenAI( api_key="sk-...", base_url="https://api.helmcode.com/v1", # one line changes ) # the tools your copilot is allowed to call tools = [{ "type": "function", "function": { "name": "search_tickets", "description": "Search the customer's support history", "parameters": schema, }, }] reply = client.chat.completions.create( model="deepseek-v4-flash", messages=messages, tools=tools, # native tool calling stream=True, )
// why helmcode
The assistant sees your most sensitive work — code, cases, customer data. That's exactly what closed APIs ask you to send away.
What your experts ask and what the copilot reads is never stored, and never trains a model — not ours, not anyone's.
Every copilot turn is processed only on EU infrastructure — not on US hyperscalers subject to the Cloud Act. GDPR and AI Act native.
Flagship open models with native function calling and reasoning — everything a real copilot needs to act, not just chat.
Every message, retry and agent loop is included. Limits are RPM and concurrency per key — never total tokens, so a heavy user isn't a surprise bill.
DeepSeek V4-Flash, Qwen 3.6, Gemma 4. No vendor can deprecate the model under your copilot or change pricing on you overnight.
Change the base URL and key. The Vercel AI SDK, LangChain, LlamaIndex and your own chat code keep working — streaming and tools included.
// copilots faq
What product and engineering teams ask before building assistants on their own data.
deepseek-v4-flash for flagship reasoning and tool calling, qwen3.6 (35B MoE) for 2× throughput via speculative decoding, and gemma4 when you need vision. All share one OpenAI-compatible API.
Yes — native function calling with the same JSON schema you already use with OpenAI, plus streaming. Your copilot can query systems and take actions, not just answer.
Yes. Pair it with retrieval (qwen3-embedding + rerank) to ground answers in your docs, code and tickets — no fine-tuning required, and your data stays in the EU.
No. Zero logs — what your experts type and what the copilot reads is never persisted and never trains a model.
Yes. Point any OpenAI-compatible client at our base URL with your API key. The Vercel AI SDK, LangChain, LlamaIndex and custom code work unchanged.
Yes. For strict compliance, run on a dedicated GPU or on-premise inside your own datacenter — the same API and code, with data that never leaves your network.
// get started
Skip the AI infra work. Deploy your first private inference endpoint today.
Flat rate. EU data. OpenAI API compatible.
// cookies
We use strictly necessary cookies to run the site and, only with your consent, Google Analytics to understand usage. No advertising, ever — see our Cookie Policy.
// preferences