// use cases · copilots

Copilots for
your experts.

Ship in-app and internal assistants on open models, tool calling, reasoning and your own domain context.

// how it works

Assistants that act, not just chat.

Retrieval, tool calling and reasoning from a single OpenAI-compatible endpoint, so your copilot works on your data and only inside the EU.

step 01

Ground it in your domain

qwen3-embedding

Retrieve over your docs, code and tickets so the copilot answers from your reality, not from whatever a generic model happened to memorise.

step 02

Wire up your tools

deepseek-v4-flash

Native tool and function calling lets the copilot query your systems, run actions and chain steps, the same JSON schema you already use with OpenAI.

step 03

Reason and respond

qwen3.6

The model reasons over context and tool output, then streams its answer into your product, 2× throughput via speculative decoding, and zero logs.

// drop-in

Change one line. Keep your stack.

Point the OpenAI SDK, or the Vercel AI SDK, LangChain, your own chat loop, at Helmcode. Same tools, same streaming, private models in the EU.

read_the_docs

copilot.py

from openai import OpenAI

client = OpenAI(
    api_key="sk-...",
    base_url="https://api.helmcode.com/v1",  # one line changes
)

# the tools your copilot is allowed to call
tools = [{
    "type": "function",
    "function": {
        "name": "search_tickets",
        "description": "Search the customer's support history",
        "parameters": schema,
    },
}]

reply = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=messages,
    tools=tools,            # native tool calling
    stream=True,
)

// why helmcode

Copilots your experts can trust.

The assistant sees your most sensitive work, code, cases, customer data. That's exactly what closed APIs ask you to send away.

Zero logs, by architecture.

What your experts ask and what the copilot reads is never stored, and never trains a model, not ours, not anyone's.

Runs in the EU.

Every copilot turn is processed only on EU infrastructure, not on US hyperscalers subject to the Cloud Act. GDPR and AI Act native.

Tool calling & reasoning.

Flagship open models with native function calling and reasoning, everything a real copilot needs to act, not just chat.

No caps on copilot turns.

Every message, retry and agent loop is included. Limits are RPM and concurrency per key, never total tokens, so a heavy user isn't a surprise bill.

Open models, no lock-in.

DeepSeek V4-Flash, Qwen 3.6, Gemma 4. No vendor can deprecate the model under your copilot or change pricing on you overnight.

Drops into your app.

Change the base URL and key. The Vercel AI SDK, LangChain, LlamaIndex and your own chat code keep working, streaming and tools included.

In production across

B2B SaaS
Insurance
Healthcare
HR & recruiting
Energy & utilities
Education
Manufacturing

In production at

// copilots faq

Copilots, answered.

What product and engineering teams ask before building assistants on their own data.

Which models are best for building copilots?

deepseek-v4-flash for flagship reasoning and tool calling, qwen3.6 (35B MoE) for 2× throughput via speculative decoding, and gemma4 when you need vision. All share one OpenAI-compatible API.

Do you support tool / function calling?

Yes, native function calling with the same JSON schema you already use with OpenAI, plus streaming. Your copilot can query systems and take actions, not just answer.

Can the copilot use our internal data?

Yes. Pair it with retrieval (qwen3-embedding + rerank) to ground answers in your docs, code and tickets, no fine-tuning required, and your data stays in the EU.

Do you store our prompts or completions?

No. Zero logs, what your experts type and what the copilot reads is never persisted and never trains a model.

Does it work with the Vercel AI SDK or LangChain?

Yes. Point any OpenAI-compatible client at our base URL with your API key. The Vercel AI SDK, LangChain, LlamaIndex and custom code work unchanged.

Can we run a copilot fully on-premise?

Yes. For strict compliance, run on a dedicated GPU or on-premise inside your own datacenter, the same API and code, with data that never leaves your network.

// get started

START BURNING TOKENS

Skip the AI infra work. Deploy your first private inference endpoint today.

Flat rate. EU data. OpenAI API compatible.

book_a_call

Copilots foryour experts.