// deployment

Same stack.
Your level of sovereignty.

Run private inference on a shared EU cluster, on dedicated Blackwell hardware, or fully inside your own datacenter. Same API, same code, you choose how far the sovereignty goes.

talk_to_us

// the spectrum

One stack, three levels of control.

Based on what you need, you decide whether we take care of the infrastructure or it all runs on yours.

// deployment models

Pick where it runs.

Three ways to deploy the exact same inference stack. The code is identical, what changes is who owns the hardware and where your data is processed.

Shared

Managed · shared EU cluster

Shared EU infrastructure.

The fastest way to get private inference into production. A managed EU cluster, zero logs and GDPR native, without provisioning a single GPU.

Best for: Small to mid teams shipping fast
Where it runs: Helmcode EU cluster
Isolation: Per-API-key rate limits
Setup: Immediate
SLA: 99.5% on Growth
From: €399 / month

Dedicated

Exclusive hardware · Helmcode EU

Exclusive Blackwell hardware.

NVIDIA Blackwell hardware reserved for you inside Helmcode's EU infrastructure, guaranteed throughput, full network isolation and support for custom or fine-tuned models.

Best for: Heavy use, strict compliance
Where it runs: Helmcode EU · isolated
Isolation: Full network isolation
Setup: Immediate
SLA: Custom
From: Custom pricing

On-premise

Your datacenter · operated by us

Runs in your datacenter.

We deploy and operate the full inference stack inside your datacenter, or a partner facility. Your data never moves. Not a single token leaves your network.

Best for: Banking, defense, public sector
Where it runs: Your datacenter or ours
Isolation: Air-gappable
Setup: Weeks · turn-key
SLA: Custom
From: Custom pricing

Ready for banking, healthcare, defense and public sector.

// side by side

The differences that matter.

Everything is the same except sovereignty, hardware and SLA. Here is exactly where the three diverge.

	Shared	Dedicated	On-premise
Where data is processed	Helmcode EU cluster	Helmcode EU · isolated	Your datacenter · operated by us
Hardware	Shared GPUs	B200, exclusive	Your hardware or ours
Network isolation	Logical (per key)	Full network isolation	Air-gappable
Custom / fine-tuned models	No	Yes	Yes
Setup time	Immediate	Immediate	Weeks · turn-key
Uptime SLA	99.5% on Growth	Custom	Custom
Starting price	€399 / month	Custom	Custom

// fully managed

Whoever owns the hardware, we run the stack.

Deployment changes where inference runs, never who keeps it alive. On all three models, Helmcode provisions, monitors and operates the full stack so your team never touches a GPU.

GPU provisioning & monitoring
vLLM installation & config
Model version management
Rate limiting & concurrency
Hardware upgrades
SLA management

// regulated sectors

Matched to your compliance.

Where your data can legally live decides how you deploy. A starting point for the most regulated industries we serve.

Banking & fintech

Dedicated / On-premise

DORA, GDPR and data residency with full network isolation.

Healthcare

Dedicated / On-premise

Patient data never leaves a controlled, auditable boundary.

Legal & legaltech

Shared / Dedicated

Privileged documents processed EU-only, with zero logs.

Public sector & defense

On-premise

Sovereign and air-gappable, no token leaves your network.

// migration

Moving up is a config change.

Because all three speak the same OpenAI-compatible API, graduating from shared to dedicated to on-premise never touches your application code.

01

Start on Shared

Get an API key, point your SDK at the Helmcode base URL, ship the same day.
02

Graduate when you need to

Tighter compliance or heavier load? Move to Dedicated or On-premise, we provision it.
03

Swap the base URL

Repoint base URL and key at the new deployment. Same models, same code, zero rewrite.

// deployment faq

Deployment, answered.

What teams ask before choosing where their inference runs.

Can I start on Shared and move to Dedicated or On-premise later?

Yes, that is the whole point. All three run the same stack behind the same OpenAI-compatible API. Moving up is a base URL and key change; your application code does not change.

Where exactly is the EU infrastructure?

Shared and Dedicated run on Helmcode infrastructure inside the EU, never on US hyperscalers subject to the Cloud Act. Inference is processed in-region with zero logs, so it is GDPR and AI Act native by architecture.

What hardware does Dedicated use?

NVIDIA B200, 192GB VRAM, 256GB DDR5, reserved exclusively for you. We provision, monitor and upgrade it; you never touch a GPU.

Who operates an On-premise deployment?

We do. Helmcode deploys and runs the full inference stack inside your datacenter or a partner facility, turn-key. You keep the data and the network; we keep the GPUs, vLLM and models healthy.

Can On-premise be air-gapped?

Yes. For the strictest environments the deployment can run fully isolated, with no outbound connectivity, not a single token leaves your network.

Is my data ever used to train a model?

Never, on any deployment model. Zero logs is a property of the architecture: prompts and completions are not stored, and nothing you send trains a model.

// get started

START BURNING TOKENS

Skip the AI infra work. Deploy your first private inference endpoint today.

Flat rate. EU data. OpenAI API compatible.

book_a_call

Same stack.Your level of sovereignty.