// deployment

Same stack.
Your level of sovereignty.

Run private inference on a shared EU cluster, on dedicated Blackwell hardware, or fully inside your own datacenter. Same API, same code — you choose how far the sovereignty goes.

// the spectrum

One stack, three levels of control.

The same managed inference stack, moved progressively closer to you — from a shared EU cluster to hardware that never leaves your building. You can start anywhere and move up.

// deployment models

Pick where it runs.

Three ways to deploy the exact same inference stack. The code is identical — what changes is who owns the hardware and where your data is processed.

Shared

Managed · shared EU cluster

Shared EU infrastructure.

The fastest way to get private inference into production. A managed EU cluster, zero logs and GDPR native — without provisioning a single GPU.

Best for
Small to mid teams shipping fast
Where it runs
Helmcode EU cluster
Isolation
Per-API-key rate limits
Setup
Minutes
SLA
99.5% on Growth
From
€399 / month

Dedicated

Exclusive hardware · Helmcode EU

Exclusive Blackwell hardware.

NVIDIA Blackwell hardware reserved for you inside Helmcode's EU infrastructure — guaranteed throughput, full network isolation and support for custom or fine-tuned models.

Best for
Heavy use, strict compliance
Where it runs
Helmcode EU · isolated
Isolation
Full network isolation
Setup
Days
SLA
Custom
From
Custom pricing

On-premise

Your datacenter · operated by us

Runs in your datacenter.

We deploy and operate the full inference stack inside your datacenter — or a partner facility. Your data never moves. Not a single token leaves your network.

Best for
Banking, defense, public sector
Where it runs
Your datacenter or ours
Isolation
Air-gappable
Setup
Weeks · turn-key
SLA
Custom
From
Custom pricing
Ready for banking, healthcare, defense and public sector.

// side by side

The differences that matter.

Everything is the same except sovereignty, hardware and SLA. Here is exactly where the three diverge.

Shared Dedicated On-premise
Where data is processed Helmcode EU cluster Helmcode EU · isolated Your datacenter
Hardware Shared GPUs B200, exclusive Your hardware or ours
Network isolation Logical (per key) Full network isolation Air-gappable
Custom / fine-tuned models Yes Yes
Setup time Minutes Days Weeks · turn-key
Uptime SLA 99.5% on Growth Custom Custom
Starting price €399 / month Custom Custom

// fully managed

Whoever owns the hardware, we run the stack.

Deployment changes where inference runs — never who keeps it alive. On all three models, Helmcode provisions, monitors and operates the full stack so your team never touches a GPU.

  • GPU provisioning & monitoring
  • vLLM installation & config
  • Model version management
  • Rate limiting & concurrency
  • Hardware upgrades
  • SLA management

// regulated sectors

Matched to your compliance.

Where your data can legally live decides how you deploy. A starting point for the most regulated industries we serve.

Banking & fintech

Dedicated / On-premise

DORA, GDPR and data residency with full network isolation.

Healthcare

Dedicated / On-premise

Patient data never leaves a controlled, auditable boundary.

Legal & legaltech

Shared / Dedicated

Privileged documents processed EU-only, with zero logs.

Public sector & defense

On-premise

Sovereign and air-gappable — no token leaves your network.

// migration

Moving up is a config change.

Because all three speak the same OpenAI-compatible API, graduating from shared to dedicated to on-premise never touches your application code.

  1. 01

    Start on Shared

    Get an API key, point your SDK at the Helmcode base URL, ship the same day.

  2. 02

    Graduate when you need to

    Tighter compliance or heavier load? Move to Dedicated or On-premise — we provision it.

  3. 03

    Swap the base URL

    Repoint base URL and key at the new deployment. Same models, same code, zero rewrite.

// deployment faq

Deployment, answered.

What teams ask before choosing where their inference runs.

Can I start on Shared and move to Dedicated or On-premise later?

Yes — that is the whole point. All three run the same stack behind the same OpenAI-compatible API. Moving up is a base URL and key change; your application code does not change.

Where exactly is the EU infrastructure?

Shared and Dedicated run on Helmcode infrastructure inside the EU — never on US hyperscalers subject to the Cloud Act. Inference is processed in-region with zero logs, so it is GDPR and AI Act native by architecture.

What hardware does Dedicated use?

NVIDIA B200 — 192GB VRAM, 256GB DDR5 — reserved exclusively for you. We provision, monitor and upgrade it; you never touch a GPU.

Who operates an On-premise deployment?

We do. Helmcode deploys and runs the full inference stack inside your datacenter or a partner facility — turn-key. You keep the data and the network; we keep the GPUs, vLLM and models healthy.

Can On-premise be air-gapped?

Yes. For the strictest environments the deployment can run fully isolated, with no outbound connectivity — not a single token leaves your network.

Is my data ever used to train a model?

Never, on any deployment model. Zero logs is a property of the architecture: prompts and completions are not stored, and nothing you send trains a model.

// get started

START BURNING TOKENS

Skip the AI infra work. Deploy your first private inference endpoint today.

Flat rate. EU data. OpenAI API compatible.