// pricing

Pricing

Pay per API key, not per token. Unlimited tokens on every plan — no lock-in, cancel anytime.

Starter

€399/mo

For small teams starting to integrate AI into their workflows.


  • 5 API Keys
  • Unlimited tokens · open models
  • 2.5B tokens/month · SOTA models
  • OpenAI API compatible
  • Zero logs · EU data
  • Email support

Scale

€3,199/mo

For organizations with intensive AI use and advanced needs.


  • 40 API Keys
  • Unlimited tokens · open models
  • 20B tokens/month · SOTA models
  • OpenAI API compatible
  • Zero logs · EU data
  • Priority support
  • SLA 99.9%
  • Early access to new models

Enterprise

Custom

For organizations needing dedicated GPUs and custom configuration.


  • +60 API Keys
  • Unlimited tokens · open models
  • Custom cap · SOTA models
  • Dedicated GPUs
  • Custom models
  • Zero logs · EU data
  • Custom SLA
  • Dedicated onboarding

All plans include RPM limits and concurrency per API Key to guarantee service quality.

dedicated infrastructure

Your own inference stack,
in your datacenter.

If your use case requires total data sovereignty, we deploy and operate the full inference stack inside your own infrastructure. Your models, your data, your prompts — they never leave your network.

talk_to_us →
On-premise deployment We install and operate the full stack on your servers, with the same OpenAI-compatible API.
Hardware advisory We help you select the right GPUs, memory and network for your use case and budget.
Total data sovereignty Your data and prompts never leave your network. Built for banking, healthcare and defense.
In production at

// pricing faq

Pricing, explained.

Everything about plans, limits and billing — before you ask.

Do I pay per API key or per token?

Per API key — a flat monthly price. Tokens are unlimited on open models, with no per-token charges and no usage surprises. Your CFO gets a fixed line on the P&L.

What's the difference between "unlimited tokens" and the monthly SOTA cap?

Open models (Qwen, Gemma, DeepSeek…) are unlimited on every plan. The monthly cap only applies to frontier/SOTA models, where compute is more expensive. We always reach out before any overage — never a surprise bill.

Is there any commitment or lock-in?

No. Plans are month-to-month and you can cancel anytime. You run on open-weight models you can always access — no vendor can deprecate your API or change pricing on you overnight.

Can I change plans later?

Yes. Upgrade or downgrade at any time and changes are prorated. As your usage grows you simply move up a tier — the API and your code stay exactly the same.

How do rate limits work?

Limits apply per API key as requests-per-minute and concurrency, to guarantee service quality — not on how many tokens you process. A single key can handle hundreds of millions of tokens a month.

Do you offer dedicated GPUs or on-premise?

Yes, on Enterprise: dedicated NVIDIA Blackwell hardware, custom and fine-tuned models, and full on-premise deployment inside your own datacenter. Talk to us for a custom quote.

// get started

START BURNING TOKENS

Skip the AI infra work. Deploy your first private inference endpoint today.

Flat rate. EU data. OpenAI API compatible.