// pricing

Pricing

Pay per API key, not per token. Unlimited tokens on every plan, no lock-in, cancel anytime.

Starter

€399/mo

For small teams starting to integrate AI into their workflows.

5 API Keys
Unlimited tokens · open models
2.5B tokens/month · SOTA models
OpenAI API compatible
Zero logs · EU data
Email support

Your own inference stack,
in your datacenter.

If your use case requires total data sovereignty, we deploy and operate the full inference stack inside your own infrastructure. Your models, your data, your prompts, they never leave your network.

talk_to_us →

On-premise deployment We install and operate the full stack on your servers, with the same OpenAI-compatible API.

Hardware advisory We help you select the right GPUs, memory and network for your use case and budget.

Total data sovereignty Your data and prompts never leave your network. Built for banking, healthcare and defense.

In production at

// pricing faq

Pricing, explained.

Everything about plans, limits and billing, before you ask.

Do I pay per API key or per token?

Per API key, a flat monthly price. Tokens are unlimited on open models, with no per-token charges and no usage surprises. Your CFO gets a fixed line on the P&L.

What's the difference between "unlimited tokens" and the monthly SOTA cap?

Open models (Qwen, Gemma, DeepSeek…) are unlimited on every plan. The monthly cap only applies to frontier/SOTA models, where compute is more expensive. We always reach out before any overage, never a surprise bill.

Is there any commitment or lock-in?

No. Plans are month-to-month and you can cancel anytime. You run on open-weight models you can always access, no vendor can deprecate your API or change pricing on you overnight.

Can I change plans later?

Yes. Upgrade or downgrade at any time and changes are prorated. As your usage grows you simply move up a tier, the API and your code stay exactly the same.

How do rate limits work?

Limits apply per API key as requests-per-minute and concurrency, to guarantee service quality, not on how many tokens you process. A single key can handle hundreds of millions of tokens a month.

Do you offer dedicated GPUs or on-premise?

Yes, on Enterprise: dedicated NVIDIA Blackwell hardware, custom and fine-tuned models, and full on-premise deployment inside your own datacenter. Talk to us for a custom quote.

// get started

START BURNING TOKENS

Skip the AI infra work. Deploy your first private inference endpoint today.

Flat rate. EU data. OpenAI API compatible.

book_a_call

Pricing

Your own inference stack,in your datacenter.

Pricing, explained.

START BURNING TOKENS

Your own inference stack,
in your datacenter.