// use cases · semantic search

Semantic search over
your own data.

Multilingual embeddings and cross-lingual reranking. Search, ranking and recommendation across catalogs, repos and archives.

book_a_call

// how it works

Search that understands meaning, not keywords.

Embeddings and reranking from a single OpenAI-compatible endpoint, multilingual out of the box, and private by default.

step 01

Embed your corpus

qwen3-embedding

Turn your catalog, repos and archives into 4096-dimension vectors across 100+ languages (MMTEB score: 70.58). Re-embed as often as you want; there is no cap on total tokens.
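A minimal sketch of embedding a corpus in batches, assuming `client` is an OpenAI SDK client already pointed at the endpoint. The helper names `batched` and `embed_corpus` and the default batch size of 64 are illustrative assumptions, not documented limits.

```python
from typing import Any, Iterator, List, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def embed_corpus(client: Any, documents: Sequence[str], batch_size: int = 64) -> List[List[float]]:
    """Embed every document, one request per batch, preserving input order.

    `client` is expected to behave like an OpenAI SDK client.
    batch_size=64 is an illustrative default, not a documented limit.
    """
    vectors: List[List[float]] = []
    for batch in batched(documents, batch_size):
        response = client.embeddings.create(model="qwen3-embedding", input=batch)
        # Sort by `index` so vectors line up with the order of the batch.
        for item in sorted(response.data, key=lambda d: d.index):
            vectors.append(list(item.embedding))
    return vectors
```

The returned list has one vector per document, in the same order, ready to be written to whichever vector store you already run.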

step 02

Search by meaning

qwen3-embedding

Embed the query and pull nearest neighbors from your own vector store: pgvector, Qdrant, Pinecone or Weaviate. Results match on meaning and intent, not on strings.
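To make "nearest neighbors" concrete, here is a small in-memory sketch of cosine-similarity search. It stands in for what a vector store does at scale; the function names are hypothetical, and a real deployment would delegate this step to the store's own index.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    k: int = 10,
) -> List[Tuple[int, float]]:
    """Return (document index, similarity) pairs for the k most similar documents."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

This brute-force scan is linear in corpus size, which is fine for a demo and is exactly the work an approximate index in your vector store avoids.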

step 03

Rerank for precision

rerank

Sharpen the top candidates with our cross-lingual reranker, so the best match comes first, even when the query and the document are in different languages.
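A hedged sketch of applying reranker output to your candidates. The response shape is an assumption: each result is taken to be a dict with an `index` into the candidate list and a `relevance_score`, a layout common among rerank APIs. Check the docs for the actual field names before relying on it; `apply_rerank` is a hypothetical helper.

```python
from typing import Any, Dict, List, Sequence, Tuple


def apply_rerank(
    candidates: Sequence[str],
    results: Sequence[Dict[str, Any]],
) -> List[Tuple[str, float]]:
    """Reorder candidates by reranker score, best first.

    Assumes each result looks like {"index": int, "relevance_score": float},
    where `index` points back into `candidates`. This shape is an assumption.
    """
    ordered = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return [(candidates[r["index"]], float(r["relevance_score"])) for r in ordered]
```

Sorting on the client side keeps the sketch correct whether or not the endpoint already returns results in score order.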

// drop-in

Change one line. Keep your stack.

Point the OpenAI SDK, or your own search pipeline, at Helmcode. Same calls, same shapes, private multilingual models on EU infrastructure.

read_the_docs

search.py

from openai import OpenAI

client = OpenAI(
    api_key="sk-...",
    base_url="https://api.helmcode.com/v1",  # one line changes
)

# 1 · embed your catalog: 4096-dim multilingual vectors
# `documents` is your own list of texts to index
catalog = client.embeddings.create(
    model="qwen3-embedding",
    input=documents,
)

# 2 · retrieve from your vector store, then rerank for precision
# `query` is the search string; `candidates` are the texts your vector store returned
ranked = client.post(
    "/rerank",
    cast_to=dict,
    body={"model": "rerank", "query": query, "documents": candidates, "top_n": 5},
)

// why helmcode

Relevance you don't have to hand over.

Search and recommendation that stay private, multilingual and yours, without the bill scaling with your traffic.

Zero logs, by architecture.

Your queries and your catalog are never stored, and nothing you embed ever trains a model, not ours, not anyone's.

Vectors stay in the EU.

Embeddings and reranking run only on EU infrastructure, not on US hyperscalers subject to the CLOUD Act. GDPR and AI Act native.

Embeddings + rerank, one API.

Vector embeddings and cross-lingual reranking behind a single OpenAI-compatible endpoint. No two vendors to wire together.

Multilingual by default.

100+ languages and cross-lingual retrieval out of the box. Search in one language, match documents written in another.

Re-index without a bill.

Re-embed your whole catalog as often as you need. Limits are RPM and concurrency per key, never total tokens.
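Since the limits are per-key RPM and concurrency, a full re-index mostly needs a cap on requests in flight. The sketch below shows one way to do that with a thread pool; `reembed_all`, `embed_batch` and the default of 4 concurrent calls are hypothetical, so set the cap to whatever your key allows. It does not throttle requests per minute.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def reembed_all(
    batches: Sequence[List[str]],
    embed_batch: Callable[[List[str]], list],
    max_concurrency: int = 4,
) -> List[list]:
    """Run `embed_batch` over every batch with at most `max_concurrency` calls in flight.

    Results come back in the same order as `batches`.
    """
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # Executor.map preserves input order even when calls finish out of order.
        return list(pool.map(embed_batch, batches))
```

Here `embed_batch` would wrap a single `client.embeddings.create` call; keeping it as a parameter makes the concurrency logic easy to test without touching the network.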

Keep your vector DB.

Change the base URL and key. pgvector, Qdrant, Pinecone and your own search code keep working; we never lock you into a storage layer.

In production across

E-commerce & retail
Dev tools
Media & agencies
SaaS
Contact center
Healthcare
Pharma & biotech
HR & recruiting
Education
AI-native products

// search faq

Semantic search, answered.

What engineering teams ask before moving search and recommendation in-house.

Which embedding model do you use, and how good is it?

qwen3-embedding: 8B parameters, 4096 dimensions, 100+ languages, and a score of 70.58 on MMTEB. It's served from the same OpenAI-compatible API as the rest of the stack.

Do you store my queries or my catalog?

No. Zero logs: queries, documents and embeddings are never persisted, and nothing you send ever trains a model. Privacy is enforced by architecture, not by policy.

Can I keep my own vector database?

Yes. Helmcode produces the embeddings and reranking; you keep your vector store (pgvector, Qdrant, Pinecone, Weaviate…). There's no proprietary index to migrate to.

How does reranking improve results?

After vector retrieval returns candidates, the rerank model (Qwen3 Reranker) scores query–document pairs directly and reorders them, so the most relevant result is first, not just nearby in vector space.

Is it really cross-lingual?

Yes. Both embeddings and reranking are multilingual, so a query in one language matches relevant documents written in another, with no per-language index or translation step.

Can I use this for recommendation, not just search?

Yes. The same embeddings power similarity and recommendation: represent users, items or content as vectors and retrieve nearest neighbors, all on private EU infrastructure.
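One simple way to picture this, as a sketch rather than a prescribed method: represent a user as the mean of the vectors of items they interacted with, then recommend the unseen items closest to that profile. All names here are hypothetical, and averaging is only one of several ways to build a user vector.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, returning 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise average of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]


def recommend(
    item_vectors: Dict[str, Sequence[float]],
    seen_ids: Sequence[str],
    k: int = 5,
) -> List[str]:
    """Return up to k unseen item ids, most similar to the user's profile first."""
    seen = set(seen_ids)
    profile = mean_vector([item_vectors[item_id] for item_id in seen_ids])
    scored = [
        (item_id, cosine(profile, vec))
        for item_id, vec in item_vectors.items()
        if item_id not in seen
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [item_id for item_id, _ in scored[:k]]
```

In practice the item vectors would come from the embeddings endpoint and the nearest-neighbor lookup from your vector store; the in-memory scan is only there to keep the example self-contained.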

// get started

START BURNING TOKENS

Skip the AI infra work. Deploy your first private inference endpoint today.

Flat rate. EU data. OpenAI API compatible.