Embed your corpus
// use cases · semantic search
Multilingual embeddings and cross-lingual reranking. Search, ranking and recommendation across catalogs, repos and archives.
// how it works
Embeddings and reranking from a single OpenAI-compatible endpoint — multilingual out of the box, and private by default.
step 01
qwen3-embedding Turn your catalog, repos and archives into 4096-dimension vectors — 100+ languages, MMTEB 70.58. Re-embed as often as you want; tokens are unlimited.
step 02
qwen3-embedding Embed the query and pull nearest neighbors from your own vector store — pgvector, Qdrant, Pinecone, Weaviate. Meaning and intent, not string matching.
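To make the retrieval step concrete, here is a minimal sketch of nearest-neighbor lookup by cosine similarity in plain Python. It stands in for what your vector store does at scale; the function names `cosine` and `top_k` and the toy 2-dimension vectors are illustrative assumptions, not part of the Helmcode API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return the k (doc_id, score) pairs most similar to query_vec, best first.

    doc_vecs is a dict mapping a document id to its embedding vector.
    """
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data: real vectors would come from the embeddings endpoint.
docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
nearest = top_k([1.0, 0.1], docs, k=2)
```

In production the same ranking is delegated to pgvector, Qdrant, Pinecone or Weaviate, which use approximate indexes rather than a full scan.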
step 03
rerank Sharpen the top candidates with our cross-lingual reranker, so the best match comes first — even when the query and the document are in different languages.
// drop-in
Point the OpenAI SDK — or your own search pipeline — at Helmcode. Same calls, same shapes, private multilingual models on EU infrastructure.
from openai import OpenAI

client = OpenAI(
    api_key="sk-...",
    base_url="https://api.helmcode.com/v1",  # one line changes
)

# 1 · embed your catalog: 4096-dim multilingual vectors
catalog = client.embeddings.create(
    model="qwen3-embedding",
    input=documents,
)

# 2 · retrieve from your vector store, then rerank for precision
ranked = client.post(
    "/rerank",
    cast_to=dict,
    body={"model": "rerank", "query": query, "documents": candidates, "top_n": 5},
)
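Once the rerank call returns, the scores still have to be mapped back onto your candidates. The sketch below assumes a response shaped like `{"results": [{"index": ..., "relevance_score": ...}]}`, a convention used by several rerank APIs; this page does not document the actual response format, so treat the field names and the helper `apply_rerank` as hypothetical and check them against the docs.

```python
def apply_rerank(candidates, response, min_score=None):
    """Reorder candidates by rerank score, best first.

    ASSUMPTION: response looks like
    {"results": [{"index": int, "relevance_score": float}, ...]},
    where index points into the candidates list that was sent.
    """
    results = sorted(
        response.get("results", []),
        key=lambda item: item["relevance_score"],
        reverse=True,
    )
    ordered = []
    for item in results:
        idx = item["index"]
        score = item["relevance_score"]
        if not 0 <= idx < len(candidates):
            continue  # ignore indexes that do not match what we sent
        if min_score is not None and score < min_score:
            continue  # optional cut-off for weak matches
        ordered.append((candidates[idx], score))
    return ordered


# Toy response in the assumed shape: the Spanish document wins for an English query.
cands = ["red shoes", "zapatos rojos", "blue hat"]
resp = {
    "results": [
        {"index": 2, "relevance_score": 0.1},
        {"index": 1, "relevance_score": 0.9},
        {"index": 0, "relevance_score": 0.8},
    ]
}
best_first = apply_rerank(cands, resp)
```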
// why helmcode
Search and recommendation that stay private, multilingual and yours — without the bill scaling with your traffic.
Your queries and your catalog are never stored, and nothing you embed ever trains a model — not ours, not anyone's.
Embeddings and reranking run only on EU infrastructure, not on US hyperscalers subject to the CLOUD Act. GDPR and AI Act native.
Vector embeddings and cross-lingual reranking behind a single OpenAI-compatible endpoint. No two vendors to wire together.
100+ languages and cross-lingual retrieval out of the box. Search in one language, match documents written in another.
Re-embed your whole catalog as often as you need. Limits are RPM and concurrency per key — never total tokens.
Change the base URL and key. pgvector, Qdrant, Pinecone and your own search code keep working — we never lock you into a storage layer.
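Because limits are per-key RPM and concurrency rather than tokens, a full re-embed is mostly a matter of batching and capping parallel requests. Here is a minimal sketch, assuming a caller-supplied `embed_batch` function; the batch size of 64 and concurrency of 4 are placeholder values, not Helmcode's actual limits.

```python
from concurrent.futures import ThreadPoolExecutor


def batches(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_all(texts, embed_batch, batch_size=64, max_concurrency=4):
    """Embed every text, never running more than max_concurrency requests at once.

    embed_batch is a hypothetical callable: list[str] -> list[vector].
    Output order matches input order because pool.map preserves it.
    """
    chunks = list(batches(texts, batch_size))
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        results = pool.map(embed_batch, chunks)
        return [vec for chunk_vecs in results for vec in chunk_vecs]


# Stand-in for a real API call so the sketch runs offline.
def fake_embed(chunk):
    return [[float(len(text))] for text in chunk]


vectors = embed_all(["a", "bb", "ccc", "dddd", "eeeee"], fake_embed, batch_size=2, max_concurrency=2)
```

With the OpenAI SDK, `embed_batch` could wrap `client.embeddings.create(model="qwen3-embedding", input=chunk)` and return `[item.embedding for item in response.data]`.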
// search faq
What engineering teams ask before moving search and recommendation in-house.
qwen3-embedding — 8B parameters, 4096 dimensions, 100+ languages, scoring 70.58 on MMTEB. It's served from the same OpenAI-compatible API as the rest of the stack.
No. Zero logs — queries, documents and embeddings are never persisted, and nothing you send ever trains a model. Privacy is enforced by architecture, not by policy.
Yes. Helmcode produces the embeddings and reranking — you keep your vector store (pgvector, Qdrant, Pinecone, Weaviate…). There's no proprietary index to migrate to.
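As one example of keeping your own store, a pgvector lookup can be sketched as a parameterized query ordered by the `<=>` cosine-distance operator. The table name `documents` and the columns `id`, `content` and `embedding` are hypothetical, and the `%s` placeholders assume a psycopg-style driver. pgvector's index types have dimension limits, so check its documentation before indexing 4096-dimension vectors.

```python
def to_pgvector_literal(vec):
    """Format a vector in pgvector's text form, e.g. [0.5,0.25]."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


# Hypothetical schema: documents(id, content, embedding vector(4096)).
NEAREST_SQL = (
    "SELECT id, content FROM documents "
    "ORDER BY embedding <=> %s::vector LIMIT %s"
)


def nearest_query(query_vec, k=20):
    """Return (sql, params) ready for cursor.execute(sql, params)."""
    return NEAREST_SQL, (to_pgvector_literal(query_vec), k)


sql, params = nearest_query([0.5, 0.5], k=5)
```

The returned rows are the candidates you would then pass to the rerank model.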
After vector retrieval returns candidates, the rerank model (Qwen3 Reranker) scores query–document pairs directly and reorders them — so the most relevant result is first, not just nearby in vector space.
Yes. Both embeddings and reranking are multilingual, so a query in one language matches relevant documents written in another — no per-language index or translation step.
Yes. The same embeddings power similarity and recommendation — represent users, items or content as vectors and retrieve nearest neighbors, all on private EU infrastructure.
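For recommendation, one simple baseline is to average the vectors of items a user liked and retrieve the nearest unseen items. The page does not prescribe this method; mean pooling, the helper names and the toy 2-dimension vectors below are assumptions chosen for illustration.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    count = len(vectors)
    return [sum(vec[i] for vec in vectors) / count for i in range(len(vectors[0]))]


def cosine(a, b):
    """Cosine similarity (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(liked_ids, item_vecs, k=3):
    """Rank items the user has not liked yet by similarity to their mean profile."""
    liked = set(liked_ids)
    profile = mean_vector([item_vecs[item_id] for item_id in liked_ids])
    scored = [
        (item_id, cosine(profile, vec))
        for item_id, vec in item_vecs.items()
        if item_id not in liked
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [item_id for item_id, _ in scored[:k]]


# Toy item embeddings: real ones would come from qwen3-embedding.
items = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.8, 0.3]}
picks = recommend(["a"], items, k=2)
```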
// get started
Skip the AI infra work. Deploy your first private inference endpoint today.
Flat rate. EU data. OpenAI API compatible.