Sovereign AI

Host a private LLM in Europe

Run Mistral or Llama on European-hosted infrastructure, without sending your data to a US API. Managed GPU inference, GDPR-compliant, and re-internalisable.

Updated June 2026

OpenAIAnthropicUS APIself-hostyour data leaveshosted in Europe

Calling OpenAI's or Anthropic's API means sending every prompt and every response to servers under US jurisdiction. For many use cases (HR documents, contracts, customer data, proprietary code), that's a non-starter.

Hosting an open source model in Europe fixes the problem at the root: the data never leaves the continent, and you keep control of the infrastructure. Bunker provides the GPU and managed inference to do it without standing up an MLOps team.

Why host instead of consuming an API

  • Data stays in Europe. Your prompts don't pass through a third-party provider outside the EU. That's the difference that unblocks GDPR cases and sensitive data.
  • No leakage into someone else's training. A model you host sends nothing to anyone. What you give it stays with you.
  • Predictable cost at volume. An API bills per token. Once usage gets serious, a dedicated GPU often costs less than a bill that grows with traffic.
  • Re-internalisable. The infrastructure is open source. You can move it onto your own hardware whenever you want, taking your data with you.

Which model to choose

Open-weight models now cover most needs. The right choice mostly comes down to size, because size decides the GPU.

  • 7 to 9 billion parameters (Mistral 7B, Llama 3 8B): fast, enough for classification, summarization, extraction, an internal chatbot. Run on a single consumer GPU.
  • 12 to 34 billion: better reasoning, a good quality/cost balance for RAG or business assistance.
  • 70 billion and up: quality close to large proprietary models, but it takes several GPUs or quantization.

Quantization (4-bit, 8-bit) sharply reduces the memory needed at a usually acceptable precision cost. A 7B model goes from roughly 16 GB in fp16 to 6-8 GB in 4-bit; a 70B goes from around 140 GB to about forty.

How it works at Bunker

You pick a model and a GPU size, we deploy inference in Europe, and you get a private API that is OpenAI-compatible: your existing tools work without a rewrite. Everything stays in European datacenters.

Your applicationYour applicationEuropePrivate APIOpenAI formatGPU inferenceMistral / Llama

No data leaves this boundary. And because the stack is open source, you can ask to bring it onto your own infrastructure without starting over.

What to size

To size your private AI, we start from your business needs. The technical tuning (quantization, batching, GPU choice) is our job.

Model sizeContext sizeConcurrent usersExpected speedBunkersizes the GPU
  1. Model size. A small model (7 to 9 billion parameters) responds fast and costs little: enough for summarization, classification, or an internal assistant. A larger model reasons better on complex topics, at the cost of more resources. We tune the precision (quantization) to fit the GPU envelope.
  2. Context size. This is what the model reads in one go: a short question, or dozens of pages injected to query your documents (RAG). It is the most underestimated factor: a long context consumes a lot of memory and slows down the first response. As soon as you handle long documents, it is often what sizes the deployment.
  3. The number of truly concurrent users. How many people query the AI at the same time, at peak. Ten connected colleagues almost never fire ten requests in the same second: this figure is often overestimated.
  4. Expected speed. Two things that differ: the delay before the first response, which matters for an interactive exchange, and the generation throughput, which matters for bulk background processing. A live agent and an overnight batch do not have the same needs.

When in doubt, we start with a small model and a modest GPU, measure real usage, and scale up if needed. That is cheaper than oversizing from day one.

Frequently asked questions

Is my data used to train the model?

No. A hosted model does inference, and inference trains nothing. Your prompts are neither stored nor sent elsewhere, unless you decide otherwise.

Is this GDPR-compliant?

Processing happens in European datacenters, out of reach of US extraterritorial laws. That's what lets you handle personal and sensitive data.

Can I take my deployment back later?

Yes. The components are open source. You can re-internalise the whole thing onto your hardware, with no dependency on Bunker.

Do I need a big GPU to start?

No. A 7-to-9-billion-parameter model, optionally quantized, runs on a single GPU and already covers a lot. You scale up when usage justifies it.

Host your private AI in Europe

We size the GPU, we deploy inference, your data stays with you.