Sovereign AI

Fine-tune a sovereign LLM

Specialize an open-source LLM on your own data, on GPU infrastructure hosted in Europe. This guide covers LoRA, dataset preparation, and sizing, all without your data ever training a third-party model.

Updated June 2026

[Diagram: Your data · LoRA · Base model · Specialized model]

A generic model knows the world but not your business. Fine-tuning teaches it your vocabulary, your formats, your tone: a model that drafts your reports the way you do, classifies your tickets with your categories, or answers in your industry's jargon.

When training runs on European-hosted infrastructure, your data stays under your control, whereas consumer platforms may reuse what you hand them.

Fine-tuning or RAG: which to choose

The two approaches solve different problems and are often combined.

  • RAG (retrieval-augmented generation): the model fetches information from your documents at query time. Ideal when knowledge changes often (catalog, document base, living FAQ).
  • Fine-tuning: you adjust the model so it internalizes a style, a format, or a lasting behavior. Ideal when the shape of the answer matters as much as the content.

In practice, many projects start with RAG, then add light fine-tuning when the tone or format still doesn't come out right.
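To make the distinction concrete, here is a toy sketch of the RAG side only: knowledge lives in documents and is pasted into the prompt at query time, so editing a document changes the context without touching any model weights. The word-overlap scoring, the `retrieve` and `build_prompt` helpers, and the sample documents are illustrative assumptions; production systems typically rely on embedding search instead.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for a real embedding model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents sharing the most words with the query."""
    q = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, documents: list[str]) -> str:
    """RAG: the retrieved text is injected into the prompt at query time."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"

# Hypothetical document base; in a real project this changes over time.
docs = [
    "Return policy: items can be returned within 30 days.",
    "Shipping: orders leave the warehouse within 2 business days.",
]
prompt = build_prompt("How many days do I have to return an item?", docs)
print(prompt)
```

Fine-tuning has no equivalent of `docs`: what the model learned is baked into its parameters, which is why it suits style and format better than fast-changing facts.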

LoRA: specialize without retraining everything

Retraining a whole model is expensive in GPU time and memory. LoRA (low-rank adaptation) and its variants freeze the base weights and train only a small set of additional parameters, which changes everything on a tight budget:

  • training fits on one or two GPUs instead of a cluster;
  • the resulting adapters are small (a few megabytes at low rank), easy to version and swap;
  • you keep the base model intact and can maintain several specializations on top of it.

It's the default approach for specializing a Mistral or Llama model without a lab budget.
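A quick back-of-the-envelope calculation shows why the adapter stays small. For each adapted weight matrix, LoRA trains two low-rank factors instead of the full matrix. The shapes below (width 4096, 32 blocks, two adapted matrices per block, rank 8) are assumed, illustrative values for a 7B-class model, not the configuration of any specific model.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix:
    two low-rank factors, A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)

# Assumed, illustrative shapes for a 7B-class transformer.
HIDDEN = 4096            # model width
LAYERS = 32              # transformer blocks
ADAPTED_PER_LAYER = 2    # e.g. two square attention projections per block
RANK = 8                 # LoRA rank

adapter_params = LAYERS * ADAPTED_PER_LAYER * lora_params(HIDDEN, HIDDEN, RANK)
adapter_mb = adapter_params * 2 / 1e6   # 2 bytes per parameter in 16-bit precision

print(f"adapter parameters: {adapter_params:,}")
print(f"adapter size: {adapter_mb:.1f} MB")
print(f"share of a 7B model: {adapter_params / 7e9:.4%}")
```

Under these assumptions the adapter holds about 4.2 million parameters, roughly 8 MB in 16-bit precision. Raising the rank or adapting more matrices grows the adapter proportionally.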

Preparing the dataset

Fine-tuning quality depends on the data first, the hardware second.

  1. Gather representative examples of the task: input/output pairs that show exactly the expected behavior.
  2. Clean the examples and convert them to one consistent instruction format. A few hundred to a few thousand well-chosen examples beat a large noisy volume.
  3. Keep a validation set aside to measure whether the model truly improves and isn't just memorizing.
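The three steps above can be sketched as a small script. The `instruction`/`response` field names, the JSONL output, and the 10% validation share are assumptions chosen for illustration; use whatever schema your training tool expects.

```python
import json
import random

def to_record(instruction: str, response: str) -> dict:
    """Normalize one input/output pair into a single instruction format."""
    return {"instruction": instruction.strip(), "response": response.strip()}

def clean(pairs):
    """Drop pairs with an empty side and exact duplicates, preserving order."""
    seen, records = set(), []
    for instruction, response in pairs:
        rec = to_record(instruction, response)
        key = (rec["instruction"], rec["response"])
        if rec["instruction"] and rec["response"] and key not in seen:
            seen.add(key)
            records.append(rec)
    return records

def split(records, val_fraction=0.1, seed=0):
    """Shuffle deterministically and hold out a validation set."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]

# Hypothetical raw examples: 20 ticket-classification pairs,
# plus one duplicate and one pair with an empty input.
raw_pairs = [(f"Classify ticket: issue {i}", f"category-{i % 3}") for i in range(20)]
raw_pairs += [raw_pairs[0], ("", "orphan answer")]

train, val = split(clean(raw_pairs))
jsonl = "\n".join(json.dumps(r, ensure_ascii=False) for r in train)
print(len(train), "training examples,", len(val), "validation examples")
```

Fixing the shuffle seed keeps the validation set identical between runs, so scores from successive training attempts stay comparable.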

[Diagram: Your data → LoRA training (GPU in Europe) → Adapter (a few MB) → Specialized inference]

Your data never leaves the European infrastructure at any point in the process.

Sizing the training

Fine-tuning needs more memory than inference, because gradients and optimizer state for the trained parameters have to be stored on top of the weights and activations. A few reference points:

  • a LoRA on a 7B model fits on a 24 GB GPU in most cases;
  • a larger model needs quantization (QLoRA) or several GPUs;
  • duration depends on dataset size and the number of passes, from a few minutes to a few hours for a reasonable LoRA.
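As a rough illustration of these reference points, the sketch below adds up the main memory consumers for a LoRA run. It is an order-of-magnitude estimate under stated assumptions: 16-bit frozen base weights, 16-bit adapter weights and gradients, an Adam-style optimizer keeping two 32-bit values per trained parameter, and a fixed 4 GB placeholder for activations, which in reality vary with batch size and sequence length. The function name and the adapter size are hypothetical.

```python
def lora_memory_gb(
    base_params: float,
    adapter_params: float,
    base_bytes_per_param: float = 2.0,  # 16-bit frozen weights; about 0.5 for a 4-bit base
    activation_gb: float = 4.0,         # assumed placeholder, workload dependent
) -> dict:
    """Rough GPU memory estimate (in GB) for LoRA training."""
    gb = 1e9
    weights = base_params * base_bytes_per_param / gb
    adapter = adapter_params * 2 / gb      # 16-bit adapter weights
    gradients = adapter_params * 2 / gb    # gradients exist only for adapter parameters
    optimizer = adapter_params * 8 / gb    # two 32-bit moments per trained parameter
    total = weights + adapter + gradients + optimizer + activation_gb
    return {
        "weights_gb": weights,
        "adapter_gb": adapter,
        "gradients_gb": gradients,
        "optimizer_gb": optimizer,
        "activations_gb": activation_gb,
        "total_gb": total,
    }

# 7B base model with an assumed adapter of about 4.2 million parameters.
estimate = lora_memory_gb(7e9, 4_194_304)
for name, value in estimate.items():
    print(f"{name}: {value:.2f}")
```

With these assumptions the frozen weights dominate (14 GB for a 7B model in 16-bit), which is why the total stays under 24 GB and why quantizing the base model is the main lever for larger models.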

Bunker provides the GPU and operates the training; you provide the data and the task definition.

Frequently asked questions

Does my training data feed a third-party model?

No. Training happens on dedicated infrastructure in Europe. Your data and the resulting adapter belong to you and are shared with no one.

Do I need a lot of examples?

Not necessarily. For a LoRA, a few hundred to a few thousand quality examples are often enough. Consistency beats volume.

What's the concrete difference from a well-written prompt?

A good prompt takes you far. Fine-tuning takes over when you need stable behavior across a large number of calls, without re-explaining the format in every request.

Can I take back the specialized model?

Yes. The base model is open source and the adapter belongs to you. You can bring the whole thing back in-house and run it on your own hardware.
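One common way to make a LoRA-specialized model portable is to fold the adapter into the base weights, so a single set of weights can be served anywhere. The sketch below shows only the underlying arithmetic on tiny hand-written matrices; the `merge_lora` helper and the values are illustrative assumptions, and the export format actually delivered is not specified here.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(W, A, B, alpha, rank):
    """Fold a LoRA update into the base weight: W + (alpha / rank) * B @ A."""
    scale = alpha / rank
    delta = matmul(B, A)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, delta)
    ]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (2 x 2)
A = [[1.0, 2.0]]               # rank x d_in  (1 x 2)
B = [[0.5], [0.0]]             # d_out x rank (2 x 1)

merged = merge_lora(W, A, B, alpha=2, rank=1)
print(merged)
```

Keeping the adapter as a separate file is equally valid; it preserves the ability to swap specializations on one base model.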

Specialize your model on your data

LoRA on GPUs in Europe: your data and your adapter stay yours.