Fine-tune a sovereign LLM
Specialize an open-source LLM on your own data, on GPU infrastructure hosted in Europe: LoRA, datasets, and sizing, without your data being used to train a third-party model.
Updated June 2026
A generic model knows the world but not your business. Fine-tuning teaches it your vocabulary, your formats, your tone: a model that drafts your reports the way you do, classifies your tickets with your categories, or answers in your industry's jargon.
Run on European-hosted infrastructure, training keeps your data under your control, whereas consumer platforms may reuse what you hand them.
Fine-tuning or RAG: which to choose
The two approaches solve different problems, and often combine.
- RAG (retrieval-augmented generation): the model fetches information from your documents at query time. Ideal when knowledge changes often (catalog, document base, living FAQ).
- Fine-tuning: you adjust the model so it internalizes a style, a format, or a lasting behavior. Ideal when the shape of the answer matters as much as the content.
In practice, many projects start with RAG, then add light fine-tuning when the tone or format still falls short.
LoRA: specialize without retraining everything
Retraining a whole model is expensive in GPU time and memory. LoRA (and its variants) freezes the base weights and trains only a small set of additional parameters, which makes a real difference on a tight budget:
- training fits on one or two GPUs instead of a cluster;
- the adapters produced are small, typically a few megabytes to a few hundred megabytes depending on the rank chosen, so they are easy to version and swap;
- you keep the base model intact and stack several specializations.
It's the default approach for specializing a Mistral or Llama model without a lab budget.
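To make the saving concrete, here is a small sketch of the arithmetic behind LoRA. A frozen weight matrix of shape d_out x d_in receives a low-rank update B·A of rank r, so only r x (d_in + d_out) parameters are trained per adapted matrix. The dimensions below (hidden size 4096, 32 layers, rank 8, two square attention projections adapted per layer) are illustrative assumptions, not the configuration of any particular model.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by one LoRA adapter.

    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * (d_in + d_out)


# Illustrative assumptions, not taken from a specific model card.
hidden = 4096            # width of each adapted projection (assumed square)
layers = 32              # number of transformer layers
rank = 8                 # LoRA rank r
adapted_per_layer = 2    # e.g. two attention projections per layer

lora_per_matrix = lora_params(hidden, hidden, rank)
full_per_matrix = hidden * hidden

total_lora = lora_per_matrix * adapted_per_layer * layers
total_full = full_per_matrix * adapted_per_layer * layers

fraction = total_lora / total_full
adapter_mb = total_lora * 2 / 1e6  # 2 bytes per parameter if stored in 16-bit

print(f"trainable LoRA parameters: {total_lora:,}")
print(f"share of the adapted matrices: {fraction:.4%}")
print(f"adapter size in 16-bit: {adapter_mb:.1f} MB")
```

Under these assumptions the adapter holds about 4.2 million parameters, roughly 8 MB in 16-bit. A higher rank or more adapted matrices grows the adapter proportionally.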
Preparing the dataset
Fine-tuning quality depends on the data first, the hardware second.
- Gather representative examples of the task: input/output pairs that show exactly the expected behavior.
- Clean the examples and convert them to a consistent instruction format. A few hundred to a few thousand well-chosen examples beat a large noisy volume.
- Keep a validation set aside to measure whether the model truly improves and isn't just memorizing.
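The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the field names `instruction` and `response`, the JSONL output, the 20% validation share, and the ticket examples are all hypothetical choices, not a format required by any given training tool.

```python
import json
import random


def to_record(instruction: str, response: str) -> dict:
    """Normalize one input/output pair into a consistent instruction record."""
    return {"instruction": instruction.strip(), "response": response.strip()}


def clean(pairs):
    """Drop pairs with an empty field and exact duplicates (after stripping)."""
    seen = set()
    records = []
    for instruction, response in pairs:
        record = to_record(instruction, response)
        key = (record["instruction"], record["response"])
        if not record["instruction"] or not record["response"] or key in seen:
            continue
        seen.add(key)
        records.append(record)
    return records


def split_dataset(records, val_fraction=0.2, seed=42):
    """Shuffle deterministically, then hold out a validation slice."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def to_jsonl(records) -> str:
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


# Hypothetical ticket-classification examples.
raw_pairs = [
    ("Classify this ticket: 'Invoice total is wrong'", "billing"),
    ("Classify this ticket: 'Cannot log in since this morning'", "access"),
    ("Classify this ticket: 'App crashes when exporting a PDF'", "bug"),
    ("Classify this ticket: 'Please add a dark mode'", "feature-request"),
    ("Classify this ticket: 'Charged twice this month'", "billing"),
    ("Classify this ticket: 'Invoice total is wrong'  ", "billing"),  # duplicate once stripped
    ("Classify this ticket: 'Screen stays blank'", ""),               # empty output, dropped
]

records = clean(raw_pairs)
train, val = split_dataset(records)
print(f"{len(records)} clean examples: {len(train)} train, {len(val)} validation")
print(to_jsonl(train))
```

Fixing the shuffle seed keeps the validation set identical from one run to the next, so scores stay comparable when you change the training settings.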
Your data stays on the European infrastructure throughout the process.
Sizing the training
Fine-tuning needs more memory than inference, because gradients, optimizer states, and activations have to be stored alongside the weights. A few reference points:
- a LoRA on a 7B model fits on a 24 GB GPU in most cases;
- a larger model needs quantization (QLoRA) or several GPUs;
- duration depends on dataset size and the number of passes, from a few minutes to a few hours for a reasonable LoRA.
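The reference points above can be checked with a back-of-the-envelope estimate. The sketch below is a rough approximation, not a measurement: it assumes 16 bytes per trained parameter (32-bit weight, 32-bit gradient, two Adam moments), a flat 4 GB placeholder for activations (real usage depends on batch size, sequence length, and gradient checkpointing), and a hypothetical adapter of about 4.2 million parameters.

```python
# Assumption: 4 bytes weight + 4 bytes gradient + 8 bytes for two Adam moments.
BYTES_PER_TRAINED_PARAM = 16


def estimate_training_gb(base_params, bytes_per_base_weight,
                         trainable_params, activation_gb=4.0):
    """Rough GPU memory estimate in GB for one training setup.

    base_params: number of frozen base-model parameters
    bytes_per_base_weight: 2 for 16-bit weights, 0.5 for 4-bit quantized weights
    trainable_params: parameters that receive gradients
    activation_gb: placeholder for activations (an assumption, not measured)
    """
    frozen = base_params * bytes_per_base_weight
    trained = trainable_params * BYTES_PER_TRAINED_PARAM
    return (frozen + trained) / 1e9 + activation_gb


SEVEN_B = 7_000_000_000
LORA_PARAMS = 4_194_304  # hypothetical adapter size

lora_16bit = estimate_training_gb(SEVEN_B, 2, LORA_PARAMS)
qlora_4bit = estimate_training_gb(SEVEN_B, 0.5, LORA_PARAMS)
# Full fine-tuning: no frozen weights, every parameter is trained.
full_ft = estimate_training_gb(0, 0, SEVEN_B)

for name, gb in [("LoRA, 16-bit base", lora_16bit),
                 ("QLoRA, 4-bit base", qlora_4bit),
                 ("full fine-tuning", full_ft)]:
    verdict = "fits" if gb <= 24 else "does not fit"
    print(f"{name}: ~{gb:.1f} GB, {verdict} on a 24 GB GPU")
```

Under these assumptions, a 16-bit LoRA on a 7B model lands around 18 GB, a 4-bit QLoRA around 8 GB, and full fine-tuning above 100 GB, which is why the latter calls for several GPUs.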
Bunker provides the GPU and operates the training; you provide the data and the task definition.
Frequently asked questions
Does my training data feed a third-party model?
No. Training happens on dedicated infrastructure in Europe. Your data and the resulting adapter belong to you and are shared with no one.
Do I need a lot of examples?
Not necessarily. For a LoRA, a few hundred to a few thousand quality examples are often enough. Consistency beats volume.
What's the concrete difference from a well-written prompt?
A good prompt takes you far. Fine-tuning takes over when you want stable behavior across a large number of calls, without re-explaining the format on every request.
Can I take back the specialized model?
Yes. The base model is open source and the adapter belongs to you. You can bring the whole thing back in-house and run it on your own hardware.
Specialize your model on your data
LoRA on GPUs in Europe: your data and your adapter stay yours.