📜 The Vocabulary, expanded

Glossary

The shorthand on the rest of the site, with depth. Every term here is a real ML concept dressed in distillation clothes – and the clothes were chosen carefully.

๐Ÿพ

Spirit

= trained, bottled model artifact

The final output of a distillation run. A Spirit is a single self-contained file (PyTorch .pt, ONNX, or GGUF) that bundles three things:

  • The trained weights (typically 5-50M parameters)
  • The tokenizer (so inference works without external state)
  • The Recipe that produced it (so it's reproducible and auditable)

A Spirit can be moved, copied, shared, forked. It runs anywhere. That portability is the whole point.
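
For the file itself, here is a minimal sketch of how such a bundle could be written and read with plain torch.save / torch.load. The helper names and bundle keys (bottle_spirit, open_spirit, "weights", "tokenizer", "recipe") are illustrative, not a fixed Distillarium schema.

    # Sketch: one self-contained file holding weights + tokenizer + Recipe.
    import torch

    def bottle_spirit(model, tokenizer_state: dict, recipe: dict, path: str) -> None:
        torch.save(
            {
                "weights": model.state_dict(),   # trained parameters
                "tokenizer": tokenizer_state,    # vocab/merges, so inference needs no external state
                "recipe": recipe,                # the exact config that produced this Spirit
            },
            path,
        )

    def open_spirit(path: str) -> dict:
        return torch.load(path, map_location="cpu")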

🌾

Mash

= synthetic training corpus generated by the teacher

The Mash is the pile of (input, target) examples the teacher LLM produces when prompted. For tool calling that's (utterance, available_tools, target_call) triples. For PII it's (text, gold_spans). For SQL it's (question, schema, target_sql).

Quality of the Mash dominates everything downstream. A diverse, well-distributed Mash trains a useful Spirit. A repetitive Mash trains a Spirit that memorizes. Temperature, prompting strategy, and category coverage all matter here.

Watch for: the Mash is the leakage surface. If your prompt to the teacher includes the target, you'll get a corpus where the answer is too easy. Generate the input and target separately, or have the teacher commit before revealing.
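
A minimal sketch of that two-call pattern for a tool-calling Mash. call_teacher() is a placeholder for whatever teacher client you use, and the tool catalog is invented for the example.

    # Sketch: generate the input first, then the target in a separate call,
    # so the target never leaks into the prompt that produced the input.
    import json
    import random

    TOOLS = ["get_weather", "search_docs", "create_ticket"]   # illustrative catalog

    def call_teacher(prompt: str) -> str:
        raise NotImplementedError("plug in your teacher LLM client here")

    def generate_example() -> dict:
        tools = random.sample(TOOLS, k=2)
        utterance = call_teacher(
            f"Write one realistic user request answerable with one of: {tools}. "
            "Return only the request text."
        )
        target_call = call_teacher(
            f"Tools: {tools}\nUser: {utterance}\n"
            "Return the single best tool call as JSON."
        )
        return {
            "utterance": utterance,
            "available_tools": tools,
            "target_call": json.loads(target_call),
        }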

📜

Recipe

= versioned YAML config

The Recipe is the single source of truth for a distillation run. It captures every knob: teacher, mash spec, student architecture, training hyperparameters, eval metrics, output formats. Two people with the same Recipe + the same teacher API key should get statistically equivalent Spirits.
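
As a rough illustration of the shape, here is a toy Recipe parsed with PyYAML; the keys and values are hypothetical, not the actual schema.

    # Sketch: a Recipe is just versioned YAML; load it and read the knobs.
    import yaml  # requires PyYAML

    RECIPE_YAML = """
    version: 0.1
    teacher:
      model: gemini-2.5-flash
      temperature: 0.9
    mash:
      task: tool_calling
      n_examples: 5000
    student:
      params: 20M
    training:
      epochs: 8
      batch_size: 16
      lr: 3.0e-4
    export: [pt, onnx, gguf]
    """

    recipe = yaml.safe_load(RECIPE_YAML)
    assert recipe["training"]["lr"] == 3e-4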

Recipes are forkable. The community workflow is: someone publishes a Recipe → you fork it → tweak the catalog / params / teacher → distill your own variant → share back as a new Spirit.

🔥

The Still

= the training run

The actual gradient-descent loop on the student model. Inputs: the cuts (train split). Outputs: trained weights, loss curve, gradient norms. Heat is metaphor; in practice this is just AdamW on a GPU.

For Needle, the Still runs 8 epochs at batch size 16 with lr 3e-4. On a single RTX 5090 that takes about 3 minutes.
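
Stripped of the metaphor, the loop could look like this sketch. It assumes a Hugging Face-style student whose forward pass returns an object with a .loss, and a hearts_loader that yields batches of input_ids and labels.

    # Sketch of the Still: plain AdamW on the hearts split, logging loss and grad norms.
    import torch

    def run_still(student, hearts_loader, epochs=8, lr=3e-4, device="cuda"):
        student.to(device).train()
        opt = torch.optim.AdamW(student.parameters(), lr=lr)
        history = []
        for _ in range(epochs):
            for batch in hearts_loader:              # batch size 16 in the Needle Recipe
                input_ids = batch["input_ids"].to(device)
                labels = batch["labels"].to(device)
                loss = student(input_ids, labels=labels).loss
                opt.zero_grad()
                loss.backward()
                # clip and record the gradient norm
                grad_norm = torch.nn.utils.clip_grad_norm_(student.parameters(), 1.0)
                opt.step()
                history.append((loss.item(), float(grad_norm)))
        return history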

✂️

Cuts

= train / eval / test splits

The Mash is divided into cuts before training. Standard splits:

  • Hearts (90%) – the train set. The model trains on these.
  • Heads (10%) – the held-out eval set. Never seen during training. This is what Tasting Notes are computed on.
  • Tails – borderline or hard examples flagged for human review. (Not used in v0.1; planned for v0.3's active-learning loop.)

The terms come from real distilling: heads (the volatile first cut, set aside), hearts (the core of the spirit, the part you keep), tails (the weak end of the run, sometimes recycled into the next batch). The metaphor is sharper than train/val/test.
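
A minimal sketch of making the cut, with a fixed seed so the same Mash always yields the same hearts and heads (tails omitted, matching v0.1).

    # Sketch: shuffle once, peel off 10% as heads, train on the rest.
    import random

    def cut_mash(mash: list, heads_frac: float = 0.10, seed: int = 0):
        rng = random.Random(seed)       # fixed seed keeps the cut reproducible
        shuffled = mash[:]
        rng.shuffle(shuffled)
        n_heads = int(len(shuffled) * heads_frac)
        heads = shuffled[:n_heads]      # held out; never seen during training
        hearts = shuffled[n_heads:]     # what the Still trains on
        return hearts, heads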

📈

Proof

= held-out accuracy

The headline metric of a Spirit. We report it as a degree value (78°) because spirits-people say "80 proof whiskey" not "0.4 ABV whiskey." Same idea: a single number that means "concentration."

The specific metric depends on task type:

  • Tool calling → tool-name accuracy on the held-out set
  • Classification → macro-F1
  • NER → span-F1
  • Generation → exact-match or LLM-judged similarity

Higher proof = more concentrated learning. There's a cap (the teacher's own performance on the task). Beyond that, distillation hits a wall.
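
As a toy example, the tool-calling case is just held-out tool-name accuracy scaled to degrees; the other task types swap a different metric in behind the same number.

    # Sketch: proof = held-out metric expressed on a 0-100 degree scale.
    def proof_tool_calling(predicted: list[str], gold: list[str]) -> float:
        correct = sum(p == g for p, g in zip(predicted, gold))
        return 100.0 * correct / len(gold)

    score = proof_tool_calling(["get_weather", "search_docs"],
                               ["get_weather", "create_ticket"])
    print(f"{score:.0f}°")   # -> 50°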

📝

Tasting Notes

= auto-generated eval report

Every Spirit ships with structured Tasting Notes. Five sections:

  • Headline proof – single number
  • Strengths – what the model does well, with examples
  • Weaknesses – what it gets wrong, with examples
  • Loss curve – the training trajectory
  • Sample predictions – 8-10 held-out cases with gold + predicted + verdict

The point: publish the failure cases. A Spirit that hides its weaknesses can't be trusted in production. Tasting Notes make it cheap to be honest.
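
A sketch of that report as plain dataclasses; the field names are illustrative, not the actual output schema.

    # Sketch: the five sections of a Tasting Notes report as a data structure.
    from dataclasses import dataclass, field

    @dataclass
    class SamplePrediction:
        input_text: str
        gold: str
        predicted: str
        verdict: str                      # e.g. "correct" / "wrong tool" / "malformed"

    @dataclass
    class TastingNotes:
        proof: float                      # headline number, in degrees
        strengths: list[str] = field(default_factory=list)
        weaknesses: list[str] = field(default_factory=list)
        loss_curve: list[float] = field(default_factory=list)
        samples: list[SamplePrediction] = field(default_factory=list)   # 8-10 held-out cases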

🛢

Aging in Casks

= continued training / RLHF / refresh

Spirits aren't static. As your task drifts (new tool added, new edge cases discovered), you re-age the Spirit – continue training on a new Mash slice. This is the same as "continued fine-tuning" or "incremental retraining" in ML lingo.

v0.3 will ship explicit aging support – version lineage tracked, deltas measured.
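
A sketch of what re-aging could look like, reusing run_still from the Still entry above; the bundle keys and the lineage field are placeholders, not a shipped format.

    # Sketch: reload the bottled weights, refresh on a new Mash slice, bottle a new version.
    import torch

    def re_age(student, spirit_path: str, new_hearts_loader, out_path: str) -> None:
        bundle = torch.load(spirit_path, map_location="cpu")
        student.load_state_dict(bundle["weights"])        # start from the existing Spirit
        run_still(student, new_hearts_loader, epochs=2)   # short refresh, not a full re-run
        bundle["weights"] = student.state_dict()
        bundle["recipe"]["lineage"] = spirit_path         # remember where this version came from
        torch.save(bundle, out_path)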

๐Ÿพ

Bottling

= export to deployable format

Converts the in-training PyTorch state into a target runtime format:

  • .pt – PyTorch native. Easy continuation, Python-only.
  • .onnx – cross-runtime (CPU/GPU/edge, Rust/JS/Go/Java).
  • .gguf – quantized for llama.cpp, mobile, embedded.
  • .wasm – runs in the browser. (v0.5+.)

Each format is a different tradeoff between portability, size, and accuracy. Quantizing to q4 GGUF gets you to ~25 MB but loses ~3-5 proof points on the eval.
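
A sketch of the first two formats with stock PyTorch; GGUF conversion and quantization go through a separate toolchain (the llama.cpp converters, for example) and are not shown. It assumes a student whose forward takes a single input_ids tensor.

    # Sketch: bottle the same trained student as .pt and .onnx.
    import torch

    def bottle(student, example_input_ids: torch.Tensor, stem: str) -> None:
        student.eval()
        torch.save(student.state_dict(), f"{stem}.pt")    # PyTorch native
        torch.onnx.export(                                # cross-runtime export
            student,
            (example_input_ids,),
            f"{stem}.onnx",
            input_names=["input_ids"],
            output_names=["logits"],
            dynamic_axes={"input_ids": {0: "batch", 1: "seq"}},
        )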

๐Ÿ›

The Cellar

= your model library

A Cellar is just a directory of Spirits. Each one has Tasting Notes, a Recipe, and a download. Public Cellars (like the one at /cellar) are shared model showcases. Private Cellars are local-only.

Hosted private Cellars are a future feature, not v0.1.
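
A sketch of walking a local Cellar; the per-Spirit layout shown here (recipe.yaml, tasting_notes.md, bottled files in one folder) is illustrative.

    # Sketch: a Cellar is a directory with one subfolder per Spirit.
    from pathlib import Path

    def list_cellar(cellar_dir: str):
        for spirit_dir in sorted(Path(cellar_dir).iterdir()):
            if not spirit_dir.is_dir():
                continue
            yield {
                "name": spirit_dir.name,
                "recipe": spirit_dir / "recipe.yaml",
                "tasting_notes": spirit_dir / "tasting_notes.md",
                "bottles": sorted(spirit_dir.glob("*.pt")) + sorted(spirit_dir.glob("*.gguf")),
            }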

๐Ÿ‘จโ€๐Ÿซ

Teacher / Student

= the classic distillation pair

The teacher is a large, capable model (Gemini 2.5 Flash, Claude Sonnet, GPT-4o). The student is the small model you're training. The teacher emits training examples; the student learns to mimic them on the specific task.

Distillarium is "data distillation" – the student learns from the teacher's outputs, not its logits. (Logit-based distillation is a different technique that requires direct model access. Both work; ours is simpler and works with any API teacher.)
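
The difference shows up in the loss. A sketch of both flavors: hard targets built from the teacher's emitted tokens (the data-distillation route) versus temperature-softened KL divergence on the teacher's logits (logit distillation, which needs model access).

    # Sketch: data distillation vs. logit distillation, side by side.
    import torch.nn.functional as F

    def data_distillation_loss(student_logits, teacher_token_ids):
        # Plain cross-entropy against the tokens the teacher actually wrote.
        return F.cross_entropy(student_logits, teacher_token_ids)

    def logit_distillation_loss(student_logits, teacher_logits, T: float = 2.0):
        # KL divergence between temperature-softened distributions.
        return F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)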