
VINTAGE 2026.05.13 · BATCH #001

NEEDLE

⚙ Tool Calling Spirit

Distilled from Gemini 2.5 Flash. An attention-only transformer that maps natural language to structured function calls across 15 tool categories. Designed for edge inference: a 249 MB PyTorch checkpoint that runs on CPU at a median of 45 ms per call, with zero API dependency at inference.

★ 78 PROOF · PASSED CUTS · 20.7M PARAMS · CPU-RUNNABLE
Tool-name accuracy: 78% (on held-out cuts)
Arg-key F1: 0.729 (p=0.85 · r=0.64)
Final loss: 0.732 (8 epochs)
Compression: 72,000× (~1.5T → 20.7M)
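
The headline figures can be sanity-checked with a few lines of arithmetic. The sketch below assumes that Arg-key F1 is the usual harmonic mean of precision and recall, and that Compression is the teacher-to-student parameter ratio. Neither definition is spelled out on this card, and the function names are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of Arg-key F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def compression_ratio(teacher_params: float, student_params: float) -> float:
    """Teacher-to-student parameter ratio (assumed definition of Compression)."""
    return teacher_params / student_params


# Inputs are the rounded values shown on the card.
f1 = f1_score(0.85, 0.64)
ratio = compression_ratio(1.5e12, 20.7e6)
print(f"F1 from rounded p/r: {f1:.3f}")
print(f"Compression: {ratio:,.0f}x")
```

With the rounded inputs this gives an F1 of about 0.730, within rounding of the reported 0.729, and a ratio of roughly 72,464×, which the card rounds to 72,000×.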

Tasting Notes

✓ Strengths

  • Strong tool-name selection on imperative queries (Set a timer..., Send a message to...)
  • High precision on argument keys (0.85) — when it produces an arg, it's usually a real one
  • Holds JSON structure consistently — never produces malformed nesting
  • Handles 3-6 tools in the context window gracefully (the typical user scenario)
  • Inference cost: ~$0/call, runs CPU-only, 45ms median latency

✕ Weaknesses

  • Confuses adjacent social platforms (predicts platform=Twitter for an Instagram prompt)
  • Lower arg-key recall (0.64): sometimes drops optional but useful args such as unit
  • Exact-call accuracy only 3% — value-level prediction is the soft spot
  • WordPiece tokenizer introduces whitespace noise in decoded JSON
  • Chained 2-tool calls make up ~20% of the training data but are not covered by the eval
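
If the whitespace noise from the WordPiece tokenizer falls only between JSON tokens, a parse-and-reserialize round trip is enough to clean it up. The sketch below rests on that assumption. The sample string and the `tool`/`args` field names are hypothetical, not the model's documented output schema. Noise inside string values (for example `set _ timer`) would need tokenizer-aware detokenization instead.

```python
import json


def normalize_call_json(decoded: str) -> str:
    """Parse decoded model output and re-serialize it compactly.

    Assumes stray whitespace sits only between JSON tokens, never inside
    string values. Raises json.JSONDecodeError if the text is not valid JSON.
    """
    parsed = json.loads(decoded)
    return json.dumps(parsed, separators=(",", ":"), ensure_ascii=False)


# Hypothetical decoded output with extra spaces around punctuation.
noisy = '{ "tool" : "get_weather" , "args" : { "location" : "Lisbon" } }'
clean = normalize_call_json(noisy)
print(clean)
```

Because the round trip goes through a real JSON parser, it also acts as a validity check: anything that is not well-formed JSON fails loudly rather than being passed downstream.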

SAMPLE PREDICTIONS

| Utterance | Gold | Predicted | Verdict |
|---|---|---|---|
| Hey, can you log my weight? I'm at 75 kilograms now. | log_health_metric(metric=weight, value=75) | log_health_metric(metric=weight, value=75) | ✓ exact |
| Post on my Instagram story: 'Beach blast!' | post_social(platform=Instagram, ...) | post_social(platform=Twitter, ...) | ✕ wrong value |
| Text my wife, 'Don't forget the milk!' | send_message(contact=wife, message=...) | send_message(contact=wife, message=...) | ✓ exact |
| What's the weather in Lisbon tomorrow? | get_weather(location=Lisbon, day=tomorrow) | get_weather(location=Lisbon) | ✕ missing arg |
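
Verdicts like these can be reproduced with a small scorer. This is a minimal sketch, not the Distillery's evaluation code. It assumes calls are written as `name(key=value, ...)` with no commas or parentheses inside values, and the helpers `parse_call` and `score` are hypothetical names.

```python
import re


def parse_call(text: str) -> tuple[str, dict[str, str]]:
    """Split 'name(k=v, ...)' into a tool name and an argument dict."""
    match = re.fullmatch(r"\s*(\w+)\((.*)\)\s*", text)
    if match is None:
        raise ValueError(f"not a call: {text!r}")
    name, body = match.groups()
    args = {}
    for part in body.split(","):
        if part.strip():
            key, _, value = part.partition("=")
            args[key.strip()] = value.strip()
    return name, args


def score(gold: str, pred: str) -> dict[str, object]:
    """Compare one predicted call against its gold call."""
    gold_name, gold_args = parse_call(gold)
    pred_name, pred_args = parse_call(pred)
    shared = gold_args.keys() & pred_args.keys()
    return {
        "tool_match": gold_name == pred_name,
        "key_precision": len(shared) / len(pred_args) if pred_args else 0.0,
        "key_recall": len(shared) / len(gold_args) if gold_args else 0.0,
        "exact": gold_name == pred_name and gold_args == pred_args,
    }


result = score(
    "get_weather(location=Lisbon, day=tomorrow)",
    "get_weather(location=Lisbon)",
)
print(result)
```

For the weather example the tool name matches, key precision is 1.0 and key recall is 0.5. That is the same pattern as the card's aggregate numbers: the arguments the model emits are usually valid, but some expected ones are missing.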

Distillation Run

8 epochs over distilled examples. Final loss: 0.732.

Provenance

Teacher: gemini-2.5-flash (~1.5T params)
Recipe: needle.tool-calling-v1.yaml
Distilled: 2026-05-13 by The Distillery v0.1 engine
Author: @cactus

Download Bottle

Self-contained — model weights, tokenizer, and recipe in a single artifact. No API key needed at inference.

  • PYTORCH · 249 MB · For training continuation, HuggingFace, custom Python.
  • ONNX · 100 MB (coming) · CPU/GPU inference, edge, browser via onnxruntime-web.
  • GGUF · 25 MB (coming) · Quantized for llama.cpp, mobile, embedded targets.