Distilled from Gemini 2.5 Flash. An attention-only transformer that maps natural language to structured function calls. 15 tool categories. Designed for edge inference: 250 MB checkpoint, runs on CPU in under 50 ms per call. Zero API dependency at inference.
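
One way to check the sub-50 ms CPU figure on your own hardware is to time individual calls with a monotonic clock and look at the median. The sketch below is only an illustration: `fake_predict` is a hypothetical stand-in for whatever function wraps the model's forward pass, and `measure_latency_ms` is a helper name introduced here, not part of the released artifact.

```python
import statistics
import time
from typing import Callable, List


def measure_latency_ms(
    predict: Callable[[str], object],
    utterances: List[str],
    warmup: int = 3,
) -> List[float]:
    """Return one wall-clock latency (in milliseconds) per utterance."""
    # Warm-up calls are run but not timed, so one-off setup costs
    # (lazy initialisation, cache fills) do not skew the numbers.
    for text in utterances[:warmup]:
        predict(text)

    timings = []
    for text in utterances:
        start = time.perf_counter()
        predict(text)
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


def fake_predict(text: str) -> str:
    # Hypothetical placeholder: swap in the real model call here.
    return "get_weather(location=Lisbon)"


timings = measure_latency_ms(
    fake_predict, ["What's the weather in Lisbon tomorrow?"] * 20
)
median_ms = statistics.median(timings)
print(f"median latency: {median_ms:.3f} ms over {len(timings)} calls")
```

The median is reported rather than the mean because a single slow call (for example, one interrupted by the OS scheduler) would otherwise dominate a small sample.
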
| Utterance | Gold | Predicted | Verdict |
|---|---|---|---|
| Hey, can you log my weight? I'm at 75 kilograms now. | log_health_metric(metric=weight, value=75) | log_health_metric(metric=weight, value=75) | ✓ exact |
| Post on my Instagram story: 'Beach blast!' | post_social(platform=Instagram, ...) | post_social(platform=Twitter, ...) | ✕ wrong value |
| Text my wife, 'Don't forget the milk!' | send_message(contact=wife, message=...) | send_message(contact=wife, message=...) | ✓ exact |
| What's the weather in Lisbon tomorrow? | get_weather(location=Lisbon, day=tomorrow) | get_weather(location=Lisbon) | ✕ missing arg |
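
The verdict column can be reproduced by comparing the gold and predicted calls field by field. The following is a minimal sketch, assuming calls are written as `name(key=value, ...)` exactly as in the table; the actual evaluation code is not shown in this card. `parse_call` and `verdict` are hypothetical helper names, and the `wrong function` and `extra arg` labels are added for completeness and do not appear in the table.

```python
import re
from typing import Dict, Tuple

CALL_RE = re.compile(r"^\s*(\w+)\((.*)\)\s*$")


def parse_call(text: str) -> Tuple[str, Dict[str, str]]:
    """Parse 'name(key=value, key=value)' into (name, {key: value})."""
    match = CALL_RE.match(text)
    if match is None:
        raise ValueError(f"not a function call: {text!r}")
    name, arg_text = match.groups()
    args: Dict[str, str] = {}
    # Assumption: argument values contain no commas. A real parser
    # would need quoting rules for free-text fields such as messages.
    for part in arg_text.split(","):
        part = part.strip()
        if "=" not in part:
            # Skips empty parts and bare '...' placeholders, so
            # elided arguments are ignored rather than guessed.
            continue
        key, _, value = part.partition("=")
        args[key.strip()] = value.strip()
    return name, args


def verdict(gold: str, predicted: str) -> str:
    """Classify a prediction against its gold call."""
    gold_name, gold_args = parse_call(gold)
    pred_name, pred_args = parse_call(predicted)
    if gold_name != pred_name:
        return "wrong function"  # label not in the table (assumption)
    if any(key not in pred_args for key in gold_args):
        return "missing arg"
    if any(key not in gold_args for key in pred_args):
        return "extra arg"  # label not in the table (assumption)
    if any(pred_args[key] != value for key, value in gold_args.items()):
        return "wrong value"
    return "exact"


print(verdict("get_weather(location=Lisbon, day=tomorrow)",
              "get_weather(location=Lisbon)"))
```

Checking for missing arguments before comparing values means a prediction that both drops an argument and gets another one wrong is reported as `missing arg`; a different ordering of the checks would be equally defensible.
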

Trained for 8 epochs over the distilled examples. Final loss: 0.732.

Teacher: gemini-2.5-flash (~1.5T params)

needle.tool-calling-v1.yaml

Self-contained: model weights, tokenizer, and recipe in a single artifact. No API key needed at inference.