Training Lab

Stuff I'm working on around voice notes and tiny models.

This page is a running home for training experiments, eval ideas, half-finished notes, and the older dictation work that led into the current voice memo stuff.


What this is

Mostly DIY learning in public

I'm using this page to keep the work in one place while I learn. Some of it is careful benchmark design; some of it is me trying things, getting them wrong, and writing down what changed.

Right now the main thread is voice memo extraction. The older dictation work is still here because it explains how I ended up caring about these smaller note-shaped tasks.

Stuff I'm Working On

Two threads that keep feeding each other

Current thread

Voice memo extraction

This is the newer thread. I'm trying to turn short voice memos into cleaner, more usable artifacts without pretending a tiny model knows more than it does.

Auto-title and tiny intent extraction
Evaluation for restraint and review behavior
Hosted and local model comparisons

Foundation

Dictation to structured output

This is the older thread. It started with spoken commands and shell syntax, but it still shapes how I think about cleanup, normalization, and where models should stop and code should take over.

Speech normalization and protocol formatting
Split architecture between model and processor
On-device training and evaluation loops

One Current Example

A small thing I'm testing right now

Voice memo

Need to talk to Maya about the deck or maybe just send it over first,
I'm not sure which is less annoying.

Extraction

{
  "title": "Decide how to share deck with Maya",
  "intent": "none",
  "target": ""
}
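One way to make the "restraint" idea concrete is to check the extraction in code before anything downstream trusts it. The following is a minimal sketch, not the checker used in these experiments: the three field names come from the example above, but the `ALLOWED_INTENTS` set, the `Extraction` class, the `parse_extraction` helper, and the rule that an intent of "none" must come with an empty target are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass

# Hypothetical label set: only "none" appears in the example above.
ALLOWED_INTENTS = {"none", "message", "meeting"}


@dataclass(frozen=True)
class Extraction:
    title: str
    intent: str
    target: str


def parse_extraction(raw: str) -> Extraction:
    """Parse model output and reject anything outside the expected shape."""
    data = json.loads(raw)
    if not isinstance(data, dict) or set(data) != {"title", "intent", "target"}:
        raise ValueError("expected exactly the keys title, intent, target")
    if not all(isinstance(value, str) for value in data.values()):
        raise ValueError("all fields must be strings")

    extraction = Extraction(
        title=data["title"].strip(),
        intent=data["intent"],
        target=data["target"].strip(),
    )
    if not extraction.title:
        raise ValueError("title must not be empty")
    if extraction.intent not in ALLOWED_INTENTS:
        raise ValueError(f"unknown intent: {extraction.intent!r}")
    # Assumed restraint rule: no committed intent means no target either.
    if extraction.intent == "none" and extraction.target:
        raise ValueError("intent 'none' must come with an empty target")
    return extraction


raw = '{"title": "Decide how to share deck with Maya", "intent": "none", "target": ""}'
result = parse_extraction(raw)
print(result)
```

The memo above never settles on talking versus sending, so an output that filled in a target would be claiming more than the speaker said. Under these assumptions, that case raises an error instead of passing through quietly.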

Recent Writing

Recent notes and writeups


Essay, April 8, 2026

Notes on extractions from voice memos

Pulling titles and lightweight actions from voice memos, how to measure the result, and what to do when an extraction is useful but not ready to finalize.

Essay, April 6, 2026

Designing A Semantic Eval For Tiny Models

A reader-first walkthrough of a new semantic eval for tiny local models, including what it measures, how the scores work, and what the first local runs show.

Essay, April 5, 2026

Building Core Eval v2

The practical design document behind core_eval_v2, from product truth and scoring layers to calibration rules and what still feels unfinished.

Essay, March 9, 2026

Part 6: What a 0.6B Model Can't Learn

You can iterate forever on training data. At some point you have to ask whether the model is the right tool for the job.