Hold your ground truth

Pandora's box has been flung open for a few years now: we've flown the coop, the can of worms is wide open and wriggling everywhere, the cork's out of the bottle, and the bottle itself is long empty.

There's no going back, so I wanted to share my own "ah-ha" moment when it comes to how I use AI.

Every few months, these models take a leap forward. It's not just that GPT‑3.5 stumbles over logic that GPT‑4 nails, or that some new incarnation can meticulously count the number of 'R's in "strawberrrry." It's also that these systems keep refining how they interpret our input.

As an engineer, I like to see them as magical functions: you hand over a question or instruction, and you get an answer back. But the twist is that this function is oddly future-lossy. The same input today might yield a better output tomorrow, purely because the model behind the curtain has moved on to its next, more polished version.
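The "same input, better output tomorrow" idea can be sketched in a few lines. Everything here is a toy stand-in: `ask()` and the canned model behaviors are hypothetical, not real APIs. The point is only that the interface stays fixed while the model behind it improves.

```python
# A toy sketch of the "magical function" view. The ask() helper and the
# model behaviors are invented for illustration: the interface stays
# fixed while the model behind it gets better.

def ask(prompt: str, word: str, model: str) -> str:
    """Pretend model call: the older model stumbles, the newer one counts."""
    if model == "gpt-3.5":
        return "2"  # a plausible stumble
    return str(word.count("r"))  # the newer incarnation counts correctly

prompt = "How many R's are in this word?"
print(ask(prompt, "strawberrrry", "gpt-3.5"))  # wrong today
print(ask(prompt, "strawberrrry", "gpt-4"))    # right tomorrow, same input
```

The caller's side of the function never changes; only the quality of what comes back does, which is exactly why the input is the part worth keeping.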

Data over model

From that perspective, I keep coming back to the idea that data input is much more valuable than model output. Truly novel human input is the secret sauce, the "magic pixie dust" fuelling everything. Any output can be regenerated from its input, but if you don't retain the human thought that produced it, you've lost something special.

That's why I embrace the AI take on the principle of "file over app". Your raw data should always be preserved so you aren't locked into a single snapshot of what AI can do right now.

For me, that often takes shape as audio. I'll simply speak my thoughts into a recording, then hand it over to whichever large language model is fresh off the press. It's almost a stream-of-consciousness approach. The same goes for notes: I prefer to have a ground-truth braindump somewhere local, that I own. Some place I'm able to revisit, tune, and maintain strong ownership of.

(Image: a pensive selfie from the other day)

The crucial piece is this: hold on to your own ground truth. The data you personally create - be it a voice recording, handwritten notes, or anything else - remains yours to revisit. When a shiny new model comes along, you can go back to that unfiltered source and let the next iteration of AI interpret it anew. You aren't left working off a half-baked summary or some stale output. By putting "data over model", you keep the key that unlocks everything AI might offer in the future, without losing the essence of your own insights along the way.

If you're using AI to help reason about a dumped stream of consciousness, make sure you keep the input. That baseline can be built on in the future, helping a later model give better guidance on someone's growth (your own too) over time. If you only retain the output, you're snapshotting the resolution of the information to the state of the art today, and that might suck by next year. If you retain your ground truth, you're good next year, and the year after.
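As a concrete sketch of what retaining ground truth could look like in practice: the `save_entry` helper, field names, and file layout below are all illustrative assumptions, not a prescribed format. The idea is simply that the raw input is stored verbatim next to whatever the model produced from it.

```python
import datetime
import json
import pathlib

# Illustrative sketch: persist the raw human input alongside whatever a
# model produced from it, so a future model can reinterpret the original.
# All names and the file layout here are assumptions for the example.

def save_entry(vault: pathlib.Path, raw_input: str,
               model_output: str, model: str) -> pathlib.Path:
    stamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    entry = {
        "captured_at": stamp,
        "raw_input": raw_input,        # ground truth: never discard this
        "model": model,                # which snapshot produced the output
        "model_output": model_output,  # disposable: regenerate later with a better model
    }
    path = vault / f"{stamp}.json"
    path.write_text(json.dumps(entry, indent=2), encoding="utf-8")
    return path
```

The output is deliberately marked disposable: when next year's model arrives, you re-run it over `raw_input` rather than over last year's summary.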

I do this via Obsidian, a fantastic local-first note-taking app. My ground truth starts there. It's full of dumped thoughts, images, audio, video, attachments, links, etc. At times I can wholesale dump whole chunks of context at Claude and ask it to sort something out for me. Consider what's right for you, but given that AI companies pay millions for novel human thoughts, you should probably value yours too.