
I Thought I Was Building a Product. I Was Building a Category.

  • rainforest-studio
  • living-archive
  • building-in-the-open
  • architecture

I've been building the Living Archive for months. Curated cultural memory. Community oral histories. Contributors, consent flows and contributor-attributed stories chunked and embedded into pgvector. The technical architecture was solid. The thesis was clear. The data model had already been rewritten twice and I was finally happy with it.

Then on day 39 of the war in Iran I tried to use it to capture something else, and realised it didn't fit at all.

I was on the sofa, in a medical flare-up, watching Trump post that "a whole civilisation will die tonight". Complete shock and dismay. Somewhere in the middle of doom scrolling I had a thought. "I'm not going to suffer from this and I know it. But the people who will, will be the same people the historical record tends to forget."

I wanted to capture the moment. Not the Trump post, but the framing of the Trump post. How Western mainstream media reported it. How Dawn reported it. How Premium Times reported it. The contradictions between them. The bit that nobody preserves because the news wire services flatten it before it reaches the archive.

So I opened my own tool. And it didn't fit.


The unit was wrong

Living Archive treats a contributor as the unit of capture. A person consented, a story was attributed, a contributor-story relationship was created. That's correct for community oral histories. It's the whole point.

But a news article doesn't have a contributor. It has a source. A publication, a publish time, a URL, a headline, a section, a slug. The "person" is the institution. The framing is the content. There's no consent flow because the content is already public, but there is an ethical question about how you preserve and surface it that the existing schema had no language for.
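The mismatch is easiest to see if you put the two units of capture side by side. A minimal sketch in Python, assuming illustrative field names rather than the real Living Archive schema:

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical sketch: the two units of capture, side by side.
# Field names are illustrative, not the actual schema.

@dataclass
class ContributorStory:
    """Community oral history: the person is the unit of capture."""
    contributor_id: str
    consent_given: bool        # consent flow is mandatory here
    story_text: str
    attributed_at: datetime

@dataclass
class SourcedArticle:
    """News capture: the institution is the unit of capture."""
    publication: str           # e.g. "Dawn", "Premium Times"
    url: str
    headline: str
    section: str
    slug: str
    published_at: datetime
    framing_notes: str = ""    # how it was framed, not just what it said

story = ContributorStory("c-001", True, "An attributed oral history.", datetime(2025, 6, 1))
article = SourcedArticle("Dawn", "https://example.com/story", "Headline",
                         "world", "headline", datetime(2025, 6, 21))
```

Neither record is a degenerate case of the other: the first has a consent relationship and no source metadata, the second has source metadata and no person at all.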

I sat with that for about ten minutes and felt the same sinking feeling I get when I realise I've been thinking about something wrong for months.


The reframe

The thing I kept saying out loud was "this isn't the same product". I wanted it to be. It would have been so much easier if it was. Same tables, same pipeline, ship it tonight, go to bed.

But it wasn't the same product. And the more I tried to bend the existing schema into the new shape, the more I could feel the architecture screaming.

Then I said something else out loud, and that was the bit that actually mattered.

"It's not a product. It's a category."


The engine

Capture → Chunk → Embed → Store → Query → Surface.

That pipeline can power multiple distinct products depending on what you're capturing and for whom. A community oral-history archive with full consent flows is one product. Real-time crisis archiving with provenance and framing analysis is another. Researcher tooling for media bias studies is a third. A public-facing accountability layer is a fourth.

Same engine. Different shells. Different data models. Different consent flows. Different surfacing logic.
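The engine-and-shells split can be sketched as one pipeline function with the per-product steps injected. A minimal sketch, assuming hypothetical names (`run_pipeline`, `naive_chunk`) that are not from the real codebase:

```python
from typing import Callable

# Hypothetical sketch: one shared engine, pluggable shells.
def run_pipeline(raw: str, chunk: Callable, embed: Callable, store: Callable) -> int:
    """Capture -> Chunk -> Embed -> Store; each shell supplies its own steps."""
    chunks = chunk(raw)
    vectors = [embed(c) for c in chunks]
    store(list(zip(chunks, vectors)))
    return len(chunks)

# A deliberately naive chunker standing in for a real one.
def naive_chunk(text: str, size: int = 40) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

stored: list = []
n = run_pipeline(
    raw="A long enough piece of captured text to split into chunks.",
    chunk=naive_chunk,
    embed=lambda c: [float(len(c))],   # stand-in for a real embedding model
    store=stored.extend,               # stand-in for a pgvector insert
)
```

Swapping the `chunk`, `embed`, or `store` arguments is what makes the shells different products while the engine stays the same.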

I've been building Living Archive for months as a single product. That night I realised I'd actually been building a platform.

That reframe was worth more than the code I wrote afterwards.


Rapid capture, tidy later

Once I had the platform vs product distinction, the technical decisions made themselves.

The principle for the new module is the one I keep coming back to whenever I'm building something under time pressure. Rapid capture, tidy later. Capture everything raw. Tag aggressively with provenance. Make surfacing decisions when you know what you have.

Metadata is the receipt. It lets future you prove the framing without arguing about it.
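The principle is simple enough to sketch. A minimal, hypothetical capture function (field names are illustrative, not the real module): store the raw text untouched, attach provenance aggressively, and defer the surfacing decision.

```python
from datetime import datetime, timezone

# Hypothetical sketch of "rapid capture, tidy later".
def capture_raw(raw_html: str, url: str, publication: str) -> dict:
    return {
        "raw": raw_html,                     # everything, untouched
        "provenance": {                      # the receipt for future you
            "url": url,
            "publication": publication,
            "captured_at": datetime.now(timezone.utc).isoformat(),
            "capture_method": "manual",
        },
        "surfaced": False,                   # decide later, once you know what you have
    }

record = capture_raw("<html>...</html>", "https://example.com/story", "Dawn")
```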

I'll write more about the actual implementation another time, once there are real captures flowing. The point of this note isn't the code. The point is that the code only made sense after the reframe.


The naming problem

I thought the main risk to Living Archive was building the wrong thing. Turns out the bigger risk was building the right thing but calling it by too small a name. Calling it a product when it was a platform. Imagining one shell when the engine could power four.

A lot of the friction I'd been feeling for the last few weeks wasn't technical. It was naming. I'd been trying to make a single product hold every use case I could imagine for the underlying infrastructure, and that was making every individual decision harder than it needed to be.

Should the consent flow be optional? Should the data model support sources without contributors? Should I add a "framing" field? Every one of those questions had a different answer depending on which use case I was prioritising in my head at the time.

Once I let the engine be a platform and the products be distinct, all of those questions became easy. Crisis capture doesn't need the consent flow. Community archives don't need the framing field. Researcher tooling needs both, but layered differently. Different products. Different decisions. Same engine.
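Those per-product decisions reduce to configuration once the split is made. A minimal sketch, assuming hypothetical flag names that mirror the questions above rather than any real config:

```python
from dataclasses import dataclass

# Hypothetical sketch: the same engine, configured per product.
@dataclass(frozen=True)
class ProductShell:
    name: str
    requires_consent: bool      # "should the consent flow be optional?"
    has_framing_field: bool     # "should I add a framing field?"

community_archive = ProductShell("community-archive", requires_consent=True,  has_framing_field=False)
crisis_capture    = ProductShell("crisis-capture",    requires_consent=False, has_framing_field=True)
researcher_tools  = ProductShell("researcher-tools",  requires_consent=True,  has_framing_field=True)
```

Each question that was hard for one product is just a boolean that differs between products.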

I should have seen this months ago. But I think you only see it when you try to use the thing for a use case it wasn't built for, and feel it not fit.