When Validation Lives Between Objects
I ran into a problem that, in hindsight, was simple — almost embarrassingly simple. But it took me a long time to see it, not because the logic was complicated, but because I was looking for correctness in the wrong place.
This is the story of how that happened, why it was hard to notice, and what finally made it click.
The Original Problem
I’m working with time-series data that gets split into canonical slices. Over time, the system grew a few different representations of the same underlying data:
- In-memory slices generated by code
- A manifest that describes those slices
- JSON response files on disk referenced by the manifest
Each layer existed for a good reason, but together they raised an uncomfortable question:
How do I know these slices are actually correct?
Not “were they generated correctly once,” but:
- Are they still correct?
- Are they the right size?
- Are they aligned properly?
- Are there gaps or partials hiding somewhere?
At first glance, this feels like a slice-level validation problem.
And that’s where I got stuck.
The False Comfort of Generation-Time Correctness
I initially leaned on the idea that correctness was guaranteed upstream:
- The slice generation code had been tested
- The logic had worked before
- The manifest was produced by the same system
But that assumption doesn’t hold in a real system.
Code changes. Jobs get interrupted. Partial reruns happen. Schemas evolve. Files get deleted or overwritten.
A manifest can be internally well-formed and still be wrong.
That meant validation had to happen at runtime, against what actually exists — not just against what should exist.
So I started splitting the problem apart.
Three Kinds of Validation (That Still Didn’t Solve It)
I eventually decomposed validation into three separate concerns:
- Manifest structure validation: does the manifest itself have the expected shape and fields?
- Slice object validation: does each slice entry look internally sane?
- Manifest ↔ filesystem integrity: do manifest entries actually point to files on disk, and are there any orphaned files?
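The split can be sketched as three independent checks. The manifest shape here (a `"slices"` list of dicts with `"start"` and `"path"` keys) is a hypothetical stand-in for the real schema, not the actual one:

```python
from pathlib import Path

# Hypothetical manifest shape: {"slices": [{"start": "...", "path": "..."}, ...]}
REQUIRED_FIELDS = {"start", "path"}

def validate_manifest_structure(manifest: dict) -> list[str]:
    """Concern 1: does the manifest itself have the expected shape?"""
    errors = []
    if not isinstance(manifest.get("slices"), list):
        errors.append("manifest missing 'slices' list")
    return errors

def validate_slice_entries(manifest: dict) -> list[str]:
    """Concern 2: does each slice entry look internally sane?"""
    errors = []
    for i, entry in enumerate(manifest.get("slices", [])):
        missing = REQUIRED_FIELDS - entry.keys()
        if missing:
            errors.append(f"slice {i} missing fields: {sorted(missing)}")
    return errors

def validate_manifest_vs_disk(manifest: dict, root: Path) -> list[str]:
    """Concern 3: do entries point at real files, and are any files orphaned?"""
    errors = []
    listed = {root / e["path"] for e in manifest.get("slices", []) if "path" in e}
    for path in listed:
        if not path.is_file():
            errors.append(f"missing on disk: {path}")
    for path in root.glob("*.json"):
        if path not in listed:
            errors.append(f"orphaned file: {path}")
    return errors
```

Keeping the three concerns as separate functions means each one can fail, and be fixed, on its own.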
This helped a lot. It removed ambiguity and clarified responsibilities.
But one problem remained stubbornly unresolved:
How do I validate that slices are the right temporal size?
I wasn’t storing explicit end times. Each slice had a start timestamp and metadata. Nothing inside a single slice could prove correctness.
And no amount of counting slices or checking keys felt satisfying.
Something was missing.
The Bug That Had Nothing to Do With It (But Somehow Did)
The breakthrough didn’t happen while I was thinking about slice validation at all.
It happened while debugging a completely unrelated issue in the URL parser.
An exception was being thrown from code that parses URLs — already strange. The error claimed a set was being passed where a string was expected. That didn’t make sense. I was sure I was passing a string, and in most execution paths, everything worked fine.
To sanity-check myself, I added logging right before the failure and printed both the value and its type.
It printed a set.
That was genuinely confusing.
I searched other call sites, scanned nearby code, and couldn’t immediately see where a set could be coming from. Rather than spelunking endlessly, I dumped the relevant code and context into ChatGPT.
The answer was simple in hindsight: on that specific execution path, I was passing a set. A pair of curly braces had survived from an earlier version of the code, back when the value was interpolated into an f-string with a different structure. Outside an f-string, curly braces build a set literal. They were doing exactly what Python says they do.
Once that was pointed out, the bug was trivial to fix.
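The bug class is easy to reproduce in isolation. The names below are made up for illustration; only the mechanism matches what happened:

```python
url = "https://example.com/data"

# Earlier version: the braces were placeholders inside an f-string.
old_style = f"fetching {url}"  # a str, as expected

# After a refactor dropped the f-string prefix on one path,
# the braces survived. Bare curly braces are a set literal,
# so this builds a one-element set, not a string:
not_a_string = {url}

print(type(not_a_string))  # <class 'set'>
```

The value looks right in the source because the braces read as leftover formatting, which is exactly why it was hard to spot.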
And here’s the strange part.
The moment that bug was resolved — literally within a second — the solution to the slice validation problem snapped into place as well.
I wasn’t thinking about slices. I wasn’t thinking about timestamps. I wasn’t thinking about manifests.
But the mental tension that had been sitting unresolved suddenly collapsed.
The Realization
The mistake wasn’t in the code.
It was in the question.
I had been asking:
How do I validate a slice?
But slices don’t encode temporal correctness on their own.
The correct question was:
What structure do these timestamps form when considered together?
Once I asked that, the answer was obvious.
Why Object Thinking Failed
Each slice is a rich object:
- timestamps
- symbols
- paths
- metadata
- status flags
That richness creates a kind of gravity. It suggests that correctness should be checkable inside the object.
But temporal correctness doesn’t live there.
Checking individual slices led to:
- field validation
- count checks
- status flags
- heuristics
None of them could express continuity.
Why Collection Thinking Still Wasn’t Enough
The next instinct is to zoom out:
Validate the collection.
But if your collection is a dictionary or set, you’re still stuck.
Counts don’t encode time. Membership doesn’t encode adjacency. An unordered collection has no notion of “next.”
A dictionary of slices is not a temporal structure.
The Missing Step: Ordered Projection
The solution was to stop treating slices as the primary object of validation.
Instead:
- Project a single field — the start timestamp
- Order it
- Examine adjacency
Once ordered, each timestamp implicitly defines an edge to the next one.
That’s where the invariant lives.
If the delta between adjacent timestamps is correct, the slices are correct. If it isn’t, something is wrong — regardless of how “valid” each slice looks in isolation.
This validation:
- does not work on a set
- does not work on a map
- only works on an ordered sequence
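The whole move fits in a few lines. This is a minimal sketch, assuming each slice exposes a `"start"` datetime (the field name and dict shape are assumptions, not the real schema):

```python
from datetime import datetime, timedelta

def validate_temporal_continuity(slices, expected_delta: timedelta):
    """Project the start timestamps, order them, and check every adjacent gap."""
    starts = sorted(s["start"] for s in slices)
    problems = []
    for prev, curr in zip(starts, starts[1:]):
        gap = curr - prev
        if gap != expected_delta:
            problems.append((prev, curr, gap))
    return problems

# Hourly slices with one missing hour: the 02:00 → 04:00 gap is flagged.
hourly = [{"start": datetime(2024, 1, 1, h)} for h in (0, 1, 2, 4)]
print(validate_temporal_continuity(hourly, timedelta(hours=1)))
```

Note that nothing here inspects a slice beyond its start timestamp; the invariant is checked entirely on the ordered projection.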
The Graph View (Why This Finally Made Sense)
In graph terms:
- Slices are nodes
- Ordering by timestamp induces edges
- Temporal correctness lives on those edges, not on the nodes
I wasn’t missing data. I was missing edges.
And edges only appear after ordering.
Why This Was So Hard to See
If this were SQL, it would have been obvious.
A window function over ordered timestamps practically begs you to check deltas. SQL makes projection and ordering first-class, so column-level invariants are visible.
But object-based representations hide that move.
They quietly encourage you to believe that:
- fields belong to objects
- validation should be local
- collections are unordered unless stated otherwise
None of that is true for temporal data.
The Lesson
The lesson isn’t “validate across objects.”
It’s this:
Some invariants only exist on ordered projections of data. If you stay at the object level or the unordered-collection level, those invariants are unexpressible.
Or more practically:
If correctness depends on sequence, validation must operate on ordered structure — not individual objects, and not generation-time assumptions.
Closing
What’s funny about this is that once you see it, it feels trivial.
Of course you’d sort timestamps. Of course you’d compare deltas. Of course correctness lives between slices, not inside them.
But that simplicity only appears after you give yourself permission to leave object-centric thinking behind.
And sometimes, oddly enough, that permission arrives while debugging a URL parser.