Why Wipple Works · The Geometry of WIP

01

A job is a point in space

Start with something intuitive: color. We know from the pixels on our computers that every color can be represented as three numbers (Red, Green, Blue).

Below, we represent those values in 3D space. You can move independently in the "blue dimension", the "red dimension", or the "green dimension". The direction of that vector (the combination of all three moves) from the origin is the hue. The magnitude is brightness.

We can use similar logic with the values of a job, visualized by switching to Job mode. The directions we can move in become Costs Incurred, Contract Value, and Billings to Date. The math is identical. The direction from the origin now represents all of the characteristics of the job, fused together into one simple coordinate. The magnitude now represents job size.

A real WIP actually lives in four dimensions, with Estimated Total Cost as the fourth axis. Since we can't draw that, this demo collapses one dimension by treating Estimated Cost as equal to Contract Value (a zero-margin job), so % Complete reads directly as cost ÷ contract. The geometry is unchanged; production just carries one more axis of it.

The gray diagonal plane in Job mode is where Billings = Costs: the job has billed exactly what it has spent, so it is neither overbilled nor underbilled. A job's distance from that plane measures its over/under billings, readable directly from the geometry.
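As a rough sketch of that geometry (the function name and the sample figures are illustrative assumptions, not part of the demo), a job's three values can be treated as a vector: its length is job size, its unit direction is the job's profile, and its signed distance from the Billings = Costs plane is the over/under billing scaled by 1/√2.

```python
import math

def job_geometry(costs, contract, billings):
    """Return (magnitude, unit direction, signed distance to the Billings = Costs plane)."""
    vec = (costs, contract, billings)
    magnitude = math.sqrt(sum(v * v for v in vec))
    direction = tuple(v / magnitude for v in vec) if magnitude else (0.0, 0.0, 0.0)
    # The plane is billings - costs = 0; its normal (-1, 0, 1) has length sqrt(2).
    # Positive distance means billings run ahead of costs (overbilled).
    signed_distance = (billings - costs) / math.sqrt(2)
    return magnitude, direction, signed_distance

# Two made-up jobs with the same profile at different scales (values in $k).
small = job_geometry(300.0, 1000.0, 400.0)
large = job_geometry(3000.0, 10000.0, 4000.0)
```

Both jobs point in the same direction, so they share a profile; the second is ten times larger and sits ten times farther from the plane.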

[Interactive demo: a rotatable 3D cube with Red, Green, Blue, and Magnitude sliders, a named-color picker, and a Job mode toggle. Drag the canvas to rotate; link the sliders to lock direction.]

02

Dimensions sound scarier than they actually are

Dimensions aren't that complex: they're just numbers that go up or down independently.

On that cube, you could move as much as you'd like in the red direction, and it would never affect how blue something is. Red has its own direction. Blue has its own direction. They can change independently.

We extended that idea to a job's finances by making each column a direction you can move in. Costs Incurred was one direction. Contract Value was another. Billings to Date was another.

But now let's look at what we can do if we make each row a dimension instead.

Jobs don't interact with each other. Changing the Contract Value of Job 1 doesn't mess with Job 2 at all, just like adjusting red does not affect blue. Each job's value is a number that can go up or down independently, which sounds exactly like a dimension.

So instead of one job living inside a cube made out of financial columns, let's flip the whole picture around. Now each job gets its own little number line.

If we only had three jobs, it would look something like this:

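[Figure: three horizontal number lines labeled Job 1, Job 2, and Job 3, each marked with a red, a blue, and a green point.]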

Each row has three labeled points. The red point came from one column, the blue point came from another column, and the green point came from a third.

red − blue = green for every job

That's the relationship we're looking for. On every job, Contract Value minus Estimated Cost should equal Estimated Gross Profit.

Contract[i] − Est. Cost[i] = Est. Gross Profit[i] for every job i

This is what makes a WIP a system rather than a row-by-row check. You're not validating Job 1, then Job 2, then Job 3. You're asking whether the same relationship holds across every job at once.

With three jobs, we can cube it again. Job 1 becomes one direction, Job 2 becomes another, and Job 3 becomes another. A column is now a point inside that job-space.

The red column is one point. The blue column is another point. The green column is another point. And the valid answers all sit on the same plane: the plane where red minus blue equals green.

And more jobs means more confidence, not less. Two random columns might accidentally satisfy an identity on one or two jobs. Satisfying it across 50, 100, or 200 jobs is essentially impossible unless the columns genuinely are what the identity says they are.
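One way to picture the "same relationship across every job at once" check is as a single vector residual. This is a minimal sketch with made-up numbers; the function name and the decoy column are assumptions for illustration.

```python
def identity_residual(contract, est_cost, gross_profit):
    """Largest absolute miss of contract - est_cost = gross_profit over all jobs."""
    return max(abs(c - e - g) for c, e, g in zip(contract, est_cost, gross_profit))

# Three made-up jobs, values in $k. Each list is one column, i.e. one point in job-space.
contract = [1200.0, 850.0, 2300.0]
est_cost = [1000.0, 700.0, 2000.0]
gross_profit = [200.0, 150.0, 300.0]
decoy = [200.0, 150.0, 310.0]  # agrees on two jobs, misses the third by 10

residual_ok = identity_residual(contract, est_cost, gross_profit)
residual_decoy = identity_residual(contract, est_cost, decoy)
```

The decoy matches on two of three jobs and still fails, because the residual is taken over the whole column rather than row by row.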

03

Columns identify themselves

Here's that process made interactive. Five unlabeled columns, 10 jobs each. Three are real: Contract Value, Est. Cost at Completion, and Est. Gross Profit. Two are decoys that look like plausible numbers but break the identity.

The identity: Contract Value − Est. Cost = Est. Gross Profit. That's it. If your column assignment satisfies this equation for every single job simultaneously, you've found the right columns. If even one job breaks it, something is wrong.

Start with the column carrying the largest numbers; the simplest prior in the book says that's Contract Value. Once you assign it, the columns that stay below contract on every job light up as cost candidates. Assign the cost column, and exactly one remaining column satisfies the identity on every job: that's your Gross Profit. Cards stay draggable the whole time, so you can shuffle them between slots freely.
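The three moves described above can be sketched in a few lines. This is an illustrative approximation, not the demo's actual code: the helper names, the tolerance, and the five sample columns are all assumptions.

```python
def guess_contract(columns):
    """Prior only: pick the column with the largest portfolio total."""
    return max(columns, key=lambda name: sum(columns[name]))

def cost_candidates(columns, contract):
    """Columns that stay strictly below contract on every job."""
    ref = columns[contract]
    return [name for name, col in columns.items()
            if name != contract and all(v < c for v, c in zip(col, ref))]

def gross_profit_matches(columns, contract, cost, tol=0.5):
    """Remaining columns that satisfy contract - cost = column on every job."""
    predicted = [c - e for c, e in zip(columns[contract], columns[cost])]
    return [name for name, col in columns.items()
            if name not in (contract, cost)
            and all(abs(p - v) <= tol for p, v in zip(predicted, col))]

# Made-up data: five unlabeled columns, four jobs, values in $k.
columns = {
    "A": [1200, 850, 2300, 640],
    "B": [1000, 700, 2000, 560],
    "C": [200, 150, 300, 80],
    "D": [210, 140, 300, 95],   # decoy: close to C but breaks the identity
    "E": [900, 800, 1500, 700], # decoy: exceeds contract on the last job
}
contract = guess_contract(columns)
candidates = cost_candidates(columns, contract)
profit = gross_profit_matches(columns, contract, "B")
```

Once "B" is placed in the cost slot, only "C" survives as Gross Profit; "D" fails on three of the four jobs and "E" was never a cost candidate.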

[Interactive demo: five mystery columns (10 jobs, values in $k) to drag into three slots: Contract Value (biggest numbers) − Est. Cost at Completion (less than contract) = Est. Gross Profit (contract minus cost). Assign all three columns to score. Hint: start with the biggest numbers, which is Contract Value.]

04

There is a catch: some columns are swappable

The identity Contract − Est. Cost = Est. GP is symmetric in Est. Cost and Est. GP. Rearranged, it says Contract = Est. Cost + Est. GP, and addition doesn't care about order. Swap the two and the equation still balances. The math can't tell them apart on its own, because both assignments satisfy the identity perfectly on every job.

This is the fundamental ambiguity problem in header-blind column identification: any formula that's algebraically symmetric produces multiple valid assignments. The system needs a way to break that symmetry.

The answer isn't more math on the same columns. It's witness columns: other columns on the WIP that interact with the core in asymmetric ways. If even one column's relationship to the core can only be satisfied by one of the two swappable candidates, the ambiguity collapses immediately.

The good news: you don't need many. As long as the right combination of witnesses is present on the WIP, as few as two additional columns are enough to validate and fully identify everything, almost instantly, still without reading a single header.
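A small sketch of the ambiguity and of one witness resolving it. The witness chosen here, % Complete = Cost to Date ÷ Est. Cost, assumes the WIP reports progress on a cost-to-cost basis; the function names and figures are likewise assumptions for illustration.

```python
def holds(lhs, rhs, tol=1e-6):
    """True when two columns agree on every job."""
    return all(abs(a - b) <= tol for a, b in zip(lhs, rhs))

# Made-up data, values in $k.
contract = [1200.0, 850.0, 2300.0]
col_x = [1000.0, 700.0, 2000.0]   # truly Est. Cost
col_y = [200.0, 150.0, 300.0]     # truly Est. GP
cost_to_date = [500.0, 350.0, 1500.0]
pct_complete = [0.5, 0.5, 0.75]   # witness column (assumed cost-to-cost)

def subtraction_ok(cost, gp):
    """Contract - cost = gp on every job."""
    return holds([c - e for c, e in zip(contract, cost)], gp)

def witness_ok(cost):
    """Cost to Date / cost = % Complete on every job."""
    return holds([k / e for k, e in zip(cost_to_date, cost)], pct_complete)

both_readings_balance = subtraction_ok(col_x, col_y) and subtraction_ok(col_y, col_x)
witness_picks_x = witness_ok(col_x) and not witness_ok(col_y)
```

The subtraction accepts either reading, while the witness divides by the cost estimate specifically, so only one candidate can satisfy it.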

05

The validation grid

Assume the core is present: contract value, the cost estimate, cost to date, and billings to date. Those four alone are a trap. From them you can compute every other WIP number, but you can't check anything, because there's no second column to compare any answer against. You can't tell whether a mapping is right without looking at columns where the numbers interact. That's what a validator column is for: it gives the engine two independent routes to the same numbers, and agreement between those routes is the proof. It's the same reason a credit card number carries a check digit. If every 16-digit string were a valid card number, no typo could ever be caught.

The grid shows what happens when you add one cost validator (rows) and one billing validator (columns) to the core. The two halves each need their own confirmation, since the cost side and the billing side don't overlap. Most combinations validate fully. The one systematic trap is Remaining Billings, which mirrors Billings to Date so exactly that the engine needs help from another column to tell them apart.
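The "two independent routes" idea can be sketched with a single validator. Here an Earned Revenue column is assumed to be present on the WIP (which validators the grid actually uses is not listed in this section), and the core predicts it through the identity Earned × Est. Cost = Contract × Cost to Date. Names, tolerance, and figures are illustrative.

```python
def earned_from_core(contract, est_cost, cost_to_date):
    """Route 1: earned revenue computed from the core columns."""
    return [c * k / e for c, e, k in zip(contract, est_cost, cost_to_date)]

def agrees(predicted, observed, tol=0.5):
    """Route 2 is the physical column; agreement on every job is the check."""
    return all(abs(p - o) <= tol for p, o in zip(predicted, observed))

# Made-up data, values in $k.
contract = [1200.0, 850.0, 2300.0]
est_cost = [1000.0, 700.0, 2000.0]
cost_to_date = [500.0, 350.0, 1500.0]
earned_column = [600.0, 425.0, 1725.0]  # validator column as printed on the WIP

right_mapping = agrees(earned_from_core(contract, est_cost, cost_to_date), earned_column)
# Same columns, but with Est. Cost and Cost to Date assigned the wrong way round.
wrong_mapping = agrees(earned_from_core(contract, cost_to_date, est_cost), earned_column)
```

Without the validator column both mappings would compute an answer and neither could be refuted; with it, only the correct mapping agrees.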

06

Peeling: identification spreads like a chain reaction

The grid above says which column sets are sufficient. Here's how the engine actually resolves one. The procedure is called peeling, borrowed from how error-correcting codes are decoded, though you already know it from sudoku: every value you lock in creates new cells you can deduce.

It starts with a tiny hypothesis, four seed anchors: Contract Value, one estimate column, Cost to Date, and Billings to Date. From there, any formula whose inputs are all known predicts an entire column of numbers for some still-unknown variable. That prediction is held up against the unassigned physical columns. If one matches, job by job, it's identified, and it becomes a known input for the next round of formulas. The cascade runs until every column is explained.

Two details matter. When a predicted variable matches no physical column, the engine constructs it virtually and keeps going. A missing column never stalls the cascade, but a virtual column earns zero evidence. And when several formulas are different spellings of the same underlying fact (Underbillings, Overbillings, and Net Billing are one identity wearing three hats), their matches are merged into a single unit of evidence. Counting them separately would be the engine citing itself as a source. It counts independent confirmations, not formulas.
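A toy version of the cascade, under loud assumptions: the three-entry formula table, the variable names, and the tolerance are invented for illustration, and the merging of equivalent formulas into one unit of evidence is left out for brevity.

```python
def matches(predicted, column, tol=0.5):
    return all(abs(p - c) <= tol for p, c in zip(predicted, column))

# Hypothetical formula table: target -> (required inputs, per-job function).
FORMULAS = {
    "est_gp": (("contract", "est_cost"), lambda c, e: c - e),
    "earned": (("contract", "est_cost", "cost_to_date"), lambda c, e, k: c * k / e),
    "over_under": (("billings", "earned"), lambda b, r: b - r),
}

def peel(seeds, unassigned):
    """Repeatedly predict unknown variables and match them to physical columns."""
    known, pool = dict(seeds), dict(unassigned)
    log, evidence, progress = [], 0, True
    while progress:
        progress = False
        for target, (inputs, fn) in FORMULAS.items():
            if target in known or not all(i in known for i in inputs):
                continue
            predicted = [fn(*vals) for vals in zip(*(known[i] for i in inputs))]
            hit = next((n for n, col in pool.items() if matches(predicted, col)), None)
            if hit is not None:
                known[target] = pool.pop(hit)   # identified: counts as evidence
                evidence += 1
            else:
                known[target] = predicted       # virtual: keeps going, no evidence
            log.append((target, hit or "virtual"))
            progress = True
    return log, evidence

seeds = {
    "contract": [1200.0, 850.0, 2300.0],
    "est_cost": [1000.0, 700.0, 2000.0],
    "cost_to_date": [500.0, 350.0, 1500.0],
    "billings": [650.0, 400.0, 1700.0],
}
unassigned = {"col_5": [200.0, 150.0, 300.0], "col_6": [50.0, -25.0, -25.0]}
log, evidence = peel(seeds, unassigned)
```

No physical column carries earned revenue here, so it is built virtually, and that virtual column is what lets `col_6` be identified one step later.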

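[Interactive demo: starting from the four hypothesized seed anchors, press Step to advance the cascade. A cascade log records each step, and an evidence panel counts independent confirmations.]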

07

Breaking ties with economics

One blind spot the algebra cannot fix from inside: perfect symmetries. Swap Est. Cost with Est. GP and the subtraction still balances (section 04). Less obviously, the earned-revenue identity Earned × Est. Cost = Contract × Cost to Date survives swapping the pairs (Contract, Cost to Date) ↔ (Earned, Est. Cost). The algebra is equally happy with both readings.

What breaks them isn't more algebra. It's the fact that these numbers describe a business. Three observable regularities act as tiebreakers, and crucially, they're only ever used to rank candidates and orient readings, never as proof:

Band

Contractors don't run 60% gross margins. Est. Cost ÷ Contract sits high, typically 55–98%, while margin sits low. Whichever way you read the estimate column, only one orientation lands in a believable band.

Stability

An estimate ratio is a property of how the contractor bids, so it's tight across the whole portfolio. Progress ratios like % complete and % billed are a property of where each job is in its life, so they smear across 0–100%. Tight versus spread separates estimate columns from progress columns at a glance.

Order

Cost to date rarely exceeds estimated cost, and billings rarely exceed contract by much. The first of these, Cost to Date ≤ Est. Cost, is exactly the observation that settles the pair-swap above.
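The three tiebreakers reduce to simple portfolio statistics. The sketch below is an assumption-laden illustration: the function names are invented, the 55–98% band is taken from the text above, and the population standard deviation stands in for whatever spread measure the engine really uses.

```python
from statistics import pstdev

def ratios(column, contract):
    return [v / c for v, c in zip(column, contract) if c]

def in_band_share(column, contract, lo=0.55, hi=0.98):
    """Band: share of jobs whose column / contract lands in the believable range."""
    r = ratios(column, contract)
    return sum(lo <= x <= hi for x in r) / len(r)

def spread(column, contract):
    """Stability: how widely column / contract varies across the portfolio."""
    return pstdev(ratios(column, contract))

def order_share(cost_to_date, est_cost):
    """Order: share of jobs where cost to date does not exceed the estimate."""
    return sum(k <= e for k, e in zip(cost_to_date, est_cost)) / len(est_cost)

# Made-up data, values in $k.
contract = [1200.0, 850.0, 2300.0, 640.0]
est_cost = [1000.0, 700.0, 2000.0, 560.0]
est_gp = [200.0, 150.0, 300.0, 80.0]
cost_to_date = [100.0, 350.0, 1500.0, 540.0]

band_cost = in_band_share(est_cost, contract)
band_gp = in_band_share(est_gp, contract)
estimate_is_tighter = spread(est_cost, contract) < spread(cost_to_date, contract)
order_right = order_share(cost_to_date, est_cost)
order_swapped = order_share(est_cost, cost_to_date)
```

These scores only rank the candidate readings; none of them is treated as proof.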

08

Error detection is automatic

Identification and certification run at two different tolerances, and the gap between them is deliberate.

Identification is forgiving. A column is placed if it fits a loose tolerance on all but a small number of rows, and at least one bad row is always tolerated. This is a structural guarantee, not generosity: a single OCR-mangled cell must not be allowed to eject the true column from consideration, or the engine would silently "explain" the table while hiding its worst cell inside a virtual column.

Certification is merciless. Once columns have names, every accepted identity is re-checked on every row at propagated dollar precision. A $37 misprint doesn't lower some confidence score. It produces a named row, a named column, the observed value, the expected value, and the signed difference.

Be clear about what a flagged cell means, though: it isn't the system failing. It's the system working. The columns were identified fine; the table simply contradicted itself in one spot, the engine caught it, and everything it reports from there is an explanation of exactly where and why.
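The two tolerances can be sketched side by side. The thresholds here (1% relative for identification, half a dollar for certification, one tolerated bad row) are assumptions chosen for illustration, as are the function names and the sample figures.

```python
def identify_match(predicted, column, rel_tol=0.01, max_bad_rows=1):
    """Loose: place the column if it fits on all but a few rows."""
    bad = sum(abs(p - c) > rel_tol * max(abs(p), 1.0)
              for p, c in zip(predicted, column))
    return bad <= max_bad_rows

def certify(name, predicted, column, abs_tol=0.5):
    """Strict: report every row that misses, with the signed difference."""
    findings = []
    for row, (p, c) in enumerate(zip(predicted, column)):
        if abs(c - p) > abs_tol:
            findings.append({"row": row, "column": name, "observed": c,
                             "expected": p, "difference": c - p})
    return findings

# Made-up data in dollars: one $37 misprint and one cell with a dropped digit.
contract = [1_200_000.0, 850_000.0, 2_300_000.0]
est_cost = [1_000_000.0, 700_000.0, 2_000_000.0]
gp_column = [200_000.0, 150_037.0, 30_000.0]

predicted = [c - e for c, e in zip(contract, est_cost)]
placed = identify_match(predicted, gp_column)
findings = certify("est_gp", predicted, gp_column)
```

The column is still placed despite the mangled cell, and certification then names both bad cells rather than folding them into a score.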

Then comes the part that feels like magic but is just bookkeeping: triangulation. Every identity implicates exactly the cells it touches. Failing identities vote on a culprit; passing identities exonerate the innocent. The intersection usually pins a single cell, and because the surviving identities still connect that cell to the rest of the table, they jointly imply what the value should have been. The engine doesn't just find the error. It proposes the correction and tells you what kind of typo it was.
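A single-row sketch of the voting logic: cells common to every failing identity are suspects, and any cell touched by a passing identity is cleared. The three identities, the cell names, and the tolerance are assumptions for illustration, and the correction is worked by hand for this one case rather than solved generally.

```python
# Each identity: (cells it touches, residual that should be zero).
IDENTITIES = {
    "gross_profit": (("contract", "est_cost", "est_gp"),
                     lambda r: r["contract"] - r["est_cost"] - r["est_gp"]),
    "earned": (("earned", "est_cost", "contract", "cost_to_date"),
               lambda r: r["earned"] * r["est_cost"] - r["contract"] * r["cost_to_date"]),
    "net_billing": (("billings", "earned", "over_under"),
                    lambda r: r["billings"] - r["earned"] - r["over_under"]),
}

def triangulate(row, tol=0.5):
    """Return the cells implicated by every failing identity and by no passing one."""
    failing, passing = [], []
    for cells, residual in IDENTITIES.values():
        (failing if abs(residual(row)) > tol else passing).append(set(cells))
    if not failing:
        return set()
    suspects = set.intersection(*failing)   # failing identities vote
    for cells in passing:
        suspects -= cells                   # passing identities exonerate
    return suspects

# One made-up job, values in $k.
row = {"contract": 1200.0, "est_cost": 1000.0, "est_gp": 200.0,
       "cost_to_date": 500.0, "earned": 600.0, "billings": 650.0, "over_under": 50.0}
clean = triangulate(row)

row["est_gp"] = 280.0                       # simulate a misread digit
suspects = triangulate(row)
# With contract and est_cost exonerated, the failing identity implies the fix.
proposed = row["contract"] - row["est_cost"]
```

With only three identities, a corrupted Est. Cost would leave two suspects instead of one; more independent identities are what shrink the intersection to a single cell.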

Try it. Click any number to corrupt one of its digits, the way an OCR misread would, and watch the identities triangulate.

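[Interactive demo: a WIP table with clickable cells, a panel of identity checks run strictly on every row, and a diagnosis panel. Initial state: all identities hold and the table certifies clean.]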

09

Why this is fast and nearly free

The naive approach to column identification is brute force: try every assignment of columns to roles and check which one satisfies all the identities. With N columns that's N! orderings, and factorials do not negotiate:

8 columns → 8! = 40,320 orderings

12 columns → 12! = 479,001,600 orderings

30 columns → 30! = 265,252,859,812,191,058,636,308,480,000,000 orderings
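These counts are easy to reproduce. The sketch below also estimates how long a brute-force scan of the 30-column case would take, assuming a checker that tests one billion orderings per second and taking the age of the universe as roughly 13.8 billion years; the constant names are illustrative.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600
CHECKS_PER_SECOND = 1e9            # assumed brute-force throughput
AGE_OF_UNIVERSE_YEARS = 1.38e10    # approximate

# Number of ways to assign N columns to N roles.
orderings = {n: math.factorial(n) for n in (8, 12, 30)}

years = orderings[30] / CHECKS_PER_SECOND / SECONDS_PER_YEAR
universe_ages = years / AGE_OF_UNIVERSE_YEARS
```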

A heavyweight 30-column WIP, the kind a large contractor actually submits, is about 2.65 × 10³² orderings. Checking a billion per second would take roughly eight quadrillion years, around 600,000 times the age of the universe. The geometric approach never goes there. It exploits the structure of the problem to prune the search at each step:

Step 1 · O(N)

Shortlist the columns with the largest portfolio-wide values as Contract candidates. One pass over N columns. This is a prior, not proof. If no shortlisted reading survives, the list expands.

Step 2 · O(N × J)

Screen each remaining column against the candidate: does col[j] ÷ contract[j] stay inside its economic band on at least 90% of jobs? Robust by design: real WIPs carry OCR noise, rounding, and the occasional zero row, so a single bad value never disqualifies a column.

Step 3 · O(K × J)

Only the K surviving anchor placements (usually a handful) proceed to peeling. Each cascade is a fixed sequence of vector predictions and matches, and the validation grid already says which combinations can resolve, so this is closer to a lookup than a search.

Total · O(N × J)

Linear in columns, linear in jobs. Even a 30-column WIP with 200 jobs is on the order of thousands of vectorized checks, not nonillions. The combinatorial explosion never happens because each constraint eliminates candidates before the next step runs.
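Steps 1 and 2 can be sketched as two passes over the table. This is an illustrative approximation: the function names and sample data are assumptions, the 55–98% band is the cost band quoted in section 07, and the 90% threshold is the one stated in Step 2.

```python
def shortlist_contract(columns, top=2):
    """Step 1, O(N): rank columns by portfolio total and keep the largest few."""
    return sorted(columns, key=lambda n: sum(columns[n]), reverse=True)[:top]

def passes_band(column, contract, lo=0.55, hi=0.98, min_share=0.90):
    """Step 2, O(J) per column: is column / contract in band on enough jobs?"""
    pairs = [(v, c) for v, c in zip(column, contract) if c]
    inside = sum(lo <= v / c <= hi for v, c in pairs)
    return bool(pairs) and inside / len(pairs) >= min_share

# Made-up portfolio: 10 jobs, values in $k, columns deliberately unlabeled.
contract = [1000.0 + 100.0 * i for i in range(10)]
est_cost = [0.85 * c for c in contract]
est_cost[3] = 85.0                                  # one OCR-style dropped digit
est_gp = [0.15 * c for c in contract]
cost_to_date = [0.1 * (i + 1) * 0.85 * c for i, c in enumerate(contract)]
columns = {"col_a": contract, "col_b": est_cost, "col_c": est_gp, "col_d": cost_to_date}

best = shortlist_contract(columns, top=1)[0]
survivors = [name for name in columns
             if name != best and passes_band(columns[name], columns[best])]
```

The mangled cell in `col_b` costs it one job out of ten, which still clears the 90% bar, so only that column proceeds to peeling.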

The reason the pruning is so brutal: each GAAP identity carves the space of possible assignments down to a thin slice, and a valid assignment has to sit inside every slice at once. The engine isn't searching the whole space. It's applying one constraint at a time and keeping only what survives, and after three or four constraints almost nothing is left except the correct assignment.