Walk a producing field. Every barrel that comes out of the ground gets measured. Not approximated, not estimated, measured. The wellhead has a meter. The tank battery has a gauge. The truck has a meter ticket. The pipeline has custody transfer measurement that’s calibrated on a regular schedule and audited by people whose job is to catch discrepancies.
Nobody runs a producing field on the honor system. The whole industry would fall apart inside of a quarter.
Now look at the data side of the same business. The production volume gets pulled from a SCADA system, transformed through a series of stored procedures nobody fully remembers, joined to a working interest table that may or may not be current, and lands in a monthly report. Somewhere in that chain, the rules that turn raw measurements into the numbers the business runs on are applied. Most of those rules live in code that hasn’t been reviewed in years, in spreadsheets maintained by one person, or in vendor systems whose internals are a black box.
If we measured oil the way we measure data, the industry would be running on apologies.
This is Part 2 of our 42 Gallons series. Part 1 was about lineage and provenance, the chain of custody that lets you answer where a number came from and how it got here. (You can read it here.) This post is about what happens between the source and the final number. Measurement, governance, and the work of being ready for whatever the business decides to do next.
Clean at ingestion, not at reporting
The default pattern in upstream data looks like this. Data lands wherever the source system put it. Analysts pull from those systems into spreadsheets or a warehouse. Quality issues get caught and fixed at reporting time, by hand, by whoever is closest to the problem.
That works, in the same way that hand-gauging every tank works. It produces an answer. It doesn’t scale, it doesn’t survive turnover, and the rules that get applied tend to drift from one analyst to the next.
The version that actually works pushes the quality and governance work upstream, to the ingestion layer. When a record arrives from a SCADA system, the rules get applied right there. Out-of-range values get flagged immediately. Missing fields get checked against the source and reconciled before the data lands in the warehouse. Identifiers get normalized at the edge, not three queries deep in someone’s report.
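To make that concrete, here is a minimal sketch of a boundary check in Python. The field names, bounds, and the 14-character identifier convention are illustrative assumptions, not a prescription; real rules would come from the governed rule set. What matters is where this runs: on every record, at ingestion.

```python
from dataclasses import dataclass

# Hypothetical bounds and field names; in practice these live in the
# governed rule set, not in constants buried in a script.
OIL_BBL_RANGE = (0.0, 5000.0)
REQUIRED_FIELDS = ("api_number", "prod_date", "oil_bbl")

@dataclass
class BoundaryResult:
    record: dict   # the normalized record
    flags: list    # every rule violation found at the edge

def normalize_api_number(raw: str) -> str:
    """Normalize the well identifier at the edge: digits only, padded
    out to 14 characters (an illustrative convention, not a standard)."""
    digits = "".join(ch for ch in str(raw) if ch.isdigit())
    return digits.ljust(14, "0")

def validate_at_boundary(record: dict) -> BoundaryResult:
    flags = []
    # Missing fields are flagged before the record lands in the warehouse.
    for name in REQUIRED_FIELDS:
        if record.get(name) in (None, ""):
            flags.append(f"missing:{name}")
    # Out-of-range values are flagged immediately, not at reporting time.
    oil = record.get("oil_bbl")
    if oil is not None and not OIL_BBL_RANGE[0] <= oil <= OIL_BBL_RANGE[1]:
        flags.append(f"out_of_range:oil_bbl={oil}")
    # Identifiers are normalized here, not three queries deep in a report.
    cleaned = dict(record)
    if cleaned.get("api_number"):
        cleaned["api_number"] = normalize_api_number(cleaned["api_number"])
    return BoundaryResult(record=cleaned, flags=flags)
```

A pipeline calls something like validate_at_boundary on each incoming record and routes anything with flags to an owner before the load completes, so nothing dirty lands quietly.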
We covered the technical side of this in OCC Data Ingestion: Automating What Most Companies Still Do by Hand. The OCC pipeline is one example. The principle generalizes. Every data flow into your business should have the same posture. Validate at the boundary. Apply the rules where the data enters. Make the warehouse the place where clean, governed data lives, not the place where dirty data goes to die.
Governed end-to-end
Governance gets talked about as a layer you add on top. A catalog. A committee. A set of policies that live in a SharePoint site nobody opens.
End-to-end governance is something different. It means the rules that define how data is supposed to behave are encoded in the systems that move the data, not in documents that describe the systems. Every transformation has an owner. Every threshold has a value somebody put there for a reason. Every override is logged with a person and a justification.
Practically, this looks like a few specific things.
Quality checks that run automatically. Not a quarterly review. Checks that fire on every load and surface failures before the data is used downstream. The check itself is the policy. The policy and the enforcement are the same artifact.
Versioned rules. When the production allocation methodology changes, the change is captured. The new rule is tagged with the date it took effect. Historical periods continue to use the rule that was in effect at the time. Nobody has to remember that allocations were calculated differently before March. (A sketch of what this can look like in code follows this list.)
Owned domains. The well master has a name attached to it. So does the working interest register. So does the production allocation methodology. When a question comes up, there is a specific person who has the authority to answer it. We covered the lightweight version of this in Data Governance for Energy Companies That Don’t Have a Data Team, and it scales up cleanly to operators who have a dedicated team.
Auditable history. Every change to a rule, every override, every reconciliation gets recorded. Not because anyone enjoys writing audit code, but because in upstream the question “what did the data look like as of last March” is going to come up, and the answer needs to be retrievable in minutes.
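To make the versioned-rules and auditable-history items concrete, here is a minimal sketch of an effective-dated rule in Python. The names, dates, and owner are invented; the property that matters is that asking for a historical period returns the rule that was in effect then, without anyone remembering the changeover.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class AllocationRule:
    """A versioned rule: what it is, when it took effect, who owns it."""
    name: str
    effective_from: date
    effective_to: Optional[date]  # None means currently in effect
    owner: str                    # owned domains: a named person

# Hypothetical rule history. The old rule is retained, never overwritten.
RULES = [
    AllocationRule("allocation_v1", date(2023, 1, 1), date(2024, 2, 29), "j.smith"),
    AllocationRule("allocation_v2", date(2024, 3, 1), None, "j.smith"),
]

def rule_as_of(as_of: date) -> AllocationRule:
    """Historical periods use the rule that was in effect at the time."""
    for rule in RULES:
        if rule.effective_from <= as_of <= (rule.effective_to or date.max):
            return rule
    raise LookupError(f"no allocation rule in effect on {as_of}")

# A February 2024 volume is allocated under the old methodology, automatically.
assert rule_as_of(date(2024, 2, 15)).name == "allocation_v1"
```

The same shape is what makes "what did the data look like as of last March" answerable in minutes: the as-of date selects the rule, and the audit trail holds everything else.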
The companies that have done this work do not talk about governance as a separate function. It’s just how the data moves. The metaphor in Part 1 still holds. The barrel’s run ticket is governance. It isn’t a separate document. It’s part of how the barrel moves.
Divestiture-ready from day one
Here is the test we use with operators when we want to know whether their data is actually in shape.
If you put a non-core asset on the market tomorrow, how long would it take to assemble the data room. Not a rough version. The version that holds up to a sophisticated buyer’s technical diligence team.
For most operators, the honest answer is somewhere between four and twelve weeks of internal scramble. Engineers pulled off other work. Land staff combing through file cabinets. Accounting reconstructing production histories. Outside consultants brought in to massage spreadsheets into something defensible. The deal team flying blind for the first month while the data gets pulled together.
That is not a state anybody has chosen deliberately. It’s the state most operators are in because nobody designed the data systems with divestiture in mind. The data was built to support operations, not to support a transaction. When the transaction shows up, the gap shows up with it.
Divestiture-ready from day one is the opposite posture. The data is structured, governed, and traceable to the point that producing a complete asset package is a query, not a project. Production history is reconciled to accounting. Land records are current and authoritative. Regulatory filings are linked to the wells they describe. The chain of custody from meter to monthly report is documented well enough that a buyer’s team can audit it without an internal escort answering questions for a month.
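To give "a query, not a project" some shape: under a governed, well-keyed model, each section of the data room reduces to a parameterized pull. The table and column names below are hypothetical, loosely PPDM-flavored, and stand in for whatever the real schema is.

```python
# Hypothetical schema; a real PPDM-aligned model uses its own table names.
# The shape is the point: one parameterized query per data-room section,
# run against data that was governed long before the deal showed up.
ASSET_PACKAGE_SQL = """
SELECT w.api_number,
       w.well_name,
       p.prod_date,
       p.oil_bbl,
       p.gas_mcf,
       wi.working_interest,
       wi.effective_from
FROM   well w
JOIN   production_monthly p ON p.well_id = w.well_id
JOIN   working_interest wi  ON wi.well_id = w.well_id
WHERE  w.asset_id   = %(asset_id)s
  AND  p.prod_date <= %(cutoff_date)s
ORDER  BY w.api_number, p.prod_date
"""
```

Note that the cutoff date is just a parameter, which is why "what cutoff date do you want" becomes the right answer later in this post.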
This isn’t a fantasy. We’ve worked with operators who have gotten there. The cost of getting there is the same work that makes monthly close faster, makes diligence easier, and makes the analytical layer trustworthy. Divestiture-ready is a side effect of running well, not a special preparation that happens when a sale is on the table.
The companies that wait until the deal is in front of them pay a real price. The diligence findings come back uglier. The buyer’s confidence drops. The price moves. Sometimes the deal does not close at all.
What this looks like in practice
The pattern we’ve seen work, repeatedly, looks like this.
Data flows from source systems into ingestion pipelines that validate at the boundary. The pipelines record what they pulled, when, and what rules they applied. The data lands in a model the business has chosen, which for most upstream operators ends up being PPDM-aligned. We covered the case for that in What the PPDM Model Actually Gives You (and What It Doesn’t).
Quality checks run on every load. Failures surface immediately, with enough context that an owner can act on them without a forensic investigation. Overrides and manual corrections are first-class objects, captured with a person, a timestamp, and a reason.
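One way to make an override a first-class object, sketched in Python with invented names, wells, and volumes:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Override:
    """A manual correction captured as data, not as an untracked edit."""
    record_key: str        # which record was corrected
    field_name: str        # which field
    original_value: str    # what it said before, kept retrievable
    corrected_value: str   # what it says now
    corrected_by: str      # a person, not a shared service account
    corrected_at: datetime
    reason: str            # the justification is required, not optional

    def __post_init__(self):
        if not self.reason.strip():
            raise ValueError("an override without a reason is not accepted")

# Hypothetical example: the well, volumes, and names are made up.
ov = Override(
    record_key="35-017-21009/2024-03",
    field_name="oil_bbl",
    original_value="0",
    corrected_value="412",
    corrected_by="m.jones",
    corrected_at=datetime.now(timezone.utc),
    reason="meter freeze-off; volume recovered from run tickets",
)
```

The constructor refuses a blank reason, which is the governance point from earlier: the policy and the enforcement are the same artifact.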
Reports and analytics pull from the governed data. They do not have their own quality logic embedded in them. The reporting layer assumes the data is clean, because the cleanliness was enforced upstream.
When a buyer asks for a data room, the answer is “what cutoff date do you want.” Not “give us six weeks.”
This is the destination. Most operators are not there yet. Most can be, with deliberate work and a willingness to push the discipline upstream rather than continuing to absorb it at reporting time.
Why this is the work that pays off
The argument for this kind of investment is sometimes hard to make in advance, because the payoff comes during events most operators are not actively planning for. A divestiture they didn’t anticipate. A regulatory request they couldn’t predict. A capital raise that suddenly needs three years of clean production history. An acquirer’s diligence team showing up with a checklist.
Companies that have done the work do not need a special project to handle those events. The data is ready because the operating posture is one of measurement and governance, all the time. Companies that have not done the work end up paying for it under deadline pressure, when the cost is highest and the options are narrowest.
The first post in this series argued that the chain of custody is the foundation. This post is about what gets built on that foundation. Measurement at the edge. Governance baked into the flow. A state of readiness that lets the business move quickly when it needs to.
The third post is going to bring this back around to standardization. The industry standardized the barrel in 1866 and saved itself a century of disputes. The cost of not standardizing the data shows up most clearly in the diligence room, and that’s where Part 3 will pick up.
What we do
This is the work we do for upstream operators. We build ingestion pipelines that validate at the boundary, model data in a way that supports governance natively, and help operators get to a state where divestiture-ready isn’t a panic. It’s the default.
We were just at the PPDM Energy Data Convention in Houston, April 27 through 29, where this conversation came up over and over. If a divestiture or a diligence package is on your horizon, or if you’re tired of the monthly scramble producing numbers you can’t fully explain, start a conversation. We’d like to hear what you’re working on.
Further Reading
- 42 Gallons, Part 1: You’ve Known What’s in the Barrel for 150 Years
- OCC Data Ingestion: Automating What Most Companies Still Do by Hand
- Data Governance for Energy Companies That Don’t Have a Data Team
- What the PPDM Model Actually Gives You (and What It Doesn’t)
- Data Quality in Upstream Oil and Gas: What Goes Wrong and Where to Start