Let’s be honest about something. The spreadsheet didn’t ruin your company. It ran it.
For years, maybe decades, your production data, your land records, your royalty calculations have lived in Excel. Someone on your team knows exactly which tabs to update, which formulas not to touch, and how to get the monthly numbers out the door. It works. Until it doesn’t.
The breaking point is never sudden. It’s the third time this quarter that two people pulled the same production report and got different numbers. It’s the realization that your land team and your accounting team have been working from different versions of the same royalty interest for months. It’s a due diligence request that should take a day but takes two weeks because the data is spread across forty files on a shared drive.
If that sounds familiar, you’re not behind. You’re normal. Most mid-size upstream operators are in exactly this spot.
The question isn’t whether you need to move past spreadsheets. It’s how to do it without blowing up what already works.
Why the all-at-once approach fails
The natural instinct is to go buy something. A new platform, a fancy database, an enterprise solution that promises to consolidate everything. Vendors love this. They’ll scope a six-figure project, assign a team, and start building.
Eighteen months later you have a partially migrated system that nobody trusts, a pile of spreadsheets that never actually went away, and a very expensive lesson in why technology purchases don’t solve data problems.
This happens because the problem was never the spreadsheet itself. The spreadsheet is a symptom of a deeper issue: there’s no shared, structured, governed foundation underneath your data. Buying a platform before addressing that is like pouring a driveway before you’ve graded the lot.
A phased approach that actually sticks
The operators we’ve seen succeed at this don’t try to migrate everything at once. They pick the spot where the pain is worst, prove out a better approach there, and expand from that foundation. It’s not exciting. It’s effective.
Here’s roughly what that looks like in practice.
Phase 1: Pick one problem and solve it properly
Don’t start with “we need to centralize all our data.” Start with the specific thing that’s costing you time or money right now.
For a lot of upstream operators, that's production data. Monthly volumes are pulled from the OCC (Oklahoma Corporation Commission), manually entered or pasted into spreadsheets, and then hand-reconciled against field reports and accounting. The same numbers get touched by three different people in three different formats before anyone can actually use them.
Take that one workflow and build it right. Pull OCC production data into a real database. Define the business logic for allocation and reconciliation once, in code, instead of in someone’s head. Make the output available to everyone who needs it without the manual steps.
This doesn’t require a massive platform. A PostgreSQL instance and a well-built pipeline will do the job for most operators at this scale.
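To make "business logic defined once, in code" concrete, here's a minimal sketch of a reconciliation check. The function name, field names, and the 2% tolerance are all illustrative assumptions, not any operator's actual rules; the point is that the rule lives in one place instead of in someone's head.

```python
# Illustrative sketch: reconcile OCC-reported volumes against field-reported
# volumes for the same well-month. Names and the 2% tolerance are
# hypothetical -- what matters is that the rule is written down once, in code.

def reconcile_volumes(occ_bbl: float, field_bbl: float,
                      tolerance: float = 0.02) -> dict:
    """Compare two reported oil volumes and flag variances above tolerance."""
    variance = occ_bbl - field_bbl
    base = max(abs(occ_bbl), abs(field_bbl), 1.0)  # avoid divide-by-zero
    pct = abs(variance) / base
    return {
        "variance_bbl": round(variance, 2),
        "variance_pct": round(pct, 4),
        "needs_review": pct > tolerance,
    }

# A 1% gap passes quietly; a 10% gap gets flagged for a human to look at.
print(reconcile_volumes(1000.0, 990.0))
print(reconcile_volumes(1000.0, 900.0))
```

Once a check like this runs automatically against every well-month, "two people pulled the same report and got different numbers" becomes an exception the system surfaces, not a surprise someone finds in a meeting.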
Phase 2: Automate the inputs
Once you have a real database with one dataset done right, the next step is automating the inputs that feed it. Instead of someone downloading files and uploading them manually, you build pipelines that pull from your sources on a schedule.
For energy operators, that usually means connecting to regulatory data (OCC filings, completion reports), SCADA or field data systems, and whatever your accounting or ERP platform is. Each connection is its own small project, and they don’t all have to happen at once.
The goal here is to eliminate the human data pipeline. The person who spends every Monday morning copying and pasting numbers into a spreadsheet should be freed up to do something that actually requires judgment.
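One detail that makes automated pipelines trustworthy is idempotency: a load that can safely re-run. Here's a hedged sketch of that pattern, using SQLite and a made-up file layout as a stand-in for a downloaded OCC export (column names and the table are illustrative assumptions):

```python
# Illustrative sketch of an idempotent load step. The CSV layout, column
# names, and table are hypothetical stand-ins for a downloaded OCC export.
# Keying on (api_number, prod_month) means re-running the same file -- or a
# corrected one -- updates rows in place instead of duplicating them.
import csv
import io
import sqlite3

def load_production(conn: sqlite3.Connection, csv_text: str) -> int:
    conn.execute("""
        CREATE TABLE IF NOT EXISTS monthly_production (
            api_number  TEXT NOT NULL,
            prod_month  TEXT NOT NULL,   -- e.g. '2025-01'
            oil_bbl     REAL,
            gas_mcf     REAL,
            PRIMARY KEY (api_number, prod_month)
        )""")
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    conn.executemany(
        """INSERT OR REPLACE INTO monthly_production
           (api_number, prod_month, oil_bbl, gas_mcf)
           VALUES (:api_number, :prod_month, :oil_bbl, :gas_mcf)""",
        rows)
    conn.commit()
    return len(rows)

sample = "api_number,prod_month,oil_bbl,gas_mcf\n35-001-00001,2025-01,1200,3400\n"
conn = sqlite3.connect(":memory:")
load_production(conn, sample)
load_production(conn, sample)  # re-run: same rows updated, no duplicates
print(conn.execute("SELECT COUNT(*) FROM monthly_production").fetchone()[0])  # prints 1
```

The Monday-morning copy-paste routine fails exactly this test: paste the same file twice and you get doubled numbers. A pipeline built this way doesn't care how many times it runs.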
Phase 3: Use PPDM to structure the connections
This is where a standard data model earns its keep. Once you have multiple datasets landing in a real database, you need a consistent way to organize them. Well data, production data, land data, and facility data all need to relate to each other in a way that’s predictable and understood.
The PPDM data model was built for exactly this. It gives you a shared vocabulary and structure for upstream oil and gas data that’s recognized across the industry. When your well records, production allocations, and land interests all follow the same structural conventions, connecting them stops being a custom project every time.
You don’t have to implement the entire PPDM model on day one. Start with the entities that matter for the problems you’re solving (wells, production, land) and expand as you go. The model is large. The part of it you need right now probably isn’t.
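To show what "start with the entities that matter" can look like, here's a deliberately tiny, PPDM-flavored sketch: a well table keyed by UWI and a production table that references it. These table and column names are simplified illustrations in the spirit of the model's well/production split, not the actual PPDM 3.9 DDL, which carries far more columns and constraints.

```python
# A tiny, PPDM-flavored sketch (SQLite for illustration). Table and column
# names are simplified -- the real PPDM 3.9 model is far larger. The point:
# a shared key (UWI) makes connecting datasets a join, not a project.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE well (
        uwi        TEXT PRIMARY KEY,   -- unique well identifier
        well_name  TEXT,
        spud_date  TEXT
    );
    CREATE TABLE pden_vol_by_month (   -- illustrative name, not exact PPDM DDL
        uwi        TEXT NOT NULL REFERENCES well (uwi),
        prod_month TEXT NOT NULL,
        oil_bbl    REAL,
        PRIMARY KEY (uwi, prod_month)
    );
""")
conn.execute("INSERT INTO well VALUES ('35-001-00001', 'SMITH 1-14', '2021-03-15')")
conn.execute("INSERT INTO pden_vol_by_month VALUES ('35-001-00001', '2025-01', 1200.0)")

# Because both tables share the UWI key, relating well and production data
# is a lookup, not a custom matching exercise.
row = conn.execute("""
    SELECT w.well_name, p.prod_month, p.oil_bbl
    FROM well w JOIN pden_vol_by_month p ON p.uwi = w.uwi
""").fetchone()
print(row)  # ('SMITH 1-14', '2025-01', 1200.0)
```

When land and facility data later land in the same database keyed the same way, they plug into this structure instead of requiring a new reconciliation effort.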
If you want more context on how PPDM fits into the broader data engineering picture for energy companies, we wrote about that in Why Oklahoma Energy Companies Can’t Afford to Ignore Data Engineering.
Phase 4: Build reporting on a foundation that’s actually trustworthy
Once the underlying data is structured, automated, and governed, dashboards and analytics actually start working. Not before.
The reason most BI projects disappoint isn’t the tool. It’s that the data underneath is inconsistent, stale, or scattered. When someone opens a dashboard and the numbers don’t match what they see in their own spreadsheet, they stop trusting the dashboard and go back to the spreadsheet. Game over.
But when the data is flowing automatically, reconciled according to defined business rules, and structured in a way that’s consistent across departments, the reporting layer becomes the easy part. Pick a tool your team is comfortable with. Point it at the database. The hard work is already done.
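One simple way to make "the reporting layer is the easy part" tangible: define the reporting logic as a database view, so every tool reads the same numbers. The sketch below uses SQLite and hypothetical table and view names purely for illustration.

```python
# Illustrative sketch: a single view every BI tool points at, so every
# dashboard shows the same totals. Table and view names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monthly_production (
        api_number TEXT, prod_month TEXT, oil_bbl REAL,
        PRIMARY KEY (api_number, prod_month)
    );
    INSERT INTO monthly_production VALUES
        ('35-001-00001', '2025-01', 1200.0),
        ('35-001-00002', '2025-01', 800.0);

    -- The one version of the truth the reporting tools read from.
    CREATE VIEW v_monthly_oil AS
        SELECT prod_month, SUM(oil_bbl) AS total_oil_bbl
        FROM monthly_production
        GROUP BY prod_month;
""")
print(conn.execute("SELECT * FROM v_monthly_oil").fetchall())  # [('2025-01', 2000.0)]
```

Whether the dashboard is Power BI, Tableau, or a spreadsheet connection, it queries the view, and the aggregation rules live in one governed place instead of being rebuilt inside each report.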
What you don’t need
You don’t need a data lake. You don’t need a six-figure platform license. You don’t need to hire five data engineers.
Most mid-size operators can get from spreadsheets to a functional, automated data stack with a PostgreSQL or SQL Server database, an orchestration tool like Airflow, and someone who knows how to wire it all together. The technology is not the bottleneck. Understanding the data, the business logic, and the industry context is what makes or breaks this work.
That’s why it matters to work with people who’ve actually dealt with OCC data, production allocations, and PPDM before. A generic data engineering shop can build you a pipeline. Whether it handles the nuances of working interest calculations or regulatory reporting formats is a different question entirely.
Start where it hurts
The biggest mistake is waiting for the perfect plan before doing anything. The second biggest mistake is trying to do everything at once. The path that works is smaller than you’d expect: pick the data that’s causing the most pain, build a proper foundation for it, and grow from there.
Every operator we’ve talked to who successfully made this transition says the same thing. They wish they’d started sooner, and they’re glad they started small.
See Us at PPDM 2026
We’ll be at the PPDM Energy Data Convention in Houston, April 27 through 29. Stop by Booth #2 if you want to talk about your data challenges, or just to say hello. We’d love to hear what you’re working on.