June 4, 2026

Field Tickets and the Digitization Gap: Why Paper Forms Still Run Upstream Operations

By John Wassilak

There’s a stack of carbon-copy field tickets in a pumper’s truck somewhere in western Oklahoma right now. Maybe yours. They’ll get driven into the office on Friday, dropped on someone’s desk, and keyed into a spreadsheet by Monday afternoon. Some of them will be wrong. A few will be unreadable. One or two will be missing entirely. The numbers will eventually feed allocation, revenue, and accruals.

This is not a story about one bad operator. It’s how most of the industry still runs.

Field tickets, run tickets, gauge sheets, JIBs from non-operated partners, workover invoices, completion service tickets. The list of documents that have to be ingested manually at a mid-size operator is longer than most outsiders realize. Each one is a small piece of operational reality that hasn’t crossed the gap from paper or PDF into structured data.

The gap has been there long enough that everyone has stopped noticing it. That doesn’t mean it doesn’t cost anything. It costs a lot. It’s just that the cost is distributed across enough people, departments, and months that nobody owns it.

What we’re actually talking about

A few definitions, because the terminology varies by region and by operator.

Field tickets are written records produced by a pumper, lease operator, or field hand documenting what happened at a well or facility on a given day. Volumes gauged, water hauled, chemicals injected, equipment serviced.
Run tickets are the records of oil leaving a tank, typically when a truck hauls a load to a sales point. They include opening and closing gauges, temperature, BS&W (basic sediment and water), and the load volume.
Gauge sheets track tank levels over time and feed into both production reporting and inventory reconciliation.
JIBs (joint interest billings) arrive from operators on properties where you hold a non-operated interest. They are line-item invoices for your share of costs, and they almost always arrive as PDFs.
Vendor invoices and service tickets cover workovers, completions, chemicals, water hauling, and the dozens of other line items that hit AFEs and lease operating expense.
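For readers who haven’t worked with run tickets, it can help to see one as a structured record. The sketch below is a minimal illustration, not a standard schema. The field names are hypothetical, gauges are assumed to be in inches, and gross volume is assumed to have already been corrected to standard conditions (strapping tables and temperature correction are out of scope here). The net-oil line follows the usual convention of deducting the BS&W percentage from gross volume.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RunTicket:
    """One oil load leaving a tank. Field names are illustrative, not a standard schema."""
    ticket_no: str
    tank_id: str
    haul_date: date
    opening_gauge_in: float   # tank level before the load, inches (assumed unit)
    closing_gauge_in: float   # tank level after the load, inches
    observed_temp_f: float    # oil temperature at the time of the run
    bsw_pct: float            # basic sediment and water, percent of gross volume
    gross_bbl: float          # gross volume, assumed already corrected to standard conditions

    def net_bbl(self) -> float:
        # Deduct sediment and water: net = gross * (1 - BS&W / 100).
        return self.gross_bbl * (1.0 - self.bsw_pct / 100.0)

    def problems(self) -> list[str]:
        """Basic sanity checks that a keyed or extracted ticket should pass."""
        issues = []
        if self.closing_gauge_in >= self.opening_gauge_in:
            issues.append("closing gauge is not below opening gauge")
        if not 0.0 <= self.bsw_pct < 100.0:
            issues.append("BS&W percent out of range")
        if self.gross_bbl <= 0:
            issues.append("gross volume must be positive")
        return issues
```

A ticket with 200 gross barrels and 0.5% BS&W nets out to 199 barrels. The checks are deliberately simple; they are the kind of constraint the validation step relies on.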

Almost all of these documents exist in a hybrid state. Some are still genuinely paper. More are PDFs that were never machine-readable to begin with. A few are emailed spreadsheets that were exported from somebody else’s accounting system and have been hand-edited since they arrived.

The data inside them runs the business. The form it arrives in is the problem.

Why this hasn’t been fixed already

You can find a vendor selling a field-ticket digitization product on every aisle of every upstream conference. They’ve been selling them for fifteen years. Adoption is real but uneven. The reason is not that operators are uniquely backward.

The data sources you don’t control are the hardest. You can mandate that your own pumpers use a mobile app. You cannot mandate that your non-operated partners send you machine-readable JIBs. You cannot make a service company that’s been writing tickets by hand since 1987 switch to a structured export. As long as those upstream sources are still paper, your downstream process has to handle paper.

The forms aren’t standardized. Two pumpers working for the same operator will fill out a ticket differently. Two service companies will use different column orders, different units, different rounding conventions. JIBs from different operators look nothing alike. You cannot write one parser and walk away.
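One common way to live with that variability (not the only one) is to keep a single generic loader and push the per-sender differences into configuration. This sketch assumes rows have already been pulled out of each document as dictionaries. The sender names, column labels, and unit factors are hypothetical; the only real-world constant is the 42-gallon US oil barrel.

```python
# Hypothetical per-sender mappings: source column -> (canonical field, unit multiplier).
# A multiplier of None means "copy the value through as text".
SENDER_MAPPINGS = {
    "acme_hauling": {
        "Load (bbl)": ("volume_bbl", 1.0),
        "Ticket #": ("ticket_no", None),
    },
    "basin_services": {
        "Vol Gal": ("volume_bbl", 1 / 42),  # US gallons -> barrels (42 gal per bbl)
        "Tkt": ("ticket_no", None),
    },
}


def normalize_row(sender: str, row: dict) -> dict:
    """Map one extracted row onto canonical fields, converting units where configured."""
    mapping = SENDER_MAPPINGS.get(sender)
    if mapping is None:
        raise KeyError(f"no mapping configured for sender {sender!r}")
    out = {}
    for src_col, (field_name, factor) in mapping.items():
        if src_col not in row:
            # Fail loudly instead of loading a partial record.
            raise KeyError(f"{sender}: expected column {src_col!r} is missing")
        value = row[src_col]
        out[field_name] = float(value) * factor if factor is not None else value
    return out
```

A new sender then costs a mapping entry instead of a new parser, though each mapping still needs maintenance whenever that sender changes its form.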

The stakes of getting it wrong are real and immediate. A misread digit on a run ticket flows directly into revenue. A misclassified line on a JIB ends up in the wrong AFE. Errors here don’t get caught in a downstream dashboard. They show up in monthly close, in audits, and in disputes with partners.

The economics aren’t obvious. Keying tickets costs a clerk a few hours a week. Replacing that work with a real ingestion pipeline costs real engineering and ongoing maintenance. The ROI math works, but only if you account for the error rate, the close-cycle delay, and the management time spent reconciling disagreements that wouldn’t have existed if the data had been clean to start with.

What a real ingestion pipeline looks like

The good news is that none of the technical pieces are exotic; all of them are well understood. The work is in fitting them to the particular shape of this data.

Capture at the source where you can

For your own field operations, the right answer is a mobile capture tool. Pumpers enter readings on a phone or tablet at the well, the data lands in your system in structured form, and the carbon-copy book becomes a backup rather than the primary source. The vendor market here is crowded but functional, and the implementation cost is mostly about training and field-level change management, not software.

This is the cheapest and cleanest data you’ll get. It’s also the only category where you fully control the format.

Document ingestion for everything else

For the documents you don’t control (JIBs, service tickets from outside vendors, the partner who insists on PDFs), the pipeline starts with an inbox. Email forwarders, a shared drive, an upload portal. Whatever the mechanism, every document needs to land in one place where it can be processed and tracked.
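The “one place” requirement is mostly about tracking: every document should get a stable identity and a status the moment it arrives, so nothing is silently dropped or processed twice. Here is a minimal in-memory sketch, assuming a SHA-256 hash of the file bytes serves as that identity. A real system would persist this in a database table, and the class and status names are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class InboxEntry:
    doc_id: str               # SHA-256 of the file bytes
    source: str               # e.g. "email", "shared_drive", "portal"
    filename: str
    received_at: datetime
    status: str = "received"  # hypothetical lifecycle: received -> classified -> ...


@dataclass
class Inbox:
    entries: dict = field(default_factory=dict)

    def receive(self, data: bytes, source: str, filename: str) -> tuple[InboxEntry, bool]:
        """Register a document. Returns (entry, is_new); exact duplicates are not re-queued."""
        doc_id = hashlib.sha256(data).hexdigest()
        if doc_id in self.entries:
            return self.entries[doc_id], False
        entry = InboxEntry(doc_id, source, filename, datetime.now(timezone.utc))
        self.entries[doc_id] = entry
        return entry, True

    def pending(self) -> list[InboxEntry]:
        """Documents that have arrived but have not been processed yet."""
        return [e for e in self.entries.values() if e.status == "received"]
```

Hashing the bytes means the same PDF forwarded by two people lands once. It does not catch a rescanned copy of the same paper ticket; that kind of duplicate has to be caught later, at entity resolution.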

From there:

Classification. What kind of document is this? Is it a JIB, a service invoice, a run ticket, a gauge sheet? Most operators are dealing with under a dozen real categories. A simple classifier (rules-based or a small ML model) handles the bulk of it. Edge cases go to a human review queue.
Extraction. Pull structured data out of the document. For standardized forms, layout-aware parsers work well. For PDFs that vary by sender, modern OCR plus LLM-assisted extraction earns its keep, with the caveats we’ll get into below.
Validation. Compare extracted values against known constraints. Volumes inside expected ranges, dates within the expected period, totals that match the line items. Failed validation goes to review, not to the database.
Entity resolution. Match the document to a well, a lease, an AFE, a vendor, a partner. This is the same problem covered in OCC Data Ingestion. The well master is the same well master. The matching rules are the same rules.
Loading. Land structured, validated records in your operational system. For most operators, that’s PPDM-aligned tables in PostgreSQL or SQL Server, with the original document retained as an audit artifact.
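The classification and validation steps can be sketched as plain functions that either return a result or push the document toward review. This is a rules-based illustration under stated assumptions: the keyword rules, document-type names, and invoice record shape are hypothetical, and money is held as Decimal so totals compare exactly to the cent.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional

# Hypothetical keyword rules, checked in order; the first match wins.
CLASS_RULES = [
    ("jib", ("joint interest billing", "jib")),
    ("run_ticket", ("run ticket", "bs&w", "opening gauge")),
    ("gauge_sheet", ("gauge sheet", "tank level")),
    ("service_invoice", ("invoice", "service ticket")),
]


def classify(text: str) -> Optional[str]:
    """Return a document type, or None to send the document to the review queue."""
    lowered = text.lower()
    for doc_type, keywords in CLASS_RULES:
        if any(k in lowered for k in keywords):
            return doc_type
    return None


@dataclass
class ExtractedInvoice:
    invoice_no: str
    invoice_date: date
    line_amounts: list       # list of Decimal line totals
    stated_total: Decimal


def validate_invoice(inv: ExtractedInvoice, period_start: date, period_end: date) -> list:
    """Return validation failures; an empty list means the record may be loaded."""
    failures = []
    if not period_start <= inv.invoice_date <= period_end:
        failures.append("invoice date outside accounting period")
    if not inv.line_amounts:
        failures.append("no line items extracted")
    elif sum(inv.line_amounts, Decimal("0")) != inv.stated_total:
        failures.append("line items do not sum to stated total")
    return failures
```

Anything that comes back as None from classify, or with a non-empty failure list from validation, goes to the review queue instead of the database.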

The shape is the same as every other ingestion problem we’ve covered on this blog. The wrinkle is the unstructured input, not the destination.

Where LLMs actually help, and where they don’t

A lot of what gets sold as “AI-powered document extraction” today is OCR plus a large language model plus a prompt that was tuned on a vendor’s specific sample set. It works well for some documents. On others it produces plausible-looking output that’s wrong in ways nobody catches until close.

The honest version of where this technology earns its keep:

Each of your top ten partners sends a JIB that looks different from every other partner’s. Building a rules-based parser for each one takes weeks of work that has to be redone every time a partner changes its template. An LLM with a structured output schema handles the variability with much less per-partner work, and that’s a real win.
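The structured-output half of that is worth showing, because the schema is what keeps partner-to-partner variability out of your database. The sketch below leaves the model call out entirely (no particular vendor API is assumed) and only shows the receiving side. The model’s reply is parsed as JSON and checked against a fixed schema, and anything that does not fit is rejected rather than coerced. The field names are hypothetical.

```python
import json
from decimal import Decimal

# Hypothetical schema for one JIB line item: field name -> required type after parsing.
JIB_LINE_FIELDS = {
    "partner": str,
    "property_id": str,
    "afe": str,
    "description": str,
    "amount": Decimal,
}


def parse_jib_lines(model_reply: str) -> list[dict]:
    """Parse a model reply that should be a JSON array of JIB line objects.

    Raises ValueError on anything off-schema so the document goes to review.
    """
    # Parse decimals exactly; json.JSONDecodeError is itself a subclass of ValueError.
    data = json.loads(model_reply, parse_float=Decimal)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of line items")
    lines = []
    for i, item in enumerate(data):
        # Reject missing or extra fields instead of guessing.
        if not isinstance(item, dict) or set(item) != set(JIB_LINE_FIELDS):
            raise ValueError(f"line {i}: fields do not match schema")
        for name, typ in JIB_LINE_FIELDS.items():
            value = item[name]
            # Whole-number amounts arrive as int; widen them, but never accept bools.
            if typ is Decimal and isinstance(value, int) and not isinstance(value, bool):
                value = Decimal(value)
            if not isinstance(value, typ):
                raise ValueError(f"line {i}: {name!r} is not {typ.__name__}")
            item[name] = value
        lines.append(item)
    return lines
```

A reply that puts the amount in as the string “1,234.56” or drops the AFE field fails here and lands in review, instead of loading as a plausible-looking record.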

Free-text fields are another good fit. A service ticket description that says “pulled rods, replaced pump, RIH” needs to be classified into structured categories. LLMs do this kind of normalization well, and the worst case is a misclassified record that a reviewer corrects in the queue.

First-pass extraction with human review is the pattern that works for almost everything else. The model gets you most of the way. A reviewer corrects what’s wrong before it lands in the database. Over time, the corrections feed back into better prompts or fine-tuning.

Where this approach gets people in trouble is high-stakes numeric fields. A model that occasionally misreads a digit on a run ticket is not acceptable. For the fields that drive revenue and accounting, you need deterministic OCR plus validation, not generative inference. If you do use an LLM, cross-check its output against a deterministic extraction of the same number and fail closed to a human on any disagreement.
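A minimal sketch of that fail-closed cross-check follows. It assumes a deterministic OCR engine and an LLM have each already returned the same field as a raw string. Both engines are out of scope, and the function and type names are hypothetical. “Fail closed” here means a disagreement produces no value at all, only a review item.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from typing import Optional


@dataclass(frozen=True)
class FieldDecision:
    field_name: str
    value: Optional[Decimal]  # None means "do not load"
    needs_review: bool
    reason: str


def _to_decimal(raw: Optional[str]) -> Optional[Decimal]:
    """Parse a numeric string like '1,204.37'; return None if it cannot be parsed."""
    if raw is None:
        return None
    try:
        return Decimal(raw.replace(",", "").strip())
    except InvalidOperation:
        return None


def cross_check(field_name: str, ocr_raw: Optional[str], llm_raw: Optional[str]) -> FieldDecision:
    """Accept a numeric field only when two independent extractions agree exactly."""
    ocr, llm = _to_decimal(ocr_raw), _to_decimal(llm_raw)
    if ocr is None or llm is None:
        return FieldDecision(field_name, None, True, "one extraction missing or unparseable")
    if ocr != llm:
        return FieldDecision(field_name, None, True, f"disagreement: OCR={ocr} LLM={llm}")
    return FieldDecision(field_name, ocr, False, "extractions agree")
```

Comparing as Decimal means “182.40” and “182.4” count as agreement, while a single transposed digit does not.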

The other place it falls apart is anything without an obvious validation rule. If you can’t tell whether the extracted value is plausible without reading the document yourself, you can’t automate it. LLM confidence scores are not the same as correctness, and treating them that way is how silent errors end up in monthly close.

Compliance-bearing documents deserve their own caution. Regulatory filings, partner statements that drive disputes, anything that might end up in a courtroom. The audit trail needs to be ironclad and reproducible. “The model said so” is not a defense.

The pattern that works is mundane. Deterministic extraction where you can. LLM-assisted extraction where you need flexibility. Validation everywhere. A human in the loop on anything that fails validation or falls below a confidence threshold. The same point Jeff made in The Data Foundation That Makes Autonomy Work Better applies here. Clean data and disciplined process are what let the autonomous pieces do useful work.
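That routing rule is small enough to write down. Here is a minimal sketch, assuming each extraction carries its method, its validation failures, and a model-reported confidence score. The 0.9 floor is a hypothetical placeholder that would need tuning against your own review outcomes. Confidence only ever adds scrutiny; a high score never overrides a failed validation.

```python
from dataclasses import dataclass

# Hypothetical threshold; tune against your own review outcomes, not a vendor default.
LLM_CONFIDENCE_FLOOR = 0.9


@dataclass(frozen=True)
class Extraction:
    doc_id: str
    method: str                # "deterministic" or "llm"
    validation_failures: tuple  # empty tuple means every validation rule passed
    confidence: float          # model-reported score; only consulted for "llm"


def route(ext: Extraction) -> str:
    """Decide whether an extraction loads automatically or goes to a human."""
    # Validation is checked first and is never bypassed, whatever the confidence.
    if ext.validation_failures:
        return "review"
    # For generative extraction, low confidence adds a review step on top of validation.
    if ext.method == "llm" and ext.confidence < LLM_CONFIDENCE_FLOOR:
        return "review"
    return "load"
```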

The reconciliation problem (again)

Getting the data out of the document is the first half of the problem. Making it agree with everything else is the second half.

Run tickets need to reconcile with sales statements from the purchaser. Field tickets need to reconcile with SCADA volumes where SCADA exists. JIBs need to reconcile with internal AFE forecasts and the operating agreement. Service tickets need to match purchase orders and vendor invoices.

Most of these reconciliations are happening today, somewhere, in somebody’s spreadsheet. They are usually happening late, by a person who has been doing it long enough to know which discrepancies matter and which can be waved off. That knowledge does not survive that person leaving the company.

A real pipeline turns the reconciliations into structured comparisons. Differences get flagged, categorized, and assigned. The patterns become visible. The chronic disagreements (this partner’s JIBs are always 3% off because of how they allocate overhead) become known issues with documented treatment rather than monthly surprises.
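A structured comparison of this kind might look like the sketch below, which reconciles internal AFE amounts against a partner’s JIB amounts. The record shapes, the 0.5% default tolerance, and the known-issue registry are hypothetical. The example treats the overhead case above as a documented variance of about +3% (the partner billing high); the direction is an assumption for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class KnownIssue:
    partner: str
    expected_pct: Decimal   # documented, recurring variance (e.g. overhead allocation)
    tolerance_pct: Decimal  # how far from the expected variance still counts as "known"
    treatment: str          # how the team has agreed to handle it


def reconcile(internal: dict, external: dict, partner: str,
              known: dict, tolerance_pct: Decimal = Decimal("0.5")) -> list[tuple]:
    """Compare internal vs partner-reported amounts keyed by AFE.

    Returns (afe, category, detail) flags; an empty list means everything agreed.
    """
    flags = []
    for afe in sorted(set(internal) | set(external)):
        if afe not in external:
            flags.append((afe, "missing_external", "no partner line for this AFE"))
            continue
        if afe not in internal:
            flags.append((afe, "missing_internal", "partner billed an AFE we have no record of"))
            continue
        ours, theirs = internal[afe], external[afe]
        if ours == 0:
            flags.append((afe, "unexpected", "internal amount is zero"))
            continue
        diff_pct = (theirs - ours) / ours * 100
        issue = known.get(partner)
        if issue and abs(diff_pct - issue.expected_pct) <= issue.tolerance_pct:
            # Recurring, documented difference: flag it with its agreed treatment.
            flags.append((afe, "known_issue", issue.treatment))
        elif abs(diff_pct) > tolerance_pct:
            flags.append((afe, "unexpected", f"{diff_pct:.2f}% difference"))
    return flags
```

Every flag carries a category, so known issues, missing records, and genuine surprises can be assigned and tracked separately instead of landing in one spreadsheet column.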

The point we made in Reconciling Land and Production Data applies here too. You are not trying to eliminate the disagreement. You are trying to make it visible, auditable, and resolvable instead of working through it from scratch every month.

Where to start

The mistake on this kind of project is trying to digitize everything at once. The right move is to pick the document type that costs you the most and start there.

For most mid-size operators, that’s one of three things: pumper-generated field tickets, JIBs from non-operated partners, or run tickets from oil sales. Pick the one where the manual work is costing the most time and the error rate is doing the most damage. Build the capture or ingestion pipeline for that one. Run it in parallel with the manual process for a close cycle or two. Cut over when the comparison shows the pipeline is at least as accurate as the human, which it usually is by the second month.
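Deciding when the pipeline is “at least as accurate as the human” needs a yardstick. One simple option, assuming a reviewer adjudicates the correct value for a sample of fields during the parallel run, is to score both streams against that adjudicated set. The function names, field keys, and values here are hypothetical.

```python
def error_rate(values: dict, truth: dict) -> float:
    """Fraction of adjudicated fields where `values` disagrees with the truth.

    Keys are (ticket_no, field_name) pairs; a missing value counts as wrong.
    """
    if not truth:
        raise ValueError("no adjudicated fields to compare against")
    wrong = sum(1 for key, correct in truth.items() if values.get(key) != correct)
    return wrong / len(truth)


def ready_to_cut_over(pipeline: dict, manual: dict, truth: dict) -> bool:
    """Cut over only when the pipeline is at least as accurate as manual keying."""
    return error_rate(pipeline, truth) <= error_rate(manual, truth)
```

Counting missing values as errors matters here: a pipeline that quietly skips hard tickets should not look more accurate than a clerk who keys every one.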

Then move to the next one.

Within a year, a focused effort can pull the bulk of the document-driven manual work out of the close cycle. The clerks who used to key tickets get assigned to higher-value reconciliation and analysis work. The close cycle shortens. The data flowing into your operational systems is fresher and more accurate. The audit trail is intact.

None of this is glamorous. None of it shows up in a press release. It is, however, the actual work of bringing the data infrastructure up to where the rest of the business already operates.

We Were Just at PPDM 2026

We spent April 27 through 29 at the PPDM Energy Data Convention in Houston. If we didn’t get to connect there, or if field ticket and document ingestion is a conversation you’d like to have, we’d love to hear what you’re working on.
