June 4, 2026 · KoldOps

Why Oil & Gas AI Projects Stall (and the State Layer That Lets Them Ship)

Oil & gas operations carry decades of state in drilling logs, mud reports, lease files, BSEE filings, and HSE history. Stateless LLMs cannot reason against that. The fix is a substrate. KoldOps installs it.

An oil & gas operator runs a Claude pilot on drilling reports. The first three weeks look promising. Six months later, the agent cannot answer "how did Well 14 compare to Well 7 on the same formation," it confuses last quarter's mud-program standard with the current one, and the operations VP quietly defunds the project. The model is not the problem. The state the operation actually carries (well files, mud logs, lease documents, BSEE and BLM filings, HSE histories, decades of production curves) has no substrate the agent can read. This piece names the pattern, lists the state O&G operations carry, and lays out the buildout that makes Claude actually useful for an operator.

The pattern in oil & gas

Of the three industrial verticals KoldOps works in, oil & gas has the highest substrate-gap-to-AI-ambition ratio. Operators have heard the AI pitch. Many have run pilots. A meaningful fraction have spent six figures on internal builds. The output, almost uniformly, has been disappointing.

The disappointment shape is consistent. The agent is asked a question that requires reasoning across multiple wells, multiple time periods, or multiple regulatory filings. The agent returns a confident answer that an experienced engineer immediately recognizes as wrong, or partially correct, or correct only for the document the retrieval happened to surface. The engineer stops trusting the agent. The agent stops being used. The pilot ends.

The pattern is not unique to oil & gas. It is sharper here because the state oil & gas operations carry is uniquely deep, uniquely structured, and uniquely unforgiving when an answer is wrong.

The state an oil & gas operation actually carries

Below is a non-exhaustive list of the state categories a typical upstream or midstream operator carries, and what a stateless LLM does with each by default.

State category	Volume and shape	What a stateless LLM does with it
Well files	Hundreds to thousands of wells, each with construction, completion, workover, and abandonment history	Reads one well file at a time. Cannot compare across wells without explicit prompting and explicit document attachments.
Drilling reports	Daily morning reports for every active well. Tens of thousands of historical reports.	Summarizes the current report. Cannot identify recurring problems across rigs or campaigns without a substrate that indexes them.
Mud reports and DMRs	Daily, structured, voluminous. Rheology, solids analysis, salt chemistry, treatment volumes.	Reads one report. Cannot trend mud-program performance over a campaign. Cannot flag formation-specific anomalies without comparison context.
Lease documents	Leases, amendments, ratifications, assignments, ORRIs, NPRIs. Interactions across leases and depths.	Extracts terms from one lease. Cannot reason about lease-on-lease interactions, depth severances, or Pugh clauses without the full lease corpus in retrievable form.
Regulatory filings	BSEE, BLM, state agencies. APD, sundry notices, production reports, MMS-130. Decade+ of historical filings per well.	Reads the one filing handed to it. Cannot answer "what did we file for this well over the last 10 years" without a versioned, queryable filings repository.
HSE incident history	Near-miss, recordable, lost-time, environmental release. With root-cause analyses, corrective actions, follow-up audits.	Summarizes one incident. Cannot identify recurring failure modes, equipment-specific patterns, or campaign-level safety trends.
Equipment maintenance	Pumps, compressors, separators, tank batteries. Service histories, failure modes, rebuild records.	Cannot recognize "this same pump failed this same way 3 years ago" without a substrate that joins equipment IDs across years.
Production data	Daily allocated volumes, gas-oil ratios, water cuts, decline curves.	Reads the snapshot it is handed. Cannot reason about decline, anomalies, or comparison-to-type-curve without the historical series available on call.
Geological surveys	Logs, cores, seismic interpretations, formation tops, petrophysics.	Summarizes one document. Cannot integrate across surveys or correlate to production performance.

The state is deep, structured, and historical. Every row above maps to the kind of question an experienced engineer routinely answers by walking to a filing cabinet, opening three databases, calling a contractor, and consulting a 12-year-old report. The AI was supposed to make that walk unnecessary. Without a substrate, it makes the walk worse, because the agent confidently fills in the blanks the experienced engineer would know to leave open.

The substrate an oil & gas operator needs

The shape of the substrate is the same as any other vertical's. The contents are vertical-specific.

Versioned well files in markdown. One canonical document per well, updated through pull requests, with history visible. Source documents (PDF reports, native CAD, log curves) referenced and stored alongside.
Reviewed mud-program standards by formation and rig. The canonical answer to "what is our mud program for this formation, this rig, this depth window," with reviewer attribution and effective date.
Searchable, versioned lease repository. Every lease, amendment, and instrument in one queryable corpus, with depth severances and clause-level extraction.
Filing-by-filing regulatory history. Indexed by well, by date, by filing type, by agency. The corpus an LLM can walk to answer compliance questions without re-deriving them from the regulator's portal.
HSE corpus with root-cause structure. Incident records indexed by equipment, by failure mode, by location. The substrate that lets the AI answer "have we seen this before."
Equipment master with service history. Each piece of equipment as a first-class entity, with all service events attached, queryable across years and locations.
Production data with decline-curve context. Historical volumes available on every call, not just the snapshot.
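
As one illustration of the mud-program item above, here is a minimal Python sketch of an effective-date lookup, the check that keeps an agent from confusing last quarter's standard with the current one. Every name in it (MudStandard, standard_in_effect, the file paths, the formation and rig labels) is hypothetical and is not KoldOps's actual schema. It assumes each reviewed standard carries a formation, a rig, a depth window, an effective date, and a reviewer.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class MudStandard:
    """One reviewed mud-program standard. All field names are hypothetical."""
    formation: str
    rig: str
    depth_top_ft: float
    depth_base_ft: float
    effective: date
    reviewer: str
    doc_path: str  # assumed location of the markdown file in the substrate repo


def standard_in_effect(standards, formation, rig, depth_ft, as_of) -> Optional[MudStandard]:
    """Return the most recent standard covering this formation, rig, and depth
    whose effective date is on or before `as_of`, or None if nothing matches."""
    candidates = [
        s for s in standards
        if s.formation == formation
        and s.rig == rig
        and s.depth_top_ft <= depth_ft < s.depth_base_ft
        and s.effective <= as_of
    ]
    return max(candidates, key=lambda s: s.effective, default=None)


# Illustrative records only; formations, rigs, and reviewers are placeholders.
STANDARDS = [
    MudStandard("Formation A", "Rig 3", 8000, 11000, date(2025, 10, 1),
                "reviewer-1", "mud/formation-a/rig-3/2025-10.md"),
    MudStandard("Formation A", "Rig 3", 8000, 11000, date(2026, 1, 15),
                "reviewer-2", "mud/formation-a/rig-3/2026-01.md"),
]

current = standard_in_effect(STANDARDS, "Formation A", "Rig 3", 9500, date(2026, 6, 4))
print(current.doc_path if current else "no reviewed standard on file")
```

The design choice in this sketch is that superseded standards stay in the corpus and the as_of date selects among them, so a question about a past well can be answered against the standard that applied at the time.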

None of these are exotic. Every one of them is a substrate that the operator already almost has, scattered across half a dozen systems (DrillingInfo, WellView, P2, Quorum, the contractor's portal, a shared drive, somebody's laptop). The substrate work is to consolidate, version, and review-gate the scattered state into a layer the agent can read.
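
A rough sketch of what "consolidate" can mean for one category, equipment service history, assuming each source system can export events keyed by a shared equipment ID. The record shape, source names, equipment IDs, and failure modes below are made up for illustration and do not describe any particular product's export format.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ServiceEvent:
    """One service or failure record. Field names are hypothetical."""
    equipment_id: str
    when: date
    failure_mode: str
    source: str  # which system the record was exported from


def consolidate(*sources):
    """Merge event lists from several systems into one history per equipment ID,
    sorted oldest first."""
    history = defaultdict(list)
    for events in sources:
        for ev in events:
            history[ev.equipment_id].append(ev)
    for events in history.values():
        events.sort(key=lambda ev: ev.when)
    return dict(history)


def prior_occurrences(history, equipment_id, failure_mode, before):
    """Earlier events on the same equipment with the same failure mode."""
    return [
        ev for ev in history.get(equipment_id, [])
        if ev.failure_mode == failure_mode and ev.when < before
    ]


# Illustrative exports from two scattered sources.
maintenance_db = [
    ServiceEvent("PUMP-0042", date(2023, 5, 2), "seal leak", "maintenance-db"),
]
shared_drive = [
    ServiceEvent("PUMP-0042", date(2026, 4, 18), "seal leak", "shared-drive"),
    ServiceEvent("COMP-0007", date(2025, 9, 9), "valve failure", "shared-drive"),
]

history = consolidate(maintenance_db, shared_drive)
seen_before = prior_occurrences(history, "PUMP-0042", "seal leak", date(2026, 4, 18))
for ev in seen_before:
    print(ev.when.isoformat(), ev.failure_mode, ev.source)
```

Once the events share one key and one timeline, "this same pump failed this same way 3 years ago" becomes a lookup instead of something an engineer has to remember.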

What the engagement looks like

KoldOps installs oil & gas substrates as fixed-scope engagements. The starting point is one decision domain (usually drilling-program standards, mud-program standards, or lease-encumbrance reasoning) plus one pilot well or pilot field. The buildout sequence:

Business System Review (2 weeks, fixed scope). Map current state. Score the substrate against the 5-question audit. Identify the highest-ROI domain. Hand back a written report.
Substrate buildout for the chosen domain (4 to 8 weeks). Consolidate the scattered state into markdown, in git, with review gates. Wire the retrieval stack. Stand up the MCP layer.
Agent pilot on the pilot well or field (2 to 4 weeks). Point Claude (or whichever frontier model the operator prefers) at the substrate. Run the agent against real operating questions. Measure answer quality against an experienced engineer's judgment.
Expansion to the next domain, with the operator's team running the buildout pattern on their own where possible. KoldOps stays on as advisor, not as the sole builder.
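
The measurement in the pilot step can be as simple as an engineer grading each answer and the team tracking the split. A minimal sketch, assuming a three-way verdict (correct, partial, wrong) that mirrors the failure shapes described earlier. The Review type, the second through fourth questions, and the verdicts are hypothetical placeholders, not pilot results.

```python
from collections import Counter
from dataclasses import dataclass

VERDICTS = ("correct", "partial", "wrong")


@dataclass(frozen=True)
class Review:
    """An engineer's judgment on one agent answer. Hypothetical shape."""
    question: str
    verdict: str  # one of VERDICTS


def summarize(reviews):
    """Return the share of answers per verdict, as fractions of all reviews."""
    if not reviews:
        raise ValueError("no reviews to summarize")
    unknown = {r.verdict for r in reviews} - set(VERDICTS)
    if unknown:
        raise ValueError(f"unknown verdicts: {sorted(unknown)}")
    counts = Counter(r.verdict for r in reviews)
    return {v: counts[v] / len(reviews) for v in VERDICTS}


# Illustrative verdicts only, not real pilot data.
reviews = [
    Review("How did Well 14 compare to Well 7 on the same formation?", "correct"),
    Review("What is the current mud program for this formation and rig?", "correct"),
    Review("What did we file for this well over the last 10 years?", "partial"),
    Review("Have we seen this pump failure before?", "wrong"),
]

scores = summarize(reviews)
print(scores)
```

Keeping "partial" as its own bucket matters in this sketch because an answer that is right only for the one document retrieval surfaced is the failure an engineer is most likely to catch and a simple pass/fail count is most likely to hide.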

Total time from contract to a useful in-production agent on the first domain: 8 to 14 weeks. Total cost: predictable, fixed-scope, with no proprietary lock-in. The substrate the operator owns at the end works with any frontier model and any future tool.

Frequently asked

We already have DrillingInfo / WellView / P2 / Quorum. Why do we need a substrate?

Those systems are systems of record. They are excellent at their core function (drilling intelligence, well management, accounting, land). They are not, by themselves, substrates an LLM can read fluently. The substrate KoldOps installs sits alongside those systems, pulls from them, and presents an LLM-native view the agent can reason against. Your existing systems stay. The substrate is additive.

Our data is sensitive (ITAR-adjacent, NDA, joint-venture). Can the substrate be on-premise?

Yes. The substrate is markdown files in a git repository. The retrieval stack is open-source. The inference layer can run on-premise (Ollama, llama.cpp, or a Mac Mini cluster for smaller workloads; dedicated GPU infrastructure for larger). See on-premise AI for the deployment options. Nothing in the substrate architecture requires the cloud.

What about the regulatory data? Is it appropriate to put BSEE / BLM / state filings in an LLM-readable corpus?

The filings are public records, so the legal answer is yes. The operational answer is also yes, because the filings the operator submits are the operator's own data. The substrate makes the operator's filing history queryable for the operator's own use; it does not change the public-record status of the filings themselves.

We tried Claude on drilling reports and it hallucinated. Why would the substrate help?

Because the hallucinations are usually not the model inventing facts. They are the model filling in blanks the retrieval should have filled with real source documents. With a substrate, the retrieval stack pulls the actual relevant reports on every call, the model has the context it needs, and the hallucination rate drops to the model's intrinsic floor (which is low on frontier models in 2026). The model is not the problem. The state the model has access to is the problem.
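
To make "pulls the actual relevant reports on every call" concrete, here is a minimal sketch of context assembly, assuming the substrate exposes reports indexed by well and formation. The Report type, the paths, and the placeholder text are hypothetical, and a production retrieval stack would use real search instead of this exact-match filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """One indexed source document. Field names and paths are hypothetical."""
    well: str
    formation: str
    path: str
    text: str


def retrieve(index, wells, formation):
    """Select every indexed report for the named wells on the given formation."""
    return [r for r in index if r.well in wells and r.formation == formation]


def build_context(question, reports):
    """Assemble a prompt that carries each source with its path and tells the
    model to leave gaps open instead of guessing."""
    if not reports:
        return (
            f"No source documents were found for: {question}\n"
            "Say so instead of answering."
        )
    sources = "\n\n".join(f"[{r.path}]\n{r.text}" for r in reports)
    return (
        "Answer only from the sources below and cite the path of each one used. "
        "If the sources do not cover the question, say what is missing.\n\n"
        f"{sources}\n\nQuestion: {question}"
    )


# Placeholder index; the text fields stand in for real report contents.
INDEX = [
    Report("Well 7", "Formation A", "wells/well-07/formation-a.md", "(report text)"),
    Report("Well 14", "Formation A", "wells/well-14/formation-a.md", "(report text)"),
    Report("Well 14", "Formation B", "wells/well-14/formation-b.md", "(report text)"),
]

question = "How did Well 14 compare to Well 7 on the same formation?"
reports = retrieve(INDEX, {"Well 7", "Well 14"}, "Formation A")
prompt = build_context(question, reports)
print(prompt)
```

The empty-result branch is the part that addresses the complaint in the question: when retrieval finds nothing, the model is told so explicitly and is not left to fill the blank on its own.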

Can we start with just one well or one rig?

Yes, and we recommend it. The substrate pattern proves itself fastest on a narrow domain: a single field, a single rig, a single decision class. Once the substrate is providing useful answers in the pilot domain, the expansion to the next domain is cheaper than the first because the pattern is established. Starting narrow is the standard playbook.

What's next

If you are an oil & gas operator with an AI project that demoed well and stalled in production, run the substrate audit against your current state. It takes 15 minutes. The result will tell you whether the work is upstream of the model (substrate) or downstream of it (prompts, tools, agent design).

If the audit confirms a substrate gap, the next step is a Business System Review. Fixed scope, written report, no further commitment required. We will map your decision domains, score the substrate as it stands, and hand back a prioritized buildout plan. You can act on the plan with your team or with us.

For the broad framing on why this problem exists across all industries, see "Your AI Demo Worked. Your AI Project Failed. Here's Why." For the substrate philosophy underneath, see "Decision-State, Airlocked to Code-State: Defining the AI Substrate."