Daily Reports as Substrate: The Single Highest-ROI Substrate Buildout in Oil & Gas
A drilling operator generates 365 daily reports per active rig per year. A completion crew generates roughly the same. A mud engineer files a daily mud report per rig per shift. Add workover reports, rig-move reports, and the morning operations summary, and a single mid-sized operator produces tens of thousands of structured operational documents per year. Most of them are read once, filed, and never reasoned across again. Turning the daily-report stream into a substrate is the single highest-ROI buildout in upstream operations. This piece explains why, what the substrate looks like, what queries become possible, and how to start.
If you have not read the parent pillar yet, start there: Why Oil & Gas AI Projects Stall.
The structure that makes daily reports ideal substrate material
Daily reports in oil & gas have four properties that make them the highest-impact substrate candidate of any document class in the industry.
- Cadence is predictable. Reports arrive once per day per asset, every day, on the same template. The substrate ingestion pipeline is a single recurring job.
- Structure is consistent. Drilling reports follow a standard layout (depth, ROP, weight on bit, RPM, mud properties, equipment status, NPT codes, summary narrative). Mud reports follow API RP 13-style structure. Field tickets and morning reports vary by operator but stabilize within months.
- Volume justifies retrieval. Tens of thousands of reports per year is exactly the corpus size where exhaustive reading by humans fails and hybrid retrieval starts producing answers neither the engineer nor the agent could produce alone.
- Cross-document reasoning has direct dollar value. Typical questions: "Have we seen this NPT cause before, and what did the company-man do?" "What is our average ROP through the Wolfcamp B with this BHA configuration?" "Which rigs are running 12 percent over budget on mud chemistry this quarter?" Each query, answered well, is operationally consequential.
The same four properties hold for completion morning reports, workover reports, and the daily mud reports. The substrate pattern applies to all of them.
What the substrate looks like
Each daily report becomes one markdown document in the substrate. The structure: a YAML front-matter block with the extracted structured fields, followed by the narrative sections. Example:
---
report_type: ddr
well_id: well-001-23h
operator: example-co
api_number: 42-XXX-XXXXX
date: 2026-06-10
report_number: 47
depth_md_start: 8420
depth_md_end: 8612
rop_avg_fph: 36.2
wob_klbs: 38
rpm: 110
mud_weight_ppg: 11.2
mud_funnel_visc_secs: 48
formation: wolfcamp-b
bha_configuration: bha-rev-c
npt_hours: 1.5
npt_codes:
  - code: NPT-MUD
    duration_hr: 1.5
    description: shaker screen change
company_man: name-redacted
contractor: example-rig-co
rig_id: rig-22
---
# Operations summary
Drilled 192 ft from 8420 to 8612 over 5.3 hours of rotating time.
ROP averaged 36.2 fph; performance consistent with offset wells in
this formation at this BHA configuration. Mud weight increased from
11.0 to 11.2 ppg to address minor sloughing observed in shaker returns
at ~8500 ft.
NPT: 1.5 hours of mud-system downtime for shaker screen replacement.
Routine. No correlation to formation event.
# Forward plan
Drill ahead to 8900 ft target. Anticipate 8 to 10 hours rotating time
based on formation projection. Monitor mud properties for additional
sloughing indicators.
The YAML front-matter is what the AI agent queries against. The markdown body is what the agent reads when it needs context. The same document serves both retrieval and reasoning, with no separate ETL into a vector DB or a tabular store.
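To make the two-part layout concrete, here is a minimal sketch of splitting one substrate document into its front-matter fields and its markdown body. It is an illustration under stated assumptions, not the production ingestion code: the function names `split_front_matter` and `parse_flat_fields` are hypothetical, and the parser handles only top-level `key: value` scalars. A real pipeline would more likely hand the front-matter text to a full YAML parser so that nested fields such as `npt_codes` are kept.

```python
def split_front_matter(text: str) -> tuple[str, str]:
    """Split a substrate document into (front_matter_text, markdown_body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("document does not start with a front-matter fence")
    # Find the closing fence after the opening one.
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:]).strip()
    raise ValueError("front-matter fence is never closed")


def parse_flat_fields(front_matter: str) -> dict[str, object]:
    """Parse top-level `key: value` scalars only; nested blocks are skipped."""
    fields: dict[str, object] = {}
    for line in front_matter.splitlines():
        # Indented lines and list items belong to nested structures.
        if not line or line[0] in " -" or ":" not in line:
            continue
        key, _, raw = line.partition(":")
        raw = raw.strip()
        if raw == "":
            continue  # the key introduces a nested block, e.g. npt_codes
        try:
            fields[key.strip()] = float(raw) if "." in raw else int(raw)
        except ValueError:
            fields[key.strip()] = raw  # dates, ids, and other text stay strings
    return fields


# Shortened, hypothetical version of the example report above.
SAMPLE = """---
report_type: ddr
well_id: well-001-23h
date: 2026-06-10
depth_md_start: 8420
depth_md_end: 8612
rop_avg_fph: 36.2
formation: wolfcamp-b
npt_codes:
  - code: NPT-MUD
    duration_hr: 1.5
---
# Operations summary
Drilled 192 ft from 8420 to 8612.
"""

front, body = split_front_matter(SAMPLE)
fields = parse_flat_fields(front)
```

The point of the sketch is the split itself: `fields` is what a structured query filters on, and `body` is what the agent reads for context.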
The queries that become possible
Once the substrate is populated, a class of queries becomes possible that was either impossible or required a multi-day engineering effort before. Examples from real operator workflows:
- "What is our average ROP through the Wolfcamp B with BHA Rev C across the last 12 months?" The agent queries the front-matter, filters by formation and BHA, averages the ROP. Answer in seconds. Same query against PDFs in a SharePoint folder: a multi-week analyst project.
- "Which rigs are running over AFE on mud chemistry this quarter, and what is the dominant cost driver?" Cross-references daily mud reports against AFE budgets, surfaces the rigs over plan, identifies the chemistry line items driving the overage. The substrate makes this a query; without the substrate, this is the mud engineer's quarterly project.
- "Have we seen NPT code MUD-LCM with this lost-circulation pattern in the same formation before, and what worked?" The agent searches the corpus for similar narratives, returns the prior incidents with their resolutions and the company-man's notes. Operationally consequential during an active lost-circulation event.
- "Pull every report from this well plus every offset well within 2 miles, summarize the formation tops actually observed versus the prognosis." The agent assembles the cross-well comparison. Geologic intelligence that was previously locked in PDFs becomes a single query.
- "For the morning ops meeting, summarize what changed across our 6 active rigs in the last 24 hours, flagged by NPT, by formation change, and by mud-property excursion." Replaces the company-man's morning hand-compilation. Same accuracy, 5 minutes instead of 90.
Each of these queries existed as a real engineering need before the substrate. The substrate did not invent the need. It made the answer cheap.
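The first query in the list, average ROP by formation and BHA, reduces to a filter and a mean once the front-matter is parsed. The sketch below assumes the reports are already loaded as dictionaries keyed by the front-matter field names shown earlier; the `average_rop` function and the sample records are hypothetical.

```python
from statistics import mean

# Hypothetical front-matter records, as they might look after parsing.
REPORTS = [
    {"well_id": "well-a", "date": "2026-06-10", "formation": "wolfcamp-b",
     "bha_configuration": "bha-rev-c", "rop_avg_fph": 36.2},
    {"well_id": "well-b", "date": "2026-03-02", "formation": "wolfcamp-b",
     "bha_configuration": "bha-rev-c", "rop_avg_fph": 40.0},
    {"well_id": "well-b", "date": "2026-03-03", "formation": "wolfcamp-a",
     "bha_configuration": "bha-rev-c", "rop_avg_fph": 55.0},
    {"well_id": "well-c", "date": "2024-01-15", "formation": "wolfcamp-b",
     "bha_configuration": "bha-rev-c", "rop_avg_fph": 20.0},
]


def average_rop(reports, formation, bha, since):
    """Mean of daily average ROP for one formation and BHA on or after `since`.

    `since` is an ISO date string; ISO dates sort correctly as plain text.
    Returns None when no report matches.
    """
    matches = [
        r["rop_avg_fph"]
        for r in reports
        if r["formation"] == formation
        and r["bha_configuration"] == bha
        and r["date"] >= since
    ]
    return mean(matches) if matches else None


result = average_rop(REPORTS, "wolfcamp-b", "bha-rev-c", since="2025-06-10")
```

This is an unweighted mean of daily averages. Weighting each day by footage drilled or rotating hours would be a reasonable refinement if those fields are captured reliably.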
The buildout sequence
KoldOps installs daily-report substrates as fixed-scope engagements. The starting point is one report type from one rig or asset, usually drilling reports from the operator's most active rig. The sequence:
- Business System Review (2 weeks). Map the existing report flow. Where do the reports originate (WITSML feed, contractor portal, direct PDF), where do they land (SharePoint, the well-management system, somebody's email), what structure is captured today.
- Substrate buildout (4 to 6 weeks). Stand up the ingestion pipeline that turns each new daily report into a markdown document with YAML front-matter. Backfill historical reports for the chosen rig (typically 1 to 2 years). Wire retrieval. Build the morning-ops-summary query as the first proof of value.
- Pilot against real operations (4 weeks). The company-man uses the substrate for the morning ops meeting. The mud engineer uses it for cross-rig comparison. The drilling engineer uses it for offset analysis. Measure time-to-answer delta.
- Expand to remaining rigs and report types. Once one rig's drilling-report substrate is producing operational value, the pattern extends to the rest of the fleet and to mud reports, completion reports, and workover reports.
Total time from contract to a production substrate covering one rig's drilling reports: 6 to 8 weeks for the review and buildout, or 10 to 12 weeks including the pilot. Expansion to the full fleet typically takes another 6 to 12 weeks depending on rig count and report-format consistency.
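The morning-ops-summary query named in the buildout step as the first proof of value can be sketched as a day-over-day comparison per rig. The sketch assumes each rig's latest and prior-day front-matter are available as dictionaries keyed by `rig_id`. The `morning_flags` function and the 0.2 ppg mud-weight threshold are hypothetical; a real threshold would come from the operator's drilling program.

```python
def morning_flags(today, yesterday, mud_weight_delta_ppg=0.2):
    """Compare each rig's latest report with the prior day's and list flags."""
    flags = {}
    for rig_id, report in today.items():
        rig_flags = []
        previous = yesterday.get(rig_id)
        if report["npt_hours"] > 0:
            rig_flags.append(f"NPT {report['npt_hours']} h")
        if previous is not None:
            if report["formation"] != previous["formation"]:
                rig_flags.append(
                    f"formation change {previous['formation']} -> {report['formation']}"
                )
            # Round before comparing so float noise does not hide a real change.
            delta = round(abs(report["mud_weight_ppg"] - previous["mud_weight_ppg"]), 2)
            if delta >= mud_weight_delta_ppg:
                rig_flags.append(f"mud weight moved {delta} ppg")
        if rig_flags:
            flags[rig_id] = rig_flags
    return flags


# Hypothetical two-rig example.
YESTERDAY = {
    "rig-22": {"npt_hours": 0, "formation": "wolfcamp-a", "mud_weight_ppg": 11.0},
    "rig-07": {"npt_hours": 0, "formation": "wolfcamp-a", "mud_weight_ppg": 10.8},
}
TODAY = {
    "rig-22": {"npt_hours": 1.5, "formation": "wolfcamp-b", "mud_weight_ppg": 11.4},
    "rig-07": {"npt_hours": 0, "formation": "wolfcamp-a", "mud_weight_ppg": 10.8},
}

flags = morning_flags(TODAY, YESTERDAY)
```

Rigs with nothing to report are left out of the result, which keeps the summary short on quiet days.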
Where the historical corpus comes from
Most operators have years of historical daily reports somewhere. Common sources:
- The well-management system (WellView, OpenWells, Pason DataNet, NOV Wellsite). Most have a bulk-export capability. Some require contractor cooperation.
- The drilling contractor's portal. The contractor usually retains daily reports for the period of their engagement and can export on request.
- SharePoint, network drives, or a shared folder where the company-man or the office filed PDFs over time. This is the messy source but often the most complete.
- Email archives. The drilling engineer's inbox often has the most current operational context, attached to morning summaries the operator generated for the team.
- The accounting / cost system (Quorum, P2 Land/BOLO, OGSys). Holds the AFE detail that joins to daily-report cost overruns.
The substrate buildout includes the ingestion pipeline from each of these sources, normalized into the markdown format. The historical corpus does not need to be perfect on day one; it needs to be representative enough for the queries to return useful answers, and the ingestion pipeline keeps extending coverage as backfill continues.
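Normalization across sources mostly means mapping each source's field names onto the substrate's front-matter names, then writing the same document layout every time. A minimal sketch follows. The source labels and source-side column names in `FIELD_MAPS` are invented for illustration and do not describe any real system's export format. The renderer handles flat scalar fields only.

```python
# Hypothetical per-source mappings: source column name -> substrate field name.
FIELD_MAPS = {
    "well_system_export": {
        "WellID": "well_id", "ReportDate": "date", "DepthEnd": "depth_md_end",
    },
    "contractor_pdf": {
        "well": "well_id", "report_date": "date", "end_depth_ft": "depth_md_end",
    },
}


def normalize(source: str, raw: dict) -> dict:
    """Rename known source fields to substrate names; drop unmapped fields."""
    mapping = FIELD_MAPS[source]
    return {mapping[key]: value for key, value in raw.items() if key in mapping}


def render_document(fields: dict, sections: dict[str, str]) -> str:
    """Render flat front-matter fields plus narrative sections as one document."""
    lines = ["---"]
    lines.extend(f"{key}: {value}" for key, value in fields.items())
    lines.append("---")
    for heading, text in sections.items():
        lines.append(f"# {heading}")
        lines.append(text.strip())
    return "\n".join(lines) + "\n"


from_pdf = normalize(
    "contractor_pdf",
    {"well": "well-001-23h", "report_date": "2026-06-10",
     "end_depth_ft": 8612, "page_count": 3},
)
from_export = normalize(
    "well_system_export",
    {"WellID": "well-001-23h", "ReportDate": "2026-06-10", "DepthEnd": 8612},
)
doc = render_document(from_pdf, {"Operations summary": "Drilled 192 ft."})
```

Two different sources describing the same report end up as the same field dictionary, which is what lets one query run across the whole backfill.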
Frequently asked
We already have WITSML feeds and a real-time drilling display. Why do we need a substrate?
WITSML is excellent for real-time drilling data. It is not, by itself, a substrate. The substrate consolidates WITSML data plus the narrative reports plus the manual fields the company-man enters plus the mud-engineer reports plus the contractor's daily into one queryable corpus. WITSML stays as the real-time feed. The substrate is the cross-document reasoning layer.
Our drilling reports come from the contractor in PDF format. Can the substrate handle that?
Yes. The ingestion pipeline OCRs and structures the PDF on arrival, extracts the standard fields into YAML front-matter, and stores the narrative as markdown. Quality is high on standard templates (Pason, NOV, IDS reports); it requires more validation on contractor-specific custom templates. The pipeline reports its own confidence so low-quality extractions are flagged for human review.
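One plausible input to that review flag is a set of internal consistency checks on the extracted fields. The sketch below is an assumption about how such checks could look, not a description of the pipeline's actual confidence scoring. The `review_reasons` function and the `REQUIRED` list are hypothetical.

```python
REQUIRED = ("well_id", "date", "depth_md_start", "depth_md_end", "rop_avg_fph")


def review_reasons(fields: dict) -> list[str]:
    """Return reasons an extraction should go to human review (empty if none)."""
    reasons = [f"missing {name}" for name in REQUIRED if fields.get(name) is None]

    start, end = fields.get("depth_md_start"), fields.get("depth_md_end")
    if start is not None and end is not None and end < start:
        # Can be legitimate (e.g. a plug-back), so flag it rather than reject it.
        reasons.append("end depth is shallower than start depth")

    codes = fields.get("npt_codes") or []
    total = sum(code["duration_hr"] for code in codes)
    npt_hours = fields.get("npt_hours")
    if npt_hours is not None and abs(total - npt_hours) > 0.01:
        reasons.append("npt_hours does not match the sum of npt_codes durations")
    return reasons


# Hypothetical extractions: one clean, one with three problems.
GOOD = {
    "well_id": "well-001-23h", "date": "2026-06-10",
    "depth_md_start": 8420, "depth_md_end": 8612, "rop_avg_fph": 36.2,
    "npt_hours": 1.5, "npt_codes": [{"code": "NPT-MUD", "duration_hr": 1.5}],
}
BAD = {
    "well_id": "well-001-23h", "date": "2026-06-10",
    "depth_md_start": 8612, "depth_md_end": 8420,
    "npt_hours": 3.0, "npt_codes": [{"code": "NPT-MUD", "duration_hr": 1.5}],
}

good_reasons = review_reasons(GOOD)
bad_reasons = review_reasons(BAD)
```

Checks like these catch swapped or misread numbers that an OCR confidence score alone would not, because the characters can be read cleanly and still land in the wrong field.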
The reports contain confidential operational data and offset-well terms. Where does the substrate run?
On-premises, in most operator engagements. The substrate is markdown in git. The git host can be self-hosted (Gitea, Forgejo) inside the operator's environment. The inference layer can run on-premises (Mac mini cluster, dedicated GPU box) or through a vendor with appropriate confidentiality terms. Nothing in the substrate architecture requires the cloud.
What about WITSML 2.0 / ETP for real-time integration?
The substrate integrates with ETP/WITSML 2.0 as a streaming source. Real-time data flows into the substrate's operational-data layer; daily reports continue to flow into the document layer. The agent has access to both. The pattern works the same way for the IADC RIG WIRES feeds and equivalent contractor real-time interfaces.
Will this work for unconventional pads with multiple wells drilled in batch?
Yes. The substrate is per-well at the document layer (each well has its own reports) but the queries are at the operator layer (cross-well, cross-pad, cross-field). Pad-level reasoning ("how did wells 1, 2, and 3 perform across the same formation at the same time") becomes a single query against the substrate.
What's next
If your operation generates daily drilling, completion, mud, or workover reports and reasoning across them is currently a manual analyst job, the substrate buildout is probably the highest-ROI AI investment available to your asset team. Run the substrate audit against your current report flow.
If the audit confirms a substrate gap on daily reports, the next step is a Business System Review. Fixed scope, written report, no further commitment. We map your report sources, score the substrate as it stands, identify the highest-ROI rig or asset to start with, and hand back a prioritized buildout plan.
For the broad framing on O&G-specific substrate work, see Why Oil & Gas AI Projects Stall. For the substrate philosophy, see Decision-State, Airlocked to Code-State.