When I began my Ph.D., the U.S. shale oil and gas industry was at the peak of its hype cycle. The unconventional boom had become the central topic of energy conversations, and I had just co-authored a paper with our Department Chair on global oil supply dynamics between U.S. shale and OPEC. That paper, written during my master’s, encouraged me to look deeper into the production mechanisms of unconventional oil systems and how we might predict them.
When my doctoral work started, most research was headed in the direction of hydraulic fracture modeling: how proppants propagate, how fractures grow, how fluids navigate the fracture network in tight source rocks. Back then, most models were hyper-local, focused on a single well or even a single lateral stage of a well. Entire PhDs were being written about just one fracture network. I started a discovery phase: I went to conferences, met operators working in the Bakken and Texas basins, and attended field workshops in Colorado’s DJ Basin. I ended up with a dataset of tens of parameters for thousands of wells, and it eventually broke my Excel sheets. My more sophisticated friends told me I needed to start learning machine learning.
Our department offered a light data science course at the time that covered data variance concepts, PCA, clustering, and the like, and I had also taken a serious econometrics class that covered advanced regression models for time-series and panel data. That gave me a foundation, but I ended up taking a course that used R-based modeling for chemical unit operations and followed the textbook “An Introduction to Statistical Learning with Applications in R.” This is where I started to learn how to match the right type of algorithm to the right type of problem, and where I built my first models to forecast well productivity and to assess how geology, as well as drilling and completion strategies, shape outcomes. I was gradually forced to incorporate domain-specific knowledge in framing the problem: first to show how a reservoir engineer would think through the problem, which data types would be used at each stage of the process, how to handle the feature engineering, and so on, and then to contrast that with how a computer science graduate with strong ML skills but zero domain knowledge would approach it. Spatial and temporal context turned out to be a critical part of that framing. I stopped treating outliers as problems to remove and started reading them like field reports. In many cases, those “outliers” were actually the best wells, early test pilots or overengineered one-offs that held lessons buried under their deviation from the mean. Some checks that are very basic from a petroleum engineer’s point of view, such as reviewing choke management data, can easily be dismissed in the exploratory stages of ML work and completely throw off the production forecasts.
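To make that mindset concrete, here is a minimal sketch, with a hypothetical column name and a simple IQR rule, of flagging outliers for engineering review instead of silently dropping them:

```python
import pandas as pd

def flag_outliers_for_review(df: pd.DataFrame, col: str, k: float = 1.5) -> pd.DataFrame:
    """Mark statistical outliers for engineering review instead of dropping them."""
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    out = df.copy()
    # Keep every row; just flag the ones that deviate from the bulk of the distribution.
    out[f"{col}_outlier"] = (out[col] < q1 - k * iqr) | (out[col] > q3 + k * iqr)
    return out

# Hypothetical usage: flagged wells go to a review list, not the trash bin --
# several of ours turned out to be early pilots or overengineered one-offs.
# wells = flag_outliers_for_review(wells, "cum_oil_12mo")
```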
At this stage I had to show that data scientists need to stop trying to “fix” the dataset and instead understand it the way a field engineer would. Once you establish this in a reproducible method, the models get better, and you can quantify the difference. More importantly, the models become more useful, because you can explain them to other engineers or to old-school managers who do not know AI. A data scientist presenting predictive models to an oil company VP should do so without talking about hyperparameter fine-tuning; instead, they should establish that they are essentially doing inverse modeling, and find ways to show that their models agree with the fundamental concepts of the forward modeling, or physics-based simulation, that the traditional engineer is familiar with. I could say, “This cluster of wells behaves like this because of the higher API gravity and the higher proppant-to-frac-fluid ratio used after such and such time,” and they’d nod.
From this point on, I started using more advanced models and pipelines. I combined unsupervised clustering, a fuzzy rule extraction method called VSR (developed by a former student in our group), and ensemble predictors like Random Forest and GBM. On top of that, I layered stochastic decline curve analysis to expand the predictions into basin-wide analysis for long-term field development planning. Together, these helped me figure out where ML added value (such as feature pruning and understanding decline parameters) and where physics still ruled. My key contribution came toward the end, when I developed a transfer learning framework showing that you could train a model in one sub-basin and, with minimal local tuning, still predict reasonably well in another section of the asset despite some degree of geologic heterogeneity. When we published this in the highest-impact-factor journal in the field, it was the first basin-wide evidence that shale development learnings can be transferred digitally, not just through intuition or rules of thumb but quantitatively, and with minimal error propagation.
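The thesis code itself isn’t reproduced here, but the transfer idea can be sketched with off-the-shelf tools: fit a gradient-boosting model on the data-rich sub-basin, then continue boosting for a few rounds on a small sample from the target sub-basin. The arrays below are synthetic stand-ins, and warm-start boosting is only one of several ways to do the “minimal local tuning”:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-ins: X_source/y_source for the data-rich sub-basin,
# X_target/y_target for a much smaller labeled sample from the new sub-basin.
rng = np.random.default_rng(0)
X_source = rng.normal(size=(2000, 12))
y_source = X_source[:, 0] - 0.5 * X_source[:, 1] + rng.normal(scale=0.1, size=2000)
X_target = rng.normal(size=(150, 12))
y_target = X_target[:, 0] - 0.5 * X_target[:, 1] + 0.3 + rng.normal(scale=0.1, size=150)

# Step 1: train on the source sub-basin.
model = GradientBoostingRegressor(
    n_estimators=300, learning_rate=0.05, warm_start=True, random_state=0
)
model.fit(X_source, y_source)

# Step 2: "minimal local tuning" -- keep the source trees and add a handful of
# boosting rounds fitted against the target sub-basin's small sample.
model.n_estimators += 50
model.fit(X_target, y_target)
```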
Still, the thesis was just a thesis: the framework made sense, but it wasn’t plugged into the workflows of real engineers making decisions on real budgets. After graduation, I was appointed project manager of a digital oilfield lab sponsored by a major California-based IOC (it is obvious which one). Our interdisciplinary team was a mix of CS, EE, and petroleum engineering professors, two grad students serving as ML engineers, and a few petroleum engineering and geology students. We later brought in a chemistry PhD student to run lab tests. The goal was to apply the same philosophy as my thesis, combining domain-specific knowledge with a data-driven workflow, to cut the carbon footprint of heavy oil operations.
The focus narrowed quickly as we boiled the problem down to a single question: which wells deserved chemical treatment instead of cyclic steam? Steam is expensive, carbon-intensive, and politically difficult in California. If we could reduce steam usage through smarter targeting, there were real emissions and cost benefits to be gained, and the operator would be able to maintain production targets. The dataset we worked with spanned over 4,000 chemical treatment jobs across 3,000 wells in two heavy oil fields. Historical success rates were discouraging: fewer than half of the treatments led to meaningful production improvements. But we needed access to more than production logs to develop an AI or ML solution. The biggest challenge was that all the treatment jobs had been done by a single vendor with an undisclosed product, using the exact same recipe, so the data did not have the variance you need to train ML models. I had to drive to Bakersfield a few times to pick up actual oilfield samples and a batch of the vendor’s chemical product. Using those, our resourceful chemistry PhD student produced chemical assay concentration results and water salinity and temperature impact data to analyze the solubility of the final crude mixture. We also fetched data from scattered libraries for well age, perforation intervals, water cut history, and even steam temperature maps overlaid on well locations, which is usually a rare luxury.
We built an ensemble model with Linear, Ridge, Lasso, Random Forest, and XGBoost regressors, all tuned through cross-validation. The R² hovered around 0.90, but we knew better than to celebrate. One lesson from my Ph.D. work that came back quickly was that a high R² doesn’t mean a model works in the field. So we introduced a “human-in-the-loop” decision layer. This wasn’t something I used in my thesis, but here it was essential. We established a traffic-light-style threshold system based on heuristics to flag wells if the model predicted a gain of over 5 bbl/day, a decline of over 30%, or a baseline gross production under 50 bbl/day. Field engineers were handed decisions instead of predictions, in a format they could tweak, discuss, or override. The resulting model could be deployed at the edge to ingest field data, with a light dashboard or config file for the field engineers; recommendations were structured not just as predictions but as operational guidance calibrated to thresholds the field teams recognized.
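Our production pipeline isn’t shown here, but a minimal sketch of the idea looks like this. The ensemble members are the ones named above (with scikit-learn’s GradientBoostingRegressor standing in for XGBoost), the thresholds are the ones we used, and how they map to traffic-light colors is my own illustrative simplification:

```python
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, VotingRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# The model families named above, averaged; GradientBoostingRegressor stands in for XGBoost.
ensemble = VotingRegressor([
    ("lin", LinearRegression()),
    ("ridge", Ridge(alpha=1.0)),
    ("lasso", Lasso(alpha=0.01)),
    ("rf", RandomForestRegressor(n_estimators=300, random_state=0)),
    ("gbm", GradientBoostingRegressor(random_state=0)),
])
# ensemble.fit(X_train, y_train)  # in practice each member is tuned via cross-validation

def traffic_light(pred_gain_bpd: float, pred_decline_pct: float, baseline_gross_bpd: float) -> str:
    """Turn a raw prediction into a reviewable recommendation using the field heuristics."""
    if pred_gain_bpd > 5:                                   # predicted uplift worth treating
        return "GREEN: recommend chemical treatment"
    if pred_decline_pct > 30 or baseline_gross_bpd < 50:    # risky or marginal wells
        return "RED: flag for engineer review before treating"
    return "YELLOW: marginal case -- defer to field judgment"
```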

That model eventually led to a field pilot. Two flagged wells were predicted to underperform at the normal chemical concentration. The engineers doubled the chemical dose, and early-time production rates rose noticeably.
The thesis, like most research, is by design clean and nicely packaged. The field, of course, is not. There were behind-the-scenes realities we had to face that left us working with wrong results for weeks. One of the more subtle but critical issues we ran into during model development came from how we engineered features. We used second-degree polynomial transformations to enhance model performance, which on paper sounded like a smart way to capture non-linear effects and feature interactions. But in the process, we accidentally introduced a form of temporal leakage. The original features were safe: based on the production engineers’ observations, we used 60-day and 7-day average production ratios from before the chemical treatment, basic petrophysical inputs, and perforation metrics. But when we applied polynomial expansion, we unknowingly created interactions and squared terms that mathematically blended values across time, especially from features that, depending on how they were constructed, could incorporate post-treatment signals or blur the edge between the pre- and post-treatment windows. For example, if a total production sum or decline rate wasn’t cleanly bounded on the timeline, squaring or cross-multiplying it created terms that gave the model an indirect peek into the outcomes it was supposed to predict.
This didn’t show up immediately in metrics. Our R² stayed high. But during a QA pass, we noticed that the model’s confidence was disproportionately strong around the treatment window, almost too good. We went back, traced the issue, and found that the temporal anchoring in our feature engineering had failed. The fix involved redefining every single engineered feature to ensure it was strictly based on data available before the treatment date. We also audited the pipeline so that no operation crossed the temporal boundary into the post-treatment period.
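A simple way to make that guarantee structural, rather than relying on manual review, is to select strictly pre-treatment columns before any expansion is applied. The column names below are hypothetical:

```python
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical pre-treatment-only columns; anything touching the post-treatment
# window is excluded *before* expansion, so squared and interaction terms cannot
# smuggle in information from after the treatment date.
PRE_TREATMENT_COLS = [
    "avg_rate_60d_pre", "avg_rate_7d_pre", "rate_ratio_7d_over_60d_pre",
    "porosity", "perf_interval_ft", "well_age_days_at_treatment",
]

def build_features(df: pd.DataFrame) -> pd.DataFrame:
    X = df[PRE_TREATMENT_COLS]  # hard temporal boundary enforced by column selection
    poly = PolynomialFeatures(degree=2, include_bias=False)
    X_poly = poly.fit_transform(X)
    cols = poly.get_feature_names_out(PRE_TREATMENT_COLS)
    return pd.DataFrame(X_poly, columns=cols, index=df.index)
```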
In time-sensitive domains like oil and gas operations, feature engineering shouldn’t just be about statistical creativity; we have to understand how the temporal sequence of events matters and respect causality. A production engineer would never use post-treatment results to justify a treatment plan. Our model shouldn’t either.
What I learned (and didn’t learn) in school
This opportunity to shift immediately from a theoretical workflow to field-ready software taught me a lot about the pains associated with technology transfer. We didn’t call it that back then, but it had the core tech transfer elements: bridging the knowledge gap between ML developers and petroleum engineers, embedding field-specific constraints (like time-based decision cycles and treatment chemical behavior) into the model logic as microservices, and deploying a working system that lives beyond an academic publication and contributes to operational decisions.
If I had to summarize my Ph.D. and the applied AI technology transfer project that immediately followed it into a few lessons, these would be it:
Lesson 1: Problem scoping is everything.
Lesson 2: Domain knowledge is non-negotiable.
Lesson 3: ML models age and degrade quietly unless they’re embedded in industrial workflows.
Lesson 4: Oil companies want explanations. Good accuracy as a black box wouldn’t fly with field teams.
Now, everything I built came before the explosion of LLMs and GenAI. But if I were doing it today, I’d take the same core and go further:
Lesson 5: Containerize the ML + rules engine, wrap it in an agentic GenAI layer for interpretation and field dialogue, deploy it across real assets, and monetize it as a decision-intelligence SaaS for chemical treatment optimization.
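As a rough illustration of the first half of that lesson, a containerized service wrapping the ensemble and the rules engine could look something like the following sketch. FastAPI is my choice here, the endpoint and field names are hypothetical, and the agentic GenAI layer would sit in front of this API to interpret and discuss recommendations with field teams:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class WellSnapshot(BaseModel):
    # Hypothetical pre-treatment inputs sent from the field.
    avg_rate_60d_pre: float
    avg_rate_7d_pre: float
    baseline_gross_bpd: float

@app.post("/recommendation")
def recommendation(w: WellSnapshot) -> dict:
    # Placeholder: a real deployment would load the trained ensemble and compute
    # pred_gain_bpd = ensemble.predict(features_from(w)).
    pred_gain_bpd = 0.0
    if pred_gain_bpd > 5:
        action = "GREEN: recommend chemical treatment"
    elif w.baseline_gross_bpd < 50:
        action = "RED: flag for engineer review"
    else:
        action = "YELLOW: defer to field judgment"
    return {"predicted_gain_bpd": pred_gain_bpd, "action": action}
```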