Career Point International Journal of Research (CPIJR)
©2022 CPIJR ǀ Volume 3 ǀ Issue 4 ǀ ISSN: 2583-1895
July-September 2025 | DOI: https://doi.org/10.5281/zenodo.17336310
Comparative Analysis of Crude Oil and Gas Production Prediction Using
Various Machine Learning Models
Saloni Sharma¹, Dr. Garima Tyagi²
¹Student (BCA), School of Computer Application and Technology, Career Point
University, Kota (Raj.), India
²Professor, School of Computer Application and Technology, Career Point University, Kota
(Raj.), India
Abstract:
Accurate forecasting of crude oil and gas production is critical to the strategic planning and
operational efficiency of the energy industry. Traditional statistical approaches often fall
short in capturing the non-linear and dynamic patterns inherent in petroleum production data.
This research paper explores the application of machine learning (ML) and deep learning
techniques to predict petroleum and gas production more accurately using historical and
geological datasets.
The study conducts a comparative analysis of four predictive models (Linear Regression,
Random Forest, XGBoost, and LSTM) based on their performance metrics, including
R-squared (R²), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE). The
research methodology includes data preprocessing, normalization, model training, and
validation using an 80-20 train-test split.
The models are evaluated not only in terms of predictive accuracy but also in their ability to
handle complex data structures. To enhance practical usability, the models are integrated into
an interactive Streamlit dashboard that enables real-time prediction and visualization. Among
the evaluated models, LSTM demonstrated superior performance due to its ability to capture
time-series dependencies effectively. This paper concludes that deep learning approaches,
when combined with interactive analytics tools, offer a robust framework for production
forecasting in the energy sector.
Keywords: Crude Oil Prediction, Gas Production Forecasting, Machine Learning, LSTM,
Random Forest, XGBoost, Linear Regression, Streamlit Dashboard, Time-Series Analysis,
Energy Data, MAE, RMSE, R² Score, Deep Learning, Forecasting Models, Petroleum
Industry, Predictive Analytics.
Introduction:
The oil and gas sector is a fundamental pillar of modern civilization, powering essential
services and infrastructure around the globe. With fluctuating demands and global economic
pressures, accurate production forecasting has become vital for energy companies striving to
maintain operational efficiency and strategic foresight. As the world continues to depend on
fossil fuels, improving our ability to forecast production levels has both financial and
environmental implications.
Traditionally, production forecasting in the petroleum sector relied on statistical methods like
regression and time-series analysis. However, these methods struggle to model complex,
non-linear relationships and often fail when faced with real-world data variability. The evolution of
machine learning has ushered in a new era where data-driven models can learn patterns from
historical and geological data, adapt to unseen inputs, and make accurate predictions over time.
Machine learning models such as Random Forest and XGBoost are particularly useful due to
their ensemble nature and ability to manage feature interactions. Similarly, deep learning
models like Long Short-Term Memory (LSTM) are well-suited for time-series forecasting,
offering a way to understand temporal dependencies in production data. These models provide
not only accuracy but also scalability and adaptability in ever-changing energy markets.
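To make the idea of temporal dependencies concrete, the sketch below shows one common way a production series can be framed for a sequence model such as an LSTM: each training sample is a window of past readings and the target is the next reading. The function name `make_windows`, the window length of 3, and the toy series are illustrative assumptions, not details taken from this study.

```python
def make_windows(series, window):
    """Turn a 1-D series into (past-window, next-value) pairs."""
    if window < 1 or window >= len(series):
        raise ValueError("window must be between 1 and len(series) - 1")
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])  # the past `window` readings
        targets.append(series[start + window])       # the value to predict
    return inputs, targets


# Toy daily flow readings (made-up numbers, for illustration only).
flow = [10.0, 12.0, 11.0, 13.0, 14.0, 15.0]
X, y = make_windows(flow, window=3)
```

A Keras LSTM layer expects input shaped as (samples, timesteps, features), so windows like these would still need reshaping before training; that step is omitted here.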
In this research, we evaluate and compare four powerful algorithms (Linear Regression,
Random Forest, XGBoost, and LSTM) for their effectiveness in forecasting crude oil and gas
production. Real-world datasets including features like flow rate, average pressure, condensate,
and water-gas ratio were used to train these models. The project utilizes standard performance
metrics (R², MAE, and RMSE) to assess how well each model generalizes to unseen data.
A major strength of this research is the integration of model outcomes into an interactive
Streamlit dashboard. This tool allows stakeholders to visualize production predictions in real
time, select models dynamically, and filter datasets based on their needs. This feature ensures
that insights derived from complex ML algorithms are easily accessible and actionable for both
technical experts and business managers.
Overall, this study emphasizes the potential of machine learning and deep learning models in
revolutionizing production forecasting in the petroleum sector. By offering a comparative
analysis combined with a practical deployment solution, it presents a holistic approach to
data-driven energy management. The outcomes of this research can guide more informed
decision-making, risk mitigation, and strategic planning in energy operations.
Review of Literature:
The field of petroleum production forecasting has seen a significant shift with the
introduction of machine learning and deep learning methodologies. Traditional models like
ARIMA and exponential smoothing, although once standard in production prediction, are
now considered limited in their capacity to deal with the complex and non-linear nature of
geological data. Recent literature explores the use of advanced algorithms to overcome these
challenges.
Singh and Sharma (2020) conducted a detailed study on the use of Long Short-Term Memory
(LSTM) networks for oil production forecasting. Their research highlighted how LSTM
models, due to their memory retention capabilities, outperformed classical time-series models
in capturing temporal patterns and long-term dependencies. Their work validated that deep
learning can be a more effective alternative for forecasting tasks involving sequential data.
Kumar and Patel (2021) expanded on this by performing a comparative analysis of multiple
machine learning models, including Random Forest and XGBoost, on energy sector datasets.
Their findings supported the use of ensemble learning methods, which showed better
generalization capabilities and robustness in modelling noisy, non-linear data commonly
found in petroleum production.
Several other researchers have contributed to the growing body of knowledge in this domain.
Zhang and Jin (2019) focused on the implementation of ensemble models and reported
promising results when forecasting oil well performance. Meanwhile, Brownlee (2018)
emphasized the importance of combining domain expertise with machine learning
frameworks to improve model reliability and interpretability.
Moreover, the technical infrastructure supporting this research has evolved. Libraries like
Scikit-learn and TensorFlow have become standard tools for implementing, training, and
validating ML models. Their extensive documentation and active community support provide
the foundation for developing scalable and reproducible models. In this project, these
libraries were used to ensure consistency and performance across different modelling
techniques.
Despite the extensive progress in predictive modelling, a gap remains in translating these
complex models into user-friendly platforms that enable real-time interaction and
decision-making. Most existing studies focus heavily on accuracy and model comparison but do not
integrate these insights into usable interfaces. This paper bridges that gap by embedding ML
and DL models into a Streamlit dashboard, offering an intuitive, real-time forecasting tool.
In summary, the review of literature reveals a consistent trend: machine learning and deep
learning models are transforming petroleum production forecasting. By coupling
them with interactive visual analytics platforms, this project advances both technical accuracy
and practical application in the energy industry.
Research Gap Identified:
Despite numerous studies and existing models for comparative analysis of crude oil and gas
production, several important gaps have been identified through an in-depth analysis of
previous research:
1. Lack of Real-Time Visualization Tools:- Most existing studies focus solely on model
accuracy without integrating predictive outputs into interactive platforms. There is limited
work on combining ML models with real-time dashboards for operational
decision-making.
2. Limited Comparative Studies Across Multiple ML Models:- While individual models
like LSTM or Random Forest are widely researched, fewer studies offer a side-by-side
comparison of multiple ML and DL algorithms specifically for crude oil and gas
production forecasting.
3. Underutilization of Deep Learning for Temporal Patterns:- Traditional ML
approaches dominate most petroleum forecasting literature. The potential of deep learning
models like LSTM, which are excellent for capturing time-series dependencies, remains
underexplored in real-world production datasets.
4. Minimal Feature Engineering and Influence Analysis:- Existing research often
neglects the identification and analysis of key production influencers like CGR, WGR, or
pressure variations. This project highlights these using feature importance analysis from
XGBoost.
5. Scalability and Deployment Not Addressed:- Many academic papers stop at model
evaluation and fail to discuss deployment in scalable environments. This work
contributes by deploying models through a Streamlit dashboard, offering real-world
usability and scalability.
Research Objective:
The primary objective of this study is to carry out a comparative analysis of crude oil and gas
production prediction using machine learning models. To achieve this goal, the study outlines
the following specific objectives:
1. To analyze historical crude oil and gas production data to identify key trends, patterns,
and correlations influencing production outputs.
2. To apply and compare multiple machine learning and deep learning algorithms (Linear
Regression, Random Forest, XGBoost, and LSTM) for forecasting production.
3. To evaluate model performance using appropriate metrics such as R² (coefficient of
determination), MAE (Mean Absolute Error), and RMSE (Root Mean Squared Error).
4. To identify the most effective prediction model in terms of accuracy, speed, and
scalability for time-series petroleum data.
5. To perform feature importance analysis to determine the most influential variables (e.g.,
CGR, pressure, WGR) in predicting production.
6. To visualize prediction results and error distributions through advanced graphs like
Actual vs. Predicted, Residual plots, and heatmaps.
7. To deploy an interactive dashboard using Streamlit for real-time data upload, model
switching, and visual interpretation for stakeholders and decision-makers.
Research Methodology:
Dataset Used:
o Two real-world datasets were used: one for gas production and one
for petroleum flow. These datasets included variables such as Time, Total
Flow, Cumulative Flow, Condensate, Water, CGR (Condensate-Gas Ratio),
WGR (Water-Gas Ratio), and Average Pressure.
Tools and Technologies Used:
o Python 3.x for coding and model development
o Jupyter Notebook for data analysis and code execution
o Pandas and NumPy for data manipulation
o Matplotlib, Seaborn, and Plotly for visualizations
o Scikit-learn for implementing Linear Regression and Random Forest
o XGBoost for ensemble boosting
o TensorFlow and Keras for LSTM (deep learning model)
o Streamlit for dashboard creation and deployment
Techniques Applied:
o Data Cleaning: Removed null values and duplicates.
o Feature Engineering: Derived features from date and flow-related
columns
o Normalization: MinMaxScaler used for model scaling
o Train-Test Split: 80% for training, 20% for testing
o Evaluation Metrics: MAE, RMSE, R² for performance measurement
o Model Comparison: Compared all four models: Linear Regression,
Random Forest, XGBoost, and LSTM
o Visualization: Generated prediction vs. actual plots, residuals, and
feature importance graphs.
o Deployment: Developed an interactive Streamlit app for dynamic
model comparison and data upload.
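The normalization and 80-20 split listed above can be sketched in plain Python. The study names scikit-learn's MinMaxScaler; the hand-written helpers here (`chronological_split`, `fit_min_max`, `apply_min_max`) are hypothetical stand-ins that only illustrate the arithmetic. Splitting in time order and learning the scaling range from the training portion alone are assumptions about common practice for time-ordered data, not steps the source describes, and the pressure values are made up.

```python
def chronological_split(values, train_fraction=0.8):
    """Keep time order: the first 80% for training, the last 20% for testing."""
    cut = int(len(values) * train_fraction)
    return values[:cut], values[cut:]


def fit_min_max(train_values):
    """Learn the scaling range from the training portion only."""
    return min(train_values), max(train_values)


def apply_min_max(values, lo, hi):
    """Map values to [0, 1] relative to the training range."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - lo) / span for v in values]


# Made-up average-pressure readings in time order.
pressure = [100.0, 120.0, 110.0, 140.0, 130.0, 150.0, 160.0, 180.0, 170.0, 200.0]
train, test = chronological_split(pressure)
lo, hi = fit_min_max(train)
train_scaled = apply_min_max(train, lo, hi)
test_scaled = apply_min_max(test, lo, hi)
```

Because the range comes from the training data, a test value above the training maximum scales to more than 1, which is expected rather than an error.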
Suggestive Framework:
Description of the Flowchart Components -
Input Crude Oil and Gas Data: This is the first step where historical data
related to crude oil and gas production is collected. This data may include:
o Daily/monthly production rates
o Temperature, pressure
o Well information
This dataset is typically stored in a CSV or Excel format and serves as the input for
the analysis.
Data Preprocessing: Before training models, the raw data needs to be cleaned
and prepared:
o Handling missing values or null entries
o Removing duplicates
o Data type conversion
o Scaling and normalization
This ensures the dataset is clean and suitable for model training.
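The cleaning steps above can be illustrated with a small standard-library sketch. The column names (`time`, `total_flow`, `avg_pressure`), the sample rows, and the choice to drop incomplete rows rather than impute them are assumptions made for illustration; the study itself lists Pandas for this kind of work.

```python
import csv
import io

# Hypothetical raw export; column names and values are made up for illustration.
RAW_CSV = """time,total_flow,avg_pressure
2021-01-01,105.2,310.5
2021-01-02,,309.8
2021-01-03,98.7,308.1
2021-01-03,98.7,308.1
2021-01-04,101.4,
"""


def clean_rows(csv_text):
    """Drop incomplete rows and exact duplicates, then convert numeric columns."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Handle missing values: skip any row with an empty or absent field.
        if any(value is None or value.strip() == "" for value in row.values()):
            continue
        # Remove duplicates: keep only the first copy of identical rows.
        key = tuple(row.items())
        if key in seen:
            continue
        seen.add(key)
        # Data type conversion: numeric columns arrive as strings.
        cleaned.append({
            "time": row["time"],
            "total_flow": float(row["total_flow"]),
            "avg_pressure": float(row["avg_pressure"]),
        })
    return cleaned


rows = clean_rows(RAW_CSV)
```

Of the five raw rows, two are dropped for missing values and one as a duplicate, leaving two clean records.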
Feature Extraction: In this step, relevant features (input variables) are
selected or engineered:
o Identify features that most influence the output (e.g., pressure, flow
rate)
o Remove irrelevant or redundant columns
o Possibly create new features through mathematical combinations
These features help improve model accuracy.
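As one example of creating features through mathematical combinations, the ratio features named in the dataset description (CGR and WGR) can be derived from raw volumes. This is a minimal sketch: the field names `gas_flow`, `condensate`, and `water`, the sample numbers, and the decision to return 0.0 when gas flow is zero are all assumptions, and consistent units are taken for granted.

```python
def add_ratio_features(record):
    """Return a copy of the record with CGR and WGR added."""
    gas = record["gas_flow"]
    enriched = dict(record)
    # Guard against division by zero on days with no gas flow.
    enriched["cgr"] = record["condensate"] / gas if gas else 0.0  # condensate-gas ratio
    enriched["wgr"] = record["water"] / gas if gas else 0.0       # water-gas ratio
    return enriched


# Made-up single-day record.
sample = {"gas_flow": 200.0, "condensate": 10.0, "water": 4.0}
features = add_ratio_features(sample)
```

The original record is left unchanged, so the raw columns remain available if a later step decides a derived feature is redundant.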
ML Models (Linear Regression, Decision Tree, Random Forest,
XGBoost): Multiple machine learning models are trained on the dataset:
o Linear Regression: For baseline prediction
o Decision Tree: For interpretability
o Random Forest: For higher accuracy using ensemble learning
o XGBoost: For robust and efficient boosting-based prediction
These models are compared to find the best-performing one.
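To show what the baseline in this list involves, here is a minimal ordinary-least-squares fit for a single feature, written without external libraries. It is a sketch under stated assumptions: the helper names `fit_line` and `predict` are hypothetical, the data are made up to lie exactly on a line, and a real run would use several features and the libraries named in the methodology.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature is constant; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(xs, slope, intercept):
    """Apply the fitted line to new feature values."""
    return [slope * x + intercept for x in xs]


# Made-up pressure and flow values lying exactly on flow = 2 * pressure + 5.
pressure = [1.0, 2.0, 3.0, 4.0]
flow = [7.0, 9.0, 11.0, 13.0]
slope, intercept = fit_line(pressure, flow)
forecast = predict([5.0], slope, intercept)
```

The tree-based models in the list have no comparably short closed form, which is why they are normally taken from scikit-learn and XGBoost rather than written by hand.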
Analysis of Prediction: After training the models, their predictions are
compared against actual values using:
o Graphs (line plots, scatter plots)
o Metrics like MAE, RMSE, and R² score
This helps understand how well each model performed.
Forecast of Prediction Results: Here, the chosen model is used to forecast
future crude oil and gas production values. These predictions are shown in graphs or
tables and are useful for decision-making.
Analysis of Prediction Results: The final stage includes:
o Comparative analysis of all models
o Drawing insights from forecasted values
o Interpretation of which features impact production most
This step ensures actionable results are derived from the models.
Data Analysis & Interpretation:
The data analysis and interpretation phase is a vital component of the machine learning
pipeline for predicting crude oil and gas production. Once the models (such as Linear
Regression, Decision Tree, Random Forest, and XGBoost) generate predictions, these
outputs are subjected to rigorous analysis to evaluate their accuracy, reliability, and practical
value.
1. Performance Evaluation
To understand how well each model performs, we employ various statistical evaluation
metrics:
o Mean Absolute Error (MAE): Measures the average magnitude of errors in a
set of predictions, without considering their direction.
o Root Mean Squared Error (RMSE): Provides insight into the magnitude of
prediction errors and penalizes large errors more than MAE.
o R-squared (R²) Score: Indicates how well the model explains the variability of
the target variable. A value close to 1 means a better fit.
These metrics help in quantitatively comparing the performance of different
models and in selecting the most effective algorithm for prediction tasks.
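The three metrics above follow directly from their standard definitions, as the sketch below shows. The function names and the four sample values are illustrative assumptions; in practice the equivalent functions in scikit-learn's `sklearn.metrics` module would typically be used.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: average size of the errors, ignoring sign."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error: squaring weights large errors more heavily."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


def r2_score(actual, predicted):
    """R-squared: share of the target's variance explained by the model."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values are constant; R-squared is undefined")
    return 1.0 - ss_res / ss_tot


# Made-up actual and predicted production values.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
scores = {
    "MAE": mae(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "R2": r2_score(actual, predicted),
}
```

For these toy values MAE is 2.5, RMSE is about 2.55, and R² is 0.948. RMSE exceeds MAE because the two errors of size 3 count for more once squared.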
2. Visual Analysis
Beyond numerical metrics, visual tools offer an intuitive understanding of model
predictions:
o Line graphs compare actual vs. predicted values over time to show how closely
the model follows real-world trends.
o Scatter plots reveal the correlation between observed and predicted values.
o Residual plots are used to diagnose errors and detect patterns that might
indicate model bias or poor fit.
These visualizations help in uncovering underlying trends and highlight where the
model might be underperforming.
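The quantity behind a residual plot is simple to compute, and a numeric summary can hint at bias before anything is drawn. The following is a minimal sketch with made-up values; the helper names are hypothetical, and the plotting itself (for which the study lists Matplotlib, Seaborn, and Plotly) is left out.

```python
def residuals(actual, predicted):
    """Residual = actual minus predicted, one value per observation."""
    return [a - p for a, p in zip(actual, predicted)]


def mean_residual(res):
    """A mean far from zero suggests the model is systematically off."""
    return sum(res) / len(res)


# Made-up values in which every prediction falls short of the actual reading.
actual = [100.0, 110.0, 120.0, 130.0]
predicted = [98.0, 107.0, 118.0, 127.0]
res = residuals(actual, predicted)
bias = mean_residual(res)
```

Here every residual is positive and the mean is 2.5, the kind of one-sided pattern a residual plot would show as points sitting above the zero line.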