Corresponding author: Deepika Annam
Copyright © 2025 Author(s) retain the copyright of this article. This article is published under the terms of the Creative Commons Attribution License 4.0.
AI-powered real-time data pipeline optimization using deep reinforcement learning
Deepika Annam *
Independent Researcher, USA.
World Journal of Advanced Research and Reviews, 2025, 26(02), 2647-2653
Publication history: Received on 08 April 2025; revised on 16 May 2025; accepted on 19 May 2025
Article DOI: https://doi.org/10.30574/wjarr.2025.26.2.1957
Abstract
Deep Reinforcement Learning (DRL) represents a transformative paradigm for real-time data pipeline optimization across diverse industrial applications. Traditional optimization techniques often yield suboptimal results in dynamic environments with fluctuating workloads, while DRL enables autonomous systems to adapt through experience. This article examines how DRL integrates with distributed stream processing systems to address critical challenges, including workload unpredictability, resource dependencies, and infrastructure heterogeneity. The integration of neural networks with reinforcement learning principles allows for sophisticated decision-making that significantly improves resource utilization and operational efficiency. Various algorithms, including Deep Q-Networks, Proximal Policy Optimization, and Soft Actor-Critic, demonstrate particular efficacy in different application contexts. From healthcare to data centers, robotics to IoT systems, DRL implementation delivers measurable improvements in throughput, latency reduction, and resource optimization. Though implementation challenges exist, including hyperparameter sensitivity and sample efficiency considerations, the potential benefits of DRL-powered optimization for data-intensive industries are substantial, offering a path toward more intelligent, adaptive, and efficient data processing architectures.
Keywords: Deep Reinforcement Learning; Data Pipeline Optimization; Stream Processing; Resource Management; Adaptive Control
1. Introduction
In today's distributed stream processing systems, thousands of real-time streams may enter the system through processing nodes, where hundreds of nodes may be co-located or geographically distributed. Resource management for these systems is complicated by several factors: processing elements are constrained by producer-consumer relationships, data and processing rates can be highly bursty, and traditional measures of effectiveness, such as utilization, can be misleading [1]. The stream processing paradigm has always played a key role in time-critical systems, with applications ranging from real-time exploratory data mining to high-performance transaction processing [1].
Traditional optimization techniques for data pipelines, such as manual tuning and heuristics, usually yield suboptimal results and resource utilization, especially in changing environments with different workloads [2]. Resource management challenges include workload dynamicity, unpredictability, complex resource dependencies, heterogeneity of infrastructure, and multiple optimization objectives [2]. The classical solution to burstiness problems is to add buffers, but designing for very high data rates and scalability makes buffering increasingly expensive as system memory becomes a severe constraint [1].
Reinforcement Learning (RL) has gained pronounced recognition in recent decades as a powerful paradigm aimed at self-organizing and controlling complex systems [2]. In RL, an agent learns how to make the best decisions in interaction
with an environment by maximizing a cumulative reward signal [2]. The emergence of deep reinforcement learning techniques has further improved the applicability and effectiveness of RL in different fields [2].
Experimental results from case studies show promising improvements through RL applications. For Apache Spark, an RL-based resource allocation method completed tasks up to 20% faster than heuristic policies and used resources 25% more efficiently [2]. In Apache Flink, an RL-based approach to data flow control obtained a 30% reduction in end-to-end latency and a 20% increase in throughput compared to rule-based policies [2]. For Kubernetes task placement, the RL policy achieved up to 15% shorter task completion times and 20% fewer messages than heuristic approaches [2].
The ACES (Adaptive Control of Extreme-scale Stream processing systems) approach proposes a two-tiered optimization where global optimization determines time-averaged allocations, and a distributed resource controller uses adaptive control to ensure stability in the presence of burstiness [1]. This approach outperforms traditional approaches in terms of weighted throughput by over 20% in the limit of small buffers and over a wide range of burstiness levels, while maintaining end-to-end delay as little as a third of that of traditional approaches [1].
2. Fundamentals of Deep Reinforcement Learning for Data Pipelines
Deep Reinforcement Learning (deep RL) integrates the principles of reinforcement learning with deep neural networks, enabling agents to excel in diverse tasks [3]. According to Terven's overview, reinforcement learning is a paradigm of machine learning in which an agent learns an optimal behavior by interacting with an environment, receiving feedback in the form of rewards or penalties, and adapting its actions to maximize long-term returns [3]. The agent aims to maximize the expected cumulative reward, which can be written in the infinite-horizon setting as follows: $\mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t}\right]$, where $r_{t}$ is the reward received at time $t$, and $0 \le \gamma < 1$ is a discount factor that balances the importance of immediate versus future rewards [3].
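As an illustration of the discounted objective above, the following minimal sketch computes the discounted sum for a finite reward sequence. The function name `discounted_return` and the example rewards are assumptions made for illustration; a real agent would estimate the expectation over many sampled trajectories.

```python
def discounted_return(rewards, gamma=0.99):
    """Return sum_t gamma**t * rewards[t] for a finite reward sequence."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must satisfy 0 <= gamma < 1")
    total = 0.0
    # Walk backwards so each earlier step discounts everything after it once.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Three unit rewards with gamma = 0.5: 1 + 0.5 + 0.25
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1.75
```

A smaller gamma makes the agent weigh immediate rewards more heavily, while a gamma close to 1 gives later rewards nearly equal weight.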
The RL framework consists of states, actions, rewards, policies, and value functions [3]. The state space represents the current condition of the system. In the context of data pipelines, as noted by Rafie et al., "Real-world problems usually have many features making it hard to model and describe the data" [4]. The action space encompasses all possible interventions the agent can take. Terven explains that policy gradient methods directly learn a parameterized policy π(a|s,θ) that maps states to action probabilities [3]. The reward function defines the optimization goals. A transformative breakthrough occurred when deep Q-networks (DQNs) demonstrated human-level performance on dozens of Atari 2600 video games using only raw pixel inputs and game scores as the sole training signals [3]. DQN addressed key challenges through two crucial stabilization techniques: experience replay and a target network [3]. Experience replay stores transitions in a replay buffer and samples mini-batches randomly for training, breaking the strong correlations present in sequential observations [3]. The target network is a copy of the Q-network that is held fixed for a number of iterations and then periodically updated, which slows down changes in the target and reduces oscillations [3].
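The two stabilization techniques can be sketched in a few lines. This is a simplified illustration, not the DQN implementation: the class `ReplayBuffer`, the helper `maybe_sync_target`, the dictionary-based parameters, and the `sync_every` interval are all assumptions chosen for brevity.

```python
import copy
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of transitions; the oldest entries are evicted first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation between consecutive steps.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def maybe_sync_target(step, online_params, target_params, sync_every=1000):
    """Copy online parameters into the target only every `sync_every` steps."""
    if step % sync_every == 0:
        target_params.clear()
        target_params.update(copy.deepcopy(online_params))
        return True
    return False
```

Between synchronizations the target parameters stay frozen, so the regression target moves only at the chosen interval.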
For data pipeline optimization challenges, Rafie et al. identify several limitations in traditional approaches: "Although the methods mentioned above can improve learning performance, however, they are involved with several limitations. For example, before starting the feature selection process, it is necessary to have access to the whole feature space. While in many real-world applications, such as a renowned microblogging and social networking service, features appear over time, and it is impossible to have all features at the beginning of the process" [4].
Soft actor-critic (SAC) is particularly relevant for continuous control tasks. As Terven notes, by optimizing not just for reward but also for high action entropy, SAC avoids collapsing to deterministic or overly narrow policies, substantially improving exploration [3]. In practical robotic scenarios, for example, navigating uneven terrain or manipulating objects under uncertainty, SAC's stochastic exploration allows the agent to discover robust strategies without extensive manual tuning [3].
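The trade-off between reward and action entropy is commonly written as a maximum-entropy objective. The form below is a sketch in the discounted notation used earlier in this section; the temperature symbol $\alpha$ and the entropy notation $\mathcal{H}$ are standard in the SAC literature but are assumptions here, since the article does not define them.

```latex
J(\pi) = \mathbb{E}_{\pi}\left[ \sum_{t=0}^{\infty} \gamma^{t}
  \Big( r_{t} + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_{t})\big) \Big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s_{t})\big)
  = -\,\mathbb{E}_{a \sim \pi(\cdot \mid s_{t})}\big[ \log \pi(a \mid s_{t}) \big]
```

Setting $\alpha = 0$ recovers the standard expected discounted return, while larger values of $\alpha$ reward more stochastic policies.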
Rafie et al. propose multi-objective approaches to feature selection that could be applicable to data pipelines: "The first objective function maximizes the relevancy criterion, while the second minimizes redundancy among the selected features" [4]. This approach is particularly valuable as "in contrast to most prior methods using an objective function, the Pareto set is used to select features with maximum relevance and minimal redundancy" [4].
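To make the Pareto-set idea concrete, the sketch below keeps only candidates that are not dominated on the two objectives (higher relevance, lower redundancy). It is a generic non-dominated filter, not the method of Rafie et al.; the feature names and scores are invented for illustration.

```python
def pareto_front(candidates):
    """Return names of candidates not dominated by any other candidate.

    `candidates` maps a name to a (relevance, redundancy) pair, where
    relevance is to be maximized and redundancy minimized.
    """

    def dominates(a, b):
        # a dominates b: no worse on both objectives, strictly better on one.
        no_worse = a[0] >= b[0] and a[1] <= b[1]
        strictly_better = a[0] > b[0] or a[1] < b[1]
        return no_worse and strictly_better

    return [
        name
        for name, score in candidates.items()
        if not any(
            dominates(other, score)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]


# Hypothetical (relevance, redundancy) scores for four features.
scores = {"f1": (0.9, 0.5), "f2": (0.7, 0.2), "f3": (0.6, 0.6), "f4": (0.9, 0.7)}
print(pareto_front(scores))  # ['f1', 'f2']
```

Here f3 and f4 are both dominated by f1, while f1 and f2 represent different trade-offs and therefore both remain in the Pareto set.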
According to Terven, four critical challenges exist in applying RL to real-world systems: sample efficiency, safety, interpretability, and multi-task learning [3]. For data pipelines, Rafie et al. note that "three critical conditions must satisfy each online multi-label streaming feature selection method; To begin, no domain knowledge of feature space
should be required. Also, it must perform effective incremental updates in selected features. Furthermore, it should be accurate in each time instance for the classification performance to be acceptable" [4].
The application of DRL to data pipelines aligns with its broader use in resource management. As Terven notes, "In resource management scenarios, RL is used in distributed systems and cloud infrastructures. Data centers rely on RL to allocate computational resources, balance server loads, and regulate energy consumption" [3]. This makes DRL particularly suitable for optimizing data pipelines, where resources must be dynamically allocated in response to changing workloads and conditions.
Table 1 Chronological Evolution of Deep Reinforcement Learning Algorithms for Resource Management [3,4]
| Algorithm | Key Characteristics | Year Introduced |
|---|---|---|
| DQN (Deep Q-Network) | Uses experience replay and target networks | 2015 |
| PPO (Proximal Policy Optimization) | Clips probability ratio to prevent large policy updates | 2017 |
| TRPO (Trust Region Policy Optimization) | Enforces constraint on policy change between updates | 2015 |
| SAC (Soft Actor-Critic) | Maximizes both reward and entropy for exploration | 2018 |
| DDPG (Deep Deterministic Policy Gradient) | Uses deterministic policy with target networks | 2015 |
| A3C (Asynchronous Advantage Actor-Critic) | Uses multiple workers to decorrelate experience | 2016 |
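The clipping mechanism listed for PPO in Table 1 can be illustrated for a single sample. The sketch below follows the commonly used clipped surrogate form; the function name `ppo_clipped_term` and the default `clip_eps=0.2` are assumptions, and a practical implementation would average this term over a batch and maximize it by gradient ascent.

```python
def ppo_clipped_term(ratio, advantage, clip_eps=0.2):
    """Clipped surrogate for one sample.

    `ratio` is the new policy's action probability divided by the old
    policy's; `advantage` estimates how much better the action was than
    average.
    """
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    # Taking the minimum removes the incentive to move the ratio
    # outside [1 - clip_eps, 1 + clip_eps] in the favourable direction.
    return min(ratio * advantage, clipped_ratio * advantage)
```

For a positive advantage, a ratio of 1.5 is credited only as if it were 1.2, which is how the update size is limited.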
3. Implementing DRL-Powered Pipeline Optimization
Implementing DRL for data pipeline optimization involves several key components that enable adaptive performance tuning for recommendation models. According to Nagrecha et al., their InTune system demonstrated that DRL-based optimization can increase data ingestion throughput by as much as 2.29X versus current state-of-the-art data pipeline optimizers while improving both CPU and GPU utilization [5]. This significant improvement highlights the effectiveness of reinforcement learning approaches for pipeline optimization.
The DRL agent is at the core of InTune, learning how to distribute CPU resources across a DLRM data pipeline to effectively parallelize data-loading and improve throughput. The system environment reflects various factors, including pipeline latency, free CPUs, free memory in bytes, model latency, DRAM-CPU bandwidth, and CPU processing speed [5]. The agent uses this information to determine appropriate resource allocation. As explained by Nagrecha et al., the reward function is based on pipeline throughput and memory usage, designed so that rewards approach zero as memory consumption nears 100%, thus preventing out-of-memory errors that frequently occur with other optimization approaches [5].
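One simple way to obtain a reward with the property described above is to scale throughput by the fraction of memory still free. The sketch below is an assumed linear form for illustration only; the exact reward used by InTune is not given in this article, and the function and parameter names are hypothetical.

```python
def pipeline_reward(throughput, memory_used_bytes, memory_total_bytes):
    """Throughput scaled by the free-memory fraction (assumed linear form).

    The reward falls to zero as memory usage approaches 100%, so an agent
    maximizing it is discouraged from allocations that risk running out
    of memory.
    """
    if memory_total_bytes <= 0:
        raise ValueError("memory_total_bytes must be positive")
    used_fraction = min(max(memory_used_bytes / memory_total_bytes, 0.0), 1.0)
    return throughput * (1.0 - used_fraction)


# Hypothetical reading: 1000 samples/s with half of memory in use.
print(pipeline_reward(1000.0, 8 * 2**30, 16 * 2**30))  # 500.0
```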
InTune's DRL agent uses a simple three-layer MLP architecture to minimize computational demands, requiring only about 200 FLOPs per iteration. This lightweight design ensures the agent doesn't interfere with the actual model training job [5]. The action space is designed to be incremental, allowing the agent to raise, maintain, or lower resource allocation for each pipeline stage by specified increments. This approach enables rapid convergence to an optimized solution within just a few minutes, even on complex real-world pipelines [5].
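An incremental action space of this kind can be sketched as follows. The stage names, the step size, and the CPU budget are hypothetical, and the clamping rules are an assumption about how such a scheme might keep allocations valid rather than a description of InTune's code.

```python
LOWER, MAINTAIN, RAISE = -1, 0, 1


def apply_action(allocation, stage, action, step=1, min_cpus=1, total_cpus=16):
    """Return a new allocation with `stage` adjusted by one increment.

    The result is clamped so the stage keeps at least `min_cpus` and the
    sum over all stages never exceeds `total_cpus`.
    """
    new_allocation = dict(allocation)
    proposed = max(new_allocation[stage] + action * step, min_cpus)
    used_by_others = sum(new_allocation.values()) - new_allocation[stage]
    new_allocation[stage] = min(proposed, total_cpus - used_by_others)
    return new_allocation


# Hypothetical three-stage pipeline with 4 CPUs per stage.
alloc = {"read": 4, "decode": 4, "batch": 4}
print(apply_action(alloc, "decode", RAISE, step=2))
# {'read': 4, 'decode': 6, 'batch': 4}
```

Because each action changes one stage by a bounded amount, the agent explores neighbouring configurations instead of jumping between arbitrary allocations.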
For IoT applications specifically, Mohammadi et al. note that traditional ML tools do not sufficiently address emerging analytic needs of IoT systems, particularly for streaming data that requires fast processing. Their survey emphasizes that IoT applications need different modern data analytics approaches according to the hierarchy of data generation and management [6]. They classify IoT analytics into big data analytics and streaming data analytics, with the latter requiring processing close to the source of data to remove unnecessary communication delays.
Mohammadi et al. also highlight that combining DRL with IoT enables more intelligent systems. They demonstrate that semi-supervised deep reinforcement learning can be applied to localization in smart campus environments, where the learning agent finds the best action to perform based on received signals from Bluetooth beacons [6]. Their experimental results show that the semi-supervised model consistently outperforms the supervised model in terms of rewards received and proximity to targets [6].
The implementation challenges for DRL in IoT contexts include the lack of large training datasets and preprocessing requirements. According to Mohammadi et al., most DL approaches require some preprocessing to yield good results, with image processing techniques working better when input data is normalized, scaled into specific ranges, or transformed into standard representations [6]. For IoT applications, preprocessing becomes more complex as the system deals with data from different sources that may have various formats and distributions while showing missing data [6].
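A minimal preprocessing step for a single sensor stream might combine imputation of missing readings with scaling into a fixed range. The sketch below uses mean imputation and min-max scaling as assumed, simple choices; the function name and sample readings are illustrative and not taken from the survey.

```python
import math


def impute_and_scale(values):
    """Fill missing readings with the observed mean, then scale to [0, 1].

    Missing readings are represented as None or NaN.
    """

    def is_missing(v):
        return v is None or math.isnan(v)

    observed = [v for v in values if not is_missing(v)]
    if not observed:
        raise ValueError("at least one observed reading is required")
    mean = sum(observed) / len(observed)
    filled = [mean if is_missing(v) else v for v in values]
    low, high = min(filled), max(filled)
    if high == low:
        # A constant stream carries no range information; map it to zeros.
        return [0.0 for _ in filled]
    return [(v - low) / (high - low) for v in filled]


print(impute_and_scale([10.0, None, 30.0, float("nan"), 20.0]))
# [0.0, 0.5, 1.0, 0.5, 0.5]
```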
Security and privacy preservation are also critical concerns for DRL implementations in IoT. Mohammadi et al. note that DL models must be enhanced with mechanisms to discover abnormal or invalid data, as they learn features from raw data and therefore can learn from invalid inputs. They suggest implementing a data monitoring DL model alongside the main model to address this issue [6].
Figure 1 Improvements with InTune DRL-based optimizer over standard AUTOTUNE [5,6]
4. Benefits and Performance Improvements
Organizations implementing reinforcement learning for optimization can achieve significant benefits based on findings from the literature. According to Ogunfowora and Najjaran's comprehensive survey [7], reinforcement learning has seen substantial growth in maintenance planning applications, with an 80% increase in the number of RL and DRL-based publications for maintenance planning between 2019 and 2023.
The application of reinforcement learning techniques has demonstrated meaningful improvements in diverse optimization contexts. As documented in [7], maintenance activities typically consume 15%-40% of total production costs in factories. By leveraging condition monitoring data with reinforcement learning, organizations can develop smart maintenance planners that serve as precursors to achieving a smart factory [7]. These approaches help reduce machine failures, improve reliability, and reduce maintenance and production costs associated with unplanned downtime.
RL optimization has shown benefits in resource management in different contexts. According to Poloskei, "Since the public cloud providers serve on-demand invoicing, the reserved resources should be connected to the running tasks" [8]. This is particularly important because "The training process of a deep learning model takes some time" and "the training quality can often be efficiently increased by committing more resources, like attaching computation-intensive hyperparameter optimization measures" [8].
Intelligent workflow management translates to efficiency benefits, as demonstrated in Poloskei's research. MLOps approaches in cloud-native ecosystems leverage the cloud's full capabilities as cloud-native services, making operations more affordable and implementation more powerful [8]. A study conducted by Hummer et al. and cited in subsequent research indicates that "data handling uses 7% of the total execution time, but this time can be reduced due to parallelized computing procedures" [8]. This efficiency gain stems from the ability to specify workflows as a Directed Acyclic Graph (DAG) [8].
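The link between a DAG specification and parallel execution can be shown with Python's standard-library `graphlib` module. The workflow below is a hypothetical example; the task names are assumptions, and a real orchestrator would dispatch each batch to workers rather than just list it.

```python
from graphlib import TopologicalSorter


def parallel_batches(dag):
    """Group tasks into batches whose members can run concurrently.

    `dag` maps each task to the set of tasks it depends on.
    """
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all dependencies satisfied
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Hypothetical workflow: two independent sources feed a join, then training.
workflow = {
    "clean_a": {"load_a"},
    "clean_b": {"load_b"},
    "join": {"clean_a", "clean_b"},
    "train": {"join"},
}
print(parallel_batches(workflow))
# [['load_a', 'load_b'], ['clean_a', 'clean_b'], ['join'], ['train']]
```

Tasks in the same batch have no dependencies on each other, which is what allows the data handling steps to be parallelized.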
RL-powered approaches demonstrate superior performance compared to traditional implementations. As noted in [7], organizations that developed proper maintenance policies were able to "reduce the costs associated with planned and unplanned downtime of machines and maintenance costs." The authors also observed that agents using deep reinforcement learning for maintenance planning of wind turbines "outperformed the corrective, scheduled, and predictive maintenance strategies irrespective of the number of available maintenance crews because the agent learned to perform maintenance activities when the wind turbines are in a low power mode or demand is low" [7].
Beyond direct performance benefits, organizations gain operational efficiencies. According to Poloskei, "The MLOps approach concentrates on the modeling, eliminating the personnel and technology gap in the deployment" [8]. This approach helps address significant challenges, as "For a flourishing big data project, the organization should have analytics and information-technological know-how" [8]. The MLOps paradigm helps bridge these gaps by providing a structured approach to data pipeline design in cloud-native ecosystems, which, according to Poloskei's analysis, is "the recommended way for data pipeline design" [8].
Figure 2 Data Insights from RL/DRL Implementation Research [7,8]
5. Industry Applications and Case Studies
DRL-powered pipeline optimization is delivering transformative results across numerous data-intensive industries. In healthcare, reinforcement learning applications have shown remarkable potential. As documented in Al-Hamadani et al.'s comprehensive review, reinforcement learning has been effectively applied in both healthcare and robotics domains [9]. For robotics applications, reinforcement learning addresses the challenges of robotic grasping and manipulation in unstructured and dynamic environments, which remain critical problems due to the variability and complexity of the real world [9]. Traditional machine learning approaches often struggle to handle the diversity of objects in terms of size, weight, texture, transparency, and fragility. Consequently, reinforcement learning has emerged as a solution, allowing robots to learn through trial and error and adapt to various situations [9].
In the healthcare sector, reinforcement learning techniques have been applied to cell growth problems, an area of increasing interest due to its significance in optimizing cell culture conditions, advancing drug discovery, and enhancing understanding of cellular behavior [9]. Studies have shown applications in modeling cell movement, particularly in the early stage of C. elegans embryogenesis, where deep reinforcement learning was combined with agent-based modeling frameworks to model basic cell behaviors, including cell fate, division, and migration [9].
For data-intensive computing infrastructure, thermal management represents a critical optimization challenge that directly impacts both performance and energy efficiency. Zhang et al. developed a deep reinforcement learning approach for data center thermal management that demonstrated significant potential [10]. Their comprehensive evaluation showed that actor-critic, off-policy, and model-based algorithms outperformed other approaches in terms of
optimality, robustness, and transferability [10]. These implementations were able to reduce constraint violations and achieve approximately 8.84% power savings in certain scenarios compared to default controllers [10].
Zhang et al. noted that while DRL techniques show promise, deploying these algorithms in real-world systems presents challenges as they are sensitive to specific hyperparameters, reward functions, and work scenarios [10]. Their experiments revealed that algorithms can be very sensitive to several techniques and hyperparameters, such as state preprocessing, learning rate, and network architecture [10]. The study identified that constraint violations and sample efficiency are areas that still require improvement before widespread real-world implementation [10].
The research conducted by Zhang et al. incorporated a comprehensive four-dimensional analysis of DRL applications in data centers, examining algorithms, tasks, system dynamics, and knowledge transfer [10]. This structured approach enabled detailed evaluation of various DRL algorithms for dynamic thermal management deployment using both analytical and numerical methods [10]. Their findings emphasize the importance of qualitative and quantitative evaluation metrics for comprehensive analysis, including stability, robustness, sample efficiency, safety, asymptotic performance, asymptotic improvement, and jumpstart [10].
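Three of these metrics can be computed directly from learning curves. The definitions below follow common usage in the transfer-learning literature and are assumptions here, since the article does not state the formulas used by Zhang et al.; the curves are invented numbers.

```python
def jumpstart(transfer_curve, baseline_curve):
    """Initial performance gain of the agent that reuses prior knowledge."""
    return transfer_curve[0] - baseline_curve[0]


def asymptotic_performance(curve, window=3):
    """Mean reward over the last `window` evaluations of a learning curve."""
    tail = curve[-window:]
    return sum(tail) / len(tail)


def asymptotic_improvement(transfer_curve, baseline_curve, window=3):
    """Difference in final performance between the two agents."""
    return asymptotic_performance(transfer_curve, window) - asymptotic_performance(
        baseline_curve, window
    )


# Hypothetical evaluation rewards per training epoch.
baseline = [0.0, 2.0, 4.0, 6.0, 6.0, 6.0]
transfer = [3.0, 5.0, 7.0, 8.0, 8.0, 8.0]
print(jumpstart(transfer, baseline))               # 3.0
print(asymptotic_improvement(transfer, baseline))  # 2.0
```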
These advancements demonstrate how DRL-powered optimization is transforming data processing across diverse industries, though challenges remain in achieving optimal implementation in real-world environments.
Table 2 Reinforcement Learning Performance Across Industrial Applications [9,10]
| Algorithm | Performance Metric | Value |
|---|---|---|
| PPO and SAC | Success Rate | 100% |
| YOLO and SAC | Success Rate (Building Blocks) | 95% |
| QMIX-PSA | Success Rate (Metal Workpieces) | 82% |
| | Success Rate (Daily Items) | 83% |
| SAC | Success Rate | 80% |
| PPO | | 70% |
6. Conclusion
Deep Reinforcement Learning has established itself as a powerful paradigm for optimizing data pipelines across numerous domains. The integration of neural networks with traditional reinforcement learning principles creates systems capable of learning optimal resource allocation strategies through interaction with complex environments. From healthcare applications that model cell growth and movement to data center thermal management systems that reduce power consumption while maintaining operational parameters, DRL demonstrates versatility and effectiveness. The technology shows particular strength in handling the dynamic, unpredictable nature of modern data processing environments, where traditional methods frequently falter. While implementation challenges persist, including sensitivity to hyperparameters and reward function design, the trajectory of advancement points toward increasingly robust solutions. Actor-critic architectures, off-policy learning, and model-based frameworks have demonstrated superior performance characteristics across multiple metrics. As these technologies mature, organizations can expect continued improvements in operational efficiency, resource utilization, and system performance. The future of data pipeline optimization likely involves increasingly sophisticated DRL implementations that combine the strengths of various algorithmic methods while mitigating their respective challenges, ultimately delivering more intelligent and responsive data processing ecosystems across industries.
References
[1] Lisa Amini et al., "Adaptive Control of Extreme-scale Stream Processing Systems", microsoft.com, 2006, [Online]. Available: https://www.microsoft.com/en-us/research/wp-content/uploads/2017/01/jain06extreme.pdf
[2] Chandrakanth Lekkala, "Leveraging Reinforcement Learning for Autonomous Data Pipeline Optimization and Management", IJSR, 2023, [Online]. Available: https://www.ijsr.net/archive/v12i5/SR24531190901.pdf
[3] Juan Terven, "Deep Reinforcement Learning: A Chronological Overview and Methods", MDPI, Feb. 2025, [Online]. Available: https://www.mdpi.com/2673-2688/6/3/46
[4] Azar Rafie et al., "A Multi-Objective online streaming Multi-Label feature selection using mutual information", ScienceDirect, 2023, [Online]. Available: https://www.sciencedirect.com/science/article/abs/pii/S0957417422024472
[5] Kabir Nagrecha et al., "InTune: Reinforcement Learning-based Data Pipeline Optimization for Deep Recommendation Models", ACM Digital Library, 2023, [Online]. Available: https://dl.acm.org/doi/fullHtml/10.1145/3604915.3608778
[6] Mehdi Mohammadi et al., "Deep Learning for IoT Big Data and Streaming Analytics: A Survey", arXiv, 2018, [Online]. Available: https://arxiv.org/pdf/1712.04301
[7] Oluwaseyi Ogunfowora and Homayoun Najjaran, "Reinforcement and Deep Reinforcement Learning-based Solutions for Machine Maintenance Planning, Scheduling Policies, and Optimization", arXiv, 2023, [Online]. Available: https://arxiv.org/pdf/2307.03860
[8] Istvan Poloskei, "MLOps approach in the cloud-native data pipeline design", ResearchGate, 2021, [Online]. Available: https://www.researchgate.net/publication/350775603_MLOps_approach_in_the_cloud-native_data_pipeline_design
[9] Mokhaled N A Al-Hamadani et al., "Reinforcement Learning Algorithms and Applications in Healthcare and Robotics: A Comprehensive and Systematic Review", National Library of Medicine, 2024, [Online]. Available: https://pmc.ncbi.nlm.nih.gov/articles/PMC11053800/
[10] Qingang Zhang et al., "Deep reinforcement learning towards real-world dynamic thermal management of data centers", ScienceDirect, 2023, [Online]. Available: https://www.sciencedirect.com/science/article/abs/pii/S0306261922018189