
Machine Learning Framework for Performance Prediction and Intelligent Resource Allocation in Complex Data Environments

Author: Wang, Ming
Publisher: Zenodo
DOI: 10.5281/zenodo.17537140
Source: https://zenodo.org/records/17537140/files/Machine+Learning+Framework+for+Performance+Prediction+and+Intelligent+Resource+Allocation+in+Complex+Data+Environments.pdf
Journal of Computer Technology and Software
ISSN: 2998-2383
Vol. 3, No. 5, 2024
Ming Wang
Northeastern University, San Jose, USA
mingwangc @gmail.com
Abstract: This paper focuses on the problem of database query execution time prediction and optimization. To address the limitations of traditional methods, which suffer from error accumulation and insufficient scheduling efficiency in complex query scenarios, it proposes a comprehensive framework that integrates structured modeling with adaptive scheduling. First, a Plan-Graph Guided Latency Modeling (PGLM) mechanism is designed, which explicitly incorporates structural features of query plans to enhance the model's awareness of operator patterns and join topologies, thereby improving prediction accuracy and generalization. Second, an Adaptive Query–Resource Orchestrator (AQRO) is constructed to dynamically match query demands with system resources under a prediction–execution interaction mechanism, ensuring continuous satisfaction of service-level objectives (SLOs) and maintaining system stability. The proposed method demonstrates strong robustness under different hyperparameters, resource quotas, and query template diversity, achieving low prediction errors and reasonable uncertainty calibration in dynamic environments. The results show that the framework performs well in both latency prediction and resource optimization, providing a new technical path for database system performance improvement.
Keywords: Query execution prediction; resource orchestration; latency modeling; system optimization
1. Introduction
In today's data-intensive applications, the efficiency of database query execution directly affects system service quality and user experience. With the continuous growth of data volume and the diversification of application requirements, achieving accurate execution time prediction and efficient resource scheduling for complex query plans has become a critical problem in database optimization. Accurate latency prediction not only helps avoid performance bottlenecks in advance but also provides key support for resource allocation and query optimization strategies, thereby improving overall system stability and response speed. Therefore, research on intelligent methods for query execution time prediction and optimization strategies carries significant theoretical and practical value[1].
However, existing methods still face many challenges in complex query scenarios. Traditional cost models are unable to adapt to dynamic changes in data distribution and system state, often leading to the accumulation of prediction errors. Data-driven learning models improve prediction accuracy but still suffer from limitations in generalization and robustness. In particular, when facing diverse query templates and uncertain runtime environments, their predictions often deviate from actual performance. At the resource scheduling level, current strategies lack fine-grained modeling of the coordination between query requirements and system resources, leading to uneven allocation and unstable system load, which negatively affects the satisfaction of service-level objectives (SLOs)[2].
To address these problems, this study introduces a comprehensive method that integrates query plan structural information with resource scheduling mechanisms[3]. On the one hand, a plan-graph-based latency modeling mechanism is constructed to enhance the model's structured perception of query execution processes, enabling more accurate latency estimation. On the other hand, an adaptive query–resource orchestration strategy is incorporated to achieve dynamic matching between query workloads and system resources, improving prediction stability and optimization performance in diverse scenarios. This bidirectional integration aims to form positive feedback between prediction and scheduling, driving overall improvements in database performance optimization[4].
The contributions of this paper lie in two main aspects. First, we propose Plan-Graph Guided Latency Modeling (PGLM), which explicitly incorporates query plan structural features into the prediction process. This enhances the model's ability to represent and understand complex query topologies, thereby improving prediction accuracy and generalization. Second, we design an Adaptive Query–Resource Orchestrator (AQRO), which achieves adaptive alignment between query demands and system resources under a prediction–execution interaction framework, balancing performance improvement with resource utilization efficiency. Together, these two innovations form an end-to-end intelligent optimization framework that provides a new solution for query prediction and optimization in databases[5].
2. Related work
2.1 Query Execution Time Prediction: From Cost Models to Data-Driven Learning
Traditional research has mainly focused on cost models based on rules and parameterized assumptions. Query execution time is decomposed into the sum of operator-level CPU, I/O, and network costs, fitted through cardinality estimation, selectivity, and cost tables. In earlier single-node, row-store architectures, these methods offered good interpretability and ease of implementation. However, modern database systems introduce columnar compression, vectorized execution, parallel pipelines, JIT compilation, compute-storage separation, and acceleration hardware, so latency is no longer a simple linear sum of independent operator costs[6]. Cache penetration, memory grants, concurrency, and resource governance strategies introduce strong nonlinearity and cross-layer coupling. In distributed settings, data skew, shuffle, retries, and fallback amplify the cascading effect of cardinality errors, making static cost models prone to systematic bias under mixed workloads, multi-tenant deployments, and elastic resource environments[7].
In response to this increasing complexity, data-driven, learning-based prediction has become an important direction. These methods typically rely on execution logs and construct multi-granularity features around query plans, data characteristics, and system telemetry. The features include logical and physical operator sequences, join graph density, predicate complexity, deviations between estimated and observed cardinalities, pipeline depth, concurrency, memory grants, cache hit rate, disk utilization, and network utilization[8]. One class of methods performs operator-level or stage-level latency regression and then combines the results to obtain overall query latency. Another class directly performs end-to-end prediction, using nonlinear models to capture how join order, selectivity, and data skew amplify effects along the critical path. Compared with traditional cost models, learning-based methods better accommodate heterogeneous hardware and dynamic resource strategies, and they provide fine-grained signals for identifying performance bottlenecks under different throttling and scheduling policies[9].
To enhance the representation of query plan structures and execution dependencies, recent studies emphasize structured representation learning. Typical approaches treat query plans as trees or directed acyclic graphs and use graph or sequence encoders to capture operator-level dataflow dependencies, parallel or blocking relationships, and cross-stage interference. Attention mechanisms are introduced to explicitly model critical paths and bottleneck operators. Context encoders are used to integrate static plans with runtime states, enabling models to respond to transient resource fluctuations and plan re-optimization. For long-term availability in production, online updating, incremental learning, and concept drift detection have been proposed. These are often combined with uncertainty estimation and calibration techniques, which help preserve prediction accuracy while providing confidence intervals that reduce the risk of incorrect decisions[10,11].
Learning-based methods also face challenges in data quality and generalization. Execution logs often contain missing values, noise, and skewed distributions, while extreme tail latencies significantly affect training and evaluation. Variations across workloads and schemas, together with workload evolution, cause feature distribution shifts. Cross-engine, cross-cluster, and cross-cloud deployment requires models with domain adaptation and parameter-efficient updating. Privacy and compliance restrictions limit data aggregation across tenants, motivating the exploration of weakly supervised, semi-supervised, and privacy-preserving learning. Mechanisms such as UDFs, approximate queries, and materialized view selection introduce unobservable or hard-to-quantify variables. In response, research has proposed feature governance, robust losses, resampling, and reweighting strategies. Hierarchical, multi-task, and multi-objective models have also been introduced to jointly capture both average and high-percentile latencies, thereby providing a more stable predictive foundation for plan selection, mid-query re-optimization, and resource orchestration[12].
2.2 Execution-Time Optimization Strategies: Adaptive Query Processing and Resource Orchestration
Execution-time optimization focuses on runtime coordination between query plans and resources. Its core lies in the synergy of adaptive query processing and resource orchestration. The former addresses uncertainty caused by statistical drift, concurrency fluctuation, and data skew, aiming to continuously correct false assumptions and converge to better execution paths. The latter emphasizes elastic allocation and global scheduling of compute, storage, and network resources in multi-tenant and heterogeneous environments, with service-level objectives as constraints. A key prerequisite for their joint effectiveness is the construction of observable links across the plan, operator, and system layers. This requires exposing feedback on estimation errors between logical and physical plans, collecting fine-grained runtime metrics in the execution engine, and providing delay-sensitive scheduling interfaces and quota controls in resource management. These mechanisms ensure that optimization can take effect in a closed-loop and timely manner[13].
The research paradigm of adaptive query processing focuses on in-flight correction. Typical approaches include monitoring deviations in cardinality and selectivity during execution and triggering phased re-optimization to adjust join order and operator implementation. Multiple candidate strategies can be preset for critical operators and switched with lightweight overhead once thresholds are exceeded or confidence levels are updated[14]. Operator parallelism, batch size, and buffer thresholds can be adjusted dynamically according to memory and I/O pressure, suppressing blocking chains and rollback amplification. In data skew scenarios, hot keys can be resampled or avoided by splitting long-tail tasks into balanced subtasks. Incremental indexes and materialized views can be activated on demand to reduce memory and network overhead along critical paths. Protection points can be placed in pipelines that are sensitive to estimation errors, where lightweight statistics and micro-reordering are inserted without breaking pipeline parallelism, balancing robustness and throughput[15].
Resource orchestration adopts a global perspective to coordinate multiple constraints. In cluster and multi-cloud environments, queries are split into independently schedulable stages or task groups. These are placed with affinity according to data locality and network topology, reducing cross-switch traffic and hotspot congestion. Admission control and throttling strategies driven by service-level objectives and latency budgets are introduced into queues and priority hierarchies, ensuring reserved resources and priority for critical requests. Quotas and isolation are applied to CPU, memory, storage, and network resources, enhanced by NUMA awareness and accelerator binding, improving utilization efficiency. For hybrid transactional and analytical workloads, online workload classification and shaping are used to prevent long analytical tasks from starving interactive short queries. In distributed execution, speculative execution and replica diversion alleviate long-tail effects, while cache layers and staged prefetch reduce cold-start and jitter amplification[16].
At the methodological level, more and more optimization strategies incorporate predictive and learning signals to enhance adaptiveness and foresight. At the query layer, joint prediction of execution time and resource usage can be embedded into cost evaluation, forming a prediction–decision–feedback loop. At the system layer, scheduling and scaling can be modeled as constrained sequential decision problems, updated with historical telemetry and online observation. For multi-objective trade-offs, delay, throughput, cost, and energy are jointly optimized with robust regularization to prevent overfitting to a single metric. For cross-environment generalization, domain adaptation and parameter-efficient updates allow strategies to transfer across engines, clusters, and tenants. For engineering usability, uncertainty estimation and protective thresholds limit re-optimization frequency and switching overhead, ensuring that optimization benefits exceed control costs in high-concurrency environments. Fallback mechanisms are also required to mitigate prediction errors and strategy failures, providing safe recovery paths[17].
3. Method
This paper proposes an end-to-end closed-loop framework for database query processing that follows the line of prediction, decision, and feedback, covers both offline modeling and online optimization, and incorporates two key innovations. The first is Plan-Graph Guided Latency Modeling (PGLM). This module abstracts logical and physical plans into attributed directed acyclic graphs and integrates operator sequences, pipeline boundaries, join structures, deviations between estimated and observed cardinalities, concurrency, and multi-source telemetry of memory, I/O, and network. It constructs multi-granularity features with temporal context encoding and outputs calibrated uncertainty distributions to capture both mean and high-percentile latency characteristics, while also providing interpretable cues about critical paths and bottleneck operators. The second is the Adaptive Query–Resource Orchestrator (AQRO). This module uses endogenous predictions and confidence information as priors to coordinate plans and resources at runtime. It triggers join reordering and operator switching, adjusts parallelism and batch size according to load and pressure, mitigates data skew through resampling and task splitting, and applies elastic orchestration of CPU, memory, storage, and network with quotas and affinity placement, while maintaining stability and availability through constrained optimization and switching cost control. The overall workflow consists of data collection and feature governance, PGLM training and online inference, and AQRO policy generation and execution monitoring. These stages are integrated into the optimizer and resource manager through standardized interfaces to provide execution-time-aware adaptive query processing and resource orchestration in complex and dynamic multi-tenant, heterogeneous, and cloud-native environments. The overall model framework is shown in Figure 1.
Figure 1. Overall model architecture diagram
3.1 Plan-Graph Guided Latency Modeling
This study introduces a Plan-Graph Guided Latency Modeling (PGLM) method. The core idea is to abstract logical and physical database plans into attributed directed acyclic graphs and then perform graph structure modeling and temporal context encoding. By integrating operator types, pipeline boundaries, cardinality estimation errors, concurrency, and multi-source telemetry, PGLM captures critical paths and bottleneck operators at the graph structure level. This enables fine-grained and interpretable predictions of query execution time. Unlike traditional static cost models, PGLM does not rely on fixed parameters. Instead, it learns latent dependencies and amplification effects automatically through a structured graph representation, which makes it more adaptable to complex runtime environments.
PGLM further transforms the prediction problem into modeling a conditional probability distribution rather than a single-point estimate. This distributional view characterizes not only the mean execution time but also calibrated outputs for high-percentile latencies such as P95 and P99, which are critical in real systems. By incorporating uncertainty modeling, PGLM produces joint representations that include mean, variance, and confidence. This allows subsequent resource orchestration and adaptive query processing to make decisions based on confidence constraints instead of single-point estimates, achieving a balance between stability and performance. The framework of this innovation is illustrated in Figure 2.
Figure 2. PGLM module architecture
At the level of formal reasoning, we first model the query plan as a weighted graph. Let the query plan be a directed acyclic graph $G = (V, E)$, where $V$ represents the operator nodes and $E$ represents the dependency edges between operators. For each node $v \in V$, its representation vector can be written as:
$$h_v = \phi\Big(x_v,\ \sum_{u \in \mathcal{N}(v)} \psi\big(h_u, e_{uv}\big)\Big)$$
where $x_v$ represents the operator characteristics (such as type, selectivity, and parallelism), $e_{uv}$ represents the edge attributes (such as partitioning method and blocking relationship), $\mathcal{N}(v)$ is the set of nodes adjacent to $v$, and $\phi$ and $\psi$ are learnable functions.
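To make the node update concrete, the following NumPy sketch applies $h_v = \phi(x_v, \sum_u \psi(h_u, e_{uv}))$ over a toy plan DAG. It assumes that $\mathcal{N}(v)$ contains the upstream operators feeding $v$, so nodes can be visited in topological order, and it uses hypothetical single-layer tanh networks for $\phi$ and $\psi$ with random weights and random placeholder features; the paper specifies neither the neighborhood definition nor the network form.

```python
import numpy as np

def topo_order(num_nodes, edges):
    """Kahn's algorithm: order nodes so every u precedes v for each edge (u, v)."""
    indeg = [0] * num_nodes
    succ = [[] for _ in range(num_nodes)]
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    frontier = [v for v in range(num_nodes) if indeg[v] == 0]
    order = []
    while frontier:
        u = frontier.pop()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                frontier.append(v)
    if len(order) != num_nodes:
        raise ValueError("plan graph contains a cycle")
    return order

def node_representations(x, edges, edge_attr, phi, psi, msg_dim):
    """h_v = phi(x_v, sum over upstream u of psi(h_u, e_uv)), in topological order."""
    n = x.shape[0]
    preds = [[] for _ in range(n)]
    for u, v in edges:
        preds[v].append(u)
    h = [None] * n
    for v in topo_order(n, edges):
        msg = np.zeros(msg_dim)          # leaf operators (e.g. scans) receive no messages
        for u in preds[v]:
            msg += psi(h[u], edge_attr[(u, v)])
        h[v] = phi(x[v], msg)
    return np.stack(h)

# Hypothetical phi/psi: one tanh layer each with small random weights.
rng = np.random.default_rng(0)
D_X, D_E, D_H = 4, 2, 8
W_phi = rng.normal(size=(D_H, D_X + D_H)) * 0.1
W_psi = rng.normal(size=(D_H, D_H + D_E)) * 0.1

def phi(xv, m):
    return np.tanh(W_phi @ np.concatenate([xv, m]))

def psi(hu, e):
    return np.tanh(W_psi @ np.concatenate([hu, e]))

# Toy plan: scan(0) -> filter(1) -> join(2) <- scan(3)
x = rng.normal(size=(4, D_X))
edges = [(0, 1), (1, 2), (3, 2)]
edge_attr = {e: rng.normal(size=D_E) for e in edges}
H = node_representations(x, edges, edge_attr, phi, psi, D_H)
print(H.shape)  # (4, 8)
```

Because a plan graph is acyclic, one pass in topological order lets each operator's state reflect all of its ancestors; a stacked GNN design would instead repeat a local update for a fixed number of rounds.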
Based on the node representations, we represent the potential execution time of the entire query as a graph-level embedding $z$:
$$z = \mathrm{READOUT}\big(\{h_v \mid v \in V\}\big)$$
where $\mathrm{READOUT}(\cdot)$ represents a graph-level aggregation operation, such as weighted averaging or attention aggregation.
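The two READOUT options named above can be sketched in a few lines of NumPy. The attention variant scores each node embedding against a `query` vector; that vector, and the function names, are hypothetical (in a trained model it would be a learned parameter). The resulting weights `alpha` are one possible way to surface which operators dominate the graph embedding.

```python
import numpy as np

def readout_weighted_mean(H, weights=None):
    """Weighted average of node embeddings H (num_nodes x dim); uniform if no weights."""
    w = np.ones(H.shape[0]) if weights is None else np.asarray(weights, dtype=float)
    return (w[:, None] * H).sum(axis=0) / w.sum()

def readout_attention(H, query):
    """Attention pooling: softmax(H @ query) gives one weight per operator node."""
    scores = H @ query
    scores = scores - scores.max()       # subtract max for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ H, alpha

H = np.array([[1.0, 0.0],
              [3.0, 2.0]])
z_mean = readout_weighted_mean(H)                      # [2.0, 1.0]
z_att, alpha = readout_attention(H, np.array([1.0, 0.0]))
print(z_mean, z_att, alpha)
```

With a zero `query` vector the attention weights are uniform and the two readouts coincide; a nonzero vector shifts weight toward the nodes it scores highest.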
To characterize the uncertainty of execution time, we model its conditional probability distribution:
$$p(y \mid G) = \mathcal{N}\big(y \mid \mu(z), \sigma^2(z)\big)$$
where $y$ represents the actual execution time, and $\mu(z)$ and $\sigma^2(z)$ are the predicted mean and variance, respectively. The model outputs not only the expected value but also a confidence interval, thus avoiding over-reliance on a single estimate during optimization.
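Given a predicted $\mu(z)$ and $\sigma^2(z)$, the Gaussian form directly implies percentile and interval estimates. The sketch below uses Python's standard `statistics.NormalDist`; the input values are illustrative, not taken from the paper. Under this model the P95 is simply $\mu + 1.645\sigma$, which is one reason a separate quantile loss can help when real latencies are more heavy-tailed than a Gaussian.

```python
from statistics import NormalDist

def latency_summary(mu, sigma2, level=0.95):
    """Summaries implied by a Gaussian prediction y ~ N(mu, sigma2) for one query."""
    dist = NormalDist(mu, sigma2 ** 0.5)
    tail = (1.0 - level) / 2.0
    return {
        "mean": mu,
        "p95": dist.inv_cdf(0.95),
        "p99": dist.inv_cdf(0.99),
        "interval": (dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)),
    }

s = latency_summary(mu=100.0, sigma2=400.0)   # illustrative: mean 100 ms, sigma 20 ms
print(round(s["p95"], 1), [round(b, 1) for b in s["interval"]])  # 132.9 [60.8, 139.2]
```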
In the loss function design, we use the negative log-likelihood as the optimization target:
$$\mathcal{L}_{NLL} = \frac{1}{2}\log \sigma^2(z) + \frac{\big(y - \mu(z)\big)^2}{2\sigma^2(z)}$$
This loss function constrains both the mean and the variance of the predictions, allowing the model to learn a stable and calibrated probability distribution. To further focus on tail delays, we introduce a quantile regression loss:
$$\mathcal{L}_\tau = \max\big(\tau\,(y - \hat{y}_\tau),\ (\tau - 1)\,(y - \hat{y}_\tau)\big)$$
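where $\hat{y}_\tau$ is the predicted $\tau$ quantile (such as $\tau = 0.95$ or $\tau = 0.99$). The final loss function is a weighted combination:
$$\mathcal{L} = \mathcal{L}_{NLL} + \lambda \sum_{\tau \in \{0.95,\, 0.99\}} \mathcal{L}_\tau$$
where $\lambda$ weights the quantile terms against the likelihood term.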
The role of the loss function in this part is to turn execution time prediction into a joint optimization problem. It requires not only an accurate mean prediction but also a reliable characterization of tail latency. By constraining both the center and the tail of the distribution, the model outputs predictions that are representative and robust, providing a solid probabilistic foundation for subsequent adaptive query optimization and resource orchestration.
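The three loss terms can be written out directly. The sketch below assumes batch averaging and a single hypothetical weight `lam` shared by both quantile terms; the constant $\tfrac{1}{2}\log 2\pi$ is omitted from the NLL, matching the formula. The latency values are illustrative only.

```python
import numpy as np

def gaussian_nll(y, mu, sigma2):
    """Per-sample Gaussian NLL without the constant 0.5 * log(2 * pi)."""
    return 0.5 * np.log(sigma2) + (y - mu) ** 2 / (2.0 * sigma2)

def pinball_loss(y, y_hat_tau, tau):
    """Quantile loss max(tau * (y - q), (tau - 1) * (y - q)) for predicted quantile q."""
    diff = y - y_hat_tau
    return np.maximum(tau * diff, (tau - 1.0) * diff)

def pglm_loss(y, mu, sigma2, q95, q99, lam=1.0):
    """Batch mean of L_NLL plus lam times the P95 and P99 quantile losses."""
    nll = gaussian_nll(y, mu, sigma2).mean()
    tail = pinball_loss(y, q95, 0.95).mean() + pinball_loss(y, q99, 0.99).mean()
    return float(nll + lam * tail)

y = np.array([120.0, 80.0])          # observed latencies (illustrative)
mu = np.array([100.0, 90.0])
sigma2 = np.array([400.0, 100.0])
print(pglm_loss(y, mu, sigma2, q95=mu + 30.0, q99=mu + 45.0, lam=0.5))
```

The asymmetry of the pinball loss is what pushes $\hat{y}_\tau$ upward: at $\tau = 0.95$, under-predicting by 2 ms costs 1.9, while over-predicting by 2 ms costs only 0.1.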
3.3 Adaptive Query–Resource Orchestrator
This study also introduces an Adaptive Query–Resource Orchestrator (AQRO). Its main goal is to achieve dynamic alignment between computing resources and query plans during execution, thereby improving overall performance and fairness in multi-tenant and heterogeneous environments. Unlike traditional scheduling methods that rely on static rules, AQRO makes adaptive decisions based on predicted execution signals such as mean latency, tail latency, and confidence intervals. It enables joint optimization at both the query and the resource level. This coordination allows the system to maintain stable resource utilization under high load and in latency-sensitive scenarios while effectively mitigating performance fluctuations and tail latency amplification.
In addition, the design of AQRO emphasizes cross-layer feedback by incorporating prediction uncertainty into resource orchestration strategies. By combining structural features of query plans with runtime resource states, AQRO can dynamically adjust operator scheduling, parallelism allocation, resource affinity, and batch granularity. Importantly, AQRO considers not only average performance objectives but also introduces percentile latency constraints into scheduling to achieve proactive control of tail latency. This design provides a unified control bus for resource scheduling and query optimization, enabling the system to switch flexibly between precise prediction and uncertainty defense. The framework of this model is illustrated in Figure 3.
Figure 3. AQRO module architecture
At the level of formal reasoning, assume the system has a query set $Q = \{q_1, q_2, \ldots, q_N\}$ and a resource set $R = \{r_1, r_2, \ldots, r_M\}$. The predicted latency of each query $q_i$ is output by PGLM, including the mean $\mu_i$, the variance $\sigma_i^2$, and the quantile prediction $\hat{y}_i^{(\tau)}$. The resource allocation can be represented as a matrix:
$$A = [a_{ij}] \in \{0, 1\}^{N \times M}$$
where $a_{ij} = 1$ indicates that query $q_i$ is assigned to resource $r_j$.
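Under this allocation, the execution latency of each query is estimated as:
$$\hat{T}_i(A) = f\big(\mu_i, \sigma_i, \hat{y}_i^{(\tau)}, \rho(r_j)\big)$$
where $\rho(r_j)$ represents the load factor of resource $r_j$ (the resource with $a_{ij} = 1$) and $f(\cdot)$ denotes the latency estimation function.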
Furthermore, the system objective function can be defined as minimizing the weighted delay:
$$J(A) = \sum_{i=1}^{N} w_i\, \hat{T}_i(A)$$
where $w_i$ is the query priority weight, which reflects differences between tenants or business needs.
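Because the paper leaves $f$ unspecified, the sketch below uses a hypothetical form, $\hat{T}_i = (\mu_i + \kappa\sigma_i)(1 + \rho(r_j))$, and additionally assumes that every query is assigned to exactly one resource (exactly one 1 per row of $A$). It only illustrates how $J(A)$ ranks two candidate allocations; the numbers are illustrative.

```python
import numpy as np

def estimated_latency(A, mu, sigma, rho, kappa=1.0):
    """Hypothetical f: T_i = (mu_i + kappa * sigma_i) * (1 + rho_j), j = resource of query i."""
    A = np.asarray(A)
    if not np.all(A.sum(axis=1) == 1):
        raise ValueError("each query must be assigned to exactly one resource")
    load = A @ np.asarray(rho, dtype=float)   # picks rho_j of the assigned resource per row
    base = np.asarray(mu, dtype=float) + kappa * np.asarray(sigma, dtype=float)
    return base * (1.0 + load)

def weighted_delay(A, w, mu, sigma, rho):
    """J(A) = sum_i w_i * T_i(A)."""
    return float(np.dot(w, estimated_latency(A, mu, sigma, rho)))

mu, sigma, w = [10.0, 20.0], [2.0, 4.0], [1.0, 2.0]
rho = [0.5, 0.0]                              # resource 0 is busier than resource 1
print(weighted_delay([[1, 0], [0, 1]], w, mu, sigma, rho))   # 66.0
print(weighted_delay([[0, 1], [1, 0]], w, mu, sigma, rho))   # 84.0
```

The first allocation wins because the higher-priority query ($w_2 = 2$) lands on the idle resource.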
To prevent the tail delay from becoming too large, a quantile constraint is introduced:
$$\hat{y}_i^{(\tau)} \le \delta, \quad \forall i$$
where $\delta$ is the maximum allowable tail latency threshold. This constraint ensures that the scheduling strategy not only focuses on average performance but also takes high-percentile latency control into account.
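As written, $\hat{y}_i^{(\tau)} \le \delta$ does not depend on $A$. To show how such a constraint could steer placement, the sketch below assumes, hypothetically, that the tail estimate inflates with resource load as $\hat{y}_i^{(\tau)}(1 + \rho_j)$ and that each placed query raises its resource's load by a fixed `load_step`. It is a simple greedy heuristic for illustration, not the paper's AQRO policy: queries are visited in descending priority, each goes to the feasible resource with the lowest load-adjusted mean, and queries that fit nowhere are deferred.

```python
import numpy as np

def greedy_assign(mu, q_tau, w, rho, delta, load_step=0.1):
    """Hypothetical greedy placement under a load-adjusted tail constraint."""
    mu, q_tau = np.asarray(mu, dtype=float), np.asarray(q_tau, dtype=float)
    rho = np.array(rho, dtype=float)          # copy: loads change as queries are placed
    A = np.zeros((len(mu), len(rho)), dtype=int)
    deferred = []
    for i in np.argsort(-np.asarray(w, dtype=float), kind="stable"):
        tails = q_tau[i] * (1.0 + rho)        # assumed load-adjusted tail latency
        feasible = np.flatnonzero(tails <= delta)
        if feasible.size == 0:
            deferred.append(int(i))           # no resource meets the tail threshold
            continue
        j = feasible[np.argmin(mu[i] * (1.0 + rho[feasible]))]
        A[i, j] = 1
        rho[j] += load_step                   # assumed load increase per placed query
    return A, deferred

A, deferred = greedy_assign(mu=[10.0, 10.0, 10.0], q_tau=[30.0, 30.0, 100.0],
                            w=[3.0, 2.0, 1.0], rho=[0.0, 0.05], delta=40.0)
print(A.tolist(), deferred)   # [[1, 0], [0, 1], [0, 0]] [2]
```

The second query moves to resource 1 once the first has loaded resource 0, and the third is deferred because its tail estimate exceeds $\delta$ everywhere.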
Finally, the loss function of AQRO is defined as follows:
$$\mathcal{L}_{AQRO} = \sum_{i=1}^{N} \Big[ \big(\hat{T}_i(A) - y_i\big)^2 + \lambda_1\, \mathrm{Var}\big(\hat{T}_i\big) + \lambda_2 \max\big(0,\ \hat{y}_i^{(\tau)} - \delta\big) \Big]$$
The first term is the delay prediction error constraint, the second term regularizes the prediction variance to stabilize scheduling, the third term is the tail delay penalty, and $\lambda_1$ and $\lambda_2$ are trade-off coefficients.
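The AQRO loss can be evaluated term by term. In this sketch the coefficient values `lam1=0.1` and `lam2=1.0` are hypothetical defaults, and `var_T` is taken as a given per-query variance estimate; the tail term stays at zero as long as each $\hat{y}_i^{(\tau)} \le \delta$.

```python
import numpy as np

def aqro_loss(T_hat, y, var_T, q_tau, delta, lam1=0.1, lam2=1.0):
    """Sum over queries of squared error + lam1 * variance + lam2 * tail hinge penalty."""
    T_hat, y = np.asarray(T_hat, dtype=float), np.asarray(y, dtype=float)
    var_T, q_tau = np.asarray(var_T, dtype=float), np.asarray(q_tau, dtype=float)
    error = (T_hat - y) ** 2                  # delay prediction error term
    tail = np.maximum(0.0, q_tau - delta)     # zero while q_tau <= delta
    return float(np.sum(error + lam1 * var_T + lam2 * tail))

# Illustrative values: the second query's tail estimate (50) exceeds delta (40) by 10.
print(aqro_loss(T_hat=[10.0, 20.0], y=[12.0, 20.0], var_T=[1.0, 2.0],
                q_tau=[30.0, 50.0], delta=40.0))   # about 14.3 = 4 + 0.1*1 + 0.1*2 + 10
```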
The role of the loss function in AQRO is to turn query scheduling and resource orchestration into a joint optimization problem. It secures average performance while controlling prediction uncertainty and keeping tail latency within a predefined threshold. In this way, the system can dynamically adapt to the demands of different queries and the load conditions of resources, achieving adaptive and robust coordinated optimization that provides higher stability and controllability for database query execution.
4. Experimental Results
4.1 Dataset
This study uses the public dataset named SPARQL Queries Performance Prediction, which contains a large number of instances designed for query latency prediction tasks. It covers actual query plan structures together with their execution latency labels. The dataset focuses on structural features at the query plan level and their latency responses, making it highly suitable for latency modeling and latency distribution learning.
The dataset consists of several key components. It includes SPARQL query texts and structural features extracted from query plans, such as operator type sequences, differences between estimated and actual cardinalities, and join topology information. It also provides precise query execution times as ground-truth labels. In addition, runtime context features are included, such as concurrency level, cache hit rate, and I/O latency metrics. Together, these components form a rich set of data samples that support learning the mapping between graph structural features and execution time.
One of the main advantages of this dataset is the diversity and clarity of its features. It not only captures query plan structural information but also aligns it with accurate execution latencies, offering an end-to-end supervised learning basis for latency modeling. Moreover, because the data originates from real query environments rather than synthetic settings, it is more representative and supports model generalization. Finally, the dataset is easy to access and stored in standard, well-organized formats, which facilitates the construction of training and testing pipelines. This makes it efficient to use and reproducible for researchers developing latency prediction and optimization strategies.
4.2 Experimental setup
The experiments in this study were conducted in a high-performance computing environment. The hardware configuration included a 32-core CPU with a 2.6 GHz clock speed, 256 GB of memory, and a multi-GPU cluster with 32 GB of memory per card. High-throughput NVMe SSD storage and gigabit Ethernet were used to ensure stability and scalability during query plan construction, latency prediction modeling, and resource orchestration. This environment provided sufficient computational support for complex graph feature extraction and deep learning model training.
On the software side, the experiments ran on a Linux operating system. Python 3.10 was used as the primary programming language, and PyTorch served as the main deep learning framework for model implementation and training. To execute graph-related computations efficiently, DGL and related parallel computing libraries were employed. Docker containerization was used to ensure portability and stable dependency management. In addition, CUDA 11.x and cuDNN were used for GPU acceleration to fully exploit the parallel computing capabilities of the hardware.
For hyperparameter settings, the Adam optimizer was adopted with an initial learning rate of 1e-4. A cosine annealing schedule was used to dynamically adjust the learning rate during training. The batch size was set to 128 to balance GPU memory usage and training stability. The gradient clipping threshold was set to 1.0 to avoid gradient explosion. Loss function weights were determined through grid search, and the L2 regularization coefficient was set to 1e-5 to improve generalization. An early stopping mechanism was applied in all experiments, terminating training if validation metrics did not improve for 10 consecutive iterations, thus ensuring both efficiency and stability.
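In PyTorch, the settings above would typically map to `torch.optim.Adam(params, lr=1e-4, weight_decay=1e-5)`, `torch.optim.lr_scheduler.CosineAnnealingLR`, and `torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)`. The framework-free sketch below reproduces only the cosine schedule and the early-stopping rule so their logic can be read without a training loop; the constant names, the `min_delta` option, and the validation values are illustrative assumptions, and a minimum learning rate of 0 is assumed because the paper does not report one.

```python
import math

# Settings reported in Section 4.2; names are illustrative.
BASE_LR = 1e-4
BATCH_SIZE = 128
GRAD_CLIP_NORM = 1.0
L2_COEFF = 1e-5
PATIENCE = 10

def cosine_lr(step, total_steps, base_lr=BASE_LR, min_lr=0.0):
    """Cosine annealing from base_lr (step 0) down to min_lr (step total_steps)."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

class EarlyStopping:
    """Signal a stop once the validation metric has not improved for `patience` checks."""

    def __init__(self, patience=PATIENCE, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = math.inf
        self.bad_checks = 0

    def step(self, val_metric):
        """Record one validation result (lower is better); return True to stop."""
        if val_metric < self.best - self.min_delta:
            self.best = val_metric
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience

stopper = EarlyStopping(patience=3)
history = [1.00, 0.90, 0.95, 0.92, 0.91]        # illustrative validation MAE values
print([stopper.step(v) for v in history])        # [False, False, False, False, True]
```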
4.3 Experimental Results
1) Comparative experimental results
This paper first conducts a comparative experiment; the results are shown in Table 1.
Table 1: Comparative experimental results (MAE, RMSE, and P95 absolute error in ms; lower is better for all metrics)
Method              MAE ↓   RMSE ↓   P95 Abs. Error ↓   ECE ↓
Informer [18]       28.7    56.4     132.0              0.061
Autoformer [19]     26.9    53.1     124.0              0.055
FEDformer [20]      24.8    49.2     115.0              0.049
PatchTST [21]       22.6    45.7     108.0              0.043
Ours (PGLM+AQRO)    15.9    31.0     72.0               0.028
The results demonstrate consistent superiority: across all four metrics, our method (PGLM+AQRO) outperforms every baseline. Compared with the best-performing baseline (PatchTST: MAE 22.6 ms, RMSE 45.7 ms, P95 error 108 ms, ECE 0.043), our method reduces MAE to 15.9 ms (about 29.6 percent improvement), RMSE to 31.0 ms (about 32.2 percent improvement), and P95 error to 72.0 ms (about 33.3 percent improvement), while also lowering ECE to 0.028 (about 34.9 percent improvement). This simultaneous reduction in average error, tail error, and calibration error shows that the model is more robust in overall accuracy, extreme cases, and uncertainty characterization.
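The relative improvements can be recomputed directly from the Table 1 values for PatchTST and PGLM+AQRO as $100 \times (\text{baseline} - \text{ours}) / \text{baseline}$. With one-decimal rounding, the MAE reduction comes to 29.6 percent.

```python
# Table 1 values: best baseline (PatchTST) vs. PGLM+AQRO.
patchtst = {"MAE": 22.6, "RMSE": 45.7, "P95": 108.0, "ECE": 0.043}
ours = {"MAE": 15.9, "RMSE": 31.0, "P95": 72.0, "ECE": 0.028}

def relative_reduction(base, new):
    """Percentage reduction of `new` relative to `base` (positive = improvement)."""
    return 100.0 * (base - new) / base

reductions = {k: round(relative_reduction(patchtst[k], ours[k]), 1) for k in patchtst}
print(reductions)  # {'MAE': 29.6, 'RMSE': 32.2, 'P95': 33.3, 'ECE': 34.9}
```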
Mechanistically, the first advantage comes from the structural awareness of PGLM. By abstracting logical and physical plans into attributed plan graphs, the model explicitly captures blocking relationships among operators, join orders, selectivity amplification effects, and the influence of parallelism on critical paths. Compared with baselines that treat queries as time series or flat features, the structured representation reduces systematic error propagation caused by cardinality bias and resource contention. This is directly reflected in the simultaneous improvements in MAE and RMSE.
Second, the significant reduction in P95 absolute error highlights the model's stronger ability to capture tail latency and distributional characteristics. PGLM generates joint predictions of the mean and high percentiles and provides confidence information. AQRO then leverages these uncertainty signals to coordinate plans and resources, for example by adjusting parallelism and batch size, resampling hot keys, applying affinity placement, and enforcing quota control. As a result, the model keeps tail errors under control even for workloads dominated by long-tail latency. This targeted treatment of the tail distribution is one of the most practically valuable improvements for query execution latency.
Finally, the reduction in ECE indicates better alignment between predicted probabilities and observed distributions, making confidence intervals more useful. This directly enhances the reliability of subsequent decisions. In admission control, priority scheduling, and elastic scaling, AQRO can use credible confidence values to set thresholds and conservative strategies, reducing the risks of over-provisioning and under-provisioning. This allows the system to meet service-level objectives while controlling resource costs. Overall, the structural representation of PGLM and the uncertainty-aware control of AQRO form a closed loop. More accurate prediction and more stable orchestration reinforce each other, moving query latency prediction and optimization strategies from static to adaptive.
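The paper does not state how ECE is computed for latency regression. One common interval-coverage variant, sketched below as an assumption, compares each nominal level $p$ with the observed fraction of targets inside the central Gaussian interval $\mu \pm z_{(1+p)/2}\,\sigma$ and averages the absolute gaps. The data here are simulated, not taken from the experiments.

```python
import numpy as np
from statistics import NormalDist

def regression_ece(y, mu, sigma, levels=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)):
    """Mean |observed coverage - nominal level| of central Gaussian intervals."""
    y, mu, sigma = (np.asarray(a, dtype=float) for a in (y, mu, sigma))
    std_normal = NormalDist()
    gaps = []
    for p in levels:
        z = std_normal.inv_cdf((1.0 + p) / 2.0)   # half-width in units of sigma
        covered = np.abs(y - mu) <= z * sigma
        gaps.append(abs(covered.mean() - p))
    return float(np.mean(gaps))

rng = np.random.default_rng(0)
y = rng.normal(loc=100.0, scale=20.0, size=20_000)   # simulated latencies, not paper data
mu = np.full_like(y, 100.0)
print(regression_ece(y, mu, np.full_like(y, 20.0)))  # well calibrated: close to 0
print(regression_ece(y, mu, np.full_like(y, 5.0)))   # overconfident: much larger
```

An overconfident model (predicted $\sigma$ too small) covers far fewer targets than its nominal levels promise, which is the kind of miscalibration a lower ECE rules out.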
2) The environmental sensitivity of resource limits and quota policies (CPU/Memory/I/O) to SLO satisfaction rates
This paper further studies the environmental sensitivity of resource limits and quota policies (CPU/Memory/I/O) to SLO satisfaction rates. The experimental results are shown in Figure 4.
Figure 4. The environmental sensitivity of resource limits and quota policies (CPU/Memory/I/O) to SLO satisfaction rates
In this set of experiments, the error metrics of latency prediction show clear differences under different quota strategies. When the system is constrained by CPU or memory quotas, the MAE rises to 17.5 ms and 18.2 ms, respectively. Under I/O quota constraints, it is even higher at 19.6 ms, indicating that I/O limits have the greatest impact on prediction accuracy. In contrast, under balanced resource allocation, the MAE is lowest at 15.9 ms, showing that the model better captures the correspondence between query plan features and execution latency when resources are sufficient and properly allocated.
RMSE follows a trend consistent with MAE. Under CPU and memory quotas, RMSE reaches 33.9 ms and 35.1 ms, respectively. Under I/O quotas, it increases to 37.8 ms, while in the balanced quota scenario it drops to 31.0 ms, a clear advantage. Since RMSE is more sensitive to large deviations, this indicates that in balanced scenarios the model not only achieves lower average error but also provides more stable predictions under large fluctuations. This supports the robustness of PGLM's graph-structured representation in multi-resource scheduling scenarios.
For tail latency error (P95 absolute error), I/O quotas remain the main bottleneck. The error rises from 78 ms under CPU quotas to 86 ms under I/O quotas, while in the balanced resource setting it decreases significantly to 72 ms. In the aggressive quota scenario, P95 error soars to 104 ms, highlighting the amplification of tail latency risk under extreme resource constraints. This shows that although AQRO's dynamic orchestration strategy can partially mitigate instability caused by resource shortages, it is still difficult to fully suppress long-tail effects when resources are overly restricted.
Finally, for the ECE metric, the lowest value of 0.028 is achieved in the balanced scenario, indicating the best alignment between predicted and actual probability distributions. Under CPU and memory quotas, ECE rises to 0.030 and 0.031, respectively, while under I/O quotas it reaches 0.034. In the aggressive quota scenario, it further worsens to 0.040. This indicates that when system resources are excessively constrained, the effectiveness of uncertainty modeling decreases, reducing the reliability of prediction confidence. Overall, the results demonstrate that PGLM+AQRO achieves superior prediction accuracy and calibration in balanced resource environments, while extreme quota constraints reveal the limits of model adaptability.
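The ranking of quota settings can be made quantitative from the values reported above by expressing each metric as a percentage increase over the balanced setting. Only the numbers stated in the text are used; the memory P95 value and the aggressive-quota MAE and RMSE are not reported, so they are left out.

```python
# Values reported in the text for Figure 4 (MAE and RMSE in ms).
balanced = {"MAE": 15.9, "RMSE": 31.0, "ECE": 0.028}
quotas = {
    "CPU": {"MAE": 17.5, "RMSE": 33.9, "ECE": 0.030},
    "Memory": {"MAE": 18.2, "RMSE": 35.1, "ECE": 0.031},
    "I/O": {"MAE": 19.6, "RMSE": 37.8, "ECE": 0.034},
}

def degradation(setting):
    """Percentage increase of each metric relative to the balanced setting."""
    return {m: round(100.0 * (quotas[setting][m] - balanced[m]) / balanced[m], 1)
            for m in balanced}

for name in quotas:
    print(name, degradation(name))
# CPU {'MAE': 10.1, 'RMSE': 9.4, 'ECE': 7.1}
# Memory {'MAE': 14.5, 'RMSE': 13.2, 'ECE': 10.7}
# I/O {'MAE': 23.3, 'RMSE': 21.9, 'ECE': 21.4}
```

Under I/O quotas every metric degrades by more than 20 percent, roughly twice the CPU-quota degradation, which matches the text's conclusion that I/O limits matter most.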
3) Sensitivity of query template diversity (operator pattern and connection topology changes) to prediction generalization
This paper also studies the sensitivity of query template diversity (operator pattern and connection topology changes) to prediction generalization. The experimental results are shown in Figure 5.
Figure 5. Sensitivity of query template diversity (operator pattern and connection topology changes) to prediction generalization
The experimental results show that the latency prediction ability of PGLM+AQRO varies significantly across levels of query template diversity. MAE remains low under low and medium diversity but rises notably when operator counts and join relationships become more complex. This indicates that the model's average error is more sensitive to complex topologies and highlights how query template structure shapes the model's baseline error distribution.
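The Figure 5 analysis evaluates errors separately for each level of template diversity. A minimal way to produce such a breakdown is to group residuals by a diversity label before computing the metrics. The sketch below assumes each record carries a hypothetical label such as `"low"` or `"high"`. The labels and numbers are illustrative, not the paper's buckets.

```python
import math
from collections import defaultdict

def metrics_by_group(records):
    """records: iterable of (group_label, actual_ms, predicted_ms) tuples."""
    buckets = defaultdict(list)
    for group, actual, predicted in records:
        buckets[group].append(actual - predicted)
    report = {}
    for group, residuals in buckets.items():
        n = len(residuals)
        report[group] = {
            "n": n,
            "mae": sum(abs(r) for r in residuals) / n,
            "rmse": math.sqrt(sum(r * r for r in residuals) / n),
        }
    return report

# Hypothetical records: simple templates are predicted tightly, complex ones are not.
records = [
    ("low", 100, 98), ("low", 120, 122),
    ("high", 300, 260), ("high", 310, 300),
]
for group, stats in metrics_by_group(records).items():
    print(group, stats)
```

Reporting MAE and RMSE per bucket, rather than pooled, makes it visible when one class of complex templates is responsible for most of the error.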
RMSE shows a clearer monotonic increase than MAE. In scenarios with high operator diversity and mixed complexity, the fluctuations in prediction error become much larger. When handling complex queries, the error distribution therefore becomes more dispersed, and extreme values exert a stronger influence on the overall error. The continuous rise of RMSE demonstrates that robustness under complex operator patterns remains a challenge for the system.
The P95 absolute error peaks under high-diversity conditions and then decreases slightly in extreme mixed scenarios. Tail latency prediction is thus heavily affected by extreme structural complexity, but the effect is partly alleviated as the model adjusts internally. This trend highlights the stabilizing effect of AQRO's adaptive scheduling and resource orchestration under high-pressure conditions, which helps suppress extreme latency.
The ECE results first decrease and then increase, eventually stabilizing at a moderate level in extreme scenarios. This trend indicates that under moderate query diversity, the model provides more reliable uncertainty estimates. In extremely complex cases, however, deviations arise between confidence intervals and actual errors. This suggests that PGLM needs further improvement in uncertainty calibration to keep prediction confidence consistent under high structural diversity.
4) Hyperparameter sensitivity of learning rate and batch size to delayed prediction stability
This paper further conducts a hyperparameter sensitivity test on the learning rate and batch size to explore their influence on the overall stability of delayed prediction within the proposed framework. Hyperparameters play a decisive role in balancing convergence speed, optimization stability, and generalization ability, particularly when prediction outcomes are affected by temporal dependencies or delayed responses. By systematically varying the learning rate and batch size, the study analyzes how small changes in these parameters affect the model's robustness to fluctuations and its ability to maintain consistent performance across training phases. Such an analysis provides both a deeper understanding of the training dynamics and practical guidance for selecting reliable hyperparameter configurations when applying the method to real-world tasks. The overall design of this experiment is summarized in Figure 6, which illustrates the relationship between these critical hyperparameters and the stability of delayed prediction.
Figure 6. Hyperparameter sensitivity of learning rate and batch size to delayed prediction stability
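The sensitivity test described above is a grid sweep over learning rate and batch size, with a validation metric recorded per configuration. The following sketch shows the shape of such a harness. Because the actual PGLM training loop is not given here, a toy linear model fitted by minibatch SGD stands in for it. `train_linear_sgd`, `sweep`, and the grid values are hypothetical.

```python
import itertools
import random

def train_linear_sgd(data, lr, batch_size, epochs=50, seed=0):
    """Toy stand-in for the latency model: fit y = w*x + b by minibatch SGD on MSE."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(data)  # copy so shuffling does not mutate the caller's list
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradients of the mean squared error over this minibatch.
            gw = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            gb = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= lr * gw
            b -= lr * gb
    return w, b

def mae_of(model, data):
    w, b = model
    return sum(abs(w * x + b - y) for x, y in data) / len(data)

def sweep(train, val, lrs, batch_sizes):
    """Grid search: map each (lr, batch_size) pair to its validation MAE."""
    return {(lr, bs): mae_of(train_linear_sgd(train, lr, bs), val)
            for lr, bs in itertools.product(lrs, batch_sizes)}

# Synthetic data (hypothetical): y = 3x + 2 plus small Gaussian noise.
rng = random.Random(42)
points = [(i / 200, 3.0 * (i / 200) + 2.0 + rng.gauss(0, 0.1)) for i in range(200)]
train, val = points[::2], points[1::2]
results = sweep(train, val, lrs=(0.001, 0.01, 0.1), batch_sizes=(8, 32, 64))
for (lr, bs), err in sorted(results.items()):
    print(f"lr={lr:<6} batch={bs:<3} val MAE={err:.3f}")
```

Fixing the seed for every configuration keeps the comparison about the hyperparameters rather than about shuffling luck. In a real study, each cell would typically be averaged over several seeds.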
The experimental results show that MAE exhibits a typical U-shaped curve with respect to the learning rate. When the learning rate is too small, model updates are slow and the error remains high. As the learning rate increases, MAE decreases significantly and reaches its optimal value in the middle range, indicating that the model captures the relationship between plan graphs and latency more efficiently. When the learning rate increases further, however, MAE rises again, reflecting instability caused by overly rapid parameter updates. This trend reveals the sensitivity of latency prediction models to parameter tuning and highlights the importance of a reasonable learning rate range for prediction accuracy.
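The U-shaped dependence on learning rate has a textbook counterpart in gradient descent on a quadratic. For f(w) = (c/2)·w², each step multiplies w by (1 − lr·c). Convergence is fastest at lr = 1/c, slow for tiny lr, oscillating as lr approaches 2/c, and divergent beyond it. The sketch below is that toy case, not the PGLM model. It shows both sides of the U.

```python
def gd_error(lr, curvature=2.0, w0=1.0, steps=20):
    """Distance to the optimum after `steps` of gradient descent on f(w) = curvature/2 * w**2."""
    w = w0
    for _ in range(steps):
        w -= lr * curvature * w  # the gradient of f at w is curvature * w
    return abs(w)

# With curvature 2.0, the best step size is 0.5 and anything above 1.0 diverges.
for lr in (0.01, 0.1, 0.5, 0.9, 1.1):
    print(lr, gd_error(lr))
```

With `lr=0.01` the iterate has barely moved after 20 steps. `lr=0.5` lands on the optimum in one step, `lr=0.9` overshoots back and forth, and `lr=1.1` grows without bound.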
RMSE shows an overall increase with small fluctuations. At low learning rates, RMSE is relatively low, suggesting a more concentrated error distribution. As the learning rate increases, RMSE continues to rise with stronger fluctuations, indicating that aggressive update rates amplify errors on some complex query plans. This points to the additional uncertainty introduced by high learning rates and further emphasizes the importance of stability in latency prediction.
In the batch size experiments, the P95 absolute error first decreases significantly as batch size increases, reaching its optimal level at a medium scale (such as 128), and then rises again. Moderate batch sizes thus effectively balance the trade-off between generalization and convergence, making tail latency prediction more stable. When the batch size becomes too large, however, the model loses fine-grained optimization capacity during gradient updates, leading to amplified tail errors and reduced accuracy under extreme conditions.
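Two general SGD effects are commonly used to explain this batch-size trade-off. Larger batches average more per-sample gradients, so each update is less noisy; the standard deviation shrinks roughly as 1/√B. With a fixed epoch budget, they also yield fewer updates, ⌈N/B⌉ per epoch. The sketch below measures both on synthetic per-sample gradients. It illustrates generic minibatch mechanics under these assumptions and is not the framework's training code.

```python
import math
import random
import statistics

def minibatch_gradient_spread(per_sample_grads, batch_size, trials=500, seed=0):
    """Empirical std of the minibatch-mean gradient for a given batch size."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.sample(per_sample_grads, batch_size))
             for _ in range(trials)]
    return statistics.pstdev(means)

def updates_per_epoch(n_samples, batch_size):
    """Number of parameter updates in one pass over the data."""
    return math.ceil(n_samples / batch_size)

# Synthetic per-sample gradients (hypothetical): unit-variance noise around zero.
rng = random.Random(1)
grads = [rng.gauss(0.0, 1.0) for _ in range(4096)]
for bs in (16, 128, 1024):
    spread = minibatch_gradient_spread(grads, bs)
    print(bs, updates_per_epoch(len(grads), bs), round(spread, 4))
```

Small batches give many noisy steps, while very large batches give few smooth ones. A medium batch sits between the two, which is consistent with the optimum the experiments report near 128.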
The ECE results demonstrate the sensitivity of model calibration to batch size. As batch size increases, ECE gradually decreases from a higher level, showing that the model achieves better consistency between predicted confidence and actual errors at medium scales. As batch size continues to grow, ECE rises again, indicating that overly large batches introduce estimation bias and weaken the alignment between predicted probabilities and true errors. Batch size therefore affects not only the convergence speed of the error but also the reliability of uncertainty estimation, which is crucial for ensuring prediction stability.
5. Conclusion
This study focuses on query execution time prediction and optimization in databases and proposes an integrated framework that combines structured modeling with adaptive orchestration. By introducing a plan-graph-guided latency