Machine Learning Framework for Performance Prediction and Intelligent Resource Allocation in Complex Data Environments

Author: Wang, Ming

Publisher: Zenodo

DOI: 10.5281/zenodo.17537140

Source: https://zenodo.org/records/17537140/files/Machine+Learning+Framework+for+Performance+Prediction+and+Intelligent+Resource+Allocation+in+Complex+Data+Environments.pdf

Journal of Computer Technology and Software
ISSN: 2998-2383
Vol. 3, No. 5, 2024
Machine Learning Framework for Performance Prediction and Intelligent Resource Allocation in Complex Data Environments
Ming Wang
Northeastern University, San Jose, USA
mingwangc @gmail.com
Abstract: This paper focuses on the problem of database query execution time prediction and optimization. To address the limitations of traditional methods that suffer from error accumulation and insufficient scheduling efficiency in complex query scenarios, it proposes a comprehensive framework that integrates structured modeling with adaptive scheduling. First, a Plan-Graph Guided Latency Modeling (PGLM) mechanism is designed, which explicitly incorporates structural features of query plans to enhance the model's awareness of operator patterns and join topologies, thereby improving prediction accuracy and generalization. Second, an Adaptive Query–Resource Orchestrator (AQRO) is constructed to dynamically match query demands with system resources under a prediction–execution interaction mechanism, ensuring continuous satisfaction of service-level objectives (SLOs) and maintaining system stability. The proposed method demonstrates strong robustness under different hyperparameters, resource quotas, and query template diversity, achieving low prediction errors and reasonable uncertainty calibration in dynamic environments. The results show that the framework performs well in both latency prediction and resource optimization, providing a new technical path for database system performance improvement.
Keywords: Query execution prediction; resource orchestration; latency modeling; system optimization
1. Introduction
In today's data-intensive applications, the efficiency of database query execution directly affects system service quality and user experience. With the continuous growth of data volume and the diversification of application requirements, achieving accurate execution time prediction and efficient resource scheduling in complex query plans has become a critical problem in database optimization. Accurate latency prediction not only helps avoid performance bottlenecks in advance but also provides key support for resource allocation and query optimization strategies, thereby improving overall system stability and response speed. Therefore, research on intelligent methods for query execution time prediction and optimization strategies carries significant theoretical and practical value[1].
However, existing methods still face many challenges in complex query scenarios. Traditional cost models are unable to adapt to the dynamic changes of data distribution and system states, often leading to the accumulation of prediction errors. Data-driven learning models improve prediction accuracy but still suffer from limitations in generalization and robustness. In particular, when facing diverse query templates and uncertain runtime environments, the predictions often deviate from actual performance. At the resource scheduling level, current strategies lack fine-grained modeling of the coordination between query requirements and system resources, leading to uneven allocation and unstable system load, which negatively affects the satisfaction of service-level objectives (SLOs)[2].
To address these problems, this study introduces a comprehensive method that integrates query plan structural information with resource scheduling mechanisms[3]. On one hand, a plan-graph-based latency modeling mechanism is constructed to enhance the model's structured perception of query execution processes, enabling more accurate latency estimation during prediction. On the other hand, an adaptive query–resource orchestration strategy is incorporated to achieve dynamic matching between query workloads and system resources, improving prediction stability and optimization performance in diverse scenarios. This bidirectional integration aims to form positive feedback between prediction and scheduling, driving overall improvements in database performance optimization[4].
The contributions of this paper lie in two main aspects. First, we propose Plan-Graph Guided Latency Modeling (PGLM), which explicitly incorporates query plan structural features into the prediction process. This enhances the model's ability to represent and understand complex query topologies, thereby improving prediction accuracy and generalization. Second, we design an Adaptive Query–Resource Orchestrator (AQRO), which achieves adaptive alignment between query demands and system resources under a prediction–execution interaction framework, balancing performance improvement with resource utilization efficiency. Together, these two innovations construct an end-to-end intelligent optimization framework that provides a new solution for query prediction and optimization in databases[5].
2. Related work
2.1 Query Execution Time Prediction: From Cost Models to Data-Driven Learning
Traditional research has mainly focused on cost models based on rules and parameterized assumptions. Query execution time is decomposed into the sum of operator-level CPU, I/O, and network costs, fitted through cardinality estimation, selectivity, and cost tables. In earlier single-node, row-store architectures, these methods offered good interpretability and ease of implementation. However, modern database systems introduce columnar compression, vectorized execution, parallel pipelines, JIT compilation, compute-storage separation, and acceleration hardware. Latency is no longer a simple linear sum of independent operator costs[6]. Cache penetration, memory grants, concurrency, and resource governance strategies introduce strong nonlinearity and cross-layer coupling. In distributed settings, data skew, shuffle, retries, and fallback amplify the cascading effect of cardinality errors, making static cost models prone to systematic bias under mixed workloads, multi-tenant deployments, and elastic resource environments[7].
In response to increasing complexity, data-driven learning-based prediction has become an important direction. These methods typically rely on execution logs and construct multi-granularity features around query plans, data characteristics, and system telemetry. The features include logical and physical operator sequences, join graph density, predicate complexity, deviations between estimated and observed cardinalities, pipeline depth, concurrency, memory grants, cache hit rate, disk utilization, and network utilization[8]. One class of methods performs operator-level or stage-level latency regression and then combines results to obtain overall query latency. Another class directly performs end-to-end prediction using nonlinear models to capture how join order, selectivity, and data skew amplify effects along the critical path. Compared with traditional cost models, learning-based methods better accommodate heterogeneous hardware and dynamic resource strategies, and provide fine-grained signals for identifying performance bottlenecks under different throttling and scheduling policies[9].
To enhance the representation of query plan structures and execution dependencies, recent studies emphasize structured representation learning. Typical approaches treat query plans as trees or directed acyclic graphs and use graph or sequence encoders to capture operator-level data flow dependencies, parallel or blocking relationships, and cross-stage interference. Attention mechanisms are introduced to explicitly model critical paths and bottleneck operators. Context encoders are used to integrate static plans with runtime states, enabling models to respond to transient resource fluctuations and plan re-optimization. For long-term availability in production, online updating, incremental learning, and concept drift detection have been proposed. These are often combined with uncertainty estimation and calibration techniques, which ensure prediction accuracy while providing confidence intervals to reduce risks from incorrect decisions[10,11].
Learning-based methods also face challenges in data quality and generalization. Execution logs often contain missing values, noise, and skewed distributions, while extreme tail latencies significantly affect training and evaluation. Variations across workloads, schemas, and workload evolution cause feature distribution shifts. Cross-engine, cross-cluster, and cross-cloud deployment requires models with domain adaptation and parameter-efficient updating. Privacy and compliance restrictions limit data aggregation across tenants, motivating exploration of weakly supervised, semi-supervised, and privacy-preserving learning. Mechanisms such as UDFs, approximate queries, and materialized view selection introduce unobservable or hard-to-quantify variables. In response, research has proposed feature governance, robust losses, resampling, and reweighting strategies. Hierarchical, multi-task, and multi-objective models have also been introduced to jointly capture both average and high-percentile latencies, thereby providing a more stable predictive foundation for plan selection, mid-query re-optimization, and resource orchestration[12].
2.2 Execution-Time Optimization Strategies: Adaptive Query Processing and Resource Orchestration
Execution-time optimization focuses on runtime coordination between query plans and resources. Its core lies in the synergy of adaptive query processing and resource orchestration. The former addresses uncertainty caused by statistical drift, concurrency fluctuation, and data skew, aiming to continuously correct false assumptions and converge to better execution paths. The latter emphasizes elastic allocation and global scheduling of compute, storage, and network resources under multi-tenant and heterogeneous environments with service-level objectives as constraints. A key prerequisite for their joint effectiveness is the construction of observable links across the plan, operator, and system layers. This requires exposing feedback on estimation errors between logical and physical plans, collecting fine-grained runtime metrics in the execution engine, and providing delay-sensitive scheduling interfaces and quota controls in resource management. These mechanisms ensure that optimization can take effect in a closed-loop and timely manner[13].
The research paradigm of adaptive query processing focuses on in-flight correction. Typical approaches include monitoring deviations in cardinality and selectivity during execution and triggering phase re-optimization to adjust join order and operator implementation. Multiple candidate strategies can be preset for critical operators and switched with lightweight overhead once thresholds are exceeded or confidence levels updated[14]. Operator parallelism, batch size, and buffer thresholds can be adjusted dynamically according to memory and I/O pressure, suppressing blocking chains and rollback amplification. In data skew scenarios, hot keys can be resampled or avoided by splitting long-tail tasks into balanced subtasks. Incremental indexes and materialized views can be activated on demand to reduce memory and network overhead along critical paths. Protection points can be placed in pipelines that are sensitive to estimation errors, where lightweight statistics and micro-reordering are inserted without breaking pipeline parallelism, balancing robustness and throughput[15].
Resource orchestration adopts a global perspective to coordinate multiple constraints. In cluster and multi-cloud environments, queries are split into independently schedulable stages or task groups. These are placed with affinity according to data locality and network topology, reducing cross-switch traffic and hot spot congestion. Admission control and throttling strategies driven by service-level objectives and latency budgets are introduced into queues and priority hierarchies, ensuring reserved resources and priority for critical requests. Quotas and isolation are applied to CPU, memory, storage, and network resources, enhanced by NUMA awareness and accelerator binding, improving utilization efficiency. For hybrid transactional and analytical workloads, online workload classification and shaping are used to prevent long analytical tasks from starving interactive short queries. In distributed execution, speculative execution and replica diversion alleviate long-tail effects, while cache layers and staged prefetch reduce cold-start and jitter amplification[16].
At the methodological level, more optimization strategies incorporate predictive and learning signals to enhance adaptiveness and foresight. At the query layer, joint prediction of execution time and resource usage can be embedded into cost evaluation, forming a prediction–decision–feedback loop. At the system layer, scheduling and scaling can be modeled as constrained sequential decision problems, updated with historical telemetry and online observation. For multi-objective trade-offs, delay, throughput, cost, and energy are jointly optimized with robust regularization to prevent overfitting on a single metric. For cross-environment generalization, domain adaptation and parameter-efficient updates allow strategies to transfer across engines, clusters, and tenants. For engineering usability, uncertainty estimation and protective thresholds limit re-optimization frequency and switching overhead, ensuring that optimization benefits exceed control costs in high-concurrency environments. Fallback mechanisms are also required to mitigate prediction errors and strategy failures, providing safe recovery paths[17].
3. Method
This paper proposes an end-to-end closed-loop framework for database query processing that follows the line of prediction, decision, and feedback, covering both offline modeling and online optimization, and incorporating two key innovations. The first is Plan-Graph Guided Latency Modeling (PGLM). This module abstracts logical and physical plans into attributed directed acyclic graphs and integrates operator sequences, pipeline boundaries, join structures, deviations between estimated and observed cardinalities, concurrency, and multi-source telemetry of memory, I/O, and network. It constructs multi-granularity features with temporal context encoding and outputs calibrated uncertainty distributions to capture both mean and high-percentile latency characteristics, while also providing interpretable cues for critical paths and bottleneck operators. The second is the Adaptive Query–Resource Orchestrator (AQRO). This module uses endogenous predictions and confidence information as priors to coordinate plans and resources at runtime. It triggers join reordering and operator switching, adjusts parallelism and batch size according to load and pressure, mitigates data skew through resampling and task splitting, and applies elastic orchestration to CPU, memory, storage, and network with quotas and affinity placement, while maintaining stability and availability through constrained optimization and switching cost control. The overall workflow consists of data collection and feature governance, PGLM training and online inference, and AQRO policy generation and execution monitoring. These stages are integrated into the optimizer and resource manager through standardized interfaces to provide execution-time-aware adaptive query processing and resource orchestration in complex and dynamic multi-tenant, heterogeneous, and cloud-native environments. The overall model framework is shown in Figure 1.
Figure 1. Overall model architecture diagram
3.1 Plan-Graph Guided Latency Modeling
This study introduces a Plan-Graph Guided Latency Modeling (PGLM) method. The core idea is to abstract logical and physical database plans into attributed directed acyclic graphs and then perform graph structure modeling and temporal context encoding. By integrating operator types, pipeline boundaries, cardinality estimation errors, concurrency, and multi-source telemetry, PGLM captures critical paths and bottleneck operators at the graph structure level. This enables fine-grained and interpretable predictions of query execution time. Unlike traditional static cost models, PGLM does not rely on fixed parameters. Instead, it learns latent dependencies and amplification effects automatically through a structured graph representation, which makes it more adaptable to complex runtime environments.
PGLM further transforms the prediction problem into modeling conditional probability distributions rather than single-point estimation. This distributional view characterizes not only the mean execution time but also calibrated outputs for high-percentile latencies such as P95 and P99, which are critical in real systems. By incorporating uncertainty modeling, PGLM produces joint representations that include mean, variance, and confidence. This allows subsequent resource orchestration and adaptive query processing to make decisions based on confidence constraints instead of single-point estimates, achieving a balance between stability and performance. The framework of this innovation is illustrated in Figure 2.
Figure 2. PGLM module architecture
At the formula reasoning level, we first model the query plan as a weighted graph. Let's assume the query plan is a directed acyclic graph $G = (V, E)$, where $V$ represents the operator nodes and $E$ represents the dependency edges between operators. For each node $v \in V$, its representation vector can be written as:
$$h_v = \phi\Big(x_v,\ \sum_{u \in N(v)} \psi(h_u, e_{uv})\Big)$$
$x_v$ represents the operator characteristics (such as type, selectivity, parallelism, etc.), $e_{uv}$ represents the edge attributes (such as partitioning method, blocking relationship), and $\phi$ and $\psi$ are learnable functions.
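
The node update described here can be sketched in a few lines of plain Python. This is a minimal illustration rather than the paper's implementation: the sum aggregator, the hand-written `phi` and `psi` (which would be learned networks in practice), the reading of N(v) as the input operators of v, and the toy plan with its feature values are all assumptions.

```python
from typing import Dict, List, Tuple

Vector = List[float]

def psi(h_u: Vector, e_uv: Vector) -> Vector:
    """Toy message function (assumed): scale the child state by the first edge attribute."""
    return [e_uv[0] * x for x in h_u]

def phi(x_v: Vector, agg: Vector) -> Vector:
    """Toy update function (assumed): add operator features to the aggregated messages."""
    return [a + b for a, b in zip(x_v, agg)]

def propagate(
    order: List[str],
    features: Dict[str, Vector],
    children: Dict[str, List[str]],
    edge_attr: Dict[Tuple[str, str], Vector],
) -> Dict[str, Vector]:
    """One bottom-up pass over a plan DAG; `order` must list inputs before consumers."""
    h: Dict[str, Vector] = {}
    for v in order:
        agg = [0.0] * len(features[v])
        for u in children.get(v, []):
            msg = psi(h[u], edge_attr[(u, v)])
            agg = [a + m for a, m in zip(agg, msg)]  # sum aggregation over N(v)
        h[v] = phi(features[v], agg)
    return h

# Hypothetical plan: two scans feeding one join.
features = {"scan_a": [1.0, 0.0], "scan_b": [2.0, 1.0], "join": [0.5, 0.5]}
children = {"join": ["scan_a", "scan_b"]}
edge_attr = {("scan_a", "join"): [1.0], ("scan_b", "join"): [0.5]}
h = propagate(["scan_a", "scan_b", "join"], features, children, edge_attr)
```

Processing nodes in topological order means every input state is available before its consumer is updated, which is what lets a single pass cover the whole plan.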
Based on the node representation, we represent the potential execution time of the entire query as a graph-level embedding $z$:
$$z = \mathrm{READOUT}\big(\{h_v \mid v \in V\}\big)$$
where $\mathrm{READOUT}(\cdot)$ represents a graph-level aggregation operation, such as weighted averaging or attention aggregation.
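
One possible form of the readout step is a softmax-weighted average of node states. The sketch below assumes per-node relevance scores are already available (in a learned model they would come from a scoring layer); the function name and inputs are hypothetical.

```python
import math
from typing import List, Tuple

def attention_readout(
    node_states: List[List[float]], scores: List[float]
) -> Tuple[List[float], List[float]]:
    """Softmax-weighted average of node states; returns (graph embedding, weights)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(node_states[0])
    z = [sum(w * h[d] for w, h in zip(weights, node_states)) for d in range(dim)]
    return z, weights

# Equal scores reduce attention to plain averaging.
z, weights = attention_readout([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0])
```

The returned weights can also be inspected directly, which is one way such a readout could expose which operators dominate the prediction.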
To characterize the uncertainty of execution time, we model its conditional probability distribution:
$$p(y \mid G) = \mathcal{N}\big(y \mid \mu(z), \sigma^2(z)\big)$$
where $y$ represents the actual execution time, and $\mu(z)$ and $\sigma^2(z)$ are the predicted mean and variance, respectively. The model outputs not only the expected value but also the confidence interval, thus avoiding over-reliance on a single estimate during optimization.
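
Given a predicted mean and variance, a confidence interval or a high percentile follows directly from the Gaussian assumption. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper names and the example numbers are illustrative assumptions, not values from the paper.

```python
from statistics import NormalDist
from typing import Tuple

def latency_interval(mu: float, var: float, level: float = 0.95) -> Tuple[float, float]:
    """Central confidence interval of a Gaussian latency prediction."""
    dist = NormalDist(mu, var ** 0.5)
    tail = (1.0 - level) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)

def latency_quantile(mu: float, var: float, tau: float) -> float:
    """tau-quantile (e.g. 0.95 for P95) implied by the predicted mean and variance."""
    return NormalDist(mu, var ** 0.5).inv_cdf(tau)

# Hypothetical prediction: mean 100 ms, variance 25 ms^2.
low, high = latency_interval(100.0, 25.0)
p99 = latency_quantile(100.0, 25.0, 0.99)
```

A scheduler that budgets against `p99` instead of the mean is one simple way to act on the interval rather than a single estimate.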
In the loss function design, we use negative log-likelihood as the optimization target:
$$L_{NLL} = \frac{1}{2}\log \sigma^2(z) + \frac{\big(y - \mu(z)\big)^2}{2\sigma^2(z)}$$
This loss function constrains both the mean and variance of the predictions, allowing the model to learn a stable and calibrated probability distribution. To further focus on tail delays, we introduce quantile regression loss:
$$L_\tau = \max\big(\tau (y - \hat{y}_\tau),\ (\tau - 1)(y - \hat{y}_\tau)\big)$$
where $\hat{y}_\tau$ is the predicted $\tau$ quantile (such as $\tau = 0.95$ or $0.99$). The final loss function is a weighted combination:
$$L = L_{NLL} + \lambda \sum_{\tau \in \{0.95,\, 0.99\}} L_\tau$$
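
The combined objective can be written out for a single sample as follows. This is a sketch under stated assumptions: a single scalar weight `lam` on the quantile terms (the paper only says the weights were found by grid search), the standard pinball form of the quantile loss, and made-up example values.

```python
import math
from typing import Dict

def gaussian_nll(y: float, mu: float, var: float) -> float:
    """Gaussian negative log-likelihood, dropping the constant term."""
    return 0.5 * math.log(var) + (y - mu) ** 2 / (2.0 * var)

def pinball(y: float, y_hat: float, tau: float) -> float:
    """Quantile (pinball) loss for the predicted tau-quantile y_hat."""
    diff = y - y_hat
    return max(tau * diff, (tau - 1.0) * diff)

def total_loss(
    y: float, mu: float, var: float, quantile_preds: Dict[float, float], lam: float = 1.0
) -> float:
    """NLL plus lam times the sum of pinball losses over the requested quantiles."""
    quantile_term = sum(pinball(y, y_hat, tau) for tau, y_hat in quantile_preds.items())
    return gaussian_nll(y, mu, var) + lam * quantile_term

# Hypothetical sample: observed 10 ms, predicted mean 10 ms, variance 1,
# P95 prediction 8 ms (too low), P99 prediction 10 ms.
loss = total_loss(10.0, 10.0, 1.0, {0.95: 8.0, 0.99: 10.0}, lam=0.5)
```

The asymmetry of `pinball` is the point: for tau = 0.95, underestimating the quantile costs 19 times more per millisecond than overestimating it, which pushes the high-percentile heads toward the tail.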
The role of the loss function in this part is to transform execution time prediction into a joint optimization problem. It requires not only an accurate mean prediction but also a reliable characterization of tail latency. By constraining both the distribution center and tail features, the model outputs predictions that are representative and robust, providing a solid probabilistic foundation for subsequent adaptive query optimization and resource orchestration.
3.3 Adaptive Query–Resource Orchestrator
This study also introduces an Adaptive Query–Resource Orchestrator (AQRO). Its main goal is to achieve dynamic alignment between computing resources and query plans during execution, thereby improving overall performance and fairness in multi-tenant and heterogeneous environments. Unlike traditional scheduling methods that rely on static rules, AQRO makes adaptive decisions based on predicted execution signals such as mean latency, tail latency, and confidence intervals. It enables joint optimization at both the query and resource levels. This coordination allows the system to maintain stable resource utilization under high load and latency-sensitive scenarios while effectively mitigating performance fluctuations and tail latency amplification.
In addition, the design of AQRO emphasizes cross-layer feedback by incorporating prediction uncertainty into resource orchestration strategies. By combining structural features of query plans with runtime resource states, AQRO can dynamically adjust operator scheduling, parallelism allocation, resource affinity, and batch granularity. Importantly, AQRO considers not only average performance objectives but also introduces percentile latency constraints into scheduling to achieve proactive control of tail latency. This design provides a unified control bus for resource scheduling and query optimization, enabling the system to flexibly switch between precise prediction and uncertainty defense. The framework of this model is illustrated in Figure 3.
Figure 3. AQRO module architecture
At the formula reasoning level, assume the system has a query set $Q = \{q_1, q_2, \ldots, q_N\}$ and a resource set $R = \{r_1, r_2, \ldots, r_M\}$. The predicted latency of each query $q_i$ is output by PGLM, including the mean $\mu_i$, variance $\sigma_i^2$, and quantile prediction $\hat{y}_i^{(\tau)}$. The resource allocation can be represented as a matrix:
$$A = [a_{i,j}] \in \{0, 1\}^{N \times M}$$
where $a_{i,j} = 1$ indicates that the query $q_i$ is assigned to resource $r_j$.
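
As a small illustration of the allocation matrix, the sketch below builds a 0/1 matrix from a list of per-query resource indices. It assumes each query is assigned to exactly one resource, which the text does not state explicitly; the function name is hypothetical.

```python
from typing import List

def build_allocation(assignment: List[int], n_resources: int) -> List[List[int]]:
    """Return an N x M 0/1 matrix where row i has a single 1 at column assignment[i]."""
    matrix = [[0] * n_resources for _ in assignment]
    for i, j in enumerate(assignment):
        if not 0 <= j < n_resources:
            raise ValueError(f"query {i} refers to unknown resource {j}")
        matrix[i][j] = 1
    return matrix

# Three queries over two resources: q1 -> r2, q2 -> r1, q3 -> r2 (0-based indices below).
A = build_allocation([1, 0, 1], n_resources=2)
```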
Under this constraint, the execution latency of each query is estimated to be:
$$\hat{T}_i = f\big(\mu_i, \sigma_i^2, \hat{y}_i^{(\tau)}, \rho(r_j)\big)$$
where $\rho(r_j)$ represents the load factor of the resource $r_j$.
Furthermore, the system objective function can be defined as minimizing the weighted delay:
$$J(A) = \sum_{i=1}^{N} w_i \hat{T}_i$$
$w_i$ is the query priority weight, which is used to reflect the differences between different tenants or different business needs.
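
To make the weighted-delay objective concrete, the sketch below evaluates it for a given allocation. The paper does not specify the form of the latency estimator, so `estimate_latency` is a purely hypothetical stand-in (a risk-adjusted mean inflated by the resource load factor, ignoring the quantile input); the risk multiplier `k` and all numbers are assumptions.

```python
from typing import List, Tuple

def estimate_latency(mu: float, var: float, load: float, k: float = 1.0) -> float:
    """Hypothetical estimator: (mean + k * std) scaled up by the resource load factor."""
    return (mu + k * var ** 0.5) * (1.0 + load)

def weighted_delay(
    allocation: List[List[int]],
    preds: List[Tuple[float, float]],
    loads: List[float],
    weights: List[float],
) -> float:
    """J(A): sum over queries of priority weight times estimated latency."""
    total = 0.0
    for i, row in enumerate(allocation):
        j = row.index(1)  # resource assigned to query i
        mu, var = preds[i]
        total += weights[i] * estimate_latency(mu, var, loads[j])
    return total

# Two queries, two resources; the first resource is half loaded.
J = weighted_delay(
    allocation=[[1, 0], [0, 1]],
    preds=[(10.0, 4.0), (20.0, 0.0)],
    loads=[0.5, 0.0],
    weights=[2.0, 1.0],
)
```

Comparing `J` across candidate allocations is the simplest way to use the objective; a real orchestrator would search or learn over allocations rather than enumerate them.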
To prevent the tail delay from being too large, the quantile constraint is introduced:
$$\hat{y}_i^{(\tau)} \le \delta, \quad \forall i$$
where $\delta$ is the maximum allowable threshold of tail latency. This constraint ensures that the scheduling strategy not only focuses on average performance but also takes high-percentile latency control into account.
Finally, the loss function of AQRO is defined as follows:
$$L_{AQRO} = \sum_{i=1}^{N} \Big[ (\hat{T}_i - y_i)^2 + \lambda_1 \mathrm{Var}(\hat{T}_i) + \lambda_2 \max\big(0,\ \hat{y}_i^{(\tau)} - \delta\big) \Big]$$
The first term is the delay prediction error constraint, the second term is the regularization of the prediction variance to stabilize the scheduling, the third term is the tail delay penalty with tail-latency threshold $\delta$, and $\lambda_1$ and $\lambda_2$ are trade-off coefficients.
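
The three terms of the AQRO loss (squared delay error, variance regularization, and tail penalty) can be computed as in the sketch below. It assumes the variance term is a per-query predictive variance and that all three terms are summed over queries; the argument names, default coefficients, and example values are illustrative only.

```python
from typing import Sequence

def aqro_loss(
    t_hat: Sequence[float],   # estimated latency per query
    y: Sequence[float],       # observed latency per query
    var_t: Sequence[float],   # predictive variance per query (assumed interpretation)
    y_tail: Sequence[float],  # predicted tail quantile per query
    delta: float,             # maximum allowable tail latency
    lam1: float = 0.1,
    lam2: float = 1.0,
) -> float:
    """Sum of squared error, variance regularization, and hinge-style tail penalty."""
    total = 0.0
    for t, y_i, v, q in zip(t_hat, y, var_t, y_tail):
        error_term = (t - y_i) ** 2
        variance_term = lam1 * v
        tail_term = lam2 * max(0.0, q - delta)  # zero while the tail stays under delta
        total += error_term + variance_term + tail_term
    return total

# Hypothetical batch of two queries with a 25 ms tail budget.
loss_value = aqro_loss(
    t_hat=[10.0, 20.0], y=[12.0, 20.0], var_t=[1.0, 4.0],
    y_tail=[15.0, 30.0], delta=25.0, lam1=0.5, lam2=2.0,
)
```

Only the second query exceeds the tail budget, so it alone contributes to the penalty term; queries within budget are unaffected by `lam2`.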
The role of the loss function in AQRO is to transform query scheduling and resource orchestration into a joint optimization problem. It ensures average performance while controlling prediction uncertainty and constraining tail latency within a predefined threshold. In this way, the system can dynamically adapt to the demands of different queries and the load conditions of resources, achieving adaptive and robust coordinated optimization that provides higher stability and controllability for database query execution.
4. Experimental Results
4.1 Dataset
This study uses the public dataset named SPARQL Queries Performance Prediction, which contains a large number of instances designed for query latency prediction tasks. It covers actual query plan structures together with their execution latency labels. The dataset focuses on structural features at the query plan level and their latency responses, making it highly suitable for latency modeling and latency distribution learning.
The dataset consists of several key components. It includes SPARQL query texts and structural features extracted from query plans, such as operator type sequences, differences between estimated and actual cardinalities, and join topology information. It also provides precise query execution times as ground-truth labels. In addition, runtime context features are included, such as concurrency level, cache hit rate, and I/O latency metrics. Together, these components form a rich set of data samples that support learning the mapping between graph structural features and execution time.
One of the main advantages of this dataset is the diversity and clarity of its features. It not only captures query plan structural information but also aligns it with accurate execution latencies, offering an end-to-end supervised learning basis for latency modeling. Moreover, because the data originates from real query environments rather than synthetic settings, it provides higher representativeness and enhances model generalization. Finally, the dataset is easy to access, stored in standard formats, and organized, which facilitates the construction of training and testing pipelines. This makes it highly efficient and reproducible for researchers developing latency prediction and optimization strategies.
4.2 Experimental setup
The experiments in this study were conducted in a high-performance computing environment. The hardware configuration included a 32-core CPU with a 2.6 GHz clock speed, 256 GB of memory, and a multi-GPU cluster with 32 GB of memory per card. High-throughput NVMe SSD storage and gigabit Ethernet were used to ensure stability and scalability during query plan construction, latency prediction modeling, and resource orchestration. This environment provided sufficient computational support for complex graph feature extraction and deep learning model training.
On the software side, the experiments ran on a Linux operating system. Python 3.10 was used as the primary programming language, and PyTorch served as the main deep learning framework for model implementation and training. To efficiently execute graph-related computations, DGL and related parallel computing libraries were employed. Docker containerization was used to ensure portability and stable dependency management. In addition, CUDA 11.x and cuDNN were used for GPU acceleration to fully exploit the parallel computing capabilities of the hardware.
For hyperparameter settings, the Adam optimizer was adopted with an initial learning rate of 1e-4. A cosine annealing schedule was used to dynamically adjust the learning rate during training. The batch size was set to 128 to balance GPU memory usage and training stability. The gradient clipping threshold was set to 1.0 to avoid gradient explosion. Loss function weights were determined through grid search, and the L2 regularization coefficient was set to 1e-5 to improve generalization performance. An early stopping mechanism was applied in all experiments, terminating training if validation metrics did not improve for 10 consecutive iterations, thus ensuring both efficiency and stability.
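
Two of these settings, the cosine annealing schedule and early stopping, are easy to express without any framework. The sketch below is a plain-Python illustration rather than the paper's PyTorch code: the minimum learning rate of zero, the schedule length, and the reading of "10 consecutive iterations" as ten consecutive validation checks are assumptions.

```python
import math

def cosine_lr(step: int, total_steps: int, base_lr: float = 1e-4, min_lr: float = 0.0) -> float:
    """Cosine-annealed learning rate, decaying from base_lr to min_lr over total_steps."""
    cosine = math.cos(math.pi * step / total_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cosine)

class EarlyStopping:
    """Signal a stop once the validation metric fails to improve `patience` times in a row."""

    def __init__(self, patience: int = 10) -> None:
        self.patience = patience
        self.best = float("inf")
        self.bad_checks = 0

    def step(self, val_metric: float) -> bool:
        """Record one validation result (lower is better); return True when training should stop."""
        if val_metric < self.best:
            self.best = val_metric
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience

# Learning rate at the start, midpoint, and end of a hypothetical 100-step schedule.
schedule = [cosine_lr(s, 100) for s in (0, 50, 100)]
```

In a training loop, `cosine_lr` would be queried once per step (or epoch) and `EarlyStopping.step` once per validation pass.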
4.3 Experimental Results
1) Comparative experimental results
This paper first conducts a comparative experiment, and the experimental results are shown in Table 1.
Table 1: Comparative experimental results
Method              MAE (ms) ↓   RMSE (ms) ↓   P95 Abs. Error (ms) ↓   ECE ↓
Informer [18]       28.7         56.4          132.0                   0.061
Autoformer [19]     26.9         53.1          124.0                   0.055
FEDformer [20]      24.8         49.2          115.0                   0.049
PatchTST [21]       22.6         45.7          108.0                   0.043
Ours (PGLM+AQRO)    15.9         31.0          72.0                    0.028
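
For reference, the three error metrics in the table can be computed as sketched below. The paper does not define "P95 Abs. Error" precisely; the sketch assumes it is the 95th percentile of per-query absolute errors, computed with the nearest-rank method. ECE is omitted because its regression variant is not specified in the text. The sample values are made up.

```python
import math
from typing import Sequence

def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error; penalizes large deviations more than MAE."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def p95_abs_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """95th percentile of absolute errors (nearest-rank definition, an assumption)."""
    errors = sorted(abs(a - b) for a, b in zip(y_true, y_pred))
    rank = math.ceil(0.95 * len(errors))
    return errors[rank - 1]

# Hypothetical latencies in milliseconds.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 30.0, 44.0]
metrics = (mae(observed, predicted), rmse(observed, predicted), p95_abs_error(observed, predicted))
```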
The results demonstrate consistent superiority. Across all four metrics, ours (PGLM+AQRO) outperforms every baseline. Compared with the best-performing baseline (PatchTST: MAE 22.6 ms, RMSE 45.7 ms, P95 108 ms, ECE 0.043), ours reduces MAE to 15.9 ms (about 29.6 percent improvement), RMSE to 31.0 ms (about 32.2 percent improvement), and P95 error to 72.0 ms (about 33.3 percent improvement), while also lowering ECE to 0.028 (about 34.9 percent improvement). This simultaneous reduction in average error, tail error, and calibration error shows that the model is more robust in overall accuracy, extreme cases, and uncertainty characterization.
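
The relative improvements over the strongest baseline follow from the Table 1 values by simple arithmetic, which can be checked with a few lines of Python. The dictionary keys and helper name are illustrative; the numbers are taken from the table.

```python
from typing import Dict

# Values from Table 1: PatchTST (best baseline) versus PGLM+AQRO.
baseline: Dict[str, float] = {"MAE": 22.6, "RMSE": 45.7, "P95": 108.0, "ECE": 0.043}
ours: Dict[str, float] = {"MAE": 15.9, "RMSE": 31.0, "P95": 72.0, "ECE": 0.028}

def relative_reduction(before: float, after: float) -> float:
    """Percentage reduction of a lower-is-better metric."""
    return 100.0 * (before - after) / before

improvements = {
    name: round(relative_reduction(baseline[name], ours[name]), 1) for name in baseline
}
```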
Mechanistically, the first advantage comes from the structural awareness of PGLM. By abstracting logical and physical plans into attributed plan graphs, the model explicitly captures blocking relationships among operators, join orders, selectivity amplification effects, and the influence of parallelism on critical paths. Compared with baselines that treat queries as time series or flat features, structured representation reduces systematic error propagation caused by cardinality bias and resource contention. This is directly reflected in the simultaneous improvements of MAE and RMSE.
Second, the significant reduction in P95 absolute error highlights the model's stronger ability to capture tail latency and distributional characteristics. PGLM generates joint predictions of mean and high percentiles and provides confidence information. AQRO then leverages these uncertainty signals to coordinate plans and resources, such as adjusting parallelism and batch size, resampling hot keys, applying affinity placement, and enforcing quota control. As a result, the model maintains controllable tail errors even under workloads dominated by long-tail latency. This targeted treatment of tail distribution is one of the most practically valuable improvements for query execution latency.
Finally, the reduction in ECE indicates better alignment between predicted probabilities and observed distributions, making confidence intervals more useful. This directly enhances the reliability of subsequent decisions. In admission control, priority scheduling, and elastic scaling, AQRO can use credible confidence values to set thresholds and conservative strategies, reducing the risks of over-provisioning and under-provisioning. This allows the system to meet service-level objectives while controlling resource costs. Overall, the structural representation of PGLM and the uncertainty-aware control of AQRO form a closed loop. More accurate prediction and more stable orchestration reinforce each other, driving query latency prediction and optimization strategies from static to adaptive.
2) The environmental sensitivity of resource limits and quota policies (CPU/Memory/I/O) to SLO satisfaction rates
This paper further studies the environmental sensitivity of resource limits and quota policies (CPU/Memory/I/O) to SLO satisfaction rates. The experimental results are shown in Figure 4.
Figure 4. The environmental sensitivity of resource limits and quota policies (CPU/Memory/I/O) to SLO satisfaction rates
In this set of experiments, the error metrics of latency prediction show clear differences under different quota strategies. When the system is constrained by CPU or memory quotas, the MAE rises to 17.5 ms and 18.2 ms, respectively. Under I/O quota constraints, it is even higher at 19.6 ms, indicating that I/O limits have the greatest impact on prediction accuracy. In contrast, under balanced resource allocation, the MAE is the lowest at 15.9 ms, showing that the model can better capture the correspondence between query plan features and execution latency when resources are sufficient and properly allocated.
Further observation of RMSE shows a trend consistent with MAE. Under CPU and memory quotas, RMSE reaches 33.9 ms and 35.1 ms, respectively. Under I/O quotas, it increases to 37.8 ms, while in the balanced quotas scenario, it drops to 31.0 ms, showing a clear advantage. Since RMSE is more sensitive to large deviations, this indicates that in balanced scenarios, the model not only achieves lower average error but also provides more stable predictions under large fluctuations. This verifies the robustness of PGLM's graph-structured representation in multi-resource scheduling scenarios.
For tail latency error (P95 absolute error), I/O quotas remain the main bottleneck. The error rises from 78 ms under CPU quotas to 86 ms under I/O quotas, while in the balanced resource setting, it decreases significantly to 72 ms. In the aggressive quotas scenario, P95 error soars to 104 ms, highlighting the amplification of tail latency risk under extreme resource constraints. This shows that although AQRO's dynamic orchestration strategy can partially mitigate instability caused by resource shortages, it is still difficult to fully suppress long-tail effects when resources are overly restricted.
Finally, looking at the ECE metric, the lowest value of 0.028 is achieved in the balanced scenario, indicating the best alignment between predicted and actual probability distributions. Under CPU and memory quotas, ECE values rise to 0.030 and 0.031, while under I/O quotas, they reach 0.034. In the aggressive quotas scenario, it further worsens to 0.040. This indicates that when system resources are excessively constrained, the effectiveness of uncertainty modeling decreases, reducing the reliability of prediction confidence. Overall, the results demonstrate that PGLM+AQRO achieves superior prediction accuracy and calibration in balanced resource environments, while extreme quota constraints reveal the limits of model adaptability.
3) Sensitivity of query template diversity (operator pattern and connection topology changes) to prediction generalization
This paper also studies the sensitivity of query template diversity (operator pattern and connection topology changes) to prediction generalization. The experimental results are shown in Figure 5.
Figure 5. Sensitivity of query template diversity (operator pattern and connection topology changes) to prediction generalization
The experimental results show that the latency prediction ability of PGLM+AQRO varies significantly under different levels of query template diversity. MAE remains low under low and medium diversity conditions but rises notably when operator numbers and join relationships become more complex. This indicates that the model's average error is more sensitive to complex topologies. The observation highlights the influence of query template structure on the baseline error distribution of the model.
The trend of RMSE shows a clearer monotonic increase
compared with MAE. In scenarios with high operator diversity
and mixed complexity, the fluctuations in prediction error
become much larger. This means that when handling complex
queries, the dispersion of the error distribution increases, and
extreme values exert a stronger influence on overall error. The
continuous rise of RMSE demonstrates that robustness under
complex operator patterns remains a challenge for the system.
The P95 absolute error peaks under high-diversity
conditions and then slightly decreases in extreme mixed
scenarios. This shows that tail latency prediction is heavily
affected by extreme structural complexity but is partly
alleviated after internal adjustment of the model. The trend
highlights the stabilizing effect of AQRO's adaptive scheduling
and resource orchestration under high-pressure conditions,
which helps suppress extreme latency.
The ECE results first decrease and then increase, eventually
stabilizing at a moderate level in extreme scenarios. This trend
indicates that under moderate query diversity, the model
provides more reliable uncertainty estimates. However, in
extremely complex cases, deviations arise between confidence
intervals and actual errors. This suggests that PGLM needs
further improvement in uncertainty calibration to enhance the
consistency of prediction confidence when facing high
structural diversity.
4) Hyperparameter sensitivity of learning rate and batch
size to delayed prediction stability
This paper further proposes a hyperparameter sensitivity
test focusing on the learning rate and batch size, aiming to
explore their influence on the overall stability of delayed
prediction within the proposed framework. The motivation
behind this test lies in the fact that hyperparameters play a
decisive role in balancing convergence speed, optimization
stability, and generalization ability, particularly in scenarios
where prediction outcomes are affected by temporal
dependencies or delayed responses. By systematically
adjusting the values of learning rate and batch size, the study
seeks to analyze how subtle changes in these parameters
impact the robustness of the model against fluctuations, as
well as its ability to maintain consistent performance across
different training phases. Such sensitivity analysis provides
not only a deeper understanding of the training dynamics but
also practical guidance for selecting more reliable
hyperparameter configurations when applying the method to
real-world tasks. The overall design of this experiment is
summarized in Figure 6, which illustrates the relationship
between these critical hyperparameters and the stability of
delayed prediction.
Figure 6. Hyperparameter sensitivity of learning rate and batch size to delayed prediction stability
The experimental results show that MAE exhibits a typical
U-shaped curve concerning learning rate settings. When the
learning rate is too small, model updates are slow, and the error
remains high. As the learning rate increases, MAE decreases
significantly and reaches its optimal value in the middle range,
indicating that the model captures the relationship between plan
graphs and latency more efficiently. However, when the
learning rate continues to increase, MAE rises again, reflecting
instability caused by overly rapid parameter updates. This trend
reveals the sensitivity of latency prediction models to
parameter tuning and highlights the importance of a reasonable
learning rate range for prediction accuracy.
The trend of RMSE shows an overall increase with small
fluctuations. At low learning rates, RMSE is relatively low,
suggesting that the prediction distribution is more concentrated.
As the learning rate gradually increases, RMSE continues to
rise with stronger fluctuations, indicating that aggressive
update rates amplify errors in some complex query plans. This
phenomenon suggests the risk of uncertainty introduced by
high learning rates and further emphasizes the importance of
stability in latency prediction.
In the batch size experiments, the P95 absolute error first
decreases significantly as batch size increases, reaching its
optimal level at medium scale (such as 128), and then rises
again. This indicates that moderate batch sizes effectively
balance the trade-off between generalization and convergence,
making tail latency prediction more stable. However, when the
batch size becomes too large, the model loses fine-grained
optimization capacity during gradient updates, leading to
amplified tail errors and reduced accuracy in extreme
conditions.
The ECE results demonstrate the sensitivity of model
calibration. As batch size increases, ECE gradually decreases
from a higher level, showing that the model achieves better
consistency between predicted confidence and actual errors at
medium scales. When batch size continues to grow, ECE rises
again, indicating that overly large batches introduce estimation
bias and reduce the alignment between predicted probabilities
and true errors. This phenomenon shows that batch size not
only affects error convergence speed but also directly impacts
the reliability of uncertainty estimation, which is crucial for
ensuring prediction stability.
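The sensitivity test described above amounts to a grid sweep over learning rate and batch size. The sketch below shows the general shape of such a sweep. The `sweep` helper, the candidate grids, and the `toy_tail_error` surrogate are hypothetical stand-ins: the surrogate is a synthetic bowl-shaped function used only to make the example runnable, not the paper's training procedure or measurements.

```python
import itertools
import math

def sweep(evaluate, learning_rates, batch_sizes):
    """Score every (learning rate, batch size) pair; lower scores are better."""
    scores = {
        (lr, bs): evaluate(lr, bs)
        for lr, bs in itertools.product(learning_rates, batch_sizes)
    }
    best = min(scores, key=scores.get)
    return best, scores

def toy_tail_error(lr, batch_size):
    # Synthetic surrogate, NOT measured data: error grows as either
    # hyperparameter moves away from an arbitrarily chosen middle setting.
    return (math.log10(lr) + 3.0) ** 2 + (math.log2(batch_size) - 7.0) ** 2

best, scores = sweep(
    toy_tail_error,
    learning_rates=[1e-4, 3e-4, 1e-3, 3e-3, 1e-2],
    batch_sizes=[32, 64, 128, 256, 512],
)
print(best)  # (0.001, 128)
```

In practice, `evaluate` would train the model with the given configuration and return a validation metric such as MAE, P95 absolute error, or ECE. Because these metrics may favor different settings, it can be useful to keep the full `scores` table rather than only the single best pair.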
5. Conclusion
This study focuses on query execution time prediction and
optimization in databases and proposes an integrated
framework that combines structured modeling with adaptive
orchestration. By introducing a plan-graph-guided latency
