Engineering and Physical Sciences Research Council
Grant number EP/X029174/1
Horizon Europe 2021-2027 Framework Programme
Grant Agreement number 101072456
Disclaimer: Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the EU. The EU cannot be held responsible for them.
Chalmers University of Technology
and University of Gothenburg,
Sweden
ICPE 2025
Research Track
Stream Processing and Aggregates
Reinforcement Learning
Evaluation
Use cases and Setup
[Diagram: reinforcement learning loop with the labels SPE, Controller, Environment, and RL Agent. The Agent receives a state and a reward and returns an action. A good action earns a positive reward; a bad action earns a negative reward.]
- input rate - throughput
- output rate - latency
- [?]/[?] ratio - CPU consumption
- latency
- [?]/[?] ratio
- #steps per episode
[Figure: tuples such as (A, 8:00, 20) and (A, 8:03, 15) are sent as data through a graph of nodes labelled F.]
Input stream
Stream Processing Engine (SPE)
Can run distributed/in parallel
➢ spread in the Cloud-IoT continuum
Directed Acyclic Graph
Outputs
Aggregate A(WA, WS, f, f, f, f), with four functions whose subscripts and window-state symbol are not recoverable, shown as [?]:
  WA (window advance)
  WS (window size)
  Function f_[?](t)
  Function f_[?]([?], t)
  Function f_[?]([?])
  Function f_[?]([?], t) (opt.)
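[Figure: a window sliding over the event time of a stream, annotated with its size and advance; the Aggregate removes tuples that leave the window and produces an output.]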
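As a rough illustration of how the window size (WS) and window advance (WA) determine which window instances a tuple contributes to, here is a minimal Python sketch. It assumes windows of the form [start, start + WS) aligned at multiples of WA, event times in whole seconds, and WS being a multiple of WA. The function name `window_starts` is hypothetical and is not taken from the SPE.

```python
from typing import List


def window_starts(event_time: int, ws: int, wa: int) -> List[int]:
    """Return the left boundaries of all window instances covering event_time.

    Assumptions (not from the slides): windows are [start, start + ws),
    starts are multiples of wa, time begins at 0, and ws % wa == 0.
    """
    if ws <= 0 or wa <= 0 or ws % wa != 0:
        raise ValueError("ws and wa must be positive and ws a multiple of wa")
    last = (event_time // wa) * wa      # latest window start <= event_time
    first = max(0, last - ws + wa)      # earliest start still covering event_time
    return list(range(first, last + wa, wa))


# Linear Road settings from the slides: WS = 10 mins, WA = 5 secs.
starts = window_starts(event_time=3603, ws=600, wa=5)
print(len(starts), starts[0], starts[-1])
```

With WS = 10 mins and WA = 5 secs, each tuple falls into WS / WA = 120 overlapping window instances once the stream has run for at least one window size.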
➢ Environment
  o interface to connect the SPE and RL Agent
➢ Agent
  o implement training algorithm by Neural Network (DQN)
  o get an action to interact with the environment
➢ Reward
  o feedback to the Agent to reinforce good actions
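The three roles above can be sketched as a Gym-style interface (`reset` / `step`) in plain Python. This is only an illustration under stated assumptions: the class `SpeEnvironment`, the `apply_and_measure` hook, the `latency` state key, and the latency-bound reward rule are all hypothetical, and the real SPE is replaced by a stub function.

```python
from typing import Callable, Dict, Tuple

State = Dict[str, float]


class SpeEnvironment:
    """Gym-style wrapper (reset/step) sitting between an SPE and an RL agent."""

    def __init__(self, apply_and_measure: Callable[[int], State],
                 latency_bound: float, max_steps: int = 1000):
        self._apply_and_measure = apply_and_measure  # hypothetical hook into the SPE
        self._latency_bound = latency_bound          # assumed latency constraint (seconds)
        self._max_steps = max_steps                  # the slides mention at most 1000 steps
        self._steps = 0

    def reset(self) -> State:
        self._steps = 0
        return self._apply_and_measure(1)            # action 1 = "stay" (assumed encoding)

    def step(self, action: int) -> Tuple[State, float, bool]:
        state = self._apply_and_measure(action)
        self._steps += 1
        # Assumed reward rule: positive if the latency constraint holds, negative otherwise.
        reward = 1.0 if state["latency"] <= self._latency_bound else -1.0
        done = self._steps >= self._max_steps
        return state, reward, done


def fake_spe(action: int) -> State:
    """Stand-in for the real SPE: pretends action 0 pushes latency over the bound."""
    return {"latency": 1.5 if action == 0 else 0.5}


env = SpeEnvironment(fake_spe, latency_bound=1.0, max_steps=3)
state = env.reset()
state, reward, done = env.step(0)
print(state, reward, done)
```

In a real setup the stub would be replaced by calls into the SPE, and a DQN agent would choose the action passed to `step`.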
[Diagram: RL agent and environment exchanging a reward, with three numbered steps.]
➢ Linear Road benchmark
  o Vehicles travelling on highways report their position/speed
  o Each vehicle reports its position every 30 seconds
  o Aggregate: count the number of non-consecutive stops
  o WS = 10 mins, WA = 5 secs
➢ Synthetic (stress-test)
  o Data is generated following a sawtooth wave whose peaks' values and distances are chosen randomly
  o The key attribute is generated from a Gaussian distribution with changing μ/σ
  o Aggregate: perform math operations on a random value carried by each tuple
  o WS = 15 mins, WA = 1 sec
➢ Setup
  o Java (OpenJDK 17.0.7), Python 3.7.6
  o SPE: Liebre
  o Compression library: snappy
  o Agent: OpenAI Gym
  o 120 episodes with a maximum of 1000 steps each
On-demand Memory Compression of Stream Aggregates through Reinforcement Learning
Comparison discussions of the Agent with different compression levels
Scalability discussions of the Aggregate
➢ Without an Agent (top):
  o average CPU consumption: 0.33 (Linear Road), 0.59 (Synthetic)
  o average latency: 0.98s (Linear Road), 0.53s (Synthetic)
➢ With an Agent (bottom):
  o differences in CPU consumption and latency are almost 0
Introducing the Agent does not create a scaling bottleneck for the Aggregate.
➢ Linear Road (WS = 10 mins, WA = 5 secs)
  o all baselines are safe except for D0.0
  o similar policy behaviors except for WEL-OB
➢ Synthetic (WS = 15 mins, WA = 1 sec)
  o [?]/[?] ratio decreases linearly with lower D
  o fine-tuning ability
[Diagram labels: (initial) state; action; once per day; once per sec; send data.]
- Compress more
  • e.g. D ↓ 10%
- Stay
  • unchanged
- Compress less
  • e.g. D ↑ 10%
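The three actions can be sketched as a small update rule on the threshold factor x (with D = x · WS). This is a minimal sketch under assumptions: "10%" is read as a step of 0.10 in x (that is, 10% of WS), "compress more" is read as lowering D, and the result is clamped to [0.0, 1.0]. The function name `apply_action` and the action labels are hypothetical.

```python
def apply_action(x: float, action: str, step: float = 0.10) -> float:
    """Return the new threshold factor x after one agent action.

    Assumptions (not from the slides): the step is 0.10 in x, and a lower x
    means a lower threshold D = x * WS, hence more compression.
    """
    deltas = {"compress_more": -step, "stay": 0.0, "compress_less": +step}
    if action not in deltas:
        raise ValueError(f"unknown action: {action}")
    new_x = min(1.0, max(0.0, x + deltas[action]))  # keep x inside [0.0, 1.0]
    return round(new_x, 10)                         # hide floating-point noise


x = 0.5
for a in ("compress_more", "stay", "compress_less"):
    x = apply_action(x, a)
    print(a, x)
```

Clamping keeps the threshold between the two extremes, 0.0 · WS and 1.0 · WS.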
  o RL-based adaptive memory compression scheme for stream Aggregates
  o Allowing real-time balancing of performance and memory usage under latency constraints
  o Capture application- and data-specific behaviors of Aggregates
  o Highlight the trade-off between RL training timeliness and policy effectiveness
10:00:00
action time
• If a window hasn't been updated for a while…
• Compress it first, and decompress it later
example
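The idea of compressing an idle window and decompressing it on its next update can be sketched as follows. This is an illustration, not the actual implementation: it uses Python's standard `zlib` and `pickle` modules as a stand-in for the snappy library named in the setup, and the class `WindowInstance` with its methods is hypothetical.

```python
import pickle
import zlib
from typing import List, Optional


class WindowInstance:
    """Toy window state that can be compressed while idle."""

    def __init__(self, ts: int, values: List[int]):
        self.ts = ts                        # event time of the latest contributing tuple
        self._values: Optional[List[int]] = list(values)
        self._blob: Optional[bytes] = None  # compressed form, when compressed

    @property
    def compressed(self) -> bool:
        return self._blob is not None

    def compress(self) -> None:
        if self._blob is None:
            self._blob = zlib.compress(pickle.dumps(self._values))
            self._values = None             # drop the uncompressed copy

    def _ensure_decompressed(self) -> None:
        if self._blob is not None:
            self._values = pickle.loads(zlib.decompress(self._blob))
            self._blob = None

    def add(self, ts: int, value: int) -> None:
        self._ensure_decompressed()         # decompress on demand before updating
        self._values.append(value)
        self.ts = ts

    def values(self) -> List[int]:
        self._ensure_decompressed()
        return list(self._values)


w = WindowInstance(ts=0, values=[20, 15])
w.compress()            # the window has been idle for a while
w.add(ts=5, value=7)    # a new tuple arrives, so the state is decompressed first
print(w.compressed, w.values())
```

The cost of this scheme is the decompression work paid whenever a compressed window is updated or produces its output.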
About this paper
Jingyu Liu, Vincenzo Gulisano
infrequently
frequently
Compress!
time to the next output
10:05:00
RL Agent, SPE
action (compress)
state, reward
Cyclical dependency
➢ Agent computes its next action
  o receive the state and reward first!
[Plots: x-axis "baseline (x value)".]
Each Dx baseline always sets the D value to x · WS.
Output Stream
➢ Aggregate waits for the Agent's action
  o share a new state and reward first!
Policies: WEL-OB (WEL-OBlivious), EL-OB (EL-OBlivious), L-OB (L-OBlivious), WEL-AW (WEL-AWare)

                                  WEL-OB   EL-OB   L-OB   WEL-AW
wallclock time (W)                  ✗        ✓       ✓       ✓
event time (E)                      ✗        ✗       ✓       ✓
next output (L) (e.g. latency)      ✗        ✗       ✗       ✓
[Plots: policy and observation, with panels for Linear Road and Synthetic.]
(compression threshold) D = x · WS, x ∈ [0.0, 1.0]:
  o latestTS − ts ≥ D ⇒ compress (the condition for triggering compression)
    • latestTS: the timestamp (event time) of the latest tuple processed by the Aggregate A
    • ts: the timestamp (event time) of the latest tuple that contributed to the window instance
    • e.g., 0.0 · WS: all window instances maintained by the Aggregate A are compressed
            1.0 · WS: no window instance maintained by the Aggregate A is compressed
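The triggering condition can be written as a one-line check. The sketch below assumes timestamps and WS are expressed in the same unit (seconds here); the function name `should_compress` and its parameter names are hypothetical.

```python
def should_compress(latest_ts: float, ts: float, x: float, ws: float) -> bool:
    """Check the condition latest_ts - ts >= D, with D = x * ws.

    latest_ts: event time of the latest tuple processed by the Aggregate.
    ts: event time of the latest tuple that contributed to the window instance.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0.0, 1.0]")
    threshold = x * ws  # D
    return latest_ts - ts >= threshold


# WS = 10 mins (600 s), x = 0.5, so D = 300 s.
print(should_compress(latest_ts=600, ts=300, x=0.5, ws=600))  # idle for 300 s
print(should_compress(latest_ts=600, ts=400, x=0.5, ws=600))  # idle for 200 s
```

With x = 0.0 the condition holds for every window instance, since latest_ts is never earlier than ts, which matches the first extreme listed above.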
tuple
window instance
Why these policies?
➢ Three ways the system makes progress
  (1) wallclock time moves forward
  (2) event time advances
  (3) get a new state (e.g. latency) measurement
  • (2) implies (1) because event time advances only as A processes input, which depends on wallclock time.
  • (3) implies (2) because latency updates occur only when event time advances enough to produce output.
Based on these dependencies, our policies can be established:
If the state comes from before 10:05:00, the effects on latency are not measured yet…
TIME ALIGNMENT (in our policies)
[Figure: the threshold D moves up or down between 0% WS and 100% WS.]