SaaMS: The Synopses-as-a-MicroService Paradigm for Scalable Adaptive Streaming Analytics across the Cloud to Edge Continuum
Georgios Panagiotis Kalfakis^a, Nikos Giatrakos^{a,∗}
^a School of Electrical and Computer Engineering, Technical University of Crete, University Campus, Kounoupidiana, Chania, GR-73100, Greece
Abstract
The use of data synopses in Big streaming Data analytics can offer 3 types of scalability: (i) horizontal scalability, for scaling with the volume and velocity of Big streaming Data, (ii) vertical scalability, for scaling with the number of processed streams, and (iii) federated scalability, i.e. reducing the communication cost for performing global analytics across a number of geo-distributed data centers or devices in IoT settings. Despite the aforementioned virtues of synopses, no state-of-the-art Big Data framework or IoT platform provides a native API for stream synopses supporting all three types of required scalability. In this work, we fill this gap by introducing a novel system and architectural paradigm, namely Synopses-as-a-MicroService (SaaMS), for both parallel and geo-distributed stream summarization at scale. SaaMS is developed on Apache Kafka and Kafka Streams and can provide all the required types of scalability together with (i) the ability to seamlessly perform adaptive resource allocation with zero downtime for the running analytics and (ii) the ability to run both across powerful computer clusters and Java-enabled IoT devices. Therefore, SaaMS is directly deployable from applications that either operate on powerful clouds or across the cloud to edge continuum.
Keywords: Data summarization, Data streams, Big data analytics, Cloud, Edge, Adaptivity
1. Introduction
Many modern applications from the financial, maritime, civil protection and other diverse domains base their business value on real-time, online Big Data analytics either at powerful data centers or over a number of, potentially geo-dispersed, devices across the cloud to edge continuum. In the financial domain, extreme scale stock streams, from various markets across the world, stream into investment companies' data centers. Global and continuous analytics over thousands of rapidly evolving stock streams need to be performed in real-time to timely pinpoint inter- or intra-market investment opportunities and risks [1]. In turn, such companies provide personalized portfolio management and investment services that span the cloud to fog to edge continuum to allow rapid trading on investors' mobile devices. In the maritime domain, thousands of vessels across the globe are being monitored via satellite images, AIS receiver stations at regional coastal areas or unmanned vehicles swarming at sea to enable authorities to detect illegal activities or safety incidents [2] and act accordingly. In civil protection scenarios, drones, robots and sensors of various types operate in forest areas or along river banks. Such devices collect voluminous streams of relevant data in order to monitor environmental conditions and provide early warnings in case of forest fires or floods, among other events.
Over the years, there is an established consensus in the streaming data management community [3, 4, 5, 6, 7, 8, 9, 10] that in order to deal with the volume and velocity of such unbounded streams of data, data stream summaries including sketches [8, 9, 7, 10, 11], samples [6, 12, 12, 13], wavelets [14], histograms [15, 16] and dimensionality reduction techniques [17, 18, 19] can combine the potential to scale the computation by reducing the processing and memory load, while controllably sacrificing the accuracy of streaming analytics tasks, with predefined quality guarantees. Such summaries can provide analytics answers to a variety of commonly used, continuously executed queries that include, but are not limited to, distinct count, cardinality, frequency moment, correlation, set membership or quantile estimation [5].
∗Corresponding author
Email addresses: [email protected] (Georgios Panagiotis Kalfakis), [email protected] (Nikos Giatrakos)
To deliver Big streaming Data analytics at scale, stream synopses can provide three types of scalability:
Horizontal scalability: scaling the computation with the volume and velocity of data streams by reducing the processing and memory load via data summarization. In public or private cloud environments, this property can be combined with parallelization opportunities where each worker is assigned to process a disjoint subset of the incoming streams or a portion of an entire streaming dataset. Each worker then operates independently on a portion of the incoming load and partial query results per worker can be combined (if needed) based on the mergeability property of many such summaries [20]. In this way, the virtues of both stream summarization and parallelization are exploited. Despite this fact, Big Data frameworks, like Apache Spark [21] or Flink [22], provide no native API for data stream synopses [23]. Besides, these Big Data platforms can only aid in horizontal scalability at the cloud side as described above, while two additional types of scalability are required for real-time data stream processing at the cloud, and beyond the cloud side, as detailed below.
Preprint submitted to Information Systems, November 6, 2025
Vertical scalability: scaling the computation with the number of processed streams. For instance, in cross-stream correlation (stock-, vessel-, or sensor-streams in the aforementioned scenarios) computation, the complexity of the problem at hand is exponential to the number of processed streams. Therefore, mere parallelism cannot help by itself to reduce the processing load. On the contrary, stream summarization provides a scalable solution. In particular, sketch summaries [18, 24, 25] have been used for correlation/distance-aware hashing of streams to respective processing units. Based on the synopses, using the most significant DFT coefficients or compressed Locality Sensitive Hashing signatures as the hash key respectively, highly uncorrelated/dissimilar streams are hashed to be processed for pairwise comparisons at parallel processing units. Stream correlations are pruned for streams that do not end up nearby in the hashing space by exploiting such locality- and similarity-aware hashing schemes.
Federated scalability: This type of scalability involves scaling out the computation beyond single computer clusters or clouds to fully geo-distributed settings. Across the cloud to edge continuum, there exist a number of devices and intermediate nodes (sensors, robots, drones, Raspberry Pis) that do possess some processing capacity. Having these nodes naively relaying raw data depletes the available bandwidth and causes network latencies, thus hindering the delivery of real-time, continuous responses to running analytics [26, 27, 28]. By pushing the computation of stream summaries across the cloud to edge continuum, the processing capabilities of the entire network of devices can be exploited and the communication cost, as well as network latency, can be reduced due to the use of synopses. Despite this fact, no IoT framework provides native support for stream synopses across the cloud to edge continuum [29].
The sole related work in the literature that provides a state-of-the-art parallel, stream summarization engine to support all the required types of scalability is SDEaaS [30, 31]. However, SDEaaS suffers from two inherent drawbacks: (i) it is not easily adaptable to changing stream summary maintenance conditions, meaning that changing the parallelism of an SDEaaS service that runs at the cloud incurs significant downtime for scaling out or in (increasing or decreasing parallelism, respectively) upon workload changes, and (ii) it cannot be deployed on devices across the cloud to edge continuum. The latter is both due to the fact that SDEaaS is built on a Big Data platform, not destined for resource constrained devices, and because of the downtime that adaptation decisions incur for it. In particular, across the cloud to edge continuum, network devices may depart or connect at any given time due to sporadic connectivity or mobility characteristics. If synopses computation is assigned to such devices, every time they enter or leave the network an adaptation of the synopses computation assignment should be performed. Hence, continuous adaptation of the distribution of processing load among the devices is required in such cases. But the downtime that would be incurred by SDEaaS in order to adapt would dominate the benefits of distributing the processing load among a dynamically changing population of devices.
In this work, we introduce a novel paradigm for parallel and geo-distributed stream summarization, SaaMS (Synopses-as-a-MicroService). SaaMS operates across the cloud to edge continuum and deals with all the aforementioned drawbacks. We describe an open-source [32] realization of SaaMS built on Apache Kafka [34] using the Kafka Streams API [35]. At the cloud side, SaaMS retains the ability to provide all the required types of scalability along with zero downtime adaptation to changing workloads. Across the cloud to edge continuum, contrary to prior art, SaaMS is directly deployable on Java-enabled devices, incurring zero downtime adaptation upon changes in the populations of available devices which perform stream synopses computation. Therefore, SaaMS can simultaneously scale stream synopses computation at the cloud side and distribute synopses computation across network devices. In this way, it exploits the full processing capacity of the continuum.
Our experimental evaluation using hundreds of real stock and vessel data streams shows that SaaMS (i) scales faster than linearly with increasing stream volumes and velocities (horizontal scalability), (ii) maintains linear scaling trends upon increasing the number of processed streams from 10s to 100s (vertical scalability) and (iii) can save more than two orders of magnitude communication-wise across the cloud to edge continuum compared to naively using network devices at the edge as relay nodes (federated scalability). Compared to the state-of-the-art SDEaaS approach which operates only at the cloud side, SaaMS exhibits an up to 3 times higher average throughput due to the fact that it diminishes downtime. Moreover, this especially happens in cases where the volume, velocity and the number of processed streams are significantly increased.
2. Related Work
From a research viewpoint, there is a large number of related works on data synopsis techniques. Please refer to [5, 40, 3] for comprehensive reviews on relevant issues. Such prominent techniques, cited in Table 2, are incorporated in SaaMS, which already includes a rich set of data summaries serving a wide variety of analytics tasks. What is more important is that SaaMS can incorporate any data summarization technique abiding by a simple, yet effective, software architecture (Section 4.3).
With respect to data summarization engines and libraries, Table 1 provides a comparison of SaaMS against prior art, regarding their scalability features, the level of the cloud to edge continuum where these scalability features are supported and their potential for adaptivity. Apache DataSketches [36] and Stream-lib [37] are software libraries of stochastic streaming algorithms and summarization techniques. These software libraries are detached from parallelization aspects. Therefore, they cannot provide horizontal, vertical or federated scalability at the cloud side. Moreover, they do not provide the primitives for IoT devices to coordinate and merge their partial synopses within the context of a unified synopses service. In other words, the application should manually install, program, synchronize and coordinate the IoT architecture from scratch.
Features →          | Horizontal Scalability | Vertical Scalability   | Federated Scalability  | Zero-downtime Adaptivity
IoT Level →         | @CLOUD  @FOG  @EDGE    | @CLOUD  @FOG  @EDGE    | @CLOUD  @FOG  @EDGE    | ANY
Related Approach ↓  |                        |                        |                        |
DataSketch [36]     | ✘       ✘     ✘        | ✘       ✘     ✘        | ✘       ✘     ✘        | ✘
Stream-lib [37]     | ✘       ✘     ✘        | ✘       ✘     ✘        | ✘       ✘     ✘        | ✘
StreamApprox [38]   | ❑       ✘     ✘        | ✘       ✘     ✘        | ✘       ✘     ✘        | ✘
SnappyData [39]     | ❑       ✘     ✘        | ✘       ✘     ✘        | ✘       ✘     ✘        | ✘
Condor [23]         | ✔       ✘     ✘        | ✘       ✘     ✘        | ✘       ✘     ✘        | ✘
SDEaaS [30, 31]     | ✔       ✘     ✘        | ✔       ✘     ✘        | ✔       ✘     ✘        | ✘
SaaMS (this work)   | ✔       ✔     ✔        | ✔       ✔     ✔        | ✔       ✔     ✔        | ✔
❑ = limited synopses support: Stratified Sampling (StreamApprox), Simple Aggregates (SnappyData).
Table 1: Scalability and Zero-downtime Adaptivity of Prior Art across the Cloud to Edge Continuum
SnappyData's [39] stream processing is based on Spark and incorporates a limited set of synopses serving simple SUM, COUNT and AVG queries. Similarly, StreamApprox [38] offers only sampling as a pipeline operator over Spark and Flink. Thus, these are deprived of vertical scalability features and federated scalability provisions at the cloud side. Besides their limited support of synopses techniques (therefore the orange square in Table 1), SnappyData and StreamApprox operate on top of Big Data frameworks which are not deployable across the cloud to edge continuum.
The prominent work of Condor [23] elegantly optimizes the parallel computation of stream summaries at the cloud side, though neglecting vertical and federated scalability. SDEaaS [31, 30] covers all aspects of scalability at the cloud side only. Since Condor and SDEaaS run on top of Flink, they are restricted to the cloud level.
No related technique provides zero-downtime adaptivity at any level of the cloud to edge continuum. Indicatively, since SDEaaS and Condor are developed on Flink, they run as Flink jobs at the cloud side. Our experience with adapting even such simple, stateless jobs is that adaptation (scaling in or out) takes 10 to 15 seconds. These downtime intervals can considerably increase when state should be stored before and loaded after adaptation.
SaaMS also overcomes other limitations of SDEaaS including: (i) SDEaaS, by design, cannot use the native windowing operators of the DataStream API of Flink, while SaaMS incorporates the windowing operator of the Kafka Streams API, thus offering additional parallelization strategies (Section 4.2), (ii) SaaMS allows saving and loading saved synopses to reuse summaries from past microservices, while SDEaaS always starts maintaining synopses from scratch, (iii) adding code for new synopses on-the-fly, at SDEaaS runtime, is cumbersome due to Java ClassLoader usage, raising potential security issues in YARN-like clusters. On the contrary, adding code for new synopses on-the-fly, at SaaMS runtime, just creates new MicroService instances without the need for ClassLoader usage.
3. Background
3.1. Apache Kafka
Apache Kafka [34, 35] constitutes the de-facto standard for stream ingestion in large scale, distributed, streaming applications [41, 42]. Kafka is a fast, scalable, durable, and fault-tolerant publish-subscribe messaging system. A Kafka cluster is composed of a number of workers coordinated by a ZooKeeper. Workers are runtime instances run on Virtual Machines (VMs) of a Kafka cluster undertaking the execution of data ingestion tasks.
Kafka receives and sends data using producers and consumers. A producer is a data source that initializes input by sending records to the Kafka log. The log typically consists of <key, value> pairs and, as its name suggests, it operates in an append-only fashion.
The records that arrive at a Kafka cluster are categorized into topics based on the context the corresponding logs refer to. For instance, based on the running examples of Section 1, a Kafka topic may involve a group of stocks coming from the same market, in the financial domain scenario. The producer in this scenario generates and publishes financial market data, such as stockID (key) along with volume and prices per stock (value), or other relevant market events, into Kafka topics. In the maritime activity monitoring scenario, a Kafka topic may log vessel positions of a specific area of the oceans generating vesselID (key) and longitude, latitude (value) pairs. The actual context stored in the Kafka topics should be determined and defined by the application.
Automatically, each record of a topic is assigned a unique 'offset', which is a value that determines its position in the log. Kafka topics include one or more partitions for parallel processing purposes. A topic with multiple, non-overlapping partitions can be read (consumed) by different application instances, each reading and processing only a portion of the ingested data. Each partition has a leader instance held at one broker and replicas kept at the same or other brokers. Application instances should be equipped with Kafka consumers which read records using the offset as a pointer to identify where they previously stopped. A main characteristic of Kafka consumers is the organization into groups, allowing the distribution of partitions across the consumers of the same group. For instance, a topic with X partitions can have its records distributed among these partitions in a round robin fashion and a group of X or fewer consumers can operate on a separate subset of partitions each, reading and processing the records in parallel. In this case, a parallelization degree of X is achieved and the parallelism can increase in case a topic has more partitions. In general, the number of partitions of a Kafka topic sets an upper bound on the allowed processing parallelism by the applications built on top of Kafka. The above discussion is depicted in Figure 1.
[Figure 1 shows a Kafka cluster with a ZooKeeper, Workers 1-2 and Brokers 1-3 hosting the leader and replica partitions of the Nikkei225, S&P500, NASDAQ and AIS Vessel Data Mediterranean topics, with an offset marked on a partition.]
Figure 1: Kafka Cluster Exemplary Organization. Coloring of topics and partitions expresses correspondence. For instance, grey partitions belong to the grey NASDAQ topic.
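The relationship between partitions, consumer groups and parallelism can be illustrated with a toy model. The sketch below is plain Python and does not use or reproduce Kafka's actual partitioners or group assignors; the function names `round_robin_partition`, `assign_partitions` and `effective_parallelism` are hypothetical and introduced only for illustration.

```python
from collections import defaultdict


def round_robin_partition(records, num_partitions):
    """Spread records over partitions in round-robin order (toy model).

    The index of a record inside its partition list plays the role of
    the offset described in the text.
    """
    partitions = defaultdict(list)
    for i, record in enumerate(records):
        partitions[i % num_partitions].append(record)
    return dict(partitions)


def assign_partitions(num_partitions, num_consumers):
    """Give every partition to exactly one consumer of the same group."""
    assignment = defaultdict(list)
    for partition in range(num_partitions):
        assignment[partition % num_consumers].append(partition)
    return dict(assignment)


def effective_parallelism(num_partitions, num_consumers):
    """The number of partitions is an upper bound on useful parallelism."""
    return min(num_partitions, num_consumers)
```

Under this model, a topic with 4 partitions read by a group of 6 consumers still yields a parallelization degree of 4: two consumers receive no partition, which mirrors the upper bound stated above.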
Our SaaMS framework communicates with the outside world, in the context of broader stream processing workflows, via Kafka topics which (i) consume streams and update maintained synopses, (ii) accept (consume) application/user queries and requests, and (iii) deliver (produce) the estimations provided by the maintained synopses as query answers in the output. Details follow in Section 4.1 and Section 5.1.
3.2. The Kafka Streams API
The Kafka Streams API of Kafka is a Java library for building real-time streaming microservices that can consume, process, and produce data from and to Kafka topics. It seamlessly integrates with Kafka, allowing stream processing applications to directly consume and produce messages to Kafka topics. This makes it easy to incorporate stream processing into existing Kafka-based architectures. Kafka Streams is a functional programming API that ensures fault tolerance, exploiting Kafka metadata automatically created in the background, and exactly-once semantics (each record is processed exactly once even in the face of failures).
Kafka Streams applications are Java applications composed of one or more microservices that can run on any Java-enabled device. The developer has just to produce a .jar file and deploy it to the desired devices (cluster VM or device, across the cloud to edge continuum). Separate devices can understand that they are part of the same application since they share the same application.id. The application.id is a unique identifier assigned to a Kafka Streams application, and it plays a crucial role in ensuring that instances of the same application coordinate and work together within the Kafka cluster.
A Kafka Streams application is composed of a number of processing and storing entities, besides producers and consumers described in Section 3.1. Below we summarize these entities focusing on their properties that justify the way they are positioned in the SaaMS architecture later on in our discussion (Section 5):
•KStream: a high-level abstraction representing an immutable, ordered, and replayable sequence of stream records. Typically, each record is considered as a key-value pair. The term "KStream" is derived from "Keyed Stream".
•KTable: represents materialized views of changelog streams. KTables are immutable and maintain only the latest state of a microservice or application data, enabling efficient joining of streams to KTables. KTables are useful when the application needs to keep track of the current state of entities.
•State Store: provides mutable storage within a Kafka Streams application. State Stores can be used to maintain intermediate or aggregate state during stream processing. State Stores allow updates, inserts, and deletions.
•Serdes: the term combines ser(ialization) and des(erialization) needed for transforming data (see stream transformations below) between microservices connected with Kafka topics and the internal representations used by the Kafka Streams application. The different workers (within a cluster) or various devices (across the cloud to edge continuum) need to communicate their transformed streams: the sender first has to serialize the data into a series of bytes so that they can travel via network channels, while the recipient has to deserialize bytes into meaningful streaming records.
•Stream Transformations: implement the actual data processing functionality of a number of microservices within Kafka Streams applications. Stream transformations typically are higher order functions taking as input or having a built-in anonymous function along with one or more KStreams or KTables. Below we briefly mention the functionality of the major stream transformations used in the SaaMS architecture:
  – flatMap: transforms each input record into zero or more output records by applying a one-to-many transformation.
  – mapValues: transforms the values of a stream without changing the keys.
  – groupByKey: groups records by their keys for further aggregation or processing.
  – aggregate: performs stateful aggregations on a (usually grouped) stream, maintaining results over time.
  – windowedBy: groups records into time-based windows for operations like counting or aggregating over specific time intervals.
  – join: combines records from a pair of sources based on their keys. For KStream-KStream joins, it matches records with the same key from two KStreams, with a JoinWindows parameter specifying the time window for temporal matching. For KStream-KTable joins, which are not windowed, it combines KStream records with the latest record from a KTable based on matching keys.
  – transform: allows the application of custom stateful transformations on each record in a KStream, providing flexibility for complex processing scenarios by maintaining and updating state across multiple records within a transformer.
[Figure 2 shows the current state of a Bloom Filter bitmap together with two queries: Query(y) → Estimation(1, "yes") and Query(w) → Estimation(0, "No").]
Figure 2: Bloom Filter example on a current bitmap (synopsis state) with 2 queried elements (y, w) and set membership estimations.
3.3. Running Stream Synopses Examples
SaaMS supports a large set of synopses over streaming data (Table 2), but is extensible and customizable to any data summarization algorithm required by an application. In this section we present two synopsis techniques, namely Bloom Filters [11] and AMS Sketches [43], which will be used as running examples in the following sections. Our focus will be on presenting (i) the way they reduce the processing and memory load and (ii) their update, estimation (querying) and synopses merging procedures. The reason for this is that (i) and (ii) are at the core of the software technology in SaaMS design (Section 4) and the implementation of the SaaMS architectural framework (Section 5).
The Bloom Filter [11] is a lightweight and space-efficient probabilistic algorithm that can provide approximate estimations on set membership queries. The algorithm uses as its structure a bitmap of $M$ bits with initial values set all to 0. $L$ denotes the cardinality of the distinct stream elements (with $L \gg M$, hence the memory efficiency), and $K$ is the number of distinct hash functions used to map incoming stream elements to bitmap positions.
Update Procedure and Synopsis State: When a stream element $x$ arrives, it is processed by a total of $K$ hash functions. Each hash function is denoted as $h_i(x)$, $i = 1, 2, \ldots, K$. The bit positions in the bit array are represented as:
$$h_1(x), h_2(x), \ldots, h_K(x)$$
For each of these bit positions to which the hash functions point, the corresponding bit is set to 1. A bit already set to 1 by a previous stream element remains unchanged. The state of the synopsis at any given time is the bitmap itself with its set and unset bits in each position, as formed by updates received so far.
Estimation (Querying): Upon an application query, the algorithm estimates set membership of a queried element $y$ in the set, by computing the $K$ hash functions on $y$. If all the bit positions in the bit array that result from the hash functions are 1, $y$ is reported to be in the set, as illustrated in Figure 2. On the other hand, if for another element $w$ there is a 0 bit in any of these positions, then $w$ is definitely not in the set (Figure 2).
The reason the algorithm does not respond with certainty is the limited size of the bit array compared to the cardinality of the original set of elements. As a result, different stream elements may hash to the same set of positions. The probability that all $K$ positions are 1 for a queried element and this element did not appear in the stream so far is
$$FP = \left(1 - e^{-\frac{KL}{M}}\right)^{K}.$$
[Figure 3 shows an incoming element $e$ with count $+c$ being hashed by $h_1, \ldots, h_L$ into an array of $L$ depth and $M$ buckets, where the terms $+c\,g_1(e), \ldots, +c\,g_L(e)$ are added.]
Figure 3: AMS Sketch synopsis update and current state.
To minimize the probability of a false positive (FP) result (the Bloom Filter erroneously replies that an element is in the set although it has not appeared in the stream so far), the number of hash functions should be set to:
$$K = \frac{M}{L} \ln 2$$
Merge Procedure: In case the stream is monitored in parallel by multiple workers or devices across the cloud to edge continuum, each instance of the Bloom Filter keeps a local state (bitmap) built on the subset of stream elements that have arrived locally. Local Bloom Filters can be merged into a global Bloom Filter, retaining the aforementioned properties of the synopsis, by performing a bitwise OR operation on locally constructed Bloom Filters. A Bloom Filter requires $O(M)$ memory and $O(K)$ update, query and merge time complexity per parallel instance.
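The update, estimation and merge procedures above can be sketched in a few lines of Python. This is a minimal illustration and not the SaaMS implementation: the class and function names are hypothetical, and deriving the $K$ hash functions by salting SHA-256 with the function index is an assumption made only to keep the example self-contained.

```python
import hashlib
import math


class BloomFilter:
    """Bitmap of M bits updated through K hash functions (illustrative)."""

    def __init__(self, num_bits, num_hashes):
        self.m = num_bits
        self.k = num_hashes
        self.bits = [0] * num_bits  # synopsis state: all bits start at 0

    def _positions(self, item):
        # Assumption: h_i(x) is emulated by salting SHA-256 with the index i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def update(self, item):
        """Set the K bit positions of the item to 1."""
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        """False means definitely absent; True may be a false positive."""
        return all(self.bits[pos] for pos in self._positions(item))

    def merge(self, other):
        """Bitwise OR of two filters built with the same M and K."""
        if (self.m, self.k) != (other.m, other.k):
            raise ValueError("filters must share M and K to be merged")
        merged = BloomFilter(self.m, self.k)
        merged.bits = [a | b for a, b in zip(self.bits, other.bits)]
        return merged


def optimal_num_hashes(num_bits, num_elements):
    """K = (M / L) ln 2, rounded to a usable integer of at least 1."""
    return max(1, round(num_bits / num_elements * math.log(2)))


def false_positive_rate(num_bits, num_elements, num_hashes):
    """FP = (1 - exp(-K L / M))^K."""
    return (1 - math.exp(-num_hashes * num_elements / num_bits)) ** num_hashes
```

With an illustrative sizing of 10,000 bits for 1,000 inserted elements, `optimal_num_hashes` returns 7 and `false_positive_rate` is a little under 1%. Two local filters that saw different elements can be combined with `merge`, and the merged filter never reports a locally inserted element as absent.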
The key idea in AMS Sketch [43] is to represent a vector $v$, holding the frequencies of a large domain of streaming elements, using a much smaller sketch vector $sk(v)$. This vector is updated with the streaming tuples and provides probabilistic guarantees for the quality of the data approximation.
Update Procedure and Synopsis State: An AMS sketch is initialized as a matrix $S$ with $L$ rows and $M$ columns (depth and buckets in Table 2), where $L = O(1/\varepsilon^2)$ and $M = O(\log 1/\delta)$, with $\varepsilon$, $1-\delta$ being the desired bounds on error and probabilistic confidence, correspondingly. $L$ is the number of hash functions and $M$ is the number of buckets. When an element $e$ is encountered in the data stream, it is hashed $L$ times using $L$ independent hash functions. Each hash function $h_i$ maps $e$ to a bucket index $j$ (where $1 \le j \le M$), and the count in the corresponding bucket of each row $i$ is incremented (Figure 3). The AMS sketch defines the $i$-th sketch entry, $sk(v)[i]$, as the random variable $\sum_K v[K] \cdot g_i[K]$, where $\{g_i\}$ is a family of four-wise independent binary random variables uniformly distributed in $\{-1, +1\}$ (with mutually-independent families across different entries of the sketch). Using appropriate pseudo-random hash functions, each such family can be efficiently constructed on-line in logarithmic space. Note that, by construction, each entry of $sk(v)$ is essentially a randomized linear projection (i.e., an inner product) of the vector $v$ (using the corresponding $g$ family), that can be easily maintained (using a simple counter) over the input update stream. The same holds for element deletion or expiration.
Listing 1: Example of Streaming Tuple at SaaMS Data Topic.
{
  "streamID": "Tesla Inc.",
  "dataSetKey": "NASDAQ Stock Market",
  "date": "01/02/2019",
  "time": "00:00:01",
  "price": 6.0654,
  "volume": 1
}
The state of the sketch at any given time is the two-dimensional $L \times M$ array along with its count entries.
Estimation (Querying): An estimation on the cardinality of the "inner product" between two streams in the sketch-vector space is given by:
$$sk(v_1) \cdot sk(v_2) = \underset{j=1,\ldots,M}{\mathrm{median}} \left\{ \frac{1}{L} \sum_{i=1}^{L} sk(v_1)[i,j] \cdot sk(v_2)[i,j] \right\}$$
For estimating the second frequency moment of a single stream we simply replace $sk(v_2)$ with $sk(v_1)$ in the formula above.
Merge Procedure: In case the stream is monitored in parallel by multiple workers or devices across the cloud to edge continuum, each instance of the AMS keeps a local $L \times M$ matrix built on the subset of stream elements that have arrived locally. Local AMS Sketches can be merged into a global sketch, by entry-wise summations of the respective arrays. As mentioned above, AMS requires $O(M)$ memory and exhibits an $O(K)$ update, query and merge time complexity per parallel instance.
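A minimal Python sketch of the projection view of AMS follows, using the median-of-means estimator given above. It is an illustration under stated assumptions rather than the SaaMS code: the class name `AMSSketch` is hypothetical, every entry $(i, j)$ keeps its own $\pm 1$ sign function, and those sign functions are emulated with salted SHA-256 instead of true four-wise independent families.

```python
import hashlib
import statistics


class AMSSketch:
    """L x M array of +/-1 random projections of a frequency vector."""

    def __init__(self, rows, cols, seed=0):
        self.rows, self.cols, self.seed = rows, cols, seed
        self.table = [[0] * cols for _ in range(rows)]

    def _sign(self, i, j, item):
        # Assumption: salted SHA-256 stands in for a four-wise
        # independent family of {-1, +1} variables per entry.
        key = f"{self.seed}:{i}:{j}:{item}".encode("utf-8")
        return 1 if hashlib.sha256(key).digest()[0] & 1 else -1

    def _check_compatible(self, other):
        mine = (self.rows, self.cols, self.seed)
        if mine != (other.rows, other.cols, other.seed):
            raise ValueError("sketches must share dimensions and seed")

    def update(self, item, count=1):
        """Add count * g(item) to every entry; negative counts delete."""
        for i in range(self.rows):
            for j in range(self.cols):
                self.table[i][j] += count * self._sign(i, j, item)

    def inner_product(self, other):
        """Median over columns j of the mean over rows i of the products."""
        self._check_compatible(other)
        column_means = [
            sum(self.table[i][j] * other.table[i][j]
                for i in range(self.rows)) / self.rows
            for j in range(self.cols)
        ]
        return statistics.median(column_means)

    def second_moment(self):
        """Second frequency moment: the sketch multiplied with itself."""
        return self.inner_product(self)

    def merge(self, other):
        """Entry-wise sum of two sketches built with the same functions."""
        self._check_compatible(other)
        merged = AMSSketch(self.rows, self.cols, self.seed)
        merged.table = [[a + b for a, b in zip(ra, rb)]
                        for ra, rb in zip(self.table, other.table)]
        return merged
```

Because every entry is a linear projection, merging two local sketches gives the same array as sketching the union of their inputs, which is why entry-wise summation suffices. For a stream holding a single element with frequency 3, `second_moment` returns exactly 9, since every entry equals plus or minus 3.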
4. SaaMS Fundamentals
4.1. SaaMS API
SaaMS communicates with the outside world and broader application workflows, where it is deployed, via JSON formatted Kafka messages. JSON is used both to describe the schema of the incoming and outgoing data tuples, and as an API so that it can parse and accept instructions for (i) starting maintaining new synopses, (ii) querying existing synopses, (iii) updating existing synopses, (iv) saving the current state of a synopsis or loading a past synopsis and (v) deleting an existing synopsis. Because of this design choice, SaaMS can be part of any streaming processing workflow irrespectively of other platforms and tools that are deployed in the rest of the processing pipeline.
Listing 1 shows an example of a JSON formatted Kafka message which constitutes a streaming tuple destined to update an existing synopsis. We use the stock market scenario described in Section 1 as our running example:
•streamID: The name of the stream the tuple belongs to.
•dataSetKey: It represents a set of streams.
•Field Name(s): Represents the field(s) which may be used to build the synopsis on. In this example:
  – date & time: The timestamp at which the tuple (trade of streamID = "Tesla Inc." in the example) was captured.
  – price: The current price at the specified date and time.
  – volume: The quantity of a trade of the streamID = "Tesla Inc." at the specified date and time.
Listing 2: Example of New Synopses Maintenance Request.
{
  "streamID": "Tesla Inc.",
  "synopsisID": 1,
  "dataSetKey": "NASDAQ Stock Market",
  "param": ["CountMin", "volume", "Construct", 0.001, 0.99, 12345],
  "noOfP": 5
}
Listing 3: CountMin [7] Query Example at SaaMS Request Topic.
{
  "streamID": "Tesla Inc.",
  "synopsisID": 1,
  "dataSetKey": "NASDAQ Stock Market",
  "param": [6, "volume", "Queryable", "Continuous"]
}
As another example, in the maritime scenario of Section 1, streamID would correspond to the identifier of a vessel, dataSetKey would correspond to a sea area, while application specific fields like price and volume would be replaced with timestamped vessel coordinates.
Listing 2 presents a JSON message with a request to start maintaining a new synopsis. In the example of Listing 2, the creation of a new synopsis is requested with specifications to create a CountMin synopsis on the volume attribute, along with necessary parameters (see Table 2) for that synopsis. In particular:
•s eamID: As in Lis ing 1.
•synopsisID: De ines he synopsis ype as a sequen ially
inc easing in ege o each o he synopses lis ed in Ta-
ble 2. Fo ins ance, synopsisID = 1 co esponds o a
Coun Min ske ch.
•da aSe Key: As in Lis ing 1.
•pa am: An a ay o pa ame e s necessa y o build he
eques ed synopsis:
Lis ing 4: Reques Message o Loading a Sa ed Synopsis.
1{
2" pa am " :["LOAD_REQUEST",Pa hToLoadSynopsis
s o ed_Coun Min . se " ]
3}
6
SynopsisID Synopsis Ou pu Es ima ion Inpu Pa ame e s
1 Coun Min [7] Coun , F equency Es ima ion ϵ(maximum e o ), δp obabili yO Exceeding ϵ, seed
2 Hype LogLog [9] Ca dinali y, Dis inc Coun Rela i e S anda d De ia ion (RSD)
3 BloomFil e [11] Membe ship Expec ed #Elemen s, False Posi i e Ra e
4 DFT [18] Co ela ion Es ima ion Window Size, Slide Size,#coe icien s
5 LossyCoun ing [13] Coun , F equen I ems ϵ(maximum e o )
6 S ickySampling [13] Coun , F equen I ems Suppo , ϵ(maximum e o ), δP obab. o Exceeding ϵ
7 AMS [43] L2No m, Inne P oduc Dep h, Bucke s
8 GKQuan iles [44] Quan iles ϵ(Maximum E o )
9 LSH [19, 45] Co ela ion Es ima ion Window Size, Comp ession Ra io, #Wo ke s
10 WindowSke ch Quan iles [46] Quan iles ϵ(Maximum E o ), Window Size
Table 2: SaaMS Buil -in Synopses, Inpu Pa ame e s and Ou pu Es ima ion. SaaMS emains ex ensible and cus omizable by plugging in new synopses.
– Synopsis Type: Specifies the type of the synopsis (e.g., CountMin), including those in Table 2 or custom synopses plugged into the SaaMS framework.
– Field Name: Represents the field on which the synopsis is built (e.g., "volume").
– Request Status: Describes the type of the request. In Listing 2 it receives the value Construct to declare a request to start maintaining a new synopsis.
– Synopsis Parameter(s): Each synopsis has different parameters, as in Table 2. For example, CountMin has ϵ, δ, and seed. These are instantiated in the example of Listing 2.
•noOfP: Defines the number of partitions in the synopsis data topics and the maximum parallelization degree of the synopsis microservice, to be further discussed in Section 5.1.
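To make the role of the CountMin parameters concrete, the standard sizing from the Count-Min literature can be written as below. This is a sketch in our own notation (w for the number of counters per row, d for the number of rows, N for the total count inserted so far); whether the SaaMS implementation derives its dimensions with exactly these constants is an assumption.

```latex
w = \left\lceil \frac{e}{\epsilon} \right\rceil, \qquad
d = \left\lceil \ln\frac{1}{\delta} \right\rceil, \qquad
f(x) \le \hat{f}(x), \qquad
\Pr\left[\hat{f}(x) \le f(x) + \epsilon N\right] \ge 1 - \delta
```

For instance, ϵ = 0.002 gives w = ⌈1359.14⌉ = 1360 counters per row, and an illustrative δ = 0.01 gives d = ⌈4.61⌉ = 5 rows.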
If a synopsis already exists, it is not duplicated. If the streamID is empty, the synopsis is maintained on all streams with the same dataSetKey. Moreover, the request for deleting a synopsis is similar to that of Listing 2, with Delete replacing the Construct keyword.
Listing 3 provides an example of a query on the previously created CountMin synopsis. The difference compared to Listing 2 lies in the parameter array that is passed in the request. param is again an array of parameters, but this time it includes the parameters that are necessary to query the involved synopsis:
•Query Parameters: Specifies the query parameters of the involved synopsis. In the example of Listing 3 we query the CountMin sketch (because synopsisID = 1) to estimate the frequency of trades with a volume of 6.
•Field Name: The queried field, volume in the example of Listing 3.
•Request Status: Describes the type of the request. In Listing 3 it receives the value Queryable to declare a request for querying a synopsis.
•Query Type: Each query may either be a Continuous query, meaning that it is registered once and gets continuously executed until the synopsis is deleted, or an Ad-hoc query, which is a one-shot query.
Finally, Listing 4 expresses a request for loading a previously stored synopsis from a file of .ser type. This kind of request just uses the path from which the corresponding file should be loaded. Having issued a Construct request as in Listing 2, a LOAD_REQUEST follows in order to load a previously saved synopsis, instead of starting to maintain it from scratch. Once loaded, a synopsis can be updated using tuples similar to the one in Listing 1 and queried through requests similar to the one in Listing 3. The request for saving the current state of a synopsis is similar, substituting LOAD_REQUEST with SAVE_REQUEST.
4.2. SaaMS Parallelization Schemes
SaaMS employs, and when needed combines, three parallelization schemes: (i) partition-based parallelization, (ii) key-based parallelization, and (iii) window-based parallelization.
Partition-based parallelization refers to the ability to process multiple partitions of a Kafka topic concurrently. Each partition is an ordered, immutable sequence of tuples, and consumers (workers at the cloud side or devices across the cloud to edge continuum) can read from multiple partitions simultaneously, allowing for parallel processing of data. This type of parallelism is used on synopses defined based on dataSetKey, as detailed in Section 4.1. In this case, tuples of the same dataset are distributed among partitions (noOfP parameter in Listing 2).
Key-based parallelization, on the other hand, involves grouping records by their keys and processing records with the same key in parallel. This is particularly useful in SaaMS since we have data that may need to be partitioned based on a key, such as the streamID in our running example.
These two forms of parallelization are combined in SaaMS, depending on the synopsis type, by ensuring that tuples with the same key are sent to the same partition. By doing so, SaaMS leverages both partition-based parallelization (processing multiple partitions concurrently) and key-based parallelization (processing tuples with the same key in parallel within a partition).
Figure 4: SaaMS Library Software Technology (Package Structure).
In our running example, if we have a Kafka topic with stock trades from a market (the same dataSetKey) and we partition the topic based on streamID, tuples of the same stock will be sent to the same partition. This allows SaaMS to process data of different stocks in parallel across different partitions, while still processing tuples of the same stock in parallel within a partition, based on their keys.
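The mapping from a key to a partition can be sketched as follows. This is a minimal illustration, not the SaaMS code: Kafka's default partitioner hashes the serialized key with murmur2 and takes the result modulo the number of partitions, whereas the sketch below swaps in `Arrays.hashCode` to stay free of dependencies. The class name, method names and ticker symbols are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class KeyToPartitionSketch {
    // Stand-in hash (assumption): Kafka itself applies murmur2 to the key bytes.
    // Any deterministic hash preserves the property that matters here:
    // equal keys always land on the same partition.
    static int partitionFor(String streamID, int noOfP) {
        byte[] keyBytes = streamID.getBytes(StandardCharsets.UTF_8);
        return Math.floorMod(Arrays.hashCode(keyBytes), noOfP);
    }

    // Groups stream identifiers by the partition they map to.
    static Map<Integer, List<String>> assign(List<String> streamIDs, int noOfP) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String id : streamIDs) {
            byPartition.computeIfAbsent(partitionFor(id, noOfP), p -> new ArrayList<>()).add(id);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        // Hypothetical streamIDs; both "AAPL" tuples end up in the same partition.
        List<String> stocks = Arrays.asList("AAPL", "MSFT", "GOOG", "AAPL");
        System.out.println(assign(stocks, 4));
    }
}
```

Different stocks may share a partition when there are more keys than partitions; the guarantee is only that one stock never spans two partitions.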
Window-based parallelization, on the other hand, involves dividing data into fixed-size time intervals, or windows, and processing each window independently. This is particularly useful for handling streaming data where we want to perform operations, such as aggregation or computation, over a defined time window.
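As an illustration of fixed-size windows, the following sketch computes which windows a tuple belongs to when windows of a given size start at every multiple of a slide, in the style of hopping windows. It is not taken from SaaMS: the class and method names are hypothetical, and treating the window and slide sizes as abstract time units is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class HoppingWindowSketch {
    // Returns the start of every window [start, start + size) that contains the
    // timestamp, for windows aligned at non-negative multiples of `slide`.
    // Setting size == slide yields non-overlapping (tumbling) windows.
    static List<Long> windowStartsFor(long timestamp, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        long latestStart = timestamp - Math.floorMod(timestamp, slide);
        for (long start = latestStart; start > timestamp - size && start >= 0; start -= slide) {
            starts.add(start);
        }
        Collections.reverse(starts); // ascending order
        return starts;
    }

    public static void main(String[] args) {
        // Window size 500 and slide 200, mirroring the DFT settings used later on.
        System.out.println(windowStartsFor(1000L, 500L, 200L)); // prints [600, 800, 1000]
    }
}
```

Because each window can be processed independently, the windows returned for different tuples can be assigned to different workers.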
These forms of parallelization can be combined in SaaMS via the Kafka Streams library. In particular, SaaMS allows the definition of windowed operations (Section 3.2), enabling the processing of data in parallel across different partitions within each time window. This boosts the ability of SaaMS to process streaming data efficiently and at scale, taking advantage of both partition-based and window-based parallelization.
Such parallelization flexibility should not be taken for granted. It builds on the facilities provided by Kafka and Kafka Streams, but the ability to retain these facilities and to use and combine parallelization schemes is provided solely by the unique Synopses-as-a-Microservice paradigm (Section 5.1) introduced by SaaMS. For instance, prior art [31, 30] is built on Flink, which supports window-based parallelization, but the SDEaaS framework built on Flink fails to incorporate it natively and relies on manually programmed window configurations.
4.3. SaaMS Synopses Library and Software Technology
The SaaMS library is built using a simple, yet effective software technology that not only allows the incorporation of a large set of stream synopsis techniques (Table 2), but, importantly, remains extensible to new synopses and easily customizable to any application specific needs. This is in contrast to other large libraries listed in Table 1, which are extremely complex and hard to maintain and extend in the long run [42].
SaaMS avoids this complexity by design via two means. First, the design choice of using JSON formatted, schema-aware Kafka messages in its input/output. This alleviates the burden of incorporating schema details in the SaaMS software technology itself. This is in contrast to other approaches which define a separate class for any synopsis specific data type. As an example, DataSketches [36] defines a separate Java interface and classes for sketches of floats, sketches of doubles and so on.
Second, the SaaMS library is built based on the observation that synopsis techniques are rarely complex algorithms themselves. Stream summarization algorithms fall into the category of clever, small and elegant probabilistic mechanisms that share the following fundamental operations: (i) update a synopsis with a new tuple (add operation), (ii) provide an estimation upon getting queried (estimate operation), (iii) perform a merge operation in case a synopsis is maintained in a distributed setting, provided the synopsis itself possesses the mergeability property [20].
Figure 4 and Figure 5 provide a concise view of the SaaMS Synopses Library software technology. In the Synopses package (top of Figure 4) there is the abstract Synopsis class (middle of Figure 5). The member variables of the Synopsis class are the synopsisID and the synopsisParameters explained in Section 4.1. In addition to these, the synopsisDetails member variable holds important information about the synopsis, including the key of the Construct Request and noOfP (Section 4.1).
Figure 5: SaaMS zooming in the Class Diagram.
The abstract Synopsis class (Figure 5) provides setters and getters for these member variables and requires every subclass that inherits from it to implement the fundamental operations of each synopsis mentioned previously, i.e., the add, estimate and merge operations. Moreover, there are two additional methods, size and serde, for serialization and deserialization purposes (Section 3.2) when operating in a distributed/parallel (within the cloud) or geo-distributed (across the cloud to edge continuum) setting.
Every specific synopsis technique, such as those cited in Table 2, inherits from the Synopsis abstract class and implements add, estimate, merge, size and serde as illustrated in Figure 5. It is important to note that all classes that have the ...Synopsis suffix arrange the details of the distributed/parallel maintenance and querying of the respective synopsis over the SaaMS parallel architecture (Section 5.1).
Returning to Figure 4, around the Synopses package, there are separate packages for each specific synopsis technique. Figure 4 zooms in on the AMSSketch package, which includes 2 classes. The first is AMSSketchSynopsis which, as we already analyzed, inherits from the abstract class Synopsis in the Synopses package and takes care of the distributed/parallel synopsis over the SaaMS architecture. The second is the AMSSketch class, which is a simple .java file with the code of the respective synopsis (AMSSketch in this example). This class implements the logic of the synopsis and is entirely free of parallel execution details. This is a design choice that boosts the extensibility of the SaaMS Library, since any other existing library that implements a synopsis can be plugged in as a separate package, like those around the Synopses one in Figure 4, and only a new class with the ...Synopsis suffix, inheriting from Synopsis, needs to be created, implementing add, estimate, merge, size and serde.
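The separation described above can be sketched with a plain sketch class plus a thin ...Synopsis wrapper. This is a minimal illustration under stated assumptions, not the SaaMS source: all class names are hypothetical stand-ins, serde is omitted because it would require the Kafka client library, the parameter order (ϵ, δ, seed) follows Table 2, the dimensions use the textbook Count-Min sizing, and the row hash is a simple mixing function rather than a pairwise-independent family.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical stand-in for the abstract Synopsis class (serde omitted).
abstract class SynopsisBase {
    protected final int synopsisID;
    SynopsisBase(int synopsisID) { this.synopsisID = synopsisID; }
    abstract void add(Object item);
    abstract Object estimate(Object query);
    abstract SynopsisBase merge(SynopsisBase other);
    abstract long size();
}

// Plain sketch logic, unaware of any parallel or distributed execution.
final class PlainCountMin {
    final int depth, width;
    final long[] rowSeeds;
    final long[][] counts;

    PlainCountMin(double epsilon, double delta, long seed) {
        this.width = (int) Math.ceil(Math.E / epsilon);           // textbook sizing (assumption)
        this.depth = Math.max(1, (int) Math.ceil(Math.log(1.0 / delta)));
        this.counts = new long[depth][width];
        this.rowSeeds = new Random(seed).longs(depth).toArray();  // one seed per row
    }

    private int bucket(Object item, int row) {
        long h = (item.hashCode() + rowSeeds[row]) * 0x9E3779B97F4A7C15L;
        h ^= (h >>> 32);
        return (int) Math.floorMod(h, (long) width);
    }

    void update(Object item, long increment) {
        for (int r = 0; r < depth; r++) counts[r][bucket(item, r)] += increment;
    }

    long estimateCount(Object item) {
        long min = Long.MAX_VALUE;                                 // minimum over rows never underestimates
        for (int r = 0; r < depth; r++) min = Math.min(min, counts[r][bucket(item, r)]);
        return min;
    }

    void mergeFrom(PlainCountMin other) {
        if (other.depth != depth || other.width != width || !Arrays.equals(other.rowSeeds, rowSeeds)) {
            throw new IllegalArgumentException("sketches must share dimensions and seeds");
        }
        for (int r = 0; r < depth; r++)
            for (int c = 0; c < width; c++) counts[r][c] += other.counts[r][c];
    }
}

// Thin wrapper exposing the plain sketch through the common operations.
final class CountMinSynopsisSketch extends SynopsisBase {
    final PlainCountMin cm;

    CountMinSynopsisSketch(String[] params) {                      // assumed order: epsilon, delta, seed
        super(1);                                                  // synopsisID = 1 is CountMin in Table 2
        cm = new PlainCountMin(Double.parseDouble(params[0]),
                Double.parseDouble(params[1]), Long.parseLong(params[2]));
    }

    @Override void add(Object item) { cm.update(item, 1L); }
    @Override Object estimate(Object query) { return cm.estimateCount(query); }
    @Override SynopsisBase merge(SynopsisBase other) {
        cm.mergeFrom(((CountMinSynopsisSketch) other).cm);
        return this;
    }
    @Override long size() { return (long) cm.depth * cm.width; }   // number of counters

    public static void main(String[] args) {
        String[] params = {"0.002", "0.01", "42"};                 // illustrative values
        CountMinSynopsisSketch a = new CountMinSynopsisSketch(params);
        CountMinSynopsisSketch b = new CountMinSynopsisSketch(params);
        for (int i = 0; i < 3; i++) a.add(6);
        for (int i = 0; i < 2; i++) b.add(6);
        System.out.println(a.merge(b).estimate(6));                // prints 5
    }
}
```

Plugging in a different technique then amounts to swapping `PlainCountMin` for another plain class and writing a wrapper of the same shape.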
Regarding the serde method, the SaaMS Library software technology follows a modular approach here as well, packing serdes in a separate package, SynopsesSerdes in Figure 4. The SynopsesSerdes package includes a separate sub-package for each specific synopsis technique, and Figure 4 uses the AMSSketch sub-package to provide a zoomed-in viewpoint and illustrate the contents of each such sub-package. In particular, the AMSSketch sub-package in the SynopsesSerdes package includes 3 classes. The AMSSketchSerializer and the AMSSketchDeserializer implement the actual serialization
the application logic. Consumers interacting with SaaMS experience minimal or no disruption, as Kafka manages the rebalancing process internally.
•Atomicity and Consistency: Kafka ensures atomicity and consistency during rebalancing operations to prevent data inconsistencies or processing anomalies. Stateful operations maintain data integrity, and Kafka's consumer group coordination ensures that all instances agree on the partition assignments and processing responsibilities.
In all, SaaMS turns Kafka's generic rebalance machinery into an application-aware, continuum-wide control plane. Each synopsis is shipped as a lightweight microservice with its own consumer group, so failures or scale-in events trigger partition movement, state migration, and task activation only for the synopses that were actually running on the affected node(s). SaaMS further orchestrates automatic scale-out procedures by launching new instances whenever spare capacity appears as new nodes join, letting synopsis microservices that have not exceeded their noOfP parallelism cap exploit the newly introduced resources. Because the state that moves is a compact synopsis (KTables/state stores) instead of raw streams, replay and recovery overhead remain minimal, while JMX-based feeds make any autoscaling policies orthogonal to SaaMS. The result is virtually zero downtime, minimal recovery latency, and enhanced elasticity capabilities that plain Kafka Streams cannot deliver on its own for concurrently running synopses without continuous manual intervention.
6. Experimental Evaluation
For repeatability purposes, we provide a detailed description of our experimental setup.
SaaMS Open Source Repository: SaaMS code is available as open-source software [32], along with detailed installation instructions for both bare-metal setups (used in this experimental evaluation) and Docker containers. SaaMS has been developed using Kafka and Kafka Streams 3.3.1. It has been tested with Zookeeper 3.9.0, Java 19, and Maven 3.6.3. Backward compatibility has also been verified with versions of Kafka down to 2.8.1. The Scripts folder of the repository provides the necessary scripts for starting the Kafka brokers and a Zookeeper Server. The RequestExamples folder includes the exemplary syntax of all possible requests (Section 4.1) across supported synopses, including those used in the experiments presented below.
Statistics Collector: In our experiments, we monitor relevant performance metrics (throughput and communication cost) using JMX technology². The JMX API is a standard API for monitoring and managing resources, Kafka Streams applications, devices, services, and Java Virtual Machines themselves. JMX provides tools for local and remote monitoring, as well as capabilities for collecting and exposing application performance statistics in real-time. Our lightweight statistics collector is available in the metrics directory of the SaaMS code repository. It connects to every Kafka Streams JVM and polls it over JMX, so as to monitor both processing speed and application level communication cost, anywhere from the cloud down to edge devices. Inside each SaaMS task running on a device, a ByteCountingSensor registers custom byte counters with the StreamsMetrics registry, which are then exposed as JMX MBeans. A command line interface (MetricsUserInterface) receives a comma separated list of JMX service URLs, one per node in the cloud-to-edge continuum, and establishes two collectors: (i) the ThroughputJMXMetrics collector periodically aggregates the thread level throughput from every URL/device (the period is expressed as a configurable number of seconds) and appends a timestamped block to a report, (ii) the CommunicationCostJMXMetrics collector snapshots the two custom byte counters once and prints the network wide totals to stdout. Because the statistics collector scrapes any JMX endpoint, it works across heterogeneous nodes (cloud VMs, fog gateways, Raspberry Pis) so long as we expose the right port and enable JMX.
² https://docs.oracle.com/en/java/javase/19/jmx/java-management-extensions-jmx-user-guide.html
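The polling step can be sketched with the standard `javax.management` API alone. This is a hedged illustration, not the SaaMS collector: the class and method names are hypothetical, and the Kafka Streams MBean pattern `kafka.streams:type=stream-thread-metrics,*` with its `process-rate` attribute is our assumption about what a thread level throughput collector would read. The demo in `main` polls the local JVM's own Threading MBean so that it runs without Kafka.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

final class JmxPollingSketch {
    // Sums a numeric attribute over all MBeans matching an ObjectName pattern.
    // Returns 0.0 when no MBean matches.
    static double sumAttribute(MBeanServerConnection conn, String pattern, String attribute) {
        try {
            double total = 0.0;
            for (ObjectName name : conn.queryNames(new ObjectName(pattern), null)) {
                Object value = conn.getAttribute(name, attribute);
                if (value instanceof Number) total += ((Number) value).doubleValue();
            }
            return total;
        } catch (Exception e) {
            throw new IllegalStateException("JMX poll failed for " + pattern, e);
        }
    }

    // Polls one remote JVM identified by a JMX service URL (one URL per node).
    static double pollRemote(String serviceUrl, String pattern, String attribute) {
        try (JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(serviceUrl))) {
            return sumAttribute(connector.getMBeanServerConnection(), pattern, attribute);
        } catch (IOException e) {
            throw new IllegalStateException("cannot reach " + serviceUrl, e);
        }
    }

    public static void main(String[] args) {
        MBeanServerConnection local = ManagementFactory.getPlatformMBeanServer();
        // Local demo: number of live threads reported by the JVM itself.
        System.out.println(sumAttribute(local, "java.lang:type=Threading", "ThreadCount"));
        // Assumed Kafka Streams pattern; yields 0.0 in a JVM that runs no Kafka Streams threads.
        System.out.println(sumAttribute(local, "kafka.streams:type=stream-thread-metrics,*", "process-rate"));
    }
}
```

Summing the per-node results of `pollRemote` over the list of service URLs gives a continuum-wide figure in the spirit of the collectors described above.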
Hardware Setup: Our experimental setup consists of a cloud side and a network side with the following characteristics. The cloud side runs Ubuntu 22.04.3 LTS server and is equipped with (a) two Intel Xeon Silver 4310 processors with 12 cores and 24 threads each, (b) four 64GB RDIMM RAM modules at 3200MT/s each, (c) one 960GB read-intensive SAS SSD at 12Gbps. The network side comprises 5 VMs running Raspberry Pi OS (Debian GNU/Linux 11), each with a hardware configuration of (a) one CPU, (b) 16GB RAM, (c) 30GB ROM.
Datasets: We use two real datasets from the stock market [47, 48] and the maritime domain [2, 50].
The full stock market data spans from 1 January 2019 to 31 December 2019, covering more than 500 tradeable instruments: forty-nine Forex pairs, nine spot-crypto pairs, twenty ICE & CME futures contracts (energy, metals, rates, and index futures), eleven synthetic equity index contracts, four proprietary gaming tokens, the rest being cash equities partitioned by market (200+ US, 130+ UK, 80+ FR, 70+ DE, 30+ ES, 10+ CH, and 10+ NL). Each trading day is in a quotes/ directory of tick-by-tick files and a companion history/ directory of one minute OHLCV (Open, High, Low, Close, and Volume) aggregates; the tick files contain four comma separated fields: Date, Time, Price (native currency), and Volume. A single day contributes on average 8.4×10^6 ticks (σ=0.9×10^6), of which Forex accounts for 39%, US equities 29%, futures 13%, index synthetics 7%, European equities 11%, and the remaining asset classes <1%. Scaling the daily figure to the 252 trading sessions of 2019 yields roughly 3.1×10^9 quotes and trades. The median tick frequency per instrument is 9.6×10^3 messages per session, with a 95th percentile burst rate of 1080 messages/sec.
The full maritime dataset comprises roughly 3.8×10^7 timestamped position reports collected from 500 distinct vessels operating in the Saronic Gulf (23.1–25.4°E, 37.1–38.4°N). Each record supplies a UTC time field, a vessel identifier, geodetic coordinates (longitude, latitude) and kinematic attributes including heading, course over ground (COG), speed over ground (SOG), ship type code and the destination of the vessel. The stream averages 7.4×10^4 updates per ship, peaking above 1000 messages/sec, with vessel speeds spanning 0–38.6 kn (mean 1.9 kn, rising to 5.6 kn when vessels staying idle at ports are excluded). Passenger ferries (AIS ship type code 60), tugs (AIS ship type code 37), fast ships (AIS ship type code 90), pleasure boats (AIS ship type code 80) and product tankers (AIS ship type code 52) dominate the received messages. The file is a direct export of raw NMEA 0183 AIS messages decoded by shore receivers. The counters and ranges reported above rely on a light cleaning pass that removes rare latitude, longitude outliers before statistical aggregation.
Synopses and Synopses Parameters: Since SaaMS ingests JSON formatted Kafka messages, a CSV to JSON converter is also provided in the SaaMS code repository [32]. In our experiments we maintain CountMin (denoted by CM - ϵ=0.002, δ=0.99), HyperLogLog (denoted by HLL - RSD=0.02) and Discrete Fourier Transform (denoted by DFT - WindowSize=500, SlideSize=200, #coefficients=8) synopses. We set these parameters after discussions with the domain experts who share the datasets in the above cited links. As shown in Table 2, each synopsis is destined to support a different type of analytics, related to frequency estimation, distinct count and correlation, respectively, on the <price, volume> and positional attributes of stocks and vessels of the involved domain data streams.
Competitor Approaches: In Section 6.4 we perform an experimental comparison between SaaMS and the prior state-of-the-art, the SDEaaS approach [31, 30]. The code of SDEaaS is provided open source as well [33].
Note that our experiments concentrate on computational and communication performance figures. We do not provide results on synopsis accuracy, since SaaMS does not alter in any way the accuracy guarantees of synopses. Theoretic bounds and experimental results on the accuracy of each synopsis can be found in the related works cited in Table 2.
In our experimental evaluation we first examine SaaMS performance with respect to the scalability type requirements motivated in Section 1. We present triplets of figures describing the horizontal, vertical and federated scalability at the cloud side, the network side and the continuum as a whole. Since we notice little deviation between the conclusions drawn from the stock and the maritime datasets, we initially focus on the stock data. Then, we provide a comparative study between the previous state-of-the-art, the SDEaaS approach [30], and SaaMS on the Maritime Dataset. Again, analogous results can be extracted for the stock market data.
6.1. SaaMS Horizontal Scalability
Figure 9 plots the achieved throughput (vertical axes) while varying parallelism (horizontal axes), at the cloud side (Figure 9(a)), at the network side (Figure 9(b)) and across the cloud to edge continuum as a whole (Figure 9(c)).
As illustrated in Figure 9(a), at the cloud side, SaaMS exhibits horizontal scalability that is polynomial while increasing parallelism from 3 to 18, for all the types (CM, HLL, DFT) of maintained synopses as well as cumulatively (red line in Figure 9(a)). This result, as noted on the vertical axis label, stands for processing 100 streams maintaining 3 synopses per stream (300 synopses in total). Polynomial scalability, where increasing parallelism leads to a polynomial increase in throughput, indicates that as SaaMS scales out (more instances and processing threads are devoted), the overall performance improves at a rate that is faster than linear. This means SaaMS makes efficient use of the additional resources that become available to it in order to handle its workload. In addition, the exhibited polynomial scalability provides flexibility in scaling SaaMS to accommodate growing workloads: as the demand for processing power increases, more resources can be added to the system to maintain performance levels without hitting a performance plateau too quickly. Overall, polynomial scalability is a desirable characteristic in distributed systems like SaaMS because it indicates that the system can efficiently utilize resources, handle increasing workloads, and maintain performance as it scales.
At the network side, plotted in Figure 9(b), we observe that, because the computational resources of Raspberry Pis are more restricted, SaaMS scales almost linearly while processing 20 streams maintaining 3 synopses per stream, i.e., 60 synopses in total. This holds both for the individual CM, HLL, DFT synopses and for the cumulative throughput represented by the red line in Figure 9(b). More precisely, the red line in the figure has a sigmoid shape while increasing the number of Raspberry Pis in the network from 2 to 5. The blue line shows how throughput would be plotted in case of an absolutely linear trend while increasing parallelism. For fewer than 4 Raspberry Pis, SaaMS is below linear throughput, while for 4 and 5 Raspberry Pis, the throughput of SaaMS shows a trend that is better than linear. Nevertheless, the blue shaded areas above and below the red line are almost equivalent in surface. Therefore, in total, SaaMS exhibits a linear trend in throughput increase upon adding more devices at the network side.
What is more important is that, across the cloud to edge continuum, Figure 9(c) shows that the overall performance of SaaMS is superior compared to examining the cloud (Figure 9(a)) or the network (Figure 9(b)) sides separately. This holds both for individual types of synopses (black lines) and cumulatively (red line). In particular, in Figure 9(c) the number of parallel instances in the horizontal axis is varied between 6 and 23. These numbers are the sum of the parallelism at the cloud side and the number of Raspberry Pis used at the network side. We begin with a parallelism of 6 at the cloud side and scale out to 6+2=8 by adding 2 Raspberry Pis. Then, we switch to a setup with a parallelism of 9 at the cloud side and 3 Raspberry Pis at the network side (9+3=12 parallel instances) and to 12+4=16. Finally, the last value of 23 in the horizontal axis is formed by a parallelism of 18 at the cloud side and 5 Raspberry Pis at the network side. In Figure 9(c), it can be observed that for fewer Raspberry Pis (2-3), where the throughput trend at the network side is below linear (Figure 9(b)), this slight under-performance is balanced out by the cloud side. On the other hand, for parallelism between 12 and 18, the cloud side shows a more moderate increase in throughput (Figure 9(a)). Combined, however, with the performance of the network side, which is above linear for more than 3 Raspberry Pis, it contributes to improving the overall throughput across the cloud to edge continuum. Therefore, both the black and the red lines in Figure 9(c) show a higher increasing trend in throughput compared to examining the network or the cloud side separately.
Figure 9: SaaMS Horizontal Scalability. Each panel plots Throughput (tuples/sec) for CM, HLL, DFT and the cumulative total. (a) Horizontal Scalability @ the Cloud Side: 100 Streams - 300 Synopses, Degree of Parallelism 3, 6, 12, 18. (b) Horizontal Scalability @ the Network Side: 20 Streams - 60 Synopses, Number of Raspberry Pi OS VMs 2 to 5. (c) Horizontal Scalability across the Continuum: 120 Streams - 360 Synopses, Total Parallel Microservice Instances 6, 8, 12, 16, 23.
6.2. SaaMS Vertical Scalability
Figure 10 plots the vertical scalability of SaaMS. Recall that vertical scalability refers to the ability to scale the computation with the number of processed streams. At the cloud side (Figure 10(a)) we fix parallelism to 9 and explore the ability of SaaMS to scale by progressively increasing the number of processed streams from 30 to 230 for each type of synopsis (CM, HLL, DFT). At the network side (Figure 10(b)) we use 3 Raspberry Pis and, due to the more resource constrained setup, progressively increase the number of processed streams from 10 to 30 for each type of synopsis. Across the cloud to edge continuum (Figure 10(c)), to keep the correspondence between the graphs, we start by maintaining synopses for 40 streams: 30 streams from the cloud plot of Figure 10(a) plus 10 from the network side of Figure 10(b). Then, we change the setup to 60+20 streams, switching to 150+30 streams. Finally, to the 180 streams of the previous step, we add synopses for 80 more streams. This explains the horizontal axes among the various subfigures of Figure 10. In addition, note that by adding more streams we simultaneously increase the volume and velocity of the ingested data.
Figure 10(a) illustrates that, at the cloud side, SaaMS scales linearly with an increasing number of processed streams, both per synopsis type (black lines) and cumulatively (red line). This is an important result, since even maintaining steady throughput with the number of processed streams would be satisfactory from a vertical scalability viewpoint. On the contrary, at the network side (Figure 10(b)), individual synopses maintain relatively steady throughput with an increasing number of streams. Nonetheless, the cumulative throughput expressed by the red line in Figure 10(b) shows almost linear scaling. The reason for this behavior is that the throughput of individual types of maintained synopses is not absolutely steady, but has a slightly increasing trend. Therefore, when the total throughput is computed, the small throughput increments per synopsis type accumulate to an overall linear scaling trend with an increasing number of processed streams. Finally, the vertical scalability across the continuum is dominated by the linear throughput increase at the cloud side (Figure 10(c)).
6.3. SaaMS Federated Scalability
To judge the federated scalability of SaaMS, we measure the amount of data communicated between microservice instances both at the cloud and at the network side, among Raspberry Pis. In Figure 11, the horizontal axes hold the number of parallel microservice instances, while there are 2 vertical axes in each graph. The leftmost vertical axis in each of Figure 11(a), Figure 11(b) and Figure 11(c) measures the amount of data communicated between parallel microservice instances while maintaining the synopses as described for Figure 9(a), Figure 9(b) and Figure 9(c), correspondingly. Because simply mentioning the amount of data communicated by SaaMS is not informative enough by itself, we also include the corresponding communication cost of answering continuous queries (see the Output Estimation column in Table 2) in case we communicate the original streams instead of synopses. Therefore, all subfigures in Figure 11 have separate clusters of bars for the communication cost of answering queries using synopses, denoted by CM+HLL+DFT, versus the bars exposing the communication cost of answering queries using the original streams, termed NoCM+NoHLL+NoDFT. Given these, the rightmost vertical axis in each subfigure shows the communication ratio of Raw Streams over SaaMS, and the red line plots this ratio for the corresponding number of parallel microservice instances.
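The ratio plotted by the red line can be stated compactly. The symbols below are our own notation rather than the paper's: C_raw(p) is the communication cost (in MB) of the NoCM+NoHLL+NoDFT bars and C_SaaMS(p) that of the CM+HLL+DFT bars, both at p parallel microservice instances.

```latex
R(p) = \frac{C_{\mathrm{raw}}(p)}{C_{\mathrm{SaaMS}}(p)}
```

A value of R(p) greater than 1 means that shipping synopses costs less than shipping the raw streams, and larger values indicate a larger saving.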
As shown by the red line in Figure 11, the federated scalability virtues of SaaMS are more pronounced at the network side (Figure 11(b)) than at the cloud side (Figure 11(a)). In particular, at the cloud side, synopses reduce the communication cost by an order of magnitude and up to 28 times compared to the naive approach of continuously communicating the raw streams in order to answer frequency estimation (for CM), distinct count (for HLL) and correlation (for DFT) queries. For the network side (Figure 11(b)), the communication cost reduction is between 170 and 188 times while varying the number of Raspberry Pis in the network from 2 to 5. The overall communication cost reduction across the continuum (Figure 11(c)), therefore, is between 188 and 215 times.
Figure 10: SaaMS Vertical Scalability. Each panel plots Throughput (tuples/sec) against the Number of Streams for CM, HLL, DFT and the cumulative total. (a) Vertical Scalability @ the Cloud Side: Parallelism = 9, Number of Streams 30, 60, 150, 230. (b) Vertical Scalability @ the Network Side: 3 RPis, Number of Streams 10, 20, 30. (c) Vertical Scalability across the Continuum: Parallelism = 9 + 3 RPis, Number of Streams 40, 80, 180, 260.
Figure 11: SaaMS Federated Scalability. Each panel plots the Communication Cost (MB - LogScale, left axis) for CM+HLL+DFT and NoCM+NoHLL+NoDFT, and the Raw Streams/SaaMS Communication Cost Ratio (x Times, right axis). (a) Federated Scalability @ the Cloud Side: Intra-cloud Communication Cost, Degree of Parallelism 3, 6, 9, 12, 18. (b) Federated Scalability @ the Network Side: Number of Raspberry Pi OS VMs 2 to 5. (c) Federated Scalability across the Continuum: Total Parallel Microservice Instances 6, 8, 12, 16, 23.
The communication cost reduction ratio achieved by SaaMS at the network side is much higher compared to the one at the cloud side. This is due to the data shuffling optimization performed by Kafka, according to the discussion in Section 5.2. Data shuffling optimization is possible at the cloud side of the network, where the actual Kafka topics are handled, rather than at the network devices. Therefore, because SaaMS communicates compact data summaries instead of the original raw streams even at the network side, its benefits are greater. This is an important argument exhibiting that simply deploying Kafka Streams across the cloud to edge continuum and merely relying on Kafka Streams optimization for federated scalability is not sufficient by itself. It is the stream synopsis microservices of SaaMS that enable federated scalability.
Moreover, we note that the communication cost reduction achieved by SaaMS is expected to be even greater for larger networks. This is because, in larger network settings, the number of hops data need to travel in order to reach the query source has an aggregative effect on the total communication burden across the continuum.
6.4. SaaMS vs SDEaaS
In this section we provide a comparative study of the current state of the art, the SDEaaS approach [31, 30], versus SaaMS. SDEaaS has been proven [31, 30, 42] to outperform other (non-SDEaaS) competitors on the same datasets we use in this work. We use the Maritime dataset for this experiment, keeping synopses for up to 500 vessels.
SDEaaS is designed to operate at the cloud side and, therefore, we also restrict SaaMS to a cloud-side-only deployment. As discussed in Section 1, Section 2, Section 5.1 and Section 5.3, the microservice architecture of SaaMS exploits Kafka and Kafka Streams in order to re-scale with zero downtime. SDEaaS on the other hand, as a Flink job, requires in each re-scaling decision (i) taking a savepoint, (ii) stopping the running SDEaaS job, and (iii) restarting the job with altered parallelism, loading the previously taken savepoint. Therefore, for SDEaaS we measure two versions of its throughput. “SDEaaS only Uptime” measures throughput while the job is up and running, while “SDEaaS Uptime + Downtime” measures the average throughput throughout the job’s lifespan, also accounting for the fact that during downtimes, the throughput of SDEaaS is zeroed out.
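To make the difference between the two SDEaaS measurements concrete, the following Python sketch computes both averages from a tuple count, an uptime and a list of re-scaling downtimes. The function names and all numbers are hypothetical and only illustrate the arithmetic. They are not measurements from our experiments.

```python
def uptime_only_throughput(tuples_processed, uptime_s):
    """Average tuples/sec while the job is running (downtime is ignored)."""
    return tuples_processed / uptime_s


def lifespan_throughput(tuples_processed, uptime_s, downtimes_s):
    """Average tuples/sec over the whole lifespan; downtime contributes zero tuples."""
    return tuples_processed / (uptime_s + sum(downtimes_s))


# Hypothetical run: 1.2M tuples in 60 s of uptime, with two re-scaling pauses.
tuples = 1_200_000
uptime = 60.0
downtimes = [20.0, 40.0]

print(uptime_only_throughput(tuples, uptime))          # uptime-only average
print(lifespan_throughput(tuples, uptime, downtimes))  # uptime + downtime average
```

In this toy run the lifespan average is half the uptime-only one, because the job spends as much time re-scaling as it does processing.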
[Figure 12: SaaMS vs SDEaaS Comparison. The “SDEaaS only Uptime” line plots throughput without considering downtime due to re-scaling. “SDEaaS Uptime + Downtime” also accounts for zero throughput during re-scaling intervals. Vertical axis: Average Throughput (Tuples/sec); bottom horizontal axis: Number of Vessels (10 to 1000, log scale); top horizontal axis: Parallelism Scale-out (3, 6, 9, 12). Series: SDEaaS Uptime + Downtime, SDEaaS only Uptime, SaaMS.]
In Figure 12, the vertical axis measures throughput and there are two horizontal axes. The horizontal axis at the bottom shows the number of vessels for which we keep synopses (equivalent to those of the previous sections). Note that this entails increasing both the number of processed streams (vertical scalability) and the volume and velocity of the incoming streams (horizontal scalability). As the horizontal axis at the bottom shows, we begin with keeping synopses for 10 streams and we increase the number of streams up to 500. The more streams SaaMS processes, the higher the volume and the velocity of ingested streams/vessels. The horizontal axis at the top of Figure 12 shows the parallelism and how it is altered during the experiment. All the candidate approaches begin with a parallelism of 3; when 50 streams are processed the parallelism switches to 6, it further increases at 100 streams to a value of 9 and reaches 12 for ∼180 streams/vessels. Parallelism remains fixed to 12 for the rest (180 - 500 streams/vessels) of the experiment. Note that we decide the timepoint of re-scaling based on the number of processed streams, as detailed above. Any mechanism that performs adaptive re-scaling in a different way is orthogonal to SaaMS. Proposing an autoscaler for SaaMS or SDEaaS is out of the scope of the current work and, therefore, we leave further investigation of this issue as future work.
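The re-scaling schedule used in this experiment amounts to a step function of the number of processed streams. A minimal Python sketch of that rule follows. The names `THRESHOLDS`, `PARALLELISM` and `parallelism_for` are hypothetical, and the sketch assumes that the last switch happens at exactly 180 streams, whereas the experiment only states approximately 180.

```python
from bisect import bisect_right

# Stream counts at which parallelism is raised (the 180 is an assumed exact value).
THRESHOLDS = (50, 100, 180)
# Parallelism used below the first threshold and after each subsequent one.
PARALLELISM = (3, 6, 9, 12)


def parallelism_for(num_streams):
    """Return the parallelism used when `num_streams` streams are processed."""
    return PARALLELISM[bisect_right(THRESHOLDS, num_streams)]


for n in (10, 50, 100, 180, 500):
    print(n, parallelism_for(n))
```

An autoscaler would replace this fixed lookup with a decision based on runtime metrics, which is left outside the scope of this work.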
As Figure 12 demonstrates, “SDEaaS only Uptime” can achieve between 1.5 and 3.8 times higher throughput compared to SaaMS for up to 200 streams. For more than 200 streams, SaaMS shows up to 1.5 times higher throughput. The reason for this behavior is that SDEaaS ingests data from Kafka topics. Flink (and therefore SDEaaS) does not enforce strict collocation of Kafka topics with Flink tasks by default, as Kafka does. Therefore, when the ingestion (message consumption) load increases, the time devoted to reading messages from topics that are not necessarily collocated with the processing tasks causes a drop in the overall throughput. On the contrary, as discussed in Section 5.2, Kafka by default performs data shuffling optimization.
Nevertheless, “SDEaaS only Uptime” shows an idealized picture of the performance of SDEaaS. This is because “SDEaaS only Uptime” totally ignores the intervals of downtime throughout the SDEaaS lifespan. While switching parallelism from 3→6→9→12, SDEaaS exhibits downtime, the duration of which varies from tens of seconds to minutes depending on the state size of the respective savepoints. During these time intervals, no streaming tuples are processed and therefore SDEaaS’s throughput is zero.
When we take this into account, the true average throughput of SDEaaS, as plotted by “SDEaaS Uptime + Downtime”, is significantly deteriorated. For any number of streams/vessels above 40, SaaMS shows from 1.2 to 3 times higher throughput compared to SDEaaS. An important observation involves the greater deterioration in average throughput of “SDEaaS Uptime + Downtime” relative to “SDEaaS only Uptime” and SaaMS for larger numbers of vessels/streams. This behavior appears due to the increased state size and the increased complexity of state organization as a result of maintaining more synopses for more streams. Therefore, it takes more time in Flink to take a savepoint of a larger state, stop the job and then load that larger state in order to restart the job under a different parallelism.
Finally, all 3 approaches show a considerable deterioration in throughput for any number of streams/vessels above 300, due to the respective increase in the workload that progressively causes backpressure. Even in these cases, SaaMS provides from 1.3 up to 3 times higher throughput than SDEaaS.
7. Conclusion and Future Work
In this work we presented SaaMS, the first engine for maintaining hundreds of stream synopses of various types across the cloud to edge continuum. SaaMS attributes horizontal, vertical and federated scalability to the applications that utilize its synopses. It also ensures adaptivity to changing network conditions and stream statistical properties, with zero downtime. SaaMS outperforms the current state of the art by ensuring up to 3 times higher throughput, for hundreds of streams, according to our experimental evaluation. Our future work concentrates on deploying SaaMS in larger, real networks composed of devices with heterogeneous resource capacities. We are further developing a Bayesian Optimization-based module for automatically setting and adapting the parallelism of each synopsis microservice, based on our prior experience [49].
Acknowledgments
Nikos Giatrakos was supported by the EU project CREXDATA under Horizon Europe agreement No. 101092749.
References
[1] A. Kontaxakis, A. Deligiannakis, H. Arndt, S. Burkard, C. Kettner, E. Pelikan, K. Noack, Real-time processing of geo-distributed financial data, in: A. Margara, E. D. Valle, A. Artikis, N. Tatbul, H. Parzyjegla (Eds.), 15th ACM International Conference on Distributed and Event-based Systems, DEBS 2021, Virtual Event, Italy, June 28 - July 2, 2021, ACM, 2021, pp. 190–191. doi:10.1145/3465480.3467842.
URL https://doi.org/10.1145/3465480.3467842
[2] M. Vodas, K. Bereta, D. Kladis, D. Zissis, E. Alevizos, E. Ntoulias, A. Artikis, A. Deligiannakis, A. Kontaxakis, N. Giatrakos, D. Arnu, E. Yaqub, F. Temme, M. Torok, R. Klinkenberg, Online distributed maritime event detection & forecasting over big vessel tracking data, in: 2021 IEEE International Conference on Big Data (Big Data), Orlando, FL, USA, December 15-18, 2021, 2021, pp. 2052–2057.
[3] G. Cormode, K. Yi, Small Summaries for Big Data, Cambridge University Press, 2020.
[4] G. Cormode, M. N. Garofalakis, Join sizes, frequency moments, and applications, in: Data Stream Management - Processing High-Speed Data Streams, 2016, pp. 87–102.
[5] G. Cormode, M. Garofalakis, P. Haas, C. Jermaine, Synopses for massive data: Samples, histograms, wavelets, sketches, Foundations and Trends in Databases 4 (1-3) (2012) 1–294.
[6] G. Cormode, S. Muthukrishnan, K. Yi, Q. Zhang, Optimal sampling from distributed streams, in: Proceedings of the Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2010, June 6-11, 2010, Indianapolis, Indiana, USA, 2010, pp. 77–86.
[7] G. Cormode, S. Muthukrishnan, An improved data stream summary: the count-min sketch and its applications, J. Algorithms 55 (1) (2005) 58–75.
[8] G. Cormode, M. N. Garofalakis, Approximate continuous querying over distributed streams, ACM Trans. Database Syst. 33 (2) (2008) 9:1–9:39.
[9] P. Flajolet, É. Fusy, O. Gandouet, F. Meunier, Hyperloglog: the analysis of a near-optimal cardinality estimation algorithm, in: Discrete Mathematics and Theoretical Computer Science, Discrete Mathematics and Theoretical Computer Science, 2007, pp. 137–156.
[10] P. Flajolet, G. N. Martin, Probabilistic counting algorithms for data base applications, J. Comput. Syst. Sci. 31 (2) (1985) 182–209.
[11] B. H. Bloom, Space/time trade-offs in hash coding with allowable errors, Commun. ACM 13 (7) (1970) 422–426.
[12] B. Babcock, M. Datar, R. Motwani, Sampling from a moving window over streaming data, in: Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, January 6-8, 2002, San Francisco, CA, USA, 2002, pp. 633–634.
[13] G. S. Manku, R. Motwani, Approximate frequency counts over data streams, in: Proceedings of 28th International Conference on Very Large Data Bases, VLDB 2002, Hong Kong, August 20-23, 2002, 2002, pp. 346–357.
[14] M. N. Garofalakis, Discrete wavelet transform and wavelet synopses, in: L. Liu, M. T. Özsu (Eds.), Encyclopedia of Database Systems, Springer US, 2009, pp. 857–863. doi:10.1007/978-0-387-39940-9_539.
URL https://doi.org/10.1007/978-0-387-39940-9_539
[15] M. Shekelyan, A. Dignös, J. Gamper, Digithist: a histogram-based data summary with tight error bounds, Proc. VLDB Endow. 10 (11) (2017) 1514–1525. doi:10.14778/3137628.3137658.
URL http://www.vldb.org/pvldb/vol10/p1514-shekelyan.pdf
[16] Y. E. Ioannidis, The history of histograms (abridged), in: J. C. Freytag, P. C. Lockemann, S. Abiteboul, M. J. Carey, P. G. Selinger, A. Heuer (Eds.), Proceedings of 29th International Conference on Very Large Data Bases, VLDB 2003, Berlin, Germany, September 9-12, 2003, Morgan Kaufmann, 2003, pp. 19–30. doi:10.1016/B978-012722442-8/50011-2.
URL http://www.vldb.org/conf/2003/papers/S02P01.pdf
[17] D. E. Yagoubi, R. Akbarinia, F. Masseglia, D. E. Shasha, Radiussketch: Massively distributed indexing of time series, in: 2017 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2017, Tokyo, Japan, October 19-21, 2017, IEEE, 2017, pp. 262–271. doi:10.1109/DSAA.2017.49.
URL https://doi.org/10.1109/DSAA.2017.49
[18] Y. Zhu, D. E. Shasha, Statstream: Statistical monitoring of thousands of data streams in real time, in: Proceedings of 28th International Conference on Very Large Data Bases, VLDB 2002, Hong Kong, August 20-23, 2002, 2002, pp. 358–369.
[19] M. Charikar, Similarity estimation techniques from rounding algorithms, in: Proceedings on 34th Annual ACM Symposium on Theory of Computing, May 19-21, 2002, Montréal, Québec, Canada, 2002, pp. 380–388.
[20] P. K. Agarwal, G. Cormode, Z. Huang, J. M. Phillips, Z. Wei, K. Yi, Mergeable summaries, in: M. Benedikt, M. Krötzsch, M. Lenzerini (Eds.), Proceedings of the 31st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2012, Scottsdale, AZ, USA, May 20-24, 2012, ACM, 2012, pp. 23–34.
[21] Apache Spark v. 3.5.0, https://spark.apache.org/.
[22] Apache Flink v. 1.18, https://flink.apache.org/.
[23] R. P. Lemaitre, M. Kiefer, J. V. Hein, J. Quiané-Ruiz, V. Markl, In the land of data streams where synopses are missing, one framework to bring them all, Proc. VLDB Endow. 14 (10) (2021) 1818–1831.
[24] O. Levchenko, D. E. Yagoubi, R. Akbarinia, F. Masseglia, B. Kolev, D. E. Shasha, Spark-parsketch: A massively distributed indexing of time series datasets, in: A. Cuzzocrea, J. Allan, N. W. Paton, D. Srivastava, R. Agrawal, A. Z. Broder, M. J. Zaki, K. S. Candan, A. Labrinidis, A. Schuster, H. Wang (Eds.), Proceedings of the 27th ACM International Conference on Information and Knowledge Management, CIKM 2018, Torino, Italy, October 22-26, 2018, ACM, 2018, pp. 1951–1954. doi:10.1145/3269206.3269226.
URL https://doi.org/10.1145/3269206.3269226
[25] O. Levchenko, B. Kolev, D. E. Yagoubi, R. Akbarinia, F. Masseglia, T. Palpanas, D. E. Shasha, P. Valduriez, Bestneighbor: efficient evaluation of knn queries on large time series databases, Knowl. Inf. Syst. 63 (2) (2021) 349–378. doi:10.1007/S10115-020-01518-4.
URL https://doi.org/10.1007/s10115-020-01518-4
[26] N. Giatrakos, et al., Infore: Interactive cross-platform analytics for everyone, in: M. d’Aquin, S. Dietze, C. Hauff, E. Curry, P. Cudré-Mauroux (Eds.), CIKM ’20: The 29th ACM International Conference on Information and Knowledge Management, Virtual Event, Ireland, October 19-23, 2020, ACM, 2020, pp. 3389–3392. doi:10.1145/3340531.3417435.
URL https://doi.org/10.1145/3340531.3417435
[27] I. Flouris, N. Giatrakos, A. Deligiannakis, M. N. Garofalakis, Network-wide complex event processing over geographically distributed data sources, Inf. Syst. 88 (2020). doi:10.1016/J.IS.2019.101442.
URL https://doi.org/10.1016/j.is.2019.101442
[28] G. Stamatakis, A. Kontaxakis, A. Simitsis, N. Giatrakos, A. Deligiannakis, Sheer mp: Optimized streaming analytics-as-a-service over multi-site and multi-platform settings, in: Proceedings of the 25th International Conference on Extending Database Technology, EDBT 2022, Edinburgh, UK, March 29 - April 1, 2022, 2022, pp. 2:558–2:561.
[29] D. Giouroukis, A. Dadiani, J. Traub, S. Zeuch, V. Markl, A survey of adaptive sampling and filtering algorithms for the internet of things (2020). doi:10.1145/3401025.3403777.
URL https://doi.org/10.1145/3401025.3403777
[30] A. Kontaxakis, N. Giatrakos, D. Sacharidis, A. Deligiannakis, And synopses for all: A synopses data engine for extreme scale analytics-as-a-service, Inf. Syst. 116 (2023) 102221. doi:10.1016/J.IS.2023.102221.
URL https://doi.org/10.1016/j.is.2023.102221
[31] A. Kontaxakis, N. Giatrakos, A. Deligiannakis, A synopses data engine for interactive extreme-scale analytics, in: CIKM ’20: The 29th ACM International Conference on Information and Knowledge Management, Virtual Event, Ireland, October 19-23, 2020, 2020, pp. 2085–2088.
[32] SaaMS – Synopses-as-a-MicroService Open-source Repository, https://github.com/geok1999/Synopses-as-a-MicroService-SaaMS (2024).
[33] SDEaaS – Synopses Data Engine-as-a-Service Open-source Repository, https://sdeaas.github.io/ (2023).
[34] J. Kreps, N. Narkhede, J. Rao, et al., Kafka: A distributed messaging system for log processing, in: Proceedings of the NetDB, 2011, pp. 1–7.
[35] Apache Kafka v. 3.3, https://kafka.apache.org/.
[36] Apache datasketches, https://datasketches.github.io/.
[37] Stream-lib, https://github.com/addthis/stream-lib.
[38] D. L. Quoc, R. Chen, P. Bhatotia, C. Fetzer, V. Hilt, T. Strufe, Streamapprox: approximate computing for stream analytics, in: Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference, Las Vegas, NV, USA, December 11 - 15, 2017, 2017, pp. 185–197.
[39] B. Mozafari, Snappydata, in: Encyclopedia of Big Data Technologies, 2019.
[40] Data stream management - processing high-speed data streams, Data-Centric Systems and Applications, Springer, 2016. doi:10.1007/978-3-540-28608-0.
URL https://doi.org/10.1007/978-3-540-28608-0
[41] A. Povzner, P. Mahajan, J. Gustafson, J. Rao, I. Juma, F. Min, S. Sridharan, N. Bhatia, G. K. Attaluri, A. Chandra, S. Kozlovski, R. Sivaram, L. Bradstreet, B. Barrett, D. Shah, D. Jacot, D. Arthur, M. Chawla, R. Dagostino, C. Mccabe, M. R. Obili, K. Prakasam, J. G. Sancio, V. Singh, A. Nikhil, K. Gupta, Kora: A cloud-native event streaming platform for kafka, Proc. VLDB Endow. 16 (12) (2023) 3822–3834. doi:10.14778/3611540.3611567.
URL https://www.vldb.org/pvldb/vol16/p3822-povzner.pdf
[42] N. Giatrakos, E. Alevizos, A. Deligiannakis, R. Klinkenberg, A. Artikis, Proactive streaming analytics at scale: A journey from the state-of-the-art to a production platform, in: I. Frommholz, F. Hopfgartner, M. Lee, M. Oakes, M. Lalmas, M. Zhang, R. L. T. Santos (Eds.), Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, CIKM 2023, Birmingham, United Kingdom, October 21-25, 2023, ACM, 2023, pp. 5204–5207. doi:10.1145/3583780.3615293.
URL https://doi.org/10.1145/3583780.3615293
[43] N. Alon, Y. Matias, M. Szegedy, The space complexity of approximating the frequency moments, in: Proceedings of the Twenty-Eighth Annual ACM Symposium on the Theory of Computing, Philadelphia, Pennsylvania, USA, May 22-24, 1996, 1996, pp. 20–29.
[44] M. Greenwald, S. Khanna, Space-efficient online computation of quantile summaries, in: S. Mehrotra, T. K. Sellis (Eds.), Proceedings of the 2001 ACM SIGMOD international conference on Management of data, Santa Barbara, CA, USA, May 21-24, 2001, ACM, 2001, pp. 58–66.
[45] N. Giatrakos, Y. Kotidis, A. Deligiannakis, V. Vassalos, Y. Theodoridis, In-network approximate computation of outliers with quality guarantees, Inf. Syst. 38 (8) (2013) 1285–1308.
[46] A. Arasu, G. S. Manku, Approximate counts and quantiles over sliding windows, in: C. Beeri, A. Deutsch (Eds.), Proceedings of the Twenty-third ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 14-16, 2004, Paris, France, ACM, 2004, pp. 286–296.
[47] Burkard, Data set of correlations between stocks world wide (Mar. 2022). doi:10.5281/zenodo.6331464.
URL https://doi.org/10.5281/zenodo.6331464
[48] spring, Financial data set used in infore project (Jun. 2020). doi:10.5281/zenodo.3886895.
URL https://doi.org/10.5281/zenodo.3886895
[49] N. Giatrakos, E. Kougioumtzi, A. Kontaxakis, A. Deligiannakis, Y. Kotidis, Easyflinkcep: Big event data analytics for everyone, in: G. Demartini, G. Zuccon, J. S. Culpepper, Z. Huang, H. Tong (Eds.), CIKM ’21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1 - 5, 2021, ACM, 2021, pp. 3029–3033. doi:10.1145/3459637.3482094.
URL https://doi.org/10.1145/3459637.3482094
[50] I. Kontopoulos, M. Vodas, G. Spiliopoulos, K. Tserpes, D. Zissis, Single Ground Based AIS Receiver Vessel Tracking Dataset, Zenodo (Apr. 2020).
URL https://doi.org/10.5281/zenodo.3754481