Integrating and exploring heterogeneous datasets
Nelly Barret
Postdoctoral researcher
Data Science group, DEIB, Politecnico di Milano
April 19, 2024
Nelly Barret (DEIB@PoliMi) Data integration and exploration April 19, 2024 1 / 89
Outline
1. Motivation: data integration and exploration problems
2. Predihood: predicting neighbourhoods’ environment
3. GeoAlign: spatial entity matching of Points of Interest
4. Abstra: first-sight overview of a dataset
5. Pathways: efficiently finding interesting paths
6. Systems developed
7. Conclusion
Motivation: data integration and exploration problems
Data exploration and integration
Structured data models:
  Relational databases
  Tables
Semi-structured data models:
  XML documents
  JSON documents
  RDF graphs
  Property graphs
Dataset exploration and integration is hard: large, complex, irregular
Today’s menu: focus on cartographic and semi-structured data
Predihood: predicting neighbourhoods’ environment
Motivation: heterogeneous data is everywhere
Name: Jane Doe
Job: French investigative journalist
Sex: F
Birth city: Paris
Residence city: Lyon
Wishes:
  Learn Lyon neighbourhoods [BDF+21]
  Visit Lyon’s monuments [BDFM19]
  Explore new datasets for her investigations [BMU24]
  Reveal undeclared conflicts of interests [BGLM23a]
Skills:
  Excel: ? ? ??  Word: ? ? ??
  Rel. databases: ?  Semi-struct. data: N/A
Aggregate city-level data
Neighbourhood environment prediction
INSEE (French National Institute of Statistics)
IRIS: small geo unit of 5K inhabitants (50K IRIS in FR)
For each IRIS: 600 quantitative features
→ No high-level description of neighbourhoods’ characteristics
→ Too many features for prediction
Research contribution
Predict automatically the environment of any French neighbourhood, based on cartographic and city-level data
GeoAlign: spatial entity matching of Points of Interest
From cartographic entities to POIs
Adaptive formula for geographic entity matching
Given two entities e1, e2, the adaptive formula relies on:
  The similarity degree of e1 and e2 attributes
    13 measures: geo, text, type, ...
  The weight/importance of e1 and e2 attributes
$$(e_1, e_2) = \sum_{i=1}^{n} \mathit{weight}_i \times \mathit{sim}_i(\mathit{attribute}_i) > \theta$$
where weight_i is the weight, sim_i the similarity measure, and attribute_i the attribute
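A minimal sketch of the weighted-sum test in the adaptive formula, assuming two illustrative measures only (the slide mentions 13). The function names, the weights and the two example POIs are hypothetical, not GeoAlign's actual API or data.

```python
from typing import Callable, Dict

Entity = Dict[str, str]
Measure = Callable[[str, str], float]


def exact_sim(a: str, b: str) -> float:
    """Hypothetical 'type' measure: 1.0 when both values are equal, ignoring case."""
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


def token_jaccard(a: str, b: str) -> float:
    """Hypothetical 'text' measure: Jaccard overlap of whitespace-separated tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def match_score(e1: Entity, e2: Entity,
                measures: Dict[str, Measure],
                weights: Dict[str, float]) -> float:
    """Weighted sum of similarities over the attributes present in both entities."""
    return sum(
        weights[attr] * sim(e1[attr], e2[attr])
        for attr, sim in measures.items()
        if attr in e1 and attr in e2
    )


def is_match(e1: Entity, e2: Entity,
             measures: Dict[str, Measure],
             weights: Dict[str, float],
             theta: float) -> bool:
    """Two entities match when the weighted sum exceeds the threshold theta."""
    return match_score(e1, e2, measures, weights) > theta


# Assumed measures, weights and entities, for illustration only.
measures = {"name": token_jaccard, "type": exact_sim}
weights = {"name": 0.6, "type": 0.4}
poi_a = {"name": "Basilique de Fourvière", "type": "church"}
poi_b = {"name": "Basilique Notre-Dame de Fourvière", "type": "church"}
print(is_match(poi_a, poi_b, measures, weights, theta=0.8))
```

Keeping the measures in a dictionary keyed by attribute lets the weights be tuned per attribute without touching the scoring code.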
GeoAlign at work
Abstra: first-sight overview of a dataset
Motivation: heterogeneous data is everywhere
Jane Doe wishes to explore new datasets for her investigations [BMU24]
→ Simple descriptions
What does the dataset describe?
Real-world objects and relationships between them
Entity-Relationship models [RG03]
Need to compute them from the dataset!
What about semi-structured data models (nesting)?
Keep it simple and of controllable size
Research contribution: data abstraction
Abstra: Lightweight Entity-Relationship diagrams [BMU22, BMU24]
  Automatically and efficiently from semi-structured data
  Compact yet meaningful data overviews
  Ideal for first-sight dataset discovery
Data graph summarization
Quotient summarization across data models
Each data model has its own syntax: XML, JSON, RDF, PG
Summarization based on same-kind nodes
We identify node kinds in each model based on the respective best practices of data design:
  XML: elements with the same label (or type)
  JSON: nodes on the same path from the root
  RDF [GGM20]: depending on node type(s) or, if absent, incoming and outgoing properties
  PG: adaptation of the above [GGM20]
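As an illustration of the JSON rule above, the sketch below groups JSON nodes by their path from the root and records one edge between two groups whenever a data edge links them. The function name, the path notation (`$`, `/key`, `[]` for array items) and the sample document are assumptions made for this example, not Abstra's implementation.

```python
from collections import defaultdict
from typing import Any, Dict, Set, Tuple


def json_collections(doc: Any) -> Tuple[Dict[str, int], Set[Tuple[str, str]]]:
    """Return (number of nodes per path, edges between paths) for a JSON value."""
    sizes: Dict[str, int] = defaultdict(int)
    edges: Set[Tuple[str, str]] = set()

    def visit(node: Any, path: str) -> None:
        sizes[path] += 1  # every node on the same path falls in the same group
        if isinstance(node, dict):
            children = [(f"{path}/{key}", child) for key, child in node.items()]
        elif isinstance(node, list):
            children = [(f"{path}[]", item) for item in node]
        else:
            children = []  # scalar value: a leaf
        for child_path, child in children:
            edges.add((path, child_path))  # one group edge per kind of data edge
            visit(child, child_path)

    visit(doc, "$")
    return dict(sizes), edges


# Hypothetical sample document.
doc = {"papers": [{"title": "A", "year": 2020}, {"title": "B"}]}
sizes, edges = json_collections(doc)
for path in sorted(sizes):
    print(path, sizes[path])
```

Both `title` values land in the same group even though only one paper has a `year`, which is how irregular records still yield a compact summary.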
The summary (collection graph) G
Collection node for each equivalence class
Collection edge Cs → Ct if a data edge exists
Entity profile of each leaf collection node: reflects NEs in the leaves
[Figure: example collection graph G over the collections paper, abstract, year, title, wB, hW, pIn, in, author, conf, name, date, email, affiliation, university, city and campus, with #val leaf collections]
Identifying entities and relationships
Identifying entities in the collection graph G
Which collections represent entities in the E-R diagram?
Which collections represent entity attributes?
Requirements and algorithm
We need an algorithm to identify entity roots and attributes of the E-R diagram
For complex, potentially cyclic, collection graphs
Greedy selection of few entities in G
1. Assign a score to each collection node
2. While fewer than Emax entity roots, or data coverage < cov_min:
   1. Elect the next highest-scored eligible collection node as an entity root
   2. Compute its boundary, i.e., attribute set
   3. Update the collection graph to reflect the selection of an entity
   4. Recompute the scores
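The greedy loop can be sketched as follows, under strong simplifying assumptions: the score of a node is the data size of its leaf children, its boundary is exactly those leaf children, and selection stops as soon as either limit is reached (Emax roots chosen, or the coverage target met). The names `greedy_entities`, `e_max`, `cov_min` and the toy graph are hypothetical; the actual scores and boundary methods are the ones discussed on the following slides.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # collection node -> child collection nodes


def leaf_children(graph: Graph, node: str) -> Set[str]:
    """Children of `node` that have no outgoing edge themselves."""
    return {c for c in graph.get(node, set()) if not graph.get(c)}


def greedy_entities(graph: Graph, sizes: Dict[str, int],
                    e_max: int, cov_min: float) -> Dict[str, Set[str]]:
    """Return {entity root: boundary} chosen greedily (illustrative stand-in)."""
    remaining = {n: set(children) for n, children in graph.items()}
    total = sum(sizes.values()) or 1
    covered = 0
    entities: Dict[str, Set[str]] = {}
    while len(entities) < e_max and covered / total < cov_min:
        # Steps 1 and 4: (re)compute the scores on the current graph.
        scores = {}
        for node in remaining:
            leaves = leaf_children(remaining, node)
            if leaves:  # eligible: has at least one attribute-like child
                scores[node] = sum(sizes[leaf] for leaf in leaves)
        if not scores:
            break
        # Step 2.1: elect the highest-scored eligible node (ties broken by name).
        root = max(scores, key=lambda n: (scores[n], n))
        # Step 2.2: compute its boundary.
        boundary = leaf_children(remaining, root)
        entities[root] = boundary
        # Step 2.3: update the graph by removing what the entity now owns.
        allocated = boundary | {root}
        covered += sum(sizes[n] for n in allocated)
        remaining = {n: children - allocated
                     for n, children in remaining.items() if n not in allocated}
    return entities


# Toy collection graph and data sizes, assumed for illustration.
graph = {"paper": {"title", "year", "wB"}, "wB": {"author"},
         "author": {"name", "email"}}
sizes = {"paper": 100, "title": 100, "year": 100, "wB": 300,
         "author": 50, "name": 50, "email": 40}
print(greedy_entities(graph, sizes, e_max=2, cov_min=0.5))
```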
How to score a collection node?
1. w_desc_k, w_leaf_k: number of descendants, leaf descendants, at depth k
2. Directed Acyclic Graph (DAG) rooted in each node: w_DAG
3. w_PageRank: PageRank algorithm on G
PageRank score of a collection graph node
[Figure: the collection graph G]
[Figure: the reverse collection graph G^R]
[Figure: the reverse collection graph G^R with PR edge weights (1 or 0.5 on each edge)]
Collections distribute their score based solely on their connectivity
How to score a collection node?
1. w_desc_k, w_leaf_k: number of descendants, leaf descendants, at depth k
2. w_DAG: dw bottom-up propagation on G (outside cycles)
3. w_PageRank: PageRank algorithm on G
4. w_dwPageRank: PageRank algorithm on G with dw-tuned PR edge weights
   Reflects both the topology and where actual data is
The data-weighted PageRank score
[Figure: the reverse collection graph G^R with dw-tuned PR edge weights (1, 0.66, 0.33, 0.4 or 0.6 on each edge) and the resulting node scores]
Node scores:
  author .179, paper .178, hW .158, wB .107, conf .067, pIn .063, in .056, affiliation .027, university .024
  abstract, year, title, name, date, email, city, campus: .011 each
  #val leaves: .006 each
Propagates scores across the collection graph
Works on cyclic collection graphs
The score reflects the topology and where the data is
A collection node distributes its weight
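A hedged sketch of a weighted PageRank run on the reverse collection graph. It assumes that the dw-tuned weight of an edge is proportional to the number of data edges it stands for, and it uses a standard power iteration with a damping factor of 0.85; both choices, the function names and the toy edge counts are assumptions, not the exact scoring used in Abstra.

```python
from typing import Dict, Tuple

Edges = Dict[Tuple[str, str], float]  # (source, target) -> weight


def reverse(edges: Edges) -> Edges:
    """Flip every edge so that score flows from children to parents."""
    return {(dst, src): w for (src, dst), w in edges.items()}


def weighted_pagerank(edges: Edges, damping: float = 0.85,
                      iterations: int = 100) -> Dict[str, float]:
    """Power iteration; each node splits its score along its outgoing weights."""
    nodes = sorted({n for edge in edges for n in edge})
    out_total = {n: 0.0 for n in nodes}
    for (src, _), w in edges.items():
        out_total[src] += w
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Nodes without outgoing weight spread their score uniformly.
        dangling = sum(rank[n] for n in nodes if out_total[n] == 0)
        base = (1 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {n: base for n in nodes}
        for (src, dst), w in edges.items():
            if w > 0:
                new_rank[dst] += damping * rank[src] * w / out_total[src]
        rank = new_rank
    return rank


# Toy data: 60 data edges author -> name, 40 data edges editor -> name (assumed).
g_edges = {("author", "name"): 60, ("editor", "name"): 40}
scores = weighted_pagerank(reverse(g_edges))
print({n: round(s, 3) for n, s in scores.items()})
```

With uniform weights `author` and `editor` would tie; the data-dependent weights give `author` the larger share, which is the point of tuning the edge weights.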
How to update the collection graph after selecting an entity?
Reflect the allocation of data nodes and edges to one entity
1. update_boolean
   Collection nodes and edges in the boundary of the entity
   Very efficient
   Sufficient for w_desc_k, w_leaf_k, w_DAG
2. update_exact
   Graph nodes and edges
   Much more costly
   Required for w_PageRank, w_dwPageRank
Exact graph update
[Figure: the collection graph G]
Selected entities and their boundaries
[Figure: the collection graph G with edge weights, showing the selected entities and their boundaries]
Finding relationships between entities
Relationship: a path from an entity to another
[Figure: the collection graph G with the selected entities]
paper → wB → author
paper → pIn → conf
author → hW → paper
conf → in → author
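The four relationships listed above can be reproduced with a small path enumeration over the collection graph. In this sketch a path starts at an entity root, stops at the first other entity root it reaches, and never revisits a node. The function name and the adjacency dictionary (a simplified copy of the example graph) are assumptions.

```python
from typing import Dict, List, Set


def relationships(graph: Dict[str, Set[str]], roots: Set[str]) -> List[List[str]]:
    """Enumerate simple paths from one entity root to another entity root."""
    found: List[List[str]] = []

    def walk(path: List[str]) -> None:
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt in roots:
                if nxt != path[0]:
                    found.append(path + [nxt])  # reached another entity: stop
            elif nxt not in path:
                walk(path + [nxt])  # keep following non-entity collections

    for root in sorted(roots):
        walk([root])
    return found


# Simplified version of the example collection graph (assumed edges).
graph = {
    "paper": {"wB", "pIn", "title"},
    "wB": {"author"},
    "pIn": {"conf"},
    "author": {"hW", "name"},
    "hW": {"paper"},
    "conf": {"in"},
    "in": {"author"},
}
for path in relationships(graph, {"paper", "author", "conf"}):
    print(" → ".join(path))
```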
Entity classification
Assign a semantic category to each entity
Input: an entity E, categories K, semantic properties P
  K: Person, ScientificPaper, Event, Website, Mountain, ...
  P: {label:"address", domain:[Pers., Org.], range:[Place]}, ...
Output: a category for E
Algorithm:
  Compare:
    The common name of all nodes in the entity root (if it exists) with k ∈ K (conf, paper, author)
    Its attribute names with p ∈ P (affiliation, email, ...)
    Its entity profiles with p.range ∈ P (...)
  Each good match votes for one or few categories
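A rough sketch of the voting idea, assuming a plain string similarity from Python's standard `difflib` and a fixed threshold of 0.6 for a "good match". It covers the first two comparisons only (root name against categories, attribute names against property labels) and leaves out entity profiles. The function names, threshold and sample properties are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import Dict, List, Optional


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def classify(root_name: str, attributes: List[str], categories: List[str],
             properties: List[Dict], min_sim: float = 0.6) -> Optional[str]:
    """Return the category with the most votes, or None when nothing matches."""
    votes: Counter = Counter()
    # Comparison 1: the entity root name against each category name.
    for category in categories:
        if similarity(root_name, category) >= min_sim:
            votes[category] += 1
    # Comparison 2: attribute names against property labels;
    # a good match votes for every category in the property's domain.
    for attribute in attributes:
        for prop in properties:
            if similarity(attribute, prop["label"]) >= min_sim:
                votes.update(prop["domain"])
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical categories and semantic properties.
categories = ["Person", "ScientificPaper", "Event"]
properties = [
    {"label": "email", "domain": ["Person", "Organization"], "range": ["Text"]},
    {"label": "affiliation", "domain": ["Person"], "range": ["Organization"]},
]
print(classify("author", ["name", "email", "affiliation"], categories, properties))
```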
Name    Similar to                   Votes for
paper   ResearchPublication (0.85)   ResearchPublication
        News (0.63)                  News
[Figure: the paper entity with its attributes abstract, year and title]
Experimental evaluation
On main semi-structured data models: 8 JSON, 7 RDF, 5 XML, 3 PG
  10 synthetic, 13 real-world
  5M to 14M nodes
Collection graphs:
  26 to 4.8K collections
  14/23 have cycles
Graphs stored in PostgreSQL, algorithms in SQL and Java
We evaluate:
1. Entity selection quality
2. Scalability
Entity selection quality with (w_dwPageRank, bound_fl-ac)
Dataset name  |C|  |ME|  |MR|  cov   ME              dmax  |MEi|
Mondial       168  5     8     0.85  City            3     3,152
                                     Province        3     1,455
                                     Country         4     231
                                     Organization    4     168
                                     River           4     135
PubMed        26   1     0     1.0   PubMedArticle   5     957
XMark1        136  5     10    0.91  Person          4     25,500
                                     Item            7     21,750
                                     Open Auction    8     12,000
                                     Closed Auction  8     9,750
                                     Category        2     1,000
XMark4        136  5     10    0.90  Person          4     102,000
                                     Item            7     87,000
                                     Open Auction    8     48,000
                                     Closed Auction  8     39,000
                                     Category        2     4,000
Wikimedia     59   2     0     1.0   Page            4     54,750
                                     Namespace       3     32
Abstra selects frequent, coherent and semantically central entities
Experimental evaluation: scalability
Our abstraction method scales up linearly in the data size
Related work
Data summarization
  Structural
    Quotient [GGM20, KC10, MS99] (the one we adopt to build G)
    Non-quotient [GW97]
  Pattern mining [ZLVK16]
  Statistical [HS12]
  Hybrid [RGSB17]
Schema inference
  XML [CGS11]
  JSON [BCGS19]
  RDF [GLSW22]
  PG [LBH21]
Data summarization and schema inference are tied to one data model
Schemas are often not suited to NTUs
A JSON schema from social network data using [BCGS19]
Pathways: efficiently finding interesting paths
NE-to-NE path enumeration
What makes a NE-to-NE path interesting?
Some paths connecting Person NEs to Organization NEs:
  ← #val ← Name ← Author → Affiliation → #val →
  ← #val ← Name ← Author ← Authors ← Article → Journal → #val →
  ← #val ← COI ← Article → Journal → #val → ← #val →
Which paths are most interesting and deserve to be evaluated?
What makes a NE-to-NE path interesting?
Some paths are unreliable: we face entity extraction errors
  E.g., “John Hopkins University Hospital”, where “John Hopkins” is extracted as a person
  False positives, or wrong entity type attribution, e.g., “THC” typed as an org.
Some paths are structurally weak: we face information dilution
  E.g., a paper has 50 authors
Path interestingness: based on edge reliability and edge force
What makes a NE-to-NE path interesting?
1. Reliability (C_i ⇢ τ) of an extraction collection edge
   The ratio of NEs having the type τ, and extracted from C_i
   Path reliability: minimum extraction edge reliability
2. Force (C_i → C_j) of a structural collection edge
   The inverse of the maximal source node out-degree among data edges represented by C_i → C_j
   Path force: product of edge forces
3. Rank paths on their reliability, then their force
4. Take a top-k of those having ≥ θ
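The ranking steps above can be sketched as below. The slide does not say which quantity is compared with θ; this sketch assumes it is the path reliability. The function names, the input layout (per-path lists of extraction-edge reliabilities and of maximal out-degrees) and the toy values are hypothetical.

```python
from math import prod
from typing import Dict, List, Tuple


def path_reliability(extraction_reliabilities: List[float]) -> float:
    """Minimum reliability among the extraction edges of the path."""
    return min(extraction_reliabilities)


def path_force(max_out_degrees: List[int]) -> float:
    """Product of edge forces; an edge force is 1 / (maximal source out-degree)."""
    return prod(1.0 / degree for degree in max_out_degrees)


def top_k_paths(paths: Dict[str, Tuple[List[float], List[int]]],
                k: int, theta: float) -> List[str]:
    """Keep paths with reliability >= theta (assumed), rank, return the top k names."""
    scored = [(name, path_reliability(rels), path_force(degs))
              for name, (rels, degs) in paths.items()]
    kept = [p for p in scored if p[1] >= theta]
    # Rank on reliability first, then on force, both in decreasing order.
    kept.sort(key=lambda p: (p[1], p[2]), reverse=True)
    return [name for name, _, _ in kept[:k]]


# Toy paths: (extraction edge reliabilities, max out-degree per structural edge).
paths = {
    "via affiliation": ([0.9, 0.8], [1, 1]),
    "via journal": ([0.9, 0.8], [1, 50, 1]),  # diluted: a paper with 50 authors
    "via COI": ([0.3, 0.8], [1, 1]),          # weak extraction on one end
}
print(top_k_paths(paths, k=2, theta=0.5))
```

Taking the minimum for reliability means one bad extraction edge is enough to sink a path, while the product for force penalises every high-fan-out hop along the way.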
Experimental evaluation: path interestingness
(τ1, τ2)                      min p_rel  max p_rel  p20_rel  |P|  |P'|  R = |P'|/|P|
PubMed
(Person, Organization)        0.0150     0.9142     0.0409   52   20    38.45%
(Person, Location)            0.0150     0.9107     0.0150   30   20    66.66%
(Location, Organization)      0.0150     0.9107     0.0232   34   20    58.82%
(Person, Person)              0.0150     0.9774     0.0150   24   20    83.33%
(Organization, Organization)  0.0150     0.4158     0.0232   31   20    64.51%
(Location, Location)          0.0150     0.0954     0.0150   20   20    100.00%
Nasa
(Person, Organization)        0.0014     0.0645     0.0178   191  20    10.47%
(Person, Location)            0.0014     0.0645     0.0077   142  20    14.08%
(Location, Organization)      0.0014     0.1016     0.0077   115  20    17.39%
(Person, Person)              0.0014     0.0232     0.0077   110  20    18.18%
(Organization, Organization)  0.0014     0.0581     0.0077   92   20    21.73%
(Location, Location)          0.0014     0.3790     0.0077   67   20    29.85%
Yelp
(Location, Organization)      0.0002     0.9997     0.0002   8    8     100.00%
(Location, Location)          0.0002     1.0000     0.0002   11   11    100.00%
Both reliability and force downgrade meaningless paths (NE errors or structurally weak)
Related work
Structured querying
  SQL, SPARQL, GQL [DFG+22]
Assisted struct. querying
  Interactive queries [DAB16]
  Guided query writing [ERAAL18, KKBS10]
  NL2SQL [KSHL20]
Keyword-based search
  Unidirectional [ABC+02, LOF+08]
  Bi-directional [ABC+22]
Path search in struct. queries
  SPARQL extensions: [ASMH18, AMSH18, AMM23]
  For PGs: [DFG+22]
Pathways users need no knowledge of the graph structure or values
Less intimidating for NTUs
Systems developed
System     Purpose                        Code base               Publication
Predihood  City environment prediction    17 Python core classes  DATA 2020 [BDF+21]
GeoAlign   Entity matching for POIs       41 PHP core classes     SIGSPATIAL 2019 [BDFM19]
Abstra     Abstractions as E-R diagrams   65 Java core classes    CIKM 2022 [BMU22]
PathWays   Interesting NE-to-NE paths     18 Java core classes    ESWC 2023 [BGLM23b]
Conclusion
Lessons learned
Data integration and exploration are difficult:
  Lack of schema or schema heterogeneity
  Data quality: wrong, null, missing values, ...
  Large amounts of data
Bring out insights and knowledge from raw data
From the user point of view:
1. User-friendly interfaces
2. No technical detail
3. High-level representation
Thanks
Nelly BARRET
[email protected]
https://nelly-barret.github.io/
Data Science group
DEIB, Politecnico di Milano
Milano
References
[ABC+02] B. Aditya, Gaurav Bhalotia, Soumen Chakrabarti, Arvind Hulgeri, Charuta Nakhe, S. Sudarshan, et al.
  BANKS: browsing and keyword searching in relational databases.
  In VLDB'02: Proceedings of the 28th International Conference on Very Large Databases, pages 1083–1086. Elsevier, 2002.
[ABC+22] Angelos Anadiotis, Oana Balalau, Catarina Conceicao, et al.
  Graph integration of structured, semistructured and unstructured data for data journalism.
  Inf. Systems, 104, 2022.
[AMM23] Angelos Christos Anadiotis, Ioana Manolescu, and Madhulika Mohanty.
  Integrating connection search in graph queries.
  In ICDE, April 2023.
[AMSH18] Christian Aebeloe, Gabriela Montoya, Vinay Setty, and Katja Hose.
  Discovering diversified paths in knowledge bases.
  Proc. VLDB Endow., 11(12):2002–2005, 2018.
  Code available at: http://qweb.cs.aau.dk/jedi/.
[ASMH18] Christian Aebeloe, Vinay Setty, Gabriela Montoya, and Katja Hose.
  Top-k diversification for path queries in knowledge graphs.
  In Marieke van Erp, Medha Atre, Vanessa López, Kavitha Srinivas, and Carolina Fortuna, editors, Proceedings of the ISWC 2018 Posters & Demonstrations, Industry and Blue Sky Ideas Tracks co-located with 17th International Semantic Web Conference (ISWC 2018), Monterey, USA, October 8th to 12th, 2018, volume 2180 of CEUR Workshop Proceedings. CEUR-WS.org, 2018.
[BCGS19] Mohamed Amine Baazizi, Dario Colazzo, Giorgio Ghelli, and Carlo Sartiani.
  Parametric schema inference for massive JSON datasets.
  VLDB J., 28(4), 2019.
RDF quotient graph summarization [GGM20]
Source clique: set of outgoing properties co-occurring together on at least one node
Target clique: set of incoming properties co-occurring together on at least one node
Properties “a”, “b”, “d” are in the same source clique
Properties “a” and “e” are in the same target clique
(c) Pawel Guzewicz
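As a rough sketch of how such cliques could be computed, the Python below merges properties that co-occur on a node using a small union-find. It assumes that co-occurrence is closed transitively (if "a" and "b" share a node, and "b" and "d" share another, all three land in one clique). The function names and the sample triples are hypothetical and do not reproduce the graph shown in the slide's figure.

```python
from collections import defaultdict


def property_cliques(node_to_properties):
    """Merge properties that co-occur on a node, transitively (union-find)."""
    parent = {}

    def find(p):
        parent.setdefault(p, p)
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for props in node_to_properties.values():
        props = list(props)
        for p in props:
            find(p)  # register every property, even a lone one
        for other in props[1:]:
            parent[find(props[0])] = find(other)

    groups = defaultdict(set)
    for p in list(parent):
        groups[find(p)].add(p)
    return {frozenset(g) for g in groups.values()}


def source_and_target_cliques(triples):
    """triples: iterable of (subject, property, object)."""
    outgoing = defaultdict(set)  # node -> properties leaving it
    incoming = defaultdict(set)  # node -> properties entering it
    for s, p, o in triples:
        outgoing[s].add(p)
        incoming[o].add(p)
    return property_cliques(outgoing), property_cliques(incoming)


# Hypothetical toy graph.
triples = [
    ("n1", "a", "x"),
    ("n1", "b", "y"),
    ("n2", "b", "z"),
    ("n2", "d", "w"),
    ("n3", "e", "x"),
]
source_cliques, target_cliques = source_and_target_cliques(triples)
```

In this toy graph "a", "b", "d" end up in one source clique (through the shared property "b"), while "a" and "e" share a target clique because both enter node "x".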
Strong summary [GGM20]
Strong (S) summary:
Two nodes are S-equivalent if they have both the same source and target cliques
[Figure panels: source and target cliques of each node; strong summary]
(c) Pawel Guzewicz
Typed-strong summary [GGM20]
Typed-strong (TS) summary:
Two typed nodes are TS-equivalent if they have the same type set
Two untyped nodes are TS-equivalent if they have both the same source and target cliques
[Figure panels: source and target cliques of each node + an RDF type; typed-strong summary]
(c) Pawel Guzewicz
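The two TS-equivalence rules can be illustrated by a simple grouping step. The sketch below assumes the source and target cliques of every node have already been computed and are passed in as hashable values (`None` standing for "no clique"); the function name `typed_strong_partition` and the sample nodes are hypothetical.

```python
from collections import defaultdict


def typed_strong_partition(nodes):
    """nodes: dict mapping node -> (types, source_clique, target_clique).

    Typed nodes are grouped by their type set only; untyped nodes are
    grouped by their (source clique, target clique) pair.
    """
    groups = defaultdict(list)
    for node, (types, source_clique, target_clique) in nodes.items():
        if types:
            key = ("typed", frozenset(types))
        else:
            key = ("untyped", source_clique, target_clique)
        groups[key].append(node)
    # Sort for a deterministic result.
    return sorted(sorted(group) for group in groups.values())


# Hypothetical input: n1 and n2 are typed, n3 to n5 are untyped.
nodes = {
    "n1": ({"Person"}, frozenset({"name"}), None),
    "n2": ({"Person"}, frozenset({"age"}), None),
    "n3": (set(), frozenset({"title"}), frozenset({"wrote"})),
    "n4": (set(), frozenset({"title"}), frozenset({"wrote"})),
    "n5": (set(), frozenset({"title"}), None),
}
partition = typed_strong_partition(nodes)
```

Nodes "n1" and "n2" fall in the same class despite different source cliques because they share a type set, whereas "n5" is separated from "n3" and "n4" by its target clique. Dropping the typed branch and always keying on the clique pair gives the plain strong (S) equivalence.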
Disagreement between Flair and ChatGPT
False Flair positives:
  Flair identifies “Av. Peter Henry Rolfs” (person) in “Av. Peter Henry Rolfs 36570-900 Vicosa”
Flair misled by capitalization:
  Flair identifies “Claudin-7b” (person) (but not ChatGPT)
Different token allocation:
  “University of Alabama” (org.), “Birmingham” (loc.)
  “University of Alabama, Birmingham” (loc.)
Missed non-English spelling/names:
  ChatGPT finds “Antonio González” (person)
  ChatGPT finds “Yoshida, Sakyo-ku, Kyoto 606-8501, Japan” (loc.)
A comprehensive data exploration tool for NTUs
Experimental evaluation: Flair VS ChatGPT NE extractors
Flair and ChatGPT mostly agree
ChatGPT extraction has better quality