
How do we integrate community data from Wikidata and the fuzzy-sl Wikibase into a cultural heritage knowledge graph?

Author: Stefan, Daria; Thiery, Florian; Schenk, Fiona
Publisher: Zenodo
DOI: 10.5281/zenodo.17338461
Source: https://zenodo.org/records/17338461/files/CAA2025Athens_Wikidata_v2.pdf
Daria Stefan* (1,2), Florian Thiery* (2), Fiona Schenk (3,2)
1 TU Wien – Vienna, Austria
2 Research Squirrel Engineers Network – Mainz/Vienna, Germany/Austria
3 Johannes Gutenberg Universität Mainz – Mainz, Germany
* Corresponding author(s)
Correspondence: [email protected]; mail@thiery.de
Abstract
Community-curated Wikibase ecosystems, most notably Wikidata, FactGrid, and
wikibase.cloud instances such as fuzzy-sl, have become significant sources of
Cultural Heritage (CH) and archaeological data. In parallel, infrastructure initiatives
(e.g., NFDI4Objects) are building CIDOC-aligned knowledge graphs that demand
robust provenance, semantic interoperability, and reproducibility. This paper presents
a semi-automated workflow for integrating Wikibase data into infrastructure-scale
graphs, including entity selection, ontology design aligned with CIDOC CRM (with
CRMarchaeo, CRMsci, and CRMdig as needed), scripted RDF transformation, open
publication of code and data, versioned snapshot releases, and ingestion into the
NFDI4Objects Knowledge Graph. The approach preserves Wikidata-style qualifiers
and references while yielding an event- and provenance-centric representation. Two
use cases demonstrate feasibility and limits. The Irish Holy Wells dataset
showcases richly reified statements (e.g., use, sources) mapped to CIDOC events.
The Campanian Ignimbrite findspots in fuzzy-sl focus on location-centric modelling
and interdisciplinarity, requiring E53 Place as baseline with CRMarchaeo/CRMsci
specialisation for stratigraphy, sampling, and analysis. We discuss challenges in
ontology alignment, granularity, explicit treatment of uncertainty (“fuzzy/wobbly”
data), and sustaining semi-automated pipelines amid evolving community schemas.
We argue for identifier discipline, machine-actionable provenance, and FAIR Digital
Objects for each release. The outcome is an interoperable, federated ecosystem,
spanning triplestores, Wikibases, and FDOs, in which community knowledge bases
become partners of infrastructure graphs.
Keywords: Wikibase, Wikidata, Linked Data, Ontology, Archaeology, Geosciences
Introduction
Community-driven repositories and knowledge bases – most prominently Wikidata,
FactGrid, and Wikibase instances on wikibase.cloud such as fuzzy-sl – constitute
substantial reservoirs of cultural heritage (CH) and archaeology-related data [1], [2].
Curated by volunteers, citizen scientists, and independent researchers, these platforms
frequently close gaps left by project-bound initiatives and enrich the broader research
ecosystem with timely, fine-grained observations and links to heterogeneous sources. In
parallel, large-scale infrastructure efforts (e.g., within the German National Research
Data Infrastructure, NFDI [3]) are building knowledge-graph-based services for feeding
the interdisciplinary federated Knowledge Graph Ecosystem [4], [5] (Fig. 1), grounded in
established semantic frameworks (RDF/OWL) and domain ontologies such as CIDOC
CRM and its extensions. A key challenge – and opportunity – lies in integrating
community-driven resources with these infrastructure graphs in a way that is
methodologically sound, FAIR, and sustainable.
Figure 1 - A distributed Knowledge Graph Scheme bringing together Linked Open
Data and FAIR Digital Objects approaches. Florian Thiery & Andreas Noback, CC BY
4.0, via Wikimedia Commons.
Within this landscape, Germany’s NFDI4Objects consortium [6] acts as an
interdisciplinary hub for archaeology and cultural heritage, spanning an exceptionally
broad temporal and material scope. The NFDI4Objects Knowledge Graph¹ already
interlinks diverse datasets² (e.g., Linked Open Ogham, African Red Slip Ware, Linked
Open Samian Ware), employing CIDOC CRM and selected extensions (e.g., CRMarchaeo,
CRMdig) to enable interoperable querying across collections, sites, and research
outputs³. Leveraging Linked Open Data (LOD) practices [7], [8] here is not merely a
technical choice: it underpins reproducible scholarship, traceable provenance, and
cross-dataset discovery. The question, then, is how to interface curated, evolving
community platforms – each with its own modelling idioms and governance – with a
harmonised, CIDOC-aligned knowledge graph at infrastructure scale.
¹ cf. https://graph.nfdi4objects.net/
² cf. https://graph.nfdi4objects.net/collection/
³ cf. https://doi.org/10.5281/zenodo.13946052
Our proposed approach is a semi-automated, six-step workflow that (1) identifies
CH-relevant entities in Wikibase instances, (2) designs or selects an ontology aligned to
CIDOC CRM (supplemented where appropriate), (3) transforms the source data into RDF
via scripted pipelines, (4) publishes code and data for citation and versioning, (5)
releases regular dataset snapshots to ensure currency and persistence, and (6) ingests
these releases into, e.g., the NFDI4Objects Knowledge Graph. This pipeline aims to
respect the open-world assumption of LOD and preserve the rich context of statements
and qualifiers. It references typical Wikidata-style modelling, while producing a coherent,
event- and provenance-centric representation in CIDOC CRM.
The Wikiverse provides complementary affordances in this setting [9], [10]. Wikidata
offers multilingual breadth, mature property constraints, and widely adopted identifiers;
FactGrid supports humanities-oriented projects with detailed historical context; and
fuzzy-sl targets spatial uncertainty explicitly, introducing modelling patterns for
vagueness and ambiguity in findspot descriptions. By triangulating across these
strengths, we can derive a least common denominator of classes and properties that is
both mappable to CIDOC CRM and serviceable for downstream analytical tasks. The
objective is not to erase local nuances but to encode them faithfully, for example, by
transforming qualifiers and reifications into event-centred CIDOC constructs with explicit
timespans, actors, and sources.
Two use cases anchor our contribution. First, the Irish Holy Wells dataset (Wikidata’s
WikiProject Holy Wells⁴) exemplifies richly reified statements that combine status,
conservation, and sourcing practices across heterogeneous evidence (e.g., Ordnance
Survey maps, local histories, Wikimedia Commons media). Here, the task is to preserve
referential integrity (Q/P identifiers) and transform qualifiers into CIDOC CRM event
patterns, thereby enabling temporal reasoning and provenance-aware queries. Second,
the Campanian Ignimbrite (CI) findspots [11] from the fuzzy-sl Wikibase focus on spatial
uncertainty and cross-domain alignment (archaeology–geosciences). In this case,
fuzzy-sl categories and qualifiers can be mapped to CIDOC CRM (with
CRMsci/CRMarchaeo as needed), while preserving uncertainty envelopes and
observational provenance. Both cases stress machine-actionable provenance and
interoperable uncertainty, which are prerequisites for meaningful aggregation across
NFDI4Objects and beyond.
⁴ cf. https://www.wikidata.org/wiki/Wikidata:WikiProject_HolyWells
Methodologically, three principles guide the integration:
1. Identifier discipline. We retain or reference authoritative URIs (e.g., Wikidata
Q-codes) and, where appropriate, assert equivalences (e.g., owl:sameAs) to
maintain global coherence. This minimises fragile string-matching and ensures
stable joins across sources.
2. Event-centric modelling. Rather than collapsing complex, qualified statements
into timeless triples, we materialise the implied activities (assessment,
documentation, naming, celebration, sampling) as CIDOC CRM events with
agents, timespans, and documentary evidence. This preserves semantics
essential to historical reasoning and data quality assessment.
3. Uncertainty as first-class data. Spatial vagueness, competing attributions, and
evolving states are modelled explicitly (e.g., via condition assessments and
temporally scoped states; fuzzy spatial footprints), aligning with the open-world
assumption and avoiding misleading false precision.
In sum, we position community knowledge bases not as external “feeds” but as
coequal knowledge partners whose specificity and dynamism enrich infrastructure-scale
graphs. By combining scripted workflows, CIDOC-compliant patterns, and rigorous
identifier strategies, our approach enhances the FAIRness and analytic utility of CH data
while maintaining the flexibility for periodic, automated refreshes that keep
infrastructure graphs in sync with ongoing community efforts. In this way, Wikibase
instances can become integral parts of an interdisciplinary federated Knowledge Graph
ecosystem that spans triplestores, Wikibases, and FAIR Digital Objects.
Use Case: Irish Holy Wells from Wikidata
The WikiProject HolyWells (Q126443484) provides a complex and richly structured
dataset. It combines contributions from both academic researchers and citizen
scientists, complements textual data [12] with richly annotated media from Wikimedia
Commons, and integrates information from heterogeneous sources, both digital and
analogue (Fig. 2 and 3). By using reifications, the project documents the evolution of
conservation and use status, specifying the context in which each condition was
recorded. These features are optimal for demonstrating the process of aligning a flexible
model like Wikidata to an interoperable and consistent semantic framework like CIDOC
CRM, while also exposing the practical challenges that arise when attempting to
automate the transformation of such multifaceted data.
Figure 2 - Example: The Citizen Science Wikidata Project Holy Wells. CC0.
Figure 3 - Freshford: St Lachtain's Well in Wikidata. CC0 and OSM Contributors.
This project aims to develop a semi-automatic ontology design and alignment
workflow that can be implemented entirely with Python libraries. The current workflow
prototype begins by querying Wikidata and storing the data in relational tables within
Postgres. A CIDOC CRM-compliant ontology is designed in Protégé, along with
mappings that associate the data stored in Postgres with its new form. Then the triples
are materialised using the Ontop plugin, creating the final knowledge graph. While the
extraction, transformation, and loading processes can be largely automated using
Python libraries, designing a CIDOC CRM-compliant ontology remains a non-trivial task,
as it requires careful analysis and alignment of the loosely structured information from
Wikidata. In the following sections, we outline how the data was acquired, structured,
and ultimately aligned with CIDOC CRM, as well as the specific challenges we
encountered.
Querying Wikidata
For the data exploration step, we set up a SPARQL endpoint and queried for all triples
with an instance of Holy Well Semantic Concept (Q126443332) as their subject. Holy
Well Semantic Concept is a subclass of Holy Well (Q1371047), and every entry in the
Irish Holy Wells project is an instance of this class.
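The exploration query described above can be sketched as follows. This is a minimal illustration, not the authors' actual query: the helper name `build_instance_query` and the variable names are assumptions, and the function only assembles the SPARQL text without contacting any endpoint.

```python
from typing import Optional

# Holy Well Semantic Concept, as named in the text.
HOLY_WELL_CONCEPT = "Q126443332"


def build_instance_query(class_qid: str, limit: Optional[int] = None) -> str:
    """Return SPARQL text selecting all direct triples whose subject is an
    instance (P31) of the given Wikidata class. Hypothetical helper."""
    if not (class_qid.startswith("Q") and class_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata Q-ID: {class_qid!r}")
    query = (
        "PREFIX wd: <http://www.wikidata.org/entity/>\n"
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
        "SELECT ?well ?property ?value WHERE {\n"
        f"  ?well wdt:P31 wd:{class_qid} .\n"
        "  ?well ?property ?value .\n"
        "}"
    )
    if limit is not None:
        query += f"\nLIMIT {int(limit)}"
    return query


query = build_instance_query(HOLY_WELL_CONCEPT, limit=10)
print(query)
```

Validating the Q-ID before interpolating it keeps malformed identifiers out of the query text; the resulting string could then be sent to whichever SPARQL endpoint is in use.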
To ensure both accuracy and interoperability, queries retrieve not only the
human-readable labels of entities and properties but also their unique Wikidata
identifiers: Q-IDs for entities and P-IDs for properties. Embedding these Q-IDs directly
into the URIs of individuals modelled in Protégé while keeping the Wikidata prefix is our
strategy for maintaining referential integrity. Human-readable labels, descriptions, and
aliases are still incorporated as rdfs:label, skos:altLabel, and schema:description,
allowing semantic clarity for human users and multilingual applications.
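To make the identifier strategy concrete, the following sketch emits N-Triples lines that keep the Wikidata entity prefix in the subject URI and attach label, alias, and description. The helper names `literal` and `describe` are hypothetical, and the alias and description values in the usage lines are placeholders rather than data from the project.

```python
WD_ENTITY = "http://www.wikidata.org/entity/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SKOS_ALT_LABEL = "http://www.w3.org/2004/02/skos/core#altLabel"
SCHEMA_DESCRIPTION = "http://schema.org/description"


def literal(text, lang):
    """Format a language-tagged literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def describe(qid, label, aliases=(), description=None, lang="en"):
    """Return N-Triples lines for one individual whose URI embeds the Q-ID."""
    subject = f"<{WD_ENTITY}{qid}>"
    lines = [f"{subject} <{RDFS_LABEL}> {literal(label, lang)} ."]
    for alias in aliases:
        lines.append(f"{subject} <{SKOS_ALT_LABEL}> {literal(alias, lang)} .")
    if description:
        lines.append(
            f"{subject} <{SCHEMA_DESCRIPTION}> {literal(description, lang)} ."
        )
    return lines


# Placeholder alias and description, for illustration only.
for line in describe(
    "Q126456441",
    "Columbkille's Well",
    aliases=["placeholder alias"],
    description="placeholder description",
):
    print(line)
```

Because the subject URI is derived mechanically from the Q-ID, joins against other Wikidata-derived graphs need no string matching on labels.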
The initial query retrieved 229 distinct subjects and 37 unique statements. Among the
most frequently used properties are ‘described by source’ (P1343), ‘instance of’ (P31),
‘located in the administrative territorial entity’ (P131), ‘inventory number’ (P217), and
‘collection’ (P195). Wikidata’s data model is built around statements, qualifiers, and
references [13]. Each fact is expressed as a statement, while qualifiers add contextual
conditions and references point to supporting evidence, but are not directly connected to
the subject. All statements were examined for their domain and range constraints to
facilitate semantic alignment and to ensure accurate mapping into a CIDOC CRM-based
ontology.
For some entries, both status information and contextual information are missing
altogether; for others, statuses are declared directly as properties, without further
supporting information. A smaller subset contains reified statements, which allow
qualifiers such as the state of use to be attached to a property, such as ‘instance of’
(P31), and for references to be added as well, mentioning the source of the claim. Both
the classification of the entity and an assessment of its condition are linked to the same
piece of evidence. From a CIDOC CRM perspective, this practice aligns with the
event-centric modelling of knowledge, where the assertion of a property is itself situated
in time, attributed to an actor, and supported by evidence. Queries over such statements
aim to preserve their structure and context, retrieving not only the complete statement,
but also the superclass of the source and all additional information contained on the
source’s page, such as author and publication date.
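A query over reified statements might look like the sketch below, which uses the standard Wikidata RDF prefixes (`p:`, `ps:`, `prov:wasDerivedFrom`) and then regroups the flat result rows per statement. The query text, the function `group_by_statement`, and the sample rows are assumptions made for illustration; they are not taken from the project's code, and `pq:Pxxx` is a deliberate placeholder.

```python
# Sketch of a query for reified 'instance of' (P31) statements. The FILTER keeps
# predicates in the qualifier namespace (this also matches full-value qualifier nodes).
REIFIED_P31_QUERY = """
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT ?well ?statement ?class ?qualifier ?qualifierValue ?reference WHERE {
  ?well wdt:P31 wd:Q126443332 ;
        p:P31 ?statement .
  ?statement ps:P31 ?class .
  OPTIONAL { ?statement ?qualifier ?qualifierValue .
             FILTER(STRSTARTS(STR(?qualifier),
                    "http://www.wikidata.org/prop/qualifier/")) }
  OPTIONAL { ?statement prov:wasDerivedFrom ?reference . }
}
"""


def group_by_statement(rows):
    """Collapse flat result rows into one record per statement node,
    de-duplicating the qualifier/reference cross product."""
    grouped = {}
    for row in rows:
        entry = grouped.setdefault(
            row["statement"],
            {"value": row["class"], "qualifiers": [], "references": []},
        )
        qualifier = row.get("qualifier")
        if qualifier:
            pair = (qualifier, row.get("qualifierValue"))
            if pair not in entry["qualifiers"]:
                entry["qualifiers"].append(pair)
        reference = row.get("reference")
        if reference and reference not in entry["references"]:
            entry["references"].append(reference)
    return grouped


# Made-up rows in simplified form; "pq:Pxxx" is a placeholder property.
sample_rows = [
    {"statement": "s1", "class": "wd:Q1371047", "qualifier": "pq:Pxxx",
     "qualifierValue": "wd:Q55654238", "reference": "ref1"},
    {"statement": "s1", "class": "wd:Q1371047", "qualifier": "pq:Pxxx",
     "qualifierValue": "wd:Q55654238", "reference": "ref2"},
    {"statement": "s2", "class": "wd:Q839954"},
]
print(group_by_statement(sample_rows))
```

Keeping the statement node as the grouping key is what preserves the link between a classification, its qualifiers, and the evidence cited for it.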
Wikidata incorporates and references external resources such as YouTube,
OpenStreetMap, and the Irish Sites and Monuments Records through specific properties
known as external identifiers. Those properties are all instances of subclasses of the class
Unique Identifier (Q6545185). For each identifier, queries collect both the reference itself
and its associated qualifiers, which may include retrieval date, publisher, or language,
and in some cases, the classification of the qualifier values. For example, a linked
‘YouTube video’ (P1651) can be queried alongside the video’s ‘publication date’ (P577)
and ‘duration’ (P2047), showing how audiovisual content is anchored in Wikidata with
descriptive and temporal attributes.
A different strategy is required for Wikimedia Commons, since metadata such as
authorship, credit, and licensing is not fully exposed through the Wikidata SPARQL
endpoint. Instead, filenames retrieved from Wikidata must first be normalised, after
which the Wikimedia API is used to fetch extended metadata. HTML fragments are then
parsed to extract structured information, such as artist, credit name and links. While the
technical querying mechanism differs, the goal remains the same: enriching identifiers
with contextual metadata, like date and place of the creation of an image, its author and
license policy.
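The Commons steps (normalise the filename, build the API request, parse the returned HTML fragment) can be sketched with the standard library alone. The function names are hypothetical; the request parameters follow the MediaWiki `imageinfo` module with `iiprop=extmetadata`, and no network call is made here. The file name and HTML fragment in the usage lines are invented examples.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def normalise_filename(value):
    """Reduce a Commons file URL or name to the 'File:Name.ext' title form."""
    name = unquote(value.rsplit("/", 1)[-1])  # drop any URL path, decode %20 etc.
    name = name.replace("_", " ").strip()
    if name.lower().startswith("file:"):
        name = name[5:].strip()
    return "File:" + name


def build_metadata_url(title):
    """Build the API URL that requests extended metadata for one file title."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "imageinfo",
        "iiprop": "extmetadata",
        "format": "json",
    }
    return COMMONS_API + "?" + urlencode(params)


class _TextAndLinks(HTMLParser):
    """Collect visible text and link targets from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text.append(data)


def parse_fragment(fragment):
    """Return (plain text, list of hrefs) for a field such as the artist credit."""
    parser = _TextAndLinks()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.text).strip(), parser.links


title = normalise_filename(
    "http://commons.wikimedia.org/wiki/Special:FilePath/Holy%20well_example.jpg"
)
print(build_metadata_url(title))
print(parse_fragment('<a href="//commons.wikimedia.org/wiki/User:Example">Example User</a>'))
```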
Alongside reified statements and external identifiers, some properties are expressed
as direct statements without additional qualifiers. These include spatial and categorical
attributes such as geographic coordinates (converted into both WKT and DMS formats
for standardised representation), diocese affiliation, and medical condition. For these
cases, a simplified query pattern retrieves the subject–predicate–object triple directly
and supplements it with the ‘instance of’ (P31) classification of the object, ensuring that
even simple statements remain embedded within a structured and interpretable data
model.
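The coordinate conversion mentioned above can be illustrated as follows. This is a sketch under stated assumptions: the function names, the five-decimal WKT precision, and the rounding to whole arc-seconds are choices made here, and the sample coordinates are invented rather than taken from a specific well.

```python
def to_wkt(lat, lon):
    """Return a WKT point; WKT orders coordinates as x (longitude) then y (latitude)."""
    return f"POINT({lon:.5f} {lat:.5f})"


def to_dms(value, positive, negative):
    """Convert decimal degrees to degrees/minutes/seconds with a hemisphere letter.

    Rounding happens once on total arc-seconds, so a value such as 59.9999
    seconds carries over into the minutes instead of printing '60'.
    """
    hemisphere = positive if value >= 0 else negative
    total_seconds = round(abs(value) * 3600)
    degrees, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{degrees}°{minutes:02d}'{seconds:02d}\"{hemisphere}"


# Invented example coordinates.
lat, lon = 52.5, -7.25
print(to_wkt(lat, lon))
print(to_dms(lat, "N", "S"), to_dms(lon, "E", "W"))
```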
CIDOC CRM Ontology Mapping
Translating the acquired information into CIDOC CRM requires the construction of a
custom ontology with mappings. While the ultimate aim is to develop a semi-automated
workflow, we currently produce them manually, providing a valuable “ground truth” for
future experiments with automation. The procedure begins with identifying salient
features of both ontologies: for Wikidata, these include entity Q-codes, labels,
descriptions, ‘instance of’ (P31) values, and property domain and range restrictions,
while for CIDOC CRM they consist of class descriptions and property domain and range
constraints as defined in the documentation. The representative example of
Columbkille’s Well (Q126456441) was modelled according to Table 1:
Table 1 – Correspondences when modelling the relationship ‘instance of’ (P31) in
Wikidata to a CIDOC-compliant ontology for the example entity Columbkille’s Well.

| Wikidata: Well is Instance Of X | Wikidata: X is Subclass Of | Ontology: Well is Instance Of Y | Ontology: Y is Subclass Of |
|---|---|---|---|
| Archaeological site (Q839954) | location of discovery (Q1291195), human-geographic territorial entity (Q15642541), geographic location (Q2221906) | Archaeological Site | E27 Site, E53 Place |
| Water source (Q10713454) | body of water (Q15324), water resource (Q1049799) | Water Source | E27 Site, E53 Place |
| Holy well (Q1371047) | fountain (Q483453), structure of worship (Q1370598), spring (Q124714), water well (Q43483) | Holy Well | Fountain, Water Well (both are subclasses of E25 Human-Made Feature) |
| Holy well semantic concept (Q126443332) | Holy well (Q1371047) | Holy Well Semantic Concept | Holy Well |
This step brings in another level of complexity: the relationships from Wikidata do not
correspond one-to-one to the ones defined in CIDOC, since the latter is event-based, and
most events, like documenting, assessing, naming, or celebrating, are not explicitly
stated in Wikidata. The implied existence of these events has to be recognised by a
human and placed in the correct context, modelling causality and temporality.
Once the relevant classes have been identified, the relationships between their
instances must also be translated. This can be automated through the use of
mappings, i.e. queries that extract data from the relational database created after
querying Wikidata and reshape it into the target ontology structure, thereby materialising
individuals. Although Ontop automates the materialisation step itself, the mappings that
define how the data is transformed (Figure 10) need to be written manually.
The following cases illustrate this modelling process using Columbkille’s Well as an
example. For the ‘named after’ property (P138), the process begins by retrieving the
subject (Columbkille’s Well), predicate and object (Columba), together with the object’s
Q-code (Q236326), class (human) and labels from Wikidata. A corresponding individual
(Columba) is created in the ontology, as an instance of the correct CIDOC class, E21
Person. Columbkille’s Well is identified by its name, which is a new individual instance of
E41 Appellation. This name ‘P67 refers to’ Columba, the person.
The feast day celebration of Columba was ‘P17 motivated by’ Columba and ‘P4 has
timespan’ of the Feast Day of Columba, a new individual instance of E7 Activity. June 9 is
a new individual instance of SP14 Time Expression and ‘Q16 defines time’ of the feast
day without modifying the string returned by the Wikidata query.
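The ‘named after’ pattern described above can be written out as plain triples. This is an illustrative sketch only: the `EX` namespace, the appellation URI scheme, the function names, and the use of `P1_is_identified_by` for "is identified by its name" are assumptions; the project itself materialises triples through Ontop mappings, not through code like this.

```python
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
WD = "http://www.wikidata.org/entity/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "https://example.org/holywells/"  # hypothetical namespace for new individuals


def named_after_triples(well_qid, person_qid):
    """Return the triples for: well identified by a name, name refers to a person."""
    well = WD + well_qid
    person = WD + person_qid
    name = EX + "appellation/" + well_qid  # new E41 Appellation individual
    return [
        (person, RDF_TYPE, CRM + "E21_Person"),
        (name, RDF_TYPE, CRM + "E41_Appellation"),
        (well, CRM + "P1_is_identified_by", name),  # assumed property choice
        (name, CRM + "P67_refers_to", person),
    ]


def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Q-codes for Columbkille's Well and Columba as given in the text.
print(to_ntriples(named_after_triples("Q126456441", "Q236326")))
```

The same shape would extend to the feast day case by adding an E7 Activity individual with its motivation and timespan, which is where human judgement about the implied event is still required.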
For archaeological sites, examples include Toplitsa Cave [15] (Q115), Franchthi Cave
[16] (Q111; Fig. 11 and 12), and Crvena Stijena Cave [17] (Q89). These locations are
relevant because they combine evidence of human occupation with CI tephra, allowing
for inferences about human–environment interactions, chronological anchors, and
cultural responses to volcanic events. Within a CIDOC CRM perspective, these entities
can remain modelled as E53 Place, but require linking to E27 Site (culturally defined
locale) and, where excavation data exist, to CRMarchaeo’s A8 Stratigraphic Unit for layers
containing volcanic ash. Events such as the deposition of tephra or its discovery can
then be represented as E5 Event instances, enabling temporal and causal reasoning.
For geological sites, entries such as DE3 Dehner Maar (Q85), Auel Maar AU3 (Q70),
and Urluia Quarry [18] (Q73) document CI deposits identified through field surveys and
geoscientific sampling [14]. Here, CIDOC CRM alone is insufficient. Integration with
CRMsci is necessary to represent sampling activities (e.g., S4 Observation, S19
Encounter Event) and analytical outcomes, while CRMdig can support provenance of
laboratory processes. This ensures that tephra samples are not merely attributes of
places but are documented as entities generated by scientific procedures with traceable
provenance and reproducibility.
At present, the fuzzy-sl Wikibase captures these sites primarily as labelled places
with coordinates and limited qualifiers. The properties and qualifiers are not yet mapped
to CIDOC CRM, reflecting the still-experimental state of the modelling. Nonetheless, this
dataset highlights a methodological challenge: how to align heterogeneous disciplinary
perspectives on the “same” site. Archaeologists prioritise cultural layers and human
interaction, whereas geoscientists focus on stratigraphic sequences and geochemical
signatures. A federated graph must accommodate both perspectives without flattening
them into an oversimplified schema. The interdisciplinary integration can be achieved by
adopting an event-centric approach:
● The eruption itself is modelled as an E5 Event, producing deposits with a defined
timespan.
● Geological observations (sampling, analysis) are modelled via CRMsci, linked to
both the deposits and the places where they occur.
● Archaeological contexts are represented as E27 Sites, stratigraphic units, or
material culture associations, connected to the same deposits but with additional
cultural interpretation.
By interlinking these elements, the knowledge graph supports queries that traverse
disciplinary boundaries, e.g., “Which archaeological sites with CI deposits coincide with
geologically dated findspots older than ~40,000 yr b2k?” or “Which stratigraphic
contexts combine volcanic ash with artefactual assemblages?” Crucially, this case
demonstrates the role of CIDOC CRM as an interoperability bridge. While E53 Place
suffices as a baseline, interdisciplinary data integration requires layering additional
ontologies. For CI findspots, this means that the same URI may function simultaneously
as an E53 Place, an E27 Site (archaeological site), and the locus of scientific observations
(CRMsci). Such polyhierarchical modelling aligns with the open-world assumption and
preserves the capacity to expand as new data or disciplinary perspectives emerge.
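A small in-memory sketch shows what it means for one URI to carry several types and to be the locus of an observation. Everything here is illustrative: the entity URI pattern for fuzzy-sl, the CRMsci namespace string, the `EX` namespace, and the `observedAt` predicate are assumptions (the last one is a placeholder, not a CRMsci property), and the conceptual mapping in the text is not yet implemented.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
CRMSCI = "http://www.cidoc-crm.org/extensions/crmsci/"  # assumed namespace
EX = "https://example.org/ci/"  # hypothetical namespace

# Q111 is the Franchthi Cave identifier given in the text; the URI pattern is assumed.
SITE = "https://fuzzy-sl.wikibase.cloud/entity/Q111"
OBSERVATION = EX + "observation/1"

graph = set()


def add(subject, predicate, obj):
    """Add one triple; adding never retracts earlier types (open-world style)."""
    graph.add((subject, predicate, obj))


# The same URI is typed from two perspectives.
add(SITE, RDF_TYPE, CRM + "E53_Place")
add(SITE, RDF_TYPE, CRM + "E27_Site")
# A scientific observation points at the same URI via a placeholder predicate.
add(OBSERVATION, RDF_TYPE, CRMSCI + "S4_Observation")
add(OBSERVATION, EX + "observedAt", SITE)


def types_of(node):
    """Return every class asserted for the node."""
    return {o for s, p, o in graph if s == node and p == RDF_TYPE}


def observations_at(node):
    """Return observation nodes linked to the given place."""
    return {s for s, p, o in graph if p == EX + "observedAt" and o == node}


print(sorted(types_of(SITE)))
print(sorted(observations_at(SITE)))
```

Because types are only ever added, a later geoscientific or archaeological contribution can extend the description of the same URI without invalidating existing queries.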
From the perspective of the NFDI4Objects Knowledge Graph, the CI case offers an
opportunity rather than a limitation. Even though the implementation is not yet
operational, the conceptual mapping illustrates how community-curated Wikibase data
can flow into infrastructure-scale graphs. The benefit lies in establishing identifier
discipline (stable URIs, equivalences to Wikidata Q-IDs), documenting provenance of
scientific activities, and ensuring that uncertainty in spatial and chronological resolution
is not collapsed but explicitly represented.
In conclusion, the Campanian Ignimbrite use case underscores the potential of
semi-automated workflows for federated knowledge graph construction. By integrating
fuzzy-sl data into a CIDOC-aligned graph, we not only capture a pivotal Late Pleistocene
volcanic event but also showcase how interdisciplinary collaboration – bridging geology
and archaeology – can be encoded as Linked Open Data. This sets the stage for future
implementation where lab analysis of CI sites becomes FAIR Digital Objects within the
broader federated ecosystem of NFDI and beyond.
Discussion & Conclusion & Outlook
The two case studies highlight both the opportunities and the challenges of
integrating community-curated Wikibase data into infrastructure-scale knowledge
graphs. The Holy Wells illustrate the potential of richly reified cultural heritage
statements, while the Campanian Ignimbrite (CI) findspots expose the difficulties of
aligning archaeological and geoscientific perspectives.
A key challenge lies in ontology alignment and modelling. Wikibase properties and
qualifiers are often too flexible to be mapped directly into CIDOC CRM structures. While
CIDOC CRM provides an event-centric framework, its extensions (CRMarchaeo, CRMsci,
CRMdig) are required to represent stratigraphic reasoning, sampling procedures, and
laboratory provenance.
For the Holy Wells example, the first exploration step revealed a modelling
inconsistency in Wikidata: instances of Holy Well Semantic Concept appear as the
subjects of the ‘feast day’ (P841) property, despite it being restricted to humans, groups
of humans, titles of Mary, legendary figures, attributes of God, Bible stories, or
pericopes. Additionally, Holy Wells are not consistently named after or associated with
Christian religious concepts, nor do they always have a fixed feast day. For example, the
Well of the Rags (Q126647640) is named after a clootie tree (Q107257053) and has a
feast day “Sunday after August 13”. This issue also ties in with the conservation and
usage state qualifiers: according to the project description, if the feast day of a well is
currently being celebrated, it should be marked as being in use (Q55654238) and as an
instance of (P31) a religious site (Q105889895).
Although humans can intuitively understand these associations, modelling them in
CIDOC CRM raises questions: If the celebration is to be modelled as an event, then
additional contextual details are required: Is the celebration motivated by devotion to a
specific religious figure? Does it take place at the well itself, or does it involve the well in
any way? Has the celebration been documented, and if so, by whom and when? Was it
historically tied to a particular period, and is it still practised today? These questions
highlight which contextual information is not explicitly encoded and must therefore be
added when modelling the data in CIDOC, taking the feast day from a simple recurring
point in time to an ontologically appropriate entity that expresses the researchers’
intended semantic construct.
Additionally, the project suggests a simplified modelling strategy that limits state
qualifiers to two options (in use or abandoned), and conservation state qualifiers to
three (preserved, in danger, or demolished/destroyed), even though Wikidata property
constraints allow for a much wider range of values. This strategy implicitly reflects a
closed-world assumption in which the qualifiers are treated as mutually exclusive and
exhaustive: if a well is not in use, it is assumed to be abandoned. LOD operates under an
open-world assumption, where the absence of a qualifier does not imply its negation; the
list of possible states is not seen as exhaustive, and states can co-occur. CIDOC CRM
aligns with the open-world assumption and supports fine-grained modelling of use and
conservation states across time, while also drawing attention to the gaps that hinder
automatic reasoning.
Another challenge is posed by the classification step, where candidate
correspondences are generated by comparing the Wikidata classes against CIDOC
classes. This step proved challenging even for humans, as it requires a nuanced
understanding of the fine-grained distinctions in CIDOC CRM (e.g. E22 Human-Made
Object vs E25 Human-Made Feature or E31 Document vs E73 Information Object), the
implications of CIDOC’s inheritance hierarchy for property domains and ranges, as well
as basic understanding of the modelled subjects and their individual particularities.
Deciding on an appropriate granularity level for the classification step is a further
non-trivial task: too generic, and valuable semantics are lost; too specific, and
interoperability suffers.
The decision-making process and heuristics employed to classify Columbkille’s Well
(Q126456441, Table 1) were based mainly on “common sense” and background
knowledge, which is notoriously flimsy and difficult to encode into a formal decision
graph for computational use. If the process were to be automated, subsequent steps
would involve computing similarity measures for all candidate correspondences,
aggregating these results, and applying thresholds to filter out unreliable pairings. The
mapping would then be refined through iterative cycles, incorporating structural indices,
contextual neighbourhood information, map-discovery and map-repair techniques, until
consistency is reached and all classes find a correspondence. This process would be
applied not only to all 229 subjects in the Wikidata project but also to every entity
appearing as an object in a related statement, qualifier, or reference. The scale and
heterogeneity of this task make it clear that one of the central challenges lies in
developing an automated approach that is simultaneously scalable and reliable.
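The first automation step named above (similarity measures plus a threshold) can be illustrated with a deliberately naive lexical matcher. This is an assumption-laden sketch: the function names, the threshold value, and the choice of `difflib.SequenceMatcher` as the similarity measure are all stand-ins, since the text does not specify which measures would be used.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive lexical similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_matches(source_labels, target_labels, threshold=0.6):
    """For each source label keep the best target label if it clears the threshold."""
    matches = []
    for source in source_labels:
        score, best = max((similarity(source, t), t) for t in target_labels)
        if score >= threshold:
            matches.append((source, best, round(score, 2)))
    return matches


wikidata_classes = ["Place", "human"]
cidoc_classes = ["Person", "Place"]
print(candidate_matches(wikidata_classes, cidoc_classes, threshold=0.9))
```

With a strict threshold, the identical label is matched while the Wikidata class ‘human’ finds no counterpart for E21 Person, which illustrates why purely lexical measures would need to be combined with structural and contextual evidence.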
The second challenge is interdisciplinarity. Archaeologists tend to focus on cultural
layers, artefacts, and human activities, while geoscientists prioritise stratigraphy,
geochemistry, and analytical workflows. Modelling a single location as both an
archaeological site and a geological outcrop requires multi-perspective representations
that can coexist without contradiction.
The CI case shows that the same entity may need to function simultaneously as an
E53 Place, an E27 Site, and the locus of a CRMsci observation. Such polyhierarchical
modelling is conceptually demanding but essential if the federated graph is to support
queries across disciplines. A further issue is uncertainty. Community data often contains
vague coordinates, contested chronologies, or shifting attributions. Unless explicitly
represented, these ambiguities risk being flattened into misleading precision. Handling
“fuzzy” and “wobbly” data requires not only technical modelling patterns but also shared
standards that balance usability with accuracy. Automation and sustainability present
another layer of difficulty. Semi-automated pipelines (SPARQL extraction, RDF
conversion, ETL) are indispensable for scalability, yet they depend on evolving
community schemas. Quality assurance, persistent identifiers, and versioned releases
are needed to ensure that infrastructure knowledge graphs remain stable even as
community instances change dynamically.
In sum, the discussion reveals a tension between community dynamism and
infrastructure stability. Community Wikibases thrive on openness, rapid evolution, and
volunteer contributions; infrastructures such as NFDI4Objects require persistence,
citability, and reliability. Reconciling these modes demands careful governance,
reproducible workflows, and machine-actionable provenance.
Looking forward, the ontology used in the fuzzy-sl Wikibase will require refinement
and closer alignment with CIDOC CRM, MaCHeCO, and the Object Core Metadata Profile.
Semi-automated workflows for extraction and mapping must be stabilised, with each
release packaged as a FAIR Digital Object. More importantly, the development of
community standards for uncertainty and interdisciplinarity will be central to scaling
beyond individual use cases. If these steps are taken, Wikibase instances such as
Wikidata, FactGrid, and fuzzy-sl can evolve from experimental repositories into integral
components of a federated, interdisciplinary knowledge graph ecosystem that bridges
archaeology, cultural heritage, and the geosciences.
Acknowledgements
The authors would like to thank Stephen Stead for his advice as well as the CAA
Germany, SIG Data Dragon and NFDI4Objects Community. The authors acknowledge the
use of language assistance powered by artificial intelligence (ChatGPT, OpenAI) for
stylistic editing and linguistic refinement. All content and arguments were authored and
verified by the authors themselves.
Data, scripts, code, and supplementary information availability
Distel, A.K. et al. (2025). Wikidata:WikiProject HolyWells:
https://www.wikidata.org/wiki/Wikidata:WikiProject_HolyWells ;
Thiery, F. et al. (2025). fuzzy-sl Wikibase: https://fuzzy-sl.wikibase.cloud ;
Thiery, F., & Schenk, F. (2023). Campanian Ignimbrite Geo Locations [DataSet] at
https://github.com/Research-Squirrel-Engineers/campanian-ignimbrite-geo [19];
Stefan, D. (2025). Holy Wells to CIDOC [Software] at
https://github.com/CopyKittyCode/Holy_Wells_to_CIDOC ;
Conflict of interest disclosure
The authors declare that they comply with the PCI rule of having no financial conflicts
of interest in relation to the content of the article.
Funding
The authors declare that they have received no specific funding for this study.
References
[1] L. Rossenova, P. Duchesne, and I. Blümel, ‘Wikidata and Wikibase as complementary
research data management services for cultural heritage data’, in Proceedings of the
3rd Wikidata Workshop 2022 co-located with the 21st International Semantic Web
Conference (ISWC2022), 2022. [Online]. Available:
https://ceur-ws.org/Vol-3262/paper15.pdf
[2] F. Thiery, A. W. Mees, and J. B. Kiesling, ‘Challenges in research community building:
integrating Terra Sigillata (Samian) research into the Wikidata community’, AeC, vol.
34, no. 1, pp. 157–164, 2023, doi: 10.19282/ac.34.1.2023.17.
[3] N. Hartl, E. Wössner, and Y. Sure-Vetter, ‘Nationale Forschungsdateninfrastruktur
(NFDI)’, Informatik Spektrum, vol. 44, no. 5, pp. 370–373, Oct. 2021, doi:
10.1007/s00287-021-01392-6.
[4] L. Rossenova et al., ‘How are NFDI consortia using Knowledge Graphs? An overview
of common functions and challenges by the Working Group “Knowledge Graphs”’, in
Proceedings of the Conference on Research Data Infrastructure 2025, Y. Sure-Vetter
and G. Paul, Eds, Aachen: Squirrel Papers, Aug. 2025, p. 7(5), 𝒬5. doi:
10.5281/zenodo.16736077.
[5] K. Fischer et al., ‘Windows on Data: Federating Research Data with FAIR Digital
Objects and Linked Open Data’, in Proceedings of the Conference on Research Data
Infrastructure 2025, Y. Sure-Vetter and P. Groth, Eds, Aachen: Squirrel Papers, Aug.
2025, p. 7(5), 𝒬3. doi: 10.5281/zenodo.16736221.
[6] F. Thiery et al., ‘Object-Related Research Data Workflows Within NFDI4Objects and
Beyond’, in Proceedings of the Conference on Research Data Infrastructure, Y.
Sure-Vetter and C. Goble, Eds, Hannover: TIB Open Publishing, Sept. 2023, pp.
CoRDI2023-46. doi: 10.52825/cordi.v1i.326.
[7] S. C. Schmidt, F. Thiery, and M. Trognitz, ‘Practices of Linked Open Data in
Archaeology and Their Realisation in Wikidata’, Digital, vol. 2, no. 3, pp. 333–364,
June 2022, doi: 10.3390/digital2030019.
[8] F. Thiery and P. Thiery, ‘Linked Open Ogham. How to publish and interlink various
Ogham Data?’, Archeologia e Calcolatori, vol. 34, no. 1, pp. 105–114, 2023, doi:
10.19282/ac.34.1.2023.12.
[9] F. Thiery, L. Rossenova, D. Mietchen, T. Homburg, and P. Thiery, ‘Distributed Research
Data Knowledge Graphs - Challenges of federated queries using the Wikiverse and
OpenStreetMap within the NFDI Knowledge Graph Ecosystem’, in Proceedings of the
Conference on Research Data Infrastructure 2025, Y. Sure-Vetter and P. Groth, Eds,
Aachen: Squirrel Papers, Aug. 2025, p. 7(5), 𝒬2. doi: 10.5281/zenodo.16736047.
[10] F. Thiery, L. Rossenova, and O. Simons, ‘Wikibase instances in the Cultural Heritage
Domain: Examples from the German humanities NFDI consortia’, Squirrel Papers, vol.
6, no. 4, p. #4, Nov. 2024, doi: 10.5281/zenodo.14055699.
[11] F. Thiery and F. Schenk, ‘Modelling of Uncertainty in Geo Sciences Sites’, Squirrel
Papers, vol. 5, no. 1, p. #4, Dec. 2023, doi: 10.5281/zenodo.10255259.
[12] P. Ó Dálaigh, ‘The Holy Wells of County Kilkenny - Volume 2’, Doctoral thesis, Mary
Immaculate College, University of Limerick, 2018. Accessed: Jan. 11, 2025. [Online].
Available: https://dspace.mic.ul.ie/handle/10395/2584
[13] S. C. Schmidt, F. Thiery, and M. Trognitz, ‘Practices of Linked Open Data in
Archaeology and Their Realisation in Wikidata’, Digital, vol. 2, no. 3, pp. 333–364,
June 2022, doi: 10.3390/digital2030019.
[14] F. Schenk, U. Hambach, S. Britzius, D. Veres, and F. Sirocko, ‘A Cryptotephra Layer in
Sediments of an Infilled Maar Lake from the Eifel (Germany): First Evidence of
Campanian Ignimbrite Ash Airfall in Central Europe’, Quaternary, vol. 7, no. 2, p. 17,
Mar. 2024, doi: 10.3390/quat7020017.
[15] T. Tsanova et al., ‘Upper Palaeolithic layers and Campanian Ignimbrite/Y-5 tephra in
Toplitsa cave, Northern Bulgaria’, Journal of Archaeological Science: Reports, vol. 37,
p. 102912, June 2021, doi: 10.1016/j.jasrep.2021.102912.
[16] F. G. Fedele, B. Giaccio, R. Isaia, and G. Orsi, ‘The Campanian Ignimbrite Eruption,
Heinrich Event 4, and palaeolithic change in Europe: A high-resolution investigation’,
in Geophysical Monograph Series, vol. 139, A. Robock and C. Oppenheimer, Eds,
Washington, D. C.: American Geophysical Union, 2003, pp. 301–325. doi:
10.1029/139GM20.
[17] M. W. Morley and J. C. Woodward, ‘The Campanian Ignimbrite (Y5) tephra at Crvena
Stijena Rockshelter, Montenegro’, Quat. res., vol. 75, no. 3, pp. 683–696, May 2011,
doi: 10.1016/j.yqres.2011.02.005.
[18] F. Thiery and F. Schenk, ‘How to locate the Campanian Ignimbrite site Urluia based
on literature? How to provide and publish this data in a FAIR way?’, Squirrel Papers,
vol. 5, no. 1, p. #5, 2023, doi: 10.5281/zenodo.10262720.
[19] F. Thiery and F. Schenk, ‘Campanian Ignimbrite Geo Locations’, Squirrel Papers, vol. 5,
no. 2, p. #2, 2023, doi: 10.5281/zenodo.10361309.