How do we integrate community data from Wikidata and the fuzzy-sl Wikibase into a cultural heritage knowledge graph?

Author: Stefan, Daria; Thiery, Florian; Schenk, Fiona
Publisher: Zenodo
DOI: 10.5281/zenodo.17333236
Source: https://zenodo.org/records/17333236/files/CAA2025Athens_Wikidata_v1.pdf
Daria Stefan*¹,², Florian Thiery*², Fiona Schenk³,²
¹ TU Wien – Vienna, Austria
² Research Squirrel Engineers Network – Mainz/Vienna, Germany/Austria
³ Johannes Gutenberg Universität Mainz – Mainz, Germany
* Corresponding author(s)
Correspondence: [email protected]; mail@fthiery.de
ABSTRACT
Community-curated Wikibase ecosystems, most notably Wikidata, FactGrid, and
wikibase.cloud instances such as fuzzy-sl, have become significant sources of
Cultural Heritage (CH) and archaeological data. In parallel, infrastructure initiatives
(e.g., NFDI4Objects) are building CIDOC-aligned knowledge graphs that demand
robust provenance, semantic interoperability, and reproducibility. This paper presents
a semi-automated workflow for integrating Wikibase data into infrastructure-scale
graphs, including entity selection, ontology design aligned with CIDOC CRM (with
CRMarchaeo, CRMsci, and CRMdig as needed), scripted RDF transformation, open
publication of code and data, versioned snapshot releases, and ingestion into the
NFDI4Objects Knowledge Graph. The approach preserves Wikidata-style qualifiers
and references while yielding an event- and provenance-centric representation. Two
use cases demonstrate feasibility and limits. The Irish Holy Wells dataset
showcases richly reified statements (e.g., use, sources) mapped to CIDOC events.
The Campanian Ignimbrite findspots in fuzzy-sl focus on location-centric modelling
and interdisciplinarity, requiring E53 Place as baseline with CRMarchaeo/CRMsci
specialisation for stratigraphy, sampling, and analysis. We discuss challenges in
ontology alignment, granularity, explicit treatment of uncertainty (“fuzzy/wobbly”
data), and sustaining semi-automated pipelines amid evolving community schemas.
We argue for identifier discipline, machine-actionable provenance, and FAIR Digital
Objects for each release. The outcome is an interoperable, federated ecosystem,
spanning triplestores, Wikibases, and FDOs, in which community knowledge bases
become partners of infrastructure graphs.
Keywords: Wikibase, Wikidata, Linked Data, Ontology, Archaeology, Geosciences
Introduction
Community-driven repositories and knowledge bases – most prominently Wikidata,
FactGrid, and Wikibase instances on wikibase.cloud such as fuzzy-sl – constitute
substantial reservoirs of cultural heritage (CH) and archaeology-related data [1], [2].
Curated by volunteers, citizen scientists, and independent researchers, these platforms
frequently close gaps left by project-bound initiatives and enrich the broader research
ecosystem with timely, fine-grained observations and links to heterogeneous sources. In
parallel, large-scale infrastructure efforts (e.g., within the German National Research
Data Infrastructure, NFDI [3]) are building knowledge-graph-based services for feeding
the interdisciplinary federated Knowledge Graph Ecosystem [4], [5] (Fig. 1), grounded in
established semantic frameworks (RDF/OWL) and domain ontologies such as CIDOC
CRM and its extensions. A key challenge – and opportunity – lies in integrating
community-driven resources with these infrastructure graphs in a way that is
methodologically sound, FAIR, and sustainable.
Figure 1 - A distributed Knowledge Graph Scheme bringing together Linked Open
Data and FAIR Digital Objects approaches. Florian Thiery & Andreas Noback, CC BY
4.0, via Wikimedia Commons.
Within this landscape, Germany’s NFDI4Objects consortium [6] acts as an
interdisciplinary hub for archaeology and cultural heritage, spanning an exceptionally
broad temporal and material scope. The NFDI4Objects Knowledge Graph¹ already
interlinks diverse datasets² (e.g., Linked Open Ogham, African Red Slip Ware, Linked
Open Samian Ware), employing CIDOC CRM and selected extensions (e.g., CRMarchaeo,
CRMdig) to enable interoperable querying across collections, sites, and research
outputs³. Leveraging Linked Open Data (LOD) practices [7], [8] here is not merely a
technical choice: it underpins reproducible scholarship, traceable provenance, and
cross-dataset discovery. The question, then, is how to interface curated, evolving
community platforms – each with its own modelling idioms and governance – with a
harmonised, CIDOC-aligned knowledge graph at infrastructure scale.
¹ cf. https://graph.nfdi4objects.net/
² cf. https://graph.nfdi4objects.net/collection/
³ cf. https://doi.org/10.5281/zenodo.13946052
Our proposed approach is a semi-automated, six-step workflow that (1) identifies
CH-relevant entities in Wikibase instances, (2) designs or selects an ontology aligned to
CIDOC CRM (supplemented where appropriate), (3) transforms the source data into RDF
via scripted pipelines, (4) publishes code and data for citation and versioning, (5)
releases regular dataset snapshots to ensure currency and persistence, and (6) ingests
these releases into, e.g., the NFDI4Objects Knowledge Graph. This pipeline aims to
respect the open-world assumption of LOD and preserve the rich context of statements
and qualifiers. It references typical Wikidata-style modelling, while producing a coherent,
event- and provenance-centric representation in CIDOC CRM.
The Wikiverse provides complementary affordances in this setting [9], [10]. Wikidata
offers multilingual breadth, mature property constraints, and widely adopted identifiers;
FactGrid supports humanities-oriented projects with detailed historical context; and
fuzzy-sl targets spatial uncertainty explicitly, introducing modelling patterns for
vagueness and ambiguity in findspot descriptions. By triangulating across these
strengths, we can derive a least common denominator of classes and properties that is
both mappable to CIDOC CRM and serviceable for downstream analytical tasks. The
objective is not to erase local nuances but to encode them faithfully, for example, by
transforming qualifiers and reifications into event-centred CIDOC constructs with explicit
timespans, actors, and sources.
Two use cases anchor our contribution. First, the Irish Holy Wells dataset (Wikidata’s
WikiProject Holy Wells⁴) exemplifies richly reified statements that combine status,
conservation, and sourcing practices across heterogeneous evidence (e.g., Ordnance
Survey maps, local histories, Wikimedia Commons media). Here, the task is to preserve
referential integrity (Q/P identifiers) and transform qualifiers into CIDOC CRM event
patterns, thereby enabling temporal reasoning and provenance-aware queries. Second,
the Campanian Ignimbrite (CI) findspots [11] from the fuzzy-sl Wikibase focus on spatial
uncertainty and cross-domain alignment (archaeology–geosciences). In this case,
fuzzy-sl categories and qualifiers can be mapped to CIDOC CRM (with
CRMsci/CRMarchaeo as needed), while preserving uncertainty envelopes and
observational provenance. Both cases stress machine-actionable provenance and
interoperable uncertainty, which are prerequisites for meaningful aggregation across
NFDI4Objects and beyond. Methodologically, three principles guide the integration:
1. Identifier discipline. We retain or reference authoritative URIs (e.g., Wikidata
Q-codes) and, where appropriate, assert equivalences (e.g., owl:sameAs) to
maintain global coherence. This minimises fragile string-matching and ensures
stable joins across sources.
2. Event-centric modelling. Rather than collapsing complex, qualified statements
into timeless triples, we materialise the implied activities (assessment,
documentation, naming, celebration, sampling) as CIDOC CRM events with
agents, timespans, and documentary evidence. This preserves semantics
essential to historical reasoning and data quality assessment.
3. Uncertainty as first-class data. Spatial vagueness, competing attributions, and
evolving states are modelled explicitly (e.g., via condition assessments and
temporally scoped states; fuzzy spatial footprints), aligning with the open-world
assumption and avoiding misleading false precision.
⁴ cf. https://www.wikidata.org/wiki/Wikidata:WikiProject_HolyWells
In sum, we position community knowledge bases not as external “feeds” but as
coequal knowledge partners whose specificity and dynamism enrich infrastructure-scale
graphs. By combining scripted workflows, CIDOC-compliant patterns, and rigorous
identifier strategies, our approach enhances the FAIRness and analytic utility of CH data
while maintaining the flexibility for periodic, automated refreshes that keep
infrastructure graphs in sync with ongoing community efforts. In this way, Wikibase
instances can become integral parts of an interdisciplinary federated Knowledge Graph
ecosystem that spans triplestores, Wikibases, and FAIR Digital Objects.
Use Case: Irish Holy Wells from Wikidata
The WikiProject HolyWells (Q126443484) provides a complex and richly structured
dataset. It combines contributions from both academic researchers and citizen
scientists, complements textual data [12] with richly annotated media from Wikimedia
Commons, and integrates information from heterogeneous sources, both digital and
analogue (Fig. 2 and 3). By using reifications, the project documents the evolution of
conservation and use status, specifying the context in which each condition was
recorded. These features are well suited to demonstrating the process of aligning a flexible
model like Wikidata to an interoperable and consistent semantic framework like CIDOC
CRM, while also exposing the practical challenges that arise when attempting to
automate the transformation of such multifaceted data.

Figure 2 - Example: The Citizen Science Wikidata Project Holy Wells. CC0.
Figure 3 - Freshford: St Lachtain's Well in Wikidata. CC0 and OSM Contributors.
This project aims to develop a semi-automatic ontology design and alignment
workflow that can be implemented entirely with Python libraries. The current workflow
prototype begins by querying Wikidata and storing the data in relational tables within
Postgres. A CIDOC CRM-compliant ontology is designed in Protégé, along with
mappings that associate the data stored in Postgres with its new form. Then the triples
are materialised using the Ontop plugin, creating the final knowledge graph. While the
extraction, transformation, and loading processes can be largely automated using
Python libraries, designing a CIDOC CRM-compliant ontology remains a non-trivial task,
as it requires careful analysis and alignment of the loosely structured information from
Wikidata. In the following sections, we outline how the data was acquired, structured,
and ultimately aligned with CIDOC CRM, as well as the specific challenges we
encountered.
Querying Wikidata
For the data exploration step, we set up a SPARQL endpoint and queried for all triples
with an instance of Holy Well Semantic Concept (Q126443332) as their subject. Holy
Well Semantic Concept is a subclass of Holy Well (Q1371047), and every entry in the
Irish Holy Wells project is an instance of this class.
To ensure both accuracy and interoperability, queries retrieve not only the
human-readable labels of entities and properties but also their unique Wikidata
identifiers: Q-IDs for entities and P-IDs for properties. Embedding these Q-IDs directly
into the URIs of individuals modelled in Protégé while keeping the Wikidata prefix is our
strategy for maintaining referential integrity. Human-readable labels, aliases, and
descriptions are still incorporated as rdfs:label, skos:altLabel, and schema:description
respectively, allowing semantic clarity for human users and multilingual applications.
The initial query retrieved 229 distinct subjects and 37 unique statements. Among the
most frequently used properties are ‘described by source’ (P1343), ‘instance of’ (P31),
‘located in the administrative territorial entity’ (P131), ‘inventory number’ (P217), and
‘collection’ (P195). Wikidata’s data model is built around statements, qualifiers, and
references [13]. Each fact is expressed as a statement, while qualifiers add contextual
conditions and references point to supporting evidence, but are not directly connected to
the subject. All statements were examined for their domain and range constraints to
facilitate semantic alignment and to ensure accurate mapping into a CIDOC CRM-based
ontology.
For some entries, both status information and contextual information are missing
altogether; for others, statuses are declared directly as properties, without further
supporting information. A smaller subset contains reified statements, which allow
qualifiers such as the state of use to be attached to a statement of a property such as
‘instance of’ (P31), and references to be added that name the source of the claim. Both
the classification of the entity and an assessment of its condition are linked to the same
piece of evidence. From a CIDOC CRM perspective, this practice aligns with the
event-centric modelling of knowledge, where the assertion of a property is itself situated
in time, attributed to an actor, and supported by evidence. Queries over such statements
aim to preserve their structure and context, retrieving not only the complete statement,
but also the superclass of the source and all additional information contained on the
source’s page, such as author and publication date.
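A query over reified statements of this kind could look like the sketch below. It is an assumption about how such a query might be written, not the authors' query: it relies on the standard Wikidata RDF vocabulary (p:, ps:, prov:wasDerivedFrom, wikibase:qualifier, wikibase:reference), and the function names build_reified_query and flatten_bindings are hypothetical. The code only builds the query text and flattens a SPARQL JSON result. Sending the request is left out.

```python
import re
from string import Template

_QID = re.compile(r"Q[1-9][0-9]*")
_PID = re.compile(r"P[1-9][0-9]*")

# Statement node -> main value, optional qualifiers, optional references.
_QUERY = Template("""\
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX wikibase: <http://wikiba.se/ontology#>

SELECT ?item ?value ?qualifierProp ?qualifierValue ?refProp ?refValue WHERE {
  ?item wdt:P31 wd:$class_qid ;
        p:$prop_pid ?statement .
  ?statement ps:$prop_pid ?value .
  OPTIONAL {
    ?statement ?pq ?qualifierValue .
    ?qualifierProp wikibase:qualifier ?pq .
  }
  OPTIONAL {
    ?statement prov:wasDerivedFrom ?reference .
    ?reference ?pr ?refValue .
    ?refProp wikibase:reference ?pr .
  }
}
""")


def build_reified_query(class_qid, prop_pid):
    """Return a SPARQL query for one property of all instances of a class."""
    if not _QID.fullmatch(class_qid):
        raise ValueError("expected a Q-ID, got " + repr(class_qid))
    if not _PID.fullmatch(prop_pid):
        raise ValueError("expected a P-ID, got " + repr(prop_pid))
    return _QUERY.substitute(class_qid=class_qid, prop_pid=prop_pid)


def flatten_bindings(response):
    """Turn a SPARQL 1.1 JSON result into a list of {variable: value} dicts."""
    return [{var: cell["value"] for var, cell in row.items()}
            for row in response["results"]["bindings"]]


if __name__ == "__main__":
    # Holy Well Semantic Concept and 'instance of', both taken from the text.
    print(build_reified_query("Q126443332", "P31"))
```

Keeping qualifiers and references in OPTIONAL blocks means that entries with plain, unqualified statements are still returned, which matches the three levels of detail described above.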
Wikidata incorporates and references external resources such as YouTube,
OpenStreetMap, and the Irish Sites and Monuments Records through specific properties known
as external identifiers. Those properties are all instances of subclasses of the class
Unique Identifier (Q6545185). For each identifier, queries collect both the reference itself
and its associated qualifiers, which may include retrieval date, publisher, or language,
and in some cases, the classification of the qualifier values. For example, a linked
‘YouTube video’ (P1651) can be queried alongside the video’s ‘publication date’ (P577)
and ‘duration’ (P2047), showing how audiovisual content is anchored in Wikidata with
descriptive and temporal attributes.
A different strategy is required for Wikimedia Commons, since metadata such as
authorship, credit, and licensing is not fully exposed through the Wikidata SPARQL
endpoint. Instead, filenames retrieved from Wikidata must first be normalised, after
which the Wikimedia API is used to fetch extended metadata. HTML fragments are then
parsed to extract structured information, such as artist, credit name and links. While the
technical querying mechanism differs, the goal remains the same: enriching identifiers
with contextual metadata, like date and place of the creation of an image, its author and
license policy.
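The Commons steps (normalise the filename, prepare the API request, parse the returned HTML fragment) can be sketched with the standard library alone. The sketch makes several assumptions: file values are taken to arrive as Special:FilePath URLs, the request is assumed to use the MediaWiki imageinfo module with extended metadata, and all function names and the example file and user names are hypothetical. No network call is made.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlsplit


def normalise_commons_title(file_url):
    """Turn a Special:FilePath URL into a 'File:...' page title."""
    name = unquote(urlsplit(file_url).path.rsplit("/", 1)[-1])
    return "File:" + name.replace("_", " ").strip()


def commons_api_params(title):
    """Query parameters for an imageinfo request with extended metadata."""
    return {
        "action": "query",
        "prop": "imageinfo",
        "iiprop": "extmetadata",
        "titles": title,
        "format": "json",
    }


class _FragmentParser(HTMLParser):
    """Collect visible text and link targets from a small HTML fragment."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text_parts.append(data)


def parse_fragment(fragment):
    """Return the plain text and the hrefs contained in an HTML fragment."""
    parser = _FragmentParser()
    parser.feed(fragment)
    parser.close()
    text = " ".join("".join(parser.text_parts).split())
    return {"text": text, "links": parser.links}


if __name__ == "__main__":
    # Hypothetical file name and artist markup, for illustration only.
    url = "http://commons.wikimedia.org/wiki/Special:FilePath/Example%20well.jpg"
    print(commons_api_params(normalise_commons_title(url)))
    print(parse_fragment(
        '<a href="//commons.wikimedia.org/wiki/User:Example">Example User</a>'))
```

Artist and credit values are often returned as HTML rather than plain strings, which is why the parsing step keeps link targets alongside the visible text.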
Alongside reified statements and external identifiers, some properties are expressed
as direct statements without additional qualifiers. These include spatial and categorical
attributes such as geographic coordinates (converted into both WKT and DMS formats
for standardised representation), diocese affiliation, and medical condition. For these
cases, a simplified query pattern retrieves the subject–predicate–object triple directly
and supplements it with the ‘instance of’ (P31) classification of the object, ensuring that
even simple statements remain embedded within a structured and interpretable data
model.
CIDOC CRM Ontology Mapping
Translating the acquired information into CIDOC CRM requires the construction of a
custom ontology with mappings. While the ultimate aim is to develop a semi-automated
workflow, we currently produce them manually, providing a valuable “ground truth” for
future experiments with automation. The procedure begins with identifying salient
features of both ontologies: for Wikidata, these include entity Q-codes, labels,
descriptions, ‘instance of’ (P31) values, and property domain and range restrictions,
while for CIDOC CRM they consist of class descriptions and property domain and range
constraints as defined in the documentation. The representative example of
Columbkille’s Well (Q126456441) was modelled according to Table 1:
Table 1 – Correspondences when modelling the relationship ‘instance of’ (P31) in
Wikidata to a CIDOC-compliant ontology for the example entity Columbkille’s Well.
Wikidata: Well is Instance Of X | Wikidata: X is Subclass Of | Ontology: Well is Instance Of Y | Ontology: Y is Subclass Of
Archaeological site (Q839954) | location of discovery (Q1291195), human-geographic territorial entity (Q15642541), geographic location (Q2221906) | Archaeological Site | E27 Site, E53 Place
Water source (Q10713454) | body of water (Q15324), water resource (Q1049799) | Water Source | E27 Site, E53 Place
Holy well (Q1371047) | fountain (Q483453), structure of worship (Q1370598), spring (Q124714), water well (Q43483) | Holy Well | Fountain, Water Well (both are subclasses of E25 Human-Made Feature)
Holy well semantic concept (Q126443332) | Holy well (Q1371047) | Holy Well Semantic Concept | Holy Well
This step brings in another level of complexity: the relationships from Wikidata do not
correspond one-to-one to the ones defined in CIDOC, since the latter is event-based, and
most events, like documenting, assessing, naming, or celebrating, are not explicitly
stated in Wikidata. The implied existence of these events has to be recognised by a
human and placed in the correct context, modelling causality and temporality.
Once the relevant classes have been identified, the relationships between their
instances must also be translated. This can be automated through the use of
mappings: queries that extract data from the relational database created after querying
Wikidata and reshape it into the target ontology structure, thereby materialising
individuals. Although Ontop automates the materialisation step itself, the mappings that
define how the data is transformed (Figure 10) need to be written manually.
The following cases illustrate this modelling process using Columbkille’s Well as an
example. For the ‘named after’ property (P138), the process begins by retrieving the
subject (Columbkille’s Well), predicate and object (Columba), together with the object’s
Q-code (Q236326), class (human) and labels from Wikidata. A corresponding individual
(Columba) is created in the ontology, as an instance of the correct CIDOC class, E21
Person. Columbkille’s Well is identified by its name, which is a new individual instance of
E41 Appellation. This name ‘P67 refers to’ Columba, the person.
The feast day celebration of Columba was ‘P17 motivated by’ Columba and ‘P4 has
time-span’ of the Feast Day of Columba, a new individual instance of E7 Activity. June 9 is
a new individual instance of SP14 Time Expression and ‘Q16 defines time’ of the feast
day without modifying the string returned by the Wikidata query.
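The ‘named after’ pattern can be written down as a handful of triples. The sketch below is a hedged illustration rather than the mapping used in the project: the CIDOC CRM namespace URI and the underscore-style local names are assumptions, the https://example.org/ namespace for the newly minted appellation is hypothetical, and named_after_triples is an invented helper. Q-IDs and the class and property choices follow the text.

```python
CRM = "http://www.cidoc-crm.org/cidoc-crm/"        # assumed namespace URI
WD = "http://www.wikidata.org/entity/"
EX = "https://example.org/holywells/"              # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def named_after_triples(subject_qid, subject_label, person_qid):
    """Expand one 'named after' (P138) statement into the appellation pattern.

    Returns (subject, predicate, object) tuples. The label is kept as a
    plain string, all other terms are URIs.
    """
    well = WD + subject_qid
    person = WD + person_qid
    name = EX + "appellation/" + subject_qid   # new individual for the name
    return [
        (person, RDF_TYPE, CRM + "E21_Person"),
        (name, RDF_TYPE, CRM + "E41_Appellation"),
        (name, RDFS_LABEL, subject_label),
        (well, CRM + "P1_is_identified_by", name),
        (name, CRM + "P67_refers_to", person),
    ]


if __name__ == "__main__":
    for triple in named_after_triples("Q126456441", "Columbkille's Well",
                                      "Q236326"):
        print(triple)
```

The single Wikidata statement becomes five triples because the name is promoted to an individual of its own. That individual is what later allows naming events, sources, or time-spans to be attached to it.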
For archaeological sites, examples include Toplitsa Cave [15] (Q115), Franchthi Cave
[16] (Q111; Fig. 11 and 12), and Crvena Stijena Cave [17] (Q89). These locations are
relevant because they combine evidence of human occupation with CI tephra, allowing
for inferences about human–environment interactions, chronological anchors, and
cultural responses to volcanic events. Within a CIDOC CRM perspective, these entities
can remain modelled as E53 Place, but require linking to E27 Site (culturally defined
locale) and, where excavation data exist, to CRMarchaeo’s A8 Stratigraphic Unit for layers
containing volcanic ash. Events such as the deposition of tephra or its discovery can
then be represented as E5 Event instances, enabling temporal and causal reasoning.
For geological sites, entries such as DE3 Dehner Maar (Q85), Auel Maar AU3 (Q70),
and Urluia Quarry [18] (Q73) document CI deposits identified through field surveys and
geoscientific sampling [14]. Here, CIDOC CRM alone is insufficient. Integration with
CRMsci is necessary to represent sampling activities (e.g., S4 Observation, S19
Encounter Event) and analytical outcomes, while CRMdig can support provenance of
laboratory processes. This ensures that tephra samples are not merely attributes of
places but are documented as entities generated by scientific procedures with traceable
provenance and reproducibility.
At present, the fuzzy-sl Wikibase captures these sites primarily as labelled places
with coordinates and limited qualifiers. The properties and qualifiers are not yet mapped
to CIDOC CRM, reflecting the still-experimental state of the modelling. Nonetheless, this
dataset highlights a methodological challenge: how to align heterogeneous disciplinary
perspectives on the “same” site. Archaeologists prioritise cultural layers and human
interaction, whereas geoscientists focus on stratigraphic sequences and geochemical
signatures. A federated graph must accommodate both perspectives without flattening
them into an oversimplified schema. The interdisciplinary integration can be achieved by
adopting an event-centric approach:
● The eruption itself is modelled as an E5 Event, producing deposits with a defined
timespan.
● Geological observations (sampling, analysis) are modelled via CRMsci, linked to
both the deposits and the places where they occur.
● Archaeological contexts are represented as E27 Sites, stratigraphic units, or
material culture associations, connected to the same deposits but with additional
cultural interpretation.
By interlinking these elements, the knowledge graph supports queries that traverse
disciplinary boundaries, e.g., “Which archaeological sites with CI deposits coincide with
geologically dated findspots older than ~40,000 yr b2k?” or “Which stratigraphic
contexts combine volcanic ash with artefactual assemblages?” Crucially, this case
demonstrates the role of CIDOC CRM as an interoperability bridge. While E53 Place
suffices as a baseline, interdisciplinary data integration requires layering additional
ontologies. For CI findspots, this means that the same URI may function simultaneously
as an E53 Place, an E27 Site (archaeological site), and the locus of scientific observations
(CRMsci). Such polyhierarchical modelling aligns with the open-world assumption and
preserves the capacity to expand as new data or disciplinary perspectives emerge.
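Multiple typing of one URI, and a query that crosses the disciplinary boundary, can be illustrated with a plain set of triples. Everything in the sketch is an assumption made for illustration: the fuzzy-sl entity URI pattern, the CRMsci namespace URI, the use of O8 observed to link an observation to the place, the example.org observation URI, and the function names. The text states that this mapping is not yet implemented.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
CRM = "http://www.cidoc-crm.org/cidoc-crm/"              # assumed namespace
CRMSCI = "http://www.cidoc-crm.org/extensions/crmsci/"   # assumed namespace
FUZZY_SL = "https://fuzzy-sl.wikibase.cloud/entity/"     # assumed URI pattern
EX = "https://example.org/ci/"                           # hypothetical


def describe_findspot(qid, observation_id):
    """Type one URI as both Place and Site and attach one observation."""
    place = FUZZY_SL + qid
    observation = EX + "observation/" + observation_id
    return {
        (place, RDF_TYPE, CRM + "E53_Place"),
        (place, RDF_TYPE, CRM + "E27_Site"),
        (observation, RDF_TYPE, CRMSCI + "S4_Observation"),
        (observation, CRMSCI + "O8_observed", place),
    }


def types_of(graph, subject):
    """All classes asserted for a subject. More may exist (open world)."""
    return {o for s, p, o in graph if s == subject and p == RDF_TYPE}


def observed_sites(graph):
    """Subjects typed E27 Site that are also the target of an observation."""
    sites = {s for s, p, o in graph
             if p == RDF_TYPE and o == CRM + "E27_Site"}
    observed = {o for s, p, o in graph if p == CRMSCI + "O8_observed"}
    return sites & observed


if __name__ == "__main__":
    graph = describe_findspot("Q111", "obs-1")   # Q111: Franchthi Cave
    print(sorted(types_of(graph, FUZZY_SL + "Q111")))
    print(observed_sites(graph))
```

Adding a further type later only adds a triple and never invalidates the existing ones, which is the practical meaning of the open-world alignment noted above.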
From the perspective of the NFDI4Objects Knowledge Graph, the CI case offers an
opportunity rather than a limitation. Even though the implementation is not yet
operational, the conceptual mapping illustrates how community-curated Wikibase data
can flow into infrastructure-scale graphs. The benefit lies in establishing identifier
discipline (stable URIs, equivalences to Wikidata Q-IDs), documenting provenance of
scientific activities, and ensuring that uncertainty in spatial and chronological resolution
is not collapsed but explicitly represented.
In conclusion, the Campanian Ignimbrite use case underscores the potential of
semi-automated workflows for federated knowledge graph construction. By integrating
fuzzy-sl data into a CIDOC-aligned graph, we not only capture a pivotal Late Pleistocene
volcanic event but also showcase how interdisciplinary collaboration – bridging geology
and archaeology – can be encoded as Linked Open Data. This sets the stage for a future
implementation in which lab analyses of CI sites become FAIR Digital Objects within the
broader federated ecosystem of NFDI and beyond.
Discussion & Conclusion & Outlook
The two case studies highlight both the opportunities and the challenges of
integrating community-curated Wikibase data into infrastructure-scale knowledge
graphs. The Holy Wells illustrate the potential of richly reified cultural heritage
statements, while the Campanian Ignimbrite (CI) findspots expose the difficulties of
aligning archaeological and geoscientific perspectives.
A key challenge lies in ontology alignment and modelling. Wikibase properties and
qualifiers are often too flexible to be mapped directly into CIDOC CRM structures. While
CIDOC CRM provides an event-centric framework, its extensions (CRMarchaeo, CRMsci,
CRMdig) are required to represent stratigraphic reasoning, sampling procedures, and
laboratory provenance.
For the Holy Wells example, the first exploration step revealed a modelling
inconsistency in Wikidata: instances of Holy Well Semantic Concept appear as the
subjects of the ‘feast day’ (P841) property, despite it being restricted to humans, groups
of humans, titles of Mary, legendary figures, attributes of God, Bible stories, or
pericopes. Additionally, Holy Wells are not consistently named after or associated with
Christian religious concepts, nor do they always have a fixed feast day. For example, the
Well of the Rags (Q126647640) is named after a clootie tree (Q107257053) and has a
feast day “Sunday after August 13”. This issue also ties in with the conservation and
usage state qualifiers: according to the project description, if the feast day of a well is
currently being celebrated, it should be marked as being in use (Q55654238) and as an
instance of (P31) a religious site (Q105889895).
Although humans can intuitively understand these associations, modelling them in
CIDOC CRM raises questions. If the celebration is to be modelled as an event,
additional contextual details are required: Is the celebration motivated by devotion to a
specific religious figure? Does it take place at the well itself, or does it involve the well in
any way? Has the celebration been documented, and if so, by whom and when? Was it
historically tied to a particular period, and is it still practised today? These questions
highlight which contextual information is not explicitly encoded and must therefore be
added when modelling the data in CIDOC, taking the feast day from a simple recurring
point in time to an ontologically appropriate entity that expresses the researchers’
intended semantic construct.
Additionally, the project suggests a simplified modelling strategy that limits state
qualifiers to two options (in use or abandoned), and conservation state qualifiers to
three (preserved, in danger, or demolished/destroyed), even though Wikidata property
constraints allow for a much wider range of values. This strategy implicitly reflects a
closed-world assumption in which the qualifiers are treated as mutually exclusive and
exhaustive: if a well is not in use, it is assumed to be abandoned. LOD operates under an
open-world assumption, where the absence of a qualifier does not imply its negation; the
list of possible states is not seen as exhaustive, and states can co-occur. CIDOC CRM
aligns with the open-world assumption and supports fine-grained modelling of use and
conservation states across time, while also drawing attention to the gaps that hinder
automatic reasoning.
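The difference between the two assumptions can be made concrete with a small sketch. It is a toy illustration, not part of the workflow: the three-valued Answer type and both function names are hypothetical, and "abandoned" is used as a plain label because the text gives no Q-ID for it. Only the Q-ID for "in use" comes from the text.

```python
from enum import Enum


class Answer(Enum):
    YES = "yes"
    NO = "no"
    UNKNOWN = "unknown"


IN_USE = "Q55654238"      # 'in use', Q-ID as given in the text
ABANDONED = "abandoned"   # placeholder label, no Q-ID given in the text


def abandoned_closed_world(asserted_states):
    """Closed world: the two states are exclusive and exhaustive."""
    return Answer.NO if IN_USE in asserted_states else Answer.YES


def abandoned_open_world(asserted_states):
    """Open world: only an explicit assertion counts as evidence.

    A missing qualifier yields UNKNOWN, and 'in use' does not rule out
    'abandoned', because states may co-occur or refer to different times.
    """
    if ABANDONED in asserted_states:
        return Answer.YES
    return Answer.UNKNOWN


if __name__ == "__main__":
    for states in (set(), {IN_USE}, {IN_USE, ABANDONED}):
        print(sorted(states),
              abandoned_closed_world(states).value,
              abandoned_open_world(states).value)
```

For a well with no state qualifier at all, the closed-world reading reports it as abandoned while the open-world reading reports the state as unknown. This is exactly the kind of gap that automatic reasoning cannot fill on its own.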
Another challenge is posed by the classification step, where candidate
correspondences are generated by comparing the Wikidata classes against CIDOC
classes. This step proved challenging even for humans, as it requires a nuanced
understanding of the fine-grained distinctions in CIDOC CRM (e.g. E22 Human-Made
Object vs E25 Human-Made Feature or E31 Document vs E73 Information Object), the
implications of CIDOC’s inheritance hierarchy for property domains and ranges, as well
as basic understanding of the modelled subjects and their individual particularities.
Deciding on an appropriate granularity level for the classification step is a further
non-trivial task: too generic, and valuable semantics are lost; too specific, and
interoperability suffers.
The decision-making process and heuristics employed to classify Columbkille’s Well
(Q126456441, Table 1) were based mainly on “common sense” and background
knowledge, which is notoriously flimsy and difficult to encode into a formal decision
graph for computational use. If the process were to be automated, subsequent steps
would involve computing similarity measures for all candidate correspondences,
aggregating these results, and applying thresholds to filter out unreliable pairings. The
mapping would then be refined through iterative cycles, incorporating structural indices,
contextual neighbourhood information, map-discovery and map-repair techniques, until
consistency is reached and all classes find a correspondence. This process would be
applied not only to all 229 subjects in the Wikidata project but also to every entity
appearing as an object in a related statement, qualifier, or reference. The scale and
heterogeneity of this task make it clear that one of the central challenges lies in
developing an automated approach that is simultaneously scalable and reliable.
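The first automated step outlined above (score every candidate pair, then drop pairs below a threshold) can be sketched as follows. Label-based string similarity from Python's difflib stands in for whatever similarity measures a real matcher would aggregate. The threshold of 0.8, the function names and the tiny class lists are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def label_similarity(a, b):
    """Case-insensitive string similarity between two labels, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_correspondences(source_labels, target_classes, threshold=0.8):
    """Score all source/target pairs and keep those at or above threshold.

    source_labels: iterable of Wikidata class labels.
    target_classes: dict mapping a CIDOC class code to its label.
    Returns (source label, class code, score) tuples, best score first.
    """
    kept = []
    for source in source_labels:
        for code, target in target_classes.items():
            score = label_similarity(source, target)
            if score >= threshold:
                kept.append((source, code, score))
    return sorted(kept, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    wikidata_labels = ["human", "document"]
    cidoc_classes = {"E21": "Person", "E31": "Document"}
    print(candidate_correspondences(wikidata_labels, cidoc_classes))
```

In this example the pair human / E21 Person, which a human mapper accepts without hesitation, is filtered out because the labels share almost no characters. This shows why the text calls for structural and contextual evidence in later refinement cycles and not labels alone.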
The second challenge is interdisciplinarity. Archaeologists tend to focus on cultural
layers, artefacts, and human activities, while geoscientists prioritise stratigraphy,
geochemistry, and analytical workflows. Modelling a single location as both an
archaeological site and a geological outcrop requires multi-perspective representations
that can coexist without contradiction.
The CI case shows that the same entity may need to function simultaneously as an
E53 Place, an E27 Site, and the locus of a CRMsci observation. Such polyhierarchical
modelling is conceptually demanding but essential if the federated graph is to support
queries across disciplines. A further issue is uncertainty. Community data often contains
vague coordinates, contested chronologies, or shifting attributions. Unless explicitly
represented, these ambiguities risk being flattened into misleading precision. Handling
“fuzzy” and “wobbly” data requires not only technical modelling patterns but also shared
standards that balance usability with accuracy. Automation and sustainability present
another layer of difficulty. Semi-automated pipelines (SPARQL extraction, RDF
conversion, ETL) are indispensable for scalability, yet they depend on evolving
community schemas. Quality assurance, persistent identifiers, and versioned releases
are needed to ensure that infrastructure knowledge graphs remain stable even as
community instances change dynamically.
In sum, the discussion reveals a tension between community dynamism and
infrastructure stability. Community Wikibases thrive on openness, rapid evolution, and
volunteer contributions; infrastructures such as NFDI4Objects require persistence,
citability, and reliability. Reconciling these modes demands careful governance,
reproducible workflows, and machine-actionable provenance.
Looking forward, the ontology used in the fuzzy-sl Wikibase will require refinement
and closer alignment with CIDOC CRM, MaCHeCO, and the Object Core Metadata Profile.
Semi-automated workflows for extraction and mapping must be stabilised, with each
release packaged as a FAIR Digital Object. More importantly, the development of
community standards for uncertainty and interdisciplinarity will be central to scaling
beyond individual use cases. If these steps are taken, Wikibase instances such as
Wikidata, FactGrid, and fuzzy-sl can evolve from experimental repositories into integral
components of a federated, interdisciplinary knowledge graph ecosystem that bridges
archaeology, cultural heritage, and the geosciences.
Acknowledgements
The authors would like to thank Stephen Stead for his advice as well as the CAA
Germany, SIG Data Dragon and NFDI4Objects Community. The authors acknowledge the
use of language assistance powered by artificial intelligence (ChatGPT, OpenAI) for
stylistic editing and linguistic refinement. All content and arguments were authored and
verified by the authors themselves.
Data, scripts, code, and supplementary information availability
Distel, A.K. et al. (2025). Wikidata:WikiProject HolyWells:
https://www.wikidata.org/wiki/Wikidata:WikiProject_HolyWells ;
Thiery, F. et al. (2025). fuzzy-sl Wikibase: https://fuzzy-sl.wikibase.cloud;
Thiery, F., & Schenk, F. (2023). Campanian Ignimbrite Geo Locations [DataSet] at
https://github.com/Research-Squirrel-Engineers/campanian-ignimbrite-geo [19];
Stefan, D. (2025). Holy Wells to CIDOC [Software] at
https://github.com/CopyKittyCode/Holy_Wells_to_CIDOC;
Conflict of interest disclosure
The authors declare that they comply with the PCI rule of having no financial conflicts
of interest in relation to the content of the article.
Funding
The authors declare that they have received no specific funding for this study.
References
[1] L. Rossenova, P. Duchesne, and I. Blümel, ‘Wikidata and Wikibase as complementary
research data management services for cultural heritage data’, in Proceedings of the
3rd Wikidata Workshop 2022 co-located with the 21st International Semantic Web
Conference (ISWC2022), 2022. [Online]. Available:
https://ceur-ws.org/Vol-3262/paper15.pdf
[2] F. Thiery, A. W. Mees, and J. B. Kiesling, ‘Challenges in research community building:
integrating Terra Sigillata (Samian) research into the Wikidata community’, AeC, vol.
34, no. 1, pp. 157–164, 2023, doi: 10.19282/ac.34.1.2023.17.
[3] N. Hartl, E. Wössner, and Y. Sure-Vetter, ‘Nationale Forschungsdateninfrastruktur
(NFDI)’, Informatik Spektrum, vol. 44, no. 5, pp. 370–373, Oct. 2021, doi:
10.1007/s00287-021-01392-6.
[4] L. Rossenova et al., ‘How are NFDI consortia using Knowledge Graphs? An overview
of common functions and challenges by the Working Group “Knowledge Graphs”’, in
Proceedings of the Conference on Research Data Infrastructure 2025, Y. Sure-Vetter
and G. Paul, Eds, Aachen: Squirrel Papers, Aug. 2025, p. 7(5), 𝒬5. doi:
10.5281/zenodo.16736077.
[5] K. Fischer et al., ‘Windows on Data: Federating Research Data with FAIR Digital
Objects and Linked Open Data’, in Proceedings of the Conference on Research Data
Infrastructure 2025, Y. Sure-Vetter and P. Groth, Eds, Aachen: Squirrel Papers, Aug.
2025, p. 7(5), 𝒬3. doi: 10.5281/zenodo.16736221.
[6] F. Thiery et al., ‘Object-Related Research Data Workflows Within NFDI4Objects and
Beyond’, in Proceedings of the Conference on Research Data Infrastructure, Y.
Sure-Vetter and C. Goble, Eds, Hannover: TIB Open Publishing, Sept. 2023, pp.
CoRDI2023-46. doi: 10.52825/cordi.v1i.326.
[7] S. C. Schmidt, F. Thiery, and M. Trognitz, ‘Practices of Linked Open Data in
Archaeology and Their Realisation in Wikidata’, Digital, vol. 2, no. 3, pp. 333–364,
June 2022, doi: 10.3390/digital2030019.
[8] F. Thiery and P. Thiery, ‘Linked Open Ogham. How to publish and interlink various
Ogham Data?’, Archeologia e Calcolatori, vol. 34, no. 1, pp. 105–114, 2023, doi:
10.19282/ac.34.1.2023.12.
[9] F. Thiery, L. Rossenova, D. Mietchen, T. Homburg, and P. Thiery, ‘Distributed Research
Data Knowledge Graphs - Challenges of federated queries using the Wikiverse and
OpenStreetMap within the NFDI Knowledge Graph Ecosystem’, in Proceedings of the
Conference on Research Data Infrastructure 2025, Y. Sure-Vetter and P. Groth, Eds,
Aachen: Squirrel Papers, Aug. 2025, p. 7(5), 𝒬2. doi: 10.5281/zenodo.16736047.
[10] F. Thiery, L. Rossenova, and O. Simons, ‘Wikibase instances in the Cultural Heritage
Domain: Examples from the German humanities NFDI consortia’, Squirrel Papers, vol.
6, no. 4, p. #4, Nov. 2024, doi: 10.5281/zenodo.14055699.
[11] F. Thiery and F. Schenk, ‘Modelling of Uncertainty in Geo Sciences Sites’, Squirrel
Papers, vol. 5, no. 1, p. #4, Dec. 2023, doi: 10.5281/zenodo.10255259.
[12] P. Ó Dálaigh, ‘The Holy Wells of County Kilkenny - Volume 2’, Doctoral thesis, Mary
Immaculate College, University of Limerick, 2018. Accessed: Jan. 11, 2025. [Online].
Available: https://dspace.mic.ul.ie/handle/10395/2584
[13] S. C. Schmidt, F. Thiery, and M. Trognitz, ‘Practices of Linked Open Data in
Archaeology and Their Realisation in Wikidata’, Digital, vol. 2, no. 3, pp. 333–364,
June 2022, doi: 10.3390/digital2030019.
[14] F. Schenk, U. Hambach, S. Britzius, D. Veres, and F. Sirocko, ‘A Cryptotephra Layer in
Sediments of an Infilled Maar Lake from the Eifel (Germany): First Evidence of
Campanian Ignimbrite Ash Airfall in Central Europe’, Quaternary, vol. 7, no. 2, p. 17,
Mar. 2024, doi: 10.3390/quat7020017.
[15] T. Tsanova et al., ‘Upper Palaeolithic layers and Campanian Ignimbrite/Y-5 tephra in
Toplitsa cave, Northern Bulgaria’, Journal of Archaeological Science: Reports, vol. 37,
p. 102912, June 2021, doi: 10.1016/j.jasrep.2021.102912.
[16] F. G. Fedele, B. Giaccio, R. Isaia, and G. Orsi, ‘The Campanian Ignimbrite Eruption,
Heinrich Event 4, and palaeolithic change in Europe: A high-resolution investigation’,
in Geophysical Monograph Series, vol. 139, A. Robock and C. Oppenheimer, Eds,
Washington, D. C.: American Geophysical Union, 2003, pp. 301–325. doi:
10.1029/139GM20.
[17] M. W. Morley and J. C. Woodward, ‘The Campanian Ignimbrite (Y5) tephra at Crvena
Stijena Rockshelter, Montenegro’, Quat. res., vol. 75, no. 3, pp. 683–696, May 2011,
doi: 10.1016/j.yqres.2011.02.005.
[18] F. Thiery and F. Schenk, ‘How to locate the Campanian Ignimbrite site Urluia based
on literature? How to provide and publish this data in a FAIR way?’, Squirrel Papers,
vol. 5, no. 1, p. #5, 2023, doi: 10.5281/zenodo.10262720.
[19] F. Thiery and F. Schenk, ‘Campanian Ignimbrite Geo Locations’, Squirrel Papers, vol. 5,
no. 2, p. #2, 2023, doi: 10.5281/zenodo.10361309.