
How are we standardising metadata in the Physical Sciences Data Infrastructure (PSDI)?

Author: Day, Aileen
Publisher: Zenodo
DOI: 10.5281/zenodo.17592734
Source: https://zenodo.org/records/17592734/files/20251111-PSDIMetadata.pdf
Dr Aileen Day (PSDI metadata lead)
4th Ontologies4Chem workshop (11th November 2025)
Outline
Overview of current PSDI metadata
Onboarding a PSDI resource
Future development: Areas for formal definition
What is PSDI?
UK funded project through the UKRI Digital Research Infrastructure theme (DRI) via EPSRC
Developed out of a community statement of need
Lead institutions: University of Southampton and Science and Technology Facilities Council (STFC)
First launch of resources are now live!
https://www.psdi.ac.uk
PSDI: Filling a Gap in Provision
Other domains have established provision in the UK, e.g.
- EBI in Life Sciences
- NERC Data centres in Environmental Science
- UK Data Archive in Social Science
Other countries have initiatives underway in this domain, e.g.
- USA: Materials Genome Initiative
- Japan: NIMS
- European data infrastructures, e.g. E-CAM, MaX and NOMAD
- German National Research Data Infrastructure (NFDI)
We are building a UK, Physical Science, Data Infrastructure
- Supporting UK Chemistry, Materials and related disciplines
- Interfacing to other domains: e.g. Life, Medical, Engineering and Environmental Sciences
- Interfacing with international initiatives

The glue that brings PSDI together: PSDI metadata
https://data-search.psdi.ac.uk/
https://resources.psdi.ac.uk/
PSDI Metadata
Top-level PSDI metadata Documentation
PSDI Metadata – top-down approach
- Metadata about resources
- Metadata about properties within resources
- Metadata about data within resources
- Metadata about data provenance
- Metadata that supports PSDI Resource Catalogue
- Metadata that supports PSDI Cross Data Search
PSDI Metadata – using existing standards
Wherever possible we use existing standards, ontologies, vocabularies, community best practices
e.g. Following CDIF1,2 recommendations
CDIF is “first and foremost a practical exercise: how can we find a set of common standards and technology approaches that will enable us to implement FAIR for cross-domain scenarios using things which exist today”
1. D2.3 Cross-Domain Interoperability Framework CDIF Report (Synthesising Recommendations for Disciplines and Cross-Disciplinary Research Areas report) is a recent WorldFAIR output
2. https://cross-domain-interoperability-framework.github.io/cdifbook/introduction.html
PSDI Metadata – DCAT Resource Catalogue
What we Provide - PSDI (https://resources.psdi.ac.uk/)
Resources, descriptions and details are retrieved from DCAT jsonld metadata:
https://metadata.psdi.ac.uk/psdi-dcat.jsonld
DCAT format (https://www.w3.org/TR/vocab-dcat-3/)
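A minimal sketch of how resources, descriptions and details might be read out of DCAT JSON-LD metadata. The inline sample document, its example.org identifiers, and the assumption that entries sit in a top-level `@graph` list with `dct:title` and `dct:description` keys are all illustrative; the real psdi-dcat.jsonld may be laid out differently.

```python
import json

# Hypothetical, heavily simplified stand-in for a DCAT JSON-LD catalogue.
SAMPLE_JSONLD = """
{
  "@context": {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/"
  },
  "@graph": [
    {"@id": "https://example.org/catalogue", "@type": "dcat:Catalog",
     "dct:title": "Example catalogue"},
    {"@id": "https://example.org/resource/a", "@type": "dcat:Dataset",
     "dct:title": "Resource A", "dct:description": "An example dataset."},
    {"@id": "https://example.org/resource/b", "@type": "dcat:DataService",
     "dct:title": "Resource B", "dct:description": "An example service."}
  ]
}
"""


def list_resources(jsonld_text, wanted_types=("dcat:Dataset", "dcat:DataService")):
    """Return (id, title, description) for every node of a wanted type."""
    graph = json.loads(jsonld_text).get("@graph", [])
    found = []
    for node in graph:
        node_types = node.get("@type", [])
        if isinstance(node_types, str):  # JSON-LD allows a string or a list
            node_types = [node_types]
        if any(t in wanted_types for t in node_types):
            found.append(
                (node["@id"], node.get("dct:title"), node.get("dct:description"))
            )
    return found


resources = list_resources(SAMPLE_JSONLD)
for identifier, title, description in resources:
    print(f"{title}: {description} <{identifier}>")
```

Matching on compact names such as `dcat:Dataset` only works while the prefixes in `@context` stay fixed; a JSON-LD processor that expands terms to full IRIs would be more robust.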
Onboarding a PSDI resource step 2: resource description for catalogue
Currently gather resource theme and resource descriptions in Excel templates, collaboratively with resource owners
Python scripts feed these into psdi-dcat.jsonld
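The template-to-metadata step could look roughly like the sketch below. It is not the PSDI scripts: it assumes the template has been exported to CSV, and the column names, the example.org IRIs and the choice of `dcat:Dataset` for every row are hypothetical.

```python
import csv
import io
import json

# Hypothetical CSV export of a resource-description template.
TEMPLATE_CSV = """identifier,title,description,theme
resource-a,Resource A,An example dataset.,https://example.org/subject/chemistry
resource-b,Resource B,An example service.,https://example.org/subject/materials
"""

BASE_IRI = "https://example.org/resource/"  # placeholder, not a PSDI IRI


def rows_to_dcat(csv_text):
    """Convert template rows into a DCAT JSON-LD document (as a dict)."""
    graph = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        graph.append(
            {
                "@id": BASE_IRI + row["identifier"],
                "@type": "dcat:Dataset",
                "dct:title": row["title"],
                "dct:description": row["description"],
                "dcat:theme": {"@id": row["theme"]},
            }
        )
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@graph": graph,
    }


catalogue = rows_to_dcat(TEMPLATE_CSV)
print(json.dumps(catalogue, indent=2))
```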
Onboarding a PSDI resource: Step 3
[Diagram: PSDI terminology (SKOS vocabulary), PSDI Resource Catalogue (DCAT resource catalogue), PSDI Cross Data Search (OPTIMADE)]
PSDI Cross Data Search
PSDI Cross Data Search uses OPTIMADE
We require the resource data to align with OPTIMADE core properties and PSDI namespace properties for maximum searchability…

PSDI Cross Data Search
Align with OPTIMADE properties in:
- structures endpoint (https://schemas.optimade.org/defs/v1.2/entrytypes/optimade/structures)
- references endpoint (https://schemas.optimade.org/defs/v1.2/entrytypes/optimade/references)
PSDI – find a substance
What is a substance? Everything made up of elements.
Searching on OPTIMADE structure endpoint property:
- descriptive chemical formula (chemical_formula_descriptive)
This is the most widely-populated chemical formula property but others are available for searching in the “Advanced Search”:
- reduced chemical formula (chemical_formula_reduced)
- Hill chemical formula (chemical_formula_hill)
- anonymous chemical formula (chemical_formula_anonymous)
OPTIMADE provides comprehensive definitions of core properties.
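As an illustration of searching on these formula properties, the sketch below builds OPTIMADE filter strings and a query URL for a structures endpoint. The base URL is a placeholder rather than a PSDI address, and using a substring match for the descriptive formula is an assumption about how such a search might be phrased, not a description of the PSDI Cross Data Search implementation.

```python
from urllib.parse import urlencode

# Placeholder base URL: not a real PSDI or OPTIMADE provider address.
BASE_URL = "https://example.org/optimade/v1"


def formula_filter(formula, prop="chemical_formula_descriptive"):
    """Return an OPTIMADE filter string for one chemical formula property."""
    if prop == "chemical_formula_descriptive":
        # Descriptive formulae are free-form, so match on a substring (assumption).
        return f'{prop} CONTAINS "{formula}"'
    # Reduced, Hill and anonymous formulae have a fixed form, so compare exactly.
    return f'{prop}="{formula}"'


def structures_query(filter_expression, base_url=BASE_URL):
    """Build a GET URL for the structures endpoint with a URL-encoded filter."""
    return f"{base_url}/structures?{urlencode({'filter': filter_expression})}"


print(structures_query(formula_filter("H2O")))
print(structures_query(formula_filter("A2B", prop="chemical_formula_anonymous")))
```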
PSDI – find a substance
What is a substance? Everything made up of elements.
Searching on OPTIMADE structure endpoint property:
- number of elements (nelements)
- elements
- elements ratios (elements_ratios)
If it is a crystal structure:
Searching on custom properties for cell parameters: a, b, c, alpha, beta, gamma. These will be linked up with definitions in the cif namespace (note that OPTIMADE captures the alternative lattice vectors (lattice_vectors) which we do not use but can be searched on in Advanced Search).
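A sketch of how element and cell-parameter searches might be phrased as OPTIMADE filters. The `elements` and `nelements` properties are OPTIMADE core properties; the `_exmpl_cell_` prefix is a made-up provider namespace used only for illustration, since the actual names of the custom cell-parameter properties are not given here.

```python
def element_filter(elements, exact=True):
    """Filter on element symbols; with exact=True also pin the element count."""
    quoted = ", ".join(f'"{symbol}"' for symbol in elements)
    expression = f"elements HAS ALL {quoted}"
    if exact:
        expression += f" AND nelements={len(elements)}"
    return expression


def cell_parameter_filter(name, value, tolerance=0.01, prefix="_exmpl_cell_"):
    """Range filter on a hypothetical custom cell-parameter property."""
    prop = f"{prefix}{name}"
    lower = value - tolerance
    upper = value + tolerance
    return f"{prop}>={lower:.4f} AND {prop}<={upper:.4f}"


# Combine both conditions; parentheses keep the two parts grouped.
combined = f"({element_filter(['Si', 'O'])}) AND ({cell_parameter_filter('a', 5.43)})"
print(combined)
```

A range with a tolerance is used for the cell parameter because exact equality on floating-point values rarely matches in practice.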
Outline
Overview of current PSDI metadata
Onboarding a PSDI resource
Future development: Areas for formal definition
- Resource subject classification (in progress)
- Substance definition
- Units
- Source of funding/project identifiers

Resource subject categorisation
Aim: review choice of dcat:themeTaxonomy (taxonomy/vocabulary/ontology/classification system) to describe the dcat:theme (research topic or scientific discipline classification) of PSDI resource themes and resources. This is used:
1. In the DCAT metadata which describes the catalogue of resources and resource themes
2. In the PSDI What We Provide resource catalogue to facilitate resource filtering
3. In the PSDI Cross Data Search to facilitate data source filtering
Currently using EuroSciVoc but this decision was taken quickly and does not include implementation of hierarchy
Note that while DCAT calls this field dcat:theme, we will refer to it explicitly as dcat:theme or “subject” here (rather than just “theme”) to avoid confusion with resource theme
Current resource subject categorisation
1. In the DCAT metadata which describes the catalogue of resources and resource themes: https://metadata.psdi.ac.uk/psdi-dcat.jsonld
2. In the PSDI What We Provide resource catalogue to facilitate resource filtering (currently the terms themselves are shown with no hierarchy)
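To make the roles of dcat:themeTaxonomy and dcat:theme concrete, here is a small sketch of flat (non-hierarchical) subject filtering over a catalogue fragment. The fragment, its nesting under `dcat:dataset` and all example.org IRIs are hypothetical and do not reproduce psdi-dcat.jsonld.

```python
# Hypothetical catalogue fragment; all IRIs below are placeholders.
CATALOGUE = {
    "@id": "https://example.org/catalogue",
    "@type": "dcat:Catalog",
    # The classification system that the dcat:theme values are drawn from.
    "dcat:themeTaxonomy": {"@id": "https://example.org/subject"},
    "dcat:dataset": [
        {
            "@id": "https://example.org/resource/a",
            "dct:title": "Resource A",
            "dcat:theme": [{"@id": "https://example.org/subject/chemistry"}],
        },
        {
            "@id": "https://example.org/resource/b",
            "dct:title": "Resource B",
            "dcat:theme": [
                {"@id": "https://example.org/subject/chemistry"},
                {"@id": "https://example.org/subject/materials"},
            ],
        },
    ],
}


def titles_for_subject(catalogue, subject_iri):
    """Return titles of resources whose dcat:theme includes subject_iri."""
    titles = []
    for resource in catalogue.get("dcat:dataset", []):
        subjects = {theme["@id"] for theme in resource.get("dcat:theme", [])}
        if subject_iri in subjects:
            titles.append(resource["dct:title"])
    return titles


print(titles_for_subject(CATALOGUE, "https://example.org/subject/materials"))
```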
Resource subject categorisation requirements
Must have:
1. Intuitive: straightforward for resource-owners and users to understand and navigate
2. Physical Sciences scope: includes the whole of the physical sciences, not just part of it
3. Maintained and versioned: updated as research evolves (so that new terms are added and old terms are still preserved)
Nice to have:
1. Persistent identifiers: for terms (we could formalise an existing classification system into a skos vocabulary with persistent identifiers if it does not currently have this format)
2. Definitions: term definitions to make it clearer to navigate and understand
3. Hierarchical: so that it can provide different levels of detail as required
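The three nice-to-have features (persistent identifiers, definitions, hierarchy) map naturally onto SKOS properties. The sketch below models a few concepts as plain dictionaries keyed by identifier and walks `skos:broader` links to give the different levels of detail. The IRIs and definitions are placeholders, not terms from any of the classification systems under review.

```python
# Hypothetical SKOS-style concepts keyed by placeholder persistent identifiers.
CONCEPTS = {
    "https://example.org/subject/physical-sciences": {
        "skos:prefLabel": "physical sciences",
        "skos:definition": "Placeholder definition.",
        "skos:broader": None,
    },
    "https://example.org/subject/chemistry": {
        "skos:prefLabel": "chemistry",
        "skos:definition": "Placeholder definition.",
        "skos:broader": "https://example.org/subject/physical-sciences",
    },
    "https://example.org/subject/organic-chemistry": {
        "skos:prefLabel": "organic chemistry",
        "skos:definition": "Placeholder definition.",
        "skos:broader": "https://example.org/subject/chemistry",
    },
}


def path_from_top(concept_iri, concepts=CONCEPTS):
    """Return preferred labels from the top-level concept down to concept_iri."""
    labels = []
    current = concept_iri
    while current is not None:
        concept = concepts[current]
        labels.append(concept["skos:prefLabel"])
        current = concept["skos:broader"]  # None once the top level is reached
    return list(reversed(labels))


print(" > ".join(path_from_top("https://example.org/subject/organic-chemistry")))
```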
Subject classification selection process
Review different subject classification systems according to requirements
Test shortlisted candidates
1. How well do they accommodate the PSDI resources as of summer 2025?
Description of review will shortly be published on Zenodo in the PSDI community (https://zenodo.org/communities/psdi/records?q=&l=list&p=1&s=10&sort=newest)

Resource subject categorisation candidates investigated
- Nature Physical Sciences categories
- FORD (Fields of Research and Development)/FOS (Fields of Study) from Frascati Manual
Also useful to see various ways of organising subject and good ways of doing things:
- organise consistently by: Discipline (e.g. chemistry), Sub-discipline (e.g. organic chemistry), Specific field
- Include “multidisciplinary” and “other” at top level
Shortlisted candidates

Classification system: EuroSciVoc
Pros:
- Top level is well established (FORD/FOS), granularity added by NLP
- Used by European Union to classify and search on research projects in CORDIS
- We are already using it in PSDI
- Semantic with pids
Cons:
- Top-level is a bit confusing, so would omit in PSDI
- Confusing definition of “physical sciences” in PSDI context
- No definitions

Classification system: The Modern Science Ontology (modsci)
Pros:
- Used by SciData which is one option for PSDI to consider for data-level metadata
- Very semantic (BFO ontology) with pids
- Some definitions
Cons:
- Ontology does not just describe scientific disciplines, but also discoveries, phenomena, scientists, instruments
- Top level “Applied Sciences”/“Formal Science”/“Interdisciplinary Science”/“Natural Sciences” is maybe not [...] so would omit in PSDI
- Good representation of Chemistry but can be patchy coverage elsewhere e.g. “Computer Science”

Classification system: OpenAlex Topics
Pros:
- Was Fields of study taxonomy within Microsoft Academic Graph (MAG)
- Now used to categorise OpenAlex
- Top level domains, fields and subfields correspond to Elsevier’s Scopus’s All Science Journal Classifications (ASJC) structure and are very intuitive and organised
- There is accompanying open source code to assign topic based on title and abstract
- Descriptions
- Fields, subfields and topics can be linked to via OpenAlex API (e.g. https://openalex.org/topics/T10464)
Cons:
- Bottom level (topic) is arguably too detailed and specific
- Bottom level (topic) is generated by an LLM and assigned to the higher level hierarchy - some inconsistencies at this level e.g. the best description of crystallography (10464 Crystal structures of chemical compounds) sits under 1604 Inorganic Chemistry rather than Organic chemistry

Would be most work to implement
Would require ontology development
Inconsistencies of LLM-generated bottom-level (revisit in time)
Resource subject categorisation: Next steps
- Review current EuroSciVoc term assignment to resource themes and resources with PSDI project partners and update psdi-dcat.jsonld accordingly
- Add mapping of dcat:theme values to their containing folders (EuroSciVoc subset) in metadata.psdi.ac.uk to help with this
- Adapt “PSDI What We Provide” resource catalogue filters to show hierarchical categories of subject
- Add the same filters to PSDI Cross Data search when selecting Data Sources to search
- Publish PSDI_Subject_Requirements.docx on Zenodo and link it to DCAT metadata documentation
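One way the planned hierarchical filters could behave is sketched below: selecting a subject also matches resources tagged with any term contained within it. The parent mapping and the resource assignments are invented for illustration and are not taken from EuroSciVoc or from the PSDI catalogue.

```python
# Hypothetical mapping from each subject term to its containing (parent) term.
PARENT = {
    "organic chemistry": "chemistry",
    "inorganic chemistry": "chemistry",
    "chemistry": None,
    "materials": None,
}

# Hypothetical subject assignments for three resources.
RESOURCE_SUBJECTS = {
    "Resource A": {"organic chemistry"},
    "Resource B": {"inorganic chemistry", "materials"},
    "Resource C": {"materials"},
}


def is_within(subject, selected, parent=PARENT):
    """True if subject equals selected or sits anywhere beneath it."""
    current = subject
    while current is not None:
        if current == selected:
            return True
        current = parent.get(current)
    return False


def filter_resources(selected, assignments=RESOURCE_SUBJECTS):
    """Return sorted resource names tagged with selected or a narrower term."""
    return sorted(
        name
        for name, subjects in assignments.items()
        if any(is_within(subject, selected) for subject in subjects)
    )


print(filter_resources("chemistry"))
print(filter_resources("materials"))
```

Keeping the parent mapping separate from the resource assignments means the hierarchy can be revised without re-tagging individual resources.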