Document information
Grant agreement: 101137656
Project title: Underpinning infrastructure for the efficient and flexible analysis of large climate datasets
Project acronym: EXPECT
Project start date: 1 April 2024
Related work package: WP7
Related task(s): 7.1, 7.2, 7.3
Lead organisation: DKRZ
Authors: Stephan Kindermann, Daniel Fulla, …
Submission date: 30 September 2025
Dissemination level: PU (Public)
History
Date | Submitted by | Reviewed by | Notes
17/07/2025 | First draft, S. Kindermann, D. Fulla | - | First draft
29/08/2025 | Stephan Kindermann, Daniel Fulla, Pierre Antoine Bretonniere, Marco Puccini, Serena Lorenzini | To be reviewed by Expect Theme 4 | With comments to be addressed
08/08/2025 | Theme4 | Christian Lessig, Bernd Funke | Version submitted to external reviewers
30/09/2024 | Daniel Fulla et al. | Christian Lessig, Bernd Funke | Final version with comments from reviewers addressed
Disclaimer: Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Climate, Infrastructure and Environment Executive Agency (CINEA). Neither the European Union nor the granting authority can be held responsible for them.
Table of contents
About
Executive summary
Requirements on infrastructure from partners’ feedback
EXPECT Infrastructure
Data Storage and Transfer
Data Storage and transfer at DKRZ
Data storage and transfer at CINECA
Data Processing
Data Processing at DKRZ
Data processing at CINECA
Expect Catalog (of Catalogs)
Data analysis workflows
Summary
Future Work
Appendix
List of tables
Table 1
Table 2
List of figures
Figure 1 EXPECT Infrastructure Overview
Figure 2 Cloud data storage setup
Figure 3 JupyterHub at DKRZ
Figure 4 Multi-user Dask gateway implementation
Figure 5 Dask data processing and diagnostics
Figure 6 The EXPECT STAC catalog of catalogs
Figure 7 Distributed data analysis support
About
The climate system is changing rapidly and some regions have seen increases in extremes beyond what is expected from climate model simulations. To support targeted climate adaptation strategies, EXPECT will enable trustworthy assessments and predictions of regional climate change including extremes by developing a prototype operational capability for integrated attribution and prediction of climate. This ambitious goal is closely aligned with the WCRP Lighthouse Activity on Explaining and Predicting Earth System Change.
EXPECT will identify and quantify the mechanisms by which physical processes govern regional climatic changes, including extremes, on inter-annual to multi-decadal time scales. It will do so by exploiting newly available climate simulations and Earth Observations (EOs), and by combining machine learning (ML) with physical methods.
The research will target fundamental knowledge gaps related to atmospheric circulation and land-atmosphere interactions, which represent major limitations in current climate predictions and projections, and in particular in understanding changes in European summer extremes.
To underpin the research, and benefiting the wider research community, EXPECT will develop tools to efficiently analyze a variety of large data sets in combination that are hosted in different repositories across institutions. This will facilitate the exploitation of recent investments into high-resolution climate models and EO data. EXPECT will further build data science capacity for the scientifically robust, efficient and reproducible analysis of the massive data assets, including novel ML approaches, and provide training to the climate science community and the next generation of researchers in particular. EXPECT will thus deliver significant scientific and technological advances for society and the climate science community that will last well beyond the project, in support of WCRP’s strategic objectives.
Executive summary
This deliverable reports work done as part of work package 7, which establishes the foundational infrastructure for distributed data analytics within the EXPECT project. The goal is to enable future cross-institutional, scalable and FAIR data workflows that support diverse scientific goals of EXPECT. The objectives focus on integrating distributed compute and storage resources, enabling data ingestion from key climate datasets, and developing unified catalogues and tools for efficient, location-aware analysis.
Over the first 18 months, progress has been made in gathering partner requirements, tracking dataset usage, and integrating institutional infrastructures across major European centers such as DKRZ and CINECA. A first version of processing services is available for user testing, and interactive computing platforms such as JupyterHub have been deployed to support user workflows. A first version of an integrated data catalogue system (integrating externally managed catalogs like ESGF) is in place.
This deliverable presents an overview of the status of the integration of infrastructure components enabling workflows that meet user needs and enable future cross-site interoperability. It also outlines planned enhancements to improve data staging, federated catalogues, and hybrid cloud-HPC processing models.
These developments contribute to a growing ecosystem of European climate data infrastructure, complementing and building upon initiatives such as ESGF, Copernicus, and Pangeo. The EXPECT project’s distributed approach aligns with current EU priorities for FAIR data and scalable, collaborative research environments.
The technical solutions proposed in this deliverable are based on the requirements expressed by the partners of the consortium in the context of task 7.1, whose conclusions were presented in milestone M10 “Collecting requirements for shared data infrastructure and distributed processing (architecture, services)” delivered at M12.
Requirements on infrastructure from partners’ feedback
In M10, available on the project wiki [1], the users were asked about the datasets they were using and planning to use during the project, considering their volume, where they were planning to access the data and with which tools, programming languages, and protocols. The main conclusions and how they have been taken into account in the design of the infrastructure are described in this document.
The principal datasets that people are planning to use come from ESGF (CMIP6, LSFMIP), the Copernicus Climate Data Store (CDS) or MARS (ERA5 or MARS seasonal forecasts) and some other external high-resolution climate model data (Climate Digital Twin DestinE, EERIE, NextGEMS). CMIP6 and ERA5, which are available in Jasmin [2] and DKRZ, are the most requested datasets.
Users either download the data from the CDS [3] and ESGF (~60%) and MARS [4] (~25%) to their local machines, or take advantage of the prefetched data from institutional repositories such as CEDA, DKRZ, BSC. Data access (bandwidth and complexity) is the main driver for choosing where and how to run the analysis. This clearly indicates the usefulness of proposing the CINECA computing platform in addition to the BADC and DKRZ ones already mentioned. The currently prevalent approach of downloading the data to local disk for analysis also shows the importance and future performance benefits of the infrastructure described below. Through its online processing capacities, it is expected to significantly increase the performance of the analysis by avoiding the overhead of the download.
To support effective collaboration between infrastructure partners (BSC, DKRZ, UREAD, CINECA), the project organizes regular coordination meetings both within and across infrastructure work packages, as well as broader project meetings such as the Theme 4 kickoff, Executive Board meetings, and the EXPECT General Assembly.
Bilateral meetings between specific partners, for example, DKRZ and CINECA, are organized whenever close alignment is required. Collaborative development can further be supported through shared code repositories hosted on the DKRZ and BSC GitLab platforms, ensuring transparent version control and facilitating streamlined contributions across all teams.
[1] https://earth.bsc.es/expect/lib/exe/fetch.php?media=wiki:content:expect_user_survey-m10.pdf
[2] High-performance data analysis and storage platform developed and operated by the UK Centre for Environmental Data Analysis (CEDA)
[3] Climate Data Store
[4] Meteorological Archival and Retrieval System
EXPECT Infrastructure
An overview of the distributed infrastructure supporting climate analysis workflows in EXPECT is provided in Figure 1. The infrastructure is initially built around the three core data centers DKRZ, BSC and CINECA. DKRZ and CINECA provide storage and compute resources together with a range of associated data and compute services. These will be progressively integrated and extended as part of EXPECT. BSC complements these resources by hosting additional internal data servers. It also acts as a bridge to the evolving DestinE data infrastructure. The data storage, transport and compute services available at these sites, which need to be integrated to enable cross-institutional workflows, are summarized in the following subsections.
Figure 1 EXPECT Infrastructure Overview illustrates the distributed infrastructure connecting DKRZ, CINECA, and BSC, highlighting their complementary storage, compute, and data services.
To exploit DKRZ and CINECA resources as part of EXPECT workflows, scientists need to register at the sites and get access rights to resources and services:
To get an account at DKRZ and join the EXPECT project (project code: bk1444), users should first obtain a DKRZ user account by registering through the official portal. Once the account is approved, they should log in to the DKRZ user portal (LUV) and submit a request to join the EXPECT project via the project application interface.
At CINECA, users should begin by registering in the User Database (UserDB). After completing the UserDB registration and receiving approval from CINECA, users may have access to the EXPECT project systems hosted there.
With an account at either of the compute centers, users can access data storage, transfer data between sites, and use data processing capabilities. Work towards integration into a federated AAI [5] infrastructure (e.g. based on the EGI AAI [6] and aligned with EOSC AAI architecture efforts [7]) is not part of EXPECT, yet efforts are underway as part of other projects which involve EXPECT partners (especially DKRZ and BADC [8]/UKRI [9], e.g. in RI-SCALE [10]).
BSC provides additional infrastructure for standardized storage on disk and tapes, available exclusively to internal users. This infrastructure supports datasets such as CMIP6, CDS seasonal forecasts, and ERA5. Additionally, a dedicated workflow has been implemented to analyze DestinE data stored on the MareNostrum5 data bridge.
Data Storage and Transfer
Climate data analysis activities involve high-volume data collections spread over distributed and different types of storage systems. On the one hand, there is a need for fast, high-throughput storage access (e.g. based on disk or SSD hardware and associated high-performance file systems), which is associated with (e.g. HPC) compute resources. On the other hand, there is the need to temporarily share, transfer and store large amounts of data, which requires automatable data transfer steps and associated data sharing systems, e.g. based on cloud storage. Additionally, there is the need to interact with long-term archived data and thus access long-term preserved data residing on e.g. tape-based storage systems.
[5] Authentication and Authorization Infrastructure
[6] EGI AAI: https://www.egi.eu/service/check-in-internal/
[7] EOSC AAI architecture task force: https://eosc.eu/advisory-groups/aai-architecture/
[8] British Atmospheric Data Centre
[9] UK Research and Innovation
[10] https://www.riscale.eu/
Overview of data transfer options
To govern data transfer between devices over a network, one or more standardised protocols need to be used. These protocols specify standardised rules and formats for data transfer and vary in terms of speed, security and complexity. The most suitable protocol will be chosen depending on the specific use within the project.
To move data to/from HPC systems, CINECA offers dedicated data transfer services that broadly fall into two categories: data movers and GridFTP. Data movers are dedicated, containerised nodes without interactive access that support only a limited set of commands (scp, rsync, sftp, wget, curl, rclone, aws s3 and s3). GridFTP is also available on these nodes, but it can only be used via the globus-url-copy client, which must be run from the user's local machine.
The official documentation details the data transfer services (https://docs.hpc.cineca.it/hpc/hpc_data_storage.html).
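As an illustration of an automatable transfer step, the following minimal Python sketch assembles an rsync invocation of the kind a data mover node accepts. The host name, user name and paths are placeholders invented for this example and are not taken from the CINECA documentation; the real data mover address must be looked up there. The sketch only builds and prints the command, it does not execute it.

```python
import shlex

# Placeholder host name (assumption); the real data mover address is
# listed in the CINECA documentation.
DATA_MOVER_HOST = "datamover.example.org"


def build_rsync_command(user, local_path, remote_path, host=DATA_MOVER_HOST):
    """Return the argument list for an rsync upload through a data mover node."""
    return [
        "rsync",
        "-av",        # archive mode (keeps times and permissions), verbose output
        "--partial",  # keep partially transferred files so a retry can resume
        local_path,
        f"{user}@{host}:{remote_path}",
    ]


# Hypothetical user name and paths, for illustration only.
cmd = build_rsync_command("jdoe", "era5_subset.nc", "/scratch/expect/")
print(shlex.join(cmd))
```

The argument list can be handed to `subprocess.run(cmd, check=True)` once the placeholders are replaced. Passing a list rather than a shell string avoids quoting problems with unusual file names.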
Data Processing
Data processing within the EXPECT project is progressing through implementations at DKRZ and CINECA, where dedicated platforms are being developed to support scalable, efficient, and reproducible workflows for climate data analysis.
At DKRZ, the Levante HPC system is configured to enable both interactive and batch processing, using tools such as JupyterHub, Slurm, and the Rook Web Processing Service.
CINECA has deployed a cloud-native infrastructure based on Kubernetes, JupyterHub, and Dask, allowing for flexible, distributed data processing. These platforms can already support EXPECT use cases, and further work is ongoing to expand functionality and improve interoperability.
Data Processing at DKRZ
DKRZ offers HPC resources tailored to climate modeling and data-intensive workflows, with CPU-only, GPU-enabled, and interactive nodes. Storage relies on a high-performance Lustre-based file system for fast I/O. Typical workflows include data preparation (using Python, CDO [33], NCO [34]), batch simulations via Slurm, and postprocessing with data reduction and visualization. Automation through workflow managers and scripting enables efficient, reproducible research. Batch jobs run on Slurm, where users submit scripts specifying resources and commands. This non-interactive mode suits large, parallel simulations using MPI [35] or multi-threading.
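To make the batch mode more concrete, the sketch below generates a minimal Slurm batch script from Python. The account is the EXPECT project code given earlier in this document; the partition name, the walltime and the CDO command are assumptions chosen for illustration and should be checked against the DKRZ documentation before use.

```python
def make_batch_script(job_name, account, partition, nodes, walltime, command):
    """Return the text of a minimal Slurm batch script."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --account={account}",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --time={walltime}",
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


script = make_batch_script(
    job_name="tas_timmean",
    account="bk1444",        # EXPECT project code at DKRZ
    partition="compute",     # assumed partition name
    nodes=1,
    walltime="00:30:00",
    # Illustrative data reduction step: time mean with CDO (file names are placeholders).
    command="cdo timmean tas_input.nc tas_mean.nc",
)
print(script)
```

Written to a file, for example `job.sh`, the script would be submitted with `sbatch job.sh`. Generating the script from a function keeps resource requests in one place when many similar jobs are needed.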
Interactive data processing (JupyterHub)
DKRZ’s JupyterHub provides an interactive, web-based environment on the Levante HPC system, enabling scalable, data-near analysis of large climate datasets like CMIP6 and ERA5. Users authenticate with DKRZ credentials, select resource profiles via Slurm, and launch sessions supporting Python, R, and Julia. Containerized environments and custom kernels allow reproducible workflows and advanced analyses.
For EXPECT, DKRZ’s Jupyter notebooks repository offers hands-on tutorials and use cases for model data analysis. Notebooks demonstrate optimal HPC usage, processing tools, and data visualization. Users can clone the repository directly in JupyterHub, run demo notebooks, or create tailored kernels via conda/mamba for advanced workflows.
[33] Climate Data Operators
[34] NetCDF Operators
[35] Message Passing Interface
Fig 3 JupyterHub at DKRZ. The figure shows how DKRZ’s JupyterHub service connects researchers to the Levante HPC system. Users log in through Apache with LDAP authentication, after which the JupyterHub hub manages sessions and routes requests via CHP (configurable-http-proxy). Resource allocation is handled by the DKRZSpawner, which offers both advanced and preset options and submits jobs to Slurm. Once scheduled, sessions run on Levante compute nodes, providing interactive environments.
Web Processing Service
Rook is a web-based climate data processing service developed under the ROOCS [36] project by DKRZ (Germany), CEDA/STFC (UK) and Ouranos (Canada). It offers standardized access to climate data processing via a Web Processing Service (WPS) API, enabling distributed workflows across diverse storage systems. Built on GeoPython tools, including pywps [37] (OGC WPS implementation [38]), clisops [39] (xarray-based climate data operations), rooki (a Python client for Jupyter integration), and daops [40] (data operation support), Rook facilitates processing close to the data, reducing large transfers and supporting FAIR principles.
Within EXPECT, Rook enables distributed, standards-based access to climate datasets and preprocessing services and integrates with interactive environments like Jupyter notebooks. Deployments include a permanent joint DKRZ & IPSL instance, a standalone DKRZ node integrated with ESGF and the ENES Climate4Impact [41] portal, and a CINECA installation optimized for S3-based HPC workflows. These run on virtual machines accessing local or cloud storage, enabling scalable and flexible processing. For example, the Copernicus Climate Data Store (CDS) uses a Rook adapter on a DKRZ VM to access CMIP6 and CORDEX datasets, complementing Copernicus’ observational focus by providing model data processing with tools like regridding to harmonize spatial resolutions. Rook integrates with federated catalogs such as ESGF, Copernicus CDS, STAC, and Pangeo standards. However, its WPS interface currently lacks full compatibility with ESGF S3 storage instances. Regridding via clisops addresses challenges in grid inconsistencies and interpolation practices, offering a reproducible, programmable interface despite some community-level issues.
[36] Remote Operations On Climate Simulations
[37] PyWPS Web Processing Service implementation: https://pywps.org/
[38] Open Geospatial Consortium’s Web Processing Service specification
[39] Climate Simulation Operations
[40] https://pywps.readthedocs.io/en/latest/
[41] https://www.climate4impact.eu/c4i-frontend/
For hands-on use, the public WPS endpoint is rook.dkrz.de, with demo notebooks available at the rooki GitHub repository. Notably, the CMIP6 subsetting notebook supports time and bounding-box queries, and the time-components notebook demonstrates advanced subsetting and workflow examples.
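The time and bounding-box queries mentioned above can be pictured as a WPS Execute request. The sketch below builds such a request URL in the key-value encoding of WPS 1.0.0 without sending it. Only the host name comes from this document; the URL scheme, the `/wps` path, the process identifier `subset`, the input names (`collection`, `time`, `area`) and the dataset identifier are assumptions made for illustration and should be verified against the service's capabilities document or the rooki notebooks.

```python
from urllib.parse import urlencode

# Host from the text; scheme and "/wps" path are assumptions.
ROOK_WPS_URL = "https://rook.dkrz.de/wps"


def build_subset_request(collection, time_range, bbox):
    """Build a WPS 1.0.0 key-value Execute URL for an assumed 'subset' process."""
    west, south, east, north = bbox
    # WPS 1.0.0 separates the individual inputs with semicolons.
    data_inputs = ";".join([
        f"collection={collection}",
        f"time={time_range}",
        f"area={west},{south},{east},{north}",
    ])
    query = urlencode({
        "service": "WPS",
        "version": "1.0.0",
        "request": "Execute",
        "identifier": "subset",      # assumed process name
        "DataInputs": data_inputs,
    })
    return f"{ROOK_WPS_URL}?{query}"


url = build_subset_request(
    collection="CMIP6.example.dataset.id",   # placeholder dataset identifier
    time_range="2000-01-01/2010-12-31",
    bbox=(-10, 35, 30, 70),                  # west, south, east, north in degrees
)
print(url)
```

In practice the rooki client hides this request construction. The sketch is only meant to show which pieces of information a subsetting call carries to the server.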
Data processing at CINECA
The data processing setup involved deploying an easy-to-use, multi-user platform for processing data. The following details are from the testing and validation performed up to M18.
Overview of data processing components
The platform consists of several components deployed over Kubernetes (K8s), including a user authentication mechanism, a JupyterHub interface and a Dask gateway, as depicted in the following figure. Access to processing tools is currently restricted to authorised users (currently only internal staff) to ensure full access control and to avoid security breaches.
Figure 4: Multi-user Dask gateway implementation
The authentication has been enabled using X container/tool. The JupyterHub (vide infra) has been deployed with X container/Helm chart that allows multiple users to run their Jupyter notebooks directly on cloud infrastructure where the data is stored, enabling data-proximate computation. The third component of the data processing setup is the deployment of Dask-Gateway.
Dask-Gateway is a powerful tool for managing and scaling Dask clusters in a secure, multi-tenant environment. This makes it ideal for distributed data processing in shared infrastructure like Kubernetes or HPC systems. It enables users to launch and connect to Dask clusters dynamically on demand while giving administrators control over resource limits and authentication. By separating the user interface from back-end cluster management, Dask-Gateway simplifies complex data workflows and enables efficient parallel computing across large datasets.
The deployment has been tested to address several issues, including: ensuring that the JupyterHub service exposes the cluster service through the correct API endpoints, ensuring the compatibility of Dask and Dask-Gateway versions by avoiding outdated default images, enabling resource cleanup when the user shuts down the Dask cluster or closes the notebook server, enforcing user isolation via different namespaces to prevent two users from accessing the same Dask cluster, and examining the secure execution of Dask workflows.
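The resource cleanup requirement listed above can be sketched on the client side with a context manager that always shuts the cluster down, even when the analysis fails. The sketch is not the CINECA implementation. It relies only on the `new_cluster()`, `scale()` and `shutdown()` methods that the dask-gateway client provides, and it uses small stand-in classes (named `FakeGateway` and `FakeCluster` here, purely for illustration) so that it runs without a gateway deployment.

```python
from contextlib import contextmanager


@contextmanager
def temporary_cluster(gateway, n_workers):
    """Create a cluster through `gateway`, scale it, and always shut it down."""
    cluster = gateway.new_cluster()
    try:
        cluster.scale(n_workers)
        yield cluster
    finally:
        # Runs even if the analysis raises, so no workers are left behind.
        cluster.shutdown()


# Stand-ins that mimic the three client methods used above (illustration only).
class FakeCluster:
    def __init__(self):
        self.workers = 0
        self.closed = False

    def scale(self, n):
        self.workers = n

    def shutdown(self):
        self.closed = True


class FakeGateway:
    def __init__(self):
        self.clusters = []

    def new_cluster(self):
        cluster = FakeCluster()
        self.clusters.append(cluster)
        return cluster


gateway = FakeGateway()
try:
    with temporary_cluster(gateway, 4) as cluster:
        raise RuntimeError("simulated analysis failure")
except RuntimeError:
    pass

print(gateway.clusters[0].closed)  # True: the cluster was released despite the error
```

Against a real deployment, `FakeGateway()` would be replaced by `dask_gateway.Gateway(...)` pointing at the gateway address, and `cluster.get_client()` would be called inside the `with` block to run computations.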
Interactive data processing (JupyterHub)
JupyterHub is a multi-user orchestration layer that can be used to spawn and manage Jupyter notebook servers on shared infrastructure. Typically backed by Docker, Kubernetes, or other spawner backends, it provides centralised authentication, user session isolation, and resource management. It also supports integration with OAuth, LDAP, and other identity providers, making it suitable for secure environments. When deployed with Kubernetes, it enables scalable, containerised notebook environments for each user with configurable resource quotas and persistent storage. It is often the de facto choice for research platforms providing interactive computing environments with reproducible workflows.
JupyterHub can be effectively combined with Dask to provide computing resources and real-time performance monitoring via Dask diagnostics dashboards. When users run distributed Dask computations in their Jupyter notebooks within JupyterHub, Dask automatically launches a diagnostics dashboard providing live insights into task scheduling, memory usage and workload distribution. The following figure shows a screenshot of the typical diagnostics dashboard.
Figure 5: Dask data processing and diagnostics
Each user can then monitor their own computations via a link to the Dask dashboard. This is typically hosted on a dedicated port and can be securely proxied through JupyterHub. This integration is particularly beneficial in data-intensive environments, where performance visibility and scalability are essential. The implementation was tested to ensure the proper functioning of the dashboard, the isolation of individual user dashboards, and the possibility of exposing the dashboards in separate windows for visualisation convenience.
Web Processing Service
The Rook-WPS service is part of a larger suite of tools known as ROOCS (Remote Operations on Climate Simulations). ROOCS implements a data-aware paradigm, which means it enhances data storage systems with the capability to analyze, process, and aggregate data directly where it resides. This approach allows for efficient processing of ESGF data by bringing the computation to the data, rather than moving large datasets.
The Rook-WPS service requires access to the same publicly exposed data made available through ESGF. Since the service is containerized, it can be hosted on a simple server or a virtual machine, as long as it can mount the volume where the data is stored.
This requirement, however, created a significant challenge for CINECA. The organization has updated its ESGF software stack to a new Kubernetes-based version that primarily uses S3 storage. This led to an extensive period of configuration and testing, as CINECA attempted to either use Rook directly with the S3 data or find a suitable workaround.
The only viable solution at present is to have an ESGF dataset replicated on both S3 storage (for publication via the new services) and a standard file system that can be mounted on the virtual machine hosting the Rook-WPS service.
Coincidentally, the OptimESM project data, due to its own specific project requirements, already possesses this dual-storage characteristic. Therefore, the next steps will focus on the configuration and deployment of the Rook-WPS service using this particular dataset.
Expect Catalog (of Catalogs)
Data collections which are used in EXPECT data analysis workflows are cataloged in different heterogeneous systems. However, a strong community effort currently focuses on harmonizing cataloging approaches based on Spatio-Temporal Asset Catalogs [42] (STAC) and the evolving associated STAC tooling ecosystem [43]. Thus, the newly developed Expect Catalog is based on STAC. It integrates external STAC catalogs but also deploys a dynamic STAC backend for additional data accessible at EXPECT partner sites. The basic structure is illustrated in Figure 6.
Fig 6. The EXPECT STAC catalog of catalogs
To be able to use tools from the rapidly evolving STAC tool ecosystem, the EXPECT catalog of catalogs can be browsed (with the stac-browser tool [44]) based on a centrally managed JSON description [45]. This stac-browser view is available at https://discover.dkrz.de. This catalog of catalogs currently includes the following sub-catalogs:
- DKRZ / EXPECT STAC catalog: hosted at DKRZ, exposing climate model data from the following projects/efforts:
  - EERIE data collections
  - WarmWorld data collections
- External STAC catalogs
  - ESGF (east) STAC catalog
  - Copernicus catalog
  - The DestinE data lake catalog
[42] STAC Spatio Temporal Asset Catalogs: https://stacspec.org/en
[43] STAC tools & resources: https://stacspec.org/en/about/tools-resources/
[44] Stac-browser: https://github.com/radiantearth/stac-browser
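The idea of a catalog of catalogs maps directly onto the STAC Catalog object, whose `links` array can point to other catalogs with the `child` relation. The following sketch builds such a root catalog as a plain dictionary and prints it as JSON. The identifier, description and all URLs are placeholders invented for this example, and the actual EXPECT JSON description may be structured differently.

```python
import json


def make_root_catalog(catalog_id, description, children):
    """Return a STAC Catalog dict whose 'child' links point at sub-catalogs."""
    links = [
        {"rel": "child", "href": href, "title": title, "type": "application/json"}
        for title, href in children
    ]
    return {
        "type": "Catalog",
        "stac_version": "1.0.0",
        "id": catalog_id,
        "description": description,
        "links": links,
    }


catalog = make_root_catalog(
    "expect-catalog-of-catalogs",  # hypothetical identifier
    "Entry point linking the EXPECT sub-catalogs",
    [
        # Titles follow the list above; all hrefs are placeholders.
        ("DKRZ / EXPECT STAC catalog", "https://stac.example.org/dkrz/catalog.json"),
        ("ESGF (east) STAC catalog", "https://stac.example.org/esgf-east/catalog.json"),
        ("Copernicus catalog", "https://stac.example.org/copernicus/catalog.json"),
        ("DestinE data lake catalog", "https://stac.example.org/destine/catalog.json"),
    ],
)
print(json.dumps(catalog, indent=2))
```

Because the root catalog only holds links, externally managed catalogs can be added or replaced without copying their content, which fits the planned switch between ESGF STAC catalogs described in this section.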
The EXPECT catalog of catalogs will be adapted, extended and refined during the project lifetime:
● The characterization of catalogs in the catalog of catalogs is currently based on metadata from the registry of research data repositories (re3data.org) effort. This characterization is quite generic and will be refined to include specific metadata useful to exploit climate model data repositories (e.g. providing information on the specific controlled vocabularies used to support data search).
● As mentioned before, many parallel efforts are ongoing to catalog climate data collections based on STAC and provide STAC APIs and catalogs. The EXPECT catalog of catalogs will follow these developments and associated ongoing harmonization efforts. For example, the STAC catalog that is currently integrated to provide access to data collections from the global ESGF data federation (e.g. CMIP6 and CORDEX) is based on the STAC catalog hosted at CEDA UK. It will be superseded by a STAC catalog which is fully synchronized with the ESGF (west) STAC catalog hosted in the US and will go operational to support CMIP7 later this year.
● There are also currently many early efforts to enable an automatic generation of static (as well as dynamic) STAC catalogs based on (e.g. temporary) data collections that are hosted on disk (and cloud) resources, associated metadata conventions, and controlled vocabularies. Different tooling support is currently under development, community wide (e.g. for ESGF data publication) as well as at institutional level (DKRZ disk and tape data STAC catalog generators). Currently it is too early to decide on specific tool suites which can be proposed to e.g. EXPECT data providers wanting to register data in the EXPECT data catalog. Such data registration would support climate data sharing between climate data centers (and research institutes).
● To be able to exploit the EXPECT STAC catalog to support distributed climate analytics workflows as planned in work package 9, specific low-level data access information will be included in the EXPECT catalog. This is necessary for users to have information on how to best access the data, e.g. based on xarray and dask, as well as the low-level technical details necessary to configure client-side tooling. For example, the exploitation of data collections based on STAC catalog entries is currently not as user friendly as the exploitation of those based on static intake catalogs [46], available for some data collections. The current DKRZ STAC catalog thus also includes information on data exploitation based on intake client-side tooling (see e.g. the DKRZ static intake-esm data catalog [47]).
[45] EXPECT catalog of catalogs: https://tinyurl.com/ExpectCatalog
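One way to picture such low-level access information is a STAC item whose asset carries hints for opening the data with xarray. The sketch below reads those hints from a trimmed example item. The field name `xarray:open_kwargs` follows the xarray-assets STAC extension, but whether the EXPECT or DKRZ catalogs use that extension is an assumption, and the item, URL and keyword values are invented for illustration.

```python
def access_hints(item, asset_key):
    """Collect what a client needs to open one asset of a STAC item."""
    asset = item["assets"][asset_key]
    return {
        "href": asset["href"],
        "media_type": asset.get("type"),
        # Keyword arguments a client could forward to xarray (assumed extension field).
        "open_kwargs": asset.get("xarray:open_kwargs", {}),
    }


# Trimmed, hypothetical item: a complete STAC item also carries geometry,
# properties and links.
example_item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-tas-monthly",
    "assets": {
        "data": {
            "href": "https://data.example.org/tas_monthly.zarr",
            "type": "application/vnd+zarr",
            "xarray:open_kwargs": {"engine": "zarr", "chunks": {}, "consolidated": True},
        }
    },
}

hints = access_hints(example_item, "data")
print(hints["href"])
print(hints["open_kwargs"])

# With xarray installed, the hints could be passed on as:
#   xarray.open_dataset(hints["href"], **hints["open_kwargs"])
```

Keeping the open arguments in the catalog entry, rather than in each notebook, is what would bring STAC-based access closer to the convenience of intake catalogs.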
Data analysis workflows
A key objective of the work in WP7 was to build the infrastructure foundation on which future work in WP9 can build to support distributed data analysis workflows involving high-volume climate data collections in EXPECT. In Figure 7, a simple representative workflow example is provided, illustrating the necessary integration of the different infrastructure components that are summarized in the previous sections:
● Users interact with a centralized catalog to browse and search for the available data sets they need for the planned data analysis activity. This STAC catalog provides an overview of the data accessible at EXPECT HPC sites as well as via external data providers (e.g. Copernicus and ESGF). This is illustrated by the green lines in Figure 7.
[46] Intake: Taking the pain out of data access and distribution: https://intake.readthedocs.io/en/latest/index.html
[47] https://discover.dkrz.de/external/stac-dev-a722d9.gitlab-pages.dkrz.de/static/DKRZ-static-intake-esm.json?.language=de