LUMI AI Factory Service Center
Empowering Europe’s AI Ecosystem
D4.5. HPC API and documentation
D4.5
HPC API and documentation
Project Title: LUMI AI Factory Service Center
Project Acronym: LUMI-AIF
Project Number: 101234208
Type of Action: HORIZON-JU-RIA
Topic: HORIZON-JU-EUROHPC-2025-AI-01-IBA-01
Starting Date of Project: 01.03.2025
Ending Date of Project: 29.02.2028
Duration of the Project: 36 months
Website: lumi-ai-factory.eu
Work Package: WP4
Task: Task 4.4. Development of AI-ready computing environment and HPC API
Lead Authors: Sebastian von Alfthan (CSC), Juha A. Lehtonen (CSC)
Contributors: Lukas Prediger (CSC), Vaclav Svaton (IT4I), Juha Hulkkonen (CSC)
Peer Reviewers: Juho Keränen (CSC), Tomáš Martinovič (IT4I)
Version: 1.0
Due Date: 29.8.2025
Submission Date: 27.8.2025
Dissemination level
x  PU: Public
   SEN: Sensitive – limited under the conditions of the Grant Agreement
   EU-RES. Classified Information: RESTREINT UE (Commission Decision 2005/444/EC)
   EU-CON. Classified Information: CONFIDENTIEL UE (Commission Decision 2005/444/EC)
   EU-SEC. Classified Information: SECRET UE (Commission Decision 2005/444/EC)
Version History
Revision | Date | Editors | Comments
0.1. |  | Lukas Prediger | Draft of HPC-API User Stories
0.2. | 26.6.2025 | Juha Hulkkonen | Added MLOps content and some proposals
0.3. | 2.7.2025 | Juha Hulkkonen | Drafted the Introduction, restructured a bit and edited the other chapters
0.4 | 4.7.2025 | Sebastian von Alfthan | Added first draft of FirecREST + Kubeflow
0.5 | 11.7.2025 | Juha A. Lehtonen | Added first draft of FirecREST solution technical overview.
0.6 | 21.7.2025 | Juha Hulkkonen | Re-write the Introduction, cleaning and refining of the chapters.
0.7 | 25.7.2025 | Juha Hulkkonen | Finalizing the draft
0.8 | 13.8.2025 | Lukas Prediger | Changes to overall structure
0.9 | 21.8.2025 | Pauliina Somerkoski | Resolved the changes and deleted comments. Small finishing touches.
1.0 | 27.8.2025 | Anna Luoma | Final quality check performed by the PMO, sent to official review.
Glossary of Terms
Item | Description
AI | Artificial Intelligence
Airflow | Data-engineering oriented workflow engine
Aitta | AI inference service created at CSC
API | Application Programming Interface
FirecREST | An open-source web API to access HPC resources. FirecREST is developed by the Swiss National Supercomputing Center (CSCS).
HEAppE | HEAppE Middleware (High-End Application Execution Middleware) is an open-source software framework that provides an interface to HPC resources. Developed at IT4Innovations National Supercomputing Center.
HPC | High Performance Computing
Kubeflow | A popular workflow system for machine learning operations
LUMI-K | OpenShift based container cloud currently in development
LUMI-O | Ceph based object storage connected to the LUMI system
ML | Machine Learning
MLOps | Machine Learning Operations – The operation of machine learning systems using automated tools for training, testing, and deployment of machine learning models.
OpenID Connect | OpenID Connect (OIDC) is an identity layer built on top of the OAuth 2.0 framework. It allows third party applications to verify the identity of the user based on authentication performed by an authorization server.
REST API | Representational State Transfer Application Programming Interface - a common form of a stateless web API using the HTTP protocol
S3 | Simple Storage Service protocol to access object storage systems
Slurm | The Slurm Workload Manager is an open-source job scheduler for Linux based systems, used by many of the world's supercomputers and compute clusters.
Executive Summary
This document describes HPC-API development for LUMI AI Factory.
An HPC-API (High Performance Computing Application Programming Interface) is used to programmatically manage and interact with HPC environments. It provides functions for job management, such as submitting and cancelling jobs on a supercomputer and querying their status, as well as for handling data on the system.
The HPC-APIs deployed to LUMI are used by various software components that are being developed for the LUMI AI Factory. Modern AI (Artificial Intelligence) development often relies on an extensive underlying software environment to support the machine learning workloads running in the system.
This document describes the deployment of a new HPC-API to LUMI. The aim is to set up an HPC-API that can be used with MLOps tools and containerized workflows to serve users of LUMI AI Factory in AI development tasks. The technology selected for this HPC-API is FirecREST developed by Swiss National Supercomputing Center (CSCS).
This document also describes HEAppE as an alternative approach. HEAppE, developed by IT4Innovations National Supercomputing Center (IT4I), is an open-source software framework that provides a secure and user-friendly interface to HPC resources. HEAppE is already used as part of some CSC services as well as isolated, dedicated HPC-API for some scientific projects.
Table of Contents
1. Introduction
1.1 Introduction to LUMI AI Factory MLOps environment
2. Requirements for HPC-API
2.1 User stories for HPC-API usage
2.1.1 User of LUMI AI Factory service offerings
2.1.2 User of LUMI AI Factory HPC-API
2.1.3 Client applications hosted by LUMI AI Factory projects
3. Existing HPC-API software solutions
3.1 FirecREST HPC-API
3.1.1 Description of FirecREST
3.1.2 Architecture
3.1.3 Features
3.1.4 Authentication and authorization
3.1.5 Use case support
3.2 HEAppE Middleware
3.2.1 Motivation
3.2.2 Architecture
3.2.3 Features and Security
3.2.4 Future work and References
4. Design of the LUMI AI Factory HPC-API
4.1 Selection of solution components
4.2 Implementation plan
4.2.1 Integrating FirecREST with LUMI and CSC infrastructure
4.2.2 Kubeflow HPC offloading via FirecREST
4.2.2.1 Implementation plan
4.2.2.2 Alternative approaches and future work
1. Introduction
Many commercial services in the field of AI are based on cloud platforms. Even though the latest containerised microservice solutions are capable and scalable, more parallel computing power is needed for the most demanding data pre-processing, AI training, finetuning and inference tasks.
The LUMI AI Factory consortium compute resources consist of access to the LUMI and, later, LUMI-AI supercomputers as well as strong cloud platforms. Therefore, we are well positioned to empower cloud-based solutions with considerable HPC capacity. Combining these very different computing paradigms is not a straightforward task. Programmatic access from cloud services to HPC is needed to combine these platforms. The Application Programming Interface (API) to be developed and delivered for this purpose is referred to as the HPC-API in the following.
The HPC-API is an essential component to implement the LUMI AI Factory service portfolio as envisioned, of which an overview is presented in the following section. The HPC-API will allow us to connect to HPC capacity of LUMI and later LUMI-AI supercomputers from our cloud-based services. That connection and the emergent combination of those two computing platforms can enable novel ways to develop and offer AI services. Most of the LUMI AI Factory software environment components will require access to supercomputer resources.
There are existing technologies currently available to solve this issue, such as FirecREST and HEAppE, which are described in more detail in Section 3. They have different approaches which are targeted to different use-cases and technologies. Key considerations for LUMI AI Factory are the convenience of usage, suitability to our needs, authentication and access key management. To select the right approach, requirements for the HPC-API are considered in Section 2, based on which a plan for the implementation of the HPC-API is presented in Section 4.
Typical use cases are workflows supporting the use of LLMs for text pre-processing, like dataset curation. Other considered use cases are LLM fine-tuning and LLM evaluation. The aim is to use the HPC-API to access these ready-made containerized workflows on the LUMI system. The HPC-API extends the ways users can utilize the HPC resources from external tools or services on LUMI.
1.1 Introduction to LUMI AI Factory MLOps environment
LUMI AI Factory provides an MLOps environment designed around centrally operated core components and served applications, that are supported by a large selection of recommended frameworks and complete MLOps stacks.
Planned core components of LUMI AI Factory MLOps environment are the open-source tools Kubeflow, to orchestrate the machine learning workflows, and MLflow, to track the training and to store the models and training metadata. Our self-developed Aitta AI inference service is complementing the offering and closing the MLOps loop visualized in Figure 1, by providing the HPC-powered inference service even for the heaviest generative models. Aitta is currently using a dedicated HEAppE instance for accessing the HPC.
Figure 1: MLOps Cycle of core tools for continuous improvement and delivery of machine learning models
In the MLOps context the HPC-API will offer the essential avenue for Kubeflow to access LUMI resources to complement LUMI-K container cloud resources for more heavyweight use cases.
Kubeflow is a popular workflow system for machine learning operations, currently enjoying almost de facto standard status and providing smooth interoperability with commercial cloud environments. Kubeflow excels in orchestrating complex ML training and deployment pipelines, including pre-processing of data, performing training and running model inference. Kubeflow offers industry users who are used to cloud native tools on services offered by American hyperscalers an easy way to start using LUMI AI Factory services. A key target is to enable Kubeflow to run compute-heavy tasks on LUMI-G GPU nodes using the HPC-API, while light tasks can remain on LUMI-K.
MLflow is one of the core components extending the possibilities of Kubeflow for managing the machine learning lifecycle. It is already available for users on CSC's HPC systems to track ML training metrics locally. With the external MLflow application, a user can track the metrics outside of the HPC system for wide and longer-term usage. In the LUMI AI Factory MLOps toolkit, MLflow is offered as a stand-alone application on top of CSC container clouds, but also as integral part of Kubeflow to offer a seamless way
Figure 4: Standalone API for CLI or desktop tools
3.2 HEAppE Middleware
3.2.1 Motivation
HEAppE Middleware (High-End Application Execution Middleware) is an open-source software framework that provides a secure and user-friendly interface to HPC resources. Designed with modularity, multi-user support, and REST API accessibility, HEAppE enables seamless integration of HPC infrastructure with custom user applications, web portals, and third-party services. Developed in-house at IT4Innovations National Supercomputing Center, HEAppE is an implementation of the HPC-as-a-Service concept.
HPC-as-a-Service is a well-known term in high-performance computing. It enables users to access an HPC infrastructure without a need to buy and manage their own physical servers or data center infrastructure. Through this service, small and medium enterprises (SMEs) can take advantage of the technology without an upfront investment in the hardware. This approach further lowers the entry barrier for users and SMEs who are interested in utilizing massive parallel computers but often do not have the necessary level of expertise in this area.
HEAppE manages and provides information about submitted and running jobs and their data between the client application and the HPC infrastructure. HEAppE is able to submit required computation or simulation on HPC infrastructure, monitor the progress and notify the user should the need arise. It provides necessary functions for job management, monitoring and reporting, user authentication and authorization, file transfer, encryption, and various notification mechanisms.
3.2.2 Architecture
HEAppE Middleware follows a modular architecture (Figure 5). The system consists of several key components. At its core, the HEAppE application includes modules for API handling, service orchestration, business logic, data access, file transfer, and HPC connection management. The API module provides REST API endpoints that allow external systems and client applications to interact with HEAppE. The service module coordinates internal operations, while the business logic module implements core functionalities such as job and data management. Secure and efficient interaction with HPC systems is facilitated by the HPC connection module, which manages SSH-based access to remote job schedulers such as PBS and SLURM. To protect sensitive information such as SSH keys and credentials for the access to the HPC, the system incorporates a dedicated secure vault module.
Figure 5: HEAppE architecture
3.2.3 Features and Security
HEAppE Middleware provides:
• Seamless HPC access via a standardized REST API
• Templated jobscripts enabling easy HPC application execution and monitoring without direct access to the cluster
• REST API for SLURM or PBS batch job management
• Support for other batch schedulers or metaschedulers through plugin interface
• Secure credential management for HPC data transfer
• Audited remote user-to-HPC access
• Secure vault for HPC access credentials
• Authentication via Personalized or Robot accounts with Shared or Exclusive access modes
• Protection against brute-force attacks, rate limiting, and minimized data exposure
• OpenID token role-based authentication
HEAppE implements several key security features to ensure secure access to HPC resources.
From the authentication point of view, HEAppE supports two access modes. The first uses personalized user accounts, i.e. a one-to-one mapping of user credentials, meaning that an HPC compute job executed via HEAppE is submitted under an actual HPC user account. The second approach utilizes so-called robot accounts, where the API users don't have direct access to an HPC infrastructure but HEAppE maps each compute job execution to a selected robot account, thus allowing controlled and restrictive access to the HPC resources. These robot accounts are created for a concrete resource allocation on a specific HPC infrastructure.
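The two access modes can be pictured as a lookup from the API user to the HPC account a job is submitted under. The sketch below is illustrative only: `AccessMode`, `Project` and `resolve_hpc_account` are hypothetical names, not part of HEAppE's API, and the round-robin choice among robot accounts is an assumption, since the text does not say how a robot account is selected.

```python
from dataclasses import dataclass, field
from enum import Enum


class AccessMode(Enum):
    PERSONALIZED = "personalized"  # one-to-one mapping to a real HPC user account
    ROBOT = "robot"                # jobs run under a service account tied to an allocation


@dataclass
class Project:
    """Hypothetical project record; field names are assumptions for illustration."""
    mode: AccessMode
    personal_accounts: dict[str, str] = field(default_factory=dict)  # API user -> HPC user
    robot_accounts: list[str] = field(default_factory=list)          # accounts of one allocation


def resolve_hpc_account(project: Project, api_user: str, job_index: int = 0) -> str:
    """Return the HPC account under which the job would be submitted."""
    if project.mode is AccessMode.PERSONALIZED:
        try:
            return project.personal_accounts[api_user]
        except KeyError:
            raise PermissionError(f"{api_user} has no HPC account") from None
    if not project.robot_accounts:
        raise PermissionError("no robot account configured for this allocation")
    # Assumed policy: spread jobs over the robot accounts in round-robin order.
    return project.robot_accounts[job_index % len(project.robot_accounts)]
```

In the robot mode the API user never appears on the HPC system; only the mapping held by the middleware links the two identities.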
To prevent arbitrary command execution, users can only run predefined job execution templates called Command Templates. Each template defines a script or executable binary that will be executed on an HPC infrastructure together with any dependencies or third-party software it might require.
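The idea behind Command Templates can be sketched as an allow-list: the caller names a template and supplies parameter values, and only the registered executable is ever run. This is a minimal sketch under stated assumptions, not HEAppE's implementation; the `CommandTemplate` class, the `TEMPLATES` registry, the script path and the parameter names are all hypothetical.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class CommandTemplate:
    """Hypothetical predefined job template: a fixed executable plus named parameters."""
    executable: str
    parameters: tuple[str, ...]


# Illustrative registry; real templates would be defined by the service operator.
TEMPLATES = {
    "train": CommandTemplate("/opt/apps/train.sh", ("dataset", "epochs")),
}


def render_command(template_name: str, values: dict[str, str]) -> str:
    """Build a command line from a registered template only."""
    template = TEMPLATES.get(template_name)
    if template is None:
        raise PermissionError(f"unknown template: {template_name}")
    if set(values) != set(template.parameters):
        raise ValueError("parameters do not match the template definition")
    # Quote every value so it cannot inject additional shell commands.
    args = [shlex.quote(values[name]) for name in template.parameters]
    return " ".join([template.executable, *args])
```

A request for a template that is not registered is rejected before anything reaches the scheduler, and a value such as `x; rm -rf /` is passed through as a single quoted argument.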
3.2.4 Future work and References
HEAppE Middleware is an open-source software with GPL 3 license. It is under constant development with the aim of having quarterly releases containing bug fixes and newly supported features based on feedback from the users and related projects' requirements.
HEAppE landing page: https://heappe.eu
HEAppE source codes: https://github.com/It4innovations/HEAppE
HEAppE documentation: https://heappe.it4i.cz/docs
4. Design of the LUMI AI Factory HPC-API
4.1 Selection of solution components
As described in Section 3.1.5, FirecREST supports the first two use cases, which are of key importance. In the first use case, the LUMI AI Factory can provide MLOps web services where users access LUMI resources under their own identity. In the second use case, users can configure single-user access on their own laptops and use LUMI as an extension of their local workflows and tools. Both of these can be supported by a single centralized API service. In principle the third use case could be covered using robot accounts, but initially this usage may not be supported.
HEAppE was originally designed to meet the third use case. In that case API users don't have direct access to an HPC infrastructure but HEAppE gives controlled and restrictive access to the HPC resources via robot accounts. Projects can also independently set up HEAppE and tune it to their needs. No centralized installation is being planned for LUMI.
4.2 Implementation plan
4.2.1 Integrating FirecREST with LUMI and CSC infrastructure
The FirecREST API will be integrated with the CSC authentication proxy user-auth. This supports OpenID Connect and the required authorization flows. Another key service is an SSH-CA service. This provides the actual SSH certificate that FirecREST uses to access the supercomputer on behalf of the user. In 2025, CSC is setting up an SSH-CA service based on the DeiC SSH Certificate Authority Library (https://github.com/wayf-dk/sshca). FirecREST recently received support for this solution.
4.2.2 Kubeflow HPC offloading via FirecREST
Kubeflow is a Kubernetes-based MLOps platform that provides a user-friendly way to handle the full life cycle of an ML model, enabling users to develop, orchestrate, and deploy machine learning workflows. As described in Section 1.1, it is a key component of the LUMI AI Factory MLOps environment.
LUMI AI Factory Service Center will operate a Kubeflow service on the upcoming LUMI-K container cloud platform. This platform offers a managed, multi-tenant Kubernetes based service running on CPU compute nodes of LUMI. It has no direct access to Slurm or the Lustre file system but will need to access the LUMI HPC partitions via the FirecREST REST API and the LUMI-O object store system via the Simple Storage Service (S3) protocol. Users will authenticate using OpenID Connect, enabling seamless single sign-on across the platform. The authentication system will automatically provision access tokens for both FirecREST and LUMI-O services.
A key target is to enable Kubeflow to run compute heavy tasks on LUMI-G GPU nodes using this API (HPC offload), while light tasks can remain on LUMI-K. The main use cases that will be offloaded are:
• ML workflows: Kubeflow Pipelines are complex ML workflows that can be expressed as a directed acyclic graph (DAG) of tasks. Each task is typically a container that runs in Kubernetes, doing e.g. data pre-processing, training, evaluation, etc. The primary goal is to enable these tasks to be offloaded.
• Hyperparameter tuning: The Katib component can be used for hyperparameter tuning by launching many training jobs in parallel. A potential future goal is to offload these to HPC.
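To make the DAG view concrete, the sketch below orders a small pipeline so that every task runs after its dependencies and marks where each task would execute. It uses only Python's standard `graphlib` module rather than the Kubeflow SDK; the task names and the per-task offload flags are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "preprocess": set(),
    "train": {"preprocess"},
    "evaluate": {"train"},
    "report": {"evaluate"},
}

# Assumed per-task flag: True means the task is offloaded to LUMI-G via the HPC-API.
offload = {"preprocess": True, "train": True, "evaluate": True, "report": False}


def execution_plan(deps, offload_flags):
    """Return (task, target) pairs in an order that respects the DAG."""
    order = TopologicalSorter(deps).static_order()
    return [
        (task, "LUMI-G" if offload_flags.get(task, False) else "LUMI-K")
        for task in order
    ]


plan = execution_plan(dependencies, offload)
for task, target in plan:
    print(f"{task:<10} -> {target}")
```

Tasks without an explicit flag default to LUMI-K, which matches the intent that only compute-heavy steps leave the container cloud.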
Some use cases of Kubeflow will not be offloaded to HPC:
• Interactive Jupyter notebooks: Typically used for data analysis and model development tasks. The LUMI supercomputer already provides a web interface where Jupyter notebooks can be run, hence we will in this work not implement HPC offloading for this component. Some data processing and model development can still be done also on the LUMI-K nodes.
• Serving models: The KServe component is an inference service component for serving the developed models. In the LUMI AI Factory MLOps environment the Aitta service is the main model serving platform; hence no integration for KServe is currently planned.
4.2.2.1 Implementation plan
The goal is to keep the pipeline definition and user experience as close to normal as possible. The most straightforward approach is to implement a custom Kubernetes pipeline component. This allows the user to define the container image and commands to be run, a set of input data (e.g. an S3 path), an S3 path where output data is to be placed, and finally a specification of the HPC resources required for the computation (number of GPUs, max job time, accounting project, etc.). When the component is launched as part of a Kubeflow pipeline, it runs as a lightweight pod in LUMI-K. This pod does not perform the actual computation but instead launches and monitors a job on the HPC system to do so. It also transfers data to and from the HPC system. All these tasks can be performed using FirecREST REST API calls.
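The control flow of such a lightweight pod can be sketched as stage in, submit, poll, stage out. The code below is a sketch under stated assumptions: `HpcApi` is a hypothetical wrapper interface, not the FirecREST client or its endpoints, the `JobSpec` fields simply mirror the inputs listed above, and the terminal state names are assumed to follow Slurm's job states.

```python
import time
from dataclasses import dataclass
from typing import Protocol


@dataclass
class JobSpec:
    """What the pipeline author defines; field names are illustrative."""
    image: str
    command: list[str]
    input_s3: str
    output_s3: str
    gpus: int
    max_minutes: int
    project: str


class HpcApi(Protocol):
    """Hypothetical wrapper around the REST calls; not the real FirecREST client."""
    def stage_in(self, s3_path: str) -> str: ...
    def submit(self, spec: JobSpec, workdir: str) -> str: ...
    def status(self, job_id: str) -> str: ...
    def stage_out(self, workdir: str, s3_path: str) -> None: ...


# Assumed terminal states, named after Slurm job states.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED", "TIMEOUT"}


def run_offloaded(api: HpcApi, spec: JobSpec, poll_seconds: float = 30.0) -> str:
    """Body of the lightweight pod: stage data, submit, poll, stage results."""
    workdir = api.stage_in(spec.input_s3)
    job_id = api.submit(spec, workdir)
    while (state := api.status(job_id)) not in TERMINAL_STATES:
        time.sleep(poll_seconds)
    if state != "COMPLETED":
        # Failing the pod makes the failure visible in the Kubeflow pipeline run.
        raise RuntimeError(f"HPC job {job_id} ended in state {state}")
    api.stage_out(workdir, spec.output_s3)
    return job_id
```

Keeping the API behind a small interface means the same pod logic can be exercised against an in-memory fake before it is pointed at a real deployment.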
In addition to a generic component, it is also possible to develop more specialized components for specific tasks such as training, making the components easier to use.
An important consideration is that the LUMI HPC partitions are only able to run Apptainer or Singularity containers. This is not a major problem since they can transparently convert Docker containers into their specific formats when running. A key task is still to provide a good set of containers that can be used for parallel large-scale tasks on the HPC system, that are able to efficiently utilize GPUs across multiple nodes and MPI. The work to develop this set of containers is described in Deliverable 4.4. in detail.
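As an illustration of the Docker-to-Singularity path, the sketch below renders a Slurm batch script that runs a Docker image through `singularity exec` with a `docker://` URI, which makes Singularity or Apptainer pull and convert the image at run time. The function name and its parameters are hypothetical, and the account, partition and resource values are placeholders the caller must supply; a production setup would likely convert images ahead of time and add GPU and MPI specific options.

```python
import shlex


def build_batch_script(image: str, command: list[str], *, account: str,
                       partition: str, nodes: int, gpus_per_node: int,
                       time_limit: str) -> str:
    """Render a Slurm batch script that runs a Docker image through Singularity."""
    # Quote each argument so the command survives the shell unchanged.
    container_cmd = " ".join(shlex.quote(part) for part in command)
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --account={account}",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --gpus-per-node={gpus_per_node}",
        f"#SBATCH --time={time_limit}",
        # The docker:// prefix triggers the on-the-fly image conversion.
        f"srun singularity exec docker://{image} {container_cmd}",
        "",
    ])
```

The resulting text is what a pipeline component would hand to the HPC-API as the job script, so the user-facing definition can keep referring to an ordinary Docker image.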
4.2.2.2 Alternative approaches and future work
It should be possible to use an approach similar to the Kubeflow pipelines in Katib, Kubeflow's hyperparameter tuning framework, where FirecREST is used to offload trial jobs to HPC.
An interesting alternative to the pipeline component approach mentioned above is to implement a generic custom resource (CRD) representing FirecREST. Such a FirecRESTJob CRD would function like existing job types (TFJob, PyTorchJob) in Kubeflow. Technically, this also requires implementation of a controller that watches FirecRESTJob resources, translates specifications to FirecREST API calls, manages job lifecycle, and reports status information back to the Kubernetes API. One benefit of the CRD implementation is that it enables us to implement the FirecREST integration once instead of duplicating that logic across several pipeline components. Pipeline components then become thin wrappers that create FirecRESTJob resources and wait for completion. Another main benefit is that these resources would integrate much better into Kubernetes, and normal tooling can be used for monitoring and management. However, as this is likely a more complex task than adding a generic pipeline component, the latter option will be explored first, and the implementation of a CRD will be investigated in later stages of the project.
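The controller described above can be thought of as a reconcile function that is called repeatedly and moves a resource one step closer to its final state. The sketch below shows that pattern with an in-memory stand-in for the custom resource; it does not use the Kubernetes API or the FirecREST client. The `FirecRESTJob` fields, the phase names, the state mapping and the `submit` and `status` callables are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FirecRESTJob:
    """In-memory stand-in for the custom resource; fields are illustrative."""
    name: str
    script: str
    job_id: Optional[str] = None   # set once the HPC job has been submitted
    phase: str = "Pending"         # Pending -> Queued -> Running -> Succeeded / Failed


# Assumed mapping from scheduler states to resource phases.
PHASES = {
    "PENDING": "Queued",
    "RUNNING": "Running",
    "COMPLETED": "Succeeded",
    "FAILED": "Failed",
    "CANCELLED": "Failed",
    "TIMEOUT": "Failed",
}


def reconcile(job: FirecRESTJob,
              submit: Callable[[str], str],
              status: Callable[[str], str]) -> FirecRESTJob:
    """One idempotent reconcile step, as a controller would run per watch event."""
    if job.phase in ("Succeeded", "Failed"):
        return job                           # terminal: nothing left to do
    if job.job_id is None:
        job.job_id = submit(job.script)      # first sight: create the HPC job
        job.phase = "Queued"
        return job
    # Unknown scheduler states leave the phase unchanged.
    job.phase = PHASES.get(status(job.job_id), job.phase)
    return job
```

Because each call only inspects the current state and takes at most one action, the function can safely be re-run after a controller restart, which is the property that lets normal Kubernetes tooling observe and manage such jobs.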