LUMI AI Fac o y Se ice Cen e
Empowe ing Eu ope’s AI Ecosys em
D4.3 In e ence Se ice documen a ion
2
D4.3
In e ence se ice documen a ion
D4.3 In e ence Se ice documen a ion
3
P ojec Ti le
LUMI AI Fac o y Se ice Cen e
P ojec Ac onym
LUMI-AIF
P ojec Numbe
101234208
Type o Ac ion
HORIZON-JU-RIA
Topic
HORIZON-JU-EUROHPC-2025-AI-01-IBA-01
S a ing Da e o P ojec
01.03.2025
Ending Da e o P ojec
29.02.2028
Du a ion o he P ojec
36 mon hs
Websi e
lumi-ai- ac o y.eu
Wo k Package
WP4
Task
T4.3
Lead Au ho s
Juha Hulkkonen, Tomáš Ma ino ič
Con ibu o s
Pauliina Some koski, Emma Hin sala, Jan Ma ino ič, Juho
Ke änen
Pee Re iewe s
Abdul ahman Azab, Heidi Laine, Ka ja Mankinen
Ve sion
1.0
Due Da e
30.4.2025
Submission Da e
30.4.2025
Dissemina ion le el
X
PU: Public
SEN: Sensi i e – limi ed unde he condi ions o he G an Ag eemen
EU-RES. Classi ied In o ma ion: RESTREINT UE (Commission Decision 2005/444/EC)
EU-CON. Classi ied In o ma ion: CONFIDENTIEL UE (Commission Decision 2005/444/EC)
EU-SEC. Classi ied In o ma ion: SECRET UE (Commission Decision 2005/444/EC)
D4.3 In e ence Se ice documen a ion
4
Ve sion His o y
Re ision
Da e
Edi o s
Commen s
0.1.
28.3.2025
Juha Hulkkonen
Ini ial d a
0.2.
17.4.2025
Juho Ke änen, Jan
Ma ino ič, Tomáš
Ma ino ič,
Pauliina
Some koski
D a o e iew
0.3.
25.4.2025
Pauliina
Some koski
Edi ed based on e iew commen s
1.0
28.4.
Pauliina
Some koski
Final e sion.
Glossa y o Te ms
I em
Desc ip ion
Ai a
AI In e ence se ice c ea ed in CSC
LumiLingo
Cha bo o No dic languages c ea ed as demons a ion applica ion on
op o Ai a
LAIFS
LUMI AI Fac o y Se ice Cen e
LLM
La ge Language Model
API
Applica ion P og amming In e ace
REST API
Rep esen a ional S a e T ans e Applica ion P og amming In e ace
Web UI
Web b owse Use In e ace
D4.3 In e ence Se ice documen a ion
5
Execu i e Summa y
LUMI AI Fac o y's objec i es a e ad ancing he de elopmen o AI solu ions, including o e ings like
in e ence se ice. We expec o see as -paced g ow h in compu a ional needs in his a ea.
This documen desc ibes wo in e ence pla o ms: Ai a and EXA4MIND. Ai a in e ence pla o m is
de eloped by CSC, and EXA4MIND de eloped by he EXA4MIND p ojec conso ium (IT4I and METU).
The in e ence se ices will be p o ided by LUMI AI Fac o y.
In e ence e e s o he p ocess o using an AI model o gene a e p edic ions, p oduce ou pu s o
pe o m speci ic asks based on new inpu da a.
In e ence enables eal-wo ld applica ions o La ge Language Models (LLMs) and o he AI models,
allowing use s o in e ac wi h he model by p o iding p omp s — ex inpu s ha guide he model’s
esponses.
Ai a in e ence pla o m is a se ice o hos AI models on supe compu e s. Wi h Ai a, you can in e ac
wi h LLMs e icien ly using simple API (Applica ion P og amming In e ace) calls o simple Web UI
(Web b owse Use In e ace). Use s will be able o un hei sel - ained models, use cu a ed po olio
o models. Easy in eg a ion o exis ing model eposi o ies will be p o ided.
EXA4MIND in e ence se ice is de eloped on e icien le e aging o HPC esou ces in a p ojec ini ia i e
ha is building a pla o m o ex eme da a (i.e. da a ha is a he uppe o lowe limi s o expec a ions
ha should be accep ed by he sys em), enabling ad anced analy ics on supe compu e s. EXA4MIND
in e ence se ice solu ion complemen s he Ai a in e ence se ice wi h he main di e ence in he ask
queue implemen a ion. This opens he possibili y o u he Ai a ex ensions and op imiza ion based
on he incoming equi emen s.
D4.3 In e ence Se ice documen a ion
6
Table o Con en s
1. Ai a in Gene al ......................................................................................................... 7
1.1 In oduc ion o Ai a 7
1.2 Value 7
1.3 Fu u e wo k 7
2. How o use Ai a ........................................................................................................ 9
2.1 T aining Ma e ials 9
3. Ai a In e ence Se ice Technical Desc ip ion ........................................................... 10
3.1 Ai a In e ence Se ice Technical Desc ip ion 10
3.1.1 Web use in e ace (Fas API & G adio) 11
3.1.2 Ai a Backend Se ice API 11
3.1.3 Task and Job Managemen (Cele y & Redis) 12
3.1.4 Model Me ada a (MongoB) 12
3.1.5 High-End Applica ion Execu ion Middlewa e (HEAppE) 13
3.1.6 Au hen ica ion 13
4. EXA4MIND in e ence se e ..................................................................................... 15
4.1 In oduc ion 15
4.2 A chi ec u e 15
4.2.1 Web use in e ace (Fas API) 15
4.2.2 High-End Applica ion Execu ion Middlewa e (HEAppE) 15
4.2.3 Ins ances and deployed models DB 15
4.2.4 Task queue and scheduling 16
4.2.5 Logs 16
4.3 Fu u e wo k 16
D4.3 In e ence Se ice documen a ion
7
1. Ai a in Gene al
1.1 In oduc ion o Ai a
Ai a is a gene al pu pose scalable se ice componen o hos AI models on a supe compu e . Ai a is
o e ed o LUMI AI Fac o y Se ice Cen e cus ome s o un hei AI models in a eliable and powe ul
compu ing sys em sa ely in Finland, while ha nessing powe s o ene gy e icien supe compu e s.
Ai a o e s a se o cu a ed models ia an easy- o-use mode n web in e ace and API endpoin s wi h i s
own Py hon lib a y o p og amma ic usage. The undamen al de elopmen idea behind Ai a is ha
use s can un a bi a y models wi h any ype o da a hey need o hei esea ch and de elopmen
pu poses. Ai a sa es use s om ha ing o se up he se e s, o om buying cloud se ice capaci y
om hi d pa ies. Ai a o e s a use - iendly expe ience, allowing use s o ocus on hei p ima y asks
and objec i es.
AI in e ence pla o m Ai a is cu en ly in de elopmen s age, o iginally c ea ed o complemen CSC's
g owing AI se ice po olio. Wi h LUMI AI Fac o y he se ice is o be o e ed o a wide a ie y o
in e na ional use cases and use s. CSC o e s a a ie y o compu ing se ices, including HPC capaci y,
scien i ic suppo , and aining ma e ials, o de elopmen o cu ing edge AI models, including
gene a i e models. Ai a complemen s he se ices wi h p o iding sui able en i onmen s o unning
hea y AI models.
A he momen Ai a is hea ily ocused on La ge Language Models (LLMs) and ou cu en se o
a ailable models includes No dic language models Po o, Viking, GPT-SW3, FinGPT wi h mode n cha
in e ace.
1.2 Value
Ai a can be seen as a pa o commi men o e icien and sus ainable use o esou ces by o e ing
scalabili y o compu ing esou ces and making supe compu e s mo e mul i-pu pose ins umen s.
Compu ing esou ces used o aining can be also used o in e encing he models. Running AI models
may equi e subs an ial compu a ional powe , which Ai a uses e icien ly by alloca ing esou ces on
demand. Resou ces a e alloca ed au oma ically and eed a e a ce ain pe iod o ime.
Ai a will o e a ious backends o ensu e e ec i e use o esou ces: cloud and con aine cloud o
unning small expe imen al models quickly, na ional supe compu e s and LUMI and LUMI AI o la ges
and hea ies wo kloads.
1.3 Fu u e wo k
In he u u e, a ea u e o upload sel - ained models will be de eloped. Wi h help o o he MLOps
solu ions planned o be de eloped in LUMI AI Fac o y, use s can deploy hei models in o Ai a.
Addi ionally, he cu a ed po olio o models, ha now consis s o LLM models, will be de eloped
u he . Access key managemen will be imp o ed g ea ly o allow use s o in eg a e Ai a in o hei
D4.3 In e ence Se ice documen a ion
8
own se ices. Cu en ly Ai a de elopmen has been concen a ing mo e on use ia API, so Ai a has
only a basic use in e ace, ha will be de eloped u he o be mo e use - iendly.
D4.3 In e ence Se ice documen a ion
9
2. How o use Ai a
Ai a can be used ia web use in e ace o REST API. REST API (Rep esen a ional S a eT ans e
Applica ion P og amming In e ace) is a common web se ice in e ace.
Ai a has a simple web UI, see Ai a's webpage. The cu en Web UI is made mainly o es ing and
de elopmen pu poses. The WebUI is cu en ly also used o ge he access key o p og amma ic use.
Ai a makes models and in e ence asks a ailable ia REST API. Ai a REST API is also OpenAI
compa ible. AITTA API e e ence documen a ion documen s he API in de ail desc ibing endpoin s,
pa ame e s, and esponses. To be able o use his API, use s need o log in o Ai a's webpage o c ea e
API keys o access.
A Py hon clien o be used wi h Ai a API is also de eloped. This clien o p og amma ic use is
published o PyPi eposi o y Wi h jus a ew lines o code, one can u ilize his clien o e.g. in eg a e and
wo k wi h LLMs ia he Ai a API.
While i s de elopmen is ongoing and uploading cus om models is no ye a ailable, Ai a al eady
p o ides models o use s o explo e h ough he web UI and API endpoin s using Py hon lib a ies ai a-
clien and openai. In he nea u u e, i will also be possible o c ea e embeddings (nume ical
ep esen a ions o eal-wo ld objec s) o he da a using Ai a.
2.1 T aining Ma e ials
In oduc ion o Ai a - aining is online and a ailable ia he Noppe se ice. Noppe is CSC's se ice o
lea ning and cou se pu poses, o e ing web applica ions o wo king wi h da a and p og amming.
Noppe suppo s Jupy e and RS udio based applica ions.
The aining ma e ials a e cu en ly loca ed and will be main ained in a Gi hub eposi o y. In need o
using ins uc ions wi h p i a e compu e , he ma e ial sou ces can be downloaded om Gi hub.
D4.3 In e ence Se ice documen a ion
16
Scaling is ocused on he HPC job sides h ough adding jobs wi h HEAppE. Fo keeping ack o he
indi idual HPC jobs, he e is a DB which s o es wo main hings: he compu e jobs c ea ed by HEAppE
and ela ed me ada a, and he deployed models on hese compu e jobs. The second i em is necessa y
so we know which models a e al eady a ailable, o whe he he use s would ha e o load a new model.
4.2.4 Task queue and scheduling
Each compu e job will ha e a p ocess ha ac s as a ask queue and load balance using a ious
s a egies such as ound- obin. The e can be mul iple AI models deployed in he wo ke p ocesses
(using ei he mul iple single-GPU models o mul i-GPU models deployed on one o mo e compu e
nodes), wi h communica ion be ween he wo ke s and he main p ocess handled by Ze oMQ.
4.2.5 Logs
Fo each compu e job he e will be logs c ea ed con aining in o ma ion abou use que ies, esponses,
and AI model s a e. These a e mainly used o de elope s o debug any e o s in he se e ope a ion.
4.3 Fu u e wo k
The e a e mul iple ea u es ha we would like o add in he u u e. Among hese we a e conside ing
adding e ie al augmen ed gene a ion (RAG) suppo , pe sis ence o queues on he compu e job o
allow es a s in case o ailu es du ing he que y execu ion, adding cha his o y o he use in e ace,
and making he solu ion mo e gene al by allowing di e en models besides LLMs, such as image
gene a ion, audio gene a ion, e c.