
Mixed-Precision For Energy Efficient Computations

Author: Gedik, Gülçin; Schöne, Robert; Iakymchuk, Roman
Publisher: Zenodo
DOI: 10.5281/zenodo.17550111
Source: https://zenodo.org/records/17550111/files/ws_whpc104s2.pdf
Gülçin Gedik∗†‡
†Université Paris-Saclay, UVSQ
Dresden, Germany
Robert Schöne‡
‡ZIH, CIDS, TU Dresden
Dresden, Germany
Roman Iakymchuk∗
∗Umeå University
Umeå, Sweden
Abstract—As simulations become more realistic, the pursuit of higher accuracy results in extended computation times and substantial energy consumption. This study explores mixed-precision computing as a promising strategy to address these challenges, leveraging computer arithmetic tools to optimize performance. To do so, we used the Reactor Simulator and LULESH benchmarks as case studies to evaluate the potential of mixed-precision strategies to reduce both time-to-solution and energy-to-solution. For the Reactor Simulator, we achieved more than 30 % reduction in both metrics without compromising accuracy. Similarly, results for LULESH demonstrated improvements of up to 31.5 % in time-to-solution and 25.6 % savings in energy-to-solution.
Index Terms—Mixed-precision, Time-to-solution, Energy-to-solution, LULESH, Verificarlo.
I. INTRODUCTION AND BACKGROUND
Recent efforts have focused on designing applications that not only deliver high accuracy but also minimize two metrics: time-to-solution and energy-to-solution [1]–[3]. Mixed-precision computing offers potential gains in both. However, achieving those gains without compromising scientific accuracy is non-trivial [4], [5], because it is a multi-objective optimization challenge. Our work addresses this challenge by exploring the optimization space defined by hardware capabilities, algorithmic choices, and the selective application of mixed-precision techniques.
In parallel with theoretical advances in floating-point arithmetic error mitigation and estimation techniques [6], [7], practical tools for exploring precision and error propagation have emerged. One such tool is Verificarlo. It operates at the intermediate-representation level of the compiler to instrument and analyze floating-point operations without modifying the source code. Verificarlo leverages two complementary backends:
- The Monte Carlo Arithmetic (MCA) backend [8] assesses error sensitivity in code regions by evaluating numerical stability under randomized perturbations.
- The Variable Precision (VPREC) backend [9] explores mixed-precision strategies, quantifies rounding-error effects, and determines the minimal precision needed to preserve accuracy or convergence.
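The MCA idea can be made concrete with a minimal Python sketch. It perturbs every intermediate result of a summation with uniform random noise at a virtual precision of t bits. It then repeats the computation and estimates the number of significant bits as -log2(sigma/|mu|) over the samples. The sketch only follows the spirit of Monte Carlo Arithmetic and is not Verificarlo's implementation. The helper names, the choice t = 24, and the test sums are assumptions made for illustration.

```python
import math
import random
import statistics


def mca_perturb(x: float, t: int, rng: random.Random) -> float:
    """Return x plus uniform random noise at its t-th significant bit.

    Simplified random-rounding model in the spirit of MCA:
    x -> x + 2**(e - t) * xi, with xi ~ U(-1/2, 1/2) and 2**(e-1) <= |x| < 2**e.
    """
    if x == 0.0:
        return x
    _, e = math.frexp(x)                 # x = m * 2**e with 0.5 <= |m| < 1
    xi = rng.uniform(-0.5, 0.5)
    return x + math.ldexp(xi, e - t)


def perturbed_sum(values, t: int, rng: random.Random) -> float:
    """Left-to-right sum in which every intermediate result is perturbed."""
    acc = 0.0
    for v in values:
        acc = mca_perturb(acc + v, t, rng)
    return acc


def significant_bits(samples) -> float:
    """Estimate significant bits as -log2(sigma / |mu|) over repeated runs."""
    mu = statistics.fmean(samples)
    sigma = statistics.stdev(samples)
    if sigma == 0.0:
        return math.inf
    return -math.log2(sigma / abs(mu))


rng = random.Random(42)
cases = {
    "benign": [1.0] * 100,                 # well-conditioned sum, exact result 100
    "cancellation": [1.0e8, 1.0, -1.0e8],  # catastrophic cancellation, exact result 1
}
for name, data in cases.items():
    runs = [perturbed_sum(data, t=24, rng=rng) for _ in range(50)]
    print(f"{name}: ~{significant_bits(runs):.1f} significant bits")
```

The well-conditioned sum should keep close to t significant bits. The cancelling sum should lose essentially all of them, which is the kind of sensitivity MCA is used to flag.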
Exascale computing demands an understanding of how hardware limits, algorithm design, and precision choices interact to impact performance. When these factors are balanced well, applications can be both energy-efficient and reliable. To demonstrate this, we present our methodology in the following sections and apply it to two case studies. Both employ explicit solvers and exhibit distinct computational characteristics.
II. METHODOLOGY
Changing an application from double precision to mixed (including lower) precision requires balancing accuracy and performance. Because each application has unique features, the mechanisms that apply these changes have to be adaptable. A key challenge is to identify routines that can use reduced precision safely, without compromising accuracy or degrading performance [10]. This involves weighing the performance gains of lower precision against the overhead of copying and casting between variable types. Achieving this balance demands a deep understanding of the application's architecture, including its data structures, computations, libraries, and communication patterns. To guide this, we follow the methodology proposed in [11].
We start by profiling performance hotspots to identify regions with high potential speedup. Then, we analyze numerical hotspots by tracking variables and quantifying error growth with Verificarlo's VPREC and MCA backends. Finally, we implement mixed precision for the most time-consuming regions that contribute minimal error, which maximizes speedup with minimal accuracy loss.

This function-level analysis illustrates the workflow and the performance-energy trade-offs on sequential versions. It also extends directly to highly parallel scientific codes. There, lower-precision storage shrinks message sizes and reduces communication overhead, in addition to increasing computational throughput. Finer-grained tuning exists [12], [13], but function-level granularity offers a practical balance by significantly reducing the search space.
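The message-size argument can be illustrated with a small Python sketch. It serializes the same field as 8-byte doubles and as 4-byte floats using Python's standard array module, which stands in for a communication buffer. It then checks the precision lost on the round trip. No communication library is involved, and the buffer layout is an assumption.

```python
from array import array


def pack_field(values, single_precision: bool) -> bytes:
    """Serialize a field into a contiguous buffer, as it would be sent in a message.

    Typecode 'd' is a C double (8 bytes on common platforms);
    typecode 'f' is a C float (4 bytes), so values are rounded to binary32.
    """
    typecode = "f" if single_precision else "d"
    return array(typecode, values).tobytes()


field = [0.1 * i for i in range(1000)]
msg64 = pack_field(field, single_precision=False)
msg32 = pack_field(field, single_precision=True)

# Receiver side: unpack the float32 buffer; elements come back as Python floats.
restored = array("f")
restored.frombytes(msg32)
max_rel_err = max(abs(r - x) / abs(x) for r, x in zip(restored, field) if x != 0.0)

print(len(msg64), len(msg32), f"{max_rel_err:.1e}")
```

On common platforms the float32 buffer is exactly half the size. For values in the normal range, the relative error per value is bounded by the binary32 unit roundoff 2^-24 (about 6e-8).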
Throughout, we eliminate regions unlikely to benefit from reduced precision. We adopt a staged strategy in which each stage includes the previous one:
- First, we convert the most time-critical inner routines to single precision.
- Then, we extend the conversion to outer bottlenecks.

Finally, we implement the mixed-precision code and monitor accuracy, time-to-solution, and energy-to-solution on the LUMI system. For this monitoring we use SLURM's energy accounting plugin and HPE Cray PM Counters [14], [15], guided by our CEEC Best Practice Guide [16].
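The staged strategy can be sketched as cumulative sets of routines lowered to single precision. In the minimal Python sketch below:
- routines with a small runtime share are skipped as unlikely to benefit;
- routines whose individually measured error exceeds a tolerance remain in double precision;
- each stage is a superset of the previous one.

The routine names, runtime shares, error values, and thresholds are all hypothetical.

```python
def build_stages(groups, runtime_share, error_when_lowered, tolerance, min_share):
    """Build cumulative mixed-precision stages.

    groups: lists of routine names, from innermost/most time-critical to
        outer bottlenecks (one list per stage).
    runtime_share: fraction of total runtime per routine (from profiling).
    error_when_lowered: error observed when only that routine is lowered to
        single precision (e.g., measured with a precision-emulation tool).
    Routines below `min_share` are skipped; routines whose individual error
    exceeds `tolerance` stay in double precision.
    """
    lowered = []
    stages = []
    for group in groups:
        for name in group:
            worthwhile = runtime_share[name] >= min_share
            safe = error_when_lowered[name] <= tolerance
            if worthwhile and safe and name not in lowered:
                lowered.append(name)
        stages.append(list(lowered))  # each stage includes the previous one
    return stages


# Hypothetical routine names and measurements, for illustration only.
groups = [["inner_kernel"], ["outer_loop", "critical_update"], ["tiny_helper"]]
share = {"inner_kernel": 0.40, "outer_loop": 0.20, "critical_update": 0.30, "tiny_helper": 0.001}
error = {"inner_kernel": 0.0, "outer_loop": 1e-9, "critical_update": 1e-3, "tiny_helper": 0.0}

for k, stage in enumerate(build_stages(groups, share, error, tolerance=1e-8, min_share=0.01), 1):
    print(f"Stage {k}: single precision in {stage}")
```

Errors from individually lowered routines need not add up. Each cumulative stage therefore still has to be re-measured for accuracy, time-to-solution, and energy-to-solution.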
III. REACTOR SIMULATOR BENCHMARK
The Reactor Simulator [17], [18] implements a probabilistic Monte Carlo application that models interactions between neutron-source particles and a slab. It emits particles iteratively, updates their states, and accumulates total-energy in double precision while tracking outcomes with integer counters. This makes it a robust benchmark for mixed-precision techniques. Applying a higher precision to the entire application reduces the forward error in total-energy nearly linearly as the mantissa varies from 3 to 52 bits, regardless of the number of particles. This linear relationship is not trivial, as highlighted in [11].

Type                 Stage 1   Stage 2   Stage 3   Stage 4   Stage 5   Double
Energy Median (J)    1060      1220      1200      1250      1230      1700
Energy Savings (%)   37.6      28.2      29.4      26.4      27.6      -
Time Median (s)      3.89      4.15      4.12      4.22      4.12      5.63
Time Savings (%)     30.7      26.2      26.6      25        26.7      -
Error                0         0         0         10^-8     10^-7     -
TABLE I: Energy (in Joules) and time-to-solution (in seconds) with the Reactor Simulator with 10 million elements.

Type                 Stage 1   Stage 2   Stage 3   Double
Energy Median (J)    1060      803       965       1080
Energy Savings (%)   1.8       25.6      10.6      -
Time Median (s)      3.53      2.71      3.20      3.96
Time Savings (%)     10.8      31.5      19.0      -
Error                10^-9     10^-9     10^-9     -
TABLE II: Energy (in Joules) and time-to-solution (in seconds) with LULESH with 20³ elements.
Profiling reveals that shared math library calls (e.g., sin_fma(), exp_fma()) dominate execution. To evaluate precision impacts, we instrumented these functions using the VPREC backend. We observed that the error plateaus after 17 bits, as shown in Fig. 1. Because FP32 provides 23 mantissa bits, this indicates that FP32 accuracy is sufficient.

We then instrumented each function individually at single precision with the VPREC backend, leaving the others in double precision, to explore how far precision can be reduced. Only two functions out of ten have the potential to contribute to the forward error under this approach. These insights guided our implementation of a five-stage mixed-precision workflow. In Stage 5, most functions run in single precision, and only accuracy-critical routines remain in double precision. Table I summarizes the time and energy savings on the LUMI system.
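The Stage 5 idea pairs cheap single-precision transcendental calls with a double-precision accumulator. It can be illustrated with the toy Python sketch below. The per-particle term, the sampling, and the particle count are hypothetical and do not reproduce the Reactor Simulator's physics. Float32 behavior is emulated by rounding through the struct module, which only approximates calling expf()/sinf().

```python
import math
import random
import struct


def to_f32(x: float) -> float:
    """Round a double to the nearest IEEE-754 binary32 value (like a C float cast)."""
    return struct.unpack("f", struct.pack("f", x))[0]


def particle_term(u: float, v: float, single: bool) -> float:
    """Hypothetical per-particle contribution built from exp() and sin()."""
    if single:
        # Arguments and results rounded to float32, mimicking expf()/sinf().
        uf, vf = to_f32(u), to_f32(v)
        return to_f32(to_f32(math.exp(-uf)) * to_f32(math.sin(vf)))
    return math.exp(-u) * math.sin(v)


def total_energy(n: int, single: bool, seed: int = 1) -> float:
    """Accumulate all contributions in a double-precision total."""
    rng = random.Random(seed)  # same particle stream for both variants
    total = 0.0
    for _ in range(n):
        u = rng.random()
        v = rng.uniform(0.0, math.pi)
        total += particle_term(u, v, single)
    return total


ref = total_energy(50_000, single=False)
mixed = total_energy(50_000, single=True)
print(f"relative difference in total energy: {abs(mixed - ref) / ref:.2e}")
```

Each term is rounded only at float32 level while the sum stays in double. The relative difference in the total should therefore stay on the order of the float32 unit roundoff. If the total were accumulated in float32 instead, the worst-case rounding error would grow with the number of particles.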
IV. LULESH
LULESH (Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics) is a proxy application developed by Lawrence Livermore National Laboratory [19]–[21] that replicates the computational behavior of a hydrodynamics code [22]. Its computational and memory-access patterns are representative of real-world simulations.
With a mesh of 20³ elements, our experiments reveal that the error decreases nearly linearly with increased precision. Notably, at 23 bits of mantissa (corresponding to FP32), we observe a stagnation in the timestep selection with Verificarlo, indicating a precision limit. We also observe that 10 out of 40 functions do not contribute to the final error of the energy variable when each is emulated individually in single precision while the rest of the application is kept in double precision.
Profiling revealed that a small number of routines account for a significant share of the runtime. Functions requiring high numerical accuracy, such as TimeIncrement() and CalcPositionForNodes(), were retained in double precision. In contrast, performance-critical computations were selected for reduced precision to improve efficiency. We applied mixed-precision optimization in three stages:
- First stage: only the core routine CalcElemFBHourglassForce() was converted to single precision. However, this forced per-element casts and copies in its caller, and the resulting overhead yielded negligible performance gains (Table II).
- Second stage: we extended the conversion to CalcFBHourglassForceForElems(). We modified its signature to accept single-precision inputs and relocated all cast/copy operations into its caller, CalcHourglassControlForElems(). This eliminated the indirect-copy overhead and produced the largest performance gain.
- Final stage: we converted the higher-level routines CalcHourglassControlForElems() and CollectDomainNodesToElemNodes(), which showed zero error when instrumented alone in single precision with the VPREC backend. We also converted VoluDer(), which significantly impacts runtime while contributing comparatively little error. This completed the full optimization.

[Figure: mean relative error (%) versus mantissa size (bits, 3 to 52); series cross/sin and cross/exp]
Fig. 1: Mean absolute relative forward error for varying mantissa bits [3, 52] of sin() and exp() calls in the cross() function of the Reactor Simulator case study.
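The difference between the first two LULESH stages is where the casts happen. The Python sketch below mirrors that structure with hypothetical functions loosely modeled on CalcElemFBHourglassForce(), CalcFBHourglassForceForElems(), and CalcHourglassControlForElems(). The weights and data layout are invented for illustration, and Python will not reproduce the C++ speedups. The sketch only shows that moving the conversion out of the per-element loop into a single bulk cast in the caller leaves the results unchanged.

```python
from array import array

NODES_PER_ELEM = 8
# Illustrative per-node weights; a stand-in for a real element kernel.
WEIGHTS = (1.0, 1.0, -1.0, -1.0, -1.0, -1.0, 1.0, 1.0)


def elem_kernel_f32(nodal) -> float:
    """Hypothetical element kernel operating on single-precision nodal data."""
    return sum(w * x for w, x in zip(WEIGHTS, nodal))


def forces_stage1(x_d: array) -> array:
    """Stage-1 layout: only the element kernel is single precision, so the
    loop over elements casts and copies each element's doubles on every call."""
    n_elem = len(x_d) // NODES_PER_ELEM
    out = array("f", [0.0]) * n_elem
    for e in range(n_elem):
        base = e * NODES_PER_ELEM
        local = array("f", x_d[base:base + NODES_PER_ELEM])  # per-element cast + copy
        out[e] = elem_kernel_f32(local)
    return out


def forces_for_elems_f32(x_f: array) -> array:
    """Stage-2 layout: the loop over elements itself takes float32 input."""
    n_elem = len(x_f) // NODES_PER_ELEM
    out = array("f", [0.0]) * n_elem
    for e in range(n_elem):
        base = e * NODES_PER_ELEM
        out[e] = elem_kernel_f32(x_f[base:base + NODES_PER_ELEM])  # no cast here
    return out


def hourglass_control_stage2(x_d: array) -> array:
    """The caller performs one bulk conversion before the float32 loop."""
    x_f = array("f", x_d)  # one cast/copy for the whole field
    return forces_for_elems_f32(x_f)


field = array("d", [0.01 * i for i in range(NODES_PER_ELEM * 1000)])
print(forces_stage1(field) == hourglass_control_stage2(field))  # True
```

Both layouts see identical float32 inputs and therefore produce identical outputs. In compiled code, the difference lies only in how many temporary copies and conversions are executed.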
V. CONCLUSION AND FUTURE WORK
In this work, we applied a well-established methodology to two explicit-solver use cases and showed that mixed-precision tuning is inherently application-specific. Our results demonstrate that selectively reducing precision in key computational kernels can improve performance by up to 31.5 % and energy by up to 37.6 %. Acceptable numerical accuracy is preserved for both the Reactor Simulator and LULESH benchmarks. These results underscore the potential of mixed-precision optimization as an effective way to tune scientific simulations for both performance and energy efficiency. Challenges remain in narrowing the mixed-precision search space and automating its implementation. Future work will address these challenges and evaluate our approach across a wide range of scientific applications.
REFERENCES
[1] M. Malms, L. Cargemel, E. Suarez, N. Mittenzwey, M. Duranton, S. Sezer, C. Prunty, P. Rossé-Laurent, M. Pérez-Harnandez, M. Marazakis, G. Lonsdale, P. Carpenter, G. Antoniu, S. Narasimhamurthy, A. Brinkman, D. Pleiter, U.-U. Haus, J. Krueger, H.-C. Hoppe, E. Laure, A. Wierse, V. Bartsch, K. Michielsen, C. Allouche, T. Becker, and R. Haas, “ETP4HPC’s SRA 5 - Strategic Research Agenda for High-Performance Computing in Europe - 2022,” Nov. 2022.
[2] M. Govett, B. Bah, P. Bauer, D. Berod, V. Bouchet, S. Corti, C. Davis, Y. Duan, T. Graham, Y. Honda, A. Hines, M. Jean, J. Ishida, B. Lawrence, J. Li, J. Luterbacher, C. Muroi, K. Rowe, M. Schultz, M. Visbeck, and K. Williams, “Exascale computing and data handling: Challenges and opportunities for weather and climate prediction,” Bulletin of the American Meteorological Society, vol. 105, no. 12, pp. E2385–E2404, 2024.
[3] DeepSeek-AI, A. Liu, B. Feng, B. Xue, B. Wang, B. Wu, et al., “Deepseek-v3 technical report,” 2025.
[4] Y. Chen, P. de Oliveira Castro, P. Bientinesi, N. Jansson, and R. Iakymchuk, “Enabling mixed-precision in spectral element codes,” Future Generation Computer Systems, vol. 174, p. 107990, 2026.
[5] A. Kashi, H. Lu, W. Brewer, D. Rogers, M. Matheson, M. Shankar, and F. Wang, “Mixed-precision numerics in scientific applications: survey and perspectives,” 2025.
[6] N. J. Higham and T. Mary, “A new approach to probabilistic rounding error analysis,” SIAM Journal on Scientific Computing, vol. 41, no. 5, pp. A2815–A2835, 2019.
[7] E.-M. El Arar, D. Sohier, P. de Oliveira Castro, and E. Petit, “Stochastic rounding variance and probabilistic bounds: A new approach,” SIAM Journal on Scientific Computing, vol. 45, no. 5, pp. C255–C275, 2023.
[8] C. Denis, P. De Oliveira Castro, and E. Petit, “Verificarlo: Checking floating point accuracy through monte carlo arithmetic,” in 2016 IEEE 23nd Symposium on Computer Arithmetic (ARITH), pp. 55–62, 2016.
[9] Y. Chatelain, E. Petit, P. de Oliveira Castro, G. Lartigue, and D. Defour, “Automatic exploration of reduced floating-point representations in iterative methods,” in Euro-Par 2019: Parallel Processing (R. Yahyapour, ed.), (Cham), pp. 481–494, Springer International Publishing, 2019.
[10] P. de Oliveira Castro, High Performance Computing Code Optimizations: Tuning Performance and Accuracy. PhD thesis, Université Paris-Saclay, 2022.
[11] Y. Chen, P. d. O. Castro, P. Bientinesi, and R. Iakymchuk, “Enabling mixed-precision with the help of tools: A nekbone case study,” in Parallel Processing and Applied Mathematics (R. Wyrzykowski, J. Dongarra, E. Deelman, and K. Karczewski, eds.), (Cham), pp. 34–50, Springer Nature Switzerland, 2025.
[12] C. Rubio-González, C. Nguyen, H. D. Nguyen, J. Demmel, W. Kahan, K. Sen, D. H. Bailey, C. Iancu, and D. Hough, “Precimonious: Tuning assistant for floating-point precision,” in SC ’13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 1–12, 2013.
[13] M. O. Lam, T. Vanderbruggen, H. Menon, and M. Schordan, “Tool integration for source-level mixed precision,” in 2019 IEEE/ACM 3rd International Workshop on Software Correctness for HPC Applications (Correctness), pp. 27–35, 2019.
[14] A. Hart, H. Richardson, J. Doleschal, T. Ilsche, M. Bielert, and M. Kappel, “User-level power monitoring and application performance on cray xc30 supercomputers,” Cray User Group, 2014.
[15] S. J. Martin and M. Kappel, “Cray xc30 power monitoring and management,” Cray User Group, 2014.
[16] R. Iakymchuk, G. Gedik, K. Kulkarni, Y. Chen, D. Kempf, S. Kemmler, D. Papageorgiou, D. Koniotis, S. Kiebdaj, J. Corbalan, and H. Köstler, “Best Practice Guide – Harvesting energy consumption on European HPC systems: Sharing Experience from the CEEC project,” Aug. 2024.
[17] D. Kahaner, C. Moler, S. Nash, and J. Burkardt, “Reactor simulator,” 1989.
[18] D. Kahaner, C. Moler, and S. Nash, Numerical Methods and Software. Englewood Cliffs, NJ: Prentice Hall, 1989. LC: TA345.K34.
[19] I. Karlin, A. Bhatele, B. L. Chamberlain, J. Cohen, Z. Devito, M. Gokhale, R. Haque, R. Hornung, J. Keasler, D. Laney, E. Luke, S. Lloyd, J. McGraw, R. Neely, D. Richards, M. Schulz, C. H. Still, F. Wang, and D. Wong, “Lulesh programming model and performance ports overview,” Tech. Rep. LLNL-TR-608824, Livermore, CA, December 2012.
[20] I. Karlin, J. Keasler, and R. Neely, “Lulesh 2.0 updates and changes,” Tech. Rep. LLNL-TR-641973, Livermore, CA, August 2013.
[21] M. B. G. R. D. Hornung, J. A. Keasler, “Hydrodynamics challenge problem,” 2011.
[22] L. L. N. L. (LLNL), “Lulesh – livermore unstructured lagrangian explicit shock hydrodynamics.” https://github.com/LLNL/LULESH/tree/46c2a1d6db9171f9637d79f407212e0f176e8194, n.d. Accessed: 2025-07-31.
ACKNOWLEDGMENTS
The author gratefully acknowledges the financial support provided by the Centre of Excellence in Exascale Computational Fluid Dynamics (CEEC) under Grant No. 101093393, funded by the European Union through the EuroHPC Joint Undertaking and Sweden, Germany, Spain, Greece and Denmark. The author would like to thank the NHR-Verein e.V. (www.nhr-verein.de) for supporting this work/project within the NHR Graduate School of National High Performance Computing (NHR). The author would like to thank Pablo de Oliveira Castro for their valuable guidance and contributions.
APPENDIX A
MEASUREMENT METHODOLOGY DETAILS
All experiments were performed on the CPU partition of the LUMI system. Each benchmark was compiled with g++ version 12.2.0 and the -O2 optimization flag. To obtain statistically significant measurements, every application was run 32 times under the performance frequency-scaling governor, and both runtime (in seconds) and energy consumption (in joules) were recorded.

Energy data were gathered via SLURM's energy-accounting plugin, which reads HPE Cray PM Counters through the Baseboard Management Controller (BMC), as detailed in our CEEC Best Practice Guide. Nodes were dedicated exclusively to each job to eliminate interference from co-scheduled workloads. CPU time was measured with the GNU time command, capturing only user and system time to reflect the precise CPU resources devoted to each process.
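Medians and savings of the kind reported in Tables I and II can be derived from raw per-run measurements along the lines of the Python sketch below. The savings definition, 100 × (baseline − variant) / baseline relative to the double-precision run, is an assumption. Applied to the energy medians of Table I (1700 J for Double versus 1060 J for Stage 1), it gives 37.6 %, which matches the reported value.

```python
import statistics


def median_of_runs(values):
    """Median over repeated runs of one configuration (e.g., 32 runs)."""
    return statistics.median(values)


def savings_percent(baseline: float, variant: float) -> float:
    """Assumed definition: relative saving of a variant w.r.t. the double baseline."""
    return 100.0 * (baseline - variant) / baseline


# Energy medians taken from Table I (Reactor Simulator, Stage 1 vs. Double).
print(round(savings_percent(1700.0, 1060.0), 1))  # 37.6
# Energy medians taken from Table II (LULESH, Stage 2 vs. Double).
print(round(savings_percent(1080.0, 803.0), 1))   # 25.6
```

With an even number of runs, such as 32, statistics.median averages the two middle values.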
APPENDIX B
ERROR ANALYSIS DETAILS
To verify the numerical accuracy of our results, we recompiled the code using Verificarlo (version 1.0.0) with the same -O2 optimization flag used for the performance runs, to avoid unintended alterations from compiler optimizations. During standard runs, Verificarlo's VPREC backend was employed with its default settings, which preserve IEEE-754 binary64 precision and range. For mixed-precision experiments, we invoked VPREC with the --precision-binary64=<desired_mantissa_bits> option to emulate double precision at the specified mantissa width. This approach ensures that only the targeted precision changes, rather than the compiler transformations, affect our accuracy measurements. We quantify the error introduced by reduced precision using the mean absolute relative error.
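The mean absolute relative error over N outputs is (1/N) Σ |x̂_i − x_i| / |x_i|. Here x_i is the double-precision reference and x̂_i is the reduced-precision result. The Python sketch below computes it for a sin() call emulated at several mantissa widths, in the spirit of Fig. 1. The rounding helper is a simplified stand-in for VPREC: round-to-nearest-even on the significand, with no exponent-range limits. It also rounds only the argument and the result, whereas VPREC instruments every floating-point operation. The sample points are arbitrary.

```python
import math


def round_to_mantissa(x: float, p: int) -> float:
    """Round x to the nearest value with p explicit mantissa bits.

    Simplified model: round-half-to-even, unbounded exponent range,
    no subnormal or overflow handling (0, inf and nan pass through).
    """
    if x == 0.0 or math.isinf(x) or math.isnan(x):
        return x
    m, e = math.frexp(x)      # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2.0 ** (p + 1)    # 1 implicit leading bit + p explicit bits
    return math.ldexp(round(m * scale) / scale, e)


def mean_abs_rel_error(approx, exact):
    """Mean of |approx_i - exact_i| / |exact_i| over nonzero exact values."""
    pairs = [(a, r) for a, r in zip(approx, exact) if r != 0.0]
    return sum(abs(a - r) / abs(r) for a, r in pairs) / len(pairs)


def sin_at_precision(x: float, p: int) -> float:
    """Emulate a reduced-precision sin() by rounding its argument and result."""
    return round_to_mantissa(math.sin(round_to_mantissa(x, p)), p)


xs = [0.001 + 0.01 * i for i in range(1000)]
exact = [math.sin(x) for x in xs]
for p in (3, 10, 17, 23, 52):
    approx = [sin_at_precision(x, p) for x in xs]
    print(f"{p:2d} mantissa bits: MARE = {mean_abs_rel_error(approx, exact):.3e}")
```

With p = 23 the helper reproduces binary32 rounding for normal values. With p = 52 it leaves doubles unchanged, so the error there is exactly zero.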