Mixed-Precision For Energy Efficient Computations

Author: Gedik, Gülçin; Schöne, Robert; Iakymchuk, Roman

Publisher: Zenodo

DOI: 10.5281/zenodo.17550111

Source: https://zenodo.org/records/17550111/files/ws_whpc104s2.pdf

Gülçin Gedik∗†‡
†Université Paris-Saclay, UVSQ
Dresden, Germany
Robert Schöne‡
‡ZIH, CIDS, TU Dresden
Dresden, Germany
Roman Iakymchuk∗
∗Umeå University
Umeå, Sweden
Abstract—As simulations become more realistic, the pursuit of higher accuracy results in extended computation times and substantial energy consumption. This study explores mixed-precision computing as a promising strategy to address these challenges, leveraging computer arithmetic tools to optimize performance. To do so, we used the Reactor Simulator and LULESH benchmarks as case studies to evaluate the potential of mixed-precision strategies to reduce both time-to-solution and energy-to-solution. For the Reactor Simulator, we achieved more than 30% reduction in both metrics without compromising accuracy. Similarly, results for LULESH demonstrated improvements of up to 31.5% in time-to-solution and 25.6% savings in energy-to-solution.
Index Terms—Mixed-precision, Time-to-solution, Energy-to-solution, LULESH, Verificarlo.
I. INTRODUCTION AND BACKGROUND
Recent efforts have focused on designing applications that not only deliver high accuracy, but also aim to minimize two metrics: time-to-solution and energy-to-solution [1]–[3]. Although mixed-precision computing offers potential gains in both, achieving those gains without compromising scientific accuracy is non-trivial [4], [5], as it represents a multi-objective optimization challenge. Our work addresses this challenge by exploring the optimization space defined by hardware capabilities, algorithmic choices, and the selective application of mixed-precision techniques.
In parallel with theoretical advances in floating-point arithmetic error mitigation and estimation techniques [6], [7], practical tools for exploring precision and error propagation have emerged. One such tool is Verificarlo, which operates at the intermediate representation level of the compiler to instrument and analyze floating-point operations without modifying the source code. Verificarlo leverages two complementary backends: the Monte Carlo Arithmetic (MCA) backend [8], which assesses error sensitivity in code regions by evaluating numerical stability under randomized perturbations, and the Variable Precision (VPREC) backend [9], which is used to explore mixed-precision strategies, quantify rounding-error effects, and determine the minimal precision needed to preserve accuracy or convergence.
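To make the MCA idea concrete, the following Python sketch mimics a random-rounding perturbation in software: after each addition, the partial result x = m·2^e is perturbed by uniform noise of magnitude 2^(e−t) for a virtual precision t, and the spread over repeated runs gives an estimate of the number of significant bits, −log2(σ/|μ|). This is a simplified illustration written for this text, not Verificarlo's implementation; the helper names, the virtual precision t = 24, the input data, and the sample count of 50 are all assumptions.

```python
import math
import random
import statistics

def mca_perturb(x: float, t: int, rng: random.Random) -> float:
    """Add uniform noise in [-0.5, 0.5) * 2**(e - t), where x = m * 2**e and 0.5 <= |m| < 1.

    Simplified MCA-style perturbation (assumption: applied to results only).
    """
    if x == 0.0:
        return x
    _, e = math.frexp(x)
    return x + 2.0 ** (e - t) * (rng.random() - 0.5)

def perturbed_sum(values, t, rng):
    """Sum values, perturbing every partial result."""
    acc = 0.0
    for v in values:
        acc = mca_perturb(acc + v, t, rng)
    return acc

def significant_bits(samples):
    """Estimate significant bits as -log2(std / |mean|) over the MCA samples."""
    mean = statistics.fmean(samples)
    std = statistics.stdev(samples)
    return -math.log2(std / abs(mean))

rng = random.Random(42)          # fixed seed so the sketch is reproducible
values = [0.1] * 1000            # hypothetical input data
samples = [perturbed_sum(values, t=24, rng=rng) for _ in range(50)]
print(f"mean = {statistics.fmean(samples):.6f}, "
      f"significant bits ~ {significant_bits(samples):.1f}")
```

The estimate comes out somewhat below the virtual precision t because perturbations accumulate over the 1000 additions; a region whose estimate drops sharply would be a candidate numerical hotspot.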
Exascale computing demands an understanding of how hardware limits, algorithm design, and precision choices interact to impact performance. When these factors are balanced well, applications can be both energy-efficient and reliable. To demonstrate this, we present our methodology in the following sections and apply it to two case studies, both of which employ explicit solvers and exhibit distinct computational characteristics.
II. METHODOLOGY
Changing an application from double precision to mixed (including lower) precision requires balancing accuracy and performance. Hence, the mechanisms that apply these changes have to be adaptable, since each application has unique features. A key challenge is to identify routines that can use reduced precision safely, that is, without compromising accuracy or degrading performance [10]. This involves weighing the performance gains of lower precision against the overhead of copying and casting between variable types. Achieving this balance demands a deep understanding of the application's architecture, including data structures, computations, libraries, and communication patterns. To guide this, we follow the methodology proposed in [11].
We start by profiling performance hotspots to identify regions with high potential speedup. Then, we analyze numerical hotspots by tracking variables and quantifying the error growth with the help of Verificarlo's VPREC and MCA backends. We then implement mixed precision for the most time-consuming regions that contribute minimal error, which maximizes speedup with minimal accuracy loss. This function-level analysis illustrates the workflow and the performance-energy trade-offs for sequential versions, but it also extends directly to highly parallel scientific codes, where lower-precision storage shrinks message sizes and reduces communication overhead in addition to enabling higher computational throughput. While fine-grained tuning exists [12], [13], function-level granularity offers a practical balance by significantly reducing the search space.
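As a rough illustration of what VPREC-style emulation measures, the sketch below rounds a binary64 value to p explicit mantissa bits (round-half-to-even on the significand) and reports the mean absolute relative error of sin() when its input and output are rounded. The helper names and the sample inputs are assumptions for illustration; unlike this sketch, VPREC instruments operations at the compiler IR level and also models the exponent range, which is ignored here together with subnormals, infinities, and NaNs.

```python
import math
import struct

def round_mantissa(x: float, p: int) -> float:
    """Round a finite double to p explicit mantissa bits (0 <= p <= 52).

    Round-half-to-even on the significand; exponent range, subnormals,
    infinities and NaNs are not modeled (simplifying assumption).
    """
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)                      # x == m * 2**e, 0.5 <= |m| < 1
    # m * 2**(p + 1) has p + 1 integer bits: the implicit bit plus p explicit ones.
    return math.ldexp(round(math.ldexp(m, p + 1)), e - p - 1)

def to_fp32(x: float) -> float:
    """Round a double to the nearest binary32 value via the struct module."""
    return struct.unpack("f", struct.pack("f", x))[0]

def mean_abs_rel_error(f, xs, p):
    """Mean |f_p(x) - f(x)| / |f(x)|, where f_p rounds input and output to p bits."""
    total = 0.0
    for x in xs:
        ref = f(x)
        approx = round_mantissa(f(round_mantissa(x, p)), p)
        total += abs(approx - ref) / abs(ref)
    return total / len(xs)

xs = [0.1 + 0.01 * i for i in range(100)]     # hypothetical inputs with sin(x) != 0
for p in (10, 17, 23, 52):
    print(f"p = {p:2d}: mean abs. relative error of sin = "
          f"{mean_abs_rel_error(math.sin, xs, p):.3e}")

print(round_mantissa(0.1, 23) == to_fp32(0.1))  # True: p = 23 reproduces binary32 rounding
```

Sweeping p in this way, per function, is what allows the error contribution of each candidate region to be ranked before any source code is changed.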
Throughout, we eliminate regions unlikely to benefit from reduced precision. We adopt a staged strategy: we first convert the most time-critical inner routines to single precision, then extend the conversion to outer bottlenecks, where each stage includes the previous one. Finally, we implement the mixed-precision code and monitor accuracy, time-to-solution, and energy-to-solution on the LUMI system using SLURM's energy accounting plugin and HPE Cray PM Counters [14], [15], guided by our CEEC Best Practice Guide [16].
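A minimal sketch of the staged, cumulative strategy: routines whose solo single-precision error exceeds a tolerance, or whose runtime share is too small to matter, are eliminated, and stage k then demotes every remaining routine up to call-tree depth k, so each stage contains the previous one. The Routine fields, routine names, runtime shares, error values, and thresholds below are hypothetical placeholders, not measurements from this study; in practice the error would come from VPREC runs and the shares from profiling.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Routine:
    name: str             # hypothetical routine name
    runtime_share: float  # fraction of total runtime from profiling (assumed)
    depth: int            # 0 = innermost kernel, larger = further out in the call tree
    solo_error: float     # error when demoted alone to FP32 (assumed VPREC result)

def plan_stages(routines, tolerance, min_share):
    """Return cumulative stages: stage k demotes all eligible routines with depth <= k."""
    eligible = [r for r in routines
                if r.solo_error <= tolerance and r.runtime_share >= min_share]
    ordered = sorted(eligible, key=lambda r: (r.depth, -r.runtime_share))
    return [[r.name for r in ordered if r.depth <= depth]
            for depth in sorted({r.depth for r in eligible})]

routines = [
    Routine("inner_kernel", 0.45, 0, 0.0),
    Routine("mid_loop", 0.25, 1, 1e-9),
    Routine("outer_driver", 0.15, 2, 1e-8),
    Routine("timestep", 0.10, 1, 1e-3),   # accuracy-critical: stays in double
    Routine("io_helper", 0.01, 2, 0.0),   # too little runtime to be worth converting
]
for k, stage in enumerate(plan_stages(routines, tolerance=1e-7, min_share=0.05), start=1):
    print(f"Stage {k}: {stage}")
```

Ordering by depth mirrors the inner-to-outer progression described above; a real plan would also weigh the cast and copy cost at each stage boundary.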
III. REACTOR SIMULATOR BENCHMARK
The Reactor Simulator [17], [18] implements a probabilistic Monte Carlo application that models interactions between neutron-source particles and a slab. It emits particles iteratively, updates their states, and accumulates the total energy in double precision while tracking outcomes with integer counters, making it a robust benchmark for mixed-precision techniques. Increasing the precision uniformly across the entire application reduces the forward error in the total energy nearly linearly as the mantissa varies from 3 to 52 bits, regardless of the number of particles. This linear relationship is non-trivial, as highlighted in [11].
Metric | Stage 1 | Stage 2 | Stage 3 | Stage 4 | Stage 5 | Double
Energy median (J) | 1060 | 1220 | 1200 | 1250 | 1230 | 1700
Energy savings (%) | 37.6 | 28.2 | 29.4 | 26.4 | 27.6 | -
Time median (s) | 3.89 | 4.15 | 4.12 | 4.22 | 4.12 | 5.63
Time savings (%) | 30.7 | 26.2 | 26.6 | 25.0 | 26.7 | -
Error | 0 | 0 | 0 | 10^-8 | 10^-7 | -
TABLE I: Energy (in joules) and time-to-solution (in seconds) for the Reactor Simulator with 10 million elements.
Metric | Stage 1 | Stage 2 | Stage 3 | Double
Energy median (J) | 1060 | 803 | 965 | 1080
Energy savings (%) | 1.8 | 25.6 | 10.6 | -
Time median (s) | 3.53 | 2.71 | 3.20 | 3.96
Time savings (%) | 10.8 | 31.5 | 19.0 | -
Error | 10^-9 | 10^-9 | 10^-9 | -
TABLE II: Energy (in joules) and time-to-solution (in seconds) for LULESH with 20³ elements.
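One way to read the near-linear relationship between mantissa width and forward error noted in the Reactor Simulator discussion is that each extra mantissa bit roughly halves the error, so log2(error) falls by about one per bit. The sketch below checks this on a toy accumulation: hypothetical per-particle energy contributions are summed with every partial sum rounded to p bits (a simplified VPREC-like rounding, ignoring exponent range), and a least-squares slope of log2(relative error) against p is fitted. The data, the range p = 20..44, and the helper names are assumptions; the range deliberately stays above the widths at which a long sum stagnates.

```python
import math
import random

def round_mantissa(x: float, p: int) -> float:
    """Round x to p explicit mantissa bits (round-half-even; exponent range not modeled)."""
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)                       # x = m * 2**e with 0.5 <= |m| < 1
    return math.ldexp(round(math.ldexp(m, p + 1)), e - p - 1)

def accumulate(values, p):
    """Sum values, rounding every partial sum to p mantissa bits."""
    acc = 0.0
    for v in values:
        acc = round_mantissa(acc + v, p)
    return acc

def fit_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

rng = random.Random(0)
energies = [rng.random() for _ in range(1000)]   # hypothetical per-particle contributions
exact = math.fsum(energies)                      # accurately rounded reference sum

bits, log_err = [], []
for p in range(20, 45):
    err = abs(accumulate(energies, p) - exact) / exact
    if err > 0.0:                                # skip the (unlikely) exact hits
        bits.append(p)
        log_err.append(math.log2(err))

print(f"slope of log2(error) vs mantissa bits: {fit_slope(bits, log_err):.2f}")
```

A slope close to −1 corresponds to the "nearly linear" behavior on a logarithmic error axis; deviations from it, such as a plateau, flag a precision limit elsewhere in the computation.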
Profiling reveals that shared math library calls (e.g., sin_fma(), exp_fma()) dominate execution. To evaluate precision impacts, we instrumented these functions using the VPREC backend. We observed that the error plateaus after 17 bits, indicating that FP32 accuracy is sufficient, as shown in Fig. 1. We then instrumented each function individually at single precision with the VPREC backend, while leaving the others in double precision. Following this approach, we explored the extent to which precision can be reduced, and found that only two functions out of ten have the potential to contribute to the forward error. These insights guided our implementation of a five-stage mixed-precision workflow. In Stage 5, most functions run in single precision, with only accuracy-critical routines in double precision. Table I summarizes the time and energy savings on the LUMI system.
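The one-function-at-a-time experiment described above can be sketched as follows: a hypothetical three-routine pipeline is evaluated with exactly one routine demoted to emulated single precision (rounding after every operation via the struct module), and the mean absolute relative error against the all-double baseline is reported per routine. The routine names, the pipeline, and the inputs are invented for illustration; the cancellation-prone "offset" routine plays the role of an accuracy-critical function, while the other two barely move the result.

```python
import math
import struct

def to_fp32(x: float) -> float:
    """Round a double to the nearest binary32 value (emulated single precision)."""
    return struct.unpack("f", struct.pack("f", x))[0]

def keep_fp64(x: float) -> float:
    """No-op rounding: the routine stays in double precision."""
    return x

# Hypothetical three-routine pipeline; `rd` is applied after every operation,
# so passing to_fp32 emulates running that one routine in single precision.
ROUTINES = {
    "scale": lambda x, rd: rd(rd(x) * 1.5),
    "offset": lambda x, rd: rd(rd(rd(x) + 1.0e6) - 1.0e6),  # cancellation-prone
    "root": lambda x, rd: rd(math.sqrt(rd(x))),
}

def run(inputs, demoted=None):
    """Evaluate the pipeline, demoting only the routine named `demoted`."""
    outputs = []
    for x in inputs:
        for name, routine in ROUTINES.items():
            x = routine(x, to_fp32 if name == demoted else keep_fp64)
        outputs.append(x)
    return outputs

def mean_abs_rel_error(approx, ref):
    """Mean of |approx_i - ref_i| / |ref_i| over all outputs."""
    return sum(abs(a - r) / abs(r) for a, r in zip(approx, ref)) / len(ref)

inputs = [0.1 + 0.001 * i for i in range(1000)]   # hypothetical inputs
baseline = run(inputs)                            # everything in double precision
errors = {name: mean_abs_rel_error(run(inputs, name), baseline) for name in ROUTINES}
for name, err in errors.items():
    print(f"{name:>6} in FP32: mean abs. relative error = {err:.2e}")
```

Only routines whose solo error stays below the accuracy target become candidates for the staged conversion; the others are kept in double precision.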
IV. LULESH
LULESH (Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics) is a proxy application developed by Lawrence Livermore National Laboratory [19]–[21] that replicates the computational behavior of a hydrodynamics code [22]. Its computational and memory-access patterns are representative of real-world simulations.
With a mesh of 20³ elements, our experiments reveal that the error decreases nearly linearly with increased precision. Notably, at 23 bits of mantissa (corresponding to FP32), we observe a stagnation at the timestep selection with Verificarlo, indicating a precision limit. We also observe that 10 out of 40 functions do not contribute to the final error of the energy variable when each is emulated in single precision separately while the rest of the application is kept in double precision.
Fig. 1: Mean absolute relative forward error for varying mantissa bits [3, 52] of the sin() and exp() calls in the cross() function of the Reactor Simulator case study. [Plot: mean relative error (%, 0 to 0.0012) versus mantissa size (bits, 3 to 52); series cross/sin and cross/exp.]
Profiling revealed that a small number of routines accounted for a significant share of the runtime. Functions requiring high numerical accuracy, such as TimeIncrement() and CalcPositionForNodes(), were retained in double precision. In contrast, performance-critical computations were selected for reduced precision to improve efficiency. We applied mixed-precision optimization in three stages. In the first stage, only the core routine CalcElemFBHourglassForce() was converted to single precision; however, the per-element casts and copies imposed on its caller introduced overhead that yielded negligible performance gains, as seen in Table II. In the second stage, we extended this conversion to CalcFBHourglassForceForElems() by modifying its signature to accept single-precision inputs and relocating all cast/copy operations into its caller, CalcHourglassControlForElems(). This eliminated the indirect-copy overhead and produced the largest performance gain. In the final stage, we converted the higher-level routines CalcHourglassControlForElems() and CollectDomainNodesToElemNodes(), which showed zero error when instrumented alone in single precision with the VPREC backend, as well as VoluDer(), which significantly impacts runtime while its contribution to the error is comparatively limited, thereby completing the full optimization.
V. CONCLUSION AND FUTURE WORK
In this work, we applied a well-established methodology to two explicit-solver use cases and showed that mixed-precision tuning is inherently application-specific. Our results demonstrate that selectively reducing precision in key computational kernels can reduce time-to-solution and energy-to-solution by up to 31.5% and 37.6%, respectively, while preserving acceptable numerical accuracy for both the Reactor Simulator and LULESH benchmarks. These results underscore the potential of mixed-precision optimization as an effective approach to tuning scientific simulations for both performance and energy efficiency. Although challenges remain in narrowing the mixed-precision search space and automating its implementation, future work will address these and evaluate our approach across a wide range of scientific applications.
REFERENCES
[1] M. Malms, L. Cargemel, E. Suarez, N. Mittenzwey, M. Duranton, S. Sezer, C. Prunty, P. Rossé-Laurent, M. Pérez-Harnandez, M. Marazakis, G. Lonsdale, P. Carpenter, G. Antoniu, S. Narasimhamurthy, A. Brinkman, D. Pleiter, U.-U. Haus, J. Krueger, H.-C. Hoppe, E. Laure, A. Wierse, V. Bartsch, K. Michielsen, C. Allouche, T. Becker, and R. Haas, "ETP4HPC's SRA 5 - Strategic Research Agenda for High-Performance Computing in Europe - 2022," Nov. 2022.
[2] M. Govett, B. Bah, P. Bauer, D. Berod, V. Bouchet, S. Corti, C. Davis, Y. Duan, T. Graham, Y. Honda, A. Hines, M. Jean, J. Ishida, B. Lawrence, J. Li, J. Luterbacher, C. Muroi, K. Rowe, M. Schultz, M. Visbeck, and K. Williams, "Exascale computing and data handling: Challenges and opportunities for weather and climate prediction," Bulletin of the American Meteorological Society, vol. 105, no. 12, pp. E2385–E2404, 2024.
[3] DeepSeek-AI, A. Liu, B. Feng, B. Xue, B. Wang, B. Wu, et al., "DeepSeek-V3 technical report," 2025.
[4] Y. Chen, P. de Oliveira Castro, P. Bientinesi, N. Jansson, and R. Iakymchuk, "Enabling mixed-precision in spectral element codes," Future Generation Computer Systems, vol. 174, p. 107990, 2026.
[5] A. Kashi, H. Lu, W. Brewer, D. Rogers, M. Matheson, M. Shankar, and F. Wang, "Mixed-precision numerics in scientific applications: survey and perspectives," 2025.
[6] N. J. Higham and T. Mary, "A new approach to probabilistic rounding error analysis," SIAM Journal on Scientific Computing, vol. 41, no. 5, pp. A2815–A2835, 2019.
[7] E.-M. El Arar, D. Sohier, P. de Oliveira Castro, and E. Petit, "Stochastic rounding variance and probabilistic bounds: A new approach," SIAM Journal on Scientific Computing, vol. 45, no. 5, pp. C255–C275, 2023.
[8] C. Denis, P. De Oliveira Castro, and E. Petit, "Verificarlo: Checking floating point accuracy through Monte Carlo arithmetic," in 2016 IEEE 23nd Symposium on Computer Arithmetic (ARITH), pp. 55–62, 2016.
[9] Y. Chatelain, E. Petit, P. de Oliveira Castro, G. Lartigue, and D. Defour, "Automatic exploration of reduced floating-point representations in iterative methods," in Euro-Par 2019: Parallel Processing (R. Yahyapour, ed.), (Cham), pp. 481–494, Springer International Publishing, 2019.
[10] P. de Oliveira Castro, High Performance Computing Code Optimizations: Tuning Performance and Accuracy. PhD thesis, Université Paris-Saclay, 2022.
[11] Y. Chen, P. d. O. Castro, P. Bientinesi, and R. Iakymchuk, "Enabling mixed-precision with the help of tools: A Nekbone case study," in Parallel Processing and Applied Mathematics (R. Wyrzykowski, J. Dongarra, E. Deelman, and K. Karczewski, eds.), (Cham), pp. 34–50, Springer Nature Switzerland, 2025.
[12] C. Rubio-González, C. Nguyen, H. D. Nguyen, J. Demmel, W. Kahan, K. Sen, D. H. Bailey, C. Iancu, and D. Hough, "Precimonious: Tuning assistant for floating-point precision," in SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 1–12, 2013.
[13] M. O. Lam, T. Vanderbruggen, H. Menon, and M. Schordan, "Tool integration for source-level mixed precision," in 2019 IEEE/ACM 3rd International Workshop on Software Correctness for HPC Applications (Correctness), pp. 27–35, 2019.
[14] A. Hart, H. Richardson, J. Doleschal, T. Ilsche, M. Bielert, and M. Kappel, "User-level power monitoring and application performance on Cray XC30 supercomputers," Cray User Group, 2014.
[15] S. J. Martin and M. Kappel, "Cray XC30 power monitoring and management," Cray User Group, 2014.
[16] R. Iakymchuk, G. Gedik, K. Kulkarni, Y. Chen, D. Kempf, S. Kemmler, D. Papageorgiou, D. Koniotis, S. Kiebdaj, J. Corbalan, and H. Köstler, "Best Practice Guide – Harvesting energy consumption on European HPC systems: Sharing Experience from the CEEC project," Aug. 2024.
[17] D. Kahaner, C. Moler, S. Nash, and J. Burkardt, "Reactor simulator," 1989.
[18] D. Kahaner, C. Moler, and S. Nash, Numerical Methods and Software. Englewood Cliffs, NJ: Prentice Hall, 1989. LC: TA345.K34.
[19] I. Karlin, A. Bhatele, B. L. Chamberlain, J. Cohen, Z. Devito, M. Gokhale, R. Haque, R. Hornung, J. Keasler, D. Laney, E. Luke, S. Lloyd, J. McGraw, R. Neely, D. Richards, M. Schulz, C. H. Still, F. Wang, and D. Wong, "LULESH programming model and performance ports overview," Tech. Rep. LLNL-TR-608824, Livermore, CA, December 2012.
[20] I. Karlin, J. Keasler, and R. Neely, "LULESH 2.0 updates and changes," Tech. Rep. LLNL-TR-641973, Livermore, CA, August 2013.
[21] R. D. Hornung, J. A. Keasler, and M. B. Gokhale, "Hydrodynamics challenge problem," 2011.
[22] Lawrence Livermore National Laboratory (LLNL), "LULESH – Livermore unstructured Lagrangian explicit shock hydrodynamics." https://github.com/LLNL/LULESH/tree/46c2a1d6db9171f9637d79f407212e0f176e8194, n.d. Accessed: 2025-07-31.
ACKNOWLEDGMENTS
The author gratefully acknowledges the financial support provided by the Centre of Excellence in Exascale Computational Fluid Dynamics (CEEC) under Grant No. 101093393, funded by the European Union through the EuroHPC Joint Undertaking and Sweden, Germany, Spain, Greece and Denmark. The author would like to thank the NHR-Verein e.V. (www.nhr-verein.de) for supporting this work/project within the NHR Graduate School of National High Performance Computing (NHR). The author would like to thank Pablo de Oliveira Castro for his valuable guidance and contributions.
APPENDIX A
MEASUREMENT METHODOLOGY DETAILS
All experiments were performed on the CPU partition of the LUMI system, where we compiled each benchmark with g++ version 12.2.0 and the -O2 optimization flag. To obtain statistically significant measurements, every application was run 32 times under the performance scaling governor, and both runtime (in seconds) and energy consumption (in joules) were recorded. Energy data were gathered via SLURM's energy-accounting plugin, which reads HPE Cray PM Counters through the Baseboard Management Controller (BMC), as detailed in our CEEC Best Practice Guide; nodes were dedicated exclusively to each job to eliminate interference from co-scheduled workloads. CPU time was measured with the GNU time command, capturing only user and system time to reflect the precise CPU resources devoted to each process.
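Tables I and II report medians over repeated runs and savings relative to the double-precision baseline. A minimal sketch of that arithmetic, assuming savings are computed as 100 · (baseline median − mixed median) / baseline median: the 32 readings per configuration below are synthetic values chosen so that their medians equal the stage-1 and double-precision energy medians of Table I; they are not measured data.

```python
import statistics

def median_saving(baseline_runs, mixed_runs):
    """Percentage saving of the median mixed-precision run relative to the median baseline run."""
    base = statistics.median(baseline_runs)
    mixed = statistics.median(mixed_runs)
    return 100.0 * (base - mixed) / base

# Synthetic energy readings in joules, 32 runs each (spread of +/- 8 J around the medians).
double_runs = [1700.0 + (i % 5 - 2) * 4.0 for i in range(32)]
mixed_runs = [1060.0 + (i % 5 - 2) * 4.0 for i in range(32)]

print(f"double median: {statistics.median(double_runs):.0f} J")
print(f"mixed median:  {statistics.median(mixed_runs):.0f} J")
print(f"energy saving: {median_saving(double_runs, mixed_runs):.1f} %")
```

The median is robust to the occasional slow or noisy run, which is one reason to prefer it over the mean for 32 repetitions. Savings recomputed from the rounded medians printed in the tables may differ from the reported percentages in the last digit.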
APPENDIX B
ERROR ANALYSIS DETAILS
To verify the numerical accuracy of our results, we recompiled the code using Verificarlo (version 1.0.0) with the -O2 optimization flag to avoid unintended alterations from compiler optimizations. During standard runs, Verificarlo's VPREC backend was employed with its default settings, preserving IEEE-754 binary64 precision and range. For mixed-precision experiments, we invoked VPREC with the -precision-binary64=<desired_mantissa_bits> option to emulate double precision at the specified mantissa width. This approach ensures that only the targeted precision changes, rather than other compiler transformations, impact our accuracy measurements. We quantify the error introduced by reduced precision using the mean absolute relative error.
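For reference, a standard way to write the mean absolute relative error over N compared quantities is given below, with x_i the double-precision reference value and \hat{x}_i the reduced-precision result. The symbols are introduced here for illustration; multiplying by 100 gives the percentage scale used on the vertical axis of Fig. 1.

```latex
\mathrm{MARE} = \frac{1}{N} \sum_{i=1}^{N} \frac{\lvert \hat{x}_i - x_i \rvert}{\lvert x_i \rvert},
\qquad x_i \neq 0
```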
