Towards an Optimal IO500 Configuration: Literature Meets Empirical Evaluation

Author: Ahmad, Hadi; Liem, Radita; Lofstead, Jay
Publisher: Zenodo
DOI: 10.5281/zenodo.17654749
Source: https://zenodo.org/records/17654749/files/REXIO2025-Paper3-Liem.pdf
Hadi Ahmad*, Radita Liem*, Jay Lofstead+
*Chair of High Performance Computing, IT Center, RWTH Aachen University
+Sandia National Laboratory
REX-IO - Cluster 2025

Background
• Storage and I/O are critical components of HPC clusters.
• We are looking into the IO500 benchmark, the current de facto standard for I/O system benchmarking, which combines IOR, mdtest, and find to evaluate bandwidth, metadata, and search performance.
• The benchmark results are published in the IO500 list. It is modeled after the TOP500 and Green500 rankings, enabling comparisons across data centers.
• However, despite its widespread adoption, IO500 tuning remains complex, often undocumented, and difficult to reproduce.
• There is a clear research gap: published studies are limited, while there is a need for more transparent and systematic investigation that can guide data centers in tuning the benchmark.
• This work tries to address the gap through a literature-guided tuning study on CLAIX23, combining benchmark parameter adjustments with filesystem configuration experiments.

IO500 Benchmark Overview [1]
• IO500 is a benchmark suite designed to evaluate the I/O performance of a cluster.
• It was developed in 2017 and provides a standardized evaluation of both metadata and bandwidth performance.
• IO500 runs a series of tests that measure I/O bandwidth, metadata performance, and small file handling.
• It consists of three main benchmarks:
− IOR: Measures read and write bandwidth for parallel I/O.
− mdtest: Evaluates metadata operations like file creation, deletion, and stat calls.
− Find: Finds relevant objects based on patterns.
• The IOR and mdtest benchmarks are configured to represent the 'worst-case' and 'best-case' I/O operation scenarios in the storage system.

IO500 Benchmark Overview [2]
• Scenarios of the IO500 benchmarks:
Component | Tests | Metric | Explanation
IOR 'easy' | ior_easy_write, ior_easy_read | GiB/s | Free to tune IOR parameters. Typically file-per-process with large, aligned chunks to get the best possible bandwidth performance.
IOR 'hard' | ior_hard_write, ior_hard_read | GiB/s | Limited options to tune. Forced to use small unaligned I/O to a single shared file for the worst possible bandwidth performance.
mdtest 'easy' | mdtest_easy_delete, mdtest_easy_stat, mdtest_easy_write | KIOPS | Free to tune mdtest parameters, with zero-size files in a separate directory per process, to represent the best-case scenario for metadata rate.
mdtest 'hard' | mdtest_hard_delete, mdtest_hard_stat, mdtest_hard_write, mdtest_hard_read | KIOPS | Limited options to tune. All processes are forced to write to a single shared directory, representing the worst-case scenario for metadata rate.
Find | find | KIOPS | Finds a specific subset of files from those created by the scenarios above.

IOR
• The benchmark used in IO500 to evaluate the read and write bandwidth of a parallel filesystem.
• Users can specify parameters as input to the program to adjust the configuration of their runs.
• Sample run: ./ior -t 2m -b 9920000m -a POSIX -s 100000 -F
• IOR uses MPI to coordinate the processes that read and write concurrently, ensuring they start and run in a coordinated manner.
• To measure bandwidth, IOR synchronizes all processes, ensuring they start simultaneously. It records start and end times, then divides the aggregate amount of data written by the elapsed time.
Flag | Parameter
-a | API (POSIX, HDF5, MPIIO, …)
-b | Block Size/Chunk Size
-F | File per process
-s | Segment Count
-t | Transfer Size
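
The bandwidth calculation described on this slide can be sketched as follows. This is a minimal illustration, not IOR's own code: the function names, the task count, and the timings are hypothetical, and only the roles of block size (-b) and segment count (-s) are taken from the flag table.

```python
MIB = 1024 ** 2

def aggregate_bytes(block_size: int, segment_count: int, num_tasks: int) -> int:
    """Total data moved: every task handles segment_count blocks of block_size bytes."""
    return block_size * segment_count * num_tasks

def bandwidth_mib_s(total_bytes: int, start: float, end: float) -> float:
    """Aggregate bandwidth: data volume divided by elapsed wall-clock time."""
    elapsed = end - start
    if elapsed <= 0:
        raise ValueError("end time must be later than start time")
    return total_bytes / elapsed / MIB

# Hypothetical run: 4 tasks, 16 MiB blocks, 8 segments, finished in 2 seconds.
total = aggregate_bytes(block_size=16 * MIB, segment_count=8, num_tasks=4)
print(total // MIB, "MiB moved")
print(bandwidth_mib_s(total, start=0.0, end=2.0), "MiB/s")
```

Because all tasks start together, the elapsed time is governed by the slowest task, which is why a single straggler lowers the reported aggregate bandwidth.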

IOR in IO500
• The IO500 benchmark's IOR scenarios are divided into 'easy' and 'hard':
▪ The worst-case scenario comes from the IOR 'hard' scenario.
▪ The best-case scenario comes from the IOR 'easy' scenario.
• Sample configuration of IOR easy in IO500:
./ior --dataPacketType=timestamp -C -Q 1 -g -G -309386941 -k -e -t 2m -b 9920000m -F -r -R -a POSIX
• Differences between 'hard' and 'easy' scenarios:
Feature | Easy | Hard
File Access | Independent | Shared
Data Pattern/Segment(s) | Contiguous | Non-contiguous
Block/Chunk Size | Large (customizable) | Small (47008 Bytes)
Transfer Size | Customizable | Small (47008 Bytes)
Expected Throughput | Higher | Lower
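
One way to see why the 'hard' pattern is demanding is to compute where each process writes in the shared file. The sketch below assumes the segmented shared-file layout commonly described for IOR, in which each segment holds one block per task in rank order. The function name, the task count, and the 512 KiB chunk size used for the alignment check are illustrative assumptions.

```python
BLOCK = 47008        # bytes; block and transfer size of the 'hard' scenario
CHUNK = 512 * 1024   # assumed filesystem chunk size, used only for the alignment check

def shared_file_offset(rank: int, segment: int, num_tasks: int, block: int = BLOCK) -> int:
    """Byte offset of a task's block inside a segmented shared file."""
    return segment * num_tasks * block + rank * block

num_tasks = 4  # hypothetical task count
for segment in range(2):
    for rank in range(num_tasks):
        off = shared_file_offset(rank, segment, num_tasks)
        aligned = off % CHUNK == 0
        print(f"segment={segment} rank={rank} offset={off} chunk_aligned={aligned}")
```

Under this layout, consecutive blocks of one rank are separated by the blocks of all other ranks, so each process accesses the file non-contiguously, and almost none of the offsets fall on a chunk boundary.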

mdtest
• It is used to evaluate the metadata performance of parallel file systems by measuring how effectively a system handles create, status (stat), and delete operations.
• Users specify parameters as input to the program to adjust the configuration of their runs.
▪ Input example: ./mdtest -n 1000000 -t -w 1k -e 1K -N 1 -a POSIX
• According to the input configuration, mdtest generates a directory tree, then fills the tree with the desired number of files and directories. Different configurations result in different tree depths and file locations (at leaf only, distributed throughout, etc.).
• After creation, directories and files have their metadata retrieved, and once the stat operations are complete, the contents of the tree are removed. Users are also given the option to read from these files.
• Similar to IOR, mdtest uses MPI to coordinate the processes that create, stat, and remove concurrently, ensuring they start and proceed in a coordinated manner.
• To measure IOPS, it records start and end times, then divides the number of actions performed by the elapsed time.
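
The create, stat, and remove phases and the IOPS calculation can be sketched in a single process as shown below. This is only an illustration of the measurement idea, not mdtest itself: it uses no MPI, a flat temporary directory, and hypothetical helper names (`timed_phase`, `create_empty`, `run_phases`).

```python
import os
import tempfile
import time

def timed_phase(action, paths):
    """Run action on every path and return operations per second."""
    start = time.perf_counter()
    for path in paths:
        action(path)
    elapsed = time.perf_counter() - start
    return len(paths) / elapsed if elapsed > 0 else float("inf")

def create_empty(path):
    # Create a zero-byte file, as in a pure metadata test.
    with open(path, "w"):
        pass

def run_phases(num_files: int = 1000) -> dict:
    """Create, stat, and remove num_files empty files in a temporary directory."""
    with tempfile.TemporaryDirectory() as workdir:
        paths = [os.path.join(workdir, f"file.{i}") for i in range(num_files)]
        rates = {
            "create": timed_phase(create_empty, paths),
            "stat": timed_phase(os.stat, paths),
            "remove": timed_phase(os.remove, paths),
        }
    return rates

if __name__ == "__main__":
    for phase, rate in run_phases().items():
        print(f"{phase}: {rate / 1000:.1f} KIOPS")
```

On a local disk the numbers mostly reflect the operating system's caches; the sketch is meant to show how a rate is derived from a count and two timestamps, not to produce comparable results.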

mdtest in IO500
• The IO500 benchmark's mdtest scenarios are used to evaluate metadata operation performance.
▪ The worst-case scenario comes from the mdtest 'hard' scenario.
▪ The best-case scenario comes from the mdtest 'easy' scenario.
• Sample configuration of mdtest hard in IO500:
./mdtest --dataPacketType=timestamp -n 1000000 -t -w 3901 -e 3901 -P -G=577035642 -N 1 -F -C -Y -W 300 -a POSIX
• Differences between 'easy' and 'hard' scenarios:
Feature | Easy | Hard
Directory Depth | Flat | Hierarchical/Deep
Operations | Create/Stat/Remove | Create(Write)/Stat/Read/Remove
Write Size | 0 | 3901 Bytes
Read Size | 0 | 3901 Bytes
Expected Throughput | Higher | Lower

Literature Study
Publication | Task count | Chunk Size | Transfer Size | Stripe Count | Filesystem | Benchmark
Boito et al. | | | | | BeeGFS | IOR
Bo ka et al. | | | | | BeeGFS | IOR
Brzenski et al. | | | | | BeeGFS | IOR
Carns et al. | | | | | PVFS | mdtest
Chowdhury et al. | | | | | BeeGFS | IOR
Hennecke | | | | | DAOS | IOR, mdtest (IO500)
Reed et al. | | | | | Lustre | IOR
Saini et al. | | | | | Lustre | IOR
Shan et al. | | | | | Lustre, GPFS | IOR
Sung et al. | | | | | Lustre | IOR

Task Counts: mdtest - Experiment Results [1]
• mdtest easy: ~1200% of the peak performance of mdtest hard.
• mdtest easy: ~40% of the manufacturer-listed metadata performance.
• mdtest easy performance is stable from 240 to 720 tasks.
• Drop in performance at 960 tasks.
• mdtest hard is stable in the range of 20-720 tasks.

Task Counts: mdtest - Experiment Results [2]
• mdtest 'hard' performance declines at 960 tasks.
• Large increase in the variability of performance at 960 tasks.
• Repeated testing at 960 tasks shows that the result is not an outlier.
• Possible resource contention or context switching is affecting performance, as IOPS rise and are more stable with decreased core count:
Tasks | Mean Write (KIOPS) | Std. dev. | Mean stat (KIOPS) | Std. dev.
920 | 9.68 | 0.22 | 69.01 | 2.97
940 | 3.63 | 0.21 | 42.27 | 3.15
960 | 1.59 | 0.36 | 27.46 | 22.11
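
To compare variability across task counts whose means differ, the standard deviation can be normalized by the mean (the coefficient of variation). The sketch below applies this to the values in the table above. Choosing the coefficient of variation as the normalization is an assumption made for illustration, not necessarily the measure used in the study.

```python
# (tasks, mean write KIOPS, write std. dev., mean stat KIOPS, stat std. dev.)
ROWS = [
    (920, 9.68, 0.22, 69.01, 2.97),
    (940, 3.63, 0.21, 42.27, 3.15),
    (960, 1.59, 0.36, 27.46, 22.11),
]

def coefficient_of_variation(mean: float, std_dev: float) -> float:
    """Standard deviation relative to the mean (dimensionless)."""
    return std_dev / mean

for tasks, w_mean, w_std, s_mean, s_std in ROWS:
    w_cv = coefficient_of_variation(w_mean, w_std)
    s_cv = coefficient_of_variation(s_mean, s_std)
    print(f"{tasks} tasks: write CV = {w_cv:.1%}, stat CV = {s_cv:.1%}")
```

Normalized this way, the relative spread of the stat results grows from a few percent at 920 tasks to roughly 80% at 960 tasks, which matches the observation that variability increases sharply at 960 tasks.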

Storage Targets/Stripe Count
• Storage targets refers to the number of stripes or chunks of data that can be simultaneously written to or read from a parallel filesystem.
• A larger number of storage targets increases performance due to parallelization and load balancing (Boito et al., Reed et al.).
In the work of Boito et al. (left image), increasing the stripe count increases bandwidth from ~1750 MiB/s to ~8000 MiB/s.
The work of Reed et al. (right image) shows that increasing the stripe count, and varying the stripe count for different files, affects performance. IOR 1-3 have a static stripe count; IOR 4-6 are altered for different file sizes in the experiment.
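
The parallelization argument can be illustrated with a small model of round-robin striping, in which consecutive chunks of a file are placed on consecutive storage targets. This is a simplified sketch under that assumption; the function names and the 512 KiB chunk and 2 MiB request sizes are illustrative, and real filesystems may place chunks differently.

```python
def target_for_offset(offset: int, chunk_size: int, stripe_count: int) -> int:
    """Index of the storage target holding the byte at offset (round-robin striping)."""
    return (offset // chunk_size) % stripe_count

def targets_touched(offset: int, length: int, chunk_size: int, stripe_count: int) -> set:
    """Set of targets touched by a request of length bytes starting at offset."""
    first_chunk = offset // chunk_size
    last_chunk = (offset + length - 1) // chunk_size
    return {chunk % stripe_count for chunk in range(first_chunk, last_chunk + 1)}

CHUNK = 512 * 1024          # illustrative chunk size
REQUEST = 2 * 1024 * 1024   # one 2 MiB transfer

for stripes in (1, 2, 4, 8):
    used = targets_touched(0, REQUEST, CHUNK, stripes)
    print(f"stripe count {stripes}: request spans {len(used)} target(s)")
```

In this model a single request can engage more targets as the stripe count grows, but only up to the number of chunks it covers; beyond that point additional targets add no parallelism for that request.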

Storage Targets/Stripe Count: IOR - Experiment Results
• IOR easy performs best at lower stripe counts.
• IOR easy has more variability at lower stripe counts.
• IOR hard has its best performance at 4 targets.

Storage Targets/Stripe Count: mdtest - Experiment Results
• mdtest easy shows little to no improvement across all metrics with increased stripe count.
• Best mdtest easy performance at a stripe count of 8, albeit with the highest variation in stat.
• mdtest hard shows lower performance than mdtest easy, as well as greater normalized variation.
• Drop in performance in mdtest hard as the stripe count becomes greater than 4.

Chunk Size
• Chunk size refers to the unit of data stored per object or target in parallel filesystems.
• Larger chunks can result in increased performance for large file sizes, as data is fragmented less frequently, according to the work of Saini et al.
• Smaller chunks can sometimes be beneficial when frequent access to smaller files is required.
IOR benchmark with variable block size. (left) Different file sizes. (right) Different numbers of OSTs. It shows that larger stripe counts benefit from increased stripe size as well. Images from Saini et al.'s paper.
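
The fragmentation argument can be made concrete by counting how many chunks a file is split into at different chunk sizes. The sketch below is plain arithmetic; the 1 GiB file size and the helper name are hypothetical, and the 3901-byte case mirrors the file size used by mdtest hard.

```python
KIB = 1024

def chunks_needed(file_size: int, chunk_size: int) -> int:
    """Number of chunks a file of file_size bytes is split into (rounded up)."""
    return (file_size + chunk_size - 1) // chunk_size

large_file = 1024 ** 3   # 1 GiB, a hypothetical large file
small_file = 3901        # bytes, as written per file by mdtest hard

for chunk_kib in (128, 256, 512, 1024):
    chunk = chunk_kib * KIB
    print(f"chunk size {chunk_kib} KiB: "
          f"large file -> {chunks_needed(large_file, chunk)} chunks, "
          f"small file -> {chunks_needed(small_file, chunk)} chunk")
```

A large file is split into fewer pieces as the chunk size grows, while a file smaller than the smallest chunk size always occupies a single chunk, so its layout does not change with chunk size in this model.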

Chunk Size: IOR - Experiment Results [1]
• IOR easy performs best at lower chunk sizes.
• IOR hard performs best at 512 KiB.
• Best performance for IOR easy at 128 KiB.
• Chunk size configurations lower than 128 KiB resulted in errors in IOR easy and in worse performance than at 128 KiB in IOR hard (R mean: 8277, W mean: 2294).

Chunk Size: IOR - Experiment Results [2]
• Peak at 512 KiB for read operations.
• Similar performance at 256 and 512 KiB for write operations.
• Initial increase until 512 KiB, then a drop at larger chunk sizes for both read and write.

Chunk Size: mdtest - Experiment Results [1]
• mdtest easy: ~900% of the peak performance of mdtest hard.
• mdtest easy performance is best at 512 KiB.
• mdtest easy shows similar performance for write and delete.

Chunk Size: mdtest - Experiment Results [2]
• mdtest 'hard' performance is consistent throughout.
• Variation in performance from 118K (best) to 110K (worst) in the stat operation.
• Read, write, and delete perform consistently across the range, with no distinguishable shift in variation.
• No improvement detected that cannot be attributed to variation between the runs.

Chunk/Block/Stripe Size - mdtest - Setup
• BeeOND setup:
− Storage targets: 4
− Chunk size: Variable
− Core Count: 240
• Variable:
− Chunk size
• For mdtest-easy:
./mdtest '-n' '1000000' '-u' '-L' '-F' '-P' '-G' '1583163012' '-N' '1' '-C' '-Y' '-W' '300' '-a' 'POSIX'
• For mdtest-hard:
./mdtest '-n' '1000000' '-t' '-w' '3901' '-e' '3901' '-P' '-G=1583177082' '-N' '1' '-F' '-C' '-Y' '-W' '300' '-a' 'POSIX'
• Differences:
Easy scenario has no bytes written to/read from the files.
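
The difference between the two command lines can be checked mechanically by comparing their option sets. The sketch below only parses the two invocations quoted on this slide, with `-t` taken as the flag preceding `-w` in the hard variant. The helper name `flags` is hypothetical, and nothing is executed.

```python
import shlex

MDTEST_EASY = ("./mdtest -n 1000000 -u -L -F -P -G 1583163012 "
               "-N 1 -C -Y -W 300 -a POSIX")
MDTEST_HARD = ("./mdtest -n 1000000 -t -w 3901 -e 3901 -P -G=1583177082 "
               "-N 1 -F -C -Y -W 300 -a POSIX")

def flags(command: str) -> set:
    """Return the option names (tokens starting with '-') used by a command line."""
    tokens = shlex.split(command)[1:]  # drop the executable name
    return {tok.split("=")[0] for tok in tokens if tok.startswith("-")}

easy, hard = flags(MDTEST_EASY), flags(MDTEST_HARD)
print("only in easy:", sorted(easy - hard))
print("only in hard:", sorted(hard - easy))
```

The `-w 3901` and `-e 3901` options appear only in the hard variant, which is consistent with the statement that the easy scenario writes and reads no bytes.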

Storage Targets/Stripe Count - IOR - Setup
• BeeOND setup:
− Storage targets: Variable
− Chunk size: 512 KiB
− Core Count: 240
• Variable:
− Storage Targets
• For IOR-easy:
• For IOR-hard:

Storage Targets/Stripe Count - mdtest - Setup
• BeeOND setup:
− Storage targets: Variable
− Chunk size: 512 KiB
− Core Count: 240
• Variable:
− Storage Targets
• For mdtest-easy:
./mdtest '-n' '1000000' '-u' '-L' '-F' '-P' '-G' '1583163012' '-N' '1' '-C' '-Y' '-W' '300' '-a' 'POSIX'
• For mdtest-hard:
./mdtest '-n' '1000000' '-t' '-w' '3901' '-e' '3901' '-P' '-G=1583177082' '-N' '1' '-F' '-C' '-Y' '-W' '300' '-a' 'POSIX'
• Differences:
Easy scenario has no bytes written to/read from the files.