
BBView: A View-Aware Burst-Buffer Mechanism for MPI-IO

Author: Koyama, Sohei; Tatebe, Osamu
Publisher: Zenodo
DOI: 10.5281/zenodo.17654712
Source: https://zenodo.org/records/17654712/files/REXIO2025-Paper1-Koyama.pdf
BBView: A View-Aware Burst-Buffer
Mechanism for MPI-IO
Sohei Koyama, Osamu Tatebe
University of Tsukuba
September 2, 2025
1 / 18
Table of Contents
•Background
•Research Goal
•Related Work
•Positioning of BBView in Related Work
•Proposed Method
•Evaluation of Proposed Method
•Conclusion
2 / 18
Background: HPC Simulation and Output Phase
ref: OpenMHD
•Consists of a computation phase and an output phase (alternating between computation
and output)
•In the output phase, simulation results are written (at each timestep)
If the output phase is slower than the computation phase,
execution time becomes longer, or the interval between timesteps must be extended.
3 / 18
Background: Issues in the Output Phase
•In the computation phase, the multidimensional space is divided into grids and
distributed to processes
•Example: divided into 2 × 2 for 4 processes (to minimize communication)
•Each process needs to write out many small chunks
•I/O in small units (tens of KiB) cannot exploit the performance of parallel file
systems
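To make the "many small chunks" point concrete, the following minimal sketch counts the contiguous pieces one process must write when a row-major 2D array is split into equal blocks. The function name, the 8192 × 8192 grid, and the 8-byte element size are assumptions chosen for illustration and do not come from the slides.

```python
def contiguous_chunks(global_rows, global_cols, proc_rows, proc_cols, elem_size):
    """Return (chunks_per_process, bytes_per_chunk) for a row-major 2D array
    split into proc_rows x proc_cols equal blocks (hypothetical helper)."""
    if global_rows % proc_rows or global_cols % proc_cols:
        raise ValueError("this simplified sketch needs an evenly divisible grid")
    local_rows = global_rows // proc_rows
    local_cols = global_cols // proc_cols
    if proc_cols == 1:
        # The block spans full rows, so it is one contiguous region in the file.
        return 1, local_rows * local_cols * elem_size
    # Otherwise each local row is a separate contiguous piece of the file.
    return local_rows, local_cols * elem_size


# Assumed example: 8192 x 8192 doubles on a 2 x 2 process grid.
chunks, size = contiguous_chunks(8192, 8192, 2, 2, 8)
print(chunks, size)  # 4096 32768
```

Under these assumptions each process issues 4096 separate writes of 32 KiB, which falls in the "tens of KiB" range named above.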
4 / 18
Research Goal
Without modifying the application or the output file format,
shorten or hide the output phase time, thereby
reducing the execution time of HPC simulations
(or enabling more frequent outputs).
Usage of BBView
Just add arguments at runtime; no recompilation required, no change in the output file
mpirun -np 4 --mca io bbview --mca fcoll individual ./a.out
5 / 18

Related Work: Two-Phase I/O [Dickens and Thakur, 1998]
•Each process calls MPI_File_write_all() simultaneously
•Buffers of each process are aggregated in the MPI runtime
mpirun -np 4 --mca fcoll dynamic_gen2 ./a.out
The write unit becomes larger, but still only a few hundred KiB.
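The aggregation idea can be illustrated with a small sketch that merges the (offset, length) extents contributed by several processes into larger contiguous runs. This is only a serial illustration of why aggregated writes are larger; it is not the algorithm used by any MPI runtime, and the helper name and the tiny extents are hypothetical.

```python
def coalesce(extents):
    """Merge (offset, length) extents that touch or overlap into larger runs."""
    merged = []
    for offset, length in sorted(extents):
        if merged and offset <= merged[-1][0] + merged[-1][1]:
            # This extent touches or overlaps the previous run, so extend it.
            last_offset, last_length = merged[-1]
            end = max(last_offset + last_length, offset + length)
            merged[-1] = (last_offset, end - last_offset)
        else:
            merged.append((offset, length))
    return merged


# Two hypothetical ranks side by side, each holding half of every 4-byte row.
rank0 = [(0, 2), (4, 2)]
rank1 = [(2, 2), (6, 2)]
print(coalesce(rank0 + rank1))  # [(0, 8)]
```

Four 2-byte pieces become a single 8-byte write once the pieces from both ranks are gathered in one place.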
6 / 18
Related Work: Node-Local Burst Buffers
(e.g. UnifyFS [Brim et al., 2023])
•Construct a file system by aggregating local storage on compute nodes
•Temporarily write to the burst buffer quickly, then flush later to the parallel file
system
•Hide writes to the parallel file system by moving them off the critical path
However, the write unit size to local storage does not become larger.
7 / 18
Positioning of BBView in Related Work
Advantages of related work:
•Large write units to the parallel file system reduce write time
•Writing to local storage hides write time to the parallel file system
BBView combines both advantages:
•Writing to local storage in large units reduces write time
•Writing to local storage hides write time to the parallel file system
8 / 18
Proposed Method
Overview
•Write MPI File Views sequentially (in large units) into a single file
•Reconstruct views later and asynchronously write to the parallel file system
Process
•When a View is set, each process opens /tmp/{filename}-{rank}-{idx}
•All writes are appended sequentially into that file (internally set to an identity View)
•When MPI_File_close is called, a background daemon asynchronously reconstructs
the view and writes to the parallel file system
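The steps above can be imitated in a serial, MPI-free sketch: each simulated rank appends its data to its own log file in one sequential write, and a later replay step scatters the log into the shared file according to that rank's view. Everything here is an assumption made for illustration (the helper names, the 2 × 2 process grid with 2 × 2 local blocks, one-byte elements, and replaying synchronously instead of in a background daemon); it is not BBView's implementation.

```python
import os
import tempfile


def view_extents(rank, proc_cols, local_rows, local_cols, elem_size):
    """(offset, length) extents of one rank's block in the row-major shared file."""
    pr, pc = divmod(rank, proc_cols)
    row_bytes = proc_cols * local_cols * elem_size
    return [((pr * local_rows + r) * row_bytes + pc * local_cols * elem_size,
             local_cols * elem_size)
            for r in range(local_rows)]


def append_to_log(log_path, data):
    """Write phase: a single sequential append to the node-local log."""
    with open(log_path, "ab") as log:
        log.write(data)


def replay_log(log_path, extents, shared_path):
    """Flush phase: scatter the log into the shared file following the view."""
    with open(log_path, "rb") as log, open(shared_path, "r+b") as shared:
        for offset, length in extents:
            shared.seek(offset)
            shared.write(log.read(length))


def demo():
    proc_rows = proc_cols = 2
    local_rows = local_cols = 2
    elem_size = 1
    block_bytes = local_rows * local_cols * elem_size
    with tempfile.TemporaryDirectory() as tmp:
        shared_path = os.path.join(tmp, "out.dat")
        with open(shared_path, "wb") as f:
            f.write(b"\0" * (block_bytes * proc_rows * proc_cols))
        for rank in range(proc_rows * proc_cols):
            # Log name mirrors the {filename}-{rank}-{idx} pattern, in a temp dir.
            log_path = os.path.join(tmp, f"out.dat-{rank}-0")
            append_to_log(log_path, bytes([ord("A") + rank]) * block_bytes)
            extents = view_extents(rank, proc_cols, local_rows, local_cols, elem_size)
            replay_log(log_path, extents, shared_path)
        with open(shared_path, "rb") as f:
            return f.read()


print(demo())  # b'AABBAABBCCDDCCDD'
```

The application-visible write is the single append; the strided small writes only occur in the replay step, which in the mechanism described above runs asynchronously after MPI_File_close.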
9 / 18
Conclusion
BBView shortens or hides the output phase time
without modifying the application or the output file format,
thereby reducing the execution time of HPC simulations.
Released at: https://github.com/tsukuba-hpcs/bbview
16 / 18

Acknowledgments
This work was partially supported by
•JSPS KAKENHI JP22H00509
•NEDO “Post-5G Infrastructure Enhancement R&D Program” (JPNP20017)
•JST CREST JPMJCR24R4
•Interdisciplinary Joint Usage Program at the Center for Computational Sciences,
University of Tsukuba
•Special joint research with Fujitsu
17 / 18
References
Brim, M. J., Moody, A. T., Lim, S.-H., Miller, R., Boehm, S., Stanavige, C., Mohror, K. M., and
Oral, S. (2023).
UnifyFS: A user-level shared file system for unified access to distributed local storage.
In 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 290–300.
IEEE.
Dickens, P. M. and Thakur, R. (1998).
A performance study of two-phase I/O.
In Euro-Par’98 Parallel Processing: 4th International Euro-Par Conference Southampton, UK,
September 1–4, 1998 Proceedings 4, pages 959–965. Springer.
Hiraga, K. (2024).
2D reaction-diffusion system benchmark using MPI and MPI-IO.
18 / 18