US-RSE 2025
Heterogeneous Distributed Data
Management in Academia
Josh Borrow (University of Pennsylvania & Simons Observatory)
500 people in the collaboration!
Total project cost: around $200m
500 TB - 1 PB raw data a year
Differing hardware landscapes
Sales
Research
Tracking
Experiment
National Facility
Core Analysis
Additional Analysis
Time-Sensitive Events
Data Release
Publication
Web Services
Local Clusters
On-site Compute and Storage
Data Warehouse
Analysis
Web Services
Dashboard
Reports
Egress
Performed on owned or configurable hardware
Site
UCSD
NERSC
Princeton
SneakerNet Transfer (Handcarry)
Data Destinations
Main node: national facility (NERSC) on the U.S. West Coast.
Copies need to be sent to Princeton and the U.K., with more nodes possible.
Academic workflows
Experiment Data
Shared Filesystem
Batch Job Processing
Secondary Data Artifact
Visualize
Unlinked Database
Share Path on Filesystem
scp? X forward?
Data found at pre-known path
Data requirements
•Replication of a POSIX filesystem structure on high performance disks across all data centers.
•Custom retention policy control on a per-center basis.
•Low latency (ideally sub-hour).
•Ability to work in ‘sneakernet’ mode, where disks are hand-carried for data transfer.
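To put the latency requirement in context, the short Python sketch below converts the raw data volume quoted earlier (500 TB to 1 PB a year) into the average rate each replica link would need to sustain. It assumes decimal units and a perfectly even data rate over the year, which a real experiment will not have; the function name is illustrative only.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def sustained_rate_mb_per_s(terabytes_per_year: float) -> float:
    """Average rate in MB/s needed to move a yearly volume (1 TB = 1e6 MB)."""
    return terabytes_per_year * 1e6 / SECONDS_PER_YEAR


# The two ends of the range quoted in the talk.
for volume_tb in (500, 1000):
    rate = sustained_rate_mb_per_s(volume_tb)
    print(f"{volume_tb} TB/year ~ {rate:.1f} MB/s ~ {rate * 8:.0f} Mbit/s sustained")
```

Any downtime on a link raises the rate needed to catch up afterwards, which is one reason a hand-carry fallback is worth having.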
Software setup
•A custom data transfer orchestrator: ‘Librarian’ uses globus for inter-node transfer. Each site controls its own data policies, and is only responsible for responding to questions about what data it holds.
•Full data cataloging in postgres, available through an HTTP client.
•Data integrity is verified continually through inter-node communication.
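As an illustration of what an integrity check between nodes can look like, here is a minimal Python sketch that hashes a local file and compares it with a checksum recorded elsewhere. This is not the Librarian's actual API: the function names, the choice of SHA-256, and the idea that the expected checksum comes from a catalog entry are all assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_catalog(path: Path, expected_checksum: str) -> bool:
    """Return True if the local copy matches the checksum held by another node."""
    return file_checksum(path) == expected_checksum


# Demo with a throwaway file standing in for a replicated data product.
with tempfile.TemporaryDirectory() as scratch:
    local_copy = Path(scratch) / "example.dat"
    local_copy.write_bytes(b"example payload")
    recorded = hashlib.sha256(b"example payload").hexdigest()  # hypothetical catalog value
    print("intact:", verify_against_catalog(local_copy, recorded))
```

A site that only has to answer "what do you hold, and what is its checksum?" can keep its storage layout and retention policy private, which fits the per-site control described above.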
System setups
NERSC: Rancher, NERSC managed endpoint
Princeton: Podman on dedicated VM, Princeton computing endpoint
SO:UK: Podman, Custom endpoint
Jeremy Myers, University of Pennsylvania
Divesh Jain, University of Manchester
Giannis Paraskevakos, Princeton Research Computing
Visualizing spatial data in the browser
•At the core of our project are giant astronomical images (think RAW files), each with up to 900 megapixels.
•Pain point: visualizing these; users may have 50-100 of such images to ‘flip’ through (250-500 GB of data).
Lessons from Jupyter
•It has become common for users to forward ports from HPC machines to their laptops for e.g. Jupyter services.
•With this in mind, we built a command-line driven web application for visualizing these maps.
•The application relies heavily on just-in-time data processing to keep memory footprint low.
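The just-in-time idea can be sketched in a few lines of Python: a tile is only computed when it is requested, it reads only the window of the full-resolution map that it covers, and a small cache bounds how many tiles stay in memory. This is not the project's actual code. The tile size, the nearest-neighbour downsampling by striding, the cache size, and the `read_pixel` stand-in are all assumptions for illustration.

```python
from functools import lru_cache

TILE_SIZE = 256  # pixels per tile edge (assumed)


def source_window(level: int, x: int, y: int, max_level: int,
                  tile_size: int = TILE_SIZE) -> tuple:
    """Full-resolution pixel window (x0, y0, x1, y1) covered by one tile."""
    span = tile_size * 2 ** (max_level - level)  # source pixels per tile edge
    return (x * span, y * span, (x + 1) * span, (y + 1) * span)


def read_pixel(px: int, py: int) -> int:
    """Hypothetical stand-in for reading one pixel from a large on-disk map."""
    return (px * 31 + py * 17) % 256


@lru_cache(maxsize=128)  # only the most recently used tiles are kept in memory
def render_tile(level: int, x: int, y: int, max_level: int) -> tuple:
    """Build a tile on demand by striding over its source window."""
    x0, y0, x1, y1 = source_window(level, x, y, max_level)
    step = 2 ** (max_level - level)  # nearest-neighbour downsampling
    return tuple(
        tuple(read_pixel(px, py) for px in range(x0, x1, step))
        for py in range(y0, y1, step)
    )


tile = render_tile(2, 1, 0, 2)   # computed on first request
tile = render_tile(2, 1, 0, 2)   # served from the cache
print(len(tile), len(tile[0]), render_tile.cache_info().hits)
```

Because nothing is precomputed, memory use scales with the number of tiles a user is actually looking at rather than with the size of the map.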
Composable software
•By building ‘tilemaker’ as a composable library, we are able to have a deployed version at maps.simonsobservatory.org
•Like hippo, we have group-based access control, with some data public and some not.
•The map viewer is a highly useful tool for data releases.
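Group-based access control of the kind mentioned above can be sketched as follows. This is a generic Python illustration, not the actual tilemaker or hippo implementation: the `MapLayer` class, the group names, and the convention that an empty group set means "public" are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapLayer:
    """A viewable data product and the groups allowed to see it."""
    name: str
    allowed_groups: frozenset = frozenset()  # empty set means public (assumed convention)


def can_view(layer: MapLayer, user_groups) -> bool:
    """Public layers are open to all; otherwise the user needs a shared group."""
    if not layer.allowed_groups:
        return True
    return bool(layer.allowed_groups & set(user_groups))


# Hypothetical layers and groups, for illustration only.
public_release = MapLayer("public-release")
internal_maps = MapLayer("internal-maps", frozenset({"collaboration"}))

print(can_view(public_release, []))                # anonymous user, public data
print(can_view(internal_maps, []))                 # anonymous user, restricted data
print(can_view(internal_maps, ["collaboration"]))  # collaboration member
```

Keeping the check in one small function lets the same library serve a single-user local session and a hosted, multi-user deployment.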
Takeaways
•When dealing with heterogeneous computing environments, having local experts is critical.
•Decentralization is not always the best strategy, and it’s worth pushing back against it.
•Web-based tools driven by CLI interfaces provide powerful experiences for single users, and provide extra value as ‘hosted versions’ for public access.
•We build everything in the open (github.com/simonsobs) but our stuff is all very early-stage, as we’re a young experiment.