µSplit: image decomposition for fluorescence microscopy
Ashesh1, Alexander Krull2, Moises Di Sante3, Francesco Silvio Pasqualini3, Florian Jug1,*
1Human Technopole, Italy, 2University of Birmingham, UK, 3University of Pavia, Italy
Abstract
We present µSplit, a dedicated approach for trained image decomposition in the context of fluorescence microscopy images. We find that best results using regular deep architectures are achieved when large image patches are used during training, making memory consumption the limiting factor to further improving performance. We therefore introduce lateral contextualization (LC), a novel meta-architecture that enables the memory efficient incorporation of large image-context, which we observe is a key ingredient to solving the image decomposition task at hand. We integrate LC with U-Nets, Hierarchical AEs, and Hierarchical VAEs, for which we formulate a modified ELBO loss. Additionally, LC enables training deeper hierarchical models than otherwise possible and, interestingly, helps to reduce tiling artifacts that are inherently impossible to avoid when using tiled VAE predictions. We apply µSplit to five decomposition tasks, one on a synthetic dataset, four others derived from real microscopy data. Our method consistently achieves best results (average improvements to the best baseline of 2.25 dB PSNR), while simultaneously requiring considerably less GPU memory. Our code and datasets can be found at https://github.com/juglab/uSplit.
1. Introduction
Fluorescence microscopy [10] is routinely used to look at living cells and biological tissues at cellular and sub-cellular resolution [18]. Components of the imaged cells can be highlighted using fluorescent labels, allowing biologists to investigate individual structures of interest. Given the complexity of biological processes, it is typically necessary to look at multiple structures simultaneously, usually via a temporal multiplexing scheme [10] that separates them into different image channels.
Imaging more than 3 or 4 structures in this way is difficult for technical reasons, limiting the rate of scientific progress in the life sciences. One way to circumvent this limitation would be to label two cellular components with the same fluorophore, i.e. image them in the same image channel. Hence, a computational method to split apart (decompose) superimposed biological structures acquired in a single image channel, i.e. without temporal multiplexing, would have tremendous impact (see Figure 1).

Figure 1. Splitting of superimposed image channels. The input image is the sum of two image channels, each channel containing structures from one given object class. The task of µSplit is to identify and split the structures superimposed in the given input image (dashed rectangles).

*Corresponding Author ([email protected]).
Historically, image decomposition has found applications on natural images [9,8,1,3]. Our approach to image decomposition, called µSplit, rests on the idea of learning structural priors for the two unmixed target image channels, and then using these to guide the decomposition of the superimposed (added) pixel intensities. Such content-aware priors have previously been used for tasks such as image restoration [29,4,28], denoising [14,2,15,11,20,19], and segmentation [5,25,30].
In many of these cases, the achievable performance heavily depends on the portion of the image a network can see before having to make a prediction. As we show in this work, the need for large spatial context, i.e. receptive field and patch size, is particularly pronounced for image decomposition. Biological structures in microscopy images can easily extend over distances of several hundred pixels. Accordingly, we observe that results improve with larger training patch sizes and deeper architectures (see Figure 6(a)). Naturally, this leads to models having a huge GPU memory footprint, which limits their applicability to selected compute-savvy life-science labs.
The importance of context has previously been utilized in the field of image segmentation [16,13]. Leng et al. [16] devised a method to efficiently use the available context of the input image for a segmentation task. However, they did not use additional inputs to gain access to a larger context than what is already present in the given input patch. Hilbert et al. [13] worked with 3D images and used an additional lower resolution image to improve overall segmentation performance.
Also for µSplit we observe that additional image context is important. In contrast to the previously mentioned architectures, we introduce Lateral Contextualization (LC), a novel meta-architecture that feeds additional image context at multiple processing steps. We introduce three variants, Lean-LC, Regular-LC, and Deep-LC, differing from each other in terms of GPU memory requirements and achievable prediction quality. As we elaborate below, Deep-LC additionally offers the possibility to instantiate a more powerful HIERARCHICAL VAE with more hierarchy layers than otherwise possible, and we show that this leads to improved performance on the image splitting task at hand.

Since µSplit needs to be applicable to large microscopy images, tiled predictions are required. In tiled predictions, the input image is divided into overlapping patches on which predictions are performed individually. Those predictions are then appropriately center-cropped into non-overlapping tiles, which can then be appended to form the final prediction. Overlapping patches have to be used to ensure that sufficient image context is available, so that border artifacts do not occur in the non-overlapping central region.
In Section 3, we argue that for deep networks operating on relatively small patches, overlapping regions should not be created by making tiles larger (Outer Padding), which is arguably the most common way, but that it is better to instead center-crop regions smaller than the original patch size (Inner Padding).
Since HIERARCHICAL VAES (HVAES) [26] have recently gained popularity, e.g. for microscopy image denoising and restoration [20,19], we made these powerful architectures also available for the image decomposition task by modifying the default VAE ELBO loss, incorporating the fact that the fed input is different from the decoded output.
2. Problem Statement
A dataset $D_{mix} = (x_1, x_2, \dots, x_N)$ of $N$ images is created by superimposing sampled pairs of image channels $(D_1, D_2)$, such that

$$x_i = (d_1^i + d_2^i)/2, \quad \forall i \in [1, N], \tag{1}$$

with $D_1 = (d_1^1, d_1^2, \dots, d_1^N)$ and $D_2 = (d_2^1, d_2^2, \dots, d_2^N)$. Given a newly sampled $x = (d_1 + d_2)/2$, the task is to decompose $x$ into estimates of $d_1$ and $d_2$.
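As a small illustration of Eq. 1, the following sketch builds a superimposed input from two toy channel images stored as nested lists. The function name `superimpose` and the toy values are assumptions made for this example only and are not taken from the µSplit code base.

```python
def superimpose(d1, d2):
    """Pixel-wise average of two equally sized channel images (Eq. 1)."""
    if len(d1) != len(d2) or any(len(a) != len(b) for a, b in zip(d1, d2)):
        raise ValueError("channel images must have identical shapes")
    return [[(a + b) / 2 for a, b in zip(row1, row2)]
            for row1, row2 in zip(d1, d2)]


# Two toy 2x2 channel images (hypothetical values).
d1 = [[0.0, 4.0], [2.0, 0.0]]
d2 = [[2.0, 0.0], [2.0, 6.0]]
x = superimpose(d1, d2)  # [[1.0, 2.0], [2.0, 3.0]]
```

Because the average is symmetric, swapping the two channels yields the same input `x`. Pixel intensities alone therefore cannot identify the channels, which is why learned structural priors are needed.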
3. Our Approach
A Sound ELBO for µSplit. We train our VAE to describe the joint distribution of both channel images $d_1$ and $d_2$. We modify the VAE's ELBO objective to incorporate the fact that input and output are not the same (as they are for autoencoders). When training the VAE, our objective is to find

$$\arg\max_{\theta} \sum_{i=1}^{N} \log P(d_1^i, d_2^i; \theta),$$

based on our training examples $(d_1^i, d_2^i)$. Here, $\theta$ are the decoder parameters of our VAE, which define the distribution. Next, we expand $\log P(d_1, d_2; \theta)$ as

$$\log \int P(d_1, d_2, z; \theta)\, dz = \log \int q(z|x; \phi)\, \frac{P(d_1, d_2, z; \theta)}{q(z|x; \phi)}\, dz \geq \int q(z|x; \phi)\, \log \frac{P(d_1, d_2, z; \theta)}{q(z|x; \phi)}\, dz, \tag{2}$$

where $q(z|x; \phi)$ is our encoder network with parameters $\phi$. It can be shown that the evidence lower bound in Eq. 2 is equal to

$$E_{q(z|x; \phi)}[\log P(d_1, d_2|z; \theta)] - KL(q(z|x; \phi), P(z)).$$

By making the assumption of conditional independence of $d_1$ and $d_2$ given $z$, we can simplify the expression to

$$E_{q(z|x; \phi)}[\log P(d_1|z; \theta) + \log P(d_2|z; \theta)] - KL(q(z|x; \phi), P(z)). \tag{3}$$

Expression 3 is what we end up maximizing during training. Note that this analysis can be seamlessly extended to the case where one has a hierarchy of latent vectors [26] instead of just one.
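To make Expression 3 concrete, here is a minimal numerical sketch of a one-sample estimate for a single training pair. It rests on assumptions that go beyond the text above: a single latent level, a standard normal prior $P(z)$, a diagonal Gaussian posterior, Gaussian pixel likelihoods parameterised by mean and log-variance, and images flattened to lists. All function and key names are hypothetical.

```python
import math


def gaussian_log_likelihood(target, mean, log_var):
    """Sum over pixels of log N(target | mean, exp(log_var))."""
    total = 0.0
    for t, m, lv in zip(target, mean, log_var):
        total += -0.5 * (math.log(2 * math.pi) + lv + (t - m) ** 2 / math.exp(lv))
    return total


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I))."""
    return sum(0.5 * (math.exp(lv) + m * m - 1.0 - lv)
               for m, lv in zip(mu, log_var))


def usplit_elbo(d1, d2, pred, q_mu, q_log_var):
    """One-sample estimate of Expression 3 for flattened channel images.

    `pred` holds the decoder outputs for one sampled z: a mean and a
    log-variance per pixel for each of the two channels.
    """
    ll1 = gaussian_log_likelihood(d1, pred["mean1"], pred["log_var1"])
    ll2 = gaussian_log_likelihood(d2, pred["mean2"], pred["log_var2"])
    return ll1 + ll2 - kl_to_standard_normal(q_mu, q_log_var)
```

Note that the encoder only ever sees the superimposed input $x$, while both likelihood terms are evaluated against the unmixed targets $d_1$ and $d_2$. This is the difference to a standard autoencoding ELBO.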
For modelling $q(z|x; \phi)$, we use the identical setup of the bottom-up branch used in [20] with the input being $x$, the superimposed input. For modelling $P(d_1|z; \theta)$ and $P(d_2|z; \theta)$, we again use the top-down branch design used in [20] but make the top-down branch output two channels for the mean and two more for the pixelwise log(var), one each for $d_1$ and $d_2$. So, the output of our model is a 4 channel tensor with identical spatial dimensions as the input. Note that to incorporate LC, we modify both $q(z|x; \phi)$ and $P(d_2|z; \theta)$, which we describe next.
Lateral Contextualisation (LC). We introduce LC, allowing µSplit to see larger portions of the input image at increasingly downscaled pixel resolutions. LC only requires small full resolution patches, rendering the network considerably more memory efficient.

Figure 2. Network architecture of µSplit. In (a), we show the network architecture employed by Regular-LC. The input (left side) consists of a core image patch $x_p$, together with downscaled versions of the patch surroundings, the lateral context (LC). We show the area corresponding to the original patch as a red dotted box throughout the figure. (b) The network architecture of Deep-LC. The architecture used in [20] is stacked on top of the Regular-LC architecture shown in (a). Note that this is only possible because the latent space in Regular-LC retained the spatial dimensions for all layers by means of using the proposed LC. Note: a sketch of the Lean-LC architecture, our third LC variant, can be found in Supp. Figure S.1.
Many popular architectures, such as U-NETS [23] or HVAES [20,6,27], are composed of a hierarchy of levels that operate on increasingly downsampled and therefore also increasingly smaller layers. The basic idea of LC is to pad each downsampled layer by additional image context, i.e. additional input from an available larger input image, such that each layer at each hierarchy level maintains the same spatial dimensions. (In Figure 2(a), the red dashed squares in the stack of inputs (leftmost column) indicate the location of the original patch ($x_p$) within the downscaled and laterally contextualized inputs at higher hierarchy levels ($x_{(p,i)}$).)
Creating downsampled LC inputs. Let $x_p = x[c, h]$ denote a patch of size $h \times h$ from $x \in D_{mix}$ centered around pixel location $c$. To decompose the patch $x_p$, we additionally use a sequence of successively downscaled and cropped versions of $x$, $X_p^{lowres} = (x_{(p,1)}, x_{(p,2)}, \dots, x_{(p,n_{LC})})$, where $x_{(p,k)}$ is $x[c, 2^k \cdot h]$, downsampled to the same pixel resolution of $h \times h$, and $n_{LC}$ denotes the total number of used LC inputs.
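The construction of the LC input stack can be sketched as follows. Two choices in this sketch are assumptions, since the text above does not specify them: pixels outside the image are filled with zeros, and downsampling is done by average pooling over non-overlapping blocks. The helper names are hypothetical.

```python
def crop(image, center, size):
    """size x size crop centred at (row, col); pixels outside the image are 0."""
    rows, cols = len(image), len(image[0])
    r0, c0 = center[0] - size // 2, center[1] - size // 2
    return [[image[r][c] if 0 <= r < rows and 0 <= c < cols else 0.0
             for c in range(c0, c0 + size)]
            for r in range(r0, r0 + size)]


def average_pool(patch, factor):
    """Downsample a square patch by averaging factor x factor blocks."""
    n = len(patch) // factor
    area = factor * factor
    return [[sum(patch[i * factor + di][j * factor + dj]
                 for di in range(factor) for dj in range(factor)) / area
             for j in range(n)]
            for i in range(n)]


def lc_inputs(image, center, h, n_lc):
    """Return [x_p, x_(p,1), ..., x_(p,n_lc)], each of size h x h."""
    inputs = [crop(image, center, h)]
    for k in range(1, n_lc + 1):
        factor = 2 ** k
        inputs.append(average_pool(crop(image, center, factor * h), factor))
    return inputs


# Toy 16x16 image with a unique value per pixel.
image = [[float(r * 16 + c) for c in range(16)] for r in range(16)]
stack = lc_inputs(image, center=(8, 8), h=4, n_lc=2)
```

Every element of `stack` has the same 4×4 shape, but level k covers a field of view that is 2^k times wider than the primary patch, centred on the same pixel.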
Implementation of Regular-LC. The overall architecture is shown in Figure 2(a). The primary input patch $x_p$ is fed to the first input branch (IB). The output of this IB is fed to the first bottom up (BU) block, which downsamples the input via strided convolutions, whose output is then passed to some residual blocks (see Supp. Figure S.1), and finally zero padded to regain the same spatial dimension as the input it received. The output of the first BU block is concatenated with the output of the second IB, which has received the first lower resolution input containing additional lateral context, $x_{(p,1)}$. Zero-padding followed by concatenation ensures pixelwise alignment between the IB's output and the BU block's output. We use 1×1-convolutions to merge these concatenated channels and feed the resulting layer into the next BU block. This procedure gets repeated for every hierarchy level in the given HVAE.
Once the topmost hierarchy level is reached, the last layer is fed into the topmost top down (TD) block. A TD block consists of some residual layers, followed by a stochastic block as they are used in HVAES. The output of the stochastic block is center-cropped to half size and upsampled via transpose convolutions before again being fed through some residual layers (see Supp. Figure S.1). Cropping and upsampling ensures that the output of the TD block matches the next lower hierarchy level. The output of the TD block is, similar to before, first concatenated with the output of the bottom up computations and then fed through 1×1-convolutions. Once we reach the bottom hierarchy level, the output of the last TD block is fed through an output block (OB) composed of some additional convolutional layers, giving us the final predictions for $d_1$ and $d_2$.

Figure 3. Cartoon of a generic hierarchical network with an encoder-decoder architecture illustrating the relationship between the input patch size, the effective receptive field, and the theoretical receptive field. The input patch, shown at the very left in the center of the light blue area, is processed and downsampled multiple times (encoder) before being upsampled multiple times (decoder) to allow the output, shown on the very right, to have the same pixel dimensions as the input patch. Cuboids shown by solid black lines represent the tensors the network computes during its execution. Solid blue cuboids show the effective receptive field, i.e. the areas within each tensor that can influence the center-most pixels in the two output layers (depicted by red rectangles). All but the last two tensors are fully 'visible' to those pixels, since the theoretical receptive field, i.e. the maximum area that would influence those pixels if the respective tensor were sufficiently large, grows beyond their bounds (shown as light-blue solid cuboids). Note that working with larger input patches will fill a larger portion of the theoretical receptive field. If theoretical and effective receptive fields diverge, as shown in this cartoon, padded predictions on input patches larger than the training patch size will cause the network to operate out-of-distribution (OOD) and therefore lead to degraded prediction quality (see main text and Supp. Section S.2.1).
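The spatial bookkeeping of one BU block and one TD block can be traced with the toy sketch below. It only tracks shapes and pixel positions: the strided and transposed convolutions are replaced by simple stand-ins (keeping every second pixel, and nearest-neighbour repetition), residual and stochastic blocks are omitted, and all function names are hypothetical.

```python
def zero_pad_center(feat, size):
    """Embed a square feature map in the centre of a size x size map of zeros."""
    n = len(feat)
    off = (size - n) // 2
    out = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            out[off + i][off + j] = feat[i][j]
    return out


def center_crop(feat, size):
    """Central size x size region of a square feature map."""
    off = (len(feat) - size) // 2
    return [row[off:off + size] for row in feat[off:off + size]]


def downsample2(feat):
    """Stand-in for a stride-2 convolution: keep every second pixel."""
    return [row[::2] for row in feat[::2]]


def upsample2(feat):
    """Stand-in for a stride-2 transposed convolution: repeat each pixel 2x2."""
    out = []
    for row in feat:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


h = 8
x = [[float(i * h + j) for j in range(h)] for i in range(h)]
bu = zero_pad_center(downsample2(x), h)   # BU: h -> h/2 -> zero-padded back to h
td = upsample2(center_crop(bu, h // 2))   # TD: crop to h/2 -> upsampled to h
```

After the BU step, the downsampled patch content sits in the central h/2 × h/2 region, which is the same place the original patch occupies inside $x_{(p,1)}$. This is why concatenating with the next input branch is pixelwise aligned, and why the TD block can undo the step with a center crop followed by upsampling.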
We've integrated LC into the HVAE, HAE and classic U-NET architectures. Note that the difference between HVAES and HIERARCHICAL AUTOENCODERS (HAES) is that the stochastic block is replaced by the identity. We use the term Vanilla to denote the underlying architecture on which we then enable LC.
Deep-LC: deeper performs better. We observe empirically that having deeper hierarchies is beneficial (see Figure 6(a)). Since in U-NETS, HAES, and HVAES, each consecutive hierarchical level halves the input tensor in all spatial dimensions, a natural limit for the maximum hierarchy level is given by the fed patch size (footnote 1). By making use of additional lower resolution image context at each hierarchy level, we've designed µSplit such that spatial dimensions of latent tensors stay constant across all hierarchy levels. This enables Deep-LC (see Figure 2(b)), our most potent architecture variant, to have additional hierarchy levels over what a vanilla HVAE can have, typically showing best results in our experiments (see Figure 6(b) and Figure 7).

More concretely, in our Deep-LC network, we stack a default HVAE (like the one used in [20]) on top of our Regular-LC variant (Figure 2(a)). This means that starting from the highest hierarchy level using LC, any further hierarchy level is built like a regular HVAE hierarchy stack.

1 Using a patch size of 64, for example, can at most give rise to 5 hierarchy levels ($2^{5+1} = 64$).
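The depth limit mentioned in the footnote can be checked with a few lines of arithmetic. The sketch assumes that the smallest usable latent tensor is 2×2, which is what the footnote's example ($2^{5+1} = 64$) implies. The function names are hypothetical.

```python
def vanilla_latent_sizes(patch_size, min_size=2):
    """Spatial size after each halving, stopping at min_size (assumed to be 2)."""
    sizes = []
    size = patch_size // 2
    while size >= min_size:
        sizes.append(size)
        size //= 2
    return sizes


def lc_latent_sizes(patch_size, levels):
    """With LC, every hierarchy level keeps the spatial size of the input patch."""
    return [patch_size] * levels


vanilla_64 = vanilla_latent_sizes(64)   # [32, 16, 8, 4, 2]
lc_64 = lc_latent_sizes(64, 4)          # [64, 64, 64, 64]
```

Since the LC levels do not shrink the latent tensors, a regular halving hierarchy can still be stacked on top of them, which is the idea behind Deep-LC.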
Lean-LC: minimal memory footprint. Lean-LC, our most memory efficient LC variation, does not use the lateral context introduced in the bottom-up branch within the top-down branch (see Supp. Figure S.1 for its architecture). More specifically, the bottom-up branch is identical to Regular-LC, but the top-down branch reduces to the default HVAE implementation, very similar to how it was also used in [20]. This is enabled by center cropping the output of each BU block going into the TD block.
Tiled Predictions. For virtually all tasks using fully convolutional architectures, trained networks are often used to predict results on inputs much larger than the patches they were trained on. Whenever an input image is so large that the network in question cannot scale without running out-of-memory, predictions are typically performed on overlapping patches and later suitably cropped and appended. When applied to relatively shallow [24] and non-variational networks, results can be pixel-perfect, i.e. not containing any tiling artifacts. But we observe that there are two cases wherein tiling artifacts are not easily avoidable.

Figure 4. Strategies for tiled predictions. (a) The difference between Inner and Outer Padding. The blue dashed rectangle represents one patch used for tiled predictions. For each cell in the faint gray grid superimposed on the input image one such patch exists. The red dashed rectangle represents the center-crop region used to tile the final prediction of the entire input image. The blue shaded area is therefore the part of the patch that overlaps with neighboring patches, i.e. it is the padding area of the red rectangle. Outer Padding uses a tile size equivalent to the training patch size and introduces overlap by enlarging the patch being fed to the network. Inner Padding, in contrast, maintains the original patch size, and uses only an inner crop to tile the given input image. (b) Percentage variation (of PSNR measurements) when using different amounts of Outer or Inner Padding (for HAE and HVAE vanilla setups using a patch size of 64). For varying amounts of padding (x-axis), we plot how 6 data points for the PaviaATN data (3 tasks ∗ 2 = 6) and 2 data points for the Hagen et al. data (1 task) are distributed. Note how distributions for Inner Padding are consistently better. (c) Using Outer Padding, predictions are performed on patches larger than the ones used during training, leading to out-of-distribution (OOD) inputs and therefore to inferior predictions (red arrows). First and second row are the ground truth and the prediction made without any padding, respectively. See Supp. Figure S.3 for more examples.
The first is caused by networks that have huge receptive fields (see Figure 3). When trained with a patch size much smaller than the theoretical receptive field size, large parts of the theoretical receptive field will be empty (i.e. zero). See also Supp. Section S.2.1 for a more detailed description.
When such trained networks are later used for tiled predictions, a problem arises whenever the input patches, on which predictions are made, are larger than the patch size used during training (which typically is the case because the patch size is chosen such that GPU memory is best utilized, and input patches need to overlap sufficiently to avoid border artifacts). These patches will fill a larger portion of the theoretical receptive field than training patches did, resulting in out-of-distribution (OOD) predictions and worsened performance (see Figure 4(b) for a quantitative assessment).
The second case of tiling artifacts arises when variational networks like HVAES are used. These architectures sample from the variational latent space of encoded tiles, with samples of neighboring tiles not necessarily decoding into consistent image contents along the borders of predicted tiles.
The solution we propose is twofold: (i) Instead of tiled prediction on larger patches (Outer Padding), which is arguably the most often used tiling scheme, we propose to use Inner Padding instead, an approach that uses patches of the same size as the ones used during training, thereby solving the OOD issue introduced above. More specifically, in both tiling schemes, the input image is divided into overlapping patches. The predictions on these patches are then center cropped and these crops are put right next to each other in order to create a prediction of the entire input image. To enlarge the overlap between neighboring patches, Outer Padding enlarges the patch size. Inner Padding does not alter the size of patches, but instead only uses a smaller central area of their respective predictions. See Figure 4(a) for a visual depiction of Inner and Outer Padding. In our experiments (see Section 5), we have used Inner Padding of 24 pixels, determined via grid-search. (ii) Overlap amounts with Inner Padding are constrained to be small. Small overlap would usually cause artifacts due to insufficient image context at tile boundaries. However, due to our LC approach, µSplit is fed a very large and consistent image context at both sides of all patch boundaries, allowing us to operate with minimal artifacts even with small overlaps (footnote 2). In the supplement, we empirically show the lower need of overlap for our LC variants.
Training Details. For every dataset, we use 80%, 10% and 10% of the data as train-validation-test split. All models are trained using 16-bit precision on a Tesla V100 GPU. Unless otherwise mentioned, all models are trained with batch size of 32 and input patch size of 64. For all HVAES, we lower-bound the σs of $P(d_1, d_2, \theta)$ to exp(−5). This avoids numerical problems arising from these σs going to zero, as reported in [22]. Next, we re-parameterize the normal distributions of the BU branch using the $\sigma_{ExpLin}$ reformulation introduced in [7]. We additionally upper-bound the input to $\sigma_{ExpLin}$ to 20. For training µSplit with Deep-LC, we follow the suggestions in [6,21], and divide the output of each BU block by $\sqrt{2 \cdot i}$, with $i$ being the index of the hierarchy level the BU block is part of.
4. Datasets
SinosoidalCritters. We created this synthetic dataset explicitly to demonstrate the importance of context for the splitting task and the usefulness of using LC within µSplit. Images in this dataset can only correctly be decomposed when sufficient lateral image context is available during prediction time.

2 Note that artifacts arising from independently sampling the latent space in HVAES remain an unsolved problem.

Figure 5. The synthetic SinosoidalCritters dataset is designed in such a way that large lateral image context is needed in order to perform correct channel splitting. (a) A schema illustrating how we created the SinosoidalCritters dataset (panel labels: Frequencies; Freq. pairs, i.e., Critters; Connecting these Freq. pairs; Ch1. Image; Ch2. Image; Input Image). A detailed description is provided in Section 4. (b) We show two sample SinosoidalCritters input images (row 1) of size 128×128 and 256×256 pixels and the two channels that created them (row 2), respectively. Below, we show the decomposition results obtained with a trained vanilla HVAE with input patch size 64 (row 3), and results obtained with the same architecture but using Regular-LC (row 4). To recognise which critter is depicted and assign it to a channel, the network has to see both waveforms, hence requiring long range lateral image context.

Figure 6. Benefits of µSplit in one glance: Quantitative results of baselines vs. µSplit variants. (a) We plot the performance of the vanilla U-NET and the vanilla HVAE baseline trained on increasingly large patch sizes on our PaviaATN Act vs. Tub data. The U-NET performance plateaus roughly at a patch size of about 256. The performance of the vanilla HVAE (not using LC) depends on how many hierarchy layers we use (1 to 4, different colored plots), but then either plateaus as well or requires a tremendous amount of GPU memory (black plot, also see Table 1). (b) The left plot displays the data as shown in the HVAE plot in (a), but now as a function of hierarchy levels in the used architecture. Each curve now represents a given patch size. X-axis ticks express how many hierarchy levels the HVAE has, and how many of those make use of LC (number in brackets). The rightmost two plots show results obtained with µSplit using an HVAE with a patch size of only 64. Each plot shows results obtained with one of our LC variations being used. Not only do networks using LC outperform all baselines, they do so already when using the smallest patch size (64), thereby requiring only a moderate amount of GPU memory (see Table 1).
We first choose 4 different frequencies and combine them into 4 unique pairs. Two pairs are dedicated to image channel 1 (blue box), the other two to image channel 2 (green box). We call these pairs critters. The assignment of these critters to channels is done such that each frequency is assigned exactly once to each channel. We connect the two sinusoids of each critter with a low frequency curve of controllable length (later denoted by Njoin in Table 2). Note that it is the specific combination of sinusoid frequencies present in the curve which decides whether it belongs to Channel 1 or 2, since the individual sinusoids themselves occur in both channels in equal amount. Next, we assemble channel images by placing a predefined number of randomly chosen curves at random positions in the respective image channel. The final input image is created as the sum of the two channels. See Figure 5(a) for the dataset construction.
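The pairing rule can be illustrated with a short sketch. The concrete pairing pattern, the frequency values and the function names are assumptions chosen for illustration. Only the constraints are taken from the text: four unique pairs, two per channel, and each frequency used exactly once per channel.

```python
from collections import Counter


def assign_critters(freqs):
    """Split 4 distinct frequencies into two critters (pairs) per channel."""
    if len(freqs) != 4 or len(set(freqs)) != 4:
        raise ValueError("expected 4 distinct frequencies")
    f1, f2, f3, f4 = freqs
    return {1: [(f1, f2), (f3, f4)], 2: [(f1, f3), (f2, f4)]}


def channel_of(pair, channels):
    """Channel whose critters contain this (unordered) frequency pair, else None."""
    key = frozenset(pair)
    for channel, pairs in channels.items():
        if any(frozenset(p) == key for p in pairs):
            return channel
    return None


channels = assign_critters([1, 2, 4, 8])  # hypothetical frequency values
```

With this assignment, seeing a single sinusoid (say frequency 2) does not reveal the channel, because frequency 2 occurs once in each channel. Only after both sinusoids of a critter are visible, e.g. the pair (2, 8), does `channel_of` return a unique answer.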
PaviaATN Microscopy Dataset. We've created the PaviaATN dataset. It has been imaged in the Synthetic Physiology Laboratory at the University of Pavia, and is composed of 62 4-channel fluorescence microscopy images of size 2720×2720. Notably, this dataset has higher pixel resolution than most publicly available fluorescence microscopy datasets [12,17,31]. The three channels we use label Actin, Tubulin and Nuclei, respectively, yielding three decomposition tasks we refer to as Actin vs. Tubulin, Actin vs. Nucleus, and Tubulin vs. Nucleus. Note that the dataset has two channels labelling Nuclei from which we picked one. See the supplement for more details.
Hagen et al. Actin-Mitochondria Dataset. From many sub-datasets provided by Hagen and colleagues [12], we picked the one with Mitochondria and Actin channels, the one with the highest pixel resolution (2048×2048).

Figure 7. Qualitative results on the Act vs. Tub task from our PaviaATN dataset. We compare ground truth to results obtained with the vanilla HVAE baseline trained with a patch size of 64 and to results obtained with two variations of µSplit (HVAES using lean and deep LC, both also using a patch size of 64). The overlaid histograms show either the intensity distribution of the two channels (column 1) or the intensity distribution of the ground truth and the prediction (red). The given PSNR values are for the individual prediction (full input image) and for the entire dataset (in brackets).
[Panel annotations, first row: PSNR 28.3 (27.2), 30.4 (29.8), 30.1 (30.7); second row: PSNR 22.3 (23.1), 24.4 (25.5), 24.0 (26.5).]
5. Experiments and Results
Incrementally Introducing LC. In the left panel of Figure 6(b), we show that for the Vanilla HVAE, as hierarchy levels increase (BU blocks), so does the performance, provided we have a large enough patch size. For a patch size of 64, increasing hierarchy levels does not bring any benefit after a point.
In the central panel of Figure 6(b), keeping the patch size and hierarchy levels fixed to 64 and 4 respectively, we introduce LC to an increasing number of hierarchy levels (denoted by the number in the brackets along the x-axis). This gives us a cumulative gain of around 2 dB PSNR. Furthermore, with Deep-LC (right panel), we increase the hierarchy level even further, which gives us further improvements. Two things are worth noting here for the patch size of 64: (i) There is not much benefit in increasing hierarchy levels for the Vanilla HVAE. Using LC, on the other hand, leads to additional improvements, and (ii) the Vanilla HVAE cannot employ as many hierarchy levels as we can do using Deep-LC, and the results gain substantially from those extra levels. The Vanilla-XL model denotes the Vanilla model trained with a patch size of 512. The Deep-LC results outperform the Vanilla-XL HVAE, see Figure 6(a), while also having a much smaller GPU memory footprint (see Table 1).
Experiments on Microscopy Data. We present results on 3 decomposition tasks on the PaviaATN dataset and 1 decomposition task on the Hagen et al. dataset. Table 1 summarizes our findings. As baselines, we've adapted the works of [16,13] and find that µSplit outperforms them. It is worth noting that the architecture used in [13], unlike ours, did not generalize to using a hierarchy of lower resolution inputs and worked with just one additional low resolution input. It also, unlike us, did not respect pixel alignments while concatenating the latent space tensors of the two resolution levels. We have also applied the unsupervised Double-DIP [9] baseline to 6 randomly sampled crops of size 256×256 for each test-set image of the PaviaATN and Hagen et al. datasets (see Table 1 and the supplementary figure).
Over all four tasks, the best performing LC variant with HVAE architecture outperforms the best LC variant with HAE architecture by 0.5 PSNR on average. Using the HVAE architecture, Deep-LC outperforms Lean-LC on average by 0.8 PSNR. For the HAE architecture this difference is 0.1 PSNR. Qualitative results are shown in Figure 7 and in the supplement.
Outer vs. Inner Padding and Runtime Performance. In Figure 4(b), we show the percentage change in PSNR with different amounts of padding and see that the vanilla HAE and HVAE setup performances degrade (left plot) when Outer Padding is used with large padding amounts. But with Inner Padding (right plot), we see the improvement saturate with increasing padding amount. In Figure 4(c), one can observe an artifact appearing solely due to Outer Padding (the artifact does not exist in 'No padding'). These results support our claim about the OOD issue as described in Section 3. Note that Inner Padding requires a larger number of individual predictions, indicated by the smaller grid size seen in Figure 4(a) (denoted by the red dashed rectangle). Specifically, using an Inner Padding of 24 pixels with a patch size of 64 will use a 16×16 center-crop per patch. Hence, we will need to predict 16 ($(64/16)^2 = 4^2$) times more patches to cover the entire input image.

                                         PaviaATN                                  Hagen et al.
Model                      Patch  GPU    Act vs Nuc    Tub vs Nuc    Act vs Tub    Act vs Mit
                           Size   (GiB)  PSNR  SSIM    PSNR  SSIM    PSNR  SSIM    PSNR  SSIM
Double-DIP [9]             -      -      22.8  0.30    21.2  0.20    20.9  0.30    25.3  0.56
BraveNet [13]              64     2.8    31.7  0.73    30.3  0.61    25.9  0.62    33.0  0.92
Context-Aware U-Net [16]   64     4.7    31.5  0.74    29.0  0.61    25.1  0.61    31.1  0.91
U-Net                      256    9.4    33.2  0.79    31.4  0.71    28.1  0.69    34.2  0.95
U-Net                      512    28.7   33.3  0.79    31.1  0.72    27.9  0.69    34.1  0.94
U-Net Regular-LC           64     12.5   33.5  0.79    32.0  0.71    27.6  0.68    32.7  0.93
HAE
  Vanilla                  64     2.3    31.7  0.74    29.5  0.64    25.4  0.63    31.9  0.92
  Lean-LC                  64     3.9    33.6  0.78    31.9  0.70    27.7  0.67    32.9  0.94
  Regular-LC               64     6.0    33.5  0.79    31.6  0.71    27.9  0.68    33.4  0.94
  Deep-LC                  64     6.9    33.7  0.80    31.8  0.72    28.3  0.69    32.8  0.94
  Vanilla-XL               512    31.2   33.2  0.79    30.2  0.68    27.6  0.67    34.2  0.95
HVAE
  Vanilla                  64     2.8    31.8  0.75    29.6  0.64    25.2  0.61    31.9  0.93
  Lean-LC                  64     4.4    33.8  0.79    31.9  0.71    27.7  0.68    32.7  0.94
  Regular-LC               64     11.1   33.9  0.80    32.1  0.72    27.8  0.68    34.1  0.95
  Deep-LC                  64     12.8   33.9  0.81    32.5  0.73    28.6  0.70    34.3  0.95
  Vanilla-XL               512    (*)    33.4  0.78    32.9  0.69    27.6  0.67    34.3  0.95

Table 1. Quantitative results on fluorescent image decomposition tasks derived from the PaviaATN and Hagen et al. datasets. All results are reported in terms of peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM). For each model we also report the used training patch size and GPU memory usage during training. The baselines we use are Double-DIP [9], BraveNet [13], Context-Aware U-Net [16], as well as vanilla HAES and HVAES using four hierarchy levels. Additionally, we show results for U-NETS [23], HAES, and HVAES trained on much larger patch sizes (256 and 512). The results of µSplit are also obtained with the same HAE and HVAE architectures trained on patches of size 64×64, but with all hierarchy levels also employing either Lean-LC, Regular-LC, or Deep-LC (see main text for details). Bold numbers denote the best result for any given task (column). In all but one case (PaviaATN, Tubulin vs. Nuclei), our results outperform all baselines despite having a comparatively lean memory footprint. (*) Note that the Vanilla-XL HVAE with patch size of 512 and batch size of 32 did not fit in 32 GiB of GPU memory and so we lowered the batch size such that the model did fit in memory.

Image                 Njoin = 0     Njoin = 25
Size   Model          PSNR  SSIM    PSNR  SSIM
128    Vanilla        28.3  0.90    25.5  0.85
128    Lean-LC        37.3  0.97    35.1  0.96
128    Regular-LC     37.0  0.98    39.2  0.98
256    Vanilla        19.4  0.75    15.8  0.43
256    Lean-LC        34.1  0.97    32.2  0.97
256    Regular-LC     41.5  0.99    41.6  0.98

Table 2. Quantitative results on the SinosoidalCritters dataset. We compare results obtained with vanilla HVAES that do not use LC, and HVAES employing either Lean-LC or Regular-LC (i.e. µSplit results, see main text for details). All experiments are performed using a patch size of 64. Bold numbers denote the best result for any given task (columns), showing that our results consistently outperform the vanilla baselines.
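The runtime cost of Inner Padding follows directly from the tile geometry. The sketch below counts the patches needed to cover a square image; the function name and the dictionary layout are hypothetical, and image borders are handled by simply rounding the tile count up.

```python
import math


def inner_padding_plan(image_size, patch_size, inner_pad):
    """Tile size and number of patches needed to cover a square image."""
    tile = patch_size - 2 * inner_pad
    if tile <= 0:
        raise ValueError("inner_pad must be smaller than half the patch size")
    per_axis = math.ceil(image_size / tile)
    return {"tile": tile, "patches": per_axis ** 2}


# Setting from the text: patch size 64 with Inner Padding of 24 pixels,
# applied to a 2720 x 2720 PaviaATN image.
padded = inner_padding_plan(2720, patch_size=64, inner_pad=24)
unpadded = inner_padding_plan(2720, patch_size=64, inner_pad=0)
```

For this image size, `padded` needs 28900 patches with 16×16 tiles, while `unpadded` needs 1849 patches with 64×64 tiles. The ratio is slightly below the factor of 16 stated above only because the unpadded grid has to round up at the image border.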
BU Blocks   Vanilla 64   LC 64   LC 128
1           24.3         24.7    24.8
2           25.1         25.9    25.9
3           25.2         27.0    27.0
4           25.4         27.8    27.9

Table 3. Performance of HVAE + Regular-LC trained with patch size of 64 (col 3) and 128 (col 4) on Act vs Tub data. The larger patch size shows diminishing returns, indicating that LC is providing enough image context, showcasing the value of our approach.
Interestingly, we found that padding gives only minor quantitative benefits to Deep-LC, and so the Deep-LC results in Table 1 were computed without padding, thereby leading to a better runtime for Deep-LC. However, we still find a few tiling artifacts with Deep-LC and in those cases Inner Padding helps. The other two LC variants benefit both quantitatively and qualitatively from Inner Padding.
Effects of Larger Training Patch Sizes. In Figure 6(a) we show that increasing the training patch size improves the performance of a U-NET and vanilla HVAES across different hierarchy levels. While the U-NET baseline performance saturates, the HVAES' improvement with increasing hierarchy levels does not, but it quickly reaches a hard limit in terms of GPU memory requirement (see Table 1).
Performance of LC with larger patch sizes. Using µSplit,
microscopy labs having limited GPU compute will still get
similar performance to labs with ample resources, i.e. labs
capable of using networks employing large patch sizes. So
far, all our LC variants have been trained with a patch size
of 64. A natural question to ask is whether there is still
some benefit in using larger patch sizes when also using LC.
While the answer to this question depends upon multiple
factors, such as how many long-range interactions are present
in the data and the receptive field size of the network, we
did an ablation to empirically investigate this in Table 3.
One can observe that for HVAE + Lean-LC, across different
hierarchy levels (BU Block count), using a patch size of
128 only provides a minor performance improvement over
a patch size of 64. This implies that for a pixel's prediction,
only a small amount of neighbourhood context needs to be
given at native pixel resolution and most of the context can
be given via lower-resolution lateral image context.
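To make the idea of lower-resolution lateral context concrete, the following is a minimal sketch of how such inputs could be assembled for one patch location. It is not the authors' implementation: the helper names (`crop_centered`, `average_pool`, `lc_inputs`), the zero padding outside the image, the factor of 2 between successive levels, and the use of average pooling for downsampling are all assumptions made for illustration.

```python
from typing import List

Image = List[List[float]]


def crop_centered(img: Image, cy: int, cx: int, size: int) -> Image:
    """Crop a size x size window centred at (cy, cx); pixels outside the image are 0."""
    h, w = len(img), len(img[0])
    top, left = cy - size // 2, cx - size // 2
    return [[img[y][x] if 0 <= y < h and 0 <= x < w else 0.0
             for x in range(left, left + size)]
            for y in range(top, top + size)]


def average_pool(img: Image, factor: int) -> Image:
    """Downsample a square image by averaging non-overlapping factor x factor blocks."""
    n = len(img) // factor
    area = factor * factor
    return [[sum(img[y * factor + dy][x * factor + dx]
                 for dy in range(factor) for dx in range(factor)) / area
             for x in range(n)]
            for y in range(n)]


def lc_inputs(img: Image, cy: int, cx: int,
              patch_size: int = 64, num_lc_levels: int = 2) -> List[Image]:
    """Primary patch at native resolution plus progressively larger,
    downsampled regions centred on the same location (assumed scheme)."""
    inputs = [crop_centered(img, cy, cx, patch_size)]
    for level in range(1, num_lc_levels + 1):
        factor = 2 ** level
        region = crop_centered(img, cy, cx, patch_size * factor)
        inputs.append(average_pool(region, factor))
    return inputs


image = [[1.0] * 32 for _ in range(32)]
stack = lc_inputs(image, cy=16, cx=16, patch_size=8, num_lc_levels=2)
print([(len(p), len(p[0])) for p in stack])  # [(8, 8), (8, 8), (8, 8)]
```

Every input has the same pixel dimensions as the primary patch, so the memory cost grows with the number of levels rather than with the covered area, which is the property the ablation in Table 3 relies on.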
Experiments on Synthetic Data. In Table 2 we show the
results obtained on the SinosoidalCritters dataset. We used
two input image sizes, 128×128 and 256×256, and two values
for N_join, namely 0 and 25 pixels. On average, µSplit
outperforms the vanilla HVAE by 18 dB PSNR. Also note that
the larger input size, constituting a harder problem to solve,
results in a drop of performance for the vanilla HVAE.
Using µSplit, instead, the performance increases. To recognise
which critter is depicted and assign it to a channel, the
network has to see both waveforms. The vanilla HVAE is
able to do splitting on 128×128, but it has artefacts (red
circle in Figure 5(b)). For the 256×256 pixel images, it
completely fails because it is unable to distinguish between
the critters, since it cannot simultaneously process a sufficiently
large part of the image. In contrast, by using LC we
are able to successfully split both images.
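For readers less familiar with the metric, here is a minimal sketch of the standard PSNR definition. It is an assumption that this matches the evaluation used for Table 2; the paper may use a different normalisation (for example a range-invariant variant), and the function name `psnr` and the choice of the ground-truth intensity range as `data_range` are illustrative.

```python
import math
from typing import List, Optional


def psnr(gt: List[List[float]], pred: List[List[float]],
         data_range: Optional[float] = None) -> float:
    """Standard PSNR in dB: 10 * log10(data_range^2 / MSE)."""
    flat_gt = [v for row in gt for v in row]
    flat_pred = [v for row in pred for v in row]
    mse = sum((g - p) ** 2 for g, p in zip(flat_gt, flat_pred)) / len(flat_gt)
    if mse == 0:
        return math.inf
    if data_range is None:
        # Assumption: use the intensity range of the ground truth.
        data_range = max(flat_gt) - min(flat_gt)
    return 10 * math.log10(data_range ** 2 / mse)


gt = [[0.0, 1.0], [1.0, 0.0]]
pred = [[0.1, 0.9], [0.9, 0.1]]
print(round(psnr(gt, pred), 2))  # close to 20 dB (MSE of about 0.01, range 1)

# At a fixed data range, a gap of d dB corresponds to an MSE ratio of 10^(d/10).
mse_ratio_for_18_db = 10 ** (18 / 10)
print(round(mse_ratio_for_18_db, 1))
```

Under this definition, an 18 dB difference at equal data range corresponds to a roughly 63-fold difference in mean squared error.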
U-Net Hyperparameter Tuning. We tuned depth and
patch size of a classic U-Net to achieve optimal performance
for the tasks at hand (see supplement for details).
6. Discussion
In this work, on our dataset we show that µSplit performs
better when deeper architectures, i.e. HAEs and HVAEs,
are employed and enabled to process additional image context
via the memory efficient lateral contextualization (LC)
schemes we propose.
The deeper such networks become, the larger their receptive
field (RF) sizes grow, in our case routinely exceeding
sizes of 512×512 pixels. An immediate consequence of
this is that we cannot easily employ common tiling schemes
(i.e. Outer Padding) without running into out-of-distribution
(OOD) issues (see Section 3). Hence, we propose to use Inner
Padding to circumvent this problem. Additionally, we
observe that Deep-LC performs quite well even without
padded tiled predictions (no additional overlap between
patches). The reason for this is that the patch context typically
given by overlapping regions is now substituted by
context being fed via Deep-LC. Still, best performance is
typically obtained using Deep-LC and Inner Padding during
tiled predictions.
It is important to point out that for any variational models,
such as HVAEs, tiled predictions suffer from the additional
problem that neighboring tiles will likely not be consistent
due to the sampling step performed independently
per tile. While Inner Padding still is the better strategy to
employ (for the same argument as for any other model with
huge receptive fields), sampling inconsistencies cannot be
fully avoided. The strength of these artifacts will depend
on the data uncertainty (i.e. the ambiguity in the fed inputs
w.r.t. the trained model).
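The dependence of seam artifacts on uncertainty can be illustrated with a deliberately simple toy model. This is not a simulation of an HVAE: it assumes each tile's prediction at a shared border is shifted by one independent Gaussian sample with standard deviation `sigma`, and the function name `mean_seam_mismatch` is hypothetical.

```python
import random


def mean_seam_mismatch(sigma: float, n_trials: int = 2000, seed: int = 0) -> float:
    """Average absolute disagreement at a border between two tiles whose
    predictions are each offset by an independent N(0, sigma^2) sample."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        left = rng.gauss(0.0, sigma)   # sample drawn for the left tile
        right = rng.gauss(0.0, sigma)  # independent sample for the right tile
        total += abs(left - right)
    return total / n_trials


for sigma in (0.0, 0.1, 1.0):
    print(sigma, mean_seam_mismatch(sigma))
```

In this toy setting the mismatch vanishes when there is no sampling noise and grows in proportion to `sigma`, mirroring the statement that artifact strength depends on the data uncertainty.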
In summary, we have proposed a powerful new method
to efficiently use image context. We have then applied this
method to an impactful new image decomposition task on
fluorescence microscopy data. We believe that the presented
ideas will prove to also be useful in the context of other
computer vision problems. We will explore the applicability
of LC to other problem domains in future work. Additionally,
we will make µSplit more amenable to noisy fluorescence
data and to disentanglement tasks where more than
two image channels are superimposed.
Acknowledgements
This work was supported by the European Union
through the Horizon Europe program (IMAGINE project,
grant agreement 101094250-IMAGINE and AI4LIFE
project, grant agreement 101057970-AI4LIFE) as well as
the compute infrastructure of the BMBF-funded de.NBI
Cloud within the German Network for Bioinformatics
Infrastructure (de.NBI) (031A532B, 031A533A, 031A533B,
031A534A, 031A535A, 031A537A, 031A537B,
031A537C, 031A537D, 031A538A). Additionally, the
authors also want to thank Damian Dalle Nogare of the
Image Analysis Facility at Human Technopole for useful
guidance and discussions and the IT and HPC teams at HT
for the compute infrastructure they make available to us.
References
[1] Yuval Bahat and Michal Irani. Blind dehazing using internal
patch recurrence. In 2016 IEEE International Conference on
Computational Photography (ICCP), pages 1–9, May 2016.
[2] Joshua Batson and Loïc Royer. Noise2Self: Blind denoising
by Self-Supervision. pages 1–16, Jan. 2019.
[3] Dana Berman, Tali Treibitz, and Shai Avidan. Non-local image
dehazing. In 2016 IEEE Conference on Computer Vision
BU Blocks PSNR
1 29.8
2 31.3
4 33.2
5 33.2
6 33.0
Table S.2. The achievable performance using a U-Net with various
numbers of bottom-up (BU) blocks. For the results reported
in the main text, 5 BU blocks have been used.
note that Double-DIP, being a completely unsupervised approach,
naturally finds it difficult to know the 'correct' split,
the split which exists in nature. It simply returns one of the
many plausible splitting options. Its inferior performance
argues for some form of supervision for our problem.
S.7. Different Neural Network Submodules
Residual Block. We've taken the residual block formulation
from [20]. The schema of the residual block is shown
in Supplementary Figure S.1 (b). The last layer in the residual
block is the GatedLayer2D, which doubles the number
of channels through a convolutional layer and then uses half the
channels as a gate for the other half.
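The gating mechanism described above can be sketched for the features at a single pixel. This is a simplified illustration and not the GatedLayer2D implementation: it assumes a 1×1 convolution (a plain matrix product per pixel) and a sigmoid gate, neither of which is stated in the text, and the function name `gated_pixel` is hypothetical.

```python
import math
from typing import List


def gated_pixel(x: List[float], weight: List[List[float]],
                bias: List[float]) -> List[float]:
    """Apply a channel-doubling 1x1 convolution at one pixel, then use the
    second half of the result as a sigmoid gate for the first half."""
    c = len(x)
    if len(weight) != 2 * c or len(bias) != 2 * c:
        raise ValueError("the convolution must map C channels to 2C channels")
    doubled = [sum(w * v for w, v in zip(row, x)) + b
               for row, b in zip(weight, bias)]
    values, gates = doubled[:c], doubled[c:]
    return [v / (1.0 + math.exp(-g)) for v, g in zip(values, gates)]


x = [1.0, 2.0]
weight = [[1.0, 0.0], [0.0, 1.0],   # first half: copies the input channels
          [0.0, 0.0], [0.0, 0.0]]   # second half: gate pre-activations of 0
bias = [0.0, 0.0, 0.0, 0.0]
print(gated_pixel(x, weight, bias))  # [0.5, 1.0]
```

With zero gate pre-activations the sigmoid evaluates to 0.5, so each value channel is halved; the output has the same number of channels as the input.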
Stochastic Block. The channels of the input to this block
are divided into two equal groups. The first half is used as
the mean of the Gaussian distribution for the latent space.
The second half is used to get the variance of this distribution,
implemented via the σExpLin reformulation introduced
in [7].
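A minimal sketch of this block at a single spatial location follows. Two points are assumptions rather than facts from the text: the form of σExpLin used here (exp(x) for x ≤ 0 and x + 1 otherwise) reflects our reading of [7], and its output is treated as a standard deviation. The names `sigma_exp_lin` and `stochastic_block` are hypothetical.

```python
import math
import random
from typing import List, Tuple


def sigma_exp_lin(x: float) -> float:
    """Assumed sigma-ExpLin: exponential for x <= 0, linear (x + 1) for x > 0.
    The two branches agree at x = 0 and the result is always positive."""
    return math.exp(x) if x <= 0 else x + 1.0


def stochastic_block(channels: List[float],
                     rng: random.Random) -> Tuple[List[float], List[float], List[float]]:
    """Split the channels into mean and raw scale, then draw a Gaussian sample."""
    if len(channels) % 2 != 0:
        raise ValueError("expected an even number of channels")
    half = len(channels) // 2
    mean = channels[:half]
    std = [sigma_exp_lin(r) for r in channels[half:]]  # assumption: a std, not a variance
    sample = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]
    return mean, std, sample


mean, std, z = stochastic_block([0.5, -0.2, 0.0, 2.0], random.Random(0))
print(mean, std)  # [0.5, -0.2] [1.0, 3.0]
```

Compared with a plain exponential, the linear branch keeps the scale from growing explosively for large positive pre-activations, which is the usual motivation for this kind of parameterisation.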
S.8. U-Net Tuning
We varied the depth of the used U-Net. For consistency
with the other used architectures, we decided to still call
them BottomUp (BU) blocks (HAEs and HVAEs grow upwards,
not downwards). Table S.2 shows the achievable
performance with U-Nets of different depth (number of BU
blocks).
Other relevant hyperparameter values used for U-Nets
are patience = 200 for early stopping and patience = 30 for
the learning rate scheduler (ReduceLROnPlateau).
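How the two patience values interact can be sketched with a small, library-free simulation. This is an illustration only: the initial learning rate, the reduction factor of 0.5, and the rule "trigger once more than `patience` epochs pass without improvement" are assumptions (the exact off-by-one convention differs between libraries), and `PatienceCounter` and `run_schedule` are hypothetical names.

```python
import math
from typing import Iterable, Optional, Tuple


class PatienceCounter:
    """Counts consecutive epochs without improvement of the monitored loss."""

    def __init__(self, patience: int) -> None:
        self.patience = patience
        self.best = math.inf
        self.bad_epochs = 0

    def update(self, loss: float) -> bool:
        """Return True once more than `patience` epochs passed without improvement."""
        if loss < self.best:
            self.best = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        if self.bad_epochs > self.patience:
            self.bad_epochs = 0  # start counting afresh after triggering
            return True
        return False


def run_schedule(val_losses: Iterable[float], lr: float = 1e-3,
                 factor: float = 0.5) -> Tuple[Optional[int], int, float]:
    """Return (epoch at which training stops or None, number of LR reductions, final LR)."""
    lr_counter = PatienceCounter(patience=30)     # learning rate scheduler
    stop_counter = PatienceCounter(patience=200)  # early stopping
    reductions = 0
    for epoch, loss in enumerate(val_losses):
        if lr_counter.update(loss):
            lr *= factor
            reductions += 1
        if stop_counter.update(loss):
            return epoch, reductions, lr
    return None, reductions, lr


# Validation loss improves for 10 epochs, then stays flat.
losses = [float(v) for v in range(10, 0, -1)] + [1.0] * 500
print(run_schedule(losses))
```

With these settings the learning rate is reduced several times on a plateau before early stopping ends training, which is the usual reason for choosing a much larger patience for stopping than for the scheduler.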
[Figure S.4: image grid with columns Input Image, Vanilla, Lean-LC, Deep-LC, GT and rows Ch1/Ch2 for each of three crops.]
Figure S.4. Qualitative evaluation of Vanilla HVAE and our LC variants (also integrated into the HVAE architecture) on the Actin vs Mitochondria
task. Here, we show results on three random crops of size 300×300. Input to all models is the region inside the red square, as seen in column
one. The last column has the ground truth for both channels. Red arrows highlight a few interesting areas where we observe that our Deep-LC
performs better than the others.
[Figure S.5: image grid with columns Input Image, Vanilla, Lean-LC, Deep-LC, GT and rows Ch1/Ch2 for each of three crops.]
Figure S.5. Qualitative evaluation of Vanilla HVAE and our LC variants (also integrated into the HVAE architecture) on the Actin vs Tubulin task.
Here, we show results on three random crops of size 300×300. We disentangle the region inside the red square, which is shown in column
one. The last column has the ground truth for both channels.
[Figure S.6: image grid with columns Input Image, Vanilla, Lean-LC, Deep-LC, GT and rows Ch1/Ch2 for each of three crops.]
Figure S.6. Qualitative evaluation of Vanilla HVAE and our LC variants (also integrated into the HVAE architecture) on the Tubulin vs Nucleus
task. Here, we show results on three random crops of size 300×300. We disentangle the region inside the red square, which is shown in
column one. The last column has the ground truth for both channels.
[Figure S.7: image grid with columns Input Image, Vanilla, Lean-LC, Deep-LC, GT and rows Ch1/Ch2 for each of three crops.]
Figure S.7. Qualitative evaluation of Vanilla HVAE and our LC variants (also integrated into the HVAE architecture) on the Actin vs Nucleus task.
Here, we show results on three random crops of size 300×300. We disentangle the region inside the red square, which is shown in column
one. The last column has the ground truth for both channels.
[Figure S.8: image grid with columns Input Image, Vanilla, Lean-LC, Deep-LC, GT and rows Ch1/Ch2 for each of three crops.]
Figure S.8. Qualitative evaluation of Vanilla HVAE and our LC variants (also integrated into the HVAE architecture) on the SinosoidalCritters
dataset. Here, we show results on three random crops of size 200×200. We disentangle the region inside the red square, which is shown
in column one. The last column has the ground truth for both channels. Red arrows highlight a few interesting areas where we observe that our
Deep-LC performs better than the others.
[Figure S.9 panel annotations, row 1: PSNR:32.8 (28.5), PSNR:23.7 (25.7), PSNR:23.5 (24.9); row 2: PSNR:49.7 (36.1), PSNR:32.8 (30.0), PSNR:30.5 (31.8).]
Figure S.9. Qualitative image decomposition results using the
Double-DIP baseline (row 2) on a 256×256 image crop
from the Hagen et al. dataset. The overlaid histograms show either
the intensity distribution of the two channels (column 1) or
the intensity distribution of the ground truth and the prediction
(red). Regular-LC, on the other hand, performs well. Note that
Double-DIP is solving a much harder task since it is an unsupervised
method trained on a single input image.