FeatureForest: the power of foundation models, the usability of random forests
Mehdi Seifi1, Damian Dalle Nogare2, Juan Battagliotti2, Vera Galinova1, Ananya Kedige Rao3, Pierre-Henri Jouneau4, Anwai Archit5, AI4Life Horizon Europe Programme Consortium+, Constantin Pape5,6, Johan Decelle3, Florian Jug1, and Joran Deschamps2*
1Computational Biology Research Center, Human Technopole, Milan, Italy
2Bioimage Analysis Unit, National Facility for Data Handling and Analysis, Human Technopole, Milan, Italy
3Cell and Plant Physiology Laboratory, CNRS, CEA, INRAE, IRIG, Université Grenoble Alpes, Grenoble, France
4University Grenoble Alpes, CEA, IRIG-MEM, 38054, Grenoble, France
5Institute of Computer Science, University of Göttingen, Göttingen, Germany
6Cluster of Excellence "Multiscale Bioimaging: from Molecular Machines to Networks of Excitable Cells", University of Göttingen, Göttingen, Germany
+A list of authors and their affiliations appears at the end of the paper.
*[email protected]
ABSTRACT
Analysis of biological images relies heavily on segmenting the biological objects of interest in the image before performing quantitative analysis. Deep-learning (DL) is ubiquitous in such segmentation tasks, but can be cumbersome to apply, as it often requires a large amount of manual labeling to produce ground-truth data, and expert knowledge to train the models. More recently, large foundation models, such as SAM, have shown promising results on scientific images. They, however, require manual prompting for each object or tedious post-processing to selectively segment these objects. Here, we present FeatureForest, a method that leverages the feature embeddings of large foundation models to train a random forest classifier, thereby providing users with a rapid way to semantically segment complex images using only a few labeling strokes. We demonstrate the improvement in performance over a variety of datasets, and provide an open-source implementation in napari that can be extended to new models.
Introduction
Segmentation is a ubiquitous task in microscopy image analysis, as it enables downstream processing and quantification of objects of interest. Researchers have at their disposal a wide array of algorithms, among which machine learning approaches have long been the methods of choice. In particular, random forest pixel classification is a well-established algorithm, at the heart of several popular software tools for bioimage analysis [1–4]. This approach uses common image filters to extract a feature vector representation of hand-labeled pixels in order to train decision trees that best match the given input labels. Because the image filters can be 2D or 3D, random forest pixel classifiers can natively perform 3D segmentation. Moreover, they are compatible with multiclass pixel classification. These algorithms owe their popularity to the simple iterative process by which users draw small scribbles to assign a class to a subset of pixels, rapidly train a random forest, and predict results over many images. This swift training procedure allows mistakes to be corrected by adding new labels to the training set and training anew. While random forest pixel classification algorithms have a wide application range covering all types of images and modalities, they are limited in their predictive power, and easily confuse different object types that have similar textures [3].
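The classical pipeline described above can be sketched in a few lines. This is a minimal illustration, assuming a small Gaussian filter bank (smoothing, gradient magnitude, Laplacian of Gaussian) built with SciPy and a scikit-learn random forest; the filter choice and the helper names (`filter_bank_features`, `train_on_scribbles`, `predict_image`) are our own and are not the feature set of Labkit or any specific tool.

```python
import numpy as np
from scipy import ndimage as ndi
from sklearn.ensemble import RandomForestClassifier


def filter_bank_features(image, sigmas=(1.0, 2.0, 4.0)):
    """Stack classical filter responses into a per-pixel feature vector.

    Hypothetical filter bank; returns shape (H, W, 1 + 3 * len(sigmas)).
    """
    image = image.astype(np.float32)
    responses = [image]
    for s in sigmas:
        responses.append(ndi.gaussian_filter(image, s))              # smoothing
        responses.append(ndi.gaussian_gradient_magnitude(image, s))  # edges
        responses.append(ndi.gaussian_laplace(image, s))             # blob-like structures
    return np.stack(responses, axis=-1)


def train_on_scribbles(features, scribbles, n_estimators=50, seed=0):
    """Fit a random forest on the labeled pixels only (0 means unlabeled)."""
    labeled = scribbles > 0
    clf = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
    clf.fit(features[labeled], scribbles[labeled])
    return clf


def predict_image(clf, features):
    """Classify every pixel and reshape the result back to the image grid."""
    h, w, f = features.shape
    return clf.predict(features.reshape(-1, f)).reshape(h, w)


# Toy example: a bright square on a noisy background and two small scribbles.
rng = np.random.default_rng(0)
img = rng.normal(0.0, 0.1, (64, 64))
img[16:48, 16:48] += 1.0
scribbles = np.zeros((64, 64), dtype=np.uint8)
scribbles[30:34, 30:34] = 2  # class 2: object
scribbles[2:6, 2:6] = 1      # class 1: background
feats = filter_bank_features(img)
seg = predict_image(train_on_scribbles(feats, scribbles), feats)
```

Only 32 pixels are labeled, yet every pixel is classified. This is the property that makes the scribble-based workflow fast. It is also why two textures that the filters cannot tell apart end up confused.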
In recent years, deep-learning has emerged as the most powerful approach to image segmentation. Such approaches are most often trained in a supervised fashion, that is to say with a large dataset of manually segmented images as reference [5,6]. The likes of StarDist [7] or CellPose [8] are go-to tools for image analysts wanting to perform image segmentation. Once trained, these methods often outperform random forest pixel classification [9,10] and are compatible with 3D segmentation. Furthermore, CellPose2 [11] introduced user-friendly fine-tuning of models by providing a user interface to correct errors and retrain the selected model, similar to the way random forest classifiers are used. Base CellPose models were trained on datasets consisting of various imaging modalities and diverse samples, and are capable of segmenting objects of similar size in a wide range of images. CellPose does not, however, segment multiple classes, and can struggle to effectively segment objects of various shapes and sizes simultaneously.
With more compute power and more data available, much larger networks are now being trained with astounding results. For instance, the Segment Anything Model (SAM) [12] is capable of accurately segmenting biological objects in 2D in both electron and light microscopy images, despite being trained on a dataset overwhelmingly composed of natural (i.e. everyday scenes) images. To push the boundary of its capabilities, fine-tuning this model with scientific images is being explored [13–17].
SAM does not natively segment whole images. Instead, it expects user annotations (also called prompts) in the form of bounding boxes or points as inputs, and returns segmented instances of the annotated objects. While this is a powerful way to enable interactivity, scientific segmentation pipelines preferentially require automated processing of large datasets. SAM ships with an auto-segmentation method based on automatically generating prompts as a grid of points over the image. Unfortunately, this feature is not guided: in most cases, it misses objects of interest while also segmenting other types of objects. Without an accurate and automated way of producing the prompts, SAM applications in bioimage analysis are limited to direct and time-consuming user interactions with each object in the dataset.
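To make the "grid of points" idea concrete, the sketch below builds regularly spaced point prompts at the centers of an n×n tiling of the image. It follows the spirit of the `points_per_side` option of the `segment-anything` automatic mask generator, but the helper `point_grid` is our own illustration and does not call SAM.

```python
import numpy as np


def point_grid(height, width, points_per_side=4):
    """Return an (N, 2) array of (x, y) prompt coordinates on a regular grid.

    Points sit at the centers of a points_per_side x points_per_side tiling.
    Hypothetical helper, not part of any SAM package.
    """
    offsets = (np.arange(points_per_side) + 0.5) / points_per_side
    xs = offsets * width
    ys = offsets * height
    xx, yy = np.meshgrid(xs, ys)  # rows of the grid follow y, columns follow x
    return np.stack([xx.ravel(), yy.ravel()], axis=-1)


grid = point_grid(256, 512, points_per_side=4)
```

Each grid point yields a mask, whether or not it falls on an object of interest. This is why the resulting segmentation is unguided and needs filtering.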
Another fruitful research avenue is the use of rich latent spaces as a basis for segmentation. Rather than segmenting pixels directly, other approaches, such as MAESTER [18] or DINOv2 [19,20], train a large network on a different task (e.g. reconstructing masked areas of the image) in order to produce rich feature embeddings of the image. These features can then be used to cluster the pixels based on their proximity in this latent space, and to identify object classes with these clusters. While enticing, cluster-based approaches are often limited by the lack of knowledge of how many classes are expected in a given image, and of whether these classes cluster meaningfully in the feature space. Moreover, the application of such approaches is so far limited to deep-learning experts due to the complexity of the training process, and success in segmenting scientific images that differ from those in the training set is not ensured.
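The "how many classes?" limitation can be seen with a toy clustering run. This is a minimal sketch, assuming k-means (scikit-learn `KMeans`) as the clustering method and synthetic 8-D vectors standing in for pixel embeddings. Neither choice is taken from MAESTER or DINOv2.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical embeddings: two well-separated groups of "pixels" in an 8-D latent space.
emb = np.concatenate([
    rng.normal(0.0, 0.1, (200, 8)),
    rng.normal(3.0, 0.1, (200, 8)),
])


def cluster_pixels(embeddings, n_classes, seed=0):
    """Assign each embedding to one of n_classes k-means clusters."""
    km = KMeans(n_clusters=n_classes, n_init=10, random_state=seed)
    return km.fit_predict(embeddings)


labels_k2 = cluster_pixels(emb, 2)  # correct guess: one cluster per true class
labels_k3 = cluster_pixels(emb, 3)  # wrong guess: a true class gets split in two
```

The number of clusters must be supplied up front. A wrong guess silently splits (or merges) real classes, and nothing in the procedure flags the error.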
Here, we present FeatureForest, a method that combines the power of large deep-learning models with the simplicity and user-guidance provided by random forest classification algorithms. With FeatureForest, manual labeling can be a matter of minutes, and user-guidance allows segmenting complex objects throughout entire datasets without requiring the re-training of large deep learning networks. We showcase how FeatureForest fills a gap in the segmentation of large electron microscopy datasets, enabling researchers to segment challenging images. More specifically, FeatureForest uses large foundation models to extract feature vectors corresponding to user-labeled pixels in order to train a random forest algorithm. In this manuscript, we demonstrate the usefulness of FeatureForest on various scientific datasets for which no straightforward or user-friendly algorithm exists, and the improvements it yields over classical random forest classification. We provide an implementation of FeatureForest as an open-source napari [21] plugin, as well as example scripts and notebooks to perform prediction outside napari (e.g. on compute clusters).
Results
FeatureForest in a nutshell
FeatureForest replaces the classical filters of a random forest classifier with large deep-learning models (see Fig. 1a), and extracts the feature vectors used during random forest training from the embeddings computed within those networks. The iterative training process otherwise remains similar, with users requiring a few iterations of labeling and training before obtaining the desired results. FeatureForest currently includes several foundation models: MobileSAM [22], SAM2 [23] and DINOv2 [19] (see Methods for a description of the feature vector extraction process). Users can extend this list and adapt the model of their choice for use in FeatureForest (see Methods).
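Vision-transformer encoders typically output one embedding per image patch rather than per pixel. Some resampling step is therefore needed before a random forest can see one feature vector per pixel. The sketch below uses bilinear upsampling with `scipy.ndimage.zoom`. This is our assumption for illustration only. FeatureForest's actual extraction procedure is the one described in Methods, and the patch grid and channel count below are arbitrary.

```python
import numpy as np
from scipy import ndimage as ndi


def patch_to_pixel_features(patch_emb, image_shape):
    """Bilinearly upsample an (h, w, C) patch-embedding grid to (H, W, C).

    Hypothetical helper: the channel axis is left untouched (factor 1).
    """
    h, w, _ = patch_emb.shape
    H, W = image_shape
    factors = (H / h, W / w, 1)
    up = ndi.zoom(patch_emb, factors, order=1)  # order=1: (bi)linear interpolation
    return up[:H, :W]


# Arbitrary example: a 16x16 grid of 32-D embeddings for a 256x256 image.
patch_emb = np.random.default_rng(0).normal(size=(16, 16, 32)).astype(np.float32)
pixel_feats = patch_to_pixel_features(patch_emb, (256, 256))
X = pixel_feats.reshape(-1, pixel_feats.shape[-1])  # one row per pixel, ready for a random forest
```

Once reshaped to `(n_pixels, n_features)`, these rows simply replace the filter-bank responses of the classical pipeline.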
In Fig. 1b, we describe the FeatureForest pipeline as available to users via the napari plugin we provide. In a first step, using the Feature Extraction widget, users extract the feature vectors corresponding to all pixels in a set of images loaded in napari, using the model of their choice. The feature vectors of individual pixels are stored in an HDF5 file to allow random access during the later stages. The feature vectors are large (from 320 to 1536 features per pixel depending on the model) and their extraction is slow. Therefore, storing them once for the whole training dataset enables faster iterations when training the random forest.
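The "extract once, read many times" pattern can be illustrated without HDF5. The sketch below swaps in a NumPy memory-mapped `.npy` file as a stand-in for the HDF5 file that FeatureForest writes, to avoid an h5py dependency. The `extract_slice` callable, the float16 dtype and the array layout are hypothetical.

```python
import os
import tempfile

import numpy as np


def store_features(path, n_slices, height, width, n_features, extract_slice):
    """Compute features slice by slice and write them to an on-disk array.

    extract_slice(i) is a hypothetical callable returning (H, W, F) features.
    A memory-mapped .npy file stands in for an HDF5 dataset here.
    """
    store = np.lib.format.open_memmap(
        path, mode="w+", dtype=np.float16,
        shape=(n_slices, height, width, n_features),
    )
    for i in range(n_slices):
        store[i] = extract_slice(i)  # slow step, performed only once
    store.flush()
    del store


def gather_labeled(path, coords):
    """Random access: read only the feature vectors of labeled pixels.

    coords is an (N, 3) integer array of (slice, row, col) positions.
    """
    store = np.load(path, mmap_mode="r")
    return np.asarray(store[coords[:, 0], coords[:, 1], coords[:, 2]])


tmp = os.path.join(tempfile.mkdtemp(), "features.npy")
store_features(tmp, 3, 8, 8, 4, lambda i: np.full((8, 8, 4), i, dtype=np.float16))
vecs = gather_labeled(tmp, np.array([[0, 1, 1], [2, 7, 3]]))
```

Each training iteration then only reads the rows of the labeled pixels from disk instead of re-running the foundation model.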
Then, the Segmentation widget is used to iteratively train a random forest on the data subset, as well as to perform the final segmentation. First, users select a napari layer containing their data, point to the feature vector file that was exported using the Feature Extraction widget, and select their labeling layer. Next, using napari's built-in labeling tools, they label a small representative set of pixels before training a random forest on these labeled pixels. Once the training is done, users can segment the currently selected slice, or a full stack. The results can be improved by iteratively adding new labeled pixels where the trained classifier performed poorly. The training process allows rapid iteration between labeling, training and prediction. At any point, users can save the trained random forest classifier. After a few iterations, users can predict on a dataset saved on disk, e.g. a much larger stack.
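The label, train, predict and correct loop can be simulated as follows. This is a minimal sketch, assuming a scikit-learn random forest and a simulated "user" who labels a random subset of the pixels the current model gets wrong, with ground truth standing in for the user's judgment. The helper `iterative_training` and all parameter values are hypothetical and are not the plugin's behavior.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def iterative_training(features, truth, n_rounds=3, labels_per_round=20, seed=0):
    """Simulate the label -> train -> predict -> correct loop.

    features: (H, W, F) per-pixel features; truth: (H, W) classes >= 1, used
    here as a stand-in for a user who labels pixels the model gets wrong.
    """
    rng = np.random.default_rng(seed)
    h, w, f = features.shape
    X = features.reshape(-1, f)
    y_true = truth.ravel()
    labels = np.zeros(h * w, dtype=y_true.dtype)  # 0 = unlabeled
    for c in np.unique(y_true):  # initial scribbles: 5 random pixels per class
        idx = rng.choice(np.flatnonzero(y_true == c), 5, replace=False)
        labels[idx] = c
    history = []
    for _ in range(n_rounds):
        clf = RandomForestClassifier(n_estimators=30, random_state=seed)
        labeled = labels > 0
        clf.fit(X[labeled], labels[labeled])
        wrong = np.flatnonzero(clf.predict(X) != y_true)
        history.append(len(wrong) / y_true.size)  # error rate of this round's model
        if len(wrong) == 0:
            break
        # The simulated user adds labels where the classifier performed poorly.
        fix = rng.choice(wrong, min(labels_per_round, len(wrong)), replace=False)
        labels[fix] = y_true[fix]
    return clf, history


rng = np.random.default_rng(1)
truth = np.ones((32, 32), dtype=np.int64)
truth[:, 16:] = 2
feats = rng.normal(0.0, 1.0, (32, 32, 3)) + truth[..., None] * 1.5
clf, history = iterative_training(feats, truth)
```

Because new labels are placed on mistakes, each round targets the regions where the classifier is weakest.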
Furthermore, FeatureForest includes post-processing (see Methods), such as smoothing steps and filtering of connected components based on size. Additional post-processing tools leverage SAM2 in two different ways: (i) by generating bounding boxes around instances obtained by performing a watershed on the output of FeatureForest, and using them as prompts to SAM2 (see Fig. 1c), or (ii) by using the SAM2 auto-segmentation feature, in which a grid of points over the image is passed to the model as prompts, and the final masks are selected by thresholding the intersection over union (IoU) between instances obtained from SAM2 and the watershed-processed FeatureForest results. Using SAM2 in the post-processing step typically results in object segmentations with smoother boundaries (see Fig. 1c insets).
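The three building blocks of this post-processing can be sketched with SciPy:

- size-based filtering of connected components;
- bounding boxes around instances, which could serve as box prompts;
- IoU-based selection of candidate masks.

For brevity, connected-component labeling (`scipy.ndimage.label`) replaces the watershed step, and no SAM2 call is made. The helper names, the `(x_min, y_min, x_max, y_max)` box convention and the 0.5 threshold are our assumptions.

```python
import numpy as np
from scipy import ndimage as ndi


def remove_small_components(mask, min_size):
    """Drop connected components smaller than min_size pixels."""
    lab, _ = ndi.label(mask)
    keep = np.bincount(lab.ravel()) >= min_size
    keep[0] = False  # label 0 is background
    return keep[lab]


def instance_boxes(instances):
    """Inclusive (x_min, y_min, x_max, y_max) box of each labeled instance."""
    boxes = []
    for sl in ndi.find_objects(instances):
        if sl is None:  # label id not present in the image
            continue
        rows, cols = sl
        boxes.append((cols.start, rows.start, cols.stop - 1, rows.stop - 1))
    return boxes


def iou(a, b):
    """Intersection over union of two boolean masks."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0


def select_masks(candidates, reference_instances, threshold=0.5):
    """Keep candidates whose best IoU with a reference instance reaches threshold."""
    refs = [reference_instances == i for i in range(1, reference_instances.max() + 1)]
    return [m for m in candidates
            if max((iou(m, r) for r in refs), default=0.0) >= threshold]


mask = np.zeros((20, 20), dtype=bool)
mask[2:8, 3:10] = True     # object 1
mask[12:18, 12:19] = True  # object 2
mask[0, 19] = True         # one-pixel speck
clean = remove_small_components(mask, min_size=5)
instances, _ = ndi.label(clean)
boxes = instance_boxes(instances)
candidate = np.zeros_like(mask)
candidate[2:8, 3:9] = True  # close match to object 1
kept = select_masks([candidate, np.ones_like(mask)], instances)
```

In this toy case, the speck is removed, two boxes are produced, and the whole-image candidate is rejected because of its low IoU.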
FeatureForest on various microscopy modalities
We applied FeatureForest to various datasets from three different imaging modalities: focused ion beam scanning electron microscopy (FIB-SEM), label-free microscopy, and H&E staining. For each dataset, we trained a classical random forest classifier using Labkit [3], and FeatureForest, on the same training images. In this section, we used SAM2 as the feature-generating model in FeatureForest, and applied the SAM2 bounding-box post-processing available within FeatureForest (see Methods). To quantify the segmentation performance, we computed precision, recall, Dice score, boundary F1 [24], and Hausdorff distance [25] between the resulting segmentation and the ground-truth provided with the public datasets.
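For readers who want to reproduce this kind of evaluation, the sketch below computes the five metrics for binary masks. Precision, recall and Dice follow their standard definitions. The boundary F1 shown here matches boundary pixels within a pixel tolerance `tol`, which is one common variant and not necessarily the exact definition of [24]. The Hausdorff distance is the symmetric maximum of SciPy's `directed_hausdorff`.

```python
import numpy as np
from scipy import ndimage as ndi
from scipy.spatial.distance import directed_hausdorff


def pixel_metrics(pred, gt):
    """Precision, recall and Dice for boolean masks."""
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    dice = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 1.0
    return precision, recall, dice


def boundary(mask):
    """Mask pixels that touch the background (4-connectivity)."""
    return mask & ~ndi.binary_erosion(mask)


def boundary_f1(pred, gt, tol=2.0):
    """F1 of boundary pixels matched within tol pixels (one common variant)."""
    bp, bg = boundary(pred), boundary(gt)
    if not bp.any() or not bg.any():
        return 0.0
    p = (ndi.distance_transform_edt(~bg)[bp] <= tol).mean()  # pred boundary near gt boundary
    r = (ndi.distance_transform_edt(~bp)[bg] <= tol).mean()  # gt boundary near pred boundary
    return 2 * p * r / (p + r) if p + r else 0.0


def hausdorff(pred, gt):
    """Symmetric Hausdorff distance between the two sets of foreground pixels."""
    a, b = np.argwhere(pred), np.argwhere(gt)
    return max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])


gt = np.zeros((32, 32), dtype=bool)
gt[8:24, 8:24] = True
pred = np.zeros_like(gt)
pred[8:24, 10:26] = True  # same square shifted by 2 columns
precision, recall, dice = pixel_metrics(pred, gt)
```

On this toy pair, the 2-pixel shift costs 12.5% in precision, recall and Dice. Within a 2-pixel tolerance, however, the boundaries match perfectly, and the Hausdorff distance equals the shift.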
FIB-SEM data typically has high contrast and dense structures, while being too large to label manually and too complex to segment using random forest pixel classifiers. Fig. 2a shows a single slice of a fly brain imaged by FIB-SEM, as well as the ground-truth masks of mitochondria and the segmentations obtained with Labkit and FeatureForest. The mitochondria appear as dark, round objects of varying intensity. While the random forest classifier is able to classify most pixels inside the mitochondria, it also creates a high number of false positives and misses their outer membrane. In contrast to Labkit, FeatureForest produces a segmentation with high coverage of the mitochondria and few false positive pixels. This is quantitatively confirmed by a Dice score of 0.56 for the random forest and 0.90 for FeatureForest. Post-processing the FeatureForest segmentation using bounding box generation and SAM2 (see Fig. 1c) yields smoother segmentation masks and a further improved Dice score of 0.94. Similar results are obtained throughout the dataset (see various slices in Supplementary Figure 1). Computing the Dice score over the entire dataset shows that FeatureForest performs much better than the classical random forest (see Fig. 2b), with mean and standard deviation of 0.88±0.05 (FeatureForest), 0.92±0.04 (FeatureForest + post-processing), and 0.61±0.07 (random forest). In addition to the higher mean Dice score, FeatureForest also shows lower variability and less sensitivity to varying image quality. These results are confirmed across various other metrics (see Table 1, and Supplementary Figure 2 for the distributions). These include precision and recall, which are used to compute the Dice score but are sensitive to different components of the confusion matrix, as well as the boundary-based metrics boundary F1 and Hausdorff distance.
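The link between the Dice score and its two components can be made explicit. Writing $TP$, $FP$ and $FN$ for the true positive, false positive and false negative pixel counts, the Dice score is exactly the harmonic mean of precision $P$ and recall $R$:

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
\mathrm{Dice} = \frac{2\,TP}{2\,TP + FP + FN}

\frac{2PR}{P + R}
= \frac{2\,\frac{TP}{TP+FP}\cdot\frac{TP}{TP+FN}}{\frac{TP}{TP+FP} + \frac{TP}{TP+FN}}
= \frac{2\,TP^2}{TP\,(TP+FN) + TP\,(TP+FP)}
= \frac{2\,TP}{2\,TP + FP + FN}
= \mathrm{Dice}
```

A low Dice score can therefore stem from many false positives (low $P$), many missed pixels (low $R$), or both. Reporting precision and recall separately disambiguates these cases.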
The example dataset of Fig. 2a is a relatively easy segmentation challenge, as the mitochondrial texture is sufficiently different from the rest of the image to be well captured by classical image filters. Classical image analysis could further improve the segmentation obtained with the random forest classifier, for instance by filtering connected components by size and applying smoothing or morphological operations. In Fig. 2c, we use another FIB-SEM dataset (human breast cancer spheroid) in which the mitochondria have a texture similar to their surroundings and can only be segmented by considering their larger context and shape. This situation is exactly where random forest classifiers typically fail, and indeed the classical approach applied to this dataset resulted in a poor quality segmentation (Dice score of 0.33). In comparison, FeatureForest correctly segments the mitochondria with few spurious segmented pixels (Dice score of 0.83). As before, the results can be further improved by using our post-processing (Dice score of 0.87). The distribution of Dice scores over the whole dataset (500 slices) further shows that FeatureForest segments the stack with high fidelity, while the random forest classifier leads to poor quality results (see Fig. 2d). The mean and standard deviation are 0.74±0.06 (FeatureForest), 0.78±0.07 (FeatureForest + post-processing), and 0.30±0.06 (random forest classifier). Here again, the other metrics corroborate the Dice score (see Table 1 and Supplementary Figure 2), with FeatureForest vastly outperforming the random forest classifier. The SAM2-based post-processing of FeatureForest slightly improves the metric scores, while yielding a better visual impression (see Supplementary Figure 3) due to the smoothness of the resulting segmentation. Note that the random forest classifier produces a large amount of spurious segmented pixels, leading to a low precision (0.26±0.09) compared to its recall (0.38±0.07).
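As a quick consistency check, the harmonic-mean identity between Dice, precision and recall can be applied to the mean precision and recall reported above for the random forest on the Spheroid dataset. The agreement is only approximate, because the mean of per-slice Dice scores need not equal the harmonic mean of the mean precision and mean recall.

```python
def dice_from_precision_recall(precision, recall):
    """Dice score as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Mean precision and recall of the random forest on the Spheroid dataset (from the text).
approx = dice_from_precision_recall(0.26, 0.38)
```

The result, about 0.31, is close to the reported mean Dice score of 0.30±0.06.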
Next, we compared segmentation performance on data from a different imaging modality and sample type. Fig. 2e showcases the output of Labkit and FeatureForest on H&E stained human kidney tissue. This data contains specific blood vessel structures called glomeruli. In the example of Fig. 2e, the glomeruli are slightly darker than their surroundings and, most importantly, display a wide variety of textures. The random forest classifier is capable of approximately segmenting many glomerulus instances, but misses several of them and produces many spurious groups of segmented pixels (Dice score of 0.52). Here again, FeatureForest correctly segments all structures, and its post-processing leads to smooth and complete segmented objects. The dataset was created by tiling a large image. Some tiles are shown in Supplementary Figure 4, together with the recomposed image, showcasing the performance of FeatureForest. Computing the Dice score for each tile (Fig. 2f) leads to a mean and standard deviation of 0.81±0.04 (FeatureForest), 0.87±0.04 (FeatureForest + post-processing), and 0.61±0.11 (random forest). Across all metrics, FeatureForest surpasses the random forest classifier (Labkit). However, FeatureForest scores relatively low on boundary F1 (0.63±0.07) compared to the other metrics. Post-processing substantially increases the performance on that metric (0.87±0.06).
Because FeatureForest uses a random forest as classifier on top of the foundation model features, it can segment multiple classes at a time. To demonstrate this, in Fig. 2g, we segment a mouse embryo imaged with label-free brightfield microscopy. While the cells at the center of the embryo have a vastly different texture from the rest of the image, the extraembryonic membrane of the embryo has a spatially varying intensity due to shadowing and is close to the uniform background texture. The random forest classifier performs well on the cell mass (Dice score of 0.91), but is subpar on the extraembryonic membrane (0.70), leading to an incomplete segmentation of the latter. Once again, the classical random forest method erroneously segments other structures in the image. This leads to the same imbalance between precision and recall as before (0.59±0.09 vs 0.75±0.03, respectively), as shown in Table 1. FeatureForest produces an almost perfect segmentation with high Dice scores (0.99 for the cells, and 0.90 for the extraembryonic membrane). This holds throughout all the test images (see Fig. 2h), with mean and standard deviation of 0.90±0.01 (FeatureForest), 0.93±0.01 (FeatureForest + post-processing), and 0.66±0.05 (random forest classifier). Other metrics confirm the segmentation performance of FeatureForest on the extraembryonic membrane class (see Table 1 and Supplementary Figure 2).
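With several classes, metrics such as Dice are usually computed per class, treating each class in turn as foreground (one-vs-rest). The sketch below illustrates this on a toy label map. The class ids (0 background, 1 cells, 2 membrane) and the helper `per_class_dice` are hypothetical.

```python
import numpy as np


def per_class_dice(pred, gt, classes):
    """One-vs-rest Dice score of each class in a multiclass label map."""
    scores = {}
    for c in classes:
        p, g = pred == c, gt == c
        denom = p.sum() + g.sum()
        scores[c] = 2 * np.logical_and(p, g).sum() / denom if denom else 1.0
    return scores


gt = np.zeros((10, 10), dtype=int)
gt[3:7, 3:7] = 1   # hypothetical class 1: cells
gt[2, 2:8] = 2     # hypothetical class 2: membrane
pred = gt.copy()
pred[2, 7] = 0     # one membrane pixel missed
scores = per_class_dice(pred, gt, classes=(1, 2))
```

A single missed pixel lowers the membrane score (10/11) but leaves the cell score untouched. This is how a thin, low-contrast class can lag behind an easy one within the same model.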
To further showcase multiclass segmentation, we also segmented a FIB-SEM dataset into 6 classes (endoplasmic reticulum, Golgi, mitochondria, lysosomes, lipid droplets and nuclear envelope). FeatureForest correctly segments most objects in the images (see Supplementary Figure 5), across a wide range of textures and shapes.
Multiclass segmentation on large datasets
For complex datasets, as we have seen, classical random forest pixel classification can lead to unusable segmentation, as shown in Fig. 2c. When training deep learning networks is not possible because it requires generating ground-truth labels, FeatureForest provides a useful alternative for performing the segmentation.
This was exemplified in a recent study [26], in which FeatureForest was used to segment organelles in a complex symbiotic interaction between eukaryotic cells. The data consisted of large resin-embedded FIB-SEM stacks of a dinoflagellate cell (referred to as the host). This dinoflagellate species is known to acquire and hijack organelles from its algal prey (the microalga Phaeocystis antarctica), including the nucleus, plastids and mitochondria, and to retain them for several months.
In Fig. 3a, we compare the manual segmentation of three classes (algal plastids, algal mitochondria and host mitochondria) with the results from FeatureForest on three different slices of a single FIB-SEM stack from Rao et al. [26]. The original stack had a size of 3598×4455×3944 pixels and was binned by a factor of 4. The mitochondria of both the host (orange) and the algal prey (red) were segmented as two different classes in one FeatureForest model, while we trained a separate FeatureForest model for the algal plastids (blue). In all cases, FeatureForest led to high quality segmentation. In particular, the plastids are accurately segmented throughout the stack. To quantify this, we manually segmented 7 test slices distributed over the whole range of the stack. We then computed the Dice score between the manual segmentation and FeatureForest + post-processing on these test slices, confirming the visual impression. The mean and standard deviation are 0.58±0.06 (algal prey mitochondria), 0.64±0.03 (host mitochondria), and 0.88±0.02 (algal plastids) (see Supplementary Figure 6 for the distributions). Manually annotating these 7 slices for quantification purposes was a slow process. In contrast, the trained FeatureForest classifier requires no additional input to segment the three classes in all 3598 slices of the stack. Segmenting these organelles throughout such a large stack is essential to visualize and quantify morphological changes (e.g. changes in volume and surface of stolen organelles). The segmentation provided by FeatureForest allows building a 3D model of the distribution of organelles in space (see Fig. 3b), a necessary step in measuring the morphometrics of the various organelles. More details on the findings of the study are available in Rao et al. [26].
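Two preprocessing steps mentioned above are easy to express in code: binning a large stack, and choosing test slices spread over its whole range. The sketch assumes in-plane average binning (keeping the number of slices) and evenly spaced test slices. Both are assumptions for illustration, since the exact binning axes, binning operation and test-slice positions of the study are not specified here.

```python
import numpy as np


def bin_xy(stack, factor=4):
    """Average-pool each slice of a (Z, Y, X) stack by `factor` in Y and X.

    Assumption: in-plane binning only; trailing rows/columns that do not
    fill a whole bin are cropped.
    """
    z, y, x = stack.shape
    y, x = y - y % factor, x - x % factor
    s = stack[:, :y, :x]
    return s.reshape(z, y // factor, factor, x // factor, factor).mean(axis=(2, 4))


def evenly_spaced_slices(n_slices, n_test=7):
    """Indices of n_test slices spread over the whole stack (illustrative)."""
    return np.linspace(0, n_slices - 1, n_test).round().astype(int)


stack = np.arange(2 * 8 * 8, dtype=float).reshape(2, 8, 8)
binned = bin_xy(stack, 4)
test_idx = evenly_spaced_slices(3598)
```

Binning by 4 in-plane reduces the pixel count, and hence the feature storage, by a factor of 16 per slice.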
Comparing model performance
FeatureForest drastically improves segmentation quality compared to classical random forest-based approaches such as Labkit [3], in particular on complex datasets such as the low-contrast electron microscopy dataset shown in Fig. 2c. In this section, we assess the performance of our FeatureForest approach when different feature-generating networks are used. Before doing so, however, it is important to recall that the motivation for using our method is twofold: (i) the ease of use, even for users without any computational background or experience, and (ii) the iterative workflow described in Fig. 1. In this workflow, a few scribbles by a user can already lead to initial results, and any further labeling of pixels is guided by the mistakes of the current FeatureForest model. Other methods, such as training a U-Net [27], do not share these advantages. They typically require some experience with setting up a deep learning training pipeline, as well as so-called dense labels for every single pixel in the entire training set. As a comparison, we trained a U-Net [28] on the Spheroid dataset using 8 full slices of the available dense ground-truth labels. We observed that FeatureForest yields performance comparable to the trained U-Net (see Table 2 and Supplementary Figures 7 and 8). While FeatureForest with post-processing even outperforms the U-Net, the important insight is how easily users can achieve results as good as those of neural networks specifically trained on dense, perfect-quality training data.
So far, all the results were obtained using SAM2 (the SAM2_Large model), the most powerful model currently available in FeatureForest. FeatureForest also ships with MobileSAM [22] and DINOv2 [19] for extracting feature vectors. Other proposed methods similar to FeatureForest [29,30] use a pretrained VGG16 [31] to generate features. Since our experiments did not show convincing results with VGG16 features, we do not offer this relatively outdated model in our own implementation. In Table 3, we report all the aforementioned metrics on the two electron microscopy datasets of Fig. 2 for various feature-generating models (see Supplementary Figure 9 for the distributions). On the Fly brain dataset (see Table 3 and Supplementary Figure 10), DINOv2 outperforms the other approaches, with SAM2 a very close contender. MobileSAM, the smallest model available in FeatureForest, performs worse than the larger networks, but fares better than VGG16. Visually, VGG16 over-segmented the image, yielding large false positive areas (Supplementary Figure 10). This leads to results even worse than those of the random forest classifier. On the more complex Spheroid dataset (see Table 3 and Supplementary Figure 11), SAM2 is the best model, with all others displaying similar, but inferior, performance. Overall, SAM2 provides reliable segmentation results, while the other models show higher sample-dependent variability: they can perform well on one dataset and poorly on others.
As described previously, FeatureForest post-processing improved the results obtained with SAM2_Large in Fig. 2. We observed improvements for every model available in FeatureForest (see Supplementary Table 1 and Supplementary Figure 12), confirming the utility of this feature.
To further test the robustness of FeatureForest against low contrast or low signal-to-noise ratio, we corrupted the Spheroid dataset and trained FeatureForest (SAM2_Large), with and without post-processing, on the degraded images. Lowering the contrast compresses the dynamic range of the pixel values while maintaining the integrity of the structures (see Supplementary Figure 13). FeatureForest proved resilient to decreasing contrast. Its performance degraded across all metrics only at the lowest contrast level, at which point the structures were barely visible any longer. We then generated low signal-to-noise ratio images using two different approaches (see Methods for descriptions): additive Gaussian noise (see Supplementary Figure 14) and rescaled Poisson noise (see Supplementary Figure 15). Unlike lowering the contrast, noise distorts the boundaries of objects in the image, complicating the segmentation task. In both cases, the performance across the various metrics decreases with the amount of noise (Supplementary Figures 14 and 15). To allow for comparison, we estimated the signal-to-noise ratio (SNR) of the images degraded by both noise processes. FeatureForest segmentation quality, as measured by the Dice score, is equally sensitive to both degradations (see Supplementary Figure 16).
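The three degradations can be sketched as follows. The exact procedures used for the experiments are given in Methods and are not reproduced here. The functions below are our own assumptions:

- contrast is compressed linearly around the image mean;
- Gaussian noise is added with a fixed standard deviation;
- Poisson noise is drawn after mapping the normalized image to an expected photon count, then rescaled back;
- SNR is computed in dB as the mean squared clean signal over the mean squared noise, one of several common definitions.

```python
import numpy as np


def lower_contrast(img, factor):
    """Compress the dynamic range around the mean (0 < factor <= 1)."""
    mean = img.mean()
    return mean + factor * (img - mean)


def add_gaussian_noise(img, sigma, rng):
    """Additive zero-mean Gaussian noise with standard deviation sigma."""
    return img + rng.normal(0.0, sigma, img.shape)


def rescaled_poisson(img, photons, rng):
    """Treat the min-max normalized image as expected photon counts, then rescale back."""
    lo, hi = img.min(), img.max()
    counts = rng.poisson((img - lo) / (hi - lo) * photons)
    return counts / photons * (hi - lo) + lo


def snr_db(clean, noisy):
    """SNR in dB: mean squared signal over mean squared deviation from the clean image."""
    noise = noisy - clean
    return 10 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2))


rng = np.random.default_rng(0)
img = np.full((64, 64), 1.0)
img[16:48, 16:48] = 2.0
low = lower_contrast(img, 0.5)
```

Fewer photons or a larger `sigma` lowers the SNR. Contrast compression leaves the structures intact but halves the standard deviation of the pixel values.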
Training and prediction timing
The time required for each step of the FeatureForest pipeline varies, depending largely on the size of the training images, the chosen model and the computer system. In particular, extracting the feature vectors is a lengthy operation that scales with the size of the dataset. We measured the duration of feature vector extraction and writing to the HDF5 storage for a single slice of various sizes, for each model, on different operating systems, GPUs and CPUs (see Supplementary Figure 17 and Supplementary Table 2). The extraction time is roughly linear in the input size. DINOv2 proved difficult to run on Windows due to missing optimization libraries. On every system we tested, the lightweight MobileSAM extracted features faster than the other models, taking from 0.67 s to 1.77 s for a 256×256 slice on GPU. On CPU (1.61 s and 2.83 s), it was even faster than the other models running on GPU. The extraction time of SAM2_Large seemed to depend on the available GPU memory, going from 21.91 s on Windows with a 6 GB GPU to 4.61 s on Linux with a 32 GB GPU for a 256×256 slice. Finally, DINOv2 performed well only on Linux systems. Note that the numbers reported here are indicative and will depend on a variety of factors, including the operating system, NVIDIA drivers, specific GPU model and installed library versions.
FeatureForest training consists of iteratively labeling the data, training the random forest and predicting on the sample. To assess the durations of these steps, we tracked the number of random forest trainings as a proxy for the number of iterations. We also estimated the total training time by measuring the interval between the first and last labeled pixels (see Table 4), while training on the datasets of Fig. 2a (Fly brain) and 2c (Spheroid). These measurements are imperfect, since they include a number of iterations and predictions that are not reproducible across datasets and models, or between users. In addition, a model may stop showing signs of further improvement after fewer iterations than a better performing model that requires more iterations to reach satisfying results. They nonetheless indicate the amount of time necessary to train FeatureForest on these datasets. In addition, we report the average training step and slice prediction durations (see Table 4). Since training the random forest scales with the number of labeled pixels, the training time increases throughout the iterations, as users tend to add labels rather than delete them. Single-slice prediction duration, on the other hand, only depends on the number of extracted features and the model implementation, and is therefore stable throughout training. The Fly brain dataset required about 30 min of training for each model. DINOv2 converged rapidly (33 iterations) compared to the other models, but also took the longest total time to train (43 min). DINOv2 training was slower, despite consisting of fewer iterations, because of its slower prediction time on this particular dataset. SAM2, although a larger model than MobileSAM, trained in 20 minutes thanks to rapid convergence towards a high quality segmentation. The Spheroid dataset is more complex. It required more iterations for DINOv2 and SAM2, and longer training times for all models. In part, this is due to a much longer slice prediction time caused by the larger image dimensions: each slice of the Spheroid dataset is 1024×512, compared to 256×256 for the Fly brain. In addition, the dataset complexity also led to a higher number of labeled pixels, yielding longer average training steps. Note that the single-slice prediction durations in Table 4 are shorter than the extraction durations reported in Supplementary Table 2 for the same image size, as the prediction step does not include writing the features to the feature vector storage.
Finally, once the model is trained, the total prediction time depends on the system hardware, operating system, and installed libraries. For the two datasets examined in this section, the whole-stack prediction durations are estimated from Table 4 and reported in Table 5 for our Linux test system (16 GB GPU). For the Fly brain, both MobileSAM and SAM2 are reasonably quick, predicting on the whole stack within 6 minutes. DINOv2, on the other hand, takes a little under 20 minutes. The larger Spheroid stack (500×1024×512) leads to time scales on the order of a few hours, with both MobileSAM and DINOv2 predicting within 4 hours. Unlike on the other dataset, SAM2 was here the slowest model for prediction (4 hours and 10 minutes).
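Assuming that the whole-stack estimates scale linearly with the number of slices (our reading of "estimated from Table 4", which is not reproduced here), the reported SAM2 total for the 500-slice Spheroid stack corresponds to about 30 s per 1024×512 slice. The helpers below only illustrate this arithmetic. They are hypothetical and measure nothing.

```python
def per_slice_from_total(hours, minutes, n_slices):
    """Average seconds per slice implied by a whole-stack duration."""
    return (hours * 3600 + minutes * 60) / n_slices


def stack_prediction_time(seconds_per_slice, n_slices):
    """Linear extrapolation of whole-stack prediction time from one slice."""
    return seconds_per_slice * n_slices


def format_hms(seconds):
    """Format a duration in seconds as 'H h MM min SS s'."""
    h, rem = divmod(int(round(seconds)), 3600)
    m, s = divmod(rem, 60)
    return f"{h} h {m:02d} min {s:02d} s"


sam2_per_slice = per_slice_from_total(4, 10, 500)  # SAM2 on the Spheroid stack (from the text)
total = stack_prediction_time(sam2_per_slice, 500)
```

The same extrapolation gives a quick budget for larger stacks. Doubling the number of slices doubles the estimated prediction time.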
Computational cost and minimum hardware requirements
The use of large foundation models in FeatureForest constrains the computational hardware required to process images efficiently. In particular, as with most deep-learning based tools, GPU acceleration is strongly recommended. The available GPU memory restricts which models can be used, as shown in Table 6. There, we estimated a loose lower bound on the GPU memory needed to run each model successfully on a 512×512 image. The GPU memory footprint will increase with larger images and with the number of slices in a stack. On small GPUs (< 4 GB), only MobileSAM can run. Larger GPUs (≥ 6 GB) should be able to run SAM2_Base or SAM2_Large. For instance, our Windows test laptop with a 6 GB GPU successfully extracted features of large images with SAM2_Large (see Supplementary Figure 17). These figures are only indicative, as they depend heavily on the operating system, the GPU model, the specific driver, and the installed library versions.
Another constraint resulting from the FeatureForest training process is the size of the feature vector storage. In Table 6, we show the disk footprint of the feature vector HDF5 storage for the different FeatureForest models. The size of the feature vectors depends on the chosen model, with a total storage space ranging from about 200 MB (DINOv2) to 800 MB (SAM2) for a 512×512 image. The total storage size scales linearly with the input shape: a 256×512×512 image stack will require 50 GB of disk space for MobileSAM and DINOv2, and 100 GB for SAM2.
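Before extracting features for a new stack, it can help to estimate the disk space needed. The sketch below offers two hypothetical helpers. The first scales a measured per-slice footprint linearly to a whole stack, which reproduces the roughly 50 GB figure for a 256-slice stack at about 200 MB per slice. The second computes a raw, uncompressed size from a feature count and data type. The 320 float16 features used there are an arbitrary illustration, not the storage format of any FeatureForest model.

```python
def scale_storage(mb_per_slice, n_slices):
    """Linear scaling of a measured per-slice footprint to a whole stack, in GB."""
    return mb_per_slice * n_slices / 1000


def feature_store_bytes(n_slices, height, width, n_features, bytes_per_value):
    """Raw (uncompressed) size of a per-pixel feature store, in bytes."""
    return n_slices * height * width * n_features * bytes_per_value


stack_gb = scale_storage(200, 256)              # ~200 MB per 512x512 slice (DINOv2, from the text)
raw = feature_store_bytes(1, 512, 512, 320, 2)  # hypothetical: 320 features stored as float16
```

The raw estimate gives a model-independent upper bound when no measured footprint is available. Actual files may differ because of chunking, compression and stored data types.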
Finally, training on and processing images with FeatureForest is barely influenced by the amount of random-access memory (RAM) or the number of CPU cores. FeatureForest only loads into RAM the images used for training or prediction, and does not use intensive multi-threading. In addition, most computers with a dedicated GPU come with enough RAM and CPU cores to run standard image processing tasks, making them compatible with FeatureForest.
Discussion
In this manuscript, we introduced FeatureForest, an approach leveraging existing foundation models to generate high-quality, pixel-wise feature representations that are then used to train a random forest for pixel classification. Through our napari plugin, FeatureForest provides a simple, intuitive and straightforward segmentation pipeline that combines the power of large deep learning image segmentation models with the ease of use of random forests. Crucially, these models can be applied even by researchers with no knowledge of deep learning. FeatureForest fills a gap in the landscape of segmentation tools, in particular for large and complex datasets such as electron microscopy volumes, for which the annotation effort required to assemble ground-truth for deep learning is considerable. We provide several foundation models for feature generation, including SAM2, the current state-of-the-art large foundation model for segmentation, and users can add their own model adapter to FeatureForest. Moreover, we designed post-processing steps that further improve the results by using the FeatureForest segmentation output to directly generate SAM2 predictions.
We benchmarked FeatureForest on multiple publicly available datasets that were published with ground-truth (or for which we could generate our own ground-truth), including FIB-SEM, H&E stainings and label-free brightfield images, for both single and multi-class segmentation. We showed that FeatureForest not only significantly improves segmentation performance on these datasets compared to a classical random forest pixel classifier, but also produces high quality segmentation of complex datasets for which the results of the random forest pixel classifier are unusable.
Here, we investigated FeatureForest with different models: MobileSAM, DINOv2, and SAM2_Large. SAM2_Large was the best performing model overall. Its metrics were surpassed only by DINOv2, by a small margin, on a single dataset (Fly brain, Table 3), and DINOv2 underperformed on the other datasets. We therefore recommend using SAM2 models whenever possible, as they deliver the best and most consistent segmentation quality. When the available GPU is limiting (see the GPU memory requirements in Table 6), we suggest that users first try FeatureForest with SAM2_Base, followed by MobileSAM.
FeatureForest post-processing consistently improved the results, leading to smoother masks and more complete objects. In certain cases, post-processing with bounding box generation can lead to over-segmentation when object instances are difficult to separate, and in rare cases even to a mask covering the entire image. In such cases, users might need to post-process these images separately with different parameters (e.g. a smaller or larger number of smoothing steps).
We also showed that FeatureForest produced results of comparable quality to a well-trained deep-learning network (Table 2), while not requiring dense ground-truth labels and while offering the intuitive iterative learning approach presented in Fig. 1. We believe that it is precisely this iterative labeling workflow that makes FeatureForest such a practical tool for life science users. With such a workflow, users no longer need to densely label an a priori unknown amount of data. Instead, they are iteratively guided to locations where FeatureForest predictions are wrong, can relabel some pixels in these areas, and naturally stop this process when the results are of sufficient quality for further downstream analysis.
Our method inherits some inconveniences that are inherent to the large deep-learning models we use to extract feature vectors. Firstly, SAM2, MobileSAM and DINOv2 are trained on natural images (e.g. scenes of everyday life, often RGB images), and the feature vectors they produce might not be optimized for separating the biological objects of interest. To address this, fine-tuning these models on microscopy images is an exciting possibility that the community is now starting to explore¹³⁻¹⁷. We also provide a preliminary integration of models from µSAM¹⁶. These are domain-specific models that fine-tune SAM on light and electron microscopy data, and they may lead to further improved semantic segmentation results in these domains.
A further limitation concerns the storage and generation of feature vectors. To be time-efficient, feature extraction should preferably be performed on a graphics processing unit (GPU). Without access to a GPU, users should expect feature extraction, as well as segmentation of full stacks for which features were not pre-exported, to take from minutes to hours depending on the stack size. Because this is the most time-consuming step, we separated the feature extraction and training steps in our napari plugins. Once FeatureForest is trained, the features are computed on the fly while segmenting an entire dataset. We therefore advise users to train on a representative sub-stack of the image, in order to minimize the disk footprint and generation time of the feature vectors, and to segment the larger stack once they are satisfied with the results on the training stack. In addition, the more complex models have a larger memory footprint, as they consist of a much larger number of parameters. Future updates may include further optimization of memory usage, such as pruning non-essential features from the feature vectors, and improved GPU usage.
During the work on FeatureForest, similar approaches have been co-developed, highlighting the usefulness of the method²⁹,³⁰,³²,³³. Compared to these variants, we use state-of-the-art foundation models to generate the feature vectors (e.g. MobileSAM, SAM2), rather than simpler and older networks such as VGG16³¹ or networks trained on very specific datasets.
Compared to SAM2, VGG16 has the advantage of being lightweight, and is therefore able to run on most machines. In our experiments, however, VGG16 did not provide substantial improvements over classical random forests and was, as expected, vastly inferior to SAM2 features.
In the future, we will continue to optimize FeatureForest to further improve the user experience, in particular with respect to speed and memory efficiency, and we will add more models for feature extraction and post-processing. The source code of our napari plugins is freely and openly available on GitHub³⁴, and can be installed through PyPI. We also provide documentation on how to use FeatureForest, as well as scripts, notebook examples, and a command-line interface for running FeatureForest outside napari (e.g. on high-performance computing (HPC) systems). We believe that FeatureForest constitutes a much-needed tool for the many studies that deal with complex images.
Methods
FeatureForest napari plugins
FeatureForest is a Python software package that consists of convenience functions and a napari plugin. All code and documentation are accessible on GitHub (juglab/featureforest). The FeatureForest napari plugin contains two different widgets: the Feature Extraction widget and the Segmentation widget. The first widget extracts feature vectors for each pixel in a selected napari layer and stores them in an HDF5 container to allow random access. The second widget allows training the random forest classifier using the previously exported feature vectors, as well as performing post-processing and segmentation of the entire dataset.
Models
The embeddings of deep-learning networks often have a smaller spatial dimension than the input images, while having many more channels (the features). To obtain features for every image pixel, we split the images into overlapping patches. The constraints on the patch size and on the overlaps are model-dependent and are described below. Next, we upscale the patches to fit the model input size using bicubic interpolation (Resize from the torchvision.transforms.v2 module). Since the models require RGB input, the single-channel patches are duplicated and concatenated into 3-channel patches. We apply the model to the patches and save the resulting embeddings. Typically, these embeddings are the output of the encoder part of the model; see the model-specific descriptions that follow for details. These embeddings have smaller spatial dimensions than the original patches, and we therefore upscale them again using bicubic interpolation, as described earlier. Using small input patches reduces the upscaling factor applied to the embeddings, leading to more distinctive features between neighboring pixels. Finally, the embedding patches, now of the same size as the input patches, are cropped to their non-overlapping regions and re-assembled into a feature map with the same spatial extent as the original image and N channels corresponding to the features, N being dependent on the specific model used (see below).
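The patch-and-stitch logic described above can be sketched in NumPy as follows. This is a minimal illustration, not FeatureForest's implementation. `feature_fn` is a hypothetical placeholder for the model-specific step (upscaling the patch to the model input size, replicating it to three channels, running the encoder, and upscaling the embeddings back to the patch size). The border handling (edge replication, and keeping a centred `stride × stride` window of each patch) is an assumption of this sketch.

```python
import numpy as np

def extract_features_tiled(image, feature_fn, patch_size, overlap):
    """Apply `feature_fn` to overlapping patches of a 2D image and stitch the
    central, non-overlapping part of each result into a (C, H, W) feature map.

    `feature_fn` maps a (patch_size, patch_size) patch to a
    (C, patch_size, patch_size) array of per-pixel features.
    """
    if overlap % 2 or not 0 <= overlap < patch_size:
        raise ValueError("overlap must be even and smaller than patch_size")
    h, w = image.shape
    stride = patch_size - overlap          # new pixels contributed by each patch
    margin = overlap // 2                  # discarded border on each side of a patch
    n_y, n_x = -(-h // stride), -(-w // stride)   # ceiling division
    # Pad so that every kept central window lies inside a full patch
    # (edge replication is an assumption of this sketch).
    padded = np.pad(image, ((margin, margin + n_y * stride - h),
                            (margin, margin + n_x * stride - w)), mode="edge")
    out = None
    for i in range(n_y):
        for j in range(n_x):
            y0, x0 = i * stride, j * stride
            feats = feature_fn(padded[y0:y0 + patch_size, x0:x0 + patch_size])
            if out is None:
                out = np.empty((feats.shape[0], n_y * stride, n_x * stride), feats.dtype)
            # Keep only the centre of the patch, dropping `margin` pixels per side.
            out[:, y0:y0 + stride, x0:x0 + stride] = feats[:, margin:margin + stride,
                                                           margin:margin + stride]
    return out[:, :h, :w]

img = np.random.default_rng(0).random((37, 53))
# A trivial "model" returning the intensity and its square as two features.
fmap = extract_features_tiled(img, lambda p: np.stack([p, p ** 2]), patch_size=16, overlap=8)
print(fmap.shape)  # (2, 37, 53)
```

With a pixel-wise toy model, the stitched map reproduces the input exactly, which is a convenient sanity check for the tiling arithmetic.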
FeatureForest includes the following models, which were used in this manuscript: SAM2_Large²³, SAM2_Base²³, MobileSAM²², and DINOv2¹⁹. All models are implemented by extending the BaseModelAdapter class, which allows setting a patch size compatible with the specific model, as well as extracting feature vectors for each pixel provided to the model. Each model has its own implementation, as the models have different input requirements and architectures.
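To illustrate the adapter idea, here is a hypothetical interface in the same spirit. The class and member names (`ToyModelAdapter`, `get_features`, `n_features`) are our own and do not reproduce FeatureForest's `BaseModelAdapter` API. The toy subclass replaces a foundation model with intensity and finite-difference gradients, so the example runs without model weights.

```python
from abc import ABC, abstractmethod

import numpy as np


class ToyModelAdapter(ABC):
    """Hypothetical adapter interface: a model declares its patching
    constraints and returns per-pixel feature vectors for one patch."""

    def __init__(self, patch_size: int, overlap: int):
        if overlap % 2 or not 0 <= overlap < patch_size:
            raise ValueError("overlap must be even and smaller than patch_size")
        self.patch_size = patch_size
        self.overlap = overlap

    @property
    @abstractmethod
    def n_features(self) -> int:
        """Number of feature channels N produced per pixel."""

    @abstractmethod
    def get_features(self, patch: np.ndarray) -> np.ndarray:
        """Return an (n_features, H, W) array for a single (H, W) patch."""


class IntensityGradientAdapter(ToyModelAdapter):
    """Toy 'model': raw intensity plus its vertical and horizontal gradients."""

    n_features = 3

    def get_features(self, patch: np.ndarray) -> np.ndarray:
        patch = patch.astype(float)
        gy, gx = np.gradient(patch)   # derivatives along rows, then columns
        return np.stack([patch, gy, gx])


adapter = IntensityGradientAdapter(patch_size=16, overlap=8)
features = adapter.get_features(np.ones((16, 16)))
print(features.shape)  # (3, 16, 16)
```

Keeping the model-specific code behind one small interface is what lets new models be added without touching the patching and training code.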
More specifically, SAM2_Large uses "sam2.1_hiera_large.pt" as model, while SAM2_Base corresponds to the lighter "sam2.1_hiera_base_plus.pt" (see facebookresearch/sam2 on GitHub). We chose a maximum patch size of 512 and a minimum number of patches per dimension of 2. If images are smaller than half the patch size, we halve the patch size iteratively until the patch size meets the constraint of at least 2 patches per dimension. The overlap is chosen as half of the patch size. Patches are scaled to 1024×1024, the SAM2 input dimensions. The SAM2 encoder includes a Feature Pyramid Network (FPN, backbone_fpn) that outputs embeddings at three distinct resolution levels (64×64, 256×256, and 128×128). We independently upscale these embeddings to the patch size and concatenate them, leading to 768 features per pixel.
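The upscale-and-concatenate step can be sketched as below. Two assumptions apply. First, each FPN level is taken to have 256 channels, which is what 3 × 256 = 768 implies. Second, SciPy's cubic-spline `zoom` stands in for torchvision's bicubic `Resize`, so values will differ slightly from FeatureForest's. The same function also fits the MobileSAM case described next (256 + 64 = 320 channels). The usage example uses toy sizes to keep memory low.

```python
import numpy as np
from scipy.ndimage import zoom

def upsample_and_concat(levels, out_size):
    """Upscale each (C_i, h_i, w_i) embedding to (C_i, out_size, out_size)
    and stack them along the channel axis."""
    upsampled = []
    for emb in levels:
        _, h, w = emb.shape
        # Factor 1 on the channel axis; cubic-spline interpolation in y and x.
        upsampled.append(zoom(emb, (1, out_size / h, out_size / w), order=3, mode="nearest"))
    return np.concatenate(upsampled, axis=0)

rng = np.random.default_rng(0)
# Toy stand-ins for three pyramid levels (real SAM2 levels would have 256 channels each).
levels = [rng.random((4, 4, 4)), rng.random((4, 16, 16)), rng.random((4, 8, 8))]
print(upsample_and_concat(levels, out_size=32).shape)  # (12, 32, 32)
```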
The MobileSAM model uses a modified version of the TinyViT architecture that gives access to the internal embeddings computed by the encoder. We use "mobile_sam.pt" (see ChaoningZhang/MobileSAM on GitHub) as weights for our modified vision transformer architecture. We use the same patch and overlap constraints as for SAM2. The MobileSAM encoder outputs 256 features. It also computes 64 patch-embedding features³⁵ (PatchEmbed class), which are returned by our custom implementation of the encoder. We concatenate these embeddings to obtain a total of 320 features per pixel.
Finally, for DINOv2 we use "dinov2_vits14_reg" from PyTorch Hub. DINOv2 requires input patches with a size divisible by 14. To obtain per-pixel output, we create patches of fixed size 70×70 with overlaps of 28×28. Each pixel has 384 output features, which correspond directly to the output of the model.
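The DINOv2 patch geometry can be checked with a little arithmetic. With 14-pixel tokens, a 70-pixel patch spans 5 tokens. A 28-pixel overlap leaves a 42-pixel stride, and discarding 14 pixels (one token) on each side keeps a centre three tokens wide. The helper below is a sketch. Its patch count assumes a centre-cropping scheme in which each patch contributes `stride` new pixels, which the text does not spell out.

```python
import math

DINO_TOKEN = 14  # pixels per ViT-S/14 token

def dinov2_tiling(dim, patch=70, overlap=28):
    """Return (tokens per patch side, stride, number of patches) for one image axis."""
    if patch % DINO_TOKEN:
        raise ValueError("DINOv2 patch size must be divisible by 14")
    stride = patch - overlap                 # 42 new pixels per patch
    tokens_per_side = patch // DINO_TOKEN    # 5 tokens across a 70-pixel patch
    n_patches = math.ceil(dim / stride)      # assumes each patch contributes `stride` pixels
    return tokens_per_side, stride, n_patches

print(dinov2_tiling(256))  # (5, 42, 7)
```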
For each experiment, FeatureForest was run from commit 4aef995 of the codebase, which is available on GitHub (juglab/featureforest). Unless otherwise indicated, training and post-processing were carried out with default parameters. All training, analysis, and plotting were performed in Python using open-source libraries, within the GPU conda environment provided in the source code repository. Unless stated otherwise, all training and predictions were performed on a Linux virtual machine (RedHat) with access to an NVIDIA A40-16Q (16 GB) GPU, using the SAM2 model.
FeatureForest random forest training
FeatureForest trains a random forest classifier using the feature vectors extracted from one of its adapted models. For each labeled pixel in the napari labels layer, the corresponding feature vector is extracted and fed, together with the label number, to the random forest classifier³⁶. By default, we use 450 trees with a maximum depth of 9. The trained classifier can then be used to predict the label class of each pixel in the image or slice currently displayed in napari, or to predict on the whole stack.
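The training step above maps naturally onto scikit-learn. The sketch assumes scikit-learn's `RandomForestClassifier` (we do not confirm that reference 36 is scikit-learn), a `(C, H, W)` feature array, and a napari-style label image in which 0 means unlabeled. The function names are hypothetical. Only the defaults (450 trees, maximum depth 9) come from the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_pixel_forest(features, labels, n_estimators=450, max_depth=9, seed=0):
    """Fit a random forest on the feature vectors of labeled pixels only.
    features: (C, H, W) array; labels: (H, W) ints, 0 = unlabeled."""
    labeled = labels > 0
    X = features[:, labeled].T          # (n_labeled_pixels, C)
    y = labels[labeled]
    rf = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth,
                                random_state=seed)
    return rf.fit(X, y)

def predict_pixels(rf, features):
    """Predict a class for every pixel of a (C, H, W) feature map."""
    c, h, w = features.shape
    return rf.predict(features.reshape(c, -1).T).reshape(h, w)

rng = np.random.default_rng(0)
truth = np.zeros((20, 20))
truth[:, 10:] = 1.0                                   # left half vs right half
feats = truth[None] + 0.05 * rng.standard_normal((3, 20, 20))
scribbles = np.zeros((20, 20), dtype=int)
scribbles[5, :5] = 1                                  # short stroke for class 1
scribbles[5, 15:] = 2                                 # short stroke for class 2
pred = predict_pixels(train_pixel_forest(feats, scribbles), feats)
```

Only the few scribbled pixels are used for fitting, yet the whole image is classified, which mirrors the sparse-labeling workflow.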
SAM2 post-processing
As part of the FeatureForest pipeline, we provide several post-processing options that leverage the large deep learning network used for feature generation. In all cases, the first step employs mean curvature smoothing, an iterative edge-preserving smoothing method that fills small holes, and filters out small connected components. Users can change the number of smoothing iterations and the threshold used to filter out connected components by area (absolute or relative). By default, we use 25 smoothing iterations and an absolute threshold of 50 pixels.
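The text does not detail the smoothing implementation, so the sketch below swaps in a different but related technique: Merriman-Bence-Osher (MBO) threshold dynamics, which repeatedly blurs the mask with a Gaussian and re-thresholds it at 0.5. This is a classical approximation of mean curvature flow for binary masks, and it likewise closes small holes. Component filtering uses an absolute area threshold, matching the defaults quoted above (25 iterations, 50 pixels). The Gaussian width `sigma` is an assumed parameter.

```python
import numpy as np
from scipy import ndimage as ndi

def smooth_mask_mbo(mask, n_iter=25, sigma=1.0):
    """MBO threshold dynamics: blur the binary mask and re-threshold at 0.5.
    Repeating this approximates smoothing by mean curvature flow."""
    m = mask.astype(float)
    for _ in range(n_iter):
        m = (ndi.gaussian_filter(m, sigma) > 0.5).astype(float)
    return m.astype(bool)

def remove_small_components(mask, min_area=50):
    """Drop connected components with fewer than `min_area` pixels."""
    lab, n = ndi.label(mask)
    if n == 0:
        return np.zeros_like(mask, dtype=bool)
    areas = np.bincount(lab.ravel())
    keep = areas >= min_area
    keep[0] = False                     # label 0 is the background
    return keep[lab]

def postprocess(mask, n_iter=25, min_area=50):
    """Smooth, then remove small components (defaults from the text)."""
    return remove_small_components(smooth_mask_mbo(mask, n_iter), min_area)

mask = np.zeros((64, 64), bool)
mask[20:40, 20:40] = True               # a large object...
mask[30, 30] = False                    # ...with a one-pixel hole
mask[5:8, 5:8] = True                   # plus a small spurious blob
cleaned = postprocess(mask)
```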
Subsequently, users can apply either of two additional steps: SAM2ImagePredictor and SAM2AutomaticMaskGenerator. In the former, we use a watershed algorithm to separate the mask into instances. Bounding boxes are then generated around each instance and used as prompts for SAM2. The output instances are merged into a single mask and added to napari as a layer.
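A minimal sketch of the box-prompt preparation follows, assuming scikit-image. The mask is split into instances by a distance-transform watershed seeded from local maxima, which is one common way to run it. Each instance's bounding box is then converted to (x_min, y_min, x_max, y_max) order, the XYXY convention used for SAM2 box prompts. The SAM2 call itself is omitted, and `min_distance` is an assumed parameter.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.measure import regionprops
from skimage.segmentation import watershed

def mask_to_box_prompts(mask, min_distance=5):
    """Split a binary mask into instances and return (instances, boxes),
    with boxes as (x_min, y_min, x_max, y_max), max coordinates exclusive."""
    dist = ndi.distance_transform_edt(mask)
    components, _ = ndi.label(mask)
    peaks = peak_local_max(dist, min_distance=min_distance, labels=components)
    markers = np.zeros(mask.shape, dtype=int)
    markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)   # one seed per peak
    instances = watershed(-dist, markers, mask=mask)
    boxes = []
    for region in regionprops(instances):
        min_row, min_col, max_row, max_col = region.bbox
        boxes.append((min_col, min_row, max_col, max_row))
    return instances, boxes

yy, xx = np.mgrid[:40, :60]
two_touching_disks = (((yy - 20) ** 2 + (xx - 20) ** 2 <= 100)
                      | ((yy - 20) ** 2 + (xx - 36) ** 2 <= 100))
instances, boxes = mask_to_box_prompts(two_touching_disks)
print(len(boxes))  # 2
```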
SAM2AutomaticMaskGenerator generates an evenly spaced grid of points as prompts for SAM2, which outputs a large number of masks. We retain only the masks whose intersection over union (IoU) with the closest connected component of the random forest segmentation is larger than a user-set threshold (0.35 by default).
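The mask-filtering rule can be sketched as follows. We read "closest connected component" as the random forest component that overlaps the candidate mask with the highest IoU. That interpretation, the function name, and the use of plain boolean arrays for the generated masks are assumptions of this sketch. Only the default threshold of 0.35 comes from the text.

```python
import numpy as np
from scipy import ndimage as ndi

def filter_masks_by_iou(candidate_masks, rf_mask, threshold=0.35):
    """Keep candidate masks whose best IoU with a connected component of the
    random forest segmentation exceeds `threshold`."""
    components, _ = ndi.label(rf_mask)
    kept = []
    for cand in candidate_masks:
        best_iou = 0.0
        for lab in np.unique(components[cand]):
            if lab == 0:
                continue                      # skip the background label
            comp = components == lab
            inter = np.logical_and(cand, comp).sum()
            union = np.logical_or(cand, comp).sum()
            best_iou = max(best_iou, inter / union)
        if best_iou > threshold:
            kept.append(cand)
    return kept

rf = np.zeros((40, 40), bool)
rf[10:30, 10:30] = True
shifted = np.zeros_like(rf)
shifted[12:32, 12:32] = True                  # IoU = 324 / 476, about 0.68
tiny = np.zeros_like(rf)
tiny[15:18, 15:18] = True                     # IoU = 9 / 400, about 0.02
print(len(filter_masks_by_iou([shifted, tiny], rf)))  # 1
```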
Datasets
The Fly brain dataset from Fig. 2a and Supplementary Figure 1 is available as part of the EMPIAR-10982 dataset, and consists of a stack of size 256×255×255 with an isotropic pixel size of 12 nm. We used every 16th frame, starting from the first one, as the training set, while prediction was performed on the whole dataset. The figures show only images that were not used for training and that are as far as possible from neighboring training slices.
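Selecting every n-th slice for training, and then picking display slices that are as far as possible from any training slice, can be sketched as below. The helper names are hypothetical. With the Spheroid settings (every 30th of 500 slices), this gives 17 training slices, which matches the 17-slice training stack listed in Table 4.

```python
import numpy as np

def training_slices(n_slices, step, start=0):
    """Indices of every `step`-th slice, starting at `start`."""
    return np.arange(start, n_slices, step)

def distance_to_training(n_slices, train_idx):
    """For each slice, the distance (in slices) to the nearest training slice."""
    idx = np.arange(n_slices)
    return np.abs(idx[:, None] - train_idx[None, :]).min(axis=1)

train = training_slices(500, 30)            # Spheroid: every 30th slice
dist = distance_to_training(500, train)
print(len(train), dist[15])                 # 17 15
```

Slices with the largest `dist` values are the best candidates for showing generalization away from the labeled planes.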
The human breast cancer spheroid stack (Fig. 2c and Supplementary Figure 3) was extracted from EMPIAR-11380 (sample F059_bin2)³⁷. The stack has dimensions 1446×1683×1928 and an isotropic pixel size of 20 nm. We cropped the data to a size of 500×512×1024, starting from the top-left coordinate (390, 800, 150). We report slice numbers from the original dataset rather than from our cropped stack. Training was performed using every 30th frame, starting from the first, while prediction was performed on the whole dataset. The figures show only images that were not used for training, specifically slices that are as far as possible in z from the training slices. For the noise and contrast experiments, we made a test substack of 10 slices (405, 455, 505, 555, 605, 655, 705, 755, 805, and 855, indexed in the original stack).
The human kidney tissue example (Fig. 2e and Supplementary Figure 4) is part of a dataset that was compiled from the Human Biomolecular Atlas Program (HuBMAP) and publicly released as part of a Kaggle challenge (www.kaggle.com/c/hubmap-kidney-segmentation/data). Specifically, we selected sample 1e2425f28 and used its fourth series (resolution 4027×3347). We cropped it to 1024×3072 (top-left coordinates (486, 1532)) before tiling it into a set of 512×512 images (N=12). The masks were provided as instances in a JSON file and were converted into a binary image, which was then cropped and tiled to match the raw image. We trained FeatureForest on the first four frames and predicted on the whole tile stack.
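Cutting the 1024×3072 crop into 512×512 tiles gives 2 × 6 = 12 tiles, i.e. N=12. A NumPy sketch using reshape follows. The row-major tile order and the support for a trailing RGB axis are choices of this sketch, not necessarily those of the original preprocessing.

```python
import numpy as np

def tile_image(img, tile=512):
    """Split an (H, W) or (H, W, C) image into non-overlapping tile×tile tiles,
    returned in row-major order with shape (n_tiles, tile, tile, ...)."""
    h, w = img.shape[:2]
    if h % tile or w % tile:
        raise ValueError("image size must be a multiple of the tile size")
    rest = img.shape[2:]
    return (img.reshape(h // tile, tile, w // tile, tile, *rest)
               .swapaxes(1, 2)                  # group the two tile-index axes first
               .reshape(-1, tile, tile, *rest))

crop = np.zeros((1024, 3072, 3), dtype=np.uint8)   # placeholder RGB crop
print(tile_image(crop).shape)  # (12, 512, 512, 3)
```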
The mouse embryo dataset (Fig. 2g) is publicly available in the Broad Bioimage Benchmark Collection under accession number BBBC003. It consists of 5 slices of a 3D label-free brightfield stack of size 640×480 with a pixel size of 420 nm. As the initial ground truth only included the segmentation of the embryo as a single class, we manually labeled the extraembryonic membrane as a second class to generate a two-label ground truth. Training was performed on the first slice, and prediction on the whole stack.
The U2OS FIB-SEM dataset (Fig. 5) is publicly available as EMPIAR-11746³⁸, and consists of a 1168×3394×1385 stack with a pixel size of 2.5 nm in X and Y, and 0.5 nm in Z. We down-scaled the whole stack to a width of 1200, used every 40th image starting from slice 500 as the training dataset, and predicted on every 30th slice starting from slice 501 (test dataset). We used 6 of the 8 classes available in the dataset ground truth.
The dinoflagellate FIB-SEM dataset (Fig. 3) is part of a recent publication²⁶ and is publicly available (EMPIAR-12627). The sample was high-pressure frozen and resin-embedded before imaging, and the stack has dimensions of 3598×4455×3944 pixels. More details about sample preparation are available in Rao et al. To work on a smaller stack, we binned it by a factor of 4 in X and Y (3598×1113×986 pixels). We used slices 50, 275, 462, 752, 1024, 1375, 1721, 2015, 2310, 2813, and 3067 for training and predicted on every 3rd slice (1200 slices in total). To allow for quantification, we manually labeled 7 slices (370, 650, 900, 1550, 1850, 2175, and 2550, referred to as the test stack) with the three classes using the SAMJ ImageJ2 plugin (SAM2, segment-anything-models-java/SAMJ-IJ on GitHub).
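The stated dimensions imply that binning was applied in X and Y only (4455/4 rounds down to 1113 and 3944/4 is 986, while Z stays at 3598), and that predicting on every 3rd slice yields 1200 slices. A sketch of such binning by block averaging follows. Averaging rather than summing, and dropping incomplete edge blocks, are assumptions of this sketch.

```python
import numpy as np

def bin_xy(stack, factor=4):
    """Average non-overlapping factor×factor blocks in Y and X of a (Z, Y, X) stack.
    Pixels that do not fill a complete block at the far edges are dropped."""
    z, y, x = stack.shape
    ny, nx = y // factor, x // factor
    trimmed = stack[:, :ny * factor, :nx * factor]
    return trimmed.reshape(z, ny, factor, nx, factor).mean(axis=(2, 4))

small = np.arange(2 * 9 * 10, dtype=float).reshape(2, 9, 10)
print(bin_xy(small).shape)                            # (2, 2, 2)
print(4455 // 4, 3944 // 4, len(range(0, 3598, 3)))   # 1113 986 1200
```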
Metrics
We implemented the Dice score, precision and recall calculations by counting true positive, false positive and false negative pixels when comparing ground-truth masks with prediction results. Boundary F1 was adapted from GitHub (minar09/bfscore_python), and we used scikit-image's³⁹ implementation of the Hausdorff distance.
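The pixel-counting metrics follow directly from the confusion counts, and scikit-image exposes the Hausdorff distance as `skimage.metrics.hausdorff_distance`. The sketch below is a minimal version and does not reproduce boundary F1. Two choices are assumptions: how empty masks are handled, and passing whole masks rather than extracted contours to the Hausdorff function.

```python
import numpy as np
from skimage.metrics import hausdorff_distance

def pixel_metrics(gt, pred):
    """Dice, precision and recall of a binary prediction against ground truth."""
    gt, pred = gt.astype(bool), pred.astype(bool)
    tp = np.count_nonzero(gt & pred)
    fp = np.count_nonzero(~gt & pred)
    fn = np.count_nonzero(gt & ~pred)

    def ratio(num, den):
        return num / den if den else float("nan")   # undefined for empty masks

    return {"dice": ratio(2 * tp, 2 * tp + fp + fn),
            "precision": ratio(tp, tp + fp),
            "recall": ratio(tp, tp + fn)}

gt = np.zeros((8, 8), bool)
gt[2:6, 2:6] = True
pred = np.zeros((8, 8), bool)
pred[2:6, 3:7] = True                     # same square shifted by one column
print(pixel_metrics(gt, pred))            # dice, precision and recall are all 0.75
print(hausdorff_distance(gt, pred))       # 1.0
```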
Comparing Labkit and FeatureForest
In Figure 2, for each dataset, we trained Labkit³ as the random forest classifier and FeatureForest using SAM2_Large on the training stack. Labkit was run independently by different image analysts, each of whom saved various classifiers with different sets of labels and filter standard deviation settings. We report only the best performing classifier, with performance evaluated using the reported metrics. We used the default filters, with filter standard deviations of [1, 2, 4, 8] (default values) for the Fly brain and Embryo datasets, and [1, 2, 4, 8, 16, 32] for the Spheroid and Kidney datasets. Predictions were performed on the entire dataset for both Labkit and FeatureForest, and FeatureForest post-processing was carried out with default parameters.
Panels (a), (c), (e), and (g) correspond to slice numbers 72 (Fly brain), 435 (Spheroid, cropped to a square region), 6 (Kidney), and 4 (Embryo). Dice scores in panels (b), (d), (f), and (h) are computed over the entire datasets, pooling the true positive, false positive, and false negative pixel counts over the whole stack.
To obtain Table 1 and Supplementary Figure 2, we computed the various metrics over the entire stacks for the same prediction results as in Figure 2, and calculated the mean and standard deviation of each metric for each method (random forest, FeatureForest, and FeatureForest with post-processing).
In Supplementary Figures 1, 3 and 4, we show the prediction results of the models trained for Figure 2. Dice scores were computed for each individual slice against the ground truth. In Supplementary Figure 1, we show slices 40, 124, 168, and 232. In Supplementary Figure 3, we used slices 435, 525, 705, and 885. Finally, in Supplementary Figure 4, we show tiles 7, 8, and 9 in panel (a), while panel (b) shows an overlay of the ground truth and the prediction for the entire un-tiled image.
FeatureForest with multiple classes
In Supplementary Figure 5, FeatureForest was trained with 6-class labels on the training stack, and prediction was performed on the entire stack. Slices 651, 891, and 1131 were slightly cropped to exclude uninformative white borders.
In Figure 3 and Supplementary Figure 6, to obtain the segmentation of the three classes (host mitochondria, algal mitochondria and algal plastids), we trained two different FeatureForest classifiers: one to segment the two types of mitochondria,
Dataset             Method                 Dice score ↑   Precision ↑   Recall ↑    Boundary F1 ↑   Hausdorff dist. ↓
Fly brain           Random Forest          0.61±0.07      0.60±0.09     0.62±0.07   0.38±0.07       11.54±4.72
                    FeatureForest (FF)     0.88±0.05      0.91±0.05     0.86±0.06   0.70±0.06       2.82±2.04
                    FF + Post-Processing   0.92±0.04      0.97±0.03     0.87±0.05   0.92±0.05       1.49±1.67
Spheroid            Random Forest          0.30±0.06      0.26±0.09     0.38±0.07   0.26±0.08       73.63±26.89
                    FeatureForest (FF)     0.74±0.06      0.76±0.09     0.73±0.06   0.71±0.07       10.80±7.74
                    FF + Post-Processing   0.78±0.07      0.78±0.10     0.80±0.06   0.79±0.07       9.14±8.27
Kidney              Random Forest          0.61±0.11      0.60±0.13     0.63±0.11   0.50±0.08       21.28±17.41
                    FeatureForest (FF)     0.81±0.04      0.79±0.07     0.85±0.05   0.63±0.07       10.53±6.83
                    FF + Post-Processing   0.87±0.04      0.86±0.07     0.89±0.05   0.87±0.06       7.16±4.86
Embryo (membrane)   Random Forest          0.66±0.06      0.59±0.09     0.75±0.03   0.48±0.02       39.20±22.63
                    FeatureForest (FF)     0.90±0.01      1.00±0.00     0.82±0.02   0.88±0.05       0.65±0.02
                    FF + Post-Processing   0.93±0.01      1.00±0.00     0.86±0.02   0.96±0.03       0.50±0.02
Table 1. Metric scores comparing a random forest classifier, FeatureForest, and FeatureForest with post-processing. The datasets are the same as those shown in Fig. 2: Fly brain (panel (a), N=256), Spheroid (panel (c), N=512), Kidney (panel (e), N=12), and Embryo (panel (g), N=5). The measurements are shown as mean ± standard deviation over the entire dataset. The best performing method for each metric is underlined; for Dice, precision, recall, and boundary F1, larger values are better (↑), while for the Hausdorff distance, smaller is better (↓). FeatureForest was trained using SAM2_Large.
Figure 3. Segmentation of plankton organelles from a FIB-SEM stack using FeatureForest. (a) Three different slices (out of 3598) of a dinoflagellate cell imaged by FIB-SEM, overlaid with the manual segmentation and the post-processed FeatureForest segmentation (SAM2_Large). The segmentation masks consist of three classes: algal plastids (blue), algal mitochondria (red) and host mitochondria (orange). The Dice score between the ground truth and FeatureForest + post-processing is indicated in the top right corner for each class. Scale bar: 4 µm. (b) 3D reconstruction of the three classes (algal plastids, algal mitochondria, and host mitochondria) of (a) throughout the entire dataset.
Method                 Dice score ↑   Precision ↑   Recall ↑    Boundary F1 ↑   Hausdorff dist. ↓
U-Net                  0.78±0.07      0.82±0.08     0.75±0.09   0.77±0.07       10.16±14.80
FeatureForest (FF)     0.74±0.06      0.76±0.09     0.73±0.06   0.71±0.07       10.80±7.74
FF + Post-Processing   0.78±0.07      0.78±0.10     0.80±0.06   0.79±0.07       9.14±8.27
Table 2. Metric scores comparing a U-Net, FeatureForest, and FeatureForest with post-processing. The metrics are computed on the Spheroid dataset (Fig. 2c, N=512). The best performing method for each metric is underlined; for Dice, precision, recall, and boundary F1, larger values are better (↑), while for the Hausdorff distance, smaller is better (↓). FeatureForest was trained using SAM2_Large.
Dataset     Method            Dice score ↑   Precision ↑   Recall ↑    Boundary F1 ↑   Hausdorff dist. ↓
Fly brain   Random Forest     0.61±0.07      0.60±0.09     0.62±0.07   0.38±0.07       11.54±4.72
            VGG16             0.32±0.10      0.25±0.09     0.50±0.16   0.17±0.05       16.88±6.52
            MobileSAM (FF)    0.69±0.08      0.69±0.10     0.70±0.08   0.41±0.07       9.79±4.56
            DINOv2 (FF)       0.89±0.04      0.88±0.04     0.91±0.06   0.76±0.07       1.49±0.94
            SAM2_Large (FF)   0.88±0.05      0.91±0.05     0.86±0.06   0.70±0.06       2.82±2.04
Spheroid    Random Forest     0.30±0.06      0.26±0.09     0.38±0.07   0.26±0.08       73.63±26.89
            VGG16             0.53±0.08      0.43±0.09     0.70±0.09   0.42±0.08       25.48±11.92
            MobileSAM (FF)    0.57±0.07      0.54±0.10     0.61±0.07   0.46±0.07       29.85±13.60
            DINOv2 (FF)       0.56±0.07      0.53±0.08     0.61±0.08   0.49±0.06       28.61±11.41
            SAM2_Large (FF)   0.74±0.06      0.76±0.09     0.73±0.06   0.71±0.07       10.80±7.74
Table 3. Quantitative comparison of a random forest and FeatureForest using various models for feature vector extraction. The metrics are computed on the Fly brain (Fig. 2a, N=256) and Spheroid (Fig. 2c, N=512) datasets. MobileSAM, DINOv2 and SAM2_Large are available within FeatureForest (FF), while VGG16 is available within Convpaint³⁰. The best performing method for each metric is underlined; for Dice, precision, recall, and boundary F1, larger values are better (↑), while for the Hausdorff distance, smaller is better (↓).
Dataset       Model        Iterations   Tot. training (min)   Average training step (s)   Slice prediction (s)
Fly brain     MobileSAM    55           33.96                 5.46                        1.49
13×256×256    DINOv2       33           43.58                 3.71                        4.48
              SAM2_Large   39           23.64                 3.08                        1.52
Spheroid      MobileSAM    49           110.52                9.43                        21.17
17×1024×512   DINOv2       48           133.63                10.64                       24.15
              SAM2_Large   60           141.35                12.53                       29.50
Table 4. Duration of the various steps of FeatureForest training for two datasets. The number of iterations is estimated as the number of random forest trainings performed. The total training time was measured as the interval between the first and the last pixel labeling. The average training step was computed by averaging over all random forest trainings. The prediction time per slice is constant across slices; a single Fly brain slice is 256×256, while a single Spheroid slice is 1024×512. The size of the training stack is indicated under the dataset name. The measurements were performed on a Linux machine with a high-end GPU (NVIDIA A40-16Q, 16 GB).
Dataset        Model        Prediction (min)
Fly brain      MobileSAM    6.36
256×255×255    DINOv2       19.11
               SAM2_Large   6.49
Spheroid       MobileSAM    180.65
500×1024×512   DINOv2       206.08
               SAM2_Large   251.73
Table 5. Duration of whole-stack prediction. The reported values are estimated from Table 4 and correspond to prediction performed on a Linux machine with a high-end GPU (NVIDIA A40-16Q, 16 GB).
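For the Fly brain stack, the Table 5 values are reproduced by multiplying the per-slice prediction time from Table 4 by the 256 slices of the stack and converting to minutes. The short check below assumes a constant time per slice. The helper name is hypothetical.

```python
# Per-slice prediction times (s) for the Fly brain dataset, from Table 4.
slice_prediction_s = {"MobileSAM": 1.49, "DINOv2": 4.48, "SAM2_Large": 1.52}

def stack_prediction_minutes(seconds_per_slice, n_slices):
    """Whole-stack prediction time, assuming a constant time per slice."""
    return seconds_per_slice * n_slices / 60

for model, seconds in slice_prediction_s.items():
    print(model, round(stack_prediction_minutes(seconds, 256), 2))
# MobileSAM 6.36, DINOv2 19.11, SAM2_Large 6.49
```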
Model        GPU memory (GB)   Storage space (GB)   Storage space (GB)
             512×512           512×512              256×512×512
MobileSAM    3                 0.352                42.539
DINOv2       10                0.213                48.470
SAM2_Base    6                 0.844                102.094
SAM2_Large   8                 0.844                102.094
Table 6. GPU memory and storage space requirements of the different feature-generating models. The minimum GPU memory was estimated from the GPU memory footprint of running the model on a 512×512 image on Linux. The storage space corresponds to the on-disk footprint of the feature vectors for 512×512 and 256×512×512 images.