FisherTune: Fisher-Guided Robust Tuning of Vision Foundation Models for Domain Generalized Segmentation

Author: Zhao, Dong; Li, Jinlong; Wang, Shuang; Wu, Mengyao; Zang, Qi; Sebe, Niculae; Zhong, Zhun

Publisher: Zenodo

DOI: 10.1109/CVPR52734.2025.01401

Source: https://zenodo.org/records/17688217/files/Zhao_FisherTune_Fisher-Guided_Robust_Tuning_of_Vision_Foundation_Models_for_Domain_CVPR_2025_paper.pdf

FisherTune: Fisher-Guided Robust Tuning of Vision Foundation Models for Domain Generalized Segmentation
Dong Zhao1, Jinlong Li2, Shuang Wang1✉, Mengyao Wu, Qi Zang1✉, Nicu Sebe2, Zhun Zhong3
1School of Artificial Intelligence, Xidian University, Shaanxi, China
2Department of Information Engineering and Computer Science, University of Trento, Italy
3School of Computer Science and Information Engineering, Hefei University of Technology, China
Abstract
Vision Foundation Models (VFMs) excel in generalization due to large-scale pretraining, but fine-tuning them for Domain Generalized Semantic Segmentation (DGSS) while maintaining this ability remains a challenge. Existing approaches either selectively fine-tune parameters or freeze the VFMs and update only the adapters, both of which may underutilize the VFMs' full potential in DGSS tasks. We observe that domain-sensitive parameters in VFMs, arising from task and distribution differences, can hinder generalization. To address this, we propose FisherTune, a robust fine-tuning method guided by the Domain-Related Fisher Information Matrix (DR-FIM). DR-FIM measures parameter sensitivity across tasks and domains, enabling selective updates that preserve generalization and enhance DGSS adaptability. To stabilize DR-FIM estimation, FisherTune incorporates variational inference, treating parameters as Gaussian-distributed variables and leveraging pretrained priors. Extensive experiments show that FisherTune achieves superior cross-domain segmentation while maintaining generalization, outperforming both selective-parameter and adapter-based methods.
1. Introduction
Vision Foundation Models (VFMs), such as CLIP [46], DINOv2 [5], and EVA02 [15], have emerged as powerful tools in computer vision, achieving remarkable generalization across diverse downstream tasks, including cross-domain perception [35,53,56,71,76], few-shot [37,65,70,79] and zero-shot perception [26,31,54,72]. Pre-trained on massive data fields, VFMs encapsulate rich visual representations that can be transferred to numerous applications with minor adaptation [61,66]. Despite this, when it comes to Domain Generalized Semantic Segmentation (DGSS), where the goal is to segment unseen domain images without explicit access to their training data, effectively fine-tuning VFMs while preserving their strong generalization capabilities remains an open challenge.
[Figure 1 panels: (a) Adapter Tuning, (b) Selective Tuning, (c) Fisher Tuning (Ours); each panel shows Source Data fed to VFMs. Legend: extra adapters, handmade parameters, domain-sensitive parameters.]
Figure 1. Comparison of principles of different VFM adjustment methods: (a) tuning by adapter insertion [60,66], (b) tuning by manually selected [57] or automatically selected parameters [55], (c) our method of tuning domain-sensitive parameters.
Footnote: This work was supported in part by the National Natural Science Foundation of China No. 62271377, the Key Research and Development Program of Shaanxi Program No. 2021ZDLGY0106 and No. 2022ZDLGY0112, the Key Scientific Technological Innovation Research Project by Ministry of Education, the MUR PNRR project FAIR (PE00000013) funded by the NextGenerationEU and the EU Horizon projects ELIAS (No. 101120237) and AI4Trust (No. 101070190).
✉ Co-corresponding author.
Existing methods for adapting VFMs to DGSS tasks typically involve fine-tuning via adapter layers to remap pre-trained tokens, as shown in Fig. 1(a) [60,66]. While this approach reduces overfitting, it does not fully leverage the internal representations of the VFM, as the core content of the model remains unchanged. Furthermore, when the pre-training tasks of the VFM (e.g., MAE, DINOv2) significantly deviate from the DGSS task, the adaptation improvement is limited. An alternative solution is to fine-tune a subset of parameters related to the target DGSS task, which activates the representations of VFMs, as illustrated in Fig. 1(b). However, we observe that traditional parameter selection methods, whether manually defined [57] or automatically chosen [36,62], fail to guarantee the generalization ability of the VFM, and in fact, perform even worse than simply adding adapters, as shown in Fig. 2.
This CVPR paper is the Open Access version, provided by the Computer Vision Foundation. Except for this watermark, it is identical to the accepted version; the final published version of the proceedings is available on IEEE Xplore.
Figure 2. Comparison of average performance across multiple VFMs in DGSS experiments on GTA → Cityscapes + BDD100K + Mapillary using different fine-tuning methods, including adapter-based Rein [60], manually selected parameter-based VQT [57], adaptively selected parameter-based ChildTune [62], and our FisherTune.
Our experiments reveal that certain parameters of Vision Foundation Models (VFMs) are crucial for maintaining generalization, while others are key to adapting to new domains and tasks. Traditional selective fine-tuning methods focus solely on task-sensitive parameters, risking the disruption of the VFM's generalization ability. Therefore, we propose identifying and fine-tuning these domain-sensitive parameters. To address this, we introduce FisherTune, a novel robust fine-tuning method based on the Domain-Related Fisher Information Matrix (DR-FIM). This method allows us to preserve the generalization capability of the pre-trained VFM while activating its adaptability to DGSS tasks. Specifically, we first introduce the DR-FIM metric, which measures domain sensitivity by evaluating the fluctuation of parameters across different domains. Unlike FIM metrics, DR-FIM accounts for domain shifts, extending FIM to cross-domain tasks. To mitigate potential degradation in DR-FIM estimation, FisherTune innovatively employs variational inference. By treating model parameters as random variables following a Gaussian distribution and incorporating prior information from the pre-trained VFM, FisherTune stabilizes the DR-FIM estimation process, ensuring accuracy and robustness in cross-domain tasks. This novel estimation method improves computational efficiency and enhances the scalability of FisherTune to large-scale VFM models. Through extensive experiments on multiple DGSS benchmarks, we demonstrate that FisherTune consistently outperforms both selective-parameter and adapter-based methods. In summary, our contributions are:
• We propose FisherTune, a novel fine-tuning strategy that leverages the Fisher Information Matrix to selectively fine-tune VFMs for DGSS, preserving the generalization capabilities of VFMs while improving domain adaptability.
• We introduce Domain-Related FIM (DR-FIM), a novel metric matrix that quantifies the sensitivity of parameters to domain shifts.
• We employ variational inference to treat model parameters as Gaussian-distributed, ensuring stable and accurate DR-FIM estimation.
• We validate the effectiveness of FisherTune through extensive experiments, showing superior generalization compared to state-of-the-art methods.
2. Related Work
Domain Generalized Semantic Segmentation (DGSS) focuses on enhancing a model's ability to generalize to unseen domains by training on source domains [8,43,74]. Common strategies include domain-invariant representation learning methods and domain augmentation techniques. Domain-invariant representation learning approaches involve splitting learned features into domain-invariant [68,69] and domain-specific components [32,58,59,75], or employing meta-learning to develop more robust models [11,30,73]. Additionally, several methods have succeeded by learning feature normalization or whitening schemes [9,39,44]. Domain augmentation techniques, on the other hand, improve segmentation results through style transfer at image-level [23,43,45,78] or feature-level [6,8,77] and the introduction of additional data [29,38,67]. Some recent work has shown that text-guided feature enhancement [12,13] through CLIP [47] or synthetic data from diffusion models can also benefit model generalization [3,40,76].
Parameter-Efficient Fine-Tuning (PEFT) [18] customizes pre-trained models by fine-tuning a subset of parameters [17], improving performance and generalization with lower computational cost. The dominance of ViT [10] in vision tasks has spurred the development of PEFT methods. Adaptor-based Prompt Tuning [34,55] has shown strong performance in vision transfer tasks by adding learnable prompts. For instance, Visual Prompt Tuning (VPT) [25] introduces learnable prompts to each Transformer layer's input embeddings, AdaptFormer [7] adds a bottleneck fully connected layer parallel to the MLP block, and VQT [57] optimizes prompts through bypassing to effectively leverage intermediate features of VFMs.
In semantic segmentation, several works have applied visual prompts to model transfer. [35] uses frequency and spatial prompts to transfer pre-trained ViTs to low-level segmentation tasks, while [64] applies mask prompts to aid continual adaptation. [71] uses weak supervision for LoRA adapter [21] to adapt VFMs across domains. Rein [60] adds LoRA adapters to transform tokens and activate VFMs for DGSS. [66] enhances cross-domain adaptation by adding Fourier transform prompts to intermediate tokens of VFMs. While most visual semantic segmentation methods rely on adapter-based prompt fine-tuning, our work pioneers selective fine-tuning methods for visual model adaptation. Closely related to our approach are selective fine-tuning methods in NLP, like ChildTune [62] and Fishermask [36], which use Fisher matrices to identify task-sensitive parameters. In contrast, our method introduces domain-sensitive parameters and a stable estimation method to identify parameters highly sensitive to both tasks and domains, making it particularly suited for DGSS tasks.
3. Methodology
3.1. Preliminaries
Domain Generalized Semantic Segmentation (DGSS) aims to train models that can generalize across unseen domains. Formally, given a set of labeled source domains $D_s=\{(x_i,y_i)\}_{i=1}^{N_s}$, where $x_i$ represents the input image and $y_i$ represents the corresponding pixel-wise label, the goal is to train a model $f_\theta$ parameterized by $\theta$ that performs well on unseen target domains $D_t=\{x_j\}_{j=1}^{N_t}$, where the labels of $D_t$ are not available during training. The optimization objective of DGSS can be written as:
$$\min_{\theta}\ \mathbb{E}_{(x_i,y_i)\sim D_s}\left[\mathcal{L}(f_\theta(x_i),y_i)\right], \tag{1}$$
where $\mathcal{L}$ is the segmentation loss (e.g., cross-entropy) that evaluates the difference between the predicted segmentation map $f_\theta(x_i)$ and the ground truth $y_i$. The challenge lies in ensuring that the learned model $f_\theta$ generalizes well to unseen target domains $D_t$, which can be written as a generalization objective:
$$\min_{\theta}\ \mathbb{E}_{x_j\sim D_t}\,\mathcal{L}(f_\theta(x_j),y_j^{*}), \tag{2}$$
where $y_j^{*}$ represents the true (but unknown) labels of the target domain $D_t$. Since the labels $y_j^{*}$ are not available, the optimization focuses on learning domain-invariant representations in $f_\theta$, enabling strong performance across both seen and unseen domains.
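Vision Foundation Models (VFMs), such as CLIP [46], MAE [19], SAM [28], EVA02 [14], and DINOv2 [41], almost all use the Vision Transformer (ViT) architecture. A ViT typically consists of $L$ stacked blocks, each containing two main submodules: multi-head attention (MHA) and a feed-forward network (FFN). Specifically, the attention score of each head is calculated as:
$$\mathrm{MHA}(X)=\mathrm{Concat}(\mathrm{head}_1,\ldots,\mathrm{head}_h)\,\theta_o,\qquad \mathrm{head}_i=\mathrm{Softmax}\!\left(\frac{X\theta_{q_i}(X\theta_{k_i})^{T}}{\sqrt{d_h}}\right)X\theta_{v_i},$$
where $\theta_o$ is the output projection matrix, and $\theta_{q_i}$, $\theta_{k_i}$, and $\theta_{v_i}$ represent the query, key, and value projections for head $i$. The FFN consists of two linear layers with a ReLU activation: $\mathrm{FFN}(X)=\mathrm{ReLU}(X\theta_{fn1}+\theta_{b1})\,\theta_{fn2}+\theta_{b2}$. Both the MHA and FFN are followed by residual connections and layer normalization. Let $\theta$ denote the set of the parameters of those VFMs that we aim to fine-tune:
$$\theta=\left[\theta_Q^{(1)},\theta_K^{(1)},\theta_V^{(1)},\theta_{FFN}^{(1)},\ldots,\theta_Q^{(L)},\theta_K^{(L)},\theta_V^{(L)},\theta_{FFN}^{(L)}\right]^{\top}, \tag{3}$$
where $\theta_Q^{(l)}=[\theta_{q_1},\ldots,\theta_{q_h}]$, and $\theta_K^{(l)}$ and $\theta_V^{(l)}$ are defined in the same way.
[Figure 3: chart with legend entries Freeze and Full. Annotation embedded in the figure, translated from Chinese: "DGSS experiments (average performance) using DINOv2-Large on GTA → (Cityscapes, BDD, Mapillary). We observe that fine-tuning only some key parameters of a VFM, compared with full fine-tuning or fully freezing the other parameters, can significantly improve generalization. This suggests the hypothesis that some pre-trained parameters are closely tied to adaptability to specific tasks and domains, while others are more general and adapt better to different domains and tasks. Based on this observation, we propose identifying the 'domain-sensitive parameters' of the VFM and fine-tuning them, improving adaptability on DGSS while keeping the pre-trained generalization ability."]
Figure 3. Observations of fine-tuning different VFM layers for DGSS experiments using DINOv2-Large under GTA → Cityscapes + BDD100K + Mapillary. It shows that fine-tuning different layers has different effects on the generalization performance of the VFMs. B means blocks.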
Fisher Information Matrix (FIM) is a fundamental concept in statistical estimation theory [16], which measures the amount of information that an observable random variable carries about an unknown parameter [2]. In the context of neural networks, the FIM provides insights into the sensitivity of the loss function with respect to the model parameters [36], reflecting the curvature of the loss landscape [48]. Mathematically, the FIM $F_\theta$ is defined as:
$$F_\theta=\mathbb{E}_{x}\,\mathbb{E}_{y\sim f_\theta(y|x)}\ \nabla_\theta\mathcal{L}(f_\theta(x),y)\cdot\nabla_\theta\mathcal{L}(f_\theta(x),y)^{\top}, \tag{4}$$
where $F_\theta\in\mathbb{R}^{|\theta|\times|\theta|}$ is a symmetric matrix and $\nabla_\theta\mathcal{L}(\cdot)$ denotes the gradient of the loss function with respect to the parameters. Intuitively, the Fisher Information Matrix captures how much changing the parameters affects the model's output, thus quantifying the "informativeness" of the parameters.
3.2. Motivation
This study is motivated by an intriguing experimental observation. We group the $\theta_Q$, $\theta_K$, $\theta_V$, and $\theta_{FFN}$ components from different blocks of DINOv2 separately, fine-tune all possible pairwise combinations of these groups, and analyze their impact on model generalization, as shown in Fig. 3. We found that tuning specific layers led to different levels of generalization in VFMs, with some configurations even outperforming a fully tuned model. This suggests that certain parameters are critical for maintaining generalization, while others are key to adapting to new domains and tasks. Based on this, we hypothesize that VFMs contain domain-sensitive parameters suited for specific tasks and domains, while other parameters remain broadly generalizable. Consequently, we propose identifying and fine-tuning these domain-sensitive parameters, enhancing adaptability in DGSS for improved cross-domain performance.
3.3. FisherTune
In this section, we present our FisherTune, fine-tuning ViT-based Vision Foundation Models (VFMs) guided by the Fisher Information Matrix (FIM) while preserving their generalization strengths. Our idea is to use FIM to find task- and domain-sensitive parameters in $\theta$ and fine-tune these sensitive parameters to improve the generalization ability of the model on unseen domains while maintaining the pre-trained knowledge of VFMs.
3.3.1 Domain-Related FIM
Domain-Related FIM. In Eq. 4, FIM quantifies the importance of model parameters to the current task by measuring the sensitivity of parameters to model output. However, FIM cannot sufficiently capture the behavior of parameters in cross-domain scenes, especially in DGSS tasks, where the sensitivity of different parameters to varying data distributions may differ significantly. For the DGSS task, we need a parameter estimation metric that can capture the variation of parameters across different domains.
To address this issue, we propose to calculate the Fisher information change $\Delta F_\theta$ between different data domains (seen domain and simulated unseen domain) to measure the sensitivity difference of parameters across domains. Formally, given the seen single-source domain $D_s=\{(x,y)\}$, $\Delta F_\theta$ is calculated as:
$$\Delta F_\theta=\frac{\left|F_\theta(x,y)-F_\theta(x',y)\right|}{\min\left(F_{\theta_i}(x),F_{\theta_i}(x')\right)+\epsilon}, \tag{5}$$
where $F_\theta(x,y)$ and $F_\theta(x',y)$ are the FIMs in the seen and simulated unseen domains, and $\epsilon=1\times10^{-8}$ is a small constant to prevent division by zero. The numerator, $|F_\theta(x,y)-F_\theta(x',y)|$, computes the difference between the FIM across domains, reflecting the model's varying sensitivity to parameter changes in different environments or data distributions. The denominator, $\min(F_{\theta_i}(x),F_{\theta_i}(x'))$, normalizes this difference to ensure the relative nature of the metric. A higher $\Delta F_i$ indicates that the parameter is more sensitive to domain changes.
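The relative normalization in Eq. (5) can be illustrated with a minimal sketch, assuming the FIM has already been reduced to one non-negative value per parameter. The function name `relative_fisher_change` and the toy numbers are hypothetical and are not taken from the paper.

```python
def relative_fisher_change(f_seen, f_unseen, eps=1e-8):
    """Element-wise form of Eq. (5): |F - F'| / (min(F, F') + eps)."""
    if len(f_seen) != len(f_unseen):
        raise ValueError("Fisher vectors must have the same length")
    return [abs(a - b) / (min(a, b) + eps) for a, b in zip(f_seen, f_unseen)]


# Hypothetical per-parameter Fisher values on seen (x) and simulated unseen (x') data.
# Both parameters change by the same absolute amount (0.05).
f_seen = [0.50, 0.05]
f_unseen = [0.55, 0.10]

delta = relative_fisher_change(f_seen, f_unseen)
# The second parameter has the smaller baseline, so its relative change is larger.
```

In this toy case the same absolute shift counts for more on the parameter with the smaller Fisher value, which is the "relative nature" the denominator is meant to provide.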
[Figure 4: FIM and DR-FIM values drawn as circles for four cases of simulated domain shift, $x=x'$, $x_1'$, $x_2'$, $x_3'$. FIM = DR-FIM when $x=x'$, and FIM < DR-FIM for $x_1'$, $x_2'$, $x_3'$.]
Figure 4. Comparison of FIM and DR-FIM under different degrees of domain shift. The size of the circle indicates the value. It shows that DR-FIM is a generalization of FIM as it additionally considers the cross-domain sensitivity of parameters.
To simulate an unseen domain sample $x'$ from a seen domain $x$, we leverage the uncertainty modeling method inspired by [33]. Specifically, the unseen domain feature $x'$ is simulated by modifying the feature statistics (mean and variance) of the seen domain sample $x$. The perturbed mean is generated as $\alpha(x)=\mu(x)+\epsilon_\mu\Sigma_\mu(x)$, where $\mu(x)$ represents the mean of the feature, $\Sigma_\mu(x)$ is the uncertainty estimation of the mean, and $\epsilon_\mu\sim\mathcal{N}(0,1)$ is noise sampled from a standard normal distribution. Next, the perturbed variance is generated as $\beta(x)=\sigma(x)+\epsilon_\sigma\Sigma_\sigma(x)$, where $\sigma(x)$ is the standard deviation of the feature, $\Sigma_\sigma(x)$ is the uncertainty estimation of the standard deviation, and $\epsilon_\sigma\sim\mathcal{N}(0,1)$. Using the perturbed mean $\alpha(x)$ and variance $\beta(x)$, the unseen domain sample $x'$ is generated with the following formula,
$$x'=\beta(x)\cdot\frac{x-\mu(x)}{\sigma(x)}+\alpha(x). \tag{6}$$
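The statistic perturbation of Eq. (6) can be sketched for a single feature channel as follows. This is only an illustration under stated assumptions: the uncertainty estimates $\Sigma_\mu(x)$ and $\Sigma_\sigma(x)$ are passed in as plain numbers (`sigma_mu`, `sigma_sigma`), because how [33] estimates them is not described here, and all function names are hypothetical.

```python
import math
import random


def channel_stats(values):
    """Mean and standard deviation of one feature channel."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var + 1e-6)  # small constant avoids division by zero


def simulate_unseen(values, sigma_mu, sigma_sigma, rng):
    """Eq. (6): re-normalize the channel with perturbed statistics."""
    mu, sd = channel_stats(values)
    eps_mu = rng.gauss(0.0, 1.0)      # noise for the mean
    eps_sigma = rng.gauss(0.0, 1.0)   # noise for the standard deviation
    alpha = mu + eps_mu * sigma_mu        # perturbed mean
    beta = sd + eps_sigma * sigma_sigma   # perturbed standard deviation
    shifted = [beta * (v - mu) / sd + alpha for v in values]
    return shifted, alpha, beta


features = [1.0, 2.0, 3.0, 4.0]  # hypothetical activations of one channel
shifted, alpha, beta = simulate_unseen(features, 0.5, 0.2, random.Random(0))
```

By construction the perturbed channel has mean `alpha` and a standard deviation close to `abs(beta)`, and with both uncertainties set to zero the input is returned essentially unchanged.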
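Using $\Delta F_\theta$, we introduce a unified metric, Domain-Related FIM (DR-FIM), to account for both task-sensitive and domain-sensitive parameters as,
$$\mathrm{DRF}_\theta=\underbrace{F_\theta(x,y)}_{\text{task-sensitive}}+\underbrace{e^{-(\epsilon_\mu+\epsilon_\sigma)}\,\frac{\left|F_\theta(x,y)-F_\theta(x',y)\right|}{\min\left(F_{\theta_i}(x),F_{\theta_i}(x')\right)+\epsilon}}_{\text{domain-sensitive}}. \tag{7}$$
The $\mathrm{DRF}_\theta$ is a linear combination of $F_\theta$ and $\Delta F_\theta$, and the combination coefficients are determined by the domain shift control factors $\epsilon_\mu$ and $\epsilon_\sigma$. When $\epsilon_\mu$ and $\epsilon_\sigma$ are large, the simulated domain shift is significant, and $\Delta F_\theta$ is scaled appropriately to balance with $F_\theta$. The relationship between the simulated domain shift and the numerical values of DR-FIM and FIM is shown in Fig. 4.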
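To make the combination in Eq. (7) concrete, the sketch below scores a few parameters and ranks them. It assumes per-parameter (diagonal) Fisher values; the helper names `dr_fim` and `rank_parameters` and the numbers are hypothetical.

```python
import math


def dr_fim(f_seen, f_unseen, eps_mu, eps_sigma, eps=1e-8):
    """Eq. (7): task-sensitive term plus weighted domain-sensitive term."""
    weight = math.exp(-(eps_mu + eps_sigma))  # domain-shift control factor
    scores = []
    for a, b in zip(f_seen, f_unseen):
        delta = abs(a - b) / (min(a, b) + eps)  # relative change, as in Eq. (5)
        scores.append(a + weight * delta)
    return scores


def rank_parameters(scores):
    """Parameter indices sorted from most to least sensitive."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Hypothetical Fisher values on seen and simulated unseen data.
fim_seen = [0.50, 0.20, 0.10]
fim_unseen = [0.50, 0.40, 0.01]

scores = dr_fim(fim_seen, fim_unseen, eps_mu=0.0, eps_sigma=0.0)
order = rank_parameters(scores)
```

With these toy numbers a ranking by FIM alone would put parameter 0 first, whereas the DR-FIM ranking puts parameter 2 first because its Fisher value changes most under the simulated shift. When the two domains coincide the score reduces to the plain FIM, matching the $x=x'$ case of Fig. 4.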
3.3.2 Stable Estimation of DR-FIM
Although $\mathrm{DRF}_\theta$ provides an estimation of the domain sensitivity of VFM parameters, the dimension of the parameter list $\theta$ is very high, which makes the matrix impractical to compute and store directly, i.e., $O(|\theta|^2)$. Therefore, it is necessary to approximate the FIM to reduce the computational complexity.
Diagonal Approximation. Following [36], by assuming that the off-diagonal elements are negligible, the FIM can be efficiently approximated by using a diagonal approximation, i.e.,
$$\hat{F}_\theta=\mathrm{diag}\left(F_{\theta_1},F_{\theta_2},\ldots,F_{\theta_{|\theta|}}\right), \tag{8}$$
where each individual $F_{\theta_n}$ can be calculated as,
$$F_{\theta_n}=\frac{1}{N}\sum_{i=1}^{N}\mathbb{E}_{y_i\sim f_\theta(y_i|x_i)}\left(\nabla_{\theta_n}\mathcal{L}(f_\theta(x_i),y_i)\right)^{2}. \tag{9}$$
In the above diagonal approximation, only the individual contribution of each parameter to the loss is considered, while the interaction terms between different parameters are ignored. This diagonal approximation effectively simplifies an $O(|\theta|^2)$ matrix to a vector of length $O(|\theta|)$, greatly reducing the computational complexity.
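As a small worked instance of Eq. (9), the sketch below computes the diagonal FIM of a toy binary logistic model, where the expectation over labels drawn from the model's own prediction can be written out exactly. The logistic model stands in for the segmentation network purely for illustration; it is not the paper's setup, and the names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def diagonal_fisher(weights, inputs):
    """Diagonal FIM of a toy logistic model, following the form of Eq. (9)."""
    fisher = [0.0] * len(weights)
    for x in inputs:
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        # Expectation over labels sampled from the model's own prediction.
        for label, prob in ((1, p), (0, 1.0 - p)):
            for n, xn in enumerate(x):
                grad = (p - label) * xn  # d(NLL)/d(w_n) for this label
                fisher[n] += prob * grad ** 2
    return [f / len(inputs) for f in fisher]


toy_weights = [0.0, 0.0]
toy_inputs = [[1.0, 2.0], [3.0, 0.0]]
toy_fisher = diagonal_fisher(toy_weights, toy_inputs)
```

For this model each entry reduces to the average of $p(1-p)\,x_n^2$, so only one number per parameter is stored, which is the $O(|\theta|)$ saving described above.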
Variational Estimation of DR-FIM. Diagonalization provides an efficient approximation method for computing the FIM. However, due to the differences between the pretraining tasks of the VFM and the DGSS task, the estimated FIM parameters often exhibit high sensitivity, leading to inaccuracies (See Fig. 6). To address this issue, we introduce a variational inference approach [4], treating the fine-tuning model's parameters $\theta$ as random variables following a Gaussian distribution. This introduces an additional regularization term into the FIM estimation, helping to learn a smoother prior distribution. Consequently, variational inference stabilizes the gradient update process during FIM calculation, mitigating the instability caused by high gradient noise.
Specifically, we assume that the posterior distribution of the model parameters follows a Gaussian distribution: $q(\theta)=\mathcal{N}(\hat{\theta},\Lambda^{-1})$, where $\hat{\theta}$ is the mean of the current parameter estimation and $\Lambda^{-1}$ is the covariance matrix of the parameters. To preserve the pre-trained knowledge of VFMs, we introduce the prior parameter distribution as a regularizer to prevent degradation during prediction:
$$p(\theta)=\mathcal{N}(\theta_{pt},\tau^{2}I), \tag{10}$$
where $\theta_{pt}$ is the pre-trained parameters of VFMs, $\tau^{2}$ is the variance controlling the flexibility of fine-tune parameters, and $I$ is the identity matrix. We utilize the variational free energy (also called the evidence lower bound, ELBO [20]) as the loss function for optimizing $\Lambda$,
$$\mathcal{L}(\hat{\theta},\Lambda^{-1})=\mathbb{E}_{\theta\sim q(\theta)}\left[\mathcal{L}(\theta)\right]+\gamma\,\mathrm{KL}\left(q(\theta)\,\|\,p(\theta)\right), \tag{11}$$
where $\gamma$ is the regularization coefficient, controlling the influence of the prior, and $\mathrm{KL}(q(\theta)\,\|\,p(\theta))$ is the Kullback-Leibler divergence between the posterior $q(\theta)$ and the prior $p(\theta)$.
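Connection with DR-FIM. To simplify the first term in Eq. (11), we perform a second-order Taylor expansion of the loss function $\mathcal{L}(\theta)$ around the current parameter estimate $\theta=\hat{\theta}$. Taking the expectation over the weight distribution $q(\theta)$, $\mathbb{E}_{\theta\sim q(\theta)}[\mathcal{L}(\theta)]\approx\mathcal{L}(\hat{\theta})+\frac{1}{2}\mathrm{Tr}\left(\nabla_\theta^{2}\mathcal{L}(\hat{\theta})\,\Lambda^{-1}\right)$. According to the definition of FIM and its connection with the Hessian matrix [16], the FIM can be approximated by the Hessian matrix near $\hat{\theta}$, $\nabla_\theta^{2}\mathcal{L}(\hat{\theta})\approx F_\theta$. Thus,
$$\mathbb{E}_{\theta\sim q(\theta)}\left[\mathcal{L}(\theta)\right]\approx\mathcal{L}(\hat{\theta})+\frac{1}{2}\mathrm{Tr}\left(F_\theta\Lambda^{-1}\right). \tag{12}$$
The second term in Eq. (11), the KL divergence between two Gaussian distributions, simplifies to
$$\mathrm{KL}\left(q(\theta)\,\|\,p(\theta)\right)=\frac{1}{2}\left[\tau^{-2}\mathrm{Tr}(\Lambda^{-1})+\tau^{-2}\|\hat{\theta}-\theta_{pt}\|^{2}-k+k\ln\tau^{2}+\ln\det\Lambda\right], \tag{13}$$
where $k$ is the number of parameters in $\theta$. Substituting Eq. (12) and Eq. (13) back into Eq. (11), and taking the derivative of the loss function with respect to $\Lambda$, we obtain (See Appendix A for detailed derivation),
$$F_\theta=\gamma\Lambda-\gamma\tau^{-2}I. \tag{14}$$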
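The step from Eqs. (11) to (14) can be sketched as follows. This is a reader's reconstruction rather than the Appendix A derivation, and it assumes the stationarity condition is taken with respect to the covariance $\Sigma=\Lambda^{-1}$, which is a choice of parameterization made here for convenience.

```latex
% Substitute Eq. (12) and Eq. (13) into Eq. (11), writing \Sigma = \Lambda^{-1}
% so that \ln\det\Lambda = -\ln\det\Sigma:
\begin{align*}
J(\Sigma) &= \mathcal{L}(\hat{\theta}) + \tfrac{1}{2}\mathrm{Tr}(F_\theta \Sigma)
  + \tfrac{\gamma}{2}\Big[\tau^{-2}\mathrm{Tr}(\Sigma)
  + \tau^{-2}\|\hat{\theta}-\theta_{pt}\|^{2} - k + k\ln\tau^{2} - \ln\det\Sigma\Big]. \\
% Use d Tr(A\Sigma)/d\Sigma = A^\top and d \ln\det\Sigma / d\Sigma = \Sigma^{-1}
% for symmetric F_\theta and \Sigma:
\frac{\partial J}{\partial \Sigma} &= \tfrac{1}{2}F_\theta
  + \tfrac{\gamma}{2}\big(\tau^{-2}I - \Sigma^{-1}\big) = 0 \\
\Longrightarrow\quad F_\theta &= \gamma\,\Sigma^{-1} - \gamma\tau^{-2}I
  = \gamma\Lambda - \gamma\tau^{-2}I .
\end{align*}
```

Read this way, a parameter whose posterior stays tightly concentrated (large precision $\Lambda$) relative to the prior precision $\tau^{-2}$ is assigned a large Fisher value.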
Then, the DR-FIM defined in Eq. (7) is updated as,
$$\mathrm{DRF}_\theta=\gamma\left(\Lambda_x-\tau^{-2}I+e^{-(\epsilon_\mu+\epsilon_\sigma)}\,\frac{\left|\Lambda_x-\Lambda_{x'}\right|}{\min\left(\Lambda_x,\Lambda_{x'}\right)+\frac{\epsilon}{\gamma}}\right). \tag{15}$$
It shows that the DR-FIM can be estimated from the covariance matrix $\Lambda$, with $\gamma$ and $\tau$ as hyperparameters. Using Eq. (15) to estimate the DR-FIM has several advantages over using Eq. (7) and Eq. (9). 1) Stability in Estimation: Eq. (15) introduces a more stable estimation mechanism by incorporating prior knowledge from the pre-trained VFMs $p(\theta)$ in Eq. (10) and the posterior distribution $q(\theta)$. This approach helps prevent the degradation of FIM estimation caused by the task shift between VFM pre-training tasks and DGSS tasks, ensuring more robust performance across unseen domains. 2) Computational Efficiency: The covariance matrix $\Lambda$ can be efficiently computed by directly minimizing the loss $\mathcal{L}(\hat{\theta},\Lambda^{-1})$ in Eq. (11) using the reparameterization trick [27] and stochastic gradient variational Bayes [1], reducing both computational and memory overhead compared to traditional FIM estimation in Eq. (9).
3.3.3 Training Schedule of FisherTune
We follow Rein [60], which adds a mask decoder to the backbone network of VFMs as a segmentation model for DGSS. Different from Rein, we do not modify the backbone structure or add additional adapters. During training, we first fix the backbone network of VFMs and use the original data to warm up the decoder to adapt the whole segmentation model to the DGSS task. After that, we tune the VFMs and decoder by our FisherTune as follows.

Algorithm 1 FisherTune Process
1: Input: source dataset $D=\{(x_i,y_i)\}_{i=1}^{N}$; hyperparameters: regularization coefficient $\gamma$, variance coefficient $\tau$, warm-up iterations $T_1$, FIM estimation iterations $T_2$, number of tune iterations $T_3$; pretrained VFM $\theta_{VFM}$; segmentation decoder $\theta_{dec}$.
2: Step 1: Warm-up decoder:
3:   Train the decoder $\theta_{dec}$ on $D$ for $T_1$ steps, with $\theta_{VFM}$ frozen.
4: Step 2: Sampling and DR-FIM Calculation:
5:   for $t=1$ to $T_2$ do
6:     Sample batch $(x,y)\sim D$
7:     Simulate unseen domain data $x'$ via Eq. 6.
8:     Optimize covariance matrix $\Lambda$ via Eq. 11
9:   Estimate DR-FIM using the optimized $\Lambda$ via Eq. 15
10: Step 3: Parameter Fine-Tuning:
11:   for $t=1$ to $T_3$ do
12:     Sample a batch $(x,y)\sim D$
13:     Select parameters $\hat{\theta}_{VFM}$ via Eq. 16
14:     Update the selected $\hat{\theta}_{VFM}$ and $\theta_{dec}$ via Eq. 1.
15: Output: Fine-tuned $\theta_{VFM}$ and $\theta_{dec}$.
In FisherTune, the selection of parameters for fine-tuning is guided by the DR-FIM ($\mathrm{DRF}_\theta$), which quantifies the sensitivity of parameters to task and domain shifts. To optimize the fine-tuning process, we propose a dynamic training schedule that adjusts the number of trainable parameters based on their DR-FIM values. At the beginning of training, we fine-tune only the most sensitive $\delta_{min}\%$ of parameters, as ranked by $\mathrm{DRF}_{\theta_i}$. As training progresses, we gradually increase the percentage of fine-tuned parameters, reaching $\delta_{max}\%$ by the end. This ensures that the model starts with a focused fine-tuning process, targeting only the most critical parameters, and progressively expands the fine-tuning scope as the model becomes more stable. Formally, at each training step $t$, the dynamic threshold $\mathrm{DRF}_{thresh}(t)$ is updated as follows:
$$\mathrm{DRF}_{thresh}(t)=\delta_{min}+(\delta_{max}-\delta_{min})\cdot\exp\left(-\frac{t}{T}\right), \tag{16}$$
where $T$ is the total number of training steps. Parameters with $\mathrm{DRF}_\theta$ values higher than the threshold $\mathrm{DRF}_{thresh}(t)$ will be selected for training. The detailed fine-tuning process of our FisherTune is in Algorithm 1.
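The schedule in Eq. (16) can be sketched as below. The sketch implements the formula exactly as printed and assumes, purely for illustration, that the DR-FIM scores have been rescaled to the same range as the threshold; the text does not say how the percentages $\delta_{min}$, $\delta_{max}$ map onto score values. The function names and all numbers are hypothetical.

```python
import math


def drf_threshold(step, total_steps, delta_min, delta_max):
    """Eq. (16): threshold that decays exponentially with the training step."""
    return delta_min + (delta_max - delta_min) * math.exp(-step / total_steps)


def select_mask(dr_fim_scores, threshold):
    """True for parameters whose DR-FIM score exceeds the current threshold."""
    return [score > threshold for score in dr_fim_scores]


# Hypothetical normalized DR-FIM scores and schedule settings.
scores = [0.95, 0.60, 0.30, 0.05]
total_steps = 100

early = select_mask(scores, drf_threshold(0, total_steps, 0.1, 0.9))
late = select_mask(scores, drf_threshold(100, total_steps, 0.1, 0.9))
```

Because the threshold decreases as $t$ grows, the set of parameters that pass it can only widen, which matches the described behavior of starting with the most sensitive parameters and expanding the scope later. As printed, the threshold at $t=T$ equals $\delta_{min}+(\delta_{max}-\delta_{min})/e$ rather than $\delta_{min}$.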
4. Experiments
4.1. Datasets & Setup
See Appendix B.
4.2. Comparison with State-of-the-art Alternatives
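GTAV → C, B, M. Table 1 demonstrates that our approach significantly outperforms other fine-tuning methods across multiple vision foundation models (VFMs). Compared to adapter-based methods (e.g., LoRA and Rein), our approach achieves an average of 4.3% higher mIoU than Rein across five VFM models. Additionally, it surpasses the self-focused parametric fine-tuning method VQT by 3.1% on average. Notably, for models with a substantial gap between pre-training and downstream tasks, such as MAE and EVA02, adapter methods yielded modest improvements of 1.3% and 1.7% mIoU, respectively, whereas our approach achieved 4.6% and 6.6% improvements. Besides, we added comparisons with the state-of-the-art methods using VFMs, and our method remains competitive. The tqdm [42] and VLTSeg [24] methods leverage features of the language model, while Rein-series methods and ours focus on visual models. These results highlight our method's enhanced adaptability to downstream tasks and its significant improvement in model generalization.

GTAV → Cityscapes (Citys) + BDD100K (BDD) + Mapillary (Map)
| VFM type | Fine-tune Method | Trainable Params | Citys | BDD | Map | Avg. |
|---|---|---|---|---|---|---|
| CLIP [46] (ViT-Large) | Full | 304.20M | 51.3 | 47.6 | 54.3 | 51.1 |
| | Freeze | 0M | 53.7 | 48.7 | 55.0 | 52.5 |
| | LoRA [22] | 0.79M | 54.0 | 49.8 | 55.1 | 53.0 |
| | VPT [25] | 3.69M | 54.0 | 51.8 | 57.5 | 54.4 |
| | Rein [60] | 2.99M | 57.1 | 54.7 | 60.5 | 57.4 |
| | VQT [57] | 3.01M | 54.3 | 51.2 | 56.7 | 55.3 |
| | ChildTune [63] | 15.21M | 57.9 | 53.4 | 58.2 | 56.5 |
| | Ours | 15.21M | 59.2 | 57.5 | 61.0 | 59.2 |
| MAE [19] (Huge) | Full | 304.20M | 53.7 | 50.8 | 58.1 | 54.2 |
| | Freeze | 0M | 43.3 | 37.8 | 48.0 | 43.0 |
| | LoRA [22] | 0.79M | 44.6 | 38.4 | 52.5 | 45.2 |
| | VPT [25] | 3.69M | 52.7 | 50.2 | 57.6 | 53.5 |
| | Rein [60] | 2.99M | 55.0 | 49.3 | 58.6 | 54.3 |
| | VQT [57] | 3.01M | 53.3 | 50.3 | 57.7 | 53.8 |
| | ChildTune [63] | 15.21M | 55.4 | 50.6 | 58.1 | 54.7 |
| | Ours | 15.21M | 56.6 | 51.9 | 59.7 | 56.1 |
| SAM [28] (Huge) | Full | 632.18M | 57.6 | 51.7 | 61.5 | 56.9 |
| | Freeze | 0M | 57.0 | 47.1 | 58.4 | 54.2 |
| | LoRA [22] | 0.79M | 57.4 | 47.7 | 58.4 | 54.5 |
| | VPT [25] | 3.69M | 56.3 | 52.7 | 57.8 | 55.6 |
| | Rein [60] | 2.99M | 59.6 | 52.0 | 62.1 | 57.9 |
| | VQT [57] | 3.01M | 56.7 | 53.9 | 59.3 | 56.6 |
| | ChildTune [63] | 15.21M | 60.8 | 49.6 | 61.2 | 57.2 |
| | Ours | 15.21M | 60.9 | 54.4 | 63.9 | 59.7 |
| EVA02 [15] (Large) | Full | 304.20M | 62.1 | 56.2 | 64.6 | 60.9 |
| | LoRA [22] | 0.79M | 55.5 | 52.7 | 58.3 | 55.5 |
| | AdaptFormer [7] | 3.17M | 63.7 | 59.9 | 64.2 | 62.6 |
| | VPT [25] | 3.69M | 62.2 | 57.7 | 62.5 | 60.8 |
| | Rein [60] | 2.99M | 65.3 | 61.1 | 63.9 | 63.4 |
| | VQT [57] | 3.01M | 61.3 | 55.1 | 62.2 | 59.5 |
| | ChildTune [63] | 15.21M | 61.6 | 59.3 | 62.3 | 61.1 |
| | Ours | 15.21M | 65.8 | 61.5 | 66.0 | 64.4 |
| DINOv2 [41] (ViT-Large) | Full | 304.20M | 63.7 | 57.4 | 64.2 | 61.7 |
| | LoRA [22] | 0.79M | 65.2 | 58.3 | 64.6 | 62.7 |
| | AdaptFormer [7] | 3.17M | 64.9 | 59.0 | 64.2 | 62.7 |
| | VQT [25] | 3.01M | 64.6 | 59.0 | 65.7 | 63.1 |
| | Rein [60] | 2.99M | 66.4 | 60.4 | 66.1 | 64.3 |
| | ChildTune [63] | 15.21M | 65.6 | 59.3 | 65.3 | 63.4 |
| | Ours | 15.21M | 68.2 | 63.3 | 68.0 | 66.5 |
| EVA02 | VLTSeg [24] | 304.2M | 65.3 | 58.3 | 66.0 | 63.2 |
| DINOV2 | SDT [66] | 6.94M | 68.1 | 61.6 | 67.7 | 65.8 |
| CLIP+SAM | CLOUDS [42] | 304.2M | 60.2 | 57.4 | 67.0 | 61.5 |
| EVA02 | tqdm [42] | 304.2M | 68.9 | 59.2 | 70.1 | 66.1 |
| EVA02 | Ours | 15.21M | 65.8 | 61.5 | 66.0 | 64.4 |
| DINOV2 | Ours | 15.21M | 68.2 | 63.3 | 68.7 | 66.6 |

Table 1. Performance and Trainable Parameters Comparison with the proposed FisherTune across Multiple VFMs as Backbones under the GTAV → Cityscapes (Citys) + BDD100K (BDD) + Mapillary (Map) generalization setting.

| Target | VFM | Fine-tune Method | Trainable Params | road | side. | build. | wall | fence | pole | light | sign | vege | terr. | sky | pers. | rider | car | truck | bus | train | moto. | bicy. | mIoU |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cityscapes → BDD100K | DINOv2 [5] (Large) | Full | 304.20M | 89.0 | 44.5 | 89.6 | 51.1 | 46.4 | 49.2 | 60.0 | 38.9 | 89.1 | 47.5 | 91.7 | 75.8 | 48.2 | 91.7 | 52.5 | 82.9 | 81.0 | 30.4 | 49.9 | 63.7 |
| Cityscapes → BDD100K | DINOv2 [5] (Large) | Freeze | 0M | 92.1 | 55.2 | 90.2 | 57.2 | 48.5 | 49.5 | 56.7 | 47.7 | 89.3 | 47.8 | 91.1 | 74.2 | 46.7 | 92.2 | 62.6 | 77.5 | 47.7 | 29.6 | 47.2 | 63.3 |
| Cityscapes → BDD100K | DINOv2 [5] (Large) | REIN [60] | 2.99M | 92.4 | 59.1 | 90.7 | 58.3 | 53.7 | 51.8 | 58.2 | 46.4 | 89.8 | 49.4 | 90.8 | 73.9 | 43.3 | 92.3 | 64.3 | 81.6 | 70.9 | 40.4 | 54.0 | 66.4 |
| Cityscapes → BDD100K | DINOv2 [5] (Large) | VQT [57] | 3.01M | 88.3 | 49.9 | 85.9 | 50.7 | 47.9 | 44.3 | 55.6 | 39.2 | 86.1 | 42.8 | 87.5 | 71.3 | 45.4 | 89.4 | 53.5 | 82.6 | 74.9 | 46.1 | 57.4 | 63.1 |
| Cityscapes → BDD100K | DINOv2 [5] (Large) | ChildTune [62] | 15.21M | 92.1 | 56.1 | 91.0 | 58.8 | 46.9 | 52.0 | 58.6 | 47.2 | 90.8 | 47.9 | 93.3 | 72.0 | 47.1 | 93.0 | 63.9 | 76.2 | 47.9 | 28.8 | 48.3 | 63.8 |
| Cityscapes → BDD100K | DINOv2 [5] (Large) | Ours | 15.21M | 92.1 | 55.4 | 90.2 | 58.9 | 50.9 | 54.5 | 59.8 | 49.1 | 92.5 | 52.8 | 91.0 | 73.7 | 51.5 | 92.7 | 67.4 | 82.9 | 72.8 | 44.3 | 54.1 | 67.7 |
| Cityscapes → BDD100K | EVA02 [15] (Large) | Full | 304.20M | 89.3 | 46.9 | 89.9 | 47.7 | 45.6 | 50.1 | 56.8 | 42.2 | 88.8 | 48.4 | 89.9 | 75.8 | 49.0 | 90.5 | 45.3 | 69.2 | 55.9 | 44.4 | 55.1 | 62.1 |
| Cityscapes → BDD100K | EVA02 [15] (Large) | Freeze | 0M | 93.1 | 52.7 | 88.0 | 47.4 | 31.1 | 41.7 | 46.0 | 39.6 | 85.7 | 41.4 | 89.5 | 67.5 | 39.7 | 89.0 | 47.0 | 72.8 | 46.3 | 19.2 | 35.2 | 56.5 |
| Cityscapes → BDD100K | EVA02 [15] (Large) | REIN [60] | 2.99M | 91.7 | 51.8 | 90.1 | 52.8 | 48.4 | 48.2 | 56.0 | 42.0 | 89.1 | 44.1 | 90.2 | 74.2 | 47.0 | 91.1 | 54.5 | 84.1 | 78.9 | 47.2 | 59.4 | 65.3 |
| Cityscapes → BDD100K | EVA02 [15] (Large) | VQT [57] | 3.01M | 90.1 | 46.6 | 91.1 | 46.9 | 46.4 | 51.7 | 56.5 | 43.2 | 89.3 | 49.6 | 92.3 | 75.0 | 50.3 | 90.3 | 44.6 | 71.8 | 57.4 | 44.0 | 55.8 | 62.8 |
| Cityscapes → BDD100K | EVA02 [15] (Large) | ChildTune [62] | 15.21M | 87.9 | 46.5 | 88.1 | 46.5 | 46.1 | 46.1 | 56.0 | 41.5 | 87.9 | 50.3 | 89.6 | 77.7 | 45.6 | 91.4 | 42.4 | 68.1 | 54.7 | 46.0 | 56.8 | 61.5 |
| Cityscapes → BDD100K | EVA02 [15] (Large) | Ours | 15.21M | 92.6 | 49.9 | 95.9 | 51.1 | 53.0 | 50.8 | 59.8 | 45.7 | 92.9 | 54.6 | 94.0 | 83.5 | 52.2 | 93.9 | 45.1 | 69.4 | 57.1 | 47.2 | 62.4 | 65.8 |
| Cityscapes → ACDC | DINOv2 [5] (Large) | Full | 304.20M | 92.8 | 75 | 87.4 | 55.7 | 54.1 | 55.6 | 71.2 | 69.6 | 82.4 | 56 | 92.2 | 66.8 | 45.6 | 89 | 79.7 | 87.9 | 87.5 | 51.4 | 62.7 | 71.7 |
| Cityscapes → ACDC | DINOv2 [5] (Large) | Freeze | 0M | 86.0 | 68.1 | 80.2 | 52.4 | 47.8 | 48.2 | 65.5 | 65.3 | 80.0 | 54.7 | 86.2 | 65.0 | 44.9 | 86.4 | 73.3 | 80.5 | 86.9 | 50.1 | 60.9 | 67.5 |
| Cityscapes → ACDC | DINOv2 [5] (Large) | REIN [60] | 2.99M | 94.6 | 78.3 | 92.0 | 61.9 | 55.0 | 64.8 | 73.8 | 72.7 | 88.4 | 67.4 | 95.4 | 77.1 | 60.2 | 92.6 | 84.1 | 86.9 | 92.5 | 67.6 | 68.6 | 77.6 |
| Cityscapes → ACDC | DINOv2 [5] (Large) | VQT [57] | 3.01M | 93.3 | 76.4 | 89.2 | 55.0 | 53.9 | 53.9 | 72.0 | 67.3 | 83.4 | 55.3 | 95.1 | 67.7 | 47.0 | 90.5 | 81.6 | 86.3 | 88.2 | 50.1 | 61.9 | 72.0 |
| Cityscapes → ACDC | DINOv2 [5] (Large) | ChildTune [62] | 15.21M | 92.9 | 72.8 | 84.7 | 56.6 | 54.1 | 56.8 | 70.9 | 67.7 | 82.3 | 55.7 | 93.6 | 65.9 | 45.3 | 89.6 | 77.6 | 87.8 | 87.0 | 52.5 | 62.2 | 71.4 |
| Cityscapes → ACDC | DINOv2 [5] (Large) | Ours | 15.21M | 95.6 | 79.0 | 96.5 | 60.5 | 58.3 | 64.9 | 75.6 | 77.7 | 85.0 | 61.3 | 98.6 | 73.6 | 51.5 | 94.8 | 85.4 | 94.7 | 93.8 | 59.0 | 66.7 | 77.5 |
| Cityscapes → ACDC | EVA02 [15] (Large) | Full | 304.20M | 90.2 | 68.8 | 81.0 | 53.7 | 49.9 | 48.1 | 68.7 | 64.2 | 80.1 | 57.4 | 88.1 | 68.8 | 41.8 | 89.7 | 74.1 | 82.1 | 89.7 | 50.0 | 56.8 | 68.6 |
| Cityscapes → ACDC | EVA02 [15] (Large) | Freeze | 0M | 86.0 | 60.5 | 76.3 | 49.0 | 41.7 | 46.1 | 60.5 | 61.0 | 72.1 | 49.8 | 77.7 | 56.7 | 40.6 | 80.3 | 68.3 | 77.2 | 85.5 | 46.7 | 56.4 | 62.8 |
| Cityscapes → ACDC | EVA02 [15] (Large) | REIN [60] | 2.99M | 88.7 | 71.8 | 81.7 | 55.2 | 51.7 | 50.5 | 70.5 | 64.9 | 83.7 | 59.0 | 90.3 | 72.0 | 48.3 | 93.0 | 79.3 | 83.3 | 91.3 | 50.8 | 62.0 | 70.9 |
| Cityscapes → ACDC | EVA02 [15] (Large) | VQT [57] | 3.01M | 90.3 | 71.2 | 81.4 | 54.3 | 53.1 | 49.1 | 67.9 | 64.3 | 82.0 | 60.5 | 86.9 | 66.8 | 41.3 | 89.3 | 76.6 | 81.7 | 91.3 | 47.2 | 55.7 | 69.0 |
| Cityscapes → ACDC | EVA02 [15] (Large) | ChildTune [62] | 15.21M | 86.4 | 68.8 | 81.0 | 54.4 | 50.6 | 48.9 | 69.6 | 64.5 | 83.2 | 57.8 | 88.2 | 69.0 | 47.9 | 90.2 | 74.8 | 82.8 | 90.3 | 51.0 | 61.4 | 69.5 |
| Cityscapes → ACDC | EVA02 [15] (Large) | Ours | 15.21M | 90.5 | 75.2 | 83.6 | 58.8 | 54.6 | 52.2 | 73.1 | 66.6 | 85.7 | 60.5 | 90.2 | 70.7 | 51.5 | 92.3 | 82.6 | 88.2 | 91.9 | 54.0 | 62.4 | 72.9 |

Table 2. DGSS generalization performance for each category from the Cityscapes source domain to mixed-domain BDD100K and ACDC, with comparison methods including adaptor-based Rein [60] and selective parameter fine-tuning methods VQT [57] and ChildTune [62].

Cityscapes → Adverse Weather, DINOv2 [5] (Large)
| Fine-tune Method | Trainable Params | Foggy Zurich [49] | Foggy Driving [49] | Dark Zurich [50] | Nighttime Driving [52] | ACDC-Rain [51] | ACDC-Snow [51] | mIoU |
|---|---|---|---|---|---|---|---|---|
| Full | 304.20M | 50.4 | 55.3 | 62.7 | 47.7 | 75.2 | 76.8 | 61.3 |
| Freeze | 0M | 50.3 | 43.7 | 54.3 | 40.8 | 66.1 | 71.7 | 54.5 |
| REIN [60] | 2.99M | 55.5 | 58.2 | 64.3 | 50.3 | 78.2 | 79.5 | 64.3 |
| VQT [57] | 3.01M | 54.1 | 57.1 | 61.9 | 47.4 | 76.1 | 75.3 | 62.0 |
| ChildTune [62] | 15.21M | 55.2 | 56.9 | 64.5 | 50.7 | 77.7 | 78.3 | 63.9 |
| Ours | 15.21M | 56.9 | 60.0 | 66.6 | 53.2 | 78.6 | 82.2 | 66.3 |

Table 3. DGSS performance comparison for Cityscapes as the source domain under diverse weather conditions.

| VFM | Selection | Cityscapes → BDD100K | Cityscapes → ACDC |
|---|---|---|---|
| EVA02 [5] (Large) | Full | 62.1 | 68.6 |
| EVA02 [5] (Large) | Freeze | 56.5 | 62.8 |
| EVA02 [5] (Large) | Random | 61.1 | 67.6 |
| EVA02 [5] (Large) | Random Q | 62.8 | 69.1 |
| EVA02 [5] (Large) | Random K | 61.9 | 68.1 |
| EVA02 [5] (Large) | Random V | 62.9 | 69.2 |
| EVA02 [5] (Large) | $F_\theta$ | 63.8 | 69.5 |
| EVA02 [5] (Large) | $\Delta F_\theta$ | 63.1 | 71.3 |
| EVA02 [5] (Large) | $\mathrm{DRF}_\theta$ | 65.8 (+3.7) | 72.9 (+5.3) |
| DINOv2 [5] (Large) | Full | 63.7 | 71.7 |
| DINOv2 [5] (Large) | Freeze | 63.3 | 67.5 |
| DINOv2 [5] (Large) | Random | 62.7 | 71.0 |
| DINOv2 [5] (Large) | Random Q | 63.2 | 72.0 |
| DINOv2 [5] (Large) | Random K | 63.5 | 72.3 |
| DINOv2 [5] (Large) | Random V | 63.2 | 72.9 |
| DINOv2 [5] (Large) | $F_\theta$ | 63.8 | 71.4 |
| DINOv2 [5] (Large) | $\Delta F_\theta$ | 64.5 | 76.1 |
| DINOv2 [5] (Large) | $\mathrm{DRF}_\theta$ | 67.7 (+4.0) | 77.5 (+5.8) |

Table 4. Ablation study on generalization with 5% fine-tunable parameters in terms of mIoU.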
Cityscapes → BDD100K, ACDC. In migrating from Cityscapes to BDD100K and ACDC, our method achieved strong results, with average mIoU scores of 67.7% and 77.5%. As shown in Table 2, our method's average mIoU on BDD100K is 2.4% higher than REIN. Compared to VQT and ChildTune, our method improved mIoU by 4.4% and 2.6% on the respective datasets, addressing issues in parameter tuning, data adaptation, and migration strategy. These results highlight our method's superior adaptability and generalization in complex scenes.
Cityscapes → Adverse Weather. We evaluated various fine-tuning strategies for DINOv2 models across challenging weather conditions, as shown in Table 3. Our approach achieved an average of 2.0% higher mIoU than the adapter-based REIN method and 4.3% higher than the self-focused VQT approach. This improvement likely stems from the substantial difference between pre-training and downstream tasks. Besides, ChildTune showed limited performance gains, and our method surpassed ChildTune by an average of 2.4% mIoU, demonstrating superior adaptability and generalization under complex weather scenarios.
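The margins quoted in this paragraph can be re-derived from Table 3. The following is a minimal sketch in plain Python; the dictionary `TABLE3` and the helpers `mean` and `margin_over` are illustrative names introduced here, not part of the paper. It recomputes each method's mean over the six adverse-weather benchmarks and the gap between the reported average mIoU of Ours and each baseline.

```python
# Per-benchmark mIoU and reported average, transcribed from Table 3 (DINOv2-Large).
# Column order: Foggy Zurich, Foggy Driving, Dark Zurich, Nighttime Driving,
# ACDC-Rain, ACDC-Snow. All names below are illustrative, not from the paper.
TABLE3 = {
    "Full":      ([50.4, 55.3, 62.7, 47.7, 75.2, 76.8], 61.3),
    "Freeze":    ([50.3, 43.7, 54.3, 40.8, 66.1, 71.7], 54.5),
    "REIN":      ([55.5, 58.2, 64.3, 50.3, 78.2, 79.5], 64.3),
    "VQT":       ([54.1, 57.1, 61.9, 47.4, 76.1, 75.3], 62.0),
    "ChildTune": ([55.2, 56.9, 64.5, 50.7, 77.7, 78.3], 63.9),
    "Ours":      ([56.9, 60.0, 66.6, 53.2, 78.6, 82.2], 66.3),
}


def mean(values):
    """Arithmetic mean of a non-empty list of scores."""
    return sum(values) / len(values)


def margin_over(baseline, method="Ours"):
    """Gap in reported average mIoU (percentage points), rounded to 0.1."""
    return round(TABLE3[method][1] - TABLE3[baseline][1], 1)


# Compare the recomputed per-row mean with the reported mIoU column.
for name, (scores, reported) in TABLE3.items():
    print(f"{name:10s} recomputed={mean(scores):.2f} reported={reported}")

# Margins of Ours over the three baselines discussed in the text.
for baseline in ("REIN", "VQT", "ChildTune"):
    print(f"Ours - {baseline}: {margin_over(baseline):+.1f}")
```

The recomputed means agree with the reported mIoU column up to rounding, so the quoted gaps are differences in percentage points between six-benchmark averages.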
4.3. Ablation Studies
Ablation of DR-FIM effectiveness. As shown in Table 4, randomly selecting Q, K, and V parameters for fine-tuning does not fully leverage the generalization ability of VFMs, leading to lower mIoU. Using FIM (Fθ) for parameter selection improves performance over random choice. Further gains are achieved with ∆Fθ, which better identifies domain-sensitive parameters, especially on ACDC, where severe weather differences pose greater challenges. Our proposed DR-FIM, combining F and ∆F, delivers the best
| Method | EVA02 | EVA02+FP | DINOv2 | DINOv2+FP |
| --- | --- | --- | --- | --- |
| AdaptFormer | 62.6 | 63.3 (+0.7) | 62.7 | 63.7 (+1.0) |
| VPT | 60.8 | 61.8 (+1.0) | 63.3 | 64.1 (+0.8) |
| Rein | 63.6 | 63.9 (+0.3) | 64.3 | 65.0 (+0.7) |
| Ours | 64.4 | 64.5 (+0.1) | 66.3 | 66.5 (+0.2) |

Table 5. Ablation study on Feature Perturbation (FP) using [33].
results, boosting mIoU by +3.7% and +5.3% on Cityscapes → BDD100K and Cityscapes → ACDC for EVA02 (Large), and by +4.0% and +5.8%, respectively, for DINOv2 (Large). These results highlight the effectiveness of our method.
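To make the selection step in this ablation concrete, here is a minimal sketch in plain Python of ranking parameters by a diagonal empirical Fisher score (the mean of squared per-sample gradients) and keeping only a top fraction for fine-tuning. Several things are assumptions for illustration only: the function names, the toy gradients, and in particular `combined_score`, which uses a simple weighted sum of F and |∆F| as a stand-in and is not the paper's DR-FIM formula. Selecting the highest-scoring parameters is likewise an assumption of this sketch.

```python
import math


def diagonal_fisher(per_sample_grads):
    """Empirical diagonal Fisher: mean of squared per-sample gradients."""
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[i] ** 2 for g in per_sample_grads) / n for i in range(dim)]


def combined_score(f_source, f_shifted, lam=1.0):
    """Hypothetical stand-in for a domain-related score: F + lam * |delta F|.

    This weighted sum is an assumption of the sketch, not the DR-FIM definition.
    """
    return [f + lam * abs(fs - f) for f, fs in zip(f_source, f_shifted)]


def top_fraction_mask(scores, fraction):
    """Boolean mask marking the highest-scoring `fraction` of parameters."""
    k = max(1, math.ceil(fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:k])
    return [i in chosen for i in range(len(scores))]


# Toy per-sample gradients for four parameters (made-up numbers).
# Parameter 2 has zero gradient on source data but a large one under a
# simulated domain shift, so only the domain-aware score ranks it first.
grads_source = [[0.1, 2.0, 0.0, 0.5], [0.1, -2.0, 0.0, 0.5]]
grads_shifted = [[0.1, 2.0, 3.0, 0.5], [0.1, -2.0, -3.0, 0.5]]

f_src = diagonal_fisher(grads_source)
f_shift = diagonal_fisher(grads_shifted)

fisher_only = top_fraction_mask(f_src, 0.25)
with_domain = top_fraction_mask(combined_score(f_src, f_shift), 0.25)
print("Fisher only:       ", fisher_only)
print("Fisher + |delta F|:", with_domain)
```

In this toy setup the two criteria pick different parameters, which mirrors the qualitative point of Table 4: a score that also accounts for the change in Fisher information across domains can select parameters that a task-only Fisher score overlooks. The experiments in Table 4 use a 5% budget; the sketch uses 25% only because it has four parameters.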
Ablation of DR-FIM Estimation. Fig. 5 presents the ablation study on the proposed stable estimation method. The results show that while DR-FIM outperforms FIM in parameter evaluation, the effectiveness of DR-FIM is limited by traditional estimation methods. The stable estimation method significantly enhances the accuracy of parameter evaluation for both FIM and DR-FIM. Notably, applying stable estimation to DR-FIM results in an average improvement of 2.6% mIoU, demonstrating superior overall generalization performance.
Ablation of Feature Perturbation. Since we adopt domain simulation augmentation from [32], which is generally considered effective for DG, we also apply it to existing VFM methods for a fair comparison. FisherTune uses feature perturbation (FP) solely for identifying domain-sensitive parameters, not during fine-tuning. As shown in Table 5, FP yields only modest improvements (at most +1.0% mIoU) on GTA→Avg., yet our method still outperforms the others.
4.4. Discussion
Captured domain-sensitive parameters. Fig. 6 illustrates the impact of different estimation methods on parameter sensitivity estimation. (a) shows that parameter sensitivity estimated from the original FIM is generally high, making it difficult to identify the most valuable parameters. (b) demonstrates that incorporating ∆Fθ redefines parameter sensitivity by comprehensively considering both task relevance and domain sensitivity. (c) presents the DR-FIM estimated in a robust way, which highlights important parameters more effectively, aiding in the selection of valuable parameters. Additionally, (c) reveals that important parameters tend to be concentrated in the Q, K, V, and FFN parameters of deeper blocks. Furthermore, the overall sensitivity of Q and K is higher than that of V.
Feature Visualization. Fig. 7 compares the t-SNE visualizations of feature distributions between Rein [60] and FisherTune. FisherTune exhibits a more balanced feature distribution across multiple unseen domains, indicating reduced domain bias and improved generalization.
The Ratio of Fine-tuned Parameters. See Appendix C.
Segmentation Result Visualization. See Appendix D.
Influence of Hyper-parameters. See Appendix E.
Figure 5. Ablation study of estimation ways on Cityscapes → BDD100K (C2B), → ACDC (C2A), and GTAV → Cityscapes (G2C), → BDD100K (G2B) and → Mapillary (G2M).
[Figure 6 panels: (a) FIM; (b) DR-FIM without Stable Estimation; (c) DR-FIM with Stable Estimation]
Figure 6. Diagram of parameter sensitivity estimated by FIM and our DR-FIM using DINOv2-Large, trained on GTAV for DGSS experiments. The Q, K, V, and FFN parameters are arranged in ascending order according to their block indices.
[Figure 7 legend: Night, Rainy, Snow, Foggy]
Figure 7. Comparison of t-SNE feature visualizations: Rein [60] (left) and the proposed FisherTune (right). The model is trained on the Cityscapes → ACDC DGSS task. FisherTune shows a more balanced feature distribution across multiple unseen domains.
5. Conclusion
We propose FisherTune, a fine-tuning method for Vision Foundation Models (VFMs) in DGSS. It introduces the Domain-Related Fisher Information Matrix (DR-FIM) to measure parameter sensitivity to domain shifts, using variational inference for stable estimation. FisherTune enhances domain adaptability while maintaining generalization. We hope it encourages further research on selective fine-tuning to better unlock the generalization potential of VFMs in DGSS and beyond.
References
[1] Alessandro Achille, Michael Lam, Rahul Tewari, Avinash Ravichandran, Subhransu Maji, Charless C Fowlkes, Stefano Soatto, and Pietro Perona. Task2vec: Task embedding for meta-learning. In Proceedings of the IEEE/CVF international conference on computer vision, pages 6430–6439, 2019. 5
[2] Alessandro Achille, Giovanni Paolini, and Stefano Soatto. Where is the information in a deep neural network? arXiv preprint arXiv:1905.12213, 2019. 3
[3] Yasser Benigmim, Subhankar Roy, Slim Essid, Vicky Kalogeiton, and Stéphane Lathuilière. Collaborating foundation models for domain generalized semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3108–3119, 2024. 2
[4] David M Blei, Alp Kucukelbir, and Jon D McAuliffe. Variational inference: A review for statisticians. Journal of the American statistical Association, 112(518):859–877, 2017. 5
[5] Mathilde Caron et al. Dinov2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193, 2023. 1, 7
[6] Prithvijit Chattopadhyay, Kartik Sarangmath, Vivek Vijaykumar, and Judy Hoffman. Pasta: Proportional amplitude spectrum training augmentation for syn-to-real domain generalization. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 19288–19300, 2023. 2
[7] Shoufa Chen, Chongjian Ge, Zhan Tong, Jiangliu Wang, Yibing Song, Jue Wang, and Ping Luo. Adaptformer: Adapting vision transformers for scalable visual recognition. Advances in Neural Information Processing Systems, 35:16664–16678, 2022. 2, 6
[8] Ziyuan Cheng, Ruinian Wan, Meng Li, Feiyu Wang, Chao Xu, and Xiaofei He. Domain generalization via style-efficient perturbation and clustering for intra-domain heterogeneous data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3938–3947, 2022. 2
[9] Sungha Choi, Sanghun Jung, Huiwon Yun, Joanne T Kim, Seungryong Kim, and Jaegul Choo. Robustnet: Improving domain generalization in urban-scene segmentation via instance selective whitening. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11580–11590, 2021. 2
[10] Alexey Dosovitskiy. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020. 2
[11] Qiong Dou, Daniel Ca o de Castro, Konstantinos Kamnitsas, and Ben Glocker. Domain generalization via model-agnostic learning of semantic features. In Advances in Neural Information Processing Systems, pages 6450–6461, 2019. 2
[12] Mohammad Fahes, Tuan-Hung Vu, Andrei Bursuc, Patrick Pérez, and Raoul De Charette. Poda: Prompt-driven zero-shot domain adaptation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 18623–18633, 2023. 2
[13] Mohammad Fahes, Tuan-Hung Vu, Andrei Bursuc, Patrick Pérez, and Raoul de Charette. A simple recipe for language-guided domain generalized segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 23428–23437, 2024. 2
[14] Hao Fang et al. Eva-02: A visual learner for more generalized visual representation learning. In Conference on Computer Vision and Pattern Recognition (CVPR), 2023. 3
[15] Hao Fang et al. Eva-clip: Improving vision-language models with masked modeling. arXiv preprint arXiv:2303.13495, 2023. 1, 6, 7
[16] Ronald A Fisher. On the mathematical foundations of theoretical statistics. Philosophical transactions of the Royal Society of London. Series A, containing papers of a mathematical or physical character, 222(594-604):309–368, 1922. 3, 5
[17] Zhangwei Gao, Zhe Chen, Erfei Cui, Yiming Ren, Weiyun Wang, Jinguo Zhu, Hao Tian, Shenglong Ye, Junjun He, Xizhou Zhu, et al. Mini-internvl: a flexible-transfer pocket multi-modal model with 5% parameters and 90% performance. Visual Intelligence, 2(1):1–17, 2024. 2
[18] Zeyu Han, Chao Gao, Jinyang Liu, Jeff Zhang, and Sai Qian Zhang. Parameter-efficient fine-tuning for large models: A comprehensive survey. arXiv preprint arXiv:2403.14608, 2024. 2
[19] Kaiming He et al. Masked autoencoders are scalable vision learners. In Conference on Computer Vision and Pattern Recognition (CVPR), 2022. 3, 6
[20] Matthew D Hoffman, David M Blei, Chong Wang, and John Paisley. Stochastic variational inference. Journal of Machine Learning Research, 2013. 5
[21] Edward J Hu et al. Lora: Low-rank adaptation of large language models. International Conference on Learning Representations (ICLR), 2022. 2
[22] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. Lora: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685, 2021. 6
[23] Jiaxing Huang, Dayan Guan, Aoran Xiao, and Shijian Lu. Fsdr: Frequency space domain randomization for domain generalization. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6891–6902, 2021. 2
[24] Christoph Hümmer, Manuel Schwonberg, Liangwei Zhou, Hu Cao, Alois Knoll, and Hanno Gottschalk. Strong but simple: A baseline for domain generalized dense perception by clip-based transfer learning. In Proceedings of the Asian Conference on Computer Vision, pages 4223–4244, 2024. 6
[25] Menglin Jia, Luming Tang, Bor-Chun Chen, Claire Cardie, Serge Belongie, Bharath Hariharan, and Ser-Nam Lim. Visual prompt tuning. In European Conference on Computer Vision, pages 709–727. Springer, 2022. 2, 6
[26] Muhammad Uzair Khattak, Hanoona Rasheed, Muhammad Maaz, Salman Khan, and Fahad Shahbaz Khan. Maple: Multi-modal prompt learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19113–19122, 2023. 1