
When Measures Become Targets: Lessons from Open Science and Machine Learning on the Fragility of Reform

Author: Herrmann, Moritz
Publisher: Zenodo
DOI: 10.5281/zenodo.17280495
Source: https://zenodo.org/records/17280495/files/Poster_OS-conference.pdf
Moritz Herrmann¹,²
¹LMU Munich ²Munich Center for Machine Learning
Insights from methodological ML research
Current common but incomplete understanding of empirical research in ML leads to non-replicable and unreliable findings.
This is one of the "main evidence gaps around current AI capabilities" identified in the International AI Safety Report (Bengio et al., 2025).
Problem 1: Biased experiments and lack of scrutiny
Most method comparisons are carried out as part of a paper proposing a new method and are usually biased in favor of the new method.
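To make Problem 1 concrete, the following sketch simulates a hypothetical paper-style comparison in which the proposed method and the baseline have exactly the same true accuracy, but the proposed method gets a larger tuning budget and only its best run is reported. The Gaussian run-to-run noise model, the true accuracy of 0.80, the noise level, and all function names are illustrative assumptions, not taken from the poster or from Herrmann et al. (2024).

```python
import random


def observed_accuracy(rng: random.Random, true_acc: float, noise_sd: float) -> float:
    """One evaluation run: true accuracy plus run-to-run noise (random seed,
    data split, hyperparameters), modeled as Gaussian (an assumption)."""
    return true_acc + rng.gauss(0.0, noise_sd)


def new_method_win_rate(
    tuning_budget: int,
    n_comparisons: int = 2000,
    true_acc: float = 0.80,
    noise_sd: float = 0.01,
    seed: int = 0,
) -> float:
    """Fraction of comparisons in which the 'new' method beats the baseline,
    although both have the SAME true accuracy (hypothetical setup).

    The new method is run `tuning_budget` times and only its best run is
    reported; the baseline is run once with default settings.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_comparisons):
        best_new = max(
            observed_accuracy(rng, true_acc, noise_sd) for _ in range(tuning_budget)
        )
        baseline = observed_accuracy(rng, true_acc, noise_sd)
        if best_new > baseline:
            wins += 1
    return wins / n_comparisons


for budget in (1, 5, 20):
    print(f"tuning budget {budget:>2}: new method wins {new_method_win_rate(budget):.0%}")
```

Under these assumptions the win rate sits near 50% when both methods get a single run and climbs well above it as the tuning budget grows, even though neither method is truly better.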
Problem 2: Bias towards certain types of research
There is a bias towards formal proofs & application improvements, while purely experimental research is much less incentivized.
Problem 3: Conceptual clarity and operationalization
A lack of 1) clarity about important concepts in ML research and 2) clear operationalization of experiments harms the validity of results.
Scientifically, this calls into question progress in the field.
"[N]on-reproducible single occurrences are of no significance to science."[a]
Practically, this jeopardizes applied researchers’ trust in ML.
This may discourage the use of ML methods, even though these can be beneficial.
A long line of literature has warned against this situation.
[Timeline figure: warnings about and evidence of these problems in ML research, 1988–2023]
Langley (1988): Machine Learning as an experimental science
Hooker (1995): Testing heuristics: We have it all wrong
McGeoch; Johnson
Hand (2006): Classifier technology and the illusion of progress
Drummond (2006): Machine Learning as an experimental science (revisited)
Drummond (2009); Drummond & Japkowicz (2010)
Sculley et al. (2018): Winner’s curse? On pace, progress, and empirical rigor
Henderson et al.; Melis et al.; Lucic et al.
Christodoulou et al.; Raff (2019)
Lones; Trosten; Liao et al.; Bouthillier et al.
Mateus et al.; McElfresh et al.; Kapoor & Narayanan
Ferrari Dacrema et al.; Marie et al.; Narang et al.
Elor & Averbuch-Elor; van den Goorbergh et al.
Mohammadmahdi et al.: Reporting bias when using real data sets to analyze classification performance
For details & the list of references: Herrmann et al. (2024). Position: Why We Must Rethink Empirical Research in Machine Learning.
Lessons from Open Science and ML
The situation in ML research resembles the replication crisis in applied research, with two striking similarities:
1) Neglect of epistemic foundations and of warnings about it
Applied research: Statistical testing: Decades of dispute and warnings! [Timeline: 1942, 1955, 1994, 2019]
2) Measures become targets
Applied research: p-value
Measure of evidence
becomes: indicator of scientific quality → statistical significance
becomes: target in (social) decision-making (publication) → p-hacking
ML research: Accuracy (w.l.o.g.)
Measure of prediction performance
becomes: indicator of scientific quality → state-of-the-art (SOTA) performance
becomes: target in (social) decision-making (publication) → “SOTA-hacking” (Gencoglu et al., 2019)
Goodhart’s law[a]
“When a measure becomes a target, it ceases to be a good measure.”
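The applied-research chain (measure of evidence → target → p-hacking) can be illustrated with a small simulation: when there is no true effect, but an analyst tests several outcomes and reports whichever one reaches p < 0.05, the share of "significant" studies grows roughly as 1 − (1 − α)^m for m independent tests. This is a sketch under stated assumptions (independent outcomes, normally distributed data with known variance, a two-sample z-test in place of whatever test a real study would use); the function names and parameter values are hypothetical. The same selection logic underlies "SOTA-hacking" when many configurations are evaluated against one benchmark.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def two_sample_z_pvalue(a: list[float], b: list[float]) -> float:
    """Two-sided p-value of a two-sample z-test, assuming both groups have a
    known standard deviation of 1 (an assumption to stay dependency-free)."""
    se = (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def p_hacked_success_rate(
    n_outcomes: int,
    n_studies: int = 2000,
    n_per_group: int = 30,
    alpha: float = 0.05,
    seed: int = 0,
) -> float:
    """Fraction of studies declared 'significant' when there is NO true effect,
    but the analyst tests several outcomes and reports any with p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        for _ in range(n_outcomes):
            # Both groups come from the same distribution: the null is true.
            treated = [rng.gauss(0, 1) for _ in range(n_per_group)]
            control = [rng.gauss(0, 1) for _ in range(n_per_group)]
            if two_sample_z_pvalue(treated, control) < alpha:
                hits += 1
                break  # stop at the first "significant" outcome and report it
    return hits / n_studies


for m in (1, 5, 10):
    expected = 1 - (1 - 0.05) ** m  # analytic rate for m independent tests
    print(f"{m:>2} outcomes: simulated {p_hacked_success_rate(m):.2f}, expected {expected:.2f}")
```

With a single pre-specified outcome the false-positive rate stays near the nominal 5%; once the p-value threshold is treated as a target to be reached, the same data-generating process yields "significant" findings far more often.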
Bengio et al. (2025). International AI Safety Report. arXiv preprint arXiv:2501.17805. p. 45.
Herrmann et al. (2024). Position: Why We Must Rethink Empirical Research in Machine Learning. Proceedings of the 41st International Conference on Machine Learning, 235:18228–18247.
[a] Strathern, M. (1997). ’Improving ratings’: audit in the British University system. European Review, 5(3), 305–321. p. 308.
Gencoglu et al. (2019). HARK Side of Deep Learning – From Grad Student Descent to Automated Machine Learning. arXiv preprint arXiv:1904.07633.
Campbell, D. T. (1979). Assessing the impact of planned social change. Evaluation and Program Planning, 2(1), 67–90. p. 85.
Syed, M. (2023). Some data indicating that editors and reviewers do not check preregistrations during the review process. PsyArXiv preprint. p. 1.
Klonsky, E. D. (2025). Campbell’s law explains the replication crisis: Pre-registration badges are history repeating. Assessment, 32(2), 224–234. p. 1.
Consequences for Reform
Reform efforts focus heavily on practices, in particular on replacing quality indicators. Campbell’s law suggests this is no remedy.
Campbell’s law
“The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”
This also applies to "alternative" quality indicators!
Example: Preregistration
“Another useful tool has been converted into an indicator of strong science and a goal in and of itself. [...] For example, there is already evidence that papers seeking PRBs [pre-registration badges] routinely violate the rules and spirit of pre-registration.”
Survey of 201 articles from PLOS journals
“At the editor/reviewer level things look much worse, with only 18% mentioning preregistration, 5% reporting accessing the preregistration, and 3% discussing the relation between the preregistration and the manuscript.”
This is especially problematic because PreReg is not universally applicable:
1) There is a high potential for implicit bias for or against certain research types.
2) E.g., PreReg is not well suited to exploratory research, including some types of empirical ML.
All Reform is Fragile
• There are no one-size-fits-all solutions to the replication crisis.
• We should not uncritically prescribe specific modes of operation.
• A focus on epistemic aspects is as important as changing practices.