
AI-assisted research data annotation in biomedical consortia

Author: Engel, Felix; Watter, Manuel; Benadi, Gita; Giuliani, Claudia; Kalantari Sarcheshmeh, Aref; Binder, Harald; Kaier, Klaus
Publisher: Zenodo
DOI: 10.5281/zenodo.17242942
Source: https://zenodo.org/records/17242942/files/2025-10-08-OpenScience_Conference-IMBI_Freiburg-print.pdf
Felix Engel, Manuel Watter, Gita Benadi, Claudia Giuliani, Aref Kalantari, Harald Binder, Klaus Kaier
Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center – University of Freiburg
Research Data Management
[Figure: research data management workflow with the stages Assessment of Requirements, Schema Development, Schema Testing & Tuning, Data Annotation, Data Presentation, Data Reuse]
The Institute of Medical Biometry and Statistics (IMBI) supports biomedical collaborative research centers (CRCs) with research data management. Together with consortium researchers, we develop metadata schemas for annotating the CRC research data output. Researchers use IMBI's own research data platform fredato to enrich their datasets with metadata.
Annotation Prediction
In a four-step process pipeline, LLMs analyse a paper published in the CRC to suggest metadata relevant to the research data that were used in the research documented in the publication. Freely identified keywords are grounded by alignment with the PubTator³ software and the CRC's metadata schema. This results in suggestions for both data annotation and schema refinement.
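The grounding step can be illustrated with a minimal sketch. It does not reproduce the pipeline described on the poster: instead of PubTator it uses plain fuzzy string matching from the Python standard library, and the function name `ground_keywords`, the cutoff value, and the example terms are all assumptions made for illustration. Keywords that match a schema term become annotation suggestions; keywords without a match are collected as possible candidates for schema refinement.

```python
from difflib import get_close_matches

def ground_keywords(keywords, schema_terms, cutoff=0.8):
    """Map free-text keywords to schema terms; collect the unmatched ones.

    Hypothetical helper: fuzzy matching stands in for the PubTator-based
    grounding named on the poster.
    """
    # Case-insensitive lookup from normalized form back to the schema spelling.
    lookup = {term.lower(): term for term in schema_terms}
    grounded, unmatched = {}, []
    for kw in keywords:
        hits = get_close_matches(kw.lower().strip(), list(lookup), n=1, cutoff=cutoff)
        if hits:
            grounded[kw] = lookup[hits[0]]   # suggestion for data annotation
        else:
            unmatched.append(kw)             # candidate for schema refinement
    return grounded, unmatched

# Illustrative values only, not taken from any CRC schema.
schema = ["Mus musculus", "RNA sequencing", "Flow cytometry"]
suggested = ["mus musculus", "RNA-sequencing", "proteomics"]
grounded, unmatched = ground_keywords(suggested, schema)
```

In this toy run the first two keywords are aligned with existing schema terms, while "proteomics" remains unmatched and would be passed on for discussion.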
Evaluation by research data creators has demonstrated a high precision of the annotation prediction. Still, researchers are always kept in the loop to eliminate incorrect suggestions and to ensure sufficient metadata coverage. LLM-based annotation prediction is currently being implemented as a feature in fredato.
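As a reminder of what precision measures in this setting, the following sketch computes the share of suggested annotations that a data creator accepted. The function name `precision` and the example values are hypothetical and serve only to illustrate the metric; they are not evaluation results from the poster.

```python
def precision(suggested, approved):
    """Share of suggested annotations that the data creator accepted."""
    suggested, approved = set(suggested), set(approved)
    if not suggested:
        return 0.0  # no suggestions made, nothing to score
    return len(suggested & approved) / len(suggested)

# Toy values for illustration only.
suggested = ["Mus musculus", "RNA sequencing", "liver", "proteomics"]
approved = ["Mus musculus", "RNA sequencing", "liver"]
score = precision(suggested, approved)
```

Precision says nothing about annotations the model failed to suggest, which is one reason the review by researchers also has to check metadata coverage.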
Schema Inference
Large Language Models (LLMs) are presented with current literature from the CRC's area of research. Output is a list of essential keywords that is discussed with CRC researchers and serves as a basis for schema development and refinement.
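One simple way to turn per-paper keyword lists into a shortlist for discussion is to count in how many papers each keyword occurs. The sketch below assumes that the LLM output is already available as one list of keywords per paper; the function name `candidate_schema_terms`, the threshold `min_papers`, and the example keywords are hypothetical and not taken from the poster.

```python
from collections import Counter

def candidate_schema_terms(keywords_per_paper, min_papers=2):
    """Rank keywords by the number of papers in which they occur."""
    counts = Counter()
    for keywords in keywords_per_paper:
        # Normalize and deduplicate within a paper so each paper counts once.
        counts.update({kw.strip().lower() for kw in keywords})
    frequent = [(term, n) for term, n in counts.items() if n >= min_papers]
    # Most frequent first; alphabetical order breaks ties deterministically.
    return sorted(frequent, key=lambda item: (-item[1], item[0]))

# Illustrative keyword lists, one per paper.
papers = [
    ["fibrosis", "Kidney", "eGFR"],
    ["kidney", "fibrosis"],
    ["kidney", "proteomics"],
]
shortlist = candidate_schema_terms(papers)
```

Such a frequency ranking is only a starting point; according to the poster, the keyword list is discussed with CRC researchers before it informs the schema.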
Automated Dataset Detection and Annotation
To take annotation prediction to a new level, we have asked LLMs to search published journal articles for references to published datasets. The models were asked to visit the repository landing pages of these sources and to suggest annotation keywords from the information found there. These suggestions are further enriched with information from the article itself. This approach provides a more complex and richer annotation template for scientists to complete and approve.
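The poster describes an LLM-based search for dataset references. As a much simpler stand-in, the sketch below uses a regular expression to find DOIs in article text and keeps those whose prefix belongs to a known data repository, together with the resolver URL that leads to the landing page. The function name `find_dataset_dois` and the small prefix table are assumptions for illustration; a real system would need a far more complete repository list and would also have to handle accession numbers and plain URLs.

```python
import re

# Pattern loosely based on the commonly used form of modern DOIs.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")

# Hypothetical, deliberately incomplete table of repository DOI prefixes.
REPOSITORY_PREFIXES = {"10.5281": "Zenodo", "10.5061": "Dryad"}

def find_dataset_dois(article_text):
    """Return (doi, repository, landing_url) for DOIs from known repositories."""
    found, seen = [], set()
    for match in DOI_PATTERN.finditer(article_text):
        doi = match.group(0).rstrip(".,;:)")  # drop trailing sentence punctuation
        prefix = doi.split("/", 1)[0]
        if prefix in REPOSITORY_PREFIXES and doi not in seen:
            seen.add(doi)
            found.append((doi, REPOSITORY_PREFIXES[prefix], f"https://doi.org/{doi}"))
    return found

text = (
    "The record is archived at https://doi.org/10.5281/zenodo.17242942. "
    "Methods follow https://doi.org/10.1101/2025.03.04.25323310."
)
hits = find_dataset_dois(text)
```

Here only the Zenodo DOI is reported, because the second DOI belongs to a preprint server rather than a data repository. The landing page URL could then be handed to a model or a scraper to collect candidate annotation keywords.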
AI-Readiness
Complete and machine-intelligible metadata support further processing with tools powered by artificial intelligence (AI).
Engel, F., Benadi, G., Giuliani, C., Werner, J., Watter, M., Zeiser, R., Köttgen, A., Binder, H., & Kaier, K. (2025). Development of Metadata Schemas For Collaborative Research Centers. FreiData. https://doi.org/10.60493/K1XE3-NPC10
Watter, M., Kahle, L., Brunswiek, B., Fichtner, U., Pfaffenlehner, M., Werner, F., Gebele, D., Binder, H., & Knaus, J. (2023). Standardized metadata collection in a research data management tool to strengthen collaboration in Collaborative Research Centers. E-Science-Tage, Heidelberg. https://doi.org/10.11588/HEIDOK.00033131
Giuliani, C., Benadi, G., Engel, F., Werner, J., Watter, M., Schwarzer, G., Groß, O., Zeiser, R., Binder, H., & Kaier, K. (2025). Identifying biomedical entities for datasets in scientific articles – A 4-step cache-augmented generation approach using GPT-4o and PubTator 3.0. medRxiv. https://doi.org/10.1101/2025.03.04.25323310
Watter, M., Giuliani, C., Benadi, G., Engel, F., Binder, H., & Kaier, K. (2025). Automated Identification of Contextually Relevant Biomedical Entities with Grounded LLMs. medRxiv. https://doi.org/10.1101/2025.07.07.25331004
This work is licensed under https://creativecommons.org/licenses/by/4.0/