ORKGEx v2.0: Automated Multimodal Knowledge Extraction for Scalable Scientific Knowledge Graph Construction

Author: Hussein, Hassan; Oelen, Allard; Auer, Sören
Publisher: Zenodo
DOI: 10.5281/zenodo.17703289
Source: https://zenodo.org/records/17703289/files/abstract-aikg-sd-2025-hussein.pdf
AIKG-SD 2025 Summer School co-located with the NFDI4DS Conference 2025
November 25-26, 2025, Berlin, Germany
ORKGEx 2.0: Automated Multimodal Knowledge Extraction for Scalable Scientific
Knowledge Graph Construction
Hassan Hussein, Allard Oelen, and Sören Auer
TIB Leibniz Information Centre for Science and Technology, Hannover, Germany
Abstract
The exponential growth of scientific literature has created an unprecedented challenge:
traditional manual annotation and knowledge extraction processes have become
fundamentally inadequate for meaningful scientific discovery. Current approaches to
literature review and knowledge synthesis create significant bottlenecks in scientific progress
and lead to knowledge fragmentation across disciplines. This work presents ORKGEx 2.0, a
revolutionary Chrome extension that addresses critical limitations of the original system
through fundamental architectural improvements. Building upon our award-winning WEB
2025 conference paper [1], this extended version introduces a guided five-step annotation
workflow that systematically walks researchers through metadata extraction, research field
identification, research problem discovery, template selection, and knowledge graph
integration, significantly reducing the cognitive load and annotation errors that plagued the
earlier version. Our system introduces a sophisticated multimodal approach with human-in-the-loop
integration that processes textual content, images, tables, and figures using GPT-4 Vision.
The human researcher remains central to the annotation process, providing domain expertise
and validation, while the AI handles computationally intensive tasks such as content analysis
and property suggestion. This collaborative approach ensures high-quality annotations while
dramatically reducing manual effort, with the system learning from user feedback to improve
subsequent suggestions and maintain scientific accuracy. The core innovation lies in our
dual-pronged approach combining Retrieval-Augmented Generation (RAG) architecture with
advanced semantic embedding. The RAG system feeds contextual paper content to large
language models for intelligent property suggestions, while our embedding-based similarity
mechanism identifies semantically related research problems and properties across the Open
Research Knowledge Graph (ORKG), significantly reducing annotation redundancy and
enabling researchers to discover hidden connections across disciplines. The multimodal
capabilities extend beyond traditional text processing to include automatic extraction of
structured knowledge from research figures, data visualizations, and tabular content,
addressing a critical gap where valuable information embedded in visual elements typically
remains inaccessible to automated processing systems. This work represents a significant
advancement toward AI-assisted scientific knowledge construction, demonstrating how
intelligent systems can augment human expertise rather than replace it. By addressing the
fundamental challenges of scalability, accuracy, and user experience in scientific annotation,
our system provides a practical solution with the potential to accelerate scientific discovery
and enhance the quality of research knowledge graphs. The seamless integration of
multimodal AI processing, semantic similarity detection, and guided user workflows
establishes a new paradigm for efficiently transforming unstructured scientific literature into
actionable, interconnected knowledge.
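
The embedding-based similarity step described in the abstract can be illustrated with a minimal sketch. This is not the ORKGEx implementation, which the abstract does not detail: a bag-of-words count vector stands in for a learned semantic embedding, and the function names, the threshold value, and the example research problems are all hypothetical. The sketch only shows the general idea of ranking existing research problems by cosine similarity to a new one, so that an annotator could reuse an entry instead of creating a duplicate.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a semantic embedding: a bag-of-words count vector.
    (Assumption: a real system would use a learned embedding model.)"""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(query, candidates, threshold=0.3):
    """Return (score, candidate) pairs at or above the (hypothetical)
    threshold, ordered from most to least similar."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in candidates]
    return sorted([(s, c) for s, c in scored if s >= threshold], reverse=True)


# Hypothetical research problems already present in a knowledge graph.
existing_problems = [
    "knowledge graph construction from scientific literature",
    "image classification with convolutional networks",
    "scientific literature annotation",
]

matches = rank_similar("automated annotation of scientific literature",
                       existing_problems)
for score, problem in matches:
    print(f"{score:.2f}  {problem}")
```

In this toy run the unrelated image-classification entry falls below the threshold and is dropped, while the two literature-related entries are returned in order of similarity. Swapping `embed` for a sentence-embedding model would leave the ranking logic unchanged.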
References
1. H. Hussein, F. Ahmed, A. Oelen, R. Ewerth, and S. Auer, "ORKGEx: Leveraging
Language and Vision Models with Knowledge Graphs for Research Contribution
Annotation," presented at the IARIA WEB Conference, Lisbon, Portugal, Mar. 2025.