Fully Algorithmic Librarian: Large-Scale Citation Experiments

Author: Stompor, Tomasz; Zittel, Janina; Koch, Thorsten; Rusch, Beate
Publisher: Zenodo
DOI: 10.5281/zenodo.17200746
Source: https://zenodo.org/records/17200746/files/Poster_FAN_ISSI_final.pdf
Fully Algorithmic Librarian:
Large-Scale Citation Experiments
Tomasz Stompor, Janina Zittel, Thorsten Koch and Beate Rusch
Cooperation: KOBV, Kompetenznetzwerk Bibliometrie
Project Outline
The Fully Algorithmic Librarian (FAN) is an interdisciplinary project at the Zuse Institute Berlin, combining mathematics and library & information science to explore how libraries can use algorithmic methods to better support academic research.
We analyze large-scale citation networks with up to 119M records and 1.4B citations using data from Web of Science [1] and OpenAlex [2], building a citation graph as a foundation for future AI-powered services in libraries. In this process, we develop mathematical methods that can be applied to interdisciplinary citation graphs. The development of the knowledge graph is accompanied by exploiting abstract vectorization using embedding models to compare how they align.
We aim to enable libraries to offer scalable, data-driven, and open bibliometric services, empowering researchers and institutions to better understand and shape the academic landscape. Our mission is to develop an open prototype that can be reimplemented by the scientific community.
Evaluating Scientific Prestige Using Citation Graphs
Evaluating scientific impact requires precise measurement of individual article influence, traditionally assessed through citation-based metrics. Recently, approaches have shifted toward leveraging citation graph structures rather than relying solely on raw citation counts, for example by employing the PageRank method to assess scientific prestige [3].
Beyond ranking influence, PageRank also serves as a valuable tool for comparing bibliometric databases, revealing that citation-based prestige inherently depends on the completeness and accuracy of the chosen dataset.
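To make the idea of graph-based prestige concrete, the following is a minimal sketch of PageRank by power iteration on a toy citation graph. It is not the FAN implementation: the function name, the edge format (citing, cited), the uniform treatment of articles without references, and the toy data are all assumptions made for illustration. The default damping factor of 0.5 is the value quoted on the poster.

```python
def pagerank(edges, damping=0.5, tol=1e-12, max_iter=200):
    """PageRank by power iteration on a directed citation graph.

    edges: iterable of (citing, cited) pairs (format assumed for this sketch).
    Returns a dict mapping each article to its score; scores sum to 1.
    """
    edges = list(edges)
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for citing, cited in edges:
        out_links[citing].append(cited)

    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Articles without references (dangling nodes) spread their score evenly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            if out_links[v]:
                # Each article passes an equal share of its score to every reference.
                share = damping * rank[v] / len(out_links[v])
                for cited in out_links[v]:
                    new_rank[cited] += share
        change = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Toy graph (made up): B, C and D all cite A; D also cites B.
toy_edges = [("B", "A"), ("C", "A"), ("D", "A"), ("D", "B")]
scores = pagerank(toy_edges)
for article in sorted(scores, key=scores.get, reverse=True):
    print(article, round(scores[article], 4))
```

In this toy graph A ranks first because it is cited by every other article, and B ranks above C and D because it receives one citation while they receive none. Removing an edge from the list changes the scores, which is the sense in which prestige depends on dataset completeness.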
The PageRank computation for Web of Science (2000–2021) and OpenAlex (1950–2020) highlights differences in both temporal coverage and citation network structure. Notably, no PageRank is calculated for the most recent 10 years in either dataset, as the metric requires a 10-year citation window.
Beyond this temporal aspect, the results also reveal structural variations between the two citation networks. Most prominently, the WoS is more strongly connected, reflected by a higher average PageRank.
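The effect of a citation window can be illustrated with a short sketch. The exact window definition of Chen et al. [3] is not reproduced here. The code assumes one plausible reading: a citation counts only if it is made within `window` years of the cited article's publication, and an article is scored only if its full window lies inside the dataset. The function name, the `last_year` parameter, and the toy data are hypothetical.

```python
def windowed_edges(edges, pub_year, window=10, last_year=2021):
    """Filter (citing, cited) pairs under an assumed citation-window rule.

    A citation is kept if it is made at most `window` years after the cited
    article appeared, and the cited article's window ends by `last_year`.
    """
    kept = []
    for citing, cited in edges:
        age = pub_year[citing] - pub_year[cited]
        window_complete = pub_year[cited] + window <= last_year
        if window_complete and 0 <= age <= window:
            kept.append((citing, cited))
    return kept


# Toy data (made up for illustration).
pub_year = {"A": 2000, "B": 2005, "C": 2015, "D": 2012, "E": 2020}
edges = [("B", "A"), ("C", "A"), ("E", "D"), ("C", "B")]
print(windowed_edges(edges, pub_year))
```

Under this reading, the citation from C to A is dropped because it arrives 15 years after A was published, and the citation from E to D is dropped because D's window (2012 to 2022) is not yet complete in a dataset ending in 2021. The second rule is what leaves the most recent years without a score.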
Figure: The structural differences of bibliometric datasets illustrated by a PageRank metric following Chen et al. (2023) with a 10-year citation span and a damping factor of 0.5 on WoS and OpenAlex.
Visualizing Publication Networks
Figure: Visualization of a subgraph using citation data from the IPCC Assessment Report 6 (2021) in the field of climate research. The IPCC data was matched with the data from OpenAlex.
Clustering Techniques
Analyzing large citation graphs requires automated classification.
Figure: Comparison of algorithmic clustering labels with WoS labels for a subgraph of WoS on Mathematics and Operations Research & Management Science.
Disciplines often overlap, so we use a multi-label clustering approach based on article similarity derived from citations [4]. The clustering task is to find a soft cluster assignment $X \in [0,1]^{C \times N}$ that minimizes the discrepancy between a similarity $S \in \mathbb{R}^{N \times N}$ observed from citations and the similarity from a predicted distribution:

$$f(X) = \sum_{i=1}^{N} \sum_{j=1}^{N} \left( s_{ij} - \sum_{k=1}^{C} x_{ki} x_{kj} \right)^{2}.$$
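As a small worked example of this objective, the sketch below evaluates it with NumPy for a toy similarity matrix. The function name and the toy matrices are assumptions for illustration. The predicted similarity between articles i and j is the inner product of their cluster-membership columns, so the whole predicted matrix is `X.T @ X`.

```python
import numpy as np


def clustering_objective(X, S):
    """Sum over i, j of (s_ij - sum_k x_ki * x_kj)^2 for X of shape (C, N)."""
    predicted = X.T @ X  # (N, N) matrix with entries sum_k x_ki * x_kj
    return float(np.sum((S - predicted) ** 2))


# Toy example (made up): 4 articles, 2 clusters.
# Articles 0-1 are similar to each other, as are articles 2-3.
S = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
])
X_good = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
])
X_uniform = np.full((2, 4), 0.5)

print(clustering_objective(X_good, S))     # 0.0: the assignment reproduces S exactly
print(clustering_objective(X_uniform, S))  # 4.0: every entry is off by 0.5
```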
The optimization problem is non-convex and large-scale, with number of articles $N > 10^{7}$ and number of clusters $C \sim 100$–$500$.
We use GPU-accelerated gradient descent (CUDA) to scale to massive graphs:
700k-node subgraph: clustering in 30 seconds
Full OpenAlex (60M nodes, 1.2B links): in progress
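The gradient-descent idea can be sketched on the CPU with NumPy for a tiny dense example. This is not the project's CUDA implementation. It assumes that each article's memberships sum to 1, enforced here through a column-wise softmax, and uses a fixed step size. The function names, hyperparameters, and toy similarity matrix are hypothetical. For the squared-error objective with residual R = S - XᵀX, the gradient with respect to X is -2 X (R + Rᵀ).

```python
import numpy as np


def softmax_columns(Z):
    """Column-wise softmax: every column of the result lies in [0, 1] and sums to 1."""
    E = np.exp(Z - Z.max(axis=0, keepdims=True))
    return E / E.sum(axis=0, keepdims=True)


def fit_soft_clusters(S, n_clusters, steps=2000, lr=0.05, seed=0):
    """Plain gradient descent on sum((S - X.T @ X)**2) with X = softmax_columns(Z)."""
    rng = np.random.default_rng(seed)
    # A small random start breaks the symmetry of the uniform assignment.
    Z = rng.normal(scale=0.1, size=(n_clusters, S.shape[0]))
    history = []
    for _ in range(steps):
        X = softmax_columns(Z)
        R = S - X.T @ X                      # residual between observed and predicted
        history.append(float(np.sum(R ** 2)))
        G = -2.0 * X @ (R + R.T)             # gradient of the objective w.r.t. X
        # Chain rule through the column-wise softmax.
        Z -= lr * X * (G - np.sum(X * G, axis=0, keepdims=True))
    return softmax_columns(Z), history


# Toy similarity (made up): two groups of three mutually similar articles.
S = np.kron(np.eye(2), np.ones((3, 3)))
X_fit, history = fit_soft_clusters(S, n_clusters=2)
print(np.round(X_fit, 2))
print(history[0], history[-1])
```

At the scale described above, a dense N×N similarity matrix does not fit in memory, so a real implementation would need sparse similarities and GPU kernels. The sketch only shows the structure of the update.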
             Article 1   Article 2   ...   Article N
Cluster 1    x_11        x_12        ...   x_1N
Cluster 2    x_21        x_22        ...   x_2N
...          ...         ...         ...   ...
Cluster C    x_C1        x_C2        ...   x_CN
Structure of the cluster assignment matrix X
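One way to read multi-label assignments off such a matrix is to keep, for each article (column), every cluster whose membership value reaches a threshold. This is a sketch under assumptions: the threshold of 0.3, the function name, and the toy matrix are hypothetical, and cluster indices are 0-based here.

```python
def labels_from_assignment(X, threshold=0.3):
    """For each article (column of X), list the clusters with membership >= threshold."""
    n_articles = len(X[0])
    return [
        [k for k, row in enumerate(X) if row[i] >= threshold]
        for i in range(n_articles)
    ]


# Toy matrix (made up): 3 clusters (rows) x 3 articles (columns).
X = [
    [0.9, 0.5, 0.1],
    [0.1, 0.5, 0.2],
    [0.0, 0.0, 0.7],
]
print(labels_from_assignment(X))  # [[0], [0, 1], [2]]
```

The second article belongs equally to clusters 0 and 1, so it receives both labels. A hard clustering could not express this kind of overlap between disciplines.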
Use-Case Scenarios
Unlike the evaluative application of bibliometrics through statistical tools, our approach focuses on exploratory methods that could be used in the following scenarios:
automatic clustering and classification
detection of emerging research topics
visualization of interdisciplinary networks
thematic search
assistant for potentially missing citations
reviewer and collaborator recommendation
detection of blind spots: what is not represented in the data?
Publications
[1] We acknowledge the use of WoS through the Kompetenznetzwerk Bibliometrie. Supported via the German Competence Network for Bibliometrics funded by the Federal Ministry of Education and Research (Grant: 16WIK2101A).
[2] J. Priem, H. Piwowar, R. Orr: OpenAlex: A fully-open index of scholarly works, authors, venues, institutions, and concepts. arXiv, https://arxiv.org/abs/2205.01833, 2022.
[3] Y. Chen, T. Koch, N. Zakiyeva, K. Liu, Z. Xu, C-h. Chen, J. Nakano, K. Honda: Article's scientific prestige: Measuring the impact of individual articles in the web of science. Journal of Informetrics, 17, 101379, 2023.
[4] T. Nepusz, A. Petróczi, L. Négyessy, F. Bazsó: Fuzzy communities and the concept of bridgeness in complex networks, Phys. Rev. E 77, 016107, 2008.
[5] V.T. Huong, I. Litzel, T. Koch: Similarity-based fuzzy clustering scientific articles: potentials and challenges from mathematical and computational perspectives, arXiv:2506.04045, 2025.