●A total of more than 200 terminologies
●More than 1,097,000 terms
●Collection of 38 terminologies relevant to the ESS
From Idea to Prototype: Using BITS and LLMs to automate the annotation process of SGN Collection Data
// Current Challenges
DFG project number 508107981
Senckenberg Collections:
●More than 40 million physical specimens
●Currently: 1,547,711 digital specimens
○in 124 collections
●Use of mixed languages
●Use of taxonomic structures in object descriptions
●Use of labels and various image data
●Plain text without linked data
Example text entry:
Minerals: 2. Sulfides and Sulfosalts: 2.C: Metal Sulfides, M: S = 1: 1 (and similar): 2.CD.: 2.CD.10: Galenit
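To make the taxonomic structure hidden in such a plain-text entry visible, the following minimal Python sketch splits the example into classification codes and their labels. The regular expression and the function name `split_entry` are assumptions for illustration only, tuned to this single example (reading its last label as "Galenit"); they do not describe the method used in the project.

```python
import re

# Assumed pattern for codes such as "2.", "2.C", "2.CD." or "2.CD.10".
# A bare "1:" (as in "M: S = 1: 1") has no dot and is therefore not a code.
CODE = re.compile(r"(?<![\w.])\d+\.(?:[A-Z]{1,2}\.?(?:\d+)?)?(?=[:\s]|$)")


def split_entry(entry):
    """Split a plain-text entry into (code, label) pairs, in order."""
    matches = list(CODE.finditer(entry))
    levels = []
    # Text before the first code has no code of its own.
    head = entry[:matches[0].start()] if matches else entry
    head = head.strip(" :")
    if head:
        levels.append((None, head))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(entry)
        label = entry[m.end():end].strip(" :")
        levels.append((m.group(), label))
    return levels


ENTRY = ("Minerals: 2. Sulfides and Sulfosalts: 2.C: Metal Sulfides, "
         "M: S = 1: 1 (and similar): 2.CD.: 2.CD.10: Galenit")

for code, label in split_entry(ENTRY):
    print(code, "|", label)
```

The labels recovered this way ("Sulfides and Sulfosalts", "Metal Sulfides, ...", "Galenit") are still free text in mixed languages, so they remain subject to the annotation challenge itself.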
The Challenge:
Identify annotatable chunks of text, or, in other words, recognize noun groups, subgroups, and relevant single nouns across various languages.
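As an illustration of what "noun groups, subgroups, and relevant single nouns" can mean in practice, the sketch below enumerates all contiguous sub-spans of an already tokenized noun group, longest first, as candidates for a terminology lookup. It assumes that an upstream parser has already found the noun group; the function name `lookup_candidates` is hypothetical and the approach is only one possible reading of the task.

```python
def lookup_candidates(tokens):
    """Return every contiguous sub-span of a noun group, longest first."""
    n = len(tokens)
    spans = []
    for length in range(n, 0, -1):           # full group first, single words last
        for start in range(0, n - length + 1):
            spans.append(" ".join(tokens[start:start + length]))
    return spans


print(lookup_candidates(["Metal", "Sulfides"]))
# ['Metal Sulfides', 'Metal', 'Sulfides']
```

Trying the longest span first favours the most specific term; which candidates are actually relevant still has to be decided in a later step.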
// Next Challenges
SGN's Collection Data includes not only texts and descriptions, but also a wide range of
associated media content, from photographs and scans to complex 3D models. This content
provides us with a wealth of insights that can be made accessible through appropriate
semantically annotated textual capture. This process requires the use of suitable multimodal AI
models, which are currently being tested and will be extended in future work.
Privacy Policy vs. Resource Limitations
// Current Experiences
●LLM size matters in many cases
○On the one hand, a large model usually has a better understanding of the tasks, needs and expectations
○On the other hand, its significantly higher resource requirements, particularly in RAM and time, necessitate greater investment
○At the same time, some lightweight models below 32B struggle to follow task specifications and produce well-intentioned but useless output
●Important to understand: The structure of noun phrases in a sentence is not strictly deterministic!
○Even the biggest models may have different ideas in different working sessions
○Simple pre-processing and suggestions, such as pre-selected NP combinations from spaCy, can enhance accuracy in the following steps
●Too much vs. too little specialisation
○Too much specialisation of a model increases the overall effort. Too little reduces effectiveness
○Lack of training resources for a specific model such as BERT and similar variants
●Find the sweet spot:
○As small as possible, but able to perform tasks without hallucination and with reasonable post-processing effort
○Modern technology (LLM Transformer) with prospects for continuous further development
○MoE as an extension of the Transformer: Replacing Feed-Forward Networks by (a subset of) multiple experts
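The MoE idea in the last point can be sketched in a few lines of plain Python: a router scores all experts for an input vector, only the top-k experts are evaluated, and their outputs are mixed with softmax weights. This is a toy illustration with random weights under simplifying assumptions (one token, no training, no load balancing); all names (`make_expert`, `moe_layer`, `top_k`) are hypothetical and do not refer to any specific model.

```python
import math
import random


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(dim, hidden, rng):
    """A tiny feed-forward network: linear -> ReLU -> linear."""
    w1 = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(hidden)]
    w2 = [[rng.gauss(0, 0.1) for _ in range(hidden)] for _ in range(dim)]

    def expert(x):
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
        return [sum(w * hi for w, hi in zip(row, h)) for row in w2]

    return expert


def moe_layer(x, router, experts, top_k=2):
    """Evaluate only the top_k experts chosen by the router and mix their outputs."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    gates = softmax([scores[i] for i in chosen])  # weights over the chosen experts only
    out = [0.0] * len(x)
    for gate, i in zip(gates, chosen):
        y = experts[i](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen


rng = random.Random(0)
DIM, HIDDEN, N_EXPERTS = 4, 8, 4
experts = [make_expert(DIM, HIDDEN, rng) for _ in range(N_EXPERTS)]
router = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N_EXPERTS)]

x = [1.0, 0.5, -0.5, 2.0]
out, chosen = moe_layer(x, router, experts, top_k=2)
print("experts used:", chosen)
```

Because only `top_k` of the experts run per input, the compute per token stays close to that of a smaller dense model, while all expert weights still have to be held in memory.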
Authors:
Alexander Wolodkin
https://orcid.org/0000-0003-1556-8750
Alexander.W[email protected]
Claudia Martens
https://orcid.org/0000-0003-2478-4295
[email protected]
Contact BITS:
[email protected]
Image created by AI
Image partially created by AI
AI-created image based on the current evaluation process
Within BITS, a Terminology Service (TS) will be established for subfields of ESS
// Digital Collection Workflow
AI-created image based on the current workflow process