scieee Science in your language
[en] (orig)

Using ChatGPT for generating SKOS thesauri from hand-drawn sketches (preprint)

Author: Kraus, Felix; Blumenröhr, Nicolas
Publisher: Zenodo
DOI: 10.5281/zenodo.17310591
Source: https://zenodo.org/records/17310591/files/DH2025__Using_ChatGPT_for_generating_SKOS_thesauri_from_hand_drawn_sketches.pdf
Using Cha GPT o gene a ing SKOS hesau i
om hand-d awn ske ches
Felix K aus[0000−0002−2102−4170], Nicolas Blumen ¨oh [0009−0007−0235−4995]
Ka ls uhe Ins i u e o Technology, Ka ls uhe, Ge many
[email p o ec ed]
Abs ac . This pape shows how Cha GPT can be used o c ea e SKOS
hesau i om hand-d awn ske ches o digi al d a s. The me hod makes
he s a o hesau i c ea ion easie and as e compa ed o s anda d
SKOS edi o s. Using he DHA and TaDiRAH axonomies and a ic ional
language example, he esul s show good accu acy o mos labels and e-
la ions bu also some small e o s, like w ong hie a chy le els o spelling
issues wi h ic ional wo ds. While no pe ec o big hesau i, his ap-
p oach lowe s ba ie s o c ea ing FAIR da a and suppo s acqui ing
knowledge in building SKOS hesau i.
Keywo ds: SKOS · hesau i ·LLM ·Cha GPT ·p omp enginee ing
1 In oduc ion and Mo i a ion
Thesau i and con olled ocabula ies play an impo an ole in o ganizing and
s uc u ing knowledge (Hy ¨onen (2020)), especially when handling humani ies
da a collec ions wi h hei di e se ypes o esea ch objec s. Fo example, hesau i-
s uc u ed me ada a enables esea che s o use compu e so wa e o que y
linked da a and use i , e.g. o da a anno a ion. The de elopmen o such he-
sau i is no a s aigh - o wa d p ocess. Using edi o s o SKOS hesau i1such as
P o ´eg´e (Musen (2015)), VocBench (S ella o e al. (2017)), o ou edi o EVOKS
(E ns e al. (2023)) a he hinde s han ampli ies he c ea i i y and lexibili y
needed especially o he ini ial d a s ha o en imes is an impo an pa o
he esea ch.
In his pape , we explo e u ilizing Cha GPT2, which uses a La ge Language
Model (LLM) designed by OpenAI as base, o ans o ming au oma ed gene -
a ion o SKOS hesau i based on hand-d awn ske ches o digi al d a s c ea ed
in ools like d awio3. This me hod can o e a p agma ic solu ion o educing
he ini ial wo kload, especially when he e is li le p io expe ience in c ea ing
SKOS hesau i. I addi ional esea ch o da a managemen so wa e is used ha
equi es SKOS hesau i, e.g. o (me a)da a managemen , i can be quickly e al-
ua ed i he de eloped hesau us s uc u e ul ils equi emen s posed by such
1Simple Knowledge O ganiza ion Sys em (SKOS) is he mos common da a model
used o ep esen machine- eadable hesau i (Conway e al. (2016)).
2h ps://cha gp .com/
3h ps://www.d awio.com/
2 F. K aus
ools. Na u ally, i is possible any ime o impo he esul ing SKOS ou pu
o Cha GPT in o an edi o o u he e inemen s. Ano he bene i o e con-
en ional edi o s is he possibili y o suppo he use in he lea ning p ocess
o SKOS when making use o he in e ac i e na u e o Cha GPT, e.g. because
selec ed ea u es o he code can be explained o he use .
2 Me hodology
To e alua e he esul s coming ou o Cha GPT, we c ea ed ske ches, bo h hand-
d awn and using d awio. The ske ches a e based on he DHA axonomy4and
on TaDiRAH: Taxonomy o Digi al Resea ch Ac i i ies in he Humani ies5. To
closely esemble eal-wo ld applica ions o his app oach, we emo ed all p op-
e ies om he e ms excep he URI, he English label (skos:p e Label), hi-
e a chy ela ions (skos:na owe and skos:b oade ) and he membe ship o
he concep scheme (skos:inScheme) as well as he decla a ion o he concep
scheme i sel . We hen emo ed selec ed hie a chy b anches o dec ease he num-
be o e ms using ou py hon code (published wi h MIT licence on Gi Hub6).
This led o hesau i small enough o d a by hand in a mind map-like s uc u e
(see ig. 1) which we e hen digi ized. Bo h, he images om he hand-d awn
ske ch and he d a c ea ed wi h d awio we e hen ed in o Cha GPT oge he
wi h he p omp as shown in ig. 2, esul ing in he ou pu o SKOS iles.
Addi ionally, we c ea ed 50 andom ic ional language wo ds (like eumkauh
o o ko ) using (Shack (2011)) o c ea e a hand-d awn, andom hie a chy ou o
i . This p e en s ha he GPT model al eady ”knows” he con en o he o he
wo publicly a ailable hesau i because hey migh ha e been used o aining
he model.
To eplica e ou expe imen s, we published ou da a as a supplemen o his
pape 7. We used he OpenAI 4o-model, only a single p omp inpu wi hou
e ining he ou pu , we used a new cha o e e y p omp and we disabled he
memo y unc ion. Finally, we compa ed he SKOS ou pu o Cha GPT wi h he
hesau i which he hand-d awn ske ches we e based on.
3 Resul s and Limi a ions
When looking a he esul s o he wo hand-d awn hesau i o he size-dec eased
DHA and TaDiRAH hesau i, he esul ing SKOS ile closely ma ches he one we
used as sou ce. The main di e ences we e an in oduced capi aliza ion, change
4h ps:// ocabs.da iah.eu/dha_ axonomy/en/, CC BY 4.0, C ea o s: ACDH-
OEAW Team
5h ps:// ocabs.da iah.eu/ adi ah/en/, CC0, C ea o s: Luise Bo ek, Canan
Has ik, Ve a Kh amo a, Jona han Geige
6h ps://gi hub.com/FelixF izzy/ d - ools/ ee/main/
hie a chy-sub anches, DOI: 10.5281/zenodo.12731609
7h ps://doi.o g/10.5281/zenodo.14290535 (CC-BY-SA)
3
Fig. 1. Ini ial hesau us s uc u e ans e ed in o hand-d awn igu e.
Fig. 2. The iden ical p omp o Cha GPT used o gene a e a SKOS hesau us in u le
se ializa ion using mind map-like hand-d awn d a s.
4 F. K aus
o he label o he uppe mos e m, and h ee e ms one le el oo deep in he
hie a chy, bu s ill in he igh b anch. All o he 35 labels as well as he b oade
and na owe ela ionships we e iden ical.
Fo he d awio case, he o iginal uppe mos e m was one le el oo deep in
he hie a chy and a new one was c ea ed. Fu he mo e, one label was al e ed
om ”Upload” o ”Uploading”, which would hen be mo e consis en wi h he
e m ”Pos ing” in he same le el. All o he labels and ela ions we e co ec .
Examining he Cha GPT ou pu when using he hand-d awn image wi h
ic ional language e ms as inpu , we ind ha all ela ions a e co ec . Di e -
en om he o he cases, mos o he labels a e misspelled. This indica es ha
Cha GPT is using a combina ion o handw i en ex ecogni ion and LLM-
based e o co ec ion o ge he bes esul s. The la e na u ally ails when
using ic ional wo ds. I has o be men ioned ha iden i ying he co ec le -
e s in he image is also challenging o humans, which can be assessed in he
published pape supplemen .
While hese esul s look p omising, i is impo an o keep in mind ha
LLMs a e no de e minis ic and can change he da a and hallucina e. Especially
in hese cases, LLMs migh iola e he copy igh o da a. In ou case, his is a
mino issue because exis ing con en is ans o med, no c ea ed.
We ound ha he p omp gene a ion will s op o a highe numbe o e ms
(in ou case, mo e han abou 100) which poses a big limi a ion. This can be
easily p e en ed by using he OpenAI API o by using any o he LLM ha
accep s image inpu .
4 Ou look and Conclusion
We plan o conduc u he wo k on using Cha GPT o alida ion o SKOS
iles, au oma ed ansla ion o adding desc ip ions and ela ions o e.g. Wikida a
i ems which would enhance he usabili y o ou p oposed app oach.
To summa ize, ou expe imen s s ongly sugges ha Cha GPT o simila
can accele a e he p ocess o c ea ing a SKOS ocabula y om handw i en o
digi al d a s. Se ing aside he equi ed p io knowledge o se ing up he base
s uc u e o a alid SKOS ile, using an edi o equi es manually en e ing all
e ms. E en in cases whe e his can be au omized by handw i en ex ecogni-
ion o simila , adding a e m ela ion would s ill equi e a leas one click o
pas ing and adjus ing one line o code. This p ocess akes a longe han he ew
seconds ha i ook Cha GPT o c ea e he esul . On he downside, his also
poses he dange o misunde s anding impo an p ope ies o he SKOS da a
model and he e o e c ea ing da a model iola ions. Howe e , by a he bigges
bene i is ha he ba ie o c ea ing Findable, Accessible, In e ope able and
Reusable (FAIR)8da a is emendously lowe ed and he lea ning cu e o using
dedica ed edi o s can be la ened by suppo ing he esea che s wi h in e ac i e
help.
8h ps://www.go- ai .o g/ ai -p inciples/
Bibliog aphy
Mike Conway, A em Khojoyan, Fa iba Fana, William Scuba, Melissa Cas ine,
Danielle Mowe y, Wendy Chapman, and Simon Jupp. De eloping a web-based
SKOS edi o . Jou nal o Biomedical Seman ics, 7(1):5, Ap il 2016. ISSN 2041-
1480. h ps://doi.o g/10.1186/s13326-015-0043-z.
Felix E ns , Lau a F ank, and Ge maine G¨o zelmann. EVOKS - Be-
nu ze eundliche E s ellung kon ollie e Vokabula e ¨u die Geis eswis-
senscha en. In FORGE 2023 - Fo schungsda en in Den Geis eswis-
senscha en: Any hing Goes?! Fo schungsda en in Den Geis eswissenscha en
- K i isch Be ach e . Kon e enzabs ac s, T¨ubingen, Ge many, Oc obe 2023.
h ps://doi.o g/10.5281/zenodo.8386468.
Ee o Hy ¨onen. Using he Seman ic Web in digi al humani ies: Shi
om da a publishing o da a-analysis and se endipi ous knowledge dis-
co e y. Seman ic Web, 11(1):187–193, Janua y 2020. ISSN 1570-0844.
h ps://doi.o g/10.3233/SW-190386.
Ma k A. Musen. The P o ´eg´e P ojec : A Look Back and a Look
Fo wa d. AI ma e s, 1(4):4–12, June 2015. ISSN 2372-3483.
h ps://doi.o g/10.1145/2757001.2757003.
Elizabe h Shack. Random Wo d Gene a ion o Fic ional Languages. Wol am,
2011.
A mando S ella o, And ea Tu ba i, Manuel Fio elli, Tiziano Lo enze i, Euge-
niu Cos e chi, Ch is ine Laaboudi, Willem Van Geme , and Johannes Keize .
Towa ds VocBench 3: Pushing Collabo a i e De elopmen o Thesau i and
On ologies Fu he Beyond. In 17 h Eu opean Ne wo ked Knowledge O gani-
za ion Sys ems (NKOS) Wo kshop, olume 1937, pages 39–52, Thessaloniki,
G eece, Sep embe 2017. CEUR Wo kshop P oceedings.