Chatting Over Course Material: The Role of Retrieval Augmented Generation Systems in Enhancing Academic Chatbots.

Author: Monteiro, Hélder

Publisher: Zenodo

DOI: 10.5281/zenodo.17735676

Source: https://zenodo.org/records/17735676/files/MA_thesis_Monteiro_2024_LTU.pdf

DEGREE PROJECT
Chatting Over Course Material.
The Role of Retrieval Augmented Generation Systems
in Enhancing Academic Chatbots
Hélder Monteiro
Master Programme in Applied Artificial Intelligence
2024
Luleå University of Technology
Department of Computer Science, Electrical and Space Engineering
Abstract
Large Language Models (LLMs) have the potential to enhance learning among students.
These tools can be used in chatbot systems that allow students to ask questions about
course material, in particular when combined with so-called Retrieval Augmented
Generation systems (RAGs). RAGs allow LLMs to access external knowledge, which improves
tailored responses when used in a chatbot system. This thesis studies different RAGs
through an experimentation approach where each RAG is constructed using a different
set of parameters and tools, including small and large language models. We conclude
by suggesting which of the RAGs best adapts to high school courses in Physics and
undergraduate courses in Mathematics, such that the retrieval systems together with
the LLMs are able to return the most relevant answers from the provided course material.
We end up with two RAG-powered LLMs with different configurations, reaching
64% accuracy in physics and 66% in mathematics.
Preface
In this thesis, I explore retrieval-augmented generation systems (RAGs), which is an
exciting technique for those working with or interested in large language models (LLMs)
and are keen on augmenting their chatbots with external knowledge. Throughout the
document, I walk you through the rationale of the experiments that I conducted and
what they entail, and I conclude with some remarks on the results.
In the project, local LLMs were used: i) to generate synthetic question answer pairs
on publicly available educational material from MIT's OpenCourseWare in order to
experiment with different RAGs; ii) to run different RAGs, and iii) to evaluate the
results.
The hope is that the results presented here are meaningful and can be used to further
provide an understanding of RAGs, tailored to education material, such as class notes,
videos, and audio.
Contents
1 Introduction
1.1 Goals
1.2 Outline
2 Background and related work
2.1 From NLP to LLMs
2.2 Open-source LLMs
2.3 Chatbots in Education
2.4 RAG Techniques
2.5 Synthetic Data Generation
3 Materials and Methods
3.1 Synthetic Data Generation
3.2 Tools and Languages
3.3 Experiment Design
4 Results
4.1 Synthetic QA data
4.2 Retrieval capability
4.3 Q&A Evaluation
5 Discussion and Conclusion
5.1 Discussion
5.2 Conclusion
5.3 Future work
5.4 Ethical considerations
Bibliography
A Appendices
A.1 RAG Scenarios
A.2 Short ground-truth answers in Maths
B Extra figures
B.1 Maths RAG on high performing physics RAG
B.2 Physics RAG on high performing maths RAG
List of Figures
3.1 Schematic of the pipeline to generate synthetic data.
4.1 Count of QA pairs generated.
4.2 Accuracy per each Physics RAG (see table A.1 for detailed configuration
of the RAGs shown in the figure).
4.3 Accuracy per each Maths RAG (see table A.1 for detailed configuration
of the RAGs shown in the figure).
4.4 Count of characters (log-scale) of question, answer (RAG) and ground-truth
answer of the high performing Physics RAG (#2).
4.5 Count of characters (log-scale) of question, answer (RAG) and ground-truth
answer of the high performing Mathematics RAG (#57).
B.1 Count of characters (log-scale) of question, answer (RAG) and ground-truth
answer of Mathematics RAG (#2).
B.2 Count of characters (log-scale) of question, answer (RAG) and ground-truth
answer of Physics RAG (#57).
List of Tables
3.1 Experiment parameters used in the study
A.1 Retrieval-augmented generation (RAG) scenarios used in the experimentation
A.2 Questions, Answers, and Ground Truths
1 Introduction
Since the rise of ChatGPT, there has been a lot of hype around Large Language Models
(LLMs) and chatbot systems. These technologies have enabled us to improve our
workflow (e.g. GitHub Copilot as a code completion tool), and are even being used as study
companions. LLMs are often seen as giant statistical models (Rosenfeld, 2000) trained on
billions of texts from the internet made available through projects like Common Crawl¹.
They are capable of not only generating text in multiple natural languages but also
generating code, doing machine translation and even summarizing texts.
There has been a variety of research on the use of LLMs in education
settings (Vacalopoulou et al., 2024; Alexandra Farazouli and McGrath, 2024; Yu, 2023;
Latif et al., 2024; Xiao et al., 2023; Nechakhin, D'Souza, and Eger, 2024; Yen and Hsu,
2023) with different focuses, including mathematical learning (Yen and Hsu, 2023) and
the impact on teachers' assessments (Alexandra Farazouli and McGrath, 2024). In some
cases, the use of LLMs is encouraged, such as at Stanford University's "Creativity and
Design Thinking Program"² course, where students submit the prompt they used that
gave rise to the solution, thereby evaluating their creativity in writing prompts (Klebahn
and Krakowski, 2023; Leung and Lo, 2024). Different universities and teachers see the
tool differently, either as an enabler of better education (Gašević, Siemens, and Sadiq,
2023) or as a detractor from effective learning (Srishti, 2024), whereby students become
dependent on the tools.
The research in this field has progressed at a steady pace, and has seen the rise of
open-source LLMs and techniques to augment their knowledge using domain-specific
data. The availability of such models has made their adoption on consumer hardware much
easier, thanks to affordable Graphics Processing Unit (GPU) cards and techniques aimed
at compressing them for inference on Central Processing Units (CPUs) only. This means
that anyone with a decent computer can run LLMs and chatbots locally that are
powerful enough to generate text and allow customization to different tasks.
When it comes to augmenting the knowledge of an LLM, one idea is to use what
is called a retrieval augmented generation (RAG) system, which simply put is a way to
integrate external knowledge into the LLM using different tools. This knowledge can
come from different sources: documents, databases, media files, the internet, etc., such
that the model can access them beforehand, in a preprocessed form, and create the
¹ https://commoncrawl.org/
² https://online.stanford.edu/how-you-can-use-chatgpt-increase-your-creative-output
2.4 RAG Techniques
A Retrieval-Augmented Generation (RAG) system is an essential component for
enhancing the knowledge of a large language model (LLM). A RAG-powered LLM
addresses issues such as outdated information and hallucinations (Ding et al., 2024), which
would limit the LLM in providing relevant answers to the user, particularly in the
context of a chatbot. The RAG system works through a combination of data indexing,
retrieval and generation (Gao et al., 2024) in an end-to-end fashion.
In their survey paper, Gao et al., 2024 categorize RAGs into three types: Naive,
Advanced and Modular. The naive RAG consists solely of the common pipeline of indexing,
retrieval and generation. The advanced RAG optimizes the user query by rewriting it (Peng
et al., 2024), so that relevant information is retrieved. This is useful when the query
from the user is too wordy or unclear, which may affect the system's retrieval capability
when searching for similar texts. The modular RAG is more customizable and can integrate
different components within the RAG pipeline.
Es et al., 2023 propose Ragas, a framework to evaluate RAG pipelines using metrics
such as faithfulness, answer relevance and context relevance. The framework can also be used to
generate synthetic data that can then be used to test RAG systems. According to its
documentation³, Ragas seems to require OpenAI's REST API to generate the data, and
there is no mention of usage with local LLMs. Still, the framework gives an idea of what
is possible when it comes to evaluating RAG systems.
Salemi and Zamani, 2024 introduce eRAG, another framework to evaluate RAG
pipelines. This tool evaluates the answers returned by the RAG system against their
ground-truth, meaning that if the retrieval system returns k responses, each of them
is evaluated against the ground-truth and assigned a label, before the final answer is
returned to the user. This differs from a more direct evaluation approach, for instance
the one from Roucher, n.d., which evaluates solely the final response from the RAG
and not each response from the retrieval system. Nonetheless, the authors argue that
their framework is more efficient than existing approaches.
2.5 Synthetic Data Generation
Synthetic data generation is a common practice within AI (Puri et al., 2020;
Shakeri et al., 2020; Riabi et al., 2020; Alberti et al., 2019; Wang et al., 2022; Roucher,
n.d.) to reduce reliance on human annotations, which can be expensive.
Riabi et al., 2020 introduce an approach to cross-lingual synthetic data generation,
making use of an English question and answer (QA) model and translating the generated
pairs into multiple languages. The data is then used to train better multilingual QA
models. The authors say that their approach outperforms English-only baseline models
(Riabi et al., 2020).
Shakeri et al., 2020 build on top of the SQuAD (Rajpurkar et al., 2016) dataset and
generate additional synthetic QA pairs using a transformer-based model. The model
not only generates the data but also filters the best candidates using likelihood score
(Shakeri et al., 2020).
³ https://docs.ragas.io/en/stable/concepts/testset_generation.html
Puri et al., 2020 use the GPT-2 model to generate synthetic QA pairs. They break down
text into paragraphs and pass those to an LLM to generate QA pairs. Quality checks
are done using BERT (Devlin et al., 2019; Javed et al., 2022), which filters out irrelevant
pairs (Puri et al., 2020). A similar approach is taken by Roucher, n.d., who uses the
Mixtral-8x7B-Instruct-v0.1 (Jiang et al., 2024) model, a mixture-of-experts model with
roughly 47 billion parameters in total. The work of Roucher, n.d. differs slightly from
that of Puri et al., 2020 in that the Mixtral model is used for everything: question and
answer generation and evaluation, both of which are done through prompts.
The evaluation consists of three metrics: groundedness, relevance and
standalone scores. Groundedness evaluates the truthfulness of the generated pair
given the retrieved context. Relevance relates to the domain of the data; for instance,
if we want to evaluate the relevance to physics and mathematics, the model will be asked
to assign a score based on the relevance to these fields. The standalone metric scores
the QA pair on whether there is an implicit mention of context in the question, which might
indicate a low-quality generated question. All these metrics by Roucher, n.d. take
values from 1 to 5, and only QA pairs scoring at least 4 on all
three metrics are kept.
These studies are important to our project, especially the work of Roucher, n.d.,
as it can be adapted to run with slightly smaller language models on consumer
hardware, for instance Llama 3 8B⁴ (AI@Meta, 2024), to generate and evaluate synthetic
data, as well as test various RAG systems.
⁴ https://en.wikipedia.org/wiki/Llama_(language_model)
3 Materials and Methods
3.1 Synthetic Data Generation
Since our goal is to experiment with different RAGs, we should have question and answer
pairs to evaluate each RAG that we construct. In figure 3.1 we show the schematic of
the pipeline to generate synthetic data.
[Figure 3.1 (schematic): course folders "./courses/RES.8-009/" and "./courses/18.01/"
-> DirectoryLoader (PyPDFLoader)
-> RecursiveCharacterTextSplitter (chunk size: 2000; chunk overlap: 200;
separators: ["\n\n", "\n", ".", " ", ""])
-> prompt -> Generator LLM (Llama3-8B) -> unfiltered question and answer pairs
-> Evaluator LLM (Llama3-8B): groundedness (1-5), relevance (1-5), standalone (1-5)
-> filtered question and answer pairs (scores >= 4)]
Figure 3.1: Schematic of the pipeline to generate synthetic data.
We follow the work of Roucher, n.d., which uses an open-source LLM to generate
synthetic data. The author uses the Mixtral-8x7B-Instruct-v0.1 (Jiang et al., 2024) model,
whereas in our project we use the Llama3-8B model, as it has shown better performance
compared to earlier variants of the series with comparable size (AI@Meta, 2024) and it
can be run on the computational resources available to us. We modify the code
to allow loading PDF documents from a directory using the DirectoryLoader module from
Langchain¹, to which we pass the PyPDFLoader module as loader class, telling DirectoryLoader that
the expected files are of PDF type and that PyPDFLoader should be used as a parser. As
seen from figure 3.1, after each course folder is loaded separately, the next step is to split
the data. We use Langchain's recursive character splitter with the same parameters as
Roucher, n.d., that is, chunk size of 2000 and chunk overlap of 200. The list of separators
is the default. These parameters were kept to ensure enough text is retrieved (e.g. 2000
characters per chunk, with overlap between chunks of 200 characters). The separators
are the means to split the text, starting with double newlines, followed by newline,
full-stop, space and character level (no space). The separators are tried in that order so that
each split holds as much text as possible within the maximum allowed chunk size, and the splitter
iterates over the separators to find the one that best keeps related text together.
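The separator cascade described above can be illustrated with a minimal pure-Python sketch. This is not LangChain's implementation: the function name `recursive_split` is hypothetical, chunk overlap is left out for brevity, and the tiny chunk size in the usage lines is chosen only to make the behaviour visible.

```python
def recursive_split(text, chunk_size, separators):
    """Split text into chunks of at most chunk_size characters.

    Separators are tried from coarsest to finest; an empty string means
    "cut at character level". Chunk overlap is intentionally not modelled.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    parts = text.split(sep)
    if len(parts) == 1:
        # Separator not present: fall through to the next, finer one.
        return recursive_split(text, chunk_size, finer)

    chunks, current = [], ""
    for part in parts:
        candidate = current + sep + part if current else part
        if len(candidate) <= chunk_size:
            current = candidate  # keep packing text into the current chunk
            continue
        if current:
            chunks.append(current)  # flush the chunk built so far
        if len(part) <= chunk_size:
            current = part
        else:
            # A single piece is still too long: split it with finer separators.
            chunks.extend(recursive_split(part, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


# Separator list as shown in figure 3.1; the sample text is made up.
SEPARATORS = ["\n\n", "\n", ".", " ", ""]
sample = "aaaa\n\nbbbb\n\ncccc dddd eeee"
chunks = recursive_split(sample, chunk_size=10, separators=SEPARATORS)
print(chunks)
```

With a chunk size of 10, the first two paragraphs fit together in one chunk, while the third is too long and is split again on spaces.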
Both the generator LLM, which generates the QA pairs, and the evaluator LLM, which
scores the QA pairs across three metrics (groundedness, relevance and
standalone), use the Llama3-8B model with the same prompts as Roucher, n.d.
However, we modified the relevance prompt so that the scoring ensures
that the synthetic questions are relevant to physics and mathematics.
• Relevance score (Physics)
The relevance score is given depending on how useful this question
can be to high school seniors taking the course Introduction To
Oscillations And Waves.
• Relevance score (Mathematics)
The relevance score is given depending on how useful this question
can be to undergraduate students taking the course Single Variable
Calculus.
Once the question and answer pairs are scored, we filter them to retain only those with
a score greater than or equal to 4 on each of the three metrics.
3.2 Tools and Languages
Throughout the project we use the Python² programming language to generate synthetic
data and test multiple RAG pipelines. Orchestration tools are required to allow us to
make use of the LLMs for building applications. Most components needed for building
RAG-powered LLMs are provided by the orchestration tool. These include components
for text splitting, retrieval and storage in semantic databases. Usually these components
are third-party libraries that are integrated into the orchestration tool. For our project,
we use LangChain as orchestration tool as it is, besides LlamaIndex, one of the easiest and
most comprehensive tools that currently exist to programmatically interact with LLMs.
In order to run local LLMs we use Ollama, as it optimizes running local LLMs
on different hardware (Zimmermann and Rohrer, 2024), with or without a GPU, and for
its simplicity and ease of integration with different orchestration tools like LangChain.
Ollama works in a similar manner to Docker, allowing us to "pull" models from a repository
and run them as local LLM REST APIs.
¹ https://en.wikipedia.org/wiki/LangChain
² https://www.python.org/
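To make the "local LLM REST API" idea concrete, the following sketch builds a request for Ollama's `/api/generate` endpoint using only the Python standard library. It assumes an Ollama server on its default local port (11434) with a model tagged `llama3` already pulled; the helper names and the example question are hypothetical, and the project itself accesses Ollama through LangChain rather than raw HTTP calls.

```python
import json
import urllib.request

# Default address of a locally running Ollama server (assumption).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model, prompt):
    """Build (but do not send) a non-streaming generation request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt):
    """Send the request; needs a running Ollama server with the model pulled."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]


request = build_generate_request("llama3", "What is a standing wave?")
print(request.get_method(), request.full_url)
```

Calling `generate("llama3", "...")` would return the model's text once the server is running; building the request separately keeps the sketch inspectable without a server.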
3.3 Experiment Design
A major part of the project consists of experimentation: first the generation of synthetic QA
data and then the evaluation of different RAG systems that we carefully designed considering
time and resource constraints. To generate the synthetic data and test our RAGs, we focus
on subjects related to the undergraduate Mathematics course on Single Variable Calculus
(Jerison, 2006) and the high school Physics course on Introduction to Oscillation and Waves
(Williams, 2017), both from MIT's OpenCourseWare. We picked these subjects as we
find that designing RAG-powered LLMs for mathematics and natural science subjects
like Physics is more interesting from an application point of view: these subjects are
hard to master, and thus it would be useful for students in general to enhance their
learning with a RAG-powered LLM tailored to this type of educational material.
Table 3.1 lists the parameters used in our experiments. Taking the Cartesian
product of the parameter values (two values for each of six parameters), we have a total
of 64 RAG scenarios that we test per course subject. For detailed combinations, see
table A.1 in the appendix.
Parameter          Values
Chunk Sizes        500, 1000
Overlaps           50, 100
Vector stores      Chroma, FAISS
Models             Phi3, Llama3
Embedding Models   mxbai-embed-large, llama3
Text Splitters     CharacterTextSplitter, RecursiveCharacterTextSplitter
Table 3.1: Experiment parameters used in the study
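The scenario count can be checked with a short Python sketch that enumerates the Cartesian product of the values in table 3.1. The dictionary keys and variable names are chosen here for illustration, and the enumeration order is not claimed to match the scenario numbering of table A.1.

```python
from itertools import product

# Parameter values as listed in table 3.1; key names are illustrative.
PARAMETERS = {
    "chunk_size": [500, 1000],
    "overlap": [50, 100],
    "vector_store": ["Chroma", "FAISS"],
    "model": ["Phi3", "Llama3"],
    "embedding_model": ["mxbai-embed-large", "llama3"],
    "text_splitter": ["CharacterTextSplitter", "RecursiveCharacterTextSplitter"],
}

# One dictionary per scenario: every combination of one value per parameter.
scenarios = [
    dict(zip(PARAMETERS, values)) for values in product(*PARAMETERS.values())
]

SUBJECTS = ["Physics", "Mathematics"]
print(len(scenarios), len(scenarios) * len(SUBJECTS))
```

Six parameters with two values each give 2^6 = 64 scenarios, and running each on both subjects gives 128 RAGs in total.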
For the chunk sizes, the idea was to have a balance between small (500) and
large (1000). The overlaps were chosen in similar fashion, small (50) and large (100).
The vector store is where we store the external knowledge to do semantic search. We
used two that are popular, Chroma³ and Facebook AI Similarity Search (FAISS) (Douze
et al., 2024). These are used without any customization, meaning that we use their default
parameters in our experimentation pipeline. We focus on
finding the best RAG using default parameters only, as these can be
fine-tuned later on once the RAG is in use.
³ https://docs.trychroma.com/
The models we use are Phi-3 (Abdin et al., 2024) from Microsoft and Llama-3
(AI@Meta, 2024) from Meta AI. These two models provide a good balance between
small (3.8B parameters in Phi-3) and large (8B parameters in Llama-3), so we can study
whether model size affects the generation quality within the RAG-powered LLM. The same can
be studied with the embedding models, which are responsible for creating a vector space
in which semantic search can be carried out. We test two models: mxbai-embed-large
(Sean Lee, 2024; Li and Li, 2023) from MixedBread AI, which has only 335M parameters,
and Llama-3. The LLMs need prompts to guide them through the
task. The following prompt was used in the experiments:
Answer the question using only on the provided context.
Only respond to what was asked without repeating the question.
The response should be concise and 'straight to the point'.
If you are unable to answer the question, say "I don't know".
Context:
{context}
Question:
{question}
For text splitting, we use character-level and recursive character-based text
splitters. The difference between them is that the character splitter splits the text
on a single separator (in LangChain, a double newline by default), whereas the recursive
splitter allows us to decide on a list of text
separators to consider, where each one is tried in turn until chunk size usage is maximized. If
a recursive character splitter has only the empty string as separator, it splits at character level.
Evaluation of the RAGs is done using the prompt with the five scores proposed
by Roucher, n.d., ranging from 1 for completely incorrect/inaccurate to 5 for completely
correct/accurate. The prompt is passed to the local LLM, in our case Llama3-8B, which
acts as a judge of the generation quality of the RAGs compared against the
ground-truth answers. In order to calculate accuracy, we choose a score of 3 as the cut-off for
accurate results, as this score implies a somewhat correct/accurate response from the
RAG.
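Under this cut-off, accuracy is the share of judged answers with a score of 3 or more. The sketch below shows the calculation; the function name and the example scores are hypothetical, and treating a score of exactly 3 as accurate is our reading of the cut-off described above.

```python
def accuracy(judge_scores, cutoff=3):
    """Fraction of answers whose judge score (1 to 5) reaches the cut-off."""
    if not judge_scores:
        raise ValueError("no scores to evaluate")
    accurate = sum(1 for score in judge_scores if score >= cutoff)
    return accurate / len(judge_scores)


# Made-up judge scores for one RAG scenario, for illustration only.
example_scores = [1, 3, 5, 2]
print(accuracy(example_scores))
```

Here two of the four answers (scores 3 and 5) count as accurate, giving an accuracy of 0.5.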

4 Results
In this section, we present the results of the experiments, where we ran a pipeline to test
different RAG combinations.
4.1 Synthetic QA data
We generated a total of 183 QA pairs for physics and 200 for mathematics. After keeping
only pairs with relevance, groundedness and standalone scores greater than or equal to 4, we obtained
119 QA pairs for physics and 137 pairs for mathematics; see figure 4.1 for details.
[Figure 4.1 (bar chart). x-axis: Subjects (Physics, Mathematics; unfiltered vs. filtered);
y-axis: Count (QA pairs). Values: Physics 183 unfiltered, 119 filtered;
Mathematics 200 unfiltered, 137 filtered.]
Figure 4.1: Count of QA pairs generated.
This generated data was then used to test the 128 RAGs (64 scenarios for each of the
two subjects) that we created with different parameters.
4.2 Retrieval capability
After running the generated data on each of the 128 RAGs, we obtained interesting
results. For the Physics RAGs (see figure 4.2), the maximum accuracy was 64%, which
was achieved with RAG #2, having a chunk size of 500, an overlap of 50, Chroma as vector store,
CharacterTextSplitter as text splitter, mxbai-embed-large as embedding model
and Llama-3 as main LLM.
[Figure 4.2 (bar chart). x-axis: Scenario / RAG (Physics), scenarios 1 to 64;
y-axis: Accuracy (0.0 to 0.7); highest bar annotated 0.64.]
Figure 4.2: Accuracy per each Physics RAG (see table A.1 for detailed configuration of
the RAGs shown in the figure).
For the mathematics RAGs (see figure 4.3), the maximum accuracy of 66% was achieved by
RAG #57, having a chunk size of 1000, an overlap of 100, Chroma as vector store,
RecursiveCharacterTextSplitter as text splitter, mxbai-embed-large as embedding model and
Phi-3 as main LLM. This differs from the best Physics RAG in chunk size, overlap, text
splitter and main LLM, while the vector store and embedding model are the same.
[Figure 4.3 (bar chart). x-axis: Scenario / RAG (Mathematics), scenarios 1 to 64;
y-axis: Accuracy (0.0 to 0.7); highest bar annotated 0.66.]
Figure 4.3: Accuracy per each Maths RAG (see table A.1 for detailed configuration of
the RAGs shown in the figure).
4.3 Q&A Evaluation
For the highest performing RAGs for physics and mathematics we plotted the character
count of the questions created during the synthetic data generation process, and compared
it with the count of the ground-truth answer and the answer returned by the individual RAGs.
From figure 4.4, we see that the RAG answers are of comparable size to the ground
truth, even though there are some peaks in either of them. On the other hand, for the
mathematics RAG that performed best (see figure 4.5), the character count
is highest in most cases for the generated answer. The ground-truth
answers were relatively short.
[Figure 4.4 (line plot): "Count of characters // RAG 2 // Physics".
x-axis: Question/Answer index (0 to 120); y-axis: Character count (log scale);
series: Question, Answer, Ground Truth.]
Figure 4.4: Count of characters (log-scale) of question, answer (RAG) and ground-truth
answer of the high performing Physics RAG (#2).
Gunes, Yasin Celal and Turay Cesur (2024). “A Comparative Study: Diagnostic Performance
of ChatGPT 3.5, Google Bard, Microsoft Bing, and Radiologists in Thoracic
Radiology Cases”. In: medRxiv, pp. 2024–01.
Gwon, Yong Nam, Jae Heon Kim, Hyun Soo Chung, Eun Jee Jung, Joey Chun, Serin
Lee, and Sung Ryul Shim (2024). “The Use of Generative AI for Scientific Literature
Searches for Systematic Reviews: ChatGPT and Microsoft Bing AI Performance
Evaluation”. In: JMIR Medical Informatics 12, e51187.
Hochreiter, Sepp and Jürgen Schmidhuber (1997). “Long short-term memory”. In: Neural
computation 9.8, pp. 1735–1780.
Hum, Yan Chai, Yee Kai Tee, Wun-She Yap, Hamam Mokayed, Tian Swee Tan, Maheza
Irna Mohamad Salim, and Khin Wee Lai (2022). “A contrast enhancement framework
under uncontrolled environments based on just noticeable difference”. In: Signal
Processing: Image Communication 103, p. 116657.
Ide, Nancy and Jean Véronis (1998). “Introduction to the special issue on word sense
disambiguation: the state of the art”. In: Computational linguistics 24.1, pp. 1–40.
Javed, Saleha, Fredrik Sandin, Hamam Mokayed, Jerker Delsing, and Marcus Liwicki
(2022). “Deep Ontology Alignment with BERT INT: Improvements and Industrial
Internet of Things (IIoT) Case Study”. In.
Javed, Saleha, Muhammad Usman, Fredrik Sandin, Marcus Liwicki, and Hamam Mokayed
(2023a). “Deep Ontology Alignment Using a Natural Language Processing Approach
for Automatic M2M Translation in IIoT”. In: Sensors 23.20, p. 8427.
Javed, Salman, Aparajita Tripathy, Jan van Deventer, Hamam Mokayed, Cristina
Paniagua, and Jerker Delsing (2023b). “An approach towards demand response
optimization at the edge in smart energy systems using local clouds”. In: Smart Energy
12, p. 100123.
Jerison, David (2006). Single Variable Calculus. MIT OpenCourseWare: Massachusetts
Institute of Technology. 18.01 (Fall 2006). Licensed under CC BY-NC-SA 4.0. Available
at https://ocw.mit.edu/courses/18-01-single-variable-calculus-fall-2006/.
Jiang, Albert Q., Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra
Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume
Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock,
Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El
Sayed (2023). Mistral 7B. arXiv: 2310.06825 [cs.CL].
Jiang, Albert Q, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary,
Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian
Bressand, et al. (2024). “Mixtral of experts”. In: arXiv preprint arXiv:2401.04088.
Jiao, Xiaoqi, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang,
and Qun Liu (2019). “Tinybert: Distilling bert for natural language understanding”.
In: arXiv preprint arXiv:1909.10351.
Jurafsky, Daniel and James H Martin (n.d.). Speech and Language Processing: An
Introduction to Natural Language Processing, Computational Linguistics, and Speech
Recognition.
Khalid, Marzuki, Rubiyah Yusof, and Hamam Mokayed (2011). “Fusion of multi-classifiers
for online signature verification using fuzzy logic inference”. In: International Journal
of Innovative Computing 7.5, pp. 2709–2726.
Klebahn, Perry and Sebastian Krakowski (2023). How You Can Use ChatGPT to Increase
Your Creative Output. https://online.stanford.edu/how-you-can-use-chatgpt-increase-your-creative-output.
Accessed: 2024-04-10.
Lan, Zhenzhong, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma,
and Radu Soricut (2020). ALBERT: A Lite BERT for Self-supervised Learning of
Language Representations. arXiv: 1909.11942 [cs.CL].
Latif, Ehsan, Luyang Fang, Ping Ma, and Xiaoming Zhai (2024). Knowledge Distillation
of LLM for Automatic Scoring of Science Education Assessments. arXiv: 2312.15842
[cs.CL].
Leung, Rosanna and Iris Sheung Ting Lo (2024). “Can ChatGPT Inspire
Me? Evaluate Students' Questioning Techniques on AI Tool for Overcoming
Fixation”. In: Information and Communication
Technologies in Tourism 2024: ENTER 2024 International eTourism Conference,
Izmir, Türkiye, January 17–19. Springer Nature, p. 75.
Li, Xianming and Jing Li (2023). “AnglE-optimized Text Embeddings”. In: arXiv preprint
arXiv:2309.12871.
Li, Yuhua, David McLean, Zuhair A Bandar, James D O'shea, and Keeley Crockett
(2006). “Sentence similarity based on semantic nets and corpus statistics”. In: IEEE
transactions on knowledge and data engineering 18.8, pp. 1138–1150.
Lieb, Anna and Toshali Goel (2024). “Student Interaction with NewtBot: An LLM-as-tutor
Chatbot for Secondary Physics Education”. In: Extended Abstracts of the 2024
CHI Conference on Human Factors in Computing Systems. CHI EA '24.
Honolulu, HI, USA: Association for Computing Machinery.
isbn: 9798400703317. doi: 10.1145/3613905.3647957.
url: https://doi.org/10.1145/3613905.3647957.
Liu, Yinhan, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer
Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov (2019). RoBERTa: A
Robustly Optimized BERT Pretraining Approach. arXiv: 1907.11692 [cs.CL].
Luo, Ziyang, Can Xu, Pu Zhao, Qingfeng Sun, Xiubo Geng, Wenxiang Hu, Chongyang
Tao, Jing Ma, Qingwei Lin, and Daxin Jiang (2023). “Wizardcoder: Empowering code
large language models with evol-instruct”. In: arXiv preprint arXiv:2306.08568.
Maryamah, Maryamah, Muhammad Maula Irfani, Edric Boby Tri Raharjo, Netri Alia
Rahmi, Mohammad Ghani, and Indra Kharisma Raharjana (2024). “Chatbots in
Academia: A Retrieval-Augmented Generation Approach for Improved Efficient
Information Access”. In: 2024 16th International Conference on Knowledge and Smart
Technology (KST), pp. 259–264. doi: 10.1109/KST61284.2024.10499652.
Meta Platforms, Inc. (2023). Llama 2 License Agreement. https://github.com/meta-llama/llama/blob/main/LICENSE.
Version Release Date: July 18, 2023. Ireland and
USA: Meta Platforms.
Minaee, Shervin, Tomas Mikolov, Narjes Nikzad, Meysam Chenaghlu, Richard Socher,
Xavier Amatriain, and Jianfeng Gao (2024). “Large language models: A survey”. In:
arXiv preprint arXiv:2402.06196.
Mokayed, Hamam, Liang Kim Meng, Hon Hock Woon, and Ng Hooi Sin (2014). “Car
plate detection engine based on conventional edge detection technique”. In: The
International Conference on Computer Graphics, Multimedia and Image Processing
(CGMIP2014). The Society of Digital Information and Wireless Communication.
Mokayed, Hamam, Amirhossein Nayebiastaneh, Lama Alkhaled, Stergios Sozos, Olle
Hagner, and Björn Backe (2024). “Challenging YOLO and Faster RCNN in Snowy
Conditions: UAV Nordic Vehicle Dataset (NVD) as an Example”. In: 2024 2nd
International Conference on Unmanned Vehicle Systems-Oman (UVS). IEEE, pp. 1–6.
Mokayed, Hamam, Amirhossein Nayebiastaneh, Kanjar De, Stergios Sozos, Olle Hagner,
and Björn Backe (2023). “Nordic Vehicle Dataset (NVD): Performance of vehicle
detectors using newly captured NVD from UAV in different snowy weather
conditions.” In: Proceedings of the IEEE/CVF conference on computer vision and pattern
recognition, pp. 5313–5321.
Mokayed, Hamam, Shivakumara Palaiahnakote, Lama Alkhaled, and Ahmed N AL-Masri
(2022). “License plate number detection in drone images”. In: Artificial Intelligence
and Applications.
Mokayed, Hamam, Palaiahnakote Shivakumara, Marcus Liwicki, and Umapada Pal (2020).
“A new defect detection method for improving text detection and Recognition
performances in natural scene images”. In: 2020 Swedish Workshop on Data Science
(SweDS). IEEE, pp. 1–7.
Mokayed, Hamam, Palaiahnakote Shivakumara, Rajkumar Saini, Marcus Liwicki, Loo
Chee Hin, and Umapada Pal (2021). “Anomaly detection in natural scene images
based on enhanced fine-grained saliency and fuzzy logic”. In: IEEE Access 9,
pp. 129102–129109.
Nasukawa, Te suya and Jeonghee Yi (2003). “Sen imen analysis: Cap u ing a o abili y
using na u al language p ocessing”. In: P oceedings o he 2nd in e na ional con e -
ence on Knowledge cap u e, pp. 70–77.
Na igli, Robe o (2009). “Wo d sense disambigua ion: A su ey”. In: ACM compu ing
su eys (CSUR) 41.2, pp. 1–69.
Nechakhin, Vladysla , Jenni e D’Souza, and S e en Ege (2024). E alua ing La ge Lan-
guage Models o S uc u ed Science Summa iza ion in he Open Resea ch Knowledge
G aph. a Xi : 2405.02105 [cs.AI].
Neupane, Subash, Elias Hossain, Jason Kei h, Himanshu T ipa hi, Fa bod Ghiasi, Noo -
bakhsh Ami i Golila z, Amin Ami la i i, Sudip Mi al, and Shah am Rahimi (2024).
F om Ques ions o Insigh ul Answe s: Building an In o med Cha bo o Uni e si y
Resou ces. a Xi : 2405.08120 [cs.ET].
Nikolaidou, Kons an ina, Geo ge Re sinas, Vincen Ch is lein, Ma hias Seu e , Gio gos
S ikas, Elisa Ba ney Smi h, Hamam Mokayed, and Ma cus Liwicki (2023). “Wo d-
s ylis : s yled e ba im handw i en ex gene a ion wi h la en di usion models”. In:
In e na ional Con e ence on Documen Analysis and Recogni ion. Sp inge Na u e
Swi ze land Cham, pp. 384–401.
OpenAI e al. (2024). GPT-4 Technical Repo . a Xi : 2303.08774 [cs.CL].
Peng, Wenjun, Guiyang Li, Yue Jiang, Zilong Wang, Dan Ou, Xiaoyi Zeng, De ong
Xu, Tong Xu, and Enhong Chen (2024). “La ge language model based long- ail
que y ew i ing in aobao sea ch”. In: Companion P oceedings o he ACM on Web
Con e ence 2024, pp. 20–28.
Pickering, Martin J and Roger PG Van Gompel (2006). “Syntactic parsing”. In:
Handbook of psycholinguistics. Elsevier, pp. 455–503.
Puri, Raul, Ryan Spring, Mostofa Patwary, Mohammad Shoeybi, and Bryan Catanzaro
(2020). “Training question answering models from synthetic data”. In: arXiv preprint
arXiv:2002.09599.
Radev, Dragomir, Eduard Hovy, and Kathleen McKeown (2002). “Introduction to the
special issue on summarization”. In: Computational linguistics 28.4, pp. 399–408.
Radford, Alec, Karthik Narasimhan, Tim Salimans, Ilya Sutskever, et al. (2018).
“Improving language understanding by generative pre-training”.
Rajpurkar, Pranav, Jian Zhang, Konstantin Lopyrev, and Percy Liang (2016). “Squad:
100,000+ questions for machine comprehension of text”. In: arXiv preprint arXiv:1606.05250.
Ram, Bal and Pratima Verma (2023). “Artificial intelligence AI-based Chatbot study
of ChatGPT, Google AI Bard and Baidu AI”. In: World Journal of Advanced
Engineering Technology and Sciences 8.01, pp. 258–261.
Resnik, Philip (1999). “Semantic similarity in a taxonomy: An information-based
measure and its application to problems of ambiguity in natural language”. In: Journal
of artificial intelligence research 11, pp. 95–130.
Riabi, Arij, Thomas Scialom, Rachel Keraron, Benoît Sagot, Djamé Seddah, and Jacopo
Staiano (2020). “Synthetic data augmentation for zero-shot cross-lingual question
answering”. In: arXiv preprint arXiv:2010.12643.
Rosenfeld, Ronald (2000). “Incorporating linguistic structure into statistical language
models”. In: Philosophical Transactions of the Royal Society of London. Series A:
Mathematical, Physical and Engineering Sciences 358.1769, pp. 1311–1324.
Roucher, Aymeric (n.d.). RAG Evaluation. https://huggingface.co/learn/cookbook/en/rag_evaluation.
Accessed: 2024-04-04.
Roumeliotis, Konstantinos I, Nikolaos D Tselikas, and Dimitrios K Nasiopoulos (2023).
“Llama 2: Early Adopters’ Utilization of Meta’s New Open-Source Pretrained Model”.
Roy, Ayush, Palaiahnakote Shivakumara, Umapada Pal, Hamam Mokayed, and Marcus
Liwicki (2023). “Fourier feature-based CBAM and vision transformer for text
detection in drone images”. In: International Conference on Document Analysis and
Recognition. Springer Nature Switzerland Cham, pp. 257–271.
Salemi, Alireza and Hamed Zamani (2024). “Evaluating Retrieval Quality in
Retrieval-Augmented Generation”. In: arXiv preprint arXiv:2404.13781.
Sanh, Victor, Lysandre Debut, Julien Chaumond, and Thomas Wolf (2019).
“DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter”. In: arXiv
preprint arXiv:1910.01108.
Lee, Sean, Aamir Shakir, Darius Koenig, and Julius Lipp (2024). Open Source Strikes Bread
- New Fluffy Embeddings Model. url: https://www.mixedbread.ai/blog/mxbai-embed-large-v1.
Shakeri, Siamak, Cicero dos Santos, Henghui Zhu, Patrick Ng, Feng Nan, Zhiguo Wang,
Ramesh Nallapati, and Bing Xiang (2020). “End-to-end synthetic data generation for
domain adaptation of question answering systems”. In: Proceedings of the 2020
Conference on Empirical Methods in Natural Language Processing (EMNLP),
pp. 5445–5460.
Shoufan, Abdulhadi (2023). “Exploring Students’ Perceptions of ChatGPT: Thematic
Analysis and Follow-Up Survey”. In: IEEE Access 11, pp. 38805–38818.
doi: 10.1109/ACCESS.2023.3268224.
Srishti, Richa (2024). “ChatGPT in Education: Augmenting Learning Experience or
Dehumanizing Education?” In: Educational Perspectives on Digital Technologies in
Modeling and Management. IGI Global, pp. 114–128.
Sutskever, Ilya (2013). Training recurrent neural networks. University of Toronto, Toronto,
ON, Canada.
Sutskever, Ilya, James Martens, and Geoffrey E Hinton (2011). “Generating text with
recurrent neural networks”. In: Proceedings of the 28th international conference on
machine learning (ICML-11), pp. 1017–1024.
Taori, Rohan, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li, Carlos Guestrin,
Percy Liang, and Tatsunori B Hashimoto (2023). “Alpaca: A strong, replicable
instruction-following model”. In: Stanford Center for Research on Foundation
Models. https://crfm.stanford.edu/2023/03/13/alpaca.html 3.6, p. 7.
Team, Gemma, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju,
Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love,
et al. (2024). “Gemma: Open models based on gemini research and technology”. In:
arXiv preprint arXiv:2403.08295.
Touvron, Hugo, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux,
Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al.
(2023). “Llama: Open and efficient foundation language models”. In: arXiv preprint
arXiv:2302.13971.
Vacalopoulou, Anna, Viktor Gardelli, Theodoris Karafyllidis, Foteini Liwicki, Hamam
Mokayed, Marios Papaevripidou, George Paraskevopoulos, Spyridoula Stamouli,
Athanasios Katsamanis, and Vassilis Katsouros (2024). “AI4EDU: An Innovative
Conversational Ai Assistant For Teaching And Learning”. In: INTED2024 Proceedings.
IATED, pp. 7119–7127.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N
Gomez, Lukasz Kaiser, and Illia Polosukhin (2017). “Attention is all you need”. In:
Advances in neural information processing systems 30.
Wang, Jiayi, David Ifeoluwa Adelani, Sweta Agrawal, Ricardo Rei, Eleftheria Briakou,
Marine Carpuat, Marek Masiak, Xuanli He, Sofia Bourhim, Andiswa Bukula, et
al. (2023). “AfriMTE and AfriCOMET: Empowering COMET to Embrace
Under-resourced African Languages”. In: arXiv preprint arXiv:2311.09828.
Wang, Yizhong, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A Smith, Daniel
Khashabi, and Hannaneh Hajishirzi (2022). “Self-instruct: Aligning language models
with self-generated instructions”. In: arXiv preprint arXiv:2212.10560.
Williams, Mobolaji (2017). Introduction to Oscillations and Waves. MIT OpenCourseWare:
Massachusetts Institute of Technology. RES.8-009 (Summer 2017). Licensed
under CC BY-NC-SA 4.0. Available at
https://ocw.mit.edu/courses/res-8-009-introduction-to-oscillations-and-waves-summer-2017/.
Xiao, Changrong, Sean Xin Xu, Kunpeng Zhang, Yufang Wang, and Lei Xia (July 2023).
“Evaluating Reading Comprehension Exercises Generated by LLMs: A Showcase
of ChatGPT in Education Applications”. In: Proceedings of the 18th Workshop on
Innovative Use of NLP for Building Educational Applications (BEA 2023). Ed. by
Ekaterina Kochmar, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Nitin
Madnani, Anaïs Tack, Victoria Yaneva, Zheng Yuan, and Torsten Zesch. Toronto,
Canada: Association for Computational Linguistics, pp. 610–625.
doi: 10.18653/v1/2023.bea-1.52. url: https://aclanthology.org/2023.bea-1.52.
Yen, An-Zi and Wei-Ling Hsu (Dec. 2023). “Three Questions Concerning the Use of
Large Language Models to Facilitate Mathematics Learning”. In: Findings of the
Association for Computational Linguistics: EMNLP 2023. Ed. by Houda Bouamor,
Juan Pino, and Kalika Bali. Singapore: Association for Computational Linguistics,
pp. 3055–3069. doi: 10.18653/v1/2023.findings-emnlp.201.
url: https://aclanthology.org/2023.findings-emnlp.201.
Yu, Hao (2023). “Reflection on whether ChatGPT should be banned by academia
from the perspective of education and teaching”. In: Frontiers in Psychology 14,
p. 1181712.
Zhang, Peiyuan, Guangtao Zeng, Tianduo Wang, and Wei Lu (2024). “Tinyllama: An
open-source small language model”. In: arXiv preprint arXiv:2401.02385.
Zheng, Lianmin, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao
Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric Xing, et al. (2024). “Judging
llm-as-a-judge with mt-bench and chatbot arena”. In: Advances in Neural Information
Processing Systems 36.
Zimmermann, Lucien and Florian Rohrer (2024). Study Buddy. OST-Ostschweizer
Fachhochschule. url: https://eprints.ost.ch/id/eprint/1176/.
Appendices
A Appendices
A.1 RAG Scenarios
Table A.1 shows the full list of RAG scenarios used in the experimentation. There are
a total of 64 RAG combinations.
Scenario  Chunk Size  Overlap  Text Splitter       Vector Store  Embedding Model    Model
1         500         50       Character           Chroma        mxbai-embed-large  phi3
2         500         50       Character           Chroma        mxbai-embed-large  llama3
3         500         50       Character           Chroma        llama3             phi3
4         500         50       Character           Chroma        llama3             llama3
5         500         50       Character           FAISS         mxbai-embed-large  phi3
6         500         50       Character           FAISS         mxbai-embed-large  llama3
7         500         50       Character           FAISS         llama3             phi3
8         500         50       Character           FAISS         llama3             llama3
9         500         50       RecursiveCharacter  Chroma        mxbai-embed-large  phi3
10        500         50       RecursiveCharacter  Chroma        mxbai-embed-large  llama3
11        500         50       RecursiveCharacter  Chroma        llama3             phi3
12        500         50       RecursiveCharacter  Chroma        llama3             llama3
13        500         50       RecursiveCharacter  FAISS         mxbai-embed-large  phi3
14        500         50       RecursiveCharacter  FAISS         mxbai-embed-large  llama3
15        500         50       RecursiveCharacter  FAISS         llama3             phi3
16        500         50       RecursiveCharacter  FAISS         llama3             llama3
17        500         100      Character           Chroma        mxbai-embed-large  phi3
18        500         100      Character           Chroma        mxbai-embed-large  llama3
19        500         100      Character           Chroma        llama3             phi3
20        500         100      Character           Chroma        llama3             llama3
21        500         100      Character           FAISS         mxbai-embed-large  phi3
22        500         100      Character           FAISS         mxbai-embed-large  llama3
23        500         100      Character           FAISS         llama3             phi3
24        500         100      Character           FAISS         llama3             llama3
25        500         100      RecursiveCharacter  Chroma        mxbai-embed-large  phi3
26        500         100      RecursiveCharacter  Chroma        mxbai-embed-large  llama3
27        500         100      RecursiveCharacter  Chroma        llama3             phi3
28        500         100      RecursiveCharacter  Chroma        llama3             llama3
29        500         100      RecursiveCharacter  FAISS         mxbai-embed-large  phi3
30        500         100      RecursiveCharacter  FAISS         mxbai-embed-large  llama3
31        500         100      RecursiveCharacter  FAISS         llama3             phi3
32        500         100      RecursiveCharacter  FAISS         llama3             llama3
33        1000        50       Character           Chroma        mxbai-embed-large  phi3
34        1000        50       Character           Chroma        mxbai-embed-large  llama3
35        1000        50       Character           Chroma        llama3             phi3
36        1000        50       Character           Chroma        llama3             llama3
37        1000        50       Character           FAISS         mxbai-embed-large  phi3
38        1000        50       Character           FAISS         mxbai-embed-large  llama3
39        1000        50       Character           FAISS         llama3             phi3
40        1000        50       Character           FAISS         llama3             llama3
41        1000        50       RecursiveCharacter  Chroma        mxbai-embed-large  phi3
42        1000        50       RecursiveCharacter  Chroma        mxbai-embed-large  llama3
43        1000        50       RecursiveCharacter  Chroma        llama3             phi3
44        1000        50       RecursiveCharacter  Chroma        llama3             llama3
45        1000        50       RecursiveCharacter  FAISS         mxbai-embed-large  phi3
46        1000        50       RecursiveCharacter  FAISS         mxbai-embed-large  llama3
47        1000        50       RecursiveCharacter  FAISS         llama3             phi3
48        1000        50       RecursiveCharacter  FAISS         llama3             llama3
49        1000        100      Character           Chroma        mxbai-embed-large  phi3
50        1000        100      Character           Chroma        mxbai-embed-large  llama3
51        1000        100      Character           Chroma        llama3             phi3
52        1000        100      Character           Chroma        llama3             llama3
53        1000        100      Character           FAISS         mxbai-embed-large  phi3
54        1000        100      Character           FAISS         mxbai-embed-large  llama3
55        1000        100      Character           FAISS         llama3             phi3
56        1000        100      Character           FAISS         llama3             llama3
57        1000        100      RecursiveCharacter  Chroma        mxbai-embed-large  phi3
58        1000        100      RecursiveCharacter  Chroma        mxbai-embed-large  llama3
59        1000        100      RecursiveCharacter  Chroma        llama3             phi3
60        1000        100      RecursiveCharacter  Chroma        llama3             llama3
61        1000        100      RecursiveCharacter  FAISS         mxbai-embed-large  phi3
62        1000        100      RecursiveCharacter  FAISS         mxbai-embed-large  llama3
63        1000        100      RecursiveCharacter  FAISS         llama3             phi3
64        1000        100      RecursiveCharacter  FAISS         llama3             llama3
Table A.1: Retrieval-augmented generation (RAG) scenarios used
in the experimentation.
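The 64 rows of Table A.1 are consistent with a full Cartesian product of six parameters with two values each (2^6 = 64). As a minimal sketch, assuming the row order follows the table with the rightmost column varying fastest, the grid could be regenerated as shown here. The constant names, dictionary keys, and the build_scenarios helper are illustrative only and are not taken from the experiment code.

```python
from itertools import product

# Parameter values read off Table A.1. The names below are illustrative
# assumptions, not identifiers from the thesis code.
CHUNK_SIZES = [500, 1000]
OVERLAPS = [50, 100]
TEXT_SPLITTERS = ["Character", "RecursiveCharacter"]
VECTOR_STORES = ["Chroma", "FAISS"]
EMBEDDING_MODELS = ["mxbai-embed-large", "llama3"]
MODELS = ["phi3", "llama3"]

KEYS = (
    "chunk_size",
    "overlap",
    "text_splitter",
    "vector_store",
    "embedding_model",
    "model",
)


def build_scenarios():
    """Return every parameter combination as a dict, numbered from 1."""
    grid = product(
        CHUNK_SIZES,
        OVERLAPS,
        TEXT_SPLITTERS,
        VECTOR_STORES,
        EMBEDDING_MODELS,
        MODELS,
    )
    # product() varies the last argument fastest, which matches the table,
    # where the Model column alternates on every row.
    return [
        dict(zip(KEYS, combo), scenario=number)
        for number, combo in enumerate(grid, start=1)
    ]


scenarios = build_scenarios()
print(len(scenarios))
print(scenarios[1])
```

Looking up a configuration by its scenario number is then a list index (scenario n sits at position n - 1), which keeps result tables and configurations aligned.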
A.2 Short ground-truth answers in Maths
Table A.2 shows the top 20 synthetic question and ground-truth pairs, together with the
answer generated by the maths RAG using configuration 2 (the configuration with which
Physics achieved its highest accuracy).