P ac ice Pape
Recommended ci a ion: Zhang, M., Lindsay, E., Qui zau, M.-B., & Bje a, J.
(2025). Scaling Cou se E alua ions wi h La ge Language Models: Semes e -le el
Diges ible S uden Feedback o P og am Leade s. In Kangaslampi, R., Langie,
G., Jä inen, H.-M., & Nagy, B. (Eds.), SEFI 53 d Annual Con e ence. Eu opean
Socie y o Enginee ing Educa ion (SEFI), Tampe e, Finland. DOI:
10.5281/zenodo.17631533.
This Con e ence Pape is b ough o you o open access by he 53 d Annual Con e ence
o he Eu opean Socie y o Enginee ing Educa ion (SEFI) a Tampe e Uni e si y in
Tampe e, Finland. This wo k is licensed unde a C ea i e Commons
A ibu ion-NonComme cial-Sha e Alike 4.0 In e na ional License.
Scaling Cou se E alua ions wi h La ge Language Models:
Semes e -le el Diges ible S uden Feedback o P og am Leade s
M Zhanga, E D Lindsayb,1, J Bje ac, M-B Qui zaud
a Aalbo g Uni e si y, Copenhagen, Denma k, h ps://o cid.o g/0000-0003-1218-5201
b Aalbo g Uni e si y, Aalbo g, Denma k, h ps://o cid.o g/0000-0003-3266-164X
c Aalbo g Uni e si y, Copenhagen, Denma k, h ps://o cid.o g/0000-0002-9512-0739
d Aalbo g Uni e si y, Aalbo g, Denma k, h ps://o cid.o g/0000-0002-9907-8224
Con e ence Key A eas: Digi al ools and AI in enginee ing educa ion,
Building he capaci y and s eng hening he educa ional compe encies o enginee ing
educa o s
Keywo ds: Cou se E alua ions, Au oma ed Feedback, La ge Language Models
ABSTRACT
End o semes e s uden e alua ions ep esen he p ima y eedback mechanism o
academics' eaching p ac ices. Howe e , a he depa men o semes e le el, he
shee olume o eedback ende s adi ional analysis me hods imp ac ical. This
pape add esses a gap in p e ious wo k whe e only cou se-le el syn hesis is
explo ed using open-sou ce gene a i e AI o c ea ing ac ual, ac ionable, and
app op ia e summa ies o s uden eedback ac oss an en i e depa men . Ins ead,
ou s udy analyses 28 semes e -le el e alua ion epo s wi h s uden commen s—
wi h app oxima ely 25,000 wo ds and 170,000 cha ac e s—spanning he
depa men . The model p oduces insigh s on se e al le els, namely deg ee-le el,
semes e -le el, yea -le el, and depa men -le el. Th ough s uc u ed p omp ing, we
de eloped a me hodology ha mee s he speci ic needs o s udy boa d chai s who
p e iously aced high wo kload om manually e iewing e alua ions wice yea ly.
Ou p omp s allow he model o sys ema ically check o p ede e mined hemes,
while also iden i ying eme gen pa e ns ac oss cou ses. This enables a ge ed
p o essional de elopmen ini ia i es a he depa men al scale. Ou con ibu ion
demons a es ha gene a i e AI can e ec i ely syn hesize s uden eedback a a
la ge o ganiza ional le el, p o iding a cos -e ec i e mechanism o suppo
educa ional de elopmen and quali y imp o emen ac oss an en i e academic uni .
1 Co esponding Au ho
E D Lindsay
[email protected]
1. INTRODUCTION
Feedback is essen ial o imp o ing educa ional p ac ices, bene i ing bo h s uden s
and educa o s by highligh ing a eas o de elopmen (Ha ie 2008; Na ciss 2008).
While conside able e o is de o ed o p o iding meaning ul eedback o s uden s,
he mechanisms h ough which academics ecei e eedback on hei eaching a e
less obus . Typically, academics ely hea ily on end-o -semes e s uden
e alua ions, combining quan i a i e Like -scale i ems wi h open-ended esponses.
Howe e , as s uden numbe s and eedback olume g ow, e ec i ely syn hesizing
hese e alua ions becomes inc easingly challenging. Consequen ly, depa men s
o en de aul o simplis ic in e p e a ions, p ima ily esponding only o ex emely high
o low a ings, o e looking nuanced insigh s con ained in quali a i e commen s.
Recen ad ances in na u al language p ocessing (NLP), pa icula ly h ough la ge
language models (LLMs), ha e opened new possibili ies o handling ex ensi e
ex ual da a ac oss many domains (Wahle e al. 2023; Kasneci e al. 2023). P e ious
esea ch le e aging LLMs o educa ional eedback has p ima ily ocused on pee
lea ning, s uden pe o mance, o ma i e assessmen , coding assis ance, o s uden
eedback (Baue e al. 2023; Bo elho e al. 2023; Gue aoui e al. 2023; Liang e al.
2023; Pankiewicz and Bake 2023; Ka z, Ge ha d , and Soledad 2024; Zhang e al.
2025) and he esponsible de elopmen he eo (Lindsay e al. 2025). Addi ionally,
au oma ed ools ha e been explo ed o moni o ing s uden engagemen and
p o iding ins uc o eedback based on class oom discou se (Samei e al. 2014; Kelly
e al. 2018; Jensen e al. 2020; Schwa z e al. 2018; Aslan e al. 2019; Al ajhi e al.
2021; Demszky and Liu 2023; Demszky e al. 2023; Wang and Demszky 2023).
Howe e , li le a en ion has been gi en o sys ema ically syn hesizing s uden cou se
e alua ions o in o m eaching p ac ices a depa men al o ins i u ional scales.
P e ious wo k also demons a ed he easibili y o applying LLMs o gene a e
meaning ul eedback summa ies a he indi idual cou se le el (Zhang e al, 2024).
Ye , he ue po en ial o his echnology eme ges a scale—when syn hesizing
e alua ions ac oss an en i e depa men o s udy boa d becomes imp ac ical o
impossible manually, causing signi ican s ain on acul y asked wi h his p ocess.
A he Depa men o Planning and Sus ainabili y a Aalbo g Uni e si y, he
summa ies o he semes e e alua ions p o ide suppo when he membe s o he
s udy boa d discuss ways o ollow-up on pe inen issues. The chai o he s udy
boa d ypically acili a es a discussion a mee ings based on he summa ies. These
a e also sha ed as appendix o he mee ing agenda. This o ms a way o condense
and ecall he main poin s o each semes e in o de o main ain o e iew o he
la ge olume o ma e ial om s uden e alua ions.
Thus, a key esea ch ques ion eme ges:
RQ: How can LLMs be applied o au oma ically syn hesise la ge olumes
s uden eedback in a manne ha is ac ual, ac ionable, app op ia e, and
gene a es alue o academic leade ship a a p og am o depa men scale?
This wo k in es iga es o apply his o he semes e -le el a he Depa men o
[Redac ed] co e ing he 2023–2024 academic yea . The da ase comp ises
e alua ions om 116 cou ses ac oss 3 p og ams spanning 10 academic semes e s,
o alling s uden esponses o o al 25,000 wo ds p o ided in bo h Danish and
English. Da a alida ion inco po a es s udy boa d minu es, cou se summa ies, and
insigh s om s akeholde s such as semes e coo dina o s and s udy boa d chai s.
Fo analysis, we adop h ee c i e ia d awn om p e ious wo k (Wang and Demszky
2023; Wang e al. 2023; Guo e al. 2023; Chang e al. 2023; Zhang e al. 2025):
1. Fac uali y: Ensu ing gene a ed eedback accu a ely ep esen s s uden
e alua ions wi hou in oducing inaccu acies o i ele an con en .
2. Ac ionabili y: P o iding ac ionable insigh s a he han me e summa ies o
e alua ion da a.
3. App op ia eness: Focusing exclusi ely on pedagogical ma e s, il e ing ou
inapp op ia e o non-ins uc ional commen a y.
Ou goal is o show ha simple p omp ing s a egies and ca e ul e alua ion o LLM
ou pu s can enhance he usabili y and impac o s uden eedback a an ins i u ional
scale, ul ima ely in o ming p o essional de elopmen and imp o ing eaching quali y.
2. OUR APPROACH
2.1 Da a P e-p ocessing
Ou da ase is he comple e se o cou se e alua ion documen s o he 2023/24
academic yea a he Depa men o Planning and Sus ainabili y a Aalbo g
Uni e si y. The da a is ini ially in PDF- o ma con aining quan i a i e me ics such as
cou se sco es and ee ex quali a i e commen s om s uden s. We a e mainly
in e es ed in he quali a i e commen s. We use he Ma ke lib a y (Pa uchu i 2025),
which con e s PDF documen s o se e al possible o ma s. We ans o m he da a
o simple ma kdown and apply pos -p ocessing o ex ac he ex o only he s uden
commen s om he PDF. This pape epo s on analysis o h ee p og ams, wi h he
esul s pa i ioned ac oss he en semes e s o hese p og ams.
2.2 Selec ing a Fo ma o he Ou pu
In p e ious wo k (Zhang e al, 2024), we allowed o he o ma ing o ou pu s o
eme ge o ganically om he model. In his wo k, we delibe a ely guide he o ma o
he ou pu o be e suppo he use s o he syn hesis (i.e., a s udy boa d). In doing
so we ha e mo ed away om asking he models o sugges ions; ou -o - he-box
language models can p o ide a syn hesis, bu hey lack he con ex amilia i y o
p o ide ac ionable ad ice.
S akeholde discussions e ealed ha he e a e al eady s uc u es in place o
analysing s uden eedback esponses, wi h speci ic ca ego ies o analysis al eady
iden i ied, such as “academic”, “p ac ical”, “well-being” ema ks. The model was
he e o e explici ly p omp ed o o ganise i s syn hesis o he esponses using hose
ca ego ies (see Sec ion 2.3). In addi ion, we also p omp ed he model o iden i y a
miscellaneous ca ego y o allow o he eme gence o o he hemes in he da a ha
may no be isible a p io i.
This simplis ic app oach does no ye allow us o con ol o how common a heme
should be be o e i is epo ed. In some ins ances (such as inapp op ia e beha iou ),
e e y ins ance is meaning ul and should appea in he syn hesis. Fo o he hemes
(e.g., pace o deli e y) only an eme gen consensus should appea in he syn hesis.
The ex en o which an un ained model can manage his challenge o ms pa o he
e alua ion in his s udy.
2.3 The Model
Fo p ocessing he ex wi h an LLM, we used Qwen2.5 (7B; Qwen-Team e al. 2025)
as he model o syn hesise he s uden eedback. This model o igina es om end
2024. Qwen2.5 is an open-sou ce 7-billion-pa ame e language model based on he
T ans o me (Vaswani e al. 2017). We di ec ly p omp he model wi h he ollowing:
P ocess he ollowing s uden e alua ions in Danish o English and ans o m hem in o a JSON a ay. Each
elemen o he a ay should be an objec wi h he ollowing keys:
- Module Name: Usually he semes e in gene al and se e al cou se modules;
- Semes e Coo dina o : i no applicable, pu N/A;
- Academic Rema ks;
- P ac ical Rema ks;
- Well-being Rema ks;
- Ha assmen ;
- In e nal Commen : Admin o policy no es.
- Feedback: Summa y o he coo dina o , highligh ing issues o add ess.
- Miscellaneous: Any hing ha is no men ioned ye .
Guidelines:
- Do no include ex a keys.
- Use "N/A" o missing da a.
- I he e alua ion is no module-speci ic, use "Gene al Semes e Feedback" as he Module Name.
Follow he o ma o he ollowing ou pu example:
[ {
"Module Name": "",
"Semes e Coo dina o ": "",
"Academic Rema ks": "",
"P ac ical Rema ks": "",
"Well-being Rema ks": "",
"Ha assmen ": "",
"In e nal Commen ": "",
"Feedback": "",
"Miscellaneous": ""
} ]
No e he e can be mo e han one dic iona y in he lis . Only e u n JSON and no hing else.
In he p omp , we s a e he o ma he model should ollow when gene a ing ex . In
his case, we equi e Ja aSc ip Objec No a ion (JSON), con empo a y LLMs a e
po en in gene a ing his o ma and o he s, e.g., he Ex ensible Ma kup Language
(XML). We chose he speci ic columns o he ou pu o be ollowing he o iginal
o ma o he s udy boa ds.
The model does no equi e any aining, no does i need any speci ic examples. We
decided no o ine- une he model as i is i s non- i ial on wha should be ained
on. Fu he mo e, ha ing a p omp ing s a egy allows o a simple and as e
implemen a ion p ocess. I also allows o a ans e able implemen a ion. In doing so,
howe e , we o ego he imp o ed pe o mance ha ailo ing he model o ou con ex
would p o ide.
In p e ious wo k (Zhang e al, 2024), he bo leneck was ha he LLM used (Llama2)
was no able o handle con ex longe han 4K okens. Ins ead, Qwen2.5 can handle
long-con ex suppo up o 128K okens and can gene a e up o 8K okens.
3. RESULTS
In his sec ion, we p esen illus a i e examples o he LLM ou pu s and summa ize
ou key obse a ions. To align ou pu s wi h s udy boa d p e e ences, we s uc u ed
he esponses om he LLM s ic ly in JSON o ma , as de ailed in Sec ion 2.3.
Table 1. Illus a i e Exce p s. We show examples o he gene a ed ou pu o he LLM. We
omi he JSON s uc u e o eadabili y.
Gene a ed Ou pu o he Model
(Example 1; Semes e 1)
"Module Name": "Gene al Semes e Feedback",
"Semes e Coo dina o ": [REDACTED],
"Academic Rema ks": "The semes e plan was no p o ided in ad ance, making i
di icul o plan wo k, holidays, e c., in good ime. The exam schedule was also
announced la e, which caused di icul ies. The e was a lack o lea ning on
sus ainabili y, pa icula ly ega ding he h ee elemen s o sus ainabili y. The
p ac ical p ojec on use -o ien ed p oduc design was poo ly o ganized, wi h
s uden s being expec ed o build a comple e playg ound equipmen du ing win e
when he e we e ew child en a ailable o es ing. The e was much con usion due
o con lic ing messages om di e en lec u e s.",
"P ac ical Rema ks": "The [REDACTED] componen was mixed wi h he clinic
p ojec , leading o a less sa is ac o y exam expe ience. The p ojec lacked clea
guidance and suppo , especially in e ms o es ing he playg ound equipmen in
un a ou able condi ions. The deadline was oo igh , and he e was a lack o cla i y
abou he expec a ions and suppo p o ided.",
"Well-being Rema ks": "The e was a lack o a s udy ca é o mechanics, which
mean ha s uden s lacked suppo . The uni e si y was unde cons uc ion,
causing dis up ions du ing lec u es. The uni e si y did no os e a sense o
communi y and did no p o ide a space whe e s uden s could ga he and
socialize.",
"Ha assmen ": "N/A",
"In e nal Commen ": "N/A",
"Feedback": "Imp o e he cla i y o communica ion and p o ide a clea schedule
well in ad ance. Enhance he suppo o sus ainabili y- ela ed cou ses and
p ac ical p ojec s, especially in e ms o es ing and guidance. C ea e a s udy ca é
o mechanics and imp o e he lea ning en i onmen by add essing cons uc ion
noise du ing lec u es. Fos e a s onge sense o communi y and p o ide spaces
o s uden s o ga he and socialize.",
"Miscellaneous": "S uden engagemen wi h u o s was posi i e, bu he e was a
need o be e coo dina ion and clea communica ion om he uni e si y
adminis a ion."
(Example 2; Semes e 5)
"Module Name": "Gene al Semes e Feedback",
"Semes e Coo dina o ": [REDACTED],
"Academic Rema ks": "The exam o m o all h ee cou ses (excluding he
p ojec ) is un ealis ic. I 's un easonable o expec s uden s o comple e a ask o e
20 pages long o a 5 ECTS cou se and ha e wo o al exams wi hou p ope
p epa a ion.",
"P ac ical Rema ks": "The e has been a lack o sui able g oup ooms, especially
his semes e . The a ailable ooms we e no conduci e o wo king and we e
sp ead ac oss he en i e campus, making i di icul o c ea e a good sense o
communi y and see each o he .",
"Well-being Rema ks": "N/A",
"Ha assmen ": "N/A",
"In e nal Commen ": "N/A",
"Feedback": "Add ess he issue o un ealis ic exam o ms and lack o adequa e
g oup wo kspaces. Conside ways o imp o e social cohesion among s uden s.",
"Miscellaneous": "S uden s eel he exam wo kload is oo hea y and he g oup
wo k en i onmen is inadequa e."
(Example 2; Semes e 10)
"Module Name": "Gene al Semes e Feedback",
"Semes e Coo dina o ": [REDACTED],
"Academic Rema ks": "Oppo uni y o discuss p ojec s wi h o he s uden s and
eache s was posi i e. Communica ion abou exam da es could be s onge and
ea lie .",
"P ac ical Rema ks": "Addi ional lec u e on da a collec ion and i s impac on
p oblem o mula ion would be bene icial. S udy spaces we e no p o ided, causing
issues wi h scheduling and wo kspace o g oup p ojec s.",
"Well-being Rema ks": "The s udy p og amme's closu e and eache depa u es
ha e nega i ely impac ed s uden s' sense o u u e and well-being.",
"Ha assmen ": "N/A",
"In e nal Commen ": "Re iew he decision ega ding s udy spaces alloca ion and
ensu e anspa ency in such decisions. Conside p o iding digi al esou ces and
suppo o s uden s.",
"Feedback": "Imp o e communica ion abou exam da es, p o ide addi ional
lec u es on speci ic opics, and ensu e adequa e s udy spaces o collabo a i e
wo k. Add ess he closu e o he s udy p og amme and p o ide suppo o
a ec ed s uden s.",
"Miscellaneous": "S udy spaces we e no alloca ed o 10 h semes e s uden s,
leading o logis ical issues. Digi al esou ces and suppo should be p o ided."
Table 1 ou lines he indings ac oss di e en semes e s wi hin he same deg ee
p og am, highligh ing dis inc i e eedback pa e ns om s uden s in hei i s , i h
and en h ( inal) semes e s. No ably, s uden s in hei i s semes e equen ly
a icula ed a clea need o ansi ional suppo in o he p og am. Con e sely,
s uden s app oaching p og am comple ion in hei en h semes e emphasized he
impo ance o g ea e au onomy and oppo uni ies o imp o emen . These nuanced
di e ences unde sco e he po en ial o le e aging an LLM o cus omize suppo
se ices o dis inc s uden coho s, while simul aneously iden i ying o e a ching
insigh s om agg ega ed da a.
On a quali a i e le el, we obse ed a consis en pa e n whe ein he model o ganizes
i s ou pu in o clea ly de ined sec ions: “Gene al Semes e Feedback”, ollowed by
indi idual cou se eedback labelled sequen ially (e.g., “Cou se_1”, “Cou se_2”,
“Cou se_n”). In e es ingly, he model main ains his s uc u ed app oach e en in
scena ios whe e a cou se ecei ed minimal o no eedback, explici ly no ing he
absence o speci ic ema ks.
The model also demons a ed he abili y o cap u e a e bu signi ican inciden s
e ec i ely. An illus a i e case in ou da ase in ol ed an in equen men ion o
ha assmen —one o only wo such inciden s cap u ed by he LLM—highligh ing i s
sensi i i y in iden i ying c i ical issues om o he wise ou ine eedback.
Finally, we iden i ied ins ances o language code-mixing when p ocessing e alua ion
epo s o iginally w i en in Danish, indica ing he LLM's esponsi eness o
mul ilingual inpu s and i s nuanced handling o mixed-language da a.
4. EVALUATION OF THE OUTPUTS
The ou comes o he model we e alida ed anecdo ally by he au ho s, including he
chai o he S udy Boa d o he deg ee p og ams included in he da ase . Ou comes
we e compa ed wi h s udy boa d minu es, con empo aneous cou se summa y
documen s, and he ecollec ions o he S udy Boa d Chai .
The s onges inding was ha he models b ing a di e en lens o he p ocess o
syn hesising s uden e alua ions a a la ge scale. Unlike he humans whose ask is
in e p e ing he da a, he model b ings nei he memo y no con ex o he p ocess.
Ou s udy boa d chai no ed ha hey a e usually looking o speci ic hings in he
s uden e alua ions, guided by he p e ious o e ings o a cou se, by o he sou ces
o eedback and in e ac ion wi h s uden s h oughou he semes e , o by an
expec a ion o change, such as om a new academic eaching he cou se. The
model, howe e , simply syn hesises he s uden esponses, which can lead o a
di e ence in emphasis when i comes o p oducing summa ies.
We also obse ed ha o some o ou iden i ied ca ego ies, he model had di e ing
in e p e a ions – pa icula wi h ega ds o he “academic” and “p ac ical” issues.
While he e is a clea dis inc ion in he mind o he s udy boa d as o he di e ence
be ween hese ca ego ies, his nuance is no ep esen ed by he un ained model
and i s ela i ely simple p omp . This could be add essed h ough a mo e
sophis ica ed aining and/o p omp ing in he u u e.
The model was able o deal wi h summa ising low olume esponses. Unlike ou
p e ious i e a ion ha had a endency o add o he summa y de ails no p esen in
he da ase , his model was able o acknowledge ha he e was insu icien da a and
espond wi h a blank o “N/A” esponse.
O e all, he model a ed well on app op ia eness. While he unde lying da ase i sel
has li le in he way o inapp op ia e aw da a, he syn hesis om he model does no
con ey inapp op ia e one o messaging.
The ou comes o he model also pe o med well wi h ega d o ac ionabili y. The
summa ies p oduced by he model did p o ide insigh s ha could be used o in o m
u u e p ac ice, pa icula ly a he la ge scale (eg semes e o o e all). The hemes
ha eme ge in he o e all syn hesis o mul iple cou ses a e he hemes ha mos
wa an some kind o ac ion on he pa o he eaching eam, and so he app oach
appea s o na u ally end owa ds p o iding ac ionable summa ies.
One obse a ion ega ding he summa ies is ha he model is no able o be
iden i ied p oblems ha ha e in ac al eady been esol ed. When s uden s
comple e hei e alua ions hey e lec upon hei expe ience h oughou he whole
semes e , which can include us a ions wi h ea ly-semes e di icul ies (such as a
eache illness) ha ha e in ac al eady been add essed. While cap u ing hese
issues is ac ual, emphasising hem in a summa y ac ually se es o make i less
ac ionable, because ac ion has al eady been aken.
5. CONCLUSION
In his wo k, we p esen a p oo -o -concep le e aging open-sou ce la ge language
models o au oma ically syn hesise cou se e alua ions in o a diges ible o ma o
academic leade s. We p omp ed an ou -o - he-box open-sou ce LLM o summa ise
cou se e alua ions and syn hesise s uden esponses ac oss mul iple cou ses wi hin
a single semes e .
The inhe en con ex - ee na u e o he LLM-based syn hesis allowed he model o
p o ide a di e en pe spec i e o academic leade s. This lack o con ex , howe e ,
also se es o limi somewha he ac ionabili y o he summa ies ha a e p o ided.
Fu he wo k o de elop mo e sophis ica ed p omp ing, as well as o inco po a e
con ex in o he syn hesis will se e o imp o e he alue o he summa ies.
Despi e he simplici y o ou model, ou indings sugges ha he esul ing syn hesis
is la gely ep esen a i e o he o e all s uden e alua ions, as p e iously cap u ed by
o he channels in he s uden e alua ion amewo k. This sugges s ha such ools
o e he po en ial o simply and inexpensi ely o e ing a big pic u e iew o s uden
e alua ions whe e p e iously i was ei he ex emely labou in ensi e o impossible.
6. ACKNOWLEDGEMENTS
This wo k was suppo ed by esea ch g an VIL57392 om VILLUM FONDEN.
We acknowledge he assis ance o ou colleagues in alida ing he ou pu s o ou
models agains hei ecollec ions o he academic yea in ques ion.
This wo k was app o ed by he Aalbo g Uni e si y Human Resea ch E hics
commi ee, wi h app o al numbe 2024-505-00440.