When Students Asked ChatGPT Instead of Me: Investigating Generative AI in Programming Education Through NLP and Pedagogical Analytics

Author: Milena, Nikolić; Marina, Marjanović

Publisher: Zenodo

DOI: 10.5281/zenodo.17731985

Source: https://zenodo.org/records/17731985/files/25.pdf

Engineering and Technology Journal e-ISSN: 2456-3358
Volume 10 Issue 11 November-2025, Page No.-7939-7946
DOI: 10.47191/etj/v10i11.25, I.F. – 8.482
© 2025, ETJ
ETJ Volume 10 Issue 11 November 2025,
1
Milena Nikolić
When Students Asked ChatGPT Instead of Me: Investigating Generative AI in Programming Education Through NLP and Pedagogical Analytics
Milena Nikolić1, Marina Marjanović2
1,2The Academy of Applied Technical and Preschool Studies, Singidunum University
ABSTRACT: This study investigates the growing dependence of students on popular generative artificial intelligence tools such as ChatGPT, GitHub Copilot, Jupyter AI, Google Bard, and more for coding assignments, project development, exam preparation, and conceptual learning in computer science higher education. Based on empirical data from various undergraduate and graduate-level programming courses at the Academy of Applied Technical and Preschool Studies in Serbia, the research applies natural language processing (NLP) and builds a machine learning model to examine student engagement and predict potential educational outcomes. A comprehensive Python-based framework integrates CodeBERT for semantic similarity and plagiarism detection, TF-IDF with cosine similarity for benchmark comparisons, XGBoost for rubric-based classification, and DBSCAN clustering for code anomaly detection. Sentiment analysis further captures student attitudes toward frequent AI use. Rather than limiting the use of such approaches, this paper introduces a scalable solution for AI-aware assessment and curriculum design, encouraging responsible and ethical usage of modern generative technologies. The results support an innovative and future-ready model of education in the era of artificial intelligence.
KEYWORDS: Intelligent Systems, Programming Education, Pedagogical Analytics, NLP, Data Science, CodeBERT, XGBoost, Sentiment Analysis, DBSCAN Clustering
I. INTRODUCTION
Generative artificial intelligence (GAI) has quickly become a transformative force in programming education. Platforms including ChatGPT, GitHub Copilot, Jupyter AI, Google Bard, and OpenAI Codex are widely embedded in student practice, supporting various coursework, project development, exam preparation, and overall conceptual understanding [1]. Their availability has reshaped traditional help-seeking pathways, with students increasingly turning to AI systems rather than instructors or peers. Learners adopt distinct prompting and interaction strategies such as repeat-edit, scaffolding, copy-paste, and exploratory prompting, which illustrate the varied ways AI is used in programming contexts [2].
When carefully integrated into the coursework, GAI can improve assignment completion rates, enhance correctness, and strengthen general computational thinking, while also encouraging student motivation and confidence. However, unguided use risks fostering superficial learning, weakening debugging ability, and raising academic integrity concerns, while strongly influencing how learners approach reflection and cognitive skills [3]. This duality underscores the urgent need for pedagogical strategies that balance the efficiency of AI with the adoption of deeper problem-solving skills.
Alongside these considerations, research in computer and engineering education suggests that GAI can become a constructive learning companion when its use is directed toward objective reasoning. Embedding models like ChatGPT into programming assignments promotes higher-order thinking when students are guided to treat AI outputs as objects of analysis and evaluation instead of ready-made solutions [4]. This perspective supports the idea that the incorporation of AI into programming education should be grounded in pedagogical practices that cover reasoning, design decisions, and ethical engagement, ensuring that efficiency is balanced with cognitive development.
Educational researchers increasingly argue for rethinking pedagogy and student assessment approaches. Rather than focusing only on the code output, assignments are expected to emphasize key reasoning processes, design choices, and reflective engagement with AI-assisted solutions. Analytical techniques that combine natural language processing with learning analytics are applied to cluster interaction patterns, analyze student prompts, and simulate behaviors to provide monitoring, personalization, and ethical oversight [5]. These insights point toward a broader reconfiguration of computer science education where AI is neither uncritically embraced nor prohibited but instead integrated in structured ways that maximize advantages while mitigating risks.
Building on the foundation of this work, the present study examines the integration of GAI in programming education through empirical data collected from the last two years of teaching courses at the Academy of Applied Technical and Preschool Studies in Serbia. A Python-based framework is proposed that leverages natural language processing and machine learning techniques, specifically CodeBERT, TF-IDF with cosine similarity, followed by XGBoost and DBSCAN, to detect reliance patterns, predict student outcomes through the semester, and guide the implementation of AI-aware curricula. By combining technical modeling with pedagogical reflection, this work contributes to the development of an ethical and scalable system for computer science education in the era of artificial intelligence.
II. LITERATURE REVIEW
Recent studies confirm that students all over the world have rapidly adopted GAI tools in programming courses, often preferring them over traditional sources of help like instructors or office hours [6]. Distinct prompt-use clusters have been identified, illustrating that learners employ varied interaction styles when engaging with systems like ChatGPT and GitHub Copilot [7]. For example, some students rely on rapid-fire prompting to obtain instant solutions, while others use iterative refinement to gradually improve code quality or depend heavily on AI for debugging assistance. These behaviors indicate that GAI is influencing not only the speed of task completion but also the depth of conceptual learning.
The research base between 2022 and 2025 documents opportunities and risks together. Positive outcomes include measurable improvements in task completion, assignment correctness, and computational thinking when GAI is used under guided conditions [8]-[9]. Moreover, several studies report benefits in motivation, self-efficacy, and retention, with retention increases of up to 25 percent, a 30 percent decrease in repeated mistakes, and student satisfaction gains of 20 percent [10]-[11]. In benchmarking studies, GPT-4 achieved program repair rates of 88 percent and strong performance in generating relevant explanations, closely approaching the proficiency of a human tutor [12]. Similarly, CodeBERT and OpenAI Codex were employed to generate programming exercises and explanations that students rated highly for novelty, readiness, and usefulness [13].
Despite these gains, risks remain significant. Unguided or heavy dependence on AI has been primarily associated with shallow learning gains [14]. Similar concerns are raised in broader discussions of AI in education, which caution that students may rely on surface-level outputs at the expense of deeper engagement with problem-solving processes and disciplinary knowledge. Essential ethical challenges like plagiarism, fairness, and bias are widely documented as well, with at least five studies identifying academic integrity as a central issue [15]-[16]. In addition, privacy, transparency, and inclusivity remain unresolved, raising equity constraints around who secures advantages from these tools.
Pedagogical adaptations are also increasingly recognized as crucial. Research indicates that traditional summative assessment models are inadequate in contexts where AI solutions are readily available. Instead, scholars recommend designing assignments that emphasize reasoning, problem decomposition, and reflective evaluation of AI outputs [17]. Many classroom studies confirm that supervised integration produces more positive learning outcomes than unguided inclusion. Evidence further suggests that students benefit most when AI use is explicitly framed as a learning aid rather than a substitute for problem-solving. As a result, the role of instructors is shifting from being the main source of answers to becoming facilitators who teach prompt literacy, critical evaluation of model outputs, and higher-order design skills.
To encourage these pedagogical transformations, NLP and educational data analytics have been increasingly applied. Clustering analyses of student prompts reveal consistent interaction patterns that can be linked to learning behaviors. Simulation frameworks like CoderAgent demonstrate how synthetic learners can be utilized to explore personalization and adaptive scaffolding. Large-scale analytics pipelines at the institutional level have been deployed as well to detect bottlenecks in student progress and measure pre- and post-utilization effects of AI use [18]. A detailed overview of these studies, along with models used, tasks assigned, datasets, and performance highlights, is presented in Table I.
Overall, the literature shows that GAI can act as a catalyst for improved programming education by enhancing feedback and supporting personalization. However, critical limitations remain. Most studies are restricted to single programming courses, short time frames, or narrow institutional contexts. There is also not enough evidence of conceptual learning and skill transfer, and consistency across evaluation metrics is low. Addressing these problems is necessary to ensure that AI integration both improves short-term performance and sustains long-term learning outcomes. The present study responds to this gap by combining natural language processing, machine learning, and real classroom data to build a replicable and pedagogically informed model for AI-aware assessment and curriculum design.
Table I. A brief overview of previous findings related to GAI in programming education.
III. METHODOLOGY
A. Data Sources
As already mentioned, the empirical data for this study was collected from three undergraduate and graduate-level courses taught at the Academy of Applied Technical and Preschool Studies in the city of Niš, Serbia. The courses were Fundamentals of Programming, Software Engineering, and Big Data Analytics. Each course spanned twelve weeks and required student submissions at regular intervals [19]-[21].
In the Fundamentals of Programming course, approximately 120 assignments were submitted every two weeks. These tasks emphasized foundational C programming concepts and direct applications from class material and real-world exercises. Examples included writing functions to simulate banking transactions, generating statistics from student records, and managing file operations.
In the Software Engineering course, 60 Java assignments were submitted once per week. These projects introduced object-oriented design concepts, with exercises such as developing class hierarchies for e-commerce, implementing scheduling systems, or designing modules for a simple management app.
In Big Data Analytics, final-year students submitted 30 Python assignments every week. The assigned tasks focused on data preprocessing, analysis, and visualization methods. Representative examples covered parsing large CSV files of hotel and Twitter data, implementing sentiment analysis on text corpora, and generating dashboards for observations.
All assignments were designed to cover two categories of exercises. The first category consisted of direct coding tasks aligned with lecture topics. For example, students were asked to write recursive functions in C to compute factorials or Fibonacci numbers, to implement Python scripts for basic statistical calculations such as mean, median, and variance, and to design object-oriented class hierarchies in Java that applied design patterns to model entities such as students, courses, or bank accounts. The second category emphasized tasks that simulated common real-life scenarios, requiring students to transfer their knowledge and skills into practical, contextualized solutions. In this group, students developed C programs to manage a library system with borrowing and returning functionalities, built console Java applications for ride-sharing platforms that incorporated design patterns such as Singleton for global configuration, Factory Method for generating vehicle or driver objects, and Observer for updating ride status, and wrote Python programs to parse social media data and perform sentiment analysis. Other assignments involved designing systems that automatically generate timetables for exam registration and implementing data visualization dashboards in Python to display trends in booking specific hotels using public datasets.
In total, almost 4,500 student submissions were collected across courses, providing a rich and diverse collection that captures meaningful variations in coding styles, evolving problem-solving strategies, and dynamic interactions with generative AI tools observed over time.
| Study | Models/Tools Used | Tasks Addressed | Dataset / Context | Key Findings |
|---|---|---|---|---|
| Prather et al. (2023) | GPT-3.5, GPT-4, GitHub Copilot | Code generation, interpretation, teaching material creation | Undergraduate / Global | Highlighted opportunities and risks; GPT-4 achieved 51.5% avg. score; concerns about overreliance and misconduct |
| Xie (2024) | ChatGPT | Assignment completion, correctness, learning outcomes | Introductory Java / University | Guided use improved assignment completion rates and correctness |
| Cambaz & Zhang (2024) | Codex, GPT-3/3.5/4, Copilot | Code generation, tutoring, feedback | Introductory Python / Undergraduate | Identified performance variability; need for scaffolding and monitoring |
| Mboya et al. (2025) | GPT-3, GPT-4, CodeGeex | Personalized learning, feedback, tutoring | Universities / Kenya | Reported 25% retention gain, 30% fewer mistakes, 20% higher satisfaction |
| Boguslawski et al. (2024) | ChatGPT, LLMs | Motivation, debugging, complex projects | Undergraduate / Graduate / Germany | 77% frequent use; improved autonomy and competence; risks of uncritical adoption |
| Phung et al. (2023) | ChatGPT 3.5/4 | Program repair, hints, explanations | Introductory Python | GPT-4 achieved 88% program repair, 84% explanation, close to human tutor |
| Sarsa et al. (2022) | Codex, GPT-3, CodeBERT | Exercise generation, explanations | Introductory programming / University | Students rated exercises as 75% sensibleness, 81.8% novelty, 76.7% readiness |
B. Data Preprocessing
All code submissions were standardized at the outset to ensure consistency across the dataset. This procedure involved unifying indentation styles, removing extraneous whitespace, and correcting encoding inconsistencies where applicable. Submissions that failed to compile or included incomplete fragments were flagged and retained separately to avoid skewing semantic or structural analysis. Likewise, broken, corrupted, or otherwise invalid submissions were removed at this stage, so they did not participate in clustering.
Following the described normalization, tokenization and parsing were applied using efficient language-specific tools. For example, Java code was processed with ANTLR while Python files were treated using the built-in tokenize module. This allowed main identifiers, keywords, and operators to be extracted into structured token sequences that could later be mapped to numerical feature spaces.
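As an illustration of the Python side of this step, the following minimal sketch uses the standard-library tokenize module to pull identifiers, keywords, and operators out of a source string. The function name extract_tokens, the three category labels, and the sample snippet are assumptions made for this example and are not taken from the study's code.

```python
import io
import keyword
import tokenize


def extract_tokens(source: str) -> list[tuple[str, str]]:
    """Return (kind, text) pairs for identifiers, keywords, and operators."""
    tokens = []
    reader = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(reader):
        if tok.type == tokenize.NAME:
            # NAME covers both keywords and identifiers; split them here.
            kind = "KEYWORD" if keyword.iskeyword(tok.string) else "IDENTIFIER"
            tokens.append((kind, tok.string))
        elif tok.type == tokenize.OP:
            tokens.append(("OPERATOR", tok.string))
        # Comments, literals, and layout tokens are skipped in this sketch.
    return tokens


# Hypothetical student submission used only for demonstration.
sample = "def mean(xs):\n    return sum(xs) / len(xs)\n"
token_sequence = extract_tokens(sample)
```

The resulting token_sequence starts with the keyword def followed by the identifier mean, and such sequences could then be fed to any vectorizer that accepts pre-tokenized input.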
Natural language components, such as student comments embedded in the code and reflective notes submitted with assignments, were carefully preprocessed as well. Standard techniques were applied, such as lowercasing, punctuation removal, stopword filtering, and lemmatization, to create a clean textual representation [22]. This step guaranteed that natural language processing elements could be meaningfully aligned with programming constructs during analysis.
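The first three of these steps can be sketched with the standard library alone. The stopword list below is a tiny illustrative assumption, and lemmatization is deliberately left out because it needs an external NLP library that the paper does not name.

```python
import string

# Tiny illustrative stopword list (an assumption; a real pipeline would use a fuller one).
STOPWORDS = {"a", "an", "the", "is", "to", "of", "and", "in", "it", "this", "i"}


def clean_text(text: str) -> list[str]:
    """Lowercase, strip ASCII punctuation, and drop stopwords."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return [word for word in no_punct.split() if word not in STOPWORDS]


# Hypothetical code comment used only for demonstration.
cleaned = clean_text("This loop computes the SUM of the list, then prints it!")
```

In this example the comment is reduced to the content words loop, computes, sum, list, then, and prints.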
The feature extraction process combined two complementary strategies. TF-IDF vectorization was used to capture lexical distributions within comments, and CodeBERT embeddings provided representations of source code and accompanying text. This approach supported analysis of submissions at the syntactic and semantic levels, improving the model's ability to detect plagiarism, similarity, or conceptual overlap [23].
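To make the lexical side concrete, the sketch below computes TF-IDF weights and cosine similarity in plain Python. It uses one common smoothed IDF variant, ln((1 + N) / (1 + df)) + 1, which is an assumption since the paper does not state its weighting scheme, and the three example comments are invented.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Build sparse TF-IDF vectors (term -> weight) for tokenized documents."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF, one common variant: ln((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({t: (c / len(doc)) * idf[t] for t, c in counts.items()})
    return vectors


def cosine_similarity(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical, already-tokenized student comments.
comments = [
    "read file and count lines".split(),
    "read file and count words".split(),
    "draw hotel booking trends".split(),
]
vecs = tfidf_vectors(comments)
sim_close = cosine_similarity(vecs[0], vecs[1])  # four of five terms shared
sim_far = cosine_similarity(vecs[0], vecs[2])    # no terms shared
```

Comments that share most of their vocabulary score close to 1, while comments with no terms in common score exactly 0, which is what makes the measure usable as a lexical baseline next to embedding-based similarity.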
Finally, metadata related to submissions was encoded as numerical features, including variables such as submission frequency, time intervals between assignments, and code length. By incorporating temporal and behavioral features, the dataset was enhanced with information that revealed engagement patterns and possible overreliance on AI tools.
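One possible encoding of these behavioral signals is sketched below. The feature names, the per-student grouping, and the choice of hours and weeks as units are assumptions made for illustration, since the paper lists the variables but not their exact definitions.

```python
from datetime import datetime
from statistics import mean


def metadata_features(timestamps: list[datetime], sources: list[str]) -> dict[str, float]:
    """Summarize one student's submissions as numeric behavioral features."""
    ordered = sorted(timestamps)
    # Gaps between consecutive submissions, in hours.
    gaps_hours = [
        (later - earlier).total_seconds() / 3600
        for earlier, later in zip(ordered, ordered[1:])
    ]
    # Observation window in weeks, floored at one week to avoid tiny divisors.
    span_weeks = max((ordered[-1] - ordered[0]).days / 7, 1.0)
    return {
        "submissions_per_week": len(ordered) / span_weeks,
        "mean_gap_hours": mean(gaps_hours) if gaps_hours else 0.0,
        "mean_code_lines": mean(len(src.splitlines()) for src in sources),
    }


# Hypothetical submission history: three weekly submissions.
stamps = [
    datetime(2025, 3, 3, 10, 0),
    datetime(2025, 3, 10, 10, 0),
    datetime(2025, 3, 17, 10, 0),
]
codes = ["a = 1\nb = 2\n", "a = 1\nb = 2\nc = 3\nd = 4\n", "x = 0\ny = 1\nz = 2\n"]
features = metadata_features(stamps, codes)
```

For this invented history the sketch yields 1.5 submissions per week, a mean gap of 168 hours, and a mean length of 3 lines.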
After preprocessing was performed, the raw submissions were transformed into vectors, embeddings, and metadata, ensuring that both code syntax and semantic meaning were preserved for upcoming machine learning and NLP tasks. Table II presents the final number of records retained for courses after all preprocessing techniques were applied.
Table II. The number of records before and after preprocessing.
| Course | Initial Records | Final Records |
|---|---|---|
| Fundamentals of Programming (C) | 2,323 | 2,103 |
| Software Engineering (Java) | 1,413 | 1,297 |
| Big Data Analytics (Python) | 697 | 652 |
Figure I. Hybrid model architecture combining XGBoost classification, DBSCAN clustering, and sentiment analysis.
C. Model Selection
The proposed model architecture incorporates several complementary components that collectively address key structural, semantic, and behavioral aspects of programming activities among students. The feature space was built from three dimensions: CodeBERT embeddings provided deeper contextual representations of source code and comments, TF-IDF vectors captured lexical patterns in natural language segments, and metadata features like submission frequency, posting times, and code length added a behavioral layer. These diverse inputs were concatenated into a unified high-dimensional representation, shown in Figure I, enabling the system to capture lexical patterns, semantic context, and behavioral signals within a single analytical framework.
For supervised learning, XGBoost was selected as the primary classification model. Its gradient-boosted decision trees are highly effective for structured educational data, and the model is known for strong predictive accuracy, robustness against overfitting, and the ability to produce interpretable feature importance scores. Within this study, XGBoost was utilized to classify submissions against rubric-aligned criteria, identify potential overreliance on generative AI, and predict student outcomes across assignments [24].
For unsupervised learning, DBSCAN was adopted owing to its effectiveness in uncovering irregular cluster structures and detecting anomalies in noisy student submission datasets [25]. Unlike k-means or other centroid-based methods, DBSCAN does not require predefining the number of clusters and it is well suited to detecting behavioral outliers, such as sudden spikes in activity that may indicate excessive AI tool usage.
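The density-based idea can be illustrated with a short pure-Python sketch. It is not the study's implementation, since the paper does not say which DBSCAN implementation was used, and the point coordinates, eps, and min_samples values below are invented for the example.

```python
import math

NOISE = -1


def dbscan(points: list[tuple[float, ...]], eps: float, min_samples: int) -> list[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""

    def neighbors(i: int) -> list[int]:
        # Indices of all points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but does not expand
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


# Two dense groups plus one isolated point (hypothetical 2-D behavioral features).
points = [
    (0.0, 0.0), (0.0, 0.5), (0.5, 0.0),
    (5.0, 5.0), (5.0, 5.5), (5.5, 5.0),
    (10.0, 0.0),
]
labels = dbscan(points, eps=1.0, min_samples=3)
```

The number of clusters is never passed in: the two dense groups are discovered from the data, and the isolated point receives the noise label, which is the property the text relies on for flagging behavioral outliers.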
D. Model Training and Evaluation
Model training and evaluation were carried out in Python using Scikit-learn (GridSearchCV, StratifiedKFold), HuggingFace Transformers (AutoModel, AutoTokenizer with pretrained CodeBERT embeddings), and the XGBoost library, allowing traditional machine learning models to be combined with transformer-based embeddings. The processed student submissions were represented as matrices integrating lexical, semantic, and behavioral insights. This consolidated representation formed the input layer for both supervised and unsupervised learning modules.
To ensure reliable and generalizable model performance, a stratified cross-validation strategy was applied, balancing submissions across all assignment categories and difficulty levels. Data partitioning was designed to preserve temporal consistency, preventing information leakage between weeks and confirming that the evaluation mirrored real classroom dynamics. Hyperparameter tuning of the XGBoost model was performed through a grid search, optimizing key parameters such as learning rate, maximum tree depth, and the number of estimators. Early stopping mechanisms were integrated into the training cycle to mitigate overfitting and preserve model generalizability. For DBSCAN clustering, the epsilon radius and minimum sample threshold were adjusted based on empirical testing, combining silhouette scores with domain expertise on student coding behaviors to distinguish meaningful patterns from noise. DBSCAN was suitable in this context as it identifies clusters of arbitrary shape and labels low-density points as anomalies, making it effective for capturing irregular and atypical coding patterns.
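The paper does not spell out how temporal consistency was enforced alongside stratification. One simple way to avoid leakage between weeks, sketched here under that caveat, is a forward-chaining split in which every test week is preceded only by earlier training weeks. The function name and the week numbers are hypothetical.

```python
def forward_week_splits(weeks: list[int], min_train_weeks: int = 1):
    """Yield (train_idx, test_idx) pairs where training data always precedes test data."""
    distinct = sorted(set(weeks))
    for k in range(min_train_weeks, len(distinct)):
        test_week = distinct[k]
        # Train on every submission from strictly earlier weeks.
        train_idx = [i for i, w in enumerate(weeks) if w < test_week]
        # Test on the submissions from the current week only.
        test_idx = [i for i, w in enumerate(weeks) if w == test_week]
        yield train_idx, test_idx


# Hypothetical week number for each of six submissions.
submission_weeks = [1, 1, 2, 2, 3, 3]
splits = list(forward_week_splits(submission_weeks))
```

With three weeks of data this produces two folds: week 1 predicting week 2, then weeks 1 and 2 predicting week 3. No fold ever trains on submissions made after the ones it is tested on.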
The training phase thus merged the expressive strengths of transformer-derived embeddings with the interpretability offered by gradient boosting. This hybrid approach allowed the classifier to detect plagiarism-like similarities, identify evidence of AI-assisted code generation, and predict grading outcomes, while the clustering exposed latent behavioral structures and anomalies in submission patterns.
Model evaluation involved quantitative and qualitative analyses. For the supervised classification, performance was assessed using accuracy, precision, recall, and F1-scores to provide a balanced view of predictive capability. Clustering validity was examined through silhouette coefficients as well as manual inspection of cluster cohesion and separation.
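For reference, the four classification metrics can be computed from confusion-matrix counts as in the sketch below. It assumes a binary labeling (1 for the positive class, for example a flagged submission); the paper does not state whether its task was binary or multi-class or how per-class scores were averaged, and the label vectors here are invented.

```python
def binary_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, precision, recall, and F1 for labels in {0, 1} (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical ground truth and predictions for ten submissions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
scores = binary_metrics(y_true, y_pred)
```

Here three of four positives are found and one negative is wrongly flagged, giving an accuracy of 0.8 and precision, recall, and F1 of 0.75 each.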
IV. EXPERIMENTAL RESULTS
The experimental evaluation was designed to assess the effectiveness of the proposed architecture in capturing both semantic and behavioral properties of student submissions. Results are reported through classification, clustering, and sentiment analysis, providing a comprehensive overview of student engagement with different coding assignments.
The initial runs of the XGBoost classifier did not produce particularly strong results, with accuracy and recall values fluctuating below 80 percent. However, after fine-tuning of hyperparameters and optimization of feature integration, the model achieved stronger predictive performance across all courses. Using stratified cross-validation, the average assignment accuracy reached 91.4 percent, with precision at 0.900, recall at 0.895, and F1-scores averaging 0.897 over multiple runs. Performance was slightly higher in the Fundamentals of Programming and Software Engineering courses, where assignments were notably more structured and guided by predefined grading criteria. For example, F1-scores reached 0.921 in Java projects that integrated design patterns like Singleton and Observer, reflecting the model's ability to capture learning outcomes. In contrast, Big Data Analytics showed somewhat lower results, with accuracy averaging 88.3 percent and F1-scores around 0.865, due to the open-ended nature of Python assignments and the diverse coding and analytical strategies among students. These findings suggest opportunities for continued refinement of the model and its adaptation to more instructional contexts. Table III presents a summary of evaluation metrics, showing that the classifier achieved high accuracy and balanced performance across courses while still facing challenges in less constrained tasks.
Figure II. DBSCAN clustering of student submissions.
Table III. Model performance metrics across courses.
| Course | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Fundamentals of Programming (C) | 92.5% | 0.910 | 0.900 | 0.906 |
| Software Engineering (Java) | 93.1% | 0.920 | 0.921 | 0.921 |
| Big Data Analytics (Python) | 88.3% | 0.870 | 0.861 | 0.865 |
| Average (All courses) | 91.4% | 0.900 | 0.895 | 0.897 |

Feature importance analysis revealed that the CodeBERT embeddings were the strongest predictors, highlighting that semantic patterns in code and comments directly influenced classification outcomes. The way students wrote, structured, and explained their submitted code carried significant weight in distinguishing authentic work from AI-assisted submissions. Behavioral metadata, including submission timestamps and assignment complexity, proved to be the next most relevant contributors. Unusually fast completions on complex tasks often signaled possible reliance on AI tools, while irregular submission intervals pointed to inconsistent engagement.
DBSCAN clustering revealed distinct behavioral groups among students, as illustrated in Figure II. Approximately 12 percent of submissions were flagged as anomalous (shown in gray), usually defined by high similarity to AI-generated templates, excessive resubmissions, or short completion times inconsistent with assignment length and complexity. The silhouette score reached 0.67, indicating meaningful cluster separation, as confirmed by manual inspection.
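For readers unfamiliar with the measure, the silhouette of a point is (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. The sketch below averages this over all clustered points. Excluding noise points (label -1) is an assumption, since the paper does not say how anomalies were treated when the score was computed, and the coordinates are invented.

```python
import math


def silhouette_score(points: list[tuple[float, ...]], labels: list[int]) -> float:
    """Mean silhouette over clustered points; points labeled -1 (noise) are ignored."""
    clusters: dict[int, list[tuple[float, ...]]] = {}
    for p, lab in zip(points, labels):
        if lab != -1:
            clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for lab, members in clusters.items():
        for p in members:
            if len(members) == 1:
                scores.append(0.0)  # usual convention for singleton clusters
                continue
            # a: mean distance to the other members of p's own cluster
            # (the sum includes p itself, which contributes a distance of 0).
            a = sum(math.dist(p, q) for q in members) / (len(members) - 1)
            # b: smallest mean distance to the members of any other cluster.
            b = min(
                sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items()
                if other_lab != lab
            )
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Two tight, well-separated clusters and one noise point (hypothetical data).
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (50.0, 50.0)]
labs = [0, 0, 1, 1, -1]
score = silhouette_score(pts, labs)
```

Values near 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest points assigned to the wrong cluster.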
The clustering process produced three main groups based on semantic embeddings, lexical features, and behavioral metadata. The first cluster (colored green on the graph) involved students with consistent and authentic implementation styles, where submission frequency and code length aligned with expected patterns. The second cluster (colored orange) contained students who showed irregular engagement, such as rapid completions of complex tasks or uneven submission intervals, suggesting intermittent adoption of AI utilities. The third cluster (colored red), which was proportionally small, consisted of students producing unusually long and well-structured code early in the semester, pointing to possible external help or heavy reliance on AI-generated solutions.
Course-specific differences were evident across clusters. In Fundamentals of Programming assignments, anomalies reflected copy-paste behaviors in repetitive C exercises. In the Software Engineering course, flagged anomalies centered on Java design pattern implementations closely resembling AI-generated snippets. In Big Data Analytics projects, clusters exposed dependencies on external tutorials and pretrained libraries, particularly in climate and hotel data visualization, followed by sentiment analysis tasks on Twitter posts.
V. CONCLUSION
This study investigated the increasing reliance of students on generative artificial intelligence tools such as ChatGPT, GitHub Copilot, Jupyter AI, and Google Bard in programming education. Using data collected from three undergraduate and graduate-level programming courses at the Academy of Applied Technical and Preschool Studies in Serbia, we built an analytical framework that incorporated natural language processing, supervised and unsupervised machine learning, and pedagogical analytics. The system combined CodeBERT for semantic similarity and plagiarism detection, TF-IDF for deep lexical analysis, XGBoost for rubric-based classification, DBSCAN for anomaly detection, and sentiment analysis for interpreting reflective notes.
The results demonstrated that the model achieved strong predictive accuracy, distinguishing authentic work from AI-assisted submissions and uncovering irregular behaviors related to notable overreliance on generative tools. DBSCAN clustering revealed three distinct behavioral groups, where anomalies reflected copy-paste practices in C programming, AI-driven design pattern replication in Java, and reliance on external libraries in Python analytics. Moreover, sentiment analysis and student reflections further highlighted positive attitudes toward AI guidance and critical concerns related to fairness, critical thinking, and sustainable learning.
Overall, the hybrid approach effectively merged semantic embeddings, lexical features, and behavioral metadata to capture authentic engagement, anomalous patterns, and key risks of dependency. This combination of supervised and unsupervised techniques showcases the benefits of aligning advanced analytics with pedagogical reflection, offering a more dependable model for evaluating student learning in AI-powered contexts.
While these results strongly indicate that generative AI is transforming programming education, further research is required to strengthen and generalize the framework. Multi-institutional studies would help validate applications beyond a single academic setting, and longitudinal research is needed to determine whether this type of learning fosters long-lasting knowledge acquisition. Developing uniform evaluation criteria for AI-assisted learning could further enhance consistency and comparability across institutions. Furthermore, incorporating explainable AI methods such as SHAP (used for global feature importance) and LIME (used to provide local explanations of individual predictions) would enhance transparency, giving instructors better insights into model decisions and enabling more actionable feedback.
Taken together, this study demonstrates that prohibiting modern generative AI tools is neither feasible nor beneficial for education. Instead, a more balanced approach can be achieved through integration supported by guided feedback, clear ethical policies, and continuous monitoring. By uniting technical analysis with reflective pedagogy, the proposed solution highlights opportunities and risks of generative AI, pointing toward a future-ready model of computer science education that fosters meaningful learning while equipping students for an AI-driven professional environment.
REFERENCES
1. H. Güner and E. Er, “AI in the classroom: Exploring students’ interaction with ChatGPT in programming learning,” Educ. Inf. Tech., vol. 30, pp. 12681–12707, 2025, doi: 10.1007/s10639-025-13337-7.
2. K. Fuchs, “Exploring the opportunities and challenges of NLP models in higher education: Is ChatGPT a blessing or a curse?,” Front. Educ., vol. 8, p. 1166682, May 2023, Frontiers Media SA, doi: 10.3389/feduc.2023.1166682.
3. J. Beltrán and E. Veiga-Zarza, “Evaluating the use of large language models in programming courses: a comparative study,” EDULEARN Proc., vol. 1, pp. 1761–1768, 2025, doi: 10.21125/edulearn.2025.0530.
4. A. Konak and C. J. S. F. Clarke, “Augmenting critical thinking skills in programming education through leveraging ChatGPT: Analysis of its opportunities and consequences,” in Proc. 2023 Fall Mid Atlantic Conf.: Meeting Our Students Where They Are and Getting Them Where They Need to Be, Ewing, NJ, USA, Oct. 2023, doi: 10.18260/1-2--45117.
5. S. Folvarochna, Using learning analytics to identify student challenges in programming education. B.Sc. thesis, Dept. Comput. Sci. Inf. Technol., Fac. Appl. Sci., Ukrainian Catholic Univ., Lviv, Ukraine, 2025.
6. G. Fenu, R. Galici, M. Marras, and D. Reforgiato, “Exploring student interactions with AI in programming training,” in Adjunct Proc. 32nd ACM Conf. User Modeling, Adaptation and Personalization, New York, NY, USA: Assoc. Comput. Mach., 2024, pp. 555–560, doi: 10.1145/3631700.3665227.
7. B. Ma, L. Chen, and S. Konomi, “Exploring student perception and interaction using ChatGPT in programming education,” in Proc. 21st Int. Conf. Cogn. Explor. Learn. Digital Age (CELDA), 2024, doi: 10.33965/celda2024_202408l005.
8. J. Xie, “Improving introductory Java programming education through ChatGPT,” J. Comput. Sci. Coll., vol. 40, no. 3, pp. 140–150, Oct. 2024.
9. R. Yilmaz and F. G. K. Yilmaz, “The effect of generative artificial intelligence (AI)-based tool use on students' computational thinking skills, programming self-efficacy and motivation,” Comput. Educ.: Artif. Intell., vol. 4, p. 100147, 2023, doi: 10.1016/j.caeai.2023.100147.
10. F. M. Mboya, G. M. Wambugu, A. M. Oirere, E. O.
Omuya, F. M. Musyoka, and J. W. Gikandi,
“Enhancing personalized learning in programming
education through generative artificial intelligence
frameworks: A systematic literature review,” Int. J.
Adv. Trends Comput. Sci. Eng., vol. 14, no. 2, pp.
514–522, 2025,
doi: 10.30534/ijatcse/2025/051422025.
11. S. Boguslawski, R. Deer, and M. G. Dawson,
“Programming education and learner motivation in the
age of generative AI: Student and educator
perspectives,” Inf. Learn. Sci., vol. 126, no. 1/2, pp.
91–109, 2025, doi: 10.1108/ILS-10-2023-0163.
12. T. Phung, V. A. Pădurean, J. Cambronero, S. Gulwani,
T. Kohn, R. Majumdar, and G. Soares, “Generative AI
for programming education: Benchmarking ChatGPT,
GPT-4, and human tutors,” in Proc. 2023 ACM Conf.
Int. Comput. Educ. Res. – Vol. 2, Aug. 2023, pp. 41–42,
doi: 10.1145/3568812.3603476.
13. S. Sarsa, P. Denny, A. Hellas, and J. Leinonen,
“Automatic generation of programming exercises and
code explanations using large language models,” in
Proc. 2022 ACM Conf. Int. Comput. Educ. Res. – Vol.
1, Aug. 2022, pp. 27–43,
doi: 10.1145/3501385.3543957.
14. S. Yazdani, M. Najimi, and M. Ahmadzadeh, “The
paradox of generative AI in programming education,”
in EDULEARN Proc., 2025, pp. 7775–7784,
doi: 10.21125/edulearn.2025.1927.
15. D. Franklin, P. Denny, D. A. Gonzalez-Maldonado,
and M. Tran, Generative AI in computer science
education: Challenges and opportunities. Cambridge,
U.K.: Cambridge Univ. Press, 2025.
16. J. Prather, P. Denny, J. Leinonen, B. A. Becker, I.
Albluwi, M. Craig, and J. Savelka, “The robots are
here: Navigating the generative AI revolution in
computing education,” in Proc. 2023 Working Group
Reports Innov. Technol. Comput. Sci. Educ., 2023, pp.
108–159, doi: 10.1145/3623762.3633499.
17. D. Cambaz and X. Zhang, “Use of AI-driven code
generation models in teaching and learning
programming: A systematic literature review,” in
Proc. 55th ACM Tech. Symp. Comput. Sci.
Educ., vol. 1, Mar. 2024, pp. 172–178,
doi: 10.1145/3626252.3630958.
18. Y. Zhan, Q. Liu, W. Gao, Z. Zhang, T. Wang, S. Shen,
et al., “CoderAgent: Simulating student behavior for
personalized programming learning with large
language models,” arXiv preprint, 2025,
doi: 10.48550/arXiv.2505.20642.
19. The Academy of Applied Technical and Preschool
Studies, Lecture notes on Fund. of Programming,
Serbia, 2023-2024.
20. The Academy of Applied Technical and Preschool
Studies, Lecture notes on Software Engineering,
Serbia, 2023-2024.
21. The Academy of Applied Technical and Preschool
Studies, Lecture notes on Big Data Analytics, Niš,
Serbia, 2023-2024.
22. K. M. G. S. Karunarathna and R. A. H. M. Rupasingha,
“Learning to use normalization techniques for
preprocessing and classification of text documents,”
Int. J. Multidiscip. Stud., vol. 9, no. 2, pp. 69–81, 2022.
23. P. T. Nguyen, J. Di Rocco, C. Di Sipio, R. Rubei, D.
Di Ruscio, and M. Di Penta, “Is this snippet written by
ChatGPT? An empirical study with a CodeBERT-based
classifier,” arXiv preprint, 2023,
doi: 10.48550/arXiv.2307.09381.
24. A. Asselman, M. Khaldi, and S. Aammou, “Enhancing
the prediction of student performance based on the
machine learning XGBoost algorithm,” Interact.
Learn. Environ., vol. 29, no. 3, pp. 3360–3379, 2021,
doi: 10.1080/10494820.2021.1928235.
25. H. Du, S. Chen, H. Niu, and Y. Li, “Application of
DBSCAN clustering algorithm in evaluating students'
learning status,” in Proc. 17th Int. Conf. Comput.
Intell. Security, Chengdu, China, 2021, pp. 372–376,
doi: 10.1109/CIS54983.2021.00084.
