Towards Adaptive Pedagogical Policies: A Hybrid
Reinforcement Learning and Large Language Model
Framework for Intelligent Tutoring Systems under
Indonesia’s Curriculum
Praditya Wicaksono
Magister Informatika
Universitas Dian Nuswantoro
Semarang, Indonesia
[email protected]us.ac.id
Pulung Nurtantio Andono
Magister Informatika
Universitas Dian Nuswantoro
Semarang, Indonesia
[email protected]us.ac.id
Pujiono
Magister Informatika
Universitas Dian Nuswantoro
Semarang, Indonesia
[email protected]us.ac.id
Abstract: Personalized learning is increasingly central to
technology-enhanced education, yet practical deployments
often oscillate between systems that are pedagogically
strategic but rigid (classical Intelligent Tutoring Systems,
ITS) and systems that are conversationally fluent but
pedagogically myopic (Large Language Models, LLMs). This
vision paper proposes a hybrid architecture that integrates
Q-learning as a pedagogical decision engine and LLMs as a
communication engine, orchestrated within an Adaptive
Learning System (ALS) layer. The goal is to realize adaptive
pedagogical policies that align with Indonesia’s Kurikulum
Merdeka (Merdeka Curriculum), which emphasizes flexibility,
learner agency, and differentiated pathways. We position the
research gap at the intersection of strategy (RL/ITS),
communication (LLM), and orchestration (ALS), and articulate
contributions in terms of (i) a principled integration that
preserves pedagogical intent while delivering human-like
dialogue; (ii) a localization agenda grounded in the
curricular and infrastructural realities of Indonesian
secondary education; and (iii) a research roadmap covering
conceptual design, limited-scope prototyping, technical and
pedagogical evaluation, and scaling. The paper intentionally
avoids implementation details, parameters, or code, focusing
instead on novelty, scope, and actionable implications for
researchers and practitioners.
Index Terms: Intelligent Tutoring Systems, Reinforcement
Learning, Q-learning, Large Language Models, Adaptive
Learning Systems, Merdeka Curriculum, Indonesia, Educational
Technology.
I. Introduction
The global shift toward personalization has exposed
limitations in one-size-fits-all instruction. Learners differ
in prior knowledge, motivation, pace, and preferred
modalities, demanding systems that adapt to evolving needs
rather than enforcing static sequences. Classical Intelligent
Tutoring Systems (ITS), particularly those employing
Reinforcement Learning (RL), have demonstrated promise in
optimizing pedagogical strategies such as difficulty
adjustment and content sequencing [1], [3]. Yet many ITS
deployments remain rigid, as strategy parameters and response
forms are often predetermined and slow to react to real-time
learner dynamics.
In parallel, Large Language Models (LLMs) have shown
exceptional capability in natural dialogue, scaffolding
explanations, and responding to open-ended queries [4], [5].
However, LLMs are typically reactive content generators
without stable long-horizon pedagogical policies;
conversation quality can be high while instructional
coherence over time is not guaranteed.
Indonesia’s Merdeka Curriculum (Kurikulum Merdeka)
accentuates personalization, flexibility, and
contextualization across diverse school settings. The
curriculum’s ambition meets implementation constraints:
variability in teacher readiness, resource limitations, and
geographical disparities. A principled human–AI partnership
is needed to scale differentiation without diluting
pedagogical integrity.
This paper proposes a hybrid ITS architecture where
Q-learning governs what pedagogical action to take and when,
while an LLM determines how to communicate the action in a
human-like, student-sensitive manner. An ALS layer
orchestrates data flows, profiles, and objectives, ensuring
that micro-level decisions cohere with macro-level learning
paths and curricular targets.
II. Positioning and Novelty
A. Research Gap
Evidence suggests RL can learn adaptive teaching strategies
and improve engagement [1]–[3], yet classical setups struggle
with real-time responsiveness and rich feedback channels.
Conversely, LLMs enable flexible, contextual dialogue but
lack policy memory and principled instructional control [4].
ALS provides the macro-framework for differentiation and
progress monitoring [6], [7], but most integrations remain
partial (e.g., scheduling without conversational scaffolding,
or chat without policy coherence).
The gap lies in a unified model that: (i) preserves
pedagogical intent via a learned policy, (ii) expresses that
intent via natural dialogue, and (iii) coordinates both
within a curriculum-aware orchestration layer.
B. Claims of Novelty
(1) Policy-first integration of RL and LLM within ALS. We
articulate a division of labor: Q-learning as a decision
engine that selects pedagogical actions and LLM as a
communication engine that realizes those actions
conversationally. ALS binds the two, aligning policy with
learner models, goals, and constraints.
(2) Localization to the Merdeka Curriculum. We explicitly
target the needs of Indonesian secondary schools:
differentiated pathways, varied readiness, and teacher
workload. The vision prioritizes curricular coherence,
teacher oversight, and scalability under resource
constraints.
(3) Evaluation across technical and pedagogical axes. Beyond
system performance, we emphasize effects on understanding,
motivation, and engagement, which are critical to policy
adoption and real-world impact.
III. Conceptual Architecture
A. Three-Layer View
Decision Engine (Q-learning). The policy space comprises
pedagogical actions (e.g., explain, prompt metacognition,
give practice, offer challenge, review), chosen with respect
to learner states (engaged, struggling, passive, curious,
etc.). The focus is on adaptive policies that respond to
recent evidence while remaining faithful to curricular
targets. We deliberately avoid algorithmic detail; the
essential point is that decisions are policy-driven and
data-informed.
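
The paper leaves the algorithm open, so the following is only a minimal illustrative sketch of how a tabular Q-learning decision engine over the states and actions named above might look. The class name, the hyperparameter values (alpha, gamma, epsilon), and the reward signal are assumptions made for illustration and are not specified in this paper.

```python
import random
from collections import defaultdict

# State and action labels taken from the text; identifiers are illustrative.
STATES = ["engaged", "struggling", "passive", "curious"]
ACTIONS = ["explain", "prompt_metacognition", "give_practice",
           "offer_challenge", "review"]


class PedagogicalQLearner:
    """Tabular Q-learning over (learner state, pedagogical action) pairs."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        # alpha: learning rate, gamma: discount, epsilon: exploration rate.
        # All three values are placeholders, not recommendations.
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)  # (state, action) -> estimated value
        self.rng = random.Random(seed)

    def choose_action(self, state):
        # Epsilon-greedy: explore occasionally, otherwise pick the best-known action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard Q-learning update:
        # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```

In such a sketch the reward would have to come from observed learner evidence (for example, a correct answer after an explanation); how that signal is defined is a design question the paper does not settle.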
Communication Engine (LLM). The LLM translates selected
actions into dialogic turns that are age-appropriate,
linguistically and culturally sensitive, and aligned with
learner profiles (reading level, prior misconceptions,
affect). The LLM is constrained by the decision engine: it
does not “freelance” instruction, but expresses policy
coherently and consistently.
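
One way to picture this constraint is a prompt builder that accepts only actions issued by the decision engine and embeds the learner profile in the instruction text. The sketch below is hypothetical: the instruction wording, the `LearnerProfile` fields, and `build_prompt` are illustrative names, and no particular LLM API is assumed (the function only returns a string).

```python
from dataclasses import dataclass

# Hypothetical mapping from policy actions to instructions for the LLM.
ACTION_INSTRUCTIONS = {
    "explain": "Explain the concept step by step.",
    "prompt_metacognition": "Ask the student to reflect on their own reasoning.",
    "give_practice": "Give one practice problem at the current level.",
    "offer_challenge": "Offer one harder problem that extends the concept.",
    "review": "Briefly review the prerequisite ideas.",
}


@dataclass
class LearnerProfile:
    reading_level: str     # e.g. "grade 10"
    misconceptions: list   # known prior misconceptions, as short strings
    affect: str            # e.g. "frustrated", "confident"


def build_prompt(action, topic, profile):
    """Turn a policy-selected action into a constrained LLM instruction."""
    if action not in ACTION_INSTRUCTIONS:
        # The LLM never receives an action the decision engine did not define.
        raise ValueError(f"unknown pedagogical action: {action}")
    lines = [
        f"Topic: {topic}",
        f"Pedagogical action (do not deviate): {ACTION_INSTRUCTIONS[action]}",
        f"Reading level: {profile.reading_level}",
        f"Current affect: {profile.affect}",
    ]
    if profile.misconceptions:
        lines.append("Address these misconceptions: "
                     + "; ".join(profile.misconceptions))
    return "\n".join(lines)
```

The resulting string could be passed to whichever LLM backend a deployment uses; the point of the sketch is only that the action is fixed before any text is generated.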
ALS Layer. ALS orchestrates long-horizon differentiation:
learner profiling, path planning, prerequisite structures,
assessment checkpoints, and alignment with Merdeka Curriculum
outcomes. It mediates data exchanges and guards coherence
between micro-decisions and macro-goals.
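
As a small illustration of the prerequisite structures mentioned above, the sketch below computes which competencies a learner may attempt next, given what has been mastered. The competency names and the prerequisite map are invented for the example and are not taken from Merdeka Curriculum documents.

```python
# Hypothetical prerequisite map: competency -> competencies required first.
PREREQUISITES = {
    "kinematics": [],
    "newton_laws": ["kinematics"],
    "work_energy": ["newton_laws"],
    "momentum": ["newton_laws"],
}


def eligible_competencies(mastered, prerequisites):
    """Return not-yet-mastered competencies whose prerequisites are all mastered."""
    return sorted(
        competency
        for competency, required in prerequisites.items()
        if competency not in mastered and all(r in mastered for r in required)
    )


# A learner who has mastered kinematics can move on to Newton's laws only.
next_steps = eligible_competencies({"kinematics"}, PREREQUISITES)
```

An orchestration layer could use such a filter to restrict the topics the decision engine is allowed to act on, which is one possible way to keep micro-decisions inside the macro-level path.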
B. Human-in-the-Loop Oversight
Teacher dashboards surface analytics (e.g., progression
indicators, engagement traces, misconception clusters) and
provide veto/override mechanisms. This ensures
accountability, supports formative assessment, and preserves
teacher agency. The system aims to complement, not replace,
teachers, particularly in contexts with high student–teacher
ratios.
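
A veto/override mechanism could be as simple as a resolution step between the policy's proposal and the action that is actually delivered. The sketch below is an assumption about one possible shape: `resolve_action`, the fallback action, and the source labels are hypothetical and not prescribed by the paper.

```python
def resolve_action(proposed, vetoed, override=None, fallback="review"):
    """Decide which action is delivered and record who decided it.

    proposed: action chosen by the decision engine
    vetoed:   set of actions the teacher has disallowed for this learner
    override: action the teacher has explicitly requested, if any
    fallback: action used when the proposal is vetoed (assumed not vetoed itself)
    """
    if override is not None:
        # An explicit teacher choice always wins.
        return override, "teacher_override"
    if proposed in vetoed:
        # The policy's choice is blocked; fall back to a safe default.
        return fallback, "teacher_veto"
    return proposed, "policy"


# The teacher has disallowed harder problems for this learner.
action, source = resolve_action("offer_challenge", vetoed={"offer_challenge"})
```

Returning the decision source alongside the action gives the dashboard something to log, so that interventions by the teacher remain visible when policies are later reviewed.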
C. Safety, Reliability, and Ethics
We anticipate risks such as hallucinations, biased feedback,
and over-automation. The architecture therefore incorporates:
(i) grounded prompts and controlled generation for the LLM,
(ii) content validation against curriculum-aligned
repositories, (iii) transparent logging for auditability, and
(iv) student privacy safeguards consistent with local
regulations and institutional policies.
IV. Research Roadmap
A. Phase I: Conceptual Design
• Formalize the policy–communication separation of concerns.
• Define learner state taxonomies and action ontologies
  aligned with Merdeka Curriculum competencies.
• Conduct expert reviews (educational technologists,
  curriculum specialists, teachers) to validate constructs
  and usability expectations.
B. Phase II: Limited-Scope Prototyping
• Implement a minimal vertical slice for one high-school
  subject (e.g., Physics or Informatics), focusing on a few
  well-defined competencies.
• Integrate a basic dashboard for teacher oversight and
  feedback capture.
• Pilot in a controlled environment (small class,
  after-school program) to test feasibility and workflow.
C. Phase III: Technical and Pedagogical Evaluation
Technical. Evaluate responsiveness of the decision engine and
stability of policy–dialogue alignment. Track convergence
signals and robustness to noisy inputs (without exposing
parameters).
Pedagogical. Use pre/post measures of conceptual
understanding, motivation, and engagement; include usability
measures and qualitative feedback from teachers and students.
Compare against conventional instruction or chat-only
baselines to isolate the value of policy-driven dialogue.
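
The paper does not name a specific pre/post metric. One common choice in education research is the normalized gain, g = (post - pre) / (max - pre). The sketch below uses it purely as an example; the score scale and the sample scores are invented for illustration and are not results.

```python
from statistics import mean


def normalized_gain(pre, post, max_score=100.0):
    """Fraction of the available improvement that was actually achieved."""
    if not (0 <= pre <= max_score and 0 <= post <= max_score):
        raise ValueError("scores must lie between 0 and max_score")
    if pre == max_score:
        raise ValueError("no room for improvement; gain is undefined")
    return (post - pre) / (max_score - pre)


def mean_gain(pairs, max_score=100.0):
    """Average normalized gain over (pre, post) score pairs for one group."""
    return mean(normalized_gain(pre, post, max_score) for pre, post in pairs)


# Made-up (pre, post) scores for two hypothetical pilot groups.
hybrid_group = [(40, 70), (50, 75)]
chat_only_group = [(40, 55), (50, 60)]
difference = mean_gain(hybrid_group) - mean_gain(chat_only_group)
```

A real study would also need an appropriate statistical test and attention to group comparability; the snippet only shows how the per-group quantity could be computed.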
D. Phase IV: Expansion and Dissemination
• Extend to additional subjects and competencies; stress-test
  in schools with varying resource profiles.
• Document design principles and deployment playbooks for
  local edtech stakeholders.
• Share findings via open preprints and conference/journal
  publications to foster collaboration.
V. Implications
A. For Research
This agenda situates ITS research at the intersection of
strategy learning, conversational pedagogy, and system
orchestration. By cleanly separating policy from expression,
we open avenues for studying how different action ontologies,
feedback designs, and curriculum constraints shape learning
trajectories.
B. For Practice
Teachers gain a policy-aware assistant that scales
differentiation and reduces routine burdens, while preserving
professional judgment. School leaders obtain analytics for
targeted interventions. For learners, the system promises
guidance that is both personable and purposefully sequenced.
C. For Technology
Vendors and developers can adopt the architecture as a
modular blueprint: interchangeable policy learners, swappable
LLM backends, and pluggable ALS components. This modularity
supports responsible upgrades without destabilizing classroom
practice.
VI. Conclusion
We have outlined a hybrid ITS vision that marries the
pedagogical intentionality of RL with the communicative power
of LLMs, orchestrated by an ALS layer tuned to Indonesia’s
Merdeka Curriculum. The contribution is not algorithmic
novelty per se but architectural clarity: a policy-first
integration that keeps dialogue in service of instructional
goals, localized to a national reform agenda. By focusing on
conceptual soundness, localized relevance, and dual-axis
evaluation (technical and pedagogical), we aim to catalyze
deployable innovation rather than laboratory-bound
prototypes.
Acknowledgment
The authors thank collaborating teachers, curriculum experts,
and school partners for formative feedback during the concept
development.
References
[1] A. Iglesias, P. Martínez, R. Aler, and F. Fernández, “Learning
teaching strategies in an adaptive and intelligent educational
system,” Applied Intelligence, 2009.
[2] Y. Zhang and I. Arroyo, “Q-learning in Intelligent Tutoring
Systems,” Proc. AI-ED, 2001.
[3] F. Dorça et al., “Adaptive learning systems: A review,”
Computers & Education, 2013.
[4] E. Kasneci et al., “ChatGPT for education,” Computers and
Education: AI, 2023.
[5] K. Sharma et al., “Conversational AI for personalized learning,”
Proc. ACM L@S, 2023.
[6] X. Chen et al., “Predictive analytics in adaptive learning,” IEEE
Trans. Learning Technologies, 2022.
[7] J. Bassen et al., “Adaptive scheduling of practice,” Proc. EDM,
2020.