
UoL-UPF at TSAR 2025 Shared Task: A Generate-and-Select Approach for Readability-Controlled Text Simplification

Author: Hayakawa, Akio; Khallaf, Nouran; Sharoff, Serge; Saggion, Horacio
Publisher: Zenodo
DOI: 10.5281/zenodo.17698365
Source: https://zenodo.org/records/17698365/files/2025.tsar-1.16.pdf
Proceedings of the Fourth Workshop on Text Simplification, Accessibility and Readability (TSAR 2025), pages 193–210
November 4-9, 2025 ©2025 Association for Computational Linguistics
Akio Hayakawa¹, Nouran Khallaf², Serge Sharoff², Horacio Saggion¹
¹Universitat Pompeu Fabra ²University of Leeds
{akio.hayakawa,horacio.saggion}@upf.edu
{N.Khallaf,S.Sharoff}@leeds.ac.uk
Abstract
The TSAR 2025 Shared Task on Readability-Controlled Text Simplification focuses on simplifying English paragraphs written at an advanced level (B2 or higher) and rewriting them to target CEFR levels (A2 or B1). The challenge is to reduce linguistic complexity without sacrificing coherence or meaning. We developed three complementary approaches based on large language models (LLMs). The first approach (Run 1) generates a diverse set of paragraph-level simplifications. It then applies filters to enforce CEFR alignment, preserve meaning, and encourage diversity, and finally selects the candidates with the lowest perceived risk. The second (Run 2) performs simplification at the sentence level, combining structured prompting, coreference resolution, and explainable AI techniques to highlight influential phrases, with candidate selection guided by automatic and LLM-based judges. The third hybrid approach (Run 3) integrates both strategies by pooling paragraph- and sentence-level simplifications, and subsequently applying the identical filtering and selection architecture used in Run 1. In the official TSAR evaluation, the hybrid system ranked 2nd overall, while its component systems also achieved competitive results.
1 Introduction
Text Simplification aims to make complex texts more accessible to a broad audience, including language learners and individuals with reading difficulties (Saggion, 2017; Al-Thanyyan and Azmi, 2021). However, many traditional approaches fail to meet the diverse needs of readers at different proficiency levels. To address this, the field has moved towards targeted simplification, which aims to adapt the complexity of a text to a specific reader's needs, rather than just simplifying it for a general audience (Barayan et al., 2025; Säuberli et al., 2024). This requires defining specific proficiency targets, and the Common European Framework of Reference for Languages (CEFR) has been widely used for this purpose (Imperial et al., 2025). Also, the majority of text simplification research has focused on the sentence level, while largely overlooking the more practical scenario of paragraph-level simplification. The TSAR 2025 Shared Task on Readability-Controlled Text Simplification is situated within this context, challenging participants to simplify paragraphs originally at B2 level or above to target levels of A2 and B1 (Alva-Manchego et al., 2025).
In this paper, we propose and validate a Generate-and-Select approach that does not rely on a single best prompt, model, or simplification strategy. Our primary goal was to achieve a high score on a key evaluation metric: similarity to the reference text. The official evaluation, conducted only automatically, was based on three metrics: CEFR compliance, output-to-original similarity (Meaning Preservation), and output-to-reference similarity. While the first two could be calculated by participants themselves, the reference texts were not provided. Our system therefore aimed for a high output-to-reference similarity.
To achieve this, we developed a powerful generate-and-select pipeline based on paragraph-level simplification (Run 1) as our core approach. This system first generates a diverse set of candidates and then filters them to create a high-quality candidate pool for Minimum Bayes Risk (MBR) decoding (Bickel and Doksum, 1977), which selects the optimal output. As demonstrated by Heineman et al. (2024), the diversity of candidates is crucial for enhancing the quality of MBR decoding. To further improve its performance, we introduced a sentence-level system (Run 2). While weaker on its own, this secondary system successfully injected structural diversity into our candidate pool. Our final, hybrid system (Run 3) combines the candidate pools from both Run 1 and Run 2. It then processes this combined pool using the same pipeline as Run 1 to select the optimal output.
Our approach proved highly effective in the shared task. Among 48 submissions from 20 international teams, our hybrid system (Run 3) and core system (Run 1) placed 2nd and 3rd overall. Notably, Run 3 and Run 1 ranked 1st and 2nd on reference text similarity, respectively, confirming the success of our primary objective.
However, our success also revealed an inherent limitation of the evaluation metric we focused on optimizing. Our case study highlights that while the metric is designed to capture deep semantic similarity, its scores can still be influenced by surface-level features. This can be misleading, as lexical overlap can sometimes outweigh semantic factuality in the score.
The main contributions of this paper are:
• We present a Generate-and-Select pipeline that successfully maximizes reference similarity.
• We demonstrate that even a weak system can contribute the diversity needed for a powerful selection pipeline.
• We analyse the limitations of the evaluation metric we focused on optimizing.
The experimental setup is available on GitHub.¹
2 Our pipeline
Our submission consists of three systems (Runs 1-3). Our core approach, which achieved 3rd place overall, is presented as Run 1. While our primary objective is to achieve a high output-to-reference similarity, we also aim to attain satisfactory scores in the other metrics, namely CEFR compliance and meaning preservation.
2.1 Run 1: Paragraph-Level MBR System
Run 1 is our primary system, designed to maximize the similarity between system outputs and reference texts through a multi-stage pipeline. As shown in Figure 1, the core approach is a three-stage process. We first generate a diverse set of candidates, and then select a high-quality subset by applying CEFR and Meaning Preservation filtering. Finally, we apply MBR decoding to select the output with the lowest risk.
2.1.1 Diverse Candidate Generation
The process starts with generating a large set of initial simplification candidates for each source paragraph and its corresponding target CEFR level. To ensure a rich and varied candidate pool, this generation process employs two key diversity strategies: multi-prompting and multi-model.

[Figure 1 (diagram): Run 1 generates 80 diverse candidates (4 LLMs x 4 prompts x 5 trials). Run 3 additionally draws up to 80 candidates from the 3^(n_sents) combinations of sentence-level outputs (S1, S2, ...) produced by three LLMs (LLMp, LLMq, LLMr). Both runs then apply CEFR Filtering (CEFR-aligned candidates), Meaning Preservation Ranking + Diversity-Aware Selection (candidate pool of up to 20), and MBR Decoding to obtain the final output.]
Figure 1: System Architecture of Run 1 and 3.

¹ https://github.com/ahaya3776/tsar2025sharedtask-uol-upf
• Multi-Prompting: We prepare four types of prompts, with three of them automatically generated by an LLM. Our prompts include two inductive prompts derived from trial data, a deductive prompt based on CEFR-adapted simplification rules, and a standard few-shot prompt. (See Appendix A for the details.)
• Multi-Model: The prompts above are run across four auto-regressive large language models (LLMs), GPT-4.1-mini,² gpt-oss-20b (OpenAI, 2025), Gemma-3-4b-it (Gemma, 2025), and Qwen-2.5-14b-it (Qwen, 2025), to capture the unique simplification tendencies of each model.
For each combination of prompt and LLM, we performed five simplification trials, using five separate API calls or five different seeds. As a result, we generated 80 candidates per simplification instance (4 LLMs x 4 prompts x 5 trials). See Appendix C for the hyperparameter settings.
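The generation grid described above can be sketched as a simple enumeration. This is only an illustration: the prompt identifiers and the dictionary layout of a job are assumptions, and the actual LLM calls are omitted.

```python
from itertools import product

# Model names follow the text; the prompt ids are hypothetical placeholders.
MODELS = ["GPT-4.1-mini", "gpt-oss-20b", "Gemma-3-4b-it", "Qwen-2.5-14b-it"]
PROMPTS = ["prompt_1", "prompt_2", "prompt_3", "prompt_4"]
N_TRIALS = 5


def generation_jobs(models=MODELS, prompts=PROMPTS, n_trials=N_TRIALS):
    """Enumerate one generation job per (model, prompt, trial) combination."""
    return [
        {"model": model, "prompt": prompt, "seed": trial}
        for model, prompt, trial in product(models, prompts, range(n_trials))
    ]


jobs = generation_jobs()
print(len(jobs))  # 4 models x 4 prompts x 5 trials = 80
```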
2.1.2 Candidate Pool Construction
After the generation stage, we filter, rank, and select from the initial set of candidates. This process creates an optimized candidate pool of up to 20 simplifications for MBR decoding.

² https://openai.com/index/gpt-4-1/
1. CEFR Filtering: First, we label the CEFR level (A1, A2, B1, B2, C1, and C2) of all candidates and obtain the minimum difference from the target CEFR level. Given the large number of candidates, this minimum difference is almost always zero (i.e., at least one candidate matches the target CEFR level). We then retain only the candidates that have this minimum difference. CEFR levels are labeled using the classification models used in the official shared task evaluation.
2. Meaning Preservation Ranking: The remaining CEFR-compliant candidates are ranked by their semantic similarity to the original source paragraph. We use MeaningBERT (Beauchemin et al., 2023), following the official evaluation.
3. Diversity-Aware Selection: From this ranked list, we build the final pool with a maximum size of 20. We select candidates primarily based on the previous ranking. However, to maximize the benefits of MBR decoding, which requires a diverse candidate pool (Heineman et al., 2024), we apply a filter to ensure structural diversity. A candidate is added to the pool only if its BLEU (Papineni et al., 2002) against every candidate already in the pool is below a threshold of 0.5.
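The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `cefr_of` and `similarity` are hypothetical callables standing in for the official CEFR classifiers and MeaningBERT, and `ngram_overlap` is a toy clipped bigram precision used in place of BLEU.

```python
from collections import Counter

CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def ngram_overlap(a, b, n=2):
    """Toy stand-in for BLEU: clipped n-gram precision of a against b."""
    ta, tb = a.lower().split(), b.lower().split()
    ga = Counter(tuple(ta[i:i + n]) for i in range(len(ta) - n + 1))
    gb = Counter(tuple(tb[i:i + n]) for i in range(len(tb) - n + 1))
    total = sum(ga.values())
    if total == 0:
        return 0.0
    return sum(min(count, gb[gram]) for gram, count in ga.items()) / total


def build_pool(candidates, source, target, cefr_of, similarity,
               overlap=ngram_overlap, max_size=20, threshold=0.5):
    """Filter by CEFR distance, rank by meaning preservation, select diversely."""
    # 1. CEFR filtering: keep candidates at the minimum distance from the target.
    t = CEFR_LEVELS.index(target)
    dist = {c: abs(CEFR_LEVELS.index(cefr_of(c)) - t) for c in candidates}
    best = min(dist.values())
    kept = [c for c in candidates if dist[c] == best]
    # 2. Meaning preservation ranking: most similar to the source first.
    ranked = sorted(kept, key=lambda c: similarity(source, c), reverse=True)
    # 3. Diversity-aware selection: skip near-duplicates of pooled candidates.
    pool = []
    for c in ranked:
        if all(overlap(c, p) < threshold for p in pool):
            pool.append(c)
        if len(pool) == max_size:
            break
    return pool
```

Because the pool is filled greedily in ranking order, a highly ranked candidate can block lower-ranked near-duplicates, but never the reverse.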
2.1.3 MBR Decoding
Finally, we apply MBR decoding to the constructed pool. MBR selects the single candidate that maximizes the expected utility function against all other candidates in the set. For the utility function, we again use MeaningBERT, measuring the pairwise similarity between candidates. The candidate with the highest average similarity score against the other candidates is selected as the final output. The final output $\hat{y}_{\mathrm{MBR}}$ can be expressed as:

$\hat{y}_{\mathrm{MBR}} = \operatorname*{arg\,max}_{y \in H} \left( \mathbb{E}_{H}\left[ \mathbb{E}_{y' \in H}\left[ u(y, y') \right] \right] \right), \quad (1)$

where $H$ is a candidate pool and $u(y, y')$ is a utility function, defined as MeaningBERT$(y, y')$.
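The selection rule described in this subsection can be sketched in a few lines. The `jaccard` utility below is a toy stand-in for MeaningBERT, introduced only so the example runs without a model; any pairwise similarity function can be passed instead.

```python
def mbr_select(pool, utility):
    """Return the candidate with the highest mean utility against the others."""
    if len(pool) == 1:
        return pool[0]

    def expected_utility(i):
        scores = [utility(pool[i], pool[j]) for j in range(len(pool)) if j != i]
        return sum(scores) / len(scores)

    best = max(range(len(pool)), key=expected_utility)
    return pool[best]


def jaccard(a, b):
    """Toy symmetric utility standing in for MeaningBERT(y, y')."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)


pool = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "the cat sat",
    "dogs bark loudly",
]
print(mbr_select(pool, jaccard))
```

The selected candidate is the one closest to the "centre" of the pool, which is why an outlier such as the last sentence in the toy pool is never chosen.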
2.2 Run 2: Sentence-level Simplification
Our second system approaches the task at the sentence level. Prior work has shown that long, coreferential sentences with dense terminology are a key source of difficulty for readers and are best addressed through targeted edits rather than global rewrites (Siddharthan, 2006; Shardlow, 2014; Štajner and Popović, 2016; Barayan et al., 2025). Run 2 therefore investigates whether explicit linguistic control, applied locally at the sentence level, can better align outputs with CEFR levels while preserving meaning (for the system architecture, see Appendix E). By simplifying sentences independently, while still highlighting the most important phrases, we aim to produce outputs that are both controlled and interpretable. Run 2 consists of the following steps.
Preprocessing. Each paragraph is first segmented into sentences and normalised for coreference. We replace ambiguous pronominal references (e.g., he, she, they, it) with their antecedents using AllenNLP's coreference system (Lee et al., 2017) and the spaCy-compatible coref module (Honnibal et al., 2020). This produces a list of self-contained sentences that can be simplified independently.
Highlighting influential phrases. To identify which parts of a sentence contribute most to linguistic complexity, we apply Integrated Gradients (IG) (Sundararajan et al., 2017). We apply Captum's LayerIntegratedGradients (Miglani et al., 2023) over the embedding layer of a sentence-based CEFR classifier (Barayan et al., 2025), using a padded baseline sequence and integrating gradients with respect to the "complexity" logit. Token-level attribution scores are aggregated into multi-word phrases (NP, VP, ADJP, PP) using spaCy chunks. The top-K phrases (default K=6) are retained by absolute score. These influential phrases are exported as (type, phrase, score) triples and injected into the simplification prompt (see Appendix B.1). This allows the LLM to focus on which terms to simplify or gloss.
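The aggregation and top-K step can be sketched as below. The sketch assumes that token attributions and phrase spans have already been computed elsewhere; the tuple layout of `phrase_spans` is hypothetical, and summing token scores within a phrase is an assumption, since the text does not state the aggregation function.

```python
def top_k_phrases(token_scores, phrase_spans, k=6):
    """Aggregate token attributions into phrases and keep the k strongest.

    token_scores: one attribution score per token.
    phrase_spans: (type, start, end, text) tuples with `end` exclusive,
                  e.g. produced by a chunker (hypothetical layout).
    Returns up to k (type, phrase, score) triples sorted by |score|.
    """
    triples = []
    for phrase_type, start, end, text in phrase_spans:
        # Assumption: a phrase score is the sum of its token scores.
        score = sum(token_scores[start:end])
        triples.append((phrase_type, text, score))
    triples.sort(key=lambda t: abs(t[2]), reverse=True)
    return triples[:k]


# Tokens: The committee deliberated extensively on fiscal policy
scores = [0.01, 0.20, 0.35, 0.10, -0.02, 0.30, 0.15]
spans = [
    ("NP", 0, 2, "The committee"),
    ("VP", 2, 4, "deliberated extensively"),
    ("PP", 4, 7, "on fiscal policy"),
]
for triple in top_k_phrases(scores, spans, k=2):
    print(triple)
```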
The same influential phrases have another role in the evaluator step, in which the metric verifies whether these spans are preserved in the simplified output. In this way, IG attributions serve a dual purpose: guiding generation and informing evaluation.
Simplification strategies. We guide the models with strategies inspired by intralingual translation and Easy-to-Read (E2R) English (Khallaf et al., 2025). These include explanation (adding glosses), modulation (one idea per sentence), synonymy (simple words), syntactic changes (splitting clauses), and omission (dropping non-essential details).
Prompting and candidate generation. We prompt three LLMs, LLaMA-3-8B (Dubey et al., 2024), GPT-4o (OpenAI, 2023), and Mistral-7B (Jiang et al., 2023), to generate simplifications for CEFR levels A1, A2, and B1 in a single response. Prompts enforce constraints on meaning preservation, correctness of entities and numbers, readability (shorter sentences, simple words), and strict formatting with explicit level tags (see Appendix B.1).
Automatic and hybrid judging. Candidate outputs are scored by an automatic judge that integrates eight complementary signals (see Table 7 in Appendix F). These include semantic similarity based on sentence embeddings and entailment (Williams et al., 2018), key-phrase coverage from IG attributions, entity and number fidelity using spaCy (Honnibal et al., 2020), readability targets derived from average sentence length (ASL) and Flesch Reading Ease (Flesch, 1948), lexical simplification (syllable reduction), fluency via language model perplexity (Jurafsky and Martin, 2023), compression ratio, and sentence/format control.
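For reference, the two readability quantities named above have standard definitions, given here as a reminder. How the judge turns them into level-specific targets is not specified in this section, so no thresholds are shown.

```latex
\mathrm{ASL} = \frac{\#\,\text{words}}{\#\,\text{sentences}},
\qquad
\mathrm{FRE} = 206.835 \;-\; 1.015 \cdot \mathrm{ASL} \;-\; 84.6 \cdot \frac{\#\,\text{syllables}}{\#\,\text{words}}
```

Higher FRE values indicate easier text, so shorter sentences and fewer syllables per word both raise the score.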
We combine the heterogeneous metrics with a weighted geometric mean, which is widely used in multi-criteria evaluation (Mohapatra and Kumar, 2015; Dodd et al., 2021). When two candidates score within a small margin, we invoke a Hybrid Auto+LLM (HAI) judge, which queries a second LLM (GPT-4o or LLaMA) to make a pairwise choice with justification. The judge receives the original, the target level, and the top-K candidates (prefiltered by the auto judge), and returns a winner index and a reason (see Appendix B.2). After simplification, sentences are re-stitched into the level-tagged block (<B1>, <A2>, <A1>).
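The score combination and the tie check can be sketched as follows. The metric names, the weights, and the margin value of 0.02 are illustrative assumptions; the text only states that a weighted geometric mean is used and that the LLM judge is invoked when scores fall within a small margin.

```python
import math


def weighted_geometric_mean(scores, weights, eps=1e-9):
    """Combine metric scores in [0, 1] as prod(s_i ** w_i) ** (1 / sum(w_i))."""
    total_weight = sum(weights[name] for name in scores)
    log_sum = sum(
        weights[name] * math.log(max(value, eps))  # eps guards against log(0)
        for name, value in scores.items()
    )
    return math.exp(log_sum / total_weight)


def needs_llm_judge(score_a, score_b, margin=0.02):
    """True when two auto-judge scores are too close to separate reliably."""
    return abs(score_a - score_b) < margin


# Hypothetical metric names and weights, for illustration only.
weights = {"similarity": 2.0, "readability": 1.0, "fluency": 1.0}
cand_a = weighted_geometric_mean(
    {"similarity": 0.90, "readability": 0.70, "fluency": 0.80}, weights)
cand_b = weighted_geometric_mean(
    {"similarity": 0.88, "readability": 0.75, "fluency": 0.80}, weights)
print(cand_a, cand_b, needs_llm_judge(cand_a, cand_b))
```

A geometric mean penalises a candidate that fails badly on any single signal more strongly than an arithmetic mean would, which suits constraints such as entity fidelity.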
2.3 Run 3: Hybrid MBR System
Our best-performing system, Run 3, uses the same pipeline as Run 1 but starts with a more diverse set of initial candidates that also draws on Run 2. In addition to the 80 candidates generated in Run 1, we incorporate candidates based on the sentence-level simplification in Run 2. As shown in Figure 1, we generate these candidates by concatenating sentence-level simplifications. For each sentence in an original paragraph, three simplified sentences are generated by three different LLMs. Combining the simplified sentences results in $3^{n_{\mathrm{sentences}}}$ potential paragraph variants, from which we randomly sample up to 80 candidates. From this combined set of up to 160 candidates, the final output is selected through the identical process described for Run 1.
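The sampling of paragraph variants can be sketched as below. The function name, the fixed seed, and the choice to sample index tuples without replacement are assumptions made for illustration; the text only states that up to 80 of the possible combinations are sampled at random.

```python
import random
from itertools import product


def sample_paragraph_variants(sentence_options, max_candidates=80, seed=0):
    """Build paragraph candidates by picking one simplification per sentence.

    sentence_options[i] holds the alternative simplifications of sentence i
    (three per sentence in the setting described above).
    """
    rng = random.Random(seed)
    total = 1
    for options in sentence_options:
        total *= len(options)
    if total <= max_candidates:
        # Few enough combinations: use all of them.
        combos = list(product(*sentence_options))
    else:
        # Draw distinct index tuples without materialising every combination.
        seen = set()
        while len(seen) < max_candidates:
            seen.add(tuple(rng.randrange(len(o)) for o in sentence_options))
        combos = [
            tuple(options[i] for options, i in zip(sentence_options, idx))
            for idx in sorted(seen)
        ]
    return [" ".join(combo) for combo in combos]


# Five sentences with three options each give 3**5 = 243 possible variants.
options = [[f"s{i}v{j}." for j in range(3)] for i in range(5)]
print(len(sample_paragraph_variants(options)))
```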
Team            CEFR RMSE   Sim Orig   Sim Ref   Total Rank
EhiMeNLP        0.000       .902       .845      1
UoL-UPF (3)     0.000       .856       .857      2
UoL-UPF (1)     0.000       .849       .856      3
HIT-YOU         0.158       .852       .835      4
Archaeology     0.122       .779       .804      11
ounlp           0.755       .855       .849      14
SQUREL          1.153       .979       .819      23
UoL-UPF (2)     0.693       .808       .827      -
Table 1: Representative results from 44 runs from 20 teams. The best performance of each metric is shown in red. Run 2 is an unofficial result due to a parsing error, and its estimated rank is around 20th.
Model           A2 Num.   A2 Sim   B1 Num.   B1 Sim
GPT-4.1-mini    24        .841     13        .865
gpt-oss-20b     31        .831     17        .902
Gemma-3-4b      16        .840     12        .862
Qwen-2.5-14b    26        .862     36        .877
Sentence-lv     3         .730     22        .860

Prompt          A2 Num.   A2 Sim   B1 Num.   B1 Sim
Prompt 1        19        .839     20        .872
Prompt 2        30        .838     15        .908
Prompt 3        24        .831     23        .867
Prompt 4        24        .866     20        .874
Sentence-lv     3         .730     22        .860
Table 2: Distribution of models and prompts selected as a final candidate in Run 3, with output-to-reference similarity scores by MeaningBERT.
3 Results and Discussions
Table 1 shows the official results of the shared task. The hybrid system (Run 3) is ranked 2nd, while the core system (Run 1) is 3rd overall. Furthermore, our systems placed 1st (tied, full marks) on CEFR alignment, and 1st and 2nd on output-to-reference similarity. This result confirms the success of our pipeline combining filtering and MBR decoding, which achieves high output-to-reference similarity while maintaining the other metrics.
Table 2 shows the distribution of selected candidates in Run 3, categorized by their source. The selections were generally distributed evenly across target levels and our various prompts, models, and granularities. The only exception is the sentence-level approach for the A2 target. This implies that adding explanations, often observed in simplification for lower proficiency levels, is hard to achieve via a sentence-level approach. This overall diversity was the key to the success of our MBR-based selection pipeline.

Ablation                    A2 Orig   A2 Ref   B1 Orig   B1 Ref
Run 3                       .836      .840     .876      .874
w/o Sent. lv (≡ Run 1)      .824      .837     .874      .875
w/o MPR, DAS, MBR           .756      .779     .817      .822
w/o MPR, DAS                .815      .830     .850      .858
w/o DAS                     .849      .834     .891      .869
w/o MBR (Random)            .789      .793     .841      .832
w/o MBR (Highest MP)        .896      .814     .919      .858
w/ smaller MBR (size=10)    .853      .838     .888      .873
Table 3: MeaningBERT scores between outputs and original (Orig) and reference (Ref), as an ablation study of the processes after the CEFR filtering. MPR and DAS refer to Meaning Preservation Ranking and Diversity-Aware Selection, respectively.
Furthermore, we conducted the ablation study shown in Table 3. As described above, final outputs are selected through Meaning Preservation Ranking, Diversity-Aware Selection, and MBR decoding after the CEFR filtering. The study shows that each of these steps contributed to improving output-to-reference similarity. Notably, MBR decoding boosted it, while increasing the candidate pool size produced only a negligible gain.
This success also highlights an important characteristic of our method. Figure 2 illustrates the distribution of MeaningBERT scores for the CEFR-aligned candidates of one example instance. While the final output shows the highest output-to-original similarity, several candidates show higher output-to-reference similarity. This observation confirms that MBR decoding is designed to minimize the risk of selecting a low-scoring candidate, not to select one with the maximum expected score. As a result, final outputs are often conservative.
Despite prioritizing output-to-reference similarity, we acknowledge that over-reliance on this metric can be problematic. Our qualitative analysis shows limited agreement between scores and human judgments. Specifically, instances containing semantic errors or complex vocabulary (yellow in the scatter plot) are often over-evaluated by the metric when they are structurally similar to the reference. On the other hand, structure changes, such as sentence splitting, are penalized even if beneficial.
[Figure 2 (scatter plot): axes are Output-to-Original Similarity and Output-to-Reference Similarity; quality legend: Excellent, Very Good, Poor, Other; marker legend: MBR Output, Candidate Pool, Filtered Out.]
Figure 2: Scatter plot of CEFR-aligned candidates of a single instance. Axes represent similarity scores between output and original/reference. Circles are the ones selected for the candidate pool, and the diamond is the final output through MBR decoding. Colors align with Table 5 and Table 6, in which we manually judged simplification quality.
Our case study supports the view that MeaningBERT often fails to capture the value of features such as sentence splitting, synonym choice, and moral or pragmatic clarity, rewarding surface overlap instead of genuine accessibility (Barayan et al., 2025). We provide the full analysis in Appendix D.
4 Conclusion
In this paper, we presented our Generate-and-Select framework for the TSAR 2025 shared task, which achieved 2nd and 3rd place overall. Our core approach utilized a diverse candidate pool from multiple LLMs and prompts, with MBR decoding for robust selection.
Our primary contribution is demonstrating that our Generate-and-Select framework is highly effective. We showed that its strength lies in prioritizing the diversity of candidates, which allowed even a weaker system (our sentence-level Run 2) to contribute to the final performance by injecting variety.
Finally, our analysis shows that while our pipeline is robust, its limitation in a single-reference context highlights the need for selection methods that can better handle unpredictable simplifications.

Lay Summary
The UoL-UPF team participated in the TSAR 2025 Shared Task. The goal of this shared task was to rewrite difficult English texts into simple texts at a specific level.
We tried an idea we call the Generate-and-Select approach. In this approach, first, we used LLMs to generate many versions of simple texts. We used different LLMs and prompts, so there were a lot of options to choose from. This variety was a key part of our idea. Next, we selected the best option from these simple texts. We built a system to check all the simple texts. This system had some filtering processes. For example, one filter only selected texts that were similar to the original difficult texts. After these filtering processes, we only had high-quality options. Finally, from these high-quality options, we selected the lowest-risk option as the final result.
Our system performed very well, and was ranked 2nd out of 48 systems. This great result showed that our idea was a good one. Through this project, we learned some very important things. It is true that our generate-and-select approach works well, especially when the quality of generated texts is judged by a computer. However, we cannot always trust the computer's judgment. In our study, some simple texts were good according to the computer, but not according to human judges.
Limitations
The primary limitation of this work is its reliance on generating a diverse set of candidates. While the LLMs we employed are relatively small and thus do not require excessive computational resources, the time and cost associated with obtaining the final outputs cannot be disregarded. Therefore, our generate-and-select framework would be unsuitable for real-time text simplification.
Also, this shared task relies on automatic evaluation metrics. While our system achieved high scores, we did not conduct a manual evaluation with human participants to confirm whether the outputs are genuinely more readable and understandable for the target readers. Such manual evaluation, with Likert scoring or reading comprehension questions, would be necessary to validate the real-world effectiveness of our simplifications.
Acknowledgments
This document is part of a project that has received funding from the European Union's Horizon Europe research and innovation program under Grant Agreement No. 101132431 (iDEM Project). The views and opinions expressed in this document are solely those of the author(s) and do not necessarily reflect the views of the European Union. Neither the European Union nor the granting authority can be held responsible for them. The University of Leeds (UOL) was funded by UK Research and Innovation (UKRI) under the UK government's Horizon Europe funding guarantee (Grant Agreement No. 10103529).
Also, this work is partially financed by the Ministerio de Ciencia, Innovación y Universidades, Agencia Estatal de Investigaciones: project CPP2023-010780 funded by MICIU/AEI/10.13039/501100011033 and by FEDER, UE ("Habilitando Modelos de Lenguaje Responsables e Inclusivos"). Horacio Saggion also receives support from the Spanish State Research Agency under the Maria de Maeztu Units of Excellence Programme (CEX2021-001195-M) and from the Departament de Recerca i Universitats de la Generalitat de Catalunya (ajuts SGR-Cat 2021).
References
Suha S Al-Thanyyan and Aqil M Azmi. 2021. Automated text simplification: a survey. ACM Computing Surveys (CSUR), 54(2):1–36.
Fernando Alva-Manchego, Regina Stodden, Joseph Marvin Imperial, Abdullah Barayan, Kai North, and Harish Tayyar Madabushi. 2025. Findings of the TSAR 2025 shared task on readability-controlled text simplification. In Proceedings of the Fourth Workshop on Text Simplification, Accessibility, and Readability (TSAR 2025), Suzhou, China. Association for Computational Linguistics.
Abdullah Barayan, Jose Camacho-Collados, and Fernando Alva-Manchego. 2025. Analysing zero-shot readability-controlled sentence simplification. In Proceedings of the 31st International Conference on Computational Linguistics (COLING), pages 6762–6781. Association for Computational Linguistics.
David Beauchemin, Horacio Saggion, and Richard Khoury. 2023. Meaningbert: assessing meaning preservation between sentences. Frontiers in Artificial Intelligence, 6:1223924.
P.J. Bickel and K.A. Doksum. 1977. Mathematical Statistics: Basic Ideas and Selected Topics. Prentice Hall.
Ben Dodd, Betty van Aken, Paul Röttger, and Isabelle Augenstein. 2021. AUTORANK: A systematic approach to benchmark and compare machine learning models. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 170–185, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Abhimanyu Dubey, Rohan Taori, Alexei Baevski, and 1 others. 2024. The llama 3 herd of models. arXiv preprint arXiv:2407.21783.
Rudolf Flesch. 1948. A new readability yardstick. Journal of Applied Psychology, 32(3):221–233.
Gemma. 2025. Gemma 3 technical report. Preprint, arXiv:2503.19786.
David Heineman, Yao Dou, and Wei Xu. 2024. Improving minimum Bayes risk decoding with multi-prompt. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 22525–22545, Miami, Florida, USA. Association for Computational Linguistics.
Matthew Honnibal, Ines Montani, Sofie Van Landeghem, and Adriane Boyd. 2020. spaCy: Natural language understanding with bloom embeddings, convolutional neural networks and incremental parsing. Software documentation.
Joseph Marvin Imperial, Abdullah Barayan, Regina Stodden, Rodrigo Wilkens, Ricardo Munoz Sanchez, Lingyun Gao, Melissa Torgbi, Dawn Knight, Gail Forey, Reka R Jablonkai, and 1 others. 2025. Universalcefr: Enabling open multilingual research on language proficiency assessment. arXiv preprint arXiv:2506.01419.
Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed. 2023. Mistral 7B. Preprint, arXiv:2310.06825.
Dan Jurafsky and James H. Martin. 2023. Speech and language processing (3rd ed. draft): Chapter on language modeling. Online draft.
Nouran Khallaf, Carlo Eugeni, and Serge Sharoff. 2025. Reading between the lines: A dataset and a study on why some texts are tougher than others. Preprint, arXiv:2501.01796. Published at Writing Aids at the Crossroads of AI, Cognitive Science and NLP (WR-AI-CogS), COLING 2025, Abu Dhabi.
Kenton Lee, Luheng He, Mike Lewis, and Luke Zettlemoyer. 2017. End-to-end neural coreference resolution. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 188–197, Copenhagen, Denmark. Association for Computational Linguistics.
Vivek Miglani, Aobo Yang, Aram Markosyan, Diego Garcia-Olano, and Narine Kokhlikyan. 2023. Using Captum to explain generative language models. In Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023), pages 165–173, Singapore. Association for Computational Linguistics.
Prasanta Kumar Mohapatra and Suresh Kumar. 2015. A multi-criteria decision making method based on weighted geometric mean. International Journal of Applied Decision Sciences, 8(2):133–148.
OpenAI. 2023. GPT-4 technical report. Preprint, arXiv:2303.08774.
OpenAI. 2025. gpt-oss-120b & gpt-oss-20b model card. Preprint, arXiv:2508.10925.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, Philadelphia, Pennsylvania, USA. Association for Computational Linguistics.
Qwen. 2025. Qwen2.5 technical report. Preprint, arXiv:2412.15115.
Horacio Saggion. 2017. Automatic Text Simplification, volume 10 of Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers.
Andreas Säuberli, Franz Holzknecht, Patrick Haller, Silvana Deilen, Laura Schiffl, Silvia Hansen-Schirra, and Sarah Ebling. 2024. Digital comprehensibility assessment of simplified texts among persons with intellectual disabilities. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, pages 1–11.
Matthew Shardlow. 2014. A survey of automated text simplification. International Journal of Advanced Computer Science and Applications (IJACSA), 4(1):58–70.
Advaith Siddharthan. 2006. Syntactic simplification and text cohesion. Research on Language and Computation, 4(1):77–109.
Sanja Štajner and Maja Popović. 2016. Can text simplification improve machine translation? In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC), pages 172–178. European Language Resources Association.
Mukund Sundararajan, Ankur Taly, and Qiqi Yan. 2017. Axiomatic attribution for deep networks. In Proceedings of the 34th International Conference on Machine Learning, pages 3319–3328, Sydney, Australia. PMLR.
Adina Williams, Nikita Nangia, and Samuel Bowman. 2018. A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1112–1122, New Orleans, Louisiana. Association for Computational Linguistics.
A Run 1: Prompts for Paragraph-level Simplification
We used four simplification prompts for LLMs. Two of these were based on an inductive approach, which involved extracting simplification features from trial data to create instructions as a prompt. To do this, the following prompt was given to GPT-4.1-mini.
You will be given several pairs of paragraphs. Each pair is composed of an original paragraph and a simplified version for CEFR {lv} readers. Your task is to analyze these pairs to find the general patterns of simplification and write an instruction for LLMs to simplify paragraphs similarly. Include observations on information or phrasing that remains unchanged. Do not include examples that contain text parts in given paragraphs. Only output your final prompt.
Original: {Original Paragraph 1 for the target CEFR level}
Reference: {Reference Paragraph 1 for the target CEFR level}
Original: {Original Paragraph 2 for the target CEFR level}
Reference: {Reference Paragraph 2 for the target CEFR level}
...
After several trials, we picked the following two types of prompts for each level, with some minor arrangements.
Prompt 1 : A2
Simplify paragraphs for CEFR A2 readers by following these guidelines:
1. Use short, clear sentences with simple grammar structures (mostly present and past simple).
2. Replace complex or abstract vocabulary with common, concrete words; explain any necessary technical terms briefly and clearly.
3. Remove or reduce detailed numerical data, statistics, or nuanced concepts unless essential; when included, present numbers simply and round if appropriate.
4. Avoid idiomatic expressions, figurative language, and complex sentence forms like passive voice or embedded clauses.
5. Focus on main ideas and essential facts; omit detailed background information, speculation, or subtle distinctions unless they support comprehension.
6. Use explicit cause-effect and temporal connectors (e.g., because, so, but, then, now) to clarify relationships.
7. Maintain logical and coherent flow with clear topic introductions and simple sequencing.
8. Preserve proper names, key terms, and notable facts that are central to understanding.
9. When appropriate, add brief, straightforward definitions or explanations for less familiar concepts.
10. Use active voice predominantly and ensure the subject of sentences is clear.
11. Replace pronouns that may confuse with explicit nouns where needed.
12. Retain the overall meaning and important details but adapt phrasing to be direct and concrete.
13. Introduce examples to illustrate points simply, using familiar or relatable contexts.
14. Do not assume prior knowledge; present background information in simple terms if required.
15. Where opinion or interpretation appears, present it clearly and simply, often using direct statements like "people say" or "some think."
16. Use simple punctuation and avoid complex structures such as long lists or parenthetical asides.
By following these patterns, produce an accessible, easy-to-read version of a paragraph that preserves the core message and key details for A2-level readers. Provide only the simplified paragraph without any explanation or justification.
# Original:
{Original Paragraph}
# Simplified:
Prompt 1: B1
Simplify paragraphs for CEFR B1 readers by following these guidelines:
1. Use simple vocabulary and expressions: Replace complex or formal words and phrases with more common, everyday alternatives, while keeping the meaning intact.
2. Shorten and clarify sentences: Break long, complex sentences into shorter, clearer ones. Use straightforward sentence structures, avoiding passive voice or complicated clauses.
3. Explain or define less familiar terms: When necessary, introduce brief explanations or definitions for technical, cultural, or less common concepts within the text to aid understanding.
4. Retain key information and facts: Keep all essential data, figures, names, and core ideas from the original text, ensuring the main message is preserved.
5. Rephrase for explicitness and clarity: Make implied meanings more explicit, and clarify references to pronouns or abstract concepts.
6. Maintain original factual content and sequence: Do not omit major details or reorder information in ways that change the logical flow or significance.
7. Use familiar synonyms and phrases: Prefer words and expressions that are frequently used at intermediate English level rather than academic or highly technical language.
8. Simplify complex concepts without oversimplifying: Present difficult ideas in more accessible language but avoid losing the nuance or accuracy of the original content.
9. Use concrete examples or context where helpful: When abstract concepts might confuse, add brief relatable examples or contextual cues to aid comprehension.
10. Preserve unchanged proper nouns and names: Keep names of people, places, events, titles, and specific terms as in the original to maintain accuracy and recognition.
11. Avoid idiomatic or culture-specific expressions unless explained: Replace or explain idioms and culturally specific references that might not be understood by B1 learners.
12. Retain the original tone and intent as much as possible:** The simplification should respect the author's purpose, tone, and the overall style, aiming for clarity rather than casualness.
In summary, simplify language and sentence structure, clarify meaning, explain or define unfamiliar terms, keep all important facts and details, and ensure the text remains coherent and faithful to the original. Provide only the simplified paragraph without any explanation or justification.
# Original:
{Original Paragraph}
# Simplified:
Prompt 2: A2
Simplify paragraphs for CEFR A2 readers by following these guidelines:
1. **Vocabulary and Grammar:**
- Use very common, everyday words and simple sentence structures.
- Avoid idioms, metaphors, or abstract expressions.
- Prefer present tense or simple past tense; avoid complex verb forms.
- Use short sentences, often one idea per sentence.
2. **Sentence Structure:**
- Break long, complex sentences into multiple shorter sentences.
- Use basic conjunctions (and, but, so, because) to connect ideas simply.
- Avoid passive voice where possible; use active voice instead.
3. **Information Selection and Clarity:**
- Retain all key factual information from the original paragraph.
- Remove or rephrase any statistics or figures only if they might confuse the reader, but generally keep numbers with simple explanations.
- Explain or define any technical terms or names using simple language or familiar examples.
- Avoid unnecessary detail or background information unless it helps understanding.
4. **Rephrasing and Simplification:**
- Replace complex nouns or phrases with simple equivalents or brief explanations.
- Make implicit information explicit if needed.
- Use examples or explanations to clarify concepts that might be unfamiliar.
- Use repetition and restatement to reinforce understanding without changing meaning.
5. **Tone and Style:**
- Use a neutral, clear, and straightforward tone.
- Address the reader more directly and simply when appropriate.
- Keep the original meaning, emphasis, and main points intact.
6. **Preserving Key Proper Nouns and Data:**
- Keep proper names (people, places, organizations, titles) unchanged but briefly explain their significance if needed.
- Maintain important dates, measurements, and specific figures, simplifying explanations around them.
7. **Avoid Removing Content Entirely:**
- Instead of deleting difficult or nuanced content, re-express it in accessible language.
- Questions or rhetorical devices in the original can be kept but simplified and clarified.
By applying these principles, transform original paragraphs into clear, accessible text suitable for A2-level readers while preserving essential information and intent. Provide only the simplified paragraph without any explanation or justification.
# Original:
{Original Paragraph}
# Simplified:
Prompt 2: B1
Simplify paragraphs for CEFR B1 readers by following these guidelines:
1. **Vocabulary and Sentence Structure:**
- Use common, everyday words instead of specialized or complex vocabulary.
- Prefer simple sentence structures; break longer or compound sentences into shorter ones.
- Replace abstract or complex terms with concrete, clearer expressions or brief explanations.
- Use active voice where possible and avoid idiomatic expressions or cultural references that may be unclear.
2. **Information Presentation:**
- Keep all key factual information and core ideas intact to preserve the original meaning.
- Present numbers, dates, and statistics clearly, often repeating or rephrasing for clarity.
- When technical or unfamiliar terms appear, define or explain them briefly but simply.
- Remove less essential details only if they do not affect overall comprehension; otherwise, retain the main content fully.
3. **Clarification and Explicitness:**
- Make implicit information explicit where needed.
- Where the original contains pronouns or references that may be unclear, replace or clarify them.
- Use clear cause-and-effect or chronological connectors (e.g., "because," "so," "however," "since then") to improve coherence.
4. **Tone and Style:**
- Maintain a neutral, informative, and accessible tone appropriate for learners.
- Avoid complex or figurative language; use straightforward, literal expressions.
- When original tone includes subtle nuance, simplify but try to retain the intended emphasis or attitude if important.
5. **Phrasing and Repetition:**
- Some proper nouns, dates, and well-known names remain unchanged to preserve identity and context.
- Common phrases and definitions that clarify the subject often get added or slightly expanded to aid understanding.
- Sentences may be reworded but often echo the original information closely, sometimes repeating key ideas with slight reformulation for clarity.
In summary, simplify vocabulary and grammar, clarify potentially difficult concepts, maintain all essential facts, and ensure readability and coherence for intermediate English learners without omitting important content. Provide only the simplified paragraph without any explanation or justification.
# Original:
{Original Paragraph}
# Simplified:
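The prompts above all end with the same scaffold: a "# Original:" slot holding the paragraph, followed by an open "# Simplified:" line. A minimal Python sketch of how such a template might be filled is given below. The helper name `build_prompt` and the use of plain string replacement are assumptions made for illustration, not the authors' implementation.

```python
# Hypothetical helper (not from the paper): fill the paragraph slot of a prompt.
PLACEHOLDER = "{Original Paragraph}"


def build_prompt(guidelines: str, paragraph: str) -> str:
    """Append the '# Original:' / '# Simplified:' scaffold and fill the slot."""
    template = f"{guidelines}\n# Original:\n{PLACEHOLDER}\n# Simplified:"
    # str.replace is used rather than str.format so that any braces inside
    # the guidelines or the paragraph are left untouched.
    return template.replace(PLACEHOLDER, paragraph.strip())


if __name__ == "__main__":
    demo = build_prompt(
        "Simplify paragraphs for CEFR A2 readers by following these guidelines:",
        "An example paragraph to simplify.",
    )
    print(demo)
```

Ending the prompt on the open "# Simplified:" line encourages the model to continue with the simplified paragraph only, which matches the instruction to provide no explanation or justification.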
One of our four prompts was also generated by an LLM, based on a deductive approach without providing trial data. We had GPT-4.1-mini generate such prompts with the following prompt.
You are an expert in language education and have a deep understanding of CEFR. Your task is to generate a list of rules that an LLM can use to simplify a given English text for each of the CEFR A2 and B1 level. The simplification must
Columns: Case | Model (Prompt) | Candidate (Non-Selected) | Sim. Orig. | Sim. Ref.
NS1 | Gemma-3-4b (Prompt 4) | Sim. Orig.: 0.828 | Sim. Ref.: 0.802
Candidate: Terrie Sharp is a very popular writer of stories. She has won the Oliver Crime Writer Award two times. She grew up in a place in Glasgow with many crimes. She learned to be quiet and not get into trouble. She also learned how to open car doors with a knife. But she used this only to write her books. Now she is not quiet. She goes on TV shows like Police Today and to writing events in the UK.
Linguist’s note: This version uses short, clear sentences. The phrase “writer of stories” avoids the business-like idea of “books sell well.” It keeps the violent detail but explains it as only used for writing. Automatic metrics mark it down because the wording differs from the reference, but it is still very suitable for A2.
NS2 | Gemma-3-4b (Prompt 4) | Sim. Orig.: 0.848 | Sim. Ref.: 0.797
Candidate: Terrie Sharp is a famous writer who can write many stories. She won the Oliver Crime Writer Award two times. She was born in a part of Glasgow with many crimes. She learned to be quiet and not have problems. She also learned how to open car doors with a knife. But now she uses it only for her books. Today she talks a lot on TV shows like Police Today and at writing events in the UK.
Linguist’s note: The word “famous” is clearer than “popular,” and “not have problems” is a good replacement for the idiom “stay out of trouble.” The sentences are simple and easy to follow. The only weakness is a small redundancy (“can write many stories”). Metrics lower the score because of different words, not because of difficulty.
NS3 | Gemma-3-4b (Prompt 1) | Sim. Orig.: 0.830 | Sim. Ref.: 0.830
Candidate: Terrie Sharp is a popular writer. She has won the Oliver Crime Writer Award twice. She was born in a part of Glasgow with many crimes. She learned to be quiet and not have problems. She also learned how to open car doors with a knife. Now she only uses this for her stories. Today she talks on TV shows like Police Today and at writing events in the UK.
Linguist’s note: The sentences are clear and short. The idiom “stay out of trouble” is replaced with the simple “not have problems.” All key facts are kept. Metrics give only average credit, even though this is strong A2 material.
NS4 | Gemma-3-4b (Prompt 1) | Sim. Orig.: 0.848 | Sim. Ref.: 0.762
Candidate: Terrie Sharp is a popular writer. She has won the Oliver Crime Writer Award twice. She was born in a part of Glasgow with many crimes. She learned to be quiet and not have problems. She also learned how to open car doors with a knife. Now she only uses this for her stories. She talks on TV shows like Police Today and goes to writing events in the UK.
Linguist’s note: This version has the same strengths as NS3. The similarity score is low because it says “events” instead of “conferences,” but “events” is actually easier for A2 learners.
NS5 | Gemma-3-4b (Prompt 1) | Sim. Orig.: 0.853 | Sim. Ref.: 0.816
Candidate: Terrie Sharp is a famous writer. She writes good stories. She has won the Oliver Crime Writer Award two times. She was born in a part of Glasgow with many crimes. She learned to be quiet and not have problems. She also learned how to open car doors with a knife. She now uses this only to write her books. She goes on TV shows like Police Today and to writing events in the UK.
Linguist’s note: The words “famous” and “writes good stories” are simple and transparent. The sentences are split clearly. The meaning is faithful to the original. Metrics reduce the score only because the wording does not match the reference.
NS6 | Gemma-3-4b (Prompt 2) | Sim. Orig.: 0.854 | Sim. Ref.: 0.804
Candidate: Terrie Sharp is a popular writer. She can tell good stories. She won the Oliver Crime Writer Award twice. She was born in a part of Glasgow with many crimes. She learned to be quiet and not have problems. She also learned how to open car doors with a knife. She only uses this for her books now. She is not quiet anymore. She talks on TV shows like Police Today and goes to writing events in the UK.
Linguist’s note: The time marker “now” makes the story easier to follow. The contrast “not quiet anymore” is clear. The only issue is the abstract noun “knowledge,” which remains a bit hard for A2. Metrics again don’t give full credit for these improvements.
NS7 | Gemma-3-4b (Prompt 2) | Sim. Orig.: 0.849 | Sim. Ref.: 0.780
Candidate: Terrie Sharp is a popular writer. She can tell good stories. She won the Oliver Crime Writer Award twice. She was born in a part of Glasgow with many crimes. She learned to be quiet and not have problems. She also learned how to open car doors with a knife. She uses this only to write her books now. She is not quiet anymore. She talks on TV shows like Police Today and goes to writing events in the UK.
Linguist’s note: Very similar to NS6: short sentences, easy connectors, clear contrast. The similarity score is lower because of harmless word changes.
NS8 | Gemma-3-4b (Prompt 3) | Sim. Orig.: 0.852 | Sim. Ref.: 0.786
Candidate: Terrie Sharp is a famous writer. She can tell good stories. She has won the Oliver Crime Writer Award twice. She was born in a part of Glasgow with many crimes. She learned to be quiet and not have problems. She also learned how to open car doors with a knife. She only uses this to write her books now. She goes on TV shows like Police Today and to writing events in the UK.
Linguist’s note: The verbs “goes on” and “uses” are clear and concrete. Sequencing is simple and effective. Metrics score it lower because the words differ from the reference.
NS9 | GPT-4.1-mini (Prompt 4) | Sim. Orig.: 0.828 | Sim. Ref.: 0.834
Candidate: Terrie Sharp is a famous writer. She has won the Oliver Crime Writer Award two times. She was born in Glasgow, a place with a lot of crime. She learned to be quiet and not get into trouble. She also learned to open car doors with a knife. She only used this to write her books. Now, she talks on TV shows like Police Today and goes to writing meetings in the UK.
Linguist’s note: This version is very clear and accurate. Words like famous and meetings are easy for A2 learners. The sequencing is simple and the contrast is clear. Automatic metrics give it a lower score because of different wording, but it is an excellent A2 simplification.
NS10 | GPT-4.1-mini (Prompt 1) | Sim. Orig.: 0.842 | Sim. Ref.: 0.770
Candidate: Terrie Sharp is a famous writer who has won the Oliver Crime Writer Award two times. She was born in a part of Glasgow with a lot of crime. She learned to be quiet and avoid trouble. She also learned how to open car doors with a knife. She used this only to write her books. Now, she talks on TV shows like Police Today and goes to writing meetings in the UK.
Linguist’s note: This version handles the idiom well (avoid trouble), and the word “meetings” is culturally simple at A2. The content is faithful and the style is easy to read. The lower similarity score only reflects useful word changes, not quality loss.
Table 6: Case study analysis of non-selected outputs that were linguistically strong but scored lower on automatic metrics. Rows shaded red are judged (very good) and rows shaded green are judged (excellent). These examples show that metrics often mark down simplifications that use common words (e.g., famous vs. popular, meetings vs. conferences) and concrete phrasing, even though they better match CEFR A2 descriptors.
E Sentence Simplification Architecture: Run 2
[Figure 3: flowchart of the Run 2 pipeline. Node labels, grouped by stage:]
Preprocessing: Input paragraph → Sentence segmentation (regex sent_split or spaCy) → Coreference resolution (AllenNLP + spaCy-compatible) → Self-contained sentences.
Prompting & Generation: Structured prompt (A1/A2/B1) with constraints: meaning, entities/numbers, readability, optional bracketed glosses, strict format. Integrated Gradients (IG) on CEFR clf (Captum LIG, m=100 steps) → Top-K influential phrases: (type, phrase, score). Decoding: T≤0.3, p=0.9, ≤180 tok/sent, stop at next tag. Generate with three LLMs: LLaMA-3-8B, GPT-4o (API), Mistral-7B.
Scoring & Selection: Automatic per-candidate scoring with 8 signals: 1) Meaning (emb cosine + MNLI); 2) Key-info coverage (IG phrases); 3) Entity/number/unit fidelity; 4) Readability vs CEFR (ASL+FRE); 5) Lexical simplification gain; 6) Fluency (PPL → [0,1]); 7) Compression control; 8) Sentence control & format gate. Weighted geometric mean (core × control, clamp [0,1]) → Rank candidates / top-K → LLM Judge? Either Auto top-1, or Send top-K with ctx+level (GPT-4o or LLaMA) → Winner index + reason → Policy: Override with LLM / Blend (weighted) → Per-sentence winner (A1/A2/B1).
Stitch & Analysis: Write winners per sentence → Stitch by level → paragraphs (<B1> <A2> <A1> blocks). Compare judge types & backends: agreement, drift, distribution shifts → Final Output.
Figure 3: Run 2 preprocessing (segmentation, coreference) and IG attribution; CEFR-controlled prompting/decoding across three LLMs; automatic judge (8 signals) with weighted geometric mean; optional LLM-as-Judge with policy; stitching and comparative analysis.
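The selection step in Figure 3 (auto top-1, or an LLM judge that returns a winner index and is then applied through an override or blend policy) could be sketched roughly as follows. The figure does not specify how the blend is computed, so the `judge_weight` parameter, the treatment of the judge's pick as a 0/1 vote, and all names below are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    text: str
    auto_score: float  # automatic judge score, assumed to lie in [0, 1]


def select(candidates: List[Candidate],
           judge_winner: Optional[int] = None,
           policy: str = "auto",
           judge_weight: float = 0.3,
           top_k: int = 3) -> Candidate:
    """Pick one candidate from the top-k shortlist ranked by automatic score.

    judge_winner is the index (within the shortlist) chosen by an LLM judge.
    """
    shortlist = sorted(candidates, key=lambda c: c.auto_score, reverse=True)[:top_k]
    if not shortlist:
        raise ValueError("no candidates to select from")
    if policy == "auto" or judge_winner is None:
        return shortlist[0]  # automatic top-1
    if not 0 <= judge_winner < len(shortlist):
        raise IndexError("judge_winner is outside the shortlist")
    if policy == "override":
        return shortlist[judge_winner]  # the judge's pick replaces the auto pick
    if policy == "blend":
        # Assumed blend: weighted sum of the automatic score and a 0/1 judge vote.
        def blended(pair):
            index, cand = pair
            vote = 1.0 if index == judge_winner else 0.0
            return (1 - judge_weight) * cand.auto_score + judge_weight * vote
        return max(enumerate(shortlist), key=blended)[1]
    raise ValueError(f"unknown policy: {policy}")
```

Under this assumed blend, a small `judge_weight` lets the judge break near-ties only, while a large one approaches the override policy.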
F Evaluation Metrics for Sentence Simplification: Run 2
Metric | Description
Meaning preservation | Embedding cosine similarity plus bidirectional entailment probabilities (MNLI) to assess whether the simplified sentence preserves the meaning of the source.
Key information coverage | Checks whether the top-K influential phrases identified by IG are present in the simplified output (case-insensitive matching).
Entity, number, and unit fidelity | Compares named entities with spaCy (set F1). Numbers are greedily matched one-to-one if units agree, allowing an absolute error within max(1%, 10^-6).
Readability vs. CEFR | Combines average sentence length (ASL) and Flesch Reading Ease (FRE) (Flesch, 1948), normalised to CEFR targets: A1 (ASL ≈ 10, FRE ≥ 0.80), A2 (15, 0.70), B1 (20, 0.60).
Lexical simplification gain | Reduction in average syllables per word compared to the source. A small bonus is given for inline glosses (e.g., “[a simple meaning]”).
Fluency | Language model perplexity mapped to [0,1] (Jurafsky and Martin, 2023); lower perplexity means higher fluency. If no LM is provided, a neutral score of 0.75 is assigned.
Compression control | Ratio of simplified to original word counts, normalised to the target range 0.6–1.0. Penalises outputs that are too short or too verbose.
Sentence/format control | Encourages keeping sentence count close to the source (ratio 0.7–1.1). Rejects empty outputs or those exceeding 1200 characters.
Table 7: Evaluation signals used by the automatic judge. Each metric is normalised to [0,1] and combined by a weighted geometric mean.
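As a rough illustration of how signals normalised to [0,1] might be combined by a weighted geometric mean, consider the Python sketch below. The weights, the `eps` floor for zero scores, and the linear falloff in `range_score` (used here for the compression ratio with the 0.6–1.0 target range from Table 7) are assumptions for illustration only; the paper does not give these details.

```python
import math
from typing import Dict


def weighted_geometric_mean(scores: Dict[str, float],
                            weights: Dict[str, float],
                            eps: float = 1e-6) -> float:
    """exp(sum_i w_i * log(s_i) / sum_i w_i), clamped to [0, 1]."""
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # eps avoids log(0); a zero signal still pulls the mean close to zero.
    log_sum = sum(weights[name] * math.log(max(scores[name], eps))
                  for name in scores)
    return min(1.0, max(0.0, math.exp(log_sum / total)))


def range_score(ratio: float, low: float, high: float) -> float:
    """1.0 inside [low, high]; assumed linear falloff outside the range."""
    if low <= ratio <= high:
        return 1.0
    if ratio < low:
        return max(0.0, ratio / low)
    return max(0.0, 1.0 - (ratio - high) / high)


if __name__ == "__main__":
    signals = {
        "meaning": 0.90,
        "fluency": 0.75,  # neutral score when no LM is provided (Table 7)
        "compression": range_score(0.5, 0.6, 1.0),  # output slightly too short
    }
    weights = {"meaning": 2.0, "fluency": 1.0, "compression": 1.0}  # hypothetical
    print(round(weighted_geometric_mean(signals, weights), 3))
```

Unlike an arithmetic mean, a geometric mean lets a single very low signal (for example, a failed meaning check) drag the combined score down sharply, which suits a judge in which every signal acts as a soft constraint.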