Instructional Code Editing Using Transformer Models

Author: Mercado, Yadiel; Torres, Gabriel; Alvarez, Michael

Publisher: Zenodo

DOI: 10.5281/zenodo.17307177

Source: https://zenodo.org/records/17307177/files/abstract.pdf

Instructional Code Editing Using Transformer Models
Yadiel Mercado1†, Gabriel Torres1, Michael Alvarez1*†
1*Computer Science Department, University of Puerto Rico at Rio Piedras, 17 Ave. Universidad STE 1701, San Juan, 00925, Puerto Rico, USA.
*Corresponding author(s). E-mail(s): michael.al [email protected];
Contributing authors: y[email protected]; [email protected];
†These authors contributed equally to this work.
Abstract
This project explores instruction-guided code editing through the fine-tuning of transformer-based language models within compute-constrained environments. We focus on CodeT5-base, a pre-trained encoder-decoder model designed for software engineering tasks such as code understanding and generation. The model was fine-tuned on a curated subset (25%) of the InstructCoder dataset, which consists of natural language instruction–input–output triplets tailored for code transformation. Data preparation included tokenization with Hugging Face's AutoTokenizer, capped at 1024 tokens to accommodate long code sequences. The training was executed using the Seq2SeqTrainer module on Google Colab, leveraging mixed-precision (fp16) training, gradient accumulation, and frequent checkpointing to maximize efficiency under limited GPU resources. To evaluate model performance, we developed a custom Python script that computes both character-level and word-level similarity metrics between model predictions and target outputs. These scores were further analyzed using a binning strategy and visualized with confusion matrix-style summaries. Our results show that over 12% of model outputs achieve more than 95% word-level similarity, indicating promising precision despite minimal training. Furthermore, BLEU score comparisons across CodeT5, FlanT5, CodeLlama-13B, and GPT-4o models revealed that smaller fine-tuned models can outperform larger, general-purpose ones in task-specific settings. These findings suggest that lightweight instruction-tuned models, when trained with focused data and efficient pipelines, can offer a cost-effective and scalable alternative for automated code editing tasks. The work reinforces the utility of domain-specific fine-tuning strategies and lays the groundwork for future exploration in low-resource software engineering environments.
Keywords: Transformer models, CodeT5, Fine-tuning, Low-resource training, Software engineering, Large Language Models (LLMs)
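
The data preparation and training setup described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' script. The field names `instruction`, `input`, and `output`, the helper names, the way the instruction and input are joined, and every hyperparameter value other than the 1024-token cap and fp16 are assumptions. The `transformers` import is deferred so the helpers can be read without the library installed.

```python
MAX_TOKENS = 1024  # token cap stated in the abstract


def build_source(instruction: str, code_input: str) -> str:
    """Join instruction and input code into one encoder string.

    The newline-joined format is a hypothetical choice for illustration.
    """
    return f"{instruction.strip()}\n{code_input.strip()}"


def tokenize_example(tokenizer, example: dict) -> dict:
    """Tokenize one instruction-input-output triplet, truncating at MAX_TOKENS.

    `tokenizer` is expected to behave like a Hugging Face tokenizer;
    the dictionary keys are assumed, not taken from the source.
    """
    model_inputs = tokenizer(
        build_source(example["instruction"], example["input"]),
        max_length=MAX_TOKENS,
        truncation=True,
    )
    labels = tokenizer(
        text_target=example["output"],
        max_length=MAX_TOKENS,
        truncation=True,
    )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs


def make_training_args(output_dir: str):
    """Build training arguments for a memory-limited GPU (assumed values)."""
    from transformers import Seq2SeqTrainingArguments  # deferred import

    return Seq2SeqTrainingArguments(
        output_dir=output_dir,
        per_device_train_batch_size=2,   # assumed: small batch to fit in memory
        gradient_accumulation_steps=8,   # assumed: effective batch of 16
        fp16=True,                       # mixed precision, as in the abstract
        save_strategy="steps",
        save_steps=500,                  # assumed checkpoint interval
        save_total_limit=2,              # assumed: keep only recent checkpoints
    )
```

With a real tokenizer, for example `AutoTokenizer.from_pretrained("Salesforce/codet5-base")` (an assumed checkpoint name for CodeT5-base), `tokenize_example` could be mapped over the dataset and the result passed to `Seq2SeqTrainer` together with `make_training_args(...)`. Frequent checkpointing matters on Colab because a disconnected session otherwise loses all progress.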
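
The evaluation step (character-level and word-level similarity, followed by binning) might look like the sketch below. The abstract does not say which similarity measure the custom script uses, so `difflib.SequenceMatcher` stands in here as an assumption. The bin edges other than the 95% threshold, and all function names, are likewise hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def char_similarity(prediction: str, target: str) -> float:
    """Similarity ratio in [0, 1] computed over characters."""
    return SequenceMatcher(None, prediction, target, autojunk=False).ratio()


def word_similarity(prediction: str, target: str) -> float:
    """Similarity ratio in [0, 1] computed over whitespace-separated tokens."""
    return SequenceMatcher(
        None, prediction.split(), target.split(), autojunk=False
    ).ratio()


def bin_label(score: float) -> str:
    """Assign a score to a coarse bin; only the 95% edge comes from the abstract."""
    if score > 0.95:
        return ">95%"
    if score > 0.75:
        return "75-95%"
    if score > 0.50:
        return "50-75%"
    return "<=50%"


def summarize(pairs) -> Counter:
    """Count (prediction, target) pairs per word-level similarity bin."""
    return Counter(bin_label(word_similarity(p, t)) for p, t in pairs)
```

Dividing the `">95%"` count by the total number of pairs gives the kind of share the abstract reports (over 12% of outputs above 95% word-level similarity).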
