A heuristic dataset reduction for green computing of photovoltaic power generation prediction

Author: Aravena Cifuentes, Ana Paula,Núñez González, José David,Graña Romay, Manuel María

Publisher: Springer Nature

Year: 2025

DOI: 10.1007/s11047-025-10029-6

Source: https://addi.ehu.eus/bitstream/10810/78552/1/s11047-025-10029-6-1-1.pdf

A heuristic dataset reduction for green computing of photovoltaic power generation prediction
Ana Paula Aravena-Cifuentes¹ • J. David Nuñez-Gonzalez¹ • Manuel Graña¹
Accepted: 25 May 2025 / Published online: 16 June 2025
© The Author(s) 2025
Abstract
Artificial Intelligence (AI) has become increasingly integrated into everyday life, with the general population progressively relying on it for even routine tasks. As AI models grow in complexity, precision, and computational power, their energy consumption rises exponentially, raising serious concerns about the sustainability of their widespread adoption. The field of green learning aims to mitigate these concerns by developing energy-efficient AI solutions. In this work, we propose a method to preserve the accuracy of photovoltaic (PV) power generation forecasting while reducing the environmental impact through dataset size reduction. Our approach employs a heuristic strategy that iteratively reduces the training dataset until a cutoff point is reached, balancing predictive accuracy and environmental efficiency. Experimental results using publicly available PV generation datasets demonstrate that the proposed data reduction method decreases training time by up to 17.13%, with only a 1.47% decline in prediction accuracy. These findings highlight the potential of the method to substantially reduce the carbon footprint of AI applications with minimal performance degradation.
Keywords: Green Computing · Photovoltaic power generation prediction · Dataset reduction
1 Introduction
Photovoltaic (PV) solar energy is a renewable, inexhaustible, and environmentally friendly energy source derived from the conversion of sunlight into electricity using photoelectric technology (Haegel and Kurtz 2022). Among its many advantages are its minimal environmental impact, the ability to store surplus energy in batteries, and its straightforward installation in both industrial and residential settings. Furthermore, the production of PV panels has a significantly smaller carbon footprint compared to other energy generation technologies.
As a result, recent decades have witnessed a notable increase in the deployment of renewable energy sources, particularly solar photovoltaic systems, as part of the global effort to ensure sustainable electricity production (Aravena-Cifuentes et al. 2023).
Solar PV power generation, however, is inherently dependent on various environmental factors, especially dynamic weather conditions, which influence the intensity, duration, and angle of solar radiation. These variables introduce substantial challenges to the accurate prediction of solar power output, a task currently addressed using statistical forecasting techniques and Artificial Intelligence (AI)-based models (Abdelsattar et al. 2024). Reliable forecasting is crucial to harmonize energy production with demand, reduce dependence on fossil fuels, and ensure efficient integration of solar energy into the electrical grid (Chau et al., 2028). In particular, short-term forecasting is essential to enhancing grid stability and reliability (Almonacid et al. 2014).
Over the past decade, the field of AI, and particularly machine learning (ML), has made remarkable strides in improving forecasting accuracy. Much of this progress has been driven by the availability of increasingly large datasets, which support the development of deep neural network architectures with high predictive power (Jay Kuo et al. 2023). However, a growing concern, often overlooked, is the substantial carbon footprint associated with the training of these models. Deep learning (DL) models, due to their inherent complexity and reliance on massive datasets, require extensive computational resources, thus undermining broader sustainability goals (Schwartz et al. 2020).
¹ Computational Intelligence Group, University of Basque Country, Otaola Av 29, 20600 Eibar, Gipuzkoa, Spain
Natural Computing (2025) 24:637–649
https://doi.org/10.1007/s11047-025-10029-6
In response, the field of green computing, also referred to as sustainable computing, has emerged as a critical area of research within ML (Raja 2021; Paul 2023). As training and deploying ML models become increasingly resource-intensive, researchers and industry practitioners are seeking methods to minimize energy consumption and environmental impact. Strategies under exploration include hardware optimization, energy-efficient training techniques, and the incorporation of renewable energy sources into computational infrastructure.
A key research direction within this domain is the development of energy-efficient model training techniques. Innovations such as model pruning, quantization, and knowledge distillation aim to reduce computational complexity without significantly compromising performance.
The focus of this work is the reduction of training dataset size while maintaining the predictive efficiency of the ML model. This strategy aims to reduce training time and, consequently, the carbon footprint associated with ML workflows (Ougiaroglou et al. 2023). Data reduction can be achieved through several techniques, including feature selection and extraction, discretization, instance selection, and data generation (Ramírez and Sergio 2014). The method proposed in this paper centers on reducing the training dataset by preserving key statistical characteristics, specifically the mean and standard deviation, of the original data. It is important to emphasize that the primary goal of this work is not to enhance predictive performance, but rather to contribute foundational ideas in green natural and artificial computation.
The remainder of this paper is organized as follows: Section 2 reviews the state of the art. Section 3 presents the experimental data and the proposed dataset reduction method. Section 4 describes the experimental design. Section 5 reports the results. Section 6 concludes the paper.
2 Related works
2.1 Green computing
There is a growing body of scientific research dedicated to exploring the sustainable use of information technologies, particularly in light of their increasing ubiquity in daily life. Among these technologies, artificial intelligence (AI) tools stand out due to their widespread adoption and significant computational demands.
A methodological framework for estimating the carbon footprint of computational tasks in a standardized and reliable manner is presented by Lannelongue et al. (2021). The authors propose a set of metrics designed to contextualize greenhouse gas (GHG) emissions across diverse computational domains. Their approach integrates energy consumption data from various sources, including processors, memory, and facility overhead, and accounts for geographical factors. The methodology was validated using use cases from particle physics simulations, weather forecasting models, and natural language processing (NLP) systems. A key contribution of this work is its balance between estimation accuracy and practical applicability, making it a valuable tool for assessing the environmental cost of computational workloads.
In another notable contribution, Sheryl et al. (2023) propose a heuristic-based green computing approach combined with an optimized routing protocol for energy-aware management in secure systems. The energy requirements are estimated using a first-order radio model, which quantifies the energy needed to transmit a packet of m bits. Leveraging this data, the authors suggest that it is possible to define energy-saving standards through the use of a Management Information Base (MIB) model tailored for energy management. Their approach not only improves energy efficiency but also enhances security in energy-sensitive applications.
Additionally, green computing principles have been applied in the context of the Internet of Things (IoT), as explored by Jaiswal et al. (2021). Given the massive connectivity and energy constraints of IoT networks, energy harvesting (EH) has emerged as a key strategy to prolong network lifetime. However, conventional EH techniques often result in significant energy consumption at the sensor level. To address this limitation, the authors propose a Time-Switching Simultaneous Wireless Information and Power Transfer (T-SWIPT) protocol. This technique enhances the energy efficiency of sensor-enabled IoT networks by allowing concurrent data and energy transmission, thereby reducing energy overhead and extending device operability.
2.2 Dataset size reduction
Early research on dataset size reduction is exemplified by the doctoral dissertation of Lozano and María (2007), which addresses the construction and condensation of training sets from given data. In addition to formulating strategies for dataset reduction, the work compares several classification techniques. The author categorizes condensation methods into two main groups: non-adaptive and adaptive techniques. The former includes approaches such as the Nearest Centroid Neighbour (NCN) Rule, MaxNCN, and Reconsistent. The latter group encompasses NCN-based Adaptive Condensation Algorithms and Gaussian-based Adaptive Condensation Algorithms. The study concludes that MaxNCN, Hart, and Reconsistent yielded the most promising results in terms of classification performance and data reduction efficiency.
To further reduce the training set size, Chouvatut et al. (2015) introduce an enhancement to the well-known graph-based Optimum-Path Forest (OPF) classifier. Their approach leverages the Segmented Least Squares Algorithm (SLSA) to estimate the structure of the classification tree more efficiently. This methodology achieves a reduction in training set size ranging from 7% to 21%, while maintaining classification accuracy with only marginal losses between 0.2% and 0.5% across the evaluated datasets.
More recently, Ougiaroglou et al. (2023) present advancements aimed at accelerating the performance of the k-Nearest Neighbour (k-NN) classifier. Specifically, the study introduces two novel prototype generation algorithms for multi-label datasets. The first is a variant of the Reduction by Homogeneous Clustering (RHC) method, while the second is a modified version of the RSP3 algorithm, originally designed for single-label tasks and notable for being parameter-free. Both techniques contribute to reducing computational complexity without compromising classification effectiveness.
In this context, our contribution proposes a heuristic method for dataset reduction that preserves the statistical representativeness of the original data. Unlike traditional dimensionality reduction techniques, which typically operate on attribute space, our approach operates on instance space. This methodology represents a novel perspective in the pursuit of sustainable machine learning. It is worth noting that our previous studies (Aravena-Cifuentes et al. 2023, 2024) did not incorporate dataset reduction strategies, which are introduced and evaluated in the present work.
3 Data and algorithm
3.1 Data
This study leverages localized photovoltaic (PV) energy production data published by Williams and Wagner (2019), preceding the synthesis provided by Pasion et al. (2020). The dataset comprises power output measurements from twelve U.S. Department of Defense (DoD) solar installations distributed across diverse climatic regions within the United States, covering the period between 2017 and 2018 (see Figure 1).
The dataset contains 21,046 instances and 17 attributes, encompassing solar power output along with a variety of geographical, temporal, and meteorological features. The sampling frequency ranges from 15 min to several hours, depending on the installation.
All experiments were conducted on a machine equipped with an Intel(R) Core(TM) i5-10300H CPU @ 2.50GHz, 8.00 GB RAM (7.78 GB usable), running a 64-bit operating system. Data preprocessing and modeling were implemented in Python within the Anaconda distribution, using the Spyder development environment.
During the preprocessing phase, categorical values in the Season attribute were encoded numerically to enable model ingestion. A prior study by Aravena-Cifuentes et al. (2024) extended the benchmark introduced in Aravena-Cifuentes et al. (2023), which initially focused on the Travis Air Force Base dataset, identified as the highest-performing location in terms of PV output. The extended analysis incorporated data from additional locations: Malmstrom, MNANG, Hill Weber, and Camp Murray.
Figures 2 to 6 present scatter plots of the Polypower variable for each selected location. Input features used for model training include temporal (time, month, season), geographical (latitude, altitude), and meteorological variables (humidity, ambient temperature, wind speed, visibility, barometric pressure, and cloud cover). The apparent dispersion in the scatter plots is attributed to the high sampling resolution (minute-level intervals), compounded by irregular data acquisition schedules across sites. Some locations exhibit substantial data gaps, while others are densely populated with high-frequency measurements.
Fig. 1 Heatmap of power generation (Polypower variable) per location and month
Fig. 2 Polypower samples from Camp Murray location
Fig. 3 Polypower samples from Hill Weber location
Fig. 4 Polypower samples from Malmstrom location
Fig. 5 Polypower samples from MNANG location
3.2 Data reduction algorithm
The objective of this work is to minimize the amount of training data required by a Random Forest (RF) regression model while preserving the underlying statistical structure of the dataset, maintaining predictive accuracy above a predefined threshold, and reducing the environmental impact associated with computational resource usage.
This study builds upon prior work by Aravena-Cifuentes et al. (2023), which proposed a baseline RF model trained using a standard five-fold cross-validation strategy in 5 seeds (25 runs in total). The same protocol is followed here to reestablish a benchmark prior to data reduction. Subsequently, the model is retrained on a reduced dataset, generated by the novel heuristic described in Algorithm 1.
This data reduction heuristic operates by leveraging a stochastic sampling mechanism inspired by principles of diversity-preserving selection. Specifically, one data instance (i.e., row) is randomly selected for each feature column, ensuring that no instance index is reused, so that each selected value comes from a distinct row. These values are then merged to construct a new synthetic instance. The rows from which the selected values were drawn are subsequently removed from the dataset, and the synthetic instance is appended, yielding a compressed dataset. This process achieves a compression ratio equal to the number of variables (e.g., 12:1 for a 12-variable dataset), and maintains the original feature dimensionality.
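A single merge step of the kind described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' Algorithm 1: the function name `reduce_once`, the list-of-lists data layout, and the toy data are all hypothetical, and the number of merge steps applied per iteration is not specified here.

```python
import random

def reduce_once(rows, rng):
    """One merge step (assumed form): k distinct rows become one synthetic row.

    rows: list of equal-length lists, one value per feature column.
    Returns a new list with the k source rows removed and the synthetic row appended.
    """
    k = len(rows[0])                              # number of feature columns
    if len(rows) < k:
        raise ValueError("need at least one row per column")
    picked = rng.sample(range(len(rows)), k)      # k distinct row indices, none reused
    # Column j of the synthetic row takes its value from the j-th picked row.
    synthetic = [rows[idx][j] for j, idx in enumerate(picked)]
    dropped = set(picked)
    kept = [row for i, row in enumerate(rows) if i not in dropped]
    kept.append(synthetic)
    return kept

rng = random.Random(0)
data = [[i, 10 * i, 100 * i] for i in range(9)]   # toy data: 9 rows, 3 columns
reduced = reduce_once(data, rng)                  # 9 - 3 + 1 = 7 rows remain
```

Each step removes k rows and adds one, which is where the k:1 compression ratio for the merged rows comes from.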
Fig. 6 Polypower samples from Travis location
Algorithm 1 Heuristic strategy for data reduction
Given the stochastic nature of the reduction process, an iterative refinement scheme is applied. Each reduced dataset candidate is evaluated using a fitness function that quantifies statistical similarity to the original dataset. This evaluation, detailed in Algorithm 2, computes and compares summary statistics, namely mean, median, and standard deviation, between the original dataset and the candidate reduced dataset. A reduced dataset is considered valid if the deviations of all selected statistics remain within user-defined thresholds. This enables a flexible calibration of the trade-off between data compression and statistical fidelity.
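The acceptance test described above could look roughly like the sketch below. It is not the authors' Algorithm 2: the names `column_stats` and `is_valid_reduction` are hypothetical, and measuring each deviation relative to the original statistic, with a single shared tolerance, is an assumption.

```python
import statistics

def column_stats(rows, j):
    """Mean, median and (population) standard deviation of column j."""
    col = [row[j] for row in rows]
    return (statistics.mean(col), statistics.median(col), statistics.pstdev(col))

def is_valid_reduction(original, candidate, tolerance=0.18):
    """True when every column statistic of the candidate stays within the
    relative tolerance of the original (relative deviation is an assumption)."""
    for j in range(len(original[0])):
        ref_stats = column_stats(original, j)
        new_stats = column_stats(candidate, j)
        for ref, new in zip(ref_stats, new_stats):
            scale = abs(ref) if ref != 0 else 1.0   # avoid division by zero
            if abs(new - ref) / scale > tolerance:
                return False
    return True

original = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
same = [row[:] for row in original]
shifted = [[10.0, 10.0], [20.0, 20.0]]
ok_same = is_valid_reduction(original, same)
ok_shifted = is_valid_reduction(original, shifted)
```

A larger tolerance accepts more candidates and so allows more aggressive reduction, at the cost of statistical fidelity.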
4 Experimental design
In this section, we present the experimental setup and results used to evaluate the performance of the proposed data reduction strategy. The methodology established in the benchmark study by Aravena-Cifuentes et al. (2023) serves both as a reference framework and as a comparative baseline for assessing the new approach. That study underscored the necessity of tailoring predictive models to specific geographic locations, and emphasized the dual importance of achieving high predictive accuracy while minimizing the environmental impact associated with computational resource consumption.
To ensure robustness, each iteration of the proposed data reduction algorithm is subjected to a 5-fold cross-validation protocol. Furthermore, to mitigate the effects of stochastic variability, the algorithm is executed under five different random seeds, resulting in a total of 25 runs per iteration. The initial iteration (Iteration 0) uses the complete training dataset without any reduction, serving as a baseline. Subsequent iterations progressively reduce the training dataset by applying the heuristic reduction function described in Algorithm 1.
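The 5 seeds × 5 folds = 25 runs per iteration can be illustrated with a small index generator. This sketch uses only the standard library; the helper `kfold_indices`, the seed values 0 to 4, and the toy size of 100 instances are assumptions, and the model fitting itself is omitted.

```python
import random

def kfold_indices(n, k, seed):
    """Yield (train, test) index lists for a shuffled k-fold split of n items."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]         # k disjoint folds covering all indices
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

# Five seeds times five folds gives the 25 runs per iteration mentioned in the text.
runs = [
    (seed, fold, train, test)
    for seed in range(5)
    for fold, (train, test) in enumerate(kfold_indices(100, 5, seed))
]
```

In practice a library splitter such as scikit-learn's `KFold` would typically take the place of this helper, with one Random Forest regressor fitted per run.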
Two experimental configurations are examined in this study, hereafter referred to as Method 1 and Method 2. In Method 1, the entire training dataset is subjected to the reduction process. In contrast, Method 2 restricts the reduction process to instances corresponding to the months of June through September. This temporal segmentation is motivated by the distribution of the original dataset, which spans from 09 June 2017 to 03 October 2018, and exhibits higher data density during the summer months.
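The difference between the two configurations comes down to which instances are eligible for reduction. A minimal sketch of the Method 2 partition follows; the record layout (a date paired with a feature list) and the function name `split_by_method2` are assumptions made for illustration.

```python
from datetime import date

SUMMER_MONTHS = {6, 7, 8, 9}   # June through September, as stated in the text

def split_by_method2(records):
    """Split (date, features) records into a reducible summer part and an untouched part."""
    reducible = [r for r in records if r[0].month in SUMMER_MONTHS]
    untouched = [r for r in records if r[0].month not in SUMMER_MONTHS]
    return reducible, untouched

records = [
    (date(2017, 6, 9), [1.0]),
    (date(2017, 12, 1), [2.0]),
    (date(2018, 9, 30), [3.0]),
    (date(2018, 10, 3), [4.0]),
]
reducible, untouched = split_by_method2(records)
```

Under Method 1 every record would be eligible. Under Method 2 only `reducible` is passed to the reduction heuristic, and `untouched` is kept as it is.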
Finally, objective evaluation metrics such as mean square error (MSE), mean absolute error (MAE), and coefficient of determination (R²) are calculated to assess the performance of the models at each fold, seed, and generation.
The coefficient of determination R² is calculated using Equation 1.
$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}} \qquad (1)$$
where
$SS_{res}$: Sum of squared differences between predicted and actual values
$SS_{tot}$: Total sum of squared differences between actual values and their mean.
Additionally, the mean absolute error (MAE) and the mean square error (MSE) are calculated using Eqs. 2 and 3, respectively.
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| y_{pred} - y_{true} \right|, \qquad (2)$$
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left( y_{pred} - y_{true} \right)^2, \qquad (3)$$
where $n$ is the number of data instances, $y_{pred}$ are predicted values and $y_{true}$ are the observed values.
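The three metrics can be written directly from Equations 1 to 3. The following is a plain-Python sketch with hypothetical function names and toy values. In an actual pipeline the equivalent scikit-learn metric functions would normally be used.

```python
def mae(y_true, y_pred):
    """Mean absolute error, Eq. 2."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean square error, Eq. 3."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination, Eq. 1: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
scores = (mae(y_true, y_pred), mse(y_true, y_pred), r2(y_true, y_pred))
```

For these toy values a single prediction is off by one unit, so MAE and MSE are both 0.25. SS_res is 1 and SS_tot is 5, giving R² = 0.8.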
5 Results
This section presents the results obtained through the implementation of the proposed data reduction technique for solar power generation prediction. The analysis illustrates the evolution of model performance throughout the optimisation process.
Algorithm 2 Fitness function
Figures 7, 8 and 9 compare the performance of Method 1 and Method 2 in terms of R-squared (R²) against the size of the reduced datasets, measured by the number of training instances, for the Travis location.
The original training dataset contained 2197 instances. A reduction to 2000 instances (approximately 10%) in both methods, regardless of the threshold value, did not lead to notable changes in the evaluation metrics, as shown in Figure 9.
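As a quick check of the arithmetic, the relative reduction can be computed from the two instance counts quoted above. The helper name `reduction_percent` is hypothetical.

```python
def reduction_percent(original_size, reduced_size):
    """Relative dataset size reduction, in percent of the original size."""
    return 100.0 * (original_size - reduced_size) / original_size

# Instance counts taken from the text: 2197 original, 2000 after reduction.
pct = reduction_percent(2197, 2000)
```

Removing 197 of 2197 instances corresponds to a reduction of just under 9%, in line with the "approximately 10%" figure.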
The best results were obtained with a tolerance threshold of 18% for Method 1 and 20% for Method 2. In particular, Method 2 achieved a substantial reduction of 17.13% in the training dataset size with only a 1.5% decrease in the R² value. Method 1, by comparison, resulted in an 8.3% reduction in dataset size with a minimal 0.78% drop in the average coefficient of determination.
Method 1 showed that increasing the tolerance threshold in the selection function allowed more aggressive data reduction, leading to fewer training instances. However, this was accompanied by a degradation in predictive performance, demonstrating the trade-off between dataset compactness and model accuracy.
In contrast, Method 2 exhibited a different pattern. Once the threshold exceeded 19%, no further instances were removed, as the reduction mechanism is constrained to a fixed temporal window within the training data. As a result, a plateau was reached where additional threshold increments no longer affected the training set size. This behaviour is clearly illustrated in Figure 9.
The Camp Murray location already has few instances, specifically 890. We stopped the process after only 2 iterations because further reducing the size of the dataset worsened the results.
Figures 10 and 11 present the results for the Hill Weber location. In this case, both data reduction methods were applied, achieving equivalent levels of reduction; however, Method 2 yielded slightly better performance. The R² values remain comparable between the two methods for dataset sizes down to approximately 1200 instances.
The results for the Malmstrom location are depicted in Figures 12 and 13. Method 1 demonstrates good performance for reductions in the range of 10–15%; specifically, the removal of 200 instances has a negligible impact on R². However, beyond this range, performance metrics begin to deteriorate, as is particularly evident around the 18% reduction mark.
Finally, Figures 14 and 15 display the results for the MNANG location. The behaviour observed is similar to that of the Camp Murray location, though with improved overall performance. Given the original dataset of 600 instances, acceptable predictive accuracy is maintained with reductions of up to 20%.
The Travis dataset, when analyzed using Method 1 (see Table 1), shows relatively stable results with MAE values around 1.95 for the different window sizes (10, 15, 17, 18), with slight fluctuations in the standard deviation. The MSE values also remain close, hovering around 9, which suggests consistent prediction errors. The R² values are strong, consistently above 0.79, indicating a good fit of the model to the data, with small variability. Overall, Method 1 seems to perform consistently across the different parameters with strong prediction performance and little variation.
In contrast to Method 1, Method 2 (see Table 2) for the Travis dataset shows slightly higher MAE values, ranging from 1.96 to 2.07, which indicates a marginally less accurate model compared to Method 1. The MSE values are similarly higher than those in Method 1, peaking at 9.96. However, R² values remain relatively stable, with slight fluctuations around 0.79. While the MAE and MSE values suggest slightly worse performance, the R² values indicate that the model still provides a good fit, albeit with a small increase in error margins.
5.1 Camp Murray
The results for Camp Murray (see Table 3) show consistent and stable performance across various configurations. The Mean Absolute Error (MAE) fluctuates slightly between 2.274 and 2.278, with the standard deviation remaining close to 0.18, indicating minor variation in the model's predictive accuracy. This consistency suggests that the model performs reliably in terms of absolute prediction error.
For the Mean Squared Error (MSE), values range from 12.250 to 12.272, again showing stable prediction errors across the different configurations. The standard deviation of MSE is also quite low, indicating little fluctuation in the squared errors between runs.
The R² values are consistently around 0.741, with a very small standard deviation of 0.04. This indicates that the model is able to explain about 74% of the variance in the data, and the predictive power is stable across different runs.
The performance of Method 1 at Hill Weber (see Table 4) shows a clear trend of degradation as the parameter value increases. MAE rises steadily from 2.492 at 10 to 2.802 at 25, indicating a growing average error in predictions. Likewise, MSE increases from 13.931 to 16.860, showing a significant rise in squared errors. The R² score drops from 0.700 to 0.647, reflecting a reduced ability of the model to explain the variance in the data. The standard deviations also increase slightly, particularly for MSE and R², suggesting that the model's reliability decreases as the parameter grows.
Method 2 (see Table 5) shows a more stable behavior across configurations, particularly when compared to Method 1. MAE remains relatively low and fluctuates less dramatically, ranging from 2.492 to 2.662. Similarly, MSE increases moderately, from 13.924 to 15.277, which is a slower growth than in Method 1. Interestingly, the R² values remain close to 0.690 through most of the range and only decrease slightly toward the end, indicating a more consistent explanatory power. The standard deviations remain small across all metrics, pointing to higher model robustness.
The model's performance at Malmstrom (see Table 6) demonstrates considerable variability and a clear tendency toward deterioration as the parameter value increases. The MAE grows from 2.654 at parameter 10 to 3.183 at 25, reflecting a significant rise in average prediction error. Similarly, the MSE follows an upward trajectory, increasing from 16.427 to 21.130, indicating larger squared deviations. The R² score consistently decreases from 0.671 to 0.574, showing a weakening explanatory power of the model. Interestingly, parameter 18 marks a temporary recovery in performance, with metrics nearly matching those at parameter 10. However, this appears to be an outlier, as results continue worsening afterward. Standard deviations also increase, particularly for MAE and MSE, highlighting growing inconsistency and reduced model stability at higher parameter values. Overall, the model performs best at the lower parameter values, especially at 10 and 18.
Fig. 7 Travis location. Performance results under the data reduction process followed by Method 1 (left) and Method 2 (right). The evolution of the graphs is from right to left, i.e. the decreasing number of instances in the dataset
Fig. 8 Travis location. Performance results under the data reduction process followed by Method 1 (left) and Method 2 (right): A quadratic approximation was used to filter out the noise in the results
The MNANG model (see Table 7) shows consistent yet moderate predictive performance across all parameter configurations. The MAE values remain within a narrow range, from 2.832 to 2.940, reflecting relatively stable absolute error. MSE values increase gradually from 18.406 to 19.512, indicating a slight rise in the average squared error as model complexity grows. Correspondingly, the R² metric declines from 0.689 to 0.669, suggesting a small reduction in the proportion of variance explained.
Fig. 9 Travis location. Performance results under Method 1 (left) and Method 2 (right): Close-up considering the minimum number of instances achieved with Method 2
Fig. 10 Hill Weber location. Performance results under Method 1 (left) and Method 2 (right)