
From vulnerability to resilience: Adversarial training and real-time detection for AI security

Author: Ziras, Georgios; Farao, Aristeidis; Zarras, Apostolis; Xenakis, Christos
Publisher: Zenodo
DOI: 10.1016/j.array.2025.100546
Source: https://zenodo.org/records/17521005/files/1-s2.0-S2590005625001730-main.pdf
Contents lists available at ScienceDirect
Array
journal homepage: www.elsevier.com/locate/array
Georgios Ziras a, Aristeidis Farao a,b,∗, Apostolis Zarras a,c, Christos Xenakis a
a Department of Digital Systems, University of Piraeus, Piraeus, Greece
b InQbit Innovations SRL., Bucharest, Romania
c Foundation for Research and Technology - Hellas, Heraklion, Greece
ARTICLE INFO
Keywords:
Adversarial attacks
Machine learning models
Adversarial training
ABSTRACT
The growing integration of Artificial Intelligence systems into critical infrastructure, such as cybersecurity, healthcare, and finance, has raised significant concerns regarding model robustness in the presence of adversarial attacks. This study examines the vulnerability of various machine learning models to adversarial manipulations and evaluates effective detection and mitigation strategies to improve model resilience. Leveraging the CIC-IDS2017 and CICIoT2023 datasets, we train and evaluate a suite of ML classifiers, including Decision Tree, Random Forest, Logistic Regression, XGBoost, Recurrent Neural Networks, Convolutional Neural Networks, and a custom PyTorch-based Neural Network, under a spectrum of adversarial evasion attacks. These include the Fast Gradient Sign Method, Projected Gradient Descent, DeepFool, Carlini–Wagner, and transfer attacks. We assess classifier robustness against those attacks and examine their defensive behavior through adversarial training, as well as binary input and activation-based detection mechanisms. Our findings indicate that adversarial training provides a more effective and consistent defense compared to detection-based methods.
1. Introduction
Cybersecurity attacks fundamentally aim to compromise a system's confidentiality, integrity, or availability by causing it to behave in unintended ways. Similarly, adversarial Machine Learning (ML) attacks target Artificial Intelligence (AI) systems by manipulating them to produce incorrect or misleading outputs. These attacks typically involve the creation of carefully crafted malicious inputs (a.k.a. adversarial examples) that exploit model vulnerabilities, leading to misclassification and undermining the model's reliability and accuracy.
Adversarial ML attacks can occur at any stage of an AI model's lifecycle, including the training, testing, and deployment phases. They can be broadly categorized into evasion, poisoning, and privacy-based attacks. Moreover, adversarial attacks may be either targeted, wherein the attacker aims to induce a specific erroneous behavior or output, or untargeted, where the objective is to cause general misclassification without a predefined outcome.
As AI systems become increasingly integrated into mission-critical domains such as healthcare, finance, defense, and autonomous systems (where security and reliability are paramount), their susceptibility to adversarial attacks poses a significant threat. These subtle manipulations can compromise system functionality and lead to severe real-world consequences. Therefore, understanding, detecting, and mitigating such threats has become an essential area of research.
∗ Corresponding author at: Department of Digital Systems, University of Piraeus, Piraeus, Greece.
E-mail addresses: [email protected] (G. Ziras), [email protected] (A. Farao).
This work investigates the susceptibility of AI models to adversarial perturbations while measuring their baseline accuracy and robustness. The CIC-IDS2017 [1] and CICIoT2023 [2] datasets are employed to train a variety of classification algorithms, including Decision Tree (DT), Random Forest (RF), Logistic Regression (LR), eXtreme Gradient Boosting (XGBoost), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and a PyTorch MLP model. This dataset was selected for its realistic representation of regular network traffic and contemporary cyber threats. It consists of labeled network flow features extracted from packet captures (PCAPs) using CICFlowMeter.
Following the initial training and evaluation of these models, a suite of adversarial attacks, including Fast Gradient Sign Method (FGSM), Projected Gradient Descent (PGD), DeepFool, and Carlini & Wagner (C&W), is applied to assess the models' resilience against adversarial perturbations. Additionally, this study explores adversarial transferability, wherein adversarial examples generated for one model are tested against others with different architectures. This aspect provides insights into shared vulnerabilities and model generalizability. To enhance robustness, adversarial training is employed by augmenting the training dataset with examples specifically crafted to exploit model weaknesses. This iterative process aims to improve the model's ability to detect and resist adversarial inputs. The retrained models are then re-evaluated to assess improvements in accuracy and resilience. Furthermore, this research investigates real-time adversarial detection methods. One such approach involves feature squeezing, which reduces input complexity to highlight discrepancies between original and perturbed inputs. Another technique, Gradient-Based Anomaly Detection, analyzes the distribution of gradients, with adversarial examples typically exhibiting less peaked gradients than benign samples.
https://doi.org/10.1016/j.array.2025.100546
Received 19 May 2025; Received in revised form 9 July 2025; Accepted 14 October 2025
Array 28 (2025) 100546, G. Ziras et al.
Available online 17 October 2025
2590-0056/© 2025 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
In summary, we make the following main contributions:
• We train various AI models and comprehensively assess their vulnerabilities through carefully designed adversarial attacks.
• We develop and implement robust AI models by retraining them using adversarial training with clean and adversarially perturbed datasets.
• We present, to the best of our knowledge, the first practical implementation and evaluation of real-time adversarial attack detection methods.
• We analyze and quantify improvements in robustness and accuracy of AI models resulting from adversarial training compared to traditional models.
The remainder of this article is organized as follows. Section 2 presents background knowledge, including vulnerabilities in AI systems, an analysis of adversarial ML attacks, and an overview of detection and mitigation techniques. Section 3 describes the methodology for conducting adversarial attacks on ML models, as well as the procedures for adversarial training, integration of defenses, and the performance evaluation metrics used. Section 4 discusses the study's limitations and outlines future research directions, while Section 5 reviews related work. Finally, Section 6 concludes the article.
2. Background
This section outlines the key vulnerabilities of AI-based systems, the main types of adversarial AI attacks, and current mitigation and detection strategies designed to counter such threats.
2.1. Vulnerabilities in AI-based systems
AI-based systems are increasingly deployed in critical domains such as healthcare, finance, autonomous vehicles, defense, and cybersecurity. These systems leverage complex ML models to analyze data, make decisions, and adapt to new inputs. Despite their capabilities, AI systems remain inherently vulnerable due to their dependence on data-driven learning and generalization processes [3–5]. The key challenges are described below.
A primary vulnerability stems from the dependence of AI models on the quality and representativeness of their training data. Biases, imbalances, or incompleteness in the training dataset can severely hinder the model's generalization ability, making it susceptible to adversarial manipulation. Moreover, the non-linearity and high dimensionality of modern Deep Learning (DL) models make them particularly sensitive to adversarial perturbations, where minimal input alterations can lead to significant changes in output.
Another key vulnerability is the lack of robust defense mechanisms in many existing AI systems. Most ML models are optimized for accuracy on clean datasets and are not inherently resilient to adversarial inputs, creating opportunities for attackers to exploit these weaknesses.
Furthermore, AI systems operating in real-world settings often function in dynamic and uncertain environments. This variability enables adversaries to craft context-aware adversarial examples that evade detection and disrupt operations. In complex tasks such as multi-class classification [6,7], these manipulations can lead to misclassification into plausible but incorrect classes, making detection even more challenging.
2.2. Adversarial AI attacks
Adversarial attacks involve the deliberate design of inputs intended to deceive ML and AI models by exploiting their vulnerabilities. These inputs, known as adversarial examples, are crafted to induce incorrect predictions, thus compromising the model's integrity, reliability, and performance [6]. A defining characteristic of adversarial attacks is their ability to remain nearly imperceptible to human observers while being highly effective in deceiving AI systems.
The motivations behind adversarial attacks range from malicious intent, such as disrupting critical systems, to ethical hacking aimed at uncovering and fixing vulnerabilities. The consequences of such attacks can be severe, particularly in high-stakes environments like healthcare, finance, or transportation.
Adversarial attacks are typically classified based on the adversary's knowledge and access level:
• White-Box Attacks: The attacker has complete knowledge of the model, including its architecture, parameters, and training data. This allows for highly effective input manipulations based on the model's internal structure [4].
• Gray-Box Attacks: The adversary has partial information, such as knowledge of the model architecture, but not its parameters or training data. Hybrid strategies combine limited internal knowledge with external probing [4].
• Black-Box Attacks: The attacker has no internal access to the model and relies solely on querying it to infer behavior. These attacks often use surrogate models to exploit the transferability property of adversarial examples [4].
Adversaries employ various methods to craft adversarial inputs tailored to the level of access and knowledge available. Gradient-based techniques are among the most commonly used, leveraging the model's gradients to identify directions that maximize the adversarial impact. Examples include the FGSM, which adds a perturbation in the direction of the sign of the loss gradient, and the PGD, which iteratively refines adversarial examples for stronger attacks.
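The two gradient-based attacks are usually written as the update rules sketched here. The notation is an assumption introduced only for illustration and is not defined in the article: loss $L$, model parameters $\theta$, input $x$ with label $y$, perturbation budget $\varepsilon$, PGD step size $\alpha$, and projection $\Pi$ onto the $\ell_\infty$ ball. Starting PGD from the clean input is one common variant; a random start inside the ball is another.

```latex
% FGSM: a single step of size \varepsilon along the sign of the loss gradient
x^{\mathrm{adv}} = x + \varepsilon \,\operatorname{sign}\!\big(\nabla_{x} L(\theta, x, y)\big)

% PGD: repeated small steps, each projected back onto the \varepsilon-ball around x
x^{(t+1)} = \Pi_{\|x' - x\|_{\infty} \le \varepsilon}
  \Big( x^{(t)} + \alpha \,\operatorname{sign}\!\big(\nabla_{x} L(\theta, x^{(t)}, y)\big) \Big),
\qquad x^{(0)} = x
```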
Beyond gradient-based approaches, other methods include: (i) C&W: an optimization-based method that minimizes perturbations while ensuring misclassification [8]; (ii) DeepFool: an iterative algorithm that computes the minimum perturbation required to cross the decision boundary [9]; (iii) Zeroth-Order Optimization (ZOO): a black-box attack that estimates gradients using model queries [10]; (iv) Boundary Attack: a decision-based method that begins with a large perturbation and gradually reduces it [11]; (v) Transfer-based Attack: exploits the ability of adversarial examples to generalize across different models [12]. These methods underscore the versatility of adversarial attacks, as they can adapt to different levels of system access and defenses. Understanding these techniques is essential for designing robust defenses against adversarial threats.
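To make the "minimum perturbation required to cross the decision boundary" concrete, the sketch here covers only the simplest case: a binary linear classifier, where the DeepFool step has a closed form (an orthogonal projection onto the hyperplane). The function names, the toy weights, and the 0.02 overshoot factor are illustrative assumptions, not values taken from the article; for non-linear models the same step is repeated on a local linearization.

```python
import math

def linear_score(w, b, x):
    """Signed score f(x) = w.x + b; the predicted class is the sign of f(x)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def deepfool_linear(w, b, x, overshoot=0.02):
    """Smallest L2 step that moves x just across the hyperplane w.x + b = 0."""
    f = linear_score(w, b, x)
    norm_sq = sum(wi * wi for wi in w)
    # The orthogonal projection onto the boundary is x - f/||w||^2 * w;
    # the small overshoot pushes the point slightly past it.
    scale = -(1.0 + overshoot) * f / norm_sq
    return [xi + scale * wi for xi, wi in zip(x, w)]

# Hypothetical toy classifier and input (assumed values).
w, b = [1.0, 1.0], -1.0
x = [2.0, 1.0]
x_adv = deepfool_linear(w, b, x)
perturbation_norm = math.sqrt(sum((a - c) ** 2 for a, c in zip(x_adv, x)))
```

In this toy case the score changes sign while the perturbation stays only 2% longer than the exact distance to the boundary.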
Adversarial attacks may also be classified based on their methodology and the stage of the ML pipeline they target [4]:
• Evasion Attacks: Executed during the inference phase, these attacks bypass the model's defenses by modifying test inputs without altering the training data.
• Poisoning Attacks: Conducted during training, these attacks inject malicious data to compromise the learning process and embed vulnerabilities.
• Exploratory Attacks: These attacks probe the system to gather insights without modifying training data, often serving as a precursor to evasion or poisoning.
2.3. Mitigation techniques
Mitigation strategies aim to strengthen ML models against adversarial threats by improving robustness. These techniques include modifying training procedures, augmenting input data, or altering model architectures [3].
Adversarial Training [13–15] is one of the most widely studied and effective mitigation strategies. In this approach, the model is trained on a mixture of clean and adversarially perturbed examples. The inclusion of adversarial examples in the training process forces the model to learn how to classify both clean and perturbed inputs correctly, thus improving its robustness to future attacks. For example, adversarial training typically involves generating adversarial examples using methods like FGSM or PGD and adding these examples to the training data. By repeatedly exposing the model to adversarial perturbations, the model learns to resist such attacks by recognizing patterns indicative of adversarial perturbations.
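The training loop described above can be sketched on a tiny logistic-regression model, where the input gradient is available in closed form. This is a dependency-free illustration under assumed settings (toy data, `eps=0.3`, learning rate, epoch count, and all function names are hypothetical); it is not the authors' training code, which targets PyTorch and scikit-learn style models.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    # For logistic regression with cross-entropy loss, dL/dx = (p - y) * w.
    p = predict_proba(w, b, x)
    return [xi + eps * sign((p - y) * wi) for xi, wi in zip(x, w)]

def adversarial_train(data, eps=0.3, lr=0.1, epochs=200):
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            # Craft the adversarial twin against the current weights,
            # then take one SGD step on the clean and on the perturbed sample.
            for sample in (x, fgsm(w, b, x, y, eps)):
                err = predict_proba(w, b, sample) - y
                w = [wi - lr * err * si for wi, si in zip(w, sample)]
                b -= lr * err
    return w, b

# Hypothetical, linearly separable toy data: (features, label).
data = [([2.0, 2.0], 1), ([1.5, 2.5], 1), ([-2.0, -2.0], 0), ([-2.5, -1.5], 0)]
w, b = adversarial_train(data)
clean_ok = all((predict_proba(w, b, x) > 0.5) == bool(y) for x, y in data)
robust_ok = all((predict_proba(w, b, fgsm(w, b, x, y, 0.3)) > 0.5) == bool(y) for x, y in data)
```

Regenerating the adversarial examples at every step, rather than once up front, keeps them matched to the model as it changes; whether the study regenerates them or augments the dataset once is not stated in this section.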
Input Transformation and Data Augmentation techniques modify the input data in ways that reduce the impact of adversarial perturbations, often by removing the noise introduced by adversarial attacks. These transformations can include techniques such as feature squeezing, image cropping, or random transformations [16].
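Feature squeezing can double as a detector: the model is queried on the raw input and on a reduced-precision copy, and a large disagreement between the two predictions is treated as suspicious. The sketch here assumes features already scaled to [0, 1], a 3-bit quantizer, and an L1 threshold of 0.2; these values, the function names, and the toy model are all illustrative assumptions.

```python
import math

def squeeze(x, bits=3):
    """Quantize each feature (assumed to lie in [0, 1]) to 2**bits levels."""
    levels = 2 ** bits - 1
    return [round(min(max(v, 0.0), 1.0) * levels) / levels for v in x]

def looks_adversarial(predict, x, threshold=0.2, bits=3):
    """Flag x when squeezing shifts the predicted probabilities by more than threshold (L1)."""
    p_raw = predict(x)
    p_squeezed = predict(squeeze(x, bits))
    return sum(abs(a - b) for a, b in zip(p_raw, p_squeezed)) > threshold

def toy_predict(x):
    # Hypothetical stand-in for a trained model: a steep sigmoid around 0.5.
    p = 1.0 / (1.0 + math.exp(-20.0 * (x[0] - 0.5)))
    return [1.0 - p, p]

far_from_boundary = looks_adversarial(toy_predict, [0.9])
near_boundary = looks_adversarial(toy_predict, [0.52])
```

Inputs sitting just past a decision boundary, which is where minimal-perturbation attacks tend to place them, are the ones whose prediction moves most under quantization.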
Certified Defenses [17,18] focus on providing formal guarantees that a model will be robust to adversarial attacks within specific bounds. These methods use mathematical reasoning and optimization techniques to compute provable robustness guarantees. However, certified defenses are often computationally expensive and challenging to implement for complex models.
Ensemble Learning [19–21] is a powerful strategy that combines the predictions of multiple models to make more robust decisions. The underlying idea is that while individual models may be vulnerable to specific adversarial attacks, an ensemble of models, each trained on slightly different data or with varying architectures, can reduce the overall vulnerability of the system. By aggregating the predictions of multiple models, ensemble methods help mitigate the impact of adversarial perturbations, making it more difficult for an attacker to deceive all models in the ensemble simultaneously.
Regularization Methods [22], such as weight decay and dropout, are designed to improve the generalization capabilities of a model. Regularization prevents the model from overfitting to adversarial examples by penalizing overly complex decision boundaries, making it more difficult for small, imperceptible perturbations to drastically alter model predictions.
2.4. Detection techniques
Detection techniques are critical for identifying adversarial inputs before they can compromise model performance. The goal is to detect adversarial perturbations at the earliest possible stage, thereby enabling corrective measures to be applied in real time.
Gradient-based Detection involves analyzing the gradients of the model's loss function with respect to its inputs. Adversarial examples tend to exhibit large or unusual gradients compared to clean inputs due to their perturbations. By examining the gradient behavior, one can identify anomalies indicative of adversarial manipulation.
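One way to read this idea is as a threshold on the norm of the input gradient, calibrated on clean traffic. The sketch here does that for a logistic-regression model, where the gradient has a closed form. It assumes the predicted label stands in for the unknown true label at inference time, and that the threshold is a 95% quantile of clean gradient norms; both choices, and all names, are illustrative assumptions rather than the article's detector.

```python
import math

def input_gradient_norm(w, b, x):
    """L2 norm of dLoss/dx for logistic regression, using the predicted label."""
    p = 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
    y_hat = 1.0 if p >= 0.5 else 0.0
    # The cross-entropy gradient with respect to the input is (p - y) * w.
    return abs(p - y_hat) * math.sqrt(sum(wi * wi for wi in w))

def calibrate_threshold(w, b, clean_inputs, quantile=0.95):
    """Pick the gradient norm below which roughly `quantile` of clean inputs fall."""
    norms = sorted(input_gradient_norm(w, b, x) for x in clean_inputs)
    return norms[min(len(norms) - 1, int(quantile * len(norms)))]

# Hypothetical model and clean reference inputs (assumed values).
w, b = [1.0, 1.0], 0.0
clean = [[3.0, 3.0], [-3.0, -3.0], [4.0, 2.0], [-2.0, -4.0]]
threshold = calibrate_threshold(w, b, clean)
suspicious = input_gradient_norm(w, b, [0.1, 0.1]) > threshold
```

With the predicted label substituted in, the score is large exactly when the model is unsure, so this variant flags low-confidence inputs near the boundary; a detector for deep networks would obtain the gradient by automatic differentiation instead.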
Feature-based Detection focuses on detecting adversarial perturbations in the learned features of a model. Since adversarial attacks are designed to exploit learned representations, these attacks often cause discrepancies in the feature space. Detection can thus be achieved by analyzing the activation values of intermediate layers in the neural network.
The Input Preprocessing and Transformation approach involves preprocessing the input data to remove or mitigate the effects of adversarial perturbations. Techniques such as image denoising or data transformation are used to filter out noise and reduce the impact of adversarial manipulation.
Statistical Methods for adversarial detection focus on analyzing the statistical properties of the input data. These methods examine summary statistics of the feature distributions, such as mean, variance, skewness, and kurtosis, which may shift due to adversarial perturbations. Adversarial inputs typically lead to anomalies that can be detected through statistical outlier detection techniques. Statistical Outlier Detection involves the analysis of input features to detect statistical anomalies. By evaluating the statistical properties of the data, this method can detect adversarial examples that deviate from the expected data distribution.
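A minimal form of statistical outlier detection is a per-feature z-score test against statistics estimated on clean reference data. The sketch here uses only mean and standard deviation (skewness and kurtosis are left out), a threshold of 3 standard deviations, and made-up reference rows; all of these are assumptions for illustration.

```python
import statistics

def fit_feature_stats(clean_rows):
    """Per-feature (mean, population std) estimated on clean reference traffic."""
    return [(statistics.mean(col), statistics.pstdev(col)) for col in zip(*clean_rows)]

def max_abs_zscore(row, stats):
    """Largest absolute z-score across features; constant features contribute 0."""
    return max(abs(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, stats))

def is_outlier(row, stats, z_threshold=3.0):
    return max_abs_zscore(row, stats) > z_threshold

# Hypothetical clean reference rows with two features each.
reference = [[1.0, 10.0], [2.0, 12.0], [3.0, 11.0], [2.0, 11.0]]
stats = fit_feature_stats(reference)
typical = is_outlier([2.0, 11.5], stats)
shifted = is_outlier([2.0, 20.0], stats)
```

A test like this is cheap enough to run on every flow, but it only catches perturbations that push at least one feature far outside its clean range.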
On the detection side, statistical feature analysis offers a promising method for identifying adversarial examples. This approach leverages the statistical properties of the input data, such as its mean, variance, and higher-order moments, to detect anomalies indicative of adversarial manipulation. Adversarial perturbations often cause noticeable shifts in the statistical distribution of the data, and by examining these shifts, statistical methods can efficiently identify adversarial inputs. Moreover, statistical feature analysis is computationally efficient and can be implemented in real-time detection systems [3].
In parallel, statistical analysis offers an efficient and scalable method for adversarial detection; analyzing shifts in the statistical distribution of features enables the real-time identification of manipulated inputs. Combined, these techniques offer a robust and complementary defense mechanism: adversarial training reinforces model resilience, while statistical analysis ensures rapid detection of adversarial threats, enhancing the security and reliability of AI systems.
3. Methodology
This section outlines the methodology employed to evaluate the robustness of ML models against adversarial AI attacks.
3.1. System overview
This study adopts a comprehensive approach to evaluate and enhance the robustness of ML models in the context of network intrusion detection. All experiments were conducted on a system equipped with an AMD Ryzen 5 2600 Six-Core processor (12 threads), 32 GB of RAM, and AMD-Virtualization support. The environment runs on Ubuntu 24.04.2 LTS, offering a stable, high-performance platform suitable for intensive model training and adversarial evaluation tasks. All experiments were performed 10 times.
The CIC-IDS2017 [1] and CICIoT2023 [2] datasets were selected for this research because they are widely recognized. CIC-IDS2017 addresses several limitations found in earlier datasets by offering comprehensive feature sets, realistic traffic patterns, and detailed labeling. The dataset includes both benign traffic and diverse cyberattack categories, such as denial-of-service (DoS), distributed DoS (DDoS), brute-force login attempts, infiltration, web-based attacks, port scans, and botnet activities. CICIoT2023, in turn, is a real-time dataset and benchmark for large-scale attacks in IoT environments. It includes the execution of 33 attacks in an IoT topology composed of 105 devices. These attacks are classified into seven categories, namely DDoS, DoS, Recon, Web-based, Brute Force, Spoofing, and Mirai. All attacks are executed by malicious IoT devices targeting other IoT devices. Initial analysis revealed the distribution of network traffic categories shown in Table 1.
Data preprocessing included handling infinite values by replacing them with NaN, addressing missing values, and removing duplicate entries. After duplicate removal, the dataset was reduced from an initial 1,236,424 rows to 1,122,397. Further preprocessing involved encoding categorical variables and scaling numerical data. The dataset was then split into a training set of 785,677 samples with 78 features and a testing set of 336,720 samples with 78 features, to support the reliability and accuracy of subsequent intrusion detection model evaluations.
Table 1
Number of samples per category found in the used datasets.
Category                Samples  Category                 Samples  Category           Samples
CIC-IDS2017
Benign                  872,105  Port Scanning            12,843   SSH-Patator        5897
DoS Hulk                231,073  DoS GoldenEye            10,293   DoS Slowloris      5796
Normal Network Traffic  84,980   FTP-Patator              7938     DoS Slowhttptest   5499
CICIoT2023
Benign                  16,577   DDOS-ICMP_FLOOD          5227     DDOS-UDP_FLOOD     3962
DDOS-TCP_FLOOD          3227     DDOS-RSTFINFLOOD         3019     DDOS-PSHACK_FLOOD  3003
DDOS-SYN_FLOOD          2954     DDOS-SYNONYMOUSIP_FLOOD  2620     DOS-UDP_FLOOD      2490
DOS-TCP_FLOOD           1907
The preprocessing phase involved the following actions to ensure data quality and readiness for modeling:
(i) Data Cleaning, where the dataset was initially inspected for missing values and infinite values; missing or corrupted data were identified and replaced accordingly to ensure dataset integrity;
(ii) Encoding, where all categorical features in the dataset were encoded numerically using label encoding, converting textual information into numerical representations suitable for machine learning algorithms;
(iii) Data Scaling, where numerical features were scaled using the StandardScaler method to standardize feature ranges, improving model training efficiency and performance consistency;
(iv) Dataset Splitting, where the final preprocessed dataset was split into training and testing subsets (commonly 70–30) to facilitate unbiased evaluation of the model's performance; this splitting ensured reproducibility and robustness in performance assessment.
Overall, through this preprocessing, the resulting dataset provided a reliable foundation for training ML models capable of accurately detecting and responding to adversarial attacks, thereby ensuring the relevance and applicability of the experimental results [23].
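The four preprocessing steps can be sketched without external dependencies as follows. This is an illustration only: the article names label encoding and StandardScaler, which suggests pandas and scikit-learn, but the helper names, the choice to drop rows with NaN or infinite values (the text says such values were "replaced accordingly" without giving the rule), and the fixed seed are all assumptions.

```python
import math
import random

def clean(rows):
    """Drop rows with NaN/inf values and exact duplicates, preserving order."""
    seen, kept = set(), []
    for row in rows:
        if any(isinstance(v, float) and not math.isfinite(v) for v in row):
            continue
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            kept.append(list(row))
    return kept

def label_encode(values):
    """Map each distinct category to an integer, in sorted order."""
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping

def standardize(column):
    """Zero-mean, unit-variance scaling of one numeric column."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column)) or 1.0
    return [(v - mean) / std for v in column]

def split(rows, test_ratio=0.3, seed=0):
    """Shuffle with a fixed seed and cut into train and test parts (70-30 by default)."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    cut = round(len(rows) * (1 - test_ratio))
    return [rows[i] for i in order[:cut]], [rows[i] for i in order[cut:]]

# Hypothetical raw rows: (flow feature, label).
raw = [[1.0, "a"], [1.0, "a"], [float("inf"), "b"], [float("nan"), "b"], [2.0, "b"]]
cleaned = clean(raw)
encoded, mapping = label_encode([row[1] for row in cleaned])
train, test = split(list(range(10)))
```

The reported split sizes are consistent with a 70–30 cut of the deduplicated data: 785,677 + 336,720 = 1,122,397.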
In this study, we trained and evaluated multiple ML models to detect adversarial attacks using the CIC-IDS2017 and CICIoT2023 datasets. The models selected include DT, RF, LR, XGBoost, CNN, RNN, and a Deep Neural Network (DNN) implemented as a PyTorch MLP model. DT was chosen for its simplicity, interpretability, and computational efficiency, making it suitable for baseline evaluation. It effectively identifies clear decision rules and highlights the most significant features contributing to adversarial detection. RF was selected due to its robustness against overfitting, high accuracy across diverse datasets, and its ability to handle noisy or complex data by aggregating predictions from multiple trees to capture more intricate patterns. LR offered a straightforward probabilistic framework appropriate for binary classification tasks, providing a transparent and interpretable baseline for comparison. XGBoost was employed for its superior predictive performance, fast execution, and capability to address class imbalance (an issue commonly encountered in cybersecurity datasets) through its gradient-boosting approach, which enables iterative optimization and model refinement. Finally, the DNN was implemented as a PyTorch MLP model to leverage its capacity for learning complex, nonlinear relationships in high-dimensional network traffic data. Neural networks are particularly advantageous in extracting hierarchical features and detecting subtle adversarial patterns that traditional models may not capture.
For performance evaluation, we employed several standard classification metrics to assess the effectiveness of the proposed models. Specifically, we used accuracy to measure the overall proportion of correctly classified instances, including benign and adversarial samples. Precision was utilized to evaluate the model's ability to correctly identify adversarial examples among all instances predicted as adversarial, a critical factor in reducing false positives. Recall (or sensitivity) was used to assess the model's capacity to detect actual adversarial inputs, essential for minimizing the likelihood of undetected attacks. Finally, the F1-Score was calculated as a harmonic mean of precision and recall, offering a balanced assessment of the model's performance, particularly in class imbalance scenarios.
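For reference, the four metrics reduce to simple counts in the binary case. The sketch here uses a hypothetical helper and made-up labels; the averaging scheme used for the multi-class results in the tables is not stated in the text, so this shows the definitions only.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

# Made-up labels: 3 positives, 5 negatives, one miss and one false alarm.
metrics = binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
```

Here accuracy is 0.75 while precision, recall, and F1 are all 2/3, which shows how accuracy alone can look better than the per-class picture when negatives dominate.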
Table 2 presents the baseline performance metrics of each assessed model. On the CIC-IDS2017 dataset, DT achieved the highest accuracy, reaching 99.99%, whereas LR recorded the lowest accuracy at 97.56%. The remaining models all attained accuracy scores exceeding 99%. On the CICIoT2023 dataset (see Table 2), LR achieved the highest accuracy, reaching 87.06%, whereas RF recorded the lowest accuracy at 84.51%. The remaining models all attained accuracy scores exceeding 86%. These baseline metrics are reference points against which the performance improvements from adversarial training will be compared.
3.2. Robustness evaluation against adversarial AI
Several adversarial attacks were performed to thoroughly evaluate the robustness of the aforementioned ML models. The DT Attack, a specialized adversarial technique tailored for DT-based models, was applied to the DT model. This attack exploits the hierarchical structure of decision trees by identifying and modifying critical feature values that influence the classification outcome. The FGSM, a fast, single-step attack, was executed against both the LR model and the PyTorch MLP model. This method perturbs input data by adjusting feature values in the direction of the gradient sign, generating minimal yet effective perturbations that mislead the model. The PGD attack was also applied to the LR and PyTorch MLP model. Unlike FGSM, PGD operates iteratively, applying small perturbations over multiple steps to refine adversarial modifications, thereby increasing the attack's effectiveness and making it more difficult to defend against. DeepFool, another iterative attack, was used on the same models. It computes the minimal perturbation required to shift an input sample across the decision boundary, leading to misclassification with minimal changes to the input. Finally, the C&W attack was conducted on the LR and PyTorch MLP model. This optimization-based technique aims to produce highly effective adversarial examples by minimizing the perturbation's perceptibility while maximizing its adversarial impact through a refined optimization process.
The following setup is used for the aforementioned attacks. FGSM is applied with a perturbation magnitude ε ranging from 0.4 to 0.7, suitable for one-step attacks on structured data. PGD, a stronger iterative variant, uses ε values between 0.6 and 0.8, with a step size of 0.01 over 50 to 100 iterations to craft more effective perturbations. DeepFool automatically adjusts its perturbations based on the decision boundary, requiring no predefined ε. Lastly, the C&W attack is performed as a targeted optimization-based method, using a confidence parameter of 0.3 to 0.4 and 10 optimization steps to generate high-confidence adversarial examples with minimal visibility.
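To show how the PGD parameters interact, the sketch here runs an L-infinity PGD attack on a toy logistic-regression model with ε = 0.6, a step size of 0.01, and 100 iterations, all values from the ranges quoted above. The model weights, the input, and the function names are hypothetical, and the closed-form gradient replaces the automatic differentiation a real attack on a neural network would use.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    return (v > 0) - (v < 0)

def pgd_logreg(w, b, x, y, eps=0.6, step=0.01, iters=100):
    """L-infinity PGD against logistic regression, starting from the clean input."""
    x_adv = list(x)
    for _ in range(iters):
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x_adv)) + b)
        grad = [(p - y) * wi for wi in w]  # dLoss/dx for cross-entropy
        x_adv = [xa + step * sign(g) for xa, g in zip(x_adv, grad)]
        # Project back into the eps-ball around the original input.
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv

# Hypothetical model and correctly classified input (assumed values).
w, b = [1.0, -2.0], 0.0
x, y = [0.5, -0.2], 1
x_adv = pgd_logreg(w, b, x, y)
p_clean = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
p_adv = sigmoid(sum(wi * xi for wi, xi in zip(w, x_adv)) + b)
```

With a step of 0.01, each feature can move at most 0.01 times the number of iterations before projection, so 100 iterations are enough to reach the edge of an ε = 0.6 ball while 50 iterations stop at a displacement of 0.5.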
Table 3 summarizes the evaluation metrics of the models following the application of various adversarial attacks. On the one hand, in the CIC-IDS2017 dataset (see Table 3), we can observe that the DT attack had a catastrophic impact on the DT model, reducing its accuracy to just 0.27%. This drastic performance degradation reveals the model's extreme sensitivity to adversarial perturbations and highlights its lack of robustness in defending against even minimal adversarial noise.
Table 2
Baseline performance metrics per model.
Model               Accuracy (%)  Precision (%)  Recall (%)  F1-Score (%)
CIC-IDS2017
RF                  99.94         99.95          99.96       99.95
LR                  97.56         96.58          97.01       97.55
XGBoost             99.97         99.81          99.77       99.79
DT                  99.99         99.95          96.80       99.99
PyTorch MLP model   99.44         99.89          98.52       99.69
RNN                 99.71         99.71          99.71       99.71
CNN                 99.81         99.81          99.81       99.81
CICIoT2023
RF                  84.51         84.47          84.51       84.48
LR                  87.06         87.96          87.06       85.65
XGBoost             86.18         85.96          86.18       85.82
DT                  86.74         86.77          86.74       85.72
PyTorch MLP model   86.57         88.87          86.57       84.21
RNN                 86.08         88.14          86.08       84.25
CNN                 86.92         88.19          86.92       85.21
Table 3
Model evaluation metrics after adversarial attacks.
Performed attack  Target model       Accuracy (%)  Precision (%)  Recall (%)  F1-Score (%)
CIC-IDS2017
DT Attack         DT                 0.27          22.29          0.18        0.07
FGSM              LR                 1.99          2.85           2.29        2.65
PGD               LR                 0.65          26.81          0.65        0.50
DeepFool          LR                 1.62          3.24           1.62        1.61
C&W               LR                 2.09          4.82           2.09        2.09
FGSM              PyTorch MLP model  78.54         31.69          22.64       23.98
PGD               PyTorch MLP model  72.00         18.76          11.04       22.55
DeepFool          PyTorch MLP model  65.92         18.76          11.04       10.55
C&W               PyTorch MLP model  55.94         33.18          17.83       18.76
FGSM              CNN                10.43         33.98          10.43       10.11
FGSM              RNN                18.94         33.18          18.94       23.87
PGD               CNN                0.85          2.55           0.85        1.21
PGD               RNN                0.24          0.67           0.24        0.34
DeepFool          CNN                0.15          0.43           0.15        0.18
DeepFool          RNN                0.38          1.29           0.38        0.58
CICIoT2023
DT Attack         DT                 0.02          78.59          0.02        0.01
FGSM              LR                 17.85         43.81          17.85       11.63
PGD               LR                 11.24         88.96          11.24       2.63
DeepFool          LR                 13.78         21.11          13.78       9.90
C&W               LR                 12.90         24.07          12.90       6.94
FGSM              PyTorch MLP model  27.13         24.41          27.13       21.49
PGD               PyTorch MLP model  84.43         84.04          84.43       82.14
DeepFool          PyTorch MLP model  41.53         36.03          41.53       35.30
C&W               PyTorch MLP model  86.57         88.87          86.57       84.21
FGSM              CNN                17.12         19.69          17.12       17.07
PGD               CNN                20.76         20.16          20.76       18.02
DeepFool          CNN                2.09          48.23          2.09        3.29
FGSM              RNN                13.87         29.93          13.87       6.94
PGD               RNN                11.53         5.26           11.53       6.75
DeepFool          RNN                13.42         6.28           13.42       7.29
Similarly, both FGSM and PGD attacks severely compromised the performance of the LR model, with accuracies dropping to 1.99% and 0.65%, respectively. In contrast, the PyTorch MLP model demonstrated a markedly stronger resistance to these attacks, maintaining accuracies of 78.54% under FGSM and 72.00% under PGD. These results suggest that while linear models are highly susceptible to gradient-based adversarial methods, deeper architectures may incorporate features that inherently mitigate such vulnerabilities, at least to a certain extent. Under the DeepFool attack, LR again showed a dramatic drop in performance, with accuracy falling to 1.62%. Although the PyTorch MLP model performed better in this case, its accuracy still dropped to 65.92%, indicating that even more robust deep learning models remain vulnerable to well-optimized adversarial perturbations designed to subtly manipulate decision boundaries. The C&W attack was the most damaging method for the PyTorch MLP model, which experienced a substantial decline to 55.94%. It reduced the accuracy of the LR model to 2.09%, although PGD was even more damaging for LR (0.65%). This significant impact underscores the effectiveness of optimization-based attacks like C&W, which can exploit even robust architectures through targeted and precise adversarial noise.
Across all attack types, both CNN and RNN models experience a dramatic drop in accuracy, falling below 1% in the case of PGD and DeepFool, indicating that the adversarial inputs effectively deceive the models. Precision, recall, and F1-scores are similarly low, especially for PGD and DeepFool, with F1-scores dropping to as low as 0.18% for DeepFool on the CNN and 0.34% for PGD on the RNN. The FGSM attack shows slightly better detection metrics, particularly for the RNN (F1-score of 23.87%), but still reflects poor overall performance. These metrics clearly highlight that both CNNs and RNNs, without adversarial defenses, are highly susceptible to even basic adversarial attacks, resulting in severe degradation of their classification capabilities.
On the other hand, in the CICIoT2023 dataset (see Table 3), the DT Attack again dramatically decreased the accuracy of the DT model, to 0.02%. LR's accuracy also decreased under FGSM, PGD, C&W, and DeepFool, with PGD having the biggest impact (11.24% accuracy). Next, the PyTorch MLP model was evaluated against FGSM, PGD, C&W, and DeepFool. FGSM had the biggest impact (27.13% accuracy), while C&W had the least: at 86.57%, accuracy under C&W equals the baseline value reported in Table 2. The CNN was assessed against FGSM, PGD, and DeepFool. The latter had the biggest impact (2.09% accuracy), while PGD had the least (20.76%). Finally, the RNN was evaluated against FGSM, PGD, and DeepFool. PGD had the biggest impact (11.53% accuracy) and FGSM the least (13.87%). The relatively strong performance of the PyTorch MLP model under adversarial conditions (e.g., 78.54% accuracy under FGSM and 55.94% under C&W on CIC-IDS2017) can be attributed to its architectural capacity to learn complex, non-linear feature representations. Unlike traditional models with rigid or shallow decision boundaries, deep networks extract hierarchical abstractions that enable more flexible and robust classification. This allows the model to partially resist subtle perturbations, particularly in high-dimensional feature spaces such as those present in network traffic data.
Overall, DeepFool has the most severe impact on the CNN model across both the CIC-IDS2017 and CICIoT2023 datasets. This can be explained by the attack's approach and the inherent vulnerability of CNNs to small, well-targeted perturbations. DeepFool computes the minimal perturbation required to push an input across the decision boundary. As a high-capacity model, a CNN often develops sharp decision boundaries in high-dimensional feature spaces, making it particularly susceptible to minimal perturbations that are carefully aligned with the gradients of the network, which is exactly what DeepFool produces. Moreover, since the CNN used here is more sensitive to feature-level distortions in such data, DeepFool's ability to exploit subtle vulnerabilities with precision leads to drastic performance degradation, as seen in the near-zero accuracy and recall on both datasets. This illustrates that CNNs, while powerful, can be critically destabilized by attacks that finely adapt to the model's geometry.
The PyTorch MLP model, in contrast, consistently exhibits the highest resilience to adversarial attacks, regardless of the attack type or dataset. The PyTorch MLP model likely has a simpler and more regularized structure than the CNN. This makes it less prone to overfitting, reducing its sensitivity to small adversarial perturbations. Simpler models often generalize better in the presence of noise, particularly in structured data domains like these. In addition, the PyTorch MLP model treats all input features in a flat, uniform manner. This homogeneous feature processing reduces the chance that perturbations targeting localized dependencies (as in CNNs) or sequential dependencies (as in RNNs) will drastically alter the output.
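The statement that DeepFool seeks the minimal perturbation that pushes an input across the decision boundary has a closed form in the simplest case, an affine binary classifier f(x) = w·x + b. The following is a minimal sketch of that special case only; the weights, the input, and the `overshoot` value are illustrative assumptions, not the models or settings used in this study, and the full method iterates this step on a local linearization of a non-linear network.

```python
import math

def score(x, w, b):
    """Signed score f(x) = w.x + b; the sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def deepfool_linear(x, w, b, overshoot=0.02):
    """Smallest L2 perturbation that moves x onto the hyperplane f(x) = 0,
    scaled by (1 + overshoot) so the point ends up just past it."""
    f = score(x, w, b)
    norm_sq = sum(wi * wi for wi in w)
    # Closed form for an affine classifier: r = -f(x) / ||w||^2 * w
    r = [-(f / norm_sq) * wi for wi in w]
    x_adv = [xi + (1 + overshoot) * ri for xi, ri in zip(x, r)]
    return x_adv, r

# Hypothetical toy classifier and input.
w, b = [2.0, -1.0], 0.5
x = [1.0, 1.0]
x_adv, r = deepfool_linear(x, w, b)
r_norm = math.sqrt(sum(ri * ri for ri in r))
print(score(x, w, b), score(x_adv, w, b), r_norm)
```

The perturbation norm equals |f(x)| / ||w||, the distance from x to the boundary, so inputs that lie close to a boundary need only a very small change to be misclassified.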
Beyond direct attacks, we also investigated the transferability of adversarial examples across models (see Tables 4 and 5). In this setting, adversarial samples crafted for a source model were evaluated on different target models to assess the cross-model generalization of adversarial perturbations. This phenomenon of transferability is especially relevant in black-box scenarios, where attackers do not have access to the target model's parameters or architecture but can still compromise its integrity using surrogate models. The effectiveness of these transfer attacks raises critical concerns about the general robustness of machine learning models and their exposure to real-world adversarial threats.
Evaluating transfer attacks across multiple models offers critical insights into their vulnerability to adversarial perturbations. This analysis investigates the effectiveness of various attack strategies on different target models (i.e., DT, RF, XGBoost, LR, and PyTorch MLP model) using performance metrics such as accuracy, precision, recall, and F1-score (see Tables 4 and 5).
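The transfer setting described above (craft on a source model, score on other targets) can be sketched as a small evaluation loop. This is an illustrative sketch under stated assumptions: the data, the linear surrogate, the two stand-in targets, and the step size `eps` are all hypothetical, and the one-step signed perturbation merely stands in for the attacks evaluated in the paper.

```python
def sign(v):
    return (v > 0) - (v < 0)

def linear_predict(x, w, b):
    """Class 1 if w.x + b > 0, else class 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def craft_on_surrogate(x, y, w, eps):
    """One signed step that pushes the surrogate's score away from the true class."""
    direction = -1 if y == 1 else 1
    return [xi + eps * direction * sign(wi) for xi, wi in zip(x, w)]

def accuracy(predict, xs, ys):
    return sum(predict(x) == y for x, y in zip(xs, ys)) / len(ys)

# Hypothetical toy data.
xs = [[2.0, 0.0], [1.5, 0.5], [-2.0, 0.0], [-1.5, -0.5]]
ys = [1, 1, 0, 0]

# Source (surrogate) model and two stand-in targets with different decision rules.
w_src, b_src = [1.0, 1.0], 0.0
targets = {
    "source (linear)": lambda x: linear_predict(x, w_src, b_src),
    "linear, other weights": lambda x: linear_predict(x, [1.0, 0.5], 0.0),
    "stump on feature 0": lambda x: 1 if x[0] > 0 else 0,
}

# Adversarial examples are crafted once, using only the source model's weights.
adv = [craft_on_surrogate(x, y, w_src, eps=1.2) for x, y in zip(xs, ys)]

clean = {name: accuracy(f, xs, ys) for name, f in targets.items()}
attacked = {name: accuracy(f, adv, ys) for name, f in targets.items()}
print(clean)
print(attacked)
```

In this toy setup all three models classify the clean points correctly, while the crafted points fool the source model completely, the similar linear target partially, and the stump not at all. That is the kind of source-versus-target gap the transfer tables report.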
Regarding the CIC-IDS2017 dataset (see Table 4), when the DT attack is applied to its source model, the DT classifier's performance degrades drastically, with accuracy and F1-score dropping to 0.27% and 0.07%, respectively. This outcome indicates that the attack is highly effective at misleading the model it was crafted for. However, when transferred to other models, the impact is less uniform. RF remains highly robust, maintaining 99.28% accuracy, while XGBoost and LR exhibit moderate degradation, with accuracies of 90.52% and 92.8%, respectively. The PyTorch MLP model is similarly resilient, preserving 98.1% accuracy, suggesting that DL models are less affected by this specific attack when not used as the source. The FGSM attack, crafted using an LR model, significantly degrades the performance of DT (1.09% accuracy), yet its effect on other models varies. RF retains 28.33% accuracy, whereas XGBoost collapses to 0.63%, and the PyTorch MLP model retains only limited robustness at 12.41%. Notably, the LR model itself is heavily compromised (1.99% accuracy), confirming the effectiveness of FGSM when targeted at its source architecture. A similar trend is observed with PGD attacks on LR. The performance of DT and XGBoost declines sharply, with accuracies of 1.14% and 0.6%, respectively. Although RF displays greater resistance (27.94% accuracy), it still experiences a notable drop. The PyTorch MLP model again retains limited robustness, maintaining 11.54% accuracy under this stronger iterative attack.
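To make the relation between the single-step FGSM and the iterative PGD concrete for an LR source model, the following minimal sketch applies both to a logistic regression with binary cross-entropy loss. The weights, the input, and the values of `eps`, `alpha`, and `steps` are hypothetical and are not the settings used in the experiments.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_and_grad(x, y, w, b):
    """Binary cross-entropy of a logistic model and its gradient w.r.t. the input x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    grad = [(p - y) * wi for wi in w]   # d(loss)/dx = (p - y) * w
    return loss, grad

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(x, y, w, b, eps):
    """Single step of size eps along the sign of the input gradient."""
    _, g = loss_and_grad(x, y, w, b)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]

def pgd(x, y, w, b, eps, alpha, steps):
    """Repeated signed steps of size alpha, each projected back into the
    L-infinity ball of radius eps around the original input."""
    x_adv = list(x)
    for _ in range(steps):
        _, g = loss_and_grad(x_adv, y, w, b)
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv

# Hypothetical model and input.
w, b = [1.5, -2.0], 0.1
x, y = [0.5, -0.5], 1
x_fgsm = fgsm(x, y, w, b, eps=0.3)
x_pgd = pgd(x, y, w, b, eps=0.3, alpha=0.1, steps=10)
clean_loss = loss_and_grad(x, y, w, b)[0]
fgsm_loss = loss_and_grad(x_fgsm, y, w, b)[0]
pgd_loss = loss_and_grad(x_pgd, y, w, b)[0]
print(clean_loss, fgsm_loss, pgd_loss)
```

For a purely linear source model the sign of the input gradient never changes, so both attacks end at the same corner of the perturbation ball. The iterative refinement of PGD only makes a difference when the source model is non-linear.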
DeepFool attacks crafted from the LR model are particularly effective against traditional ML models. DT and XGBoost exhibit significant performance degradation, with accuracies dropping to 4.34% and 12.24%, respectively. The LR model itself is also severely impacted (1.62% accuracy). In contrast, RF and the PyTorch MLP model demonstrate strong robustness, achieving 87.06% and 98.85% accuracy, respectively. The C&W attack follows a similar pattern. When generated using LR, it causes a considerable drop in performance for both DT (3.3%) and XGBoost (2.21%). However, the PyTorch MLP model is only marginally affected, maintaining a high accuracy of 98.22%. These findings reinforce the notion that DL models are generally more resilient to adversarial examples crafted on traditional ML models. Interestingly, when adversarial examples are generated using the PyTorch MLP model and transferred to other models, the overall impact is less severe compared to attacks sourced from classical models. Although DT and XGBoost remain vulnerable, the degradation is reduced. The PyTorch MLP model itself shows performance degradation under various attacks (FGSM: 78.57%, PGD: 78.09%, DeepFool: 69.93%, C&W: 55.94%), but remains considerably more robust than its traditional counterparts.
The CNN model exhibits significant vulnerability to most transfer attacks, especially those generated using strong gradient-based methods such as DeepFool and C&W, even when transferred from simpler surrogate models like LR or the PyTorch MLP model. Accuracy under attack often drops to near-zero levels, such as 0.14% for DeepFool and C&W from LR, indicating that the CNN is highly susceptible. Despite the low accuracy, recall values remain high (e.g., 0.89 for FGSM (LR)), but this comes at the cost of very low precision, suggesting many false positives. The best-performing attack in terms of detection (high F1) seems to be FGSM from LR, which achieves an F1-score of 1.00, albeit with poor precision (14.88%), highlighting a detection-heavy, error-prone response. Attacks sourced from the CNN (e.g., DeepFool (CNN)) lead to a slightly better balance, with an F1-score of 27.35%, still low but noticeably higher than the others.
In contrast to the CNN, the RNN demonstrates much stronger resilience against most transfer attacks. The DT-based attack fails to degrade RNN performance, with an accuracy of 97.5% and very high precision and F1-score, indicating that it is ineffective as an attack method. Attacks from LR (e.g., FGSM and PGD) do reduce the accuracy to around 15%, but have low precision and F1-scores, meaning the attacks cause misclassifications but are poorly detectable. Interestingly, under DeepFool and C&W attacks from LR the RNN fares far better than the CNN target, maintaining relatively high F1-scores of 78.71% and 85.34%, respectively. Attacks transferred from the PyTorch MLP model and the CNN vary in effectiveness, with FGSM and PGD (PyTorch MLP model) showing moderate impact (F1-scores around 65% to 67%), and DeepFool (PyTorch MLP model) achieving 38.30%, showing partial
Table 4
Transfer attack evaluation metrics in CIC-IDS2017.
Performed attack Target model Accuracy (%) Precision (%) Recall (%) F1-Score (%)
DT Attack (DT) DT 0.27 22.29 0.18 0.07
DT Attack (DT) RF 99.28 99.28 98.25 98.75
DT Attack (DT) XGBoost 90.52 65.84 66.57 66.16
DT Attack (DT) LR 92.8 78.14 76.64 75.14
DT Attack (DT) PyTorch MLP model 98.1 97.96 83.72 85.21
FGSM (LR) DT 1.09 22.51 1.18 0.3
FGSM (LR) RF 28.33 83.63 5.01 5.08
FGSM (LR) XGBoost 0.63 15.7 0.26 0.32
FGSM (LR) LR 1.99 2.85 4.29 2.65
FGSM (LR) PyTorch MLP model 12.41 19.9 4.06 4.28
PGD (LR) DT 1.14 55.85 1.26 0.32
PGD (LR) RF 27.94 83.62 5.07 5.05
PGD (LR) XGBoost 0.6 18.14 0.47 0.71
PGD (LR) LR 1.98 2.78 4.02 2.53
PGD (LR) PyTorch MLP model 11.54 22.07 4.29 4.64
DeepFool (LR) DT 4.34 25.63 6.32 5.82
DeepFool (LR) RF 87.06 94.24 25.83 30
DeepFool (LR) XGBoost 12.24 34.87 11.2 14.25
DeepFool (LR) LR 1.62 3.24 3.81 3.01
DeepFool (LR) PyTorch MLP model 98.85 97.57 94.04 95.66
C&W (LR) DT 3.3 36.69 13.73 7.75
C&W (LR) RF 74.39 91.16 13.19 12.98
C&W (LR) XGBoost 2.21 34.32 11.29 5.91
C&W (LR) LR 2.09 4.82 4.52 3.51
C&W (LR) PyTorch MLP model 98.22 98.51 94.49 96.41
FGSM (PyTorch MLP model) DT 48.95 29.27 9.31 11.02
FGSM (PyTorch MLP model) RF 73.22 86.09 11.2 9.63
FGSM (PyTorch MLP model) XGBoost 52.37 9.08 8.75 8.84
FGSM (PyTorch MLP model) LR 32.92 22.79 44.82 20.65
FGSM (PyTorch MLP model) PyTorch MLP model 78.57 33.7 22.71 24.15
PGD (PyTorch MLP model) DT 24.75 15.17 10.84 8.09
PGD (PyTorch MLP model) RF 73.21 86.04 11.11 9.41
PGD (PyTorch MLP model) XGBoost 47.74 10.89 17.46 10.17
PGD (PyTorch MLP model) LR 26.41 21.88 34.86 16.49
PGD (PyTorch MLP model) PyTorch MLP model 78.09 37.86 20.92 22.45
DeepFool (PyTorch MLP model) DT 42.8 15.26 13.13 11.29
DeepFool (PyTorch MLP model) RF 73.23 86.39 11.11 9.4
DeepFool (PyTorch MLP model) XGBoost 63.53 18.65 10.56 10.72
DeepFool (PyTorch MLP model) LR 18.77 15.08 23.59 10.89
DeepFool (PyTorch MLP model) PyTorch MLP model 69.93 20.01 11.05 9.87
C&W (PyTorch MLP model) DT 34.25 14.76 13.43 9.5
C&W (PyTorch MLP model) RF 74.78 92.09 14.46 15.76
C&W (PyTorch MLP model) XGBoost 57.17 23.11 17.42 14.12
C&W (PyTorch MLP model) LR 49.7 14.31 19.53 13.17
C&W (PyTorch MLP model) PyTorch MLP model 55.94 33.18 17.83 18.76
DT Attack (DT) CNN 0.74 65.63 0.74 0.73
FGSM (LR) CNN 0.89 14.88 0.89 1.00
PGD (LR) CNN 0.79 15.16 0.79 0.85
DeepFool (LR) CNN 0.14 22.12 0.14 0.04
C&W (LR) CNN 0.14 18.19 0.14 0.04
FGSM (PyTorch MLP model) CNN 0.94 24.21 0.94 0.74
PGD (PyTorch MLP model) CNN 0.57 31.48 0.57 0.44
DeepFool (PyTorch MLP model) CNN 1.14 16.75 1.14 1.95
C&W (PyTorch MLP model) CNN 3.33 64.11 3.33 5.60
FGSM (CNN) CNN 1.41 36.53 1.41 0.76
PGD (CNN) CNN 0.21 4.02 0.21 0.09
DeepFool (CNN) CNN 17.12 73.71 17.12 27.35
FGSM (RNN) CNN 0.79 71.26 0.79 1.09
PGD (RNN) CNN 1.14 72.93 1.14 1.63
DT Attack (DT) RNN 97.5 97.71 97.5 97.54
FGSM (LR) RNN 15.13 28.74 15.13 19.24
PGD (LR) RNN 15.25 28.63 15.25 19.27
DeepFool (LR) RNN 80.24 78.59 80.24 78.71
C&W (LR) RNN 87.04 85.75 87.04 85.34
FGSM (PyTorch MLP model) RNN 64.9 71.41 64.9 67.13
PGD (PyTorch MLP model) RNN 63.59 72.95 63.59 65.83
DeepFool (PyTorch MLP model) RNN 28.53 69.24 28.53 38.3
C&W (PyTorch MLP model) RNN 53.16 63.31 53.16 55.46
FGSM (CNN) RNN 50.07 55.08 50.07 52.26
PGD (CNN) RNN 53.97 57.5 53.97 55.56
DeepFool (CNN) RNN 3.44 10.24 3.44 4.37
FGSM (RNN) RNN 18.94 33.18 18.94 23.87
PGD (RNN) RNN 0.24 0.67 0.24 0.34
Table 5
Transfer attack evaluation metrics in CICIoT2023.
Performed attack Target model Accuracy (%) Precision (%) Recall (%) F1-Score (%)
DT Attack DT 0.02 78.59 0.02 0.01
DT Attack XGBoost 85.98 86.05 85.98 83.96
DT Attack RF 82.02 83.69 82.02 81.79
DT Attack LR 86.06 88.44 86.06 85.01
DT Attack PyTorch MLP model 86.24 88.48 86.24 83.65
DT Attack CNN 8.42 72.61 8.42 1.89
DT Attack RNN 85.84 88.16 85.84 83.61
FGSM (LR) DT 25.74 15.19 25.74 18.32
FGSM (LR) XGBoost 34.88 23.55 34.88 25.31
FGSM (LR) RF 60.89 63.79 60.89 57.57
FGSM (LR) LR 17.85 43.81 17.85 11.63
FGSM (LR) PyTorch MLP model 51.99 59.11 51.99 50.67
FGSM (LR) CNN 10.74 66.37 10.74 2.62
FGSM (LR) RNN 28.84 51.8 28.84 23.85
PGD (LR) DT 25.85 15.29 25.85 18.51
PGD (LR) XGBoost 37.48 27.09 37.48 28.42
PGD (LR) RF 65.26 71.09 65.26 61.06
PGD (LR) LR 13.28 41.08 13.28 7.02
PGD (LR) PyTorch MLP model 58.65 67.7 58.65 56.89
PGD (LR) CNN 10.9 72.99 10.9 2.6
PGD (LR) RNN 34.95 55.2 34.95 33.49
DeepFool (LR) DT 28.61 19.61 28.61 20.77
DeepFool (LR) XGBoost 35.84 32.79 35.84 30.63
DeepFool (LR) RF 40.66 47.12 40.66 37.14
DeepFool (LR) LR 13.78 21.11 13.78 9.9
DeepFool (LR) PyTorch MLP model 86.18 88.32 86.18 83.68
DeepFool (LR) CNN 7.25 72.63 7.25 1.73
DeepFool (LR) RNN 84.56 87.81 84.56 82.17
C&W (LR) DT 24.03 52.07 24.03 15.05
C&W (LR) XGBoost 37.02 64.61 37.02 28.11
C&W (LR) RF 68.21 76.98 68.21 67.02
C&W (LR) LR 12.9 24.07 12.9 6.94
C&W (LR) PyTorch MLP model 85 85.88 85 83.14
C&W (LR) CNN 7.34 72.67 7.34 1.84
C&W (LR) RNN 71.91 80.37 71.91 71.16
FGSM (PyTorch MLP model) DT 25.64 15.89 25.64 15.75
FGSM (PyTorch MLP model) XGBoost 36.87 24.43 36.87 27.02
FGSM (PyTorch MLP model) RF 37.57 63.23 37.57 34.76
FGSM (PyTorch MLP model) LR 37.44 53.46 37.44 29.11
FGSM (PyTorch MLP model) PyTorch MLP model 27.13 24.41 27.13 21.49
FGSM (PyTorch MLP model) CNN 7.62 72.57 7.62 1.49
FGSM (PyTorch MLP model) RNN 28.49 41.65 28.49 18.27
PGD (PyTorch MLP model) DT 65.58 69.65 65.58 65.36
PGD (PyTorch MLP model) XGBoost 67.52 69.15 67.52 64.23
PGD (PyTorch MLP model) RF 78.59 83.77 78.59 73.16
PGD (PyTorch MLP model) LR 84.36 83.4 84.36 83.1
PGD (PyTorch MLP model) PyTorch MLP model 84.43 84.04 84.43 82.14
PGD (PyTorch MLP model) CNN 2.9 66.27 2.9 0.58
PGD (PyTorch MLP model) RNN 77.2 76.47 77.2 72.02
DeepFool (PyTorch MLP model) DT 36.92 32.15 36.92 33.12
DeepFool (PyTorch MLP model) XGBoost 21.03 21.96 21.03 18.61
DeepFool (PyTorch MLP model) RF 36.74 37.33 36.74 31.55
DeepFool (PyTorch MLP model) LR 29.46 36.14 29.46 30.12
DeepFool (PyTorch MLP model) PyTorch MLP model 41.53 36.03 41.53 35.3
DeepFool (PyTorch MLP model) CNN 7.98 48.41 7.98 2.24
DeepFool (PyTorch MLP model) RNN 38.4 32.53 38.4 34.16
C&W (PyTorch MLP model) DT 86.74 86.77 86.74 85.72
C&W (PyTorch MLP model) XGBoost 86.18 85.96 86.18 85.82
C&W (PyTorch MLP model) RF 84.51 84.47 84.51 84.48
C&W (PyTorch MLP model) LR 87.06 87.96 87.06 85.65
C&W (PyTorch MLP model) PyTorch MLP model 86.57 88.87 86.57 84.21
C&W (PyTorch MLP model) CNN 7.32 66.97 7.32 1.67
C&W (PyTorch MLP model) RNN 86.08 88.14 86.08 84.25
FGSM (CNN) DT 33.64 58.19 33.64 26.6
FGSM (CNN) XGBoost 32.05 60.21 32.05 29.12
FGSM (CNN) RF 77.59 71.22 77.59 72.45
FGSM (CNN) LR 71.61 72.49 71.61 67.57
FGSM (CNN) PyTorch MLP model 79.04 80.22 79.04 75.86
FGSM (CNN) CNN 1.62 66.17 1.62 0.39
FGSM (CNN) RNN 67.74 67.41 67.74 63.85
PGD (CNN) DT 33.93 62.27 33.93 25.95
PGD (CNN) XGBoost 41.34 66.08 41.34 35.87
PGD (CNN) RF 78.13 76.99 78.13 72.92
PGD (CNN) LR 75.03 74.43 75.03 69.52
PGD (CNN) PyTorch MLP model 76.83 78.58 76.83 71.28
PGD (CNN) CNN 5.95 72.32 5.95 1.42
PGD (CNN) RNN 74.38 72.28 74.38 70.45
DeepFool (CNN) DT 38.37 43.37 38.37 38.32
DeepFool (CNN) XGBoost 34.55 45.5 34.55 30.79
DeepFool (CNN) RF 50.23 54.07 50.23 46.09
DeepFool (CNN) LR 29.68 42.44 29.68 34.17
DeepFool (CNN) PyTorch MLP model 42.77 37.52 42.77 37.56
DeepFool (CNN) CNN 7.18 65.36 7.18 1.16
DeepFool (CNN) RNN 42.81 37.72 42.81 35.87
FGSM (RNN) DT 31.1 18 31.1 21.84
FGSM (RNN) XGBoost 39.23 26.58 39.23 28.88
FGSM (RNN) RF 54.31 67.48 54.31 52.78
FGSM (RNN) LR 26.58 50.95 26.58 18.84
FGSM (RNN) PyTorch MLP model 56.12 57.13 56.12 54.4
FGSM (RNN) CNN 7.51 65.78 7.51 1.6
FGSM (RNN) RNN 13.87 29.93 13.87 6.94
PGD (RNN) DT 31.6 19.14 31.6 22.14
PGD (RNN) XGBoost 42.01 35.37 42.01 32.4
PGD (RNN) RF 59.56 69.95 59.56 57.2
PGD (RNN) LR 17.71 40.06 17.71 13.03
PGD (RNN) PyTorch MLP model 55.92 57.18 55.92 54.38
PGD (RNN) CNN 5.95 66.68 5.95 1.29
PGD (RNN) RNN 11.53 5.26 11.53 6.75
DeepFool (RNN) DT 3.79 5.96 3.79 3.85
DeepFool (RNN) XGBoost 4.52 7.51 4.52 5.4
DeepFool (RNN) RF 5.37 16.45 5.37 6.75
DeepFool (RNN) LR 13.63 5.69 13.63 6.41
DeepFool (RNN) PyTorch MLP model 7.05 3.46 7.05 3.79
DeepFool (RNN) CNN 7.65 72.84 7.65 3.08
DeepFool (RNN) RNN 13.42 6.28 13.42 7.29
success. Attacks from the RNN itself, such as FGSM and PGD, are mostly ineffective, with F1-scores as low as 0.34%, likely due to overfitting or lack of transferability between similar architectures.
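When reading the precision, recall, and F1 columns, it helps to recall how multi-class averages are formed. In the CNN and RNN rows of Table 4 and throughout Table 5, the recall column equals the accuracy column, which is what support-weighted averaging produces by construction. The averaging scheme is not stated in this section, so treating these columns as support-weighted is an assumption; the sketch below only illustrates the identity on hypothetical labels.

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1 for multi-class labels."""
    n = len(y_true)
    support = Counter(y_true)                   # true samples per class
    predicted = Counter(y_pred)                 # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    precision = recall = f1 = 0.0
    for c, n_c in support.items():
        p_c = correct[c] / predicted[c] if predicted[c] else 0.0
        r_c = correct[c] / n_c
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = n_c / n                        # class share of the test set
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    accuracy = sum(correct.values()) / n
    return accuracy, precision, recall, f1

# Hypothetical imbalanced labels: six of class 0, three of class 1, one of class 2.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
y_pred = [1, 1, 1, 1, 1, 0, 1, 0, 0, 2]
acc, prec, rec, f1 = weighted_metrics(y_true, y_pred)
print(acc, prec, rec, f1)
```

Because each class recall is weighted by its share of the samples, the weighted recall sums to the fraction of correct predictions, i.e. the accuracy. Precision and F1 are free to differ, which is why a row can combine very low accuracy with high precision.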
Overall, as observed from Table 4, the most effective transfer attack is DeepFool (LR) executed on the CNN, where the CNN's performance completely collapses: accuracy drops to 0.14%. This demonstrates extremely high transferability from a simple linear model to a complex deep model, because CNNs are highly sensitive to finely crafted perturbations, especially those optimized against decision boundaries, as DeepFool's are. Similarly, C&W (LR) on CNN and PGD (LR) on CNN also show near-zero accuracy, confirming the CNN as the model most vulnerable to transfer attacks. This is attributed to CNNs' local feature dependencies and lack of input-space regularization, making them prone to spatially distributed adversarial noise. On the other hand, the least effective transfer attack (i.e., the one that fails to deceive the target model) is DT Attack (DT) on RF, where the Random Forest preserves 99.28% accuracy, indicating almost no impact. This highlights RF as the model most robust to transfer attacks, due to its ensemble nature and resistance to gradient-based perturbations (since it is non-differentiable and built on decision boundaries that do not shift smoothly with input changes).
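The argument that a tree ensemble offers no smooth gradient to follow can be illustrated with a toy majority vote over decision stumps. This is a sketch under stated assumptions: the three stumps, the thresholds, and the comparison linear score are hypothetical and far simpler than the Random Forest used in the study.

```python
def stump(feature, threshold):
    """Depth-1 tree: votes 1 when the chosen feature exceeds the threshold."""
    return lambda x: 1 if x[feature] > threshold else 0

def forest_predict(stumps, x):
    """Majority vote over the stumps."""
    votes = sum(s(x) for s in stumps)
    return 1 if 2 * votes > len(stumps) else 0

def finite_difference(f, x, i, h=1e-4):
    """Central-difference estimate of df/dx_i."""
    up, down = list(x), list(x)
    up[i] += h
    down[i] -= h
    return (f(up) - f(down)) / (2 * h)

# Hypothetical three-stump forest and a linear score for comparison.
stumps = [stump(0, 0.5), stump(1, -0.2), stump(0, 1.5)]
forest = lambda x: forest_predict(stumps, x)
linear = lambda x: 1.5 * x[0] - 2.0 * x[1]

x = [1.0, 0.3]
forest_grads = [finite_difference(forest, x, i) for i in range(2)]
linear_grads = [finite_difference(linear, x, i) for i in range(2)]
x_small = [0.7, 0.0]   # moved by 0.3 per feature, no threshold crossed
x_large = [0.4, 0.3]   # crosses the 0.5 threshold on feature 0
print(forest_grads, linear_grads, forest(x), forest(x_small), forest(x_large))
```

The forest output is piecewise constant, so its local gradient is zero and its prediction changes only when a perturbation happens to cross a split threshold. A perturbation computed from another model's gradient has no particular reason to do so.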
Regarding the CICIoT2023 dataset (see Table 5), the DT Attack has a devastating impact on the DT model itself, reducing its accuracy to just 0.02% and effectively breaking its predictive capability. The same applies to the CNN, whose accuracy falls to 8.42%. In contrast, the other models remain virtually unaffected. XGBoost (85.98%), RF (82.02%), LR (86.06%), the PyTorch MLP model (86.24%), and the RNN (85.84%) maintain high accuracy, highlighting the highly non-transferable nature of this attack.
Next, FGSM crafted from an LR model severely impacts DT (25.74% accuracy) and moderately affects other models. RF (60.89%) and the PyTorch MLP model (51.99%) show partial robustness, while the CNN (10.74%) and the RNN (28.84%) see substantial performance drops. Interestingly, the LR model that generated the attack drops to 17.85% accuracy, confirming FGSM's effectiveness on its own architecture. This attack demonstrates moderate transferability. Furthermore, with PGD crafted using LR, DT (25.85%) and LR (13.28%) are heavily compromised, while RF (65.26%) and the PyTorch MLP model (58.65%) exhibit stronger resilience. The CNN continues to perform poorly (10.90%), and the RNN shows a moderate drop to 34.95%. In addition, DeepFool targeting LR produces mixed results. While DT (28.61%) and LR (13.78%) are significantly affected, the PyTorch MLP model (86.18%) and the RNN (84.56%) remain largely intact, suggesting high robustness in neural architectures. The CNN's performance plummets to 7.25%, consistent with its overall vulnerability.
C&W crafted on LR is effective against its source model (12.90% accuracy) and moderately affects DT (24.03%) and XGBoost (37.02%). However, RF (68.21%), the PyTorch MLP model (85.00%), and the RNN (71.91%) resist the attack well. The CNN drops again to 7.34%, reinforcing its consistent weakness. Overall, this attack appears less transferable.
FGSM generated from the PyTorch MLP model leads to moderate degradation in DT (25.64%) and XGBoost (36.87%). LR and the RNN drop to 37.44% and 28.49%, respectively. The most surprising result is the low accuracy of the PyTorch MLP model itself (27.13%), indicating that FGSM remains highly effective on the source model but less so on others. The CNN's performance remains dismal (7.62%), consistent across attacks. PGD from the PyTorch MLP model causes widespread performance degradation, especially in shallow models. DT falls to 65.58% and XGBoost to 67.52%, while RF (78.59%) and LR (84.36%) fare better. Surprisingly, the PyTorch MLP model retains 84.43%, showing strong robustness to its own PGD attack. The CNN's accuracy drops sharply (2.90%), while the RNN holds at 77.20%. This attack shows moderate transferability. DeepFool targeting the PyTorch MLP model produces variable results. While DT (36.92%) and XGBoost (21.03%) show losses, the PyTorch MLP model (41.53%) and the RNN (38.40%) suffer less sharply. The CNN, however, again struggles (7.98%), as expected. LR and RF maintain middling scores (29.46% and 36.74%). C&W from the PyTorch MLP model yields minimal degradation, even against its own model: the PyTorch MLP model retains 86.57% accuracy. LR (87.06%), the RNN (86.08%), and XGBoost (86.18%) also hold strong. Even DT (86.74%) resists this attack, in stark contrast to the other cases. The CNN, in keeping with its trend, collapses to 7.32%. The C&W attack from the PyTorch MLP model thus shows very low transferability.
Furthermore, FGSM crafted from the CNN surprisingly causes notable drops in LR (71.61%) and DT (33.64%), despite being a simple attack. The CNN itself collapses (1.62%), confirming the attack's strength against its own