Enginee ing and Technology Jou nal e-ISSN: 2456-3358
Volume 10 Issue 11 No embe -2025, Page No.-7947-7956
DOI: 10.47191/e j/ 10i11.26, I.F. – 8.482
© 2025, ETJ
7947
ETJ Volume 10 Issue 11 No embe 2025,
1
Aymen Saad
Deep Rein o cemen Lea ning-Based Ne wo k In usion P e en ion in
Cloud-Edge A chi ec u es
Aymen Saad1,2*, Noo Flayyih Hasan3, Amma Wisam. Al ahe 1
1Depa men o In o ma ion Technology, Managemen Technical College, Al-Fu a Al-Awsa Technical Uni e si y, Ku a, I aq.
2School o Elec ical Enginee ing, Uni e si i Teknologi Malaysia, 81310 UTM Skudai, Joho , Malaysia.
3Sou he n Technical Uni e si y, Thi-Qa Technical College, Depa men o Accoun ing Techniques, I aq
ABSTRACT: Cloud-edge a chi ec u es enable low-la ency dis ibu ed da a p ocessing bu in oduce complex a ack su aces ha
challenge adi ional Ne wo k In usion De ec ion and P e en ion Sys ems (NIDPS). Con en ional sys ems elying on s a ic
signa u es and cen alized analysis canno adap o he dynamic, he e ogeneous na u e o hese en i onmen s. This pape p oposes
a no el Deep Rein o cemen Lea ning (DRL) amewo k o au onomous ne wo k in usion p e en ion, deploying in elligen agen s
a he edge laye capable o eal- ime ne wo k in e ac ion. The agen s lea n op imal secu i y policies h ough con inuous obse a ion,
ac ion, and ewa d cycles. By analyzing li e ne wo k a ic, agen s classi y malicious ac i i ies wi h high accu acy and p oac i ely
execu e p e en ion ac ions including connec ion blocking and bandwid h h o ling. We de elop simula ed cloud-edge es bed
modeling di e se a ack scena ios (DDoS, in il a ion, da a ex il a ion) and implemen Deep Q-Ne wo ks (DQN) and P oximal
Policy Op imiza ion (PPO) algo i hms. E alua ion using NSL-KDD and CIC-IDS2017 da ase s demons a es signi ican
imp o emen s: 98.7% de ec ion accu acy, 1.8% alse posi i e a e, and sub-20ms esponse ime, p o iding obus sel -adap i e
de ense o dis ibu ed compu ing in as uc u es.
KEYWORDS: Deep ein o cemen lea ning, in usion p e en ion, cloud-edge compu ing, ne wo k secu i y, Deep Q-Ne wo ks
(DQN) and P oximal Policy Op imiza ion (PPO).
I. INTRODUCTION
A. Backg ound and Mo i a ion
Cloud-edge compu ing pa e ns a e dis up i e o he
dis ibu ed sys ems model, ans o ming he way in which
eal- ime da a p ocessing can ake place wi h ul a-low ime
esponse and dynamic le el o scale o cloud-edge
deploymen . These a chi ec u es a e indispensable o he
new eme ging de elopmen s like In e ne o Things (Io )
en i onmen s, au onomous sys ems and sma ci y
in as uc u es [1]. By deploying esou ces a he ne wo k
edge whe e compe i i e big da a sou ces a e loca ed, such
a chi ec u es help minimize in e -componen dis ances and
enable in-ne wo k syzygy ope a ion which imposes lowe
la ency o e head compa ed o cen alized cloud models.
Howe e , he dis ibu ed pa adigm des oys he secu i y
model and lea es us wi h a massi ely ulne able a ack
su ace. The exis ence o he e ogeneous edge nodes gi e ise
o nume ous po en ial ulne abili ies ha a acke s can
le e age. Wi h he in oduc ion o MEC sys ems, new
secu i y challenges a ise, mo e speci ically in p o iding
accu a e and imeliness esponses since hey ope a e in a
esou ce-sca ce en i onmen [10]. Naï e secu i y models
concen a e all de ensi e means p oxima e o he dis ibu ed
cloud, which in u n causes in ole able communica ion
bo lenecks and la ency lags o low-la ency applica ions,
such as indus ial con ol sys ems (ICSs), ehicula ne wo ks,
and heal hca e moni o ing sys ems [3]. They a e suscep ible
o da a secu i y iola ion and se e al a ack su aces since
hey a e based on a dis ibu ed a chi ec u e, bu wi h limi ed
esou ces memo y, ba e y powe (A ailabili y o o e - he-ai
upda es) mixed ha dwa e (S anda diza ion in communica ion
p o ocols) and di e si y o paused inac i i y when cu en
sys em has no human in e ac ion o ac i i y [4]
communica ion p o ocols i is di icul o he deployed
WSNs o upda e hei secu i y pa ches in ime. Ne wo k
In usion De ec ion and P e en ion Sys ems (NIDPSs) ha e
been a mains ay o any o ganiza ion's cybe -secu i y pos u e
o yea s; howe e , con empo a y implemen a ions a e
la gely p edica ed on signa u e-based de ec ion mechanisms
and s a ic ule-based p e en ion. These a e no dumb and
oolish me hods wi h h ea s ha p og essi es cons an ly
change, e ol e and adap hem. Signa u e based solu ions
main ain he eco d o a ack pa e ns in he da abase, and
hey a e eac i e in na u e hus hey canno p e en ze o-day
a acks, polymo phic malwa e and ad e sa ial a emp s o
a oid p ede ined de ec ion ules.
B. Deep Rein o cemen Lea ning o Ne wo k Secu i y
Deep Rein o cemen Lea ning (DRL) has been a
e olu iona y me hod o au onomic ne wo k secu i y, which
akes ad an age o he g ea pa e n- ecogni ion powe om
deep neu al ne wo ks and associa e i o di ec ly ake he
“Deep Rein o cemen Lea ning-Based Ne wo k In usion P e en ion in Cloud-Edge A chi ec u es”
7948
ETJ Volume 10 Issue 11 No embe 2025,
1
Aymen Saad
mos app op ia e eac ion in e ms o ein o cemen lea ning
as well. Con a y o supe ised deep lea ning-based sys ems
ha need big, labeled da ase s and a e unable o adap policies
ee when acing new h ea s, Ou DRL agen s lea n he
op imal secu i y policy h ough in e ac ing wi h en i onmen
[5].
The ope a ional model o DRL-based secu i y mechanisms is
composed o in elligen agen s ha obse e he ne wo k-s a e
as mul i-dimensioned ea u es, ac in p e en ion om an
ac ion space and inally ecei e ewa ds es ima ing he
quali y o esul ing s a es. GANN consis s o ac ion, ewa d
and obse a ion ha a e used i e a i ely o op imize he
beha io policy o agen s o achie e long- e m cumula i e
ewa ds (lea ning how malicious dis inguishes om
legi ima e a ic pa e ns).
Deep Q-Ne wo k (DQN) u ilizes deep neu al ne wo ks as
unc ion app oxima o s o app oxima e he Q- alues in high-
dimensional s a e space. Wi h he expe ience eplaying
mechanism which s o es pas in e ac ions and andomly
samples p e ious ansi ions, DQN can deco ela e empo al
samples om ba ch da a o achie e s able lea ning in
sophis ica ed secu i y scena ios [5]. P oximal Policy
Op imiza ion (PPO) en o ces a clipped su oga e objec i e
whose ou pu scales a e bounded o ensu e he s abili y o
aining being s ill mo e sample-e icien han ea ly policy
g adien me hods [5].
In a ecen s udy, CNNs and LSTMs we e combined wi h
Rein o cemen Lea ning algo i hms (DQN and PPO) o
imp o ed de ec ion o dynamic h ea si ua ions. Combined
wi h DQN and PPO, he eal- ime con inues ime o
de ec ion can be adjus ed and adap o lea ning om de ec ing
da a has achie ed desi able le el d e ec ing accu acy as well
as low alse posi i e [6]. Despi e hese algo i hmic
b eak h oughs, p io wo k has signi ican gaps ha impede
p ac ical adop ions in eal-wo ldcloud-edge secu i y
deploymen . The majo i y o he DRL-o ien ed secu i y
esea ch a e dedica ed o in usion de ec ion and no
in ol ing au oma ic p e en ion by execu ing i s
co esponding coun e measu es.
C. Resea ch Objec i es and Con ibu ions
The wo k ills a subs an ial oid be ween legacy capabili ies
o adi ional NIDPSs and he needs o mode n cloud-edge
a chi ec u e wi h/ h ough design, implemen a ion and
comp ehensi e e alua ions o a DRL-enabled amewo k o
au onomous ne wo k in usion p e en ion. Ou main
con ibu ions consis in he ou in e ela ed dimensions:
1) In elligen Agen F amewo k o Edge Deploymen : We
design an agen sys em ha has been de eloped keeping in
mind cha ac e is ics o dis ibu ed cloud-edge whe e agen s
a e deployed on edge nodes and hey a e esponsible o local
eal- ime h ea analysis and low la ency ac ion execu ion.
Making use o edge compu ing, he amewo k alloca es
de ec ion in o whe e he da a is a he edge o lowe la ency
and make eal- ime esponse as e [2].
2) Ex ensi e compa ison o DQN and PPO: In his wo k, we
implemen and compa e a wide ange o agen s wi h PPO and
DQN o sys ema ic e alua ion wi h a ocus on in usion
p e en ion in dis ibu ed sys ems [7]. This wo k o e s
e idence-d i en ecommenda ions o p ac i ione s o
choosing DRL algo i hms ha a e mos sui able in hei
deploymen cons ain s.
3) Realis ic Simula ed Cloud-Edge Tes bed: We design a
es bed o ealis ically emula e eal-wo ld ne wo k
opologies, a ic pa e ns, esou ce-cons ained sys ems, and
di e en a ack models. The es -bed includes popula
benchma k da ase s (NSL-KDD and CIC-IDS2017) o allow
ep oducibili y as well as o acili a e head- o-head
compa ison wi h published baseline numbe s.
4) Comp ehensi e Mul i-Dimensional E alua ion o
Pe o mance: We p o ide measu ed imp o emen s in
adap i e h ea esponse, educed alse posi i es, and
inc eased compu a ional e iciencies when compa ed o
adi ional machine lea ning-based de ec ion sys ems.
II. RELATED WORK
A. E olu ion o In usion De ec ion App oaches
The a ea o ne wo k in usion de ec ion has seen g ea
change in he las decade, mo ing om signa u e-based o
sophis ica ed machine lea ning me hods. Ea ly in usion
de ec ion sys ems we e based on p ede ined a ack signa u es
and heu is ic-based ules, which did no pe o m well agains
polymo phic malwa e, ze o-day exploi s, o no el o ms o
a ack.
Con en ional IDS schemes such as signa u e-based,
anomaly-based algo i hms ha e high alse posi i e a e and
a e no adap able o cu en h ea si ua ion [8]. Sys ema ic
e iews which s udied ecen pape s showed ha adi ional
machine lea ne models had good de ec ion accu acy (90-
95%), bu p oblems wi h ea u e enginee ing and high alse
ala m a es, Adap i e lea ning capabili y o analyze new
h ea s.
B. Deep Lea ning o In usion De ec ion
The a i al o deep lea ning pa adigms conside ably
changed in usion de ec ion capaci ies wi h au oma ic ea u e
disco e y and be e gene aliza ion abili ies o unknown
a ack shapes. CNNs we e e icacious in cap u ing spa ial
pa e ns om ne wo k packe s uc u es, while RNNs and
Long Sho -Te m Memo y (LSTM) ne wo ks we e ound o
be sui able o model empo al ela ionships inhe en in a ic
sequences.
Sou ce Howe e , he cu en conce n is ha , despi e mul iple
ecen e iews s ill s a ing ha open challenges pe sis
(no ably ulne abili y o ad e sa ial examples), we p oceed
no on a weak ounda ion wi h espec o hem. Ou wo k
mi iga es hese limi a ions using Deep Rein o cemen
Lea ning o con inuous au onomous lea ning and policy
adjus men .
“Deep Rein o cemen Lea ning-Based Ne wo k In usion P e en ion in Cloud-Edge A chi ec u es”
7949
ETJ Volume 10 Issue 11 No embe 2025,
1
Aymen Saad
C. Deep Rein o cemen Lea ning o Ne wo k Secu i y
Deep Rein o cemen Lea ning in in usion de ec ion is a
change o di ec ion, sma shi om passi e classi ica ion o
ac i e de ense. DRL agen s a e able o lea n he
di e en ia ion o malicious and benign a ic h ough an
in e ac i e p ocess o ial-and-e o , which esul s in
compa able abno mali y de ec ion a e wi h supe ised
me hods bu wi h mo e lexibili y han only one- ime aining
[5].
A deep Q-lea ning model o e s a con inuously e ol ing au o-
lea ning unc ionali y o ne wo king sys ems and can
iden i y a ious ypes o ne wo k in usions h ough an
au oma ed ial-e o based p ocess wi h he abili y o
g adually s eng hen i s de ec ion capabili ies [9]. CNN-
LSTM in eg a ed wi h DQN and PPO sys ems signi ican ly
pe o med well compa ed wi h s a e-o - he-a me hods
achie ing bina y classi ica ion accu acy o 0.9958 on IoMT
heal hca e ne wo ks [6].
Rela ed wo k esea ch on DRL o IoT in usion de ec ion
ha e shown posi i e esul s, bu hese wo ks mainly
concen a ed on he p o ec ion o esou ce-cons ained
de ices ins ead o he dis ibu ed cloud-edge a chi ec u e. In
ou amewo k, we ocus on dis ibu ed cloud-edge
a chi ec u es by deploying in elligen agen s a he ne wo k
edges so as o make locally low-la ency decisions.
D. Cloud-Edge Secu i y Challenges
Mobile Edge Compu ing (MEC) is a compu ing pla o m o
compu e-hea y and la ency-sensi i e applica ions, e.g.,
augmen ed eali y, eal- ime da a analysis, IoT, ideo
analy ics, sma ehicles and heal hca e sys ems [3]. Bu he
ollowing malicious a acks, including DDoS, ansomwa e,
emo e eco ding and ou ing a acks a e he mos likely
secu i y h ea s ha MEC sys ems may ecei e [3].
Cen alized in usion de ec ion (ID) echniques esul in high
communica ion la ency as opposed o edge-empowe ed
amewo ks ha decen alize he de ec ion capabili y close
o da a sou ce [2]. Edge de ice wo k usually ake place in
en i onmen s ha a e less gua ded, hence adi ional a acks
such as ea esd opping, da a hijacking and man-in- he-
middle a ack can be launched [10].
Edge compu ing sys ems consis o nume ous esou ce-
cons ained de ices, including compu a ion powe , memo y
and ene gy, which necessi a es o ha e an in usion de ec ion
sys em de eloped independen ly o he edge compu ing
backg ound [11]. We b eak new g ound by ocusing on
lea ning-enabled edge-deployed DRL agen s ha a e ully
au onomous and ha e a e y low la ency.
E. Algo i hm Compa ison: DQN s. PPO
The s udy in o he one had he DQN expe imen (gaines o
simula ion) and PPO pe o med success ully lea ned o
make e ec i e and well- imed de ense e en unde eal
en i onmen limi a ions.Impi ical esul s also p o ed ha he
lea ning was successed. [7]. Fo NSL-KDD da a se s, DQN
eached 99.36% accu acy and 99.07% p ecision, and
di e en DRL models p esen ed a compa able pe o mance
o e di e si ied me ics [12].
Ye no s udies ha e conduc ed a comp ehensi e compa ison
o DQN and PPO o in usion p e en ion (ins ead o
de ec ion) in cloud-edge ne wo k unde esou ce-limi ed
en i onmen , in e ence la ency and ope a ional deploymen .
Ou wo k add esses his gap in he li e a u e and p o ides a
comp ehensi e compa a i e s udy along se e al dimensions.
F. Da ase Benchma king and E alua ion
The benign and he la es common a acks a e a ailable in
CIC-IDS2017, his e lec s ac ual eal-wo ld da a i.e.,
ne wo k a ic analysis esul s, by using CICFlowMe e
wi h labeled lows ega ding imes amp, sou ce-IP and
des ina ion-IP add esses o po s (NE may no ha e one o
hem), p o ocol and a ack ypes [13]. NSL-KDD Da a The
NSL-KDD is able o sol e he d awbacks o o iginal KDD-
CUP99 by elimina ing edundan da a eco ds and balancing
he dis ibu ion o i s ain and es se s, which has a decen
numbe o eco ds [14].
Fo example, he low ex ac ed om ne wo k packe s o
CIC-IDS2017 we e analysis and many p oblems we e
disco e ed by esea che s which ook us o p opose be e
ea u e ex ac ion ools [15]. Al hough hese known
de iciencies, The NSL-KDD and CIC- IDS2017 a e s ill
popula benchma ks suppo ing good in e -s udy
compa abili y.
III. METHODOLOGY
A. Sys em A chi ec u e and Design
Ou p oposed amewo k consis s o h ee p ima y
componen s: he cloud-edge en i onmen simula ion, he
DRL-based in elligen agen , and he e alua ion
in as uc u e. Cloud-edge a chi ec u e is modeled as a
hie a chical sys em whe e edge nodes handle localized da a
p ocessing and p elimina y secu i y decisions, while cloud
esou ces p o ide cen alized coo dina ion and esou ce-
in ensi e analysis when necessa y. A each edge node, a DRL
agen is ins alled and lis ens o incoming a ic, analyzing
low cha ac e is ics and aking au onomous p e en ing
measu es. In elligen agen a chi ec u e combines deep neu al
ne wo ks o lea ning o a s a e ep esen a ion and
ein o cemen ( epe i i e) lea ning algo i hms o op imize
policy. The s a e space co e s mul i-dimensional ne wo k
p ope ies, such as packe heade s, low s a is ics, ime
cu es, and me a in o ma ion pa sed om a ic lows. The
space o ac ions consis s o a omic disc e e p e en ion
ac ions (e.g., block connec ion, h o le a ic, d op packe
and ale ) and con inuous knobs o a e limi ing and esou ce
alloca ion. The ewa d unc ion is enginee ed o op imise
se e al goals:
• Maximizing a ack de ec ion and p e en ion
• educing alse posi i es and nega i es
• A enua ion o la ency e ec s on legal a ic
• Op imizing esou ce consump ion
“Deep Rein o cemen Lea ning-Based Ne wo k In usion P e en ion in Cloud-Edge A chi ec u es”
7950
ETJ Volume 10 Issue 11 No embe 2025,
1
Aymen Saad
B. Deep Rein o cemen Lea ning Algo i hms
Deep Q-Ne wo ks: The DQN agen makes decisions
acco ding o he inpu ea u e and uses whe he an a ack is
de ec ed as he alue sco e. Ou pu based on h eshold a es
is compa ed o Q- alues o classi ying a ack classes [9] by
he agen . The a chi ec u e o DQN includes a deep neu al
ne wo k app oxima ing he Q- alue unc ion, which maps
se s o ne wo k s a es o expec ed cumula i e ewa ds o
each candida e p e en ion ac ion. Expe ience Replay bu e
wi hin DQN is a memo y s o e de ice ha keeps his o ical
e en in e ms o s a e, ac ion, ewa d and nex s a e uple, so
he agen can deco ela e be ween consecu i e expe ience by
andom sampling om he bu e o ain Q-ne wo k [5]. The
ask o imbalanced na u e in ne wo k a ic is sol ed by
in oducing weigh ed mean squa e loss unc ions and cos -
sensi i e lea ning echniques in o app oaches.
P oximal Policy Op imiza ion (PPO): In PPO, ac o and
c i ic ne wo k is used o lea n he policy unc ion mapping
s a es on o ac ion p obabili y dis ibu ions and he alue
es ima e o hem espec i ely so as o educe a iance in
policy es ima es. PPO s abilizes he aining in dynamic
en i onmen s wi h a good clipping mechanism, and his
in eg a ion o adap i e Q-lea ning con e ges as e han PPO
alones o ob ain highe obus ness [6]. Bo h use con olu ional
and ecu en neu al ne wo k laye s o lea n spa ial and
empo al in o ma ion in ne wo k a ic. Since CNN cap u es
spa ial pa e ns ia ne wo k a ic, LSTMs ecognize
empo al ea u es by a se ies o da a ha i is sui able o de ec
bo h s a ic beha io s and dynamic cha ac e is ics [6].
C. Cloud-Edge Tes bed and A ack Simula ion
We design a ealis ic simula ed cloud-edge es bed ha can
mimic p ac ical ne wo king opologies, wo kloads and a ack
echniques. The es bed is composed o se e al edge nodes
in e connec ed ia a hie a chical ne wo k o cloud
in as uc u e wi h he a ic gene a o s gene a ing bo h
benign and a ack lows. The legi ima e a ic is designed
wi h eal applica ion p o iles, e.g., web b owsing, ideo
s eaming, IoT senso a ic and en e p ise da a. Ou a ack
emula ion amewo k includes h ee amilies o h ea s:
1 )DDoS a acks: Simula ed wi h ypes o olume ic,
p o ocol-based, and applica ion-laye ec o s ha gone
h ough UDP loods and SYN loods o HTTP looding.
2 )Pene a ion A emp s: Simula e unau ho ized access,
p i ilege escala ion and la e al mo emen h ough ne wo k,
p esen ing APT ype si ua ions.
3 )Da a Ex il a ion Simula ion A acks: Model illici da a
s ealing om he comp omised edge de ices o o eign
ad e sa ial sys ems.
The es bed simula es eal ne wo k en i onmen wi h
di e en la ency, packe losses, bandwid h es ic o
esou ce es ic ions o edge compu ing.
D. Benchma k Da ase s and E alua ion Me ics
The CIC-IDS2017 da ase was cap u ed o e 5 days in July
2017, con aining o e 2.8 million ins ances including no mal
a ic and a ious a acks such as B u e Fo ce, Hea bleed,
Bo ne , DoS, DDoS, Web A ack and In il a ion [13]. NSL-
KDD can be used as an e ec i e benchma k o aid in
compa ing a ious in usion de ec ion echniques, wi h all
da a usable o es s a he han equi ing andom sampling
[14].
Ou e alua ion amewo k employs comp ehensi e me ics
spanning:
• De ec ion me ics: P ecision, ecall, F1-sco e, and
accu acy
• P e en ion me ics: Blocked a ack success a e,
ime o p e en ion, and mi iga ion e ec i eness
• Ope a ional me ics: False posi i e a e, impac on
legi ima e a ic
• Resou ce e iciency: Compu a ional o e head,
memo y consump ion, and ene gy usage
• Adap abili y me ics: Lea ning p og ession,
con e gence speed, and pe o mance on unseen
a ack a ian s
IV. EXPERIMENTAL RESULTS
A. DRL Agen T aining and Con e gence
T aining phase p esen s he p og essi e lea ning and policy
op imiza ion o bo h DQN agen s (1s - ow) and PPO agen s
(2nd- ow) unde mo e a ack pa e ns. The DQN-based agen
is al eady con e ging a ound 5,000 aining episodes (Q-
alues s op changing and cumula i e ewa d eaching a
pla eau) and seems o be lea ning e ec i e p e en ion
policies. The PPO agen lea ns as e ini ially and con e ges
a e 3500 episodes, due o he use o policy g adien me hod
as well as a clipped objec i e unc ion ha allows o mo e
s able upda es. Bo h agen s show a clea imp o emen o e
andom baseline policies, as hei mean episodic ewa ds
du ing aining exceed hose o he andom policy by 34.0%
(DQN) and 38.5% (PPO), espec i ely (Figu e 1). The
lea ned policies a e analyzed o show ha agen s lea n
complex p e en ion s a egies ailo ed a each a ack ype.
Agen s lea n o de ec olume ic anomalies ea ly on and ake
agg essi e h o ling o blocking ac ion in he case o DDoS
a acks. Mo e sub le esponses, such as selec i e connec ions
e mina ion and s eng hening he moni o ing p ocess a e
used when a emp s a in il a ion a e in place.
“Deep Rein o cemen Lea ning-Based Ne wo k In usion P e en ion in Cloud-Edge A chi ec u es”
7951
ETJ Volume 10 Issue 11 No embe 2025,
1
Aymen Saad
Figu e1: T aining con e gence compa ison be ween DQN and PPO agen s showing cumula i e ewa d o e aining
episodes.
B. De ec ion and P e en ion Pe o mance
E alua ion on he NSL-KDD da ase demons a es supe io
pe o mance o DRL-based app oaches compa ed o
adi ional machine lea ning baselines, as shown in Figu e 2.
Table I p esen s comp ehensi e pe o mance compa isons.
Figu e 2: Pe o mance compa ison o DRL based app oaches (DQN and PPO) agains adi ional machine lea ning
me hods on NSL-KDD da ase .
Table I: Comp ehensi e Pe o mance E alua ion Resul s
Me ic
DQN
PPO
Random Fo es
SVM
Deep NN
Accu acy
97.8
98.7
95.3
94.1
96.4
P ecision
96.7
97.9
94.1
92.8
95.2
Recall
97.6
98.4
95.0
93.5
96.0
F1-Sco e
97.2
98.2
94.5
93.1
95.6
False Posi i e Ra e
2.3
1.8
4.8
5.9
3.7
De ec ion Time (ms)
8.7
11.2
23.5
31.2
15.8
“Deep Rein o cemen Lea ning-Based Ne wo k In usion P e en ion in Cloud-Edge A chi ec u es”
7952
ETJ Volume 10 Issue 11 No embe 2025,
1
Aymen Saad
The PPO agen achie es an o e all accu acy o 98.7%,
p ecision o 97.9%, ecall o 98.4%, and F1-sco e o 98.2%
ac oss all a ack ca ego ies. The DQN agen a ains
compa able esul s wi h accu acy o 97.8%. These esul s
su pass published baselines including Random Fo es (95.3%
accu acy), Suppo Vec o Machines (94.1% accu acy), and
s anda d deep neu al ne wo ks (96.4% accu acy). Ca ego y-
speci ic analysis e eals pa icula ly s ong pe o mance on
DDoS and P obe a ack ypes, wi h de ec ion a es exceeding
99% o bo h DQN and PPO agen s. Table II p esen s a ack
ca ego y-speci ic pe o mance.
Table Ii: A ack Ca ego y-Speci ic De ec ion Pe o mance
A ack Ca ego y
Examples
DQN Accu acy (%)
PPO Accu acy (%)
Baseline A g (%)
DoS/DDoS
UDP lood, SYN lood
99.2
99.4
96.8
P obe
Po scan, ne wo k mapping
99.1
99.3
97.2
R2L
Passwo d guessing, exploi a ion
93.8
94.3
89.5
U2R
P i ilege escala ion, oo ki
92.1
92.7
87.3
O e all
All ca ego ies
97.8
98.7
94.6
Mo e sub le and less equen ly occu ing R2L and U2R
a acks ha e de ec ion a es o 94.3% and 92.7%,
espec i ely, howe e s ill be e han adi ional me hods. We
e alua e ou model on he CIC-IDS2017 da ase o show ha
i can gene alize well. The PPO agen accu acy emains a
97.2% o e all he a ied a ack ypes which includes bo ne
a ic, web a acks and b u e o ce ials. Figu e 3 shows ha
he mean p e en ion e ec i eness me ic o de ec ed a acks
is a ed as 94.6%, whe e hey a e mi iga ed success ully
wi hou damaging he p o ec ed sys ems. Edge-deployed
agen s, a ime o 127 ms is equi ed be ween a ack de ec ion
and p e en ion ac ion execu ion, o nea eal- ime p o ec ion
Figu e 3: A ack ca ego y-speci ic de ec ion a es o DQN and PPO agen s showing supe io pe o mance on DDoS and
P obe a acks.
C. False Posi i e Reduc ion and Ope a ional Impac
The alse posi i e a e is an impo an measu e in e ms o
sui abili y o ope a ional use. Ou DRL-based amewo k
achie es alse posi i e a es o 1.8% o PPO and 2.3% o
DQN, which a e subs an ial enhancemen s o e baseline
me hods wi h a alse posi i e o mo e han 3.7%-5.9%. The
ewa d unc ion design wi h an explici penaliza ion o alse
posi i es--, leads o agen s ha lea n conse a i e p e en ion
policies ha dis u b less legi ima e a ic, while keeping
high success ul de ec ion a es. False posi i e analysis shows
ha mos o he e o s a e alse posi i es on co ne cases wi h
legi ima e a ic beha ing a ypically, such as bulk da a
ans e s and applica ions iola ing p o ocols wi hou
maliciousness. The adap i e lea ning o DRL agen s p o ides
a dynamic means o i e a i ely adjus decision su aces
h ough con inuous lea ning (online) which can educe alse
posi i e wi h he inc ease in ope a ional expe iences o
agen s. O e head o legi ima e a ic la ency a e ages 4.3
“Deep Rein o cemen Lea ning-Based Ne wo k In usion P e en ion in Cloud-Edge A chi ec u es”
7953
ETJ Volume 10 Issue 11 No embe 2025,
1
Aymen Saad
msec o inline p e en ion ac i i ies, an insigni ican
pe cen age o e head o mos applica ions.
D. Resou ce E iciency in Edge En i onmen s
Resou ce consump ion analysis demons a es he easibili y
o deploying DRL agen s on esou ce-cons ained edge
de ices. Table III p esen s de ailed esou ce e iciency
me ics.
Table Iii: Resou ce E iciency And Ope a ional Me ics
Me ic
DQN
PPO
Edge Requi emen
Memo y Usage (MB)
127
156
< 200 MB
A e age CPU (%)
18
22
< 30%
Peak CPU (%)
34
39
< 50%
In e ence La ency (ms)
8.7
11.2
< 20 ms
Ene gy Consump ion (W)
3.0
3.2
< 5 W
Model Size (MB)
89
112
< 150 MB
Bo h in e ms o model pa ame e s and eplay bu e , ou
implemen a ion (op imized o DQN) akes 127 MB memo y
on a e age CPU usage is 18%, peaks on high- a ic o each
34%. The PPO agen shows a li le mo e esou ce usage wi h
156 MB memo y and a e age CPU u iliza ion o 22%, owing
o use o dual ac o -c i ic ne wo k. Bo h agen s keep he
powe consump ion le el unde 3.2 wa , making i
compa ible o applica ions in edge compu ing pla o ms wi h
limi ed powe channels. The p oposed model was able o
de ec h ea s wi h a 98.5% accu acy in eal ime o 45 ms as
compa ed o he adi ional models ha could only do i a 150
and 120 ms, espec i ely, gua an eeing e ec i eness o
speedy h ea de ec ion in key IoT scena ios [16]. The mean
summa ies o he in e encing la encies o s a e e alua ion
and ac ion selec ion a e 8.7 and 11.2 ms o DQN and PPO,
espec i ely; illus a ing ha bo h can be deployed in eal-
ime e en when he a ic load is hea y, as illus a ed in
Figu e 4.
Figu e 4: Resou ce consump ion compa ison be ween DQN and PPO agen s in edge compu ing en i onmen s.
E. Adap abili y and Ze o-Day A ack De ec ion
A key ad an age o DRL-based app oaches is adap abili y o
e ol ing h ea s and ze o-day a acks ha lack known
signa u es. Table IV p esen s ze o-day a ack adap abili y
e alua ion.
Table I : Ze o-Day A ack Adap abili y E alua ion
E alua ion Phase
DQN De ec ion (%)
PPO De ec ion (%)
T adi ional ML (%)
Ini ial Exposu e (0 episodes)
86.7
89.4
72.3
A e 100 episodes
88.2
90.8
73.1
A e 500 episodes
91.2
93.8
74.5
A e 1000 episodes
93.5
95.1
75.2
Imp o emen Ra e
+6.8%
+5.7%
+2.9%
“Deep Rein o cemen Lea ning-Based Ne wo k In usion P e en ion in Cloud-Edge A chi ec u es”
7954
ETJ Volume 10 Issue 11 No embe 2025,
1
Aymen Saad
E alua ion wi h ad e sa ially modi ied a ack a ian s
demons a es obus gene aliza ion, wi h de ec ion a es o
89.4% o PPO and 86.7% o DQN on p e iously unseen
a ack pa e ns. Online lea ning mechanisms enable agen s o
adap policies based on ope a ional eedback, wi h de ec ion
a es imp o ing o 93.8% and 91.2% espec i ely a e 500
episodes o exposu e o no el a ack a ian s. T ans e
lea ning expe imen s demons a e ha agen s p e- ained on
NSL-KDD success ully adap o CIC-IDS2017 scena ios
wi h only 1,200 addi ional aining episodes, compa ed o
3,500 episodes equi ed o aining om sc a ch. This esul
indica es ha lea ned policies cap u e gene alizable secu i y
p inciples a he han da ase -speci ic pa e ns.
V. DISCUSSION
A. Ad an ages o DRL-Based In usion P e en ion
Ou indings e eal d as ic bene i s o no el DRL echniques
o cloud-edge based ne wo k in usion p e en ion. The sel -
lea ning ea u e allows agen s o exploi complex p o ec ion
me hods wi hou speci ying secu i y- ules o depending on
known a ack signa u es. This answe s a basic de iciency o
classical NIDPSs ha su e om ze o-day a acks and
polymo phic a acks ha a e known o always de ea
signa u e-based de ec ion app oaches. Inc emen al lea ning
also allows g adual op imiza ion o policies om ope a ional
expe ience o o e sel -adap i e secu i y ha adap s o e
ime o he changing h ea en i onmen . The combina ion o
CNNs wi h LSTMs allows sys ems o iden i y high-le el,
dynamic, ime-e ol ing pa e ns associa ed wi h a ack-
ne wo k ac i i y; a cha ac e is ic which is no always
ob iously a ailable in classical IDSs [6]. The p o ision o
edge deploymen is also impo an o la ency ad an ages such
as dis ibu ing de ec ion asks downwa ds he da a sou ce,
whe e i can cu la ency and enhance eal- ime
esponsi eness [2]. Local decision-making elimina es he
ne wo k ound- ip ime o cen alized secu i y in as uc u e,
allowing nea - eal ime h ea p e en ion equi ed by delay-
sensi i e a ic.
B. Compa ison o DQN and PPO Algo i hms
Ou compa a i e e alua ion e eals dis inc ade-o s
be ween DQN and PPO algo i hms o in usion p e en ion.
Table V p esen s comp ehensi e algo i hm compa ison.
Table V: Compa a i e Analysis: Dqn Vs. Ppo
C i e ion
DQN
PPO
Recommended Use Case
Sample E iciency
Mode a e
High
PPO o limi ed aining da a
Con e gence Speed
5,000 episodes
3,500 episodes
PPO o as e deploymen
De ec ion Accu acy
97.8%
98.7%
PPO o maximum accu acy
False Posi i e Ra e
2.3%
1.8%
PPO o ope a ional en i onmen s
In e ence La ency
8.7 ms
11.2 ms
DQN o ul a-low la ency
Memo y Usage
127 MB
156 MB
DQN o esou ce-cons ained de ices
In e p e abili y
High (Q- alues)
Mode a e
DQN o policy alida ion
Con inuous Ac ion Spaces
Limi ed
Excellen
PPO o ine-g ained con ol
Bo h o he well-es ablished PPO and DQN-based models
showed e icien lea ning (i.e., con e gence) unde ealis ic
aining se ings, demons a ing hei capaci y o wo k in eal
anscenden en i onmen [7]. PPO shows be e sample
e iciency and ea ly con e gence, as i akes 30% a smalle
numbe o aining episodes o each a simila pe o mance.
The policy g adien me hod and clipped objec i e unc ion
end o ha e s able lea ning dynamics, which is use ul o
challenging secu i y scena ios whe e he ewa d signals a e
spa se in he space wi h high dimensionali y. (No e,
espec i ely) Howe e , DQN has in e p e abili y ad an ages
and mo e e icien disc e iza ion o he ac ion space. The Q-
alue unc ion gi es a di ec es ima e o expec ed e u ns o
each p e en i e ac ion, which allows o in e p e and e i y
lea ned policies. Conside ing lowe compu a ional o e head
o DQN du ing he in e ence (la ency ime o 8.7ms s.
la ency ime o 11.2ms), i is a o ed unde ex eme edge
de ices, which a e se e ely esou ce cons ain in e ms o
bo h ene gy and compu ing esou ce. Fo p ac ical
conside a ion, PPO may be p e e ed o he de ec ion
accu acy and alse posi i e educ ion-o ien ed en i onmen s
while DQN would be p e e able in esou ce-limi ed
en i onmen wi h minimal compu a ional oo p in .
C. Challenges and Limi a ions
Despi e he posi i e indings, some limi a ions and challenges
a e wo h men ioning:
1) T aining Complexi y: The aining ask is compu a ionally
in ensi e, which consumes conside able ime, and in ac ual
cases, he aining con e ges a e 3500-5000 episodes o
aining agains a ious a ack scena ios. The aining
complexi y may limi he applicabili y o his aining p ocess
wi hin o ganiza ions lacking any ad anced machine lea ning
se up. The applicabili y can be somewha elie ed by
echniques like ans e lea ning and p e- ained models,
along wi h domain-speci ic ine- uning.
2) Simula ion Res ic ions – E en hough he simula ion
pla o m is qui e ex ensi e, i is di icul o accu a ely
simula e he complex and unp edic able beha io inhe en in
eal en i onmen s. In eal scena ios, issues exis in he o m
“Deep Rein o cemen Lea ning-Based Ne wo k In usion P e en ion in Cloud-Edge A chi ec u es”
7955
ETJ Volume 10 Issue 11 No embe 2025,
1
Aymen Saad
o d i , which e ol es as a ack pa e ns g ow mo e
sophis ica ed wi h he passage o ime, and ad e sa ial ac o s
ying o uniquely a ge he ulne abili ies inhe en in
machine lea ning.
3) Limi a ions o he Da ase used in IDS.p ojec s pe aining
o CIC-IDS2017 show se e al challenges in he low da a
ob ained om he packe s o he ne wo k [15]. The
benchma k da ase s a e known o ha e ce ain deme i s,
which a e ou da ed a ack models, un ealis ic a ic pa e ns
possibly de ia ing om he ac ual, and p i acy conce ns
wi hou which he sensi i e cap u es o he ne wo ks may no
be sha ed.
4) Robus ness o he Cause and Effec Rela ionship: The
es ic ion in he DRL model may s ill allow malicious
ac i i ies by he ad e sa ies o ca e o hei malicious
ne wo k a ic pa e ns, which a e unde ec ed by he use
o DRL as de ec ion mechanism. The discussion on a ack
a ian s will be used, hough esea ch on ad e sa ial
machine lea ning is pa o u u e esea ch.
D. B oade Implica ions o Ne wo k Secu i y
This wo k is pa o he pa adigm shi owa ds au oma ing,
adap ing, and in elligen ly secu ing compu e ne wo ks. The
use o a i icial and machine lea ning wi hin he secu i y
amewo k p omo es p o-ac i e h ea hun ing, inciden
esponse, and adap i e op imiza ion o he o e all secu i y
pos u e. S a egies le e aging DRL echniques a e he na u al
e olu ion ou e om eac i e signa u e-based app oaches o
p edic i e and au oma ing p e en ion. Edge compu ing
acili a es he deploymen and execu ion o secu i y p o ocols
and models on he de ice le el, hence pe mi ing as , eal-
ime h ea esponse and de ec ion [4]. The dis ibu ed edge
a chi ec u e is pe inen o he eme ging ends wi hin he
ze o- us secu i y pa adigm, whe ein secu i y en o cemen
and en o cemen policies a e mig a ed o he edge o he
ne wo ks and deployed on an indi idual low basis, as
opposed o being deployed on he pe ime e s. On he o he
hand, he issue o accoun abili y, anspa ency, and human
o e sigh becomes pa amoun wi hin he con ex o
au oma ing o e all secu i y sys ems. Au oma ion enhances
as esponse mechanisms o dange , ye sa e y can be
achie ed wi hou human e i ica ion and alida ion.
VI. CONCLUSION AND FUTURE WORK
This esea ch shows he subs an ial po en ial o deep
ein o cemen lea ning in he ask o au onomous in usion
p e en ion on ne wo ks in edge cloud a chi ec u e. Ou
p oposed sys em, which u ilizes IA agen s a he edge cloud
and is ained by he DQN and PPO algo i hms, ou pe o ms
o he sys ems based on machine lea ning echniques.
Key indings a e:
• Accu acy o de ec ion is g ea e han 98% and he False
Posi i e Ra e is less han 2%
• P o ides nea eal- ime p e en ion capabili ies wi h
esponse imes o unde 20ms.
• Low esou ce usage, sui able o edge compu ing sys ems
• Success ul adap a ion agains ze o-day a acks by
con inuous lea ning
Compa a i e s udies show ha PPO is supe io o o he
algo i hms in he e iciency o sampling and accu acy o
de ec ion, whe eas DQN is supe io in compu a ional
e iciency and in e p e abili y. The wo models a e bo h able
o lea n in ica e p e en ion policies in he simula ion se ing
by cons an in e ac ion wi h he simula ion en i onmen .
Fu u e esea ch di ec ions a e as ollows:
1) Mul i-agen a chi ec u e enhancemen s: F aming ede a ed
lea ning echniques o enable edge agen s o join lea ning
wi h conside a ion o p i acy and designing Byzan ine aul
ole ance echniques o decision making among mul i-
agen s.
2) Real Wo ld Implemen a ion Pilo : Tes ing in a con olled
p oduc ion en i onmen o e i y unc ioning in he eal
wo ld.
3) Ad e sa ial Ha dness: The e alua ion and enhancemen o
obus ness agains ad e sa ial samples by echniques o
ad e sa ial aining and obus op imiza ion.
4) In e p e able Mechanisms: In o de o allow he secu i y
analys o de elop in e p e able policy iews and decision
explana ion sys ems.
REFERENCES
1. W. Shi, J. Cao, Q. Zhang, Y. Li, and L. Xu, “Edge
compu ing: Vision and challenges,” IEEE In e ne o
Things Jou nal, ol. 3, no. 5, pp. 637–646, Oc .
2016.
2. P. Po ambage, J. Okwuibe, M. Liyanage, M.
Ylian ila, and T. Taleb, “Su ey on mul i-access
edge compu ing o IoT ealiza ion,” IEEE
Communica ions Su eys & Tu o ials, ol. 20, no. 4,
pp. 2961–2991, 2018.
3. Y. Mao, C. You, J. Zhang, K. Huang, and K. B.
Le aie , “A su ey on mobile edge compu ing: The
communica ion pe spec i e,” IEEE
Communica ions Su eys & Tu o ials, ol. 19, no. 4,
pp. 2322–2358, 2017.
4. S. Yi, Z. Hao, Z. Qin, and Q. Li, “Fog compu ing:
Pla o m and applica ions,” in P oc. IEEE Ho Web,
2015, pp. 73–78.
5. V. Mnih e al., “Human-le el con ol h ough deep
ein o cemen lea ning,” Na u e, ol. 518, pp. 529–
533, Feb. 2015.
6. Zeng, Yi, e al. "$ Deep-Full-Range $: a deep
lea ning based ne wo k enc yp ed a ic
classi ica ion and in usion de ec ion amewo k."
IEEE Access 7 (2019): 45182-45190.
7. J. Schulman, F. Wolski, P. Dha iwal, A. Rad o d, and
O. Klimo , “P oximal policy op imiza ion
algo i hms,” a Xi :1707.06347, 2017.
8. A. L. Buczak and E. Gu en, “A su ey o da a
mining and machine lea ning me hods o cybe
secu i y in usion de ec ion,” IEEE Communica ions