Received 1 August 2025, accepted 28 August 2025, date of publication 4 September 2025, date of current version 11 September 2025.
Digital Object Identifier 10.1109/ACCESS.2025.3606053
A Reinforcement Learning-Based Intelligent Duty Cycle MAC Protocol for Internet of Things
SHAH ABDUL LATIF¹, MICHEAL DRIEBERG¹, (Member, IEEE), SOHAIL SARANG², (Senior Member, IEEE), AZRINA ABD AZIZ¹, (Senior Member, IEEE), RIZWAN AHMAD³, (Member, IEEE), AND GORAN M. STOJANOVIĆ², (Member, IEEE)
¹Department of Electrical and Electronic Engineering, Universiti Teknologi PETRONAS, Seri Iskandar, Perak 32610, Malaysia
²Faculty of Technical Sciences, University of Novi Sad, 21000 Novi Sad, Serbia
³School of Electrical Engineering and Computer Science, National University of Sciences and Technology (NUST), Islamabad 44000, Pakistan
Corresponding author: Shah Abdul Latif ([email protected])
This work was jointly supported by the graduate assistantship program at Universiti Teknologi PETRONAS and the GRETA project. GRETA has received funding from the European Union’s Horizon Europe EIC 2023 Pathfinder Challenge Programme Grant 101161032.
ABSTRACT Internet of Things (IoT) applications enabled by Wireless Sensor Networks (WSNs) face an energy efficiency challenge due to the limited battery capacity of the sensor nodes. Hence, the network’s performance often involves a tradeoff with network lifetime. Traditional medium access control (MAC) protocols are less adaptable to dynamic network conditions. While existing reinforcement learning (RL) based MACs are more adaptable, they still encounter challenges such as complexity and dimensionality. Therefore, this work aims to develop an RL based intelligent Duty cycle MAC (RiD-MAC) protocol that incorporates suitable network information to balance complexity and performance effectively. The proposed RiD-MAC protocol is based on the Q-learning algorithm, designed with remaining energy as the state space and duty cycle as the action space. The reward is then formulated based on energy consumption and throughput. It is implemented on the OMNeT++ platform-based Castalia simulator, and its performance is compared with three state-of-the-art protocols, AQSen-MAC, rlDC-MAC and QX-MAC, under three simulation scenarios: stationary nodes with periodic traffic, hybrid traffic and node mobility. The simulation results demonstrate that the RiD-MAC protocol significantly improves energy efficiency, with a reduction in receiver energy consumption of up to 21%, and in receiver energy consumption per bit of up to 26%, when compared to state-of-the-art protocols.
INDEX TERMS Machine learning, reinforcement learning, MAC protocol, intelligent duty cycle, Internet of Things.
NOMENCLATURE
ACRONYMS
AQSen-MAC   Energy-efficient Asynchronous QoS MAC.
CCA         Clear Channel Assessment.
CI          Confidence Interval.
CSMA        Carrier Sense Multiple Access.
DC          Duty Cycle.
IoT         Internet of Things.
MAC         Medium Access Control.
ML          Machine Learning.
QX-MAC      Q-learning-based X-MAC.
RiD-MAC     RL-based intelligent Duty cycle MAC.
RL          Reinforcement Learning.
rlDC-MAC    Reinforcement Learning based Duty Cycle MAC.
SIFS        Short Interframe Space.
WSN         Wireless Sensor Network.

The associate editor coordinating the review of this manuscript and approving it for publication was Hosam El-Ocla.
SYMBOLS
A        Action Space.
D        End-to-end Packet Delay [s].
E        Receiver Energy Consumption per bit [J/bit].
EC       Energy Consumed [J].
R        Reward.
S        State Space.
T        Throughput [bps].
Tsleep   Sleep Period [s].
α        Learning Rate.
γ        Discount Factor.

2025 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
VOLUME 13, 2025
S. A. Latif et al.: RL-Based Intelligent Duty Cycle MAC Protocol for IoT
I. INTRODUCTION
The Internet of Things (IoT) forms a global infrastructure that connects living and non-living things such as humans, animals, electronic devices, vehicles, buildings and others [1], [2]. In recent years, tremendous growth has been observed in IoT application areas including smart cities, agriculture, health, military, security, industrial automation, environmental monitoring and others [1], [3]. Furthermore, the number of connected IoT devices is predicted to reach approximately 29 billion globally by 2030 [1]. Wireless Sensor Networks (WSNs) are vital building blocks of IoT. A typical architecture of WSNs-enabled IoT is illustrated in Figure 1. WSNs can collect information from the surrounding environment, process it and communicate it wirelessly [4]. A WSN is an infrastructure-less wireless network that comprises small sensor nodes. Each sensor node comprises a sensing element, a processing element, a communication element and a battery. The challenge is that WSN nodes are solely powered from small non-rechargeable batteries with limited capacity [5]. The batteries can deplete within a brief period, which limits the lifetime of the sensor node. Additionally, sensor nodes are typically deployed in environments where battery replacements are costly and difficult [6]. Therefore, energy efficiency is crucial to prolong the functions of the sensor nodes.
In WSNs, the sensor nodes share the wireless medium to communicate and exchange data with each other. Only one sensor node can transmit data successfully through the medium at a time. Conversely, a collision occurs if two or more sensor nodes attempt to transmit data over the shared medium simultaneously. The medium access control (MAC) protocol is employed to handle the access to the shared wireless medium during the transmitting and receiving of data packets [7]. It aims to minimize data collisions and retransmissions using time division, frequency division and carrier sense techniques. Additionally, many MAC protocols use collision avoidance mechanisms such as request to send (RTS) and clear to send (CTS) exchanges and backoff mechanisms [8].
Most of the energy consumption is contributed by transmission, reception and sensing operations [9]. Therefore, it is essential to focus on energy efficiency at the MAC protocol to prolong network lifetime. Duty cycle (DC) techniques are employed to manage the active period and the sleep period to achieve energy efficiency [10]. Due to this, there is a trade-off between maximizing network performance and extending network lifetime. The MAC protocol enables the sensor nodes to adjust their operations and use available energy efficiently to improve network performance while balancing network lifetime [11]. However, these sensor nodes operate in dynamic network conditions that experience regular variations in factors such as the number of nodes, topology, traffic load and other parameters [12]. Therefore, adaptive mechanisms are necessary for the MAC protocol to further enhance the network performance and extend its lifetime.

FIGURE 1. Architecture of WSNs-enabled IoT.
Machine learning (ML) offers adaptive techniques to manage dynamic network conditions [13], [14]. ML algorithms build models based on training experience and can adapt to the frequent changes in the dynamic network conditions. Supervised learning requires a large set of labelled data to predict the output accurately, while unsupervised learning can identify hidden patterns in unlabeled datasets [15]. On the other hand, reinforcement learning (RL) does not require prior data, but it is dependent on continuous interaction with the environment and feedback to make decisions [16]. An RL agent learns through trial and error based on its exploration and feedback from its environment [17]. Moreover, RL is simple to implement and less complex than supervised and unsupervised methods [18]. For these reasons, RL is often preferred and employed in WSNs.
Q-learning is a well-known RL method that does not require prior knowledge of the environment. A Q-learning agent interacts with the environment, performs an action and receives a positive or negative reward as feedback, depending on the action’s impact on the environment [19]. Compared to other RL algorithms, Q-learning achieves fast convergence under given conditions. Additionally, it conserves computational resources due to its low power consumption. Furthermore, the Q-learning algorithm’s structure is simple to implement [20]. Therefore, the Q-learning algorithm is chosen for this study.
A. MOTIVATION
Prior to this, several types of MAC protocols have been developed, from traditional MAC protocols [21], [22], [23], [24], [25], [26] to ML-based MAC protocols [27], [28], [29], [30], [31], [32], [33], [34], [35]. AQSen-MAC [25] employs a fixed duty cycle calculation formula, which limits its performance under dynamic network conditions. Additionally, the nodes start with an initial energy of only 75% instead of full capacity, which is not realistic. rlDC-MAC [34] uses RL but considers three parameters in the state space and two parameters in the action space, which leads to exponential growth in the Q-value table dimensions. This increases complexity and resource usage, and may also cause frequent changes in the state, leading to delayed convergence. Additionally, it employs a complex reward function that makes the tuning of weight factors difficult. QX-MAC [30] chooses only queue length as the state space. Although simple to implement, this choice may cause faster energy depletion because it does not account for the energy cost of its actions. Additionally, a binary reward function has been employed, which may reduce adaptability to dynamic network conditions and limit the ability to capture critical network information. Therefore, there is a need for a protocol that is adaptable to dynamic network conditions, has moderate complexity and employs appropriate network parameters for RL algorithm design and learning formulation.
Thus, the focus of this work is the development of an RL-based intelligent Duty cycle MAC (RiD-MAC) protocol for WSNs-enabled IoT. The proposed RiD-MAC protocol uses Q-learning to adjust the duty cycle of the receiver node, with its remaining energy as the state space and duty cycle as the action space. The protocol aims to balance complexity and performance effectively by employing appropriate and sufficient network information. The RiD-MAC protocol is evaluated under dynamic network conditions in three simulation scenarios, namely stationary nodes with periodic traffic, hybrid traffic and node mobility, thereby providing realistic adaptation of duty cycle adjustment.
B. MAIN CONTRIBUTIONS
The main contributions of this research work are as follows:
• The proposed Q-learning based intelligent duty cycle MAC protocol adjusts the sleep period of the receiver node with respect to remaining energy.
• The receiver node wakes up periodically to receive data packets from the intended senders. The receiver node remains active for longer periods of time when it has high energy and sleeps more when it has low energy.
• The Q-learning algorithm is designed with remaining energy as the state space and duty cycle as the action space. The reward is formulated based on energy consumption and throughput.
• The proposed RiD-MAC protocol employs epsilon-greedy (ε-greedy) policy to address the exploration-exploitation dilemma in RL.
• The proposed RiD-MAC protocol is implemented and evaluated at the packet level on OMNeT++ platform-based Castalia simulator.
• The performance of the proposed RiD-MAC protocol is compared with state-of-the-art protocols, including AQSen-MAC [25], QX-MAC [30] and rlDC-MAC [34] under dynamic network conditions.
• The RiD-MAC protocol substantially reduces the receiver energy consumption by up to 21%, and receiver energy consumption per bit by up to 26%, when compared to state-of-the-art protocols.
C. ORGANIZATION
The rest of this paper is organized as follows: Section II presents the literature review. Section III describes the machine learning techniques for WSNs-enabled IoT. Section IV outlines the RiD-MAC protocol design. Section V presents the results and discussion, and Section VI concludes the paper and provides the future work.
II. LITERATURE REVIEW
The proposed MAC protocol in [21] adjusts its duty cycle based on traffic load by creating clusters. The protocol aims to reduce energy consumption while improving latency and packet collisions. However, border nodes can follow more than one schedule, which increases the number of virtual clusters. Hence, energy consumption increases as more nodes remain active due to the large number of virtual clusters. The MAC protocol in [22] overcomes this issue by employing predefined clusters. The duty cycle is adjusted with respect to the clusters to vary their sleep schedules. However, the protocol forces nodes to follow a fixed duty cycle for each cluster, which may result in less adaptation to dynamic network conditions.
The hybrid protocol in [23] adjusts duty cycle based on residual energy and traffic load. It combines synchronous and asynchronous mechanisms, using the asynchronous mechanism to adjust the duty cycle and the synchronous mechanism to share the duty cycle with neighboring nodes. Results show that the protocol improves network lifetime, packet delivery ratio and delay. However, the protocol increases network complexity due to its hybrid mode. Additionally, it increases synchronization overhead. Furthermore, the protocol does not account for adaptation to dynamic network conditions.
The MAC protocol proposed in [24] aims to decrease energy consumption and delay. The proposed protocol considers traffic load to adjust the duty cycle of the receiver node and the contention window of the sender node. The protocol reduces sleep time during high traffic and reduces wake-up time during low traffic. Early acknowledgment is used to broadcast the information of duty cycle ratio and contention window to decrease energy consumption and delay. However, the protocol has increased complexity due to combining the adjustment of duty cycle and contention window size. Additionally, it does not consider changes in dynamic network conditions.
The authors in [25] proposed the energy-efficient asynchronous QoS (AQSen) MAC protocol. The AQSen-MAC achieves good energy efficiency and network performance. The duty cycle of the receiver node is adjusted based on its remaining energy to improve energy efficiency. The protocol reduces the receiver energy consumption significantly. However, the results are obtained by setting the initial energy to only 75% instead of full capacity. Additionally, duty cycle adjustment is done by using a fixed formula which does not consider the dynamic aspect of WSNs.
The authors in [26] present an energy efficient and QoS focused protocol called EEQ MAC. The EEQ MAC adjusts nodes’ duty cycle based on queue length and data priority. In contrast to [25], the EEQ MAC adjusts the duty cycle of both the receiver and the sender node. Hence, it increases complexity, communication overhead, and synchronization issues. Additionally, the protocol considers neither the energy aspect of the duty cycle adjustment nor the dynamic aspects of WSNs.
The authors in [27] propose a dynamic clustering technique based on adaptive sleep scheduling. The sensor nodes are arranged into clusters and a cluster head is assigned in each cluster. The nodes join the cluster dynamically. Packet arrival time is considered to adjust sleep scheduling. The algorithm operates in an iterative manner with each iteration consisting of foundation, formation, and forwarding. At the start of each slot, a node can select from a set of three actions: transmit, sleep and listen. If transmit is selected, then the time slot to send the packet is obtained. The algorithm uses separate Q-learning mechanisms for action selection and time slot determination. Hence, it increases complexity and computational overheads. Furthermore, the algorithm may experience convergence issues since each node has to update and store the Q-value table.
The authors in [28] proposed an RL based sleep scheduling scheme for energy efficiency and lifetime of WSNs. A selective number of nodes remain active while all other nodes sleep. Q-learning is employed to schedule the sleep period of the nodes based on variations in remaining energy. The state space is considered as the set of all the nodes, whereas the action space is designed as the set of all neighboring nodes of a given node. This may result in an infinite number of state-action pairs, which effectively increases the dimension of the Q-table and the complexity for large networks. Additionally, even distribution of energy increases overhead and may compromise performance by ignoring shorter communication routes.
The authors in [29] developed a multi-tier learning based on RL and a multi-armed bandit (MAB) model for slot selection and sleep scheduling in WSNs. The authors strive to balance the trade-off between throughput and energy efficiency. Simulations were performed to validate the analytical model developed for sleep scheduling. Adaptive decision making is achieved by using RL and the MAB model in tandem. However, this combination adds communication overhead and increases complexity. Additionally, convergence may be affected due to the use of the two models on each node, which causes frequent transients in decision making.
The active period of the sender nodes is adjusted using Q-learning in the QX-MAC protocol proposed in [30]. The sender node reserves the active period based on the number of packets queued for transmission in the current state. A new state is achieved once this active period is over. If the queue is empty in the new state, a positive reward is given, otherwise a negative reward is given. On the other hand, the receiver node is notified about the queue status of the sender node by turning the “more bit” to 1 when the sender has more packets lined up for the same receiver and 0 otherwise. However, the dynamic network conditions may not be fully captured using queue length as a state. A network may experience congestion by the time the queue grows. Additionally, a sensor node might deplete its energy prematurely due to a high queue length and low energy level. Furthermore, with a binary reward, the quality of an action is given equal weight for successful transmissions from both high energy and low energy sensor nodes.
The paper in [31] employs Q-learning and linear regression to adjust the duty cycle of the sensor nodes. However, the combination of the two schemes may add communication overhead and increase complexity, similar to [29]. The authors developed a QL model with the state space, action space and reward. The normalized traffic load is considered as the state, and the best action, defined as the duty cycle, is determined based on the load. However, the Q-table dimensions may vary randomly due to the load being used as a state, resulting in slow or non-convergence for large loads. Additionally, the scheme has not been compared to any other ML-based protocol.

The proposed work in [32] adjusts the duty cycle and packet forwarding using RL and the Monte Carlo technique. The aim is to increase energy efficiency and reduce delay. The duty cycle is adjusted using an event driven approach. However, the combination of the two schemes may add communication overhead and increase complexity, similar to [29] and [31]. Furthermore, no other ML-based protocol has been compared to the proposed technique.
The authors in [33] propose an RL based sleep scheduling protocol that is adaptive to temperature. The authors analyze the impact of temperature variations on energy consumption. The sensor node takes action to transmit, listen or sleep based on temperature variations in and around the node. The state is considered based on the node’s energy and its neighborhood. However, RL is implemented on each node, which may affect the learning process and consume more energy during the exploration phase. Additionally, it takes a long time to converge.
In [34], the authors introduced rlDC-MAC to balance network performance and energy efficiency using Q-learning. The network lifetime was extended by calculating the duty cycle of sensor nodes, whereas the network performance was improved by calculating a favorable transmission contention window. When a sensor node has a data packet to send, it starts by calculating the wake-up duration and a favorable contention window before transmitting the data packet. The sink node receives the data packet and responds by sending an acknowledgement. However, the protocol suffers from a faster depletion rate of the battery. Additionally, the protocol considers a triple of parameters as a state, which restricts its practical feasibility. Consequently, the dimensions of the Q-value table increase due to the large number of state-action pairs. Furthermore, the complexity of the protocol itself is increased. Moreover, the learning is dependent on the centralized arbitration of specific gateway nodes.
The authors in [35] proposed a Q-learning MAC (QL-MAC) protocol to extend the network lifetime. The QL-MAC modifies the duty cycle of the node based on its traffic load and the transmission state of its neighboring nodes. It divides the time into frames, and frames into slots like asynchronous CSMA-CA protocol. Q-values are computed and stored in every slot within each frame. However, the primary focus of the protocol is to increase network lifetime, which results in a high probability of extended delay and low throughput throughout the whole simulation. Additionally, the protocol might be less adaptive to the dynamic network conditions due to its heavy dependence on Q-learning hyperparameters that are selected manually. Furthermore, it requires a direct control channel to exchange the learned information.
Most of the traditional MAC protocols [21], [22], [23], [24], [25], [26] use a fixed formula to adjust the duty cycle. In some protocols, the sensor node employs a constant duty cycle throughout its lifetime. Consequently, either the network performance or the lifetime of the node is compromised. Additionally, traditional MAC protocols are less adaptable to the dynamic aspects of the network. Lastly, the efficient use of the available energy remains a concern.
ML-based MACs offer a potential solution to overcome the limitations faced in traditional MACs. It is observed in the literature that Q-learning is used more often to address the dynamic nature of WSNs. In particular, Q-learning based MAC protocols adjust the duty cycle adaptively to frequent changes in the network. However, some research works consider excessive network parameters for Q-learning, which increases its complexity and limits its effectiveness. Additionally, practical feasibility is restricted due to the lack of network information. A summary of ML-based MAC protocols [27], [28], [29], [30], [31], [32], [33], [34], [35] is presented in Table 1. These protocols mainly suffer from increased complexity and overheads, high dimensionality, and delayed convergence.

TABLE 1. Summary of ML-based MAC protocols.

FIGURE 2. Taxonomy of machine learning algorithms.
III. MACHINE LEARNING ALGORITHMS FOR WSNS-ENABLED IoT
In WSNs-enabled IoT, ML is widely employed to overcome various issues such as energy efficiency, quality of service, data aggregation, data integrity, routing, localization, synchronization and energy forecasting [36], [37]. ML algorithms have been implemented in various applications such as predictive maintenance and fault diagnosis [38], smart health care systems [39], agriculture [40], and others. The ML algorithm takes a dataset as an input for training the model. The model is then evaluated to assess its accuracy. The process of evaluation and optimization continues iteratively until the model achieves the required level of accuracy. Lastly, the trained model is validated on a new dataset to ensure a balance between overfitting and underfitting [41]. Figure 2 illustrates the main categories of the ML algorithms.
Supervised learning uses a labelled dataset to train the model [42]. It can be categorized into regression and classification. Regression is used to predict quantitative variables while classification is applied to predict categorical outcomes [42], [43]. Unsupervised learning algorithms use unlabeled datasets that contain only input data. Thus, the model tries to identify hidden patterns and groups data according to their similarities [37]. Unsupervised learning is categorized into clustering and dimensionality reduction. Clustering groups similar data components into clusters, whereas dimensionality reduction reduces the number of input features to address the issue of high dimensionality [44].
RL does not require prior datasets to train a model. Instead, RL relies on data collected through continuous interaction with the environment [16]. RL utilizes a trial-and-error based strategy and is iterative in nature. An RL agent explores the environment and takes action based on either exploration or prior experience. The agent learns from the reward, which is considered as the impact of its action on the environment. In the simplest form, the reward can be positive or negative. However, a complex reward function can be formulated based on environmental conditions. The training process continues until the reward saturates or attains a predefined level [45].

FIGURE 3. Q-learning mechanism [47].
RL can be further divided into model-free and model-based techniques. In model-free techniques, the RL agent learns through its continuous interaction with the environment and does not model the environment. Model-free techniques include Q-learning and SARSA. In model-based techniques, the agent builds a model that resembles the environment and takes actions based on this model. The Monte Carlo method is an example of a model-based RL technique [46].
Q-learning is a popular RL method which is widely used in WSNs [19]. A Q-learning agent does not depend on any prior knowledge of an environment. Instead, it interacts with the environment and learns by experience. The agent observes the current state in the environment and takes an action. It receives a reward depending on the impact of the action and updates to the new state as shown in Figure 3. The actions of the agent depend on a Q-function which signifies the quality of a particular action at a particular state. The Q-function is updated iteratively as follows [48]:
\[
Q(S_{t+1}, A_t) = (1-\alpha)\,Q(S_t, A_t) + \alpha\left[R + \gamma \max_{a} Q(S_{t+1}, a)\right] \tag{1}
\]
where $Q(S_t, A_t)$ is the Q-value for the current state $S_t$ and action $A_t$, $Q(S_{t+1}, A_t)$ is the updated Q-value for the next state, $t$ is the time step, and $R$ is the reward for state $S_t$ and action $A_t$. The variable $a$ denotes all possible actions available for the new state $S_{t+1}$. The term $\gamma$ is called the discount factor and its value is between 0 and 1. The value of the discount factor is chosen either close to one for preferring the long-term reward or close to zero for preferring the short-term reward. $\max_{a} Q(S_{t+1}, a)$ is the maximum Q-value and $\gamma \max_{a} Q(S_{t+1}, a)$ is the discounted future value. $\alpha$ denotes the learning rate, which controls the speed of learning, and its value ranges between 0 and 1.

FIGURE 4. Operation cycle of RiD-MAC protocol.
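The update in (1) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function name `q_update`, the list-of-lists table layout, and the numeric values of the learning rate, discount factor and reward are assumptions chosen for the example. The sketch writes the updated value into the table entry of the state-action pair that was just taken, which is the usual tabular form of the rule.

```python
def q_update(q_table, s, a, reward, s_next, alpha, gamma):
    """Apply one tabular Q-learning step for the pair (s, a).

    q_table is a list of rows (one per state), each row a list of
    Q-values (one per action). All argument names are illustrative.
    """
    best_next = max(q_table[s_next])  # max over a of Q(S_{t+1}, a)
    q_table[s][a] = (1 - alpha) * q_table[s][a] + alpha * (reward + gamma * best_next)
    return q_table[s][a]


# Hypothetical 4 x 4 table, all zeros except one entry in the next state's row.
q_table = [[0.0] * 4 for _ in range(4)]
q_table[1][2] = 2.0

# Assumed values: alpha = 0.5, gamma = 0.9, reward = 1.0.
updated = q_update(q_table, s=0, a=3, reward=1.0, s_next=1, alpha=0.5, gamma=0.9)
```

In this example the old Q-value is zero, so the result is simply half of the target $R + \gamma \max_{a} Q(S_{t+1}, a) = 1.0 + 0.9 \times 2.0$.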
The agent can take actions based on either exploration or exploitation. Exploitation means the agent takes the action with the highest Q-value. On the other hand, in exploration, the agent takes a random action by ignoring the Q-value to discover new possibilities of maximizing the reward. If the agent explores too much, it might be wasting a lot of time on random actions. On the other hand, an agent might miss out on better actions when it relies more on exploitation. Therefore, it is crucial to strike the right balance between exploration and exploitation. The epsilon-greedy (ε-greedy) policy is employed to achieve this balance. In the ε-greedy policy, the agent explores more in the beginning with a probability of ε and slowly transits to exploitation in the end with a probability of 1 − ε [34]. Additionally, the agent can adapt to the frequent changes in the dynamic network conditions using the ε-greedy policy.
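A minimal Python sketch of ε-greedy action selection is given below. The text states only that exploration dominates early and exploitation dominates later; the exponential decay schedule and its constants (`eps_start`, `eps_min`, `decay`) are assumptions made for illustration, as are the function names.

```python
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """Pick an action index from one row of the Q-value table.

    With probability epsilon a random action is explored; otherwise the
    action with the highest Q-value is exploited.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))                    # explore
    return max(range(len(q_row)), key=lambda i: q_row[i])   # exploit


def decayed_epsilon(step, eps_start=1.0, eps_min=0.05, decay=0.99):
    """Assumed schedule: epsilon shrinks geometrically down to a floor."""
    return max(eps_min, eps_start * decay ** step)


# Usage: a hypothetical Q-row where action index 2 currently looks best.
row = [0.1, 0.3, 0.9, 0.2]
greedy_choice = epsilon_greedy(row, epsilon=0.0)   # never explores
random_choice = epsilon_greedy(row, epsilon=1.0)   # always explores
```

Keeping a nonzero floor on ε is one way to let the agent keep reacting to changes in network conditions after the initial learning phase; whether the protocol does this is not stated in the text.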
IV. RID-MAC PROTOCOL DESIGN
This section provides a detailed description of the RiD-MAC protocol design. It is divided into three parts: the overview of the baseline protocol, the Q-learning framework design and the duty cycle adjustment mechanism.
A. BASELINE PROTOCOL OVERVIEW
The proposed RiD-MAC is a contention based asynchronous protocol. The baseline design of RiD-MAC is inspired by the AQSen-MAC protocol [25]. The RiD-MAC follows a receiver-initiated approach. The receiver wakes up periodically to receive data packets from the intended senders. The operation cycle of the proposed protocol is divided into an active period and a sleep period as shown in Figure 4. Data communication takes place during the active period, whereas the receiver node conserves energy during the sleep period. The active period duration is fixed to 17 ms. On the other hand, the sleep period duration varies based on the duty cycle of the receiver node. The sleep period is equal to zero when the DC value becomes one. This means that the node is active during the entire cycle, and it does not sleep. The sleep period is equal to the active period when the DC value is 0.5. The sleep period is 9 times the active period when the DC value is 0.1. The sleep period of the receiver node is calculated using the following equation:
\[
T_{sleep} = T_{active} \times \frac{1 - DC}{DC} \tag{2}
\]
where $T_{sleep}$ is the sleep period and $T_{active}$ is the active period.
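As a worked illustration of (2), the short Python sketch below evaluates the sleep period for several duty cycle values using the 17 ms active period stated above. The function name and the input check are assumptions added for the example.

```python
T_ACTIVE_MS = 17.0  # active period stated in the text


def sleep_period_ms(duty_cycle, t_active_ms=T_ACTIVE_MS):
    """Sleep period from Eq. (2): T_sleep = T_active * (1 - DC) / DC."""
    if not 0.0 < duty_cycle <= 1.0:
        raise ValueError("duty cycle must be in (0, 1]")
    return t_active_ms * (1.0 - duty_cycle) / duty_cycle


# Sleep periods for the duty cycle values mentioned in the text.
sleep_table = {dc: sleep_period_ms(dc) for dc in (0.1, 0.4, 0.5, 0.7, 1.0)}
```

A duty cycle of 0.1 gives 9 × 17 ms = 153 ms of sleep, a duty cycle of 0.5 gives 17 ms, and a duty cycle of 1.0 gives no sleep, which matches the three cases described before the equation.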
The communication overview of the proposed protocol is shown in Figure 5. During the active period, the receiver node listens to the channel and carries out clear channel assessment (CCA) to check the channel status. If the channel is found idle, it broadcasts the wake-up beacon to all senders to announce its availability to receive data packets. The receiver waits for a specific time Tw to get a response from the senders. On the other hand, sender nodes that have data packets listen to the channel. When a sender node receives the wake-up beacon, it performs CCA to check the channel status. If the channel is idle, the sender node transmits the Tx beacon. After receiving the Tx beacon, the receiver terminates the waiting time Tw and transmits the Rx beacon. The Rx beacon shows the readiness of the receiver to accept data packets. The Rx beacon triggers the sender node to transmit the data packet. Finally, the receiver node receives the data packet and sends an ACK packet to confirm the data reception.
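The exchange described above can be summarized as an ordered sequence of frames. The Python sketch below is a simplified illustration with a single sender; the function name, the frame labels and the behavior when a CCA finds the channel busy (the node simply gives up for this cycle) are assumptions, since the text does not specify them here.

```python
def handshake(receiver_channel_idle, sender_has_data, sender_channel_idle):
    """Return the ordered (node, frame) pairs exchanged in one active period."""
    frames = []
    if not receiver_channel_idle:
        return frames  # receiver CCA found the channel busy (assumed: no beacon)
    frames.append(("receiver", "WAKE_UP_BEACON"))
    if not (sender_has_data and sender_channel_idle):
        return frames  # no Tx beacon arrives within the waiting time Tw
    frames.append(("sender", "TX_BEACON"))
    frames.append(("receiver", "RX_BEACON"))  # receiver ends Tw, signals readiness
    frames.append(("sender", "DATA"))
    frames.append(("receiver", "ACK"))
    return frames


full_exchange = handshake(True, True, True)
idle_cycle = handshake(True, False, True)
```

The first call yields the complete five-frame exchange, while the second models a cycle in which no sender has data, so only the wake-up beacon is sent.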
The wake-up beacon carries the source address and receiver energy information. The Tx beacon consists of the source address, destination address and network allocation vector. The Rx beacon comprises the source address, selected sender address and network allocation vector. Additionally, the three beacons use the frame control and frame check sequence fields from the IEEE 802.15.4 standard.
B. Q-LEARNING DESIGN
The RiD-MAC protocol employs Q-learning to adjust the sleep period of the receiver node. The protocol intelligently adopts different duty cycle values based on remaining energy. In contrast, other MAC protocols such as AQSen-MAC use a fixed formula to adjust the duty cycle. The design of Q-learning is discussed below:
1) STATE SPACE
The state space, S, is designed with respect to remaining energy, EL. It is discretized to four energy levels in percentage. Correspondingly, the node can attain four states. The sensor node starts with a current energy level of 100%.
S = (10, 40, 70, 100)
2) ACTION SPACE
The action space, A, represents the duty cycle of the receiver node. It is also discretized to four DC values. Correspondingly, the node can take action from the four values. The node starts with a duty cycle of 0.5 for the first cycle.
A = (0.1, 0.4, 0.7, 1.0)

FIGURE 5. Communication overview of RiD-MAC protocol.
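The two discretized spaces can be written down directly, as in the Python sketch below. The text lists the four levels but does not state how a measured energy value is assigned to a level; the rule used here (each level is the upper bound of its bin) is an assumption made only for illustration, and the names `STATES`, `ACTIONS` and `energy_to_state` are hypothetical.

```python
STATES = (10, 40, 70, 100)     # remaining energy levels in percent
ACTIONS = (0.1, 0.4, 0.7, 1.0)  # duty cycle values


def energy_to_state(remaining_pct):
    """Map remaining energy (percent) to a state index.

    Assumed binning rule: return the index of the smallest level that is
    greater than or equal to the measured value.
    """
    if not 0.0 <= remaining_pct <= 100.0:
        raise ValueError("remaining energy must be between 0 and 100 percent")
    for index, level in enumerate(STATES):
        if remaining_pct <= level:
            return index
    raise AssertionError("unreachable: 100 is the last level")


start_state = energy_to_state(100.0)  # a node starting at full energy
```

Under this assumed rule, a node starting at 100% is in the last state, and it moves to lower states as its battery drains past 70%, 40% and 10%.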
3) Q-VALUE TABLE
The Q- alue able s o es Q- alues o each s a e-ac ion pai .
I is upda ed i e a i ely based on (1) and comp ises o ows
and columns. The s uc u e o he Q- alue able is speci ied
by s a es as i s ows and ac ions as i s columns. The o de o
he Q- alue able is ows by columns. Since he RiD-MAC
has ou s a es and ou ac ions hence he o de is 4 ×4,
and he able has a o al o 16 Q- alues. Ini ially, all he
Q- alues a e ini ialized o ze o as shown in Table 2. The main
diagonal highligh ed in g een colo ep esen s he bes ac ion
o each s a e. Fo example, he node will ecei e maximum
ewa d o s a e 70% only i i chooses he bes ac ion o 0.7.
The Q-values positioned under the main diagonal represent
insufficient actions for each state. For example, if the node
chooses an action of 0.4 at state 70%, it will receive a low
reward. This is because the energy is still sufficient at this
state, so the network performance will be degraded if a low
duty cycle is chosen. Hence, this action is considered insufficient.
Similarly, the action 0.7 or lower and the action 0.1 are
insufficient for state 100% and state 40%, respectively. The
Q-values placed above the main diagonal represent excessive
actions for each state. For example, if the node chooses an
action of 0.7 at state 40%, it is considered excessive. Because
the remaining energy is low at this state, the network
lifetime will be affected if a high duty cycle is chosen. Hence,
the node should be prevented from choosing such actions.
Similarly, the action of 0.4 or greater and action 1.0 are
excessive for state 10% and state 40%, respectively.
TABLE 2. Initial Q-value table.
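The layout of the Q-value table described above can be illustrated with a short sketch. It builds the 4 × 4 table of zeros and labels a state-action pair by its position relative to the main diagonal. The function and variable names are illustrative assumptions and do not come from the RiD-MAC implementation.

```python
# Illustrative sketch only; names are assumptions, not taken from the paper's code.
STATES = (10, 40, 70, 100)      # remaining-energy levels in percent (rows)
ACTIONS = (0.1, 0.4, 0.7, 1.0)  # duty-cycle values (columns)


def init_q_table():
    """Return a 4 x 4 table of zeros: one row per state, one column per action."""
    return [[0.0 for _ in ACTIONS] for _ in STATES]


def classify(state, action):
    """Label a state-action pair by its position relative to the main diagonal."""
    row = STATES.index(state)
    col = ACTIONS.index(action)
    if col == row:
        return "best"
    # Entries under the diagonal (col < row) are duty cycles that are too low
    # for the energy level; entries above it (col > row) are too high.
    return "insufficient" if col < row else "excessive"


if __name__ == "__main__":
    q_table = init_q_table()
    print(len(q_table) * len(q_table[0]))  # total number of Q-values
    print(classify(70, 0.7), classify(70, 0.4), classify(40, 0.7))
```

With these definitions, the pair (70%, 0.4) is labeled insufficient and the pair (40%, 0.7) is labeled excessive, matching the two examples given in the text.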
The order of the Q-value table is critical in Q-learning design.
It depends on the number of discrete levels of state space
and action space. Complexity of Q-learning increases with
more discrete levels of state and action space. As a result,
Q-learning will experience frequent transients due to quick
changes in state. Additionally, the change in state may occur
before the best action is found for the given state. Another
factor that contributes highly to the complexity of the Q-value
table is the number of network parameters considered for
state and action spaces. This creates the issue of high dimensionality.
For example, lDC-MAC considers three network
parameters in the state space and two network parameters
in the action space. This results in a multi-dimensional
Q-value table. If all the network parameters are assumed to
be discretized to four levels, then the total number of combinations
of the state is equal to 64, whereas the total
number of combinations of the action is equal to 16. This means
there are 16 actions available to be taken during each state,
which results in 1024 Q-values. Consequently, its Q-learning
may experience increased complexity, frequent transients,
reduced steady state durations and inadequate actions for
various states. The proposed RiD-MAC addresses these concerns
by considering one network parameter for the state
and the action, resulting in a two-dimensional Q-value table.
The QX-MAC also considers one parameter as a state, but it
might not be able to capture the dynamic network conditions,
as discussed in Section II.
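The counting argument above can be checked with a few lines of arithmetic. The sketch assumes, as the text does, that every network parameter is discretized to the same number of levels; the function name is illustrative.

```python
def q_table_size(levels, n_state_params, n_action_params):
    """Return (number of states, number of actions, number of Q-values)."""
    n_states = levels ** n_state_params    # combinations of state parameters
    n_actions = levels ** n_action_params  # combinations of action parameters
    return n_states, n_actions, n_states * n_actions


if __name__ == "__main__":
    # Three state parameters and two action parameters, four levels each.
    print(q_table_size(4, 3, 2))  # (64, 16, 1024)
    # One state parameter and one action parameter, as in RiD-MAC.
    print(q_table_size(4, 1, 1))  # (4, 4, 16)
```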
4) REWARD
The reward, $R$, is designed with throughput, $T$, and energy
consumed, $E_C$. The reward will be maximized when either the
throughput increases or the energy consumption decreases.
Therefore, the node strives to increase throughput and reduce
energy consumption. It is formulated as follows:
$$R = w_T T - w_E E_C \tag{3}$$
where $w_T$ and $w_E$ are the corresponding weight coefficients.
The weight coefficients' values are adaptive to the state of
the node and the action taken during the state. The reward
is maximized if a higher value of duty cycle is chosen as an
action when energy level is high, or a lower duty cycle value
is chosen when energy level is low. On the other hand, the
reward is minimized if a lower value of duty cycle is chosen
as an action when energy level is high or a higher value of
duty cycle is chosen when the energy level is low.
Throughput, $T$, is calculated as:
$$T = \frac{N_{pkRx} \times L_{pk}}{T_s} \tag{4}$$
where $N_{pkRx}$ is the total number of packets received, $L_{pk}$ is
the data packet size in bits and $T_s$ is the simulation time.
Energy consumed, $E_C$, is calculated as:
$$E_C = \sum_{i=0}^{n} P_i \times t_i \tag{5}$$
where $n$ is the total number of radio states, $P_i$ is the power
consumption rate of radio state $i$ and $t_i$ is the time spent in
radio state $i$.
The choice of reward parameters reflects the balance
between network performance and energy efficiency. On the
contrary, QX-MAC considers binary reward which may not
be able to fully account for the dynamic network conditions
as discussed in Section II.
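A minimal sketch of (3), (4) and (5) is given below. The weight coefficients are passed in as plain arguments because the rule that adapts them to the state and action is not specified in this section. The units (watts, seconds, bits) and the function names are assumptions made for illustration.

```python
def throughput(n_pk_rx, l_pk_bits, t_s):
    """Eq. (4): received packets times packet size in bits, over simulation time."""
    return n_pk_rx * l_pk_bits / t_s


def energy_consumed(power, time_spent):
    """Eq. (5): sum over radio states of power rate times time spent in that state."""
    return sum(p * t for p, t in zip(power, time_spent))


def reward(t, e_c, w_t, w_e):
    """Eq. (3): weighted throughput minus weighted energy consumed."""
    return w_t * t - w_e * e_c


if __name__ == "__main__":
    # Hypothetical inputs, not simulation results from the paper.
    t = throughput(n_pk_rx=100, l_pk_bits=800, t_s=10.0)
    e_c = energy_consumed(power=[0.06, 0.05, 0.001], time_spent=[1.0, 2.0, 7.0])
    print(t, e_c, reward(t, e_c, w_t=0.001, w_e=1.0))
```

Because throughput and energy are on very different numeric scales, the weights also act as scaling factors in this sketch; the values used here are placeholders only.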
5) NEXT STATE
The next state is updated based on maximum battery energy
and cumulative energy consumed from start to the current
operation cycle. It is obtained by:
$$S_{t+1} = E_m - E_{cc} \tag{6}$$
where $S_{t+1}$ is the next state, $E_m$ is the maximum battery
energy and $E_{cc}$ is the cumulative energy consumed.
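The next-state computation in (6) can be sketched as follows. The mapping from the remaining energy to one of the four discrete states is an assumption: this section does not give the thresholds, so the sketch picks the smallest level at or above the remaining percentage. The function names are also illustrative.

```python
STATES = (10, 40, 70, 100)  # discrete energy levels in percent


def remaining_energy(e_max, e_cc):
    """Eq. (6): maximum battery energy minus cumulative energy consumed."""
    return e_max - e_cc


def to_state(e_remaining, e_max):
    """Map remaining energy to a discrete state (assumed ceiling rule)."""
    pct = 100.0 * e_remaining / e_max
    for level in STATES:
        if pct <= level:
            return level
    return STATES[-1]


if __name__ == "__main__":
    e_max = 100.0  # hypothetical battery capacity in joules
    for e_cc in (0.0, 45.0, 95.0):
        print(e_cc, to_state(remaining_energy(e_max, e_cc), e_max))
```

A different rule, such as rounding to the nearest level, would fit the description equally well; only the subtraction itself is taken from (6).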
6) COMPUTATIONAL COMPLEXITY
RiD-MAC involves a total of around 10 individual Q-learning
logic operations executed during each cycle. All individual
operations are performed in constant time O(1), which
is independent of the state-action space size. These include
lookups, such as fetching consumed energy and number of
received packets. They also include arithmetic operations for
calculating the reward and updating the Q-value using the Bellman
equation. Lastly, comparisons are made for determining
the current state or next state. On the other hand, exploitation
and finding the maximum Q-value of the next state are considered
action dependent operations whose time grows linearly with
the number of actions, denoted as O(A). During exploitation,
the sensor node scans and compares all Q-values of the
current state to determine the action with the highest Q-value.
The maximum Q-value of the next state is obtained in a similar
way. Additionally, Q-value table printing and initialization
to zeros require O(S × A) time, but these operations are
performed only once during start-up, hence their impact is
negligible. On the other hand, O(A) and O(1) are observed
during each cycle but O(A) is more dominant. Therefore,
the overall time complexity of Q-learning logic processing
is O(A) per cycle [49]. Additionally, the node executes RL
FIGURE 6. Duty cycle adjustment mechanism of RiD-MAC protocol.
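The per-cycle cost discussed in the computational complexity subsection can be made concrete with a small sketch. It assumes that (1) takes the standard Q-learning form, Q(s, a) ← Q(s, a) + α[r + γ max Q(s', a') − Q(s, a)], and that exploration follows an ε-greedy rule. Both are assumptions, as are the parameter values and function names.

```python
import random


def best_action(q_row):
    """O(A) scan of one table row; returns the column index of the largest Q-value."""
    best = 0
    for a in range(1, len(q_row)):
        if q_row[a] > q_row[best]:
            best = a
    return best


def choose_action(q_table, s, epsilon, rng=random):
    """Assumed epsilon-greedy selection: explore in O(1), exploit in O(A)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_table[s]))
    return best_action(q_table[s])


def update_q(q_table, s, a, r, s_next, alpha, gamma):
    """Standard Q-learning update: one O(A) maximum plus O(1) arithmetic."""
    max_next = max(q_table[s_next])
    q_table[s][a] += alpha * (r + gamma * max_next - q_table[s][a])
    return q_table[s][a]


if __name__ == "__main__":
    # O(S x A) initialization, performed once at start-up.
    q = [[0.0] * 4 for _ in range(4)]
    a = choose_action(q, s=2, epsilon=0.1)
    update_q(q, s=2, a=a, r=1.0, s_next=2, alpha=0.5, gamma=0.9)
    print(q[2])
```

Each cycle touches one row for action selection and one row for the maximum over the next state, which is why the per-cycle cost grows with the number of actions and not with the number of states.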
REFERENCES
[1] A. Saleem, S. Shah, H. Iftikhar, J. Zywiołek, and O. Albalawi, "A
comprehensive systematic survey of IoT protocols: Implications for data
quality and performance," IEEE Access, early access, Oct. 28, 2024, doi:
10.1109/ACCESS.2024.3486927.
[2] M. Weqar, S. Mehfuz, D. Gupta, and S. Urooj, "Adaptive switching
based data-communication model for Internet of Healthcare Things
networks," IEEE Access, vol. 12, pp. 11530–11548, 2024, doi:
10.1109/ACCESS.2024.3354722.
[3] R. B. Preethi and M. S. Nair, "Augmenting energy sustainability of
static nodes using hybrid KGNN-AHP driven approach for IoT-based
heterogeneous WSN," IEEE Access, vol. 13, pp. 3320–3354, 2025, doi:
10.1109/ACCESS.2024.3523401.
[4] M. Z. Hasan and Z. M. Hanapi, "Efficient and secured mechanisms for
data link in IoT WSNs: A literature review," Electronics, vol. 12, no. 2,
p. 458, Jan. 2023, doi: 10.3390/electronics12020458.
[5] P. Kaur and P. Singh, "Adaptive data transmission protocols for energy
harvesting WSNs used in agriculture," J. Telecommun. Inf. Technol., vol. 1,
no. 2024, pp. 97–103, Mar. 2024, doi: 10.26636/jtit.2024.1.1390.
[6] L. Nguyen and H. T. Nguyen, "Mobility based network lifetime in wireless
sensor networks: A review," Comput. Netw., vol. 174, pp. 1–24, Jun. 2020,
doi: 10.1016/j.comnet.2020.107236.
[7] J. Li, T. Han, W. Guan, and X. Lian, "A preemptive-resume priority MAC
protocol for efficient BSM transmission in UAV-assisted VANETs," Appl.
Sci., vol. 14, no. 5, pp. 1–27, Mar. 2024, doi: 10.3390/app14052151.
[8] R. Zhu, A. Boukerche, D. Li, and Q. Yang, "Delay-aware and reliable
medium access control protocols for UWSNs: Features, protocols,
and classification," Comput. Netw., vol. 252, pp. 1–21, Oct. 2024, doi:
10.1016/j.comnet.2024.110631.
[9] F. Ojeda, D. Mendez, A. Fajardo, and F. Ellinger, "On wireless sensor
network models: A cross-layer systematic review," J. Sensor Actuator
Netw., vol. 12, no. 4, p. 50, Jun. 2023, doi: 10.3390/jsan12040050.
[10] A. Roy and N. Sarma, "A synchronous duty-cycled reservation
based MAC protocol for underwater wireless sensor networks,"
Digital Commun. Netw., vol. 7, no. 3, pp. 385–398, 2021,
doi: 10.1016/j.dcan.2020.09.001.
[11] C. Blondia, "Evaluation of the end to end response times in an
energy harvesting wireless sensor network using a receiver initiated
MAC protocol," Ad Hoc Netw., vol. 136, pp. 1–12, Apr. 2022, doi:
10.1016/j.adhoc.2022.102995.
[12] P. D. Nguyen and L.-W. Kim, "Sensor system: A survey of sensor type,
ad hoc network topology and energy harvesting techniques," Electronics,
vol. 10, no. 2, pp. 1–20, Jan. 2021, doi: 10.3390/electronics10020219.
[13] R. Priyadarshi, "Exploring machine learning solutions for overcoming
challenges in IoT-based wireless sensor network routing: A comprehensive
review," Wireless Netw., vol. 30, no. 4, pp. 2647–2673, May 2024, doi:
10.1007/s11276-024-03697-2.
[14] E. Barbierato and A. Gatti, "The challenges of machine learning: A critical
review," Electronics, vol. 13, no. 2, pp. 1–30, Jan. 2024, doi:
10.3390/electronics13020416.
[15] E. Dritsas and M. Trigka, "Machine learning in information and
communications technology: A survey," Information, vol. 16, no. 1, p. 8, Dec. 2024,
doi: 10.3390/info16010008.
[16] M. Abbasi, A. Shahraki, M. Jalil Piran, and A. Taherkordi, "Deep
reinforcement learning for QoS provisioning at the MAC layer: A
survey," Eng. Appl. Artif. Intell., vol. 102, pp. 1–20, Jun. 2021, doi:
10.1016/j.engappai.2021.104234.
[17] A. R. Gaidhani and A. D. Potgantwar, "A review of machine learning based
routing protocols for wireless sensor network lifetime," Eng. Proc., vol. 59,
no. 1, pp. 1–13, 2024, doi: 10.3390/engproc2023059231.
[18] D. Han, B. Mulyana, V. Stankovic, and S. Cheng, "A survey on deep
reinforcement learning algorithms for robotic manipulation," Sensors,
vol. 23, no. 7, p. 3762, Apr. 2023, doi: 10.3390/s23073762.
[19] P. N. Karunanayake, A. Könsgen, T. Weerawardane, and A. Förster,
"Q learning based adaptive protocol parameters for WSNs,"
J. Commun. Netw., vol. 25, no. 1, pp. 76–87, Feb. 2023, doi:
10.23919/JCN.2022.000035.
[20] Q. Gang, W. U. Rahman, F. Zhou, M. Bilal, W. Ali, S. U. Khan,
and M. I. Khattak, "A Q-learning-based approach to design an
energy-efficient MAC protocol for UWSNs through collision
avoidance," Electronics, vol. 13, no. 22, p. 4388, Nov. 2024,
doi: 10.3390/electronics13224388.
[21] M. U. Rehman, I. Uddin, M. Adnan, A. Tariq, and S. Malik, "VTA-SMAC:
Variable traffic-adaptive duty cycled sensor MAC protocol
to enhance overall QoS of S-MAC protocol," IEEE Access, vol. 9,
pp. 33030–33040, 2021, doi: 10.1109/ACCESS.2021.3061357.
[22] J.-D. Abdulai, A. A. Amengu, F. A. Katsriku, and K. S. Adu-Manu,
"CBU-SMAC: An energy-efficient CLUSTER-BASED UNIFIED SMAC
algorithm for wireless sensor networks," J. Ambient Intell. Humanized
Comput., vol. 15, no. 4, pp. 2073–2092, Apr. 2024, doi:
10.1007/s12652-023-04737-z.
[23] Z. Ahmed, M. M. Rehan, O. Chughtai, and M. W. Rehan, "AD-RDC:
A novel adaptive dynamic radio duty cycle mechanism for low-power
IoT devices," IEEE Internet Things J., vol. 9, no. 15, pp. 13376–13389,
Aug. 2022, doi: 10.1109/JIOT.2022.3145017.
[24] G. Kim, J.-G. Kang, and M. Rim, "Dynamic duty-cycle MAC protocol for
IoT environments and wireless sensor networks," Energies, vol. 12, no. 21,
p. 4069, Oct. 2019, doi: 10.3390/en12214069.
[25] S. Sarang, G. M. Stojanović, S. Stankovski, Ž. Trpovski, and M. Drieberg,
"Energy-efficient asynchronous QoS MAC protocol for wireless sensor
networks," Wireless Commun. Mobile Comput., vol. 2020, pp. 1–13,
Sep. 2020, doi: 10.1155/2020/8860371.
[26] B. A. Muzakkari, M. A. Mohamed, M. F. A. Kadir, and M. Mamat,
"Queue and priority-aware adaptive duty cycle scheme for energy efficient
wireless sensor networks," IEEE Access, vol. 8, pp. 17231–17242, 2020,
doi: 10.1109/ACCESS.2020.2968121.
[27] A. N. El-Shenhabi, E. H. Abdelhay, M. A. Mohamed, and I. F. Moawad,
"A reinforcement learning-based dynamic clustering of sleep scheduling
algorithm (RLDCSSA-CDG) for compressive data gathering in wireless
sensor networks," Technologies, vol. 13, no. 1, p. 25, Jan. 2025, doi:
10.3390/technologies13010025.
[28] X. Wang, H. Chen, and S. Li, "A reinforcement learning-based sleep
scheduling algorithm for compressive data gathering in wireless sensor
networks," EURASIP J. Wireless Commun. Netw., vol. 2023, no. 1, pp. 1–17,
Mar. 2023, doi: 10.1186/s13638-023-02237-4.
[29] H. Dutta, A. K. Bhuyan, and S. Biswas, "Reinforcement learning
based flow and energy management in resource-constrained wireless
networks," Comput. Commun., vol. 202, pp. 73–86, Mar. 2023, doi:
10.1016/j.comcom.2023.02.011.
[30] F. Afroz and R. Braun, "Empirical analysis of extended QX-MAC for
IoT-based WSNS," Electronics, vol. 11, no. 16, p. 2543, Aug. 2022, doi:
10.3390/electronics11162543.
[31] H. Y. Huang, K. T. Kim, and H. Y. Youn, "Determining node duty cycle
using Q-learning and linear regression for WSN," Frontiers Comput. Sci.,
vol. 15, no. 1, pp. 1–7, Feb. 2021, doi: 10.1007/s11704-020-9153-6.
[32] H. Y. Huang, T.-J. Lee, and H. Y. Youn, "Event driven duty cycling
with reinforcement learning and Monte Carlo technique for wireless
network," Mobile Inf. Syst., vol. 2021, pp. 1–12, Mar. 2021, doi:
10.1155/2021/6644389.
[33] P. S. Banerjee, S. N. Mandal, D. De, and B. Maiti, "RL-sleep: Temperature
adaptive sleep scheduling using reinforcement learning for sustainable
connectivity in wireless sensor networks," Sustain. Computing: Informat.
Syst., vol. 26, pp. 1–18, Jun. 2020, doi: 10.1016/j.suscom.2020.100380.
[34] B.-N. Trinh, L. Murphy, and G.-M. Muntean, "A reinforcement
learning-based duty cycle adjustment technique in wireless multimedia
sensor networks," IEEE Access, vol. 8, pp. 58774–58787, 2020, doi:
10.1109/ACCESS.2020.2982590.
[35] C. Savaglio, P. Pace, G. Aloi, A. Liotta, and G. Fortino, "Lightweight
reinforcement learning for energy efficient communications in wireless
sensor networks," IEEE Access, vol. 7, pp. 29355–29364, 2019, doi:
10.1109/ACCESS.2019.2902371.
[36] S. Sarang, G. M. Stojanovic, M. Drieberg, S. Stankovski, K. Bingi,
and V. Jeoti, "Machine learning prediction based adaptive duty
cycle MAC protocol for solar energy harvesting wireless sensor
networks," IEEE Access, vol. 11, pp. 17536–17554, 2023, doi:
10.1109/ACCESS.2023.3246108.
[37] M. Pundir and J. K. Sandhu, "A systematic review of quality of service
in wireless sensor networks using machine learning: Recent trend and
future vision," J. Netw. Comput. Appl., vol. 188, pp. 1–33, Aug. 2021, doi:
10.1016/j.jnca.2021.103084.
[38] N. Es-Sakali, Z. Zoubir, S. I. Kaitouni, M. O. Mghazli,
M. Cherkaoui, and J. Pfafferott, "Advanced predictive maintenance
and fault diagnosis strategy for enhanced HVAC efficiency in
buildings," Appl. Thermal Eng., vol. 254, pp. 1–17, Oct. 2024, doi:
10.1016/j.applthermaleng.2024.123910.
[39] S. Rattal, A. Badri, M. Moughit, E. M. Ar-Reyouchi, and K. Ghoumid,
"AI-driven optimization of low-energy IoT protocols for scalable
and efficient smart healthcare systems," IEEE Access, vol. 13,
pp. 48401–48415, 2025, doi: 10.1109/ACCESS.2025.3551224.
[40] K. Medani, C. Gherbi, H. Mabed, and Z. Aliouat, "Energy-efficient
Q-learning-based path planning for UAV-aided data collection in
agricultural WSNs," Internet Things, vol. 33, pp. 1–17, Sep. 2025, doi:
10.1016/j.iot.2025.101698.
[41] M. A. Ridwan, N. A. M. Radzi, F. Abdullah, and Y. E. Jalil,
"Applications of machine learning in networking: A survey of current issues
and future challenges," IEEE Access, vol. 9, pp. 52523–52556, 2021, doi:
10.1109/ACCESS.2021.3069210.
[42] G. Obaido, I. D. Mienye, O. F. Egbelowo, I. D. Emmanuel, A. Ogunleye,
B. Ogbuokiri, P. Mienye, and K. Aruleba, "Supervised machine learning
in drug discovery and development: Algorithms, applications, challenges,
and prospects," Mach. Learn. Appl., vol. 17, pp. 1–20, Sep. 2024, doi:
10.1016/j.mlwa.2024.100576.
[43] H. Sharma, A. Haque, and F. Blaabjerg, "Machine learning in wireless
sensor networks for smart cities: A survey," Electronics, vol. 10, no. 9,
pp. 1–22, Apr. 2021, doi: 10.3390/electronics10091012.
[44] S. Fotopoulou, "A review of unsupervised learning in astronomy," Astron.
Comput., vol. 48, pp. 1–23, Jul. 2024, doi: 10.1016/j.ascom.2024.100851.
[45] K. Hu, M. Li, Z. Song, K. Xu, Q. Xia, N. Sun, P. Zhou, and
M. Xia, "A review of research on reinforcement learning algorithms
for multi-agents," Neurocomputing, vol. 599, pp. 1–33, Sep. 2024, doi:
10.1016/j.neucom.2024.128068.
[46] M. Al-Hamadani, M. Fadhel, L. Alzubaidi, and B. Harangi, "Reinforcement
learning algorithms and applications in healthcare and robotics: A
comprehensive and systematic review," Sensors, vol. 24, no. 8, pp. 1–25,
Apr. 2024, doi: 10.3390/s24082461.
[47] C.-M. Wu, Y.-C. Kao, K.-F. Chang, C.-T. Tsai, and C.-C. Hou, "A
Q-learning-based adaptive MAC protocol for Internet of Things
networks," IEEE Access, vol. 9, pp. 128905–128918, 2021, doi:
10.1109/ACCESS.2021.3103718.
[48] M. S. Frikha, S. M. Gammar, A. Lahmadi, and L. Andrey, "Reinforcement
and deep reinforcement learning for wireless Internet of Things:
A survey," Comput. Commun., vol. 178, pp. 98–113, Oct. 2021, doi:
10.1016/j.comcom.2021.07.014.
[49] E. Dvir, M. Shifrin, and O. Gurewitz, "Cooperative multi-agent
reinforcement learning for data gathering in energy-harvesting wireless sensor
networks," Mathematics, vol. 12, no. 13, pp. 1–34, Jul. 2024, doi:
10.3390/math12132102.
[50] T. Pe i, M. Zajc, and M. Beko, "TinyML: Machine learning on
ultra-low-power microcontrollers," Future Internet, vol. 14, no. 12, pp. 1–18, 2022,
Art. no. 363, doi: 10.3390/fi14120363.
[51] K. A. Ngo, T. T. Huynh, and D. T. Huynh, "Simulation wireless sensor
networks in Castalia," in Proc. Int. Conf. Intell. Inf. Technol., Hanoi,
Vietnam, Feb. 2018, pp. 39–44.
[52] A. Varga. OMNeT++: Discrete Event Simulator. Accessed: Apr. 20, 2025.
[Online]. Available: https://omnetpp.org
[53] M. A. Alharbi, M. Kolberg, and M. Zeeshan, "Towards improved
clustering and routing protocol for wireless sensor networks," EURASIP J.
Wireless Commun. Netw., vol. 2021, no. 1, pp. 1–31, Dec. 2021, doi:
10.1186/s13638-021-01911-9.
[54] B. Zeng, S. Li, and X. Gao, "Threshold-driven K-means sector clustering
algorithm for wireless sensor networks," EURASIP J. Wireless Commun.
Netw., vol. 2024, no. 1, pp. 1–19, Sep. 2024, doi:
10.1186/s13638-024-02403-2.
[55] A. Sohoub, S. Sati, and M. Eshtawie, "Optimal cluster size for
wireless sensor networks," Int. J. Wireless Microw. Technol., vol. 13, no. 1,
pp. 36–49, Feb. 2023, doi: 10.5815/ijwmt.2023.01.04.
[56] Study on Ambient Power-Enabled Internet of Things,
document TR 22.840, 3GPP, 2023. [Online]. Available:
https://www.3gpp.org/ftp/Specs/archive/22_series/22.840/22840-120.zip
[57] M. S. Es-haghi, C. Anitescu, and T. Rabczuk, "Methods for
enabling real-time analysis in digital twins: A literature review,"
Comput. Struct., vol. 297, Jul. 2024, Art. no. 107342, doi:
10.1016/j.compstruc.2024.107342.
[58] A. K. Bapatla, S. P. Mohanty, and E. Kougianos, "SFarm: A distributed
ledger based remote crop monitoring system for smart farming," in
IFIP Advances in Information and Communication Technology, 2022,
pp. 13–31.
[59] M. Arif, J. A. Maya, N. Anandan, D. A. Pérez, A. M. Tonello, H. Zangl,
and B. Rinner, "Resource-efficient ubiquitous sensor networks for smart
agriculture: A survey," IEEE Access, vol. 12, pp. 193332–193364, 2024,
doi: 10.1109/ACCESS.2024.3516814.
[60] N. Senadhira, S. Durrani, S. A. Alvi, N. Yang, and X. Zhou, "UAV-assisted
IoT monitoring network: Adaptive multiuser access for low-latency and
high-reliability under bursty traffic," IEEE Trans. Commun., vol. 73, no. 7,
pp. 5279–5294, Jan. 2024, doi: 10.1109/TCOMM.2024.3516503.
[61] S. Singh, U. Singh, N. Mittal, and F. Gared, "A self adaptive attraction
and repulsion based naked mole rat algorithm for energy efficient mobile
wireless sensor networks," Sci. Rep., vol. 14, Jan. 2024, Art. no. 1040, doi:
10.1038/s41598-024-51218-0.
[62] K. Xu, Z. Li, A. Cui, S. Geng, D. Xiao, X. Wang, and P. Wan, "Q-learning
and efficient low-quantity charge method for nodes to extend the lifetime of
wireless sensor networks," Electronics, vol. 12, no. 22, p. 4676, Nov. 2023,
doi: 10.3390/electronics12224676.
SHAH ABDUL LATIF received the B.Eng. degree
in telecommunication engineering from Hamdard
University, Karachi, Pakistan, in 2014, and the
M.Eng. degree in electronic engineering from the
NED University of Engineering and Technology,
Karachi, in 2017. He is currently pursuing the
Ph.D. degree with the Department of Electrical
and Electronics Engineering, Universiti Teknologi
PETRONAS, Seri Iskandar, Malaysia. Previously,
he has over five years of teaching experience in
the field of electrical and electronic engineering at the undergraduate level.
His research interests include machine learning, medium access control
protocol, energy harvesting communications, wireless sensor networks, and
the Internet of Things.
MICHEAL DRIEBERG (Member, IEEE) received
the B.Eng. degree in electrical and electronics
engineering from Universiti Sains Malaysia,
Penang, Malaysia, in 2001, the M.Sc. degree
in electrical and electronics engineering from
Universiti Teknologi PETRONAS, Seri Iskandar,
Malaysia, in 2005, and the Ph.D. degree in electrical
and electronics engineering from Victoria
University, Melbourne, Australia, in 2011. He is
currently a Senior Lecturer with the Department of
Electrical and Electronics Engineering, Universiti Teknologi PETRONAS.
He has published and served as a reviewer for several high-impact journals
and flagship conferences. He has also made several contributions to the
wireless broadband standards group. His research interests include radio
resource management, medium access control protocols, energy harvesting
communications, and performance analysis of wireless and sensor networks.
SOHAIL SARANG (Senior Member, IEEE)
received the B.Eng. degree in telecommunication
engineering from Hamdard University, Karachi,
Pakistan, in 2014, the M.Sc. degree in electrical
and electronics engineering from Universiti
Teknologi PETRONAS (UTP), Malaysia, in 2018,
and the Ph.D. degree in electrical and computer
engineering from the Faculty of Technical Sciences,
University of Novi Sad (UNS), Serbia.
Currently, he is a Postdoctoral Researcher with
the Department of Electrical Engineering, Faculty of Technical Sciences,
UNS. His research interests include energy harvesting communications,
low-power sensor networks, battery-free IoT, MAC protocols, and machine
learning-driven communication algorithms and protocols.
AZRINA ABD AZIZ (Senior Member, IEEE)
received the B.Eng. degree (Hons.) in electrical
and electronic engineering from The University
of Queensland, Australia, in 1997, the M.Sc.
degree in system-level integration from the Institute
for System Level Integration (ISLI), Scotland,
in 2003, and the Ph.D. degree in computer systems
engineering from Monash University, Melbourne,
Australia, in 2013. She is currently a Senior
Lecturer with the Department of Electrical and
Electronic Engineering, Universiti Teknologi PETRONAS (UTP), Malaysia.
Before entering academia, she gained industry experience in integrated
circuit (IC) packaging, specifically in the final visual inspection process. Her
research interests include energy-efficient topology control techniques for
wireless sensor networks (WSNs), wireless body area networks (WBANs)
for biomedical applications and medical imaging, and the application of
machine learning in these domains. She is a Registered Member of the Board
of Engineers Malaysia and a Registered Professional Technologist and is
actively involved in professional communities as the Vice Chair of the IEEE
Robotics and Automation Society, Malaysia, and a member of the IEEE
Women in Engineering Malaysia.
RIZWAN AHMAD (Member, IEEE) received
the M.Sc. degree in communication engineering
and media technology from the University
of Stuttgart, Stuttgart, Germany, in 2004, and
the Ph.D. degree in electrical engineering from
Victoria University, Melbourne, VIC, Australia,
in 2010. From 2010 to 2012, he was a Postdoctoral
Research Fellow with Qatar University,
Doha, Qatar, on a QNRF Grant. He is currently
a Professor with the School of Electrical Engineering
and Computer Science (SEECS), National University of Sciences
and Technology (NUST), Islamabad, Pakistan. He is also the Director of
the Communication Systems and Networking (CSN) Laboratory, SEECS,
NUST. He has authored over 100 journal articles and conference papers.
His research interests include public safety networks, medium access control
protocols, spectrum and energy efficiency, energy harvesting, and the
performance analysis of wireless communication and networks. He was a
recipient of the prestigious International Postgraduate Research Scholarship
from the Australian Government. He served as a reviewer for IEEE journals
and conferences.
GORAN M. STOJANOVIĆ (Member, IEEE)
received the B.Sc., M.Sc., and Ph.D. degrees in
electrical engineering from the Faculty of Technical
Sciences (FTS), University of Novi Sad (UNS),
Serbia, in 1996, 2003, and 2005, respectively.
He is currently a Full Professor with FTS, UNS.
He has 27 years of experience in research and
development. He has more than 18 years of
experience in writing, implementation, and coordination
of EU-funded projects (Horizon Europe,
H2020, EUREKA, ERASMUS, and CEI), with a total budget exceeding
22.86 MEUR. He was a supervisor of 14 Ph.D. students, 40 M.Sc. students,
and 60 diploma students at FTS-UNS. He is the author/co-author of
280 articles, including 180 in peer-reviewed journals with impact factors, five
books, three patents, and two chapters in a monograph. His research interests
include sensors, flexible electronics, textile electronics, edible electronics,
and microfluidics. He was a keynote speaker at 14 international conferences.