DERRIC: Decentralized Reinforced RAN Intelligent Controller Orchestration for 6G Networks
Elham HashemiNezhad, Antonio Di Maio, Torsten Braun
University of Bern, Bern, Switzerland
{elham.hasheminezhad, antonio.dimaio, torsten.braun}@unibe.ch
Abstract—Open-Radio Access Network (O-RAN) facilitates the scalability of cellular networks by introducing a RAN Intelligent Controller (RIC) component whose functions can be flexibly distributed over large-scale 6G networks. Artificial Intelligence (AI) is effective in optimizing RIC placement in 6G O-RAN, mitigating the limited adaptability of non-data-driven methods in complex time-varying network conditions. However, the centralized orchestration of current approaches to RIC placement hinders scalability. This work introduces a data-driven DEcentralized Reinforced RAN Intelligent Controller orchestration (DERRIC) method for 6G networks, leveraging the online learning capabilities of decentralized multi-agent Reinforcement Learning (RL) orchestration to solve the RAN Intelligent Controller Placement Problem (CPP). DERRIC is a two-layer network management scheme with decentralized orchestrators that adapt to network conditions, deploy controllers, and allocate resources. These orchestrators manage distributed controllers to optimize RAN parameters, such as user transmission power. DERRIC's main goal is to increase the system's overall user Packet Delivery Ratio (PDR) by optimal controller deployment and operation. Optimal controller deployment reduces controller-user latency and accelerates user-transmission-power control decisions, leading to further enhancement of user PDR. We show that DERRIC reduces the controller-user latency and power consumption by up to 66% and 29%, respectively, and increases user PDR by up to 14% compared to state-of-the-art baselines in a broad range of simulated scenarios.
Index Terms—6G, O-RAN, Decentralized Controller Placement, Multi-Agent RL
I. INTRODUCTION
The evolution toward 6G networks requires architectural changes to support diverse services and multi-connectivity coordination. O-RAN enables this transformation through virtualization and intelligence [1]. One of the main objectives of 6G is to provide AI-native management of dense, highly distributed, and potentially cell-free multi-domain networks based on O-RAN [2]. The O-RAN architecture includes two RAN Intelligent Controllers (RICs) that control and manage the network at different time scales: the Near-Real-Time (Near-RT) RIC, between 10 ms and 1 s, and the Non-Real-Time (Non-RT) RIC, above 1 s [3]. Distributed controllers are necessary to enhance network performance and ensure the sustainability of user connections because a single controller poses a single point of failure.
The problem of distributing several controllers at optimal locations is known as the Controller Placement Problem (CPP) [4]. Some works [5], [6] used static optimization approaches to controller placement, which cannot handle dynamic environments with time-varying network conditions due to their static problem parameterization. To manage such environments, Wu et al. [7] applied a Deep Q-Network (DQN), based on deep RL, to optimize the placement of a single controller, which introduces a single point of failure. Other works [8], [9] propose a centralized RIC Orchestrator (RIC-O) to deploy distributed controllers to enhance network performance. Centralized orchestration is a single point of failure and hinders latency and scalability. In contrast, decentralized orchestration reduces the size and complexity of orchestration domains, improving latency, management, and network resiliency and scalability. To address the problems of centralized orchestration, Bruno et al. [10] introduced a distributed orchestration framework using a distributed cloud infrastructure to deploy disaggregated Near-RT RIC components by non-data-driven optimization, which lacks the intelligence to quickly adapt to complex and dynamic network conditions. Conversely, AI-based optimization provides flexibility, scalability, and low latency, which enables modern demanding applications in beyond-5G networks [11], [12]. Bouzidi et al. [13] proposed a decentralized approach to deploy distributed controllers using Deep Q-Network-based Dynamic Clustering and Placement (DDCP) in Software-defined Networks (SDNs), which significantly improves network response time and resource utilization. However, they apply the DQN method based on single-agent RL to learn the whole environment, which leads to slower convergence and challenges in dividing tasks effectively. In multi-agent RL, agents autonomously learn and adapt to distinct regions of the environment and share their experiences to handle various environmental conditions more effectively [14].
In this paper, we extend our previous work [15], in which we introduced a data-driven Orchestration of Distributed RAN Intelligent Controller Placement, by studying the real-world computational requirements and learning convergence of multi-agent RL, and we perform a fine-grained parameter study on the impact of the number of User Equipments (UEs) in the system on the global transmission power. This paper addresses two research questions: 1) How can a multi-agent RL approach optimize transmission power allocation by controllers to UEs? 2) How can a decentralized orchestration framework be designed to efficiently deploy and manage distributed controllers? To address these questions, we propose DERRIC, a decentralized orchestration method that deploys distributed controllers to minimize controller-user and orchestrator-controller latencies and improve user PDR, ensuring faster and more optimal transmission power decisions by controller agents. The main contributions of this work are:
1) We use a multi-agent RL approach in the RIC layer to optimize transmission power allocation by collecting user metrics such as controller-user latency and Signal-to-Noise Ratio (SNR), thereby maximizing user PDR across the network.
2) This work is the first decentralized orchestration of controller deployment in O-RAN using multi-agent RL to reduce orchestrator-controller and controller-user latencies.
TABLE I
RELATED WORKS
Works    Method                 Architecture   Controller   Orchestration   RL
[7]      DQN                    SDN            C            C               ✓
[8]      Dynamic Clustering     O-RAN          D            C               ×
[9]      RIC-O                  O-RAN          D            C               ×
[10]     Dynamic Optimization   O-RAN          D            D               ×
[13]     DDCP                   SDN            D            D               ✓
DERRIC   RL multiple agents     O-RAN          D            D               ✓
D: Decentralized, C: Centralized
Fig. 1. Example of deployment of controllers and orchestrators over the modeled physical network infrastructure
II. SYSTEM MODEL
The system operates within a Radio Access Network (RAN) deployment adhering to O-RAN specifications, as represented in Figure 1. The network topology is modeled as an undirected graph $G = (V, E)$, where $V = \{v_1, \dots, v_{|V|}\}$ represents a set of network devices. The node set $V$ can contain physical communication and processing devices such as personal UEs, fixed base stations (gNodeBs), Multi-access Edge Computing (MEC) servers, and large cloud data centers. We consider a set of edges $E = \{e_1, \dots, e_{|E|}\}$ representing the physical links between pairs of nodes in $V$; the link set $E$ can contain wireless and wired links. In this architecture, orchestrators and controllers are deployed on 6G network devices like cloud servers and gNodeBs, depending on the RL agent decision. Both controller and orchestrator agents can operate on the same network topology graph $G$. Each link in $E$ is characterized by its total link latency $L_{ij}$ between two devices $i, j \in V$. We assume that the link latency is defined as a one-way control latency, representing the time for control signals to travel between nodes, which varies with the transmission medium (wired or wireless). This latency is measured for control operations, such as those between orchestrators and controllers or between controllers and users. In our system, we assume that the UEs in $V$ are connected to the 6G core network through a set of fixed base stations (e.g., gNodeBs) deployed across a geographical area, with each UE associated with the closest base station through a wireless link in $E$.
We define the PDR $Q_i^t$ of the $i$-th UE at time step $t$ as the ratio between the number of packets correctly received by the associated base station and the total number of packets transmitted by $i$ during the time step $t$. In particular, each base station in $V$ must adopt an optimal transmission power $P_i^t$ [W] toward its $i$-th connected UE, to maximize the Signal to Interference and Noise Ratio (SINR) at each receiver with the final goal of maximizing the average number of correctly received packets in the system. Our system contains a set $\mathcal{C} \subseteq V$ of controllers, which are stateful, virtualized, and migratable software modules that regulate power transmissions to UEs and can be installed on any physical device in $V$, such as base stations and cloud servers, depending on computational and communication capabilities. Each controller $c \in \mathcal{C}$ is in charge of power allocation to the $N_c$ UEs associated with all base stations managed by the controller $c$ (i.e., the controller domain). Let us define the SNR vector $\rho_c(t) = (\rho_{1c}, \dots, \rho_{N_c c}) \in \mathbb{R}_+^{N_c}$ and the latency vector $L_c(t) = (L_{1c}, \dots, L_{N_c c}) \in \mathbb{R}_+^{N_c}$ as the vectors respectively containing the SNR of transmissions from the base station associated with each user managed by controller $c$, and the latency of communications from controller $c$ to each UE managed by it at time step $t$.
We define the average transmission power $\bar{P}_c^t$ selected by the controller $c$ from its managed base stations to all its managed $N_c$ users in time step $t$ as $\bar{P}_c^t = \frac{1}{N_c} \sum_{i \in [N_c]} P_i^t$. The DERRIC system model has a set $\mathcal{O} \subseteq V$ of orchestrators that partition the network into a Voronoi-like set of contiguous orchestrator domains. Each orchestrator domain is created by clustering the RAN nodes in $V$ with the lowest latency to the orchestrator $o \in \mathcal{O}$. Each orchestrator $o \in \mathcal{O}$ is responsible for deploying a time-varying set of controllers $\mathcal{C}_o$ on a set $K_o \subseteq V$ of possible deployment locations that are determined based on the orchestrator agent's decision in its domain. We define the orchestrator user-count vector $N_o(t) = (N_{1o}, \dots, N_{|\mathcal{C}_o| o})$ as the vector containing the number of users $N_{co}$ managed by the controller $c \in \mathcal{C}_o$ in the set $\mathcal{C}_o$ of controllers orchestrated by orchestrator $o$. We define the controller-user latency vector $\bar{L}_o(t) = (\bar{L}_{1o}, \dots, \bar{L}_{|\mathcal{C}_o| o})$ as the vector containing the average latencies $\bar{L}_{co}$ between UEs and the controller $c \in \mathcal{C}_o$ in the set $\mathcal{C}_o$ of controllers orchestrated by orchestrator $o$, where $\bar{L}_{co} = \frac{1}{N_{co}} \sum_{i \in [N_{co}]} L_{ic}$. Finally, we define the orchestrator-controller latency vector $L_o(t) = (L_{1o}, \dots, L_{|\mathcal{C}_o| o})$ as the vector containing the latencies between the orchestrator $o$ and the controller $c \in \mathcal{C}_o$ in the set $\mathcal{C}_o$ of controllers orchestrated by orchestrator $o$.
Fig. 2. Logical DERRIC Model
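The Voronoi-like partition into orchestrator domains can be illustrated with a minimal Python sketch. It assumes that the latency from every RAN node to every orchestrator is already known; the function name `partition_domains`, the node and orchestrator identifiers, and the tie-breaking rule are hypothetical and not taken from the paper.

```python
def partition_domains(latency):
    """Assign every node to the orchestrator it reaches with the lowest latency.

    latency: dict mapping node id -> dict mapping orchestrator id -> latency (s).
    Returns a dict mapping orchestrator id -> sorted list of node ids.
    """
    domains = {}
    for node, per_orchestrator in latency.items():
        # Assumption: ties are broken by orchestrator id so the result is deterministic.
        best = min(per_orchestrator, key=lambda o: (per_orchestrator[o], o))
        domains.setdefault(best, []).append(node)
    return {o: sorted(nodes) for o, nodes in domains.items()}


# Illustrative latencies (seconds); the values are made up for the example.
latency = {
    "gnb1": {"o1": 0.002, "o2": 0.009},
    "gnb2": {"o1": 0.007, "o2": 0.003},
    "mec1": {"o1": 0.004, "o2": 0.004},
}
domains = partition_domains(latency)
```

Each node ends up in exactly one domain, which matches the partition property described above; an orchestrator that is nobody's lowest-latency choice simply has no entry in the result.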
Figure 2 presents the logical architecture of our system based on the O-RAN framework. DERRIC focuses on the strategic placement of Near-RT RIC components, represented as $c \in \mathcal{C}$. The Service Management and Orchestration (SMO) oversees the entire O-RAN architecture, utilizing the Non-RT RIC for advanced RAN optimization, which we treat as decentralized orchestration represented by $o \in \mathcal{O}$. Figure 2 shows that O-RAN adopts a disaggregated approach to the gNodeB, dividing it into a Central Unit with control and user plane functions (O-CU), a Distributed Unit (O-DU), and a Radio Unit (O-RU) [16].
III. DERRIC
DERRIC aims at improving the controller-user latency and the user PDR in the transmission between the controller node and the assigned user in O-RAN. We discuss the operation of each controller and propose a multi-agent strategy to deploy distributed controllers through decentralized orchestrators.
A. Controller Operation
Each controller allocates the transmission power to each user in its domain by leveraging a local RL agent that observes the controller state and selects the transmission power for every user managed by the controller to maximize the expected value of a reward function that considers the managed users' PDR. This power allocation process can be modeled as a sequential decision-making problem $(s_c, a_c, r_c)(t)$, where each controller adjusts its action $a_c(t) \in \mathcal{A}_c(t)$ on the environment at each time step $t$, based on the current system's state $s_c(t) \in \mathcal{S}_c(t)$ and a reward function $r_c(t) \in \mathcal{R}_c(t)$. We define the controller's state, action, and reward as follows.
1) Controller State: At each time step $t$, every controller builds the local state $s_c(t)$ by collecting system metrics such as the latency vector $L_c(t-1)$ and the SNR vector $\rho_c(t-1)$, which contain information about the communication latency between the controller and all managed UEs and the SNR received by all managed UEs at time step $t-1$. The controller also collects an average transmission power vector $\bar{\mathbf{P}}^t = (\bar{P}_1^t, \dots, \bar{P}_{|\mathcal{C}|}^t)$ through inter-controller connections at each time step $t$, which contains the latest average transmission power selected by all controllers (itself and all others) at time step $t$. This coordination reduces the interference that arises when controllers allocate power to users simultaneously over the same frequency, including interference between domains in the power allocation process.
As the number of base stations and users $N_c$ managed by the generic controller $c$ varies over time, the dimension of the state space $\mathcal{S}_c(t)$ that contains the state $s_c(t) \in \mathcal{S}_c(t)$ is also time-varying, and $\mathcal{S}_c(t) \subseteq \mathbb{R}^{2N_c + |\mathcal{C}|}$. We design the state to include user-to-controller latency because the actions that modify transmission power are adopted by the base station with a time-varying delay, which should be considered by the RL agent to select delay-predictive transmission power decisions. Equation 1 formally characterizes the $c$-th controller's state $s_c(t)$ at every time step $t \in \mathbb{N}$.
$$ s_c(t) = \left( L_c(t-1),\ \rho_c(t-1),\ \bar{\mathbf{P}}^{t-1} \right) \tag{1} $$
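As a rough illustration of Equation 1, the following Python sketch concatenates the three observations into one flat state vector and shows where the dimension $2N_c + |\mathcal{C}|$ comes from. The function name `build_controller_state` and the sample values are hypothetical; the paper does not prescribe a particular encoding.

```python
def build_controller_state(latencies, snrs, avg_powers):
    """Flatten (L_c(t-1), rho_c(t-1), P_bar^{t-1}) into one observation vector.

    latencies:  controller-to-UE latencies of the N_c managed UEs.
    snrs:       SNR values of the same N_c UEs.
    avg_powers: average transmission power reported by each of the |C| controllers.
    """
    if len(latencies) != len(snrs):
        raise ValueError("latency and SNR vectors must both have N_c entries")
    # Concatenation order follows Equation 1: latency, SNR, average powers.
    return list(latencies) + list(snrs) + list(avg_powers)


# Example with N_c = 3 managed UEs and |C| = 2 controllers (made-up values).
state = build_controller_state(
    latencies=[0.004, 0.006, 0.005],
    snrs=[12.0, 9.5, 15.1],
    avg_powers=[0.40, 0.55],
)
```

Because $N_c$ changes over time, the length of this vector changes as well, so a practical agent would need padding or a fixed upper bound on $N_c$; that detail is outside what the text specifies.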
2) Controller Action: Each controller $c$ must determine the transmission power of each user in its control domain by executing a local controller policy $\pi_c$. We define the controller agent's action $a_c(t) \in \mathcal{A}_c(t) = [0, P_{max})^{N_c}$, where $P_{max}$ represents the maximum allowed transmission power, which is determined by each orchestrator for the controllers in its domain. Initially, all UEs are assigned equal transmission power. Subsequently, the controller agents adjust their transmission power levels based on the controller state information. Each orchestrator shares the average transmission power of its controllers $\mathcal{C}_o$ with other orchestrators through inter-orchestrator connections to avoid interference among orchestrator domains.
3) Controller Reward: The objective of RL for allocating transmission power to users is to maximize the PDR of the users' transmissions. We define the reward $r_c(t) \in \mathcal{R}_c(t) = \mathbb{R}_+$ of the generic controller $c$ at time step $t$ as the average of the $Q_i^t$ of all UEs in the $c$-th controller's domain at time step $t$:
$$ r_c(t) = \frac{1}{N_c} \sum_{i \in [N_c]} Q_i^t \tag{2} $$
Each controller agent in the multi-agent power allocation system follows a policy $\pi_c$ that maps the observed state $s_c(t)$ to a transmission power action $P_i$ for the user. Controllers exchange information on their power levels to reduce interference based on $\bar{\mathbf{P}}^{t-1}$ in the state space. Policies incorporate this data to adjust actions and minimize network interference.
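The action space $[0, P_{max})^{N_c}$ and the reward of Equation 2 can be sketched as follows. This is only an illustrative sketch: the helper names `clip_action` and `controller_reward` are hypothetical, and clipping raw policy outputs into the half-open interval is an assumption about how an implementation might enforce the bound.

```python
import math


def clip_action(raw_powers, p_max):
    """Project raw per-user powers into the half-open interval [0, p_max)."""
    # Largest float strictly below p_max, so the upper bound stays exclusive.
    upper = math.nextafter(p_max, 0.0)
    return [min(max(p, 0.0), upper) for p in raw_powers]


def controller_reward(pdrs):
    """Equation 2: average PDR over the N_c users managed by the controller."""
    if not pdrs:
        raise ValueError("the controller must manage at least one user")
    return sum(pdrs) / len(pdrs)


# Made-up values: three users, P_max = 1 W as in the experiment parameters.
action = clip_action([-0.2, 0.5, 1.3], p_max=1.0)
reward = controller_reward([0.9, 0.8, 1.0])
```

Since every $Q_i^t$ is a ratio between 0 and 1, the averaged reward also stays in that range.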
4) Controller Agent Algorithm: Each controller executes its local RL agent in our proposed scheme according to Algorithm 1. The first section of the algorithm (lines 1 to 2) initializes the return, the transmission power allocation at time step $t = 0$, and the policy. The second section (lines 3 to 9) explains how controller $c$ adapts the power of its $N_c$ users and receives a reward, with the controller policy updated via the Generalized Advantage Estimator (GAE) at each time step. The third section (lines 10 to 14) describes how user latency and SNR are measured in the controller domain, along with the average transmission power of all controllers. The last section (lines 15 to 17) details the reward function of the $N_c$ users, calculated based on PDR.
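The control loop of lines 3 to 8 of Algorithm 1 can be sketched in Python as below. The sketch is deliberately incomplete: the PPO policy update of line 9 is omitted, and `get_state`, `policy`, and `get_reward` are hypothetical callables standing in for the measurement and policy functions of the algorithm.

```python
def run_controller(num_steps, gamma, get_state, policy, get_reward):
    """Run the observe/act/reward loop and track the return R <- r + gamma * R."""
    ret = 0.0
    history = []
    for t in range(1, num_steps + 1):
        state = get_state(t - 1)        # line 4: state built from step t-1 metrics
        action = policy(state)          # line 5: action chosen by the policy
        reward = get_reward(t, action)  # lines 6-7: apply powers, collect reward
        ret = reward + gamma * ret      # line 8: return update as written
        history.append(ret)
        # Line 9 (policy update with PPO) is intentionally left out of this sketch.
    return history


# Stub environment: constant state, constant action, constant reward of 1.0.
history = run_controller(
    num_steps=3,
    gamma=0.9,
    get_state=lambda t: [0.0],
    policy=lambda state: [0.5],
    get_reward=lambda t, action: 1.0,
)
```

With a constant reward of 1 and a discount rate of 0.9, the tracked return grows toward $1 / (1 - 0.9) = 10$, which is a quick sanity check for the update rule.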
Algorithm 1: Controller Operation
// All controllers execute this process in parallel
Data: Controller set $\mathcal{C}$, discount rate $\gamma$
// Reward, Power, and Policy Initialization
1  $(R, P_1^0, \dots, P_{N_c}^0) \leftarrow (0, \dots, 0)$
2  $\pi_c \leftarrow$ InitializePolicy()
// each time step
3  for $t \in \mathbb{N}$ do
     // Update state of controller c
4    $s_c(t) \leftarrow$ GetControllerState($t-1$, $N_c$, $\mathcal{C}$)
     // Select action according to policy $\pi_c$
5    $a_c(t) \xleftarrow{\text{sample}} \pi_c(a \mid s_c(t))$
     // Adapt TX power of $N_c$ users according to action
6    $(P_1^t, \dots, P_{N_c}^t) \leftarrow$ AdaptPower($a_c(t)$)
     // Collect reward
7    $r_c(t) \leftarrow$ GetControllerReward($t$, $N_c$)
     // Update returns
8    $R \leftarrow r_c(t) + \gamma R$
     // Update policy using GAE
9    $\pi_c \leftarrow$ PolicyUpdatePPO($\pi_c$, $R$)
10 Function GetControllerState($t$, $N_c$, $\mathcal{C}$):
     // Collect user latency
11   $L_c(t) \leftarrow$ MeasureLatency($N_c$)
     // Collect user SNR
12   $\rho_c(t) \leftarrow$ MeasureSNR($N_c$)
     // Collect average power allocation from other controllers in the previous time step
13   $\bar{\mathbf{P}}^t \leftarrow$ MeasurePowerLevel($\mathcal{C}$)
14   return $(L_c(t), \rho_c(t), \bar{\mathbf{P}}^t)$
15 Function GetControllerReward($t$, $N_c$):
16   $(Q_1^t, \dots, Q_{N_c}^t) \leftarrow$ MeasurePacketDeliveryRatio($N_c$)
17   return $\frac{1}{N_c} \sum_{i \in [N_c]} Q_i^t$
B. Orchestrator Operation
We propose decentralized orchestration for controller placement and network management, where orchestrators organize their operations using RL. The orchestrator agents aim to deploy controllers while reducing user latency to enable quicker decisions on user management, such as optimized power allocation. The controller placement process can be modeled as a sequential decision-making problem, where the generic orchestrator $o \in \mathcal{O}$ decides the deployment of controller nodes among a set $K_o \subseteq V$ of possible deployment locations at each time step, based on the outcomes of its decisions at previous steps. The process can be represented using the tuple $(s_o, a_o, r_o)(t)$, where each orchestrator performs $a_o(t) \in \mathcal{A}_o(t)$ on the environment to select controller nodes at each time step based on the current system's state $s_o(t) \in \mathcal{S}_o(t)$ and achieves a reward $r_o(t) \in \mathcal{R}_o(t)$. The orchestrator's state, action, and reward are defined as follows.
1) Orchestrator State: Each orchestrator builds a state $s_o(t)$ by gathering the controller-user latency vector $\bar{L}_o(t-1)$ and the orchestrator user-count vector $N_o(t-1)$ at time step $t-1$. This observation provides the orchestrator with more information to deploy controllers, adapting to the real-time demands of users within its domain. Each agent observes the latency between the orchestrator and all managed controllers in the previous time step as the orchestrator-controller latency vector $L_o(t-1)$ to deploy controllers with the lowest possible orchestrator-controller latency at time step $t$. Each orchestrator also collects the number of controllers managed by every other orchestrator $o \in \mathcal{O}$ at time step $t-1$ through inter-orchestrator connections to build the vector $\mathbf{C}^{t-1} = (|\mathcal{C}_1|, \dots, |\mathcal{C}_{|\mathcal{O}|}|)^{t-1} \in \mathbb{N}^{|\mathcal{O}|}$. This exchanged information distributes the controller-management workload among orchestrators. Equation 3 describes the $o$-th orchestrator's state $s_o(t)$ at every time step $t \in \mathbb{N}$.
$$ s_o(t) = \left( \bar{L}_o(t-1),\ N_o(t-1),\ L_o(t-1),\ \mathbf{C}^{t-1} \right) \tag{3} $$
2) Orchestrator Action: The action $a_o(t)$ of a single orchestrator agent is to select controller nodes in the orchestrator domain as a binary decision by executing a local orchestrator policy $\pi_o$. The agent favors nodes with lower average user latency, more users, and lower latency to the orchestrator; a node selected as a controller is marked 1, and every other node is marked 0. The orchestrator agent's action is defined as $a_o(t) \in \{0,1\}^{|K_o|}$, a vector of logical values that distinguishes controller from non-controller nodes.
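The binary action vector can be turned into a concrete controller set with a few lines of Python. This is a minimal sketch under the assumption that the candidate locations $K_o$ are kept in a fixed order; `deploy_controllers` and the location names are hypothetical.

```python
def deploy_controllers(action, locations):
    """Map a_o(t) in {0,1}^{|K_o|} to the set of nodes that host controllers."""
    if len(action) != len(locations):
        raise ValueError("action must have one entry per candidate location")
    if any(bit not in (0, 1) for bit in action):
        raise ValueError("action entries must be 0 or 1")
    # Entry k of the action decides whether candidate location k hosts a controller.
    return [loc for bit, loc in zip(action, locations) if bit == 1]


candidate_locations = ["gnb1", "gnb2", "mec1", "cloud1"]
controllers = deploy_controllers([1, 0, 1, 0], candidate_locations)
```

The number of ones in the action vector equals $|\mathcal{C}_o|$, the count each orchestrator shares with the others for workload balancing.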
3) Orchestrator Reward: The RL agent of each orchestrator in the process of controller placement aims to minimize the latency between users and their controllers managed by the orchestrator node and the latency between the controllers and their orchestrator. Therefore, we define the reward function $r_o(t)$ of each orchestrator (Equation 4) as the negative sum of the norm of the average controller-user latency vector and the norm of the orchestrator-controller latency vector.
$$ r_o(t) = -\lVert \bar{L}_o(t) \rVert - \lVert L_o(t) \rVert \tag{4} $$
Each orchestrator agent in the multi-agent controller placement system follows a policy $\pi_o$ that maps the observed state $s_o(t)$ to the selection of controllers as an action in the orchestrator domain. Orchestrators, as agents, exchange information on their workload to balance it among themselves based on $\mathbf{C}^{t-1}$ in the state space. Policies incorporate this data to adjust actions and minimize network interference.
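Equation 4 can be evaluated with the short Python sketch below. The text does not say which norm is used, so the Euclidean norm is an assumption here, and the function names are hypothetical.

```python
import math


def euclidean_norm(vector):
    """Euclidean (L2) norm; the choice of norm is an assumption of this sketch."""
    return math.sqrt(sum(x * x for x in vector))


def orchestrator_reward(avg_user_latencies, orch_latencies):
    """Equation 4: negative sum of the norms of the two latency vectors."""
    return -euclidean_norm(avg_user_latencies) - euclidean_norm(orch_latencies)


# Two deployed controllers with made-up latencies in milliseconds.
reward = orchestrator_reward(
    avg_user_latencies=[3.0, 4.0],  # norm 5
    orch_latencies=[6.0, 8.0],      # norm 10
)
```

The reward is never positive and reaches its maximum of 0 only when both latency vectors vanish, so maximizing it pushes both latencies down together.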
4) Orchestrator Agent Algorithm: Algorithm 2 describes how each orchestrator agent selects the controller nodes using the RL algorithm. In the first section (lines 1 to 2), the state of the orchestrator agent $o$ is initialized by a random deployment of the controllers $\mathcal{C}_o$. The second section (lines 3 to 9) describes how the orchestrator gathers the state information from the previous time step $t-1$ to deploy controllers at the $K_o$ possible locations. It also covers reward evaluation and policy updates using the GAE method. The third section (lines 10 to 15) outlines the determination of state parameters such as average user latency, user count, and latency between the $\mathcal{C}_o$ controllers and the orchestrator, and how the total controller count is measured at each time step. Lastly, the fourth section (lines 16 to 19) details the calculation of the orchestrator's reward function.
Algorithm 2: Orchestrator Operation
// All orchestrators execute this process in parallel
Data: Controller domain $K_o$, orchestrator set $\mathcal{O}$, discount rate $\gamma$
// Reward and Policy Initialization
1  $R \leftarrow 0$, $\pi_o \leftarrow$ InitializePolicy()
// Random controller deployment initialization
2  $\mathcal{C}_o \xleftarrow{\text{sample}} \{0,1\}^{|K_o|}$
// each time step
3  for $t \in \mathbb{N}$ do
     // Update state of orchestrator o
4    $s_o(t) \leftarrow$ GetOrchestratorState($t-1$, $\mathcal{O}$, $\mathcal{C}_o$)
     // Select action according to policy $\pi_o$
5    $a_o(t) \xleftarrow{\text{sample}} \pi_o(a \mid s_o(t))$
     // Deploy controllers on $K_o$ according to action
6    $\mathcal{C}_o \leftarrow$ DeployControllers($a_o(t)$)
     // Collect reward
7    $r_o(t) \leftarrow$ GetOrchestratorReward($t$, $\mathcal{C}_o$)
     // Update returns
8    $R \leftarrow r_o(t) + \gamma R$
     // Update policy using GAE
9    $\pi_o \leftarrow$ PolicyUpdatePPO($\pi_o$, $R$)
10 Function GetOrchestratorState($t$, $\mathcal{O}$, $\mathcal{C}_o$):
     // Collect average controller-user latency
11   $\bar{L}_o(t) \leftarrow$ MeasureAverageUserLatency($\mathcal{C}_o$)
     // Collect user-count of controllers in the orchestrator domain
12   $N_o(t) \leftarrow$ MeasureUserCount($\mathcal{C}_o$)
     // Collect the latency between controllers and the orchestrator
13   $L_o(t) \leftarrow$ MeasureOrchestratorLatency($\mathcal{C}_o$)
     // Collect the number of controllers of each orchestrator
14   $\mathbf{C}^t \leftarrow$ MeasureOrchestratorLoad($\mathcal{O}$)
15   return $(\bar{L}_o(t), N_o(t), L_o(t), \mathbf{C}^t)$
16 Function GetOrchestratorReward($t$, $\mathcal{C}_o$):
17   $\bar{L}_o(t) \leftarrow$ MeasureAverageUserLatency($\mathcal{C}_o$)
18   $L_o(t) \leftarrow$ MeasureOrchestratorLatency($\mathcal{C}_o$)
19   return $-\lVert \bar{L}_o(t) \rVert - \lVert L_o(t) \rVert$
IV. EXPERIMENTAL EVALUATION
A. Simulation Setup
We perform simulations to verify the performance of our method regarding latency, transmission power, and PDR. We compare DERRIC's performance against two baselines introduced by us, namely the Single Orchestrator - Single Controller (SOSC) and the Single Orchestrator - Distributed Controllers (SODC), as no other works in the literature tackle CPP in O-RAN with RL. We model PDR as $Q_i^t = e^{-\alpha d_i N_c}$, where $d_i$ is the distance between the $i$-th UE and its associated base station, and $N_c$ is the controller node load, which depends on the number of users managed by controller $c$. The coefficient $\alpha \in (0, +\infty)$ jointly controls the impact of distance and load on packet delivery.
TABLE II
EXPERIMENT PARAMETERS
Parameter                                            Value
Number of RAN nodes $|V|$                            10, 20, ..., 100
Number of users $N$                                  50, 100, ..., 500
Number of time steps                                 1000
Maximum transmission power $P_{max}$                 1 W
Distance-load tradeoff coefficient of user PDR $\alpha$   0.01
Learning rate, discount rate $\gamma$                0.0001, 0.9
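The PDR model of the simulation setup, $Q_i^t = e^{-\alpha d_i N_c}$, is simple enough to sketch directly in Python. The function name `packet_delivery_ratio` is hypothetical; the default $\alpha = 0.01$ follows the experiment parameters, and the sample distance and load are made up.

```python
import math


def packet_delivery_ratio(distance, load, alpha=0.01):
    """PDR model Q = exp(-alpha * d * N_c) for one UE.

    distance: distance d_i between the UE and its associated base station.
    load:     number of users N_c managed by the UE's controller.
    alpha:    distance-load tradeoff coefficient, must be positive.
    """
    if alpha <= 0:
        raise ValueError("alpha must lie in (0, +inf)")
    return math.exp(-alpha * distance * load)


# A UE at distance 0.5 (unit-square coordinates) whose controller manages 10 users.
q = packet_delivery_ratio(distance=0.5, load=10)
```

The model only produces values in $(0, 1]$, and it decreases when either the distance or the controller load grows, which is why spreading users over more controllers raises the PDR in this setup.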
Figure 3 shows our simulation environment, in which users and RAN nodes are uniformly distributed within a normalized unit square $[0,1]^2$, assuming a free-space model without obstacles in the network. We compare the performance of the selected baselines across increasing numbers of users in the system. We implement our RL-based methods and environments using Python and Ray RLlib to train the Proximal Policy Optimization (PPO) algorithm that optimizes each agent's policy. Table II summarizes the used simulation parameters.
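A topology of this kind can be generated with the standard library alone. The sketch below is not the authors' simulator: `make_topology` is a hypothetical helper, every RAN node is treated as a base station for simplicity, and each UE is associated with the closest node, as the system model assumes for base stations.

```python
import math
import random


def make_topology(num_nodes, num_users, seed=0):
    """Place RAN nodes and UEs uniformly in [0, 1]^2 and associate each UE
    with its closest node (free-space model, no obstacles)."""
    rng = random.Random(seed)  # seeded for reproducible layouts
    nodes = [(rng.random(), rng.random()) for _ in range(num_nodes)]
    users = [(rng.random(), rng.random()) for _ in range(num_users)]
    # Index of the nearest node for every user, by Euclidean distance.
    association = [
        min(range(num_nodes), key=lambda j: math.dist(user, nodes[j]))
        for user in users
    ]
    return nodes, users, association


# Smallest configuration of the parameter table: 10 RAN nodes and 50 users.
nodes, users, association = make_topology(num_nodes=10, num_users=50, seed=42)
```

The distances between users and their associated nodes produced here are the $d_i$ values that the PDR model consumes.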
Fig. 3. Example of simulation environment with two orchestrators managing a total of four controllers
B. Results
Figure 4 shows the controller-user latency between the $N$ UEs and the controllers. DERRIC consistently outperforms both SOSC and SODC as the number of UEs increases. Notably, the average controller-user latency of DERRIC is almost 42% lower than that of SODC and around 66% lower than that of SOSC. The achieved average latency $L_N$ is calculated for different locations and numbers of UEs, reflecting user mobility within the network. DERRIC attains the lowest average latency by deploying each controller at the location with the lowest latency to the users in its controller domain.
Figure 5 shows the average $P_i^t$ across different systems containing a varying number of UEs for the three considered methods. DERRIC's lower controller-user latency allows agents to make decisions and allocate optimal power to users more quickly and effectively, which results in an overall lower observed power consumption than the baselines. Furthermore, lower latency allows agents to allocate the required power to users without the need for excessive power on transmission paths. The experimental results show that DERRIC consumes about 17% and 29% less power than SOSC and SODC, respectively.
Figure 6 shows the average $Q_i^t$ for different numbers of UEs. DERRIC consistently achieves a PDR approximately 9% higher than SODC and 14% higher than SOSC. This outcome arises from using distributed controllers and decentralized orchestrators, which effectively balance the workload among controllers. Strategically deploying controllers with the lowest latency to their users and allocating appropriate transmission power to UEs leads to quicker delivery of packets.
We compare the worst-case cumulative wall-clock execution time over the slowest agent's time steps between DERRIC and SODC, in a scenario containing 1000 UEs and 500 nodes hosting orchestrators and controllers. In DERRIC, the multiple agents (i.e., orchestrators and controllers) collect the state, make decisions, and take actions in parallel, whereas, in SODC, a single orchestrator agent manages all controllers. Figure 7 shows that the multi-agent system is faster than the single-agent system in decision-making and action-taking based on observations. This is because the workload is distributed among multiple agents; in DERRIC, the number of agents corresponds to the number of controllers and orchestrators. The UEs are distributed across controllers, reducing the burden on any single controller, while orchestrators effectively balance the workload among controllers, enhancing network performance.
Fig. 4. Impact of the number of UEs on the average user latency $L_N$
Fig. 5. Average power allocation $P_i^t$ across the $N$ number of User Equipment
Fig. 6. Average user packet delivery ratio $Q_i^t$ across the $N$ number of User Equipment
Fig. 7. Cumulative execution time per time step
Fig. 8. Controller reward $r_c(t)$ over time
Figure 8 reports the controller reward in a system containing $N = 25$ UEs connected to a network containing 10 RAN nodes. These results show that DERRIC outperforms SODC in terms of the packet delivery ratio, i.e., the controller's reward, by up to 13% at convergence. Furthermore, DERRIC converges to a higher controller reward faster than the SODC baseline due to the collaboration among its controllers. The faster convergence of DERRIC means that agents compute less to learn the optimal policy for system orchestration and user power control.
V. CONCLUSION
This paper addresses Near-RT RIC placement in O-RAN using a multi-agent system where orchestrators and controllers collaborate to minimize controller-user and orchestrator-controller latencies, improve user PDR, and optimize transmission power allocation to UEs. Orchestrators determine the optimal number and location of the controllers, while controllers adjust transmission power based on user metrics. Simulations show that DERRIC improves controller-user latency and PDR for different numbers of UEs, outperforming existing methods.
REFERENCES
[1] S. Niknam, A. Roy, H. S. Dhillon, S. Singh, R. Banerji, J. H. Reed, N. Saxena, and S. Yoon, “Intelligent O-RAN for Beyond 5G and 6G Wireless Networks,” in 2022 IEEE Globecom Workshops (GC Wkshps). IEEE, 2022, pp. 215–220.
[2] S. Faye, M. Camelo, J.-S. Sottet, C. Sommer, M. Franke, J. Baudouin, G. Castellanos, R. Decorme, M. P. Fanti, R. Fuladi et al., “Integrating Network Digital Twinning into Future AI-based 6G Systems: The 6G-TWIN Vision,” in 2024 Joint European Conference on Networks and Communications & 6G Summit (EuCNC/6G Summit). IEEE, 2024, pp. 883–888.
[3] L. Bonati, S. D’Oro, M. Polese, S. Basagni, and T. Melodia, “Intelligence and Learning in O-RAN for Data-Driven NextG Cellular Networks,” IEEE Communications Magazine, vol. 59, no. 10, pp. 21–27, 2021.
[4] M. Abdel-Rahman, E. Mazied, F. Hassan, K. Teague, A. AL-Shaggah, A. Mackenzie, S. Midkiff, and K. V. Cardoso, “A Stochastic Optimization Framework for Joint RAN Intelligent Controller Placement and RAN Nodes Assignment in O-RAN Networks,” Authorea Preprints, 2023.
[5] M. J. Abdel-Rahman, E. A. Mazied, K. Teague, A. B. MacKenzie, and S. F. Midkiff, “Robust Controller Placement and Assignment in Software-Defined Cellular Networks,” in 2017 26th International Conference on Computer Communication and Networks (ICCCN). IEEE, 2017, pp. 1–9.
[6] A. Narwaria, K. Soni, and A. P. Mazumdar, “A position and energy aware multi-objective controller placement and re-placement scheme in distributed SDWSN,” The Journal of Supercomputing, pp. 1–29, 2024.
[7] Y. Wu, S. Zhou, Y. Wei, and S. Leng, “Deep reinforcement learning for controller placement in software defined network,” in IEEE INFOCOM 2020-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). IEEE, 2020, pp. 1254–1259.
[8] G. M. Almeida, G. Z. Bruno, A. Huff, M. Hiltunen, E. P. Duarte, C. B. Both, and K. V. Cardoso, “RIC-O: Efficient Placement of a Disaggregated and Distributed RAN Intelligent Controller With Dynamic Clustering of Radio Nodes,” IEEE Journal on Selected Areas in Communications, vol. 42, no. 2, pp. 446–459, 2024.
[9] G. Z. Bruno, V. K. Radhakrishnan, G. M. Almeida, A. Huff, A. P. da Silva, K. V. Cardoso, L. A. DaSilva, and C. B. Both, “RIC-O: An Orchestrator for the Dynamic Placement of a Disaggregated RAN Intelligent Controller,” in IEEE INFOCOM 2023-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). IEEE, 2023, pp. 1–2.
[10] G. Z. Bruno, G. M. Almeida, A. Sathish, A. P. da Silva, L. A. D. A. Huff, K. V. Cardoso, and C. B. Both, “Evaluating the deployment of a disaggregated open ran controller on a distributed cloud infrastructure,” IEEE Transactions on Network and Service Management, 2024.
[11] C.-X. Wang, M. Di Renzo, S. Stanczak, S. Wang, and E. G. Larsson, “Artificial Intelligence Enabled Wireless Networking for 5G and Beyond: Recent Advances and Future Challenges,” IEEE Wireless Communications, vol. 27, no. 1, pp. 16–23, 2020.
[12] S. Niknam, H. S. Dhillon, and J. H. Reed, “Federated Learning for Wireless Communications: Motivation, Opportunities, and Challenges,” IEEE Communications Magazine, vol. 58, no. 6, pp. 46–51, 2020.
[13] E. H. Bouzidi, A. Outtagarts, R. Langar, and R. Boutaba, “Dynamic Clustering of Software-defined Network Switches and Controller Placement Using Deep Reinforcement Learning,” Computer networks, vol. 207, p. 108852, 2022.
[14] J. Hao, T. Yang, H. Tang, C. Bai, J. Liu, Z. Meng, P. Liu, and Z. Wang, “Exploration in deep reinforcement learning: From single-agent to multiagent domain,” IEEE Transactions on Neural Networks and Learning Systems, 2023.
[15] E. Hashemi Nezhad, A. Di Maio, and T. Braun, “Data-Driven Orchestration of Distributed RAN Intelligent Controller Placement in 6G Networks,” 2024. [Online]. Available: https://boris-portal.unibe.ch/handle/20.500.12422/194060
[16] L. Bonati, M. Polese, S. D’Oro, S. Basagni, and T. Melodia, “Open, Programmable, and Virtualized 5G Networks: State-of-the-art and the Road ahead,” Computer Networks, vol. 182, p. 107516, 2020.