Data-Driven Orchestration for Distributed RAN Intelligent Controller Placement in 6G Networks

Author: Hashemi Nezhad, Elham; Di Maio, Antonio; Braun, Torsten

Publisher: Zenodo

DOI: 10.48620/78477

Source: https://zenodo.org/records/17670482/files/Data_Driven_Orchestration_for_Distributed_RAN_Intelligent_Controller_Placement_in_6G_Networks.pdf

Data-Driven Orchestration for Distributed RAN Intelligent Controller Placement in 6G Networks
Elham HashemiNezhad
University of Bern
Bern, Switzerland
Antonio Di Maio
University of Bern
Bern, Switzerland
Torsten Braun
University of Bern
Bern, Switzerland
ABSTRACT
Open-Radio Access Network (O-RAN) brings an innovative approach to address the issues of controller placement in large-scale 6G networks. In this work, we introduce a Reinforcement Learning (RL) algorithm for decentralized RAN Intelligent Controller Orchestration in 6G Networks, which leverages the online learning capabilities of a multi-agent RL system. Our method achieves around 42-66% lower user latency and 9-14% higher user packet delivery ratio compared to state-of-the-art baselines in a broad range of simulated scenarios.
ACM Reference Format:
Elham HashemiNezhad, Antonio Di Maio, and Torsten Braun. 2025. Data-Driven Orchestration for Distributed RAN Intelligent Controller Placement in 6G Networks. In The 40th ACM/SIGAPP Symposium on Applied Computing (SAC ’25), March 31-April 4, 2025, Catania, Italy. ACM, New York, NY, USA, 3 pages. https://doi.org/10.1145/3672608.3707972
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
SAC ’25, March 31-April 4, 2025, Catania, Italy
©2025 Copyright held by the owner/author(s).
ACM ISBN 979-8-4007-0629-5/25/03
https://doi.org/10.1145/3672608.3707972
1 INTRODUCTION
The evolution beyond 5G and the development of sixth-generation (6G) wireless networks call for an architectural transformation that supports service heterogeneity, coordinates multi-connectivity technologies, and enables on-demand service deployment. The O-RAN architecture includes two RAN Intelligent Controllers (RICs) that perform management and control of the network: the Near-Real-Time (Near-RT) RIC, which operates on time scales between 10 ms and 1 s, and the Non-Real-Time (Non-RT) RIC, which operates on time scales above 1 s [3]. Optimal placement of controllers has a prominent effect on minimizing this response time. Distributing a minimum number of controllers at optimal locations to complete control functions promptly is known as the Controller Placement Problem (CPP) [1]. Distributed controllers are necessary to enhance network performance and ensure the sustainability of user connections. A single controller poses a single point of failure, compromising network reliability and availability. Several works have tackled the problem of distributed controller placement in different types of networks. Almeida et al. [2] proposed a RIC Orchestrator (RIC-O) to optimize the deployment of the Near-RT RIC components. The orchestrator seeks to place Near-RT RIC components on the closest edge computing nodes based on the least cost in an O-RAN, so it reduces latency and improves cost efficiency using a greedy strategy. The approach also employs a monitoring system for the control loop. However, convergence may occur when all edge nodes host Near-RT RIC components, potentially increasing latency, particularly in large, dynamic 6G networks. Wu et al. [6] proposed a Deep Q-Network (DQN) approach based on deep RL for controller placement in Software-Defined Networks (SDN). This method optimizes latency and load-balancing metrics by optimally placing the controller and adjusting the switch-controller mapping according to the flow fluctuations. However, this method does not consider the number of controllers. Lyu et al. [5] proposed a fully distributed orchestration to minimize SDN’s time-average cost by stochastically optimizing controllers’ on-demand activation, an adaptive association of controllers and switches, and real-time request processing and dispatching. However, RL for intelligent and decentralized decision-making would be more efficient in meeting the dynamic, complex, and high-demand environment of 6G networks.
The main contributions of this research to tackle the mentioned issues are summarized as follows.
• Develop a multi-agent RL approach for distributed controllers to optimize transmission power allocation, thereby reducing average latency and maximizing packet delivery ratio across the network.
• Design a decentralized orchestration framework for efficient controller deployment and orchestration management using a multi-agent RL system.
2 METHODOLOGY
We assume the system operates within a Radio Access Network (RAN) deployment adhering to O-RAN specifications, as represented in Figure 1. The network topology is modeled as an undirected graph $G = (V, E)$, where $V = \{v_1, \dots, v_{|V|}\}$ represents a set of network devices. Moreover, we consider a set of edges $E = \{e_1, \dots, e_{|E|}\}$ representing the physical links between pairs of nodes in $V$. Each link in $E$ is characterized by its link latency $L_{ij}$ between devices $v_i \in V$ and $v_j \in V$. We assume that each base station in $V$ must adopt the optimal transmission power $P_i^t$ [W] towards its $i$-th connected User Equipment (UE). Our system model contains a set $\mathcal{C} \subseteq V$ of controllers, each of which is in charge of power allocation for the $N_c$ UEs associated with all base stations managed by controller $c$ (i.e., the controller domain). The average transmission power $P_t^c$ selected by controller $c$ from its managed base stations for all its managed $N_c$ users at time step $t$ is defined as $P_t^c = \frac{1}{N_c} \sum_{i \in [N_c]} P_i^t$. A set $\mathcal{O} \subseteq V$ of orchestrators partitions the network into contiguous domains using a Voronoi-tessellation [4] approach. Each domain $K_o$ groups the RAN nodes in $V$ that have the lowest latency to the orchestrator $o \in \mathcal{O}$. Orchestrators are tasked with dynamically deploying controllers $\mathcal{C}_o$ on a set $K_o \subseteq V$ of possible deployment locations.
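The latency-based partition and the per-controller average power described above can be illustrated with a small sketch. It assumes the node-to-orchestrator latencies are already known and stored in a nested dictionary; the function names, node labels, and numeric values are hypothetical and are not taken from the paper's implementation.

```python
from statistics import fmean


def partition_by_latency(latency, orchestrators):
    """Assign each node to the orchestrator it reaches with the lowest latency.

    latency[v][o] is the (assumed precomputed) latency from node v to
    orchestrator o. Ties go to the orchestrator listed first.
    """
    domains = {o: [] for o in orchestrators}
    for node, to_orchestrators in latency.items():
        nearest = min(orchestrators, key=lambda o: to_orchestrators[o])
        domains[nearest].append(node)
    return domains


def average_power(user_powers):
    """Mean transmission power over the N_c users of one controller."""
    return fmean(user_powers)


# Toy latencies in milliseconds (made-up values for illustration only).
latency = {
    "v1": {"v1": 0.0, "v4": 3.0},
    "v2": {"v1": 1.0, "v4": 2.5},
    "v3": {"v1": 2.0, "v4": 0.5},
    "v4": {"v1": 3.0, "v4": 0.0},
}
domains = partition_by_latency(latency, orchestrators=["v1", "v4"])
p_avg = average_power([1.0, 2.0, 3.0])
```

In this toy case, nodes v1 and v2 fall into the domain of orchestrator v1, and v3 and v4 into the domain of orchestrator v4.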
Figure 1: Example of deployment of controllers and orchestrators over the modeled physical network infrastructure
2.1 Controller Operation
Each controller allocates the transmission power for each user in its domain by leveraging a local RL agent. This power allocation process can be modeled as a sequential decision-making problem $(s_c, a_c, r_c)(t)$, where each controller adjusts its action $a_c(t) \in \mathcal{A}_c(t)$ in the environment, i.e., the transmission power allocated to each user at each time step $t$, based on the current system’s state $s_c(t) \in \mathcal{S}_c(t)$ and a reward function $r_c(t) \in \mathcal{R}_c(t)$. We define the controller’s state, action, and reward as follows.
2.1.1 Controller State. At each time step $t$, every controller builds the local state $s_c(t)$ by collecting system metrics such as the latency vector $L_c(t-1)$ and the SNR vector $\rho_c(t-1)$, which contain information about the communication latency between the controller and all managed UEs and the SNR received by all managed UEs at time step $t-1$. Moreover, the controller also collects an average transmission power vector $\mathbf{P}_t = (P_t^1, \dots, P_t^{|\mathcal{C}|})$ through inter-controller connections at every time step $t$, which contains the latest average transmission power selected by all controllers (itself and all others) at time step $t$. As the number of base stations and users $N_c$ managed by the generic controller $c$ varies over time, the dimension of the state space $\mathcal{S}_c(t)$ that contains the state $s_c(t) \in \mathcal{S}_c(t)$ is also time-varying, and $\mathcal{S}_c(t) \subseteq \mathbb{R}^{2N_c + |\mathcal{C}|}$. Equation 1 formally characterizes the $c$-th controller’s state $s_c(t)$ at every time step $t \in \mathbb{N}$.
$$s_c(t) = \left(L_c(t-1),\ \rho_c(t-1),\ \mathbf{P}_{t-1}\right) \tag{1}$$
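As a rough illustration of Equation 1, the sketch below concatenates the three observation vectors into one flat state of length $2N_c + |\mathcal{C}|$. The function name and the sample values are hypothetical; how the paper's implementation actually encodes the observation is not stated in the text.

```python
def build_controller_state(latencies, snrs, avg_powers):
    """Concatenate (L_c(t-1), rho_c(t-1), P_{t-1}) into one flat state vector.

    latencies and snrs hold one entry per managed UE (N_c entries each);
    avg_powers holds one entry per controller (|C| entries).
    """
    if len(latencies) != len(snrs):
        raise ValueError("latency and SNR vectors must both have N_c entries")
    return [*latencies, *snrs, *avg_powers]


# Hypothetical observation for a controller with N_c = 3 UEs and |C| = 2.
state = build_controller_state(
    latencies=[0.012, 0.020, 0.015],  # seconds
    snrs=[18.5, 12.0, 15.2],          # dB
    avg_powers=[0.8, 1.1],            # watts
)
```

Because $N_c$ changes over time, the length of this vector changes too; a fixed-size policy input would need padding or a similar workaround, which is outside what the text describes.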
2.1.2 Controller Action. Each controller $c$ must determine the transmission power for each user in its control domain by executing a local controller policy $\pi_c$. We define the controller agent’s action $a_c(t) \in \mathcal{A}_c(t) = [0, P_{\max})^{N_c}$, where $P_{\max}$ represents hardware or regulatory limitations of base stations on the maximum transmission power.
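One simple way to keep a policy's raw outputs inside the half-open action space $[0, P_{\max})^{N_c}$ is to clip each entry. The sketch below is only an assumption about how such a projection could look; the paper does not say how the bound is enforced, and the function name is hypothetical.

```python
import math


def clip_to_action_space(raw_powers, p_max):
    """Project raw per-user powers onto the half-open interval [0, p_max)."""
    # Largest float strictly below p_max (math.nextafter needs Python 3.9+).
    upper = math.nextafter(p_max, 0.0)
    return [min(max(p, 0.0), upper) for p in raw_powers]


# Hypothetical raw policy outputs for N_c = 3 users, with P_max = 2.0 W.
action = clip_to_action_space([-0.3, 0.5, 2.7], p_max=2.0)
```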
2.1.3 Controller Reward. The reward function $r_c(t) \in \mathcal{R}_c(t) = \mathbb{R}^+$ of the generic controller $c$ at time step $t$ is designed to maximize the average of the packet delivery ratios $Q_i^t$ of all UEs in the $c$-th controller’s domain at time step $t$ (Equation 2).
$$r_c(t) = \frac{1}{N_c} \sum_{i \in [N_c]} Q_i^t \tag{2}$$
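Equation 2 is a plain arithmetic mean, which the following minimal sketch computes for a hypothetical domain of four UEs (the function name and the ratios are made up for illustration).

```python
from statistics import fmean


def controller_reward(delivery_ratios):
    """Average packet delivery ratio over the N_c UEs of one controller."""
    if not delivery_ratios:
        raise ValueError("the controller must manage at least one UE")
    return fmean(delivery_ratios)


# Hypothetical per-UE packet delivery ratios Q_i^t for N_c = 4 users.
reward = controller_reward([1.0, 0.5, 0.75, 0.75])
```

For these values the reward is 3.0 / 4 = 0.75.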
2.2 Orchestrator Operation
We propose a decentralized orchestration framework where orchestrators employ a multi-agent RL algorithm to deploy controllers; the decisions at each time step rely on the outcomes of previous steps. The process can be represented using the tuple $(s_o, a_o, r_o)(t)$, where each orchestrator performs $a_o(t) \in \mathcal{A}_o(t)$ on the environment to select controller nodes at each time step $t$ based on the current system’s state $s_o(t) \in \mathcal{S}_o(t)$ and achieves a reward function $r_o(t) \in \mathcal{R}_o(t)$. The orchestrator’s state, action, and reward are denoted as follows.
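2.2.1 Orchestrator State. Each orchestrator builds its state $s_o(t)$ by gathering the controller-user latency vector $L_o(t-1)$ and the orchestrator user-count vector $N_o(t-1)$ at time step $t-1$. Each agent also observes the latency between the orchestrator and all managed controllers in the previous time step, as the orchestrator-controller latency vector $L_o(t-1)$, in order to deploy controllers with the lowest possible orchestrator-controller latency at time step $t$. Each orchestrator also collects, through inter-orchestrator connections, the number-of-controllers vector $\mathbf{C}_{t-1} = (|\mathcal{C}_1|, \dots, |\mathcal{C}_{|\mathcal{O}|}|)_{t-1} \in \mathbb{N}^{|\mathcal{O}|}$, which represents the latest number of controllers managed by each orchestrator at time step $t-1$. Equation 3 describes the $o$-th orchestrator’s state $s_o(t)$ at every time step $t \in \mathbb{N}$, listing in order the controller-user latency vector, the user-count vector, the orchestrator-controller latency vector, and the number-of-controllers vector.
$$s_o(t) = \left(L_o(t-1),\ N_o(t-1),\ L_o(t-1),\ \mathbf{C}_{t-1}\right) \tag{3}$$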
2.2.2 Orchestrator Action. The action $a_o(t)$ of a single orchestrator agent is to select controller nodes in the orchestrator domain as a binary decision by executing a local orchestrator policy $\pi_o$. The agent selects as a controller a node with lower average user latency, a larger number of users, and lower latency between the node and the orchestrator, and assigns it the value 1; non-controller nodes are assigned 0. The orchestrator agent’s action is defined as $a_o(t) \in \{0, 1\}^{|K_o|}$, a vector of logical values marking controller and non-controller nodes.
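The binary action vector can be illustrated by encoding a chosen set of controller nodes over the candidate locations $K_o$ and decoding it back. The sketch only shows the encoding; it does not reproduce the learned policy $\pi_o$, and the helper names and node labels are hypothetical.

```python
def encode_action(domain_nodes, selected):
    """Return a 0/1 vector over domain_nodes with 1 marking controller nodes."""
    chosen = set(selected)
    unknown = chosen - set(domain_nodes)
    if unknown:
        raise ValueError(f"nodes outside the domain: {sorted(unknown)}")
    return [1 if node in chosen else 0 for node in domain_nodes]


def decode_action(domain_nodes, action):
    """Recover the controller nodes from a 0/1 action vector."""
    return [node for node, bit in zip(domain_nodes, action) if bit == 1]


# Hypothetical domain K_o with four candidate deployment locations.
domain = ["v1", "v2", "v3", "v4"]
action = encode_action(domain, selected=["v2", "v4"])
controllers = decode_action(domain, action)
```

Here the action is the vector (0, 1, 0, 1), which places controllers on v2 and v4.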
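2.2.3 Orchestrator Reward. The reward function $r_o(t)$ in the RL framework for each orchestrator is designed to minimize user latency within the controller’s domain and the latency between the controller and its orchestrator node (Equation 4), where the first norm is taken over the controller-user latency vector and the second over the orchestrator-controller latency vector.
$$r_o(t) = -\lVert L_o(t) \rVert - \lVert L_o(t) \rVert \tag{4}$$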
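A minimal sketch of Equation 4 follows. The text does not say which norm is used, so the Euclidean norm is an assumption here, and the function name and latency values are hypothetical.

```python
import math


def orchestrator_reward(user_latencies, orchestrator_latencies):
    """Negative sum of the norms of the two latency vectors.

    The Euclidean norm is an assumption; the source does not name the norm.
    """
    return -math.hypot(*user_latencies) - math.hypot(*orchestrator_latencies)


# Hypothetical latency vectors in milliseconds.
reward = orchestrator_reward(
    user_latencies=[3.0, 4.0],          # controller-user latencies, norm 5
    orchestrator_latencies=[6.0, 8.0],  # orchestrator-controller latencies, norm 10
)
```

Lower latencies in either vector move the reward closer to zero, which is the direction the orchestrator agent is rewarded for.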
3 EXPERIMENTAL EVALUATION
In our evaluation, we conduct simulations to verify the performance of our method in terms of latency, transmission power, and packet delivery ratio. We compare three environments under the same network conditions and topology: Single Orchestrator - Single Controller (SOSC) and Single Orchestrator - Distributed Controllers (SODC), which are implemented by a single-agent system, and the proposed method, which uses a multi-agent system.
Figure 2: The average user latency $L_N$ across the number $N$ of User Equipment
In our experiment, we assume that each node $i \in V$ is located at a random position $(x_i, y_i)$ on a 2D plane, and we define $d_{ij}$ as the Euclidean distance between the positions of nodes $i$ and $j$. The latency between two nodes $i$ and $j$ is defined as $L_{ij} = L^{\mathrm{p}}_{ij} + L^{\mathrm{t}}_{ij} + L^{\mathrm{q}}_{ij} + L^{\mathrm{c}}_{ij}$. The propagation latency $L^{\mathrm{p}}_{ij} = \frac{d_{ij}}{\eta_{ij}}$ between two nodes is the ratio between their distance $d_{ij}$ and the speed of electromagnetic waves $\eta_{ij}$ in the transmission medium between nodes $i$ and $j$. The transmission latency is $L^{\mathrm{t}}_{ij} = \frac{S}{R_{ij}}$, where $S$ [bit] is the packet size and $R_{ij}$ [bit/s] is the transmission rate. $L^{\mathrm{q}}_{ij}$ represents the queuing latency, i.e., the time a packet spends in a queue before it can be transmitted. Finally, the processing latency $L^{\mathrm{c}}_{ij}$ reflects the time required for the devices to analyze and route packets.
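The four-term latency model can be sketched as below. The function name, parameter names, and all numeric values (positions in metres, a propagation speed of 2e8 m/s, a 12 000-bit packet, a 100 Mbit/s link) are assumptions chosen only to make the arithmetic concrete; the paper does not report these values.

```python
import math


def link_latency(pos_i, pos_j, speed, packet_bits, rate_bps,
                 queuing=0.0, processing=0.0):
    """Total latency as propagation + transmission + queuing + processing."""
    distance = math.dist(pos_i, pos_j)     # Euclidean distance d_ij
    propagation = distance / speed         # L^p_ij = d_ij / eta_ij
    transmission = packet_bits / rate_bps  # L^t_ij = S / R_ij
    return propagation + transmission + queuing + processing


# Hypothetical link: 500 m apart, all latencies in seconds.
latency = link_latency(
    pos_i=(0.0, 0.0),
    pos_j=(300.0, 400.0),
    speed=2.0e8,
    packet_bits=12_000,
    rate_bps=1.0e8,
    queuing=5.0e-5,
    processing=2.0e-5,
)
```

With these numbers the propagation term is 2.5 microseconds and the transmission term is 120 microseconds, so the total is 192.5 microseconds and the transmission and queuing terms dominate.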
We also define the packet delivery ratio in each user transmission as $Q_i^t = e^{-\alpha d_i N_c}$, which is used to simulate the packet delivery ratio $Q_i^t$ based on $d_i$, the distance between the $i$-th UE and its associated base station, and the controller node load $N_c$ at time step $t$. The coefficient $\alpha \in (0, +\infty)$ jointly controls the impact of distance and load on packet delivery.
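The exponential delivery model is easy to evaluate directly. In the sketch below, the value of $\alpha$, the distance, and the two load levels are hypothetical, since the paper does not report the coefficient it used.

```python
import math


def packet_delivery_ratio(alpha, distance, load):
    """Simulated delivery ratio exp(-alpha * d_i * N_c), a value in (0, 1]."""
    if alpha <= 0:
        raise ValueError("alpha must be strictly positive")
    return math.exp(-alpha * distance * load)


# Same UE distance under a light and a heavy controller load.
q_light = packet_delivery_ratio(alpha=0.01, distance=0.5, load=20)
q_heavy = packet_delivery_ratio(alpha=0.01, distance=0.5, load=200)
```

Under these assumed values, raising the controller load from 20 to 200 users lowers the ratio from about 0.90 to about 0.37, which shows why spreading users across more controllers improves delivery in this model.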
The nodes in $V$ are randomly deployed in a normalized unit square $[0, 1]^2$ in our simulation environment. We consider the number of users $N \in \{50, 100, \dots, 500\}$ to compare the performance of the selected baselines under different numbers of users in the system. We implement our RL-based methods in Python and use Ray RLlib to train the Proximal Policy Optimization (PPO) algorithm, which optimizes policy performance for orchestrators and controllers.
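The random deployment and the sweep over user counts could be set up as in the sketch below. It covers only node sampling, not the RLlib training loop; the function name and the seed are hypothetical, and uniform sampling is an assumption since the text only says the nodes are placed randomly.

```python
import random


def sample_nodes(count, seed):
    """Draw `count` positions uniformly at random in the unit square."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    return [(rng.random(), rng.random()) for _ in range(count)]


# User counts swept in the evaluation: 50, 100, ..., 500.
user_counts = list(range(50, 501, 50))
nodes = sample_nodes(count=user_counts[0], seed=0)
```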
Figure 2 demonstrates that the proposed method consistently achieves lower user latency than both SOSC and SODC as the number of UEs increases. The proposed method’s average user latency $L_N$ is almost 42% lower than SODC and around 66% lower than SOSC at the UE level. The average latency of the proposed method is the lowest because each controller is placed at the lowest latency from the users in its controller domain, minimizing delays in their users’ transitions.
Figure 3 shows the packet delivery ratio $Q_i^t$ for different numbers of UEs: the proposed method consistently achieves a ratio approximately 9% higher than SODC and 14% higher than SOSC. This outcome arises from using distributed controllers and decentralized orchestration, which effectively balance the workload among controllers. By deploying controllers with the lowest latency from their users, the communication and decision-making between the controllers and the users are minimized, leading to quicker delivery of packets.
Figure 3: Average user packet delivery ratio $Q_i^t$ across the number $N$ of User Equipment
4 CONCLUSION
This paper addresses the Near-RT RIC placement problem, crucial for managing UEs in the O-RAN architecture. We propose a multi-agent approach where decentralized orchestrators place controllers to minimize latency, sharing data to optimize controller placement within each domain. Controllers act as agents, adjusting user transmission power based on latency and Signal-to-Noise Ratio (SNR) observations. Extensive experiments show that the proposed method reduces latency and improves packet delivery ratios, outperforming state-of-the-art baselines.
ACKNOWLEDGMENTS
This work was funded by the SNS-JU 6G Cloud project under the European Union’s Horizon Europe Research and Innovation Programme under Grant Agreement No. 101139073.
REFERENCES
[1] Mohammad Abdel-Rahman, Emadeldin Mazied, Fahid Hassan, Kory Teague, Atheer Al-Shaggah, Allen MacKenzie, Scott Midkiff, and Kleber V Cardoso. 2023. A Stochastic Optimization Framework for Joint RAN Intelligent Controller Placement and RAN Nodes Assignment in O-RAN Networks. Authorea Preprints (2023).
[2] Gabriel Matheus Almeida, Gustavo Zanatta Bruno, Alexandre Huff, Matti Hiltunen, Elias Procopio Duarte, Cristiano Bonato Both, and Kleber Vieira Cardoso. 2024. RIC-O: Efficient Placement of a Disaggregated and Distributed RAN Intelligent Controller With Dynamic Clustering of Radio Nodes. IEEE Journal on Selected Areas in Communications 42, 2 (2024), 446–459. https://doi.org/10.1109/JSAC.2023.3336159
[3] Leonardo Bonati, Salvatore D’Oro, Michele Polese, Stefano Basagni, and Tommaso Melodia. 2021. Intelligence and Learning in O-RAN for Data-Driven NextG Cellular Networks. IEEE Communications Magazine 59, 10 (2021), 21–27.
[4] Zakhar Kabluchko and Christoph Thäle. 2021. The Typical Cell of a Voronoi Tessellation on the Sphere. Discrete & Computational Geometry 66, 4 (2021), 1330–1350.
[5] Xinchen Lyu, Chenshan Ren, Wei Ni, Hui Tian, Ren Ping Liu, and Y Jay Guo. 2018. Multi-Timescale Decentralized Online Orchestration of Software-Defined Networks. IEEE Journal on Selected Areas in Communications 36, 12 (2018), 2716–2730.
[6] Yiwen Wu, Sipei Zhou, Yunkai Wei, and Supeng Leng. 2020. Deep reinforcement learning for controller placement in software defined network. In IEEE INFOCOM 2020-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). IEEE, 1254–1259.
