MARC-6G: Multi-Agent Reinforcement Learning for Distributed Context-Aware SFC Deployment and Migration in 6G Networks
Solomon Fikadie Wassie, Eric Samikwa, Antonio Di Maio, and Torsten Braun
Institute of Computer Science, University of Bern, Switzerland
Email: {solomon.wassie, eric.samikwa, antonio.dimaio, torsten.braun}@unibe.ch
Abstract—The Cloud Continuum Framework (CCF) extends computing capabilities across near-edge, far-edge, and extreme-edge nodes beyond the traditional edge to meet the diverse performance demands of emerging 6G applications. While Deep Reinforcement Learning (DRL) has demonstrated potential in automating Virtual Network Function (VNF) migration by learning optimal policies, centralized DRL-based orchestration faces challenges related to scalability and limited visibility in distributed, heterogeneous network environments. To address these limitations, we introduce MARC-6G (Multi-Agent Reinforcement Learning for Distributed Context-Aware Service Function Chain (SFC) Deployment and Migration in 6G Networks), a novel framework that leverages decentralized agents for distributed, dynamic, and service-aware SFC placement and migration. MARC-6G allows agents to monitor different portions of the network, collaboratively optimize network control policies via experience sharing, and make local decisions that collectively enhance global orchestration under time-varying traffic conditions. We show through simulations that MARC-6G improves SFC deployment efficiency, reduces migration costs by 34%, and lowers energy consumption by 12.5% compared to the state-of-the-art centralized DRL baseline.
Index Terms—Multi-Agent Reinforcement Learning, Distributed Service Orchestration, Distributed Intelligence, Service Function Chain
I. INTRODUCTION
The Sixth Generation (6G) mobile communication network is expected to leverage the concept of the Cloud Continuum Framework (CCF), which provides more flexible computational resources closer to end users beyond traditional edge computing, thereby meeting diverse application requirements [1]. However, realizing the full potential of this continuum requires scalable and intelligent orchestration of distributed resources across a heterogeneous, multi-tier infrastructure.
Network services are provisioned as sequences of heterogeneous, predefined, and ordered Virtual Network Functions (VNFs) in the form of SFCs on standardized, general-purpose servers enabled by Network Function Virtualization (NFV) technology [2]. VNFs are software-based implementations of network services such as Network Address Translation (NAT), Firewalls (FW), Intrusion Detection and Prevention Systems (IDPS), WAN optimizers (WO), Video Optimization Controllers (VOC), Traffic monitors (TM), and encoding/decoding functionalities. These network functions support a wide range of emerging applications, including video streaming, virtual/augmented/mixed reality, Industry 4.0, holographic communication, smart factories, autonomous vehicles, and the tactile industrial internet [3].
Optimal VNF deployment is one of the design requirements of modern mobile networks to ensure sustainable long-term performance and minimize operational costs. This, in turn, enables fast, reliable, and cost-effective delivery of network services. Several studies have proposed machine learning based approaches for the centralized orchestration of network functions [4], [5]. Deep Reinforcement Learning (DRL) is employed for network state awareness by leveraging Deep Learning (DL) to extract complex, high-dimensional network patterns and using Reinforcement Learning (RL) to optimize decision-making through interactions with dynamic network states.
A Service Orchestrator (SO) is a network management system designed to automate the provisioning, scaling, and lifecycle management of network services [6]. Despite being effective in small-scale networks, centralized service orchestrators exhibit several limitations in large-scale environments due to their limited visibility of the global network state. These limitations include a single point of failure, high signaling overhead for network-wide data collection, and reduced responsiveness in real-time decision-making, all of which degrade the performance of latency-sensitive applications. Centralized orchestrators also often lack the flexibility and scalability required to efficiently manage dynamic and heterogeneous workload demands [7].
The main research question addressed in this work is: How to optimally deploy and dynamically reconfigure multiple SFC requests in large-scale 6G networks, while adapting to time-varying traffic demands and heterogeneous infrastructure resources, to satisfy end-to-end performance requirements?
To address this challenge, we introduce MARC-6G (Multi-Agent Reinforcement Learning for Distributed Context-Aware SFC Deployment and Migration in 6G Networks). In MARC-6G, agents monitor portions of the CCF and collaboratively learn VNF placement and migration policies from real-time network metrics, enabling scalable and dynamic orchestration across heterogeneous 6G infrastructures. The key contributions of this paper are summarized as follows:
• We model the problem of scalable and distributed service-aware orchestration for VNF deployment and migration of multiple SFC requests in large-scale 6G networks.
• We design multi-agent RL–based distributed orchestrators that use local state to jointly learn deployment and migration policies, minimizing delay, energy consumption, and VNF migration cost under dynamic traffic and resource conditions, while concurrently provisioning multiple SFCs.
• We evaluate MARC-6G against baseline methods and demonstrate higher request acceptance, improved energy efficiency, and reduced migration costs, validating its effectiveness for dynamic SFC management in 6G networks.
The remainder of the paper is organized as follows: Section II describes the related works. Section III presents the system model and problem formulation. Section IV outlines the proposed methods. Section V presents the performance evaluation. Finally, Section VI draws the conclusions.
II. RELATED WORKS
Existing adaptive centralized provisioning techniques address VNF deployment as an elastic resource provisioning problem, aiming for flexible and on-demand resource allocation to meet network service requirements and service level agreements [4], [8], [9]. However, they often overlook the fact that multiple service requests may arrive at the orchestrator simultaneously, with varying performance requirements.
Tang et al. [10] employ a digital twin powered by an attention model to guide an RL agent by predicting resource requirements a priori. However, because prediction is decoupled from action selection, the system's ability to adapt in real time is compromised. Onsu et al. [8] apply DRL to VNF placement using fixed data center priorities based on residual capacity, but static scoring overlooks context and traffic, resulting in suboptimal placement.
Dynamic priority assignment enables more adaptive and scalable orchestration under resource variability. Tanuboddi et al. [11] addressed VNF migration by leveraging software-based network functions to enable dynamic scaling, facilitating seamless migration in response to user mobility, load variations, and hardware failures. Chen et al. [12] and J. Chen et al. [13] address cost-efficient and fault-tolerant SFC migration using DRL and optimization techniques, respectively, but both approaches overlook key aspects such as fairness and realistic service lifetimes. Table I provides a comparison of the parameters considered in this study with those reported in the literature.
III. SYSTEM MODEL AND PROBLEM FORMULATION
A. Distributed Service Orchestration in 6G Networks
We envision a 6G cloud network architecture that consists of three frameworks: the CCF, the Management and Orchestration Framework (MOF), and the Artificial Intelligence and Machine Learning Framework (AIMLF), as shown in Fig. 1 [15]. The CCF provides logically unified resource management across cloud-to-edge environments by dynamically integrating resources into Cloud, Near-edge, Far-edge, and Extreme-edge. A portion of the CCF is highlighted in a different color to illustrate that VNFs of a single network service can be deployed across heterogeneous CCF nodes.
TABLE I: Comparison of Related Works.
Reference               Concurrent VNF   Multiple SFC   Stateful VNF   Migration
                        Migration        Requests       Migration      Cost
Tang et al. [10]        ✓                ×              ✓              ×
J. Chen et al. [13]     ✓                ×              ✓              ✓
Onsu et al. [8]         ×                ✓              ×              ×
Tanuboddi et al. [11]   ×                ✓              ×              ✓
Zhang et al. [14]       ×                ✓              ✓              ×
S. Long et al. [4]      ×                ✓              ×              ×
Chen et al. [12]        ✓                ×              ✓              ✓
Liu et al. [9]          ✓                ✓              ×              ×
MARC-6G                 ✓                ✓              ✓              ✓
Fig. 1: AI-native 6G network architecture with distributed service orchestrators for scalable network state adaptive VNF deployment [15].
The MOF provides distributed orchestration capabilities to enable scalable orchestration, interfacing with the CCF for infrastructure control and with the AIMLF for learning-based decision support. It comprises two components: the Master Service Orchestrator (MSO) and the Distributed Service Orchestrators (DSOs). The MSO is responsible for the initial deployment of Network Services (NSs) and DSOs across the CCF, while the DSOs manage runtime operations and the network service lifecycle. Each DSO contains multiple SOs to address the dynamic workloads and scalability challenges posed by the heterogeneous 6G network infrastructure.
The AIMLF is the intelligent control framework, supporting real-time monitoring and continuous learning in dynamic network environments. It ensures autonomous orchestration and adaptive service management by cooperating with the CCF and MOF. DSOs comprise intelligent agents that leverage the AIMLF to manage the CCF in real time, enabling context-aware decisions. DSOs monitor resource availability to support autonomous VNF allocation, predictive scaling, and proactive SFC migration. This ensures low latency, energy efficiency, and enhanced resilience and fault tolerance.
B. System Model
We model distributed orchestration over the CCF at a high level (Fig. 2), where service orchestrators manage batches of SFC requests in a queue. We consider a CCF network infrastructure that comprises heterogeneous physical nodes $i$, each with CPU/GPU capacity $C_i$ [cycles/s], distributed across four tiers: (i) centralized cloud data centers, (ii) near-edge nodes, (iii) far-edge nodes, and (iv) end-user devices.
We represent the network infrastructure as a weighted undirected graph $G = (V, E, W)$, where $V$ is the set of physical nodes, $E \subseteq V \times V$ denotes the set of links connecting them, and the weight function $W: E \to \mathbb{R}^{+}$ represents each link's available bandwidth. Each node $v \in V$ represents a physical network entity, such as an extreme-edge device (e.g., smartphone, electric vehicle, or drone), an edge or near-edge server, or a centralized cloud data center within the CCF. Each link $e \in E$ represents a high-speed communication path, typically implemented via fiber connections. We denote the bandwidth capacity between nodes $i, j \in V$ as $B_{ij}$ [bit/s].
We consider a scenario involving multiple SFC requests arriving at the orchestrator. Each SFC request is modeled as a Directed Acyclic Graph (DAG) $f_i = (K_i, L_i, \delta_i, \zeta_i, \Lambda_i, B_i^{\min}, D_i^{\max}, \sigma_i)$, where $K_i$ denotes the set of VNFs of the $i$-th SFC; $L_i$ denotes the set of logical links between VNFs; $\delta_i$ and $\zeta_i$ denote the source and destination endpoints, respectively; $\Lambda_i$ signifies the traffic arrival time; $B_i^{\min}$ [bit/s] is the minimum bandwidth requirement; $D_i^{\max}$ [s] is the maximum tolerable end-to-end delay; and $\sigma_i$ [cycle/s] is the total computational demand across all VNFs. Each VNF $k \in K_i$ represents a softwarized network function that can process incoming packets. The logical links $(k_i, k_j) \in L_i$ represent the connections between successive VNFs $k_i$ and $k_j$, which encode a sequential dependency between VNFs.
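As a minimal sketch of the request tuple above, the fields of $f_i$ can be carried in a small record type. The class name `SFCRequest`, the field names, and the example values are illustrative assumptions, and the logical links are derived under the simplifying assumption of a linear chain rather than a general DAG.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SFCRequest:
    """Hypothetical container for f_i = (K_i, L_i, delta_i, zeta_i, Lambda_i, B_min, D_max, sigma_i)."""
    vnfs: tuple            # K_i: ordered VNF identifiers
    source: str            # delta_i: source endpoint
    destination: str       # zeta_i: destination endpoint
    arrival_time: float    # Lambda_i [s]
    min_bandwidth: float   # B_i^min [bit/s]
    max_delay: float       # D_i^max [s]
    cpu_demand: float      # sigma_i [cycle/s], total over all VNFs

    @property
    def logical_links(self):
        """L_i: links between successive VNFs (assumes a simple linear chain)."""
        return tuple(zip(self.vnfs, self.vnfs[1:]))


# Illustrative request; all values are made up for the example.
request = SFCRequest(
    vnfs=("NAT", "FW", "IDPS"),
    source="ue-1",
    destination="cloud-1",
    arrival_time=0.0,
    min_bandwidth=10e6,
    max_delay=0.05,
    cpu_demand=3e9,
)
print(request.logical_links)
```

A frozen dataclass is used here so that a request cannot be modified once it has been queued, which matches the assumption that tenants specify the request before it reaches the orchestrator.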
To support concurrent deployment, we consider a batch of $N$ active SFC requests submitted simultaneously for deployment, denoted by $F = (f_1, f_2, \ldots, f_N)$. These requests, predefined according to application-specific requirements and submitted by tenants, are orchestrated in parallel over the physical infrastructure. The topology $f_i$ of an SFC is determined by the application it serves and is assumed to be specified by the tenant and forwarded to the network management plane for processing and deployment.
Each VNF $k \in K_i$ maintains an internal state, making stateful migration essential to preserve session continuity and avoid service disruption during reallocation. Given user mobility and fluctuating link quality, proactive resource management and adaptive state transfer are essential. The selection of a target node for migrating a stateful VNF can be modeled as a tuple $S_k = (M_i, D_c, Q_v, P_s, T_m)$, where $M_i$ is the size of the context to be migrated, $D_c$ represents the deployment cost of the service on the target node, $Q_v$ denotes the SLA violation impact during migration, $P_s$ indicates the congestion level along the selected migration path, and $T_m$ is the total migration time.
C. Problem Formulation
We formulate the problem of simultaneous, elastic self-scaling placement of VNFs for a batch of $N$ active SFC requests over a shared physical network, with the goal of determining the optimal placement that minimizes end-to-end SFC latency. The total end-to-end delay of an SFC typically comprises propagation, communication, queuing, VNF computing, and virtualization delays. For tractability, we simplify our formulation by considering communication delay, VNF computing delay, and queuing delay. Our objective is to determine an optimal allocation that minimizes overall system latency, energy consumption, and migration cost.
Fig. 2: Several SFC requests arrive at the orchestrator in queues, while multiple intelligent agents concurrently deploy VNFs over the CCF.
For each SFC request $f_i$, we define the allocation vector as $\alpha_i = (\alpha_{i,1}, \alpha_{i,2}, \ldots, \alpha_{i,|K_i|}) \in V^{|K_i|}$, where each element $\alpha_{i,j} \in V$ represents the physical node on which the $j$-th VNF of the SFC request $f_i$ is deployed. The complete set of allocation vectors for a batch of $N$ SFC requests is denoted by $A = \{\alpha_1, \alpha_2, \ldots, \alpha_N\} \in \Omega$, where $\Omega = \prod_{i=1}^{N} V^{|K_i|}$ is the generalized Cartesian product of allocation vectors. Specifically, $\Omega = \{(\alpha_1, \ldots, \alpha_N) \mid \alpha_i \in V^{|K_i|}, \forall i \in \{1, \ldots, N\}\}$ defines the feasible solution space comprising tuples of allocation vectors over the heterogeneous domains $V^{|K_1|}, V^{|K_2|}, \ldots, V^{|K_N|}$, corresponding to the variable number of VNFs across the $N$ SFC requests.
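To give a feel for how quickly the solution space $\Omega$ grows, the following sketch enumerates it for a toy instance. The node names and VNF counts are illustrative assumptions; the point is only that $|\Omega| = \prod_i |V|^{|K_i|}$, which makes exhaustive search impractical for realistic topologies.

```python
from itertools import product
from math import prod

nodes = ["cloud", "near-edge", "far-edge"]  # V (hypothetical three-node network)
vnf_counts = [2, 3]                         # |K_1|, |K_2| for a batch of N = 2 requests

# V^{|K_i|}: every possible allocation vector for each request.
per_request = [list(product(nodes, repeat=k)) for k in vnf_counts]

# Omega: every combination of per-request allocation vectors.
omega = list(product(*per_request))

# Closed-form size of Omega for comparison.
omega_size = prod(len(nodes) ** k for k in vnf_counts)

print(len(omega), omega_size)
```

Even this tiny instance yields 243 candidate allocations; the count scales exponentially in the total number of VNFs in the batch.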
The communication latency of an SFC request $f_i$, given the allocation $\alpha_i$, is modeled as $\Gamma(\alpha_i) = \sum_{(k_m, k_n) \in L_i} l(\alpha_{i,m}, \alpha_{i,n})$, where $l(\alpha_{i,m}, \alpha_{i,n})$ denotes the shortest-path transmission delay between VNF deployment locations $\alpha_{i,m}$ and $\alpha_{i,n}$. We define the processing latency of an SFC request $f_i$ as $P(\alpha_i) = \sum_{j=1}^{|K_i|} \bigl(P_c(\alpha_{i,j}) + P_q(\alpha_{i,j})\bigr)$, where $P_c(\alpha_{i,j})$ and $P_q(\alpha_{i,j})$ represent the computing delay and the queuing delay (waiting for processing) of the $j$-th VNF at its assigned node, respectively. The total SFC delay for a request $f_i$ is then $T(\alpha_i) = P(\alpha_i) + \Gamma(\alpha_i)$, comprising the communication, processing, and queuing delays. We therefore define the total delay for a batch of $N$ requests as $T(A) = \sum_{\alpha_i \in A} T(\alpha_i)$, representing the aggregate delay across all SFCs in the batch.
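The delay model above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the paper's simulator: the function names, the per-node delay dictionaries, and the three-node topology are hypothetical, the SFC is assumed to be a linear chain, and Dijkstra's algorithm stands in for the shortest-path delay $l(\cdot,\cdot)$.

```python
import heapq


def shortest_path_delay(link_delay, src, dst):
    """Dijkstra over an undirected graph given as {(u, v): delay_in_seconds}."""
    adj = {}
    for (u, v), d in link_delay.items():
        adj.setdefault(u, []).append((v, d))
        adj.setdefault(v, []).append((u, d))
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")  # dst unreachable


def total_delay(allocation, link_delay, compute_delay, queue_delay):
    """T(alpha_i) = P(alpha_i) + Gamma(alpha_i) for a linear chain of VNFs."""
    gamma = sum(
        shortest_path_delay(link_delay, a, b)
        for a, b in zip(allocation, allocation[1:])
    )
    processing = sum(compute_delay[n] + queue_delay[n] for n in allocation)
    return processing + gamma


# Hypothetical topology and per-node delays (seconds).
links = {("edge", "far"): 0.002, ("far", "cloud"): 0.010, ("edge", "cloud"): 0.020}
p_c = {"edge": 0.004, "far": 0.003, "cloud": 0.001}
p_q = {"edge": 0.001, "far": 0.001, "cloud": 0.000}

delay = total_delay(["edge", "far", "cloud"], links, p_c, p_q)
print(round(delay, 6))
```

Two VNFs placed on the same node contribute zero communication delay in this sketch, which reflects the incentive to co-locate successive VNFs when capacity allows.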
The energy consumption at time $t$ given an allocation $\alpha_i$ is modeled as
$$P_{\alpha_i}(t) = \sum_{v \in V} \omega_v(\alpha_i) \cdot P^{P}_{v}(t) + \sum_{v \in V} \bigl(1 - \omega_v(\alpha_i)\bigr) \cdot P^{N}_{v}(t) \tag{1}$$
where $\omega_v(\alpha_i) \in \{0, 1\}$ indicates whether node $v$ is active under allocation $\alpha_i$. The active processing energy consumption is defined as $P^{P}_{v}(t) = P^{N}_{v}(t) + \beta_v(t) \cdot \bigl(P^{\max}_{v}(t) - P^{N}_{v}(t)\bigr)$, with $P^{N}_{v}(t)$ representing baseline energy usage and $P^{\max}_{v}(t)$ the energy drawn under full utilization [16]. The utilization factor $\beta_v(t)$ according to [13] is given by
$$\beta_v(t) = w_p \cdot \frac{x^{k}_{v}(t) \cdot \nu^{k}_{v}(t) \cdot \sigma^{p}_{v}}{C^{p}_{v}} + w_u \cdot \frac{x^{k}_{v}(t) \cdot \nu^{k}_{v}(t) \cdot \sigma^{u}_{v}}{C^{u}_{v}} \tag{2}$$
where $x^{k}_{v}(t)$ indicates whether flow $k$ is processed on node $v$, $\nu^{k}_{v}(t)$ is the flow's processing rate, $\sigma^{p}_{v}$ and $\sigma^{u}_{v}$ are the resource demands per unit of computing and storage capacity, respectively, and $C^{p}_{v}$, $C^{u}_{v}$ are the corresponding available capacities. The weights $w_p$ and $w_u$ reflect the relative importance of CPU and storage utilization. Let us define $P^{\tau}_{v}(t)$ as the transmission energy required to migrate VNF $k$ from one node to another. We therefore define the total energy consumption for a batch of $N$ requests as $P(A) = \sum_{\alpha_i \in A} P_{\alpha_i}(t)$, representing the cumulative energy usage induced by the current allocation across all requests in the batch.
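A small numerical sketch may help make Equations 1 and 2 concrete. The function names, node power figures, and weights below are illustrative assumptions; the code simply applies the linear idle-to-peak power model to active nodes and charges inactive nodes their baseline power.

```python
def utilization_factor(w_p, w_u, x, nu, sigma_p, c_p, sigma_u, c_u):
    """Eq. (2): weighted CPU and storage utilization for one flow on one node."""
    return w_p * (x * nu * sigma_p) / c_p + w_u * (x * nu * sigma_u) / c_u


def active_power(p_idle, p_max, beta):
    """P^P = P^N + beta * (P^max - P^N): linear interpolation between idle and peak."""
    return p_idle + beta * (p_max - p_idle)


def allocation_power(nodes, active, beta):
    """Eq. (1): active nodes draw P^P, inactive nodes draw the baseline P^N.

    nodes:  {name: (p_idle, p_max)} in watts (hypothetical values)
    active: set of node names hosting at least one VNF (omega_v = 1)
    beta:   {name: utilization factor in [0, 1]}
    """
    total = 0.0
    for name, (p_idle, p_max) in nodes.items():
        if name in active:
            total += active_power(p_idle, p_max, beta.get(name, 0.0))
        else:
            total += p_idle
    return total


nodes = {"a": (100.0, 300.0), "b": (50.0, 150.0), "c": (20.0, 60.0)}
power = allocation_power(nodes, active={"a", "b"}, beta={"a": 0.5, "b": 0.2})
beta_example = utilization_factor(0.7, 0.3, 1, 2.0, 1e9, 4e9, 1e8, 1e9)
print(power, round(beta_example, 2))
```

Because inactive nodes still draw baseline power in this model, the energy benefit of consolidation comes from lowering utilization-dependent power rather than from switching nodes off.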
The VNF migration cost comprises both time and energy components. The total migration time of a VNF $k \in K_i$ is defined as $t_k = \sum_{(i,j) \in l_k} \frac{M_k}{B_{ij}}$, which represents the sum of the transmission times required to transfer the VNF's state of size $M_k$ over each physical link $(i, j)$ with bandwidth $B_{ij}$ along the shortest path $l_k$ from the VNF's current deployment location to its new candidate node. The associated energy consumption to migrate VNF $k$ along the shortest path $l_k$ is $e_k = \sum_{(i,j) \in l_k} P^{\tau}_{v}(t)$. We then define the SFC migration cost $M(\alpha_i) = \sum_{k \in K_i} \bigl(\lambda \cdot t_k + (1 - \lambda) \cdot e_k\bigr)$ as the sum of all VNF migration costs, where $\lambda \in [0, 1]$ is a tunable parameter used to balance the trade-off between time and energy costs, with the two quantities expressed as percentages to reconcile their different units. We therefore define the total migration cost for a batch of $N$ requests as $M(A) = \sum_{\alpha_i \in A} M(\alpha_i)$, representing the aggregate migration costs across all SFCs in the batch.
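The migration cost terms can be sketched as follows. The function names and numbers are illustrative assumptions, and the per-VNF time and energy passed to `sfc_migration_cost` are assumed to be already normalized to a common scale, since the normalization procedure itself is not spelled out in the text.

```python
def migration_time(state_size_bits, path_bandwidths):
    """t_k: sum of M_k / B_ij over the links of the migration path l_k."""
    return sum(state_size_bits / b for b in path_bandwidths)


def migration_energy(per_link_energy):
    """e_k: sum of the transmission energy spent on each link of l_k."""
    return sum(per_link_energy)


def sfc_migration_cost(vnf_costs, lam):
    """M(alpha_i): sum over VNFs of lam * t_k + (1 - lam) * e_k.

    vnf_costs: list of (t_k, e_k) pairs, assumed normalized to comparable scales.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return sum(lam * t + (1.0 - lam) * e for t, e in vnf_costs)


# 8 Mbit of VNF state moved over a 1 Gbit/s link and then a 100 Mbit/s link.
t_k = migration_time(8e6, [1e9, 1e8])

# Two migrated VNFs with normalized (time, energy) pairs, equal weighting.
cost = sfc_migration_cost([(0.2, 0.4), (0.5, 0.1)], lam=0.5)
print(round(t_k, 6), round(cost, 6))
```

The slowest link dominates the migration time, so a path with one low-bandwidth hop can outweigh several fast hops.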
Let us define $n^{k}_{v}$ as the CPU cycles per second required by a VNF $k$ when deployed on a physical node $v \in V$. Similarly, let $b^{kl}_{ij}$ denote the bandwidth consumed by the logical link $(k, l)$ of an SFC when mapped onto the physical link $(i, j) \in E$. We assume that a total of $M$ SFC requests arrive over time. To support scalable orchestration, we divide these into $m$ batches, each consisting of $N$ concurrent requests, such that $m = M/N$. In each batch, VNFs are allocated jointly for the $N$ SFCs.
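The batching step can be illustrated with a short helper. The function name and the request labels are hypothetical; the sketch also allows a final, smaller batch when $M$ is not an exact multiple of $N$, a case the text does not discuss.

```python
def split_into_batches(requests, batch_size):
    """Split M queued requests into consecutive batches of at most N requests."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [requests[i:i + batch_size] for i in range(0, len(requests), batch_size)]


queue = [f"f{i}" for i in range(1, 13)]   # M = 12 hypothetical requests
batches = split_into_batches(queue, 3)    # N = 3, so m = M / N = 4
print(len(batches), batches[0])
```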
Given that the optimization problem is multi-objective, we define $\beta = (\beta_1, \beta_2, \beta_3)$ as the weight vector balancing the trade-offs between aggregated delay, energy consumption, and migration cost. Recall that $T(A)$, $P(A)$, and $M(A)$ denote the total delay, energy consumption, and migration cost, respectively, aggregated over the batch of $N$ SFC requests under allocation $A$. The objective is to minimize the scalarized cost function $\beta^{\top} C(A)$, where $C(A) = (T, P, M)(A)$ captures the three cost components. The goal of the optimization problem is to determine the optimal allocation $A^{*} \in \Omega$ for a batch of $N$ SFC requests such that system-wide resource utilization is efficient and constraint-satisfying, as shown in Equation 3.
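$$\underset{A \in \Omega}{\text{minimize}} \quad \overbrace{\beta_1 \cdot T(A)}^{\text{Total SFC delay}} + \overbrace{\beta_2 \cdot P(A)}^{\text{Energy consumption}} + \overbrace{\beta_3 \cdot M(A)}^{\text{Migration cost}} \tag{3}$$
subject to
$$\sum_{t=1}^{m} \sum_{i \in [N]} \sum_{k \in K_i} n^{k}_{v} \le C_v, \quad \forall v \in V \tag{3a}$$
$$\sum_{t=1}^{m} \sum_{i \in [N]} \sum_{(k,l) \in L_i} b^{kl}_{ij} \le B_{ij}, \quad \forall (i, j) \in E \tag{3b}$$
$$T(\alpha_i) \le D_i^{\max}, \quad \forall i \in \{1, \ldots, N\} \tag{3c}$$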
Constraint 3a ensures that for each node in the network, the total processing demand of all VNFs assigned to that node does not exceed its available computational capacity. Constraint 3b ensures that for each physical link, the combined bandwidth requirements of all logical links mapped onto it do not exceed the link's available bandwidth capacity. Constraint 3c ensures that for each SFC request, the aggregate processing and communication delay does not exceed its E2E delay tolerance required for successful service completion.
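The scalarized objective and the three constraints can be checked for a candidate allocation with a few lines of code. This is a sketch under stated assumptions: the function names and the dictionaries describing load and capacity are hypothetical, and the loads are assumed to have been aggregated per node and per physical link beforehand.

```python
def scalarized_cost(weights, delay, energy, migration):
    """Objective of Eq. (3): beta_1 * T(A) + beta_2 * P(A) + beta_3 * M(A)."""
    b1, b2, b3 = weights
    return b1 * delay + b2 * energy + b3 * migration


def is_feasible(cpu_load, cpu_capacity, link_load, link_capacity, delays, max_delays):
    """Check constraints (3a)-(3c) for pre-aggregated loads.

    cpu_load / cpu_capacity:   {node: cycles per second}
    link_load / link_capacity: {(i, j): bit per second}
    delays / max_delays:       per-request T(alpha_i) and D_i^max, in the same order
    """
    cpu_ok = all(cpu_load.get(v, 0.0) <= cap for v, cap in cpu_capacity.items())       # (3a)
    bw_ok = all(link_load.get(e, 0.0) <= cap for e, cap in link_capacity.items())      # (3b)
    delay_ok = all(d <= d_max for d, d_max in zip(delays, max_delays))                 # (3c)
    return cpu_ok and bw_ok and delay_ok


feasible = is_feasible(
    cpu_load={"edge": 2e9, "cloud": 5e9},
    cpu_capacity={"edge": 4e9, "cloud": 16e9},
    link_load={("edge", "cloud"): 40e6},
    link_capacity={("edge", "cloud"): 100e6},
    delays=[0.022, 0.030],
    max_delays=[0.050, 0.040],
)
cost = scalarized_cost((0.5, 0.25, 0.25), delay=2.0, energy=4.0, migration=8.0)
print(feasible, cost)
```

In a learning setting, an infeasible allocation would typically be rejected or penalized; how the penalty is applied is a design choice that the formulation leaves open.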
IV. MULTI-AGENT DEEP REINFORCEMENT LEARNING FOR
DISTRIBUTED SFC DEPLOYMENT AND MIGRATION
We present MARC-6G, a distributed cooperative MARL framework that orchestrates VNF deployment and migration in dynamic networks. We recast the optimization problem in Equation 3 as a Multi-Agent Reinforcement Learning (MARL) problem, where each agent operates under a Partially Observable Markov Decision Process (POMDP), observing a local portion of the CCF, acting independently, and exchanging state, action, and reward to learn a joint policy that anticipates future conditions. The policy selects physical nodes for VNFs by accounting for inter-VNF delay, deployment cost, SLA constraints, congestion, energy, and migration cost. Agents refine strategies from experience and peer sharing, enabling scalable management of heterogeneous, time-varying workloads. The broader system-wide objective is to jointly learn a set of optimal policies, denoted as $\pi^{*} = \{\pi^{*}_{1}, \ldots, \pi^{*}_{N}\}$, which collectively maximize the performance of all agents in a cooperative manner.
A. Modeling VNF Deployment as Markov Decision Process
We model the distributed SFC deployment and migration task as a cooperative MARL problem, where the detailed formulation is expressed in terms of the joint state space $S_t$, joint action space $A_t$, and joint reward function $R_t$ at a given time $t$, as described below.
1) Joint State Space $S_t$ describes the current situation of each agent in the environment. The global state at time $t$ is the collection of local states observed by each agent. The individual state of agent $i$ is $S^{i}_{t} = \bigl(V^{K_i}_{i,t}, V^{K_2}_{i,t}, \ldots, L^{|K_i|}_{V_i,h}, G_{topo}, M, f_i\bigr)$, where $V^{K_i}_{i,t}$ indicates that VNF $K_i$ is deployed on physical node $V_i$ at time $t$, and $L^{|K_i|}_{V_i,h}$ represents the link delay between node $V_i$ (hosting $K_i$) and node $V_h$ (hosting $|K_i|$). The state information also includes the observed network topology ($G_{topo}$), the number of SFC requests in the queue $M$, and the characteristics of the SFC request $f_i$.
2) Joint Action Space $A_t$ explores optimal physical-node placements for hosting VNFs. Each action corresponds to selecting a sequence of physical nodes that meet the performance requirements of incoming SFC requests. At each time step $t$, each agent $i$ selects a sequence of physical nodes to deploy the VNFs of its SFC. The action of agent $i$ is defined as $\alpha^{i}_{t} = (\alpha_{i,1}, \alpha_{i,2}, \ldots, \alpha_{i,|K_i|}) \in V^{|K_i|}$, where each element $\alpha_{i,j} \in V$ represents the selected physical node on which the $j$-th VNF of the SFC request $f_i$ is deployed.
3) Joint Reward Function $R_t$ assigns a numerical score to each agent's decision based on the resulting network performance. The agent evaluates each physical node's placement by interpreting patterns it learns from the environment. At time $t$, the reward of agent $i$ according to Equation 3 is given by $R^{i}_{t} = -\sum_{t=0}^{T} \gamma^{t} \cdot \bigl(\beta_1 \cdot T(A) + \beta_2 \cdot P(A) + \beta_3 \cdot M(A)\bigr)$, where $T$ is the maximum time step up to which an agent learns optimal deployment policies. The collective reward is then defined as $R_t = \sum_{i=1}^{N} R^{i}_{t}$, where each $R^{i}_{t}$ depends on the joint action $a_t$ across all agents.
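The reward definition above can be sketched numerically as a discounted sum of negated per-step costs. The function names and the cost values are illustrative assumptions; each per-step cost stands for the scalarized term $\beta_1 T(A) + \beta_2 P(A) + \beta_3 M(A)$ evaluated at that step.

```python
def discounted_reward(step_costs, gamma):
    """R^i = -sum_t gamma^t * cost_t, with cost_t the scalarized objective at step t."""
    return -sum((gamma ** t) * c for t, c in enumerate(step_costs))


def collective_reward(agent_rewards):
    """R = sum over agents of their individual rewards R^i."""
    return sum(agent_rewards)


# Hypothetical per-step costs for two agents over three time steps.
r1 = discounted_reward([1.0, 2.0, 4.0], gamma=0.5)
r2 = discounted_reward([2.0, 2.0, 0.0], gamma=0.5)
total = collective_reward([r1, r2])
print(r1, r2, total)
```

Because the reward is the negated cost, maximizing it is equivalent to minimizing the discounted objective of Equation 3 over the horizon.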
B. Multi-Agent Proximal Policy Optimization for Autonomous VNF Deployment
In Multi-Agent Proximal Policy Optimization (MAPPO), distributed learning via experience sharing employs separate policy networks $\pi_{\theta_i}$, each parameterized by $\theta_i$, and separate value networks $V_{\phi_i}$, each parameterized by $\phi_i$. The RL agents exchange experiences through their value networks and policy networks to learn coordinated deployment policies. Each agent $i$ observes a local state $s^{i}_{t}$ (e.g., a subset of the network topology, current VNF placements, resource utilization, and KPI requirements of incoming traffic) and selects a deployment action $a^{i}_{t} \sim \pi_{\theta_i}(a \mid s^{i}_{t})$. After agent $i$ executes its action, it receives a reward $r^{i}_{t+1}$, which is weighted by $\beta_i$ and exchanged with its neighboring agents. Based on this reward, each agent $i$ updates its value function $V_{\phi_i}(s^{i}_{t})$ and the policy network parameters $\theta_i$.
In each training epoch, agents also share their latest policy $\pi_{\theta_i}(\cdot \mid s^{i}_{t})$ with neighbors, enabling them to anticipate others' behaviors and coordinate VNF placements across the network. Thus, by exchanging weighted reward signals $\beta_i r^{i}_{t+1}$ (i.e., the experience of an agent), MAPPO ensures that each agent's value network gains a more global perspective on performance.
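Algorithm 1: MARC-6G Operation
Input: $F = (f_1, f_2, \ldots, f_N)$, $T_{max}$, $G$, $l(i, j)$, $M$, $N$
   // Initialize policy, value, replay buffer
 1 Initialization: $\phi^{i}_{0}$, $\pi^{i}_{0}$, $D_t$
   // Randomly place VNFs initially
 2 $S^{i}_{t} = \bigl(V^{K_i}_{i,t}, V^{K_2}_{i,t}, \ldots, L^{|K_i|}_{V_i,h}\bigr)$
   // Initialize the reward to zero
 3 $R^{i}_{t} \leftarrow 0$
 4 for $t \in T_{max}$ do
 5   for $i \in N$ do
       // Each agent selects one SFC request
 6     $f_i \leftarrow$ Sample($M$, $f_i$)
 7     $c \leftarrow$ MeasureCPUrequirement($f_i$)
 8     $b \leftarrow$ MeasureBandwidth($f_i$)
 9     $l \leftarrow$ MeasureLatency($f_i$)
10     $q \leftarrow$ MonitorLinkquality($G_{topo}$)
       // Translate $f_i$ and the number of SFC requests in the queue into a state
11     $S^{i}_{t} \leftarrow (l, b, c, q, G_{topo}, M)$
       // Execute $A^{i}_{t} \sim \pi^{i}_{\theta}$ based on the current policy
12     $S^{i}_{t} \xrightarrow{A^{i}_{t},\ \pi_{\theta}(A^{i}_{t} \mid S^{i}_{t}, A^{-i}_{t})} S^{i}_{t+1}, R^{i}_{t+1}$
       // Deploy $K_i$ at its new location
13     $S^{i}_{t+1} \leftarrow V^{K_i}$
       // Store each $s^{i}_{t}, a^{i}_{t}, R^{i}_{t}$ in the replay buffer
14     $D_t \leftarrow \{s^{i}_{0}, a^{i}_{0}, r^{i}_{0}, \ldots, s^{i}_{t}, a^{i}_{t}, r^{i}_{t}\}$
15     if $l(f_i) < l(i, j)$ then
16       $A^{i}_{t} \sim \pi_{\theta}(A^{i}_{t} \mid S^{i}_{t}, A^{-i}_{t}) \xrightarrow{\text{Execute}} R^{i}_{t+1}, S^{i}_{t+1}$
         // Each agent calculates the advantage estimate
17       $A^{\pi^{i}_{\theta}}(s_t, a_t) \leftarrow Q^{i}(s, a) - V^{i}_{\phi}(s)$
         // Update the $\theta_i$ and $\phi_i$ parameters
18       $\theta^{i}_{t+1} = \arg\max_{\theta_i} \frac{1}{|D_t| T} \sum_{\tau \in D_t} \sum_{t=0}^{T} \min\Bigl(\frac{\pi_{\theta}(a_t \mid s_t)}{\pi_{\theta_t}(a_t \mid s_t)} A^{\pi_{\theta_t}}(s_t, a_t),\ g\bigl(\epsilon, A^{\pi_{\theta_t}}(s_t, a_t)\bigr)\Bigr)$
19       $\phi^{i}_{t+1} = \arg\min_{\phi} \frac{1}{|D_t|} \sum_{\tau \in D_t} \sum_{t=0}^{T} \bigl(V_{\phi}(s_t) - R_t\bigr)^{2}$
20     else
         // Relocate VNF
21       $V^{K_i}_{t+1} \leftarrow V^{K_i}_{t}$
         // Assign VNF $K_i$ optimally
22       $K_i \xrightarrow{\text{Deploy}} V^{K_i}$
23 return $\phi^{i}(s_t)$, $\pi_{\theta_i}(s_t)$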
C. MARC-6G Algorithm
The step-by-step workflow of Algorithm 1 is described below in detail. The inputs of the MARC-6G algorithm are the physical network topology $G_{topo}$, the corresponding resource capacities (e.g., link bandwidth, link delay, node CPU), and a set of SFC requests $f_i$ with their characteristics. Each agent selects a single SFC request from the pool $M$, based on its traffic arrival time $\Lambda_i$ and E2E delay requirement $D_i^{\max}$ (lines 5–6). Each agent constructs the state $s_t$ by measuring the KPIs of the SFC request $f_i$ (bandwidth, CPU, latency) and the current physical resource availability (lines 7–11). Next, each agent executes the deployment action $a_t \sim \pi_{\theta}(s_t)$ while considering the deployment actions of the others, following the current policy $\pi_{\theta}$; it updates the system to state $s_{t+1}$, sets the new VNF placement $S_{t+1} = V^{k}_{i}$, and stores the $(s^{i}_{t}, a^{i}_{t}, r^{i}_{t})$ trajectory in the replay buffer $D_t$ (lines 12–14). Then the latency requirement $l(f_i)$ is compared with the available infrastructure resources, i.e., $l(f_i) < l(i, j)$ for the given allocation vector (line 15). If the condition holds, the reward $R_t$ and the advantage $A^{\pi^{i}_{\theta}}(s_t, a_t)$ are computed (lines 16–17), and the policy parameters $\theta^{i}_{t+1}$ and value parameters $\phi^{i}_{t+1}$ are updated; this iteration is repeated until the optimal policy is developed (lines 18–19). Otherwise, the VNF is relocated, and the SFC request is deployed on or migrated to another node (lines 20–22). Finally, the algorithm returns the value parameter $\phi^{i}(s_t)$ and the policy parameter $\pi_{\theta_i}(s_t)$ (line 23).
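The policy update in line 18 of Algorithm 1 has the shape of the clipped PPO surrogate objective. The sketch below evaluates one term of that objective for a single sample. The text does not define $g$; the form used here, $g(\epsilon, A) = (1+\epsilon)A$ for $A \ge 0$ and $(1-\epsilon)A$ otherwise, is the bound commonly used in PPO-Clip and is an assumption, as are the function names and the default $\epsilon = 0.2$.

```python
def advantage(q_value, state_value):
    """Line 17: A(s, a) = Q(s, a) - V(s)."""
    return q_value - state_value


def clip_bound(eps, adv):
    """Assumed form of g(eps, A): caps how far the probability ratio can push the objective."""
    return (1.0 + eps) * adv if adv >= 0 else (1.0 - eps) * adv


def ppo_clip_term(ratio, adv, eps=0.2):
    """One summand of line 18: min(ratio * A, g(eps, A)).

    ratio is pi_theta(a | s) / pi_theta_old(a | s) for the sampled action.
    """
    return min(ratio * adv, clip_bound(eps, adv))


adv = advantage(q_value=5.0, state_value=3.0)   # A = 2.0
gain = ppo_clip_term(ratio=1.5, adv=adv)        # clipped at (1 + eps) * A
loss = ppo_clip_term(ratio=0.5, adv=-1.0)       # clipped at (1 - eps) * A
print(adv, gain, loss)
```

The clipping removes the incentive to move the new policy far from the old one in a single update, which is one reason PPO-style updates tend to remain stable when several agents learn concurrently.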
V. PERFORMANCE EVALUATION
A. Experimental Setup
We performed the simulation experiments using a custom simulator developed in Python with the NetworkX library [17] to generate USA NET [18] network topologies that represent the underlying network infrastructure. For the MARC-6G implementation, we employ the open-source libraries Gymnasium 1.0 and RLlib 2.4.0 to train and evaluate RL agents within a custom environment. Given the variability of incoming traffic, each SFC request $f_i$ is managed in a way that preserves better link quality. Multiple PPO agents operate in parallel, continuously monitoring network conditions and adjusting VNF placements in response to traffic patterns and system dynamics. We model both the state space $S$ and the action space $A$ as multi-discrete, making PPO a suitable choice due to its robustness, stability, and convergence properties, and its ability to update its policy online in discrete control tasks.
B. Baselines and Evaluation Metrics
The performance of MARC-6G is compared with centralized orchestration from the previous work [19] and with a baseline greedy VNF allocation. The greedy allocator deploys VNFs immediately without considering long-term impacts, which often leads to suboptimal resource utilization. The centralized orchestrator, Single Agent Proximal Policy Optimization (SAPPO), suffers from scalability limitations due to its reliance on a single global controller, preventing real-time adaptability in large-scale, dynamic network environments.
MARC-6G is evaluated using: (i) Reward, the weighted objective function that combines total E2E delay, energy consumption, and migration cost (Equation 3); (ii) Number of Accepted Requests, the fraction of VNF requests successfully deployed with the required performance relative to all incoming requests; (iii) Energy Consumption, highlighting energy-efficient VNF placement that minimizes usage by aggregating workloads from underutilized nodes onto a minimal set of active servers in real time; and (iv) Migration Cost, which includes the computational, energy, and bandwidth overhead incurred when migrating VNF instances to optimize resource utilization and service quality.
C. Discussion and Simulation Results
Figure 3a compares learning curves of MARC-6G, SAPPO, and a greedy (non-learning) allocator: rewards fluctuate during exploration and SAPPO leads early (no coordination overhead), but once a shared state emerges, MARC-6G's multi-agent cooperation accelerates learning past SAPPO, ultimately outperforming both SAPPO and the greedy baseline with more stable, efficient VNF placements.
Figure 3b shows that across the SFC types in [3] (ID4.0, MIoT, CG, AR, VS), MARC-6G with five agents consistently outperforms SAPPO and the greedy allocator in terms of the number of accepted requests. For ID4.0, SAPPO and MARC-6G are comparable because ID4.0 has few VNFs, reducing contention. MARC-6G deploys SFCs concurrently with five agents, whereas SAPPO and the greedy baseline place one per step; the greedy baseline is further hindered by random VNF placement.
Figure 3c shows energy consumption rising with the number of devices because VNFs are spread across many servers without accounting for over- or underprovisioning. MARC-6G lowers energy consumption by up to 12.5% and 39.2% compared to SAPPO and greedy, respectively, across device scales by learning utilization-aware, energy-efficient VNF placements.
Figure 4a shows that as the number of physical devices grows, migration cost decreases: a larger pool of nodes enables optimal nearby node selection, and MARC-6G's multi-agent monitoring further minimizes cost by up to 34% and 41.25% compared to SAPPO and greedy approaches, respectively. Figure 4b reports E2E delay for 40/70/100-node networks under concurrent SFC loads of 3, 6, 9, and 12; delay drops across all loads as node count grows, indicating that MARC-6G learns scalable, resource-aware deployments that minimize latency even under heavier traffic.
Figure 4c illustrates the scalability of MARC-6G: as the number of SFC requests, and hence agents, increases proportionally, the end-to-end delay decreases, since each agent manages only a segment of the network and the agents deploy multiple requests concurrently.
VI. CONCLUSION
This paper addresses distributed, context-aware VNF placement and migration under time-varying traffic for 6G networks, with the objective of minimizing end-to-end delay, energy consumption, and migration cost. We present MARC-6G, a MARL framework for concurrent SFC deployment that adapts online to network dynamics, placing and migrating VNFs to meet the strict delay requirements of fluctuating applications. Unlike centralized orchestration, characterized by slow, global state collection as request volume and topology size grow, MARC-6G monitors the local portion of the CCF, shares experience among agents, and updates policies in real time. Experimental results show that MARC-6G has higher request acceptance, better energy efficiency, lower migration cost, and improved scalability compared to baseline approaches.
Fig. 3: Learning performance of MARC-6G: accepted SFC requests and energy consumption across the number of devices, compared with the baseline methods. (a) Reward convergence. (b) Number of accepted requests for different specialized network services. (c) Energy-consumption comparison across different physical devices.
Fig. 4: Analysis of migration cost and scalability under varying SFC request loads and numbers of agents, and their impact on E2E delay. (a) VNF migration costs for different physical devices. (b) Variable number of physical devices and SFC requests. (c) Variable number of agents and SFC requests.
ACKNOWLEDGMENT
This work is funded by the SNS-JU 6G Cloud project under the EU Horizon Europe programme (Grant Agreement No. 101139073).
REFERENCES
[1] C. Campolo, A. Iera, and A. Molinaro, "Network for distributed intelligence: A survey and future perspectives," IEEE Access, vol. 11, pp. 52840–52861, 2023.
[2] I. Avgouleas, D. Yuan, N. Pappas, and V. Angelakis, "Virtual network functions scheduling under delay-weighted pricing," IEEE Networking Letters, vol. 1, no. 4, pp. 160–163, 2019.
[3] J. M. Ziazet, B. Jaumard, H. Duong, P. Khoshabi, and E. Janulewicz, "A dynamic traffic generator for elastic 5G network slicing," in 2022 IEEE International Symposium on Measurements & Networking (M&N). IEEE, 2022, pp. 1–6.
[4] S. Long, B. Liu, H. Gao, X. Su, and X. Xu, "Deep reinforcement learning-based SFC deployment scheme for 6G IoT scenario," in 2023 IEEE Symposium on Computers and Communications (ISCC). IEEE, 2023, pp. 1189–1192.
[5] N. Toumi, M. Bagaa, and A. Ksentini, "Machine learning for service migration: a survey," IEEE Communications Surveys & Tutorials, vol. 25, no. 3, pp. 1991–2020, 2023.
[6] T. Haga, "Orchestration of networking processes," 2007.
[7] T. Mai, H. Yao, N. Zhang, W. He, D. Guo, and M. Guizani, "Transfer reinforcement learning aided distributed network slicing optimization in industrial IoT," IEEE Transactions on Industrial Informatics, vol. 18, no. 6, pp. 4308–4316, 2022.
[8] M. A. Onsu, P. Lohan, B. Kantarci, E. Janulewicz, and S. Slobodrian, "Unlocking reconfigurability for deep reinforcement learning in SFC provisioning," IEEE Networking Letters, 2024.
[9] Q. Liu, L. Tang, T. Wu, and Q. Chen, "Deep reinforcement learning for resource demand prediction and virtual function network migration in digital twin network," IEEE Internet of Things Journal, vol. 10, no. 21, pp. 19102–19116, 2023.
[10] L. Tang, Z. Li, J. Li, D. Fang, L. Li, and Q. Chen, "DT-assisted VNF migration in SDN/NFV-enabled IoT networks via multiagent deep reinforcement learning," IEEE Internet of Things Journal, vol. 11, no. 14, pp. 25294–25315, 2024.
[11] B. R. Tanuboddi, G. Gad, Z. M. Fadlullah, and M. M. Fouda, "Optimizing VNF migration in B5G core networks: A machine learning approach," in 2024 International Conference on Smart Applications, Communications and Networking (SmartNets). IEEE, 2024, pp. 1–5.
[12] R. Chen, H. Lu, Y. Lu, and J. Liu, "MSDF: A deep reinforcement learning framework for service function chain migration," in 2020 IEEE Wireless Communications and Networking Conference (WCNC). IEEE, 2020, pp. 1–6.
[13] J. Chen, J. Chen, K. Guo, R. Hu, T. Zou, J. Zhu, H. Zhang, and J. Liu, "Fault tolerance oriented SFC optimization in SDN/NFV-enabled cloud environment based on deep reinforcement learning," IEEE Transactions on Cloud Computing, vol. 12, no. 1, pp. 200–218, 2024.
[14] Y. Zhang, R. Wang, J. Hao, Q. Wu, Y. Teng, P. Wang, and D. Niyato, "Service function chain deployment with VNF-dependent software migration in multi-domain networks," IEEE Transactions on Mobile Computing, pp. 1–18, 2024.
[15] 6G-Cloud, "D2.2 - Initial Results on Architecture, Service Interfaces and AI/ML," 6G-Cloud Consortium, Deliverable D2.2, 2025.
[16] J. A. Aroca, A. Chatzipapas, A. F. Anta, and V. Mancuso, "A measurement-based characterization of the energy consumption in data center servers," IEEE Journal on Selected Areas in Communications, vol. 33, no. 12, pp. 2863–2877, 2015.
[17] NetworkX Developers, "NetworkX," https://networkx.org/, 2025, accessed: 31 May 2025.
[18] N. Spring, R. Mahajan, D. Wetherall, and T. Anderson, "Measuring ISP topologies with Rocketfuel," IEEE/ACM Transactions on Networking, vol. 12, no. 1, pp. 2–16, 2004.
[19] S. F. Wassie, A. Di Maio, and T. Braun, "Deep reinforcement learning for context-aware online service function chain deployment and migration over 6G networks," in Proceedings of the 40th ACM/SIGAPP Symposium on Applied Computing, 2025, pp. 1361–1370.