Deep Reinforcement Learning for Context-Aware Online Service Function Chain Deployment and Migration over 6G Networks
Solomon Fikadie Wassie
University of Bern
Switzerland
[email protected]
Antonio Di Maio
University of Bern
Switzerland
[email protected]
Torsten Braun
University of Bern
Switzerland
[email protected]
ABSTRACT
The Cloud Continuum Framework (CCF) logically integrates distributed extreme edge, far edge, near edge, and cloud data centers in 6G networks. Deploying VNFs over the CCF can enhance network performance and Quality of Service (QoS) for modern delay-sensitive applications and use cases in 6G networks. Deep Reinforcement Learning (DRL) has shown potential to automate Virtual Network Function (VNF) migrations by learning optimal policies through continuous monitoring of the network environment. In this work, we leverage Deep Reinforcement Learning to optimize network control policies that continuously update VNF placement for optimal Service Function Chain (SFC) deployment in time-varying user traffic scenarios. By leveraging dynamic VNF relocation, this approach seeks to improve network performance in terms of latency, operational costs, scalability, and flexibility. This study addresses the gap in existing solutions by jointly considering network performance requirements and migration costs, providing a more comprehensive strategy for efficient VNF deployment and management. We show that our proposed DRL-based VNF deployment method achieves a 28.8% lower delay and a 34% lower migration overhead compared to state-of-the-art baselines in a broad range of large-scale simulated scenarios, showing the proposed method's scalability features.
CCS CONCEPTS
• Networks → Network architectures; Network management; Network services.
KEYWORDS
6G Network Architecture, Cloud Continuum Framework, Service Orchestrator, Deep reinforcement learning
ACM Reference Format:
Solomon Fikadie Wassie, Antonio Di Maio, and Torsten Braun. 2025. Deep Reinforcement Learning for Context-Aware Online Service Function Chain Deployment and Migration over 6G Networks. In The 40th ACM/SIGAPP Symposium on Applied Computing (SAC '25), March 31-April 4, 2025, Catania, Italy. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3672608.3707975
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
SAC '25, March 31-April 4, 2025, Catania, Italy
©2025 ACM.
ACM ISBN 979-8-4007-0629-5/25/03
https://doi.org/10.1145/3672608.3707975
1 INTRODUCTION
Software Defined Network (SDN) and Network Function Virtualization (NFV) are key technologies that enable elastic resource provisioning for Virtual Network Functions (VNFs) using virtualization technology. This approach leverages the potential of SDN technology, replacing traditional hardware-based network functions with software programs. Service functions are typically deployed as Service Function Chains (SFCs), which consist of multiple VNFs in a predefined sequence to deliver end-to-end services. These VNFs can be hosted in a virtualized environment on standard Commercial Off-The-Shelf (COTS) servers, reducing both Capital Expenditure (CAPEX) and Operational Expense (OPEX) for network operators [1], [2].
Network Address Translation (NAT), Intrusion Detection and Prevention System (IDPS), Firewall (FW), Load Balancer (LB), Video Optimization Controller (VOC), Traffic Monitoring (TM), WAN Optimizer (WO), Deep Packet Inspection (DPI), and more VNFs can be interconnected in specific predefined sequences to create SFC requests, enabling the provision of specialized network services such as Video Streaming (VS), Augmented Reality (AR), Virtual Reality, Industry 4.0 (Ind 4.0), Holographic-Type Communications, Smart Factory, autonomous driving, cloud gaming, and the tactile industrial Internet [3, 4].
The main challenge for an Internet Service Provider (ISP) in enhancing Quality of Service (QoS) and Quality of Experience (QoE) is determining the optimal VNF deployment locations to meet stringent, variable service requests. Optimal VNF placement on physical servers is crucial for network performance, OPEX, and reliability [5], [6]. Machine learning models, particularly Deep Learning (DL) and Reinforcement Learning (RL), make VNF deployment dynamic and adaptive, enabling real-time adjustments. DL handles complex high-dimensional features, while RL optimizes strategies through interaction with network states, improving performance, reliability, and service continuity while reducing operational costs.
Few studies have explored VNF deployment and migration under time-varying traffic, typically involving traffic prediction and a migration index to represent node load trends. However, this approach is complex, since it needs both traffic prediction and node scheduling based on load. Many works address VNF deployment, migration, and SFC reconfiguration in three stages: VNF Resource Prediction, SFC Deployment Optimization, and Destination Node Scheduling [7], [8], [9]. A DRL-based approach enables intelligent agents to monitor real-time network performance, adapt to traffic variations, and continuously improve decision-making by tracking user traffic and node status through periodic interactions and feedback from the environment. Fluctuating VNF resource requirements due to time-varying user traffic require service relocation to maintain reliability and performance. The key research question is how to optimally deploy VNFs while considering both time-varying user traffic and the variable underlying network infrastructure.
We propose DRL-based approaches for automatic VNF deployment and network-state-adaptive network reconfiguration. To the best of our knowledge, this paper is the first to address VNF deployment and migration from the extreme edge, through the edge, to cloud data centers, aiming for long-term operational cost benefits and network-state-adaptive VNF deployment within the 6G mobile network architecture. The three key contributions of this paper are as follows:
• Develop a data-driven service orchestrator, which is part of the 6G network architecture management plane, to manage network-state-aware VNF deployment for delay-sensitive applications.
• Develop an intelligent network function deployment management entity to place VNFs, aimed at predicting the effect of long-term operational costs, network performance, and reliability by continuously monitoring time-varying user traffic demand and network infrastructure.
• Propose a novel DRL-based connectivity continuum with VNF migration from the extreme edge through the edge to the cloud over the CCF of the 6G network architecture.
The remainder of the paper is organized as follows: Section 2 describes the related works and the considered scenario. Section 3 presents the system model. Section 4 outlines the proposed methods. Section 5 explains the experimental setup and simulation results. Finally, Section 6 draws the conclusions.
2 RELATED WORKS
Recent research has approached the problems of VNF deployment, self-scaling, and elastic resource allocation from various perspectives. We have reviewed studies from recent years that attempt to solve the VNF deployment problem across three categories: resource provisioning, VNF migration, and resource prediction and scheduling, often leveraging predictions of time-varying user traffic. While many studies focus on QoS-aware VNF deployment and migration across distributed data centers, a few approaches have explored DRL as a potential solution. Significant efforts have optimized VNF placement to enhance network performance. Despite these advances, automatic and network-adaptive operational cost and long-term effect considerations, as well as efficient, reliable, and scalable VNF deployment in large-scale networks, remain challenging.
2.1 Resource provisioning
Researchers have addressed the VNF placement and chaining problem as a resource provisioning issue, focusing on flexible resource allocation to meet service requirements and service level agreements. They primarily consider how much resource allocation is needed to satisfy QoS and network performance, but often overlook the impact of external traffic fluctuations [10], [11]. Furthermore, many studies tackled VNF placement in MEC-NFV networks, formulating optimization models to enhance resource utilization through deep learning techniques that intelligently select nodes and place VNFs for SFC requests. Service deployment typically involves allocating a Virtual Network Function Forwarding Graph to meet the QoS requirements of VNFs [12], [13].
2.2 VNF migration
NFV technologies enable VNFs as software-based network services. However, frequent user mobility necessitates re-scaling and re-provisioning of VNFs. Akrem et al. [14] address this with an AI-Based Network-Aware Service Function Chain Migration for 5G, enabling low-latency slice transfers between service areas. Like Virtual Machine (VM) migration and serverless computing, stateful VNFs can migrate within telecom data centers, facilitating context transfer across geographically distributed setups.
Li et al. [15] proposed a joint resource optimization and delay-aware VNF migration method focusing on resource availability and delay constraints. He et al. [16] introduced an SLA-aware approach for multiple migration planning in SDN-NFV clouds, optimizing sequence and timing to minimize migration time and prevent QoS degradation. However, these heuristics overlook dynamic variations in link quality and computational resources over time.
2.3 SFC resource requirement predictions and scheduling
This approach predicts the time-varying resource and QoS requirements of SFC requests, proactively allocating resources on available nodes based on fluctuating user traffic demands. This proactive allocation is essential for efficiently addressing the VNF deployment and resource provisioning problem. Efficient scheduling aims to reduce total deployment cost and communication cost and to enhance QoS by dynamically allocating resources based on time-varying demand. Gu et al. [17] proposed a mixed-integer linear programming solution for VNF deployment and flow scheduling in distributed data centers, considering network topology, VNF instances, and deployment. Tang et al. [18] developed a method predicting future resource needs based on time-varying user traffic and deep belief networks, addressing dynamic VNF resource requirements. The parameters considered in this method, compared with other literature, are shown in Table 1.
Table 1: Comparison of Related Works
Parameters       | End-to-end delay | Concurrent VNF migration | Node resource | Variable traffic | Stateful VNF migration | Migration cost
[10]-2020        | ✓                | ×                        | ✓             | ×                | ×                      | ×
[16]-2020        | ✓                | ✓                        | ×             | ×                | ✓                      | ×
[5]-2021         | ×                | ✓                        | ✓             | ×                | ×                      | ✓
[7]-2021         | ✓                | ✓                        | ×             | ×                | ✓                      | ✓
[14]-2022        | ✓                | ✓                        | ✓             | ×                | ×                      | ✓
[8]-2023         | ✓                | ×                        | ✓             | ✓                | ×                      | ×
[6]-2023         | ✓                | ×                        | ✓             | ✓                | ×                      | ✓
[13]-2024        | ✓                | ×                        | ✓             | ×                | ×                      | ×
Proposed Method  | ✓                | ✓                        | ✓             | ✓                | ✓                      | ✓
Figure 1: Illustration of the proposed native AI 6G network architecture with network-state-adaptive VNF deployment.
3 SYSTEM MODEL AND PROBLEM
FORMULATION
3.1 6G Network Architecture
We envision a 6G network architecture, depicted in Figure 1, composed of three main components: the Cloud Continuum Framework (CCF), the Management and Orchestration Framework (MOF), and the Artificial Intelligence and Machine Learning Framework (AIMLF). Each framework can use message buses both internally and for inter-framework communication. Specifically, the MOF message bus is used for MOF and AIMLF communication, while the AIMLF message bus facilitates communication between CCF and AIMLF.
3.1.1 Cloud Continuum Framework (CCF). The CCF offers a unified resource pool that orchestrates resources across multiple clouds and dynamically composes network and cloud resources from the extreme edge to central clouds based on availability and service requirements. The nodes in a CCF can be classified, based on their geographical proximity to the end user and their computational capacity, into the categories of Cloud, Near-edge, Far-edge, and Extreme-edge. It integrates AI-driven resource management to optimize utilization and energy efficiency by predicting demand and dynamically adjusting allocations in real time. Additionally, the framework provides business interfaces for cloud providers to enhance Service Level Agreements (SLAs) and ensure security, reliability, trust, and energy efficiency.
3.1.2 Management and Orchestration Framework (MOF). The MOF orchestrates network services across the cloud continuum in the 6G service-oriented network, integrating various technological domains and supporting AI/ML frameworks for real-time monitoring and updates of AI-driven functions. Its distributed management approach separates concerns, federates functional domains, and delivers end-to-end network services (E2E NS). It also allows tenants to request deployment, modification, or termination of networks or applications.
Figure 2: Example of a physical network containing a Cloud Data Center and several interconnected MEC servers serving one or more RAN domains, which in turn serve a diverse set of User Equipments (UEs), AR headsets, and IoT devices. Numbered steps: (1) SFC request traffic generation, (2) environmental state information $s_i = (B_i, D_i, \sigma_i)$, (3) VNF deployment action.
In the 6G network architecture, each Service Orchestrator (SO) is tightly integrated with the Operations Support System (OSS) and Business Support System (BSS), which handle the orchestration and lifecycle management of Network Services (NS) as a set of VNFs. The OSS manages network operations like monitoring, fault management, and performance, while the BSS oversees business tasks like billing and customer management. Together, they ensure efficient service deployment, scaling, and resource optimization. SOs automate FCAPS functions (Fault, Configuration, Accounting, Performance, and Security), ensuring network health and security. At the business layer, OSS/BSS handles Lifecycle Management (LCM) requests for NS, distributing them across orchestration domains to ensure programmability and service integration across diverse cloud environments. This coordination of technical and business layers enables the architecture to adapt to service demands efficiently while maintaining seamless business operations.
3.1.3 Artificial Intelligence and Machine Learning Framework (AIMLF). Designed to provide unified AI/ML management and orchestration across various segments and frameworks of the 6G network architecture, the AIMLF supports the development, training, and distribution of AI/ML models. It incorporates mechanisms for continuous integration and continuous development (CI/CD) of AI/ML deployment within a service-oriented architecture. The framework is capable of evaluating and updating AI-driven functions during system runtime and utilizes resources provided by the CCF. Additionally, the AIMLF employs a network digital twin to improve AI/ML training, enhance simulation behavior, and optimize AI-driven functions, increasing efficiency and adaptability.
3.2 System Model
End-user devices generate dynamic traffic patterns and initiate network-access requests through base stations, as shown in Figure 1. The traffic traverses the application layer in the proposed 6G network architecture, requesting an orchestrator to deploy VNFs optimally while meeting strict performance requirements. The service orchestrator, part of the MOF, receives this traffic and determines the optimal deployment on physical servers within the CCF, over the Cloud, Near-edge, Far-edge, and Extreme-edge. We model a physical network at a high level as shown in Figure 2, where Mobile Edge Computing (MEC) nodes are connected to the cloud via high-speed fiber. All computational and communication resources of the edge nodes are controlled by an overlay controller (i.e., the RL agent), which orchestrates SDN entities at the cloud DC.
The network infrastructure consists of: (i) a cloud data center (DC) capable of hosting the proposed DRL algorithm and potentially deploying multiple VNFs; (ii) several MEC data centers $v_i$, each with a CPU capacity $C_i$ [cycle/s], positioned near end-users to minimize latency; and (iii) a set of end-user devices that access network services through base stations, as depicted in Figure 2. We consider heterogeneous resources and varying link capacities among distributed edge servers, where delays between VNFs on different servers, dictated by dynamic user traffic, affect the optimal VNF deployment locations. In such a system, end-user devices such as smartphones, AR headsets, and IoT devices request dynamic communication and computational resources through base stations, initiating network-access requests (e.g., registration, attach requests, and radio resource requests for channel allocation). These devices produce dynamic traffic, as shown in Figure 2 in step 1, with strict Key Performance Indicator (KPI) requirements, demanding flexible resource allocation for optimal computing and communication performance. The RL agent operates as an intelligent SDN controller, processing complex state information to optimize network performance and service delivery. The RL agent continuously adapts to network state information, which serves as its input (Figure 2, step 2). The environmental state information includes traffic loads, resource availability (e.g., MEC CPU, memory, bandwidth), network performance metrics (e.g., delay, link utilization), and infrastructure status (e.g., edge node status). Based on this analysis, the RL agent predicts optimal VNF deployment actions and dynamically allocates resources for efficient operation, as shown in Figure 2, step 3.
To model the physical network described above, we consider a scenario involving a physical network infrastructure modeled as an undirected graph $G = (V, E)$, where $V$ and $E$ represent the sets of physical network nodes and links between nodes, respectively. Each node $v \in V$ represents a physical network entity, such as an extreme-edge node (i.e., a UE end device such as a smartphone, electric vehicle, or drone), an edge server, or a central cloud data center within the CCF, as depicted in Figure 1. Each link $e \in E$ corresponds to a physical network connection, which represents high-speed fiber links between nodes. The bandwidth (BW) capacity of the physical link between nodes $v_i, v_j \in V$ is denoted as $B_{ij}$ [bit/s].
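As a minimal sketch of this graph model, the following Python snippet stores $G = (V, E)$ with a CPU capacity per node and a bandwidth capacity $B_{ij}$ per undirected link. The class name `PhysicalNetwork`, its methods, and the example node names and capacities are illustrative assumptions and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalNetwork:
    """Undirected graph G = (V, E) with node CPU and link bandwidth capacities."""
    cpu: dict = field(default_factory=dict)        # node v -> C_v [cycle/s]
    bandwidth: dict = field(default_factory=dict)  # frozenset({i, j}) -> B_ij [bit/s]

    def add_node(self, v, cpu_capacity):
        self.cpu[v] = cpu_capacity

    def add_link(self, i, j, capacity):
        if i == j:
            raise ValueError("self-loops are not part of the model")
        if i not in self.cpu or j not in self.cpu:
            raise KeyError("both endpoints must be added as nodes first")
        # A frozenset key makes (i, j) and (j, i) refer to the same link.
        self.bandwidth[frozenset((i, j))] = capacity

    def link_capacity(self, i, j):
        return self.bandwidth[frozenset((i, j))]

    def neighbors(self, v):
        return sorted(next(iter(e - {v})) for e in self.bandwidth if v in e)


# Hypothetical three-tier example: extreme edge, edge server, cloud DC.
net = PhysicalNetwork()
net.add_node("ue", 1e9)
net.add_node("edge", 3e9)
net.add_node("cloud", 6e9)
net.add_link("ue", "edge", 100e6)
net.add_link("edge", "cloud", 120e6)
```

Keying links by an unordered pair mirrors the undirected graph assumption, so a lookup gives the same capacity in either direction.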
We model a generic SFC as a Directed Acyclic Graph (DAG) $H = (K, L)$, where $K$ represents the set of VNFs within the SFC and $L$ denotes the set of logical links between VNFs. Each VNF $k \in K$ represents a softwarized network function that can process incoming packets. The logical links $(k_i, k_j) \in L$ represent the connections between successive VNFs $k_i$ and $k_j$. The topology $H$ of an SFC depends on the application that the SFC aims to support, and we assume it is already determined by the tenant and submitted to the network management plane for acceptance and deployment. The position and logical order of VNFs in an SFC are critical parameters for network performance. The detailed granularity of traffic arrival at the orchestrator is depicted in Figure 3. End-to-end user traffic from the application layer generates multiple SFC requests ($f_1, f_2, f_3, \ldots, f_n$), which are transmitted to the service orchestrator. These requests follow a standardized structure within a defined queuing model. The communication patterns encompass various interaction types, including human-to-human (H2H), machine-to-machine (M2M), and machine-to-human (M2H) communication. The service orchestrator processes the incoming traffic, identifies the relevant VNFs and their interdependencies, and maps them to appropriate physical nodes. This process ensures optimal operation while maintaining logical connections between VNFs across the network infrastructure within the Cloud Continuum Framework.
We define an SFC request as a tuple $f_i = (H_i, B_i^{\min}, D_i^{\max}, \sigma_i)$, where $H_i = (K_i, L_i)$ is the SFC's topology, $B_i^{\min}$ [bit/s] represents the minimal end-to-end bandwidth requirement, $D_i^{\max}$ [s] is the maximum allowable end-to-end delay, and $\sigma_i$ [cycle/s] denotes the overall SFC's computational capacity requirement. As the incoming traffic pattern from user traffic fluctuates over time, and taking into account the network design, packet arrivals at the orchestrator follow a Poisson distribution with mean arrival intensity rate $\lambda_i$. We denote the sequence of SFC requests arriving at the network's resource orchestrator to be deployed onto the physical network as $F = (f_1, f_2, \ldots, f_n)$, where $f_i$ indicates the $i$-th SFC request in the queue.
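The request tuple and the arrival model can be sketched in Python as follows. This is an illustration under stated assumptions: Poisson arrivals are generated as a Poisson process with exponentially distributed inter-arrival times, and the names `SFCRequest` and `poisson_arrival_times` as well as all numeric values are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SFCRequest:
    """SFC request f_i = (H_i, B_i^min, D_i^max, sigma_i) with H_i = (K_i, L_i)."""
    vnfs: tuple           # K_i: VNFs in logical order
    links: tuple          # L_i: logical links (k, l) between VNFs
    min_bandwidth: float  # B_i^min [bit/s]
    max_delay: float      # D_i^max [s]
    cpu_demand: float     # sigma_i [cycle/s]


def poisson_arrival_times(rate, horizon, rng):
    """Arrival times in [0, horizon) of a Poisson process with intensity `rate`."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    times = []
    t = rng.expovariate(rate)  # first exponential inter-arrival time
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


# Video-streaming chain from the text; numeric requirements are made up.
vnfs = ("NAT", "FW", "TM", "VOC", "IDPS")
request = SFCRequest(
    vnfs=vnfs,
    links=tuple(zip(vnfs, vnfs[1:])),  # linear chain: successive VNF pairs
    min_bandwidth=50e6,
    max_delay=0.020,
    cpu_demand=2e9,
)
arrivals = poisson_arrival_times(rate=5.0, horizon=10.0, rng=random.Random(0))
```

Passing the random generator explicitly keeps a simulation run reproducible for a fixed seed.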
Each VNF $k \in K$ is associated with a specific state, making stateful migration essential for maintaining service continuity and optimizing network performance. The target node selection for each state can be modeled as a tuple $\mathcal{S}_k = (M_i, D_c, Q_v, P_s, T_m)$, where $M_i$ [B] is the size of the context information to be migrated, $D_c$ represents service deployment costs, $Q_v$ is the impact of SLA violations during migration, $P_s$ is the congestion status of the selected path, and $T_m$ [s] is the total migration time. Stateful migration preserves active sessions and data processing, minimizing disruptions. The inability to relocate services may result in failures, leading to interruptions and increased delays.
Our proposed method can also be applied to systems where SFCs are modeled as generic Directed Acyclic Graphs (DAGs), supporting emerging 6G mission-critical applications with stringent KPI requirements. The ingress traffic from the application layer first enters the NAT as inbound traffic for many network services, while the outbound traffic from the last VNF often exits through the IDPS. Several studies [19] show that egress traffic depends on the network services and is not unique to specific VNFs (e.g., Ind 4.0 traffic exits through the FW).
To illustrate how traffic traverses VNFs for various applications, VNFs are logically arranged in sequence. For example, VNFs are organized as $K = (\text{NAT}, \text{FW}, \text{TM}, \text{VOC}, \text{IDPS})$ for video streaming, and as $K = (\text{NAT}, \text{TM}, \text{encryption}, \text{decompression}, \text{decryption})$ for autonomous vehicles. An augmented reality (AR) application typically employs a linear SFC request $f_i$, with VNFs ordered as $K = (\text{NAT}, \text{FW}, \text{VOC}, \text{TM}, \text{WO}, \text{IDPS})$. Incoming traffic first enters the NAT for address translation, then passes through the firewall (FW) to filter unauthorized traffic. Next, it goes through the VOC for video quality and bandwidth optimization. The traffic manager (TM) analyzes patterns, while the WAN Optimizer (WO) enhances performance by optimizing data flow and reducing latency. Finally, the IDPS scans for malicious activity.
Figure 3: SFC deployment with generic structure.
3.3 Problem Formulation
We formulate the problem of elastically auto-scaling VNF deployment over a physical network, aiming to determine the optimal physical location of VNFs by minimizing the latency of SFC requests across the physical network. This latency comprises propagation delay, transmission delay, queuing delay, and processing delay. However, we consider only the transmission delay and the VNF computing delay. The optimization problem is formulated as follows. We define the SFC allocation vector $\alpha = (\alpha_1, \ldots, \alpha_{|K|}) \in V^{|K|}$, where each component $\alpha_i$ represents the physical node in $V$ on which the $i$-th VNF in the SFC is deployed. Let us define the VNF processing latency $P(\alpha_i)$ on node $\alpha_i$ as the time needed by the $i$-th VNF to perform its computing task when deployed on node $\alpha_i$, and we define the worst-case SFC processing latency as the sum $P(\alpha) = \sum_{i=1}^{|K|} P(\alpha_i)$ of the processing delays of all the SFC's VNFs. We also define the VNF communication latency $l(\alpha_i, \alpha_j)$ as the shortest-path latency between VNF deployment locations $\alpha_i$ and $\alpha_j$ over the physical links $E$, and we define the worst-case SFC communication latency as the sum $\Gamma(\alpha) = \sum_{(i,j) \in L} l(\alpha_i, \alpha_j)$ of the VNF communication latencies over all the SFC's links. Finally, we define the total SFC delay as the sum of the SFC processing and communication latencies, i.e., $P(\alpha) + \Gamma(\alpha)$. Even though the functions $P$ and $\Gamma$ depend on the physical network topology $G$, the SFC topology $H$, and their time-variant properties, we drop such dependency in the notation for conciseness.
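The total SFC delay $P(\alpha) + \Gamma(\alpha)$ can be sketched in Python as below. The sketch assumes that per-link latencies and per-placement processing latencies are given as plain dictionaries, and it uses Dijkstra's algorithm for the shortest-path latency $l(\alpha_i, \alpha_j)$. The function names and the example values are hypothetical.

```python
import heapq


def shortest_path_latency(latency, src, dst):
    """Dijkstra over an undirected graph given as {(i, j): latency in seconds}."""
    adj = {}
    for (i, j), w in latency.items():
        adj.setdefault(i, []).append((j, w))
        adj.setdefault(j, []).append((i, w))
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")  # dst unreachable from src


def total_sfc_delay(alpha, logical_links, processing, latency):
    """P(alpha) + Gamma(alpha) for a placement alpha: VNF name -> physical node."""
    p = sum(processing[(k, v)] for k, v in alpha.items())
    gamma = sum(
        shortest_path_latency(latency, alpha[k], alpha[l]) for k, l in logical_links
    )
    return p + gamma


# Hypothetical three-node network and a three-VNF linear chain.
latency = {("edge1", "edge2"): 0.003, ("edge2", "cloud"): 0.004, ("edge1", "cloud"): 0.012}
alpha = {"NAT": "edge1", "FW": "edge2", "IDPS": "cloud"}
links = [("NAT", "FW"), ("FW", "IDPS")]
processing = {("NAT", "edge1"): 0.001, ("FW", "edge2"): 0.002, ("IDPS", "cloud"): 0.0005}
delay = total_sfc_delay(alpha, links, processing, latency)
```

In this example the processing part contributes 3.5 ms and the two logical links contribute 3 ms and 4 ms, for a total of 10.5 ms. Two VNFs placed on the same node contribute zero communication latency.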
Another parameter we considered in formulating the objective function to select the optimal location of a stateful VNF is its migration time, which depends on the size of the VNF and the throughput of all links constituting the shortest path from the VNF's source node to a candidate target relocation node. We define the SFC's $k$-th VNF migration time $t_k = \sum_{(i,j) \in p_k} \frac{M_k}{B_{ij}}$ as the sum of all the times needed to transmit the VNF's state of size $M_k$ over all physical links $(i, j) \in p_k$ that constitute the shortest path $p_k$ from the VNF's previous deployment location to the new candidate deployment node $\alpha_k$, which depends on the time-varying VNF state size and network conditions. We define the worst-case SFC migration time $T(\alpha) = \sum_{k \in K} t_k$ as the sum of all VNF migration times, from their deployment locations to their respective candidate target nodes represented by $\alpha$. It is worth noting that the migration time might not be the only cost operators incur for migrating SFCs; other costs include, for example, economic or energy expenses for infrastructure activation, SLA violations, bandwidth quota excess, the size of the migrated VNF context information, the energy consumed by the migration task, et cetera. Therefore, we consider $T(\alpha)$ as a more generic definition of SFC migration cost that may not necessarily be expressed in latency but includes other financial, energy, and resource aspects.
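The migration-time definitions $t_k$ and $T(\alpha)$ translate directly into code. The following Python sketch assumes the migration path $p_k$ is already known as a list of nodes and that the state size is expressed in bits, so that dividing by a bandwidth in bit/s yields seconds. The function names and example values are hypothetical.

```python
def vnf_migration_time(state_size_bits, path, bandwidth):
    """t_k: sum over links (i, j) on the path p_k of M_k / B_ij, in seconds."""
    total = 0.0
    for i, j in zip(path, path[1:]):
        # Links are undirected, so accept either orientation of the key.
        b = bandwidth.get((i, j), bandwidth.get((j, i)))
        if b is None or b <= 0:
            raise ValueError(f"no usable link between {i} and {j}")
        total += state_size_bits / b
    return total


def sfc_migration_time(state_sizes, paths, bandwidth):
    """T(alpha): sum of t_k over all VNFs k of the SFC."""
    return sum(
        vnf_migration_time(state_sizes[k], paths[k], bandwidth) for k in paths
    )


# Hypothetical example: a 1 MB state (8e6 bits) crossing two links.
bw = {("a", "b"): 100e6, ("b", "c"): 50e6}
t_fw = vnf_migration_time(8e6, ["a", "b", "c"], bw)
t_total = sfc_migration_time(
    {"FW": 8e6, "NAT": 8e6},
    {"FW": ["a", "b", "c"], "NAT": ["a"]},  # NAT stays in place
    bw,
)
```

A VNF that is not relocated has a single-node path and therefore contributes zero migration time, so $T(\alpha)$ only charges the VNFs that actually move.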
Given that the optimization problem is multi-objective, we define $\beta = (\beta_1, \beta_2)$ as the weight factors that balance the trade-off between delay and migration cost when selecting physical nodes, and aim to minimize $\beta^\top C(\alpha)$, where $C(\alpha) = (P(\alpha) + \Gamma(\alpha), T(\alpha))$. Let us define $n_v^k$ as the cycles/s used by a VNF $k$ when deployed on node $v \in V$, and $b_{ij}^{kl}$ as the bandwidth in bit/s used by an SFC's logical link $(k, l)$ if deployed over the physical link $(i, j) \in E$.
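The optimization problem's goal is to find the optimal SFC allocation vector $\alpha^*$ for each SFC request $f_i$, which minimizes the objective function $\beta^\top C(\alpha)$ under a set of infrastructure-induced constraints, as in Equation 1.
$$\underset{\alpha \in V^{|K|}}{\text{minimize}} \quad \overbrace{\beta_1 \cdot (P(\alpha) + \Gamma(\alpha))}^{\text{Total SFC delay}} + \overbrace{\beta_2 \cdot T(\alpha)}^{\text{SFC migration cost}} \tag{1}$$
$$\text{subject to} \quad \sum_{K_i : i \in [n]} \sum_{k \in K_i} n_v^k \le C_v, \quad \forall v \in V \tag{1a}$$
$$\sum_{L_i : i \in [n]} \sum_{(k,l) \in L_i} b_{ij}^{kl} \le B_{ij}, \quad \forall (i, j) \in E \tag{1b}$$
$$P(\alpha) + \Gamma(\alpha) \le D^{\max} \tag{1c}$$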
Constraint 1a imposes that the sum of the processing requirements of all VNFs deployed on each node in the system must not exceed the locally available computational capacity. Constraint 1b indicates that the bandwidth usage of all SFC requests must remain within the available bandwidth capacity of the network's physical links. Finally, Constraint 1c implies that the total communication and processing delay of an SFC request does not exceed the E2E delay tolerance limit required for the successful completion of the given service.
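To make the objective and the three constraints concrete, the following Python sketch evaluates the weighted cost of Equation 1 and reports which of Constraints 1a, 1b, and 1c a candidate deployment violates. It assumes the per-VNF CPU usage, per-link bandwidth usage, total delay, and migration cost have already been computed; the function names and all numbers are hypothetical.

```python
from collections import defaultdict


def weighted_cost(beta, delay, migration_cost):
    """Objective of Equation 1: beta_1 * (P + Gamma) + beta_2 * T."""
    b1, b2 = beta
    return b1 * delay + b2 * migration_cost


def violated_constraints(vnf_loads, link_loads, cpu_capacity, link_capacity,
                         delay, max_delay):
    """Return the labels of the violated constraints among 1a, 1b, and 1c.

    vnf_loads:  iterable of (node, cycles_per_s), one entry per deployed VNF
    link_loads: iterable of (physical_link, bits_per_s), one entry per logical link
    """
    violated = []
    node_use = defaultdict(float)
    for node, cycles in vnf_loads:
        node_use[node] += cycles
    if any(use > cpu_capacity[v] for v, use in node_use.items()):
        violated.append("1a")
    link_use = defaultdict(float)
    for link, bits in link_loads:
        link_use[link] += bits
    if any(use > link_capacity[e] for e, use in link_use.items()):
        violated.append("1b")
    if delay > max_delay:
        violated.append("1c")
    return violated


# Hypothetical deployment: two VNFs overload the CPU of node "edge1".
violations = violated_constraints(
    vnf_loads=[("edge1", 2e9), ("edge1", 1.5e9)],
    link_loads=[(("edge1", "cloud"), 40e6)],
    cpu_capacity={"edge1": 3e9},
    link_capacity={("edge1", "cloud"): 60e6},
    delay=0.0105,
    max_delay=0.020,
)
cost = weighted_cost(beta=(0.7, 0.3), delay=0.0105, migration_cost=0.24)
```

Returning the labels of the violated constraints, rather than a single boolean, makes it easier to see why a candidate allocation is infeasible.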
4 METHODOLOGY
4.1 Deep Reinforcement Learning Approach
In this section, we redefine the optimization problem in Equation 1 within the context of RL, where the RL agent performs VNF deployment actions by continuously monitoring the physical network infrastructure and receiving feedback in the form of rewards. The goal is to derive an optimal control policy that enables the agent to select VNF deployment actions optimally, considering the future network state. To achieve this, we employ DRL with deep neural networks, enabling the agent to manage complex environments and high-dimensional state spaces.
The proposed DRL-based Proximal Policy Optimization (PPO) agent identifies optimal physical nodes to meet application delay requirements by calculating delays between VNF-hosting nodes, taking into account session information size, service deployment costs, SLA violations, network congestion, energy use for active transfers, and total migration time. It refines its strategies by leveraging rewards from actions within a Markov Decision Process (MDP). The problem is formulated in terms of a state space $S$, an action space $A$, and a reward function $R$. We demonstrate how PPO solves the problem using both a value network and a policy network. The discount factor $\gamma \in [0, 1)$ is used to mathematically represent continuing tasks.
The main objective of the RL agent is to discover a policy that maximizes the expected sum of discounted future rewards, known as the return, given by $R_t = \sum_{i=0}^{\infty} \gamma^{i+t} r_{t+i}$. The optimal policy, given by $\pi^* = \arg\max_{\pi} \mathbb{E}_{\pi}\{R_0 \mid s_0 = s\}$, is the one that maximizes the expected return from any given state $s$. The detailed description of the state space, action space, and reward function is as follows.
1) State space $S_t$: Describes the current situation of the agent in the environment for our VNF deployment problem, designed as a vector initially randomly deployed with $D = \{V_i^{K_j}, V_{i+1}^{K_{j+1}}, \ldots, V_n^{K_n}\}$ in the physical network, which represents VNFs $K_j, K_{j+1}, \ldots, K_n$ that are deployed on physical nodes $V_i, V_{i+1}, \ldots, V_n$ at time $t_0$, as well as the link delay between the source and destination nodes of the physical nodes hosting the VNFs in an SFC. The state $S$ is defined as
$$S = \left\{ V_{i,t}^{K_j},\; L_{V_{i,h}}^{K_j} \right\} \quad \forall i \in V_n,\ \forall j \in K_n, \tag{2}$$
where $V_{i,t}^{K_j}$ means that VNF $K_j$ is deployed on physical server $V_i$ at time $t$, and $L_{V_{i,h}}^{K_j}$ denotes the link delay $L_{h,t}^{V_i}$ between the source node hosting VNF $K_j$ at $V_i$ and the destination node where VNF $K_{j+1}$ is hosted on physical server $V_h$, with $(v_i, v_h) \in V^2$.
2) Action space $A_t$: The agent explores which physical nodes are the optimal locations to host VNFs that meet the QoS requirements of incoming traffic. The action space bounds how the agent searches the physical nodes for VNF deployment, with each action representing the selection of a sequence of nodes that satisfies the QoS requirements. At each time step $t$, the agent selects a number of nodes equal to the number of VNFs in the SFC:
$$A = (V_1^{K_i}, \ldots, V_n^{K_n}), \tag{3}$$
where $V_1, \ldots, V_n$ represent the nodes selected at time steps $t_1, \ldots, t_n$.
Figure 4: Value and policy network for VNF deployment.
3) Reward function $R_t$: The reward function quantitatively measures the impact of decisions on network operations. The agent needs to quantify the quality of the deployment location on a physical node by learning from the patterns extracted from the environment. The QoS utility function is given by $Q_j = \alpha_{\mu_j} \cdot S_j(D_j) + \alpha^{f}_{\mu_j} \cdot T_j(W_j, C_j)$, where $\alpha_{\mu_j}$ and $\alpha^{f}_{\mu_j}$ are weighting parameters used to prioritize the user satisfaction $S_j(D_j)$ and the cost savings score $T_j(W_j, C_j)$ of the given traffic request.
Assume $V = \{V_1, \ldots, V_n\}$ is the set of all possible locations where VNFs can be deployed and $K = \{K_1, \ldots, K_n\}$ is the set of VNFs to be deployed on the available physical servers. Equation 2 represents the delay between VNF $K_j$ running at the source node $V_i$ and VNF $K_{j+1}$ running on the target node $V_h$ at time $t$. The reward function is given by
$$R_t = -\omega_1 \sum_{(v_i, v_h) \in V^2} L_{V_{i,h}}^{K_j} - \omega_2 \sum_{f_i \in F} Q_i, \tag{4}$$
where $\omega_1$ and $\omega_2$ prioritize latency reduction and QoS maximization, respectively.
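The reward of Equation 4 and the return $R_t$ defined earlier in this section can be sketched in Python as follows. The sketch treats $S_j(D_j)$ and $T_j(W_j, C_j)$ as already-evaluated numbers, because their functional forms are not given in this section, and it keeps the signs exactly as written in Equation 4. Function names and values are hypothetical.

```python
def qos_utility(w_satisfaction, w_cost, satisfaction, cost_saving):
    """Q_j as a weighted sum of S_j(D_j) and T_j(W_j, C_j), both given as numbers."""
    return w_satisfaction * satisfaction + w_cost * cost_saving


def step_reward(link_delays, qos_utilities, omega1, omega2):
    """R_t = -omega1 * sum of link delays - omega2 * sum of Q, signs as in Eq. 4."""
    return -omega1 * sum(link_delays) - omega2 * sum(qos_utilities)


def discounted_return(rewards, gamma, t=0):
    """R_t = sum_i gamma**(i + t) * r_{t+i}, truncated at the end of the episode."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return sum(gamma ** (i + t) * r for i, r in enumerate(rewards[t:]))


# Hypothetical values: two inter-VNF link delays (seconds) and one SFC request.
q = qos_utility(0.6, 0.4, satisfaction=0.9, cost_saving=0.5)
r = step_reward(link_delays=[0.003, 0.004], qos_utilities=[0.5],
                omega1=1.0, omega2=0.1)
episode_return = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```

For $t = 0$ the return coincides with the usual discounted sum $\sum_i \gamma^i r_i$; for later steps the exponent $i + t$ keeps every reward discounted relative to the start of the episode, following the formula as stated in the text.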
4.2 Proximal Policy Optimization for autonomous VNF deployment
PPO is a policy-based DRL algorithm that directly updates its policy based on observed rewards, allowing for rapid adaptation in dynamic environments. Its use of policy gradients makes it efficient and adaptable, and therefore a valuable choice for optimization problems in complex and changing network environments.
As shown in Figure 4, the state space consists of the network topology, current VNF placements, resource utilization, and KPI requirements of incoming traffic. The RL agent makes VNF placement decisions by considering SFC routing choices and resource allocation. The reward is based on improvements in network performance and the efficiency of resource utilization. The functionalities of each component in this reinforcement learning framework are detailed as follows.
Deep Reinforcement Learning for Context-Aware Online Service Function Chain Deployment and Migration over 6G Networks SAC '25, March 31-April 4, 2025, Catania, Italy
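Algorithm 1: Autonomous VNF deployment
Input: $f_i = (H_i, B_i^{\min}, D_i^{\max}, \sigma_i)$, $\alpha$, $T_{\max}$, $G_{\text{topo}}$, $l(i, j)$
1  Initialization: $\phi_0$, $\pi_0$  // Initialize $\phi$, $\pi$ parameters
2  $S_t \leftarrow \{V_{i,t}^{K_j}, L_t^{V_{i,j}}\}$  // Initial VNF location
3  $R_t \leftarrow 0$  // Initialize the reward to zero
4  for $t \in T_{\max}$ do
5      $c \leftarrow$ MeasureCPURequirement($f_i$)
6      $b \leftarrow$ MeasureBandwidth($f_i$)
7      $l \leftarrow$ MeasureLatency($f_i$)
8      $q \leftarrow$ MonitorLinkQuality($G_{\text{topo}}$)
9      $S_t \leftarrow (l, b, c, q)$  // Assign $f_i$ KPI requirement
10     $S_t \xrightarrow{A_t,\ \pi_\theta(A_t \mid S_t)} S_{t+1}$  // Select $A_t \sim \pi_\theta$
11     $S_{t+1} \leftarrow V_t^{K_i}$  // Assign VNF $K_i$ new location
12     $\mathcal{D}_t \leftarrow \{s_0, a_0, r_0, \dots, s_t, a_t, r_t\}$  // Collect state-action interaction
13     if $l(f_i) < l(i, j)$ then
14         $A_t \sim \pi_\theta(A_t \mid S_t) \xrightarrow{\text{Execute } A_t} R_{t+1}, S_{t+1}$
15         $A^{\pi_{\theta_t}}(s_t, a_t) \leftarrow Q(s, a) - V_\phi(s)$  // Advantage estimate
           // Update $\theta$ and $\phi$ parameters
16         $\theta_{t+1} = \arg\max_\theta \frac{1}{|\mathcal{D}_t| T} \sum_{\tau \in \mathcal{D}_t} \sum_{t=0}^{T}$
17             $\min\left( \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_t}(a_t \mid s_t)} A^{\pi_{\theta_t}}(s_t, a_t),\ g(\epsilon, A^{\pi_{\theta_t}}(s_t, a_t)) \right)$
18         $\phi_{t+1} = \arg\min_\phi \frac{1}{|\mathcal{D}_t|} \sum_{\tau \in \mathcal{D}_t} \sum_{t=0}^{T} \left(V_\phi(s_t) - R_t\right)^2$
19     else
20         SFC_req $\leftarrow$ rejected
21 return $\phi_i(s_t)$, $\pi_\theta(s_t)$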
PPO develops an optimal policy through the collaboration between the value and policy networks. Algorithm 1 details how the value and policy networks monitor the environment state and discover the optimal policy. The value network estimates the long-term effectiveness of network states and reconfiguration actions, factoring in VNF placements, SFC configurations, resource utilization, and traffic demands. This estimate, $V_\phi(s) = \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^t R_{t+1} \mid S_t = s\right]$, guides the policy network. Here, $V_\phi(s)$ is parameterized by $\phi$, $\gamma$ is the discount factor, $R_{t+1}$ is the reward received after time step $t$, and $S_t = s$ specifies the state at time $t$.
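The expectation above is taken over discounted sums of rewards. As a small illustration, the sketch below computes one such discounted sum for a finite reward sequence; the function name and the truncation to a finite episode are assumptions, since the value network in the paper learns this quantity rather than computing it from a single trajectory.

```python
def discounted_return(rewards, gamma=0.99):
    """Return sum_t gamma**t * rewards[t] for one finite trajectory.

    The sum is accumulated backwards, which avoids computing powers of gamma.
    """
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Three unit rewards with gamma = 0.5: 1 + 0.5 + 0.25
example = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```

Averaging such returns over many trajectories that start in the same state gives a Monte Carlo estimate of $V_\phi(s)$.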
In RL, a policy $\pi$ defines how an agent acts based on the observed state. It guides the agent's actions in response to environmental states. The policy network maps states to actions, aiming to maximize cumulative rewards by increasing the probability of high-reward actions and decreasing that of less effective ones. PPO refines this approach using the advantage function $A(s, a) = Q(s, a) - V_\phi(s)$ to evaluate actions and constrain updates to stay close to the current policy. Large updates to the policy can lead to significant changes in the behavior of the agent and an unstable training process. $A(s, a)$ quantifies how much better or worse an action $a$ in state $s$ is compared to the expected outcome under the current policy. A positive $A(s, a)$ suggests the action is beneficial; a negative value suggests a suboptimal choice that does not maximize the expected return, indicating the need to lower the probability of selection.
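\[ L(s, a, \theta_k) = \min\left( \frac{\pi_\theta(a \mid s)}{\pi_{\theta_k}(a \mid s)} A(s, a),\ \operatorname{clip}\left( \frac{\pi_\theta(a \mid s)}{\pi_{\theta_k}(a \mid s)},\ 1 - \epsilon,\ 1 + \epsilon \right) A(s, a) \right) \tag{5} \]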
The loss function $L(s, a, \theta_k)$ evaluates the policy $\pi_\theta$ at state $s$ for action $a$, where $\theta_k$ are the old policy parameters. The terms $\pi_\theta(a \mid s)$ and $\pi_{\theta_k}(a \mid s)$ represent the probabilities of taking action $a$ under the current and old policies, respectively. The advantage function $A(s, a)$ estimates the improvement in reward of action $a$ at $s$ relative to the average action under $\pi_{\theta_k}$. To limit large updates, $\operatorname{clip}\left( \frac{\pi_\theta(a \mid s)}{\pi_{\theta_k}(a \mid s)},\ 1 - \epsilon,\ 1 + \epsilon \right)$ constrains the policy ratio within $[1 - \epsilon, 1 + \epsilon]$, where $\epsilon$ ensures stability by preventing excessive deviations.
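To make the effect of clipping concrete, the following sketch evaluates the clipped surrogate objective of Equation 5 for a single state-action pair. It operates on plain numbers rather than network outputs; the function name and the example values are assumptions for illustration.

```python
def clipped_surrogate(ratio, advantage, epsilon=0.2):
    """Clipped PPO objective for one sample.

    ratio: pi_theta(a|s) / pi_theta_k(a|s), the new-to-old probability ratio.
    advantage: the advantage estimate A(s, a).
    epsilon: the clip ratio.
    """
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    return min(ratio * advantage, clipped_ratio * advantage)


# Positive advantage: gains from pushing the ratio above 1 + epsilon are capped.
capped = clipped_surrogate(ratio=1.5, advantage=2.0)
# Negative advantage with a ratio above 1 + epsilon: the unclipped term is
# the smaller one, so the penalty is not reduced by clipping.
penalized = clipped_surrogate(ratio=1.5, advantage=-1.0)
```

In the first case the objective stops growing once the ratio exceeds $1 + \epsilon$, which removes the incentive for a large policy step; in the second the minimum keeps the more pessimistic value.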
4.3 Autonomous VNF deployment algorithm
Step 1: The input of the PPO algorithm includes the physical network communication and computational resources, and SFC requests with specific requirements for bandwidth, delay, computing capacity, and memory.
Step 2: The PPO model, with both value and policy networks, initializes the parameters $\phi_0$ and the policy $\pi_0$, as depicted in Algorithm 1, line 1.
Step 3: Start with a random VNF deployment $S_t = \{V_i^k, L_i^k\}$, which represents both the physical location and the link quality, and initialize the reward $R_t = 0$ (Algorithm 1, lines 2-3).
Step 4: Measure the KPI requirements of SFC request $f_i$, such as bandwidth, CPU, and latency, and observe the available resources in the physical network. These are translated into the state and taken as input to the value network and policy network (Algorithm 1, lines 4-9).
Step 5: Select a deployment action based on the current policy $\pi_\theta$. Update the system state and select the new VNF location $S_{t+1} = V_i^k$ (Algorithm 1, lines 10-11).
Step 6: Compare the incoming traffic requirements with the available infrastructure resources, $l(f_i) < l(i, j)$, for the allocation vector. Compute the reward $R_t$ and the advantage $A^{\pi_{\theta_t}}(s_t, a_t)$, and assign the new VNF location $S_{t+1} = V_i^k$ (Algorithm 1, lines 11-15).
Step 7: Update the policy and value function parameters $\theta_{t+1}$ and $\phi_{t+1}$, and repeat this iteration until the optimal policy is developed (Algorithm 1, lines 15-18).
Step 8: Decide whether the SFC request is accepted or rejected (Algorithm 1, lines 13-20).
5 EXPERIMENTAL EVALUATION AND
RESULTS
5.1 Simulation Setup
The simulation experiments are conducted using a NetworkX-based Python simulator to generate the network topology and infrastructure. For the DRL implementation, we employ the open-source tools OpenAI Gymnasium and Stable Baselines3 for training and testing the DRL-based PPO agent in a customized RL environment inspired by the USA NET network topology [20]. The performance of the proposed DRL-based solution is evaluated across various network settings, with topologies ranging from 10 to 60 nodes. Given the dynamic nature of incoming traffic, each SFC request $f_i$ is managed to maintain the quality of the link, while the PPO agent monitors the available communication resources in the background and adapts VNF deployment when necessary.

Table 2: Simulation Parameters
Parameter                                  Value
Number of nodes ($N$)                      {10, 20, ..., 60}
SFC length ($|K|$)                         {3, 5, 7, 9, ..., 13}
Maximum steps ($T_{\max}$)                 $10^6$
Discount factor ($\gamma$)                 0.99
Clip ratio ($\epsilon$)                    0.2
Number of epochs ($N_{\text{epoch}}$)      50
Batch sizes ($N_{\text{batch}}$)           {64, 128, 256}
Learning rate ($\alpha$)                   {0.0005, 0.003, 0.001}
Coefficient ($\beta$)                      0.01
Clip range                                 0.1
Generalized Advantage Estimation (GAE)     0.95
We model both the state space $S$ and the action space $A$ as discrete spaces. Given PPO's suitability for discrete spaces and its stability and convergence properties, we select PPO for this scenario. PPO employs a stochastic policy that is refined over time to favor rewarding actions. Proper hyperparameter tuning balances exploration and exploitation, avoiding local optima and improving solution quality. Large policy updates can cause performance collapse, so PPO uses a surrogate loss to keep updates within a safe range. The simulation settings and the hyperparameters that limit policy updates are listed in Table 2. The clip ratio ($\epsilon$) in PPO prevents large policy updates, while the regularization coefficient ($\beta$) controls entropy and value function effects. The clip range (0.1) ensures stable updates, and Generalized Advantage Estimation (GAE) balances bias and variance.
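For readers who want to reproduce a comparable setup, the fragment below sketches how the hyperparameters discussed above might be passed to the Stable Baselines3 `PPO` constructor. The keyword names (`learning_rate`, `batch_size`, `n_epochs`, `gamma`, `gae_lambda`, `clip_range`, `ent_coef`) are those of Stable Baselines3; which paper symbol maps to which keyword is an assumption here, in particular that $\beta$ is the entropy coefficient and that $\epsilon = 0.2$ is the policy clip range. The environment object `env` is hypothetical and not defined.

```python
# Hypothetical mapping of the reported hyperparameters onto
# Stable Baselines3 PPO keyword arguments (assumed, not confirmed by the paper).
ppo_kwargs = {
    "learning_rate": 0.001,  # one of the evaluated rates {0.0005, 0.003, 0.001}
    "batch_size": 128,       # one of the evaluated sizes {64, 128, 256}
    "n_epochs": 50,          # number of epochs per policy update
    "gamma": 0.99,           # discount factor
    "gae_lambda": 0.95,      # GAE parameter
    "clip_range": 0.2,       # assumed to correspond to the clip ratio epsilon
    "ent_coef": 0.01,        # assumed to correspond to the coefficient beta
}

# Usage sketch (requires stable-baselines3 and a Gymnasium-compatible `env`):
#   from stable_baselines3 import PPO
#   model = PPO("MlpPolicy", env, **ppo_kwargs)
```

The separate "clip range" value of 0.1 is left out of the mapping because the text does not say which clipping term it refers to.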
5.2 Baseline and reference for proposed method
To evaluate the performance of the proposed DRL-based VNF deployment, we compare it to a baseline greedy algorithm. The greedy algorithm, known for its effectiveness in optimization tasks, makes locally optimal choices at each step to maximize or minimize an objective. It serves as a valuable benchmark against which to assess the distributed data-driven approach, as it consistently selects the best available option and makes short-sighted decisions without necessarily considering the global optimum. To address the VNF deployment challenge, the greedy algorithm selects random source and destination nodes, deploys VNFs, and chooses the lowest-latency paths between them based on physical link delays. Because it focuses on immediate gains, its locally optimal choices may not ensure globally optimal VNF placement and network performance.
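The core of such a greedy baseline can be sketched as follows: sample a random source and destination, then route over the lowest-latency path using Dijkstra's algorithm on the physical link delays. This is an illustrative stand-in using only the Python standard library, not the authors' NetworkX-based implementation; the graph format, the function names, and the toy topology are assumptions.

```python
import heapq
import random


def lowest_latency_path(graph, src, dst):
    """Dijkstra over link delays; graph maps node -> {neighbor: delay_ms}."""
    best = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        delay, node = heapq.heappop(heap)
        if node == dst:
            break
        if delay > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, link_delay in graph[node].items():
            cand = delay + link_delay
            if cand < best.get(nxt, float("inf")):
                best[nxt] = cand
                prev[nxt] = node
                heapq.heappush(heap, (cand, nxt))
    if dst not in best:
        return None, float("inf")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], best[dst]


def greedy_allocate(graph, rng):
    """Pick a random source/destination pair and take the lowest-latency path."""
    src, dst = rng.sample(sorted(graph), 2)
    return lowest_latency_path(graph, src, dst)


# Toy three-node topology with symmetric link delays in milliseconds.
toy = {
    "a": {"b": 3.0, "c": 12.0},
    "b": {"a": 3.0, "c": 4.0},
    "c": {"a": 12.0, "b": 4.0},
}
path, delay = lowest_latency_path(toy, "a", "c")
```

On the toy topology the two-hop route through `b` is preferred over the direct 12 ms link, which is the locally optimal, delay-only choice the baseline makes at every step.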
5.3 Evaluation Metrics
To investigate the performance of the proposed DRL model, we conducted simulations with varying parameters to measure the agent's performance within the environment. Furthermore, we utilized several evaluation metrics: average reward, loss function, acceptance ratio, VNF request violation ratio, and migration overhead.
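Algorithm 2: Baseline greedy VNF allocation
Input: $G_{\text{topo}}$, $T_{\max}$, $f_i$, $VNF\_info$
Output: $V_k$
1  Initialization: $\alpha_i = (V_1, \dots, V_n)$, $P^*_{\text{path}} = 0$, $L^*_{\text{path}} = 0$
2  for $t \in T_{\max}$ do
3      src, dst $\leftarrow$ Sample($G_{\text{topo}}$, $|K|$)
4      while src, dst $\notin V_k$ do
           // Randomly sample nodes
5          src, dst $\leftarrow$ Sample($G_{\text{topo}}$, $|K|$)
       // Select the shortest path
6      $P_{\text{path}} \leftarrow P_{\text{short}}[\text{src}][\text{dst}]$
7      $L_P = \sum_{v_i, v_h \in v_{\text{dst}}}^{\text{len}(P)} L^{K_j}_{V_{i,h}}$
8      for $V \in [\text{src}, \text{dst}]$ do
9          if $alloc\_src \neq V$ then
10             $L^*_{\text{path}} = \sum_{v_i \in P^{\text{shortest}}_{v_{\text{src}} \to v_{\text{dst}}}} L^{K_j}_{V_{i,h}}$
       // Sum the selected path delay
11     $L_{\text{total}} = \sum_{v_i \in P^{\text{selected\_path}}_{v_{\text{src}} \to v_{\text{dst}}}} L(v_i, v_j)$
12     if $L_P < L^*_{\text{path}}$ then
13         $P^*_{\text{path}} \leftarrow (\text{src}, \text{dst})$
14         allocate $V[t] \leftarrow V^*$
15 return $V_k$, $V^*$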
• Reward: Defined as the end-to-end delay between SFC-deployed physical servers, as described in Equation 4.
• Loss Function: The convergence performance of the PPO model for VNF deployment, as described in Equation 5. A smaller loss indicates better performance.
• Acceptance Ratio: The proportion of accepted VNF requests relative to the total incoming requests. It indicates the percentage of accepted requests.
• QoS Violation Ratio: The fraction of VNF requests that fail to meet QoS constraints, calculated relative to the total number of received requests.
• Migration Overhead: The computational, energy, and bandwidth costs of migrating VNF instances to optimize resource use and service quality.
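The two ratio metrics in the list above follow directly from per-request outcomes. A minimal sketch, assuming each received request is logged as a pair of booleans (the record format and the function name are hypothetical):

```python
def request_ratios(outcomes):
    """Compute (acceptance ratio, QoS violation ratio).

    outcomes: list of (accepted, met_qos) booleans, one pair per received
    request. Both ratios are taken relative to the total number of requests.
    """
    total = len(outcomes)
    if total == 0:
        return 0.0, 0.0
    accepted = sum(1 for was_accepted, _ in outcomes if was_accepted)
    violated = sum(1 for _, met_qos in outcomes if not met_qos)
    return accepted / total, violated / total


# Four requests: three accepted, two failing their QoS constraints.
log = [(True, True), (True, False), (False, False), (True, True)]
acceptance, violation = request_ratios(log)
```

Treating a rejected request as a QoS violation, as the third record does, is a modeling choice of this sketch; the text does not state how rejections are counted.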
5.4 Simulation Results
Figure 5a shows the cumulative reward per time step during the training of the PPO algorithm for VNF deployment and migration, compared to the analytical, greedy-based VNF allocation method. This result highlights the superior learning capability of the PPO algorithm with intermediate learning rates over the myopic greedy algorithm, which is unable to learn effective strategies. The cumulative reward reflects the PPO agent's decision-making, with initial variability due to extensive exploration. These fluctuations highlight the dynamic learning process as the PPO model optimizes VNF allocation. In contrast, the greedy approach, which ignores future states, also shows high variability. The comparison shows that PPO consistently outperforms the greedy method, offering more stable and efficient results.

[Figure 5, panel (a): Comparison of reward convergence in terms of negative latency between the DRL-based VNF deployment agent and the analytical greedy method. Axes: Time Step, Reward ($\times 10^3$). Curves: Greedy VNF allocation, PPO lr=0.0001, PPO lr=0.003, PPO lr=0.0005.]
[Figure 5, panel (b): Convergence of the loss function for varying learning rates, highlighting superior performance for intermediate values of learning rate. Axes: Time Step ($\times 10^5$), Loss Function ($\times 10^5$). Curves: LR 0.001, LR 0.003, LR 0.0005.]
[Figure 5, panel (c): Impact of network size and SFC length on the migration overhead in terms of number of VNF migrations. Axes: Number of Nodes N, Service Relocation Cost. Curves: SFC Length 5, 7, 9, 11, 13.]
Figure 5: Learning capability of the DRL-based VNF deployment agent against greedy-based VNF allocation for various learning rates and an SFC composed of 13 VNFs in a network containing 60 nodes, and a parameter study of network scalability on migration overhead.
Figure 5b illustrates the effect of learning rates $\alpha = 0.001$, $0.003$, and $0.0005$ on loss convergence during training. Initially, the DRL PPO agent explores randomly, leading to higher loss. As training progresses, the agent refines its strategy, reducing the loss toward 0, indicating improved decision-making and convergence of the VNF deployment policy.
Figure 5c describes the relationship between the number of nodes and the service relocation cost for various SFC lengths. As the number of nodes increases from 30 to 60, the service relocation cost shows a rising trend across all SFC lengths. Notably, longer SFCs, such as those with lengths 11 and 13, exhibit significantly higher relocation costs than shorter SFCs, such as those with lengths 5 and 7. This indicates that as the network grows and the complexity of the SFC increases, the cost associated with relocating services also increases, with larger SFCs incurring disproportionately higher costs.
Figure 7 shows the impact of communication overhead across various network configurations as the number of VNFs per SFC increases from 5 to 13. The results indicate that the number of migrations (i.e., the migration overhead) rises with the number of VNFs. Notably, the 60-node network setting consistently experiences the highest communication overhead, suggesting that larger networks have greater capacity for VNF migration. In contrast, the 30-node configuration exhibits a small number of migrations, which indirectly means lower communication overhead, highlighting the importance of managing network size and VNF allocation to minimize overhead.
Figure 6a shows the VNF acceptance and link violation ratios over time as the PPO algorithm learns to meet QoS constraints. Initially, VNF requests are rejected, and the link violation ratio is high. As the agent refines its policy, VNF placement improves, reducing violations. Eventually, the PPO agent converges to a policy that maximizes VNF acceptance and minimizes link violations, demonstrating improved system performance and efficient QoS management.
The scalability of the DRL-based PPO system is evaluated by analyzing its response to increasing workloads. Figure 6b shows that as the number of physical servers in edge nodes increases, E2E traffic delay decreases for different SFC lengths. More servers reduce latency, allowing the RL agent to identify optimal deployment locations. Additionally, shorter SFCs result in lower latency, with E2E delay rising as SFC length increases. Overall, more physical servers enhance network performance by reducing delay and providing more deployment options for variable latency links.
Figure 6c illustrates scalability across SFC lengths and network configurations. As SFC length increases, latency also rises, with longer SFCs leading to higher delays and reduced efficiency. Configurations with fewer VNFs per SFC perform better. Notably, larger networks (e.g., 45 nodes) show lower latency than smaller ones (e.g., 10 nodes), emphasizing the benefits of larger networks in reducing delays. This underscores the importance of strategic VNF placement to optimize latency and service delivery in varying network sizes.
6 CONCLUSION
This paper addresses the problem of network state-adaptive (i.e., network context-aware) optimal VNF deployment and migration while minimizing E2E delay. VNFs are organized in a predefined sequence to satisfy the strict delay requirements of SFC requests and accommodate fluctuating communication resource demands. Unlike analytical algorithms or network load and failure detection techniques, the DRL model adapts to real-time network changes by continuously updating its decision-making policies. The proposed approach demonstrates superior performance compared to baseline methods by reducing delay link violations, improving VNF request acceptance rates, and minimizing E2E latency through optimal VNF deployment decisions. Furthermore, this work can be extended using federated reinforcement learning to support the new "network of networks" concept of 6G network architecture