Deep Reinforcement Learning for Context-Aware Online Service Function Chain Deployment and Migration over 6G Networks

Author: fikadie wassie, solomon; Di Maio, Antonio; Braun, Torsten

Publisher: Zenodo

DOI: 10.1145/3672608.3707975

Source: https://zenodo.org/records/17671414/files/SAC-2024.pdf

Deep Reinforcement Learning for Context-Aware Online Service
Function Chain Deployment and Migration over 6G Networks
Solomon Fikadie Wassie
University of Bern
Switzerland
Antonio Di Maio
University of Bern
Switzerland
Torsten Braun
University of Bern
Switzerland
ABSTRACT
The Cloud Continuum Framework (CCF) logically integrates distributed extreme edge, far edge, near edge, and cloud data centers in 6G networks. Deploying Virtual Network Functions (VNFs) over the CCF can enhance network performance and Quality of Service (QoS) for modern delay-sensitive applications and use cases in 6G networks. Deep Reinforcement Learning (DRL) has shown the potential to automate VNF migrations by learning optimal policies through continuous monitoring of the network environment. In this work, we leverage DRL to optimize network control policies that continuously update VNF placement for optimal Service Function Chain (SFC) deployment in scenarios with time-varying user traffic. Through dynamic VNF relocation, this approach seeks to improve network performance in terms of latency, operational costs, scalability, and flexibility. This study addresses a gap in existing solutions by jointly considering network performance requirements and migration costs, providing a more comprehensive strategy for efficient VNF deployment and management. We show that our proposed DRL-based VNF deployment method achieves 28.8% lower delay and 34% lower migration overhead than state-of-the-art baselines across a broad range of large-scale simulated scenarios, demonstrating the scalability of the proposed method.
CCS CONCEPTS
• Networks → Network architectures; Network management; Network services.
KEYWORDS
6G Network Architecture, Cloud Continuum Framework, Service Orchestrator, Deep reinforcement learning
ACM Reference Format:
Solomon Fikadie Wassie, Antonio Di Maio, and Torsten Braun. 2025. Deep Reinforcement Learning for Context-Aware Online Service Function Chain Deployment and Migration over 6G Networks. In The 40th ACM/SIGAPP Symposium on Applied Computing (SAC ’25), March 31-April 4, 2025, Catania, Italy. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3672608.3707975
SAC ’25, March 31-April 4, 2025, Catania, Italy
© 2025 ACM.
ACM ISBN 979-8-4007-0629-5/25/03
https://doi.org/10.1145/3672608.3707975
1 INTRODUCTION
Software Defined Networking (SDN) and Network Function Virtualization (NFV) are key technologies that enable elastic resource provisioning for Virtual Network Functions (VNFs) through virtualization. This approach leverages SDN technology and replaces traditional hardware-based network functions with software programs. Service functions are typically deployed as Service Function Chains (SFCs), which consist of multiple VNFs in a predefined sequence that together deliver end-to-end services. These VNFs can be hosted in a virtualized environment on standard Commercial Off-The-Shelf (COTS) servers, reducing both Capital Expenditure (CAPEX) and Operational Expenditure (OPEX) for network operators [1], [2].
Network Address Translation (NAT), Intrusion Detection and Prevention System (IDPS), Firewall (FW), Load Balancer (LB), Video Optimization Controller (VOC), Traffic Monitoring (TM), WAN Optimizer (WO), Deep Packet Inspection (DPI), and other VNFs can be interconnected in specific predefined sequences to create SFC requests. These chains enable specialized network services such as Video Streaming (VS), Augmented Reality (AR), Virtual Reality, Industry 4.0 (Ind 4.0), Holographic-Type Communications, Smart Factory, autonomous driving, cloud gaming, and the tactile industrial Internet [3, 4].
The main challenge for an Internet Service Provider (ISP) in enhancing Quality of Service (QoS) and Quality of Experience (QoE) is determining the optimal VNF deployment locations to meet stringent and variable service requests. Optimal VNF placement on physical servers is crucial for network performance, OPEX, and reliability [5], [6]. Machine learning models, particularly Deep Learning (DL) and Reinforcement Learning (RL), make VNF deployment dynamic and adaptive, enabling real-time adjustments. DL handles complex high-dimensional features, while RL optimizes strategies through interaction with network states. Together they improve performance, reliability, and service continuity while reducing operational costs.
Few studies have explored VNF deployment and migration under time-varying traffic. Those that do typically rely on traffic prediction and on a migration index that represents node load trends. However, this approach is complex, because it requires both traffic prediction and load-based node scheduling. Many works address VNF deployment, migration, and SFC reconfiguration in three stages: VNF Resource Prediction, SFC Deployment Optimization, and Destination Node Scheduling [7], [8], [9]. A DRL-based approach instead enables intelligent agents to:
• monitor real-time network performance;
• adapt to traffic variations;
• continuously improve decision-making by tracking user traffic and node status through periodic interactions with, and feedback from, the environment.
Fluctuating VNF resource requirements due to time-varying user
traffic require service relocation to maintain reliability and performance. The key research question is how to optimally deploy VNFs while considering both time-varying user traffic and the variable underlying network infrastructure.
We propose a DRL-based approach for automatic VNF deployment and network-state-adaptive network reconfiguration. To the best of our knowledge, this paper is the first to address VNF deployment and migration across the full continuum, from the extreme edge through the edge to cloud data centers. The approach targets long-term operational cost benefits and network-state-adaptive VNF deployment within the 6G mobile network architecture. The three key contributions of this paper are as follows:
• We develop a data-driven service orchestrator, part of the management plane of the 6G network architecture, to manage network-state-aware VNF deployment for delay-sensitive applications.
• We develop an intelligent network function deployment management entity that places VNFs. It predicts the effect of placement on long-term operational costs, network performance, and reliability by continuously monitoring time-varying user traffic demand and the network infrastructure.
• We propose a novel DRL-based connectivity continuum with VNF migration from the extreme edge, through the edge, to the cloud over the CCF of the 6G network architecture.
The remainder of the paper is organized as follows:
• Section 2 describes the related works and the considered scenario.
• Section 3 presents the system model.
• Section 4 outlines the proposed methods.
• Section 5 explains the experimental setup and simulation results.
• Section 6 draws the conclusions.
2 RELATED WORKS
Recent research has approached VNF deployment, self-scaling, and elastic resource allocation from various perspectives. We review recent studies that attempt to solve the VNF deployment problem in three categories:
• resource provisioning;
• VNF migration;
• resource prediction and scheduling, which often relies on predictions of time-varying user traffic.
Many studies focus on QoS-aware VNF deployment and migration across distributed data centers, but only a few have explored DRL as a potential solution. Significant efforts have optimized VNF placement to enhance network performance. Despite these advances, two problems remain challenging:
• automatic, network-adaptive consideration of operational cost and long-term effects;
• efficient, reliable, and scalable VNF deployment in large-scale networks.
2.1 Resource provisioning
Researchers have addressed the VNF placement and chaining problem as a resource provisioning issue. These works focus on flexible resource allocation to meet service requirements and service level agreements. They primarily consider how many resources must be allocated to satisfy QoS and network performance, but they often overlook the impact of external traffic fluctuations [10], [11]. Furthermore, many studies have tackled VNF placement in MEC-NFV networks. They formulate optimization models that enhance resource utilization through deep learning techniques that intelligently select nodes and place VNFs for SFC requests. Service deployment typically involves allocating a Virtual Network Function Forwarding Graph to meet the QoS requirements of VNFs [12], [13].
2.2 VNF migration
NFV technologies enable VNFs to be provided as software-based network services. However, frequent user mobility requires VNFs to be re-scaled and re-provisioned. Akrem et al. [14] address this with AI-based network-aware Service Function Chain migration for 5G, enabling low-latency slice transfers between service areas. Stateful VNFs can migrate within telecom data centers, much like Virtual Machine (VM) migration and serverless computing, which facilitates context transfer across geographically distributed setups.
Li et al. [15] proposed a joint resource optimization and delay-aware VNF migration method that focuses on resource availability and delay constraints. He et al. [16] introduced an SLA-aware approach to planning multiple migrations in SDN-NFV clouds. It optimizes migration sequence and timing to minimize migration time and prevent QoS degradation. However, these heuristics overlook dynamic variations in link quality and computational resources over time.
2.3 SFC resource requirement predictions and scheduling
This line of work predicts the time-varying resource and QoS requirements of SFC requests and proactively allocates resources on available nodes as user traffic demand fluctuates. Such proactive allocation is essential for efficiently addressing the VNF deployment and resource provisioning problem. Efficient scheduling aims to reduce total deployment cost and communication cost, and to enhance QoS, by dynamically allocating resources according to time-varying demand. Gu et al. [17] proposed a mixed-integer linear programming solution for VNF deployment and flow scheduling in distributed data centers, considering network topology, VNF instances, and deployment. Tang et al. [18] developed a method that uses deep belief networks to predict future resource needs from time-varying user traffic, addressing dynamic VNF resource requirements. Table 1 compares the parameters considered in our method with those considered in the related literature.
Table 1: Comparison of Related Works
Parameters | End-to-end delay | Concurrent VNF migration | Node resource | Variable traffic | Stateful VNF migration | Migration cost
[10] (2020) | ✓ | × | ✓ | × | × | ×
[16] (2020) | ✓ | ✓ | × | × | ✓ | ×
[5] (2021) | × | ✓ | ✓ | × | × | ✓
[7] (2021) | ✓ | ✓ | × | × | ✓ | ✓
[14] (2022) | ✓ | ✓ | ✓ | × | × | ✓
[8] (2023) | ✓ | × | ✓ | ✓ | × | ×
[6] (2023) | ✓ | × | ✓ | ✓ | × | ✓
[13] (2024) | ✓ | × | ✓ | × | × | ×
Proposed method | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
Figure 1: Illustration of the proposed native-AI 6G network architecture with network-state-adaptive VNF deployment. (Figure content:
• an application layer with end devices such as smartphones, drones, and AR/VR headsets;
• the Management & Orchestration Framework (MOF) with Service Orchestration (SO), Resource Orchestration (RO), NFV-MANO, and OSS/BSS;
• the AI/ML framework with AI models and a network digital twin (NDT) for automation, optimization, and orchestration;
• the Cloud Continuum Framework (CCF) spanning the extreme edge, edge, and cloud;
• a network function layer hosting VNF1 to VNF6;
• the MOF and AI/ML message buses;
• arrows for control policy and data flow, and for VNF deployment actions.)
3 SYSTEM MODEL AND PROBLEM FORMULATION
3.1 6G Network Architecture
We envision the 6G network architecture depicted in Figure 1. It is composed of three main components:
• the Cloud Continuum Framework (CCF);
• the Management and Orchestration Framework (MOF);
• the Artificial Intelligence and Machine Learning Framework (AIMLF).
Each framework can use message buses both internally and for inter-framework communication. Specifically, the MOF message bus carries communication between the MOF and the AIMLF, while the AIMLF message bus carries communication between the CCF and the AIMLF.
3.1.1 Cloud Continuum Framework (CCF). The CCF offers a unified resource pool that orchestrates resources across multiple clouds. Based on availability and service requirements, it dynamically composes network and cloud resources from the extreme edge to central clouds. The nodes in a CCF can be classified, according to their geographical proximity to the end user and their computational capacity, into four categories: Cloud, Near-edge, Far-edge, and Extreme-edge. The CCF integrates AI-driven resource management that predicts demand and adjusts allocations in real time, optimizing utilization and energy efficiency. Additionally, the framework provides business interfaces that allow cloud providers to enhance Service Level Agreements (SLAs) and to ensure security, reliability, trust, and energy efficiency.
3.1.2 Management and Orchestration Framework (MOF). The MOF orchestrates network services across the cloud continuum of the 6G service-oriented network. It integrates various technological domains and supports AI/ML frameworks for real-time monitoring and updating of AI-driven functions. Its distributed management approach separates concerns, federates functional domains, and delivers end-to-end network services (E2E NS). It also allows tenants to request the deployment, modification, or termination of networks or applications.
Figure 2: Example of a physical network containing a Cloud Data Center and several interconnected MEC servers. The MEC servers serve one or more RAN domains, which in turn serve a diverse set of User Equipments (UEs), AR headsets, and IoT devices. (Figure content:
• an SDN controller at the cloud DC;
• wired links, wireless links, and migration paths;
• VNF1 to VNF3 placed on MEC servers;
• the agent loop of states $S_t, \dots, S_n$ and actions $A_t, \dots, A_n$, in three steps: (1) SFC request traffic generation; (2) environmental information, with states $S_i = (B_i, D_i, \sigma_i)$; (3) VNF deployment action.)
In the 6G network architecture, each Service Orchestrator (SO) is tightly integrated with the Operations Support System (OSS) and the Business Support System (BSS). Together these handle the orchestration and lifecycle management of Network Services (NS), each realized as a set of VNFs:
• the OSS manages network operations such as monitoring, fault management, and performance;
• the BSS oversees business tasks such as billing and customer management.
Together, they ensure efficient service deployment, scaling, and resource optimization. SOs automate the FCAPS functions (Fault, Configuration, Accounting, Performance, and Security), ensuring network health and security. At the business layer, the OSS/BSS handles Lifecycle Management (LCM) requests for NS and distributes them across orchestration domains, ensuring programmability and service integration across diverse cloud environments. This coordination of the technical and business layers lets the architecture adapt efficiently to service demands while maintaining seamless business operations.
3.1.3 Artificial Intelligence and Machine Learning Framework (AIMLF). The AIMLF provides unified AI/ML management and orchestration across the segments and frameworks of the 6G network architecture. It supports the development, training, and distribution of AI/ML models, and it incorporates continuous integration and continuous deployment (CI/CD) mechanisms for AI/ML deployment within a service-oriented architecture. The framework can evaluate and update AI-driven functions at system runtime and uses resources provided by the CCF. Additionally, the AIMLF employs a network digital twin to improve AI/ML training, enhance simulation behavior, and optimize AI-driven functions, increasing efficiency and adaptability.
3.2 System Model
End-user devices generate dynamic traffic patterns and initiate network-access requests through base stations, as shown in Figure 1. The traffic traverses the application layer of the proposed 6G network architecture, requesting an orchestrator to
deploy VNFs optimally while meeting strict performance requirements. The service orchestrator, which is part of the MOF, receives this traffic and determines the optimal deployment on physical servers within the CCF, across the Cloud, Near-edge, Far-edge, and Extreme-edge. We model the physical network at a high level as shown in Figure 2, where Mobile Edge Computing (MEC) nodes are connected to the cloud via high-speed fibre. All computational and communication resources of the edge nodes are controlled by an overlay controller (i.e., the RL agent), which orchestrates the SDN entities at the cloud DC.
The network infrastructure consists of:
(i) a cloud data center (DC) capable of hosting the proposed DRL algorithm and, potentially, multiple VNFs;
(ii) several MEC data centers $v_i$, each with a CPU capacity $C_i$ [cycle/s], positioned near end users to minimize latency;
(iii) a set of end-user devices that access network services through base stations, as depicted in Figure 2.
We consider heterogeneous resources and varying link capacities among the distributed edge servers. The delays between VNFs on different servers, which are dictated by dynamic user traffic, affect the optimal VNF deployment locations.

In such a system, end-user devices such as smartphones, AR headsets, and IoT devices request dynamic communication and computational resources through base stations. They do so by initiating network-access requests (e.g., registration, attach requests, and radio resource requests for channel allocation). These devices produce dynamic traffic with strict Key Performance Indicator (KPI) requirements (Figure 2, step 1), which demands flexible resource allocation for optimal computing and communication performance.

The RL agent operates as an intelligent SDN controller that processes complex state information to optimize network performance and service delivery. It continuously adapts to network state information, which serves as its input (Figure 2, step 2). This environmental state information includes:
• traffic loads;
• resource availability (e.g., MEC CPU, memory, bandwidth);
• network performance metrics (e.g., delay, link utilization);
• infrastructure status (e.g., edge node status).
Based on this analysis, the RL agent predicts optimal VNF deployment actions and dynamically allocates resources for efficient operation (Figure 2, step 3).
To model the physical network described above, we represent the physical network infrastructure as an undirected graph $G = (V, E)$, where $V$ and $E$ are the sets of physical network nodes and of links between nodes, respectively. Each node $v \in V$ represents a physical network entity, such as:
• an extreme-edge node, i.e., a UE end device such as a smartphone, an electric vehicle, or a drone;
• an edge server;
• a central cloud data center within the CCF, as depicted in Figure 1.
Each link $e \in E$ corresponds to a physical network connection, i.e., a high-speed fibre link between nodes. The bandwidth (BW) capacity of the physical link between nodes $v_i, v_j \in V$ is denoted as $B_{ij}$ [bit/s].
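The following minimal Python sketch makes the graph model concrete. It stores $G = (V, E)$ with a CPU capacity $C_v$ per node and a bandwidth $B_{ij}$ per link, and it answers shortest-path queries with Dijkstra's algorithm. The per-link latency attribute, the node names, and all numeric values are hypothetical illustration choices, not parameters from the paper's evaluation.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class PhysicalNetwork:
    """Undirected graph G = (V, E).

    cpu:   node v -> CPU capacity C_v [cycle/s]
    links: frozenset({i, j}) -> (B_ij [bit/s], latency [s]); the per-link
           latency is an assumption added for the shortest-path query.
    """
    cpu: dict = field(default_factory=dict)
    links: dict = field(default_factory=dict)

    def add_node(self, v, cycles_per_s):
        self.cpu[v] = cycles_per_s

    def add_link(self, i, j, bandwidth_bps, latency_s):
        # One entry serves both directions because G is undirected.
        self.links[frozenset((i, j))] = (bandwidth_bps, latency_s)

    def neighbors(self, v):
        for key, (bw, lat) in self.links.items():
            if v in key:
                (u,) = key - {v}
                yield u, bw, lat

    def shortest_path(self, src, dst):
        """Dijkstra on link latency; returns (total latency, node list)."""
        dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
        while heap:
            d, v = heapq.heappop(heap)
            if v == dst:
                break
            if d > dist[v]:
                continue  # stale heap entry
            for u, _bw, lat in self.neighbors(v):
                if d + lat < dist.get(u, float("inf")):
                    dist[u], prev[u] = d + lat, v
                    heapq.heappush(heap, (d + lat, u))
        if dst not in dist:
            raise ValueError(f"{dst!r} is unreachable from {src!r}")
        path = [dst]
        while path[-1] != src:
            path.append(prev[path[-1]])
        return dist[dst], path[::-1]


# Hypothetical four-node instance: UE -> edge1 -> {edge2, cloud}.
net = PhysicalNetwork()
for node, cap in [("ue", 1e9), ("edge1", 3e9), ("edge2", 4e9), ("cloud", 6e9)]:
    net.add_node(node, cap)
net.add_link("ue", "edge1", 100e6, 0.003)
net.add_link("edge1", "edge2", 120e6, 0.004)
net.add_link("edge1", "cloud", 60e6, 0.012)
net.add_link("edge2", "cloud", 50e6, 0.005)
lat, path = net.shortest_path("ue", "cloud")  # detours via edge2 (12 ms < 15 ms)
```

Keying links by `frozenset` keeps the graph undirected, so each physical link is stored only once.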
We model a generic SFC as a Directed Acyclic Graph (DAG) $H = (K, L)$, where $K$ is the set of VNFs within the SFC and $L$ is the set of logical links between VNFs. Each VNF $k \in K$ is a softwarized network function that can process incoming packets. The logical links $(k_i, k_j) \in L$ represent the connections between successive VNFs $k_i$ and $k_j$. The topology $H$ of an SFC depends on the application that the SFC supports. We assume that the tenant has already determined it and submitted it to the network management plane for acceptance and deployment. The position and logical order of VNFs in an SFC are critical parameters for network performance.

Figure 3 depicts the detailed granularity of traffic arrival at the orchestrator. End-to-end user traffic from the application layer generates multiple SFC requests ($f_1, f_2, f_3, \dots, f_n$), which are transmitted to the service orchestrator. These requests follow a standardized structure within a defined queuing model. The communication patterns encompass various interaction types, including human-to-human (H2H), machine-to-machine (M2M), and machine-to-human (M2H) communication. The service orchestrator processes the incoming traffic, identifies the relevant VNFs and their interdependencies, and maps them to appropriate physical nodes. This process ensures optimal operation while maintaining the logical connections between VNFs across the network infrastructure of the Cloud Continuum Framework.
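As an illustration of the DAG model $H = (K, L)$, the sketch below validates that a tenant-submitted SFC is acyclic and returns one order in which traffic can traverse its VNFs. It uses Python's standard `graphlib` module (Python 3.9+). The function name and the branching example chain are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def sfc_processing_order(vnfs, logical_links):
    """Return one valid traversal order of an SFC H = (K, L).

    vnfs:          the VNF set K
    logical_links: the set L of (k_i, k_j) pairs, meaning traffic leaving
                   k_i is forwarded to k_j
    Raises ValueError if L uses a VNF outside K or if H contains a cycle.
    """
    preds = {k: set() for k in vnfs}  # TopologicalSorter wants node -> predecessors
    for ki, kj in logical_links:
        if ki not in preds or kj not in preds:
            raise ValueError(f"logical link ({ki}, {kj}) uses a VNF outside K")
        preds[kj].add(ki)
    try:
        return list(TopologicalSorter(preds).static_order())
    except CycleError as err:
        # args[1] holds the detected cycle as a list of nodes.
        raise ValueError(f"SFC is not a DAG, cycle: {err.args[1]}") from err


# Hypothetical branching chain: after the firewall, traffic is split between
# DPI and traffic monitoring, and both branches merge again at the IDPS.
K = ["NAT", "FW", "DPI", "TM", "IDPS"]
L = [("NAT", "FW"), ("FW", "DPI"), ("FW", "TM"), ("DPI", "IDPS"), ("TM", "IDPS")]
order = sfc_processing_order(K, L)
```

A topological order exists exactly when $H$ is acyclic, so a single call both checks the DAG assumption and yields a valid processing sequence.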
We define an SFC request as a tuple $f_i = (H_i, B^{\min}_i, D^{\max}_i, \sigma_i)$, where:
• $H_i = (K_i, L_i)$ is the SFC's topology;
• $B^{\min}_i$ [bit/s] is the minimal end-to-end bandwidth requirement;
• $D^{\max}_i$ [s] is the maximum allowable end-to-end delay;
• $\sigma_i$ [cycle/s] is the SFC's overall computational capacity requirement.
Because the traffic generated by users fluctuates over time, we model packet arrivals at the orchestrator, taking the network design into account, as a Poisson process with mean arrival rate $\lambda_i$. We denote the sequence of SFC requests that arrive at the network's resource orchestrator for deployment onto the physical network as $F = (f_1, f_2, \dots, f_n)$, where $f_i$ is the $i$-th SFC request in the queue.
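The request tuple and the Poisson arrival model can be sketched as follows, assuming exponentially distributed inter-arrival times (the standard construction of a Poisson process). The `SFCRequest` class, the helper name, and every numeric value are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SFCRequest:
    """f_i = (H_i, B_i^min, D_i^max, sigma_i) with H_i = (K_i, L_i)."""
    vnfs: tuple        # K_i
    links: tuple       # L_i as (k_i, k_j) pairs
    b_min_bps: float   # B_i^min [bit/s]
    d_max_s: float     # D_i^max [s]
    sigma_cps: float   # sigma_i [cycle/s]


def poisson_arrival_times(rate_per_s, horizon_s, seed=None):
    """Arrival instants on [0, horizon_s) of a Poisson process with rate
    lambda_i: inter-arrival gaps are i.i.d. exponential with mean 1/lambda_i."""
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_per_s)
        if t >= horizon_s:
            return times
        times.append(t)


# Hypothetical three-VNF request; all numeric values are illustrative only.
chain = ("VNF1", "VNF2", "VNF3")
request = SFCRequest(chain, tuple(zip(chain, chain[1:])),
                     b_min_bps=50e6, d_max_s=0.020, sigma_cps=2e9)
# F: the queue of (arrival time, request) pairs seen by the orchestrator.
F = [(t, request) for t in poisson_arrival_times(50.0, 100.0, seed=7)]
```

With $\lambda_i = 50$ requests/s over 100 s, the queue holds about 5000 requests on average. The exact count varies with the seed.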
Each VNF $k \in K$ is associated with a specific state, which makes stateful migration essential for maintaining service continuity and optimizing network performance. The target node selection for each state can be modeled as a tuple $S_k = (M_i, D_c, Q_v, P_s, T_m)$, where:
• $M_i$ [B] is the size of the context information to be migrated;
• $D_c$ represents service deployment costs;
• $Q_v$ is the impact of SLA violations during migration;
• $P_s$ is the congestion status of the selected path;
• $T_m$ [s] is the total migration time.
Stateful migration preserves active sessions and ongoing data processing, minimizing disruptions. Conversely, the inability to relocate services may result in failures, leading to interruptions and increased delays.
Our proposed method can also be applied to systems where SFCs are modeled as generic Directed Acyclic Graphs (DAGs), supporting emerging 6G mission-critical applications with stringent KPI requirements. For many network services, ingress traffic from the application layer first enters the NAT as inbound traffic, while outbound traffic from the last VNF often exits through the IDPS. Several studies [19] show that egress traffic depends on the network service and is not tied to specific VNFs (e.g., Ind 4.0 traffic exits through the FW).
To illustrate how traffic traverses the VNFs of various applications, the VNFs are logically arranged in sequence:
• for video streaming, $K = (\text{NAT}, \text{FW}, \text{TM}, \text{VOC}, \text{IDPS})$;
• for autonomous vehicles, $K = (\text{NAT}, \text{TM}, \text{encryption}, \text{decompression}, \text{decryption})$;
• for an augmented reality (AR) application, typically a linear SFC request $f_i$ with $K = (\text{NAT}, \text{FW}, \text{VOC}, \text{TM}, \text{WO}, \text{IDPS})$.
In the AR chain, incoming traffic:
1. first enters the NAT for address translation;
2. then passes through the firewall (FW), which filters unauthorized traffic;
3. then goes through the VOC for video quality and bandwidth optimization;
4. is analyzed for patterns by the traffic monitoring (TM) function;
5. has its data flow optimized and latency reduced by the WAN Optimizer (WO);
6. is finally scanned for malicious activity by the IDPS.
Figure 3: SFC deployment with generic structure. (Figure content:
• the service orchestrator receives SFC requests 1 to n, each a chain of VNFs (VK1 to VK9) joined by SFC logical links L1 to L5 between a source S and a destination d;
• the VNFs are embedded over virtual links onto the cloud continuum network infrastructure spanning the extreme edge, edge, and central cloud.)
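The three example chains above can be written down as linear DAGs. The minimal sketch below derives their logical links $L$ and reports each chain's ingress and egress VNF. The dictionary and helper names are illustrative assumptions.

```python
# The three example chains from the text, in traversal order.
SFC_CATALOG = {
    "video_streaming": ("NAT", "FW", "TM", "VOC", "IDPS"),
    "autonomous_vehicle": ("NAT", "TM", "encryption", "decompression", "decryption"),
    "augmented_reality": ("NAT", "FW", "VOC", "TM", "WO", "IDPS"),
}


def linear_logical_links(chain):
    """Logical links L = {(k_1, k_2), (k_2, k_3), ...} of a linear SFC."""
    return list(zip(chain, chain[1:]))


for app, chain in SFC_CATALOG.items():
    links = linear_logical_links(chain)
    print(f"{app}: ingress={chain[0]}, egress={chain[-1]}, |L|={len(links)}")
# video_streaming: ingress=NAT, egress=IDPS, |L|=4
# autonomous_vehicle: ingress=NAT, egress=decryption, |L|=4
# augmented_reality: ingress=NAT, egress=IDPS, |L|=5
```

All three chains enter through the NAT, which matches the ingress convention described above, while their egress VNF depends on the service.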
3.3 Problem Formulation
We formulate the problem of elastically auto-scaling VNF deployment over a physical network. The goal is to determine the optimal physical location of the VNFs by minimizing the latency of SFC requests across the physical network. This latency comprises propagation, transmission, queuing, and processing delay; in this work we consider only the transmission delay and the VNF computing delay. The optimization problem is formulated as follows.

We define the SFC allocation vector $\alpha = (\alpha_1, \dots, \alpha_{|K|}) \in V^{|K|}$, where each component $\alpha_i$ is the physical node in $V$ on which the $i$-th VNF of the SFC is deployed. We then define the following latencies:
• The VNF processing latency $P(\alpha_i)$ on node $\alpha_i$ is the time the $i$-th VNF needs to perform its computing task when deployed on node $\alpha_i$.
• The worst-case SFC processing latency is the sum of the processing delays of all the SFC's VNFs, $P(\alpha) = \sum_{i=1}^{|K|} P(\alpha_i)$.
• The VNF communication latency $l(\alpha_i, \alpha_j)$ is the shortest-path latency between the VNF deployment locations $\alpha_i$ and $\alpha_j$ over the physical links $E$.
• The worst-case SFC communication latency is the sum of the communication latencies over all the SFC's links, $\Gamma(\alpha) = \sum_{(i,j) \in L} l(\alpha_i, \alpha_j)$.
• The total SFC delay is the sum of the SFC processing and communication latencies, i.e., $P(\alpha) + \Gamma(\alpha)$.
The functions $P$ and $\Gamma$ depend on the physical network topology $G$, on the SFC topology $H$, and on their time-variant properties; we drop this dependency from the notation for conciseness.
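The total delay $P(\alpha) + \Gamma(\alpha)$ can be evaluated directly from these definitions. For illustration only, the sketch below assumes that a VNF's processing latency equals its per-request workload in cycles divided by the hosting node's capacity $C_v$; the text does not fix this model. The topology values are also hypothetical. The shortest-path latencies $l(u, v)$ come from a Floyd-Warshall pass over the physical links.

```python
import itertools


def all_pairs_latency(nodes, link_latency):
    """Floyd-Warshall: shortest-path latency l(u, v) over undirected links.

    link_latency: {(u, v): latency [s]} for each physical link in E.
    """
    inf = float("inf")
    d = {(u, v): 0.0 if u == v else inf for u in nodes for v in nodes}
    for (u, v), lat in link_latency.items():
        d[u, v] = d[v, u] = min(d[u, v], lat)
    # product() varies its first element slowest, so m is the outer loop.
    for m, u, v in itertools.product(nodes, repeat=3):
        if d[u, m] + d[m, v] < d[u, v]:
            d[u, v] = d[u, m] + d[m, v]
    return d


def total_sfc_delay(alpha, work_cycles, sfc_links, cpu, lat):
    """P(alpha) + Gamma(alpha) for one SFC.

    alpha[i]:       node hosting the i-th VNF
    work_cycles[i]: cycles the i-th VNF needs per request (assumed model:
                    P(alpha_i) = work_cycles[i] / C_{alpha_i})
    sfc_links:      logical links L as VNF index pairs (i, j)
    """
    processing = sum(w / cpu[v] for w, v in zip(work_cycles, alpha))
    communication = sum(lat[alpha[i], alpha[j]] for i, j in sfc_links)
    return processing + communication


nodes = ["edge1", "edge2", "cloud"]
cpu = {"edge1": 3e9, "edge2": 4e9, "cloud": 6e9}
lat = all_pairs_latency(nodes, {("edge1", "edge2"): 0.004,
                                ("edge2", "cloud"): 0.005,
                                ("edge1", "cloud"): 0.012})
delay = total_sfc_delay(("edge1", "edge2", "cloud"), [3e6, 4e6, 6e6],
                        [(0, 1), (1, 2)], cpu, lat)
```

Here each VNF takes 1 ms of processing and the two hops add 4 ms and 5 ms, giving 12 ms in total. Note that $l(\text{edge1}, \text{cloud})$ resolves to the 9 ms detour via edge2 rather than the 12 ms direct link.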
Another parameter we consider in the objective function for selecting the optimal location of a stateful VNF is its migration time. This time depends on the size of the VNF and on the throughput of all links that constitute the shortest path from the VNF's source node to a candidate target relocation node. We define the migration time of the SFC's $k$-th VNF as
$$t_k = \sum_{(i,j) \in p_k} \frac{M_k}{B_{ij}},$$
i.e., the sum of the times needed to transmit the VNF's state of size $M_k$ over all physical links $(i, j) \in p_k$. Here $p_k$ is the shortest path from the VNF's previous deployment location to the new candidate deployment node $\alpha_k$, and $t_k$ depends on the time-varying VNF state size and network conditions. We define the worst-case SFC migration time $T(\alpha) = \sum_{k \in K} t_k$ as the sum of all VNF migration times, from the VNFs' current deployment locations to their respective candidate target nodes given by $\alpha$.

Migration time is not necessarily the only cost an operator incurs when migrating SFCs. Other costs include:
• economic or energy expenses for infrastructure activation;
• SLA violations;
• bandwidth quota excess;
• the size of the migrated VNF context information;
• the energy consumed by the migration task.
We therefore treat $T(\alpha)$ as a generic SFC migration cost that need not be expressed in units of latency and that can also capture financial, energy, and resource aspects.
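The following minimal sketch transcribes the per-link sum for $t_k$ and $T(\alpha)$. It assumes that the shortest path $p_k$ has already been computed and that the state size $M_k$ is given in bits, so that $M_k / B_{ij}$ is in seconds. The helper names and values are hypothetical.

```python
def vnf_migration_time(state_bits, path, bandwidth_bps):
    """t_k = sum over (i, j) in p_k of M_k / B_ij.

    state_bits:    M_k in bits (multiply by 8 if the state size is in bytes)
    path:          p_k as a node list [previous location, ..., candidate alpha_k]
    bandwidth_bps: {(i, j): B_ij} for the physical links, in either orientation
    """
    t = 0.0
    for i, j in zip(path, path[1:]):
        b = bandwidth_bps.get((i, j), bandwidth_bps.get((j, i)))
        if b is None:
            raise KeyError(f"no physical link between {i!r} and {j!r}")
        t += state_bits / b  # the whole state crosses every hop
    return t


def sfc_migration_time(migrations, bandwidth_bps):
    """T(alpha) = sum_k t_k; a VNF that stays put has p_k = [node] and adds 0."""
    return sum(vnf_migration_time(m, p, bandwidth_bps) for m, p in migrations)


bw = {("edge1", "edge2"): 120e6, ("edge2", "cloud"): 50e6}
T = sfc_migration_time(
    [(96e6, ["edge1", "edge2", "cloud"]),   # 12 MB state moved two hops
     (40e6, ["edge2"]),                     # VNF not relocated
     (6e6, ["edge2", "cloud"])],            # 0.75 MB state moved one hop
    bw)
```

Because the formula sums $M_k / B_{ij}$ over every hop, it charges the full state transfer on each link (a store-and-forward view) rather than only on the bottleneck link.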
Since the optimization problem is multi-objective, we define $\beta = (\beta_1, \beta_2)$ as weight factors that balance the trade-off between delay and migration cost when selecting physical nodes. We aim to minimize $\beta^\top C(\alpha)$, where $C(\alpha) = \big(P(\alpha) + \Gamma(\alpha),\ T(\alpha)\big)$. We also define:
• $n^k_v$ as the cycles/s used by VNF $k$ when deployed on node $v \in V$;
• $b^{kl}_{ij}$ as the bandwidth in bit/s used by an SFC's logical link $(k, l)$ if it is deployed over the physical link $(i, j) \in E$.
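The goal of the optimization problem is to find, for each SFC request $f_i$, the optimal SFC allocation vector $\alpha^*$ that minimizes the objective function $\beta^\top C(\alpha)$ under a set of infrastructure-induced constraints, as in Equation 1.
$$
\min_{\alpha \in V^{|K|}} \quad \underbrace{\beta_1 \cdot \big(P(\alpha) + \Gamma(\alpha)\big)}_{\text{total SFC delay}} + \underbrace{\beta_2 \cdot T(\alpha)}_{\text{SFC migration cost}} \tag{1}
$$
subject to
$$
\sum_{K_i : i \in [n]} \ \sum_{k \in K_i} n^k_v \le C_v, \quad \forall v \in V \tag{1a}
$$
$$
\sum_{L_i : i \in [n]} \ \sum_{(k,l) \in L_i} b^{kl}_{ij} \le B_{ij}, \quad \forall (i, j) \in E \tag{1b}
$$
$$
P(\alpha) + \Gamma(\alpha) \le D^{\max}_i \tag{1c}
$$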
Constraint 1a requires that, on every node, the sum of the processing requirements of all deployed VNFs does not exceed the locally available computational capacity. Constraint 1b requires that the bandwidth used by all SFC requests remains within the available bandwidth capacity of each physical link. Finally, Constraint 1c requires that the total communication and processing delay of an SFC request does not exceed the E2E delay tolerance needed to complete the given service successfully.
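For very small instances, Equation 1 can be solved by exhaustively enumerating all $|V|^{|K|}$ allocation vectors. This is a useful sanity check but does not scale, since the search space grows exponentially with the chain length. The sketch below does this for a hypothetical line topology edge1 - edge2 - cloud. It assumes:
• shortest paths along the line;
• a single bandwidth demand per logical link;
• residual capacities for (1a) and (1b);
• table-driven processing latencies.
All names and numbers are illustrative assumptions.

```python
import itertools


def line_topology(order, link_latency, link_bw):
    """Routes, latencies, and link capacities for nodes on a line
    order[0] - order[1] - ...; link h joins order[h] and order[h + 1]."""
    route, lat = {}, {}
    for a, b in itertools.product(range(len(order)), repeat=2):
        hops = list(range(min(a, b), max(a, b)))
        route[order[a], order[b]] = hops
        lat[order[a], order[b]] = sum(link_latency[h] for h in hops)
    return route, lat, dict(enumerate(link_bw))


def objective(alpha, inst):
    """beta^T C(alpha) if alpha satisfies (1a)-(1c), otherwise None."""
    vnfs = range(len(alpha))
    # (1a) CPU load n_v^k of the VNFs placed on v must fit the free capacity.
    for v in inst["nodes"]:
        if sum(inst["load"][k] for k in vnfs if alpha[k] == v) > inst["cpu_free"][v]:
            return None
    # (1b) Each logical link's demand is added to every physical link it crosses.
    used = {}
    for k, l in inst["sfc_links"]:
        for e in inst["route"][alpha[k], alpha[l]]:
            used[e] = used.get(e, 0.0) + inst["demand_bps"]
    if any(used[e] > inst["bw_free"][e] for e in used):
        return None
    # P(alpha) + Gamma(alpha), then (1c).
    delay = sum(inst["proc"][k][alpha[k]] for k in vnfs)
    delay += sum(inst["lat"][alpha[k], alpha[l]] for k, l in inst["sfc_links"])
    if delay > inst["d_max"]:
        return None
    # T(alpha): t_k = sum of M_k / B_ij over the links from the old to the new host.
    mig = sum(inst["state_bits"][k] / inst["bw_cap"][e]
              for k in vnfs for e in inst["route"][inst["prev"][k], alpha[k]])
    beta1, beta2 = inst["beta"]
    return beta1 * delay + beta2 * mig


def brute_force(inst):
    """Enumerate all |V|^|K| allocation vectors; keep the cheapest feasible one."""
    best, best_val = None, float("inf")
    for alpha in itertools.product(inst["nodes"], repeat=len(inst["load"])):
        val = objective(alpha, inst)
        if val is not None and val < best_val:
            best, best_val = alpha, val
    return best, best_val


nodes = ["edge1", "edge2", "cloud"]
route, lat, bw_cap = line_topology(nodes, [0.004, 0.005], [120e6, 50e6])
inst = {
    "nodes": nodes, "route": route, "lat": lat, "bw_cap": bw_cap,
    "bw_free": {0: 100e6, 1: 30e6},          # residual bandwidth for (1b)
    "cpu_free": {"edge1": 3e9, "edge2": 4e9, "cloud": 6e9},
    "load": [2e9, 2e9],                      # n^k [cycle/s] of the two VNFs
    "proc": [{"edge1": 0.002, "edge2": 0.0015, "cloud": 0.001},
             {"edge1": 0.003, "edge2": 0.002, "cloud": 0.001}],
    "sfc_links": [(0, 1)], "demand_bps": 40e6,
    "prev": ["edge1", "edge1"], "state_bits": [8e6, 16e6],
    "beta": (1.0, 0.1), "d_max": 0.020,
}
alpha_star, value = brute_force(inst)
alpha_delay_only, _ = brute_force({**inst, "beta": (1.0, 0.0)})
```

The weights $\beta$ control the delay/migration trade-off in this instance:
• With $\beta = (1, 0.1)$, the search keeps one VNF on its current host to limit migration cost.
• With $\beta_2 = 0$, it moves both VNFs to the cloud, which gives the lowest delay.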

4 METHODOLOGY
4.1 Deep Reinforcement Learning Approach
In this section, we recast the optimization problem of Equation 1 in the context of RL. The RL agent performs VNF deployment actions by continuously monitoring the physical network infrastructure and receiving feedback in the form of rewards. The goal is to derive an optimal control policy that lets the agent select VNF deployment actions optimally while accounting for the future network state. To achieve this, we employ DRL with deep neural networks, which enables the agent to handle complex environments and high-dimensional state spaces.
The proposed DRL-based Proximal Policy Optimization (PPO) agent identifies the optimal physical nodes to meet application delay requirements by calculating the delays between VNF-hosting nodes. In doing so, it takes into account:
• session information size;
• service deployment costs;
• SLA violations;
• network congestion;
• energy used for active transfers;
• total migration time.
The agent refines its strategy by learning from the rewards of its actions within a Markov Decision Process (MDP). The problem is formulated in terms of a state space $S$, an action space $A$, and a reward function $R$. We show how PPO solves the problem using both a value network and a policy network. The discount factor $\gamma \in [0, 1)$ keeps the return finite for continuing (non-episodic) tasks with bounded rewards.
The main objective of the RL agent is to discover a policy that maximizes the expected sum of discounted future rewards, known as the return, $R_t = \sum_{i=0}^{\infty} \gamma^{i} r_{t+i}$. The optimal policy, $\pi^* = \arg\max_{\pi} \mathbb{E}_{\pi}\{R_t \mid s_t = s\}$, is the one that maximizes the expected return from any given state $s$. The state space, action space, and reward function are described in detail as follows.
1) State space $S_t$: The state describes the current situation of the agent in the environment of our VNF deployment problem. It is designed as a vector that is initially populated by a random deployment $D = \{V^{K_j}_i, V^{K_{j+1}}_{i+1}, \dots, V^{K_n}_n\}$ in the physical network. This deployment places VNFs $K_j, K_{j+1}, \dots, K_n$ on physical nodes $V_i, V_{i+1}, \dots, V_n$ at time $t_0$. The state also includes the link delays between the source and destination nodes hosting consecutive VNFs of an SFC. The state $S$ is defined as
$$S = \left(V^{K_j}_{i,t},\ L^{K_j}_{V_i,h}\right), \quad \forall i \in V_n,\ \forall j \in K_n, \tag{2}$$
where:
• $V^{K_j}_{i,t}$ means that VNF $K_j$ is deployed on physical server $V_i$ at time $t$;
• $L^{K_j}_{V_i,h}$ denotes the link delay $L^{V_i}_{h,t}$ between the source node $V_i$, which hosts VNF $K_j$, and the destination node, physical server $V_h$, which hosts VNF $K_{j+1}$, with $(v_i, v_h) \in V^2$.
2) Action space $A_t$: The agent explores which physical nodes are the optimal locations for hosting VNFs that meet the QoS requirements of incoming traffic. The action space bounds how the agent searches the physical nodes for VNF deployment. Each action represents the selection of a sequence of nodes that satisfies the QoS requirements. At each time step $t$, the agent selects as many nodes as there are VNFs in the SFC:
$$A = \left(V^{K_i}_1, \dots, V^{K_n}_n\right), \tag{3}$$
where $V_1, \dots, V_n$ are the nodes selected at time steps $t_1, \dots, t_n$.
Figure 4: Value and policy network for VNF deployment. (Figure content:
• an example topology whose nodes have 3 to 6 GHz CPU, 9 to 15 GB RAM, and 36 to 256 GB storage, joined by links with 3 to 12 ms latency and 40 to 120 Mbps bandwidth, hosting VNF1 to VNF3;
• the environment returns the state $s$ and reward $R$ to a policy network that outputs action $A$, and to a value network $V_\phi(s)$.)
3)
Rewa d Func ion
𝑅𝑡
:The ewa d unc ion quan i a i ely
measu es he impac o decisions on ne wo k ope a ions.
The agen needs o quan i y he quali y o he deploymen
loca ion o physical node by lea ning om he pa e n ex-
ac ed om en i onmen . The QoS u ili y unc ion is gi en
by
𝑄𝑗=𝛼𝜇𝑗·𝑆𝑗(𝐷𝑗) + 𝛼𝑓
𝜇𝑗·𝑇𝑗(𝑊𝑗,𝐶𝑗)
whe e
𝛼𝜇𝑗
and
𝛼𝑓
𝜇𝑗
,
a e weigh ing pa ame e s used o p io i ize use sa is ac ion
𝑆𝑗(𝐷𝑗)
,and he cos sa ings sco e
𝑇𝑗(𝑊𝑗,C𝑗)
o he gi en
a ic eques .
Assume
𝑉={𝑉1, . . . ,𝑉𝑛}
as he se o all possible loca ions
whe e VNFs can be deployed and
𝐾={𝐾1, . . . , 𝐾𝑛}
a e pos-
sible VNFs o be deployed on a ailable loca ions o physical
se e . Equa ion 2 ep esen s he delay be ween VNF
𝐾𝑗
un-
ning a he sou ce node
𝑉𝑖
and VNF
𝐾𝑗+1
unning on he
a ge node 𝑉ℎa ime 𝑡. The ewa d unc ion is gi en by
𝑅𝑡=−𝜔1∑︁
(𝑣𝑖,ℎ) ∈𝑉2
𝐿𝐾𝑗
𝑉𝑖,ℎ −𝜔2∑︁
𝑓∈𝐹
𝑄𝑖,(4)
where $\omega_1$ and $\omega_2$ weight latency reduction and QoS maximization, respectively.
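Equation (4) can be evaluated term by term once the per-link delays and per-request QoS utilities are known. The sketch below follows the signs of Eq. (4) as printed; the weights and inputs are illustrative values, not those used in the experiments.

```python
def reward(link_delays_ms, qos_utilities, w1, w2):
    """R_t = -w1 * sum(link delays) - w2 * sum(QoS utilities), following Eq. (4)."""
    return -w1 * sum(link_delays_ms) - w2 * sum(qos_utilities)

# Two hops of 3 ms and 4 ms, and two requests with utilities 0.375 and 0.5.
print(reward([3.0, 4.0], [0.375, 0.5], w1=1.0, w2=1.0))  # -7.875
```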
4.2 Proximal Policy Optimization for autonomous VNF deployment
PPO is a policy-based DRL algorithm that directly updates its policy based on observed rewards, allowing rapid adaptation in dynamic environments. Its use of policy gradients makes it efficient and adaptable, and thus a valuable choice for optimization problems in complex and changing network environments.
Deep Reinforcement Learning for Context-Aware Online Service Function Chain Deployment and Migration over 6G Networks SAC ’25, March 31-April 4, 2025, Catania, Italy
As shown in Figure 4, the state space consists of the network topology, current VNF placements, resource utilization, and the KPI requirements of incoming traffic. The RL agent makes VNF placement decisions by considering SFC routing choices and resource allocation. The reward is based on improvements in network performance and the efficiency of resource utilization. The functionality of each component in this reinforcement learning framework is detailed as follows.
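Algorithm 1: Autonomous VNF deployment
Input: $f_i = (H_i, B_i^{\min}, D_i^{\max}, \sigma_i)$, $\alpha$, $T_{\max}$, $G_{topo}$, $l(i, j)$
1  Initialization: $\phi_0, \pi_0$ // Initialize $\phi$, $\pi$ parameters
2  $S_t \leftarrow \{V^{K_j}_{i,t}, L^{V_{i,j}}_{t}\}$ // Initial VNF location
3  $R_t \leftarrow 0$ // Initialize the reward to zero
4  for $t \in T_{\max}$ do
5    $c \leftarrow$ MeasureCPURequirement($f_i$)
6    $b \leftarrow$ MeasureBandwidth($f_i$)
7    $l \leftarrow$ MeasureLatency($f_i$)
8    $q \leftarrow$ MonitorLinkQuality($G_{topo}$)
9    $S_t \leftarrow (l, b, c, q)$ // Assign $f_i$ KPI requirements
10   $S_t \xrightarrow{A_t,\ \pi_\theta(A_t \mid S_t)} S_{t+1}$ // Select $A_t \sim \pi_\theta$
11   $S_{t+1} \leftarrow V^{K_i}_{t}$ // Assign VNF $K_i$ new location
12   $D_t \leftarrow \{s_0, a_0, r_0, \ldots, s_t, a_t, r_t\}$ // Collect state-action interactions
13   if $l(f_i) < l(i, j)$ then
14     $A_t \sim \pi_\theta(A_t \mid S_t) \xrightarrow{\text{Execute } A_t} R_{t+1}, S_{t+1}$
15     $A^{\pi_{\theta_t}}(s_t, a_t) \leftarrow Q(s, a) - V_\phi(s)$ // Advantage estimate
       // Update $\theta$ and $\phi$ parameters
16     $\theta_{t+1} = \arg\max_\theta \frac{1}{|D_t| T} \sum_{\tau \in D_t} \sum_{t=0}^{T}$
17       $\min\left(\frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_t}(a_t \mid s_t)} A^{\pi_{\theta_t}}(s_t, a_t),\ g\left(\epsilon, A^{\pi_{\theta_t}}(s_t, a_t)\right)\right)$
18     $\phi_{t+1} = \arg\min_\phi \frac{1}{|D_t|} \sum_{\tau \in D_t} \sum_{t=0}^{T} \left(V_\phi(s_t) - R_t\right)^2$
19   else
20     SFC_req ← rejected
21 return $\phi_i(s_t), \pi_\theta(s_t)$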
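Line 18 of Algorithm 1 fits the value network by least squares. The sketch below evaluates that objective for already-computed value predictions, reading $R_t$ as the return target for step $t$ (an assumption) and normalizing by the number of trajectories $|D_t|$ as written; the trajectory format is hypothetical.

```python
def value_loss(trajectories):
    """(1/|D_t|) * sum over trajectories tau of sum_t (V_phi(s_t) - R_t)**2.

    Each trajectory is a pair (predicted values, return targets) of equal length.
    """
    if not trajectories:
        raise ValueError("D_t must contain at least one trajectory")
    total = 0.0
    for values, targets in trajectories:
        total += sum((v - r) ** 2 for v, r in zip(values, targets))
    return total / len(trajectories)

batch = [([1.0, 2.0], [1.5, 2.0]), ([0.0], [1.0])]
print(value_loss(batch))  # (0.25 + 0.0 + 1.0) / 2 = 0.625
```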
PPO develops an optimal policy through the collaboration between the value and policy networks. The working principle by which the value and policy networks monitor the environment state and discover the optimal policy is detailed in Algorithm 1. The value network estimates the long-term effectiveness of network states and reconfiguration actions, factoring in VNF placements, SFC configurations, resource utilization, and traffic demands. This estimate,
$$V_\phi(s) = \mathbb{E}\left[\sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \,\middle|\, S_t = s\right],$$
guides the policy network. Here, $V_\phi(s)$ is parameterized by $\phi$, $\gamma$ is the discount factor, $R_{t+k+1}$ is the reward received $k+1$ steps after time $t$, and $S_t = s$ specifies the state at time $t$.
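The expectation defining $V_\phi(s)$ is usually approximated from sampled rewards. The sketch below computes the discounted return for every step of a finite episode (truncating the infinite sum at the episode end, an assumption), a common regression target for the value network.

```python
def returns_to_go(rewards, gamma):
    """G_t = sum_k gamma**k * R_{t+k+1} for every step t of a finite episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):   # accumulate from the last reward backwards
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]          # restore chronological order

print(returns_to_go([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```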
In RL, a policy $\pi$ defines how an agent acts based on the observed state, guiding the agent's actions in response to environmental states. The policy network maps states to actions, aiming to maximize cumulative reward by increasing the probability of high-reward actions and decreasing that of less effective ones. PPO refines this approach by using the advantage function $A(s, a) = Q(s, a) - V_\phi(s)$ to evaluate actions and by constraining updates to stay close to the current policy, because large policy updates can cause significant changes in the agent's behavior and an unstable training process. $A(s, a)$ quantifies how much better or worse action $a$ in state $s$ is compared with the expected outcome under the current policy. A positive $A(s, a)$ suggests the action is beneficial; a negative value suggests a suboptimal choice that does not maximize the expected return, indicating that its selection probability should be lowered.
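In practice $Q(s, a)$ is not known and must be estimated. One simple option, used here purely for illustration and not necessarily the paper's estimator (Table 2 lists GAE), replaces it with the one-step target $r + \gamma V_\phi(s')$:

```python
def one_step_advantage(reward, value_s, value_next, gamma, done=False):
    """A(s, a) ~= r + gamma * V(s') - V(s); V(s') is dropped at episode end."""
    q_estimate = reward + (0.0 if done else gamma * value_next)
    return q_estimate - value_s

# Positive: the action did better than V(s) predicted, so its probability should rise.
print(one_step_advantage(reward=-2.0, value_s=-10.0, value_next=-9.0, gamma=0.5))  # 3.5
# Negative: the action did worse than expected, so its probability should fall.
print(one_step_advantage(reward=-5.0, value_s=-4.0, value_next=-4.0, gamma=0.5))  # -3.0
```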
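$$L(s, a, \theta_k) = \min\left(\frac{\pi_\theta(a \mid s)}{\pi_{\theta_k}(a \mid s)} A(s, a),\ \operatorname{clip}\left(\frac{\pi_\theta(a \mid s)}{\pi_{\theta_k}(a \mid s)}, 1 - \epsilon, 1 + \epsilon\right) A(s, a)\right) \qquad (5)$$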
The loss function $L(s, a, \theta_k)$ evaluates the policy $\pi_\theta$ at state $s$ for action $a$, where $\theta_k$ are the old policy parameters. The terms $\pi_\theta(a \mid s)$ and $\pi_{\theta_k}(a \mid s)$ represent the probabilities of taking action $a$ under the current and old policies, respectively. The advantage function $A(s, a)$ estimates the improvement in reward of action $a$ at $s$ relative to the average action under $\pi_{\theta_k}$. To limit large updates, $\operatorname{clip}\left(\frac{\pi_\theta(a \mid s)}{\pi_{\theta_k}(a \mid s)}, 1 - \epsilon, 1 + \epsilon\right)$ constrains the policy ratio to $[1 - \epsilon, 1 + \epsilon]$, where $\epsilon$ ensures stability by preventing excessive deviations.
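A minimal per-sample version of Eq. (5), assuming the action probabilities under the current and old policies are already available, shows how clipping removes the incentive to push the ratio beyond $[1 - \epsilon, 1 + \epsilon]$:

```python
def clip(x, low, high):
    """Restrict x to the interval [low, high]."""
    return max(low, min(x, high))

def ppo_clip_objective(prob_new, prob_old, advantage, eps=0.2):
    """Per-sample L(s, a, theta_k) from Eq. (5); training maximizes its mean."""
    ratio = prob_new / prob_old
    unclipped = ratio * advantage
    clipped = clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return min(unclipped, clipped)

print(ppo_clip_objective(0.75, 0.5, advantage=2.0))   # ratio 1.5 is capped at 1.2
print(ppo_clip_objective(0.25, 0.5, advantage=-1.0))  # ratio 0.5 is lifted to 0.8
```

Taking the minimum makes the objective pessimistic: with a positive advantage the gain stops growing once the ratio exceeds $1 + \epsilon$, and with a negative advantage the clipped, more negative value is kept.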
4.3 Autonomous VNF deployment algorithm
Step 1: The input to the PPO algorithm includes the communication and computational resources of the physical network and SFC requests with specific bandwidth, delay, computing-capacity, and memory requirements.
Step 2: The PPO model, with both value and policy networks, initializes the parameters $\phi_0$ and the policy $\pi_0$, as described in Algorithm 1, line 1.
Step 3: Start with a random VNF deployment $S_t = \{V^k_i, L^k_i\}$, which represents both the physical location and the link quality, and initialize the reward $R_t = 0$ (Algorithm 1, lines 2-3).
Step 4: Measure the KPI requirements of SFC request $f_i$, such as bandwidth, CPU, and latency, and observe the available resources in the physical network. These are translated into the state and taken as input to the value and policy networks (Algorithm 1, lines 4-9).
Step 5: Select a deployment action based on the current policy $\pi_\theta$. Update the system state and select the new VNF location $S_{t+1} = V^k_i$ (Algorithm 1, lines 10-11).
Step 6: Compare the incoming traffic requirements with the available infrastructure resources, $l(f_i) < l(i, j)$, for the allocation vector. Compute the reward $R_t$ and the advantage $A^{\pi_{\theta_t}}(s_t, a_t)$, and assign the new VNF location $S_{t+1} = V^k_i$ (Algorithm 1, lines 11-15).
Step 7: Update the policy and value function parameters $\theta_{t+1}$ and $\phi_{t+1}$, and repeat this iteration until the optimal policy is obtained (Algorithm 1, lines 15-18).
Step 8: Decide whether to accept or reject the SFC request (Algorithm 1, lines 13-20).
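To make the control flow of Steps 3 to 8 concrete, the toy loop below places each request's VNFs with a uniformly random policy (a stand-in for $\pi_\theta$; no learning or Step 7 update is performed) and accepts a request only if its path delay fits a per-request budget. Reading the acceptance test $l(f_i) < l(i, j)$ as a delay-budget check, the uniform 2 ms hop delay, and the request fields are all hypothetical.

```python
import random

def hop_delay_ms(a, b):
    """Hypothetical delay model: 0 ms if co-located, otherwise 2 ms per hop."""
    return 0.0 if a == b else 2.0

def run_toy_episode(requests, nodes, rng):
    """Place, measure, and accept or reject each request (no policy update)."""
    decisions = []
    for req in requests:
        # Step 5: one node per VNF, sampled uniformly instead of from a trained pi_theta.
        placement = [rng.choice(nodes) for _ in range(req["sfc_length"])]
        # Step 6: compare the resulting path delay with the request's budget.
        path_delay = sum(hop_delay_ms(a, b) for a, b in zip(placement, placement[1:]))
        accepted = path_delay <= req["max_delay_ms"]
        # Step 8: record acceptance or rejection of this SFC request.
        decisions.append({"placement": placement, "delay_ms": path_delay, "accepted": accepted})
    return decisions

rng = random.Random(42)
reqs = [{"sfc_length": 3, "max_delay_ms": 4.0}, {"sfc_length": 5, "max_delay_ms": 2.0}]
for decision in run_toy_episode(reqs, nodes=["V1", "V2", "V3"], rng=rng):
    print(decision)
```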
5 EXPERIMENTAL EVALUATION AND RESULTS
5.1 Simulation Setup
The simulation experiments are conducted using a NetworkX-based Python simulator that generates the network topology and infrastructure. For the DRL implementation, we employ the open-source tools Gymnasium (the maintained fork of OpenAI Gym) and Stable Baselines3 for training and testing the DRL-based PPO agent in a customized RL environment inspired by the USA NET network topology [20]. The performance of the proposed DRL-based solution is evaluated across various network settings, with topologies ranging from 10 to 60 nodes. Given the dynamic nature of incoming traffic, each SFC request $f_i$ is managed to maintain link quality, while the PPO agent monitors the available communication resources in the background and adapts the VNF deployment when necessary.
Table 2: Simulation Parameters
Number of nodes ($N$): {10, 20, ..., 60}
SFC length ($|K|$): {3, 5, 7, 9, ..., 13}
Maximum steps ($T_{\max}$): $10^6$
Discount factor ($\gamma$): 0.99
Clip ratio ($\epsilon$): 0.2
Number of epochs ($N_{epoch}$): 50
Batch sizes ($N_{batch}$): {64, 128, 256}
Learning rate ($\alpha$): {0.0005, 0.003, 0.001}
Coefficient ($\beta$): 0.01
Clip range: 0.1
Generalized Advantage Estimation (GAE): 0.95
We model both the state space $S$ and the action space $A$ as discrete spaces. Given PPO's suitability for discrete spaces and its stability and convergence properties, we select PPO for this scenario. PPO employs a stochastic policy that is refined over time to favor rewarding actions. Proper hyperparameter tuning balances exploration and exploitation, avoiding local optima and improving solution quality. Because large policy updates can cause performance collapse, PPO uses a surrogate loss to keep updates within a safe range. The simulation parameters and the hyperparameters that limit policy updates are listed in Table 2.
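For readers reproducing the setup with Stable Baselines3, the values in Table 2 can be collected into keyword arguments of its `PPO` class. This mapping is a reading of the table, not the authors' published configuration: where Table 2 lists several values the first is taken, and treating $\beta$ as the entropy coefficient (`ent_coef`) and the separate clip range of 0.1 as value-function clipping (`clip_range_vf`) are assumptions. The environment is not shown.

```python
# Table 2 values mapped onto Stable Baselines3 PPO keyword arguments (assumed mapping).
# Where Table 2 lists several values, the first one is used here.
PPO_KWARGS = {
    "learning_rate": 0.0005,  # alpha
    "batch_size": 64,         # N_batch
    "n_epochs": 50,           # N_epoch
    "gamma": 0.99,            # discount factor
    "gae_lambda": 0.95,       # GAE parameter
    "clip_range": 0.2,        # clip ratio epsilon
    "clip_range_vf": 0.1,     # ASSUMPTION: "clip range 0.1" read as value-function clipping
    "ent_coef": 0.01,         # ASSUMPTION: beta read as the entropy coefficient
}
TOTAL_TIMESTEPS = 10**6  # T_max, read as 10^6 from Table 2

# With stable-baselines3 installed and a Gymnasium environment `env` defined:
# from stable_baselines3 import PPO
# model = PPO("MlpPolicy", env, **PPO_KWARGS)
# model.learn(total_timesteps=TOTAL_TIMESTEPS)
```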
The clip ratio ($\epsilon$) in PPO prevents large policy updates, while the regularization coefficient ($\beta$) controls the influence of the entropy and value-function terms. The clip range (0.1) ensures stable updates, and Generalized Advantage Estimation (GAE) balances bias and variance.
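GAE trades bias against variance through its parameter $\lambda$ (0.95 in Table 2): $\lambda = 0$ gives the one-step TD advantage, and $\lambda = 1$ gives the full discounted return minus the value baseline. A compact sketch for a single trajectory without intermediate episode ends, assuming a value estimate for every state and for the state after the last step:

```python
def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory (no episode ends inside)."""
    advantages = [0.0] * len(rewards)
    next_value, running = last_value, 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # one-step TD error
        running = delta + gamma * lam * running              # exponentially weighted sum
        advantages[t] = running
        next_value = values[t]
    return advantages

print(gae([1.0, 1.0], [0.0, 0.0], last_value=0.0, gamma=0.5, lam=1.0))  # [1.5, 1.0]
```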
5.2 Baseline and reference for the proposed method
To evaluate the performance of the proposed DRL-based VNF deployment, we compare it to a baseline greedy algorithm. The greedy algorithm, known for its effectiveness in optimization tasks, makes the locally optimal choice at each step to maximize or minimize an objective. It serves as a valuable benchmark against which to assess a distributed, data-driven approach, because it consistently selects the best available option without considering the global optimum, that is, it makes short-sighted decisions. For VNF deployment, the greedy algorithm selects random source and destination nodes, deploys the VNFs, and chooses the lowest-latency path between them based on physical link delays. Because it focuses on immediate gains, its locally optimal choices may not yield globally optimal VNF placement and network performance.
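The path-selection step of the greedy baseline, choosing the lowest-latency path between a source and a destination node, is a shortest-path query over link delays. The Dijkstra sketch below uses only the standard library; in a NetworkX-based simulator, `networkx.shortest_path` with a `weight` argument serves the same purpose. The three-node topology is hypothetical.

```python
import heapq

def lowest_latency_path(graph, src, dst):
    """Dijkstra over link delays; graph maps node -> {neighbor: delay_ms}."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == dst:
            break  # the first time dst is popped, its distance is final
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None, float("inf")  # destination unreachable
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], dist[dst]

TOPOLOGY = {
    "A": {"B": 3.0, "C": 12.0},
    "B": {"A": 3.0, "C": 4.0},
    "C": {"A": 12.0, "B": 4.0},
}
print(lowest_latency_path(TOPOLOGY, "A", "C"))  # (['A', 'B', 'C'], 7.0)
```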
5.3 Evaluation Metrics
To investigate the performance of the proposed DRL model, we conducted simulations with varying parameters to measure the agent's performance within the environment, using the following evaluation metrics: average reward, loss function, acceptance ratio, VNF request violation ratio, and migration overhead.
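Algorithm 2: Baseline greedy VNF allocation
Input: $G_{topo}$, $T_{\max}$, $f_i$, $VNF\_info$
Output: $V_k$
1  Initialization: $\alpha_i = (V_1, \ldots, V_n)$, $P^*_{path} = 0$, $L^*_{path} = 0$
2  for $t \in T_{\max}$ do
3    src, dst ← Sample($G_{topo}$, $|K|$)
4    while src, dst $\notin V_k$ do
       // Randomly sample nodes
5      src, dst ← Sample($G_{topo}$, $|K|$)
     // Select the shortest path
6    $P_{path} \leftarrow P_{short}[src][dst]$
7    $L_P = \sum_{v_i, v_h \in v_{dst}}^{\mathrm{len}(P)} L^{K_j}_{V_{i,h}}$
8    for $V \in$ [src, dst] do
9      if $alloc\_src \neq V$ then
10       $L^*_{path} = \sum_{v_i \in P_{shortest},\ v_{src} \to v_{dst}} L^{K_j}_{V_{i,h}}$ // Sum the selected path delay
11       $L_{total} = \sum_{v_i \in P_{selected\_path},\ v_{src} \to v_{dst}} L(v_i, v_j)$
12       if $L_P < L^*_{path}$ then
13         $P^*_{path} \leftarrow$ (src, dst)
14   allocate $V[t] \leftarrow V^*$
15 return $V_k, V^*$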
• Reward: Based on the end-to-end delay between the physical servers hosting the SFC, as described in Equation 4.
• Loss function: Describes the convergence of the PPO model for VNF deployment, as defined in Equation 5. A smaller loss indicates better performance.
• Acceptance ratio: The proportion of accepted VNF requests relative to the total number of incoming requests, expressed as a percentage.
• QoS violation ratio: The fraction of VNF requests that fail to meet QoS constraints, relative to the total number of received requests.
• Migration overhead: The computational, energy, and bandwidth costs of migrating VNF instances to optimize resource use and service quality.
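Three of these metrics reduce to counts over per-request records. The sketch below assumes a hypothetical record format and, as in Figure 5c, measures migration overhead simply as the number of VNF migrations rather than their computational, energy, and bandwidth cost.

```python
def evaluation_metrics(records):
    """Acceptance ratio, QoS violation ratio, and migration count over SFC requests."""
    if not records:
        raise ValueError("no requests recorded")
    total = len(records)
    accepted = sum(1 for r in records if r["accepted"])
    violated = sum(1 for r in records if r["qos_violated"])
    return {
        "acceptance_ratio": accepted / total,
        "qos_violation_ratio": violated / total,
        "migrations": sum(r["migrations"] for r in records),
    }

log = [
    {"accepted": True, "qos_violated": False, "migrations": 0},
    {"accepted": True, "qos_violated": False, "migrations": 2},
    {"accepted": True, "qos_violated": True, "migrations": 1},
    {"accepted": False, "qos_violated": True, "migrations": 0},
]
print(evaluation_metrics(log))
# {'acceptance_ratio': 0.75, 'qos_violation_ratio': 0.5, 'migrations': 3}
```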
5.4 Simulation Results
Figure 5a shows the cumulative reward per time step during the training of the PPO algorithm for VNF deployment and migration, compared with the analytical greedy VNF allocation method. These results highlight the superior learning capability of the PPO algorithm at intermediate learning rates, in contrast to the myopic greedy algorithm, which cannot learn effective strategies. The cumulative reward reflects the PPO agent's decision-making, with initial variability due to extensive exploration. These fluctuations reflect the dynamic learning process as the PPO model optimizes VNF allocation. In contrast, the greedy approach, which ignores future states, also shows high variability. The comparison shows that PPO consistently outperforms the greedy method, offering more stable and efficient results.
[Figure 5: Learning capability of the DRL-based VNF deployment agent against greedy-based VNF allocation for various learning rates and an SFC composed of 13 VNFs in a network containing 60 nodes, and a parameter study of network scalability on migration overhead. (a) Comparison of reward convergence, in terms of negative latency, between the DRL-based VNF deployment agent and the analytical greedy method (reward ×10^3 versus time step; curves: greedy VNF allocation and PPO with lr = 0.0001, 0.003, 0.0005). (b) Convergence of the loss function for varying learning rates (0.001, 0.003, 0.0005), highlighting superior performance at intermediate learning-rate values (loss ×10^5 versus time step ×10^5). (c) Impact of network size and SFC length on the migration overhead in terms of the number of VNF migrations (service relocation cost versus number of nodes N; SFC lengths 5, 7, 9, 11, 13).]
Figure 5b illustrates the effect of the learning rates $\alpha = 0.001$, $0.003$, and $0.0005$ on loss convergence during training. Initially, the DRL PPO agent explores randomly, leading to a higher loss. As training progresses, the agent refines its strategy and the loss decreases toward 0, indicating improved decision-making and convergence of the VNF deployment policy.
Figure 5c describes the relationship between the number of nodes and the service relocation cost for various SFC lengths. As the number of nodes increases from 30 to 40 and up to 60, the service relocation cost rises across all SFC lengths. Notably, longer SFCs, such as those with lengths 11 and 13, exhibit significantly higher relocation costs than shorter SFCs, such as those with lengths 5 and 7. This indicates that as the network grows and the SFC becomes more complex, the cost of relocating services also increases, with longer SFCs incurring disproportionately higher costs.
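Counting migrations, the quantity plotted as service relocation cost in Figure 5c, amounts to comparing the hosting server of each VNF before and after a redeployment decision. A minimal sketch with hypothetical server names:

```python
def count_migrations(old_placement, new_placement):
    """Number of VNFs whose hosting server differs between two placements of one SFC."""
    if len(old_placement) != len(new_placement):
        raise ValueError("both placements must cover the same SFC")
    return sum(1 for old, new in zip(old_placement, new_placement) if old != new)

print(count_migrations(["V1", "V2", "V3"], ["V1", "V4", "V5"]))  # 2
```

Summing this count over all redeployment decisions in an episode yields a migration total of the kind compared across network sizes and SFC lengths.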
Figure 7 shows the communication overhead across various network configurations as the number of VNFs per SFC increases from 5 to 7 and up to 13. The results indicate that the number of migrations (i.e., the migration overhead) rises with the number of VNFs. Notably, the 60-node network setting consistently experiences the highest communication overhead, suggesting that larger networks have greater capacity for VNF migration. In contrast, the 30-node configuration exhibits a small number of migrations, which indirectly means lower communication overhead, highlighting the importance of managing network size and VNF allocation to minimize overhead.
Figure 6a shows the VNF acceptance and link violation ratios over time as the PPO algorithm learns to meet QoS constraints. Initially, VNF requests are rejected and the link violation ratio is high. As the agent refines its policy, VNF placement improves and violations decrease. Eventually, the PPO agent converges to a policy that maximizes VNF acceptance and minimizes link violations, demonstrating improved system performance and efficient QoS management.
The scalability of the DRL-based PPO system is evaluated by analyzing its response to increasing workloads. Figure 6b shows that as the number of physical servers in edge nodes increases, the E2E traffic delay decreases for different SFC lengths. More servers reduce latency by giving the RL agent more candidate locations from which to identify optimal deployments. Additionally, shorter SFCs result in lower latency, with E2E delay rising as SFC length increases. Overall, more physical servers enhance network performance by reducing delay and providing more deployment options across links with variable latency.
Figure 6c illustrates scalability across SFC lengths and network configurations. As SFC length increases, latency also rises, with longer SFCs leading to higher delays and reduced efficiency; configurations with fewer VNFs per SFC perform better. Notably, larger networks (e.g., 45 nodes) show lower latency than smaller ones (e.g., 10 nodes), emphasizing the benefit of larger networks in reducing delays. This underscores the importance of strategic VNF placement for optimizing latency and service delivery across network sizes.
6 CONCLUSION
This paper addresses the problem of network state-adaptive (i.e., network context-aware) optimal VNF deployment and migration while minimizing E2E delay. VNFs are organized in a predefined sequence to satisfy the strict delay requirements of SFC requests and to accommodate fluctuating communication resource demands. Unlike analytical algorithms for network load and failure detection, the DRL model adapts to real-time network changes by continuously updating its decision-making policies. The proposed approach outperforms the baseline methods by reducing link delay violations, improving VNF request acceptance rates, and minimizing E2E latency through optimal VNF deployment decisions. Furthermore, this work can be extended using federated reinforcement learning to support the emerging "network of networks" concept of the 6G network architecture.
