
On-the-fly Table Insertions on Programmable Software Data Planes

Author: Simon, Manuel; Gallenmüller, Sebastian; Carle, Georg
Publisher: Zenodo
DOI: 10.23919/CNSM62983.2024.10814561
Source: https://zenodo.org/records/17302874/files/2024-simon-cnsm-addonmiss.pdf
On-the-fly Table Insertions on Programmable Software Data Planes
Manuel Simon, Sebastian Gallenmüller, and Georg Carle
Chair of Network Architectures and Services, Technical University of Munich, Germany
{simonm|gallenmu|carle}@net.in.tum.de
Abstract: Novel applications require a robust and reliable connection to provide the services of next-generation networks. The complex nature of these algorithms needs fast and efficient stateful processing. Using Software-defined Networking (SDN), new algorithms can be implemented into the network in a platform-independent way. The upcoming Portable NIC Architecture (PNA) for P4, a language to program data planes in SDN, allows inserting new table entries without controller interaction. Thus, it unleashes more performant and stateful applications without the overhead of the controller. We implement and evaluate these so-called 'add-on-miss' insertions introduced by the PNA for a P4 software target. In addition, we discuss the influence of latency and throughput optimizations on software packet processing systems. We determine the impact of these optimization strategies and which performance properties and costs can be measured with each. In our analysis, we model the costs of insertions based on an extensive baseline and compare them to table entry lookups and updates. We analyze the influence of the frequency of insertions and multi-core scenarios. Finally, we demonstrate that the approach scales to realistic scenarios.
Index Terms: SDN, State Management, P4, Add-on-Miss
I. INTRODUCTION
The upcoming 6G standard for communication networks will enable novel and complex applications, ensuring an ultra-low end-to-end latency as well as an ultra-low packet loss rate. Connections with these properties are essential to critical applications in domains such as transport, industry, and medicine. Optimized reliability methods are necessary to achieve these goals. An example of such an approach is hybrid automatic repeat request (HARQ). This algorithm increases the reliability of connections using forward error correction and repetition of non-acknowledged packets. Such complex algorithms must be distributed across different components in a network, either to the network interface card (NIC) or entirely to middleboxes to deal with demanding network applications.
P4 [1] is a platform-independent language to describe the data plane targeting high-performance, vendor-independent packet processing. With the upcoming Portable NIC Architecture (PNA) [2], P4 becomes a language to program both in-network switches and end-host applications. The latter is gaining attention due to efforts to bring P4 into the Linux Kernel [3]. Moreover, Intel announced that the SmartNIC E2000 will support the P4 language [4]. The capability of efficient state management becomes especially important when P4 programs are executed on the end of the communication path. Typical stateful scenarios include TCP flow tracking and the monitoring of connections.
The PNA enables stateful packet processing directly on the data plane. This new feature can speed up existing stateful P4 applications, such as IDS (e.g., P4ID [5]), stateful firewalls (e.g., P4SF [6]), or flow monitoring (e.g., NetSeer [7]). However, the statefulness of the P4 processing pipeline may introduce effects that are absent from the current generation of P4 devices, such as the impact on latency or jitter caused by state updates. The fundamental change in PNA requires a fundamental change to the measurement methodology used to investigate device behavior. Therefore, we establish a novel measurement methodology and apply it to a modified version of the P4 software target T4P4S [8]. This modified version supports add-on-miss insertions introduced by the PNA.
Our contributions can be summarized as follows: the definition of a measurement methodology focusing on the effects of stateful packet processing; the implementation of insertions in a software P4 target; the analysis of relevant performance indicators of PNA state updates; and the measurement and analysis of a comparison of costs of table entry lookups, updates, and insertions in a software P4 pipeline.
II. BACKGROUND & IMPLEMENTATION
a) P4: P4 [1] provides a target-independent way of programming network forwarding devices, relying on compilers for different targets. This concept allows vendor-independent mechanisms and gives sovereignty to the network operator. So-called externs can utilize non-P4-based extensions. Several P4 targets exist, which can be classified as hardware and software targets. Hardware targets provide the highest performance in terms of throughput and latency. They usually follow a pipeline model with multiple stages executing specific subtasks of the program. Several packets are processed simultaneously but at different stages in the pipeline. This processing approach becomes important considering the consistency of state updates. Software targets, on the other hand, provide the highest degree of flexibility. While their performance is lower, software targets run on commodity hardware and allow the easy integration of new functionality. Software targets typically follow the run-to-completion approach of packet processing. In this approach, different subtasks are handled by the same CPU core to avoid costly transfers of packets between different cores [9]. For our evaluation, we use T4P4S [8], which translates the P4 program to C code linked with DPDK [10], a userspace library for high-performance packet processing.
b) State Updates in P4: Data plane state in P4 is traditionally handled by registers. These externs provide index-based read and write access. However, they lack matching support to search for and select specific entries. The size and number of registers are limited, restricting the amount of maintainable state. Moreover, state may be fragmented in memory, negatively impacting performance.
In addition, tables can be used to maintain state. A table entry consists of the key(s) to be matched, associated with an action and the parameters to call the action. However, using traditional P4, table entries could only be modified by the control plane, which involves additional round trips. Table updates can be distinguished into two operations. First, new state can be inserted, meaning a new table entry with its keys, action, and associated parameters is created and inserted into the table. Second, existing table entries, i.e., the action parameters, can be modified after the lookup. The underlying table data structure has to provide consistency for both types: inserts and updates. For inserts, the modification of the data structure itself has to be synchronized to allow insertions by possibly multiple producers. For updates, the modification of each individual entry has to be synchronized to avoid race conditions and stale data shared by multiple consumers.
The data plane can request an update or insertion of an entry by sending a digest to the controller. The controller afterward decides to allow or deny the request and sends a notification to the data plane to trigger the update or insertion. This digest-based approach causes at least one RTT overhead (in T4P4S: hardcoded one-second-sleep). Allowing the data plane to change the table entries helps increase performance by avoiding the detour over the controller. This immediate reaction to table modifications facilitates the use of P4 in latency-critical applications.
c) Portable NIC Architecture: Different P4 models or architectures specify the capabilities of the used target. While the prominent P4 architectures (v1model and Portable Switch Architecture (PSA)) are designed for in-network switches, the PNA standard focuses on bringing P4 to end devices, such as NICs. The PNA aims to offload specific tasks to the NIC to speed up functionality and to address requirements for state-keeping. For instance, packet processing tasks in end hosts, such as handling protocols like TCP or QUIC, tend to be more complex and require frequent state modifications.
The PNA targets may support table changes: insertions and updates. While insertions create a new table entry for a non-existing key, an update changes an existing entry. PNA allows add-on-miss insertions, which are performed on lookup misses and can be activated for given tables. These insertions are triggered with the same key that caused the table lookup miss. Inside the default action code, a new extern add_entry<T>() allows adding a new entry to the table with a specified associated action.
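The add-on-miss behavior described above can be pictured with a small Python model. This is only a sketch under assumptions: the class `AddOnMissTable`, its `apply` method, and the dictionary-backed storage are hypothetical names chosen for illustration; they are neither the PNA extern interface nor the T4P4S code.

```python
from typing import Dict, Hashable, Optional


class AddOnMissTable:
    """Toy exact-match table; a dict stands in for the real hash table."""

    def __init__(self, add_on_miss: bool = True) -> None:
        self.add_on_miss = add_on_miss
        self.entries: Dict[Hashable, dict] = {}

    def apply(self, key: Hashable, new_params: dict) -> Optional[dict]:
        """Look up `key`; return its action parameters on a hit, None on a miss."""
        params = self.entries.get(key)
        if params is not None:
            return params  # hit: the caller may modify the parameters in place
        if self.add_on_miss:
            # Miss: the default action inserts an entry for the key that missed
            self.entries[key] = dict(new_params)
        return None


table = AddOnMissTable()
first = table.apply(42, {"value": 7})    # miss, an entry for key 42 is inserted
second = table.apply(42, {"value": 0})   # hit, returns the stored parameters
second["value"] += 1                     # update, written back to the table
```

The first packet with an unknown key pays for the insertion, and every later packet with that key takes the hit path, which is the split between the two experiment phases used in the evaluation.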
Updates allow the action code to use the parameters on the left-hand side of an assignment. Changes to the write-back parameters are synchronized to the underlying table. Furthermore, PNA allows the specification of an expiry timer, after which the control plane may delete an unused entry. This can be useful if protocol session state is no longer required, e.g., after a TCP session timeout.
d) Implementation: We use the modifications for updatable table entries implemented in previous work [11] as the basis of our implementation, which is available on GitHub [12]. It uses a lock-free hash table provided by DPDK, which is compatible with table updates. The lock-free mechanism applies an optimistic approach, checking at the end of the transaction whether there were concurrent changes and repeating the process if required. The consistency towards multiple insertions is ensured. However, the optimistic approach may be unsuited for heavy insertion scenarios with multiple threads. In that case, the transactions have to be restarted, potentially multiple times, to eventually reach a consistent state.
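The optimistic check-and-repeat idea can be illustrated with a deliberately simplified, single-threaded Python sketch. It is not the DPDK hash table implementation: the version counter, the class `OptimisticTable`, and the `interfere` hook that emulates a competing writer are all assumptions made for illustration.

```python
class OptimisticTable:
    """Single-threaded illustration of validate-and-retry insertion."""

    def __init__(self):
        self.version = 0   # incremented on every committed change
        self.slots = {}

    def insert(self, key, value, interfere=None):
        """Insert and return how many times the transaction was restarted."""
        restarts = 0
        while True:
            seen = self.version              # remember the state we started from
            if interfere is not None:
                interfere(self, restarts)    # test hook: emulates another writer
            if self.version == seen:         # validation at the end of the transaction
                self.slots[key] = value
                self.version += 1
                return restarts
            restarts += 1                    # concurrent change detected: repeat


def two_conflicts(table, attempt):
    """Hypothetical competing writer that commits during the first two attempts."""
    if attempt < 2:
        table.slots[("other", attempt)] = 0
        table.version += 1


table = OptimisticTable()
restarts = table.insert("flow-a", 1, interfere=two_conflicts)
```

Each conflicting commit costs one full restart, which is why the approach works well when insertions are rare and degrades when several threads insert at a high rate.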
For the P4 code translation, two adaptions had to be made: The functionality of the new extern method add_entry has to be generated for add-on-miss enabled tables. The method requires the table name and the keys to be added, which are only implicitly given in the P4 source code. Therefore, we pass the table name to the action and recalculate the key inside the add_entry function. To minimize the overhead, the table name is passed only to default actions of add-on-miss tables.
e) Performance Indicators: The two main features of the PNA are the add-on-miss insertions of table entries and the possibility to update existing ones. Both interweave with each other to provide efficient state management. Therefore, an evaluation has to answer the following questions: 1) What is the cost of an insertion? How does it compare to the cost of lookups? 2) What is the maximum throughput when adding new entries? How does the insertion rate influence it? 3) How does the cost of insertions differ from the cost of updates?
Question 1 implies a worst-case analysis that only consists of insertions. The comparison to the cost of lookups gives the relative overhead for a target whose maximum performance can be different. We will use the cost model presented in Section IV to answer it. We investigate a more realistic use case in Question 2, where only a subset of packets causes state insertions. For instance, TCP connection tracking requires an insertion for a new flow, but most traffic will update already-known flows. Therefore, it is important to investigate state insertions at different rates to see their impact on the maximum throughput. This way, the programmer or network operator is able to infer requirements for a given use case. Question 3 helps to weigh the costs and effects of creating a new state or changing an existing one, i.e., it answers whether the usage of placeholder entries might help improve performance.
III. RELATED WORK
a) Stateful packet processing architectures: Verdu et al. [13] propose a multi-layered architecture named MLP for packet processing that exploits parallelism as much as possible. Their results demonstrate that the well-established paradigms run-to-completion and software pipelining both come with drawbacks for stateful processing. Bianchi et al. [14] discussed OpenState as a way to maintain state inside OpenFlow applications. For that, an extended Finite State Machine XFSM is implemented in the data plane, avoiding controller interaction and splitting the tables into flow-tables and an XFSM table. With Open Packet Processor [15], they generalize the XFSM-based approach to run it on hardware. It still has similar concepts as OpenFlow and relies on a flow context table giving access to flow-related state. It allows more sophisticated stateful tasks than the basic OpenFlow match/action model. The approach is further extended into FlowBlaze [16], designed by Pontarelli et al., who implemented it for the NetFPGA platform. Sun et al. [17] follow a similar approach called SDPA, proposing a stateful "match-state-action" paradigm for Software-defined Networking (SDN). In contrast to Bianchi et al., they also claim to support indefinite state machines.
b) State Updates in P4: State update considerations using the P4 language also exist. Caiazzi et al. [18] present Switcharoo, implementing a key-value data structure into the ASIC-based P4 hardware target Intel Tofino. Their implementation runs entirely in the data plane, thus avoiding any overhead from the controller, enabling high performance for stateful applications. In previous work [11], we implement a way to update existing table entries in a P4 software target in the data plane. We also discuss which consistencies must be maintained in state updates and how flow-related state differs from global state. FlowBlaze was implemented in P4 [19] and provides updatable state in registers that are mapped through a flow context table. It thereby introduces some indirection. The software target P4-DPDK [20] supports the PNA and its table state modifications.
c) Distributed Data Plane State: Data plane state may require network-wide synchronization. Luo et al. [21] implemented a framework named Swing State for state management and consistent state migration to other nodes. They implement a P4 prototype of the framework that piggybacks the state on live traffic and automatically identifies state to migrate with static analysis of the P4 program. SwiSh [22] is a state management layer for P4 programs. There, Zeno et al. implement different consistency protocols to distribute state and evaluate it using an Intel Tofino ASIC. Zhou et al. present P4Update [23], implementing distributed consistent network updates using P4. Consistency is ensured using local verification of the update messages, relieving the control plane.
While there is significant interest in (P4-based) stateful packet processing and its evaluation, there is also a lack of a concise measurement methodology for that. In this paper, we aim to provide such a measurement methodology and apply this to our implementation of a PNA software target.
IV. METHODOLOGY
Our performance evaluation aims to calculate the operations' costs, i.e., CPU cycles. Software packet-processing systems process batches of packets to reduce the I/O overhead from/to the NIC. The size of the batches influences the system's behavior, ranging from latency-optimized (l.-opt.), i.e., smaller batch size, e.g., one, to throughput-optimized (t.-opt.), e.g., batch size 32+. There also exist approaches to self-adjust the batch size according to the currently processed traffic [24]. The optimization towards one of these performance goals influences what and how the costs of the performed operations can be measured. In the following, we describe performance models for both optimizations, assuming constant batch I/O costs. Table I lists all used variables of the built models.

Table I: Variables and their units of the model
Variable     Description                               Unit
$n$          Batch size                                packets
$B_n$        I/O cost of batch with size $n$           CPU cycles
$c_i$        Processing cost of packet $i$             CPU cycles
$c_{avg}$    Average processing cost per packet        CPU cycles
$f_{CPU}$    CPU frequency / cycles per second         CPU cycles / s
$C_n$        Processing cost of batch with size $n$    CPU cycles

a) Batch Model: For the performance model, we assume that the I/O cost $B_n$ of a batch is constant, depending on the size of the batch $n$. These costs include the transfer of the packets from/to the NIC and all preprocessing required to access the packets. Each packet $i$ of a batch further requires processing costs $c_i$, which may be different for each packet. The cost of the whole batch $C_n$ can be modeled as in Eq. 1:

$$C_n = B_n + \sum_{i=1}^{n} c_i \qquad (1)$$

When achieving a packet rate of $r$, $r/n$ batches are processed in the given time interval. Thus, $f_{CPU}$ can be set equal to the costs per second, as in Equation 2:

$$f_{CPU} = C_n \cdot \frac{r}{n} = \left( B_n + \sum_{i=1}^{n} c_i \right) \cdot \frac{r}{n} \qquad (2)$$

Increasing per-packet cost $c_i$, therefore, raises the number of CPU cycles spent on processing. In a run-to-completion model, this results in both a higher latency of the whole batch $C_n$ and a decrease in the throughput.
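To make the batch model tangible, the following Python sketch evaluates Eq. 1 and solves Eq. 2 for the packet rate. The function names and all numbers (a 2 GHz core, 3200 cycles of batch I/O, 32 packets per batch) are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def batch_cost(io_cost, packet_costs):
    """Eq. 1: C_n = B_n + sum of c_i, all in CPU cycles."""
    return io_cost + sum(packet_costs)


def packet_rate(f_cpu, io_cost, packet_costs):
    """Eq. 2 solved for r: r = f_CPU * n / C_n, in packets per second."""
    n = len(packet_costs)
    return f_cpu * n / batch_cost(io_cost, packet_costs)


# Hypothetical numbers: 2 GHz core, 3200 cycles of batch I/O, 32 packets per batch
cheap = packet_rate(2.0e9, 3200, [100] * 32)    # 100 cycles per packet
costly = packet_rate(2.0e9, 3200, [300] * 32)   # 300 cycles per packet
```

With these assumed values, tripling the per-packet cost doubles the batch cost (6400 to 12800 cycles) and therefore halves the achievable rate, because the constant I/O share is unaffected.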
b) Throughput-optimized: A t.-opt. software packet-processing system aims for a large batch size $n$, as the influence of $B_n$ is then minimal. A large batch size is helpful to amortize this (constant) overhead $B_n$. On the other hand, the latency is increased since the first packet is not sent out until the last packet of the batch has been processed. Miao et al. [24] give additional insights into batched queueing costs.
In previous work [25], we derived the I/O overhead using a baseline scenario. To create the baseline, we measure a simple Layer 2 forwarder with minimal packet processing to approximate an I/O-only scenario. Using the CPU frequency $f_{CPU}$ and the packet rate of the baseline $r_{baseline}$, we calculate the per-packet I/O overhead. The processing costs $c_i$ are 0 in this case; inserting into Eq. 2 gives $C_n = B_n$ and thus a per-packet I/O cost of $B_n / n = f_{CPU} / r_{baseline}$. Using this baseline and the average processing cost $c_{avg} = \frac{1}{n} \sum_{i=1}^{n} c_i$, Eq. 3 models the costs:

$$\frac{f_{CPU}}{r} = \frac{f_{CPU}}{r_{baseline}} + c_{avg} \;\Rightarrow\; c_{avg} = \frac{f_{CPU}}{r} - \frac{f_{CPU}}{r_{baseline}} \qquad (3)$$

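As a worked instance of Eq. 3, the sketch below plugs in the single-core packet rates reported in the evaluation (6.76 Mpps for the baseline forwarder, 1.38 Mpps for the insertion-only phase). The clock value of 2.1 GHz is an assumption taken from the nominal frequency of the throughput testbed in Table II; the resulting figure of roughly 1200 cycles per packet is an illustrative calculation, not a result stated in the paper.

```python
def average_cost(f_cpu, rate, rate_baseline):
    """Eq. 3: c_avg = f_CPU / r - f_CPU / r_baseline, in CPU cycles per packet."""
    return f_cpu / rate - f_cpu / rate_baseline


F_CPU = 2.1e9            # assumption: nominal clock of the throughput DuT (Table II)
R_BASELINE = 6.76e6      # baseline forwarder, packets per second
R_INSERT_ONLY = 1.38e6   # insertion-only phase on a single core

c_avg_insert = average_cost(F_CPU, R_INSERT_ONLY, R_BASELINE)
```

The value is an average over all packets of the run, so it only characterizes a single operation if every packet performs that same operation.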
However, this model can only be used to calculate average costs, which implies that the operation performed on each packet should be the same, as depicted in Figure 1a. If the operations differ or the cost of the operation is not constant, as depicted in Figure 1b, the individual costs cannot be measured. In the example, we cannot calculate the different costs of the blue and the yellow (4) packets, but only the average cost of all measured packets. Since we are interested in comparing different operations, i.e., lookup and insertion of state, we have to switch to a l.-opt. version of the software target.

[Figure 1: Model of average (t.-opt.) and packet (l.-opt.) costs. Panels: (a) same operation, t.-opt. system; (b) different operations, t.-opt. system; (c) same operation, l.-opt. system; (d) different operations, l.-opt. system. Panels (a) and (b) show a batch of packets 1 to 8 between NIC input and NIC output; panels (c) and (d) show a single packet.]

Table II: Testbed specifications
Measurement    Intel Xeon CPU              RAM       Intel NIC
Throughput     E5-2620 v2 @ 6×2.1 GHz      128 GB    82599WS
Latency        D-1518 @ 4×2.2 GHz          32 GB     X552

c) Latency-optimized: In a l.-opt. system, the batch size is minimized to improve latency at the cost of amortizing I/O expenses. Reducing output batch size alone might be enough for latency improvements. This investigation only discusses a shared batch size for input and output since the performance models are built on differences towards baseline scenarios. Reducing the throughput sufficiently also leads to a smaller batch size since the queues are only partially filled then. Figures 1c and 1d show a reduced batch size of one. If packets cause operations with non-constant costs or different operations, these only affect the single packet of the batch. Therefore, a l.-opt. system is suited to measure the costs of each packet and not only the average cost.
We previously modeled the cost per packet, measuring the latency [26]. Again, we can compare different latencies $l_i$ (in seconds) to a baseline scenario $l_{baseline}$, i.e., a forwarder, to calculate the cost $c_i$ of packet $i$:

$$c_i = f_{CPU} \cdot (l_i - l_{baseline}) \qquad (4)$$

Using our test setup, we can determine the latency of every processed packet. Following the performance model, we determine individual packet costs, even if they carry out different operations. We investigate both versions, optimized for throughput or latency. The first evaluates the maximum performance and, therefore, the practicability of the approaches. The latter analyzes the involved costs in detail.
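Eq. 4 can be checked with a few lines of Python. The sketch uses the median latencies reported later in the evaluation (3500 ns for the forwarder baseline, 4000 ns for an insertion, 3687 ns for a lookup); the clock of 2.2 GHz is an assumption based on the latency testbed listed in Table II, and the function name is hypothetical.

```python
def packet_cost(f_cpu_hz, latency_ns, baseline_ns):
    """Eq. 4: c_i = f_CPU * (l_i - l_baseline); latencies in ns, result in cycles."""
    return f_cpu_hz * (latency_ns - baseline_ns) * 1e-9


F_CPU = 2.2e9        # assumption: clock of the latency DuT listed in Table II
L_BASELINE = 3500    # median forwarder latency in ns

insert_cycles = round(packet_cost(F_CPU, 4000, L_BASELINE))
lookup_cycles = round(packet_cost(F_CPU, 3687, L_BASELINE))
```

Under this assumption, the two values come out as 1100 and 411 cycles, matching the insertion and lookup rows of Table IIIa.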
V. SETUP
a) Topology: For the evaluation, we use two different setups, cf. Table II. A two-host topology is used to measure the maximum throughput. The Device under Test (DuT) is interconnected using a 10 Gbit/s fiber link with the load generator (LoadGen). The LoadGen generates traffic using MoonGen [27], which is processed and forwarded by the DuT. For latency measurements, we use a three-host topology. Both links are mirrored using an optical splitter towards the Timestamper that timestamps all packets with a precision of 12.5 ns [28] for latency calculation.
b) DuT: The DuT runs T4P4S (based on DPDK 21.08) with the modifications required for state updates and insertions [12] on Debian Bullseye. It uses a P4 program that has one table performing a table lookup on a specified key in a packet header. Based on the existence of an entry in the one P4 table, the looked-up value is sent back using another header field. If there is no matching entry, an add-on-miss insertion is triggered. Upcoming lookups of the same key will eventually succeed afterward. Every packet is forwarded back to the originator. The batch size in the t.-opt. measurements is set to 32. For latency optimization, we turn off any draining and send out processed packets without waiting for the output batch to be filled, i.e., the effective output batch size is one.
c) Scenario: The LoadGen generates traffic in 300 flows, alternating the source IP address with a constant bitrate (CBR). All packets have a size of 84 B without CRC. The key $k$ that is used for lookup by the P4 program cycles pseudorandomly through $k \in [0, m]$ in a way that the cycle hits every element exactly once before the cycle repeats, i.e., the period length of the generated sequence is $m$. Therefore, the experiment can be divided into two phases: 1) The first $m$ packets will trigger an insertion, as the key is unknown. 2) The following packets will contain a key already in the table, so a lookup is performed. To measure the influence of different insertion rates $r$, every $r$-th packet contains a new key in the range $[m, 2m]$. That way, the first phase covers the insertion-only traffic. The second phase covers the usual case of rare insertions into a non-empty table. The explained scenario is typical for a newly started device, such as a stateful firewall. Shortly after starting, state of tracked connections is mainly inserted; after that, during regular operation, state is mainly looked up.
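The two-phase key sequence can be sketched as a Python generator. How the load generator script actually produces the pseudorandom cycle is not described here, so the shuffled permutation, the function name `key_stream`, and the choice of keys 0 to m-1 for the known set are assumptions for illustration only.

```python
import random
from itertools import count, islice


def key_stream(m, r=0, seed=0):
    """Yield lookup keys: a repeating permutation of m keys, plus a fresh key
    on every r-th packet once the first m packets have been sent (r=0: never)."""
    order = list(range(m))
    random.Random(seed).shuffle(order)   # fixed pseudorandom permutation of the keys
    fresh = m                            # next key that is not yet in the table
    pos = 0                              # position inside the permutation
    for packet in count(1):
        if r and packet > m and packet % r == 0:
            yield fresh                  # second phase: every r-th packet is a new key
            fresh += 1
        else:
            yield order[pos % m]
            pos += 1


keys = list(islice(key_stream(m=8, r=4), 16))
```

In this small example, the first eight keys are all distinct and unknown to the table (insertion-only phase), while packets 12 and 16 carry the fresh keys 8 and 9 among otherwise known keys.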
For the throughput measurements, we determine the maximum bitrate that still achieves a packet loss below 0.01 %. The rate is determined with an accuracy better than 1 Mbit/s.
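One common way to find such a maximum rate is a bisection over the offered load. The text does not state which search procedure was used, so the following Python sketch is an assumption; `measure_loss` stands for a hypothetical callback that runs one measurement at a given rate and returns the observed loss fraction.

```python
def find_max_rate(measure_loss, low=0.0, high=10_000.0, max_loss=1e-4, resolution=1.0):
    """Largest offered rate in Mbit/s whose loss fraction stays below max_loss.

    Assumes loss grows monotonically with the offered rate and that `low`
    itself is loss-free.
    """
    while high - low >= resolution:
        mid = (low + high) / 2
        if measure_loss(mid) < max_loss:
            low = mid      # rate sustained: search the upper half
        else:
            high = mid     # too much loss: search the lower half
    return low


def fake_dut(rate_mbit):
    """Hypothetical stand-in for a real measurement run: saturates at 3210.5 Mbit/s."""
    return 0.0 if rate_mbit <= 3210.5 else 0.05


best = find_max_rate(fake_dut)
```

The loss threshold of 1e-4 corresponds to 0.01 %, and the loop stops once the search interval is narrower than 1 Mbit/s.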
For the latency measurements, we generate traffic with a CBR of 300 Mbit/s. That way, we ensure that the device is not overloaded. Using low-rate CBR traffic and a minimized batch size, we ensure measuring the cost, i.e., the latency, of each individual packet, while mitigating the influence of batching and queueing. The latency plots show each 997th packet to handle the figure sizes, but every insertion during the second phase is shown and all packets are considered for analysis.
VI. EVALUATION
We first examine the performance of a forwarder to determine the baseline. Afterward, we dive into the performance evaluation of the insertions at different frequencies and compare them to lookups and table entry changes.
a) Baseline: We use a P4 forwarder program, which only sets the egress port, to measure the I/O overhead $B_n$. The processing costs $c_i$ equal 0 in this case. We investigate both the t.- and l.-opt. versions of T4P4S.

[Figure 2: Latencies of P4 forwarder. Panels: (a) t.-opt./batched version on one core, latency [ns] over measured packet [#]; (b) l.-opt./non-batched version on one core, latency [ns] over measured packet [#]; (c) l.-opt. and t.-opt. median values on up to three cores. Median latencies [ns] in (c): 1 core t.-opt. 28787, 1 core l.-opt. 3500, 2 cores t.-opt. 53962, 2 cores l.-opt. 3475, 3 cores t.-opt. 53825, 3 cores l.-opt. 3524.]

Figure 2a shows the occurring latencies of the batched version. The 32 batching stages can be clearly observed. Therefore, the latency has a high variance and a comparably high median of ≈28.8 µs (cf. Figure 2c). The measurements demonstrate that numerically smaller latencies occur more frequently. With a packet rate of 300 Mbit/s, the first few batching stages are filled more often and the corresponding latencies occur more often. This observation can be confirmed when investigating higher packet rates.
The l.-opt. version, depicted in Figure 2b, has, as expected, a lower median latency of 3.5 µs (cf. Figure 2c) and the variance is small. However, the achievable throughput is reduced. In a single-core scenario, the l.-opt. version achieves a maximum packet rate of ≈4.36 Mpps compared to ≈6.76 Mpps for the t.-opt. version, i.e., the t.-opt. version is ≈54.9 % faster (cf. Figure 4). Looking into multi-core scenarios, both versions scale. The median latency of the l.-opt. versions remains approx. constant (cf. Figure 2c). The latency of the t.-opt. version increases when using more than one core but remains similar when using more than two cores. The throughput of both versions (cf. Figure 4), however, scales linearly until hitting the line rate.

[Figure 3: Comparison of base operations using one core. Median latency [ns]: insert 4000, lookup 3687, update 3663, forward 3500.]

[Figure 4: Comparison of throughput of insertion-only (first phase) and the optimized forwarders using up to six cores. Estimated rate [Mpps] over number of cores (1 to 6); series: line rate, insert only, forward (t.-opt.), forward (l.-opt.).]

To build our models in the following steps, our baseline single-core performances are $r_{baseline} = 6.76$ Mpps for the throughput, and $l_{baseline} = 3500$ ns for the latency measurements.
b) Insertions Only: First, we conducted the scenario explained in Section V-c, with insertions only in the first phase, i.e., $r = 0$. Figure 3 shows the latency of the two phases. The median latency of the first phase (insert) is ≈4000 ns, and the latency of the second phase (lookup) ≈3687 ns. Additionally, another experiment was performed, which updated the table entries by setting their value according to the header field of the incoming packet. Therefore, an update/change is performed instead of a lookup. Its median latency is comparable to the median of the lookup; the difference is less than two times the timestamp resolution.
The maximum achievable packet rates of the first insertion-only phase are depicted in Figure 4 (in orange). The packet rate starts with ≈1.38 Mpps using a single core and increases up to ≈2.35 Mpps using three cores. Afterward, the overhead of the optimistic locking mechanism becomes more dominant, and therefore, the performance decreases.
Table IIIa shows the calculated costs following the model of Eq. 4. The costs are modeled using the median measured latencies of each operation. In the model, we consider constant I/O and processing costs. The latency, however, is affected by additional, non-deterministic factors introducing variance to the measurements. An insertion is approx. two times more expensive than a lookup or update in a single-core scenario. This assumption only holds for batched insertions.
c) Insertion Rates: Batched table insertions may be used at the start-up but are unrealistic during regular operation. Therefore, we now investigate how performance changes when the insertions happen at lower frequencies. For that, we include additional insertions into the second phase of the experiment. Figure 5 shows the measured latencies when every 10 000-th packet triggers an additional insertion during the second phase. Still, the latency of the insertions (orange) in the first phase is lower than that of the lookups (blue). However, the additional insertions in the second phase have an increased latency compared to the lookups and the first phase's insertions.

Table III: Modelled Costs/CPU-Cycles

(a) Operations
Operation     ∆l [ns]    Cycles
Insertion     500        1100
Lookup        187        411
Update        163        358
Resolution    12.5       28

(b) Insertions with different rates
Insertion Rate    ∆l [ns]    Cycles
1                 500        1100
10                587        1291
100               649        1428
1000              912        2006
10000             1337       2941
100000            2749       6048

[Figure 5: Latencies while inserting $2^{20}$ new entries through add-on-miss, followed by ≈4 M additional packets, with an insertion rate of 10 000 using one core. Latency [ns] over measured packet [#]; series: (Add-on-)Miss, Lookup (Hit).]

[Figure 6: Comparison of different insertion rates on one core. (a) accumulated, median latency [ns] per insertion rate: 10: 3737, 100: 3699, 1000: 3687, 10000: 3687, 100000: 3687, ∞: 3687. (b) insertions, median latency [ns] per insertion rate: 10: 4087, 100: 4149, 1000: 4412, 10000: 4837, 100000: 6249.]

[Figure 7: Maximum throughputs having insertions with different rates using up to six cores; line rate (lr) depicted in gray. Throughput [Mpps] over insert rate ($10^1$ to $10^5$, ∞); one series per core count (1 to 6).]

Figure 6b shows the occurring latencies for different insertion rates. The latency of the packets triggering an insertion increases, the less often these insertions happen. While the median latency is about 4087 ns with an insertion rate of ten, it rises to ≈6249 ns for a rate of 100 000, an increase of ≈52.9 %.
Table IIIb shows the modeled costs of these different rates. The increased costs are likely due to worse cache optimization and branch prediction. Different branches of the compiled C program are taken when mixing different operations. This problem mainly concerns software targets since these run on a CPU with such optimizations. Hardware targets follow the pipeline approach and, therefore, come with constant latency independent of the executed branch.
On the other hand, the overall median latency of the mix of lookups and insertions slightly decreases with a decreasing rate as shown in Figure 6a. The insertions themselves are more expensive, but costs are amortized due to their rare occurrence.
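The amortization argument can be made concrete with a short Python sketch. Note the simplification: the figures report medians, whereas the sketch computes a mean over one insertion and the remaining lookups, so it is an assumption-based illustration rather than a reproduction of Figure 6a. The inputs are the median latencies reported for lookups (3687 ns) and for insertions at rates 10 and 100 000 (4087 ns and 6249 ns).

```python
def mean_latency(rate, insert_ns, lookup_ns):
    """Mean latency when one of every `rate` packets inserts and the rest look up."""
    return ((rate - 1) * lookup_ns + insert_ns) / rate


LOOKUP_NS = 3687
frequent = mean_latency(10, 4087, LOOKUP_NS)        # cheap but frequent insertions
rare = mean_latency(100_000, 6249, LOOKUP_NS)       # expensive but rare insertions
```

Although a single insertion at rate 100 000 is considerably more expensive, its contribution to the mean is negligible, so the mean for rare insertions stays close to the pure lookup latency.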
Figure 7 depicts the maximum achievable throughputs for the different insertion rates. As explained, the throughput is an indicator of the average costs. The more costly insertions are amortized with rare insertions, and the achievable packet rates are approx. constant, starting with an insertion rate of $10^2$. Additionally, for these realistic scenarios, the throughput scales linearly with the number of CPU cores used.
However, the packet rates guaranteeing zero packet loss are reduced for an insertion rate of 10 in multi-core scenarios. In single-core scenarios, the throughput is increased compared to the insertion-only performance: from ≈1.38 Mpps to ≈2.41 Mpps. The picture changes for multi-core scenarios. There, the performance drops to ≈0.42 Mpps independently of the number of cores. In this case, the mixture of operations and concurrent access decreases the performance of the underlying data structure. Although lock-free, the optimistic approach of the DPDK hash table seems to be overloaded. The approach checks whether the table remained unchanged during the operations. In case it was altered in between, the operation is executed again. Fortunately, the limitation only exists for rather unrealistic frequencies of insertions.
VII. DISCUSSION & CONCLUSION
In this paper, we implemented and evaluated add-on-miss insertions in a P4 software target. These on-the-fly insertions allow new applications to run in the data plane and improve performance by avoiding any overhead with the control plane.
The question, whether this is a step backward in SDN, may arise. The split into a fast data plane and a more complex control plane was made by intent. This separation leads to clear responsibilities and better performance. State updates and insertions in the data plane without controller interaction blur the concept to a certain extent. However, we argue that global and local state can work hand-in-hand. A globally maintained and potentially synchronized state between several nodes will still be needed. The controller is still required to ensure a consistent global view. On the other hand, the local state helps implement applications requiring flow and state tracking, but the kept state is optional for the general network behavior. Hence, it is not required that the control plane is kept informed about the local state. Moreover, the PNA proposal with the state updates originates from the P4 and SDN community. As the PNA brings P4 to the end-host, statekeeping is required anyway to offload applications to the NIC.
Our results show that the cost of insertions is approx. two times higher than table entry lookups or updates. Due to worse branch prediction and cache optimization, the insertion cost depends on the insertion rate, at least on software targets. Therefore, exceptionally high insertion rates (e.g., every 10th packet) on different cores lower the performance. However, these effects did not occur for the other, more realistic rates we measured in our investigation. At this point, the lock-free solution scales well in multi-core scenarios, considering the throughput and the baseline performance of T4P4S.
Limitations on the influence of insertion rates do not apply to hardware targets. Their pipelined architectures typically offer constant latency. All pipeline stages are traversed independently of the taken control flow. On the other hand, ensuring consistency becomes harder when many packets are processed at different stages, as a packet may change the shared state in a previous stage.
ACKNOWLEDGMENTS
This work was supported by the EU’s Horizon 2020 programme
as part of the projects SLICES-PP (10107977) and
GreenDIGIT (4101131207), by the German Federal Ministry
of Education and Research (BMBF) under the projects 6G-life
(16KISK002) and 6G-ANNA (16KISK107), and by the
German Research Foundation (HyperNIC, CA595/13-1).
REFERENCES
[1] P. Bosshart, D. Daly, G. Gibb, M. Izzard, N. McKeown, J. Rexford,
C. Schlesinger, D. Talayco, A. Vahdat, G. Varghese, and D. Walker,
“P4: programming protocol-independent packet processors,” Comput.
Commun. Rev., vol. 44, no. 3, 2014.
[2] “P4 Portable NIC Architecture (PNA), version 0.5,” Last accessed:
2024-09-13. [Online]. Available: https://p4.org/p4-spec/docs/PNA.html
[3] J. H. Salim, D. Chatterjee, V. Nogueira, P. Tammela, T. Osinski,
E. Haleplidis, B. Sambasivam, U. Gupta, K. Jain, and S. Sethuramapandian,
“Introducing P4TC - A P4 implementation on linux kernel
using traffic control,” in EuroP4 2023, Paris, France. ACM, 2023.
[4] B. Burres, D. Daly, M. Debbage, E. Louzoun, C. Severns-Williams,
N. Sundar, N. Turbovich, B. Wolford, and Y. Li, “Intel’s Hyperscale-Ready
Infrastructure Processing Unit (IPU),” in HCS 33, 2021. IEEE,
2021.
[5] B. Lewis, M. Broadbent, and N. Race, “P4ID: P4 Enhanced Intrusion
Detection,” in NFV-SDN 2019, 2019.
[6] L. Teng, C.-H. Hung, and C. H.-P. Wen, “P4SF: A High-Performance
Stateful Firewall on Commodity P4-Programmable Switch,” in NOMS
2022, 2022.
[7] Y. Zhou, C. Sun, H. H. Liu, R. Miao, S. Bai, B. Li, Z. Zheng, L. Zhu,
Z. Shen, Y. Xi, P. Zhang, D. Cai, M. Zhang, and M. Xu, “Flow Event
Telemetry on Programmable Data Plane,” in SIGCOMM 2020. ACM,
2020.
[8] P. Vörös, D. Horpácsi, R. Kitlei, D. Leskó, M. Tejfel, and S. Laki,
“T4p4s: A target-independent compiler for protocol-independent packet
processors,” in HPSR 2018. IEEE, 2018.
[9] M. Dobrescu, N. Egi, K. J. Argyraki, B. Chun, K. R. Fall, G. Iannaccone,
A. Knies, M. Manesh, and S. Ratnasamy, “Routebricks: exploiting
parallelism to scale software routers,” in SOSP 2009, Big Sky, USA,
2009. ACM, 2009.
[10] “DPDK,” Last accessed: 2024-09-13. [Online]. Available:
https://www.dpdk.org/
[11] M. Simon, H. Stubbe, D. Scholz, S. Gallenmüller, and G. Carle,
“High-performance match-action table updates from within programmable
software data planes,” in ANCS 2021, Lafayette, USA. ACM, 2021.
[12] “t4p4s at addonmiss · manuel-simon/t4p4s · GitHub,” Last accessed:
2024-09-13. [Online]. Available:
https://github.com/manuel-simon/t4p4s/tree/addonmiss
[13] J. Verdú, M. Nemirovsky, and M. Valero, “MultiLayer processing - an
execution model for parallel stateful packet processing,” in ANCS 2008.
ACM, 2008.
[14] G. Bianchi, M. Bonola, A. Capone, and C. Cascone, “OpenState:
programming platform-independent stateful openflow applications inside
the switch,” ACM SIGCOMM Computer Communication Review,
vol. 44, no. 2, 2014.
[15] G. Bianchi, M. Bonola, S. Pontarelli, D. Sanvito, A. Capone, and
C. Cascone, “Open Packet Processor: a programmable architecture for
wire speed platform-independent stateful in-network processing,” 2016.
[Online]. Available: http://arxiv.org/abs/1605.01977
[16] S. Pontarelli, R. Bifulco, M. Bonola, C. Cascone, M. Spaziani,
V. Bruschi, D. Sanvito, G. Siracusano, A. Capone, M. Honda et al.,
“Flowblaze: Stateful packet processing in hardware,” in NSDI 2019, 2019.
[17] C. Sun, J. Bi, H. Chen, H. Hu, Z. Zheng, S. Zhu, and C. Wu,
“SDPA: Toward a Stateful Data Plane in Software-Defined Networking,”
IEEE/ACM Transactions on Networking, vol. 25, no. 6, 2017.
[18] T. Caiazzi, M. Scazzariello, and M. Chiesa, “Millions of low-latency
state insertions on ASIC switches,” PACMNET, vol. 1, no. CoNEXT3,
2023.
[19] D. Moro, D. Sanvito, and A. Capone, “Flowblaze.p4: a library for quick
prototyping of stateful sdn applications in p4,” in NFV-SDN 2020, 2020.
[20] “p4c/backends/dpdk at main · p4lang/p4c · GitHub,” Last accessed:
2024-09-13. [Online]. Available:
https://github.com/p4lang/p4c/tree/main/backends/dpdk
[21] S. Luo, H. Yu, and L. Vanbever, “Swing State: Consistent Updates for
Stateful and Programmable Data Planes,” in SOSR 2017. ACM, 2017.
[22] L. Zeno, D. R. Ports, J. Nelson, D. Kim, S. Landau-Feibish, I. Keidar,
A. Rinberg, A. Rashelbach, I. De-Paula, and M. Silberstein, “SwiSh:
Distributed Shared State Abstractions for Programmable Switches,” in
NSDI 2022, 2022.
[23] Z. Zhou, M. He, W. Kellerer, A. Blenk, and K.-T. Foerster, “P4Update:
fast and locally verifiable consistent network updates in the P4 data
plane,” in CoNEXT 2021. ACM, 2021.
[24] M. Miao, W. Cheng, F. Ren, and J. Xie, “Smart Batching: A Load-
Sensitive Self-Tuning Packet I/O Using Dynamic Batch Sizing,” in
HPCC/SmartCity/DSS 2016. IEEE, 2016.
[25] S. Gallenmüller, P. Emmerich, F. Wohlfart, D. Raumer, and G. Carle,
“Comparison of frameworks for high-performance packet IO,” in ANCS
2015, Oakland, USA, 2015. IEEE Computer Society, 2015.
[26] S. Gallenmüller, J. Naab, I. Adam, and G. Carle, “5g qos: Impact
of security functions on latency,” in NOMS 2020, Budapest, Hungary.
IEEE, 2020.
[27] P. Emmerich, S. Gallenmüller, D. Raumer, F. Wohlfart, and G. Carle,
“Moongen: A scriptable high-speed packet generator,” in IMC 2015.
ACM, 2015.
[28] Intel, “Intel ethernet controller x550 datasheet rev 2.6,” 2021,
Last accessed: 2024-09-13. [Online]. Available:
https://www.intel.com/content/www/us/en/content-details/333369/intel-ethernet-controller-x550-datasheet.html