Fix with P6: Verifying Programmable Switches at Runtime

Apoorv Shukla, Kevin Hudemann, Zsolt Vági, Lily Hügerich,

Georgios Smaragdakis, Artur Hecker, Stefan Schmid, Anja

Feldmann


Open Access via institutional repository of Technische Universität Berlin

Document type

Conference paper | Accepted version

(i. e. final author-created version that incorporates referee comments and is the version accepted for

publication; also known as: Author’s Accepted Manuscript (AAM), Final Draft, Postprint)

This version is available at

https://doi.org/10.14279/depositonce-12013

Citation details

Shukla, Apoorv; Hudemann, Kevin; Vági, Zsolt; Hügerich, Lily; Smaragdakis, Georgios; Hecker, Artur; Schmid,

Stefan; Feldmann, Anja (2021). Fix with P6: Verifying Programmable Switches at Runtime. IEEE INFOCOM

2021 – IEEE Conference on Computer Communications, 10–13.05.2021.

© 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other

uses, in any current or future media, including reprinting/republishing this material for advertising or

promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of

any copyrighted component of this work in other works.

This work is protected by copyright and/or related rights. You are free to use this work in any way permitted by

the copyright and related rights legislation that applies to your usage. For other uses, you must obtain

permission from the rights-holder(s).


Apoorv Shukla (1,*), Kevin Hudemann (2,*), Zsolt Vági (3,*), Lily Hügerich (4), Georgios Smaragdakis (4,6), Artur Hecker (1), Stefan Schmid (5), Anja Feldmann (6)

(1) Huawei Munich Research Center, (2) SAP, (3) Swisscom, (4) TU Berlin, (5) Faculty of Computer Science, University of Vienna, (6) MPI-Informatics

Abstract—We design, develop, and evaluate P6, an automated approach to (a) detect, (b) localize, and (c) patch software bugs in P4 programs. Bugs are reported via a violation of pre-specified expected behavior that is captured by P6. P6 is based on machine learning-guided fuzzing that tests a P4 switch non-intrusively, i.e., without modifying the P4 program, to detect runtime bugs. This enables automated, real-time localization and patching of bugs. We used a P6 prototype to detect and patch existing bugs in various publicly available P4 application programs deployed on two different switch platforms: behavioral model (bmv2) and Tofino. Our evaluation shows that P6 significantly outperforms bug detection baselines while generating fewer packets, and that it patches bugs in large P4 programs such as switch.p4 without triggering any regressions.

I. INTRODUCTION

Programmable networks herald a paradigm shift in the design and operation of networks. By breaking the tie between vendor-specific hardware and proprietary software, programmable networks facilitate an independent evolution of software and hardware. With the P4 language [1], [2], one can define in a P4 program the instructions for processing packets, e.g., how a received packet should be read, manipulated, and forwarded by a network device such as a P4 switch.

However, these new capabilities also bring new challenges related to P4 software verification, i.e., ensuring that the software fully satisfies all the expected requirements. The behavior of a P4 switch depends on the correctness of the P4 programs running on it. A bug in a P4 program, i.e., a small fault such as a missing line of code or a fat-finger error, or a vendor-specific implementation error, can trigger unexpected and abnormal switch behavior. In the worst case, it can result in a network outage or even a security compromise [3].

Problem Statement. In this paper, we examine and verify the behavior of P4 switches after the P4 programs are deployed. We pose the question: “Is it possible to detect, localize, and patch software bugs in a P4 program running on P4 switches?” We believe that being able to answer this question, even partially, unlocks the full potential of programmable networks, improves their security, and will hence increase their penetration in operational and mission-critical networks.

Recently, a panoply of P4 program verification tools [4]–[10] has been proposed. These verification systems, however, fail to repair the P4 program containing bugs. Most of them [4]–[7] aim to statically verify user-defined P4 programs, which are later compiled to run on a target switch. They mostly find bugs that violate memory safety properties, e.g., invalid memory access or buffer overflow. Furthermore, they are prone to false positives and are unable to verify the runtime behavior on real packets. In addition, some classes of bugs, e.g., bugs related to checksums or ECMP (Equal-Cost Multi-Path) hash calculations, are platform-dependent or specific to the P4 target switch implementation and thus cannot be detected by static analysis approaches [4]–[7] or others [11]. Since runtime verification aims to verify the actual behavior against the expected behavior of a switch by sending specially crafted input packets to the switch and observing the behavior, such verification is complementary to static analysis. Currently, the development and testing cycles in P4-based systems are short [12] due to intense competition and the need for new applications, which makes runtime verification indispensable. Note that detecting the bugs that cause abnormal runtime behavior is a challenging task, as the P4 switch does not throw any runtime exceptions. Furthermore, the detection of bugs is also challenging if there is no output, i.e., packets are dropped silently instead of being forwarded. Thus, runtime verification of the switch is crucial.

*Authors worked on this paper while affiliated with TU Berlin.

A useful approach to verify the runtime behavior is fuzz testing, or fuzzing [13]–[23], a well-known dynamic program testing technique that generates semi-valid, random inputs which may trigger abnormal program behavior. However, for fuzzing to be efficient, intelligence needs to be added to the input generation so that the inputs are not rejected by the parser and the chances of triggering bugs are maximized. This becomes crucial especially in networking, where the input space is huge, e.g., even a 32-bit destination IPv4 address field in a packet header introduces 2^32 possibilities. To make fuzzing more effective, we consider the use of machine learning to guide the fuzzing process to generate smart inputs that trigger abnormal target behavior. Recently, Shukla et al. [23] have shown that Reinforcement Learning (RL) [24], [25] can be used to train a verification system. We build upon [23] by adding (a) static analysis to the fuzzing process, which significantly reduces the input search space and thus adds input structure awareness, and (b) support for platform-dependent bug detection.

Even if a bug in a P4 program is detected, localizing the statements in the P4 code that are responsible for the bug is non-trivial. The difficulty stems from the fact that practical P4 programs can be large with a dense conditional structure. In addition, the same faulty statements in a P4 program may be executed for both passed and failed pre-defined test cases; this makes it difficult to pinpoint the actual faulty line(s) of code. Tarantula [26]–[28] is a dynamic program analysis technique that helps in fault localization by pinpointing the potentially faulty lines of code. To localize software bugs, we tailor Tarantula, which was designed for generic software, to P4 programs by building a localizer called P4Tarantula and integrating it with the bug detection of machine learning-guided fuzzing. In this paper, we also show how the detection and localization of bugs makes it possible to patch a number of bugs in P4 programs.
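To make the idea behind Tarantula concrete, the following minimal Python sketch computes the standard Tarantula suspiciousness score of a line, (failed(s)/total_failed) / (passed(s)/total_passed + failed(s)/total_failed), from per-test line coverage. It illustrates the generic technique only; how P4Tarantula gathers coverage for P4 programs is not shown here, and the function name `tarantula_scores` and the toy coverage data are assumptions made for illustration.

```python
def tarantula_scores(coverage, outcomes):
    """Return {line: suspiciousness} using the standard Tarantula formula.

    coverage: dict mapping test name -> set of executed line numbers
    outcomes: dict mapping test name -> True if the test passed, False if it failed
    """
    total_passed = sum(1 for ok in outcomes.values() if ok)
    total_failed = sum(1 for ok in outcomes.values() if not ok)
    lines = set().union(*coverage.values()) if coverage else set()
    scores = {}
    for line in lines:
        passed = sum(1 for t, ls in coverage.items() if line in ls and outcomes[t])
        failed = sum(1 for t, ls in coverage.items() if line in ls and not outcomes[t])
        pass_ratio = passed / total_passed if total_passed else 0.0
        fail_ratio = failed / total_failed if total_failed else 0.0
        denom = pass_ratio + fail_ratio
        scores[line] = fail_ratio / denom if denom else 0.0
    return scores


# Toy example (hypothetical data): line 4 is executed only by the failing test.
coverage = {"t1": {1, 2, 3}, "t2": {1, 2, 4}, "t3": {1, 3}}
outcomes = {"t1": True, "t2": False, "t3": True}
scores = tarantula_scores(coverage, outcomes)
# Rank lines from most to least suspicious, breaking ties by line number.
ranking = sorted(scores, key=lambda line: (-scores[line], line))
```

Lines that are executed mostly by failing tests rise to the top of the ranking, while lines executed by both passing and failing tests receive intermediate scores, which is exactly the ambiguity described above.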

P6. In P4, automated program repair [29] is uncharted territory, and it becomes increasingly important as the software development lifecycle in programmable networks is short [12] with insufficient testing. In this paper, we show that, due to the structure of P4 programs, it is possible to automate the patching of platform-independent bugs (P4 program-specific software bugs) in P4 programs if a patch is available. To this end, we present P6, P4 with runtime Program Patching, a novel runtime P4-switch verification system that (a) detects, (b) localizes, and (c) patches software bugs in a P4 program. P6 improves existing work on machine learning-guided fuzzing [23] in P4 by extending and augmenting it with (a) automated localization and (b) runtime patching. P6 relies on a combination of static analysis of the P4 program and Reinforcement Learning (RL) to guide the fuzzing process that verifies the P4-switch behavior at runtime.

In a nutshell, the first step in P6 is to capture the expected behavior of a P4 switch, which is achieved using information from three different sources: (i) the control plane configuration, (ii) queries in p4q (§III-B1), a query language which we leverage to describe expected behavior using conditional statements, and (iii) accepted header layouts, e.g., IPv4 and IPv6, learned via static analysis of the P4 program. If the actual runtime behavior in response to the test packets generated via machine learning-guided fuzzing differs from the expected behavior, i.e., the p4q queries are violated, this signals a bug to P6, which then looks for a patch in a library of patches. If a patch is available, P6 modifies the original P4 program to fix the bug signaled by the p4q queries. Then, the patched P4 program is subjected to sanity and regression testing.

We develop a prototype of P6 and evaluate it by testing it on eight P4_16 application programs from switch.p4 [30], the P4 tutorial solutions [31], and the NetPaxos codebase [32] across two P4 switch platforms, namely, behavioral model version 2 (bmv2) [33] and Tofino [34]. Our results show that P6 successfully detects, localizes, and patches diverse bugs in all P4_16 programs while significantly outperforming bug detection baselines and without introducing any regressions.

Related Work. Unlike P6, existing P4 verification approaches [4]–[9], [11], [23], [35], [36] are insufficient for localizing and patching runtime bugs. Besides, they cannot detect platform-dependent bugs. In contrast, P6 can automatically detect, localize, and patch software bugs in P4 programs. In addition, P6 detects platform-dependent bugs. Table I compares the capabilities of other P4 verification tools with those of P6.

Related work in P4 | Runtime Verification | Detection | Localization | Patching | Detection of PD bugs
Cocoon [36]        | ✗ | ✓ | ✓ | ✗ | ✗
Vera [4]           | ✗ | ✓ | ✗ | ✗ | ✗
p4v [5]            | ✗ | ✓ | ✗ | ✗ | ✗
ASSERT-P4 [6], [7] | ✗ | ✓ | ✗ | ✗ | ✗
P4NOD [35]         | ✗ | ✓ | ✗ | ✗ | ✗
p4pktgen [8]       | ✗ | ✓ | ✗ | ✗ | ✗
P4CONSIST [9]      | ✓ | ✓ | ✗ | ✗ | ✗
P4RL [23]          | ✓ | ✓ | ✗ | ✗ | ✗
P6                 | ✓ | ✓ | ✓ | ✓ | ✓

TABLE I: Related work in P4 verification. PD denotes platform-dependent bugs. ✓ denotes the capability, (✓) denotes a part of the full capability, and ✗ denotes the missing capability.

Contributions. Our main contributions are:

• We design, implement, and evaluate P6, an end-to-end runtime P4 verification system that detects, localizes, and patches bugs in P4 programs non-intrusively. (§III)

• We observe that the success of P6 relies on the increased patchability of P4 programs from the old version (P4_14) to the new version (P4_16). (§II)

• We present a P6 prototype and report on an evaluation study. We evaluate our P6 prototype on a P4 switch running eight P4_16 programs (including switch.p4 with 8,715 LOC) from publicly available sources [30]–[32] across two platforms, namely, behavioral model and Tofino. Our results show that P6 non-intrusively detects both platform-dependent and platform-independent bugs, and significantly outperforms state-of-the-art bug detection baselines. (§IV)

• For platform-independent bugs, P6 localizes bugs and fixes the P4 program, when a patch is available, without causing regressions or introducing new bugs. (§III, §IV)

• We release the P6 software and a library of ready patches for all existing bugs in the P4 programs [37].

II. CHALLENGES & OPPORTUNITIES

A. Primer: Packet Processing Pipeline of P4

P4 [1], [2] is a domain-specific language comprising packet-processing abstractions, e.g., headers, parsers, tables, actions, and controls. The P4 packet processing pipeline evolved from [38] to its current form, P4_16 [2], on generic Portable Switch Architecture (PSA) [39] switch platforms, e.g., Tofino [34] (Figures 1a and 1b). In the P4_16 pipeline, there are six programmable blocks that are platform-independent, namely, the ingress parser, ingress match-action, ingress deparser, egress parser, egress match-action, and egress deparser. The programmable blocks are annotated with a solid line in Figures 1a and 1b. There are also two platform-dependent blocks (annotated with dashed lines in Figures 1a and 1b): the packet replication engine (PRE) and the buffer queuing engine (BQE). These are non-programmable and rely on proprietary implementations of hardware vendors.

[Figure 1a: An example of a platform-independent bug in the P4_16 pipeline. Blocks shown: Ingress Parser, Ingress Match-Action, Ingress Deparser, Packet Replication Engine (PRE), Buffer Queuing Engine (BQE), Egress Parser, Egress Match-Action, Egress Deparser. Code shown: the parser state parse_ipv4 calls pkt.extract(hdr.ipv4) and then transitions to accept; update_checksum(...) is computed over the fields { hdr.ipv4.version, …, hdr.ipv4.dstAddr }.]

[Figure 1b: An example of a platform-dependent bug in the P4_16 pipeline. Same pipeline blocks as in Figure 1a, with match-action tables Table 1 to Table n annotated "Match: Clone" and "Miss: Drop & Exit", and the packet cloned in the PRE. PRE logic shown: if (clone_flag != 0) {clone}; if (resubmit_flag != 0) {resubmit} elif (mcast_grp != 0) {multicast} elif (egr_port == 511) {Drop} ...]

[Figure 1c: P6 workflow. The Fuzzer sends test packets to the P4 switch and receives feedback; the Fuzzer activates the Localizer, which activates the Patcher; the Patcher compiles and deploys the patched P4 program. Section labels shown in the figure: §3.2, §3.3, §3.4.]

Fig. 1: Fig. 1a and Fig. 1b illustrate platform-independent and platform-dependent bugs, respectively. Fig. 1c depicts the P6 workflow.

B. Challenges: Runtime Bugs in P4

Bugs or errors can occur at any stage in the P4 pipeline. If a bug occurs in any of the programmable blocks, we term the bug platform-independent, and software patching can solve the problem. If the bug appears in the non-programmable or platform-dependent blocks, namely, the PRE or BQE, then the vendor has to be informed to fix the issue if the implementation is hardware-related or vendor-specific. P4 program verification systems [4]–[7] are able to detect bugs using static analysis. Unfortunately, static analysis (i) is prone to false positives, (ii) cannot detect platform-dependent bugs, and (iii) cannot detect runtime bugs whose detection requires actively sending real packets.

For platform-independent bugs, consider Figure 1a (solid line blocks). It illustrates part of the implementation of a Layer-3 (L3) switch provided in the P4 tutorial solutions [31]. Here, the parser does not check whether the IPv4 header contains IPv4 options, i.e., whether the IPv4 ihl field is equal to 5. When the IPv4 checksum of the packets is updated during egress processing, IPv4 options are not taken into account. Hence, for IPv4 packets with options, the resulting checksum is wrong; such packets are forwarded with an incorrect checksum and then dropped at the next hop, leading to anomalies in network behavior. Other bugs that fall in this category are those related to IPv4/6 checksum and ttl in the packet. Such bugs are platform-independent, as they only result from programming errors.
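The effect of ignoring IPv4 options can be reproduced outside a switch. The following Python sketch, which is an illustration and not part of P6, builds an IPv4 header with one 4-byte option (ihl = 6) and compares a checksum computed over only the first 20 bytes with one computed over the full header, using the standard Internet checksum (RFC 1071). The helper names, addresses, and the chosen option are assumptions made for this example.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement checksum over 16-bit words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ipv4_header(options: bytes = b"") -> bytes:
    """Build an IPv4 header with a zeroed checksum field (hypothetical values)."""
    assert len(options) % 4 == 0, "options must be padded to 32-bit words"
    ihl = 5 + len(options) // 4
    fixed = struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | ihl,        # version and ihl
        0,                     # DSCP/ECN
        ihl * 4,               # total length (header only, no payload)
        0, 0,                  # identification, flags/fragment offset
        64, 6,                 # ttl, protocol (TCP)
        0,                     # checksum placeholder
        bytes([10, 0, 0, 1]),  # source address
        bytes([10, 0, 0, 2]),  # destination address
    )
    return fixed + options


def with_checksum(header: bytes, checksum: int) -> bytes:
    """Write the checksum into bytes 10..11 of the header."""
    return header[:10] + struct.pack("!H", checksum) + header[12:]


header = build_ipv4_header(options=b"\x94\x04\x00\x00")  # Router Alert option
buggy = internet_checksum(header[:20])  # options ignored, as in the buggy program
correct = internet_checksum(header)     # options included

# A receiver validates by checksumming the whole header; a valid header yields 0.
next_hop_accepts_buggy = internet_checksum(with_checksum(header, buggy)) == 0
next_hop_accepts_correct = internet_checksum(with_checksum(header, correct)) == 0
```

In this sketch the header carrying the `buggy` checksum fails validation at the receiver while the `correct` one passes, which mirrors the next-hop drop described above.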

For a platform-dependent bug, consider the scenario shown in Figure 1b (dashed line blocks). Here, we assume that a P4 program implements at least two match-action tables. Any table except the last one could be a longest prefix match (LPM) table offering unicast, clone, and drop actions (ingress match-action block). The last match-action table implements an access control list (ACL). So, packets can either be dropped or forwarded according to the actions chosen by the previous tables. In this case, it is possible that conflicting forwarding decisions are made. Consider packets that are matched by the first table (Table 1), where a clone decision is made, and that are later dropped by the ACL table (Table n). In such a case, the forwarding behavior depends on the implementation of the PRE, which is platform-dependent. The implementation of the PRE of the SimpleSwitch target in the behavioral model (bmv2) is illustrated in Figure 1b. It would drop the original packet but forward the cloned copy of the packet. Similar bugs can occur if, instead of the clone action, the resubmit action is chosen (blue) or when multicast is implemented (green).
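The conflicting decision can be traced with a small model of the PRE logic shown in Figure 1b. The Python sketch below mirrors only the branches visible in the figure; the function name, the final `unicast` fallback (the figure elides the remaining branches), and the use of plain integers for metadata are assumptions for illustration and do not reproduce the bmv2 source code.

```python
DROP_PORT = 511  # egress port value treated as "drop" in Figure 1b


def pre_decisions(clone_flag=0, resubmit_flag=0, mcast_grp=0, egr_port=0):
    """Return the actions a PRE modeled after Figure 1b takes for one packet."""
    actions = []
    # The clone check is independent of the branches that follow it.
    if clone_flag != 0:
        actions.append("clone")
    if resubmit_flag != 0:
        actions.append("resubmit")
    elif mcast_grp != 0:
        actions.append("multicast")
    elif egr_port == DROP_PORT:
        actions.append("drop")
    else:
        # Assumption: branches elided in the figure are modeled as plain unicast.
        actions.append("unicast")
    return actions


# Table 1 requested a clone, the ACL table (Table n) later marked the packet to drop.
conflict = pre_decisions(clone_flag=1, egr_port=DROP_PORT)
```

Because the clone check is evaluated separately from the drop check, `conflict` contains both a clone and a drop: the original packet is dropped while its cloned copy is still forwarded, matching the behavior described above.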

The above motivates us to turn our attention to run-

time detection of bugs. Runtime verification is a useful

and complementary tool in the P4 verification repertoire

that detects both platform-independent bugs resulting from

programming errors as well as platform-dependent bugs.

C. Opportunities for Patching: Structure of a P4 Program

In the evolution of P4, there are two recent versions: P4_14 [40] and P4_16 [2]. P4_16 allows programmers to use definitions of a target switch-specific architecture, e.g., PSA (Portable Switch Architecture) [39], [41]. P4_16 is an upgraded version of P4_14. In particular, a large number of language features, including counters, checksum units, and meters, have been eliminated from P4_14 and moved into libraries in P4_16. P4_14 allowed the programmer to explicitly program three blocks: the ingress parser (including header definitions of accepted header layouts), the ingress control function, and the egress control function. Recall that P4_16 allows the programmer to explicitly program six blocks (Figure 1a).

By analyzing programs in the P4_14 and P4_16 versions, we realize that as more blocks of the P4 program become programmable, there is more onus on the programmer to write a program that behaves as expected when it is compiled and deployed on the P4 switch. Missing checks or fat-finger errors can cause havoc in the network. However, this is a blessing in disguise: the more programmable the code is, the more patchable it is. Thus, programming errors can be fixed. We observe that the potentially patchable code percentage increases from P4_14 to P4_16 in all applications (excluding calculator) from the P4 tutorial solutions [31] and the NetPaxos codebase [32] on the behavioral model (bmv2) switch platform [33] and on other generic PSA switch platforms [39], [41], e.g., Tofino [34], respectively. The patchable code percentage comes from the six programmable blocks in P4_16. Roughly, whatever is programmable is patchable. In principle, around 40-45% of a P4_16 program is patchable on the behavioral model (bmv2) switch platform [33]. This increases to 50-55% if the ingress deparser and egress parser are programmable, as on other target switch platforms, e.g., Tofino [34]. In particular, the parser and header definitions account for 20-40% of the total patchable code. If there is no bug in the parser or header definitions, packets with incorrect headers get dropped. However, a bug can still be either in a non-patchable platform-dependent block or in the application code logic or deparser, which are patchable as they are platform-independent.

[Figure 2: three panels, each showing the Fuzzer, Localizer, Patcher, and P4 switch next to the P4 source code.

Left panel, "P4 Source Code with Bugs":

    state parse_ipv4 {
        packet.extract(hdr.ipv4);
        transition accept;
    }

Center panel, "Localized by P4Tarantula": the same parse_ipv4 state, highlighted.

Right panel, "Patched by Patcher":

    state parse_ipv4 {
        packet.extract(hdr.ipv4);
        verify(hdr.ipv4.version == 4, error.BadHeader);
        verify(hdr.ipv4.ihl == 5, error.BadHeader);
        verify(hdr.ipv4.len >= 20, error.BadHeader);
        verify(hdr.ipv4.ttl >= 2, error.BadHeader);
        transition accept; }
    ………….
    apply {
        if (standard_metadata.parser_error != error.NoError) {
            mark_to_drop();
            return;
        }
    ………….
]

Fig. 2: P6 in Action: the automated detection, localization, and patching of a bug in an L3 switch P4 program [31].

Observation: From P4_14 to P4_16, a P4 program has twice as many programmable blocks, which increases the chances of patchability. Bugs detected in the platform-independent part can be localized and patched; a platform-dependent bug may not be patchable if it is hardware-related.

III. P6: SYSTEM DESIGN

A. P6: Overview

P6’s goal (see Figure 1c) is to detect, localize, and patch software bugs in a P4 program at runtime. This is achieved by verifying the actual runtime behavior against the expected behavior of a P4 switch, which runs a pre-compiled P4 program, in response to incoming packets.

The P6 system contains three main modules:

(1) Fuzzer: Generates test packets using RL-guided fuzzing, static analysis, and p4q queries (§III-B1), and sends them to the P4 switch running the pre-compiled P4 program. (§III-B)

(2) Localizer: P4Tarantula is the Localizer, which pinpoints the faulty lines of code causing bugs in the P4 program. (§III-C)

(3) Patcher: Automates patching of the bugs localized by the P4Tarantula Localizer, if patchable. Then, the Patcher compiles and loads the patched P4 program on the P4 switch. (§III-D)

P6 Workflow. P6 is a closed-loop control system. Through a pre-generated dictionary built from the control plane configuration, p4q queries, and static analysis of a P4 program, the expected runtime behavior of the P4 switch is captured and sent as an input to the Fuzzer, which contains the RL Agent and the Reward System (§III-B). As shown in Figure 1c, the Fuzzer selects appropriate mutation actions, such as adding, deleting, or modifying bytes in a packet, to generate test packets towards the P4 switch running the pre-compiled P4 program (step 1). If the actual runtime behavior towards the packets defies the expected behavior, i.e., the p4q queries are violated, this signals a bug in the form of a reward that is fed back to the Reward System. The RL Agent then exploits this reward during the training process to select better mutation actions on the packet (step 2). After the bug detection, the Fuzzer automatically triggers the Localizer (§III-C), P4Tarantula (only for platform-independent bugs; for platform-dependent bugs, the vendor is informed), which pinpoints the faulty line of code (step 3) and triggers the Patcher (§III-D), which searches for the appropriate patch in a library of patches for the corresponding P4 program (step 4). If the patch is available, the Patcher modifies the original P4 program, compiles and loads it on the P4 switch, and checks that the bug is no longer triggered by the p4q queries by repeating the whole cycle and executing sanity and regression testing (step 5). Note that P6 is non-intrusive and thus requires no modification to the P4 program for testing before patching.
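The patching step of this loop can be pictured with a small text-rewriting sketch. The Python code below is a hypothetical illustration and not the P6 Patcher: it inserts header-validation statements, taken from the patched parser in Figure 2, after the `packet.extract(hdr.ipv4);` line of a P4 source string, and only if those statements are not already present. The function name, the patch list, and the regular expression are assumptions; compiling and deploying the result is out of scope.

```python
import re

# Hypothetical patch entry: two of the verify() statements shown in Figure 2.
IPV4_PARSER_PATCH = [
    "verify(hdr.ipv4.version == 4, error.BadHeader);",
    "verify(hdr.ipv4.ihl == 5, error.BadHeader);",
]

# Matches the extract call on its own line and captures its indentation.
EXTRACT_LINE = re.compile(
    r"^([ \t]*)packet\.extract\(hdr\.ipv4\);[ \t]*$", re.MULTILINE
)


def apply_parser_patch(source: str, patch_lines=IPV4_PARSER_PATCH) -> str:
    """Insert missing patch lines after the IPv4 extract call (idempotent)."""
    match = EXTRACT_LINE.search(source)
    if match is None:
        return source  # localized statement not found, nothing to patch
    indent = match.group(1)
    # Only add statements that are not yet in the program.
    missing = [line for line in patch_lines if line not in source]
    if not missing:
        return source
    insertion = "".join("\n" + indent + line for line in missing)
    return source[:match.end()] + insertion + source[match.end():]


buggy_program = (
    "state parse_ipv4 {\n"
    "    packet.extract(hdr.ipv4);\n"
    "    transition accept;\n"
    "}\n"
)
patched_program = apply_parser_patch(buggy_program)
```

Applying the function a second time leaves `patched_program` unchanged, which corresponds to checking whether a patch is already present before modifying the program.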

P6 in Action. Before we dive into the details of Fuzzer,

Localizer, and Patcher, we demonstrate the operation of

P6. Figure 2 illustrates how P6 detects, localizes, and

patches an existing bug in a layer-3 (L3) switch P4 source

code (program) from [31] in an automated fashion. The

left part of Figure 2 shows the P4 program containing a

platform-independent bug in the parser code, i.e., no header

field validation is implemented, hence all IPv4 packets are

incorrectly accepted by the parser. After the P4 program is

deployed on the P4 switch, P6 is triggered. Initially, the

Fuzzer detects the bug violating the corresponding p4q

query based on the feedback (reward) received from the

P4 switch. Then, it triggers the P4Tarantula for localization

(shown in the center of Figure 2) where it pinpoints the

problematic part of the code (highlighted). Afterwards, the

Patcher is triggered automatically, patching the necessary

problematic parts of the code, i.e., adding header field

verification statements (highlighted on the right), after checking

if the patch was indeed missing from the P4 program.

Finally, Patcher automatically compiles [42] and deploys

the patched P4 program on the P4 switch, and triggers P6

to ensure that the patches caused no regressions and fixed
