Journal of Global Optimization (2021) 81:109–152
https://doi.org/10.1007/s10898-020-00949-1
Advances in verification of ReLU neural networks
Ansgar Rössig1·Milena Petkovic2
Received: 1 August 2019 / Accepted: 5 September 2020 / Published online: 27 October 2020
© The Author(s) 2020
Abstract
We consider the problem of verifying linear properties of neural networks. Despite their
success in many classification and prediction tasks, neural networks may return unexpected
results for certain inputs. This is highly problematic with respect to the application of neural
networks for safety-critical tasks, e.g. in autonomous driving. We provide an overview of
algorithmic approaches that aim to provide formal guarantees on the behaviour of neural
networks. Moreover, we present new theoretical results with respect to the approximation of
ReLU neural networks. On the other hand, we implement a solver for verification of ReLU
neural networks which combines mixed integer programming with specialized branching
and approximation techniques. To evaluate its performance, we conduct an extensive com-
putational study. For that we use test instances based on the ACAS Xu system and the
MNIST handwritten digit data set. The results indicate that our approach is very competitive
with others, i.e. it outperforms the solvers of Bunel et al. (in: Bengio, Wallach, Larochelle,
Grauman, Cesa-Bianchi, Garnett (eds) Advances in neural information processing systems
(NIPS 2018), 2018) and Reluplex (Katz et al. in: Computer aided verification—29th interna-
tional conference, CAV 2017, Heidelberg, Germany, July 24–28, 2017, Proceedings, 2017).
In comparison to the solvers ReluVal (Wang et al. in: 27th USENIX security symposium
(USENIX Security 18), USENIX Association, Baltimore, 2018a) and Neurify (Wang et al.
in: 32nd Conference on neural information processing systems (NIPS), Montreal, 2018b),
the number of necessary branchings is much smaller. Our solver is publicly available and
able to solve the verification problem for instances which do not have independent bounds
for each input neuron.
Keywords Neural networks verification ·ReLU ·MIP
BMilena Petkovic
petkovic@zib.de
Ansgar Rössig
1Institute for Mathematics, Software and Algorithms for Discrete Optimization, Technische
Universität Berlin, Straße des 17. Juni 136, Berlin, Germany
2Zuse Institute Berlin, Takustr. 7, 14195 Berlin, Germany
123
110 Journal of Global Optimization (2021) 81:109–152
1 Introduction
During the last few years, various approaches have been presented that aim to provide formal
guarantees on the behaviour of neural networks. The use of such verification methods may
be crucial to enable the secure and certified application of neural networks for safety-critical
tasks. Moreover, based on first results of [32], awareness was raised that neural networks
are prone to fail on so called adversarial examples. These are created by small perturbations
of input samples, such that the changes are (almost) imperceptible to humans. However,
these perturbations are often sufficient to make a neural network fail on the input sample.
The existence of such adversarial examples can be ruled out by methods of neural network
verification. In fact, a closely related line of research termed as robustness certification is
focused explicitly on this topic.
In the following section we formally introduce the problem that we regard. In Sect. 3we
provide an overview of related work, and present formulations of the verification problem
as MIP in Sect. 4. In the subsequent sections we consider approximation techniques, primal
heuristics, and branching methods for verification of neural networks. Extensive computa-
tional results on the performance of our solver and others can be found in Sect. 8, and Sect. 9
concludes the paper with some final remarks. Additional material can be found in the appen-
dices. Our solver, which is based on the academic MIP solver SCIP [13], is publicly available
at https://github.com/roessig/verify-nn.
For the ease of notation, we use [n]for n∈Nto denote the set {1,...,n}. In our work,
we only regard trained neural networks, which can be seen as immutable and deterministic
functions F:Rn→Rm.Fis determined by its weights and biases ((Al,bl))L
l=1. It holds
Al∈RNl×Nl−1for l∈[L]and bl∈RNl,l∈[L],whereLis the number of layers in the
neural network. N0,...,NLare the numbers of neurons per layer (cf. Bölcskei et al. [3],
Definition 1).
2 Problem definition
Now we give a formal definition of the verification problem for ReLU neural networks and
comment on some relevant properties of this problem. In the following we use the term
(solving) model to refer to an algorithmic approach for solving the verification problem. This
may encompass a range of choices in obtaining an actual algorithm.
Definition 1 (Verification Problem for ReLU Neural Networks) Assume that ∅ = X⊂Rnis
a polytope, and let ∅ = Y⊂Rmbe such that Y=k
i=1Qior Y=k
i=1Qiwhere k∈N
and Qi⊆Rmis a halfspace for i∈[k]. Given a neural network F:X→Rmwith ReLU
activation function, the verification problem consists in the decision whether F(X)⊆Y
holds. A triple (X,Y,F)will be called an instance of the verification problem (for ReLU
neural networks). Furthermore, if F(X)⊆Y, we say that the instance is verifiable, otherwise
it is refutable.
The construction of the feasible input polytope Xand the set of admissible outputs Yis
solely based on the application for which the neural network shall be used. Depending on the
algorithm which is used to solve the problem, the halfspaces Qican either be open or closed.
Though, either all of them must be closed or all of them must be open. However, the use of
floating point arithmetic by a solver for the verification problem makes this distinction rather
unimportant, since numerical comparisons require the use of a certain threshold difference.
123
Journal of Global Optimization (2021) 81:109–152 111
Moreover, Katz et al. [17] show that the verification problem for ReLU neural networks is
NP-complete. Hence we cannot expect that the problem can be solved efficiently in general.
We also follow the naming concept of Katz et al. [17] and refer to verifiable instances of the
verification problem as UNSAT instances, and to refutable instances as SAT instances. This
naming corresponds to the existence of a counterexample as defined in the following remark.
Remark 1 If an instance (X,Y,F)is refutable, i.e. F(X)Y,wewanttoprovidex∈X
such that F(x)/∈Y. We will refer to this x∈Xas a counterexample for the instance.
Remark 2 More complex properties can be investigated by spliting them into separate
instances. For example, if Y=k
i=1Qi∩l
j=1Pjfor halfspaces Qiand Pjand
k,l∈N,thenF(X)⊆Yholds if and only if F(X)⊆(k
iQi)and F(X)⊆(l
iPi).
Remark 3 Considering an instance Π=(X,Y,F)of the verification problem with X⊂Rn,
we will often assume the existence of bounds li,uifor i∈[n]such that li≤xi≤uifor
x∈X. Indeed, the requirements of Definition 1justify this assumption. These bounds can
be computed by solving one LP per bound. We set
li:= min
x∈Xxiand ui:= max
x∈Xxifor i∈[n].
In fact, for all publicly available instances of the verification problem that we are aware of,
the polytope Xis actually a box which is directly given by the bounds li,uifor i∈[n].For
these instances it is thus not necessary to solve any LP in order to obtain the bounds. However,
in this paper we also consider instances where Xis not a box, cf. Sect. 8and “Appendix A”.
Remark 4 Assume that we are given an instance Π=(X,Y,F)of the verification problem
as introduced in Definition 1. Some solving models of other authors are not only limited to
instances where the input polytope Xis in fact a box. Also the choice of output constraints as
represented by Yis more restricted for these models. These require that Y=k
i=1Qi⊆Rm,
where k∈Nand Qi⊆Rmis an open halfspace for i∈[k]. Indeed, this is the only of the cases
which are regarded in Definition 1where Rm\Yis a polyhedron. Yet, it is possible to use such
restricted solving models to solve an instance Π=(X,Y,F)where Y=k
i=1Qi⊆Rmfor
open halfpaces Qi⊆Rm. To this end, it is necessary to split the corresponding instance into
kinstances (X,Qi,F). Clearly, if F(X)⊆Qifor all i∈[k], then it holds F(X)⊆Yand Π
is verifiable. On the other hand, if there is x∈Xand some i∈[k]such that F(x)/∈Qi,we
know that Πis refutable since F(X)Y. We will refer to such an instance Πas conjunction
instance. On the other hand, an instance Π=(X,Y,F)where Y=k
i=1Qi⊆Rmfor open
halfspaces Qi⊆Rm, will be called disjunction instance. We will also regard those instances
as disjunction instances that fulfill Y=Qfor some open halfspace Q⊆Rm.Infact,all
instances that we consider in our computational experiments (see Sect. 8) are based on open
halfspaces. Closed halfspaces are only mentioned in some cases to provide a comprehensive
explanation.
Often, we will regard constraints of the form y=ReLU(x):= max({x,0})for x∈[l,u],
y∈R, that refer to a certain neuron with ReLU activation function. If the bounds l,u∈R
with l≤uare such that either l≥0oru≤0, we say that the corresponding neuron is fixed
(in its phase).
123
112 Journal of Global Optimization (2021) 81:109–152
3 Related work
The key properties of the problem as given in Definition 1are considered likewise in the
literature for neural network verification [4–6,9,10,17,19,21,22,33,35,36,40,41]. In view of
an instance Π=(X,Y,F)these can be summarized as follows.
A box, a polytope or a union of polytopes is defined as the feasible input domain Xfor the
property which shall be verified. Then, linear properties are defined that we denote in terms
of a set Y, such that Πis verifiable if and only if F(X)⊆Y. Complete algorithms (except
in [9]) are employed to solve this problem, i.e. if there exists ˜x∈Xsuch that F(˜x)/∈Y,
this will be reported. Clearly, the verification problem is not necessarily limited to neural
networks with ReLU activations, i.e. other activation functions are sometimes considered,
too. Cheng et al. [7] and Narodytska et al. [20] regard the verification problem on binarized
neural networks, which we do not investigate further.
First approaches to verification of neural networks [21,22,26] belong to the field of sat-
isfiability modulo theories (SMT), which generalize the boolean satisfiability problem by
replacing variables with predicates from various theories. Also the solver Reluplex [17]for
verification of neural networks is presented in this context, but solves instances which are
significantly more difficult, using an extended version of the well known simplex algorithm.
Ehlers [10] presents the solver Planet, which is based on LP and SAT solving. Dvijotham et
al. [9] formulate the verification problem as a non-convex optimization problem and obtain an
incomplete algorithm. Xiang et al. [41] regard the propagation of an input polytope through
a ReLU neural network, and Xiang et al. [40] propose to discretize the input domain in order
to verify neural networks. However, their work remains limited to theoretical considerations
and the presentation of numerical toy examples.
Various authors [6,11,19,33] consider MIP models for the verification problem. The per-
formance of such MIP models is predominantly determined by the quality of the bounds which
are computed for the ReLU neurons in the neural network. For that reason, the computation of
such bounds is discussed in more detail in Sect. 5. The use of appropriate branching schemes
is also important for an MIP model of the verification problem, we will provide more details
on this in Sect. 7. In fact, it is not necessary to solve the verification problem as an MIP if
such approximation and branching methods are used. Bunel et al. [5] present such a branch-
and-bound method without solving the verification problem as an MIP directly. Moreover,
they provide a good comparison of various methods for neural network verification. Besides
their own approach, the empirical evaluation includes Reluplex [17], Planet [10], and an MIP
model based on the suggestions of various authors [6,19,33]. While we also implement an
MIP model to solve the verification problem, its functioning is more similar to the branch-
and-bound method of Bunel et al. [5] than to the MIP model they use in their comparison.
Besides, we consider various additional aspects, and therefore speed up the solving process
significantly. For a computational comparison of other solvers with ours, we select Reluplex
[17] and the branch-and-bound method [5]. The other solvers regarded by Bunel et al. [5]
are not competitive with these, as their experimental results show. Moreover, we regard the
solvers ReluVal and Neurify as introduced by Wang et al. [35,36]. The concept for both
solvers is also a branch-and-bound scheme, that works with a frequent linear approximation
of the regarded neural network. In contrast to the method of Bunel et al. [5], the approximation
is not as good, but much faster to compute.
Anderson et al. [2] present an ideal MIP formulation for ReLU constraints which is closely
related with the techniques used in our work. Especially, they present a separation routine
which can be used to compute stronger neuron bounds. Optionally, we include this separator
123
Journal of Global Optimization (2021) 81:109–152 113
in our solving model as mentioned in Sect. 8.2. Nevertheless, it should be noted that the
results of Anderson et al. [2] do refer only to single ReLU neurons and at most the layer
before. Hence we do not have an ideal formulation of the whole network which implies that
solving the verification problem cannot be reduced to solving an LP using their formulations.
The idea of output range or reachability analysis is in principle to compute the output
range F(X)of a neural network F, given an input domain X. Since this is quite difficult,
the relevant work of Dutta et al. [8] and Ruan et al. [25] is limited to computing the range
g(F(X)), for some function g:F(X)→R. The function gshould then give some insights
into the output of the neural network Fon input domain X. Clearly, this problem is closely
related to the verification problem.
Several authors [12,23,29–31,34,37–39,42,43] consider the problem of computing robust-
ness guarantees for neural networks which are used for classification. Robustness means, that
the classification of an input sample should remain the same when the input is changed by
small perturbations. The computation of certified robustness bounds should rule out the
existence of adversarial examples. Indeed, this problem is a special case of neural network
verification. Except for Tjeng et al. [34], this problem is solved by incomplete algorithms.
That means, an algorithm either returns a guarantee that a region around an input sample is
free of adversarial examples, or no result, which is due to the use of approximations.
Modelling ReLU neural networks as MIPs is considered in the literature for other appli-
cation domains, too. Grimstad and Andersson [15] investigate the usage of ReLU neural
networks as surrogate models in MIPs and study various bound tightening techniques. Serra
et al. [28] apply a MIP formulation of a ReLU neural network to enable a lossless pruning
method. This way, equivalent neural networks of smaller size can be obtained. The compu-
tation of linear regions in ReLU neural networks is another field of application [27].
4 Neural network verification as MIP
It is straightforward to formulate the verification problem as a mixed integer program (MIP),
see [6,8]. We present a slightly improved formulation, as it can be found in [5,34]. In this
formulation, each neuron is represented by one or two (continuous) variables. The value of a
neuron before application of the ReLU function is given as a linear combination of the output
values of the predecessor neurons in the network plus the bias. That means, this connection
can be simply modelled by a linear equation in the MIP. We need two variables for neurons
with ReLU activation function. Let variable xbe the input value to the ReLU function and
ybe the output value. In this setting we will refer to xas the ReLU input variable and to y
as ReLU output variable. We want to model y=max{0,x}, which is represented using one
additional binary variable d. Furthermore, we need that upper and lower bounds l≤x≤u
are known. Then we obtain the following constraints which are equivalent to y=max{0,x}:
y≥x,y≥0
y≤x−(1−d)l,y≤d·u
d∈{0,1}
x∈[l,u],l<0<u(1)
Of course it is possible that we have l≥0oru≤0 for the bounds. In these cases, we can omit
the binary variable dand replace (1) as follows. If l≥0, this implies y=max{0,x}=x,
i.e. (1) is replaced by y=xfor x∈[l,u].Ifu≤0, we have y=max{0,x}=0 and thus
123
114 Journal of Global Optimization (2021) 81:109–152
we can set y=0forx∈[l,u]. These cases correspond to fixing the binary variable dto 1
or 0, respectively.
Let Π=(X,Y,F)be a disjunction instance of the verification problem such that it holds
Y=k
i=1Qi⊆Rmfor certain open halfspaces Qi,i∈[k]. Then it is straightforward
to formulate an MIP which is feasible if and only if Πis refutable. The instance Πof the
verification problem is represented by the following constraints:
x∈X,y∈Rm\Yand y=F(x)(2)
This is an MIP, since x∈Xand y∈Rm\Ycan be represented by linear constraints.
Especially, y=F(x)can be expressed by linear constraints combined with integrality
constraints for auxiliary binary variables that are used to model the ReLU function as shown
in (1). Now, if the MIP (2) is feasible, there exists x∈Xsuch that F(x)=y/∈Y.This
implies F(X)Yand hence Πis refutable. Otherwise, if MIP (2) is not feasible, that means
that for all x∈Xit holds F(x)=y∈Yand thus Πis verifiable.
For conjunction instances where Y=k
i=1Qi⊆Rmfor open halfspaces Qi,i∈[k],we
consider two options. Either we split instance Πinto kinstances as mentioned in Remark 4
or we formulate the verification problem as an optimization problem as proposed by Bunel
et al. [5]. In this setting, an instance Π=(X,Y,F)is verifiable if the optimum value of
the corresponding optimization problem is greater than zero and refutable if it is lower than
zero.
Assume that Y=k
i=1Qi⊆Rmwhere Qi,i∈[k]are open halfspaces. This implies
the existence of qi∈Rmand bi∈Rfor i∈[k]such that we have halfspaces Qi={x∈
Rm|qT
ix>bi}. Then we see that
y∈
k
i=1
Qi
⇔∃j∈[k]: y∈Qj={x∈Rm|qT
jx>bj}
⇔∃j∈[k]: qT
jy−bj>0
⇔max
i∈[k]qT
iy−bi>0.
The same holds for closed halfspaces Qi,i∈[k], if all inequalities “>” are replaced by their
counterparts “≥”. Analogously, with open halfspaces Qias before and Y=k
i=1Qithe
same can be shown with “min” instead of “max”. For the case Y=k
i=1Qiwe consider
the following MIP:
minimize t
s.t. x∈X
y=F(x)
zi=qT
iy−bi∀i∈[k]
t=max{z1,...,zk}(3)
Indeed, (3) is an MIP since the constraint t=max{z1,...,zk}can be replaced by linear
constraints using kadditional binary variables as shown in [5,34]. In this case, we can also
replace the constraint by t≥z1,...,t≥zk.
123
Journal of Global Optimization (2021) 81:109–152 115
Theorem 1 Instance Π=(X,Y,F),whereY =k
i=1Qifor some open halfspaces Qi=
{x∈Rm|qT
ix>bi},q
i∈Rmand bi∈Rfor i ∈[k], is verifiable if and only if the optimum
value of (3)is greater than zero.
Proof Assume that Πis verifiable, i.e. F(X)⊆Y=k
i=1Qi. Hence, for any x∈Xthere
exists j∈[k]such that y:= f(x)∈Qj, i.e. qT
jy−bj>0. It follows t≥zj:= qT
jy−bj>
0 which implies the desired result since x∈Xwas arbitrary. Remind that we regard optimum
solutions of an MIP so it suffices to consider finitely many x∈X.
For the opposite direction, assume that the optimum value ˆ
tof (3) fulfills ˆ
t>0. Let x∈X
be arbitrary and y=F(x). With zi=qT
iy−bifor i∈[k]it holds max{z1,...,zk}≥ˆ
t>0
since ˆ
tis optimal. In other words, there is j∈[k]such that qT
jy−bj=zj>0 and thus
y∈Qj⊆Y.Sincex∈Xwas arbitrary, Πis verifiable.
It works also for the case Y=k
i=1Qiby replacing “max” with “min” in (3), and similarly
for closed halfspaces. In practice, the optimum value ˆ
tof (3) will usually be significantly
greater than zero if an instance is indeed verifiable. Clearly, it is not necessary to actually
compute ˆ
tin order to solve the verification problem as Bunel et al. [5] point out. If the dual
bound of (3) is greater than zero, the instance is verifiable. We mainly use this formulation
in our implementation. On the other hand, if the primal bound of (3) is lower than zero, we
know that the corresponding instance of the verification problem is refutable as ˆ
t<0is
already implied. However, this case has less relevance since primal solutions are usually only
found by specialized heuristics which we describe in Sect. 6.
Besides, we note that the verification problem can be modelled as quadratic program. This
formulation does not require any integer or binary variables as the nonlinear behavior of the
ReLU activations is modelled by the quadratic objective function and an optimality condition.
Let Π=(X,Y,F)be a disjunction instance, i.e. Y=k
i=1Qifor open halfspaces Qi,
i∈[k]and k∈N.Let((Al,bl))L
l=1be the weights and biases corresponding to F. Here, L
is the number of layers in the neural network and N0,...,NLare the numbers of neurons
per layer. This implies X⊆RN0and Y⊆RNLand we can state the formulation:
minimize
L−1
l=1
xT
l(xl−Alxl−1−bl)
xl≥Alxl−1+bl,xl≥0,xl∈RNl∀l∈[L−1]
xL=ALxL−1+bL
x0∈X,xL∈RNL\Y(4)
Theorem 2 Instance Πis refutable if and only if the quadratic program (4) is feasible and
the optimum value is zero. Otherwise Πis verifiable.
Proof We first assume that Πis refutable so that we can find x∈Xwith F(x)/∈Y.Weset
x0:= x,xL:= F(x)∈RNL\Yand for l∈[L−1]we let xl:= ReLU(Alxl−1+bl)which
implies xl≥0andxl≥Alxl−1+bl. Furthermore it is xL=ALxL−1+bLand for each
l∈[l−1]we have for each i∈[Nl]that either [xl]i=0or[xl]i=[Alxl−1+bl]i.Since
xl∈RNl, this leads to the conclusion that xT
l(xl−Alxl−1−bl)=0foralll∈[L−1].
Hence, the quadratic program (4) is feasible and its optimum value is zero.
On the other hand, if (4) is feasible and the optimum value is zero, we know that there is
x0∈X, such that F(x)=xL/∈Ywhich means that Πis refutable. Indeed, it holds F(x)=
xLsince for all l∈[L−1]we have xl≥0andxl≥Alxl−1+bl, i.e. xT
l(xl−Alxl−1−bl)≥0
123
116 Journal of Global Optimization (2021) 81:109–152
Fig. 1 Naive approximation of ReLU function in one dimension. Here we have lower bound −4 and upper
bound 6 for the ReLU input variable x. The feasible domain of the ReLU output variable yisgivenbythe
solid black line for the actual ReLU function and by the shaded area for the naive approximation
for all l∈[L−1]. Hence we know xT
l(xl−Alxl−1−bl)=0foralll∈[L−1]as the
objective value of (4) is zero, and it follows that [xl]i[(xl−Alxl−1−bl)]i=0foralli∈[Nl]
and l∈[L−1]. Subsequently it holds [xl]i=ReLU(Alxl−1−bl)and thus we can conclude
that F(x)=xL.
To evaluate this formulation computationally, we tried a plain implementation in SCIP
[13]. Within a time limit of one hour, SCIP is not able to solve any of the disjunction instances
in our SAT and UNSAT evaluation sets. Only very easy MNIST instances could be solved
with this formulation.
Anderson et al. [2] present an ideal MIP formulation for ReLU constraints which can
replace (1). It should be noted that the formulation is ideal for a single ReLU neuron but not
for the whole neural network. As the formulation of Anderson et al. [2] has an exponential
number of constraints, they also describe a separation routine that runs in linear time. This
allows to strengthen formulation (1) by adding additional cuts to the LP relaxation, as obtained
by the separation routine.
5 Approximations of ReLU neural networks
Solving the problem of neural network verification requires to model constraints of the form
y=max{0,x}for all ReLU input variables xand corresponding ReLU output variables y
of each layer. It is crucial to obtain tight bounds l,uon the value of xbefore the application
of the ReLU function. Especially, we regard the linear approximation of these constraints for
a whole layer at once, an idea so far considered only briefly in [2,5,23].
Given an instance Π=(X,Y,F)of the verification problem with X⊂Rn, we will
assume the availability of input bounds li,uiwith i∈[n]for the components of Xthroughout
this section (cf. Remark 3). All approximation methods that we present are executed layer
by layer. Based on the input bounds, we compute bounds for the neurons in the following
layer. This process is iterated until the last layer is reached, i.e. the output layer. Depending
on the instance and the bound computation approach, it may be possible to prove that Πis
verifiable using only these bounds for the output layer. Assume that we have a set Awhich
approximates the neural network output F(X), i.e. F(X)⊆A.IncasethatA⊆Y,wehave
thus shown that F(X)⊆Y, which means that Πis verifiable.
5.1 Basic approximation methods for bound computations in neural networks
The most simple approximation approach is naive interval arithmetic as used in [10,35].
Figure 1provides a visual representation of this approximation, to which we will refer as
123
Journal of Global Optimization (2021) 81:109–152 117
Fig. 2 Approximation of ReLU
function in one dimension as
proposed by [36]. Here we have
lower bound −4 and upper bound
6 for the ReLU input variable x
Fig. 3 Approximation of ReLU
function in one dimension as
proposed by Ehlers [10]. Here we
have lower bound −4 and upper
bound 6 for the ReLU input
variable x
−8−6−4−2 2468
2
4
6
8
x
y
naive approximation. This simple approach mainly suffers from the fact that it assumes
the independency of all predecessor neurons when computing a new bound. Therefore, the
bounds computed with this method are so bad, that they only serve to solve tiny instances.
Wang et al. [35] use symbolic interval arithmetic to keep track on some of the neuron
dependencies in order to compute better bounds. The idea is to keep a symbolic equation,
based on the input values of the network, for each neuron. This symbolic approach can only
provide better bounds if at least some of the ReLU activations can be fixed positively, i.e.
l≥0. Otherwise, the symbolic interval arithmetic uses the same bounds as the naive method
and computes new bounds in the same way.
To overcome this drawback, Wang et al. [36] improve the method by introducing a different
approximation for the casel<0<u. The main idea is to maintain the symbolic dependencies
also in this case. Though, the linear equation for the value of a ReLU neuron with input bounds
l<0<ucannot be kept. Instead a symbolic equations is introduced which provides a lower
and upper bound for the neuron value. These symbolic bounds can then be propagated through
the network and have the advantage that the dependency information partially remains. For
the propagation of the symbolic equactions an approximation is used as visualized in Fig. 2.
Now we consider a linear approximation of ReLU constraints which was first proposed
by Ehlers [10]. In fact, we will show that it is best possible in a certain sense, which we
define in the following subsection. Given x∈[l,u],wherel<0<u,andy=max{0,x}it
holds (i) y≥0, (ii) y≥x, and (iii) y≤u(x−l)
u−l. We graphically depict this approximation in
Fig. 3which in fact coincides with the linear relaxation of the MIP formulation (1)forReLU
constraints, see [2,5].
Of course, the linear approximation of Ehlers [10] remains valid, if either the constraint
y≥0 or the constraint y≥xis removed. This enables the use of matrix multiplication
(cf. Zhang et al. [42]) or static analyzers with abstract domains (cf. Singh et al. [30]) for the
propagation of the inequations.
Another approximation method is proposed by Raghunathan et al. [23] in the context of
robustness certification. It consists in an SDP relaxation for ReLU neural networks that acts
simultaneously on all neurons of a layer.
123
118 Journal of Global Optimization (2021) 81:109–152
5.2 Comparison of linear ReLU approximations
In general, one ReLU layer contains several neurons, and we are interested to compute an
approximation of the output range of the layer. This approximated output range can then be
regarded as input to the next layer. As we want to reach a quick propagation of the output
ranges through the layers, it is important that the approximated output range is a polytope.
This allows to compute neuron bounds quickly using linear programming. In the following,
we develop a theoretical framework to analyse different linear approximations.
Definition 2 (ReLU approximation) Let n,m∈N,A,B∈Rm×n,c∈Rn,andP⊂Rnbe a
polytope. We say that
Q:= x
y∈P×Rn|Ax +By ≤c⊂R2n
is a ReLU approximation (of P) if it holds that (P×ReLU(P)) ⊆Q.Qis called an
independent ReLU approximation,ifforall j∈[m], there exists i∈[n]such that Ajl =
Bjl =0foralll∈[n]\{i}. A polytope Pis called ReLU proper,ifforalli∈[n]it holds
min
x∈Pxi<0<max
x∈Pxi.
The consideration of ReLU proper polytopes simplifies the formulation of statements,
as fixed ReLU neurons are not regarded. If we apply the naive approximation to a ReLU
proper polytope we obtain a box [0,u1]×...×[0,un],whereuiis the upper bound for
the corresponding variable. We see that this is an independent ReLU approximation.Let
A=0∈R2n×nand for each i∈[n],weaddtworowstomatrixBand vector cto enforce
0≤yi≤uifor i∈[n], i.e. m=2nfor the min Definition 2.TheserowsareeT
iy≤ui
and −eT
iy≤0, where eiis the i-th unit vector in Rn. Hence, we have exactly one non-zero
coefficient in each row of Band only zero coefficients in A, so that the property holds. In
passing we notice that the approximation proposed in Wang et al. [36] is an independent
ReLU approximation, too.
Now we use our definition of a ReLU approximation for a more thorough investigation of
the possibilities to approximate ReLU constraints. Within the restrictions of the definition,
we would like to find matrices A,Band cfor a ReLU proper polytope P, such that Qis as
small as possible (with respect to inclusion). First, we will restrict our analysis to independent
ReLU approximations and claim: the approximation proposed by Ehlers [10] is best possible
among all independent ReLU approximations of a ReLU proper polytope. We define this
approximation formally as a ReLU approximation in order to state the result in Theorem 3.
Definition 3 Let P⊂Rnbe a ReLU proper polytope. The ReLU approximation of P
corresponding to the approximation of Ehlers [10] will be denoted as QE. In detail, for
i∈[n],weset
A(i)=⎡
⎣
0
eT
i
ui
li−uieT
i
⎤
⎦,B(i)=⎡
⎣
−eT
i
−eT
i
eT
i
⎤
⎦and c(i)=⎡
⎣
0
0
uili
li−ui
⎤
⎦.
For that, we use li:= minx∈Pxiand ui:= maxx∈Pxiand eventually define
AE=⎡
⎢
⎣
A(1)
.
.
.
A(n)
⎤
⎥
⎦,BE=⎡
⎢
⎣
B(1)
.
.
.
B(n)
⎤
⎥
⎦and cE=⎡
⎢
⎣
c(1)
.
.
.
c(n)
⎤
⎥
⎦.
123
Journal of Global Optimization (2021) 81:109–152 119
Thus we obtain
QE:= x
y∈P×Rn|AEx+BEy≤cE⊂R2n.
Remark 5 Indeed, QEis an independent ReLU approximation. All rows of Aand Bare
either 0 or a multiple of a transposed unit vector eT
i∈Rn. If the latter is the case, i∈[n]is
thesamebothinAand Bwhen regarding the same row indices of Aand B.
Theorem 3 Let P ⊂Rnbe a ReLU proper polytope and QEbe the approximation of P as
in Definition 3. For any independent ReLU approximation Q of P it holds QE⊆Q.
For the proof see “Appendix C”. In the following section, we explain how the approxi-
mation of Ehlers [10]isusedin[5,10] and discuss possibilities to speed up the computation
to make this method more efficient. Then we present an approximation that is stronger than
the one of Ehlers [10] and hence not independent in Sect. 5.4.
5.3 Efficient optimization based bound tightening for neural network verification
If we build the MIP model using some preliminary lower and upper bounds for each ReLU
neuron, we can use the LP relaxation of the model to approximate the output values of the
neural network. As in [5,10], we can also tighten the neuron bounds using the LP relaxation,
which is identical to the approximation of Ehlers [10]. For each ReLU input variable xwe
compute an optimal solution of the LP relaxation for the objective functions xand −x.
The optimum objective values hence give the new bounds for xin the neural network.
In accordance with Gleixner et al. [14], we call this technique optimization based bound
tightening (OBBT).
After the bound update, it is crucial to improve the MIP formulation (1). This allows
to compute significantly tighter bounds in the next layer. Indeed, it is possible to build the
approximation of the whole network only during the process of bound optimization. That
means, each variable (corresponding to a neuron) is added separately to the relaxed MIP
model such that the LP is always as small as possible. We regard the ideas of Gleixner et al.
[14], who implemented an OBBT propagator in SCIP [13], to reduce the computational cost.
Gleixner et al. [14] treat two main topics: First, they show how to generate and propagate
Langrangian variable bounds (LVBs), and second, they propose methods for the acceleration
of OBBT.
LVBs are valid inequalities with respect to the LP relaxations of mixed integer nonlinear
problems (MINLP), which also includes LP relaxations of MIPs. Gleixner et al. [14] state
that LVBs can be viewed as a one-row relaxation of the given MINLP, that provide a local
approximation of the effect of OBBT. They are obtained as a by-product of the LP solutions
which are computed during the execution of OBBT. Dual multipliers of the LP relaxation
and an objective cutoff constraints are used to create an LVB. For the actual definition and
more details see Gleixner et al. [14]. If we model the neural network verification problem as
optimization problem as described in Sect. 4, we are only interested whether there exists a
solution with objective value smaller than or equal to zero or not. Hence, we can safely cut
off all solutions with an objective value greater than some ε>0. For our experiments we set
ε:= 0.1tohaveasufficientlybigmargintozeroinordertopreventerroneousresults.
The advantage of LVBs is that they can be propagated efficiently through a branch-and-
bound tree, while the frequent application of OBBT requires a great computational effort
for each branch-and-bound node that is processed. We see in our experiments that for some
123
120 Journal of Global Optimization (2021) 81:109–152
instances the use of LVBs is able to speed up the solving process significantly. See Sect. 8.2
for an overview of the experiments.
Moreover, we consider ideas of Gleixner et al. [14] for accelerating the application of
OBBT. One aspect is filtering of bounds which can hardly be improved by executing OBBT.
Assume that yis the value of a variable, which is a candidate for the application of OBBT,
in a feasible solution of the LP relaxation. Moreover, let l≤y≤ube the bounds which are
currently known for this variable. If then y−l≤εor u−y≤εfor some ε>0, OBBT can
strengthen the corresponding bound by at most ε, as Gleixner et al. [14] point out. Yet, initial
experiments showed, that usually almost all bounds can be improved significantly by OBBT,
so that filtering bounds is not useful for verification of neural networks. Another aspect is the
order of the variables for which OBBT is executed. As OBBT is executed layer by layer in
our case, the order of variables can only be changed within each layer. However, the various
strategies of Gleixner et al. [14] did not show any advantage over a simple fixed order in our
computational experiments, see Rössig [24] for details.
Eventually, we consider another approach for bound computations in neural networks
that is also a form of OBBT. Instead of using the LP relaxation to compute bounds, it
is also possible to employ the exact MIP model and compute bounds for the neurons with
OBBT. Computing the neuron bounds using the MIP formulation instead of the LP relaxation
leads to strongly improved bounds. Although not all MIPs are solved to optimality, clear
improvements of the corresponding bounds can be reached within a time limit of few seconds
per MIP. These improvements render it possible to solve also relevant instances without
specialized branching rules for neural network verification, however the bound computations
take a lot of time.
5.4 Optimization based relaxation tightening for two variables
In general we regard neural network layers that feature ReLU activations for all neurons of
the layer. Hence, we investigate in more detail how the ReLU function behaves in higher
dimensions, i.e. if the ReLU function is applied componentwise to layers with several neurons.
The following theorem can be found in Xiang et al. [41] as Corollary 1:
Theorem 4 For a polytope P ⊂Rn,ReLU(P)is a finite union of polytopes.
Hence we see that the best possible convex approximation conv(ReLU(P)) of ReLU(P)
is the convex hull of the union of polytopes in Theorem 4.Weinvestigateasimpleexample
to see how the approximation of Ehlers [10] differs from this best possible convex approxi-
mation. We consider a toy example as depicted in Fig. 8of the “Appendix”. Figure 4shows
the feasible input polytope of the ReLU layer and the corresponding ReLU image. The
same ReLU image can be seen in Fig. 5, replenished with a depiction of its convex hull, the
approximation of Ehlers [10] and the naive approximation. Figure 5clearly shows that even
for only two variables the convex hull of the ReLU image is strictly smaller compared to the
approximation of Ehlers [10]. It seems appealing to find an improved approximation of the
ReLU image closer to the convex hull, which is the best possible convex approximation.
Subsequently, we propose an efficient method which can strengthen the approximation of
Ehlers [10] by considering at least pairs of two neurons jointly. This new ReLU approximation
is not independent (cf. Definition 2). The depiction in Fig. 5shows, that in this situation
we could actually add one inequality and would improve the approximation to be exactly
the convex hull of the ReLU image. This inequality is induced by the connecting segment
between the vertices of the convex hull that maximize y1or y2, respectively. Of course, we
123
Journal of Global Optimization (2021) 81:109–152 121
Fig. 4 Example for the feasible
set before (blue polytope) and
after (set enclosed by red lines)
application of the ReLU function
Fig. 5 The red lines enclose the
ReLU image and the black line
(with the coordinate axes)
indicates the convex hull of this
ReLU image. The approximation
of Ehlers [10] is bounded by the
orange segments, the naive
approximation by the green ones
(and coordinate axes)
Fig. 6 Feasible set before (blue)
and after ReLU application (red)
for a different input polytope
cannot make this inequality tighter, since otherwise feasible points of the ReLU image would
be cut off. Though, the segment between the vertices that maximize y1or y2, respectively,
does not always induce a valid inequality as we show in the following example. Figure 6
shows a polytope of feasible x1,x2values and the corresponding ReLU image of feasible
values for y1and y2, such that y1=max{0,x1}and y2=max{0,x2}. The polytope is
two dimensional, but can also be considered as embedded image of a higher dimensional
polytope which is projected onto its variables x1and x2. These two variables correspond to
two neurons in one layer of a ReLU neural network. The dimension of the original polytope
is then the number of all neurons in that layer. It should be noted that we use these projections
to R2only for the visualization of our method. The goal of our method is to obtain a tighter
approximation without computing projections of higher dimensional polytopes. In Fig. 6,
the segment (dashed line) between the vertices that maximize y1or y2, respectively, does not
induce a valid inequality with respect to the ReLU image.
Now the idea is to add an inequality to the model which partly cuts off the polytope resulting
from the approximation of Ehlers [10], but leaves the ReLU image intact. The cut is parallel
to the segment between the vertices that maximize y1or y2, respectively. Depending on the
situation, these vertices will either meet the inequality with equality or not. Figure 7depicts
this inequality and shows that adding this constraint considerably improves the approximation
of the convex hull. In the following we describe how this constraint can be computed. A linear
123
122 Journal of Global Optimization (2021) 81:109–152
Fig. 7 Here we see the ReLU
image of the polytope depicted in
Fig. 6colored in red, its convex
hull in black, the approximation
of Ehlers [10] in orange, the
inequality which we want to
introduce as a black dashed line
and the constraints of the naive
approximation as a green dashed
line. All sets are limited by the
coordinate axes
approximation of the ReLU neural network in question serves as a basis. Naturally, we can use
the LP relaxation (which corresponds to the approximation of Ehlers [10]) if the verification
problem is formulated as an MIP.
Assume we want to tighten the approximation for the ReLU output variables y1and y2
which correspond to ReLU input variables x1and x2. All of these variables are contained in
the LP relaxation of the neural network. In the final solution it must hold y1=max{0,x1}
and y2=max{0,x2}due to the ReLU constraints. Let ˆaand ˆ
bbe the optimum solutions
when maximizing x1or x2, respectively, in the current LP relaxation. Then we write ˆa1and
ˆa2for the values of the variables x1and x2in the solution ˆa. Analogously we write ˆ
b1and ˆ
b2
for the corresponding variable values in solution ˆ
b. It should be noted that these LP solutions
are computed during the execution of OBBT, and can therefore be obtained at no additional
cost. Obviously it holds ˆa1≥ˆ
b1and ˆ
b2≥ˆa2due to the choice of objective functions. Now
we define a1:= max{0,ˆa1}and analogously a2,b1and b2. We compute new objective
coefficients as c1:= b2−a2and c2:= a1−b1, i.e. c1,c2≥0. The latter holds due to
the fact that α≥βimplies max{0,α}≥max{0,β}for α, β ∈R.Again,wesolveanLP
using the current relaxation and maximize the objective function c1x1+c2x2. We denote the
optimum objective value as γand compute δ:= c1a1+c2a2. After this computation we can
strengthen the LP relaxation by adding the constraint
c1y1+c2y2≤max{γ,δ}.(5)
Theorem 5 Constraint (5)is a valid inequality with respect to the ReLU image corresponding
to y1and y2. That means, constraint (5)can strengthen the LP relaxation of our MIP for the
verification problem but cannot cut off any feasible solution.
Proof We remind that it holds y1=max{0,x1},y2=max{0,x2}due to the ReLU con-
straints, and a1,a2,b1,b2,c1,c2≥0, hence δ≥0. That means, if (y1,y2)=(0,0)
we have c1y1+c2y2=0≤δ.If(y1,y2)=(x1,0)it holds x1≤a1and hence
c1y1+c2y2=c1x1+0≤c1a1+c2a2=δ. On the other hand, the case (y1,y2)=(0,x2)
implies x2≤b2and subsequently we see c1y1+c2y2=0+c2x2≤c1b1+c2b2=δ.
Otherwise it holds (y1,y2)=(x1,x2)which implies c1y1+c2y2≤γand we can conclude
the proof.
Thus, the approximation of Ehlers [10] can be improved by adding constraints of type (5)
to the LP relaxation of the model. Like this, we obtain a ReLU approximation which is not
independent. Although we have to solve only one LP per pair of neurons, applying this method
to all possible pairs of neurons would lead to an immense computational cost. Therefore,
we select only some pairs of neurons for which it is likely to significantly strengthen the
LP relaxation by adding the new inequality to our model. Though, our selection strategy
as laid out in Rössig [24] was not able to outperform a baseline selection strategy, which
123
Journal of Global Optimization (2021) 81:109–152 123
selects neurons in a fixed, predetermined order. Yet, this technique, which we abbreviate as
OBBT2, can significantly strenghten the LP relaxation and reduce the number of nodes in
the branch-and-bound tree (see Table 10 in the “Appendix”).
6 Primal heuristics
For the problem of neural network verification the use of primal heuristics lies in the quick
falsification of incorrect properties. Surprisingly, even a trivial heuristic, which only performs
random sampling within the set of feasible inputs, can often find counterexamples to incorrect
properties quickly in contrast to standard MIP heuristics.
The idea of the random sampling heuristic as introduced by Bunel et al. [5]) is plain and
simple: Given an instance Π=(X,Y,F)of the verification problem, we randomly pick
x∈Xand check whether F(x)∈Y. In case that F(x)/∈Y, we know that Πis refutable.
Moreover, using the MIP formulation as optimization problem, the input vector xis also
useful if it leads to a decrease of the primal bound, since this may help to tighten neuron
bounds. In general it is not trivial to obtain x∈X,ifX⊂Rnis an arbitrary polytope.
However, as mentioned in Remark 3, many of the instances we regard feature a polytope X
which is actually a box. In this case, we simply pick xi∈[li,ui]uniformly at random for
i∈[n],whereli,uiare the bounds of Xfor each component. This is performed similarly in
[5,36]. Otherwise, if Xis not a box, we solve an LP to obtain x∈Xusing a random objective
function.
We propose another heuristic that can be used in addition to the random sampling heuristic.
It is based on the local search proposed by Dutta et al. [8] for output range analysis of ReLU
neural networks. Though, we omit the use of gradient information and fit the heuristic more
naturally into the framework of MIP solving. The main idea is to fix all neurons in one of
their phases, such that the optimization variant of neural network verification consists only
in solving a linear program. We start with a feasible input x0∈Xfor the neural network and
use forward propagation to compute the values of all neurons in the network. Then, for each
ReLU neuron, we fix the binary variable din (1) to zero or one, corresponding to the phase
of the neuron that is determined by propagating x0through the network. Furthermore, the
binary variables in the formulation of the maximum function for objective variable tare also
fixed,suchthatt=max{z1,...,zk}. With all binary variables fixed, the MIP as described
in Sect. 4becomes an LP.
This LP is minimized with respect to variable tas objective function. After the first
minimization LP has been solved, we choose a ReLU input variable ¯x(corresponding to
one ReLU neuron) of value zero if possible. For this variable, we switch the fixed value
of the corresponding binary variable ¯
dfrom zero to one or vice versa. Then we optimize
again and obtain a new input vector ˆx0∈Xfor the neural network. After that, we switch the
fixing of another binary variable, whose corresponding ReLU input variable has value 0 in
the solution. This process is iterated until we find a feasible counterexample, i.e. the optimal
value of the LP is smaller than zero, or we reach a predefined iteration limit. In case that none
of the ReLU input variables is equal to zero, we have to abort the procedure. It is easy to see
that switching the fixings of the binary variables as described, can only reduce the objective
value of the optimum LP solution.
In the following we describe, how we combine our LP based heuristic with the random
sampling heuristic. First we use the random sampling heuristic to find an input vector x0∈X.
The random sampling process and forward propagation are very fast, and therefore we try
123
124 Journal of Global Optimization (2021) 81:109–152
many (e.g. 1000) random inputs to find an input x0∈X. Out of all sampled input vectors, we
select x0∈Xsuch that it corresponds to the lowest value of objective variable t. The hope
is that x0can be converted into an actual counterexample by computing a new input vector
ˆx0. This is given by the optimum LP solution after some ReLU phase switches as described.
Instance Πis shown to be indeed refutable, if the value of tis below zero in this optimum
LP solution.
Of course, both heuristics can be applied several times throughout the solving process in a
branch-and-bound tree which we enable in our implementation. Our experimental evaluation
shows that the LP based heuristic works quite successfully. In fact, the mean runtime on our
evaluation set of SAT instances, as mentioned in Sect. 8, drops from 330.1 to 71.7 seconds if
our LP based heuristic is employed. On the other hand, the mean runtime on our evaluation
set of UNSAT instances increases only slightly from 915.3 to 943.7 seconds due to the
application of our LP based heuristic.
7 Branching for neural network verification
The verification problem can be solved with a generic branch-and-bound approach as
described by Bunel et al. [5]. If the problem is solved as an MIP, specific branching rules for
neural network verification can be integrated into the MIP solving process to strongly speed
up the process. Initial bounds are necessary for the formulation of the verification problem
as an MIP model and can be obtained by one of the approximation methods as introduced in
Sect. 5. Many relevant instances of the verification problem cannot be solved if an approxi-
mation of the network is computed only once. Specific branching rules can be used to split
an instance into simpler ones which can be approximated better. One option is to split the set
of feasible input vectors for an instance of the verification problem as in [5,35].
Given an instance Π=(X,Y,F)of the verification problem, the design of the domain
branching rule is based on the assumption that Xis a box. However, the branching rule can
also be applied if Xis not a box. We assume the existence of bounds li≤xi≤uifor all x∈
X⊂Rnand i∈[n], cf. Remark 3.IncasethatXis a box, it holds X=[l1,u1]×...×[ln,un],
otherwise X⊆[l1,u1]×...×[ln,un].Buneletal.[5] propose to select j∈[n]and split
the domain of variable xjto subdomains [lj,uj+lj
2]and [uj+lj
2,uj].The domains of all other
variables xi,i∈[n]\{ j}are left unchanged, so that we obtain two sub-instances with smaller
input domains.
The selection of the branching variable is very important for the performance of the
branching rule, cf. [5]. In Bunel et al. [4], the selection depends on the depth in the branch-and-
bound tree and follows a fixed order. Bunel et al. [5] implement another selection rule, based
on the approximation method of Wong and Kolter [38]. In our implementation we mainly
use a selection rule “gradient” which is quite similar to the one used in Wang et al. [35]. For
that, we extend the neural network Fto another one ˜
F, which encodes also the properties
that shall be verified. It has output dimension one, and for a fixed input x∈X, the output
is the same as the value of the objective variable tin the MIP formulation as optimization
problem (3). We use a max-pooling layer to model the computation of the maximum in (3)
in the neural network ˜
Fand refer to Bunel et al. [5] for more details on the construction. We
compute the gradient of ˜
Fat the input vectors x1=(l1,...,ln),x2=(u1+l1
2,...,un+ln
2),
and x3=(u1,...,un)and let g:= ∇ ˜
F(x1)+∇˜
F(x2)+∇˜
F(x3)∈Rn.Fori∈[n]
we compute zi:= |gi|·(ui−li)and choose the branching variable j∈[n]such that
zj=max{z1,...,zn}. The intuition is that verifiability of the instance depends mainly on
123
Journal of Global Optimization (2021) 81:109–152 125
the values of input neurons with a high (averaged) absolute gradient value and sufficiently
big input range.
Another natural possibility is to perform branching over the two phases of a ReLU neuron
whichisusedin[6,10,17,36]. If the verification problem is formulated as an MIP, this
corresponds exactly to branching over the binary variables which correspond to the ReLU
neurons. Given a ReLU input variable xand the corresponding output variable y, we can
branch the constraint y=max{0,x}into two cases: either x≤0, y=0, or x>0,
y=x. The main question is how to select the branching variables. As in Cheng et al. [6], we
prioritize ReLU neurons which are located in the front of the neural network in the selection
rule “standard”. Similarly to Wang et al. [36], we implement a selection rule “gradient” that
picks ReLU neurons which have a large gradient with respect to the outputs of the network.
Though, in our experiments the rule “standard” performed better.
8 Computational evaluation
In this section we discuss computational results for various experiments that we conducted
on a diverse set of test instances which we shortly present in the following. We empirically
compare several methods for the computation of neuron bounds. Furthermore, we report com-
putational results for various configurations of our solver and compare it with the programs
of Bunel et al. [5], Wang et al. [36], and Katz et al. [17].
As we are not aware of any publicly available instances Π=(X,Y,F)of the verification
problem where Xis not a box (cf. Remark 4), we define such instances to show the capabilities
of our solving model. As a basis we use the neural networks of the ACAS Xu system,
which were used by Katz et al. [17] to create instances of the verification problem. These
are described in “Appendix A”. The ACAS Xu system is designed to prevent collisions of
(autonomous) aircrafts. We also perform our evaluations on the original instances published
by Katz et al. [17]. Furthermore, we use test instances with neural networks that are trained
on the well known MNIST [18] handwritten digit data set. In fact, we use two of the trained
neural networks published by Wang et al. [36] and verify robustness of classifications using
the L∞norm. Each input neuron may have a value between 0 and 255 and we investigate
four different perturbation radii (1, 5, 10, and 20). These networks have two layers, each of
them with 24 or 512 neurons, respectively. The smaller one reaches an accuracy of 96.59
%, the one with 512 neurons per layer 98.27 %. In contrast to the neural networks of the
ACAS Xu system with input dimension five, the input dimension of the MNIST networks is
784. This difference is especially interesting with respect to the performance of input domain
branching as presented in Sect. 7. To evaluate various settings of our solving model, we
use two evaluation subsets that contain a diverse selection of all the instances. One contains
13 SAT instances and the other one 23 UNSAT instances. In general, we compute average
values for runtime and number of solving nodes as shifted geometric mean which reduces
the sensitiviy to outliers, cf. [1,16].
8.1 Empirical comparison of bound computation approaches
We use our evaluation set of UNSAT instances for a numeric comparison of the bounds which
are obtained by various bound computation approaches. For each instance Π=(X,Y,F)
we compute lower and upper bounds [li,ui]for all neurons i∈[N](before application
of the ReLU function), based on the feasible input domain X. The bounds are computed
123
126 Journal of Global Optimization (2021) 81:109–152
layerwise from the first to the last layer and branching is not applied. Then we compute the
shifted geometric mean of {u1−l1,...,uN−lN}with shift value 1 for each instance. The
mean value indicates how good the corresponding bound computation approach is. Clearly,
it is desirable that the difference ui−liis as small as possible for all neurons i∈[N],and
we use the mean value to compare the different approximation methods. See Table 8in the
“Appendix” for detailed results, here we report the overall mean.
As a second measure we report how many neurons can be fixed in their phase by the
respective bound computation method. Detailed results on this can be found in Table 9in the
“Appendix”.
The results in Tables 1and 2show that the best bounds are computed by OBBT on the
MIP model. While OBBT2 computes better bounds compared to OBBT on the LP relaxation,
the improvements are unfortunately quite minor. We also see that the approaches of Wang
et al. [35,36] are clearly superior to naive interval arithmetic. Yet, they are not competitive
with OBBT if regarding only the quality of the computed bounds.
8.2 Comparison of different techniques in our model
In this section we provide an overview of the performance of our solving model in various
configurations. The experiments in this section are run for the instances of our evaluation set
of UNSAT instances. All results are obtained on cluster nodes with Intel Xeon CPUs E5-
2670 which have a clock rate of 2.5 GHz. Each experiment is run exlusively on one cluster
node and a memory limit of 32 GB is set. In general we set a time limit of 7200 seconds. If
the time limit is hit during the solving process, we assume the time limit as runtime for the
corresponding instance.
We report results both for the whole UNSAT test set, and separately for the ACAS and
MNIST based instances. Comparing ACAS and MNIST based instances, the main differences
are the number of input neurons (5 vs. 784) and the number of layers in the neural networks
(6 vs. 2).
For the experiments in this section we use a baseline configuration “no_heur_base” of
our solving model. In this configuration, domain branching (with selection rule “gradient”)
is used up to a depth of 20 in the branch-and-bound tree. It should be noted that this depth is
usually not exceeded if domain branching is used. Furthermore, OBBT is applied to the LP
relaxation at each solving node up to a depth of 20 in the branch-and-bound tree. The primal
heuristic is enabled only at the root node. Further techniques (e.g. our separator based on the
work of Anderson et al. [2]) are not applied.
In Table 3we report mean runtimes on the UNSAT test set for several different configu-
rations of our solving model. Here we try to give an impression of the differences between
the configurations which we tested. We are mostly interested in the computation of the dual
bound by various methods and therefore the primal heuristic is only employed at the root node.
Table 3shows that there is a clear difference between solving MNIST and ACAS instances. In
fact, the configuration “no_heur_base_genv” is the fastest in total because it performs well on
both types of instances. However, it is not the best configuration for either of the two subsets
(although close to the best configuration for ACAS instances). The best configurations with
respect to the MNIST instances (“no_heur_relu_genv” with respect to number of timeouts,
“mnist_base” with respect to mean runtime) do not perform well on the ACAS instances. The
configurations in Table 3are sorted by the total number of timeouts. In the following, we give
short descriptions of the configurations in this order. The configuration “no_heur_base_genv”
corresponds to the baseline configuration with the additional use of Langrangian variable
123
Journal of Global Optimization (2021) 81:109–152 127
Table 1 Averaged differences between neuron bounds in our UNSAT test set for different bound computation methods
Naive IA Sym. IA Symbolic equations OBBT LP OBBT2 k= 2, l= 5 OBBT2 k= 10, l= 10 OBBT MIP
mean value 156.54 98.30 60.23 24.83 24.61 23.98 10.38
IA stands for interval arithmetic; in case of OBBT2 we compare two different settings
123
128 Journal of Global Optimization (2021) 81:109–152
Table 2 Average of the number of fixed neurons per instance in our UNSAT test set for different bound computation methods
Naive IA Sym. IA Symbolic equations OBBT LP OBBT2 k =2, l =5 OBBT2 k= 10, l = 10 OBBT MIP
Mean value 84.35 107.61 124.87 151.30 151.57 151.74 189.30
123
Journal of Global Optimization (2021) 81:109–152 129
Table 3 Runtime results on the UNSAT test set using the formulation as optimization problem and various
configurations
Subset (number of instances) All (23) ACAS (18) MNIST (5)
Configuration Time Timeouts Time Timeouts Time Timeouts
no_heur_base_genv 586.3 3 765.4 2 221.61
no_heur_sepa0_freq5 839.2 5 757.4 2 1213.13
no_heur_base 854.9 5 771.2 2 1237.33
no_heur_base_obbt2_nosort 888.0 5 816.8 2 1199.13
no_heur_base_mip 1556.5 6 1578.2 3 1480.93
no_heur_relu_genv 750.3 7 1180.0 7 141.60
no_heur_relu 970.2 9 1334.2 8 304.51
mnist_base 2753.5 19 7200.018 77.51
bounds (LVBs) as described in Sect. 5.3. Obviously, the LVBs are quite beneficial for solving
MNIST instances. The configuration “no_heur_sepa0_freq5” sometimes calls an additional
separator, as suggested by Anderson et al. [2]. Next in Table 3follows our baseline configu-
ration which is hence quite good already. Indeed, “no_heur_base_obbt2_nosort” is the best
of our configurations that use OBBT2 and it has a higher mean runtime. Regarding the mean
runtime compared to the number of timeouts, the configuration “no_heur_base_mip” is not
in line with the other configurations. Indeed, this configuration applies OBBT to the MIP
model in the beginning of the solving process to compute initial neurons bounds. This comes
with a high computational cost, also for rather easy instances. However, for more difficult
instances, this stratey is not that bad as indicated by the relatively low number of timeouts.
In the configuration “no_heur_relu_genv”, ReLU branching is combined with OBBT on
the LP relaxation and the creation of LVBs. Most notably, this configuration is the only
one that solves all MNIST instances in the evaluation set within the time limit. Though, the
performance on the ACAS based instances is rather mediocre. Due to the low number of
input neurons of the ACAS neural networks, input domain branching is more efficient for
these instances. The same holds for the configuration “no_heur_relu” which does not include
the LVBs. Eventually, in the configuration “mnist_base” the solving process is limited to the
application of standard MIP techniques. Notably enough, this configuration has the lowest
mean runtime on the subset of MNIST instances. On the other hand, it times out on all
ACAS instances. This vast difference can be attributed to the different number of layers in
the neural networks (two for MNIST, six for ACAS), as deeper networks are more difficult
to approximate.
8.3 Comparison with other solvers
In this section we provide detailed results of computational experiments which we conducted
to investigate the performance of different solvers for neural network verification. Besides
our own solving model, which we regard in various configurations, we include the programs
of Bunel et al. [5], Wang et al. [36], and Reluplex by Katz et al. [17]. The work of Wang et
al. [36] is implemented in two solvers, Neurify and ReluVal. ReluVal is used on the ACAS
instances whereas Neurify is applied to the MNIST instances. We always check whether any
alleged counterexample presented by some solver is indeed a feasible counterexample for
the corresponding instance. In general, we perform these checks with an absolute numerical
123
130 Journal of Global Optimization (2021) 81:109–152
Table 4 Mean runtime on ACAS properties 1, 2, 3, 4, and 8
ReluVal Adv NonOpt Joint BaB Reluplex
Mean runtime in seconds 5.7 3.1 34.1 34.1 67.5 669.0
tolerance of 10−8. Several counterexamples are only feasible though, if a higher numerical
tolerance is allowed, which we report in that case.
Each experiment is run exclusively on one cluster node with an Intel Xeon Gold 5122
CPU which operates at a clock rate of 3.6 GHz, and we set a memory limit of 32 GB. Besides
the program names ReluVal,Neurify and Reluplex, we use the following terms to denote the
various solving models. Adv refers to ReluVal or Neurify using their adversary check mode
which is focused on finding counterexamples. BaB denotes the branch-and-bound method of
Bunel et al. [5]. Eventually, we use NonOpt to describe that our solving model is used with
the formulation of the verification problem as feasibility problem. Joint refers to our solving
model using the formulation as optimization problem, which can solve conjunction instances
(cf. Remark 4). Separate indicates that we solve the verification problem as optimization
problem, but conjunction instances are split into several disjunction instances as explained
in Remark 4. Conjunction instances have to be splitted likewise for BaB and Reluplex.
Based on the results of our evaluation experiments in Sect. 8.2, we choose the configuration
“no_heur_sepa0_freq5” as the best for the ACAS instances. Though, in order for a good
performance on refutable instances, we do adapt this configuration to execute the primal
heuristic also locally up to a depth of eight in the branch-and-bound tree.
Table 4contains the runtime results for all ACAS instances of Properties 1, 2, 3, 4, and 8
as defined by Katz et al. [17]. It should be noted that these are all disjunction instances. The
mean runtime of Reluplex is at least one order of magnitude larger than the mean runtimes
of all other solvers. ReluVal is clearly superior to all other solvers, especially if its adversary
check mode is applied. We see that our solving model, which uses SCIP [13] to strongly
integrate the bound computations into a MIP framework, performs significantly better than
the rather similar approach of Bunel et al. [5].
A similar picture of the performance of the various solvers can be seen in Table 5,which
shows the results on the ACAS instances of Properties 5, 6, 7, 9, and 10. Remind that we
underestimate the runtime of instances for which the solving process is stopped due to the
time limit.
In Tables 6and 7we report runtimes for the MNIST instances. We remind that all of
these are conjunction instances by definition. For our solving model we use the configuration
“mnist_base” as presented in Sect. 8.2. The instances which are based on neural networks
with two layers of 24 neurons are solved very quickly by most solvers, as can be seen in
Table 6. Though, domain branching is apparently not a good strategy to solve these instances,
which feature 784 input neurons. This is shown by the high number of timeouts of the solving
method BaB [5] on the MNIST instances in Table 6.
Clearly, the MNIST instances with 512 neurons per layer are much more challenging.
In Table 7we see that our solving model performs quite good when the approach Joint is
taken, i.e. conjunction instances are solved directly as one optimization problem. In several
cases Neurify aborts the solving process with no result. For the corresponding instances we
assume the time limit of 7200 seconds as runtime in order to compute the mean runtime.
If the conjunction instances are split into separate disjunction instances, our solving model
mostly fails to find counterexamples within the time limit (see NonOpt and Separate). This
can be explained by the fact that the instances, which are obtained after splitting, are solved
123
Journal of Global Optimization (2021) 81:109–152 131
Table 5 Runtimes on ACAS properties
Instance Result Reluval Adv NonOpt Joint Separate BaB Reluplex
property5 UNSAT 12.3 9.1 3922.2 timelimit 4104.8 timelimit 4883.4
property6a UNSAT 3.3 3.2 7078.8 6248.2 3745.0 timelimit timelimit
property6b UNSAT 1.6 1.5 5935.9 4207.5 6179.1 timelimit timelimit
property7 SAT memlimit 1099.1 timelimit 3012.7 3012.7 timelimit timelimit
property9 UNSAT 411.5 184.5 5672.0 timelimit 5921.1 4214.8 timelimit
property10 UNSAT 1.3 1.1 timelimit 3268.3 5335.3 4567.3 5516.3
sh. geo. mean 60.0 33.6 6053.9 4875.5 4564.6 6107.5 6456.0
If Reluval is run in normal check mode, it terminates prematurely on Property 7 due to a lack of memory. We use the time limit of 7200 seconds for the mean computation in this
case
123
132 Journal of Global Optimization (2021) 81:109–152
Table 6 Runtimes on the MNIST 24 data set
Instance Result Neurify Adv NonOpt Joint Separate BaB Reluplex
mnist_24_image11_1 UNSAT 0.4 0.4 10.68.128.0 25.7 0.5
mnist_24_image11_10 UNSAT 2.0 2.1 31.015.535.3 timelimit 4824.5
mnist_24_image11_20 SAT 0.4 0.5 6.07.57.8 timelimit 3437.6
mnist_24_image11_5 UNSAT 0.4 0.4 860.76.131.6 22.4 189.9
mnist_24_image2_1 UNSAT 0.4 0.4 25.13.827.1 21.5 1.6
mnist_24_image2_10 SAT 0.4 0.5 2.62.12.0 timelimit 24.7
mnist_24_image2_20 SAT 0.4 0.5 2.72.02.0 timelimit 19.2
mnist_24_image2_5 UNSAT 0.4 0.4 31.613.636.8 timelimit timelimit
mnist_24_image4_1 UNSAT 0.4 0.6 12.42.525.8 22.3 0.9
mnist_24_image4_10 SAT 0.4 0.5 7.92.02.1 timelimit 70.5
mnist_24_image4_20 SAT 0.5 0.6 2.72.35.3 timelimit 11.9
mnist_24_image4_5 SAT 0.4 0.5 23.12.024.0 timelimit 536.1
shifted geo. mean 0.4 0.5 18.84.414.1 1006.3 86.1
The last number of the instance name is the perturbation radius. In contrast to all other solvers, Neurify reports SAT for instance mnist_24_image11_10. However, all counterex-
amples that Neurify produces, are only valid when applying a large numerical tolerance of 10−3. We regard the instance mnist_24_image11_10 as verifiable and exclude it from
the mean computation. Counterexamples produced by Reluplex are valid with a numerical tolerance of 10−5
123
Journal of Global Optimization (2021) 81:109–152 133
sequentially one after the other. However, if a verifiable instances is processed first, this may
already lead to a timeout. Reluplex is only able to solve two of the easiest instances within the
time limit, and BaB also performs considerably worse than Neurify and our solving model.
Though for all MNIST instances, it should be noted that the counterexamples which are
produced by Neurify are only feasible if one applies a large numerical tolerance of 10−3.It
is clear to see, that the instances with the lowest perturbation radius of 1 (in L∞norm) are
easy to verify. On the other hand, for the instances with a high perturbation radius of 10 or
20, counterexamples are found in most cases. Apparently, the perturbation radius 5 poses
the biggest difficulties for the solvers, as the corresponding instances are probably on the
borderline between SAT and UNSAT.
In the “Appendix” we report our self defined instances where the input polytope Xis not
a box and computational results of our solving model on these, as the other solvers cannot
process these instances. See Table 12 for the results which are obtained with the formulation
as optimization problem, and Table 13 for the results corresponding to the formulation as
feasibility problem.
9 Conclusions
Our solving model shows a solid performance in all categories of our benchmark set. This
highlights the success of our approach, which combines MIP solving, using the solver SCIP
[13], with specialized bound computation and branching techniques. Especially, we can solve
instances with general polytopes as input domains which is not possible with other verification
algorithms [5,17,36]. Moreover, our solving model clearly outperforms the solvers of Bunel
et al. [5] and Katz et al. [17]. Subsequently, Reluplex cannot be regarded as a state-of-the-art
solver for neural network verification anymore. While ReluVal and Neurify from Wang et al.
[36] show impressive runtime results for most instances, they rely primarily on branching. For
big instances, this can become problematic due to limited numerical accuracy and memory
capacity.
Moreover, we developed a theoretical framework for the comparison of linear approxi-
mation methods and presented the novel approximation technique OBBT2 which is able to
improve the linear relaxation of a ReLU neural network. Besides that, we showed how the
local search procedure of Dutta et al. [8] can be implemented differently in an MIP solving
context such that it serves as a primal heuristic. Additionally, we described a novel formula-
tion of the verification problem as quadratic program. All newly proposed techniques were
evaluated computationally within our newly implemented solving model.
Although our implementation is limited to neural networks with ReLU activation func-
tion, the approach could be easily extended to work with other piecewise-linear activation
functions. Examples for that are the leaky ReLU function or max-pooling layers. In addition,
the great flexibility of our solving model allows the integration of further techniques such
that future improvements may render it even more efficient.
In conclusion, it can be said that many challenges remain in the field of neural network
verification. The further scalability of current verification approaches is still an open task.
Besides, new approaches for the falsification of incorrect properties would be highly inter-
esting. Although we present a new heuristic for that (based on ideas of Dutta et al. [8]), better
algorithms could probably be developed.
Acknowledgements Open Access funding enabled and organized by Projekt DEAL. Ansgar Rössig would
like to thank Carmeq GmbH, Berlin, for the support of this research.
123
134 Journal of Global Optimization (2021) 81:109–152
Table 7 Runtimes on the MNIST 512 data set, where the last number of the instance name is the perturbation radius
Instance Result Neurify Adv NonOpt Joint Separate BaB Reluplex
mnist_512_image11_1 UNSAT 0.2 0.5 80.0 39.9 236.3 4446.4 371.6
mnist_512_image11_10 - no result no result timelimit timelimit timelimit timelimit timelimit
mnist_512_image11_20 SAT 0.6 0.8 timelimit 28.6 timelimit timelimit timelimit
mnist_512_image11_5 UNSAT 0.2 0.5 timelimit timelimit timelimit 4908.0 timelimit
mnist_512_image2_1 UNSAT 0.5 0.5 92.0 97.4 498.6 4854.3 timelimit
mnist_512_image2_10 SAT no result no result timelimit 30.9 timelimit timelimit timelimit
mnist_512_image2_20 SAT 0.4 0.7 timelimit 12.4 56.2 timelimit timelimit
mnist_512_image2_5 - no result no result timelimit timelimit timelimit timelimit timelimit
mnist_512_image4_1 UNSAT 0.3 0.5 84.3 32.8 290.4 4512.2 1132.2
mnist_512_image4_10 SAT no result no result timelimit 22.9 timelimit timelimit timelimit
mnist_512_image4_20 SAT 0.4 0.6 timelimit 21.6 122.3 timelimit timelimit
mnist_512_image4_5 - no result no result timelimit timelimit timelimit timelimit timelimit
shifted geo. mean 148.5 150.4 2434.9 220.7 1612.8 6235.3 4830.5
123
Journal of Global Optimization (2021) 81:109–152 135
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which
permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give
appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence,
and indicate if changes were made. The images or other third party material in this article are included in the
article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is
not included in the article’s Creative Commons licence and your intended use is not permitted by statutory
regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Appendix A Definitions of additional properties on ACAS neural net-
works
Here we provide the formal definitions of our properties on the ACAS neural networks. The
numbers i_jin the beginning of each name refer to the neural network which is used.
The input constraints for (i) 1_1_lin_opp and (ii) 1_1_lin_opp2 are given as fol-
lows, where either constraint (i) or (ii) is used. Desired output: COC (i.e. clear-of-conflict)
should not be minimal.
1000 ≤ρ≤2000
−3.141593 ≤θ≤0(i)
0≤θ≤3.141593 (ii)
−3.141593 ≤ψ≤3.141593
1000 ≤vown ≤1200
800 ≤vint ≤1200
θ=−ψ
−100 ≤vown −vint ≤100
Input constraints for (i) 2_2_lin_opp and (ii) 2_2_lin_opp2, where either constraint
(i) or (ii) is used. Desired output: COC should not be minimal.
1000 ≤ρ≤2000
0≤θ≤3.141593 (i)
−3.141593 ≤θ≤0(ii)
−3.141593 ≤ψ≤3.141593
100 ≤vown ≤1200
0≤vint ≤1200
θ=−ψ
vown =vint
Input constraints for (i) 1_1_lin_opp_dir and (ii) 1_1_lin_opp2_dir,where
either constraint (i) or (ii) is used. Desired output: COC should not be minimal.
1000 ≤ρ≤2500
−3.141593 ≤θ≤0(i)
123
136 Journal of Global Optimization (2021) 81:109–152
Fig. 8 Neural network with input neurons a1∈[−3,2]and a2∈[−5,4]. For our example we try to find
an upper bound for the output o1=y1+y2of the network. This network is used as an example throughout
Section 5
0≤θ≤3.141593 (ii)
−3.141593 ≤ψ≤3.141593
800 ≤vown ≤1200
600 ≤vint ≤1200
θ=−ψ
vown =vint
Input constraints for (i) 1_1_int_away and (ii) 1_1_int_away2. Desired output: (i)
strong left is not minimal, or (ii) strong right is not minimal.
5000 ≤ρ≤6000
−3.141593 ≤θ≤3.141593
−3.141593 ≤ψ≤3.141593
1000 ≤vown ≤1200
500 ≤vint ≤1200
vint ≥vown +100
−0.392699 ≤ψ−θ≤0.392699
Input constraints for (i) 1_2_int_away and (ii) 1_2_int_away2. Desired output: (i)
strong left is not minimal, or (ii) strong right is not minimal.
5000 ≤ρ≤7000
−3.141593 ≤θ≤3.141593
−3.141593 ≤ψ≤3.141593
500 ≤vown ≤1200
500 ≤vint ≤1200
vint ≥vown +100
−0.392699 ≤ψ−θ≤0.392699
Input constraints for 2_1_var_dist and 3_1_var_dist. Desired output: COC is
minimal.
10000 ≤ρ≤60760
−0.141593 ≤θ≤0.141593
123
Journal of Global Optimization (2021) 81:109–152 137
−0.141593 ≤ψ≤0.141593
300 ≤vown ≤1200
0≤vint ≤1200
vint ≥vown +601 −0.01ρ
Appendix B Tables and Figures
See Fig. 8and Tables 8,9,10,11,12 and 13
Appendix C Proofs
Definition 4 Let S⊂R2n. Then, for i∈[n], we denote the embedded image (in R2)ofthe
orthogonal projection of Son the subspace span{ei,ei+n}as Si.
Lemma 1 Let P ⊂Rnbe a ReLU proper polytope and Q ⊂R2nan independent ReLU
approximation of P. Then it holds for all i ∈[n]and all ˆx∈P:
Qi∩({ˆxi}×R)=Q∩({ˆx}×Rn)i(6)
Furthermore, there exist αi≤βi,αi∈R∪{−∞}and βi∈R∪{+∞}for i ∈[n]such that
{ˆx}×[α1,β
1]×...×[αn,β
n]=Q∩({ˆx}×Rn). (7)
Proof As in Definition 2,weuse
Q=x
y∈P×Rn|Ax +By ≤c⊂R2n.
For the first part we see
Qi∩({ˆxi}×R)
=xi
yix∈P,y∈Rn,Ax +By ≤c∩({ˆxi}×R)(8)
=ˆxi
yix∈P:xi=ˆxi,y∈Rn,Ax +By ≤c(9)
=ˆxi
yix∈P,xi=ˆxi,y∈Rn,A(i)x+B(i)y≤c(i)(10)
=ˆxi
yiy∈Rn,A(i)ˆx+B(i)y≤c(i)(11)
=ˆxi
yiy∈Rn,Aˆx+By ≤c(12)
=x
yx∈P,y∈Rn,Ax +By ≤c∩({ˆx}×Rn)i(13)
=Q∩({ˆx}×Rn)i(14)
123
138 Journal of Global Optimization (2021) 81:109–152
where (8) holds due to the definition of Qand the orthogonal projection, and (9) is rewritten.
(10) holds, since the set is determined only by the values of ˆxiand yi. The values of xjand
yjfor j∈[n],j= iare not relevant for the set and therefore we do not need any constraints
on these. Hence we can restrict the inequality in the set to the submatrices which are relevant
for xiand yi.In(11) it suffices to consider ˆxbecause all columns except the i-th column of
A(i)are zero. We can now switch back to the original inequality in (12), as this affects only
xjand yjfor j∈[n],j= i. Then we can write the set as an intersection (13) and apply the
definition of Q(14).
For the second part of the lemma, we stress that for (x,y)T∈Qeach yk,k∈[n]is
independent of the values of all other yl,l= k. This is an immediate consequence of the
Table 8 Averaged differences between neuron bounds in our UNSAT test set for different bound computation
methods
Instance Naive
IA
Sym. IA Symbolic
equations
OBBT LP OBBT2
k=2,
l=5
OBBT2
k= 10,
l=10
OBBT
MIP
lin_acas_1_1_int_away 214.36 178.85 119.17 36.51 35.95 34.67 4.23
lin_acas_1_1_lin_opp2 182.90 111.48 60.60 15.60 15.49 15.01 3.55
lin_acas_1_1_lin_opp_dir 175.81 98.07 57.77 16.79 16.75 16.14 4.24
lin_acas_3_1_var_dist 155.95 110.24 73.78 15.38 15.14 14.32 1.25
mnist_24_image11_5 1183.50 888.54 755.85 732.07 732.07 732.07 670.67
mnist_24_image2_5 1150.58 834.30 743.92 718.20 718.20 718.20 634.58
mnist_24_image4_1 240.24 141.79 136.55 136.17 136.17 136.17 134.88
mnist_512_image11_5 895.37 654.84 475.09 442.19 442.19 442.19 440.36
mnist_512_image2_1 174.50 92.88 77.00 73.98 73.98 73.98 72.95
property10_property 189.95 93.06 52.47 18.49 18.29 17.49 6.43
property1_1_1 156.76 142.13 101.46 33.10 32.24 30.64 6.98
property1_2_2 251.37 246.90 141.17 52.89 52.41 50.27 12.54
property2_3_3 276.92 256.99 145.08 47.43 46.86 45.11 12.20
property2_4_2 169.18 167.75 100.07 36.38 36.18 35.26 8.79
property3_4_3 35.07 10.03 5.38 1.42 1.37 1.34 0.86
property3_4_4 46.96 18.14 5.87 1.18 1.17 1.17 0.85
property4_2_2 20.35 8.58 5.06 0.83 0.82 0.81 0.54
property4_3_7 54.87 24.96 8.84 2.91 2.83 2.81 2.27
property5_property 63.30 37.02 24.38 5.90 5.79 5.52 2.39
property6_6a_property_3 155.80 123.15 67.99 27.09 26.81 25.62 5.87
property6_6b_property_1 179.81 135.86 72.60 27.22 26.95 25.79 5.72
property9_property_0 66.15 33.42 17.63 6.21 6.15 5.89 2.61
property9_property_4 66.15 33.42 17.63 6.21 6.15 5.89 2.62
shifted geo.
mean (shift=1)
156.54 98.30 60.23 24.83 24.61 23.98 10.38
“IA” stands for interval arithmetic, i.e. symbolic IA is the approach of Wang et al. [35]. “Symbolic equations”
refers to the improved method of Wang et al. [36]. We evaluate OBBT on the LP relaxation as well as
on the MIP directly, and also our new technique OBBT2 with two different parameter settings. For OBBT
on the MIP model, a time limit of five seconds is set for each MIP which is solved. In fact, the instances
“property9_property_0” and “property9_property_4” differ only in the property to be verified, which explains
the coinciding numbers
123
Journal of Global Optimization (2021) 81:109–152 139
Table 9 Numbers of neurons that could be fixed in our UNSAT test set for different bound computation
methods
Instance Naive
IA
Sym.
IA
Symbolic
equa-
tions
OBBT LP OBBT2
k=2,
l=5
OBBT2
k= 10,
l=10
OBBT MIP
lin_acas_1_1_int_away 14 19 19 25 25 25 101
lin_acas_1_1_lin_opp2 20 31 39 51 51 51 91
lin_acas_1_1_lin_opp_dir 13 24 28 37 37 37 79
lin_acas_3_1_var_dist 22 27 30 48 48 48 211
mnist_24_image11_5 23 31 34 35 35 35 36
mnist_24_image2_5 15 24 27 28 28 28 31
mnist_24_image4_1 43 44 44 44 44 44 44
mnist_512_image11_5 385 488 639 677 677 677 677
mnist_512_image2_1 657 850 887 893 893 893 893
property10_property 53 70 77 90 90 91 190
property1_1_1 44 45 52 58 58 58 104
property1_2_2 23 23 27 30 30 30 45
property2_3_3 31 32 35 39 39 39 67
property2_4_2 27 27 27 32 32 32 51
property3_4_3 78 119 146 217 220 220 250
property3_4_4 83 113 150 272 272 272 277
property4_2_2 81 115 130 231 231 231 255
property4_3_7 82 108 157 260 262 262 269
property5_property 63 72 73 108 109 112 187
property6_6a_property_3 29 34 42 46 46 46 76
property6_6b_property_1 30 33 41 47 47 47 124
property9_property_0 62 73 84 106 106 106 148
property9_property_4 62 73 84 106 106 106 148
arithmetic mean 84.35 107.61 124.87 151.30 151.57 151.74 189.30
See Table 8for an explanation of the column names
Table 10 Runtime and number of
solving nodes for several
configurations of our model using
OBBT2
Configuration Nodes Time
no_heur_base_obbt2_10_nosort 48.2 387.4
no_heur_base_obbt2_10 48.7 400.3
no_heur_base_obbt2 51.2 252.1
no_heur_base_obbt2_nosort 51.7 249.6
no_heur_base 55.6 236.3
OBBT2 clearly reduces the number of solving nodes, which is computed
as shifted geometric mean with shift value 10 (as the runtime mean
values). We regard only those 14 instances of our UNSAT evaluation set,
which are solved within the time limit by all methods. If the execution of
a method is stopped due to the time limit, it is not reasonable to compare
the number of solving nodes between different methods
123
140 Journal of Global Optimization (2021) 81:109–152
Table 11 Runtimes on ACAS properties
Instance Result ReluVal Adv NonOpt Joint BaB Reluplex
property1_1_1 UNSAT 0.4 0.1 88.0 95.9 329.4 822.3
property1_1_2 UNSAT 0.5 0.2 146.0 155.3 703.7 1300.9
property1_1_3 UNSAT 1.9 1.5 359.6 382.5 1349.5 timelimit
property1_1_4 UNSAT 1.7 1.7 141.1 148.1 791.9 2260.2
property1_1_5 UNSAT 0.3 0.3 100.0 105.9 60.4 1750.7
property1_1_6 UNSAT 0.4 0.2 127.7 130.6 68.3 999.0
property1_1_7 UNSAT 0.1 0.1 81.8 83.5 66.0 398.0
property1_1_8 UNSAT 0.1 0.1 15.1 17.1 15.2 741.2
property1_1_9 UNSAT 0.1 0.1 15.9 14.6 16.5 204.9
property1_2_1 UNSAT 0.6 0.6 246.5 260.0 974.8 3015.1
property1_2_2 UNSAT 1.1 1.2 968.3 1029.9 1687.5 5351.5
property1_2_3 UNSAT 1.6 1.5 928.8 988.1 818.8 4123.8
property1_2_4 UNSAT 0.6 0.6 116.5 125.0 603.2 1646.8
property1_2_5 UNSAT 3.8 3.4 1187.3 1148.5 timelimit timelimit
property1_2_6 UNSAT 2.7 2.7 2319.3 2443.8 timelimit timelimit
property1_2_7 UNSAT 11.2 10.3 4368.0 4620.1 timelimit 6329.2
property1_2_8 UNSAT 3.5 3.5 3557.8 3713.8 timelimit timelimit
property1_2_9 UNSAT 8.9 7.2 timelimit timelimit timelimit timelimit
property1_3_1 UNSAT 0.4 0.5 184.0 195.0 818.1 886.6
property1_3_2 UNSAT 0.8 0.8 215.8 269.1 1224.7 2200.6
property1_3_3 UNSAT 1.1 1.3 211.0 220.9 1007.5 2235.1
123
Journal of Global Optimization (2021) 81:109–152 141
Table 11 continued
Instance Result ReluVal Adv NonOpt Joint BaB Reluplex
property1_3_4 UNSAT 0.6 0.6 121.9 130.0 604.1 2025.7
property1_3_5 UNSAT 1.5 1.5 459.9 480.5 4595.1 2737.0
property1_3_6 UNSAT 28.4 21.9 1124.0 1174.8 timelimit timelimit
property1_3_7 UNSAT 12.1 10.9 1081.2 1142.7 timelimit timelimit
property1_3_8 UNSAT 7.8 6.7 2540.9 2665.7 timelimit timelimit
property1_3_9 UNSAT 11.7 8.5 982.6 1052.0 timelimit timelimit
property1_4_1 UNSAT 17.0 16.0 567.0 597.6 2525.6 timelimit
property1_4_2 UNSAT 2.5 2.5 490.0 501.4 1287.9 5064.6
property1_4_3 UNSAT 1.1 1.2 1344.5 1370.2 940.5 3106.0
property1_4_4 UNSAT 0.8 1.0 134.9 137.9 829.4 2679.3
property1_4_5 UNSAT 2.9 3.1 1927.2 2002.1 timelimit timelimit
property1_4_6 UNSAT 22.6 15.1 timelimit timelimit timelimit timelimit
property1_4_7 UNSAT 22.3 17.6 3869.8 4205.4 timelimit timelimit
property1_4_8 UNSAT 48.9 16.8 1786.9 1856.2 timelimit timelimit
property1_4_9 UNSAT 21.2 15.8 timelimit timelimit timelimit timelimit
property1_5_1 UNSAT 0.6 0.6 255.4 301.4 1204.2 1424.1
property1_5_2 UNSAT 0.6 0.6 204.2 171.7 809.2 3187.3
property1_5_3 UNSAT 0.3 0.3 165.2 177.5 794.2 1461.0
property1_5_4 UNSAT 0.4 0.5 111.0 124.8 594.5 3011.9
property1_5_5 UNSAT 1.2 1.2 624.0 652.0 timelimit 6622.6
123
142 Journal of Global Optimization (2021) 81:109–152
Table 11 continued
Instance Result ReluVal Adv NonOpt Joint BaB Reluplex
property1_5_6 UNSAT 15.9 13.7 3092.2 3248.8 timelimit timelimit
property1_5_7 UNSAT 3.4 3.4 3427.0 3558.0 timelimit timelimit
property1_5_8 UNSAT 16.1 11.0 3538.5 3816.2 timelimit timelimit
property1_5_9 UNSAT 7.8 7.4 4706.9 4995.1 timelimit timelimit
property2_2_1 SAT 0.1 0.1 24.8 1.4 17.2 386.6
property2_2_2 SAT 0.3 0.2 1.4 1.2 15.5 9.9
property2_2_3 SAT 0.2 0.1 24.8 1.2 15.3 43.3
property2_2_4 SAT 0.1 0.1 1.2 1.2 13.0 29.6
property2_2_5 SAT 0.1 0.1 1.2 1.4 16.4 238.8
property2_2_6 SAT 0.2 0.1 1.3 1.2 15.6 5195.4
property2_2_7 SAT 0.1 0.1 1.2 1.2 16.2 4498.2
property2_2_8 SAT 0.1 0.1 1.2 1.2 17.1 697.7
property2_2_9 SAT timelimit 0.3 1.2 1.4 16.6 (wrong)
property2_3_1 SAT 0.2 0.2 1.2 1.1 13.4 583.1
property2_3_2 SAT 4252.8 0.2 23.7 1.5 88.1 777.6
property2_3_3 UNSAT 5382.9 95.6 3548.5 3964.8 timelimit timelimit
property2_3_4 SAT 0.4 0.1 1.2 1.2 13.4 1463.1
property2_3_5 SAT 0.1 0.1 1.2 1.2 14.3 3761.4
property2_3_6 SAT 0.2 0.1 1.2 1.3 15.0 149.3
property2_3_7 SAT 87.2 3.2 1.2 1.2 17.1 595.6
123
Journal of Global Optimization (2021) 81:109–152 143
Table 11 continued
Instance Result ReluVal Adv NonOpt Joint BaB Reluplex
property2_3_8 SAT 0.3 0.2 1.3 1.2 15.3 536.5
property2_3_9 SAT 0.2 0.1 4.7 1.2 15.1 67.2
property2_4_1 SAT 0.1 0.1 1.4 1.2 13.5 700.8
property2_4_2 UNSAT timelimit 205.2 timelimit timelimit timelimit timelimit
property2_4_3 SAT 0.2 0.1 1.3 6.0 18.0 10.6
property2_4_4 SAT 0.1 0.1 1.3 1.3 12.9 75.9
property2_4_5 SAT 0.2 0.1 1.1 1.4 16.4 3430.9
property2_4_6 SAT 0.1 0.1 1.2 1.3 15.8 1169.8
property2_4_7 SAT 0.1 0.1 1.3 1.2 16.1 3470.2
property2_4_8 SAT 0.1 0.1 1.2 1.3 16.7 4274.8
property2_4_9 SAT 50.6 0.1 1.2 1.2 16.7 5354.9
property2_5_1 SAT 0.1 0.1 1.2 1.2 15.9 1456.7
property2_5_2 SAT 0.2 0.1 1.4 1.2 14.1 290.1
property2_5_3 SAT timelimit (UNSAT) 691.7 2.1 1199.7 (wrong)
property2_5_4 SAT 0.2 0.2 4.1 1.2 16.2 33.0
property2_5_5 SAT 0.2 0.1 1.2 1.2 15.3 932.4
property2_5_6 SAT 0.1 0.1 1.1 1.4 17.1 1857.9
property2_5_7 SAT 0.2 0.1 1.3 1.2 18.0 1211.9
property2_5_8 SAT 0.1 0.1 1.2 1.3 16.5 5435.2
property2_5_9 SAT 0.1 0.1 1.3 1.4 20.4 (wrong)
property3_1_1 UNSAT 112.2 73.1 40.0 42.3 79.9 6946.6
property3_1_2 UNSAT 2.4 2.5 10.9 13.5 13.2 5588.8
123
144 Journal of Global Optimization (2021) 81:109–152
Table 11 continued
Instance Result ReluVal Adv NonOpt Joint BaB Reluplex
property3_1_3 UNSAT 3.7 3.7 87.8 92.9 105.9 1277.5
property3_1_4 UNSAT 0.3 0.3 3.4 4.4 6.5 595.9
property3_1_5 UNSAT 0.2 0.2 2.9 2.9 6.1 359.8
property3_1_6 UNSAT 0.1 0.1 2.4 3.4 5.4 74.2
property3_2_1 UNSAT 21.5 15.6 7.8 14.8 9.8 1367.4
property3_2_2 UNSAT 8.1 8.2 11.4 12.8 18.8 739.7
property3_2_3 UNSAT 0.2 3.2 40.8 44.8 159.1 1184.2
property3_2_4 UNSAT 0.6 0.6 2.6 2.6 4.8 47.5
property3_2_5 UNSAT 0.4 0.4 2.8 4.1 5.5 268.8
property3_2_6 UNSAT 0.1 0.1 2.4 3.4 5.7 90.4
property3_2_7 UNSAT 0.2 0.3 2.4 3.7 5.6 132.9
property3_2_8 UNSAT 0.4 0.1 2.2 3.2 5.5 93.8
property3_2_9 UNSAT 0.1 0.1 1.9 2.1 4.6 31.5
property3_3_1 UNSAT 3.0 2.8 3.5 4.5 4.9 202.0
property3_3_2 UNSAT 6.1 6.1 19.1 21.0 69.2 1772.9
property3_3_3 UNSAT 0.3 0.3 3.5 4.4 5.1 1234.8
property3_3_4 UNSAT 0.5 0.5 7.0 8.3 12.7 238.8
property3_3_5 UNSAT 2.1 2.1 2.8 3.9 5.4 94.5
property3_3_6 UNSAT 22.0 20.8 3.5 8.0 13.6 287.9
property3_3_7 UNSAT 0.2 0.1 2.0 3.1 4.3 40.5
property3_3_8 UNSAT 3.5 3.6 2.8 3.8 5.4 205.1
123
Journal of Global Optimization (2021) 81:109–152 145
Table 11 continued
Instance Result ReluVal Adv NonOpt Joint BaB Reluplex
property3_3_9 UNSAT 2.7 2.7 2.5 3.8 4.7 134.2
property3_4_1 UNSAT 8.5 8.4 8.3 10.0 28.9 231.8
property3_4_2 UNSAT 106.8 81.0 22.8 24.6 64.4 3240.4
property3_4_3 UNSAT 2.3 2.3 20.3 22.3 90.8 1990.7
property3_4_4 UNSAT 0.2 0.2 2.3 3.5 4.8 100.7
property3_4_5 UNSAT 0.1 0.1 2.5 2.6 6.1 39.9
property3_4_6 UNSAT 0.2 0.2 4.6 5.9 6.8 346.4
property3_4_7 UNSAT 0.5 0.5 2.5 3.6 5.7 144.9
property3_4_8 UNSAT 1.8 1.8 2.9 4.1 6.0 165.3
property3_4_9 UNSAT 0.1 0.1 3.1 4.4 6.1 162.3
property3_5_1 UNSAT 22.8 20.9 24.0 29.0 72.0 1278.7
property3_5_2 UNSAT 2.3 2.2 3.0 3.5 5.3 170.5
property3_5_3 UNSAT 0.2 0.2 3.7 4.8 5.5 388.6
property3_5_4 UNSAT 0.2 0.2 5.5 3.6 5.0 72.0
property3_5_5 UNSAT 0.5 0.5 3.2 4.4 5.4 98.6
property3_5_6 UNSAT 0.9 1.0 2.8 3.9 7.0 329.5
property3_5_7 UNSAT 0.1 0.1 2.3 3.2 4.8 41.2
property3_5_8 UNSAT 0.1 0.1 2.6 3.7 5.9 353.1
property3_5_9 UNSAT 0.1 0.1 2.2 2.2 4.9 22.7
property4_1_1 UNSAT 1.2 1.1 8.7 11.0 12.7 1463.5
property4_1_2 UNSAT 1.3 1.3 15.3 17.3 13.4 1437.2
property4_1_3 UNSAT 0.4 0.4 25.8 27.6 46.8 1328.9
property4_1_4 UNSAT 0.3 0.3 6.5 8.4 12.9 121.2
123
146 Journal of Global Optimization (2021) 81:109–152
Table 11 continued
Instance Result ReluVal Adv NonOpt Joint BaB Reluplex
property4_1_5 UNSAT 0.5 0.5 6.3 7.8 5.7 406.0
property4_1_6 UNSAT 0.2 0.3 3.2 4.4 5.8 249.7
property4_2_1 UNSAT 0.8 0.9 8.9 11.4 6.8 371.8
property4_2_2 UNSAT 2.1 2.1 7.6 10.0 6.0 471.0
property4_2_3 UNSAT 0.9 1.0 3.1 3.7 6.0 284.0
property4_2_4 UNSAT 0.2 0.2 3.2 4.2 5.3 98.4
property4_2_5 UNSAT 0.4 0.4 3.2 4.1 5.5 174.0
property4_2_6 UNSAT 0.3 0.3 3.9 5.0 6.5 135.1
property4_2_7 UNSAT 0.1 0.1 2.3 3.1 5.1 39.6
property4_2_8 UNSAT 0.1 0.1 7.8 9.3 14.8 623.0
property4_2_9 UNSAT 0.1 0.1 2.1 7.2 4.9 59.3
property4_3_1 UNSAT 1.1 1.2 4.4 5.5 5.7 556.1
property4_3_2 UNSAT 0.5 0.4 3.8 5.0 5.7 125.7
property4_3_3 UNSAT 0.1 0.1 3.2 4.2 4.7 131.6
property4_3_4 UNSAT 0.2 0.2 3.0 4.0 6.1 85.9
property4_3_5 UNSAT 1.4 1.1 4.2 5.3 6.3 233.2
property4_3_6 UNSAT 1.4 1.3 3.7 4.8 6.3 161.4
property4_3_7 UNSAT 0.4 0.3 2.6 2.8 5.0 346.1
property4_3_8 UNSAT 0.3 0.3 6.6 7.7 44.7 148.9
property4_3_9 UNSAT 1.5 1.5 3.8 3.9 5.8 703.1
property4_4_1 UNSAT 3.2 3.1 3.4 4.3 5.7 55.8
123
Journal of Global Optimization (2021) 81:109–152 147
Table 11 continued
Instance Result ReluVal Adv NonOpt Joint BaB Reluplex
property4_4_2 UNSAT 2.3 2.2 4.0 4.5 6.0 277.5
property4_4_3 UNSAT 1.3 1.3 4.0 4.6 5.9 277.7
property4_4_4 UNSAT 1.5 1.5 7.2 8.5 48.9 235.8
property4_4_5 UNSAT 1.5 1.6 3.7 4.7 6.7 246.1
property4_4_6 UNSAT 0.1 0.1 3.4 4.5 5.7 200.5
property4_4_7 UNSAT 0.2 0.2 2.6 3.7 5.4 53.9
property4_4_8 UNSAT 0.2 0.2 3.4 3.7 6.0 221.5
property4_4_9 UNSAT 0.1 0.1 4.0 4.5 6.2 480.2
property4_5_1 UNSAT 2.9 2.9 3.9 4.9 6.2 583.3
property4_5_2 UNSAT 0.7 0.6 3.8 4.7 5.5 239.0
property4_5_3 UNSAT 0.2 0.2 3.6 4.6 5.5 141.4
property4_5_4 UNSAT 0.2 0.2 2.9 3.9 5.0 164.5
property4_5_5 UNSAT 0.6 0.6 3.7 4.8 5.9 130.2
property4_5_6 UNSAT 0.4 0.4 3.1 3.3 5.7 182.7
property4_5_7 UNSAT 0.1 0.1 2.4 3.4 5.4 44.2
property4_5_8 UNSAT 0.1 0.1 3.2 4.3 5.7 126.3
property4_5_9 UNSAT 0.2 0.1 2.5 3.5 5.4 133.1
property8 SAT 2939.7 74.5 22.9 1.2 20.3 timelimit
shifted geo. mean 5.7 3.1 34.1 34.1 67.5 669.0
In adversary check mode, Reluval fails on property2_5_3 and reports UNSAT instead of SAT. We set the runtime to 7200 s for the mean computation in this case. We also assume
a runtime of 7200 s for those three cases, for which Reluplex reports counterexamples that are not even valid with a numerical tolerance of 10−3. These are denoted as “wrong”
in the table
123
148 Journal of Global Optimization (2021) 81:109–152
Table 12 Linear ACAS instances run with our model using the formulation as optimization problem
Instance Dual Bound Primal Bound Nodes Result Status Time
1_1_int_away 0.0057 0.0057 229 UNSAT optimal 415.6
1_1_int_away2 0.015 0.015 403 UNSAT optimal 560.7
1_1_lin_opp 0.91 0.91 315 UNSAT optimal 1316.3
1_1_lin_opp2 0.6 0.6 249 UNSAT optimal 983.4
1_1_lin_opp2_dir 0.6 0.6 583 UNSAT optimal 2404.9
1_1_lin_opp_dir 0.69 0.69 543 UNSAT optimal 2488.6
1_2_int_away −1.7e+04 −0.11 2 SAT bound 23
1_2_int_away2 −1.7e+04 −0.1 2 SAT bound 22.3
2_1_var_dist 0.13 0.16 483 UNSAT bound 1679.2
2_2_lin_opp −2.5e+06 −2.8 1 SAT bound 1.7
2_2_lin_opp2 −4.3e+02 −0.21 41 SAT bound 537.9
3_1_var_dist 0.12 0.26 285 UNSAT bound 629.2
In the column status, “optimal” means that the problem was solved to optimality, while “bound” implies that
the solving process was interrupted due to a positive dual or negative primal bound
Table 13 Linear ACAS instances run with our model using the formulation as feasibility problem
Instance Dual Bound Primal Bound Nodes Result Status Time
1_1_int_away – – 227 UNSAT infeasible 395.3
1_1_int_away2 – – 515 UNSAT infeasible 634.2
1_1_lin_opp – – 313 UNSAT infeasible 1217.3
1_1_lin_opp2 – – 247 UNSAT infeasible 934.5
1_1_lin_opp2_dir – – 583 UNSAT infeasible 2315.9
1_1_lin_opp_dir – – 543 UNSAT infeasible 2410.8
1_2_int_away – – 2 SAT optimal 17.7
1_2_int_away2 – – 2 SAT optimal 17.9
2_2_lin_opp – – 1 SAT optimal 5.5
2_2_lin_opp2 – – 297 SAT optimal 1763.2
Using this formulation, there are no meaningful primal and dual bounds. The solution status shows that
either infeasibility is detected, or a feasible, i.e. optimal, solution is found. Here we only show results for the
disjunction instances in the test set
definition of an independent ReLU approximation. Because ˆx∈Pis fixed, we can find a
lower and upper bound for each yk,k∈[n]which define the feasible range. Of course, these
bounds can be infinite. Thus, we obtain (7).
Theorem 3Let P ⊂Rnbe a ReLU proper polytope and QEbe the approximation of P as
in Definition 3. For any independent ReLU approximation Q of P it holds QE⊆Q.
Proof Let i∈[n]be fixed and Qbe an independent ReLU approximation of P.Wedefine
the set
Ci:= (xi,yi)|x∈P,yi=max{0,xi}
123
Journal of Global Optimization (2021) 81:109–152 149
and let Qibe the embedded image (in R2) of the projection of Qon the variables xiand
yi. We claim that it holds conv(Ci)⊆Qi. By definition of Qwe have Ci⊆Qiand Qiis
a polyhedron, hence convex and therefore the claim holds. In the first part of this proof, we
show that conv(Ci)=QEi. To this end, we set
˜
QE:= x
y∈P×Rn|A(i)x+B(i)y≤c(i)⊂R2n
where A(i),B(i)and c(i)are defined according to Definition 3and for our fixed index i.
Obviously, we have QE⊆˜
QEwhich implies QEi⊆˜
QEi. We will show that ˜
QEi=
conv(Ci), which allows us to conclude that
conv(Ci)⊆QEi⊆˜
QEi=conv(Ci).
Let li:= minx∈Pxiand ui:= maxx∈Pxi.Thenwefind
Ci=(xi,max{0,xi})xi∈[li,ui],
which is a direct consequence of the convexity of P. Because Pis ReLU proper we have
li<0<ui. Thus we can write
Ci=(xi,0)xi∈[li,0]∪(xi,xi)xi∈[0,ui].
Hence, Ciis the union of two one-dimensional polytopes and conv(Ci)has three vertices
at coordinates (li,0), (0,0)and (ui,ui). These are exactly the vertices of the polytopes that
constitute Ci. Definition 3establishes A(i),B(i)and c(i)such that they imply the following
constraints for (x,y)T∈˜
QE:
yi≥0
yi≥xi
yi≤ui(xi−li)
ui−li
The segments between the vertices of conv(Ci)induce the following lines:
(li,0), (0,0)yi=0
(0,0), (ui,ui)yi=xi
(ui,ui), (li,0)yi=ui(xi−li)
ui−li
Taking the respective third vertex into account, we obtain exactly the constraints of ˜
QEand
hence show that ˜
QEi=conv(Ci).Since
conv(Ci)⊆QEi⊆˜
QEi=conv(Ci),
we find that conv(Ci)=QEi. Now, we will combine this result with Lemma 1in order to
prove the theorem.
To this end, we fix an arbitrary ˆx∈P. Combining the last result with (6) and applying it
to QE, we obtain:
conv(Ci)∩({ˆxi}×R)=QE∩({ˆx}×Rn)i.
Using conv(Ci)⊆Qi, which we showed in the beginning of the proof, and (6) applied to
Q,gives
conv(Ci)∩({ˆxi}×R)⊆Qi∩({ˆxi}×R)=Q∩({ˆx}×Rn)i.
123
150 Journal of Global Optimization (2021) 81:109–152
Thus we obtain
QE∩({ˆx}×Rn)i=conv(Ci)∩({ˆxi}×R)⊆Q∩({ˆx}×Rn)i.(15)
As a consequence of (7) in Lemma 1we have
Q∩({ˆx}×Rn)i={ˆxi}×[αi,β
i]
and
Q∩({ˆx}×Rn)={ˆx}×[α1,β
1]×...×[αn,β
n]
is entirely determined by the projections for all i∈[n]. The same holds for QEinstead of
Q. Since we fixed i∈[n]arbitrarily in the beginning, with (15) we are now able to conclude
that
QE∩({ˆx}×Rn)⊆Q∩({ˆx}×Rn).
Now fix an arbitrary (ˇx,ˇy)∈QE.Since ˆx∈Pwas also arbitrary, we have
(ˇx,ˇy)∈QE∩({ˇx}×Rn)⊆Q∩({ˇx}×Rn)
which implies (ˇx,ˇy)∈Q. This proves QE⊆Q.
References
1. Achterberg, T.: Constraint Integer Programming. PhD thesis, TU Berlin (2007). https://doi.org/10.14279/
depositonce-1634
2. Anderson, R., Huchette, J., Tjandraatmadja, C., Vielma, J.P.: Strong convex relaxations and mixed-integer
programming formulations for trained neural networks (2018). https://arxiv.org/abs/1811.01988
3. Bölcskei, H., Grohs, P., Kutyniok, G., Petersen, P.: Optimal approximation with sparsely connected
deep neural networks. SIAM J. Math. Data Sci. (2019). http://www.nari.ee.ethz.ch/commth/pubs/p/deep-
approx-18
4. Bunel, R., Turkaslan, I., Torr, P.H.S., Kohli, P., Kumar, M.P.: Piecewise linear neural network verification:
a comparative study (2017). https://arxiv.org/abs/1711.00455
5. Bunel, R., Turkaslan, I., Torr, P.H.S., Kohli, P., Mudigonda, P.K.: A unified view of piecewise linear
neural network verification. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N.,
Garnett, R. (eds.) Advances in Neural Information Processing Systems 31 (NIPS 2018), pp. 4795–4804
(2018). https://arxiv.org/abs/1711.00455v3
6. Cheng, C.-H., Nührenberg, G., Ruess, H.: Maximum resilience of artificial neural networks. In: D’Souza,
D., Narayan Kumar, K. (eds.) Automated Technology for Verification and Analysis, pp. 251–268. Springer,
Cham (2017). ISBN 978-3-319-68167-2
7. Chih-Hong, C., Georg, N., Chung-Hao, H., Harald, R.: Verification of binarized neural networks via
inter-neuron factoring. In: Verified Software. Theories, Tools, and Experiments—10th International Con-
ference: Revised Selected Papers, pp. 279–290 (2018). https://doi.org/10.1007/978-3-030-03592-1_16
8. Dutta, S., Jha, S., Sankaranarayanan, S., Tiwari, A.: Output range analysis for deep feedforward neural
networks. In: NASA Formal Methods—10th International Symposium, NFM 2018, Newport News, VA,
USA, April 17–19, 2018, Proceedings, pp. 121–138 (2018). https://doi.org/10.1007/978-3-319-77935-
5_9
9. Dvijotham, K., Stanforth, R., Gowal, S., Mann, T.A., Kohli, P.: A dual approach to scalable verification
of deep networks. In: UAI, pp. 550–559. AUAI Press (2018)
10. Ehlers, R.: Formal verification of piece-wise linear feed-forward neural networks. In: D’Souza, D.,
Narayan Kumar, K. (eds.) Automated Technology for Verification and Analysis, pp. 269–286. Springer,
Cham (2017). https://doi.org/10.1007/978-3-319-68167-2_19. ISBN 978-3-319-68167-2
11. Fischetti, M., Jo, J.: Deep neural networks and mixed integer linear optimization. Constraints 23(3),
296–309 (2018). https://doi.org/10.1007/s10601-018-9285-6. ISSN 1383-7133
123
Journal of Global Optimization (2021) 81:109–152 151
12. Gehr, T., Mirman, M., Drachsler-Cohen, D., Tsankov, P., Chaudhuri, S., Vechev, M.T.: AI2: safety and
robustness certification of neural networks with abstract interpretation. In: IEEE Symposium on Security
and Privacy, pp. 3–18. IEEE Computer Society (2018)
13. Gleixner, A., Bastubbe, M., Eifler, L., Gally, T., Gamrath, G., Gottwald, R.L., Hendel, G., Hojny, C., Koch,
T., Lübbecke, M.E., Maher, S.J., Miltenberger, M., Müller, B., Pfetsch, M.E., Puchert, C., Rehfeldt, D.,
Schlösser, F., Schubert, C., Serrano, F., Shinano, Y., Viernickel, J.M., Walter, M., Wegscheider, F., Witt,
J.T., Witzig, J.: The SCIP Optimization Suite 6.0. Technical report, Optimization Online (2018). http://
www.optimization-online.org/DB_HTML/2018/07/6692.html
14. Gleixner, A.M., Berthold, T., Müller, B., Weltge, S.: Three enhancements for optimization-based bound
tightening. J. Global Optim. 67(4), 731–757 (2017). https://doi.org/10.1007/s10898-016-0450-4. ISSN
1573-2916
15. Grimstad, B., Andersson, H.: Relu networks as surrogate models in mixed-integer linear programs. Com-
put. Chem. Eng. (2019). https://doi.org/10.1016/j.compchemeng.2019.106580
16. Hendel, G.: Empirical analysis of solving phases in mixed integer programming. Master’s thesis, TU
Berlin (2014)
17. Katz, G., Barrett, C.W., Dill, D.L., Julian, K., Kochenderfer, M.J.: Reluplex: an efficient SMT solver for
verifying deep neural networks. In: Computer Aided Verification—29th International Conference, CAV
2017, Heidelberg, Germany, July 24–28, 2017, Proceedings, Part I, pp. 97–117 (2017). https://doi.org/
10.1007/978-3-319-63387-9_5
18. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition.
Proc. IEEE 86(11), 2278–2324 (1998)
19. Lomuscio, A., Maganti, L.: An approach to reachability analysis for feed-forward relu neural networks
(2017). http://arxiv.org/abs/1706.07351
20. Narodytska, N., Kasiviswanathan, S., Ryzhyk, L., Sagiv, M., Walsh, T.: Verifying properties of binarized
deep neural networks. In: AAAI Conference on Artificial Intelligence. Association for the Advancement
of Artificial Intelligence (2018). https://aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16898
21. Pulina, L., Tacchella, A.: An abstraction-refinement approach to verification of artificial neural networks.
In: Touili, T., Cook, B., Jackson, P. (eds.) Computer Aided Verification, pp. 243–257. Springer, Berlin
(2010). ISBN 978-3-642-14295-6
22. Pulina, L., Tacchella, A.: Challenging SMT solvers to verify neural networks. Ai Commun. 25, 117–135
(2012). https://doi.org/10.3233/AIC-2012-0525
23. Raghunathan, A., Steinhardt, J., Liang, P.S.: Semidefinite relaxations for certifying robustness to adver-
sarial examples. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R.
(eds.) Advances in Neural Information Processing Systems, vol. 31, pp. 10877–10887. Curran Associates
Inc., Red Hook (2018)
24. Rössig, A.: Verification of neural networks. Technical Report 19-40, ZIB (2019). http://nbn-resolving.
de/urn:nbn:de:0297-zib-74174
25. Ruan, W., Huang, X., Kwiatkowska, M.: Reachability analysis of deep neural networks with provable
guarantees. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelli-
gence, IJCAI-18, pp. 2651–2659. International Joint Conferences on Artificial Intelligence Organization,
p. 7 (2018). https://doi.org/10.24963/ijcai.2018/368
26. Scheibler, K., Winterer, L., Wimmer, R., Becker, B.: Towards verification of artificial neural networks. In:
Methoden und Beschreibungssprachen zur Modellierung und Verifikation von Schaltungen und Systemen,
MBMV 2015, Chemnitz, Germany, March 3–4, 2015, pp. 30–40 (2015)
27. Serra, T., Ramalingam, S.: Empirical bounds on linear regions of deep rectifier networks. In: The
Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative
Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educa-
tional Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7–12, 2020, pp.
5628–5635. AAAI Press (2020). https://aaai.org/ojs/index.php/AAAI/article/view/6016
28. Serra, T., Kumar, A., Ramalingam, S.: Lossless compression of deep neural networks (2020). https://
arxiv.org/abs/2001.00218v3
29. Singh, G., Gehr, T., Mirman, M., Püschel, M., Vechev, M.T.: Fast and effective robustness certification.
In: NeurIPS, pp. 10825–10836 (2018)
30. Singh, G., Gehr, T., Püschel, M., Vechev, M.T.: An abstract domain for certifying neural networks.
PACMPL 3(POPL), 41:1–41:30 (2019a)
31. Singh, G., Gehr, T., Püschel, M., Vechev, M.T.: Boosting robustness certification of neural networks.
In: International Conference on Learning Representations (2019b). https://files.sri.inf.ethz.ch/website/
papers/RefineAI.pdf
123
152 Journal of Global Optimization (2021) 81:109–152
32. Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I.J., Fergus, R.: Intriguing
properties of neural networks. In: International Conference on Learning Representations (2014). https://
arxiv.org/abs/1312.6199v4
33. Tjeng, V., Tedrake, R.: Verifying neural networks with mixed integer programming (2017). https://arxiv.
org/abs/1711.07356v1
34. Tjeng, V., Xiao, K.Y., Tedrake, R.: Evaluating robustness of neural networks with mixed integer pro-
gramming. In: International Conference on Learning Representations (2019). https://arxiv.org/abs/1711.
07356v3
35. Wang, S., Pei, K., Whitehouse, J., Yang, J., Jana, S.: Formal security analysis of neural networks using
symbolic intervals. In: 27th USENIX Security Symposium (USENIX Security 18). USENIX Association,
Baltimore (2018a). https://www.usenix.org/conference/usenixsecurity18/presentation/wang-shiqi
36. Wang, S., Pei, K., Whitehouse, J., Yang, J., Jana, S.: Efficient formal safety analysis of neural networks.
In 32nd Conference on Neural Information Processing Systems (NIPS), Montreal (2018b). https://arxiv.
org/abs/1809.08098
37. Weng, T.W., Zhang, H., Chen, H., Song, Z., Hsieh, C.J., Boning, D., Dhillon, I.S., Daniel, L.: Towards
fast computation of certified robustness for RELU networks. In: International Conference on Machine
Learning (ICML) (2018)
38. Wong, E., Kolter, Z.: Provable defenses against adversarial examples via the convex outer adversarial
polytope. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine
Learning, Volume 80 of Proceedings of Machine Learning Research, pp. 5286–5295. PMLR, Stockholm
(2018). https://arxiv.org/abs/1711.00851
39. Wong, E., Schmidt, F., Metzen, J.H., Zico K.J.: Scaling provable adversarial defenses. In: Bengio, S.,
Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural
Information Processing Systems 31, pp. 8400–8409. Curran Associates Inc (2018). http://papers.nips.cc/
paper/8060-scaling-provable-adversarial-defenses.pdf
40. Xiang, W., Tran, H.D., Johnson, T.T.: Output reachable set estimation and verification for multi-layer
neural networks. IEEE Trans. Neural Netw. Learn. Syst. (2018). https://doi.org/10.1109/TNNLS.2018.
2808470
41. Xiang, W., Tran, H.D., Rosenfeld, J.A., Johnson, T.T.: Reachable set estimation and safety verification for
piecewise linear systems with neural network controllers. In: 2018 Annual American Control Conference
(ACC), pp. 1574–1579 (2018). https://doi.org/10.23919/ACC.2018.8431048
42. Zhang, H., Weng, T.-W., Chen, P.-Y., Hsieh, C.-J., Daniel, L.: Efficient neural network robustness cer-
tification with general activation functions. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K.,
Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 31, pp.
4939–4948. Curran Associates Inc., Red Hook (2018)
43. Zhang, H., Zhang, P., Hsieh, C.J.: Recurjac: an efficient recursive algorithm for bounding jacobian matrix
of neural networks and its applications (2019). https://arxiv.org/abs/1810.11783
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
123