On Cutting Planes for Mixed-Integer
Nonlinear Programming
submitted by
M. Sc.
Felipe Serrano Musalem
ORCID: 0000-0002-7892-3951
to Faculty II – Mathematics and Natural Sciences
of the Technische Universität Berlin
in fulfillment of the requirements for the academic degree
Doktor der Naturwissenschaften
– Dr. rer. nat. –
approved dissertation
Doctoral committee:
Chair: Prof. Dr. Wolfgang König
Reviewers: Prof. Dr. Thorsten Koch
Prof. Dr. Juan Pablo Vielma
Date of the scientific defense: 21 August 2020
Berlin 2021
Abstract
Mixed-integer nonlinear programming is a powerful technology that allows
us to model and solve problems involving nonlinear functions, continuous,
and discrete variables. The state-of-the-art solvers of mixed-integer nonlinear
programs (MINLPs) use a combination of, among other techniques, branch-
and-bound and cutting planes. In the late ’90s, solvers for mixed-integer linear
programs saw an increase in performance due to the incorporation of general-
purpose cutting planes.
In this thesis, we deepen our understanding of a classical cutting plane
algorithm, develop a strengthening technique, and introduce two new cutting
planes for MINLPs.
We first show that Veinott’s supporting hyperplane algorithm is a particular
case of Kelley’s cutting plane algorithm. We further extend the applicability
of Veinott’s supporting hyperplane algorithm to solve convex problems repre-
sented by non-convex functions.
We then develop a technique to strengthen cutting planes for non-convex
MINLPs. Many cuts for non-convex MINLPs strongly rely on the domain
of the variables: tighter bounds produce tighter cuts. Using the point to be
separated, we show that we can restrict the feasible region and still ensure
the validity of the resulting cutting plane.
Finally, we develop two intersection cuts for non-convex MINLP. The first
one is a technique to construct $S$-free sets for any factorable MINLP. For the
second one, we show how to build maximal quadratic-free sets, from which we
compute intersection cuts. These last cuts reduce the average running time
of the solver SCIP by 20% on hard MINLPs.
Zusammenfassung
Mixed-integer nonlinear programming is a powerful technique with which we
can model and solve problems that contain nonlinear functions and continuous
and discrete variables. State-of-the-art solvers for mixed-integer nonlinear
programs (MINLPs) use, among other things, a combination of the
branch-and-bound method and cutting plane generation. In the late 1990s,
solvers for mixed-integer linear programs saw a performance increase through
the incorporation of general-purpose cutting planes.
In this thesis, we deepen our understanding of a classical cutting plane
algorithm, and we develop a strengthening technique and two new cutting
planes for MINLPs.
First, we show that Veinott’s supporting hyperplane algorithm is a special
case of Kelley’s cutting plane algorithm. Furthermore, we extend the
applicability of Veinott’s supporting hyperplane algorithm to the solution of
convex problems that are represented by non-convex functions.
We then develop a technique for strengthening cutting planes for non-convex
MINLPs. Many cuts for non-convex MINLPs depend strongly on the domain of the
variables: tighter bounds produce stronger cuts. Using the point to be
separated, we show that we can restrict the feasible region and still
preserve the validity of the resulting cuts.
Finally, we develop two intersection cuts for non-convex MINLPs. The first
cut is a technique for constructing $S$-free sets for arbitrary factorable
MINLPs. For the second cut, we show how to build maximal quadratic-free sets,
from which we compute intersection cuts. These cuts reduce the average
running time of the solver SCIP by 20% on hard problems.
Contents
Abstract
Zusammenfassung
1 Introduction
  1.1 Mathematical Preliminaries
  1.2 Intersection Cuts
  1.3 Duality
  1.4 Monoidal Strengthening
    1.4.1 One Row Relaxations: Gomory Cuts
    1.4.2 Disjunctive Cuts
    1.4.3 Monoidal Strengthening
2 On the Relation Between the Extended Supporting Hyperplane Algorithm and Kelley’s Cutting Plane Algorithm
  2.1 Background
    2.1.1 Literature Review
  2.2 Characterization of Functions with Supporting Linearizations
  2.3 The Gauge Function
    2.3.1 Using the Gauge Function for Separation
    2.3.2 Evaluating the Gauge Function
    2.3.3 Handling Sets with Empty Interior
    2.3.4 Using a Nonzero Interior Point
  2.4 Convergence Proofs
  2.5 Convex Programs Represented by Non-Convex Non-Smooth Functions
    2.5.1 The ESH Algorithm in the Context of Generalized Differentiability
    2.5.2 Limits to the Applicability of the ESH Algorithm
  2.6 Concluding Remarks
3 Visible Points, the Separation Problem, and Applications to Mixed-Integer Nonlinear Programming
  3.1 Introduction
  3.2 Visible Points and the Reverse Polar
  3.3 The Smallest Generators
    3.3.1 Motivation
    3.3.2 Preliminaries
    3.3.3 Results
  3.4 Applications to MINLP
    3.4.1 Characterizing the Visible Points
  3.5 Conclusions and Outlook
4 Intersection Cuts for Factorable Mixed-Integer Nonlinear Programming
  4.1 Motivation
  4.2 Literature Review and Related Work
  4.3 Concave Underestimators
    4.3.1 Concave Underestimators and Intersection Cuts for Convex Constraints
  4.4 Enlarging the $S$-free Sets by Using Bound Information
  4.5 “Monoidal” Strengthening
  4.6 Conclusions
5 Maximal Quadratic-Free Sets
  5.1 Background
    5.1.1 Related Work
    5.1.2 Contribution
    5.1.3 Notation
  5.2 Preliminaries
    5.2.1 Techniques for Proving Maximality
  5.3 Maximal Quadratic-Free Sets for Homogeneous Quadratics
    5.3.1 Removing Strict Convexity Matters
    5.3.2 Maximal $S^h$-free Sets
  5.4 Homogeneous Quadratics With a Single Homogeneous Linear Constraint
    5.4.1 Case 1: $\|a\| \le \|d\| \wedge m > 1$
    5.4.2 Case 2: $\|a\| \ge \|d\|$
  5.5 Non-Homogeneous Quadratics
    5.5.1 Case 1: $\|a\| \le \|d\| \wedge m > 1$
    5.5.2 Case 2: $\|a\| > \|d\|$
  5.6 On the Diagonalization and Homogenization of Quadratics
  5.7 Further Remarks and Generalizations
    5.7.1 Generalizing Theorem 5.16
    5.7.2 Generalizing Proposition 5.21
    5.7.3 Extensions to the Work of Bienstock et al. (2016)
    5.7.4 There Are More Quadratic-Free Sets
  5.8 Computational Experiments
  5.9 Summary and Future Work
  5.10 Missing Proofs
6 Conclusion
Bibliography
Chapter 1
Introduction
This thesis develops techniques for solving mixed-integer nonlinear problems,
in particular, techniques related to cutting planes. A mixed-integer nonlinear
problem (MINLP) belongs to the class of Mathematical Programming (MP).
In its simplest form, MP is concerned with finding the largest or smallest
value that a function can attain in some domain. For example, finding the
region of smallest surface that has a prescribed volume, or finding the path
that a ball has to take so that it goes from point A to point B in the least
amount of time under the influence of gravity. Already at this point one can
suspect that MP has many applications: just imagine packing a given volume
of liquid using the least amount of material. More modern examples of MP
problems include finding the shortest path between two points in a city, or
deciding where to open stores from a given set of possible locations such that
customers’ average shortest travel time is minimized. One can find an
impressive number of applications in the survey of Boukouvala, Misener, and
Floudas (2016).
The example problems mentioned above have two distinct features.
The first examples are continuous, that is, the solution can be any real number.
In contrast, the last examples are discrete. Discrete structures appear, for
example, when we can only choose from a finite set of possibilities.
One of the features of this type of problem is that it can be translated,
with more or less work, into a mathematical model. That is, the set of feasible
solutions can be described by equations and inequalities, called constraints,
while the criterion we want to optimize can be described as a function,
called the objective function. As a toy example, suppose we are interested in
finding two non-negative integers such that the cube of one number is two
units away from the square of the other and their sum is smallest. If $x$ and
$y$ are the two integers and $v$ is the value of their sum, the problem
above can be written as
\[ \min\{v : v = x + y,\ x^3 - y^2 = 2,\ x, y \in \mathbb{Z}_+,\ v \in \mathbb{R}\}. \tag{1.1} \]
In (1.1) we encounter the constraints $v = x + y$, $x^3 - y^2 = 2$,
$x, y \in \mathbb{Z}_+$, and $v \in \mathbb{R}$, and the objective function is
just $v$, which is the quantity we want to minimize. The constraint
$v = x + y$ is linear, while $x^3 - y^2 = 2$ is nonlinear.
The variables $x, y$ are restricted to be integers while $v$ is continuous.
Such a model is an example of an MINLP problem. The “mixed-integer”
comes from the fact that variables can be either discrete or continuous. The
“nonlinear” makes reference to the possibility of having constraints represented
by nonlinear functions.
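More generally, a generic MINLP can be written as
\[ \begin{aligned} \min\ & f(x) \\ \text{s.t.}\ & g_k(x) \le 0 \quad \forall k \in [m], \\ & x_i \in \mathbb{Z} \quad \forall i \in I, \end{aligned} \]
where $m, n \in \mathbb{Z}_+$, $f, g_k : A \subseteq \mathbb{R}^n \to \mathbb{R}$,
$[m] = \{1, \dots, m\}$, $x \in \mathbb{R}^n$, and $I \subseteq [n]$.
We note that assuming that the constraints are $g_k(x) \le 0$ is without loss of
generality, since $g_k(x) = 0$ is equivalent to $g_k(x) \le 0$ and $-g_k(x) \le 0$.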
In practice, MINLP problems are difficult to solve. The best algorithm
we currently have for trying to solve a general MINLP is the so-called LP-
based spatial branch and bound. LP stands for linear programming, which is a
subclass of MINLP concerned with optimization problems where all variables
are continuous and all constraints are linear. In contrast to MINLPs, LPs are
easy to solve in practice.
The basic idea of LP-based spatial branch and bound is to construct an
LP relaxation of the MINLP, that is, an LP such that every feasible point
of the MINLP is feasible for the LP. Solving this LP yields a bound on the
optimal value of the MINLP. The solution of the LP, $\bar{x}$, is likely to be
infeasible for the MINLP. Thus, the LP relaxation can, in principle, be refined
by the introduction of cutting planes separating $\bar{x}$. These are linear
inequalities that every feasible point of the MINLP satisfies and $\bar{x}$
does not satisfy. By refining the LP
relaxation, we obtain a better bound on the optimal value of the MINLP.
For example, it is not hard to see that $(x, y, v) = (3, 5, 8)$ is an optimal
solution of (1.1) (just check that $(3, 5)$ is the only feasible point in
$\{1, 2, 3\} \times \{1, 2, 3, 4, 5\}$). An LP relaxation of (1.1) is
$\min\{v : v = x + y,\ x, y \ge 0\}$, for which an optimal solution is
$(\bar{x}, \bar{y}, \bar{v}) = (0, 0, 0)$. The optimal value of the LP
is 0, which is a (lower) bound on the optimal value of the MINLP, which is 8.
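The claim about the optimum of (1.1) can be checked by enumeration. The following is a minimal sketch, not part of the thesis; the function name `solve_toy` and the search bound are our own choices for illustration. Any solution with a smaller value than 8 must satisfy $x + y < 8$ and therefore lies inside the searched box, so the enumeration certifies optimality.

```python
def solve_toy(bound=50):
    """Enumerate 0 <= x, y <= bound and return the best (v, x, y) for (1.1).

    The bound is an assumption made for illustration; it only needs to be
    at least 8 to certify that no solution with x + y < 8 exists.
    """
    best = None
    for x in range(bound + 1):
        for y in range(bound + 1):
            if x**3 - y**2 == 2:          # nonlinear constraint of (1.1)
                candidate = (x + y, x, y)  # v = x + y is the objective
                if best is None or candidate < best:
                    best = candidate
    return best


print(solve_toy())
```

The tuple is ordered as $(v, x, y)$ so that the comparison picks the smallest objective value first.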
Now, since $x^3 = y^2 + 2$ and $y^2 \ge 0$, we can deduce that $x^3 \ge 2$. This
implies that $x > 1$ and since $x$ must be integral, we conclude that
$x \ge 2$. Note that the LP solution does not satisfy $x \ge 2$. Thus,
$\min\{v : v = x + y,\ x \ge 2,\ y \ge 0\}$
is a tighter LP relaxation. An optimal solution of this LP is
$(\bar{x}, \bar{y}, \bar{v}) = (2, 0, 2)$
and yields a better lower bound. Cuts that involve a single variable are usually
called bound tightenings.
Notice that the LP solution $(\bar{x}, \bar{y}) = (2, 0)$ violates the
constraint $x^3 = y^2 + 2$. In particular, if we interpret the equality as two
inequalities, then the violated inequality is $x^3 - y^2 \le 2$. Since
$x \ge 2$ and $y \ge 0$, the above inequality is equivalent to
$\sqrt{x^3 - 2} - y \le 0$. The function $f(x) = \sqrt{x^3 - 2}$ is convex and
differentiable at $x = 2$, with $f(2) = f'(2) = \sqrt{6}$, and so
$f(2) + f'(2)(x - 2) \le f(x)$, that is,
$\sqrt{6}x - \sqrt{6} \le \sqrt{x^3 - 2}$ for $x \ge 2$. Therefore, every
feasible point must satisfy $\sqrt{6}x - \sqrt{6} - y \le 0$. We see that
$(\bar{x}, \bar{y}) = (2, 0)$ does not satisfy this inequality.
Such an inequality is then a cutting plane and its addition to the current LP
relaxation makes it tighter. Indeed, by adding it and solving the corresponding
LP we obtain the optimal point
$(\bar{x}, \bar{y}, \bar{v}) = (2, \sqrt{6}, 2 + \sqrt{6})$ with value
$2 + \sqrt{6}$,
which is better than the one of the previous iteration.
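The linearization used for this cut can be verified numerically. The sketch below is ours and only illustrates the computation; the helper names `f`, `f_prime`, and `cut_lhs` are hypothetical. It evaluates the tangent of $f(x) = \sqrt{x^3 - 2}$ at $x = 2$ and checks that the resulting inequality cuts off $(2, 0)$ while keeping the feasible point $(3, 5)$.

```python
import math


def f(x):
    """f(x) = sqrt(x^3 - 2), the function linearized in the text."""
    return math.sqrt(x**3 - 2)


def f_prime(x):
    """Derivative of f: 3x^2 / (2 sqrt(x^3 - 2))."""
    return 3 * x**2 / (2 * math.sqrt(x**3 - 2))


# Tangent at x = 2: f(2) + f'(2)(x - 2) = slope * x + intercept.
slope = f_prime(2.0)
intercept = f(2.0) - slope * 2.0


def cut_lhs(x, y):
    """Left-hand side of the cut slope * x + intercept - y <= 0."""
    return slope * x + intercept - y


print(slope, intercept)   # both have absolute value sqrt(6)
print(cut_lhs(2, 0) > 0)  # the LP solution (2, 0) is cut off
print(cut_lhs(3, 5) <= 0) # the feasible point (3, 5) is kept
```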
However, at some point it might not be possible to compute a cutting
plane and so the algorithm starts branching. In its most basic form, branching
means to split the feasible region into two regions, in such a way that the
union of both regions is the original feasible region. For example, in the last LP
relaxation we obtained $\bar{y} = \sqrt{6}$. Branching on $y$ at $\sqrt{6}$
produces two problems which are the same as the original one, except that in
one the constraint $y \le \sqrt{6}$ is added and in the other one,
$y \ge \sqrt{6}$. Since $y$ is restricted to be an
integer we can further make these inequalities tighter. Thus, after branching
on $y$ we obtain the following problems
\[ \min\{v : v = x + y,\ x^3 - y^2 = 2,\ y \le 2,\ x, y \in \mathbb{Z}_+,\ v \in \mathbb{R}\} \quad \text{and} \]
\[ \min\{v : v = x + y,\ x^3 - y^2 = 2,\ y \ge 3,\ x, y \in \mathbb{Z}_+,\ v \in \mathbb{R}\}. \]
The adjective spatial in spatial branch and bound means that the branching
can also be done on continuous variables, for example, $v$. The adjective is
added to distinguish the algorithm from the standard branch-and-bound al-
gorithm for solving mixed-integer linear problems (MILPs). Via branching,
the algorithm implicitly constructs a tree of problems.
By continuing the branching process the problem will eventually be solved.
However, as can be seen from the example, cutting planes are an important
tool for tightening the LP relaxation of the MINLP, whose purpose is to
accelerate the solution process.
Let us look at another example to illustrate another important tool for
solving MINLPs. Assume we are interested in buying some number of shirts
and pants in such a way that the number of different outfits we can create
is maximal. We enter a rather expensive shop where each shirt costs
30 euros and each pair of pants costs 70 euros, and we have 250 euros in our
wallet. If $s$ is the number of shirts and $p$ the number of pants that we buy,
then the number of outfits is $T = s \cdot p$. Then, the problem we try to solve is
\[ \max\{T : T \le s \cdot p,\ 3s + 7p \le 25,\ s, p \in \mathbb{Z}_+\}. \]
Let us first notice that we do not have enough money to buy 9 shirts or 4
pants, so $s \le 8$ and $p \le 3$. One way of obtaining a linear relaxation for
this problem is to find a linear relaxation of the constraint $T \le s \cdot p$.
To obtain one, notice that for every feasible $p$ and $s$ we have that
$s(3 - p) \ge 0$ and $(8 - s)p \ge 0$. Thus,
$T \le s \cdot p \le \min\{3s, 8p\}$. These are the famous McCormick
inequalities (McCormick, 1976). Our first linear relaxation then looks like
\[ \max\{T : T \le 3s,\ T \le 8p,\ 3s + 7p \le 25,\ s, p \in \mathbb{R}_+\}. \]
We could have added the bounds $s \le 8$, $p \le 3$, but let us keep it simple.
The optimal solution of the linear relaxation is
$(T, s, p) \approx (13.3, 4.4, 1.7)$. As this
is an upper bound on the optimal value, we know that it is not possible to
get 14 different outfits. Let us branch on $s \le 4$ and $s \ge 5$. The first
problem created is
\[ \max\{T : T \le s \cdot p,\ 3s + 7p \le 25,\ s \le 4,\ s, p \in \mathbb{Z}_+\}. \]
If we solve the linear relaxation
\[ \max\{T : T \le 3s,\ T \le 8p,\ 3s + 7p \le 25,\ s \le 4,\ s, p \in \mathbb{R}_+\}, \]
we obtain a value of $T = 12$. However, when branching on $s \le 4$, the upper
bound of $s$ is reduced from 8 to 4. Thus, there is a chance that we can deduce
a better linear relaxation of $T \le s \cdot p$. Indeed, following the same
reasoning as above we see that $T \le s \cdot p \le \min\{3s, 4p\}$. Now,
solving the improved linear relaxation
\[ \max\{T : T \le 3s,\ T \le 4p,\ 3s + 7p \le 25,\ s \le 4,\ s, p \in \mathbb{R}_+\}, \]
yields $T \approx 9.09$, which is a much better upper bound. This shows that if
we buy 4 or fewer shirts we can only hope for 9 outfits. The algorithm will
continue either by branching or cutting. If anybody is interested, the maximum
number of outfits is actually 6, far away from the possibility of 13 given by
the first linear relaxation.
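The bounds in this example can be reproduced with exact arithmetic. The sketch below is ours and relies on an observation that is not spelled out in the text: for relaxations of the form $T \le as$, $T \le bp$, $3s + 7p \le 25$, $0 \le s \le u$, $p \ge 0$, a value $T$ is attainable exactly when the cheapest choice $s = T/a$, $p = T/b$ is feasible, so the optimum is $\min\{au, 25/(3/a + 7/b)\}$. The function names are hypothetical.

```python
from fractions import Fraction


def relaxation_bound(a, b, s_upper):
    """Optimal value of max T s.t. T <= a*s, T <= b*p, 3s + 7p <= 25,
    0 <= s <= s_upper, p >= 0, using the closed form described above."""
    budget_limit = Fraction(25) / (Fraction(3, a) + Fraction(7, b))
    return min(Fraction(a * s_upper), budget_limit)


def best_integer_outfits():
    """Brute-force optimum of the original integer problem."""
    return max(s * p for s in range(9) for p in range(4) if 3 * s + 7 * p <= 25)


print(relaxation_bound(3, 8, 8))  # first relaxation (s <= 8 is not binding)
print(relaxation_bound(3, 8, 4))  # after branching on s <= 4, old coefficients
print(relaxation_bound(3, 4, 4))  # after tightening to min{3s, 4p}
print(best_integer_outfits())
```

The three relaxation values correspond to the bounds of roughly 13.3, 12, and 9.09 quoted above, and the brute-force search gives the integer optimum.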
This example illustrates that the bounds of the variables are very im-
portant for building tight linear relaxations of MINLPs. Many details about
branch-and-bound algorithms have not been dealt with in the previous expla-
nation. For more details, including proofs of convergence, the reader is referred
to Horst and Tuy (1990, Chapter IV).
The importance of bound propagation and cutting planes is...
Contributions and outline. In Chapter 2, we investigate two classical
algorithms for convex MINLPs, a subclass of MINLP in which all the functions
appearing in nonlinear constraints are convex. These algorithms are Kelley’s
Cutting Plane algorithm and Veinott’s Supporting Hyperplane algorithm. We
show that the convergence of Veinott’s algorithm follows from the conver-
gence of Kelley’s algorithm. The idea is to interpret Veinott’s algorithm as
Kelley’s algorithm applied to a reformulation of the original problem. Such a
reformulation only depends on the feasible region and not on the functions used
to represent it. Thus, we are able to extend the applicability of Veinott’s
algorithm to some problems with convex feasible region, but where constraint
functions are not necessarily convex nor differentiable. Under a mild technical
condition, Veinott’s algorithm converges if the functions are differentiable. To
extend this result, we relax the differentiability assumption of the functions
by introducing a notion of a generalized derivative which is enough to show
the convergence of Veinott’s algorithm.
In Chapter 3, we study in a more general setting the separation problem,
namely, given a point $\bar{x}$ and a set $S$, find a valid linear cutting
plane for $S$ that separates $\bar{x}$, or show that none exists. In other
words, if $A(S, \bar{x})$ is the set of all the answers of the separation
problem, that is, all valid cuts for $S$ that separate $\bar{x}$ from $S$, then
the separation problem is to find an element of $A(S, \bar{x})$ or show that
$A(S, \bar{x}) = \emptyset$. We show that given $S$ and $\bar{x}$, there exists
$\hat{S} \subseteq S$ such that $A(S, \bar{x}) = A(\hat{S}, \bar{x})$. The
intuition of such a result is as follows.
To ensure that a cutting plane is valid for a closed set $S$, it is enough to
verify that it is valid for every vertex of $S$. However, in general, we want a
cutting plane that separates a given point $\bar{x}$. Thus, to ensure validity
of such a cut, it is enough to verify that it is valid for every vertex of $S$
“near” $\bar{x}$. We use the concept of visible points of $S$ from $\bar{x}$,
$V_S(\bar{x})$, to formalize the meaning of “near” and show that
$A(S, \bar{x}) = A(V_S(\bar{x}), \bar{x})$. We give a simple characterization
of the visible points of $S$ when $S$ is the intersection of a quadratic
constraint and a convex set. If $S$ is the intersection of a polynomial
constraint and a convex set, we provide an extended formulation for a
relaxation of the visible points. As we will see, simple examples show that the
visible points are not the smallest $\hat{S}$ such that
$A(S, \bar{x}) = A(\hat{S}, \bar{x})$. Finally, we use the visible points
to characterize the smallest $\hat{S}$ for different classes of sets.
Then, in Chapter 4, we focus on intersection cuts. Intersection cuts are an
elegant technique to construct cutting planes that fits perfectly into LP-based
approaches for MINLP. We show how to construct intersection cuts for general
factorable MINLPs. The idea is to construct concave underestimators of a
factorable function. Our approach is to mimic McCormick’s procedure for
building convex underestimators. Furthermore, we propose a strengthening
procedure for intersection cuts using monoidal strengthening in the presence
of a single integer variable.
With the aid of the concave underestimators, we build so-called $S$-free sets,
closed convex sets that do not contain any point of $S$ in their interior, where
$S$ is normally the feasible region or a relaxation thereof. From an $S$-free
set and a simplicial conic relaxation of the feasible region one can construct
an intersection cut. As it turns out, the larger the $S$-free set the stronger
the cut. Thus, it is natural to seek maximal $S$-free sets, that is, $S$-free
sets that are not completely contained in any other $S$-free set. Although the
constructions of Chapter 4 allow us to construct $S$-free sets, they are usually
not maximal. In Chapter 5 we construct maximal $S$-free sets when $S$ is given
by a quadratic constraint.
In the remainder of this chapter we introduce our notation and general
definitions that are used throughout the thesis. We explain, in a rather leisurely
manner, more techniques in MINLP that are relevant for this thesis.
1.1 Mathematical Preliminaries
In this section, we introduce notation and some concepts that we use
throughout the thesis. For definitions and for proofs of the claims stated
without proof in this section, the reader is referred to
Rockafellar (1970), Schrijver (1998), and Boyd and Vandenberghe (2004). We
group the concepts by topic to make them easier to look up.
Topology
We will be working in $\mathbb{R}^n$. We denote the inner product of
$x, y \in \mathbb{R}^n$ by $x^T y$ and the Euclidean norm by $\|\cdot\|$. We
denote by $B_r(x)$ and $D_r(x)$ the Euclidean ball centered at $x$ of radius
$r$ and its boundary, respectively. More precisely,
$B_r(x) = \{y \in \mathbb{R}^n : \|y - x\| \le r\}$ and
$D_r(x) = \{y \in \mathbb{R}^n : \|y - x\| = r\}$.
Let $C \subseteq \mathbb{R}^n$. We denote the boundary, complement, closure,
interior, and relative interior of $C$ by $\partial C$, $(C)^c$,
$\operatorname{cl} C$, $\operatorname{int} C$, and $\operatorname{ri} C$,
respectively. Given $v \in \mathbb{R}^n$ and a set $C \subseteq \mathbb{R}^n$,
we denote the distance between $v$ and $C$ by
$\operatorname{dist}(v, C) = \inf_{x \in C} \|v - x\|$. Given two sets
$A, B \subseteq \mathbb{R}^n$, the Minkowski sum of $A$ and $B$ is
$\{a + b : a \in A, b \in B\}$ and we denote it by $A + B$. When $A$ is
a singleton, say $A = \{a\}$, we denote the sum by $a + B$. For a set of vectors
$\{v_1, \dots, v_k\} \subseteq \mathbb{R}^n$, we denote by
$\langle v_1, \dots, v_k \rangle$ the subspace generated by them.
Given some set $C \subseteq \mathbb{R}^n \times \mathbb{R}^m$, we denote by
$\operatorname{proj}_x C$ the projection of $C$ onto the $x$-space, that is,
$\operatorname{proj}_x C = \{x \in \mathbb{R}^n : \exists y \in \mathbb{R}^m, (x, y) \in C\}$.
More generally, if $H$ is a subspace of $\mathbb{R}^n$, we denote by
$\operatorname{proj}_H C$ the projection of $C$ onto $H$.
Convex sets
Given $m$ points $x_1, \dots, x_m \in \mathbb{R}^n$ and given
$\lambda_1, \dots, \lambda_m \in [0, 1]$ such that
$\sum_{i=1}^m \lambda_i = 1$, the point $\sum_{i=1}^m \lambda_i x_i$ is said to
be a convex combination of the points $x_1, \dots, x_m$. We say that $C$ is
convex if for every $x, y \in C$ and $\lambda \in [0, 1]$,
$\lambda x + (1 - \lambda) y \in C$, that is, if for every pair of points in
$C$ every convex combination of them is in $C$. The convex hull of $C$ is the
smallest convex set that contains $C$, or equivalently the intersection of all
convex sets containing $C$, and is denoted by $\operatorname{conv} C$. The
closure of the convex hull of $C$ is denoted by
$\overline{\operatorname{conv}}\, C$. The extreme points of a not necessarily
convex set $C$ are the points in $C$ that cannot be written as a convex
combination of other points in $C$, and we denote them by
$\operatorname{ext} C$. For example, if $C$ is a square, then the
extreme points are the vertices. If $C$ is a disk, then the extreme points are
all the points at the boundary. If $C$ is this figure ⊐⊂, then the two right
vertices and all the points of the semi-circle at the left are extreme points.
The beauty of the concept of extreme points is that those points are the only
ones needed to describe the convex hull of a set.
A related concept is that of exposed points. When one optimizes a linear
function over a set $C$, then an optimal solution, if one exists, is going to
be at the boundary of $C$. The solution might be unique, for example, when
optimizing in any direction over a circle. There might be multiple solutions,
for example, when optimizing in the direction $(1, 0)$ over a square. Any
$x_0 \in C$ such that there exists a linear function $\alpha^T x$ for which
$x_0$ is the unique solution of $\max_{x \in C} \alpha^T x$ is called an
exposed point. We denote the set of exposed points of $C$ by
$\operatorname{exp} C$. Every exposed point is an extreme point. However, not
every extreme point is an exposed point. To see this, consider again
$C$ = ⊐⊂. The two points where the semi-circle meets the straight part are
extreme but not exposed.
The gauge function of a convex set $C$ is
$\phi_C(x) = \inf\{t : t > 0,\ x/t \in C\}$.
The gauge function is a sort of distance measured by $C$. It measures the
minimum factor by which we have to scale $C$ so that $x$ is at its boundary.
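To make the definition concrete, the gauge can be approximated with nothing more than a membership test for $C$. The sketch below is ours and is not the evaluation method of the thesis; it assumes that $C$ is closed and convex with $0$ in its interior, so that the set of admissible $t$ is an interval $[\phi_C(x), \infty)$ and bisection applies. The names `gauge`, `in_unit_disk`, and `in_unit_box` are hypothetical.

```python
import math


def gauge(x, contains, tol=1e-9):
    """Approximate inf{t > 0 : x / t in C} by bisection.

    `contains` is a membership oracle for C. Assumes C is closed and
    convex with 0 in its interior (otherwise the first loop may not stop).
    """
    if all(xi == 0 for xi in x):
        return 0.0
    t_hi = 1.0
    while not contains([xi / t_hi for xi in x]):  # find an admissible t
        t_hi *= 2.0
    t_lo = 0.0
    while t_hi - t_lo > tol:
        t_mid = 0.5 * (t_lo + t_hi)
        if contains([xi / t_mid for xi in x]):
            t_hi = t_mid   # x / t_mid is in C, so the gauge is at most t_mid
        else:
            t_lo = t_mid   # x / t_mid is outside C, so the gauge exceeds t_mid
    return t_hi


def in_unit_disk(p):
    return math.hypot(p[0], p[1]) <= 1.0


def in_unit_box(p):
    return max(abs(pi) for pi in p) <= 1.0


print(gauge((3.0, 4.0), in_unit_disk))   # Euclidean norm of (3, 4)
print(gauge((0.5, -2.0), in_unit_box))   # max-norm of (0.5, -2)
```

For the unit disk and the unit box the gauge coincides with the Euclidean norm and the max-norm, respectively, which is what the two calls illustrate.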
Given a closed set $S$, a convex set $C$ is said to be $S$-free if its interior
does not contain any point of $S$. In other words, $C$ is $S$-free if
$S \cap \operatorname{int} C = \emptyset$. Let $C$ be an $S$-free set. We say
that $C$ is maximal $S$-free if it holds that for every
convex $S$-free set $K$, if $C \subseteq K$, then $C = K$.
Inequalities
Let $\alpha \in \mathbb{R}^n$ and $\beta \in \mathbb{R}$. The set
$\{x \in \mathbb{R}^n : \alpha^T x = \beta\}$ is called an affine subspace and
we say that $\alpha$ is its normal. The set
$\{x \in \mathbb{R}^n : \alpha^T x \le \beta\}$ is a half-space. Both are
convex. In general, a closed convex set can be written as the intersection of
an arbitrary number of half-spaces. Usually, instead of writing the half-space
as a set we just write $\alpha^T x \le \beta$. We say that
$\alpha^T x \le \beta$ is valid or a valid inequality for $C$ if
$C \subseteq \{x \in \mathbb{R}^n : \alpha^T x \le \beta\}$. If
$\alpha^T x \le \beta$ is a valid inequality for $C$ and $\bar{x} \notin C$ is
such that $\alpha^T \bar{x} > \beta$, we say that $\alpha^T x \le \beta$
separates $\bar{x}$ from $C$. If $\alpha^T x \le \beta$ is a valid inequality
for $C$ and it is tight, that is, there exists a $y \in C$ such that
$\alpha^T y = \beta$, we say that $\alpha^T x \le \beta$ is a supporting
hyperplane of $C$, or that it supports $C$. A closed convex set can
be written as the intersection of its supporting hyperplanes. If the number of
hyperplanes needed to describe a convex set is finite, then the convex set is
called a polyhedron.
Cones
A cone is a set $C \subseteq \mathbb{R}^n$ with the following property. If
$x \in C$ and $\lambda \ge 0$, then $\lambda x \in C$. A cone is pointed if it
has an extreme point, in which case this extreme point is called the apex.
Given $m$ points $x_1, \dots, x_m \in \mathbb{R}^n$ and
$\lambda_1, \dots, \lambda_m \ge 0$, the point $\sum_{i=1}^m \lambda_i x_i$ is
said to be a conic combination of the points $x_1, \dots, x_m$. In the context
of cones, the extreme rays play the role of extreme points. A ray is a set of
the form $\{\lambda x : \lambda \ge 0\}$ and we call it the ray generated by
$x$. If $C$ is a cone and $x \in C$, the ray generated by $x$ is contained
in $C$. We say that the ray generated by $x \in C$ is an extreme ray if $x$
cannot be written as a conic combination of other points of $C$. Note that this
is the same as saying that neither $x$ nor any positive scaling of it can be
written as a conic combination of other points of $C$. We say that a set
$K \subseteq \mathbb{R}^n$ is a translated cone if there exist a cone $C$ and
$x \in \mathbb{R}^n$ such that $K = C + x$. A
cone in $\mathbb{R}^n$ is said to be simplicial if it has exactly $n$ extreme rays.
Every unbounded convex set contains a (translated) cone. The recession
cone of a convex set $C$, denoted by $\operatorname{rec}(C)$, is the largest
cone $K$ such that $C + K = C$. In other words, $\operatorname{rec}(C)$ is the
largest cone that can be translated to be completely contained in $C$. It is
possible that a direction $d$ and its opposite, $-d$, are both in the recession
cone of $C$. The set of all such directions, that is,
$\operatorname{rec}(C) \cap \operatorname{rec}(-C)$, is called the lineality
space of $C$ and is denoted by $\operatorname{lin}(C)$.
It is the largest subspace, $L$, such that $L + C = C$. Note that a convex cone
is pointed if and only if its lineality space is $\{0\}$.
Convex functions
Let $g : X \subseteq \mathbb{R}^n \to \mathbb{R}$ be a function. The epigraph
of $g$ is the set of all points above the graph,
$\operatorname{epi} g = \{(x, z) \in \mathbb{R}^{n+1} : z \ge g(x)\}$.
We say that $g$ is convex in $C \subseteq X$ if $C$ is convex and for every
$x, y \in C$ and $\lambda \in [0, 1]$,
$g(\lambda x + (1 - \lambda) y) \le \lambda g(x) + (1 - \lambda) g(y)$.
Equivalently, $g$ is convex if its epigraph is convex. We say that $g$ is
concave when $-g$ is convex and every concept we define for convex functions
has its counterpart for concave functions.
When $g$ is differentiable and convex in $C$ we have that
$g(y) + \nabla g(y)^T (x - y) \le g(x)$ for every $x, y \in C$. For a given
$y$, this inequality means that the tangent hyperplane at $y$ of the graph of
$g$, $g(y) + \nabla g(y)^T (x - y)$, is always below the function.
Equivalently, it means that the epigraph of
$x \mapsto g(y) + \nabla g(y)^T (x - y)$ is a valid inequality for the epigraph
of $g$. Actually, since the inequality is tight when $x = y$, the inequality
supports $\operatorname{epi} g$. In general, convex functions do not need to be
differentiable; however, the epigraph is still convex and it still has
supporting hyperplanes. A subgradient of a convex function is
the normal of a supporting hyperplane, when the inequality is written in a
similar form to the differentiable case. Specifically, a vector $v$ is a
subgradient of $g$ at $y$ if $g(y) + v^T (x - y) \le g(x)$ for every
$x \in C$. The set of all subgradients of $g$ at $y$ is called the
subdifferential of $g$ at $y$ and it is denoted by $\partial g(y)$. Thus,
\[ \partial g(y) = \{v \in \mathbb{R}^n : g(y) + v^T (x - y) \le g(x)\ \forall x \in C\}. \]
For example, $g(x) = |x|$ is convex, not differentiable at 0, and
$\partial g(0) = [-1, 1]$.
A function $g$ is positively homogeneous if $g(\lambda x) = \lambda g(x)$ for
every $\lambda \ge 0$ and all $x$. A function $g$ is subadditive if
$g(x + y) \le g(x) + g(y)$. A function
is sublinear if it is positively homogeneous and subadditive. Equivalently,
$g$ is sublinear if it is positively homogeneous and convex. The epigraph of a
sublinear function from $\mathbb{R}^n$ to $\mathbb{R}$ is a closed convex cone.
We say that a convex set $C$ is represented by a sublinear function $g$ if
$C = \{x : g(x) \le 1\}$.
Given a convex function $g : C \to \mathbb{R}$, $g(x) \le 0$ is called a convex
constraint. We have that for any $\bar{x} \in C$ and
$v \in \partial g(\bar{x})$,
\[ g(\bar{x}) + v^T (x - \bar{x}) \le 0 \tag{1.2} \]
is a valid inequality for $g(x) \le 0$. Thus, if $\bar{x} \in C$ violates the
convex constraint, that is, $g(\bar{x}) > 0$, then (1.2) separates $\bar{x}$
from $g(x) \le 0$. To see this, recall that
$g(\bar{x}) + v^T (x - \bar{x}) \le g(x)$ for every $x \in C$. In particular,
if $x$ satisfies the constraint, then
$g(\bar{x}) + v^T (x - \bar{x}) \le g(x) \le 0$, which shows the validity of
(1.2). Evaluating (1.2) at $\bar{x}$ yields $g(\bar{x}) \le 0$, from which we
conclude that $\bar{x}$ does not satisfy (1.2). We call such inequalities
gradient cutting planes, or gradient cuts for short, because when $g$ is
differentiable $v$ can only be the gradient $\nabla g(\bar{x})$.
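As an illustration of (1.2), the sketch below builds a gradient cut in the form $\alpha^T x \le \beta$ for a differentiable convex constraint. It is our own example, not code from the thesis: the function `gradient_cut` and the sample constraint $x_1^2 + x_2^2 - 1 \le 0$ are assumptions chosen for illustration.

```python
def gradient_cut(g, grad_g, x_bar):
    """Return (alpha, beta) such that alpha^T x <= beta is the inequality
    g(x_bar) + grad_g(x_bar)^T (x - x_bar) <= 0 rearranged."""
    alpha = grad_g(x_bar)
    beta = sum(a * xi for a, xi in zip(alpha, x_bar)) - g(x_bar)
    return alpha, beta


def g_disk(x):
    """Sample convex constraint g(x) <= 0 describing the unit disk."""
    return x[0] ** 2 + x[1] ** 2 - 1


def grad_disk(x):
    return [2 * x[0], 2 * x[1]]


def lhs(alpha, x):
    return sum(a * xi for a, xi in zip(alpha, x))


x_bar = [1.0, 1.0]                       # violates the constraint: g = 1 > 0
alpha, beta = gradient_cut(g_disk, grad_disk, x_bar)
print(alpha, beta)                       # the cut 2 x1 + 2 x2 <= 3
print(lhs(alpha, x_bar) > beta)          # x_bar is cut off
print(lhs(alpha, [1.0, 0.0]) <= beta)    # a feasible point is kept
```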
If $g : X \subseteq \mathbb{R}^n \to \mathbb{R}$ is a function and
$C \subseteq X$ is convex, then we denote by $g^{\text{vex}}_C$ a convex
underestimator of $g$ over $C$. This means that
$g^{\text{vex}}_C : C \to \mathbb{R}$ is a convex function and underestimates
$g$ on $C$, that is, $g^{\text{vex}}_C(x) \le g(x)$ for all
$x \in C$. Similarly, we define a concave overestimator.
Matrices
A matrix $M \in \mathbb{R}^{n \times n}$ is symmetric if $M = M^T$. We say that
a symmetric matrix $M$ is positive semi-definite if $x^T M x \ge 0$ for every
$x \in \mathbb{R}^n$. Given an integer $n$, we denote by $S^n_+$ the cone of
positive semi-definite matrices of size $n \times n$. A matrix $M$ is
copositive if $x^T M x \ge 0$ for every $x \in \mathbb{R}^n_+$. A
$k \times k$ submatrix of a matrix $M$ is a matrix formed by deleting all but
$k$ columns and $k$ rows of $M$. The rank of a matrix $M$ is the number of
linearly independent columns, which is the same as the number of linearly
independent rows, and we denote it by $\operatorname{rk} M$.
General notation
Given an interval $I \subseteq \mathbb{R}$ and an arbitrary set
$A \subseteq \mathbb{R}^n$ we denote by $IA$ the set
$\{\lambda x : \lambda \in I, x \in A\}$. Likewise, for $x \in \mathbb{R}^n$,
$Ix := \{\lambda x : \lambda \in I\}$.
Given $n \in \mathbb{N}$, we write $[n] = \{1, \dots, n\}$. If $A$ and $B$ are
sets and $A$ is finite, we denote by $B^A$ the set $B^{|A|}$, where $|A|$ is
the cardinality of $A$.
1.2 Intersection Cuts
Intersection cuts are the topic of chapters 4 and 5. In this section, we give a
brief introduction to intersection cuts.
The history of intersection cuts and $S$-free sets dates back to the 1960s.
They were originally introduced in the nonlinear setting by Tuy (1964) for the
problem of minimizing a concave function over a polytope. Later on, they were
introduced in integer programming by Balas (1971) and have been studied
extensively since. The more modern form of intersection cuts deduced from an
arbitrary convex $S$-free set is due to Glover (1973), although the term
$S$-free was coined by Dey and Wolsey (2010).
We illustrate the idea with the following integer program
\[ \max\{-12x + 5y : x + 4y \le 17,\ -4x + y \le -3,\ 5x - 6y \le 1,\ x, y \in \mathbb{Z}\}, \tag{1.3} \]
depicted in Figure 1.1. The LP relaxation solution is
$\bar{x} = (\tfrac{29}{17}, \tfrac{65}{17})$. The nearest feasible point is at a
distance of $\sqrt{13/17}$, and so there is no feasible point in the interior of
the ball centered at $\bar{x}$ of radius $\sqrt{13/17}$. If
$S = \{(x, y) \in \mathbb{Z}^2 : x + 4y \le 17, -4x + y \le -3, 5x - 6y \le 1\}$,
then this ball is an $S$-free set.
The LP solution is the apex of a cone whose extreme rays are the edges of
the polyhedron adjacent to the LP solution. Now, consider the points where
the extreme rays of the cone intersect the ball and build the hyperplane (in
Figure 1.1: The left plot shows the integer points in black, the LP relaxation
of (1.3) in blue, and the optimal LP solution in red. The middle plot highlights
the ball centered at the optimal LP solution of radius equal to the distance
between the optimal LP solution and the nearest feasible point in orange. It
also shows the extreme rays of the conic relaxation starting at the optimal
LP solution in green. The right plot shows the intersection points of the ball
with the cone in green, the intersection cut in gray, and the region cut off by
the cut also in gray.
this case just a line) that goes through those points. This hyperplane defines
a valid inequality that separates the LP solution from $S$. The reason why
it is valid is that the region of the LP relaxation cut off by the inequality
is completely contained inside the ball. This happens because the ball is a
convex set. As the ball does not contain any feasible point in its interior, the
cut must be valid. Such a cutting plane is an intersection cut.
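The geometric construction for (1.3) can be reproduced numerically. The sketch below is an illustration under two assumptions: the LP solution $(\tfrac{29}{17}, \tfrac{65}{17})$ is taken from the text rather than computed, and the two ray directions are hard-coded from the constraints $x + 4y \le 17$ and $-4x + y \le -3$ that are active at that point. All function and variable names are hypothetical.

```python
import math

# Constraints a^T (x, y) <= b of the LP relaxation of (1.3).
A = [(1, 4), (-4, 1), (5, -6)]
b = [17, -3, 1]
x_lp = (29 / 17, 65 / 17)            # LP optimum as stated in the text

def is_feasible(p):
    return all(a[0] * p[0] + a[1] * p[1] <= rhs for a, rhs in zip(A, b))

# Integer feasible points; the box [-10, 10]^2 contains the (bounded) LP region.
S = [(x, y) for x in range(-10, 11) for y in range(-10, 11) if is_feasible((x, y))]

# Radius of the S-free ball: distance from the LP optimum to the nearest point of S.
radius = min(math.dist(x_lp, p) for p in S)

# Unit directions of the two edges leaving the LP optimum (assumed, see lead-in).
norm = math.sqrt(17)
rays = [(4 / norm, -1 / norm), (-1 / norm, -4 / norm)]

# Points where the rays leave the ball, and the line through them.
p1, p2 = [(x_lp[0] + radius * r[0], x_lp[1] + radius * r[1]) for r in rays]
normal = (p2[1] - p1[1], p1[0] - p2[0])
beta = normal[0] * p1[0] + normal[1] * p1[1]
# Orient the inequality normal^T x >= beta so that the LP optimum violates it.
if normal[0] * x_lp[0] + normal[1] * x_lp[1] > beta:
    normal, beta = (-normal[0], -normal[1]), -beta

def cut_value(p):
    """Non-negative exactly when p satisfies the intersection cut."""
    return normal[0] * p[0] + normal[1] * p[1] - beta
```

Up to scaling, the resulting inequality is $3x - 5y \ge \sqrt{13} - 14$; it is violated by the LP solution and satisfied by the four integer feasible points.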
In general, there are three ingredients for the construction of intersection
cuts. First, the set $S$ of feasible points (or a relaxation of it). Second, a
simplicial cone that contains the feasible region and whose apex is the LP
solution (or the point to separate). Third, an $S$-free set $C$ that contains
the LP solution in its interior. We ask for the cone to be simplicial so that
the intersection of its extreme rays with the boundary of $C$ defines a unique
hyperplane.
Note that the larger the $S$-free set, the better the intersection cut. The
intuition is that if $K$ and $C$ are $S$-free and $K$ is larger than $C$, then
the intersection of an extreme ray of the cone with the boundary of $K$ will be
farther away, and thus the cut will be deeper. This is illustrated in Figure 1.2
where we compare the cut obtained in the above example with the intersection cut
deduced by using as $S$-free set the largest ball centered at the LP solution
that does not include any integer point in its interior.
How can we build a simplicial cone whose apex is the LP solution and
Figure 1.2: The left plot shows the intersection cut for (1.3) obtained above.
The right plot shows the intersection cut obtained from the $S$-free set given
by a $\mathbb{Z}^2$-free ball.
that contains the whole feasible region? Luckily, such a cone appears quite
naturally when we solve the LP using the simplex algorithm. Consider a linear
program $\max\{c^T x : Ax \le b\}$. The simplex algorithm starts at a vertex of
$Ax \le b$ and iteratively moves to a neighbor vertex with better objective
value if there is one. If there is none, then the vertex is optimal. A vertex is
a feasible point defined by the intersection of $n$ independent hyperplanes
among the $m$ ones in $Ax \le b$. Ignoring all but the $n$ constraints that
define a vertex yields a simplicial cone whose apex is the vertex and which
contains the whole LP, see the middle plot in Figure 1.1. When an optimal
solution is obtained, one can find out the $n$ constraints that the simplex
algorithm considered in order to define the solution. Therefore, intersection
cuts are readily available in LP-based branch-and-bound algorithms if we are
able to construct an $S$-free set that contains the LP solution in its interior.
We will now present a more algebraic deduction of intersection cuts whose
advantage is that it admits a generalization of intersection cuts. As it turns
out, this generalization is only relevant when the $S$-free set is unbounded. We
will also give a geometric characterization of the generalization and show that
in this case it no longer holds that larger $S$-free sets yield better cuts.
The simplex algorithm is usually presented using the so-called standard
form of an LP, namely, $\max\{c^T x : Ax = b, x \ge 0\}$. The advantage is
that the algebraic description of the algorithm is simpler, but the geometric
intuition is certainly obfuscated. But the story is the same. We have $n$
variables and $m + n$ constraints, $m$ from $Ax = b$ and $n$ from $x \ge 0$.
Since $m$ of these constraints are equalities, we simply need $n - m$ more to
define a point, assuming, as we are, that the equality constraints are linearly
independent. These $n - m$ can only come from $x \ge 0$. Thus, any vertex will
have $n - m$ variables fixed to 0 and the others will be the unique solution to
the remaining system of equations. As above, not every selection of $n - m$
constraints from $x \ge 0$ yields a vertex, but some do. In particular, if a
selection does, then the matrix describing the remaining system is invertible.
That is, the columns of $A$ associated to the $m$ variables not fixed to 0 after
setting $n - m$ constraints from $x \ge 0$ to equality are linearly independent.
These variables are called basic variables, their indices are called a basis,
and the remaining variables are called non-basic.
Let $B$ be a basis and let $N$ be the indices of the non-basic variables. We
can partition the system $Ax = b$ into basic and non-basic variables. For this
we introduce the following notation: if $I \subseteq \{1, \ldots, n\}$, then
$A_I$ represents the columns of $A$ indexed by $I$, while $x_I$ is the subvector
of variables indexed by $I$. Then $Ax = b$ is equivalent to
$A_B x_B + A_N x_N = b$. From the above discussion $A_B$ is an invertible
matrix, thus $Ax = b$ is equivalent to
\[ x_B = A_B^{-1} b - A_B^{-1} A_N x_N. \]
This is the so-called tableau.¹ There is a lot of important information in the
tableau. In particular, the apex of the simplicial cone is
$(x_B, x_N) = (A_B^{-1} b, 0)$, while its extreme rays are
$(x_B, x_N) = (-A_B^{-1} A_N e_j, e_j)$ for $j \in N$. Note that although
$x \in \mathbb{R}^n$, the feasible points are in an $(n - m)$-dimensional space,
assuming $A$ has full rank. So the cone is actually simplicial only in the
solution space, as it has $n - m$ rays. Thus, it gets a bit more complicated to
picture this, but the beauty is that we can deduce the intersection cuts
directly from the tableau.
¹ The tableau also has a row with the objective function, but we omit it as it
is not relevant for our current discussion.

Consider an optimization problem $P$ and assume that the tableau of an
LP relaxation of it is $x = f + Rs$, where $x$ are the basic and $s$ the
non-basic variables. Let $S$ be a closed set such that for every feasible
solution $(x, s)$ of $P$, it holds that $x \in S$. Furthermore, assume that
$f \notin S$, that is, the optimal LP solution $(f, 0)$ is not feasible. Let $C$
be an $S$-free set such that $f \in \operatorname{int} C$.
Let us assume that $C$ is given by $C = \{x : \varphi(x - f) \le 1\}$, where
$\varphi$ is sublinear. Now, any $s \ge 0$ defines an $x = f + Rs$ and
$\varphi(x - f) = \varphi(Rs)$. Thus, as long as $\varphi(Rs) < 1$, $x$ lies in
the interior of $C$ and $x$ itself cannot be feasible. We conclude that if
$(x, s)$ is to be feasible, then $\varphi(Rs) \ge 1$, that is,
$\varphi(Rs) \ge 1$ is a valid (nonlinear) inequality. To make it linear, we use
the sublinearity of $\varphi$ and the non-negativity of the variables. Indeed,
\[ 1 \le \varphi(Rs) = \varphi\Big(\sum_j R_j s_j\Big) \le \sum_j \varphi(R_j s_j) = \sum_j \varphi(R_j) s_j, \]
where the second inequality follows from the subadditivity of $\varphi$ and the
last equality follows from the positive homogeneity of $\varphi$ and the
non-negativity of $s$. Such a function $\varphi$ is also called a cut generating
function, since evaluating it at the given rays is sufficient to obtain the
cut's coefficients.
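The algebraic derivation can be checked on example (1.3). The sketch below is an illustration under stated assumptions: the basic variables are $x$ and $y$, the non-basic ones are the slacks $s_1, s_2$ of the two constraints active at the LP solution, the $S$-free set is the ball of radius $\sqrt{13/17}$ from the text, and $\varphi(r) = \lVert r \rVert / \rho$ is its gauge after translation. The integer feasible points are hard-coded and the helper names are hypothetical.

```python
from fractions import Fraction as F
import math

# Rows active at the LP optimum of (1.3), with slacks s1, s2 >= 0:
#     x + 4y + s1 = 17,    -4x + y + s2 = -3.
B = [[F(1), F(4)], [F(-4), F(1)]]      # columns of the basic variables (x, y)
rhs = [F(17), F(-3)]

det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
B_inv = [[B[1][1] / det, -B[0][1] / det],
         [-B[1][0] / det, B[0][0] / det]]

# Tableau (x, y) = f + R s. The slack columns form the identity, so R = -B^{-1}.
f = [B_inv[i][0] * rhs[0] + B_inv[i][1] * rhs[1] for i in range(2)]
R = [[-B_inv[i][j] for j in range(2)] for i in range(2)]   # column j is the ray of s_j

# Cut generating function: gauge of the ball of radius rho centered at f, shifted to 0.
rho = math.sqrt(13 / 17)

def phi(r):
    return math.hypot(float(r[0]), float(r[1])) / rho

coefficients = [phi((R[0][j], R[1][j])) for j in range(2)]

def cut_lhs(x, y):
    """Left-hand side of sum_j phi(R_j) s_j >= 1 evaluated at a point (x, y)."""
    s1 = 17 - x - 4 * y
    s2 = -3 + 4 * x - y
    return coefficients[0] * s1 + coefficients[1] * s2

integer_points = [(1, 1), (2, 2), (2, 3), (3, 3)]   # feasible points of (1.3)
```

Both coefficients equal $1/\sqrt{13}$, so the cut is $s_1 + s_2 \ge \sqrt{13}$; at the LP solution both slacks vanish and the cut is violated.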
When $\varphi$ is the gauge of $C - f$, then the cut above corresponds to the
intersection cut described geometrically earlier. Indeed, the points
$s^i = \frac{1}{\varphi(R_i)} e_i$, assuming $\varphi(R_i) > 0$, satisfy the
inequality $\sum_j \varphi(R_j) s_j \ge 1$ with equality. These points define
$x^i = f + \frac{1}{\varphi(R_i)} R_i$ and satisfy $\varphi(x^i - f) = 1$. This
means that all $x^i$ are on the boundary of $C$. In other words, the hyperplane
$\sum_j \varphi(R_j) s_j = 1$ passes through the $n - m$ points $(x^i, s^i)$,
which correspond to the intersection of the $n - m$ rays $(R_i, e_i)$ with the
boundary of the $(S \times \mathbb{R}^{n-m})$-free set
$C \times \mathbb{R}^{n-m}$.
As mentioned before, note that the LP is $Ax = b, x \ge 0$ so, even though
$x \in \mathbb{R}^n$ and we need $n$ points to define a hyperplane, the feasible
region lives in the affine subspace $Ax = b$. Therefore, we are working on
$\mathbb{R}^{n-m}$ embedded in $\mathbb{R}^n$ and only $n - m$ points define a
unique hyperplane in the space that we are working on.
A sublinear function other than the gauge, if it exists, will yield better cut
coefficients and, thus, a better cut. As it turns out, if
$C = \{x : \varphi(x - f) \le 1\}$ for some sublinear function $\varphi$ and
$f + R_i \mathbb{R}_+$ is a ray whose direction $R_i$ is not in the interior of
the recession cone of $C$, then $\varphi(R_i)$ is equal to the gauge of $C - f$
at $R_i$. That is, the only way of improving on a coefficient is that
$R_i \in \operatorname{int} \operatorname{rec}(C)$. In other words, the
possibility of improving the cut coefficients can only occur when $C$ is
unbounded and, furthermore, when a ray of the simplicial cone is in the interior
of the recession cone of $C$. Note that when this occurs, the gauge of $C - f$
at $R_i$ is 0 and, if an improvement is possible, then the coefficient must be
negative. A negative coefficient can never be achieved with the gauge as the
gauge is always non-negative.
This phenomenon was first observed by Glover (1974). Glover interpreted
the negative coefficient as moving in the negative direction of the ray instead
of the positive one.
Here we provide an interpretation of the negative edge extension. Consider
the following set
$S = \{(x, y) \in \mathbb{R}^2_+ : x - y \ge 2 \vee x - 5y \ge 1\}$, see
Figure 1.3. Clearly, a maximal $S$-free set is
$C = \{(x, y) \in \mathbb{R}^2 : x - y \le 2, x - 5y \le 1\}$.
The cone with apex 0 and rays $e_1$ and $e_2$ is simplicial and contains the
whole feasible region, so we use it to generate the intersection cut. The
intersection
Figure 1.3: The left plot shows the set $S$ in blue. The middle plot shows the
set $S$ in blue and $C$ in orange with the intersection cut obtained by the
gauge. The right plot shows $S$, $C$ and the cut obtained with $\varphi$.
cut obtained from the simplicial cone and $C$ is $x \ge 1$. Indeed, the gauge
of $C$, $\varphi_C$, satisfies $\varphi_C(e_1) = 1$, since $e_1$ lies on the
boundary of $C$, and $\varphi_C(e_2) = 0$, as $\lambda e_2 \in C$ for every
$\lambda \ge 0$. As it turns out,
$C = \{(x, y) \in \mathbb{R}^2 : \varphi(x, y) \le 1\}$ for
$\varphi(x, y) = \max\{\frac{x - y}{2}, x - 5y\}$. Note that
$\varphi(e_2) = \max\{-\frac{1}{2}, -5\} = -\frac{1}{2}$. Thus, $\varphi$ is not
the gauge and, more importantly, the cut $x - \frac{1}{2} y \ge 1$ is valid.
The interpretation of the coefficients of the intersection cut obtained by
the gauge is as follows. If we move along the ray $e_1$, then we hit the
boundary of $C$ at $1 e_1$, thus the cut coefficient is $\frac{1}{1}$. Instead,
if we move along $e_2$, then we “hit” the boundary of $C$ at “$\infty e_2$” and
the cut coefficient is $\frac{1}{\infty} = 0$.
However, we can actually tilt this cut to make it stronger. How much
can we tilt it? Well, we can tilt as long as the cut-off region is inside $C$.
The tilted cut intersects the $y$ axis at some negative point. The higher the
point the stronger the cut, see Figure 1.4. The coefficient of the intersection
cut obtained by the sublinear function $\varphi$ corresponds to the tilting
whose intersection with the $y$ axis is the lowest point at which a supporting
valid inequality for $C$ intersects the $y$ axis. In this case, such a point is
$(0, -2)$ and so the cut coefficient is $-\frac{1}{2}$.
Something looks off, though: the cut is not the best possible. How can
we achieve a better cut? Consider the smaller $S$-free set
$K = \{(x, y) \in \mathbb{R}^2 : x - y \le 1, x - 5y \le 1\}$. We have that
$K = \{(x, y) \in \mathbb{R}^2 : \psi(x, y) \le 1\}$, where
$\psi(x, y) = \max\{x - y, x - 5y\}$. Now the intersection cut is
$x - y \ge 1$ and it cannot be strengthened anymore as it defines a facet of
$\operatorname{conv}(S)$.
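The coefficients discussed in this example can be recomputed directly. The following sketch evaluates the gauge of $C$ and the sublinear functions $\varphi$ and $\psi$ at the rays $e_1$ and $e_2$, and checks the three resulting cuts on a sample of $S$. It relies on the fact that, for a sublinear $\varphi$ with $\varphi(0) < 1$, the gauge of $\{z : \varphi(z) \le 1\}$ is $\max\{\varphi, 0\}$; the grid and all names are assumptions of the example.

```python
def phi(x, y):
    """Sublinear representation of C = {x - y <= 2, x - 5y <= 1}."""
    return max((x - y) / 2, x - 5 * y)

def psi(x, y):
    """Sublinear representation of K = {x - y <= 1, x - 5y <= 1}."""
    return max(x - y, x - 5 * y)

def gauge_C(x, y):
    """Gauge of C; it coincides with max(phi, 0) because phi is sublinear."""
    return max(phi(x, y), 0.0)

e1, e2 = (1.0, 0.0), (0.0, 1.0)
gauge_coeffs = (gauge_C(*e1), gauge_C(*e2))   # cut: 1*x + 0*y >= 1
phi_coeffs = (phi(*e1), phi(*e2))             # cut: x - y/2 >= 1
psi_coeffs = (psi(*e1), psi(*e2))             # cut: x - y   >= 1

def in_S(x, y):
    return x >= 0 and y >= 0 and (x - y >= 2 or x - 5 * y >= 1)

# Sample of S on a grid with step 1/4 (exactly representable in floating point).
sample = [(i / 4, j / 4) for i in range(0, 41) for j in range(0, 41) if in_S(i / 4, j / 4)]

def satisfies(coeffs, p):
    return coeffs[0] * p[0] + coeffs[1] * p[1] >= 1

all_cuts_valid_on_sample = all(
    satisfies(c, p) for c in (gauge_coeffs, phi_coeffs, psi_coeffs) for p in sample
)

# A point outside S that the cut from phi keeps but the cut from psi removes.
witness = (1.5, 0.6)
```

The witness point shows that the cut obtained from the smaller set $K$ is strictly stronger than the one obtained from $C$ with $\varphi$.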
What happened? By moving the facet $x - y \le 2$ of $C$ to the left until
$x - y \le 1$, we did not change the intersection point of the ray $e_1$.
However, we did make the lowest point at which a valid inequality for $K$
intersects the $y$ axis higher, thus the cut is stronger. For an illustration
see Figure 1.5.

Figure 1.4: The plot shows the set $S$ in blue and the set $C$ in orange. We see
the intersection cut obtained with the gauge (dashed), a better tilted cut that
intersects the $y$ axis at $-2.5$ (green), and the intersection cut obtained
with $\varphi(x, y)$ (red). The higher the intersection with the $y$ axis, the
better the cut. Also, the red cut intersects the $y$ axis at the lowest point
that a supporting valid inequality of $C$ intersects the $y$ axis. Supporting
valid inequalities of $C$ intersect the $y$ axis between the black dot and the
red dot.

Figure 1.5: The left plot shows how shrinking the $S$-free set moves the lowest
intersection with the $y$ axis up. The right plot shows the final intersection
cut, which defines the closure of the convex hull of $S$.
The above is an example that larger $S$-free sets are not always better when
one builds intersection cuts with sublinear functions other than the gauge.
Let $C$ be an $S$-free set. When the ray actually intersects the boundary of
$C$, it is clear that if we extend $C$ in that direction, then the intersection
point is going to be farther away as we discussed above and illustrated in
Figure 1.2. However, the interpretation of the cut coefficient with a sublinear
function is a bit more involved and uses more global information. Indeed, making
$C$ larger in some direction will affect which inequalities are valid and so it
can have a (negative) effect on the cut coefficient for rays that are contained
inside $C$. This is what the above example illustrates.
We refer the reader to Conforti et al. (2011b) and Conforti et al. (2015)
for more details on intersection cuts.
1.3 Duality
In chapters 2 and 5, we mention and use Slater's condition, respectively. This
is a condition that ensures strong duality of convex problems. Here we give a
brief introduction to duality aiming at explaining Slater's condition from a
geometrical point of view.
Consider a linear program $\max\{c^T x : Ax \le b\}$. Suppose its optimal value
is $z$. This means that $c^T x \le z$ for every $x$ such that $Ax \le b$. In
fact, it is the tightest valid inequality for $Ax \le b$ with normal $c$. Thus,
instead of solving $\max\{c^T x : Ax \le b\}$ directly, one can try to find the
tightest valid inequality for $Ax \le b$ with normal $c$. Alternatively, one can
think of it as finding the best upper bound on the value that $c^T x$ can
achieve over $Ax \le b$. But how can we do this?
It should, of course, be possible to deduce the inequality $c^T x \le z$ just
from the information in $Ax \le b$. For example, consider
$\max\{3x + y : 4x - y \le 2, -x + 3y \le 5\}$. The optimal solution is obtained
at $(\bar{x}, \bar{y}) = (1, 2)$ and has a solution value of 5. Thus, the
inequality $3x + y \le 5$ is valid for
$\{(x, y) : 4x - y \le 2, -x + 3y \le 5\}$. Indeed, we can deduce it from
$4x - y \le 2$ and $-x + 3y \le 5$ by multiplying the first inequality by 10,
the second one by 7, and then adding them up. This yields $33x + 11y \le 55$,
which is the same as $3x + y \le 5$.
It is a fundamental result in linear programming, called Farkas' lemma,
that if $Ax \le b$ is non-empty, then every valid inequality can be deduced
by considering a conic combination of the constraints (Ziegler, 1995). Why
the non-empty assumption? The problem is that every inequality is valid
when $Ax \le b$ is empty, but to be able to write every inequality as a conic
combination of $Ax \le b$ one needs enough inequalities, more than the ones
needed to describe an empty set. For example,
$\{(x, y) \in \mathbb{R}^2 : x \le 0, x \ge 1\}$ is clearly empty, thus the
inequality $y \le 0$ is valid. However, there is no way of building that
inequality by taking non-negative linear combinations of $x \le 0$ and
$-x \le -1$.
With Farkas' lemma we can write the problem of finding the tightest valid
inequality for $Ax \le b$ with normal $c$ as follows. Every valid inequality is
given by $\mu^T A x \le \mu^T b$ for some $\mu \ge 0$. The normal of the
inequality has to be $c$, thus we have the constraint $\mu^T A = c$, and it has
to be the tightest, that is, the right-hand side $\mu^T b$ has to be the
smallest. Thus, when $Ax \le b$ is feasible, we have
\[ \min\{\mu^T b : \mu^T A = c, \mu \ge 0\} = \max\{c^T x : Ax \le b\}. \]
The problem on the left-hand side is called the dual problem and the one on
the right-hand side, the primal.
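The small example $\max\{3x + y : 4x - y \le 2, -x + 3y \le 5\}$ used above gives a quick check of this equality. In the sketch below, the multipliers $\mu = (\tfrac{10}{11}, \tfrac{7}{11})$ are the weights 10 and 7 from the text rescaled so that $\mu^T A = c$; exact rational arithmetic avoids rounding issues. Variable names are hypothetical.

```python
from fractions import Fraction as F

A = [[4, -1], [-1, 3]]
b = [2, 5]
c = [3, 1]

x = [1, 2]                       # primal solution stated in the text
mu = [F(10, 11), F(7, 11)]       # weights 10 and 7 from the text, divided by 11

# Primal feasibility and objective value.
primal_feasible = all(
    sum(A[i][j] * x[j] for j in range(2)) <= b[i] for i in range(2)
)
primal_value = sum(c[j] * x[j] for j in range(2))

# Dual feasibility (mu >= 0 and mu^T A = c) and objective value mu^T b.
mu_A = [sum(mu[i] * A[i][j] for i in range(2)) for j in range(2)]
dual_feasible = all(m >= 0 for m in mu) and mu_A == c
dual_value = sum(mu[i] * b[i] for i in range(2))
```

Since a primal feasible point and a dual feasible point attain the same value, both are optimal for their respective problems.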
There are many ways of deducing the dual problem. A standard way is
through Lagrange duality. The idea is as follows. The problem
$\max\{c^T x : Ax \le b\}$ can be written as an unconstrained problem using
$I_{\mathbb{R}^m_-}$, the indicator function of $\mathbb{R}^m_-$,
\[ I_{\mathbb{R}^m_-}(y) = \begin{cases} 0, & \text{if } y \le 0 \\ +\infty, & \text{otherwise.} \end{cases} \]
We have
$\max\{c^T x : Ax \le b\} = \max_x c^T x - I_{\mathbb{R}^m_-}(Ax - b)$. The dual
tries to bound the optimal value. One way to find a bound is to find an
overestimator of the objective function. We have that
$I_{\mathbb{R}^m_-}(y) \ge \mu^T y$ for any $\mu \in \mathbb{R}^m_+$. Indeed,
if $y \not\le 0$, then the left-hand side is $+\infty$, so the inequality holds.
Otherwise, the left-hand side is 0, while the right one is non-positive, so the
inequality holds. Therefore, for any $\mu \ge 0$,
\[ \max\{c^T x : Ax \le b\} \le \sup_x c^T x - \mu^T (Ax - b). \]
We can now take the best $\mu \ge 0$ to get
\[ \max\{c^T x : Ax \le b\} \le \inf_{\mu \ge 0} \sup_x c^T x - \mu^T (Ax - b). \]
The function $L(x, \mu) = c^T x - \mu^T (Ax - b)$ is called the Lagrangian
function, $\theta(\mu) = \sup_x L(x, \mu)$ is the Lagrangian dual function, and
$\inf_{\mu \ge 0} \theta(\mu)$ is the (Lagrangian) dual problem of
$\max\{c^T x : Ax \le b\}$. We have that
\[ \theta(\mu) = \sup_x c^T x - \mu^T (Ax - b) = \sup_x (c - A^T \mu)^T x + \mu^T b = \begin{cases} \mu^T b, & \text{if } c - A^T \mu = 0 \\ +\infty, & \text{otherwise.} \end{cases} \]
Thus, the Lagrangian dual is
\[ \inf\{\mu^T b : A^T \mu = c, \mu \ge 0\}, \]
which is the same as the linear programming dual.
The advantage of Lagrangian duality is that the deduction of the dual
generalizes to other types of problems. For example, consider
$\max\{e^x : x^2 \le y, y \le 1\}$. The reasoning in the linear case was to find
valid inequalities that can be deduced from the constraints. Luckily, Farkas'
lemma tells us what these valid inequalities look like and so we could write an
optimization problem to find the tightest one. Here, it is not clear what the
valid inequalities actually look like. However, Lagrangian duality still yields
a dual.
The disadvantage, though, is that it will not be clear that the bound
provided by the Lagrangian dual is equal to the optimal value of the primal.
In fact, even if the primal is convex there can be a positive difference between
the optimal values of the primal and dual problems. We refer to the optimal
value of the primal as primal value and the optimal value of the dual as dual
value. When the primal and dual values coincide, we say that strong duality
holds. The difference between the primal and dual values is called duality gap.
To see that there are convex problems with positive duality gap, let us
compute the Lagrangian dual of $\max\{-e^{-x} : \sqrt{x^2 + y^2} \le y\}$. The
Lagrangian function is
$L(x, y, \mu) = -e^{-x} - \mu(\sqrt{x^2 + y^2} - y)$. The Lagrangian dual
function is $\theta(\mu) = \sup_{x,y} -e^{-x} - \mu(\sqrt{x^2 + y^2} - y)$. By
the Cauchy-Schwarz inequality, $y \le \sqrt{x^2 + y^2}$ for all
$x, y \in \mathbb{R}$, so $-\mu(\sqrt{x^2 + y^2} - y) \le 0$ for every
$(x, y) \in \mathbb{R}^2$ and $\mu \ge 0$. Thus,
$\theta(\mu) \le \sup_{x,y} -e^{-x} = 0$.
Let us show that actually $\theta(\mu) = 0$ for all $\mu \ge 0$. Notice that
\[ -e^{-x} - \mu(\sqrt{x^2 + y^2} - y) = -e^{-x} - \mu \frac{x^2}{\sqrt{x^2 + y^2} + y}. \]
Replacing $y$ by $e^x$ above and computing the limit as $x \to \infty$ we obtain
\[ \lim_{x \to \infty} -e^{-x} - \mu \frac{x^2}{\sqrt{x^2 + e^{2x}} + e^x} = 0. \]
Thus, $\theta(\mu) = 0$ for every $\mu \ge 0$.
However, the primal's feasible region is $\{0\} \times \mathbb{R}_+$ and its
optimal value is, thus, $-e^0 = -1$.
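The limit used in this argument can also be observed numerically. The sketch below evaluates the Lagrangian of $\max\{-e^{-x} : \sqrt{x^2 + y^2} \le y\}$ along the curve $y = e^x$, using the rewritten form with $x^2 / (\sqrt{x^2 + y^2} + y)$ to avoid cancellation. The sample points and the choice $\mu = 1$ are arbitrary assumptions for illustration.

```python
import math

def lagrangian_on_curve(x, mu):
    """L(x, e^x, mu) in the cancellation-free form used in the text."""
    y = math.exp(x)
    return -math.exp(-x) - mu * x ** 2 / (math.hypot(x, y) + y)

# Values of the Lagrangian along y = e^x for growing x and mu = 1.
values = [lagrangian_on_curve(x, mu=1.0) for x in (1.0, 5.0, 10.0, 20.0, 40.0)]

# Primal optimal value: the feasible region is {0} x R_+, so f = -e^0.
primal_value = -math.exp(0.0)
```

The values increase towards 0 while the primal value stays at $-1$, which is the duality gap of 1 described above.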
To understand why this could happen, let us interpret the dual from a
more geometric point of view. For this, let us abstract the problem a bit.
Consider $\max\{f(x) : g_i(x) \le 0\}$. The Lagrangian dual function is then
$\theta(\mu) = \sup_x f(x) - \sum_i \mu_i g_i(x)$. Thus, we have that
$f(x) - \sum_i \mu_i g_i(x) \le \theta(\mu)$ for every $x$. An enlightening way
of interpreting this inequality is to see it as a valid inequality of a set.
Indeed, the inequality is saying that $y_0 - \sum_i \mu_i y_i \le \theta(\mu)$
is valid for the set
$\Phi(\mathbb{R}^n) = \{(f(x), g_1(x), \ldots, g_m(x)) : x \in \mathbb{R}^n\}$,
where $\Phi(x) = (f(x), g_1(x), \ldots, g_m(x))$. Thus, we can interpret the
Lagrangian dual function as a function that, given $\mu \ge 0$, finds the best
right-hand side of a valid inequality with normal $(1, -\mu)$ for
$\Phi(\mathbb{R}^n)$. Then, the Lagrangian dual problem seeks the normal
$(1, -\mu)$ such that the valid inequality with that normal has the best
(smallest in this case) right-hand side.
So, why do we have a positive duality gap for
$\max\{-e^{-x} : \sqrt{x^2 + y^2} \le y\}$? To answer this question we need to
understand how $\Phi(\mathbb{R}^2)$ looks when
$\Phi(x, y) = (-e^{-x}, \sqrt{x^2 + y^2} - y)$. Figure 1.6 shows
$\Phi([-\tfrac{1}{2}, \tfrac{1}{2}] \times [-\tfrac{1}{2}, \tfrac{1}{2}])$ and
$\Phi([-\tfrac{1}{2}, 5] \times [-\tfrac{1}{2}, 150])$. One can prove that
$\Phi(\mathbb{R}^2) = ((-\infty, 0) \times (0, +\infty)) \cup \{(-1, 0)\}$.
From here we see that for every $\mu \ge 0$, the tightest valid inequality
for $\Phi(\mathbb{R}^2)$ with normal $(1, -\mu)$ is $y_0 - \mu y_1 \le 0$. In
other words, $\theta(\mu) = 0$ for every $\mu \ge 0$ as we saw above.
Figure 1.6: The left plot shows
$\Phi([-\tfrac{1}{2}, \tfrac{1}{2}] \times [-\tfrac{1}{2}, \tfrac{1}{2}])$ and
the right one shows $\Phi([-\tfrac{1}{2}, 5] \times [-\tfrac{1}{2}, 150])$,
where $\Phi(x, y) = (-e^{-x}, \sqrt{x^2 + y^2} - y)$. The axes are $y_0$ and
$y_1$.
When can we ensure that strong duality holds? Consider again
$\max\{f(x) : g_i(x) \le 0\}$ and let $p^*$ be the optimal value. Assume that
$f$ is concave and the $g_i$ are convex and notice that
$y_0 - \sum_i \mu_i y_i \le \theta$ is a valid inequality for
$\Phi(\mathbb{R}^n)$ with $\mu \ge 0$ if and only if it is valid for
$\Phi(\mathbb{R}^n) + (\mathbb{R}_- \times \mathbb{R}^m_+)$. The advantage of
$\Phi(\mathbb{R}^n) + (\mathbb{R}_- \times \mathbb{R}^m_+)$ over
$\Phi(\mathbb{R}^n)$ is that it is convex. Now, as $p^*$ is the optimal value,
it follows that there cannot be any feasible point $x$, that is, a point with
$g_i(x) \le 0$ for all $i$, such that $f(x) > p^*$. In other words,
\[ (\Phi(\mathbb{R}^n) + (\mathbb{R}_- \times \mathbb{R}^m_+)) \cap ((p^*, +\infty) \times \mathbb{R}^m_-) = \emptyset. \]
We illustrate $\Phi(\mathbb{R}^n) + (\mathbb{R}_- \times \mathbb{R}^m_+)$ and
$(p^*, +\infty) \times \mathbb{R}^m_-$ in Figure 1.7 for
$\max\{-e^{-x} : \sqrt{x^2 + y^2} \le y\}$.
Now, $\Phi(\mathbb{R}^n) + (\mathbb{R}_- \times \mathbb{R}^m_+)$ and
$(p^*, +\infty) \times \mathbb{R}^m_-$ are two convex sets which do not
intersect. Therefore, from separation theorems, we know that there must exist a
hyperplane separating both sets. For our current example, $y_1 = 0$ is the only
hyperplane that separates both sets, but remember that the dual tries to find a
hyperplane that has a nonzero coefficient for $y_0$ and contains
$\Phi(\mathbb{R}^n)$ on one side, thus, $y_1 = 0$ is not feasible for the dual
problem. So, how could we ensure that, first, such a hyperplane exists and,
second, it actually separates $\Phi(\mathbb{R}^n)$ from
$(p^*, +\infty) \times \mathbb{R}^m_-$? Note that the existence of such a
hyperplane is related to the feasibility of the dual problem, while the
separation of $\Phi(\mathbb{R}^n)$ from $(p^*, +\infty) \times \mathbb{R}^m_-$
ensures that the dual achieves the same value as the primal.
Figure 1.7: The set $\Phi(\mathbb{R}^2)$ is depicted in blue and
$(-1, \infty) \times \mathbb{R}_-$ in orange, where
$\Phi(x, y) = (-e^{-x}, \sqrt{x^2 + y^2} - y)$.
We will now see that if $\Phi(\mathbb{R}^n)$ intersects the interior of
$\mathbb{R} \times \mathbb{R}^m_-$, then the dual is feasible and its value is
equal to that of the primal. That is, if there exists an $x_0$ such that
$g_i(x_0) < 0$ for all $i \in [m]$, then strong duality holds. Indeed, such a
point forces every hyperplane separating $\Phi(\mathbb{R}^n)$ from
$(p^*, +\infty) \times \mathbb{R}^m_-$ to have a nonzero coefficient for $y_0$.
This should be fairly intuitive from the pictures. To see this algebraically,
let $\mu_0 y_0 - \sum_i \mu_i y_i \le \theta$ be a hyperplane that separates
$\Phi(\mathbb{R}^n)$ from $(p^*, +\infty) \times \mathbb{R}^m_-$. In particular,
$(\mu_0, \mu) \ne 0$ as otherwise $\mu_0 y_0 - \sum_i \mu_i y_i \le \theta$
would not be a hyperplane. As
$(f(x_0), g_1(x_0), \ldots, g_m(x_0)) \in \Phi(\mathbb{R}^n)$, it follows that
$\mu_0 f(x_0) - \sum_i \mu_i g_i(x_0) \le \theta$. As
$(p, 0) \in (p^*, +\infty) \times \mathbb{R}^m_-$ for every $p > p^*$, it
follows that $\theta \le \mu_0 p$ for every $p > p^*$, which implies that
$\theta \le \mu_0 p^*$. Thus,
$\mu_0 f(x_0) - \sum_i \mu_i g_i(x_0) \le \mu_0 p^*$.
Now, if $\mu_0 = 0$, then $-\sum_i \mu_i g_i(x_0) \le 0$, but $\mu \ge 0$ and
$g_i(x_0) < 0$ for all $i$, which can only hold if $\mu = 0$. However, this
contradicts $(\mu_0, \mu) \ne 0$. Therefore $\mu_0 > 0$ and we can normalize so
that $\mu_0 = 1$. This shows that the dual is feasible and that its value is
equal to the primal. Indeed, $f(x) - \sum_i \mu_i g_i(x) \le p^*$ for every $x$
implies that $\theta(\mu) \le p^*$, but by construction,
$\theta(\mu) \ge p^*$.
If there exists an $x_0$ such that $g_i(x_0) < 0$ for $i \in [m]$, then we say
that Slater's condition holds and $x_0$ is called a Slater point. Thus, we have
proven that if the primal is feasible and bounded, and Slater's condition holds,
then there is strong duality. The above result still holds when Slater's
condition is weakened to ask that there exists a point $x_0$ such that
$g_i(x_0) < 0$ for every $g_i$ that is non-linear, see (Rockafellar, 1970,
Theorem 28.2). The proof of such a result follows the same reasoning, but one
needs a slightly stronger separation theorem that exploits the polyhedrality of
$(p^*, +\infty) \times \mathbb{R}^m_-$, see (Rockafellar, 1970, Theorem 20.2).
More interpretations of duality along these lines can be found in Pourciau
(1980).
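Returning to the example $\max\{-e^{-x} : \sqrt{x^2 + y^2} \le y\}$ from this section, a short computation (added here as an illustration, not part of the original argument) shows why Slater's condition cannot help there: the single constraint function is non-negative everywhere, so no Slater point exists.

```latex
\[
  g(x, y) = \sqrt{x^2 + y^2} - y \;\ge\; |y| - y \;\ge\; 0
  \qquad \text{for all } (x, y) \in \mathbb{R}^2 ,
\]
\[
  \text{so there is no } (x_0, y_0) \text{ with } g(x_0, y_0) < 0,
  \qquad \text{and indeed } \inf_{\mu \ge 0} \theta(\mu) = 0 > -1 = p^* .
\]
```

This is consistent with the duality gap of 1 computed earlier for this problem.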
1.4 Monoidal Strengthening
In Chapter 4, we apply a modification of monoidal strengthening to intersection
cuts. In this section, we explain what monoidal strengthening is.
Monoidal strengthening is a technique introduced by Balas and Jeroslow (1980).
Our deduction of the monoidal strengthening technique applied to disjunctions is
novel and is inspired by Wiese (2016, Section 4.2.3) and several conversations
with Sven Wiese. We also present the general technique in the more modern
framework of $S$-free sets.
Before we start, a monoid is the discrete analog of a convex cone. A
monoid is a pair $(M, +)$ where $M$ is a set and $+ : M \times M \to M$ is such
that $+$ is associative and there exists $0 \in M$ such that $+(0, \cdot)$ is
the identity. The name monoidal strengthening comes from the use of a monoid to
strengthen cuts.
As we discussed in Section 1.2, a simple way of generating cutting planes
is through cut generating functions. In this setting, and for the rest of this
section, we will assume that we have the following relaxation of the feasible
region of our optimization problem
\[ \Big\{(x, y) \in \mathbb{R}^q_+ \times \mathbb{Z}^p_+ : \sum_i r_i x_i + \sum_j d_j y_j \in S\Big\}, \tag{1.4} \]
where $S \subseteq \mathbb{R}^n$ is a closed set such that $0 \notin S$ and
$r_i, d_j \in \mathbb{R}^n$. We also have a convex $S$-free set $C$ such that
$0 \in \operatorname{int} C$. The set $C$ is represented by a sublinear function
$\varphi$, that is,
\[ C = \{z \in \mathbb{R}^n : \varphi(z) \le 1\}. \]
The intersection cut generated by $\varphi$ that separates the point
$(x, y) = (0, 0)$ from (1.4) is
\[ \sum_i \varphi(r_i) x_i + \sum_j \varphi(d_j) y_j \ge 1. \]
Probably the most intuitive way of understanding monoidal strengthening
is to see it as a technique that takes a relaxation of the form
\[ \Big\{(x, y) \in \mathbb{R}^q_+ \times \mathbb{Z}^p_+ : \sum_i r_i x_i + \sum_j d_j y_j \in S\Big\}, \]
and builds new ones,
\[ \Big\{(x, y) \in \mathbb{R}^q_+ \times \mathbb{Z}^p_+ : \sum_i r'_i x_i + \sum_j d'_j y_j \in S'\Big\}. \]
Each of them can generate a cut that separates $(0, 0)$ and, of course, the idea
is to select a “best” one. The construction of new relaxations exploits the fact
that some variables are restricted to be integers and the structure of the set
$S$.
Let us see two examples before we present the general principle of monoidal
strengthening.
1.4.1 One Row Relaxations: Gomory Cuts
Assume $f \notin \mathbb{Z}$, $n = 1$, and that (1.4) is
\[ \sum_i r_i x_i + \sum_j d_j y_j \in S := \mathbb{Z} - f, \]
where $r_i, d_j \in \mathbb{R}$. As each $y_j \in \mathbb{Z}$, adding some
integer multiple of $y_j$ to the above relation does not change $S$. That is, if
$m_j \in \mathbb{Z}$, then
\[ \sum_i r_i x_i + \sum_j d_j y_j + \sum_j m_j y_j \in \mathbb{Z} + \sum_j m_j y_j - f = \mathbb{Z} - f. \]
Thus, if $C$ is a convex $S$-free set represented by $\varphi$, such that
$0 \in \operatorname{int} C$, then not only is
$\sum_i \varphi(r_i) x_i + \sum_j \varphi(d_j) y_j \ge 1$ a valid inequality,
but also
\[ \sum_i \varphi(r_i) x_i + \sum_j \varphi(d_j + m_j) y_j \ge 1 \quad \text{for every } m_j \in \mathbb{Z}. \]
Note that in this particular case, the only maximal $S$-free set that contains
0 is $C = [-f, 1 - f]$ and the only sublinear function $\varphi$ such that
$C = \{x \in \mathbb{R} : \varphi(x) \le 1\}$ is its gauge,
$\varphi(x) = \max\{\frac{x}{1 - f}, -\frac{x}{f}\}$. Using $\varphi$ and
finding the best $m_j$ for each $d_j$ yields the Gomory cut (Gomory, 1960). By
best here we mean the $m_j$ that makes $\varphi(d_j + m_j)$ the smallest.
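The choice of the best $m_j$ can be sketched in a few lines. The code below assumes $0 < f < 1$, so that $C = [-f, 1 - f]$ contains 0 in its interior, and compares a brute-force search over a window of integers with the closed form $\min\{\frac{\hat{d}}{1 - f}, \frac{1 - \hat{d}}{f}\}$, where $\hat{d} = d - \lfloor d \rfloor$. The closed form, the window size and the sample data are assumptions of this illustration.

```python
from fractions import Fraction as F
import math

def phi(x, f):
    """Gauge of C = [-f, 1 - f], assuming 0 < f < 1."""
    return max(x / (1 - f), -x / f)

def strengthened_coefficient(d, f, window=5):
    """Smallest phi(d + m) over the integers m in a window around -floor(d)."""
    base = -math.floor(d)
    return min(phi(d + m, f) for m in range(base - window, base + window + 1))

def closed_form(d, f):
    """Only the two shifts of d into [-1, 1) can be optimal."""
    frac = d - math.floor(d)
    return min(frac / (1 - f), (1 - frac) / f)

f = F(2, 5)            # hypothetical fractional part
d = F(7, 4)            # hypothetical coefficient of an integer variable

original = phi(d, f)                        # coefficient without strengthening
improved = strengthened_coefficient(d, f)   # coefficient after choosing the best m
```

For this data the coefficient drops from $\frac{35}{12}$ to $\frac{5}{8}$, attained at $m = -2$.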
1.4.2 Disjunctive Cuts
Let $Q$ be an index set and consider an optimization problem $P$ such that
\[ S = \Big\{(x, y) \in \mathbb{R}^q_+ \times \mathbb{Z}^p_+ : \bigvee_{k \in Q} a(k)^T x + d(k)^T y \ge 1\Big\} \]
is a valid disjunction, that is, every feasible solution of $P$ is in $S$. Here,
we denote the vectors as $a(k) \in \mathbb{R}^q$ and $d(k) \in \mathbb{R}^p$
instead of the more usual notation $a_k$ and $d_k$. As (1.4) we use
\[ \sum_j e_j x_j + \sum_j e_{j+q} y_j = (x, y) \in S. \]
Consider the $S$-free set
\[ C = \{(x, y) \in \mathbb{R}^{q+p} : a(k)^T x + d(k)^T y \le 1 \text{ for } k \in Q\}. \]
A sublinear function representing $C$, which may or may not be its gauge, is
\[ \varphi_C(x, y) = \max_{k \in Q} a(k)^T x + d(k)^T y. \tag{1.5} \]
Thus, we obtain the cut
\[ 1 \le \sum_j \varphi_C(e_j) x_j + \sum_j \varphi_C(e_{q+j}) y_j = \sum_j \Big(\max_{k \in Q} a(k)_j\Big) x_j + \sum_j \Big(\max_{k \in Q} d(k)_j\Big) y_j. \]
This cut is known as disjunctive cut (Balas, 1979) and the implication
\[ \bigvee_{k \in Q} \sum_j a(k)_j x_j + \sum_j d(k)_j y_j \ge 1 \implies \sum_j x_j \max_{k \in Q} a(k)_j + \sum_j y_j \max_{k \in Q} d(k)_j \ge 1 \]
is known as the maximum principle.
Monoidal strengthening in this setting amounts to finding a new disjunction
that every feasible point must satisfy. Balas and Jeroslow showed how to build
new disjunctions. For their construction we need that if each disjunctive term
is relaxed enough, then it is automatically satisfied. More formally, we need
that for each $k$, there is a $b_k$ such that every $(x, y) \in S$ satisfies
$a(k)^T x + d(k)^T y \ge b_k$. In other words, we need that the expression
$a(k)^T x + d(k)^T y$ is bounded from below in the feasible region of $P$.
For example, consider
$S = \{x \in \mathbb{R}^2_+ : \frac{x_1 + x_2}{3} \ge 1 \vee x_1 \ge 1\}$ and
assume that the feasible region of $P$ is $S$. Then,
$\frac{x_1 + x_2}{3} \ge \frac{1}{3}$ is a valid inequality for $S$. In other
words, if we relax the first disjunctive term $\frac{x_1 + x_2}{3} \ge 1$ by
$\frac{2}{3}$, then we obtain an inequality satisfied by every element of $S$.
Thus, $b_1 = \frac{1}{3}$. Similarly, $b_2 = 0$ is a lower bound for the second
disjunctive term that makes it valid for every element of $S$.
On the other hand, consider
$S = \{x \in \mathbb{R}^2_+ : \frac{x_1}{2} \ge 1 \vee x_1 - x_2 \ge 1\}$. While
$b_1 = \frac{1}{2}$ is a valid bound for $\frac{x_1}{2}$, there is no $b_2$ such
that $x_1 - x_2 \ge b_2$ is valid for $S$. One reason is that every point with
$x_1 = 2$, $x_2 \ge 0$ is in $S$ and so $x_1 - x_2$ is unbounded from below.
Given the lower bounds $b_k$ we have the following lemma which will allow
us to build new disjunctions. Notice that we can, and will, assume that
$b_k < 1$ as otherwise the disjunction is trivially satisfied.
Lemma 1.1. Every $(x, y) \in S$ satisfies the disjunction
\[ \bigvee_{k \in Q} a(k)^T x + d(k)^T y + (1 - b_k) z_k \ge 1 \tag{1.6} \]
whenever $z \in Z = \{z \in \mathbb{Z}^Q : z = 0 \vee \exists k, z_k \ge 1\}$.
Proof. If $z = 0$ there is nothing to prove. Let $z \ne 0$ and let $k_0 \in Q$
be such that $z_{k_0} \ge 1$. Then, the disjunction is satisfied because
$a(k_0)^T x + d(k_0)^T y + (1 - b_{k_0}) z_{k_0} \ge 1$ is a relaxation of
$a(k_0)^T x + d(k_0)^T y \ge b_{k_0}$, which is satisfied by hypothesis. To see
that it is a relaxation just notice that
$b_{k_0} \ge 1 - (1 - b_{k_0}) z_{k_0}$.
As written in the lemma above, this disjunction is not interesting due to
the fact that for any given $z \in Z$, (1.6) is either the original disjunction
or is redundant. However, by making $z$ depend on $y$, we obtain a non-trivial
new disjunction.
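Theorem 1.2 (Balas and Jeroslow (1980, Theorem 3)). Let
\[ \mathcal{M} := \{m \in \mathbb{Z}^Q : \sum_{k \in Q} m_k \ge 0\} \tag{1.7} \]
and consider $m(k) \in \mathbb{R}^p$ for $k \in Q$ such that $(m(k)_j)_{k \in Q} \in \mathcal{M}$ for all $j \in [p]$. Then,
\[ \bigvee_{k \in Q} a(k)^T x + (d(k) + (1 - b_k) m(k))^T y \ge 1 \tag{1.8} \]
is a valid disjunction for $(x, y) \in S$.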
Proof. Let $(\bar{x}, \bar{y}) \in S$ and let $z \in \mathbb{Z}^Q$ be defined by $z_k = m(k)^T \bar{y}$. Since
\[ \sum_k z_k = \sum_j \underbrace{\bar{y}_j}_{\in \mathbb{Z}_+} \underbrace{\sum_k m(k)_j}_{\ge 0} \ge 0 \]
we conclude that $z \in Z$.
On the other hand, note that (1.6) is equivalent to
\[ \bigvee_{k \in Q} a(k)^T x + d(k)^T y + (1 - b_k) m(k)^T \bar{y} \ge 1. \]
As $z \in Z$, Lemma 1.1 implies that the previous disjunction is valid for every $(x, y) \in S$, in particular, for $(\bar{x}, \bar{y})$. Evaluating the disjunction at $(\bar{x}, \bar{y})$ yields
\[ \bigvee_{k \in Q} a(k)^T \bar{x} + d(k)^T \bar{y} + (1 - b_k) m(k)^T \bar{y} \ge 1, \]
which is equivalent to evaluating (1.8) at $(\bar{x}, \bar{y})$. Thus, $(\bar{x}, \bar{y})$ satisfies (1.8). It follows that every $(x, y) \in S$ satisfies (1.8) as we wanted to show.
The theorem implies that each $Q$-tuple $M = (m(k) \in \mathbb{R}^p : k \in Q)$ such that $(m(k)_j)_{k \in Q} \in \mathcal{M}$ for all $j \in [p]$, yields a new valid disjunction, namely (1.8), which in turn yields a new $S$-free set
\[ C_M = \{(x, y) \in \mathbb{R}^{q+p} : \phi_{C_M}(x, y) \le 1\}, \]
where $\phi_{C_M}(x, y) = \max_{k \in Q} a(k)^T x + (d(k) + (1 - b_k) m(k))^T y$. Therefore,
\[ \sum_j \phi_{C_M}(e_j) x_j + \sum_j \phi_{C_M}(e_{q+j}) y_j \ge 1 \]
is a valid inequality for $S$. This inequality reads
\[ \sum_j \Big( \max_{k \in Q} a(k)_j \Big) x_j + \sum_j \Big( \max_{k \in Q} \big( d(k)_j + (1 - b_k) m(k)_j \big) \Big) y_j \ge 1. \]
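Choosing the best possible tuple $M$, that is, minimizing each coefficient of $y_j$ over the admissible vectors $m = (m(k)_j)_{k \in Q} \in \mathcal{M}$, yields
\[ \sum_j \Big( \max_{k \in Q} a(k)_j \Big) x_j + \sum_j \Big( \inf_{m \in \mathcal{M}} \max_{k \in Q} \big( d(k)_j + (1 - b_k) m_k \big) \Big) y_j \ge 1. \tag{1.9} \]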
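To make the coefficient of $y_j$ in (1.9) concrete, the following minimal sketch evaluates it by brute force for a single integer variable. The function name `strengthened_coefficient` and the truncation parameter `radius` are hypothetical and introduced only for illustration; restricting the search to a bounded box is an assumption of the sketch, not part of the theory.

```python
from itertools import product


def strengthened_coefficient(d, b, radius=5):
    """Approximate inf over m in M of max_k (d[k] + (1 - b[k]) * m[k]).

    M is the monoid of integer vectors with non-negative sum, see (1.7).
    Only vectors with entries in [-radius, radius] are enumerated; this
    truncation is an assumption made to keep the search finite.
    """
    best = max(d)  # m = 0 belongs to M and gives the unstrengthened value
    for m in product(range(-radius, radius + 1), repeat=len(d)):
        if sum(m) < 0:
            continue  # m is not in the monoid M
        value = max(dk + (1 - bk) * mk for dk, bk, mk in zip(d, b, m))
        best = min(best, value)
    return best


if __name__ == "__main__":
    d_j = (2.0, -1.0)  # d(1)_j and d(2)_j for one integer variable y_j
    b = (0.0, 0.0)     # lower bounds b_1, b_2 of the two disjunctive terms
    print("unstrengthened:", max(d_j))
    print("strengthened:  ", strengthened_coefficient(d_j, b))
```

Since the cut has the form "$\ge 1$" and $y_j \ge 0$, a smaller coefficient gives a stronger inequality, which is why the infimum over the monoid is taken.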
1.4.3 Monoidal Strengthening
The general principle of monoidal strengthening is as follows. Assume we have a monoid $M$ and an $(S + M)$-free set $C = \{(x, y) : \phi(x, y) \le 1\}$ where $\phi$ is sublinear. If $\{(x, y) \in \mathbb{R}^n_+ \times \mathbb{Z}^p_+ : \sum_i r_i x_i + \sum_j d_j y_j \in S + M\}$ is a valid relaxation, then not only is $\sum_i \phi(r_i) x_i + \sum_j \phi(d_j) y_j \ge 1$ valid, but also
\[ \sum_i \phi(r_i) x_i + \sum_j \inf_{m \in M} \phi(d_j + m) y_j \ge 1. \tag{1.10} \]
In particular, the previous cut is the strongest one that can be obtained with this technique.
The proof of validity follows from exploiting the integrality restrictions of $y$. Indeed, as $y$ is a non-negative integer, $\sum_j m_j y_j \in M$ for every $m_j \in M$. Every feasible solution satisfies $\sum_i r_i x_i + \sum_j d_j y_j \in S + M$ and so they also satisfy
\[ \sum_i r_i x_i + \sum_j (d_j + m_j) y_j \in S + M + \sum_j m_j y_j \subseteq S + M + M = S + M, \tag{1.11} \]
where the last equality holds because $M$ is a monoid, in particular, closed under addition. Then, applying the cut generating function to (1.11), we obtain the valid inequality
\[ \sum_i \phi(r_i) x_i + \sum_j \phi(d_j + m_j) y_j \ge 1. \]
As the $m_j \in M$ are arbitrary, we obtain (1.10).
For this technique to work, one actually needs a monoid. In the case of Gomory cuts, $S = \mathbb{Z} - f$. As $S + \mathbb{Z} = S$, one can use $M = \mathbb{Z}$ as the monoid for monoidal strengthening. One can also write the relaxation as $f + \sum_i r_i x_i + \sum_j d_j y_j \in S$ and consider $S$ to be $\mathbb{Z}$, in which case the monoid is $S$ itself. In the literature, it is rather common to use $S$ or a subset of it as the monoid. For example, relaxations where $S = \mathbb{Z}^n$ or $S = \mathbb{Z}^n \cap P$, where $P$ is a polyhedron, or even a convex set have been studied, see for example Andersen et al. (2007), Basu et al. (2010b), Morán and Dey (2011) and (Conforti et al., 2014, Chapter 6). When $S = \mathbb{Z}^n \cap P$ and $P$ is a polyhedron, a typical monoid used for strengthening is $M = \mathbb{Z}^n \cap \operatorname{lin}(\operatorname{conv}(S))$, see for example Dey and Wolsey (2010), Conforti et al. (2011a) and Basu et al. (2012).
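For the Gomory case $S = \mathbb{Z} - f$ with $M = \mathbb{Z}$, the infimum in (1.10) can be evaluated explicitly. The sketch below assumes the standard split set $C = [-f, 1 - f]$ as the $S$-free set (this choice, like the helper names `gauge_split`, `strengthened` and `gmi_closed_form`, is an assumption made for illustration) and compares a brute-force evaluation of $\inf_{m \in \mathbb{Z}} \phi(d + m)$ with the familiar closed form in terms of the fractional part of $d$.

```python
from fractions import Fraction
from math import floor


def gauge_split(r, f):
    """Gauge of the interval C = [-f, 1 - f], 0 < f < 1, which is (Z - f)-free."""
    return max(r / (1 - f), -r / f)


def strengthened(d, f, radius=10):
    """inf over m in Z of gauge_split(d + m, f), by enumeration.

    The gauge is convex and vanishes at 0, so the infimum is attained for
    d + m in [-1, 1); a small radius around -floor(d) is therefore enough.
    """
    base = -floor(d)
    return min(gauge_split(d + base + m, f) for m in range(-radius, radius + 1))


def gmi_closed_form(d, f):
    """Closed form min(frac(d) / (1 - f), (1 - frac(d)) / f)."""
    frac = d - floor(d)
    return min(frac / (1 - f), (1 - frac) / f)


if __name__ == "__main__":
    f = Fraction(1, 4)
    d = Fraction(7, 3)
    print("phi(d)          =", gauge_split(d, f))
    print("inf phi(d + m)  =", strengthened(d, f))
    print("closed form     =", gmi_closed_form(d, f))
```

Exact rational arithmetic is used so that the two evaluations can be compared without rounding error.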
A more complicated setting is when $S + M \ne S$. The disjunctive case corresponds to this more complicated setting, but to be able to see this, we need to follow more closely the original derivation of Balas and Jeroslow (1980). Furthermore, note that our exposition of the disjunctive case in Section 1.4.2 is not an application of the general principle of monoidal strengthening as presented here, since we also modified the $S$-free set.
The setting for the disjunctive case is that we have an optimization problem on the variables $(x, y) \in \mathbb{R}^q_+ \times \mathbb{Z}^p_+$ such that
\[ \bigvee_{k \in Q} a(k)^T x + d(k)^T y \ge 1 \]
is a valid disjunction. In Section 1.4.2, we represented this by taking
\[ S = \{(x, y) \in \mathbb{R}^q_+ \times \mathbb{Z}^p_+ : \bigvee_{k \in Q} a(k)^T x + d(k)^T y \ge 1\} \]
and $\sum_j e_j x_j + \sum_j e_{j+q} y_j = (x, y) \in S$ as our relaxation (1.4).
However, we can also represent the disjunction in a different way. Recall that $b_k$ are the lower bounds on the disjunctive terms $a(k)^T x + d(k)^T y$, see Section 1.4.2. Let
\[ S_b = \{w \in \mathbb{R}^Q : \bigvee_{k \in Q} w_k \ge 1,\ w \ge b\}. \]
We can model this disjunction via
\[ \begin{pmatrix} a(1)^T x + d(1)^T y \\ \vdots \\ a(K)^T x + d(K)^T y \end{pmatrix} \in S_b \iff Ax + Dy \in S_b, \]
where
\[ A = \begin{pmatrix} a(1)^T \\ \vdots \\ a(K)^T \end{pmatrix}, \quad D = \begin{pmatrix} d(1)^T \\ \vdots \\ d(K)^T \end{pmatrix}, \]
and $Q = \{1, \ldots, K\}$.
We have that $C_b = \{w \in \mathbb{R}^Q : w_k \le 1,\ k \in Q\}$ is a convex $S_b$-free set. Note that $C_b = \{w \in \mathbb{R}^Q : \phi_{C_b}(w) \le 1\}$, where $\phi_{C_b}(w) = \max_{k \in Q} w_k$, and it is sublinear. Note that $\phi_{C_b}(A_{\cdot j}) = \phi_C(e_j, 0)$ and $\phi_{C_b}(D_{\cdot j}) = \phi_C(0, e_j)$, where $\phi_C$ is defined in (1.5).
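Now, consider the monoids $\mathcal{M}$ defined in (1.7) and $T = \{\tau \in \mathbb{R}^Q : \exists m \in \mathcal{M},\ \tau_k = (1 - b_k) m_k, \text{ for } k \in Q\}$. Let us see that $C_b$ is $(S_b + T)$-free. Let $\theta \in S_b + T$ so that $\theta = w + \tau$. If $\tau = 0$, then $\theta \in S_b$ and so $\theta \notin \operatorname{int} C_b$ as $C_b$ is $S_b$-free. Otherwise, there exists $k \in Q$ such that $m_k > 0$ so $\tau_k \ge 1 - b_k$. Then, $\theta_k = w_k + \tau_k \ge b_k + 1 - b_k = 1$, thus, $\theta \notin \operatorname{int} C_b$.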
Summarizing, $C_b$ is not only $S_b$-free but also $(S_b + T)$-free. Thus, we can apply monoidal strengthening to obtain the cut (1.9). This argument is basically a modern rewrite of the argument in Balas and Jeroslow (1980). Note here that $S_b + T \ne S_b$.
In general, the challenge of monoidal strengthening is to find a monoid $M$ such that given a closed set $S$ and an $S$-free set $C$, $C$ is also $(S + M)$-free, so that we can apply monoidal strengthening as above.
Chapter 2
On the Relation Between the Extended
Supporting Hyperplane Algorithm and
Kelley’s Cutting Plane Algorithm
In this chapter we revisit two classical algorithms for convex mixed integer optimization, namely, Kelley’s cutting plane algorithm and Veinott’s supporting hyperplane algorithm. The motivation to look into these algorithms is the following. Some state-of-the-art LP-based MINLP solvers enforce convex constraints by adding gradient cutting planes. Simple examples show that these cuts do not necessarily support the feasible region, and so they are dominated. In order to build undominated cuts, or equivalently, supporting cuts, different separation procedures are needed such as the one proposed by Veinott.
However, it is not always the case that gradient cutting planes are not
supporting. Thus, the purpose of this chapter is to understand when gradient
cutting planes are supporting. Our findings naturally suggest a reformula-
tion of the feasible region for which every gradient cut is supporting. As a
consequence, we can show that Veinott’s supporting hyperplane algorithm is
just a special case of Kelley’s cutting plane algorithm. As a result, we extend
the applicability of the supporting hyperplane algorithm to convex problems
represented by a class of general, not necessarily convex nor differentiable,
functions.
The insights obtained in this chapter, together with an interpretation
of gradient cutting planes as intersection cuts presented in Chapter 4 will
motivate the basic construction of maximal quadratic-free sets presented in
Chapter 5.
The chapter is organized as follows. In Section 2.1 we introduce the object of study of this chapter and review the literature on cutting plane approaches and efforts on obtaining supporting valid inequalities. In Section 2.2, we characterize functions whose linearizations are supporting hyperplanes to their 0-sublevel sets. Section 2.3 introduces the gauge function and shows how to use it for building supporting hyperplanes. We note that evaluating the gauge function is equivalent to the line search step of the supporting hyperplane algorithm. This equivalence provides the link between the supporting hyperplane and Kelley’s cutting plane algorithm. In Section 2.4, we show that the cutting planes generated by the supporting hyperplane algorithm can also be generated by Kelley’s algorithm when applied to a reformulation of the problem. This implies that the convergence of the supporting hyperplane algorithm follows from Kelley’s. In Section 2.5, we show that we can apply the supporting hyperplane algorithm to problems whose feasible region is convex but represented via functions that are not necessarily convex nor differentiable. We introduce the concept of a well-behaved generalized directional derivative and show that if the functions have well-behaved generalized directional derivatives and 0 does not belong to the generalized subdifferential at points where the functions are zero, then the supporting hyperplane algorithm converges. Finally, Section 2.6 presents our concluding remarks.
This chapter is joint work with Ambros Gleixner and Robert Schwarz and
has been submitted to the Journal of Global Optimization.
2.1 Background
A mixed integer convex program (MICP) is a problem of the form
\[ \min\{c^T x : x \in C \cap (\mathbb{Z}^p \times \mathbb{R}^{n-p})\}, \tag{2.1} \]
where $C$ is a closed convex set, $c \in \mathbb{R}^n$, and $p$ denotes the number of variables with integrality requirement. The use of a linear objective function is without loss of generality given that one can always transform a problem with a convex objective function into a problem of the form (2.1). We can represent the set $C$ in different ways, one of the most common being as the intersection of sublevel sets of convex differentiable functions, that is,
\[ C = \{x \in \mathbb{R}^n : g_j(x) \le 0,\ j \in J\}. \tag{2.2} \]
Here, $J$ is a finite index set and each $g_j$ is convex and differentiable.
Several methods have been proposed for solving MICP. When the problem is continuous and represented as (2.2), one of the first proposed methods was the cutting plane algorithm by J. E. Kelley (1960). This algorithm exploits the convexity of a constraint function $g$ to build gradient cuts.
The idea of Kelley’s cutting plane (KCP) algorithm is to approximate the feasible region with a polytope, solve the resulting linear program (LP) and, if the LP solution is not feasible, separate it using gradient cuts to obtain a new polytope which is a better approximation of the feasible region and repeat, see Algorithm 2.1.
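Algorithm 2.1: Kelley’s cutting plane algorithm
    $LP = \{x : x \in [l, u]\}$, $\bar{x} \leftarrow \arg\min_{x \in LP} c^T x$
    while $\max_{j \in J} g_j(\bar{x}) > \epsilon$ do
        forall $j$ such that $g_j(\bar{x}) > 0$ do
            $LP \leftarrow LP \cap \{x : g_j(\bar{x}) + \nabla g_j(\bar{x})(x - \bar{x}) \le 0\}$
        $\bar{x} \leftarrow \arg\min_{x \in LP} c^T x$
    return $\bar{x}$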
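As an illustration of the loop in Algorithm 2.1, here is a minimal sketch restricted to one variable and one constraint. In this special case the LP relaxation is just an interval, so no LP solver is needed. The function name `kelley_1d` and its parameters are hypothetical, and the sketch assumes that $g$ is convex and differentiable and that the feasible set is nonempty (so the derivative does not vanish at infeasible points).

```python
def kelley_1d(c, g, dg, lower, upper, eps=1e-8, max_iter=100):
    """Minimise c * x over {x in [lower, upper] : g(x) <= 0} with g convex.

    The relaxation is the interval [lo, hi]; "solving the LP" means picking
    the endpoint favoured by the sign of c.
    """
    lo, hi = lower, upper
    x_bar = lo if c >= 0 else hi
    for _ in range(max_iter):
        if g(x_bar) <= eps:
            break  # x_bar is feasible up to the tolerance
        slope = dg(x_bar)
        # Gradient cut g(x_bar) + slope * (x - x_bar) <= 0, solved for x.
        bound = x_bar - g(x_bar) / slope
        if slope > 0:
            hi = min(hi, bound)
        else:
            lo = max(lo, bound)
        x_bar = lo if c >= 0 else hi  # re-solve the (interval) relaxation
    return x_bar


if __name__ == "__main__":
    # maximise x subject to x^2 - 1 <= 0 on [-2, 2]
    print(kelley_1d(-1.0, lambda x: x * x - 1.0, lambda x: 2.0 * x, -2.0, 2.0))
```

In this one-dimensional setting each gradient cut is a Newton step on $g$, which makes it easy to see that the iterates approach the boundary of the feasible set from outside.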
Kelley shows that the algorithm converges to the optimum and it converges
in finite time to a point close to the optimum. By solving integer programs
(IP) using the cutting planes of Gomory (1958) instead of LP relaxations,
Kelley shows that his cutting plane algorithm solves purely integer convex
programs in finite time. The same algorithm works just as well for
MICP
.
However, Kelley did not have access to a finite algorithm for solving mixed
integer linear programs (MILP).
In an attempt to speed up Kelley’s algorithm, Veinott (1967) proposes
the supporting hyperplane algorithm (SH). A possible issue with Kelley’s
algorithm is that, in general, gradient cuts do not support the feasible region,
see Figure 2.1. Therefore, it is expected that better relaxations can be achieved
by using supporting cutting planes.
In order to construct supporting hyperplanes, Veinott suggests building gradient cuts at boundary points of $C$. He uses an interior point of $C$ to find the point $\hat{x}$ where the segment joining the interior point and the solution of the current relaxation crosses the boundary. These cuts are automatically supporting hyperplanes of $C$ at $\hat{x}$. However, since the cut is computed at $\hat{x}$, which is in $C$, it might happen that the gradient of the constraints active at $\hat{x}$ vanishes. For this reason, Veinott also requires that the functions representing $C$ have non-vanishing gradients at the boundary. This is immediately implied by, e.g., Slater’s condition (Section 1.3). Veinott also identifies that one can use his algorithm to solve (2.1) when representing $C$ by quasi-convex functions, that is, functions whose sublevel sets are convex.
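The separation step described above can be sketched as follows: a bisection along the segment between the interior point and the point to separate locates the boundary point, and a gradient cut is then built there. The helper names `boundary_point` and `gradient_cut` are hypothetical, and the sketch assumes a single differentiable constraint function with $g < 0$ at the interior point and $g > 0$ at the point to separate.

```python
from math import sqrt


def boundary_point(g, interior, x_bar, tol=1e-12):
    """Bisection on the segment from `interior` (g < 0) to `x_bar` (g > 0)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        point = [p + mid * (q - p) for p, q in zip(interior, x_bar)]
        if g(point) <= 0:
            lo = mid  # still inside C, move towards x_bar
        else:
            hi = mid  # outside C, move towards the interior point
    return [p + lo * (q - p) for p, q in zip(interior, x_bar)]


def gradient_cut(g, grad, x_hat):
    """Return (a, beta) describing the cut a^T x <= beta obtained from
    g(x_hat) + grad(x_hat)^T (x - x_hat) <= 0."""
    a = grad(x_hat)
    beta = sum(ai * xi for ai, xi in zip(a, x_hat)) - g(x_hat)
    return a, beta


if __name__ == "__main__":
    g = lambda p: p[0] ** 2 + p[1] ** 2 - 1.0
    grad = lambda p: [2.0 * p[0], 2.0 * p[1]]
    x_hat = boundary_point(g, [0.0, 0.0], [1.5, 1.5])
    a, beta = gradient_cut(g, grad, x_hat)
    # After dividing by a[0], the cut reads x + y <= beta / a[0].
    print(x_hat, beta / a[0], sqrt(2.0))
```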
Recently, Kronqvist et al. (2016) rediscovered and implemented Veinott’s
algorithm (Veinott, 1967). They call their algorithm the extended supporting
hyperplane algorithm (ESH). They discuss the practical importance of choos-
ing a good interior point and propose some improvements over the original
method, such as solving LP relaxations during the first iterations instead of the
more expensive MILP relaxation. As a result, they present a computationally
competitive solver implementation for MICPs defined by convex differentiable
constraint functions (Kronqvist et al., 2018).
In this chapter, we would like to understand when, given a convex differentiable function $g$, gradient cuts of $g$ are supporting to the convex set $C = \{x \in \mathbb{R}^n : g(x) \le 0\}$. This question is motivated by the fact that in this case Kelley’s algorithm automatically becomes a supporting hyperplane algorithm. In Theorem 2.3 we give a necessary and sufficient condition for a gradient cut of $g$ at a given point to be a supporting hyperplane of $C$. In particular, this condition suggests looking at sublinear functions, i.e., convex and positively homogeneous functions. As it turns out, this naturally leads to Veinott’s algorithm.
Sublinear functions and convex sets are deeply related. When the origin is in the interior of a convex set $C$, then we can represent $C$ via its gauge function $\varphi_C$, which is sublinear (Rockafellar, 1970). We give the formal definition of the gauge function in Section 2.3, but for now it suffices to know that we can represent $C$ as $C = \{x \in \mathbb{R}^n : \varphi_C(x) \le 1\}$ and that, in particular, for every $\bar{x} \ne 0$ a gradient cut of $\varphi_C$ at $\bar{x}$ supports all of its sublevel sets. The following example illustrates this.
Example 2.1. Consider the convex feasible region given by
\[ C = \{(x, y) \in \mathbb{R}^2 : g(x, y) \le 0\}, \]
where $g(x, y) = x^2 + y^2 - 1$. We show through an example that gradient cuts of $g$ are not necessarily supporting to $C$, explain why this happens, and show that changing the representation of $C$ to use its gauge function solves the issue.
Separating the infeasible point $\bar{x} = (\frac{3}{2}, \frac{3}{2})$ by a gradient cut of $g$ at $\bar{x}$ gives
\[ g(\bar{x}) + \nabla g(\bar{x})(x - \bar{x}) \le 0 \iff x + y \le \frac{11}{6}. \]
This cut does not support $C$, see Figure 2.1. Alternatively, the gauge function of $C$ is given by $\varphi_C(x, y) = \sqrt{x^2 + y^2}$ and $C = \{(x, y) : \sqrt{x^2 + y^2} \le 1\}$. The gradient cut of $\varphi_C$ at $\bar{x}$ is $x + y \le \sqrt{2}$, which is supporting.
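The two cuts of Example 2.1 can be checked numerically with the short sketch below. The helper names are hypothetical; `slack_over_disk` uses the fact that the maximum of $a^T x$ over the unit disk equals $\|a\|$, so a cut $a^T x \le \beta$ supports the disk exactly when $\beta - \|a\| = 0$.

```python
from math import hypot


def cut_from_g(x_bar):
    """Gradient cut of g(x, y) = x^2 + y^2 - 1 at x_bar, as (a, beta)."""
    x, y = x_bar
    g_val = x * x + y * y - 1.0
    a = (2.0 * x, 2.0 * y)
    beta = a[0] * x + a[1] * y - g_val
    return a, beta


def cut_from_gauge(x_bar):
    """Gradient cut of the gauge sqrt(x^2 + y^2) <= 1 at x_bar, as (a, beta)."""
    x, y = x_bar
    norm = hypot(x, y)
    a = (x / norm, y / norm)
    beta = 1.0 - norm + (a[0] * x + a[1] * y)
    return a, beta


def slack_over_disk(a, beta):
    """beta - max{a^T x : x in the unit disk}; zero means the cut supports."""
    return beta - hypot(a[0], a[1])


if __name__ == "__main__":
    x_bar = (1.5, 1.5)
    for name, (a, beta) in (("g", cut_from_g(x_bar)), ("gauge", cut_from_gauge(x_bar))):
        print(name, "x + y <=", beta / a[0], "slack:", slack_over_disk(a, beta))
```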
From the previous discussion it is a natural idea to represent $C$ via its gauge function, namely, $C = \{x \in \mathbb{R}^n : \varphi_C(x) \le 1\}$. However, as mentioned before, $C$ is usually given by (2.2). Our main contribution is to show that reformulating (2.2) to the gauge representation will naturally lead to the ESH algorithm, see Section 2.3.2. As a consequence, the convergence proofs of Veinott (1967) and Kronqvist et al. (2016) follow directly from the convergence proof of Kelley’s cutting plane algorithm (J. E. Kelley, 1960; Horst and Tuy, 1990), see Section 2.4. In other words, we show that the ESH algorithm is the KCP algorithm applied to a different representation of the problem.²
Figure 2.1: The feasible region $C$ and the infeasible point $\bar{x} = (\frac{3}{2}, \frac{3}{2})$ to separate. On the left we see that the separating hyperplane is not supporting to $C$. On the right we see why this happens: the linearization of $g$ at $\bar{x}$ is tangent to the epigraph of $g$ (shown upside-down for clarity) at $(\bar{x}, g(\bar{x}))$. However, when this hyperplane intersects the $x$-$y$-plane, it is already far away from the epigraph, and in consequence, from the sublevel set. The intersection of the hyperplane with the $x$-$y$-plane is the gradient cut.
² Strictly speaking, when the problem is mixed integer, the KCP algorithm only corresponds to the so-called LP-step (Kronqvist et al., 2016) of the ESH algorithm. However, given that the KCP algorithm allows for a straightforward extension to the mixed integer case, we will continue to compare the KCP algorithm to the ESH algorithm with respect to their technique of generating cutting planes.
Motivated by this approach of representing $C$ by its gauge function, we are able to show that the ESH algorithm applied to (2.1) converges even when $C$ is not represented by convex functions. This is related to recent work of Lasserre (2009) that tries to understand how different techniques behave when the convex set $C$ is not represented via (2.2). Lasserre considers sets $C = \{x : g_j(x) \le 0,\ j \in J\}$ where $g_j$ are only differentiable, but not necessarily convex in the following setting:
Assumption 2.2. For all $x \in C$ and all $j \in J$, if $g_j(x) = 0$, then $\nabla g_j(x) \ne 0$.
Under this assumption, that is, if the gradients of active constraints do
not vanish at the boundary of
C
, Lasserre shows that the KKT conditions
are not only necessary but also sufficient for global optimality. In other words,
every minimizer is a KKT point and every KKT point is a minimizer.
A series of generalizations follow the work of Lasserre. Dutta and Lalitha (2011) generalize the previous result to the case where $C$ is represented by locally Lipschitz functions, not necessarily differentiable nor convex, but regular in the sense of Clarke (Clarke, 1990), see also Definition 2.15. Martínez-Legaz (2014) further generalizes the result to the case where $C$ is represented by tangentially convex functions (Lemaréchal, 1986; Pshenichnyi, 1971). Kabgani et al. (2017) generalize the result to the case where $C$ is represented by functions that admit an upper regular convexificator URC (Jeyakumar and Luc, 1999), see also Definition 2.16. We note that regular functions in the sense of Clarke and tangentially convex functions admit a URC (Kabgani et al., 2017), thus the URC assumption is the most general among the ones considered in these works.
In terms of computations, Lasserre (2011, 2014) proposes an algorithm to
find the KKT point via log-barrier functions. He shows that the algorithm
converges to the KKT point if Assumption 2.2 holds.
For all these concepts of generalized derivative, there is a notion of directional derivative and a notion of subdifferential. For example, for functions that admit a URC, the notion of directional derivative is the upper Dini directional derivative and its subdifferential is the URC, see Definition 2.16. Let $f$ be a function and let us denote by $f'(x; d)$ a generalized directional derivative. We say that the directional derivative is well-behaved if $f'(x; d) > 0$ implies that there exists $t_n \searrow 0$ such that $f(x + t_n d) > f(x)$.
In this sense we show that if $C$ is represented by functions whose generalized directional derivatives are well-behaved, then the ESH converges to the global optimum, under the equivalent of Assumption 2.2 (see (2.8)) for the corresponding subdifferential. The upper Dini directional derivative is certainly well-behaved and, thus, our result shows that the ESH converges when $C$ is represented by functions that admit a URC. We also show that for $\partial^\circ$-pseudoconvex (see Definition 2.19) constraints, the Clarke directional derivative (see Definition 2.15) is well-behaved. Therefore, our result generalizes the result of Eronen et al. (2017) that the ESH converges when $C$ is represented by $\partial^\circ$-pseudoconvex functions.
We also show, via an example, that if we use Clarke’s subdifferential (Clarke,
1990), the ESH does not need to converge when the functions are only Lipschitz
continuous but not regular in the sense of Clarke.
Finally, we provide a characterization of convex functions whose lineariza-
tions are supporting to their sublevel sets. Although elementary, the authors
are not aware of its presence in the literature. In particular, this result allows
us to identify some families of functions for which gradient cuts are never
supporting (see Example 2.7) and some for which they are always supporting
(see Corollary 2.5 and Example 2.6).
2.1.1 Literature Review
We can think of the algorithms of J. E. Kelley (1960) and Veinott (1967) as a mixture of two ingredients: which relaxation to solve and where to compute the cutting plane. Indeed, at each iteration we have a point $x_k$ that we would like to separate with a linear inequality $\beta + \alpha^T(x - x_0) \le 0$. For Kelley’s algorithm, $x_0 = x_k$, while for Veinott’s algorithm, $x_0 \in \partial C$, and for both $\alpha \in \partial g(x_0)$ and $\beta = g(x_0)$. Choosing different relaxations and different points where to compute the cutting planes yields different algorithms. This framework is developed in Horst and Tuy (1990).
Following the previous framework, Duran and Grossmann (1986) propose the so-called outer approximation algorithm for MICP. The idea is to solve an MILP relaxation, but instead of computing a cutting plane at the MILP optimum, or at the boundary point on the segment between the MILP optimum and some interior point, they suggest computing cutting planes at a solution of the nonlinear program (NLP) obtained after fixing the integer variables to the integer values given by the MILP optimal solution. This is a much more expensive algorithm but has the advantage of finite convergence. Of course, this does not work in complete generality and we need some assumptions, for example, requiring some constraint qualifications. Moreover, when obtaining an infeasible NLP after fixing the integer variables, care must be taken to prevent the same integer assignment in future iterations. To handle such cases, Duran and Grossmann propose the use of integer cuts. However, Fletcher and Leyffer (1994) point out that this is not necessary. They show that the gradient cuts at the solution of a slack NLP separate the integer assignment. Eronen et al. (2012) show that a naive generalization of the outer approximation algorithm to the non-differentiable case will not work. They provide a generalization for a particular class of functions. Wei and Ali (2015a,b) provide further generalizations to the non-differentiable case.
A related algorithm to the outer approximation method is the so-called
generalized Benders decomposition (Geoffrion, 1972). We refer to Duran and
Grossmann (1986); Fletcher and Leyffer (1994); Quesada and Grossmann
(1992) for discussions about the relation between these two algorithms. Wei
and Ali (2015c) extend the generalized Benders decomposition to Banach
spaces.
Westerlund and Pettersson (1995) propose the so-called extended cutting plane algorithm. This algorithm is the extension of Kelley’s cutting plane to MICP and they show that the algorithm converges. Further extensions and convergence proofs of cutting plane and outer approximation algorithms for non-smooth problems are given in Eronen et al. (2012). An interesting generalization of the extended cutting plane algorithm to solve a class of non-convex problems is the so-called $\alpha$ extended cutting plane algorithm introduced by Westerlund et al. (1998). They consider problem (2.1) where $C$ is represented by differentiable pseudoconvex constraints. The idea is that, even though a gradient cut might not be valid, one can tilt the cut in order to make it valid. The tilting is done by multiplying the gradient by some $\alpha$, hence the name. We refer to Westerlund et al. (1998) for more details.
As mentioned at the beginning, the assumption that the objective function is linear is without loss of generality, provided that the original objective function is convex. However, some classes of problems cannot be encompassed by (2.1), for example, when the objective function is quasi-convex. Extensions of the KCP algorithm, the ($\alpha$) extended cutting plane algorithm, and the ESH to convex problems with a class of quasi-convex objectives were developed by Plastria (1985), Eronen et al. (2013), and Westerlund et al. (2018), respectively.
Yet another technique for producing tight cuts is to project the point to
be separated onto
C
(Horst and Tuy, 1990). Using the projected point and the
difference between the point and its projection, one can build a supporting
hyperplane that separates the point. In the same reference, Horst and Tuy
show that this algorithm converges.
There have been attempts at building tighter relaxations by ensuring that gradient cuts are supporting, in a more general context than convex mixed integer nonlinear programming. Belotti et al. (2009) consider bivariate convex constraints of the form $f(x) - y \le 0$, where $f$ is a univariate convex function. They propose projecting the point to be separated onto the curve $y = f(x)$ and building a gradient cut at the projection. However, their motivation is not to find supporting hyperplanes, but to find the most violated cut. Indeed, as we will see, gradient cuts for these types of constraints are always supporting (Example 2.6). Other work along these lines includes the one by Lubin et al. (2015), where the authors derive an efficient procedure to project onto a two dimensional constraint derived from a Gaussian linear chance constraint, thus building supporting valid inequalities.
Another algorithm for solving non-smooth convex optimization problems is the so-called bundle method (Hiriart-Urruty and Lemaréchal, 1993). This method has also been extended to consider the mixed integer case by de Oliveira (2016).
Finally, in terms of applications, we would like to point out that the supporting hyperplane algorithm is very popular in stochastic optimization (van Ackooij et al., 2018, 2013; van Ackooij and de Oliveira, 2016; Arnold et al., 2013; Prékopa, 1995; Prékopa and Szántai, 1978; Szántai, 1988).
2.2 Characterization of Functions with Supporting Linearizations
We now give necessary and sufficient conditions for the linearization of a convex, not necessarily differentiable, function $g$ at a point $\bar{x}$ to support the region $C = \{x \in \mathbb{R}^n : g(x) \le 0\}$. In order for this to happen, the supporting hyperplane has to support the epigraph on the whole segment joining the point of $C$ where it supports and $(\bar{x}, g(\bar{x}))$. In other words, the function must be affine on the segment joining the set $C$ and $\bar{x}$. This is due to the convexity of $g$.
Theorem 2.3. Let $g : \mathbb{R}^n \to \mathbb{R}$ be a convex function, $C = \{x \in \mathbb{R}^n : g(x) \le 0\} \ne \emptyset$, and $\bar{x} \notin C$. There exists a subgradient $v \in \partial g(\bar{x})$ such that the valid inequality
\[ g(\bar{x}) + v^T(x - \bar{x}) \le 0 \tag{2.3} \]
supports $C$, if and only if, there exists $x_0 \in C$ such that $\lambda \mapsto g(x_0 + \lambda(\bar{x} - x_0))$ is affine in $[0, 1]$.
Proof. ($\Rightarrow$) Let $x_0 \in \partial C$ be the point where (2.3) supports $C$. The idea is to show that the affine function $x \mapsto g(\bar{x}) + v^T(x - \bar{x})$ coincides with $g$ at two points, $\bar{x}$ and $x_0$. Then, by the convexity of $g$, it must coincide with $g$ on the segment joining both points.
In more detail, by definition of $x_0$ we have
\[ g(\bar{x}) + v^T(x_0 - \bar{x}) = 0. \tag{2.4} \]
For $\lambda \in [0, 1]$, let $l(\lambda) = x_0 + \lambda(\bar{x} - x_0)$ and $\rho(\lambda) = g(l(\lambda))$. Since $g$ is convex and $l$ affine, $\rho$ is convex.
Since $v$ is a subgradient,
\[ g(\bar{x}) + v^T(l(\lambda) - \bar{x}) \le \rho(\lambda) \text{ for every } \lambda \in [0, 1]. \]
After some algebraic manipulation and using that $\rho(1) = g(\bar{x}) = v^T(\bar{x} - x_0)$, we obtain
\[ \rho(1)\lambda \le \rho(\lambda). \]
On the other hand, $\rho(0) = 0$ and $\rho(\lambda)$ is convex, thus we have $\rho(\lambda) \le \lambda\rho(1) + (1 - \lambda)\rho(0) = \lambda\rho(1)$ for $\lambda \in [0, 1]$. Therefore, $\rho(\lambda) = \rho(1)\lambda$, hence $g(l(\lambda))$ is affine in $[0, 1]$.
($\Leftarrow$) The idea is to show that there is a supporting hyperplane $H$ of $\operatorname{epi} g \subseteq \mathbb{R}^n \times \mathbb{R}$ which contains the graph of $g$ restricted to the segment joining $x_0$ and $\bar{x}$, that is, $A = \{(x_0 + \lambda(\bar{x} - x_0), g(x_0 + \lambda(\bar{x} - x_0))) : \lambda \in [0, 1]\}$. Then the intersection of such $H$ with $\mathbb{R}^n \times \{0\}$ will give us (2.3).
The set $A$ is a convex nonempty subset of $\operatorname{epi} g$ that does not intersect the relative interior of $\operatorname{epi} g$. Hence, there exists a supporting hyperplane,
\[ H = \{(x, z) \in \mathbb{R}^n \times \mathbb{R} : v^T x + az = b\}, \]
to $\operatorname{epi} g$ containing $A$ (Rockafellar, 1970, Theorem 11.6).
Since $g(x_0) \le 0$ and $g(\bar{x}) > 0$, it follows that $A$ is not parallel to the $x$-space. Therefore, $H$ is also not parallel to the $x$-space and so $v \ne 0$. Since $A$ is not parallel to the $z$-axis, it follows that $a \ne 0$. We assume, without loss of generality, that $a = -1$.
The point $(\bar{x}, g(\bar{x}))$ belongs to $A \subseteq H$, thus $v^T \bar{x} - g(\bar{x}) = b$ and $H = \{(x, g(\bar{x}) + v^T(x - \bar{x})) : x \in \mathbb{R}^n\}$. Given that $H$ supports the epigraph, then $v$ is a subgradient of $g$, in particular,
\[ g(\bar{x}) + v^T(x - \bar{x}) \le g(x) \text{ for every } x \in \mathbb{R}^n. \]
Let $z(x)$ be the affine function whose graph is $H$, that is, $z(x) = g(\bar{x}) + v^T(x - \bar{x})$. We now need to show that $g(\bar{x}) + v^T(x - \bar{x}) \le 0$ supports $C$ by exhibiting an $\hat{x} \in C$ such that $g(\bar{x}) + v^T(\hat{x} - \bar{x}) = 0$. By construction, $z(x_0 + \lambda(\bar{x} - x_0)) = g(x_0 + \lambda(\bar{x} - x_0))$. Since $z(x_0 + \lambda(\bar{x} - x_0))$ is non-positive for $\lambda = 0$ and positive for $\lambda = 1$, it has to be zero for some $\lambda_0$. Let $\hat{x} = x_0 + \lambda_0(\bar{x} - x_0)$. Then $g(\hat{x}) = z(\hat{x}) = 0$ and we conclude that $\hat{x} \in C$ and $g(\bar{x}) + v^T(\hat{x} - \bar{x}) = 0$.
Specializing the theorem to differentiable functions directly leads to the
following:
Corollary 2.4. Let $g : \mathbb{R}^n \to \mathbb{R}$ be a convex differentiable function, $C = \{x \in \mathbb{R}^n : g(x) \le 0\}$, and $\bar{x} \notin C$. Then the valid inequality
\[ g(\bar{x}) + \nabla g(\bar{x})^T(x - \bar{x}) \le 0, \]
supports $C$, if and only if, there exists $x_0 \in C$ such that $\lambda \mapsto g(x_0 + \lambda(\bar{x} - x_0))$ is affine in $[0, 1]$.
Proof. Since $g$ is differentiable, the subdifferential of $g$ consists only of the gradient of $g$.
A natural candidate for functions with supporting gradient cuts at every
point are functions whose epigraph is a translation of a convex cone.
Corollary 2.5 (Sublinear functions). Let $h(x)$ be a sublinear function. For this type of function, gradient cuts always support $C = \{x : h(x) \le c\}$, for any $c \ge 0$.
Proof. This follows directly from Theorem 2.3, since $0 \in C$ and $\lambda \mapsto h(\lambda\bar{x})$ is affine in $\mathbb{R}_+$ for any $\bar{x}$.
However, these are not the only functions that satisfy the conditions of Theorem 2.3 for every point. The previous theorem implies that linearizations always support the constraint set if a convex constraint $g(x) \le 0$ is linear in one of its arguments.
Example 2.6 (Functions with linear variables). Let $f : \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}$ be a convex function of the form $f(x, y) = g(x) + a^T y + c$, with $a \ne 0$ and $g : \mathbb{R}^m \to \mathbb{R}$ convex. Then gradient cuts support $C = \{(x, y) : f(x, y) \le 0\}$. Indeed, assume without loss of generality that $a_1 > 0$ and let $(\bar{x}, \bar{y}) \notin C$. Then there exists a $\lambda > 0$ such that $f(\bar{x}, \bar{y} - \lambda e_1) = g(\bar{x}) + a^T \bar{y} + c - a_1 \lambda = 0$. The statement follows from Theorem 2.3.
Consider separating a point $(x_0, z_0)$ from a constraint of the form $z = g(x)$ with $g : \mathbb{R} \to \mathbb{R}$ and convex, with $z_0 < g(x_0)$ (that is, separating on the convex constraint $g(x) \le z$). As mentioned earlier, Belotti et al. (2009) suggest projecting $(x_0, z_0)$ to the graph $z = g(x)$ and computing a gradient cut there. This example shows that this step is unnecessary when the sole purpose is to obtain a cut that is supporting to the graph.
By contrast, if $g(x)$ is strictly convex, linearizations at points $x$ such that $g(x) \ne 0$ are never supporting to $g(x) \le 0$. This follows directly from Theorem 2.3 since $\lambda \mapsto g(x + \lambda v)$ is not affine for any $v$. We can also characterize convex quadratic functions with supporting linearizations.
Example 2.7
(Convex quadratic functions)
.
Let
g
(
x
) =
xTAx
+
bTx
+
c
be
a convex quadratic function, i.e.,
A
is an
n
by
n
symmetric and positive semi-
definite matrix. We show that gradient cuts support
C
=
{x∈Rn
:
g
(
x
)
≤
0
}
,
if and only if, bis not in the range of A, i.e., b /∈R(A) = {Ax :x∈Rn}.
First notice that
lv
(
λ
) =
g
(
x
+
λv
) is affine linear, if and only if,
v∈ker
(
A
).
42 On the Relation Between the ESH Algorithm and KCP Algorithm
Let
v∈ker
(
A
) and
x¯/∈C
. Then there is a
λ∈R
such that
x¯
+
λv ∈C
if and only if
lv
is not constant. Thus, gradient cuts are not supporting, if
and only if,
lv
is constant for every
v∈ker
(
A
). But
lv
is constant for every
v∈ker
(
A
), if and only if,
bTv
= 0 for every
v∈ker
(
A
), which is equivalent
to
b∈ker
(
A
)
⊥
=
R
(
AT
) =
R
(
A
), since
A
is symmetric. Hence, gradient cuts
support C, if and only if, b /∈R(A).
In particular, if $b = 0$, i.e., there are no linear terms in the quadratic function, then gradient cuts are never supporting hyperplanes. Also, if $A$ is invertible, then $b \in R(A)$ and gradient cuts are not supporting. This is to be expected since in this case $g$ is strictly convex.
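The characterization in Example 2.7 can be illustrated on a small instance. The following is a minimal sketch, not part of the original text: the helper names `quad`, `grad` and `gradient_cut`, as well as the data $A = \mathrm{diag}(1, 0)$, $b = (0, 1)$, $c = 0$ and $\bar x = (1, 1)$, are assumptions chosen so that $b \notin R(A)$. It moves from $\bar x$ along a kernel direction $v$ until $g$ vanishes and checks that the gradient cut at $\bar x$ is tight there.

```python
# Hypothetical helpers (not from the source); plain lists stand in for vectors/matrices.

def quad(A, b, c, x):
    """Evaluate g(x) = x^T A x + b^T x + c."""
    n = len(x)
    xAx = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
    return xAx + sum(b[i] * x[i] for i in range(n)) + c

def grad(A, b, x):
    """Gradient of g at x, i.e. 2 A x + b (A is assumed symmetric)."""
    n = len(x)
    return [2 * sum(A[i][j] * x[j] for j in range(n)) + b[i] for i in range(n)]

def gradient_cut(A, b, c, xbar):
    """Return (alpha, beta) such that the gradient cut at xbar reads alpha^T x <= beta."""
    alpha = grad(A, b, xbar)
    beta = sum(a * xi for a, xi in zip(alpha, xbar)) - quad(A, b, c, xbar)
    return alpha, beta

# Assumed data: A = diag(1, 0) is PSD and b = (0, 1) is not in the range of A.
A = [[1.0, 0.0], [0.0, 0.0]]
b = [0.0, 1.0]
c = 0.0
xbar = [1.0, 1.0]   # g(xbar) = 2 > 0, so xbar is not in C
v = [0.0, 1.0]      # v in ker(A) with b^T v != 0

alpha, beta = gradient_cut(A, b, c, xbar)

# Since A v = 0, l_v(lam) = g(xbar) + lam * b^T v is affine; solve l_v(lam) = 0.
lam = -quad(A, b, c, xbar) / sum(bi * vi for bi, vi in zip(b, v))
xhat = [xi + lam * vi for xi, vi in zip(xbar, v)]

# The cut is tight at xhat, which lies on the boundary of C.
slack = beta - sum(a * xi for a, xi in zip(alpha, xhat))
```

In this instance the cut $2x_1 + x_2 \le 1$ touches $C$ at $\hat x = (1, -1)$, in line with the claim that gradient cuts are supporting when $b \notin R(A)$.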
2.3 The Gauge Function
Any MICP of the form (2.1) can be reformulated as an equivalent MICP with a single constraint for which every linearization supports the continuous relaxation of the feasible region. To this end, we can use any sublinear function whose 1-sublevel set is $C$. Each convex set $C$ has at least one sublinear function that represents it, namely, the gauge function (Rockafellar, 1970) of $C$.
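Definition 2.8. Let $C \subseteq \mathbb{R}^n$ be a convex set such that $0 \in \operatorname{int} C$. The gauge of $C$ is
\[ \varphi_C(x) = \inf\{t > 0 : x \in tC\}. \]
Proposition 2.9 (Tuy (2016, Proposition 1.11)). Let $C \subseteq \mathbb{R}^n$ be a convex set such that $0 \in \operatorname{int} C$. Then $\varphi_C(x)$ is sublinear. If, in addition, $C$ is closed, then it holds that
\[ C = \{x \in \mathbb{R}^n : \varphi_C(x) \le 1\} \]
and
\[ \partial C = \{x \in \mathbb{R}^n : \varphi_C(x) = 1\}. \]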
Combining Proposition 2.9 with Corollary 2.5, we can see that the gauge
function is appealing for separation, because it always generates supporting
hyperplanes.
2.3.1 Using the Gauge Function for Separation
Even though the gauge function is exactly what we need to ensure supporting gradient cuts, in general, there is no closed-form formula for it. Therefore, it is not always possible to explicitly reformulate $C$ as $\varphi_C(x) \le 1$.
Furthermore, if one is interested in solving mathematical programs with a numerical solver, performing such a reformulation might introduce some numerical issues one would have to take care of. Solvers usually solve up to a given tolerance, that is, they accept points that satisfy $g_j(x) \le \varepsilon$ for some $\varepsilon > 0$. Then, even though $C = \{x : \varphi_C(x) \le 1\}$, it might be that $\{x \in \mathbb{R}^n : \varphi_C(x) \le 1 + \varepsilon\} \nsubseteq \{x \in \mathbb{R}^n : g_j(x) \le \varepsilon\}$. In fact, even simple constraints show this behavior. Consider $C = \{x : x^2 - 1 \le 0\}$. In this case, $\varphi_C(x) = |x|$ and for $x_0 = 1 + \varepsilon$, we have $\varphi_C(x_0) = 1 + \varepsilon$. Then $x_0$ would be $\varepsilon$-feasible for $\varphi_C(x) \le 1$, although it would be infeasible for $x^2 - 1 \le 0$, since $x_0^2 - 1 = 2\varepsilon + \varepsilon^2 > \varepsilon$.
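The arithmetic of this tolerance example can be replayed exactly. The snippet below is only an illustration under assumptions: the tolerance value $\varepsilon = 10^{-3}$ is arbitrary, and exact rational arithmetic is used so that rounding does not blur the comparison.

```python
from fractions import Fraction

eps = Fraction(1, 1000)   # illustrative tolerance (an assumption, any eps > 0 works)
x0 = 1 + eps

# Violation of the gauge constraint phi_C(x) <= 1, with phi_C(x) = |x|.
gauge_violation = abs(x0) - 1
# Violation of the original constraint x^2 - 1 <= 0.
original_violation = x0 ** 2 - 1

accepted_by_gauge = gauge_violation <= eps        # eps-feasible for the reformulation
accepted_by_original = original_violation <= eps  # not eps-feasible for x^2 - 1 <= 0
```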
Luckily, one does not need to reformulate in order to take advantage of the gauge function for tighter separation. The next propositions show how to use the gauge function and a point $\bar x \notin C$ to obtain a boundary point of $C$, and that linearizing at that boundary point gives a supporting valid inequality that actually separates $\bar x$. To ensure the existence of a supporting hyperplane we need Assumption 2.2. For example, Assumption 2.2 is satisfied whenever Slater's condition (Section 1.3) is satisfied for (2.1) with $C$ represented by (2.2), that is, when there exists $x_0$ such that $g_j(x_0) < 0$ for every $j \in J$.
Before we state the propositions we start with a simple lemma.
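Lemma 2.10. Let $C \subseteq \mathbb{R}^n$ be a closed convex set such that $0 \in \operatorname{int} C$, let $\hat x \in \partial C$ and $\bar x \notin C$. Let $\alpha \in \mathbb{R}^n$, $\beta \in \mathbb{R}$ be such that $\alpha \neq 0$ and $\alpha^T x \le \beta$ is a valid inequality for $C$ that supports $C$ at $\hat x$. If the segment joining $0$ and $\bar x$ contains $\hat x$, then the inequality separates $\bar x$ from $C$.
Proof. Consider $l(\lambda) = \alpha^T(\lambda \bar x) - \beta$ and let $\lambda_0 \in (0, 1)$ be such that $\lambda_0 \bar x = \hat x$. The function $l$ is a strictly increasing affine function. Indeed, $0 \in \operatorname{int} C$ implies that $l(0) < 0$, while $l(\lambda_0) = 0$. Thus, $l(1) > 0$, i.e., $\alpha^T \bar x > \beta$.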
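Proposition 2.11. Let $C \subseteq \mathbb{R}^n$ be a closed convex set such that $0 \in \operatorname{int} C$ and let $\bar x \notin C$. Then $\frac{\bar x}{\varphi_C(\bar x)} \in \partial C$.
Proof. First, $\varphi_C(\bar x) \neq 0$ since $\bar x \notin C$. The positive homogeneity of $\varphi_C$ implies that
\[ \varphi_C\left(\frac{\bar x}{\varphi_C(\bar x)}\right) = \frac{\varphi_C(\bar x)}{\varphi_C(\bar x)} = 1. \]
Proposition 2.9 implies $\frac{\bar x}{\varphi_C(\bar x)} \in \partial C$.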
Let $J_0(x)$ be the set of indices of the active constraints at $x$, i.e., $J_0(x) = \{j \in J : g_j(x) = 0\}$.
Proposition 2.12. Let $C = \{x : g_j(x) \le 0, j \in J\}$ be such that $0 \in \operatorname{int} C$ and let $\varphi_C$ be its gauge function. Assume that Assumption 2.2 holds. Given $\bar x \notin C$, define $\hat x = \frac{\bar x}{\varphi_C(\bar x)}$. Then, for any $j \in J_0(\hat x)$, the gradient cut of $g_j$ at $\hat x$ yields a valid supporting inequality for $C$ that separates $\bar x$.
Proof. By the previous proposition, we have that $\hat x \in \partial C$. Let $j \in J_0(\hat x)$. Then the gradient cut of $g_j$ at $\hat x$ yields a valid supporting inequality. The fact that it separates follows from Lemma 2.10. Note that Lemma 2.10 is applicable since Assumption 2.2 ensures that the normal of the gradient cut is nonzero.
Hence, we can get supporting valid inequalities separating a given point $\bar x \notin C$ by using the gauge function to find the point $\hat x = \frac{\bar x}{\varphi_C(\bar x)} \in \partial C$. Then Proposition 2.12 ensures that the gradient cut of any active constraint at $\hat x$ will separate $\bar x$ from $C$. But how do we compute $\varphi_C(\bar x)$?
2.3.2 Evaluating the Gauge Function
Let $C = \{x : g_j(x) \le 0, j \in J\}$ be a closed convex set such that $0 \in \operatorname{int} C$ and consider
\[ f(x) = \max_{j \in J} g_j(x). \tag{2.5} \]
In general, evaluating the gauge function of $C$ at $\bar x \notin C$ is equivalent to solving the following one-dimensional equation
\[ f(\lambda \bar x) = 0, \quad \lambda \in (0, 1). \tag{2.6} \]
If $\lambda^*$ is the solution, then $\varphi_C(\bar x) = \frac{1}{\lambda^*}$.
One can solve such an equation using a line search. Note that the line search is looking for a point $\hat x \in \partial C$ on the segment between $0$ and $\bar x$. This is exactly what the (extended) supporting hyperplane algorithm performs when it uses $0$ as its interior point.
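The line search for (2.6) can be sketched with plain bisection. This is an illustrative sketch only: the function name `gauge_by_bisection`, the tolerance parameters and the two example constraints are assumptions, and bisection is just one possible line search. It relies on $f$ being convex with $f(0) < 0 < f(\bar x)$, so that $\lambda \mapsto f(\lambda \bar x)$ has a single root in $(0, 1)$.

```python
def gauge_by_bisection(f, xbar, tol=1e-12, max_iter=200):
    """Approximate phi_C(xbar) for C = {x : f(x) <= 0}.

    Assumes f is convex, f(0) < 0 and f(xbar) > 0, and solves
    f(lam * xbar) = 0 for lam in (0, 1) by bisection.
    """
    lo, hi = 0.0, 1.0  # invariant: f(lo * xbar) <= 0 < f(hi * xbar)
    for _ in range(max_iter):
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if f([mid * xi for xi in xbar]) > 0:
            hi = mid
        else:
            lo = mid
    lam_star = 0.5 * (lo + hi)
    return 1.0 / lam_star

def f(x):
    """f(x) = max_j g_j(x) for two assumed convex constraints."""
    g1 = x[0] ** 2 + x[1] ** 2 - 1.0  # unit disk
    g2 = x[0] - 0.9                   # half-plane x1 <= 0.9
    return max(g1, g2)

xbar = [3.0, 4.0]
phi = gauge_by_bisection(f, xbar)                # approximately 5
xhat = [xi / phi for xi in xbar]                 # boundary point, approximately (0.6, 0.8)
```

Along the ray through $\bar x = (3, 4)$ the disk constraint becomes active first (at $\lambda = 0.2$), so the computed gauge value is close to $5$.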
We would also like to remark that a closed-form expression for the gauge function of $C$ is equivalent to a closed-form formula for the solution of (2.6). It is possible to find such a formula for some functions, e.g., when $f$ is a convex quadratic function.
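For the quadratic case, the closed form can be written out explicitly. The sketch below is an assumption-laden illustration rather than a formula taken from the text: it assumes $f(x) = x^T A x + b^T x + c$ with $A$ symmetric positive semidefinite and $c < 0$ (so that $f(0) < 0$), and $\bar x \notin C$. With $a = \bar x^T A \bar x$ and $\beta = b^T \bar x$, the positive root of $a\lambda^2 + \beta\lambda + c = 0$ is $\lambda^* = -2c / (\beta + \sqrt{\beta^2 - 4ac})$, which also covers the case $a = 0$. The name `gauge_quadratic` is hypothetical.

```python
import math

def gauge_quadratic(A, b, c, xbar):
    """Gauge of C = {x : x^T A x + b^T x + c <= 0} at a point xbar outside C.

    Assumes A symmetric PSD and c < 0, so that 0 lies in the interior of C.
    """
    n = len(xbar)
    a = sum(xbar[i] * A[i][j] * xbar[j] for i in range(n) for j in range(n))
    beta = sum(b[i] * xbar[i] for i in range(n))
    # lam* = -2c / (beta + sqrt(beta^2 - 4ac)) solves f(lam * xbar) = 0 on (0, 1);
    # the gauge is its reciprocal.
    return (beta + math.sqrt(beta * beta - 4.0 * a * c)) / (-2.0 * c)

# Unit disk: the gauge is the Euclidean norm.
phi_disk = gauge_quadratic([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], -1.0, [3.0, 4.0])

# g(x) = x1^2 + x2 - 1 along the kernel direction (0, 1), where a = 0.
phi_parabola = gauge_quadratic([[1.0, 0.0], [0.0, 0.0]], [0.0, 1.0], -1.0, [0.0, 2.0])
```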
Next, we briefly discuss what happens when $0$ is not in the interior of $C$ and when $C$ has no interior. In the next section we discuss the implications of the fact that evaluating the gauge function is equivalent to the line search step of the supporting hyperplane algorithm.
2.3.3 Handling Sets with Empty Interior
When $\operatorname{int} C = \emptyset$, we can still use the methods discussed above by applying a trick from Kronqvist et al. (2016). Assuming $C = \{x \in \mathbb{R}^n : g_j(x) \le 0, j \in J\} \neq \emptyset$, consider the set $C_\epsilon = \{x \in \mathbb{R}^n : g_j(x) \le \epsilon, j \in J\}$. This set satisfies $\operatorname{int} C_\epsilon \neq \emptyset$ and optimizing over $C_\epsilon$ provides an $\epsilon$-optimal solution.
2.3.4 Using a Nonzero Interior Point
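If $x_0 \in \operatorname{int} C$ and $x_0 \neq 0$, we can translate $C$ so that $0$ is in its interior. Equivalently, we can build a gauge function centered on $x_0$. This is given by
\[ \varphi_{x_0, C}(x) = \varphi_{C - x_0}(x - x_0). \]
Then, given $\bar x \notin C$, the point
\[ \hat x = \frac{\bar x - x_0}{\varphi_{C - x_0}(\bar x - x_0)} + x_0 \tag{2.7} \]
belongs to the boundary of $C$. Equivalently, $\hat x = x_0 + \lambda^*(\bar x - x_0)$, where $\lambda^*$ solves
\[ f(x_0 + \lambda(\bar x - x_0)) = 0, \quad \lambda \in (0, 1), \]
with $f(x) = \max_{j \in J} g_j(x)$ as in (2.5).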
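Putting the pieces together, one separation step with a general interior point can be sketched as follows. This is a minimal sketch under assumptions, not the implementation of any solver: the name `esh_cut`, the bisection line search, the rule of linearizing the constraint with the largest value at $\hat x$, and the example data are all illustrative choices. The constraints are assumed convex and differentiable, with $g_j(x_0) < 0$ for all $j$ and $f(\bar x) > 0$.

```python
def esh_cut(gs, grads, x0, xbar, tol=1e-12):
    """One ESH-style separation step (sketch).

    gs, grads: lists of constraint functions g_j and of their gradients.
    x0: point with g_j(x0) < 0 for every j; xbar: point with max_j g_j(xbar) > 0.
    Returns (xhat, alpha, beta), where the cut reads alpha^T x <= beta.
    """
    def point(lam):
        return [p + lam * (q - p) for p, q in zip(x0, xbar)]

    def f(x):
        return max(g(x) for g in gs)

    lo, hi = 0.0, 1.0  # invariant: f(point(lo)) <= 0 < f(point(hi))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(point(mid)) > 0:
            hi = mid
        else:
            lo = mid
    xhat = point(lo)  # feasible end of the bracket, on the boundary up to tol

    # Linearize the constraint that is (numerically) active at xhat.
    j = max(range(len(gs)), key=lambda k: gs[k](xhat))
    alpha = grads[j](xhat)
    beta = sum(a * x for a, x in zip(alpha, xhat)) - gs[j](xhat)
    return xhat, alpha, beta

# Assumed example: disk of radius 1 centred at (1, 1), interior point x0 = (1, 1).
gs = [lambda x: (x[0] - 1.0) ** 2 + (x[1] - 1.0) ** 2 - 1.0]
grads = [lambda x: [2.0 * (x[0] - 1.0), 2.0 * (x[1] - 1.0)]]
x0, xbar = [1.0, 1.0], [4.0, 5.0]

xhat, alpha, beta = esh_cut(gs, grads, x0, xbar)
lhs_at_xbar = sum(a * x for a, x in zip(alpha, xbar))  # exceeds beta: xbar is cut off
lhs_at_x0 = sum(a * x for a, x in zip(alpha, x0))      # below beta: x0 stays feasible
```

Taking the feasible end of the bracket keeps the cut valid for convex $g_j$ even though $\hat x$ is only found up to the line-search tolerance.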
2.4 Convergence Proofs
Consider an MICP given by (2.1) with $C$ represented as (2.2). Let $f$ be defined as in (2.5). As mentioned above, the ESH algorithm computes an interior point of $C$ (which we will assume to be $0$) and performs a line search between $\bar x \notin C$ and $0$ in order to find a point on the boundary. It computes a gradient cut at the boundary point, solves the relaxation again, and repeats the process. From our previous discussion, computing a gradient cut at the boundary point is equivalent to computing a gradient cut at $\frac{\bar x}{\varphi_C(\bar x)}$. Therefore, the generated cuts are
\[ f\left(\frac{\bar x}{\varphi_C(\bar x)}\right) + v^T\left(x - \frac{\bar x}{\varphi_C(\bar x)}\right) \le 0, \quad \text{where } v \in \partial f\left(\frac{\bar x}{\varphi_C(\bar x)}\right). \]
To prove the convergence of the ESH algorithm, Veinott and Kronqvist et al. use tailored arguments. Here we show that the convergence of the algorithm follows from the convergence of KCP. We note that the KCP algorithm still converges when $C$ is represented by a convex non-differentiable function. One needs to replace gradients by subgradients and one can use any subgradient (Horst and Tuy, 1990). Therefore, given that $\varphi_C(x)$ is a convex function, we know that KCP converges when applied to $\min\{c^T x : \varphi_C(x) \le 1\}$. Thus, in order to prove that ESH converges, it is sufficient to show that the cutting planes generated by ESH can also be generated by KCP.
We first prove that the normals of (normalized) supporting valid inequalities are subgradients of the gauge function at the supporting point.
Lemma 2.13. Let $\alpha^T x \le 1$ be a valid and supporting inequality for $C$. Let $\hat x \in \partial C$ be a point where it supports $C$, i.e., $\alpha^T \hat x = 1$. Then $\alpha \in \partial\varphi_C(\hat x)$.
Proof. We need to show that $\varphi_C(\hat x) + \alpha^T(x - \hat x) \le \varphi_C(x)$ for every $x$. Note that since $\hat x \in \partial C$, we have that $\varphi_C(\hat x) = 1$ and we just have to prove that $\alpha^T x \le \varphi_C(x)$.
When $x$ is such that $\varphi_C(x) > 0$, we have $\frac{x}{\varphi_C(x)} \in C$. Due to the validity of $\alpha^T x \le 1$, it follows that $\alpha^T \frac{x}{\varphi_C(x)} \le 1$.
Now let $x$ be such that $\varphi_C(x) = 0$. Then $\varphi_C(\lambda x) = 0$ for every $\lambda > 0$, i.e., $\lambda x \in C$ for every $\lambda > 0$. Hence, $\alpha^T(\lambda x) \le 1$ for every $\lambda > 0$, which implies that $\alpha^T x \le 0 = \varphi_C(x)$.
Now we prove that the inequalities generated by the ESH algorithm can also be generated by the KCP algorithm. Given that the KCP algorithm converges even for non-smooth convex functions (Horst and Tuy, 1990), the next theorem implies the convergence of the ESH algorithm.
Theorem 2.14. Consider an MICP given by (2.1) with $C$ represented as (2.2) such that $0 \in \operatorname{int} C$ and Assumption 2.2 holds. Let $f$ be defined as in (2.5) and let $\bar x \notin C$ be the current relaxation solution to separate. Let
\[ f\left(\frac{\bar x}{\varphi_C(\bar x)}\right) + v^T\left(x - \frac{\bar x}{\varphi_C(\bar x)}\right) \le 0, \quad \text{with } v \in \partial f\left(\frac{\bar x}{\varphi_C(\bar x)}\right), \]
be the inequality generated by the ESH algorithm using $0$ as the interior point. Then KCP applied to $\min\{c^T x : \varphi_C(x) \le 1\}$ can generate the same inequality.
Proof. Let $\hat x = \frac{\bar x}{\varphi_C(\bar x)}$. First, let us show that Assumption 2.2 implies $v \neq 0$. Indeed, if $v = 0$, then $f(\hat x) + v^T(x - \hat x) \le f(x)$ and $0 \in C$ imply that $0 \ge f(0) \ge f(\hat x) + v^T(0 - \hat x) = 0$. Let $j \in J$ be such that $g_j(0) = f(0) = 0$. Then $\lambda \mapsto g_j(\lambda \hat x)$ is constant in $[0, 1]$. Thus, its derivative at $1$ is $0$, i.e., $\nabla g_j(\hat x)^T \hat x = 0$. This implies that $\nabla g_j(\hat x)^T \bar x = 0$. Furthermore, $\nabla g_j(\hat x) \neq 0$ by Assumption 2.2 and so Lemma 2.10 implies that $\nabla g_j(\hat x)^T(x - \hat x) \le 0$ separates $\bar x$ from $C$. But this contradicts the equality $\nabla g_j(\hat x)^T \bar x = 0$.
Let us manipulate the inequality obtained by the ESH algorithm. Notice that $f(\hat x) = 0$ and so the inequality reads as $v^T x \le v^T \hat x$. By Lemma 2.10, $\bar x$ is cut off by $v^T x \le v^T \hat x$, i.e., $v^T \bar x > v^T \hat x$. This, together with $\varphi_C(\bar x) > 1$, implies that $v^T \bar x > 0$. Summarizing, the inequality obtained by the ESH algorithm can be rewritten as
\[ \left(\frac{\varphi_C(\bar x)}{v^T \bar x}\, v\right)^T x \le 1. \]
Lemma 2.13 implies that $\frac{\varphi_C(\bar x)}{v^T \bar x}\, v \in \partial\varphi_C(\hat x)$. Since $\varphi_C$ is positively homogeneous, $\partial\varphi_C(\hat x) = \partial\varphi_C(\bar x)$. Hence, if the KCP algorithm applied to $\min\{c^T x : \varphi_C(x) \le 1\}$ separates $\bar x$ using $\frac{\varphi_C(\bar x)}{v^T \bar x}\, v \in \partial\varphi_C(\bar x)$, then it would generate the gradient cut
\[ \varphi_C(\bar x) - 1 + \frac{\varphi_C(\bar x)}{v^T \bar x}\, v^T(x - \bar x) \le 0. \]
The left-hand side of the above inequality is equal to $-1 + \frac{\varphi_C(\bar x)}{v^T \bar x}\, v^T x$. This shows that the gradient cut constructed by the KCP algorithm is the same as the one constructed by the ESH algorithm.
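The identity behind Theorem 2.14 can be checked numerically on the unit disk, where the gauge is the Euclidean norm and is differentiable away from the origin. The snippet is only a sanity check on one assumed instance ($f(x) = x_1^2 + x_2^2 - 1$, $\bar x = (3, 4)$); the variable names are illustrative.

```python
import math

xbar = [3.0, 4.0]
phi = math.hypot(*xbar)                  # gauge of the unit disk at xbar (Euclidean norm)
xhat = [xi / phi for xi in xbar]         # boundary point xbar / phi_C(xbar)

# ESH: gradient of f(x) = x1^2 + x2^2 - 1 at xhat, then normalize the cut to "<= 1".
v = [2.0 * xi for xi in xhat]
scale = phi / sum(vi * xi for vi, xi in zip(v, xbar))
esh_normal = [scale * vi for vi in v]    # ESH cut: esh_normal^T x <= 1

# KCP on phi_C(x) <= 1: gradient of the norm at xbar, cut phi - 1 + n^T (x - xbar) <= 0.
kcp_normal = [xi / phi for xi in xbar]
kcp_rhs = 1.0 - phi + sum(ni * xi for ni, xi in zip(kcp_normal, xbar))  # n^T x <= kcp_rhs
```

Both cuts come out as $0.6\,x_1 + 0.8\,x_2 \le 1$ up to rounding.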
2.5 Convex Programs Represented by Non-Convex Non-Smooth Functions
In this section we consider problem (2.1) with $C$ represented as
\[ C = \{x : g_j(x) \le 0, j \in J\}, \]
where the functions $g_j$ are not necessarily convex. As mentioned in the introduction, convex problems represented by non-convex functions have been considered in Dutta and Lalitha (2011); Kabgani et al. (2017); Lasserre (2009, 2011, 2014); Martínez-Legaz (2014). These different works have generalized each other by considering more general classes of non-smooth functions.
2.5.1 The ESH Algorithm in the Context of Generalized Differentiability
When a function is non-smooth there are many ways of extending the notion of differentiability. Informally, it is common to first define a notion of directional derivative and then a generalization of the gradient. As the directional derivative of $g$ at $x$ in the direction $d$ is given by $\nabla g(x)^T d$, the notion of generalized gradient tries to capture this relation.
A classic notion of generalized derivative is Clarke's subdifferential.
Definition 2.15 (Clarke (1990); Clarke et al. (1998)). The Clarke directional derivative of a function $g : \mathbb{R}^n \to \mathbb{R}$ at $\bar x$ in the direction $d \in \mathbb{R}^n$ is defined as
\[ g^\circ(\bar x; d) = \limsup_{x \to \bar x,\ t \searrow 0} \frac{g(x + td) - g(x)}{t}. \]
The Clarke subdifferential of $g$ at $\bar x$ is
\[ \partial^\circ g(\bar x) = \{\eta \in \mathbb{R}^n : \eta^T d \le g^\circ(\bar x; d)\ \forall d \in \mathbb{R}^n\}. \]
We say that $g$ is directionally differentiable at $\bar x$ if the directional derivatives of $g$ at $\bar x$ exist, that is,
\[ g'(\bar x; d) = \lim_{t \searrow 0} \frac{g(\bar x + td) - g(\bar x)}{t} \]
exists for every $d \in \mathbb{R}^n$. Finally, $g$ is regular in the sense of Clarke at $\bar x$ if $g$ is directionally differentiable at $\bar x$ and $g'(\bar x; d) = g^\circ(\bar x; d)$ for every $d \in \mathbb{R}^n$.
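Another interesting class is the following.
Definition 2.16 (Jeyakumar and Luc (1999)). Let $g : \mathbb{R}^n \to \mathbb{R}$. The upper Dini directional derivative of $g$ at $\bar x$ in the direction $d \in \mathbb{R}^n$ is
\[ g^+(\bar x; d) = \limsup_{t \searrow 0} \frac{g(\bar x + td) - g(\bar x)}{t}. \]
The function $g$ has an upper regular convexificator (URC) at $\bar x$ if there exists a closed set $\partial^+ g(\bar x) \subseteq \mathbb{R}^n$ such that for each $d \in \mathbb{R}^n$,
\[ g^+(\bar x; d) = \sup_{\alpha \in \partial^+ g(\bar x)} \alpha^T d. \]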
We abstract the notion of directional derivative and subdifferential as follows.
Definition 2.17. Let $g : \mathbb{R}^n \to \mathbb{R}$ be a function. A generalized directional derivative of $g$ is a function $h : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$, and the generalized directional derivative of $g$ at $x$ in the direction $d$ is $h(x; d)$. We say that $g$ admits a generalized subdifferential at $x$ if there exists $A = A(x) \subseteq \mathbb{R}^n$ such that $h(x; d) = \sup_{v \in A(x)} v^T d$ for all $d \in \mathbb{R}^n$.
For example, if $g$ is locally Lipschitz, then Clarke's directional derivative is a generalized directional derivative and $\partial^\circ g(x)$ is a generalized subdifferential as $g^\circ(x; d) = \sup\{v^T d : v \in \partial^\circ g(x)\}$ (Clarke et al., 1998, Proposition 2.1.5). Or, if $g$ admits a URC, then Dini's directional derivative is a generalized directional derivative that admits a generalized subdifferential.
However, the above definition of generalized directional derivative and subdifferential is so general that any support function of a set yields a generalized directional derivative that admits a generalized subdifferential. The following definition adds a further requirement in order to make this general notion useful.
Definition 2.18. Let $h$ be a generalized directional derivative of $g$. We say that the generalized directional derivative is well-behaved if $h(x; d) > 0$ implies that there exists $t_n \searrow 0$ such that $g(x + t_n d) > g(x)$.
As we will see, this is the key property to show that the ESH algorithm converges.
Clearly, if $g$ is differentiable, then the directional derivative is well-behaved. Also, Dini's directional derivative is well-behaved. As we will see in the next section, Clarke's directional derivative is not well-behaved in general. However, if the function is regular in the sense of Clarke, then it is well-behaved. Another important class of functions for which Clarke's directional derivative is well-behaved is the class of $\partial^\circ$-pseudoconvex functions.
Definition 2.19. A function $g : \mathbb{R}^n \to \mathbb{R}$ is $\partial^\circ$-pseudoconvex if
– it is locally Lipschitz and,
– for every $x, y \in \mathbb{R}^n$, if $g(y) < g(x)$, then $g^\circ(x; y - x) < 0$.
To show that it is well-behaved, we need the following result.
Lemma 2.20 (Bagirov et al. (2014, Lemma 5.3)). If a function $g$ is $\partial^\circ$-pseudoconvex, then for every $x, y \in \mathbb{R}^n$, if $g(y) = g(x)$, then $g^\circ(x; y - x) \le 0$. In particular, if $g(y) \le g(x)$, then $g^\circ(x; y - x) \le 0$.
The contrapositive of the last statement is: if $g^\circ(x; y - x) > 0$, then $g(y) > g(x)$. As $g^\circ(x; \cdot)$ is positively homogeneous (Clarke et al., 1998, Proposition 2.1.1), we conclude that if $g$ is $\partial^\circ$-pseudoconvex, $g^\circ(x; d) > 0$ for some $d \in \mathbb{R}^n$, and $t > 0$, then $g(x + td) > g(x)$. Thus, if $g$ is $\partial^\circ$-pseudoconvex, then Clarke's directional derivative is well-behaved.
Now we are ready to prove the main result of this section. Recall that $J_0(x) = \{j \in J : g_j(x) = 0\}$.
Theorem 2.21. Let $C = \{x : g_j(x) \le 0, j \in J\}$ be such that $C$ is convex, closed, and $0 \in \operatorname{int} C$. Assume that for each $x \in C$ and $j \in J_0(x)$, the function $g_j$ has a well-behaved generalized directional derivative at $x$ denoted by $h_j$, and that it admits a generalized subdifferential, $\partial^* g_j(x)$. Furthermore, assume that
\[ \partial^* g_j(x) \setminus \{0\} \neq \emptyset \quad \text{for all } x \in C \text{ and } j \in J_0(x). \tag{2.8} \]
Let $\varphi_C$ be the gauge function of $C$. For $\bar x \notin C$, define $\hat x = \frac{\bar x}{\varphi_C(\bar x)}$. Then, for every $j \in J_0(\hat x)$ and every $v \in \partial^* g_j(\hat x) \setminus \{0\}$, the gradient cut, $g_j(\hat x) + v^T(x - \hat x) \le 0$, is a valid supporting inequality for $C$ that separates $\bar x$.
Proof. By Proposition 2.11 we have that $\hat x \in \partial C$. Let $j \in J_0(\hat x)$ and let us consider an arbitrary $v \in \partial^* g_j(\hat x) \setminus \{0\}$. The gradient cut of $g_j$ at $\hat x$ is $v^T(x - \hat x) \le 0$.
We first show that the gradient cut is valid, that is, $v^T(y - \hat x) \le 0$ for all $y \in C$. If this is not the case, then there exists $y_0 \in C$ for which $v^T(y_0 - \hat x) > 0$. Since $g_j$ admits a generalized subdifferential at $\hat x$, we have that
\[ h_j(\hat x; y_0 - \hat x) = \sup_{\eta \in \partial^* g_j(\hat x)} \eta^T(y_0 - \hat x). \]
As $v \in \partial^* g_j(\hat x)$, it follows that $h_j(\hat x; y_0 - \hat x) > 0$. Since $h_j$ is well-behaved, there is a sufficiently small $t \in (0, 1)$ such that $g_j(\hat x + t(y_0 - \hat x)) > 0$. Thus, $\hat x + t(y_0 - \hat x) \notin C$. However, the convexity of $C$ implies that $\hat x + \lambda(y_0 - \hat x) \in C$ for $\lambda \in [0, 1]$, which is a contradiction.
The fact that the gradient cut separates $\bar x$ follows from Lemma 2.10. Note that $v \neq 0$ by hypothesis.
Theorem 2.21 extends the algorithm of Veinott to further representations of the set $C$. In particular, it implies that the ESH algorithm converges (via an argument similar to the proof of Theorem 2.14) when the constraints admit a URC or are $\partial^\circ$-pseudoconvex. Thus, it generalizes the result of Eronen et al. (2017).
Remark 2.22. Any representation of a convex set $C$ as $\{x \in \mathbb{R}^n : g_j(x) \le 0, j \in J\}$ yields a way to evaluate its gauge function, namely,
\[ \varphi_C(x) = \inf\left\{t > 0 : \max_j g_j\left(\frac{x}{t}\right) = 0\right\}. \]
This infimum can be computed using a line search procedure.
However, what is more important is the ability to compute subgradients. Given any method to compute subgradients of the gauge function, we can apply the KCP algorithm using the implicitly defined gauge function. This allows us, for example, to drop (2.8). This algorithm is more general than the one proposed by Lasserre (2011), but it will not necessarily converge to a KKT point of the original problem.
2.5.2 Limits to the Applicability of the ESH Algorithm
The idea of the proof of Theorem 2.21 is that since $C$ is convex, $\hat x + \lambda(y - \hat x) \in C$ for every $y \in C$ and $\lambda \in [0, 1]$. Hence, the functions $g_j$ do not increase when moving in the direction $y - \hat x$ from $\hat x$. Thus, a notion of subdifferential that characterizes a well-behaved directional derivative yields valid gradient cuts. The abstract definitions introduced above try to capture this line of reasoning. Note that this is also how the proofs of the 'only if' parts of (Lasserre, 2009, Lemma 2.2), (Kabgani et al., 2017, Theorem 1), (Dutta and Lalitha, 2011, Proposition 2.2), and the $\subseteq$ inclusion of (Martínez-Legaz, 2014, Proposition 6) work. For example, Lasserre (2009) assumes that the $g_j$ are differentiable, in which case the generalized subdifferential is just the singleton given by the gradient and the generalized directional derivative is the classic directional derivative. Dutta and Lalitha (2011) assume that the functions are locally Lipschitz and regular in the sense of Clarke.
It is a natural question to wonder how important the regularity assumption
is. As the following example shows, the ESH algorithm can produce invalid
cutting planes when using Clarke’s subdifferential and the constraints are
not regular in the sense of Clarke. In particular, this shows that, without the
assumption of regularity, Clarke’s directional derivative is not well-behaved,
in general.
Example 2.23. Consider the function $g(x_1, x_2) = \max\{\min\{3x_1 + x_2, 2x_1 + 3x_2\}, x_1\}$. The set $C = \{(x_1, x_2) : g(x_1, x_2) \le 0\}$ is convex, closed and its interior is nonempty as shown in Figure 2.2. Note that as $g$ is piecewise linear, it is globally Lipschitz continuous (Scholtes, 2012, Proposition 2.2.7). Using Clarke et al. (1998, Theorem 2.8.1), it follows that $\partial^\circ g(0) = \operatorname{conv}\{(3, 1), (2, 3), (1, 0)\}$. Then $2x_1 + 3x_2 \le 0$ is a gradient cut of $g$ at $0$. However, it is not valid as $(-1, 3)$ is feasible but $-2 + 9 > 0$.
In particular, it must be that $g$ is not regular in the sense of Clarke and that $g^\circ$ is not well-behaved. To see that $g^\circ$ is not well-behaved, consider the direction $d = (-1, 1)$. Notice that $g((0, 0) + td) = t\,g(-1, 1) = -t$, and so $g$ is strictly decreasing in the direction $d$. However, $g^\circ(0; d) = \max_{v \in \partial^\circ g(0)} (-v_1 + v_2) = 1$. This also shows that $g$ is not regular. The directional derivative of $g$ at $0$ in the direction $d$ is $-1 \neq 1$.
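The numbers in Example 2.23 are easy to replay. The snippet below is a small check written for this purpose (variable names are illustrative); it evaluates $g$ at the feasible point $(-1, 3)$, the left-hand side of the invalid cut there, the values of $g$ along $d = (-1, 1)$, and the maximum of $-v_1 + v_2$ over the three generators of $\partial^\circ g(0)$, which suffices because a linear function attains its maximum over a convex hull at one of the generating points.

```python
def g(x1, x2):
    """The piecewise linear function of Example 2.23."""
    return max(min(3 * x1 + x2, 2 * x1 + 3 * x2), x1)

# (-1, 3) is feasible, yet it violates the "gradient cut" 2*x1 + 3*x2 <= 0.
y = (-1, 3)
feasible = g(*y) <= 0
cut_lhs = 2 * y[0] + 3 * y[1]

# g decreases linearly along d = (-1, 1): g(t*d) = -t.
d = (-1, 1)
values_along_d = [g(t * d[0], t * d[1]) for t in (1, 2, 3)]

# Clarke directional derivative at 0 in direction d, via the generators of the subdifferential.
vertices = [(3, 1), (2, 3), (1, 0)]
clarke_dd = max(-v1 + v2 for v1, v2 in vertices)
```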
2.6 Concluding Remarks
In this chapter, we have shown that the extended supporting hyperplane algorithm introduced by Veinott (1967) and rediscovered by Kronqvist et al. (2016) is identical to Kelley's classic cutting plane algorithm applied to a suitable reformulation of the problem. We used this new perspective in order to prove the convergence of the method for the larger class of problems with convex feasible regions represented by non-convex non-smooth constraints which admit a generalized subdifferential and whose generalized directional derivative is well-behaved. This class includes $\partial^\circ$-pseudoconvex functions and functions that admit a URC. Functions that admit a URC include differentiable functions and locally Lipschitz functions that are regular in the sense of Clarke. More generally, the algorithm extends to any representation of a convex set that allows us to compute subgradients of its gauge function. These theoretical results bear relevance in practice, as the experimental results in Kronqvist et al. (2016, 2018) have already demonstrated the computational benefits of the supporting hyperplane algorithm in comparison to alternative state-of-the-art solving methods.
Figure 2.2: Counterexample showing that, in general, the ESH algorithm can generate invalid cutting planes if the constraints are just Lipschitz continuous. The convex feasible region $\max\{\min\{3x_1 + x_2, 2x_1 + 3x_2\}, x_1\} \le 0$ in blue and the boundary of the invalid gradient cut $2x_1 + 3x_2 \le 0$ in red.
Another intuition gained from this chapter, which we will use in Chapter 5, is that if we want the gradient cuts to be supporting, then the constraint function cannot be "too" convex. Indeed, as we saw, gradient cuts from strictly convex functions will never be supporting.
Chapter 3
Visible Points, the Separation Problem,
and Applications to Mixed-Integer
Nonlinear Programming
From now on we move away from convex mixed-integer nonlinear programs and consider non-convex mixed-integer nonlinear programs. In this chapter we introduce a technique to produce tighter cutting planes for mixed-integer nonlinear programs. Usually, a cutting plane is generated to cut off a specific infeasible point. The underlying idea is to use the infeasible point to restrict the feasible region in order to obtain a tighter domain. To ensure validity, we require that every valid cut separating the infeasible point from the restricted feasible region is still valid for the original feasible region. We translate this requirement in terms of the separation problem and the reverse polar. In particular, if the reverse polar of the restricted feasible region is the same as the reverse polar of the original feasible region, then any cut valid for the restricted feasible region that separates the infeasible point is also valid for the original feasible region.
We show that the reverse polar of the so-called visible points of the feasible
region from the infeasible point coincides with the reverse polar of the feasible
region. In the special case where the feasible region is described by a single
non-convex constraint intersected with a convex set we provide a characteri-
zation of the visible points. Furthermore, when the non-convex constraint is
quadratic the characterization is particularly simple. We also provide an ex-
tended formulation for a relaxation of the visible points when the non-convex
constraint is a general polynomial.
Finally, we give some conditions under which for a given set there is an
inclusion-wise smallest set, in some predefined family of sets, whose reverse
polars coincide.
3.1 Introduction
The separation problem is a fundamental problem in optimization (Grötschel et al., 1993). Given a set $S \subseteq \mathbb{R}^n$ and a point $\bar x \in \mathbb{R}^n$, the separation problem is

    Decide if $\bar x$ is in the closure of the convex hull of $S$ or find a valid inequality for $S$ that separates $\bar x$.

Algorithms to solve optimization problems, especially those based on solving
relaxations, such as branch and bound, need to deal with the separation
problem. Consider, for example, solving a mixed integer linear problem via
branch and bound (Conforti et al., 2014, Section 9.2). The solution to the
linear relaxation plays the role of
x¯
, while a relaxation based on a subset of
the constraints is used as
S
for the separation problem, see (Conforti et al.,
2014, Chapter 6).
The separation problem can be rephrased in terms of the reverse polar (Balas, 1998; Zaffaroni, 2008) of $S$ at $\bar x$, defined as
\[ S^{\bar x} = \{\alpha \in \mathbb{R}^n : \alpha^T(x - \bar x) \ge 1,\ \forall x \in S\}. \]
The elements of $S^{\bar x}$ are the normals of the hyperplanes that separate $\bar x$ from $\operatorname{conv} S$. Hence, the separation problem can be stated equivalently as

    Decide if $S^{\bar x}$ is empty or find an element from it.

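For a finite set $S$, membership in the reverse polar can be tested directly from the definition. The sketch below is an illustration only: the function name `in_reverse_polar` and the sample data are assumptions, and a finite $S$ is used purely to keep the check elementary.

```python
def in_reverse_polar(alpha, S, xbar):
    """Check alpha^T (x - xbar) >= 1 for every x in the finite set S."""
    return all(
        sum(a * (xi - bi) for a, xi, bi in zip(alpha, x, xbar)) >= 1
        for x in S
    )

S = [(1, 0), (0, 1), (2, 2)]
xbar = (0, 0)

# alpha = (1, 1) gives the cut x1 + x2 >= 1: valid for S and violated by xbar.
ok = in_reverse_polar((1, 1), S, xbar)
# alpha = (1, 0.5) fails at the point (0, 1), so it is not in the reverse polar.
not_ok = in_reverse_polar((1, 0.5), S, xbar)
```

Every $\alpha$ that passes the test defines an inequality $\alpha^T(x - \bar x) \ge 1$ that is valid for $S$ and cuts off $\bar x$, since the left-hand side is $0$ at $x = \bar x$.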
The point of departure of the present work is the following observation.
Observation 3.1. If there is a set $V$ such that $(S \cap V)^{\bar x} = S^{\bar x}$, then, as far as the separation problem is concerned, the feasible region can be regarded as $S \cap V$ instead of $S$.
A set $V$ such that $V^{\bar x} = S^{\bar x}$ will be called a generator of $S^{\bar x}$. Intuitively, if a set $V$ is such that $V \cap S$ generates $S^{\bar x}$, that is, if we can ensure that a cut valid for $V \cap S$ that separates $\bar x$ is also valid for $S$, then $V$ should at least contain the points of $S$ that are "near" $\bar x$. To formalize the meaning of "near" we use the concept of visible points (Deutsch et al., 2013) of $S$ from $\bar x$, which are the points $x \in S$ for which the segment joining $x$ with $\bar x$ only intersects $S$ at $x$, see Definition 3.5. In other words, they are the points of $S$ that can be "seen" from $\bar x$. In Proposition 3.9 we show that the visible points are a generator of $S^{\bar x}$.
As a motivation, we present an application of our results in the context of nonlinear programming, which is treated in more detail in Section 3.4.
Figure 3.1: The feasible region $g(x) \le 0$ and $\bar x = (0, 0)$ together with the box $V$.
Example 3.2. Consider the separation problem of $\bar x = (0, 0)$ from $S = \{x \in B : g(x) \le 0\}$ where
\[ B = \left[-\tfrac{1}{2}, 3\right] \times \left[-\tfrac{1}{2}, 3\right], \]
\[ g(x_1, x_2) = -x_1^2 x_2 + 5 x_1 x_2^2 - x_2^2 - x_2 - 2x_1 + 2, \]
as depicted in Figure 3.1. A standard technique for solving the separation problem for $S$ and $\bar x$ is to construct a convex underestimator of $g$ over $B$ (Vigerske, 2013, Sections 6.1.2 and 7.5.1). The quality of a convex underestimator depends on the bounds of the variables and tighter bounds yield tighter underestimators. As we will see (Proposition 3.9 and Theorem 3.27), $R^{\bar x} = S^{\bar x}$ where
\[ R = \{x \in B : g(x) = 0,\ \nabla g(x)^T x \le 0\}. \]
It is possible to show that $R \subseteq V$, where $V = \left[-\tfrac{1}{2}, \tfrac{17}{10}\right] \times \left[-\tfrac{6}{25}, \tfrac{3}{2}\right]$. Hence, by Corollary 3.25, $(V \cap S)^{\bar x} = S^{\bar x}$. This means that we can solve the separation problem over $\{x \in V : g(x) \le 0\}$ instead of $S$. Therefore, if we were to compute an underestimator of $g$, it could be computed over $V \subsetneq B$.
Methods for obtaining tighter bounds for mixed integer nonlinear programming (MINLP) are of paramount importance. Indeed, not only do bound tightening procedures enhance the performance of MINLP solvers, but also many algorithms for solving MINLPs require that all variables are bounded (Hamed and McCormick, 1993). We refer to the recent survey of Puranik and Sahinidis (2017) for more information on bound tightening procedures and their impact on MINLP solvers.
However, the technique that we introduce in this chapter is not a bound tightening technique in the classic sense, i.e., the tighter bounds that might be learned from $V$ are not valid for the original problem, but only for the separation problem at hand.
We would like to point out that a similar idea, namely to modify the separation problem, is used by Venkatachalam and Ntaimo (2016) in the context of stochastic mixed integer programming. Their objective is to speed up the solution of the separation problem. In contrast, our objective is to produce tighter cutting planes for MINLP.
Contributions
We show that for every closed set $S$, there exists an inclusion-wise smallest closed convex set that generates $S^{\bar x}$ (Theorem 3.21). When $S$ is compact, there is an inclusion-wise smallest closed set that generates $S^{\bar x}$ (Theorem 3.23). Furthermore, under some mild assumptions on $S$, we show that there is an inclusion-wise smallest closed convex set $C$ such that $C \cap S$ generates $S^{\bar x}$ (Theorem 3.22). We also show the existence of a generator, $V_S(\bar x)$, of $S^{\bar x}$ which is more suitable for computations.
We apply our results to MINLP and give an explicit description of $V_S(\bar x)$ when $S = \{x \in C : g(x) \le 0\}$, where $C$ is a closed convex set containing $\bar x$, and $g$ is continuous (Section 3.4.1). For the important case of quadratic constraints, i.e., when $g$ is a quadratic function, we show that $V_S(\bar x)$ has a particularly simple expression (Theorem 3.29).
For the case when $g$ is a general polynomial, we provide an extended formulation for a relaxation of $V_S(\bar x)$ based on the theory of non-negative univariate polynomials (Theorem 3.34).
3.2 Visible Points and the Reverse Polar
In this section we introduce the concept of visible points and reverse polar,
and state some basic properties about them. The main result in this section
is that the reverse polar of the visible points of a set is the reverse polar of
the set (Proposition 3.9).
Unless stated otherwise, we will assume $\bar x = 0$. This is without loss of generality, since we can always translate the set $S$ to $S - \bar x$. We start by restating the definition of reverse polar.
Definition 3.3. Let $S \subseteq \mathbb{R}^n$ and $\bar x \in \mathbb{R}^n$. The reverse polar of $S$ at $\bar x$ is
\[ S^{\bar x} = \{\alpha \in \mathbb{R}^n : \alpha^T(x - \bar x) \ge 1, \text{ for all } x \in S\}. \]
As stated in the introduction, the reverse polar contains all valid inequalities for $S$ that separate $\bar x$ from $S$.
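Definition 3.4. Let $S, V \subseteq \mathbb{R}^n$ and $\bar x \in \mathbb{R}^n$. We say that $V$ is a generator of $S^{\bar x}$ if and only if $V^{\bar x} = S^{\bar x}$.
Definition 3.5. Let $S \subseteq \mathbb{R}^n$ be closed and $\bar x \notin S$. The set of visible points of $S$ from $\bar x$ is
\[ V_S(\bar x) = \{x \in S : (x + [0, 1](\bar x - x)) \cap S = \{x\}\} = \{x \in S : (x + (0, 1](\bar x - x)) \cap S = \emptyset\}. \]
We denote $V_S(0)$ by $V_S$ and note that
\[ V_S = \{x \in S : [0, 1]x \cap S = \{x\}\} = \{x \in S : [0, 1)x \cap S = \emptyset\}. \]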
The following concept is, in some sense, the opposite of the visible points.
Definition 3.6. Let $S \subseteq \mathbb{R}^n$ be closed. The shadow of $S$ from $0$ is
\[ \operatorname{shw} S = [1, \infty) S. \]
The concept of shadow has also been called penumbra (Rockafellar, 1970, p. 22), (Tind and Wolsey, 1982; Conforti and Wolsey, 2018) and aureole closure (Ruys, 1974). The following are some basic properties of the reverse polar.
Lemma 3.7 (Ruys (1974, Property 9.2.2)). Let $S, T \subseteq \mathbb{R}^n$. Then,
1. $S^0 = (\operatorname{shw} S)^0 = (\operatorname{conv} S)^0 = (\operatorname{cl} S)^0 = (\overline{\operatorname{conv}}\, S)^0$.
2. $S^0 = \emptyset$ if and only if $0 \in \operatorname{conv} S$.
3. $S \subseteq T$ implies $T^0 \subseteq S^0$.
4. If $0 \notin \operatorname{conv} S$, then $(S^0)^0 = \operatorname{shw} \operatorname{conv} S$.
We will now show that $V_S$ is a generator of $S^0$. To this end, we need the following lemma, which says that the shadow of what can be seen of a set is the same as the shadow of the whole set. Likewise, what can be seen of a set is the same as what can be seen of the shadow of the set.
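Lemma 3.8. Let $S \subseteq \mathbb{R}^n$ be a closed set such that $0 \notin S$. Then, $\operatorname{shw} V_S = \operatorname{shw} S$ and $V_{\operatorname{shw} S} = V_S$.
Proof. First we prove that $\operatorname{shw} V_S = \operatorname{shw} S$. Clearly, $\operatorname{shw} V_S \subseteq \operatorname{shw} S$.
Let $y \in \operatorname{shw} S$, then $y = \lambda x$ with $x \in S$, $\lambda \ge 1$. Let $I = \{\mu \ge 0 : \mu x \in S\}$ and $\mu_0 = \min I$. The minimum exists since $I$ is closed and not empty as $S$ is closed and $1 \in I$, respectively. From $1 \in I$, we deduce $\mu_0 \le 1$, and from $0 \notin S$, $\mu_0 > 0$. Hence, $\mu_0 x \in V_S$ and $y = \frac{\lambda}{\mu_0}(\mu_0 x) \in \operatorname{shw} V_S$, since $\frac{\lambda}{\mu_0} \ge 1$.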
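Now we prove that $V_{\operatorname{shw} S} = V_S$. Clearly, $S \subseteq \operatorname{shw} S$ implies that $V_S \subseteq V_{\operatorname{shw} S}$.
Let $x_0 \in V_{\operatorname{shw} S}$. Then, $[0,1)x_0 \cap \operatorname{shw} S = \emptyset$. As $S \subseteq \operatorname{shw} S$, it follows that $[0,1)x_0 \cap S = \emptyset$. Hence, if we manage to show that $x_0 \in S$, then $x_0 \in V_S$, which is what we want to prove.
As $x_0 \in V_{\operatorname{shw} S}$, it must be that $x_0 \in \operatorname{shw} S$. This means that there exist $\lambda \ge 1$ and $x \in S$ such that $x_0 = \lambda x$. Note that $\frac{1}{\lambda} x_0 = x \in S \subseteq \operatorname{shw} S$. In other words, $\frac{1}{\lambda} x_0 \in \operatorname{shw} S$ but, as we mentioned above, $[0,1)x_0 \cap \operatorname{shw} S = \emptyset$. This implies that $\frac{1}{\lambda} \ge 1$. Therefore, $\lambda = 1$, which means that $x_0 = x \in S$.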
Proposition 3.9. Let $S \subseteq \mathbb{R}^n$ be a closed set. Then,
\[ (S \cap V_S)^0 = V_S^0 = S^0. \]
Proof. The first equality just comes from the fact that $V_S \subseteq S$.
If $0 \in S$, then the equality holds as all the sets are empty. Otherwise, the equality follows from $V_S^0 = (\operatorname{shw} V_S)^0 = (\operatorname{shw} S)^0 = S^0$, where the first and last equalities are by Lemma 3.7 and the middle one by Lemma 3.8.
3.3 The Smallest Generators
3.3.1 Motivation
In the previous section we showed that there is a set $U \subseteq S$ such that $(U \cap S)^0 = S^0$, namely, $U = V_S$. This set can be used to improve separation routines, as was shown already in Example 3.2. We will come back to applications of the visible points to separation in the next section.
The topic of this section is motivated by the following example, where the set $V_S$ is much larger than the smallest generator.
Example 3.10. Consider the constrained set $S = \{(x_1, x_2) \in \mathbb{R}^2_+ : (x_1 - x_2)^2 \ge 1\}$ depicted in Figure 3.2. The visible points are the lines $x_2 = x_1 + 1$ and $x_2 = x_1 - 1$ intersected with the first orthant. However, it is not hard to see that $V = \{(0,1), (1,0)\}$ is the smallest closed generator of $S^0$.
Figure 3.2: The region $S$. In the middle picture, $V_S$ consists of the points described by the thick red line. In the right picture, the red points form the smallest set $V$ such that $V^0 = S^0$.
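The claim of Example 3.10 can be sanity-checked numerically. The following is only a sketch under stated assumptions: it replaces $S$ by a finite random sample on $[0,10]^2$ (plus the two points of $V$) and tests random vectors $\alpha$, so it illustrates the statement rather than proving it. The helper names `in_S` and `in_reverse_polar` are hypothetical and not part of the thesis.

```python
import random

def in_S(x1, x2):
    """Membership in S = {x >= 0 : (x1 - x2)^2 >= 1} from Example 3.10."""
    return x1 >= 0 and x2 >= 0 and (x1 - x2) ** 2 >= 1

def in_reverse_polar(alpha, points):
    """alpha lies in the reverse polar (at 0) of a finite set iff alpha^T x >= 1 for all x."""
    return all(alpha[0] * x1 + alpha[1] * x2 >= 1 for x1, x2 in points)

random.seed(0)
V = [(0.0, 1.0), (1.0, 0.0)]  # candidate generator from the example

# Finite stand-in for S: rejection sampling on [0, 10]^2, plus the points of V.
candidates = ((random.uniform(0, 10), random.uniform(0, 10)) for _ in range(5000))
sample_S = [p for p in candidates if in_S(*p)] + V

# Random test vectors alpha; V generates S^0 iff both polars agree on every alpha.
alphas = [(random.uniform(-3, 3), random.uniform(-3, 3)) for _ in range(500)]
agree = all(in_reverse_polar(a, V) == in_reverse_polar(a, sample_S) for a in alphas)
```

On this sample the two membership tests coincide, which matches the observation that $\alpha \in V^0$ means $\alpha_1 \ge 1$ and $\alpha_2 \ge 1$, and then $\alpha^T x \ge x_1 + x_2 \ge |x_1 - x_2| \ge 1$ for every $x \in S$.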
This example motivates the following question.
Question 3.11. What is, if any, the smallest closed set $U$ such that $U^0 = S^0$?
The reason we restrict to generators that are closed sets is to avoid representation issues. For example, if $S$ is the ball of radius 1 centered at $(2, 0)$, then Theorem 3.29 implies that the left arc joining $(2, 1)$ and $(2, -1)$ generates $S^0$. However, the rational points on this arc also generate $S^0$ and the smallest set generating $S^0$ does not exist. In order to avoid such issues, we concentrate on closed generators.
As can be seen from simple examples, such as $S = \mathbb{R}_+ \times \{1\}$, for which every $a \ge 0$ defines the generator $(\{0\} \cup [a, \infty)) \times \{1\}$, the smallest closed generator need not exist. However, a smallest closed convex generator might exist and so we ask the following question.
Question 3.12. What is, if any, the smallest closed convex generator of $S^0$?
We are mainly interested in applying our results to the separation problem, as already explained in the introduction. In that case, the set $S$ usually looks like $S = C \cap F$, where $C$ is a convex set and $F$ is the sublevel set of some non-convex function, see the next section. In this context, replacing $C$ by a smaller convex set might be beneficial for the separation problem (see Example 3.32). Thus, it is also natural to consider the following question.
Question 3.13. What is, if any, the smallest closed convex set $U$ such that $S \cap U$ generates $S^0$?
The last two questions are not the same. Informally, $S$ is only used to define $S^0$ in Question 3.12, and so any other set $T$ such that $T^0 = S^0$ can be used to formulate the question. For instance, we can assume without loss of generality that $S$ is closed and convex, since Lemma 3.7 implies that $(\operatorname{conv} S)^0 = S^0$. In contrast, in Question 3.13 we are asking for the smallest generator contained in $S$.
As we will see, the answer to Question 3.12 is that $\operatorname{conv} V_{\operatorname{conv} S}$ is the smallest closed convex generator of $S^0$. However, the next two examples show that Question 3.13 is a bit more delicate.
The first example shows that, in general, there is no unique smallest closed convex set $U$ such that $(S \cap U)^0 = S^0$.
Example 3.14. Let $S = \{(1,0), (0,1), (-1,0), (0,-1)\}$. Since $0 \in \operatorname{conv} S$, $S^0 = \emptyset$. Clearly $V = \{0\} = V_{\operatorname{conv} S}$ is the smallest closed convex set such that $V^0 = \emptyset$. However, $S \cap V = \emptyset$, which implies that $(S \cap V)^0 = \mathbb{R}^2 \ne S^0$. Furthermore, $U_1 = \{(\lambda, 0) : \lambda \in [-1, 1]\}$ and $U_2 = \{(0, \lambda) : \lambda \in [-1, 1]\}$ are both closed and convex, and $(U_i \cap S)^0 = S^0$. Since $U_1 \not\subseteq U_2$ and $U_2 \not\subseteq U_1$, we conclude that there is no smallest closed convex set $U$ such that $(U \cap S)^0 = S^0$.
However, we cannot even expect to find a minimal closed convex set $U$ such that $(S \cap U)^0 = S^0$.
Example 3.15. Let $S = \{(0,1)\} \cup \{(\lambda, 2) : \lambda \ge 0\}$. We have $S^0 = \{\alpha : \alpha_1 \ge 0, \alpha_2 \ge 1\}$. Indeed, $(0,1) \in S$ implies that $\alpha_2 \ge 1$. If $\alpha_1 < 0$ for some $\alpha \in S^0$, then there is a large enough $\lambda$ such that $\lambda \alpha_1 + 2\alpha_2 < 1$ and $(\lambda, 2) \in S$. On the other hand, if $\alpha_1 \ge 0$ and $\alpha_2 \ge 1$, then $\alpha_1 x_1 + \alpha_2 x_2 \ge 1$ for every $(x_1, x_2) \in S$.
Let $T_M = \{(0,1)\} \cup \{(\lambda, 2) : \lambda \ge M\}$ and $U_M = \operatorname{conv} T_M$. The same argument as above shows that $(U_M \cap S)^0 = T_M^0 = S^0$. Notice that any $U$ with $(U \cap S)^0 = S^0$ must contain a sequence $(\lambda_n, 2) \in S$ with $\lambda_n \to \infty$. Thus, any minimal $U$, if it exists, must be of the form $U_M$ for some $M \ge 0$.
It is clear that $U_{M_1} \subseteq U_{M_2}$ if and only if $M_1 > M_2$, and $\bigcap_{M > 0} U_M = \{(\lambda, 1) : \lambda \ge 0\}$. However, $S \cap \{(\lambda, 1) : \lambda \ge 0\} = \{(0,1)\}$ and $\{(0,1)\}^0 \ne S^0$. Therefore, there is no minimal $U$.
On the other hand, $V = \{(\lambda, 1) : \lambda \ge 0\} = V_{\operatorname{conv} S}$ is the smallest closed convex set such that $V^0 = S^0$.
However, these are the only “pathological cases”. Indeed, as we will see, if $\operatorname{conv} S$ is closed (e.g., when $S$ is compact) and $0 \notin \operatorname{conv} S$ (i.e., $S^0 \ne \emptyset$), then $\operatorname{conv} V_{\operatorname{conv} S}$ is the smallest closed convex set such that $\operatorname{conv} V_{\operatorname{conv} S} \cap S$ generates $S^0$.
Remark 3.16. The closure operations are needed because, in general, $V_S$ and $\operatorname{conv} V_S$ are not closed, even when $S$ is convex and compact. Indeed, it is shown in Deutsch et al. (2013, Example 15.5) that for
\[ S := (1,0,0) + \operatorname{cone}\{(1, \alpha, \beta) : \alpha^2 + (\beta - 1)^2 \le 1\}, \]
$V_S$ is open. The authors show that the points $(2, \sin(t), 1 + \cos(t))$ are visible for $t \in (0, \pi)$, but the limit when $t$ approaches $\pi$, $(2,0,0)$, is not. The remark follows from a modification of this example so that $S$ is compact, e.g., by intersecting it with $[0,3] \times \mathbb{R}^2$.
3.3.2 Preliminaries
Here we collect a few lemmata that we are going to need in order to answer Questions 3.11, 3.12 and 3.13.
Lemma 3.17 (Deutsch et al., 2013, Proposition 15.19). Let $S$ be a closed convex set such that $0 \notin S$. If $x \in V_S$ is a strict convex combination of $x_1, \ldots, x_m \in S$, then $x_1, \ldots, x_m \in V_S$.
This result immediately implies the following two lemmata.
Lemma 3.18. Let $S \subseteq \mathbb{R}^n$ be a closed convex set such that $0 \notin S$. Then,
\[ \operatorname{ext} V_S = V_S \cap \operatorname{ext} S. \]
Proof. We start by proving $\operatorname{ext} V_S \subseteq V_S \cap \operatorname{ext} S$. Let $x \in \operatorname{ext} V_S$. Clearly, $x \in V_S$. If $x \notin \operatorname{ext} S$, then there are $x_1, \ldots, x_m \in S$ such that $x$ is a strict convex combination of $x_1, \ldots, x_m$. Lemma 3.17 implies that $x_i \in V_S$ for every $i = 1, \ldots, m$. Thus, $x$ is not an extreme point of $V_S$. This contradiction proves that $x \in \operatorname{ext} S$.
If $x \in V_S \cap \operatorname{ext} S$ but $x \notin \operatorname{ext} V_S$, then $x$ is a strict convex combination of some elements of $V_S$. Since $V_S \subseteq S$, $x$ is a strict convex combination of some elements of $S$. This contradicts $x \in \operatorname{ext} S$.
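Lemma 3.19. Let $S \subseteq \mathbb{R}^n$ be a closed set such that $\operatorname{conv} S$ is closed and $0 \notin \operatorname{conv} S$. Then,
\[ \operatorname{conv} V_{\operatorname{conv} S} = \operatorname{conv}(S \cap V_{\operatorname{conv} S}). \]
Proof. From $S \cap V_{\operatorname{conv} S} \subseteq V_{\operatorname{conv} S}$, it follows that $\operatorname{conv}(S \cap V_{\operatorname{conv} S}) \subseteq \operatorname{conv} V_{\operatorname{conv} S}$.
To prove the other inclusion it is enough to show that $V_{\operatorname{conv} S} \subseteq \operatorname{conv}(S \cap V_{\operatorname{conv} S})$. Let $x \in V_{\operatorname{conv} S}$. Then, $x \in \operatorname{conv} S$ and so $x$ is a strict convex combination of some points $x_1, \ldots, x_m \in S$. Then, by Lemma 3.17, $x_1, \ldots, x_m \in S \cap V_{\operatorname{conv} S}$. Thus, $x \in \operatorname{conv}(S \cap V_{\operatorname{conv} S})$.
We remark that the previous lemma does not follow from Lemma 3.18 by just applying the convex hull operation to the equality, since $\operatorname{conv} S$ may not have extreme points.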
The following is a slight extension of (Rockafellar, 1970, Corollary 18.3.1).
Lemma 3.20. Let $S \subseteq \mathbb{R}^n$ be a closed set. Then, $\operatorname{ext} \operatorname{conv} S \subseteq S$.
Proof. Recall that $x_0$ is an exposed point of a closed convex set $C$ if and only if there exists an $\alpha$ such that $\{x_0\} = \arg\max_{x \in C} \alpha^T x$.
We will show that the set of exposed points of $\operatorname{conv} S$ is a subset of $S$. Then, by Straszewicz’s Theorem (Rockafellar, 1970, Theorem 18.6) and the closedness of $S$, it follows that $\operatorname{ext} \operatorname{conv} S \subseteq S$. Note that when the set of exposed points is empty, the result follows trivially. Thus, we assume that the set of exposed points is non-empty.
Let $x_0$ be an exposed point of $\operatorname{conv} S$ and let $\alpha$ be a direction that exposes it. Then, $\sup_{x \in S} \alpha^T x = \alpha^T x_0$. Since $S$ is closed, there exists $x_1 \in S$ such that $\alpha^T x_1 = \alpha^T x_0$. However, since $x_1 \in S \subseteq \operatorname{conv} S$ and $\alpha$ exposes $x_0$, we must have $x_1 = x_0$. Thus, $x_0 \in S$.
3.3.3 Results
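Let us start by answering Question 3.12.
Theorem 3.21. Let $S \subseteq \mathbb{R}^n$ be closed. Then,
\[ (\operatorname{conv} V_{\operatorname{conv} S})^0 = S^0. \]
Furthermore, if $C \subseteq \mathbb{R}^n$ is a closed convex generator of $S^0$, then
\[ \operatorname{conv} V_{\operatorname{conv} S} \subseteq C. \]
Proof. Note that if $S^0 = \emptyset$, then $0 \in \operatorname{conv} S$ and $V_{\operatorname{conv} S} = \{0\}$, from which the theorem clearly follows. Thus, we assume $S^0 \ne \emptyset$.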
Lemma 3.7 implies that $(\operatorname{conv} V_{\operatorname{conv} S})^0 = (V_{\operatorname{conv} S})^0$ and $S^0 = (\operatorname{conv} S)^0$. Proposition 3.9 implies $(\operatorname{conv} S)^0 = (V_{\operatorname{conv} S})^0$.
To show the second statement of the theorem, let $C$ be closed and convex such that $C^0 = S^0$. Since $C$ is closed and convex, it is enough to prove that $V_{\operatorname{conv} S} \subseteq C$. Suppose, by contradiction, that this is not the case, i.e., there is an $\bar{x} \in V_{\operatorname{conv} S}$ such that $\bar{x} \notin C$. There are two cases, either $[0,1]\bar{x} \cap C = \emptyset$ or $[0,1]\bar{x} \cap C \ne \emptyset$. We will deduce a contradiction from each of them.
First, suppose $[0,1]\bar{x} \cap C = \emptyset$. Both sets are closed and $[0,1]\bar{x}$ is bounded, thus, they can be separated. Indeed, as $0 \in [0,1]\bar{x}$, Rockafellar (1970, Corollary 11.4.1) ensures the existence of $\alpha$ such that $\alpha^T x \ge 1$ for every $x \in C$ and $\alpha^T \bar{x} < 1$. This means that $\alpha \in C^0$. However, $\alpha \notin (\operatorname{conv} S)^0 = S^0$, since $\bar{x} \in \operatorname{conv} S$. This contradicts $S^0 = C^0$.
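Now, suppose $[0,1]\bar{x} \cap C \ne \emptyset$. Since $0 \notin C$ (as $C^0 = S^0 \ne \emptyset$) and $\bar{x} \notin C$, there must be $\mu \in (0,1)$ such that $\mu\bar{x} \in C$. However, $\bar{x} \in V_{\operatorname{conv} S}$ implies that $\mu\bar{x} \notin \operatorname{conv} S$. Thus, the same argument as above ensures that $\mu\bar{x}$ can be separated from $\operatorname{conv} S$. Therefore, there is an $\alpha$ such that $\alpha^T x \ge 1$ for every $x \in \operatorname{conv} S$ while $\alpha^T \mu\bar{x} < 1$. Hence, $\alpha \in S^0$ and the contradiction follows from the fact that $\mu\bar{x} \in C$ implies $\alpha \notin C^0$.
Therefore, we conclude that $\operatorname{conv} V_{\operatorname{conv} S} \subseteq C$.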
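Now we show that if $\operatorname{conv} S$ is closed and $0 \notin \operatorname{conv} S$, then $\operatorname{conv} V_{\operatorname{conv} S}$ is the answer to Question 3.13, i.e., it is the smallest closed convex $U$ such that $(U \cap S)^0 = S^0$.
Theorem 3.22. Let $S \subseteq \mathbb{R}^n$ be a closed set such that $\operatorname{conv} S$ is closed and $0 \notin \operatorname{conv} S$, i.e., $S^0 \ne \emptyset$. Then,
\[ (\operatorname{conv}(V_{\operatorname{conv} S}) \cap S)^0 = S^0. \]
Furthermore, if $C$ is closed and convex such that $(C \cap S)^0 = S^0$, then
\[ \operatorname{conv} V_{\operatorname{conv} S} \subseteq C. \]
Proof. We first show that $(\operatorname{conv}(V_{\operatorname{conv} S}) \cap S)^0 = S^0$. Clearly,
\[ V_{\operatorname{conv} S} \cap S \subseteq \operatorname{conv}(V_{\operatorname{conv} S}) \cap S \subseteq S. \]
Lemma 3.7 implies that
\[ S^0 \subseteq (\operatorname{conv}(V_{\operatorname{conv} S}) \cap S)^0 \subseteq (V_{\operatorname{conv} S} \cap S)^0. \]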
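Thus, it is enough to show that $(V_{\operatorname{conv} S} \cap S)^0 = S^0$. This follows from
\begin{align*}
(S \cap V_{\operatorname{conv} S})^0 &= (\operatorname{conv}(S \cap V_{\operatorname{conv} S}))^0 && \text{(Lemma 3.7)} \\
&= (\operatorname{conv} V_{\operatorname{conv} S})^0 && \text{(Lemma 3.19)} \\
&= (V_{\operatorname{conv} S})^0 && \text{(Lemma 3.7)} \\
&= (\operatorname{conv} S)^0 && \text{(Proposition 3.9)} \\
&= S^0. && \text{(Lemma 3.7)}
\end{align*}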
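To show the second statement of the theorem, let $C$ be a closed convex set such that $(C \cap S)^0 = S^0$. Lemma 3.7 implies that $(C \cap S)^0 = (\operatorname{conv}(C \cap S))^0$. Theorem 3.21 implies that $\operatorname{conv} V_{\operatorname{conv} S} \subseteq \operatorname{conv}(C \cap S)$. Clearly, $V_{\operatorname{conv} S} \subseteq \operatorname{conv} V_{\operatorname{conv} S}$ and $\operatorname{conv}(C \cap S) \subseteq C \cap \operatorname{conv} S$. Therefore, $V_{\operatorname{conv} S} \subseteq C \cap \operatorname{conv} S$, which implies $V_{\operatorname{conv} S} \subseteq C$ as we wanted.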
Finally, we answer Question 3.11 in the case where $S$ is compact.
Theorem 3.23. Let $S$ be any closed set such that $0 \notin \operatorname{conv} S$. If $D$ is any closed generator of $S^0$, then
\[ \operatorname{ext} V_{\operatorname{conv} S} \subseteq D. \]
If, in addition, $S$ is compact, then $\operatorname{ext} V_{\operatorname{conv} S}$ is the smallest closed generator of $S^0$.
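Proof. First, by Lemma 3.7 and $D^0 = S^0$, we have $\operatorname{shw} \operatorname{conv} D = \operatorname{shw} \operatorname{conv} S$. Then, Lemma 3.8 implies that $V_{\operatorname{conv} D} = V_{\operatorname{conv} S}$. Hence, $\operatorname{ext} V_{\operatorname{conv} D} = \operatorname{ext} V_{\operatorname{conv} S}$. Therefore, $\operatorname{ext} V_{\operatorname{conv} S} = \operatorname{ext} V_{\operatorname{conv} D} \subseteq \operatorname{ext} \operatorname{conv} D \subseteq D$, where the first and second containments are due to Lemma 3.18 and Lemma 3.20, respectively.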
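To prove the second statement, by Lemma 3.7, it is enough to show that $(\operatorname{ext} V_{\operatorname{conv} S})^0 = S^0$. First, as $\operatorname{ext} V_{\operatorname{conv} S} \subseteq \operatorname{conv} S$, we have $S^0 \subseteq (\operatorname{ext} V_{\operatorname{conv} S})^0$ by Lemma 3.7.
To prove the other containment take any $\alpha \in (\operatorname{ext} V_{\operatorname{conv} S})^0$. Let $x \in \operatorname{conv} S$ be arbitrary. We will prove that $\alpha^T x \ge 1$. This will imply that $\alpha \in (\operatorname{conv} S)^0 = S^0$ and, therefore, that $(\operatorname{ext} V_{\operatorname{conv} S})^0 \subseteq S^0$.
Let $\lambda \in (0,1]$ be such that $\lambda x \in V_{\operatorname{conv} S}$. If $\lambda x \in \operatorname{ext} V_{\operatorname{conv} S}$, then $\alpha^T \lambda x \ge 1$, which implies that $\alpha^T x \ge \frac{1}{\lambda} \ge 1$.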
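Now, assume $\lambda x \notin \operatorname{ext} V_{\operatorname{conv} S}$. Since $S$ is compact, $\operatorname{conv} S$ is closed and we can use Lemma 3.18 to obtain that $\operatorname{ext} V_{\operatorname{conv} S} = V_{\operatorname{conv} S} \cap \operatorname{ext} \operatorname{conv} S$. Thus, $\lambda x \notin \operatorname{ext} \operatorname{conv} S$. Also by the compactness of $S$, Rockafellar (1970, Theorem 18.5.1) implies that $\lambda x$ is a strict convex combination of some $x_1, \ldots, x_m \in \operatorname{ext} \operatorname{conv} S$.
Lemma 3.17 implies that $x_1, \ldots, x_m \in V_{\operatorname{conv} S}$ and so Lemma 3.18 implies that $x_1, \ldots, x_m \in \operatorname{ext} V_{\operatorname{conv} S}$. Since $\alpha \in (\operatorname{ext} V_{\operatorname{conv} S})^0$, it follows that $\alpha^T x_i \ge 1$ for every $i = 1, \ldots, m$. Hence, $\alpha^T \lambda x \ge 1$ and, as before, $\alpha^T x \ge \frac{1}{\lambda} \ge 1$.
We remark that the closure operation is needed since the extreme points of a set, in general, do not form a closed set, see Rockafellar (1970, p. 167).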
3.4 Applications to MINLP
Here we apply the results from Section 3.2 to MINLP.
In this section, unless specified otherwise, $\bar{x} \in \mathbb{R}^n$, $C$ is a closed convex set that contains $\bar{x}$, and $S := \{x \in C : g(x) \le 0\}$, where $g : C \to \mathbb{R}$ is continuous and $g(\bar{x}) > 0$. The idea is that $C$ represents a convex relaxation of our MINLP and $\bar{x} \in C$ is the current relaxation solution that is infeasible for a constraint $g(x) \le 0$.
The basic scheme for applying our results is the following translation of Observation 3.1.
Proposition 3.24. Let $D \subseteq C$ be such that $(D \cap S)^{\bar{x}} = S^{\bar{x}}$, and $T = \{x \in D : g(x) \le 0\}$. If $\alpha^T(x - \bar{x}) \ge 1$ is a valid inequality for $T$, then it is valid for $S$.
Proof. Directly from $\alpha \in T^{\bar{x}} = (D \cap S)^{\bar{x}} = S^{\bar{x}}$.
Of course, the applicability of the previous proposition relies on our ability to obtain an easy-to-compute set $D$ that satisfies the hypothesis. As shown in Section 3.3, $D = \operatorname{ext} \operatorname{conv} V_{\operatorname{conv} S}(\bar{x})$ is the smallest we can hope for, but it is useless from a practical point of view. Instead, the set of visible points of $S$ (or a set enclosing them) is, computationally, a better candidate, as we will see in Section 3.4.1.
Corollary 3.25. Let $D \subseteq C$ be such that $V_S(\bar{x}) \subseteq D$, and $T = \{x \in D : g(x) \le 0\}$. If $\alpha^T(x - \bar{x}) \ge 1$ is a valid inequality for $T$, then it is valid for $S$.
Proof. Clearly, $V_S(\bar{x}) \subseteq T = D \cap S \subseteq S$. The inclusion-reversing property of the reverse polar implies that $S^{\bar{x}} \subseteq (D \cap S)^{\bar{x}} \subseteq V_S(\bar{x})^{\bar{x}} = S^{\bar{x}}$, where the last equality follows from Proposition 3.9. The statement follows from Proposition 3.24.
In the context of separation via convex underestimators, Corollary 3.25 reads as follows.
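Corollary 3.26. Let $D \subseteq C$ be a closed convex set such that $V_S(\bar{x}) \subseteq D$, and let $T = \{x \in D : g(x) \le 0\}$. If $g^{\mathrm{vex}}_D(\bar{x}) > 0$ and $\partial g^{\mathrm{vex}}_D(\bar{x}) \ne \emptyset$, then a gradient cut of $g^{\mathrm{vex}}_D$ at $\bar{x}$ is valid for $S$.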
Proof. Let $T_r = \{x \in D : g^{\mathrm{vex}}_D(x) \le 0\}$ and $v \in \partial g^{\mathrm{vex}}_D(\bar{x})$. The cut $g^{\mathrm{vex}}_D(\bar{x}) + v^T(x - \bar{x}) \le 0$ is valid for $T_r$, and separates $\bar{x}$ from $T_r$. Since $T_r$ is a relaxation, i.e., $T \subseteq T_r$, it follows that the cut is also valid for $T$, and Corollary 3.25 implies its validity for $S$.
The previous result tells us that if we find a box, tighter than the bounds, that contains the visible points, then we might be able to construct tighter underestimators. However, to compute a box containing $V_S(\bar{x})$ we need to know what $V_S(\bar{x})$ looks like. That is the topic of the next section.
3.4.1 Characterizing the Visible Points
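From the definition of visible points we have:
Theorem 3.27. Let $g : \mathbb{R}^n \to \mathbb{R}$ be a continuous function, $C \subseteq \mathbb{R}^n$ a closed convex set, and $S = \{x \in C : g(x) \le 0\}$. If $\bar{x} \in C$ and $g(\bar{x}) > 0$, then
\[ V_S(\bar{x}) = \{x \in C : g(x) = 0,\ g(x + \lambda(\bar{x} - x)) > 0 \text{ for every } \lambda \in (0,1]\}. \tag{3.1} \]
Furthermore, if $g$ is differentiable, then every $x \in V_S(\bar{x})$ satisfies $\nabla g(x)^T(\bar{x} - x) \ge 0$.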
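Proof. Given that $\bar{x} \notin S$, by definition we have $x \in V_S(\bar{x})$ if and only if $x \in S$ and for every $\lambda \in (0,1]$, $x + \lambda(\bar{x} - x) \notin C$ or $g(x + \lambda(\bar{x} - x)) > 0$. However, the convexity of $C$ and $\bar{x} \in C$ imply that for $x \in S$, $x + \lambda(\bar{x} - x) \in C$. Hence,
\[ V_S(\bar{x}) = \{x \in C : g(x) \le 0,\ g(x + \lambda(\bar{x} - x)) > 0 \text{ for every } \lambda \in (0,1]\}. \]
Since $g$ is continuous, it follows that for $x \in V_S(\bar{x})$,
\[ 0 \ge g(x) = \lim_{\lambda \to 0^+} g(x + \lambda(\bar{x} - x)) \ge 0. \]
Thus, $g(x) = 0$, which proves (3.1).
Now, assume that $g$ is differentiable and let $x \in V_S(\bar{x})$. Then,
\[ 0 \le \lim_{\lambda \to 0^+} \frac{g(x + \lambda(\bar{x} - x))}{\lambda} = \lim_{\lambda \to 0^+} \frac{g(x + \lambda(\bar{x} - x)) - g(x)}{\lambda} = \nabla g(x)^T(\bar{x} - x). \]
This concludes the proof.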
Remark 3.28. Note that if we drop the hypothesis that $\bar{x}$ is in $C$, then there might be visible points for which $g$ is strictly negative, and there does not seem to be a nice description of the visible points. In such a case, $V_S(\bar{x})$ would be a disjunctive set and we would even lose the valid (non-linear) inequality $\nabla g(x)^T(\bar{x} - x) \ge 0$. Likewise, if $C$ was not convex, or if we had more than one non-convex constraint, e.g., some variable has to be binary, then there does not seem to be a nice description of the visible points. This last point is rather unfortunate: it means that it might not be easy to generalize the technique to relaxations that involve more than one non-convex constraint. In particular, since a mixed-integer set usually consists of multiple non-convex constraints, the techniques presented here might not be applicable to MILPs. On the other hand, considering more constraints might allow us to see more of the feasible region. Therefore, in such cases one might have to try to use stronger generators such as $\operatorname{conv} V_{\operatorname{conv} S}$, see also Venkatachalam and Ntaimo (2016).
Quadratic constraints
For quadratic constraints, the visible points have a particularly simple description.
Theorem 3.29. Let $C$ be a closed, convex set that contains $\bar{x}$. Let $g(x) = x^T Q x + b^T x + c$ and $S = \{x \in C : g(x) \le 0\}$. If $g(\bar{x}) > 0$, then
\[ V_S(\bar{x}) = \{x \in C : g(x) = 0,\ \nabla g(\bar{x})^T x + b^T \bar{x} + 2c \ge 0\}. \]
Proof. ($\subseteq$) Let $x \in V_S(\bar{x})$. By Theorem 3.27, we have $g(x) = 0$ and $\nabla g(x)^T(\bar{x} - x) \ge 0$. Equivalently,
\begin{align*}
x^T Q x + b^T x + c &= 0, \\
2x^T Q(\bar{x} - x) + b^T(\bar{x} - x) &\ge 0.
\end{align*}
By multiplying the equation by 2, adding it to the inequality, and re-arranging terms we obtain the result.
($\supseteq$) Let $x \in C$ satisfy $g(x) = 0$ and $\nabla g(\bar{x})^T x + b^T \bar{x} + 2c \ge 0$. Then, subtracting $2g(x)$ from $\nabla g(\bar{x})^T x + b^T \bar{x} + 2c \ge 0$ yields $\nabla g(x)^T(\bar{x} - x) \ge 0$. Let
\[ q(\lambda) = g(x + \lambda(\bar{x} - x)), \quad \text{for } \lambda \in \mathbb{R}. \]
The derivative is given by $q'(\lambda) = \nabla g(x + \lambda(\bar{x} - x))^T(\bar{x} - x)$, and $q'(0) = \nabla g(x)^T(\bar{x} - x) \ge 0$. Since $q$ is quadratic, $q(1) = g(\bar{x}) > 0$, $q(0) = g(x) = 0$, and $q'(0) \ge 0$, we have that $q$ has no roots in $(0,1]$. Thus, $g(x + \lambda(\bar{x} - x)) = q(\lambda) > 0$ for every $\lambda \in (0,1]$ and, from Theorem 3.27, we conclude that $x \in V_S(\bar{x})$ as we wanted.
Remark 3.30. Theorem 3.29 implies in particular that the set of visible points of a closed convex set intersected with a quadratic constraint, from a point in the convex set, is always closed. This does not contradict Deutsch et al. (2013, Example 15.5) mentioned in Remark 3.16. Indeed, if one represents the cone as a quadratic constraint $g(x) \le 0$, then the origin must be feasible for the quadratic constraint. This follows from the fact that the ray $[1, \infty)(1,0,0)$ is in the boundary of the cone, which implies that $g(\lambda, 0, 0) = 0$ for $\lambda \ge 1$. But $g(\lambda, 0, 0)$ is a univariate quadratic function and as such can have at most two roots if it is nonzero. Hence, $g(\lambda, 0, 0)$ is identically zero and, in particular, $g(0,0,0) = 0$.
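Remark 3.31. The hyperplane $\nabla g(\bar{x})^T x + b^T \bar{x} + 2c = 0$ is known as the polar hyperplane (Fasano and Pesenti, 2017) of the point $\bar{x}$ with respect to the quadratic $g$ in projective geometry. In fact, homogenizing the quadratic $g$ yields the quadric
\[ g^h(x, x_0) = x^T Q x + b^T x\, x_0 + c\, x_0^2 = \begin{pmatrix} x \\ x_0 \end{pmatrix}^T \begin{pmatrix} Q & \frac{b}{2} \\ \frac{b^T}{2} & c \end{pmatrix} \begin{pmatrix} x \\ x_0 \end{pmatrix}. \]
The polar hyperplane of $\begin{pmatrix} \bar{x} \\ 1 \end{pmatrix}$ with respect to $g^h(x, x_0) = 0$ is then given by
\[ \nabla g^h(x, x_0)^T (\bar{x}, 1) = 0 \iff 2\bar{x}^T Q x + b^T \bar{x}\, x_0 + b^T x + 2c\, x_0 = 0. \]
Intersecting with $x_0 = 1$ yields $\nabla g(\bar{x})^T x + b^T \bar{x} + 2c = 0$.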
Example 3.32. Consider the function
\[ g(x_1, x_2, x_3) = -x_1 x_2 + x_1 x_3 + x_2 x_3 - x_1 - x_2 - x_3 + 1, \]
the boxed domain $B = [-\tfrac{1}{10}, 2] \times [0,2]^2$, the constrained set
\[ S = \{x \in B : g(x) \le 0\}, \]
and the infeasible point $\bar{x} = (0,0,0)$. By Theorem 3.29, with $\nabla g(\bar{x}) = b = (-1,-1,-1)$ and $c = 1$, the visible points from $\bar{x}$ are given by
\[ V_S(\bar{x}) = \{(x_1, x_2, x_3) \in B : g(x) = 0,\ x_1 + x_2 + x_3 \le 2\}, \]
as shown in Figure 3.3.
The tightest box bounding $V_S(\bar{x})$ is
\[ R = [-\tfrac{1}{10}, 1] \times [0, \tfrac{1}{20}(23 + 3\sqrt{5})] \times [0, \tfrac{1}{20}(19 + 3\sqrt{5})]. \]
The linear underestimators of $g$ obtained by using McCormick inequalities (McCormick, 1976) for each term over $B$ and $R$ are
\[ 1 \le x_1 + 3x_2 + \tfrac{11}{10} x_3 \quad \text{and} \quad 1 \le x_1 + 2x_2 + \tfrac{11}{10} x_3, \]
respectively. Since $0 \le x_2$, it follows that the underestimator over $R$ dominates the underestimator over $B$. We remark that the improvement in this particular cut is only due to the improvement on the upper bound of $x_1$.
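The description of the visible points in Example 3.32 can be cross-checked against condition (3.1) directly. The sketch below is an illustration under stated assumptions, not part of the thesis: it only looks at points of $\{g = 0\}$ whose first two coordinates lie on a grid of step $1/10$, and it tests positivity along the segment only at $\lambda = 2^{-k}$. Exact rational arithmetic is used so that $g(x) = 0$ holds exactly. The linear condition $x_1 + x_2 + x_3 \le 2$ is what Theorem 3.29 gives for $\nabla g(\bar{x}) = b = (-1,-1,-1)$, $b^T\bar{x} = 0$ and $c = 1$. The function names are hypothetical.

```python
from fractions import Fraction as F

XBAR = (F(0), F(0), F(0))

def g(x):
    x1, x2, x3 = x
    return -x1 * x2 + x1 * x3 + x2 * x3 - x1 - x2 - x3 + 1

def visible_by_definition(x, xbar=XBAR, depth=40):
    """Check g(x + lam*(xbar - x)) > 0 on the finite grid lam = 2^-k, k = 0..depth."""
    for k in range(depth + 1):
        lam = F(1, 2 ** k)
        point = tuple(xi + lam * (bi - xi) for xi, bi in zip(x, xbar))
        if g(point) <= 0:
            return False
    return True

def visible_by_theorem(x):
    """Linear condition of Theorem 3.29 specialised to this g and xbar = 0."""
    return 2 - (x[0] + x[1] + x[2]) >= 0

# Points of {x in B : g(x) = 0}: pick x1, x2 on a grid and solve g = 0 for x3.
points = []
for i in range(-1, 21):
    for j in range(0, 21):
        x1, x2 = F(i, 10), F(j, 10)
        if x1 + x2 == 1:
            continue  # g is not solvable for x3 here
        x3 = (x1 * x2 + x1 + x2 - 1) / (x1 + x2 - 1)
        if 0 <= x3 <= 2:
            points.append((x1, x2, x3))

mismatches = [x for x in points if visible_by_definition(x) != visible_by_theorem(x)]
```

For instance, $(0,0,1)$ satisfies both tests, while $(2,1,2)$ lies on $g = 0$ but is hidden behind another part of $S$ and fails both.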
Figure 3.3: The left plot shows the feasible region $S$ and $\bar{x}$. The set $\{x \in B : g(x) = 0\}$ appears in the middle plot. Finally, the visible points, $V_S(\bar{x})$, are plotted on the right.
Polynomial constraints
For a general polynomial $g$, the condition
\[ g(x + \lambda(\bar{x} - x)) > 0 \text{ for every } \lambda \in (0,1] \tag{3.2} \]
of (3.1) asks for the univariate polynomial $p_x(\lambda) = g(x + \lambda(\bar{x} - x))$ to be positive on $(0,1]$. We can then use the theory of non-negative polynomials to translate a relaxation of the infinitely many constraints (3.2) to a finite number of constraints. From the following classic characterization of univariate non-negative polynomials on intervals, see for instance Powers and Reznick (2000), we can derive an extended formulation for the relaxation of (3.1),
\[ R_S(\bar{x}) := \{x \in C : g(x) = 0,\ g(x + \lambda(\bar{x} - x)) \ge 0 \text{ for every } \lambda \in [0,1]\}. \]
Theorem 3.33. Let $p \in \mathbb{R}[\lambda]$ be a polynomial. Then $p$ is non-negative on $[0,1]$ if and only if
1. the degree of $p$ is $2d$ and there exist $s_1, s_2 \in \mathbb{R}[\lambda]$ of degree $d$ and $d - 1$, respectively, such that
\[ p(\lambda) = s_1(\lambda)^2 + \lambda(1 - \lambda)s_2(\lambda)^2. \]
2. the degree of $p$ is $2d + 1$ and there exist $s_1, s_2 \in \mathbb{R}[\lambda]$ of degree $d$, such that
\[ p(\lambda) = \lambda s_1(\lambda)^2 + (1 - \lambda)s_2(\lambda)^2. \]
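Theorem 3.34. Let $C$ be a closed convex set that contains $\bar{x}$. Let $g(x)$ be a polynomial such that $g(\bar{x}) > 0$ and $S = \{x \in C : g(x) \le 0\}$. Let $p_x(\lambda) = g(x + \lambda(\bar{x} - x))$.
1. If the degree of $g$ is $2d$, then $R_S(\bar{x}) = \operatorname{proj}_x E$, where $E$ is
\begin{align*}
\Big\{ (x, A, B) \in C \times \mathcal{S}^d_+ \times \mathcal{S}^d_+ :\ & g(x) = 0, \\
& p'_x(0) = B_{00}, \\
& \frac{p^{(k+2)}_x(0)}{(k+2)!} = \sum_{\substack{i+j=k \\ 0 \le i,j \le d-1}} (A_{ij} - B_{ij}) + \sum_{\substack{i+j=k+1 \\ 0 \le i,j \le d-1}} B_{ij}, \text{ for } 0 \le k \le 2d-2 \Big\}.
\end{align*}
2. If the degree of $g$ is $2d + 1$, then $R_S(\bar{x}) = \operatorname{proj}_x E$, where $E$ is
\begin{align*}
\Big\{ (x, A, B) \in C \times \mathcal{S}^{d+1}_+ \times \mathcal{S}^d_+ :\ & g(x) = 0, \\
& p'_x(0) = A_{00}, \\
& \frac{p''_x(0)}{2} = 2A_{01} + B_{00}, \\
& \frac{p^{(k+3)}_x(0)}{(k+3)!} = \sum_{\substack{i+j=k+2 \\ 0 \le i,j \le d}} A_{ij} + \sum_{\substack{i+j=k+1 \\ 0 \le i,j \le d-1}} B_{ij} - \sum_{\substack{i+j=k \\ 0 \le i,j \le d-1}} B_{ij}, \text{ for } 0 \le k \le 2d-2 \Big\}.
\end{align*}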
Proof. We just prove the case of even degree, as the proof for the odd degree case is similar. We have $x \in R_S(\bar{x})$ if and only if $p_x(0) = 0$ and $p_x(\lambda)$ is non-negative on $[0,1]$. By Theorem 3.33, this is equivalent to $p_x(0) = 0$ and there exist polynomials $s_1, s_2$ of degree $d$ and $d - 1$, respectively, such that
\[ p_x(\lambda) = s_1(\lambda)^2 + \lambda(1 - \lambda)s_2(\lambda)^2. \]
Given that $0 = p_x(0) = s_1(0)^2$, the polynomial $s_1$ has a root at 0 and we can write it as $s_1(\lambda) = \lambda r_1(\lambda)$, where $r_1$ is a polynomial of degree $d - 1$. Thus, $x \in R_S(\bar{x})$ if and only if $p_x(0) = 0$ and there exist polynomials $r_1, r_2$ of degree $d - 1$ such that
\[ p_x(\lambda) = \lambda^2 r_1(\lambda)^2 + \lambda(1 - \lambda)r_2(\lambda)^2. \]
Let $\Lambda = (1, \lambda, \ldots, \lambda^{d-1})^T$. The polynomials $r_i$ can be written as $r_i = c_i^T \Lambda$ for some $c_i \in \mathbb{R}^d$. Then, $r_1(\lambda)^2 = \Lambda^T A \Lambda$ and $r_2(\lambda)^2 = \Lambda^T B \Lambda$ for some $A, B \in \mathcal{S}^d_+$. Thus, $x \in R_S(\bar{x})$ if and only if $p_x(0) = 0$ and there exist $A, B \in \mathcal{S}^d_+$ such that
\[ p_x(\lambda) = \lambda^2 \Lambda^T A \Lambda + \lambda(1 - \lambda)\Lambda^T B \Lambda. \]
Since $p_x(\lambda)$ is a polynomial of degree $2d$, its Taylor expansion at 0 yields
\[ p_x(\lambda) = \sum_{k=1}^{2d} \frac{p^{(k)}_x(0)}{k!} \lambda^k. \]
Identifying coefficients, we conclude the theorem.
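The coefficient identification in the even-degree case can be checked on a small instance. The following sketch is only an illustration: it assumes rank-one matrices $A = c_1 c_1^T$ and $B = c_2 c_2^T$ with random integer vectors $c_1, c_2$, expands $\lambda^2 r_1(\lambda)^2 + \lambda(1-\lambda) r_2(\lambda)^2$ by plain polynomial multiplication, and compares the result with the sums over $A_{ij}$ and $B_{ij}$ that define $E$. All function names are hypothetical.

```python
import random

def poly_mul(p, q):
    """Multiply two polynomials given as ascending coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def direct_coefficients(c1, c2):
    """Coefficients of lam^2 * r1(lam)^2 + lam * (1 - lam) * r2(lam)^2, with r_i = c_i^T Lambda."""
    first = poly_mul([0, 0, 1], poly_mul(c1, c1))
    second = poly_mul([0, 1, -1], poly_mul(c2, c2))
    return [a + b for a, b in zip(first, second)]

def outer(c):
    """Rank-one Gram matrix c c^T."""
    return [[ci * cj for cj in c] for ci in c]

def matrix_coefficients(A, B):
    """Coefficients of lam^1 .. lam^(2d) as written in the even-degree set E."""
    d = len(A)
    coef = [0] * (2 * d + 1)
    coef[1] = B[0][0]
    for k in range(2 * d - 1):  # k = 0, ..., 2d - 2
        pairs_k = [(i, k - i) for i in range(d) if 0 <= k - i < d]
        pairs_k1 = [(i, k + 1 - i) for i in range(d) if 0 <= k + 1 - i < d]
        coef[k + 2] = (sum(A[i][j] - B[i][j] for i, j in pairs_k)
                       + sum(B[i][j] for i, j in pairs_k1))
    return coef

random.seed(1)
d = 3
c1 = [random.randint(-5, 5) for _ in range(d)]
c2 = [random.randint(-5, 5) for _ in range(d)]
lhs = direct_coefficients(c1, c2)
rhs = matrix_coefficients(outer(c1), outer(c2))
```

Both lists have $2d + 1$ entries, the constant term is zero (reflecting $p_x(0) = 0$), and they agree entry by entry.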
Remark 3.35. One could also add the constraints $\operatorname{rk}(A) = \operatorname{rk}(B) = 1$ to $E$ in the statement of Theorem 3.34. The correctness can easily be seen from the proof since $A = c_1 c_1^T$ and $B = c_2 c_2^T$. Although it makes the set more restricted, the rank constraint is non-convex and does not change the projection. Thus, we decided to leave it out.
We can recover Theorem 3.29 from Theorem 3.34. The set $E$ of Theorem 3.34 for the quadratic case ($d = 1$) is described by $g(x) = 0$, $p'_x(0) = B_{00}$ and $p''_x(0)/2 = A_{00} - B_{00}$, where $A_{00}, B_{00} \ge 0$. This implies that $0 < g(\bar{x}) = p_x(1) = p'_x(0) + p''_x(0)/2 = A_{00}$. Therefore, $R_S(\bar{x})$ consists of the $x$ such that $p_x(0) = 0$ and $p'_x(0) \ge 0$. This last constraint is equivalent to $\nabla g(x)^T(\bar{x} - x) \ge 0$, which is the only constraint needed, apart from $g(x) = 0$, to prove Theorem 3.29.
Figure 3.4: Feasible region $g(x) \le 0$ of Example 3.36 that shows that $\operatorname{cl} V_S(\bar{x}) \ne R_S(\bar{x})$ when the degree of $g$ is greater than 2.
The previous deduction is only possible because $V_S(\bar{x}) = R_S(\bar{x})$ holds for a quadratic constraint. This equality does not hold as soon as the degree is greater than 2, even after replacing $V_S(\bar{x})$ by its closure, as shown in the following example.
Example 3.36. Consider $g(x_1, x_2) = (x_1^2 + x_2^2 - 1)x_1$, $S = \{(x_1, x_2) : g(x_1, x_2) \le 0\}$, and $\bar{x} = (1, -2)$. The set $S$ consists of the right half of the unit ball and the half space $x_1 \le 0$ without the interior of the left half of the unit ball, see Figure 3.4. The point $z = (-1, 0)$ is not visible from $\bar{x}$, because
\[ g(z + \lambda(\bar{x} - z)) = g(-1 + 2\lambda, -2\lambda) = ((2\lambda - 1)^2 + 4\lambda^2 - 1)(2\lambda - 1) = 4\lambda(2\lambda - 1)^2 \]
is zero at $\lambda = \tfrac{1}{2}$. On the other hand, $z \in R_S(\bar{x})$ since $4\lambda(2\lambda - 1)^2 \ge 0$ for every $\lambda \in [0,1]$. In this example $V_S(\bar{x})$ is closed, so we conclude that $\operatorname{cl} V_S(\bar{x}) \ne R_S(\bar{x})$.
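The computation in Example 3.36 is easy to replay. The sketch below, a small illustration with hypothetical helper names, evaluates $g$ exactly along the segment from $z$ to $\bar{x}$ on the grid $\lambda = k/100$, compares it with the closed form $4\lambda(2\lambda - 1)^2$, and lists the grid values of $\lambda > 0$ at which the segment touches $S$. The non-negativity check is only performed on the grid; the closed form is what guarantees it on all of $[0,1]$.

```python
from fractions import Fraction as F

def g(x1, x2):
    return (x1 * x1 + x2 * x2 - 1) * x1

XBAR = (F(1), F(-2))
Z = (F(-1), F(0))

def along_segment(lam):
    """g evaluated at z + lam * (xbar - z)."""
    return g(Z[0] + lam * (XBAR[0] - Z[0]), Z[1] + lam * (XBAR[1] - Z[1]))

grid = [F(k, 100) for k in range(101)]

# The restriction of g to the segment matches 4*lam*(2*lam - 1)^2 on the grid.
closed_form_ok = all(along_segment(l) == 4 * l * (2 * l - 1) ** 2 for l in grid)

# z satisfies the relaxed condition defining R_S(xbar) on the grid ...
in_relaxation = g(*Z) == 0 and all(along_segment(l) >= 0 for l in grid)

# ... but the segment meets S again at some lam > 0, so z is not visible.
blocking = [l for l in grid if l > 0 and along_segment(l) <= 0]
```

The only blocking value on the grid is $\lambda = 1/2$, which corresponds to the point $(0, -1) \in S$.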
3.5 Conclusions and Outlook
Using the concept of visible points, we introduced a technique that allows us to reduce the domains in separation problems. Such a result is particularly interesting for MINLP, since the tightness of the domain directly affects the quality of underestimators, from which cuts are obtained.
Some questions that could be interesting to look at in the future are the following. Is there a tighter domain other than $V_S$ that can be efficiently exploited? Is there a useful characterization of $V_S$ when $S$ contains more than one non-convex constraint, in particular, if some variables are restricted to be integer?
Chapter 4
Intersection Cuts for Factorable
Mixed-Integer Nonlinear Programming
We now move to our final stop, intersection cuts (see Section 1.2). In this chapter we develop a technique for constructing $S$-free sets where $S = \{x : f(x) \le 0\}$ and $f$ is an arbitrary factorable function. In the next chapter we specialize to the case where $f$ is quadratic and we construct maximal $S$-free sets.
In order to build an $S$-free set for the case that $f$ is factorable, we develop a procedure that constructs a concave underestimator of $f$ that is tight at a given point. A peculiarity of these underestimators is that they do not rely on a bounded domain. We propose a strengthening procedure for the intersection cuts that exploits the bounds of the domain. Finally, we propose an extension of monoidal strengthening to take advantage of the integrality of non-basic variables.
In Section 4.1 we introduce our setting, motivate intersection cuts for MINLP by making a parallel between branch and bound for MILP and MINLP, and describe the contributions of the chapter. In Section 4.2 we review some literature and related works. Then we jump right into the construction of concave underestimators in Section 4.3. The improvement using bound information is presented in Section 4.4, while our application of monoidal strengthening appears in Section 4.5. We offer a summary of the chapter in Section 4.6.
This chapter is based on the publication Serrano (2019).
4.1 Motivation
In this chapter we propose a procedure for generating intersection cuts for MINLP. We consider MINLP of the following form
\begin{equation}
\begin{aligned}
\max\ & c^T x \\
\text{s.t. } & g_j(x) \le 0, && j \in J \\
& Ax = b \\
& x_i \in \mathbb{Z}, && i \in I \\
& x \ge 0,
\end{aligned}
\tag{4.1}
\end{equation}
where $J = \{1, \ldots, l\}$ denotes the indices of the nonlinear constraints, $g_j : \mathbb{R}^n \to \mathbb{R}$ are assumed to be continuous and factorable (see Definition 4.1), $A \in \mathbb{R}^{m \times n}$, $c \in \mathbb{R}^n$, $b \in \mathbb{R}^m$, and $I \subseteq \{1, \ldots, n\}$ are the indices of the integer variables. We denote the set of feasible solutions by $S$ and a generic relaxation of $S$ by $R$, that is, $S \subseteq R$.
The current state of the art for solving MINLP to global optimality is via linear programming (LP), convex nonlinear programming and MILP relaxations of $S$, together with spatial branch and bound (Belotti et al., 2009; Kılınç and Sahinidis, 2017; Lin and Schrage, 2009; Misener and Floudas, 2014; Tawarmalani and Sahinidis, 2005; Vigerske and Gleixner, 2017). Let us recall, roughly, how LP-based spatial branch and bound works. The initial polyhedral relaxation is solved and yields $\bar{x}$. If the solution $\bar{x}$ is feasible for (4.1), we obtain an optimal solution. If not, we try to separate the solution from the feasible region. This is usually done by considering each violated constraint separately. Let $g(x) \le 0$ be a violated constraint of (4.1). If $g(\bar{x}) > 0$ and $g$ is convex, then $g(\bar{x}) + v^T(x - \bar{x}) \le 0$, where $v \in \partial g(\bar{x})$, is a valid cut. If $g$ is non-convex, then a convex underestimator $g^{\mathrm{vex}}$, that is, a convex function such that $g^{\mathrm{vex}}(x) \le g(x)$ over the feasible region, is constructed and, if $g^{\mathrm{vex}}(\bar{x}) > 0$, the previous cut is constructed for $g^{\mathrm{vex}}$. If the point cannot be separated, then we branch, that is, we select a variable $x_k$ in a violated constraint and split the problem into two problems, one with $x_k \le \bar{x}_k$ and the other one with $x_k \ge \bar{x}_k$.
Applying the previous procedure to the MILP case, that is, (4.1) with $J = \emptyset$, reveals a problem with this approach. In this case, the polyhedral relaxation is just the linear programming (LP) relaxation. Assuming that $\bar{x}$ is not feasible for the MILP, there is an $i \in I$ such that $\bar{x}_i \notin \mathbb{Z}$. Let us treat the constraint $x_i \in \mathbb{Z}$ as a nonlinear non-convex constraint represented by some function as $g(x_i) \le 0$. Then, $g(\bar{x}_i) > 0$. A convex underestimator $g^{\mathrm{vex}}$ of $g$ must satisfy that $g^{\mathrm{vex}}(z) \le 0$ for every $z \in \mathbb{R}$, since $g^{\mathrm{vex}}(z) \le g(z) \le 0$ for every $z \in \mathbb{Z}$ and $g^{\mathrm{vex}}$ is convex. Thus, separation is not possible and we need to branch. However, for the current state-of-the-art algorithms for MILP, cutting planes are a fundamental component (Achterberg and Wunderling, 2013).
Recall, from Section 1.2, that when solving the LP relaxation, we obtain $x_B = \bar{x}_B + R x_N$, where $B$ and $N$ are the indices of the basic and non-basic variables, respectively. Since $\bar{x}$ is infeasible for the MILP, there must be some $k \in B \cap I$ such that $\bar{x}_k \notin \mathbb{Z}$. Now, even though $\bar{x}$ cannot be separated from the violated constraint $x_k \in \mathbb{Z}$, the equivalent constraint $\bar{x}_k + \sum_{j \in N} r_{kj} x_j \in \mathbb{Z}$ can be used to separate $\bar{x}$.
In the MINLP case, this framework generates equivalent non-linear constraints with some appealing properties, in particular, violated points can always be separated. The change of variables $x_k = \bar{x}_k + \sum_{j \in N} r_{kj} x_j$ for the basic variables present in a violated nonlinear constraint $g(x) \le 0$ produces the non-linear constraint $h(x_N) \le 0$, for which $h(0) > 0$ and $x_N \ge 0$. Assuming that the convex envelope of $h$ exists in $x_N \ge 0$, we can always construct a valid inequality. Indeed, by Tawarmalani and Sahinidis (2002, Corollary 3), the convex envelope of $h$ is tight at 0. Since an $\epsilon$-subgradient³ always exists for any $\epsilon > 0$ and $x \in \operatorname{dom} h$ (Brondsted and Rockafellar, 1965), an $\frac{h(0)}{2}$-subgradient, for instance, at 0 will separate it.
Even when there is no convex underestimator for $h$, a valid cutting plane does exist. Continuity of $h$ implies that $X = \{x_N \ge 0 : h(x_N) \le 0\}$ is closed and Conforti et al. (2015, Lemma 2.1) ensures that $0 \notin \operatorname{conv} X$, thus, a valid inequality exists. We introduce a technique to construct such a valid inequality. The idea is to build a concave underestimator of $h$, $h^{\mathrm{ave}}$, such that $h^{\mathrm{ave}}(0) = h(0) > 0$. Then, $C = \{x_N : h^{\mathrm{ave}}(x_N) \ge 0\}$ is an $S$-free set, that is, a convex set that does not contain any feasible point in its interior, and as such can be used to build an intersection cut (IC) (Tuy, 1964; Balas, 1971; Glover, 1973).
First contribution
In Section 4.3, we present a procedure to build concave underestimators for factorable functions that are tight at a given point. The procedure is similar to McCormick’s method for constructing convex underestimators, and generalizes Proposition 3.2 and improves Proposition 3.3 of Khamisov (1999). A simple way to build a concave underestimator of a function is to write the function as a difference of convex (d.c.) functions; then, by linearizing the convex part, a concave underestimator is obtained. However, even if a function is known to have a d.c. representation, it is not always clear
$^3$ An $\epsilon$-subgradient of a convex function $f$ at $y \in \operatorname{dom} f$ is a vector $v$ such that $f(x) \ge f(y) - \epsilon + v^T (x - y)$ for all $x \in \operatorname{dom} f$.
76 Intersection Cuts for Factorable MINLP
how to construct it.
These underestimators can be used to build intersection cuts. We note that an IC from a concave underestimator can generate cuts that cannot be generated by using the convex envelope. This should not be surprising, given that intersection cuts work at the feasible region level, while convex underestimators depend on the graph of the function. A simple example is $\{x \in [0, 2] : -x^2 + 1 \le 0\}$. When separating 0, the intersection cut gives $x \ge 1$, while using the convex envelope over $[0, 2]$ yields $x \ge 1/2$.
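The one-dimensional example above can be checked numerically. The following is a minimal sketch, not part of the original development: the helper names `envelope` and `first_root` are hypothetical, and it assumes that the convex envelope of a concave function over an interval is its secant. Because $-x^2 + 1$ is concave, it serves as its own concave underestimator, so the $S$-free set is $\{x : 1 - x^2 \ge 0\}$.

```python
def h(x):
    # Constraint function of the example: the feasible set is {x in [0, 2] : h(x) <= 0}.
    return 1.0 - x * x


def envelope(x, lo=0.0, hi=2.0):
    # Assumption: the convex envelope of a concave function on [lo, hi] is its secant.
    slope = (h(hi) - h(lo)) / (hi - lo)
    return h(lo) + slope * (x - lo)


def first_root(fun, a, b, tol=1e-12):
    # Bisection for the sign change of fun on [a, b], assuming fun(a) > 0 >= fun(b).
    while b - a > tol:
        m = 0.5 * (a + b)
        if fun(m) > 0.0:
            a = m
        else:
            b = m
    return 0.5 * (a + b)


# Intersection cut: follow the ray from 0 until it leaves the S-free set {h >= 0}.
ic_bound = first_root(h, 0.0, 2.0)
# Envelope cut: the relaxed constraint envelope(x) <= 0 holds from this point on.
env_bound = first_root(envelope, 0.0, 2.0)

print(f"intersection cut: x >= {ic_bound:.6f}")
print(f"envelope cut:     x >= {env_bound:.6f}")
```

The two bounds agree with the values stated in the text, $x \ge 1$ for the intersection cut and $x \ge 1/2$ for the envelope.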
There are many differences between concave underestimators and convex ones. Maybe the most interesting one is that concave underestimators do not need bounded domains to exist. As an extreme example, $-x^2$ is a concave underestimator of itself, but a convex underestimator only exists if the domain of $x$ is bounded. Even though this might be regarded as an advantage, it is also a problem. If concave underestimators are independent of the domain, then we cannot improve them when the domain shrinks.
Second contribution
In Section 4.4, we propose a strengthening procedure that uses the bounds of the variables to enlarge the $S$-free set. Our procedure improves on the one used by Tuy (1964).
Other techniques for strengthening IC have been proposed, such as exploiting the integrality of the non-basic variables (Balas and Jeroslow, 1980; Conforti et al., 2011a; Dey and Wolsey, 2010), improving the relaxation $R$ (Balas and Margot, 2011; Porembski, 1999, 2001) and computing the convex hull of $R \setminus C$ (Basu et al., 2011; Conforti et al., 2015; Glover, 1974; Sen and Sherali, 1986, 1987).
Third contribution
By interpreting IC as disjunctive cuts (Balas, 1979),
we extend the monoidal strengthening technique of Balas and Jeroslow (1980)
to our setting in Section 4.5. Although its applicability seems to be limited,
we think it is of independent interest, especially for MILP.
4.2 Literature Review and Related Work
There have been many efforts on generalizing cutting planes from MILP to MINLP; we refer the reader to Modaresi et al. (2015) and the references therein. Modaresi et al. (2015) study how to compute $\operatorname{conv}(R \setminus C)$ where $R$ is not polyhedral, but $C$ is a $k$-branch split. In practice, such sets $C$ usually come from the integrality of the variables. Works that build sets $C$ which do not come from integrality considerations include Belotti (2011); Bienstock et al. (2019); Fischetti et al. (2016, 2017); Fischetti and Monaci (2019); Saxena et al. (2010a,b). We refer to Bonami et al. (2011) and the references therein for more details.
Fischetti et al. (2016) applied intersection cuts to bilevel optimization.
Bienstock et al. (2016, 2019) studied outer-product-free sets; these can be
used for generating intersection cuts for polynomial optimization when using
an extended formulation. Fischetti and Monaci (2019) constructed bilinear-
free sets through a bound disjunction and, in each term of the disjunction,
underestimating the bilinear term with McCormick inequalities (McCormick,
1976). The complement of this disjunction is the bilinear-free set.
We would like to point out that the disjunctions built in Belotti (2011);
Fischetti and Monaci (2019); Saxena et al. (2010b,a) can be interpreted as
piecewise linear concave underestimators. However, our approach is not suit-
able for disjunctive cuts built through cut generating LPs (Balas et al., 1993),
since we generate infinite disjunctions, see Section 4.5, so we rely on the classic
concept of intersection cuts where $R$ is a translated simplicial cone.
Khamisov (1999) studies functions $f : \mathbb{R}^n \to \mathbb{R}$, representable as $f(x) = \max_{y \in R} \varphi(x, y)$ where $\varphi$ is continuous and concave on $x$. These functions allow for a concave underestimator at every point. He shows that this class of functions is very general; in particular, the class of functions representable as difference of convex functions is a strict subset of this class. He then proposes a procedure to build concave underestimators of composition of functions which is a special case of Theorem 4.4 below. He also suggests how to build an underestimator for the product of two functions over a compact domain. The construction is based on writing the product as a difference of convex functions and then using a construction for the square of a function. The construction of a convex overestimator of $f^2$ is based on a piecewise linear overestimation of the function $x^2$ over the range of $f$, which is why Khamisov needs a compact domain for $f$. We simplify the construction for the product and no longer need a compact domain. We still write the product as a d.c. function but we use Theorem 4.4 instead of a piecewise linear overestimator, allowing us to drop the compactness assumption.
Although not directly related to our work, other papers that use underestimators other than convex are Buchheim and D’Ambrosio (2016); Buchheim and Traversi (2013); Hasan (2018). We would also like to mention here the work of Towle and Luedtke (2019) that proposes a method for constructing valid cutting planes with a similar approach to intersection cuts, but allowing $\bar{x}$ to not be in the $S$-free set. The $S$-free sets developed in this chapter could also be used in their framework.
4.3 Concave Underestimators
In his seminal paper, McCormick (1976) proposed a method to build convex
underestimators of factorable functions.
Definition 4.1. Given a set of univariate functions $\mathcal{L}$, e.g., $\mathcal{L} = \{\cos, \cdot^n, \exp, \log, \ldots\}$, the set of factorable functions $\mathcal{F}$ is the smallest set that contains $\mathcal{L}$, the constant functions, and is closed under addition, product and composition.
As an example, $e^{-(\cos(x^2) + xy/4)^2}$ is a factorable function for $\mathcal{L} = \{\cos, \exp\}$.
Given the inductive definition of factorable functions, to show a property about them one just needs to show that said property holds for all the functions in $\mathcal{L}$, constant functions, and that it is preserved by the product, addition and composition. For instance, McCormick (1976) proves, constructively, that every factorable function admits a convex underestimator and a concave overestimator, by showing how to construct estimators for the sum, product and composition of two functions for which estimators are known.
An estimator for the sum of two functions is the sum of the estimators. For the product, McCormick uses the well-known McCormick inequalities. Less known is the way McCormick handles the composition $f(g(x))$. Let $f^{vex}$ be a convex underestimator of $f$ and $z_{\min} = \arg\min f^{vex}(z)$. Let $g^{vex}$ be a convex underestimator of $g$ and $g^{ave}$ a concave overestimator. McCormick shows$^4$ that $f^{vex}(\operatorname{mid}\{g^{vex}(x), g^{ave}(x), z_{\min}\})$ is a convex underestimator of $f(g(x))$, where $\operatorname{mid}\{x, y, z\}$ is the median between $x$, $y$ and $z$. It is well known that the optimum of a convex function over a closed interval is given by such a formula, thus
\[ f^{vex}(\operatorname{mid}\{g^{vex}(x), g^{ave}(x), z_{\min}\}) = \min\{f^{vex}(z) : z \in [g^{vex}(x), g^{ave}(x)]\}, \]
see also Tsoukalas and Mitsos (2014).
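The median formula can be illustrated on a toy instance. The sketch below is only an illustration under stated assumptions: the convex function `f_vex`, its minimizer `Z_MIN` and the three intervals are made up for the example, and the brute-force grid search stands in for the exact minimization over the interval.

```python
def mid(a, b, c):
    # Median of three numbers.
    return sorted((a, b, c))[1]


def f_vex(z):
    # Hypothetical convex function with unconstrained minimizer Z_MIN = 1.
    return (z - 1.0) ** 2


Z_MIN = 1.0


def min_on_interval(lo, hi, steps=20001):
    # Brute-force minimum of f_vex over an evenly spaced grid on [lo, hi].
    return min(f_vex(lo + (hi - lo) * k / (steps - 1)) for k in range(steps))


# The interval [lo, hi] plays the role of [g_vex(x), g_ave(x)] at a fixed x.
rows = []
for lo, hi in [(-2.0, 0.5), (0.0, 3.0), (1.5, 4.0)]:
    closed_form = f_vex(mid(lo, hi, Z_MIN))
    brute_force = min_on_interval(lo, hi)
    rows.append((lo, hi, closed_form, brute_force))
    print(lo, hi, closed_form, brute_force)
```

The three intervals cover the cases where the minimizer lies to the right of, inside, and to the left of the interval, so the median selects the upper endpoint, the minimizer itself, and the lower endpoint, respectively.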
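Definition 4.2. Let $\mathcal{X} \subseteq \mathbb{R}^n$ be convex, and $f : \mathcal{X} \to \mathbb{R}$ be a function. We say that $f^{ave} : \mathcal{X} \to \mathbb{R}$ is a concave underestimator of $f$ at $\bar{x} \in \mathcal{X}$ if $f^{ave}$ is concave, $f^{ave}(x) \le f(x)$ for every $x \in \mathcal{X}$ and $f^{ave}(\bar{x}) = f(\bar{x})$. Similarly we define a convex overestimator of $f$ at $\bar{x} \in \mathcal{X}$.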
Remark 4.3. For simplicity, we will consider only the case where $\mathcal{X} = \mathbb{R}^n$. This restriction leaves out some common functions like $\log$. One possibility to include these functions is to let the range of the function be $\mathbb{R} \cup \{\pm\infty\}$. Then, $\log(x) = -\infty$ for $x \in \mathbb{R}_-$. Note that other functions like $\sqrt{x}$ can be handled by replacing them by a concave underestimator defined on all $\mathbb{R}$.
$^4$ He actually leaves it as an exercise for the reader.
We now show that every factorable function admits a concave underesti-
mator at a given point. Since the case for the addition is easy, we just need to
specify how to build concave underestimators and convex overestimators for
– the product of two functions for which estimators are known,
– the composition $f(g(x))$ where estimators of $f$ and $g$ are known and $f$ is univariate.
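Theorem 4.4. Let $f : \mathbb{R} \to \mathbb{R}$ and $g : \mathbb{R}^n \to \mathbb{R}$. Let $g^{ave}$, $f^{ave}$ be, respectively, a concave underestimator of $g$ at $\bar{x}$ and of $f$ at $g(\bar{x})$. Further, let $g^{vex}$ be a convex overestimator of $g$ at $\bar{x}$. Then, $h : \mathbb{R}^n \to \mathbb{R}$ given by
\[ h(x) := \min\{f^{ave}(g^{ave}(x)), f^{ave}(g^{vex}(x))\}, \]
is a concave underestimator of $f \circ g$ at $\bar{x}$.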
Proof. Clearly, $h(\bar{x}) = f(g(\bar{x}))$.
To establish $h(x) \le f(g(x))$, notice that
\[ h(x) = \min\{f^{ave}(z) : g^{ave}(x) \le z \le g^{vex}(x)\}. \tag{4.2} \]
Since $z = g(x)$ is a feasible solution and $f^{ave}$ is an underestimator of $f$, we obtain that $h(x) \le f(g(x))$.
Now, let us prove that $h$ is concave. To this end, we again use the representation (4.2). To simplify notation, we write $g_1$, $g_2$ for $g^{ave}$, $g^{vex}$, respectively. We prove concavity by definition, that is,
\[ h(\lambda x_1 + (1 - \lambda) x_2) \ge \lambda h(x_1) + (1 - \lambda) h(x_2), \quad \text{for } \lambda \in [0, 1]. \]
Let
\[ I = [g_1(\lambda x_1 + (1 - \lambda) x_2),\; g_2(\lambda x_1 + (1 - \lambda) x_2)], \]
\[ J = [\lambda g_1(x_1) + (1 - \lambda) g_1(x_2),\; \lambda g_2(x_1) + (1 - \lambda) g_2(x_2)]. \]
By the concavity of $g_1$ and convexity of $g_2$ we have $I \subseteq J$. Therefore,
\[ h(\lambda x_1 + (1 - \lambda) x_2) = \min\{f^{ave}(z) : z \in I\} \ge \min\{f^{ave}(z) : z \in J\}. \]
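Since $f^{ave}$ is concave, the minimum is achieved at the boundary,
\[ \min\{f^{ave}(z) : z \in J\} = \min_{i \in \{1,2\}} f^{ave}(\lambda g_i(x_1) + (1 - \lambda) g_i(x_2)). \]
Furthermore, $f^{ave}(\lambda g_i(x_1) + (1 - \lambda) g_i(x_2)) \ge \lambda f^{ave}(g_i(x_1)) + (1 - \lambda) f^{ave}(g_i(x_2))$, which implies that
\begin{align*}
h(\lambda x_1 + (1 - \lambda) x_2) &\ge \min_{i \in \{1,2\}} \lambda f^{ave}(g_i(x_1)) + (1 - \lambda) f^{ave}(g_i(x_2)) \\
&\ge \min_{i \in \{1,2\}} \lambda f^{ave}(g_i(x_1)) + \min_{i \in \{1,2\}} (1 - \lambda) f^{ave}(g_i(x_2)) \\
&= \lambda h(x_1) + (1 - \lambda) h(x_2),
\end{align*}
as we wanted to show.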
Remark 4.5. The generalization of Theorem 4.4 to the case where $f$ is multivariate in the spirit of Tsoukalas and Mitsos (2014) is straightforward.
The computation of a concave underestimator and convex overestimator of the product of two functions reduces to the computation of estimators for the square of a function through the polarization identity
\[ f(x) g(x) = \tfrac{1}{4} (f(x) + g(x))^2 - \tfrac{1}{4} (f(x) - g(x))^2. \]
This identity is based on writing the product $x_1 x_2$ as a difference of convex functions. In particular, it can be proven by doing an eigenvalue decomposition of the Hessian of $x_1 x_2$.
Let $h : \mathbb{R}^n \to \mathbb{R}$ be a function for which we know estimators $h^{ave} \le h \le h^{vex}$ at $\bar{x}$, where, following Definition 4.2, $h^{ave}$ is a concave underestimator and $h^{vex}$ a convex overestimator. From Theorem 4.4, a convex overestimator of $h^2$ at $\bar{x}$ is given by $\max\{(h^{vex})^2, (h^{ave})^2\}$. On the other hand, a concave underestimator of $h^2$ at $\bar{x}$ can be constructed from the underestimator $h^2(x) \ge h^2(\bar{x}) + 2 h(\bar{x}) (h(x) - h(\bar{x}))$. From here we obtain
\[ \begin{cases} 2 h(\bar{x}) h^{vex}(x) - h^2(\bar{x}), & \text{if } h(\bar{x}) \le 0, \\ 2 h(\bar{x}) h^{ave}(x) - h^2(\bar{x}), & \text{if } h(\bar{x}) > 0. \end{cases} \tag{4.3} \]
Example 4.6. Let us compute a concave underestimator of $f(x) = e^{-(\cos(x^2) + x/4)^2}$ at 0. Estimators of $x^2$ are given by $0 \le x^2 \le x^2$. For $\cos(x)$, estimators are $\cos(x) - x^2/2 \le \cos(x) \le 1$. Then, a concave underestimator of $\cos(x^2)$ is, according to Theorem 4.4, $\min\{\cos(0) - 0^2/2, \cos(x^2) - x^4/2\} = \cos(x^2) - x^4/2$.
Figure 4.1: Concave underestimator (orange) and convex overestimator (green) of $\cos(x^2) + x/4$ (left), $-(\cos(x^2) + x/4)^2$ (middle) and $f(x)$ (right) at $x = 0$.
A convex overestimator is 1. Hence, $\cos(x^2) - x^4/2 + x/4 \le \cos(x^2) + x/4 \le 1 + x/4$.
Given that $-x^2$ is concave, a concave underestimator of $-(\cos(x^2) + x/4)^2$ is $\min\{-(\cos(x^2) - x^4/2 + x/4)^2, -(1 + x/4)^2\}$. To compute a convex overestimator of $-(\cos(x^2) + x/4)^2$, we compute a concave underestimator of $(\cos(x^2) + x/4)^2$. Since $\cos(x^2) + x/4$ at 0 is 1, (4.3) yields $2(\cos(x^2) - x^4/2 + x/4) - 1$.
Finally, a concave underestimator of $e^x$ at $x = -1$ is just its linearization, $e^{-1} + e^{-1}(x + 1)$, and so
\[ e^{-1} + e^{-1}\left(1 + \min\{-(\cos(x^2) - x^4/2 + x/4)^2, -(1 + x/4)^2\}\right) \]
is a concave underestimator of $f(x)$. The intermediate estimators as well as the final concave underestimator are illustrated in Figure 4.1.
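The final expression of Example 4.6 can be sanity-checked numerically. The sketch below is a spot check on a finite sample, not a proof: the function names, the sampling interval $[-3, 3]$, the random seed and the tolerances are all choices made for this illustration. It tests the three defining properties of Definition 4.2: tightness at 0, underestimation, and (midpoint) concavity.

```python
import math
import random


def f(x):
    # Function of Example 4.6.
    return math.exp(-((math.cos(x * x) + x / 4.0) ** 2))


def g_ave(x):
    # Concave underestimator of cos(x^2) + x/4, tight at 0.
    return math.cos(x * x) - x ** 4 / 2.0 + x / 4.0


def g_vex(x):
    # Convex overestimator of cos(x^2) + x/4, tight at 0.
    return 1.0 + x / 4.0


def f_under(x):
    # e^{-1} + e^{-1} * (1 + min{-(g_ave)^2, -(g_vex)^2}), as in the example.
    inner = min(-(g_ave(x) ** 2), -(g_vex(x) ** 2))
    return math.exp(-1.0) * (2.0 + inner)


grid = [-3.0 + 6.0 * k / 2000 for k in range(2001)]

# Tight at the reference point and below f on the whole grid.
tight = abs(f_under(0.0) - f(0.0)) < 1e-12
under = all(f_under(x) <= f(x) + 1e-12 for x in grid)

# Midpoint concavity on random pairs of points.
rng = random.Random(0)
pairs = [(rng.uniform(-3, 3), rng.uniform(-3, 3)) for _ in range(5000)]
concave = all(
    f_under(0.5 * (a + b)) >= 0.5 * (f_under(a) + f_under(b)) - 1e-8
    for a, b in pairs
)

print(tight, under, concave)
```

Note that the term $-(\cos(x^2) - x^4/2 + x/4)^2$ on its own need not be concave; it is the minimum of the two terms that Theorem 4.4 guarantees to be concave, which is what the midpoint test probes.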
For ease of exposition, in the rest of the chapter we assume that the concave
underestimator is differentiable. All results can be extended to the case where
the functions are only sub- or super-differentiable.
4.3.1 Concave Underestimators and Intersection Cuts for Convex
Constraints
Here we show that if we apply our procedure to construct an $S$-free set from a violated convex constraint and compute an intersection cut using the smallest representation (see Section 1.2), we just recover the gradient cut. Moreover, this gradient cut is the same one that we would have computed in the original space. In particular, the point is separable in the original space if and only if it is separable in the non-basic space. If one recalls that gradient cuts do not use bounds information, then this might not be surprising.
Let $g(x)$ be a differentiable convex function and consider the constraint $g(x) \le 0$. Suppose $x_B = f + R x_N$ is the current optimal tableau and $(x_B, x_N) = (f, 0)$ the optimal LP solution. Further, assume that $g(f, 0) > 0$.
Let $h(x_N) = g(f + R x_N, x_N)$ and note that this function is still convex since it is the composition of a convex function with an affine map. A concave
underestimator at 0 is just the linearization of $h$ at 0, that is,
\[ h(0) + \nabla h(0)^T x_N. \]
Then, the $S$-free set is $C = \{x_N : h(0) + \nabla h(0)^T x_N \ge 0\} = \{x_N : -\frac{1}{h(0)} \nabla h(0)^T x_N \le 1\}$. Thus the smallest representation is given by the sublinear function (actually, linear) $\rho(x_N) = -\frac{1}{h(0)} \nabla h(0)^T x_N$. In the space of the non-basic variables the rays are just $e_i$ for $i \in N$. Thus, the intersection cut is $\sum_{i \in N} \rho(e_i) x_i \ge 1$, that is, $-\frac{1}{h(0)} \nabla h(0)^T x_N \ge 1$. Manipulating the last expression we arrive at $h(0) + \nabla h(0)^T x_N \le 0$. This is the same as the gradient cut of $h$ at 0. Furthermore,
\begin{align*}
h(0) + \nabla h(0)^T x_N &= g(f, 0) + \nabla g(f, 0)^T \begin{pmatrix} R \\ I \end{pmatrix} x_N \\
&= g(f, 0) + \nabla g(f, 0)^T \begin{pmatrix} x_B - f \\ x_N \end{pmatrix}.
\end{align*}
This last expression is the gradient cut of $g$ at $(f, 0)$.
Thus, there is nothing to be gained from this approach for convex constraints.
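The derivation above can be traced on a tiny instance. This is a minimal sketch under assumptions: the convex function $g(x_B, x_N) = x_B^2 + x_N^2 - 1$ and the tableau row $x_B = 2 - x_N$ (so `F = 2`, `R = -1`) are hypothetical, chosen only so that $g(f, 0) > 0$, and there is a single non-basic variable.

```python
def g(x_b, x_n):
    # Hypothetical convex constraint function, g <= 0 is the unit disc.
    return x_b ** 2 + x_n ** 2 - 1.0


def grad_g(x_b, x_n):
    return (2.0 * x_b, 2.0 * x_n)


# Hypothetical tableau row x_B = F + R * x_N; the LP solution is (F, 0).
F, R = 2.0, -1.0

# h(x_N) = g(F + R * x_N, x_N); value and derivative at 0 via the chain rule,
# that is, grad_g(F, 0)^T (R, 1).
h0 = g(F, 0.0)
dg = grad_g(F, 0.0)
dh0 = dg[0] * R + dg[1] * 1.0

# Intersection cut: rho(e_1) * x_N >= 1 with rho(e_1) = -dh0 / h0.
rho_e1 = -dh0 / h0
ic_rhs = 1.0 / rho_e1  # cut reads x_N >= ic_rhs

# Gradient cut: h0 + dh0 * x_N <= 0, which reads x_N >= -h0 / dh0 because dh0 < 0.
grad_rhs = -h0 / dh0

print(h0, dh0, ic_rhs, grad_rhs)
```

Both cuts reduce to $x_N \ge 3/4$ on this instance, in line with the statement that the intersection cut recovers the gradient cut.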
An interesting observation, in connection to Chapter 2, is that the $S$-free set, either $\{x_N : h(0) + \nabla h(0)^T x_N \ge 0\}$ in the non-basic space or $\left\{x : g(f, 0) + \nabla g(f, 0)^T \begin{pmatrix} x_B - f \\ x_N \end{pmatrix} \ge 0\right\}$, is not going to be maximal if it does not support the constraint. In particular, if $g$ is strictly convex the $S$-free set is not maximal. This will be important in the next chapter.
Remark 4.7. Also, this already provides evidence that the $S$-free sets constructed by our approach will not be maximal in general. Assume we have a function $g$, $S = \{x : g(x) \le 0\}$, and we write $g$ as a difference of convex functions $g = f - h$. Say we linearize the function $f$ at a point $\bar{x}$ such that $g(\bar{x}) > 0$ to obtain $f \ge l$. Then, the concave underestimator is $l - h \le g$ and the $S$-free set is $\{x : l(x) - h(x) \ge 0\}$. If $f$ is strictly convex, we would have $l(x) < f(x)$ for every $x \ne \bar{x}$. This $S$-free set will not touch $S$. If it did, that is, if there is a point $x$ both in $S$ and the $S$-free set, then $f(x) - h(x) \ge l(x) - h(x) \ge 0 \ge g(x) = f(x) - h(x)$, thus $x = \bar{x}$ and $g(\bar{x}) = 0$, which contradicts our assumption.
This argument is very far from a proof since, first, our procedure does not really construct a d.c. decomposition, but rather uses a d.c. decomposition as an intermediate step for the product. Second, an $S$-free set does not need to touch $S$ in order to be maximal (see Chapter 5).
4.4 Enlarging the S-free Sets by Using Bound Information
In Section 4.3, we showed how to build concave underestimators which give us $S$-free sets. Note that the construction does not make use of the bounds of the domain. We can exploit the bounds of the domain by the observation that the concave underestimator only needs to underestimate within the feasible region. However, to preserve the convexity of the $S$-free set, we must ensure that the underestimator is still concave.
Let $h(x) \le 0$ be a constraint of (4.1), assume $x \in [l, u]$ and let $h^{ave}$ be a concave underestimator of $h$. Throughout this section, $S = \{x \in [l, u] : h(x) \le 0\}$. In order to construct a concave function $\hat{h}$ such that $\{x : \hat{h}(x) \ge 0\}$ contains $\{x : h^{ave}(x) \ge 0\}$, consider the following function
\[ \hat{h}(x) = \min\{h^{ave}(z) + \nabla h^{ave}(z)^T (x - z) : z \in [l, u],\ h^{ave}(z) \ge 0\}. \tag{4.4} \]
A similar function was already considered by Tuy (1964). The only difference is that Tuy’s strengthening does not use the restriction $h^{ave}(z) \ge 0$, see Figure 4.2.
Proposition 4.8. Let $h^{ave}$ be a concave underestimator of $h$ at $\bar{x} \in [l, u]$, such that $h(\bar{x}) > 0$. Define $\hat{h}$ as in (4.4). Then, the set $C = \{x : \hat{h}(x) \ge 0\}$ is a convex $S$-free set and $C \supseteq \{x : h^{ave}(x) \ge 0\}$.
Proof. The function $\hat{h}$ is concave since it is the minimum of linear functions. This establishes the convexity of $C$.
To show that $C \supseteq \{x : h^{ave}(x) \ge 0\}$, notice that $h^{ave}(x) = \min_z h^{ave}(z) + \nabla h^{ave}(z)^T (x - z)$. The inclusion follows from observing that the objective function in the definition of $\hat{h}(x)$ is the same as above, but over a smaller domain.
To show that it is $S$-free, we will show that for every $x \in [l, u]$ such that $h(x) \le 0$, $\hat{h}(x) \le 0$.
Let $x_0 \in [l, u]$ be such that $h(x_0) \le 0$. Since $h^{ave}$ is a concave underestimator at $\bar{x}$, $h^{ave}(\bar{x}) > 0$ and $h^{ave}(x_0) \le 0$. If $h^{ave}(x_0) = 0$, then, by definition, $\hat{h}(x_0) \le h^{ave}(x_0) = 0$ and we are done. We assume, therefore, that $h^{ave}(x_0) < 0$.
Consider $g(\lambda) = h^{ave}(\bar{x} + \lambda (x_0 - \bar{x}))$ and let $\lambda_1 \in (0, 1)$ be such that $g(\lambda_1) = 0$. The existence of $\lambda_1$ is justified by the continuity of $g$, $g(0) > 0$ and $g(1) < 0$. Equivalently, $x_1 = \bar{x} + \lambda_1 (x_0 - \bar{x})$ is the intersection point between the segment joining $x_0$ with $\bar{x}$ and $\{x : h^{ave}(x) = 0\}$. The linearization of $g$ at $\lambda_1$ evaluated at $\lambda = 1$ is negative, because $g$ is concave with $g(0) > 0 = g(\lambda_1)$, which forces $g'(\lambda_1) < 0$, and it equals $h^{ave}(x_1) + \nabla h^{ave}(x_1)^T (x_0 - x_1)$. Finally, given that $x_1 \in [l, u]$ and $h^{ave}(x_1) = 0$, $x_1$ is feasible for (4.4) and we conclude that $\hat{h}(x_0) < 0$.
Figure 4.2: Feasible region $\{x, y \in [0, 2] : h(x, y) \le 0\}$, where $h = x^2 - 2y^2 + 4xy - 3x + 2y + 1$, in blue together with $h^{ave}(x, y) \le 0$ at $\bar{x} = (1, 1)$ (left), Tuy’s strengthening (middle) and $\hat{h} \le 0$ (right) in orange. Region shown is $[0, 4]^2$; $[0, 2]^2$ is bounded by black lines. The difference between the $S$-free sets can be seen on the top of the picture.
In general, evaluating $\hat{h}$ is a difficult problem and there is no closed-form formula. However, when $h^{ave}$ is quadratic, the problem on the right-hand side of (4.4) is convex and a cut can be strengthened in polynomial time.
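To make the definition (4.4) concrete, here is a one-dimensional sketch. Everything in it is an assumption made for illustration: the concave function $h^{ave}(z) = 1 - z^2$, the bounds $[l, u] = [-0.5, 2]$, and the grid search over $z$, which merely approximates the minimization in (4.4). The flag `restrict=False` drops the condition $h^{ave}(z) \ge 0$, mimicking the variant attributed to Tuy in the text.

```python
def h_ave(z):
    # Hypothetical concave underestimator with h_ave(0) = 1 > 0.
    return 1.0 - z * z


def dh_ave(z):
    return -2.0 * z


L_BOUND, U_BOUND = -0.5, 2.0


def h_hat(x, restrict=True, steps=2501):
    # Grid approximation of (4.4): minimum over z of the tangent of h_ave at z,
    # evaluated at x, with z in [l, u] and (optionally) h_ave(z) >= 0.
    best = float("inf")
    for k in range(steps):
        z = L_BOUND + (U_BOUND - L_BOUND) * k / (steps - 1)
        if restrict and h_ave(z) < 0.0:
            continue
        best = min(best, h_ave(z) + dh_ave(z) * (x - z))
    return best


# The point -1.2 lies outside {h_ave >= 0} = [-1, 1] but inside {h_hat >= 0}.
enlarged = h_hat(-1.2) > 0.0 > h_ave(-1.2)

# S = {x in [l, u] : h_ave(x) <= 0} = [1, 2]; h_hat must be non-positive there.
s_points = [1.0 + k / 100 for k in range(101)]
s_free = all(h_hat(x) <= 1e-12 for x in s_points)

# Dropping the restriction minimizes over more points, so it can only decrease h_hat.
sample = [-2.0 + 4.0 * k / 40 for k in range(41)]
dominates = all(h_hat(x, restrict=False) <= h_hat(x) + 1e-12 for x in sample)

print(enlarged, s_free, dominates)
```

On this instance the set $\{\hat{h} \ge 0\}$ works out to $[-1.25, 1]$, which strictly contains $[-1, 1]$. In one dimension the restricted and unrestricted variants happen to produce the same set here, so the example illustrates the enlargement from bounds rather than the gap between the two variants.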
4.5 “Monoidal” Strengthening
We show how to strengthen cuts from reverse convex constraints when exactly one non-basic variable is integer. Our technique is based on monoidal strengthening applied to disjunctive cuts, see Lemma 4.10 and the discussion following it. If more than one variable is integer, we can generate one cut per integer variable, relaxing the integrality of all but one variable at a time. However, under some conditions (see Remark 4.12), we can exploit the integrality of several variables at the same time. For an introduction to monoidal strengthening see Section 1.4.
Throughout this section, we assume that we already have a concave underestimator, and that we have performed the change of variables described in the introduction. Therefore, we consider the constraint $\{x \in [0, u] : h(x) \le 0\}$ where $h : \mathbb{R}^n \to \mathbb{R}$ is concave and $h(0) > 0$. Let $Y = \{y \in [0, u] : h(y) = 0\}$. The convex $S$-free set $C = \{x \in [0, u] : h(x) \ge 0\}$ can be written as
\[ C = \bigcap_{y \in Y} \{x \in [0, u] : \nabla h(y)^T x \ge \nabla h(y)^T y\}. \]
The concavity of $h$ implies that $h(0) \le h(y) - \nabla h(y)^T y$ for all $y$ in the domain of $h$. In particular, if $y \in Y$, then $\nabla h(y)^T y \le -h(0) < 0$. Since all feasible
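points satisfy $h(x) \le 0$, they must satisfy the infinite disjunction
\[ \bigvee_{y \in Y} \frac{\nabla h(y)^T}{\nabla h(y)^T y} x \ge 1. \tag{4.5} \]
The maximum principle (see Section 1.4) implies that with
\[ \alpha_j = \max_{y \in Y} \frac{\partial_j h(y)}{\nabla h(y)^T y}, \tag{4.6} \]
the cut $\sum_j \alpha_j x_j \ge 1$ is valid. We remark that the maximum exists, since the concavity of $h$ implies that for $y \in Y$, $h(e_j) \le \partial_j h(y) - \nabla h(y)^T y$. This implies, together with $\nabla h(y)^T y \le -h(0) < 0$, that $\frac{\partial_j h(y)}{\nabla h(y)^T y} \le 1 + \frac{h(e_j)}{\nabla h(y)^T y}$. If $h(e_j) \ge 0$, then $\frac{\partial_j h(y)}{\nabla h(y)^T y} \le 1$. Otherwise, $\frac{\partial_j h(y)}{\nabla h(y)^T y} \le 1 - \frac{h(e_j)}{h(0)}$.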
The application of monoidal strengthening (Balas and Jeroslow, 1980, Theorem 3) to a valid disjunction $\bigvee_i \alpha^i x \ge 1$ requires the existence of bounds $\beta_i$ such that $\alpha^i x \ge \beta_i$ is valid for every feasible point. Let $\beta(y)$ be such a bound for (4.5). An example of $\beta(y)$ is
\[ \beta(y) = \min_{x \in [0, u]} \frac{\nabla h(y)^T x}{\nabla h(y)^T y}. \]
Remark 4.9. If $\beta(y) \ge 1$, then $\nabla h(y)^T x / \nabla h(y)^T y \ge 1$ is redundant and can be removed from (4.5). Therefore, we can assume without loss of generality that $\beta(y) < 1$.
The following lemma is just a restatement of Lemma 1.1 in Section 1.4.
Lemma 4.10. Every $x \ge 0$ that satisfies (4.5) also satisfies
\[ \bigvee_{y \in Y} \frac{\nabla h(y)^T x}{\nabla h(y)^T y} + z(y)(1 - \beta(y)) \ge 1, \tag{4.7} \]
where $z : Y \to \mathbb{Z}$ is such that $z \equiv 0$ or there is a $y_0 \in Y$ for which $z(y_0) > 0$.
Proof. If $z \equiv 0$, then (4.7) reduces to (4.5).
Otherwise, let $y_0 \in Y$ be such that $z(y_0) > 0$, that is, $z(y_0) \ge 1$. By Remark 4.9, for every $y \in Y$, it holds that $1 - \beta(y) > 0$, and so
\[ z(y_0)(1 - \beta(y_0)) \ge 1 - \beta(y_0). \]
Therefore, $\beta(y_0) \ge 1 - z(y_0)(1 - \beta(y_0))$. Since every $x \ge 0$ satisfying (4.5) satisfies $\frac{\nabla h(y_0)^T x}{\nabla h(y_0)^T y_0} \ge \beta(y_0)$, we conclude that $\frac{\nabla h(y_0)^T x}{\nabla h(y_0)^T y_0} + z(y_0)(1 - \beta(y_0)) \ge 1$ holds.
Remark 4.11. Even if some disjunctive terms have no lower bound, that is, $\beta(y) = -\infty$ for $y \in Y' \subseteq Y$, Lemma 4.10 still holds if, additionally, $z(y) = 0$ for all $y \in Y'$. This means that we are not using that disjunction for the strengthening. In particular, if for some variable $x_j$, $\alpha_j$ is defined by some $y \in Y'$, then this cut coefficient cannot be improved.
Assume now that $x_k \in \mathbb{Z}$ for every $k \in K \subseteq \{1, \ldots, n\}$. One way of constructing a new disjunction is to find a set of functions $M$ such that for any choice of $m_k \in M$ and any feasible assignment of $x_k$, $z(y) := \sum_{k \in K} x_k m_k(y)$ satisfies the conditions of Lemma 4.10, that is, $z$ is in
\[ \mathcal{Z} = \{z : Y \to \mathbb{Z} : z \equiv 0 \lor \exists y \in Y,\ z(y) > 0\}. \]
Once such a family of functions has been identified, the cut $\sum_j \gamma_j x_j \ge 1$ with $\gamma_j = \alpha_j$ if $j \notin K$, and
\[ \gamma_k = \inf_{m \in M} \max_{y \in Y} \frac{\partial_k h(y)}{\nabla h(y)^T y} + m(y)(1 - \beta(y)) \quad \text{for } k \in K, \tag{4.8} \]
is valid and at least as strong as (4.6). Any $M \subseteq \mathcal{Z}$ such that $(M, +)$ is a monoid, that is, $0 \in M$ and $M$ is closed under addition, can be used in (4.8).
The question that remains is how to choose $M$. For example, the monoid $M = \{m : Y \to \mathbb{Z} : m \text{ has finite support and } \sum_{y \in Y} m(y) \ge 0\}$ is an obvious candidate for $M$. However, the problem is how to optimize over such an $M$, see (4.8).
We circumvent this problem by considering only one integer variable at a time. Fix $k \in K$. In this setting we can use $\mathcal{Z}$ as $M$, which is not a monoid. Indeed, if $z \in \mathcal{Z}$, then $x_k z \in \mathcal{Z}$ for any $x_k \in \mathbb{Z}_+$. The advantage of using $\mathcal{Z}$ is that the solution of (4.8) is easy to characterize.
With $M = \mathcal{Z}$, the cut coefficients (4.8) of all variables are the same as (4.6) except for $x_k$. The cut coefficient of $x_k$ is given by
\[ \inf_{z \in \mathcal{Z}} \max_{y \in Y} \frac{\partial_k h(y)}{\nabla h(y)^T y} + z(y)(1 - \beta(y)). \]
To compute this coefficient, observe that one would like to have $z(y) < 0$ for points $y$ such that the objective function of (4.6) is large. However, $z$ must be positive for at least one point. Therefore,
\[ \min_{y \in Y} \frac{\partial_k h(y)}{\nabla h(y)^T y} + (1 - \beta(y)) \]
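is the best coefficient we can hope for if $z \not\equiv 0$. This coefficient can be achieved by
\[ z(y) = \begin{cases} 1, & \text{if } y \in \arg\min_{y \in Y} \frac{\partial_k h(y)}{\nabla h(y)^T y} + (1 - \beta(y)), \\ -L, & \text{otherwise,} \end{cases} \tag{4.9} \]
where $L > 0$ is sufficiently large.
Summarizing, we can obtain the following cut:
\[ \alpha_j = \begin{cases} \max_{y \in Y} \frac{\partial_j h(y)}{\nabla h(y)^T y} & \text{if } j \ne k, \\ \min\left\{\max_{y \in Y} \frac{\partial_j h(y)}{\nabla h(y)^T y},\ \min_{y \in Y} \frac{\partial_j h(y)}{\nabla h(y)^T y} + (1 - \beta(y))\right\} & \text{if } j = k. \end{cases} \tag{4.10} \]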
Remark 4.12. Let $z_k \in \mathcal{Z}$ be given by (4.9) for each $k \in K$. Assume there is a subset $K_0 \subseteq K$ and a monoid $M \subseteq \mathcal{Z}$ such that $z_k \in M$ for every $k \in K_0$. Then, the strengthening can be applied to all $x_k$ for $k \in K_0$.
Alternatively, if there is a constraint enforcing that at most one of the $x_k$ can be non-zero for $k \in K_0$, e.g., $\sum_{k \in K} x_k \le 1$, then the strengthening can be applied to all $x_k$ for $k \in K_0$.
In the finite case, our application of monoidal strengthening would be dominated by the original technique of Balas and Jeroslow (1980) by using an appropriate monoid. However, in the presence of an extra constraint, such as the one described above, our technique can dominate vanilla monoidal strengthening.
Example 4.13. Consider the constraint $\{x \in \{0, 1, 2\} \times [0, 5] : h(x) \le 0\}$, where $h(x_1, x_2) = -10 x_1^2 - \frac{1}{2} x_2^2 + 2 x_1 x_2 + 4$, see Figure 4.3. The IC is given by $\sqrt{5/2}\, x_1 + \frac{1}{2\sqrt{2}} x_2 \ge 1$. Note that $(1/\sqrt{10}, \sqrt{10}) \in Y$ and yields the term $\frac{1}{\sqrt{10}} x_2 \ge 1$ in (4.5). Since $x_2 \ge 0$, $\beta(1/\sqrt{10}, \sqrt{10}) = 0$. Hence, (4.10) yields $\alpha_1 \le \min\{\sqrt{5/2}, 1\} = 1$ and the strengthened inequality is $x_1 + \frac{1}{2\sqrt{2}} x_2 \ge 1$.
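The numbers in Example 4.13 can be reproduced with a few lines of arithmetic. The sketch below is a check on this single instance, with hypothetical variable names; it evaluates the disjunctive term at the point $y = (1/\sqrt{10}, \sqrt{10})$ used in the example, forms the coefficient bound from (4.10), and verifies on a grid that the strengthened inequality does not cut off any feasible point.

```python
import math


def h(x1, x2):
    # Constraint function of Example 4.13.
    return -10.0 * x1 * x1 - 0.5 * x2 * x2 + 2.0 * x1 * x2 + 4.0


def grad_h(x1, x2):
    return (-20.0 * x1 + 2.0 * x2, -x2 + 2.0 * x1)


# Point of Y used in the example, and its term in the disjunction (4.5).
y = (1.0 / math.sqrt(10.0), math.sqrt(10.0))
gy = grad_h(*y)
denom = gy[0] * y[0] + gy[1] * y[1]        # grad h(y)^T y
term = (gy[0] / denom, gy[1] / denom)       # coefficients of x1 and x2
beta = 0.0  # both coefficients are >= 0 (up to rounding) and x >= 0, so the minimum over the box is 0

# Intersection cut coefficients stated in the example.
a1_ic = math.sqrt(5.0 / 2.0)
a2 = 1.0 / (2.0 * math.sqrt(2.0))

# Strengthened coefficient of the integer variable x1, following (4.10) at this y.
a1_strong = min(a1_ic, term[0] + (1.0 - beta))

# Validity check on all grid points with x1 in {0, 1, 2} and h(x) <= 0.
feasible = [
    (x1, 5.0 * k / 1000)
    for x1 in (0, 1, 2)
    for k in range(1001)
    if h(x1, 5.0 * k / 1000) <= 0.0
]
valid = all(a1_strong * x1 + a2 * x2 >= 1.0 - 1e-9 for x1, x2 in feasible)

# A fractional point that the original IC keeps but the strengthened cut removes.
point = (0.8, 0.0)
kept_by_ic = a1_ic * point[0] + a2 * point[1] >= 1.0
cut_by_strong = a1_strong * point[0] + a2 * point[1] < 1.0

print(a1_ic, a1_strong, valid, kept_by_ic, cut_by_strong)
```

The point $(0.8, 0)$ is not integer feasible; it only serves to show that the strengthened inequality is strictly tighter on the continuous relaxation.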
4.6 Conclusions
We have introduced a procedure to generate concave underestimators of fac-
torable functions, which can be used to generate intersection cuts, together
with two strengthening procedures.
The practical performance of these intersection cuts remains to be seen. We expect that their generation is cheaper than the generation of disjunctive cuts, given that there is no need to solve an LP. As for the strengthening procedures, they might be too expensive to be of practical use. An alternative is to
Figure 4.3: The feasible region $\{x \in \{0, 1, 2\} \times [0, 5] : h(x) \le 0\}$ from Example 4.13 (left), the IC (middle), and the strengthened cut (right).
construct a polyhedral inner approximation of the $S$-free set and use monoidal strengthening in the finite setting. However, in this case, the strengthening proposed in Section 4.4 has no effect. Nonetheless, as far as the author knows, this is the first application of monoidal strengthening that is able to exploit further problem structure, as demonstrated in Remark 4.12, and it might be interesting to investigate further.
With respect to maximality, we cannot expect, in principle, that the $S$-free sets constructed via the techniques presented here are maximal. In the next chapter we show how to construct maximal $S$-free sets when $S$ is described by a single quadratic constraint.
Chapter 5
Maximal Quadratic-Free Sets
As we discussed in Section 1.2, classic intersection cuts are undominated when they are generated from maximal $S$-free sets. However, maximality can be a challenging goal in general. In this chapter, we show how to construct maximal $S$-free sets when $S$ is defined as a general quadratic inequality.
The chapter is organized as follows. In Section 5.1 we introduce our setting, review some related work and describe the contributions of the chapter. In Section 5.2 we introduce some definitions and necessary conditions to prove maximality of $S$-free sets. In particular, we define exposing points and exposing points at infinity and show that if $C$ is an $S$-free set whose defining inequalities are exposed or exposed at infinity, then $C$ is maximal. In Section 5.3 we show how to construct maximal $S$-free sets when $S$ is defined by a homogeneous quadratic function. Section 5.4 presents the construction of maximal $S$-free sets when $S$ is defined by a homogeneous quadratic function and a homogeneous linear inequality constraint. The construction of a maximal $S$-free set when $S$ is the sublevel set of any quadratic function is presented in Section 5.5. Our constructions depend on a “canonical” representation of the set $S$. The effects of this representation are discussed in Section 5.6. In Section 5.7 we collect some generalizations and remarks. In particular, we generalize the construction of Section 5.3 to show how to construct maximal $S$-free sets when $S$ is the 0-sublevel set of a difference of sublinear functions. We also show how to handle more than one homogeneous linear inequality, extending the result of Section 5.4. We discuss how our results can extend the work of Bienstock et al. (2019) by constructing maximal outer-product-free sets when the considered 2 by 2 minor contains entries to the diagonal. We show, via an example, that our construction does not capture every possible maximal quadratic-free set, even in the homogeneous case.
The cuts developed in this section have been implemented in Chmiela (2020). We briefly discuss their computational impact in Section 5.8. We offer a summary and some directions for further research in Section 5.9. Finally, we present some omitted proofs in Section 5.10.
This chapter is joint work with Gonzalo Muñoz. An extended abstract based on this chapter has been accepted in the proceedings of Integer Programming and Combinatorial Optimization (Muñoz and Serrano, 2020).
5.1 Background
Consider a generic optimization problem,
\begin{align}
\min\ & c^T x \tag{5.1a} \\
\text{s.t. } & x \in S \subseteq \mathbb{R}^n. \tag{5.1b}
\end{align}
A particularly important case is obtained when (5.1) is a quadratic problem, that is,
\[ S = \{x \in \mathbb{R}^n : x^T Q_i x + b_i^T x + c_i \le 0,\ i = 1, \ldots, m\} \]
for certain $n \times n$ matrices $Q_i$, not necessarily positive semi-definite. Note that if $\bar{x} \notin S$, there exists $i \in \{1, \ldots, m\}$ such that
\[ \bar{x} \notin S_i := \{x \in \mathbb{R}^n : x^T Q_i x + b_i^T x + c_i \le 0\}, \]
and constructing an $S_i$-free set containing $\bar{x}$ would suffice to ensure separation. Thus, slightly abusing notation, given $\bar{x}$ we focus on a systematic way of constructing $S$-free sets containing $\bar{x}$, where $S$ is defined using a single quadratic inequality:
\[ S = \{x \in \mathbb{R}^n : x^T Q x + b^T x + c \le 0\}. \]
As a final note, if we consider the simplest form of intersection cuts, where the cuts are computed using the intersection points of the $S$-free set and the extreme rays of the simplicial conic relaxation of $S$ (i.e., using the gauge), then the larger the $S$-free set, the better. In other words, if two $S$-free sets $C_1$, $C_2$ are such that $C_1 \subsetneq C_2$, the intersection cut derived from $C_2$ is stronger than the one derived from $C_1$ (Conforti et al., 2015). Therefore, we aim at computing maximal $S$-free sets.
5.1.1 Related Work
From all the works that construct intersection cuts in a non-linear setting reviewed in Section 4.2, the only one that ensures maximality of the corresponding $S$-free sets is the work of Bienstock et al. (2016, 2019). While their approach can also be used to generate cutting planes in our setting (general quadratic inequalities), the definition of $S$ differs: Bienstock et al. use a moment-based extended formulation of polynomial optimization problems (Shor, 1987; Lasserre, 2001; Laurent, 2009) and from there define $S$ as the set of matrices which are positive semi-definite and of rank 1, which the authors refer to as outer-products. Maximality is computed with respect to this notion. It is unclear if a maximal outer-product-free set can be converted into a maximal quadratic-free set. There is an even more fundamental difference that makes these approaches incomparable at this point: in a quadratic setting, the approach of Bienstock et al. would compute a cutting plane in an extended space of dimension proportional to $n^2$, whereas our approach can construct a maximal $S$-free set in the original space. The quadratic dimension increase can be a drawback in some applications; however, stronger cuts can be derived from extended formulations in some cases (Bodur et al., 2017). A thorough comparison of these approaches is subject of future work.
5.1.2 Contribution
The main contribution of this chapter is an explicit construction of maximal $S$-free sets, when $S$ is defined using a non-convex quadratic inequality (Theorem 5.36 and Theorem 5.46). We achieve this by relying on the fact that any quadratic inequality can be represented using a homogeneous quadratic inequality intersected with a linear equality. While these maximal $S$-free sets are constructed using semi-infinite representations, we show equivalent closed-form representations of them.
In order to construct these sets, we also derive maximal $S$-free sets for sets $S$ defined as the intersection of a homogeneous quadratic inequality with a homogeneous linear inequality. These are an important intermediate step in our construction, but they are of independent interest as well.
In order to show our results, we state and prove a criterion for maximality of $S$-free sets which generalizes a criterion proven by Dey and Wolsey (the ‘only if’ of (Dey and Wolsey, 2010, Proposition A.4)) in the case of maximal lattice-free sets (Definition 5.2 and Theorem 5.6). We also develop a new criterion that can handle a special phenomenon that arises in our setting and also in non-linear integer programming: the boundary of a maximal $S$-free set may not even intersect $S$. Instead, the intersection might be “at infinity”. We formalize this in Definition 5.9 and show the criterion in Theorem 5.11.
5.1.3 Notation
Perhaps the least standard notation we use is denoting an inequality $\alpha^T x \le \beta$ by $(\alpha, \beta)$. If $\beta = 0$ we also denote it simply by $\alpha$. This is based on the fact that in the polar of a convex set (roughly, the set of all valid inequalities) the inequalities are points and, although we do not use any polarity results, many of the ideas in this chapter were originally developed from looking at the polar.
5.2 Preliminaries
In this section we collect definitions and results that are going to be useful later on. As we mentioned above, our main object of study is the set $S = \{x \in \mathbb{R}^p : q(x) \le 0\} \subseteq \mathbb{R}^p$, where $q$ is a quadratic function. To make the analysis easier, we can work in $\mathbb{R}^{p+1}$ and consider the cone generated by $S \times \{1\}$, namely, $\{(x, z) \in \mathbb{R}^{p+1} : z^2 q(x/z) \le 0,\ z \ge 0\}$. To recover the original $S$, however, we must intersect the cone with $z = 1$. Since we are interested in maximal $S$-free sets, this motivates the following definition; see also Basu et al. (2010a).
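The homogenization step can be checked numerically. Writing $q(x) = x^T Q x + b^T x + c$, the expression $z^2 q(x/z)$ expands to $x^T Q x + z\, b^T x + c z^2$ for $z \neq 0$, and setting $z = 1$ recovers $q$. The sketch below verifies this on one arbitrary data set; the function names and the numbers are assumptions chosen purely for illustration.

```python
def q(Q, b, c, x):
    """Quadratic q(x) = x^T Q x + b^T x + c."""
    n = len(x)
    return (sum(x[i] * Q[i][j] * x[j] for i in range(n) for j in range(n))
            + sum(b[i] * x[i] for i in range(n)) + c)


def q_hom(Q, b, c, x, z):
    """Homogenized form x^T Q x + z * b^T x + c * z^2."""
    n = len(x)
    return (sum(x[i] * Q[i][j] * x[j] for i in range(n) for j in range(n))
            + z * sum(b[i] * x[i] for i in range(n)) + c * z * z)


# Arbitrary illustrative data (not from the thesis).
Q = [[1.0, 0.0], [0.0, -2.0]]
b = [3.0, -1.0]
c = 0.5
x, z = [1.5, -0.7], 2.0

lhs = q_hom(Q, b, c, x, z)                              # homogeneous quadratic in (x, z)
rhs = z * z * q(Q, b, c, [xi / z for xi in x])          # z^2 * q(x / z)
at_z1 = q_hom(Q, b, c, x, 1.0)                          # slice z = 1
```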
Definition 5.1. Given $S, C, H \subseteq \mathbb{R}^n$ where $S$ is closed, $C$ is closed and convex and $H$ is an affine hyperplane, we say that $C$ is $S$-free with respect to $H$ if $C \cap H$ is $S \cap H$-free w.r.t. the induced topology in $H$. We say $C$ is maximal $S$-free with respect to $H$, if for any $C' \supseteq C$ that is $S$-free with respect to $H$ it holds that $C' \cap H \subseteq C \cap H$.
5.2.1 Techniques for Proving Maximality
In this section we describe some sufficient conditions to prove that a convex set $C$ is maximal $S$-free, which will be used throughout the chapter.
A sufficient (and necessary) condition for a full-dimensional convex lattice-free set $C$ (that is, $S = \mathbb{Z}^n$) to be maximal is that $C$ is a polyhedron and there is a point of $\mathbb{Z}^n$ in the relative interior of each of its facets (Conforti et al., 2014, Theorem 6.18). More generally, if $C$ is a full-dimensional $S$-free polyhedron such that there is a point of $S$ in the relative interior of each facet, then $C$ is maximal. The problem with extending this property to non-polyhedral maximal $S$-free sets is that they might not even have facets, e.g., if $S$ is the complement of $\operatorname{int} B_1(0)$ and $C$ is $B_1(0)$ in dimension 3 or higher.
The motivation of the next definition is to capture the property of a facet that is key for proving maximality.
Definition 5.2. Given a convex set $C \subseteq \mathbb{R}^n$ and a valid inequality $\alpha^T x \le \beta$, we say that a point $x_0 \in \mathbb{R}^n$ exposes $(\alpha, \beta)$ with respect to $C$, or that $(\alpha, \beta)$ is exposed by $x_0$, if
– $\alpha^T x_0 = \beta$ and,
– if $\gamma^T x \le \delta$ is any other non-trivial valid inequality for $C$ such that $\gamma^T x_0 = \delta$, then there exists a $\mu > 0$ such that $\gamma = \mu\alpha$ and $\delta = \mu\beta$.
In some cases we omit saying “with respect to $C$” if it is clear from context.
To get some intuition, if $C$ is a polyhedron and $x \in C$ exposes an inequality, then that inequality is a facet and $x$ is in the relative interior of the facet.
Remark 5.3. It is very important to note that if there exists a point exposing a valid inequality of $C$, then $C$ is full dimensional. The reader should keep this in mind throughout the whole chapter.
Remark 5.4. For some convex $C$, a point $x \notin C$ can expose a valid inequality of $C$. For instance, consider $C = \{x \in \mathbb{R}^2 : x_1 + x_2 \ge 1\}$. Then $(0, 0) \notin C$ and exposes $x_1 + x_2 \ge 0$.
The name “exposed inequality” comes from the concept of exposed point, see Section 1.1. Actually, from the standard duality between points and hyperplanes (a hyperplane can be characterized by its normal, which is a point), one can interpret an exposed inequality just as the dual of an exposed point. In more detail, and to simplify ideas, let us assume that $0 \in \operatorname{int}(C)$. Recall that a point $x_0 \in C$ is exposed if there exists a valid inequality of $C$, $\alpha^T x \le 1$, such that $\{x \in C : \alpha^T x = 1\} = \{x_0\}$. If $\alpha_0$ is an exposed point of the polar of $C$, $C^\circ = \{\alpha : \alpha^T x \le 1,\ \forall x \in C\}$, then there is a valid inequality, $x_0^T \alpha \le 1$, such that $\{\alpha \in C^\circ : x_0^T \alpha = 1\} = \{\alpha_0\}$. In other words, if $\alpha^T x \le 1$ is valid for $C$ (i.e. $\alpha \in C^\circ$) and $\alpha^T x_0 = 1$, then $\alpha = \alpha_0$. We see that $x_0$ is a point (direction) that shows that $\alpha_0$ is an exposed inequality, or, that $x_0$ exposes $\alpha_0$. See also Lemma 5.15.
We now show that our definition is indeed helpful to show maximality.
Theorem 5.5. Let $K, K' \subseteq \mathbb{R}^n$ be convex sets such that $K \subseteq K'$. If $\alpha^T x \le \beta$ is
– valid for $K$,
– not valid for $K'$, and
– exposed by $x_0 \in K$ with respect to $K$,
then $x_0 \in \operatorname{int}(K')$.
Proof. As $x_0 \in K$ exposes $\alpha^T x \le \beta$, it holds that $\alpha^T x_0 = \beta$ and, thus, $x_0$ is in the boundary of $K$. Suppose $x_0$ is not in the interior of $K'$. Then it must be in the boundary of $K'$ and there is a non-trivial valid inequality for $K'$, $\gamma^T x \le \delta$, such that $\gamma^T x_0 = \delta$.
As $K \subseteq K'$, $\gamma^T x \le \delta$ is also valid for $K$. Given that $(\gamma, \delta)$ is tight at $x_0$ and $x_0$ exposes $(\alpha, \beta)$, we conclude that there is a $\mu > 0$ such that $\gamma = \mu\alpha$ and $\delta = \mu\beta$. However, since $\alpha^T x \le \beta$ is not valid for $K'$, it follows that $\gamma^T x \le \delta$ cannot be valid for $K'$. This contradiction proves the claim.
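Theorem 5.6. Let $S \subseteq \mathbb{R}^n$ be a closed set and $C \subseteq \mathbb{R}^n$ a convex $S$-free set. Assume that $C = \{x \in \mathbb{R}^n : \alpha^T x \le \beta,\ \forall (\alpha, \beta) \in \Gamma\}$ and that for every $(\alpha, \beta) \in \Gamma$ there is an $x \in S \cap C$ that exposes $(\alpha, \beta)$. Then, $C$ is maximal $S$-free.
Proof. To show that $C$ is maximal we are going to show that for every $\bar{x} \notin C$, $S \cap \operatorname{int}(\operatorname{conv}(C \cup \{\bar{x}\}))$ is nonempty.
Let $\bar{x} \notin C$ and let $(\alpha, \beta) \in \Gamma$ be a separating inequality, i.e., $\alpha^T \bar{x} > \beta$. Let $C' = \operatorname{conv}(C \cup \{\bar{x}\})$.
By hypothesis, there is an $x_0 \in S \cap C$ that exposes $(\alpha, \beta)$. Since $(\alpha, \beta)$ is valid for $C$ and not for $C'$, Theorem 5.5 implies that $x_0 \in \operatorname{int}(C')$.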
With minor modifications one can also get the following sufficient condition
for maximality with respect to a hyperplane.
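Theorem 5.7. Let $S \subseteq \mathbb{R}^n$ be a closed set, $H$ be an affine hyperplane, and $C \subseteq \mathbb{R}^n$ be a convex $S$-free set. Assume that $C = \{x \in \mathbb{R}^n : \alpha^T x \le \beta,\ \forall (\alpha, \beta) \in \Gamma\}$ and that for every $(\alpha, \beta) \in \Gamma$ there is an $x \in S \cap C \cap H$ that exposes $(\alpha, \beta)$. Then, $C$ is maximal $S$-free with respect to $H$.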
Remark 5.8. Points that expose inequalities are also called smooth points. A smooth point of $C$ is a point for which there exists a unique supporting hyperplane to $C$ at it (Goberna et al., 2010). Therefore, if $x_0 \in C$, then $x_0$ exposes some valid inequality of $C$ if and only if $x_0$ is a smooth point of $C$.
A related concept is that of blocking points (Basu et al., 2019). However, blocking points need not be smooth points in general, that is, they do not need to expose any inequality. As seen in Theorem 5.6, we use exposing points to determine maximality of a convex $S$-free set. Similarly, in the context of lifting (Conforti et al., 2011a), blocking points are used to determine maximality of a translated convex cone $S \times \mathbb{Z}_+$-free set.
There is another phenomenon that does not occur when $S = \mathbb{Z}^n$. If $S$ is a quadratic set, the inequalities of a maximal $S$-free set might not be exposed by any point of $S$. For instance, consider $S = \{(x, y) \in \mathbb{R}^2 : x^2 + 1 \le y^2\}$. The boundary of $S$ is a hyperbola with asymptotes $x = \pm y$. Thus, $C = \{(x, y) \in \mathbb{R}^2 : x \ge |y|\}$ is a maximal $S$-free set, because its inequalities are asymptotes of $S$, but they are not exposed by points of $S$. This phenomenon also occurs when $S = \mathbb{Z}^n \cap K$, with $K$ convex (Morán and Dey, 2011). However, in that case, it also turns out that maximal $S$-free sets are polyhedral and their constructions rely on the concept of a facet (see for instance (Morán and Dey, 2011, Theorem 3.2)), which we do not have access to in the general case. In our case, we extend the definition of what it means for an inequality to be exposed in order to handle a situation like the one above. We do this by interpreting that asymptotes are exposed “at infinity”.
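Definition 5.9. Given a convex set $C \subseteq \mathbb{R}^n$ with non-empty recession cone and a valid inequality $\alpha^T x \le \beta$, we say that a sequence $(x_n)_n \subseteq \mathbb{R}^n$ exposes $(\alpha, \beta)$ at infinity with respect to $C$ if
– $\|x_n\| \to \infty$,
– $\frac{x_n}{\|x_n\|} \to d \in \operatorname{rec}(C)$,
– $d$ exposes $\alpha^T x \le 0$ with respect to $\operatorname{rec}(C)$, and
– there exists $y$ with $\alpha^T y = \beta$ such that $\operatorname{dist}(x_n, y + \langle d \rangle) \to 0$.
As before, we omit saying “with respect to $C$” if it is clear from context.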
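For the hyperbola example $S = \{(x, y) : x^2 + 1 \le y^2\}$ with $C = \{(x, y) : x \ge |y|\}$, the conditions of exposure at infinity can be observed numerically for the inequality $-x + y \le 0$. The sketch below takes the points $(t, \sqrt{t^2 + 1})$ on the boundary of $S$, with $y = 0$ as the base point and $d = (1, 1)/\sqrt{2}$; these choices, and the helper name, are assumptions made for illustration, and the script does not verify that $d$ exposes the inequality with respect to $\operatorname{rec}(C)$.

```python
import math


def dist_to_line_through_origin(p, d):
    """Euclidean distance from p to the line {t * d : t real}, with d a unit vector."""
    t = p[0] * d[0] + p[1] * d[1]
    return math.hypot(p[0] - t * d[0], p[1] - t * d[1])


d = (1 / math.sqrt(2), 1 / math.sqrt(2))   # direction of the asymptote x = y
alpha = (-1.0, 1.0)                        # inequality -x + y <= 0

# Points on the upper branch of the hyperbola x^2 + 1 = y^2.
points = [(float(t), math.sqrt(t * t + 1.0)) for t in (1, 10, 100, 1000, 10000)]

# Membership in S, with a small relative slack for rounding.
in_S = [p[0] ** 2 + 1.0 <= p[1] ** 2 * (1 + 1e-12) for p in points]
# Distance to the line spanned by d: should tend to 0.
dists = [dist_to_line_through_origin(p, d) for p in points]
# Normalized points: should tend to d.
directions = [(p[0] / math.hypot(*p), p[1] / math.hypot(*p)) for p in points]
# alpha is tight along d.
alpha_dot_d = alpha[0] * d[0] + alpha[1] * d[1]
```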
Using this definition, we can prove an analogous result to Theorem 5.5 for
inequalities exposed at infinity.
Theorem 5.10. Let $K, K' \subseteq \mathbb{R}^n$ be convex sets such that $K \subseteq K'$. If $\alpha^T x \le \beta$ is
– valid for $K$,
– not valid for $K'$, and
– exposed at infinity by $(x_n)_n$ with respect to $K$,
then there exists a $k$ such that $x_k \in \operatorname{int}(K')$.
Proof. Suppose that for all $k$, $x_k$ is not in the interior of $K'$. Then, for each $k$ there exists a non-trivial valid inequality for $K'$, $\gamma_k^T x \le \delta_k$, such that $\gamma_k^T x_k \ge \delta_k$.
We can assume without loss of generality that $\|(\gamma_k, \delta_k)\| = 1$. Hence, going through a subsequence if necessary, there exist $\gamma \in \mathbb{R}^n$ and $\delta \in \mathbb{R}$ such that $\gamma_k \to \gamma$ and $\delta_k \to \delta$ when $k \to \infty$ and $\|(\gamma, \delta)\| = 1$. Note that the inequality $(\gamma, \delta)$ is valid for $K'$. The idea is to show that $(\gamma, \delta)$ defines the same inequality as $(\alpha, \beta)$.
As $d = \lim_{k \to \infty} \frac{x_k}{\|x_k\|} \in \operatorname{rec}(K)$ (see Definition 5.9) and $(\gamma, \delta)$ is valid for $K' \supseteq K$, then $\gamma^T x \le 0$ is valid for $\operatorname{rec}(K)$. In particular, $\gamma^T d \le 0$. On the other hand, $\frac{\delta_k}{\|x_k\|} \le \gamma_k^T \frac{x_k}{\|x_k\|}$ implies $0 \le \gamma^T d$.
We conclude that $\gamma^T d = 0$. As $d$ exposes $\alpha^T x \le 0$ with respect to $\operatorname{rec}(K)$, there exists a $\mu \ge 0$ such that $\gamma = \mu\alpha$. Note that we cannot conclude that $\mu > 0$ since, at this point, we do not know that $(\gamma, \delta)$ is a non-trivial inequality (e.g. it could be $0^T x \le 1$).
Let $y$ be such that $\alpha^T y = \beta$ and $\operatorname{dist}(x_k, y + \langle d \rangle) \to 0$, which exists by Definition 5.9. Let $w_k = x_k - (d^T x_k) d$. We have that
\[ \operatorname{dist}(x_k, y + \langle d \rangle) = \operatorname{dist}(x_k - y, \langle d \rangle) = \|x_k - y - d^T(x_k - y) d\| = \|w_k - (y - (d^T y) d)\|. \]
Thus, $w_k \to y - (d^T y) d$ as $k \to \infty$.
Since each $(\gamma_k, \delta_k)$ is valid for $K'$, $\gamma_k^T d \le 0$. Additionally, for large enough $k$ it must hold that $d^T x_k > 0$. Therefore,
\[ \delta_k \le \gamma_k^T x_k = \gamma_k^T((d^T x_k) d + w_k) \le \gamma_k^T w_k. \]
Computing the limit when $k \to \infty$ we get
\[ \delta \le \mu\alpha^T(y - (d^T y) d) = \mu\alpha^T y = \mu\beta, \]
where the first equality uses $\alpha^T d = 0$.
If $\mu = 0$, then $\gamma = 0$ and $\delta \le 0$. As $\|(\gamma, \delta)\| = 1$, it follows that $\delta = -1$, which cannot be since $(\gamma, \delta)$ is a valid inequality for $K'$ and $K'$ is, by hypothesis, non-empty. We conclude that $\mu > 0$ and that $\mu\alpha^T x \le \mu\beta$ is valid for $K'$, which implies that $\alpha^T x \le \beta$ is valid for $K'$, contradicting the hypothesis of the theorem.
With the previous results it is straightforward to prove the following
generalization of Theorem 5.7.
Theorem 5.11. Let $S \subseteq \mathbb{R}^n$ be a closed set, $H$ be an affine hyperplane, and $C \subseteq \mathbb{R}^n$ be a convex $S$-free set. Assume that $C = \{x \in \mathbb{R}^n : \alpha^T x \le \beta,\ \forall (\alpha, \beta) \in \Gamma\}$ and that for every $(\alpha, \beta) \in \Gamma$ there is either an $x \in S \cap C \cap H$ that exposes $(\alpha, \beta)$, or a sequence $(x_n)_n \subseteq S \cap H$ that exposes $(\alpha, \beta)$ at infinity. Then, $C$ is maximal $S$-free with respect to $H$.
Another useful result for studying maximal
S
-free sets is the following (see
also (Conforti et al., 2014, Lemma 6.17)). It states that in some cases we can
project
S
into a lower dimensional space and find maximal sets that are free
for the projection. This result is also useful for visualizing higher dimensional
S-free sets.
Theorem 5.12. Let $C$ be a full dimensional closed convex cone with lineality space $L$. Let $S \subseteq \mathbb{R}^n$ be closed. Then, $C$ is maximal $S$-free if and only if $(C \cap L^\perp)$ is maximal $\operatorname{cl}(\operatorname{proj}_{L^\perp} S)$-free.
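Proof. ($\Rightarrow$) If $C \cap L^\perp$ is not maximal, let $K \subseteq L^\perp$ be a $\operatorname{cl}(\operatorname{proj}_{L^\perp} S)$-free set that contains it. Then, $K + L \supsetneq C$. Since $C$ is maximal $S$-free, there exists an $x \in S$ such that $x \in \operatorname{int}(K + L) = \operatorname{int}(K) + \operatorname{int}(L)$ (Rockafellar, 1970, Corollary 6.6.2). That is, $x = k + \ell$ with $k \in \operatorname{int}(K)$ and $\ell \in L$. Thus, $x - \ell \in K \subseteq L^\perp$, which implies that $x - \ell \in \operatorname{proj}_{L^\perp} S$ and contradicts the fact that $K$ is $\operatorname{cl}(\operatorname{proj}_{L^\perp} S)$-free.
($\Leftarrow$) By contradiction, suppose that $C$ is not maximal $S$-free and let $K \supsetneq C$ be a closed convex $S$-free set. Then $K \cap L^\perp \supsetneq C \cap L^\perp$, which implies that $K \cap L^\perp$ is not $\operatorname{cl}(\operatorname{proj}_{L^\perp} S)$-free. This implies that there exists $\tilde{s} \in \operatorname{cl}(\operatorname{proj}_{L^\perp} S) \cap \operatorname{int}(K \cap L^\perp)$. Moreover, we can further assume $\tilde{s} \in \operatorname{proj}_{L^\perp} S \cap \operatorname{int}(K \cap L^\perp)$, as any sequence contained in $\operatorname{proj}_{L^\perp} S$ converging to an element of $\operatorname{cl}(\operatorname{proj}_{L^\perp} S) \cap \operatorname{int}(K \cap L^\perp)$ must have an element in $\operatorname{proj}_{L^\perp} S \cap \operatorname{int}(K \cap L^\perp)$.
By the definition of orthogonal projection, there must exist $s \in S$ and $\ell \in L$ such that $\tilde{s} = s - \ell$. Thus, we obtain $s - \ell \in \operatorname{int}(K \cap L^\perp)$, i.e.
\[ s \in \operatorname{int}(K \cap L^\perp) + L. \]
Since the lineality space of $K$ must contain $L$, we conclude $s \in \operatorname{int}(K)$; a contradiction with $K$ being $S$-free.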
5.3 Maximal Quadratic-Free Sets for Homogeneous Quadratics
In this section we construct maximal $S^h$-free sets that contain a vector $\bar{x} \notin S^h$ for $S^h = \{x \in \mathbb{R}^p : x^T Q x \le 0\}$. This is our building block towards maximality in the general case. After a change of variable, we can assume that
\[ S^h = \Big\{(x, y, z) \in \mathbb{R}^{n+m+l} : \sum_{i=1}^n x_i^2 - \sum_{i=1}^m y_i^2 \le 0\Big\} = \Big\{(x, y) \in \mathbb{R}^{n+m} : \sum_{i=1}^n x_i^2 - \sum_{i=1}^m y_i^2 \le 0\Big\} \times \mathbb{R}^l. \]
Thus, we will only focus on $S^h = \{(x, y) \in \mathbb{R}^{n+m} : \sum_{i=1}^n x_i^2 - \sum_{i=1}^m y_i^2 \le 0\}$ and assume we are given $(\bar{x}, \bar{y})$ such that $\|\bar{x}\|^2 > \|\bar{y}\|^2$.
Remark 5.13. The transformation used to bring $S^h$ to the last “diagonal” form is, in general, not unique. Nonetheless, maximality of the $S^h$-free sets is preserved, as there is always such a transformation that is one-to-one. In Section 5.6 we discuss the effect different choices of this transformation have.
5.3.1 Removing Strict Convexity Matters
A simple way of obtaining an $S^h$-free set is via a concave underestimator of $f(x, y) = \sum_{i=1}^n x_i^2 - \sum_{i=1}^m y_i^2 = \|x\|^2 - \|y\|^2$ directly. A concave underestimator tight at $(\bar{x}, \bar{y})$ is obtained after linearizing the convex function $\|x\|^2$ at $\bar{x}$, that is, $\|\bar{x}\|^2 + 2\bar{x}^T(x - \bar{x}) - \|y\|^2$. The concave underestimator yields the $S^h$-free set $\{(x, y) \in \mathbb{R}^{n+m} : \|\bar{x}\|^2 + 2\bar{x}^T(x - \bar{x}) - \|y\|^2 \ge 0\}$. However, simple examples show that such an $S^h$-free set is not maximal.
Example 5.14. The case $n = m = 1$ with $\bar{x} = 3$ yields the $S^h$-free set
\[ C = \{(x, y) \in \mathbb{R}^2 : -9 + 6x - y^2 \ge 0\}. \]
In Figure 5.1 we can see that the set is not maximal $S^h$-free.
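The lack of maximality in Example 5.14 can also be confirmed without the picture. The sketch below compares $C$ with the cone $\{(x, y) : x \ge |y|\}$, whose interior contains no point with $|x| \le |y|$ and which is therefore $S^h$-free as well. It checks on a grid that $C$ lies inside the cone and exhibits a point of the cone outside $C$. The grid resolution and the witness point are assumptions chosen for illustration.

```python
def in_C(x, y):
    """Set from Example 5.14: -9 + 6x - y^2 >= 0."""
    return -9.0 + 6.0 * x - y * y >= 0.0


def in_cone(x, y):
    """Cone {x >= |y|}; its interior {x > |y|} is disjoint from {|x| <= |y|}."""
    return x >= abs(y)


# Grid over [-10, 10]^2 with step 0.1.
grid = [(i / 10.0, j / 10.0) for i in range(-100, 101) for j in range(-100, 101)]

# Every sampled point of C belongs to the cone.
C_inside_cone = all(in_cone(x, y) for (x, y) in grid if in_C(x, y))

# A point of the cone that is not in C, so the inclusion is strict.
witness = (1.0, 0.0)
```

The inclusion also follows by hand: $6x - 9 \ge y^2$ gives $x \ge (y^2 + 9)/6 \ge |y|$, because $(|y| - 3)^2 \ge 0$.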
As discussed in Section 4.3.1, the problem is that $\|x\|^2$ is a strictly convex function. Indeed, suppose $S = \{x \in \mathbb{R}^n : f(x) \le 0\}$ where $f$ is strictly convex. The $S$-free set obtained via a concave underestimator at $\bar{x}$ is $C = \{x \in \mathbb{R}^n : f(\bar{x}) + \nabla f(\bar{x})^T(x - \bar{x}) \ge 0\}$. It is not hard to see that the strict convexity of $f$ implies that $C$ is not maximal $S$-free. The reason is that, as we saw in Chapter 2, linearizations of $f$ at $\bar{x} \notin S$ will not support $S$. On the other hand, if $f$ is instead sublinear, then any linearization of $f$ supports $S$, thus it yields a maximal $S$-free set.
Figure 5.1: $S^h$ in Example 5.14 (blue) and the $S^h$-free set constructed using a concave underestimator of $\|x\|^2 - \|y\|^2$ (orange).
The previous observation motivates the following. The set $S^h$ can equivalently be described by $S^h = \{(x, y) \in \mathbb{R}^{n+m} : \|x\| - \|y\| \le 0\}$. Now, the function $f(x, y) = \|x\| - \|y\|$ has the following concave underestimator at $\bar{x} \neq 0$: $\frac{\bar{x}^T x}{\|\bar{x}\|} - \|y\|$, which yields the $S^h$-free set
\[ C_\lambda = \{(x, y) \in \mathbb{R}^{n+m} : \lambda^T x \ge \|y\|\}, \tag{5.2} \]
where $\lambda = \frac{\bar{x}}{\|\bar{x}\|}$. This set turns out to be maximal, even if we consider any other $\lambda \in D_1(0)$. We note that in Bienstock et al. (2016), the authors use a similar technique and reformulate a 4-variable homogeneous quadratic condition of outer-product-free sets in the form $\|x\| \le \|y\|$. This allows them to construct maximal outer-product-free sets that are of the form $C_\lambda$.
5.3.2 Maximal Sh-free Sets
We now prove that $C_\lambda$ is maximal $S^h$-free. The main idea is to exploit that every inequality describing $C_\lambda$ has a point in $S^h \cap C_\lambda$ exposing it and use Theorem 5.6. We begin with a lemma whose proof we present in Section 5.10. We recall that a function is sublinear if and only if it is convex and positively homogeneous.
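Lemma 5.15. Let $\phi : \mathbb{R}^n \to \mathbb{R}$ be a sublinear function, $\lambda \in D_1(0)$, and let
\[ C = \{(x, y) : \phi(y) \le \lambda^T x\}. \]
If $(\bar{x}, \bar{y}) \in C$ is such that $\phi$ is differentiable at $\bar{y}$ and $\phi(\bar{y}) = \lambda^T \bar{x}$, then $(\bar{x}, \bar{y})$ exposes the valid inequality $-\lambda^T x + \nabla\phi(\bar{y})^T y \le 0$.
In particular, if $\beta_0 \in \partial\phi(0)$ is an exposed point of $\partial\phi(0)$, exposed by $\bar{y}$, and $\phi(\bar{y}) = \lambda^T \bar{x}$, then $(\bar{x}, \bar{y})$ exposes the valid inequality $-\lambda^T x + \beta_0^T y \le 0$.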
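Theorem 5.16. Let $S^h = \{(x, y) \in \mathbb{R}^{n+m} : \|x\| \le \|y\|\}$ and $C_\lambda = \{(x, y) \in \mathbb{R}^{n+m} : \lambda^T x \ge \|y\|\}$ for $\lambda \in D_1(0)$. Then, $C_\lambda$ is a maximal $S^h$-free set. Furthermore, if $\lambda = \frac{\bar{x}}{\|\bar{x}\|}$, $C_\lambda$ contains $(\bar{x}, \bar{y})$ in its interior.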
Proof. The $S^h$-freeness follows by construction. To show that $C_\lambda$ is maximal, we first notice that
\[ C_\lambda = \{(x, y) \in \mathbb{R}^{n+m} : -\lambda^T x + \beta^T y \le 0,\ \forall \beta \in D_1(0)\}. \]
We just need to show that every inequality $(-\lambda, \beta)$ is exposed by a point $(x, y) \in S^h \cap C_\lambda$.
Since the norm function $\|\cdot\|$ is sublinear, differentiable everywhere but at the origin, and $\|\beta\| = 1 = \lambda^T\lambda$, Lemma 5.15 shows that $(\lambda, \beta) \in S^h \cap C_\lambda$ exposes $(-\lambda, \beta)$. From Theorem 5.6 we conclude that $C_\lambda$ is maximal $S^h$-free.
The fact that $(\bar{x}, \bar{y}) \in \operatorname{int}(C_\lambda)$ when $\lambda = \frac{\bar{x}}{\|\bar{x}\|}$ can be verified directly.
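The two facts that the proof leaves to the reader can be spelled out in one line each. The following is a short worked derivation, assuming only $\|\lambda\| = 1$, the Cauchy-Schwarz inequality, and that the interior of $C_\lambda$ is given by the strict inequality (which holds because $(\lambda, 0)$ satisfies it strictly).

```latex
% S^h-freeness: no interior point of C_lambda satisfies ||x|| <= ||y||.
(x, y) \in \operatorname{int}(C_\lambda)
  \;\Longrightarrow\;
  \|y\| < \lambda^T x \le \|\lambda\|\,\|x\| = \|x\|
  \;\Longrightarrow\;
  (x, y) \notin S^h .

% The point to be separated lies in the interior when lambda = \bar{x}/\|\bar{x}\|.
\lambda^T \bar{x} = \frac{\bar{x}^T \bar{x}}{\|\bar{x}\|} = \|\bar{x}\| > \|\bar{y}\|
  \;\Longrightarrow\;
  (\bar{x}, \bar{y}) \in \operatorname{int}(C_\lambda) .
```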
5.4 Homogeneous Quadratics With a Single Homogeneous Linear Constraint
Finding maximal $S$-free sets for $S$ defined using a non-homogeneous quadratic function is much more challenging than the previous case. In general, using a homogenization and diagonalization, any such $S$ can be described as
\[ \{(x, y, z) \in \mathbb{R}^{n+m+l} : \|x\| \le \|y\|,\ a^T x + d^T y + h^T z = -1\}. \tag{5.3} \]
Remark 5.17. Similarly to our discussion in Remark 5.13, the choice of transformation to bring a non-homogeneous quadratic to the form (5.3) is not unique. Different choices can produce different vectors $a, d, h$. Nonetheless, maximality of $S$-free sets is preserved through these transformations if they are one-to-one. We discuss the effect of the different choices of such transformations in Section 5.6.
First of all, we note that the case $h \neq 0$ can be tackled directly using Section 5.3. Indeed, if this is the case it is not hard to see that $C \times \mathbb{R}^l$ is maximal $S$-free (with respect to the corresponding hyperplane), where $C$ is any maximal $S^h$-free set. This follows from Theorem 5.12. Thus, in what follows we consider
\[ S = \{(x, y) \in \mathbb{R}^{n+m} : \|x\| \le \|y\|,\ a^T x + d^T y = -1\}. \]
Also note that using transformations that yield the latter form of $S$ allows us to assume that the given point $(\bar{x}, \bar{y}) \notin S$ satisfies
\[ \|\bar{x}\| > \|\bar{y}\|, \quad a^T \bar{x} + d^T \bar{y} = -1. \]
We elaborate on this point in Section 5.6.
The set $S$ above is our final goal. However, at this point, a simpler set to study is
\[ S_{\le 0} = \{(x, y) \in \mathbb{R}^{n+m} : \|x\| \le \|y\|,\ a^T x + d^T y \le 0\}. \]
In this section we construct maximal $S_{\le 0}$-free sets that contain $(\bar{x}, \bar{y})$ satisfying
\[ \|\bar{x}\| > \|\bar{y}\|, \quad a^T \bar{x} + d^T \bar{y} \le 0. \]
While this set is interesting on its own, it provides an important intermediate step in our construction of maximal $S$-free sets.
As it turns out, the construction of maximal $S_{\le 0}$-free sets depends on whether $\|a\| < \|d\|$ or $\|a\| \ge \|d\|$ and on the value of $m$. Unfortunately, each case requires different ideas. The following remark dismisses a simple case:
Remark 5.18. If $m = 1$ and $\|a\| < |d|$ then $S_{\le 0}$ is convex. To see this, assume that $d > 0$ and let $(x, y) \in S_{\le 0}$ with $y \neq 0$. Then, $dy \le -a^T x \le \|a\|\|x\| \le \|a\||y| < d|y|$. This can only happen if $y < 0$. Therefore, $S_{\le 0}$ is the second order cone $\{(x, y) : \|x\| \le -y\}$. The case $d < 0$ is analogous. We remark that the assumption $\|a\| < |d|$ is fundamental for the argument. As we show in Example 5.25, $S_{\le 0}$ is not necessarily convex if $\|a\| = |d|$.
We divide the remaining cases in the following:
Case 1 $\|a\| \le \|d\| \wedge m > 1$.
Case 2 $\|a\| \ge \|d\|$.
Note that both our strategies allow us to handle the overlapping case $\|a\| = \|d\| \wedge m > 1$. We start with the more natural idea that follows from our previous discussions. This yields the proof of Case 1 and motivates our case distinction.
5.4.1 Case 1: ∥a∥ ≤ ∥d∥ ∧ m > 1
The strategy for proving maximality of $C_\lambda$ was to write $C_\lambda$ as
\[ C_\lambda = \{(x, y) \in \mathbb{R}^{n+m} : -\lambda^T x + \beta^T y \le 0,\ \forall \beta \in D_1(0)\}, \]
and to find an exposing point in $S^h \cap C_\lambda$ for each of the inequalities defining $C_\lambda$. As $S_{\le 0} \subseteq S^h$, $C_\lambda$ is clearly $S_{\le 0}$-free. However, if we try to prove it is maximal following the same technique, we find that it is not clear that some inequalities have exposing points in $S_{\le 0} \cap C_\lambda$. The exposing point of the inequality $(-\lambda, \beta)$, namely $(\lambda, \beta)$, is in $S_{\le 0}$ if and only if $a^T\lambda + d^T\beta \le 0$. Let
\[ G(\lambda) = \{\beta : \|\beta\| = 1,\ a^T\lambda + d^T\beta \le 0\}. \]
It is natural to ask, then, if
\[ C_{G(\lambda)} = \{(x, y) \in \mathbb{R}^{n+m} : -\lambda^T x + \beta^T y \le 0,\ \forall \beta \in G(\lambda)\} \]
is maximal $S_{\le 0}$-free. Intuitively, $C_{G(\lambda)}$ is obtained from $C_\lambda$ by removing from its description all inequalities that do not have an exposing point satisfying $a^T\lambda + d^T\beta \le 0$. It is reasonable to expect maximality, as, by construction, every inequality has a point exposing it. Indeed,
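Proposition 5.19. If $C_{G(\lambda)} \neq \emptyset$ and $C$ is any $S_{\le 0}$-free set such that $C_\lambda \subseteq C$, then $C \subseteq C_{G(\lambda)}$.
Proof. Suppose, by contradiction, that $C \not\subseteq C_{G(\lambda)}$. This implies that there must exist $\beta_0 \in G(\lambda)$ such that $-\lambda^T x + \beta_0^T y \le 0$ is not valid for $C$. As $C_\lambda \subseteq C_{G(\lambda)}$, $-\lambda^T x + \beta_0^T y \le 0$ is valid for $C_\lambda$.
As we saw in Theorem 5.16, $(\lambda, \beta_0) \in C_\lambda$ exposes $-\lambda^T x + \beta_0^T y \le 0$, and since $C_\lambda \subseteq C$, Theorem 5.5 implies that $(\lambda, \beta_0) \in \operatorname{int}(C)$. However, since $\beta_0 \in G(\lambda)$, we have $(\lambda, \beta_0) \in S_{\le 0}$. This contradicts the $S_{\le 0}$-freeness of $C$.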
This result shows that $C_{G(\lambda)}$ is the largest (inclusion-wise) set that one can aspire to obtain from $C_\lambda$. However, it is unclear if $C_{G(\lambda)}$ is $S_{\le 0}$-free. Even more, it is unclear whether $G(\lambda)$ is non-empty or not. In the following we study when $C_{G(\lambda)}$ is $S_{\le 0}$-free.
We start by showing that when $\lambda = \frac{\bar{x}}{\|\bar{x}\|}$, $G(\lambda)$ is non-empty.
Proposition 5.20. Let $(\bar{x}, \bar{y}) \notin S_{\le 0}$ such that $a^T\bar{x} + d^T\bar{y} \le 0$ and let $\lambda = \frac{\bar{x}}{\|\bar{x}\|}$. Then,
\[ G(\lambda) \neq \emptyset. \]
If, in addition, $d = 0$, then $G(\lambda) = D_1(0)$ and $C_{G(\lambda)} = C_\lambda$ is maximal $S_{\le 0}$-free.
Proof. As $(\bar{x}, \bar{y}) \notin S_{\le 0}$, we have that $\|\bar{y}\| < \|\bar{x}\|$. Since $m > 1$, we can find $z \in \mathbb{R}^m \setminus \{0\}$ such that $d^T z = 0$ and $\left\|\frac{\bar{y}}{\|\bar{x}\|} + z\right\| = 1$. Also, $a^T\bar{x} + d^T\bar{y} \le 0$ and $d^T z = 0$ imply that $a^T\lambda + d^T\left(\frac{\bar{y}}{\|\bar{x}\|} + z\right) \le 0$. Thus, $\frac{\bar{y}}{\|\bar{x}\|} + z \in G(\lambda)$.
Regarding the second statement of the proposition, if $d = 0$ then clearly either $G(\lambda) = D_1(0)$ or $G(\lambda) = \emptyset$. Since we are in the case $G(\lambda) \neq \emptyset$, this immediately implies $C_{G(\lambda)} = C_\lambda$. Thus, Proposition 5.19 implies its maximality.
In light of Proposition 5.19, we just need $C_{G(\lambda)}$ to be $S_{\le 0}$-free for it to be maximal. Note that
\[ C_{G(\lambda)} = \Big\{(x, y) \in \mathbb{R}^{n+m} : \max_{\beta \in G(\lambda)} y^T\beta \le \lambda^T x\Big\}, \tag{5.4} \]
and so to prove $S_{\le 0}$-freeness, it is enough to show that for every $(x, y) \in S_{\le 0}$, $\max_{\beta \in G(\lambda)} y^T\beta \ge \lambda^T x$. It is in trying to prove this inequality that the conditions of this case naturally arise.
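Proposition 5.21. Let $(\bar{x}, \bar{y}) \notin S_{\le 0}$ such that $a^T\bar{x} + d^T\bar{y} \le 0$ and $\lambda = \frac{\bar{x}}{\|\bar{x}\|}$. If $\|d\| \ge \|a\|$ and $m > 1$, then $C_{G(\lambda)}$ is maximal $S_{\le 0}$-free and contains $(\bar{x}, \bar{y})$ in its interior.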
Proof. As discussed above, it is enough to show that
\[ \max_{\beta \in G(\lambda)} y^T\beta \ge \lambda^T x \quad \text{for every } (x, y) \in S_{\le 0}. \tag{5.5} \]
Informally, the strategy is to find a dual of $\max_{\beta \in G(\lambda)} y^T\beta$ so that the inequality we have to prove is of the form “minimum of something greater than or equal to $\lambda^T x$”, which is often easier to reason about. As the objective function of $\max_{\beta \in G(\lambda)} y^T\beta$ is linear and $m > 1$, we can replace the $\|\beta\| = 1$ constraint with an inequality and obtain
\[ \max_{\beta \in G(\lambda)} y^T\beta = \max\{y^T\beta : \|\beta\| \le 1,\ a^T\lambda + d^T\beta \le 0\}. \tag{5.6} \]
As $G(\lambda)$ is constructed from an infeasible point $(\bar{x}, \bar{y}) \notin S_{\le 0}$ such that $a^T\bar{x} + d^T\bar{y} \le 0$, i.e., $\|\bar{y}\| < \|\bar{x}\|$, we have $\|\bar{y}/\|\bar{x}\|\| < 1$. Moreover, perturbing the latter we can argue that the rightmost optimization problem in (5.6) has a strictly feasible point. Thus, Slater’s condition holds and we have that
\[ \max\{y^T\beta : \|\beta\| \le 1,\ a^T\lambda + d^T\beta \le 0\} = \inf_{\theta \ge 0} \|y - d\theta\| - \lambda^T a\,\theta. \tag{5.7} \]
Using (5.7), (5.5) is equivalent to
\[ \inf_{\theta \ge 0} \|y - d\theta\| - \lambda^T a\,\theta \ge \lambda^T x \quad \text{for every } (x, y) \in S_{\le 0}. \tag{5.8} \]
We now prove that if $(x, y) \in S_{\le 0}$ and $\theta \ge 0$, then $\lambda^T(x + a\theta) \le \|y - d\theta\|$, which implies the result.
By Cauchy-Schwarz and $\|\lambda\| = 1$, we have that $\lambda^T(x + a\theta) \le \|x + a\theta\|$. Furthermore, $\|x + a\theta\|^2 = \|x\|^2 + 2\theta a^T x + \|a\theta\|^2$. Since $\theta \ge 0$, $\theta a^T x \le -\theta d^T y$. Together with $\|x\|^2 \le \|y\|^2$ they imply
\[ \|x + a\theta\|^2 \le \|y\|^2 - 2\theta d^T y + \|a\|^2\theta^2 = \|y - d\theta\|^2 + (\|a\|^2 - \|d\|^2)\theta^2 \le \|y - d\theta\|^2, \]
where the last inequality follows since $\|d\| \ge \|a\|$.
We have shown that $\|x + a\theta\| \le \|y - d\theta\|$. Hence, $\lambda^T(x + a\theta) \le \|y - d\theta\|$ as we wanted to show, which implies that $C_{G(\lambda)}$ is $S_{\le 0}$-free. Finally, Proposition 5.19 implies the maximality of $C_{G(\lambda)}$, and $(\bar{x}, \bar{y}) \in \operatorname{int}(C_{G(\lambda)})$ since $C_\lambda \subseteq C_{G(\lambda)}$.
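The key inequality of the proof, $\lambda^T(x + a\theta) \le \|y - d\theta\|$ for $(x, y) \in S_{\le 0}$, $\theta \ge 0$ and $\|a\| \le \|d\|$, lends itself to a quick randomized sanity check. The sketch below is only an empirical illustration under arbitrary sampling choices (dimensions, ranges, seed and the function name `max_violation` are all assumptions); it is no substitute for the argument above.

```python
import math
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def max_violation(n=3, m=2, samples=5000, seed=0):
    """Largest sampled value of lambda^T (x + a*theta) - ||y - d*theta||."""
    rng = random.Random(seed)
    worst = -math.inf
    accepted = 0
    while accepted < samples:
        a = [rng.uniform(-1, 1) for _ in range(n)]
        d = [rng.uniform(-1, 1) for _ in range(m)]
        if norm(d) < norm(a):
            continue                          # keep only the case ||a|| <= ||d||
        x = [rng.uniform(-1, 1) for _ in range(n)]
        y = [rng.uniform(-1, 1) for _ in range(m)]
        if norm(x) > norm(y) or dot(a, x) + dot(d, y) > 0:
            continue                          # keep only points of S_{<=0}
        lam = [rng.gauss(0, 1) for _ in range(n)]
        scale = norm(lam)
        lam = [li / scale for li in lam]      # random unit vector
        theta = rng.uniform(0, 5)
        lhs = dot(lam, [xi + ai * theta for xi, ai in zip(x, a)])
        rhs = norm([yi - di * theta for yi, di in zip(y, d)])
        worst = max(worst, lhs - rhs)
        accepted += 1
    return worst


worst = max_violation()
```

A positive value of `worst` (beyond rounding) would contradict the inequality; none is expected when $\|a\| \le \|d\|$.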
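Remark 5.22. Using Proposition 5.54 one can show that $\max_\beta\{y^T\beta : \|\beta\| \le 1,\ a^T\lambda + d^T\beta \le 0\}$ is
\[ \begin{cases} \|y\|, & \text{if } a^T\lambda\|y\| + y^T d \le 0 \\ \sqrt{\Big(1 - \big(\tfrac{a^T\lambda}{\|d\|}\big)^2\Big)\Big(\|y\|^2 - \big(\tfrac{y^T d}{\|d\|}\big)^2\Big)} - \dfrac{a^T\lambda\, y^T d}{\|d\|^2}, & \text{otherwise.} \end{cases} \tag{5.9} \]
Note that this is well defined since if $\|d\| = 0$, then $\|a\| = 0$ and so (5.9) $= \|y\|$. This yields a closed-form expression for $C_{G(\lambda)}$ of the form
\[ C_{G(\lambda)} = \{(x, y) \in \mathbb{R}^{n+m} : \text{(5.9)} \le \lambda^T x\}. \tag{5.10} \]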
The last proposition provides certain guarantees of when a simple modification of $C_\lambda$ yields maximal $S_{\le 0}$-free sets. Our proof heavily relies on our assumptions $\|a\| \le \|d\|$ (to show (5.8)) and $m > 1$ (to show (5.6)), so the natural question is whether these conditions are actually necessary for our statement to be true. Thus, before moving on to the next case, we argue why these conditions are indeed necessary. The following examples motivate our case distinction and illustrate all cases we have covered.
Example 5.23. Consider the following set of the type $S_{\le 0}$, which we denote $S^1_{\le 0}$:
\[ S^1_{\le 0} = \{(x, y_1, y_2) \in \mathbb{R}^3 : |x| \le \|y\|,\ ax + d^T y \le 0\} \]
with $a = 1$ and $d = (1, -1)^T$. Let us consider the point $(\bar{x}, \bar{y}) = (-1, 0, 0)^T$, clearly satisfying the linear inequality, but not in $S^1_{\le 0}$. In Figure 5.2 we show $S^1_{\le 0}$, the $S^1_{\le 0}$-free set given by $C_\lambda$ and the set $C_{G(\lambda)}$ for $\lambda = \frac{\bar{x}}{\|\bar{x}\|}$. Since in this case $|a| = 1 \le \sqrt{2} = \|d\|$ and $m > 1$, we know $C_{G(\lambda)}$ is maximal $S^1_{\le 0}$-free.
Figure 5.2: Sets $C_\lambda$ and $C_{G(\lambda)}$ in Example 5.23 for the case $\|a\| \le \|d\|$. (a) $S^1_{\le 0}$ (orange) and the corresponding $C_\lambda$ set (green); the latter is $S^1_{\le 0}$-free but not maximal. (b) $S^1_{\le 0}$ (orange) and the corresponding $C_{G(\lambda)}$ set (green); the latter is maximal $S^1_{\le 0}$-free.
Example 5.24. Consider the set $S^2_{\le 0}$, defined as
\[ S^2_{\le 0} = \{(x_1, x_2, y) \in \mathbb{R}^3 : \|x\| \le |y|,\ a^T x + dy \le 0\} \]
with $a = (-1/\sqrt{2}, 1/\sqrt{2})^T$ and $d = 1/\sqrt{2}$ (the $1/\sqrt{2}$ terms are not really important now as we can scale the inequality, but we reuse this example in subsequent sections where they do matter), and $(\bar{x}, \bar{y}) = (-1, -1, 0)^T$. This point satisfies the linear inequality in $S^2_{\le 0}$, but it is not in $S^2_{\le 0}$. Let $\lambda = \frac{\bar{x}}{\|\bar{x}\|}$.
In this case $a^T\lambda = 0$, and as a consequence the corresponding set $G(\lambda)$ is given by the singleton $\{-1\}$. In Figure 5.3 we show $S^2_{\le 0}$, the $S^2_{\le 0}$-free set given by $C_\lambda$ and the set $C_{G(\lambda)}$. In this case $\|a\| = 1 > 1/\sqrt{2} = |d|$, so we have no guarantee on the $S^2_{\le 0}$-freeness of $C_{G(\lambda)}$. Even more, it is not $S^2_{\le 0}$-free.
Figure 5.3: Sets $C_\lambda$ and $C_{G(\lambda)}$ in Example 5.24 for the case $\|a\| > \|d\|$. (a) $S^2_{\le 0}$ (orange) and the corresponding $C_\lambda$ set (green); the latter is $S^2_{\le 0}$-free but not maximal. (b) $S^2_{\le 0}$ (orange) and the corresponding $C_{G(\lambda)}$ set (green); the latter is not $S^2_{\le 0}$-free.
Example 5.25. Let us consider the following example with $n = 2$, $m = 1$ and $\|d\| = \|a\|$. Let $a = (-3, 4)^T$, $d = 5$ and consider $(\bar{x}, \bar{y}) = (-4, -3, -1)$ and $\lambda = \frac{\bar{x}}{\|\bar{x}\|}$. Clearly $(\bar{x}, \bar{y}) \notin S_{\le 0}$, but it satisfies the linear constraint. In this case $a^T\lambda = 0$, so $\beta \in G(\lambda)$ must satisfy
\[ 5 \cdot \beta \le 0, \quad |\beta| = 1, \]
thus $G(\lambda) = \{-1\}$. Nonetheless, $(x, y) = (3, -4, 5) \in S_{\le 0}$, and
\[ \lambda^T x + y = 0 + 5 > 0. \]
This means $(x, y) \in \operatorname{int}(C_{G(\lambda)})$. Thus, $C_{G(\lambda)}$ is not $S_{\le 0}$-free.
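The arithmetic of Example 5.25 can be replayed directly. The following short script uses exactly the data of the example; only the variable and function names are choices made for this sketch. With $G(\lambda) = \{-1\}$, the set $C_{G(\lambda)}$ is the half-space $\{\lambda^T x + y \ge 0\}$, so a strictly positive value of $\lambda^T x + y$ certifies an interior point.

```python
import math

a = (-3.0, 4.0)
d = 5.0
x_bar, y_bar = (-4.0, -3.0), -1.0

norm_x_bar = math.hypot(*x_bar)
lam = (x_bar[0] / norm_x_bar, x_bar[1] / norm_x_bar)


def in_S_le0(x, y):
    """Membership in S_{<=0} = {||x|| <= |y|, a^T x + d*y <= 0} for m = 1."""
    return math.hypot(*x) <= abs(y) and a[0] * x[0] + a[1] * x[1] + d * y <= 0


# For m = 1 the unit sphere is {-1, +1}; keep the betas with a^T lam + d*beta <= 0.
a_lam = a[0] * lam[0] + a[1] * lam[1]
G = [beta for beta in (-1.0, 1.0) if a_lam + d * beta <= 0]

# The point of S_{<=0} from the example and its slack in lam^T x + y >= 0.
x, y = (3.0, -4.0), 5.0
slack = lam[0] * x[0] + lam[1] * x[1] + y
```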
Remark 5.26. The situation in Example 5.25 is similar to the one depicted in Figure 5.3b. Roughly speaking, when $\|a\| = \|d\|$ the upper region becomes a single line and this line intersects the interior of $C_{G(\lambda)}$. Intuitively, when we consider $S$ where $a^T x + d^T y = -1$, this line should not appear. Even more, $S$ should be convex. We will see that this is the case in Section 5.5.1.
5.4.2 Case 2: ∥a∥ ≥ ∥d∥
As we have seen in Example 5.24, when $\|a\| \le \|d\|$ does not hold, $C_{G(\lambda)}$ is not necessarily $S_{\le 0}$-free. On the other hand, $C_\lambda$ is $S_{\le 0}$-free but not necessarily maximal. As before, we are looking for a convex set $C$ that is maximal $S_{\le 0}$-free and contains $C_\lambda$. We point out that not all statements in this section require $\lambda = \frac{\bar{x}}{\|\bar{x}\|}$.
Projecting-out the lineality space
The lineality space of $C_\lambda$ is $L = \{(x, y) : \lambda^T x = 0,\ y = 0\}$ and as $C_\lambda \subseteq C$, it must be that $L$ is contained in the lineality space of $C$. By Theorem 5.12, $\operatorname{proj}_{L^\perp} C$ is maximal $\operatorname{proj}_{L^\perp} S_{\le 0}$-free, thus, it might be possible (and we show it is) to find $C$ by studying maximal $\operatorname{proj}_{L^\perp} S_{\le 0}$-free sets. We note that $L^\perp = \langle\lambda\rangle \times \mathbb{R}^m$ and
\[ \operatorname{proj}_{L^\perp} S_{\le 0} = \{(\lambda^T x, y) : \|x\| \le \|y\|,\ a^T x + d^T y \le 0\}. \]
After analyzing low dimensional instances of $\operatorname{proj}_{L^\perp} S_{\le 0}$ we conjecture that $(\operatorname{proj}_{L^\perp} S_{\le 0})^c$ is formed by the union of two disjoint convex sets. If this is true, it would directly provide maximal $\operatorname{proj}_{L^\perp} S_{\le 0}$-free sets.
In order to show that this is actually true, we use the following strategy. For each point $y \in \mathbb{R}^m$, the points $(\lambda^T x, y) \in \operatorname{proj}_{L^\perp} S_{\le 0}$ lie on an interval, namely, $\{\lambda^T x : \|x\| \le \|y\|,\ a^T x + d^T y \le 0\}$. Thus, we define the functions
\[ y \mapsto \max\{\lambda^T x : \|x\| \le \|y\|,\ a^T x + d^T y \le 0\} \quad \text{and} \quad y \mapsto \min\{\lambda^T x : \|x\| \le \|y\|,\ a^T x + d^T y \le 0\}. \]
If the first function is convex and the second is concave, then the closure of $(\operatorname{proj}_{L^\perp} S_{\le 0})^c$ is the union of the epigraph of the first one and the hypograph of the second one. Thus, it suffices to show that
\[ \phi_\lambda(y) = \max_x\{\lambda^T x : \|x\| \le \|y\|,\ a^T x + d^T y \le 0\} \tag{5.11} \]
is convex for every $\lambda \in D_1(0)$, as the second function is $-\phi_{-\lambda}$.
We first show that $\phi_\lambda$ is defined over all of $\mathbb{R}^m$.
Proposition 5.27. If $\|d\| \le \|a\|$, then for every $y$ the set $\{(x, y) : \|x\| \le \|y\|,\ a^T x \le -d^T y\}$ is not empty.
Proof. Note that $x = -\frac{(d^T y)\, a}{\|a\|^2}$ belongs to the set. Indeed, $a^T x = -d^T y$, in particular, $a^T x \le -d^T y$. Also, $\|d\| \le \|a\|$ implies that $\|x\| \le \frac{\|d\|}{\|a\|}\|y\| \le \|y\|$.
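The witness point used in the proof is easy to test on random data. The sketch below draws vectors with $0 < \|d\| \le \|a\|$ and checks both defining conditions for $x = -(d^T y)\,a/\|a\|^2$; the dimensions, sampling ranges, tolerance and helper names are assumptions made for illustration.

```python
import math
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def witness(a, d, y):
    """x = -(d^T y) a / ||a||^2, the point from the proof of Proposition 5.27."""
    scale = -dot(d, y) / dot(a, a)
    return [scale * ai for ai in a]


rng = random.Random(1)
checked = 0
all_feasible = True
for _ in range(2000):
    a = [rng.uniform(-1, 1) for _ in range(3)]
    d = [rng.uniform(-1, 1) for _ in range(2)]
    if norm(a) == 0 or norm(d) > norm(a):
        continue                              # keep only ||d|| <= ||a||, a != 0
    y = [rng.uniform(-1, 1) for _ in range(2)]
    x = witness(a, d, y)
    ok = norm(x) <= norm(y) + 1e-12 and dot(a, x) <= -dot(d, y) + 1e-12
    all_feasible = all_feasible and ok
    checked += 1
```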
We now show that $\phi_\lambda$ is convex. Furthermore, we prove that $\phi_\lambda$ is sublinear, that is, convex and positively homogeneous. The proof is basically to find $\phi_\lambda$ explicitly and then verify its properties. Note that in this case $\|a\| = 0$ implies that the linear inequality in $S_{\le 0}$ is trivial. Thus, we assume, without loss of generality, that $\|a\| = 1$.
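Proposition 5.28. Let $\lambda, a \in D_1(0) \subseteq \mathbb{R}^n$ and $d \in \mathbb{R}^m$ such that $\|d\| \le 1$. Then,
\[ \phi_\lambda(y) = \begin{cases} \|y\|, & \text{if } \lambda^T a\|y\| + d^T y \le 0 \\ \sqrt{(\|y\|^2 - (d^T y)^2)(1 - (\lambda^T a)^2)} - d^T y\,\lambda^T a, & \text{otherwise.} \end{cases} \tag{5.12} \]
Furthermore, $\phi_\lambda$ is sublinear and
– if $\|d\| = 1 \wedge m > 1$, then $\phi_\lambda$ is differentiable in $\mathbb{R}^m \setminus d\mathbb{R}_+$,
– otherwise $\phi_\lambda$ is differentiable in $\mathbb{R}^m \setminus \{0\}$.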
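Sublinearity of the closed form (5.12) can be probed empirically by testing positive homogeneity and subadditivity on random inputs. In the sketch below `t` stands for $\lambda^T a \in [-1, 1]$ and $d$ is rescaled so that $\|d\| \le 1$; the dimension, sample count, seed and tolerance are assumptions, and passing these checks is merely consistent with the proposition, not a proof of it.

```python
import math
import random


def phi(y, d, t):
    """Closed form (5.12); t stands for lambda^T a, with ||a|| = 1 and ||d|| <= 1."""
    ny = math.sqrt(sum(v * v for v in y))
    dy = sum(di * yi for di, yi in zip(d, y))
    if t * ny + dy <= 0:
        return ny
    return math.sqrt(max(0.0, (ny * ny - dy * dy) * (1 - t * t))) - dy * t


rng = random.Random(2)
homogeneous_ok = True
subadditive_ok = True
for _ in range(5000):
    d = [rng.uniform(-1, 1) for _ in range(3)]
    nd = math.sqrt(sum(v * v for v in d))
    if nd > 1:
        d = [v / nd for v in d]               # enforce ||d|| <= 1
    t = rng.uniform(-1, 1)
    y1 = [rng.uniform(-1, 1) for _ in range(3)]
    y2 = [rng.uniform(-1, 1) for _ in range(3)]
    s = rng.uniform(0, 10)
    # phi(s * y) == s * phi(y) for s >= 0
    scaled = phi([s * v for v in y1], d, t)
    homogeneous_ok = homogeneous_ok and abs(scaled - s * phi(y1, d, t)) <= 1e-9
    # phi(y1 + y2) <= phi(y1) + phi(y2)
    total = phi([u + v for u, v in zip(y1, y2)], d, t)
    subadditive_ok = subadditive_ok and total <= phi(y1, d, t) + phi(y2, d, t) + 1e-9
```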
Proof. The fact that $\phi_\lambda$ is positively homogeneous can be easily verified. We defer the proof that $\phi_\lambda$ is of the form (5.12) to Section 5.10, see Proposition 5.54. Thus, only convexity and differentiability remain to be shown.
First, note that if $\lambda = a$, then $\phi_\lambda(y) = -d^T y$. This function is clearly sublinear and differentiable everywhere. On the other hand, if $\lambda = -a$, then $\phi_\lambda(y) = \|y\|$. This function is clearly sublinear and differentiable everywhere but at the origin.
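We now consider $\lambda \neq \pm a$. Let
\[ A_1 = \{y : \lambda^T a \|y\| + d^T y \le 0\}, \qquad A_2 = \{y : \lambda^T a \|y\| + d^T y \ge 0\}, \tag{5.13} \]
and let $\phi^1_\lambda$ and $\phi^2_\lambda$ be the restrictions of $\phi_\lambda$ to $A_1$ and $A_2$, respectively.
To show that $\phi_\lambda$ is convex we use (Solovev, 1983, Theorem 3). In our particular case, since $\phi_\lambda$ is positively homogeneous, this theorem implies that we just need to check that $\phi_\lambda$ is convex on each convex subset of $A_1$ and $A_2$, that $\phi^1_\lambda = \phi^2_\lambda$ on $A_1 \cap A_2$, and that
\[ \phi'_\lambda(y; \rho) + \phi'_\lambda(y; -\rho) \ge 0 \quad \text{for all } \rho \in \mathbb{R}^m \setminus \{0\},\ y \in A_1 \cap A_2. \tag{5.14} \]
Here, $\phi'_\lambda(y; \rho)$ is the directional derivative of $\phi_\lambda$ at $y$ in the direction of $\rho$.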
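Clearly, $\phi_\lambda$ is convex on each convex subset of $A_1$. The function $\phi^2_\lambda$ is of the form $c_1 \|y\|_W - c_2\, d^T y$, where $W = I - dd^T \succeq 0$ and $c_1 = \sqrt{1 - (\lambda^T a)^2} \ge 0$, $c_2 = \lambda^T a$ are constants. Thus, $\phi_\lambda$ is convex on each convex subset of $A_2$.
It is not hard to see that $\phi^1_\lambda(y) = \phi^2_\lambda(y)$ for $y \in A_1 \cap A_2$.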
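Let us verify (5.14) for $y \neq 0$. For this, first notice that $\phi^1_\lambda(y)$ is differentiable whenever $y \neq 0$. Likewise, $\phi^2_\lambda(y)$ is differentiable whenever $y \neq 0$ if $\|d\| < 1$, or whenever $y \notin d\mathbb{R}_+$ if $\|d\| = 1$. However, if $y \in A_1 \cap A_2 \setminus \{0\}$ and $\|d\| = 1$, then $y \notin d\mathbb{R}_+$, thus $\phi^2_\lambda$ is differentiable in a neighborhood of $y$. Furthermore,
\[ \begin{aligned} \nabla \phi^2_\lambda(y) &= \frac{(1 - (\lambda^T a)^2)(I - dd^T) y}{\sqrt{(\|y\|^2 - (d^T y)^2)(1 - (\lambda^T a)^2)}} - \lambda^T a\, d \\ &= \frac{1}{\|y\|}(I - dd^T) y - \lambda^T a\, d \\ &= \frac{y}{\|y\|} \\ &= \nabla \phi^1_\lambda(y). \end{aligned} \]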
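Therefore, $\phi_\lambda$ is differentiable whenever $y \neq 0$ if $\|d\| < 1$, or whenever $y \notin d\mathbb{R}_+$ if $\|d\| = 1$. Thus, (5.14) holds with equality for $y \in A_1 \cap A_2 \setminus \{0\}$.
It remains to verify (5.14) for $y = 0$. Let $\rho$ be such that $\rho \in A_1$ and $-\rho \in A_2$. As $\phi_\lambda$ is positively homogeneous, $\phi'_\lambda(0; \cdot) = \phi_\lambda(\cdot)$. Hence,
\[ \phi'_\lambda(0; \rho) = \|\rho\| \quad\text{and}\quad \phi'_\lambda(0; -\rho) = \sqrt{1 - (\lambda^T a)^2}\,\sqrt{\|\rho\|^2 - (d^T \rho)^2} + d^T \rho\, \lambda^T a. \]
We need to prove that
\[ \sqrt{1 - (\lambda^T a)^2}\,\sqrt{\|\rho\|^2 - (d^T \rho)^2} + d^T \rho\, \lambda^T a + \|\rho\| \ge 0. \]
By Cauchy-Schwarz, $|d^T \rho\, \lambda^T a| \le \|d\| \|\rho\|\, |\lambda^T a| < \|\rho\|$, where the strict inequality uses $\|d\| \le 1$ and $|\lambda^T a| < 1$ (as $\lambda \neq \pm a$). Thus, $d^T \rho\, \lambda^T a + \|\rho\| > 0$. Since $\sqrt{1 - (\lambda^T a)^2}\,\sqrt{\|\rho\|^2 - (d^T \rho)^2} \ge 0$, the inequality follows. Therefore, $\phi_\lambda$ is convex.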
We have proved that $\phi_\lambda$ is convex and differentiable in $\mathbb{R}^m \setminus \{0\}$ if $\|d\| < 1$ and in $\mathbb{R}^m \setminus d\mathbb{R}_+$ if $\|d\| = 1$. It remains to show that if $m = 1$ and $\|d\| = 1$, then $\phi_\lambda$ is differentiable in $\mathbb{R}^m \setminus \{0\}$. This follows from (5.12), since $\phi^2_\lambda(y) = -d y\, \lambda^T a$ in this case. This concludes the proof.
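Sublinearity is equivalent to positive homogeneity together with subadditivity, $\phi_\lambda(y_1 + y_2) \le \phi_\lambda(y_1) + \phi_\lambda(y_2)$. The sketch below probes both properties of the closed form (5.12) on random points. It is a numerical illustration only: the function name `phi`, the sample size, and the choice of data (the vectors of Figure 5.5) are assumptions made for the example.

```python
import math
import random

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def phi(lam, a, d, y):
    """Closed form (5.12); assumes ||lam|| = ||a|| = 1 and ||d|| <= 1."""
    c, dy = dot(lam, a), dot(d, y)
    ny = math.sqrt(dot(y, y))
    if c * ny + dy <= 0:
        return ny
    return math.sqrt(max(0.0, (ny**2 - dy**2) * (1 - c**2))) - dy * c

# Data taken from the caption of Figure 5.5
a = (3 / 5, -4 / 5)
d = (3 / 10, 2 / 5)
lam = (63 / 65, 16 / 65)

random.seed(0)           # fixed seed so the experiment is reproducible
worst_subadd = math.inf  # smallest value of phi(y1) + phi(y2) - phi(y1 + y2)
worst_homog = 0.0        # largest value of |phi(t * y) - t * phi(y)|
for _ in range(2000):
    y1 = (random.uniform(-2, 2), random.uniform(-2, 2))
    y2 = (random.uniform(-2, 2), random.uniform(-2, 2))
    t = random.uniform(0, 3)
    ysum = (y1[0] + y2[0], y1[1] + y2[1])
    slack = phi(lam, a, d, y1) + phi(lam, a, d, y2) - phi(lam, a, d, ysum)
    worst_subadd = min(worst_subadd, slack)
    scaled = (t * y1[0], t * y1[1])
    gap = abs(phi(lam, a, d, scaled) - t * phi(lam, a, d, y1))
    worst_homog = max(worst_homog, gap)

print("smallest subadditivity slack:", worst_subadd)
print("largest homogeneity gap:", worst_homog)
```

A slack that stays nonnegative up to rounding is consistent with Proposition 5.28; it does not, of course, replace the proof.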
With this, we have completed the proof of the sublinearity of $\phi_\lambda$. Moreover, we have explicitly described the function. As a corollary:
Corollary 5.29. The epigraph of $\phi_\lambda$ and the hypograph of $-\phi_{-\lambda}$ are maximal $\operatorname{proj}_{L^\perp} S_{\le 0}$-free sets.
While this result provides two convex sets, it is not clear which one to choose, that is, which of these two $\operatorname{proj}_{L^\perp} S_{\le 0}$-free sets yields an $S_{\le 0}$-free set containing the given solution $(\bar x, \bar y)$. We answer this next.
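Lemma 5.30. Consider $(\bar x, \bar y)$ such that $\|\bar x\| > \|\bar y\|$ and $a^T \bar x + d^T \bar y \le 0$, and let $\lambda = \frac{\bar x}{\|\bar x\|}$. Then, the projection of $(\bar x, \bar y)$ onto $L^\perp$ is in the interior of the epigraph of $\phi_\lambda$.
Proof. The projection of $(\bar x, \bar y)$ onto $L^\perp$ is given by $(\lambda^T \bar x, \bar y)$. Then, $\phi_\lambda(\bar y) = \max_x \{\lambda^T x : \|x\| \le \|\bar y\|,\ a^T x + d^T \bar y \le 0\} \le \lambda^T \lambda \|\bar y\| = \|\bar y\|$. Thus, $\lambda^T \bar x = \|\bar x\| > \|\bar y\| \ge \phi_\lambda(\bar y)$.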
Back to the original space
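Finally, we use the above to construct $S_{\le 0}$-free sets, i.e., sets in the original space. Embedded in $\mathbb{R}^{n+m}$, the epigraph of $\phi_\lambda$ is $\{(t\lambda, y) : y \in \mathbb{R}^m,\ \phi_\lambda(y) \le t\}$. Thus,
\[ \begin{aligned} C_{\phi_\lambda} &= \{(t\lambda, y) : y \in \mathbb{R}^m,\ \phi_\lambda(y) \le t\} + L \\ &= \{(t\lambda + z, y) : y \in \mathbb{R}^m,\ \lambda^T z = 0,\ \phi_\lambda(y) \le t\} \\ &= \{(x, y) : \phi_\lambda(y) \le \lambda^T x\}. \end{aligned} \tag{5.15} \]
As a summary, we prove that $C_{\phi_\lambda}$ is maximal $S_{\le 0}$-free without going through the projection.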
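Proposition 5.31. Let $\lambda \in D_1(0)$ and $\phi_\lambda(y) = \max_x \{\lambda^T x : (x, y) \in S_{\le 0}\}$. If $\|a\| = 1 \ge \|d\|$, then $C_{\phi_\lambda} = \{(x, y) : \phi_\lambda(y) \le \lambda^T x\}$ is maximal $S_{\le 0}$-free. Additionally, if $(\bar x, \bar y) \notin S_{\le 0}$ is such that $a^T \bar x + d^T \bar y \le 0$, letting $\lambda = \frac{\bar x}{\|\bar x\|}$ ensures $(\bar x, \bar y) \in \operatorname{int}(C_{\phi_\lambda})$.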
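Proof. We will prove that $C_{\phi_\lambda}$ is convex, free, and maximal.
The convexity of $C_{\phi_\lambda}$ follows directly from Proposition 5.28. Also, $C_{\phi_\lambda}$ is $S_{\le 0}$-free since if $(x, y) \in S_{\le 0}$, then $\phi_\lambda(y) \ge \lambda^T x$. Therefore, $(x, y)$ is not in the interior of $C_{\phi_\lambda}$.
We now focus on proving maximality. In the cases where $\phi_\lambda$ is differentiable in $\mathbb{R}^m \setminus \{0\}$ we can directly write
\[ C_{\phi_\lambda} = \{(x, y) \in \mathbb{R}^{n+m} : \nabla \phi_\lambda(\beta)^T y \le \lambda^T x,\ \forall \beta \in D_1(0)\}. \]
Let $\beta \in D_1(0)$ and let $x_\beta$ be the optimal solution of problem (5.11) which defines $\phi_\lambda(\beta)$. That is, $\lambda^T x_\beta = \phi_\lambda(\beta)$. By Lemma 5.15, the inequality $-\lambda^T x + \nabla \phi_\lambda(\beta)^T y \le 0$ is exposed by $(x_\beta, \beta)$.
The only remaining case is $\|d\| = 1 \wedge m > 1$, where $\phi_\lambda$ is only differentiable in $D_1(0) \setminus \{d\}$. Since in this case $m > 1$, we can safely remove a single inequality from the outer description of $C_{\phi_\lambda}$ without affecting it, i.e.,
\[ C_{\phi_\lambda} = \{(x, y) \in \mathbb{R}^{n+m} : \nabla \phi_\lambda(\beta)^T y \le \lambda^T x,\ \forall \beta \in D_1(0) \setminus \{d\}\}. \]
Using the same argument as above, we can find an exposing point of each inequality $-\lambda^T x + \nabla \phi_\lambda(\beta)^T y \le 0$ for $\beta \in D_1(0) \setminus \{d\}$.
The fact that $(\bar x, \bar y) \in \operatorname{int}(C_{\phi_\lambda})$ when $\lambda = \frac{\bar x}{\|\bar x\|}$ follows directly since $C_\lambda \subseteq C_{\phi_\lambda}$.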
Figure 5.4: $S^2_{\le 0}$ in Example 5.24 (orange) and the set $C_{\phi_\lambda}$ (blue). The latter is maximal $S^2_{\le 0}$-free.
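Example 5.32. Let us recall the set $S^2_{\le 0}$ in Example 5.24,
\[ S^2_{\le 0} = \{(x_1, x_2, y) \in \mathbb{R}^3 : \|x\| \le |y|,\ a^T x + d y \le 0\} \]
with $a = (-1/\sqrt{2}, 1/\sqrt{2})^T$, $d = 1/\sqrt{2}$, and $(\bar x, \bar y) = (-1, -1, 0)^T$. In Figure 5.3 we showed that the set $C_\lambda$ is $S^2_{\le 0}$-free but not maximal, and that $C_{G(\lambda)}$ is not $S^2_{\le 0}$-free. In Figure 5.4 we show the set $C_{\phi_\lambda}$, which is maximal $S^2_{\le 0}$-free. For this example, we know $\lambda^T a = 0$, thus
\[ \lambda^T a \|y\| + d^T y \le 0 \iff y \le 0. \]
A simple calculation using (5.12) yields
\[ \phi_\lambda(y) = \begin{cases} -y, & \text{if } y \le 0, \\ \frac{y}{\sqrt{2}}, & \text{if } y > 0. \end{cases} \]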
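The "simple calculation" can be reproduced numerically. The sketch below specializes (5.12) to $m = 1$ and compares it with the piecewise expression of this example; the function names are hypothetical and chosen only for illustration.

```python
import math

SQ2 = math.sqrt(2)

def phi(lam_dot_a, d, y):
    """(5.12) for m = 1: y and d are scalars, lam_dot_a stands for lambda^T a."""
    if lam_dot_a * abs(y) + d * y <= 0:
        return abs(y)
    radicand = (y * y - (d * y) ** 2) * (1 - lam_dot_a ** 2)
    return math.sqrt(max(0.0, radicand)) - d * y * lam_dot_a

def phi_piecewise(y):
    """Piecewise expression stated in Example 5.32."""
    return -y if y <= 0 else y / SQ2

a = (-1 / SQ2, 1 / SQ2)
lam = (-1 / SQ2, -1 / SQ2)  # x_bar / ||x_bar|| with x_bar = (-1, -1)
d = 1 / SQ2
c = lam[0] * a[0] + lam[1] * a[1]  # lambda^T a, which vanishes here

for y in (-3.0, -0.5, 0.0, 0.25, 1.0, 4.0):
    print(y, phi(c, d, y), phi_piecewise(y))
```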
Remark 5.33. As we saw in the proof of Proposition 5.28, if $\lambda = a$, then $\phi_\lambda(y) = -d^T y$. This implies that $C_{\phi_\lambda} = \{(x, y) : a^T x + d^T y \ge 0\}$. By definition, this set does not contain any point satisfying $a^T x + d^T y \le 0$ in its interior; thus, it is a very uninteresting maximal $S_{\le 0}$-free set. One is usually interested in constructing a maximal $S_{\le 0}$-free set that contains a point $(\bar x, \bar y)$ satisfying $a^T \bar x + d^T \bar y \le 0$. Hence, by Lemma 5.30, whenever we assume that $\lambda = \frac{\bar x}{\|\bar x\|}$ where $a^T \bar x + d^T \bar y \le 0$ and $\|\bar x\| > \|\bar y\|$, it will automatically hold that $\lambda \neq a$.
Remark 5.34. At this point we would like to show some relations between $C_\lambda$, $C_{\phi_\lambda}$ and $C_{G(\lambda)}$. The inequalities defining $C_\lambda$ are $(-\lambda, \beta)$ for $\beta \in D_1(0)$.
Figure 5.5: Let $a = (\frac{3}{5}, -\frac{4}{5})$, $d = (\frac{3}{10}, \frac{2}{5})$, and $\lambda = (\frac{63}{65}, \frac{16}{65})$. The boundaries of the $y$ coordinates of the polars of $C_\lambda$, $C_{G(\lambda)}$, and $C_{\phi_\lambda}$ are depicted in orange, green, and blue, respectively. They all coincide below the green line.
Equivalently, the polar of $C_\lambda$ is the cone generated by $\{-\lambda\} \times \operatorname{conv} D_1(0) = \{-\lambda\} \times B_1(0)$.
The inequalities defining $C_{G(\lambda)}$ are $(-\lambda, \beta)$ for $\beta \in G(\lambda) = \{\beta \in D_1(0) : \|\beta\| \lambda^T a + d^T \beta \le 0\}$. Equivalently, the polar of $C_{G(\lambda)}$ is the cone generated by $\{-\lambda\} \times \operatorname{conv} G(\lambda)$.
The inequalities defining $C_{\phi_\lambda}$ are $(-\lambda, \nabla \phi_\lambda(\beta))$ for $\beta \in D_1(0)$. When $\beta \in G(\lambda)$, then $\phi_\lambda(y) = \|y\|$ and so the inequalities are $(-\lambda, \beta)$. In other words, some inequalities defining $C_{\phi_\lambda}$ coincide with the inequalities defining $C_{G(\lambda)}$ and $C_\lambda$. Thus, when $C_{\phi_\lambda}$ is convex (i.e., when $\|a\| \ge \|d\|$), there is a region where all three convex sets look the same. In terms of the polars, when $\|a\| \ge \|d\|$, the polar of $C_{\phi_\lambda}$ lies between the polars of $C_{G(\lambda)}$ and $C_\lambda$. This is depicted in Figure 5.5.
5.5 Non-Homogeneous Quadratics
As discussed at the beginning of the previous section, we now study a general
non-homogeneous quadratic which can be written as
S={(x, y)∈Rn+m:∥x∥ ≤ ∥y∥, aTx+dTy=−1}.
We assume we are given (x¯, y¯) such that
∥x¯∥>∥y¯∥, aTx¯ + dTy¯ = −1.
Much like in Section 5.4, we begin by dismissing a simple case.
Remark 5.35. The case $\|a\| \le \|d\| \wedge m = 1$ can be treated separately. Note that, as opposed to the analogous analysis at the beginning of Section 5.4, here we include the case where the norms are equal. As already noted in Remark 5.26, we should expect $S$ to be convex in this case. Indeed, as $d \neq 0$ (if not, then $a = 0$ and $S = \emptyset$) we can write $y = \frac{1}{d}(-1 - a^T x)$ and consequently
\[ \begin{aligned} S &= \{(x, y) \in \mathbb{R}^{n+1} : \|x\|^2 \le \tfrac{1}{d^2}(1 + 2a^T x + (a^T x)^2),\ a^T x + d^T y = -1\} \\ &= \{(x, y) \in \mathbb{R}^{n+1} : x^T\left(I - \tfrac{1}{d^2} aa^T\right)x - \tfrac{1}{d^2}(1 + 2a^T x) \le 0,\ a^T x + d^T y = -1\}. \end{aligned} \]
Since $I - \frac{1}{d^2} aa^T$ is positive semi-definite whenever $|d| \ge \|a\|$, the set $S$ is convex. Thus, a maximal $S$-free set, or even directly a cutting plane, can be obtained using a supporting hyperplane.
Similarly to Section 5.4, we distinguish the following cases:
Case 1 ∥a∥ ≤ ∥d∥ ∧ m > 1.
Case 2 ∥a∥>∥d∥.
Since $S \subsetneq S_{\le 0}$, the set $C_{G(\lambda)}$ (respectively $C_{\phi_\lambda}$) is $S$-free in Case 1 (respectively Case 2), as per Section 5.4. It is natural to wonder whether these sets are already maximal.
5.5.1 Case 1: ∥a∥ ≤ ∥d∥ ∧ m > 1
The technique we used to prove maximality of $C_{G(\lambda)}$ with respect to $S_{\le 0}$ is to exploit that $C_{G(\lambda)}$ is defined by the inequalities of $C_\lambda$ exposed by elements of $S_{\le 0}$. Following this approach, we study which inequalities of $C_{G(\lambda)}$ are exposed by a point of $S$. Recall that
\[ C_{G(\lambda)} = \{(x, y) \in \mathbb{R}^{n+m} : -\lambda^T x + \beta^T y \le 0,\ \forall \beta \in G(\lambda)\}, \]
where
\[ G(\lambda) = \{\beta \in \mathbb{R}^m : \|\beta\| = 1,\ a^T \lambda + d^T \beta \le 0\}. \]
Consider an inequality in the definition of $C_{G(\lambda)}$ given by $(-\lambda, \beta)$ such that $a^T \lambda + d^T \beta < 0$. Then, the point $(\lambda, \beta) \in S_{\le 0}$ can be scaled by $\mu = \frac{-1}{a^T \lambda + d^T \beta}$ to the exposing point $\mu(\lambda, \beta) \in S$. Thus, almost every inequality describing $C_{G(\lambda)}$ is exposed by points of $S$. Furthermore, we can simply remove the inequalities that are not exposed by points of $S$ from the description of $C_{G(\lambda)}$ without changing the set $C_{G(\lambda)}$. We specify this next.
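Theorem 5.36. Let $\lambda = \frac{\bar x}{\|\bar x\|}$,
\[ H = \{(x, y) \in \mathbb{R}^{n+m} : a^T x + d^T y = -1\} \]
and
\[ S_{\le 0} = \{(x, y) \in \mathbb{R}^{n+m} : \|x\| \le \|y\|,\ a^T x + d^T y \le 0\}, \]
where $\|a\| \le \|d\| \wedge m > 1$. Then, $C_{G(\lambda)}$ is maximal $S_{\le 0}$-free with respect to $H$ and contains $(\bar x, \bar y)$ in its interior.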
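Proof. By Proposition 5.21, we know that $C_{G(\lambda)}$ is maximal $S_{\le 0}$-free. Thus, $C_{G(\lambda)}$ is $S_{\le 0}$-free with respect to $H$. To prove maximality, we note that thanks to $m > 1$:
\[ C_{G(\lambda)} = \{(x, y) \in \mathbb{R}^{n+m} : -\lambda^T x + \beta^T y \le 0,\ \forall \beta \in \operatorname{ri}(G(\lambda))\}, \]
where
\[ \operatorname{ri}(G(\lambda)) = \{\beta \in \mathbb{R}^m : \|\beta\| = 1,\ a^T \lambda + d^T \beta < 0\} \]
is the relative interior of $G(\lambda)$. Consider $\beta_0 \in \operatorname{ri}(G(\lambda))$. As we saw in Proposition 5.19, $(\lambda, \beta_0) \in C_{G(\lambda)} \cap S_{\le 0}$ exposes the inequality $(-\lambda, \beta_0)$. As $C_{G(\lambda)} \cap S_{\le 0}$ is a (non-convex) cone, we have that for any $\mu > 0$, $\mu(\lambda, \beta_0) \in C_{G(\lambda)} \cap S_{\le 0}$ exposes the inequality $(-\lambda, \beta_0)$. Since $a^T \lambda + d^T \beta_0 < 0$, $\mu = \frac{-1}{a^T \lambda + d^T \beta_0} > 0$ and so
\[ \frac{-(\lambda, \beta_0)}{a^T \lambda + d^T \beta_0} \in S_{\le 0} \cap H \cap C_{G(\lambda)} \tag{5.16} \]
exposes the inequality $(-\lambda, \beta_0)$. The claim now follows from Theorem 5.7.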
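The scaling step behind (5.16) can be illustrated on a small instance. The data below ($a$, $d$, $\lambda$, $\beta_0$ with $n = m = 2$ and $\|a\| \le \|d\|$) are hypothetical values chosen only so that $a^T\lambda + d^T\beta_0 < 0$. The sketch checks that the scaled point lies in $S_{\le 0} \cap H$ and that it is tight for the inequality $(-\lambda, \beta_0)$, which is necessary for it to expose that inequality; the exposing property itself is not verified here.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

# Hypothetical data with ||a|| <= ||d|| and m = 2 (Case 1)
a = (0.5, 0.0)
d = (0.0, 1.0)
lam = (1.0, 0.0)      # unit vector playing the role of x_bar / ||x_bar||
beta0 = (0.0, -1.0)   # unit vector with a.lam + d.beta0 < 0

gap = dot(a, lam) + dot(d, beta0)   # negative by the choice of beta0
mu = -1.0 / gap                     # positive scaling factor
x = tuple(mu * v for v in lam)
y = tuple(mu * v for v in beta0)

in_cone = norm(x) <= norm(y) + 1e-12                 # ||x|| <= ||y||
on_H = abs(dot(a, x) + dot(d, y) + 1.0) < 1e-12      # a.x + d.y = -1
tight = abs(-dot(lam, x) + dot(beta0, y)) < 1e-12    # -lam.x + beta0.y = 0

print(mu, x, y, in_cone, on_H, tight)
```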
The above theorem states that obtaining a maximal $S$-free set in this case amounts to simply using the maximal $S_{\le 0}$-free set $C_{G(\lambda)}$, and then intersecting with $H$. Recall that $S = S_{\le 0} \cap H$. The next case is considerably different.
5.5.2 Case 2: ∥a∥>∥d∥
We begin with an important remark regarding an assumption made in the
analogous case of the previous section.
Remark 5.37. Since in this case $\|a\| > 0$, we can, again, assume that $\|a\| = 1$. Indeed, we can always rescale the variables $(x, y)$ by $\|a\|$ to satisfy this requirement.
Also note that since $\|d\| < \|a\| = 1$, $\phi_\lambda$ is differentiable in $D_1(0)$. See Proposition 5.28.
Figure 5.6: Plots of $S^2_{\le 0}$, $H$ and $C_{\phi_\lambda}$ as defined in Example 5.38, showing that $C_{\phi_\lambda}$ is not necessarily maximal $S^2_{\le 0}$-free with respect to $H$ in the case $\|a\| > \|d\|$. (a) $S^2_{\le 0}$ (orange), $H$ (green) and $C_{\phi_\lambda}$ (blue). (b) Projection onto $(x_1, x_2)$ of $S^2_{\le 0} \cap H$ (orange) and $C_{\phi_\lambda} \cap H$ (blue). One of the facets of $C_{\phi_\lambda} \cap H$ has a gap with the boundary of $S^2_{\le 0} \cap H$.
Unfortunately, in this case the maximality of $C_{\phi_\lambda}$ with respect to $S_{\le 0}$ does not carry over to $S$, as the following example shows.
Example 5.38. We continue with $S^2_{\le 0}$ defined in Example 5.24. In Figure 5.4 we showed how $C_{\phi_\lambda}$ gives us a maximal $S^2_{\le 0}$-free set. If we now consider
\[ H = \{(x, y) \in \mathbb{R}^{n+m} : a^T x + d^T y = -1\} \]
with $a = (-1/\sqrt{2}, 1/\sqrt{2})^T$ and $d = 1/\sqrt{2}$, we do not necessarily obtain that $C_{\phi_\lambda} \cap H$ is maximal $S^2_{\le 0} \cap H$-free. In Figure 5.6 we illustrate this issue.
Figure 5.6 of the previous example displays an interesting feature though: the inequalities defining $C_{\phi_\lambda}$ seem to have the correct “slope” and just need to be translated. We conjecture, then, that in order to find a maximal $S$-free set, we only need to adequately relax the inequalities of $C_{\phi_\lambda}$.
Set-up
Recall that
\[ \begin{aligned} C_{\phi_\lambda} &= \{(x, y) : \phi_\lambda(y) \le \lambda^T x\} \\ &= \{(x, y) : -\lambda^T x + \nabla \phi_\lambda(\beta)^T y \le 0,\ \forall \beta \in D_1(0)\}. \end{aligned} \]
We denote by $r(\beta)$ the amount by which we need to relax each inequality of $C_{\phi_\lambda}$ such that
\[ C = \{(x, y) : -\lambda^T x + \nabla \phi_\lambda(\beta)^T y \le r(\beta),\ \forall \beta \in D_1(0)\} \tag{5.17} \]
is $S$-free. Note that when $\beta$ satisfies $\lambda^T a + d^T \beta < 0$, the inequalities of $C_{\phi_\lambda}$ are the same as the ones of $C_{G(\lambda)}$ (see also Remark 5.34) and, just like in Section 5.5.1, they have exposing points in $S$. An inequality of this type can be seen in Figure 5.6b: it is the inequality of $C_{\phi_\lambda}$ tangent to $S$ at one of its exposing points. Thus, we expect that $r(\beta) = 0$ when $\lambda^T a + d^T \beta < 0$. In the following we find $r(\beta)$ when $\lambda^T a + d^T \beta \ge 0$ and show maximality of the resulting set.
Following the spirit of Section 5.4.2, not all statements in this section require $\lambda = \frac{\bar x}{\|\bar x\|}$. However, we assume $\lambda \neq \pm a$. This assumption is not restrictive when constructing maximal $S$-free sets, as the following remark shows.
Remark 5.39. If $\lambda = -a$, then for every $\beta \in D_1(0)$ it holds that $\lambda^T a + d^T \beta < 0$. In this case $r(\beta)$ is simply defined as 0 everywhere and $C = C_{\phi_\lambda}$. This means that all inequalities defining $C$ have an exposing point in $S$ and maximality follows directly.
On the other hand, if we take $\lambda = \frac{\bar x}{\|\bar x\|}$ with $(\bar x, \bar y) \in H$ and $\|\bar x\| > \|\bar y\|$, and if additionally $\lambda = a$, we have that
\[ a^T \bar x + d^T \bar y = -1 \iff \|\bar x\| + d^T \bar y = -1 \implies \|\bar y\| + d^T \bar y < -1. \]
The latter cannot be, as $\|d\| < 1$.
Remark 5.40. The assumption $\lambda \neq \pm a$ has an unexpected consequence: as $\lambda \neq \pm a$ and $\|a\| = \|\lambda\| = 1$, it must hold that $n \ge 2$. This implicit assumption, however, does not present an issue: whenever $n = 1$, either $\lambda = a$ or $\lambda = -a$. By Remark 5.39, if we use $\lambda = \frac{\bar x}{\|\bar x\|}$, then $\lambda = -a$. Thus, $C = C_{\phi_\lambda}$ and maximality holds.
Construction of r(β)
Let $\beta \in D_1(0)$ be such that $\lambda^T a + d^T \beta \ge 0$. Then, the face of $C_{\phi_\lambda}$ defined by the valid inequality $-\lambda^T x + \nabla \phi_\lambda(\beta)^T y \le 0$ does not intersect $S$. See Lemma 5.55 for a proof of this statement.
In particular, the inequality is not exposed by any point in $S \cap C_{\phi_\lambda}$. However, it is exposed by $(x_\beta, \beta) \in S_{\le 0}$, where $x_\beta$ is given by (5.27) (see the proof of Proposition 5.31). Note that $(x_\beta, \beta) \in H_0 = \{(x, y) : a^T x + d^T y = 0\}$, as otherwise we could scale it so that it belongs to $S$.
The quantity $r(\beta)$ is the amount by which we need to relax the inequality so that it becomes an “asymptote”, and we compute it as follows. We first find a sequence of points $(x_n, y_n)_{n \in \mathbb{N}}$ in $S_{\le 0}$ that converges to $(x_\beta, \beta)$, enforcing that no element of the sequence belongs to $H_0$. If we find such a sequence, then every $(x_n, y_n) \in S_{\le 0}$ can be scaled to be in $S$:
\[ z_n = -\frac{(x_n, y_n)}{a^T x_n + d^T y_n} \in S. \]
This last scaled sequence diverges, as the denominator goes to 0 due to $(x_n, y_n) \to (x_\beta, \beta) \in H_0$. The idea is that the violation $(-\lambda, \nabla \phi_\lambda(\beta))^T z_n$ given by this sequence will give us, in the limit, the maximum relaxation that ensures $S$-freeness (see Figure 5.7). Then, we would define
\[ r(\beta) = \lim_{n \to \infty} (-\lambda, \nabla \phi_\lambda(\beta))^T z_n = -\lim_{n \to \infty} \frac{-\lambda^T x_n + \nabla \phi_\lambda(\beta)^T y_n}{a^T x_n + d^T y_n}. \]
We remark that this limit is what we intuitively aim for, but it might not even be well defined in general. In what follows, we construct a sequence that yields a closed-form expression for the above limit. Additionally, we show that such a definition of $r(\beta)$ yields the desired maximal $S$-free set.
The sequence. Our goal is to find a sequence $(x_n, y_n)_n$ such that $(x_n, y_n) \in S_{\le 0}$, $a^T x_n + d^T y_n < 0$ and $(x_n, y_n) \to (x_\beta, \beta)$. We take $y_n = \beta$ and $x_n$ such that $\|x_n\| = \|\beta\| = 1$, $a^T x_n + d^T \beta < 0$ and $x_n \to x_\beta$. Note that such a sequence always exists, as $\|a\| = 1$ and $\|d\| < 1$. We illustrate such a sequence with our running example.
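Example 5.41. We continue with Example 5.38. As we mentioned in Example 5.32, in this case
\[ \phi_\lambda(y) = \begin{cases} -y, & \text{if } y \le 0, \\ \frac{y}{\sqrt{2}}, & \text{if } y > 0, \end{cases} \]
and since $\lambda = \frac{1}{\sqrt{2}}(-1, -1)^T$, we see that
\[ \begin{aligned} C_{\phi_\lambda} = \{(x, y) :\ & \tfrac{1}{\sqrt{2}}(x_1 + x_2) - y \le 0, && \text{(5.18a)} \\ & \tfrac{1}{\sqrt{2}}(x_1 + x_2) + \tfrac{1}{\sqrt{2}} y \le 0\}. && \text{(5.18b)} \end{aligned} \]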
Figure 5.7: Projection onto $(x_1, x_2)$ of $S^2_{\le 0} \cap H$ (orange) and $C_{\phi_\lambda}$ (blue), along with the first two coordinates of the sequence $(z_n)_{n \in \mathbb{N}}$ defined in Example 5.41 for several values of $n$ (red). The sequence diverges “downwards”.
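It is not hard to check that $-(1, 1, \sqrt{2}) \in S^2_{\le 0} \cap H \cap C_{\phi_\lambda}$ exposes inequality (5.18a); this is the point $\mu(\lambda, \beta_0)$ with $\beta_0 = -1$ and $\mu = \sqrt{2}$. This is the tangent point in Figure 5.6b we discussed above.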
On the other hand, (5.18b), which is obtained from $\beta = 1$, does not have an exposing point in $S^2_{\le 0} \cap H \cap C_{\phi_\lambda}$, and corresponds to an inequality we should relax as per our discussion. This inequality, however, is exposed by $(x_\beta, \beta) = (0, -1, 1) \in S^2_{\le 0} \cap C_{\phi_\lambda}$. Consider now the sequence defined as
\[ (x_n, y_n) = \left(\frac{1}{\sqrt{n^2 + 1}}, \frac{-n}{\sqrt{n^2 + 1}}, 1\right) \in S^2_{\le 0}. \]
Clearly the limit of this sequence is $(0, -1, 1)$ and
\[ a^T x_n + d^T y_n = \frac{1}{\sqrt{2}}\left(\frac{-1}{\sqrt{n^2 + 1}} - \frac{n}{\sqrt{n^2 + 1}} + 1\right) < 0. \]
Now we let
\[ z_n = -\frac{(x_n, y_n)}{a^T x_n + d^T y_n} \in S^2_{\le 0} \cap H. \]
As we mentioned above, this sequence diverges. Continuing with Figure 5.6, in Figure 5.7 we plot the first two components of the sequence $(z_n)_{n \in \mathbb{N}}$ along with $S^2_{\le 0} \cap H$ and $C_{\phi_\lambda} \cap H$. From this figure we can anticipate where our argument is going: the sequence $(z_n)_{n \in \mathbb{N}}$ moves along the boundary of $S^2_{\le 0} \cap H$ towards an “asymptote” from which we can deduce $r(\beta)$. The latter is given by the gap between inequality (5.18b) and the asymptote.
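The behaviour of the scaled sequence can be observed numerically. The following sketch evaluates $z_n$ for the running example and prints its norm together with the left-hand side of (5.18b) at $z_n$; the helper names `z` and `violation` are hypothetical. Under the data of this example, the norm should grow without bound while the violation approaches a finite value, which is the candidate for $r(1)$.

```python
import math

SQ2 = math.sqrt(2)
a = (-1 / SQ2, 1 / SQ2)
d = 1 / SQ2

def z(n):
    """Scaled point z_n = -(x_n, y_n) / (a.x_n + d*y_n) of Example 5.41."""
    s = math.sqrt(n * n + 1)
    x = (1 / s, -n / s)
    y = 1.0
    denom = a[0] * x[0] + a[1] * x[1] + d * y  # negative and tending to 0
    return tuple(-v / denom for v in (x[0], x[1], y))

def violation(p):
    """Left-hand side of (5.18b) at p = (x1, x2, y)."""
    return (p[0] + p[1]) / SQ2 + p[2] / SQ2

def on_H(p):
    """Residual of the hyperplane equation a.x + d*y = -1."""
    return a[0] * p[0] + a[1] * p[1] + d * p[2] + 1.0

for n in (1, 10, 100, 1000, 10000):
    p = z(n)
    print(n, math.sqrt(sum(v * v for v in p)), violation(p), on_H(p))
```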
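Computing the limit. Here we compute
\[ r(\beta) = -\lim_{n \to \infty} \frac{-\lambda^T x_n + \nabla \phi_\lambda(\beta)^T y_n}{a^T x_n + d^T y_n}. \]
We proceed to rewrite the limit. Since $y_n = \beta$ and $x_\beta$ is the optimal solution of (5.11), we have
\[ \nabla \phi_\lambda(\beta)^T y_n = \phi_\lambda(\beta) = \lambda^T x_\beta, \qquad d^T y_n = -a^T x_\beta. \]
Thus,
\[ \begin{aligned} r(\beta) &= -\lim_{n \to \infty} \frac{-\lambda^T x_n + \nabla \phi_\lambda(\beta)^T y_n}{a^T x_n + d^T y_n} \\ &= -\lim_{n \to \infty} \frac{-\lambda^T x_n + \lambda^T x_\beta}{a^T x_n - a^T x_\beta} \\ &= \lim_{n \to \infty} \frac{\lambda^T (x_n - x_\beta)}{a^T (x_n - x_\beta)}. \end{aligned} \]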
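Notice that $x_\beta$ belongs to the two-dimensional space generated by $\lambda$ and $a$, which we denote by $\Lambda$. Note that it is indeed two-dimensional, since $\lambda \neq \pm a$, see Remark 5.39. Furthermore, we can assume that $x_n$ also belongs to $\Lambda$, as any other component of $x_n$ is irrelevant for the value of the limit. Indeed, as $\mathbb{R}^n = \Lambda \oplus \Lambda^\perp$, we can write $x_n = x_n^\parallel + x_n^\perp$, where $x_n^\parallel \in \Lambda$ and $x_n^\perp \in \Lambda^\perp$, and
\[ \frac{\lambda^T (x_n - x_\beta)}{a^T (x_n - x_\beta)} = \frac{\lambda^T (x_n^\parallel - x_\beta)}{a^T (x_n^\parallel - x_\beta)}. \]
To compute the limit observe that
\[ \frac{\lambda^T (x_n - x_\beta)}{a^T (x_n - x_\beta)} = \frac{\lambda^T \frac{x_n - x_\beta}{\|x_n - x_\beta\|}}{a^T \frac{x_n - x_\beta}{\|x_n - x_\beta\|}}. \]
Notice that $\frac{x_n - x_\beta}{\|x_n - x_\beta\|}$ converges, as $x_n \in \Lambda$, $\|x_n\| = 1$, and $x_n \to x_\beta$. Let $\hat x$ be the limit and note that $\hat x$ is orthogonal to $x_\beta$. Indeed,
\[ \begin{aligned} x_\beta^T \hat x &= \lim_{n \to \infty} x_\beta^T \frac{x_n - x_\beta}{\|x_n - x_\beta\|} \\ &= \lim_{n \to \infty} \frac{x_\beta^T x_n - 1}{\|x_n - x_\beta\|} \\ &= \lim_{n \to \infty} \frac{-\|x_n - x_\beta\|^2}{2\|x_n - x_\beta\|} \\ &= 0. \end{aligned} \]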
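Hence,
\[ r(\beta) = \lim_{n \to \infty} \frac{\lambda^T (x_n - x_\beta)}{a^T (x_n - x_\beta)} = \frac{\lambda^T \hat x}{a^T \hat x}. \]
Since we are interested in the quotient of $\lambda^T \hat x$ and $a^T \hat x$, any nonzero multiple of $\hat x$ can be used, that is, any nonzero vector orthogonal to $x_\beta$ in $\Lambda$. Using $\lambda$ and $a$ as a basis for $\Lambda$, we have that for $x \in \Lambda$ with coordinates $x_\lambda$ and $x_a$, the vector $y$ with coordinates $y_\lambda = -(x_a + x_\lambda \lambda^T a)$ and $y_a = x_\lambda + x_a \lambda^T a$ is orthogonal to $x$. Indeed,
\[ \begin{aligned} x^T y &= (x_\lambda \lambda + x_a a)^T (y_\lambda \lambda + y_a a) \\ &= x_\lambda y_\lambda + x_a y_a + (x_\lambda y_a + x_a y_\lambda) \lambda^T a \\ &= (x_\lambda + x_a \lambda^T a) y_\lambda + (x_a + x_\lambda \lambda^T a) y_a \\ &= 0. \end{aligned} \]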
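Thus, let $\tilde x = -(x_{\beta a} + x_{\beta\lambda} \lambda^T a)\lambda + (x_{\beta\lambda} + x_{\beta a} \lambda^T a) a$, where $x_{\beta\lambda}$ and $x_{\beta a}$ are the coordinates of $x_\beta$ in the basis $\{\lambda, a\}$. Given that $\lambda^T a + d^T \beta \ge 0$, from (5.27) (see Section 5.10) we have
\[ x_\beta = \sqrt{\frac{1 - (d^T \beta)^2}{1 - (\lambda^T a)^2}}\, \lambda - \left(d^T \beta + \lambda^T a \sqrt{\frac{1 - (d^T \beta)^2}{1 - (\lambda^T a)^2}}\right) a. \tag{5.19} \]
Note that while this last explicit formula for $x_\beta$ is the one stated for the case $\lambda^T a + d^T \beta > 0$, it also holds when $\lambda^T a + d^T \beta = 0$. Therefore,
\[ \begin{aligned} \tilde x &= (d^T \beta) \lambda + \left(\sqrt{\frac{1 - (d^T \beta)^2}{1 - (\lambda^T a)^2}} - \left(d^T \beta + \lambda^T a \sqrt{\frac{1 - (d^T \beta)^2}{1 - (\lambda^T a)^2}}\right) \lambda^T a\right) a \\ &= (d^T \beta) \lambda + \phi_\lambda(\beta)\, a. \end{aligned} \]
All together, we obtain
\[ r(\beta) = \frac{\lambda^T \tilde x}{a^T \tilde x} = \frac{d^T \beta + \lambda^T a\, \phi_\lambda(\beta)}{\phi_\lambda(\beta) + d^T \beta\, \lambda^T a}. \]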
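Note that if $\lambda^T a + d^T \beta = 0$, then $r(\beta) = 0$. We summarize the above discussion in the following result.
Lemma 5.42. Let $a, \lambda, \beta \in D_1(0)$ and $d \in B_1(0)$ be such that $\lambda \neq \pm a$, $\|d\| < \|a\|$ and $\lambda^T a + d^T \beta \ge 0$. Then, every sequence $(x_n)_{n \in \mathbb{N}} \subseteq \langle \lambda, a \rangle$ converging to $x_\beta$ such that $\|x_n\| = 1$ and $a^T x_n + d^T \beta < 0$ satisfies
\[ r(\beta) = \lim_{n \to \infty} \frac{\lambda^T (x_n - x_\beta)}{a^T (x_n - x_\beta)} = \frac{d^T \beta + \lambda^T a\, \phi_\lambda(\beta)}{\phi_\lambda(\beta) + d^T \beta\, \lambda^T a}. \]
Such sequences are always guaranteed to exist.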
Figure 5.8: Plots of $S^2_{\le 0}$, $H$ and $C_1$ as defined in Example 5.43, showing that $C_1$ is maximal $S^2_{\le 0}$-free with respect to $H$. (a) $S^2_{\le 0}$ (orange), $H$ (green) and $C_1$ (blue). In this case $C_1$ is no longer $S^2_{\le 0}$-free. (b) Projection onto $(x_1, x_2)$ of $S^2_{\le 0} \cap H$ (orange) and $C_1 \cap H$ (blue).
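Therefore, for $\beta \in D_1(0)$, we define
\[ r(\beta) = \begin{cases} 0, & \text{if } \lambda^T a + d^T \beta \le 0, \\ \dfrac{d^T \beta + \lambda^T a\, \phi_\lambda(\beta)}{\phi_\lambda(\beta) + d^T \beta\, \lambda^T a}, & \text{otherwise.} \end{cases} \]
We extend $r$ to $y \in \mathbb{R}^m \setminus \{0\}$ by $r(y) = r\left(\frac{y}{\|y\|}\right)$ and leave it undefined at 0.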
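Example 5.43. We continue with our running example in Example 5.41. In this case $r(-1) = 0$, and since $\phi_\lambda(\beta) = 1/\sqrt{2}$ for $\beta = 1$, $\lambda^T a = 0$ and $d = 1/\sqrt{2}$, it can be checked that
\[ r(1) = 1. \]
Now, let
\[ \begin{aligned} C_1 &= \{(x, y) : -\lambda^T x + \nabla \phi_\lambda(\beta)^T y \le r(\beta), \text{ for all } \beta \in D_1(0)\} \\ &= \{(x, y) : \tfrac{1}{\sqrt{2}}(x_1 + x_2) - y \le 0,\ \tfrac{1}{\sqrt{2}}(x_1 + x_2) + \tfrac{1}{\sqrt{2}} y \le 1\}. \end{aligned} \]
Figure 5.8 shows the same plots as Figure 5.6 with $C_1$ instead of $C_{\phi_\lambda}$.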
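The values $r(-1) = 0$ and $r(1) = 1$ can be reproduced from the definition of $r$. The sketch below does so for $m = 1$ and adds a membership test for $C_1$ written directly from its two inequalities; the names `phi`, `r`, and `in_C1` are hypothetical helpers introduced for this illustration.

```python
import math

SQ2 = math.sqrt(2)

def phi(c, d, y):
    """(5.12) for m = 1; c stands for lambda^T a."""
    if c * abs(y) + d * y <= 0:
        return abs(y)
    return math.sqrt(max(0.0, (y * y - (d * y) ** 2) * (1 - c * c))) - d * y * c

def r(c, d, beta):
    """Relaxation r(beta) for beta in {-1, +1}, following its definition."""
    if c + d * beta <= 0:
        return 0.0
    p = phi(c, d, beta)
    return (d * beta + c * p) / (p + d * beta * c)

def in_C1(x1, x2, y):
    """Membership in C_1, using the two inequalities of Example 5.43."""
    return (x1 + x2) / SQ2 - y <= 0 and (x1 + x2) / SQ2 + y / SQ2 <= 1

c, d = 0.0, 1 / SQ2  # running example: lambda^T a = 0 and d = 1/sqrt(2)
print(r(c, d, -1.0), r(c, d, 1.0))
print(in_C1(-1.0, -1.0, 0.0))  # the point (x_bar, y_bar) of Example 5.32
print(in_C1(0.0, -1.0, 1.0))   # the point (x_beta, beta) exposing (5.18b)
print(in_C1(1.0, 1.0, 0.0))    # a point violating the first inequality
```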
As we see below, the characterization of $r$ as a limit is useful to prove maximality of $C$. However, to show that $C$ is $S$-free, we need a different interpretation of $r$.
Lemma 5.44. For every $\beta \in D_1(0)$, $r(\beta) = \theta(\beta)$, where $\theta(\beta)$ is defined in (5.28) and corresponds to the optimal dual solution of the optimization problem defining $\phi_\lambda(\beta)$.
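Proof. If $\lambda^T a + d^T \beta \le 0$, then $r(\beta) = 0 = \theta(\beta)$. Let $\beta \in D_1(0)$ be such that $\lambda^T a + d^T \beta > 0$. Then,
\[ \begin{aligned} r(\beta) &= \frac{d^T \beta + \lambda^T a\, \phi_\lambda(\beta)}{\phi_\lambda(\beta) + d^T \beta\, \lambda^T a} \\ &= \frac{d^T \beta + \lambda^T a \sqrt{1 - (\lambda^T a)^2} \sqrt{1 - (d^T \beta)^2} - d^T \beta (\lambda^T a)^2}{\sqrt{1 - (\lambda^T a)^2} \sqrt{1 - (d^T \beta)^2}} \\ &= \frac{d^T \beta \sqrt{1 - (\lambda^T a)^2}}{\sqrt{1 - (d^T \beta)^2}} + \lambda^T a \\ &= \theta(\beta). \end{aligned} \]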
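The chain of equalities above can be double-checked numerically. In the sketch below, `theta` implements the expression on the third line of the chain (taken here as the formula for $\theta(\beta)$ when $\lambda^T a + d^T\beta > 0$, which is an assumption about (5.28) based only on this proof), and `r` implements the quotient on the first line. The data are the vectors of Figure 5.5, and $\beta$ ranges over a few unit vectors.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def phi_unit(c, db):
    """phi_lambda(beta) for ||beta|| = 1, with c = lambda^T a and db = d^T beta."""
    if c + db <= 0:
        return 1.0
    return math.sqrt((1 - db * db) * (1 - c * c)) - db * c

def r(c, db):
    """First line of the chain: quotient form of r(beta), for c + db > 0."""
    p = phi_unit(c, db)
    return (db + c * p) / (p + db * c)

def theta(c, db):
    """Third line of the chain, taken as theta(beta) for c + db > 0."""
    return db * math.sqrt(1 - c * c) / math.sqrt(1 - db * db) + c

# Data taken from the caption of Figure 5.5
a = (3 / 5, -4 / 5)
d = (3 / 10, 2 / 5)
lam = (63 / 65, 16 / 65)
c = dot(lam, a)

checked = []  # pairs (r, theta) for the sampled beta with c + d.beta > 0
for k in range(12):
    t = 2 * math.pi * k / 12
    beta = (math.cos(t), math.sin(t))
    db = dot(d, beta)
    if c + db > 0:
        checked.append((r(c, db), theta(c, db)))

for pair in checked:
    print(pair)
```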
S-freeness and maximality proofs
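We now show that $C$ is $S$-free and then that it is maximal.
Theorem 5.45. Let $\lambda \in D_1(0)$ be such that $\lambda \neq \pm a$,
\[ C = \{(x, y) : -\lambda^T x + \nabla \phi_\lambda(\beta)^T y \le r(\beta), \text{ for all } \beta,\ \|\beta\| = 1\}, \]
and $S = \{(x, y) : \|x\| \le \|y\|,\ a^T x + d^T y = -1\}$, with $\|d\| < \|a\| = 1$. Then, $C$ is $S$-free.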
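Proof. Let $(x_0, y_0) \in S$ and let $\beta_0 = \frac{y_0}{\|y_0\|}$. The claim will follow if we are able to show that $-\lambda^T x_0 + \nabla \phi_\lambda(\beta_0)^T y_0 \ge r(\beta_0)$.
Since $x_0$ satisfies $\|x_0\| \le \|y_0\|$ and $a^T x_0 + d^T y_0 = -1$, it follows that
\[ \lambda^T x_0 \le \max_x \{\lambda^T x : \|x\| \le \|y_0\|,\ a^T x + d^T y_0 \le -1\}. \]
By weak duality we have
\[ \max_x \{\lambda^T x : \|x\| \le \|y_0\|,\ a^T x + d^T y_0 \le -1\} \le \inf_{\theta \ge 0} \|y_0\| \|\lambda - a\theta\| - (d^T y_0 + 1)\theta. \]
Recall that $\theta(y_0)$ is the optimal dual solution to the optimization problem defining $\phi_\lambda(y_0)$. Thus, it holds that $\theta(y_0) \in \mathbb{R}_+$ and $\theta(y_0) < +\infty$ because $\|d\| < 1$. Consequently,
\[ \inf_{\theta \ge 0} \|y_0\| \|\lambda - a\theta\| - (d^T y_0 + 1)\theta \le \|y_0\| \|\lambda - a\theta(y_0)\| - (d^T y_0 + 1)\theta(y_0) = \phi_\lambda(y_0) - \theta(y_0), \]
where the last equality follows from the strong duality between the optimization problem that defines $\phi_\lambda$ and its dual, see Proposition 5.54. All the inequalities together show that
\[ \lambda^T x_0 \le \phi_\lambda(y_0) - \theta(y_0). \]
From (5.28) and Lemma 5.44 it follows that $\theta(y_0) = \theta(\beta_0) = r(\beta_0)$. Thus,
\[ -\lambda^T x_0 + \phi_\lambda(y_0) \ge r(\beta_0), \]
which is what we wanted to establish, since $\nabla \phi_\lambda(\beta_0)^T y_0 = \phi_\lambda(y_0)$ by positive homogeneity.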
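Theorem 5.46. Let $\lambda \in D_1(0)$ be such that $\lambda \neq \pm a$,
\[ \begin{aligned} H &= \{(x, y) \in \mathbb{R}^{n+m} : a^T x + d^T y = -1\}, \\ S_{\le 0} &= \{(x, y) \in \mathbb{R}^{n+m} : \|x\| \le \|y\|,\ a^T x + d^T y \le 0\}, \end{aligned} \]
and
\[ C = \{(x, y) : -\lambda^T x + \nabla \phi_\lambda(\beta)^T y \le r(\beta), \text{ for all } \beta \in D_1(0)\}, \]
where $\|d\| < \|a\| = 1$. Then, $C$ is maximal $S_{\le 0}$-free with respect to $H$. Additionally, if $\lambda = \frac{\bar x}{\|\bar x\|}$ with $(\bar x, \bar y) \in H$ and $\|\bar x\| > \|\bar y\|$, then $(\bar x, \bar y) \in \operatorname{int}(C)$.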
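Proof. Let $S = S_{\le 0} \cap H$. By Theorem 5.45, $C$ is $S$-free.
To show maximality we use Theorem 5.11, that is, we show that every inequality of $C$ is either exposed by a point in $S \cap C$ or exposed at infinity by a sequence in $S$.
Let $\beta_0 \in D_1(0)$ and consider the valid inequality $-\lambda^T x + \nabla \phi_\lambda(\beta_0)^T y \le r(\beta_0)$. Assume, first, that $a^T \lambda + d^T \beta_0 < 0$. Then we have that $r(\beta_0) = 0$, $\phi_\lambda(\beta_0) = \|\beta_0\| = 1$, and $\nabla \phi_\lambda(\beta_0) = \beta_0$. Hence, the inequality is $-\lambda^T x + \beta_0^T y \le 0$. It is exposed by
\[ \frac{-1}{a^T \lambda + d^T \beta_0}(\lambda, \beta_0) \in S \cap C_{\phi_\lambda} \subseteq S \cap C. \]
Now, let us assume that $a^T \lambda + d^T \beta_0 \ge 0$. We will show that there is a sequence in $S$ that exposes $-\lambda^T x + \nabla \phi_\lambda(\beta_0)^T y \le r(\beta_0)$ at infinity. Let $(x_n)_n \subseteq \langle \lambda, a \rangle$ be a sequence converging to $x_{\beta_0}$ such that $\|x_n\| = 1$ and $a^T x_n + d^T \beta_0 < 0$. By Lemma 5.42,
\[ r(\beta_0) = \lim_{n \to \infty} \frac{\lambda^T (x_n - x_{\beta_0})}{a^T (x_n - x_{\beta_0})}. \]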
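Consider the sequence given by
\[ z_n = -\frac{(x_n, \beta_0)}{a^T x_n + d^T \beta_0} = \frac{(x_n, \beta_0)}{a^T (x_{\beta_0} - x_n)} \in S, \]
where the equality above follows from $a^T x_{\beta_0} + d^T \beta_0 = 0$. We proceed to verify that $z_n$ exposes $-\lambda^T x + \nabla \phi_\lambda(\beta_0)^T y \le r(\beta_0)$ at infinity.
As $x_n \to x_{\beta_0}$, we have that $\|z_n\| \to \infty$. Also, $\frac{z_n}{\|z_n\|} = \frac{1}{\sqrt{2}}(x_n, \beta_0)$ converges to $v = \frac{1}{\sqrt{2}}(x_{\beta_0}, \beta_0) \in C_{\phi_\lambda} = \operatorname{rec}(C)$, which exposes $-\lambda^T x + \nabla \phi_\lambda(\beta_0)^T y \le 0$.
Finally, we have to show that there exists a $w$ such that $(-\lambda, \nabla \phi_\lambda(\beta_0))^T w = r(\beta_0)$ and $\operatorname{dist}(z_n, w + \langle v \rangle) \to 0$. Let $\hat x = \lim_{n \to \infty} \frac{x_n - x_{\beta_0}}{\|x_n - x_{\beta_0}\|}$ and let $w = \left(-\frac{\hat x}{a^T \hat x}, 0\right)$. We have that $(-\lambda, \nabla \phi_\lambda(\beta_0))^T w = r(\beta_0)$. Also,
\[ z_n - \frac{\sqrt{2}}{a^T (x_{\beta_0} - x_n)}\, v = \frac{1}{a^T (x_{\beta_0} - x_n)}(x_n - x_{\beta_0}, 0) \to -\left(\frac{\hat x}{a^T \hat x}, 0\right) = w. \]
Thus, $\operatorname{dist}(z_n, w + \langle v \rangle) \to 0$.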
A closed-form formula for C
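Since the construction of $C$ involves translating some of the inequalities of the outer description of $C_{\phi_\lambda}$, it is natural to ask if this translation yields a translation of the whole function $\phi_\lambda$. This would yield a closed-form formula for $C$, which is much more appealing from a computational standpoint.
In what follows, we ask whether there exists an $(x_0, y_0)$ such that
\[ \begin{aligned} &\{(x, y) : -\lambda^T x + \nabla \phi_\lambda(\beta)^T y \le r(\beta), \text{ for all } \beta,\ \lambda^T a + d^T \beta \ge 0\} \\ ={} &\{(x, y) : -\lambda^T (x - x_0) + \nabla \phi_\lambda(\beta)^T (y - y_0) \le 0, \text{ for all } \beta,\ \lambda^T a + d^T \beta \ge 0\}. \end{aligned} \]
In order to reach this equality it would suffice to satisfy
\[ \lambda^T x_0 - \nabla \phi_\lambda(\beta)^T y_0 = -r(\beta). \tag{5.20} \]
Note that since $\lambda^T a + d^T \beta \ge 0$,
\[ \begin{aligned} \nabla \phi_\lambda(\beta) &= \sqrt{1 - (\lambda^T a)^2}\, \frac{W\beta}{\|\beta\|_W} - \lambda^T a\, d, && \text{(5.21)} \\ r(\beta) &= \lambda^T a + \frac{d^T \beta \sqrt{1 - (\lambda^T a)^2}}{\|\beta\|_W}, \end{aligned} \]
where $W = I - dd^T$. Thus (5.20) becomes
\[ \lambda^T (x_0 + a\, d^T y_0) - \sqrt{1 - (\lambda^T a)^2}\, \frac{\beta^T W y_0}{\|\beta\|_W} = -\lambda^T a - \frac{d^T \beta \sqrt{1 - (\lambda^T a)^2}}{\|\beta\|_W}. \]
From the last expression, we see that if we are able to find $(x_0, y_0)$ such that
\[ \begin{aligned} x_0 + a\, d^T y_0 &= -a, && \text{(5.22a)} \\ d^T \beta &= \beta^T W y_0, && \text{(5.22b)} \end{aligned} \]
then (5.20) would hold. Note that $d$ is an eigenvector of $W = I - dd^T$ with eigenvalue $1 - \|d\|^2$. Thus, with $y_0 = \frac{d}{1 - \|d\|^2}$ we can easily check that (5.22b) holds. With $y_0$ defined, in order to satisfy (5.22a) it suffices to set
\[ x_0 = -a(d^T y_0 + 1) = -\frac{a}{1 - \|d\|^2}. \]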
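The choice of $(x_0, y_0)$ can be checked on concrete numbers. The following sketch uses the vectors $a$ and $d$ from the caption of Figure 5.5 (an arbitrary choice for illustration) and verifies that $W y_0 = d$, which gives (5.22b) for every $\beta$, and that (5.22a) holds. The helper `W_times` is a hypothetical name.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# Illustrative data: a and d from the caption of Figure 5.5
a = (3 / 5, -4 / 5)
d = (3 / 10, 2 / 5)

nd2 = dot(d, d)                           # ||d||^2, must be < 1
y0 = tuple(v / (1 - nd2) for v in d)      # y0 = d / (1 - ||d||^2)
x0 = tuple(-v / (1 - nd2) for v in a)     # x0 = -a / (1 - ||d||^2)

def W_times(v):
    """Apply W = I - d d^T to the vector v."""
    s = dot(d, v)
    return tuple(vi - di * s for vi, di in zip(v, d))

# (5.22b) holds for every beta if and only if W y0 = d
Wy0 = W_times(y0)
# (5.22a): x0 + a * (d^T y0) should equal -a
lhs_22a = tuple(xi + ai * dot(d, y0) for xi, ai in zip(x0, a))

print(Wy0, d)
print(lhs_22a, tuple(-v for v in a))
```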
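In summary, we arrive at the following expression for $C$:
\[ C = \left\{ (x, y) : \begin{array}{ll} \phi_\lambda(y) \le \lambda^T x & \text{if } \lambda^T a \|y\| + d^T y \le 0 \\ \phi_\lambda\left(y - \frac{d}{1 - \|d\|^2}\right) \le \lambda^T \left(x + \frac{a}{1 - \|d\|^2}\right) & \text{otherwise} \end{array} \right\}. \tag{5.23} \]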
5.6 On the Diagonalization and Homogenization of Quadratics
Consider an arbitrary quadratic set
\[ \mathcal{Q} = \{s \in \mathbb{R}^p : s^T Q s + b^T s + c \le 0\}. \]
Given a point $\bar s \notin \mathcal{Q}$ we can construct a maximal $\mathcal{Q}$-free set that contains $\bar s$ using the techniques developed in the previous sections. The idea is to first find a one-to-one map $T$ such that
\[ \begin{aligned} T(\mathcal{Q}) &= S_{\le 0} \cap H = \{(x, y, z) \in \mathbb{R}^{n+m+l} : \|x\| \le \|y\|,\ a^T x + d^T y + h^T z = -1\}, \\ T(\bar s) &\in H \setminus S_{\le 0}, \end{aligned} \]
for some hyperplane $H$, that is, for some $a$, $d$ and $h$.
Then, we construct a maximal $\mathcal{Q}$-free set using the following fact, which can be easily verified: if $C$ is a maximal $S_{\le 0}$-free set with respect to $H$ that contains $T(\bar s)$, then $T^{-1}(C)$ is a maximal $\mathcal{Q}$-free set containing $\bar s$.
Here we show a surprising fact: which maximal $\mathcal{Q}$-free set is obtained heavily depends on the choice of $T$. We illustrate this interesting feature with our running example.
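Example 5.47. Let
\[ \mathcal{Q} = \{s \in \mathbb{R}^2 : -2 + 2\sqrt{2} s_1 - 2\sqrt{2} s_2 + 2 s_1 s_2 \le 0\} \]
and $\bar s = (-2, -2) \notin \mathcal{Q}$. The following map
\[ \tau_1(s_1, s_2) = (s_1, s_2, -\sqrt{2} + s_1 - s_2) \]
is one-to-one and satisfies
\[ \tau_1(\mathcal{Q}) = S^2_{\le 0} \cap H_1, \]
where $S^2_{\le 0} \cap H_1$ is defined in Example 5.38 and is given by
\[ S^2_{\le 0} \cap H_1 = \{(x_1, x_2, y) \in \mathbb{R}^3 : \|(x_1, x_2)\| \le |y|,\ -x_1 + x_2 + y = -\sqrt{2}\}. \]
Computing a maximal $S^2_{\le 0}$-free set with respect to $H_1$ containing $\tau_1(\bar s) = (-2, -2, -\sqrt{2})$ yields the same maximal $S^2_{\le 0} \cap H_1$-free set we computed in Example 5.43, that is
\[ C_1 \cap H_1 = \{(x, y) : \tfrac{1}{\sqrt{2}}(x_1 + x_2) - y \le 0,\ \tfrac{1}{\sqrt{2}}(x_1 + x_2) + \tfrac{1}{\sqrt{2}} y \le 1,\ -x_1 + x_2 + y = -\sqrt{2}\}. \]
As $\tau_1^{-1}$ is simply the projection onto the first two coordinates, we have that
\[ \tau_1^{-1}(C_1) = \left\{s \in \mathbb{R}^2 : \left(\tfrac{1}{\sqrt{2}} - 1\right) s_1 + \left(\tfrac{1}{\sqrt{2}} + 1\right) s_2 + \sqrt{2} \le 0,\ \sqrt{2} s_1 - 2 \le 0\right\} \]
is our maximal $\mathcal{Q}$-free set. This is exactly the set we show in Figure 5.8b.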
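Now we consider a different transformation for $\mathcal{Q}$. Let
\[ T_1(s_1, s_2) = \frac{1}{2} \begin{bmatrix} -1 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} s_1 - \sqrt{2} \\ s_2 + \sqrt{2} \end{bmatrix}, \qquad T_2(s_1, s_2) = (-1, s_2, s_1), \qquad\text{and}\qquad \tau_2 = T_2 \circ T_1. \]
For the curious reader, $T_1$ is obtained from an eigendecomposition of the quadratic form. After some algebraic manipulation we can see that
\[ \begin{aligned} T_1(\mathcal{Q}) &= \{w \in \mathbb{R}^2 : T_1^{-1}(w_1, w_2) \in \mathcal{Q}\} \\ &= \{w \in \mathbb{R}^2 : 1 - w_1^2 + w_2^2 \le 0\}. \end{aligned} \]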
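The "algebraic manipulation" can be cross-checked numerically. Writing $q(s) = -2 + 2\sqrt{2}s_1 - 2\sqrt{2}s_2 + 2s_1s_2$ for the quadratic defining $\mathcal{Q}$, the sketch below tests two identities on a few sample points: $1 - w_1^2 + w_2^2 = q(s)/2$ for $w = T_1(s)$, and $y^2 - \|x\|^2 = -q(s)$ for $(x, y) = \tau_1(s)$. For $\tau_1$ it assumes the third coordinate $-\sqrt{2} + s_1 - s_2$, which is what the equation $-x_1 + x_2 + y = -\sqrt{2}$ of $H_1$ requires. Function names are hypothetical.

```python
import math

SQ2 = math.sqrt(2)

def q(s1, s2):
    """Quadratic whose sublevel set {q <= 0} is the set Q of Example 5.47."""
    return -2 + 2 * SQ2 * s1 - 2 * SQ2 * s2 + 2 * s1 * s2

def tau1(s1, s2):
    """Lift onto H_1 (assumes third coordinate -sqrt(2) + s1 - s2)."""
    return (s1, s2, -SQ2 + s1 - s2)

def T1(s1, s2):
    """Shift by (sqrt(2), -sqrt(2)) followed by the matrix (1/2)[[-1, 1], [1, 1]]."""
    u, v = s1 - SQ2, s2 + SQ2
    return (0.5 * (-u + v), 0.5 * (u + v))

samples = [(-2.0, -2.0), (0.0, 0.0), (1.5, -0.5), (3.0, 2.0), (-1.0, 4.0)]
for s in samples:
    x1, x2, y = tau1(*s)
    w1, w2 = T1(*s)
    cone_gap = y * y - (x1 * x1 + x2 * x2)   # expected to equal -q(s)
    diag_form = 1 - w1 * w1 + w2 * w2        # expected to equal q(s) / 2
    plane = -x1 + x2 + y                     # expected to equal -sqrt(2)
    print(s, q(*s), cone_gap, diag_form, plane)
```

In particular, $q(\bar s) = 6 > 0$ for $\bar s = (-2, -2)$, consistent with $\bar s \notin \mathcal{Q}$.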
Figure 5.9: Different maximal $S$-free sets obtained from different transformations, as discussed in Example 5.47. The quadratic set $\mathcal{Q}$ (blue), a maximal $\mathcal{Q}$-free set obtained from $\tau_1$ (orange), and another such set obtained from $\tau_2$ (green).
Thus, $\tau_2$ is one-to-one and
\[ \tau_2(\mathcal{Q}) = \{(x_1, x_2, y) \in \mathbb{R}^3 : \|(x_1, x_2)\| \le |y|,\ x_1 = -1\}. \]
Letting $S^3_{\le 0} = \{(x_1, x_2, y) \in \mathbb{R}^3 : \|(x_1, x_2)\| \le |y|,\ x_1 \le 0\}$ and $H_2 = \{(x_1, x_2, y) \in \mathbb{R}^3 : x_1 = -1\}$, we have that $\tau_2(\mathcal{Q}) = S^3_{\le 0} \cap H_2$. We can now construct a maximal $S^3_{\le 0}$-free set with respect to $H_2$. For this, note that in this case $a = (1, 0)$ and $d = 0$. Also, $\tau_2(\bar s) = (-1, -2, \sqrt{2})$ and so $\lambda = \frac{1}{\sqrt{5}}(-1, -2)$. As $a^T \lambda |y| + d y < 0$ for every $y \in \mathbb{R} \setminus \{0\}$, we have that $r(y) = 0$ and $\phi_\lambda(y) = |y|$. By Theorem 5.46,
\[ C_2 = \{(x_1, x_2, y) \in \mathbb{R}^3 : |y| \le \lambda^T x\} \]
is a maximal $S^3_{\le 0}$-free set with respect to $H_2$. Therefore, $\tau_2^{-1}(C_2)$ is maximal $\mathcal{Q}$-free. In Figure 5.9 we show the set $\mathcal{Q}$ and both maximal $\mathcal{Q}$-free sets given by $\tau_1^{-1}(C_1)$ and $\tau_2^{-1}(C_2)$. Note that in this case, the set $\tau_2^{-1}(C_2)$ does not have an asymptote, and both its facets have an exposing point.
This example shows the important role of the transformation used to bring the quadratic set to the form $S$. The resulting maximal $S$-free set can change significantly. This opens an array of interesting questions regarding the role of transformations in our approach: Can we distinguish the transformations that generate $S$-free sets with asymptotes? Is there a benefit or downside to using the latter sets? These and other questions are left for future work.
5.7 Further Remarks and Generalizations
In this section we collect some further remarks and generalizations. We start by generalizing Theorem 5.16 to the case where $S$ is represented as the difference of two sublinear functions in independent variables. Then we generalize Proposition 5.21 to the case of several homogeneous linear inequalities. After this we show that we can use Proposition 5.21 to extend the work of Bienstock et al. (2016) by constructing further outer-product-free sets. We also present simpler proofs for some of the outer-product-free sets developed there. Finally, we present an example showing that there are more quadratic-free sets than the ones we construct in this chapter.
5.7.1 Generalizing Theorem 5.16
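We can generalize Theorem 5.16 to the case when $S$ can be written as the sublevel set of a difference of sublinear functions in independent variables.
Theorem 5.48. Let $\sigma : \mathbb{R}^n \to \mathbb{R}$ be a sublinear function and let $\rho : \mathbb{R}^m \to \mathbb{R}$ be a sublinear function that is positive except at 0. Let
\[ S = \{(x, y) \in \mathbb{R}^n \times \mathbb{R}^m : \sigma(x) \le \rho(y)\} \]
and let $\bar x \neq 0$ be such that there exists a $\bar y$ such that $(\bar x, \bar y) \notin S$. Let $\lambda \in \partial \sigma(\bar x)$ and
\[ C_\lambda = \{(x, y) \in \mathbb{R}^n \times \mathbb{R}^m : \lambda^T x \ge \rho(y)\}. \]
Then $C_\lambda$ is maximal $S$-free.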
Proof. First note that $\sigma(\bar x) > 0$ since otherwise, due to the positivity of $\rho$, $(\bar x, y) \in S$ for any $y \in \mathbb{R}^m$. Therefore, $0 \notin \partial \sigma(\bar x)$, in particular $\lambda \neq 0$, and we can assume without loss of generality that $\|\lambda\| = 1$.
We are going to prove maximality via Theorem 5.6. For this notice that
\[ C_\lambda = \{(x, y) \in \mathbb{R}^n \times \mathbb{R}^m : \lambda^T x \ge \beta^T y \text{ for all } \beta \in \exp \partial \rho(0)\}. \]
Thus, we just need to show that every inequality is exposed by a point of $S \cap C_\lambda$. We show this now. Let $\beta \in \exp \partial \rho(0)$, let $y_0$ be such that it exposes $\beta$, and let $x_0 = \frac{\rho(y_0)}{\sigma(\bar x)} \bar x$. By Proposition 5.53 (see Section 5.10), $\lambda^T \bar x = \sigma(\bar x)$, which implies that $\lambda^T x_0 = \rho(y_0)$. Then, Lemma 5.15 implies that $(x_0, y_0)$ exposes $\lambda^T x \ge \beta^T y$.
We need to show that $(x_0, y_0) \in S \cap C_\lambda$. As we saw, $\lambda^T x_0 = \rho(y_0)$, which implies that $(x_0, y_0) \in C_\lambda$. Finally,
\[ \sigma(x_0) = \sigma\left(\frac{\rho(y_0)}{\sigma(\bar x)} \bar x\right) = \rho(y_0) \]
implies that $(x_0, y_0) \in S$. Notice that the second equality holds because $\rho(y_0) > 0$ and $\sigma(\bar x) > 0$.
The next example shows that the positivity of ρis necessary.
Example 5.49.
Consider
σ
(
x, y
) =
|x
+
y|
+
∥
(
x, y
)
∥
and
ρ
(
z
) = 2
z
+
|z|
. Both
functions are positively homogeneous. Let
S
=
{
(
x, y, z
)
∈R3
:
σ
(
x, y
)
≤
ρ(z)},x¯ = (1,1). Then, ∇σ(x¯) = (1 + 1
√2)x¯ and
Cλ={(x, y, z)∈R3: (1 + 1
√2)(x+y)≥2z+|z|}.
We now show that $C_\lambda$ is not maximal $S$-free, see also Figure 5.10. Consider
\[ K = \{(x, y, z) \in \mathbb{R}^3 : (1 + \tfrac{1}{\sqrt{2}})(x + y) \ge 3z\}. \]
Since $z \le |z|$, then $C_\lambda \subsetneq K$. Furthermore, $K$ is $S$-free. Indeed, given that $\sigma(x, y) \ge 0$, any element of $S$ satisfies $z \ge 0$. Thus, if $(x, y, z)$ is in $S$ and the interior of $K$, it must satisfy $z \ge 0$ and
\[ \sigma(x, y) \le 3z < (1 + \tfrac{1}{\sqrt{2}})(x + y). \]
This is impossible since $(1 + \frac{1}{\sqrt{2}})(x + y) = \nabla\sigma(\bar{x})^T (x, y) \le \sigma(x, y)$ for every $x, y$.
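The two claims of Example 5.49 can also be checked numerically. The sketch below is only a sanity check, not a proof: the sampling box $[-2, 2]^3$, the sample size, the tolerance and the witness point are our own choices. It looks for points of $S$ in the interior of $K$ and exhibits one point of $K$ that is not in $C_\lambda$.

```python
import math
import random

C = 1.0 + 1.0 / math.sqrt(2.0)  # the coefficient 1 + 1/sqrt(2)

def sigma(x, y):
    return abs(x + y) + math.hypot(x, y)

def rho(z):
    return 2.0 * z + abs(z)

def in_S(x, y, z):
    return sigma(x, y) <= rho(z)

def in_C_lambda(x, y, z):
    return C * (x + y) >= rho(z)

def in_int_K(x, y, z, tol=1e-9):
    # strict inequality, with a small tolerance against rounding
    return C * (x + y) > 3.0 * z + tol

def count_S_points_inside_K(samples=100000, seed=0):
    """Count sampled points that lie in S and in the interior of K."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x, y, z = (rng.uniform(-2.0, 2.0) for _ in range(3))
        if in_S(x, y, z) and in_int_K(x, y, z):
            hits += 1
    return hits

# A point with z < 0 and C * (x + y) = -2: it satisfies -2 >= 3z = -3 (so it is in K)
# but violates -2 >= rho(z) = -1 (so it is not in C_lambda).
witness = (-1.0 / C, -1.0 / C, -1.0)
```

The witness uses $z < 0$, which is exactly where $\rho$ fails to be positive and where $K$ is strictly larger than $C_\lambda$.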
The next example shows that it is necessary that each sublinear function
is in a different set of variables.
Example 5.50. Consider $\sigma(x, y) = |x + y| + x$ and $\rho(x, y) = \|(x, y)\| + y$. Both functions are positively homogeneous. Let $S = \{(x, y) \in \mathbb{R}^2 : \sigma(x, y) \le \rho(x, y)\}$ and $(\bar{x}, \bar{y}) = (1, 1)$. We have that $(\bar{x}, \bar{y}) \notin S$ and $\nabla\sigma(\bar{x}, \bar{y}) = (2, 1)$. From Figure 5.11 it is easy to see that
\[ C = \{(x, y) \in \mathbb{R}^2 : 2x + y \ge \rho(x, y)\} \]
is not maximal $S$-free.
5.7.2 Generalizing Proposition 5.21
Let $P = \{(x, y) : Ax + Dy \le 0\}$ and let $S_P = \{(x, y) \in P : \|x\| \le \|y\|\}$. Here we construct maximal $S_P$-free sets under some conditions on $P$. The construction generalizes Proposition 5.21.
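The construction follows basically the same steps, but there is one extra issue. Just like $G(\lambda)$, we define $G_P(\lambda) = \{\beta : \|\beta\| = 1,\ A\lambda + D\beta \le 0\}$. Also, just like $C_{G(\lambda)}$, we define
\[ C_{G_P(\lambda)} = \{(x, y) \in \mathbb{R}^{n+m} : -\lambda^T x + \beta^T y \le 0,\ \forall \beta \in G_P(\lambda)\}. \]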
The extension of Proposition 5.19 presents the extra hypothesis needed. In the proof of Proposition 5.21 it was key to write the non-convex problem $\max_{\beta \in G(\lambda)} y^T \beta$ as the convex problem $\max\{y^T \beta : \|\beta\| \le 1,\ a^T \lambda + d^T \beta \le 0\}$, see (5.6). However, in general, the same does not work using $G_P(\lambda)$ instead of $G(\lambda)$. Indeed,
\[ \max\{y^T \beta : \|\beta\| \le 1,\ A\lambda + D\beta \le 0\} \tag{5.24} \]
can have optimal solutions for which $\|\beta\| < 1$. This can never happen when we have a single inequality and $\beta \in \mathbb{R}^m$ with $m > 1$. To force every optimal solution of (5.24) to satisfy $\|\beta\| = 1$, we are going to ask that $P$ has no vertex of the form $(\lambda, \beta)$ with $\|\beta\| < 1$.
Alternatively, we could define $G_P(\lambda) = \{\beta : \|\beta\| \le 1,\ A\lambda + D\beta \le 0\}$ and $C_{G_P(\lambda)}$ with the new $G_P(\lambda)$. However, it would not be clear if there is a point in $S_P$ exposing an inequality with $\|\beta\| < 1$. Indeed, this need not happen, as can be seen by modifying Example 5.25. Consider $a = (-2, 4)$ instead of $a = (-3, 4)$. The modification discussed here yields $C_{G(\lambda)} = \{(x, y) : \lambda^T x + y \ge 0,\ \lambda^T x - \frac{4}{25} y \ge 0\}$. The second inequality comes from $\beta = \frac{4}{25}$. However, with the new $a$, $\{(x, y) : \lambda^T x + y \ge 0\}$ is already maximal.
Finally, we need to generalize the condition $\|d\| \ge \|a\|$. This generalizes to the condition that $DD^T - AA^T$ is copositive. All together, we have the following.
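Proposition 5.51. Let $(\bar{x}, \bar{y}) \in P \setminus S_P$ and $\lambda = \frac{\bar{x}}{\|\bar{x}\|}$. Assume that $P$ has no vertex $(\lambda, \beta)$ with $\|\beta\| < 1$. If $DD^T - AA^T$ is copositive, then
\[ C_{G_P(\lambda)} = \{(x, y) \in \mathbb{R}^{n+m} : -\lambda^T x + \beta^T y \le 0,\ \forall \beta \in G_P(\lambda)\} \]
is maximal $S_P$-free.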
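Proof. As $P$ has no vertex $(\lambda, \beta)$ with $\|\beta\| < 1$, we have that $G_P(\lambda) \neq \emptyset$ if and only if $\{\|\beta\| \le 1,\ A\lambda + D\beta \le 0\} \neq \emptyset$. Since $A\bar{x} + D\bar{y} \le 0$ and $\|\bar{x}\| > \|\bar{y}\|$, it holds that $A\lambda + D\frac{\bar{y}}{\|\bar{x}\|} \le 0$ and $\frac{\|\bar{y}\|}{\|\bar{x}\|} < 1$. In other words, $(-\lambda, \frac{\bar{y}}{\|\bar{x}\|}) \in \{\|\beta\| \le 1,\ A\lambda + D\beta \le 0\}$. Note that $(-\lambda, \frac{\bar{y}}{\|\bar{x}\|})$ is a Slater point, see Section 1.3.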
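To show that $C_{G_P(\lambda)}$ is $S_P$-free, it is enough to show that
\[ \max_{\beta \in G_P(\lambda)} y^T \beta \ge \lambda^T x \quad \text{for every } (x, y) \in S_P. \]
Since $P$ has no vertex $(\lambda, \beta)$ with $\|\beta\| < 1$, the maximum above is equivalent to (5.24). By strong duality, (5.24) is equal to
\[ \min_{\theta \ge 0} \|y - D^T \theta\| - \lambda^T A^T \theta. \]
Now we just need to show that for any $\theta \ge 0$ and every $(x, y) \in S_P$, the expression $-\lambda^T x + \|y - D^T \theta\| - \lambda^T A^T \theta$ is non-negative. We will now prove that $\lambda^T (x + A^T \theta) \le \|y - D^T \theta\|$, which implies the freeness.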
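By Cauchy-Schwarz and $\|\lambda\| = 1$, we have that $\lambda^T (x + A^T \theta) \le \|x + A^T \theta\|$. Furthermore, $\|x + A^T \theta\|^2 = \|x\|^2 + 2\theta^T A x + \|A^T \theta\|^2$. Since $\theta \ge 0$ and $Ax + Dy \le 0$, $\theta^T A x \le -\theta^T D y$. In addition, $\|x\|^2 \le \|y\|^2$. Thus,
\begin{align*}
\|x + A^T \theta\|^2 &\le \|y\|^2 - 2\theta^T D y + \|A^T \theta\|^2 \\
&= \|y - D^T \theta\|^2 + \|A^T \theta\|^2 - \|D^T \theta\|^2 \\
&\le \|y - D^T \theta\|^2,
\end{align*}
where the last inequality is due to the copositivity of $DD^T - AA^T$.
We have shown that $\|x + A^T \theta\| \le \|y - D^T \theta\|$. Hence, $\lambda^T (x + A^T \theta) \le \|y - D^T \theta\|$ and we conclude that $C_{G_P(\lambda)}$ is $S_P$-free.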
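The key inequality $\|x + A^T\theta\| \le \|y - D^T\theta\|$ can be tested on random data. The sketch below is a numerical sanity check under assumptions of our own: the matrix $A$, the rotation angle, the scaling factor 1.5 and the sampling ranges are hypothetical. We take $D = 1.5\,A R$ with $R$ a rotation, so that $DD^T - AA^T = 1.25\,AA^T$ is positive semidefinite and hence copositive.

```python
import math
import random

def matvec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

def check_chain(trials=20000, seed=1, tol=1e-9):
    """Return (number of tested points of S_P, number of violations found)."""
    rng = random.Random(seed)
    A = [[1.0, 0.5], [-0.3, 0.8]]          # hypothetical data
    c, s = math.cos(0.7), math.sin(0.7)
    R = [[c, -s], [s, c]]                   # rotation, so R R^T = I
    D = [[1.5 * sum(A[i][k] * R[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]                 # D D^T = 2.25 A A^T
    At, Dt = transpose(A), transpose(D)
    checked = violations = 0
    for _ in range(trials):
        x = [rng.uniform(-1.0, 1.0) for _ in range(2)]
        y = [rng.uniform(-1.0, 1.0) for _ in range(2)]
        in_P = all(ax + dy <= 0.0 for ax, dy in zip(matvec(A, x), matvec(D, y)))
        if not in_P or norm(x) > norm(y):
            continue                        # keep only points of S_P
        theta = [rng.uniform(0.0, 3.0) for _ in range(2)]
        lhs = norm([xi + ai for xi, ai in zip(x, matvec(At, theta))])
        rhs = norm([yi - di for yi, di in zip(y, matvec(Dt, theta))])
        checked += 1
        if lhs > rhs + tol:
            violations += 1
    return checked, violations
```

Rejection sampling keeps the sketch short; only the sampled points that actually lie in $S_P$ are tested.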
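Finally, we have to prove maximality. Suppose there exists an $S_P$-free set $C$ such that $C_{G_P(\lambda)} \subsetneq C$. This implies that there must exist $\beta_0 \in G_P(\lambda)$ such that $-\lambda^T x + \beta_0^T y \le 0$ is not valid for $C$. As $C_\lambda \subseteq C_{G_P(\lambda)}$, $-\lambda^T x + \beta_0^T y \le 0$ is valid for $C_\lambda$.
As we saw in Theorem 5.16, $(\lambda, \beta_0) \in C_\lambda$ exposes $-\lambda^T x + \beta_0^T y \le 0$, and since $C_\lambda \subseteq C$, Theorem 5.5 implies that $(\lambda, \beta_0) \in \operatorname{int}(C)$. However, since $\beta_0 \in G_P(\lambda)$, we have that $(\lambda, \beta_0) \in S_P$. This contradicts the $S_P$-freeness of $C$.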
5.7.3 Extensions to the Work of Bienstock et al. (2016)
Bienstock et al. (2019) construct maximal $S$-free sets for $S = \{X \in \mathcal{S}^n_+ : \operatorname{rk}(X) = 1\}$. They show that
\[ C_{ijkl} = \{X \in \mathcal{S}^n_+ : \lambda_1 (x_{ij} + x_{lk}) + \lambda_2 (x_{ik} - x_{lj}) \ge \|(x_{ik} + x_{lj}, x_{ij} - x_{lk})\|\} \]
is maximal $S$-free under some conditions on $\lambda_1$ and $\lambda_2$ depending on $i, j, k, l$, see Bienstock et al. (2019, Theorem 4). In other words, $C_{ijkl}$ consists of the matrices for which the entries of a given $2 \times 2$ submatrix satisfy the condition above. To simplify notation we will denote the entries of the submatrix by $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ and $C_{ijkl}$ by $C$. For example, if the submatrix is taken from the columns $i, j$ and rows $k, l$ such that $k = j$, that is, $b$ is in the diagonal, then $\lambda_1 = 1, \lambda_2 = 0$ yields a maximal $S$-free set according to Bienstock et al. (2019, Theorem 4). Or, if none of $a, b, c, d$ corresponds to an entry in the diagonal, then any $(\lambda_1, \lambda_2) \in D_1(0)$ yields a maximal $S$-free set.
This last result can be deduced as follows. By using the projection theorem (Theorem 5.12) we can reduce finding maximal $S$-free sets to finding the maximal $S_0$-free sets, where
\[ S_0 = \{(a, b, c, d) \in \mathbb{R}^4 : ad = bc\}. \]
The set $S_1 = \{(a, b, c, d) \in \mathbb{R}^4 : ad \le bc\}$ is a non-convex $S_0$-free set. Using the eigenvalue decomposition we obtain $C$ as a maximal $S_1$-free set. Theorem 5.16 tells us that $C$ is going to be maximal for any $(\lambda_1, \lambda_2) \in D_1(0)$.
The difficulty when some entries belong to the diagonal is that if $X \in S$, then its diagonal entries are non-negative. Thus, if, say, $b$ is in the diagonal, then $S_0 = \{(a, b, c, d) \in \mathbb{R}^4 : ad = bc,\ b \ge 0\}$. Thus, $S_1 = \{(a, b, c, d) \in \mathbb{R}^4 : ad \le bc,\ b \ge 0\}$ and we can use the techniques from Section 5.4 to construct maximal $S_1$-free sets.
5.7.4 There Are More Quadratic-Free Sets
It is an interesting question whether every quadratic-free set can be obtained via the construction presented in this chapter. In this section we show that the answer is no. Even for the homogeneous case we can find $S^h$-free sets that are not given by our construction.
The $S^h$-free sets $C_\lambda$ have the following property.
Proposition 5.52. If $(x, y) \in S^h \cap C_\lambda \setminus \{(0, 0)\}$, then $\frac{x}{\|x\|} = \lambda$.
Proof. If $(x, y) \in C_\lambda$, then $\lambda^T x \ge \|y\|$. If $(x, y) \in S^h$, then $\|y\| \ge \|x\|$. By Cauchy-Schwarz, $\|x\| \ge \lambda^T x$. All together, these imply that $\|x\| = \lambda^T x$, which implies that $\frac{x}{\|x\|} = \lambda$.
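Consider now $S = \{(x_1, x_2, y_1, y_2) \in \mathbb{R}^4 : \|x\| \le \|y\|\}$ and let
\[ C = \operatorname{conv}\{(1, 0, 1, 0), (0, 1, 0, 1), (1, 1, 0, 0), (1, 1, -1, 0), (0, 0, 0, 0)\}. \]
$C$ is full dimensional and $S$-free. To see this, note that the points in the interior of $C$ are of the form $(\lambda_1 + \lambda_3 + \lambda_4, \lambda_2 + \lambda_3 + \lambda_4, \lambda_1 - \lambda_4, \lambda_2)$ for which $\lambda_1 + \dots + \lambda_5 = 1$ and $\lambda_i > 0$ for $i = 1, \dots, 5$. For such a point to be in $S$ it must satisfy
\[ (\lambda_1 + \lambda_3 + \lambda_4)^2 + (\lambda_2 + \lambda_3 + \lambda_4)^2 \le (\lambda_1 - \lambda_4)^2 + \lambda_2^2. \]
But subtracting the right hand side and factorizing, this is the same as
\[ (2\lambda_1 + \lambda_3)(\lambda_3 + 2\lambda_4) + (2\lambda_2 + \lambda_3 + \lambda_4)(\lambda_3 + \lambda_4) \le 0. \]
No $\lambda_i > 0$ satisfy the above inequality.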
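The factorization and the resulting sign can be double-checked numerically. The following sketch samples convex multipliers with all $\lambda_i > 0$ (the sampling scheme, sample size and tolerance are our own choices), evaluates $\|x\|^2 - \|y\|^2$ at the corresponding point of $C$, and compares it with the factorized expression.

```python
import random

def point(l):
    # Convex combination of the five generators of C with weights l = (l1, ..., l5).
    l1, l2, l3, l4, _ = l
    return (l1 + l3 + l4, l2 + l3 + l4, l1 - l4, l2)

def slack(l):
    # ||x||^2 - ||y||^2; the point belongs to S exactly when this is <= 0.
    x1, x2, y1, y2 = point(l)
    return x1 * x1 + x2 * x2 - y1 * y1 - y2 * y2

def factorized(l):
    l1, l2, l3, l4, _ = l
    return (2 * l1 + l3) * (l3 + 2 * l4) + (2 * l2 + l3 + l4) * (l3 + l4)

def random_interior_weights(rng):
    # Strictly positive weights that sum to one.
    w = [rng.uniform(0.01, 1.0) for _ in range(5)]
    total = sum(w)
    return [wi / total for wi in w]

def check(samples=10000, seed=0):
    """True if, on all samples, the factorization matches and the slack is positive."""
    rng = random.Random(seed)
    for _ in range(samples):
        l = random_interior_weights(rng)
        if abs(slack(l) - factorized(l)) > 1e-12 or slack(l) <= 0.0:
            return False
    return True
```

A positive slack means $\|x\| > \|y\|$, that is, the sampled point lies outside $S$.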
Notice that $(1, 0, 1, 0), (0, 1, 0, 1) \in S \cap C$, but the property in Proposition 5.52 does not hold since $(1, 0) \neq (0, 1)$. Therefore, $C$ can be extended to a maximal $S$-free set that is different from $C_\lambda$ for every $\lambda \in D_1(0)$.
5.8 Computational Experiments
These cuts, among others, are studied computationally in the Master’s thesis
of Antonia Chmiela (Chmiela, 2020). In her work, a transformation similar to $\tau_2$ of Section 5.6 is used to transform a general quadratic into the form needed to construct the $S$-free set. Specifically, the idea is to write the quadratic part of a quadratic function as a d.c. (difference of convex) function using the eigenvalues and eigenvectors and then to homogenize it.
Two experiments are performed in Chmiela (2020) using the MINLP solver SCIP (Gamrath et al., 2020; Vigerske and Gleixner, 2018; Achterberg, 2009).
The first one consists of testing how much gap can be closed in the root node when as many cuts as possible are added. This means the following. SCIP creates an initial linear relaxation of the optimization problem at hand. After solving this relaxation we obtain a first lower bound $d_1$. Then, the root node is processed by tightening bounds, adding cutting planes, re-solving the LP relaxation, etc. (for more details consult Achterberg (2009)). Just before branching starts, a last lower bound $d_2$ is obtained. The gap closed is then $\frac{d_2 - d_1}{p - d_1}$, where $p$ is the value of the optimal solution. Note that this measure only makes sense when $d_1 \neq p$; thus, in particular, feasibility problems are not considered.
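The gap-closed measure is simple enough to state as code. The helper below is a minimal sketch; the function name and the numbers in the usage note are hypothetical and are not taken from Chmiela (2020) or from SCIP.

```python
def gap_closed(d1: float, d2: float, p: float) -> float:
    """Fraction of the root gap closed: (d2 - d1) / (p - d1).

    d1: lower bound of the initial LP relaxation,
    d2: lower bound just before branching starts,
    p:  value of the optimal solution.
    """
    if d1 == p:
        # The measure is undefined when the first relaxation is already tight.
        raise ValueError("gap closed is undefined when d1 equals p")
    return (d2 - d1) / (p - d1)
```

For instance, with the made-up values $d_1 = 10$, $d_2 = 15$ and $p = 20$, half of the gap is closed.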
The second experiment consists of an assessment of the performance of
SCIP with the cuts included. That is, how much faster (or slower) SCIP is
when the intersection cuts are used.
                                    max                        default              relative
subset        instances   solved     time    nodes   solved     time    nodes    time   nodes
clean              2689     1625    97.73     1647     1619    99.83     1691    1.02    1.03
affected           1188      716   138.72     3790      710   145.58     4030    1.04    1.06
[0,3600]           1652     1625     9.46      342     1619     9.81      362    1.04    1.06
[1,3600]            965      938    40.19     1112      932    42.64     1198    1.06    1.08
[10,3600]           650      623   135.17     2533      617   146.12     2798    1.08    1.10
[100,3600]          359      332   462.07     8278      326   516.28     9558    1.12    1.16
[1000,3600]         135      108  1226.30    31104      102  1493.00    40748    1.22    1.31
all-optimal        1598     1598     7.91      271     1598     8.17      284    1.03    1.05
diff-timeout         54       27  1202.26        -       21  1418.06        -    1.18       -
Table 5.1: Comparison of running time (in seconds) and number of nodes when
using SCIP with the settings max and default, respectively. The columns
“relative” denote the corresponding relative shifted geometric mean of the
results obtained by default with respect to the results of max.
The results of the first experiment are as follows. Out of 690 instances from the MINLPLIB for which there was a difference in the gap closed, 512 closed more gap. On average, 3% more gap can be closed in the root node. However, solvers do not add as many cuts as they can and at some point they decide to start branching. Thus, although this result is positive, the empirical performance still needs to be assessed.
For the second experiment, we reproduce Table 4.5 from Chmiela (2020) as Table 5.1. We can observe that, as expected, fewer nodes are needed. Three reasons why a slowdown might be expected from intersection cuts are the following. First, to compute them, one needs access to the LP tableau, which is not a cheap operation if performed often. Second, intersection cuts are generally dense, which might render the LP harder to solve. Third, the numerics of these cuts can be poor, in the sense that they might have both very large and very small coefficients, again making the LP harder to solve. Despite all this, we do see a speed-up in the solving time. For example, in the instances for which either SCIP with or without the cuts took at least 1000 seconds to solve, the hard instances, we see a 20% speed-up. When considering all instances that did not fail, the speed-up is a modest 2%. These results were obtained by adding at most 20 intersection cuts only in the root node. Although this might sound like a small number, the performance is sensitive to this limit. Experiments performed by the author adding additionally at most 2 cuts per node in the tree led to a 10% slowdown.
These results show that there is potential in this type of cuts for nonlinear
problems. For more details, we refer the reader to Chmiela (2020).
5.9 Summary and Future Work
In this chapter we have shown how to construct maximal quadratic-free sets,
i.e., convex sets whose interior does not intersect the sublevel set of a quadratic
function. Using the long-studied intersection cut framework, these sets can
be used in order to generate deep cutting planes for quadratically constrained
problems. We strongly believe that, by carefully laying a theoretical frame-
work for quadratic-free sets, this chapter provides an important contribution
to the understanding and future computational development of non-convex
quadratically constrained optimization problems.
The maximal quadratic-free sets we construct in this chapter allow for an efficient computation of the corresponding intersection cuts. Computing such cutting planes amounts to solving a simple one-dimensional convex optimization problem using the quadratic-free sets we show here. Moreover, even if in our constructions and maximality proofs we use semi-infinite outer-descriptions of $S$-free sets such as (5.17), all of them have closed-form expressions that are more adequate for computational purposes: see (5.2), (5.10), (5.15), (5.23) for these expressions for the sets $C_\lambda$, $C_{G(\lambda)}$, $C_{\phi_\lambda}$ and $C$, respectively, and (5.12) for the explicit description of the $\phi_\lambda$ function. This ensures efficient separation in LP-based methods for quadratically constrained optimization problems.
The empirical performance of these intersection cuts is promising. The
development of a cut strengthening procedure is likely to be important for
obtaining an even better empirical performance. Other important open questions involve a better understanding of the role that different transformations of quadratic inequalities have (Section 5.6), a theoretical and empirical comparison with the method proposed by Bienstock et al. (2016, 2019), and devising new methods for producing other families of quadratic-free sets. All this is the subject of ongoing work.
5.10 Missing Proofs
The following is a useful identity that sublinear functions satisfy. For positively homogeneous and differentiable functions the result is implied by the well-known Euler homogeneous function theorem.
Proposition 5.53. If $\phi : \mathbb{R}^n \to \mathbb{R}$ is sublinear, then $\phi(x) = \beta^T x$ for every $\beta \in \partial\phi(x)$.
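Proof. Let $\beta \in \partial\phi(x)$. It holds that $\phi(x) + \beta^T (y - x) \le \phi(y)$ for every $y \in \mathbb{R}^n$. Since $\phi$ is positively homogeneous, taking $y = 2x$ gives $\beta^T x \le \phi(x)$ and taking $y = \frac{1}{2}x$ gives $\phi(x) \le \beta^T x$. We conclude that $\phi(x) = \beta^T x$.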
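Lemma 5.15. Let $\phi : \mathbb{R}^n \to \mathbb{R}$ be a sublinear function, $\lambda \in D_1(0)$, and let
\[ C = \{(x, y) : \phi(y) \le \lambda^T x\}. \]
If $(\bar{x}, \bar{y}) \in C$ is such that $\phi$ is differentiable at $\bar{y}$ and $\phi(\bar{y}) = \lambda^T \bar{x}$, then $(\bar{x}, \bar{y})$ exposes the valid inequality $-\lambda^T x + \nabla\phi(\bar{y})^T y \le 0$.
In particular, if $\beta_0 \in \partial\phi(0)$ is an exposed point of $\partial\phi(0)$, exposed by $\bar{y}$, and $\phi(\bar{y}) = \lambda^T \bar{x}$, then $(\bar{x}, \bar{y})$ exposes the valid inequality $-\lambda^T x + \beta_0^T y \le 0$.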
Proof. We need to verify both conditions of Definition 5.2. As $\phi$ is positively homogeneous and differentiable at $\bar{y}$, $\phi(\bar{y}) = \nabla\phi(\bar{y})^T \bar{y}$. Thus, evaluating $-\lambda^T x + \nabla\phi(\bar{y})^T y$ at $(\bar{x}, \bar{y})$ yields $-\lambda^T \bar{x} + \phi(\bar{y})$, which is 0 by hypothesis. This shows that the inequality is tight at $(\bar{x}, \bar{y})$.
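Now, let $\alpha^T x + \gamma^T y \le \delta$ be a non-trivial valid inequality tight at $(\bar{x}, \bar{y})$. Then, $\delta = \alpha^T \bar{x} + \gamma^T \bar{y}$ and we can rewrite the inequality as $\alpha^T (x - \bar{x}) + \gamma^T (y - \bar{y}) \le 0$. Notice that $(\phi(y)\lambda, y) \in C$, thus,
\[ \alpha^T \lambda (\phi(y) - \phi(\bar{y})) + \gamma^T (y - \bar{y}) \le 0 \]
for every $y \in \mathbb{R}^m$. Subtracting $\alpha^T \lambda \nabla\phi(\bar{y})^T (y - \bar{y})$ and dividing by $\|y - \bar{y}\|$ we obtain the equivalent expression
\[ \alpha^T \lambda \, \frac{\phi(y) - \phi(\bar{y}) - \nabla\phi(\bar{y})^T (y - \bar{y})}{\|y - \bar{y}\|} \le (-\gamma - \alpha^T \lambda \nabla\phi(\bar{y}))^T \frac{y - \bar{y}}{\|y - \bar{y}\|}. \]
Since $\phi$ is differentiable at $\bar{y}$, the limit when $y$ approaches $\bar{y}$ of the left hand side of the above expression is 0. However, one can make the expression $\frac{y - \bar{y}}{\|y - \bar{y}\|}$ converge to any point of $D_1(0)$. Therefore,
\[ 0 \le (-\gamma - \alpha^T \lambda \nabla\phi(\bar{y}))^T \beta \]
for every $\beta \in D_1(0)$. This implies that $\gamma = -\alpha^T \lambda \nabla\phi(\bar{y})$. From here we see that $\alpha \neq 0$ as otherwise $\alpha = \gamma = 0$ and the inequality would be trivial.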
Given that any $(x, 0)$ such that $\lambda^T x = 0$ belongs to $C$, it follows that $\alpha$ is parallel to $\lambda$, i.e., there exists $\nu \in \mathbb{R}$ such that $\alpha = \nu\lambda$. Furthermore, $(\mu\lambda, 0) \in C$ for every $\mu \ge 0$ implies that $0 > \alpha^T \lambda = \nu$. Therefore, $\gamma = -\nu \nabla\phi(\bar{y})$ and the inequality reads $\nu \lambda^T (x - \bar{x}) - \nu \nabla\phi(\bar{y})^T (y - \bar{y}) \le 0$. Dividing by $|\nu|$ and using that $-\lambda^T x + \nabla\phi(\bar{y})^T y \le 0$ is tight at $(\bar{x}, \bar{y})$, we conclude that the inequality can be written as
\[ -\lambda^T x + \nabla\phi(\bar{y})^T y \le 0. \]
The second claim follows from the first part of the lemma and the fact that if $\beta_0$ is an exposed point of $\partial\phi(0)$ and $\bar{y}$ exposes it, then $\phi$ is differentiable at $\bar{y}$ and $\nabla\phi(\bar{y}) = \beta_0$. To show this last statement, it is enough to prove that $\partial\phi(\bar{y}) = \{\beta_0\}$, as then (Rockafellar, 1970, Theorem 25.1) implies that $\beta_0 = \nabla\phi(\bar{y})$.
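We first show that $\beta_0 \in \partial\phi(\bar{y})$. We have that $\phi(y) = \max\{\beta^T y : \beta \in \partial\phi(0)\}$. Since $\bar{y}$ exposes $\beta_0$, we have that $\phi(\bar{y}) = \beta_0^T \bar{y}$. Given that $\beta_0 \in \partial\phi(0)$, we have that $\beta_0^T y \le \phi(y)$. Thus, $\phi(\bar{y}) + \beta_0^T (y - \bar{y}) \le \phi(y)$ and we conclude that $\beta_0 \in \partial\phi(\bar{y})$.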
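Now, let $\beta \in \partial\phi(\bar{y})$. Then, $\phi(\bar{y}) + \beta^T (y - \bar{y}) \le \phi(y)$ for all $y$. Proposition 5.53 implies that $\beta^T y \le \phi(y)$ and we conclude that $\beta \in \partial\phi(0)$. But $\bar{y}$ exposes $\beta_0$, which means that $\beta_0$ is the only solution to $\phi(\bar{y}) = \max\{\beta^T \bar{y} : \beta \in \partial\phi(0)\}$. This implies that $\beta = \beta_0$. Hence, $\partial\phi(\bar{y}) = \{\beta_0\}$ as we wanted to show.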
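Proposition 5.54. Let $a, \lambda \in D_1(0)$, $\lambda \neq \pm a$ and let $d \in \mathbb{R}^m$ be such that $\|d\| \le 1$. The (Lagrangian) dual problem of
\[ \max_x \{\lambda^T x : \|x\| \le \|y\|,\ a^T x + d^T y \le 0\} \tag{5.25} \]
is
\[ \inf_\theta \{\|\lambda - \theta a\| \|y\| - \theta d^T y : \theta \ge 0\}. \tag{5.26} \]
The optimal solution to (5.25) is $x : \mathbb{R}^m \to \mathbb{R}^n$,
\[ x(y) = \begin{cases} \lambda \|y\|, & \text{if } \lambda^T a \|y\| + d^T y \le 0 \\ \sqrt{\frac{\|y\|^2 - (d^T y)^2}{1 - (\lambda^T a)^2}}\, \lambda - \Big(d^T y + \lambda^T a \sqrt{\frac{\|y\|^2 - (d^T y)^2}{1 - (\lambda^T a)^2}}\Big) a, & \text{otherwise.} \end{cases} \tag{5.27} \]
The optimal dual solution is $\theta : \mathbb{R}^m \to \mathbb{R}_+ \cup \{+\infty\}$,
\[ \theta(y) = \begin{cases} 0, & \text{if } \lambda^T a \|y\| + d^T y \le 0 \\ \lambda^T a + d^T y \frac{\sqrt{1 - (\lambda^T a)^2}}{\sqrt{\|y\|^2 - (d^T y)^2}}, & \text{otherwise.} \end{cases} \tag{5.28} \]
Here, $\frac{1}{0} = +\infty$ and $r + (+\infty) = +\infty$ for every $r \in \mathbb{R}$. Moreover, strong duality holds, that is, (5.25) = (5.26), and
\[ (5.25) = \begin{cases} \|y\|, & \text{if } \lambda^T a \|y\| + d^T y \le 0 \\ \sqrt{(\|y\|^2 - (d^T y)^2)(1 - (\lambda^T a)^2)} - d^T y\, \lambda^T a, & \text{otherwise.} \end{cases} \tag{5.29} \]
Finally, (5.29) holds even if $\lambda = \pm a$.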
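The closed form (5.29) can be compared against a brute-force evaluation of (5.25) in a small case. The sketch below assumes $n = m = 2$ and uses data of our own choosing ($\lambda$, $a$, $d$ and the test vectors $y$ are hypothetical). Since a linear function over the intersection of a disc and a half-plane attains its maximum at an extreme point, and all extreme points lie on the circle $\|x\| = \|y\|$, scanning that circle is enough.

```python
import math

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def closed_form(lam, a, d, y):
    """Right-hand side of (5.29) for unit vectors lam, a and ||d|| <= 1."""
    ny, dy, la = math.hypot(y[0], y[1]), dot(d, y), dot(lam, a)
    if la * ny + dy <= 0:
        return ny
    return math.sqrt((ny * ny - dy * dy) * (1.0 - la * la)) - dy * la

def brute_force(lam, a, d, y, steps=20000):
    """Approximate (5.25) for n = 2 by scanning feasible points of the circle ||x|| = ||y||."""
    ny, dy = math.hypot(y[0], y[1]), dot(d, y)
    best = -math.inf
    for k in range(steps):
        t = 2.0 * math.pi * k / steps
        x = (ny * math.cos(t), ny * math.sin(t))
        if dot(a, x) + dy <= 0:          # linear constraint a^T x + d^T y <= 0
            best = max(best, dot(lam, x))
    return best

# Hypothetical data: unit vectors LAM and A with LAM != +-A, and ||D|| < 1.
LAM = (1.0, 0.0)
A = (math.cos(1.0), math.sin(1.0))
D = (0.6, -0.6)
YS = [(-1.0, 1.0), (1.0, 0.5), (0.5, -1.0)]

def max_abs_error():
    return max(abs(closed_form(LAM, A, D, y) - brute_force(LAM, A, D, y)) for y in YS)
```

The first test vector falls in the case $\lambda^T a \|y\| + d^T y \le 0$ and the other two in the second case of (5.29); the scan resolution limits the agreement to roughly $\|y\| \cdot 2\pi / \text{steps}$.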
Proof. First, note that since $\lambda \neq \pm a$ and $\|d\| \le 1$, $x(y)$ and $\theta(y)$ are defined for every $y \in \mathbb{R}^m$. Second, to make some of the calculations that follow more amenable, let $S(y) = \sqrt{\frac{\|y\|^2 - (d^T y)^2}{1 - (\lambda^T a)^2}}$.
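The Lagrangian of (5.25) is $L : \mathbb{R}^n \times \mathbb{R}^2_+ \to \mathbb{R}$,
\[ L(x, \mu, \theta) = \lambda^T x - \mu(\|x\| - \|y\|) - \theta(a^T x + d^T y). \]
Thus, the dual function is
\[ d(\mu, \theta) = \max_x L(x, \mu, \theta). \]
We have that $d(\mu, \theta)$ is infinity whenever $\mu < \|\lambda - a\theta\|$, and $\mu\|y\| - \theta d^T y$ otherwise. Hence, the dual problem, $\min_{\theta, \mu \ge 0} d(\mu, \theta)$, is $\min\{\mu\|y\| - \theta d^T y : \theta \ge 0,\ \mu \ge \|\lambda - a\theta\|\}$, which is (5.26).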
Let us assume that $\lambda^T a \|y\| + d^T y \le 0$. Clearly, $x(y) = \lambda\|y\|$ is feasible for (5.25). Its objective value is $\|y\|$. On the other hand, $\theta(y) = 0$ is always feasible for (5.26). Its objective value is also $\|y\|$; therefore, $x(y)$ is the primal optimal solution and $\theta(y)$ the dual optimal solution.
Now let us consider the case $\lambda^T a \|y\| + d^T y > 0$. Let us check that $\theta(y)$ is dual feasible, that is, $\theta(y) \ge 0$. Note that, due to the positive homogeneity of $\theta(y)$ and of the condition $\lambda^T a \|y\| + d^T y > 0$ with respect to $y$, we can assume without loss of generality that $\|y\| = 1$.
Let $\alpha = \lambda^T a$ and $\beta = d^T y$. Since $\theta(d) = +\infty \ge 0$ when $\|d\| = 1$, we can assume that $y \neq d$ when $\|d\| = 1$. Note that the same does not occur when $y = -d$ since we are assuming $\lambda^T a \|y\| + d^T y > 0$. Thus, $\alpha, \beta \in (-1, 1)$.
We will prove that $\theta(y)\sqrt{1 - \beta^2} = \alpha\sqrt{1 - \beta^2} + \beta\sqrt{1 - \alpha^2} \ge 0$, which implies that $\theta(y) \ge 0$. If $\alpha, \beta \ge 0$, then we are done. As $\alpha + \beta > 0$, at least one of them must be positive. Let us assume $\alpha > 0$ and $\beta < 0$, the other case is analogous. Then, $\alpha > -\beta \ge 0$. This implies that $\alpha^2 > \beta^2$. Subtracting $\alpha^2 \beta^2$, factorizing and taking square roots we obtain the desired inequality.
Let us compute the value of the dual solution $\theta(y)$. First, if $y = d$ and $\|d\| = 1$, then $\theta(y) = +\infty$, which means that the optimal value is
\[ \lim_{\theta \to +\infty} \|\lambda - \theta a\| - \theta = -\lambda^T a. \]
One way of computing this limit is to multiply and divide the expression by $\frac{\|\lambda - \theta a\| + \theta}{\theta}$, expand, and simplify the numerator and denominator until one obtains something simple enough.
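Now assume $y \neq d$ if $\|d\| = 1$. Observe that $\|\lambda - \theta(y) a\| \|y\| - \theta(y) d^T y = \sqrt{\|\lambda - \theta(y) a\|^2}\, \|y\| - \theta(y) d^T y$. We have that
\begin{align*}
\|\lambda - \theta(y) a\|^2 &= 1 + \theta(y)(\theta(y) - 2\lambda^T a) \\
&= 1 + (\theta(y) - \lambda^T a + \lambda^T a)(\theta(y) - \lambda^T a - \lambda^T a) \\
&= 1 + (\theta(y) - \lambda^T a)^2 - (\lambda^T a)^2.
\end{align*}
Replacing $\theta(y)$, we obtain
\begin{align*}
\|\lambda - \theta(y) a\|^2 &= 1 + \frac{(d^T y)^2}{S^2(y)} - (\lambda^T a)^2 \\
&= \frac{1}{S^2(y)} \big(S^2(y)(1 - (\lambda^T a)^2) + (d^T y)^2\big) \\
&= \frac{\|y\|^2}{S^2(y)}.
\end{align*}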
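Therefore,
\begin{align*}
\|\lambda - \theta(y) a\| \|y\| - \theta(y) d^T y &= \frac{\|y\|^2}{S(y)} - d^T y\, \lambda^T a - \frac{(d^T y)^2}{S(y)} \\
&= \frac{\|y\|^2 - (d^T y)^2}{S(y)} - d^T y\, \lambda^T a \\
&= \sqrt{(\|y\|^2 - (d^T y)^2)(1 - (\lambda^T a)^2)} - d^T y\, \lambda^T a.
\end{align*}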
Let us now check the feasibility of $x(y)$. Let us first check that $\|x(y)\|^2 \le \|y\|^2$. We have $\|x(y)\|^2 = S^2(y) - 2S(y)(d^T y + S(y)\lambda^T a)\lambda^T a + (d^T y + \lambda^T a\, S(y))^2$. Expanding and removing common terms yields $\|x(y)\|^2 = S^2(y)(1 - (\lambda^T a)^2) + (d^T y)^2 = \|y\|^2$. Thus, the first constraint is satisfied.
To check the second constraint just notice that, as $\|a\| = 1$, $a^T x(y) = -d^T y$.
The primal value of $x(y)$ is
\[ \lambda^T x(y) = S(y)(1 - (\lambda^T a)^2) - d^T y\, \lambda^T a = \sqrt{(\|y\|^2 - (d^T y)^2)(1 - (\lambda^T a)^2)} - d^T y\, \lambda^T a. \]
As it coincides with the value of the dual solution, even when $y = d$ and $\|d\| = 1$, we conclude that both are optimal.
It only remains to check (5.29) for $\lambda = \pm a$. If $\lambda = -a$, then the linear constraint becomes $\lambda^T x \ge d^T y$ and the optimal solution is $x = \lambda\|y\|$. If $\lambda = a$, then the linear constraint becomes $\lambda^T x \le -d^T y$ and $x = -d^T y\, \lambda$ is then optimal. In both cases (5.29) holds.
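Lemma 5.55. Consider the set
\[ S = \{(x, y) \in \mathbb{R}^{n+m} : \|x\| \le \|y\|,\ a^T x + d^T y = -1\} \]
with $a, d$ such that $\|a\| > \|d\|$. Let $\lambda, \beta \in D_1(0)$ be two vectors satisfying $\lambda^T a + d^T \beta \ge 0$ and consider $C_{\phi_\lambda}$ defined in (5.15).
Then, the face of $C_{\phi_\lambda}$ defined by the valid inequality $-\lambda^T x + \nabla\phi_\lambda(\beta)^T y \le 0$ does not intersect $S$.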
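Proof. By contradiction, suppose that $(\bar{x}, \bar{y}) \in C_{\phi_\lambda}$ is such that
\[ (\bar{x}, \bar{y}) \in S \ \wedge\ -\lambda^T \bar{x} + \nabla\phi_\lambda(\beta)^T \bar{y} = 0. \]
The latter equality and the fact that $\phi_\lambda$ is sublinear imply $\phi_\lambda(\bar{y}) = \lambda^T \bar{x}$. Moreover, $\bar{x}$ is a feasible solution of the optimization problem defining $\phi_\lambda(\bar{y})$, which implies it is an optimal solution.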
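By Lemma 5.15 we know $(\bar{x}, \bar{y})$ exposes the valid inequality of $C_{\phi_\lambda}$ given by $-\lambda^T x + \nabla\phi_\lambda(\bar{y})^T y \le 0$. By definition of exposing point this means
\[ \nabla\phi_\lambda(\bar{y}) = \nabla\phi_\lambda(\beta). \]
From (5.21), since $W$ is invertible, we can see that this implies $\beta = \frac{\bar{y}}{\|\bar{y}\|}$. However, as $\lambda^T a + d^T \beta \ge 0$, the optimal solution in the definition of $\phi_\lambda(\bar{y})$, $x_0$, must satisfy $a^T x_0 + d^T \bar{y} = 0$. This contradicts $\phi_\lambda(\bar{y}) = \lambda^T \bar{x}$, since $\bar{x}$ is an optimal solution but $a^T \bar{x} + d^T \bar{y} = -1$.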
Chapter 6
Conclusion
In this thesis, we have mostly studied and developed cutting plane techniques
for MINLP. The main contributions of the thesis, grouped by chapters, are
the following.
Chapter 1 The exposition of monoidal strengthening.
Chapter 2 The interpretation of Veinott's Supporting Hyperplane algorithm as a particular case of Kelley's Cutting Plane algorithm. The extension of Veinott's algorithm to the case where the feasible region is represented by, possibly, nonconvex and non-differentiable functions.
Chapter 3 The observation that the point we want to separate allows us to reduce the feasible region while still ensuring that every separating hyperplane is valid. The formalization of the above observation using the reverse polar and visible points. The characterization of visible points for quadratic constraints.
Chapter 4 The framework for generating intersection cuts for factorable MINLPs. The construction of a concave underestimator of a general factorable function.
Chapter 5 The definition of a point exposing an inequality at infinity. The construction of maximal quadratic-free sets.
We offer a final summary of the main ideas of the different chapters.
For Chapter 2, while trying to understand when gradient cuts of functions yield supporting hyperplanes, we observed that Veinott's algorithm naturally appeared. This led to the observation that Veinott's algorithm is just Kelley's algorithm in disguise. The disguise consists of replacing the constraints representing the feasible region, $C$, by the gauge of $C$. With this insight it is natural to consider extensions of these algorithms to cases where the constraint functions are neither convex nor differentiable.
Chapter 3 is based on a simple observation: if a cut separating $x$ is invalid, then it must separate a feasible point near $x$. In other words, the point $x$ defines a subset of the feasible region, $V$, such that if a cut separates $x$ from $V$, then it separates $x$ from the feasible region. The way we capitalized on this observation was to find better bounds for the variables based on $V$. Then, cutting plane methods that exploit bounds can produce stronger cuts.
The main motivation for Chapter 4 came from the parallel between solving MILPs and MINLPs. Currently, a solver for MINLPs such as SCIP would not generate cuts for a violated constraint of the form $x(1 - x) \le 0$. However, if the constraint is written as $x \in \{0, 1\}$, then SCIP would try to generate, for example, Gomory cuts. Applying the same deduction of Gomory cuts to a general nonlinear constraint leads to a reformulation of the nonlinear constraint. The advantage of this reformulation is that the point to separate is now the vertex. This alone allows us to recover Gomory cuts for $x(1 - x) \le 0$, but only because $x(1 - x)$ is a concave function. Thus, it is natural to seek a concave underestimator in order to be able to deduce an intersection cut.
The question motivating Chapter 5 is natural, as maximal $S$-free sets yield the strongest intersection cuts.
Bibliography
The page numbers in brackets at the end of each citation refer to the text.
T. Achterberg. Constraint Integer Programming. PhD thesis, 2009. [134]
T. Achterberg and R. Wunderling. Mixed integer programming: Analyzing 12 years of progress. In Facets of Combinatorial Optimization, pages 449–481. Springer Berlin Heidelberg, 2013. doi: 10.1007/978-3-642-38189-8_18. [75]
K. Andersen, Q. Louveaux, R. Weismantel, and L. A. Wolsey. Inequalities from two rows of a simplex tableau. In Integer Programming and Combinatorial Optimization, pages 1–15. Springer Berlin Heidelberg, 2007. doi: 10.1007/978-3-540-72792-7_1. [28]
T. Arnold, R. Henrion, A. Möller, and S. Vigerske. A mixed-integer stochastic nonlinear optimization problem with joint probabilistic constraints, 2013. URL https://edoc.hu-berlin.de/handle/18452/9087. [39]
A. Bagirov, N. Karmitsa, and M. M. Mäkelä. Introduction to Nonsmooth Optimization. Springer International Publishing, 2014. doi: 10.1007/978-3-319-08114-4. [49]
E. Balas. Intersection cuts—a new type of cutting planes for integer programming.
Operations Research, 19(1):19–39, feb 1971. doi: 10.1287/opre.19.1.19. [10, 75]
E. Balas. Disjunctive programming. In Discrete Optimization II, Proceedings of the Advanced Research Institute on Discrete Optimization and Systems Applications of the Systems Science Panel of NATO and of the Discrete Optimization Symposium co-sponsored by IBM Canada and SIAM Banff, Alta. and Vancouver, pages 3–51. Elsevier BV, 1979. doi: 10.1016/s0167-5060(08)70342-x. [25, 76]
E. Balas. Disjunctive programming: Properties of the convex hull of feasible points. Discrete Applied Mathematics, 89(1-3):3–44, dec 1998. doi: 10.1016/s0166-218x(98)00136-x. [54]
E. Balas and R. G. Jeroslow. Strengthening cuts for mixed integer programs. European Journal of Operational Research, 4(4):224–234, apr 1980. doi: 10.1016/0377-2217(80)90106-x. [23, 26, 28, 29, 76, 85, and 87]
E. Balas and F. Margot. Generalized intersection cuts and a new cut generating paradigm. Mathematical Programming, 137(1-2):19–35, aug 2011. doi: 10.1007/s10107-011-0483-x. [76]
E. Balas, S. Ceria, and G. Cornuéjols. A lift-and-project cutting plane algorithm for mixed 0–1 programs. Mathematical Programming, 58(1-3):295–324, jan 1993. doi: 10.1007/bf01581273. [77]
A. Basu, M. Conforti, G. Cornuéjols, and G. Zambelli. Maximal lattice-free convex sets in linear subspaces. Mathematics of Operations Research, 35(3):704–720, Aug. 2010a. doi: 10.1287/moor.1100.0461. URL https://doi.org/10.1287/moor.1100.0461. [92]
A. Basu, M. Conforti, G. Cornuéjols, and G. Zambelli. Minimal inequalities for an infinite relaxation of integer programs. SIAM Journal on Discrete Mathematics, 24(1):158–168, 2010b. doi: 10.1137/090756375. [28]
A. Basu, G. Cornuéjols, and G. Zambelli. Convex sets and minimal sublinear functions. Journal of Convex Analysis, 18(2):427–432, 2011. [76]
A. Basu, M. Campelo, M. Conforti, G. Cornuéjols, and G. Zambelli. Unique lifting of integer variables in minimal inequalities. Mathematical Programming, 141(1-2):561–576, jun 2012. doi: 10.1007/s10107-012-0560-9. [28]
A. Basu, S. S. Dey, and J. Paat. Nonunique lifting of integer variables in minimal inequalities. SIAM Journal on Discrete Mathematics, 33(2):755–783, jan 2019. doi: 10.1137/17m1117070. [94]
P. Belotti. Disjunctive cuts for nonconvex MINLP. In Mixed Integer Nonlinear Programming, pages 117–144. Springer New York, nov 2011. doi: 10.1007/978-1-4614-1927-3_5. [77]
P. Belotti, J. Lee, L. Liberti, F. Margot, and A. Wächter. Branching and bounds tightening techniques for non-convex MINLP. Optimization Methods & Software, 24(4-5):597–634, 2009. [38, 41, and 74]
D. Bienstock, C. Chen, and G. Muñoz. Outer-product-free sets for polynomial optimization and oracle-based cuts. arXiv preprint arXiv:1610.04604, 2016. [vii, 77, 90, 99, 128, 132, and 136]
D. Bienstock, C. Chen, and G. Muñoz. Intersection cuts for polynomial optimization. In A. Lodi and V. Nagarajan, editors, Integer Programming and Combinatorial Optimization, pages 72–87, Cham, 2019. Springer International Publishing. ISBN 978-3-030-17953-3. doi: 10.1007/978-3-030-17953-3_6. [77, 89, 90, 91, 132, 133, and 136]
M. Bodur, S. Dash, and O. Günlük. Cutting planes from extended LP formulations. Mathematical Programming, 161(1-2):159–192, 2017. [91]
P. Bonami, J. Linderoth, and A. Lodi. Disjunctive cuts for mixed integer nonlinear programming problems. Progress in Combinatorial Optimization, pages 521–541, 2011. [77]
F. Boukouvala, R. Misener, and C. A. Floudas. Global optimization advances in mixed-integer nonlinear programming, MINLP, and constrained derivative-free optimization, CDFO. European Journal of Operational Research, 252(3):701–727, aug 2016. doi: 10.1016/j.ejor.2015.12.018. [1]
S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004. [6]
A. Brondsted and R. T. Rockafellar. On the subdifferentiability of convex functions. Proceedings of the American Mathematical Society, 16(4):605, aug 1965. doi: 10.2307/2033889. [75]
C. Buchheim and C. D'Ambrosio. Monomial-wise optimal separable underestimators for mixed-integer polynomial optimization. Journal of Global Optimization, 67(4):759–786, may 2016. doi: 10.1007/s10898-016-0443-3. [77]
C. Buchheim and E. Traversi. Separable non-convex underestimators for binary quadratic programming. In Experimental Algorithms, pages 236–247. Springer Berlin Heidelberg, 2013. doi: 10.1007/978-3-642-38527-8_22. [77]
A. Chmiela. Intersection cuts for non-convex MINLP. Master's thesis, Technische Universität Berlin, 2020. [89, 134, and 135]
F. H. Clarke. Optimization and Nonsmooth Analysis. Society for Industrial and Applied Mathematics, Jan. 1990. doi: 10.1137/1.9781611971309. [36, 47]
F. H. Clarke, Y. S. Ledyaev, R. J. Stern, and P. R. Wolenski. Nonsmooth Analysis and Control Theory. Springer New York, 1998. doi: 10.1007/b97650. [47, 48, 49, and 51]
M. Conforti and L. A. Wolsey. "Facet" separation with one linear program. Mathematical Programming, may 2018. doi: 10.1007/s10107-018-1299-8. [57]
M. Conforti, G. Cornuéjols, and G. Zambelli. A geometric perspective on lifting. Operations Research, 59(3):569–577, jun 2011a. doi: 10.1287/opre.1110.0916. [28, 76, and 94]
M. Conforti, G. Cornuéjols, and G. Zambelli. Corner polyhedron and intersection cuts. Surveys in Operations Research and Management Science, 16(2):105–120, jul 2011b. doi: 10.1016/j.sorms.2011.03.001. [17]
M. Conforti, G. Cornuéjols, and G. Zambelli. Integer Programming. Springer International Publishing, 2014. ISBN 978-3-319-11008-0. doi: 10.1007/978-3-319-11008-0. [28, 54, 92, and 97]
M. Conforti, G. Cornuéjols, A. Daniilidis, C. Lemaréchal, and J. Malick. Cut-generating functions and S-free sets. Mathematics of Operations Research, 40(2):276–391, may 2015. doi: 10.1287/moor.2014.0670. [17, 75, 76, and 90]
W. de Oliveira. Regularized optimization methods for convex MINLP problems. TOP, 24(3):665–692, mar 2016. doi: 10.1007/s11750-016-0413-4. [39]
F. Deutsch, H. Hundal, and L. Zikatanov. Visible points in convex sets and best
approximation. In Computational and Analytical Mathematics, pages 349–364.
Springer New York, 2013. doi: 10.1007/978-1-4614-7621-4_15. [54, 61, and 68]
S. S. Dey and L. A. Wolsey. Two row mixed-integer cuts via lifting. Mathematical
Programming, 124(1-2):143–174, may 2010. doi: 10.1007/s10107-010-0362-x. [10,
28, 76, and 91]
M. A. Duran and I. E. Grossmann. An outer-approximation algorithm for a class of
mixed-integer nonlinear programs. Mathematical Programming, 36(3):307–339, oct
1986. doi: 10.1007/bf02592064. [37]
J. Dutta and C. S. Lalitha. Optimality conditions in convex optimization revisited.
Optimization Letters, 7(2):221–229, Oct. 2011. doi: 10.1007/s11590-011-0410-3.
[36, 47, 50, and 51]
V.-P. Eronen, M. M. Mäkelä, and T. Westerlund. On the generalization of ECP and
OA methods to nonsmooth convex MINLP problems. Optimization, 63(7):1057–1073,
aug 2012. doi: 10.1080/02331934.2012.712118. URL
https://doi.org/10.1080%2F02331934.2012.712118. [37, 38]
V.-P. Eronen, M. M. Mäkelä, and T. Westerlund. Extended cutting plane method
for a class of nonsmooth nonconvex MINLP problems. Optimization, pages 1–21,
jun 2013. doi: 10.1080/02331934.2013.796473. [38]
V.-P. Eronen, J. Kronqvist, T. Westerlund, M. M. Mäkelä, and N. Karmitsa. Method
for solving generalized convex nonsmooth mixed-integer nonlinear programming
problems. Journal of Global Optimization, 69(2):443–459, May 2017.
doi: 10.1007/s10898-017-0528-7. [36, 50]
G. Fasano and R. Pesenti. Conjugate direction methods and polarity for quadratic
hypersurfaces. Journal of Optimization Theory and Applications, 175(3):764–794,
Oct. 2017. doi: 10.1007/s10957-017-1180-6. [68]
M. Fischetti and M. Monaci. A branch-and-cut algorithm for mixed-integer bilinear
programming. European Journal of Operational Research, sep 2019.
doi: 10.1016/j.ejor.2019.09.043. [77]
M. Fischetti, I. Ljubić, M. Monaci, and M. Sinnl. Intersection cuts for bilevel
optimization. In Integer Programming and Combinatorial Optimization, pages 77–88.
Springer International Publishing, 2016. doi: 10.1007/978-3-319-33461-5_7. [77]
M. Fischetti, I. Ljubić, M. Monaci, and M. Sinnl. A new general-purpose algorithm
for mixed-integer bilevel linear programs. Operations Research, 65(6):1615–1637,
dec 2017. doi: 10.1287/opre.2017.1650. [77]
R. Fletcher and S. Leyffer. Solving mixed integer nonlinear programs by outer
approximation. Mathematical Programming, 66(1):327–349, 1994. ISSN 1436-4646.
doi: 10.1007/BF01581153. [37]
G. Gamrath, D. Anderson, K. Bestuzheva, W.-K. Chen, L. Eifler, M. Gasse, P. Gemander,
A. Gleixner, L. Gottwald, K. Halbig, G. Hendel, C. Hojny, T. Koch, P. Le Bodic,
S. J. Maher, F. Matter, M. Miltenberger, E. Mühmer, B. Müller, M. Pfetsch,
F. Schlösser, F. Serrano, Y. Shinano, C. Tawfik, S. Vigerske, F. Wegscheider,
D. Weninger, and J. Witzig. The SCIP Optimization Suite 7.0. ZIB-Report
20-10, Zuse Institute Berlin, March 2020. URL
http://nbn-resolving.de/urn:nbn:de:0297-zib-78023. [134]
A. M. Geoffrion. Generalized Benders decomposition. Journal of Optimization Theory
and Applications, 10(4):237–260, oct 1972. doi: 10.1007/bf00934810. [37]
F. Glover. Convexity cuts and cut search. Operations Research, 21(1):123–134, feb
1973. doi: 10.1287/opre.21.1.123. [10, 75]
F. Glover. Polyhedral convexity cuts and negative edge extensions. Zeitschrift für
Operations Research, 18(5):181–186, oct 1974. doi: 10.1007/bf02026599. [14, 76]
M. Goberna, E. González, J. Martínez-Legaz, and M. Todorov. Motzkin decomposition
of closed convex sets. Journal of Mathematical Analysis and Applications,
364(1):209–221, apr 2010. doi: 10.1016/j.jmaa.2009.10.015. [94]
R. Gomory. An algorithm for the mixed integer problem. Technical report, RAND
Corporation, Santa Monica, CA, 1960. [24]
R. E. Gomory. Outline of an algorithm for integer solutions to linear programs.
Bulletin of the American Mathematical Society, 64(5):275–279, sep 1958.
doi: 10.1090/s0002-9904-1958-10224-4. [33]
M. Grötschel, L. Lovász, and A. Schrijver. Geometric Algorithms and Combinatorial
Optimization. Springer Berlin Heidelberg, 1993. doi: 10.1007/978-3-642-78240-4.
[54]
A. S. E. D. Hamed and G. P. McCormick. Calculation of bounds on variables
satisfying nonlinear inequality constraints. Journal of Global Optimization, 3(1):
25–47, 1993. doi: 10.1007/bf01100238. [55]
M. M. F. Hasan. An edge-concave underestimator for the global optimization of
twice-differentiable nonconvex problems. Journal of Global Optimization, 71(4):
735–752, mar 2018. doi: 10.1007/s10898-018-0646-x. [77]
J.-B. Hiriart-Urruty and C. Lemaréchal. Convex Analysis and Minimization Algorithms
II. Springer Berlin Heidelberg, 1993. doi: 10.1007/978-3-662-06409-2.
[39]
R. Horst and H. Tuy. Global Optimization. Springer Nature, 1990.
doi: 10.1007/978-3-662-02598-7. [5, 35, 37, 38, 45, and 46]
J. E. Kelley, Jr. The cutting-plane method for solving convex programs. Journal of
the Society for Industrial and Applied Mathematics, 8(4):703–712, dec 1960. doi:
10.1137/0108053. [32, 35, and 37]
V. Jeyakumar and D. T. Luc. Nonsmooth calculus, minimality, and monotonicity of
convexificators. Journal of Optimization Theory and Applications, 101(3):599–621,
Jun 1999. ISSN 1573-2878. doi: 10.1023/a:1021790120780. [36, 48]
A. Kabgani, M. Soleimani-damaneh, and M. Zamani. Optimality conditions in op-
timization problems with convex feasible set using convexificators. Mathematical
Methods of Operations Research, 86(1):103–121, Apr 2017. ISSN 1432-5217. doi:
10.1007/s00186-017-0584-2. [36, 47, and 50]
O. Khamisov. On optimization properties of functions, with a concave minorant.
Journal of Global Optimization, 14(1):79–101, 1999. doi: 10.1023/a:1008321729949.
[75, 77]
M. R. Kılınç and N. V. Sahinidis. Exploiting integrality in the global optimization of
mixed-integer nonlinear programming problems with BARON. Optimization Meth-
ods and Software, 33(3):540–562, jul 2017. doi: 10.1080/10556788.2017.1350178.
[74]
J. Kronqvist, A. Lundell, and T. Westerlund. The extended supporting hyperplane
algorithm for convex mixed-integer nonlinear programming. Journal of Global Op-
timization, 64(2):249–272, 2016. ISSN 1573-2916. doi: 10.1007/s10898-015-0322-3.
[33, 35, 44, 45, and 51]
J. Kronqvist, D. E. Bernal, A. Lundell, and I. E. Grossmann. A review and comparison
of solvers for convex MINLP. Optimization and Engineering, 20(2):397–455, dec
2018. doi: 10.1007/s11081-018-9411-8. [34, 52]
J. B. Lasserre. Global optimization with polynomials and the problem of moments.
SIAM Journal on Optimization, 11(3):796–817, 2001. [91]
J. B. Lasserre. On representations of the feasible set in convex optimization. Op-
timization Letters, 4(1):1–5, oct 2009. doi: 10.1007/s11590-009-0153-6. [35, 47,
and 50]
J. B. Lasserre. On convex optimization without convex representation. Optimization
Letters, 5(4):549–556, apr 2011. doi: 10.1007/s11590-011-0323-1. [36, 47, and 50]
J. B. Lasserre. Erratum to: On convex optimization without convex representation.
Optimization Letters, 8(5):1795–1796, Apr. 2014. doi: 10.1007/s11590-014-0735-9.
[36, 47]
M. Laurent. Sums of squares, moment matrices and optimization over polynomials.
In Emerging Applications of Algebraic Geometry, pages 157–270. Springer, 2009.
[91]
C. Lemaréchal. An introduction to the theory of nonsmooth optimization. Optimization,
17(6):827–858, 1986. doi: 10.1080/02331938608843204. [36]
Y. Lin and L. Schrage. The global solver in the LINDO API. Optimization Methods
and Software, 24(4-5):657–668, oct 2009. doi: 10.1080/10556780902753221. [74]
M. Lubin, D. Bienstock, and J. P. Vielma. Two-sided linear chance constraints and
extensions. arXiv preprint arXiv:1507.01995, 2015. [38]
J. E. Martínez-Legaz. Optimality conditions for pseudoconvex minimization over
convex sets defined by tangentially convex constraints. Optimization Letters, 9(5):
1017–1023, Oct. 2014. doi: 10.1007/s11590-014-0822-y. [36, 47, and 50]
G. P. McCormick. Computability of global solutions to factorable nonconvex pro-
grams: Part i — convex underestimating problems. Mathematical Programming,
10(1):147–175, dec 1976. doi: 10.1007/bf01580665. [4, 6, 69, 77, and 78]
MINLPLIB. MINLP library. http://www.minlplib.org. [135]
R. Misener and C. A. Floudas. ANTIGONE: Algorithms for coNTinuous / Integer
Global Optimization of Nonlinear Equations. Journal of Global Optimization, 59
(2-3):503–526, mar 2014. doi: 10.1007/s10898-014-0166-2. [74]
S. Modaresi, M. R. Kılınç, and J. P. Vielma. Intersection cuts for nonlinear integer
programming: convexification techniques for structured sets. Mathematical
Programming, 155(1-2):575–611, feb 2015. doi: 10.1007/s10107-015-0866-5. [76]
D. Morán and S. S. Dey. On maximal S-free convex sets. SIAM Journal on Discrete
Mathematics, 25(1):379–393, jan 2011. doi: 10.1137/100796947. [28, 95]
G. Muñoz and F. Serrano. Maximal quadratic-free sets. In Integer Programming
and Combinatorial Optimization, pages 307–321. Springer International Publishing,
2020. doi: 10.1007/978-3-030-45771-6_24. URL
https://doi.org/10.1007/978-3-030-45771-6_24. [90]
F. Plastria. Lower subdifferentiable functions and their minimization by cutting
planes. Journal of Optimization Theory and Applications, 46(1):37–53, may 1985.
doi: 10.1007/bf00938758. [38]
M. Porembski. How to extend the concept of convexity cuts to derive deeper cutting
planes. Journal of Global Optimization, 15(4):371–404, 1999.
doi: 10.1023/a:1008315229750. [76]
M. Porembski. Finitely convergent cutting planes for concave minimization. Journal
of Global Optimization, 20(2):109–132, 2001. doi: 10.1023/a:1011240309783. [76]
B. H. Pourciau. Modern multiplier rules. The American Mathematical Monthly, 87
(6):433–452, jun 1980. doi: 10.1080/00029890.1980.11995060. [23]
V. Powers and B. Reznick. Polynomials that are positive on an interval. Transactions
of the American Mathematical Society, 352(10):4677–4693, oct 2000.
doi: 10.1090/s0002-9947-00-02595-2. [69]
A. Prékopa. Stochastic Programming. Springer Netherlands, 1995.
doi: 10.1007/978-94-017-3087-7. [39]
A. Prékopa and T. Szántai. Flood control reservoir system design using stochastic
programming. In Mathematical Programming in Use, pages 138–151. Springer
Berlin Heidelberg, 1978. doi: 10.1007/bfb0120831. [39]
B. N. Pshenichnyi. Necessary Conditions for an Extremum. Marcel Dekker Inc, New
York, 1971. [36]
Y. Puranik and N. V. Sahinidis. Domain reduction techniques for global NLP
and MINLP optimization. Constraints, 22(3):338–376, jan 2017.
doi: 10.1007/s10601-016-9267-5. [55]
I. Quesada and I. E. Grossmann. An LP/NLP based branch and bound algorithm
for convex MINLP optimization problems. Computers & Chemical Engineering, 16
(10-11):937–947, 1992. [37]
R. T. Rockafellar. Convex analysis. Princeton University Press, 1970. [6, 22, 23, 34,
40, 42, 57, 62, 63, 64, 65, 97, and 138]
P. Ruys. Public goods and decentralization: the duality approach in the theory of
value. PhD thesis, Tilburg University, 1974. [57]
A. Saxena, P. Bonami, and J. Lee. Convex relaxations of non-convex mixed in-
teger quadratically constrained programs: extended formulations. Mathematical
Programming, 124(1-2):383–411, may 2010a. doi: 10.1007/s10107-010-0371-9. [77]
A. Saxena, P. Bonami, and J. Lee. Convex relaxations of non-convex mixed inte-
ger quadratically constrained programs: projected formulations. Mathematical
Programming, 130(2):359–413, mar 2010b. doi: 10.1007/s10107-010-0340-3. [77]
S. Scholtes. Introduction to Piecewise Differentiable Equations. Springer New York,
2012. doi: 10.1007/978-1-4614-4340-7. [51]
A. Schrijver. Theory of Linear and Integer Programming. Wiley, 1998. [6]
S. Sen and H. D. Sherali. Facet inequalities from simple disjunctions in cutting plane
theory. Mathematical Programming, 34(1):72–83, jan 1986. doi: 10.1007/bf01582164.
[76]
S. Sen and H. D. Sherali. Nondifferentiable reverse convex programs and facetial
convexity cuts via a disjunctive characterization. Mathematical Programming, 37
(2):169–183, jun 1987. doi: 10.1007/bf02591693. [76]
F. Serrano. Intersection cuts for factorable MINLP. In Integer Programming and Com-
binatorial Optimization, pages 385–398. Springer International Publishing, 2019.
doi: 10.1007/978-3-030-17953-3_29. [73]
N. Z. Shor. Quadratic optimization problems. Soviet Journal of Computer and
Systems Sciences, 25:1–11, 1987. [91]
V. N. Solovev. On a criterion for convexity of a positive-homogeneous
function. Mathematics of the USSR-Sbornik, 46(2):285–290, Feb. 1983.
doi: 10.1070/sm1983v046n02abeh002787. URL
https://doi.org/10.1070/sm1983v046n02abeh002787. [108]
T. Szántai. A computer code for solution of probabilistic-constrained stochastic programming
problems. In Y. Ermoliev and R.-B. Wets, editors, Numerical Techniques for
Stochastic Optimization, pages 229–235. Springer Verlag, 1988. [39]
M. Tawarmalani and N. V. Sahinidis. Convex extensions and envelopes of lower
semi-continuous functions. Mathematical Programming, 93(2):247–263, dec 2002.
doi: 10.1007/s10107-002-0308-z. [75]
M. Tawarmalani and N. V. Sahinidis. A polyhedral branch-and-cut approach to
global optimization. Mathematical Programming, 103(2):225–249, may 2005. doi:
10.1007/s10107-005-0581-8. [74]
J. Tind and L. A. Wolsey. On the use of penumbras in blocking and antiblocking the-
ory. Mathematical Programming, 22(1):71–81, dec 1982. doi: 10.1007/bf01581026.
[57]
E. Towle and J. Luedtke. Intersection disjunctions for reverse convex sets. arXiv
preprint arXiv:1901.02112, 2019. [77]
A. Tsoukalas and A. Mitsos. Multivariate McCormick relaxations. Journal of Global
Optimization, 59(2-3):633–662, apr 2014. doi: 10.1007/s10898-014-0176-0. [78, 80]
H. Tuy. Concave programming with linear constraints. Doklady Akademii Nauk, 159
(1):32–35, 1964. [10, 75, 76, and 83]
H. Tuy. Convex Analysis and Global Optimization. Springer International Publishing,
2016. doi: 10.1007/978-3-319-31484-6. [42]
W. van Ackooij and W. de Oliveira. Convexity and optimization with copulæ struc-
tured probabilistic constraints. Optimization, 65(7):1349–1376, May 2016. doi:
10.1080/02331934.2016.1179302. [39]
W. van Ackooij, R. Henrion, A. Möller, and R. Zorgati. Joint chance constrained
programming for hydro reservoir management. Optimization and Engineering, oct
2013. doi: 10.1007/s11081-013-9236-4. [39]
W. van Ackooij, E. C. Finardi, and G. M. Ramalho. An exact solution method
for the hydrothermal unit commitment under wind power uncertainty with joint
probability constraints. IEEE Transactions on Power Systems, 33(6):6487–6500,
nov 2018. doi: 10.1109/tpwrs.2018.2848594. [39]
A. F. Veinott. The supporting hyperplane method for unimodal programming. Op-
erations Research, 15(1):147–152, feb 1967. doi: 10.1287/opre.15.1.147. [33, 35, 37,
45, 50, and 51]
S. Venkatachalam and L. Ntaimo. Integer Set Reduction for Stochastic Mixed-Integer
Programming. arXiv e-prints, art. arXiv:1605.05194, Apr 2016. [56, 67]
S. Vigerske. Decomposition in multistage stochastic programming and a constraint
integer programming approach to mixed-integer nonlinear programming. PhD thesis,
Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät
II, 2013. [55]
S. Vigerske and A. Gleixner. SCIP: global optimization of mixed-integer nonlinear
programs in a branch-and-cut framework. Optimization Methods and Software, 33
(3):563–593, jun 2017. doi: 10.1080/10556788.2017.1335312. [74]
S. Vigerske and A. Gleixner. SCIP: global optimization of mixed-integer nonlinear
programs in a branch-and-cut framework. Optimization Methods and Software, 33
(3):563–593, 2018. doi: 10.1080/10556788.2017.1335312. URL
https://doi.org/10.1080/10556788.2017.1335312. [134]
Z. Wei and M. M. Ali. Outer approximation algorithm for one class of convex mixed-
integer nonlinear programming problems with partial differentiability. Journal of
Optimization Theory and Applications, 167(2):644–652, mar 2015a.
doi: 10.1007/s10957-015-0715-y. [37]
Z. Wei and M. M. Ali. Convex mixed integer nonlinear programming problems and
an outer approximation algorithm. Journal of Global Optimization, 63(2):213–227,
feb 2015b. doi: 10.1007/s10898-015-0284-5. [37]
Z. Wei and M. M. Ali. Generalized Benders decomposition for one class of MINLPs
with vector conic constraint. SIAM Journal on Optimization, 25(3):1809–1825, jan
2015c. doi: 10.1137/140967519. [38]
T. Westerlund and F. Pettersson. An extended cutting plane method for solving
convex MINLP problems. Computers & Chemical Engineering, 19:131–136, jun
1995. doi: 10.1016/0098-1354(95)87027-x. [38]
T. Westerlund, H. Skrifvars, I. Harjunkoski, and R. Pörn. An extended cutting
plane method for a class of non-convex MINLP problems. Computers & Chemical
Engineering, 22(3):357–365, feb 1998. doi: 10.1016/s0098-1354(97)00000-8. [38]
T. Westerlund, V.-P. Eronen, and M. M. Mäkelä. On solving generalized convex
MINLP problems using supporting hyperplane techniques. Journal of Global
Optimization, 71(4):987–1011, mar 2018. doi: 10.1007/s10898-018-0644-z. [38]
S. Wiese. On the interplay of Mixed Integer Linear, Mixed Integer Nonlinear and
Constraint Programming. PhD thesis, Alma Mater Studiorum - Università di
Bologna, 2016. [23]
A. Zaffaroni. Convex coradiant sets with a continuous concave cogauge. Journal of
Convex Analysis, 15(2):325–343, 2008. [54]
G. M. Ziegler. Lectures on Polytopes. Springer New York, 1995.
doi: 10.1007/978-1-4613-8431-1. URL https://doi.org/10.1007/978-1-4613-8431-1. [18]