
On Cutting Planes for Mixed-Integer Nonlinear Programming

submitted by

M. Sc. Felipe Serrano Musalem

ORCID: 0000-0002-7892-3951

to Faculty II – Mathematics and Natural Sciences

of the Technische Universität Berlin

in fulfillment of the requirements for the academic degree of

Doctor of Natural Sciences

– Dr. rer. nat. –

approved dissertation

Doctoral committee:

Chair: Prof. Dr. Wolfgang König

Reviewers: Prof. Dr. Thorsten Koch

Prof. Dr. Juan Pablo Vielma

Date of the scientific defense: 21 August 2020

Berlin 2021

Abstract

Mixed-integer nonlinear programming is a powerful technology that allows us to model and solve problems involving nonlinear functions together with continuous and discrete variables. State-of-the-art solvers for mixed-integer nonlinear programs (MINLPs) combine, among other techniques, branch-and-bound and cutting planes. In the late '90s, solvers for mixed-integer linear programs saw an increase in performance due to the incorporation of general-purpose cutting planes.

In this thesis, we deepen our understanding of a classical cutting plane algorithm, develop a strengthening technique, and introduce two new cutting planes for MINLPs.

We first show that Veinott's supporting hyperplane algorithm is a particular case of Kelley's cutting plane algorithm. We further extend the applicability of Veinott's supporting hyperplane algorithm to solve convex problems represented by non-convex functions.

We then develop a technique to strengthen cutting planes for non-convex MINLPs. Many cuts for non-convex MINLPs strongly rely on the domain of the variables: tighter bounds produce tighter cuts. Using the point to be separated, we show that we can restrict the feasible region and still ensure the validity of the resulting cutting plane.

Finally, we develop two intersection cuts for non-convex MINLPs. The first one is a technique to construct $S$-free sets for any factorable MINLP. For the second one, we show how to build maximal quadratic-free sets, from which we compute intersection cuts. These last cuts reduce the average running time of the solver SCIP by 20% on hard MINLPs.

Zusammenfassung (Summary)

Mixed-integer nonlinear programming is a powerful technique with which we can model and solve problems that contain nonlinear functions as well as continuous and discrete variables. State-of-the-art solvers for mixed-integer nonlinear programs (MINLPs) use, among other things, a combination of the branch-and-bound method and cutting plane generation. In the late 1990s, solvers for mixed-integer linear programs experienced a performance boost through the inclusion of general-purpose cutting planes.

In this thesis, we deepen our understanding of a classical cutting plane algorithm, develop a strengthening technique, and develop two new cutting planes for MINLPs.

First, we show that Veinott's supporting hyperplane algorithm is a special case of Kelley's cutting plane algorithm. In addition, we extend the applicability of Veinott's supporting hyperplane algorithm to the solution of convex problems that are represented by non-convex functions.

We then develop a technique for strengthening cutting planes for non-convex MINLPs. Many cuts for non-convex MINLPs depend strongly on the domain of the variables: tighter bounds produce stronger cuts. Using the point to be separated, we show that we can restrict the feasible region and still retain the validity of the resulting cuts.

Finally, we develop two intersection cuts for non-convex MINLPs. The first cut is a technique for constructing $S$-free sets for arbitrary factorable MINLPs. For the second cut, we show how to build maximal quadratic-free sets, from which we compute intersection cuts. These cuts reduce the average running time of the solver SCIP by 20% on hard problems.

Contents

Abstract

Zusammenfassung (Summary)

1 Introduction
  1.1 Mathematical Preliminaries
  1.2 Intersection Cuts
  1.3 Duality
  1.4 Monoidal Strengthening
    1.4.1 One Row Relaxations: Gomory Cuts
    1.4.2 Disjunctive Cuts
    1.4.3 Monoidal Strengthening

2 On the Relation Between the Extended Supporting Hyperplane Algorithm and Kelley's Cutting Plane Algorithm
  2.1 Background
    2.1.1 Literature Review
  2.2 Characterization of Functions with Supporting Linearizations
  2.3 The Gauge Function
    2.3.1 Using the Gauge Function for Separation
    2.3.2 Evaluating the Gauge Function
    2.3.3 Handling Sets with Empty Interior
    2.3.4 Using a Nonzero Interior Point
  2.4 Convergence Proofs
  2.5 Convex Programs Represented by Non-Convex Non-Smooth Functions
    2.5.1 The ESH Algorithm in the Context of Generalized Differentiability
    2.5.2 Limits to the Applicability of the ESH Algorithm
  2.6 Concluding Remarks

3 Visible Points, the Separation Problem, and Applications to Mixed-Integer Nonlinear Programming
  3.1 Introduction
  3.2 Visible Points and the Reverse Polar
  3.3 The Smallest Generators
    3.3.1 Motivation
    3.3.2 Preliminaries
    3.3.3 Results
  3.4 Applications to MINLP
    3.4.1 Characterizing the Visible Points
  3.5 Conclusions and Outlook

4 Intersection Cuts for Factorable Mixed-Integer Nonlinear Programming
  4.1 Motivation
  4.2 Literature Review and Related Work
  4.3 Concave Underestimators
    4.3.1 Concave Underestimators and Intersection Cuts for Convex Constraints
  4.4 Enlarging the S-free Sets by Using Bound Information
  4.5 "Monoidal" Strengthening
  4.6 Conclusions

5 Maximal Quadratic-Free Sets
  5.1 Background
    5.1.1 Related Work
    5.1.2 Contribution
    5.1.3 Notation
  5.2 Preliminaries
    5.2.1 Techniques for Proving Maximality
  5.3 Maximal Quadratic-Free Sets for Homogeneous Quadratics
    5.3.1 Removing Strict Convexity Matters
    5.3.2 Maximal S^h-free Sets
  5.4 Homogeneous Quadratics With a Single Homogeneous Linear Constraint
    5.4.1 Case 1: ∥a∥ ≤ ∥d∥ ∧ m > 1
    5.4.2 Case 2: ∥a∥ ≥ ∥d∥
  5.5 Non-Homogeneous Quadratics
    5.5.1 Case 1: ∥a∥ ≤ ∥d∥ ∧ m > 1
    5.5.2 Case 2: ∥a∥ > ∥d∥
  5.6 On the Diagonalization and Homogenization of Quadratics
  5.7 Further Remarks and Generalizations
    5.7.1 Generalizing Theorem 5.16
    5.7.2 Generalizing Proposition 5.21
    5.7.3 Extensions to the Work of Bienstock et al. (2016)
    5.7.4 There Are More Quadratic-Free Sets
  5.8 Computational Experiments
  5.9 Summary and Future Work
  5.10 Missing Proofs

6 Conclusion

Bibliography

Chapter 1

Introduction

This thesis develops techniques for solving mixed-integer nonlinear problems, in particular, techniques related to cutting planes. A mixed-integer nonlinear problem (MINLP) belongs to the class of Mathematical Programming (MP). In its simplest form, MP is concerned with finding the largest or smallest value that a function can attain in some domain. Examples are finding the region of smallest surface that has a prescribed volume, or finding the path that a ball has to take so that it goes from point A to point B in the least amount of time under the influence of gravity. Already at this point one can suspect that MP has many applications: just imagine packing a given volume of liquid using the least amount of material. More modern examples of MP problems include finding the shortest path between two points in a city, or deciding where to open stores from a given set of possible locations such that customers' average shortest travel time is minimized. One can find an impressive number of applications in the survey of Boukouvala, Misener, and Floudas (2016).

The example problems mentioned above have two distinct features. The first examples are continuous, that is, the solution can be any real number. In contrast, the last examples are discrete. Discrete structures appear, for example, when we can only choose from a finite set of possibilities.

One of the features of these types of problems is that they can be translated, with more or less work, to a mathematical model. That is, the set of feasible solutions can be described by equations and inequalities, called constraints, while the criterion we want to optimize can be described as a function, called the objective function. As a toy example, suppose we are interested in finding two non-negative integers such that the cube of one number is two units away from the square of the other and their sum is smallest. If $x$ and $y$ are the two integers and $v$ is the value of their sum, the problem above can be written as

\[ \min\{v : v = x + y,\ x^3 - y^2 = 2,\ x, y \in \mathbb{Z}_+,\ v \in \mathbb{R}\}. \tag{1.1} \]

In (1.1) we encounter the constraints $v = x + y$, $x^3 - y^2 = 2$, $x, y \in \mathbb{Z}_+$ and $v \in \mathbb{R}$, and the objective function is just $v$, which is the quantity we want to minimize. The constraint $v = x + y$ is linear, while $x^3 - y^2 = 2$ is nonlinear. The variables $x, y$ are restricted to be integers while $v$ is continuous.

Such a model is an example of an MINLP problem. The “mixed-integer”

comes from the fact that variables can be either discrete or continuous. The

“nonlinear” makes reference to the possibility of having constraints represented

by nonlinear functions.

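More generally, a generic MINLP can be written as

\[
\begin{aligned}
\min\ & f(x) \\
\text{s.t. } & g_k(x) \le 0 \quad \forall k \in [m], \\
& x_i \in \mathbb{Z} \quad \forall i \in I,
\end{aligned}
\]

where $m, n \in \mathbb{Z}_+$, $f, g_k : A \subseteq \mathbb{R}^n \to \mathbb{R}$, $[m] = \{1, \ldots, m\}$, $x \in \mathbb{R}^n$, and $I \subseteq [n]$. We note that assuming that the constraints are $g_k(x) \le 0$ is without loss of generality, since $g_k(x) = 0$ is equivalent to $g_k(x) \le 0$ and $-g_k(x) \le 0$.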

In practice, MINLP problems are difficult to solve. The best algorithm we currently have for trying to solve a general MINLP is the so-called LP-based spatial branch and bound. LP stands for linear programming, which is a subclass of MINLP concerned with optimization problems where all variables are continuous and all constraints are linear. In contrast to MINLPs, LPs are easy to solve in practice.

The basic idea of LP-based spatial branch and bound is to construct an LP relaxation of the MINLP, that is, an LP such that every feasible point of the MINLP is feasible for the LP. Solving this LP yields a bound on the optimal value of the MINLP. The solution of the LP, $\bar{x}$, is likely to be infeasible for the MINLP. Thus, the LP relaxation can, in principle, be refined by the introduction of cutting planes separating $\bar{x}$. These are linear inequalities that every feasible point of the MINLP satisfies and $\bar{x}$ does not satisfy. By refining the LP relaxation, we obtain a better bound on the optimal value of the MINLP.

For example, it is not hard to see that $(x, y, v) = (3, 5, 8)$ is an optimal solution of (1.1) (just check that $(3, 5)$ is the only feasible point in $\{0, \ldots, 8\} \times \{0, \ldots, 8\}$). An LP relaxation of (1.1) is $\min\{v : v = x + y,\ x, y \ge 0\}$, for which an optimal solution is $(\bar{x}, \bar{y}, \bar{v}) = (0, 0, 0)$. The optimal value of the LP is 0, which is a (lower) bound on the optimal value of the MINLP, which is 8.

Now, since $x^3 = y^2 + 2$ and $y^2 \ge 0$, we can deduce that $x^3 \ge 2$. This implies that $x > 1$ and, since $x$ must be integral, we conclude that $x \ge 2$. Note that the LP solution does not satisfy $x \ge 2$. Thus, $\min\{v : v = x + y,\ x \ge 2,\ y \ge 0\}$ is a tighter LP relaxation. An optimal solution of this LP is $(\bar{x}, \bar{y}, \bar{v}) = (2, 0, 2)$ and yields a better lower bound. Cuts that involve a single variable are usually called bound tightenings.
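The claims made about (1.1) so far can be checked by plain enumeration. The following Python sketch is only an illustration: the function name `feasible_points` and the search box of side 8 are choices made here, not part of the thesis. Since every point with $x + y \le 8$ lies in that box, enumerating it is enough to certify that no feasible point has a smaller sum.

```python
def feasible_points(limit):
    """Return all (x, y) in {0..limit}^2 with x^3 - y^2 = 2."""
    return [(x, y)
            for x in range(limit + 1)
            for y in range(limit + 1)
            if x**3 - y**2 == 2]

# Every point with x + y <= 8 lies in the box {0..8}^2, so this enumeration
# certifies that no feasible point of (1.1) has a sum smaller than 8.
points = feasible_points(8)
best = min(points, key=lambda p: p[0] + p[1])
print(points, best, sum(best))

# Bound tightening: x^3 = y^2 + 2 >= 2 rules out x in {0, 1}.
print(all(x >= 2 for x, _ in points))
```

An enumeration like this only works for toy instances; the point of the LP relaxation and the cuts discussed in the text is to avoid it.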

Notice that the LP solution, $(\bar{x}, \bar{y}) = (2, 0)$, violates the constraint $x^3 = y^2 + 2$. In particular, if we interpret the equality as two inequalities, then the violated inequality is $x^3 - y^2 \le 2$. Since $x \ge 2$ and $y \ge 0$, the above inequality is equivalent to $\sqrt{x^3 - 2} - y \le 0$. The function $f(x) = \sqrt{x^3 - 2}$ is convex on $[2, \infty)$ and differentiable at $x = 2$ and so $f(2) + f'(2)(x - 2) \le f(x)$, that is, $\sqrt{6}x - \sqrt{6} \le \sqrt{x^3 - 2}$ for $x \ge 2$. Therefore, every feasible point must satisfy $\sqrt{6}x - \sqrt{6} - y \le 0$. We see that $(\bar{x}, \bar{y}) = (2, 0)$ does not satisfy this inequality.

Such an inequality is then a cutting plane and its addition to the current LP relaxation makes it tighter. Indeed, by adding it and solving the corresponding LP we obtain the optimal point $(\bar{x}, \bar{y}, \bar{v}) = (2, \sqrt{6}, 2 + \sqrt{6})$ with value $2 + \sqrt{6}$, which is better than the one of the previous iteration.
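The coefficients of this cut come from the tangent of $f(x) = \sqrt{x^3 - 2}$ at $x = 2$. As a sanity check, the short Python sketch below recomputes them numerically; the helper names `f`, `f_prime` and `cut_violation` are introduced here for illustration only.

```python
import math

def f(x):
    """f(x) = sqrt(x^3 - 2), defined for x^3 >= 2."""
    return math.sqrt(x**3 - 2)

def f_prime(x):
    """Derivative of f: 3x^2 / (2 sqrt(x^3 - 2))."""
    return 3 * x**2 / (2 * math.sqrt(x**3 - 2))

x0 = 2.0
slope = f_prime(x0)               # equals sqrt(6)
intercept = f(x0) - slope * x0    # equals -sqrt(6)

def cut_violation(x, y):
    """Left-hand side of the cut slope * x + intercept - y <= 0."""
    return slope * x + intercept - y

print(slope, intercept)
print(cut_violation(2, 0))   # positive: the LP point (2, 0) is cut off
print(cut_violation(3, 5))   # non-positive: the optimal point (3, 5) survives
```

The last two lines show the two properties that make the inequality a cutting plane: it is violated by the current LP solution and satisfied by the feasible point.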

However, at some point it might not be possible to compute a cutting plane and so the algorithm starts branching. In its most basic form, branching means to split the feasible region into two regions, in such a way that the union of both regions is the original feasible region. For example, in the last LP relaxation we obtained $\bar{y} = \sqrt{6}$. Branching on $y$ at the value $\sqrt{6}$ produces two problems which are the same as the original one, except that in one the constraint $y \le \sqrt{6}$ is added and in the other one, $y \ge \sqrt{6}$. Since $y$ is restricted to be an integer we can further tighten these inequalities to $y \le 2$ and $y \ge 3$. Thus, after branching on $y$ we obtain the following problems

\[ \min\{v : v = x + y,\ x^3 - y^2 = 2,\ y \le 2,\ x, y \in \mathbb{Z}_+,\ v \in \mathbb{R}\} \quad \text{and} \]

\[ \min\{v : v = x + y,\ x^3 - y^2 = 2,\ y \ge 3,\ x, y \in \mathbb{Z}_+,\ v \in \mathbb{R}\}. \]
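The rounding of the branching bounds for an integer variable can be sketched in a few lines of Python. The function name `integer_branching_bounds` is hypothetical, and the second half is a brute-force check, added here for illustration, that the child problem with $y \le 2$ has no feasible point.

```python
import math

def integer_branching_bounds(lp_value):
    """Bounds of the two children when branching on an integer variable whose
    LP value is fractional: var <= floor(lp_value) or var >= ceil(lp_value)."""
    return math.floor(lp_value), math.ceil(lp_value)

down, up = integer_branching_bounds(math.sqrt(6))
print(down, up)

# In the child with y <= down, y^2 + 2 would have to be a perfect cube.
cubes = {x**3 for x in range(0, 10)}
infeasible_child = [y for y in range(0, down + 1) if y * y + 2 in cubes]
print(infeasible_child)   # empty list: no feasible point with y <= 2
```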

The adjective spatial in spatial branch and bound means that the branching can also be done on continuous variables, for example, $v$. The adjective is added to distinguish the algorithm from the standard branch-and-bound algorithm for solving mixed-integer linear problems (MILPs). Via branching, the algorithm implicitly constructs a tree of problems.

By continuing the branching process the problem will eventually be solved. However, as can be seen from the example, cutting planes are an important tool for tightening the LP relaxation of the MINLP, whose purpose is to accelerate the solution process.

Let us look at another example to illustrate another important tool for solving MINLPs. Assume we are interested in buying some number of shirts and pants in such a way that the number of different outfits we can create is maximal. We enter a rather expensive shop where each shirt costs 30 euros and each pair of pants costs 70 euros, and we have 250 euros in our wallet. If $s$ is the number of shirts and $p$ the number of pants that we buy, then the number of outfits is $T = s \cdot p$. Then, the problem we try to solve is

\[ \max\{T : T \le s \cdot p,\ 3s + 7p \le 25,\ s, p \in \mathbb{Z}_+\}. \]

Let us first notice that we do not have enough money to buy 9 shirts nor 4 pants, so $s \le 8$ and $p \le 3$. One way of obtaining a linear relaxation for this problem is to find a linear relaxation of the constraint $T \le s \cdot p$. To obtain one, notice that for every feasible $s$ and $p$ we have that $s(3 - p) \ge 0$ and $(8 - s)p \ge 0$. Thus, $T \le s \cdot p \le \min\{3s, 8p\}$. These are the famous McCormick inequalities (McCormick, 1976). Our first linear relaxation then looks like

\[ \max\{T : T \le 3s,\ T \le 8p,\ 3s + 7p \le 25,\ s, p \in \mathbb{R}_+\}. \]

We could have added the bounds $s \le 8$, $p \le 3$, but let us keep it simple. The optimal solution of the linear relaxation is $(T, s, p) \approx (13.33, 4.44, 1.67)$. As this is an upper bound on the optimal value, we know that it is not possible to get 14 different outfits. Let us branch on $s \le 4$ and $s \ge 5$. The first problem created is

\[ \max\{T : T \le s \cdot p,\ 3s + 7p \le 25,\ s \le 4,\ s, p \in \mathbb{Z}_+\}. \]

If we solve the linear relaxation

\[ \max\{T : T \le 3s,\ T \le 8p,\ 3s + 7p \le 25,\ s \le 4,\ s, p \in \mathbb{R}_+\}, \]

we obtain a value of $T = 12$. However, when branching on $s \le 4$, the upper bound of $s$ is reduced from 8 to 4. Thus, there is a chance that we can deduce a better linear relaxation of $T \le s \cdot p$. Indeed, following the same reasoning as above we see that $T \le s \cdot p \le \min\{3s, 4p\}$. Now, solving the improved linear relaxation

\[ \max\{T : T \le 3s,\ T \le 4p,\ 3s + 7p \le 25,\ s \le 4,\ s, p \in \mathbb{R}_+\}, \]

yields $T \approx 9.09$, which is a much better upper bound. This shows that if we buy 4 or fewer shirts we can only hope for 9 outfits. The algorithm will continue either by branching or cutting. For the curious reader, the maximum number of outfits is actually 6, far away from the possibility of 13 given by the first linear relaxation.
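The bounds quoted in this example can be reproduced with exact arithmetic. The sketch below is a deliberately naive illustration, not how a solver works: it enumerates the vertices of the three-variable LP in $(T, s, p)$ with Cramer's rule, which is only reasonable for tiny, bounded instances. All function names (`det3`, `solve3`, `max_T`, `mccormick_lp`) are made up for this sketch.

```python
from fractions import Fraction
from itertools import combinations

def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve3(rows, rhs):
    """Solve a 3x3 linear system by Cramer's rule; return None if singular."""
    det = det3(rows)
    if det == 0:
        return None
    solution = []
    for k in range(3):
        m = [list(r) for r in rows]
        for r in range(3):
            m[r][k] = rhs[r]          # replace column k by the right-hand side
        solution.append(Fraction(det3(m), det))
    return tuple(solution)

def max_T(constraints):
    """Maximize T over {(T, s, p) : a . (T, s, p) <= b} by enumerating vertices.

    Each constraint is a tuple (a_T, a_s, a_p, b).
    """
    best = None
    for trio in combinations(constraints, 3):
        v = solve3([c[:3] for c in trio], [c[3] for c in trio])
        if v is None:
            continue
        feasible = all(sum(a * x for a, x in zip(c[:3], v)) <= c[3]
                       for c in constraints)
        if feasible and (best is None or v[0] > best[0]):
            best = v
    return best

def mccormick_lp(s_ub, p_ub, branch_s_ub=None):
    """LP relaxation using the estimators T <= p_ub * s and T <= s_ub * p."""
    cons = [(1, -p_ub, 0, 0),     # T <= p_ub * s
            (1, 0, -s_ub, 0),     # T <= s_ub * p
            (0, 3, 7, 25),        # budget: 3s + 7p <= 25
            (0, -1, 0, 0),        # s >= 0
            (0, 0, -1, 0)]        # p >= 0
    if branch_s_ub is not None:
        cons.append((0, 1, 0, branch_s_ub))   # branching bound on s
    return max_T(cons)

root = mccormick_lp(s_ub=8, p_ub=3)
child_old = mccormick_lp(s_ub=8, p_ub=3, branch_s_ub=4)
child_new = mccormick_lp(s_ub=4, p_ub=3, branch_s_ub=4)
integer_opt = max(s * p for s in range(9) for p in range(4)
                  if 3 * s + 7 * p <= 25)
print(root[0], child_old[0], child_new[0], integer_opt)
```

The three LP values are 40/3, 12 and 100/11, matching the rounded values 13.33, 12 and 9.09 in the text, and the brute-force integer optimum is 6.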

This example illustrates that the bounds of the variables are very important for building tight linear relaxations of MINLPs. Many details about branch-and-bound algorithms have not been dealt with in the previous explanation. For more details, including proofs of convergence, the reader is referred to Horst and Tuy (1990, Chapter IV).

The importance of bound propagation and cutting planes is...

Contributions and outline

In Chapter 2, we investigate two classical algorithms for convex MINLPs, a subclass of MINLP in which all the functions appearing in nonlinear constraints are convex. These algorithms are Kelley's Cutting Plane algorithm and Veinott's Supporting Hyperplane algorithm. We show that the convergence of Veinott's algorithm follows from the convergence of Kelley's algorithm. The idea is to interpret Veinott's algorithm as Kelley's algorithm applied to a reformulation of the original problem. Such a reformulation only depends on the feasible region and not on the functions used to represent it. Thus, we are able to extend the applicability of Veinott's algorithm to some problems with convex feasible region, but where the constraint functions are not necessarily convex nor differentiable. Under a mild technical condition, Veinott's algorithm converges if the functions are differentiable. To extend this result, we relax the differentiability assumption of the functions by introducing a notion of a generalized derivative which is enough to show the convergence of Veinott's algorithm.

In Chapter 3, we study in a more general setting the separation problem, namely, given a point $\bar{x}$ and a set $S$, find a valid linear cutting plane for $S$ that separates $\bar{x}$, or show that none exists. In other words, if $\mathcal{C}(S, \bar{x})$ is the set of all the answers of the separation problem, that is, all valid cuts for $S$ that separate $\bar{x}$ from $S$, then the separation problem is to find an element of $\mathcal{C}(S, \bar{x})$ or show that $\mathcal{C}(S, \bar{x}) = \emptyset$. We show that given $S$ and $\bar{x}$, there exists $\hat{S} \subseteq S$ such that $\mathcal{C}(S, \bar{x}) = \mathcal{C}(\hat{S}, \bar{x})$. The intuition of such a result is as follows. To ensure that a cutting plane is valid for a closed set $S$, it is enough to verify that it is valid for every vertex of $S$. However, in general, we want a cutting plane that separates a given point $\bar{x}$. Thus, to ensure validity of such a cut, it is enough to verify that it is valid for every vertex of $S$ "near" $\bar{x}$. We use the concept of visible points of $S$ from $\bar{x}$, $V_S(\bar{x})$, to formalize the meaning of "near" and show that $\mathcal{C}(S, \bar{x}) = \mathcal{C}(V_S(\bar{x}), \bar{x})$. We give a simple characterization of the visible points of $S$ when $S$ is the intersection of a quadratic constraint and a convex set. If $S$ is the intersection of a polynomial constraint and a convex set, we provide an extended formulation for a relaxation of the visible points. As we will see, simple examples show that the visible points are not the smallest $\hat{S}$ such that $\mathcal{C}(S, \bar{x}) = \mathcal{C}(\hat{S}, \bar{x})$. Finally, we use the visible points to characterize the smallest $\hat{S}$ for different classes of sets.

Then, in Chapter 4, we focus on intersection cuts. Intersection cuts are an elegant technique to construct cutting planes that fits perfectly into LP-based approaches for MINLP. We show how to construct intersection cuts for general factorable MINLPs. The idea is to construct concave underestimators of a factorable function. Our approach is to mimic McCormick's procedure for building convex underestimators. Furthermore, we propose a strengthening procedure for intersection cuts using monoidal strengthening in the presence of a single integer variable.

With the aid of the concave underestimators, we build so-called $S$-free sets, closed convex sets that do not contain any point of $S$ in their interior, where $S$ is normally the feasible region or a relaxation thereof. From an $S$-free set and a simplicial conic relaxation of the feasible region one can construct an intersection cut. As it turns out, the larger the $S$-free set the stronger the cut. Thus, it is natural to seek maximal $S$-free sets, that is, $S$-free sets that are not completely contained in any other $S$-free set. Although the constructions of Chapter 4 allow us to construct $S$-free sets, they are usually not maximal. In Chapter 5 we construct maximal $S$-free sets when $S$ is given by a quadratic constraint.

In the remainder of this chapter we introduce our notation and general

definitions that are used throughout the thesis. We explain, in a rather leisurely

manner, more techniques in MINLP that are relevant for this thesis.

1.1 Mathematical Preliminaries

In this section, we introduce notation and some concepts that we use throughout the thesis. For definitions and for proofs of the claims that are stated here without proof, the reader is referred to Rockafellar (1970), Schrijver (1998) and Boyd and Vandenberghe (2004). We group the concepts by topic to make them easier to look up.

Topology

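We will be working in $\mathbb{R}^n$. We denote the inner product between $x, y \in \mathbb{R}^n$ by $x^T y$ and by $\|\cdot\|$ the euclidean norm. We denote by $B_r(x)$ and $D_r(x)$ the euclidean ball centered at $x$ of radius $r$ and its boundary, respectively. More precisely, $B_r(x) = \{y \in \mathbb{R}^n : \|y - x\| \le r\}$ and $D_r(x) = \{y \in \mathbb{R}^n : \|y - x\| = r\}$.

Let $C \subseteq \mathbb{R}^n$. We denote the boundary, complement, closure, interior, and relative interior of $C$ by $\partial C$, $(C)^c$, $\operatorname{cl} C$, $\operatorname{int} C$, and $\operatorname{ri} C$, respectively. Given $v \in \mathbb{R}^n$ and a set $C \subseteq \mathbb{R}^n$, we denote the distance between $v$ and $C$ by $\operatorname{dist}(v, C) = \inf_{x \in C} \|v - x\|$. Given two sets $A, B \subseteq \mathbb{R}^n$, the Minkowski sum of $A$ and $B$ is $\{a + b : a \in A,\ b \in B\}$ and we denote it by $A + B$. When $A$ is a singleton, say $\{a\}$, we denote the sum by $a + B$. For a set of vectors $\{v_1, \ldots, v_k\} \subseteq \mathbb{R}^n$, we denote by $\langle v_1, \ldots, v_k \rangle$ the subspace generated by them.

Given some set $C \subseteq \mathbb{R}^n \times \mathbb{R}^m$, we denote by $\operatorname{proj}_x C$ the projection of $C$ onto the $x$-space, that is, $\operatorname{proj}_x C = \{x \in \mathbb{R}^n : \exists y \in \mathbb{R}^m,\ (x, y) \in C\}$. More generally, if $H$ is a subspace of $\mathbb{R}^n$, we denote by $\operatorname{proj}_H C$ the projection of $C$ onto $H$.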

Convex sets

Given $m$ points $x_1, \ldots, x_m \in \mathbb{R}^n$ and given $\lambda_1, \ldots, \lambda_m \in [0, 1]$ such that $\sum_{i=1}^m \lambda_i = 1$, the point $\sum_{i=1}^m \lambda_i x_i$ is said to be a convex combination of the points $x_1, \ldots, x_m$. We say that $C$ is convex if for every $x, y \in C$ and $\lambda \in [0, 1]$, $\lambda x + (1 - \lambda) y \in C$, that is, if for every pair of points in $C$ every convex combination of them is in $C$. The convex hull of $C$ is the smallest convex set that contains $C$, or equivalently the intersection of all convex sets containing $C$, and is denoted by $\operatorname{conv} C$. The closure of the convex hull of $C$ is denoted by $\overline{\operatorname{conv}}\, C$. The extreme points of a not necessarily convex set $C$ are the points in $C$ that cannot be written as a convex combination of other points of $C$, and we denote them by $\operatorname{ext} C$. For example, if $C$ is a square, then the extreme points are the vertices. If $C$ is a disk, then the extreme points are all the points at the boundary. If $C$ is this figure ⊐⊂, then the two right vertices and all the points of the semi-circle at the left are extreme points. The beauty of the concept of extreme points is that those points are the only ones needed to describe the convex hull of a set.

A related concept is that of exposed points. When one optimizes a linear function over a set $C$, then an optimal solution, if one exists, is going to be at the boundary of $C$. The solution might be unique, for example, when optimizing in any direction over a circle. There might be multiple solutions, for example, when optimizing in the direction $(1, 0)$ over a square. Any $x_0 \in C$ such that there exists a linear function $\alpha^T x$ for which $x_0$ is the unique solution of $\max_{x \in C} \alpha^T x$ is called an exposed point. We denote the set of exposed points of $C$ by $\operatorname{exp} C$. Every exposed point is an extreme point. However, not every extreme point is an exposed point. To see this, consider again ⊐⊂. The two points where the semi-circle meets the straight part are extreme but not exposed.

The gauge function of a convex set $C$ is

\[ \varphi_C(x) = \inf\left\{t : t > 0,\ \frac{x}{t} \in C\right\}. \]

The gauge function is a sort of distance measured by $C$. It measures the minimum factor by which we have to scale $C$ so that $x$ is at its boundary.
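To make the definition concrete, the gauge of a set given by a membership test can be approximated by bisection on the scaling factor $t$. The sketch below assumes that $C$ is convex and contains the origin, so that membership of $x/t$ is monotone in $t$; the function name `gauge`, the upper bracket `hi` and the two example sets are illustrative choices, not taken from the thesis.

```python
import math

def gauge(x, contains, hi=1e6, tol=1e-9):
    """Approximate inf{t > 0 : x / t in C} by bisection.

    Assumes C is convex, contains the origin, and that x / hi lies in C,
    so that membership of x / t is monotone in t.
    """
    if all(xi == 0 for xi in x):
        return 0.0
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if contains([xi / mid for xi in x]):
            hi = mid      # x / mid is inside: the gauge is at most mid
        else:
            lo = mid      # x / mid is outside: the gauge is larger than mid
    return hi

# Unit euclidean ball: the gauge is the euclidean norm.
unit_ball = lambda y: math.sqrt(sum(yi * yi for yi in y)) <= 1
print(gauge((3.0, 4.0), unit_ball))

# A box that is not symmetric around the origin: the gauge is not symmetric.
box = lambda y: -1 <= y[0] <= 2 and -1 <= y[1] <= 1
print(gauge((4.0, 0.0), box), gauge((-4.0, 0.0), box))
```

The second example illustrates why the gauge is only "a sort of" distance: for a set that is not symmetric around the origin, $x$ and $-x$ can have different gauge values.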

Given a closed set $S$, a convex set $C$ is said to be $S$-free if its interior does not contain any point of $S$. In other words, $C$ is $S$-free if $S \cap \operatorname{int} C = \emptyset$. Let $C$ be an $S$-free set. We say that $C$ is maximal $S$-free if it holds that for every convex $S$-free set $K$, if $C \subseteq K$, then $C = K$.

Inequalities

Let $\alpha \in \mathbb{R}^n$ and $\beta \in \mathbb{R}$. The set $\{x \in \mathbb{R}^n : \alpha^T x = \beta\}$ is called an affine subspace (a hyperplane when $\alpha \ne 0$) and we say that $\alpha$ is its normal. The set $\{x \in \mathbb{R}^n : \alpha^T x \le \beta\}$ is a half-space. Both are convex. In general, a closed convex set can be written as the intersection of an arbitrary number of half-spaces. Usually, instead of writing the half-space as a set we just write $\alpha^T x \le \beta$. We say that $\alpha^T x \le \beta$ is valid or a valid inequality for $C$ if $C \subseteq \{x \in \mathbb{R}^n : \alpha^T x \le \beta\}$. If $\alpha^T x \le \beta$ is a valid inequality for $C$ and $\bar{x} \notin C$ is such that $\alpha^T \bar{x} > \beta$, we say that $\alpha^T x \le \beta$ separates $\bar{x}$ from $C$. If $\alpha^T x \le \beta$ is a valid inequality for $C$ and it is tight, that is, there exists a $y \in C$ such that $\alpha^T y = \beta$, we say that $\alpha^T x \le \beta$ is a supporting hyperplane of $C$, or that it supports $C$. A closed convex set can be written as the intersection of its supporting hyperplanes. If the number of hyperplanes needed to describe a convex set is finite, then the convex set is called a polyhedron.

Cones

A cone is a set $C \subseteq \mathbb{R}^n$ with the following property. If $x \in C$ and $\lambda \ge 0$, then $\lambda x \in C$. A cone is pointed if it has an extreme point, in which case this extreme point is called the apex. Given $m$ points $x_1, \ldots, x_m \in \mathbb{R}^n$ and $\lambda_1, \ldots, \lambda_m \ge 0$, the point $\sum_{i=1}^m \lambda_i x_i$ is said to be a conic combination of the points $x_1, \ldots, x_m$. In the context of cones, the extreme rays play the role of extreme points. A ray is a set of the form $\{\lambda x : \lambda \ge 0\}$ and we call it the ray generated by $x$. If $C$ is a cone and $x \in C$, the ray generated by $x$ is contained in $C$. We say that the ray generated by $x \in C$ is an extreme ray if $x$ cannot be written as a conic combination of other points of $C$. Note that this is the same as saying that neither $x$ nor any positive scaling of it can be written as a conic combination of other points of $C$. We say that a set $K \subseteq \mathbb{R}^n$ is a translated cone if there exist a cone $C$ and $x \in \mathbb{R}^n$ such that $K = x + C$. A cone in $\mathbb{R}^n$ is said to be simplicial if it has exactly $n$ extreme rays.

Every unbounded convex set contains a (translated) cone. The recession cone of a convex set $C$, denoted by $\operatorname{rec}(C)$, is the largest cone $K$ such that $C + K \subseteq C$. In other words, $\operatorname{rec}(C)$ is the largest cone that can be translated to be completely contained in $C$. It is possible that a direction $d$ and its opposite, $-d$, are both in the recession cone of $C$. The set of all such directions, that is, $\operatorname{rec}(C) \cap \operatorname{rec}(-C)$, is called the lineality space of $C$ and is denoted by $\operatorname{lin}(C)$. It is the largest subspace, $L$, such that $C + L \subseteq C$. Note that a convex cone is pointed if and only if its lineality space is $\{0\}$.

Convex functions

Let $g : X \subseteq \mathbb{R}^n \to \mathbb{R}$ be a function. The epigraph of $g$ is the set of all points above the graph, $\operatorname{epi} g = \{(x, z) \in \mathbb{R}^{n+1} : z \ge g(x)\}$. We say that $g$ is convex in $C \subseteq X$ if $C$ is convex and for every $x, y \in C$ and $\lambda \in [0, 1]$, $g(\lambda x + (1 - \lambda) y) \le \lambda g(x) + (1 - \lambda) g(y)$. Equivalently, $g$ is convex if its epigraph is convex. We say that $g$ is concave when $-g$ is convex and every concept we define for convex functions has its counterpart for concave functions.

When $g$ is differentiable and convex in $C$ we have that $g(y) + \nabla g(y)^T (x - y) \le g(x)$ for every $x, y \in C$. For a given $y$, this inequality means that the tangent hyperplane at $y$ of the graph of $g$, $g(y) + \nabla g(y)^T (x - y)$, is always below the function. Equivalently, it means that the epigraph of $x \mapsto g(y) + \nabla g(y)^T (x - y)$ is a valid inequality for the epigraph of $g$. Actually, since the inequality is tight when $x = y$, the inequality supports $\operatorname{epi} g$. In general, convex functions do not need to be differentiable; however, the epigraph is still convex and it still has supporting hyperplanes. A subgradient of a convex function is the normal of a supporting hyperplane, when the inequality is written in a similar form to the differentiable case. Specifically, a vector $v$ is a subgradient of $g$ at $y$ if $g(y) + v^T (x - y) \le g(x)$ for every $x \in C$. The set of all subgradients of $g$ at $y$ is called the subdifferential of $g$ at $y$ and is denoted by $\partial g(y)$. Thus,

\[ \partial g(y) = \{v \in \mathbb{R}^n : g(y) + v^T (x - y) \le g(x)\ \forall x \in C\}. \]

For example, $g(x) = |x|$ is convex, not differentiable at 0, and $\partial g(0) = [-1, 1]$.
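The subgradient inequality for the absolute value at 0 is easy to test numerically. The following Python sketch checks a few candidate slopes on a grid of sample points; the helper name and the sample grid are arbitrary choices made for this illustration, and a grid check is of course evidence rather than a proof.

```python
def is_subgradient_of_abs_at_zero(v, samples):
    """Check g(0) + v * (x - 0) <= g(x) for g = abs on the sample points."""
    return all(abs(0) + v * (x - 0) <= abs(x) for x in samples)

samples = [k / 10 for k in range(-50, 51)]
candidates = (-1.5, -1.0, -0.3, 0.0, 0.7, 1.0, 1.5)

accepted = [v for v in candidates
            if is_subgradient_of_abs_at_zero(v, samples)]
print(accepted)   # only the candidates inside [-1, 1] pass
```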

A function $g$ is positively homogeneous if $g(\lambda x) = \lambda g(x)$ for every $\lambda \ge 0$ and all $x$. A function $g$ is subadditive if $g(x + y) \le g(x) + g(y)$. A function is sublinear if it is positively homogeneous and subadditive. Equivalently, $g$ is sublinear if it is positively homogeneous and convex. The epigraph of a sublinear function from $\mathbb{R}^n$ to $\mathbb{R}$ is a closed convex cone. We say that a convex set $C$ is represented by a sublinear function $g$ if $C = \{x : g(x) \le 1\}$.

Given a convex function $g : C \to \mathbb{R}$, $g(x) \le 0$ is called a convex constraint. We have that for any $\bar{x} \in C$ and $v \in \partial g(\bar{x})$,

\[ g(\bar{x}) + v^T (x - \bar{x}) \le 0 \tag{1.2} \]

is a valid inequality for $g(x) \le 0$. Thus, if $\bar{x} \in C$ violates the convex constraint, that is, $g(\bar{x}) > 0$, then (1.2) separates $\bar{x}$ from $g(x) \le 0$. To see this, recall that $g(\bar{x}) + v^T (x - \bar{x}) \le g(x)$ for every $x \in C$. In particular, if $x$ satisfies the constraint, then $g(\bar{x}) + v^T (x - \bar{x}) \le g(x) \le 0$, which shows the validity of (1.2). Evaluating (1.2) at $\bar{x}$ yields $g(\bar{x}) \le 0$, which contradicts $g(\bar{x}) > 0$, from where we conclude that $\bar{x}$ does not satisfy (1.2). We call such inequalities gradient cutting planes, or gradient cuts for short, because when $g$ is differentiable $v$ can only be the gradient $\nabla g(\bar{x})$.
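A gradient cut is straightforward to compute once the function value and a (sub)gradient at $\bar{x}$ are available. The sketch below rearranges (1.2) into the form $a^T x \le b$ and applies it to a unit-disk constraint; the function `gradient_cut` and the example constraint are hypothetical and serve only to illustrate the construction.

```python
def gradient_cut(g, grad_g, x_bar):
    """Return (a, b) such that the cut (1.2) reads a . x <= b.

    Rearranging g(x_bar) + v . (x - x_bar) <= 0 with v = grad_g(x_bar)
    gives v . x <= v . x_bar - g(x_bar).
    """
    v = grad_g(x_bar)
    b = sum(vi * xi for vi, xi in zip(v, x_bar)) - g(x_bar)
    return v, b

# Hypothetical convex constraint: g(x) = x1^2 + x2^2 - 1 <= 0 (unit disk).
g = lambda x: x[0]**2 + x[1]**2 - 1
grad_g = lambda x: (2 * x[0], 2 * x[1])

x_bar = (2.0, 0.0)                 # g(x_bar) = 3 > 0, so x_bar is infeasible
a, b = gradient_cut(g, grad_g, x_bar)
print(a, b)                        # the cut is 4 * x1 <= 5

lhs = lambda x: sum(ai * xi for ai, xi in zip(a, x))
print(lhs(x_bar) > b)              # True: x_bar violates the cut
print(lhs((1.0, 0.0)) <= b)        # True: a feasible point satisfies it
```

Note that the cut does not support the disk here (it reads $x_1 \le 1.25$): a gradient cut is tight at $\bar{x}$ on the graph of $g$, not necessarily on the feasible set.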

If $g : X \subseteq \mathbb{R}^n \to \mathbb{R}$ is a function and $C \subseteq X$ is convex, then we denote by $g^{\mathrm{vex}}$ a convex underestimator of $g$ over $C$. This means that $g^{\mathrm{vex}} : C \to \mathbb{R}$ is a convex function and underestimates $g$, that is, $g^{\mathrm{vex}}(x) \le g(x)$ for all $x \in C$. Similarly, we define a concave overestimator.

Matrices

A matrix $M \in \mathbb{R}^{n \times n}$ is symmetric if $M = M^T$. We say that a symmetric matrix $M$ is positive semi-definite if $x^T M x \ge 0$ for every $x \in \mathbb{R}^n$. Given an integer $n$, we denote by $S^n_+$ the cone of positive semi-definite matrices of size $n \times n$. A matrix $M$ is copositive if $x^T M x \ge 0$ for every $x \in \mathbb{R}^n_+$. A $k \times k$ submatrix of a matrix $M$ is a matrix formed by deleting all but $k$ columns and $k$ rows of $M$. The rank of a matrix $M$ is the number of linearly independent columns, which is the same as the number of linearly independent rows, and we denote it by $\operatorname{rk} M$.

General notation

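Given an interval $I \subseteq \mathbb{R}$ and an arbitrary set $A \subseteq \mathbb{R}^n$ we denote by $IA$ the set $\{\lambda x : \lambda \in I,\ x \in A\}$. Likewise, for $x \in \mathbb{R}^n$, $Ix = \{\lambda x : \lambda \in I\}$.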

Given $n \in \mathbb{N}$, we denote by $[n] = \{1, \ldots, n\}$. If $A$ and $B$ are sets and $A$ is finite, we denote by $B^A$ the set $B^{|A|}$, where $|A|$ is the cardinality of $A$.

1.2 Intersection Cuts

Intersection cuts are the topic of chapters 4 and 5. In this section, we give a

brief introduction to intersection cuts.

The history of intersection cuts and $S$-free sets dates back to the 60's. They were originally introduced in the nonlinear setting by Tuy (1964) for the problem of minimizing a concave function over a polytope. Later on, they were introduced in integer programming by Balas (1971) and have been largely studied since. The more modern form of intersection cuts deduced from an arbitrary convex $S$-free set is due to Glover (1973), although the term $S$-free was coined by Dey and Wolsey (2010).

We illustrate the idea with the following integer program

$$\max\{-12x + 5y : x + 4y \le 17,\ -4x + y \le -3,\ 5x - 6y \le 1,\ x, y \in \mathbb{Z}\}, \tag{1.3}$$

depicted in Figure 1.1. The LP relaxation solution is $\bar{x} = (\tfrac{29}{17}, \tfrac{65}{17})$. The nearest feasible point is at a distance of $\sqrt{13/17}$, and so there is no feasible point in the interior of the ball centered at $\bar{x}$ of radius $\sqrt{13/17}$. If $S = \{(x, y) \in \mathbb{Z}^2 : x + 4y \le 17,\ -4x + y \le -3,\ 5x - 6y \le 1\}$, then this ball is an $S$-free set.
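The numbers in this example can be checked with exact rational arithmetic. The following is only a sketch: it enumerates the three vertices of the LP relaxation of (1.3), picks the best one for the objective $-12x + 5y$, and then searches a small box (an assumption that is safe here because the relaxation is a bounded triangle) for the nearest feasible integer point.

```python
from fractions import Fraction as F
from itertools import combinations

# Constraints of (1.3) written as a1*x + a2*y <= rhs.
CONSTRAINTS = [((1, 4), 17), ((-4, 1), -3), ((5, -6), 1)]


def solve_2x2(row1, row2):
    """Intersection of two tight constraints, via Cramer's rule."""
    (a, b), r1 = row1
    (c, d), r2 = row2
    det = F(a * d - b * c)
    return ((r1 * d - b * r2) / det, (a * r2 - c * r1) / det)


def feasible(x, y):
    """Check all three linear constraints of (1.3)."""
    return all(a1 * x + a2 * y <= rhs for (a1, a2), rhs in CONSTRAINTS)


# LP optimum: the feasible vertex with the largest objective value.
vertices = [solve_2x2(r1, r2) for r1, r2 in combinations(CONSTRAINTS, 2)]
x_bar = max((v for v in vertices if feasible(*v)),
            key=lambda v: -12 * v[0] + 5 * v[1])

# Squared distances from x_bar to the feasible integer points in a box
# containing the LP relaxation.
sq_dists = {
    (x, y): (x - x_bar[0]) ** 2 + (y - x_bar[1]) ** 2
    for x in range(0, 6) for y in range(0, 6) if feasible(x, y)
}
nearest = min(sq_dists, key=sq_dists.get)
```

Here `x_bar` is the LP solution, `nearest` the closest point of $S$, and `sq_dists[nearest]` the squared radius of the $S$-free ball.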

The LP solution is the apex of a cone whose extreme rays are the edges of the polyhedron adjacent to the LP solution. Now, consider the points where the extreme rays of the cone intersect the ball and build the hyperplane (in this case just a line) that goes through those points. This hyperplane defines a valid inequality that separates the LP solution from $S$. The reason why it is valid is that the region of the LP relaxation cut off by the inequality is completely contained inside the ball. This happens because the ball is a convex set. As the ball does not contain any feasible point in its interior, the cut must be valid. Such a cutting plane is an intersection cut.

Figure 1.1: The left plot shows the integer points in black, the LP relaxation of (1.3) in blue, and the optimal LP solution in red. The middle plot highlights the ball centered at the optimal LP solution of radius equal to the distance between the optimal LP solution and the nearest feasible point in orange. It also shows the extreme rays of the conic relaxation starting at the optimal LP solution in green. The right plot shows the intersection points of the ball with the cone in green, the intersection cut in gray, and the region cut off by the cut also in gray.

In general, there are three ingredients for the construction of intersection cuts. First, the set of (or a relaxation of the) feasible points $S$. Second, a simplicial cone that contains the feasible region and whose apex is the LP solution (or the point to separate). Third, an $S$-free set $C$ that contains the LP solution in its interior. We ask for the cone to be simplicial so that the intersection of its extreme rays with $C$ defines a unique hyperplane.

Note that the larger the $S$-free set, the better the intersection cut. The intuition is that if one $S$-free set contains another, then the intersection of an extreme ray of the cone with the boundary of the larger set will be farther away, and thus the cut will be deeper. This is illustrated in Figure 1.2 where we compare the cut obtained in the above example with the intersection cut deduced by using as $S$-free set the largest ball centered at the LP solution that does not include any integer point in its interior.

How can we build a simplicial cone whose apex is the LP solution and that contains the whole feasible region? Luckily, such a cone appears quite naturally when we solve the LP using the simplex algorithm. Consider a linear program $\max\{c^T x : Ax \le b\}$. The simplex algorithm starts at a vertex of $Ax \le b$ and iteratively moves to a neighbor vertex with better objective value if there is one. If there is none, then the vertex is optimal. A vertex is a feasible point defined by the intersection of $n$ independent hyperplanes among the ones in $Ax \le b$. Ignoring all but the $n$ constraints that define a vertex yields a simplicial cone whose apex is the vertex and contains the whole LP, see the middle plot in Figure 1.1. When an optimal solution is obtained, one can find out the $n$ constraints that the simplex algorithm considered in order to define the solution. Therefore, intersection cuts are readily available in LP-based branch and bound algorithms if we are able to construct an $S$-free set that contains the LP solution in its interior.

Figure 1.2: The left plot shows the intersection cut for (1.3) obtained above. The right plot shows the intersection cut obtained from the $S$-free set given by a $\mathbb{Z}^2$-free ball.

We will now present a more algebraic deduction of intersection cuts whose advantage is that it admits a generalization of intersection cuts. As it turns out, this generalization is only relevant when the $S$-free set is unbounded. We will also give a geometric characterization of the generalization and show that in this case it no longer holds that larger $S$-free sets yield better cuts.

The simplex algorithm is usually presented using the so-called standard form of an LP, namely, $\max\{c^T x : Ax = b,\ x \ge 0\}$. The advantage is that the algebraic description of the algorithm is simpler, but certainly the geometric intuition is obfuscated. But the story is the same. We have $n$ variables and $m + n$ constraints, $m$ from $Ax = b$ and $n$ from $x \ge 0$. Since $m$ of these constraints are equalities, we simply need $n - m$ more to define a point, assuming, as we are, that the equality constraints are linearly independent. These $n - m$ can only come from $x \ge 0$. Thus, any vertex will have $n - m$ variables fixed to 0 and the others will be the unique solution to the remaining system of equations. As above, not every selection of $n - m$ constraints from $x \ge 0$ yields a vertex, but some do. In particular, if a selection does, then the matrix describing the remaining system is invertible. That is, the columns of $A$ associated to the $m$ variables not fixed to 0 after setting $n - m$ constraints from $x \ge 0$ to equality are linearly independent. These variables are called basic variables, their indices are called a basis, and the remaining variables are called non-basic.

Let $B$ be a basis and let $N$ be the indices of the non-basic variables. We can partition the system $Ax = b$ into basic and non-basic variables. For this we introduce the following notation: if $I \subseteq \{1, \ldots, n\}$, then $A_I$ represents the columns of $A$ indexed by $I$, while $x_I$ the subvector of variables indexed by $I$. Then $Ax = b$ is equivalent to $A_B x_B + A_N x_N = b$. From the above discussion $A_B$ is an invertible matrix, thus $Ax = b$ is equivalent to

$$x_B = A_B^{-1} b - A_B^{-1} A_N x_N.$$

This is the so-called tableau. There is a lot of important information in the tableau. In particular, the apex of the simplicial cone is $(x_B, x_N) = (A_B^{-1} b, 0)$, while its extreme rays are $(x_B, x_N) = (-A_B^{-1} A_N e_j, e_j)$ for $j \in N$. Note that although $x \in \mathbb{R}^n$, the feasible points are in an $n - m$ dimensional space, assuming $A$ has full rank. So the cone is actually simplicial only in the solution space, as it has $n - m$ rays. Thus, it gets a bit more complicated to picture this, but the beauty is that we can deduce the intersection cuts directly from the tableau.
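As a small illustration of reading the apex and the extreme rays off the tableau, the sketch below puts the LP relaxation of (1.3) in standard form by adding slack variables $s_1, s_2, s_3$ (our own naming) and uses the basis $\{x, y, s_3\}$. It assumes that $x, y \ge 0$ may be added to (1.3), which is harmless because the relaxation lies in the positive quadrant. The linear solver is a plain Gauss-Jordan routine written for this example.

```python
from fractions import Fraction as F


def solve(M, rhs):
    """Solve M z = rhs for a square invertible M by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[F(v) for v in row] + [F(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        # Find a row with a nonzero pivot and move it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


# Standard form of the LP relaxation of (1.3): variables (x, y, s1, s2, s3).
A = [[1, 4, 1, 0, 0],
     [-4, 1, 0, 1, 0],
     [5, -6, 0, 0, 1]]
b = [17, -3, 1]
basis, nonbasis = [0, 1, 4], [2, 3]   # x, y, s3 basic; s1, s2 non-basic


def column(j):
    """Column j of A."""
    return [row[j] for row in A]


A_B = [[row[j] for j in basis] for row in A]
apex = solve(A_B, b)                                             # A_B^{-1} b
rays = [[-v for v in solve(A_B, column(j))] for j in nonbasis]   # -A_B^{-1} A_N e_j
```

The first two entries of `apex` reproduce the LP solution $(\tfrac{29}{17}, \tfrac{65}{17})$, and each entry of `rays` is the basic part of one extreme ray of the simplicial cone.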

Consider an optimization problem $P$ and assume that the tableau of an LP relaxation of it is $x = f + Rs$, where $x$ are the basic and $s$ the non-basic variables. (The tableau also has a row with the objective function, but we omit it as it is not relevant for our current discussion.) Let $S$ be a closed set such that for every feasible solution $(x, s)$ of $P$, it holds that $x \in S$. Furthermore, assume that $f \notin S$, that is, the optimal LP solution $(f, 0)$ is not feasible. Let $C$ be an $S$-free set such that $f \in \operatorname{int} C$. Let us assume that $C$ is given by $\{x : \phi(x - f) \le 1\}$, where $\phi$ is sublinear. Now, any $s \ge 0$ defines an $x = f + Rs$ and $\phi(x - f) = \phi(Rs)$. Thus, as long as $\phi(Rs) < 1$, $x$ lies in the interior of $C$ and so $x$ itself cannot be feasible. We conclude that if $(x, s)$ is to be feasible, then $\phi(x - f) \ge 1$, that is, $\phi(Rs) \ge 1$ is a valid (nonlinear) inequality. To make it linear, we use the sublinearity of $\phi$ and the non-negativity of the variables. Indeed,

$$1 \le \phi(Rs) = \phi\Big(\sum_j R_j s_j\Big) \le \sum_j \phi(R_j s_j) = \sum_j \phi(R_j) s_j,$$

where the second inequality follows from the subadditivity of $\phi$ and the last equality follows from the positive homogeneity of $\phi$ and the non-negativity of $s$. Such a function $\phi$ is also called a cut generating function, since evaluating it at the given rays is sufficient to obtain the cut's coefficients.

When $\phi$ is the gauge of $C - f$, then the cut above corresponds to the intersection cut described geometrically above. Indeed, the points $s^i = \frac{1}{\phi(R_i)} e_i$, assuming $\phi(R_i) > 0$, satisfy the inequality $\sum_j \phi(R_j) s_j \ge 1$ with equality. These points define $x^i = f + \frac{1}{\phi(R_i)} R_i$ and satisfy $\phi(x^i - f) = 1$. This means that all $x^i$ are on the boundary of $C$. In other words, the hyperplane defined by $\sum_j \phi(R_j) s_j \ge 1$ passes through the $n - m$ points $(x^i, s^i)$, which correspond to the intersection of the $n - m$ rays $(R_i, e_i)$ with the boundary of the $S \times \mathbb{R}^{n-m}$-free set, $C \times \mathbb{R}^{n-m}$. As mentioned before, note that the LP is $Ax = b$, $x \ge 0$ so, even though $x \in \mathbb{R}^n$ and we need $n$ points to define a hyperplane, the feasible region lives in the translated subspace $Ax = b$. Therefore, we are working on $\mathbb{R}^{n-m}$ embedded in $\mathbb{R}^n$ and only $n - m$ points define a unique hyperplane in the space that we are working on.
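To make the formula $\sum_j \phi(R_j) s_j \ge 1$ concrete, here is a floating-point sketch for example (1.3) with the ball of radius $\sqrt{13/17}$ as $S$-free set. The two rays are the $(x, y)$ parts of $-A_B^{-1} A_N e_j$ for the slacks of the two tight constraints; they are hard-coded here as an assumption so that the block is self-contained. The gauge of a ball centred at the origin is the Euclidean norm divided by the radius.

```python
import math

# Example (1.3): LP vertex, the two rays of the simplicial cone in (x, y)-space
# (one per non-basic slack s1, s2), and the radius of the S-free ball.
x_bar = (29 / 17, 65 / 17)
rays = [(-1 / 17, -4 / 17), (4 / 17, -1 / 17)]
radius = math.sqrt(13 / 17)


def gauge_ball(r):
    """Gauge of the ball of the given radius centred at the origin."""
    return math.hypot(*r) / radius


coeffs = [gauge_ball(r) for r in rays]   # cut: sum_j coeffs[j] * s_j >= 1


def slacks(x, y):
    """Slacks of the two constraints that are tight at the LP vertex."""
    return (17 - x - 4 * y, -3 + 4 * x - y)


def cut_lhs(x, y):
    """Left-hand side of the intersection cut evaluated at (x, y)."""
    return sum(c * s for c, s in zip(coeffs, slacks(x, y)))


def feasible(x, y):
    """Linear constraints of (1.3)."""
    return x + 4 * y <= 17 and -4 * x + y <= -3 and 5 * x - 6 * y <= 1


integer_points = [(x, y) for x in range(6) for y in range(6) if feasible(x, y)]
```

Both coefficients equal $1/\sqrt{13}$, the LP vertex violates the cut (its slacks vanish), and every feasible integer point in the enumerated box satisfies it.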

A sublinear function other than the gauge, if it exists, will yield better cut coefficients, thus, a better cut. As it turns out, if $C = \{x : \phi(x - f) \le 1\}$ for some sublinear function $\phi$ and $R_i \mathbb{R}_+$ is a ray that is not in the interior of the recession cone of $C$, then $\phi(R_i)$ is equal to the gauge of $C - f$ at $R_i$. That is, the only way of improving on a coefficient is that $R_i \in \operatorname{int} \operatorname{rec}(C)$. In other words, the possibility of improving the cut coefficients can only occur when $C$ is unbounded and, furthermore, when a ray of the simplicial cone is in the interior of the recession cone of $C$. Note that when this occurs, then the gauge of $C - f$ at that ray is 0 and if an improvement is possible, then the coefficient must be negative. A negative coefficient can never be achieved with the gauge as the gauge is always non-negative.

This phenomenon was first observed by Glover (1974). Glover interpreted the negative coefficient as moving in the negative direction of the ray instead of the positive one.

Here we provide an interpretation of the negative edge extension. Consider

the following set

{

(

x, y

)

∈R2

x−y≥

∨x−

y≥

}

, see Figure 1.3.

Clearly, a maximal $S$-free set is

{

(

x, y

)

∈R2

x−y≤

, x −

y≤

}

The cone with apex 0 and rays $e_1$ and $e_2$ is simplicial and contains the whole feasible region, so we use it to generate the intersection cut. The intersection cut obtained from the simplicial cone and the gauge of $C$ is $x \ge 1$. Indeed, the gauge $\phi_C$ satisfies $\phi_C(e_1) = 1$, since $e_1$ lies on the boundary of $C$, and $\phi_C(e_2) = 0$, as $\lambda e_2 \in C$ for every $\lambda \ge 0$.

Figure 1.3: The left plot shows the set $S$ in blue. The middle plot shows the set $S$ in blue and $C$ in orange with the intersection cut obtained by the gauge. The right plot shows $S$, $C$ and the cut obtained with $\phi$.

As it turns out,

{

(

x, y

)

∈R2

(

x, y

)

≤

}

for

(

x, y

) =

max{x−y

2, x −

. Note that

(

) =

max{−1

2,−

−1

. Thus,

$\phi$ is not the gauge and, more importantly, the cut $x - \frac{1}{2} y \ge 1$ is valid.

The interpretation of the coefficients of the intersection cut obtained by the gauge is as follows. If we move along the ray $e_1$, then we hit the boundary at $1 e_1$, thus the cut coefficient is $\frac{1}{1} = 1$. Instead, if we move along $e_2$, then we "hit" the boundary of $C$ at "$\infty e_2$" and the cut coefficient is $\frac{1}{\infty} = 0$.

However, we can actually tilt this cut to make it stronger. How much can we tilt it? Well, we can tilt as long as the cut off region is inside $C$. The tilted cut intersects the $y$ axis at some negative point. The higher the point the stronger the cut, see Figure 1.4. The coefficient of the intersection cut obtained by the sublinear function $\phi$ corresponds to the tilting whose intersection with the $y$ axis is the lowest point at which a supporting valid inequality for $C$ intersects the $y$ axis. In this case, such a point is $(0, -2)$ and so the cut coefficient is $-\frac{1}{2}$.

Something looks off, though: the cut is not the best possible. How can we achieve a better cut? Consider the weaker $S$-free set

{

(

x, y

)

∈R2

x−y≤

, x −

y≤

}

. We have that

{

(

x, y

)

∈R2

(

x, y

)

≤

}

where

(

x, y

) =

max{x−y, x −

. Now the intersection cut is $x - y \ge 1$ and it cannot be strengthened anymore as it defines a facet of $\operatorname{conv}(S)$.

What happened? By moving the facet $x - y \le 2$ of $C$ to the left until $x - y \le 1$, we did not change the intersection point of the ray $e_1$. However, we did make the lowest point at which a valid inequality for $C$ intersects the $y$ axis higher, thus the cut is stronger. For an illustration see Figure 1.5.

Figure 1.4: The plot shows the set $S$ in blue and the set $C$ in orange. We see the intersection cut obtained with the gauge (dashed), a better tilted cut that intersects the $y$ axis at $-5$ (green), and the intersection cut obtained with $\phi(x, y)$ (red). The higher the intersection with the $y$ axis, the better the cut. Also, the red cut intersects the $y$ axis at the lowest point that a supporting valid inequality of $C$ intersects the $y$ axis. Supporting valid inequalities of $C$ intersect the $y$ axis between the black dot and the red dot.

Figure 1.5: The left plot shows how shrinking the $S$-free set moves the lowest intersection with the $y$ axis up. The right plot shows the final intersection cut, which defines the closure of the convex hull of $S$.

The above is an example that larger $S$-free sets are not always better when one builds intersection cuts with sublinear functions other than the gauge. Let $C$ be an $S$-free set. When the ray actually intersects the boundary of $C$ it is clear that if we extend $C$ in that direction, then the intersection point is going to be farther away as we discussed above and illustrated in Figure 1.2. However, the interpretation of the cut coefficient with a sublinear function is a bit more involved and uses more global information. Indeed, making $C$ larger in some direction will affect which inequalities are valid and so it can have a (negative) effect on the cut coefficient for rays that are contained inside $C$. This is what the above example illustrates.

We refer the reader to Conforti et al. (2011b) and Conforti et al. (2015)

for more details on intersection cuts.

1.3 Duality

In chapters 2 and 5, we mention and use Slater's condition, respectively. This is a condition that ensures strong duality of convex problems. Here we give a brief introduction to duality aiming at explaining Slater's condition from a geometrical point of view.

Consider a linear program $\max\{c^T x : Ax \le b\}$. Suppose its optimal value is $z$. This means that $c^T x \le z$ for every $x$ such that $Ax \le b$. In fact, it is the tightest valid inequality for $Ax \le b$ with normal $c$. Thus, instead of solving $\max\{c^T x : Ax \le b\}$ directly, one can try to find the tightest valid inequality for $Ax \le b$ with normal $c$. Alternatively, one can think of it as finding the best upper bound on the value that $c^T x$ can achieve over $Ax \le b$. But how can we do this?

It should, of course, be possible to deduce the inequality $c^T x \le z$ just from the information in $Ax \le b$. For example, consider $\max\{3x + y : 4x - y \le 2,\ -x + 3y \le 5\}$. The optimal solution is obtained at $(\bar{x}, \bar{y}) = (1, 2)$ and has a solution value of 5. Thus, the inequality $3x + y \le 5$ is valid for $\{(x, y) : 4x - y \le 2,\ -x + 3y \le 5\}$. Indeed, we can deduce it from $4x - y \le 2$ and $-x + 3y \le 5$ by multiplying the first inequality by 10, the second one by 7, and then adding them up. This yields $33x + 11y \le 55$, which is the same as $3x + y \le 5$.
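The aggregation can be verified mechanically. The sketch below uses the data $4x - y \le 2$, $-x + 3y \le 5$ with objective $3x + y$; the right-hand side 2 is the value consistent with the aggregated inequality $33x + 11y \le 55$. Scaling the multipliers 10 and 7 by $1/11$ gives a vector $\mu \ge 0$ whose combination of the rows has normal exactly $c$.

```python
from fractions import Fraction as F

A = [[4, -1], [-1, 3]]      # constraints 4x - y <= 2 and -x + 3y <= 5
b = [2, 5]
c = [3, 1]                  # objective 3x + y

mu = [F(10, 11), F(7, 11)]  # the multipliers 10 and 7, scaled by 1/11

# Conic combination of the rows: normal mu^T A and right-hand side mu^T b.
combined_normal = [sum(mu[i] * A[i][j] for i in range(2)) for j in range(2)]
combined_rhs = sum(mu[i] * b[i] for i in range(2))

# The primal solution from the text and its objective value.
x_opt = (1, 2)
primal_value = c[0] * x_opt[0] + c[1] * x_opt[1]
primal_feasible = all(
    A[i][0] * x_opt[0] + A[i][1] * x_opt[1] <= b[i] for i in range(2)
)
```

Since `combined_rhs` equals `primal_value`, the multipliers certify that 5 is the optimal value.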

It is a fundamental result in linear programming, called Farkas' lemma, that if $Ax \le b$ is non-empty, then every valid inequality can be deduced by considering a conic combination of the constraints (Ziegler, 1995). Why the non-empty assumption? The problem is that every inequality is valid when $Ax \le b$ is empty, but to be able to write every inequality as a conic combination of $Ax \le b$ one needs enough inequalities, more than the ones needed to describe an empty set. For example, $\{(x, y) \in \mathbb{R}^2 : x \le 0,\ x \ge 1\}$ is clearly empty, thus the inequality $y \le 0$ is valid. However, there is no way of building that inequality by taking conic combinations of $x \le 0$ and $-x \le -1$.

With Farkas' lemma we can write the problem of finding the tightest valid inequality for $Ax \le b$ with normal $c$ as follows. Every valid inequality is given by $\mu^T A x \le \mu^T b$ for some $\mu \ge 0$. The normal of the inequality has to be $c$, thus we have the constraint $\mu^T A = c^T$, and it has to be the tightest, that is, the right hand side, $\mu^T b$, has to be the smallest. Thus, when $Ax \le b$ is feasible, we have

$$\min\{\mu^T b : \mu^T A = c^T,\ \mu \ge 0\} = \max\{c^T x : Ax \le b\}.$$

The problem on the left hand side is called the dual problem and the one on the right hand side, the primal.

There are many ways of deducing the dual problem. A standard way is through Lagrange duality. The idea is as follows. The problem $\max\{c^T x : Ax \le b\}$ can be written as an unconstrained problem using $I_{\mathbb{R}^m_-}$, the indicator function of $\mathbb{R}^m_-$,

$$I_{\mathbb{R}^m_-}(y) = \begin{cases} 0, & \text{if } y \le 0 \\ +\infty, & \text{otherwise.} \end{cases}$$

We have $\max\{c^T x : Ax \le b\} = \max_x\, c^T x - I_{\mathbb{R}^m_-}(Ax - b)$. The dual tries to bound the optimal value. One way to find a bound is to find an overestimator of the objective function. We have that $I_{\mathbb{R}^m_-}(y) \ge \mu^T y$ for any $\mu \in \mathbb{R}^m_+$. Indeed, if $y \not\le 0$, then the left hand side is $+\infty$, so the inequality holds. Otherwise, the left hand side is 0, while the right one is non-positive, so the inequality holds. Therefore, for any $\mu \ge 0$,

$$\max\{c^T x : Ax \le b\} \le \sup_x\, c^T x - \mu^T (Ax - b).$$

We can now take the best $\mu \ge 0$ to get

$$\max\{c^T x : Ax \le b\} \le \inf_{\mu \ge 0} \sup_x\, c^T x - \mu^T (Ax - b).$$

The function $L(x, \mu) = c^T x - \mu^T (Ax - b)$ is called the Lagrangian function, $\theta(\mu) = \sup_x L(x, \mu)$ is the Lagrangian dual function, and $\inf_{\mu \ge 0} \theta(\mu)$ is the (Lagrangian) dual problem of $\max\{c^T x : Ax \le b\}$. We have that

$$\theta(\mu) = \sup_x\, c^T x - \mu^T (Ax - b) = \sup_x\, (c - A^T \mu)^T x + \mu^T b = \begin{cases} \mu^T b, & \text{if } c - A^T \mu = 0 \\ \infty, & \text{otherwise.} \end{cases}$$

Thus, the Lagrangian dual is

$$\inf\{\mu^T b : A^T \mu = c,\ \mu \ge 0\},$$

which is the same as the linear programming dual.

The advantage of Lagrangian duality is that the deduction of the dual

generalizes to other types of problems. For example, consider

max{ex

x2≤

y, y ≤

}

. The reasoning in the linear case was to find valid inequalities that can be deduced from the constraints. Luckily, Farkas' lemma tells us what these valid inequalities look like and so we could write an optimization problem to find the tightest one. Here, it is not clear what the valid inequalities actually look like. However, Lagrangian duality still yields a dual.

The disadvantage, though, is that it will not be clear that the bound provided by the Lagrangian dual is equal to the optimal value of the primal. In fact, even if the primal is convex there can be a positive difference between the optimal values of the primal and dual problems. We refer to the optimal value of the primal as primal value and the optimal value of the dual as dual value. When the primal and dual values coincide, we say that strong duality holds. The difference between the primal and dual values is called duality gap.

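To see that there are convex problems with positive duality gap, let us compute the Lagrangian dual of $\max\{-e^{-x} : \sqrt{x^2 + y^2} \le y\}$. The Lagrangian function is $L(x, y, \mu) = -e^{-x} - \mu(\sqrt{x^2 + y^2} - y)$. The Lagrangian dual function is $\theta(\mu) = \sup_{x, y} -e^{-x} - \mu(\sqrt{x^2 + y^2} - y)$. By the Cauchy-Schwarz inequality $y \le \sqrt{x^2 + y^2}$ for all $x, y \in \mathbb{R}$, so $-\mu(\sqrt{x^2 + y^2} - y) \le 0$ for every $(x, y) \in \mathbb{R}^2$ and $\mu \ge 0$. Thus, $\theta(\mu) \le \sup_{x, y} -e^{-x} = 0$.

Let us show that actually $\theta(\mu) = 0$ for all $\mu \ge 0$. Notice that

$$-e^{-x} - \mu\left(\sqrt{x^2 + y^2} - y\right) = -e^{-x} - \frac{\mu x^2}{\sqrt{x^2 + y^2} + y}.$$

Replacing $y$ by $e^x$ above and computing the limit as $x \to \infty$ we obtain

$$\lim_{x \to \infty} -e^{-x} - \frac{\mu x^2}{\sqrt{x^2 + e^{2x}} + e^x} = 0.$$

Thus, $\theta(\mu) = 0$ for every $\mu \ge 0$.

However, the primal's feasible region is $\{0\} \times \mathbb{R}_+$ and its optimal value is, thus, $-e^0 = -1$.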
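A quick numerical check of this gap, as a sketch only: we evaluate the Lagrangian along the curve $y = e^x$ for a fixed multiplier (the choice $\mu = 1$ and the sample values of $x$ are arbitrary assumptions) and compare with the primal value. The rationalized form of the constraint term is used to avoid floating-point cancellation.

```python
import math


def lagrangian_on_curve(x, mu):
    """L(x, e^x, mu), using x^2 / (sqrt(x^2 + y^2) + y) to avoid cancellation."""
    y = math.exp(x)
    return -math.exp(-x) - mu * x * x / (math.hypot(x, y) + y)


mu = 1.0
values = [lagrangian_on_curve(x, mu) for x in (1.0, 5.0, 10.0, 20.0, 40.0)]

# Primal: sqrt(x^2 + y^2) <= y forces x = 0 (and y >= 0), so the value is -e^0.
primal_value = -math.exp(0.0)
```

The sampled Lagrangian values stay negative and increase towards 0, while the primal value is $-1$, which is consistent with a duality gap of 1.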

To understand why this could happen, let us interpret the dual from a more geometric point of view. For this, let us abstract the problem a bit. Consider $\max\{f(x) : g(x) \le 0\}$, where $g = (g_1, \ldots, g_m)$. The Lagrangian dual function is then $\theta(\mu) = \sup_x f(x) - \sum_i \mu_i g_i(x)$. Thus, we have that $f(x) - \sum_i \mu_i g_i(x) \le \theta(\mu)$ for every $x$. An enlightening way of interpreting this inequality is to see it as a valid inequality of a set. Indeed, the inequality is saying that $y_0 - \sum_i \mu_i y_i \le \theta(\mu)$ is valid for the set $\Phi(\mathbb{R}^n) = \{(f(x), g_1(x), \ldots, g_m(x)) : x \in \mathbb{R}^n\}$, where $\Phi(x) = (f(x), g_1(x), \ldots, g_m(x))$. Thus, we can interpret the Lagrangian dual function as a function that given $\mu \ge 0$, finds the best right-hand side of a valid inequality with normal $(1, -\mu)$ for $\Phi(\mathbb{R}^n)$. Then, the Lagrangian dual problem seeks the normal $(1, -\mu)$ such that the valid inequality with that normal has the best (smallest in this case) right-hand side.

So, why do we have a positive duality gap for $\max\{-e^{-x} : \sqrt{x^2 + y^2} \le y\}$? To answer this question we need to understand how $\Phi(\mathbb{R}^2)$ looks when $\Phi(x, y) = (-e^{-x}, \sqrt{x^2 + y^2} - y)$. Figure 1.6 shows $\Phi([-\tfrac{1}{2}, 1] \times [-\tfrac{1}{2}, 1])$ and $\Phi([-\tfrac{1}{2}, 5] \times [-\tfrac{1}{2}, 150])$. One can prove that $\Phi(\mathbb{R}^2) = ((-\infty, 0) \times (0, \infty)) \cup \{(-1, 0)\}$. From here we see that for every $\mu \ge 0$, the tightest valid inequality for $\Phi(\mathbb{R}^2)$ with normal $(1, -\mu)$ is $y_0 - \mu y_1 \le 0$. In other words, $\theta(\mu) = 0$ for every $\mu \ge 0$ as we saw above.

Figure 1.6: The left plot shows $\Phi([-\tfrac{1}{2}, 1] \times [-\tfrac{1}{2}, 1])$ and the right one shows $\Phi([-\tfrac{1}{2}, 5] \times [-\tfrac{1}{2}, 150])$, where $\Phi(x, y) = (-e^{-x}, \sqrt{x^2 + y^2} - y)$.

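When can we ensure that strong duality holds? Consider again $\max\{f(x) : g(x) \le 0\}$ and let $p^*$ be the optimal value. Assume that $f$ is concave and the $g_i$ are convex and notice that $y_0 - \sum_i \mu_i y_i \le \theta$ is a valid inequality for $\Phi(\mathbb{R}^n)$ with $\mu \ge 0$, if and only if, it is valid for $\Phi(\mathbb{R}^n) + (\mathbb{R}_- \times \mathbb{R}^m_+)$. The advantage of $\Phi(\mathbb{R}^n) + (\mathbb{R}_- \times \mathbb{R}^m_+)$ over $\Phi(\mathbb{R}^n)$ is that it is convex. Now, as $p^*$ is the optimal value, it follows that there cannot be any feasible point $x$, that is, one with $g_i(x) \le 0$ for all $i$, such that $f(x) > p^*$. In other words,

$$\left(\Phi(\mathbb{R}^n) + (\mathbb{R}_- \times \mathbb{R}^m_+)\right) \cap \left((p^*, +\infty) \times \mathbb{R}^m_-\right) = \emptyset.$$

We illustrate $\Phi(\mathbb{R}^n) + (\mathbb{R}_- \times \mathbb{R}^m_+)$ and $(p^*, \infty) \times \mathbb{R}^m_-$ in Figure 1.7 for $\max\{-e^{-x} : \sqrt{x^2 + y^2} \le y\}$.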

Now, $\Phi(\mathbb{R}^n) + (\mathbb{R}_- \times \mathbb{R}^m_+)$ and $(p^*, \infty) \times \mathbb{R}^m_-$ are two convex sets which do not intersect. Therefore, from separation theorems, we know that there must exist a hyperplane separating both sets. For our current example, $y_1 = 0$ is the only hyperplane that separates both sets, but remember that the dual tries to find a hyperplane with a nonzero coefficient for $y_0$ and contains $\Phi(\mathbb{R}^n)$ on one side, thus, $y_1 = 0$ is not feasible for the dual problem. So, how could we ensure that, first, such a hyperplane exists and, second, it actually separates $\Phi(\mathbb{R}^n)$ from $(p^*, \infty) \times \mathbb{R}^m_-$? Note that the existence of such a hyperplane is related to the feasibility of the dual problem, while the separation of $\Phi(\mathbb{R}^n)$ from $(p^*, \infty) \times \mathbb{R}^m_-$ ensures that the dual achieves the same value as the primal.

Figure 1.7: The set $\Phi(\mathbb{R}^2)$ is depicted in blue and $(-1, \infty) \times \mathbb{R}_-$ in orange, where $\Phi(x, y) = (-e^{-x}, \sqrt{x^2 + y^2} - y)$.

We will now see that if $\Phi(\mathbb{R}^n)$ intersects the interior of $\mathbb{R} \times \mathbb{R}^m_-$, then we will have that the dual is feasible and equal to the primal. That is, if there exists an $x_0$ such that $g_i(x_0) < 0$ for all $i \in [m]$, then strong duality holds. Indeed, such a point forces every hyperplane separating $\Phi(\mathbb{R}^n)$ from $(p^*, \infty) \times \mathbb{R}^m_-$ to have a nonzero coefficient for $y_0$. This should be fairly intuitive from the pictures. To see this algebraically, let $\mu_0 y_0 - \sum_i \mu_i y_i \le \theta$ be a hyperplane that separates $\Phi(\mathbb{R}^n)$ from $(p^*, \infty) \times \mathbb{R}^m_-$. In particular, $(\mu_0, \mu) \neq 0$ as otherwise $\mu_0 y_0 - \sum_i \mu_i y_i \le \theta$ would not be a hyperplane. As $(f(x_0), g_1(x_0), \ldots, g_m(x_0)) \in \Phi(\mathbb{R}^n)$, it follows that $\mu_0 f(x_0) - \sum_i \mu_i g_i(x_0) \le \theta$. As $(p, 0) \in (p^*, \infty) \times \mathbb{R}^m_-$ for every $p > p^*$, it follows that $\theta \le \mu_0 p$ for every $p > p^*$, which implies that $\theta \le \mu_0 p^*$. Thus, $\mu_0 f(x_0) - \sum_i \mu_i g_i(x_0) \le \mu_0 p^*$.

Now, if $\mu_0 = 0$, then $-\sum_i \mu_i g_i(x_0) \le 0$, but $\mu \ge 0$ and $g_i(x_0) < 0$, which can only hold if $\mu = 0$. However, this contradicts $(\mu_0, \mu) \neq 0$. Therefore $\mu_0 > 0$ and we can normalize so that $\mu_0 = 1$. This shows that the dual is feasible and that its value is equal to the primal. Indeed, $f(x) - \sum_i \mu_i g_i(x) \le p^*$ for every $x$ implies that $\theta(\mu) \le p^*$, but by construction, $\theta(\mu) \ge p^*$.

If there exists an $x_0$ such that $g_i(x_0) < 0$ for $i \in [m]$, then we say that Slater's condition holds and $x_0$ is called a Slater point. Thus, we have proven that if the primal is feasible and bounded, and Slater's condition holds, then there is strong duality. The above result still holds when Slater's condition is weakened to ask that there exists a point $x_0$ such that $g_i(x_0) < 0$ for every $g_i$ that is non-linear, see (Rockafellar, 1970, Theorem 28.2). The proof of such a result follows the same reasoning, but one needs a slightly stronger separation theorem that exploits the polyhedrality of $(p^*, \infty) \times \mathbb{R}^m_-$, see (Rockafellar, 1970, Theorem 20.2).

More interpretations of duality along these lines can be found in Pourciau (1980).

1.4 Monoidal Strengthening

In Chapter 4, we apply a modification of monoidal strengthening to intersection cuts. In this section, we explain what monoidal strengthening is.

Monoidal strengthening is a technique introduced by Balas and Jeroslow (1980). Our deduction of the monoidal strengthening technique applied to disjunctions is novel and is inspired by Wiese (2016, Section 4.2.3) and several conversations with Sven Wiese. We also present the general technique in the more modern framework of $S$-free sets.

Before we start, a monoid is the discrete analog of a convex cone. A monoid is a pair $(M, +)$ where $M$ is a set and $+ : M \times M \to M$ is such that $+$ is associative and there exists $0 \in M$ such that $+(0, \cdot)$ is the identity. The name monoidal strengthening comes from the use of a monoid to strengthen cuts.

As we discussed in Section 1.2, a simple way of generating cutting planes is through cut generating functions. In this setting, and for the rest of this section, we will assume that we have the following relaxation of the feasible region of our optimization problem

$$\Big\{(x, y) \in \mathbb{R}^q_+ \times \mathbb{Z}^p_+ : \sum_i r_i x_i + \sum_j d_j y_j \in S\Big\}, \tag{1.4}$$

where $S \subseteq \mathbb{R}^n$ is a closed set such that $0 \notin S$ and $r_i, d_j \in \mathbb{R}^n$. We also have a convex $S$-free set $C$ such that $0 \in \operatorname{int} C$. The set $C$ is represented by a sublinear function $\phi$, that is,

$$C = \{z \in \mathbb{R}^n : \phi(z) \le 1\}.$$

The intersection cut generated by $\phi$ that separates the point $(x, y) = (0, 0)$ from (1.4) is

$$\sum_i \phi(r_i) x_i + \sum_j \phi(d_j) y_j \ge 1.$$

Probably the most intuitive way of understanding monoidal strengthening is to see it as a technique that takes a relaxation of the form

$$\Big\{(x, y) \in \mathbb{R}^q_+ \times \mathbb{Z}^p_+ : \sum_i r_i x_i + \sum_j d_j y_j \in S\Big\},$$

and builds new ones,

$$\Big\{(x, y) \in \mathbb{R}^q_+ \times \mathbb{Z}^p_+ : \sum_i r'_i x_i + \sum_j d'_j y_j \in S'\Big\}.$$

Each of them can generate a cut that separates $(0, 0)$ and, of course, the idea is to select a "best" one. The construction of new relaxations exploits the fact that some variables are restricted to be integers and the structure of the set $S$. Let us see two examples before we present the general principle of monoidal strengthening.

1.4.1 One Row Relaxations: Gomory Cuts

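Assume $f \notin \mathbb{Z}$, $n = 1$, and that (1.4) is

$$\sum_i r_i x_i + \sum_j d_j y_j \in S := \mathbb{Z} - f,$$

where $r_i, d_j \in \mathbb{R}$. As each $y_j \in \mathbb{Z}$, adding some integer multiple of $y_j$ to the above relation does not change $S$. That is, if $m_j \in \mathbb{Z}$, then

$$\sum_i r_i x_i + \sum_j d_j y_j + \sum_j m_j y_j \in \mathbb{Z} + \sum_j m_j y_j - f = \mathbb{Z} - f.$$

Thus, if $C$ is a convex $S$-free set represented by $\phi$, such that $0 \in \operatorname{int} C$, then not only is $\sum_i \phi(r_i) x_i + \sum_j \phi(d_j) y_j \ge 1$ a valid inequality, but also

$$\sum_i \phi(r_i) x_i + \sum_j \phi(d_j + m_j) y_j \ge 1 \quad \text{for every } m_j \in \mathbb{Z}.$$

Note that in this particular case, assuming $0 < f < 1$, the only maximal $S$-free set that contains 0 is $C = [-f, 1 - f]$ and the only sublinear function $\phi$ such that $C = \{x \in \mathbb{R} : \phi(x) \le 1\}$ is its gauge, $\phi(x) = \max\{\frac{x}{1 - f}, -\frac{x}{f}\}$. Using $\phi$ and finding the best $m_j$ for each $j$ yields the Gomory cut (Gomory, 1960). By best here we mean the $m_j$ that makes $\phi(d_j + m_j)$ the smallest.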
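The search for the best $m_j$ can be sketched as follows, assuming $0 < f < 1$ so that the $S$-free set is $[-f, 1 - f]$. Because the gauge is convex and vanishes at 0, only the two integer shifts of $d_j$ that land in $[0, 1)$ and $[-1, 0)$ can be optimal. The values of `f` and `d` are illustrative assumptions, and the brute-force search over a finite window is only a sanity check.

```python
from fractions import Fraction as F
from math import floor


def gauge(x, f):
    """Gauge of C = [-f, 1 - f] for 0 < f < 1."""
    return max(x / (1 - f), -x / f)


def strengthened(d, f):
    """min over integers m of gauge(d + m, f), via the two candidate shifts."""
    frac = d - floor(d)          # shift of d into [0, 1)
    return min(gauge(frac, f), gauge(frac - 1, f))


f = F(1, 4)                      # assumed fractional part of the right-hand side
d = F(7, 3)                      # assumed coefficient of an integer variable
plain = gauge(d, f)              # coefficient without strengthening
best = strengthened(d, f)        # coefficient after monoidal strengthening
brute = min(gauge(d + m, f) for m in range(-10, 11))
```

With these values the coefficient drops from `plain` to `best`, and the brute-force search agrees with the closed form.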

1.4.2 Disjunctive Cuts

Let $Q$ be an index set and consider an optimization problem $P$ such that

$$S = \Big\{(x, y) \in \mathbb{R}^q_+ \times \mathbb{Z}^p_+ : \bigvee_{k \in Q} a(k)^T x + d(k)^T y \ge 1\Big\},$$

is a valid disjunction, that is, every feasible solution of $P$ is in $S$. Here, we denote the vectors as $a(k) \in \mathbb{R}^q$ and $d(k) \in \mathbb{R}^p$ instead of the more usual notation $a^k$ and $d^k$. As (1.4) we use

$$\sum_j e_j x_j + \sum_j e_{j+q} y_j = (x, y) \in S.$$

Consider the $S$-free set

$$C = \{(x, y) \in \mathbb{R}^{q+p} : a(k)^T x + d(k)^T y \le 1 \text{ for } k \in Q\}.$$

A sublinear function representing $C$, which may or may not be its gauge, is

$$\phi_C(x, y) = \max_{k \in Q} a(k)^T x + d(k)^T y. \tag{1.5}$$

Thus, we obtain the cut

$$1 \le \sum_j \phi_C(e_j) x_j + \sum_j \phi_C(e_{q+j}) y_j = \sum_j \Big(\max_{k \in Q} a(k)_j\Big) x_j + \sum_j \Big(\max_{k \in Q} d(k)_j\Big) y_j.$$

This cut is known as disjunctive cut (Balas, 1979) and the implication

$$\bigvee_{k \in Q} \sum_j a(k)_j x_j + \sum_j d(k)_j y_j \ge 1 \implies \sum_j x_j \max_{k \in Q} a(k)_j + \sum_j y_j \max_{k \in Q} d(k)_j \ge 1,$$

is known as the maximum principle.

Monoidal strengthening in this setting amounts to finding a new disjunction that every feasible point must satisfy. Balas and Jeroslow showed how to build new disjunctions. For their construction we need that if each disjunction is relaxed enough, then it is automatically satisfied. More formally, we need that for each $k$, there is a $b_k$ such that every $(x, y) \in S$ satisfies $a(k)^T x + d(k)^T y \ge b_k$. In other words, we need that the expression $a(k)^T x + d(k)^T y$ is bounded from below in the feasible region of $P$.

For example, consider

{x∈R2

x1+x2

3≥

∨x1≥

}

and assume

that the feasible region of

. Then,

x1+x2

3≥1

is a valid inequality for

In other words, if we relax the first disjunctive term

x1+x2

3≥

1 by

, then we

obtain an inequality satisfied by every element of

. Thus,

. Similarly,

= 0 is a lower bound for the second disjunctive term that makes it valid

for every element of S.

On the other hand, consider

{x∈R2

2≥

∨x1−x2≥

}

. While

is a valid bound for

, there is no

such that

x1−x2≥b2

is valid

for

. One reason is that

= 2,

x2≥

0 is in

and so

x1−x2

is unbounded

from below.


Given the lower bounds $b_k$ we have the following lemma which will allow us to build new disjunctions. Notice that we can, and will, assume that $b_k < 1$ as otherwise the disjunction is trivially satisfied.

Lemma 1.1. Every $(x, y) \in S$ satisfies the disjunction

$$\bigvee_{k \in Q} a(k)^T x + d(k)^T y + (1 - b_k) z_k \ge 1 \tag{1.6}$$

whenever $z \in Z = \{z \in \mathbb{Z}^Q : z = 0 \ \lor\ \exists k,\ z_k \ge 1\}$.

Proof. If $z = 0$ there is nothing to prove. Let $z \neq 0$ and let $k_0 \in Q$ be such that $z_{k_0} \ge 1$. Then, the disjunction is satisfied because $a(k_0)^T x + d(k_0)^T y + (1 - b_{k_0}) z_{k_0} \ge 1$ is a relaxation of $a(k_0)^T x + d(k_0)^T y \ge b_{k_0}$, which is satisfied by hypothesis. To see that it is a relaxation just notice that $b_{k_0} \ge 1 - (1 - b_{k_0}) z_{k_0}$.

As written in the lemma above, this disjunction is not interesting due to the fact that for any given $z \in Z$, (1.6) is either the original disjunction or is redundant. However, by making $z$ depend on $y$, we obtain a non-trivial new disjunction.

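Theorem 1.2 (Balas and Jeroslow (1980, Theorem 3)). Let

$$\mathcal{M} := \Big\{ m \in \mathbb{Z}^Q : \sum_{k \in Q} m_k \ge 0 \Big\} \tag{1.7}$$

and consider $m^{(k)} \in \mathbb{R}^p$ for $k \in Q$ such that $(m^{(k)}_j)_{k \in Q} \in \mathcal{M}$ for all $j \in [p]$. Then,

$$\bigvee_{k \in Q} a^{(k)T} x + \big(d^{(k)} + (1 - b_k) m^{(k)}\big)^T y \ge 1 \tag{1.8}$$

is a valid disjunction for $(x, y) \in S$.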

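Proof. Let $(\bar x, \bar y) \in S$ and let $z \in \mathbb{Z}^Q$ be defined by $z_k = m^{(k)T} \bar y$. Since

$$\sum_{k \in Q} z_k = \sum_{j} \underbrace{\bar y_j}_{\in \mathbb{Z}_+} \underbrace{\sum_{k \in Q} m^{(k)}_j}_{\ge 0}$$

is non-negative and $z$ is integral, either $z = 0$ or $z_k \ge 1$ for some $k$, and we conclude that $z \in Z$.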

On the other hand, note that, for this choice of $z$, (1.6) reads

$$\bigvee_{k \in Q} a^{(k)T} x + d^{(k)T} y + (1 - b_k)\, m^{(k)T} \bar y \ge 1.$$


Since $z \in Z$, Lemma 1.1 implies that the previous disjunction is valid for every $(x, y) \in S$, in particular, for $(\bar x, \bar y)$. Evaluating the disjunction at $(\bar x, \bar y)$ yields

$$\bigvee_{k \in Q} a^{(k)T} \bar x + d^{(k)T} \bar y + (1 - b_k)\, m^{(k)T} \bar y \ge 1,$$

which is equivalent to evaluating (1.8) at $(\bar x, \bar y)$. Thus, $(\bar x, \bar y)$ satisfies (1.8). Since $(\bar x, \bar y) \in S$ was arbitrary, it follows that every $(x, y) \in S$ satisfies (1.8), as we wanted to show.

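The theorem implies that each $|Q|$-tuple $M = (m^{(k)} \in \mathbb{R}^p)_{k \in Q}$ such that $(m^{(k)}_j)_{k \in Q} \in \mathcal{M}$ for all $j \in [p]$ yields a new valid disjunction, namely (1.8), which in turn yields a new $S$-free set

$$C_M = \{(x, y) \in \mathbb{R}^{q+p} : \phi_{C_M}(x, y) \le 1\},$$

where $\phi_{C_M}(x, y) = \max_{k \in Q} a^{(k)T} x + (d^{(k)} + (1 - b_k) m^{(k)})^T y$. Therefore,

$$\sum_j \phi_{C_M}(e_j) x_j + \sum_j \phi_{C_M}(e_{q+j}) y_j \ge 1$$

is a valid inequality for $S$. This inequality reads

$$\sum_j \Big(\max_{k \in Q} a^{(k)}_j\Big) x_j + \sum_j \Big(\max_{k \in Q} \big(d^{(k)}_j + (1 - b_k) m^{(k)}_j\big)\Big) y_j \ge 1.$$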

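Choosing the best possible tuple $M$, that is, for each $j$ the best vector $m = (m^{(k)}_j)_{k \in Q} \in \mathcal{M}$, yields

$$\sum_j \Big(\max_{k \in Q} a^{(k)}_j\Big) x_j + \sum_j \Big(\inf_{m \in \mathcal{M}} \max_{k \in Q} \big(d^{(k)}_j + (1 - b_k) m_k\big)\Big) y_j \ge 1. \tag{1.9}$$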
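The strengthened coefficient of an integer variable in (1.9) can be approximated by brute force. The sketch below is an illustration under stated assumptions, not the procedure of Balas and Jeroslow: the instance, the bounds `b`, the search radius, and the function name `strengthened_coefficient` are hypothetical. It enumerates integer vectors $m$ with $\sum_k m_k \ge 0$ in a small box and keeps the smallest value of $\max_k (d^{(k)}_j + (1 - b_k) m_k)$. Every such $m$ gives a valid coefficient, so restricting the search to a box can only cost strength, not validity.

```python
from itertools import product


def strengthened_coefficient(d_j, b, radius=3):
    """Smallest max_k (d_j[k] + (1 - b[k]) * m[k]) over integer m in a box
    with sum(m) >= 0. Brute force; the box radius is an assumption."""
    best = None
    for m in product(range(-radius, radius + 1), repeat=len(d_j)):
        if sum(m) < 0:
            continue  # m must belong to the monoid of (1.7)
        value = max(d + (1 - bk) * mk for d, bk, mk in zip(d_j, b, m))
        if best is None or value < best:
            best = value
    return best


# Hypothetical instance: x in [0, 1], y in {0, 1, 2}, with the disjunction
# (2x - y >= 1) or (-x + 3y >= 1). On this region the terms are bounded
# from below by b = (-2, -1).
d_y = [-1, 3]          # coefficients of y in the two terms
b = [-2, -1]           # lower bounds of the two terms
plain = max(d_y)       # coefficient from the maximum principle
strong = strengthened_coefficient(d_y, b)

# Sanity check of the strengthened cut 2x + strong * y >= 1 on sampled points.
xs = [0.125 * i for i in range(9)]
valid = all(
    2 * x + strong * y >= 1
    for x, y in product(xs, range(3))
    if 2 * x - y >= 1 or -x + 3 * y >= 1
)
print(plain, strong, valid)
```

In this instance the choice $m = (1, -1)$ lowers the coefficient of $y$ below the value given by the maximum principle.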

1.4.3 Monoidal Strengthening

The general principle of monoidal strengthening is as follows. Assume we have a monoid $M$ and an $(S + M)$-free set $C = \{(x, y) : \phi(x, y) \le 1\}$ where $\phi$ is sublinear. If $\{(x, y) \in \mathbb{R}^n_+ \times \mathbb{Z}^p_+ : \sum_i r_i x_i + \sum_j d_j y_j \in S\}$ is a valid relaxation, then not only is $\sum_i \phi(r_i) x_i + \sum_j \phi(d_j) y_j \ge 1$ valid, but also

$$\sum_i \phi(r_i) x_i + \sum_j \inf_{m \in M} \phi(d_j + m)\, y_j \ge 1. \tag{1.10}$$

In particular, the previous cut is the strongest one that can be obtained with this technique.


The proof of validity follows from exploiting the integrality restrictions. Indeed, as $y_j$ is a non-negative integer, $\sum_j m_j y_j \in M$ for every $m_j \in M$. Every feasible solution satisfies $\sum_i r_i x_i + \sum_j d_j y_j \in S$ and so they also satisfy

$$\sum_i r_i x_i + \sum_j (d_j + m_j) y_j \in S + M + \sum_j m_j y_j \subseteq S + M + M = S + M, \tag{1.11}$$

where the last equality holds because $M$ is a monoid, in particular, closed under addition. Then, applying the cut generating function to (1.11), we obtain the valid inequality

$$\sum_i \phi(r_i) x_i + \sum_j \phi(d_j + m_j) y_j \ge 1.$$

As the $m_j \in M$ are arbitrary, we obtain (1.10).

For this technique to work, one actually needs a monoid. In the case

of Gomory cuts,

Z−f

. As

, one can use

as the

monoid for monoidal strengthening. One can also write the relaxation as

∑︁irixi

∑︁jdjyj∈S

and consider

to be

, in which case the monoid is

itself. In the literature, it is rather common to use

or a subset of it as the

monoid. For example, relaxations where

Zn∩P

, where

is a

polyhedron, or even a convex set have been studied, see for example Andersen

et al. (2007), Basu et al. (2010b), Morán and Dey (2011) and (Conforti et al.,

2014, Chapter 6). When

Zn∩P

and

is a polyhedron, a typical monoid

used for strengthening is

Zn∩lin

(

conv

(

)), see for example Dey and

Wolsey (2010), Conforti et al. (2011a) and Basu et al. (2012).

A more complicated setting is when

M

. The disjunctive case

corresponds to this more complicated setting, but to be able to see this, we

need to follow more closely the original derivation of Balas and Jeroslow (1980).

Furthermore, note that our exposition of the disjunctive case in Section 1.4.2

is not an application of the general principle of monoidal strengthening as

presented here, since we also modified the S-free set.

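The setting for the disjunctive case is that we have an optimization problem on the variables $(x, y) \in \mathbb{R}^q_+ \times \mathbb{Z}^p_+$ such that

$$\bigvee_{k \in Q} a^{(k)T} x + d^{(k)T} y \ge 1$$

is a valid disjunction. In Section 1.4.2, we represented this by taking

$$S = \Big\{(x, y) \in \mathbb{R}^q_+ \times \mathbb{Z}^p_+ : \bigvee_{k \in Q} a^{(k)T} x + d^{(k)T} y \ge 1\Big\}$$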


and $\sum_j e_j x_j + \sum_j e_{j+q} y_j = (x, y) \in S$ as our relaxation (1.4).

However, we can also represent the disjunction in a different way. Recall that $b_k$ are the lower bounds on the disjunctive terms $a^{(k)T} x + d^{(k)T} y$, see Section 1.4.2. Let

$$S_b = \Big\{w \in \mathbb{R}^Q : \bigvee_{k \in Q} w_k \ge 1,\ w \ge b\Big\}.$$

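We can model this disjunction via

$$\begin{pmatrix} a^{(1)T} x + d^{(1)T} y \\ \vdots \\ a^{(K)T} x + d^{(K)T} y \end{pmatrix} \in S_b \iff Ax + Dy \in S_b,$$

where $A = \begin{pmatrix} a^{(1)T} \\ \vdots \\ a^{(K)T} \end{pmatrix}$, $D = \begin{pmatrix} d^{(1)T} \\ \vdots \\ d^{(K)T} \end{pmatrix}$, and $Q = \{1, \ldots, K\}$.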

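We have that $C_b = \{w \in \mathbb{R}^Q : w_k \le 1 \text{ for all } k \in Q\}$ is a convex $S_b$-free set. Note that $C_b = \{w \in \mathbb{R}^Q : \phi_{C_b}(w) \le 1\}$, where $\phi_{C_b}(w) = \max_{k \in Q} w_k$, and it is sublinear. Note that $\phi_{C_b}(A_{\cdot j}) = \phi_C(e_j, 0)$ and $\phi_{C_b}(D_{\cdot j}) = \phi_C(0, e_j)$, where $\phi_C$ is defined in (1.5).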

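Now, consider the monoids $\mathcal{M}$ defined in (1.7) and $T = \{\tau \in \mathbb{R}^Q : \exists m \in \mathcal{M},\ \tau_k = (1 - b_k) m_k, \text{ for } k \in Q\}$. Let us see that $C_b$ is $(S_b + T)$-free. Let $\theta \in S_b + T$ so that $\theta = w + \tau$ with $w \in S_b$ and $\tau \in T$, and let $m \in \mathcal{M}$ be such that $\tau_k = (1 - b_k) m_k$. If $m = 0$, then $\theta \in S_b$ and so $\theta \notin \operatorname{int} C_b$, as $C_b$ is $S_b$-free. Otherwise, there exists $k \in Q$ such that $m_k > 0$, so $\tau_k \ge 1 - b_k$. Then, $\theta_k = w_k + \tau_k \ge b_k + 1 - b_k \ge 1$, thus, $\theta \notin \operatorname{int} C_b$.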

Summarizing, $C_b$ is not only $S_b$-free but also $(S_b + T)$-free. Thus, we can apply monoidal strengthening to obtain the cut (1.9). This argument is basically a modern rewrite of the argument in Balas and Jeroslow (1980). Note here that $S_b + T \neq S_b$.

In general, the challenge of monoidal strengthening is to find a monoid $M$ such that, given a closed set $S$ and an $S$-free set $C$, $C$ is also $(S + M)$-free, so that we can apply monoidal strengthening as above.

Chapter 2

On the Relation Between the Extended

Supporting Hyperplane Algorithm and

Kelley’s Cutting Plane Algorithm

In this chapter we revisit two classical algorithms for convex mixed integer

optimization, namely, Kelley’s cutting plane algorithm and Veinott’s supporting hyperplane algorithm. The motivation to look into these algorithms is the following. Some state-of-the-art LP-based MINLP solvers enforce convex constraints by adding gradient cutting planes. Simple examples show that these

cuts do not necessarily support the feasible region, and so they are dominated.

In order to build undominated cuts, or equivalently, supporting cuts, different

separation procedures are needed such as the one proposed by Veinott.

However, it is not always the case that gradient cutting planes are not

supporting. Thus, the purpose of this chapter is to understand when gradient

cutting planes are supporting. Our findings naturally suggest a reformula-

tion of the feasible region for which every gradient cut is supporting. As a

consequence, we can show that Veinott’s supporting hyperplane algorithm is

just a special case of Kelley’s cutting plane algorithm. As a result, we extend

the applicability of the supporting hyperplane algorithm to convex problems

represented by a class of general, not necessarily convex nor differentiable,

functions.

The insights obtained in this chapter, together with an interpretation

of gradient cutting planes as intersection cuts presented in Chapter 4 will

motivate the basic construction of maximal quadratic-free sets presented in

Chapter 5.

The chapter is organized as follows. In Section 2.1 we introduce the object

of study of this chapter and review the literature on cutting plane approaches

and efforts on obtaining supporting valid inequalities. In Section 2.2, we char-

acterize functions whose linearizations are supporting hyperplanes to their


0-sublevel sets. Section 2.3 introduces the gauge function and shows how to

use it for building supporting hyperplanes. We note that evaluating the gauge

function is equivalent to the line search step of the supporting hyperplane

algorithm. This equivalence provides the link between the supporting hyper-

plane and Kelley’s cutting plane algorithm. In Section 2.4, we show that the

cutting planes generated by the supporting hyperplane algorithm can also

be generated by Kelley’s algorithm when applied to a reformulation of the

problem. This implies that the convergence of the supporting hyperplane al-

gorithm follows from Kelley’s. In Section 2.5, we show that we can apply the

supporting hyperplane algorithm to problems whose feasible region is convex

but represented via functions that are not necessarily convex nor differentiable.

We introduce the concept of a well-behaved generalized directional derivative

and show that if the functions have well-behaved generalized directional deriv-

atives and 0 does not belong to the generalized subdifferential at points where

the functions are zero, then the supporting hyperplane algorithm converges.

Finally, Section 2.6 presents our concluding remarks.

This chapter is joint work with Ambros Gleixner and Robert Schwarz and

has been submitted to the Journal of Global Optimization.

2.1 Background

A mixed integer convex program (MICP) is a problem of the form

$$\min\{c^T x : x \in C \cap (\mathbb{Z}^p \times \mathbb{R}^{n-p})\}, \tag{2.1}$$

where $C$ is a closed convex set, $c \in \mathbb{R}^n$, and $p$ denotes the number of variables with integrality requirement. The use of a linear objective function is without loss of generality given that one can always transform a problem with a convex objective function into a problem of the form (2.1). We can represent the set $C$ in different ways, one of the most common being as the intersection of sublevel sets of convex differentiable functions, that is,

$$C = \{x \in \mathbb{R}^n : g_j(x) \le 0,\ j \in J\}. \tag{2.2}$$

Here, $J$ is a finite index set and each $g_j$ is convex and differentiable.

Several methods have been proposed for solving MICP. When the problem is continuous and represented as (2.2), one of the first proposed methods was the cutting plane algorithm by J. E. Kelley (1960). This algorithm exploits the convexity of a constraint function $g$ to build gradient cuts.

The idea of Kelley’s cutting plane (KCP) algorithm is to approximate the

feasible region with a polytope, solve the resulting linear program (LP) and, if


the LP solution is not feasible, separate it using gradient cuts to obtain a new

polytope which is a better approximation of the feasible region and repeat,

see Algorithm 2.1.

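Algorithm 2.1: Kelley’s cutting plane algorithm

    $LP = \{x : x \in [l, u]\}$, $\bar x \leftarrow \arg\min_{x \in LP} c^T x$
    while $\max_{j \in J} g_j(\bar x) > \epsilon$ do
        forall $j$ such that $g_j(\bar x) > 0$ do
            $LP \leftarrow LP \cap \{x : g_j(\bar x) + \nabla g_j(\bar x)(x - \bar x) \le 0\}$
        $\bar x \leftarrow \arg\min_{x \in LP} c^T x$
    return $\bar x$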
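The loop of Algorithm 2.1 can be sketched in a few lines of Python for problems in two variables. This is an illustrative sketch under assumptions, not the implementation used in the thesis: the LP is solved by enumerating vertices (only sensible for tiny bounded 2D problems), `max_iter` is an added safeguard, and the test instance (a linear objective over the unit disk) as well as the names `solve_lp_2d`, `kelley`, `circle` and `circle_grad` are hypothetical.

```python
from itertools import combinations


def solve_lp_2d(c, constraints, tol=1e-9):
    """Minimize c^T x subject to a^T x <= b for (a, b) in constraints by
    enumerating vertices. Assumes a bounded, nonempty 2D polytope."""
    best, best_val = None, None
    for (a1, b1), (a2, b2) in combinations(constraints, 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) < 1e-12:
            continue  # (nearly) parallel lines define no vertex
        # Cramer's rule for the intersection of the two boundary lines.
        x = ((b1 * a2[1] - a1[1] * b2) / det, (a1[0] * b2 - b1 * a2[0]) / det)
        if all(a[0] * x[0] + a[1] * x[1] <= b + tol for a, b in constraints):
            val = c[0] * x[0] + c[1] * x[1]
            if best_val is None or val < best_val:
                best, best_val = x, val
    return best


def kelley(c, gs, grads, lower, upper, eps=1e-6, max_iter=500):
    """Kelley's cutting plane loop on the box [lower, upper]."""
    constraints = [((1.0, 0.0), upper[0]), ((-1.0, 0.0), -lower[0]),
                   ((0.0, 1.0), upper[1]), ((0.0, -1.0), -lower[1])]
    x = solve_lp_2d(c, constraints)
    for _ in range(max_iter):  # safeguard, not part of Algorithm 2.1
        if max(g(x) for g in gs) <= eps:
            break
        for g, grad in zip(gs, grads):
            gx = g(x)
            if gx > 0:
                dg = grad(x)
                # g(x_bar) + dg^T (y - x_bar) <= 0  <=>  dg^T y <= dg^T x_bar - g(x_bar)
                constraints.append((dg, dg[0] * x[0] + dg[1] * x[1] - gx))
        x = solve_lp_2d(c, constraints)
    return x


def circle(x):
    return x[0] ** 2 + x[1] ** 2 - 1.0


def circle_grad(x):
    return (2.0 * x[0], 2.0 * x[1])


# Hypothetical instance: minimize -x1 - x2 over the unit disk inside [-2, 2]^2.
x_opt = kelley((-1.0, -1.0), [circle], [circle_grad],
               (-2.0, -2.0), (2.0, 2.0), eps=1e-4)
print(x_opt)
```

Each LP solution is an optimum of a relaxation, so its objective value bounds the true optimum from below; the loop stops once the LP solution is feasible up to `eps`.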

Kelley shows that the algorithm converges to the optimum and it converges

in finite time to a point close to the optimum. By solving integer programs

(IP) using the cutting planes of Gomory (1958) instead of LP relaxations,

Kelley shows that his cutting plane algorithm solves purely integer convex

programs in finite time. The same algorithm works just as well for MICP. However, Kelley did not have access to a finite algorithm for solving mixed integer linear programs (MILP).

In an attempt to speed up Kelley’s algorithm, Veinott (1967) proposes

the supporting hyperplane algorithm (SH). A possible issue with Kelley’s

algorithm is that, in general, gradient cuts do not support the feasible region,

see Figure 2.1. Therefore, it is expected that better relaxations can be achieved

by using supporting cutting planes.

In order to construct supporting hyperplanes, Veinott suggests building gradient cuts at boundary points of $C$. He uses an interior point of $C$ to find the point on the boundary, $\hat x$, that intersects the segment joining the interior point and the solution of the current relaxation. These cuts are automatically supporting hyperplanes of $C$ at $\hat x$. However, since the cut is computed at $\hat x$, which is in $C$, it might happen that the gradient of the constraints active at $\hat x$ vanishes. For this reason, Veinott also requires that the functions representing $C$ have non-vanishing gradients at the boundary. This is immediately implied by, e.g., Slater’s condition (Section 1.3). Veinott also identifies that one can use his algorithm to solve (2.1) when representing $C$ by quasi-convex functions, that is, functions whose sublevel sets are convex.

Recently, Kronqvist et al. (2016) rediscovered and implemented Veinott’s

algorithm (Veinott, 1967). They call their algorithm the extended supporting

hyperplane algorithm (ESH). They discuss the practical importance of choos-

ing a good interior point and propose some improvements over the original


method, such as solving LP relaxations during the first iterations instead of the

more expensive MILP relaxation. As a result, they present a computationally

competitive solver implementation for MICPs defined by convex differentiable

constraint functions (Kronqvist et al., 2018).

In this chapter, we would like to understand when, given a convex differentiable function $g$, gradient cuts of $g$ are supporting to the convex set $C = \{x \in \mathbb{R}^n : g(x) \le 0\}$. This question is motivated by the fact that in this case Kelley’s algorithm automatically becomes a supporting hyperplane algorithm. In Theorem 2.3 we give a necessary and sufficient condition for a gradient cut of $g$ at a given point to be a supporting hyperplane of $C$. In particular, this condition suggests looking at sublinear functions, i.e., convex and positively homogeneous functions. As it turns out, this naturally leads to Veinott’s algorithm.

Sublinear functions and convex sets are deeply related. When the origin is in the interior of a convex set $C$, then we can represent $C$ via its gauge function $\varphi_C$, which is sublinear (Rockafellar, 1970). We give the formal definition of the gauge function in Section 2.3, but for now it suffices to know that we can represent $C$ as $C = \{x \in \mathbb{R}^n : \varphi_C(x) \le 1\}$ and that, in particular, for every $\bar x \neq 0$ a gradient cut of $\varphi_C$ at $\bar x$ supports all of its sublevel sets. The following example illustrates this.

Example 2.1. Consider the convex feasible region given by

$$C = \{(x, y) \in \mathbb{R}^2 : g(x, y) \le 0\},$$

where $g(x, y) = x^2 + y^2 - 1$. We show through an example that gradient cuts are not necessarily supporting to $C$, explain why this happens, and show that changing the representation of $C$ to use its gauge function solves the issue.

Separating the infeasible point $\bar x = (3/2, 3/2)$ by a gradient cut of $g$ at $\bar x$ gives

$$g(\bar x) + \nabla g(\bar x)(x - \bar x) \le 0 \iff x + y \le \tfrac{11}{6}.$$

This cut does not support $C$, see Figure 2.1. Alternatively, the gauge function is given by $\varphi_C(x, y) = \sqrt{x^2 + y^2}$ and $C = \{(x, y) : \sqrt{x^2 + y^2} \le 1\}$. The gradient cut of $\varphi_C$ at $\bar x$ is $x + y \le \sqrt{2}$, which is supporting.
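The two cuts of Example 2.1 can be checked numerically. The sketch below assumes $g(x, y) = x^2 + y^2 - 1$ and the point $\bar x = (3/2, 3/2)$, the values consistent with the gauge cut $x + y \le \sqrt{2}$; the helper names `gradient_cut` and `support_gap` are hypothetical. A cut $a^T z \le \beta$ supports the unit disk exactly when $\beta$ equals $\max\{a^T z : \|z\| \le 1\} = \|a\|$.

```python
import math


def gradient_cut(value, grad, point, level):
    """Cut value + grad^T (z - point) <= level, returned as (a, rhs) with a^T z <= rhs."""
    rhs = level - value + grad[0] * point[0] + grad[1] * point[1]
    return grad, rhs


def support_gap(a, rhs):
    """rhs minus the maximum of a^T z over the unit disk; zero iff the cut supports it."""
    return rhs - math.hypot(a[0], a[1])


x_bar = (1.5, 1.5)  # assumed point to separate

# Representation by g(x, y) = x^2 + y^2 - 1 <= 0.
g_val = x_bar[0] ** 2 + x_bar[1] ** 2 - 1.0
g_grad = (2.0 * x_bar[0], 2.0 * x_bar[1])
a_g, rhs_g = gradient_cut(g_val, g_grad, x_bar, 0.0)

# Representation by the gauge phi(x, y) = sqrt(x^2 + y^2) <= 1.
norm = math.hypot(x_bar[0], x_bar[1])
phi_grad = (x_bar[0] / norm, x_bar[1] / norm)
a_phi, rhs_phi = gradient_cut(norm, phi_grad, x_bar, 1.0)

gap_g = support_gap(a_g, rhs_g)        # positive: the cut stays away from the disk
gap_phi = support_gap(a_phi, rhs_phi)  # (numerically) zero: the cut touches the disk
print(rhs_g / a_g[0], gap_g, gap_phi)
```

Dividing the first cut by its common coefficient recovers $x + y \le 11/6$, while the gauge cut has right-hand side $\sqrt{2}$ after the same normalization.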

From the previous discussion it is a natural idea to represent $C$ via its gauge function, namely, $C = \{x \in \mathbb{R}^n : \varphi_C(x) \le 1\}$. However, as mentioned before, $C$ is usually given by (2.2). Our main contribution is to show that reformulating (2.2) to the gauge representation will naturally lead to the ESH algorithm, see Section 2.3.2. As a consequence, the convergence proofs of Veinott (1967) and Kronqvist et al. (2016) follow directly from the convergence proof of Kelley’s cutting plane algorithm (J. E. Kelley, 1960; Horst and Tuy, 1990), see Section 2.4. In other words, we show that the ESH algorithm is the KCP algorithm applied to a different representation of the problem.²

Figure 2.1: The feasible region $C$ and the infeasible point $\bar x = (3/2, 3/2)$ to separate. On the left we see that the separating hyperplane is not supporting to $C$. On the right we see why this happens: the linearization of $g$ at $\bar x$ is tangent to the epigraph of $g$ (shown upside-down for clarity) at $(\bar x, g(\bar x))$. However, when this hyperplane intersects the $x$-$y$-plane, it is already far away from the epigraph, and in consequence, from the sublevel set. The intersection of the hyperplane with the $x$-$y$-plane is the gradient cut.

Motivated by this approach of representing $C$ by its gauge function, we are able to show that the ESH algorithm applied to (2.1) converges even when $C$ is not represented by convex functions. This is related to recent work of Lasserre (2009) that tries to understand how different techniques behave when the convex set $C$ is not represented via (2.2). Lasserre considers sets $C = \{x \in \mathbb{R}^n : g_j(x) \le 0,\ j \in J\}$ where the $g_j$ are only differentiable, but not necessarily convex, in the following setting:

² Strictly speaking, when the problem is mixed integer, the KCP algorithm only corresponds to the so-called LP-step (Kronqvist et al., 2016) of the ESH algorithm. However, given that the KCP algorithm allows for a straightforward extension to the mixed integer case, we will continue to compare the KCP algorithm to the ESH algorithm with respect to their technique of generating cutting planes.


Assumption 2.2. For all $x \in C$ and all $j \in J$, if $g_j(x) = 0$, then $\nabla g_j(x) \neq 0$.

Under this assumption, that is, if the gradients of active constraints do not vanish at the boundary of $C$, Lasserre shows that the KKT conditions are not only necessary but also sufficient for global optimality. In other words, every minimizer is a KKT point and every KKT point is a minimizer.

A series of generalizations follow the work of Lasserre. Dutta and Lalitha (2011) generalize the previous result to the case where $C$ is represented by locally Lipschitz functions, not necessarily differentiable nor convex, but regular in the sense of Clarke (Clarke, 1990), see also Definition 2.15. Martínez-Legaz (2014) further generalizes the result to the case where $C$ is represented by tangentially convex functions (Lemaréchal, 1986; Pshenichnyi, 1971). Kabgani et al. (2017) generalize the result to the case where $C$ is represented by functions that admit an upper regular convexificator (URC) (Jeyakumar and Luc, 1999), see also Definition 2.16. We note that regular functions in the sense of Clarke and tangentially convex functions admit a URC (Kabgani et al., 2017), thus the URC assumption is the most general among the ones considered in these works.

In terms of computations, Lasserre (2011, 2014) proposes an algorithm to

find the KKT point via log-barrier functions. He shows that the algorithm

converges to the KKT point if Assumption 2.2 holds.

For all these concepts of generalized derivative, there is a notion of directional derivative and a notion of subdifferential. For example, for functions that admit a URC, the notion of directional derivative is the upper Dini directional derivative and its subdifferential is the URC, see Definition 2.16. Let $f$ be a function and let us denote by $f'(x; d)$ a generalized directional derivative. We say that the directional derivative is well-behaved if $f'(x; d) > 0$ implies that there exists $t_n \searrow 0$ such that $f(x + t_n d) > f(x)$.

In this sense we show that if $C$ is represented by functions whose generalized directional derivatives are well-behaved, then the ESH converges to the global optimum, under the equivalent of Assumption 2.2 (see (2.8)) for the corresponding subdifferential. The upper Dini directional derivative is certainly well-behaved and, thus, our result shows that the ESH converges when $C$ is represented by functions that admit a URC. We also show that for $\partial^\circ$-pseudoconvex (see Definition 2.19) constraints, the Clarke directional derivative (see Definition 2.15) is well-behaved. Therefore, our result generalizes the result of Eronen et al. (2017) that the ESH converges when $C$ is represented by $\partial^\circ$-pseudoconvex functions.

We also show, via an example, that if we use Clarke’s subdifferential (Clarke, 1990), the ESH does not need to converge when the functions are only Lipschitz continuous but not regular in the sense of Clarke.

Finally, we provide a characterization of convex functions whose lineariza-

tions are supporting to their sublevel sets. Although elementary, the authors

are not aware of its presence in the literature. In particular, this result allows

us to identify some families of functions for which gradient cuts are never

supporting (see Example 2.7) and some for which they are always supporting

(see Corollary 2.5 and Example 2.6).

2.1.1 Literature Review

We can think of the algorithms of J. E. Kelley (1960) and Veinott (1967) as a

mixture of two ingredients: which relaxation to solve and where to compute the

cutting plane. Indeed, at each iteration we have a point

that we would like

to separate with a linear inequality

αT

(

x−x0

)

≤

0. For Kelley’s algorithm,

, while for Veinott’s algorithm,

x0∈∂C

, and for both

α∈∂g

(

)

and

(

). Choosing different relaxations and different points where to

compute the cutting planes yields different algorithms. This framework is

developed in Horst and Tuy (1990).

Following the previous framework, Duran and Grossmann (1986) propose

the so-called outer approximation algorithm for MICP. The idea is to solve

an MILP relaxation, but instead of computing a cutting plane at the MILP op-

timum, or at the boundary point on the segment between the MILP optimum

and some interior point, they suggest to compute cutting planes at a solution

of the nonlinear program (NLP) obtained after fixing the integer variables to

the integer values given by the MILP optimal solution. This is a much more

expensive algorithm but has the advantage of finite convergence. Of course,

this does not work in complete generality and we need some assumptions, for

example, requiring some constraint qualifications. Moreover, when obtaining

an infeasible NLP after fixing the integer variables, care must be taken to pre-

vent the same integer assignment in future iterations. To handle such cases,

Duran and Grossmann propose the use of integer cuts. However, Fletcher

and Leyffer (1994) point out that this is not necessary. They show that the

gradient cuts at the solution of a slack NLP separate the integer assignment. Eronen et al. (2012) show that a naive generalization of the outer approximation algorithm to the non-differentiable case will not work. They provide a generalization for a particular class of functions. Wei and Ali (2015a,b) provide

further generalizations to the non-differentiable case.

A related algorithm to the outer approximation method is the so-called

generalized Benders decomposition (Geoffrion, 1972). We refer to Duran and

Grossmann (1986); Fletcher and Leyffer (1994); Quesada and Grossmann


(1992) for discussions about the relation between these two algorithms. Wei

and Ali (2015c) extend the generalized Benders decomposition to Banach

spaces.

Westerlund and Pettersson (1995) propose the so-called extended cutting plane algorithm. This algorithm is the extension of Kelley’s cutting plane to MICP and they show that the algorithm converges. Further extensions and convergence proofs of cutting plane and outer approximation algorithms for non-smooth problems are given in Eronen et al. (2012). An interesting generalization of the extended cutting plane algorithm to solve a class of non-convex problems is the so-called $\alpha$ extended cutting plane algorithm introduced by Westerlund et al. (1998). They consider problem (2.1) where $C$ is represented by differentiable pseudoconvex constraints. The idea is that, even though a gradient cut might not be valid, one can tilt the cut in order to make it valid. The tilting is done by multiplying the gradient by some $\alpha$, hence the name. We refer to Westerlund et al. (1998) for more details.

As mentioned at the beginning, the assumption that the objective function

is linear is without loss of generality, provided that the original objective function is convex. However, some classes of problems cannot be encompassed by (2.1), for example, when the objective function is quasi-convex. An extension of the KCP algorithm, the ($\alpha$) extended cutting plane algorithm, and the

ESH to convex problems with a class of quasi-convex objectives were devel-

oped by Plastria (1985), Eronen et al. (2013), and Westerlund et al. (2018),

respectively.

Yet another technique for producing tight cuts is to project the point to be separated onto $C$ (Horst and Tuy, 1990). Using the projected point and the

difference between the point and its projection, one can build a supporting

hyperplane that separates the point. In the same reference, Horst and Tuy

show that this algorithm converges.

There have been attempts at building tighter relaxations by ensuring that

gradient cuts are supporting, in a more general context than convex mixed

integer nonlinear programming. Belotti et al. (2009) consider bivariate convex constraints of the form $f(x) - y \le 0$, where $f$ is a univariate convex function. They propose projecting the point to be separated onto the curve $y = f(x)$ and building a gradient cut at the projection. However, their motivation is not

to find supporting hyperplanes, but to find the most violated cut. Indeed, as

we will see, gradient cuts for these types of constraints are always supporting

(Example 2.6). Other work along these lines includes the one by Lubin et al.

(2015), where the authors derive an efficient procedure to project onto a two

dimensional constraint derived from a Gaussian linear chance constraint, thus

building supporting valid inequalities.


Another algorithm for solving non-smooth convex optimization problems is the so-called bundle method (Hiriart-Urruty and Lemaréchal, 1993). This method has also been extended to consider the mixed integer case by de Oliveira (2016).

Finally, in terms of applications, we would like to point out that the supporting hyperplane algorithm is very popular in stochastic optimization (van Ackooij et al., 2018, 2013; van Ackooij and de Oliveira, 2016; Arnold et al., 2013; Prékopa, 1995; Prékopa and Szántai, 1978; Szántai, 1988).

2.2 Characterization of Functions with Supporting Linearizations

We now give necessary and sufficient conditions for the linearization of a convex, not necessarily differentiable, function $g$ at a point $\bar x$ to support the region $C = \{x \in \mathbb{R}^n : g(x) \le 0\}$. In order for this to happen, the supporting hyperplane has to support the epigraph on the whole segment joining the point of $C$ where it supports and $(\bar x, g(\bar x))$. In other words, the function must be affine on the segment joining the set $C$ and $\bar x$. This is due to the convexity of $g$.

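Theorem 2.3. Let $g : \mathbb{R}^n \to \mathbb{R}$ be a convex function, $C = \{x \in \mathbb{R}^n : g(x) \le 0\} \neq \emptyset$, and $\bar x \notin C$. There exists a subgradient $v \in \partial g(\bar x)$ such that the valid inequality

$$g(\bar x) + v^T (x - \bar x) \le 0 \tag{2.3}$$

supports $C$, if and only if, there exists $x_0 \in C$ such that $\lambda \mapsto g(x_0 + \lambda(\bar x - x_0))$ is affine in $[0, 1]$.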

Proof. ($\Rightarrow$) Let $x_0 \in \partial C$ be the point where (2.3) supports $C$. The idea is to show that the affine function $x \mapsto g(\bar x) + v^T (x - \bar x)$ coincides with $g$ at two points, $\bar x$ and $x_0$. Then, by the convexity of $g$, it must coincide with $g$ on the segment joining both points.

In more detail, by definition of $x_0$ we have

$$g(\bar x) + v^T (x_0 - \bar x) = 0. \tag{2.4}$$

For $\lambda \in [0, 1]$, let $l(\lambda) = x_0 + \lambda(\bar x - x_0)$ and $\rho(\lambda) = g(l(\lambda))$. Since $g$ is convex and $l$ affine, $\rho$ is convex.

Since $v$ is a subgradient,

$$g(\bar x) + v^T (l(\lambda) - \bar x) \le \rho(\lambda) \quad \text{for every } \lambda \in [0, 1].$$


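After some algebraic manipulation and using that $\rho(1) = g(\bar x) = v^T (\bar x - x_0)$ we obtain

$$\rho(1) \lambda \le \rho(\lambda).$$

On the other hand, $\rho(0) = 0$ and $\rho(\lambda)$ is convex, thus we have $\rho(\lambda) \le \lambda \rho(1) + (1 - \lambda) \rho(0) = \lambda \rho(1)$ for $\lambda \in [0, 1]$. Therefore, $\rho(\lambda) = \lambda \rho(1)$, hence $g(l(\lambda))$ is affine in $[0, 1]$.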

($\Leftarrow$) The idea is to show that there is a supporting hyperplane $H$ of $\operatorname{epi} g \subseteq \mathbb{R}^n \times \mathbb{R}$ which contains the graph of $g$ restricted to the segment joining $x_0$ and $\bar x$, that is, $A = \{(x_0 + \lambda(\bar x - x_0),\ g(x_0 + \lambda(\bar x - x_0))) : \lambda \in [0, 1]\}$. Then the intersection of such $H$ with $\mathbb{R}^n \times \{0\}$ will give us (2.3).

The set $A$ is a convex nonempty subset of $\operatorname{epi} g$ that does not intersect the relative interior of $\operatorname{epi} g$. Hence, there exists a supporting hyperplane,

$$H = \{(x, z) \in \mathbb{R}^n \times \mathbb{R} : v^T x + a z = b\},$$

to $\operatorname{epi} g$ containing $A$ (Rockafellar, 1970, Theorem 11.6).

Since $g(x_0) \le 0$ and $g(\bar x) > 0$, it follows that $A$ is not parallel to the $x$-space. Therefore, $H$ is also not parallel to the $x$-space and so $v \neq 0$. Since $H$ is not parallel to the $z$-axis, it follows that $a \neq 0$. We assume, without loss of generality, that $a = -1$.

The point $(\bar x, g(\bar x))$ belongs to $A \subseteq H$, thus $v^T \bar x - g(\bar x) = b$ and $H = \{(x,\ g(\bar x) + v^T (x - \bar x)) : x \in \mathbb{R}^n\}$. Given that $H$ supports the epigraph, $v$ is a subgradient of $g$, in particular,

$$g(\bar x) + v^T (x - \bar x) \le g(x) \quad \text{for every } x \in \mathbb{R}^n.$$

Let $z(x)$ be the affine function whose graph is $H$, that is, $z(x) = g(\bar x) + v^T (x - \bar x)$. We now need to show that $g(\bar x) + v^T (x - \bar x) \le 0$ supports $C$ by exhibiting an $\hat x \in C$ such that $g(\bar x) + v^T (\hat x - \bar x) = 0$. By construction, $z(x_0 + \lambda(\bar x - x_0)) = g(x_0 + \lambda(\bar x - x_0))$. Since $z(x_0 + \lambda(\bar x - x_0))$ is non-positive for $\lambda = 0$ and positive for $\lambda = 1$, it has to be zero for some $\lambda_0$. Let $\hat x = x_0 + \lambda_0 (\bar x - x_0)$. Then $g(\hat x) = z(\hat x) = 0$ and we conclude that $\hat x \in C$ and $g(\bar x) + v^T (\hat x - \bar x) = 0$.

Specializing the theorem to differentiable functions directly leads to the

following:

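Corollary 2.4. Let $g : \mathbb{R}^n \to \mathbb{R}$ be a convex differentiable function, $C = \{x \in \mathbb{R}^n : g(x) \le 0\}$, and $\bar x \notin C$. Then the valid inequality

$$g(\bar x) + \nabla g(\bar x)^T (x - \bar x) \le 0$$

supports $C$, if and only if, there exists $x_0 \in C$ such that $\lambda \mapsto g(x_0 + \lambda(\bar x - x_0))$ is affine in $[0, 1]$.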


Proof. Since $g$ is differentiable, the subdifferential of $g$ consists only of the gradient of $g$.

A natural candidate for functions with supporting gradient cuts at every

point are functions whose epigraph is a translation of a convex cone.

Corollary 2.5 (Sublinear functions). Let $h(x)$ be a sublinear function. For this type of function, gradient cuts always support $C = \{x : h(x) \le c\}$, for any $c \ge 0$.

Proof. This follows directly from Theorem 2.3, since $0 \in C$ and $\lambda \mapsto h(\lambda \bar x)$ is affine in $\mathbb{R}_+$ for any $\bar x$.

However, these are not the only functions that satisfy the conditions of Theorem 2.3 for every point. The previous theorem implies that linearizations always support the constraint set if a convex constraint $g(x) \le 0$ is linear in one of its arguments.

Example 2.6 (Functions with linear variables). Let $f : \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}$ be a convex function of the form $f(x, y) = g(x) + a^T y + c$, with $a \neq 0$ and $g : \mathbb{R}^m \to \mathbb{R}$ convex. Then gradient cuts support $C = \{(x, y) : f(x, y) \le 0\}$. Indeed, assume without loss of generality that $a_1 > 0$ and let $(\bar x, \bar y) \notin C$. Then there exists a $\lambda > 0$ such that $f(\bar x, \bar y - \lambda e_1) = g(\bar x) + a^T \bar y + c - a_1 \lambda = 0$. The statement follows from Theorem 2.3.

Consider separating a point $(x_0, z_0)$ from a constraint of the form $z = g(x)$ with $g : \mathbb{R} \to \mathbb{R}$ and convex, with $z_0 < g(x_0)$ (that is, separating on the convex constraint $g(x) \le z$). As mentioned earlier, Belotti et al. (2009) suggest projecting $(x_0, z_0)$ to the graph $z = g(x)$ and computing a gradient cut there. This example shows that this step is unnecessary when the sole purpose is to obtain a cut that is supporting to the graph.

By contrast, if $g(x)$ is strictly convex, linearizations at points $x$ such that $g(x) \neq 0$ are never supporting to $g(x) \le 0$. This follows directly from Theorem 2.3 since $\lambda \mapsto g(x + \lambda v)$ is not affine for any $v \neq 0$. We can also characterize convex quadratic functions with supporting linearizations.

Example 2.7 (Convex quadratic functions). Let $g(x) = x^T A x + b^T x + c$ be a convex quadratic function, i.e., $A$ is an $n \times n$ symmetric and positive semi-definite matrix. We show that gradient cuts support $C = \{x \in \mathbb{R}^n : g(x) \le 0\}$ if and only if $b$ is not in the range of $A$, i.e., $b \notin R(A) = \{Ax : x \in \mathbb{R}^n\}$.

First notice that $\rho(\lambda) = g(\bar x + \lambda v)$ is affine linear, if and only if, $v \in \ker(A)$.


Let

v∈ker

(

) and

x¯/∈C

. Then there is a

λ∈R

such that

x¯

λv ∈C

if and only if

is not constant. Thus, gradient cuts are not supporting, if

and only if,

is constant for every

v∈ker

(

). But

is constant for every

v∈ker

(

), if and only if,

bTv

= 0 for every

v∈ker

(

), which is equivalent

b∈ker

(

)

⊥

(

) =

(

), since

is symmetric. Hence, gradient cuts

support C, if and only if, b /∈R(A).

In particular, if $b = 0$, i.e., there are no linear terms in the quadratic function, then gradient cuts are never supporting hyperplanes. Also, if $A$ is invertible, $b \in R(A)$ and gradient cuts are not supporting. This is to be expected since in this case $g$ is strictly convex.
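The dichotomy of Example 2.7 can be checked numerically. The following is a minimal sketch in plain Python that evaluates the linearization of two convex quadratics at an infeasible point. The two test functions, the evaluation points, and the helper name `linearization` are illustrative assumptions and do not appear in the original text.

```python
def linearization(g, grad, x_bar):
    """Return the function x -> g(x_bar) + grad(x_bar)^T (x - x_bar)."""
    g0, v = g(x_bar), grad(x_bar)
    return lambda x: g0 + sum(vi * (xi - bi) for vi, xi, bi in zip(v, x, x_bar))

# Case 1: g(x) = x1^2 - x2, i.e. A = diag(1, 0), b = (0, -1), and b is not in R(A).
parabola = lambda x: x[0] ** 2 - x[1]
parabola_grad = lambda x: (2 * x[0], -1.0)
cut = linearization(parabola, parabola_grad, (2.0, 1.0))   # (2, 1) is infeasible
touch = (2.0, 4.0)                                         # feasible: g(touch) = 0
parabola_slack = cut(touch)                                # zero: the cut touches C

# Case 2: g(x) = x1^2 + x2^2 - 1, i.e. A = I, b = 0, and b is in R(A).
disc = lambda x: x[0] ** 2 + x[1] ** 2 - 1.0
disc_grad = lambda x: (2 * x[0], 2 * x[1])
cut = linearization(disc, disc_grad, (3.0, 4.0))           # (3, 4) is infeasible
closest = (0.6, 0.8)                                       # maximizes the cut over the disc
disc_slack = cut(closest)                                  # negative: the cut misses C
```

In the first case the linearization vanishes at a feasible point, so the gradient cut is supporting. In the second case the largest value of the linearization over the disc is strictly negative, so the cut is valid but not supporting.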

2.3 The Gauge Function

Any MICP of form (2.1) can be reformulated to an equivalent MICP with a single constraint for which every linearization supports the continuous relaxation of the feasible region. To this end, we can use any sublinear function whose 1-sublevel set is $C$. Each convex set $C$ has at least one sublinear function that represents it, namely, the gauge function (Rockafellar, 1970) of $C$.

Definition 2.8. Let $C \subseteq \mathbb{R}^n$ be a convex set such that $0 \in \operatorname{int} C$. The gauge of $C$ is

$\varphi_C(x) = \inf\{t > 0 : x \in tC\}$.

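Proposition 2.9 (Tuy (2016, Proposition 1.11)). Let $C \subseteq \mathbb{R}^n$ be a convex set such that $0 \in \operatorname{int} C$, then $\varphi_C(x)$ is sublinear. If, in addition, $C$ is closed, then it holds that

$C = \{x \in \mathbb{R}^n : \varphi_C(x) \le 1\}$

and

$\partial C = \{x \in \mathbb{R}^n : \varphi_C(x) = 1\}$.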

Combining Proposition 2.9 with Corollary 2.5, we can see that the gauge

function is appealing for separation, because it always generates supporting

hyperplanes.
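To make Definition 2.8 and Proposition 2.9 concrete, here is a small sketch for a polyhedral set $C = \{x : a_i^T x \le b_i\}$ with every $b_i > 0$, for which the gauge reduces to $\max\{0, \max_i a_i^T x / b_i\}$. This closed form follows from the definition and is not taken from the source; the box and the function name `gauge_polytope` are hypothetical.

```python
def gauge_polytope(rows, rhs, x):
    """Gauge of C = {x : a_i^T x <= b_i for all i}, assuming every b_i > 0."""
    ratios = (sum(a * xi for a, xi in zip(row, x)) / b for row, b in zip(rows, rhs))
    return max(0.0, max(ratios))

# Box -1 <= x1 <= 2, -1 <= x2 <= 1 written as four inequalities a_i^T x <= b_i.
rows = [(1, 0), (-1, 0), (0, 1), (0, -1)]
rhs = [2, 1, 1, 1]

x_bar = (4.0, 1.0)                               # a point outside the box
phi = gauge_polytope(rows, rhs, x_bar)           # gauge value at x_bar
x_hat = tuple(xi / phi for xi in x_bar)          # scaled point x_bar / phi
phi_hat = gauge_polytope(rows, rhs, x_hat)       # gauge on the boundary
phi_scaled = gauge_polytope(rows, rhs, tuple(3 * xi for xi in x_bar))
```

The scaled point has gauge value one, in line with the description of $\partial C$ in Proposition 2.9, and tripling the argument triples the gauge, as positive homogeneity requires.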

2.3.1 Using the Gauge Function for Separation

Even though the gauge function is exactly what we need to ensure supporting gradient cuts, in general, there is no closed-form formula for it. Therefore, it is not always possible to explicitly reformulate $C$ as $\varphi_C(x) \le 1$.

Furthermore, if one is interested in solving mathematical programs with a numerical solver, performing such a reformulation might introduce some numerical issues one would have to take care of. Solvers usually solve up to a given tolerance, that is, they accept points that satisfy $g(x) \le \varepsilon$ for some $\varepsilon > 0$. Then, even though $C = \{x \in \mathbb{R}^n : \varphi_C(x) \le 1\}$, it might be that $\{x \in \mathbb{R}^n : \varphi_C(x) \le 1 + \varepsilon\} \not\subseteq \{x \in \mathbb{R}^n : g(x) \le \varepsilon\}$. In fact, even simple constraints show this behavior. Consider $C = \{x \in \mathbb{R} : x^2 - 1 \le 0\}$. In this case, $\varphi_C(x) = |x|$ and for $x = 1 + \varepsilon$, we have $\varphi_C(x) = 1 + \varepsilon$. Then $x$ would be $\varepsilon$-feasible for $\varphi_C(x) \le 1$, although it would be infeasible for $x^2 - 1 \le 0$, since $x^2 - 1 = 2\varepsilon + \varepsilon^2 > \varepsilon$.
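The tolerance effect described above can be reproduced with exact rational arithmetic. The sketch below uses Python's `fractions` module so that rounding does not blur the comparison; the tolerance value is an arbitrary illustrative choice.

```python
from fractions import Fraction

eps = Fraction(1, 10**6)          # feasibility tolerance (illustrative value)
x = 1 + eps

gauge_violation = abs(x) - 1      # violation of phi_C(x) <= 1
original_violation = x**2 - 1     # violation of x^2 - 1 <= 0

accepted_by_gauge_form = gauge_violation <= eps
accepted_by_original_form = original_violation <= eps
```

The same point is accepted when the constraint is written with the gauge and rejected in the original form, because the violation of the quadratic is roughly twice as large.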

Luckily, one does not need to reformulate in order to take advantage of the gauge function for tighter separation. The next propositions show how to use the gauge function and a point $\bar{x} \notin C$ to obtain a boundary point of $C$ and that linearizing at that boundary point gives a supporting valid inequality that actually separates $\bar{x}$. For ensuring the existence of a supporting hyperplane we need Assumption 2.2. For example, Assumption 2.2 is satisfied whenever Slater's condition (Section 1.3) is satisfied for (2.1) with $C$ represented by (2.2), that is, when there exists $x_0$ such that $g_j(x_0) < 0$ for every $j \in J$.

Before we state the propositions we start with a simple lemma.

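Lemma 2.10. Let $C \subseteq \mathbb{R}^n$ be a closed convex set such that $0 \in \operatorname{int} C$, let $\hat{x} \in \partial C$ and $\bar{x} \notin C$. Let $\alpha \in \mathbb{R}^n$, $\beta \in \mathbb{R}$ be such that $\alpha \neq 0$ and $\alpha^T x \le \beta$ is a valid inequality for $C$ that supports $C$ at $\hat{x}$. If the segment joining $0$ and $\bar{x}$ contains $\hat{x}$, then the inequality separates $\bar{x}$ from $C$.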

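Proof. Consider $l(\lambda) = \alpha^T(\lambda \bar{x}) - \beta$ and let $\lambda_0 \in (0, 1)$ be such that $\lambda_0 \bar{x} = \hat{x}$. The function $l$ is a strictly increasing affine linear function. Indeed, $0 \in \operatorname{int} C$ implies that $l(0) < 0$, while $l(\lambda_0) = 0$. Thus, $l(1) > 0$, i.e., $\alpha^T \bar{x} > \beta$.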

Proposition 2.11. Let $C \subseteq \mathbb{R}^n$ be a closed convex set such that $0 \in \operatorname{int} C$ and let $\bar{x} \notin C$. Then $\bar{x}/\varphi_C(\bar{x}) \in \partial C$.

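Proof. First, $\varphi_C(\bar{x}) \neq 0$ since $\bar{x} \notin C$. The positive homogeneity of $\varphi_C$ implies that $\varphi_C(\bar{x}/\varphi_C(\bar{x})) = \varphi_C(\bar{x})/\varphi_C(\bar{x}) = 1$. Proposition 2.9 implies $\bar{x}/\varphi_C(\bar{x}) \in \partial C$.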

Let $J_0(x)$ be the set of indices of the active constraints at $x$, i.e., $J_0(x) = \{j \in J : g_j(x) = 0\}$.

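Proposition 2.12. Let $C = \{x \in \mathbb{R}^n : g_j(x) \le 0, j \in J\}$ be such that $0 \in \operatorname{int} C$ and let $\varphi_C$ be its gauge function. Assume that Assumption 2.2 holds. Given $\bar{x} \notin C$, define $\hat{x} = \bar{x}/\varphi_C(\bar{x})$. Then, for any $j \in J_0(\hat{x})$, the gradient cut of $g_j$ at $\hat{x}$ yields a valid supporting inequality for $C$ that separates $\bar{x}$.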


Proof. By the previous proposition, we have that $\hat{x} \in \partial C$. Let $j \in J_0(\hat{x})$. Then the gradient cut of $g_j$ at $\hat{x}$ yields a valid supporting inequality. The fact that it separates follows from Lemma 2.10. Note that Lemma 2.10 is applicable since Assumption 2.2 ensures that the normal of the gradient cut is nonzero.

Hence, we can get supporting valid inequalities separating a given point $\bar{x} \notin C$ by using the gauge function to find the point $\hat{x} = \bar{x}/\varphi_C(\bar{x}) \in \partial C$. Then Proposition 2.12 ensures that the gradient cut of any active constraint at $\hat{x}$ will separate $\bar{x}$ from $C$. But how do we compute $\varphi_C(\bar{x})$?

2.3.2 Evaluating the Gauge Function

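Let $C = \{x \in \mathbb{R}^n : g_j(x) \le 0, j \in J\}$ be a closed convex set such that $0 \in \operatorname{int} C$ and consider

$f(x) = \max_{j \in J} g_j(x)$. (2.5)

In general, evaluating the gauge function of $C$ at $\bar{x} \notin C$ is equivalent to solving the following one dimensional equation

$f(\lambda \bar{x}) = 0, \quad \lambda \in (0, 1)$. (2.6)

If $\lambda^*$ is the solution, then $\varphi_C(\bar{x}) = 1/\lambda^*$.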

One can solve such an equation using a line search. Note that the line search is looking for a point $\hat{x} \in \partial C$ on the segment between $0$ and $\bar{x}$. This is exactly what the (extended) supporting hyperplane algorithm performs when it uses $0$ as its interior point.
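The line search for (2.6) can be sketched with a plain bisection. This is only an illustration under the assumptions that $f(0) < 0 < f(\bar{x})$ and that $f$ is convex; the function name `boundary_step`, the tolerance, and the two-constraint example set are hypothetical and not taken from the source.

```python
def boundary_step(constraints, x_bar, tol=1e-12):
    """Bisection for f(lam * x_bar) = 0 on (0, 1), where f = max_j g_j.

    Assumes f(0) < 0 < f(x_bar) and that f is convex, so the sign of
    f(lam * x_bar) changes exactly once along the segment.
    """
    f = lambda p: max(g(p) for g in constraints)
    lo, hi = 0.0, 1.0                  # invariant: f(lo * x_bar) <= 0 < f(hi * x_bar)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f([mid * xi for xi in x_bar]) <= 0.0:
            lo = mid
        else:
            hi = mid
    return lo

# C is the disc of radius 2 intersected with the half-plane x1 + x2 <= 2.
constraints = [
    lambda x: x[0] ** 2 + x[1] ** 2 - 4.0,
    lambda x: x[0] + x[1] - 2.0,
]
x_bar = [3.0, 3.0]
lam = boundary_step(constraints, x_bar)
phi = 1.0 / lam                         # gauge value phi_C(x_bar)
x_hat = [lam * xi for xi in x_bar]      # boundary point x_bar / phi_C(x_bar)
```

Returning the lower end of the bracket keeps the computed point feasible up to rounding. In this example the linear constraint is the one that becomes active at the boundary point, so its gradient cut $x_1 + x_2 \le 2$ is the inequality that Proposition 2.12 refers to.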

We would also like to remark that a closed-form expression for the gauge function of $C$ is equivalent to a closed-form formula for the solution of (2.6). It is possible to find such a formula for some functions, e.g., when $f$ is a convex quadratic function.
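For a single convex quadratic constraint the remark can be made explicit. With $q = \bar{x}^T A \bar{x}$ and $p = b^T \bar{x}$, equation (2.6) becomes $q\lambda^2 + p\lambda + c = 0$, and its positive root gives $\varphi_C(\bar{x}) = (p + \sqrt{p^2 - 4qc})/(-2c)$. This formula is derived here from (2.6) and is not quoted from the source; the sketch assumes $A$ positive semidefinite, $c < 0$, and $\bar{x} \notin C$, and the function name is hypothetical.

```python
import math

def gauge_quadratic(A, b, c, x_bar):
    """Gauge of C = {x : x^T A x + b^T x + c <= 0} at a point x_bar outside C.

    Assumes A is positive semidefinite and c < 0, so that 0 is in int C.
    """
    n = len(x_bar)
    q = sum(x_bar[i] * A[i][j] * x_bar[j] for i in range(n) for j in range(n))
    p = sum(bi * xi for bi, xi in zip(b, x_bar))
    # lam* is the positive root of q*lam^2 + p*lam + c = 0; phi = 1 / lam*.
    return (p + math.sqrt(p * p - 4.0 * q * c)) / (-2.0 * c)

# Disc of radius 2: x1^2 + x2^2 - 4 <= 0, evaluated at a point of norm 5.
phi_disc = gauge_quadratic([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], -4.0, [3.0, 4.0])

# Region above a parabola: x1^2 - x2 - 1 <= 0, evaluated at (2, 0).
phi_parabola = gauge_quadratic([[1.0, 0.0], [0.0, 0.0]], [0.0, -1.0], -1.0, [2.0, 0.0])
```

Writing the root in this form avoids a division by $q$, so the same expression also covers directions with $\bar{x}^T A \bar{x} = 0$.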

Next, we briefly discuss what happens when $0$ is not in the interior of $C$ and when $C$ has no interior. In the next section we discuss the implications of the fact that evaluating the gauge function is equivalent to the line search step of the supporting hyperplane algorithm.

2.3.3 Handling Sets with Empty Interior

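When $\operatorname{int} C = \emptyset$, we can still use the methods discussed above by applying a trick from Kronqvist et al. (2016). Assuming $C = \{x \in \mathbb{R}^n : g_j(x) \le 0, j \in J\} \neq \emptyset$, consider the set $C_\epsilon = \{x \in \mathbb{R}^n : g_j(x) \le \epsilon, j \in J\}$. This set satisfies $\operatorname{int} C_\epsilon \neq \emptyset$ and optimizing over $C_\epsilon$ provides an $\epsilon$-optimal solution.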


2.3.4 Using a Nonzero Interior Point

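If $x_0 \in \operatorname{int} C$ and $x_0 \neq 0$, we can translate $C$ so that $0$ is in its interior. Equivalently, we can build a gauge function centered on $x_0$. This is given by

$\varphi_{x_0, C}(x) = \varphi_{C - x_0}(x - x_0)$.

Then, given $\bar{x} \notin C$, the point

$\hat{x} = \dfrac{\bar{x} - x_0}{\varphi_{C - x_0}(\bar{x} - x_0)} + x_0$ (2.7)

belongs to the boundary of $C$. Equivalently, $\hat{x} = x_0 + \lambda^*(\bar{x} - x_0)$, where $\lambda^*$ solves

$f(x_0 + \lambda(\bar{x} - x_0)) = 0, \quad \lambda \in (0, 1)$,

with $f(x) = \max_{j \in J} g_j(x)$ as in (2.5).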

2.4 Convergence Proofs

Consider an MICP given by (2.1) with $C$ represented as (2.2). Let $f$ be defined as in (2.5). As mentioned above, the ESH algorithm computes an interior point (which we will assume to be $0$) and performs a line search between $\bar{x} \notin C$ and $0$ in order to find a point on the boundary. It computes a gradient cut at the boundary point, solves the relaxation again, and repeats the process. From our previous discussion, computing a gradient cut at the boundary point is equivalent to computing a gradient cut at $\bar{x}/\varphi_C(\bar{x})$. Therefore, the generated cuts are $f(\bar{x}/\varphi_C(\bar{x})) + v^T(x - \bar{x}/\varphi_C(\bar{x})) \le 0$, where $v \in \partial f(\bar{x}/\varphi_C(\bar{x}))$.

To prove the convergence of the ESH algorithm, Veinott and Kronqvist et al. use tailored arguments. Here we show that the convergence of the algorithm follows from the convergence of KCP. We note that the KCP algorithm still converges when $C$ is represented by a convex non-differentiable function. One needs to replace gradients by subgradients and one can use any subgradient (Horst and Tuy, 1990). Therefore, given that $\varphi_C(x)$ is a convex function, we know that KCP converges when applied to $\min\{c^T x : \varphi_C(x) \le 1\}$. Thus, in order to prove that ESH converges, it is sufficient to show that the cutting planes generated by ESH can also be generated by KCP.

We first prove that the normals of (normalized) supporting valid inequalities are subgradients of the gauge function at the supporting point.

Lemma 2.13. Let $\alpha^T x \le 1$ be a valid and supporting inequality for $C$. Let $\hat{x} \in \partial C$ be a point where it supports $C$, i.e., $\alpha^T \hat{x} = 1$. Then $\alpha \in \partial \varphi_C(\hat{x})$.


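Proof. We need to show that $\varphi_C(\hat{x}) + \alpha^T(x - \hat{x}) \le \varphi_C(x)$ for every $x$. Note that since $\hat{x} \in \partial C$, we have that $\varphi_C(\hat{x}) = 1$ and we just have to prove that $\alpha^T x \le \varphi_C(x)$.

When $x$ is such that $\varphi_C(x) > 0$, we have $x/\varphi_C(x) \in C$. Due to the validity of $\alpha^T x \le 1$, it follows that $\alpha^T x/\varphi_C(x) \le 1$.

Now let $x$ be such that $\varphi_C(x) = 0$. Then $\varphi_C(\lambda x) = 0$ for every $\lambda > 0$, i.e., $\lambda x \in C$ for every $\lambda > 0$. Hence, $\alpha^T(\lambda x) \le 1$ for every $\lambda > 0$ which implies that $\alpha^T x \le 0 = \varphi_C(x)$.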

Now we prove that the inequalities generated by the ESH algorithm can also be generated by the KCP algorithm. Given that the KCP algorithm converges even for non-smooth convex functions (Horst and Tuy, 1990), the next theorem implies the convergence of the ESH algorithm.

Theorem 2.14. Consider an MICP given by (2.1) with $C$ represented as (2.2) such that $0 \in \operatorname{int} C$ and Assumption 2.2 holds. Let $f$ be defined as in (2.5) and let $\bar{x} \notin C$ be the current relaxation solution to separate. Let $f(\bar{x}/\varphi_C(\bar{x})) + v^T(x - \bar{x}/\varphi_C(\bar{x})) \le 0$, with $v \in \partial f(\bar{x}/\varphi_C(\bar{x}))$, be the inequality generated by the ESH algorithm using $0$ as the interior point. Then KCP applied to $\min\{c^T x : \varphi_C(x) \le 1\}$ can generate the same inequality.

Proof. Let $\hat{x} = \bar{x}/\varphi_C(\bar{x})$. First, let us show that Assumption 2.2 implies $v \neq 0$. Indeed, if $v = 0$, then $f(\hat{x}) + v^T(x - \hat{x}) \le f(x)$ and $0 \in C$ imply that $0 \ge f(0) \ge f(\hat{x}) + v^T(0 - \hat{x}) = 0$. Let $j \in J$ be such that $g_j(0) = f(0) = 0$. Then $\lambda \mapsto g_j(\lambda \hat{x})$ is constant in $[0, 1]$. Thus, its derivative at $1$ is $0$, i.e., $\nabla g_j(\hat{x})^T \hat{x} = 0$. This implies that $\nabla g_j(\hat{x})^T \bar{x} = 0$. Furthermore, $\nabla g_j(\hat{x}) \neq 0$ by Assumption 2.2 and so Lemma 2.10 implies that $\nabla g_j(\hat{x})^T(x - \hat{x}) \le 0$ separates $\bar{x}$ from $C$. But this contradicts the equality $\nabla g_j(\hat{x})^T \bar{x} = 0$.

Let us manipulate the inequality obtained by the ESH algorithm. Notice that $f(\hat{x}) = 0$ and so the inequality reads as $v^T x \le v^T \hat{x}$. By Lemma 2.10, $\bar{x}$ is cut off by $v^T x \le v^T \hat{x}$, i.e., $v^T \bar{x} > v^T \hat{x}$. This, together with $\varphi_C(\bar{x}) > 1$, implies that $v^T \bar{x} > 0$. Summarizing, the inequality obtained by the ESH algorithm can be rewritten as

$\left( \dfrac{\varphi_C(\bar{x})}{v^T \bar{x}} v \right)^T x \le 1$.

Lemma 2.13 implies that $\frac{\varphi_C(\bar{x})}{v^T \bar{x}} v \in \partial \varphi_C(\hat{x})$. Since $\varphi_C$ is positively homogeneous, $\partial \varphi_C(\hat{x}) = \partial \varphi_C(\bar{x})$. Hence, if the KCP algorithm applied to $\min\{c^T x : \varphi_C(x) \le 1\}$ separates $\bar{x}$ using $\frac{\varphi_C(\bar{x})}{v^T \bar{x}} v \in \partial \varphi_C(\bar{x})$, then it would generate the gradient cut

$\varphi_C(\bar{x}) - 1 + \dfrac{\varphi_C(\bar{x})}{v^T \bar{x}} v^T (x - \bar{x}) \le 0$.

The left hand side of the above inequality is equivalent to

$-1 + \dfrac{\varphi_C(\bar{x})}{v^T \bar{x}} v^T x$.

This shows that the gradient cut constructed by the KCP algorithm is the same as the one constructed by the ESH algorithm.
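The identity established in the proof can be checked on a small instance. The sketch below uses an ellipse whose gauge is available in closed form; the set, the point $\bar{x} = (3, 2)$, and all function names are illustrative assumptions, not part of the source.

```python
import math

# C = {x : g(x) <= 0} with g(x) = x1^2 / 4 + x2^2 - 1 (an ellipse, 0 in int C).
def gauge(x):
    return math.sqrt(x[0] ** 2 / 4.0 + x[1] ** 2)      # closed form for this ellipse

def grad_g(x):
    return (x[0] / 2.0, 2.0 * x[1])

def grad_gauge(x):
    phi = gauge(x)
    return (x[0] / (4.0 * phi), x[1] / phi)

x_bar = (3.0, 2.0)
phi = gauge(x_bar)                                     # 2.5, so x_bar is outside C
x_hat = tuple(xi / phi for xi in x_bar)                # boundary point used by ESH

# ESH cut v^T x <= v^T x_hat, rescaled so that the right-hand side is 1.
v = grad_g(x_hat)
scale = phi / sum(vi * xi for vi, xi in zip(v, x_bar))
esh_normal = tuple(scale * vi for vi in v)

# KCP cut on the gauge: phi(x_bar) - 1 + w^T (x - x_bar) <= 0, i.e. w^T x <= kcp_rhs.
w = grad_gauge(x_bar)
kcp_rhs = 1.0 - phi + sum(wi * xi for wi, xi in zip(w, x_bar))
```

Up to rounding, `esh_normal` and `w` coincide and `kcp_rhs` equals one, so both algorithms produce the same inequality on this instance.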

2.5 Convex Programs Represented by Non-Convex Non-Smooth Functions

In this section we consider problem (2.1) with $C$ represented as

$C = \{x : g_j(x) \le 0, j \in J\}$,

where the functions $g_j$ are not necessarily convex. As mentioned in the introduction, convex problems represented by non-convex functions have been considered in Dutta and Lalitha (2011); Kabgani et al. (2017); Lasserre (2009, 2011, 2014); Martínez-Legaz (2014). These different works have generalized each other by considering more general classes of non-smooth functions.

2.5.1 The ESH Algorithm in the Context of Generalized Differentiability

When a function is non-smooth there are many ways of extending the notion of differentiability. Informally, it is common to first define a notion of directional derivative and then a generalization of the gradient. As the directional derivative of $g$ in the direction $d$ is given by $\nabla g(x)^T d$, the notion of generalized gradient tries to capture this relation.

A classic notion of generalized derivative is Clarke’s subdifferential.

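Definition 2.15 (Clarke (1990); Clarke et al. (1998)). The Clarke directional derivative of a function $g : \mathbb{R}^n \to \mathbb{R}$ at $\bar{x}$ in the direction $d \in \mathbb{R}^n$ is defined as

$g^\circ(\bar{x}; d) = \limsup_{x \to \bar{x},\, t \searrow 0} \dfrac{g(x + td) - g(x)}{t}$.

The Clarke subdifferential of $g$ at $\bar{x}$ is

$\partial^\circ g(\bar{x}) = \{\eta \in \mathbb{R}^n : \eta^T d \le g^\circ(\bar{x}; d) \ \forall d \in \mathbb{R}^n\}$.

We say that $g$ is directionally differentiable at $\bar{x}$ if directional derivatives of $g$ at $\bar{x}$ exist, that is,

$g'(\bar{x}; d) = \lim_{t \searrow 0} \dfrac{g(\bar{x} + td) - g(\bar{x})}{t}$

exists for every $d \in \mathbb{R}^n$. Finally, $g$ is regular in the sense of Clarke at $\bar{x}$ if $g$ is directionally differentiable at $\bar{x}$ and $g'(\bar{x}; d) = g^\circ(\bar{x}; d)$ for every $d \in \mathbb{R}^n$.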


Another interesting class is the following.

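Definition 2.16 (Jeyakumar and Luc (1999)). Let $g : \mathbb{R}^n \to \mathbb{R}$. The upper Dini directional derivative of $g$ at $\bar{x}$ in the direction $d \in \mathbb{R}^n$ is

$g^+(\bar{x}; d) = \limsup_{t \searrow 0} \dfrac{g(\bar{x} + td) - g(\bar{x})}{t}$.

The function $g$ has an upper regular convexificator (URC) at $\bar{x}$ if there exists a closed set $\partial^+ g(\bar{x}) \subseteq \mathbb{R}^n$ such that for each $d \in \mathbb{R}^n$,

$g^+(\bar{x}; d) = \sup_{\alpha \in \partial^+ g(\bar{x})} \alpha^T d$.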

We abstract the notion of directional derivative and subdifferential as

follows.

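Definition 2.17. Let $g : \mathbb{R}^n \to \mathbb{R}$ be a function. A generalized directional derivative of $g$ is a function $h : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$, and the generalized directional derivative of $g$ at $x$ in the direction $d$ is $h(x; d)$. We say that $g$ admits a generalized subdifferential at $x$ if there exists $A(x) \subseteq \mathbb{R}^n$ such that $h(x; d) = \sup_{v \in A(x)} v^T d$ for all $d \in \mathbb{R}^n$.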

For example, if $g$ is locally Lipschitz, then Clarke's directional derivative is a generalized directional derivative and $\partial^\circ g(x)$ is a generalized subdifferential, since $g^\circ(x; d) = \sup\{v^T d : v \in \partial^\circ g(x)\}$ (Clarke et al., 1998, Proposition 2.1.5). Or, if $g$ admits a URC, then Dini's directional derivative is a generalized directional derivative that admits a generalized subdifferential.

However, the above definition of generalized directional derivative and subdifferential is so general that any support function of a set yields a generalized directional derivative that admits a generalized subdifferential. The following definition adds a further requirement in order to make this general notion useful.

Definition 2.18. Let $h$ be a generalized directional derivative of $g$. We say that the generalized directional derivative is well-behaved if $h(x; d) > 0$ implies that there exists $t_n \searrow 0$ such that $g(x + t_n d) > g(x)$.

As we will see, this is the key property to show that the ESH algorithm

converges.

Clearly, if $g$ is differentiable, then the directional derivative is well-behaved. Also, Dini's directional derivative is well-behaved. As we will see in the next section, Clarke's directional derivative is not well-behaved in general. However, if the function is regular in the sense of Clarke, then it is well-behaved. Another important class of functions for which Clarke's directional derivative is well-behaved is the class of $\partial^\circ$-pseudoconvex functions.

Definition 2.19. A function $g : \mathbb{R}^n \to \mathbb{R}$ is $\partial^\circ$-pseudoconvex if

– it is locally Lipschitz and,

– for every $x, y \in \mathbb{R}^n$, if $g(y) < g(x)$, then $g^\circ(x; y - x) < 0$.

To show that it is well-behaved, we need the following result.

Lemma 2.20 (Bagirov et al. (2014, Lemma 5.3)). If a function $g$ is $\partial^\circ$-pseudoconvex, then for every $x, y \in \mathbb{R}^n$, if $g(y) = g(x)$, then $g^\circ(x; y - x) \le 0$. In particular, if $g(y) \le g(x)$, then $g^\circ(x; y - x) \le 0$.

The contrapositive of the last statement is: if $g^\circ(x; y - x) > 0$, then $g(y) > g(x)$. As $g^\circ(x; \cdot)$ is positively homogeneous (Clarke et al., 1998, Proposition 2.1.1), we conclude that if $g$ is $\partial^\circ$-pseudoconvex, $g^\circ(x; d) > 0$ for some $d \in \mathbb{R}^n$ and $t > 0$, then $g(x + td) > g(x)$. Thus, if $g$ is $\partial^\circ$-pseudoconvex, then Clarke's directional derivative is well-behaved.

Now we are ready to prove the main result of this section. Recall that $J_0(x) = \{j \in J : g_j(x) = 0\}$.

Theorem 2.21. Let $C = \{x \in \mathbb{R}^n : g_j(x) \le 0, j \in J\}$ be such that $C$ is convex, closed, and $0 \in \operatorname{int} C$. Assume that for each $x \in C$ and $j \in J_0(x)$, the function $g_j$ has a well-behaved generalized directional derivative at $x$ denoted by $h_j$ and that it admits a generalized subdifferential, $\partial^* g_j(x)$. Furthermore, assume that

$\partial^* g_j(x) \setminus \{0\} \neq \emptyset$ for all $x \in C$ and $j \in J_0(x)$. (2.8)

Let $\varphi_C$ be the gauge function of $C$. For $\bar{x} \notin C$, define $\hat{x} = \bar{x}/\varphi_C(\bar{x})$. Then, for every $j \in J_0(\hat{x})$ and every $v \in \partial^* g_j(\hat{x}) \setminus \{0\}$, the gradient cut, $g_j(\hat{x}) + v^T(x - \hat{x}) \le 0$, is a valid supporting inequality for $C$ that separates $\bar{x}$.

Proof. By Proposition 2.11 we have that $\hat{x} \in \partial C$. Let $j \in J_0(\hat{x})$ and let us consider an arbitrary $v \in \partial^* g_j(\hat{x}) \setminus \{0\}$. The gradient cut of $g_j$ at $\hat{x}$ is $v^T(x - \hat{x}) \le 0$.

We first show that the gradient cut is valid, that is, $v^T(y - \hat{x}) \le 0$ for all $y \in C$. If this is not the case, then there exists $y_0 \in C$ for which $v^T(y_0 - \hat{x}) > 0$. Since $g_j$ admits a generalized subdifferential at $\hat{x}$, we have that

$h_j(\hat{x}; y_0 - \hat{x}) = \sup_{\eta \in \partial^* g_j(\hat{x})} \eta^T (y_0 - \hat{x})$.

As $v \in \partial^* g_j(\hat{x})$, it follows that $h_j(\hat{x}; y_0 - \hat{x}) > 0$. Since $h_j$ is well-behaved, there is a sufficiently small $t \in (0, 1)$ such that $g_j(\hat{x} + t(y_0 - \hat{x})) > 0$. Thus, $\hat{x} + t(y_0 - \hat{x}) \notin C$. However, the convexity of $C$ implies that $\hat{x} + \lambda(y_0 - \hat{x}) \in C$ for $\lambda \in [0, 1]$, which is a contradiction.

The fact that the gradient cut separates $\bar{x}$ follows from Lemma 2.10. Note that $v \neq 0$ by hypothesis.

Theorem 2.21 extends the algorithm of Veinott to further representations of the set $C$. In particular, it implies that the ESH converges (via an argument similar to Theorem 2.14's proof) when the constraints admit a URC or are $\partial^\circ$-pseudoconvex. Thus, it generalizes the result of Eronen et al. (2017).

Remark 2.22. Any representation of a convex set $C = \{x \in \mathbb{R}^n : g_j(x) \le 0, j \in J\}$ yields a way to evaluate its gauge function, namely,

$\varphi_C(x) = \inf\left\{ t > 0 : \max_j g_j\left(\dfrac{x}{t}\right) = 0 \right\}$.

This infimum can be computed using a line search procedure.

However, what is more important is the ability to compute subgradients. Given any method to compute subgradients of the gauge function, we can apply the KCP algorithm using the implicitly defined gauge function. This allows us, for example, to drop (2.8). This algorithm is more general than the one proposed by Lasserre (2011), but it will not necessarily converge to a KKT point of the original problem.

2.5.2 Limits to the Applicability of the ESH Algorithm

The idea of the proof of Theorem 2.21 is that since $C$ is convex, $\hat{x} + \lambda(y - \hat{x}) \in C$ for every $y \in C$ and $\lambda \in [0, 1]$. Hence, the functions $g_j$ do not increase when moving in the direction $y - \hat{x}$ from $\hat{x}$. Thus, a notion of subdifferential that characterizes a well-behaved directional derivative yields valid gradient cuts. The abstract definitions introduced above try to capture this line of reasoning. Note that this is also how the proofs of the ‘only if’ parts of (Lasserre, 2009, Lemma 2.2), (Kabgani et al., 2017, Theorem 1), (Dutta and Lalitha, 2011, Proposition 2.2), and the $\subseteq$ inclusion of (Martínez-Legaz, 2014, Proposition 6) work. For example, Lasserre (2009) assumes that the $g_j$ are differentiable, in which case the generalized subdifferential is just the singleton given by the gradient and the generalized directional derivative is the classic directional derivative. Dutta and Lalitha (2011) assume that the functions are locally Lipschitz and regular in the sense of Clarke.

It is a natural question to wonder how important the regularity assumption

is. As the following example shows, the ESH algorithm can produce invalid

cutting planes when using Clarke’s subdifferential and the constraints are

not regular in the sense of Clarke. In particular, this shows that, without the

assumption of regularity, Clarke’s directional derivative is not well-behaved,

in general.

Example 2.23. Consider the function $g(x_1, x_2) = \max\{\min\{3x_1 + x_2, 2x_1 + 3x_2\}, x_1\}$. The set $C = \{(x_1, x_2) : g(x_1, x_2) \le 0\}$ is convex, closed and its interior is nonempty as shown in Figure 2.2. Note that as $g$ is piecewise linear, it is globally Lipschitz continuous (Scholtes, 2012, Proposition 2.2.7). Using Clarke et al. (1998, Theorem 2.8.1), it follows that $\partial^\circ g(0) = \operatorname{conv}\{(3, 1), (2, 3), (1, 0)\}$. Then $2x_1 + 3x_2 \le 0$ is a gradient cut of $g$ at $0$. However, it is not valid as $(-1, 3)$ is feasible but $-2 + 9 > 0$.

In particular, it must be that $g$ is not regular in the sense of Clarke and that $g^\circ$ is not well-behaved. To see that $g^\circ$ is not well-behaved, consider the direction $d = (-1, 1)$. Notice that $g((0, 0) + td) = g(t(-1, 1)) = -t$, and so $g$ is strictly decreasing in the direction $d$. However, $g^\circ(0; d) = \max_{v \in \partial^\circ g(0)} (-v_1 + v_2) = 1$. This also shows that $g$ is not regular. The directional derivative of $g$ at $0$ in the direction $d$ is $-1 \neq 1$.
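The claims of Example 2.23 are easy to verify by direct evaluation. The sketch below assumes that the function is $g(x_1, x_2) = \max\{\min\{3x_1 + x_2, 2x_1 + 3x_2\}, x_1\}$ and that the Clarke subdifferential at the origin is the convex hull of the three piece gradients; both are reconstructions from the surrounding text, and the helper names are hypothetical.

```python
def g(x1, x2):
    # Piecewise-linear function assumed for Example 2.23.
    return max(min(3 * x1 + x2, 2 * x1 + 3 * x2), x1)

def cut_lhs(x1, x2):
    # Left-hand side of the candidate cut 2*x1 + 3*x2 <= 0 built from (2, 3).
    return 2 * x1 + 3 * x2

point = (-1, 3)
point_is_feasible = g(*point) <= 0          # the point belongs to C
cut_is_violated = cut_lhs(*point) > 0       # yet the cut removes it

# Along d = (-1, 1) the function decreases: g(t * d) = -t for t > 0.
values_along_d = [g(-t, t) for t in (1, 2, 3)]

# Clarke's directional derivative at 0 in direction d, as the maximum of
# v^T d over the vertices (3, 1), (2, 3), (1, 0) of the subdifferential.
clarke = max(-v1 + v2 for v1, v2 in [(3, 1), (2, 3), (1, 0)])
```

The function decreases along $d$ while the Clarke directional derivative is positive, which is exactly the failure of well-behavedness that makes the cut invalid.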

2.6 Concluding Remarks

In this chapter, we have shown that the extended supporting hyperplane algorithm introduced by Veinott (1967) and rediscovered by Kronqvist et al. (2016) is identical to Kelley's classic cutting plane algorithm applied to a suitable reformulation of the problem. We used this new perspective in order to prove the convergence of the method for the larger class of problems with convex feasible regions represented by non-convex non-smooth constraints which admit a generalized subdifferential and whose generalized directional derivative is well-behaved. This class includes $\partial^\circ$-pseudoconvex functions and functions that admit a URC. Functions that admit a URC include differentiable functions and locally Lipschitz functions that are regular in the sense of Clarke. More generally, the algorithm extends to any representation of a convex set that allows computing subgradients of its gauge function. These theoretical results bear relevance in practice, as the experimental results in Kronqvist et al. (2016, 2018) have already demonstrated the computational benefits of the supporting hyperplane algorithm in comparison to alternative state-of-the-art solving methods.

Figure 2.2: Counterexample showing that, in general, the ESH algorithm can generate invalid cutting planes if the constraints are just Lipschitz continuous. The convex feasible region $\max\{\min\{3x_1 + x_2, 2x_1 + 3x_2\}, x_1\} \le 0$ in blue and the boundary of the invalid gradient cut $2x_1 + 3x_2 \le 0$ in red.

Another intuition gained from this chapter, which we will use in Chapter 5, is that if we want the gradient cuts to be supporting, then the constraint function cannot be “too” convex. Indeed, as we saw, gradient cuts from strictly convex functions will never be supporting.

Chapter 3

Visible Points, the Separation Problem,

and Applications to Mixed-Integer

Nonlinear Programming

From now on we move away from convex mixed-integer non-linear programs and consider non-convex mixed-integer non-linear programs. In this chapter we introduce a technique to produce tighter cutting planes for mixed-integer non-linear programs. Usually, a cutting plane is generated to cut off a specific infeasible point. The underlying idea is to use the infeasible point to restrict the feasible region in order to obtain a tighter domain. To ensure validity, we require that every valid cut separating the infeasible point from the restricted feasible region is still valid for the original feasible region. We translate this requirement in terms of the separation problem and the reverse polar. In particular, if the reverse polar of the restricted feasible region is the same as the reverse polar of the original feasible region, then any cut valid for the restricted feasible region that separates the infeasible point is also valid for the original feasible region.

We show that the reverse polar of the so-called visible points of the feasible region from the infeasible point coincides with the reverse polar of the feasible region. In the special case where the feasible region is described by a single non-convex constraint intersected with a convex set we provide a characterization of the visible points. Furthermore, when the non-convex constraint is quadratic the characterization is particularly simple. We also provide an extended formulation for a relaxation of the visible points when the non-convex constraint is a general polynomial.

Finally, we give some conditions under which for a given set there is an

inclusion-wise smallest set, in some predefined family of sets, whose reverse

polars coincide.


3.1 Introduction

The separation problem is a fundamental problem in optimization (Grötschel et al., 1993). Given a set $S \subseteq \mathbb{R}^n$ and a point $\bar{x} \in \mathbb{R}^n$, the separation problem is

Decide if $\bar{x}$ is in the closure of the convex hull of $S$ or find a valid inequality for $S$ that separates $\bar{x}$.

Algorithms to solve optimization problems, especially those based on solving relaxations, such as branch and bound, need to deal with the separation problem. Consider, for example, solving a mixed integer linear problem via branch and bound (Conforti et al., 2014, Section 9.2). The solution to the linear relaxation plays the role of $\bar{x}$, while a relaxation based on a subset of the constraints is used as $S$ for the separation problem, see (Conforti et al., 2014, Chapter 6).

The separation problem can be rephrased in terms of the reverse polar (Balas, 1998; Zaffaroni, 2008) of $S$ at $\bar{x}$, defined as

$S^{\bar{x}} = \{\alpha \in \mathbb{R}^n : \alpha^T(x - \bar{x}) \ge 1, \ \forall x \in S\}$.

The elements of $S^{\bar{x}}$ are the normals of the hyperplanes that separate $\bar{x}$ from $\operatorname{conv} S$. Hence, the separation problem can be stated equivalently as

Decide if $S^{\bar{x}}$ is empty or find an element from it.
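For a finite set the definition of the reverse polar can be tested directly. The following sketch checks whether a candidate normal $\alpha$ satisfies $\alpha^T(x - \bar{x}) \ge 1$ on every point; the point set and the function name `in_reverse_polar` are hypothetical, and a finite set only stands in for a general feasible region.

```python
def in_reverse_polar(alpha, points, x_bar):
    """Check alpha^T (x - x_bar) >= 1 for every x in the finite set `points`."""
    return all(
        sum(a * (xi - bi) for a, xi, bi in zip(alpha, x, x_bar)) >= 1
        for x in points
    )

S = [(1, 2), (2, 1), (3, 3)]       # finite stand-in for the feasible set
x_bar = (0, 0)                     # point to separate

separates = in_reverse_polar((1, 0), S, x_bar)    # x1 >= 1 holds on S, fails at x_bar
too_weak = in_reverse_polar((0, 0), S, x_bar)     # the zero vector never qualifies
```

Any $\alpha$ that passes the test yields the inequality $\alpha^T(x - \bar{x}) \ge 1$, which holds on $S$ and is violated at $\bar{x}$ itself.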

The point of departure of the present work is the following observation.

Observation 3.1. If there is a set $V$ such that $(S \cap V)^{\bar{x}} = S^{\bar{x}}$, then, as far as the separation problem is concerned, the feasible region can be regarded as $S \cap V$ instead of $S$.

A set $V$ such that $V^{\bar{x}} = S^{\bar{x}}$ will be called a generator of $S^{\bar{x}}$. Intuitively, if a set $V$ is such that $V \cap S$ generates $S^{\bar{x}}$, that is, if we can ensure that a cut valid for $V \cap S$ that separates $\bar{x}$ is also valid for $S$, then $V$ should at least contain the points of $S$ that are “near” $\bar{x}$. To formalize the meaning of “near” we use the concept of visible points (Deutsch et al., 2013) of $S$ from $\bar{x}$, which are the points $x \in S$ for which the segment joining $x$ with $\bar{x}$ only intersects $S$ at $x$, see Definition 3.5. In other words, they are the points of $S$ that can be “seen” from $\bar{x}$. In Proposition 3.9 we show that the visible points are a generator of $S^{\bar{x}}$.

As a motivation, we present an application of our results in the context of nonlinear programming, which is treated in more detail in Section 3.4.

Figure 3.1: The feasible region $g(x) \le 0$ and $\bar{x} = (0, 0)$ together with the box $B$.

Example 3.2. Consider the separation problem of $\bar{x} = (0, 0)$ from $S = \{x \in B : g(x) \le 0\}$ where

$B = [-\tfrac{1}{2}, 3] \times [-\tfrac{1}{2}, 3]$,

$g(x_1, x_2) = -x_1^2 x_2 + 5 x_1 x_2^2 - x_2^2 - x_2 - 2 x_1 + 2$,

as depicted in Figure 3.1. A standard technique for solving the separation problem for $S$ and $\bar{x}$ is to construct a convex underestimator of $g$ over $B$ (Vigerske, 2013, Sections 6.1.2 and 7.5.1). The quality of a convex underestimator depends on the bounds of the variables and tighter bounds yield tighter underestimators. As we will see (Proposition 3.9 and Theorem 3.27), $R^{\bar{x}} = S^{\bar{x}}$ where

$R = \{x \in B : g(x) = 0, \ \nabla g(x)^T x \le 0\}$.

It is possible to show that $R \subseteq V$, where $V = [-1/2,\, 17/\ldots] \times [-6/25,\, 3]$. Hence, by Corollary 3.25, $(V \cap S)^{\bar{x}} = S^{\bar{x}}$. This means that we can solve the separation problem over $\{x \in V : g(x) \le 0\}$ instead of $S$. Therefore, if we were to compute an underestimator of $g$, it could be computed over $V \subsetneq B$.
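The set $R$ of Example 3.2 can be explored by direct evaluation. The sketch below implements $g$, its gradient (computed by hand here, not given in the source), and a membership test for $R$ that is exact on integer inputs; the function names and the two candidate points are illustrative choices.

```python
def g(x1, x2):
    return -x1**2 * x2 + 5 * x1 * x2**2 - x2**2 - x2 - 2 * x1 + 2

def grad_g(x1, x2):
    return (-2 * x1 * x2 + 5 * x2**2 - 2, -x1**2 + 10 * x1 * x2 - 2 * x2 - 1)

def in_R(x1, x2, lo=-0.5, hi=3.0):
    """Membership in R = {x in B : g(x) = 0, grad g(x)^T x <= 0} (exact arithmetic)."""
    in_box = lo <= x1 <= hi and lo <= x2 <= hi
    d1, d2 = grad_g(x1, x2)
    return in_box and g(x1, x2) == 0 and d1 * x1 + d2 * x2 <= 0

origin_is_infeasible = g(0, 0) > 0
candidates = [(1, 0), (0, 1)]
members = [in_R(*p) for p in candidates]
```

Both candidate points lie on the curve $g(x) = 0$ and satisfy $\nabla g(x)^T x \le 0$, so they belong to $R$. For non-integer points the equality test would need a tolerance.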

Methods for obtaining tighter bounds for mixed integer nonlinear programming (MINLP) are of paramount importance. Indeed, not only do bound tightening procedures enhance the performance of MINLP solvers, but also many algorithms for solving MINLPs require that all variables are bounded (Hamed and McCormick, 1993). We refer to the recent survey of Puranik and Sahinidis (2017) for more information on bound tightening procedures and their impact on MINLP solvers.


However, the technique that we introduce in this chapter is not a bound tightening technique in the classic sense, i.e., the tighter bounds that might be learned are not valid for the original problem, but only for the separation problem at hand.

We would like to point out that a similar idea, namely to modify the separation problem, is used by Venkatachalam and Ntaimo (2016) in the context of stochastic mixed integer programming. Their objective is to speed up the solution of the separation problem. In contrast, our objective is to produce tighter cutting planes for MINLP.

Contributions

We show that for every closed set $S$, there exists an inclusion-wise smallest closed convex set that generates $S^{\bar{x}}$ (Theorem 3.21). When $S$ is compact, there is an inclusion-wise smallest closed set that generates $S^{\bar{x}}$ (Theorem 3.23). Furthermore, under some mild assumptions on $S$, we show that there is an inclusion-wise smallest closed convex set $C$ such that $C \cap S$ generates $S^{\bar{x}}$ (Theorem 3.22). We also show the existence of a generator, $V_S(\bar{x})$, of $S^{\bar{x}}$ which is more suitable for computations.

We apply our results to MINLP and give an explicit description of $V_S(\bar{x})$ when $S = \{x \in C : g(x) \le 0\}$, where $C$ is a closed convex set containing $\bar{x}$, and $g$ is continuous (Section 3.4.1). For the important case of quadratic constraints, i.e., when $g$ is a quadratic function, we show that $V_S(\bar{x})$ has a particularly simple expression (Theorem 3.29).

For the case when $g$ is a general polynomial, we provide an extended formulation for a relaxation of $V_S(\bar{x})$ based on the theory of non-negative univariate polynomials (Theorem 3.34).

3.2 Visible Points and the Reverse Polar

In this section we introduce the concept of visible points and reverse polar,

and state some basic properties about them. The main result in this section

is that the reverse polar of the visible points of a set is the reverse polar of

the set (Proposition 3.9).

Unless stated otherwise, we will assume $\bar{x} = 0$. This is without loss of generality, since we can always translate the set $S$ to $S - \bar{x}$. We start by restating the definition of reverse polar.

Definition 3.3. Let $S \subseteq \mathbb{R}^n$ and $\bar{x} \in \mathbb{R}^n$. The reverse polar of $S$ at $\bar{x}$ is

$S^{\bar{x}} = \{\alpha \in \mathbb{R}^n : \alpha^T(x - \bar{x}) \ge 1, \text{ for all } x \in S\}$.

As stated in the introduction, the reverse polar contains all valid inequalities for $S$ that separate $\bar{x}$ from $S$.

Definition 3.4. Let $S, V \subseteq \mathbb{R}^n$ and $\bar{x} \in \mathbb{R}^n$. We say that $V$ is a generator of $S^{\bar{x}}$ if and only if $V^{\bar{x}} = S^{\bar{x}}$.

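Definition 3.5. Let $S \subseteq \mathbb{R}^n$ be closed and $\bar{x} \notin S$. The set of visible points of $S$ from $\bar{x}$ is

$V_S(\bar{x}) = \{x \in S : (x + [0, 1](\bar{x} - x)) \cap S = \{x\}\} = \{x \in S : (x + (0, 1](\bar{x} - x)) \cap S = \emptyset\}$.

We denote $V_S(0)$ by $V_S$ and note that

$V_S = \{x \in S : [0, 1]x \cap S = \{x\}\} = \{x \in S : [0, 1)x \cap S = \emptyset\}$.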
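Definition 3.5 can be illustrated on a finite planar set with integer coordinates, where visibility from the origin reduces to an exact collinearity test. This is only a sketch of the definition for $\bar{x} = 0$ and $0 \notin S$; the point set and the function name `visible_points` are hypothetical.

```python
def visible_points(points):
    """Visible points from the origin of a finite set of 2D integer points.

    x is visible when no other point y of the set satisfies y = mu * x
    with 0 <= mu < 1, i.e. nothing in the set blocks the segment to 0.
    """
    def blocks(y, x):
        cross = x[0] * y[1] - x[1] * y[0]            # zero iff x and y are collinear with 0
        dot = x[0] * y[0] + x[1] * y[1]              # positive iff y is on the same ray as x
        closer = y[0] ** 2 + y[1] ** 2 < x[0] ** 2 + x[1] ** 2
        return cross == 0 and dot > 0 and closer

    return [x for x in points if not any(blocks(y, x) for y in points)]

S = [(1, 1), (2, 2), (0, 3), (0, 1), (3, 0)]
V_S = visible_points(S)
```

The points $(2, 2)$ and $(0, 3)$ are hidden behind $(1, 1)$ and $(0, 1)$ respectively, so only three points of this set are visible from the origin.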

The following concept is, in some sense, the opposite of the visible points.

Definition 3.6. Let $S \subseteq \mathbb{R}^n$ be closed. The shadow of $S$ from $0$ is

$\operatorname{shw} S = [1, \infty) S$.

The concept of shadow has also been called penumbra (Rockafellar, 1970, p. 22), (Tind and Wolsey, 1982; Conforti and Wolsey, 2018) and aureole closure (Ruys, 1974). The following are some basic properties of the reverse polar.

Lemma 3.7. Ruys (1974, Property 9.2.2) Let S, T ⊆Rn. Then,

1. S0= (shw S)0= (conv S)0= (cl S)0= (conv S)0.

2. S0=∅if and only if 0∈conv S.

3. S⊆Timplies T0⊆S0.

4. If 0/∈conv S, then (S0)0= shw conv S.

We will now show that $V_S$ is a generator of $S^0$. To this end, we need the following lemma, which says that the shadow of what can be seen of a set is the same as the shadow of the whole set. Likewise, what can be seen of a set is the same as what can be seen of the shadow of the set.


Lemma 3.8. Let $S \subseteq \mathbb{R}^n$ be a closed set such that $0 \notin S$. Then, $\operatorname{shw} V_S = \operatorname{shw} S$ and $V_{\operatorname{shw} S} = V_S$.

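Proof. First we prove that $\operatorname{shw} V_S = \operatorname{shw} S$. Clearly, $\operatorname{shw} V_S \subseteq \operatorname{shw} S$. Let $y \in \operatorname{shw} S$, then $y = \lambda x$ with $x \in S$ and $\lambda \ge 1$. Let $I = \{\mu \ge 0 : \mu x \in S\}$ and $\mu_0 = \min I$. The minimum exists since $I$ is closed and not empty as $S$ is closed and $1 \in I$, respectively. From $1 \in I$, we deduce $\mu_0 \le 1$, and from $0 \notin S$, $\mu_0 > 0$. Hence, $\mu_0 x \in V_S$ and $y = \frac{\lambda}{\mu_0}(\mu_0 x) \in \operatorname{shw} V_S$, since $\frac{\lambda}{\mu_0} \ge 1$.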

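Now we prove that $V_{\operatorname{shw} S} = V_S$. Clearly, $S \subseteq \operatorname{shw} S$ implies that $V_S \subseteq V_{\operatorname{shw} S}$.

Let $x_0 \in V_{\operatorname{shw} S}$. Then, $\mu x_0 \notin \operatorname{shw} S$ for every $\mu \in [0,1)$. As $S \subseteq \operatorname{shw} S$ it follows that $\mu x_0 \notin S$ for every $\mu \in [0,1)$. Hence, if we manage to show that $x_0 \in S$, then $x_0 \in V_S$ which is what we want to prove.

Since $x_0 \in V_{\operatorname{shw} S}$, it must be that $x_0 \in \operatorname{shw} S$. This means that there exist $\lambda \ge 1$ and $x \in S$ such that $x_0 = \lambda x$. Note that $\frac{1}{\lambda} x_0 = x \in S \subseteq \operatorname{shw} S$. In other words, $\frac{1}{\lambda} x_0 \in \operatorname{shw} S$ but, as we mentioned above, $\mu x_0 \notin \operatorname{shw} S$ for every $\mu \in [0,1)$. This implies that $\frac{1}{\lambda} \ge 1$. Therefore, $\lambda = 1$, which means that $x_0 = x \in S$.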

Proposition 3.9. Let $S \subseteq \mathbb{R}^n$ be a closed set. Then,
\[ (S \cap V_S)^0 = V_S^0 = S^0. \]

Proof. The first equality just comes from the fact that $V_S \subseteq S$.

If $0 \in S$, then the equality holds as all the sets are empty. Otherwise, the equality follows from $V_S^0 = (\operatorname{shw} V_S)^0 = (\operatorname{shw} S)^0 = S^0$, where the first and last equalities are by Lemma 3.7 and the middle one, by Lemma 3.8.

3.3 The Smallest Generators

3.3.1 Motivation

In the previous section we showed that there is a set $U \subseteq S$ such that $(U \cap S)^0 = S^0$, namely, $V_S$. This set can be used to improve separation routines as was shown already in Example 3.2. We will come back to applications of the visible points to separation in the next section.

The topic of this section is motivated by the following example, where the

set VSis much larger than the smallest generator.

Example 3.10. Consider the constrained set
\[ S = \{(x_1, x_2) \in \mathbb{R}^2_+ : (x_1 - x_2)^2 \ge 1\} \]
depicted in Figure 3.2. The visible points are the lines $x_2 = x_1 + 1$ and $x_2 = x_1 - 1$ intersected with the first orthant. However, it is not hard to see that $V = \{(0,1), (1,0)\}$ is the smallest closed generator of $S^0$.

Figure 3.2: The region $S$. In the middle picture $V_S$ are the points described by the thick red line. In the right picture the red points form the smallest set $V$ such that $V^0 = S^0$.

This example motivates the following question.

Question 3.11. What is, if any, the smallest closed set $V$ such that $V^0 = S^0$?

The reason we restrict to generators that are closed sets is to avoid representation issues. For example, if $S$ is the ball of radius 1 centered at $(2, 0)$, then Theorem 3.29 implies that the left arc joining $(2, 1)$ and $(2, -1)$ generates $S^0$. However, the rational points on this arc also generate $S^0$ and the smallest set generating $S^0$ does not exist. In order to avoid such issues, we concentrate on closed generators.

As can be seen from simple examples, such as $S = \mathbb{R}_+ \times \{1\}$ for which every $a \ge 0$ defines the generator $(\{0\} \cup [a, \infty)) \times \{1\}$, the smallest closed generator need not exist. However, a smallest closed convex generator might exist and so we ask the following question.

Question 3.12. What is, if any, the smallest closed convex generator of $S^0$?

We are mainly interested in applying our results to the separation problem, as already explained in the introduction. In that case, the set $S$ usually looks like $C \cap F$, where $C$ is a convex set and $F$ is the sublevel set of some non-convex function, see the next section. In this context, replacing $C$ by a smaller convex set might be beneficial for the separation problem (see Example 3.32). Thus, it is also natural to consider the following question.

Question 3.13. What is, if any, the smallest closed convex set $U$ such that $S \cap U$ generates $S^0$?


The last two questions are not the same. Informally, $S$ is only used to define $S^0$ in Question 3.12, and so any other set $T$ such that $T^0 = S^0$ can be used to formulate the question. For instance, we can assume without loss of generality that $S$ is closed and convex, since Lemma 3.7 implies that $S^0 = (\operatorname{conv} S)^0$. In contrast, in Question 3.13 we are asking for the smallest generator contained in $S$.

As we will see, the answer to Question 3.12 is that $\operatorname{conv} V_{\operatorname{conv} S}$ is the smallest closed convex generator of $S^0$. However, the next two examples show that Question 3.13 is a bit more delicate.

The first example shows that, in general, there is no unique smallest closed convex set $U$ such that $(S \cap U)^0 = S^0$.

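Example 3.14. Let $S = \{(-1, 0), (1, 0), (0, -1), (0, 1)\}$. Since $0 \in \operatorname{conv} S$, $S^0 = \emptyset$. Clearly, $V = \{0\} = V_{\operatorname{conv} S}$ is the smallest closed convex set such that $V^0 = \emptyset$. However, $S \cap V = \emptyset$, which implies that $(S \cap V)^0 = \mathbb{R}^2$. Furthermore, $U_1 = \{(\lambda, 0) : \lambda \in [-1, 1]\}$ and $U_2 = \{(0, \lambda) : \lambda \in [-1, 1]\}$ are both closed convex and $(U_i \cap S)^0 = \emptyset$. Since $U_1 \not\subseteq U_2$ and $U_2 \not\subseteq U_1$ we conclude that there is no smallest closed convex set $U$ such that $(U \cap S)^0 = S^0$.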

However, we cannot even expect to find a minimal closed convex set $U$ such that $(S \cap U)^0 = S^0$.

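Example 3.15. Let $S = \{(0, 1)\} \cup \{(\lambda, 2) : \lambda \ge 0\}$. We have
\[ S^0 = \{\alpha : \alpha_1 \ge 0, \ \alpha_2 \ge 1\}. \]
Indeed, $(0, 1) \in S$ implies that $\alpha_2 \ge 1$. If $\alpha_1 < 0$ for some $\alpha \in S^0$, then there is a large enough $\lambda$ such that $\lambda \alpha_1 + 2\alpha_2 < 1$ and $(\lambda, 2) \in S$. On the other hand, if $\alpha_1 \ge 0$ and $\alpha_2 \ge 1$, then $\alpha_1 x_1 + \alpha_2 x_2 \ge 1$ for every $(x_1, x_2) \in S$.

Let $T_M = \{(0, 1)\} \cup \{(\lambda, 2) : \lambda \ge M\}$ and $U_M = \operatorname{conv} T_M$. The same argument as above shows that $(U_M \cap S)^0 = S^0$. Notice that any $U$ with $(U \cap S)^0 = S^0$ must contain a sequence of points $(\lambda_n, 2) \in S$ with $\lambda_n \to \infty$. Thus, any minimal $U$, if it exists, must be of the form $U_M$ for some $M \ge 0$.

It is clear that $U_{M_1} \subseteq U_{M_2}$ if and only if $M_1 > M_2$ and $\bigcap_{M > 0} U_M = \{(\lambda, 1) : \lambda \ge 0\}$. However, $S \cap \{(\lambda, 1) : \lambda \ge 0\} = \{(0, 1)\}$ and $\{(0, 1)\}^0 \ne S^0$. Therefore, there is no minimal $U$.

On the other hand, $V = \{(\lambda, 1) : \lambda \ge 0\} = V_{\operatorname{conv} S}$ is the smallest closed convex set such that $V^0 = S^0$.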

However, these are the only “pathological cases”. Indeed, as we will see, if $\operatorname{conv} S$ is closed (e.g. when $S$ is compact) and $0 \notin \operatorname{conv} S$ (i.e., $S^0 \ne \emptyset$), then $\operatorname{conv} V_{\operatorname{conv} S}$ is the smallest closed convex set such that $\operatorname{conv} V_{\operatorname{conv} S} \cap S$ generates $S^0$.

Remark 3.16. The closure operations are needed because, in general, $V_S$ and $\operatorname{conv} V_S$ are not closed, even when $S$ is convex and compact. Indeed, it is shown in Deutsch et al. (2013, Example 15.5) that for
\[ S := (1, 0, 0) + \operatorname{cone}\{(1, \alpha, \beta) : \alpha^2 + (\beta - 1)^2 \le 1\}, \]
$V_S$ is open. The authors show that the points $(2, \sin(t), 1 + \cos(t))$ are visible for $t \in (0, \pi)$, but the limit when $t$ approaches $\pi$, $(2, 0, 0)$, is not. The remark follows from a modification of this example so that $S$ is compact, e.g., by intersecting it with $[0, 3] \times \mathbb{R}^2$.

3.3.2 Preliminaries

Here we collect a few lemmata that we are going to need in order to answer

Questions 3.11, 3.12 and 3.13.

Lemma 3.17. Deutsch et al. (2013, Proposition 15.19) Let $S$ be a closed convex set such that $0 \notin S$. If $x \in V_S$ is a strict convex combination of $x_1, \dots, x_m \in S$, then $x_1, \dots, x_m \in V_S$.

This result immediately implies the following two lemmata.

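Lemma 3.18. Let $S \subseteq \mathbb{R}^n$ be a closed convex set such that $0 \notin S$. Then,
\[ \operatorname{ext} V_S = V_S \cap \operatorname{ext} S. \]

Proof. We start by proving $\operatorname{ext} V_S \subseteq V_S \cap \operatorname{ext} S$. Let $x \in \operatorname{ext} V_S$. Clearly, $x \in V_S$. If $x \notin \operatorname{ext} S$, then there are $x_1, \dots, x_m \in S$ such that $x$ is a strict convex combination of $x_1, \dots, x_m$. Lemma 3.17 implies that $x_i \in V_S$ for every $i = 1, \dots, m$. Thus, $x$ is not an extreme point of $V_S$. This contradiction proves that $x \in \operatorname{ext} S$.

If $x \in V_S \cap \operatorname{ext} S$ but $x \notin \operatorname{ext} V_S$, then $x$ is a strict convex combination of some elements of $V_S$. Since $V_S \subseteq S$, $x$ is a strict convex combination of some elements of $S$. This is a contradiction with $x \in \operatorname{ext} S$.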

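Lemma 3.19. Let $S \subseteq \mathbb{R}^n$ be a closed set such that $\operatorname{conv} S$ is closed and $0 \notin \operatorname{conv} S$. Then,
\[ \operatorname{conv} V_{\operatorname{conv} S} = \operatorname{conv}(S \cap V_{\operatorname{conv} S}). \]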


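Proof. From $S \cap V_{\operatorname{conv} S} \subseteq V_{\operatorname{conv} S}$, it follows that $\operatorname{conv}(S \cap V_{\operatorname{conv} S}) \subseteq \operatorname{conv} V_{\operatorname{conv} S}$. To prove the other inclusion it is enough to show that $V_{\operatorname{conv} S} \subseteq \operatorname{conv}(S \cap V_{\operatorname{conv} S})$. Let $x \in V_{\operatorname{conv} S}$. Then, $x \in \operatorname{conv} S$ and so $x$ is a strict convex combination of some points $x_1, \dots, x_m \in S$. Then, by Lemma 3.17, $x_1, \dots, x_m \in S \cap V_{\operatorname{conv} S}$. Thus, $x \in \operatorname{conv}(S \cap V_{\operatorname{conv} S})$.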

We remark that the previous lemma does not follow from Lemma 3.18 by just applying the convex hull operation to both sides of the equality, since $\operatorname{conv} S$ may not have extreme points.

The following is a slight extension of (Rockafellar, 1970, Corollary 18.3.1).

Lemma 3.20. Let $S \subseteq \mathbb{R}^n$ be a closed set. Then, $\operatorname{ext} \operatorname{conv} S \subseteq S$.

Proof. Recall that $x_0$ is an exposed point of a closed convex set $C$ if and only if there exists an $\alpha$ such that $\{x_0\} = \arg\max_{x \in C} \alpha^T x$.

We will show that the set of exposed points of $\operatorname{conv} S$ is a subset of $S$. Then, by Straszewicz's Theorem (Rockafellar, 1970, Theorem 18.6) and the closedness of $S$, it follows that $\operatorname{ext} \operatorname{conv} S \subseteq S$. Note that when the set of exposed points is empty, the result follows trivially. Thus, we assume that the set of exposed points is non-empty.

Let $x_0$ be an exposed point of $\operatorname{conv} S$ and let $\alpha$ be a direction that exposes it. Then, $\sup_{x \in S} \alpha^T x = \alpha^T x_0$. Since $S$ is closed, there exists $x_1 \in S$ such that $\alpha^T x_1 = \alpha^T x_0$. However, since $x_1 \in S \subseteq \operatorname{conv} S$ and $\alpha$ exposes $x_0$, we must have $x_1 = x_0$. Thus, $x_0 \in S$.

3.3.3 Results

Let us start by answering Question 3.12.

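Theorem 3.21. Let $S \subseteq \mathbb{R}^n$ be closed. Then,
\[ (\operatorname{conv} V_{\operatorname{conv} S})^0 = S^0. \]
Furthermore, if $C \subseteq \mathbb{R}^n$ is a closed convex generator of $S^0$, then
\[ \operatorname{conv} V_{\operatorname{conv} S} \subseteq C. \]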

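Proof. Note that if $S^0 = \emptyset$, then $0 \in \operatorname{conv} S$ and $V_{\operatorname{conv} S} = \{0\}$, from which the theorem clearly follows. Thus, we assume $S^0 \ne \emptyset$.

Lemma 3.7 implies that $(\operatorname{conv} V_{\operatorname{conv} S})^0 = (V_{\operatorname{conv} S})^0$ and $S^0 = (\operatorname{conv} S)^0$. Proposition 3.9 implies $(\operatorname{conv} S)^0 = (V_{\operatorname{conv} S})^0$.

To show the second statement of the theorem, let $C$ be closed and convex such that $C^0 = S^0$. Since $C$ is closed and convex, it is enough to prove that $V_{\operatorname{conv} S} \subseteq C$. Suppose, by contradiction, that this is not the case, i.e., there is $\bar{x} \in V_{\operatorname{conv} S}$ such that $\bar{x} \notin C$. There are two cases, either $[0,1]\bar{x} \cap C = \emptyset$ or $[0,1]\bar{x} \cap C \ne \emptyset$. We will deduce a contradiction from each of them.

First, suppose $[0,1]\bar{x} \cap C = \emptyset$. Both sets are closed and $[0,1]\bar{x}$ is bounded, thus, they can be separated. Indeed, as $0 \in [0,1]\bar{x}$, Rockafellar (1970, Corollary 11.4.1) ensures the existence of $\alpha$ such that $\alpha^T x \ge 1$ for every $x \in C$ and $\alpha^T \bar{x} < 1$. This means that $\alpha \in C^0$. However, $\alpha \notin (\operatorname{conv} S)^0$, since $\bar{x} \in \operatorname{conv} S$. This contradicts $S^0 = C^0$.

Now, suppose $[0,1]\bar{x} \cap C \ne \emptyset$. Since $0 \notin C$ (as $S^0 \ne \emptyset$) and $\bar{x} \notin C$, there must be $\mu \in (0,1)$ such that $\mu \bar{x} \in C$. However, $\bar{x} \in V_{\operatorname{conv} S}$ implies that $\mu \bar{x} \notin \operatorname{conv} S$. Thus, the same argument as above ensures that $\mu \bar{x}$ can be separated from $\operatorname{conv} S$. Therefore, there is an $\alpha$ such that $\alpha^T x \ge 1$ for every $x \in \operatorname{conv} S$ while $\alpha^T \mu \bar{x} < 1$. Hence, $\alpha \in S^0$ and the contradiction follows from the fact that $\mu \bar{x} \in C$ implies $\alpha \notin C^0$.

Therefore, we conclude that $\operatorname{conv} V_{\operatorname{conv} S} \subseteq C$.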

Now we show that if $\operatorname{conv} S$ is closed and $0 \notin \operatorname{conv} S$, then $\operatorname{conv} V_{\operatorname{conv} S}$ is the answer to Question 3.13, i.e., it is the smallest closed convex set $U$ such that $(U \cap S)^0 = S^0$.

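Theorem 3.22. Let $S \subseteq \mathbb{R}^n$ be a closed set such that $\operatorname{conv} S$ is closed and $0 \notin \operatorname{conv} S$, i.e., $S^0 \ne \emptyset$. Then,
\[ (\operatorname{conv}(V_{\operatorname{conv} S}) \cap S)^0 = S^0. \]
Furthermore, if $C$ is closed and convex such that $(C \cap S)^0 = S^0$, then
\[ \operatorname{conv} V_{\operatorname{conv} S} \subseteq C. \]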

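Proof. We first show that $(\operatorname{conv}(V_{\operatorname{conv} S}) \cap S)^0 = S^0$.

Clearly,
\[ V_{\operatorname{conv} S} \cap S \subseteq \operatorname{conv}(V_{\operatorname{conv} S}) \cap S \subseteq S. \]
Lemma 3.7 implies that
\[ S^0 \subseteq (\operatorname{conv}(V_{\operatorname{conv} S}) \cap S)^0 \subseteq (V_{\operatorname{conv} S} \cap S)^0. \]
Thus, it is enough to show that $(V_{\operatorname{conv} S} \cap S)^0 = S^0$. This follows from
\[
\begin{aligned}
(S \cap V_{\operatorname{conv} S})^0 &= (\operatorname{conv}(S \cap V_{\operatorname{conv} S}))^0 && \text{Lemma 3.7} \\
&= (\operatorname{conv} V_{\operatorname{conv} S})^0 && \text{Lemma 3.19} \\
&= (V_{\operatorname{conv} S})^0 && \text{Lemma 3.7} \\
&= (\operatorname{conv} S)^0 && \text{Proposition 3.9} \\
&= S^0. && \text{Lemma 3.7}
\end{aligned}
\]

To show the second statement of the theorem, let $C$ be a closed convex set such that $(C \cap S)^0 = S^0$. Lemma 3.7 implies that $(C \cap S)^0 = (\operatorname{conv}(C \cap S))^0$. Theorem 3.21 implies that $\operatorname{conv} V_{\operatorname{conv} S} \subseteq \operatorname{conv}(C \cap S)$. Clearly, $V_{\operatorname{conv} S} \subseteq \operatorname{conv} V_{\operatorname{conv} S}$ and $\operatorname{conv}(C \cap S) \subseteq C \cap \operatorname{conv} S$. Therefore, $V_{\operatorname{conv} S} \subseteq C \cap \operatorname{conv} S$ which implies $V_{\operatorname{conv} S} \subseteq C$ as we wanted.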

Finally, we answer Question 3.11 in the case where Sis compact.

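Theorem 3.23. Let $S$ be any closed set such that $0 \notin \operatorname{conv} S$. If $D$ is any closed generator of $S^0$, then
\[ \operatorname{ext} V_{\operatorname{conv} S} \subseteq D. \]
If, in addition, $S$ is compact, then $\operatorname{ext} V_{\operatorname{conv} S}$ is the smallest closed generator of $S^0$.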

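Proof. First, by Lemma 3.7 and $D^0 = S^0$, we have $\operatorname{shw} \operatorname{conv} D = \operatorname{shw} \operatorname{conv} S$. Then, Lemma 3.8 implies that $V_{\operatorname{conv} D} = V_{\operatorname{conv} S}$. Hence, $\operatorname{ext} V_{\operatorname{conv} D} = \operatorname{ext} V_{\operatorname{conv} S}$. Therefore, $\operatorname{ext} V_{\operatorname{conv} S} = \operatorname{ext} V_{\operatorname{conv} D} \subseteq \operatorname{ext} \operatorname{conv} D \subseteq D$, where the first and second containments are due to Lemma 3.18 and Lemma 3.20, respectively.

To prove the second statement, by Lemma 3.7, it is enough to show that $(\operatorname{ext} V_{\operatorname{conv} S})^0 = S^0$. First, as $\operatorname{ext} V_{\operatorname{conv} S} \subseteq \operatorname{conv} S$, we have $S^0 \subseteq (\operatorname{ext} V_{\operatorname{conv} S})^0$ by Lemma 3.7.

To prove the other containment take any $\alpha \in (\operatorname{ext} V_{\operatorname{conv} S})^0$. Let $x \in \operatorname{conv} S$ be arbitrary. We will prove that $\alpha^T x \ge 1$. This will imply that $\alpha \in (\operatorname{conv} S)^0 = S^0$ and, therefore, that $(\operatorname{ext} V_{\operatorname{conv} S})^0 \subseteq S^0$.

Let $\lambda \in (0, 1]$ be such that $\lambda x \in V_{\operatorname{conv} S}$. If $\lambda x \in \operatorname{ext} V_{\operatorname{conv} S}$, then $\alpha^T \lambda x \ge 1$ which implies that $\alpha^T x \ge \frac{1}{\lambda} \ge 1$.

Now, assume $\lambda x \notin \operatorname{ext} V_{\operatorname{conv} S}$. Since $S$ is compact, $\operatorname{conv} S$ is closed and we can use Lemma 3.18 to obtain that $\operatorname{ext} V_{\operatorname{conv} S} = V_{\operatorname{conv} S} \cap \operatorname{ext} \operatorname{conv} S$. Thus, $\lambda x \notin \operatorname{ext} \operatorname{conv} S$. Also by the compactness of $S$, Rockafellar (1970, Theorem 18.5.1) implies that $\lambda x$ is a strict convex combination of some $x_1, \dots, x_m \in \operatorname{ext} \operatorname{conv} S$.

Lemma 3.17 implies that $x_1, \dots, x_m \in V_{\operatorname{conv} S}$ and so Lemma 3.18 implies that $x_1, \dots, x_m \in \operatorname{ext} V_{\operatorname{conv} S}$. Since $\alpha \in (\operatorname{ext} V_{\operatorname{conv} S})^0$, it follows that $\alpha^T x_i \ge 1$ for every $i = 1, \dots, m$. Hence, $\alpha^T \lambda x \ge 1$ and, as before, $\alpha^T x \ge \frac{1}{\lambda} \ge 1$.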

We remark that the closure operation is needed since the extreme points

of a set, in general, do not form a closed set, see Rockafellar (1970, p. 167).

3.4 Applications to MINLP

Here we apply the results from Section 3.2 to MINLP.

In this section, unless specified otherwise, $\bar{x} \in \mathbb{R}^n$, $C$ is a closed convex set that contains $\bar{x}$, and $S = \{x \in C : g(x) \le 0\}$, where $g : C \to \mathbb{R}$ is continuous and $g(\bar{x}) > 0$. The idea is that $C$ represents a convex relaxation of our MINLP and $\bar{x} \in C$ is the current relaxation solution that is infeasible for a constraint $g(x) \le 0$.

The basic scheme for applying our results is the following translation of

Observation 3.1.

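Proposition 3.24. Let $D \subseteq C$ be such that $(D \cap S)^{\bar{x}} = S^{\bar{x}}$, and $T = \{x \in D : g(x) \le 0\}$. If $\alpha^T(x - \bar{x}) \ge 1$ is a valid inequality for $T$, then it is valid for $S$.

Proof. Directly from $\alpha \in T^{\bar{x}} = (D \cap S)^{\bar{x}} = S^{\bar{x}}$.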

Of course, the applicability of the previous proposition relies on our ability to obtain an easy-to-compute set $D$ that satisfies the hypothesis. As shown in Section 3.3, $\operatorname{ext} \operatorname{conv} V_{\operatorname{conv} S}(\bar{x})$ is the smallest we can hope for, but it is useless from a practical point of view. Instead, the set of visible points of $S$ (or a set enclosing them) is, computationally, a better candidate as we will see in Section 3.4.1.

Corollary 3.25. Let $D \subseteq C$ be such that $V_S(\bar{x}) \subseteq D$, and $T = \{x \in D : g(x) \le 0\}$. If $\alpha^T(x - \bar{x}) \ge 1$ is a valid inequality for $T$, then it is valid for $S$.

Proof. Clearly, $V_S(\bar{x}) \subseteq T = D \cap S \subseteq S$. The inclusion-reversing property of the reverse polar implies that $S^{\bar{x}} \subseteq (D \cap S)^{\bar{x}} \subseteq V_S(\bar{x})^{\bar{x}} = S^{\bar{x}}$, where the last equality follows from Proposition 3.9. The statement follows from Proposition 3.24.

In the context of separation via convex underestimators Corollary 3.25

reads as follows.


Corollary 3.26. Let $D \subseteq C$ be a closed convex set such that $V_S(\bar{x}) \subseteq D$ and let $T = \{x \in D : g(x) \le 0\}$. If $g^{vex}_D(\bar{x}) > 0$ and $\partial g^{vex}_D(\bar{x}) \ne \emptyset$, then a gradient cut of $g^{vex}_D$ at $\bar{x}$ is valid for $S$.

Proof. Let $T_r = \{x \in D : g^{vex}_D(x) \le 0\}$ and $v \in \partial g^{vex}_D(\bar{x})$. The cut $g^{vex}_D(\bar{x}) + v^T(x - \bar{x}) \le 0$ is valid for $T_r$, and separates $\bar{x}$ from $T_r$. Since $T_r$ is a relaxation, i.e. $T \subseteq T_r$, it follows that the cut is also valid for $T$, and Corollary 3.25 implies its validity for $S$.

The previous result tells us that if we find a box, tighter than the bounds, that contains the visible points, then we might be able to construct tighter underestimators. However, to compute a box containing $V_S(\bar{x})$ we need to know what $V_S(\bar{x})$ looks like. That is the topic of the next section.

3.4.1 Characterizing the Visible Points

From the definition of visible points we have:

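Theorem 3.27. Let $g : \mathbb{R}^n \to \mathbb{R}$ be a continuous function, $C \subseteq \mathbb{R}^n$ a closed convex set, and $S = \{x \in C : g(x) \le 0\}$. If $\bar{x} \in C$ and $g(\bar{x}) > 0$, then
\[ V_S(\bar{x}) = \{x \in C : g(x) = 0, \ g(x + \lambda(\bar{x} - x)) > 0 \text{ for every } \lambda \in (0, 1]\}. \tag{3.1} \]
Furthermore, if $g$ is differentiable, then every $x \in V_S(\bar{x})$ satisfies
\[ \nabla g(x)^T(\bar{x} - x) \ge 0. \]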

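Proof. Given that $\bar{x} \notin S$, by definition we have $x \in V_S(\bar{x})$ if and only if $x \in S$ and for every $\lambda \in (0, 1]$, $x + \lambda(\bar{x} - x) \notin C$ or $g(x + \lambda(\bar{x} - x)) > 0$. However, the convexity of $C$ and $\bar{x} \in C$ imply that for $x \in S$, $x + \lambda(\bar{x} - x) \in C$. Hence,
\[ V_S(\bar{x}) = \{x \in C : g(x) \le 0, \ g(x + \lambda(\bar{x} - x)) > 0 \text{ for every } \lambda \in (0, 1]\}. \]
Since $g$ is continuous, it follows that for $x \in V_S(\bar{x})$,
\[ 0 \ge g(x) = \lim_{\lambda \to 0^+} g(x + \lambda(\bar{x} - x)) \ge 0. \]
Thus, $g(x) = 0$ which proves (3.1).

Now, assume that $g$ is differentiable and let $x \in V_S(\bar{x})$. Then,
\[ 0 \le \lim_{\lambda \to 0^+} \frac{g(x + \lambda(\bar{x} - x))}{\lambda} = \lim_{\lambda \to 0^+} \frac{g(x + \lambda(\bar{x} - x)) - g(x)}{\lambda} = \nabla g(x)^T(\bar{x} - x). \]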

This concludes the claim.
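To make Theorem 3.27 concrete, the following sketch evaluates a sampled version of condition (3.1) together with the necessary gradient condition. The cubic constraint $g(x) = x_2 - x_1^3$, the point $\bar{x} = (0, 1)$, the choice $C = \mathbb{R}^2$ and all function names are assumptions made only for this illustration. Sampling finitely many values of $\lambda$ can refute visibility but cannot certify it.

```python
X_BAR = (0.0, 1.0)


def g(x):
    """Illustrative non-convex constraint function; S = {x : g(x) <= 0}."""
    return x[1] - x[0] ** 3


def grad_g(x):
    return (-3.0 * x[0] ** 2, 1.0)


def sampled_visible(x, samples=1000, tol=1e-9):
    """Sampled test of (3.1): g(x) = 0 and g > 0 at x + lam * (x_bar - x)."""
    if abs(g(x)) > tol:
        return False
    for k in range(1, samples + 1):
        lam = k / samples
        point = tuple(xi + lam * (bi - xi) for xi, bi in zip(x, X_BAR))
        if g(point) <= 0.0:
            return False
    return True


def gradient_condition(x):
    """Necessary condition of Theorem 3.27: grad g(x)^T (x_bar - x) >= 0."""
    grad = grad_g(x)
    return sum(gi * (bi - xi) for gi, xi, bi in zip(grad, x, X_BAR)) >= 0.0


# (1, 1) lies on the boundary and passes both tests.
# (-1, -1) also lies on the boundary, but the segment towards x_bar
# immediately enters S, so it fails both tests.
checks = {
    (1.0, 1.0): (sampled_visible((1.0, 1.0)), gradient_condition((1.0, 1.0))),
    (-1.0, -1.0): (sampled_visible((-1.0, -1.0)), gradient_condition((-1.0, -1.0))),
}
```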


Remark 3.28. Note that if we drop the hypothesis that $\bar{x}$ is in $C$, then there might be visible points for which $g$ is strictly negative, and there does not seem to be a nice description of the visible points. In such a case, $V_S(\bar{x})$ would be a disjunctive set and we would even lose the valid (non-linear) inequality $\nabla g(x)^T(\bar{x} - x) \ge 0$. Likewise, if $C$ was not convex, or if we had more than one non-convex constraint, e.g., some variable has to be binary, then there does not seem to be a nice description of the visible points. This last point is rather unfortunate: it means that it might not be easy to generalize the technique to relaxations that involve more than one non-convex constraint. In particular, since a mixed-integer set usually consists of multiple non-convex constraints, the techniques presented here might not be applicable to MILPs. On the other hand, considering more constraints might allow us to see more of the feasible region. Therefore, in such cases one might have to try to use stronger generators such as $\operatorname{conv} V_{\operatorname{conv} S}$, see also Venkatachalam and Ntaimo (2016).

Quadratic constraints

For quadratic constraints, the visible points have a particularly simple description.

Theorem 3.29. Let $C$ be a closed, convex set that contains $\bar{x}$. Let $g(x) = x^T Q x + b^T x + c$ and $S = \{x \in C : g(x) \le 0\}$. If $g(\bar{x}) > 0$, then
\[ V_S(\bar{x}) = \{x \in C : g(x) = 0, \ \nabla g(\bar{x})^T x + b^T \bar{x} + 2c \ge 0\}. \]

Proof. ($\subseteq$) Let $x \in V_S(\bar{x})$. By Theorem 3.27, we have $g(x) = 0$ and $\nabla g(x)^T(\bar{x} - x) \ge 0$. Equivalently,
\[ x^T Q x + b^T x + c = 0, \qquad 2x^T Q(\bar{x} - x) + b^T(\bar{x} - x) \ge 0. \]
By multiplying the equation by 2, adding it to the inequality, and re-arranging terms we obtain the result.

($\supseteq$) Let $x \in C$ satisfy $g(x) = 0$ and $\nabla g(\bar{x})^T x + b^T \bar{x} + 2c \ge 0$. Then, subtracting $2g(x)$ from $\nabla g(\bar{x})^T x + b^T \bar{x} + 2c \ge 0$ yields $\nabla g(x)^T(\bar{x} - x) \ge 0$. Let
\[ q(\lambda) = g(x + \lambda(\bar{x} - x)), \text{ for } \lambda \in \mathbb{R}. \]
The derivative is given by $q'(\lambda) = \nabla g(x + \lambda(\bar{x} - x))^T(\bar{x} - x)$, and $q'(0) = \nabla g(x)^T(\bar{x} - x) \ge 0$. Since $q$ is quadratic, $q(1) = g(\bar{x}) > 0$, $q(0) = g(x) = 0$, and $q'(0) \ge 0$, we have that $q$ has no roots in $(0, 1]$. Thus, $g(x + \lambda(\bar{x} - x)) = q(\lambda) > 0$ for every $\lambda \in (0, 1]$ and, from Theorem 3.27, we conclude that $x \in V_S(\bar{x})$ as we wanted.
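The characterization of Theorem 3.29 is easy to evaluate. As a hedged illustration, the sketch below takes the ball of radius 1 centered at $(2, 0)$, written as $g(x) = (x_1 - 2)^2 + x_2^2 - 1$ (so $Q = I$, $b = (-4, 0)$, $c = 3$), with $\bar{x} = 0$ and $C = \mathbb{R}^2$, and compares the closed-form test with a sampled version of Definition 3.5. For this data the linear condition reads $-4x_1 + 6 \ge 0$. The function names and the sample angles are our own choices.

```python
import math

B_VEC = (-4.0, 0.0)  # b
C_CONST = 3.0        # c
X_BAR = (0.0, 0.0)


def g(x):
    """g(x) = x^T x + b^T x + c = (x1 - 2)^2 + x2^2 - 1."""
    return (x[0] - 2.0) ** 2 + x[1] ** 2 - 1.0


def grad_g(x):
    return (2.0 * x[0] + B_VEC[0], 2.0 * x[1] + B_VEC[1])


def visible_by_formula(x, tol=1e-9):
    """Theorem 3.29: g(x) = 0 and grad g(x_bar)^T x + b^T x_bar + 2c >= 0."""
    grad = grad_g(X_BAR)
    lhs = (grad[0] * x[0] + grad[1] * x[1]
           + B_VEC[0] * X_BAR[0] + B_VEC[1] * X_BAR[1] + 2.0 * C_CONST)
    return abs(g(x)) <= tol and lhs >= -tol


def visible_by_sampling(x, samples=200, tol=1e-9):
    """Sampled Definition 3.5: g > 0 on the half-open segment towards x_bar."""
    if abs(g(x)) > tol:
        return False
    for k in range(1, samples + 1):
        lam = k / samples
        point = (x[0] + lam * (X_BAR[0] - x[0]), x[1] + lam * (X_BAR[1] - x[1]))
        if g(point) <= 0.0:
            return False
    return True


def circle_point(theta):
    return (2.0 + math.cos(theta), math.sin(theta))


angles = [math.pi, 3.0 * math.pi / 4.0, math.pi / 2.0, 0.0]
results = [(visible_by_formula(circle_point(t)),
            visible_by_sampling(circle_point(t))) for t in angles]
```

The sample angles are deliberately chosen away from the tangent points, where a sampled test would be numerically fragile.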

Remark 3.30. Theorem 3.29 implies in particular that the set of visible points of a closed convex set intersected with a quadratic constraint, from a point in the convex set, is always closed. This does not contradict Deutsch et al. (2013, Example 15.5) mentioned in Remark 3.16. Indeed, if one represents the cone as a quadratic constraint $g(x) \le 0$, then the origin must be feasible for the quadratic constraint. This follows from the fact that the ray $[1, \infty)(1, 0, 0)$ is in the boundary of the cone, which implies that $g(\lambda, 0, 0) = 0$ for every $\lambda \ge 1$. But $g(\lambda, 0, 0)$ is a univariate quadratic function and as such can have at most two roots if it is nonzero. Hence, $g(\lambda, 0, 0) = 0$ for every $\lambda$ and, in particular, $g(0, 0, 0) = 0$.

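Remark 3.31. The hyperplane $\nabla g(\bar{x})^T x + b^T \bar{x} + 2c = 0$ is known as the polar hyperplane (Fasano and Pesenti, 2017) of the point $\bar{x}$ with respect to the quadratic $g$ in projective geometry. In fact, homogenizing the quadratic yields the quadric
\[
g_h(x, x_0) = x^T Q x + b^T x \, x_0 + c \, x_0^2 =
\begin{pmatrix} x \\ x_0 \end{pmatrix}^T
\begin{pmatrix} Q & \frac{b}{2} \\ \frac{b^T}{2} & c \end{pmatrix}
\begin{pmatrix} x \\ x_0 \end{pmatrix}.
\]
The polar hyperplane of $\begin{pmatrix} \bar{x} \\ 1 \end{pmatrix}$ with respect to $g_h(x, x_0) = 0$ is then given by
\[
\nabla g_h(x, x_0)^T (\bar{x}, 1) = 0 \iff 2\bar{x}^T Q x + b^T \bar{x} \, x_0 + b^T x + 2c \, x_0 = 0.
\]
Intersecting with $x_0 = 1$ yields $\nabla g(\bar{x})^T x + b^T \bar{x} + 2c = 0$.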

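Example 3.32. Consider the function
\[ g(x_1, x_2, x_3) = -x_1 x_2 + x_1 x_3 + x_2 x_3 - x_1 - x_2 - x_3 + 1, \]
the boxed domain $B = [-\frac{1}{10}, 2] \times [0, 2]^2$, the constrained set
\[ S = \{x \in B : g(x) \le 0\}, \]
and the infeasible point $\bar{x} = (0, 0, 0)$. Here $\nabla g(\bar{x}) = b = (-1, -1, -1)^T$ and $c = 1$, so the condition $\nabla g(\bar{x})^T x + b^T \bar{x} + 2c \ge 0$ of Theorem 3.29 reads $x_1 + x_2 + x_3 \le 2$, and the visible points from $\bar{x}$ are given by
\[ V_S(\bar{x}) = \{(x_1, x_2, x_3) \in B : g(x) = 0, \ x_1 + x_2 + x_3 \le 2\}, \]
as shown in Figure 3.3.

The tightest box bounding $V_S(\bar{x})$ is
\[ R = [-\tfrac{1}{10}, 1] \times [0, \tfrac{1}{20}(23 + 3\sqrt{5})] \times [0, \tfrac{1}{20}(19 + 3\sqrt{5})]. \]
The linear underestimators of $g$ obtained by using McCormick inequalities (McCormick, 1976) for each term over $B$ and $R$ yield the cuts
\[ 1 \le x_1 + 3x_2 + \tfrac{11}{10} x_3 \quad \text{and} \quad 1 \le x_1 + 2x_2 + \tfrac{11}{10} x_3, \]
respectively. Since $0 \le x_2$, it follows that the underestimator over $R$ dominates the underestimator over $B$. We remark that the improvement in this particular cut is only due to the improvement on the upper bound of $x_1$.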
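The two cuts of Example 3.32 can be reproduced with a few lines of code. The sketch below is one possible way to build the term-wise McCormick underestimator, assuming that for every bilinear term we pick, among the two McCormick facets, the one that is tightest at $\bar{x}$; the helper names `mccormick_term` and `linear_underestimator` are hypothetical.

```python
import math


def g(x):
    x1, x2, x3 = x
    return -x1 * x2 + x1 * x3 + x2 * x3 - x1 - x2 - x3 + 1.0


def mccormick_term(coef, i, j, lower, upper, ref):
    """Linear underestimator (a, a0) of coef * x_i * x_j over a box.

    Among the two McCormick facets, the one with the largest value at
    `ref` is returned, so that a^T x + a0 <= coef * x_i * x_j on the box.
    """
    li, ui, lj, uj = lower[i], upper[i], lower[j], upper[j]
    if coef >= 0:
        # x_i x_j >= lj x_i + li x_j - li lj  and  >= uj x_i + ui x_j - ui uj
        facets = [(lj, li, -li * lj), (uj, ui, -ui * uj)]
    else:
        # x_i x_j <= uj x_i + li x_j - li uj  and  <= lj x_i + ui x_j - ui lj
        facets = [(uj, li, -li * uj), (lj, ui, -ui * lj)]
    best = max(facets, key=lambda f: coef * (f[0] * ref[i] + f[1] * ref[j] + f[2]))
    a = [0.0] * len(ref)
    a[i] = coef * best[0]
    a[j] = coef * best[1]
    return a, coef * best[2]


def linear_underestimator(lower, upper, ref):
    """Term-wise McCormick underestimator a^T x + a0 of g over the box."""
    bilinear = [(-1.0, 0, 1), (1.0, 0, 2), (1.0, 1, 2)]
    a, a0 = [-1.0, -1.0, -1.0], 1.0  # linear and constant part of g
    for coef, i, j in bilinear:
        term_a, term_a0 = mccormick_term(coef, i, j, lower, upper, ref)
        a = [u + v for u, v in zip(a, term_a)]
        a0 += term_a0
    return a, a0


x_bar = (0.0, 0.0, 0.0)
lower = (-0.1, 0.0, 0.0)
upper_B = (2.0, 2.0, 2.0)
upper_R = (1.0, (23 + 3 * math.sqrt(5)) / 20, (19 + 3 * math.sqrt(5)) / 20)

# Each result (a, a0) encodes the cut a^T x + a0 <= 0.
cut_B = linear_underestimator(lower, upper_B, x_bar)
cut_R = linear_underestimator(lower, upper_R, x_bar)
```

With this data, `cut_B` encodes $-x_1 - 3x_2 - \frac{11}{10}x_3 + 1 \le 0$ and `cut_R` encodes $-x_1 - 2x_2 - \frac{11}{10}x_3 + 1 \le 0$, matching the two inequalities of the example.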

Figure 3.3: The left plot shows the feasible region $S$ and $\bar{x}$. The set $\{x \in B : g(x) = 0\}$ appears in the middle plot. Finally, the visible points, $V_S(\bar{x})$, are plotted on the right.

Polynomial constraints

For a general polynomial $g$, the condition
\[ g(x + \lambda(\bar{x} - x)) > 0 \text{ for every } \lambda \in (0, 1] \tag{3.2} \]
in (3.1) asks for the univariate polynomial $p_x(\lambda) = g(x + \lambda(\bar{x} - x))$ to be positive on $(0, 1]$. We can then use the theory of non-negative polynomials to translate a relaxation of the infinitely many constraints (3.2) to a finite number of constraints. From the following classic characterization of univariate non-negative polynomials on intervals, see for instance Powers and Reznick (2000), we can derive an extended formulation for the relaxation of (3.1),
\[ R_S(\bar{x}) := \{x \in C : g(x) = 0, \ g(x + \lambda(\bar{x} - x)) \ge 0 \text{ for every } \lambda \in [0, 1]\}. \]


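Theorem 3.33. Let $p \in \mathbb{R}[\lambda]$ be a polynomial. Then $p$ is non-negative on $[0, 1]$ if and only if

1. the degree of $p$ is $2d$ and there exist $s_1, s_2 \in \mathbb{R}[\lambda]$ of degree $d$ and $d - 1$ respectively, such that
\[ p(\lambda) = s_1(\lambda)^2 + \lambda(1 - \lambda) s_2(\lambda)^2, \]
or

2. the degree of $p$ is $2d + 1$ and there exist $s_1, s_2 \in \mathbb{R}[\lambda]$ of degree $d$, such that
\[ p(\lambda) = \lambda s_1(\lambda)^2 + (1 - \lambda) s_2(\lambda)^2. \]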
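As a quick sanity check of the two certificates, consider the following small instances with $d = 1$. The polynomials $s_1(\lambda) = 2\lambda - 1$ and $s_2$ are chosen by us purely for illustration; expanding the right-hand sides confirms the identities.

```latex
% Even degree (2d = 2): s_1 of degree 1, s_2 of degree 0.
\[
  (2\lambda - 1)^2 + \lambda(1 - \lambda)\cdot 1^2
  = 4\lambda^2 - 4\lambda + 1 + \lambda - \lambda^2
  = 3\lambda^2 - 3\lambda + 1 .
\]
% Odd degree (2d + 1 = 3): s_1 and s_2 of degree 1.
\[
  \lambda(2\lambda - 1)^2 + (1 - \lambda)\lambda^2
  = 4\lambda^3 - 4\lambda^2 + \lambda + \lambda^2 - \lambda^3
  = 3\lambda^3 - 3\lambda^2 + \lambda .
\]
```

Both polynomials are indeed non-negative on $[0, 1]$: the quadratic $3\lambda^2 - 3\lambda + 1$ has negative discriminant, and the cubic equals $\lambda(3\lambda^2 - 3\lambda + 1)$.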

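Theorem 3.34. Let $C$ be a closed convex set that contains $\bar{x}$. Let $g(x)$ be a polynomial such that $g(\bar{x}) > 0$ and $S = \{x \in C : g(x) \le 0\}$. Let $p_x(\lambda) = g(x + \lambda(\bar{x} - x))$.

1. If the degree of $g$ is $2d$, then
\[ R_S(\bar{x}) = \operatorname{proj}_x E, \]
where $E$ is
\[
\Big\{ (x, A, B) \in C \times \mathcal{S}^d_+ \times \mathcal{S}^d_+ : \ g(x) = 0, \ \ p_x'(0) = B_{00}, \ \
\frac{p_x^{(k+2)}(0)}{(k+2)!} = \sum_{\substack{i+j=k \\ 0 \le i,j \le d-1}} (A_{ij} - B_{ij}) + \sum_{\substack{i+j=k+1 \\ 0 \le i,j \le d-1}} B_{ij}, \ \text{for } 0 \le k \le 2d - 2 \Big\}.
\]

2. If the degree of $g$ is $2d + 1$, then
\[ R_S(\bar{x}) = \operatorname{proj}_x E, \]
where $E$ is
\[
\Big\{ (x, A, B) \in C \times \mathcal{S}^{d+1}_+ \times \mathcal{S}^d_+ : \ g(x) = 0, \ \ p_x'(0) = A_{00}, \ \ \frac{p_x''(0)}{2} = 2A_{01} + B_{00}, \ \
\frac{p_x^{(k+3)}(0)}{(k+3)!} = \sum_{\substack{i+j=k+2 \\ 0 \le i,j \le d}} A_{ij} + \sum_{\substack{i+j=k+1 \\ 0 \le i,j \le d-1}} B_{ij} - \sum_{\substack{i+j=k \\ 0 \le i,j \le d-1}} B_{ij}, \ \text{for } 0 \le k \le 2d - 2 \Big\}.
\]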

Proof. We just prove the case of even degree as the proof for the odd degree case is similar. We have $x \in R_S(\bar{x})$ if and only if $p_x(0) = 0$ and $p_x(\lambda)$ is non-negative on $[0, 1]$. By Theorem 3.33, this is equivalent to $p_x(0) = 0$ and there exist polynomials $s_1, s_2$ of degree $d$ and $d - 1$, respectively, such that
\[ p_x(\lambda) = s_1(\lambda)^2 + \lambda(1 - \lambda) s_2(\lambda)^2. \]
Given that $0 = p_x(0) = s_1(0)^2$, the polynomial $s_1$ has a root at 0 and we can write it as $s_1(\lambda) = \lambda r_1(\lambda)$ where $r_1$ is a polynomial of degree $d - 1$. Thus, $x \in R_S(\bar{x})$ if and only if $p_x(0) = 0$ and there exist polynomials $r_1, r_2$ of degree $d - 1$ such that
\[ p_x(\lambda) = \lambda^2 r_1(\lambda)^2 + \lambda(1 - \lambda) r_2(\lambda)^2. \]
Let $\Lambda = (1, \lambda, \dots, \lambda^{d-1})^T$. The polynomials $r_i$ can be written as $r_i(\lambda) = c_i^T \Lambda$ for some $c_i \in \mathbb{R}^d$. Then, $r_1(\lambda)^2 = \Lambda^T A \Lambda$ and $r_2(\lambda)^2 = \Lambda^T B \Lambda$ for some $A, B \in \mathcal{S}^d_+$. Thus, $x \in R_S(\bar{x})$ if and only if $p_x(0) = 0$ and there exist $A, B \in \mathcal{S}^d_+$ such that
\[ p_x(\lambda) = \lambda^2 \Lambda^T A \Lambda + \lambda(1 - \lambda) \Lambda^T B \Lambda. \]
Since $p_x(\lambda)$ is a polynomial of degree $2d$, its Taylor expansion at 0 yields
\[ p_x(\lambda) = \sum_{k=1}^{2d} \frac{p_x^{(k)}(0)}{k!} \lambda^k. \]
Identifying coefficients, we conclude the theorem.

Remark 3.35. One could also add the constraints $\operatorname{rank}(A) = \operatorname{rank}(B) = 1$ to $E$ in the statement of Theorem 3.34. The correctness can easily be seen from the proof since $A = c_1 c_1^T$ and $B = c_2 c_2^T$. Although it makes the set more restricted, the rank constraint is non-convex and does not change the projection. Thus, we decided to leave it out.

We can recover Theorem 3.29 from Theorem 3.34. The set $E$ of Theorem 3.34 for the quadratic case ($d = 1$) is described by $g(x) = 0$, $p_x'(0) = B_{00}$ and $\frac{p_x''(0)}{2} = A_{00} - B_{00}$, where $A_{00}, B_{00} \ge 0$. This implies that $0 < g(\bar{x}) = p_x(1) = p_x'(0) + \frac{p_x''(0)}{2} = A_{00}$. Therefore, $R_S(\bar{x})$ consists of the $x \in C$ such that $p_x(0) = 0$ and $p_x'(0) \ge 0$. This last constraint is equivalent to $\nabla g(x)^T(\bar{x} - x) \ge 0$ which is the only constraint needed, apart from $g(x) = 0$, to prove Theorem 3.29.

Figure 3.4: Feasible region $g(x) \le 0$ of Example 3.36 that shows that $\operatorname{cl} V_S(\bar{x}) \ne R_S(\bar{x})$ when the degree of $g$ is greater than 2.

The previous deduction is only possible because $V_S(\bar{x}) = R_S(\bar{x})$ holds for a quadratic constraint. This equality does not hold as soon as the degree is greater than 2, even after replacing $V_S(\bar{x})$ by its closure, as shown in the following example.

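Example 3.36. Consider $g(x_1, x_2) = (x_1^2 + x_2^2 - 1) x_1$, $S = \{(x_1, x_2) : g(x_1, x_2) \le 0\}$, and $\bar{x} = (1, -2)$. The set $S$ consists of the right half of the unit ball and the half space $x_1 \le 0$ without the interior of the left half of the unit ball, see Figure 3.4. The point $z = (-1, 0)$ is not visible from $\bar{x}$, because
\[ g(z + \lambda(\bar{x} - z)) = g(-1 + 2\lambda, -2\lambda) = ((2\lambda - 1)^2 + 4\lambda^2 - 1)(2\lambda - 1) = 4\lambda(2\lambda - 1)^2 \]
is zero at $\lambda = \frac{1}{2}$. On the other hand, $z \in R_S(\bar{x})$ since $4\lambda(2\lambda - 1)^2 \ge 0$ for every $\lambda \in [0, 1]$. In this example $V_S(\bar{x})$ is closed, so we conclude that
\[ \operatorname{cl} V_S(\bar{x}) \ne R_S(\bar{x}). \]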
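The computation in Example 3.36 can be double-checked in exact rational arithmetic. The sketch below restricts $g$ to the segment from $z$ to $\bar{x}$, compares it with the closed form $4\lambda(2\lambda - 1)^2$ on a grid, and confirms that the restriction is non-negative on $[0, 1]$ but vanishes at $\lambda = \frac{1}{2}$. The helper name `restricted` and the grid size are our own choices.

```python
from fractions import Fraction

X_BAR = (Fraction(1), Fraction(-2))
Z = (Fraction(-1), Fraction(0))


def g(x1, x2):
    return (x1 ** 2 + x2 ** 2 - 1) * x1


def restricted(lam):
    """g evaluated at z + lam * (x_bar - z)."""
    x1 = Z[0] + lam * (X_BAR[0] - Z[0])
    x2 = Z[1] + lam * (X_BAR[1] - Z[1])
    return g(x1, x2)


grid = [Fraction(k, 1000) for k in range(0, 1001)]

# The restriction agrees with 4 * lam * (2 * lam - 1)^2 on the grid.
matches_closed_form = all(
    restricted(lam) == 4 * lam * (2 * lam - 1) ** 2 for lam in grid
)
# z satisfies the relaxed condition defining R_S(x_bar) on the grid ...
nonnegative_on_grid = all(restricted(lam) >= 0 for lam in grid)
# ... but is not visible, because the segment touches S at lam = 1/2.
touches_S = restricted(Fraction(1, 2)) == 0
```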

3.5 Conclusions and Outlook

Using the concept of visible points, we introduced a technique that allows us to reduce the domains in separation problems. Such a result is particularly interesting for MINLP, since the tightness of the domain directly affects the quality of underestimators, from which cuts are obtained.

Some questions that could be interesting to look at in the future are the following. Is there a tighter domain other than $V_S$ that can be efficiently exploited? Is there a useful characterization of $V_S$ when $S$ contains more than one non-convex constraint, in particular, if some variables are restricted to be integer?

Chapter 4

Intersection Cuts for Factorable

Mixed-Integer Nonlinear Programming

We now move to our final stop, intersection cuts (see Section 1.2). In this chapter we develop a technique for constructing $S$-free sets where $S = \{x : g(x) \le 0\}$ and $g$ is an arbitrary factorable function. In the next chapter we specialize to the case where $g$ is quadratic and we construct maximal $S$-free sets.

In order to build an $S$-free set for the case that $g$ is factorable, we develop a procedure that constructs a concave underestimator of $g$ that is tight at a given point. A peculiarity of these underestimators is that they do not rely on a bounded domain. We propose a strengthening procedure for the intersection cuts that exploits the bounds of the domain. Finally, we propose an extension of monoidal strengthening to take advantage of the integrality of non-basic variables.

In Section 4.1 we introduce our setting, motivate intersection cuts for MINLP by making a parallel between branch and bound for MILP and MINLP, and describe the contributions of the chapter. In Section 4.2 we review some literature and related works. Then we jump right into the construction of concave underestimators in Section 4.3. The improvement using bound information is presented in Section 4.4, while our application of monoidal strengthening appears in Section 4.5. We offer a summary of the chapter in Section 4.6.

This chapter is based on the publication Serrano (2019).


4.1 Motivation

In this chapter we propose a procedure for generating intersection cuts for MINLP. We consider MINLP of the following form
\[
\begin{aligned}
\max \ & c^T x \\
\text{s.t.} \ & g_j(x) \le 0, && j \in J \\
& Ax = b \\
& x_i \in \mathbb{Z}, && i \in I \\
& x \ge 0,
\end{aligned}
\tag{4.1}
\]
where $J = \{1, \dots, l\}$ denotes the indices of the nonlinear constraints, $g_j : \mathbb{R}^n \to \mathbb{R}$ are assumed to be continuous and factorable (see Definition 4.1), $A \in \mathbb{R}^{m \times n}$, $c \in \mathbb{R}^n$, $b \in \mathbb{R}^m$, and $I \subseteq \{1, \dots, n\}$ are the indices of the integer variables. We denote the set of feasible solutions by $S$ and a generic relaxation of $S$ by $R$, that is, $S \subseteq R$.

The current state of the art for solving MINLP to global optimality is via linear programming (LP), convex nonlinear programming and MILP relaxations of $S$, together with spatial branch and bound (Belotti et al., 2009; Kılınç and Sahinidis, 2017; Lin and Schrage, 2009; Misener and Floudas, 2014; Tawarmalani and Sahinidis, 2005; Vigerske and Gleixner, 2017). Let us recall, roughly, how LP-based spatial branch and bound works. The initial polyhedral relaxation is solved and yields $\bar{x}$. If the solution $\bar{x}$ is feasible for (4.1), we obtain an optimal solution. If not, we try to separate the solution from the feasible region. This is usually done by considering each violated constraint separately. Let $g(x) \le 0$ be a violated constraint of (4.1). If $g(\bar{x}) > 0$ and $g$ is convex, then $g(\bar{x}) + v^T(x - \bar{x}) \le 0$, where $v \in \partial g(\bar{x})$, is a valid cut. If $g$ is non-convex, then a convex underestimator $g^{vex}$, that is, a convex function such that $g^{vex}(x) \le g(x)$ over the feasible region, is constructed and if $g^{vex}(\bar{x}) > 0$ the previous cut is constructed for $g^{vex}$. If the point cannot be separated, then we branch, that is, we select a variable $x_k$ in a violated constraint and split the problem into two problems, one with $x_k \le \bar{x}_k$ and the other one with $x_k \ge \bar{x}_k$.
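The gradient cut used in the convex case can be sketched in a few lines. The example below assumes a differentiable convex $g$, so that the subgradient $v$ is simply the gradient; the unit-disk constraint, the point $\bar{x} = (1, 1)$ and the function names are hypothetical choices made for illustration only.

```python
def gradient_cut(g, grad_g, x_bar):
    """Return (v, rhs) encoding the cut v^T x <= rhs.

    The cut is g(x_bar) + v^T (x - x_bar) <= 0 with v = grad g(x_bar),
    rewritten as v^T x <= v^T x_bar - g(x_bar).
    """
    v = grad_g(x_bar)
    rhs = sum(vi * xi for vi, xi in zip(v, x_bar)) - g(x_bar)
    return v, rhs


def satisfies(cut, x, tol=1e-9):
    v, rhs = cut
    return sum(vi * xi for vi, xi in zip(v, x)) <= rhs + tol


# Convex constraint g(x) = x1^2 + x2^2 - 1 <= 0 (the unit disk).
def g(x):
    return x[0] ** 2 + x[1] ** 2 - 1.0


def grad_g(x):
    return (2.0 * x[0], 2.0 * x[1])


x_bar = (1.0, 1.0)                 # violates the constraint: g(x_bar) = 1
cut = gradient_cut(g, grad_g, x_bar)  # encodes 2 x1 + 2 x2 <= 3
```

By convexity, $g(\bar{x}) + v^T(x - \bar{x}) \le g(x) \le 0$ for every feasible $x$, which is why the cut is valid, while at $\bar{x}$ itself the left-hand side equals $g(\bar{x}) > 0$.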

Applying the previous procedure to the MILP case, that is (4.1) with $J = \emptyset$, reveals a problem with this approach. In this case, the polyhedral relaxation is just the linear programming (LP) relaxation. Assuming that $\bar{x}$ is not feasible for the MILP, then there is an $i \in I$ such that $\bar{x}_i \notin \mathbb{Z}$. Let us treat the constraint $x_i \in \mathbb{Z}$ as a nonlinear non-convex constraint represented by some function $g$ as $g(x_i) \le 0$. Then, $g(\bar{x}_i) > 0$. A convex underestimator $g^{vex}$ of $g$ must satisfy that $g^{vex}(z) \le 0$ for every $z \in \mathbb{R}$, since $g^{vex}(z) \le g(z) \le 0$ for every $z \in \mathbb{Z}$ and $g^{vex}$ is convex. Thus, separation is not possible and we need to branch. However, for the current state-of-the-art algorithms for MILP cutting planes are a fundamental component (Achterberg and Wunderling, 2013).

Recall, from Section 1.2, that when solving the LP relaxation, we obtain $x_B = \bar{x}_B + R x_N$, where $B$ and $N$ are the indices of the basic and non-basic variables, respectively. Since $\bar{x}$ is infeasible for the MILP, there must be some $k \in B \cap I$ such that $\bar{x}_k \notin \mathbb{Z}$. Now, even though $\bar{x}$ cannot be separated from the violated constraint $x_k \in \mathbb{Z}$, the equivalent constraint, $\bar{x}_k + \sum_{j \in N} r_{kj} x_j \in \mathbb{Z}$ can be used to separate $\bar{x}$.

In the MINLP case, this framework generates equivalent non-linear constraints with some appealing properties; in particular, violated points can always be separated. The change of variables $x_k = \bar{x}_k + \sum_{j \in N} r_{kj} x_j$ for the basic variables present in a violated nonlinear constraint $g(x) \le 0$ produces the non-linear constraint $h(x_N) \le 0$ for which $h(0) > 0$ and $x_N \ge 0$. Assuming that the convex envelope of $h$ exists in $x_N \ge 0$, then we can always construct a valid inequality. Indeed, by Tawarmalani and Sahinidis (2002, Corollary 3), the convex envelope of $h$ is tight at 0. Since an $\epsilon$-subgradient always exists for any $\epsilon > 0$ and $x \in \operatorname{dom} h$ (Brondsted and Rockafellar, 1965), an $\frac{h(0)}{2}$-subgradient, for instance, at 0 will separate it.

Even when there is no convex underestimator for $h$, a valid cutting plane does exist. Continuity of $h$ implies that $X = \{x_N \ge 0 : h(x_N) \le 0\}$ is closed and Conforti et al. (2015, Lemma 2.1) ensures that $0 \notin \operatorname{conv} X$, thus, a valid inequality exists. We introduce a technique to construct such a valid inequality. The idea is to build a concave underestimator of $h$, $h_{ave}$, such that $h_{ave}(0) = h(0) > 0$. Then, $\{x_N : h_{ave}(x_N) \ge 0\}$ is an $S$-free set, that is, a convex set that does not contain any feasible point in its interior, and as such can be used to build an intersection cut (IC) (Tuy, 1964; Balas, 1971; Glover, 1973).

First contribution

In Section 4.3, we present a procedure to build concave underestimators for factorable functions that are tight at a given point. The procedure is similar to McCormick’s method for constructing convex underestimators, and generalizes Proposition 3.2 and improves Proposition 3.3 of Khamisov (1999). A simple way to build a concave underestimator of a function is to write the function as a difference of convex (d.c.) functions; then, by linearizing the convex part, a concave underestimator is obtained. However, even if a function is known to have a d.c. representation, it is not always clear how to construct it.

(An $\epsilon$-subgradient of a convex function $f$ at $y \in \operatorname{dom} f$ is a vector $v$ such that $f(x) \ge f(y) - \epsilon + v^T(x - y)$ for all $x \in \operatorname{dom} f$.)

These underestimators can be used to build intersection cuts. We note that an IC from a concave underestimator can generate cuts that cannot be generated by using the convex envelope. This should not be surprising, given that intersection cuts work at the feasible region level, while convex underestimators depend on the graph of the function. A simple example is $\{x \in [0, 2] : -x^2 + 1 \le 0\}$. When separating 0, the intersection cut gives $x \ge 1$ while using the convex envelope over $[0, 2]$ yields $x \ge 1/2$.
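The two bounds in this example can be recomputed with a short sketch. The function names are hypothetical; the intersection cut is obtained by following the ray from 0 until it leaves the set where the constraint function is non-negative, and the convex envelope of a concave function on an interval is taken to be its secant.

```python
def g(x):
    """Constraint function of the example: feasible points satisfy g(x) <= 0."""
    return 1.0 - x * x


def intersection_cut_bound(lo=0.0, hi=2.0, iters=60):
    """Bisection for the point where the ray from 0 leaves {x : g(x) >= 0}."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def envelope_cut_bound(l=0.0, u=2.0):
    """Root of the secant of g over [l, u] (convex envelope of a concave g)."""
    slope = (g(u) - g(l)) / (u - l)
    # secant(x) = g(l) + slope * (x - l) <= 0  <=>  x >= l - g(l) / slope
    return l - g(l) / slope


ic_bound = intersection_cut_bound()   # cut reads x >= ic_bound
env_bound = envelope_cut_bound()      # cut reads x >= env_bound
```

The intersection cut recovers the exact bound of the feasible region, while the envelope-based cut only reaches the root of the secant.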

There are many differences between concave underestimators and convex

ones. Maybe the most interesting one is that concave underestimators do not

need bounded domains to exist. As an extreme example, $-x^2$ is a concave underestimator of itself, but a convex underestimator only exists if the domain

is bounded. Even though this might be regarded as an advantage, it is

also a problem. If concave underestimators are independent of the domain,

then we cannot improve them when the domain shrinks.

Second contribution

In Section 4.4, we propose a strengthening procedure that uses the bounds of the variables to enlarge the $S$-free set. Our procedure improves on the one used by Tuy (1964).

Other techniques for strengthening IC have been proposed, such as exploiting the integrality of the non-basic variables (Balas and Jeroslow, 1980; Conforti et al., 2011a; Dey and Wolsey, 2010), improving the relaxation $R$ (Balas and Margot, 2011; Porembski, 1999, 2001) and computing the convex hull of $R \setminus C$ (Basu et al., 2011; Conforti et al., 2015; Glover, 1974; Sen and Sherali, 1986, 1987).

Third contribution

By interpreting IC as disjunctive cuts (Balas, 1979),

we extend the monoidal strengthening technique of Balas and Jeroslow (1980)

to our setting in Section 4.5. Although its applicability seems to be limited,

we think it is of independent interest, especially for MILP.

4.2 Literature Review and Related Work

There have been many efforts on generalizing cutting planes from MILP to MINLP; we refer the reader to Modaresi et al. (2015) and the references therein. Modaresi et al. (2015) study how to compute $\operatorname{conv}(R \setminus C)$ where $R$ is not polyhedral, but $C$ is a $k$-branch split. In practice, such sets $C$ usually come from the integrality of the variables. Works that build sets $C$ which do not come from integrality considerations include Belotti (2011); Bienstock et al. (2019); Fischetti et al. (2016, 2017); Fischetti and Monaci (2019); Saxena et al. (2010a,b). We refer to Bonami et al. (2011) and the references therein for more details.

Fischetti et al. (2016) applied intersection cuts to bilevel optimization.

Bienstock et al. (2016, 2019) studied outer-product-free sets; these can be

used for generating intersection cuts for polynomial optimization when using

an extended formulation. Fischetti and Monaci (2019) constructed bilinear-

free sets through a bound disjunction and, in each term of the disjunction,

underestimating the bilinear term with McCormick inequalities (McCormick,

1976). The complement of this disjunction is the bilinear-free set.

We would like to point out that the disjunctions built in Belotti (2011);

Fischetti and Monaci (2019); Saxena et al. (2010b,a) can be interpreted as

piecewise linear concave underestimators. However, our approach is not suit-

able for disjunctive cuts built through cut generating LPs (Balas et al., 1993),

since we generate infinite disjunctions, see Section 4.5, so we rely on the classic

concept of intersection cuts where $R$ is a translated simplicial cone.

Khamisov (1999) studies functions $f : \mathbb{R}^n \to \mathbb{R}$, representable as $f(x) = \max_{y \in R} \varphi(x, y)$ where $\varphi$ is continuous and concave in $x$. These functions allow for a concave underestimator at every point. He shows that this class of functions is very general; in particular, the class of functions representable as a difference of convex functions is a strict subset of this class. He then proposes a procedure to build concave underestimators of compositions of functions which is a special case of Theorem 4.4 below. He also suggests how to build an underestimator for the product of two functions over a compact domain. The construction is based on writing the product as a difference of convex functions and then using a construction for the square of a function. The construction of a convex overestimator of the square is based on a piecewise linear overestimation of the square function over the range of the function being squared, which is why Khamisov needs a compact domain. We simplify the construction for the product and no longer need a compact domain. We still write the product as a d.c. function but we use Theorem 4.4 instead of a piecewise linear overestimator, allowing us to drop the compactness assumption.

Although not directly related to our work, other papers that use underestimators other than convex are Buchheim and D’Ambrosio (2016); Buchheim and Traversi (2013); Hasan (2018). We would also like to mention here the work of Towle and Luedtke (2019) that proposes a method for constructing valid cutting planes with a similar approach to intersection cuts, but allowing $\bar{x}$ to not be in the $S$-free set. The $S$-free sets developed in this chapter could also be used in their framework.

4.3 Concave Underestimators

In his seminal paper, McCormick (1976) proposed a method to build convex

underestimators of factorable functions.

Definition 4.1. Given a set of univariate functions, e.g., $\{\cos, \cdot^n, \exp, \log, \ldots\}$, the set of factorable functions is the smallest set that contains these univariate functions and the constant functions, and is closed under addition, product and composition.

As an example, $e^{-(\cos(x^2) + xy/4)^2}$ is a factorable function for the univariate set $\{\cos, \exp\}$.

Given the inductive definition of factorable functions, to show a property about them one just needs to show that said property holds for all the functions in the given univariate set and for the constant functions, and that it is preserved by the product, addition and composition. For instance, McCormick (1976) proves, constructively, that every factorable function admits a convex underestimator and a concave overestimator, by showing how to construct estimators for the sum, product and composition of two functions for which estimators are known.

An estimator for the sum of two functions is the sum of the estimators. For the product, McCormick uses the well-known McCormick inequalities. Less known is the way McCormick handles the composition $f(g(x))$. Let $f_{vex}$ be a convex underestimator of $f$ and $z_{min} = \arg\min f_{vex}(z)$. Let $g_{vex}$ be a convex underestimator of $g$ and $g_{ave}$ a concave overestimator. McCormick shows that $f_{vex}(\operatorname{mid}\{g_{vex}(x), g_{ave}(x), z_{min}\})$ is a convex underestimator of $f(g(x))$, where $\operatorname{mid}\{x, y, z\}$ is the median between $x$, $y$ and $z$. It is well known that the optimum of a convex function over a closed interval is given by such a formula, thus
\[ f_{vex}(\operatorname{mid}\{g_{vex}(x), g_{ave}(x), z_{min}\}) = \min\{f_{vex}(z) : z \in [g_{vex}(x), g_{ave}(x)]\}, \]
see also Tsoukalas and Mitsos (2014).
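The median formula for minimizing a convex function over an interval can be checked numerically with a small sketch. The convex function $(z - 1)^2$, its minimizer and the test intervals are assumptions picked for illustration; the brute-force search is only there as a reference value.

```python
def mid(a, b, c):
    """Median of three numbers."""
    return sorted((a, b, c))[1]


def f_vex(z):
    """Hypothetical convex function with unconstrained minimizer Z_MIN."""
    return (z - 1.0) ** 2


Z_MIN = 1.0


def closed_form(lo, hi):
    """min{f_vex(z) : z in [lo, hi]} via the median formula."""
    return f_vex(mid(lo, hi, Z_MIN))


def brute_force(lo, hi, samples=20001):
    """Reference value obtained by sampling the interval uniformly."""
    return min(f_vex(lo + (hi - lo) * i / (samples - 1)) for i in range(samples))


# The minimizer lies to the right of, inside, and to the left of the interval.
intervals = [(-2.0, 0.0), (0.0, 3.0), (2.0, 5.0)]
results = [(closed_form(lo, hi), brute_force(lo, hi)) for lo, hi in intervals]
```

The median simply clips the unconstrained minimizer to the interval, which is why the two computations agree up to the sampling resolution.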

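Definition 4.2. Let $\mathcal{X} \subseteq \mathbb{R}^n$ be convex, and $f : \mathcal{X} \to \mathbb{R}$ be a function. We say that $f_{ave} : \mathcal{X} \to \mathbb{R}$ is a concave underestimator of $f$ at $\bar{x} \in \mathcal{X}$ if $f_{ave}$ is concave, $f_{ave}(x) \le f(x)$ for every $x \in \mathcal{X}$ and $f_{ave}(\bar{x}) = f(\bar{x})$. Similarly we define a convex overestimator of $f$ at $\bar{x} \in \mathcal{X}$.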

Remark 4.3. For simplicity, we will consider only the case where $\mathcal{X} = \mathbb{R}^n$. This restriction leaves out some common functions like $\log$. One possibility to include these functions is to let the range of the function be $\mathbb{R} \cup \{\pm\infty\}$. Then, $\log(x) = -\infty$ for $x \in \mathbb{R}_-$. Note that other functions like $\sqrt{x}$ can be handled by replacing them by a concave underestimator defined on all of $\mathbb{R}$.

4 He actually leaves it as an exercise for the reader.

We now show that every factorable function admits a concave underestimator at a given point. Since the case for the addition is easy, we just need to specify how to build concave underestimators and convex overestimators for
– the product of two functions for which estimators are known,
– the composition $f(g(x))$ where estimators of $f$ and $g$ are known and $f$ is univariate.

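Theorem 4.4. Let $f : \mathbb{R} \to \mathbb{R}$ and $g : \mathbb{R}^n \to \mathbb{R}$. Let $g_{ave}$, $f_{ave}$ be, respectively, a concave underestimator of $g$ at $\bar{x}$ and of $f$ at $g(\bar{x})$. Further, let $g_{vex}$ be a convex overestimator of $g$ at $\bar{x}$. Then, $h : \mathbb{R}^n \to \mathbb{R}$ given by
\[ h(x) := \min\{f_{ave}(g_{ave}(x)), f_{ave}(g_{vex}(x))\}, \]
is a concave underestimator of $f \circ g$ at $\bar{x}$.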

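Proof. Clearly, $h(\bar{x}) = f(g(\bar{x}))$.

To establish $h(x) \le f(g(x))$, notice that
\[ h(x) = \min\{f_{ave}(z) : g_{ave}(x) \le z \le g_{vex}(x)\}. \tag{4.2} \]
Since $g(x)$ is a feasible solution and $f_{ave}$ is an underestimator of $f$, we obtain that $h(x) \le f_{ave}(g(x)) \le f(g(x))$.

Now, let us prove that $h$ is concave. To this end, we again use the representation (4.2). To simplify notation, we write $g_1$, $g_2$ for $g_{ave}$, $g_{vex}$, respectively. We prove concavity by definition, that is,
\[ h(\lambda x_1 + (1 - \lambda) x_2) \ge \lambda h(x_1) + (1 - \lambda) h(x_2), \quad \text{for } \lambda \in [0, 1]. \]
Let
\[ I = [g_1(\lambda x_1 + (1 - \lambda) x_2),\ g_2(\lambda x_1 + (1 - \lambda) x_2)], \]
\[ J = [\lambda g_1(x_1) + (1 - \lambda) g_1(x_2),\ \lambda g_2(x_1) + (1 - \lambda) g_2(x_2)]. \]
By the concavity of $g_1$ and convexity of $g_2$ we have $I \subseteq J$. Therefore,
\[ h(\lambda x_1 + (1 - \lambda) x_2) = \min\{f_{ave}(z) : z \in I\} \ge \min\{f_{ave}(z) : z \in J\}. \]
Since $f_{ave}$ is concave, the minimum is achieved at the boundary,
\[ \min\{f_{ave}(z) : z \in J\} = \min_{i \in \{1, 2\}} f_{ave}(\lambda g_i(x_1) + (1 - \lambda) g_i(x_2)). \]
Furthermore, $f_{ave}(\lambda g_i(x_1) + (1 - \lambda) g_i(x_2)) \ge \lambda f_{ave}(g_i(x_1)) + (1 - \lambda) f_{ave}(g_i(x_2))$, which implies that
\begin{align*}
h(\lambda x_1 + (1 - \lambda) x_2) &\ge \min_{i \in \{1, 2\}} \lambda f_{ave}(g_i(x_1)) + (1 - \lambda) f_{ave}(g_i(x_2)) \\
&\ge \min_{i \in \{1, 2\}} \lambda f_{ave}(g_i(x_1)) + \min_{i \in \{1, 2\}} (1 - \lambda) f_{ave}(g_i(x_2)) \\
&= \lambda h(x_1) + (1 - \lambda) h(x_2),
\end{align*}
as we wanted to show.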
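A numerical sanity check of Theorem 4.4 on one instance may help. In this sketch the outer function is assumed to be $f(z) = -z^2$ (concave, so it is its own concave underestimator) and the inner function is assumed to be $g(x) = \cos(x)$ with estimators $1 - x^2/2 \le \cos(x) \le 1$ at $\bar{x} = 0$. These choices and all names are illustrative and not taken from the text.

```python
import math


def g_ave(x):
    return 1.0 - x * x / 2.0      # concave underestimator of cos at 0


def g_vex(x):
    return 1.0                    # convex overestimator of cos at 0


def f_ave(z):
    return -z * z                 # f(z) = -z^2 is concave already


def h(x):
    """Estimator of Theorem 4.4 for f(g(x)) = -cos(x)^2 at x_bar = 0."""
    return min(f_ave(g_ave(x)), f_ave(g_vex(x)))


def target(x):
    return -math.cos(x) ** 2


grid = [-4.0 + 8.0 * i / 80 for i in range(81)]

# h must underestimate the composition everywhere and be tight at 0.
underestimates = all(h(x) <= target(x) + 1e-12 for x in grid)
tight = abs(h(0.0) - target(0.0)) < 1e-12

# Midpoint concavity on all pairs of grid points.
midpoint_concave = all(
    h(0.5 * (a + b)) >= 0.5 * (h(a) + h(b)) - 1e-12 for a in grid for b in grid
)
```

Note that $f_{ave}(g_{ave}(x))$ alone is not concave near 0 in this instance; taking the minimum with $f_{ave}(g_{vex}(x))$ is what restores concavity.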

Remark 4.5. The generalization of Theorem 4.4 to the case where $f$ is multivariate in the spirit of Tsoukalas and Mitsos (2014) is straightforward.

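The computation of a concave underestimator and convex overestimator of the product of two functions reduces to the computation of estimators for the square of a function through the polarization identity
\[ f(x) g(x) = \tfrac{1}{4}(f(x) + g(x))^2 - \tfrac{1}{4}(f(x) - g(x))^2. \]
This identity is based on writing the product $x_1 x_2$ as a difference of convex functions. In particular, it can be proven by doing an eigenvalue decomposition of the Hessian of $x_1 x_2$.

Let $h : \mathbb{R}^n \to \mathbb{R}$ be a function for which we know estimators $h_{ave} \le h \le h_{vex}$ at $\bar{x}$, that is, a concave underestimator $h_{ave}$ and a convex overestimator $h_{vex}$ in the sense of Definition 4.2. From Theorem 4.4, a convex overestimator of $h^2$ at $\bar{x}$ is given by $\max\{h_{vex}^2, h_{ave}^2\}$. On the other hand, a concave underestimator of $h^2$ at $\bar{x}$ can be constructed from the underestimator $h^2(x) \ge h^2(\bar{x}) + 2h(\bar{x})(h(x) - h(\bar{x}))$. Multiplying by $2h(\bar{x})$ preserves the concavity of $h_{ave}$ when $h(\bar{x}) > 0$ and turns the convex $h_{vex}$ into a concave function when $h(\bar{x}) \le 0$. From here we obtain
\[ \begin{cases} 2h(\bar{x}) h_{vex}(x) - h^2(\bar{x}), & \text{if } h(\bar{x}) \le 0 \\ 2h(\bar{x}) h_{ave}(x) - h^2(\bar{x}), & \text{if } h(\bar{x}) > 0. \end{cases} \tag{4.3} \]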
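The construction for the square can be tried on a small instance. The sketch below follows the convention of Definition 4.2 and Theorem 4.4 (`h_ave` is a concave underestimator, `h_vex` a convex overestimator), so the convex overestimator is used when $h(\bar{x}) \le 0$ and the concave underestimator when $h(\bar{x}) > 0$. The test functions $\pm\cos$ and the helper name `square_underestimator` are assumptions made for illustration.

```python
import math


def square_underestimator(h_bar, h_ave, h_vex):
    """Concave underestimator of h^2 at x_bar, where h_bar = h(x_bar).

    Built from h^2(x) >= h_bar^2 + 2 * h_bar * (h(x) - h_bar): the sign of
    h_bar decides which estimator keeps the result concave and valid.
    """
    if h_bar > 0:
        return lambda x: 2.0 * h_bar * h_ave(x) - h_bar ** 2
    return lambda x: 2.0 * h_bar * h_vex(x) - h_bar ** 2


# Case h = cos at x_bar = 0 (h_bar = 1): 1 - x^2/2 <= cos(x) <= 1.
pos = square_underestimator(1.0, lambda x: 1.0 - x * x / 2.0, lambda x: 1.0)

# Case h = -cos at x_bar = 0 (h_bar = -1): -1 <= -cos(x) <= x^2/2 - 1.
neg = square_underestimator(-1.0, lambda x: -1.0, lambda x: x * x / 2.0 - 1.0)

grid = [-3.0 + 6.0 * i / 300 for i in range(301)]
pos_valid = all(pos(x) <= math.cos(x) ** 2 + 1e-12 for x in grid)
neg_valid = all(neg(x) <= math.cos(x) ** 2 + 1e-12 for x in grid)
```

Both cases produce the same concave function $1 - x^2$, which lies below $\cos^2(x)$ and touches it at 0.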

Example 4.6. Let us compute a concave underestimator of $f(x) = e^{-(\cos(x^2) + x/4)^2}$ at 0. Estimators of $x^2$ are given by $0 \le x^2 \le x^2$. For $\cos(x)$, estimators are $\cos(0) - x^2/2 \le \cos(x) \le 1$. Then, a concave underestimator of $\cos(x^2)$ is, according to Theorem 4.4, $\min\{\cos(0) - 0^2/2, \cos(0) - x^4/2\} = \cos(0) - x^4/2$. A convex overestimator is 1. Hence, $\cos(0) - x^4/2 + x/4 \le \cos(x^2) + x/4 \le 1 + x/4$.

Given that $-x^2$ is concave, a concave underestimator of $-(\cos(x^2) + x/4)^2$ is $\min\{-(\cos(0) - x^4/2 + x/4)^2, -(1 + x/4)^2\}$. To compute a convex overestimator of $-(\cos(x^2) + x/4)^2$, we compute a concave underestimator of $(\cos(x^2) + x/4)^2$. Since $\cos(x^2) + x/4$ at 0 is 1, (4.3) yields $2(\cos(0) - x^4/2 + x/4) - 1$.

Finally, a concave underestimator of $e^x$ at $-1$ is just its linearization, $e^{-1} + e^{-1}(x + 1)$, and so
\[ e^{-1}\left(2 + \min\{-(\cos(0) - x^4/2 + x/4)^2, -(1 + x/4)^2\}\right) \]
is a concave underestimator of $f(x)$. The intermediate estimators as well as the final concave underestimator are illustrated in Figure 4.1.

Figure 4.1: Concave underestimator (orange) and convex overestimator (green) of $\cos(x^2) + x/4$ (left), $-(\cos(x^2) + x/4)^2$ (middle) and $f(x)$ (right) at $\bar{x} = 0$.
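The final estimator of Example 4.6 can be checked numerically. The sketch below assumes the inner estimators $1 - x^4/2 + x/4$ and $1 + x/4$ and the linearization of the exponential at $-1$; the function names and the sampling grid are illustrative choices.

```python
import math


def f(x):
    return math.exp(-(math.cos(x * x) + x / 4.0) ** 2)


def inner_under(x):
    return 1.0 - x ** 4 / 2.0 + x / 4.0    # concave underestimator of cos(x^2) + x/4


def inner_over(x):
    return 1.0 + x / 4.0                   # convex overestimator of cos(x^2) + x/4


def neg_square_under(x):
    """Concave underestimator of -(cos(x^2) + x/4)^2 via Theorem 4.4."""
    return min(-inner_under(x) ** 2, -inner_over(x) ** 2)


def f_under(x):
    """Linearization of exp at -1 composed with the inner underestimator."""
    return math.exp(-1.0) * (2.0 + neg_square_under(x))


grid = [-3.0 + 6.0 * i / 600 for i in range(601)]
valid = all(f_under(x) <= f(x) + 1e-12 for x in grid)

coarse = grid[::10]
concave = all(
    f_under(0.5 * (a + b)) >= 0.5 * (f_under(a) + f_under(b)) - 1e-9
    for a in coarse
    for b in coarse
)
```

Only the concave underestimator of the inner function enters the last step, because the linearization of the exponential is increasing.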

For ease of exposition, in the rest of the chapter we assume that the concave

underestimator is differentiable. All results can be extended to the case where

the functions are only sub- or super-differentiable.

4.3.1 Concave Underestimators and Intersection Cuts for Convex

Constraints

Here we show that if we apply our procedure to construct an $S$-free set from a violated convex constraint and compute an intersection cut using the smallest representation (see Section 1.2), we just recover the gradient cut. Even more, this gradient cut is the same that we would have computed in the original space. In particular, the point is separable in the original space if and only if it is separable in the non-basic space. If one recalls that gradient cuts do not use bounds information, then this might not be surprising.

Let $g(x)$ be a differentiable convex function and consider the constraint $g(x) \le 0$. Suppose $x_B = f + R x_N$ is the current optimal tableau and $(x_B, x_N) = (f, 0)$ the optimal LP solution. Further, assume that $g(f, 0) > 0$.

Let $h(x_N) = g(f + R x_N, x_N)$ and note that this function is still convex since it is the composition of a convex function with an affine map. A concave underestimator at 0 is just the linearization of $h$ at 0, that is,
\[ h(0) + \nabla h(0)^T x_N. \]
Then, the $S$-free set is $\{x_N : h(0) + \nabla h(0)^T x_N \ge 0\} = \{x_N : -\tfrac{1}{h(0)} \nabla h(0)^T x_N \le 1\}$. Thus the smallest representation is given by the sublinear function (actually, linear) $\rho(x_N) = -\tfrac{1}{h(0)} \nabla h(0)^T x_N$. In the space of the non-basic variables the rays are just $e_i$ for $i \in N$. Thus, the intersection cut is $\sum_{i \in N} \rho(e_i) x_i \ge 1$, that is, $-\tfrac{1}{h(0)} \nabla h(0)^T x_N \ge 1$. Manipulating the last expression we arrive at $h(0) + \nabla h(0)^T x_N \le 0$. This is the same as the gradient cut of $h$ at 0.

Furthermore,
\begin{align*}
h(0) + \nabla h(0)^T x_N &= g(f, 0) + \nabla g(f, 0)^T \begin{pmatrix} R \\ I \end{pmatrix} x_N \\
&= g(f, 0) + \nabla g(f, 0)^T \begin{pmatrix} x_B - f \\ x_N \end{pmatrix}.
\end{align*}
This last expression is the gradient cut of $g$ at $(f, 0)$.

Thus, there is nothing to be gained from this approach for convex constraints.

An interesting observation, in connection to Chapter 2, is that the $S$-free set, either $\{x_N : h(0) + \nabla h(0)^T x_N \ge 0\}$ in the non-basic space or $\{(x_B, x_N) : g(f, 0) + \nabla g(f, 0)^T \begin{pmatrix} x_B - f \\ x_N \end{pmatrix} \ge 0\}$, is not going to be maximal if it does not support the constraint. In particular, if $g$ is strictly convex the $S$-free set is not maximal. This will be important in the next chapter.

Remark 4.7. Also, this already provides evidence that the $S$-free sets constructed by our approach will not be maximal in general. Assume we have a function $g$, $S = \{x : g(x) \le 0\}$, and we write $g$ as a difference of convex functions $g = f - h$. Say we linearize the function $f$ at a point $\bar{x}$ such that $g(\bar{x}) > 0$ to obtain $f \ge l$. Then, the concave underestimator is $l - h \le g$ and the $S$-free set is $l - h \ge 0$. If $f$ is strictly convex, we would have $l(x) < f(x)$ for every $x \ne \bar{x}$. This $S$-free set will not touch $S$. If it did, that is, if there is a point $x$ both in $S$ and the $S$-free set, then $f(x) - h(x) \ge l(x) - h(x) \ge 0 \ge g(x) = f(x) - h(x)$, thus $x = \bar{x}$ and $g(\bar{x}) = 0$, which contradicts our assumption.

This argument is very far from a proof since, first, our procedure does not really construct a d.c. decomposition, but rather uses a d.c. decomposition as an intermediate step for the product. Second, an $S$-free set does not need to touch $S$ in order to be maximal (see Chapter 5).

4.4 Enlarging the S-free Sets by Using Bound Information

In Section 4.3, we showed how to build concave underestimators which give $S$-free sets. Note that the construction does not make use of the bounds of the domain. We can exploit the bounds of the domain by the observation that the concave underestimator only needs to underestimate within the feasible region. However, to preserve the convexity of the $S$-free set, we must ensure that the underestimator is still concave.

Let $h(x) \le 0$ be a constraint of (4.1), assume $x \in [l, u]$ and let $h_{ave}$ be a concave underestimator of $h$. Throughout this section, $S = \{x \in [l, u] : h(x) \le 0\}$. In order to construct a concave function $\hat{h}$ such that $\{x : \hat{h}(x) \ge 0\}$ contains $\{x : h_{ave}(x) \ge 0\}$, consider the following function
\[ \hat{h}(x) = \min\{h_{ave}(z) + \nabla h_{ave}(z)^T (x - z) : z \in [l, u],\ h_{ave}(z) \ge 0\}. \tag{4.4} \]
A similar function was already considered by Tuy (1964). The only difference is that Tuy’s strengthening does not use the restriction $h_{ave}(z) \ge 0$, see Figure 4.2.

Proposition 4.8. Let $h_{ave}$ be a concave underestimator of $h$ at $\bar{x} \in [l, u]$, such that $h(\bar{x}) > 0$. Define $\hat{h}$ as in (4.4). Then, the set $C = \{x : \hat{h}(x) \ge 0\}$ is a convex $S$-free set and $C \supseteq \{x : h_{ave}(x) \ge 0\}$.

Proof. The function $\hat{h}$ is concave since it is the minimum of linear functions. This establishes the convexity of $C$.

To show that $C \supseteq \{x : h_{ave}(x) \ge 0\}$, notice that $h_{ave}(x) = \min_z h_{ave}(z) + \nabla h_{ave}(z)^T (x - z)$. The inclusion follows from observing that the objective function in the definition of $\hat{h}(x)$ is the same as above, but over a smaller domain.

To show that it is $S$-free, we will show that for every $x \in [l, u]$ such that $h(x) \le 0$, $\hat{h}(x) \le 0$.

Let $x_0 \in [l, u]$ be such that $h(x_0) \le 0$. Since $h_{ave}$ is a concave underestimator at $\bar{x}$, $h_{ave}(\bar{x}) > 0$ and $h_{ave}(x_0) \le 0$. If $h_{ave}(x_0) = 0$, then, by definition, $\hat{h}(x_0) \le h_{ave}(x_0) = 0$ and we are done. We assume, therefore, that $h_{ave}(x_0) < 0$.

Consider $\phi(\lambda) = h_{ave}(\bar{x} + \lambda(x_0 - \bar{x}))$ and let $\lambda_1 \in (0, 1)$ be such that $\phi(\lambda_1) = 0$. The existence of $\lambda_1$ is justified by the continuity of $\phi$, $\phi(0) > 0$ and $\phi(1) < 0$. Equivalently, $x_1 = \bar{x} + \lambda_1(x_0 - \bar{x})$ is the intersection point between the segment joining $x_0$ with $\bar{x}$ and $\{x : h_{ave}(x) = 0\}$. The linearization of $\phi$ at $\lambda_1$ evaluated at $\lambda = 1$ is negative, because $\phi$ is concave, and equals $h_{ave}(x_1) + \nabla h_{ave}(x_1)^T (x_0 - x_1)$. Finally, given that $x_1 \in [l, u]$ and $h_{ave}(x_1) = 0$, $x_1$ is feasible for (4.4) and we conclude that $\hat{h}(x_0) < 0$.

Figure 4.2: Feasible region $\{x, y \in [0, 2] : h(x, y) \le 0\}$, for a quadratic function $h$, in blue together with $h_{ave}(x, y) \le 0$ at $\bar{x} = (1, 1)$ (left), Tuy’s strengthening (middle) and $\hat{h} \le 0$ (right) in orange. The box $[0, 2]^2$ is bounded by black lines. The difference between the $S$-free sets can be seen on the top of the picture.

In general, evaluating $\hat{h}$ is a difficult problem and there is no closed form formula. However, when $h_{ave}$ is quadratic, the problem in the right hand side of (4.4) is convex and a cut could be strengthened in polynomial time.
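A one-dimensional toy instance shows how the function in (4.4) enlarges the set where the underestimator is non-negative. The sketch assumes $h_{ave}(x) = 1 - x^2$ on the box $[0, 2]$ and evaluates the minimum in (4.4) by brute force over a grid of linearization points; the names and the grid size are illustrative choices, not the author's procedure.

```python
def h_ave(x):
    return 1.0 - x * x          # assumed concave underestimator


def dh_ave(x):
    return -2.0 * x             # its derivative


def h_hat(x, l=0.0, u=2.0, n=2001):
    """Brute-force evaluation of (4.4) in one dimension.

    Minimizes the linearization of h_ave at z, taken at x, over grid points
    z in [l, u] that satisfy h_ave(z) >= 0.
    """
    best = float("inf")
    for i in range(n):
        z = l + (u - l) * i / (n - 1)
        if h_ave(z) >= 0.0:
            best = min(best, h_ave(z) + dh_ave(z) * (x - z))
    return best


inside = [-1.0 + 2.0 * i / 200 for i in range(201)]
dominates = all(h_hat(x) >= h_ave(x) - 1e-9 for x in inside)
```

In this instance $\{x : h_{ave}(x) \ge 0\} = [-1, 1]$, while the brute-force values suggest that the enlarged set is $(-\infty, 1]$: points to the left of $-1$ have a positive value of `h_hat` even though `h_ave` is negative there, and points in $(1, 2]$ stay outside.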

4.5 “Monoidal” Strengthening

We show how to strengthen cuts from reverse convex constraints when ex-

actly one non-basic variable is integer. Our technique is based on monoidal

strengthening applied to disjunctive cuts, see Lemma 4.10 and the discussion

following it. If more than one variable is integer, we can generate one cut

per integer variable, relaxing the integrality of all but one variable at a time.

However, under some conditions (see Remark 4.12), we can exploit the inte-

grality of several variables at the same time. For an introduction to monoidal

strengthening see Section 1.4.

Throughout this section, we assume that we already have a concave underestimator, and that we have performed the change of variables described in the introduction. Therefore, we consider the constraint $\{x \in [0, u] : h(x) \le 0\}$ where $h : \mathbb{R}^n \to \mathbb{R}$ is concave and $h(0) > 0$. Let $Y = \{y \in [0, u] : h(y) = 0\}$. The convex $S$-free set $C = \{x \in [0, u] : h(x) \ge 0\}$ can be written as
\[ C = \bigcap_{y \in Y} \{x \in [0, u] : \nabla h(y)^T x \ge \nabla h(y)^T y\}. \]

The concavity of $h$ implies that $h(0) \le h(y) - \nabla h(y)^T y$ for all $y$ in the domain of $h$. In particular, if $y \in Y$, then $\nabla h(y)^T y \le -h(0) < 0$. Since all feasible points satisfy $h(x) \le 0$, they must satisfy the infinite disjunction
\[ \bigvee_{y \in Y} \frac{\nabla h(y)^T}{\nabla h(y)^T y} x \ge 1. \tag{4.5} \]

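The maximum principle (see Section 1.4) implies that with
\[ \alpha_j = \max_{y \in Y} \frac{\partial_j h(y)}{\nabla h(y)^T y}, \tag{4.6} \]
the cut $\sum_j \alpha_j x_j \ge 1$ is valid. We remark that the maximum exists, since the concavity of $h$ implies that for $y \in Y$, $h(e_j) \le \partial_j h(y) - \nabla h(y)^T y$. This implies, together with $\nabla h(y)^T y \le -h(0) < 0$, that $\frac{\partial_j h(y)}{\nabla h(y)^T y} \le 1 + \frac{h(e_j)}{\nabla h(y)^T y}$. If $h(e_j) \ge 0$ then $\frac{\partial_j h(y)}{\nabla h(y)^T y} \le 1$. Otherwise, $\frac{\partial_j h(y)}{\nabla h(y)^T y} \le 1 - \frac{h(e_j)}{h(0)}$.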

The application of monoidal strengthening (Balas and Jeroslow, 1980, Theorem 3) to a valid disjunction $\bigvee_i \alpha^i x \ge 1$ requires the existence of bounds $\beta_i$ such that $\alpha^i x \ge \beta_i$ is valid for every feasible point. Let $\beta(y)$ be such a bound for (4.5). An example of $\beta(y)$ is
\[ \beta(y) = \min_{x \in [0, u]} \frac{\nabla h(y)^T x}{\nabla h(y)^T y}. \]

Remark 4.9. If $\beta(y) \ge 1$, then $\nabla h(y)^T x / \nabla h(y)^T y \ge 1$ is redundant and can be removed from (4.5). Therefore, we can assume without loss of generality that $\beta(y) < 1$.

The following lemma is just a restatement of Lemma 1.1 in Section 1.4.

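Lemma 4.10. Every $x \ge 0$ that satisfies (4.5), also satisfies
\[ \bigvee_{y \in Y} \frac{\nabla h(y)^T x}{\nabla h(y)^T y} + z(y)(1 - \beta(y)) \ge 1, \tag{4.7} \]
where $z : Y \to \mathbb{Z}$ is such that $z \equiv 0$ or there is a $y_0 \in Y$ for which $z(y_0) > 0$.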

Proof. If $z \equiv 0$, then (4.7) reduces to (4.5).

Otherwise, let $y_0 \in Y$ be such that $z(y_0) > 0$, that is, $z(y_0) \ge 1$. By Remark 4.9, for every $y \in Y$, it holds that $1 - \beta(y) > 0$, and so
\[ z(y_0)(1 - \beta(y_0)) \ge 1 - \beta(y_0). \]
Therefore, $\beta(y_0) \ge 1 - z(y_0)(1 - \beta(y_0))$. Since every $x \ge 0$ satisfying (4.5) satisfies $\frac{\nabla h(y_0)^T x}{\nabla h(y_0)^T y_0} \ge \beta(y_0)$, we conclude that $\frac{\nabla h(y_0)^T x}{\nabla h(y_0)^T y_0} + z(y_0)(1 - \beta(y_0)) \ge 1$ holds.

Remark 4.11. Even if some disjunctive terms have no lower bound, that is, $\beta(y) = -\infty$ for $y \in Y' \subseteq Y$, Lemma 4.10 still holds if, additionally, $z(y) = 0$ for all $y \in Y'$. This means that we are not using that disjunction for the strengthening. In particular, if for some variable $x_j$ the coefficient $\alpha_j$ is defined by some $y \in Y'$, then this cut coefficient cannot be improved.

Assume now that $x_k \in \mathbb{Z}$ for every $k \in K \subseteq \{1, \ldots, n\}$. One way of constructing a new disjunction is to find a set of functions $M$ such that for any choice of $m_k \in M$ and any feasible assignment of $x_k$, $z(y) := \sum_{k \in K} x_k m_k(y)$ satisfies the conditions of Lemma 4.10, that is, $z$ is in
\[ \mathcal{Z} = \{z : Y \to \mathbb{Z} : z \equiv 0 \lor \exists y \in Y,\ z(y) > 0\}. \]
Once such a family of functions has been identified, the cut $\sum_j \gamma_j x_j \ge 1$ with $\gamma_j = \alpha_j$ if $j \notin K$, and
\[ \gamma_k = \inf_{m \in M} \max_{y \in Y} \frac{\partial_k h(y)}{\nabla h(y)^T y} + m(y)(1 - \beta(y)) \quad \text{for } k \in K, \tag{4.8} \]
is valid and at least as strong as (4.6). Any $M \subseteq \mathcal{Z}$ such that $(M, +)$ is a monoid, that is, $0 \in M$ and $M$ is closed under addition, can be used in (4.8).

The question that remains is how to choose $M$. For example, the monoid $M = \{m : Y \to \mathbb{Z} : m \text{ has finite support and } \sum_{y \in Y} m(y) \ge 0\}$ is an obvious candidate for $M$. However, the problem is how to optimize over such an $M$, see (4.8).

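We circumvent this problem by considering only one integer variable at a time. Fix $k \in K$. In this setting we can use $M = \mathcal{Z}$, the set of functions satisfying the conditions of Lemma 4.10, which is not a monoid. Indeed, if $z \in \mathcal{Z}$, then $x_k z \in \mathcal{Z}$ for any $x_k \in \mathbb{Z}_+$. The advantage of using $\mathcal{Z}$ is that the solution of (4.8) is easy to characterize.

With $M = \mathcal{Z}$, the cut coefficients (4.8) of all variables are the same as (4.6) except for $x_k$. The cut coefficient of $x_k$ is given by
\[ \inf_{z \in \mathcal{Z}} \max_{y \in Y} \frac{\partial_k h(y)}{\nabla h(y)^T y} + z(y)(1 - \beta(y)). \]
To compute this coefficient, observe that one would like to have $z(y) < 0$ for points $y$ such that the objective function of (4.6) is large. However, $z$ must be positive for at least one point. Therefore,
\[ \min_{y \in Y} \frac{\partial_k h(y)}{\nabla h(y)^T y} + (1 - \beta(y)) \]
is the best coefficient we can hope for if $z \not\equiv 0$. This coefficient can be achieved by
\[ z(y) = \begin{cases} 1, & \text{if } y \in \arg\min_{y \in Y} \frac{\partial_k h(y)}{\nabla h(y)^T y} + (1 - \beta(y)), \\ -L, & \text{otherwise} \end{cases} \tag{4.9} \]
where $L > 0$ is sufficiently large.

Summarizing, we can obtain the following cut:
\[ \alpha_j = \begin{cases} \max_{y \in Y} \frac{\partial_j h(y)}{\nabla h(y)^T y} & \text{if } j \ne k \\ \min\left\{\max_{y \in Y} \frac{\partial_j h(y)}{\nabla h(y)^T y},\ \min_{y \in Y} \frac{\partial_j h(y)}{\nabla h(y)^T y} + (1 - \beta(y))\right\} & \text{if } j = k \end{cases} \tag{4.10} \]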

Remark 4.12. Let $z_k \in \mathcal{Z}$ be given by (4.9) for each $k \in K$. Assume there is a subset $K_0 \subseteq K$ and a monoid $M \subseteq \mathcal{Z}$ such that $z_k \in M$ for every $k \in K_0$. Then, the strengthening can be applied to all $x_k$ for $k \in K_0$.

Alternatively, if there is a constraint enforcing that at most one of the $x_k$ can be non-zero for $k \in K_0$, e.g., $\sum_{k \in K} x_k \le 1$, then the strengthening can be applied to all $x_k$ for $k \in K_0$.

In the finite case, our application of monoidal strengthening would be dominated by the original technique of Balas and Jeroslow (1980) by using an appropriate monoid. However, in the presence of extra constraints, such as the one described above, our technique can dominate vanilla monoidal strengthening.

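Example 4.13. Consider the constraint $\{x \in \{0, 1, 2\} \times [0, 5] : h(x) \le 0\}$ where $h(x_1, x_2) = -10x_1^2 - \tfrac{1}{2}x_2^2 + 2x_1 x_2 + 4$, see Figure 4.3. The IC is given by $\sqrt{5/2}\, x_1 + \tfrac{1}{2\sqrt{2}} x_2 \ge 1$. Note that $(1/\sqrt{10}, \sqrt{10}) \in Y$ and yields the term $\tfrac{1}{\sqrt{10}} x_2 \ge 1$ in (4.5). Since $x_2 \ge 0$, $\beta(1/\sqrt{10}, \sqrt{10}) = 0$. Hence, (4.10) yields $\alpha_1 \le \min\{\sqrt{5/2}, 1\} = 1$ and the strengthened inequality is $x_1 + \tfrac{1}{2\sqrt{2}} x_2 \ge 1$.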
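The coefficients of this example can be recomputed from (4.6) and (4.10) with a brute-force sketch. It assumes $h(x_1, x_2) = -10x_1^2 - \tfrac{1}{2}x_2^2 + 2x_1x_2 + 4$ and upper bounds $u = (2, 5)$, samples the zero set $Y$ on a grid, and adds two boundary points explicitly; all names, the grid size and the choice of bounds are assumptions made for illustration.

```python
import math

U = (2.0, 5.0)  # assumed upper bounds of the box [0, u]


def grad_h(y1, y2):
    """Gradient of the assumed h(x1, x2) = -10 x1^2 - x2^2/2 + 2 x1 x2 + 4."""
    return (-20.0 * y1 + 2.0 * y2, 2.0 * y1 - y2)


def sample_Y(n=20001):
    """Points of {y in [0, u] : h(y) = 0}, solving h = 0 for x2 given x1."""
    pts = [(1.0 / math.sqrt(10.0), math.sqrt(10.0)), (2.0 / math.sqrt(10.0), 0.0)]
    x1_max = 1.0 / math.sqrt(2.0)
    for i in range(n):
        x1 = x1_max * i / (n - 1)
        r = math.sqrt(max(0.0, 8.0 - 16.0 * x1 * x1))
        for x2 in (2.0 * x1 + r, 2.0 * x1 - r):
            if 0.0 <= x2 <= U[1]:
                pts.append((x1, x2))
    return pts


def coefficients():
    """Return (alpha_1, alpha_2, strengthened alpha_1) over the sampled Y."""
    alpha = [-math.inf, -math.inf]
    best_shifted = math.inf
    for y in sample_Y():
        g = grad_h(*y)
        denom = g[0] * y[0] + g[1] * y[1]            # grad h(y)^T y < 0 on Y
        c = (g[0] / denom, g[1] / denom)             # one term of (4.5)
        beta = sum(min(0.0, cj * uj) for cj, uj in zip(c, U))
        alpha = [max(a, cj) for a, cj in zip(alpha, c)]
        best_shifted = min(best_shifted, c[0] + 1.0 - beta)
    return alpha[0], alpha[1], min(alpha[0], best_shifted)


alpha1, alpha2, alpha1_strong = coefficients()
```

Under these assumptions the sampled maxima agree with the intersection cut coefficients and the strengthened coefficient of $x_1$ drops to 1, attained at the term generated by $(1/\sqrt{10}, \sqrt{10})$.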

4.6 Conclusions

We have introduced a procedure to generate concave underestimators of fac-

torable functions, which can be used to generate intersection cuts, together

with two strengthening procedures.

The practical performance of these intersection cuts remains to be seen. We expect that their generation is cheaper than the generation of disjunctive cuts, given that there is no need to solve an LP. As for the strengthening procedures, they might be too expensive to be of practical use. An alternative is to construct a polyhedral inner approximation of the $S$-free set and use monoidal strengthening in the finite setting. However, in this case, the strengthening proposed in Section 4.4 has no effect. Nonetheless, as far as the author knows, this is the first application of monoidal strengthening that is able to exploit further problem structure, as demonstrated in Remark 4.12, and it might be interesting to investigate further.

With respect to maximality, we cannot expect, in principle, that the $S$-free sets constructed via the techniques presented here are maximal. In the next chapter we show how to construct maximal $S$-free sets when $S$ is described by a single quadratic constraint.

Figure 4.3: The feasible region $\{x \in \{0, 1, 2\} \times [0, 5] : h(x) \le 0\}$ from Example 4.13 (left), the IC (middle), and the strengthened cut (right).

Chapter 5

Maximal Quadratic-Free Sets

As we discussed in Section 1.2, classic intersection cuts are undominated when they are generated from maximal $S$-free sets. However, maximality can be a challenging goal in general. In this chapter, we show how to construct maximal $S$-free sets when $S$ is defined as a general quadratic inequality.

The chapter is organized as follows. In Section 5.1 we introduce our setting, review some related work and describe the contributions of the chapter. In Section 5.2 we introduce some definitions and necessary conditions to prove maximality of $S$-free sets. In particular, we define exposing points and exposing points at infinity and show that if $C$ is an $S$-free set whose defining inequalities are exposed or exposed at infinity, then $C$ is maximal. In Section 5.3 we show how to construct maximal $S$-free sets when $S$ is defined by a homogeneous quadratic function. Section 5.4 presents the construction of maximal $S$-free sets when $S$ is defined by a homogeneous quadratic function and a homogeneous linear inequality constraint. The construction of a maximal $S$-free set when $S$ is the sublevel set of any quadratic function is presented in Section 5.5. Our constructions depend on a “canonical” representation of the set $S$. The effects of this representation are discussed in Section 5.6. In Section 5.7 we collect some generalizations and remarks. In particular, we generalize the construction of Section 5.3 to show how to construct a maximal $S$-free set when $S$ is the 0-sublevel set of a difference of sublinear functions. We also show how to handle more than one homogeneous linear inequality, extending the result of Section 5.4. We discuss how our results can extend the work of Bienstock et al. (2019) by constructing maximal outer-product-free sets when the considered 2 by 2 minor contains entries of the diagonal. We show, via an example, that our construction does not capture every possible maximal quadratic-free set, even in the homogeneous case.

The cuts developed in this chapter have been implemented in Chmiela (2020). We briefly discuss their computational impact in Section 5.8. We offer a summary and some directions for further research in Section 5.9. Finally, we present some omitted proofs in Section 5.10.

This chapter is joint work with Gonzalo Muñoz. An extended abstract based on this chapter has been accepted in the proceedings of Integer Programming and Combinatorial Optimization (Muñoz and Serrano, 2020).

5.1 Background

Consider a generic optimization problem,
\begin{align}
\min\ & c^T x \tag{5.1a} \\
\text{s.t.}\ & x \in S \subseteq \mathbb{R}^n. \tag{5.1b}
\end{align}
A particularly important case is obtained when (5.1) is a quadratic problem, that is,
\[ S = \{x \in \mathbb{R}^n : x^T Q_i x + b_i^T x + c_i \le 0,\ i = 1, \ldots, m\} \]
for certain $n \times n$ matrices $Q_i$, not necessarily positive semi-definite. Note that if $\bar{x} \notin S$, there exists $i \in \{1, \ldots, m\}$ such that
\[ \bar{x} \notin S_i := \{x \in \mathbb{R}^n : x^T Q_i x + b_i^T x + c_i \le 0\}, \]
and constructing an $S_i$-free set containing $\bar{x}$ would suffice to ensure separation. Thus, slightly abusing notation, given $\bar{x}$ we focus on a systematic way of constructing $S$-free sets containing $\bar{x}$, where $S$ is defined using a single quadratic inequality:
\[ S = \{x \in \mathbb{R}^n : x^T Q x + b^T x + c \le 0\}. \]
As a final note, if we consider the simplest form of intersection cuts, where the cuts are computed using the intersection points of the $S$-free set and the extreme rays of the simplicial conic relaxation of $S$ (i.e., using the gauge), then the larger the $S$-free set, the better. In other words, if two $S$-free sets $C_1$, $C_2$ are such that $C_1 \subsetneq C_2$, the intersection cut derived from $C_2$ is stronger than the one derived from $C_1$ (Conforti et al., 2015). Therefore, we aim at computing maximal $S$-free sets.

5.1.1 Related Work

From all the works that construct intersection cuts in a non-linear setting reviewed in Section 4.2, the only one that ensures maximality of the corresponding $S$-free sets is the work of Bienstock et al. (2016, 2019). While their approach can also be used to generate cutting planes in our setting (general quadratic inequalities), the definition of $S$ differs: Bienstock et al. use a moment-based extended formulation of polynomial optimization problems (Shor, 1987; Lasserre, 2001; Laurent, 2009) and from there define $S$ as the set of matrices which are positive semi-definite and of rank 1, which the authors refer to as outer-products. Maximality is computed with respect to this notion. It is unclear if a maximal outer-product-free set can be converted into a maximal quadratic-free set. There is an even more fundamental difference that makes these approaches incomparable at this point: in a quadratic setting, the approach of Bienstock et al. would compute a cutting plane in an extended space of dimension proportional to $n^2$, whereas our approach can construct a maximal $S$-free set in the original space. The quadratic dimension increase can be a drawback in some applications; however, stronger cuts can be derived from extended formulations in some cases (Bodur et al., 2017). A thorough comparison of these approaches is the subject of future work.

5.1.2 Contribution

The main contribution of this chapter is an explicit construction of maximal $S$-free sets, when $S$ is defined using a non-convex quadratic inequality (Theorem 5.36 and Theorem 5.46). We achieve this by relying on the fact that any quadratic inequality can be represented using a homogeneous quadratic inequality intersected with a linear equality. While these maximal $S$-free sets are constructed using semi-infinite representations, we show equivalent closed-form representations of them.

In order to construct these sets, we also derive maximal $S$-free sets for sets $S$ defined as the intersection of a homogeneous quadratic inequality with a linear homogeneous inequality. These are an important intermediate step in our construction, but they are of independent interest as well.

In order to show our results, we state and prove a criterion for maximality of $S$-free sets which generalizes a criterion proven by Dey and Wolsey (the ‘only if’ of (Dey and Wolsey, 2010, Proposition A.4)) in the case of maximal lattice-free sets (Definition 5.2 and Theorem 5.6). We also develop a new criterion that can handle a special phenomenon that arises in our setting and also in non-linear integer programming: the boundary of a maximal $S$-free set may not even intersect $S$. Instead, the intersection might be “at infinity”. We formalize this in Definition 5.9 and show the criterion in Theorem 5.11.

5.1.3 Notation

Perhaps the least standard notation we use is denoting an inequality $\alpha^{T}x \le \beta$ by $(\alpha, \beta)$. If $\beta = 0$ we denote it as well as $\alpha$. This is based on the fact that in the polar of a convex set (roughly, the set of all valid inequalities) the inequalities are points and, although we do not use any polarity results, many of the ideas in this chapter were originally developed from looking at the polar.

5.2 Preliminaries

In this section we collect definitions and results that are going to be useful later on. As we mentioned above, our main object of study is the set $S = \{x \in \mathbb{R}^p : q(x) \le 0\} \subseteq \mathbb{R}^p$, where $q$ is a quadratic function. To make the analysis easier, we can work on $\mathbb{R}^{p+1}$ and consider the cone generated by $S \times \{1\}$, namely, $\{(x, z) \in \mathbb{R}^{p+1} : z^2 q(x/z) \le 0,\ z \ge 0\}$. To recover the original $S$, however, we must intersect the cone with $z = 1$. Since we are interested in maximal $S$-free sets, this motivates the following definition, see also Basu et al. (2010a).

Definition 5.1. Given $S, C, H \subseteq \mathbb{R}^n$ where $S$ is closed, $C$ is closed and convex and $H$ is an affine hyperplane, we say that $C$ is $S$-free with respect to $H$ if $C \cap H$ is $S \cap H$-free w.r.t. the induced topology in $H$. We say $C$ is maximal $S$-free with respect to $H$, if for any $C' \supseteq C$ that is $S$-free with respect to $H$ it holds that $C' \cap H \subseteq C \cap H$.

5.2.1 Techniques for Proving Maximality

In this section we describe some sufficient conditions to prove that a convex set $C$ is maximal $S$-free which will be used in the chapter.

A sufficient (and necessary) condition for a full dimensional convex lattice-free (that is, $S = \mathbb{Z}^n$) set $C$ to be maximal is that $C$ is a polyhedron and there is a point of $S$ in the relative interior of each of its facets (Conforti et al., 2014, Theorem 6.18). More generally, if $C$ is a full dimensional $S$-free polyhedron such that there is a point of $S$ in the relative interior of each facet, then $C$ is maximal. The problem with extending this property to non-polyhedral maximal $S$-free sets is that they might not even have facets, e.g., if $S$ is the complement of $\operatorname{int} B_1(0)$ and $C = B_1(0)$ in dimension 3 or higher.

The motivation of the next definition is to capture the property of a facet

that is key for proving maximality.

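Definition 5.2. Given a convex set $C \subseteq \mathbb{R}^n$ and a valid inequality $\alpha^{T}x \le \beta$ we say that a point $x_0 \in \mathbb{R}^n$ exposes $(\alpha, \beta)$ with respect to $C$, or that $(\alpha, \beta)$ is exposed by $x_0$, if

– $\alpha^{T}x_0 = \beta$ and,

– if $\gamma^{T}x \le \delta$ is any other non-trivial valid inequality for $C$ such that $\gamma^{T}x_0 = \delta$, then there exists a $\mu > 0$ such that $\gamma = \mu\alpha$ and $\delta = \mu\beta$.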

In some cases we omit saying “with respect to C” if it is clear from context.

To get some intuition, if $C$ is a polyhedron and $x \in C$ exposes an inequality, then that inequality is a facet and $x$ is in the relative interior of the facet.

Remark 5.3. It is very important to note that if there exists a point exposing a valid inequality of $C$, then $C$ is full dimensional. The reader should keep this in mind throughout the whole chapter.

Remark 5.4.

For some convex $C$, a point $x \notin C$ can expose a valid inequality. For instance, consider

{x∈R2

x2≥

}

. Then (0

/∈C

and exposes x1+x2≥0.

The name “exposed inequality” comes from the concept of exposed point, see Section 1.1. Actually, from the standard duality between points and hyperplanes (a hyperplane can be characterized by its normal which is a point), one can interpret an exposed inequality just as the dual of an exposed point. In more detail and to simplify ideas, let us assume that $0 \in \operatorname{int}(C)$. Recall that a point $x_0 \in C$ is exposed if there exists a valid inequality of $C$, $\alpha^{T}x \le 1$, such that $\{x \in C : \alpha^{T}x = 1\} = \{x_0\}$. If $\alpha_0$ is an exposed point of the polar $C^\circ = \{\alpha : \alpha^{T}x \le 1,\ \forall x \in C\}$, then there is a valid inequality, $x_0^{T}\alpha \le 1$, such that $\{\alpha \in C^\circ : x_0^{T}\alpha = 1\} = \{\alpha_0\}$. In other words, if $\alpha^{T}x \le 1$ is valid for $C$ (i.e. $\alpha \in C^\circ$) and $\alpha^{T}x_0 = 1$, then $\alpha = \alpha_0$. We see that $x_0$ is a point (direction) that shows that $\alpha_0$ is an exposed inequality, or, that $x_0$ exposes $\alpha_0$. See also Lemma 5.15.

We now show that our definition is indeed helpful to show maximality.

Theorem 5.5. Let $K, K' \subseteq \mathbb{R}^n$ be convex sets such that $K \subseteq K'$. If $\alpha^{T}x \le \beta$ is

– valid for $K$,

– not valid for $K'$, and

– exposed by $x_0 \in K$ with respect to $K$,

then $x_0 \in \operatorname{int}(K')$.

Proof. As $x_0 \in K$ exposes $\alpha^{T}x \le \beta$, it holds that $\alpha^{T}x_0 = \beta$ and, thus, $x_0$ is in the boundary of $K$. Suppose $x_0$ is not in the interior of $K'$. Then it must be in the boundary of $K'$ and there is a valid inequality for $K'$, $\gamma^{T}x \le \delta$, such that $\gamma^{T}x_0 = \delta$.

As $K \subsetneq K'$, $\gamma^{T}x \le \delta$ is also valid for $K$. Given that $(\gamma, \delta)$ is tight at $x_0$ and $x_0$ exposes $(\alpha, \beta)$, we conclude that there is a $\mu > 0$ such that $\gamma = \mu\alpha$ and $\delta = \mu\beta$. However, since $\alpha^{T}x \le \beta$ is not valid for $K'$, it follows that $\gamma^{T}x \le \delta$ cannot be valid for $K'$. This contradiction proves the claim.

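Theorem 5.6. Let $S \subseteq \mathbb{R}^n$ be a closed set and $C \subseteq \mathbb{R}^n$ a convex $S$-free set. Assume that $C = \{x \in \mathbb{R}^n : \alpha^{T}x \le \beta,\ \forall (\alpha, \beta) \in \Gamma\}$ and that for every $(\alpha, \beta) \in \Gamma$ there is an $x \in S \cap C$ that exposes $(\alpha, \beta)$. Then, $C$ is maximal $S$-free.

Proof. To show that $C$ is maximal we are going to show that for every $\bar{x} \notin C$, $S \cap \operatorname{int}(\operatorname{conv}(C \cup \{\bar{x}\}))$ is nonempty.

Let $\bar{x} \notin C$ and let $(\alpha, \beta) \in \Gamma$ be a separating inequality, i.e., $\alpha^{T}\bar{x} > \beta$. Let $C' = \operatorname{conv}(C \cup \{\bar{x}\})$.

By hypothesis, there is an $x_0 \in S \cap C$ that exposes $(\alpha, \beta)$. Since $(\alpha, \beta)$ is valid for $C$ and not for $C'$, Theorem 5.5 implies that $x_0 \in \operatorname{int}(C')$.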

With minor modifications one can also get the following sufficient condition

for maximality with respect to a hyperplane.

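Theorem 5.7. Let $S \subseteq \mathbb{R}^n$ be a closed set, $H$ be an affine hyperplane, and $C \subseteq \mathbb{R}^n$ be a convex $S$-free set. Assume that $C = \{x \in \mathbb{R}^n : \alpha^{T}x \le \beta,\ \forall (\alpha, \beta) \in \Gamma\}$ and that for every $(\alpha, \beta) \in \Gamma$ there is an $x \in S \cap C \cap H$ that exposes $(\alpha, \beta)$. Then, $C$ is maximal $S$-free with respect to $H$.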

Remark 5.8. Points that expose inequalities are also called smooth points. A smooth point of $C$ is a point for which there exists a unique supporting hyperplane to $C$ at it (Goberna et al., 2010). Therefore, if $x_0 \in C$, then $x_0$ exposes some valid inequality of $C$ if and only if $x_0$ is a smooth point of $C$.

A related concept is that of blocking points (Basu et al., 2019). However, blocking points need not be smooth points in general, that is, they do not need to expose any inequality. As seen in Theorem 5.6 we use exposing points to determine maximality of a convex $S$-free set. Similarly, in the context of lifting (Conforti et al., 2011a), blocking points are used to determine maximality of a translated convex cone $S \times \mathbb{Z}_+$-free set.

There is another phenomenon that does not occur when $S = \mathbb{Z}^n$. If $S$ is a quadratic set, the inequalities of a maximal $S$-free set might not be exposed by any point of $S$. For instance, consider $S = \{(x, y) \in \mathbb{R}^2 : x^2 + 1 \le y^2\}$. The boundary of $S$ is a hyperbola with asymptotes $x = \pm y$. Thus, $C = \{(x, y) \in \mathbb{R}^2 : x \ge |y|\}$ is a maximal $S$-free set, because its inequalities are asymptotes of $S$, but they are not exposed by points of $S$. This phenomenon also occurs when $S = \mathbb{Z}^n \cap K$, with $K$ convex (Morán and Dey, 2011). However, in that case, it also turns out that maximal $S$-free sets are polyhedral and their constructions rely on the concept of a facet (see for instance (Morán and Dey, 2011, Theorem 3.2)) which we do not have access to in the general case. In our case, we extend the definition of what it means for an inequality to be exposed in order to handle a situation like the one above. We do this by interpreting that asymptotes are exposed “at infinity”.

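Definition 5.9. Given a convex set $C \subseteq \mathbb{R}^n$ with non-empty recession cone and a valid inequality $\alpha^{T}x \le \beta$, we say that a sequence $(x_n)_n \subseteq \mathbb{R}^n$ exposes $(\alpha, \beta)$ at infinity with respect to $C$ if

– $\|x_n\| \to \infty$,

– $x_n / \|x_n\| \to d \in \operatorname{rec}(C)$,

– $d$ exposes $\alpha^{T}x \le 0$ with respect to $\operatorname{rec}(C)$, and

– there exists $y$ with $\alpha^{T}y = \beta$ such that $\operatorname{dist}(x_n, y + \langle d \rangle) \to 0$.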

As before, we omit saying “with respect to C” if it is clear from context.
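The hyperbola example preceding the definition can be checked numerically. The following is a minimal sketch, assuming the sequence $x_n = (n, \sqrt{n^2 + 1})$ on the boundary of $S = \{(x, y) : x^2 + 1 \le y^2\}$, the inequality $-x + y \le 0$ of $C = \{(x, y) : x \ge |y|\}$, the direction $d = (1, 1)/\sqrt{2}$ and the base point $y_0 = 0$; these choices and the helper names are illustrative and not taken from the text. Only the norm, direction and distance conditions are checked; that $d$ exposes the inequality with respect to $\operatorname{rec}(C)$ is not verified by the code.

```python
import math

def dist_to_line(p, y, d):
    """Distance from p to the line y + span(d) in the plane; d must be a unit vector."""
    v = (p[0] - y[0], p[1] - y[1])
    t = v[0] * d[0] + v[1] * d[1]            # coordinate of v along d
    r = (v[0] - t * d[0], v[1] - t * d[1])   # part of v orthogonal to d
    return math.hypot(r[0], r[1])

alpha, beta = (-1.0, 1.0), 0.0                    # the inequality -x + y <= 0
d = (1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0))  # candidate limit direction
y0 = (0.0, 0.0)                                   # satisfies alpha^T y0 = beta

def x_seq(n):
    """n-th point of the assumed sequence on the upper branch of x^2 + 1 = y^2."""
    return (float(n), math.sqrt(n * n + 1.0))

for n in (1, 10, 100, 1000):
    p = x_seq(n)
    # cosine of the angle between x_n and d; it should approach 1
    cosine = (p[0] * d[0] + p[1] * d[1]) / math.hypot(p[0], p[1])
    print(n, dist_to_line(p, y0, d), cosine)
```

The printed distances shrink towards zero while the cosine approaches one, which is the behaviour the last three conditions of the definition ask for.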

Using this definition, we can prove an analogous result to Theorem 5.5 for

inequalities exposed at infinity.

Theorem 5.10. Let $K, K' \subseteq \mathbb{R}^n$ be convex sets such that $K \subseteq K'$. If $\alpha^{T}x \le \beta$ is

– valid for $K$,

– not valid for $K'$, and

– exposed at infinity by $(x_n)_n$ with respect to $K$,

then there exists a $k$ such that $x_k \in \operatorname{int}(K')$.

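Proof. Suppose that for all $k$, $x_k$ is not in the interior of $K'$. Then, for each $k$ there exists a non-trivial valid inequality for $K'$, $\gamma_k^{T}x \le \delta_k$, such that $\gamma_k^{T}x_k \ge \delta_k$. We can assume without loss of generality that $\|(\gamma_k, \delta_k)\| = 1$. Hence, going through a subsequence if necessary, there exist $\gamma \in \mathbb{R}^n$ and $\delta \in \mathbb{R}$ such that $\gamma_k \to \gamma$ and $\delta_k \to \delta$ when $k \to \infty$ and $\|(\gamma, \delta)\| = 1$. Note that the inequality $(\gamma, \delta)$ is valid for $K'$. The idea is to show that $(\gamma, \delta)$ defines the same inequality as $(\alpha, \beta)$.

Since $d = \lim_{k \to \infty} x_k / \|x_k\| \in \operatorname{rec}(K)$ (see Definition 5.9) and $(\gamma, \delta)$ is valid for $K' \supseteq K$, then $\gamma^{T}x \le 0$ is valid for $\operatorname{rec}(K)$. In particular, $\gamma^{T}d \le 0$. On the other hand, $\delta_k / \|x_k\| \le \gamma_k^{T}x_k / \|x_k\|$ implies $0 \le \gamma^{T}d$. We conclude that $\gamma^{T}d = 0$. As $d$ exposes $\alpha^{T}x \le 0$ with respect to $\operatorname{rec}(K)$, there exists a $\mu \ge 0$ such that $\gamma = \mu\alpha$. Note that we cannot conclude that $\mu > 0$ since, at this point, we do not know that $(\gamma, \delta)$ is a non-trivial inequality (e.g. it could be $0^{T}x \le 1$).

Let $y$ be such that $\alpha^{T}y = \beta$ and $\operatorname{dist}(x_k, y + \langle d \rangle) \to 0$, which exists by Definition 5.9. Let $w_k = x_k - d^{T}x_k\, d$. We have that

\[ \operatorname{dist}(x_k, y + \langle d \rangle) = \operatorname{dist}(x_k - y, \langle d \rangle) = \|x_k - y - d^{T}(x_k - y)\, d\| = \|w_k - (y - d^{T}y\, d)\|. \]

Thus, $w_k \to y - d^{T}y\, d$ as $k \to \infty$.

Since each $(\gamma_k, \delta_k)$ is valid for $K'$, $\gamma_k^{T}d \le 0$. Additionally, for large enough $k$ it must hold that $d^{T}x_k > 0$. Therefore,

\[ \delta_k \le \gamma_k^{T}x_k = \gamma_k^{T}(d^{T}x_k\, d + w_k) \le \gamma_k^{T}w_k. \]

Computing the limit when $k \to \infty$ we get,

\[ \delta \le \mu\alpha^{T}(y - d^{T}y\, d) = \mu\alpha^{T}y = \mu\beta. \]

If $\mu = 0$, then $\gamma = 0$ and $\delta \le 0$. As $\|(\gamma, \delta)\| = 1$, it follows that $\delta = -1$, which cannot be since $(\gamma, \delta)$ is a valid inequality for $K'$ and $K'$ is, by hypothesis, non-empty. We conclude that $\mu > 0$ and that $\mu\alpha^{T}x \le \mu\beta$ is valid for $K'$ which implies that $\alpha^{T}x \le \beta$ is valid for $K'$, contradicting the hypothesis of the theorem.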

With the previous results it is straightforward to prove the following

generalization of Theorem 5.7.

Theorem 5.11. Let $S \subseteq \mathbb{R}^n$ be a closed set, $H$ be an affine hyperplane, and $C \subseteq \mathbb{R}^n$ be a convex $S$-free set. Assume that $C = \{x \in \mathbb{R}^n : \alpha^{T}x \le \beta,\ \forall (\alpha, \beta) \in \Gamma\}$ and that for every $(\alpha, \beta) \in \Gamma$ there is either an $x \in S \cap C \cap H$ that exposes $(\alpha, \beta)$, or a sequence $(x_n)_n \subseteq S \cap H$ that exposes $(\alpha, \beta)$ at infinity. Then, $C$ is maximal $S$-free with respect to $H$.

Another useful result for studying maximal $S$-free sets is the following (see also (Conforti et al., 2014, Lemma 6.17)). It states that in some cases we can project $S$ into a lower dimensional space and find maximal sets that are free for the projection. This result is also useful for visualizing higher dimensional $S$-free sets.

Theorem 5.12. Let $C$ be a full dimensional closed convex cone with lineality space $L$. Let $S \subseteq \mathbb{R}^n$ be closed. Then, $C$ is maximal $S$-free if and only if $(C \cap L^\perp)$ is maximal $\operatorname{cl}(\operatorname{proj}_{L^\perp} S)$-free.

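Proof. ($\Rightarrow$) If $C \cap L^\perp$ is not maximal, let $K \subseteq L^\perp$ be a $\operatorname{cl}(\operatorname{proj}_{L^\perp} S)$-free set that strictly contains it. Then, $K + L \supsetneq C$. Since $C$ is maximal $S$-free, there exists an $x \in S$ such that $x \in \operatorname{int}(K + L) = \operatorname{int}(K) + \operatorname{int}(L)$ (Rockafellar, 1970, Corollary 6.6.2). That is, $x = k + \ell$ with $k \in \operatorname{int}(K)$ and $\ell \in L$. Thus, $x - \ell \in K \subseteq L^\perp$ which implies that $x - \ell \in \operatorname{proj}_{L^\perp} S$ and contradicts the fact that $K$ is $\operatorname{cl}(\operatorname{proj}_{L^\perp} S)$-free.

($\Leftarrow$) By contradiction, suppose that $C$ is not maximal $S$-free and let $K \supsetneq C$ be a closed convex $S$-free set. Then $K \cap L^\perp \supsetneq C \cap L^\perp$, which implies that $K \cap L^\perp$ is not $\operatorname{cl}(\operatorname{proj}_{L^\perp} S)$-free. This implies that there exists $\tilde{s} \in \operatorname{cl}(\operatorname{proj}_{L^\perp} S) \cap \operatorname{int}(K \cap L^\perp)$. Moreover, we can further assume $\tilde{s} \in \operatorname{proj}_{L^\perp} S \cap \operatorname{int}(K \cap L^\perp)$, as any sequence contained in $\operatorname{proj}_{L^\perp} S$ converging to an element of $\operatorname{cl}(\operatorname{proj}_{L^\perp} S) \cap \operatorname{int}(K \cap L^\perp)$ must have an element in $\operatorname{proj}_{L^\perp} S \cap \operatorname{int}(K \cap L^\perp)$.

By the definition of orthogonal projection, there must exist $s \in S$ and $\ell \in L$ such that $\tilde{s} = s - \ell$. Thus, we obtain $s - \ell \in \operatorname{int}(K \cap L^\perp)$, i.e.

\[ s \in \operatorname{int}(K \cap L^\perp) + L. \]

Since the lineality space of $K$ must contain $L$, we conclude $s \in \operatorname{int}(K)$; a contradiction with $K$ being $S$-free.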

5.3 Maximal Quadratic-Free Sets for Homogeneous Quadratics

In this section we construct maximal $S^h$-free sets that contain a vector $\bar{x} \notin S^h$ for $S^h = \{x \in \mathbb{R}^p : x^{T}Qx \le 0\}$. This is our building block towards maximality in the general case. After a change of variable, we can assume that

\[ S^h = \Big\{(x, y, z) \in \mathbb{R}^{n+m+l} : \sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{m} y_i^2 \le 0\Big\} = \Big\{(x, y) \in \mathbb{R}^{n+m} : \sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{m} y_i^2 \le 0\Big\} \times \mathbb{R}^l. \]

Thus, we will only focus on $S^h = \{(x, y) \in \mathbb{R}^{n+m} : \sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{m} y_i^2 \le 0\}$ and assume we are given $(\bar{x}, \bar{y})$ such that $\|\bar{x}\|^2 > \|\bar{y}\|^2$.

Remark 5.13. The transformation used to bring $S^h$ to the last “diagonal” form is, in general, not unique. Nonetheless, maximality of the $S^h$-free sets is preserved, as there always is such a transformation that is one-to-one. In Section 5.6 we discuss the effect different choices of this transformation have.

5.3.1 Removing Strict Convexity Matters

A simple way of obtaining an $S^h$-free set is via a concave underestimator of the function $(x, y) \mapsto \sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{m} y_i^2 = \|x\|^2 - \|y\|^2$ directly. A concave underestimator tight at $(\bar{x}, \bar{y})$ is obtained after linearizing the convex function $\|x\|^2$ at $\bar{x}$, that is, $\|\bar{x}\|^2 + 2\bar{x}^{T}(x - \bar{x}) - \|y\|^2$. The concave underestimator yields the $S^h$-free set $\{(x, y) \in \mathbb{R}^{n+m} : \|\bar{x}\|^2 + 2\bar{x}^{T}(x - \bar{x}) - \|y\|^2 \ge 0\}$. However, simple examples show that such an $S^h$-free set is not maximal.

Example 5.14. The case $n = m = 1$ with $\bar{x} = 3$ yields the $S^h$-free set

\[ C = \{(x, y) \in \mathbb{R}^2 : -9 + 6x - y^2 \ge 0\}. \]

In Figure 5.1 we can see that the set is not maximal $S^h$-free.
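The non-maximality in Example 5.14 can also be checked by sampling. The sketch below assumes the comparison set $\{(x, y) : x \ge |y|\}$, which is the set $C_\lambda$ of (5.2) for $n = m = 1$ and $\lambda = 1$; the sampling box, the sample size and the function names are illustrative choices. It checks that every sampled point of $C$ lies in the comparison set, that sampled interior points of the comparison set avoid $S^h$, and that $(1, 0)$ lies in the comparison set but not in $C$.

```python
import random

def in_C(x, y):
    """Set of Example 5.14: -9 + 6x - y^2 >= 0."""
    return -9.0 + 6.0 * x - y * y >= 0.0

def in_C_lambda(x, y):
    """Comparison set {x >= |y|} (C_lambda for n = m = 1, lambda = 1)."""
    return x >= abs(y)

def in_int_C_lambda(x, y):
    """Interior of the comparison set."""
    return x > abs(y)

def in_Sh(x, y):
    """S^h for n = m = 1: |x| <= |y|."""
    return abs(x) <= abs(y)

rng = random.Random(0)
samples = [(rng.uniform(-10, 10), rng.uniform(-10, 10)) for _ in range(20000)]

# Every sampled point of C should lie in the comparison set.
contained = all(in_C_lambda(x, y) for x, y in samples if in_C(x, y))
# No sampled interior point of the comparison set should lie in S^h.
free = all(not in_Sh(x, y) for x, y in samples if in_int_C_lambda(x, y))
# A point of the comparison set that is missing from C.
witness = (1.0, 0.0)
print(contained, free, in_C_lambda(*witness), in_C(*witness))
```

The containment follows from $(y^2 + 9)/6 \ge |y|$, which is $(|y| - 3)^2 \ge 0$, so the sampling only illustrates an inequality that can be verified by hand.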

As discussed in Section 4.3.1 the problem is that $\|x\|^2$ is a strictly convex function. Indeed, suppose $S = \{x \in \mathbb{R}^n : f(x) \le 0\}$ where $f$ is strictly convex. The $S$-free set obtained via a concave underestimator at $\bar{x}$ is $C = \{x \in \mathbb{R}^n : f(\bar{x}) + \nabla f(\bar{x})(x - \bar{x}) \ge 0\}$. It is not hard to see that the strict convexity of $f$ implies that $C$ is not maximal $S$-free. The reason is that, as we saw in Chapter 2, linearizations of $f$ at $\bar{x} \notin S$ will not support $S$. On the other hand, if $f$ is instead sublinear, then any linearization of $f$ supports $S$, thus it yields a maximal $S$-free set.

Figure 5.1: $S^h$ in Example 5.14 (blue) and the $S^h$-free set constructed using a concave underestimator of $\|x\|^2 - \|y\|^2$ (orange).

The previous observation motivates the following. The set $S^h$ can equivalently be described by $S^h = \{(x, y) \in \mathbb{R}^{n+m} : \|x\| - \|y\| \le 0\}$. Now, the function $(x, y) \mapsto \|x\| - \|y\|$ has the following concave underestimator at $\bar{x} \ne 0$, $\bar{x}^{T}x / \|\bar{x}\| - \|y\|$, which yields the $S^h$-free set

\[ C_\lambda = \{(x, y) \in \mathbb{R}^{n+m} : \lambda^{T}x \ge \|y\|\}, \tag{5.2} \]

where $\lambda = \bar{x} / \|\bar{x}\|$. This set turns out to be maximal, even if we consider any other $\lambda \in D_1(0)$. We note that in Bienstock et al. (2016), the authors use a similar technique and reformulate a 4-variable homogeneous quadratic condition of outer-product-free sets in the form $\|x\| \le \|y\|$. This allows them to construct maximal outer-product-free sets that are of the form $C_\lambda$.

5.3.2 Maximal Sh-free Sets

We now prove that $C_\lambda$ is maximal $S^h$-free. The main idea is to exploit that every inequality describing $C_\lambda$ has a point in $S^h \cap C_\lambda$ exposing it and use Theorem 5.6. We begin with a lemma whose proof we present in Section 5.10. We recall that a function is sublinear if and only if it is convex and positive homogeneous.

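Lemma 5.15. Let $\phi : \mathbb{R}^n \to \mathbb{R}$ be a sublinear function, $\lambda \in D_1(0)$, and let

\[ C = \{(x, y) : \phi(y) \le \lambda^{T}x\}. \]

If $(\bar{x}, \bar{y}) \in C$ is such that $\phi$ is differentiable at $\bar{y}$ and $\phi(\bar{y}) = \lambda^{T}\bar{x}$, then $(\bar{x}, \bar{y})$ exposes the valid inequality $-\lambda^{T}x + \nabla\phi(\bar{y})^{T}y \le 0$.

In particular, if $\beta_0 \in \partial\phi(0)$ is an exposed point of $\partial\phi(0)$, exposed by $\bar{y}$, and $\phi(\bar{y}) = \lambda^{T}\bar{x}$, then $(\bar{x}, \bar{y})$ exposes the valid inequality $-\lambda^{T}x + \beta_0^{T}y \le 0$.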

Theorem 5.16. Let $S^h = \{(x, y) \in \mathbb{R}^{n+m} : \|x\| \le \|y\|\}$ and $C_\lambda = \{(x, y) \in \mathbb{R}^{n+m} : \lambda^{T}x \ge \|y\|\}$ for $\lambda \in D_1(0)$. Then, $C_\lambda$ is a maximal $S^h$-free set. Furthermore, if $\lambda = \bar{x} / \|\bar{x}\|$, $C_\lambda$ contains $(\bar{x}, \bar{y})$ in its interior.

Proof. The $S^h$-freeness follows by construction. To show that $C_\lambda$ is maximal, we first notice that

\[ C_\lambda = \{(x, y) \in \mathbb{R}^{n+m} : -\lambda^{T}x + \beta^{T}y \le 0,\ \forall \beta \in D_1(0)\}. \]

We just need to show that every inequality $(-\lambda, \beta)$ is exposed by a point $(x, y) \in S^h \cap C_\lambda$.

Since the norm function $\|\cdot\|$ is sublinear, differentiable everywhere but in the origin, and $\|\beta\| = 1 = \lambda^{T}\lambda$, Lemma 5.15 shows that $(\lambda, \beta) \in S^h \cap C_\lambda$ exposes $(-\lambda, \beta)$. From Theorem 5.6 we conclude that $C_\lambda$ is maximal $S^h$-free.

The fact that $(\bar{x}, \bar{y}) \in \operatorname{int}(C_\lambda)$ when $\lambda = \bar{x} / \|\bar{x}\|$ can be verified directly.

5.4 Homogeneous Quadratics With a Single Homogeneous Linear Constraint

Finding maximal $S$-free sets for $S$ defined using a non-homogeneous quadratic function is much more challenging than the previous case. In general, using a homogenization and diagonalization, any such $S$ can be described as

\[ \{(x, y, z) \in \mathbb{R}^{n+m+l} : \|x\| \le \|y\|,\ a^{T}x + d^{T}y + h^{T}z = -1\}. \tag{5.3} \]

Remark 5.17. Similarly to our discussion in Remark 5.13, the choice of transformation to bring a non-homogeneous quadratic to the form (5.3) is not unique. Different choices can produce different vectors $a, d, h$. Nonetheless, maximality of $S$-free sets is preserved through these transformations if they are one-to-one. We discuss the effect of the different choices of such transformations in Section 5.6.

First of all, we note that the case $h \ne 0$ can be tackled directly using Section 5.3. Indeed, if this is the case it is not hard to see that $C \times \mathbb{R}^l$ is maximal $S$-free (with respect to the corresponding hyperplane), where $C$ is any maximal $S^h$-free set. This follows from Theorem 5.12. Thus, in what follows we consider

\[ S = \{(x, y) \in \mathbb{R}^{n+m} : \|x\| \le \|y\|,\ a^{T}x + d^{T}y = -1\}. \]

Also note that using transformations that yield the latter form of $S$ allows us to assume that the given point $(\bar{x}, \bar{y}) \notin S$ satisfies

\[ \|\bar{x}\| > \|\bar{y}\|, \quad a^{T}\bar{x} + d^{T}\bar{y} = -1. \]

We elaborate on this point in Section 5.6.

The set $S$ above is our final goal. However, at this point, a simpler set to study is

\[ S_{\le 0} = \{(x, y) \in \mathbb{R}^{n+m} : \|x\| \le \|y\|,\ a^{T}x + d^{T}y \le 0\}. \]

In this section we construct maximal $S_{\le 0}$-free sets that contain $(\bar{x}, \bar{y})$ satisfying

\[ \|\bar{x}\| > \|\bar{y}\|, \quad a^{T}\bar{x} + d^{T}\bar{y} \le 0. \]

While this set is interesting on its own, it provides an important intermediate step in our construction of maximal $S$-free sets.

As it turns out, the construction of maximal $S_{\le 0}$-free sets depends on whether $\|a\| < \|d\|$ or $\|a\| \ge \|d\|$ and on the value of $m$. Unfortunately, each case requires different ideas. The following remark dismisses a simple case:

Remark 5.18. If $m = 1$ and $\|a\| < \|d\|$ then $S_{\le 0}$ is convex. To see this, assume that $d > 0$ and let $(x, y) \in S_{\le 0}$ with $y \ne 0$. Then, $dy \le -a^{T}x \le \|a\|\|x\| \le \|a\||y| < d|y|$. This can only happen if $y < 0$. Therefore, $S_{\le 0}$ is the second order cone $\{(x, y) : \|x\| \le -y\}$. The case $d < 0$ is analogous. We remark that the assumption $\|a\| < |d|$ is fundamental for the argument. As we show in Example 5.25, $S_{\le 0}$ is not necessarily convex if $\|a\| = |d|$.

We divide the remaining cases in the following:

Case 1 $\|a\| \le \|d\| \wedge m > 1$.

Case 2 $\|a\| \ge \|d\|$.

Note that both our strategies allow us to handle the overlapping case $\|a\| = \|d\| \wedge m > 1$. We start with the more natural idea that follows from our previous discussions. This yields the proof of Case 1 and motivates our case distinction.

5.4.1 Case 1: ∥a∥ ≤ ∥d∥ ∧ m > 1

The strategy for proving maximality of $C_\lambda$ was to write $C_\lambda$ as

\[ C_\lambda = \{(x, y) \in \mathbb{R}^{n+m} : -\lambda^{T}x + \beta^{T}y \le 0,\ \forall \beta \in D_1(0)\}, \]

and to find an exposing point in $S^h \cap C_\lambda$ for each of the inequalities defining $C_\lambda$. As $S_{\le 0} \subseteq S^h$, $C_\lambda$ is clearly $S_{\le 0}$-free. However, if we try to prove it is maximal following the same technique, we find that it is not clear that some inequalities have exposing points in $S_{\le 0} \cap C_\lambda$. The exposing point $(\lambda, \beta)$ of the inequality $(-\lambda, \beta)$ is in $S_{\le 0}$ if and only if $a^{T}\lambda + d^{T}\beta \le 0$. Let

\[ G(\lambda) = \{\beta : \|\beta\| = 1,\ a^{T}\lambda + d^{T}\beta \le 0\}. \]

It is natural to ask, then, if

\[ C_{G(\lambda)} = \{(x, y) \in \mathbb{R}^{n+m} : -\lambda^{T}x + \beta^{T}y \le 0,\ \forall \beta \in G(\lambda)\} \]

is maximal $S_{\le 0}$-free. Intuitively, $C_{G(\lambda)}$ is obtained from $C_\lambda$ by removing from its description all inequalities that do not have an exposing point satisfying $a^{T}\lambda + d^{T}\beta \le 0$. It is reasonable to expect maximality, as, by construction, every inequality has a point exposing it. Indeed,

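Proposition 5.19. If $C_{G(\lambda)} \ne \emptyset$ and $C$ is any $S_{\le 0}$-free set such that $C_\lambda \subseteq C$, then $C \subseteq C_{G(\lambda)}$.

Proof. Suppose, by contradiction, that $C \not\subseteq C_{G(\lambda)}$. This implies that there must exist $\beta_0 \in G(\lambda)$ such that $-\lambda^{T}x + \beta_0^{T}y \le 0$ is not valid for $C$. As $C_\lambda \subseteq C_{G(\lambda)}$, $-\lambda^{T}x + \beta_0^{T}y \le 0$ is valid for $C_\lambda$.

As we saw in Theorem 5.16, $(\lambda, \beta_0) \in C_\lambda$ exposes $-\lambda^{T}x + \beta_0^{T}y \le 0$, and since $C_\lambda \subseteq C$, Theorem 5.5 implies that $(\lambda, \beta_0) \in \operatorname{int}(C)$. However, since $\beta_0 \in G(\lambda)$, we have $(\lambda, \beta_0) \in S_{\le 0}$. This contradicts the $S_{\le 0}$-freeness of $C$.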

This result shows that $C_{G(\lambda)}$ is the largest (inclusion-wise) set that one can aspire to obtain from $C_\lambda$. However, it is unclear if $C_{G(\lambda)}$ is $S_{\le 0}$-free. Even more, it is unclear whether $G(\lambda)$ is non-empty or not. In the following we study when $C_{G(\lambda)}$ is $S_{\le 0}$-free.

We start by showing that when $\lambda = \bar{x} / \|\bar{x}\|$, $G(\lambda)$ is non-empty.

Proposition 5.20. Let $(\bar{x}, \bar{y}) \notin S_{\le 0}$ be such that $a^{T}\bar{x} + d^{T}\bar{y} \le 0$ and let $\lambda = \bar{x} / \|\bar{x}\|$. Then, $G(\lambda) \ne \emptyset$.

If, in addition, $d = 0$, then $G(\lambda) = D_1(0)$ and $C_{G(\lambda)} = C_\lambda$ is maximal $S_{\le 0}$-free.

Proof. As $(\bar{x}, \bar{y}) \notin S_{\le 0}$, we have that $\|\bar{y}\| < \|\bar{x}\|$. Since $m > 1$, we can find $z \in \mathbb{R}^m \setminus \{0\}$ such that $d^{T}z = 0$ and $\|\bar{y} / \|\bar{x}\| + z\| = 1$. Also, $a^{T}\bar{x} + d^{T}\bar{y} \le 0$ and $d^{T}z = 0$ imply that $a^{T}\lambda + d^{T}(\bar{y} / \|\bar{x}\| + z) \le 0$. Thus, $\bar{y} / \|\bar{x}\| + z \in G(\lambda)$.

Regarding the second statement of the proposition, if $d = 0$ then clearly either $G(\lambda) = D_1(0)$ or $G(\lambda) = \emptyset$. Since we are in the case $G(\lambda) \ne \emptyset$, this immediately implies $C_{G(\lambda)} = C_\lambda$. Thus, Proposition 5.19 implies its maximality.
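The construction in the proof of Proposition 5.20 can be carried out explicitly. The following is a minimal sketch, assuming $m \ge 2$ and $\|\bar{y}\| < \|\bar{x}\|$; the helper names (`unit_orthogonal_to`, `point_in_G`) and the sample data are hypothetical. It picks a unit vector $w$ orthogonal to $d$ and scales it by the positive root $t$ of $\|\bar{y}/\|\bar{x}\| + t w\| = 1$, so that $z = t w$ plays the role of the vector $z$ in the proof.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def unit_orthogonal_to(d):
    """Return a unit vector w with d^T w = 0 (needs len(d) >= 2)."""
    m = len(d)
    if m < 2:
        raise ValueError("need m >= 2")
    dd = dot(d, d)
    # e_j with |d_j| minimal is never parallel to a nonzero d when m >= 2
    j = min(range(m), key=lambda i: abs(d[i]))
    w = [0.0] * m
    w[j] = 1.0
    if dd > 0.0:
        w = [wi - (d[j] / dd) * di for wi, di in zip(w, d)]  # remove the d-component
    nw = norm(w)
    return [wi / nw for wi in w]

def point_in_G(a, d, x_bar, y_bar):
    """Return (lambda, beta) with ||beta|| = 1 and a^T lambda + d^T beta <= 0."""
    nx = norm(x_bar)
    lam = [xi / nx for xi in x_bar]
    u = [yi / nx for yi in y_bar]              # ||u|| < 1 since ||y_bar|| < ||x_bar||
    w = unit_orthogonal_to(d)
    uw = dot(u, w)
    t = -uw + math.sqrt(uw * uw + 1.0 - dot(u, u))  # positive root of ||u + t w|| = 1
    beta = [ui + t * wi for ui, wi in zip(u, w)]
    return lam, beta

# Illustrative data (not from the text): ||x_bar|| > ||y_bar|| and a^T x_bar + d^T y_bar <= 0.
a, d = [1.0, 0.0], [1.0, -1.0]
x_bar, y_bar = [-2.0, 0.0], [0.5, 0.5]
lam, beta = point_in_G(a, d, x_bar, y_bar)
print(beta, norm(beta), dot(a, lam) + dot(d, beta))
```

The root $t$ is positive because $1 - \|u\|^2 > 0$, which is exactly where the hypothesis $\|\bar{y}\| < \|\bar{x}\|$ enters.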

In light of Proposition 5.19, we just need for $C_{G(\lambda)}$ to be $S_{\le 0}$-free for it to be maximal. Note that

\[ C_{G(\lambda)} = \{(x, y) \in \mathbb{R}^{n+m} : \max_{\beta \in G(\lambda)} y^{T}\beta \le \lambda^{T}x\}, \tag{5.4} \]

and so to prove $S_{\le 0}$-freeness, it is enough to show that for every $(x, y) \in S_{\le 0}$, $\max_{\beta \in G(\lambda)} y^{T}\beta \ge \lambda^{T}x$. In trying to prove this inequality is where the conditions of this case naturally arise.

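Proposition 5.21. Let $(\bar{x}, \bar{y}) \notin S_{\le 0}$ be such that $a^{T}\bar{x} + d^{T}\bar{y} \le 0$ and $\lambda = \bar{x} / \|\bar{x}\|$. If $\|d\| \ge \|a\|$ and $m > 1$, then $C_{G(\lambda)}$ is maximal $S_{\le 0}$-free and contains $(\bar{x}, \bar{y})$ in its interior.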

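Proof. As discussed above, it is enough to show that

\[ \max_{\beta \in G(\lambda)} y^{T}\beta \ge \lambda^{T}x \quad \text{for every } (x, y) \in S_{\le 0}. \tag{5.5} \]

Informally, the strategy is to find a dual of $\max_{\beta \in G(\lambda)} y^{T}\beta$ so that the inequality we have to prove is of the form “minimum of something greater or equal than $\lambda^{T}x$”, which is often easier to reason about. As the objective function of $\max_{\beta \in G(\lambda)} y^{T}\beta$ is linear and $m > 1$, we can replace the $\|\beta\| = 1$ constraint with an inequality and obtain

\[ \max_{\beta \in G(\lambda)} y^{T}\beta = \max\{y^{T}\beta : \|\beta\| \le 1,\ a^{T}\lambda + d^{T}\beta \le 0\}. \tag{5.6} \]

Since $G(\lambda)$ is constructed from an infeasible point $(\bar{x}, \bar{y}) \notin S_{\le 0}$ such that $a^{T}\bar{x} + d^{T}\bar{y} \le 0$, i.e., $\|\bar{y}\| < \|\bar{x}\|$, we have $\|\bar{y} / \|\bar{x}\|\| < 1$. Moreover, perturbing the latter we can argue that the rightmost optimization problem in (5.6) has a strictly feasible point. Thus, Slater’s condition holds and we have that

\[ \max\{y^{T}\beta : \|\beta\| \le 1,\ a^{T}\lambda + d^{T}\beta \le 0\} = \inf_{\theta \ge 0} \|y - d\theta\| - \lambda^{T}a\theta. \tag{5.7} \]

Using (5.7), (5.5) is equivalent to

\[ \inf_{\theta \ge 0} \|y - d\theta\| - \lambda^{T}a\theta \ge \lambda^{T}x \quad \text{for every } (x, y) \in S_{\le 0}. \tag{5.8} \]

We now prove that if $(x, y) \in S_{\le 0}$, then $\lambda^{T}(x + a\theta) \le \|y - d\theta\|$, which implies the result.

By Cauchy-Schwarz and $\|\lambda\| = 1$, we have that $\lambda^{T}(x + a\theta) \le \|x + a\theta\|$. Furthermore, $\|x + a\theta\|^2 = \|x\|^2 + 2\theta a^{T}x + \|a\theta\|^2$. Since $\theta \ge 0$, $\theta a^{T}x \le -\theta d^{T}y$. Together with $\|x\|^2 \le \|y\|^2$ they imply

\[ \|x + a\theta\|^2 \le \|y\|^2 - 2\theta d^{T}y + \|a\|^2\theta^2 = \|y - d\theta\|^2 + (\|a\|^2 - \|d\|^2)\theta^2 \le \|y - d\theta\|^2, \]

where the last inequality follows since $\|d\| \ge \|a\|$.

We have shown that $\|x + a\theta\| \le \|y - d\theta\|$. Hence, $\lambda^{T}(x + a\theta) \le \|y - d\theta\|$ as we wanted to show, which implies that $C_{G(\lambda)}$ is $S_{\le 0}$-free. Finally, Proposition 5.19 implies the maximality of $C_{G(\lambda)}$, and $(\bar{x}, \bar{y}) \in \operatorname{int}(C_{G(\lambda)})$ since $C_\lambda \subseteq C_{G(\lambda)}$.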
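The key step of the proof, $\|x + a\theta\| \le \|y - d\theta\|$ for $(x, y) \in S_{\le 0}$ and $\theta \ge 0$ when $\|a\| \le \|d\|$, can be sanity-checked by sampling. The sketch below assumes Gaussian rejection sampling of points of $S_{\le 0}$ and uniformly drawn $\theta \in [0, 10]$; the function name `max_violation`, the sample sizes and the test vectors are illustrative. It is a numerical illustration of the inequality, not a replacement for the argument above.

```python
import math
import random

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def max_violation(a, d, trials=5000, seed=0):
    """Largest sampled value of ||x + a*theta|| - ||y - d*theta|| over
    (x, y) in S_{<=0} and theta >= 0, plus the number of accepted samples."""
    rng = random.Random(seed)
    n, m = len(a), len(d)
    worst, accepted = -math.inf, 0
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(m)]
        # keep only points with ||x|| <= ||y|| and a^T x + d^T y <= 0
        if norm(x) > norm(y) or dot(a, x) + dot(d, y) > 0.0:
            continue
        accepted += 1
        theta = rng.uniform(0.0, 10.0)
        lhs = norm([xi + ai * theta for xi, ai in zip(x, a)])
        rhs = norm([yi - di * theta for yi, di in zip(y, d)])
        worst = max(worst, lhs - rhs)
    return worst, accepted

# ||a|| = 0.6 <= sqrt(2) = ||d||, so no positive violation is expected.
worst, accepted = max_violation([0.6, 0.0], [1.0, -1.0])
print(worst, accepted)
```

Up to rounding, the reported worst value should be non-positive whenever $\|a\| \le \|d\|$; the chain of inequalities in the proof shows where this hypothesis is used.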

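Remark 5.22. Using Proposition 5.54 one can show that $\max_\beta \{y^{T}\beta : \|\beta\| \le 1,\ a^{T}\lambda + d^{T}\beta \le 0\}$ is

\[ \begin{cases} \|y\|, & \text{if } a^{T}\lambda\|y\| + y^{T}d \le 0, \\ \sqrt{\Big(1 - \big(\tfrac{a^{T}\lambda}{\|d\|}\big)^2\Big)\Big(\|y\|^2 - \big(\tfrac{y^{T}d}{\|d\|}\big)^2\Big)} - \tfrac{a^{T}\lambda\, y^{T}d}{\|d\|^2}, & \text{otherwise.} \end{cases} \tag{5.9} \]

Note that this is well defined since if $\|d\| = 0$, then $\|a\| = 0$ and so (5.9) $= \|y\|$. This yields a closed-form expression for $C_{G(\lambda)}$ of the form

\[ C_{G(\lambda)} = \{(x, y) \in \mathbb{R}^{n+m} : (5.9) \le \lambda^{T}x\}. \tag{5.10} \]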
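The closed form (5.9) can be compared numerically with the dual expression (5.7). The sketch below assumes $|a^{T}\lambda| < \|d\|$, so that the dual objective is coercive, writes `c` for the scalar $a^{T}\lambda$, and reads the second squared term as $(y^{T}d/\|d\|)^2$, i.e. the squared length of the projection of $y$ onto $d$; the function names and the ternary search are illustrative choices, not part of the text.

```python
import math
import random

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def closed_form(y, d, c):
    """Value of (5.9), with c standing for a^T lambda; assumes |c| < ||d||."""
    ny, nd, yd = norm(y), norm(d), dot(y, d)
    if c * ny + yd <= 0.0:
        return ny
    perp = max(0.0, ny * ny - (yd / nd) ** 2)   # squared norm of y orthogonal to d
    return math.sqrt((1.0 - (c / nd) ** 2) * perp) - c * yd / nd ** 2

def dual_value(y, d, c, iters=200):
    """min over theta >= 0 of ||y - d*theta|| - c*theta by ternary search;
    the objective is convex in theta and its minimizer lies in [0, hi]."""
    def g(theta):
        return norm([yi - di * theta for yi, di in zip(y, d)]) - c * theta
    lo, hi = 0.0, 2.0 * norm(y) / (norm(d) - c) + 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if g(m1) <= g(m2):
            hi = m2
        else:
            lo = m1
    return g(0.5 * (lo + hi))

rng = random.Random(1)
for _ in range(5):
    d = [rng.gauss(0.0, 1.0) for _ in range(3)]
    y = [rng.gauss(0.0, 1.0) for _ in range(3)]
    c = rng.uniform(-0.9, 0.9) * norm(d)
    print(closed_form(y, d, c), dual_value(y, d, c))
```

The upper bound `hi` comes from $\|y - d\theta\| - c\theta \ge \theta(\|d\| - c) - \|y\|$ together with the value $\|y\|$ at $\theta = 0$.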

The last proposition provides certain guarantees of when a simple modification of $C_\lambda$ yields maximal $S_{\le 0}$-free sets. Our proof heavily relies on our assumptions $\|a\| \le \|d\|$ (to show (5.8)) and $m > 1$ (to show (5.6)), so the natural question is whether these conditions are actually necessary for our statement to be true. Thus, before moving on to the next case, we argue why these conditions are indeed necessary in our statements. The following examples motivate our case distinction and illustrate all cases we have covered.

Example 5.23.

Consider the following set of the type $S_{\le 0}$, which we denote $S^1_{\le 0}$:

\[ S^1_{\le 0} = \{(x, y_1, y_2) \in \mathbb{R}^3 : |x| \le \|y\|,\ ax + d^{T}y \le 0\} \]

with $a = 1$ and $d = (1, -1)^{T}$

. Let us consider the point (

x¯, y¯

) = (

−

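clearly satisfying the linear inequality, but not in $S^1_{\le 0}$. In Figure 5.2 we show $S^1_{\le 0}$, the $S^1_{\le 0}$-free set given by $C_\lambda$ and the set $C_{G(\lambda)}$ for $\lambda = \bar{x} / \|\bar{x}\|$. Since in this case $|a| = 1 \le \sqrt{2} = \|d\|$ and $m > 1$, we know $C_{G(\lambda)}$ is maximal $S^1_{\le 0}$-free.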

Figure 5.2: Sets $C_\lambda$ and $C_{G(\lambda)}$ in Example 5.23 for the case $\|a\| \le \|d\|$. (a) $S^1_{\le 0}$ in Example 5.23 (orange) and the corresponding $C_\lambda$ set (green); the latter is $S^1_{\le 0}$-free but not maximal. (b) $S^1_{\le 0}$ in Example 5.23 (orange) and the corresponding $C_{G(\lambda)}$ set (green); the latter is maximal $S^1_{\le 0}$-free.

Example 5.24. Consider the set $S^2_{\le 0}$, defined as

\[ S^2_{\le 0} = \{(x_1, x_2, y) \in \mathbb{R}^3 : \|x\| \le |y|,\ a^{T}x + dy \le 0\} \]

with $a = (-1/\sqrt{2}, 1/\sqrt{2})$ and $d = 1/\sqrt{2}$ (the $1/\sqrt{2}$ terms are not really important now as we can scale the inequality, but we reuse this example in

subsequent sections where they do matter), and (

x¯, y¯

) = (

−

,−

. This

point satisfies the linear inequality in $S^2_{\le 0}$, but it is not in $S^2_{\le 0}$. Let $\lambda = \bar{x} / \|\bar{x}\|$. In this case $a^{T}\lambda = 0$, and as a consequence the corresponding set $G(\lambda)$ is given by the singleton $\{-1\}$. In Figure 5.3 we show $S^2_{\le 0}$, the $S^2_{\le 0}$-free set given by $C_\lambda$ and the set $C_{G(\lambda)}$. In this case $\|a\| = 1 > 1/\sqrt{2} = |d|$, so we have no guarantee on the $S^2_{\le 0}$-freeness of $C_{G(\lambda)}$. Even more, it is not $S^2_{\le 0}$-free.

Example 5.25.

Let us consider the following example with $n = 2$, $m = 1$ and $\|d\| = \|a\|$. Let $a = (-3, 4)^{T}$, $d = 5$ and consider (

x¯, y¯

) = (

−

,−

and

$\lambda = \bar{x} / \|\bar{x}\|$. Clearly $(\bar{x}, \bar{y}) \notin S_{\le 0}$, but satisfies the linear constraint. In this case, $\beta \in G(\lambda)$ must satisfy

\[ 5 \cdot \beta \le 0, \quad |\beta| = 1, \]

Figure 5.3: Sets $C_\lambda$ and $C_{G(\lambda)}$ in Example 5.24 for the case $\|a\| > \|d\|$. (a) $S^2_{\le 0}$ in Example 5.24 (orange) and the corresponding $C_\lambda$ set (green); the latter is $S^2_{\le 0}$-free but not maximal. (b) $S^2_{\le 0}$ in Example 5.24 (orange) and the corresponding $C_{G(\lambda)}$ set (green); the latter is not $S^2_{\le 0}$-free.

thus $G(\lambda) = \{-1\}$. Nonetheless, $(x, y) = (3, -4, 5) \in S_{\le 0}$, and

\[ \lambda^{T}x + y = 0 + 5 > 0. \]

This means $(x, y) \in \operatorname{int}(C_{G(\lambda)})$. Thus, $C_{G(\lambda)}$ is not $S_{\le 0}$-free.
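The arithmetic of Example 5.25 can be replayed exactly with rational numbers. In the sketch below the vectors `a` and `lam` are assumptions chosen to be consistent with the quantities stated in the example ($\|a\| = |d| = 5$, $G(\lambda) = \{-1\}$ and $\lambda^{T}x = 0$ for $x = (3, -4)$); the variable names are illustrative. The code checks that $(3, -4, 5)$ lies in $S_{\le 0}$ and strictly satisfies the only inequality describing $C_{G(\lambda)}$.

```python
from fractions import Fraction as F

a = (F(-3), F(4))              # assumption: ||a|| = 5 and a^T (3, -4) = -25
d = F(5)
lam = (F(-4, 5), F(-3, 5))     # assumption: unit vector with a^T lam = 0
x, y = (F(3), F(-4)), F(5)     # the point (3, -4, 5) from the example

# G(lambda) for m = 1: beta in {-1, 1} with a^T lam + d * beta <= 0
a_lam = a[0] * lam[0] + a[1] * lam[1]
G = [beta for beta in (F(-1), F(1)) if a_lam + d * beta <= 0]

# membership of (x, y) in S_{<=0}: ||x|| <= |y| and a^T x + d y <= 0
in_S = (x[0] ** 2 + x[1] ** 2 <= y ** 2) and (a[0] * x[0] + a[1] * x[1] + d * y <= 0)

# interior of C_G(lambda): -lam^T x + beta * y < 0 for every beta in G(lambda)
lam_x = lam[0] * x[0] + lam[1] * x[1]
in_interior = all(-lam_x + beta * y < 0 for beta in G)

print(G, in_S, lam_x, in_interior)
```

Exact fractions are used so that the boundary cases $\|x\| = |y|$ and $a^{T}x + dy = 0$ are not blurred by rounding.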

Remark 5.26.

The situation in Example 5.25 is similar to the one depicted in Figure 5.3b. Roughly speaking, when $\|a\| = \|d\|$ the upper region becomes a single line and this line intersects the interior of $C_{G(\lambda)}$. Intuitively, when we consider $S$, where $a^{T}x + d^{T}y = -1$, this line should not appear. Even more, $S$ should be convex. We will see that this is the case in Section 5.5.1.

5.4.2 Case 2: ∥a∥ ≥ ∥d∥

As we have seen in Example 5.24, when $\|a\| \le \|d\|$ does not hold, $C_{G(\lambda)}$ is not necessarily $S_{\le 0}$-free. On the other hand, $C_\lambda$ is $S_{\le 0}$-free but not necessarily maximal. As before, we are looking for a convex set $C$ that is a maximal $S_{\le 0}$-free set and contains $C_\lambda$. We point out that not all statements of this section require $\lambda = \bar{x} / \|\bar{x}\|$.

Projecting-out the lineality space

The lineality space of $C_\lambda$ is $L = \{(x, y) : \lambda^{T}x = 0,\ y = 0\}$ and as $C_\lambda \subseteq C$ it must be that $L$ is contained in the lineality space of $C$. By Theorem 5.12, $\operatorname{proj}_{L^\perp} C$ is maximal $\operatorname{proj}_{L^\perp} S_{\le 0}$-free, thus, it might be possible (and we show it is) to find $C$ by studying maximal $\operatorname{proj}_{L^\perp} S_{\le 0}$-free sets. We note that $L^\perp = \langle \lambda \rangle \times \mathbb{R}^m$ and

\[ \operatorname{proj}_{L^\perp} S_{\le 0} = \{(\lambda^{T}x, y) : \|x\| \le \|y\|,\ a^{T}x + d^{T}y \le 0\}. \]

After analyzing low dimensional instances of $\operatorname{proj}_{L^\perp} S_{\le 0}$ we conjecture that $(\operatorname{proj}_{L^\perp} S_{\le 0})^c$ is formed by the union of two disjoint convex sets. If this is true, it would directly provide maximal $\operatorname{proj}_{L^\perp} S_{\le 0}$-free sets.

In order to show that this is actually true, we follow the following strategy.

For each point

y∈Rm

, the points (

λTx, y

)

∈projL⊥S≤0

lie on an interval,

namely, {λTx:∥x∥≤∥y∥, aTx+dTy≤0}. Thus, we define the functions

y↦→ max{λTx:∥x∥≤∥y∥, aTx+dTy≤0}and

y↦→ min{λTx:∥x∥ ≤ ∥y∥, aTx+dTy≤0}.

If the first function is convex and the second is concave, then the closure of

(projL⊥S≤0)c

is the union of the epigraph of the first one and the hypograph

of the second one. Thus, it suffices to show that

ϕλ(y) = max

x{λTx:∥x∥ ≤ ∥y∥, aTx+dTy≤0}(5.11)

is convex for every λ∈D1(0), as the second function is −ϕ−λ.

We first show that $\phi_\lambda$ is defined over all of $\mathbb{R}^m$.

Proposition 5.27. If $\|d\| \le \|a\|$, then for every $y \in \mathbb{R}^m$ the set $\{x : \|x\| \le \|y\|,\ a^T x \le -d^T y\}$ is not empty.

Proof. Note that $x = -\frac{d^T y}{\|a\|^2}\, a$ belongs to the set. Indeed, $a^T x = -d^T y$; in particular, $a^T x \le -d^T y$. Also, $\|d\| \le \|a\|$ implies that $\|x\| \le \frac{\|d\|}{\|a\|}\|y\| \le \|y\|$.

We now show that $\phi_\lambda$ is convex. Furthermore, we prove that $\phi_\lambda$ is sublinear, that is, convex and positive homogeneous. The proof basically consists of finding $\phi_\lambda$ explicitly and then verifying its properties. Note that in this case $\|a\| = 0$ implies that the linear inequality in $S_{\le 0}$ is trivial. Thus, we assume, without loss of generality, that $\|a\| = 1$.

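Proposition 5.28. Let $\lambda, a \in D_1(0) \subseteq \mathbb{R}^n$ and $d \in \mathbb{R}^m$ be such that $\|d\| \le 1$. Then,
\[ \phi_\lambda(y) = \begin{cases} \|y\|, & \text{if } \lambda^T a\,\|y\| + d^T y \le 0, \\ \sqrt{(\|y\|^2 - (d^T y)^2)(1 - (\lambda^T a)^2)} - d^T y\,\lambda^T a, & \text{otherwise.} \end{cases} \tag{5.12} \]
Furthermore, $\phi_\lambda$ is sublinear and

– if $\|d\| = 1 \wedge m > 1$, then $\phi_\lambda$ is differentiable in $\mathbb{R}^m \setminus d\,\mathbb{R}_+$,

– otherwise $\phi_\lambda$ is differentiable in $\mathbb{R}^m \setminus \{0\}$.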
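As a sanity check of the closed form (5.12), the following minimal sketch compares it with a brute-force evaluation of the maximization problem (5.11) for $n = 2$. The data (`lam`, `a`, `d`, and the test points `y`) are hypothetical values chosen only for illustration, with $\|\lambda\| = \|a\| = 1$ and $\|d\| < 1$; the brute force relies on the fact that a linear function over the intersection of a disk and a half-plane attains its maximum on the circle.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def phi(lam, a, d, y):
    """Closed form (5.12); assumes ||lam|| = ||a|| = 1 and ||d|| <= 1."""
    la, dy, ny = dot(lam, a), dot(d, y), norm(y)
    if la * ny + dy <= 0:
        return ny
    # Clamp tiny negative round-off before taking the square root.
    radicand = max(0.0, (ny * ny - dy * dy) * (1.0 - la * la))
    return math.sqrt(radicand) - dy * la

def phi_brute(lam, a, d, y, steps=100_000):
    """Brute force for n = 2: scan the circle ||x|| = ||y|| and keep feasible points.

    The maximum of a linear objective over (disk intersected with a half-plane)
    is attained at an extreme point, and all extreme points lie on the circle.
    """
    radius, rhs = norm(y), -dot(d, y)
    best = -math.inf
    for k in range(steps):
        t = 2.0 * math.pi * k / steps
        x = (radius * math.cos(t), radius * math.sin(t))
        if dot(a, x) <= rhs:
            best = max(best, dot(lam, x))
    return best

# Hypothetical illustration data (not taken from the text).
lam = (math.cos(0.3), math.sin(0.3))
a = (math.cos(2.0), math.sin(2.0))
d = (0.5, -0.4)
for y in [(1.0, 2.0), (-3.0, 0.5), (0.2, -0.7), (-1.0, -1.0)]:
    print(y, round(phi(lam, a, d, y), 4), round(phi_brute(lam, a, d, y), 4))
```

The two columns should agree up to the discretization error of the scan; the sketch does not replace the proof, it only illustrates the two branches of (5.12).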

Proof. The fact that $\phi_\lambda$ is positive homogeneous can be easily verified. We leave the proof that $\phi_\lambda$ is of the form (5.12) to Section 5.10, see Proposition 5.54. Thus, convexity and differentiability remain.

First, note that if $\lambda = a$, then $\phi_\lambda(y) = -d^T y$. This function is clearly sublinear and differentiable everywhere. On the other hand, if $\lambda = -a$, then $\phi_\lambda(y) = \|y\|$. This function is clearly sublinear and differentiable everywhere but the origin.

We now consider λ=±a. Let

A1={y:λTa∥y∥+dTy≤0},

A2={y:λTa∥y∥+dTy≥0},(5.13)

and let ϕ1

λand ϕ2

λbe the restriction of ϕλto A1and A2, respectively.

To show that

ϕλ

is convex we are going to use (Solovev, 1983, Theorem

3). In our particular case, since

ϕλ

is positively homogeneous, this theorem

implies that we just need to check that

ϕλ

is convex on each convex subset of

A1and A2,ϕ1

λ=ϕ2

λon A1∩A2, and that

ϕ′

λ(y;ρ) + ϕ′

λ(y;−ρ)≥0,for all ρ∈Rm\{0}, y ∈A1∩A2.(5.14)

Here, ϕ′

λ(y;ρ) is the directional derivative of ϕλat yin the direction of ρ.

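Clearly, $\phi_\lambda$ is convex on each convex subset of $A_1$. The function $\phi^2_\lambda$ is of the form $c_1\|y\|_W - c_2\, d^T y$, where $W = I - dd^T \succeq 0$ and $c_1, c_2$ are constants with $c_1 \ge 0$. Thus, $\phi_\lambda$ is convex on each convex subset of $A_2$.

It is not hard to see that $\phi^1_\lambda(y) = \phi^2_\lambda(y)$ for $y \in A_1 \cap A_2$.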

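Let us verify (5.14) for $y \ne 0$. For this, first notice that $\phi^1_\lambda(y)$ is differentiable whenever $y \ne 0$. Likewise, $\phi^2_\lambda(y)$ is differentiable whenever $y \ne 0$ if $\|d\| < 1$, or whenever $y \notin d\,\mathbb{R}_+$ if $\|d\| = 1$. However, if $y \in A_1 \cap A_2 \setminus \{0\}$ and $\|d\| = 1$, then $y \notin d\,\mathbb{R}_+$; thus $\phi^2_\lambda$ is differentiable in a neighborhood of $y$. Furthermore, using $\lambda^T a\,\|y\| = -d^T y$ on $A_1 \cap A_2$,
\[ \nabla\phi^2_\lambda(y) = \frac{(1 - (\lambda^T a)^2)(I - dd^T)y}{\sqrt{(\|y\|^2 - (d^T y)^2)(1 - (\lambda^T a)^2)}} - \lambda^T a\, d = \frac{1}{\|y\|}(I - dd^T)y - \lambda^T a\, d = \frac{y}{\|y\|} = \nabla\phi^1_\lambda(y). \]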


Therefore, $\phi_\lambda$ is differentiable whenever $y \ne 0$ if $\|d\| < 1$, or whenever $y \notin d\,\mathbb{R}_+$ if $\|d\| = 1$. Thus, (5.14) holds with equality for $y \in A_1 \cap A_2 \setminus \{0\}$.

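It remains to verify (5.14) for $y = 0$. Let $\rho$ be such that $\rho \in A_1$ and $-\rho \in A_2$. As $\phi_\lambda$ is positively homogeneous, $\phi'_\lambda(0; \cdot) = \phi_\lambda(\cdot)$. Hence,
\[ \phi'_\lambda(0; \rho) = \|\rho\| \quad\text{and}\quad \phi'_\lambda(0; -\rho) = \sqrt{1 - (\lambda^T a)^2}\,\sqrt{\|\rho\|^2 - (d^T\rho)^2} + d^T\rho\,\lambda^T a. \]
We need to prove that
\[ \sqrt{1 - (\lambda^T a)^2}\,\sqrt{\|\rho\|^2 - (d^T\rho)^2} + d^T\rho\,\lambda^T a + \|\rho\| \ge 0. \]
By Cauchy-Schwarz and $|\lambda^T a| < 1$ (recall that $\lambda \ne \pm a$), $|d^T\rho\,\lambda^T a| \le |\lambda^T a|\,\|d\|\,\|\rho\| < \|\rho\|$. Thus, $d^T\rho\,\lambda^T a + \|\rho\| > 0$. Since $\sqrt{1 - (\lambda^T a)^2}\,\sqrt{\|\rho\|^2 - (d^T\rho)^2} \ge 0$, the inequality follows. Therefore, $\phi_\lambda$ is convex.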

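We have proved that $\phi_\lambda$ is convex, and that it is differentiable in $\mathbb{R}^m \setminus \{0\}$ if $\|d\| < 1$ and in $\mathbb{R}^m \setminus d\,\mathbb{R}_+$ if $\|d\| = 1$. It remains to show that if $m = 1$ and $\|d\| = 1$, then $\phi_\lambda$ is differentiable in $\mathbb{R}^m \setminus \{0\}$. This follows from (5.12) since $\phi^2_\lambda(y) = -d\,y\,\lambda^T a$ in this case. This concludes the proof.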

With this, we have completed the proof of sublinearity of $\phi_\lambda$. Moreover, we have explicitly described the function. As a corollary:

Corollary 5.29. The epigraph of $\phi_\lambda$ and the hypograph of $-\phi_{-\lambda}$ are maximal $\operatorname{proj}_{L^\perp} S_{\le 0}$-free sets.

While this result provides two convex sets, it is not clear which one to choose, that is, which of these two $\operatorname{proj}_{L^\perp} S_{\le 0}$-free sets will yield an $S_{\le 0}$-free set containing the given solution $(\bar x, \bar y)$. We answer this next.

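Lemma 5.30. Consider $(\bar x, \bar y)$ such that $\|\bar x\| > \|\bar y\|$ and $a^T\bar x + d^T\bar y \le 0$, and let $\lambda = \frac{\bar x}{\|\bar x\|}$. Then, the projection of $(\bar x, \bar y)$ onto $L^\perp$ is in the interior of the epigraph of $\phi_\lambda$.

Proof. The projection of $(\bar x, \bar y)$ onto $L^\perp$ is given by $(\lambda^T\bar x, \bar y)$. Then, $\phi_\lambda(\bar y) = \max_x\{\lambda^T x : \|x\| \le \|\bar y\|,\ a^T x + d^T\bar y \le 0\} \le \lambda^T\lambda\,\|\bar y\| = \|\bar y\|$. Thus, $\lambda^T\bar x = \|\bar x\| > \|\bar y\| \ge \phi_\lambda(\bar y)$.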


Back to the original space

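Finally, we use the above to construct $S_{\le 0}$-free sets, i.e., sets in the original space. Embedded in $\mathbb{R}^{n+m}$, the epigraph of $\phi_\lambda$ is $\{(t\lambda, y) : y \in \mathbb{R}^m,\ \phi_\lambda(y) \le t\}$. Thus,
\[ \begin{aligned} C_{\phi_\lambda} &= \{(t\lambda, y) : y \in \mathbb{R}^m,\ \phi_\lambda(y) \le t\} + L \\ &= \{(t\lambda + z, y) : y \in \mathbb{R}^m,\ \lambda^T z = 0,\ \phi_\lambda(y) \le t\} \\ &= \{(x, y) : \phi_\lambda(y) \le \lambda^T x\}. \end{aligned} \tag{5.15} \]
As a summary, we prove that $C_{\phi_\lambda}$ is maximal $S_{\le 0}$-free without going through the projection.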

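Proposition 5.31. Let $\lambda \in D_1(0)$ and $\phi_\lambda(y) = \max_x\{\lambda^T x : (x, y) \in S_{\le 0}\}$. If $\|a\| = 1 \ge \|d\|$, then $C_{\phi_\lambda} = \{(x, y) : \phi_\lambda(y) \le \lambda^T x\}$ is maximal $S_{\le 0}$-free. Additionally, if $(\bar x, \bar y) \notin S_{\le 0}$ is such that $a^T\bar x + d^T\bar y \le 0$, letting $\lambda = \frac{\bar x}{\|\bar x\|}$ ensures $(\bar x, \bar y) \in \operatorname{int}(C_{\phi_\lambda})$.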

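Proof. We will prove that $C_{\phi_\lambda}$ is convex, free and maximal.

The convexity of $C_{\phi_\lambda}$ follows directly from Proposition 5.28. Also, $C_{\phi_\lambda}$ is $S_{\le 0}$-free since if $(x, y) \in S_{\le 0}$, then $\phi_\lambda(y) \ge \lambda^T x$. Therefore, $(x, y)$ is not in the interior of $C_{\phi_\lambda}$.

We now focus on proving maximality. In the cases where $\phi_\lambda$ is differentiable in $\mathbb{R}^m \setminus \{0\}$ we can directly write
\[ C_{\phi_\lambda} = \{(x, y) \in \mathbb{R}^{n+m} : \nabla\phi_\lambda(\beta)^T y \le \lambda^T x,\ \forall\beta \in D_1(0)\}. \]
Let $\beta \in D_1(0)$ and let $x_\beta$ be the optimal solution of the problem (5.11) which defines $\phi_\lambda(\beta)$. That is, $\lambda^T x_\beta = \phi_\lambda(\beta)$. By Lemma 5.15, the inequality $-\lambda^T x + \nabla\phi_\lambda(\beta)^T y \le 0$ is exposed by $(x_\beta, \beta)$.

The only remaining case is $\|d\| = 1 \wedge m > 1$, where $\phi_\lambda$ is only differentiable in $D_1(0) \setminus \{d\}$. Since in this case $m > 1$, we can safely remove a single inequality from the outer-description of $C_{\phi_\lambda}$ without affecting it, i.e.,
\[ C_{\phi_\lambda} = \{(x, y) \in \mathbb{R}^{n+m} : \nabla\phi_\lambda(\beta)^T y \le \lambda^T x,\ \forall\beta \in D_1(0) \setminus \{d\}\}. \]
Using the same argument as above we can find an exposing point of each inequality $-\lambda^T x + \nabla\phi_\lambda(\beta)^T y \le 0$ for $\beta \in D_1(0) \setminus \{d\}$.

The fact that $(\bar x, \bar y) \in \operatorname{int}(C_{\phi_\lambda})$ when $\lambda = \frac{\bar x}{\|\bar x\|}$ follows directly since $C_\lambda \subseteq C_{\phi_\lambda}$.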


Figure 5.4: $S^2_{\le 0}$ in Example 5.24 (orange) and the set $C_{\phi_\lambda}$ (blue). The latter is maximal $S^2_{\le 0}$-free.

Example 5.32. Let us recall the set $S^2_{\le 0}$ in Example 5.24,
\[ S^2_{\le 0} = \{(x_1, x_2, y) \in \mathbb{R}^3 : \|x\| \le |y|,\ a^T x + d\,y \le 0\} \]
with $a = (-1/\sqrt{2},\ 1/\sqrt{2})$, $d = 1/\sqrt{2}$

, and (

x¯, y¯

) = (

−

,−

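. In Figure 5.3 we showed that the set $C_\lambda$ is $S^2_{\le 0}$-free but not maximal, and that $C_{G(\lambda)}$ is not $S^2_{\le 0}$-free. In Figure 5.4 we show the set $C_{\phi_\lambda}$, which is maximal $S^2_{\le 0}$-free.

For this example, we know $\lambda^T a = 0$, thus
\[ \lambda^T a\,\|y\| + d^T y \le 0 \iff y \le 0. \]
A simple calculation using (5.12) yields
\[ \phi_\lambda(y) = \begin{cases} -y, & \text{if } y \le 0, \\ \frac{y}{\sqrt{2}}, & \text{if } y > 0. \end{cases} \]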

Remark 5.33. As we saw in the proof of Proposition 5.28, if $\lambda = a$, then $\phi_\lambda(y) = -d^T y$. This implies that $C_{\phi_\lambda} = \{(x, y) : a^T x + d^T y \ge 0\}$. By definition, this set does not contain any point satisfying $a^T x + d^T y \le 0$ in its interior; thus, it is a very uninteresting maximal $S_{\le 0}$-free set. One is usually interested in constructing a maximal $S_{\le 0}$-free set that contains a point $(\bar x, \bar y)$ satisfying $a^T\bar x + d^T\bar y \le 0$. Hence, by Lemma 5.30, whenever we assume that $\lambda = \frac{\bar x}{\|\bar x\|}$, where $a^T\bar x + d^T\bar y \le 0$ and $\|\bar x\| > \|\bar y\|$, it automatically holds that $\lambda \ne a$.

Remark 5.34. At this point we would like to show some relations between $C_\lambda$, $C_{\phi_\lambda}$ and $C_{G(\lambda)}$. The inequalities defining $C_\lambda$ are $(-\lambda, \beta)$ for $\beta \in D_1(0)$.


Figure 5.5: Let

= (

5,−4

)

, d

= (

10 ,2

), and

= (

65 ,16

. The boundaries of the $y$ coordinates of the polars of $C_\lambda$, $C_{G(\lambda)}$, and $C_{\phi_\lambda}$ are depicted in orange, green, and blue, respectively. They all coincide below the green line.

Equivalently, the polar of $C_\lambda$ is the cone generated by $\{-\lambda\} \times \operatorname{conv} D_1(0) = \{-\lambda\} \times B_1(0)$.

The inequalities defining $C_{G(\lambda)}$ are $(-\lambda, \beta)$ for $\beta \in G(\lambda) = \{\beta \in D_1(0) : \lambda^T a + d^T\beta \le 0\}$. Equivalently, the polar of $C_{G(\lambda)}$ is the cone generated by $\{-\lambda\} \times \operatorname{conv} G(\lambda)$.

The inequalities defining $C_{\phi_\lambda}$ are $(-\lambda, \nabla\phi_\lambda(\beta))$ for $\beta \in D_1(0)$. When $\beta \in G(\lambda)$, then $\phi_\lambda(y) = \|y\|$ and so the inequalities are $(-\lambda, \beta)$. In other words, some inequalities defining $C_{\phi_\lambda}$ coincide with the inequalities defining $C_{G(\lambda)}$ and $C_\lambda$. Thus, when $C_{\phi_\lambda}$ is convex (i.e., when $\|a\| \ge \|d\|$), there is a region where all three convex sets look the same. In terms of the polars, when $\|a\| \ge \|d\|$, the polar of $C_{\phi_\lambda}$ is between the polars of $C_{G(\lambda)}$ and $C_\lambda$. This is depicted in Figure 5.5.

5.5 Non-Homogeneous Quadratics

As discussed at the beginning of the previous section, we now study a general non-homogeneous quadratic which can be written as
\[ S = \{(x, y) \in \mathbb{R}^{n+m} : \|x\| \le \|y\|,\ a^T x + d^T y = -1\}. \]
We assume we are given $(\bar x, \bar y)$ such that
\[ \|\bar x\| > \|\bar y\|, \qquad a^T\bar x + d^T\bar y = -1. \]
Much like in Section 5.4, we begin by dismissing a simple case.


Remark 5.35. The case $\|a\| \le \|d\| \wedge m = 1$ can be treated separately. Note that, as opposed to the analogous analysis at the beginning of Section 5.4, here we include the case where the norms are equal. As already noted in Remark 5.26, we should expect $S$ to be convex in this case. Indeed, as $d \ne 0$ (if not, then $a = 0$ and $S = \emptyset$) we can write $y = \frac{1}{d}(-1 - a^T x)$ and consequently
\[ \begin{aligned} S &= \{(x, y) \in \mathbb{R}^{n+1} : \|x\|^2 \le \tfrac{1}{d^2}(1 + 2a^T x + (a^T x)^2),\ a^T x + d^T y = -1\} \\ &= \{(x, y) \in \mathbb{R}^{n+1} : x^T\left(I - \tfrac{1}{d^2}aa^T\right)x - \tfrac{1}{d^2}(1 + 2a^T x) \le 0,\ a^T x + d^T y = -1\}. \end{aligned} \]
Since $I - \frac{1}{d^2}aa^T$ is positive semi-definite whenever $|d| \ge \|a\|$, the set $S$ is convex. Thus, a maximal $S$-free set, or even directly a cutting plane, can be obtained using a supporting hyperplane.

Similarly to Section 5.4, we distinguish the following cases:

Case 1 $\|a\| \le \|d\| \wedge m > 1$.

Case 2 $\|a\| > \|d\|$.

Since $S \subsetneq S_{\le 0}$, the set $C_{G(\lambda)}$ ($C_{\phi_\lambda}$) is $S$-free in Case 1 (Case 2) as per Section 5.4. It is natural to wonder whether these sets are maximal already.

5.5.1 Case 1: ∥a∥ ≤ ∥d∥ ∧ m > 1

The technique we used to prove maximality of $C_{G(\lambda)}$ with respect to $S_{\le 0}$ was to exploit that $C_{G(\lambda)}$ is defined by the inequalities of $C_\lambda$ exposed by elements of $S_{\le 0}$. Following this approach, we study which inequalities of $C_{G(\lambda)}$ are exposed by a point of $S$. Recall that
\[ C_{G(\lambda)} = \{(x, y) \in \mathbb{R}^{n+m} : -\lambda^T x + \beta^T y \le 0,\ \forall\beta \in G(\lambda)\}, \]
where
\[ G(\lambda) = \{\beta \in \mathbb{R}^m : \|\beta\| = 1,\ a^T\lambda + d^T\beta \le 0\}. \]
Consider an inequality in the definition of $C_{G(\lambda)}$ given by $(-\lambda, \beta)$ such that $a^T\lambda + d^T\beta < 0$. Then, the point $(\lambda, \beta) \in S_{\le 0}$ can be scaled by $\frac{-1}{a^T\lambda + d^T\beta}$ to the exposing point $\frac{-1}{a^T\lambda + d^T\beta}(\lambda, \beta) \in S$. Thus, almost every inequality describing $C_{G(\lambda)}$ is exposed by points of $S$. Furthermore, we can simply remove the inequalities that are not exposed by points of $S$ from $C_{G(\lambda)}$ without changing the set $C_{G(\lambda)}$. We specify this next.


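Theorem 5.36. Let $\lambda = \frac{\bar x}{\|\bar x\|}$,
\[ H = \{(x, y) \in \mathbb{R}^{n+m} : a^T x + d^T y = -1\} \]
and
\[ S_{\le 0} = \{(x, y) \in \mathbb{R}^{n+m} : \|x\| \le \|y\|,\ a^T x + d^T y \le 0\}, \]
where $\|a\| \le \|d\| \wedge m > 1$. Then, $C_{G(\lambda)}$ is maximal $S_{\le 0}$-free with respect to $H$ and contains $(\bar x, \bar y)$ in its interior.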

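Proof. By Proposition 5.21, we know that $C_{G(\lambda)}$ is maximal $S_{\le 0}$-free. Thus, $C_{G(\lambda)}$ is $S_{\le 0}$-free with respect to $H$. To prove maximality, we note that thanks to $m > 1$:
\[ C_{G(\lambda)} = \{(x, y) \in \mathbb{R}^{n+m} : -\lambda^T x + \beta^T y \le 0,\ \forall\beta \in \operatorname{ri}(G(\lambda))\}, \]
where
\[ \operatorname{ri}(G(\lambda)) = \{\beta \in \mathbb{R}^m : \|\beta\| = 1,\ a^T\lambda + d^T\beta < 0\} \]
is the relative interior of $G(\lambda)$. Consider $\beta_0 \in \operatorname{ri}(G(\lambda))$. As we saw in Proposition 5.19, $(\lambda, \beta_0) \in C_{G(\lambda)} \cap S_{\le 0}$ exposes the inequality $(-\lambda, \beta_0)$. As $C_{G(\lambda)} \cap S_{\le 0}$ is a (non-convex) cone, we have that for any $\mu > 0$, $\mu(\lambda, \beta_0) \in C_{G(\lambda)} \cap S_{\le 0}$ exposes the inequality $(-\lambda, \beta_0)$. Since $a^T\lambda + d^T\beta_0 < 0$, we have $\frac{-1}{a^T\lambda + d^T\beta_0} > 0$ and so
\[ -\frac{(\lambda, \beta_0)}{a^T\lambda + d^T\beta_0} \in S_{\le 0} \cap H \cap C_{G(\lambda)} \tag{5.16} \]
exposes the inequality $(-\lambda, \beta_0)$. The claim now follows from Theorem 5.7.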

The above theorem states that obtaining a maximal $S$-free set in this case amounts to simply using the maximal $S_{\le 0}$-free set $C_{G(\lambda)}$, and then intersecting with $H$. Recall that $S = S_{\le 0} \cap H$. The next case is considerably different.

5.5.2 Case 2: ∥a∥>∥d∥

We begin with an important remark regarding an assumption made in the analogous case of the previous section.

Remark 5.37. Since in this case $\|a\| > 0$, we can, again, assume that $\|a\| = 1$. Indeed, we can always rescale the variables $(x, y)$ by $\|a\|$ to meet this requirement.

Also note that since $\|d\| < \|a\| = 1$, $\phi_\lambda$ is differentiable in $D_1(0)$. See Proposition 5.28.

Figure 5.6: Plots of $S^2_{\le 0}$ and $C_{\phi_\lambda}$ as defined in Example 5.38 showing that $C_{\phi_\lambda}$ is not necessarily maximal $S^2_{\le 0}$-free with respect to $H$ in the case $\|a\| > \|d\|$. (a) $S^2_{\le 0}$ (orange), $H$ (green) and $C_{\phi_\lambda}$ (blue). (b) Projection onto $(x_1, x_2)$ of $S^2_{\le 0} \cap H$ (orange) and $C_{\phi_\lambda} \cap H$ (blue). One of the facets of $C_{\phi_\lambda} \cap H$ has a gap with the boundary of $S^2_{\le 0} \cap H$.

Unfortunately, in this case the maximality of $C_{\phi_\lambda}$ with respect to $S_{\le 0}$ does not carry over to $S$, as the following example shows.

Example 5.38. We continue with $S^2_{\le 0}$ defined in Example 5.24. In Figure 5.4 we showed how $C_{\phi_\lambda}$ gives us a maximal $S^2_{\le 0}$-free set. If we now consider
\[ H = \{(x, y) \in \mathbb{R}^{n+m} : a^T x + d^T y = -1\} \]
with $a = (-1/\sqrt{2},\ 1/\sqrt{2})$ and $d = 1/\sqrt{2}$, we do not necessarily obtain that $C_{\phi_\lambda} \cap H$ is maximal $S^2_{\le 0} \cap H$-free. In Figure 5.6 we illustrate this issue.

Figure 5.6 of the previous example displays an interesting feature though: the inequalities defining $C_{\phi_\lambda}$ seem to have the correct "slope" and just need to be translated. We conjecture, then, that in order to find a maximal $S$-free set, we only need to adequately relax the inequalities of $C_{\phi_\lambda}$.

Set-up

Recall that
\[ \begin{aligned} C_{\phi_\lambda} &= \{(x, y) : \phi_\lambda(y) \le \lambda^T x\} \\ &= \{(x, y) : -\lambda^T x + \nabla\phi_\lambda(\beta)^T y \le 0,\ \forall\beta \in D_1(0)\}. \end{aligned} \]
We denote by $r(\beta)$ the amount by which we need to relax each inequality of $C_{\phi_\lambda}$ such that
\[ C = \{(x, y) : -\lambda^T x + \nabla\phi_\lambda(\beta)^T y \le r(\beta),\ \forall\beta \in D_1(0)\} \tag{5.17} \]
is $S$-free. Note that when $\beta$ satisfies $\lambda^T a + d^T\beta < 0$, the inequalities of $C_{\phi_\lambda}$ are the same as the ones of $C_{G(\lambda)}$ (see also Remark 5.34) and, just like in Section 5.5.1, they have exposing points in $S$. An inequality of this type can be seen in Figure 5.6b: it is the inequality of $C_{\phi_\lambda}$ tangent to $S$ at one of its exposing points. Thus, we expect that $r(\beta) = 0$ when $\lambda^T a + d^T\beta < 0$. In the following we find $r(\beta)$ when $\lambda^T a + d^T\beta \ge 0$ and show maximality of the resulting set.

Following the spirit of Section 5.4.2, not all statements in this section require $\lambda = \frac{\bar x}{\|\bar x\|}$. However, we assume $\lambda \ne \pm a$. This assumption is not restrictive when constructing maximal $S$-free sets, as the following remark shows.

Remark 5.39. If $\lambda = -a$, then for every $\beta \in D_1(0)$ it holds that $\lambda^T a + d^T\beta < 0$. In this case $r(\beta)$ will be simply defined as 0 everywhere and $C = C_{\phi_\lambda}$. This means all inequalities defining $C$ have an exposing point in $S$ and maximality follows directly.

On the other hand, if we take $\lambda = \frac{\bar x}{\|\bar x\|}$ with $(\bar x, \bar y) \in H$ and $\|\bar x\| > \|\bar y\|$, we have that if additionally $\lambda = a$,
\[ a^T\bar x + d^T\bar y = -1 \iff \|\bar x\| + d^T\bar y = -1 \implies \|\bar y\| + d^T\bar y < -1. \]
The latter cannot be, as $\|d\| < 1$.

Remark 5.40. The assumption $\lambda \ne \pm a$ has an unexpected consequence: as $\lambda \ne \pm a$ and $\|a\| = \|\lambda\| = 1$, it must hold that $n \ge 2$. This implicit assumption, however, does not present an issue: whenever $n = 1$, either $\lambda = a$ or $\lambda = -a$. By Remark 5.39, if we use $\lambda = \frac{\bar x}{\|\bar x\|}$, then $\lambda = -a$. Thus, $C = C_{\phi_\lambda}$ and maximality holds.

Construction of r(β)

Let $\beta \in D_1(0)$ be such that $\lambda^T a + d^T\beta \ge 0$. Then, the face of $C_{\phi_\lambda}$ defined by the valid inequality $-\lambda^T x + \nabla\phi_\lambda(\beta)^T y \le 0$ does not intersect $S$. See Lemma 5.55 for a proof of this statement.

In particular, the inequality is not exposed by any point in $S \cap C_{\phi_\lambda}$. However, it is exposed by $(x_\beta, \beta) \in S_{\le 0}$, where $x_\beta$ is given by (5.27) (see the proof of Proposition 5.31). Note that $(x_\beta, \beta) \in H_0 = \{(x, y) : a^T x + d^T y = 0\}$, as otherwise we can scale it so that it belongs to $S$.

The quantity $r(\beta)$ is the amount we need to relax the inequality in order for it to be an "asymptote", and we compute it as follows. We first find a sequence of points $(x_n, y_n)_{n \in \mathbb{N}}$ in $S_{\le 0}$ that converges to $(x_\beta, \beta)$, enforcing that no element of the sequence belongs to $H_0$. If we find such a sequence, then every $(x_n, y_n) \in S_{\le 0}$ can be scaled to be in $S$:
\[ z_n = -\frac{(x_n, y_n)}{a^T x_n + d^T y_n} \in S. \]
This last scaled sequence diverges, as the denominator goes to 0 due to $(x_n, y_n) \to (x_\beta, \beta) \in H_0$. The idea is that the violation $(-\lambda, \nabla\phi_\lambda(\beta))^T z_n$ given by this sequence will give us, in the limit, the maximum relaxation that will ensure $S$-freeness (see Figure 5.7). Then, we would define
\[ r(\beta) = \lim_{n \to \infty} (-\lambda, \nabla\phi_\lambda(\beta))^T z_n = -\lim_{n \to \infty} \frac{-\lambda^T x_n + \nabla\phi_\lambda(\beta)^T y_n}{a^T x_n + d^T y_n}. \]
We remark that this limit is what we intuitively aim for, but it might not even be well defined in general. In what follows, we construct a sequence that yields a closed-form expression for the above limit. Additionally, we show that such a definition of $r(\beta)$ yields the desired maximal $S$-free set.

The sequence. Our goal is to find a sequence $(x_n, y_n)$ such that $(x_n, y_n) \in S_{\le 0}$, $a^T x_n + d^T y_n < 0$ and $(x_n, y_n) \to (x_\beta, \beta)$. We take $y_n = \beta$ and $x_n$ such that $\|x_n\| = \|\beta\| = 1$, $a^T x_n + d^T\beta < 0$ and $x_n \to x_\beta$. Note that such sequences always exist as $\|a\| = 1$ and $\|d\| < 1$. We illustrate such a sequence with our running example.

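Example 5.41. We continue with Example 5.38. As we mentioned in Example 5.32, in this case
\[ \phi_\lambda(y) = \begin{cases} -y, & \text{if } y \le 0, \\ \frac{y}{\sqrt{2}}, & \text{if } y > 0, \end{cases} \]
and since $\lambda = \frac{1}{\sqrt{2}}(-1, -1)^T$, we see that
\[ C_{\phi_\lambda} = \Big\{(x, y) :\ \tfrac{1}{\sqrt{2}}(x_1 + x_2) - y \le 0, \quad (5.18a) \]
\[ \qquad\qquad \tfrac{1}{\sqrt{2}}(x_1 + x_2) + \tfrac{1}{\sqrt{2}}\,y \le 0 \Big\}. \quad (5.18b) \]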

Figure 5.7: Projection onto $(x_1, x_2)$ of $S^2_{\le 0} \cap H$ (orange) and $C_{\phi_\lambda}$ (blue), along with the first two coordinates of the sequence $(z_n)_{n \in \mathbb{N}}$ defined in Example 5.41 for several values of $n$ (red). The sequence is diverging "downwards".

It is not hard to check that $-(1, 1, \sqrt{2}) \in S^2_{\le 0} \cap H \cap C_{\phi_\lambda}$ exposes inequality (5.18a). This is the tangent point in Figure 5.6b we discussed above.

On the other hand, (5.18b), which is obtained from $\beta = 1$, does not have an exposing point in $S^2_{\le 0} \cap H \cap C_{\phi_\lambda}$, and corresponds to an inequality we should relax as per our discussion. This inequality, however, is exposed by $(x_\beta, \beta) = (0, -1, 1) \in S^2_{\le 0} \cap C_{\phi_\lambda}$. Consider now the sequence defined as
\[ (x_n, y_n) = \left(\frac{1}{\sqrt{n^2 + 1}},\ \frac{-n}{\sqrt{n^2 + 1}},\ 1\right) \in S^2_{\le 0}. \]
Clearly the limit of this sequence is $(0, -1, 1)$ and
\[ a^T x_n + d^T y_n = \frac{1}{\sqrt{2}}\left(-\frac{1}{\sqrt{n^2 + 1}} - \frac{n}{\sqrt{n^2 + 1}} + 1\right) < 0. \]
Now we let
\[ z_n = -\frac{(x_n, y_n)}{a^T x_n + d^T y_n} \in S^2_{\le 0} \cap H. \]
As we mentioned above, this sequence diverges. Continuing with Figure 5.6, in Figure 5.7 we plot the first two components of the sequence $(z_n)_{n \in \mathbb{N}}$ along with $S^2_{\le 0} \cap H$ and $C_{\phi_\lambda} \cap H$. From this figure we can anticipate where our argument is going: the sequence $(z_n)_{n \in \mathbb{N}}$ moves along the boundary of $S^2_{\le 0} \cap H$ towards an "asymptote" from where we can deduce $r(\beta)$. The latter is given by the gap between inequality (5.18b) and the asymptote.
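The behaviour of the scaled sequence can be observed numerically. The following sketch evaluates $z_n$ for the running example, taking $a = (-1/\sqrt{2}, 1/\sqrt{2})$, $d = 1/\sqrt{2}$ and $y_n = 1$ as in the example, and prints the left-hand side of (5.18b) at $z_n$. The function names are illustrative only.

```python
import math

SQ2 = math.sqrt(2.0)
a, d = (-1.0 / SQ2, 1.0 / SQ2), 1.0 / SQ2   # data of the running example

def z(n):
    """Scaled point z_n = -(x_n, y_n) / (a^T x_n + d y_n), with y_n = 1."""
    s = math.sqrt(n * n + 1.0)
    xn = (1.0 / s, -n / s)
    den = a[0] * xn[0] + a[1] * xn[1] + d * 1.0   # negative for n >= 1
    return (-xn[0] / den, -xn[1] / den, -1.0 / den)

def lhs_518b(p):
    """Left-hand side of (5.18b): (x1 + x2)/sqrt(2) + y/sqrt(2)."""
    x1, x2, y = p
    return (x1 + x2) / SQ2 + y / SQ2

for n in (1, 10, 100, 1000, 10000):
    p = z(n)
    print(n, round(p[0], 3), round(p[1], 3), round(lhs_518b(p), 6))
```

The points $z_n$ grow without bound while the violation of (5.18b) decreases towards a finite value, which is the relaxation amount $r(1)$ that the following paragraphs compute in closed form.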

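Computing the limit. Here we compute
\[ r(\beta) = -\lim_{n \to \infty} \frac{-\lambda^T x_n + \nabla\phi_\lambda(\beta)^T y_n}{a^T x_n + d^T y_n}. \]
We proceed to rewrite the limit.

Since $y_n = \beta$ and $x_\beta$ is the optimal solution of (5.11), we have:
\[ \nabla\phi_\lambda(\beta)^T y_n = \phi_\lambda(\beta) = \lambda^T x_\beta, \qquad d^T y_n = -a^T x_\beta. \]
Thus,
\[ r(\beta) = -\lim_{n \to \infty} \frac{-\lambda^T x_n + \nabla\phi_\lambda(\beta)^T y_n}{a^T x_n + d^T y_n} = -\lim_{n \to \infty} \frac{-\lambda^T x_n + \lambda^T x_\beta}{a^T x_n - a^T x_\beta} = \lim_{n \to \infty} \frac{\lambda^T(x_n - x_\beta)}{a^T(x_n - x_\beta)}. \]
Notice that $x_\beta$ belongs to the 2-dimensional space generated by $\lambda$ and $a$, which we denote by $\Lambda$. Note that it is indeed 2-dimensional, since $\lambda \ne \pm a$, see Remark 5.39. Furthermore, we can assume that $x_n$ also belongs to $\Lambda$, as any other component of $x_n$ is irrelevant for the value of the limit. Indeed, as $\mathbb{R}^n = \Lambda \oplus \Lambda^\perp$, we have $x_n = x_n^\parallel + x_n^\perp$, where $x_n^\parallel \in \Lambda$ and $x_n^\perp \in \Lambda^\perp$, and
\[ \frac{\lambda^T(x_n - x_\beta)}{a^T(x_n - x_\beta)} = \frac{\lambda^T(x_n^\parallel - x_\beta)}{a^T(x_n^\parallel - x_\beta)}. \]
To compute the limit observe that
\[ \frac{\lambda^T(x_n - x_\beta)}{a^T(x_n - x_\beta)} = \frac{\lambda^T \frac{x_n - x_\beta}{\|x_n - x_\beta\|}}{a^T \frac{x_n - x_\beta}{\|x_n - x_\beta\|}}. \]
Notice that $\frac{x_n - x_\beta}{\|x_n - x_\beta\|}$ converges, as $x_n \in \Lambda$, $\|x_n\| = 1$, and $x_n \to x_\beta$. Let $\hat x$ be the limit and note that $\hat x$ is orthogonal to $x_\beta$. Indeed,
\[ x_\beta^T \hat x = \lim_{n \to \infty} x_\beta^T \frac{x_n - x_\beta}{\|x_n - x_\beta\|} = \lim_{n \to \infty} \frac{x_\beta^T x_n - 1}{\|x_n - x_\beta\|} = \lim_{n \to \infty} \frac{-\|x_n - x_\beta\|^2}{2\|x_n - x_\beta\|} = 0. \]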

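Hence,
\[ r(\beta) = \lim_{n \to \infty} \frac{\lambda^T(x_n - x_\beta)}{a^T(x_n - x_\beta)} = \frac{\lambda^T \hat x}{a^T \hat x}. \]
Since we are interested in the quotient of $\lambda^T\hat x$ and $a^T\hat x$, any multiple of $\hat x$ can be used, that is, any vector orthogonal to $x_\beta$ in $\Lambda$. Using $\lambda$ and $a$ as a basis for $\Lambda$, we have that for $x \in \Lambda$ with coordinates $x_\lambda$ and $x_a$, the vector $y$ with coordinates $y_\lambda = -(x_a + x_\lambda\lambda^T a)$ and $y_a = x_\lambda + x_a\lambda^T a$ is orthogonal to $x$. Indeed,
\[ \begin{aligned} x^T y &= (x_\lambda\lambda + x_a a)^T(y_\lambda\lambda + y_a a) \\ &= x_\lambda y_\lambda + x_a y_a + (x_\lambda y_a + x_a y_\lambda)\lambda^T a \\ &= (x_\lambda + x_a\lambda^T a)y_\lambda + (x_a + x_\lambda\lambda^T a)y_a \\ &= 0. \end{aligned} \]
Thus, let $\tilde x = -(x_{\beta a} + x_{\beta\lambda}\lambda^T a)\lambda + (x_{\beta\lambda} + x_{\beta a}\lambda^T a)a$. Given that $\lambda^T a + d^T\beta \ge 0$, from (5.27) (see Section 5.10) we have
\[ x_\beta = \sqrt{\frac{1 - (d^T\beta)^2}{1 - (\lambda^T a)^2}}\;\lambda - \left(d^T\beta + \lambda^T a\sqrt{\frac{1 - (d^T\beta)^2}{1 - (\lambda^T a)^2}}\right)a. \tag{5.19} \]
Note that while this last explicit formula for $x_\beta$ is the one stated for the case $\lambda^T a + d^T\beta > 0$, it also holds when $\lambda^T a + d^T\beta = 0$. Therefore,
\[ \begin{aligned} \tilde x &= (d^T\beta)\lambda + \left(\sqrt{\frac{1 - (d^T\beta)^2}{1 - (\lambda^T a)^2}} - \left(d^T\beta + \lambda^T a\sqrt{\frac{1 - (d^T\beta)^2}{1 - (\lambda^T a)^2}}\right)\lambda^T a\right)a \\ &= (d^T\beta)\lambda + \phi_\lambda(\beta)\,a. \end{aligned} \]
All together, we obtain
\[ r(\beta) = \frac{\lambda^T\tilde x}{a^T\tilde x} = \frac{d^T\beta + \lambda^T a\,\phi_\lambda(\beta)}{\phi_\lambda(\beta) + d^T\beta\,\lambda^T a}. \]
Note that if $\lambda^T a + d^T\beta = 0$, then $r(\beta) = 0$. We summarize the above discussion in the following result.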

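Lemma 5.42. Let $a, \lambda, \beta \in D_1(0)$, $d \in B_1(0)$, and $\lambda \ne \pm a$ be such that $\|d\| < \|a\|$ and $\lambda^T a + d^T\beta \ge 0$. Then, every sequence $(x_n)_{n \in \mathbb{N}} \subseteq \langle\lambda, a\rangle$ converging to $x_\beta$ such that $\|x_n\| = 1$ and $a^T x_n + d^T\beta < 0$ satisfies
\[ r(\beta) = \lim_{n \to \infty} \frac{\lambda^T(x_n - x_\beta)}{a^T(x_n - x_\beta)} = \frac{d^T\beta + \lambda^T a\,\phi_\lambda(\beta)}{\phi_\lambda(\beta) + d^T\beta\,\lambda^T a}. \]
Such sequences are always guaranteed to exist.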
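The limit in Lemma 5.42 can be checked numerically for $n = 2$, where $\langle\lambda, a\rangle = \mathbb{R}^2$. The sketch below is only an illustration under assumed data: `lam`, `a` and the scalar `db` (standing for $d^T\beta$) are hypothetical values with $\lambda^T a + d^T\beta > 0$, and the sequence is generated by rotating $x_\beta$ from (5.19) by a small angle towards the side where $a^T x + d^T\beta < 0$.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def phi_unit(la, db):
    """phi_lambda(beta) for ||beta|| = 1, in terms of la = lam^T a and db = d^T beta."""
    if la + db <= 0:
        return 1.0
    return math.sqrt((1.0 - db * db) * (1.0 - la * la)) - db * la

def r_closed(la, db):
    """Closed form of r(beta) from Lemma 5.42 (zero when la + db <= 0)."""
    if la + db <= 0:
        return 0.0
    p = phi_unit(la, db)
    return (db + la * p) / (p + db * la)

def x_beta(lam, a, db):
    """Formula (5.19) for the maximizer x_beta."""
    la = dot(lam, a)
    q = math.sqrt((1.0 - db * db) / (1.0 - la * la))
    c = db + la * q
    return tuple(q * l - c * ai for l, ai in zip(lam, a))

def rotate(x, t):
    return (math.cos(t) * x[0] - math.sin(t) * x[1],
            math.sin(t) * x[0] + math.cos(t) * x[1])

def ratio(lam, a, db, t):
    """lam^T(x - x_beta) / a^T(x - x_beta) for x = x_beta rotated by a small angle t."""
    xb = x_beta(lam, a, db)
    x = rotate(xb, t)
    if dot(a, x) + db >= 0:          # wrong side of H_0: rotate the other way
        x = rotate(xb, -t)
    diff = (x[0] - xb[0], x[1] - xb[1])
    return dot(lam, diff) / dot(a, diff)

# Hypothetical illustration data (not taken from the text).
lam = (math.cos(0.4), math.sin(0.4))
a = (math.cos(1.9), math.sin(1.9))
db = 0.35
for t in (1e-1, 1e-2, 1e-3, 1e-4):
    print(t, ratio(lam, a, db, t), r_closed(dot(lam, a), db))
```

As the rotation angle shrinks, the quotient approaches the closed-form value, which is what the lemma asserts for any admissible sequence.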

Figure 5.8: Plots of $S^2_{\le 0}$ and $C_1$ as defined in Example 5.43 showing that $C_1$ is maximal $S^2_{\le 0}$-free with respect to $H$. (a) $S^2_{\le 0}$ (orange), $H$ (green) and $C_1$ (blue). In this case $C_1$ is no longer $S^2_{\le 0}$-free. (b) Projection onto $(x_1, x_2)$ of $S^2_{\le 0} \cap H$ (orange) and $C_1 \cap H$ (blue).

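Therefore, for $\beta \in D_1(0)$, we define
\[ r(\beta) = \begin{cases} 0, & \text{if } \lambda^T a + d^T\beta \le 0, \\ \dfrac{d^T\beta + \lambda^T a\,\phi_\lambda(\beta)}{\phi_\lambda(\beta) + d^T\beta\,\lambda^T a}, & \text{otherwise.} \end{cases} \]
We extend $r$ to $y \in \mathbb{R}^m \setminus \{0\}$ by $r(y) = r\left(\frac{y}{\|y\|}\right)$ and leave it undefined at 0.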

Example 5.43. We continue with our running example in Example 5.41. In this case $r(-1) = 0$, and since $\phi_\lambda(1) = 1/\sqrt{2}$, $\lambda^T a = 0$ and $d = 1/\sqrt{2}$, it can be checked that
\[ r(1) = 1. \]
Now, let
\[ \begin{aligned} C_1 &= \{(x, y) : -\lambda^T x + \nabla\phi_\lambda(\beta)^T y \le r(\beta), \text{ for all } \beta \in D_1(0)\} \\ &= \{(x, y) : \tfrac{1}{\sqrt{2}}(x_1 + x_2) - y \le 0,\ \tfrac{1}{\sqrt{2}}(x_1 + x_2) + \tfrac{1}{\sqrt{2}}\,y \le 1\}. \end{aligned} \]
Figure 5.8 shows the same plots as Figure 5.6 with $C_1$ instead of $C_{\phi_\lambda}$.

As we see below, the characterization of $r$ as a limit is going to be useful to prove maximality of $C$. However, to show that $C$ is $S$-free, we need a different interpretation of $r$.

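Lemma 5.44. For every $\beta \in D_1(0)$, $r(\beta) = \theta(\beta)$, where $\theta(\beta)$ is defined in (5.28) and corresponds to the optimal dual solution of the optimization problem defining $\phi_\lambda(\beta)$.

Proof. If $\lambda^T a + d^T\beta \le 0$, then $r(\beta) = 0 = \theta(\beta)$. Let $\beta \in D_1(0)$ be such that $\lambda^T a + d^T\beta > 0$. Then,
\[ \begin{aligned} r(\beta) &= \frac{d^T\beta + \lambda^T a\,\phi_\lambda(\beta)}{\phi_\lambda(\beta) + d^T\beta\,\lambda^T a} \\ &= \frac{d^T\beta + \lambda^T a\sqrt{1 - (\lambda^T a)^2}\sqrt{1 - (d^T\beta)^2} - d^T\beta\,(\lambda^T a)^2}{\sqrt{1 - (\lambda^T a)^2}\sqrt{1 - (d^T\beta)^2}} \\ &= \frac{d^T\beta\sqrt{1 - (\lambda^T a)^2}}{\sqrt{1 - (d^T\beta)^2}} + \lambda^T a \\ &= \theta(\beta). \end{aligned} \]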

S-freeness and maximality proofs

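We now show that $C$ is $S$-free and then that it is maximal.

Theorem 5.45. Let $\lambda \in D_1(0)$ be such that $\lambda \ne \pm a$,
\[ C = \{(x, y) : -\lambda^T x + \nabla\phi_\lambda(\beta)^T y \le r(\beta), \text{ for all } \beta,\ \|\beta\| = 1\}, \]
and $S = \{(x, y) : \|x\| \le \|y\|,\ a^T x + d^T y = -1\}$, with $\|d\| < \|a\| = 1$. Then, $C$ is $S$-free.

Proof. Let $(x_0, y_0) \in S$ and let $\beta_0 = \frac{y_0}{\|y_0\|}$. The claim will follow if we are able to show that $-\lambda^T x_0 + \nabla\phi_\lambda(\beta_0)^T y_0 \ge r(\beta_0)$. Note that $\nabla\phi_\lambda(\beta_0)^T y_0 = \phi_\lambda(y_0)$, as $\phi_\lambda$ is positively homogeneous.

Since $x_0$ satisfies $\|x_0\| \le \|y_0\|$ and $a^T x_0 + d^T y_0 = -1$, it follows that
\[ \lambda^T x_0 \le \max_x\{\lambda^T x : \|x\| \le \|y_0\|,\ a^T x + d^T y_0 \le -1\}. \]
By weak duality we have
\[ \max_x\{\lambda^T x : \|x\| \le \|y_0\|,\ a^T x + d^T y_0 \le -1\} \le \inf_{\theta \ge 0}\ \|y_0\|\,\|\lambda - a\theta\| - (d^T y_0 + 1)\theta. \]
Recall that $\theta(y_0)$ is the optimal dual solution to the optimization problem defining $\phi_\lambda(y_0)$. Thus, it holds that $\theta(y_0) \in \mathbb{R}_+$ and $\theta(y_0) < \infty$ because $\|d\| < 1$. Consequently,
\[ \inf_{\theta \ge 0}\ \|y_0\|\,\|\lambda - a\theta\| - (d^T y_0 + 1)\theta \le \|y_0\|\,\|\lambda - a\theta(y_0)\| - (d^T y_0 + 1)\theta(y_0) = \phi_\lambda(y_0) - \theta(y_0), \]
where the last equality follows from the strong duality between the optimization problem that defines $\phi_\lambda$ and its dual, see Proposition 5.54. All the inequalities together show that
\[ \lambda^T x_0 \le \phi_\lambda(y_0) - \theta(y_0). \]
From (5.28) and Lemma 5.44 it follows that $\theta(y_0) = \theta(\beta_0) = r(\beta_0)$. Thus,
\[ -\lambda^T x_0 + \phi_\lambda(y_0) \ge r(\beta_0), \]
as we wanted to establish.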
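For the running example, the $S$-freeness of $C_1$ from Example 5.43 can be probed by sampling. The sketch below assumes the data of the running example ($a = (-1/\sqrt{2}, 1/\sqrt{2})$, $d = 1/\sqrt{2}$) and the two inequalities describing $C_1$; it samples points of $S = S^2_{\le 0} \cap H$ on a grid and reports the smallest slack, which would be negative if some sampled point of $S$ were in the interior of $C_1$. This is a numerical illustration, not a proof.

```python
import math

SQ2 = math.sqrt(2.0)

def in_S(x1, x2, y, tol=1e-12):
    """Membership in S = {||x|| <= |y|, a^T x + d y = -1} for the running example."""
    on_H = abs((-x1 + x2 + y) / SQ2 + 1.0) <= 1e-9
    return on_H and math.hypot(x1, x2) <= abs(y) + tol

def slack_C1(x1, x2, y):
    """Largest (left-hand side minus right-hand side) over the two inequalities of C1.

    A point is in the interior of C1 exactly when this value is negative.
    """
    g1 = (x1 + x2) / SQ2 - y
    g2 = (x1 + x2) / SQ2 + y / SQ2 - 1.0
    return max(g1, g2)

worst = math.inf
steps = 400
for i in range(steps + 1):
    for j in range(steps + 1):
        x1 = -10.0 + 20.0 * i / steps
        x2 = -10.0 + 20.0 * j / steps
        y = x1 - x2 - SQ2            # forces the point onto H
        if in_S(x1, x2, y):
            worst = min(worst, slack_C1(x1, x2, y))
print("smallest slack over sampled points of S:", worst)
```

A smallest slack that is nonnegative up to rounding is consistent with Theorem 5.45; a slack of essentially zero corresponds to a sampled point of $S$ on the boundary of $C_1$, such as the tangent point discussed in Example 5.41.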

Theorem 5.46. Let λ∈D1(0) such that λ=±a,

H={(x, y)∈Rn+m:aTx+dTy=−1},

S≤0={(x, y)Rn+m:∥x∥ ≤ ∥y∥, aTx+dTy≤0},

and

C={(x, y) : −λTx+∇ϕλ(β)Ty≤r(β),for all β∈D1(0)}.

where ∥d∥<∥a∥= 1. Then, Cis maximal S≤0-free with respect to H.

Additionally, if

x¯

∥x¯∥

with (

x¯, y¯

)

∈H

and

∥x¯∥>∥y¯∥

, then (

x¯, y¯

)

∈

int(C).

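Proof. Let $S = S_{\le 0} \cap H$. By Theorem 5.45, $C$ is $S$-free.

To show maximality we will use Theorem 5.11, that is, we will show that every inequality of $C$ is either exposed by a point in $S \cap C$ or exposed at infinity by a sequence in $S$.

Let $\beta_0 \in D_1(0)$ and consider the valid inequality $-\lambda^T x + \nabla\phi_\lambda(\beta_0)^T y \le r(\beta_0)$. Assume, first, that $a^T\lambda + d^T\beta_0 < 0$. Then, we have that $r(\beta_0) = 0$, $\phi_\lambda(\beta_0) = \|\beta_0\| = 1$, and $\nabla\phi_\lambda(\beta_0) = \beta_0$. Hence, the inequality is $-\lambda^T x + \beta_0^T y \le 0$. It is exposed by
\[ \frac{-1}{a^T\lambda + d^T\beta_0}(\lambda, \beta_0) \in S \cap C_{\phi_\lambda} \subseteq S \cap C. \]
Now, let us assume that $a^T\lambda + d^T\beta_0 \ge 0$. We will show that there is a sequence in $S$ that exposes $-\lambda^T x + \nabla\phi_\lambda(\beta_0)^T y \le r(\beta_0)$ at infinity. Let $(x_n)_n \subseteq \langle\lambda, a\rangle$ be a sequence converging to $x_{\beta_0}$ such that $\|x_n\| = 1$ and $a^T x_n + d^T\beta_0 < 0$, so that (Lemma 5.42)
\[ r(\beta_0) = \lim_{n \to \infty} \frac{\lambda^T(x_n - x_{\beta_0})}{a^T(x_n - x_{\beta_0})}. \]
Consider the sequence formed by
\[ z_n = -\frac{(x_n, \beta_0)}{a^T x_n + d^T\beta_0} = \frac{(x_n, \beta_0)}{a^T(x_{\beta_0} - x_n)} \in S, \]
where the equality above follows from $a^T x_{\beta_0} + d^T\beta_0 = 0$. We proceed to verify that $z_n$ exposes $-\lambda^T x + \nabla\phi_\lambda(\beta_0)^T y \le r(\beta_0)$ at infinity.

As $x_n \to x_{\beta_0}$, we have that $\|z_n\| \to \infty$. Also, $\frac{z_n}{\|z_n\|} = \frac{1}{\sqrt{2}}(x_n, \beta_0)$ converges to $v = \frac{1}{\sqrt{2}}(x_{\beta_0}, \beta_0) \in C_{\phi_\lambda} = \operatorname{rec}(C)$, and $v$ exposes $-\lambda^T x + \nabla\phi_\lambda(\beta_0)^T y \le 0$.

Finally, we have to show that there exists a $w$ such that $(-\lambda, \nabla\phi_\lambda(\beta_0))^T w = r(\beta_0)$ and $\operatorname{dist}(z_n, w + \langle v\rangle) \to 0$. Let $\hat x = \lim_{n \to \infty} \frac{x_n - x_{\beta_0}}{\|x_n - x_{\beta_0}\|}$ and let $w = \left(-\frac{\hat x}{a^T\hat x}, 0\right)$. We have that $(-\lambda, \nabla\phi_\lambda(\beta_0))^T w = r(\beta_0)$. Also,
\[ z_n - \frac{\sqrt{2}}{a^T(x_{\beta_0} - x_n)}\,v = \frac{1}{a^T(x_{\beta_0} - x_n)}(x_n - x_{\beta_0}, 0) \to -\left(\frac{\hat x}{a^T\hat x}, 0\right) = w. \]
Thus, $\operatorname{dist}(z_n, w + \langle v\rangle) \to 0$.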

A closed-form formula for C

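Since the construction of $C$ involves translating some of the inequalities of the outer-description of $C_{\phi_\lambda}$, it is natural to ask if this translation yields a translation of the whole function $\phi_\lambda$. This would yield a closed-form formula for $C$, which is much more appealing from a computational standpoint.

In what follows, we ask whether there exists an $(x_0, y_0)$ such that
\[ \begin{aligned} &\{(x, y) : -\lambda^T x + \nabla\phi_\lambda(\beta)^T y \le r(\beta), \text{ for all } \beta,\ \lambda^T a + d^T\beta \ge 0\} \\ &\quad = \{(x, y) : -\lambda^T(x - x_0) + \nabla\phi_\lambda(\beta)^T(y - y_0) \le 0, \text{ for all } \beta,\ \lambda^T a + d^T\beta \ge 0\}. \end{aligned} \]
In order to reach this equality it would suffice to satisfy
\[ \lambda^T x_0 - \nabla\phi_\lambda(\beta)^T y_0 = -r(\beta). \tag{5.20} \]
Note that, since $\lambda^T a + d^T\beta \ge 0$,
\[ \nabla\phi_\lambda(\beta) = \sqrt{1 - (\lambda^T a)^2}\,\frac{W\beta}{\|\beta\|_W} - \lambda^T a\,d, \qquad r(\beta) = \lambda^T a + d^T\beta\,\frac{\sqrt{1 - (\lambda^T a)^2}}{\|\beta\|_W}, \tag{5.21} \]
where $W = I - dd^T$. Thus (5.20) becomes
\[ \lambda^T(x_0 + a\,d^T y_0) - \sqrt{1 - (\lambda^T a)^2}\,\frac{\beta^T W y_0}{\|\beta\|_W} = -\lambda^T a - d^T\beta\,\frac{\sqrt{1 - (\lambda^T a)^2}}{\|\beta\|_W}. \]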

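From the last expression, we see that if we are able to find $(x_0, y_0)$ such that
\[ x_0 + a\,d^T y_0 = -a, \tag{5.22a} \]
\[ d^T\beta = \beta^T W y_0, \tag{5.22b} \]
then (5.20) would hold. Note that $d$ is an eigenvector of $I - dd^T$ with eigenvalue $1 - \|d\|^2$. Thus, with $y_0 = \frac{d}{1 - \|d\|^2}$ we can easily check that (5.22b) holds. With $y_0$ defined, in order to satisfy (5.22a) it suffices to set
\[ x_0 = -a(d^T y_0 + 1) = -\frac{a}{1 - \|d\|^2}. \]
In summary, we arrive at the following expression for $C$:
\[ C = \left\{ (x, y) : \begin{array}{ll} \phi_\lambda(y) \le \lambda^T x & \text{if } \lambda^T a\,\|y\| + d^T y \le 0, \\[4pt] \phi_\lambda\!\left(y - \dfrac{d}{1 - \|d\|^2}\right) \le \lambda^T\!\left(x + \dfrac{a}{1 - \|d\|^2}\right) & \text{otherwise} \end{array} \right\}. \tag{5.23} \]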

5.6 On the Diagonalization and Homogenization of Quadratics

Consider an arbitrary quadratic set
\[ \mathcal{Q} = \{s \in \mathbb{R}^p : s^T Q s + b^T s + c \le 0\}. \]
Given a point $\bar s \notin \mathcal{Q}$ we can construct a maximal $\mathcal{Q}$-free set that contains $\bar s$ using the techniques developed in the previous sections. The idea is first to find a one-to-one map $T$ such that
\[ T(\mathcal{Q}) = S_{\le 0} \cap H = \{(x, y, z) \in \mathbb{R}^{n+m+l} : \|x\| \le \|y\|,\ a^T x + d^T y + h^T z = -1\}, \qquad T(\bar s) \in H \setminus S_{\le 0}, \]
for some hyperplane $H$, that is, for some $a$, $d$ and $h$.

Then, we construct a maximal $\mathcal{Q}$-free set using the following fact, which can be easily verified: if $C$ is a maximal $S_{\le 0}$-free set with respect to $H$ that contains $T(\bar s)$, then $T^{-1}(C)$ is a maximal $\mathcal{Q}$-free set containing $\bar s$.

Here we show a surprising fact: which maximal $\mathcal{Q}$-free set is obtained heavily depends on the choice of $T$. We illustrate this interesting feature with our running example.


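Example 5.47. Let
\[ \mathcal{Q} = \{s \in \mathbb{R}^2 : -2 + 2\sqrt{2}\,s_1 - 2\sqrt{2}\,s_2 + 2 s_1 s_2 \le 0\} \]
and $\bar s = (-2, -2) \notin \mathcal{Q}$. The following map
\[ \tau_1(s_1, s_2) = (s_1,\ s_2,\ -\sqrt{2} + s_1 - s_2) \]
is one-to-one and satisfies
\[ \tau_1(\mathcal{Q}) = S^2_{\le 0} \cap H_1, \]
where $S^2_{\le 0} \cap H_1$ is defined in Example 5.38 and is given by
\[ S^2_{\le 0} \cap H_1 = \{(x_1, x_2, y) \in \mathbb{R}^3 : \|(x_1, x_2)\| \le |y|,\ -x_1 + x_2 + y = -\sqrt{2}\}. \]
Computing a maximal $S^2_{\le 0}$-free set with respect to $H_1$ containing $\tau_1(\bar s) = (-2, -2, -\sqrt{2})$ yields the same maximal $S^2_{\le 0} \cap H_1$-free set we computed in Example 5.43, that is
\[ C_1 \cap H_1 = \{(x, y) : \tfrac{1}{\sqrt{2}}(x_1 + x_2) - y \le 0,\ \tfrac{1}{\sqrt{2}}(x_1 + x_2) + \tfrac{1}{\sqrt{2}}\,y \le 1,\ -x_1 + x_2 + y = -\sqrt{2}\}. \]
As $\tau_1^{-1}$ is simply the projection onto the first two coordinates, we have that
\[ \tau_1^{-1}(C_1) = \left\{s \in \mathbb{R}^2 : \left(\tfrac{1}{\sqrt{2}} - 1\right)s_1 + \left(\tfrac{1}{\sqrt{2}} + 1\right)s_2 + \sqrt{2} \le 0,\ \sqrt{2}\,s_1 - 2 \le 0\right\} \]
is our maximal $\mathcal{Q}$-free set. This is exactly the set we show in Figure 5.8b.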

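Now we consider a different transformation for $\mathcal{Q}$. Let
\[ T_1(s_1, s_2) = \frac{1}{2}\begin{bmatrix} -1 & 1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} s_1 - \sqrt{2} \\ s_2 + \sqrt{2} \end{bmatrix}, \qquad T_2(s_1, s_2) = (-1, s_2, s_1), \qquad \tau_2 = T_2 \circ T_1. \]
For the curious reader, $T_1$ is obtained from an eigen-decomposition of the quadratic form. After some algebraic manipulation we can see that
\[ \begin{aligned} T_1(\mathcal{Q}) &= \{w \in \mathbb{R}^2 : T_1^{-1}(w_1, w_2) \in \mathcal{Q}\} \\ &= \{w \in \mathbb{R}^2 : 1 - w_1^2 + w_2^2 \le 0\}. \end{aligned} \]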

Figure 5.9: Different maximal $\mathcal{Q}$-free sets obtained from different transformations, as discussed in Example 5.47. The quadratic set $\mathcal{Q}$ (blue), a maximal $\mathcal{Q}$-free set obtained from $\tau_1$ (orange), and another such set obtained from $\tau_2$ (green).

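Thus, $\tau_2$ is one-to-one and
\[ \tau_2(\mathcal{Q}) = \{(x_1, x_2, y) \in \mathbb{R}^3 : \|(x_1, x_2)\| \le |y|,\ x_1 = -1\}. \]
Letting $S_{\le 0} = \{(x_1, x_2, y) : \|(x_1, x_2)\| \le |y|,\ x_1 \le 0\}$ and $H_2 = \{(x_1, x_2, y) : x_1 = -1\}$, we have that $\tau_2(\mathcal{Q}) = S_{\le 0} \cap H_2$. We can now construct a maximal $S_{\le 0}$-free set with respect to $H_2$. For this, note that in this case $a = (1, 0)$ and $d = 0$. Also, $\tau_2(\bar s) = (-1, -2, \sqrt{2})$ and so $\lambda = \frac{1}{\sqrt{5}}(-1, -2)$. As $a^T\lambda\,|y| + d\,y < 0$ for every nonzero $y \in \mathbb{R}$, we have that $r(y) = 0$ and $\phi_\lambda(y) = |y|$. By Theorem 5.46,
\[ C_2 = \{(x_1, x_2, y) \in \mathbb{R}^3 : |y| \le \lambda^T x\} \]
is a maximal $S_{\le 0}$-free set with respect to $H_2$. Therefore, $\tau_2^{-1}(C_2)$ is maximal $\mathcal{Q}$-free. In Figure 5.9 we show the set $\mathcal{Q}$ and both maximal $\mathcal{Q}$-free sets given by $\tau_1^{-1}(C_1)$ and $\tau_2^{-1}(C_2)$. Note that in this case, the set $\tau_2^{-1}(C_2)$ does not have an asymptote, and both its facets have an exposing point.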
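The two transformations of Example 5.47 can be checked through two polynomial identities. The sketch below assumes that the third coordinate of $\tau_1$ is $s_1 - s_2 - \sqrt{2}$, so that the hyperplane equation $-x_1 + x_2 + y = -\sqrt{2}$ holds, and that $T_1$ is the shift by $(\sqrt{2}, -\sqrt{2})$ followed by the matrix $\frac{1}{2}\begin{bmatrix}-1 & 1\\ 1 & 1\end{bmatrix}$. With $q$ denoting the quadratic that defines $\mathcal{Q}$, it verifies on a grid that $\|x\|^2 - y^2 = q(s)$ for $(x, y) = \tau_1(s)$ and that $2(1 - w_1^2 + w_2^2) = q(s)$ for $w = T_1(s)$.

```python
import math

SQ2 = math.sqrt(2.0)

def q(s1, s2):
    """Quadratic defining Q in Example 5.47 (Q is the set where q <= 0)."""
    return -2.0 + 2.0 * SQ2 * s1 - 2.0 * SQ2 * s2 + 2.0 * s1 * s2

def tau1(s1, s2):
    """Lift onto H1; third coordinate chosen so that -x1 + x2 + y = -sqrt(2)."""
    return (s1, s2, s1 - s2 - SQ2)

def T1(s1, s2):
    """Shift by (sqrt(2), -sqrt(2)) followed by the matrix (1/2)[[-1, 1], [1, 1]]."""
    u, v = s1 - SQ2, s2 + SQ2
    return (0.5 * (-u + v), 0.5 * (u + v))

def max_identity_error(points):
    """Largest deviation from the two identities over the given sample points."""
    err = 0.0
    for s1, s2 in points:
        x1, x2, y = tau1(s1, s2)
        w1, w2 = T1(s1, s2)
        err = max(err, abs((x1 * x1 + x2 * x2 - y * y) - q(s1, s2)))
        err = max(err, abs(2.0 * (1.0 - w1 * w1 + w2 * w2) - q(s1, s2)))
    return err

grid = [(-3.0 + 0.5 * i, -3.0 + 0.5 * j) for i in range(13) for j in range(13)]
print("largest deviation from the two identities:", max_identity_error(grid))
print("q(s_bar) =", q(-2.0, -2.0))
```

Both identities show that $q(s) \le 0$ is equivalent to the cone condition in the respective lifted space, and the positive value of $q$ at $\bar s = (-2, -2)$ confirms that $\bar s \notin \mathcal{Q}$.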

This example shows the important role of the transformation used to bring the quadratic set to the form $S$. The resulting maximal $\mathcal{Q}$-free set can change significantly. This opens an array of interesting questions regarding the role of transformations in our approach: Can we distinguish the transformations that generate $\mathcal{Q}$-free sets with asymptotes? Is there a benefit or downside to using the latter sets? These and other questions are left for future work.


5.7 Further Remarks and Generalizations

In this section we collect some further remarks and generalizations. We start by generalizing Theorem 5.16 to the case where $S$ is represented as the difference of two sublinear functions in independent variables. Then we generalize Proposition 5.21 to the case of several homogeneous linear inequalities. After this we show that we can use Proposition 5.21 to extend the work of Bienstock et al. (2016) by constructing further outer-product-free sets. We also present simpler proofs for some of the outer-product-free sets developed there. Finally, we present an example showing that there are more quadratic-free sets than the ones that we construct in this chapter.

5.7.1 Generalizing Theorem 5.16

We can generalize Theorem 5.16 to the case when $S$ can be written as the sublevel set of a difference of sublinear functions in independent variables.

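Theorem 5.48. Let $\sigma : \mathbb{R}^n \to \mathbb{R}$ be a sublinear function and let $\rho : \mathbb{R}^m \to \mathbb{R}$ be a sublinear function that is positive except at 0. Let

\[
S = \{(x, y) \in \mathbb{R}^n \times \mathbb{R}^m : \sigma(x) \le \rho(y)\}
\]

and let $\bar x \ne 0$ be such that there exists a $\bar y$ such that $(\bar x, \bar y) \notin S$. Let $\lambda \in \partial\sigma(\bar x)$ and

\[
C_\lambda = \{(x, y) \in \mathbb{R}^n \times \mathbb{R}^m : \lambda^T x \ge \rho(y)\}.
\]

Then $C_\lambda$ is maximal $S$-free.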

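Proof. First note that $\sigma(\bar x) > 0$ since otherwise, due to the positivity of $\rho$, $(\bar x, y) \in S$ for any $y \in \mathbb{R}^m$. Therefore, $0 \notin \partial\sigma(\bar x)$, in particular $\lambda \ne 0$, and we can assume without loss of generality that $\|\lambda\| = 1$.

We are going to prove maximality via Theorem 5.6. For this notice that

\[
C_\lambda = \{(x, y) \in \mathbb{R}^n \times \mathbb{R}^m : \lambda^T x \ge \beta^T y \text{ for all } \beta \in \exp \partial\rho(0)\}.
\]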

Thus, we just need to show that every inequality is exposed by a point of $S \cap C_\lambda$. We show this now. Let $\beta \in \exp \partial\rho(0)$, let $y_0$ be such that it exposes $\beta$, and let $x_0 = \frac{\rho(y_0)}{\sigma(\bar x)}\bar x$. By Proposition 5.53 (see Section 5.10), $\lambda^T \bar x = \sigma(\bar x)$, which implies that $\lambda^T x_0 = \rho(y_0)$. Then, Lemma 5.15 implies that $(x_0, y_0)$ exposes $\lambda^T x \ge \beta^T y$.

We need to show that $(x_0, y_0) \in S \cap C_\lambda$. As we saw, $\lambda^T x_0 = \rho(y_0)$, which implies that $(x_0, y_0) \in C_\lambda$. Finally,

\[
\sigma(x_0) = \sigma\left(\frac{\rho(y_0)}{\sigma(\bar x)}\bar x\right) = \rho(y_0)
\]

implies that $(x_0, y_0) \in S$. Notice that the second equality holds because $\rho(y_0) > 0$ and $\sigma(\bar x) > 0$.
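The construction in Theorem 5.48 can be sanity-checked numerically. The following is a minimal sketch under assumed data that is not taken from the text: $\sigma$ is the Euclidean norm on $\mathbb{R}^2$, $\rho$ is the $\ell_1$ norm on $\mathbb{R}^2$, and $\bar x = (3, 4)$. It samples points to look for elements of $S$ in the interior of $C_\lambda$ and builds the point $x_0$ used in the proof.

```python
import math
import random

def sigma(x):
    # Assumed example: Euclidean norm on R^2 (sublinear).
    return math.hypot(x[0], x[1])

def rho(y):
    # Assumed example: l1 norm on R^2 (sublinear, positive except at 0).
    return abs(y[0]) + abs(y[1])

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

x_bar = (3.0, 4.0)  # sigma(x_bar) = 5 > rho(0) = 0, so (x_bar, 0) is not in S
# sigma is differentiable at x_bar, so its only subgradient is x_bar / ||x_bar||.
lam = (x_bar[0] / sigma(x_bar), x_bar[1] / sigma(x_bar))

def in_S(x, y):
    return sigma(x) <= rho(y)

def in_interior_C(x, y):
    return dot(lam, x) > rho(y)

def exposing_x(y0):
    # x0 = (rho(y0) / sigma(x_bar)) * x_bar, as in the proof.
    t = rho(y0) / sigma(x_bar)
    return (t * x_bar[0], t * x_bar[1])

random.seed(0)
violations = 0
for _ in range(20000):
    x = (random.uniform(-2, 2), random.uniform(-2, 2))
    y = (random.uniform(-2, 2), random.uniform(-2, 2))
    if in_interior_C(x, y) and in_S(x, y):
        violations += 1  # would contradict S-freeness

y0 = (1.0, 2.0)
x0 = exposing_x(y0)
print(violations, sigma(x0), rho(y0), dot(lam, x0))
```

Sampling cannot prove $S$-freeness; it only fails to find a counterexample. The last line illustrates that $(x_0, y_0)$ lies on the boundary of both $S$ and $C_\lambda$.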

The next example shows that the positivity of $\rho$ is necessary.

Example 5.49.

Consider

(

x, y

) =

∥

(

x, y

)

∥

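and $\rho(z) = 2z + |z|$. Both functions are positively homogeneous. Let $S = \{(x, y, z) \in \mathbb{R}^3 : \sigma(x, y) \le \rho(z)\}$ and $\bar x = (1, 1)$. Then, $\nabla\sigma(\bar x) = \left(1 + \frac{1}{\sqrt{2}}\right)\bar x$ and

\[
C_\lambda = \left\{(x, y, z) \in \mathbb{R}^3 : \left(1 + \tfrac{1}{\sqrt{2}}\right)(x + y) \ge 2z + |z|\right\}.
\]

We now show that $C_\lambda$ is not maximal $S$-free, see also Figure 5.10.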

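Consider

\[
K = \left\{(x, y, z) \in \mathbb{R}^3 : \left(1 + \tfrac{1}{\sqrt{2}}\right)(x + y) \ge 3z\right\}.
\]

Since $z \le |z|$, then $C_\lambda \subsetneq K$. Furthermore, $K$ is $S$-free. Indeed, given that $\sigma(x, y) \ge 0$, then any element of $S$ satisfies $z \ge 0$. Thus if $(x, y, z)$ is in $S$ and the interior of $K$, it must satisfy $z \ge 0$ and

\[
\sigma(x, y) \le 3z < \left(1 + \tfrac{1}{\sqrt{2}}\right)(x + y).
\]

This is impossible since $\left(1 + \tfrac{1}{\sqrt{2}}\right)(x + y) = \nabla\sigma(\bar x)^T (x, y) \le \sigma(x, y)$ for every $x, y$.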
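The argument of Example 5.49 can be illustrated numerically. The sketch below is not the author's computation. It assumes $\sigma(x, y) = |x| + |y| + \|(x, y)\|$, which is one sublinear, nonnegative function whose gradient at $(1, 1)$ is $(1 + \frac{1}{\sqrt{2}})(1, 1)$; the function in the example may differ. The sets $C_\lambda$ and $K$ and the function $\rho$ are as stated above.

```python
import math
import random

C0 = 1.0 + 1.0 / math.sqrt(2.0)

def sigma(x, y):
    # ASSUMPTION: a sublinear function with gradient C0 * (1, 1) at (1, 1).
    return abs(x) + abs(y) + math.hypot(x, y)

def rho(z):
    # Sublinear but negative for z < 0, so positivity fails.
    return 2.0 * z + abs(z)

def in_S(x, y, z):
    return sigma(x, y) <= rho(z)

def in_C(x, y, z):
    return C0 * (x + y) >= rho(z)

def in_K(x, y, z):
    return C0 * (x + y) >= 3.0 * z

def in_interior_K(x, y, z):
    return C0 * (x + y) > 3.0 * z

random.seed(1)
c_outside_k = 0    # sampled points of C_lambda that are not in K
s_inside_int_k = 0  # sampled points of S that are in the interior of K
for _ in range(50000):
    p = tuple(random.uniform(-3, 3) for _ in range(3))
    if in_C(*p) and not in_K(*p):
        c_outside_k += 1
    if in_S(*p) and in_interior_K(*p):
        s_inside_int_k += 1

# A point of K that is not in C_lambda, so the inclusion is strict.
witness = (-1.0, 0.0, -1.0)
print(c_outside_k, s_inside_int_k, in_K(*witness), in_C(*witness))
```

Under this assumed $\sigma$, no sampled point violates $C_\lambda \subseteq K$ or the $S$-freeness of $K$, and the witness shows that $K$ is strictly larger.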

The next example shows that it is necessary that each sublinear function

is in a different set of variables.

Example 5.50.

Consider

(

x, y

) =

and $\rho(x, y) = \|(x, y)\|$. Both functions are positively homogeneous. Let $S = \{(x, y) \in \mathbb{R}^2 : \sigma(x, y) \le \rho(x, y)\}$ and $(\bar x, \bar y) = (1, 1)$. We have that $(\bar x, \bar y) \notin S$ and $\nabla\sigma(\bar x, \bar y) = (2, 1)$. From Figure 5.11 it is easy to see that

\[
C = \{(x, y) \in \mathbb{R}^2 : 2x + y \ge \rho(x, y)\}
\]

is not maximal $S$-free.

Figure 5.10: The set $S$ (orange) and $C_\lambda$ (green) from Example 5.49.

Figure 5.11: The set $S$ (blue) and $C$ (orange) from Example 5.50.

5.7.2 Generalizing Proposition 5.21

Let $P = \{(x, y) : Ax + Dy \le 0\}$ and let $S_P = \{(x, y) \in P : \|x\| \le \|y\|\}$. Here we construct maximal $S_P$-free sets under some conditions on $P$. The construction generalizes Proposition 5.21.

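The construction follows basically the same steps, but there is one extra issue. Just like $G(\lambda)$, we define $G_P(\lambda) = \{\beta : \|\beta\| = 1,\ A\lambda + D\beta \le 0\}$. Also, just like $C_{G(\lambda)}$, we define

\[
C_{G_P(\lambda)} = \{(x, y) \in \mathbb{R}^{n+m} : -\lambda^T x + \beta^T y \le 0,\ \forall \beta \in G_P(\lambda)\}.
\]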

The extension of Proposition 5.19 presents the extra hypothesis needed. In the proof of Proposition 5.21 it was key to write the non-convex problem $\max_{\beta \in G(\lambda)} y^T\beta$ as the convex problem $\max\{y^T\beta : \|\beta\| \le 1,\ a^T\lambda + d^T\beta \le 0\}$, see (5.6). However, in general, the same does not work using $G_P(\lambda)$ instead of $G(\lambda)$. Indeed,

\[
\max\{y^T\beta : \|\beta\| \le 1,\ A\lambda + D\beta \le 0\} \tag{5.24}
\]

can have optimal solutions for which $\|\beta\| < 1$. This can never happen when we have a single inequality and $\beta \in \mathbb{R}^m$ with $m > 1$. To force that every optimal solution of (5.24) satisfies $\|\beta\| = 1$ we are going to ask that $P$ has no vertex of the form $(\lambda, \beta)$ with $\|\beta\| < 1$.

Alternatively, we could define $G_P(\lambda) = \{\beta : \|\beta\| \le 1,\ A\lambda + D\beta \le 0\}$ and $C_{G_P(\lambda)}$ with the new $G_P(\lambda)$. However, it would not be clear whether there is a point in $S_P$ exposing an inequality with $\|\beta\| < 1$. Indeed, this need not happen.

This can be seen from modifying Example 5.25. Consider

= (

−

4) instead

= (

−

4). The modification discussed here yields

CG(λ)

{

(

x, y

) :

λTx

y≥

, λTx−4

25 y≥

}

. The second inequality comes from

However, with the new $a$, $\{(x, y) : \lambda^T x + y \ge 0\}$ is already maximal.

Finally, we need to generalize the condition $\|d\| \ge \|a\|$. This generalizes to the condition that $DD^T - AA^T$ is copositive. All together, we have the following.

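Proposition 5.51. Let $(\bar x, \bar y) \in P \setminus S_P$ and $\lambda = \frac{\bar x}{\|\bar x\|}$. Assume that $P$ has no vertex $(\lambda, \beta)$ with $\|\beta\| < 1$. If $DD^T - AA^T$ is copositive, then

\[
C_{G_P(\lambda)} = \{(x, y) \in \mathbb{R}^{n+m} : -\lambda^T x + \beta^T y \le 0,\ \forall \beta \in G_P(\lambda)\}
\]

is maximal $S_P$-free.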

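Proof. As $P$ has no vertex $(\lambda, \beta)$ with $\|\beta\| < 1$, we have that $G_P(\lambda) \ne \emptyset$ if and only if $\{\beta : \|\beta\| \le 1,\ A\lambda + D\beta \le 0\} \ne \emptyset$. Since $A\bar x + D\bar y \le 0$ and $\|\bar x\| > \|\bar y\|$, it holds that $A\lambda + D\frac{\bar y}{\|\bar x\|} \le 0$ and $\frac{\|\bar y\|}{\|\bar x\|} < 1$. In other words, $\frac{\bar y}{\|\bar x\|} \in \{\beta : \|\beta\| \le 1,\ A\lambda + D\beta \le 0\}$. Note that $\frac{\bar y}{\|\bar x\|}$ is a Slater point, see Section 1.3.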

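To show that $C_{G_P(\lambda)}$ is $S_P$-free, it is enough to show that

\[
\max_{\beta \in G_P(\lambda)} y^T\beta \ge \lambda^T x \quad \text{for every } (x, y) \in S_P.
\]

Since $P$ has no vertex $(\lambda, \beta)$ with $\|\beta\| < 1$, the maximum above is equivalent to (5.24). By strong duality (5.24) is equal to

\[
\min_{\theta \ge 0} \|y - D^T\theta\| - \lambda^T A^T\theta.
\]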

Now we just need to show that for any $\theta \ge 0$ and every $(x, y) \in S_P$, the expression $-\lambda^T x + \|y - D^T\theta\| - \lambda^T A^T\theta$ is non-negative. We will now prove that $\lambda^T(x + A^T\theta) \le \|y - D^T\theta\|$, which implies the freeness.

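By Cauchy-Schwarz and $\|\lambda\| = 1$, we have that $\lambda^T(x + A^T\theta) \le \|x + A^T\theta\|$. Furthermore, $\|x + A^T\theta\|^2 = \|x\|^2 + 2\theta^T A x + \|A^T\theta\|^2$. Since $\theta \ge 0$ and $Ax + Dy \le 0$, we have $\theta^T A x \le -\theta^T D y$. In addition, $\|x\|^2 \le \|y\|^2$. Thus,

\[
\begin{aligned}
\|x + A^T\theta\|^2 &\le \|y\|^2 - 2\theta^T D y + \|A^T\theta\|^2 \\
&= \|y - D^T\theta\|^2 + \|A^T\theta\|^2 - \|D^T\theta\|^2 \\
&\le \|y - D^T\theta\|^2,
\end{aligned}
\]

where the last inequality is due to the copositivity of $DD^T - AA^T$.

We have shown that $\|x + A^T\theta\| \le \|y - D^T\theta\|$. Hence, $\lambda^T(x + A^T\theta) \le \|y - D^T\theta\|$ and we conclude that $C_{G_P(\lambda)}$ is $S_P$-free.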
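The key inequality of the freeness argument, $\|x + A^T\theta\| \le \|y - D^T\theta\|$ for $(x, y) \in S_P$ and $\theta \ge 0$, can be checked on sampled data. The sketch below uses hypothetical matrices that are not from the text: $D$ is the identity and $A$ is half a rotation, so $DD^T - AA^T = 0.75\,I$ is positive semidefinite and therefore copositive.

```python
import math
import random

# Hypothetical data: D D^T - A A^T = 0.75 * I, which is copositive.
A = [[0.3, 0.4], [-0.4, 0.3]]
D = [[1.0, 0.0], [0.0, 1.0]]

def mat_vec(M, v):
    """Return M v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def mat_t_vec(M, v):
    """Return M^T v."""
    return [sum(M[i][j] * v[i] for i in range(len(M))) for j in range(len(M[0]))]

def norm(v):
    return math.sqrt(sum(t * t for t in v))

def in_S_P(x, y):
    """Membership in S_P = {(x, y) : Ax + Dy <= 0, ||x|| <= ||y||}."""
    lhs = [s + t for s, t in zip(mat_vec(A, x), mat_vec(D, y))]
    return all(t <= 0.0 for t in lhs) and norm(x) <= norm(y)

random.seed(0)
checked = 0
max_gap = -math.inf  # largest observed ||x + A^T theta|| - ||y - D^T theta||
for _ in range(20000):
    x = [random.uniform(-1, 1) for _ in range(2)]
    y = [random.uniform(-1, 1) for _ in range(2)]
    if not in_S_P(x, y):
        continue  # rejection sampling: keep only points of S_P
    theta = [random.uniform(0, 3) for _ in range(2)]
    left = norm([s + t for s, t in zip(x, mat_t_vec(A, theta))])
    right = norm([s - t for s, t in zip(y, mat_t_vec(D, theta))])
    max_gap = max(max_gap, left - right)
    checked += 1

print(checked, max_gap)
```

A non-positive `max_gap` (up to rounding) is what the chain of inequalities predicts for every sampled point.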

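Finally, we have to prove maximality. Suppose there exists an $S_P$-free set $C$ such that $C_{G_P(\lambda)} \subsetneq C$. This implies that there must exist $\beta_0 \in G_P(\lambda)$ such that $-\lambda^T x + \beta_0^T y \le 0$ is not valid for $C$. As $C_\lambda \subseteq C_{G_P(\lambda)}$, $-\lambda^T x + \beta_0^T y \le 0$ is valid for $C_\lambda$.

As we saw in Theorem 5.16, $(\lambda, \beta_0) \in C_\lambda$ exposes $-\lambda^T x + \beta_0^T y \le 0$, and since $C_\lambda \subseteq C$, Theorem 5.5 implies that $(\lambda, \beta_0) \in \operatorname{int}(C)$. However, since $\beta_0 \in G_P(\lambda)$, we have that $(\lambda, \beta_0) \in S_P$. This contradicts the $S_P$-freeness of $C$.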

5.7.3 Extensions to the Work of Bienstock et al. (2016)

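Bienstock et al. (2019) construct maximal $S$-free sets for $S = \{X \in \mathcal{S}^n_+ : \operatorname{rk}(X) = 1\}$. They show that

\[
C_{ijkl} = \{X \in \mathcal{S}^n_+ : \lambda_1(x_{ij} + x_{lk}) + \lambda_2(x_{ik} - x_{lj}) \ge \|(x_{ik} + x_{lj},\ x_{ij} - x_{lk})\|\}
\]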

is maximal $S$-free under some conditions on $\lambda_1$ and $\lambda_2$ depending on $i, j, k, l$, see Bienstock et al. (2019, Theorem 4). In other words, $C_{ijkl}$ consists of the matrices for which the entries of a given $2 \times 2$ submatrix satisfy the condition above. To simplify

notation we will denote the entries of the submatrix by

(︃a b

c d)︃

and

Cijkl

. For example, if the submatrix is taken from the columns

i, j

and rows

k, l

such that

, that is,

is in the diagonal, then

λ1

= 1

, λ2

= 0 yields a

maximal $S$-free set according to Bienstock et al. (2019, Theorem 4). Or, if none of $a, b, c, d$ corresponds to an entry in the diagonal, then any $(\lambda_1, \lambda_2) \in D_1(0)$ yields a maximal $S$-free set.

This last result can be deduced as follows. By using the projection theorem (Theorem 5.12) we can reduce finding maximal $S$-free sets to finding maximal $S_0$-free sets, where

\[
S_0 = \{(a, b, c, d) \in \mathbb{R}^4 : ad = bc\}.
\]

The set

\[
S_1 = \{(a, b, c, d) \in \mathbb{R}^4 : ad \le bc\}
\]

is a non-convex $S_0$-free set. Using the eigenvalue decomposition we obtain $C_{ijkl}$ as a maximal $S_1$-free set. Theorem 5.16 tells us that $C_{ijkl}$ is going to be maximal for any $(\lambda_1, \lambda_2) \in D_1(0)$.
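The step "using the eigenvalue decomposition" amounts to writing $ad \le bc$ as a comparison of two norms, which is the form required by Theorem 5.16. The following sketch checks, on random integer data (so that all arithmetic is exact), the identity $(a+d)^2 + (b-c)^2 - (a-d)^2 - (b+c)^2 = 4(ad - bc)$ and the resulting equivalence $ad \le bc \iff \|(a+d,\ b-c)\| \le \|(a-d,\ b+c)\|$. The function names are illustrative only.

```python
import random

def lhs_sq(a, b, c, d):
    """Squared norm of (a + d, b - c)."""
    return (a + d) ** 2 + (b - c) ** 2

def rhs_sq(a, b, c, d):
    """Squared norm of (a - d, b + c)."""
    return (a - d) ** 2 + (b + c) ** 2

def in_S1(a, b, c, d):
    """Membership in S_1 = {ad <= bc}."""
    return a * d <= b * c

random.seed(3)
identity_failures = 0
equivalence_failures = 0
for _ in range(10000):
    a, b, c, d = (random.randint(-10, 10) for _ in range(4))
    if lhs_sq(a, b, c, d) - rhs_sq(a, b, c, d) != 4 * (a * d - b * c):
        identity_failures += 1
    if in_S1(a, b, c, d) != (lhs_sq(a, b, c, d) <= rhs_sq(a, b, c, d)):
        equivalence_failures += 1

print(identity_failures, equivalence_failures)
```

With $x = (a + d,\ b - c)$ and $y = (a - d,\ b + c)$, the set $S_1$ reads $\{\|x\| \le \|y\|\}$, and $\lambda^T x \ge \|y\|$ is the inequality defining $C_{ijkl}$ with $a = x_{ij}$, $b = x_{ik}$, $c = x_{lj}$, $d = x_{lk}$.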

The difficulty when some entries belong to the diagonal is that if $X \in S$ then its diagonal entries are non-negative. Thus, if, say, $b$ is in the diagonal, then $S_0 = \{(a, b, c, d) \in \mathbb{R}^4 : ad = bc,\ b \ge 0\}$. Thus, $S_1 = \{(a, b, c, d) \in \mathbb{R}^4 : ad \le bc,\ b \ge 0\}$ and we can use the techniques from Section 5.4 to construct maximal $S_1$-free sets.

5.7.4 There Are More Quadratic-Free Sets

It is an interesting question whether every quadratic-free set can be obtained via the construction presented in this chapter. In this section we show that the answer is no. Even for the homogeneous case we can find $S^h$-free sets that are not given by our construction.

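The $S^h$-free sets $C_\lambda$ have the following property.

Proposition 5.52. If $(x, y) \in S^h \cap C_\lambda \setminus \{(0, 0)\}$, then $\frac{x}{\|x\|} = \lambda$.

Proof. If $(x, y) \in C_\lambda$, then $\lambda^T x \ge \|y\|$. If $(x, y) \in S^h$, then $\|y\| \ge \|x\|$. By Cauchy-Schwarz, $\|x\| \ge \lambda^T x$. Altogether, these imply that $\|x\| = \lambda^T x$, which implies that $\frac{x}{\|x\|} = \lambda$.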

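Consider now $S = \{(x_1, x_2, y_1, y_2) \in \mathbb{R}^4 : \|x\| \le \|y\|\}$ and let

\[
C = \operatorname{conv}\{(1, 0, 1, 0),\ (0, 1, 0, 1),\ (1, 1, 0, 0),\ (1, 1, -1, 0),\ (0, 0, 0, 0)\}.
\]

$C$ is full dimensional and $S$-free. To see this, note that the points in the interior of $C$ are of the form $(\lambda_1 + \lambda_3 + \lambda_4,\ \lambda_2 + \lambda_3 + \lambda_4,\ \lambda_1 - \lambda_4,\ \lambda_2)$ for which $\lambda_1 + \dots + \lambda_5 = 1$ and $\lambda_i > 0$ for $i = 1, \dots, 5$. For such a point to be in $S$, it must satisfy

\[
(\lambda_1 + \lambda_3 + \lambda_4)^2 + (\lambda_2 + \lambda_3 + \lambda_4)^2 \le (\lambda_1 - \lambda_4)^2 + \lambda_2^2.
\]

But subtracting the right-hand side and factorizing, this is the same as

\[
(2\lambda_1 + \lambda_3)(\lambda_3 + 2\lambda_4) + (2\lambda_2 + \lambda_3 + \lambda_4)(\lambda_3 + \lambda_4) \le 0.
\]

No $\lambda_i > 0$ satisfy the above inequality.

Notice that $(1, 0, 1, 0), (0, 1, 0, 1) \in S \cap C$, but the property in Proposition 5.52 does not hold since $(1, 0) \ne (0, 1)$. Therefore, $C$ can be extended to a maximal $S$-free set that differs from $C_\lambda$ for every $\lambda \in D_1(0)$.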
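The $S$-freeness of $C$ can be illustrated by sampling strictly positive convex combinations of the five vertices. The sketch below evaluates $\|x\|^2 - \|y\|^2$ at each sampled interior point and compares it with the factorized expression above; a point belongs to $S$ exactly when this quantity is non-positive. The helper names are illustrative only.

```python
import math
import random

VERTICES = [(1, 0, 1, 0), (0, 1, 0, 1), (1, 1, 0, 0), (1, 1, -1, 0), (0, 0, 0, 0)]

def combination(weights):
    """Convex combination of the vertices of C with the given weights."""
    return tuple(sum(w * v[k] for w, v in zip(weights, VERTICES)) for k in range(4))

def slack(point):
    """||x||^2 - ||y||^2; the point lies in S iff this is <= 0."""
    x1, x2, y1, y2 = point
    return x1 * x1 + x2 * x2 - y1 * y1 - y2 * y2

def factorized(weights):
    """The factorized form of the slack from the text."""
    l1, l2, l3, l4, _ = weights
    return (2 * l1 + l3) * (l3 + 2 * l4) + (2 * l2 + l3 + l4) * (l3 + l4)

random.seed(2)
min_slack = math.inf
max_error = 0.0
for _ in range(10000):
    raw = [random.uniform(0.01, 1.0) for _ in VERTICES]
    total = sum(raw)
    weights = [r / total for r in raw]  # strictly positive, sum to 1
    value = slack(combination(weights))
    min_slack = min(min_slack, value)
    max_error = max(max_error, abs(value - factorized(weights)))

print(min_slack, max_error)
```

A strictly positive `min_slack` means that no sampled interior point of $C$ lies in $S$, in agreement with the factorization.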

5.8 Computational Experiments

These cuts, among others, are studied computationally in the Master's thesis of Antonia Chmiela (Chmiela, 2020). In her work, a transformation similar to $\tau_2$ of Section 5.6 is used to transform a general quadratic into the form needed to construct the quadratic-free set. Specifically, the idea is to write the quadratic part of a quadratic function as a difference of convex functions (d.c.) using the eigenvalues and eigenvectors, and then to homogenize it.

Two experiments are performed in Chmiela (2020) using the MINLP solver SCIP (Gamrath et al., 2020; Vigerske and Gleixner, 2018; Achterberg, 2009). The first one consists of testing how much gap can be closed in the root node when as many cuts as possible are added. This means the following. SCIP creates an initial linear relaxation of the optimization problem at hand. After solving this relaxation we obtain a first lower bound $d_1$. Then, the root node is processed by tightening bounds, adding cutting planes, resolving the LP relaxation, etc. (for more details consult Achterberg (2009)). Just before branching starts, a last lower bound $d_2$ is obtained. The gap closed is then $\frac{d_2 - d_1}{p - d_1}$, where $p$ is the value of the optimal solution. Note that this measure only makes sense when $p \ne d_1$; thus, in particular, feasibility problems are not considered.
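As a small illustration of the gap-closed measure, the following sketch computes $\frac{d_2 - d_1}{p - d_1}$ and returns `None` when the measure is undefined. The function name and the sample numbers are hypothetical and are not taken from the experiments.

```python
from typing import Optional

def gap_closed(d1: float, d2: float, p: float) -> Optional[float]:
    """Fraction of the root gap closed between the first bound d1 and the last bound d2.

    p is the optimal value. The measure is undefined when p == d1.
    """
    if p == d1:
        return None
    return (d2 - d1) / (p - d1)

# Hypothetical instance: first bound 10, last root bound 14, optimum 20.
example = gap_closed(10.0, 14.0, 20.0)
print(example)
```

In this made-up instance the root node closes 40% of the gap between the first relaxation bound and the optimum.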

The second experiment consists of an assessment of the performance of

SCIP with the cuts included. That is, how much faster (or slower) SCIP is

when the intersection cuts are used.

                          |           max            |         default          |  relative
subset        instances   | solved    time    nodes  | solved    time    nodes  | time  nodes
clean              2689   |   1625    97.73    1647  |   1619    99.83    1691  | 1.02   1.03
affected           1188   |    716   138.72    3790  |    710   145.58    4030  | 1.04   1.06
[0,3600]           1652   |   1625     9.46     342  |   1619     9.81     362  | 1.04   1.06
[1,3600]            965   |    938    40.19    1112  |    932    42.64    1198  | 1.06   1.08
[10,3600]           650   |    623   135.17    2533  |    617   146.12    2798  | 1.08   1.10
[100,3600]          359   |    332   462.07    8278  |    326   516.28    9558  | 1.12   1.16
[1000,3600]         135   |    108  1226.30   31104  |    102  1493.00   40748  | 1.22   1.31
all-optimal        1598   |   1598     7.91     271  |   1598     8.17     284  | 1.03   1.05
diff-timeout         54   |     27  1202.26       -  |     21  1418.06       -  | 1.18      -

Table 5.1: Comparison of running time (in seconds) and number of nodes when using SCIP with the settings max and default, respectively. The columns "relative" denote the corresponding relative shifted geometric mean of the results obtained by default with respect to the results of max.
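For readers unfamiliar with the aggregate used in Table 5.1, the following sketch computes a shifted geometric mean, $\left(\prod_i (v_i + s)\right)^{1/n} - s$, and the ratio of two such means. The shift value and the sample timings are assumptions for illustration; the text does not state the shifts used in Chmiela (2020).

```python
import math
from typing import Sequence

def shifted_geometric_mean(values: Sequence[float], shift: float) -> float:
    """Return (prod_i (v_i + shift))^(1/n) - shift, computed via logarithms."""
    log_sum = sum(math.log(v + shift) for v in values)
    return math.exp(log_sum / len(values)) - shift

def relative(candidate: Sequence[float], reference: Sequence[float], shift: float) -> float:
    """Ratio of the shifted geometric means of candidate and reference."""
    return shifted_geometric_mean(candidate, shift) / shifted_geometric_mean(reference, shift)

# Hypothetical running times in seconds for three instances.
times_max = [2.0, 30.0, 400.0]
times_default = [2.5, 33.0, 520.0]
ratio = relative(times_default, times_max, shift=1.0)  # shift of 1 is an assumption
print(ratio)
```

A ratio above 1 means that the reference setting is faster on aggregate. The shift dampens the influence of very small values.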

The results of the first experiment are as follows. Out of 690 instances from the MINLPLIB for which there was a difference in the gap closed, 512 closed more gap. On average, 3% more gap can be closed in the root node. However, solvers do not add as many cuts as they can and at some point they decide to start branching. Thus, although this result is positive, the empirical performance still needs to be assessed.

For the second experiment, we reproduce Table 4.5 from Chmiela (2020) as Table 5.1. We can observe that, as expected, fewer nodes are needed. Three reasons why a slowdown might be expected from intersection cuts are the following. First, to compute them, one needs access to the LP tableau, which is not a cheap operation if performed often. Second, intersection cuts are generally dense, which might render the LP harder to solve. Third, the numerics of these cuts can be really bad, in the sense that they might have both large and small coefficients, again making the LP harder to solve. Despite all this, we do see a speed-up in the solving time. For example, on the hard instances, that is, those for which SCIP with or without the cuts took at least 1000 seconds to solve, we see a 20% speed-up. When considering all instances that did not fail, the speed-up is a modest 2%. These results were obtained by adding at most 20 intersection cuts only in the root node. Although this might sound like a small number, the performance is sensitive to this limit. Experiments performed by the author, adding additionally at most 2 cuts per node in the tree, led to a 10% slowdown.

These results show that there is potential in this type of cuts for nonlinear

problems. For more details, we refer the reader to Chmiela (2020).


5.9 Summary and Future Work

In this chapter we have shown how to construct maximal quadratic-free sets,

i.e., convex sets whose interior does not intersect the sublevel set of a quadratic

function. Using the long-studied intersection cut framework, these sets can

be used in order to generate deep cutting planes for quadratically constrained

problems. We strongly believe that, by carefully laying a theoretical frame-

work for quadratic-free sets, this chapter provides an important contribution

to the understanding and future computational development of non-convex

quadratically constrained optimization problems.

The maximal quadratic-free sets we construct in this chapter allow for an efficient computation of the corresponding intersection cuts. Computing such cutting planes amounts to solving a simple one-dimensional convex optimization problem using the quadratic-free sets we show here. Moreover, even if in our constructions and maximality proofs we use semi-infinite outer-descriptions of $S$-free sets such as (5.17), all of them have closed-form expressions that are more adequate for computational purposes: see

(5.2)

(5.10)

(5.15)

(5.23)

for these expressions for the sets

Cλ

CG(λ)

Cϕλ

and

, respec-

tively, and

(5.12)

for the explicit description of the

ϕλ

function. This ensures

efficient separation in LP-based methods for quadratically constrained opti-

mization problems.

The empirical performance of these intersection cuts is promising. The development of a cut strengthening procedure is likely to be important for obtaining an even better empirical performance. Other important open questions involve a better understanding of the role that different transformations of quadratic inequalities have (Section 5.6), a theoretical and empirical comparison with the method proposed by Bienstock et al. (2016, 2019), and devising new methods for producing other families of quadratic-free sets. All this is the subject of ongoing work.


5.10 Missing Proofs

The following is a useful identity that sublinear functions satisfy. For positively homogeneous and differentiable functions the result is implied by the well-known Euler homogeneous function theorem.

Proposition 5.53. If $\phi : \mathbb{R}^n \to \mathbb{R}$ is sublinear, then $\phi(x) = \beta^T x$ for every $\beta \in \partial\phi(x)$.

Proof. Let $\beta \in \partial\phi(x)$. It holds that $\phi(x) + \beta^T(y - x) \le \phi(y)$ for every $y \in \mathbb{R}^n$. Taking $y = 2x$ and $y = \frac{1}{2}x$, we conclude that $\phi(x) = \beta^T x$.

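Lemma 5.15. Let $\phi : \mathbb{R}^m \to \mathbb{R}$ be a sublinear function, $\lambda \in D_1(0)$, and let

\[
C = \{(x, y) : \phi(y) \le \lambda^T x\}.
\]

If $(\bar x, \bar y) \in C$ is such that $\phi$ is differentiable at $\bar y$ and $\phi(\bar y) = \lambda^T \bar x$, then $(\bar x, \bar y)$ exposes the valid inequality $-\lambda^T x + \nabla\phi(\bar y)^T y \le 0$.

In particular, if $\beta_0 \in \partial\phi(0)$ is an exposed point of $\partial\phi(0)$, exposed by $\bar y$, and $\phi(\bar y) = \lambda^T \bar x$, then $(\bar x, \bar y)$ exposes the valid inequality $-\lambda^T x + \beta_0^T y \le 0$.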

Proof. We need to verify both conditions of Definition 5.2. As $\phi$ is positively homogeneous and differentiable at $\bar y$, then $\phi(\bar y) = \nabla\phi(\bar y)^T \bar y$. Thus, evaluating $-\lambda^T x + \nabla\phi(\bar y)^T y$ at $(\bar x, \bar y)$ yields $-\lambda^T \bar x + \phi(\bar y)$, which is 0 by hypothesis. This shows that the inequality is tight at $(\bar x, \bar y)$.

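Now, let $\alpha^T x + \gamma^T y \le \delta$ be a non-trivial valid inequality tight at $(\bar x, \bar y)$. Then, $\delta = \alpha^T \bar x + \gamma^T \bar y$ and we can rewrite the inequality as $\alpha^T(x - \bar x) + \gamma^T(y - \bar y) \le 0$. Notice that $(\phi(y)\lambda, y) \in C$, thus, $\alpha^T\lambda(\phi(y) - \phi(\bar y)) + \gamma^T(y - \bar y) \le 0$ for every $y \in \mathbb{R}^m$. Subtracting $\alpha^T\lambda\nabla\phi(\bar y)^T(y - \bar y)$ and dividing by $\|y - \bar y\|$ we obtain the equivalent expression

\[
\alpha^T\lambda\,\frac{\phi(y) - \phi(\bar y) - \nabla\phi(\bar y)^T(y - \bar y)}{\|y - \bar y\|} \le \left(-\gamma - \alpha^T\lambda\nabla\phi(\bar y)\right)^T \frac{y - \bar y}{\|y - \bar y\|}.
\]

Since $\phi$ is differentiable at $\bar y$, the limit when $y$ approaches $\bar y$ of the left-hand side of the above expression is 0. However, one can make the expression $\frac{y - \bar y}{\|y - \bar y\|}$ converge to any point of $D_1(0)$. Therefore,

\[
0 \le \left(-\gamma - \alpha^T\lambda\nabla\phi(\bar y)\right)^T \beta
\]

for every $\beta \in D_1(0)$. This implies that $\gamma = -\alpha^T\lambda\nabla\phi(\bar y)$. From here we see that $\alpha \ne 0$ as otherwise $\alpha = \gamma = 0$ and the inequality would be trivial.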

Given that any $(x, 0)$ such that $\lambda^T x = 0$ belongs to $C$, it follows that $\alpha$ is parallel to $\lambda$, i.e., there exists $\nu \in \mathbb{R}$ such that $\alpha = \nu\lambda$. Furthermore, $(\mu\lambda, 0) \in C$ for every $\mu \ge 0$ implies that $0 > \alpha^T\lambda = \nu$. Therefore, $\gamma = -\nu\nabla\phi(\bar y)$ and the inequality reads $\nu\lambda^T(x - \bar x) - \nu\nabla\phi(\bar y)^T(y - \bar y) \le 0$. Dividing by $|\nu|$ and using that $-\lambda^T x + \nabla\phi(\bar y)^T y \le 0$ is tight at $(\bar x, \bar y)$, we conclude that the inequality can be written as

\[
-\lambda^T x + \nabla\phi(\bar y)^T y \le 0.
\]

The second claim follows from the first part of the lemma and the fact that if $\beta_0$ is an exposed point of $\partial\phi(0)$ and $\bar y$ exposes it, then $\phi$ is differentiable at $\bar y$ and $\nabla\phi(\bar y) = \beta_0$. To show this last statement, it is enough to prove that $\partial\phi(\bar y) = \{\beta_0\}$, as then (Rockafellar, 1970, Theorem 25.1) implies that $\beta_0 = \nabla\phi(\bar y)$.

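We first show that $\beta_0 \in \partial\phi(\bar y)$. We have that $\phi(y) = \max\{\beta^T y : \beta \in \partial\phi(0)\}$. Since $\bar y$ exposes $\beta_0$, we have that $\phi(\bar y) = \beta_0^T \bar y$. Given that $\beta_0 \in \partial\phi(0)$, we have that $\beta_0^T y \le \phi(y)$. Thus, $\phi(\bar y) + \beta_0^T(y - \bar y) \le \phi(y)$ and we conclude that $\beta_0 \in \partial\phi(\bar y)$.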

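Now, let $\beta \in \partial\phi(\bar y)$. Then, $\phi(\bar y) + \beta^T(y - \bar y) \le \phi(y)$ for all $y$. Proposition 5.53 gives $\phi(\bar y) = \beta^T \bar y$, which implies that $\beta^T y \le \phi(y)$, and we conclude that $\beta \in \partial\phi(0)$. But $\bar y$ exposes $\beta_0$, which means that $\beta_0$ is the only solution to $\phi(\bar y) = \max\{\beta^T \bar y : \beta \in \partial\phi(0)\}$. As $\beta$ also attains this maximum, this implies that $\beta = \beta_0$. Hence, $\partial\phi(\bar y) = \{\beta_0\}$ as we wanted to show.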

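Proposition 5.54. Let $a, \lambda \in D_1(0)$, $\lambda \ne \pm a$, and let $d \in \mathbb{R}^m$ be such that $\|d\| \le 1$. The (Lagrangian) dual problem of

\[
\max_x \{\lambda^T x : \|x\| \le \|y\|,\ a^T x + d^T y \le 0\} \tag{5.25}
\]

is

\[
\inf_\theta \{\|\lambda - \theta a\|\,\|y\| - \theta d^T y : \theta \ge 0\}. \tag{5.26}
\]

The optimal solution to (5.25) is $x : \mathbb{R}^m \to \mathbb{R}^n$,

\[
x(y) =
\begin{cases}
\lambda\|y\|, & \text{if } \lambda^T a\|y\| + d^T y \le 0, \\[4pt]
\sqrt{\dfrac{\|y\|^2 - (d^T y)^2}{1 - (\lambda^T a)^2}}\,\lambda - \left(d^T y + \lambda^T a\sqrt{\dfrac{\|y\|^2 - (d^T y)^2}{1 - (\lambda^T a)^2}}\right) a, & \text{otherwise.}
\end{cases}
\tag{5.27}
\]

The optimal dual solution is $\theta : \mathbb{R}^m \to \mathbb{R}_+ \cup \{+\infty\}$,

\[
\theta(y) =
\begin{cases}
0, & \text{if } \lambda^T a\|y\| + d^T y \le 0, \\[4pt]
\lambda^T a + d^T y\,\dfrac{\sqrt{1 - (\lambda^T a)^2}}{\sqrt{\|y\|^2 - (d^T y)^2}}, & \text{otherwise.}
\end{cases}
\tag{5.28}
\]

Here, $\frac{r}{0} = +\infty$ and $r + (+\infty) = +\infty$ for every $r \in \mathbb{R}$. Moreover, strong duality holds, that is, (5.25) = (5.26), and

\[
(5.25) =
\begin{cases}
\|y\|, & \text{if } \lambda^T a\|y\| + d^T y \le 0, \\[4pt]
\sqrt{(\|y\|^2 - (d^T y)^2)(1 - (\lambda^T a)^2)} - d^T y\,\lambda^T a, & \text{otherwise.}
\end{cases}
\tag{5.29}
\]

Finally, (5.29) holds even if $\lambda = \pm a$.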

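Proof. First, note that since $\lambda \ne \pm a$ and $\|d\| \le 1$, $x(y)$ and $\theta(y)$ are defined for every $y \in \mathbb{R}^m$. Second, to make some of the calculations that follow more amenable, let $S(y) = \sqrt{\frac{\|y\|^2 - (d^T y)^2}{1 - (\lambda^T a)^2}}$.

The Lagrangian of (5.25) is $L : \mathbb{R}^n \times \mathbb{R}^2_+ \to \mathbb{R}$,

\[
L(x, \mu, \theta) = \lambda^T x - \mu(\|x\| - \|y\|) - \theta(a^T x + d^T y).
\]

Thus, the dual function is

\[
d(\mu, \theta) = \max_x L(x, \mu, \theta).
\]

We have that $d(\mu, \theta)$ is infinity whenever $\mu < \|\lambda - a\theta\|$, and $\mu\|y\| - \theta d^T y$ otherwise. Hence, the dual problem, $\min_{\theta, \mu \ge 0} d(\mu, \theta)$, is $\min\{\mu\|y\| - \theta d^T y : \theta \ge 0,\ \mu \ge \|\lambda - a\theta\|\}$, which is (5.26).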

Let us assume that $\lambda^T a\|y\| + d^T y \le 0$. Clearly, $x(y) = \lambda\|y\|$ is feasible for (5.25). Its objective value is $\|y\|$. On the other hand, $\theta(y) = 0$ is always feasible for (5.26). Its objective value is also $\|y\|$; therefore, $x(y)$ is the primal optimal solution and $\theta(y)$ the dual optimal solution.

Now let us consider the case $\lambda^T a\|y\| + d^T y > 0$. Let us check that $\theta(y)$ is dual feasible, that is, $\theta(y) \ge 0$. Note that, due to the positive homogeneity of $\theta(y)$ and the condition $\lambda^T a\|y\| + d^T y > 0$ with respect to $y$, we can assume without loss of generality that $\|y\| = 1$.

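Let $\alpha = \lambda^T a$ and $\beta = d^T y$. Since $\theta(y) = +\infty \ge 0$ when $y = d$ and $\|d\| = 1$, we can assume that $y \ne d$ when $\|d\| = 1$. Note that the same does not occur when $y = -d$ since we are assuming $\lambda^T a\|y\| + d^T y > 0$. Thus, $\alpha, \beta \in (-1, 1)$.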

We will prove that $\theta(y)\sqrt{1 - \beta^2} = \alpha\sqrt{1 - \beta^2} + \beta\sqrt{1 - \alpha^2} \ge 0$, which implies that $\theta(y) \ge 0$. If $\alpha, \beta \ge 0$, then we are done. As $\alpha + \beta > 0$, at least one of them must be positive. Let us assume $\alpha > 0$ and $\beta < 0$; the other case is analogous. Then, $\alpha > -\beta \ge 0$. This implies that $\alpha^2 > \beta^2$. Subtracting $\alpha^2\beta^2$, factorizing and taking square roots we obtain the desired inequality.

Let us compute the value of the dual solution $\theta(y)$. First, if $y = d$ and $\|d\| = 1$, then $\theta(y) = +\infty$, which means that the optimal value is

\[
\lim_{\theta \to +\infty} \|\lambda - \theta a\| - \theta = -\lambda^T a.
\]

One way of computing this limit is to multiply and divide the expression by $\|\lambda - \theta a\| + \theta$, expand, and simplify the numerator and denominator until one obtains something simple enough.

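Now assume $y \ne d$ or $\|d\| \ne 1$. Observe that $\|\lambda - \theta(y)a\|\,\|y\| - \theta(y)d^T y = \sqrt{\|\lambda - \theta(y)a\|^2}\,\|y\| - \theta(y)d^T y$. We have that

\[
\begin{aligned}
\|\lambda - \theta(y)a\|^2 &= 1 + \theta(y)(\theta(y) - 2\lambda^T a) \\
&= 1 + (\theta(y) - \lambda^T a + \lambda^T a)(\theta(y) - \lambda^T a - \lambda^T a) \\
&= 1 + (\theta(y) - \lambda^T a)^2 - (\lambda^T a)^2.
\end{aligned}
\]

Replacing $\theta(y)$, we obtain

\[
\begin{aligned}
\|\lambda - \theta(y)a\|^2 &= 1 + \frac{(d^T y)^2}{S^2(y)} - (\lambda^T a)^2 \\
&= \frac{1}{S^2(y)}\left(S^2(y)(1 - (\lambda^T a)^2) + (d^T y)^2\right) \\
&= \frac{\|y\|^2}{S^2(y)}.
\end{aligned}
\]

Therefore,

\[
\begin{aligned}
\|\lambda - \theta(y)a\|\,\|y\| - \theta(y)d^T y &= \frac{\|y\|^2}{S(y)} - d^T y\,\lambda^T a - \frac{(d^T y)^2}{S(y)} \\
&= \frac{\|y\|^2 - (d^T y)^2}{S(y)} - d^T y\,\lambda^T a \\
&= \sqrt{(\|y\|^2 - (d^T y)^2)(1 - (\lambda^T a)^2)} - d^T y\,\lambda^T a.
\end{aligned}
\]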

Let us now check the feasibility of $x(y)$. Let us first check that $\|x(y)\|^2 \le \|y\|^2$. We have

\[
\|x(y)\|^2 = S^2(y) - 2S(y)\left(d^T y + S(y)\lambda^T a\right)\lambda^T a + \left(d^T y + \lambda^T a\,S(y)\right)^2.
\]

Expanding and removing common terms yields $\|x(y)\|^2 = S^2(y)(1 - (\lambda^T a)^2) + (d^T y)^2 = \|y\|^2$. Thus, the first constraint is satisfied.

To check the second constraint just notice that, as $\|a\| = 1$, $a^T x(y) = -d^T y$.

The primal value of $x(y)$ is

\[
\lambda^T x(y) = S(y)(1 - (\lambda^T a)^2) - d^T y\,\lambda^T a = \sqrt{(\|y\|^2 - (d^T y)^2)(1 - (\lambda^T a)^2)} - d^T y\,\lambda^T a.
\]

As it coincides with the value of the dual solution, even when $y = d$ and $\|d\| = 1$, we conclude that both are optimal.

It only remains to check (5.29) for $\lambda = \pm a$. If $\lambda = -a$, then the linear constraint becomes $\lambda^T x \ge d^T y$ and the optimal solution is $\lambda\|y\|$. If $\lambda = a$, then the linear constraint becomes $\lambda^T x \le -d^T y$ and $-d^T y\,\lambda$ is then optimal. In both cases (5.29) holds.
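The closed form (5.29) can be compared against a brute-force solution of (5.25) in a small case. The sketch below takes $n = m = 2$ and relies on the observation that, for $n = 2$ and $y \ne 0$, the feasible region of (5.25) is a disc intersected with a half-plane, all of whose extreme points lie on the circle $\|x\| = \|y\|$, so a scan over the circle suffices. The test data and function names are hypothetical.

```python
import math

def closed_form(lam, a, d, y):
    """Right-hand side of (5.29); lam and a are unit vectors and ||d|| <= 1."""
    ny = math.hypot(y[0], y[1])
    la = lam[0] * a[0] + lam[1] * a[1]
    dy = d[0] * y[0] + d[1] * y[1]
    if la * ny + dy <= 0:
        return ny
    # max(..., 0.0) only guards against tiny negative values from rounding.
    return math.sqrt(max((ny * ny - dy * dy) * (1 - la * la), 0.0)) - dy * la

def brute_force(lam, a, d, y, steps=100000):
    """Approximate (5.25) for n = 2 by scanning feasible points with ||x|| = ||y||."""
    ny = math.hypot(y[0], y[1])
    dy = d[0] * y[0] + d[1] * y[1]
    best = -math.inf
    for k in range(steps):
        t = 2.0 * math.pi * k / steps
        x = (ny * math.cos(t), ny * math.sin(t))
        if a[0] * x[0] + a[1] * x[1] + dy <= 1e-9:  # linear constraint
            best = max(best, lam[0] * x[0] + lam[1] * x[1])
    return best

# Hypothetical test data: (lam, a, d, y).
cases = [
    ((1.0, 0.0), (0.0, 1.0), (0.5, 0.0), (1.0, 1.0)),    # linear constraint active
    ((1.0, 0.0), (0.6, 0.8), (0.0, -0.5), (0.0, 2.0)),   # linear constraint active
    ((1.0, 0.0), (0.0, 1.0), (-0.5, 0.0), (1.0, 1.0)),   # linear constraint inactive
]
errors = [abs(closed_form(*c) - brute_force(*c)) for c in cases]
print(errors)
```

The scan slightly underestimates the true maximum because of the angular grid, so the two values agree only up to the grid resolution.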

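Lemma 5.55. Consider the set

\[
S = \{(x, y) \in \mathbb{R}^{n+m} : \|x\| \le \|y\|,\ a^T x + d^T y = -1\}
\]

with $a, d$ such that $\|a\| > \|d\|$. Let $\lambda, \beta \in D_1(0)$ be two vectors satisfying $\lambda^T a + d^T\beta \ge 0$ and consider $C_{\phi_\lambda}$ defined in (5.15).

Then, the face of $C_{\phi_\lambda}$ defined by the valid inequality $-\lambda^T x + \nabla\phi_\lambda(\beta)^T y \le 0$ does not intersect $S$.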

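Proof. By contradiction, suppose that $(\bar x, \bar y) \in C_{\phi_\lambda}$ is such that

\[
(\bar x, \bar y) \in S \quad \text{and} \quad -\lambda^T \bar x + \nabla\phi_\lambda(\beta)^T \bar y = 0.
\]

The latter equality and the fact that $\phi_\lambda$ is sublinear imply $\phi_\lambda(\bar y) = \lambda^T \bar x$. Moreover, $\bar x$ is a feasible solution of the optimization problem $\phi_\lambda(\bar y)$, which implies it is an optimal solution.

By Lemma 5.15 we know $(\bar x, \bar y)$ exposes the valid inequality of $C_{\phi_\lambda}$ given by $-\lambda^T x + \nabla\phi_\lambda(\bar y)^T y \le 0$. By definition of exposing point this means

\[
\nabla\phi_\lambda(\bar y) = \nabla\phi_\lambda(\beta).
\]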

From

(5.21)

, since

is invertible, we can see that this implies

y¯

∥y¯∥

However, as

λTa

dTβ≥

0, the optimal solution of in the definition of

ϕλ

(

y¯

, must satisfy

aTx0

dTy¯

= 0. This contradicts

ϕλ

(

y¯

) =

λTx¯

, since

x¯

is an

optimal solution but aTx¯ + dTy¯ = −1.

Chapter 6

Conclusion

In this thesis, we have mostly studied and developed cutting planes techniques

for MINLP. The main contributions of the thesis, grouped by chapters, are

the following.

Chapter 1 The exposition of monoidal strengthening.

Chapter 2

The interpretation of Veinott’s Supporting Hyperplane algorithm

as a particular case of Kelley’s Cutting Plane algorithm. The extension

of Veinott’s algorithm to the case where the feasible region is represented

by, possibly, nonconvex and non-differentiable functions.

Chapter 3

The observation that the point we want to separate allows to

reduce the feasible region while still ensuring that every separating

hyperplane is valid. The formalization of the above observation using

the reverse polar and visible points. The characterization of visible points

for quadratic constraints.

Chapter 4

The framework for generating intersection cuts for factorable

MINLPs. The construction of a concave underestimator of a general

factorable function.

Chapter 5 The definition of a point exposing an inequality at infinity. The

construction of maximal quadratic-free sets.

We offer a final summary of the main ideas of the different chapters.

For Chapter 2, while trying to understand when gradient cuts of functions yield supporting hyperplanes, we observed that Veinott's algorithm naturally appeared. This led to the observation that Veinott's algorithm is just Kelley's algorithm in disguise. The disguise consists of replacing the constraints that represent the feasible region by the gauge of the feasible region. With this insight it is natural to consider extensions of this algorithm to cases where the constraint functions are neither convex nor differentiable.

Chapter 3 is based on a simple observation: if a cut separating a given point is invalid, then it must also separate a feasible point near that point. In other words, the point defines a subset of the feasible region such that, if a cut separates the point from this subset, then it separates the point from the whole feasible region. We capitalized on this observation by finding better bounds for the variables based on this subset. Then, cutting plane methods that exploit bounds can produce stronger cuts.

The main motivation for Chapter 4 came from the parallel between solving MILPs and MINLPs. Currently, a solver for MINLPs such as SCIP would not generate cuts for a violated constraint of the form $x(1 - x) \le 0$. However, if the constraint is written as $x \in \{0, 1\}$, then SCIP would try to generate, for example, Gomory cuts. Applying the same deduction of Gomory cuts to a general nonlinear constraint leads to a reformulation of the nonlinear constraint. The advantage of this reformulation is that the point to separate is now the vertex. This alone allows us to recover Gomory cuts for $x(1 - x) \le 0$, but only because $x(1 - x)$ is a concave function. Thus, it is natural to seek a concave underestimator in order to be able to deduce an intersection cut.

The question motivating Chapter 5 is natural, as maximal $S$-free sets yield the strongest intersection cuts.

Bibliography

The page numbers in brackets at the end of each citation refer to the text.

T. Achterberg. Constraint Integer Programming. PhD thesis, 2009. [134]

T. Achterberg and R. Wunderling. Mixed integer programming: Analyzing 12 years of progress. In Facets of Combinatorial Optimization, pages 449–481. Springer Berlin Heidelberg, 2013. doi: 10.1007/978-3-642-38189-8_18. [75]

K. Andersen, Q. Louveaux, R. Weismantel, and L. A. Wolsey. Inequalities from two

rows of a simplex tableau. In Integer Programming and Combinatorial Optimization,

pages 1–15. Springer Berlin Heidelberg, 2007. doi: 10.1007/978-3-540-72792-7

[28]

T. Arnold, R. Henrion, A. Möller, and S. Vigerske. A mixed-integer stochastic nonlinear optimization problem with joint probabilistic constraints, 2013. URL https://edoc.hu-berlin.de/handle/18452/9087. [39]

A. Bagirov, N. Karmitsa, and M. M. Mäkelä. Introduction to Nonsmooth Optimization. Springer International Publishing, 2014. doi: 10.1007/978-3-319-08114-4. [49]

E. Balas. Intersection cuts—a new type of cutting planes for integer programming.

Operations Research, 19(1):19–39, feb 1971. doi: 10.1287/opre.19.1.19. [10, 75]

E. Balas. Disjunctive programming. In Discrete Optimization II, Proceedings of the

Advanced Research Institute on Discrete Optimization and Systems Applications of

the Systems Science Panel of NATO and of the Discrete Optimization Symposium

co-sponsored by IBM Canada and SIAM Banff, Aha. and Vancouver, pages 3–51.

Elsevier BV, 1979. doi: 10.1016/s0167-5060(08)70342-x. [25, 76]

E. Balas. Disjunctive programming: Properties of the convex hull of feasible points.

Discrete Applied Mathematics, 89(1-3):3–44, dec 1998. doi: 10.1016/s0166-218x(98)

00136-x. [54]

E. Balas and R. G. Jeroslow. Strengthening cuts for mixed integer programs. European Journal of Operational Research, 4(4):224–234, apr 1980. doi: 10.1016/0377-2217(80)90106-x. [23, 26, 28, 29, 76, 85, and 87]

E. Balas and F. Margot. Generalized intersection cuts and a new cut generating paradigm. Mathematical Programming, 137(1-2):19–35, aug 2011. doi: 10.1007/s10107-011-0483-x. [76]

E. Balas, S. Ceria, and G. Cornuéjols. A lift-and-project cutting plane algorithm for mixed 0–1 programs. Mathematical Programming, 58(1-3):295–324, jan 1993. doi: 10.1007/bf01581273. [77]

A. Basu, M. Conforti, G. Cornuéjols, and G. Zambelli. Maximal lattice-free convex sets in linear subspaces. Mathematics of Operations Research, 35(3):704–720, Aug. 2010a. doi: 10.1287/moor.1100.0461. URL https://doi.org/10.1287/moor.1100.0461. [92]

A. Basu, M. Conforti, G. Cornuéjols, and G. Zambelli. Minimal inequalities for an infinite relaxation of integer programs. SIAM Journal on Discrete Mathematics, 24(1):158–168, 2010b. doi: 10.1137/090756375. [28]

A. Basu, G. Cornuéjols, and G. Zambelli. Convex sets and minimal sublinear functions. Journal of Convex Analysis, 18(2):427–432, 2011. [76]

A. Basu, M. Campelo, M. Conforti, G. Cornuéjols, and G. Zambelli. Unique lifting of integer variables in minimal inequalities. Mathematical Programming, 141(1-2):561–576, jun 2012. doi: 10.1007/s10107-012-0560-9. [28]

A. Basu, S. S. Dey, and J. Paat. Nonunique lifting of integer variables in minimal inequalities. SIAM Journal on Discrete Mathematics, 33(2):755–783, jan 2019. doi: 10.1137/17m1117070. [94]

P. Belotti. Disjunctive cuts for nonconvex MINLP. In Mixed Integer Nonlinear

Programming, pages 117–144. Springer New York, nov 2011. doi:

10.1007/978-1-4614-1927-3_5. [77]

P. Belotti, J. Lee, L. Liberti, F. Margot, and A. Wächter. Branching and bounds

tightening techniques for non-convex MINLP. Optimization Methods & Software,

24(4-5):597–634, 2009. [38, 41, and 74]

D. Bienstock, C. Chen, and G. Muñoz. Outer-product-free sets for polynomial optimization

and oracle-based cuts. arXiv preprint arXiv:1610.04604, 2016. [vii, 77,

90, 99, 128, 132, and 136]

D. Bienstock, C. Chen, and G. Muñoz. Intersection cuts for polynomial optimization.

In A. Lodi and V. Nagarajan, editors, Integer Programming and Combinatorial

Optimization, pages 72–87, Cham, 2019. Springer International Publishing. ISBN

978-3-030-17953-3. doi: 10.1007/978-3-030-17953-3_6. [77, 89, 90, 91, 132, 133,

and 136]

M. Bodur, S. Dash, and O. Günlük. Cutting planes from extended LP formulations.

Mathematical Programming, 161(1-2):159–192, 2017. [91]

P. Bonami, J. Linderoth, and A. Lodi. Disjunctive cuts for mixed integer nonlinear

programming problems. Progress in Combinatorial Optimization, pages 521–541,

2011. [77]

F. Boukouvala, R. Misener, and C. A. Floudas. Global optimization advances in

mixed-integer nonlinear programming, MINLP, and constrained derivative-free op-

timization, CDFO. European Journal of Operational Research, 252(3):701–727, aug

2016. doi: 10.1016/j.ejor.2015.12.018. [1]

S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press,

2004. [6]

A. Brondsted and R. T. Rockafellar. On the subdifferentiability of convex functions.

Proceedings of the American Mathematical Society, 16(4):605, aug 1965. doi:

10.2307/2033889. [75]

C. Buchheim and C. D’Ambrosio. Monomial-wise optimal separable underestimators

for mixed-integer polynomial optimization. Journal of Global Optimization, 67(4):

759–786, may 2016. doi: 10.1007/s10898-016-0443-3. [77]

C. Buchheim and E. Traversi. Separable non-convex underestimators for binary

quadratic programming. In Experimental Algorithms, pages 236–247. Springer

Berlin Heidelberg, 2013. doi: 10.1007/978-3-642-38527-8_22. [77]

A. Chmiela. Intersection cuts for non-convex MINLP. Master’s thesis, Technische

Universität Berlin, 2020. [89, 134, and 135]

F. H. Clarke. Optimization and Nonsmooth Analysis. Society for Industrial and

Applied Mathematics, Jan. 1990. doi: 10.1137/1.9781611971309. [36, 47]

F. H. Clarke, Y. S. Ledyaev, R. J. Stern, and P. R. Wolenski. Nonsmooth Analysis

and Control Theory. Springer New York, 1998. doi: 10.1007/b97650. [47, 48, 49,

and 51]

M. Conforti and L. A. Wolsey. “Facet” separation with one linear program. Mathe-

matical Programming, may 2018. doi: 10.1007/s10107-018-1299-8. [57]

M. Conforti, G. Cornuéjols, and G. Zambelli. A geometric perspective on lifting.

Operations Research, 59(3):569–577, jun 2011a. doi: 10.1287/opre.1110.0916. [28,

76, and 94]

M. Conforti, G. Cornuéjols, and G. Zambelli. Corner polyhedron and intersection

cuts. Surveys in Operations Research and Management Science, 16(2):105–120, jul

2011b. doi: 10.1016/j.sorms.2011.03.001. [17]

M. Conforti, G. Cornuéjols, and G. Zambelli. Integer Programming. Springer International

Publishing, 2014. ISBN 978-3-319-11008-0. doi: 10.1007/978-3-319-11008-0.

[28, 54, 92, and 97]

M. Conforti, G. Cornuéjols, A. Daniilidis, C. Lemaréchal, and J. Malick. Cut-generating

functions and S-free sets. Mathematics of Operations Research, 40

(2):276–391, may 2015. doi: 10.1287/moor.2014.0670. [17, 75, 76, and 90]

W. de Oliveira. Regularized optimization methods for convex MINLP problems.

TOP, 24(3):665–692, mar 2016. doi: 10.1007/s11750-016-0413-4. [39]

F. Deutsch, H. Hundal, and L. Zikatanov. Visible points in convex sets and best

approximation. In Computational and Analytical Mathematics, pages 349–364.

Springer New York, 2013. doi: 10.1007/978-1-4614-7621-4_15. [54, 61, and 68]

S. S. Dey and L. A. Wolsey. Two row mixed-integer cuts via lifting. Mathematical

Programming, 124(1-2):143–174, may 2010. doi: 10.1007/s10107-010-0362-x. [10,

28, 76, and 91]

M. A. Duran and I. E. Grossmann. An outer-approximation algorithm for a class of

mixed-integer nonlinear programs. Mathematical Programming, 36(3):307–339, oct

1986. doi: 10.1007/bf02592064. [37]

J. Dutta and C. S. Lalitha. Optimality conditions in convex optimization revisited.

Optimization Letters, 7(2):221–229, Oct. 2011. doi: 10.1007/s11590-011-0410-3.

[36, 47, 50, and 51]

V.-P. Eronen, M. M. Mäkelä, and T. Westerlund. On the generalization of ECP and

OA methods to nonsmooth convex MINLP problems. Optimization, 63(7):1057–1073,

aug 2012. doi: 10.1080/02331934.2012.712118. URL

https://doi.org/10.1080%2F02331934.2012.712118. [37, 38]

V.-P. Eronen, M. M. Mäkelä, and T. Westerlund. Extended cutting plane method

for a class of nonsmooth nonconvex MINLP problems. Optimization, pages 1–21,

jun 2013. doi: 10.1080/02331934.2013.796473. [38]

V.-P. Eronen, J. Kronqvist, T. Westerlund, M. M. Mäkelä, and N. Karmitsa. Method

for solving generalized convex nonsmooth mixed-integer nonlinear programming

problems. Journal of Global Optimization, 69(2):443–459, May 2017. doi:

10.1007/s10898-017-0528-7. [36, 50]

G. Fasano and R. Pesenti. Conjugate direction methods and polarity for quadratic

hypersurfaces. Journal of Optimization Theory and Applications, 175(3):764–794,

Oct. 2017. doi: 10.1007/s10957-017-1180-6. [68]

M. Fischetti and M. Monaci. A branch-and-cut algorithm for mixed-integer bilinear

programming. European Journal of Operational Research, sep 2019. doi:

10.1016/j.ejor.2019.09.043. [77]

M. Fischetti, I. Ljubić, M. Monaci, and M. Sinnl. Intersection cuts for bilevel optimization.

In Integer Programming and Combinatorial Optimization, pages 77–88.

Springer International Publishing, 2016. doi: 10.1007/978-3-319-33461-5_7. [77]

M. Fischetti, I. Ljubić, M. Monaci, and M. Sinnl. A new general-purpose algorithm

for mixed-integer bilevel linear programs. Operations Research, 65(6):1615–1637,

dec 2017. doi: 10.1287/opre.2017.1650. [77]

R. Fletcher and S. Leyffer. Solving mixed integer nonlinear programs by outer

approximation. Mathematical Programming, 66(1):327–349, 1994. ISSN 1436-4646.

doi: 10.1007/BF01581153. [37]

G. Gamrath, D. Anderson, K. Bestuzheva, W.-K. Chen, L. Eifler, M. Gasse, P. Gemander,

A. Gleixner, L. Gottwald, K. Halbig, G. Hendel, C. Hojny, T. Koch, P. L.

Bodic, S. J. Maher, F. Matter, M. Miltenberger, E. Mühmer, B. Müller, M. Pfetsch,

F. Schlösser, F. Serrano, Y. Shinano, C. Tawfik, S. Vigerske, F. Wegscheider,

D. Weninger, and J. Witzig. The SCIP Optimization Suite 7.0. ZIB-Report

20-10, Zuse Institute Berlin, March 2020. URL

http://nbn-resolving.de/urn:nbn:de:0297-zib-78023. [134]

A. M. Geoffrion. Generalized Benders decomposition. Journal of Optimization Theory

and Applications, 10(4):237–260, oct 1972. doi: 10.1007/bf00934810. [37]

F. Glover. Convexity cuts and cut search. Operations Research, 21(1):123–134, feb

1973. doi: 10.1287/opre.21.1.123. [10, 75]

F. Glover. Polyhedral convexity cuts and negative edge extensions. Zeitschrift für

Operations Research, 18(5):181–186, oct 1974. doi: 10.1007/bf02026599. [14, 76]

M. Goberna, E. González, J. Martínez-Legaz, and M. Todorov. Motzkin decomposition

of closed convex sets. Journal of Mathematical Analysis and Applications,

364(1):209–221, apr 2010. doi: 10.1016/j.jmaa.2009.10.015. [94]

R. Gomory. An algorithm for the mixed integer problem. Technical report, RAND

Corporation, Santa Monica, CA, 1960. [24]

R. E. Gomory. Outline of an algorithm for integer solutions to linear programs.

Bulletin of the American Mathematical Society, 64(5):275–279, sep 1958. doi:

10.1090/s0002-9904-1958-10224-4. [33]

M. Grötschel, L. Lovász, and A. Schrijver. Geometric Algorithms and Combinatorial

Optimization. Springer Berlin Heidelberg, 1993. doi: 10.1007/978-3-642-78240-4.

[54]

A. S. E. D. Hamed and G. P. McCormick. Calculation of bounds on variables

satisfying nonlinear inequality constraints. Journal of Global Optimization, 3(1):

25–47, 1993. doi: 10.1007/bf01100238. [55]

M. M. F. Hasan. An edge-concave underestimator for the global optimization of

twice-differentiable nonconvex problems. Journal of Global Optimization, 71(4):

735–752, mar 2018. doi: 10.1007/s10898-018-0646-x. [77]

J.-B. Hiriart-Urruty and C. Lemaréchal. Convex Analysis and Minimization Algorithms

II. Springer Berlin Heidelberg, 1993. doi: 10.1007/978-3-662-06409-2.

[39]

R. Horst and H. Tuy. Global Optimization. Springer Nature, 1990. doi:

10.1007/978-3-662-02598-7. [5, 35, 37, 38, 45, and 46]

J. E. Kelley, Jr. The cutting-plane method for solving convex programs. Journal of

the Society for Industrial and Applied Mathematics, 8(4):703–712, dec 1960. doi:

10.1137/0108053. [32, 35, and 37]

V. Jeyakumar and D. T. Luc. Nonsmooth calculus, minimality, and monotonicity of

convexificators. Journal of Optimization Theory and Applications, 101(3):599–621,

Jun 1999. ISSN 1573-2878. doi: 10.1023/a:1021790120780. [36, 48]

A. Kabgani, M. Soleimani-damaneh, and M. Zamani. Optimality conditions in op-

timization problems with convex feasible set using convexificators. Mathematical

Methods of Operations Research, 86(1):103–121, Apr 2017. ISSN 1432-5217. doi:

10.1007/s00186-017-0584-2. [36, 47, and 50]

O. Khamisov. On optimization properties of functions, with a concave minorant.

Journal of Global Optimization, 14(1):79–101, 1999. doi: 10.1023/a:1008321729949.

[75, 77]

M. R. Kılınç and N. V. Sahinidis. Exploiting integrality in the global optimization of

mixed-integer nonlinear programming problems with BARON. Optimization Meth-

ods and Software, 33(3):540–562, jul 2017. doi: 10.1080/10556788.2017.1350178.

[74]

J. Kronqvist, A. Lundell, and T. Westerlund. The extended supporting hyperplane

algorithm for convex mixed-integer nonlinear programming. Journal of Global Op-

timization, 64(2):249–272, 2016. ISSN 1573-2916. doi: 10.1007/s10898-015-0322-3.

[33, 35, 44, 45, and 51]

J. Kronqvist, D. E. Bernal, A. Lundell, and I. E. Grossmann. A review and comparison

of solvers for convex MINLP. Optimization and Engineering, 20(2):397–455, dec

2018. doi: 10.1007/s11081-018-9411-8. [34, 52]

J. B. Lasserre. Global optimization with polynomials and the problem of moments.

SIAM Journal on Optimization, 11(3):796–817, 2001. [91]

J. B. Lasserre. On representations of the feasible set in convex optimization. Op-

timization Letters, 4(1):1–5, oct 2009. doi: 10.1007/s11590-009-0153-6. [35, 47,

and 50]

J. B. Lasserre. On convex optimization without convex representation. Optimization

Letters, 5(4):549–556, apr 2011. doi: 10.1007/s11590-011-0323-1. [36, 47, and 50]

J. B. Lasserre. Erratum to: On convex optimization without convex representation.

Optimization Letters, 8(5):1795–1796, Apr. 2014. doi: 10.1007/s11590-014-0735-9.

[36, 47]

M. Laurent. Sums of squares, moment matrices and optimization over polynomials.

In Emerging Applications of Algebraic Geometry, pages 157–270. Springer, 2009.

[91]

C. Lemaréchal. An introduction to the theory of nonsmooth optimization. Optimization,

17(6):827–858, 1986. doi: 10.1080/02331938608843204. [36]

Y. Lin and L. Schrage. The global solver in the LINDO API. Optimization Methods

and Software, 24(4-5):657–668, oct 2009. doi: 10.1080/10556780902753221. [74]

M. Lubin, D. Bienstock, and J. P. Vielma. Two-sided linear chance constraints and

extensions. arXiv preprint arXiv:1507.01995, 2015. [38]

J. E. Martínez-Legaz. Optimality conditions for pseudoconvex minimization over

convex sets defined by tangentially convex constraints. Optimization Letters, 9(5):

1017–1023, Oct. 2014. doi: 10.1007/s11590-014-0822-y. [36, 47, and 50]

G. P. McCormick. Computability of global solutions to factorable nonconvex pro-

grams: Part i — convex underestimating problems. Mathematical Programming,

10(1):147–175, dec 1976. doi: 10.1007/bf01580665. [4, 6, 69, 77, and 78]

MINLPLIB. MINLP library. http://www.minlplib.org. [135]

R. Misener and C. A. Floudas. ANTIGONE: Algorithms for coNTinuous / Integer

Global Optimization of Nonlinear Equations. Journal of Global Optimization, 59

(2-3):503–526, mar 2014. doi: 10.1007/s10898-014-0166-2. [74]

S. Modaresi, M. R. Kılınç, and J. P. Vielma. Intersection cuts for nonlinear integer

programming: convexification techniques for structured sets. Mathematical

Programming, 155(1-2):575–611, feb 2015. doi: 10.1007/s10107-015-0866-5. [76]

D. Morán and S. S. Dey. On maximal S-free convex sets. SIAM Journal on Discrete

Mathematics, 25(1):379–393, jan 2011. doi: 10.1137/100796947. [28, 95]

G. Muñoz and F. Serrano. Maximal quadratic-free sets. In Integer Programming

and Combinatorial Optimization, pages 307–321. Springer International Publishing,

2020. doi: 10.1007/978-3-030-45771-6_24. URL

https://doi.org/10.1007/978-3-030-45771-6_24. [90]

F. Plastria. Lower subdifferentiable functions and their minimization by cutting

planes. Journal of Optimization Theory and Applications, 46(1):37–53, may 1985.

doi: 10.1007/bf00938758. [38]

M. Porembski. How to extend the concept of convexity cuts to derive deeper cutting

planes. Journal of Global Optimization, 15(4):371–404, 1999. doi:

10.1023/a:1008315229750. [76]

M. Porembski. Finitely convergent cutting planes for concave minimization. Journal

of Global Optimization, 20(2):109–132, 2001. doi: 10.1023/a:1011240309783. [76]

B. H. Pourciau. Modern multiplier rules. The American Mathematical Monthly, 87

(6):433–452, jun 1980. doi: 10.1080/00029890.1980.11995060. [23]

V. Powers and B. Reznick. Polynomials that are positive on an interval. Transactions

of the American Mathematical Society, 352(10):4677–4693, oct 2000. doi:

10.1090/s0002-9947-00-02595-2. [69]

A. Prékopa. Stochastic Programming. Springer Netherlands, 1995. doi:

10.1007/978-94-017-3087-7. [39]

A. Prékopa and T. Szántai. Flood control reservoir system design using stochastic

programming. In Mathematical Programming in Use, pages 138–151. Springer

Berlin Heidelberg, 1978. doi: 10.1007/bfb0120831. [39]

B. N. Pshenichnyi. Necessary Conditions for an Extremum. Marcel Dekker Inc, New

York, 1971. [36]

Y. Puranik and N. V. Sahinidis. Domain reduction techniques for global NLP

and MINLP optimization. Constraints, 22(3):338–376, jan 2017. doi:

10.1007/s10601-016-9267-5. [55]

I. Quesada and I. E. Grossmann. An LP/NLP based branch and bound algorithm

for convex MINLP optimization problems. Computers & Chemical Engineering, 16

(10-11):937–947, 1992. [37]

R. T. Rockafellar. Convex analysis. Princeton University Press, 1970. [6, 22, 23, 34,

40, 42, 57, 62, 63, 64, 65, 97, and 138]

P. Ruys. Public goods and decentralization: the duality approach in the theory of

value. PhD thesis, Tilburg University, 1974. [57]

A. Saxena, P. Bonami, and J. Lee. Convex relaxations of non-convex mixed in-

teger quadratically constrained programs: extended formulations. Mathematical

Programming, 124(1-2):383–411, may 2010a. doi: 10.1007/s10107-010-0371-9. [77]

A. Saxena, P. Bonami, and J. Lee. Convex relaxations of non-convex mixed inte-

ger quadratically constrained programs: projected formulations. Mathematical

Programming, 130(2):359–413, mar 2010b. doi: 10.1007/s10107-010-0340-3. [77]

S. Scholtes. Introduction to Piecewise Differentiable Equations. Springer New York,

2012. doi: 10.1007/978-1-4614-4340-7. [51]

A. Schrijver. Theory of Linear and Integer Programming. Wiley, 1998. [6]

S. Sen and H. D. Sherali. Facet inequalities from simple disjunctions in cutting plane

theory. Mathematical Programming, 34(1):72–83, jan 1986. doi: 10.1007/bf01582164.

[76]

S. Sen and H. D. Sherali. Nondifferentiable reverse convex programs and facetial

convexity cuts via a disjunctive characterization. Mathematical Programming, 37

(2):169–183, jun 1987. doi: 10.1007/bf02591693. [76]

F. Serrano. Intersection cuts for factorable MINLP. In Integer Programming and Com-

binatorial Optimization, pages 385–398. Springer International Publishing, 2019.

doi: 10.1007/978-3-030-17953-3_29. [73]

N. Z. Shor. Quadratic optimization problems. Soviet Journal of Computer and

Systems Sciences, 25:1–11, 1987. [91]

V. N. Solovev. On a criterion for convexity of a positive-homogeneous

function. Mathematics of the USSR-Sbornik, 46(2):285–290, Feb. 1983.

doi: 10.1070/sm1983v046n02abeh002787. URL

https://doi.org/10.1070/sm1983v046n02abeh002787. [108]

T. Szántai. A computer code for solution of probabilistic-constrained stochastic programming

problems. In Y. Ermoliev and R.-B. Wets, editors, Numerical Techniques for

Stochastic Optimization, pages 229–235. Springer Verlag, 1988. [39]

M. Tawarmalani and N. V. Sahinidis. Convex extensions and envelopes of lower

semi-continuous functions. Mathematical Programming, 93(2):247–263, dec 2002.

doi: 10.1007/s10107-002-0308-z. [75]

M. Tawarmalani and N. V. Sahinidis. A polyhedral branch-and-cut approach to

global optimization. Mathematical Programming, 103(2):225–249, may 2005. doi:

10.1007/s10107-005-0581-8. [74]

J. Tind and L. A. Wolsey. On the use of penumbras in blocking and antiblocking the-

ory. Mathematical Programming, 22(1):71–81, dec 1982. doi: 10.1007/bf01581026.

[57]

E. Towle and J. Luedtke. Intersection disjunctions for reverse convex sets. arXiv

preprint arXiv:1901.02112, 2019. [77]

A. Tsoukalas and A. Mitsos. Multivariate McCormick relaxations. Journal of Global

Optimization, 59(2-3):633–662, apr 2014. doi: 10.1007/s10898-014-0176-0. [78, 80]

H. Tuy. Concave programming with linear constraints. Doklady Akademii Nauk, 159

(1):32–35, 1964. [10, 75, 76, and 83]

H. Tuy. Convex Analysis and Global Optimization. Springer International Publishing,

2016. doi: 10.1007/978-3-319-31484-6. [42]

W. van Ackooij and W. de Oliveira. Convexity and optimization with copulæ struc-

tured probabilistic constraints. Optimization, 65(7):1349–1376, May 2016. doi:

10.1080/02331934.2016.1179302. [39]

W. van Ackooij, R. Henrion, A. Möller, and R. Zorgati. Joint chance constrained

programming for hydro reservoir management. Optimization and Engineering, oct

2013. doi: 10.1007/s11081-013-9236-4. [39]

W. van Ackooij, E. C. Finardi, and G. M. Ramalho. An exact solution method

for the hydrothermal unit commitment under wind power uncertainty with joint

probability constraints. IEEE Transactions on Power Systems, 33(6):6487–6500,

nov 2018. doi: 10.1109/tpwrs.2018.2848594. [39]

A. F. Veinott. The supporting hyperplane method for unimodal programming. Op-

erations Research, 15(1):147–152, feb 1967. doi: 10.1287/opre.15.1.147. [33, 35, 37,

45, 50, and 51]

S. Venkatachalam and L. Ntaimo. Integer Set Reduction for Stochastic Mixed-Integer

Programming. arXiv e-prints, art. arXiv:1605.05194, Apr 2016. [56, 67]

S. Vigerske. Decomposition in multistage stochastic programming and a constraint

integer programming approach to mixed-integer nonlinear programming. PhD thesis,

Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät

II, 2013. [55]

S. Vigerske and A. Gleixner. SCIP: global optimization of mixed-integer nonlinear

programs in a branch-and-cut framework. Optimization Methods and Software, 33

(3):563–593, jun 2017. doi: 10.1080/10556788.2017.1335312. [74]

S. Vigerske and A. Gleixner. SCIP: global optimization of mixed-integer nonlinear

programs in a branch-and-cut framework. Optimization Methods and Software, 33

(3):563–593, 2018. doi: 10.1080/10556788.2017.1335312. URL

https://doi.org/10.1080/10556788.2017.1335312. [134]

Z. Wei and M. M. Ali. Outer approximation algorithm for one class of convex mixed-

integer nonlinear programming problems with partial differentiability. Journal of

Optimization Theory and Applications, 167(2):644–652, mar 2015a. doi:

10.1007/s10957-015-0715-y. [37]

Z. Wei and M. M. Ali. Convex mixed integer nonlinear programming problems and

an outer approximation algorithm. Journal of Global Optimization, 63(2):213–227,

feb 2015b. doi: 10.1007/s10898-015-0284-5. [37]

Z. Wei and M. M. Ali. Generalized Benders decomposition for one class of MINLPs

with vector conic constraint. SIAM Journal on Optimization, 25(3):1809–1825, jan

2015c. doi: 10.1137/140967519. [38]

T. Westerlund and F. Pettersson. An extended cutting plane method for solving

convex MINLP problems. Computers & Chemical Engineering, 19:131–136, jun

1995. doi: 10.1016/0098-1354(95)87027-x. [38]

T. Westerlund, H. Skrifvars, I. Harjunkoski, and R. Pörn. An extended cutting

plane method for a class of non-convex MINLP problems. Computers & Chemical

Engineering, 22(3):357–365, feb 1998. doi: 10.1016/s0098-1354(97)00000-8. [38]

T. Westerlund, V.-P. Eronen, and M. M. Mäkelä. On solving generalized convex

MINLP problems using supporting hyperplane techniques. Journal of Global

Optimization, 71(4):987–1011, mar 2018. doi: 10.1007/s10898-018-0644-z. [38]

S. Wiese. On the interplay of Mixed Integer Linear, Mixed Integer Nonlinear and

Constraint Programming. PhD thesis, Alma Mater Studiorum - Università di

Bologna, 2016. [23]

A. Zaffaroni. Convex coradiant sets with a continuous concave cogauge. Journal of

Convex Analysis, 15(2):325–343, 2008. [54]

G. M. Ziegler. Lectures on Polytopes. Springer New York, 1995. doi:

10.1007/978-1-4613-8431-1. URL https://doi.org/10.1007/978-1-4613-8431-1. [18]