Document [original]

Numerical Methods for the Solution of Bi-Level

Multi-Objective Optimization Problems

Von der Fakult¨at f¨ur Elektrotechnik,

Informatik und Mathematik

der Universit¨at Paderborn

zur Erlangung des akademischen Grades

Doktor der Naturwissenschaften

– Dr. rer. nat. –

genehmigte Dissertation

von

Alessandro Dell’Aere

Paderborn, 2008

Acknowledgements

First of all I want to thank my advisor Prof. Dr. Michael Dellnitz for giving me

the opportunity to do my PhD on a very interesting field of research. It was a great

pleasure to work with him and his group that has always provided a nice research

environment to me. Then I want to thank the second reviewers Prof. Dr. Joachim

B¨ocker and Prof. Dr. Stefan Sch¨affler, and the members of the examination board

Prof. Dr. Norbert K¨ockler, Prof. Dr. Henning Krause, and Dr. habil. Martin Ziegler.

My working group has been like a family to me. Here I want to thank Tanja

B¨urger, Dr. Wang Fang, Kathrin Flaßkamp, Sebastian Hage-Packh¨auser, Mirko

Hessel-von Molo (imagine there is a vector space...), Christian Horenkamp, Prof.

Dr. Oliver Junge, Marianne Kalle, Stefan Klus, Dr. Arvind Krishnamurty, Anna-

Lena Meyer, Dr. Kathrin Padberg, Michael Petry, Dr. Marcus Post, Dr. Oliver

Sch¨utze, Marcel Schwalb, Stefan Sertl, Bianca Thiere, Katrin Witting, but special

thanks are directed to Dr. Robert Preis, who has always been ready to help out in

any case of problems and to Dr. Sina Ober-Bl¨obaum for the pleasurable ambiance

in our office and for an exciting canoe ride.

Moreover I have to mention the students Maik Ringkamp, Albert Seifried, and

Dominik Steenken. They have always been ready to support me, especially to do

computational work.

During the last years, my work was partly supported by the German Research

Foundation within the Collaborative Research Center 614: ’Self-optimizing Con-

cepts and Structures in Mechanical Engineering’ and therefore I want to thank the

German Research Foundation and all the people involved in this project, especially

Tobias Knoke, Eckehard M¨unch, and Henner V¨ocking, who were always motivated

for interesting interdisciplinary cooperations.

I also want to thank Dr. Gabriele Eichfelder from the University of Erlangen

for visiting me in Paderborn and for fruitful discussions. Then I have to thank

Dr. Kai Gehrs. During our study we have been able to solve many difficult exercises

in cooperation and in our free time we had a lot of fun playing our guitars.

However, this list would not be complete without special thanks to all my friends.

And last but not least, I want to thank my wife Heike and my daughter Pia Sophia

for a number of things.

iii

Abstract

Both bi-level optimization and multi-objective optimization are of crucial impor-

tance in many modern sciences and accordingly they have attracted great interest

in many publications of the last decades. Since many applications, in particular

in the field of self-optimizing systems, become more and more complex during the

last years, in this thesis we go a step ahead and consider bi-level multi-objective

optimization problems. Such problems can be understood as bi-level optimization

problems, where the subproblems of both levels are given by multi-objective opti-

mization problems. We develop the theoretical background and practical algorithms

for the solution of these problems. Convergence of the algorithms is proved and their

strength is demonstrated by academic example problems and real world applications.

Zusammenfassung

Sowohl Bilevel-Optimierung als auch Mehrziel-Optimierung sind von großer Be-

deutung f¨ur viele moderne Wissenschaften und haben dementsprechend in vielen

Publikationen der letzten Jahrzehnte großes Interesse erfahren. Da in den letz-

ten Jahren viele Anwendungen, insbesondere im Bereich der selbstoptimierenden

Systeme, immer komplexer werden, gehen wir in dieser Dissertation einen Schritt

weiter und betrachten Bilevel-Mehrzieloptimierungsprobleme. Diese Probleme k¨on-

nen als Bilevel-Optimierungsprobleme aufgefasst werden, bei denen die Teilprobleme

der beiden Levels in Form von Mehrziel-Optimierungsproblemen vorliegen. Wir

entwickeln die theoretische Basis und anwendbare Algorithmen zur L¨osung dieser

Probleme. Die Konvergenz der Algorithmen wird bewiesen und ihre St¨arke wird an

Hand von akademischen Beispielproblemen und realistischen Anwendungen gezeigt.

Contents

1 Introduction 1

2 Basic Definitions and Concepts 6

2.1 Multi-Objective Optimization . . . . . . . . . . . . . . . . . . . . . . 6

2.2 Bi-Level Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2.3 Bi-Level Multi-Objective Optimization . . . . . . . . . . . . . . . . . 11

3 Basic Methods for Solving Multi-Objective Optimization Problems 13

3.1 Set-Oriented Methods for Solving Unconstrained Multi-Objective Op-

timizationProblems ........................... 14

SubdivisionAlgorithm .......................... 14

RecoveringAlgorithm........................... 15

SamplingAlgorithm ........................... 17

3.2 Reference Point Methods . . . . . . . . . . . . . . . . . . . . . . . . . 18

4 A New Image Set-Oriented Recovering Method for Solving Multi-

Objective Optimization Problems 20

4.1 Recovering-IS Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 20

4.2 Comparison with Other Methods . . . . . . . . . . . . . . . . . . . . 28

4.3 ConvexObjectives ............................ 35

4.4 A Hierarchical Concept for Computing the Desired Part of the Pareto

Set..................................... 39

5 Basic Methods for Solving Bi-Level Optimization Problems 41

5.1 Merit Functions and Smoothing Methods . . . . . . . . . . . . . . . . 41

6 Bi-Level Multi-Objective Optimization Problems (BLMOP) 45

6.1 An Optimality Condition for BLMOPs without Lower Level Inequal-

ityConstraints .............................. 45

6.2 Methods for Solving BLMOPs without Lower Level Inequality Con-

straints................................... 52

The Case k=1.............................. 54

A BLMOP Formulation for Finding Robust Pareto Points . . . . . . 55

BL-Subdivision Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 56

BL-Recovering-PS Algorithm . . . . . . . . . . . . . . . . . . . . . . 61

Convergence of Recovering-PS Algorithms . . . . . . . . . . . . . . . 64

BL-Recovering-IS Algorithm . . . . . . . . . . . . . . . . . . . . . . . 67

Convergence of Recovering-IS Algorithms . . . . . . . . . . . . . . . . 70

6.3 Pareto Set Constrained Multi-Objective Optimization Problems . . . 71

6.4 Methods for Solving PSCMOPs . . . . . . . . . . . . . . . . . . . . . 75

PSC-Subdivision and PSC-Recovering Algorithm . . . . . . . . . . . 76

PSC-SamRec Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 77

PSC-Sampling Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 78

Combination of the PSC-Sampling and PSC-SamRec Algorithms . . . 79

6.5 Methods for Solving BLMOPs with Lower Level Inequality Constraints 91

BL2-Recovering-IS algorithm . . . . . . . . . . . . . . . . . . . . . . . 92

6.6 Non-Convex and Non-Smooth BLMOPs . . . . . . . . . . . . . . . . 100

BL2-Subdivision Algorithm . . . . . . . . . . . . . . . . . . . . . . . 101

7 Sensitivity Analysis 103

7.1 Sensitivity Analysis for Classical Optimization Problems . . . . . . . 103

7.2 Sensitivity Analysis for MOP and BLMOP . . . . . . . . . . . . . . . 107

7.3 A Concept for the Adaptive Choice of Targets . . . . . . . . . . . . . 117

8 Conclusion and Outlook 121

1 Introduction

Many problems in all areas of applied science, engineering, economics and statis-

tics can be posed in terms of optimization. In particular, such problems can be

characterized by the fact that several objective functions have to be optimized at

the same time. For instance, for a perfect passenger car one wants to simultaneously

minimize energy consumption and maximize comfort. As indicated by this example

the different objectives typically contradict each other and therefore certainly do

not have identical optima. Thus, the goal is to approximate the ”optimal compro-

mises” which, in mathematical terms, are called Pareto points. It turned out that in

general there is an extensive set of Pareto points, the Pareto-set. Such optimization

problems are known as multi-objective optimization problems (MOP).

There are two main directions in solving multi-objective optimization problems.

On one hand, it might be sufficient to compute a single solution inside the Pareto set.

In this case several methods like for instance the weighted sums method ([18, 34])

or the ε-constraint method ([18, 34]) are good choices. On the other hand, in many

applications it is desired to know the entire Pareto set. In this case, set-oriented

methods ([10, 11, 39]) or the normal boundary intersection method ([7]) work very

efficiently. Certainly, if the entire Pareto set is available, the decision making has to

be performed, that is, one particular Pareto point has to be selected for adjusting

the system under consideration.

Very often the objectives of multi-objective optimization problems are outputs

of a complex system, where interactions between the associated subsystems restrict

the feasible set of the optimization problem. In particular the feasible set of a multi-

objective optimization problem (the higher level problem) can itself be defined by

the solution set of another multi-objective optimization problem (the lower level

problem) within an underlying subsystem. Let us illustrate this fact by an example.

Suppose one wants to minimize in a multi-objective sense both comfort and energy

consumption of a vehicle, which forms a complex system. But for safety reasons,

also optimality concerning the mechanical guidance both in the horizontal and the

vertical direction of the undercarriage regarded as an underlying subsystem has

to be guaranteed. Since this again is a multi-objective optimization problem, we

are concerned with a hierarchy of multi-objective optimization problems. In many

applications, the lower level problem is in addition parametrized by a part of the

upper level variables, such that the resulting overall problem turns out to be very

complex and hard to solve. Following the notion of classical bi-level optimization

problems, which are for instance considered in [1] and [13], the described hierarchy

forms a bi-level structure of multi-objective optimization problems and therefore we

call these problems bi-level multi-objective optimization problems (BLMOP).

The consideration of such problems was originally motivated by the author’s work

within the Collaborative Research Center 614 of the German Research Foundation,

which is concerned with self-optimizing systems in mechanical engineering. To state

the relevance of classical and bi-level multi-objective optimization in self-optimizing

systems, we first point out that the process of self-optimization is performed re-

peatedly in the following three steps. In Step 1 the current state of the technical

system is analyzed. Depending on the outcome of this analysis, a system of goals to

be achieved is generated in Step 2. In Step 3 the behavior of the technical system has

to be adapted due to the goals determined in Step 2. More details on self-optimizing

systems can be found in [21]. In particular, the system of goals - or a part of it - can

be made up not only by the fact that several performance properties have to meet

certain values but also by an importance rating among these performance properties.

Mathematically speaking, several possibly contradicting objective functions have

to be optimized simultaneously in Step 3, that is, a multi-objective optimization

problem has to be solved repeatedly during the self-optimizing process, while the

weights of the objectives are varying. Since the objectives are outputs of a complex

system, the corresponding feasible set can be restricted by the solution of another

parametrized multi-objective optimization problem, that is, bi-level multi-objective

optimization problems have to be solved in this case. In practice, instead of solving

the BLMOP in every cycle of the self-optimization process, the entire Pareto set

can be computed in advance, such that Step 3 is reduced to an inexpensive decision

making process instead of solving the relatively expensive BLMOP. Therefore, in

this work we concentrate on computing the Pareto set of a BLMOP, a task which

is regarded as an important part of the complex self-optimizing process.

In the literature one can find many contributions dealing with classical bi-level

optimization problems, which are characterized by the fact that every level is made

up by a classical, that is, a scalar valued optimization problem. But there are only

a few publications concerned with bi-level multi-objective optimization: first of all

we want to cite [16], from where we have taken some examples to demonstrate the

efficiency of our new algorithms. Similar problems with several parametrized lower

level multi-objective problems and an upper level multi-objective problem are inves-

tigated in [32] and [42]. Statements on the existence of solutions for these problems

are given in [32]. In [42] an algorithm for the solution of these problems is proposed,

but this algorithm is restricted to the computation of a single preferred solution out

of the entire solution set. In [2] the authors consider problems with a parametrized

lower level multi-objective problem and a scalar upper level problem. Problems with

a scalar upper level problem and a linear lower level multi-objective problem are in-

vestigated in [44] and [8]. Such hierarchical problems with a non-parametric lower

level multi-objective problem belong to the field of optimization over the efficient

set.

In this work we propose different new approaches for the solution of bi-level

multi-objective optimization problems. These methods are set-oriented in the sense

that they allow to approximate the entire Pareto set (or a desired part of it) rather

than just the computation of single points inside this set. The development of these

methods was essentially influenced by two different concepts in technical applica-

tions. To see the difference between these directions, observe that there might be

several solutions in a Pareto set for which the vectors of objective values are iden-

tical. On one hand it can be reasonable for many applications and it seems to be

efficient to restrict the algorithms to the computation of a complete set of alterna-

tives ([15]), that is, for every Pareto optimal point in the space of objective values

(the objective space or image space) only one corresponding point in the space of

parameters (the parameter space or pre-image space) has to be computed. But in

other technical applications it can be very important to select Pareto optimal solu-

tions which do not only correspond to a desired vector of objective values but also

have additional properties like for example (in a certain sense) robustness. More-

over, for online applications as described in [46], small distances between certain

solutions in parameter space are required. Thus, for these applications, the com-

putation of the entire Pareto set provides the best basis for selecting solutions in

order to adjust the technical system. According to the mentioned requirements cor-

responding to the two different views on applications, in this work we present both

parameter set-oriented methods for the computation of entire Pareto sets and image

set-oriented methods which are designed to compute a complete set of alternatives.

Most of our algorithms are realized by the use of a multi-level subdivision structure

in order to compute a tight covering formed by a collection of subsets of the para-

meter or image space, respectively, such that the union of these subsets contains the

solution. Since these coverings are supposed to approximate the solution, they are

computed in a way such that the standard Hausdorff distance between the solution

and the computed covering is smaller than a given value. We have implemented

our algorithms in the software package GAIO1, which was developed at the Chair

of Applied Mathematics of the University of Paderborn and provides the required

multi-level subdivision structure.

A more detailed outline of this thesis is as follows. In Section 2, we state more

precisely, that is, from a mathematical point of view, how the bi-level multi-objective

optimization problem is defined. To this end, we first recall the basic definitions

and the theoretical background for classical multi-objective optimization problems

in Section 2.1 ([15], [18], [34]). In particular we present the well-known Kuhn-Tucker

necessary conditions ([31]) for a Pareto optimal solution, because these will provide

the essential basis for the derivation of optimality conditions for BLMOP in Section

6.1. In Section 2.2 we consider classical bi-level optimization problems ([12], [13],

[45]). We state the basic definition for these problems and we present both the

optimistic and pessimistic formulation, which are the traditional concepts that have

been introduced in order to obtain unique solutions. Based on the content of Sections

2.1 and 2.2, we are capable of giving the mathematical definition for BLMOP in

Section 2.3. In particular, since this is the main interest of this thesis, we also state

the corresponding optimistic formulation.

The methods for the solution of classical multi-objective optimization problems,

from which we lend the basic concepts for the development of our new methods

for the solution of BLMOP, are presented in Section 3. More detailed, in Section

3.1 we concentrate on the set-oriented methods ([11], [39]), which are capable of

computing the entire Pareto set of MOP. In Section 3.2 we give a short introduction

to reference point methods ([18]), which turned out to be the proper method for

the solution of the particular subproblem, which has to be solved repeatedly in our

image set-oriented methods.

Section 4 contains new results and forms one of the main parts of this thesis.

Here, we are concerned with a new image set-oriented method for the generation of

a complete set of alternatives of multi-objective optimization problems. Originally,

we developed a variant of this method for the solution of BLMOP, but it turned out

that this concept can also be used for the solution of more general constrained multi-

objective optimization problems. In Section 4.1, the algorithm and its realization is

described in detail. Since this method uses scalarizations based on reference point

methods, in Section 4.2 we compare it to other scalarization methods, that is in

particular the weighted sums method. In Section 4.3 we explain the advantages of the

image set-oriented methods in the case of convex objective functions. Particularly,

1http://math-www.uni-paderborn.de/∼agdellnitz/gaio/

we show that in this case the complete set of alternatives corresponds to the entire

Pareto set. Furthermore, an interactive hierarchical concept for the fast computation

of a desired part of the solution based on the image set-oriented method is described

in Section 4.4.

Some basic methods for the solution of classical bi-level optimization problems

can be found in Section 5. Here, we mention very briefly several concepts presented

in [12], but we concentrate on the Kuhn-Tucker based concepts that we use for

the development of our algorithms for the solution of BLMOP. We also describe

a problem arising in the presence of lower level inequality constraints, that is, the

violation of constraint qualifications, see [12] and [33]. In Section 5.1, we present

how such problems can be overcome by reformulations which use merit functions or

smoothing functions and how smoothing methods help to solve these reformulations

([22], [28]).

Section 6 forms another main part of this thesis and is concerned with both the-

oretical and practical results for BLMOP without lower level inequality constraints

and a convex lower level problem. This section is divided as follows. In Section

6.1 we derive – based on the theoretical background given in the preceding Sections

– necessary optimality conditions for these problems. In Section 6.2 we propose

several set-oriented methods for the solution of this particular subclass of BLMOP.

These are on one hand subdivision algorithms, which are realized by the use of sub-

sets of the parameter space, and on the other hand recovering algorithms, which

work with subsets either in parameter space or in image space. We also prove con-

vergence (in a sense to be stated precisely later on) for most of these algorithms.

Thereafter, in Section 6.3 we concentrate on the particular case of a non-parametric

lower level problem and derive a corresponding variant of the necessary optimality

conditions. This is motivated by the fact that several applications fall into this

family of problems, which are termed Pareto set constrained multi-objective opti-

mization problems (PSCMOP). In Section 6.4, we present algorithms both of subdi-

vision and recovering nature for the solution of PSCMOP. In particular, we present

derivative-free algorithms, which can be efficiently implemented because of the sim-

pler structure of PSCMOP.

For the sake of completeness, we included a short Section 6.5 on the solution of

BLMOP with lower level inequality constraints. Here, we propose as an example an

algorithm which was realized by extending one of the previously described algorithms

using the smoothing concept mentioned in Section 5.1. In addition, we present

an algorithm for the solution of BLMOP with non-convex and non-differentiable

objectives and constraints in Section 6.6.

In Section 7 we perform a sensitivity analysis for MOP and BLMOP, that is, we

are interested in the variation of the Pareto set caused by the variation of an addi-

tional perturbation parameter. To this end, we first review the sensitivity analysis

for classical optimization problems ([17]) in Section 7.1. Based on this background,

in Section 7.2 we derive the desired sensitivity analysis for MOP and BLMOP. As

an application, in Section 7.3 we use the results derived in Section 7.2 for the devel-

opment of a concept for the adaptive choice of algorithmic parameters (the targets),

which helps to control the spreading among the Pareto points computed by the

image set-oriented methods described in Section 4.1 and Section 6.2.

In Section 8, the thesis closes with a summary of the results and a discussion

about open problems and possible future directions.

2 Basic Definitions and Concepts

2.1 Multi-Objective Optimization

In this section we review the theoretical background on multi-objective optimization

needed in this work. We briefly summarize the basic concepts including first order

necessary optimality conditions (Kuhn-Tucker conditions) and a direction of descent.

Then we describe the set-oriented methods for solving classical multi-objective op-

timization problems, from which we adapt the main concepts in order to develop an

image set-oriented variant of these methods in Section 4 and set-oriented methods

for the solution of bi-level multi-objective optimization problems in Section 6.

As mentioned in Section 1, the task of a multi-objective optimization problem

MOP is to optimize several real valued objective functions Fi:

n→

simulta-

neously. Before the MOP can be stated mathematically we have to define a partial

order on

Definition 2.1 Let v, w ∈

k. Then the vector vis less than w(v <pw), if

vi< wifor all i∈ {1, . . . , k}. The relation ≤pis defined in an analogous way.

We collect kobjective functions Fiin the vector valued function

n→

k, F(x)=(F1(x), . . . , Fk(x))t.

Additionally, if there are p≤nequality constraints Hiand qinequality constraints

Gito be satisfied, that is, Hi(x) = 0 for i= 1, . . . , p and Gi(x)≤0 for i= 1, . . . , q,

then we collect them analogously in vector valued functions

n→

p, H(x) = (H1(x), . . . , Hp(x))t

and

n→

q, G(x)=(G1(x), . . . , Gq(x))t

and denote the feasible set by

S={x∈

n:H(x) = 0 and G(x)≤p0}.

With this notation we can state the multi-objective optimization problem as follows:

min

x∈SF(x),(MOP)

where minimization has to be understood in the sense of Definition 2.1. A point

¯x∈Sis a solution of MOP, if for any other x∈Seither F(x) = F(¯x) or the value

Fi(x) of at least one objective Fiis greater than Fi(¯x). To be more precise, we state

the following

Definition 2.2 Consider the multi-objective optimization problem MOP. Then a

point ¯x∈Sis called (globally) Pareto optimal or a (global) Pareto point if there is

no y∈Ssuch that

F(y)6=F(¯x) and F(y)≤pF(¯x).(2.1)

A point ¯x∈Sis a local Pareto point, if there is a neighborhood U(¯x) of ¯xsuch that

there is no y∈U(¯x)∩Ssatisfying (2.1).

The following definition will be useful, because some of the algorithms described in

this work use the partial order ≤pwithin certain sets of points.

Definition 2.3 (i) For X ⊂

nwe call a point x∈ X nondominated with

respect to Fand Xor F-nondominated with respect to Xif there does not

exist any point y∈ X with F(y)6=F(x) and F(y)≤pF(x).

(ii) For X ⊂

nwe call a point x∈ X nondominated with respect to F, G, H and

Xor F(G,H)-nondominated with respect to X, if G(x)≤p0, H(x) = 0 and

there does not exist any point y∈ X with G(y)≤p0, H(x) = 0, F(y)6=F(x)

and F(y)≤pF(x).

Since ≤pjust defines a partial order on

k, one cannot expect to find isolated Pareto

points. Under certain assumptions the solution to an unconstrained MOP locally

forms a (k−1)-dimensional manifold, see [25]. On the other hand, if the constraints

of a MOP define an r-dimensional manifold, we can expect that generically the

solution to the MOP locally forms a manifold of dimension min{k−1, r}.

The following theorem of Kuhn and Tucker ([31]) states a necessary condition for

Pareto optimality.

Theorem 2.4 Let x∗be a Pareto point of MOP and assume that the vectors ∇Hi(x∗),

i= 1, . . . , p and ∇Gj(x∗),j∈ {l:Gl(x∗) = 0}are linearly independent.

Then there exist scalars α1, . . . , αk, µ1, . . . , µq≥0, λ1, . . . , λp∈R,such that

i=1

αi= 1,

µiGi(x∗) = 0 for i= 1, . . . , q and

i=1

αi∇Fi(x∗) +

i=1

λi∇Hi(x∗) +

i=1

µi∇Gi(x∗)=0.

(2.2)

Points satisfying (2.2) are not necessarily Pareto optimal, but certainly “Pareto

candidates” and thus we now emphasize their relevance by the following

Definition 2.5 A point x∈

nis called a substationary point of MOP if there

exist scalars α1, . . . , αk, µ1, . . . , µq≥0, λ1, . . . , λp∈Rsuch that (2.2) is satisfied.

To find substationary points, one can use a direction of descent, that is, a di-

rection in

nin which all the kobjectives are simultaneously non-increasing and

at least one objective is decreasing. Several choices of descent directions can for in-

stance be found in [3], [23], and [37] and references therein. As an example we review

the descent direction given in [37]. To this end, we associate with F:

n→

F(x) = (F1(x), . . . , Fk(x))t, the following quadratic optimization problem for every

fixed x∈

min

α∈

k(k

i=1

αi∇Fi(x)k2

2;αi≥0, i = 1, . . . , k,

i=1

αi= 1).(QOP)

Then the weighted sum of gradients

−

i=1

ˆαi∇Fi(x),

where ˆαis a solution of QOP, either vanishes or is a descent direction for all objective

functions F1, . . . , Fkin x.

With this result an iteration step for finding substationary points can be defined by

first computing ˆαfor the current point xand then performing a line search along

−Pk

i=1 ˆαi∇Fi(x) to find a new point ˜x.

2.2 Bi-Level Optimization

In this section we give a brief introduction to (classical) bi-level optimization. In

doing so we follow basically the definitions of [12] and concentrate on those contents

required for the development of our methods for solving BLMOPs. Comprehen-

sive overviews on bi-level optimization can be found in [1, 5, 12, 13, 45]. Bi-Level

optimization (or bi-level programming) problems constitute a particular kind of

hierarchical optimization problems, where a part of the constraints for the upper (or

higher) level problem is defined by another parametric optimization problem, the

lower level problem. To be more precise, denote by

n×

m→

the upper level objective function. The rupper level equality constraints Hi(x, y) = 0

and supper level inequality constraints Gj(x, y)≤0 are collected in the vector val-

ued functions

n×

m→

r, H(x, y)=(H1(x, y), . . . , Hr(x, y))t

and

n×

m→

s, G(x, y) = (G1(x, y), . . . , Gs(x, y))t.

Analogously, denote by

n×

m→

the lower level objective function. The plower level equality constraints hi(x, y) = 0

and qlower level inequality constraints gj(x, y)≤0 are collected in the vector valued

functions

n×

m→

p, h(x, y) = (h1(x, y), . . . , hp(x, y))t

and

n×

m→

q, g(x, y) = (g1(x, y), . . . , gq(x, y))t.

With these notations, the bi-level optimization problem can be written as

00 min

y00F(x(y), y) (BLP)

s.t. (x(y), y)∈S,

where

S={(x, y)∈

n×

m:G(x, y)≤p0, H(x, y) = 0, x ∈ψ(y)}

and ψ(y) denotes the solution set of the following lower level problem:

min

xf(x, y) (LLP)

s.t. g(x, y)≤p0,

h(x, y) = 0.

Observe that in the case of non-unique solutions x(y)∈ψ(y) for the lower level

problem, the notion of an optimal solution of the bi-level optimization problem is

not necessarily obvious. This ambiguity is expressed by using the quotation marks in

(BLP). One way out of this situation is given by the optimistic or weak formulation,

where for every fixed y∈

m,x(y)∈ψ(y) is chosen such that F(x(y), y) is minimal.

This optimistic formulation can be expressed as

min

ymin

x{F(x, y) : x∈ψ(y)}(BLP-O)

s.t. G(x, y)≤p0,

H(x, y) = 0.

Analogously, in the pessimistic or strong formulation,x(y)∈ψ(y) is chosen such

that F(x(y), y) is maximal. This pessimistic formulation can be expressed as

min

ymax

x{F(x, y) : x∈ψ(y)}(BLP-P)

s.t. G(x, y)≤p0,

H(x, y) = 0.

Originally, the optimistic and pessimistic formulations were used in [43] for the

description of real market situations, where different decision makers try to realize

best decisions with respect to their individual aims while they are not able to realize

their decisions independently but are forced to react according to a certain hierarchy.

In particular, this hierarchy can be given by a bi-level structure as described above.

Then, the optimistic or pessimistic formulations might be suitable, depending on the

question whether the lower level decision maker (the follower) is willing to support

the higher level decision maker (the leader) or not.

Non-uniqueness of lower level solutions will be particularly relevant in Section

6, where the lower level problem will be replaced by a MOP and therefore ψ(y) is

given by a Pareto set which in general is an extensive set and not a singleton. In

accordance to the structure of the applications we have in mind, we will concentrate

on the optimistic formulation.

Let us now assume that the lower level problem LLP is convex, that is, for

every fixed y∈

m,fand gi, i = 1, . . . , q are convex and hi, i = 1, . . . , p are

affine-linear. Then the solution x(y)∈ψ(y) is unique and we can write x=x(y).

Moreover, under the common regularity assumptions, LLP can be replaced by its

Kuhn-Tucker conditions2which are necessary and sufficient in this case. In this

way, with ¯

∇:= ∇(x), an auxiliary problem equivalent to the original problem is

obtained:

min

x,y,ζ,τ F(x, y) (BLP’)

s.t. G(x, y)≤p0,

H(x, y)=0,

∇f(x, y) +

i=1

ζi¯

∇hi(x, y) +

i=1

τi¯

∇gi(x, y) = 0,

g(x, y)≤p0,

h(x, y)=0,

τigi(x, y) = 0 for i= 1, . . . , q,

τi≥0 for i= 1, . . . , q.

There are several solution methods for (BLP), (see [12, 13, 1]), which are based

on the solution of such auxiliary problems. In Section 6, we will borrow and extend

this idea for the development of most of our algorithms for the solution of BLMOP.

2.3 Bi-Level Multi-Objective Optimization

In this section we introduce those problems, which make up the main problem class

under consideration in this thesis, that is, the class of bi-level multi-objective opti-

mization problems (BLMOP). This can be understood as a MOP, where some vari-

ables xhave to be taken from the solution set of another MOP which is parametrized

by variables yof the first MOP. In other words, BLMOP arises from BLP by re-

placing both the higher and lower level optimization problem by multi-objective

optimization problems.

2The Kuhn-Tucker conditions for scalar valued optimization problems can be obtained from

Theorem 2.4 by setting k= 1.

To express the BLMOP in mathematical terms, we use the same notations as in

Section 2.2 with the exception that both the upper and lower level objectives are

considered to be vector valued, that is

n×

m→

k, F(x, y) = (F1(x, y), . . . , Fk(x, y))t

and

n×

m→

l, f(x, y)=(f1(x, y), . . . , fl(x, y))t.

For every y∈

m, let ψ(y) denote the solution, that is, the Pareto set of the

following lower level problem:

min

xf(x, y),(BLMOP-LL)

s.t. g(x, y)≤p0,

h(x, y) = 0,

where minimization has to be understood in the sense of the partial order ≤p.

Then a BLMOP can be stated as follows:

00 min

y00F(x(y), y),(BLMOP)

s.t. (x(y), y)∈S,

where again minimization has to be understood in the sense of the partial order

≤pand the feasible set is defined by

S={(x, y) : G(x, y)≤p0, H(x, y) = 0, x ∈ψ(y)}.

Motivated by applications and by the fact that for a BLMOP ψ(y) is typically

an extensive Pareto set, we will concentrate on a suitable variant of the optimistic

formulation BLP-O. This variant differs from BLP-O in the way that the upper level

problem and for every fixed ythe lower level problem of BLMOP are minimized in a

multi-objective sense. For this, we introduce the Pareto-optimistic formulation for

BLMOP:

min

ymin

x{F(x, y) : x∈ψ(y)}(BLMOP-O)

s.t. G(x, y)≤p0,

H(x, y) = 0,

where minimizations have to be understood in the sense of the partial order ≤p. The

solution of such problem makes up the main interest of this thesis and is considered

in Section 6.

3 Basic Methods for Solving Multi-Objective Op-

timization Problems

In the last decades many methods for solving MOP have been developed. Compre-

hensive overviews on methods can be found for instance in [15] and [34]. There are

some algorithms – e.g. the weighted sums method – that are only capable of compu-

ting single points of the Pareto set. Since many of these single solution methods can

be realized by applying established optimization algorithms to scalar valued auxilia-

ry functions, they are relatively fast and efficient. Although in some particular

situations the user might be satisfied with a single Pareto point, in many applica-

tions it is desired to have an overview on the entire Pareto set. For this, algorithms

for computing the entire Pareto set have been developed, see [11],[15],[20],[25], [34],

and [39]. Once the entire Pareto set is available, the user has to select one of the

solutions, e.g. for adjusting the technical system under consideration. On the one

hand, this decision making problem is very challenging because the computed solu-

tions are of equal worth from the mathematical point of view. On the other hand,

having an overview on the entire Pareto set provides the best basis for selecting the

right solution.

Of course, computing the entire Pareto set can be very time-consuming, particu-

larly, if the dimension of the feasible set S⊂

nis very high. In order to attenuate

this drawback, other methods (e.g. the normal boundary intersection method, [7])

make a compromise and calculate only a certain subset of the Pareto set, which is

sufficient for the decision making process in the sense that it forms a complete set

of alternatives ([15]), that is, for every Pareto optimal point in image space, only

one corresponding Pareto point in parameter space is computed. Our new image

set-oriented method for the solution of MOP, see Section 4, belongs also to the latter

class of methods and – as will be demonstrated later on – has some additional nice

properties. Moreover, some of these properties gave reason to extend this algorithm,

see Section 6, in order to solve BLMOP.

In the next sections we describe in more detail those methods from which we

borrowed the basic ideas for the development of our methods for the solution of

BLMOP. To be more precise, in Section 3.1 we review set-oriented methods which

are originally tailored to compute the entire Pareto set of MOP in the absence of

constraints, but – as we will see in Section 6 – can be extended in order to solve

more general problems. In Section 3.2 we also give a short introduction to reference

point methods, which turned out to be the proper method for finding individual

Pareto points, a subproblem, which has to be solved repeatedly as the new image

set-oriented method described in Section 4 proceeds.

3.1 Set-Oriented Methods for Solving Unconstrained Multi-

Objective Optimization Problems

We now briefly review the set-oriented algorithms for the approximation of the

Pareto set or the set of substationary points of a given unconstrained MOP in a

compact domain Q⊂

n. For a detailed exposition the reader is referred to [11].

Using a multi-level subdivision scheme each of these methods produces a sequence

of sets B0,B1,B2,. . . where each Bjconsists of finitely many subsets of Q. Generally,

there are two different types of such methods: the methods of subdivision type and

the methods of recovering type. In methods of subdivision type, every Bjcovers the

Pareto set or the set of substationary points, such that the diameter of the subsets

shrinks as the index jincreases. In methods of recovering type, the diameter of the

subsets is fixed. Here, every Bjforms a partial covering of the Pareto set or the set

of substationary points and is extended as the method proceeds.

The numerical realization of the set-oriented methods works as follows: the el-

ements B∈ Bjare boxes each of which is specified by a center in

nand nradii.

But rather than working explicitly with centers and radii of the boxes these are

stored within a binary tree which leads to a significant reduction in the memory

requirement. The boxes are discretized via test points. Strategies for the choice of

test points are presented in [11].

In the following we will denote by Sthe set of substationary points within Qand

we will call the elements Bof Bjboxes. Let Φ : Rn→Rnbe an iteration scheme for

finding substationary points of the MOP under consideration.

Subdivision Algorithm

The sequence (Bj)j∈Nproduced by the following algorithm has the property that

the diameter

diam(Bj) := max

B∈Bj

diam(B)

tends to zero for j→ ∞. In fact it can be shown, see [11], that the box coverings

which are created by the following algorithm converge towards S.

Let B0be an initial collection of finitely many subsets of the compact set Qsuch

that ∪B∈B0B=Q. Then Bjis inductively obtained from Bj−1in two steps:

(i) Subdivision Construct from Bj−1a new system ˆ

Bjof subsets such that

[

B∈ˆ

B=[

B∈Bj−1

and

diam( ˆ

Bj) = θjdiam(Bj−1),

where 0 < θmin ≤θj≤θmax <1.

(ii) Selection Define the new collection Bjby

Bj=nB∈ˆ

Bj: there exists ˆ

B∈ˆ

Bjsuch that Φ−1(B)∩ˆ

B6=∅o.

Figure 1: The idea of the Subdivision algorithm.

The described method is of global nature in the sense that the entire domain Q

is explored, such that all connected components of Scan be found, if the num-

ber of test points used for the discretization of the boxes is large enough. But it

should be mentioned that particularly in the case of many test points this method

is restricted to moderate dimensions of the parameter space, because the number of

boxes B∈ Bj, which have to be explored in every iteration, might grow rapidly as

the index jincreases.

Recovering Algorithm

During the subdivision procedure boxes might get lost although they contain sub-

stationary points. This can be the case when the number of test points taken into

account is not large enough. In the following we describe an algorithm which allows

to recover boxes which have previously been lost, although they contain substation-

ary points. But it should be mentioned that this algorithm is also a self-contained

continuation method, that is, given at least one single Pareto point within the com-

pact domain Q, further substationary points are generated successively in order to

obtain a representation of the entire Pareto set of the given MOP within Q. For

formal reasons let us denote by Pdacomplete partition3of the set Qinto boxes

of subdivision size – or depth –d, which are generated by successive bisection of

Q. Then there exists for every point x∈Qand every depth dexactly one box

B(x, d)∈ Pdwith center cand radius rsuch that ci−ri≤xi< c+ri,∀i= 1, . . . , n.

Moreover, denote by Φq(s) the application of qsteps of an iteration scheme for find-

ing substationary points using an initial guess s. For a given initial box collection

B0the algorithm reads as follows:

(i) for all B∈ B0

B.active := TRUE

(ii) for j= 0, . . . , MaxStep

Bj:= Bj

for all B∈ {B∈ Bj:B.active == TRUE}

choose starting points {si}i=1,...,l near B

X:= {Φq(si)|i= 1, . . . , l}

B.active := FALSE

for all x∈ X:

if B(x, d)6∈ ˆ

B(x, d).active := TRUE

Bj:= ˆ

Bj∪B(x, d)

if ˆ

Bj== BjSTOP

Bj+1 := ˆ

Figure 2: The idea of the Recovering algorithm.

Unfortunately, the Recovering algorithm may not perform adequately when a box

does not contain part of Sbut is possibly far away. In this case many undesired

3Pdhas not to be explicitly computed by our algorithms.

regions would be added to the box collection on the way towards Sin the course

of the iteration of test points. Strategies to overcome this problem can be found in

[20].

Sampling Algorithm

There are some potential drawbacks of the algorithms described above: for many

iteration schemes Φ, the gradients of the objectives are needed. But in general,

gradients might not exist at all or the calculation of gradients might be very expen-

sive from the computational point of view. Moreover, the set Sis generally a strict

superset of the Pareto set for several reasons. On the one hand, there might be

local Pareto points, which are not Pareto optimal from a global point of view. On

the other hand, in some applications, e.g. if the objectives are not defined outside

the domain Q, penalization strategies have to be used to avoid that the iteration

scheme Φ leads to points outside Q. In this case, the algorithms are capable of

finding points on the boundary ∂Q of Q, which are not necessarily Pareto optimal

for the given MOP. These problems can be avoided using the following sampling

algorithm, which takes only the function values of the objectives into account. An

outline of an iteration of the algorithm is as follows. Given a box collection Bj−1

the collection Bjis obtained by:

(i) Subdivision Construct from Bj−1a new system ˆ

Bjof subsets such that

[

B∈ˆ

B=[

B∈Bj−1

and

diam( ˆ

Bj) = θjdiam(Bj−1),

where 0 < θmin ≤θj≤θmax <1.

(ii) Selection

for all B∈ˆ

choose a set of test points XB⊂B

N:= nondominated points of S

B∈ˆ

BjXB

Bj:= nB∈ˆ

Bj:∃x∈ XB∩No

Observe that, caused by the discretization of the boxes and, where required, also

by the application of an iteration scheme Φ, the described set-oriented methods

Figure 3: The idea of the Sampling algorithm.

(Subdivision, Recovering, Sampling) are capable of generating not only boxes but

also substationary points within these boxes. Typically, these points can be saved

in an archive and the globally nondominated points of this archive can be filtered

out by nondominance tests. These points are well-distributed in parameter space

in the sense that in every generated box there is at least one of these points. Such

a representation of the Pareto set forms an advantageous basis for the respective

decision making process in many (but not all) real world applications.

3.2 Reference Point Methods

Areference point t∈

kcan be regarded as a vector of desirable objective values

called aspiration levels or targets,ti, i = 1, . . . , k. Reference point methods use

feasible or infeasible reference points for the construction of scalar valued auxiliary

functions. For an overview on different types of reference point methods the reader is

referred to [18]. In the following we will focus on distance function based approaches,

which are relevant for our new algorithm presented in Section 4. As indicated by

its notation, distance function based approaches use a distance function, which is

typically based on a norm, to measure the distance between a reference point and

a given point in image space. To state the auxiliary problem corresponding to a

target vector t∈

k, let δ:

k×

k→

+be a distance function derived from a

norm, i.e., δ(a, b) = ||a−b|| for some norm || ·|| :

k→

+. Then the auxiliary

problem to be solved is

min

x∈Sδ(F(x), t).(RPP)

If we have δ(F(x?), t)>0, where x?is a solution to RPP, then we know that F(x?)

is on the boundary of the image F(S) = {F(x) : x∈S⊂

n}. Moreover, if in

addition t <pF(x?) we can expect that x?is (at least a local) Pareto point. Thus,

local Pareto points can be found by first choosing proper targets and then solving

RPP. Indeed, Theorem 3.1, which was taken from [15], guarantees that, under

certain assumptions, x?is a Pareto point. For this, recall that a norm ||·|| :

k→

is called strictly monotonically increasing, if ||y1|| <||y2|| for all y1, y2∈

kwith

|y1

j| ≤ |y2

j|, j = 1,2, . . . , k and |y1

j| 6=|y2

j|for some j.

Theorem 3.1 Let || · || be a strictly monotonically increasing norm and assume

ti= min{Fi(x) : x∈S}for i= 1,2, . . . , k. If x?is an optimal solution of RPP,

then x?is a solution of MOP.

Proof: See [15].

In order to compute a representation of the entire Pareto set, our new image set-

oriented algorithm presented in Section 4 repeatedly solves a variant of RPP while

the targets are varying. To state a corollary which guarantees that the correspon-

ding solutions are at least locally Pareto optimal we denote T= (T1, . . . , Tk), Ti=

min{Fi(x) : x∈S}for i= 1,2, . . . , k and define for a given target vector t∈

the modified feasible set

St={x∈S:Fi(x)≥ti, i = 1, . . . , k}.

Furthermore, we define both a modified multi-objective optimization problem and

a modified auxiliary problem by replacing Sby St:

min

x∈St

F(x),(MOP ’)

min

x∈St

δ(F(x), t).(RPP ’)

Now, with these notations we can state the following

Corollary 3.2 Let Fbe continuous on the compact domain S. Moreover, let ||·||

be a strictly monotonically increasing norm and assume that T <pt <pF(x?), where

x?is an optimal solution of RPP ’. Then x?is a local solution of MOP.

Proof: Since Fiis continuous and since there are ¯xi, x?∈Swith Fi(¯xi) = Ti<

ti< Fi(x?), there exist xi∈Ssuch that Fi(xi) = tifor all i= 1,2, . . . , k. From

construction of Stit is obvious, that x?∈Stand ti= min{Fi(x) : x∈St}. Thus,

Theorem 3.1 guarantees, that x?solves MOP ’. Since Stis constructed from Sjust

by constraining the image of F, such that F(St) contains a part of a local Pareto

optimal set in image space, x?is a local solution of MOP.

4 A New Image Set-Oriented Recovering Method

for Solving Multi-Objective Optimization Prob-

lems

As mentioned in Section 3.1, the (classical) Recovering algorithm uses a multi-

level subdivision scheme in order to discretize the parameter space and to achieve

diversity of the computed solution set. For our new algorithm presented in this

section, this set-oriented strategy was adapted in order to operate adequately in

image space, that is, we use boxes in image space rather than in parameter space.

This concept is motivated by the fact that in many applications there are just a

few objectives which depend on a relatively high number of parameters. Another

reason for the development of this image set-oriented variant is given by the decision

makers requirement that the calculated Pareto points should be well-distributed in

image space rather than in parameter space. Later on in Section 6.2, we will adapt

the concept of this new algorithm for the development of an image set-oriented

algorithm for the solution of BLMOPs.

4.1 Recovering-IS Algorithm

The crucial difference between our new algorithm and the classical Recovering algo-

rithm is the fact that a randomly chosen point t∈

kdoes not necessarily belong

to the image F(S) = {F(x) : x∈S}, that is, we do not know whether there is

any x∈Ssuch that F(x) = t. Moreover, if F(x) = tfor some x∈S, we do not

know whether xis Pareto optimal. To get an answer to these questions, we solve

the auxiliary problem RPP ’and – as mentioned in Section 3.2 – if t <pF(x?) for

a solution x?of RPP ’, then we know that – under suitable assumptions – x?is at

least locally Pareto optimal. Otherwise, if t=F(x?), then we repeatedly have to

vary tand solve RPP ’until t <pF(x?). A strategy for the choice and variation of

the targets tcan be found later on in this section.

With respect to the computational effort, the described technique for evaluating

a box in image space is similar to the technique for evaluating a box in parameter

space as used by the classical Recovering algorithm (provided that the dimensions

of both spaces are the same): for both techniques, a sufficiently number of targets

or starting points, respectively, has to be chosen and consequently the same number

of executions of the respective iteration scheme has to be applied. Thus, we expect

that the computational effort has the same order of magnitude for both evaluation

techniques. But due to the fact that in many applications the dimension of the

parameter space is much higher than the dimension of the image space, our new

Recovering-IS algorithm (with boxes in image space) turns out to be more efficient

in this case, because the number of points needed to represent a box grows rapidly

as the dimension of the considered space grows. For example, pmpoints are required

for generating a consistent grid of points in an m-dimensional box using ppositions

in every coordinate.

In the algorithm described below and for the remainder of this work the distance

function δis based on the norm || · ||2that is, δ(a, b) = ||a−b||2for all a, b ∈

k. Observe, that for every point F∈

kthere is exactly one box B(F, d) (in

image space) of depth dcontaining F. Starting with a given box collection B0in

image space and associating a solution (xB, FB) to every box B, the Recovering-IS

algorithm reads as follows:

(i) for all B∈ B0

B.active := TRUE

(ii) for j= 0, . . . , MaxStep

Bj:= Bj

for all B∈ {B∈ Bj:B.active == TRUE}

choose target vectors {ti}i=1,...,l near Bwith ti<pFB

i:= arg min

x∈Rnδ(F(x), ti), i = 1, . . . , l

i:= F(x?

i), i = 1, . . . , l

B.active := FALSE

for all i= 1, . . . , l:

if B(F?

i, d)6∈ ˆ

B:= B(F?

i, d), x ˜

B:= x?

i, F ˜

B:= F?

B.active := TRUE

Bj:= ˆ

Bj∪˜

if ˆ

Bj== BjSTOP

Bj+1 := ˆ

As desired, this algorithm computes Pareto points that are well-distributed in

image space in the sense that in every generated box there is one of the computed

Pareto point. Consequently, these points make up a suitable basis for the decision

making process related to many applications.

Figure 4: The idea of the Recovering-IS algorithm.

There remains the question, how to choose the target vectors ti, i = 1,2, . . . , l

near a current box Bin order to obtain a complete set of alternatives. Efficient

strategies for the choice of target vectors can be defined, particularly by using local

information on the Pareto set, e.g. orientation or curvature, which can be calculated

via objective derivatives (or numerical approximations of the derivatives). Moreover,

as presented in Section 7, sensitivity analysis can be used as an additional tool to

control the distance among the computed Pareto points. In the following we will

focus on a particular strategy for the choice of the targets which was originally

designed for problems with smooth objectives, but is also applicable and works

satisfactorily in the case of more general objectives. Let us assume that the image

F(P)⊂

kof the Pareto set P⊂

nis smooth and forms a (k−1)-dimensional

manifold in a neighborhood Nε(y?) of a given Pareto optimal point y?=F(x?)∈

F(P) in image space. Since an approximation of F(P) at y?is given by the tangent

space Ty?F(P), there are certainly further Pareto points near Ty?F(P)∩Nε(y?).

Consequently, we can expect that there are λ∈

and p∈Ty?F(P)∩Nε(y?), such

that suitable targets needed for the computation of further Pareto points can be

expressed by p+λd, where d≤p0 denotes a basis vector of the 1-dimensional space

(Ty?F(P))⊥. Thus, to apply this idea in practice, we first have to construct dand a

basis V:= {b1, b2, . . . , bk−1}of Ty?F(P) and then to specify targets

ti=y?

k−1

j=1

αi,j bj+λid, i = 1,2, . . . , l

by determining the coefficients αi,j and λi. For example, the construction of two

different targets in the neighborhood of a Pareto Point y?in image space for the

case k= 2 is depicted in Figure 5. Fortunately, if y?was found in a previous step

by solving RPP ’for a given target t?, t?<py?, then dcan be obtained without

any additional effort by d:= t?−y?. Once dis available, any standard method

for the construction of an orthogonal basis, e.g. the Grahm-Schmidt method can be

used to obtain the required basis V. For all i= 1,2, . . . , l, the coefficients αi,j are

chosen such that pi:= Pk−1

j=1 αi,j bjis located inside a neighbor box of the current

box. Moreover, the pishould be well-distributed around y?. With this heuristic, it

is very likely to find new boxes containing Pareto points. For the choice of λian

adaptive concept has to be applied, because a computed solution ¯yof RPP ’can

only be accepted, if ti<p¯yis satisfied. Such an adaptive concept should be guided

by the fact that ti<p¯ycertainly holds if λiis sufficiently large, but it should also

be considered that RPP ’is ill-conditioned if λiis too large.

Figure 5: The construction of targets based on b∈Ty?F(P) and d∈(Ty?F(P))⊥.

Example 4.1 For our first example let F= (F1, F2)t:R3→R2be defined by

F(x1, x2, x3) = (x1−1)2+ (x2−1)2+ (x3−1)4

(x1+ 1)4+ (x2+ 1)2+ (x3+ 1)2(4.1)

We assume that the decision maker is only interested in solutions for which both

objective values are located within the interval I:= [0,20] and therefore we define

S:= {x∈

3:Fi(x)∈I, i = 1,2}and consider the MOP

min

x∈SF(x).(4.2)

Figure 6 shows the solutions generated by the Recovering-IS algorithm using different

box sizes (depths). Here, the reader can get an impression of how the density of the

computed representation can be controlled by choosing the box size.

Figure 6: The solution of Example 4.1 computed by the image set-oriented recovering

algorithm using different box sizes in image space.

Example 4.2 We use the next example to demonstrate the nice property of the

Recovering-IS algorithm to generate a well-distributed representation of the Pareto

set. To this end let F= (F1, F2, F3)t:R3→R3be defined by

F(x1, x2, x3) = 



(x1−1)4+ (x2−1)2+ (x3−1)2

(x1+ 1)2+ (x2+ 1)4+ (x3+ 1)2

(x1−1)2+ (x2+ 1)2+ (x3−1)4

(4.3)

and consider the MOP

min

x∈

3F(x).(4.4)

Figure 7 depicts the solution computed by the Recovering-IS algorithm and Figure 8

depicts the solution computed by the classical Recovering algorithm.4Observe that

4The generated boxes are omitted in these figures in order to obtain more transparency.

the Pareto points computed by the Recovering-IS algorithm are well-distributed over

the entire Pareto set (in image space). On the other hand, the distribution obtained

by the classical Recovering algorithm varys drastically. Consequently, there are too

many unneeded points in one region, whereas the density is possibly too sparse for

the decision makers’ consideration in an essential part of the Pareto set.

Figure 7: The solution of Example 4.2 computed by the Recovering-IS algorithm

(image space).

Figure 8: The solution of Example 4.2 computed by the classical Recovering algo-

rithm (image space).

Example 4.3 Application to Mechatronics

In the following we review the main contents of [10] where the efficiency of the

Recovering-IS algorithm was demonstrated on a technical application taken from

[30]. Here we are concerned with an energy management problem for an onboard

storage system of an overhead line powered tram as shown in Figure 9. Here the aim

is to reduce both overhead line peak power and energy consumption simultaneously.

The storage device accumulates the energy which is released while the vehicle is

Figure 9: Block diagram of the tram system with onboard storage and operation

management system.

braking and, as shown in Figure 10, this stored energy can be used later on at

a suitable point of time to power the drive. Our task is to design the operation

management that is, to manage this charging and discharging, such that overhead

line peak power and energy consumption are minimized in a multi-objective sense.

For a more detailed description of this application see [30].

Since the drive cycle of the tram is divided into 1241 track sections and the

energy management system has to assign a reference value to each of these sections,

we have to solve a MOP with 2 objective functions and 1241 parameters. From

the engineer’s point of view a solution with a linear or at least monotone behavior

of the series of reference values corresponding to certain connected blocks of track

sections seemed to be reasonable. Motivated by this fact, the optimization process

Figure 10: The operation principle of the onboard storage system.

was performed in two steps. For a pre-optimization step, a reduced model was

constructed by grouping the parameters within each block and representing the

entire block by just two parameters i.e., by a linear auxiliary function. The resulting

model turned out to have only 184 parameters. Thus, it was possible to compute

very quickly a first approximation to the solution of the original problem as follows.

First, the Pareto set of the reduced model was computed using the Recovering-IS

algorithm. Due to the smaller number of parameters of the reduced model, this

step was executed relatively quickly. Then the higher-dimensional points of the

desired first approximation were calculated from the Pareto points of the reduced

model simply by linear interpolation. In the second step of the optimization process

the Recovering-IS algorithm was applied to the original problem using the first

approximation computed by the pre-optimization step as a starting iteration. This

principle of first performing a pre-optimization on a simplified lower-dimensional

model and taking the result for generating a starting iteration for solving the original

problem turned out to be a helpful step in decreasing the necessary computational

effort.

The results of the optimization procedure are shown in Figure 12. The stars

are the results of the pre-optimization and the circles are the results obtained from

the subsequent optimization on the original model using the results of the pre-

optimization as starting points. In order to compare the objectives, they are nor-

malized in the sense that they have the value 1 when no storage device is used.

For the purpose of comparison the reduced model has again been optimized

both with the classical Recovering algorithm in parameter space and with the new

Recovering-IS algorithm. To keep the results comparable, both algorithms have

been initialized with the same starting point and the provided computation time was

the same for both algorithms. The archived points associated with the computed

boxes are shown in Figure 11. Observe that the objective values computed by

the Recovering-IS algorithm are considerably better than those computed by the

Recovering algorithm in parameter space. Moreover, there is a crucial difference

between the results concerning diversity. This is caused by the fact that - as in many

applications - the map from the Pareto set in parameter space to the corresponding

points in image space is not injective, such that the classical Recovering algorithm

computes many Pareto points which are evenly distributed in the high-dimensional

parameter space but possibly close together in the low-dimensional image space.

On the other hand, due to the box-oriented concept in image space, the Recovering-

IS algorithm produces Pareto points which are evenly distributed in image space.

This shows the previously mentioned advantages of the Recovering-IS algorithm:

the time-consuming but unnecessary computation of too many points is avoided by

restricting to a few points which are evenly distributed in image space. These points

form a sufficient subset of the Pareto set and provide a manageable basis for selecting

the right solutions, particularly when this selection has to be done repeatedly in the

course of a self-optimizing process.

Let us make a remark on the choice of starting points. First observe that at least

one Pareto point has to be provided as a starting point for applying the Recovering

or the Recovering-IS algorithm. Obviously, such starting points can be generated

by just optimizing the individual objectives or a weighted sum of all objectives

using a standard optimization technique for scalar valued problems. Fortunately, in

the application presented here, a starting point was given because of the engineers’

knowledge on the technical system, such that this preliminary step was not necessary.

4.2 Comparison with Other Methods

In this section we will first compare the (image set-oriented) Recovering-IS algorithm

with the classical (parameter set-oriented) Recovering algorithm. In doing so, we

will point out some advantages of the Recovering-IS algorithm, but we will also

give some guidelines for combining the classical and the new method. Then we

will illustrate how the solutions of the Recovering-IS algorithm are related to the

solutions of the well-known weighted sums methods. We will give an example to

Figure 11: The solution of the tram problem computed by the IS-Recovering algo-

rithm and by the Recovering algorithm (image space).

Figure 12: Optimization results for the tram model (image space).

see that the Recovering-IS algorithm is capable of finding Pareto points for which

the weighted sums method is inapplicable. Moreover, we will prove that at least

every solution of the weighted sums method can also be found by the reference

point method, i.e., by solving RPP ’, which is the particular technique used to find

individual Pareto points within the Recovering-IS algorithm. Thereafter, we will

state some advantages of the Recovering-IS algorithm over the normal boundary

intersection method.

Comparison with the Classical Recovering Algorithm As stated in Section

4.1, the Recovering-IS algorithm is particularly advantageous in the case where the

dimension of the parameter space is significantly higher than the dimension of the

image space or, the number of objectives to be optimized. Moreover, the computed

points are well-distributed in image space, which can be particularly advantageous

from the decision maker’s point of view. But there is another advantage which is

independent of the dimensions. To see this, remember that in the Recovering-IS

algorithm a basis of the tangent space Ty?F(P), which is used for determining the

targets ti, can be obtained by some very simple calculations based on y?and t?. But,

for the classical Recovering algorithm, if one wants to construct the corresponding

tangent space Tx?Pin parameter space in order to determine starting points (or

predictors) close to the Pareto set P, the gradients ∇Fi(x?), i = 1, . . . , k of all

objectives are required and a QR-decomposition of a matrix at least of dimension

(n+k)×(n+1) (in the unconstrained case) has to be calculated, see [25]. Moreover,

for a suitable iteration scheme for finding further points on Pand for a termination

criterion gradients must be available, too. Of course, there are variants of the

classical Recovering method which do not require derivatives, but these methods

require many objective evaluations and consequently are not as efficient as gradient

based methods, particularly when the parameter space is high-dimensional.

Comparison with the Weighted Sums Method In the following we derive

some relationships between the solutions of the weighted sums method and RPP ’.

We assume that all objectives Fi, i = 1,2, . . . , k, are differentiable up to second

order.

For given weights wi≥0, i= 1,2, . . . , k, satisfying Pk

i=1 wi= 1 the weighted sums

problem, which forms a scalarized auxiliary problem for MOP, is given by:

min

x∈S

i=1

wiFi(x).(WS)

To obtain a representation of the entire Pareto set, WS has to be solved repeatedly

for different weighting vectors w= (w1, . . . , wk)t. One difficulty in this approach is,

that it is not always clear a priori how to choose these weighting vectors in order

to obtain a well-distributed representation of the Pareto set, see [6]. Moreover,

as demonstrated by the following example, the weighted sums method may fail to

compute the entire Pareto set.

Example 4.4 Let F:

+→

2be defined by F(x)=(F1(x), F2(x))t, where

F1(x)=1−xand F2(x) = √x, and consider the following MOP:

min

x>0F(x).

Observe that F0

1(x) = −1<0 and F0

2(x) = (2√x)−1>0 for all x > 0 that is, every

x > 0 is Pareto optimal. Assume we want to find the Pareto point x?=1

4. Keeping

in mind that w2= 1 −w1we can write down the necessary condition for x?to solve

WS:

w1F0

1(x?) + (1 −w1)F0

2(x?) = 1 −2w1= 0.

Consequently, if there is a weighted sums expression with a minimum at x?, then it

corresponds to w1=w2=1

2and thus must be given by

Fw(x) := 1

2F1(x) + 1

2F2(x) = 1−x+√x

Unfortunately, with

w(x) = 1

4√x−1

we have F0

w(x)>0 for 0 < x < x?and F0

w(x)<0 for x>x?that is, x?is a maximum

of Fwand a minimization of Fwcannot lead to x?, unless in the unlikely case when

x?is chosen to be the initial guess and a gradient based method is used. On the

other hand, using the target vector t= (1

4,0)t, minimization of the expression

Ft(x) := ||F(x)−t||2=rx2−1

2x+9

leads to x?, because with

t(x) = x−1

qx2−1

2x+9

we have F0

t(x)<0 for 0 < x < x?and F0

t(x)>0 for x>x?. In other words, we have

detected a solution x?of the given MOP which can be found using RPP ’, whereas

WS fails to find this solution.

Now, for the unconstrained case, i.e., S=

n, we want to state this relationship in

a more general context. In a first step we prove that any solution of WS fulfills the

necessary optimality conditions for being a solution of RPP ’, if the target vector is

chosen properly.

Lemma 4.5 For S=

nlet x?be a solution of WS. Then there is a target vector

t= (t1, t2, . . . , tk)t∈

k,t≤pF(x?), such that x?fulfills the necessary optimality

conditions for being a solution of RPP ’.

Proof: For i= 1,2, . . . , k let ti=Fi(x?)−wi. Since x?is a solution of WS, we

have

∇

i=1

wiFi(x?) =

i=1

wi∇Fi(x?) = 0

and consequently

∇||F(x?)−t||2=∇

v

i=1

(Fi(x?)−ti)2



=Pk

i=1 2(Fi(x?)−ti)∇Fi(x?)

2qPk

i=1(Fi(x?)−ti)2

||F(x?)−t||2

i=1

wi∇Fi(x?)=0.

Under certain conditions a solution of WS is indeed also a solution of RPP ’. To

prove this fact, which is formulated later on in Corollary 4.7, we have to do some

preliminary work. To decide whether a point is a solution of an unconstrained

minimization problem, in addition to the fact that the gradient vanishes, a statement

on the definiteness of the corresponding Hessian has to be made. For this we denote

by ∇2F(x) the Hessian of a function F:

n→

at xand consider the following

Theorem 4.6 For S=

nlet x?be a solution of WS, such that the Hessian

∇2

i=1

wiFi(x?) =

i=1

wi∇2Fi(x?)

is positive (semi-)definite. Then there is a target vector t= (t1, t2, . . . , tk)t∈

t≤pF(x?), such that

∇2||F(x?)−t||2

is positive (semi-)definite.

Proof: For i= 1,2, . . . , k let ti=Fi(x?)−wi. Then ||F(x?)−t||26= 0 and

∇2||F(x?)−t||2=∇(∇||F(x?)−t||2)t=∇ Pk

i=1(Fi(x?)−ti)∇Fi(x?)

||F(x?)−t||2!t

=Pk

i=1 ∇(Fi(x?)−ti) (∇Fi(x?))t

||F(x?)−t||2

+Pk

i=1 (Fi(x?)−ti)∇2Fi(x?)

||F(x?)−t||2

+∇1

||F(x?)−t||2 k

i=1

(Fi(x?)−ti)∇Fi(x?)!t

||F(x?)−t||2 k

i=1 ∇Fi(x?) (∇Fi(x?))t+

i=1

wi∇2Fi(x?)!

+∇1

||F(x?)−t||2





i=1

wi∇Fi(x?)

|{z }







Here, Pk

i=1 wi∇Fi(x?) = 0, because x?is a solution of WS. The dyadic product

∇Fi(x?) (∇Fi(x?))t

is positive semidefinite for all i= 1,2, . . . , k, and since

∇2

i=1

wiFi(x?) =

i=1

wi∇2Fi(x?)

is assumed to be positive (semi-)definite we can conclude that ∇2||F(x?)−t||2is

also positive (semi-)definite.

Corollary 4.7 For S=

nlet x?be an isolated solution of WS. Then there is

a target vector t= (t1, t2, . . . , tk)t∈

k,t≤pF(x?), such that x?is an isolated

solution of RPP ’.

Proof: For i= 1,2, . . . , k let ti=Fi(x?)−wi. Since x?is an isolated solution of

WS, ∇2Pk

i=1 wiFi(x?) is positive definite and ∇Pk

i=1 wiFi(x?) = 0. From Lemma

4.5 it follows that ∇||F(x?)−t||2= 0. Moreover, Theorem 4.6 guarantees that

∇2||F(x?)−t||2is positive definite. Thus, the necessary and sufficient conditions

for x?being an isolated solution of RPP ’are satisfied.

Comparison with Normal Boundary Intersection As mentioned in Section 1,

the Recovering-IS algorithm concentrates on the computation of a complete set of

alternatives ([15]). Another method following this concept is the normal boundary

intersection (NBI) method described in [7]. To apply this method, the individual

minima F?

iof all objectives Fihave to be computed in order to construct the

pay-off matrix Φ∈

k×kgiven by Φij =F?

j−F?

i. The set of points in

kthat

are convex combinations of the columns of Φ, i.e., {Φβ:β∈

+,Pk

i=1 βi= 1}is

called the convex hull of individual minima (CHIM). Let ˆndenote the unit normal

to the CHIM pointing towards the origin. The main idea of NBI is, that for every

β∈ {β∈

+,Pk

i=1 βi= 1}, the intersection of the line {Φβ+tˆn:t∈

}with

the boundary ∂F(S) of F(S) = {F(x), x ∈S}is accepted to be a local Pareto

point. Although the NBI method works as satisfactorily as the Recovering-IS algo-

rithm, we want to point out that there is an essential difference from the designers

point of view. In many applications, the decision making process is based on the

trade-offs associated with every Pareto point x?that is, information on how much

the value of one objective Fjincreases when another objective Fiis improved. The

corresponding trade-off can be expressed by αj

αi, where αjand αiare the multipliers

occuring in the Kuhn-Tucker conditions associated with the given MOP: there are

αl∈

, l = 1, . . . , k, such that

l=1

αl∇Fl(x?) = 0,(4.5)

l=1

αl= 1,(4.6)

αl≥0, l = 1, . . . , k. (4.7)

Certainly, α= (α1, . . . , αk)tcan be obtained by first calculating the gradients of

all objectives (nk derivatives!) and then solving (4.5), (4.6), (4.7) for α. But the

Recovering-IS algorithm has the advantage that for every calculated Pareto point

x?the vector F(x?)−t >p0 is available, where t∈

kdenotes the respective target

used to obtain x?. Multiplying the Kuhn-Tucker equation

∇||F(x?)−t||2=1

||F(x?)−t||2

l=1

(Fl(x?)−tl)∇Fl(x?) = 0

associated with RPP ’by

||F(x?)−t||2

l=1 Fl(x?)−tl∈

makes clear that α, given by

αi=Fi(x?)−ti

l=1 Fl(x?)−tl

, i = 1, . . . , k,

already solves (4.5), (4.6), (4.7). In other words, if the Pareto set is computed by the

Recovering-IS algorithm, αcan be simply obtained by scaling the vector F(x?)−t

instead of calculating all derivatives, which can be relatively expensive, and then

solving (4.5), (4.6), (4.7).

4.3 Convex Objectives

The Recovering-IS algorithm has some crucial advantages in the case where the MOP

is unconstrained and where all objectives are strictly convex. In Theorem 4.11 we

will show that in this case, given a solution x?and F?:= F(x?), suitable targets for

finding further solutions in the neighborhood of x?can always be chosen on the tan-

gent space TF?F(P). Moreover, we will show in Theorem 4.9 that the complete set

of alternatives in image space as generated by the Recovering-IS algorithm implies

that the corresponding Pareto set in parameter space is also complete. To this end,

denote by <·,·>the standard scalar product in

nand consider the following

Lemma 4.8 Let F∈C1(Rn,R)be strictly convex and let x1, x2∈Rn, x16=x2, such

that F(x1) = F(x2). Then we have

hx2−x1,∇F(x1)i<0.

Proof: Since Fis strictly convex and F(x1) = F(x2) we have

F(λ x2+ (1 −λ)x1)< λF(x2) + (1 −λ)F(x1) = F(x1)

for all λ∈(0,1). In particular, for ¯x=1

2x1+1

2x2we can write

F(¯x)< F(x1),

z:= F(¯x)−F(x1)<0.

Again using the strict convexity of F, we have

F(λ¯x+ (1 −λ)x1)< λF(¯x) + (1 −λ)F(x1)

=F(x1) + λ(F(¯x)−F(x1)),

or F(λ¯x+ (1 −λ)x1)−F(x1)

λ< z < 0

for all λ∈(0,1). Finally, since h¯x−x1,∇F(x1)iis the directional derivative of F

at x1corresponding to the direction ¯x−x1, we can conclude

hx2−x1,∇F(x1)i= 2 h¯x−x1,∇F(x1)i

= 2 lim

λ&0

F(λ¯x+ (1 −λ)x1)−F(x1)

λ<2z < 0.

Now we are in the position to prove the following

Theorem 4.9 Let Fi∈C1(Rn,R)be strictly convex for i= 1, . . . , k.

Define F(x) := (F1(x), . . . , Fk(x))tand consider the multi-objective optimization

problem

min

x∈RnF(x).

Let P⊂Rnbe the corresponding Pareto set. Then the mapping

Φ : P→F(P), x 7→ F(x)∀x∈P

is one-to-one.

Proof: Since Φ is obviously surjective it remains to show that Φ is injective.

Assume that there are x1, x2∈P, x16=x2, such that F(x1) = F(x2).Then there are

α1, . . . , αk≥0 with

i=1

αi= 1 such that

i=1

αi∇Fi(x1) = 0.(4.8)

Since the objectives Fiare strictly convex and Fi(x1) = Fi(x2) for i= 1, . . . , k, it

follows from Lemma 4.8, that

hx2−x1,∇Fi(x1)i<0 for all i= 1, . . . , k. (4.9)

W.l.o.g. let α1>0. Then we have from (4.8)

∇F1(x1) = −

i=2

αi

α1∇Fi(x1)

and consequently

hx2−x1,∇F1(x1)i=−

i=2

αi

α1hx2−x1,∇Fi(x1)i ≥ 0,

which is a contradiction to (4.9). Thus, F(x1)6=F(x2) that is, Φ is injective. This

completes the proof.

As mentioned at the beginning of this section, the choice of targets is very easy

in the case of convex objectives. Before we show this fact, in the following lemma

we first state that the vector F?−t?given by the difference between a target t?

and F?=F(x?), where x?is the Pareto optimal point found by solving RPP’, is

orthogonal to the tangent space TF?F(P) with respect to <·,·>.

Lemma 4.10 Let Fi∈C1(Rn,R)for i= 1, . . . , k, and consider the multi-objective

optimization problem

min

x∈RnF(x).

Denote by Pthe corresponding Pareto set and let F?:= F(x?), where x?∈

nis

the unique solution of RPP’ associated with the target t?<pF?.Then

F?−t?∈TF?F(P)⊥.

Proof: Let ∂F denote the boundary of {F(x) : x∈

n}. Since F(P) locally forms

a differentiable manifold (see [25]), there exists a differentiable curve α: [−1,1] →

∂F with α(0) = F?,α0(0) ∈TF?F(P) and α(λ)∈F(P) for all λ∈[0,1]. Then,

since x?is a solution of RPP’, λ= 0 is a solution of

min

λ∈[−1,1] ||α(λ)−t?||2

and therefore

dλ||α(λ)−t?||2=d

dλ hα(λ)−t?, α(λ)−t?i= 2 hα(λ)−t?, α0(λ)i= 0

for λ= 0. With α(0) = F?we obtain

hF?−t?, α0(0)i= 0,

that is, F?−t?∈TF?F(P)⊥.

Theorem 4.11 Denote by Pthe Pareto set of an unconstrained MOP and assume

that the objectives F1, . . . , Fkare strictly convex . Let x?∈Pbe obtained by solving

RPP ’with target t?<pF?:= F(x?). Then for every ¯

F∈F(P)\{F?}there is a

target ˜

t∈TF?F(P),˜

t <p¯

Fsuch that for the solution ˜xof RPP ’with target ˜

twe

have F(˜x) = ¯

Proof: First observe, that Theorem 4.9 guarantees that there is a unique solution

¯xof MOP with F(¯x) = ¯

F. Then

Fi(λ¯x+ (1 −λ)x?)< λ ¯

Fi+ (1 −λ)F?

and therefore

Fi(λ¯x+ (1 −λ)x?)−F?

i< λ(¯

Fi−F?

i) (4.10)

for every λ∈(0,1) and for every i= 1, . . . , k. Since F(P) forms a smooth manifold,

see [25], there exists a curve α: [0,1] →F(P) with α(0) = F?and

αi(λ)≤Fi(λ¯x+ (1 −λ)x?) (4.11)

for all i= 1, . . . , k and λ∈[0,1]. Thus, from (4.10) and (4.11) it follows that

αi(λ)−F?

i< λ(¯

Fi−F?

for λ∈(0,1) and i= 1, . . . , k. Multiplication by (t?

i−F?

i)<0 yields

(t?

i−F?

i)(αi(λ)−F?

i)> λ(t?

i−F?

i)( ¯

Fi−F?

for λ∈(0,1) and i= 1, . . . , k. With F?=α(0), after summation of these kequations

and division by λ6= 0, we have

t?−F?,αi(λ)−α(0)

λ>t?−F?,¯

F−F?

for λ∈(0,1). The left hand side of this inequality vanishes for λ&0, because

lim

λ&0

αi(λ)−α(0)

λ∈TF?F(P)

and, as stated in Lemma 4.10,

t?−F?∈TF?F(P)⊥.

Thus we have

t?−F?,¯

F−F?<0.

Now, for any vector v∈

kwith hv, t?−F?i>0 define

µv:= −t?−F?,¯

F−F?

hv, t?−F?i.

Then µv>0 and

(¯

F+µvv)−F?, t?−F?=¯

F−F?, t?−F?+µvhv, t?−F?i= 0,

that is, ¯

F+µvv∈TF?F(P).

Since ¯xis a solution of MOP, there are ¯wi≥0, i = 1, . . . , k, such that

i=1

¯wi∇Fi(¯x)=0

and since the objectives Fiare strictly convex we know that ¯xsolves WS. More-

over, the Hessians ∇2Fiare positive definite and therefore we can conclude from

Lemma 4.5 and Theorem 4.6, that there is a target ˆ

t <p¯

Fsuch that ˜x:= ¯x

is a solution of RPP ’. Obviously, ˜xis also a solution of RPP ’for all targets

t∈T:= {¯

F+µ(ˆ

t−¯

F) : µ > 0}. Moreover, since t?<pF?and t <p¯

Ffor

all t∈T, we have t−¯

F, t?−F?>0 for all t∈T. Finally, for the choice

t:= ¯

F+µˆ

t−¯

F(ˆ

t−¯

F)∈Twe have ˜

t∈TF?F(P).

4.4 A Hierarchical Concept for Computing the Desired Part

of the Pareto Set

In many applications, after considering a discrete representation of the entire Pareto

set, the decision maker finds out that the best Pareto points e.g. for adjusting the

technical system must be somewhere in a particular part of the Pareto set. To make

a more concrete decision, a more detailed representation of that part is desired

that is, more Pareto points must be available such that their density is significantly

higher within that particular part. Of course, one could use boxes of suitable depth

in the Recovering-IS algorithm, such that the computed representation is sufficiently

dense. But since in the end only an a priori unknown part of the Pareto set is of

special interest, this proceeding is not very efficient. Here, too many unneeded points

are computed. On the other hand, an initial representation of suitable density is

required in order to analyze which part of the Pareto set one wants to concentrate

on. Here, again the question arises how to choose the depth of the boxes, such

that the required density is obtained, whereas the computational effort has to be

adequate.

In the following we propose a hierarchical concept based on a multi-level subdi-

vision scheme, which helps to successively extend the computed set of Pareto points

within those boxes which are selected by the decision maker for further considera-

tion. The extensions are obtained by first subdividing the selected boxes and then

applying the Recovering-IS algorithm exclusively to those ”subboxes” containing

Pareto points found so far. Consequently, the points of the computed extensions

are well-distributed within the selected boxes and the density among these points

increases as the iteration proceeds, whereas the computation of too many undesired

Pareto points is avoided.

In Figure 13 the solution for Example 4.1 as obtained by our Recovering-IS

algorithm in combination with the interactive hierarchical concept described above

is depicted. After two steps of subdivision and execution of our algorithm, the

decision maker decided to concentrate on a certain region (marked box) of the image

space. Thus, further subdivisions and executions of our algorithm were restricted

to this region. These computations were continued until the decision maker was

satisfied with the density of the generated Pareto points within this region.

Figure 13: The solution (in image space) of Example 4.1 computed by the

Recovering-IS algorithm in combination with the described hierarchical concept.

5 Basic Methods for Solving Bi-Level Optimiza-

tion Problems

Many different approaches for solving (classical) bi-level optimization problems have

been proposed in the past, as there are for example descent algorithms, bundle

algorithms, penalty methods, trust region methods, smoothing methods and branch-

and-bound methods. Many of these approaches are based on the conversion of

the bi-level problem to an ordinary (or classical) optimization problem (a one-level

problem). One possibility is to replace the lower level objective fby an additional

non-differentiable equation f(x, y) = ϕ(y), where ϕ(y) = minx{f(x, y) : g(x, y)≤

0, h(x, y) = 0}. Other approaches use the implicit function theorem to derive a

local description of the function x(y) :

m→

n, which is then inserted into the

higher level problem. As mentioned in Section 2.2, another concept is to replace the

lower level problem by its Kuhn-Tucker conditions. In general, the resulting one-

level problem BLP’, which is a mathematical program with equilibrium constraints or

MPEC, see [33], is not equivalent to the original problem, but the desired equivalence

is ensured in the particular case where the lower level problem is a convex one. Since

in this work we will basically concentrate on optimistic formulations of BLMOP

with convex lower level problems, we will follow this concept based on lower level

Kuhn-Tucker conditions. Here, one essential difficulty to overcome in the presence

of lower level inequality constraints gi(x, y)≤0 is given by the fact that in this

case the usual constraint qualifications like linear independence or Mangasarian-

Fromowitz constraint qualifications for the higher level problem are violated at every

feasible point. A proof on this can be found for example in [12]. We will give

an alternative proof concerning linear independence constraint qualifications in the

more general context of BLMOP in Section 6.1. In order to overcome the mentioned

difficulties concerning constraint qualifications when solving the respective auxiliary

problems corresponding to BLMOP with lower level inequality constraints, we will

solve reformulations constructed by the use of merit functions ([47, 19]) as described

in the next section. We will also give a review on smoothing methods ([28]), which

help to improve the numerical solvability of the reformulated problems.

5.1 Merit Functions and Smoothing Methods

As we will see in the proof of Theorem 6.3, the mentioned violation of constraint

qualifications originates from the inequalities gi(x, y)≤0, τi≥0, i = 1, . . . , q and

the complementarity equations τigi(x, y)=0, i = 1, . . . , q in BLP’. A way out is

to replace these inequalities and equations, which act as constraints in BLP’, by

qequivalent constraints of the form Φ(τi,−gi(x, y)) = 0, where Φ :

2→

is a

so-called merit function. With this replacement and the notation ¯

∇:= ∇(x), BLP’

is reformulated as

min

x,y,τ,ζ F(x, y) (BLP”)

s.t. G(x, y)≤p0,

H(x, y)=0,

∇f(x, y) +

i=1

ζi¯

∇hi(x, y) +

i=1

τi¯

∇gi(x, y)=0,

h(x, y)=0,

Φ(τi,−gi(x, y)) = 0 for i= 1, . . . , q.

Several merit functions have been proposed in the past, see [47, 19]. Unfortunately,

it turned out that these merit functions are in general non-differentiable. Surely, non-

smooth methods can be used for the solution of merit function based reformulations,

but it is worth looking for differentiability conserving alternatives in order to benefit

from the advantages of gradient based optimization methods, that is in particular,

the higher rate of convergence.

The idea of smoothing methods is to use a family of smoothing functions Φε:

2→

, which are smooth and use a smoothing parameter εin order to approximate

the merit function Φ in the sense that

lim

ε&0Φε(a, b) = Φ(a, b)∀a, b ∈

In the following we will sometimes use synonymously the notation Φ(a, b, ε) :=

Φε(a, b). Examples of smoothing functions can for instance be found in [28]. The

method proposed in this work is particularly based on one of these smoothing func-

tion, the perturbed Fischer-Burmeister function.

Definition 5.1 The Fischer-Burmeister function Φ :

2→

is defined by

Φ(a, b) = a+b−√a2+b2.

For ε≥0, the perturbed Fischer-Burmeister function Φ :

3→

is defined by

Φ(a, b, ε) = a+b−√a2+b2+ε.

Observe that the perturbed Fischer-Burmeister function Φ has the following prop-

erties:

(i) for ε > 0, Φ is continuously differentiable with respect to (a, b) on

(ii) for ε= 0, Φ is continuously differentiable with respect to (a, b) on

2\(0,0)t,

(iii) Φ(a, b, 0) = 0 ⇔a≥0, b ≥0,and ab = 0.

Consequently, BLP” is equivalent to BLP’, but for every i= 1, . . . , q, BLP” includes

just one single equality constraint Φ(τi,−gi(x, y)) = 0 instead of the constraints

gi(x, y)≤0, τi≥0 and τigi(x, y) = 0, that is, the sources for the mentioned

violation of the constraint qualifications are circumvented. Another difference is,

that BLP” is not everywhere, but anyway almost everywhere differentiable, such

that one might generically expect to find solutions of BLP” using derivative based

algorithms. But as mentioned in [28], well-posedness can be improved in the sense

that feasibility and constraint qualifications, hence stability, are often more likely to

be satisfied, if Φ(τi,−gi(x, y)) is replaced by Φ(τi,−gi(x, y), ε) for some small ε > 0.

Moreover, solvability of quadratic approximation problems is improved in this case,

which opens the way to use sequential quadratic programming methods (SQP) from

classical nonlinear optimization.

Remark 5.2 (i) In general, it is not clear whether a solution of a perturbation

of BLP” according to ε > 0 approximates the solution of the original problem.

To see that under certain assumptions this desired approximation property is

indeed guaranteed, let

Φ(τ, x, y, ε) := (Φε(τ1,−g1(x, y)),...,Φε(τq,−gq(x, y)))t

and

H(x, y, τ, ζ, ε) := 





H(x, y)

∇f(x, y) +

i=1

ζi¯

∇hi(x, y) +

i=1

τi¯

∇gi(x, y)

h(x, y)

Φ(τ, x, y, ε)







Then, the perturbed variant of BLP” can be written as

min

x,y,τ,ζ F(x, y) (BLP(ε))

s.t. G(x, y)≤p0,

H(x, y, τ, ζ, ε)=0,

which makes up a parametric optimization problem as considered in [17].

Then, denoting by X(ε) := (x(ε), y(ε), τ(ε), ζ(ε)) the solution of BLP(ε) for

every ε, in [17] it is shown that – under suitable assumptions – the mapping

ε→X(ε) is continuous in a neighborhood of ε= 0. Consequently, we can

expect that for a sufficiently small perturbation parameter ε > 0, the solution

X(ε) approximates X(0), that is, the solution of BLP”.

(ii) Moreover, if X(¯ε) has been computed for some ¯ε > 0, the results of [17]

constitute a linear approximation for the mapping ε→X(ε) at ¯ε, which can

be used to derive different a posteriori error estimations, e.g., an estimation of

the form ||(x(¯ε), y(¯ε)) −(x(0), y(0))||.

There remains the question – which can in general hardly be answered a priori – how

to choose ε > 0, such that the corresponding reformulation is sufficiently well-posed

and the optimization results in a point which is somehow ”close enough” to the

solution of the original reformulation, that is, for the case ε= 0. Consequently, a

common technique is to start with a relatively large perturbation parameter ε0>0,

which has to converge towards 0 during the optimization process. Such techniques

are well-known in the field of Mathematical Programs with Complementarity Con-

straints and can be divided into implicit and explicit methods, see for instance [28].

In implicit smoothing methods, the smoothing parameter εis included as one

of the optimization variables of the problem reformulation and is updated at each

iteration just like the other variables. To this end, an additional equality constraint,

e.g. of the form eε−1 = 0, which is satisfied if and only if ε= 0, has to be included

in the reformulation (here, edenotes Euler’s constant).

In explicit smoothing methods the smoothing parameter is updated separately,

that is, after obtaining a solution for a fixed perturbation parameter ε > 0, εhas

to be decreased successively and in every cycle a subproblem associated with a new

perturbation parameter has to be solved, starting from the solution of the respective

previous cycle. Here, a strategy for decreasing the perturbation parameter is needed.

Furthermore, a termination criterion, i.e., an a posteriori estimation is required in

order to decide at the end of every cycle whether the solution is satisfying in the

sense that it is ”close enough” to the solution of the original reformulation, or not.

6 Bi-Level Multi-Objective Optimization Problems

(BLMOP)

In this section, we present the main topic of this thesis. First we derive some

theoretical results including optimality conditions for a certain class of BLMOP,

that is in particular, BLMOP without lower level inequality constraints. Then we

present some algorithms for the solution of these problems, which are based on the

mentioned optimality conditions and we prove convergence for such algorithms. We

also consider a special subclass of BLMOP, the Pareto set constrained multi-objective

optimization problems (PSCMOP), which are characterized by the fact that the

lower level problem is not a parametrized one. For the sake of completeness, we give

some basic ideas for the solution of BLMOP with lower level inequality constraints in

Section 6.5 and for non-convex and non-smooth BLMOPs in Section 6.6. Moreover,

in Section 6.4 we propose methods which are tailored to solve non-convex and non-

smooth PSCMOPs.

6.1 An Optimality Condition for BLMOPs without Lower

Level Inequality Constraints

For the case where lower level inequality constraints are absent, that is q= 0, we

want to state a necessary condition for a solution of BLMOP on the basis of the

Kuhn-Tucker conditions (2.2) for MOP. We restrict ourselves to this particular case,

because, as we will see later on, lower level inequality constraints lead necessarily to

the violation of constraint qualifications which have to be satisfied for the desired

optimality condition for BLMOP stated in Theorem 6.1. Later on in Section 6.5

we will apply the results of this section to reformulations based on the Fischer-

Burmeister function in order to handle the general BLMOP with both equality and

inequality constraints for both the higher and lower level.

We rewrite the optimistic formulation BLMOP-O for the case without lower level

inequality constraints as

min

x∈

n,y∈

mF(x, y),(BLMOP-O’)

s.t. G(x, y)≤p0,

H(x, y) = 0,

and xsolves: min

x∈

nf(x, y),

s.t. h(x, y) = 0.

In the following let L:

n×

m×

l×

p→

L(x, y, α, ζ) :=

i=1

αifi(x, y) +

i=1

ζihi(x, y)

be the lower level Lagrangian. Denote by Lxithe derivative of Lwith respect to

xi,i= 1, . . . , n. Let Iα:= {i:αi= 0}⊂{1, . . . , l}and for fixed x∈

n, y ∈

let IG(x, y) := {i:Gi(x, y) = 0} ⊂ {1, . . . , s}. The gradient ∇f(x) of a function

f:D⊂

n→

is understood to be a row vector and the Hessian of fis denoted

by ∇2f(x). Accordingly, the i−th row of the Jacobian ∇g(x) of a vector valued

function g:D⊂

n→

kis given by the gradient ∇gi(x) of the i−th component

of g. Furthermore, denote by eithe i−th vector of the standard basis in

n+m+l+p

and let ¯

∇:= ∇(x),˜

∇:= ∇(y),ˆ

∇:= ∇(x,y), and ∇:= ∇(x,y,α,ζ).

Theorem 6.1 Let (x∗, y∗)be a solution of BLMOP. If the gradients

∇hi(x∗, y∗), i = 1, . . . , p,

are linearly independent, then there exist α1, . . . , αl≥0,ζ1, . . . , ζp∈

such that

(i)¯

∇L(x∗, y∗, α, ζ) = 0,(6.1)

(ii)

i=1

αi= 1.(6.2)

Assume that in addition all (x, y, α, ζ)satisfying (6.1), (6.2), αi≥0, and h(x, y) = 0

correspond to Pareto points for the lower level problem. Furthermore assume that

the first nrows of the Hessian

∇2L(x∗, y∗, α, ζ)

of the lower level Lagrangian with respect to (x, y)and the gradients

∇hi(x∗, y∗) = ¯

∇hi(x∗, y∗),˜

∇hi(x∗, y∗), i = 1, . . . , p,

∇Hi(x∗, y∗) = ¯

∇Hi(x∗, y∗),˜

∇Hi(x∗, y∗), i = 1, . . . , r,

∇Gi(x∗, y∗) = ¯

∇Gi(x∗, y∗),˜

∇Gi(x∗, y∗), i ∈IG(x∗, y∗)

of the active higher and lower level constraints with respect to (x, y), are linearly

independent. Then there exist β1, . . . , βk,µ1, . . . , µl,δ1, . . . , δs≥0,ω1, . . . , ωp,

ν, λ1, . . . , λn,ρ1...,ρr∈

such that

(iii)

i=1

βi∇Fi(x∗, y∗) +

i=1

λi∇Lxi(x∗, y∗, α, ζ) (6.3)

i=1

ρi∇Hi(x∗, y∗) +

i=1

δi∇Gi(x∗, y∗)

i=1

ωi∇hi(x∗, y∗) +

i=1

(ν−µi)en+m+i= 0,

(iv)

i=1

βi= 1,(6.4)

(v)µiαi= 0 for i= 1, . . . , l, (6.5)

(vi)δiGi(x∗, y∗) = 0 for i= 1, . . . , s. (6.6)

Proof: Let (x∗, y∗) be a solution to BLMOP. Then x∗is a solution to the lower level

problem parametrized by y∗. Since ¯

∇hi(x∗, y∗), i = 1, . . . , p are linearly independent,

the Kuhn-Tucker condition 2.2 holds: there are α1, . . . , αl, ζ1, . . . , ζp∈

such that

∇L(x∗, y∗, α, ζ)=0,(6.7)

i=1

αi= 1 (6.8)

and

αi≥0, i = 1, . . . , l. (6.9)

Denote by SLthe set of those solutions of (6.7),(6.8),(6.9) for which h(x, y) = 0 holds.

Observe that – according to the assumptions of the theorem – for any (x, y, α, ζ)∈

SL,xis Pareto optimal for the lower level problem corresponding to y. Thus, we can

define the following auxiliary problem, which is equivalent to the given BLMOP:

min

x,y,α,ζ F(x, y),(6.10)

s.t G(x, y)≤p0,

H(x, y) = 0,

(x, y, α, ζ)∈SL.

In order to apply the Kuhn-Tucker conditions (2.2) to this auxiliary problem, we

have to ensure that the gradients with respect to (x, y, α, ζ) of the active constraints,

that is,

∇hi(x∗, y∗), i = 1, . . . , p, (6.11)

∇Hi(x∗, y∗), i = 1, . . . , r, (6.12)

∇Gi(x∗, y∗), i ∈IG(x∗, y∗),(6.13)

∇Lxi(x∗, y∗, α, ζ), i = 1, . . . , n, (6.14)

∇

i=1

αi=

i=1

en+m+i,(6.15)

∇(−αi) = −en+m+i, i ∈Iα,(6.16)

are linearly independent or equivalently, that the matrix M∈Rp+r+s+n+|Iα|+1×n+m+l+p

defined by

M:=







∇h(x∗, y∗)˜

∇h(x∗, y∗)0 0

∇H(x∗, y∗)˜

∇H(x∗, y∗)0 0

∇G(x∗, y∗)˜

∇G(x∗, y∗)0 0

∇2L(x∗, y∗, α, ζ)˜

∇¯

∇L(x∗, y∗, α, ζ)t¯

∇f(x∗, y∗)t¯

∇h(x∗, y∗)t

0 0 −˜

0 0 1. . . 10







has maximal rank, where the rows of ˜

Iare given by the transposed of distinct

standard basis vectors of

laccording to (6.16). Since

¯

∇2L(x∗, y∗, α, ζ)˜

∇¯

∇L(x∗, y∗, α, ζ)t

complies with the first n(linearly independent) rows of the Hessian ˆ

∇2L(x∗, y∗, α, ζ),

the rows of the upper left block







∇h(x∗, y∗)˜

∇h(x∗, y∗)

∇H(x∗, y∗)˜

∇H(x∗, y∗)

∇G(x∗, y∗)˜

∇G(x∗, y∗)

∇2L(x∗, y∗, α, ζ)˜

∇¯

∇L(x∗, y∗, α, ζ)t







of Mare linearly independent and since Mhas a certain triangular block structure,

it remains to show that the submatrix

−˜

1. . . 1∈R|Iα|+1×l(6.17)

has maximal rank. Indeed, with (6.8) and (6.9) we have |Iα|< l that is, (6.17)

has at most lrows. Obviously, these rows are linearly independent and thus both

(6.17) and also Mhave maximal rank. Consequently, (6.11), (6.12), (6.13), (6.14),

(6.15) and (6.16) are linearly independent and we can write down the Kuhn-Tucker

conditions (2.2) for the auxiliary problem (6.10): there are β1, . . . , βk,µ1, . . . , µl,

δ1, . . . , δs≥0, ω1, . . . , ωp,ν,λ1, . . . , λn,ρ1, . . . , ρr∈

such that

i=1

βi∇Fi(x∗, y∗) +

i=1

λi∇Lxi(x∗, y∗, α, ζ) (6.18)

+ν

i=1 ∇αi−

i=1

µi∇αi+

i=1

ωi∇hi(x∗, y∗)

i=1

ρi∇Hi(x∗, y∗) +

i=1

δi∇Gi(x∗, y∗) = 0,

i=1

βi= 1,(6.19)

µiαi= 0 for i= 1, . . . , l, (6.20)

δiGi(x∗, y∗) = 0 for i= 1, . . . , s. (6.21)

Observing that ∇αi=en+m+ifor i= 1, . . . , l, we obtain from (6.18)

i=1

βi∇Fi(x∗, y∗) +

i=1

λi∇Lxi(x∗, y∗, α, ζ) +

i=1

(ν−µi)en+m+i(6.22)

i=1

ωi∇hi(x∗, y∗) +

i=1

ρi∇Hi(x∗, y∗) +

i=1

δi∇Gi(x∗, y∗)=0.

Remark 6.2 (i) In the case without explicitly given constraints, that is r=s=

p= 0, the maximal rank condition for the matrix Mis already satisfied, if the

Hessian ¯

∇2L(x∗, y∗, α, ζ) (with respect to x) is regular.

(ii) After introducing slack variables to express the involved inequalities as equa-

tions, Theorem 6.1 along with the given constraints corresponds to a system

of nonlinear equations that has (k−1) more variables, than it has equations,

if k≥2. If k= 1, that is, if we have one single higher level objective F1, then

(6.3) simplifies to

∇F1(x∗, y∗) +

i=1

λi∇Lxi(x∗, y∗, α, ζ)

i=1

ωi∇hi(x∗, y∗) +

i=1

ρi∇Hi(x∗, y∗) +

i=1

δi∇Gi(x∗, y∗)

i=1

(ν−µi)en+m+i= 0

and (6.4) simplifies to β1= 1. In this case the system turns out to be quadratic

in the sense that the number of variables equals the number of equations.

(iii) If (x∗, y∗) is an inner point of the solution set with respect to Gand α, that is,

IG(x∗, y∗) = ∅and Iα=∅, then µi= 0 for i= 1, . . . , l,δi= 0 for i= 1, . . . , s,

and (6.3) simplifies to

i=1

βi∇Fi(x∗, y∗) +

i=1

λi∇Lxi(x∗, y∗, α, ζ)

i=1

ωi∇hi(x∗, y∗) +

i=1

ρi∇Hi(x∗, y∗) + ν

i=1

en+m+i= 0.

In the next sections we will propose algorithms based on Theorem 6.1 which are

tailored to solve BLMOPs without lower level inequality constraints. In Section 6.5

we will consider the more general BLMOPs, where lower level inequality constraints

are involved. Here, we will approximate the original problems by related BLMOPs

where the lower level inequality constraints are replaced by certain equality con-

straints which are constructed by using the perturbed Fischer-Burmeister function.

In this way, Theorem 6.1 will also be the basis for our algorithms for the solution of

BLMOPs with lower level inequality constraints.

There remains the question why Theorem 6.1 is stated only for BLMOPs without

lower level inequality constraints. For this, we state the following

Theorem 6.3 Let the Kuhn-Tucker conditions of an inequality constrained MOP

act as constraints for a higher level optimization problem. Then the linear inde-

pendence constraint qualifications for the higher level problem are violated at every

feasible point.

Proof: W.l.o.g., we assume that there are no higher level constraints and no

lower level equality constraints. In this case, with a lower level problem involving

objectives f:

n+m→

land inequality constraints g:

n+m→

q, the lower

level Lagrangian is given by

L(x, y, α, τ) :=

i=1

αifi(x, y) +

i=1

τigi(x, y).

Let ∇:= ∇(x,y,α,τ)and denote by Lxithe derivative of Lwith respect to xi,i=

1, . . . , n. Then, the Kuhn-Tucker reformulation associated with the lower level is

given by

∇L(x, y, α, τ) = 0,

i=1

αi= 1,

αi≥0, i = 1, . . . , l,

gi(x, y)τi= 0, i = 1, . . . , q,

τi≥0, i = 1, . . . , q.

Consequently, the linear independence condition for applying the Kuhn-Tucker con-

ditions to the higher level problem means that the gradients of the active constraints

(which are defined by the lower level constraints and the lower level Kuhn-Tucker

conditions) with respect to (x, y, α, τ), that is

∇Lxi(x, y, α, τ), i = 1, . . . , n,

∇(gi(x, y)τi), i = 1, . . . , q,

∇τi=en+m+l+i, i ∈Iτ={i:τi= 0},

∇gi(x, y), i ∈Ig={i:gi(x, y)=0},

∇

i=1

αi,

∇αi=en+m+i, i ∈Iα={i:αi= 0}

have to be linearly independent. But for i∈Igwe have

∇(gi(x, y)τi) = ( τi¯

∇gi(x, y)τi˜

∇gi(x, y)0 0 ),

∇gi(x, y) = ( ¯

∇gi(x, y)˜

∇gi(x, y)0 0 ),

and for i∈Iτ, we have

∇(gi(x, y)τi) = ( 0 0 0 eigi(x, y) ),

∇τi= ( 0 0 0 ei).

Consequently, since the condition gi(x, y)τi= 0 implies that i∈Ig∪Iτfor all

i= 1, . . . , q, any lower level inequality constraint violates the desired linear inde-

pendence conditions.

6.2 Methods for Solving BLMOPs without Lower Level In-

equality Constraints

In this section we propose new algorithms for the computation of the Pareto set of

a given BLMOP without lower level inequality constraints. We first consider how

to solve the BLMOP in the particular case k= 1, where the Pareto set turns out to

be a singleton and then we present the BL-Subdivision, the BL-Recovering-PS, and

the BL-Recovering-IS algorithms for the computation of both tight coverings and

discrete representations of the typically extensive Pareto sets in the case k > 1.

Here, we concentrate on derivative based methods, which are based on the so-

lution of a system of equations derived from Theorem 6.1. Therefore, we have to

assume that for any fixed y∈

mall feasible substationary points x∗∈

nof

the corresponding lower level problem are Pareto optimal, e.g., if the lower level

problem is a convex one. In this case the algorithms to be described below compute

a set of points, which certainly solve the corresponding lower level problems, but in

general this set is a superset of the Pareto set of the given BLMOP. Consequently,

a nondominance and feasibility test with respect to the higher level has to be per-

formed among these points in order to filter out the desired Pareto points. Efficient

strategies for nondominance tests can be found in [27].

Obviously, such nondominance tests can be omitted, if it is ensured that all

substationary points of the higher level problem are necessarily Pareto optimal for

the given BLMOP. This is for instance the case, if – in addition to the mentioned

assumptions for the lower level – also the higher level along with the higher level

constraints and the constraints implicitly given by the lower level solutions turns

out to be a convex problem.

Algorithms for the solution of BLMOPs which also include lower level inequality

constraints are presented in Section 6.5. Moreover, in Section 6.4 and Section 6.6

we present algorithms, which do not require derivatives or convexity of the involved

functions.

There are several possibilities to derive from Theorem 6.1 the system of equations

which has to be solved in order to compute individual Pareto points. Here, we want

to present a variant which uses slack variables s, t, u, v, w in order to incorporate the

higher level inequality constraints and the requirement that some multipliers have to

be non-negative. For this, let ˜z:= (x, y, α, ζ, β, µ, ω, δ, ν, λ, ρ, s, t, u, v, w) and denote

by a◦b∈

dthe component wise or Hadamard product of vectors a, b ∈

d, that is,

(a◦b)i=aibifor i= 1, . . . , d. Then, with the notations of the preceding sections,

we propose to solve the following system:

F(˜z) =







i=1

βi∇Fi(x, y) +

i=1

λi∇Lxi(x, y, α, ζ) +

i=1

(ν−µi)en+m+i

i=1

ωi∇hi(x, y) +

i=1

ρi∇Hi(x, y) +

i=1

δi∇Gi(x, y)

h(x, y)

H(x, y)

G(x, y) + w◦w

∇L(x, y, α, ζ)

i=1

βi−1

i=1

αi−1

µ◦α

δ◦G(x, y)

α−s◦s

β−t◦t

µ−u◦u

δ−v◦v







=0.

Observe that the Lxi(x, y, α, ζ) include first derivatives and so the ∇Lxi(x, y, α, ζ)

appearing in the first components of ˜

Finclude second derivatives. Moreover, deriva-

tives of all components of ˜

Fare required if one wants to solve the system with a

derivative based, e.g., a Gauß-Newton method. Thus, in this case derivatives up to

third order are needed for approaches based on the solution of ˜

F(˜z) = 0.

In some applications, which are affected by uncertainties, it might be required

to exclude those points, which are located close to the boundary of the solution set

with respect to Gand α. This can be achieved by a simplification of the system

according to Remark 6.2. For this, we use some sufficiently small ε > 0 to guarantee

that all components of αare strictly positive and that all components of G(x, y) are

strictly negative. Let ¯z:= (x, y, α, ζ, β, ω, ν, λ, ρ, s, t, w) and denote by 1dthe vector

(1,...,1) ∈

d. Then, for obtaining the ’inner’ part of the solution, the following

simplified system can be solved:

F(¯z) =







i=1

βi∇Fi(x, y) +

i=1

λi∇Lxi(x, y, α, ζ) + ν

i=1

en+m+i

i=1

ωi∇hi(x, y) +

i=1

ρi∇Hi(x, y)

h(x, y)

H(x, y)

G(x, y) + w◦w+ε1s

∇L(x, y, α, ζ)

i=1

βi−1

i=1

αi−1

α−s◦s−ε

β−t◦t







Later on in this section, the difference between the solutions of ˜

F(˜z) = 0 and

F(¯z) = 0 corresponding to Example 6.6 will be shown (Figures 20 and 21). It

should be pointed out that the application of the simplified system ¯

F(¯z) = 0 can

be meaningful for the recovering methods presented later on in this section. These

methods are based on continuation techniques which require for every connected

component of the Pareto set at least one single solution out of this component as a

starting point. According to the simpler structure, if it is known that {¯z:¯

F(¯z) =

0} 6=∅, we propose to compute a solution of the system ¯

F(¯z) = 0 for finding such

starting points. Moreover, the simplification from ˜

Fto ¯

Fmight be an acceptable

compromise for those applications, where the corresponding part of the solution is

satisfactory to the decision maker while computation has to be fast. This applies

particularly for the Subdivision-BL algorithm, since the computational effort of

subdivision methods increases drastically with the dimension of the search space.

For the remainder of this work we will always write ˜

F(˜z), but we emphasize that

in practice a simplification according to Remark 6.2, e.g., of the form ¯

F(¯z), can be

used, if desired.

The Case k= 1

Before we describe the set-oriented methods for the solution of the typically extensive

Pareto sets in the case k > 1, we consider the special case k= 1, that is, if there

is only one upper level objective. In this case each connected component of the

Pareto set is typically given by a single Pareto point. Here, the task of computing

a solution of the BLMOP can be regarded as computing a preferred solution out

of the Pareto set of the lower level problem, where the selection of this preferred

solution is performed due to the upper level problem. Such problems are strongly

related to the field of optimization over the efficient set, see for instance [8], [2], [44]

and references therein. As mentioned in Remark 6.2, for k= 1 the system ˜

Fhas

the same number of variables as it has equations. Consequently, a standard Newton

method is suitable for finding a solution of this particular BLMOP.

A BLMOP Formulation for Finding Robust Pareto Points

As a special application for the case k= 1, we consider the problem of finding a

Pareto point x?of a MOP with the additional property that it is robust or insensible

in the sense that the variation of the objective values is as small as possible in a

neighborhood Uε(x?) of x?. According to the requirements associated with different

applications, there are several possibilities to model robustness or sensitivity. Here

we use the expression

σi(x) := ||∇fi(x)||2

to measure the sensitivity corresponding to the i-th (lower level) objective and we

assume that the user is interested in a solution x?for which

max

i∈L{σi(x?)}

is minimal (L:= {1, . . . , l}). Now, with f(x)=(f1(x), . . . , fl(x))twe can write

down the mathematical formulation of the described problem:

min

x∈

nmax

i∈Lσi(x),(RMOP)

where xsolves: min

x∈

nf(x).

Observe that only the lower level problem, i.e., the minimization of f, has to be un-

derstood in the multi-objective sense, whereas minimization and maximization in the

upper level problem have to be understood in the classical sense, i.e., minimization

and maximization of a scalar valued expression. In order to obtain differentiability

we consider the equivalent problem

min

x∈

n,γ∈

γ, (6.24)

where σi(x)−γ≤0∀i∈L,

and xsolves: min

x∈

nf(x),

which has the form of BLMOP.

Example 6.4 We consider the RMOP associated with the MOP

min

x∈

2f(x) = 2(x1−1)4+ 3(x2−1)2

(x1+ 1)2+ (x2+ 1)2.(6.25)

The corresponding sensitivity expressions are given by

σ1(x) = 64(x1−1)6+ 36(x2−1)2

and

σ2(x) = 4(x1+ 1)2+ 4(x2+ 1)2.

The solution was computed by solving the system ˜

F(˜z) = 0 associated with the

corresponding reformulation of the form (6.24). For the purpose of visualization

we have also computed both the entire Pareto set of (6.25) and the corresponding

sensitivity values. The results are shown in Figure 14 and Figure 15. The Pareto

optimal objective values (blue) of (6.25), the respective sensitivity values (red) and,

the solution (black) of the RMOP are shown in Figure 14. Here one can see that the

maximum of the sensitivities is minimal at the solution. The Pareto set together

with the solution of the RMOP in image space is shown in Figure 15.

BL-Subdivision Algorithm

We define different variants of the BL-Subdivision algorithm by taking the classi-

cal Subdivision algorithm presented in Section 3.1, changing the search space and

replacing the iteration scheme by an iteration scheme for solving the system (or

subsystems of) ˜

F(˜z) = 0.

To define the basic BL-Subdivision algorithm, denote by GNs(z0) the application

of ssteps of the iteration scheme, e.g., a Gauß-Newton method for solving ˜

F(˜z) = 0

with starting point z0. Let Nbe the dimension of z= ˜z. Then, for a given domain

Q=B0∈RNan iteration of the basic BL-Subdivision algorithm reads as follows:

(i) Construct from Bj−1a new system ˆ

Bjof subsets such that

[

B∈ˆ

B=[

B∈Bj−1

and

diam( ˆ

Bj) = θjdiam(Bj−1),

where 0 < θmin ≤θj≤θmax <1.

Figure 14: The Pareto optimal objective values (blue), the corresponding sensitivi-

ties (red), and the solution (black) of Problem (6.25).

Figure 15: The Pareto set of (6.25) in image space and the solution to the corres-

ponding RMOP.

(ii) For all B∈ˆ

Bj:

choose a set of starting points X ⊂ B

Bj=Bj∪{¯

B:∃z∈ X with GNs(z)∈¯

B}.

Observe that for every starting point z∈ X only a small number sof steps of

the iteration scheme for solving ˜

F(˜z) = 0 is performed and so the generated point

GNs(z) is just one point of the trajectory defined by all iterates GNi(z), i ∈N.

Moreover, for i→ ∞ the trajectory might lead to a Pareto point far away from the

’starting box’ B3zregardless of the fact that there might also be Pareto points

close to or even within B. The resulting inconvenience is, that the box Band with

it a potential part of the Pareto set might get lost while the box containing GNs(z)

is kept even if it does not contain any Pareto points. A way out is to force the

iterates to converge towards a Pareto point located as close as possible to a reference

point within the starting box. Motivated by these considerations, we propose an

alternative way to realize the above algorithm. To this end, we replace the zero

finding method by a suitable iteration scheme for the solution of the constrained

minimization problem

min

˜z||˜z−rB||,(6.26)

s.t. ˜

F(˜z)=0,

where rBdenotes a reference point within the starting box B, e.g., the center of B

or the used starting point. For an illustration of this strategy see Figure 16.

Example 6.5 To demonstrate how the Subdivision-BL algorithm works, we con-

sider the following BLMOP with two higher level objectives F1, F2, three lower level

objectives f1, f2, f3, a higher level inequality constraint G1and a lower level equality

constraint h1:

min

x∈

3, y ∈

F(x, y) = (x0+ 1)2+ (x1−1−y)4+x2

(x0−1)2+ (x1+ 1 −y)2+ (x2−0.5)4,

such that G1(x, y) = −0.5x2

0+x1+ sin y−0.5≤0,

and xsolves:

min

x∈

3f(x, y) = 



(x0−1)2+ 0.5(x1+y)2+ (x2−1)4

(x0+ 1)2+ 0.5(x1+y)2+ (x2+ 1)2

0+x2

1+ (x2+ 1)2



such that h1=x0−x1y= 0.

Figure 16: min

˜z{||˜z−rB|| :˜

F(˜z) = 0}versus ˜

F(˜z) = 0.

The results are shown in Figures 17, 18 and 19. In particular, Figure 17 shows a

projection of the generated box collection to the x-space. The individual solutions

GNs(z) of the subproblems of the last iteration have been stored in an archive.

Figure 18 and 19 show these solutions in lower and higher level image space, respec-

tively.

Let us now consider another variant of the BL-Subdivision algorithm, which uses a

strategy described in [38]. Instead of using an iteration scheme for solving the entire

system of equations, a scheme for solving just one of the components ˜

Fl(˜z) = 0,

l∈ {1, . . . , N −k+ 1}, e.g., a damped Newton method, is used in every step of

the subdivision procedure. In order to guarantee convergence, every component has

to be applied infinitely often as the algorithm proceeds. In [38] this concept was

originally used for the location of zeros of a function f:

n→

using a finite set of

iteration schemes each of them characterized by an individual step length parameter.

In this way it can be avoided that the iterates run into undesired oscillations such

that the algorithm would keep the corresponding boxes which possibly do not contain

zeros of f.

For the development of the variant of the BL-Subdivision algorithm, we extended

this idea in the sense that the different iteration schemes correspond to Newton

methods applied to the individual components ˜

Fl,l∈ {1, . . . , N −k+ 1}. For the

description of this variant let {uj}∞

k=jbe a sequence with uj∈ {1, . . . , N −k+1}for

all j∈Nand |{j:uj=l}| =∞for all l= 1, . . . , N −k+ 1. Furthermore, denote

Figure 17: Projection of the generated box collection to the x-space.

Figure 18: The solution in lower level image space.

Figure 19: The solution in higher level image space.

by Ns

l(z0) the application of ssteps of an iteration scheme, e.g., a Newton method,

for solving ˜

Fl(˜z) = 0 with starting point z0. Then, for a given domain Q=B0∈RN

an iteration of the variant of the BL-Subdivision algorithm reads as follows:

(i) Construct from Bj−1a new system ˆ

Bjof subsets such that

[

B∈ˆ

B=[

B∈Bj−1

and

diam( ˆ

Bj) = θjdiam(Bj−1),

where 0 < θmin ≤θj≤θmax <1.

(ii) For all B∈ˆ

Bj:

choose a set of starting points X ⊂ B

Bj=Bj∪{¯

B:∃z∈ X with Ns

uj(z)∈¯

B}.

BL-Recovering-PS Algorithm

If in addition to the previous assumptions at least one box containing points of the

solution has been found so far, a variant of the Recovering-PS algorithm described

in Section 3.1 can be used to generate a box covering of the corresponding connected

component of the Pareto set. To be more precise, we use a fixed subdivision depth

dand define the BL-Recovering-PS algorithm by taking the classical Recovering

algorithm and replacing the corrector step by a suitable method for solving ˜

F(˜z) = 0.

For z∈

Ndenote by Π(x,y)(z) the projection from the z-space

Nto the (x, y)-

space

n+m. Moreover, denote by B(z, d) the unique box B⊂

n+mof subdivision

depth dwith Π(x,y)(z)∈B. For every box Bgenerated by the algorithm, let

zB∈

Nbe the particular solution of ˜

F(˜z) = 0 which led to B, that is, ˜

F(zB)=0

and Π(x,y)(zB)∈B.

With these notations, for a given (incomplete) box collection Bj⊂Rn+man

iteration of the BL-Recovering-PS algorithm reads as follows:

(i) for all B∈ Bj

B.active := TRUE

(ii) for i= 1, . . . , MaxStep

Bj:= Bj

for all {B∈ Bj:B.active == TRUE}

choose a set of starting points X ∈

Nnear zB

Y:= ∅

for all z∈ X

starting from zsolve ˜

F= 0 and get the solution Z

Y:= Y ∪Z

B.active := FALSE

for all z∈ Y:

if B(z, d)6∈ ˆ

B(z, d).active := TRUE

Bj:= ˆ

Bj∪B(z, d)

if ˆ

Bj== BjSTOP

Bj+1 := ˆ

In each loop of step (ii) the aim is to extend the current box collection Bjby adding

further boxes. The algorithm terminates when no further boxes can be found or

when the maximum number of iterations MaxStep is exceeded.

Observe that in contrast to the BL-Subdivision algorithm, which uses boxes in

N, the BL-Recovering-PS algorithm works with boxes in the smaller space

n+m,

which is an advantage concerning the computational effort. Certainly, one could

implement a variant of the BL-Recovering-PS algorithm using boxes in

N. But

this would be inefficient, since for every solution (x?, y?) of the BLMOP there are

in general many solutions z∈

Nwith ˜

F(z) = 0 and Π(x,y)(z) = (x?, y?), such that

the algorithm would generate many boxes in

Nwhich all correspond to (x?, y?),

although one box would be sufficient. A main difference to the BL-Subdivision

algorithm is, that according to the local nature of the BL-Recovering-PS algorithm

all starting points zare chosen in the neighborhood of solutions zBfound so far.

Consequently, we expect that with these starting points the zero finding method

converges to further solutions of ˜

F(˜z) = 0. The situation is different for the BL-

Subdivision algorithm, because here we do not know whether the starting points

zare close enough to the solution of ˜

F(˜z) = 0. Therefore, the BL-Subdivision

algorithm has to work with boxes in

Let us now make a remark on a variant of the BL-Recovering-PS algorithm. In

general, the applied iteration scheme might lead to substationary points which are

far away from the region to be explored in order to find further Pareto points. But

originally, the idea of our recovering methods was to find in every step further Pareto

points (or boxes) in the neighborhood of a certain Pareto point (or box) found in

the previous step. For some applications, particularly if a certain part of the Pareto

set is required, it could be desired to maintain this local behavior of the algorithm

even for starting points farther from the Pareto set. To this end, we propose the

following variant of the BL-Recovering-PS algorithm. Instead of just solving the

system ˜

F(˜z) = 0, we can search for a solution of this system with the additional

property that the distance to the starting point with respect to (x, y) is minimized,

that is, we have to solve

min

˜z||Π(x,y)(˜z)−Π(x,y)(z0)||,(6.27)

s.t. ˜

F(˜z) = 0,

where z0denotes the respective starting point which now has the additional role of

a reference point. This variant was applied to the following

Example 6.6

min

x∈

3, y ∈

F(x, y) = (x0+ 1)2+ (x1−1−y)4+x2

(x0−1)2+ (x1+ 1 −y)2+ (x2−0.5)4,

such that H1(x, y) = x2

0+x2−y2= 0,

G1(x, y) = x0+x3

1+ 0.5≤0,

and xsolves:

min

x∈

3f(x, y) = 



(x0−1)2+ 0.5(x1+y)2+ (x2−1)4

(x0+ 1)2+ 0.5(x1+y)2+ (x2+ 1)2

0+x2

1+ (x2+ 1)2



such that h1=x0−x1y= 0.

Figure 20 and Figure 21 show the projections to the x−space of the sets computed by

the BL-Recovering-PS algorithm based on the solution of ˜

F(˜z) = 0 and ¯

F(¯z) = 0,

respectively. To be more precise, Figure 20 shows a representation of the entire

Pareto set according to the system ˜

F(˜z) = 0, and Figure 21 shows a representation

of the ’inner’ part of the Pareto set according to the system ¯

F(¯z) = 0. This ’inner’

part is characterized by the fact that the upper level inequality constraint is inactive,

that is, G1<0.

Choice of Starting Points There are different possibilities for generating X,

the set of starting points. Of course, randomly chosen starting points can be used.

Another approach is to take points zj=c+Piλiei, where cis the center of the

current box, the eiare basis vectors of the search space and, the λiare real num-

bers such that every Π(x,y)(zj) lies within one of the neighboring boxes where new

substationary points are supposed to be. If derivatives are available, the points zj

can be generated by computing a basis biof the tangent space to the set ˜

F−1(0) at

a previously computed point z?∈˜

F−1(0) and then setting zj=z?+Piλibi. In

this latter approach, in particular when the dimension of the search space is very

large, the number of points zjcan be chosen small compared to other approaches.

Moreover, the points zjare already located relatively close to the solution set ˜

F−1(0)

and so the corrector step needs just a few iterations. For details on computing the

basis vectors bisee [25].

Convergence of Recovering-PS Algorithms

Since our algorithms generate substationary points of the given BLMOP by solving

the system of equations ˜

F(˜z) = 0, we give a more general convergence proof con-

cerning recovering techniques for the location of zeros of a vector valued function

n→

m,n > m. The proof is achieved in two steps: first, Theorem 6.7 gives

a guideline for the choice of a minimal number of starting points piwithin a given

Figure 20: The entire solution of Example 6.6 obtained by solving ˜

F(˜z) = 0 (pro-

jection to x-space).

Figure 21: The ’inner’ part of the solution of Example 6.6 obtained by solving

F(¯z) = 0 (projection to x-space).

subset B⊂

n, e.g., a box, such that, under certain assumptions on the iteration

scheme Φ :

n→

n, convergence towards a solution of F(x) = 0 within Bis

guaranteed in the sense that

lim

s→∞Φs(pi)∈ {x:F(x)=0}∩B

for at least one pi. Thereafter, this result is used in Corollary 6.9 in order to show

convergence for general recovering methods. For a norm ||·|| denote by dist(x, A) =

miny∈A||x−y|| the distance between the point x∈

nand the compact set A⊂

Theorem 6.7 Let F(x) :

n→

m, denote P:= {x∈

n:F(x)=0}and let

B⊂

nbe an open set such that P∩B6=∅. Let Φ :

n→

nbe a mapping and

let δ > 0, λ ≥1such that for all x0∈Bwith dist(x0, P)< δ we have

(i) lim

s→∞Φs(x0)∈P,

(ii) lim

s→∞||Φs(x0)−x0|| ≤ λdist(x0, P).

Then there is d > 0such that for any set X ⊂ Bof starting points with dist(x, X)<

dfor all x∈B∩{x: dist(x, P)< δ}there is a point p0∈ X, such that

lim

s→∞Φs(p0)∈P∩B.

Proof: Since Bis open and P∩B6=∅, there is ¯x∈P,ε > 0, and an open

neighborhood Uε(¯x) of ¯x, such that

P∩Uε(¯x)⊂B.

For 0 < θ < 1 let d:= θmin{δ, ε

2λ}. Then, for any set X ⊂ Bof starting points

with dist(x, X)< d for all x∈B∩{x: dist(x, P)< δ}, there is p0∈ X, such that

dist(p0, P)≤ ||p0−¯x|| <min{δ, ε

2λ}

and therefore

x?:= lim

s→∞Φs(p0)∈P.

Moreover,

||x?−p0|| = lim

s→∞||Φs(p0)−p0|| ≤ λdist(p0, P)≤λ||p0−¯x|| <ε

and consequently

||x?−¯x|| ≤ ||x?−p0||+||p0−¯x|| <ε

2+ε

2=ε,

that is, x?∈Uε(¯x). Since Uε,(¯x)⊂B, it follows that x?is a solution of F(x)=0

within B.

Remark 6.8 If the mapping Φ is given by an iteration step for the solution of

(6.27), the requirement on λin Theorem 6.7 is satisfied with the choice λ= 1.

To guarantee that a recovering method converges towards the union of those con-

nected components of the solution set which correspond to the initial box collection

B0, in every step of the algorithm the set of starting points pihas to be chosen

properly, such that all desired boxes are found, that is, boxes which are both neigh-

bors of the boxes generated in the respective previous step and contain a part of the

respective connected component of the solution. To this end, we denote by ¯

Bthe

closure of a box Band we state the following

Corollary 6.9 Using the notations of Theorem 6.7 and denoting by B0a box

collection of (fixed) subdivision depth dcovering a part of P, assume that every step

of the Recovering or BL-Recovering-PS algorithm, respectively, is realized in a way

such that for every B∈ Bj\Bj−1starting points are chosen according to Theorem

6.7 within all boxes C∈ {C:¯

C∩¯

B6=∅, C /∈ Bj}. Moreover, assume that Pis

bounded. Then, if the mapping defined by the iteration steps for solving F(x) = 0

fulfills the requirements on the mapping Φstated in Theorem 6.7, the algorithm

terminates after a finite number of steps such that the final box collection covers

those connected components of P, which correspond to at least one B∈ B0.

BL-Recovering-IS Algorithm

Similarly as presented for classical multi-objective optimization problems in Section

4, image set-oriented variants of the recovering methods can also be defined for

BLMOPs with a convex lower level problem. To this end the search space has to

be changed and the distance minimization subproblems, which have to be solved

within every step of the algorithm, have to be replaced by zero finding problems. To

be more precise, the higher level objectives have to be replaced by the scalar valued

objective ||F(x, y)−T||, and we have to solve the system ˜

F(˜z) = 0 for this altered

BLMOP, that is, according to Remark 6.2, ˜

F(˜z) simplifies to

F(˜z;T) =







∇||F(x, y)−T||+

i=1

λi∇Lxi(x, y, α, ζ) +

i=1

(ν−µi)en+m+i

i=1

ωi∇hi(x, y) +

i=1

ρi∇Hi(x, y) +

i=1

δi∇Gi(x, y)

h(x, y)

H(x, y)

G(x, y) + w◦w

∇L(x, y, α, ζ)

i=1

αi−1

µ◦α

δ◦G(x, y)

α−s◦s

µ−u◦u

δ−v◦v







=0,

where Tis the fixed target corresponding to the individual subproblem, and ˜z

is understood to be reduced in the sense that there are no components βand t.

Therefore, in the following we can change notation by using small letters tor ti

for the targets. Observe that the number of variables agrees with the number of

equations, that is, a Newton method can be used for the solution of the system.

For a given box collection Bj⊂

k(in image space) of subdivision depth dand

denoting by zBand FBthe previously generated solution (in parameter and image

space, respectively) associated with a box B∈ Bj, a step of the BL-Recovering-IS

algorithm can be written as follows:

(i) for all B∈ Bj

B.active := TRUE

(ii) for j= 1, . . . , MaxStep

Bj:= Bj

for all B∈ {B∈ Bj:B.active == TRUE}

choose target vectors {ti}i=1,...,ntnear Bwith ti<pFB

find ˜z?

iwith ˜

F(˜z?

i;ti)=0, i = 1, . . . , nt

i:= F(Π(x,y)(˜z?

i)), i = 1, . . . , nt

B.active := FALSE

for all i= 1, . . . , nt:

if B(F?

i, d)6∈ ˆ

B:= B(F?

i, d),˜zˇ

B:= ˜z?

i, F ˇ

B:= F?

B.active := TRUE

Bj:= ˆ

Bj∪ˇ

if ˆ

Bj== BjSTOP

Bj+1 := ˆ

Since the set of higher level objectives is replaced by a scalar distance minimiza-

tion problem, another variant of this algorithm can be defined by using standard

methods for the solution of classical optimization problems. To be more precise, in

every step the expressions ||F(x, y)−ti|| have to be minimized under the restriction

that only those components of ˜

Fvanish which guarantee that the optimality con-

ditions for the lower level problem hold and that the higher level constraints of the

BLMOP are fulfilled. With

F(x, y, α, ζ, w, s) :=







h(x, y)

H(x, y)

G(x, y) + w◦w

∇L(x, y, α, ζ)

i=1

αi−1

α−s◦s







= 0

and denoting ˆz:= (x, y, α, ζ, w, s) and S:= {ˆz:ˆ

F(ˆz)=0}, the variant of the

BL-Recovering-IS algorithm can be defined by replacing the parameter space by

the ˆz-space and replacing the solution of ˜

F(˜z;ti) = 0 by the distance minimization

problem

min

ˆz∈S||F(Π(x,y)(ˆz)) −ti||,

where ti, i = 1, . . . , ntdenote the targets which have to be chosen individually in

every cycle of the algorithm.

The efficiency of the BL-Recovering-IS algorithm will be demonstrated in a more

general context by the examples in Section 6.5, where the algorithm is extended by a

smoothing technique in order to solve BLMOPs which include lower level inequality

constraints. Moreover, we refer to the example in Section 7.3, where the algorithm

is combined with a sensitivity analysis based technique for the adaptive choice of

targets.

Convergence of Recovering-IS Algorithms

Since the described BL-Recovering-IS algorithm is realized by minimizing a refor-

mulation of the BLMOP, which can be understood as a constrained MOP, in the

following we prove convergence for the more general class of image set-oriented

recovering algorithms, which includes both the Recovering-IS algorithm and the

BL-Recovering-IS algorithm. The proof is carried out in two steps: first, Theorem

6.10 states that for every subset B⊂

kcontaining a part of the Pareto optimal

solution in image space, there is a minimal set of targets, such that for at least one of

these targets the corresponding distance minimization subproblem leads to a Pareto

point x?with F(x?)∈B. Then, this result is used in Corollary 6.11 to complete

the proof.

Theorem 6.10 Let F:

n→

k, S ⊂

nand denote by Pthe Pareto set of the

constrained MOP:

min

x∈SF(x).

Assume that the norm ||·|| is strictly monotonically increasing. Let B⊂

kbe an

open subset such that B∩F(P)6=∅. Then there is d > 0, such that for any set

X ⊂ Bof targets with dist(y, X)< d for all y∈B∩F(P)there exists a target

t∈ X with F(x?)∈F(P)∩B, where x?:= arg min

x∈S||F(x)−t||.

Proof: There are ¯y∈F(P) and ε > 0, such that Uε(¯y)⊂B. Let d:= ε

8√kand

c:= ¯y−2d

i=1

ei, where eidenotes the i-th standard basis vector in

k. Then, for

every y∈Ud(c), we have

||y−¯y|| ≤ ||y−c||+||c−¯y|| ≤ d+ 2d√k=ε

8√k+ε

4<ε

that is, Ud(c)⊂Uε(¯y). Consequently, there is a target t=c+v∈ X,||v|| ≤ d, such

that

min

x∈St||F(x)−t|| ≤ ||t−¯y|| <ε

and

ti=ci+vi= ¯yi−2d+vi<¯yifor i = 1, . . . , k.

With x?= arg min

x∈St||F(x)−t||, it follows that

||F(x?)−¯y|| ≤ ||F(x?)−t||+||t−¯y|| <ε

2+ε

2=ε

and therefore

F(x?)∈Uε(¯y)⊂B.

Now we have to show that F(x?) is not dominated by any ˆy∈F(P)∩St. For

F(x?) = ˆythis nondominance is obvious. For the case F(x?)6= ˆywe have to show

that Fi(x?)<ˆyifor at least one i= 1, . . . , k. To see this, assume that the opposite

is true. Then Fi(x?)≥ˆyi> tifor all i= 1, . . . , k, where, since F(x?)6= ˆy, strict

inequality holds for at least one i∈ {1, . . . , k}. Consequently, since ||·|| is strictly

monotonically increasing,

||F(x?)−t|| >||ˆy−t||,

which is a contradiction to ||F(x?)−t|| = min

x∈St||F(x)−t||. Finally, since F(x?) is

not dominated by any ˆy∈F(P)∩St, we have F(x?)∈F(P), which completes the

proof.

To guarantee that an image set-oriented recovering method converges towards

the union of those connected components of F(P) which correspond to the initial

box collection B0, in every step of the algorithm the set of targets tihas to be

chosen properly, such that all desired boxes are found, that is, boxes which are

both neighbors of the boxes generated in the respective previous step and contain a

part of the respective connected component of F(P). To this end, we recall that ¯

denotes the closure of a box Band we state the following

Corollary 6.11 Using the notations of Theorem 6.10 and denoting by B0a box

collection of (fixed) subdivision depth dcovering a part of F(P), assume that every

step of the Recovering-IS or BL-Recovering-IS algorithm, respectively, is realized in

a way such that for every B∈ Bj\Bj−1targets are chosen according to Theorem

6.10 within all boxes C∈ {C:¯

C∩¯

B6=∅, C /∈ Bj}. Moreover, assume that F(P)is

bounded. Then, the algorithm terminates after a finite number of steps such that the

final box collection covers those connected components of F(P), which correspond to

at least one B∈ B0.

6.3 Pareto Set Constrained Multi-Objective Optimization

Problems

In this section we introduce particular BLMOPs, which are termed Pareto set con-

strained multi-objective optimization problems PSCMOPs.

From Theorem 6.1 we will derive corollaries, which are concerned with necessary

optimality conditions for PSCMOP. Then we will develop algorithms for the solution

of this class of problems. In particular, we will present derivative-free algorithms,

which turned out to work very satisfactorily in combination.

A PSCMOP can be understood as a variant of BLMOP, where the hierarchical

structure is relaxed in the sense that – in contrast to classical bi-level structures –

the lower level problem is not a parametrized one. Moreover, there are no explicitly

given higher or lower level constraints. To be more precise, consider the vector

valued functions F:

n×

m→

kand f:

n→

l. Then a (PSCMOP) can be

stated as follows:

min

x∈

n,y∈

mF(x, y),(PSCMOP)

where xsolves: min

x∈

nf(x),

where minimization again has to be understood in the sense of the partial order

≤p. Based on Theorem 6.1 the following necessary conditions for Pareto optimality

for a PSCMOP can be stated. For this let L:

n×

l→

, defined by

L(x, α) :=

i=1

αifi(x)

denote the lower level Lagrangian. Let I={i:αi= 0} ⊂ {1, . . . , l}and denote by

eithe i−th vector of the standard basis in

n+m+l.Furthermore, for i= 1, . . . , n,

denote by

Lxi:= ∂

∂xiL

the derivative of Lwith respect to xiand let ¯

∇:= ∇(x)and ∇:= ∇(x,y,α).

Corollary 6.12 Let (x∗, y∗)be a Pareto point of PSCMOP.

Then there exist α1, . . . , αl≥0such that

(i)¯

∇L(x∗, α) =

i=1

αi¯

∇fi(x∗) = 0,(6.29)

(ii)

i=1

αi= 1.(6.30)

If in addition all solutions of (i) and (ii) comply with Pareto points of the lower level

problem and if the Hessian ¯

∇2L(x∗, α)of Lis regular at (x∗, α), then there exist

β1, . . . , βk, µ1, . . . , µl≥0,ν, λ1, . . . , λn∈Rsuch that

(iii)

i=1

βi∇Fi(x∗, y∗) +

i=1

λi∇Lxi(x∗, α) (6.31)

i=1

(ν−µi)en+m+i= 0,

(iv)

i=1

βi= 1,(6.32)

(v)µi= 0 for i /∈I. (6.33)

Proof: Let (x∗, y∗) be a solution of PSCMOP. Then x∗is a solution to the lower

level problem (minimization of f) and the Kuhn-Tucker condition (2.2) holds: there

are α1, . . . , αl∈

such that

∇L(x∗, α) =

i=1

αi¯

∇fi(x∗) = 0,(6.34)

i=1

αi= 1 (6.35)

and

αi≥0, i = 1, . . . , l. (6.36)

Since the set of solutions of (6.34), (6.35) and (6.36) is exactly the feasible region

for the higher level problem, we can also apply the Kuhn-Tucker conditions (2.2)

to the upper level problem while regarding (6.34), (6.35) and (6.36) as constraints.

For this we have to ensure that the gradients with respect to (x, y, α) of the active

constraints, that is

∇Lxi(x∗, α), i = 1, . . . , n, (6.37)

∇

i=1

αi=

i=1

en+m+i,(6.38)

∇(−αi) = −en+m+i, i ∈I, (6.39)

are linearly independent or equivalently, that the matrix





∇2L(x∗, α)0¯

∇f1(x∗). . . ¯

∇fl(x∗)

0 0 −˜

0 0 1. . . 1

∈Rn+|I|+1×n+m+l(6.40)

has maximal rank, where the rows of ˜

Iare given by the transposed of distinct

standard basis vectors of Rlaccording to (6.39). Since (6.40) has a certain triangular

block structure and ¯

∇2L(x∗, α) is regular, it remains to show that the submatrix

−˜

1. . . 1∈R|I|+1×l(6.41)

has maximal rank. Indeed, with (6.35) and (6.36) we have |I|< l, that is, (6.41)

has at most lrows. Obviously, these rows are linearly independent and thus both

(6.41) and also (6.40) have maximal rank. Consequently, (6.37), (6.38) and (6.39)

are linearly independent and we can write down the Kuhn-Tucker conditions (2.2)

for the (constrained) upper level problem: there are β1, . . . , βk, µ1, . . . , µl≥0,

ν, λ1, . . . , λn∈R, such that

i=1

βi∇Fi(x∗, y∗) +

i=1

λi∇Lxi(x∗, α) + ν

i=1 ∇αi+

i=1

µi∇(−αi)=0,(6.42)

i=1

βi= 1 (6.43)

and

µi= 0 for i /∈I. (6.44)

Observing that ∇(−αi) = −en+m+ifor i= 1, . . . , l, we obtain from (6.42)

i=1

βi∇Fi(x∗, y∗) +

i=1

λi∇Lxi(x∗, α) +

i=1

(ν−µi)en+m+i= 0 (6.45)

Since for positive coefficients a linear combination of symmetric positive definite

matrices is again symmetric positive definite, we can conclude the following corollary,

which will be relevant for some of our numerical examples later on.

Corollary 6.13 Let (x∗, y∗)be a Pareto point of PSCMOP.

Then there exist α1, . . . , αl≥0, such that

(i)

i=1

αi¯

∇fi(x∗) = 0,(6.46)

(ii)

i=1

αi= 1.(6.47)

(6.48)

If in addition all solutions of (i) and (ii) comply with Pareto points of the lower level

problem and if the Hessian ¯

∇2fi(x∗)of fiis positive definite at x∗for i= 1, . . . , l,

then there exist β1, . . . , βk, µ1, . . . , µl≥0,ν, λ1, . . . , λn∈R, such that

(iii)

i=1

βi∇Fi(x∗, y∗) +

i=1

λi∇Lxi(x∗, α) (6.49)

i=1

(ν−µi)en+m+i= 0,

(iv)

i=1

βi= 1,(6.50)

(v)µi= 0 for i /∈I. (6.51)

Proof: The Hessians ¯

∇2fi(x∗), i= 1, . . . , l, are symmetric positive definite

and consequently, with

i=1

αi= 1 and αi≥0 for i= 1, . . . , l, also the Hessian

∇2L(x∗, α) =

i=1

αi¯

∇2fi(x∗) of L(x∗, α) is symmetric positive definite at (x∗, α).

Since a symmetric positive definite matrix is regular we can use Corollary 6.12 to

complete the proof.

Remark 6.14 (i) If x∗is an inner point of the solution set of the lower level

problem, that is, I=∅, then (6.31) and (6.49) simplify to

i=1

βi∇fi(x∗, y∗) +

i=1

λi∇Lxi(x∗, α) +

i=1

ν en+m+i= 0 (6.52)

(ii) Since the Hessian ¯

∇2fof a convex function f:

n→

is positive definite,

Corollary 6.13 is in particular relevant for PSCMOPs with convex lower level

problems.

(iii) If all lower and higher level objectives and the solution set of the lower level

problem are convex, then all points satisfying the equations stated in Corollary

6.12 and Corollary 6.13 are Pareto points of the given PSCMOP.

6.4 Methods for Solving PSCMOPs

In this section we present algorithms for the solution of PSCMOP. The first two algo-

rithms PSC-Subdivision and PSC-Recovering, which are based on the results stated

in Corollary 6.12 and Corollary 6.13, are special variants of the more general algo-

rithms BL-Subdivision and BL-Recovering, respectively, for the solution of BLMOP.

Since the algorithms BL-Subdivision and BL-Recovering are described in detail in

the previous sections, we will restrict ourselves to the consideration of the essen-

tial item which distinguishes the PSC-variants from the BL-variants, that is, the

simplification of the system of equations to be solved in order to find substationary

points.

Thereafter we present the methods PSC-Sampling and PSC-SamRec, which are

based solely on the computation and comparison of objective values and there-

fore give the opportunity to solve PSCMOPs with objectives that do not meet any

smoothness or convexity assumptions. Moreover, we propose some guidelines on how

to combine these algorithms in order to increase the performance of the respective

numerical schemes.

PSC-Subdivision and PSC-Recovering Algorithm

As mentioned above, the PSC-variants of the Subdivision and Recovering algo-

rithm, respectively, are very similar to the corresponding BL-variants. The essential

difference is the system of equations to be solved in order to find Pareto candidates.

With ˜z:= (x, y, α, β, µ, ν, λ, s, t, u), the system simplifies to

F(˜z) =







i=1

βi∇Fi(x, y) +

i=1

λi∇Lxi(x, y, α) +

i=1

(ν−µi)en+m+i

∇L(x, y, α)

i=1

βi−1

i=1

αi−1

µ◦α

α−s◦s

β−t◦t

µ−u◦u







=0.

Apart from this simplification of the system of equations, the PSC-Subdivision and

PSC-Recovering algorithms are identical to the BL-Subdivision and BL-Recovering

algorithm, respectively. But there is one more important aspect to be mentioned.

Since the lower level problem of a PSCMOP is not parametrized by the variable

y∈

m, the PSC-algorithms can be combined with archiving strategies based on

the two-step nondominance test described below. In this way, it is possible to

approximate the Pareto set of the PSCMOP even if substationary points of the lower

level problem are not necessarily Pareto optimal for the lower level problem. The

two-step nondominance test might theoretically be extended to handle the general

BLMOP. But for this, a two-step nondominance test associated with at least every

yof a representative discretization of the y-Space has to be performed, such that –

in practice – the computational effort for such an extension would be unjustifiable.

PSC-SamRec Algorithm

In many applications the computation of derivatives of the given objectives is a

considerable problem. In the majority of cases symbolic derivatives are not available

at all and the numerical computation of derivatives is very expensive. Particularly,

there is no hope to get derivatives of higher order in justifiable time. This motivates

the construction of derivative-free algorithms. In the following we describe the

PSC-SamRec algorithm which is similar to the preceding PSC-Recovering algorithm,

but does not use derivatives. Moreover, at every considered point (x, y) the higher

level function Fmust only be evaluated if xis nondominated with respect to the

lower level function fby any other point accepted so far. The predictor-corrector

concept for finding new nondominated points in the neighborhood of a given box is

interchanged by just expanding the box to a larger test box, choosing test points in it

and then performing a local nondominance test in two steps: first, all points within

the test box are checked for nondominance with respect to the lower level function f.

Only those points which are in fact f-nondominated are taken into account for the

nondominance test concerning the upper level function F. All points ”surviving”

both steps of the nondominance test build up the basis for the decision whether a

box is added to the current box collection or not. Now for a fixed subdivision depth

dand a given box collection Bj⊂Rn+man iteration of the PSC-Sam-Rec algorithm

can be stated as follows:

(i) for all B∈ Bj

B.active := TRUE

(ii) for i= 1, . . . , MaxStep

Bj:= Bj

for all B∈ {B∈ Bj:B.active == TRUE}

create atest box ¯

Bsuch that

B⊂¯

and

diam( ¯

B) = θjdiam(B),

where θj≥θmin >1.

choose a set of test points X ⊂ ¯

Nf:= f-nondominated points of X

NF:= F-nondominated points of Nf

B.active := FALSE

for all z∈NF:

if B(z, d)6∈ ˆ

B(z, d).active := TRUE

Bj:= ˆ

Bj∪B(z, d)

if ˆ

Bj== BjSTOP

Bj+1 := ˆ

Unlike the PSC-Recovering algorithm, in the PSC-SamRec algorithm there is no

corrector step. As a result, the probability is small that a randomly chosen test

point is a nondominated point. This fact has to be taken into account by considering

a sufficiently large number of test points within every test box.

As well as the PSC-Recovering algorithm, the PSC-SamRec algorithm is of local

nature, that is, nondominance tests are performed only with respect to a local

neighborhood of the current box.

PSC-Sampling Algorithm

The following PSC-Sampling algorithm is an algorithm of subdivision type and

therefore it is tailored to generate a box collection that covers the global Pareto set

and not only a locally nondominated set. Like in all algorithms of subdivision type,

at the beginning of an iteration each box of the current collection is subdivided to

generate a finer collection. Thereafter a selection step is employed. For this, test

points within each box are chosen, and similar as in the PSC-SamRec algorithm,

a nondominance test is performed in two steps. The crucial difference is that test

points are not only compared to test points within a local region, but to all test

points within the whole box collection. This makes the PSC-Sampling algorithm a

global approach. Given a box collection Bj−1⊂Rn+mthe collection Bj⊂Rn+mis

obtained by:

(i) Subdivision

Construct from Bj−1a new system ˆ

Bjof subsets such that

[

B∈ˆ

B=[

B∈Bj−1

and

diam( ˆ

Bj) = θjdiam(Bj−1),

where 0 < θmin ≤θj≤θmax <1.

(ii) Selection

for all B∈ˆ

choose a set of test points XB⊂B

Nf:= f-nondominated points of S

B∈ˆ

BjXB

NF:= F-nondominated points of Nf

Bj:= nB∈ˆ

Bj:XB∩NF6=∅o

Combination of the PSC-Sampling and PSC-SamRec Algorithms

All the proposed approaches are self-contained algorithms, but in many applications

one may obtain even better results when combining them. Moreover, a suitable

combination can reduce the computational effort.

The PSC-Sampling algorithm finds global Pareto points due to the fact that

it works with comparisons within the entire image spaces of both the upper and

lower level of the PSCMOP. However, there always remains some uncertainty, in

particular when the boxes are big and/or the dimension of the PSCMOP is large.

To be more precise, boxes containing global Pareto points may get lost because

the selection is performed by considering finitely many test points within each box.

Nevertheless in practice it turned out that this algorithm works satisfactorily. The

PSC-SamRec algorithm, which is local in nature, successfully extends the computed

box covering of the set of substationary points. To obtain an even better performance

– i.e. to compute a robust approximation of the global Pareto set and to use a

moderate amount of function calls – we propose the following combination of the

PSC-Recovering and the PSC-Sampling algorithm.

step 0 Apply n0iterations of the PSC-Sampling algorithm using p0test points per

box.

step 1 Apply n1iterations of the PSC-SamRec algorithm to the current box collection

with p1test points per box. This fills the gaps which have possibly been

generated in previous steps.

step 2 Use n2iterations of the PSC-Sampling algorithm using p2test points per box

for further refinement of the box collection. In this way boxes which only

contain local Pareto points that are not Pareto-optimal from a global point of

view can be removed from the covering.

step 3 Carry out the PSC-SamRec algorithm using p3test points per box until no

more missing boxes are added.

Step 1 and step 2 are repeated until the desired box size is obtained.

With a proper choice of the parameters niand pi, this combination works much

better and faster than using just one of the algorithms on its own. Of course, the

choices for the niand pidepend on the concrete problem, but experience gives reason

to some general rules as follows: p1and p2can be chosen small compared to p0and

p3. The nishould be chosen based on how many ”good” boxes get lost during the

application of the PSC-Sampling algorithm. The number of boxes which are added

to the collection during step 1 is a measure for the number of test points p2needed

in step 2.

Example 6.15 Given a higher level function F= (F1, F2)t:R2×R→R2and a

lower level function f= (f1, f2)t:R2→R2, we consider the following example:

min

x∈

2, y∈

F(x, y) = (x1−1)4+ (x2−1)2+ (y−1)2

(x1+ 1)2+ (x2+ 1)4+ (y+ 1)2,(6.53)

such that xsolves:

min

x∈

2f(x) = (x1−2)2+ (x2−2)4

(x1+ 0.5)2+ (x2+ 1)2.(6.54)

Observe that the Hessians ¯

∇2f1(x) and ¯

∇2f2(x) of the lower level objectives f1and

f2with respect to xare given by

∇2f1(x) = 2 0

0 12 (x2−2)2and ¯

∇2f2(x) = 2 0

0 2 (6.55)

for all x∈

2.¯

∇2f2(x) is obviously constant positive definite. ¯

∇2f1(x) is positive

definite unless x2= 2, but in the latter case one can easily show that for all x1, y ∈R

the point (x1, x2, y) cannot be a solution of the given PSCMOP. In other words, the

assumptions of Corollary 6.13 are satisfied. Moreover, all objectives of both the

lower and the higher level problem are convex. Thus it is a good choice to use the

PSC-Subdivision or the PSC-Recovering algorithm for the solution of this problem.

To state the system of equations which has to be solved in every corrector step of

the PSC-Recovering algorithm observe that the optimality conditions for the lower

level system (6.54) are as follows:

There exist non-negative multipliers α1, α2∈Rsuch that

i=1

αi= 1 (6.56)

and

i=1

αi¯

∇fi(x) = 0.(6.57)

From this we formulate the following constraints for the higher level problem (6.53):

H(x, α) = 2α1(x1−2) + 2α2(x1+ 0.5)

4α1(x2−2)3+ 2α2(x2+ 1) = 0,(6.58)

α1+α2−1 = 0,(6.59)

α1> ε, (6.60)

α2> ε. (6.61)

As motivated in Section 6.2, εavoids that xis on the boundary of the solution set of

the lower level problem. Thus, according to Remark 6.14, the optimality conditions

for the higher level problem under the constraints defined above are:

There exist multipliers β1, β2∈R+and λ0, λ1, λ2∈Rsuch that

i=1

βi= 1 (6.62)

and

i=1

βi∇Fi(x, y) +

i=1

λi∇Hi(x, α) + λ0

i=1 ∇αi= 0,(6.63)

where ∇acts on (x, y, α). Calculating the gradients of Fand Hand substituting

them into (6.63), together with the equations above, we get the system which has

to be solved in every corrector step:







4β1(x1−1)3+ 2β2(x1+ 1) + 2λ1(α1+α2)

2β1(x2−1) + 4β2(x2+ 1)3+λ2(12α1(x2−2)2+ 2α2)

2β1(y−1) + 2β2(y+ 1)

λ1(2x1−4) + 4λ2(x2−2)3+λ0

λ1(2x1+ 1) + 2λ2(x2+ 1) + λ0

2α1(x1−2) + 2α2(x1+ 0.5)

4α1(x2−2)3+ 2α2(x2+ 1)

α1−s2

1−ε

α2−s2

2−ε

α1+α2−1

β1−t2

β2−t2

β1+β2−1







= 0.(6.64)

The result of the PSC-Recovering algorithm was used to create Figures 22 and 23

as follows. First the algorithm was applied to obtain a tight box covering of the

solution. Then test points in parameter space were generated within each box and

– similar as in the PSC-Sampling and PSC-SamRec algorithms – a nondominance

test was performed in two steps to select representatives of the solution set. In

doing so both the nondominated points in parameter space and the corresponding

objective values were archived. Thereafter the Recovering algorithm for solving

classical MOPs as described in Section 3.1 was applied both to the lower level

problem and to the unconstrained higher level problem. Again, nondominance tests

were performed on sets of test points, such that we obtained representatives of the

corresponding solutions both in parameter and objective space. The representatives

of the lower level problem in parameter space were expanded, in a certain sense, to

the three-dimensional higher level parameter space. They were used in Figure 22

to depict the constraint surface which defines the feasible set for the higher level

problem of the PSCMOP. The representatives of the solution to the PSCMOP in

parameter space were plotted in the same figure and consequently one can see that

this solution is embedded in the constraint surface, as expected. In Figure 23 the

representatives in higher level image space of both the unconstrained higher level

problem and the PSCMOP were depicted for the purpose of comparison. Obviously,

the solution to PSCMOP (which is a constrained problem) can not be better than

the solution to the unconstrained higher level problem and is thus located above the

latter one.

For alternatively solving the problem with the PSC-Sampling and the PSC-

SamRec algorithms we used a combination as follows:

for ifrom 1to 3do

6iterations PSC-Sampling with 300 test points/box

1iteration PSC-SamRec with 500 test points/box

end

3iterations PSC-Sampling with 500 test points/box

The result of this combined algorithm is shown in Figure 24. It should be men-

tioned that, to obtain a result like this by using the pure PSC-Sampling algorithm,

5000 test points/box were needed. Consequently the computational time for the

combined algorithm was about 10 times faster than that for the pure PSC-Sampling

algorithm.

Figure 22: The solution lies within the constraint surface defined by the lower level

problem. (the constraint surface was computed separately just for the purpose of

visualization, but need not be explicitly computed by our algorithms).

Figure 23: Comparison of the solution of the PSCMOP to the solution of the (un-

constrained) higher level problem (higher level objective space).

Figure 24: Box covering computed by a combination of the PSC-Sampling and

PSC-SamRec algorithms.

Example 6.16 Let us now consider the following example with a higher level func-

tion F= (F1, F2, F3)t:R3→R3and a lower level function f= (f1, f2, f3)t:R3→

R3:

min

x∈

3F(x) = 



(x1−2)4+ (x2−2)2+ (x3−2)2

(x1+ 1)2+ (x2+ 1)2+ (x3+ 1)4

(x1−1)2+ (x2−1)2+ (x3+ 1)4

,(6.65)

such that xsolves:

min

x∈

3f(x) = 



(x1−2)2+ (x2−2)4+ (x3−2)2

(x1+ 0.5)2+ (x2+ 1)4+ (x3−2)2

(x1+ 0.5)2+ (x2−1)2+ (x3+ 2)4

.(6.66)

Here, Fdoes not depend on y, that is, F(x, y) = F(x). Again, all objectives are

convex. The solution to this problem was computed using the PSC-Recovering

algorithm. The results represented in different spaces are shown in Figures 25 to 28.

For a demonstrative representation also the solutions in lower and higher level image

space, respectively, are depicted as box collections (Figures 27 and 28). Figure 26

does not only show the solution to the PSCMOP but also – in a different color – the

solution to the lower level problem. The latter was computed using the Recovering

algorithm designed for solving classical MOPs as described in Section 3.1. One can

observe that – in parameter space – the solution to the PSCMOP complies with a

part of the solution to the lower level Problem.

Figure 25: Box covering in parameter space computed by the PSC-Recovering algo-

rithm.

Figure 26: The solution to the PSCMOP – computed by the PSC-Recovering algo-

rithm – is also a part of the solution to the lower level problem – computed by the

Recovering algorithm.

Figure 27: The solution in lower level image space as computed by the

PSC-Recovering algorithm.

Figure 28: The solution in higher level image space as computed by the

PSC-Recovering algorithm.

Example 6.17 Application to Electromagnetic Shielding

In the following we deal with the optimization of a new multi-layer shielding ma-

terial with N= 3 layers, which is designed to protect an electronic device against

electromagnetic radiation. A similar problem was considered in [40], from where we

have taken the mathematical model which describes the relevant physical properties

of the material. For a deeper insight to this interesting field of applications the

reader is also referred to [40]. We consider a scenario where radiation of frequency

f1= 50 ·106Hz could cause light temporal disturbance, whereas radiation of fre-

quency f2= 3.35·107Hz would end up in heavy destruction of the electronic device.

Consequently, protection against f2is much more important than protection against

f1. In addition, due to production tolerances, the obtained degrees of shielding

against f1and f2, respectively, are required to be robust against small variations

of the layers’ thickness. Thus, the shielding material has to be designed such that

both protection against f2and the corresponding robustness are optimized in a first

instance, and the same objectives related to f1have to be optimized in a second

instance.

For our example we have chosen a 3-layered material, where the middle layer

consists of a particular polymer and the two outer layers consist of polyaniline

polyurethane (PAni/PU). We denote by Zi, σi, i, and di(i= 1,...,3) the impedance,

the conductivity, the permittivity, and the thickness of the i-th layer, respectively.

Due to the manufacturers possibilities both thickness and conductivity of the two

outer layers serve as design variables that is, x= (d1, d3, σ1, σ3)t.

For every frequency fthe characteristic matrix Mf

i∈C2×2of the i-th layer is

given by

i= cos(Af

i)−jZf

isin(Af

−j

sin(Af

i) cos(Af

i)!,

where

i=ωdisµ00i−jσi

ω0and Zf

i=sµ0

0i−jσi

ω0,

where ω= 2πf and jdenotes the imaginary unit. µ0= 4π·10−7and 0=

8.854 ·10−12 are the common physical constants. Due to their contact to air media,

the impedances of the outer layers are set to Z0=ZN+1 = 377. The characteristic

matrix for the entire 3-layered compound is given by

Mf=Mf

1Mf

2Mf

3=Mf

11 Mf

21 Mf

22 .

With this notation the transmission coefficient is given by

Tf=

2Mf

22(Mf

11Z0−Mf

12) + Mf

12(Mf

22 −Mf

21Z0)

(Mf

11Z0−Mf

12) + Z0(Mf

22 −Mf

21Z0)

and the electromagnetic shielding against radiation of frequency fcan be ex-

pressed by

sf= 20 log(|Tf|).

Furthermore, we express the shielding robustness against variations of the layers’

thickness by

rf= (∆d1sf)2+ (∆d2sf)2,

where ∆d1sfand ∆d2sfdenote finite differences of sfwith respect to d1and d2,

respectively. Finally we are in the position to state the PSCMOP we have to solve:

min

d1,d3,σ1,σ3sf1(d1, d3, σ1, σ3)

rf1(d1, d3, σ1, σ3)(6.67)

such that (d1, d3, σ1, σ3) solves:

min

d1,d3,σ1,σ3sf2(d1, d3, σ1, σ3)

rf2(d1, d3, σ1, σ3)(6.68)

Observe that, caused by the trigonometric functions involved in both the higher

and lower level objectives, this problem is not convex at all and therefore there might

be local solutions which are not necessarily globally Pareto optimal. Moreover, for

both the higher level and lower level problem, respectively, points satisfying the

Kuhn-Tucker conditions are not necessarily Pareto optimal even from a local point

of view. Thus, it was advantageous to use the PSC-Sampling algorithm to solve this

problem. After generating the box collections with the PSC-Sampling algorithm, we

have generated test points within each generated box. Thereafter, the nondominated

points have been selected by a two-step nondominance test. The solution in higher

and lower level objective space, respectively, are depicted in Figure 29. Two different

projections of the Pareto set in parameter space to representative 3-dimensional

subspaces are shown in Figure 30.

Figure 29: The solution of the shielding problem in higher level (left) and lower level

(right) objective space.

Figure 30: The solution of the shielding problem in parameter space (projections to

the (d1, d3, σ1)-subspace (left) and (σ3, d3, σ1)-subspace (right), respectively).

6.5 Methods for Solving BLMOPs with Lower Level In-

equality Constraints

As stated in Section 6.1, Theorem 6.1 and consequently also the algorithms based

on it are restricted to problems without lower level inequality constraints, because

the constraint qualifications which allow to apply the Kuhn-Tucker conditions to the

higher level problem can not be satisfied if lower level inequality constraints are in-

volved, see Theorem 6.3. Nevertheless, if there are lower level inequality constraints,

we can benefit from Theorem 6.1 when the BLMOP is adequately approximated. To

be more precise, for the reasons stated in Section 5.1, instead of solving the given

BLMOP directly, we solve successively a sequence of related auxiliary problems.

Here, the lower level inequality constraints gi(x, y), the requirement on the related

multipliers τito be non-negative, and the corresponding complementarity relations

τigi(x, y) = 0, are replaced by equality constraints constructed by the use of the

perturbed Fischer-Burmeister function Φ described in Section 5.1. The resulting

equations have the form

Φi(τi,−gi(x, y), ε) = τi−gi(x, y)−qτ2

i+gi(x, y)2+ε, i = 1, . . . , q. (6.69)

Now the series of auxiliary problems has to be defined such that the smoothing

parameter εis positive at the beginning, decreases with every auxiliary problem

and finally converges towards 0. Recall that Φi(τi,−gi(x, y), ε) is smooth for every

ε > 0 and that Φi(τi,−gi(x, y),0) = 0 if and only if gi(x, y)≤0, τi≥0 and

τigi(x, y) = 0. Consequently, the auxiliary problem associated with ε= 0 complies

with the original problem, that is, the given BLMOP is approximated by a sequence

of smooth problems without lower level inequality constraints which can be solved

based on Theorem 6.1. This strategy can be used to modify the algorithms of

Section 6.2 such that they are capable of solving BLMOP with lower level inequality

constraints.

BL2-Recovering-IS algorithm

As an example, in the following we describe the BL2-Recovering-IS algorithm, which

is a modified version of the BL-Recovering-IS algorithm. For this, we adopt the

notations of Section 6.2, expand ˜zby (ξ, τ)∈

2q, and denote by ˜

F(˜z;t, ε) the

variant of ˜

F(˜z;t), which includes the expressions (6.69) associated with the lower

level inequality constraints gi, i = 1, . . . , q, that is, with

Φ(x, y, τ, ε) := (Φ1(τ1,−g1(x, y), ε),...,Φq(τq,−gq(x, y), ε))t

and

L(x, y, α, ζ, τ) :=

i=1

αifi(x, y) +

i=1

ζihi(x, y) +

i=1

τigi(x, y),

we obtain

F(˜z;t, ε) =







∇||F(x, y)−t||+

i=1

λi∇Lxi(x, y, α, ζ, τ) +

i=1

(ν−µi)en+m+i

i=1

ωi∇hi(x, y) +

i=1

ξiΦi(x, y, τ, ε) +

i=1

ρi∇Hi(x, y)

i=1

δi∇Gi(x, y)

h(x, y)

Φ(x, y, τ, ε)

H(x, y)

G(x, y) + w◦w

∇L(x, y, α, ζ, τ)

i=1

αi−1

µ◦α

δ◦G(x, y)

α−s◦s

µ−u◦u

δ−v◦v







=0.

Now, for a fixed subdivision depth d, with ε0>0 and the notations above, an

iteration of the BL2-Recovering-IS algorithm, which can also handle lower level

inequality constraints, can be written as follows:

(i) for all B∈ Bj

B.active := TRUE

(ii) for j= 1, . . . , MaxStep

Bj:= Bj

for all B∈ {B∈ Bj:B.active == TRUE}

choose target vectors {ti}i=1,...,ntnear Bwith ti<pFB

for all i= 1, . . . , nt

ε:= ε0

˜z:= ˜zB

while termination criterion is not satisfied do

ε:= c ε for some c∈(0,1)

˜z0:= ˜z

starting from ˜z0find ˜zwith ˜

F(˜z;ti, ε) = 0

˜z?

i:= ˜z

i:= F(Π(x,y)(˜z?

i)), i = 1, . . . , nt

B.active := FALSE

for all i= 1, . . . , nt:

if B(F?

i, d)6∈ ˆ

B:= B(F?

i, d),˜zˇ

B:= ˜z?

i, F ˇ

B:= F?

B.active := TRUE

Bj:= ˆ

Bj∪ˇ

if ˆ

Bj== BjSTOP

Bj+1 := ˆ

There are several possibilities for the definition of a suitable termination criterion.

For example, one can use the simple criterion, that the variation of two successive

vectors of higher and lower level objectives has to be sufficiently small. For this, let

˜zεbe the solution of ˜

F(˜z;ti, ε) = 0 and (xε, yε) := Π(x,y)(˜zε) and denote

Fε:= (F(xε, yε), f(xε, yε))t∈

k+l.

Then the termination criterion is given by the requirement

||Fε−Fcε||2< εstop

for some small εstop >0. In some applications, such a termination criterion might be

satisfactory. But since Φi(τi,−gi(x, y), ε) = 0 is equivalent to the original constraints

gi(x, y)≤0, τi≥0 and τigi(x, y) = 0 only for the case ε= 0, it is worth to include

these original constraints into the termination criterion as follows:

||Fε−Fcε||2+||τε◦g(xε, yε)||2

+||max{0, g(xε, yε)}||2+||max{0,−τε}||2< εstop.

Here, max is understood to act component-wise and τε:= Πτ(˜zε).

Example 6.18 We consider now an example which was taken from [16]. In this

example there are explicitly defined inequality constraints Gi:R2×R→R, i = 1,2

and gi:R2×R→R, i = 1,...,4 for the higher and lower level problem, respectively.

With these constraints, a higher level function F= (F1, F2)t:R2×R→R2and, a

lower level function f= (f1, f2)t:R2×R→R2, the BLMOP is given as follows:

min

x∈

2,y ∈

F(x, y) = x1+x2

2+y+ sin2(x1+y)

cos(x2)·(0.1 + y)·exp(−x1

0.1+x2)!,(6.70)

such that

G1(x, y) = −y≤0,

G2(x, y) = y≤10,

and xsolves:

min

x∈

2f(x, y) = (x1−2)2+(x2−1)2

4+x2y+(5−y)2

16 + sin(x2

10 )

1+(x2−6)4−2x1y−(5−y)2

80 !,(6.71)

such that

g1(x, y) = x2

1−x2≤0,

g2(x, y) = 5x2

1+x2−10 ≤0,

g3(x, y) = x2+y

6−5≤0,

g4(x, y) = −x1≤0.

Of course, in (6.71) there is the non-convex term s(x2) := sin(x2

10 ). But the expli-

citly given constraints on both levels guarantee that x2∈[0,5] and obviously the

restriction of sto the interval [0,5] is convex. Consequently, since all other lower level

terms are also convex for fixed y, the entire lower level problem (6.71) is convex. The

problem was solved with the BL2-Recovering-IS algorithm using a few initial guesses

and initial targets. As a result, several local Pareto sets were computed, which, as

shown in Figure 31, intersect each other in image space. Therefore, not all computed

solutions are global solutions of the BLMOP. Nevertheless, since all lower level

problems are convex, every local solution (x?, y?) corresponds to a global solution

x?of the lower level problem associated with the parameter y?. Consequently,

the global solution of the BLMOP can be selected from the computed points by a

nondominance test with respect to the higher level. In Figure 31, also the result of

this nondominance test is marked.

Figure 31: The solutions of Problem (6.70,6.71) as computed by the BL-Recovering-

IS algorithm are locally but not necessarily globally Pareto optimal.

To show the behavior of the smoothing strategy, we have once more computed

a small part of the solution while achieving all intermediate solutions ˜zεand the

corresponding higher and lower level objective values. The resulting sequences for

the considered part of the Pareto set in parameter space and lower and higher level

image space are shown in Figures 32, 33 and 34, respectively. Here, the bigger points

mark the final points of the sequences, that is, they are substationary points of the

given BLMOP.

Figure 32: Convergence of the smoothing method (parameter space)

Figure 33: Convergence of the smoothing method (lower level objective space)

Figure 34: Convergence of the smoothing method (higher level objective space)

Example 6.19 Application to Medical Engineering

This example is also taken from [16] and deals with a real world application ap-

pearing in medical engineering, where the task is to find an optimal configuration

of coils. For a deeper insight to this applications we refer to [16] and the references

therein. The resulting BLMOP is given as follows:

min

x∈

14,y∈

||x||2

||x−xold||2

2,(6.72)

such that y∈[0, π],

and xsolves:

min

x∈

14 ||A(y)·V x −b(y)||2

||x||2

2,(6.73)

such that ||A(y)·V x −b(y)||2

2≤∆max

with

A(y) =







1 0 0 0 0 0 0 0

0 1 0 0 0 0 0 0

0 0 1 0 0 0 0 0

0 0 0 0 cos ysin y0 0

0 0 0 0 0 0 sin ycos y

0 0 0 −sin y0 0 cos y−sin y







b(y) = (0,cos y, sin y, 1,0,0)tand ∆max = 0.3.

For xold ∈

14 we choose

xold =(0.1247,0.1335,−0.0762,−0.1690,0.2118,−0.0534,−0.1473,

0.3170,−0.0185,−0.1800,0.1700,−0.0718,0.0058,0.0985)t.

The matrix V∈

8×14 is a non-sparse matrix with rank(V) = 8 and depends on the

special medical therapy to be administrated. For our calculation we have taken the

same (randomly chosen) matrix V= (V1|V2) as in [16] with

V1=







0.9501 0.8214 0.9355 0.1389 0.4451 0.8381 0.3046

0.2311 0.4447 0.9169 0.2028 0.9318 0.0196 0.1897

0.6068 0.6154 0.4103 0.1987 0.4660 0.6813 0.1934

0.4860 0.7919 0.8936 0.6038 0.4186 0.3795 0.6822

0.8913 0.9218 0.0579 0.2722 0.8462 0.8318 0.3028

0.7621 0.7382 0.3529 0.1988 0.5252 0.5028 0.5417

0.4565 0.1763 0.8132 0.0153 0.2026 0.7095 0.1509

0.0185 0.4057 0.0099 0.7468 0.6721 0.4289 0.6979







and

V2=







0.3784 0.8180 0.8385 0.7948 0.8757 0.2844 0.4329

0.8600 0.6602 0.5681 0.9568 0.7373 0.4692 0.2259

0.8537 0.3420 0.3704 0.5226 0.1365 0.0648 0.5798

0.5936 0.2897 0.7027 0.8801 0.0118 0.9883 0.7604

0.4966 0.3412 0.5466 0.1730 0.8939 0.5828 0.5298

0.8998 0.5341 0.4449 0.9797 0.1991 0.4235 0.6405

0.8216 0.7271 0.6946 0.2714 0.2987 0.5155 0.2091

0.6449 0.3093 0.6213 0.2523 0.6614 0.3340 0.3798







Observe that for every fixed y∈[0, π], A(y)·V x −b(y) is linear with respect to x.

Consequently, the lower level problem (6.73) is a convex one for every fixed y∈[0, π].

The solution of this problem was computed with a variant of the BL2-Recovering-

IS algorithm, which is based on the minimization of the distance between the vector

of higher level objectives and the targets under the restriction that only those com-

ponents of ˜

Fvanish which guarantee that the optimality conditions for the lower

Figure 35: The Pareto set of Problem (6.72, 6.73) in higher level image space (left)

and lower level image space (right) for different parameters λ.

level problem hold and that the higher level constraints of the BLMOP are fulfilled.

To be more precise, with ˆz:= (x, y, α, ζ, τ, δ, w, s) and

F(ˆz;ε) :=







h(x, y)

H(x, y)

G(x, y) + w◦w

Φ(x, y, τ, ε)

∇L(x, y, α, ζ, τ)

i=1

αi−1

α−s◦s







in every step we have to solve

min

ˆz∈S||F(Π(x,y)(ˆz)) −ti||,(6.74)

where S:= {ˆz:ˆ

F(ˆz;ε)=0}and ti, i = 1, . . . , ntdenote the targets which have to be

chosen individually in every cycle of the algorithm. This variant works without boxes

B⊂

k. Instead, the individual targets and consequently the Pareto points are

successively computed using the targets t?and the solutions ˆz?from the respective

previous step, that is, we solve (6.74) for a target

t=F?+λb +λ0(t?−F?),

where bis a basis vector of the one-dimensional tangent space TF?F(P) of the Pareto

set F(P) in higher level image space at F?:= F(Π(x,y)(ˆz?)). Here, λand λ0make

up a parameterization for the new target t. To demonstrate the advantage of these

methods concerning the user’s influence on the diversity and the granularity, we

have computed different solutions corresponding to different fixed values for the

parameter λ, see Figure 35. While computing these solutions it turned out that in

this particular application the simple choice λ0= 0 leads always to suitable targets,

that is, we have always t <pF(Π(x,y)(¯z?)), where ¯z?is the respective solution of

(6.74). As can be observed on the right hand side of Figure 35, the lower level

inequality constraint ||A(y)·V x −b(y)||2

2≤∆max = 0.3, which is related to the first

lower level objective, see (6.73), is active for the entire solution of this problem.

6.6 Non-Convex and Non-Smooth BLMOPs

Due to the conditions given by many real world applications, in this section we

assume that for the objectives and constraints of both the higher and lower level

100

either no derivatives are available at all or the computation of derivatives is too

time consuming. Moreover, we abandon the lower level convexity assumption which

was – as well as the availability of derivatives – essential for the application of the

algorithms based on Theorem 6.1.

BL2-Subdivision Algorithm

Motivated by the facts mentioned above, we want to describe an algorithm which

belongs to the family of subdivision algorithms and is tailored to solve non-convex

and non-smooth BLMOPs. A naive way to handle such problems would be to solve

for ’sufficiently many’ parameters y∈

mthe lower level problem, collecting the re-

spective solutions in a common archive, and then performing at first a feasibility test

and thereafter a nondominance test, both with respect to the higher level problem,

among all collected solutions. But there is the question how to choose ’sufficiently

many’ parameters y∈

min order to obtain a representative approximation of the

y-space while keeping the computational effort justifiable. The following set-oriented

algorithm finds a way out by the use of a subdivision technique. To be more precise,

the algorithm works generally with boxes in

n+m, but for the representation of the

y-space and the x-space, respectively, also boxes of respective lower dimension are

used, which can be regarded as projections of boxes in

n+mto the space of boxes in

nor

m, respectively. The subdivision depth is increased adaptively, that is, the

size of the boxes shrinks, such that both the number of points yi∈

mrepresenting

the y-space and the number of generated Pareto points of the associated lower level

problems is increased adaptively as the algorithm proceeds.

For a detailed description of the BL2-Subdivision algorithm, denote for every box

collection B ⊂

n+mby Πy(B)⊂

mthe ’box collection-valued projection’ of Bto

the space of boxes in

m. Moreover, for every fixed y∈

mand δ∈N, denote by

BLS(y, δ) a box collection in

nwith diam(BLS(y, δ)) = δcovering the solution

of the lower level problem associated with y, e.g., a box collection generated by a

combination of the classical SamRec and Sampling algorithms.

With these notations, starting with an initial box Q=B0⊂

n+m, an iteration

of the BL2-Subdivision algorithm reads as follows:

(i) Subdivision

Construct from Cj−1:= Πy(Bj−1) a new system ˆ

Cjof subsets such that

[

C∈ˆ

C=[

C∈Cj−1

101

and

δj:= diam( ˆ

Cj) = θjdiam(Cj−1),

where 0 < θmin ≤θj≤θmax <1.

(ii) Selection

Bj:= ∅

for all C∈ˆ

choose a set of test points XC⊂C

Bj:= ˆ

BjS

y∈XC

(BLS(y, δj)×C)

for all B∈ˆ

choose a set of test points XB⊂B

X:= S

B∈ˆ

BjXB

NF:= F(G,H)-nondominated points of X

Bj:= nB∈ˆ

Bj:XB∩NF6=∅o

As in all our algorithms of subdivision type, a termination criterion for the BL2-

Subdivision algorithm can be defined by the condition diam(Bj)< ε for some small

ε > 0, which is equivalent with the termination after a number of iterations.

Observe that at the end of each iteration the number of boxes B∈ Bjis mini-

mized by the use of a feasibility and nondominance test. Thus, the refinement of

the y-space in the next iteration is restricted to those regions where solutions of the

BLMOP are expected to be. Both, the adaptive refinement itself and the restriction

of the respective refinements to the relevant regions help to keep the computational

effort low.

Let us make a remark concerning the choice of test points for the higher level

nondominance test. If the method for the solution of the lower level problems

produces for every ya representative number of f(g,h)-nondominated points, these

points can be collected in an archive, which serves as a basis for the choice of the

set Xof testpoints, the higher level nondominance test has to be performed on. In

this way, the computational effort for the construction of X, which is particularly

time consuming in the high-dimensional case, can be reduced to a minimum.

Example 6.20 Given a higher level function F= (F1, F2)t:R2×R→R2, a lower

level function f= (f1, f2)t:R2×R→R2, and a lower level inequality constraint

102

function g= (g1, g2)t:R2→R2, we consider the following BLMOP:

min

x∈

2, y ∈

F(x, y) = x4

1+x2

2+ (y−6π)2

1+ (x2−6π)2+y4,(6.75)

such that xsolves:

min

x∈

2f(x, y) = sin x1+ sin(x2−y)

cos(x1−y) + cos x2,

such that g(x) = 0.2−x2

1+x2

2−5≤0,

(6.76)

where ≤has to be understood component-wise.

The solution of this problem in parameter space, as computed by the BL2-

Subdivision algorithm, is shown in Figure 36. For comparison reasons, we have also

computed the union of the lower level solutions associated with an extensive set of

parameters y∈

musing the BL-Subdivision algorithm. This union of lower level

solutions makes up the feasible set for the higher level problem and is also depicted

in Figure 36. Observe, that the solution of the BLMOP complies with just a small

part of the union. In Figure 37 the solution of the above BLMOP in higher level

objective space is shown.

7 Sensitivity Analysis

In this section we present results concerning the sensitivity of a BLMOP, that is,

we are interested in the behavior of the solution when the BLMOP is disturbed

due to additional parameters. To be more precise, we derive estimations for the

distance of a particular Pareto point of the original problem to the Pareto sets of

the problems arising when the perturbation parameters vary within a certain range.

Such information can help the decision maker to choose a particular solution out

of the computed Pareto set for the adjustment of the system under consideration,

particularly, if this system is affected by uncertainties. Moreover, as we will see in

Section 7.3, the derived estimations can be used to influence the representation of

the Pareto set computed by the algorithms described in the previous sections.

7.1 Sensitivity Analysis for Classical Optimization Problems

Before we can derive the sensitivity results for MOP and BLMOP we recall the basic

sensitivity theorem stated in [17], which is concerned with (scalar valued) parametric

103

Figure 36: The solution of the non-convex BLMOP in parameter space.

Figure 37: The solution of the non-convex BLMOP in higher level objective space.

104

optimization problems of the form

min

xF(x, ) (P())

s.t. Gi(x, )≤0, i = 1, . . . , q,

Hj(x, ) = 0, j = 1, . . . , p,

where ∈

mis a (fixed) perturbation parameter vector and all functions

F, Gi, Hj:

n×

m→

are twice continuously differentiable with respect to

x. For the remainder of this section let L:

n×

q×

p×

m→

, defined by

L(x, µ, λ, ) := F(x, ) +

i=1

µiGi(x, ) +

j=1

λjHj(x, ),

be the Lagrangian of P(). The gradient ∇F(x, ) of a function F:

n×

m→

is understood to be a row vector and the Hessian of Fis denoted by ∇2F(x, ).

Accordingly, the i−th row of the Jacobian ∇G(x, ) of a vector valued function

n×

m→

kis given by the gradient ∇Gi(x, ) of the i−th component of G.

Moreover, denote ¯

∇:= ∇(x),∇:= ∇(), and let IG(x, ) := {i:Gi(x, )=0}.

Theorem 7.1 Let x?∈

nbe a solution of P(0), such that the following conditions

hold at x?:

(i) The gradients ¯

∇F(x, ),¯

∇Gi(x, ),¯

∇Hj(x, )and the constraints Gi(x, ),

Hj(x, ),i= 1, . . . , q, j = 1, . . . , p, are continuously differentiable with respect

to in a neighborhood of (x?,0).

(ii) The gradients ¯

∇Gi(x?,0), i ∈IG(x?,0),¯

∇Hj(x?,0), j = 1, . . . , p, are linearly

independent.

(iii) There are µ?

i≥0, i = 1, . . . , q and λ?

j∈

, j = 1, . . . , p, such that

∇L(x?, µ?, λ?,0) = 0,(7.1)

and

µiGi(x?,0) = 0, i = 1, . . . , q. (7.2)

(iv)

z¯

∇2L(x?, µ?, λ?,0) z > 0 (7.3)

105

for all z6= 0 with

∇F(x?,0)z= 0,

∇Gi(x?,0)z≤0, i ∈IG(x?,0),

∇Hj(x?,0)z= 0, j = 1, . . . , p.

(v)

µi>0, i ∈IG(x?,0).(7.4)

Then

(a) x?is an isolated local solution of P(0) and the multipliers µ?

i, λ?

jare unique,

(b) for in a neighborhood of 0, there exists a unique, continuously differentiable

vector function v()=(x()t, µ()t, λ()t)tsatisfying the second order suffi-

cient conditions for a local minimum of P()such that v(0) = (x?t, µ?t, λ?t)t

and hence x()is a unique local minimum of P()with associated multipliers

µ(), λ(),

strict complementarity slackness µiGi(x(), )=0holds for i= 1, . . . , q, and

the gradients ¯

∇Gi(x(), ),¯

∇Hj(x, ), i ∈IG(x, ), j = 1, . . . , p of the active

constraints with respect to xare linearly independent at x().

Proof: See [17].

Differentiating the equations

∇L(x(), µ(), λ(), ) = 0,(7.5)

µi()Gi(x(), ) = 0, i = 1, . . . , q, (7.6)

H(x(), ) = 0,(7.7)

with respect to , it follows that

M()∇v() = N(),(7.8)

where

M() := 



∇2L(x(), µ(), λ(), ) ( ¯

∇G(x()), )t(¯

∇H(x(), ))t

Iqµ()¯

∇G(x(), )IqG(x(), ) 0

∇H(x(), ) 0 0 



106

and

N() := −

∇(¯

∇L(x(), µ(), λ(), )t)

Iqµ()∇G(x(), )

∇H(x(), )



and Iqdenotes the unity matrix in

q×q. As stated in [17], M() is regular under

the assumptions of Theorem 7.1 for near 0. Thus, it follows that

∇v() = M()−1N()

for near 0, where the quantities are evaluated as in (7.8). This expression can

be substituted into the Taylor expansion of v() in order to obtain a first-order

approximation, as is stated in the following

Corollary 7.2 Under the assumptions of Theorem 7.1, a first-order approxima-

tion of v()in a neighborhood of = 0 is given by

v() = 



x()

µ()

λ()

=



µ?

λ?

+M(0)−1N(0)+o(||||),(7.9)

where Φ() := o(||||)means that Φ()/|||| → 0as →0.

7.2 Sensitivity Analysis for MOP and BLMOP

In this section we develop methods for the sensitivity analysis of both parametrized

MOPs and parametrized BLMOPs. Since later on in this section the analysis for

BLMOPs is realized by applying the analysis for MOPs on a suitable reformulation

of BLMOP, we begin by extending the previously presented concept, which was

adapted from [17], in order to analyze parametrized MOPs of the following form:

min

xF(x, ) (PM())

s.t. Gi(x, )≤0, i = 1, . . . , q,

Hj(x, ) = 0, j = 1, . . . , p.

Here, F:

n×

m→

lis a vector-valued function and minimization has to

be understood in the sense of Definition 2.1, that is, in the sense of multi-objective

optimization.

To be more precise, we are interested in finding the maximal distance of a solution

x?=x(0) of PM(0) to a (compact) Pareto set of PM(), when varies within a

107

neighborhood of 0, that is, we want to estimate

∆(x?, δ) := max

min

x{||x?−x|| :xsolves PM(),|||| ≤ δ}

for some δ > 0 and some norm || · ||. For this we will also have to consider the

following weighted sums problems corresponding to fixed weighting vectors α∈

Sα:= {α:Pl

i=1 αi= 1, αi≥0, i = 1, . . . , l}:

min

i=1

αiFi(x, ) (PWS(, α))

s.t. Gi(x, )≤0, i = 1, . . . , q,

Hj(x, )=0, j = 1, . . . , p.

In the following we make the assumption

(A1) for every local solution x?of PM() there is a corresponding weighting vector

α∈Sαsuch that x?is also a unique local solution for problem PWS(, α).

W.l.o.g., let = 0 and let x?=x(0) be both a local Pareto point of PM(0) and

a local solution of PWS(0, α?) for a weighting vector α?∈Sα. Since the unique

solution of PWS(, α?) is contained in the Pareto set of PM(), we can conclude that

∆(x?, δ)≤max

{||x?−x|| :xsolves PWS(, α?),|||| ≤ δ},

and thus an upper bound of first order for ∆(x?, δ) can be obtained as follows. We

substitute F(x, ) by the weighted sums scalarization Pl

i=1 α?

iFi(x, ) and observe

that from Corollary 7.2 we have

||x?−x()|| =˜

M(0)−1N(0)+o(||||),

where M(0) and N(0) are the above defined matrices associated with the mentioned

substitution for Fand the matrix ˜

M(0)−1is given by the first nrows of M(0)−1.

Consequently,

∆(x?, δ)˙

≤max

{|| ˜

M(0)−1N(0)|| :|||| ≤ δ},(7.10)

where a˙

≤bmeans a≤b+o(||||) for all a, b > 0.

As we will present in the following, (7.10) can be improved in the sense that

x?is compared to the entire Pareto sets of the perturbed problems PM() and not

only to those particular solutions out of these sets, which are given by the solutions

108

of the problems PWS(, α?). For this we consider the problems PWS(, α), while

regarding not only but also αas perturbation parameters. In this case, Corollary

7.2 yields

x(, α) = x?+˜

M(0, α?)−1N(0, α?)

α−α?+o

α−α?.(7.11)

According to the assumption (A1), for all near 0, the distance of x?to the Pareto

set of PM() is given by the distance of x?to the unity of the solutions of the

problems PWS(, α), α∈Sα:

∆(x?, δ) = max

min

α{||x?−x(, α)|| :α∈Sα,|||| ≤ δ}.(7.12)

Consequently, with A?:= ˜

M(0, α?)−1N(0, α?) and ξ(, α) := (, α−α?)tthe following

equation holds up to first order:

∆(x?, δ) ˙= max

min

α{||A?ξ(, α)|| :α∈Sα,|||| ≤ δ}.(7.13)

Whereas the above considerations are based on comparisons of Pareto points in

parameter space, it might also be important to know the behavior of the objective

values caused by the variation of . Assuming that Fiis Lipschitz continuous with

Lipschitz constant Li,i= 1, . . . , l, we have

|Fi(x, )−Fi(x?,0)|˙

≤Li∆(x?, δ) (7.14)

for all with |||| ≤ δand x∈ {x:xsolves PM(),||x−x?|| ≤ ∆(x?, δ)}. The

expression (7.14) considers the variation of Fifor the case that the distance of x?

to the Pareto set of PM() in pre-image space is maximized subject to |||| ≤ δ. In

practice it can also be interesting to know the maximal distance of Fi(x?,0) to the

i-th component of the Pareto set of PM() in image space subject to |||| ≤ δ, which

for all i= 1, . . . , l can be expressed by

∆Fi(x?, δ) := max

min

x{|Fi(x?,0) −Fi(x, )|:xsolves PM(),|||| ≤ δ}.

Accordingly, if we are interested in the maximal distance of F(x?,0) to the Pareto

set of PM() in image space subject to |||| ≤ δ, we have to consider

∆F(x?, δ) := max

min

x{||F(x?,0) −F(x, )|| :xsolves PM(),|||| ≤ δ}.

From (7.11), we have

Fi(x(, α), )≈Fi(x?+A?ξ(, α), )

109

and therefore we can state

∆Fi(x?, δ)≈max

min

α{|Fi(x?,0) −Fi(x?+A?ξ(, α), )|:xsolves PM(),|||| ≤ δ}

and

∆F(x?, δ)≈max

min

α{||F(x?,0)−F(x?+A?ξ(, α), )|| :xsolves PM(),|||| ≤ δ}.

Example 7.3 Consider F= (F1, F2)t:

2×

→

2defined by

F1(x, ) = (x1−)2+x2

2,(7.15)

F2(x, ) = (x1−−1)2+ (x2−1)2(7.16)

and the corresponding unconstrained parametrized MOP

min

xF1(x, )

F2(x, ).(7.17)

The Pareto set of this MOPis given by {(x+, x):0≤x≤1}for every ∈

. Moreover, for every ∈

and every α∈Sαwe consider the weighted sums

scalarization Fα:

2×

→

defined by

Fα(x, ) = α1F1(x, ) + α2F2(x, )

and the associated weighted sums problem

min

xFα(x, ).(7.18)

Observe that, since F1and F2are convex for every fixed ∈

,Fαis convex for all

fixed ∈

,α∈Sα, and therefore the corresponding solutions of (7.18) are unique.

For the following demonstration we fix α1=α?

1=3

4,α2=α?

2=1

4and =?= 0.

Thus, the necessary optimality condition for a solution of (7.18) is

∇Fα?(x, 0) = 3

4¯

∇F1(x, 0) + 1

4¯

∇F2(x, 0) = 3

42x1

2x2t

42x1−2

2x2−2t

= 0,

from which we obtain the unique solution

x?=x?

2=1

41

1.

As mentioned before, for our analysis we have to regard both and αas per-

turbation parameters and therefore we have to calculate M(, α) and N(, α). The

gradient ¯

∇Fα(x?, ) turns out to be

∇Fα(x?, ) = 2(α1(x1−) + α2(x1−−1), α1x2+α2(x2−1))

110

Since there are no active constraints in the neighborhood of the considered point,

the matrix M(, α) is simply

M(, α) = ¯

∇2Fα(x?, ) = 2(α1+α2) 0

0 2(α1+α2)= 2 1 0

0 1 

and consequently

M(, α)−1=1

21 0

0 1 .

Denoting by ∇the gradient with respect to (, α), N(, α) turns out to be

N(, α) = −∇¯

∇Fα(x?, )t

=−2−(α1+α2)x1− x1−−1

0x2x2−1

=−2−1x1− x1−−1

0x2x2−1.

It follows that

∇x(?, α?) = M−1(0, α?)N(0, α?) = −1

4−4 1 −3

0 1 −3

and from (7.11) we have

x(, α) ˙= 1

41

1−1

4−4 1 −3

0 1 −3





α1−α?

α2−α?

2

.

Thus, for fixed α=α?we have

x(, α?) ˙= 1

41

1+

0,

that is, x(, α?) moves in parallel to the x1-axis when varies, see Figure 38. With

these results, according to (7.10), we can write

∆(x?, δ)˙

≤max

{||x(, α?)−x?|| :|||| ≤ δ}(7.19)

= max

{|||| :|||| ≤ δ}=δ. (7.20)

As one can see in Figure 38, for the case δ= 1, both x(−1, α?) and x(1, α?) are

located on the line x2=1

4such that ||x(−1, α?)−x?|| =||x(1, α?)−x?|| = 1 and

therefore we have

∆(x?,1) ˙

≤1.(7.21)

111

Now we go a step ahead and calculate a more accurate estimation according to

(7.13). For this, we have to find for every near ?= 0 the minimum of



M−1(0, α?)N(0, α?)





α1−α?

α2−α?

2



with respect to αor, equivalently, we have to solve

min

α2

(−4+ 1 −4α2)2+ (1 −4α2)2(7.22)

s.t. 0≤α2≤1,(7.23)

where α1has been eliminated by the substitution α1= 1 −α2. Assuming for the

moment that varies only in a small neighborhood such that the solution ˆα() of

(7.22) can be expected to vary within the open interval (0,1), we can omit the

constraints on α2and therefore the necessary optimality condition turns out to be

∇α2((−4+ 1 −4α2)2+ (1 −4α2)2) = −8(−4+ 2 −8α2) = 0

2−1+4α2= 0,

from which we obtain the solution

ˆα2=1

4−

2and ˆα1= 1 −1

4−

2=

2+3

This result is valid for all ∈(−3

2,1

2), because in this case ˆα2() remains within

the open interval (0,1), that is, there are no active constraints at ˆα2(). For ∈

\(−3

2,1

2) the necessary optimality condition alters to be

2−1+4α2−µ= 0,

where the sign of µ≥0 for α2= 1 and µ≤0 for α2= 0. More precisely, we have

µ=









3+2,  ≤ −3

0,  ∈(−3

2,1

2−1,  ≥1

(7.24)

and finally

ˆα() = 









(0,1)t,  ≤ −3

(

2+3

4,1

4−

2)t,  ∈(−3

2,1

(1,0)t,  ≥1

.(7.25)

112

As can be seen from this result, both ˆα1() and ˆα2() depend continuously and

monotonously on . Moreover, it can be shown that the function

→ ||M−1(0, α?)N(0, α?)ξ(, ˆα())||

is also continuous and monotone. Thus, for δ= 1, the calculation of (7.13) reduces

∆(x?,1) ˙= max

{||M−1(0, α?)N(0, α?)ξ(, ˆα())|| :∈ {−1,1}}

= max (1

√2,√10

4)=√10

4.(7.26)

Figure 38: Comparison of the different estimations for ∆(x?,1).

The sensitivity analysis for a BLMOP without lower level inequality constraints

can be realized by the use of the sensitivity analysis for MOP as described above.

To be more precise, the desired estimations are based on the application of (7.10)

and (7.13), respectively, to the following reformulation obtained by replacing the

113

lower level problem by its Kuhn-Tucker conditions:

min

x∈

n,y∈

m,α∈

l,ζ∈

pF(x, y),(7.27)

s.t G(x, y)≤p0,

H(x, y) = 0,

i=1

αi∇xfi(x, y) +

i=1

ζi∇xhi(x, y) = 0,(7.28)

h(x, y) = 0,

i=1

αi−1 = 0,

−αi≤0, i = 1, . . . , l,

where minimization has to be understood in the sense of multi-objective optimiza-

tion. For a more detailed description we consider the following

Example 7.4 Let F= (F1, F2)t, f = (f1, f2)t:

2×

→

2be defined by

F1(x, y, ) = 1

2(x1−y)2+ (1 + )x2

2,

F2(x, y, ) = 1

2(x1−y−1)2+ (x2−1)2,

f1(x, y, ) = 1

2(x1−2)2+ (x2−y−)2,

f2(x, y, ) = 1

2(x1+ 1)2+ (x2−1)2,

and consider the following parametrized BLMOP:

min

x,y F(x, y, ),(7.29)

s.t. xsolves: min

xf(x, y, ).

Observe that there are no explicitly given constraints for the upper level. Therefore,

in the following reformulation used for the calculation of the desired estimations, we

will use Hand Gto denote the constraints which arise from replacing the lower level

114

by its Kuhn-Tucker conditions. Thus, we consider the following auxiliary problem:

min

x,y,α F1(x, )

F2(x, ),(7.30)

s.t.

H1(x, α, ) := α1(x1−2) + α2(x1+ 1) = 0,(7.31)

H2(x, α, ) := α1(x2−y−) + α2(x2−1) = 0,(7.32)

H3(x, α, ) := α1+α2−1 = 0,(7.33)

G1(x, α, ) := −α1≤0,(7.34)

G2(x, α, ) := −α2≤0.(7.35)

Here, the equations H1and H2correspond to the rows of (7.28). As can be verified

by the optimality conditions stated in Theorem 6.1, the point (x?

1, x?

2, y?, α?

1, α?

2) =

2,1

2,0,1

2,1

2) along with corresponding multipliers λ1=λ2=λ3=µ1=µ2= 0

is both a solution of the given BLMOP for ?= 0 and a solution of the related

weighted sums problem corresponding to the weights (β?

1, β?

2) = (1

2,1

2). Now, let

L(x, y, α, β, µ, λ, ) =

i=1

βiFi(x, y, ) +

i=1

µiGi(x, y, ) +

i=1

λiHi(x, y, )

be the Lagrangian associated with the reformulation (7.30) – (7.35) and denote

∇=∇(x,y,α)and ∇=∇(,β). Then we obtain

∇2L(x?, y?, α?, β?, µ?, λ?, ?) = 





1 0 −1 0 0

0 1 0 0 0

−1 0 −1 0 0

0 0 0 0 0







∇¯

∇L(x?, y?, α?, β?, µ?, λ?, ?) = 





2−1

0−1

0 0 0







∇H(x?, y?, α?, ?) = 



0 0 0

−1

20 0

0 0 0 

,

∇G(x?, y?, α?, ?) = 0 0 0 −1 0

0 0 0 0 −1,

115

and

∇H(x?, y?, α?, ?) = 



1 0 0 −3

0 1 −1

2−1

0 0 0 1 1 

,

yielding

M(?, β?) =







1 0 −1 0 0 0 0 1 0 0

0 1 0 0 0 0 0 0 1 0

−1 0 −1 0 0 0 0 0 −1

0 0 0 0 0 −1 0 −3

0 0 0 0 0 0 −13

2−1

0 0 0 0 0 −1

20 0 0 0

0 0 0 0 0 0 −1

20 0 0

1 0 0 −3

20 0 0 0 0

0 1 −1

2−1

20 0 0 0 0

0 0 0 1 1 0 0 0 0 0







and

N(?, β?) =







2−1

0−1

0 0 0

−1

20 0

0 0 0







Since we are particularly interested in (x(), y()), let ˜

M−1(?, β?) denote the first

n+m= 3 rows of M−1(?, β?). Then we have

A?=˜

M−1(?, β?)N(?, β?) = 1

316 

−90 78 −78

−137 −34 34

−18 −16 16 

.

We assume that may vary within (0.5,0.5). Thus we are interested in estimating

∆((x?, y?),0.5). From (7.10), by fixing β=β?, we can calculate

∆((x?, y?),0.5) ˙

≤

A?



0.1

0

≈0.2609.(7.36)

116

To compute an estimation according to (7.13), observe that for β∈Sβthe gradient

of 

A?





β1−β?

β2−β?

2



with respect to (β1, β2) is given by

937

6241 β1−β2−1037

3748

−β1+β2+1037

3748=937

6241 1−2β2−1037

3748

−1+2β2+1037

3748,

and vanishes if and only if β1=1

2(1 + 1037

3748) and β2=1

2(1 −1037

3748). Moreover,

for || ≤ 0.5, we have β1, β2∈(0,1). Therefore,

min

β





A?





β1−β?

β2−β?

2



:β∈Sβ





=





A?





1037

3748

1037

3748







=r7993

29984.

Consequently, we can rewrite (7.13) as

∆((x?, y?),0.5) ˙= min

β





A?



0.5

β1−β?

β2−β?

2



:β∈Sβ





= 0.5r7993

29984 ≈0.2582,

which, as expected, is a better estimation than (7.36).

7.3 A Concept for the Adaptive Choice of Targets

When computing a discrete representation of a Pareto set for MOP or BLMOP,

respectively, by solving parametrized subproblems, sensitivity analysis can be used

to generate suitable parameters in order to control the distance (in parameter space

as well as in image space) between neighboring points of the computed represen-

tation. Here, we will present a new recovering algorithm of reference point type

which can be understood as a variant of our image set-oriented recovering methods

presented in Section 4.1 and Section 6.2. The new algorithm uses a concept for

the adaptive choice of targets (which are considered as perturbation parameters),

such that the generated Pareto points are well-distributed in parameter space rather

than in image space. This is motivated by applications with high-dimensional pa-

rameter space, where on one hand a well-distributed representation of the Pareto

set in parameter space is required (as can be computed by our classical set-oriented

methods), but on the other hand one wants to benefit from the advantages of the

new image set-oriented methods mentioned in Section 4.

117

The variant of the new algorithm for the solution of BLMOP is realized by

replacing the lower level problem by its Kuhn-Tucker conditions and computing a

solution of the resulting reformulation, which has the form of a constrained MOP.

Therefore, the following description is related to a general variant for the solution of

constrained MOPs. Thereafter, Example 7.5 shows that this algorithm also works

satisfactorily for BLMOPs. For the description of the new algorithm recall, that in

image set-oriented recovering methods, every new Pareto point ˆxis generated in the

neighborhood of a known Pareto point x?by minimizing the distance ||F(x)−t||

between F(x) and a target

t∈ {F(x?) +

k−1

j=1

jbj+ ˜(t?−F(x?))},

where {bj, j = 1, . . . , k −1}is a basis of the tangent space TF(x?)F(P) of the Pareto

set in image space at F(x?), and t?is the target used previously for generating x?.

The vector = (1, . . . , k−1) and the value ˜make up a parameterization for the

target t. Let us consider the case where the Pareto set is convex (in image space).

In this case, we can always choose ˜= 0, and therefore we write t=t(). Observe

that t(0) = F(x?) and thus minimization of ||F(x)−t(0)|| ends up in x?, that is,

we have x(0) = x?for the optimum function x(). Now suppose that, starting from

a previously found Pareto point x?, we want to find a new Pareto point ˆxsuch that

approximately

||ˆx−x?|| =δ(7.37)

holds. According to Corollary 7.2, we have to find a parameter vector ∈

k−1such

that

|| ˜

M−1(0)N(0)|| =δ,

where ˜

Mand Nare evaluated as in (7.8) while the objective F(x) is substituted by

the (differentiable) auxiliary function ||F(x)−t()||2. For every fixed i= 1, . . . , k−1,

since determines the target Pk−1

j=1 jbj∈TF(x?)F(P), we can choose j= 0 for j6=i

in order to force that F(ˆx) is close to F(x?) + ibi. Then, if ˜

M−1(0)N(0)ei6= 0,

where eidenotes the i-th standard basis vector in

k−1, we can conclude, that we

have to choose

ˆi:= |i|=δ

|| ˜

M−1(0)N(0)ei||.

Observe that, since we are dealing with a linear approximation, we can use the

targets

F(x?) + ˆibiand F(x?)−ˆibi

118

for computing two different Pareto points with the required property expressed by

(7.37). It should be mentioned that particularly for the case k≥3, the points

computed with this technique certainly have approximately the desired distance to

x?, but in general there is no information on the distance to the other Pareto points

which have been computed so far. Here, additional strategies are needed in order to

obtain diversity among all generated Pareto points. For the particular case k= 2 it

is certainly a good choice to compute at first the individual minimum x?of one of the

objectives Fiand then – since F(x?) makes up a vertex of the 1-dimensional Pareto

set – using successively the technique described above and choosing the respective

targets tin a way such that the i-th component tiincreases (and the other one

decreases) successively, see Figure 39 (right). Another strategy is to start with an

arbitrary target t, and after calculating the solution x?associated with that target

t, to extend the Pareto set into both directions with the mentioned technique by

choosing two different targets ˜

tand ˆ

tin a way such that for a fixed ˜

t1and ˆ

t2increase

and ˜

t2and ˆ

t1decreases (or vice versa) successively, see Figure 39 (left). In contrast to

the image space oriented recovering algorithms described in Section 4.1 and Section

6.2, the concept of using boxes in image space can be omitted here, since for k= 2

the desired diversity among all generated Pareto points is obtained with the help of

the adaptively chosen targets.

Figure 39: Adaptive target construction starting at an arbitrary Pareto point (left)

and at the Pareto point corresponding to the individual minimum of F1(right).

Example 7.5 Given a higher level function F= (F1, F2)t:R2×R→R2and a

lower level function f= (f1, f2)t:R2×R→R2, we consider the following BLMOP:

119

min

x,y ∈

F(x, y) = 10(x1−1)4+ (x2−1)2+ (y−1)2

(x1+ 1)2+ (x2+ 1)2+ (y+ 1)4,(7.38)

such that xsolves:

min

x∈

f(x, y) = (x1−2)2+ (x2−2)4+ 3y2

2y(x1+ 0.5)2+ (x2+ 1)2.(7.39)

We have calculated a solution for this problem using the BL-Recovering-IS

method (which uses the Kuhn-Tucker reformulation for the lower level problem)

in two different ways:

On one hand, the target parameter = 1 was fixed and consequently the distance

between neighboring computed Pareto points F(x?

i) and F(x?

i+1), i= 1,2,..., in

image space corresponds approximately with this value = 1, while the distance

between the associated points x?

iand x?

i+1,i= 1,2,..., in parameter space varies,

see Figure 40.

On the other hand, the target parameter = ˆwas estimated in every step by

the above sensitivity analysis based technique. Here, we have chosen δ= 0.15 and as

depicted in Figure 41, the distance between neighboring computed Pareto points in

parameter space x?

iand x?

i+1,i= 1,2,...,corresponds – as desired – approximately

with this value δ= 0.15, while the distance between the associated points F(x?

and F(x?

i+1), i= 1,2,..., in image space varies.

Figure 40: The solution in parameter space (left) and image space (right) for fixed

target parameter = 1.

120

Figure 41: The solution in parameter space (left) and image space (right) for adap-

tive target parameter ˆrelated to δ= 0.15.

8 Conclusion and Outlook

In this work we considered the class of bi-level multi-objective optimization problems

(BLMOP), which can be understood as a generalization of classical bi-level opti-

mization problems to the case where both the upper and lower level are given by

multi-objective optimization problems (MOP) instead of scalar-valued optimization

problems.

Based on the well-known Kuhn-Tucker optimality conditions for classical opti-

mization problems, we have developed optimality conditions for BLMOP for the case

that the lower level problem is a convex one without inequality constraints. These

optimality conditions have been used for the definition of numerical algorithms for

the solution of this particular subclass of BLMOP.

Since the solution of bi-level multi-objective optimization problems, the Pareto

set, is typically an extensive set or, in mathematical terms, a manifold, most of

our algorithms are embedded in a set-oriented framework. In particular, we have

concentrated on two main directions, that is, algorithms of subdivision type and

algorithms of recovering type. Both have their individual advantages and work

satisfactorily on their own, but it turned out that performance can often be improved

when they are used in combination.

Moreover, in this work we have considered the sensitivity analysis for MOP and

BLMOP, that is, we have investigated the behavior of the Pareto set under the

variation of additional parameters. The results have particularly been applied to

one of our set-oriented algorithms in order to control the spreading of the discrete

representation of the computed solution.

121

The restriction to the above mentioned subclass of BLMOP is motivated by

the following mathematical circumstances: the lower level problem is assumed to

be convex, because otherwise it can not be guaranteed that the points satisfying

the Kuhn-Tucker conditions for the lower level comply exactly with the feasible

region for the upper level problem. Moreover, the common constraint qualifications,

which have to be fulfilled for applying the Kuhn-Tucker conditions to the higher level

problem, are necessarily violated in the presence of lower level inequality constraints.

The development of advanced optimality conditions for the general BLMOP,

which also includes non-convex lower level problems with inequality constraints,

is an interesting field which shall be investigated in the future. Well-known second

order sufficient optimality conditions for classical optimization may form a promising

basis for further development. Moreover, the above mentioned sensitivity analysis

may be an additional tool not only for the development of the desired optimality

conditions, but also for designing algorithms for the solution of the general BLMOP.

Up to now, we have used the developed algorithms to solve several academic

examples and technical applications. Our interest for the future is to apply both the

existing and upcoming algorithms to BLMOPs arising from real world problems of

any discipline.

122

References

[1] J. F. Bard, Practical Bilevel Optimization: Algorithms and Applications, Kluwer

Academic Publishers, 1998.

[2] H. Bonnel and J. Morgan, Semivectorial Bilevel Optimization Problem: Penalty Ap-

proach, Journal of Optimization Theory and Application, vol. 131(3), pp. 365–382,

2006.

[3] M. Brown, R. E. Smith, Directed Multi-Objective Optimisation, International Journal

of Computers, Systems and Signals, vol. 6(1), 2005.

[4] I. Chauduri, S. Sertl, H. Zolt´an, M. Dellnitz and T. Fraunheim, Global Optimization

of Silicon Nanoclusters, Applied Surface Science, vol. 226(1-3), pp. 108-113, 2004.

[5] B. Colson, P. Marcotte and G. Savard, Bilevel programming: A survey, A Quarterly

Journal of Operations Research, vol. 3, pp. 87–107, 2005.

[6] I. Das, J. E. Dennis, A Closer Look at Drawbacks of Minimizing Weighted Sums of

Objectives for Pareto Set Generation in Multicriteria Optimization Problems, Struc-

tural Optimization, vol. 14, pp. 63–69, 1997

[7] I. Das, J. E. Dennis, Normal-Boundary Intersection: A new Method for Generating

the Pareto Surface in Nonlinear Multicriteria Optimization Problems, SIAM Journal

of Optimization, vol. 8, no.3, pp. 631–657, 1998

[8] J. P. Dauer, T. A. Fosnaugh, Optimization over the Efficient Set, Journal of Global

Optimization, vol. 7, pp. 261–277, 1995.

[9] A. Dell’Aere, Numerische nichtlineare Optimierung in MuPAD, Diplomarbeit, Pader-

born, 2003

[10] A. Dell’Aere, Multi-Objective Optimization in Self-Optimizing Systems, Proceedings

of the 32nd Annual Conference of the IEEE Industrial Electronics Society, Paris, 2006

[11] M. Dellnitz, O. Sch¨utze and T. Hestermeyer, Covering Pareto Sets by Multilevel

Subdivision Techniques, Journal of Optimization Theory and Application, vol. 124(1),

pp. 113–13, 2005.

[12] S. Dempe, Foundations of Bilevel Programming, Kluwer Academic Publishers, 2002.

[13] S. Dempe, Annotated Bibliography on Bilevel Programming and Mathematical Pro-

grams with Equilibrium Constraints, Optimization, vol. 52, pp. 333–359, 2003.

[14] P. Deuflhard, A. Hohmann, Numerische Mathematik I – Eine algorithmisch orientierte

Einfhrung, de Gruyter & Co., 1993.

[15] M. Ehrgott, Multicriteria Optimization, Lecture Notes in Economics and Mathemat-

ical Systems, vol. 491, Springer, Berlin, 2000.

[16] G. Eichfelder, Parametergesteuerte L¨osung nichtlinearer multikriterieller Opti-

mierungsprobleme, PhD thesis, University of Erlangen-N¨urnberg, Germany, 2006.

[17] A. V. Fiacco, Introduction to Sensitivity and Stability Analysis in Nonlinear Pro-

gramming, Academic Press, London, 1983.

[18] J. Figueira, S. Greco, M. Ehrgott, Multiple Criteria Decision Analysis, Lecture Notes

in Economics and Mathematical Systems, Springer, New York, 2005.

123

[19] M. Fukushima, Merit Functions for Variational Inequality and Complementarity

Problems, Nonlinear Optimization and Applications, G. Di Pillo and F. Giannessi

(eds.), Plenum Press, pp. 155-170, 1996.

[20] C. M. Fonseca, P. J. Fleming, E. Zitzler, K. Deb and L. Thiele, Evolutionary

Multi-Criterion Optimization. Second International Conference, EMO 2003, Springer,

Berlin, 2003.

[21] U. Frank, H. Giese, F. Klein, O. Oberschelp, A. Schmitt, B. Schulz, H. Vcking, K.

Witting, J. Gausemeier (Hrsg.), Selbstoptimierende Systeme des Maschinenbaus -

Definitionen und Konzepte, HNI-Verlagsschriften, vol. 155, Paderborn, 2004.

[22] M. Fukushima and J.-S. Pang, Convergence of a Smoothing Continuation Method

for Mathematical Programs with Complementarity Constraints, Ill-Posed Variational

Problems and Regularization Techniques, Lecture Notes in Economics and Mathe-

matical Systems, vol. 477, pp. 99-110, 1999.

[23] K. Harada, J. Sakuma, S. Kobayashi, Local Search for Multiobjective Function Op-

timization: Pareto Descent Method, Proceedings of the 8th annual conference on

genetic and evolutionary computation, pp. 659–666, New York, 2006.

[24] M. E. Henderson, Multiple Parameter Continuation: Computing Implictly Defined

k-Manifilds, International Journal of Bifurcation and Chaos, vol. 12, pp. 451–476,

2002

[25] C. Hillermeier, Nonlinear Multiobjective Optimization - A Generalized Homotopy

Approach, Birkh¨auser, Berlin, 2001.

[26] R. Horst and H. Tuy, Global Optimization: Deterministic Approaches, Springer,

Berlin, 1993.

[27] J. Jahn, Vector Optimization – Theory, Applications, and Extensions, Springer,

Berlin, 2004.

[28] H. Jiang and D. Ralph, Smooth SQP Methods for Mathematical Programs with

Nonlinear Complementarity Constraints, Journal on Optimization, vol. 10(3), pp.

779–808, 2000.

[29] H. Jiang and D. Ralph, Extension of Quasi-Newton Methods to Mathematical Pro-

grams with Complementarity Constraints, Computational Optimization and Appli-

cations, vol. 25(1-3), pp. 123–150, 2003

[30] T. Knoke, C. Romaus, J. B¨ocker, A. Dell’Aere, K. Witting, Energy management for

an Onboard Storage System based on Multiobjective Optimization, Proceedings of

the 32nd Annual Conference of the IEEE Industrial Electronics Society, Paris, 2006

[31] H. Kuhn and A. Tucker, Nonlinear programming, Proc. Berkeley Symp. Math. Statist.

Probability, vol. 2, pp. 481–492, 1951.

[32] Y. C. Liou, X. Yang, and J. C. Yao, Mathematical Programs with Vector Optimization

Constraints, Journal of Optimization Theory and Application, vol. 126(2), pp. 345–

355, 2005.

[33] Z. Q. Luo, J.-S. Pang, and D. Ralph, Mathematical Programs with Equilibrium

Constraints, Cambridge University Press, Cambridge, 1996.

[34] K. Miettinen, Nonlinear Multiobjective Optimization, Kluwer Academic Publishers,

1999.

124

[35] W. Oevel, Einf¨uhrung in die Numerische Mathematik, Spektrum Akademischer Ver-

lag, Heidelberg, 1996.

[36] P. M. Pardalos and M. G. C. Resende, Handbook of Applied Optimization, Oxford

University Press, New York, 2002.

[37] S. Sch¨affler and R. Schultz and K. Weinzierl, A Stochastic Method for the Solution of

Unconstrained Vector Optimization Problems, Journal of Optimization Theory and

Application, vol. 114(1), pp. 209–222, 2002.

[38] O. Sch¨utze, Set-oriented Methods for Global Optimization, PhD Thesis, University

of Paderborn, Germany, 2004.

[39] O. Sch¨utze, A. Dell’Aere and M. Dellnitz, On Continuation Methods for the Numer-

ical Treatment of Multi-Objective Optimization Problems, in Dagstuhl Seminar Pro-

ceedings 04461, Practical Approaches to Multi-Objective Optimization, IBFI, Schloss

Dagstuhl, Germany, 2005 http://drops.dagstuhl.de/opus/volltexte/2005/349.

[40] O. Sch¨utze, L. Jourdan, T. Legrand, E.-G. Talbi and J. L. Wojkiewicz, A Multi-

objective Approach to the Design of Conducting Polymer Composites for Electro-

magnetic Shielding, To appear in Evolutionary Multi-Criterion Optimization. Fourth

International Conference, EMO 2007, Springer, Berlin, 2007.

[41] S. Sertl and M. Dellnitz, Global Optimization using a Dynamical Systems Approach,

Submitted to Journal of Global Optimization, 2005

[42] X. Shi and H. S. Xia, Model and Interactive Algorithm of Bi-Level Multi-Objective

Decision Making with Multiple Interconnected Decision Makers, Journal of Multi-

Criteria Analysis vol. 10, pp. 27–34, 2001.

[43] H. v. Stackelberg, Marktform und Gleichgewicht, Springer, Berlin, 1934.

[44] P. T. Thach, H. Konno and D. Yokota, Dual Approach to Minimization on the Set

of Pareto-Optimal Solutions, Journal of Optimization Theory and Application, vol.

88(3), pp. 689–707, 1996.

[45] L. N. Vicente and P. H. Calamai, Bilevel and Multilevel Programming: A Bibliogra-

phy Review, Journal of Global Optimization, pp. 291–306, 1994.

[46] K. Witting, B. Schulz, A. Pottharst, M. Dellnitz, J. B¨ocker, N. Fr¨ohleke, A new

Approach for Online Multiobjective Optimization of Mechatronical Systems, Sub-

mitted to the International Journal on Software Tools for Technology Transfer STTT

(Special Issue on Self-Optimizing Mechatronic Systems), 2004.

[47] N. Yamashita, K. Taji, M. Fukushima, Unconstrained Optimization Reformulations

of Variational Inequality Problems, Journal of Optimization Theory and Applications,

vol. 92, pp. 439–456, 1997.

[48] J. J. Ye, Constraint Qualifications and KKT Conditions for Bilevel Programming

Problems, Mathematics of Operations Research, vol. 31(4), pp. 811–824, 2006.

125

List of Figures

1 The idea of the Subdivision algorithm. . . . . . . . . . . . . . . . . . . . . . 15

2 The idea of the Recovering algorithm. . . . . . . . . . . . . . . . . . . . . . 16

3 The idea of the Sampling algorithm. . . . . . . . . . . . . . . . . . . . . . . 18

4 The idea of the Recovering-IS algorithm. . . . . . . . . . . . . . . . . . . . . 22

5 The construction of targets based on b∈Ty?F(P) and d∈(Ty?F(P))⊥. . . 23

6 The solution of Example 4.1 computed by the image set-oriented recovering

algorithm using different box sizes in image space. . . . . . . . . . . . . . . 24

7 The solution of Example 4.2 computed by the Recovering-IS algorithm (im-

agespace). .................................... 25

8 The solution of Example 4.2 computed by the classical Recovering algorithm

(imagespace).................................... 25

9 Block diagram of the tram system with onboard storage and operation

managementsystem................................ 26

10 The operation principle of the onboard storage system. . . . . . . . . . . . . 27

11 The solution of the tram problem computed by the IS-Recovering algorithm

and by the Recovering algorithm (image space). . . . . . . . . . . . . . . . . 29

12 Optimization results for the tram model (image space). . . . . . . . . . . . 29

13 The solution (in image space) of Example 4.1 computed by the Recovering-

IS algorithm in combination with the described hierarchical concept. . . . . 40

14 The Pareto optimal objective values (blue), the corresponding sensitivities

(red), and the solution (black) of Problem (6.25). . . . . . . . . . . . . . . . 57

15 The Pareto set of (6.25) in image space and the solution to the correspon-

dingRMOP..................................... 57

16 min

˜z{||˜z−rB|| :˜

F(˜z)=0}versus ˜

F(˜z)=0. .................. 59

17 Projection of the generated box collection to the x-space. . . . . . . . . . . 60

18 The solution in lower level image space. . . . . . . . . . . . . . . . . . . . . 60

19 The solution in higher level image space. . . . . . . . . . . . . . . . . . . . . 61

20 The entire solution of Example 6.6 obtained by solving ˜

F(˜z) = 0 (projection

to x-space). .................................... 65

21 The ’inner’ part of the solution of Example 6.6 obtained by solving ¯

F(¯z) = 0

(projection to x-space). ............................. 65

22 The solution lies within the constraint surface defined by the lower level

problem. (the constraint surface was computed separately just for the pur-

pose of visualization, but need not be explicitly computed by our algorithms). 83

23 Comparison of the solution of the PSCMOP to the solution of the (uncon-

strained) higher level problem (higher level objective space). . . . . . . . . . 84

24 Box covering computed by a combination of the PSC-Sampling and PSC-SamRec

algorithms. .................................... 84

25 Box covering in parameter space computed by the PSC-Recovering algorithm. 86

26 The solution to the PSCMOP – computed by the PSC-Recovering algorithm

– is also a part of the solution to the lower level problem – computed by

theRecoveringalgorithm. ............................ 87

27 The solution in lower level image space as computed by the PSC-Recovering

algorithm...................................... 87

28 The solution in higher level image space as computed by the PSC-Recovering

algorithm...................................... 88

29 The solution of the shielding problem in higher level (left) and lower level

(right)objectivespace............................... 90

30 The solution of the shielding problem in parameter space (projections to

the (d1, d3, σ1)-subspace (left) and (σ3, d3, σ1)-subspace (right), respectively). 91

126

31 The solutions of Problem (6.70,6.71) as computed by the BL-Recovering-IS

algorithm are locally but not necessarily globally Pareto optimal. . . . . . . 95

32 Convergence of the smoothing method (parameter space) . . . . . . . . . . 96

33 Convergence of the smoothing method (lower level objective space) . . . . . 96

34 Convergence of the smoothing method (higher level objective space) . . . . 97

35 The Pareto set of Problem (6.72, 6.73) in higher level image space (left)

and lower level image space (right) for different parameters λ......... 99

36 The solution of the non-convex BLMOP in parameter space. . . . . . . . . . 104

37 The solution of the non-convex BLMOP in higher level objective space. . . 104

38 Comparison of the different estimations for ∆(x?,1)..............113

39 Adaptive target construction starting at an arbitrary Pareto point (left)

and at the Pareto point corresponding to the individual minimum of F1

(right)........................................119

40 The solution in parameter space (left) and image space (right) for fixed

target parameter =1. .............................120

41 The solution in parameter space (left) and image space (right) for adaptive

target parameter ˆrelated to δ= 0.15......................121

127

Index

(k−1)-dimensional manifold, 22

3-layered material, 88

F(G,H)-nondominated, 7

QR-decomposition, 30

ε-constraint method, 1

m-dimensional box, 21

active constraints, 47, 73, 106

adaptive concept, 23

aspiration levels, 18

auxiliary functions, 18

auxiliary problem, 11, 19, 20, 30

basic sensitivity theorem, 103

basis vectors, 64

basis vectors of tangent space, 64

bi-level multi-objective optimization, 11

bi-level multi-objective optimization prob-

lem, 2

bi-level optimization, 1

bi-level programming, 9

bi-level structure, 2

binary tree, 14

bisection, 16

BL-Recovering-IS algorithm, 52

BL-Recovering-PS algorithm, 52, 62

BL-Subdivision algorithm, 52, 56

BL2-Recovering-IS algorithm, 92

BL2-Subdivision algorithm, 101

BLMOP, 11, 52

boundary, 17, 53

of solution set, 81

box, 14, 40, 77

center, 14, 16

center of, 64

collection, 62, 77, 78

covering, 14, 79

neighbor of, 23

radius, 14, 16

size, 24

CHIM, 34

compact domain, 14

complementarity equations, 41

complementarity slackness, 106

complete partition, 16

complete set of alternatives, 3, 13, 22, 34,

component wise product, 53

computational effort, 20, 39, 79

computational time, 83

consistent grid of points, 21

constraint qualifications, 41

constraint surface, 82

constraints, 6, 9, 73, 81, 94

active, 106

continuation method, 15

continuously differentiable, 105

convex combinations, 34

convex hull of individual minima, 34

convex objectives, 35, 81, 85

corrector, 64

corrector step, 81

covering, 52

curvature, 22

decision maker, 20, 30, 39

decision making, 1, 13, 21

density of Pareto points, 39

depth, 39

derivative-free algorithms, 77

derivatives, 22, 30, 53, 77

design variables, 89

diameter, 14

differentiable, 105

differentiable objectives, 30

dimension

of image space, 21, 30

of parameter space, 21, 30

of PSCMOP, 79

of search space, 64

direction of descent, 6, 8

distance function, 18, 21

electromagnetic radiation, 88

expanded test box, 77

explicit methods, 44

explicitly defined inequality, 94

F-nondominated, 7

feasible point, 41

feasible set, 1, 19, 82

first derivatives, 53

Fischer-Burmeister function, 45, 50

follower, 10

GAIO, 4

Gauß-Newton method, 56

global nature of PSC-Sampling algorithm,

global Pareto points, 79

global Pareto set, 78, 79

128

gradient, 17, 30, 34, 46, 47, 105

Grahm-Schmidt method, 22

Hadamard product, 53

Hessian, 32, 39, 46, 72, 75, 80, 105

heuristic, 23

hierarchical concept, 40

hierarchical optimization, 9

hierarchy, 1

higher level function, 77

higher level image space, 82

higher level problem, 1, 9, 81

ill-conditioned auxiliary problem, 23

image of a function, 20

image set-oriented method, 20

image set-oriented methods, 3

image space, 20, 56, 79

dimension of, 21, 30

imaginary unit, 89

implicit methods, 44

inequality constraints, 94

injective, 36

inner point, 50, 75

isolated solution, 33, 106

iteration, 62, 78

iteration scheme, 16, 20, 30

Kuhn-Tucker, 6, 8, 11, 34, 45

Lagrangian, 46, 105

leader, 10

linear independence constraint qualifications,

linearly independent, 46, 73, 105

Lipschitz constant, 109

Lipschitz continuous, 109

local nature of PSC-SamRec algorithm, 79

local Pareto point, 7

locally nondominated, 78

lower level Lagrangian, 46

lower level objectives, 52

lower level problem, 1, 9, 73, 75

lower level system, 81

Mangasarian-Fromowitz constraint qualifica-

tions, 41

manifold, 8, 22

mathematical program with complementar-

ity constraints, 44

mathematical program with equilibrium con-

straints, 41

maximal rank, 48, 74

maximum number of iterations, 62

memory requirement, 14

merit function, 41

minimization, 12, 72, 114

MOP, 1, 6

parametrized, 11

multi-layer shielding material, 88

multi-level subdivision scheme, 14, 20

multi-level subdivision structure, 3

multi-objective optimization problem, 1, 6,

pareto set constrained, 71

multipliers, 81

NBI method, 30, 34

necessary optimality conditions, 31, 32, 71

neighborhood, 22, 105

neighboring box, 23, 64

non-convex objectives, 90

non-smooth methods, 42

non-unique lower level solutions, 10

nondominance test, 18, 77

nondominated, 7, 77

locally, 78

nondominated points, 77, 82

norm, 18

normal boundary intersection method, 1, 30,

objective derivatives, 22

objectives

non-convex, 90

one-to-one mapping, 36

optimality conditions, 6, 32, 71, 81

optimistic formulation, 10

optimization over the efficient set, 3, 55

orientation, 22

orthogonal basis, 22

parameter set-oriented methods, 3

parameter space, 20

dimension of, 21, 30

parameters, 20

parametric optimization, 9

parametric optimization problem, 103

parametrized MOP, 11

Pareto optimal, 7, 90

globally, 90

locally, 20, 90

Pareto point, 1, 7, 18, 22

global, 79

local, 34

robust, 55

Pareto set, 1, 11, 12, 52

boundary of, 53

approximation of, 22

129

curvature of, 22

entire, 31

global, 78, 79

orientation of, 22

well-distributed, 31

Pareto set constrained multi-objective opti-

mization problems, 45

Pareto set constrained multi-objective opti-

mization problems, 71

Pareto-optimistic formulation, 12

partial order, 6, 12, 72

partition, 16

pay-off matrix, 34

perturbation parameter, 105

perturbed Fischer-Burmeister function, 42

pessimistic formulation, 10

physical constants, 89

positive definite, 32, 80

pre-optimization, 27

predictor-corrector concept, 77

predictors, 30

preferred solution, 3, 55

PSC-Recovering algorithm, 75

PSC-Sam-Rec algorithm, 77

PSC-Sampling algorithm, 76

global nature of, 78

PSC-SamRec algorithm, 76

local nature of, 78, 79

PSC-Subdivision algorithm, 75

PSCMOP, 72

dimension of, 79

Recovering algorithm, 101

Recovering-IS algorithm, 21

reduced model, 27

reference point, 18

reference point method, 18, 30

regularity assumptions, 11

representatives, 82

robust approximation, 79

robust Pareto points, 55

robustness, 55, 88

Sampling algorithm, 101

second, 53

self-optimizing systems, 2

sensitivity, 55, 56, 103

sensitivity analysis, 22

sensitivity theorem, 103

sequential quadratic programming, 43

set-oriented algorithms, 14

set-oriented methods, 1, 3

slack variables, 52

smooth objectives, 22

smoothing functions, 42

smoothing methods, 41

smoothing parameter, 42

standard basis, 46

starting iterations, 20

starting points, 64

strictly convex, 35, 38

strictly monotonically increasing, 19

strong formulation, 10

subboxes, 40

subdivision, 78

subdivision depth, 101

subdivision size, 16

submatrix, 48, 74

substationary points, 8, 15, 64, 79

surjective, 36

symbolic derivatives, 77

symmetric positive definite, 74

system of equations, 81

system of nonlinear equations, 49

tangent space, 22, 30, 64

basis vectors of, 64

target, 18, 20, 30, 38

target vector, 18, 22, 32

termination criterion, 30

test box, 77

test points, 14, 17, 77, 79

third derivatives, 53

tight covering, 52

trade-off, 34

triangular block structure, 48, 74

two-step nondominance test, 90

uniqe multipliers, 106

upper level problem, 9, 73

vector function, 106

weak formulation, 10

weighted sums method, 1, 13, 30, 31

weighting vectors, 31

weights, 30

well-distributed, 21

well-distributed Pareto points, 18, 20, 24, 31

130