scieee Science in your language
[en] (orig)
Numerical Methods for the Solution of Bi-Level
Multi-Objective Optimization Problems
Von der Fakult¨at f¨ur Elektrotechnik,
Informatik und Mathematik
der Universit¨at Paderborn
zur Erlangung des akademischen Grades
Doktor der Naturwissenschaften
Dr. rer. nat.
genehmigte Dissertation
von
Alessandro Dell’Aere
Paderborn, 2008
Acknowledgements
First of all I want to thank my advisor Prof. Dr. Michael Dellnitz for giving me
the opportunity to do my PhD on a very interesting field of research. It was a great
pleasure to work with him and his group that has always provided a nice research
environment to me. Then I want to thank the second reviewers Prof. Dr. Joachim
ocker and Prof. Dr. Stefan Scaffler, and the members of the examination board
Prof. Dr. Norbert ockler, Prof. Dr. Henning Krause, and Dr. habil. Martin Ziegler.
My working group has been like a family to me. Here I want to thank Tanja
B¨urger, Dr. Wang Fang, Kathrin Flaßkamp, Sebastian Hage-Packh¨auser, Mirko
Hessel-von Molo (imagine there is a vector space...), Christian Horenkamp, Prof.
Dr. Oliver Junge, Marianne Kalle, Stefan Klus, Dr. Arvind Krishnamurty, Anna-
Lena Meyer, Dr. Kathrin Padberg, Michael Petry, Dr. Marcus Post, Dr. Oliver
Sch¨utze, Marcel Schwalb, Stefan Sertl, Bianca Thiere, Katrin Witting, but special
thanks are directed to Dr. Robert Preis, who has always been ready to help out in
any case of problems and to Dr. Sina Ober-Bl¨obaum for the pleasurable ambiance
in our office and for an exciting canoe ride.
Moreover I have to mention the students Maik Ringkamp, Albert Seifried, and
Dominik Steenken. They have always been ready to support me, especially to do
computational work.
During the last years, my work was partly supported by the German Research
Foundation within the Collaborative Research Center 614: ’Self-optimizing Con-
cepts and Structures in Mechanical Engineering’ and therefore I want to thank the
German Research Foundation and all the people involved in this project, especially
Tobias Knoke, Eckehard M¨unch, and Henner ocking, who were always motivated
for interesting interdisciplinary cooperations.
I also want to thank Dr. Gabriele Eichfelder from the University of Erlangen
for visiting me in Paderborn and for fruitful discussions. Then I have to thank
Dr. Kai Gehrs. During our study we have been able to solve many difficult exercises
in cooperation and in our free time we had a lot of fun playing our guitars.
However, this list would not be complete without special thanks to all my friends.
And last but not least, I want to thank my wife Heike and my daughter Pia Sophia
for a number of things.
iii
Abstract
Both bi-level optimization and multi-objective optimization are of crucial impor-
tance in many modern sciences and accordingly they have attracted great interest
in many publications of the last decades. Since many applications, in particular
in the field of self-optimizing systems, become more and more complex during the
last years, in this thesis we go a step ahead and consider bi-level multi-objective
optimization problems. Such problems can be understood as bi-level optimization
problems, where the subproblems of both levels are given by multi-objective opti-
mization problems. We develop the theoretical background and practical algorithms
for the solution of these problems. Convergence of the algorithms is proved and their
strength is demonstrated by academic example problems and real world applications.
Zusammenfassung
Sowohl Bilevel-Optimierung als auch Mehrziel-Optimierung sind von großer Be-
deutung f¨ur viele moderne Wissenschaften und haben dementsprechend in vielen
Publikationen der letzten Jahrzehnte großes Interesse erfahren. Da in den letz-
ten Jahren viele Anwendungen, insbesondere im Bereich der selbstoptimierenden
Systeme, immer komplexer werden, gehen wir in dieser Dissertation einen Schritt
weiter und betrachten Bilevel-Mehrzieloptimierungsprobleme. Diese Probleme k¨on-
nen als Bilevel-Optimierungsprobleme aufgefasst werden, bei denen die Teilprobleme
der beiden Levels in Form von Mehrziel-Optimierungsproblemen vorliegen. Wir
entwickeln die theoretische Basis und anwendbare Algorithmen zur osung dieser
Probleme. Die Konvergenz der Algorithmen wird bewiesen und ihre St¨arke wird an
Hand von akademischen Beispielproblemen und realistischen Anwendungen gezeigt.
iv
Contents
1 Introduction 1
2 Basic Definitions and Concepts 6
2.1 Multi-Objective Optimization . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Bi-Level Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3 Bi-Level Multi-Objective Optimization . . . . . . . . . . . . . . . . . 11
3 Basic Methods for Solving Multi-Objective Optimization Problems 13
3.1 Set-Oriented Methods for Solving Unconstrained Multi-Objective Op-
timizationProblems ........................... 14
SubdivisionAlgorithm .......................... 14
RecoveringAlgorithm........................... 15
SamplingAlgorithm ........................... 17
3.2 Reference Point Methods . . . . . . . . . . . . . . . . . . . . . . . . . 18
4 A New Image Set-Oriented Recovering Method for Solving Multi-
Objective Optimization Problems 20
4.1 Recovering-IS Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.2 Comparison with Other Methods . . . . . . . . . . . . . . . . . . . . 28
4.3 ConvexObjectives ............................ 35
4.4 A Hierarchical Concept for Computing the Desired Part of the Pareto
Set..................................... 39
5 Basic Methods for Solving Bi-Level Optimization Problems 41
5.1 Merit Functions and Smoothing Methods . . . . . . . . . . . . . . . . 41
6 Bi-Level Multi-Objective Optimization Problems (BLMOP) 45
6.1 An Optimality Condition for BLMOPs without Lower Level Inequal-
ityConstraints .............................. 45
6.2 Methods for Solving BLMOPs without Lower Level Inequality Con-
straints................................... 52
The Case k=1.............................. 54
A BLMOP Formulation for Finding Robust Pareto Points . . . . . . 55
BL-Subdivision Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 56
BL-Recovering-PS Algorithm . . . . . . . . . . . . . . . . . . . . . . 61
Convergence of Recovering-PS Algorithms . . . . . . . . . . . . . . . 64
v
BL-Recovering-IS Algorithm . . . . . . . . . . . . . . . . . . . . . . . 67
Convergence of Recovering-IS Algorithms . . . . . . . . . . . . . . . . 70
6.3 Pareto Set Constrained Multi-Objective Optimization Problems . . . 71
6.4 Methods for Solving PSCMOPs . . . . . . . . . . . . . . . . . . . . . 75
PSC-Subdivision and PSC-Recovering Algorithm . . . . . . . . . . . 76
PSC-SamRec Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 77
PSC-Sampling Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 78
Combination of the PSC-Sampling and PSC-SamRec Algorithms . . . 79
6.5 Methods for Solving BLMOPs with Lower Level Inequality Constraints 91
BL2-Recovering-IS algorithm . . . . . . . . . . . . . . . . . . . . . . . 92
6.6 Non-Convex and Non-Smooth BLMOPs . . . . . . . . . . . . . . . . 100
BL2-Subdivision Algorithm . . . . . . . . . . . . . . . . . . . . . . . 101
7 Sensitivity Analysis 103
7.1 Sensitivity Analysis for Classical Optimization Problems . . . . . . . 103
7.2 Sensitivity Analysis for MOP and BLMOP . . . . . . . . . . . . . . . 107
7.3 A Concept for the Adaptive Choice of Targets . . . . . . . . . . . . . 117
8 Conclusion and Outlook 121
vi
1 Introduction
Many problems in all areas of applied science, engineering, economics and statis-
tics can be posed in terms of optimization. In particular, such problems can be
characterized by the fact that several objective functions have to be optimized at
the same time. For instance, for a perfect passenger car one wants to simultaneously
minimize energy consumption and maximize comfort. As indicated by this example
the different objectives typically contradict each other and therefore certainly do
not have identical optima. Thus, the goal is to approximate the ”optimal compro-
mises” which, in mathematical terms, are called Pareto points. It turned out that in
general there is an extensive set of Pareto points, the Pareto-set. Such optimization
problems are known as multi-objective optimization problems (MOP).
There are two main directions in solving multi-objective optimization problems.
On one hand, it might be sufficient to compute a single solution inside the Pareto set.
In this case several methods like for instance the weighted sums method ([18, 34])
or the ε-constraint method ([18, 34]) are good choices. On the other hand, in many
applications it is desired to know the entire Pareto set. In this case, set-oriented
methods ([10, 11, 39]) or the normal boundary intersection method ([7]) work very
efficiently. Certainly, if the entire Pareto set is available, the decision making has to
be performed, that is, one particular Pareto point has to be selected for adjusting
the system under consideration.
Very often the objectives of multi-objective optimization problems are outputs
of a complex system, where interactions between the associated subsystems restrict
the feasible set of the optimization problem. In particular the feasible set of a multi-
objective optimization problem (the higher level problem) can itself be defined by
the solution set of another multi-objective optimization problem (the lower level
problem) within an underlying subsystem. Let us illustrate this fact by an example.
Suppose one wants to minimize in a multi-objective sense both comfort and energy
consumption of a vehicle, which forms a complex system. But for safety reasons,
also optimality concerning the mechanical guidance both in the horizontal and the
vertical direction of the undercarriage regarded as an underlying subsystem has
to be guaranteed. Since this again is a multi-objective optimization problem, we
are concerned with a hierarchy of multi-objective optimization problems. In many
applications, the lower level problem is in addition parametrized by a part of the
upper level variables, such that the resulting overall problem turns out to be very
complex and hard to solve. Following the notion of classical bi-level optimization
1
problems, which are for instance considered in [1] and [13], the described hierarchy
forms a bi-level structure of multi-objective optimization problems and therefore we
call these problems bi-level multi-objective optimization problems (BLMOP).
The consideration of such problems was originally motivated by the author’s work
within the Collaborative Research Center 614 of the German Research Foundation,
which is concerned with self-optimizing systems in mechanical engineering. To state
the relevance of classical and bi-level multi-objective optimization in self-optimizing
systems, we first point out that the process of self-optimization is performed re-
peatedly in the following three steps. In Step 1 the current state of the technical
system is analyzed. Depending on the outcome of this analysis, a system of goals to
be achieved is generated in Step 2. In Step 3 the behavior of the technical system has
to be adapted due to the goals determined in Step 2. More details on self-optimizing
systems can be found in [21]. In particular, the system of goals - or a part of it - can
be made up not only by the fact that several performance properties have to meet
certain values but also by an importance rating among these performance properties.
Mathematically speaking, several possibly contradicting objective functions have
to be optimized simultaneously in Step 3, that is, a multi-objective optimization
problem has to be solved repeatedly during the self-optimizing process, while the
weights of the objectives are varying. Since the objectives are outputs of a complex
system, the corresponding feasible set can be restricted by the solution of another
parametrized multi-objective optimization problem, that is, bi-level multi-objective
optimization problems have to be solved in this case. In practice, instead of solving
the BLMOP in every cycle of the self-optimization process, the entire Pareto set
can be computed in advance, such that Step 3 is reduced to an inexpensive decision
making process instead of solving the relatively expensive BLMOP. Therefore, in
this work we concentrate on computing the Pareto set of a BLMOP, a task which
is regarded as an important part of the complex self-optimizing process.
In the literature one can find many contributions dealing with classical bi-level
optimization problems, which are characterized by the fact that every level is made
up by a classical, that is, a scalar valued optimization problem. But there are only
a few publications concerned with bi-level multi-objective optimization: first of all
we want to cite [16], from where we have taken some examples to demonstrate the
efficiency of our new algorithms. Similar problems with several parametrized lower
level multi-objective problems and an upper level multi-objective problem are inves-
tigated in [32] and [42]. Statements on the existence of solutions for these problems
are given in [32]. In [42] an algorithm for the solution of these problems is proposed,
2
but this algorithm is restricted to the computation of a single preferred solution out
of the entire solution set. In [2] the authors consider problems with a parametrized
lower level multi-objective problem and a scalar upper level problem. Problems with
a scalar upper level problem and a linear lower level multi-objective problem are in-
vestigated in [44] and [8]. Such hierarchical problems with a non-parametric lower
level multi-objective problem belong to the field of optimization over the efficient
set.
In this work we propose different new approaches for the solution of bi-level
multi-objective optimization problems. These methods are set-oriented in the sense
that they allow to approximate the entire Pareto set (or a desired part of it) rather
than just the computation of single points inside this set. The development of these
methods was essentially influenced by two different concepts in technical applica-
tions. To see the difference between these directions, observe that there might be
several solutions in a Pareto set for which the vectors of objective values are iden-
tical. On one hand it can be reasonable for many applications and it seems to be
efficient to restrict the algorithms to the computation of a complete set of alterna-
tives ([15]), that is, for every Pareto optimal point in the space of objective values
(the objective space or image space) only one corresponding point in the space of
parameters (the parameter space or pre-image space) has to be computed. But in
other technical applications it can be very important to select Pareto optimal solu-
tions which do not only correspond to a desired vector of objective values but also
have additional properties like for example (in a certain sense) robustness. More-
over, for online applications as described in [46], small distances between certain
solutions in parameter space are required. Thus, for these applications, the com-
putation of the entire Pareto set provides the best basis for selecting solutions in
order to adjust the technical system. According to the mentioned requirements cor-
responding to the two different views on applications, in this work we present both
parameter set-oriented methods for the computation of entire Pareto sets and image
set-oriented methods which are designed to compute a complete set of alternatives.
Most of our algorithms are realized by the use of a multi-level subdivision structure
in order to compute a tight covering formed by a collection of subsets of the para-
meter or image space, respectively, such that the union of these subsets contains the
solution. Since these coverings are supposed to approximate the solution, they are
computed in a way such that the standard Hausdorff distance between the solution
and the computed covering is smaller than a given value. We have implemented
3
our algorithms in the software package GAIO1, which was developed at the Chair
of Applied Mathematics of the University of Paderborn and provides the required
multi-level subdivision structure.
A more detailed outline of this thesis is as follows. In Section 2, we state more
precisely, that is, from a mathematical point of view, how the bi-level multi-objective
optimization problem is defined. To this end, we first recall the basic definitions
and the theoretical background for classical multi-objective optimization problems
in Section 2.1 ([15], [18], [34]). In particular we present the well-known Kuhn-Tucker
necessary conditions ([31]) for a Pareto optimal solution, because these will provide
the essential basis for the derivation of optimality conditions for BLMOP in Section
6.1. In Section 2.2 we consider classical bi-level optimization problems ([12], [13],
[45]). We state the basic definition for these problems and we present both the
optimistic and pessimistic formulation, which are the traditional concepts that have
been introduced in order to obtain unique solutions. Based on the content of Sections
2.1 and 2.2, we are capable of giving the mathematical definition for BLMOP in
Section 2.3. In particular, since this is the main interest of this thesis, we also state
the corresponding optimistic formulation.
The methods for the solution of classical multi-objective optimization problems,
from which we lend the basic concepts for the development of our new methods
for the solution of BLMOP, are presented in Section 3. More detailed, in Section
3.1 we concentrate on the set-oriented methods ([11], [39]), which are capable of
computing the entire Pareto set of MOP. In Section 3.2 we give a short introduction
to reference point methods ([18]), which turned out to be the proper method for
the solution of the particular subproblem, which has to be solved repeatedly in our
image set-oriented methods.
Section 4 contains new results and forms one of the main parts of this thesis.
Here, we are concerned with a new image set-oriented method for the generation of
a complete set of alternatives of multi-objective optimization problems. Originally,
we developed a variant of this method for the solution of BLMOP, but it turned out
that this concept can also be used for the solution of more general constrained multi-
objective optimization problems. In Section 4.1, the algorithm and its realization is
described in detail. Since this method uses scalarizations based on reference point
methods, in Section 4.2 we compare it to other scalarization methods, that is in
particular the weighted sums method. In Section 4.3 we explain the advantages of the
image set-oriented methods in the case of convex objective functions. Particularly,
1http://math-www.uni-paderborn.de/agdellnitz/gaio/
4
we show that in this case the complete set of alternatives corresponds to the entire
Pareto set. Furthermore, an interactive hierarchical concept for the fast computation
of a desired part of the solution based on the image set-oriented method is described
in Section 4.4.
Some basic methods for the solution of classical bi-level optimization problems
can be found in Section 5. Here, we mention very briefly several concepts presented
in [12], but we concentrate on the Kuhn-Tucker based concepts that we use for
the development of our algorithms for the solution of BLMOP. We also describe
a problem arising in the presence of lower level inequality constraints, that is, the
violation of constraint qualifications, see [12] and [33]. In Section 5.1, we present
how such problems can be overcome by reformulations which use merit functions or
smoothing functions and how smoothing methods help to solve these reformulations
([22], [28]).
Section 6 forms another main part of this thesis and is concerned with both the-
oretical and practical results for BLMOP without lower level inequality constraints
and a convex lower level problem. This section is divided as follows. In Section
6.1 we derive based on the theoretical background given in the preceding Sections
necessary optimality conditions for these problems. In Section 6.2 we propose
several set-oriented methods for the solution of this particular subclass of BLMOP.
These are on one hand subdivision algorithms, which are realized by the use of sub-
sets of the parameter space, and on the other hand recovering algorithms, which
work with subsets either in parameter space or in image space. We also prove con-
vergence (in a sense to be stated precisely later on) for most of these algorithms.
Thereafter, in Section 6.3 we concentrate on the particular case of a non-parametric
lower level problem and derive a corresponding variant of the necessary optimality
conditions. This is motivated by the fact that several applications fall into this
family of problems, which are termed Pareto set constrained multi-objective opti-
mization problems (PSCMOP). In Section 6.4, we present algorithms both of subdi-
vision and recovering nature for the solution of PSCMOP. In particular, we present
derivative-free algorithms, which can be efficiently implemented because of the sim-
pler structure of PSCMOP.
For the sake of completeness, we included a short Section 6.5 on the solution of
BLMOP with lower level inequality constraints. Here, we propose as an example an
algorithm which was realized by extending one of the previously described algorithms
using the smoothing concept mentioned in Section 5.1. In addition, we present
an algorithm for the solution of BLMOP with non-convex and non-differentiable
5
objectives and constraints in Section 6.6.
In Section 7 we perform a sensitivity analysis for MOP and BLMOP, that is, we
are interested in the variation of the Pareto set caused by the variation of an addi-
tional perturbation parameter. To this end, we first review the sensitivity analysis
for classical optimization problems ([17]) in Section 7.1. Based on this background,
in Section 7.2 we derive the desired sensitivity analysis for MOP and BLMOP. As
an application, in Section 7.3 we use the results derived in Section 7.2 for the devel-
opment of a concept for the adaptive choice of algorithmic parameters (the targets),
which helps to control the spreading among the Pareto points computed by the
image set-oriented methods described in Section 4.1 and Section 6.2.
In Section 8, the thesis closes with a summary of the results and a discussion
about open problems and possible future directions.
2 Basic Definitions and Concepts
2.1 Multi-Objective Optimization
In this section we review the theoretical background on multi-objective optimization
needed in this work. We briefly summarize the basic concepts including first order
necessary optimality conditions (Kuhn-Tucker conditions) and a direction of descent.
Then we describe the set-oriented methods for solving classical multi-objective op-
timization problems, from which we adapt the main concepts in order to develop an
image set-oriented variant of these methods in Section 4 and set-oriented methods
for the solution of bi-level multi-objective optimization problems in Section 6.
As mentioned in Section 1, the task of a multi-objective optimization problem
MOP is to optimize several real valued objective functions Fi:
R
n
R
simulta-
neously. Before the MOP can be stated mathematically we have to define a partial
order on
R
k.
Definition 2.1 Let v, w
R
k. Then the vector vis less than w(v <pw), if
vi< wifor all i {1, . . . , k}. The relation pis defined in an analogous way.
We collect kobjective functions Fiin the vector valued function
F:
R
n
R
k, F(x)=(F1(x), . . . , Fk(x))t.
Additionally, if there are pnequality constraints Hiand qinequality constraints
Gito be satisfied, that is, Hi(x) = 0 for i= 1, . . . , p and Gi(x)0 for i= 1, . . . , q,
6
then we collect them analogously in vector valued functions
H:
R
n
R
p, H(x) = (H1(x), . . . , Hp(x))t
and
G:
R
n
R
q, G(x)=(G1(x), . . . , Gq(x))t
and denote the feasible set by
S={x
R
n:H(x) = 0 and G(x)p0}.
With this notation we can state the multi-objective optimization problem as follows:
min
xSF(x),(MOP)
where minimization has to be understood in the sense of Definition 2.1. A point
¯xSis a solution of MOP, if for any other xSeither F(x) = F(¯x) or the value
Fi(x) of at least one objective Fiis greater than Fi(¯x). To be more precise, we state
the following
Definition 2.2 Consider the multi-objective optimization problem MOP. Then a
point ¯xSis called (globally) Pareto optimal or a (global) Pareto point if there is
no ySsuch that
F(y)6=F(¯x) and F(y)pF(¯x).(2.1)
A point ¯xSis a local Pareto point, if there is a neighborhood U(¯x) of ¯xsuch that
there is no yU(¯x)Ssatisfying (2.1).
The following definition will be useful, because some of the algorithms described in
this work use the partial order pwithin certain sets of points.
Definition 2.3 (i) For X
R
nwe call a point x X nondominated with
respect to Fand Xor F-nondominated with respect to Xif there does not
exist any point y X with F(y)6=F(x) and F(y)pF(x).
(ii) For X
R
nwe call a point x X nondominated with respect to F, G, H and
Xor F(G,H)-nondominated with respect to X, if G(x)p0, H(x) = 0 and
there does not exist any point y X with G(y)p0, H(x) = 0, F(y)6=F(x)
and F(y)pF(x).
Since pjust defines a partial order on
R
k, one cannot expect to find isolated Pareto
points. Under certain assumptions the solution to an unconstrained MOP locally
7
forms a (k1)-dimensional manifold, see [25]. On the other hand, if the constraints
of a MOP define an r-dimensional manifold, we can expect that generically the
solution to the MOP locally forms a manifold of dimension min{k1, r}.
The following theorem of Kuhn and Tucker ([31]) states a necessary condition for
Pareto optimality.
Theorem 2.4 Let xbe a Pareto point of MOP and assume that the vectors Hi(x),
i= 1, . . . , p and Gj(x),j {l:Gl(x) = 0}are linearly independent.
Then there exist scalars α1, . . . , αk, µ1, . . . , µq0, λ1, . . . , λpR,such that
k
X
i=1
αi= 1,
µiGi(x) = 0 for i= 1, . . . , q and
k
X
i=1
αiFi(x) +
p
X
i=1
λiHi(x) +
q
X
i=1
µiGi(x)=0.
(2.2)
Points satisfying (2.2) are not necessarily Pareto optimal, but certainly “Pareto
candidates” and thus we now emphasize their relevance by the following
Definition 2.5 A point x
R
nis called a substationary point of MOP if there
exist scalars α1, . . . , αk, µ1, . . . , µq0, λ1, . . . , λpRsuch that (2.2) is satisfied.
To find substationary points, one can use a direction of descent, that is, a di-
rection in
R
nin which all the kobjectives are simultaneously non-increasing and
at least one objective is decreasing. Several choices of descent directions can for in-
stance be found in [3], [23], and [37] and references therein. As an example we review
the descent direction given in [37]. To this end, we associate with F:
R
n
R
k,
F(x) = (F1(x), . . . , Fk(x))t, the following quadratic optimization problem for every
fixed x
R
n:
min
α
R
k(k
k
X
i=1
αiFi(x)k2
2;αi0, i = 1, . . . , k,
k
X
i=1
αi= 1).(QOP)
Then the weighted sum of gradients
k
X
i=1
ˆαiFi(x),
where ˆαis a solution of QOP, either vanishes or is a descent direction for all objective
functions F1, . . . , Fkin x.
8
With this result an iteration step for finding substationary points can be defined by
first computing ˆαfor the current point xand then performing a line search along
Pk
i=1 ˆαiFi(x) to find a new point ˜x.
2.2 Bi-Level Optimization
In this section we give a brief introduction to (classical) bi-level optimization. In
doing so we follow basically the definitions of [12] and concentrate on those contents
required for the development of our methods for solving BLMOPs. Comprehen-
sive overviews on bi-level optimization can be found in [1, 5, 12, 13, 45]. Bi-Level
optimization (or bi-level programming) problems constitute a particular kind of
hierarchical optimization problems, where a part of the constraints for the upper (or
higher) level problem is defined by another parametric optimization problem, the
lower level problem. To be more precise, denote by
F:
R
n×
R
m
R
the upper level objective function. The rupper level equality constraints Hi(x, y) = 0
and supper level inequality constraints Gj(x, y)0 are collected in the vector val-
ued functions
H:
R
n×
R
m
R
r, H(x, y)=(H1(x, y), . . . , Hr(x, y))t
and
G:
R
n×
R
m
R
s, G(x, y) = (G1(x, y), . . . , Gs(x, y))t.
Analogously, denote by
f:
R
n×
R
m
R
the lower level objective function. The plower level equality constraints hi(x, y) = 0
and qlower level inequality constraints gj(x, y)0 are collected in the vector valued
functions
h:
R
n×
R
m
R
p, h(x, y) = (h1(x, y), . . . , hp(x, y))t
and
g:
R
n×
R
m
R
q, g(x, y) = (g1(x, y), . . . , gq(x, y))t.
With these notations, the bi-level optimization problem can be written as
00 min
y00F(x(y), y) (BLP)
s.t. (x(y), y)S,
9
where
S={(x, y)
R
n×
R
m:G(x, y)p0, H(x, y) = 0, x ψ(y)}
and ψ(y) denotes the solution set of the following lower level problem:
min
xf(x, y) (LLP)
s.t. g(x, y)p0,
h(x, y) = 0.
Observe that in the case of non-unique solutions x(y)ψ(y) for the lower level
problem, the notion of an optimal solution of the bi-level optimization problem is
not necessarily obvious. This ambiguity is expressed by using the quotation marks in
(BLP). One way out of this situation is given by the optimistic or weak formulation,
where for every fixed y
R
m,x(y)ψ(y) is chosen such that F(x(y), y) is minimal.
This optimistic formulation can be expressed as
min
ymin
x{F(x, y) : xψ(y)}(BLP-O)
s.t. G(x, y)p0,
H(x, y) = 0.
Analogously, in the pessimistic or strong formulation,x(y)ψ(y) is chosen such
that F(x(y), y) is maximal. This pessimistic formulation can be expressed as
min
ymax
x{F(x, y) : xψ(y)}(BLP-P)
s.t. G(x, y)p0,
H(x, y) = 0.
Originally, the optimistic and pessimistic formulations were used in [43] for the
description of real market situations, where different decision makers try to realize
best decisions with respect to their individual aims while they are not able to realize
their decisions independently but are forced to react according to a certain hierarchy.
In particular, this hierarchy can be given by a bi-level structure as described above.
Then, the optimistic or pessimistic formulations might be suitable, depending on the
question whether the lower level decision maker (the follower) is willing to support
the higher level decision maker (the leader) or not.
Non-uniqueness of lower level solutions will be particularly relevant in Section
6, where the lower level problem will be replaced by a MOP and therefore ψ(y) is
10
given by a Pareto set which in general is an extensive set and not a singleton. In
accordance to the structure of the applications we have in mind, we will concentrate
on the optimistic formulation.
Let us now assume that the lower level problem LLP is convex, that is, for
every fixed y
R
m,fand gi, i = 1, . . . , q are convex and hi, i = 1, . . . , p are
affine-linear. Then the solution x(y)ψ(y) is unique and we can write x=x(y).
Moreover, under the common regularity assumptions, LLP can be replaced by its
Kuhn-Tucker conditions2which are necessary and sufficient in this case. In this
way, with ¯
:= (x), an auxiliary problem equivalent to the original problem is
obtained:
min
x,y F(x, y) (BLP’)
s.t. G(x, y)p0,
H(x, y)=0,
¯
f(x, y) +
p
X
i=1
ζi¯
hi(x, y) +
q
X
i=1
τi¯
gi(x, y) = 0,
g(x, y)p0,
h(x, y)=0,
τigi(x, y) = 0 for i= 1, . . . , q,
τi0 for i= 1, . . . , q.
There are several solution methods for (BLP), (see [12, 13, 1]), which are based
on the solution of such auxiliary problems. In Section 6, we will borrow and extend
this idea for the development of most of our algorithms for the solution of BLMOP.
2.3 Bi-Level Multi-Objective Optimization
In this section we introduce those problems, which make up the main problem class
under consideration in this thesis, that is, the class of bi-level multi-objective opti-
mization problems (BLMOP). This can be understood as a MOP, where some vari-
ables xhave to be taken from the solution set of another MOP which is parametrized
by variables yof the first MOP. In other words, BLMOP arises from BLP by re-
placing both the higher and lower level optimization problem by multi-objective
optimization problems.
2The Kuhn-Tucker conditions for scalar valued optimization problems can be obtained from
Theorem 2.4 by setting k= 1.
11
To express the BLMOP in mathematical terms, we use the same notations as in
Section 2.2 with the exception that both the upper and lower level objectives are
considered to be vector valued, that is
F:
R
n×
R
m
R
k, F(x, y) = (F1(x, y), . . . , Fk(x, y))t
and
f:
R
n×
R
m
R
l, f(x, y)=(f1(x, y), . . . , fl(x, y))t.
For every y
R
m, let ψ(y) denote the solution, that is, the Pareto set of the
following lower level problem:
min
xf(x, y),(BLMOP-LL)
s.t. g(x, y)p0,
h(x, y) = 0,
where minimization has to be understood in the sense of the partial order p.
Then a BLMOP can be stated as follows:
00 min
y00F(x(y), y),(BLMOP)
s.t. (x(y), y)S,
where again minimization has to be understood in the sense of the partial order
pand the feasible set is defined by
S={(x, y) : G(x, y)p0, H(x, y) = 0, x ψ(y)}.
Motivated by applications and by the fact that for a BLMOP ψ(y) is typically
an extensive Pareto set, we will concentrate on a suitable variant of the optimistic
formulation BLP-O. This variant differs from BLP-O in the way that the upper level
problem and for every fixed ythe lower level problem of BLMOP are minimized in a
multi-objective sense. For this, we introduce the Pareto-optimistic formulation for
BLMOP:
min
ymin
x{F(x, y) : xψ(y)}(BLMOP-O)
s.t. G(x, y)p0,
H(x, y) = 0,
where minimizations have to be understood in the sense of the partial order p. The
solution of such problem makes up the main interest of this thesis and is considered
in Section 6.
12
3 Basic Methods for Solving Multi-Objective Op-
timization Problems
In the last decades many methods for solving MOP have been developed. Compre-
hensive overviews on methods can be found for instance in [15] and [34]. There are
some algorithms e.g. the weighted sums method that are only capable of compu-
ting single points of the Pareto set. Since many of these single solution methods can
be realized by applying established optimization algorithms to scalar valued auxilia-
ry functions, they are relatively fast and efficient. Although in some particular
situations the user might be satisfied with a single Pareto point, in many applica-
tions it is desired to have an overview on the entire Pareto set. For this, algorithms
for computing the entire Pareto set have been developed, see [11],[15],[20],[25], [34],
and [39]. Once the entire Pareto set is available, the user has to select one of the
solutions, e.g. for adjusting the technical system under consideration. On the one
hand, this decision making problem is very challenging because the computed solu-
tions are of equal worth from the mathematical point of view. On the other hand,
having an overview on the entire Pareto set provides the best basis for selecting the
right solution.
Of course, computing the entire Pareto set can be very time-consuming, particu-
larly, if the dimension of the feasible set S
R
nis very high. In order to attenuate
this drawback, other methods (e.g. the normal boundary intersection method, [7])
make a compromise and calculate only a certain subset of the Pareto set, which is
sufficient for the decision making process in the sense that it forms a complete set
of alternatives ([15]), that is, for every Pareto optimal point in image space, only
one corresponding Pareto point in parameter space is computed. Our new image
set-oriented method for the solution of MOP, see Section 4, belongs also to the latter
class of methods and as will be demonstrated later on has some additional nice
properties. Moreover, some of these properties gave reason to extend this algorithm,
see Section 6, in order to solve BLMOP.
In the next sections we describe in more detail those methods from which we
borrowed the basic ideas for the development of our methods for the solution of
BLMOP. To be more precise, in Section 3.1 we review set-oriented methods which
are originally tailored to compute the entire Pareto set of MOP in the absence of
constraints, but as we will see in Section 6 can be extended in order to solve
more general problems. In Section 3.2 we also give a short introduction to reference
point methods, which turned out to be the proper method for finding individual
13
Pareto points, a subproblem, which has to be solved repeatedly as the new image
set-oriented method described in Section 4 proceeds.
3.1 Set-Oriented Methods for Solving Unconstrained Multi-
Objective Optimization Problems
We now briefly review the set-oriented algorithms for the approximation of the
Pareto set or the set of substationary points of a given unconstrained MOP in a
compact domain Q
R
n. For a detailed exposition the reader is referred to [11].
Using a multi-level subdivision scheme each of these methods produces a sequence
of sets B0,B1,B2,. . . where each Bjconsists of finitely many subsets of Q. Generally,
there are two different types of such methods: the methods of subdivision type and
the methods of recovering type. In methods of subdivision type, every Bjcovers the
Pareto set or the set of substationary points, such that the diameter of the subsets
shrinks as the index jincreases. In methods of recovering type, the diameter of the
subsets is fixed. Here, every Bjforms a partial covering of the Pareto set or the set
of substationary points and is extended as the method proceeds.
The numerical realization of the set-oriented methods works as follows: the el-
ements B Bjare boxes each of which is specified by a center in
R
nand nradii.
But rather than working explicitly with centers and radii of the boxes these are
stored within a binary tree which leads to a significant reduction in the memory
requirement. The boxes are discretized via test points. Strategies for the choice of
test points are presented in [11].
In the following we will denote by Sthe set of substationary points within Qand
we will call the elements Bof Bjboxes. Let Φ : RnRnbe an iteration scheme for
finding substationary points of the MOP under consideration.
Subdivision Algorithm
The sequence (Bj)jNproduced by the following algorithm has the property that
the diameter
diam(Bj) := max
B∈Bj
diam(B)
tends to zero for j . In fact it can be shown, see [11], that the box coverings
which are created by the following algorithm converge towards S.
Let B0be an initial collection of finitely many subsets of the compact set Qsuch
that B∈B0B=Q. Then Bjis inductively obtained from Bj1in two steps:
14
(i) Subdivision Construct from Bj1a new system ˆ
Bjof subsets such that
[
Bˆ
Bj
B=[
B∈Bj1
B
and
diam( ˆ
Bj) = θjdiam(Bj1),
where 0 < θmin θjθmax <1.
(ii) Selection Define the new collection Bjby
Bj=nBˆ
Bj: there exists ˆ
Bˆ
Bjsuch that Φ1(B)ˆ
B6=o.
Figure 1: The idea of the Subdivision algorithm.
The described method is of global nature in the sense that the entire domain Q
is explored, such that all connected components of Scan be found, if the num-
ber of test points used for the discretization of the boxes is large enough. But it
should be mentioned that particularly in the case of many test points this method
is restricted to moderate dimensions of the parameter space, because the number of
boxes B Bj, which have to be explored in every iteration, might grow rapidly as
the index jincreases.
Recovering Algorithm
During the subdivision procedure boxes might get lost although they contain sub-
stationary points. This can be the case when the number of test points taken into
account is not large enough. In the following we describe an algorithm which allows
to recover boxes which have previously been lost, although they contain substation-
ary points. But it should be mentioned that this algorithm is also a self-contained
continuation method, that is, given at least one single Pareto point within the com-
pact domain Q, further substationary points are generated successively in order to
15
obtain a representation of the entire Pareto set of the given MOP within Q. For
formal reasons let us denote by Pdacomplete partition3of the set Qinto boxes
of subdivision size or depth d, which are generated by successive bisection of
Q. Then there exists for every point xQand every depth dexactly one box
B(x, d) Pdwith center cand radius rsuch that cirixi< c+ri,i= 1, . . . , n.
Moreover, denote by Φq(s) the application of qsteps of an iteration scheme for find-
ing substationary points using an initial guess s. For a given initial box collection
B0the algorithm reads as follows:
(i) for all B B0
B.active := TRUE
(ii) for j= 0, . . . , MaxStep
ˆ
Bj:= Bj
for all B {B Bj:B.active == TRUE}
choose starting points {si}i=1,...,l near B
X:= {Φq(si)|i= 1, . . . , l}
B.active := FALSE
for all x X:
if B(x, d)6∈ ˆ
Bj
B(x, d).active := TRUE
ˆ
Bj:= ˆ
BjB(x, d)
if ˆ
Bj== BjSTOP
Bj+1 := ˆ
Bj
Figure 2: The idea of the Recovering algorithm.
Unfortunately, the Recovering algorithm may not perform adequately when a box
does not contain part of Sbut is possibly far away. In this case many undesired
3Pdhas not to be explicitly computed by our algorithms.
16
regions would be added to the box collection on the way towards Sin the course
of the iteration of test points. Strategies to overcome this problem can be found in
[20].
Sampling Algorithm
There are some potential drawbacks of the algorithms described above: for many
iteration schemes Φ, the gradients of the objectives are needed. But in general,
gradients might not exist at all or the calculation of gradients might be very expen-
sive from the computational point of view. Moreover, the set Sis generally a strict
superset of the Pareto set for several reasons. On the one hand, there might be
local Pareto points, which are not Pareto optimal from a global point of view. On
the other hand, in some applications, e.g. if the objectives are not defined outside
the domain Q, penalization strategies have to be used to avoid that the iteration
scheme Φ leads to points outside Q. In this case, the algorithms are capable of
finding points on the boundary Q of Q, which are not necessarily Pareto optimal
for the given MOP. These problems can be avoided using the following sampling
algorithm, which takes only the function values of the objectives into account. An
outline of an iteration of the algorithm is as follows. Given a box collection Bj1
the collection Bjis obtained by:
(i) Subdivision Construct from Bj1a new system ˆ
Bjof subsets such that
[
Bˆ
Bj
B=[
B∈Bj1
B
and
diam( ˆ
Bj) = θjdiam(Bj1),
where 0 < θmin θjθmax <1.
(ii) Selection
for all Bˆ
Bj
choose a set of test points XBB
N:= nondominated points of S
Bˆ
BjXB
Bj:= nBˆ
Bj:x XBNo
Observe that, caused by the discretization of the boxes and, where required, also
by the application of an iteration scheme Φ, the described set-oriented methods
17
Figure 3: The idea of the Sampling algorithm.
(Subdivision, Recovering, Sampling) are capable of generating not only boxes but
also substationary points within these boxes. Typically, these points can be saved
in an archive and the globally nondominated points of this archive can be filtered
out by nondominance tests. These points are well-distributed in parameter space
in the sense that in every generated box there is at least one of these points. Such
a representation of the Pareto set forms an advantageous basis for the respective
decision making process in many (but not all) real world applications.
3.2 Reference Point Methods
Areference point t
R
kcan be regarded as a vector of desirable objective values
called aspiration levels or targets,ti, i = 1, . . . , k. Reference point methods use
feasible or infeasible reference points for the construction of scalar valued auxiliary
functions. For an overview on different types of reference point methods the reader is
referred to [18]. In the following we will focus on distance function based approaches,
which are relevant for our new algorithm presented in Section 4. As indicated by
its notation, distance function based approaches use a distance function, which is
typically based on a norm, to measure the distance between a reference point and
a given point in image space. To state the auxiliary problem corresponding to a
target vector t
R
k, let δ:
R
k×
R
k
R
+be a distance function derived from a
norm, i.e., δ(a, b) = ||ab|| for some norm || ·|| :
R
k
R
+. Then the auxiliary
problem to be solved is
min
xSδ(F(x), t).(RPP)
If we have δ(F(x?), t)>0, where x?is a solution to RPP, then we know that F(x?)
is on the boundary of the image F(S) = {F(x) : xS
R
n}. Moreover, if in
addition t <pF(x?) we can expect that x?is (at least a local) Pareto point. Thus,
local Pareto points can be found by first choosing proper targets and then solving
RPP. Indeed, Theorem 3.1, which was taken from [15], guarantees that, under
18
certain assumptions, x?is a Pareto point. For this, recall that a norm ||·|| :
R
k
R
+
is called strictly monotonically increasing, if ||y1|| <||y2|| for all y1, y2
R
kwith
|y1
j| |y2
j|, j = 1,2, . . . , k and |y1
j| 6=|y2
j|for some j.
Theorem 3.1 Let || · || be a strictly monotonically increasing norm and assume
ti= min{Fi(x) : xS}for i= 1,2, . . . , k. If x?is an optimal solution of RPP,
then x?is a solution of MOP.
Proof: See [15].
In order to compute a representation of the entire Pareto set, our new image set-
oriented algorithm presented in Section 4 repeatedly solves a variant of RPP while
the targets are varying. To state a corollary which guarantees that the correspon-
ding solutions are at least locally Pareto optimal we denote T= (T1, . . . , Tk), Ti=
min{Fi(x) : xS}for i= 1,2, . . . , k and define for a given target vector t
R
k
the modified feasible set
St={xS:Fi(x)ti, i = 1, . . . , k}.
Furthermore, we define both a modified multi-objective optimization problem and
a modified auxiliary problem by replacing Sby St:
min
xSt
F(x),(MOP )
min
xSt
δ(F(x), t).(RPP )
Now, with these notations we can state the following
Corollary 3.2 Let Fbe continuous on the compact domain S. Moreover, let ||·||
be a strictly monotonically increasing norm and assume that T <pt <pF(x?), where
x?is an optimal solution of RPP . Then x?is a local solution of MOP.
Proof: Since Fiis continuous and since there are ¯xi, x?Swith Fi(¯xi) = Ti<
ti< Fi(x?), there exist xiSsuch that Fi(xi) = tifor all i= 1,2, . . . , k. From
construction of Stit is obvious, that x?Stand ti= min{Fi(x) : xSt}. Thus,
Theorem 3.1 guarantees, that x?solves MOP . Since Stis constructed from Sjust
by constraining the image of F, such that F(St) contains a part of a local Pareto
optimal set in image space, x?is a local solution of MOP.
19
4 A New Image Set-Oriented Recovering Method
for Solving Multi-Objective Optimization Prob-
lems
As mentioned in Section 3.1, the (classical) Recovering algorithm uses a multi-
level subdivision scheme in order to discretize the parameter space and to achieve
diversity of the computed solution set. For our new algorithm presented in this
section, this set-oriented strategy was adapted in order to operate adequately in
image space, that is, we use boxes in image space rather than in parameter space.
This concept is motivated by the fact that in many applications there are just a
few objectives which depend on a relatively high number of parameters. Another
reason for the development of this image set-oriented variant is given by the decision
makers requirement that the calculated Pareto points should be well-distributed in
image space rather than in parameter space. Later on in Section 6.2, we will adapt
the concept of this new algorithm for the development of an image set-oriented
algorithm for the solution of BLMOPs.
4.1 Recovering-IS Algorithm
The crucial difference between our new algorithm and the classical Recovering algo-
rithm is the fact that a randomly chosen point t
R
kdoes not necessarily belong
to the image F(S) = {F(x) : xS}, that is, we do not know whether there is
any xSsuch that F(x) = t. Moreover, if F(x) = tfor some xS, we do not
know whether xis Pareto optimal. To get an answer to these questions, we solve
the auxiliary problem RPP and as mentioned in Section 3.2 if t <pF(x?) for
a solution x?of RPP , then we know that under suitable assumptions x?is at
least locally Pareto optimal. Otherwise, if t=F(x?), then we repeatedly have to
vary tand solve RPP until t <pF(x?). A strategy for the choice and variation of
the targets tcan be found later on in this section.
With respect to the computational effort, the described technique for evaluating
a box in image space is similar to the technique for evaluating a box in parameter
space as used by the classical Recovering algorithm (provided that the dimensions
of both spaces are the same): for both techniques, a sufficiently number of targets
or starting points, respectively, has to be chosen and consequently the same number
of executions of the respective iteration scheme has to be applied. Thus, we expect
that the computational effort has the same order of magnitude for both evaluation
20
techniques. But due to the fact that in many applications the dimension of the
parameter space is much higher than the dimension of the image space, our new
Recovering-IS algorithm (with boxes in image space) turns out to be more efficient
in this case, because the number of points needed to represent a box grows rapidly
as the dimension of the considered space grows. For example, pmpoints are required
for generating a consistent grid of points in an m-dimensional box using ppositions
in every coordinate.
In the algorithm described below and for the remainder of this work the distance
function δis based on the norm || · ||2that is, δ(a, b) = ||ab||2for all a, b
R
k. Observe, that for every point F
R
kthere is exactly one box B(F, d) (in
image space) of depth dcontaining F. Starting with a given box collection B0in
image space and associating a solution (xB, FB) to every box B, the Recovering-IS
algorithm reads as follows:
(i) for all B B0
B.active := TRUE
(ii) for j= 0, . . . , MaxStep
ˆ
Bj:= Bj
for all B {B Bj:B.active == TRUE}
choose target vectors {ti}i=1,...,l near Bwith ti<pFB
x?
i:= arg min
xRnδ(F(x), ti), i = 1, . . . , l
F?
i:= F(x?
i), i = 1, . . . , l
B.active := FALSE
for all i= 1, . . . , l:
if B(F?
i, d)6∈ ˆ
Bj
˜
B:= B(F?
i, d), x ˜
B:= x?
i, F ˜
B:= F?
i
˜
B.active := TRUE
ˆ
Bj:= ˆ
Bj˜
B
if ˆ
Bj== BjSTOP
Bj+1 := ˆ
Bj
As desired, this algorithm computes Pareto points that are well-distributed in
image space in the sense that in every generated box there is one of the computed
Pareto point. Consequently, these points make up a suitable basis for the decision
making process related to many applications.
21
Figure 4: The idea of the Recovering-IS algorithm.
There remains the question, how to choose the target vectors ti, i = 1,2, . . . , l
near a current box Bin order to obtain a complete set of alternatives. Efficient
strategies for the choice of target vectors can be defined, particularly by using local
information on the Pareto set, e.g. orientation or curvature, which can be calculated
via objective derivatives (or numerical approximations of the derivatives). Moreover,
as presented in Section 7, sensitivity analysis can be used as an additional tool to
control the distance among the computed Pareto points. In the following we will
focus on a particular strategy for the choice of the targets which was originally
designed for problems with smooth objectives, but is also applicable and works
satisfactorily in the case of more general objectives. Let us assume that the image
F(P)
R
kof the Pareto set P
R
nis smooth and forms a (k1)-dimensional
manifold in a neighborhood Nε(y?) of a given Pareto optimal point y?=F(x?)
F(P) in image space. Since an approximation of F(P) at y?is given by the tangent
space Ty?F(P), there are certainly further Pareto points near Ty?F(P)Nε(y?).
Consequently, we can expect that there are λ
R
and pTy?F(P)Nε(y?), such
that suitable targets needed for the computation of further Pareto points can be
expressed by p+λd, where dp0 denotes a basis vector of the 1-dimensional space
(Ty?F(P)). Thus, to apply this idea in practice, we first have to construct dand a
basis V:= {b1, b2, . . . , bk1}of Ty?F(P) and then to specify targets
ti=y?
i+
k1
X
j=1
αi,j bj+λid, i = 1,2, . . . , l
by determining the coefficients αi,j and λi. For example, the construction of two
different targets in the neighborhood of a Pareto Point y?in image space for the
case k= 2 is depicted in Figure 5. Fortunately, if y?was found in a previous step
by solving RPP for a given target t?, t?<py?, then dcan be obtained without
any additional effort by d:= t?y?. Once dis available, any standard method
for the construction of an orthogonal basis, e.g. the Grahm-Schmidt method can be
22
used to obtain the required basis V. For all i= 1,2, . . . , l, the coefficients αi,j are
chosen such that pi:= Pk1
j=1 αi,j bjis located inside a neighbor box of the current
box. Moreover, the pishould be well-distributed around y?. With this heuristic, it
is very likely to find new boxes containing Pareto points. For the choice of λian
adaptive concept has to be applied, because a computed solution ¯yof RPP can
only be accepted, if ti<p¯yis satisfied. Such an adaptive concept should be guided
by the fact that ti<p¯ycertainly holds if λiis sufficiently large, but it should also
be considered that RPP is ill-conditioned if λiis too large.
Figure 5: The construction of targets based on bTy?F(P) and d(Ty?F(P)).
Example 4.1 For our first example let F= (F1, F2)t:R3R2be defined by
F(x1, x2, x3) = (x11)2+ (x21)2+ (x31)4
(x1+ 1)4+ (x2+ 1)2+ (x3+ 1)2(4.1)
We assume that the decision maker is only interested in solutions for which both
objective values are located within the interval I:= [0,20] and therefore we define
S:= {x
R
3:Fi(x)I, i = 1,2}and consider the MOP
min
xSF(x).(4.2)
23
Figure 6 shows the solutions generated by the Recovering-IS algorithm using different
box sizes (depths). Here, the reader can get an impression of how the density of the
computed representation can be controlled by choosing the box size.
Figure 6: The solution of Example 4.1 computed by the image set-oriented recovering
algorithm using different box sizes in image space.
Example 4.2 We use the next example to demonstrate the nice property of the
Recovering-IS algorithm to generate a well-distributed representation of the Pareto
set. To this end let F= (F1, F2, F3)t:R3R3be defined by
F(x1, x2, x3) =
(x11)4+ (x21)2+ (x31)2
(x1+ 1)2+ (x2+ 1)4+ (x3+ 1)2
(x11)2+ (x2+ 1)2+ (x31)4
(4.3)
and consider the MOP
min
x
R
3F(x).(4.4)
Figure 7 depicts the solution computed by the Recovering-IS algorithm and Figure 8
depicts the solution computed by the classical Recovering algorithm.4Observe that
4The generated boxes are omitted in these figures in order to obtain more transparency.
24
the Pareto points computed by the Recovering-IS algorithm are well-distributed over
the entire Pareto set (in image space). On the other hand, the distribution obtained
by the classical Recovering algorithm varys drastically. Consequently, there are too
many unneeded points in one region, whereas the density is possibly too sparse for
the decision makers’ consideration in an essential part of the Pareto set.
Figure 7: The solution of Example 4.2 computed by the Recovering-IS algorithm
(image space).
Figure 8: The solution of Example 4.2 computed by the classical Recovering algo-
rithm (image space).
25
Example 4.3 Application to Mechatronics
In the following we review the main contents of [10] where the efficiency of the
Recovering-IS algorithm was demonstrated on a technical application taken from
[30]. Here we are concerned with an energy management problem for an onboard
storage system of an overhead line powered tram as shown in Figure 9. Here the aim
is to reduce both overhead line peak power and energy consumption simultaneously.
The storage device accumulates the energy which is released while the vehicle is
Figure 9: Block diagram of the tram system with onboard storage and operation
management system.
braking and, as shown in Figure 10, this stored energy can be used later on at
a suitable point of time to power the drive. Our task is to design the operation
management that is, to manage this charging and discharging, such that overhead
line peak power and energy consumption are minimized in a multi-objective sense.
For a more detailed description of this application see [30].
Since the drive cycle of the tram is divided into 1241 track sections and the
energy management system has to assign a reference value to each of these sections,
we have to solve a MOP with 2 objective functions and 1241 parameters. From
the engineer’s point of view a solution with a linear or at least monotone behavior
of the series of reference values corresponding to certain connected blocks of track
sections seemed to be reasonable. Motivated by this fact, the optimization process
26
Figure 10: The operation principle of the onboard storage system.
was performed in two steps. For a pre-optimization step, a reduced model was
constructed by grouping the parameters within each block and representing the
entire block by just two parameters i.e., by a linear auxiliary function. The resulting
model turned out to have only 184 parameters. Thus, it was possible to compute
very quickly a first approximation to the solution of the original problem as follows.
First, the Pareto set of the reduced model was computed using the Recovering-IS
algorithm. Due to the smaller number of parameters of the reduced model, this
step was executed relatively quickly. Then the higher-dimensional points of the
desired first approximation were calculated from the Pareto points of the reduced
model simply by linear interpolation. In the second step of the optimization process
the Recovering-IS algorithm was applied to the original problem using the first
approximation computed by the pre-optimization step as a starting iteration. This
principle of first performing a pre-optimization on a simplified lower-dimensional
model and taking the result for generating a starting iteration for solving the original
problem turned out to be a helpful step in decreasing the necessary computational
effort.
The results of the optimization procedure are shown in Figure 12. The stars
are the results of the pre-optimization and the circles are the results obtained from
the subsequent optimization on the original model using the results of the pre-
optimization as starting points. In order to compare the objectives, they are nor-
malized in the sense that they have the value 1 when no storage device is used.
For the purpose of comparison the reduced model has again been optimized
both with the classical Recovering algorithm in parameter space and with the new
Recovering-IS algorithm. To keep the results comparable, both algorithms have
been initialized with the same starting point and the provided computation time was
27
the same for both algorithms. The archived points associated with the computed
boxes are shown in Figure 11. Observe that the objective values computed by
the Recovering-IS algorithm are considerably better than those computed by the
Recovering algorithm in parameter space. Moreover, there is a crucial difference
between the results concerning diversity. This is caused by the fact that - as in many
applications - the map from the Pareto set in parameter space to the corresponding
points in image space is not injective, such that the classical Recovering algorithm
computes many Pareto points which are evenly distributed in the high-dimensional
parameter space but possibly close together in the low-dimensional image space.
On the other hand, due to the box-oriented concept in image space, the Recovering-
IS algorithm produces Pareto points which are evenly distributed in image space.
This shows the previously mentioned advantages of the Recovering-IS algorithm:
the time-consuming but unnecessary computation of too many points is avoided by
restricting to a few points which are evenly distributed in image space. These points
form a sufficient subset of the Pareto set and provide a manageable basis for selecting
the right solutions, particularly when this selection has to be done repeatedly in the
course of a self-optimizing process.
Let us make a remark on the choice of starting points. First observe that at least
one Pareto point has to be provided as a starting point for applying the Recovering
or the Recovering-IS algorithm. Obviously, such starting points can be generated
by just optimizing the individual objectives or a weighted sum of all objectives
using a standard optimization technique for scalar valued problems. Fortunately, in
the application presented here, a starting point was given because of the engineers’
knowledge on the technical system, such that this preliminary step was not necessary.
4.2 Comparison with Other Methods
In this section we will first compare the (image set-oriented) Recovering-IS algorithm
with the classical (parameter set-oriented) Recovering algorithm. In doing so, we
will point out some advantages of the Recovering-IS algorithm, but we will also
give some guidelines for combining the classical and the new method. Then we
will illustrate how the solutions of the Recovering-IS algorithm are related to the
solutions of the well-known weighted sums methods. We will give an example to
28
Figure 11: The solution of the tram problem computed by the IS-Recovering algo-
rithm and by the Recovering algorithm (image space).
Figure 12: Optimization results for the tram model (image space).
29
see that the Recovering-IS algorithm is capable of finding Pareto points for which
the weighted sums method is inapplicable. Moreover, we will prove that at least
every solution of the weighted sums method can also be found by the reference
point method, i.e., by solving RPP , which is the particular technique used to find
individual Pareto points within the Recovering-IS algorithm. Thereafter, we will
state some advantages of the Recovering-IS algorithm over the normal boundary
intersection method.
Comparison with the Classical Recovering Algorithm As stated in Section
4.1, the Recovering-IS algorithm is particularly advantageous in the case where the
dimension of the parameter space is significantly higher than the dimension of the
image space or, the number of objectives to be optimized. Moreover, the computed
points are well-distributed in image space, which can be particularly advantageous
from the decision maker’s point of view. But there is another advantage which is
independent of the dimensions. To see this, remember that in the Recovering-IS
algorithm a basis of the tangent space Ty?F(P), which is used for determining the
targets ti, can be obtained by some very simple calculations based on y?and t?. But,
for the classical Recovering algorithm, if one wants to construct the corresponding
tangent space Tx?Pin parameter space in order to determine starting points (or
predictors) close to the Pareto set P, the gradients Fi(x?), i = 1, . . . , k of all
objectives are required and a QR-decomposition of a matrix at least of dimension
(n+k)×(n+1) (in the unconstrained case) has to be calculated, see [25]. Moreover,
for a suitable iteration scheme for finding further points on Pand for a termination
criterion gradients must be available, too. Of course, there are variants of the
classical Recovering method which do not require derivatives, but these methods
require many objective evaluations and consequently are not as efficient as gradient
based methods, particularly when the parameter space is high-dimensional.
Comparison with the Weighted Sums Method In the following we derive
some relationships between the solutions of the weighted sums method and RPP .
We assume that all objectives Fi, i = 1,2, . . . , k, are differentiable up to second
order.
For given weights wi0, i= 1,2, . . . , k, satisfying Pk
i=1 wi= 1 the weighted sums
problem, which forms a scalarized auxiliary problem for MOP, is given by:
min
xS
k
X
i=1
wiFi(x).(WS)
30
To obtain a representation of the entire Pareto set, WS has to be solved repeatedly
for different weighting vectors w= (w1, . . . , wk)t. One difficulty in this approach is,
that it is not always clear a priori how to choose these weighting vectors in order
to obtain a well-distributed representation of the Pareto set, see [6]. Moreover,
as demonstrated by the following example, the weighted sums method may fail to
compute the entire Pareto set.
Example 4.4 Let F:
R
+
R
2be defined by F(x)=(F1(x), F2(x))t, where
F1(x)=1xand F2(x) = x, and consider the following MOP:
min
x>0F(x).
Observe that F0
1(x) = 1<0 and F0
2(x) = (2x)1>0 for all x > 0 that is, every
x > 0 is Pareto optimal. Assume we want to find the Pareto point x?=1
4. Keeping
in mind that w2= 1 w1we can write down the necessary condition for x?to solve
WS:
w1F0
1(x?) + (1 w1)F0
2(x?) = 1 2w1= 0.
Consequently, if there is a weighted sums expression with a minimum at x?, then it
corresponds to w1=w2=1
2and thus must be given by
Fw(x) := 1
2F1(x) + 1
2F2(x) = 1x+x
2.
Unfortunately, with
F0
w(x) = 1
4x1
2
we have F0
w(x)>0 for 0 < x < x?and F0
w(x)<0 for x>x?that is, x?is a maximum
of Fwand a minimization of Fwcannot lead to x?, unless in the unlikely case when
x?is chosen to be the initial guess and a gradient based method is used. On the
other hand, using the target vector t= (1
4,0)t, minimization of the expression
Ft(x) := ||F(x)t||2=rx21
2x+9
16
leads to x?, because with
F0
t(x) = x1
4
qx21
2x+9
16
we have F0
t(x)<0 for 0 < x < x?and F0
t(x)>0 for x>x?. In other words, we have
detected a solution x?of the given MOP which can be found using RPP , whereas
WS fails to find this solution.
31
Now, for the unconstrained case, i.e., S=
R
n, we want to state this relationship in
a more general context. In a first step we prove that any solution of WS fulfills the
necessary optimality conditions for being a solution of RPP , if the target vector is
chosen properly.
Lemma 4.5 For S=
R
nlet x?be a solution of WS. Then there is a target vector
t= (t1, t2, . . . , tk)t
R
k,tpF(x?), such that x?fulfills the necessary optimality
conditions for being a solution of RPP .
Proof: For i= 1,2, . . . , k let ti=Fi(x?)wi. Since x?is a solution of WS, we
have
k
X
i=1
wiFi(x?) =
k
X
i=1
wiFi(x?) = 0
and consequently
∇||F(x?)t||2=
v
u
u
t
k
X
i=1
(Fi(x?)ti)2
=Pk
i=1 2(Fi(x?)ti)Fi(x?)
2qPk
i=1(Fi(x?)ti)2
=1
||F(x?)t||2
k
X
i=1
wiFi(x?)=0.
Under certain conditions a solution of WS is indeed also a solution of RPP . To
prove this fact, which is formulated later on in Corollary 4.7, we have to do some
preliminary work. To decide whether a point is a solution of an unconstrained
minimization problem, in addition to the fact that the gradient vanishes, a statement
on the definiteness of the corresponding Hessian has to be made. For this we denote
by 2F(x) the Hessian of a function F:
R
n
R
at xand consider the following
Theorem 4.6 For S=
R
nlet x?be a solution of WS, such that the Hessian
2
k
X
i=1
wiFi(x?) =
k
X
i=1
wi2Fi(x?)
is positive (semi-)definite. Then there is a target vector t= (t1, t2, . . . , tk)t
R
k,
tpF(x?), such that
2||F(x?)t||2
32
is positive (semi-)definite.
Proof: For i= 1,2, . . . , k let ti=Fi(x?)wi. Then ||F(x?)t||26= 0 and
2||F(x?)t||2=(∇||F(x?)t||2)t= Pk
i=1(Fi(x?)ti)Fi(x?)
||F(x?)t||2!t
=Pk
i=1 (Fi(x?)ti) (Fi(x?))t
||F(x?)t||2
+Pk
i=1 (Fi(x?)ti)2Fi(x?)
||F(x?)t||2
+1
||F(x?)t||2 k
X
i=1
(Fi(x?)ti)Fi(x?)!t
=1
||F(x?)t||2 k
X
i=1 Fi(x?) (Fi(x?))t+
k
X
i=1
wi2Fi(x?)!
+1
||F(x?)t||2
k
X
i=1
wiFi(x?)
|{z }
=0
t
.
Here, Pk
i=1 wiFi(x?) = 0, because x?is a solution of WS. The dyadic product
Fi(x?) (Fi(x?))t
is positive semidefinite for all i= 1,2, . . . , k, and since
2
k
X
i=1
wiFi(x?) =
k
X
i=1
wi2Fi(x?)
is assumed to be positive (semi-)definite we can conclude that 2||F(x?)t||2is
also positive (semi-)definite.
Corollary 4.7 For S=
R
nlet x?be an isolated solution of WS. Then there is
a target vector t= (t1, t2, . . . , tk)t
R
k,tpF(x?), such that x?is an isolated
solution of RPP .
Proof: For i= 1,2, . . . , k let ti=Fi(x?)wi. Since x?is an isolated solution of
WS, 2Pk
i=1 wiFi(x?) is positive definite and Pk
i=1 wiFi(x?) = 0. From Lemma
4.5 it follows that ∇||F(x?)t||2= 0. Moreover, Theorem 4.6 guarantees that
33
2||F(x?)t||2is positive definite. Thus, the necessary and sufficient conditions
for x?being an isolated solution of RPP are satisfied.
Comparison with Normal Boundary Intersection As mentioned in Section 1,
the Recovering-IS algorithm concentrates on the computation of a complete set of
alternatives ([15]). Another method following this concept is the normal boundary
intersection (NBI) method described in [7]. To apply this method, the individual
minima F?
iof all objectives Fihave to be computed in order to construct the
pay-off matrix Φ
R
k×kgiven by Φij =F?
jF?
i. The set of points in
R
kthat
are convex combinations of the columns of Φ, i.e., {Φβ:β
R
k
+,Pk
i=1 βi= 1}is
called the convex hull of individual minima (CHIM). Let ˆndenote the unit normal
to the CHIM pointing towards the origin. The main idea of NBI is, that for every
β {β
R
k
+,Pk
i=1 βi= 1}, the intersection of the line {Φβ+tˆn:t
R
}with
the boundary F(S) of F(S) = {F(x), x S}is accepted to be a local Pareto
point. Although the NBI method works as satisfactorily as the Recovering-IS algo-
rithm, we want to point out that there is an essential difference from the designers
point of view. In many applications, the decision making process is based on the
trade-offs associated with every Pareto point x?that is, information on how much
the value of one objective Fjincreases when another objective Fiis improved. The
corresponding trade-off can be expressed by αj
αi, where αjand αiare the multipliers
occuring in the Kuhn-Tucker conditions associated with the given MOP: there are
αl
R
, l = 1, . . . , k, such that
k
X
l=1
αlFl(x?) = 0,(4.5)
k
X
l=1
αl= 1,(4.6)
αl0, l = 1, . . . , k. (4.7)
Certainly, α= (α1, . . . , αk)tcan be obtained by first calculating the gradients of
all objectives (nk derivatives!) and then solving (4.5), (4.6), (4.7) for α. But the
Recovering-IS algorithm has the advantage that for every calculated Pareto point
x?the vector F(x?)t >p0 is available, where t
R
kdenotes the respective target
used to obtain x?. Multiplying the Kuhn-Tucker equation
∇||F(x?)t||2=1
||F(x?)t||2
k
X
l=1
(Fl(x?)tl)Fl(x?) = 0
34
associated with RPP by
||F(x?)t||2
Pk
l=1 Fl(x?)tl
R
makes clear that α, given by
αi=Fi(x?)ti
Pk
l=1 Fl(x?)tl
, i = 1, . . . , k,
already solves (4.5), (4.6), (4.7). In other words, if the Pareto set is computed by the
Recovering-IS algorithm, αcan be simply obtained by scaling the vector F(x?)t
instead of calculating all derivatives, which can be relatively expensive, and then
solving (4.5), (4.6), (4.7).
4.3 Convex Objectives
The Recovering-IS algorithm has some crucial advantages in the case where the MOP
is unconstrained and where all objectives are strictly convex. In Theorem 4.11 we
will show that in this case, given a solution x?and F?:= F(x?), suitable targets for
finding further solutions in the neighborhood of x?can always be chosen on the tan-
gent space TF?F(P). Moreover, we will show in Theorem 4.9 that the complete set
of alternatives in image space as generated by the Recovering-IS algorithm implies
that the corresponding Pareto set in parameter space is also complete. To this end,
denote by <·,·>the standard scalar product in
R
nand consider the following
Lemma 4.8 Let FC1(Rn,R)be strictly convex and let x1, x2Rn, x16=x2, such
that F(x1) = F(x2). Then we have
hx2x1,F(x1)i<0.
Proof: Since Fis strictly convex and F(x1) = F(x2) we have
F(λ x2+ (1 λ)x1)< λF(x2) + (1 λ)F(x1) = F(x1)
for all λ(0,1). In particular, for ¯x=1
2x1+1
2x2we can write
F(¯x)< F(x1),
or
z:= F(¯x)F(x1)<0.
Again using the strict convexity of F, we have
F(λ¯x+ (1 λ)x1)< λF(¯x) + (1 λ)F(x1)
=F(x1) + λ(F(¯x)F(x1)),
35
or F(λ¯x+ (1 λ)x1)F(x1)
λ< z < 0
for all λ(0,1). Finally, since h¯xx1,F(x1)iis the directional derivative of F
at x1corresponding to the direction ¯xx1, we can conclude
hx2x1,F(x1)i= 2 h¯xx1,F(x1)i
= 2 lim
λ&0
F(λ¯x+ (1 λ)x1)F(x1)
λ<2z < 0.
Now we are in the position to prove the following
Theorem 4.9 Let FiC1(Rn,R)be strictly convex for i= 1, . . . , k.
Define F(x) := (F1(x), . . . , Fk(x))tand consider the multi-objective optimization
problem
min
xRnF(x).
Let PRnbe the corresponding Pareto set. Then the mapping
Φ : PF(P), x 7→ F(x)xP
is one-to-one.
Proof: Since Φ is obviously surjective it remains to show that Φ is injective.
Assume that there are x1, x2P, x16=x2, such that F(x1) = F(x2).Then there are
α1, . . . , αk0 with
k
X
i=1
αi= 1 such that
k
X
i=1
αiFi(x1) = 0.(4.8)
Since the objectives Fiare strictly convex and Fi(x1) = Fi(x2) for i= 1, . . . , k, it
follows from Lemma 4.8, that
hx2x1,Fi(x1)i<0 for all i= 1, . . . , k. (4.9)
W.l.o.g. let α1>0. Then we have from (4.8)
F1(x1) =
k
X
i=2
αi
α1Fi(x1)
36
and consequently
hx2x1,F1(x1)i=
k
X
i=2
αi
α1hx2x1,Fi(x1)i 0,
which is a contradiction to (4.9). Thus, F(x1)6=F(x2) that is, Φ is injective. This
completes the proof.
As mentioned at the beginning of this section, the choice of targets is very easy
in the case of convex objectives. Before we show this fact, in the following lemma
we first state that the vector F?t?given by the difference between a target t?
and F?=F(x?), where x?is the Pareto optimal point found by solving RPP’, is
orthogonal to the tangent space TF?F(P) with respect to <·,·>.
Lemma 4.10 Let FiC1(Rn,R)for i= 1, . . . , k, and consider the multi-objective
optimization problem
min
xRnF(x).
Denote by Pthe corresponding Pareto set and let F?:= F(x?), where x?
R
nis
the unique solution of RPP’ associated with the target t?<pF?.Then
F?t?TF?F(P).
Proof: Let F denote the boundary of {F(x) : x
R
n}. Since F(P) locally forms
a differentiable manifold (see [25]), there exists a differentiable curve α: [1,1]
F with α(0) = F?,α0(0) TF?F(P) and α(λ)F(P) for all λ[0,1]. Then,
since x?is a solution of RPP’, λ= 0 is a solution of
min
λ[1,1] ||α(λ)t?||2
and therefore
d
||α(λ)t?||2=d
hα(λ)t?, α(λ)t?i= 2 hα(λ)t?, α0(λ)i= 0
for λ= 0. With α(0) = F?we obtain
hF?t?, α0(0)i= 0,
that is, F?t?TF?F(P).
37
Theorem 4.11 Denote by Pthe Pareto set of an unconstrained MOP and assume
that the objectives F1, . . . , Fkare strictly convex . Let x?Pbe obtained by solving
RPP with target t?<pF?:= F(x?). Then for every ¯
FF(P)\{F?}there is a
target ˜
tTF?F(P),˜
t <p¯
Fsuch that for the solution ˜xof RPP with target ˜
twe
have F(˜x) = ¯
F.
Proof: First observe, that Theorem 4.9 guarantees that there is a unique solution
¯xof MOP with F(¯x) = ¯
F. Then
Fi(λ¯x+ (1 λ)x?)< λ ¯
Fi+ (1 λ)F?
i
and therefore
Fi(λ¯x+ (1 λ)x?)F?
i< λ(¯
FiF?
i) (4.10)
for every λ(0,1) and for every i= 1, . . . , k. Since F(P) forms a smooth manifold,
see [25], there exists a curve α: [0,1] F(P) with α(0) = F?and
αi(λ)Fi(λ¯x+ (1 λ)x?) (4.11)
for all i= 1, . . . , k and λ[0,1]. Thus, from (4.10) and (4.11) it follows that
αi(λ)F?
i< λ(¯
FiF?
i)
for λ(0,1) and i= 1, . . . , k. Multiplication by (t?
iF?
i)<0 yields
(t?
iF?
i)(αi(λ)F?
i)> λ(t?
iF?
i)( ¯
FiF?
i)
for λ(0,1) and i= 1, . . . , k. With F?=α(0), after summation of these kequations
and division by λ6= 0, we have
t?F?,αi(λ)α(0)
λ>t?F?,¯
FF?
for λ(0,1). The left hand side of this inequality vanishes for λ&0, because
lim
λ&0
αi(λ)α(0)
λTF?F(P)
and, as stated in Lemma 4.10,
t?F?TF?F(P).
Thus we have
t?F?,¯
FF?<0.
38
Now, for any vector v
R
kwith hv, t?F?i>0 define
µv:= t?F?,¯
FF?
hv, t?F?i.
Then µv>0 and
(¯
F+µvv)F?, t?F?=¯
FF?, t?F?+µvhv, t?F?i= 0,
that is, ¯
F+µvvTF?F(P).
Since ¯xis a solution of MOP, there are ¯wi0, i = 1, . . . , k, such that
k
X
i=1
¯wiFi(¯x)=0
and since the objectives Fiare strictly convex we know that ¯xsolves WS. More-
over, the Hessians 2Fiare positive definite and therefore we can conclude from
Lemma 4.5 and Theorem 4.6, that there is a target ˆ
t <p¯
Fsuch that ˜x:= ¯x
is a solution of RPP . Obviously, ˜xis also a solution of RPP for all targets
tT:= {¯
F+µ(ˆ
t¯
F) : µ > 0}. Moreover, since t?<pF?and t <p¯
Ffor
all tT, we have t¯
F, t?F?>0 for all tT. Finally, for the choice
˜
t:= ¯
F+µˆ
t¯
F(ˆ
t¯
F)Twe have ˜
tTF?F(P).
4.4 A Hierarchical Concept for Computing the Desired Part
of the Pareto Set
In many applications, after considering a discrete representation of the entire Pareto
set, the decision maker finds out that the best Pareto points e.g. for adjusting the
technical system must be somewhere in a particular part of the Pareto set. To make
a more concrete decision, a more detailed representation of that part is desired
that is, more Pareto points must be available such that their density is significantly
higher within that particular part. Of course, one could use boxes of suitable depth
in the Recovering-IS algorithm, such that the computed representation is sufficiently
dense. But since in the end only an a priori unknown part of the Pareto set is of
special interest, this proceeding is not very efficient. Here, too many unneeded points
are computed. On the other hand, an initial representation of suitable density is
required in order to analyze which part of the Pareto set one wants to concentrate
on. Here, again the question arises how to choose the depth of the boxes, such
that the required density is obtained, whereas the computational effort has to be
adequate.
39
In the following we propose a hierarchical concept based on a multi-level subdi-
vision scheme, which helps to successively extend the computed set of Pareto points
within those boxes which are selected by the decision maker for further considera-
tion. The extensions are obtained by first subdividing the selected boxes and then
applying the Recovering-IS algorithm exclusively to those ”subboxes” containing
Pareto points found so far. Consequently, the points of the computed extensions
are well-distributed within the selected boxes and the density among these points
increases as the iteration proceeds, whereas the computation of too many undesired
Pareto points is avoided.
In Figure 13 the solution for Example 4.1 as obtained by our Recovering-IS
algorithm in combination with the interactive hierarchical concept described above
is depicted. After two steps of subdivision and execution of our algorithm, the
decision maker decided to concentrate on a certain region (marked box) of the image
space. Thus, further subdivisions and executions of our algorithm were restricted
to this region. These computations were continued until the decision maker was
satisfied with the density of the generated Pareto points within this region.
Figure 13: The solution (in image space) of Example 4.1 computed by the
Recovering-IS algorithm in combination with the described hierarchical concept.
40
5 Basic Methods for Solving Bi-Level Optimiza-
tion Problems
Many different approaches for solving (classical) bi-level optimization problems have
been proposed in the past, as there are for example descent algorithms, bundle
algorithms, penalty methods, trust region methods, smoothing methods and branch-
and-bound methods. Many of these approaches are based on the conversion of
the bi-level problem to an ordinary (or classical) optimization problem (a one-level
problem). One possibility is to replace the lower level objective fby an additional
non-differentiable equation f(x, y) = ϕ(y), where ϕ(y) = minx{f(x, y) : g(x, y)
0, h(x, y) = 0}. Other approaches use the implicit function theorem to derive a
local description of the function x(y) :
R
m
R
n, which is then inserted into the
higher level problem. As mentioned in Section 2.2, another concept is to replace the
lower level problem by its Kuhn-Tucker conditions. In general, the resulting one-
level problem BLP’, which is a mathematical program with equilibrium constraints or
MPEC, see [33], is not equivalent to the original problem, but the desired equivalence
is ensured in the particular case where the lower level problem is a convex one. Since
in this work we will basically concentrate on optimistic formulations of BLMOP
with convex lower level problems, we will follow this concept based on lower level
Kuhn-Tucker conditions. Here, one essential difficulty to overcome in the presence
of lower level inequality constraints gi(x, y)0 is given by the fact that in this
case the usual constraint qualifications like linear independence or Mangasarian-
Fromowitz constraint qualifications for the higher level problem are violated at every
feasible point. A proof on this can be found for example in [12]. We will give
an alternative proof concerning linear independence constraint qualifications in the
more general context of BLMOP in Section 6.1. In order to overcome the mentioned
difficulties concerning constraint qualifications when solving the respective auxiliary
problems corresponding to BLMOP with lower level inequality constraints, we will
solve reformulations constructed by the use of merit functions ([47, 19]) as described
in the next section. We will also give a review on smoothing methods ([28]), which
help to improve the numerical solvability of the reformulated problems.
5.1 Merit Functions and Smoothing Methods
As we will see in the proof of Theorem 6.3, the mentioned violation of constraint
qualifications originates from the inequalities gi(x, y)0, τi0, i = 1, . . . , q and
the complementarity equations τigi(x, y)=0, i = 1, . . . , q in BLP’. A way out is
41
to replace these inequalities and equations, which act as constraints in BLP’, by
qequivalent constraints of the form Φ(τi,gi(x, y)) = 0, where Φ :
R
2
R
is a
so-called merit function. With this replacement and the notation ¯
:= (x), BLP’
is reformulated as
min
x,y F(x, y) (BLP”)
s.t. G(x, y)p0,
H(x, y)=0,
¯
f(x, y) +
p
X
i=1
ζi¯
hi(x, y) +
q
X
i=1
τi¯
gi(x, y)=0,
h(x, y)=0,
Φ(τi,gi(x, y)) = 0 for i= 1, . . . , q.
Several merit functions have been proposed in the past, see [47, 19]. Unfortunately,
it turned out that these merit functions are in general non-differentiable. Surely, non-
smooth methods can be used for the solution of merit function based reformulations,
but it is worth looking for differentiability conserving alternatives in order to benefit
from the advantages of gradient based optimization methods, that is in particular,
the higher rate of convergence.
The idea of smoothing methods is to use a family of smoothing functions Φε:
R
2
R
, which are smooth and use a smoothing parameter εin order to approximate
the merit function Φ in the sense that
lim
ε&0Φε(a, b) = Φ(a, b)a, b
R
.
In the following we will sometimes use synonymously the notation Φ(a, b, ε) :=
Φε(a, b). Examples of smoothing functions can for instance be found in [28]. The
method proposed in this work is particularly based on one of these smoothing func-
tion, the perturbed Fischer-Burmeister function.
Definition 5.1 The Fischer-Burmeister function Φ :
R
2
R
is defined by
Φ(a, b) = a+ba2+b2.
For ε0, the perturbed Fischer-Burmeister function Φ :
R
3
R
is defined by
Φ(a, b, ε) = a+ba2+b2+ε.
42
Observe that the perturbed Fischer-Burmeister function Φ has the following prop-
erties:
(i) for ε > 0, Φ is continuously differentiable with respect to (a, b) on
R
2,
(ii) for ε= 0, Φ is continuously differentiable with respect to (a, b) on
R
2\(0,0)t,
(iii) Φ(a, b, 0) = 0 a0, b 0,and ab = 0.
Consequently, BLP” is equivalent to BLP’, but for every i= 1, . . . , q, BLP” includes
just one single equality constraint Φ(τi,gi(x, y)) = 0 instead of the constraints
gi(x, y)0, τi0 and τigi(x, y) = 0, that is, the sources for the mentioned
violation of the constraint qualifications are circumvented. Another difference is,
that BLP” is not everywhere, but anyway almost everywhere differentiable, such
that one might generically expect to find solutions of BLP” using derivative based
algorithms. But as mentioned in [28], well-posedness can be improved in the sense
that feasibility and constraint qualifications, hence stability, are often more likely to
be satisfied, if Φ(τi,gi(x, y)) is replaced by Φ(τi,gi(x, y), ε) for some small ε > 0.
Moreover, solvability of quadratic approximation problems is improved in this case,
which opens the way to use sequential quadratic programming methods (SQP) from
classical nonlinear optimization.
Remark 5.2 (i) In general, it is not clear whether a solution of a perturbation
of BLP” according to ε > 0 approximates the solution of the original problem.
To see that under certain assumptions this desired approximation property is
indeed guaranteed, let
Φ(τ, x, y, ε) := ε(τ1,g1(x, y)),...,Φε(τq,gq(x, y)))t
and
˜
H(x, y, τ, ζ, ε) :=
H(x, y)
¯
f(x, y) +
p
X
i=1
ζi¯
hi(x, y) +
q
X
i=1
τi¯
gi(x, y)
h(x, y)
Φ(τ, x, y, ε)
.
Then, the perturbed variant of BLP” can be written as
min
x,y F(x, y) (BLP(ε))
s.t. G(x, y)p0,
˜
H(x, y, τ, ζ, ε)=0,
43
which makes up a parametric optimization problem as considered in [17].
Then, denoting by X(ε) := (x(ε), y(ε), τ(ε), ζ(ε)) the solution of BLP(ε) for
every ε, in [17] it is shown that under suitable assumptions the mapping
εX(ε) is continuous in a neighborhood of ε= 0. Consequently, we can
expect that for a sufficiently small perturbation parameter ε > 0, the solution
X(ε) approximates X(0), that is, the solution of BLP”.
(ii) Moreover, if X(¯ε) has been computed for some ¯ε > 0, the results of [17]
constitute a linear approximation for the mapping εX(ε) at ¯ε, which can
be used to derive different a posteriori error estimations, e.g., an estimation of
the form ||(x(¯ε), y(¯ε)) (x(0), y(0))||.
There remains the question which can in general hardly be answered a priori how
to choose ε > 0, such that the corresponding reformulation is sufficiently well-posed
and the optimization results in a point which is somehow ”close enough” to the
solution of the original reformulation, that is, for the case ε= 0. Consequently, a
common technique is to start with a relatively large perturbation parameter ε0>0,
which has to converge towards 0 during the optimization process. Such techniques
are well-known in the field of Mathematical Programs with Complementarity Con-
straints and can be divided into implicit and explicit methods, see for instance [28].
In implicit smoothing methods, the smoothing parameter εis included as one
of the optimization variables of the problem reformulation and is updated at each
iteration just like the other variables. To this end, an additional equality constraint,
e.g. of the form eε1 = 0, which is satisfied if and only if ε= 0, has to be included
in the reformulation (here, edenotes Euler’s constant).
In explicit smoothing methods the smoothing parameter is updated separately,
that is, after obtaining a solution for a fixed perturbation parameter ε > 0, εhas
to be decreased successively and in every cycle a subproblem associated with a new
perturbation parameter has to be solved, starting from the solution of the respective
previous cycle. Here, a strategy for decreasing the perturbation parameter is needed.
Furthermore, a termination criterion, i.e., an a posteriori estimation is required in
order to decide at the end of every cycle whether the solution is satisfying in the
sense that it is ”close enough” to the solution of the original reformulation, or not.
44
6 Bi-Level Multi-Objective Optimization Problems
(BLMOP)
In this section, we present the main topic of this thesis. First we derive some
theoretical results including optimality conditions for a certain class of BLMOP,
that is in particular, BLMOP without lower level inequality constraints. Then we
present some algorithms for the solution of these problems, which are based on the
mentioned optimality conditions and we prove convergence for such algorithms. We
also consider a special subclass of BLMOP, the Pareto set constrained multi-objective
optimization problems (PSCMOP), which are characterized by the fact that the
lower level problem is not a parametrized one. For the sake of completeness, we give
some basic ideas for the solution of BLMOP with lower level inequality constraints in
Section 6.5 and for non-convex and non-smooth BLMOPs in Section 6.6. Moreover,
in Section 6.4 we propose methods which are tailored to solve non-convex and non-
smooth PSCMOPs.
6.1 An Optimality Condition for BLMOPs without Lower
Level Inequality Constraints
For the case where lower level inequality constraints are absent, that is q= 0, we
want to state a necessary condition for a solution of BLMOP on the basis of the
Kuhn-Tucker conditions (2.2) for MOP. We restrict ourselves to this particular case,
because, as we will see later on, lower level inequality constraints lead necessarily to
the violation of constraint qualifications which have to be satisfied for the desired
optimality condition for BLMOP stated in Theorem 6.1. Later on in Section 6.5
we will apply the results of this section to reformulations based on the Fischer-
Burmeister function in order to handle the general BLMOP with both equality and
inequality constraints for both the higher and lower level.
We rewrite the optimistic formulation BLMOP-O for the case without lower level
inequality constraints as
min
x
R
n,y
R
mF(x, y),(BLMOP-O’)
s.t. G(x, y)p0,
H(x, y) = 0,
and xsolves: min
x
R
nf(x, y),
s.t. h(x, y) = 0.
45
In the following let L:
R
n×
R
m×
R
l×
R
p
R
,
L(x, y, α, ζ) :=
l
X
i=1
αifi(x, y) +
p
X
i=1
ζihi(x, y)
be the lower level Lagrangian. Denote by Lxithe derivative of Lwith respect to
xi,i= 1, . . . , n. Let Iα:= {i:αi= 0}⊂{1, . . . , l}and for fixed x
R
n, y
R
m
let IG(x, y) := {i:Gi(x, y) = 0} {1, . . . , s}. The gradient f(x) of a function
f:D
R
n
R
is understood to be a row vector and the Hessian of fis denoted
by 2f(x). Accordingly, the ith row of the Jacobian g(x) of a vector valued
function g:D
R
n
R
kis given by the gradient gi(x) of the ith component
of g. Furthermore, denote by eithe ith vector of the standard basis in
R
n+m+l+p
and let ¯
:= (x),˜
:= (y),ˆ
:= (x,y), and := (x,y,α,ζ).
Theorem 6.1 Let (x, y)be a solution of BLMOP. If the gradients
¯
hi(x, y), i = 1, . . . , p,
are linearly independent, then there exist α1, . . . , αl0,ζ1, . . . , ζp
R
such that
(i)¯
∇L(x, y, α, ζ) = 0,(6.1)
(ii)
l
X
i=1
αi= 1.(6.2)
Assume that in addition all (x, y, α, ζ)satisfying (6.1), (6.2), αi0, and h(x, y) = 0
correspond to Pareto points for the lower level problem. Furthermore assume that
the first nrows of the Hessian
ˆ
2L(x, y, α, ζ)
of the lower level Lagrangian with respect to (x, y)and the gradients
ˆ
hi(x, y) = ¯
hi(x, y),˜
hi(x, y), i = 1, . . . , p,
ˆ
Hi(x, y) = ¯
Hi(x, y),˜
Hi(x, y), i = 1, . . . , r,
ˆ
Gi(x, y) = ¯
Gi(x, y),˜
Gi(x, y), i IG(x, y)
of the active higher and lower level constraints with respect to (x, y), are linearly
independent. Then there exist β1, . . . , βk,µ1, . . . , µl,δ1, . . . , δs0,ω1, . . . , ωp,
46
ν, λ1, . . . , λn,ρ1...,ρr
R
such that
(iii)
k
X
i=1
βiFi(x, y) +
n
X
i=1
λi∇Lxi(x, y, α, ζ) (6.3)
+
r
X
i=1
ρiHi(x, y) +
s
X
i=1
δiGi(x, y)
+
p
X
i=1
ωihi(x, y) +
l
X
i=1
(νµi)en+m+i= 0,
(iv)
k
X
i=1
βi= 1,(6.4)
(v)µiαi= 0 for i= 1, . . . , l, (6.5)
(vi)δiGi(x, y) = 0 for i= 1, . . . , s. (6.6)
Proof: Let (x, y) be a solution to BLMOP. Then xis a solution to the lower level
problem parametrized by y. Since ¯
hi(x, y), i = 1, . . . , p are linearly independent,
the Kuhn-Tucker condition 2.2 holds: there are α1, . . . , αl, ζ1, . . . , ζp
R
such that
¯
∇L(x, y, α, ζ)=0,(6.7)
l
X
i=1
αi= 1 (6.8)
and
αi0, i = 1, . . . , l. (6.9)
Denote by SLthe set of those solutions of (6.7),(6.8),(6.9) for which h(x, y) = 0 holds.
Observe that according to the assumptions of the theorem for any (x, y, α, ζ)
SL,xis Pareto optimal for the lower level problem corresponding to y. Thus, we can
define the following auxiliary problem, which is equivalent to the given BLMOP:
min
x,y,α,ζ F(x, y),(6.10)
s.t G(x, y)p0,
H(x, y) = 0,
(x, y, α, ζ)SL.
In order to apply the Kuhn-Tucker conditions (2.2) to this auxiliary problem, we
have to ensure that the gradients with respect to (x, y, α, ζ) of the active constraints,
47
that is,
hi(x, y), i = 1, . . . , p, (6.11)
Hi(x, y), i = 1, . . . , r, (6.12)
Gi(x, y), i IG(x, y),(6.13)
∇Lxi(x, y, α, ζ), i = 1, . . . , n, (6.14)
l
X
i=1
αi=
l
X
i=1
en+m+i,(6.15)
(αi) = en+m+i, i Iα,(6.16)
are linearly independent or equivalently, that the matrix MRp+r+s+n+|Iα|+1×n+m+l+p
defined by
M:=
¯
h(x, y)˜
h(x, y)0 0
¯
H(x, y)˜
H(x, y)0 0
¯
G(x, y)˜
G(x, y)0 0
¯
2L(x, y, α, ζ)˜
¯
∇L(x, y, α, ζ)t¯
f(x, y)t¯
h(x, y)t
0 0 ˜
I0
0 0 1. . . 10
has maximal rank, where the rows of ˜
Iare given by the transposed of distinct
standard basis vectors of
R
laccording to (6.16). Since
¯
2L(x, y, α, ζ)˜
¯
∇L(x, y, α, ζ)t
complies with the first n(linearly independent) rows of the Hessian ˆ
2L(x, y, α, ζ),
the rows of the upper left block
¯
h(x, y)˜
h(x, y)
¯
H(x, y)˜
H(x, y)
¯
G(x, y)˜
G(x, y)
¯
2L(x, y, α, ζ)˜
¯
∇L(x, y, α, ζ)t
of Mare linearly independent and since Mhas a certain triangular block structure,
it remains to show that the submatrix
˜
I
1. . . 1R|Iα|+1×l(6.17)
has maximal rank. Indeed, with (6.8) and (6.9) we have |Iα|< l that is, (6.17)
has at most lrows. Obviously, these rows are linearly independent and thus both
(6.17) and also Mhave maximal rank. Consequently, (6.11), (6.12), (6.13), (6.14),
48
(6.15) and (6.16) are linearly independent and we can write down the Kuhn-Tucker
conditions (2.2) for the auxiliary problem (6.10): there are β1, . . . , βk,µ1, . . . , µl,
δ1, . . . , δs0, ω1, . . . , ωp,ν,λ1, . . . , λn,ρ1, . . . , ρr
R
such that
k
X
i=1
βiFi(x, y) +
n
X
i=1
λi∇Lxi(x, y, α, ζ) (6.18)
+ν
l
X
i=1 αi
l
X
i=1
µiαi+
p
X
i=1
ωihi(x, y)
+
r
X
i=1
ρiHi(x, y) +
s
X
i=1
δiGi(x, y) = 0,
k
X
i=1
βi= 1,(6.19)
µiαi= 0 for i= 1, . . . , l, (6.20)
δiGi(x, y) = 0 for i= 1, . . . , s. (6.21)
Observing that αi=en+m+ifor i= 1, . . . , l, we obtain from (6.18)
k
X
i=1
βiFi(x, y) +
n
X
i=1
λi∇Lxi(x, y, α, ζ) +
l
X
i=1
(νµi)en+m+i(6.22)
+
p
X
i=1
ωihi(x, y) +
r
X
i=1
ρiHi(x, y) +
s
X
i=1
δiGi(x, y)=0.
Remark 6.2 (i) In the case without explicitly given constraints, that is r=s=
p= 0, the maximal rank condition for the matrix Mis already satisfied, if the
Hessian ¯
2L(x, y, α, ζ) (with respect to x) is regular.
(ii) After introducing slack variables to express the involved inequalities as equa-
tions, Theorem 6.1 along with the given constraints corresponds to a system
of nonlinear equations that has (k1) more variables, than it has equations,
if k2. If k= 1, that is, if we have one single higher level objective F1, then
49
(6.3) simplifies to
F1(x, y) +
n
X
i=1
λi∇Lxi(x, y, α, ζ)
+
p
X
i=1
ωihi(x, y) +
r
X
i=1
ρiHi(x, y) +
s
X
i=1
δiGi(x, y)
+
l
X
i=1
(νµi)en+m+i= 0
and (6.4) simplifies to β1= 1. In this case the system turns out to be quadratic
in the sense that the number of variables equals the number of equations.
(iii) If (x, y) is an inner point of the solution set with respect to Gand α, that is,
IG(x, y) = and Iα=, then µi= 0 for i= 1, . . . , l,δi= 0 for i= 1, . . . , s,
and (6.3) simplifies to
k
X
i=1
βiFi(x, y) +
n
X
i=1
λi∇Lxi(x, y, α, ζ)
+
p
X
i=1
ωihi(x, y) +
r
X
i=1
ρiHi(x, y) + ν
l
X
i=1
en+m+i= 0.
In the next sections we will propose algorithms based on Theorem 6.1 which are
tailored to solve BLMOPs without lower level inequality constraints. In Section 6.5
we will consider the more general BLMOPs, where lower level inequality constraints
are involved. Here, we will approximate the original problems by related BLMOPs
where the lower level inequality constraints are replaced by certain equality con-
straints which are constructed by using the perturbed Fischer-Burmeister function.
In this way, Theorem 6.1 will also be the basis for our algorithms for the solution of
BLMOPs with lower level inequality constraints.
There remains the question why Theorem 6.1 is stated only for BLMOPs without
lower level inequality constraints. For this, we state the following
Theorem 6.3 Let the Kuhn-Tucker conditions of an inequality constrained MOP
act as constraints for a higher level optimization problem. Then the linear inde-
pendence constraint qualifications for the higher level problem are violated at every
feasible point.
Proof: W.l.o.g., we assume that there are no higher level constraints and no
lower level equality constraints. In this case, with a lower level problem involving
50
objectives f:
R
n+m
R
land inequality constraints g:
R
n+m
R
q, the lower
level Lagrangian is given by
L(x, y, α, τ) :=
l
X
i=1
αifi(x, y) +
q
X
i=1
τigi(x, y).
Let := (x,y,α,τ)and denote by Lxithe derivative of Lwith respect to xi,i=
1, . . . , n. Then, the Kuhn-Tucker reformulation associated with the lower level is
given by
¯
∇L(x, y, α, τ) = 0,
l
X
i=1
αi= 1,
αi0, i = 1, . . . , l,
gi(x, y)τi= 0, i = 1, . . . , q,
τi0, i = 1, . . . , q.
Consequently, the linear independence condition for applying the Kuhn-Tucker con-
ditions to the higher level problem means that the gradients of the active constraints
(which are defined by the lower level constraints and the lower level Kuhn-Tucker
conditions) with respect to (x, y, α, τ), that is
∇Lxi(x, y, α, τ), i = 1, . . . , n,
(gi(x, y)τi), i = 1, . . . , q,
τi=en+m+l+i, i Iτ={i:τi= 0},
gi(x, y), i Ig={i:gi(x, y)=0},
l
X
i=1
αi,
αi=en+m+i, i Iα={i:αi= 0}
have to be linearly independent. But for iIgwe have
(gi(x, y)τi) = ( τi¯
gi(x, y)τi˜
gi(x, y)0 0 ),
gi(x, y) = ( ¯
gi(x, y)˜
gi(x, y)0 0 ),
and for iIτ, we have
(gi(x, y)τi) = ( 0 0 0 eigi(x, y) ),
τi= ( 0 0 0 ei).
51
Consequently, since the condition gi(x, y)τi= 0 implies that iIgIτfor all
i= 1, . . . , q, any lower level inequality constraint violates the desired linear inde-
pendence conditions.
6.2 Methods for Solving BLMOPs without Lower Level In-
equality Constraints
In this section we propose new algorithms for the computation of the Pareto set of
a given BLMOP without lower level inequality constraints. We first consider how
to solve the BLMOP in the particular case k= 1, where the Pareto set turns out to
be a singleton and then we present the BL-Subdivision, the BL-Recovering-PS, and
the BL-Recovering-IS algorithms for the computation of both tight coverings and
discrete representations of the typically extensive Pareto sets in the case k > 1.
Here, we concentrate on derivative based methods, which are based on the so-
lution of a system of equations derived from Theorem 6.1. Therefore, we have to
assume that for any fixed y
R
mall feasible substationary points x
R
nof
the corresponding lower level problem are Pareto optimal, e.g., if the lower level
problem is a convex one. In this case the algorithms to be described below compute
a set of points, which certainly solve the corresponding lower level problems, but in
general this set is a superset of the Pareto set of the given BLMOP. Consequently,
a nondominance and feasibility test with respect to the higher level has to be per-
formed among these points in order to filter out the desired Pareto points. Efficient
strategies for nondominance tests can be found in [27].
Obviously, such nondominance tests can be omitted, if it is ensured that all
substationary points of the higher level problem are necessarily Pareto optimal for
the given BLMOP. This is for instance the case, if in addition to the mentioned
assumptions for the lower level also the higher level along with the higher level
constraints and the constraints implicitly given by the lower level solutions turns
out to be a convex problem.
Algorithms for the solution of BLMOPs which also include lower level inequality
constraints are presented in Section 6.5. Moreover, in Section 6.4 and Section 6.6
we present algorithms, which do not require derivatives or convexity of the involved
functions.
There are several possibilities to derive from Theorem 6.1 the system of equations
which has to be solved in order to compute individual Pareto points. Here, we want
to present a variant which uses slack variables s, t, u, v, w in order to incorporate the
52
higher level inequality constraints and the requirement that some multipliers have to
be non-negative. For this, let ˜z:= (x, y, α, ζ, β, µ, ω, δ, ν, λ, ρ, s, t, u, v, w) and denote
by ab
R
dthe component wise or Hadamard product of vectors a, b
R
d, that is,
(ab)i=aibifor i= 1, . . . , d. Then, with the notations of the preceding sections,
we propose to solve the following system:
˜
F(˜z) =
k
X
i=1
βiFi(x, y) +
n
X
i=1
λi∇Lxi(x, y, α, ζ) +
l
X
i=1
(νµi)en+m+i
+
p
X
i=1
ωihi(x, y) +
r
X
i=1
ρiHi(x, y) +
s
X
i=1
δiGi(x, y)
h(x, y)
H(x, y)
G(x, y) + ww
¯
∇L(x, y, α, ζ)
k
X
i=1
βi1
l
X
i=1
αi1
µα
δG(x, y)
αss
βtt
µuu
δvv
=0.
Observe that the Lxi(x, y, α, ζ) include first derivatives and so the ∇Lxi(x, y, α, ζ)
appearing in the first components of ˜
Finclude second derivatives. Moreover, deriva-
tives of all components of ˜
Fare required if one wants to solve the system with a
derivative based, e.g., a Gauß-Newton method. Thus, in this case derivatives up to
third order are needed for approaches based on the solution of ˜
F(˜z) = 0.
In some applications, which are affected by uncertainties, it might be required
to exclude those points, which are located close to the boundary of the solution set
with respect to Gand α. This can be achieved by a simplification of the system
according to Remark 6.2. For this, we use some sufficiently small ε > 0 to guarantee
that all components of αare strictly positive and that all components of G(x, y) are
strictly negative. Let ¯z:= (x, y, α, ζ, β, ω, ν, λ, ρ, s, t, w) and denote by 1dthe vector
(1,...,1)
R
d. Then, for obtaining the ’inner’ part of the solution, the following
simplified system can be solved:
53
¯
F(¯z) =
k
X
i=1
βiFi(x, y) +
n
X
i=1
λi∇Lxi(x, y, α, ζ) + ν
l
X
i=1
en+m+i
+
p
X
i=1
ωihi(x, y) +
r
X
i=1
ρiHi(x, y)
h(x, y)
H(x, y)
G(x, y) + ww+ε1s
¯
∇L(x, y, α, ζ)
k
X
i=1
βi1
l
X
i=1
αi1
αssε
βtt
=0
Later on in this section, the difference between the solutions of ˜
F(˜z) = 0 and
¯
F(¯z) = 0 corresponding to Example 6.6 will be shown (Figures 20 and 21). It
should be pointed out that the application of the simplified system ¯
F(¯z) = 0 can
be meaningful for the recovering methods presented later on in this section. These
methods are based on continuation techniques which require for every connected
component of the Pareto set at least one single solution out of this component as a
starting point. According to the simpler structure, if it is known that {¯z:¯
F(¯z) =
0} 6=, we propose to compute a solution of the system ¯
F(¯z) = 0 for finding such
starting points. Moreover, the simplification from ˜
Fto ¯
Fmight be an acceptable
compromise for those applications, where the corresponding part of the solution is
satisfactory to the decision maker while computation has to be fast. This applies
particularly for the Subdivision-BL algorithm, since the computational effort of
subdivision methods increases drastically with the dimension of the search space.
For the remainder of this work we will always write ˜
F(˜z), but we emphasize that
in practice a simplification according to Remark 6.2, e.g., of the form ¯
F(¯z), can be
used, if desired.
The Case k= 1
Before we describe the set-oriented methods for the solution of the typically extensive
Pareto sets in the case k > 1, we consider the special case k= 1, that is, if there
54
is only one upper level objective. In this case each connected component of the
Pareto set is typically given by a single Pareto point. Here, the task of computing
a solution of the BLMOP can be regarded as computing a preferred solution out
of the Pareto set of the lower level problem, where the selection of this preferred
solution is performed due to the upper level problem. Such problems are strongly
related to the field of optimization over the efficient set, see for instance [8], [2], [44]
and references therein. As mentioned in Remark 6.2, for k= 1 the system ˜
Fhas
the same number of variables as it has equations. Consequently, a standard Newton
method is suitable for finding a solution of this particular BLMOP.
A BLMOP Formulation for Finding Robust Pareto Points
As a special application for the case k= 1, we consider the problem of finding a
Pareto point x?of a MOP with the additional property that it is robust or insensible
in the sense that the variation of the objective values is as small as possible in a
neighborhood Uε(x?) of x?. According to the requirements associated with different
applications, there are several possibilities to model robustness or sensitivity. Here
we use the expression
σi(x) := ||∇fi(x)||2
2
to measure the sensitivity corresponding to the i-th (lower level) objective and we
assume that the user is interested in a solution x?for which
max
iL{σi(x?)}
is minimal (L:= {1, . . . , l}). Now, with f(x)=(f1(x), . . . , fl(x))twe can write
down the mathematical formulation of the described problem:
min
x
R
nmax
iLσi(x),(RMOP)
where xsolves: min
x
R
nf(x).
Observe that only the lower level problem, i.e., the minimization of f, has to be un-
derstood in the multi-objective sense, whereas minimization and maximization in the
upper level problem have to be understood in the classical sense, i.e., minimization
and maximization of a scalar valued expression. In order to obtain differentiability
we consider the equivalent problem
min
x
R
n
R
γ, (6.24)
where σi(x)γ0iL,
and xsolves: min
x
R
nf(x),
55
which has the form of BLMOP.
Example 6.4 We consider the RMOP associated with the MOP
min
x
R
2f(x) = 2(x11)4+ 3(x21)2
(x1+ 1)2+ (x2+ 1)2.(6.25)
The corresponding sensitivity expressions are given by
σ1(x) = 64(x11)6+ 36(x21)2
and
σ2(x) = 4(x1+ 1)2+ 4(x2+ 1)2.
The solution was computed by solving the system ˜
F(˜z) = 0 associated with the
corresponding reformulation of the form (6.24). For the purpose of visualization
we have also computed both the entire Pareto set of (6.25) and the corresponding
sensitivity values. The results are shown in Figure 14 and Figure 15. The Pareto
optimal objective values (blue) of (6.25), the respective sensitivity values (red) and,
the solution (black) of the RMOP are shown in Figure 14. Here one can see that the
maximum of the sensitivities is minimal at the solution. The Pareto set together
with the solution of the RMOP in image space is shown in Figure 15.
BL-Subdivision Algorithm
We define different variants of the BL-Subdivision algorithm by taking the classi-
cal Subdivision algorithm presented in Section 3.1, changing the search space and
replacing the iteration scheme by an iteration scheme for solving the system (or
subsystems of) ˜
F(˜z) = 0.
To define the basic BL-Subdivision algorithm, denote by GNs(z0) the application
of ssteps of the iteration scheme, e.g., a Gauß-Newton method for solving ˜
F(˜z) = 0
with starting point z0. Let Nbe the dimension of z= ˜z. Then, for a given domain
Q=B0RNan iteration of the basic BL-Subdivision algorithm reads as follows:
(i) Construct from Bj1a new system ˆ
Bjof subsets such that
[
Bˆ
Bj
B=[
B∈Bj1
B
and
diam( ˆ
Bj) = θjdiam(Bj1),
where 0 < θmin θjθmax <1.
56
Figure 14: The Pareto optimal objective values (blue), the corresponding sensitivi-
ties (red), and the solution (black) of Problem (6.25).
Figure 15: The Pareto set of (6.25) in image space and the solution to the corres-
ponding RMOP.
57
(ii) For all Bˆ
Bj:
choose a set of starting points X B
Bj=Bj{¯
B:z X with GNs(z)¯
B}.
Observe that for every starting point z X only a small number sof steps of
the iteration scheme for solving ˜
F(˜z) = 0 is performed and so the generated point
GNs(z) is just one point of the trajectory defined by all iterates GNi(z), i N.
Moreover, for i the trajectory might lead to a Pareto point far away from the
’starting box’ B3zregardless of the fact that there might also be Pareto points
close to or even within B. The resulting inconvenience is, that the box Band with
it a potential part of the Pareto set might get lost while the box containing GNs(z)
is kept even if it does not contain any Pareto points. A way out is to force the
iterates to converge towards a Pareto point located as close as possible to a reference
point within the starting box. Motivated by these considerations, we propose an
alternative way to realize the above algorithm. To this end, we replace the zero
finding method by a suitable iteration scheme for the solution of the constrained
minimization problem
min
˜z||˜zrB||,(6.26)
s.t. ˜
F(˜z)=0,
where rBdenotes a reference point within the starting box B, e.g., the center of B
or the used starting point. For an illustration of this strategy see Figure 16.
Example 6.5 To demonstrate how the Subdivision-BL algorithm works, we con-
sider the following BLMOP with two higher level objectives F1, F2, three lower level
objectives f1, f2, f3, a higher level inequality constraint G1and a lower level equality
constraint h1:
min
x
R
3, y
R
F(x, y) = (x0+ 1)2+ (x11y)4+x2
2
(x01)2+ (x1+ 1 y)2+ (x20.5)4,
such that G1(x, y) = 0.5x2
0+x1+ sin y0.50,
and xsolves:
min
x
R
3f(x, y) =
(x01)2+ 0.5(x1+y)2+ (x21)4
(x0+ 1)2+ 0.5(x1+y)2+ (x2+ 1)2
x2
0+x2
1+ (x2+ 1)2
such that h1=x0x1y= 0.
58
Figure 16: min
˜z{||˜zrB|| :˜
F(˜z) = 0}versus ˜
F(˜z) = 0.
The results are shown in Figures 17, 18 and 19. In particular, Figure 17 shows a
projection of the generated box collection to the x-space. The individual solutions
GNs(z) of the subproblems of the last iteration have been stored in an archive.
Figure 18 and 19 show these solutions in lower and higher level image space, respec-
tively.
Let us now consider another variant of the BL-Subdivision algorithm, which uses a
strategy described in [38]. Instead of using an iteration scheme for solving the entire
system of equations, a scheme for solving just one of the components ˜
Fl(˜z) = 0,
l {1, . . . , N k+ 1}, e.g., a damped Newton method, is used in every step of
the subdivision procedure. In order to guarantee convergence, every component has
to be applied infinitely often as the algorithm proceeds. In [38] this concept was
originally used for the location of zeros of a function f:
R
n
R
using a finite set of
iteration schemes each of them characterized by an individual step length parameter.
In this way it can be avoided that the iterates run into undesired oscillations such
that the algorithm would keep the corresponding boxes which possibly do not contain
zeros of f.
For the development of the variant of the BL-Subdivision algorithm, we extended
this idea in the sense that the different iteration schemes correspond to Newton
methods applied to the individual components ˜
Fl,l {1, . . . , N k+ 1}. For the
description of this variant let {uj}
k=jbe a sequence with uj {1, . . . , N k+1}for
all jNand |{j:uj=l}| =for all l= 1, . . . , N k+ 1. Furthermore, denote
59
Figure 17: Projection of the generated box collection to the x-space.
Figure 18: The solution in lower level image space.
60
Figure 19: The solution in higher level image space.
by Ns
l(z0) the application of ssteps of an iteration scheme, e.g., a Newton method,
for solving ˜
Fl(˜z) = 0 with starting point z0. Then, for a given domain Q=B0RN
an iteration of the variant of the BL-Subdivision algorithm reads as follows:
(i) Construct from Bj1a new system ˆ
Bjof subsets such that
[
Bˆ
Bj
B=[
B∈Bj1
B
and
diam( ˆ
Bj) = θjdiam(Bj1),
where 0 < θmin θjθmax <1.
(ii) For all Bˆ
Bj:
choose a set of starting points X B
Bj=Bj{¯
B:z X with Ns
uj(z)¯
B}.
BL-Recovering-PS Algorithm
If in addition to the previous assumptions at least one box containing points of the
solution has been found so far, a variant of the Recovering-PS algorithm described
61
in Section 3.1 can be used to generate a box covering of the corresponding connected
component of the Pareto set. To be more precise, we use a fixed subdivision depth
dand define the BL-Recovering-PS algorithm by taking the classical Recovering
algorithm and replacing the corrector step by a suitable method for solving ˜
F(˜z) = 0.
For z
R
Ndenote by Π(x,y)(z) the projection from the z-space
R
Nto the (x, y)-
space
R
n+m. Moreover, denote by B(z, d) the unique box B
R
n+mof subdivision
depth dwith Π(x,y)(z)B. For every box Bgenerated by the algorithm, let
zB
R
Nbe the particular solution of ˜
F(˜z) = 0 which led to B, that is, ˜
F(zB)=0
and Π(x,y)(zB)B.
With these notations, for a given (incomplete) box collection BjRn+man
iteration of the BL-Recovering-PS algorithm reads as follows:
(i) for all B Bj
B.active := TRUE
(ii) for i= 1, . . . , MaxStep
ˆ
Bj:= Bj
for all {B Bj:B.active == TRUE}
choose a set of starting points X
R
Nnear zB
Y:=
for all z X
starting from zsolve ˜
F= 0 and get the solution Z
Y:= Y Z
B.active := FALSE
for all z Y:
if B(z, d)6∈ ˆ
Bj
B(z, d).active := TRUE
ˆ
Bj:= ˆ
BjB(z, d)
if ˆ
Bj== BjSTOP
Bj+1 := ˆ
Bj
In each loop of step (ii) the aim is to extend the current box collection Bjby adding
further boxes. The algorithm terminates when no further boxes can be found or
when the maximum number of iterations MaxStep is exceeded.
Observe that in contrast to the BL-Subdivision algorithm, which uses boxes in
R
N, the BL-Recovering-PS algorithm works with boxes in the smaller space
R
n+m,
which is an advantage concerning the computational effort. Certainly, one could
62
implement a variant of the BL-Recovering-PS algorithm using boxes in
R
N. But
this would be inefficient, since for every solution (x?, y?) of the BLMOP there are
in general many solutions z
R
Nwith ˜
F(z) = 0 and Π(x,y)(z) = (x?, y?), such that
the algorithm would generate many boxes in
R
Nwhich all correspond to (x?, y?),
although one box would be sufficient. A main difference to the BL-Subdivision
algorithm is, that according to the local nature of the BL-Recovering-PS algorithm
all starting points zare chosen in the neighborhood of solutions zBfound so far.
Consequently, we expect that with these starting points the zero finding method
converges to further solutions of ˜
F(˜z) = 0. The situation is different for the BL-
Subdivision algorithm, because here we do not know whether the starting points
zare close enough to the solution of ˜
F(˜z) = 0. Therefore, the BL-Subdivision
algorithm has to work with boxes in
R
N.
Let us now make a remark on a variant of the BL-Recovering-PS algorithm. In
general, the applied iteration scheme might lead to substationary points which are
far away from the region to be explored in order to find further Pareto points. But
originally, the idea of our recovering methods was to find in every step further Pareto
points (or boxes) in the neighborhood of a certain Pareto point (or box) found in
the previous step. For some applications, particularly if a certain part of the Pareto
set is required, it could be desired to maintain this local behavior of the algorithm
even for starting points farther from the Pareto set. To this end, we propose the
following variant of the BL-Recovering-PS algorithm. Instead of just solving the
system ˜
F(˜z) = 0, we can search for a solution of this system with the additional
property that the distance to the starting point with respect to (x, y) is minimized,
that is, we have to solve
min
˜z||Π(x,y)(˜z)Π(x,y)(z0)||,(6.27)
s.t. ˜
F(˜z) = 0,
where z0denotes the respective starting point which now has the additional role of
a reference point. This variant was applied to the following
Example 6.6
min
x
R
3, y
R
F(x, y) = (x0+ 1)2+ (x11y)4+x2
2
(x01)2+ (x1+ 1 y)2+ (x20.5)4,
63
such that H1(x, y) = x2
0+x2y2= 0,
G1(x, y) = x0+x3
1+ 0.50,
and xsolves:
min
x
R
3f(x, y) =
(x01)2+ 0.5(x1+y)2+ (x21)4
(x0+ 1)2+ 0.5(x1+y)2+ (x2+ 1)2
x2
0+x2
1+ (x2+ 1)2
such that h1=x0x1y= 0.
Figure 20 and Figure 21 show the projections to the xspace of the sets computed by
the BL-Recovering-PS algorithm based on the solution of ˜
F(˜z) = 0 and ¯
F(¯z) = 0,
respectively. To be more precise, Figure 20 shows a representation of the entire
Pareto set according to the system ˜
F(˜z) = 0, and Figure 21 shows a representation
of the ’inner’ part of the Pareto set according to the system ¯
F(¯z) = 0. This ’inner’
part is characterized by the fact that the upper level inequality constraint is inactive,
that is, G1<0.
Choice of Starting Points There are different possibilities for generating X,
the set of starting points. Of course, randomly chosen starting points can be used.
Another approach is to take points zj=c+Piλiei, where cis the center of the
current box, the eiare basis vectors of the search space and, the λiare real num-
bers such that every Π(x,y)(zj) lies within one of the neighboring boxes where new
substationary points are supposed to be. If derivatives are available, the points zj
can be generated by computing a basis biof the tangent space to the set ˜
F1(0) at
a previously computed point z?˜
F1(0) and then setting zj=z?+Piλibi. In
this latter approach, in particular when the dimension of the search space is very
large, the number of points zjcan be chosen small compared to other approaches.
Moreover, the points zjare already located relatively close to the solution set ˜
F1(0)
and so the corrector step needs just a few iterations. For details on computing the
basis vectors bisee [25].
Convergence of Recovering-PS Algorithms
Since our algorithms generate substationary points of the given BLMOP by solving
the system of equations ˜
F(˜z) = 0, we give a more general convergence proof con-
cerning recovering techniques for the location of zeros of a vector valued function
F:
R
n
R
m,n > m. The proof is achieved in two steps: first, Theorem 6.7 gives
a guideline for the choice of a minimal number of starting points piwithin a given
64
Figure 20: The entire solution of Example 6.6 obtained by solving ˜
F(˜z) = 0 (pro-
jection to x-space).
Figure 21: The ’inner’ part of the solution of Example 6.6 obtained by solving
¯
F(¯z) = 0 (projection to x-space).
65
subset B
R
n, e.g., a box, such that, under certain assumptions on the iteration
scheme Φ :
R
n
R
n, convergence towards a solution of F(x) = 0 within Bis
guaranteed in the sense that
lim
s→∞Φs(pi) {x:F(x)=0}B
for at least one pi. Thereafter, this result is used in Corollary 6.9 in order to show
convergence for general recovering methods. For a norm ||·|| denote by dist(x, A) =
minyA||xy|| the distance between the point x
R
nand the compact set A
R
n.
Theorem 6.7 Let F(x) :
R
n
R
m, denote P:= {x
R
n:F(x)=0}and let
B
R
nbe an open set such that PB6=. Let Φ :
R
n
R
nbe a mapping and
let δ > 0, λ 1such that for all x0Bwith dist(x0, P)< δ we have
(i) lim
s→∞Φs(x0)P,
(ii) lim
s→∞||Φs(x0)x0|| λdist(x0, P).
Then there is d > 0such that for any set X Bof starting points with dist(x, X)<
dfor all xB{x: dist(x, P)< δ}there is a point p0 X, such that
lim
s→∞Φs(p0)PB.
Proof: Since Bis open and PB6=, there is ¯xP,ε > 0, and an open
neighborhood Uε(¯x) of ¯x, such that
PUε(¯x)B.
For 0 < θ < 1 let d:= θmin{δ, ε
2λ}. Then, for any set X Bof starting points
with dist(x, X)< d for all xB{x: dist(x, P)< δ}, there is p0 X, such that
dist(p0, P) ||p0¯x|| <min{δ, ε
2λ}
and therefore
x?:= lim
s→∞Φs(p0)P.
Moreover,
||x?p0|| = lim
s→∞||Φs(p0)p0|| λdist(p0, P)λ||p0¯x|| <ε
2
and consequently
||x?¯x|| ||x?p0||+||p0¯x|| <ε
2+ε
2=ε,
that is, x?Uε(¯x). Since Uε,(¯x)B, it follows that x?is a solution of F(x)=0
within B.
66
Remark 6.8 If the mapping Φ is given by an iteration step for the solution of
(6.27), the requirement on λin Theorem 6.7 is satisfied with the choice λ= 1.
To guarantee that a recovering method converges towards the union of those con-
nected components of the solution set which correspond to the initial box collection
B0, in every step of the algorithm the set of starting points pihas to be chosen
properly, such that all desired boxes are found, that is, boxes which are both neigh-
bors of the boxes generated in the respective previous step and contain a part of the
respective connected component of the solution. To this end, we denote by ¯
Bthe
closure of a box Band we state the following
Corollary 6.9 Using the notations of Theorem 6.7 and denoting by B0a box
collection of (fixed) subdivision depth dcovering a part of P, assume that every step
of the Recovering or BL-Recovering-PS algorithm, respectively, is realized in a way
such that for every B Bj\Bj1starting points are chosen according to Theorem
6.7 within all boxes C {C:¯
C¯
B6=, C / Bj}. Moreover, assume that Pis
bounded. Then, if the mapping defined by the iteration steps for solving F(x) = 0
fulfills the requirements on the mapping Φstated in Theorem 6.7, the algorithm
terminates after a finite number of steps such that the final box collection covers
those connected components of P, which correspond to at least one B B0.
BL-Recovering-IS Algorithm
Similarly as presented for classical multi-objective optimization problems in Section
4, image set-oriented variants of the recovering methods can also be defined for
BLMOPs with a convex lower level problem. To this end the search space has to
be changed and the distance minimization subproblems, which have to be solved
within every step of the algorithm, have to be replaced by zero finding problems. To
be more precise, the higher level objectives have to be replaced by the scalar valued
objective ||F(x, y)T||, and we have to solve the system ˜
F(˜z) = 0 for this altered
BLMOP, that is, according to Remark 6.2, ˜
F(˜z) simplifies to
67
˜
F(˜z;T) =
∇||F(x, y)T||+
n
X
i=1
λi∇Lxi(x, y, α, ζ) +
l
X
i=1
(νµi)en+m+i
+
p
X
i=1
ωihi(x, y) +
r
X
i=1
ρiHi(x, y) +
s
X
i=1
δiGi(x, y)
h(x, y)
H(x, y)
G(x, y) + ww
¯
∇L(x, y, α, ζ)
l
X
i=1
αi1
µα
δG(x, y)
αss
µuu
δvv
=0,
where Tis the fixed target corresponding to the individual subproblem, and ˜z
is understood to be reduced in the sense that there are no components βand t.
Therefore, in the following we can change notation by using small letters tor ti
for the targets. Observe that the number of variables agrees with the number of
equations, that is, a Newton method can be used for the solution of the system.
For a given box collection Bj
R
k(in image space) of subdivision depth dand
denoting by zBand FBthe previously generated solution (in parameter and image
space, respectively) associated with a box B Bj, a step of the BL-Recovering-IS
algorithm can be written as follows:
(i) for all B Bj
B.active := TRUE
(ii) for j= 1, . . . , MaxStep
ˆ
Bj:= Bj
for all B {B Bj:B.active == TRUE}
choose target vectors {ti}i=1,...,ntnear Bwith ti<pFB
find ˜z?
iwith ˜
F(˜z?
i;ti)=0, i = 1, . . . , nt
F?
i:= F(x,y)(˜z?
i)), i = 1, . . . , nt
B.active := FALSE
for all i= 1, . . . , nt:
68
if B(F?
i, d)6∈ ˆ
Bj
ˇ
B:= B(F?
i, d),˜zˇ
B:= ˜z?
i, F ˇ
B:= F?
i
ˇ
B.active := TRUE
ˆ
Bj:= ˆ
Bjˇ
B
if ˆ
Bj== BjSTOP
Bj+1 := ˆ
Bj
Since the set of higher level objectives is replaced by a scalar distance minimiza-
tion problem, another variant of this algorithm can be defined by using standard
methods for the solution of classical optimization problems. To be more precise, in
every step the expressions ||F(x, y)ti|| have to be minimized under the restriction
that only those components of ˜
Fvanish which guarantee that the optimality con-
ditions for the lower level problem hold and that the higher level constraints of the
BLMOP are fulfilled. With
ˆ
F(x, y, α, ζ, w, s) :=
h(x, y)
H(x, y)
G(x, y) + ww
¯
∇L(x, y, α, ζ)
l
X
i=1
αi1
αss
= 0
and denoting ˆz:= (x, y, α, ζ, w, s) and S:= {ˆz:ˆ
F(ˆz)=0}, the variant of the
BL-Recovering-IS algorithm can be defined by replacing the parameter space by
the ˆz-space and replacing the solution of ˜
F(˜z;ti) = 0 by the distance minimization
problem
min
ˆzS||F(x,y)(ˆz)) ti||,
where ti, i = 1, . . . , ntdenote the targets which have to be chosen individually in
every cycle of the algorithm.
The efficiency of the BL-Recovering-IS algorithm will be demonstrated in a more
general context by the examples in Section 6.5, where the algorithm is extended by a
smoothing technique in order to solve BLMOPs which include lower level inequality
constraints. Moreover, we refer to the example in Section 7.3, where the algorithm
is combined with a sensitivity analysis based technique for the adaptive choice of
targets.
69
Convergence of Recovering-IS Algorithms
Since the described BL-Recovering-IS algorithm is realized by minimizing a refor-
mulation of the BLMOP, which can be understood as a constrained MOP, in the
following we prove convergence for the more general class of image set-oriented
recovering algorithms, which includes both the Recovering-IS algorithm and the
BL-Recovering-IS algorithm. The proof is carried out in two steps: first, Theorem
6.10 states that for every subset B
R
kcontaining a part of the Pareto optimal
solution in image space, there is a minimal set of targets, such that for at least one of
these targets the corresponding distance minimization subproblem leads to a Pareto
point x?with F(x?)B. Then, this result is used in Corollary 6.11 to complete
the proof.
Theorem 6.10 Let F:
R
n
R
k, S
R
nand denote by Pthe Pareto set of the
constrained MOP:
min
xSF(x).
Assume that the norm ||·|| is strictly monotonically increasing. Let B
R
kbe an
open subset such that BF(P)6=. Then there is d > 0, such that for any set
X Bof targets with dist(y, X)< d for all yBF(P)there exists a target
t X with F(x?)F(P)B, where x?:= arg min
xS||F(x)t||.
Proof: There are ¯yF(P) and ε > 0, such that Uε(¯y)B. Let d:= ε
8kand
c:= ¯y2d
k
X
i=1
ei, where eidenotes the i-th standard basis vector in
R
k. Then, for
every yUd(c), we have
||y¯y|| ||yc||+||c¯y|| d+ 2dk=ε
8k+ε
4<ε
2,
that is, Ud(c)Uε(¯y). Consequently, there is a target t=c+v X,||v|| d, such
that
min
xSt||F(x)t|| ||t¯y|| <ε
2
and
ti=ci+vi= ¯yi2d+vi<¯yifor i = 1, . . . , k.
With x?= arg min
xSt||F(x)t||, it follows that
||F(x?)¯y|| ||F(x?)t||+||t¯y|| <ε
2+ε
2=ε
70
and therefore
F(x?)Uε(¯y)B.
Now we have to show that F(x?) is not dominated by any ˆyF(P)St. For
F(x?) = ˆythis nondominance is obvious. For the case F(x?)6= ˆywe have to show
that Fi(x?)<ˆyifor at least one i= 1, . . . , k. To see this, assume that the opposite
is true. Then Fi(x?)ˆyi> tifor all i= 1, . . . , k, where, since F(x?)6= ˆy, strict
inequality holds for at least one i {1, . . . , k}. Consequently, since ||·|| is strictly
monotonically increasing,
||F(x?)t|| >||ˆyt||,
which is a contradiction to ||F(x?)t|| = min
xSt||F(x)t||. Finally, since F(x?) is
not dominated by any ˆyF(P)St, we have F(x?)F(P), which completes the
proof.
To guarantee that an image set-oriented recovering method converges towards
the union of those connected components of F(P) which correspond to the initial
box collection B0, in every step of the algorithm the set of targets tihas to be
chosen properly, such that all desired boxes are found, that is, boxes which are
both neighbors of the boxes generated in the respective previous step and contain a
part of the respective connected component of F(P). To this end, we recall that ¯
B
denotes the closure of a box Band we state the following
Corollary 6.11 Using the notations of Theorem 6.10 and denoting by B0a box
collection of (fixed) subdivision depth dcovering a part of F(P), assume that every
step of the Recovering-IS or BL-Recovering-IS algorithm, respectively, is realized in
a way such that for every B Bj\Bj1targets are chosen according to Theorem
6.10 within all boxes C {C:¯
C¯
B6=, C / Bj}. Moreover, assume that F(P)is
bounded. Then, the algorithm terminates after a finite number of steps such that the
final box collection covers those connected components of F(P), which correspond to
at least one B B0.
6.3 Pareto Set Constrained Multi-Objective Optimization
Problems
In this section we introduce particular BLMOPs, which are termed Pareto set con-
strained multi-objective optimization problems PSCMOPs.
71
From Theorem 6.1 we will derive corollaries, which are concerned with necessary
optimality conditions for PSCMOP. Then we will develop algorithms for the solution
of this class of problems. In particular, we will present derivative-free algorithms,
which turned out to work very satisfactorily in combination.
A PSCMOP can be understood as a variant of BLMOP, where the hierarchical
structure is relaxed in the sense that in contrast to classical bi-level structures
the lower level problem is not a parametrized one. Moreover, there are no explicitly
given higher or lower level constraints. To be more precise, consider the vector
valued functions F:
R
n×
R
m
R
kand f:
R
n
R
l. Then a (PSCMOP) can be
stated as follows:
min
x
R
n,y
R
mF(x, y),(PSCMOP)
where xsolves: min
x
R
nf(x),
where minimization again has to be understood in the sense of the partial order
p. Based on Theorem 6.1 the following necessary conditions for Pareto optimality
for a PSCMOP can be stated. For this let L:
R
n×
R
l
R
, defined by
L(x, α) :=
l
X
i=1
αifi(x)
denote the lower level Lagrangian. Let I={i:αi= 0} {1, . . . , l}and denote by
eithe ith vector of the standard basis in
R
n+m+l.Furthermore, for i= 1, . . . , n,
denote by
Lxi:=
xiL
the derivative of Lwith respect to xiand let ¯
:= (x)and := (x,y).
Corollary 6.12 Let (x, y)be a Pareto point of PSCMOP.
Then there exist α1, . . . , αl0such that
(i)¯
∇L(x, α) =
l
X
i=1
αi¯
fi(x) = 0,(6.29)
(ii)
l
X
i=1
αi= 1.(6.30)
If in addition all solutions of (i) and (ii) comply with Pareto points of the lower level
problem and if the Hessian ¯
2L(x, α)of Lis regular at (x, α), then there exist
72
β1, . . . , βk, µ1, . . . , µl0,ν, λ1, . . . , λnRsuch that
(iii)
k
X
i=1
βiFi(x, y) +
n
X
i=1
λi∇Lxi(x, α) (6.31)
+
l
X
i=1
(νµi)en+m+i= 0,
(iv)
k
X
i=1
βi= 1,(6.32)
(v)µi= 0 for i /I. (6.33)
Proof: Let (x, y) be a solution of PSCMOP. Then xis a solution to the lower
level problem (minimization of f) and the Kuhn-Tucker condition (2.2) holds: there
are α1, . . . , αl
R
such that
¯
∇L(x, α) =
l
X
i=1
αi¯
fi(x) = 0,(6.34)
l
X
i=1
αi= 1 (6.35)
and
αi0, i = 1, . . . , l. (6.36)
Since the set of solutions of (6.34), (6.35) and (6.36) is exactly the feasible region
for the higher level problem, we can also apply the Kuhn-Tucker conditions (2.2)
to the upper level problem while regarding (6.34), (6.35) and (6.36) as constraints.
For this we have to ensure that the gradients with respect to (x, y, α) of the active
constraints, that is
∇Lxi(x, α), i = 1, . . . , n, (6.37)
l
X
i=1
αi=
l
X
i=1
en+m+i,(6.38)
(αi) = en+m+i, i I, (6.39)
are linearly independent or equivalently, that the matrix
¯
2L(x, α)0¯
f1(x). . . ¯
fl(x)
0 0 ˜
I
0 0 1. . . 1
Rn+|I|+1×n+m+l(6.40)
73
has maximal rank, where the rows of ˜
Iare given by the transposed of distinct
standard basis vectors of Rlaccording to (6.39). Since (6.40) has a certain triangular
block structure and ¯
2L(x, α) is regular, it remains to show that the submatrix
˜
I
1. . . 1R|I|+1×l(6.41)
has maximal rank. Indeed, with (6.35) and (6.36) we have |I|< l, that is, (6.41)
has at most lrows. Obviously, these rows are linearly independent and thus both
(6.41) and also (6.40) have maximal rank. Consequently, (6.37), (6.38) and (6.39)
are linearly independent and we can write down the Kuhn-Tucker conditions (2.2)
for the (constrained) upper level problem: there are β1, . . . , βk, µ1, . . . , µl0,
ν, λ1, . . . , λnR, such that
k
X
i=1
βiFi(x, y) +
n
X
i=1
λi∇Lxi(x, α) + ν
l
X
i=1 αi+
l
X
i=1
µi(αi)=0,(6.42)
k
X
i=1
βi= 1 (6.43)
and
µi= 0 for i /I. (6.44)
Observing that (αi) = en+m+ifor i= 1, . . . , l, we obtain from (6.42)
k
X
i=1
βiFi(x, y) +
n
X
i=1
λi∇Lxi(x, α) +
l
X
i=1
(νµi)en+m+i= 0 (6.45)
Since for positive coefficients a linear combination of symmetric positive definite
matrices is again symmetric positive definite, we can conclude the following corollary,
which will be relevant for some of our numerical examples later on.
Corollary 6.13 Let (x, y)be a Pareto point of PSCMOP.
Then there exist α1, . . . , αl0, such that
(i)
l
X
i=1
αi¯
fi(x) = 0,(6.46)
(ii)
l
X
i=1
αi= 1.(6.47)
(6.48)
74
If in addition all solutions of (i) and (ii) comply with Pareto points of the lower level
problem and if the Hessian ¯
2fi(x)of fiis positive definite at xfor i= 1, . . . , l,
then there exist β1, . . . , βk, µ1, . . . , µl0,ν, λ1, . . . , λnR, such that
(iii)
k
X
i=1
βiFi(x, y) +
n
X
i=1
λi∇Lxi(x, α) (6.49)
+
l
X
i=1
(νµi)en+m+i= 0,
(iv)
k
X
i=1
βi= 1,(6.50)
(v)µi= 0 for i /I. (6.51)
Proof: The Hessians ¯
2fi(x), i= 1, . . . , l, are symmetric positive definite
and consequently, with
l
X
i=1
αi= 1 and αi0 for i= 1, . . . , l, also the Hessian
¯
2L(x, α) =
l
X
i=1
αi¯
2fi(x) of L(x, α) is symmetric positive definite at (x, α).
Since a symmetric positive definite matrix is regular we can use Corollary 6.12 to
complete the proof.
Remark 6.14 (i) If xis an inner point of the solution set of the lower level
problem, that is, I=, then (6.31) and (6.49) simplify to
k
X
i=1
βifi(x, y) +
n
X
i=1
λi∇Lxi(x, α) +
l
X
i=1
ν en+m+i= 0 (6.52)
(ii) Since the Hessian ¯
2fof a convex function f:
R
n
R
is positive definite,
Corollary 6.13 is in particular relevant for PSCMOPs with convex lower level
problems.
(iii) If all lower and higher level objectives and the solution set of the lower level
problem are convex, then all points satisfying the equations stated in Corollary
6.12 and Corollary 6.13 are Pareto points of the given PSCMOP.
6.4 Methods for Solving PSCMOPs
In this section we present algorithms for the solution of PSCMOP. The first two algo-
rithms PSC-Subdivision and PSC-Recovering, which are based on the results stated
75
in Corollary 6.12 and Corollary 6.13, are special variants of the more general algo-
rithms BL-Subdivision and BL-Recovering, respectively, for the solution of BLMOP.
Since the algorithms BL-Subdivision and BL-Recovering are described in detail in
the previous sections, we will restrict ourselves to the consideration of the essen-
tial item which distinguishes the PSC-variants from the BL-variants, that is, the
simplification of the system of equations to be solved in order to find substationary
points.
Thereafter we present the methods PSC-Sampling and PSC-SamRec, which are
based solely on the computation and comparison of objective values and there-
fore give the opportunity to solve PSCMOPs with objectives that do not meet any
smoothness or convexity assumptions. Moreover, we propose some guidelines on how
to combine these algorithms in order to increase the performance of the respective
numerical schemes.
PSC-Subdivision and PSC-Recovering Algorithm
As mentioned above, the PSC-variants of the Subdivision and Recovering algo-
rithm, respectively, are very similar to the corresponding BL-variants. The essential
difference is the system of equations to be solved in order to find Pareto candidates.
With ˜z:= (x, y, α, β, µ, ν, λ, s, t, u), the system simplifies to
˜
F(˜z) =
k
X
i=1
βiFi(x, y) +
n
X
i=1
λi∇Lxi(x, y, α) +
l
X
i=1
(νµi)en+m+i
¯
∇L(x, y, α)
k
X
i=1
βi1
l
X
i=1
αi1
µα
αss
βtt
µuu
=0.
Apart from this simplification of the system of equations, the PSC-Subdivision and
PSC-Recovering algorithms are identical to the BL-Subdivision and BL-Recovering
algorithm, respectively. But there is one more important aspect to be mentioned.
Since the lower level problem of a PSCMOP is not parametrized by the variable
y
R
m, the PSC-algorithms can be combined with archiving strategies based on
76
the two-step nondominance test described below. In this way, it is possible to
approximate the Pareto set of the PSCMOP even if substationary points of the lower
level problem are not necessarily Pareto optimal for the lower level problem. The
two-step nondominance test might theoretically be extended to handle the general
BLMOP. But for this, a two-step nondominance test associated with at least every
yof a representative discretization of the y-Space has to be performed, such that
in practice the computational effort for such an extension would be unjustifiable.
PSC-SamRec Algorithm
In many applications the computation of derivatives of the given objectives is a
considerable problem. In the majority of cases symbolic derivatives are not available
at all and the numerical computation of derivatives is very expensive. Particularly,
there is no hope to get derivatives of higher order in justifiable time. This motivates
the construction of derivative-free algorithms. In the following we describe the
PSC-SamRec algorithm which is similar to the preceding PSC-Recovering algorithm,
but does not use derivatives. Moreover, at every considered point (x, y) the higher
level function Fmust only be evaluated if xis nondominated with respect to the
lower level function fby any other point accepted so far. The predictor-corrector
concept for finding new nondominated points in the neighborhood of a given box is
interchanged by just expanding the box to a larger test box, choosing test points in it
and then performing a local nondominance test in two steps: first, all points within
the test box are checked for nondominance with respect to the lower level function f.
Only those points which are in fact f-nondominated are taken into account for the
nondominance test concerning the upper level function F. All points ”surviving”
both steps of the nondominance test build up the basis for the decision whether a
box is added to the current box collection or not. Now for a fixed subdivision depth
dand a given box collection BjRn+man iteration of the PSC-Sam-Rec algorithm
can be stated as follows:
(i) for all B Bj
B.active := TRUE
(ii) for i= 1, . . . , MaxStep
ˆ
Bj:= Bj
for all B {B Bj:B.active == TRUE}
77
create atest box ¯
Bsuch that
B¯
B
and
diam( ¯
B) = θjdiam(B),
where θjθmin >1.
choose a set of test points X ¯
B
Nf:= f-nondominated points of X
NF:= F-nondominated points of Nf
B.active := FALSE
for all zNF:
if B(z, d)6∈ ˆ
Bj
B(z, d).active := TRUE
ˆ
Bj:= ˆ
BjB(z, d)
if ˆ
Bj== BjSTOP
Bj+1 := ˆ
Bj
Unlike the PSC-Recovering algorithm, in the PSC-SamRec algorithm there is no
corrector step. As a result, the probability is small that a randomly chosen test
point is a nondominated point. This fact has to be taken into account by considering
a sufficiently large number of test points within every test box.
As well as the PSC-Recovering algorithm, the PSC-SamRec algorithm is of local
nature, that is, nondominance tests are performed only with respect to a local
neighborhood of the current box.
PSC-Sampling Algorithm
The following PSC-Sampling algorithm is an algorithm of subdivision type and
therefore it is tailored to generate a box collection that covers the global Pareto set
and not only a locally nondominated set. Like in all algorithms of subdivision type,
at the beginning of an iteration each box of the current collection is subdivided to
generate a finer collection. Thereafter a selection step is employed. For this, test
points within each box are chosen, and similar as in the PSC-SamRec algorithm,
a nondominance test is performed in two steps. The crucial difference is that test
points are not only compared to test points within a local region, but to all test
points within the whole box collection. This makes the PSC-Sampling algorithm a
global approach. Given a box collection Bj1Rn+mthe collection BjRn+mis
obtained by:
78
(i) Subdivision
Construct from Bj1a new system ˆ
Bjof subsets such that
[
Bˆ
Bj
B=[
B∈Bj1
B
and
diam( ˆ
Bj) = θjdiam(Bj1),
where 0 < θmin θjθmax <1.
(ii) Selection
for all Bˆ
Bj
choose a set of test points XBB
Nf:= f-nondominated points of S
Bˆ
BjXB
NF:= F-nondominated points of Nf
Bj:= nBˆ
Bj:XBNF6=o
Combination of the PSC-Sampling and PSC-SamRec Algorithms
All the proposed approaches are self-contained algorithms, but in many applications
one may obtain even better results when combining them. Moreover, a suitable
combination can reduce the computational effort.
The PSC-Sampling algorithm finds global Pareto points due to the fact that
it works with comparisons within the entire image spaces of both the upper and
lower level of the PSCMOP. However, there always remains some uncertainty, in
particular when the boxes are big and/or the dimension of the PSCMOP is large.
To be more precise, boxes containing global Pareto points may get lost because
the selection is performed by considering finitely many test points within each box.
Nevertheless in practice it turned out that this algorithm works satisfactorily. The
PSC-SamRec algorithm, which is local in nature, successfully extends the computed
box covering of the set of substationary points. To obtain an even better performance
i.e. to compute a robust approximation of the global Pareto set and to use a
moderate amount of function calls we propose the following combination of the
PSC-Recovering and the PSC-Sampling algorithm.
step 0 Apply n0iterations of the PSC-Sampling algorithm using p0test points per
box.
79
step 1 Apply n1iterations of the PSC-SamRec algorithm to the current box collection
with p1test points per box. This fills the gaps which have possibly been
generated in previous steps.
step 2 Use n2iterations of the PSC-Sampling algorithm using p2test points per box
for further refinement of the box collection. In this way boxes which only
contain local Pareto points that are not Pareto-optimal from a global point of
view can be removed from the covering.
step 3 Carry out the PSC-SamRec algorithm using p3test points per box until no
more missing boxes are added.
Step 1 and step 2 are repeated until the desired box size is obtained.
With a proper choice of the parameters niand pi, this combination works much
better and faster than using just one of the algorithms on its own. Of course, the
choices for the niand pidepend on the concrete problem, but experience gives reason
to some general rules as follows: p1and p2can be chosen small compared to p0and
p3. The nishould be chosen based on how many ”good” boxes get lost during the
application of the PSC-Sampling algorithm. The number of boxes which are added
to the collection during step 1 is a measure for the number of test points p2needed
in step 2.
Example 6.15 Given a higher level function F= (F1, F2)t:R2×RR2and a
lower level function f= (f1, f2)t:R2R2, we consider the following example:
min
x
R
2, y
R
F(x, y) = (x11)4+ (x21)2+ (y1)2
(x1+ 1)2+ (x2+ 1)4+ (y+ 1)2,(6.53)
such that xsolves:
min
x
R
2f(x) = (x12)2+ (x22)4
(x1+ 0.5)2+ (x2+ 1)2.(6.54)
Observe that the Hessians ¯
2f1(x) and ¯
2f2(x) of the lower level objectives f1and
f2with respect to xare given by
¯
2f1(x) = 2 0
0 12 (x22)2and ¯
2f2(x) = 2 0
0 2 (6.55)
for all x
R
2.¯
2f2(x) is obviously constant positive definite. ¯
2f1(x) is positive
definite unless x2= 2, but in the latter case one can easily show that for all x1, y R
80
the point (x1, x2, y) cannot be a solution of the given PSCMOP. In other words, the
assumptions of Corollary 6.13 are satisfied. Moreover, all objectives of both the
lower and the higher level problem are convex. Thus it is a good choice to use the
PSC-Subdivision or the PSC-Recovering algorithm for the solution of this problem.
To state the system of equations which has to be solved in every corrector step of
the PSC-Recovering algorithm observe that the optimality conditions for the lower
level system (6.54) are as follows:
There exist non-negative multipliers α1, α2Rsuch that
2
X
i=1
αi= 1 (6.56)
and
2
X
i=1
αi¯
fi(x) = 0.(6.57)
From this we formulate the following constraints for the higher level problem (6.53):
H(x, α) = 2α1(x12) + 2α2(x1+ 0.5)
4α1(x22)3+ 2α2(x2+ 1) = 0,(6.58)
α1+α21 = 0,(6.59)
α1> ε, (6.60)
α2> ε. (6.61)
As motivated in Section 6.2, εavoids that xis on the boundary of the solution set of
the lower level problem. Thus, according to Remark 6.14, the optimality conditions
for the higher level problem under the constraints defined above are:
There exist multipliers β1, β2R+and λ0, λ1, λ2Rsuch that
2
X
i=1
βi= 1 (6.62)
and
2
X
i=1
βiFi(x, y) +
2
X
i=1
λiHi(x, α) + λ0
2
X
i=1 αi= 0,(6.63)
where acts on (x, y, α). Calculating the gradients of Fand Hand substituting
them into (6.63), together with the equations above, we get the system which has
to be solved in every corrector step:
81
˜
F=
4β1(x11)3+ 2β2(x1+ 1) + 2λ1(α1+α2)
2β1(x21) + 4β2(x2+ 1)3+λ2(12α1(x22)2+ 2α2)
2β1(y1) + 2β2(y+ 1)
λ1(2x14) + 4λ2(x22)3+λ0
λ1(2x1+ 1) + 2λ2(x2+ 1) + λ0
2α1(x12) + 2α2(x1+ 0.5)
4α1(x22)3+ 2α2(x2+ 1)
α1s2
1ε
α2s2
2ε
α1+α21
β1t2
1
β2t2
2
β1+β21
= 0.(6.64)
The result of the PSC-Recovering algorithm was used to create Figures 22 and 23
as follows. First the algorithm was applied to obtain a tight box covering of the
solution. Then test points in parameter space were generated within each box and
similar as in the PSC-Sampling and PSC-SamRec algorithms a nondominance
test was performed in two steps to select representatives of the solution set. In
doing so both the nondominated points in parameter space and the corresponding
objective values were archived. Thereafter the Recovering algorithm for solving
classical MOPs as described in Section 3.1 was applied both to the lower level
problem and to the unconstrained higher level problem. Again, nondominance tests
were performed on sets of test points, such that we obtained representatives of the
corresponding solutions both in parameter and objective space. The representatives
of the lower level problem in parameter space were expanded, in a certain sense, to
the three-dimensional higher level parameter space. They were used in Figure 22
to depict the constraint surface which defines the feasible set for the higher level
problem of the PSCMOP. The representatives of the solution to the PSCMOP in
parameter space were plotted in the same figure and consequently one can see that
this solution is embedded in the constraint surface, as expected. In Figure 23 the
representatives in higher level image space of both the unconstrained higher level
problem and the PSCMOP were depicted for the purpose of comparison. Obviously,
the solution to PSCMOP (which is a constrained problem) can not be better than
the solution to the unconstrained higher level problem and is thus located above the
latter one.
For alternatively solving the problem with the PSC-Sampling and the PSC-
SamRec algorithms we used a combination as follows:
82
for ifrom 1to 3do
6iterations PSC-Sampling with 300 test points/box
1iteration PSC-SamRec with 500 test points/box
end
3iterations PSC-Sampling with 500 test points/box
The result of this combined algorithm is shown in Figure 24. It should be men-
tioned that, to obtain a result like this by using the pure PSC-Sampling algorithm,
5000 test points/box were needed. Consequently the computational time for the
combined algorithm was about 10 times faster than that for the pure PSC-Sampling
algorithm.
Figure 22: The solution lies within the constraint surface defined by the lower level
problem. (the constraint surface was computed separately just for the purpose of
visualization, but need not be explicitly computed by our algorithms).
83
Figure 23: Comparison of the solution of the PSCMOP to the solution of the (un-
constrained) higher level problem (higher level objective space).
Figure 24: Box covering computed by a combination of the PSC-Sampling and
PSC-SamRec algorithms.
84
Example 6.16 Let us now consider the following example with a higher level func-
tion F= (F1, F2, F3)t:R3R3and a lower level function f= (f1, f2, f3)t:R3
R3:
min
x
R
3F(x) =
(x12)4+ (x22)2+ (x32)2
(x1+ 1)2+ (x2+ 1)2+ (x3+ 1)4
(x11)2+ (x21)2+ (x3+ 1)4
,(6.65)
such that xsolves:
min
x
R
3f(x) =
(x12)2+ (x22)4+ (x32)2
(x1+ 0.5)2+ (x2+ 1)4+ (x32)2
(x1+ 0.5)2+ (x21)2+ (x3+ 2)4
.(6.66)
Here, Fdoes not depend on y, that is, F(x, y) = F(x). Again, all objectives are
convex. The solution to this problem was computed using the PSC-Recovering
algorithm. The results represented in different spaces are shown in Figures 25 to 28.
For a demonstrative representation also the solutions in lower and higher level image
space, respectively, are depicted as box collections (Figures 27 and 28). Figure 26
does not only show the solution to the PSCMOP but also in a different color the
solution to the lower level problem. The latter was computed using the Recovering
algorithm designed for solving classical MOPs as described in Section 3.1. One can
observe that in parameter space the solution to the PSCMOP complies with a
part of the solution to the lower level Problem.
85
Figure 25: Box covering in parameter space computed by the PSC-Recovering algo-
rithm.
86
Figure 26: The solution to the PSCMOP computed by the PSC-Recovering algo-
rithm is also a part of the solution to the lower level problem computed by the
Recovering algorithm.
Figure 27: The solution in lower level image space as computed by the
PSC-Recovering algorithm.
87
Figure 28: The solution in higher level image space as computed by the
PSC-Recovering algorithm.
Example 6.17 Application to Electromagnetic Shielding
In the following we deal with the optimization of a new multi-layer shielding ma-
terial with N= 3 layers, which is designed to protect an electronic device against
electromagnetic radiation. A similar problem was considered in [40], from where we
have taken the mathematical model which describes the relevant physical properties
of the material. For a deeper insight to this interesting field of applications the
reader is also referred to [40]. We consider a scenario where radiation of frequency
f1= 50 ·106Hz could cause light temporal disturbance, whereas radiation of fre-
quency f2= 3.35·107Hz would end up in heavy destruction of the electronic device.
Consequently, protection against f2is much more important than protection against
f1. In addition, due to production tolerances, the obtained degrees of shielding
against f1and f2, respectively, are required to be robust against small variations
of the layers’ thickness. Thus, the shielding material has to be designed such that
both protection against f2and the corresponding robustness are optimized in a first
instance, and the same objectives related to f1have to be optimized in a second
instance.
For our example we have chosen a 3-layered material, where the middle layer
88
consists of a particular polymer and the two outer layers consist of polyaniline
polyurethane (PAni/PU). We denote by Zi, σi, i, and di(i= 1,...,3) the impedance,
the conductivity, the permittivity, and the thickness of the i-th layer, respectively.
Due to the manufacturers possibilities both thickness and conductivity of the two
outer layers serve as design variables that is, x= (d1, d3, σ1, σ3)t.
For every frequency fthe characteristic matrix Mf
iC2×2of the i-th layer is
given by
Mf
i= cos(Af
i)jZf
isin(Af
i)
j
Zf
i
sin(Af
i) cos(Af
i)!,
where
Af
i=ωdisµ00ijσi
ω0and Zf
i=sµ0
0ijσi
ω0,
where ω= 2πf and jdenotes the imaginary unit. µ0= 4π·107and 0=
8.854 ·1012 are the common physical constants. Due to their contact to air media,
the impedances of the outer layers are set to Z0=ZN+1 = 377. The characteristic
matrix for the entire 3-layered compound is given by
Mf=Mf
1Mf
2Mf
3=Mf
11 Mf
12
Mf
21 Mf
22 .
With this notation the transmission coefficient is given by
Tf=
2Mf
22(Mf
11Z0Mf
12) + Mf
12(Mf
22 Mf
21Z0)
(Mf
11Z0Mf
12) + Z0(Mf
22 Mf
21Z0)
and the electromagnetic shielding against radiation of frequency fcan be ex-
pressed by
sf= 20 log(|Tf|).
Furthermore, we express the shielding robustness against variations of the layers’
thickness by
rf= (∆d1sf)2+ (∆d2sf)2,
where d1sfand d2sfdenote finite differences of sfwith respect to d1and d2,
89
respectively. Finally we are in the position to state the PSCMOP we have to solve:
min
d1,d313sf1(d1, d3, σ1, σ3)
rf1(d1, d3, σ1, σ3)(6.67)
such that (d1, d3, σ1, σ3) solves:
min
d1,d313sf2(d1, d3, σ1, σ3)
rf2(d1, d3, σ1, σ3)(6.68)
Observe that, caused by the trigonometric functions involved in both the higher
and lower level objectives, this problem is not convex at all and therefore there might
be local solutions which are not necessarily globally Pareto optimal. Moreover, for
both the higher level and lower level problem, respectively, points satisfying the
Kuhn-Tucker conditions are not necessarily Pareto optimal even from a local point
of view. Thus, it was advantageous to use the PSC-Sampling algorithm to solve this
problem. After generating the box collections with the PSC-Sampling algorithm, we
have generated test points within each generated box. Thereafter, the nondominated
points have been selected by a two-step nondominance test. The solution in higher
and lower level objective space, respectively, are depicted in Figure 29. Two different
projections of the Pareto set in parameter space to representative 3-dimensional
subspaces are shown in Figure 30.
Figure 29: The solution of the shielding problem in higher level (left) and lower level
(right) objective space.
90
Figure 30: The solution of the shielding problem in parameter space (projections to
the (d1, d3, σ1)-subspace (left) and (σ3, d3, σ1)-subspace (right), respectively).
6.5 Methods for Solving BLMOPs with Lower Level In-
equality Constraints
As stated in Section 6.1, Theorem 6.1 and consequently also the algorithms based
on it are restricted to problems without lower level inequality constraints, because
the constraint qualifications which allow to apply the Kuhn-Tucker conditions to the
higher level problem can not be satisfied if lower level inequality constraints are in-
volved, see Theorem 6.3. Nevertheless, if there are lower level inequality constraints,
we can benefit from Theorem 6.1 when the BLMOP is adequately approximated. To
be more precise, for the reasons stated in Section 5.1, instead of solving the given
BLMOP directly, we solve successively a sequence of related auxiliary problems.
Here, the lower level inequality constraints gi(x, y), the requirement on the related
multipliers τito be non-negative, and the corresponding complementarity relations
τigi(x, y) = 0, are replaced by equality constraints constructed by the use of the
perturbed Fischer-Burmeister function Φ described in Section 5.1. The resulting
equations have the form
Φi(τi,gi(x, y), ε) = τigi(x, y)qτ2
i+gi(x, y)2+ε, i = 1, . . . , q. (6.69)
Now the series of auxiliary problems has to be defined such that the smoothing
parameter εis positive at the beginning, decreases with every auxiliary problem
and finally converges towards 0. Recall that Φi(τi,gi(x, y), ε) is smooth for every
ε > 0 and that Φi(τi,gi(x, y),0) = 0 if and only if gi(x, y)0, τi0 and
τigi(x, y) = 0. Consequently, the auxiliary problem associated with ε= 0 complies
with the original problem, that is, the given BLMOP is approximated by a sequence
91
of smooth problems without lower level inequality constraints which can be solved
based on Theorem 6.1. This strategy can be used to modify the algorithms of
Section 6.2 such that they are capable of solving BLMOP with lower level inequality
constraints.
BL2-Recovering-IS algorithm
As an example, in the following we describe the BL2-Recovering-IS algorithm, which
is a modified version of the BL-Recovering-IS algorithm. For this, we adopt the
notations of Section 6.2, expand ˜zby (ξ, τ)
R
2q, and denote by ˜
F(˜z;t, ε) the
variant of ˜
F(˜z;t), which includes the expressions (6.69) associated with the lower
level inequality constraints gi, i = 1, . . . , q, that is, with
Φ(x, y, τ, ε) := 1(τ1,g1(x, y), ε),...,Φq(τq,gq(x, y), ε))t
and
L(x, y, α, ζ, τ) :=
l
X
i=1
αifi(x, y) +
p
X
i=1
ζihi(x, y) +
q
X
i=1
τigi(x, y),
we obtain
˜
F(˜z;t, ε) =
∇||F(x, y)t||+
n
X
i=1
λi∇Lxi(x, y, α, ζ, τ) +
l
X
i=1
(νµi)en+m+i
+
p
X
i=1
ωihi(x, y) +
q
X
i=1
ξiΦi(x, y, τ, ε) +
r
X
i=1
ρiHi(x, y)
+
s
X
i=1
δiGi(x, y)
h(x, y)
Φ(x, y, τ, ε)
H(x, y)
G(x, y) + ww
¯
∇L(x, y, α, ζ, τ)
l
X
i=1
αi1
µα
δG(x, y)
αss
µuu
δvv
=0.
92
Now, for a fixed subdivision depth d, with ε0>0 and the notations above, an
iteration of the BL2-Recovering-IS algorithm, which can also handle lower level
inequality constraints, can be written as follows:
(i) for all B Bj
B.active := TRUE
(ii) for j= 1, . . . , MaxStep
ˆ
Bj:= Bj
for all B {B Bj:B.active == TRUE}
choose target vectors {ti}i=1,...,ntnear Bwith ti<pFB
for all i= 1, . . . , nt
ε:= ε0
˜z:= ˜zB
while termination criterion is not satisfied do
ε:= c ε for some c(0,1)
˜z0:= ˜z
starting from ˜z0find ˜zwith ˜
F(˜z;ti, ε) = 0
˜z?
i:= ˜z
F?
i:= F(x,y)(˜z?
i)), i = 1, . . . , nt
B.active := FALSE
for all i= 1, . . . , nt:
if B(F?
i, d)6∈ ˆ
Bj
ˇ
B:= B(F?
i, d),˜zˇ
B:= ˜z?
i, F ˇ
B:= F?
i
ˇ
B.active := TRUE
ˆ
Bj:= ˆ
Bjˇ
B
if ˆ
Bj== BjSTOP
Bj+1 := ˆ
Bj
There are several possibilities for the definition of a suitable termination criterion.
For example, one can use the simple criterion, that the variation of two successive
vectors of higher and lower level objectives has to be sufficiently small. For this, let
˜zεbe the solution of ˜
F(˜z;ti, ε) = 0 and (xε, yε) := Π(x,y)(˜zε) and denote
Fε:= (F(xε, yε), f(xε, yε))t
R
k+l.
Then the termination criterion is given by the requirement
||FεF||2< εstop
93
for some small εstop >0. In some applications, such a termination criterion might be
satisfactory. But since Φi(τi,gi(x, y), ε) = 0 is equivalent to the original constraints
gi(x, y)0, τi0 and τigi(x, y) = 0 only for the case ε= 0, it is worth to include
these original constraints into the termination criterion as follows:
||FεF||2+||τεg(xε, yε)||2
+||max{0, g(xε, yε)}||2+||max{0,τε}||2< εstop.
Here, max is understood to act component-wise and τε:= Πτ(˜zε).
Example 6.18 We consider now an example which was taken from [16]. In this
example there are explicitly defined inequality constraints Gi:R2×RR, i = 1,2
and gi:R2×RR, i = 1,...,4 for the higher and lower level problem, respectively.
With these constraints, a higher level function F= (F1, F2)t:R2×RR2and, a
lower level function f= (f1, f2)t:R2×RR2, the BLMOP is given as follows:
min
x
R
2,y
R
F(x, y) = x1+x2
2+y+ sin2(x1+y)
cos(x2)·(0.1 + y)·exp(x1
0.1+x2)!,(6.70)
such that
G1(x, y) = y0,
G2(x, y) = y10,
and xsolves:
min
x
R
2f(x, y) = (x12)2+(x21)2
4+x2y+(5y)2
16 + sin(x2
10 )
x2
1+(x26)42x1y(5y)2
80 !,(6.71)
such that
g1(x, y) = x2
1x20,
g2(x, y) = 5x2
1+x210 0,
g3(x, y) = x2+y
650,
g4(x, y) = x10.
Of course, in (6.71) there is the non-convex term s(x2) := sin(x2
10 ). But the expli-
citly given constraints on both levels guarantee that x2[0,5] and obviously the
restriction of sto the interval [0,5] is convex. Consequently, since all other lower level
terms are also convex for fixed y, the entire lower level problem (6.71) is convex. The
94
problem was solved with the BL2-Recovering-IS algorithm using a few initial guesses
and initial targets. As a result, several local Pareto sets were computed, which, as
shown in Figure 31, intersect each other in image space. Therefore, not all computed
solutions are global solutions of the BLMOP. Nevertheless, since all lower level
problems are convex, every local solution (x?, y?) corresponds to a global solution
x?of the lower level problem associated with the parameter y?. Consequently,
the global solution of the BLMOP can be selected from the computed points by a
nondominance test with respect to the higher level. In Figure 31, also the result of
this nondominance test is marked.
Figure 31: The solutions of Problem (6.70,6.71) as computed by the BL-Recovering-
IS algorithm are locally but not necessarily globally Pareto optimal.
To show the behavior of the smoothing strategy, we have once more computed
a small part of the solution while achieving all intermediate solutions ˜zεand the
corresponding higher and lower level objective values. The resulting sequences for
the considered part of the Pareto set in parameter space and lower and higher level
image space are shown in Figures 32, 33 and 34, respectively. Here, the bigger points
mark the final points of the sequences, that is, they are substationary points of the
given BLMOP.
95
Figure 32: Convergence of the smoothing method (parameter space)
Figure 33: Convergence of the smoothing method (lower level objective space)
96
Figure 34: Convergence of the smoothing method (higher level objective space)
Example 6.19 Application to Medical Engineering
This example is also taken from [16] and deals with a real world application ap-
pearing in medical engineering, where the task is to find an optimal configuration
of coils. For a deeper insight to this applications we refer to [16] and the references
therein. The resulting BLMOP is given as follows:
min
x
R
14,y
R
||x||2
2
||xxold||2
2,(6.72)
such that y[0, π],
and xsolves:
min
x
R
14 ||A(y)·V x b(y)||2
2
||x||2
2,(6.73)
such that ||A(y)·V x b(y)||2
2max
97
with
A(y) =
1 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 0 0 cos ysin y0 0
0 0 0 0 0 0 sin ycos y
0 0 0 sin y0 0 cos ysin y
,
b(y) = (0,cos y, sin y, 1,0,0)tand max = 0.3.
For xold
R
14 we choose
xold =(0.1247,0.1335,0.0762,0.1690,0.2118,0.0534,0.1473,
0.3170,0.0185,0.1800,0.1700,0.0718,0.0058,0.0985)t.
The matrix V
R
8×14 is a non-sparse matrix with rank(V) = 8 and depends on the
special medical therapy to be administrated. For our calculation we have taken the
same (randomly chosen) matrix V= (V1|V2) as in [16] with
V1=
0.9501 0.8214 0.9355 0.1389 0.4451 0.8381 0.3046
0.2311 0.4447 0.9169 0.2028 0.9318 0.0196 0.1897
0.6068 0.6154 0.4103 0.1987 0.4660 0.6813 0.1934
0.4860 0.7919 0.8936 0.6038 0.4186 0.3795 0.6822
0.8913 0.9218 0.0579 0.2722 0.8462 0.8318 0.3028
0.7621 0.7382 0.3529 0.1988 0.5252 0.5028 0.5417
0.4565 0.1763 0.8132 0.0153 0.2026 0.7095 0.1509
0.0185 0.4057 0.0099 0.7468 0.6721 0.4289 0.6979
and
V2=
0.3784 0.8180 0.8385 0.7948 0.8757 0.2844 0.4329
0.8600 0.6602 0.5681 0.9568 0.7373 0.4692 0.2259
0.8537 0.3420 0.3704 0.5226 0.1365 0.0648 0.5798
0.5936 0.2897 0.7027 0.8801 0.0118 0.9883 0.7604
0.4966 0.3412 0.5466 0.1730 0.8939 0.5828 0.5298
0.8998 0.5341 0.4449 0.9797 0.1991 0.4235 0.6405
0.8216 0.7271 0.6946 0.2714 0.2987 0.5155 0.2091
0.6449 0.3093 0.6213 0.2523 0.6614 0.3340 0.3798
.
Observe that for every fixed y[0, π], A(y)·V x b(y) is linear with respect to x.
Consequently, the lower level problem (6.73) is a convex one for every fixed y[0, π].
The solution of this problem was computed with a variant of the BL2-Recovering-
IS algorithm, which is based on the minimization of the distance between the vector
of higher level objectives and the targets under the restriction that only those com-
ponents of ˜
Fvanish which guarantee that the optimality conditions for the lower
98
Figure 35: The Pareto set of Problem (6.72, 6.73) in higher level image space (left)
and lower level image space (right) for different parameters λ.
99
level problem hold and that the higher level constraints of the BLMOP are fulfilled.
To be more precise, with ˆz:= (x, y, α, ζ, τ, δ, w, s) and
ˆ
F(ˆz;ε) :=
h(x, y)
H(x, y)
G(x, y) + ww
Φ(x, y, τ, ε)
¯
∇L(x, y, α, ζ, τ)
l
X
i=1
αi1
αss
,
in every step we have to solve
min
ˆzS||F(x,y)(ˆz)) ti||,(6.74)
where S:= {ˆz:ˆ
F(ˆz;ε)=0}and ti, i = 1, . . . , ntdenote the targets which have to be
chosen individually in every cycle of the algorithm. This variant works without boxes
B
R
k. Instead, the individual targets and consequently the Pareto points are
successively computed using the targets t?and the solutions ˆz?from the respective
previous step, that is, we solve (6.74) for a target
t=F?+λb +λ0(t?F?),
where bis a basis vector of the one-dimensional tangent space TF?F(P) of the Pareto
set F(P) in higher level image space at F?:= F(x,y)(ˆz?)). Here, λand λ0make
up a parameterization for the new target t. To demonstrate the advantage of these
methods concerning the user’s influence on the diversity and the granularity, we
have computed different solutions corresponding to different fixed values for the
parameter λ, see Figure 35. While computing these solutions it turned out that in
this particular application the simple choice λ0= 0 leads always to suitable targets,
that is, we have always t <pF(x,y)(¯z?)), where ¯z?is the respective solution of
(6.74). As can be observed on the right hand side of Figure 35, the lower level
inequality constraint ||A(y)·V x b(y)||2
2max = 0.3, which is related to the first
lower level objective, see (6.73), is active for the entire solution of this problem.
6.6 Non-Convex and Non-Smooth BLMOPs
Due to the conditions given by many real world applications, in this section we
assume that for the objectives and constraints of both the higher and lower level
100
either no derivatives are available at all or the computation of derivatives is too
time consuming. Moreover, we abandon the lower level convexity assumption which
was as well as the availability of derivatives essential for the application of the
algorithms based on Theorem 6.1.
BL2-Subdivision Algorithm
Motivated by the facts mentioned above, we want to describe an algorithm which
belongs to the family of subdivision algorithms and is tailored to solve non-convex
and non-smooth BLMOPs. A naive way to handle such problems would be to solve
for ’sufficiently many’ parameters y
R
mthe lower level problem, collecting the re-
spective solutions in a common archive, and then performing at first a feasibility test
and thereafter a nondominance test, both with respect to the higher level problem,
among all collected solutions. But there is the question how to choose ’sufficiently
many’ parameters y
R
min order to obtain a representative approximation of the
y-space while keeping the computational effort justifiable. The following set-oriented
algorithm finds a way out by the use of a subdivision technique. To be more precise,
the algorithm works generally with boxes in
R
n+m, but for the representation of the
y-space and the x-space, respectively, also boxes of respective lower dimension are
used, which can be regarded as projections of boxes in
R
n+mto the space of boxes in
R
nor
R
m, respectively. The subdivision depth is increased adaptively, that is, the
size of the boxes shrinks, such that both the number of points yi
R
mrepresenting
the y-space and the number of generated Pareto points of the associated lower level
problems is increased adaptively as the algorithm proceeds.
For a detailed description of the BL2-Subdivision algorithm, denote for every box
collection B
R
n+mby Πy(B)
R
mthe ’box collection-valued projection’ of Bto
the space of boxes in
R
m. Moreover, for every fixed y
R
mand δN, denote by
BLS(y, δ) a box collection in
R
nwith diam(BLS(y, δ)) = δcovering the solution
of the lower level problem associated with y, e.g., a box collection generated by a
combination of the classical SamRec and Sampling algorithms.
With these notations, starting with an initial box Q=B0
R
n+m, an iteration
of the BL2-Subdivision algorithm reads as follows:
(i) Subdivision
Construct from Cj1:= Πy(Bj1) a new system ˆ
Cjof subsets such that
[
Cˆ
Cj
C=[
C∈Cj1
C
101
and
δj:= diam( ˆ
Cj) = θjdiam(Cj1),
where 0 < θmin θjθmax <1.
(ii) Selection
ˆ
Bj:=
for all Cˆ
Cj
choose a set of test points XCC
ˆ
Bj:= ˆ
BjS
y∈XC
(BLS(y, δj)×C)
for all Bˆ
Bj
choose a set of test points XBB
X:= S
Bˆ
BjXB
NF:= F(G,H)-nondominated points of X
Bj:= nBˆ
Bj:XBNF6=o
As in all our algorithms of subdivision type, a termination criterion for the BL2-
Subdivision algorithm can be defined by the condition diam(Bj)< ε for some small
ε > 0, which is equivalent with the termination after a number of iterations.
Observe that at the end of each iteration the number of boxes B Bjis mini-
mized by the use of a feasibility and nondominance test. Thus, the refinement of
the y-space in the next iteration is restricted to those regions where solutions of the
BLMOP are expected to be. Both, the adaptive refinement itself and the restriction
of the respective refinements to the relevant regions help to keep the computational
effort low.
Let us make a remark concerning the choice of test points for the higher level
nondominance test. If the method for the solution of the lower level problems
produces for every ya representative number of f(g,h)-nondominated points, these
points can be collected in an archive, which serves as a basis for the choice of the
set Xof testpoints, the higher level nondominance test has to be performed on. In
this way, the computational effort for the construction of X, which is particularly
time consuming in the high-dimensional case, can be reduced to a minimum.
Example 6.20 Given a higher level function F= (F1, F2)t:R2×RR2, a lower
level function f= (f1, f2)t:R2×RR2, and a lower level inequality constraint
102
function g= (g1, g2)t:R2R2, we consider the following BLMOP:
min
x
R
2, y
R
F(x, y) = x4
1+x2
2+ (y6π)2
x2
1+ (x26π)2+y4,(6.75)
such that xsolves:
min
x
R
2f(x, y) = sin x1+ sin(x2y)
cos(x1y) + cos x2,
such that g(x) = 0.2x2
1+x2
2
x2
1+x2
250,
(6.76)
where has to be understood component-wise.
The solution of this problem in parameter space, as computed by the BL2-
Subdivision algorithm, is shown in Figure 36. For comparison reasons, we have also
computed the union of the lower level solutions associated with an extensive set of
parameters y
R
musing the BL-Subdivision algorithm. This union of lower level
solutions makes up the feasible set for the higher level problem and is also depicted
in Figure 36. Observe, that the solution of the BLMOP complies with just a small
part of the union. In Figure 37 the solution of the above BLMOP in higher level
objective space is shown.
7 Sensitivity Analysis
In this section we present results concerning the sensitivity of a BLMOP, that is,
we are interested in the behavior of the solution when the BLMOP is disturbed
due to additional parameters. To be more precise, we derive estimations for the
distance of a particular Pareto point of the original problem to the Pareto sets of
the problems arising when the perturbation parameters vary within a certain range.
Such information can help the decision maker to choose a particular solution out
of the computed Pareto set for the adjustment of the system under consideration,
particularly, if this system is affected by uncertainties. Moreover, as we will see in
Section 7.3, the derived estimations can be used to influence the representation of
the Pareto set computed by the algorithms described in the previous sections.
7.1 Sensitivity Analysis for Classical Optimization Problems
Before we can derive the sensitivity results for MOP and BLMOP we recall the basic
sensitivity theorem stated in [17], which is concerned with (scalar valued) parametric
103
Figure 36: The solution of the non-convex BLMOP in parameter space.
Figure 37: The solution of the non-convex BLMOP in higher level objective space.
104
optimization problems of the form
min
xF(x, ) (P())
s.t. Gi(x, )0, i = 1, . . . , q,
Hj(x, ) = 0, j = 1, . . . , p,
where
R
mis a (fixed) perturbation parameter vector and all functions
F, Gi, Hj:
R
n×
R
m
R
are twice continuously differentiable with respect to
x. For the remainder of this section let L:
R
n×
R
q×
R
p×
R
m
R
, defined by
L(x, µ, λ, ) := F(x, ) +
q
X
i=1
µiGi(x, ) +
p
X
j=1
λjHj(x, ),
be the Lagrangian of P(). The gradient F(x, ) of a function F:
R
n×
R
m
R
is understood to be a row vector and the Hessian of Fis denoted by 2F(x, ).
Accordingly, the ith row of the Jacobian G(x, ) of a vector valued function
G:
R
n×
R
m
R
kis given by the gradient Gi(x, ) of the ith component of G.
Moreover, denote ¯
:= (x),:= (), and let IG(x, ) := {i:Gi(x, )=0}.
Theorem 7.1 Let x?
R
nbe a solution of P(0), such that the following conditions
hold at x?:
(i) The gradients ¯
F(x, ),¯
Gi(x, ),¯
Hj(x, )and the constraints Gi(x, ),
Hj(x, ),i= 1, . . . , q, j = 1, . . . , p, are continuously differentiable with respect
to in a neighborhood of (x?,0).
(ii) The gradients ¯
Gi(x?,0), i IG(x?,0),¯
Hj(x?,0), j = 1, . . . , p, are linearly
independent.
(iii) There are µ?
i0, i = 1, . . . , q and λ?
j
R
, j = 1, . . . , p, such that
¯
∇L(x?, µ?, λ?,0) = 0,(7.1)
and
µiGi(x?,0) = 0, i = 1, . . . , q. (7.2)
(iv)
z¯
2L(x?, µ?, λ?,0) z > 0 (7.3)
105
for all z6= 0 with
¯
F(x?,0)z= 0,
¯
Gi(x?,0)z0, i IG(x?,0),
¯
Hj(x?,0)z= 0, j = 1, . . . , p.
(v)
µi>0, i IG(x?,0).(7.4)
Then
(a) x?is an isolated local solution of P(0) and the multipliers µ?
i, λ?
jare unique,
(b) for in a neighborhood of 0, there exists a unique, continuously differentiable
vector function v()=(x()t, µ()t, λ()t)tsatisfying the second order suffi-
cient conditions for a local minimum of P()such that v(0) = (x?t, µ?t, λ?t)t
and hence x()is a unique local minimum of P()with associated multipliers
µ(), λ(),
(c) for near 0, the set IG(x(), )of active inequality constraints is unchanged,
strict complementarity slackness µiGi(x(), )=0holds for i= 1, . . . , q, and
the gradients ¯
Gi(x(), ),¯
Hj(x, ), i IG(x, ), j = 1, . . . , p of the active
constraints with respect to xare linearly independent at x().
Proof: See [17].
Differentiating the equations
¯
∇L(x(), µ(), λ(), ) = 0,(7.5)
µi()Gi(x(), ) = 0, i = 1, . . . , q, (7.6)
H(x(), ) = 0,(7.7)
with respect to , it follows that
M()v() = N(),(7.8)
where
M() :=
¯
2L(x(), µ(), λ(), ) ( ¯
G(x()), )t(¯
H(x(), ))t
Iqµ()¯
G(x(), )IqG(x(), ) 0
¯
H(x(), ) 0 0
106
and
N() :=
(¯
∇L(x(), µ(), λ(), )t)
Iqµ()G(x(), )
H(x(), )
and Iqdenotes the unity matrix in
R
q×q. As stated in [17], M() is regular under
the assumptions of Theorem 7.1 for near 0. Thus, it follows that
v() = M()1N()
for near 0, where the quantities are evaluated as in (7.8). This expression can
be substituted into the Taylor expansion of v() in order to obtain a first-order
approximation, as is stated in the following
Corollary 7.2 Under the assumptions of Theorem 7.1, a first-order approxima-
tion of v()in a neighborhood of = 0 is given by
v() =
x()
µ()
λ()
=
x?
µ?
λ?
+M(0)1N(0)+o(||||),(7.9)
where Φ() := o(||||)means that Φ()/|||| 0as 0.
7.2 Sensitivity Analysis for MOP and BLMOP
In this section we develop methods for the sensitivity analysis of both parametrized
MOPs and parametrized BLMOPs. Since later on in this section the analysis for
BLMOPs is realized by applying the analysis for MOPs on a suitable reformulation
of BLMOP, we begin by extending the previously presented concept, which was
adapted from [17], in order to analyze parametrized MOPs of the following form:
min
xF(x, ) (PM())
s.t. Gi(x, )0, i = 1, . . . , q,
Hj(x, ) = 0, j = 1, . . . , p.
Here, F:
R
n×
R
m
R
lis a vector-valued function and minimization has to
be understood in the sense of Definition 2.1, that is, in the sense of multi-objective
optimization.
To be more precise, we are interested in finding the maximal distance of a solution
x?=x(0) of PM(0) to a (compact) Pareto set of PM(), when varies within a
107
neighborhood of 0, that is, we want to estimate
∆(x?, δ) := max
min
x{||x?x|| :xsolves PM(),|||| δ}
for some δ > 0 and some norm || · ||. For this we will also have to consider the
following weighted sums problems corresponding to fixed weighting vectors α
Sα:= {α:Pl
i=1 αi= 1, αi0, i = 1, . . . , l}:
min
x
l
X
i=1
αiFi(x, ) (PWS(, α))
s.t. Gi(x, )0, i = 1, . . . , q,
Hj(x, )=0, j = 1, . . . , p.
In the following we make the assumption
(A1) for every local solution x?of PM() there is a corresponding weighting vector
αSαsuch that x?is also a unique local solution for problem PWS(, α).
W.l.o.g., let = 0 and let x?=x(0) be both a local Pareto point of PM(0) and
a local solution of PWS(0, α?) for a weighting vector α?Sα. Since the unique
solution of PWS(, α?) is contained in the Pareto set of PM(), we can conclude that
∆(x?, δ)max
{||x?x|| :xsolves PWS(, α?),|||| δ},
and thus an upper bound of first order for ∆(x?, δ) can be obtained as follows. We
substitute F(x, ) by the weighted sums scalarization Pl
i=1 α?
iFi(x, ) and observe
that from Corollary 7.2 we have
||x?x()|| =˜
M(0)1N(0)+o(||||),
where M(0) and N(0) are the above defined matrices associated with the mentioned
substitution for Fand the matrix ˜
M(0)1is given by the first nrows of M(0)1.
Consequently,
∆(x?, δ)˙
max
{|| ˜
M(0)1N(0)|| :|||| δ},(7.10)
where a˙
bmeans ab+o(||||) for all a, b > 0.
As we will present in the following, (7.10) can be improved in the sense that
x?is compared to the entire Pareto sets of the perturbed problems PM() and not
only to those particular solutions out of these sets, which are given by the solutions
108
of the problems PWS(, α?). For this we consider the problems PWS(, α), while
regarding not only but also αas perturbation parameters. In this case, Corollary
7.2 yields
x(, α) = x?+˜
M(0, α?)1N(0, α?)
αα?+o
αα?.(7.11)
According to the assumption (A1), for all near 0, the distance of x?to the Pareto
set of PM() is given by the distance of x?to the unity of the solutions of the
problems PWS(, α), αSα:
∆(x?, δ) = max
min
α{||x?x(, α)|| :αSα,|||| δ}.(7.12)
Consequently, with A?:= ˜
M(0, α?)1N(0, α?) and ξ(, α) := (, αα?)tthe following
equation holds up to first order:
∆(x?, δ) ˙= max
min
α{||A?ξ(, α)|| :αSα,|||| δ}.(7.13)
Whereas the above considerations are based on comparisons of Pareto points in
parameter space, it might also be important to know the behavior of the objective
values caused by the variation of . Assuming that Fiis Lipschitz continuous with
Lipschitz constant Li,i= 1, . . . , l, we have
|Fi(x, )Fi(x?,0)|˙
Li∆(x?, δ) (7.14)
for all with |||| δand x {x:xsolves PM(),||xx?|| ∆(x?, δ)}. The
expression (7.14) considers the variation of Fifor the case that the distance of x?
to the Pareto set of PM() in pre-image space is maximized subject to |||| δ. In
practice it can also be interesting to know the maximal distance of Fi(x?,0) to the
i-th component of the Pareto set of PM() in image space subject to |||| δ, which
for all i= 1, . . . , l can be expressed by
Fi(x?, δ) := max
min
x{|Fi(x?,0) Fi(x, )|:xsolves PM(),|||| δ}.
Accordingly, if we are interested in the maximal distance of F(x?,0) to the Pareto
set of PM() in image space subject to |||| δ, we have to consider
F(x?, δ) := max
min
x{||F(x?,0) F(x, )|| :xsolves PM(),|||| δ}.
From (7.11), we have
Fi(x(, α), )Fi(x?+A?ξ(, α), )
109
and therefore we can state
Fi(x?, δ)max
min
α{|Fi(x?,0) Fi(x?+A?ξ(, α), )|:xsolves PM(),|||| δ}
and
F(x?, δ)max
min
α{||F(x?,0)F(x?+A?ξ(, α), )|| :xsolves PM(),|||| δ}.
Example 7.3 Consider F= (F1, F2)t:
R
2×
R
R
2defined by
F1(x, ) = (x1)2+x2
2,(7.15)
F2(x, ) = (x11)2+ (x21)2(7.16)
and the corresponding unconstrained parametrized MOP
min
xF1(x, )
F2(x, ).(7.17)
The Pareto set of this MOPis given by {(x+, x):0x1}for every
R
. Moreover, for every
R
and every αSαwe consider the weighted sums
scalarization Fα:
R
2×
R
R
defined by
Fα(x, ) = α1F1(x, ) + α2F2(x, )
and the associated weighted sums problem
min
xFα(x, ).(7.18)
Observe that, since F1and F2are convex for every fixed
R
,Fαis convex for all
fixed
R
,αSα, and therefore the corresponding solutions of (7.18) are unique.
For the following demonstration we fix α1=α?
1=3
4,α2=α?
2=1
4and =?= 0.
Thus, the necessary optimality condition for a solution of (7.18) is
¯
Fα?(x, 0) = 3
4¯
F1(x, 0) + 1
4¯
F2(x, 0) = 3
42x1
2x2t
+1
42x12
2x22t
= 0,
from which we obtain the unique solution
x?=x?
1
x?
2=1
41
1.
As mentioned before, for our analysis we have to regard both and αas per-
turbation parameters and therefore we have to calculate M(, α) and N(, α). The
gradient ¯
Fα(x?, ) turns out to be
¯
Fα(x?, ) = 2(α1(x1) + α2(x11), α1x2+α2(x21))
110
Since there are no active constraints in the neighborhood of the considered point,
the matrix M(, α) is simply
M(, α) = ¯
2Fα(x?, ) = 2(α1+α2) 0
0 2(α1+α2)= 2 1 0
0 1
and consequently
M(, α)1=1
21 0
0 1 .
Denoting by the gradient with respect to (, α), N(, α) turns out to be
N(, α) = −∇¯
Fα(x?, )t
=2(α1+α2)x1 x11
0x2x21
=21x1 x11
0x2x21.
It follows that
x(?, α?) = M1(0, α?)N(0, α?) = 1
44 1 3
0 1 3
and from (7.11) we have
x(, α) ˙= 1
41
11
44 1 3
0 1 3
α1α?
1
α2α?
2
.
Thus, for fixed α=α?we have
x(, α?) ˙= 1
41
1+
0,
that is, x(, α?) moves in parallel to the x1-axis when varies, see Figure 38. With
these results, according to (7.10), we can write
∆(x?, δ)˙
max
{||x(, α?)x?|| :|||| δ}(7.19)
= max
{|||| :|||| δ}=δ. (7.20)
As one can see in Figure 38, for the case δ= 1, both x(1, α?) and x(1, α?) are
located on the line x2=1
4such that ||x(1, α?)x?|| =||x(1, α?)x?|| = 1 and
therefore we have
∆(x?,1) ˙
1.(7.21)
111
Now we go a step ahead and calculate a more accurate estimation according to
(7.13). For this, we have to find for every near ?= 0 the minimum of
M1(0, α?)N(0, α?)
α1α?
1
α2α?
2
with respect to αor, equivalently, we have to solve
min
α2
(4+ 1 4α2)2+ (1 4α2)2(7.22)
s.t. 0α21,(7.23)
where α1has been eliminated by the substitution α1= 1 α2. Assuming for the
moment that varies only in a small neighborhood such that the solution ˆα() of
(7.22) can be expected to vary within the open interval (0,1), we can omit the
constraints on α2and therefore the necessary optimality condition turns out to be
¯
α2((4+ 1 4α2)2+ (1 4α2)2) = 8(4+ 2 8α2) = 0
or
21+4α2= 0,
from which we obtain the solution
ˆα2=1
4
2and ˆα1= 1 1
4
2=
2+3
4.
This result is valid for all (3
2,1
2), because in this case ˆα2() remains within
the open interval (0,1), that is, there are no active constraints at ˆα2(). For
R
\(3
2,1
2) the necessary optimality condition alters to be
21+4α2µ= 0,
where the sign of µ0 for α2= 1 and µ0 for α2= 0. More precisely, we have
µ=
3+2, 3
2
0, (3
2,1
2)
21, 1
2
(7.24)
and finally
ˆα() =
(0,1)t, 3
2
(
2+3
4,1
4
2)t, (3
2,1
2)
(1,0)t, 1
2
.(7.25)
112
As can be seen from this result, both ˆα1() and ˆα2() depend continuously and
monotonously on . Moreover, it can be shown that the function
||M1(0, α?)N(0, α?)ξ(, ˆα())||
is also continuous and monotone. Thus, for δ= 1, the calculation of (7.13) reduces
to
∆(x?,1) ˙= max
{||M1(0, α?)N(0, α?)ξ(, ˆα())|| : {−1,1}}
= max (1
2,10
4)=10
4.(7.26)
Figure 38: Comparison of the different estimations for ∆(x?,1).
The sensitivity analysis for a BLMOP without lower level inequality constraints
can be realized by the use of the sensitivity analysis for MOP as described above.
To be more precise, the desired estimations are based on the application of (7.10)
and (7.13), respectively, to the following reformulation obtained by replacing the
113
lower level problem by its Kuhn-Tucker conditions:
min
x
R
n,y
R
m
R
l
R
pF(x, y),(7.27)
s.t G(x, y)p0,
H(x, y) = 0,
l
X
i=1
αixfi(x, y) +
p
X
i=1
ζixhi(x, y) = 0,(7.28)
h(x, y) = 0,
l
X
i=1
αi1 = 0,
αi0, i = 1, . . . , l,
where minimization has to be understood in the sense of multi-objective optimiza-
tion. For a more detailed description we consider the following
Example 7.4 Let F= (F1, F2)t, f = (f1, f2)t:
R
2×
R
×
R
R
2be defined by
F1(x, y, ) = 1
2(x1y)2+ (1 + )x2
2,
F2(x, y, ) = 1
2(x1y1)2+ (x21)2,
f1(x, y, ) = 1
2(x12)2+ (x2y)2,
f2(x, y, ) = 1
2(x1+ 1)2+ (x21)2,
and consider the following parametrized BLMOP:
min
x,y F(x, y, ),(7.29)
s.t. xsolves: min
xf(x, y, ).
Observe that there are no explicitly given constraints for the upper level. Therefore,
in the following reformulation used for the calculation of the desired estimations, we
will use Hand Gto denote the constraints which arise from replacing the lower level
114
by its Kuhn-Tucker conditions. Thus, we consider the following auxiliary problem:
min
x,y F1(x, )
F2(x, ),(7.30)
s.t.
H1(x, α, ) := α1(x12) + α2(x1+ 1) = 0,(7.31)
H2(x, α, ) := α1(x2y) + α2(x21) = 0,(7.32)
H3(x, α, ) := α1+α21 = 0,(7.33)
G1(x, α, ) := α10,(7.34)
G2(x, α, ) := α20.(7.35)
Here, the equations H1and H2correspond to the rows of (7.28). As can be verified
by the optimality conditions stated in Theorem 6.1, the point (x?
1, x?
2, y?, α?
1, α?
2) =
(1
2,1
2,0,1
2,1
2) along with corresponding multipliers λ1=λ2=λ3=µ1=µ2= 0
is both a solution of the given BLMOP for ?= 0 and a solution of the related
weighted sums problem corresponding to the weights (β?
1, β?
2) = (1
2,1
2). Now, let
L(x, y, α, β, µ, λ, ) =
k
X
i=1
βiFi(x, y, ) +
2
X
i=1
µiGi(x, y, ) +
3
X
i=1
λiHi(x, y, )
be the Lagrangian associated with the reformulation (7.30) (7.35) and denote
¯
=(x,y)and =(,β). Then we obtain
¯
2L(x?, y?, α?, β?, µ?, λ?, ?) =
1 0 1 0 0
0 1 0 0 0
1 0 1 0 0
0 0 0 0 0
0 0 0 0 0
,
¯
∇L(x?, y?, α?, β?, µ?, λ?, ?) =
01
21
2
1
4
1
21
2
01
2
1
2
0 0 0
0 0 0
,
H(x?, y?, α?, ?) =
0 0 0
1
20 0
0 0 0
,
¯
G(x?, y?, α?, ?) = 0 0 0 1 0
0 0 0 0 1,
115
and
¯
H(x?, y?, α?, ?) =
1 0 0 3
2
3
2
0 1 1
2
1
21
2
0 0 0 1 1
,
yielding
M(?, β?) =
1 0 1 0 0 0 0 1 0 0
0 1 0 0 0 0 0 0 1 0
1 0 1 0 0 0 0 0 1
20
0 0 0 0 0 1 0 3
2
1
21
0 0 0 0 0 0 13
21
21
0 0 0 0 0 1
20 0 0 0
0 0 0 0 0 0 1
20 0 0
1 0 0 3
2
3
20 0 0 0 0
0 1 1
2
1
21
20 0 0 0 0
0 0 0 1 1 0 0 0 0 0
and
N(?, β?) =
01
21
2
1
4
1
21
2
01
2
1
2
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
1
20 0
0 0 0
.
Since we are particularly interested in (x(), y()), let ˜
M1(?, β?) denote the first
n+m= 3 rows of M1(?, β?). Then we have
A?=˜
M1(?, β?)N(?, β?) = 1
316
90 78 78
137 34 34
18 16 16
.
We assume that may vary within (0.5,0.5). Thus we are interested in estimating
∆((x?, y?),0.5). From (7.10), by fixing β=β?, we can calculate
∆((x?, y?),0.5) ˙
A?
0.1
0
0
0.2609.(7.36)
116
To compute an estimation according to (7.13), observe that for βSβthe gradient
of
A?
β1β?
1
β2β?
2
2
with respect to (β1, β2) is given by
937
6241 β1β21037
3748
β1+β2+1037
3748=937
6241 12β21037
3748
1+2β2+1037
3748,
and vanishes if and only if β1=1
2(1 + 1037
3748) and β2=1
2(1 1037
3748). Moreover,
for || 0.5, we have β1, β2(0,1). Therefore,
min
β
A?
β1β?
1
β2β?
2
:βSβ
=
A?
1037
3748
1037
3748
=r7993
29984.
Consequently, we can rewrite (7.13) as
∆((x?, y?),0.5) ˙= min
β
A?
0.5
β1β?
1
β2β?
2
:βSβ
= 0.5r7993
29984 0.2582,
which, as expected, is a better estimation than (7.36).
7.3 A Concept for the Adaptive Choice of Targets
When computing a discrete representation of a Pareto set for MOP or BLMOP,
respectively, by solving parametrized subproblems, sensitivity analysis can be used
to generate suitable parameters in order to control the distance (in parameter space
as well as in image space) between neighboring points of the computed represen-
tation. Here, we will present a new recovering algorithm of reference point type
which can be understood as a variant of our image set-oriented recovering methods
presented in Section 4.1 and Section 6.2. The new algorithm uses a concept for
the adaptive choice of targets (which are considered as perturbation parameters),
such that the generated Pareto points are well-distributed in parameter space rather
than in image space. This is motivated by applications with high-dimensional pa-
rameter space, where on one hand a well-distributed representation of the Pareto
set in parameter space is required (as can be computed by our classical set-oriented
methods), but on the other hand one wants to benefit from the advantages of the
new image set-oriented methods mentioned in Section 4.
117
The variant of the new algorithm for the solution of BLMOP is realized by
replacing the lower level problem by its Kuhn-Tucker conditions and computing a
solution of the resulting reformulation, which has the form of a constrained MOP.
Therefore, the following description is related to a general variant for the solution of
constrained MOPs. Thereafter, Example 7.5 shows that this algorithm also works
satisfactorily for BLMOPs. For the description of the new algorithm recall, that in
image set-oriented recovering methods, every new Pareto point ˆxis generated in the
neighborhood of a known Pareto point x?by minimizing the distance ||F(x)t||
between F(x) and a target
t {F(x?) +
k1
X
j=1
jbj+ ˜(t?F(x?))},
where {bj, j = 1, . . . , k 1}is a basis of the tangent space TF(x?)F(P) of the Pareto
set in image space at F(x?), and t?is the target used previously for generating x?.
The vector = (1, . . . , k1) and the value ˜make up a parameterization for the
target t. Let us consider the case where the Pareto set is convex (in image space).
In this case, we can always choose ˜= 0, and therefore we write t=t(). Observe
that t(0) = F(x?) and thus minimization of ||F(x)t(0)|| ends up in x?, that is,
we have x(0) = x?for the optimum function x(). Now suppose that, starting from
a previously found Pareto point x?, we want to find a new Pareto point ˆxsuch that
approximately
||ˆxx?|| =δ(7.37)
holds. According to Corollary 7.2, we have to find a parameter vector
R
k1such
that
|| ˜
M1(0)N(0)|| =δ,
where ˜
Mand Nare evaluated as in (7.8) while the objective F(x) is substituted by
the (differentiable) auxiliary function ||F(x)t()||2. For every fixed i= 1, . . . , k1,
since determines the target Pk1
j=1 jbjTF(x?)F(P), we can choose j= 0 for j6=i
in order to force that F(ˆx) is close to F(x?) + ibi. Then, if ˜
M1(0)N(0)ei6= 0,
where eidenotes the i-th standard basis vector in
R
k1, we can conclude, that we
have to choose
ˆi:= |i|=δ
|| ˜
M1(0)N(0)ei||.
Observe that, since we are dealing with a linear approximation, we can use the
targets
F(x?) + ˆibiand F(x?)ˆibi
118
for computing two different Pareto points with the required property expressed by
(7.37). It should be mentioned that particularly for the case k3, the points
computed with this technique certainly have approximately the desired distance to
x?, but in general there is no information on the distance to the other Pareto points
which have been computed so far. Here, additional strategies are needed in order to
obtain diversity among all generated Pareto points. For the particular case k= 2 it
is certainly a good choice to compute at first the individual minimum x?of one of the
objectives Fiand then since F(x?) makes up a vertex of the 1-dimensional Pareto
set using successively the technique described above and choosing the respective
targets tin a way such that the i-th component tiincreases (and the other one
decreases) successively, see Figure 39 (right). Another strategy is to start with an
arbitrary target t, and after calculating the solution x?associated with that target
t, to extend the Pareto set into both directions with the mentioned technique by
choosing two different targets ˜
tand ˆ
tin a way such that for a fixed ˜
t1and ˆ
t2increase
and ˜
t2and ˆ
t1decreases (or vice versa) successively, see Figure 39 (left). In contrast to
the image space oriented recovering algorithms described in Section 4.1 and Section
6.2, the concept of using boxes in image space can be omitted here, since for k= 2
the desired diversity among all generated Pareto points is obtained with the help of
the adaptively chosen targets.
Figure 39: Adaptive target construction starting at an arbitrary Pareto point (left)
and at the Pareto point corresponding to the individual minimum of F1(right).
Example 7.5 Given a higher level function F= (F1, F2)t:R2×RR2and a
lower level function f= (f1, f2)t:R2×RR2, we consider the following BLMOP:
119
min
x,y
R
F(x, y) = 10(x11)4+ (x21)2+ (y1)2
(x1+ 1)2+ (x2+ 1)2+ (y+ 1)4,(7.38)
such that xsolves:
min
x
R
f(x, y) = (x12)2+ (x22)4+ 3y2
2y(x1+ 0.5)2+ (x2+ 1)2.(7.39)
We have calculated a solution for this problem using the BL-Recovering-IS
method (which uses the Kuhn-Tucker reformulation for the lower level problem)
in two different ways:
On one hand, the target parameter = 1 was fixed and consequently the distance
between neighboring computed Pareto points F(x?
i) and F(x?
i+1), i= 1,2,..., in
image space corresponds approximately with this value = 1, while the distance
between the associated points x?
iand x?
i+1,i= 1,2,..., in parameter space varies,
see Figure 40.
On the other hand, the target parameter = ˆwas estimated in every step by
the above sensitivity analysis based technique. Here, we have chosen δ= 0.15 and as
depicted in Figure 41, the distance between neighboring computed Pareto points in
parameter space x?
iand x?
i+1,i= 1,2,...,corresponds as desired approximately
with this value δ= 0.15, while the distance between the associated points F(x?
i)
and F(x?
i+1), i= 1,2,..., in image space varies.
Figure 40: The solution in parameter space (left) and image space (right) for fixed
target parameter = 1.
120
Figure 41: The solution in parameter space (left) and image space (right) for adap-
tive target parameter ˆrelated to δ= 0.15.
8 Conclusion and Outlook
In this work we considered the class of bi-level multi-objective optimization problems
(BLMOP), which can be understood as a generalization of classical bi-level opti-
mization problems to the case where both the upper and lower level are given by
multi-objective optimization problems (MOP) instead of scalar-valued optimization
problems.
Based on the well-known Kuhn-Tucker optimality conditions for classical opti-
mization problems, we have developed optimality conditions for BLMOP for the case
that the lower level problem is a convex one without inequality constraints. These
optimality conditions have been used for the definition of numerical algorithms for
the solution of this particular subclass of BLMOP.
Since the solution of bi-level multi-objective optimization problems, the Pareto
set, is typically an extensive set or, in mathematical terms, a manifold, most of
our algorithms are embedded in a set-oriented framework. In particular, we have
concentrated on two main directions, that is, algorithms of subdivision type and
algorithms of recovering type. Both have their individual advantages and work
satisfactorily on their own, but it turned out that performance can often be improved
when they are used in combination.
Moreover, in this work we have considered the sensitivity analysis for MOP and
BLMOP, that is, we have investigated the behavior of the Pareto set under the
variation of additional parameters. The results have particularly been applied to
one of our set-oriented algorithms in order to control the spreading of the discrete
representation of the computed solution.
121
The restriction to the above mentioned subclass of BLMOP is motivated by
the following mathematical circumstances: the lower level problem is assumed to
be convex, because otherwise it can not be guaranteed that the points satisfying
the Kuhn-Tucker conditions for the lower level comply exactly with the feasible
region for the upper level problem. Moreover, the common constraint qualifications,
which have to be fulfilled for applying the Kuhn-Tucker conditions to the higher level
problem, are necessarily violated in the presence of lower level inequality constraints.
The development of advanced optimality conditions for the general BLMOP,
which also includes non-convex lower level problems with inequality constraints,
is an interesting field which shall be investigated in the future. Well-known second
order sufficient optimality conditions for classical optimization may form a promising
basis for further development. Moreover, the above mentioned sensitivity analysis
may be an additional tool not only for the development of the desired optimality
conditions, but also for designing algorithms for the solution of the general BLMOP.
Up to now, we have used the developed algorithms to solve several academic
examples and technical applications. Our interest for the future is to apply both the
existing and upcoming algorithms to BLMOPs arising from real world problems of
any discipline.
122
References
[1] J. F. Bard, Practical Bilevel Optimization: Algorithms and Applications, Kluwer
Academic Publishers, 1998.
[2] H. Bonnel and J. Morgan, Semivectorial Bilevel Optimization Problem: Penalty Ap-
proach, Journal of Optimization Theory and Application, vol. 131(3), pp. 365–382,
2006.
[3] M. Brown, R. E. Smith, Directed Multi-Objective Optimisation, International Journal
of Computers, Systems and Signals, vol. 6(1), 2005.
[4] I. Chauduri, S. Sertl, H. Zolt´an, M. Dellnitz and T. Fraunheim, Global Optimization
of Silicon Nanoclusters, Applied Surface Science, vol. 226(1-3), pp. 108-113, 2004.
[5] B. Colson, P. Marcotte and G. Savard, Bilevel programming: A survey, A Quarterly
Journal of Operations Research, vol. 3, pp. 87–107, 2005.
[6] I. Das, J. E. Dennis, A Closer Look at Drawbacks of Minimizing Weighted Sums of
Objectives for Pareto Set Generation in Multicriteria Optimization Problems, Struc-
tural Optimization, vol. 14, pp. 63–69, 1997
[7] I. Das, J. E. Dennis, Normal-Boundary Intersection: A new Method for Generating
the Pareto Surface in Nonlinear Multicriteria Optimization Problems, SIAM Journal
of Optimization, vol. 8, no.3, pp. 631–657, 1998
[8] J. P. Dauer, T. A. Fosnaugh, Optimization over the Efficient Set, Journal of Global
Optimization, vol. 7, pp. 261–277, 1995.
[9] A. Dell’Aere, Numerische nichtlineare Optimierung in MuPAD, Diplomarbeit, Pader-
born, 2003
[10] A. Dell’Aere, Multi-Objective Optimization in Self-Optimizing Systems, Proceedings
of the 32nd Annual Conference of the IEEE Industrial Electronics Society, Paris, 2006
[11] M. Dellnitz, O. Sch¨utze and T. Hestermeyer, Covering Pareto Sets by Multilevel
Subdivision Techniques, Journal of Optimization Theory and Application, vol. 124(1),
pp. 113–13, 2005.
[12] S. Dempe, Foundations of Bilevel Programming, Kluwer Academic Publishers, 2002.
[13] S. Dempe, Annotated Bibliography on Bilevel Programming and Mathematical Pro-
grams with Equilibrium Constraints, Optimization, vol. 52, pp. 333–359, 2003.
[14] P. Deuflhard, A. Hohmann, Numerische Mathematik I Eine algorithmisch orientierte
Einfhrung, de Gruyter & Co., 1993.
[15] M. Ehrgott, Multicriteria Optimization, Lecture Notes in Economics and Mathemat-
ical Systems, vol. 491, Springer, Berlin, 2000.
[16] G. Eichfelder, Parametergesteuerte osung nichtlinearer multikriterieller Opti-
mierungsprobleme, PhD thesis, University of Erlangen-N¨urnberg, Germany, 2006.
[17] A. V. Fiacco, Introduction to Sensitivity and Stability Analysis in Nonlinear Pro-
gramming, Academic Press, London, 1983.
[18] J. Figueira, S. Greco, M. Ehrgott, Multiple Criteria Decision Analysis, Lecture Notes
in Economics and Mathematical Systems, Springer, New York, 2005.
123
[19] M. Fukushima, Merit Functions for Variational Inequality and Complementarity
Problems, Nonlinear Optimization and Applications, G. Di Pillo and F. Giannessi
(eds.), Plenum Press, pp. 155-170, 1996.
[20] C. M. Fonseca, P. J. Fleming, E. Zitzler, K. Deb and L. Thiele, Evolutionary
Multi-Criterion Optimization. Second International Conference, EMO 2003, Springer,
Berlin, 2003.
[21] U. Frank, H. Giese, F. Klein, O. Oberschelp, A. Schmitt, B. Schulz, H. Vcking, K.
Witting, J. Gausemeier (Hrsg.), Selbstoptimierende Systeme des Maschinenbaus -
Definitionen und Konzepte, HNI-Verlagsschriften, vol. 155, Paderborn, 2004.
[22] M. Fukushima and J.-S. Pang, Convergence of a Smoothing Continuation Method
for Mathematical Programs with Complementarity Constraints, Ill-Posed Variational
Problems and Regularization Techniques, Lecture Notes in Economics and Mathe-
matical Systems, vol. 477, pp. 99-110, 1999.
[23] K. Harada, J. Sakuma, S. Kobayashi, Local Search for Multiobjective Function Op-
timization: Pareto Descent Method, Proceedings of the 8th annual conference on
genetic and evolutionary computation, pp. 659–666, New York, 2006.
[24] M. E. Henderson, Multiple Parameter Continuation: Computing Implictly Defined
k-Manifilds, International Journal of Bifurcation and Chaos, vol. 12, pp. 451–476,
2002
[25] C. Hillermeier, Nonlinear Multiobjective Optimization - A Generalized Homotopy
Approach, Birkh¨auser, Berlin, 2001.
[26] R. Horst and H. Tuy, Global Optimization: Deterministic Approaches, Springer,
Berlin, 1993.
[27] J. Jahn, Vector Optimization Theory, Applications, and Extensions, Springer,
Berlin, 2004.
[28] H. Jiang and D. Ralph, Smooth SQP Methods for Mathematical Programs with
Nonlinear Complementarity Constraints, Journal on Optimization, vol. 10(3), pp.
779–808, 2000.
[29] H. Jiang and D. Ralph, Extension of Quasi-Newton Methods to Mathematical Pro-
grams with Complementarity Constraints, Computational Optimization and Appli-
cations, vol. 25(1-3), pp. 123–150, 2003
[30] T. Knoke, C. Romaus, J. ocker, A. Dell’Aere, K. Witting, Energy management for
an Onboard Storage System based on Multiobjective Optimization, Proceedings of
the 32nd Annual Conference of the IEEE Industrial Electronics Society, Paris, 2006
[31] H. Kuhn and A. Tucker, Nonlinear programming, Proc. Berkeley Symp. Math. Statist.
Probability, vol. 2, pp. 481–492, 1951.
[32] Y. C. Liou, X. Yang, and J. C. Yao, Mathematical Programs with Vector Optimization
Constraints, Journal of Optimization Theory and Application, vol. 126(2), pp. 345–
355, 2005.
[33] Z. Q. Luo, J.-S. Pang, and D. Ralph, Mathematical Programs with Equilibrium
Constraints, Cambridge University Press, Cambridge, 1996.
[34] K. Miettinen, Nonlinear Multiobjective Optimization, Kluwer Academic Publishers,
1999.
124
[35] W. Oevel, Einf¨uhrung in die Numerische Mathematik, Spektrum Akademischer Ver-
lag, Heidelberg, 1996.
[36] P. M. Pardalos and M. G. C. Resende, Handbook of Applied Optimization, Oxford
University Press, New York, 2002.
[37] S. Scaffler and R. Schultz and K. Weinzierl, A Stochastic Method for the Solution of
Unconstrained Vector Optimization Problems, Journal of Optimization Theory and
Application, vol. 114(1), pp. 209–222, 2002.
[38] O. Sch¨utze, Set-oriented Methods for Global Optimization, PhD Thesis, University
of Paderborn, Germany, 2004.
[39] O. Sch¨utze, A. Dell’Aere and M. Dellnitz, On Continuation Methods for the Numer-
ical Treatment of Multi-Objective Optimization Problems, in Dagstuhl Seminar Pro-
ceedings 04461, Practical Approaches to Multi-Objective Optimization, IBFI, Schloss
Dagstuhl, Germany, 2005 http://drops.dagstuhl.de/opus/volltexte/2005/349.
[40] O. Sch¨utze, L. Jourdan, T. Legrand, E.-G. Talbi and J. L. Wojkiewicz, A Multi-
objective Approach to the Design of Conducting Polymer Composites for Electro-
magnetic Shielding, To appear in Evolutionary Multi-Criterion Optimization. Fourth
International Conference, EMO 2007, Springer, Berlin, 2007.
[41] S. Sertl and M. Dellnitz, Global Optimization using a Dynamical Systems Approach,
Submitted to Journal of Global Optimization, 2005
[42] X. Shi and H. S. Xia, Model and Interactive Algorithm of Bi-Level Multi-Objective
Decision Making with Multiple Interconnected Decision Makers, Journal of Multi-
Criteria Analysis vol. 10, pp. 27–34, 2001.
[43] H. v. Stackelberg, Marktform und Gleichgewicht, Springer, Berlin, 1934.
[44] P. T. Thach, H. Konno and D. Yokota, Dual Approach to Minimization on the Set
of Pareto-Optimal Solutions, Journal of Optimization Theory and Application, vol.
88(3), pp. 689–707, 1996.
[45] L. N. Vicente and P. H. Calamai, Bilevel and Multilevel Programming: A Bibliogra-
phy Review, Journal of Global Optimization, pp. 291–306, 1994.
[46] K. Witting, B. Schulz, A. Pottharst, M. Dellnitz, J. ocker, N. Fohleke, A new
Approach for Online Multiobjective Optimization of Mechatronical Systems, Sub-
mitted to the International Journal on Software Tools for Technology Transfer STTT
(Special Issue on Self-Optimizing Mechatronic Systems), 2004.
[47] N. Yamashita, K. Taji, M. Fukushima, Unconstrained Optimization Reformulations
of Variational Inequality Problems, Journal of Optimization Theory and Applications,
vol. 92, pp. 439–456, 1997.
[48] J. J. Ye, Constraint Qualifications and KKT Conditions for Bilevel Programming
Problems, Mathematics of Operations Research, vol. 31(4), pp. 811–824, 2006.
125
List of Figures
1 The idea of the Subdivision algorithm. . . . . . . . . . . . . . . . . . . . . . 15
2 The idea of the Recovering algorithm. . . . . . . . . . . . . . . . . . . . . . 16
3 The idea of the Sampling algorithm. . . . . . . . . . . . . . . . . . . . . . . 18
4 The idea of the Recovering-IS algorithm. . . . . . . . . . . . . . . . . . . . . 22
5 The construction of targets based on bTy?F(P) and d(Ty?F(P)). . . 23
6 The solution of Example 4.1 computed by the image set-oriented recovering
algorithm using different box sizes in image space. . . . . . . . . . . . . . . 24
7 The solution of Example 4.2 computed by the Recovering-IS algorithm (im-
agespace). .................................... 25
8 The solution of Example 4.2 computed by the classical Recovering algorithm
(imagespace).................................... 25
9 Block diagram of the tram system with onboard storage and operation
managementsystem................................ 26
10 The operation principle of the onboard storage system. . . . . . . . . . . . . 27
11 The solution of the tram problem computed by the IS-Recovering algorithm
and by the Recovering algorithm (image space). . . . . . . . . . . . . . . . . 29
12 Optimization results for the tram model (image space). . . . . . . . . . . . 29
13 The solution (in image space) of Example 4.1 computed by the Recovering-
IS algorithm in combination with the described hierarchical concept. . . . . 40
14 The Pareto optimal objective values (blue), the corresponding sensitivities
(red), and the solution (black) of Problem (6.25). . . . . . . . . . . . . . . . 57
15 The Pareto set of (6.25) in image space and the solution to the correspon-
dingRMOP..................................... 57
16 min
˜z{||˜zrB|| :˜
F(˜z)=0}versus ˜
F(˜z)=0. .................. 59
17 Projection of the generated box collection to the x-space. . . . . . . . . . . 60
18 The solution in lower level image space. . . . . . . . . . . . . . . . . . . . . 60
19 The solution in higher level image space. . . . . . . . . . . . . . . . . . . . . 61
20 The entire solution of Example 6.6 obtained by solving ˜
F(˜z) = 0 (projection
to x-space). .................................... 65
21 The ’inner’ part of the solution of Example 6.6 obtained by solving ¯
F(¯z) = 0
(projection to x-space). ............................. 65
22 The solution lies within the constraint surface defined by the lower level
problem. (the constraint surface was computed separately just for the pur-
pose of visualization, but need not be explicitly computed by our algorithms). 83
23 Comparison of the solution of the PSCMOP to the solution of the (uncon-
strained) higher level problem (higher level objective space). . . . . . . . . . 84
24 Box covering computed by a combination of the PSC-Sampling and PSC-SamRec
algorithms. .................................... 84
25 Box covering in parameter space computed by the PSC-Recovering algorithm. 86
26 The solution to the PSCMOP computed by the PSC-Recovering algorithm
is also a part of the solution to the lower level problem computed by
theRecoveringalgorithm. ............................ 87
27 The solution in lower level image space as computed by the PSC-Recovering
algorithm...................................... 87
28 The solution in higher level image space as computed by the PSC-Recovering
algorithm...................................... 88
29 The solution of the shielding problem in higher level (left) and lower level
(right)objectivespace............................... 90
30 The solution of the shielding problem in parameter space (projections to
the (d1, d3, σ1)-subspace (left) and (σ3, d3, σ1)-subspace (right), respectively). 91
126
31 The solutions of Problem (6.70,6.71) as computed by the BL-Recovering-IS
algorithm are locally but not necessarily globally Pareto optimal. . . . . . . 95
32 Convergence of the smoothing method (parameter space) . . . . . . . . . . 96
33 Convergence of the smoothing method (lower level objective space) . . . . . 96
34 Convergence of the smoothing method (higher level objective space) . . . . 97
35 The Pareto set of Problem (6.72, 6.73) in higher level image space (left)
and lower level image space (right) for different parameters λ......... 99
36 The solution of the non-convex BLMOP in parameter space. . . . . . . . . . 104
37 The solution of the non-convex BLMOP in higher level objective space. . . 104
38 Comparison of the different estimations for ∆(x?,1)..............113
39 Adaptive target construction starting at an arbitrary Pareto point (left)
and at the Pareto point corresponding to the individual minimum of F1
(right)........................................119
40 The solution in parameter space (left) and image space (right) for fixed
target parameter =1. .............................120
41 The solution in parameter space (left) and image space (right) for adaptive
target parameter ˆrelated to δ= 0.15......................121
127
Index
(k1)-dimensional manifold, 22
3-layered material, 88
F(G,H)-nondominated, 7
QR-decomposition, 30
ε-constraint method, 1
m-dimensional box, 21
active constraints, 47, 73, 106
adaptive concept, 23
aspiration levels, 18
auxiliary functions, 18
auxiliary problem, 11, 19, 20, 30
basic sensitivity theorem, 103
basis vectors, 64
basis vectors of tangent space, 64
bi-level multi-objective optimization, 11
bi-level multi-objective optimization prob-
lem, 2
bi-level optimization, 1
bi-level programming, 9
bi-level structure, 2
binary tree, 14
bisection, 16
BL-Recovering-IS algorithm, 52
BL-Recovering-PS algorithm, 52, 62
BL-Subdivision algorithm, 52, 56
BL2-Recovering-IS algorithm, 92
BL2-Subdivision algorithm, 101
BLMOP, 11, 52
boundary, 17, 53
of solution set, 81
box, 14, 40, 77
center, 14, 16
center of, 64
collection, 62, 77, 78
covering, 14, 79
neighbor of, 23
radius, 14, 16
size, 24
CHIM, 34
compact domain, 14
complementarity equations, 41
complementarity slackness, 106
complete partition, 16
complete set of alternatives, 3, 13, 22, 34,
35
component wise product, 53
computational effort, 20, 39, 79
computational time, 83
consistent grid of points, 21
constraint qualifications, 41
constraint surface, 82
constraints, 6, 9, 73, 81, 94
active, 106
continuation method, 15
continuously differentiable, 105
convex combinations, 34
convex hull of individual minima, 34
convex objectives, 35, 81, 85
corrector, 64
corrector step, 81
covering, 52
curvature, 22
decision maker, 20, 30, 39
decision making, 1, 13, 21
density of Pareto points, 39
depth, 39
derivative-free algorithms, 77
derivatives, 22, 30, 53, 77
design variables, 89
diameter, 14
differentiable, 105
differentiable objectives, 30
dimension
of image space, 21, 30
of parameter space, 21, 30
of PSCMOP, 79
of search space, 64
direction of descent, 6, 8
distance function, 18, 21
electromagnetic radiation, 88
expanded test box, 77
explicit methods, 44
explicitly defined inequality, 94
F-nondominated, 7
feasible point, 41
feasible set, 1, 19, 82
first derivatives, 53
Fischer-Burmeister function, 45, 50
follower, 10
GAIO, 4
Gauß-Newton method, 56
global nature of PSC-Sampling algorithm,
78
global Pareto points, 79
global Pareto set, 78, 79
128
gradient, 17, 30, 34, 46, 47, 105
Grahm-Schmidt method, 22
Hadamard product, 53
Hessian, 32, 39, 46, 72, 75, 80, 105
heuristic, 23
hierarchical concept, 40
hierarchical optimization, 9
hierarchy, 1
higher level function, 77
higher level image space, 82
higher level problem, 1, 9, 81
ill-conditioned auxiliary problem, 23
image of a function, 20
image set-oriented method, 20
image set-oriented methods, 3
image space, 20, 56, 79
dimension of, 21, 30
imaginary unit, 89
implicit methods, 44
inequality constraints, 94
injective, 36
inner point, 50, 75
isolated solution, 33, 106
iteration, 62, 78
iteration scheme, 16, 20, 30
Kuhn-Tucker, 6, 8, 11, 34, 45
Lagrangian, 46, 105
leader, 10
linear independence constraint qualifications,
41
linearly independent, 46, 73, 105
Lipschitz constant, 109
Lipschitz continuous, 109
local nature of PSC-SamRec algorithm, 79
local Pareto point, 7
locally nondominated, 78
lower level Lagrangian, 46
lower level objectives, 52
lower level problem, 1, 9, 73, 75
lower level system, 81
Mangasarian-Fromowitz constraint qualifica-
tions, 41
manifold, 8, 22
mathematical program with complementar-
ity constraints, 44
mathematical program with equilibrium con-
straints, 41
maximal rank, 48, 74
maximum number of iterations, 62
memory requirement, 14
merit function, 41
minimization, 12, 72, 114
MOP, 1, 6
parametrized, 11
multi-layer shielding material, 88
multi-level subdivision scheme, 14, 20
multi-level subdivision structure, 3
multi-objective optimization problem, 1, 6,
7
pareto set constrained, 71
multipliers, 81
NBI method, 30, 34
necessary optimality conditions, 31, 32, 71
neighborhood, 22, 105
neighboring box, 23, 64
non-convex objectives, 90
non-smooth methods, 42
non-unique lower level solutions, 10
nondominance test, 18, 77
nondominated, 7, 77
locally, 78
nondominated points, 77, 82
norm, 18
normal boundary intersection method, 1, 30,
34
objective derivatives, 22
objectives
non-convex, 90
one-to-one mapping, 36
optimality conditions, 6, 32, 71, 81
optimistic formulation, 10
optimization over the efficient set, 3, 55
orientation, 22
orthogonal basis, 22
parameter set-oriented methods, 3
parameter space, 20
dimension of, 21, 30
parameters, 20
parametric optimization, 9
parametric optimization problem, 103
parametrized MOP, 11
Pareto optimal, 7, 90
globally, 90
locally, 20, 90
Pareto point, 1, 7, 18, 22
global, 79
local, 34
robust, 55
Pareto set, 1, 11, 12, 52
boundary of, 53
approximation of, 22
129
curvature of, 22
entire, 31
global, 78, 79
orientation of, 22
well-distributed, 31
Pareto set constrained multi-objective opti-
mization problems, 45
Pareto set constrained multi-objective opti-
mization problems, 71
Pareto-optimistic formulation, 12
partial order, 6, 12, 72
partition, 16
pay-off matrix, 34
perturbation parameter, 105
perturbed Fischer-Burmeister function, 42
pessimistic formulation, 10
physical constants, 89
positive definite, 32, 80
pre-optimization, 27
predictor-corrector concept, 77
predictors, 30
preferred solution, 3, 55
PSC-Recovering algorithm, 75
PSC-Sam-Rec algorithm, 77
PSC-Sampling algorithm, 76
global nature of, 78
PSC-SamRec algorithm, 76
local nature of, 78, 79
PSC-Subdivision algorithm, 75
PSCMOP, 72
dimension of, 79
Recovering algorithm, 101
Recovering-IS algorithm, 21
reduced model, 27
reference point, 18
reference point method, 18, 30
regularity assumptions, 11
representatives, 82
robust approximation, 79
robust Pareto points, 55
robustness, 55, 88
Sampling algorithm, 101
second, 53
self-optimizing systems, 2
sensitivity, 55, 56, 103
sensitivity analysis, 22
sensitivity theorem, 103
sequential quadratic programming, 43
set-oriented algorithms, 14
set-oriented methods, 1, 3
slack variables, 52
smooth objectives, 22
smoothing functions, 42
smoothing methods, 41
smoothing parameter, 42
standard basis, 46
starting iterations, 20
starting points, 64
strictly convex, 35, 38
strictly monotonically increasing, 19
strong formulation, 10
subboxes, 40
subdivision, 78
subdivision depth, 101
subdivision size, 16
submatrix, 48, 74
substationary points, 8, 15, 64, 79
surjective, 36
symbolic derivatives, 77
symmetric positive definite, 74
system of equations, 81
system of nonlinear equations, 49
tangent space, 22, 30, 64
basis vectors of, 64
target, 18, 20, 30, 38
target vector, 18, 22, 32
termination criterion, 30
test box, 77
test points, 14, 17, 77, 79
third derivatives, 53
tight covering, 52
trade-off, 34
triangular block structure, 48, 74
two-step nondominance test, 90
uniqe multipliers, 106
upper level problem, 9, 73
vector function, 106
weak formulation, 10
weighted sums method, 1, 13, 30, 31
weighting vectors, 31
weights, 30
well-distributed, 21
well-distributed Pareto points, 18, 20, 24, 31
130