
Surrogate Optimization with Algebraic Notes and Applications within the Electromagnetics Context

submitted by

M.Sc.

Mirsad Hadžiefendić

to Faculty IV - Electrical Engineering and Computer Science

of Technische Universität Berlin

to obtain the academic degree of

Doktor der Ingenieurwissenschaften

- Dr.-Ing. -

approved dissertation

Doctoral committee:

Chair: Prof. Dr. Jürgen Bruns (TU Berlin, Faculty IV)

Reviewer: Prof. Dr. Rolf Schuhmann (TU Berlin, Faculty IV)

Reviewer: Prof. Dr. Fredi Tröltzsch (TU Berlin, Faculty II)

Reviewer: Jun.-Prof. Dr. Ulrich Römer (TU Braunschweig)

Date of the oral defense: 14 July 2021

Berlin 2022


Declaration of Authorship

I, Mirsad Hadžiefendić, M.Sc., declare that this thesis, titled “Surrogate Optimization with Algebraic Notes and Applications within the Electromagnetics Context”, and the work presented in it are my own. I confirm that:

• This work was done wholly or mainly while in candidature for a research degree at this University.

• Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated.

• Where I have consulted the published work of others, this is always clearly attributed.

• Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.

• I have acknowledged all main sources of help.

• Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.

Signed:

Date:

“Ima jedna modra rijeka – valja nama preko rijeke.”

(“There is a blue river – we must cross the river.”)

Mehmedalija Mak Dizdar


TECHNISCHE UNIVERSITÄT BERLIN

Abstract

Fakultät Elektrotechnik und Informatik

Institut für Hochfrequenz- und Halbleiter-Systemtechnologien

Fachgebiet Theoretische Elektrotechnik

Doktor der Ingenieurwissenschaften (Dr.-Ing.)

Surrogate Optimization with Algebraic Notes and Applications within the Electromagnetics Context

by Mirsad Hadžiefendić, M.Sc.

This thesis deals with surrogate optimization for applications within the electromagnetics context. Within this context, the magnetoquasistatic model of Maxwell’s theory is discussed in particular. Moreover, relevant aspects of the numerical simulation and numerical optimization of the magnetoquasistatic model are examined.

The key notion of surrogate optimization is elaborated thoroughly and partitioned into three sub-notions: (1) surrogate modeling & simulation, (2) surrogate-based optimization, and (3) surrogate-guided optimization. The various notions of surrogate optimization are tagged with algebraic notes in order to anticipate the toolset of the formal language of category theory. Moreover, the capability of the category theory toolset to serve as an algebraic modeling framework for applications in surrogate optimization is investigated.

Finally, representatives of the class of inductive components are considered, and the surrogate optimization tools of the present work are applied to four high-fidelity optimization problems, which are embedded within the settings of two-dimensional and three-dimensional linear boundary value problems. Concerning these optimization problems, some promising opportunities for a useful application of the category theory toolset are highlighted.

From a bird’s-eye view, this thesis makes some progress in the scientific thicket of the full automation of the virtual prototyping of power electronic systems.

From a frog’s-eye view, i.e., at a more technical level, some of the present work’s achievements concern hybrid model management strategies of surrogate-guided optimization methods, the repercussions of the choice of a sampling plan on these methods, and formalization issues regarding surrogate optimization with multiple low-fidelity models.


Kurzzusammenfassung


This work deals with optimization using surrogate models for applications within electromagnetic field theory. Within this context, the work addresses in particular the magnetoquasistatic model of Maxwell’s equations. In addition, relevant points regarding the numerical simulation and optimization of the magnetoquasistatic model are discussed.

The key notion “surrogate optimization” is elaborated in detail; within the scope of this work, it is partitioned into three sub-notions: (1) “surrogate modeling & simulation”, (2) “surrogate-based optimization”, and (3) “surrogate-guided optimization”. Note that the various notions regarding surrogate optimization are tagged with algebraic notes in order to anticipate the mathematical toolbox of the formal language of category theory. In particular, the possibilities of using the category theoretical toolbox as an algebraic modeling environment for applications within surrogate optimization are investigated.

Finally, representatives of the class of inductive components are presented, and the tools for surrogate optimization introduced in this work are applied to four high-fidelity optimization problems. These optimization problems are embedded within the settings of two-dimensional and three-dimensional linear boundary value problems. Regarding these optimization problems, some promising opportunities for a useful application of the category theoretical toolbox are highlighted.

From a bird’s-eye view, this dissertation makes some progress in the scientific thicket of the full automation of the virtual prototyping of power electronic systems.

From a frog’s-eye view, i.e., at a more technical level, some of the achievements of this work concern hybrid model management strategies of surrogate-guided optimization methods, the repercussions of the choice of the sampling plan on these methods, and formalization issues regarding surrogate optimization in the case of multiple low-fidelity models.

Acknowledgements

I sincerely thank Prof. Dr. Rolf Schuhmann for his mentorship, advice, and open-mindedness. I also thank all my colleagues in the group Theoretische Elektrotechnik for a pleasant and inspiring working environment. Special thanks go to Marcus Christian Lehmann, Albert Piwonski, and Rodrigo Silva Rezende, with whom I have shared many pursuits of knowledge.

I thank Prof. Dr. Fredi Tröltzsch and Jun.-Prof. Dr. Ulrich Römer for the time and energy they invested in providing the second and third reviews of the present work, respectively. I would also like to thank Prof. Dr. Jürgen Bruns for chairing the doctoral committee.

Finally, I thank all my friends and my whole family for their support. My greatest

and warmest thanks are due to my parents, Mensur and Abasa, and my brothers,

Admir and Emir.


Contents

Declaration of Authorship

Abstract

Kurzzusammenfassung

Acknowledgements

1 Introduction

1.1 A bigger picture: The ideal long-term goal

1.2 Glimpse at the details: Surrogate optimization

1.3 Setting a horizon: The research scope & goals

2 Magnetoquasistatic Maxwell’s theory – Modeling, simulation, and optimization

2.1 Magnetoquasistatic Model of Maxwell’s theory

2.1.1 The fundamental problem statement of electromagnetism

2.1.2 The system of Maxwell’s equations

2.1.3 The magnetoquasistatic subsystem & the magnetostatic subsystem

2.2 Numerical simulation of the magnetoquasistatic model

2.2.1 The weak formulation

2.2.2 Numerical approximation

2.2.3 Parametric mathematical model

2.3 Numerical optimization with the magnetoquasistatic model

2.3.1 Optimization with a partial differential equation

2.3.2 Nonlinear optimization problem

2.3.3 Optimization algorithms

2.4 In closing

3 Surrogate optimization

3.1 Surrogate modeling & simulation

3.1.1 An abstract setting

3.1.2 Deterministic and probabilistic data-fit low-fidelity models

3.1.3 Simplified-physics low-fidelity models

3.2 Surrogate-based optimization

3.2.1 Optimization with test functions by data-fit low-fidelity models

3.2.2 Optimization with test functions by emulated simplified-physics low-fidelity models

3.3 Surrogate-guided optimization

3.3.1 Sequential kriging optimization

3.3.2 Optimization within the space-mapping paradigm

3.3.3 Co-kriging optimization

3.4 In closing

4 An algebraic modeling framework using the category theoretical language for applications in surrogate optimization

4.1 Recapitulating and enlarging the contextual landscape

4.1.1 Recapitulating & the context of full automation of SGO

4.1.2 Relevant related work

4.2 Category theory toolset

4.2.1 Fostering some intuition

4.2.2 Applying some rigor

4.2.3 Computational facets

4.3 Using the CT toolset for SGO methods

4.3.1 Specifying a general optimization problem

4.3.2 Specifying surrogate-guided optimization methods

4.4 Use cases of the CT toolset within the electromagnetics context

4.4.1 Use case #1: Simplified-physics low-fidelity models

4.4.2 Use case #2: Coordinate transformations

4.5 Future use cases for the CT toolset

4.6 In closing

5 Surrogate optimization with the magnetoquasistatic model

5.1 Solenoid with a core

5.1.1 Preliminary consideration

5.1.2 Optimization problem I

5.1.3 Optimization problem II

5.2 Common-Mode Choke

5.2.1 Preliminary consideration

5.2.2 Optimization problem I

5.2.3 Optimization problem II

5.3 In closing

6 Conclusion and outlook

6.1 Conclusion

6.2 Outlook

A Multivariate polynomials (§ 3.1.2)

A.1 Reparametrization using mean-centered arguments

A.2 Bernstein polynomials

A.3 Chebyshev polynomials

B Solenoid with a core (§ 5.1)

B.1 An electrical network viewpoint

B.2 A visualization of evaluated data-fit low-fidelity models regarding (5.12), (5.14c), and (5.15)

Bibliography

List of Figures

1.1 An inductive component in various representations.

1.2 A schematic depiction of a low-fidelity model.

1.3 Illustrating two generic devices under test.

1.4 A schematic orientation aid for the present work.

2.1 Illustrations of a magnetoquasistatic subsystem’s domain.

2.2 Representations of the six test functions in Table 2.1.

2.3 Representations of the six test functions in Table 2.1 (highlighting the neighborhood of the global minimum).

2.4 Depicting $\operatorname{grad}(f)(x_1,x_2)$ with $(x_1,x_2)\in U$ as a projection on the contour representation of the six test functions in Table 2.1.

2.5 Depicting $s_i(f,(x_1,x_2))$ from (2.49) with $i\in\{1,2\}$ and $(x_1,x_2)\in U$ w.r.t. the test functions in Table 2.1.

3.1 Representations of different sampling plans (part one).

3.2 Representations of different sampling plans (part two).

3.3 The Audze-Eglais LHC sampling plan in Figure 3.2 adapted for the contour representation in Figure 2.2b.

3.4 Sampling plan $X_s$ (condition number $\kappa(B^T B)$).

3.5 Bases under consideration for the space $\mathcal{P}^1_{\le 2}$: (i) monomial basis, (ii) Bernstein basis, (iii) Chebyshev basis.

3.6 The six radial basis functions in Table 3.2.

3.7 A schematic depiction of a user-prescribed hierarchy of generic boundary value problems.

3.8 A schematic depiction of a user-prescribed hierarchy of magnetoquasistatic and magnetostatic problems.

3.9 Using the Sobol quasi-random sequence sampling plan in Figure 3.2, a 2-variate monomial polynomial model $p(x)\in\mathcal{P}^2_{\le 2}$ via regression of the test functions in Figure 2.2.

3.10 Using the Sobol quasi-random sequence sampling plan in Figure 3.2, a radial basis function $\varphi = r\mapsto\varphi(r)$ with thin plate spline assignment via interpolation of the test functions in Figure 2.2.

3.11 Using the Sobol quasi-random sequence sampling plan in Figure 3.2, a kriging low-fidelity model via interpolation of the test functions in Figure 2.2.

3.12 The value of $e^{N}_{cv}$ and $e^{NR}_{cv}$ evaluated at the number of folds $k$ and the number of testing points $m_g$ w.r.t. the test functions in Figure 2.2 with normalized values.

3.13 The value of $e^{N}_{cv}$ evaluated at the number of folds $k$ and the number of testing points $m_g$ w.r.t. the test functions in Figure 2.2 with normalized values.

3.14 The value of $r^2_{y\tilde{y},cv}$ evaluated at the number of folds $k$ and the number of testing points $m_g$ w.r.t. the test functions in Figure 2.2.

3.15 The value of $S^{N}_{y,i}$ with $i\in\{1,2\}$ evaluated at $f$ w.r.t. the test functions in Figure 2.2 and the number of training points $m_t$.

3.16 Emulated simplified-physics low-fidelity models for the modified Branin test function in Figure 2.2 via the assignment rule in (3.105).

3.17 Depicting $\operatorname{grad}(\tilde{K})(x_1,x_2)$ as a projection on the contour representation of the emulated simplified-physics low-fidelity models for the modified Branin test function in Figure 3.16.

4.1 A schematic depiction of a three-dimensional helical coil of 5 turns in xyz-coordinates and in uvw-coordinates.

5.1 A representation of a solenoid with core within FEMM4.2.

5.2 Contour representation of $\tilde{j}_{P_L,\omega_i}$ and $\tilde{Q}_{L,\omega_i}$ in (5.31) for various frequencies.

5.3 A representation of a common-mode choke within FEMM4.2.

5.4 A prototypical version of a simplistic EMC filter.

5.5 A representation of two common-mode chokes within FEMM4.2.

5.6 Contour representation of $\tilde{Q}_{L,\omega_0}$ in (5.47).

A.1 Instances of a Chebyshev grid as a sampling plan $X_s\subset(X^2)^{m_1\times m_2}$.

B.1 Circuit diagram representation of the three fundamental passive electrical components.

B.2 Two representatives from the equivalence class of circuit diagrams for real inductive components.

B.3 The magnitude and the phase of the impedances associated with the representatives in Figure B.2.

B.4 By using the Sobol quasi-random sequence sampling plan with $m := 21$ and the data-fit low-fidelity models in § 3.2.1, representations of $\tilde{j}_{P_L,\omega_0}(x_1,x_2)$ and $\tilde{V}_{ut}(x_1,x_2)$, where $\omega_0 := 2\pi\cdot 100\,\mathrm{kHz}$.

B.5 By using the Sobol quasi-random sequence sampling plan with $m := 21$ and the data-fit low-fidelity models in § 3.2.1, representations of $\tilde{j}_{P_L,V_{ut},\omega_0}(x_1,x_2)$ and $\tilde{Q}_{L,\omega_0}(x_1,x_2)$, where $\omega_0 := 2\pi\cdot 100\,\mathrm{kHz}$.

B.6 Depicting $\operatorname{grad}(\tilde{K})(x_1,x_2)$ as a projection on the contour representation of the data-fit low-fidelity models for the functions in Figure B.4.

B.7 Depicting $\operatorname{grad}(\tilde{K})(x_1,x_2)$ as a projection on the contour representation of the data-fit low-fidelity models for the functions in Figure B.5.


List of Tables

2.1 Test functions of form $f = (x_1,x_2)\mapsto f(x_1,x_2)\colon U\to\mathbb{R}$.

2.2 The normalized global first-order sensitivity measure $S^N_i$ with $i\in\{1,2\}$ evaluated at $f$ w.r.t. Figure 2.2b.

2.3 The normalized global first-order sensitivity measure $S^N_i$ with $i\in\{1,2\}$ evaluated at $f$ w.r.t. Figure 2.3b.

2.4 The normalized global first-order sensitivity measure $S^N_i$ evaluated at $f_{RN_\xi}$ w.r.t. the domain $[-2.0,2.0]^{N_\xi}$ with $N_\xi\in\{2,3,4,5,6,7\}$.

2.5 Exemplary check of the Ackley function’s global optimum.

3.1 Given some pairs $(k,d)\in\mathbb{N}^2$, the dimension of $\mathcal{P}^d_{\le k}$ and $\mathcal{P}^d_k$.

3.2 Given a generic radial basis function $\varphi = r\mapsto\varphi(r)$ with function signature $\mathbb{R}_+\to\mathbb{R}$, six different definitions for the assignment $\varphi(r)$.

3.3 The normalized mean generalization error and the mean SSPCC within the k-fold cross validation method w.r.t. a 2-variate monomial polynomial.

3.4 The normalized mean generalization error and the mean SSPCC within the k-fold cross validation method w.r.t. a radial basis function with thin plate spline assignment.

3.5 The normalized mean generalization error and the mean SSPCC within the k-fold cross validation method w.r.t. a kriging low-fidelity model.

3.6 The normalized global first-order sensitivity measure $S^N_{y,i}$ with $i\in\{1,2\}$ evaluated at $f$ w.r.t. the 2-variate monomial polynomial in Figure 3.9b.

3.7 The normalized global first-order sensitivity measure $S^N_{y,i}$ with $i\in\{1,2\}$ evaluated at $f$ w.r.t. the radial basis function with thin plate spline assignment in Figure 3.10b.

3.8 The normalized global first-order sensitivity measure $S^N_{y,i}$ with $i\in\{1,2\}$ evaluated at $f$ w.r.t. the kriging low-fidelity model in Figure 3.11b.

3.9 The normalized mean generalization error and the mean SSPCC within the k-fold cross validation method w.r.t. a kriging low-fidelity model of a generalized version of the Rosenbrock test function in Table 2.1 (without normalized values).

3.10 The normalized global first-order sensitivity measure $S^N_i$ evaluated at $f_{RN_\xi}$ from (2.52) w.r.t. the domain $[-2.0,2.0]^{N_\xi}$ with $N_\xi\in\{2,3,4,5,6,7\}$.

3.11 The low-fidelity models’ normalized global first-order sensitivity measures (LFSM) error in (3.37) evaluated at $f_{RN_\xi}$ from (2.52) w.r.t. the domain $[-2.0,2.0]^{N_\xi}$ with $N_\xi\in\{2,3,4,5,6,7\}$.

3.12 Surrogate-based optimization w.r.t. the modified Branin function using data-fit low-fidelity models.

3.13 The choice of the 4-tuple of parameters $(\alpha,\beta,\gamma,\delta)$ in Figure 3.16.

3.14 The normalized mean generalization error and the mean SSPCC within the k-fold cross validation method w.r.t. emulated simplified-physics low-fidelity models in Figure 3.16a.

3.15 The normalized mean generalization error and the mean SSPCC within the k-fold cross validation method w.r.t. emulated simplified-physics low-fidelity models in Figure 3.16b.

5.1 The rough heuristic estimate of the scale of the winding losses in (5.13) for various operating frequencies $f_0$.

5.2 (Ia) $S^N_i$ with $i\in\{1,2\}$ evaluated at (a) $f\equiv\tilde{j}_{P_L,\omega_0}$ and (b) $f\equiv\tilde{V}_{ut}$ w.r.t. Figure B.6; (Ib) given $m_j := 50$, LFSM error $e_{m_j}(S^N_{y,i})$ w.r.t. (Ia); (IIa) $S^N_i$ with $i\in\{1,2\}$ evaluated at (a) $f\equiv\tilde{j}_{P_L,V_{ut},\omega_0}$ and (b) $f\equiv\tilde{Q}_{L,\omega_0}$ w.r.t. Figure B.7; (IIb) given $m_j := 50$, LFSM error $e_{m_j}(S^N_{y,i})$ w.r.t. (IIa).

5.3 The mean SSPCC $r^2_{y\tilde{y}\mid k:=5}$ within the k-fold cross validation method w.r.t. the simplified-physics low-fidelity model regarding the entities in (5.17).

5.4 The log data in an abridged version w.r.t. the setting of the log data in (5.19) for different operating frequencies.

5.5 The log data in an abridged version w.r.t. the setting of the log data in (5.19) for the operating frequency $1\times10^{8}\,\mathrm{Hz}$ and different initial points.

A.1 The condition number w.r.t. a sampling plan from Figure 3.4 without and with reparametrization in (A.2).

A.2 The trace of $\frac{1}{m}(B^T B)$ and the trace of $\frac{1}{m}(B_\varsigma^T B_\varsigma)$.

A.3 The condition number $\kappa(B^T B+\lambda I)$ and $\kappa(B_\varsigma^T B_\varsigma+\lambda I)$.


List of Abbreviations

1D One-dimensional

2D Two-dimensional

3D Three-dimensional

ADE Adaptive Differential Evolution

AMMO Approximation and Model Management Optimization

BVP Boundary Value Problem

CCC Cartesian Closed Category

CM Common-Mode

CMC Common-Mode Choke

COBYLA Constrained Optimization by Linear Approximation

CT Category Theoretical/Category Theory

DIRECT Dividing Rectangles

DM Differential-Mode

DoFF Degree of Forgetfulness

EMC Electromagnetic Compatibility

FE Finite Element

GA Genetic Algorithm

GPU Graphics Processing Unit

KKT Karush-Kuhn-Tucker

L-BFGS Limited-memory Broyden-Fletcher-Goldfarb-Shanno

LBVP Linear Boundary Value Problem

LFSM Low-Fidelity Models’ Normalized Global First-Order Sensitivity Measures

LHC Latin Hypercube

LU Lower-Upper

MEA Modified Evolutionary Algorithm

MLE Maximum Likelihood Estimate

MM Manifold Mapping

MMA Method of Moving Asymptotes

MS Moore-Skelboe

NEGE Normalized Empirical Generalization Error

NLBVP Non-Linear Boundary Value Problem

NMS Nelder-Mead Simplex

NREGE Normalized Root Empirical Generalization Error

PDE Partial Differential Equation

PL Programming Language

PS Particle Swarm

RPM Response and Parameter Mapping

S- Scattering

SGO Surrogate-guided Optimization

SM Space Mapping

SQP Sequential Quadratic Programming

SSPCC Squared Sample Pearson Correlation Coefficient

SVD Singular Value Decomposition

TPS RBF Thin Plate Spline Radial Basis Function

TRASM Trust Region Aggressive Space Mapping

UMP Universal Mapping Property

WPBVP Well-Posed Boundary Value Problem


Physical & Mathematical Constants

speed of light in vacuum $c_0 = 2.99792458\times10^{8}\,\mathrm{m\,s^{-1}}$

vacuum magnetic permeability $\mu_0 = 4\pi\times10^{-7}\,\mathrm{H\,m^{-1}}$

vacuum electric permittivity $\varepsilon_0 = 1/(\mu_0 c_0^2)$

pi $\pi = 3.1415926535897\ldots$
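The permittivity entry is derived from the other two constants rather than listed as a number. As an illustrative addition (not part of the original list), a short Python check evaluates it numerically, assuming the value $\mu_0 = 4\pi\times10^{-7}\,\mathrm{H\,m^{-1}}$ listed above:

```python
import math

# Constants as listed above; mu_0 uses the classical value 4*pi*1e-7 H/m (an assumption
# taken from the list, not the measured post-2019 SI value).
c0 = 2.99792458e8           # speed of light in vacuum, m/s
mu0 = 4.0 * math.pi * 1e-7  # vacuum magnetic permeability, H/m

# Vacuum electric permittivity eps_0 = 1 / (mu_0 * c_0^2), in F/m.
eps0 = 1.0 / (mu0 * c0**2)

print(f"eps0 = {eps0:.10e} F/m")  # eps0 = 8.8541878176e-12 F/m
```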


List of Symbols (Selection)

$J$ electric current flux density

$\mu$ magnetic permeability

$\sigma$ electric conductivity

$\omega$ angular frequency

$A$ magnetic vector potential

$\varphi$ electric scalar potential

$I_0$ fixed current intensity

$P_L$ time-averaged ohmic loss

$W_m$ time-averaged magnetic energy

$R$ resistance

$L$ inductance

$f_0$ operating frequency

$\omega_0$ operating angular frequency

$V_{ut}$ volume under test

$\mathbb{R}$ set of real numbers

$x$ space variable

$\Omega$ space region

$\partial\Omega$ boundary of $\Omega$

$f\colon A\to B$ domain $A$ and codomain $B$ of function $f$ with the signature $A\to B$

$f = x\mapsto f(x)$ function $f$ with the assignment rule $x\mapsto f(x)$

$\operatorname{div}$, $\operatorname{grad}$, $\operatorname{curl}$ differential operators: divergence, gradient, and curl

$\forall$ universal quantifier

$\exists$ existential quantifier

$V$ Hilbert space

$\|\cdot\|_V$ appropriate norm on $V$

$Q$ quantity of interest

$h$ mesh size parameter

$T_h$ simplicial triangulation

$u_h$ discrete solution

$N_\xi$ number of parameters

$\xi$ parameter point

$f$ parametric solution function

$Q^{\xi}_{\xi}$ reduced parametric quantity of interest

$j$ reduced objective function

$\circ$ composition operator

$s_i$ local first-order sensitivity measure w.r.t. component $i$

$S^N_i$ normalized global first-order sensitivity measure w.r.t. component $i$

$d$ dimensionality

$e_H(K)$ high-fidelity function approximation error

$s$ sample

$X_s$ sampling plan

$m$ sample size

$e_{H,s}(\hat{Q}^{\xi}_{\xi})$ empirical surrogate modeling error

$e^N_{H,s_g}(\hat{Q}^{\xi}_{\xi})$ normalized empirical generalization error

$r^2_{y\tilde{y}}$ squared sample Pearson correlation coefficient

$\mathcal{P}^d_{\le k}$ space of $d$-variate polynomials of total degree at most $k$

$\mathcal{N}_m(y\mid\mu_y,\Sigma)$ probability density of an $m$-dimensional Gaussian distribution at $y$

$\Psi$ correlation matrix

$C$ covariance matrix

$L_{\ln}$ ln-likelihood function

$\Delta_{Ax=b}$ threshold for a termination criterion of an iterative solver

$P$ domain-oriented correction map

$\Delta^{(k+1)}$ $(k+1)$-th iteration trust-region radius

$x^*$ optimal solution of the high-fidelity optimization problem

$X$ object

$g$ morphism

$\mathcal{A}$ category

$F$ functor

$\alpha$ natural transformation

$S_{ij}$ $(i,j)$-th scattering parameter

$N$ number of turns of a winding

$m_{\mathrm{SGO,sm}}$ number of high-fidelity function evaluations (space mapping)

$m_{\mathrm{SGO,ck}}$ number of high-fidelity function evaluations (co-kriging)

$m_w$ number of operating frequencies


Dedicated to my parents, Mensur & Abasa

Chapter 1

Introduction

In this chapter, we encounter the background, the scope, and the research goals of the present dissertation:

• First, I discuss the bigger picture, more precisely, the ideal long-term goal, which originates in the engineering domain of power electronics and shapes the starting point and the direction of the research project.

• Second, I sketch the general path to which the dissertation contributes, that is, the development and application of surrogate modeling, simulation, and optimization methods primarily in the realms of magnetostatics and magnetoquasistatics within electromagnetic field theory.

• Third, I conclude the chapter by providing the path-dependent research goals that guide the remainder of the work.¹

1.1 A bigger picture: The ideal long-term goal

Implicitly or explicitly, every research project has an ideal long-term goal, which helps to establish the investigation’s concrete context and the actual research goals. This thesis’ ideal long-term goal is the full automation of the virtual prototyping of power electronic systems. Let me briefly elucidate this goal.

The domain of power electronics is concerned with the control and the conversion of electrical energy by means of fast-switching semiconductor components (see, e.g., [152], [89], [60]); two representatives of the broad class of power electronic systems are three-phase rectifiers and electromagnetic compatibility (EMC) filters (see Fig. 1.1).² Given this domain, my notion of “full automation” is that an ideal software system processes a user’s input specifications and outputs an appropriate power electronic system – without any further intervention by the user. Finally, I understand the term “virtual prototyping” as a proxy for “mathematical modeling, numerical simulation & optimization”.

Note that full automation is still far away, but there has already been prolific research regarding the virtual prototyping of power electronic systems (see, e.g., [222], [220], [41]). The corresponding real-world engineering optimization problems span different levels of design complexity: from the materials’ design via the

¹ Notice that the present dissertation’s typesetting builds upon a free and open-source LaTeX typesetting template provided by LaTeX Templates (see [162]).

² Note that, for drawing the figures in the present dissertation, I use the free and open-source vector graphics editor Inkscape (see version 1.0.1 at https://inkscape.org/), the free vector graphics editor Ipe (see version 7.2.20 at http://ipe.otfried.org/), and the free and open-source 3D computer graphics software Blender (see version 2.91.0 at https://www.blender.org/).



FIGURE 1.1: An inductive component in various representations: (a) in a circuit diagram (figure from [154, p. 7]), (b) in a real-world EMC filter (source: Fraunhofer-Institut für Zuverlässigkeit und Mikrointegration IZM), (c) in the 3D simulation tool CST Studio Suite®³, and (d) in the 2D simulation tool FEMM4.2 (see [149]).

components’ design to the systems’ design. Note that these problems involve intricate interactions between various physical domains such as electromagnetics, fluid dynamics, or structural mechanics; additionally, they involve several conflicting objectives such as performance, cost, or efficiency. Hence, properly formalizing and efficiently solving these problems are challenging tasks.

To date, the reported optimization procedures predominantly utilize concepts from the areas of multidisciplinary design optimization (see, e.g., [2], [147]) and multiobjective optimization (see, e.g., [150], [146]):

Multiobjective optimization. If multiple objective functions are taken into account, then multiobjective optimization – or vector optimization – expresses an optimal design by the notion of Pareto optimality, i.e., a design is Pareto-optimal if any improvement in one objective inevitably leads to a degradation in at least one other objective. Common multiobjective optimization techniques include a transformation of the multiple objectives into a single objective, for instance, by the weighted sum method: first, the objectives are multiplied by weights (non-negative numbers that add up to one); next, the weighted objectives are summed up. An immediate complication is the need to select a specific combination of weights, which reflects a specific preference among the objectives.
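The notion of Pareto optimality and the weighted sum method described above can be illustrated with a minimal Python sketch. It assumes that every objective is to be minimized; the design names and the objective values (loss, cost) are hypothetical and serve only to show how the choice of weights selects different Pareto-optimal designs.

```python
import math
from typing import Dict, List, Sequence, Tuple

Objectives = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` Pareto-dominates `b` (all objectives minimized): `a` is no worse
    in every objective and strictly better in at least one."""
    no_worse = all(ai <= bi for ai, bi in zip(a, b))
    strictly_better = any(ai < bi for ai, bi in zip(a, b))
    return no_worse and strictly_better


def pareto_front(designs: Dict[str, Objectives]) -> List[str]:
    """Names of all designs that no other design dominates (Pareto-optimal designs)."""
    return [name for name, obj in designs.items()
            if not any(dominates(other, obj) for other in designs.values())]


def weighted_sum(objectives: Sequence[float], weights: Sequence[float]) -> float:
    """Scalarize an objective vector with non-negative weights that add up to one."""
    if len(objectives) != len(weights):
        raise ValueError("exactly one weight per objective is required")
    if any(w < 0.0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and add up to one")
    return sum(w * f for w, f in zip(weights, objectives))


def best_by_weighted_sum(designs: Dict[str, Objectives], weights: Sequence[float]) -> str:
    """Name of the design that minimizes the weighted-sum objective."""
    return min(designs, key=lambda name: weighted_sum(designs[name], weights))


# Hypothetical designs with two objectives to be minimized: (loss, cost).
designs: Dict[str, Objectives] = {
    "A": (1.0, 5.0),
    "B": (2.0, 3.0),
    "C": (3.0, 4.0),  # dominated by B: higher loss and higher cost
    "D": (4.0, 1.0),
}

print(pareto_front(designs))                      # ['A', 'B', 'D']
print(best_by_weighted_sum(designs, (0.7, 0.3)))  # A  (preference for low loss)
print(best_by_weighted_sum(designs, (0.2, 0.8)))  # D  (preference for low cost)
```

With strictly positive weights, every minimizer of the weighted sum is Pareto-optimal; conversely, designs lying on non-convex portions of a Pareto front cannot be reached by any choice of weights, which is one known limitation of this scalarization.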

³ CST Studio Suite® is a proprietary commercial 3D electromagnetic analysis software package by Dassault Systèmes (see version 2019 at https://www.3ds.com/).


Multidisciplinary design optimization. Optimal input variables attained by an optimization using a single physical discipline rarely equal the optimal input variables attained by an optimization using multiple physical disciplines – especially if there are interdependencies between the different disciplines. To address this, multidisciplinary design optimization offers a framework to keep track of the input variables and all the involved output variables. However, one important issue is how to establish compatibility regarding the variables; another important issue is how to choose an adequate architecture, i.e., how to coordinate the analysis of the multiple interdependent physical disciplines. These issues influence the selection of a solution method and the reasoning about the optimal design.

A noteworthy feature of the procedures deployed for the virtual prototyping of power electronic systems is that they utilize different computational and noncomputational models of variable degrees of fidelity, e.g., finite element simulations, closed-form expressions, physical experiments, etc. The areas of surrogate optimization (see, e.g., [70]) and multifidelity optimization (see, e.g., [166]) are dedicated to exploiting such different models for optimization purposes.

Note that, to the best of my knowledge, concepts from surrogate optimization have not yet been exhaustively discussed in the context of virtual prototyping of power electronic systems. However, as elaborated above, the design of real-world power electronic systems is tremendously complex. Therefore, the applications in the present work focus on particular optimization problems concerning inductive components (see Fig. 1.1). Inductive components are significant devices under test since they contribute heavily to the losses of a power electronic system and occupy a considerable share of its volume (cf. [154, p. 2]).

All in all, the surrogate optimization of inductive components constitutes the

starting point and the direction of the research project.

1.2 Glimpse at the details: Surrogate optimization

Surrogate optimization for engineering design problems is a vast research area that

spans several decades of intensive investigations (see, e.g., surveys in [125], [127],

and [126]). At a conceptual level, the notion of surrogate optimization encompasses

three sub-notions:

• surrogate modeling & simulation,

• surrogate-based optimization, and

• surrogate-guided optimization.4

Surrogate modeling & simulation. The basic assumption is that the evaluation

of a given function – aka high-fidelity function or model – is too expensive; hence,

there is a need to approximate this function in a meaningful manner by another

function – aka low-fidelity function or model – whose evaluation costs are, by design,

much lower than those of the high-fidelity model. An example of a high-fidelity

4 Note that I employ the affix “guided” in the term “surrogate-guided”. This term is by no means part of the usual terminology, in which “surrogate-guided” would be a synonym for “surrogate-based”. However, I consider this terminological supplement a helpful tool for a clearer conceptual differentiation of the corresponding mechanisms.

4Chapter 1. Introduction

model is the Joule loss functional computed by a high-order finite element simulation (see, e.g., [227]). An example of a low-fidelity model is a fit to data collected by

sampling the high-fidelity model.
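Here is a minimal sketch of a data-fit low-fidelity model. It assumes a hypothetical one-dimensional stand-in `high_fidelity` for an expensive simulation and uses piecewise linear interpolation as the fitting method (kriging would be a more typical choice). The high-fidelity model is sampled once at a few points. Afterwards, the low-fidelity model is evaluated without calling the high-fidelity model again.

```python
import bisect
import math


def high_fidelity(x):
    # Hypothetical stand-in for an expensive simulation (e.g., an FE solve).
    return math.sin(x) + 0.1 * x ** 2


def build_data_fit(f, a, b, n_samples):
    """Sample f at n_samples equidistant points in [a, b] and return a
    piecewise linear interpolant of the collected data."""
    xs = [a + (b - a) * i / (n_samples - 1) for i in range(n_samples)]
    ys = [f(x) for x in xs]  # the only calls to the expensive model

    def low_fidelity(x):
        # Locate the interval [xs[j], xs[j + 1]] containing x; outside [a, b],
        # the end intervals are extrapolated linearly.
        j = min(max(bisect.bisect_right(xs, x) - 1, 0), n_samples - 2)
        t = (x - xs[j]) / (xs[j + 1] - xs[j])
        return (1.0 - t) * ys[j] + t * ys[j + 1]

    return low_fidelity


low_fidelity = build_data_fit(high_fidelity, 0.0, 2.0, n_samples=8)
test_points = [0.05 * k for k in range(41)]
max_err = max(abs(low_fidelity(x) - high_fidelity(x)) for x in test_points)
print(f"max. deviation on test points: {max_err:.4f}")
```

The standard interpolation error bound h²/8 · max|f″| (here below 0.01) is an example of the kind of error bound mentioned below.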

Immediate issues concern error bounds or error estimates for assessing the quality of the low-fidelity model.

For this sub-notion, the surrogate model is identical to the low-fidelity model, and surrogate simulation means evaluating the low-fidelity model.

Surrogate-based optimization. Assuming that the surrogate model is sufficiently

accurate, the basic idea of surrogate-based optimization is to replace the optimiza-

tion problem regarding the high-fidelity function by an optimization problem re-

garding the low-fidelity function – without any additional interaction with the high-

fidelity function.

Next, the optimal solution of the low-fidelity optimization problem is computed, for instance, by deterministic algorithms such as sequential quadratic programming (SQP) (see, e.g., [158, ch. 18]) or stochastic algorithms such

as the genetic algorithm (GA) (see, e.g., [49, p. 39–43]). Finally, the computed opti-

mal solution is checked within the high-fidelity optimization problem (cf. [31, p. 2]).
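The three steps (construct the low-fidelity model, optimize it, check the result with the high-fidelity model) can be sketched as follows. The sketch assumes a hypothetical high-fidelity function `f_hi` and uses a quadratic interpolant through three samples as the low-fidelity model. The minimizer of the quadratic is available in closed form, so no SQP or GA is needed in this toy setting.

```python
import math


def f_hi(x):
    # Hypothetical high-fidelity objective with a small oscillatory component.
    return (x - 1.3) ** 2 + 0.05 * math.sin(5.0 * x)


# Step 1: build the low-fidelity model from three high-fidelity samples at x = 0, 1, 2.
y0, y1, y2 = f_hi(0.0), f_hi(1.0), f_hi(2.0)
a = (y0 - 2.0 * y1 + y2) / 2.0  # coefficients of q(x) = a x^2 + b x + c
b = y1 - y0 - a
c = y0


def f_lo(x):
    return a * x ** 2 + b * x + c  # interpolates f_hi at x = 0, 1, 2


# Step 2: solve the low-fidelity problem (vertex of the parabola, valid since a > 0).
assert a > 0.0
x_lo = -b / (2.0 * a)

# Step 3: check the candidate with the high-fidelity model.
print(f"surrogate optimum x = {x_lo:.4f}, f_lo = {f_lo(x_lo):.4f}, f_hi = {f_hi(x_lo):.4f}")
```

Here the surrogate optimum is x ≈ 1.30. By contrast, f_hi is strictly convex (its second derivative 2 − 1.25 sin(5x) is positive) and has its minimizer near x ≈ 1.18. Without further interaction with f_hi, the surrogate optimum can therefore serve only as a proxy or as a starting point.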

An issue concerning this sub-notion is the assessment of the computed optimal solution, since the optimal solution of the high-fidelity optimization problem is unknown a priori.

Thus, the low-fidelity optimization problem’s optimal solution is either accepted

as a proxy (to some extent) for the optimal solution of the high-fidelity optimization problem, or it is utilized as a starting point within the high-fidelity optimization problem.

Surrogate-guided optimization. Compared to the previous optimization approach, the key difference is that there is an interaction between the high-fidelity optimization problem and the low-fidelity optimization problem. During the search

for the optimal solution of the high-fidelity optimization problem, the role of the

low-fidelity function is to speed up the search, whereas the role of the high-fidelity

function is to ensure convergence of the search.
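One common way to realize such an interaction is a trust-region-type loop with a first-order corrected low-fidelity model. The following sketch assumes hypothetical one-dimensional models `f_hi` and `f_lo`. It uses an additive correction that makes the surrogate match `f_hi` in value and slope at the current iterate, and it approximates slopes by central finite differences. It is a simplified illustration, not a faithful implementation of any particular published method. The low-fidelity model proposes the step, and the high-fidelity model decides whether the step is accepted.

```python
import math


def f_hi(x):
    # Hypothetical high-fidelity model (expensive in practice).
    return 2.0 * (x - 2.0) ** 2 + 0.3 * math.sin(3.0 * x)


def f_lo(x):
    # Hypothetical low-fidelity model: misses the oscillatory term.
    return 2.0 * (x - 2.0) ** 2


def slope(f, x, h=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def surrogate_guided_minimize(x, delta=1.0, tol=1e-6, max_iter=100):
    for _ in range(max_iter):
        g_hi = slope(f_hi, x)
        if abs(g_hi) < tol or delta < 1e-12:
            break
        # Additive first-order correction: s(y) = f_lo(y) + c0 + c1 * (y - x) with
        # c0 = f_hi(x) - f_lo(x) and c1 = f_hi'(x) - f_lo'(x), so that s and f_hi agree
        # in value and slope at x. The constant c0 does not move the minimizer of s,
        # so only c1 is computed here.
        c1 = g_hi - slope(f_lo, x)
        # s'(y) = 4 * (y - 2) + c1 = 0 gives the minimizer of the surrogate.
        y = 2.0 - c1 / 4.0
        # Restrict the step to the trust region [x - delta, x + delta].
        y = min(max(y, x - delta), x + delta)
        # The high-fidelity model decides about acceptance.
        if f_hi(y) < f_hi(x):
            x, delta = y, 2.0 * delta
        else:
            delta *= 0.5
    return x


x_star = surrogate_guided_minimize(0.0)
print(f"x* = {x_star:.6f}, f_hi'(x*) approx. {slope(f_hi, x_star):.1e}")
```

The corrected surrogate agrees with f_hi in value and slope at every iterate. Consequently, a fixed point of the loop is a stationary point of f_hi (here x* ≈ 1.838), and the high-fidelity evaluations, which decide acceptance, safeguard convergence.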

A common issue concerns the general theoretical characterization of optimal solutions by the first-order necessary conditions, i.e., the Karush–Kuhn–Tucker

(KKT) conditions (see, e.g., [210, p. 17f]).
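For reference, consider a problem of the form min f(x) subject to g_i(x) ≤ 0 (i = 1, …, m) and h_j(x) = 0 (j = 1, …, p), with continuously differentiable f, g_i, and h_j. The symbols here are the standard textbook ones rather than notation taken from the present work. If a constraint qualification holds at a local minimizer x* (e.g., linear independence of the gradients of the active constraints), then multipliers μ_i and λ_j exist such that the KKT conditions hold:

```latex
\begin{aligned}
&\nabla f(x^\ast) + \sum_{i=1}^{m} \mu_i \nabla g_i(x^\ast) + \sum_{j=1}^{p} \lambda_j \nabla h_j(x^\ast) = 0, \\
&g_i(x^\ast) \le 0, \quad h_j(x^\ast) = 0, \\
&\mu_i \ge 0, \quad \mu_i \, g_i(x^\ast) = 0 \qquad (i = 1, \dots, m;\ j = 1, \dots, p).
\end{aligned}
```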

Concerning this sub-notion, the surrogate model is not necessarily identical to the

low-fidelity model (cf. [194, p. 28]) since it depends on the type of interaction – or

model management strategy (cf. [166, p. 554f]) – between the high-fidelity model and

the low-fidelity model.

In the remaining text, I use the terms surrogate-guided optimization and multifi-

delity optimization (recall § 1.1) interchangeably.

Motivated by the field of application in the present work (recall § 1.1), the seman-

tics of the models is mainly determined by the electromagnetic field theory’s realms

of magnetostatics and magnetoquasistatics (see, e.g., [139] or [103]).

The preceding elaborations already hint at the pivotal role played by the low-

fidelity model in the area of surrogate optimization. In Fig. 1.2, there is a schematic

depiction of a low-fidelity model depending on the available information about the

high-fidelity model. Whether the high-fidelity model is considered a black-box, a gray-box, or a white-box model induces one classification of low-fidelity models into


[Figure 1.2: input x mapped by the high-fidelity model to K(x) and by the low-fidelity model to K̃(x); cases (1), (2), (3)]

FIGURE 1.2: A schematic depiction of a low-fidelity model (encoded by K̃) depending on the available information about the high-fidelity model (encoded by K). The vertical arrows merely emphasize a connection between a high-fidelity model and a low-fidelity model; the horizontal arrows indicate input and output entities. The boxes associated with the low-fidelity model K̃ solely indicate schematically different potential representations of K̃. The high-fidelity model K is considered as (1) a black-box model, (2) a gray-box model, or (3) a white-box model. The vertical black line separating (1) and (2) from (3) indicates that, in the present work, the focus is on (1) and (2).

(1) data-fit, (2) simplified-physics, and (3) projection-based models (cf. [166, p. 556]).5 Another possible way to classify low-fidelity models (indicated by the vertical black

line in Fig. 1.2) is to ask whether the models are intrusive or non-intrusive – where I

understand “intrusive” as a need to modify the numerical software underlying the

high-fidelity model (cf. [77, p. 3f]).

In my investigation, I restrict the focus to low-fidelity models of data-fit type, for instance, kriging models (see, e.g., [137]), and of simplified-physics type, for instance, coarse-grid discretization models (see, e.g., [125, p. 159]). To provide a complete picture and to motivate the restricted focus, I briefly address low-fidelity models of projection-based type.
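To illustrate a simplified-physics low-fidelity model of coarse-grid type, the sketch below solves a hypothetical model problem, −u″ = π² sin(πx) on (0, 1) with u(0) = u(1) = 0, whose exact solution is u = sin(πx). It uses second-order finite differences on a fine grid (high-fidelity) and on a coarse grid (low-fidelity) and compares the quantity of interest u(1/2). The model problem and the grid sizes are assumptions for illustration.

```python
import math


def solve_poisson_1d(n_intervals):
    """Second-order finite differences for -u'' = pi^2 sin(pi x) on (0, 1),
    u(0) = u(1) = 0; returns the values at the n_intervals - 1 interior nodes."""
    h = 1.0 / n_intervals
    n = n_intervals - 1
    # System tridiag(-1, 2, -1) u = h^2 * f, solved by the Thomas algorithm.
    diag = [2.0] * n
    rhs = [h * h * math.pi ** 2 * math.sin(math.pi * (i + 1) * h) for i in range(n)]
    for i in range(1, n):  # forward elimination (off-diagonal entries are -1)
        rhs[i] += rhs[i - 1] / diag[i - 1]
        diag[i] -= 1.0 / diag[i - 1]
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        u[i] = (rhs[i] + u[i + 1]) / diag[i]
    return u


def midpoint_value(n_intervals):
    """Quantity of interest u(1/2); requires a grid node at x = 1/2."""
    assert n_intervals % 2 == 0, "need a node at x = 1/2"
    return solve_poisson_1d(n_intervals)[n_intervals // 2 - 1]


u_lo = midpoint_value(4)    # coarse grid (3 unknowns): low-fidelity model
u_hi = midpoint_value(256)  # fine grid (255 unknowns): high-fidelity model
print(f"u(1/2): low-fidelity {u_lo:.6f}, high-fidelity {u_hi:.6f}, exact 1.0")
```

The coarse model overestimates u(1/2) by about 5 %, whereas the fine model is accurate to about 10⁻⁵. A coarse-grid low-fidelity model exploits exactly this trade-off between cost and accuracy.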

Brief digression: projection-based low-fidelity models. The upcoming expo-

sition is very condensed. Thus, for a more elaborate exposition, I refer to [189]

and [20].

The basic mechanism behind this type of low-fidelity model is as follows: the high-fidelity

model is given as a system of equations in a high-dimensional space and a corre-

sponding low-dimensional subspace is constructed such that some desired charac-

teristics of the system are preserved. The low-fidelity model constitutes the projec-

tion of the high-fidelity model onto the low-dimensional subspace.
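A minimal sketch of this mechanism follows. It assumes NumPy and a hypothetical parametric high-fidelity system (K + μI)u = b, where K is the standard one-dimensional finite-difference Laplacian. Snapshots of the high-fidelity solution at a few parameter values span the low-dimensional subspace, and the low-fidelity model is the Galerkin projection onto that subspace. This snapshot-based construction is one common choice among several.

```python
import numpy as np

n = 200  # dimension of the hypothetical high-fidelity system
K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # 1D finite-difference Laplacian
b = np.ones(n)


def high_fidelity_solve(mu):
    """Solve the n x n system (K + mu I) u = b."""
    return np.linalg.solve(K + mu * np.eye(n), b)


# Offline stage: snapshots at training parameters, orthonormalized by a QR decomposition.
snapshots = np.column_stack([high_fidelity_solve(mu) for mu in (1.0, 10.0, 100.0)])
V, _ = np.linalg.qr(snapshots)  # n x r basis of the subspace, here r = 3
K_r = V.T @ K @ V               # reduced operator (r x r), precomputed once
b_r = V.T @ b


def low_fidelity_solve(mu):
    """Galerkin projection: solve (V^T (K + mu I) V) y = V^T b and return V y.
    Since V has orthonormal columns, V^T (mu I) V = mu I_r."""
    y = np.linalg.solve(K_r + mu * np.eye(V.shape[1]), b_r)
    return V @ y


for mu in (10.0, 30.0):
    u_hi, u_lo = high_fidelity_solve(mu), low_fidelity_solve(mu)
    print(mu, np.linalg.norm(u_lo - u_hi) / np.linalg.norm(u_hi))
```

At a training parameter (μ = 10), the reduced model reproduces the high-fidelity solution up to round-off, because that solution lies in the subspace. At other parameters, such as μ = 30, it is only an approximation, but it requires solving only a 3 × 3 system.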

In the context of electrical engineering, this type of low-fidelity models is inten-

sively discussed for circuit simulations and electromagnetic field simulations. Re-

garding applications in circuit simulations, see [21] for a collection of detailed in-

vestigations. Regarding applications in electromagnetic field simulations, there are

various investigations depending on the meaning of the parameter under consideration (encoded by x in Fig. 1.2). Common meanings of the parameter are: frequency

5 In the literature (see, e.g., [49, p. 45]), data-fit low-fidelity models are also called metamodels. Moreover, in [57], the authors suggest the wording data-fit, multifidelity, and reduced-order.


(see, e.g., [223], [118]), material (see, e.g., [115], [44]), and geometry (see, e.g., [40],

[30]).

Undoubtedly, the projection-based type of low-fidelity models is a very important

type because it does not depend on domain-specific knowledge of an expert. On the

one hand, this independence is valuable for the automated construction of a low-

fidelity model; especially, if theoretically sound error bounds and error estimators

are available. On the other hand, it is unclear why the domain-specific knowledge of an expert (for example, in the form of a large number of different models) should not be exploited.

With regard to the complexity of a real-world engineering design problem (recall § 1.1), there are, inevitably, many open challenges concerning the theory and

the implementation of low-fidelity models of projection-based type. However, it is

arguably reasonable to state that a harmoniously balanced interaction between all

three types of low-fidelity models has the potential to be a fruitful approach in the

long run – as recent promising results (see, e.g., [165, p. A3163]) indicate.

With this, I have sketched the general path to which the present dissertation contributes. Next, I identify critical points on this general path and specify the research

scope and goals.

1.3 Setting a horizon: The research scope & goals

In order to have a chance to reconcile the ideal long-term goal (see § 1.1) and surrogate optimization (see § 1.2), one encounters at least two critical points:

(1) In real-world design optimization problems, various high-fidelity and low-fidelity models from different sources are used informally, i.e., they lack rigorously proven error bounds and error estimators (see, e.g., [179] or [32]); despite this informal usage, these models and their relationships have proven to be useful in practice.

(2) In general, the task of comparing optimization algorithms is non-trivial (see,

e.g., [224]). With regard to surrogate optimization, a variety of methods is discussed in the literature, but choosing an appropriate method for a given problem is non-trivial. One obstacle is finding a proper way to classify the numerous methods. In particular, there is a lack of well-defined benchmarks that could enable a standardized, benchmark-focused comparison.

To illustrate the two critical points, I briefly present two examples:

(E1) An example concerning (1) is connected to the computation of the ohmic loss

(as the quantity of interest) of a three-dimensional helical coil of N turns (as the device under test, see (i) in Fig. 1.3).6 If we replace this helical coil by a collection of N toroids (see (ii) in Fig. 1.3), then the ohmic loss computation

associated with the coil represents the high-fidelity model, and the ohmic loss

computation associated with the toroids represents the low-fidelity model. The

comparison of the two models is usually based on the comparison of the respective computed ohmic losses, each encoded as a non-negative real number.
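To illustrate how such a comparison reduces to two non-negative real numbers, here is a deliberately crude sketch. It assumes DC conditions, a uniform current density, and the DC resistance R = ρ_el ℓ / A_w of a wire with length ℓ and cross-section A_w (ρ_el denotes the resistivity). The present work uses field simulations instead. Under these assumptions, the helical coil differs from the N toroids only by the pitch contribution to the wire length. All parameter values are hypothetical.

```python
import math


def dc_ohmic_loss(wire_length, current, resistivity=1.68e-8, cross_section=1e-6):
    """P = I^2 * R with R = resistivity * length / cross-section (DC, uniform current)."""
    return current ** 2 * resistivity * wire_length / cross_section


n_turns, radius, pitch, current = 5, 0.01, 0.002, 1.0  # hypothetical values (SI units)

# High-fidelity device: helix, each turn has length sqrt((2 pi r)^2 + p^2).
length_helix = n_turns * math.hypot(2.0 * math.pi * radius, pitch)
# Low-fidelity device: N closed rings (toroids), each of length 2 pi r.
length_toroids = n_turns * 2.0 * math.pi * radius

loss_hi = dc_ohmic_loss(length_helix, current)
loss_lo = dc_ohmic_loss(length_toroids, current)
rel_diff = (loss_hi - loss_lo) / loss_hi
print(f"P_hi = {loss_hi:.4e} W, P_lo = {loss_lo:.4e} W, rel. difference = {rel_diff:.2e}")
```

In this crude setting, the two numbers differ by about 0.05 %. This sketch ignores effects such as skin and proximity effects, which would enter at higher frequencies.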

6Unfortunately, there is some ambiguity regarding the term "ohmic loss". On the one hand, it refers

to the non-negative real number resulting from the ohmic loss integral computation; on the other hand,

it refers to the ohmic loss integral itself as a map. In the present context, I refer to the map.

1.3. Setting a horizon: The research scope & goals 7

Note that, for instance from a topological viewpoint (see, e.g., [92]) or a boundary value problem viewpoint (see, e.g., [174]), the two devices under test are not necessarily the same in general. However, they are commonly assumed to be approximately the same regarding the ohmic loss computation, i.e., the sameness of the high-fidelity model and the low-fidelity model implies the sameness of the respective devices under test.

FIGURE 1.3: Two generic devices under test: (i) a three-dimensional helical coil of 5 turns, and (ii) a collection of 5 toroids. The devices are created within CST Studio Suite®.

(E2) An example concerning (2) is to choose a surrogate-guided optimization meth-

od from the class of methods following the space mapping paradigm (see, e.g.,

[125, p. 50]) for the optimization of a three-dimensional helical coil of N turns as a device under test.

Note that, in the space mapping paradigm, the low-fidelity model and the sur-

rogate model are not identical (cf. [194, p. 28]). Various approaches have been

proposed to construct the surrogate model (see, e.g., [49, ch. 3]). One attempt to classify some methods within this class consists in assessing the quality of the low-fidelity models and the surrogate models with regard to the convergence properties of the corresponding algorithms (see, e.g., [120], [121]).
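For orientation, here is a minimal sketch of one classical representative of this class, aggressive space mapping with a Broyden (secant) update, in a scalar setting. It assumes hypothetical response functions `coarse_response` (low-fidelity) and `fine_response` (high-fidelity) and a target response. Parameter extraction is done by bisection, which is valid here only because the coarse response is strictly increasing. In line with the remark above, the surrogate is the coarse model composed with the extracted parameter mapping, not the coarse model itself.

```python
import math


def coarse_response(z):
    # Hypothetical low-fidelity response; strictly increasing in z.
    return z ** 3 + z


def fine_response(x):
    # Hypothetical high-fidelity response: a shifted, scaled, and perturbed coarse model.
    z = 1.1 * x + 0.2
    return z ** 3 + z + 0.05 * math.sin(x)


def invert_coarse(value, lo=-10.0, hi=10.0, n_iter=200):
    """Solve coarse_response(z) = value by bisection (valid as coarse_response is increasing)."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if coarse_response(mid) < value:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def aggressive_space_mapping(target, tol=1e-10, max_iter=50):
    z_star = invert_coarse(target)  # coarse-model optimum: reproduces the target response
    x = z_star                      # initial fine-model design
    # Parameter extraction P(x): coarse parameter reproducing the fine response at x.
    f = invert_coarse(fine_response(x)) - z_star
    B = 1.0                         # initial estimate of dP/dx
    for _ in range(max_iter):
        if abs(f) < tol:
            break
        h = -f / B                  # quasi-Newton step for P(x) = z_star
        x += h
        f_new = invert_coarse(fine_response(x)) - z_star
        B += (f_new - f - B * h) / h  # Broyden update (scalar case)
        f = f_new
    return x


x_opt = aggressive_space_mapping(target=2.0)
print(f"x* = {x_opt:.6f}, fine response = {fine_response(x_opt):.8f}")
```

Each iteration costs one fine-model evaluation, namely the one inside the parameter extraction. All other work is done with the coarse model.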

I argue that the two critical points (1) and (2) are natural limits to the full recognition of surrogate optimization methods by practitioners in the industrial sector. Additionally, the two points naturally bound the research scope of the present work at a problem- or application-oriented level and at a theory-oriented level.

Bound at a problem- or application-oriented level. Since there is no realistic

possibility to test all conceivable classes of use cases by all surrogate optimization

methods, there is a need to restrict the investigation to a subclass of use cases and

a subclass of methods. Therefore, the use cases are restricted to applications associated with inductive components, and the methods are restricted to those that use simplified-physics and data-fit low-fidelity models and that use the space mapping paradigm (see, e.g., [125, p. 50]) and the co-kriging approach (see, e.g., [70, p. 167]) as model management strategies. According to the terminology in [166, p. 555], the space mapping paradigm is a subtype of the model management strategy adaptation, and the co-kriging approach is a subtype of the model management strategy fusion.
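The fusion idea behind co-kriging is often written in the autoregressive form f_hi(x) ≈ ρ f_lo(x) + δ(x), where the discrepancy δ is itself modeled by a Gaussian process. The sketch below keeps only this structure. It uses hypothetical models and replaces the Gaussian-process discrepancy by a constant δ, so that ρ and δ follow from a linear least-squares fit to a few high-fidelity samples. It is therefore a drastically simplified stand-in, not co-kriging itself.

```python
import math


def f_lo(x):
    # Hypothetical cheap model.
    return math.sin(3.0 * x)


def f_hi(x):
    # Hypothetical expensive model; constructed as exactly 1.8 * f_lo + 0.4 for illustration.
    return 1.8 * math.sin(3.0 * x) + 0.4


def fit_scale_and_shift(xs):
    """Least-squares fit of f_hi(x) ~ rho * f_lo(x) + delta on the samples xs
    (closed-form simple linear regression of f_hi values on f_lo values)."""
    u = [f_lo(x) for x in xs]
    v = [f_hi(x) for x in xs]  # the only high-fidelity evaluations
    n = len(xs)
    u_mean, v_mean = sum(u) / n, sum(v) / n
    cov = sum((a - u_mean) * (b - v_mean) for a, b in zip(u, v))
    var = sum((a - u_mean) ** 2 for a in u)
    rho = cov / var
    delta = v_mean - rho * u_mean
    return rho, delta


rho, delta = fit_scale_and_shift([0.0, 0.4, 0.8, 1.2])


def fused(x):
    # Fused model: cheap model, rescaled and shifted by the fitted parameters.
    return rho * f_lo(x) + delta


print(rho, delta, fused(0.3), f_hi(0.3))
```

Because the hypothetical f_hi is exactly of the assumed form, the fit recovers ρ = 1.8 and δ = 0.4. In co-kriging, the Gaussian-process discrepancy would additionally capture an x-dependent remainder.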

At the transition between the two levels, there are inevitable software issues

regarding, for instance, finite element (FE) simulation tools or programming lan-

guages (PLs). Commercial FE software (e.g., in CST Studio Suite®), open-source FE


software (e.g., FEMM 4.2), and in-house programs for the algorithms (written, e.g., in MATLAB®7 and Julia8) are all employed in the present work.

Bound at a theory-oriented level. If one applies the high-fidelity model and the

low-fidelity model of the example (E1) in the context of the example (E2), then one can observe that the current formal languages in surrogate-guided optimization (see, e.g., [127], [166]) are only insufficiently able to encode the semantics (or interpretation) that one model is derived from the other. Considering point (1), such an encoding is beneficial in order to formally preserve and organize the practical prior knowledge about the models and their relationships, which is also beneficial as a stage of model preparation in the context of point (2).

Questions concerning semantics (and syntax) are commonly investigated with tools from logical analysis rather than with tools from numerical analysis. In numerical analysis, questions regarding a logically sound footing for reliable reasoning about numerical models are mostly associated with the notions of validation and verification (see, e.g., [159]).

However, a promising mediator between these apparently different tool sets

is the formal language of category theory, which is a holistic-structural approach to

mathematics (see, e.g., [11], [177], [180]). Its usefulness in physics (see, e.g., [73],

[46]) and in computer science (see, e.g., [167], [16]) has already been recognized.

Moreover, its usefulness is gradually getting recognition in electrical engineering

and computational electromagnetics (see, e.g., [13], [133]). Thus, the category theo-

retical language opens up a new opportunity to complement the primarily numeri-

cal analytic perspective in the context of surrogate optimization.

To delineate the research scope completely, it is also necessary to mention directions

that are closely related to the present work but which will not be pursued.

Disclaimer: What is not considered in the dissertation. I provide a list of three

trends in the context of surrogate optimization (see § 1.2). Note that the list is cer-

tainly not exhaustive, though:

1. In real-world applications, there are many sources of uncertainties such as

manufacturing imperfections that result in, for instance, uncertain material,

shape or excitation information of a problem under consideration. Hence, the

first trend is to investigate mathematical methods of uncertainty quantifica-

tion (see, e.g., [201], [178], [50], [30], [122]).

2. The need to quickly find an optimal solution associated with a high-fidelity

model is a reason for using surrogate optimization. However, an acceler-

ated search is also conceivable if the overall computational costs of a high-

fidelity model are reduced by utilizing concepts from parallel computing (see,

e.g., [190], [212]). Thus, a second trend is to explore the applicability of parallel

computing (see a survey, e.g., in [87]).

3. In surrogate optimization, as mentioned before, selecting a proper method for

a given problem is a non-trivial task since the selection depends heavily on

7 MATLAB® is a proprietary commercial programming language by MathWorks (see version R2019b at https://www.mathworks.com/).

8 Julia is a free and open-source programming language (see version v1.5.3 at https://julialang.org/). For more details on the dynamically checked programming language Julia, I refer to, e.g., [26] or [25].


the given problem. Therefore, there is a lack of generally valid guiding prin-

ciples for the selection process. However, considering machine learning tech-

niques (see, e.g., [192]), a third trend is concerned with the automation of the

selection process (see, e.g., [185]).

After providing the research scope, I can state the superordinate research goals:

• investigate the applicability of a subsegment of surrogate optimization to applications associated with inductive components;

• investigate the benefits and drawbacks of the category theoretical language as

an algebraic modeling toolbox in the context of surrogate optimization.

In order to assess the achievements of the present work, it helps to consider the superordinate research goals from a methodological point of view (cf. [199, p. 3]9):

The first goal is largely concerned with utilizing long-researched techniques for new

applications; whereas the second goal is largely concerned with introducing a new

area of knowledge to a long-researched area of knowledge.

Finally, I present the outline of the work:

• In chapter 2, I discuss particularly the magnetoquasistatic model of Maxwell’s

theory. Moreover, some relevant aspects regarding the numerical simulation

of the magnetoquasistatic model are presented. Finally, a few key points concerning the numerical optimization with the magnetoquasistatic model are illuminated. Note that, in the exposition, I also take a few small detours in order to show some facets of the formal language of category theory in advance by means of familiar examples. At the end, I address a zoo of optimization algorithms for nonlinear optimization problems. Furthermore, six test functions are introduced, and a gradient-based interpretation of sensitivity measures is deployed, which is primarily applied to models, such as data-fit low-fidelity models, that permit the determination of derivative information by forward-mode automatic differentiation.
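Forward-mode automatic differentiation can be sketched with dual numbers: every value carries its derivative along, and each elementary operation applies the corresponding differentiation rule. The minimal `Dual` class below is a hypothetical illustration that supports only +, *, and sin. It is not the tooling used in the present work.

```python
import math


class Dual:
    """Number value + deriv * eps with eps^2 = 0; deriv carries the derivative."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def _lift(other):
        # Treat plain numbers as constants (derivative zero).
        return other if isinstance(other, Dual) else Dual(float(other))

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual._lift(other)
        # Product rule.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def sin(x):
    # Chain rule for sin; plain floats fall back to math.sin.
    if isinstance(x, Dual):
        return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)
    return math.sin(x)


def derivative(f, x):
    """Evaluate f and f' at x in one forward pass by seeding deriv = 1."""
    out = f(Dual(x, 1.0))
    return out.value, out.deriv


# Example: f(x) = x * x + 3 * x + sin(x), hence f'(x) = 2x + 3 + cos(x).
value, slope = derivative(lambda x: x * x + 3 * x + sin(x), 2.0)
print(value, slope)
```

One forward pass yields the derivative with respect to one input. For n inputs, n passes (or vector-valued dual parts) are needed, which is acceptable for models with few design variables.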

• In chapter 3, I elaborate thoroughly on the key notion of surrogate optimization and

on the proposed partitioning of this notion in § 1.2 into the three sub-notions:

(1) surrogate modeling & simulation, (2) surrogate-based optimization, and

(3) surrogate-guided optimization. The various notions of surrogate optimiza-

tion are tagged with algebraic notes in order to anticipate the toolset of the

category theoretical language. Concerning the sub-notion (1), the notions of a high-fidelity model, a low-fidelity model, and a surrogate model are pinned down, for instance. Concerning the sub-notion (2), a numerical scaffolding of a benchmark-focused classification of test functions is carved out, for example. Concerning the sub-notion (3), given an optimization procedure within the space-mapping paradigm and a co-kriging low-fidelity model, potential hybrid model management strategies are elucidated, for example.

9In [199], the authors offer a general classification of research methodologies. Although their field

of application is the numerical modeling of AC-loss in high-temperature superconductors, their classi-

fication has an application-independent general validity. Note that their classification can be regarded

as an extension of the classification in [37] of the methodology of mathematics.


• In chapter 4, a formalization-oriented viewpoint is deepened by introducing

the category theory toolset. I focus solely on core tools and attempt to bal-

ance intuition and rigor regarding this toolset. Moreover, the toolset is used

to specify a general optimization problem and to specify surrogate-guided op-

timization methods where the focus rests on optimization procedures within

the space-mapping paradigm. In addition, we also encounter other use cases for the

toolset related to high- and low-fidelity models associated with examples of

applications in electrical engineering.

• In chapter 5, we look at a solenoid with a core and a common-mode choke as

representatives of the class of inductive components where I elaborate on four

optimization problems within the setting of a two-dimensional linear bound-

ary value problem and a three-dimensional linear boundary value problem,

respectively. Supposing the context of an electrical engineering design workflow, a strategy of using the tools from chapter 3 in practical applications is presented, and some relevant spots are carved out where the tools from chapter 4 can have a favorable influence, too.

• In chapter 6, I distill a conclusion from the presented research and present an

outlook.

To provide a visual orientation aid for navigating the present work, the following figure schematically depicts four generic levels (the level of programs, the level of algorithms, the level of (generalized) functions, and the level of applications) to which the essence of the respective discussion in chapters 2, 3, 4, and 5 can roughly be assigned. In addition, some of the above-mentioned terms are associated with these four levels as well.

[Figure 1.4: four levels, Programs (ch. 2, 3), Algorithms (ch. 2, 3), Functions (ch. 4), and Applications (ch. 5), with the use cases "find the minimum", "space mapping", "space mapping_PL", and "quantity of interest"]

FIGURE 1.4: A schematic orientation aid for the present work (inspired by [225, p. 3]). The index PL refers to "programming language". The assignment of ch. 6 is omitted. The dotted lines merely indicate a connection between the large ellipses. Each large ellipse represents one of the generic levels: programs, algorithms, (generalized) functions, and applications. The respective small ellipse within a large ellipse symbolizes a sub-area of interest. A colored asterisk within a small ellipse encodes a use case that is stated as colored text.

Chapter 2

Magnetoquasistatic Maxwell’s theory – Modeling, simulation, and optimization

In the present work, the physical framework is primarily restricted to macroscopic

scale electromagnetic phenomena described by Maxwell’s theory – in which the

magnetic energy and the power loss (weighted with a time of oscillation) are much larger than the electric energy, so that physical effects concerning electromagnetic wave propagation can be disregarded. Most of the thesis’ central applications under investigation are embedded in this particular physical framework. Therefore, I choose to expand on this particular physical framework in the subsequent

sections and to leave the common details of the general physical framework to the

standard literature (see, e.g., [139], [103]).

If the operating frequency is greater than zero, then, as customary, the mathe-

matical representation of the physical framework is given by the magnetoquasistatic

model of Maxwell’s theory; otherwise the mathematical representation is given by

the magnetostatic model of Maxwell’s theory.
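The following is a quick plausibility check of the magnetoquasistatic assumption. It uses a common rule of thumb rather than a criterion from the present work. Inside a conductor, the ratio of the displacement current density amplitude to the conduction current density amplitude is ωε/σ, which should be much smaller than one. In addition, the free-space wavelength c/f should be much larger than the device size. The sketch evaluates both indicators for copper (σ ≈ 5.8 · 10⁷ S/m, ε ≈ ε0), assuming an operating frequency of 1 MHz and a device size of 5 cm.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity in F/m
C0 = 2.998e8      # speed of light in vacuum in m/s


def mqs_indicators(frequency, conductivity, rel_permittivity=1.0):
    """Return (omega * eps / sigma, free-space wavelength) for a time-harmonic
    excitation with frequency > 0."""
    omega = 2.0 * math.pi * frequency
    ratio = omega * rel_permittivity * EPS0 / conductivity  # |j omega eps E| / |sigma E|
    wavelength = C0 / frequency
    return ratio, wavelength


ratio, wavelength = mqs_indicators(frequency=1e6, conductivity=5.8e7)
device_size = 0.05  # hypothetical characteristic size in m
print(f"omega*eps/sigma = {ratio:.1e}, wavelength / device size = {wavelength / device_size:.0f}")
```

Both indicators lie far on the safe side: the ratio is of order 10⁻¹², and the wavelength is several thousand device sizes. This is consistent with neglecting displacement currents in the conductor and wave propagation.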

Following the standard approach in electrical engineering, let us discuss the corresponding mathematical models in the language of vector analysis. Hence, in order to express Maxwell’s theory (see, e.g., [139], [103]), one has to assume familiarity with notions such as vector fields, scalar fields, the differential operators div, grad, curl, etc.

We are not concerned with a thorough numerical analysis of the models – since

we abstract over most of their inner workings in the remaining chapters. However,

the modern treatment regarding the numerical simulation and optimization of these

models makes it necessary to involve some basic concepts from the languages of

functional analysis (see, e.g., [226], [140], [218], [9], [153]) and differential geome-

try (see, e.g., [93], [42], [83], [66], [134]) that provide methodological and termino-

logical guidance. Thus, one has to suppose a working knowledge of elementary

definitions and results concerning notions such as Hilbert spaces, bounded linear

operators, manifolds and similar; but the explanations do not follow strictly the so-

called "definition-theorem-proof model of mathematics" (cf. [204, p. 3]).

Furthermore, these two languages assist in tracing some intuitions concerning

the structural perspective that is emphasized by the language of category theory

that I employ in ch. 4.

12 Chapter 2. Magnetoquasistatic Maxwell’s theory – Modeling, simulation, and

optimization

2.1 Magnetoquasistatic Model of Maxwell’s theory

This section provides a detailed description of the physical realm of the present work:

Starting from a brief exposition of the fundamental problem statement of electro-

magnetism, we discuss the statement’s mathematical representation by the system

of Maxwell’s equations. From the general system, we derive the magnetoquasistatic

subsystem and the magnetostatic subsystem; and for the former subsystem, we ar-

rive at a strong formulation that serves as an orientation point for the discussion

about the numerical simulation in the subsequent section.

2.1.1 The fundamental problem statement of electromagnetism

In the present work, we focus exclusively on Maxwell’s theory of electromagnetism

and corresponding mathematical models. For more details on the distinction be-

tween theory, model, and formulation, I refer to, e.g., [132, p. 5–9].

In [205, p. 273], the author states the fundamental problem of electromagnetism:

• Given a space region and a time interval,

• Given the nature of the materials that fill the region,

• Given the boundary conditions,

• Given the initial values of the configuration variables,

• Given the space and time distribution of charges and currents,

• Find the configuration of the field at every point and at every later instant.1

2.1.2 The system of Maxwell’s equations

Associated with the problem statement in the previous section is the system of Max-

well’s equations that represents its mathematical model. Let us formalize the model

by the language of vector analysis.

The system of Maxwell’s equations contains the following field functions: the

electric field intensity E, the electric field flux density D, the magnetic field flux den-

sity B, the magnetic field intensity H, the electric charge density ρ, and the electric

current flux density J.

All field functions are defined as functions of space and time. It is assumed that

a three-dimensional Euclidean space as a model for space and a one-dimensional

Euclidean space as a model for time are given (see, e.g., [33, p. 109]). Additionally,

it is assumed that there are no mechanically moving parts involved. The space variable x is a member of the space region Ω ⊂ ℝ³, and the time variable t is a member of the time interval I_T := [0, T] ⊂ ℝ.

Moreover, the field functions are categorized into two types: vector fields and

scalar fields, i.e., given an instant in time, vector fields map a point in space to a

vector and scalar fields map a point in space to a scalar. Thus, the field functions E, D, B, H, and J are vector fields with the function signature Ω × I_T → ℝ³; the field

1 The bullet points are a direct quotation of the listing in [205, p. 273], but the italic face and bold

face are my emphasis.

2.1. Magnetoquasistatic Model of Maxwell’s theory 13

function ρ is a scalar field with the function signature Ω × I_T → ℝ.2 Regarding the notation, however, it should be noted that the symbols for the field functions can also denote the evaluated field functions, for instance, E ≡ E(x, t), D ≡ D(x, t), etc.

Notice that, e.g., the field functions E and B constitute configuration variables (see § 2.1.1). The electric current density can be decomposed into a conduction part J_cond due to an electrically conductive medium and a source part J_src that is imposed externally; hence, J := J_cond + J_src.

Customarily, the system of Maxwell’s equations is displayed in the integral version or in the differential version. Let A ⊂ Ω be an oriented surface with boundary ∂A, and let V ⊂ Ω denote a volume with boundary ∂V. Note that the symbol ∂ is used polymorphically: it denotes both the boundary operator and, in the form ∂_t, the partial time derivative operator. The integral version reads as

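$$
\begin{aligned}
&\text{(i)}   && \forall A.\; \int_{\partial A} H \cdot \mathrm{d}s = \int_{A} J \cdot \mathrm{d}A + \mathrm{d}_t \int_{A} D \cdot \mathrm{d}A, \\
&\text{(ii)}  && \forall A.\; \int_{\partial A} E \cdot \mathrm{d}s = -\mathrm{d}_t \int_{A} B \cdot \mathrm{d}A, \\
&\text{(iii)} && \forall V.\; \int_{\partial V} D \cdot \mathrm{d}A = \int_{V} \rho \, \mathrm{d}V, \\
&\text{(iv)}  && \forall V.\; \int_{\partial V} B \cdot \mathrm{d}A = 0.
\end{aligned}
\tag{2.1}
$$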

The system of Maxwell’s equations is completed by the three constitutive equa-

tions that relate the corresponding field functions and express their interaction with

matter. Assuming time-invariant, linear, homogeneous, and isotropic material, the

equations are given by

$$
\begin{aligned}
&\text{(i)}   && D \overset{\text{mat}}{=} \varepsilon E, \\
&\text{(ii)}  && B \overset{\text{mat}}{=} \mu H, \\
&\text{(iii)} && J_{\text{cond}} \overset{\text{mat}}{=} \sigma E,
\end{aligned}
\tag{2.2}
$$

where the notation $\overset{\text{mat}}{=}$ follows the style of [205, p. 33]. The electric permittivity ε, the magnetic permeability µ, and the electric conductivity σ are considered as functions of space. The vacuum permittivity ε0 is incorporated in ε, and the vacuum permeability µ0 is incorporated in µ. Notice that it depends on the context whether ε ≡ ε(x), µ ≡ µ(x), and σ ≡ σ(x). In the case of non-linear and inhomogeneous magnetic material, it is customary to introduce the magnetization M as an additional field function (see, e.g., [178, p. 3]). In the presence of permanent magnets, it is customary to introduce an additional magnetic field strength H_pm (see, e.g., [30, p. 10]). However, driven by the domain of applications in the present work (recall § 1.1), we are mainly concerned with constitutive equations given by (2.2).

² Borrowing from programming language theory (see, e.g., [88]), I conceive the term "function signature" in the sense of the term "type signature". For example, given a function $f$ that maps a real number $x$ to the real number $x \cdot_{\mathbb{R}} x$, where the function $\cdot_{\mathbb{R}}$ denotes the real-valued binary multiplication map, one can write $f = x \mapsto x \cdot_{\mathbb{R}} x : \mathbb{R} \to \mathbb{R}$ such that $\mathbb{R} \to \mathbb{R}$ is the type signature (or type annotation) of the function $f$ (cf. the discussion in [32, p. 279–282]). Setting $A \equiv \mathbb{R}$ and $B \equiv \mathbb{R}$ in this example, then, roughly speaking, the type of $x$ is $A$ (such that $x : A$), the type of $x \cdot_A x$ is $B$ (such that $x \cdot_A x : B$), and the type of $f$ is $A \to B$ or $B^A$ (such that $f : A \to B$ and $f : B^A$, respectively).
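Python's optional type annotations offer a loose analogy to the footnote's example; in this sketch `float` merely stands in for $\mathbb{R}$ (an assumption, since floating-point numbers only approximate the reals) and `Callable[[A], B]` stands in for the function type $A \to B$.

```python
from typing import Callable

# Both A and B play the role of R; Python's float is only a finite approximation of R.
A = float
B = float

def f(x: A) -> B:
    """f = x |-> x * x, annotated with the type signature A -> B."""
    return x * x

# The annotation Callable[[A], B] plays the role of the function type A -> B (or B^A).
g: Callable[[A], B] = f

print(g(3.0))  # 9.0
```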

14 Chapter 2. Magnetoquasistatic Maxwell’s theory – Modeling, simulation, and

optimization

If we prescribe a unit normal vector on $\partial\Omega$, then one can extract the tangential and normal components of the field functions in (2.1) at the boundary of the space region. If subregions $\Omega_1$ and $\Omega_2$ of the space region $\Omega$ exhibit different material properties, then additional conditions have to be taken into account at the material interfaces. For more details on the handling of all these conditions (especially by trace operators in a functional analytic setting), I refer to [218], [32], and [179].

Once initial values of the corresponding field functions and the spatial and temporal distribution of the sources are provided, all requirements of the problem statement in § 2.1.1 are fulfilled. Applying Stokes’ theorem and Gauss’ theorem to (2.1), we derive Maxwell’s equations in differential form

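$$
\begin{cases}
\text{(i)} & \forall(\mathbf{x},t)\in\Omega\times I_T.\ \operatorname{curl}\mathbf{H} = \mathbf{J} + \partial_t\mathbf{D},\\
\text{(ii)} & \forall(\mathbf{x},t)\in\Omega\times I_T.\ \operatorname{curl}\mathbf{E} = -\partial_t\mathbf{B},\\
\text{(iii)} & \forall(\mathbf{x},t)\in\Omega\times I_T.\ \operatorname{div}\mathbf{D} = \rho,\\
\text{(iv)} & \forall(\mathbf{x},t)\in\Omega\times I_T.\ \operatorname{div}\mathbf{B} = 0,
\end{cases}
\tag{2.3}
$$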

where suitable boundary conditions and reasonable properties of the space region $\Omega$ are assumed; these are discussed later.

From (2.3), we can recover the continuity equation, which encodes local charge conservation:

$$
\forall(\mathbf{x},t)\in\Omega\times I_T.\ \operatorname{div}\mathbf{J} = -\partial_t\rho.
\tag{2.4}
$$
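The step from (2.3) to (2.4) can be spelled out as follows: take the divergence of (i), use $\operatorname{div}\operatorname{curl} \equiv 0$, and insert (iii). This sketch assumes sufficiently smooth fields so that $\operatorname{div}$ and $\partial_t$ commute.

```latex
\begin{aligned}
0 = \operatorname{div}(\operatorname{curl}\mathbf{H})
  &= \operatorname{div}\mathbf{J} + \operatorname{div}(\partial_t\mathbf{D}) \\
  &= \operatorname{div}\mathbf{J} + \partial_t(\operatorname{div}\mathbf{D})
   = \operatorname{div}\mathbf{J} + \partial_t\rho .
\end{aligned}
```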

Commonly, potential field functions such as the electric scalar potential $\varphi$ and the magnetic vector potential $\mathbf{A}$ are also employed in the context of Maxwell’s equations. Starting from (2.3), these two potential field functions are introduced, e.g., in the representation

$$
\text{(i)}\ \mathbf{E} =: -\operatorname{grad}\varphi - \partial_t\mathbf{A},\qquad
\text{(ii)}\ \mathbf{B} =: \operatorname{curl}\mathbf{A}.
\tag{2.5}
$$
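Inserting (2.5) into (2.3) shows why this representation is convenient: for sufficiently smooth potentials, equations (ii) and (iv) hold identically, so that only (i) and (iii) remain to be solved.

```latex
\begin{aligned}
\operatorname{div}\mathbf{B} &= \operatorname{div}(\operatorname{curl}\mathbf{A}) = 0,\\
\operatorname{curl}\mathbf{E} &= -\operatorname{curl}(\operatorname{grad}\varphi)
  - \partial_t(\operatorname{curl}\mathbf{A}) = -\partial_t\mathbf{B}.
\end{aligned}
```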

The potential field functions are particularly relevant for the formulation of magnetostatic and magnetoquasistatic problems, where, for instance, the so-called $\mathbf{A}$-$\varphi$ formulation plays an important role in the numerical approximation of these problems (see, e.g., [179, ch. 6]).

Before we move on to the approximation of Maxwell’s equations in magnetostatics and in magnetoquasistatics, let us take a short detour to discuss an important structural property of the spatial differential operators. By discussing this property, I want to carve out some intuitions concerning the structural approach of category theory in ch. 4.

Detour 1: a structural perspective on a structural property. Recall that the three differential operators grad, curl, and div exhibit an important structural property: firstly, applying curl to a field expressed via grad results in the zero vector field; secondly, applying div to a field expressed via curl results in the zero scalar field. In short, the structural property reads $\operatorname{curl}\circ\operatorname{grad} \equiv 0$ and $\operatorname{div}\circ\operatorname{curl} \equiv 0$. On contractible domains, the Poincaré lemma additionally provides the converse, i.e., every curl-free field is a gradient and every divergence-free field is a curl (cf. [32, p. 298]).
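The two identities can be checked numerically with central finite differences. The sketch below uses two arbitrarily chosen smooth test fields (the scalar field `phi` and the vector field `F` are hypothetical examples, not taken from the model) and evaluates $\operatorname{curl}\operatorname{grad}\varphi$ and $\operatorname{div}\operatorname{curl}\mathbf{F}$ at one point; both vanish up to round-off, because central differences in different coordinate directions commute.

```python
import math

STEP = 1e-3  # finite-difference step (an assumption; adequate for smooth fields)

def partial(g, p, i, h=STEP):
    """Central difference approximation of dg/dx_i at the point p."""
    p_plus, p_minus = list(p), list(p)
    p_plus[i] += h
    p_minus[i] -= h
    return (g(p_plus) - g(p_minus)) / (2.0 * h)

def component(F, k):
    """k-th component of the vector field F, as a scalar field."""
    return lambda p: F(p)[k]

def grad(phi):
    return lambda p: [partial(phi, p, i) for i in range(3)]

def curl(F):
    Fx, Fy, Fz = (component(F, k) for k in range(3))
    return lambda p: [
        partial(Fz, p, 1) - partial(Fy, p, 2),
        partial(Fx, p, 2) - partial(Fz, p, 0),
        partial(Fy, p, 0) - partial(Fx, p, 1),
    ]

def div(F):
    return lambda p: sum(partial(component(F, k), p, k) for k in range(3))

def phi(p):  # hypothetical smooth scalar field
    return math.sin(p[0]) * p[1] ** 2 + math.exp(p[2])

def F(p):    # hypothetical smooth vector field
    return [p[1] * p[2], math.cos(p[0]) * p[2], p[0] ** 2 * p[1]]

point = [0.3, -0.7, 0.5]
print(curl(grad(phi))(point))  # all components close to zero
print(div(curl(F))(point))     # close to zero
```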

If we emphasize the function signatures of the three differential operators, i.e., grad has the type "scalar field → vector field", curl has the type "vector field → vector field", and div has the type "vector field → scalar field", then one can systematize the previous structural property as

$$
\text{``}\ \text{scalar field} \xrightarrow{\ \operatorname{grad}\ } \text{vector field} \xrightarrow{\ \operatorname{curl}\ } \text{vector field} \xrightarrow{\ \operatorname{div}\ } \text{scalar field}\ \text{''}.
$$

2.1. Magnetoquasistatic Model of Maxwell’s theory 15

The inverted commas indicate that this systematization of the structural property cannot be formalized properly in the standard language of vector analysis (see, e.g., [83, p. 31]); for this purpose, the types and maps have to be stated more precisely. The language of differential geometry and the language of functional analysis are both capable of encoding the structural property properly; in these languages, it takes the form of an algebraic expression called an exact sequence (cf. [32, p. 132f]). Let us briefly present this expression in both languages. Leaving most details to the numerous textbooks mentioned at the beginning of the chapter, we focus on the bare minimum of technicalities, since the purpose of the exposition is to abstract the structural essence of the common algebraic expression.

Within the manifold-based differential geometric approach, the full system of Maxwell’s equations is formulated by means of the machinery of differential forms and exterior calculus. In this approach, the vector field functions in (2.1) are called "vector proxies" (cf. [33, p. 132]). For instance, the electric field strength is merely a representative of an observable entity, more precisely, of the assignment of a voltage (i.e., the electromotive force) to an oriented line $l$. Hence, the map

$$
e = l \mapsto \int_{l}\mathbf{E}\cdot d\mathbf{s}
$$

is called a differential form of degree 1, abbreviated as 1-form. Assuming a smooth manifold $M$, we denote the space of 1-forms by $\Lambda^1(M)$; thus, $e$ is an element of $\Lambda^1(M)$.

If we associate other field functions with other geometric objects such as points, surfaces, and volumes, then one can designate the corresponding spaces: the space of 0-forms $\Lambda^0(M)$, the space of 2-forms $\Lambda^2(M)$, and the space of 3-forms $\Lambda^3(M)$. Additionally, one can instantiate a notion of a differential operator via the exterior derivative $d$, which maps a differential form of degree $k$ to a differential form of degree $k+1$, such that the above systematization of the structural property in vector analysis can be formalized as the algebraic expression

$$
0 \to \Lambda^0(M) \xrightarrow{\ d_1\ } \Lambda^1(M) \xrightarrow{\ d_2\ } \Lambda^2(M) \xrightarrow{\ d_3\ } \Lambda^3(M) \to 0.
\tag{2.6}
$$

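Observe that the structural property itself is encoded in a defining property of the exterior derivative: with the indexing of (2.6), i.e., $d_k : \Lambda^{k-1}(M) \to \Lambda^{k}(M)$, we have $\forall a \in \Lambda^{k-1}(M).\ (d_{k+1}\circ d_k)(a) \equiv 0$; or concisely, $d_{k+1}\circ d_k \equiv 0$.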
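At the discrete level, $d_{k+1}\circ d_k \equiv 0$ has a purely combinatorial counterpart: the signed incidence (coboundary) matrices of a simplicial mesh, as used, e.g., in discrete exterior calculus, multiply to zero. A minimal sketch for a single tetrahedron, assuming the common convention that every simplex is oriented by increasing vertex index; the matrices are named after the maps in (2.6).

```python
from itertools import combinations

def simplices(n_vertices, k):
    """All k-simplices (k+1 sorted vertex indices) of the full simplex on n_vertices."""
    return list(combinations(range(n_vertices), k + 1))

def coboundary(n_vertices, k):
    """Matrix of the discrete exterior derivative from k-cochains to (k+1)-cochains.

    Row = (k+1)-simplex s, column = k-simplex t; the entry is (-1)^i if t is s with
    its i-th vertex removed, and 0 otherwise.
    """
    rows, cols = simplices(n_vertices, k + 1), simplices(n_vertices, k)
    index = {t: j for j, t in enumerate(cols)}
    d = [[0] * len(cols) for _ in rows]
    for r, s in enumerate(rows):
        for i in range(len(s)):
            d[r][index[s[:i] + s[i + 1:]]] = (-1) ** i
    return d

def matmul(a, b):
    return [[sum(a[i][m] * b[m][j] for m in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

# One tetrahedron: 4 nodes, 6 edges, 4 faces, 1 cell; d1, d2, d3 ~ grad, curl, div.
d1, d2, d3 = (coboundary(4, k) for k in range(3))
print(matmul(d2, d1))  # 4 x 4 zero matrix ("curl grad = 0")
print(matmul(d3, d2))  # 1 x 6 zero matrix ("div curl = 0")
```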

Picking the functional analytic approach, we deploy the machinery of Sobolev spaces and weak differential operators: given a regular, bounded, contractible domain $D$ of the Euclidean space, we have to choose appropriate Sobolev spaces for the field functions in (2.3) and in (2.5), i.e., $L^2_{\mathrm{grad}}(D)$, $\mathbf{L}^2_{\mathrm{curl}}(D)$, $\mathbf{L}^2_{\mathrm{div}}(D)$, and $L^2(D)$, where the notational convention of [32, p. 128] is employed, in which $L^2(D)$ denotes the space of square-integrable functions over $D$ and $\mathbf{L}^2(D)$ denotes the space of square-integrable vector fields over $D$ (see, e.g., [32, p. 69]). Moreover, we have to set the domains and codomains of the weak differential operators such that, by construction, we arrive at an algebraic expression that is conceptually similar to (2.6):

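$$
0 \to L^2_{\mathrm{grad}}(D) \xrightarrow{\ \operatorname{grad}\ } \mathbf{L}^2_{\mathrm{curl}}(D) \xrightarrow{\ \operatorname{curl}\ } \mathbf{L}^2_{\mathrm{div}}(D) \xrightarrow{\ \operatorname{div}\ } L^2(D) \to 0.
\tag{2.7}
$$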

One significance of diagrams such as (2.6) and (2.7) is that they guide the construction of a discrete representation of the full system of Maxwell’s equations, in the sense that numerical approximations should preserve such diagrams (cf. [33, p. 145]) in order to mimic the continuous properties at the discrete level. Therefore, such diagrams enable a kind of consistency check (cf. [111]). Following the spirit of the diagrammatic notation in (2.6) and (2.7), one can systematize the full system of Maxwell’s equations in the so-called Maxwell’s house (cf. [32, p. 134]). See, in addition, the so-called Tonti classification diagrams of electromagnetism, in short, Tonti diagrams (cf. [205, p. 307–323]).

For an in-depth elaboration of the relationship between (2.6) and (2.7), I refer to the discussion of the de Rham complex and Hilbert complexes in [6], and to the discussion of the Sobolev space setting in [131].

From a purely structural perspective, though, the essence of the algebraic expressions (2.6) and (2.7) is that there are, in general, four spaces $U, V, W, X$ and three maps $f_1, f_2, f_3$ such that the algebraic expression reads

$$
0 \to U \xrightarrow{\ f_1\ } V \xrightarrow{\ f_2\ } W \xrightarrow{\ f_3\ } X \to 0.
\tag{2.8}
$$
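In the linear-algebra semantics of (2.8), with finite-dimensional spaces and the maps given as matrices, exactness can be checked by rank counting: $f_{k+1}\circ f_k = 0$ together with $\operatorname{rank} f_k = \dim(\text{space}) - \operatorname{rank} f_{k+1}$ at every space implies $\operatorname{im} f_k = \ker f_{k+1}$. A minimal sketch over the rationals; the example spaces and matrices are hypothetical.

```python
from fractions import Fraction

def rank(m):
    """Rank of a rational matrix (list of rows) via Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in m]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def matmul(a, b):
    return [[sum(a[i][m] * b[m][j] for m in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def is_exact(dims, maps):
    """Check exactness of 0 -> S_0 -> S_1 -> ... -> S_n -> 0.

    dims[k] = dim S_k; maps[k] is the matrix of S_k -> S_{k+1} (dims[k+1] rows).
    """
    for f, g in zip(maps, maps[1:]):
        if any(v != 0 for row in matmul(g, f) for v in row):
            return False  # not even a complex
    ranks = [0] + [rank(m) for m in maps] + [0]  # the maps from and to 0 have rank 0
    return all(ranks[k] == dims[k] - ranks[k + 1] for k in range(len(dims)))

# A hypothetical exact sequence 0 -> R -> R^2 -> R^2 -> R -> 0.
f1 = [[1], [1]]
f2 = [[1, -1], [1, -1]]
f3 = [[1, -1]]
print(is_exact([1, 2, 2, 1], [f1, f2, f3]))                # True
print(is_exact([1, 2, 2, 1], [f1, [[0, 0], [0, 0]], f3]))  # False
```

Applied to the incidence matrices of a single tetrahedron, such a test fails only at the left end, because the discrete gradient maps constant node vectors to zero; exactness there requires augmenting the sequence by the constants.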

At an intuitive level, one can regard the expression (2.8) as the syntax, whereas the expression (2.6) is one possible semantics and the expression (2.7) is another. More interestingly, one can identify yet another semantics if, in (2.6) and (2.7), one regards the spaces as vector spaces and the maps as linear maps. This interplay of syntax and semantics gives us a flavor of a structural perspective that foreshadows the category theoretical language which we encounter in ch. 4.

2.1.3 The magnetoquasistatic subsystem & the magnetostatic subsystem

Due to the applications addressed in the present work, we are chiefly interested in subsystems of Maxwell’s equations (2.1) and (2.3), respectively, in which wave propagation effects are neglected, i.e., the term $\partial_t\mathbf{D}$ is dropped. Additionally, the electric charge density $\rho$ is assumed to be the zero scalar field. These restrictions lead to the magnetoquasistatic subsystem of Maxwell’s equations, which, in the literature (see, e.g., [179, p. 7]), is also called the eddy current approximation or the magnetoquasistatic approximation of Maxwell’s equations. Furthermore, if one neglects all time dependencies, then one arrives at the magnetostatic subsystem of Maxwell’s equations.

The two subsystems are approximations; hence, they need to be justified. Let us assume that the magnetic energy and the power loss (weighted with a period of oscillation) are much larger than the electric energy. There are additional quantitative criteria (cf. [178, p. 6]) to check this assumption: (1) given an operating angular frequency $\omega$ in a domain, the product $\omega\varepsilon$ has to be much smaller than $\sigma$; and (2) the diameter of a bounded domain has to be much smaller than the corresponding minimal wavelength within that domain. For more details on the mathematical justification, see, e.g., [179, ch. 2] or [198].
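Both criteria can be evaluated with a back-of-the-envelope computation. The sketch below assumes copper-like values ($\sigma \approx 5.8\cdot 10^{7}$ S/m, $\varepsilon \approx \varepsilon_0$), an operating frequency of 50 Hz, and a device diameter of 0.1 m; these numbers are illustrative assumptions, not values from the present work, and the wavelength is estimated with the lossless formula $\lambda = c_0/(f\sqrt{\varepsilon_r})$ for a non-magnetic, non-conducting surrounding.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m
C0 = 299_792_458.0       # speed of light in vacuum in m/s

def mqs_criteria(freq_hz, sigma, eps_r=1.0, diameter_m=0.1):
    """Return (omega*eps/sigma, diameter/wavelength); both should be << 1."""
    omega = 2.0 * math.pi * freq_hz
    ratio_displacement = omega * eps_r * EPS0 / sigma
    wavelength = C0 / (freq_hz * math.sqrt(eps_r))  # lossless, non-magnetic medium assumed
    return ratio_displacement, diameter_m / wavelength

r1, r2 = mqs_criteria(50.0, 5.8e7)
print(f"omega*eps/sigma = {r1:.1e}, diameter/wavelength = {r2:.1e}")
```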

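Reducing the system (2.3) according to these restrictions, the magnetoquasistatic subsystem of Maxwell’s equations reads

$$
\begin{cases}
\text{(i)} & \forall(\mathbf{x},t)\in\Omega\times I_T.\ \operatorname{curl}\mathbf{H} = \sigma\mathbf{E} + \mathbf{J}_{\mathrm{src}},\\
\text{(ii)} & \forall(\mathbf{x},t)\in\Omega\times I_T.\ \operatorname{curl}\mathbf{E} = -\partial_t(\mu\mathbf{H}),\\
\text{(iii)} & \forall(\mathbf{x},t)\in\Omega\times I_T.\ \operatorname{div}(\varepsilon\mathbf{E}) = 0,\\
\text{(iv)} & \forall(\mathbf{x},t)\in\Omega\times I_T.\ \operatorname{div}(\mu\mathbf{H}) = 0.
\end{cases}
\tag{2.9}
$$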

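If we focus on the time-harmonic case, in which the field functions exhibit a sinusoidal time dependence, one can formulate the magnetoquasistatic subsystem of Maxwell’s equations in the frequency domain as

$$
\begin{cases}
\text{(i)} & \operatorname{curl}\mathbf{H} = \sigma\mathbf{E} + \mathbf{J}_{\mathrm{src}} \quad \text{in } \Omega,\\
\text{(ii)} & \operatorname{curl}\mathbf{E} = -j\omega\mu\mathbf{H} \quad \text{in } \Omega,\\
\text{(iii)} & \operatorname{div}(\varepsilon\mathbf{E}) = 0 \quad \text{in } \Omega_{\mathrm{nc}}.
\end{cases}
\tag{2.10}
$$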
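The passage from (2.9) to (2.10) rests on the phasor convention $u(t) = \operatorname{Re}(U e^{j\omega t})$, under which $\partial_t$ turns into multiplication by $j\omega$. A quick numerical sanity check of this rule, with an arbitrarily chosen complex amplitude and time instant (assumptions for illustration only); note that the engineering unit $j$ is written `1j` in Python.

```python
import cmath
import math

def signal(U, omega, t):
    """Time-domain signal represented by the complex amplitude (phasor) U."""
    return (U * cmath.exp(1j * omega * t)).real

U = 2.0 - 1.5j                # hypothetical complex amplitude
omega = 2.0 * math.pi * 50.0  # 50 Hz
t, dt = 0.0123, 1e-7

# Time derivative via central difference vs. the phasor rule d/dt -> j*omega.
numeric = (signal(U, omega, t + dt) - signal(U, omega, t - dt)) / (2.0 * dt)
phasor = signal(1j * omega * U, omega, t)
print(numeric, phasor)  # agree up to the finite-difference error
```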

Remark 2.1.1. In the frequency domain, the field functions are complex-valued. As mentioned before, the symbols for the complex-valued field functions can also denote the evaluated complex-valued field functions, for instance, $\mathbf{E} \equiv \mathbf{E}(\mathbf{x}, j\omega)$. Hence, the notation $\overline{\mathbf{E}}$ indicates the componentwise complex conjugation of $\mathbf{E}$ and of $\mathbf{E}(\mathbf{x}, j\omega)$, respectively.

Remark 2.1.2. In Figure 2.1, I schematically illustrate two common representatives of the domain of a magnetoquasistatic subsystem in applications.

Remark 2.1.3. Moving from (2.9) to (2.10) means moving from an initial-boundary value problem (IBVP) to a boundary value problem (BVP). In the time-harmonic case (with $\omega \neq 0$), equation (iv) in (2.9) can be dropped, since it can be recovered by taking the divergence of equation (ii) in (2.10). Moreover, equation (iii) in (2.10) holds in the non-conducting subregions ($\Omega_{\mathrm{nc}}$), whereas equations (i) and (ii) refer to the whole physical (or computational) domain under consideration ($\Omega$). Note that the electric conductivity $\sigma$ is supposed to be greater than zero in a conducting subregion ($\Omega_{\mathrm{c}}$) and equal to zero in a non-conducting subregion ($\Omega_{\mathrm{nc}}$). Finally, one has to assume that $\operatorname{div}\mathbf{J}_{\mathrm{src}} = 0$ in $\Omega_{\mathrm{nc}}$, which follows immediately from the continuity equation (2.4), since $\rho \equiv 0$ and $\mathbf{J}_{\mathrm{cond}} = \mathbf{0}$ there.

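[Figure 2.1, panel (A) Representative #1: a single conducting subdomain $\Omega_{\mathrm{c}}$ in $\Omega_{\mathrm{nc}}$, outer boundary $\partial\Omega$. Panel (B) Representative #2: multiple conducting subdomains $\Omega_{\mathrm{c},1}$ and $\Omega_{\mathrm{c},2}$ in $\Omega_{\mathrm{nc}}$, outer boundary $\partial\Omega$.]

FIGURE 2.1: A schematic illustration of two common representatives of a magnetoquasistatic subsystem’s domain in application.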

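If we apply the frequency-domain representation of the potential field functions from (2.5) to the subsystem (2.10), one can state this subsystem in the $\mathbf{A}$-$\varphi$ formulation:

$$
\begin{cases}
\text{(i)} & \operatorname{curl}(\mu^{-1}\operatorname{curl}\mathbf{A}) = -\sigma\operatorname{grad}\varphi - j\omega\sigma\mathbf{A} + \mathbf{J}_{\mathrm{src}} \quad \text{in } \Omega,\\
\text{(ii)} & \operatorname{div}(-\varepsilon\operatorname{grad}\varphi - j\omega\varepsilon\mathbf{A}) = 0 \quad \text{in } \Omega_{\mathrm{nc}}.
\end{cases}
\tag{2.11}
$$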

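By setting $\omega \equiv 0$ in (2.11), one can immediately derive the magnetostatic subsystem of Maxwell’s equations in the $\mathbf{A}$-$\varphi$ formulation:

$$
\begin{cases}
\text{(i)} & \operatorname{curl}(\mu^{-1}\operatorname{curl}\mathbf{A}) = -\sigma\operatorname{grad}\varphi + \mathbf{J}_{\mathrm{src}} \quad \text{in } \Omega,\\
\text{(ii)} & \operatorname{div}(-\varepsilon\operatorname{grad}\varphi) = 0 \quad \text{in } \Omega_{\mathrm{nc}}.
\end{cases}
\tag{2.12}
$$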


A necessary remark concerns the gradient field $\operatorname{grad}\varphi$ in (2.12) and in (2.11). Some authors (see, e.g., [178, p. 9]) neglect this term; other authors (see, e.g., [30, p. 11]) introduce the field function $\mathbf{J}_{\mathrm{src}}$ via the term $-\sigma\operatorname{grad}\varphi$. In either case, the investigation focuses only on finding the vector potential $\mathbf{A}$. For a treatment of the term $\operatorname{grad}\varphi$ in a more general setting, see, e.g., the discussion in [97].

In order to obtain uniqueness of the vector potential $\mathbf{A}$, it is necessary to introduce appropriate gauge conditions and boundary conditions. A common gauge condition is the Coulomb gauge

$$
\operatorname{div}\mathbf{A} = 0 \quad \text{in } \Omega.
\tag{2.13}
$$
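The need for a gauge condition can be seen directly from (2.5): for any sufficiently smooth scalar field $\chi$ (introduced here only for illustration), the transformed potentials below yield the same fields $\mathbf{E}$ and $\mathbf{B}$, because $\operatorname{curl}\operatorname{grad} \equiv 0$ and the derivatives commute. Imposing (2.13) restricts $\chi$ to harmonic functions ($\Delta\chi = 0$), and boundary conditions are then needed to remove the remaining freedom.

```latex
\mathbf{A}' = \mathbf{A} + \operatorname{grad}\chi,\quad
\varphi' = \varphi - \partial_t\chi
\;\Longrightarrow\;
\operatorname{curl}\mathbf{A}' = \operatorname{curl}\mathbf{A} = \mathbf{B},\quad
-\operatorname{grad}\varphi' - \partial_t\mathbf{A}'
  = -\operatorname{grad}\varphi - \partial_t\mathbf{A} = \mathbf{E}.
```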

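Let $\mathbf{n}$ denote the exterior unit normal on the boundary $\partial\Omega$ of the computational domain; then common boundary conditions are Dirichlet boundary conditions

$$
\mathbf{A}\cdot\mathbf{n} = 0 \quad \text{on } \Gamma_D \subset \partial\Omega,
\tag{2.14}
$$

and Neumann boundary conditions

$$
\mu^{-1}(\operatorname{curl}\mathbf{A})\times\mathbf{n} = \mathbf{0} \quad \text{on } \Gamma_N \subset \partial\Omega,
\tag{2.15}
$$

where it is assumed that $\Gamma_D \cup \Gamma_N \equiv \partial\Omega$. If we assume a simply connected computational domain $\Omega$, then, based on (i) in (2.11) (and, analogously, on (i) in (2.12)), one can wrap up the previous pieces of information in a so-called strong formulation

$$
\begin{cases}
\operatorname{curl}(\mu^{-1}\operatorname{curl}\mathbf{A}) + \sigma\operatorname{grad}\varphi + j\omega\sigma\mathbf{A} = \mathbf{J}_{\mathrm{src}} & \text{in } \Omega,\\
\operatorname{div}\mathbf{A} = 0 & \text{in } \Omega,\\
\mathbf{A}\cdot\mathbf{n} = 0 & \text{on } \Gamma_D,\\
\mu^{-1}(\operatorname{curl}\mathbf{A})\times\mathbf{n} = \mathbf{0} & \text{on } \Gamma_N.
\end{cases}
\tag{2.16}
$$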

Due to, for instance, rotational or translational symmetry (see, e.g., (ii) in Fig. 1.3), it is customary to apply a two-dimensional setting. Hence, the formulation (2.16) has to be adapted accordingly. For further discussion of this adaptation, I refer to [178, p. 17f], [30, p. 11f], or, more generally, [83], [32].

2.2 Numerical simulation of the magnetoquasistatic model

As mentioned at the very beginning of the chapter, instead of a thorough numerical analysis of the model, the primary concern is to use the methodological and terminological guidance of basic concepts from functional analysis in the discussion of the numerical simulation of the model. Hence, let us discuss the weak formulation and its numerical approximation abstractly, and let us conclude the section with an exposition of a parametric mathematical model.

2.2.1 The weak formulation

A strong formulation such as (2.16) is the point of departure for the numerical simulation of the magnetoquasistatic model. However, due to regularity (smoothness) issues, the solvability of the problem (2.16) in the classical sense is not guaranteed in general. Starting from the strong formulation, a weak formulation is derived by formally multiplying (2.16) with a test function $v$ from a Hilbert space $V(\Omega)$ and integrating over the space region $\Omega$. Under certain conditions, a solution of the weak formulation is a solution of the strong formulation.

2.2. Numerical simulation of the magnetoquasistatic model 19

The weak formulation is a means to show that the problem (2.16) is well-posed in the sense of Hadamard, i.e., a solution exists, is unique, and depends continuously on the given data (e.g., boundary conditions or sources). Furthermore, the weak formulation is the basis of the finite element approximation.

For the model-specific technicalities regarding the weak formulation of (2.16), I refer to, e.g., [179, ch. 6], because, hereafter, I illuminate the weak formulation only abstractly in a Hilbert space setting.

First, let us set $V := V(\Omega)$ and encode the magnetic vector potential $\mathbf{A}$ by the solution function $u$, which is a member of the Hilbert space $W := W(\Omega)$. Second, looking ahead to the numerical approximation by the finite element method, considered as a special case of the Ritz-Galerkin method (see, e.g., [32, p. 73], [218, p. 45]), we set the solution function space $W$ equal to the test function space $V$, i.e., $W := V$. Third, assuming that the underlying field of the Hilbert space is $\mathbb{R}$, let the map $a : V \times V \to \mathbb{R}$ be the bilinear form, and let the map $l : V \to \mathbb{R}$ be the linear form. Finally, one can state the weak formulation abstractly as

$$
\text{find } u \in V \text{ such that } \forall v \in V.\ a(u, v) = l(v).
\tag{2.17}
$$

Remark 2.2.1. In (2.17), the Hilbert space $V$ has to provide a notion of weak derivatives; thus, $V$ has to be a Sobolev space (see, e.g., [218, p. 419f]). For instance, the spaces in (2.7) are Sobolev spaces.

The boundary conditions are incorporated in the weak formulation, and, conventionally, the excitation is incorporated in the linear form. If the bilinear form satisfies certain requirements such as boundedness and coercivity, and if the linear form is bounded as well, then the weak formulation is well-posed (by the Lax-Milgram lemma).

Let us consider two restatements of (2.17). Observing that the linear form $l$ is a member of the dual space $V'$ of $V$, one can introduce the so-called natural pairing, a non-degenerate bilinear map $\langle\cdot,\cdot\rangle : V' \times V \to \mathbb{R}$ such that $l(v) = \langle l, v\rangle$. Hence, the first restatement of (2.17) is

$$
\text{find } u \in V \text{ such that } \forall v \in V.\ a(u, v) = \langle l, v\rangle.
\tag{2.18}
$$

A benefit of this presentation is that it aids the conceptual distinction of the various quantities involved, a distinction that could be overlooked when moving from the infinite-dimensional to the finite-dimensional case.

A second restatement of (2.17) is achieved if we only partially evaluate the bilinear form $a$ in its first argument such that $a(u, \cdot) : V \to \mathbb{R}$. One can observe that the map $a(u, \cdot)$ and the linear form $l$ are both members of $V'$. By introducing a map $L$ such that $L = u \mapsto a(u, \cdot) : V \to V'$, one can restate (2.17) as

$$
\text{find } u \in V \text{ such that } \forall v \in V.\ (Lu)(v) = l(v),
\tag{2.19}
$$

where, by omitting additional brackets such as in $(L(u))(v)$, the conventional order of evaluation is assumed.
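In finite dimensions, the restatement (2.19) is simply currying. The sketch below, assuming a hypothetical symmetric positive definite $2\times 2$ matrix `K` for the bilinear form and a hypothetical vector `b` for the linear form on $V = \mathbb{R}^2$, solves $a(u, v) = l(v)$ for all $v$ and checks that the partially evaluated map `L(u)` agrees with `l` on a basis of $V$.

```python
from typing import Callable

Vec = tuple[float, float]
K = ((4.0, 1.0), (1.0, 3.0))  # hypothetical SPD matrix: a(u, v) = v^T K u
b = (1.0, 2.0)                # hypothetical load vector: l(v) = b^T v

def a(u: Vec, v: Vec) -> float:
    return sum(v[i] * K[i][j] * u[j] for i in range(2) for j in range(2))

def l(v: Vec) -> float:
    return b[0] * v[0] + b[1] * v[1]

def L(u: Vec) -> Callable[[Vec], float]:
    """Partial evaluation u |-> a(u, .), i.e. a map V -> V'."""
    return lambda v: a(u, v)

# Solve K u = b (Cramer's rule): the matrix form of "a(u, v) = l(v) for all v".
det = K[0][0] * K[1][1] - K[0][1] * K[1][0]
u = ((b[0] * K[1][1] - K[0][1] * b[1]) / det, (K[0][0] * b[1] - K[1][0] * b[0]) / det)

basis = [(1.0, 0.0), (0.0, 1.0)]
print([abs(L(u)(e) - l(e)) < 1e-12 for e in basis])  # [True, True]
```

Testing against a basis suffices here because both `L(u)` and `l` are linear in `v`.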

Conceiving the map $L$ from (2.19) as a member of the collection $\hom(V, V')$, that is, the collection of all structure-preserving maps from $V$ to $V'$, one can represent, e.g., a homogeneous partial differential equation (PDE) by the equation $L(u) = 0$ (cf. [96, p. 2]). Hence, compared to (2.17), a benefit of the presentation in (2.19) is the more explicit representation of the mathematical model involved. When examining parametric mathematical models in the last subsection, another benefit becomes apparent: the logical connection between the parametric mathematical model and its corresponding weak formulation.

In applications, the solution function $u$ is especially used to determine an observable physical quantity. Such a quantity is formally encoded in the so-called quantity of interest, which is denoted by a non-linear functional $Q : V \to \mathbb{R}$. For instance, one can express a quantity of interest by an appropriate norm on a space region under investigation $\Omega_i \subset \Omega$ (cf. [211, p. 6]) such that one can write

$$
Q = u \mapsto \lVert u\rVert_{\Omega_i} : V \to \mathbb{R}.
\tag{2.20}
$$

If one denotes another functional by $q : V \to \mathbb{R}$ (cf. [178, p. 35]), one can also represent $Q$ in the form

$$
Q = u \mapsto \int_{\Omega_i} q(u)\, d\mathbf{x} : V \to \mathbb{R}.
\tag{2.21}
$$

In the magnetoquasistatic model, two common interpretations of the evaluated quantity of interest $Q(u)$ are the magnetic energy and the power loss.

The notion of a quantity of interest is also relevant in the context of error quantification regarding high-fidelity models and corresponding low-fidelity models. We take a closer look at these models in the following chapters.

The authors of [160] discuss the estimation of errors in quantities of interest of two related solution functions $u \in V$ and $u_0 \in V$, embedding their discussion within the topic of model validation.

The solution function $u$ is determined by (2.17), which represents a high-fidelity model; the solution function $u_0$ is determined by using a different bilinear form $a_0$ in (2.17), which represents a low-fidelity model. A conceivable distinction between the models lies, e.g., in a different modeling of the material properties. Hence, the error in the solution function $E(u) \in \mathbb{R}_+$ and the modeling error $E(Q) \in \mathbb{R}_+$ can be defined as

$$
E(u) := \lVert u - u_0\rVert_V,
\tag{2.22a}
$$

$$
E(Q) := \lVert Q(u) - Q(u_0)\rVert_{l^2},
\tag{2.22b}
$$

where $\lVert\cdot\rVert_V$ denotes an appropriate norm on $V$ and $\lVert\cdot\rVert_{l^2}$ denotes the standard $l^2$-norm. In (2.22b), choosing the absolute value $\lvert\cdot\rvert$ instead of the standard $l^2$-norm is possible as well.
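The two error measures in (2.22) can be sketched numerically for a hypothetical one-dimensional stand-in: $u(x) = x(1-x)/2$ on $(0,1)$ plays the high-fidelity solution, its first Fourier sine mode $u_0(x) = (4/\pi^3)\sin(\pi x)$ plays the low-fidelity one, $\lVert\cdot\rVert_V$ is replaced by the $L^2(0,1)$ norm, $Q(u) = \int_0^1 u^2\,dx$ serves as quantity of interest, and all integrals use the composite trapezoidal rule. All of these choices are assumptions for illustration.

```python
import math

def trapezoid(g, a=0.0, b=1.0, n=1000):
    """Composite trapezoidal rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * (0.5 * g(a) + sum(g(a + i * h) for i in range(1, n)) + 0.5 * g(b))

def u(x):   # hypothetical "high-fidelity" solution
    return x * (1.0 - x) / 2.0

def u0(x):  # hypothetical "low-fidelity" solution: first sine mode of u
    return 4.0 / math.pi ** 3 * math.sin(math.pi * x)

def Q(w):   # quantity of interest: integral of w^2 over (0, 1)
    return trapezoid(lambda x: w(x) ** 2)

E_u = math.sqrt(trapezoid(lambda x: (u(x) - u0(x)) ** 2))  # (2.22a) with the L2 norm
E_Q = abs(Q(u) - Q(u0))                                      # (2.22b) with |.|
print(f"E(u) = {E_u:.2e}, E(Q) = {E_Q:.2e}")
```

In this particular example, $u_0$ is the $L^2$-orthogonal projection of $u$ onto one mode, so Pythagoras gives $E(Q) = E(u)^2$ up to quadrature error; in general, the two measures are not related this simply.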

Note that, in (2.22), both solutions are assumed to be members of the same space $V$. However, in the more generic setting of surrogate optimization, error estimates or error bounds for (2.22) might not exist. There are various situations in which the quantity of interest to be compared has to be represented by two different functionals. Some examples are: $u$ is determined in a three-dimensional space region and $u_0$ in a two-dimensional space region; the space regions under investigation (cf. (2.20)) exhibit different topological properties (see, e.g., Fig. 1.3); or different numerical methods are employed. In such a generic setting, a comparison relying merely on the real number $E(Q)$ conceals the characters of the models under investigation and their relationships. The category theoretical language in ch. 4 provides tools to express formally at least parts of these characters and relationships.


2.2.2 Numerical approximation

Recalling (2.16), the main focus of the exposition is on the time-harmonic case; thus, let us solely pay attention to the spatial discretization in the context of the finite element method.

The initial step of this method is a simplicial triangulation $\mathcal{T}_h$ of the space region $\Omega$, i.e., the space region is subdivided into a collection of tetrahedra. Notice that if $\Omega \subset \mathbb{R}^2$, then the triangulation $\mathcal{T}_h$ is a subdivision of $\Omega$ into a collection of triangles. Let us refer to $h$ as the mesh size parameter.

The next step is to choose a family of finite-dimensional subspaces $V_h$ of $V$ such that one can seek the discrete solution $u_h \in V_h$ by solving the discrete problem of the weak formulation (2.17), more precisely,

$$
\text{find } u_h \in V_h \text{ such that } \forall v \in V_h.\ a_h(u_h, v) = l_h(v),
\tag{2.23}
$$

where the map $a_h : V_h \times V_h \to \mathbb{R}$ denotes a bilinear form and the map $l_h : V_h \to \mathbb{R}$ denotes a linear form. Notice well that, recalling § 2.2.1, the underlying field of the spaces is tacitly assumed to be $\mathbb{R}$. However, technically speaking, the formulation (2.16) requires the field of complex numbers $\mathbb{C}$, which, in turn, demands the notions of a sesquilinear form and an anti-linear form. For the sake of exposition, though, let us not dwell on these specific technicalities and their implications.

By choosing an appropriate basis of $V_h$, the matrix representation of (2.23) expresses the computation of $u_h$ as the solution of a system of linear equations. For the construction of the finite element subspaces by associating each element of $\mathcal{T}_h$ with shape functions and degrees of freedom, I refer to, e.g., [6, p. 82f].
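To make the step from (2.23) to a linear system concrete, here is a minimal sketch for a hypothetical one-dimensional scalar analogue of the first equation in (2.16): $-(\nu u')' + j\omega\sigma u = J$ on $(0,1)$ with $u(0) = u(1) = 0$, constant coefficients, and piecewise linear hat basis functions on a uniform mesh. For simplicity, the sketch forms the complex-symmetric Galerkin matrix without conjugating the test functions, and it solves the tridiagonal system with the Thomas algorithm.

```python
def solve_p1(n, nu=1.0, k=0.0, j_src=1.0):
    """Hypothetical 1D analogue of (2.16): -(nu u')' + 1j*k*u = j_src on (0, 1),
    u(0) = u(1) = 0, with k = omega*sigma, P1 elements on n >= 2 uniform cells.
    Returns the nodal values, including the two boundary nodes."""
    h = 1.0 / n
    m = n - 1  # interior nodes = unknowns
    # Stiffness (nu/h)*[-1, 2, -1] plus mass (1j*k*h/6)*[1, 4, 1] for hat functions.
    diag = [2.0 * nu / h + 1j * k * 4.0 * h / 6.0] * m
    off = [-nu / h + 1j * k * h / 6.0] * (m - 1)
    rhs = [complex(j_src * h)] * m  # exact load: integral of a constant against a hat
    # Thomas algorithm: forward elimination ...
    cp = [0j] * (m - 1)
    dp = [0j] * m
    dp[0] = rhs[0] / diag[0]
    if m > 1:
        cp[0] = off[0] / diag[0]
    for i in range(1, m):
        denom = diag[i] - off[i - 1] * cp[i - 1]
        if i < m - 1:
            cp[i] = off[i] / denom
        dp[i] = (rhs[i] - off[i - 1] * dp[i - 1]) / denom
    # ... and back substitution.
    u = [0j] * m
    u[-1] = dp[-1]
    for i in range(m - 2, -1, -1):
        u[i] = dp[i] - cp[i] * u[i + 1]
    return [0j] + u + [0j]

static = solve_p1(4)          # k = 0: magnetostatic analogue, exact u(x) = x(1 - x)/2
print(static[2])              # approximately 0.125 = u(1/2)
eddy = solve_p1(64, k=100.0)  # k > 0: eddy-current analogue
print(abs(eddy[32]))          # clearly below 0.125 (damping by the 1j*k*u term)
```

In the static case, linear elements in 1D reproduce the exact solution at the nodes, which makes this a convenient smoke test.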

To construct a convergent numerical method that properly approximates the solution $u$, the family of finite-dimensional subspaces $V_h$ has to fulfill certain properties (cf. [6, p. 55ff]). For a more elaborate discussion of the consistency, stability, and convergence of numerical methods, see, e.g., [7].

To close this subsection, let us look more closely at a structural property that is related to the detour in § 2.1.2.

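Detour 2: a structural perspective on another structural property. Recall the diagrammatic presentation of the algebraic expression (2.7). I have argued that such a diagram is significant since it provides guidance when moving from the continuous to the discrete representation. The mimicry of the structural property of the continuous level at the discrete level can be encoded by the commuting diagram of the form

$$
\begin{array}{ccccccc}
L^2_{\mathrm{grad}}(D) & \xrightarrow{\ \operatorname{grad}\ } & \mathbf{L}^2_{\mathrm{curl}}(D) & \xrightarrow{\ \operatorname{curl}\ } & \mathbf{L}^2_{\mathrm{div}}(D) & \xrightarrow{\ \operatorname{div}\ } & L^2(D)\\
\big\downarrow{\scriptstyle \pi^{\mathrm{grad}}_h} & & \big\downarrow{\scriptstyle \pi^{\mathrm{curl}}_h} & & \big\downarrow{\scriptstyle \pi^{\mathrm{div}}_h} & & \big\downarrow{\scriptstyle \pi_h}\\
L^2_{\mathrm{grad},h}(D) & \xrightarrow{\ \operatorname{grad}\ } & \mathbf{L}^2_{\mathrm{curl},h}(D) & \xrightarrow{\ \operatorname{curl}\ } & \mathbf{L}^2_{\mathrm{div},h}(D) & \xrightarrow{\ \operatorname{div}\ } & L^2_h(D)
\end{array}
\tag{2.24}
$$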

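where $L^2_{\mathrm{grad},h}(D)$, $\mathbf{L}^2_{\mathrm{curl},h}(D)$, $\mathbf{L}^2_{\mathrm{div},h}(D)$, and $L^2_h(D)$ denote the finite element subspaces (see, e.g., [178, p. 21]); the differential operators behave polymorphically, i.e., they act on both the continuous and the discrete spaces; and the maps $\pi^{\mathrm{grad}}_h$, $\pi^{\mathrm{curl}}_h$, $\pi^{\mathrm{div}}_h$, $\pi_h$ denote projections (see, e.g., [218, p. 401–405]).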

From a purely structural perspective, the essence of the algebraic expression (2.24) is that there are different spaces and maps, equipped with a notion of composition in a certain context, in which one can draw diagrams of the form

[Diagram (2.25): objects $U$, $V$, $W$, $X$ (top row) and $\tilde{U}$, $\tilde{V}$, $\tilde{W}$ (bottom row), with maps $\tilde{f}_1$, $\tilde{f}_2$, $p_3$, $p_4$.]

that are commutative, i.e., they reflect the equality of various paths. Note that, in a more generic context, diagrams such as (2.25) are not necessarily commutative.

Using the intuition from the detour in § 2.1.2, one can regard the expression (2.25) as the syntax, whereas the expression (2.24) can be considered a possible semantics. Thus, this example hints more accurately at the style of reasoning employed in the category theoretical language.

2.2.3 Parametric mathematical model

Up to this point, the solution function $u$ is only considered within a space region $\Omega$. In applications, though, one is additionally interested in a solution function that depends on $N_\xi$ parameters, where $N_\xi \in \mathbb{N}$. These parameters are encoded in the parameter point $\boldsymbol{\xi} \in X \subset \mathbb{R}^{N_\xi}$. The map $f = \boldsymbol{\xi} \mapsto u : X \to V$ encodes the parametric solution function. The expression $f(\boldsymbol{\xi})$ denotes the solution function for the parameter point $\boldsymbol{\xi}$.

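The corresponding partial differential equation depends on $u$ and on $\boldsymbol{\xi}$ such that the map $L$ has to be extended to $L : X \times V \to V'$, which leads to $L(\boldsymbol{\xi}, u) = 0$ (cf. § 2.2.1). Commonly (see, e.g., [96, p. 2f]), it is assumed that the corresponding partial differential equation is well-posed, that is, for every $\boldsymbol{\xi} \in X$ there exists a unique $f(\boldsymbol{\xi}) \in V$ with $L(\boldsymbol{\xi}, f(\boldsymbol{\xi})) = 0$.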

In the next step, let us adapt the weak formulation (2.17) in order to state a parametric weak formulation (cf. [94, p. 16]). To this end, we have to extend the bilinear form and the linear form to $a : V \times V \times X \to \mathbb{R}$ and $l : V \times X \to \mathbb{R}$, respectively, where bilinearity and linearity refer to the $V$-related arguments. The parametric weak formulation (also known as strong-weak formulation) reads

$$
\text{given } \boldsymbol{\xi} \in X, \text{ find } f(\boldsymbol{\xi}) \in V \text{ such that } \forall v \in V.\ a(f(\boldsymbol{\xi}), v, \boldsymbol{\xi}) = l(v, \boldsymbol{\xi}).
\tag{2.26}
$$

Regarding the well-posedness of the formulation (2.26), one has to suppose the requirements of the non-parametric case (2.19). For further details, I refer to, e.g., [94].

Recalling the strong formulation (2.16), one can observe that the physical meaning of the parameters originates from either the material, the geometry, or the source. In general, the individual components of the parameter point $\boldsymbol{\xi}$ can have different physical meanings.

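Similarly to (2.21), let us introduce two functionals $Q_{\boldsymbol{\xi}}$ and $\hat{Q}_{\boldsymbol{\xi}}$: the parametric quantity of interest $Q_{\boldsymbol{\xi}} : V \times X \to \mathbb{R}$ and the reduced parametric quantity of interest $\hat{Q}_{\boldsymbol{\xi}} : X \to \mathbb{R}$. Given that the codomains of the two functionals match, i.e., $\operatorname{cod}(\hat{Q}_{\boldsymbol{\xi}}) \equiv \operatorname{cod}(Q_{\boldsymbol{\xi}})$, it is assumed that the evaluations of the functionals yield the same numerical result, i.e.,

$$
\forall \boldsymbol{\xi} \in X.\ \hat{Q}_{\boldsymbol{\xi}}(\boldsymbol{\xi}) = Q_{\boldsymbol{\xi}}(f(\boldsymbol{\xi}), \boldsymbol{\xi}).
\tag{2.27}
$$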

The evaluated functional $\hat{Q}_{\boldsymbol{\xi}}(\boldsymbol{\xi})$ can be interpreted as, for instance, the numerical value of the magnetic energy (or the numerical value of the power loss) for certain

2.3. Numerical optimization with the magnetoquasistatic model 23

geometry parameters. Analogously, the evaluated functional $Q_{\boldsymbol{\xi}}(f(\boldsymbol{\xi}), \boldsymbol{\xi})$ can be interpreted as the numerical value of the magnetic energy or the power loss; however, it emphasizes the role of the parametric solution function $f$ as well.
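Relation (2.27) says that $\hat{Q}_{\boldsymbol{\xi}}$ is obtained by composing the parametric solution map $f$ with $Q_{\boldsymbol{\xi}}$. A minimal sketch with a hypothetical one-dimensional model whose solution is known in closed form: for $\boldsymbol{\xi} = (\nu, J)$, the problem $-\nu u'' = J$ on $(0,1)$ with $u(0) = u(1) = 0$ has $u(x) = J x(1-x)/(2\nu)$, and the energy-like functional $Q_{\boldsymbol{\xi}}(u, \boldsymbol{\xi}) = \tfrac{1}{2}\int_0^1 \nu\,(u')^2\,dx$ then equals $J^2/(24\nu)$. The closed form merely stands in for a PDE solve.

```python
from typing import Callable

Param = tuple[float, float]          # xi = (nu, j_src), hypothetical parameters
Solution = Callable[[float], float]  # u : (0, 1) -> R

def f(xi: Param) -> Solution:
    """Parametric solution map X -> V (closed form stands in for a PDE solve)."""
    nu, j_src = xi
    return lambda x: j_src * x * (1.0 - x) / (2.0 * nu)

def Q(u: Solution, xi: Param, n: int = 2000) -> float:
    """Parametric QoI V x X -> R: 0.5 * integral of nu * (u')^2, via difference quotients."""
    nu, _ = xi
    h = 1.0 / n
    return sum(0.5 * nu * ((u((i + 1) * h) - u(i * h)) / h) ** 2 * h for i in range(n))

def Q_hat(xi: Param) -> float:
    """Reduced QoI X -> R, defined by composition as in (2.27)."""
    return Q(f(xi), xi)

xi = (2.0, 3.0)
print(Q_hat(xi), xi[1] ** 2 / (24 * xi[0]))  # numerical vs. closed-form value
```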

For further discussion of parametric mathematical models in the context of magnetoquasistatic Maxwell’s theory, I refer to the functional analytic setting, e.g., in [178], and to the differential geometric setting, e.g., in [174].

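Remark 2.2.2. Since the domains of the functionals do not match, i.e., $\operatorname{dom}(\hat{Q}_{\boldsymbol{\xi}}) \not\equiv \operatorname{dom}(Q_{\boldsymbol{\xi}})$, one cannot conclude from (2.27) that the maps $\hat{Q}_{\boldsymbol{\xi}}$ and $Q_{\boldsymbol{\xi}}$ are equal by function extensionality.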

In principle, one should be cautious with the equality of evaluated quantities of interest such as in (2.27). For instance, consider the so-called magnetic energy functional and the so-called magnetic coenergy functional. Their assignment rules are different, and the numerical results of their evaluations are only equal in the linear case (cf. [32, p. 194]).
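The caveat about energy and coenergy can be reproduced pointwise for a single material law $B(H)$. The sketch compares the energy density $w = \int_0^{B} H\,dB'$ and the coenergy density $w' = \int_0^{H} B\,dH'$ for a linear law $B = \mu H$ and for a hypothetical saturating law $B = B_s\tanh(H/H_0)$; the material parameters and the operating point are illustrative assumptions. It uses $w = BH - w'$ (integration by parts) and a trapezoidal rule for $w'$.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability in H/m (classical value)

def coenergy_density(b_of_h, h_max, n=20000):
    """w' = integral of B(H) dH from 0 to h_max (composite trapezoidal rule)."""
    dh = h_max / n
    inner = sum(b_of_h(i * dh) for i in range(1, n))
    return dh * (0.5 * b_of_h(0.0) + inner + 0.5 * b_of_h(h_max))

def energy_density(b_of_h, h_max):
    """w = integral of H dB = B*H - w' (integration by parts, assuming B(0) = 0)."""
    return b_of_h(h_max) * h_max - coenergy_density(b_of_h, h_max)

def linear(h):      # hypothetical linear law, mu_r = 1000
    return 1000.0 * MU0 * h

def saturating(h):  # hypothetical saturating law, B_s = 1.5 T, H_0 = 500 A/m
    return 1.5 * math.tanh(h / 500.0)

h_op = 2000.0  # operating field strength in A/m (assumption)
for name, law in (("linear", linear), ("saturating", saturating)):
    print(name, energy_density(law, h_op), coenergy_density(law, h_op))
```

For the linear law, both densities coincide with $\mu H^2/2$; for the saturating law, they differ substantially, although both are computed from the same curve.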

Additionally, recall the example (E1) in § 1.3 regarding the loss computation of a three-dimensional helical coil and the loss computation of a corresponding representation by toroids. Let $\boldsymbol{\xi}$ comprise geometry parameters that are, in some sense, the same for both the coil and the toroids, and let $\hat{Q}_{\boldsymbol{\xi},1}(\boldsymbol{\xi})$ and $\hat{Q}_{\boldsymbol{\xi},2}(\boldsymbol{\xi})$ denote the loss of the coil and the loss of the toroids, respectively. Then, an elementary tool of comparison is to check

$$
\forall \boldsymbol{\xi} \in X.\ \hat{Q}_{\boldsymbol{\xi},1}(\boldsymbol{\xi}) =_{\mathbb{R}} \hat{Q}_{\boldsymbol{\xi},2}(\boldsymbol{\xi}),
\tag{2.28}
$$

where the notation $=_{\mathbb{R}}$ indicates a test of equality of real numbers. If the statement (2.28) holds true, then one can conclude that the maps $\hat{Q}_{\boldsymbol{\xi},1}$ and $\hat{Q}_{\boldsymbol{\xi},2}$ are equal by function extensionality, i.e., $\hat{Q}_{\boldsymbol{\xi},1} =_{X\to\mathbb{R}} \hat{Q}_{\boldsymbol{\xi},2}$, such that one can substitute one map for the other.

However, the general statement (2.28) might be undecidable. For the sake of completeness, it is unlikely that the maps are also equal by function intensionality, because it is unlikely that the internal definitions of the maps are equal. In (3.2) in ch. 3, we encounter a situation similar to the statement (2.28) from the perspective of approximation theory.

Note that the previous considerations are relevant from a rather logical-analysis viewpoint. Especially if one imagines other loss computations based on, in some sense, corresponding representations of the helical coil, it becomes appealing to look for further tools of comparison at the map level. Putting an emphasis on the map level is a peculiarity of the category theoretical language.

2.3 Numerical optimization with the magnetoquasistatic model

Establishing the well-posedness of a mathematical model is a demanding task in its own right. This property is a prerequisite for any optimization procedure that is built on top of the model. Hence, in the present work, all mathematical models under investigation are assumed to be well-posed.

Let us begin the section by outlining some theoretical considerations and limitations regarding optimization theory with partial differential equations and its finite-dimensional formulation as a nonlinear optimization problem. A particular feature of these optimization problems is that the evaluation of the objective function, of the constraints, or of both requires solving a PDE. In the present work, the discrete version of the PDE is not treated as an explicit equality constraint; instead, it is only considered implicitly within a given optimization problem.

We end the section by an illustration of a subset of optimization test functions

and various types of optimization algorithms.

24 Chapter 2. Magnetoquasistatic Maxwell’s theory – Modeling, simulation, and

optimization

2.3.1 Optimization with a partial differential equation

The modern solution theory of optimization problems with partial differential equations goes hand in hand with the modern solution theory of partial differential equations; more precisely, they are rooted in the infinite-dimensional Banach space setting and Hilbert space setting, respectively. However, since a thorough discussion in such settings is out of the scope of the present work, let us only briefly consider some aspects in order to be consistent with the abstract discussion in the previous section and to shed light on some questions regarding optimization problems. For an in-depth look at the infinite-dimensional case, I refer to, e.g., [96], [210].

Let us use the parametric mathematical model $L(\boldsymbol{\xi},u)=0$ from § 2.2.3 as a basis for the investigation of the objective functional (or cost functional) $J\colon X\times V\to\mathbb{R}$. Moreover, let us introduce a Hilbert space $\mathcal{R}$ and a closed convex cone $K\subset\mathcal{R}$ such that one can define a map $C\colon X\times V\to\mathcal{R}$ in order to encode an abstract inequality constraint as $C(\boldsymbol{\xi},u)\in K$. Hence, one can abstractly define the following optimization problem (cf. [96, p. 2])

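\begin{align}
\text{minimize}\quad & J(\boldsymbol{\xi},u) \quad\text{over } (\boldsymbol{\xi},u)\in X\times V, \tag{2.29a}\\
\text{subject to}\quad & L(\boldsymbol{\xi},u)=0,\quad C(\boldsymbol{\xi},u)\in K, \tag{2.29b}
\end{align}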

where it is assumed that the objective functional $J$ is sufficiently smooth. The evaluation of $J$ relies on accurately solving the discrete version of the partial differential equation that is encoded in $L(\boldsymbol{\xi},u)=0$. Supposing the well-posedness of the partial differential equation and using the map $f$ from § 2.2.3, one can restate the optimization problem as

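\begin{align}
\text{min.}\quad & J(\boldsymbol{\xi},f(\boldsymbol{\xi})) \quad\text{over } (\boldsymbol{\xi},f(\boldsymbol{\xi}))\in X\times V, \tag{2.30a}\\
\text{s.t.}\quad & L(\boldsymbol{\xi},f(\boldsymbol{\xi}))=0,\quad C(\boldsymbol{\xi},f(\boldsymbol{\xi}))\in K. \tag{2.30b}
\end{align}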

Analogously to (2.27), one can define the reduced objective functional $\hat{J}\colon X\to\mathbb{R}$ such that $\hat{J}(\boldsymbol{\xi})=J(\boldsymbol{\xi},f(\boldsymbol{\xi}))$. Let us briefly consider an instance of the reduced objective functional. For the sake of presentation, I replace the finite-dimensional space $X$ by the infinite-dimensional space $\tilde{X}$ and introduce the variable $\tilde{\xi}\in\tilde{X}$. Hence, the instantiated reduced objective functional $\hat{J}$ reads as

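\[ \hat{J} = \tilde{\xi}\mapsto \frac{\alpha}{2}\lVert f(\tilde{\xi})-y_d\rVert_A^2+\frac{\beta}{2}\lVert\tilde{\xi}\rVert_B^2 \colon \tilde{X}\to\mathbb{R}, \tag{2.31} \]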

where $\alpha,\beta>0$ (with $0<\alpha+\beta$) indicate some fixed scalars for, e.g., weighting or regularization purposes, $\lVert\cdot\rVert_A$ and $\lVert\cdot\rVert_B$ indicate some appropriate norms, and $y_d$ denotes a fixed desired solution function. Objective functionals similar to (2.31) are investigated, for instance, when $\tilde{\xi}$ is the source current density; the corresponding optimization problems deal with the optimal control of electromagnetic fields (see, e.g., [211]).
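In a finite-dimensional setting, the reduced functional (2.31) can be sketched in Python as a higher-order function that closes over the solution map. The sketch makes two assumptions: Euclidean norms replace $\lVert\cdot\rVert_A$ and $\lVert\cdot\rVert_B$, and a hypothetical diagonal map `toy_solution_map` stands in for the (discrete) solution map $f$. Neither choice is taken from the present work.

```python
from typing import Callable, List

Vector = List[float]


def sq_norm(v: Vector) -> float:
    """Squared Euclidean norm (assumed stand-in for the norms A and B)."""
    return sum(vi * vi for vi in v)


def make_tracking_functional(
    f: Callable[[Vector], Vector], y_d: Vector, alpha: float, beta: float
) -> Callable[[Vector], float]:
    """Return xi -> alpha/2 ||f(xi) - y_d||^2 + beta/2 ||xi||^2, cf. (2.31)."""

    def j_hat(xi: Vector) -> float:
        u = f(xi)  # one solve of the (discrete) state equation per evaluation
        misfit = [ui - yi for ui, yi in zip(u, y_d)]
        return 0.5 * alpha * sq_norm(misfit) + 0.5 * beta * sq_norm(xi)

    return j_hat


def toy_solution_map(xi: Vector) -> Vector:
    """Hypothetical stand-in for f: a diagonal linear 'solver' u = 2 * xi."""
    return [2.0 * x for x in xi]


j_hat = make_tracking_functional(toy_solution_map, y_d=[1.0, 1.0], alpha=1.0, beta=0.1)
print(j_hat([0.5, 0.5]))  # the misfit vanishes, only the regularization term remains
```

The optimizer only sees `j_hat`. The solution map is hidden inside the closure, which mirrors the reduced formulation.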

From an application viewpoint, it is desirable to consider the role of (reduced) parametric quantities of interest in the optimization problem (2.30). There are various possible combinations. For instance, the instantiated reduced objective functional $\hat{J}$ in (2.31) does not necessarily encode a reduced parametric quantity of interest $\hat{Q}_{\tilde{\xi}}$ (see, e.g., [80]). In that case, the value $\hat{Q}_{\tilde{\xi}}(\tilde{\xi})$ would only be determined in a post-optimization step. Nevertheless, it is conceivable to instantiate the reduced objective functional $\hat{J}$ as

\[ \hat{J} = \tilde{\xi}\mapsto \hat{Q}_{\tilde{\xi}}(\tilde{\xi}) \colon \tilde{X}\to\mathbb{R}, \tag{2.32} \]

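or, if a desired value $Q_d\in\mathbb{R}$ is provided, one can instantiate $\hat{J}$ as
\[ \hat{J} = \tilde{\xi}\mapsto \lVert\hat{Q}_{\tilde{\xi}}(\tilde{\xi})-Q_d\rVert_{l^2}^2 \colon \tilde{X}\to\mathbb{R}. \tag{2.33} \]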

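If $N_{\hat{Q}}$ quantities of interest with $N_{\hat{Q}}\in\mathbb{N}$ are considered, then a possible instantiation of $\hat{J}$ can be written as
\[ \hat{J} = \tilde{\xi}\mapsto \sum_{i=1}^{N_{\hat{Q}}} \eta_i \cdot_{\mathbb{R}} \hat{Q}_{\tilde{\xi},i}(\tilde{\xi}) \colon \tilde{X}\to\mathbb{R}, \tag{2.34} \]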

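or, analogously to (2.33), if $N_{\hat{Q}}$ desired values $Q_{d,1},\dots,Q_{d,N_{\hat{Q}}}\in\mathbb{R}$ are provided, then one can extend (2.34) to the expression
\[ \hat{J} = \tilde{\xi}\mapsto \sum_{i=1}^{N_{\hat{Q}}} \eta_i \cdot_{\mathbb{R}} \lVert\hat{Q}_{\tilde{\xi},i}(\tilde{\xi})-Q_{d,i}\rVert_{l^2}^2 \colon \tilde{X}\to\mathbb{R}, \tag{2.35} \]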

where $\eta_i>0$ (with, e.g., $\eta_1+\dots+\eta_{N_{\hat{Q}}}=1$) denote $N_{\hat{Q}}$ fixed weighting constants and $\cdot_{\mathbb{R}}$ indicates the standard multiplication of real numbers. Additionally, one could incorporate the evaluated quantities of interest in the constraints (2.30b) as well.
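The instantiation (2.35) can be sketched as a higher-order construction: given reduced quantities of interest, targets, and weights, it returns a new map to $\mathbb{R}$. The sketch below is in Python. The quantities of interest `qoi_1` and `qoi_2` are hypothetical closed-form placeholders. They are not the magnetoquasistatic quantities of the present work.

```python
from typing import Callable, Sequence

Param = Sequence[float]
QoI = Callable[[Param], float]


def weighted_qoi_objective(
    qois: Sequence[QoI], targets: Sequence[float], weights: Sequence[float]
) -> Callable[[Param], float]:
    """Return xi -> sum_i eta_i * |Q_i(xi) - Q_d,i|^2, cf. (2.35)."""
    if not (len(qois) == len(targets) == len(weights)):
        raise ValueError("need one target and one weight per quantity of interest")
    if any(w <= 0.0 for w in weights):
        raise ValueError("weights eta_i must be positive")

    def j_hat(xi: Param) -> float:
        # For a scalar quantity of interest, the squared l2-norm is a squared difference.
        return sum(w * (q(xi) - qd) ** 2 for q, qd, w in zip(qois, targets, weights))

    return j_hat


# Hypothetical placeholder quantities of interest of a two-dimensional parameter.
def qoi_1(xi: Param) -> float:
    return xi[0] + xi[1]


def qoi_2(xi: Param) -> float:
    return xi[0] * xi[1]


j_hat = weighted_qoi_objective([qoi_1, qoi_2], targets=[3.0, 2.0], weights=[0.5, 0.5])
print(j_hat([1.0, 2.0]))  # both targets are met at xi = (1, 2)
```

With a single quantity of interest and weight 1, the same builder yields (2.33).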

From a solution theory viewpoint, there are a number of important questions (cf. [96, p. 3f]) regarding the optimization problem in (2.30):

(a) The first question is whether there exists an optimal argument that attains an optimal objective functional value.

(b) An immediate second question is whether this optimal argument is unique.

(c) A third question concerns the characterization of an optimal argument with respect to the objective functional and the constraints, hence, whether the optimality conditions, the so-called Karush-Kuhn-Tucker (KKT) conditions, are satisfied. Mostly, first-order necessary optimality conditions are elaborated, since the investigation of second-order necessary and sufficient optimality conditions is harder.

(d) The fourth question is concerned with corresponding optimization algorithms. Ideally, the algorithms respect the KKT conditions; thus, in the search for the optimal solution, they rely on information about the first derivative (gradient) or the second derivative (Hessian) of, e.g., the objective functional. In general, such algorithms can at best be guaranteed to find local minima of the problem in (2.30). Under certain conditions such as, e.g., convexity, a local minimum is also a global one, so that they even find the global minimal objective function value.
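In the finite-dimensional case, the first-order KKT conditions mentioned in (c) can be checked numerically at a candidate point. The Python sketch below makes several assumptions: a problem with inequality constraints $c_i\le 0$ only, hand-coded gradients, and a given multiplier estimate. It also presumes that a constraint qualification holds. The example is a textbook toy problem, not one from the present work.

```python
from typing import Callable, Dict, List, Sequence

Vec = List[float]


def kkt_residuals(
    grad_f: Callable[[Sequence[float]], Vec],
    cons: Sequence[Callable[[Sequence[float]], float]],
    grad_cons: Sequence[Callable[[Sequence[float]], Vec]],
    x: Sequence[float],
    mu: Sequence[float],
) -> Dict[str, float]:
    """First-order KKT residuals for: minimize f(x) subject to c_i(x) <= 0."""
    # Stationarity: grad f(x) + sum_i mu_i grad c_i(x) = 0.
    g = list(grad_f(x))
    for m, gc in zip(mu, grad_cons):
        g = [gi + m * gci for gi, gci in zip(g, gc(x))]
    return {
        "stationarity": max(abs(gi) for gi in g),
        "primal_feasibility": max(max(c(x), 0.0) for c in cons),
        "dual_feasibility": max(max(-m, 0.0) for m in mu),
        "complementarity": max(abs(m * c(x)) for m, c in zip(mu, cons)),
    }


# Toy problem: minimize x1^2 + x2^2 subject to 1 - x1 - x2 <= 0.
def grad_f(x: Sequence[float]) -> Vec:
    return [2.0 * x[0], 2.0 * x[1]]


def c1(x: Sequence[float]) -> float:
    return 1.0 - x[0] - x[1]


def grad_c1(x: Sequence[float]) -> Vec:
    return [-1.0, -1.0]


res = kkt_residuals(grad_f, [c1], [grad_c1], x=[0.5, 0.5], mu=[1.0])
print(res)  # all residuals vanish at (0.5, 0.5) with multiplier mu = 1
```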

For instance, in the case of linear elliptic partial differential equations such as in (2.16), questions (a) to (d) are well understood in the context of the optimization problem in (2.30). However, to the best of my present understanding, there is not yet a complete general solution theory that covers quantities of interest in different combinations within the optimization problem and different physical meanings of the parameters.

The above-mentioned tendency of these algorithms to find local minima motivates the introduction of some kind of randomness in the search for a potential global minimum. If an algorithm exhibits some kind of randomness in its search, let us label it as stochastic; otherwise, let us label it as deterministic. For stochastic algorithms, there is, to the best of my knowledge, no established theory that guarantees optimal solutions of an optimization problem such as (2.30).
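The distinction between a deterministic local search and a simple stochastic variant can be sketched in Python. The sketch rests on three assumptions:

- a one-dimensional toy function $g(x)=x^4-3x^2+x$ on $[-2,2]$, with a local minimum near $x\approx1.13$ and the global minimum near $x\approx-1.30$;
- projected gradient descent as the deterministic local method;
- uniformly random start points as the source of randomness.

The multistart scheme is only an illustration, not one of the algorithms used later.

```python
import random


def g(x: float) -> float:
    return x**4 - 3.0 * x**2 + x


def dg(x: float) -> float:
    return 4.0 * x**3 - 6.0 * x + 1.0


def projected_gradient_descent(x0: float, lo: float = -2.0, hi: float = 2.0,
                               step: float = 0.05, iters: int = 500) -> float:
    """Deterministic local method: the same start point always gives the same result."""
    x = x0
    for _ in range(iters):
        x = min(hi, max(lo, x - step * dg(x)))  # gradient step, then projection onto [lo, hi]
    return x


def random_multistart(n_starts: int, seed: int = 0) -> float:
    """Stochastic variant: run the local method from random start points, keep the best."""
    rng = random.Random(seed)
    candidates = [projected_gradient_descent(rng.uniform(-2.0, 2.0)) for _ in range(n_starts)]
    return min(candidates, key=g)


x_det = projected_gradient_descent(1.5)  # trapped in the local minimum near 1.13
x_sto = random_multistart(20)            # very likely reaches the global minimum near -1.30
print(x_det, g(x_det))
print(x_sto, g(x_sto))
```

The multistart result is probabilistic. More start points raise the chance of hitting the global basin, but, in line with the remark above, they guarantee nothing.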

Finally, in order to solve the optimization problem in (2.30) numerically, one has to transform it into a nonlinear optimization problem, i.e., one has to move from the infinite-dimensional case to the finite-dimensional case.

2.3.2 Nonlinear optimization problem

Transforming the infinite-dimensional optimization problem in (2.30) into a finite-dimensional one amounts to identifying the involved spaces with subspaces of the standard Euclidean space and to restricting the problem to finitely many equality and inequality constraints. Then, one can abstractly define the so-called nonlinear optimization problem (see, e.g., [158], [18], [35], [142]) as

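\begin{align}
\text{min.}\quad & j(\boldsymbol{\xi},f(\boldsymbol{\xi})) \quad\text{over } (\boldsymbol{\xi},f(\boldsymbol{\xi}))\in\mathbb{R}^{N_\xi}\times\mathbb{R}^{n}, \tag{2.36a}\\
\text{s.t.}\quad & \forall i\in D.\; l_i(\boldsymbol{\xi},f(\boldsymbol{\xi}))=0, \tag{2.36b}\\
& \forall i\in E.\; c_i(\boldsymbol{\xi},f(\boldsymbol{\xi}))\leq 0, \tag{2.36c}
\end{align}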

where $j\colon X_{\mathrm{ad}}\times V_{\mathrm{ad}}\to\mathbb{R}$ is the smooth objective function, $X_{\mathrm{ad}}\subseteq\mathbb{R}^{N_\xi}$ is the set of admissible parameter points, $V_{\mathrm{ad}}\subseteq\mathbb{R}^{n}$ is the set of admissible parametric solution functions for a given parameter point, $D$ is the set of indices for the equality constraints $l_i\colon\mathbb{R}^{N_\xi}\times\mathbb{R}^{n}\to\mathbb{R}$, and $E$ is the set of indices for the inequality constraints $c_i\colon\mathbb{R}^{N_\xi}\times\mathbb{R}^{n}\to\mathbb{R}$. Let us keep the argument names $\boldsymbol{\xi}$ and $f(\boldsymbol{\xi})$ unchanged, since the altered context is clear and there is no risk of confusion. If one incorporates the constraints into a set of admissible arguments $W_{\mathrm{ad}}:=X_{\mathrm{ad}}\times V_{\mathrm{ad}}$, then one can express the optimization problem in (2.36) compactly as

\[ \min_{(\boldsymbol{\xi},f(\boldsymbol{\xi}))\in W_{\mathrm{ad}}}\; j(\boldsymbol{\xi},f(\boldsymbol{\xi})). \tag{2.37} \]

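Introducing a reduced objective function $\hat{j}\colon X_{\mathrm{ad}}\to\mathbb{R}$ such that $\hat{j}(\boldsymbol{\xi}):=j(\boldsymbol{\xi},f(\boldsymbol{\xi}))$ and invoking set-builder notation to define the set of admissible parameter points $X_{\mathrm{ad}}$ as
\[ X_{\mathrm{ad}} := \{\boldsymbol{\xi}\in\mathbb{R}^{N_\xi} : \forall i\in D.\; l_i(\boldsymbol{\xi},f(\boldsymbol{\xi}))=0 \,\wedge\, \forall i\in E.\; c_i(\boldsymbol{\xi},f(\boldsymbol{\xi}))\leq 0\}, \tag{2.38} \]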

one can state the reduced optimization problem compactly as
\[ \min_{\boldsymbol{\xi}\in X_{\mathrm{ad}}}\; \hat{j}(\boldsymbol{\xi}), \tag{2.39} \]

where, technically, one can assume that there are reduced constraint functions $l_i(\boldsymbol{\xi})$ and $c_i(\boldsymbol{\xi})$ such that $l_i(\boldsymbol{\xi}):=l_i(\boldsymbol{\xi},f(\boldsymbol{\xi}))$ and $c_i(\boldsymbol{\xi}):=c_i(\boldsymbol{\xi},f(\boldsymbol{\xi}))$. A decisive property of the class of nonlinear optimization problems such as (2.36), (2.37), and (2.39) is that the evaluation of the objective function, of the constraints, or of both requires solving the discrete version of a partial differential equation. In the present work, however, the discrete version of $L(\boldsymbol{\xi},f(\boldsymbol{\xi}))=0$ is only considered implicitly (see, e.g., [3]), as opposed to its treatment as an explicit equality constraint (see, e.g., [96] or [210]).
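The implicit treatment of the discrete PDE can be sketched as a reduced objective that calls a solver internally. The optimizer then only ever sees $\boldsymbol{\xi}\mapsto\hat{j}(\boldsymbol{\xi})$ and the reduced constraints. The Python sketch below assumes, purely for illustration:

- a one-dimensional Poisson problem $-u''=\xi$ on $(0,1)$ with homogeneous Dirichlet conditions;
- a discretization by second-order finite differences;
- the Thomas algorithm as the solver.

The target $y_d$ and the bound in the constraint are hypothetical.

```python
from typing import List


def solve_tridiagonal(a: List[float], b: List[float], c: List[float], d: List[float]) -> List[float]:
    """Thomas algorithm: sub-diagonal a, diagonal b, super-diagonal c, right-hand side d."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for k in range(1, n):
        denom = b[k] - a[k] * cp[k - 1]
        cp[k] = c[k] / denom if k < n - 1 else 0.0
        dp[k] = (d[k] - a[k] * dp[k - 1]) / denom
    u = [0.0] * n
    u[-1] = dp[-1]
    for k in range(n - 2, -1, -1):
        u[k] = dp[k] - cp[k] * u[k + 1]
    return u


def solution_map(xi: float, n: int = 49) -> List[float]:
    """Discrete state f(xi): -u'' = xi on (0, 1), u(0) = u(1) = 0, n interior nodes."""
    h = 1.0 / (n + 1)
    a = [-1.0 / h**2] * n
    b = [2.0 / h**2] * n
    c = [-1.0 / h**2] * n
    return solve_tridiagonal(a, b, c, [xi] * n)


def j_hat(xi: float, y_d: float = 1.0) -> float:
    """Reduced objective: the discrete PDE is solved inside, not posed as a constraint."""
    u = solution_map(xi)
    h = 1.0 / (len(u) + 1)
    return 0.5 * h * sum((uk - y_d) ** 2 for uk in u)


def c_hat(xi: float, xi_max: float = 10.0) -> float:
    """Reduced inequality constraint c(xi) <= 0 (hypothetical bound on the source)."""
    return xi - xi_max


u = solution_map(8.0)
print(u[len(u) // 2])  # midpoint value; the exact solution xi*x*(1-x)/2 equals 1.0 at x = 0.5
print(j_hat(8.0), c_hat(8.0))
```

Because the second-order difference quotient is exact for quadratics, the discrete solution coincides with $\xi x(1-x)/2$ at the nodes. This makes the sketch easy to check.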

Regarding the above-mentioned class of nonlinear optimization problems, the fundamental assumption is that the numerical simulation of the corresponding mathematical model, in our case the magnetoquasistatic model, dominates the overall computational costs of the optimization procedure. This assumption motivates the use of so-called low-fidelity mathematical models in order to reliably accelerate the optimization procedure. Note that low-fidelity models are implicitly associated with low computational costs. Chapter 3 is devoted to the discussion of optimization schemes using low-fidelity models.

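In ch. 3, we discuss a particular class of optimization schemes that follow the so-called space-mapping paradigm (see, e.g., [125, p. 47]). Within this class, there is an emphasis on a reduced objective function $\hat{j}\circ f\colon X_{\mathrm{ad}}\to\mathbb{R}$, where $\hat{j}\colon V_{\mathrm{ad}}\to\mathbb{R}$ and the condition $\hat{j}(f(\boldsymbol{\xi}))=j(\boldsymbol{\xi},f(\boldsymbol{\xi}))$ is applied. Considering (2.39), the condition $\hat{j}(f(\boldsymbol{\xi}))=\hat{j}(\boldsymbol{\xi})$ is supposed as well. Finally, one can state the reduced optimization problem as
\[ \min_{\boldsymbol{\xi}\in X_{\mathrm{ad}}}\; (\hat{j}\circ f)(\boldsymbol{\xi}), \tag{2.40} \]
where the composition operator $\circ\colon \hom(V_{\mathrm{ad}},\mathbb{R})\times\hom(X_{\mathrm{ad}},V_{\mathrm{ad}})\to\hom(X_{\mathrm{ad}},\mathbb{R})$ is assumed.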

Confronted with the optimization problems in (2.37), (2.39), and (2.40), let us briefly apply the structural perspective from the detours in § 2.1.2 and § 2.2.2. We discuss this perspective more thoroughly in ch. 4.

Detour 3: a structural perspective on the objective functions. Recall the diagrammatic presentation of the abstract algebraic expressions in (2.8) and (2.25). Using this style of presentation in the context of the optimization problems in (2.37), (2.39), and (2.40) results in, among other aspects, stressing the function signatures of the involved objective functions. Therefore, the objective function in (2.37) can be expressed by its assignment rule together with its signature, i.e.,
\[ j = (\boldsymbol{\xi},f(\boldsymbol{\xi}))\mapsto j(\boldsymbol{\xi},f(\boldsymbol{\xi})) \colon X_{\mathrm{ad}}\times V_{\mathrm{ad}}\to\mathbb{R}, \tag{2.41} \]

the objective function in (2.39) can be expressed by its assignment rule together with its signature, i.e.,
\[ \hat{j} = \boldsymbol{\xi}\mapsto \hat{j}(\boldsymbol{\xi}) \colon X_{\mathrm{ad}}\to\mathbb{R}, \tag{2.42} \]

and the objective function in (2.40) can be expressed by its assignment rule together with its signature, i.e.,
\[ \hat{j}\circ f = \boldsymbol{\xi}\mapsto (\hat{j}\circ f)(\boldsymbol{\xi}) \colon X_{\mathrm{ad}}\to\mathbb{R}. \tag{2.43} \]

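Recalling (2.28), an elementary tool of comparison for (2.42) and (2.43) is to check
\[ \forall\boldsymbol{\xi}\in X_{\mathrm{ad}}.\; \hat{j}(\boldsymbol{\xi}) =_{\mathbb{R}} (\hat{j}\circ f)(\boldsymbol{\xi}). \tag{2.44} \]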

If the statement in (2.44) holds true, then one can conclude that the maps $\hat{j}$ and $\hat{j}\circ f$ are equal by function extensionality, that is, $\hat{j} =_{X_{\mathrm{ad}}\to\mathbb{R}} \hat{j}\circ f$, such that one can substitute one map for the other. However, from a purely structural perspective, the essence of the previously designated objective functions can be captured by four distinct objects $A$, $B$, $C$, and $A\times B$ and by five distinct maps $j_0$, $j_1$, $j_2$, $f_2$, and $j_2\circ f_2$, such that one can draw three diagrams
\[ A\times B \xrightarrow{\;j_0\;} C, \qquad A \xrightarrow{\;j_1\;} C, \qquad A \xrightarrow{\;f_2\;} B \xrightarrow{\;j_2\;} C \;\;\text{with the composite } j_2\circ f_2\colon A\to C. \tag{2.45} \]

The benefits of such diagrams are diverse:

(1) Such diagrams pictorially disclose the different decisions for formulating the objective function at the signature level. Hence, they reflect, in some sense, the information available for the problem at hand.

(2) From the viewpoint of syntax and semantics (recall the detours in § 2.1.2 and in § 2.2.2), the level of abstraction of such diagrams is particularly useful as a unifying guiding tool. For instance, the map $f_2$ can be a parametric quantity of interest, and the map $j_2$ can be chosen such that the map $j_2\circ f_2$ encodes an objective function similar to (2.33).

(3) Especially in the wider context of validation and verification (see, e.g., [178, p. 11ff] and some references therein such as, e.g., [12], [182], and [160]), the diagrammatic presentation is helpful as a formal organizing tool when we consider various models of various fidelity at the signature level.
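Function extensionality as in (2.44) quantifies over all of $X_{\mathrm{ad}}$. A program can therefore at best collect evidence for it by sampling, whereas a single mismatch refutes it. The following Python sketch makes the signatures of (2.42) and (2.43) explicit via type hints and samples (2.44) at random points. The maps `f`, `j_hat_V`, and `j_hat_X` are hypothetical toy instances. `j_hat_X` is deliberately written in an intensionally different way.

```python
import math
import random
from typing import Callable, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def compose(g: Callable[[B], C], h: Callable[[A], B]) -> Callable[[A], C]:
    """Composition operator: hom(B, C) x hom(A, B) -> hom(A, C)."""
    return lambda a: g(h(a))


Point = Tuple[float, float]


def f(xi: Point) -> Point:
    # Hypothetical solution map X_ad -> V_ad.
    return (xi[0] + xi[1], xi[0] - xi[1])


def j_hat_V(u: Point) -> float:
    # Hypothetical objective V_ad -> R.
    return 0.5 * (u[0] ** 2 + u[1] ** 2)


def j_hat_X(xi: Point) -> float:
    # Intensionally different definition X_ad -> R: ((a+b)^2 + (a-b)^2) / 2 = a^2 + b^2.
    return xi[0] ** 2 + xi[1] ** 2


def sampled_extensional_check(p: Callable[[Point], float], q: Callable[[Point], float],
                              n: int = 1000, seed: int = 0, tol: float = 1e-9) -> bool:
    """Test p(xi) == q(xi) on n random points; True is evidence, not proof."""
    rng = random.Random(seed)
    for _ in range(n):
        xi = (rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0))
        if not math.isclose(p(xi), q(xi), rel_tol=tol, abs_tol=tol):
            return False
    return True


print(sampled_extensional_check(j_hat_X, compose(j_hat_V, f)))  # agrees on all samples
```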

2.3.3 Optimization algorithms

In order to solve the optimization problem in (2.37), (2.39), or (2.40) by means of a computer, we have to apply an appropriate optimization algorithm that iteratively seeks the solution. Defining a generally accepted taxonomy of optimization algorithms is intricate (see, e.g., [158, p. 422]), as is defining a generally accepted taxonomy of optimization problems (see, e.g., the website of the NEOS Server [84]). Therefore, it is up to the user to select an appropriate algorithm for a given problem (cf. [158, p. 2]).

Another user-dependent decision is software-related, more precisely, the choice of the programming language (PL) in which to implement the algorithm. Two dynamically checked programming languages, together with corresponding libraries, are employed: the well-known MATLAB® PL and the relatively young Julia PL. A thorough discussion of this software-related issue is out of the scope of the present work. Nevertheless, two of my reasons for utilizing the Julia PL are:

(1) its promising outlook to reconcile performance issues and productivity issues;

(2) its expressive type system and its support of functional programming idioms.

The rationale behind point (1) is the observation that performance issues are predominantly discussed at the algorithm level; thus, performance issues at the program level are rarely tackled explicitly in the literature. However, in order to comprehensively assess the capabilities and limitations of the Julia PL in comparison to other programming languages, many more benchmarks of test problems from various sources are needed (see, e.g., [172]).

The rationale behind point (2) is the observation that the category theoretical language (which we encounter in ch. 4) is closely related to type theory and functional programming languages. However, the expressiveness of the Julia PL's type system and the range of its functional programming features are not sufficient to fully match the category theoretical language.

Let us utilize the Julia PL to discuss some widely used optimization algorithms. In our discussion, let us take a pragmatic viewpoint: we leave the details about the algorithms to the corresponding references and concisely describe their behavior regarding a subset of test functions of the form $f=(x_1,x_2)\mapsto f(x_1,x_2)\colon U\to\mathbb{R}$ which are at least members of the differentiability class $C^1$ on the open set $U\subset\mathbb{R}^2$ (strictly speaking, the Ackley function is not differentiable at the origin because of its square-root term). The test functions are: Ackley, the unit sphere, Booth, Rosenbrock, Michalewicz, and the modified Branin function (see, e.g., [116], [70], [202]). These test functions (see Table 2.1) cover various shapes that pose challenges for the algorithms. Figure 2.2a provides the surface representations of the test functions, and Figure 2.2b provides their contour representations together with a mark of a global minimum $(x_1^*,x_2^*)$. Additionally, Figure 2.3 provides a close-up of the neighborhood of the global minimum of each test function.

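TABLE 2.1: Test functions of the form $f=(x_1,x_2)\mapsto f(x_1,x_2)\colon U\to\mathbb{R}$.

\begin{tabular}{lll}
Test function $f$ & Definition $f(x_1,x_2)$ & Global minimum $(x_1^*,x_2^*)$ \\
\hline
Ackley & $-20\exp\bigl(-0.2\sqrt{\tfrac{1}{2}\sum_{i=1}^{2}x_i^2}\bigr)-\exp\bigl(\tfrac{1}{2}\sum_{i=1}^{2}\cos(2\pi x_i)\bigr)+20+\exp(1)$ & $(0,0)$ s.t. $f(0,0)=0$ \\
Unit sphere & $\sum_{i=1}^{d=2}x_i^2$ & $(0,0)$ s.t. $f(0,0)=0$ \\
Booth & $(x_1+2x_2-7)^2+(2x_1+x_2-5)^2$ & $(1,3)$ s.t. $f(1,3)=0$ \\
Rosenbrock\textsuperscript{3} & $\sum_{i=1}^{2-1}\bigl[5(x_{i+1}-x_i^2)^2+(x_i-1)^2\bigr]$ & $(1,1)$ s.t. $f(1,1)=0$ \\
Michalewicz\textsuperscript{4} & $-\sum_{i=1}^{2}\sin(x_i)\sin^{20}\bigl(\tfrac{i\,x_i^2}{\pi}\bigr)$ & $(2.20,1.57)$ s.t. $f(2.20,1.57)=-1.801$ \\
Modified Branin\textsuperscript{5} & $1\cdot\bigl(x_2-\tfrac{5.1}{4\pi^2}x_1^2+\tfrac{5}{\pi}x_1-6\bigr)^2+10\cdot\bigl((1-\tfrac{1}{8\pi})\cos(x_1)+1\bigr)+5x_1$ & $(-3.689,13.630)$ s.t. $f(-3.689,13.630)=-16.644$ \\
\end{tabular}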

³ Often, the factor 100 is used instead of the factor 5. However, the factor 5 is employed analogously to [116, p. 431].

⁴ Note that the global minimum (2.20, 1.57) is just an approximation (see, e.g., [116, p. 430]).

⁵ The additional term $5x_1$ is the modification (cf. [70, p. 196]) of the Branin function (see, e.g., [116, p. 427]). The global minimum (−3.70, 13.63) is just an approximation.

Let us set up the nonlinear optimization problem under test by choosing the respective test function as the objective function and by defining the admissible sets $F_1$ (also known as box constraints), $F_2$, and $F:=F_1\cap F_2$ such that
\begin{align}
F_1 &:= \{(x_1,x_2)\in\mathbb{R}\times\mathbb{R} : x_{1,l}-x_1\leq 0 \wedge x_1-x_{1,u}\leq 0 \wedge x_{2,l}-x_2\leq 0 \wedge x_2-x_{2,u}\leq 0\}, \tag{2.46a}\\
F_2 &:= \{(x_1,x_2)\in\mathbb{R}\times\mathbb{R} : (x_1-x_1^*)^2+(x_2-x_2^*)^2-r_0^2\leq 0\}, \tag{2.46b}
\end{align}

where the pairs $(x_{1,l},x_{1,u})$ and $(x_{2,l},x_{2,u})$ represent the lower and upper bounds of the components $x_1$ and $x_2$, respectively; the point $(x_1^*,x_2^*)$ denotes an optimal argument of the respective test function that is known a priori; and the scalar $r_0$ encodes the radius of a disk $D(M,r_0)$ with the midpoint $M:=(x_1^*,x_2^*)$. The radius is set to $r_0:=10$, and the lower and upper bounds are set such that the quadrant of the optimal argument is captured; e.g., for the Rosenbrock function in Fig. 2.2b, it holds that $(x_{1,l},x_{1,u})=(0,10)$ and $(x_{2,l},x_{2,u})=(0,10)$. In the cases in which the optimal argument is the zero point (see (i) and (ii) in Fig. 2.2b), the first quadrant is selected.
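The membership tests for $F_1$, $F_2$, and $F=F_1\cap F_2$ in (2.46) translate directly into predicates. The Python sketch below uses the Rosenbrock setting stated above: bounds $(0,10)$ for both components, $(x_1^*,x_2^*)=(1,1)$, and $r_0=10$. The function names are illustrative.

```python
from typing import Tuple

Point = Tuple[float, float]


def in_F1(p: Point, lower: Point, upper: Point) -> bool:
    """Box constraints (2.46a), written as four inequalities of the form g(p) <= 0."""
    x1, x2 = p
    return (lower[0] - x1 <= 0.0 and x1 - upper[0] <= 0.0
            and lower[1] - x2 <= 0.0 and x2 - upper[1] <= 0.0)


def in_F2(p: Point, x_star: Point, r0: float) -> bool:
    """Disk constraint (2.46b) around the a priori known optimal argument."""
    return (p[0] - x_star[0]) ** 2 + (p[1] - x_star[1]) ** 2 - r0**2 <= 0.0


def in_F(p: Point, lower: Point, upper: Point, x_star: Point, r0: float) -> bool:
    """F := F1 intersected with F2."""
    return in_F1(p, lower, upper) and in_F2(p, x_star, r0)


# Rosenbrock setting: first-quadrant box [0, 10]^2, disk of radius 10 around (1, 1).
box_lo, box_hi, x_star, r0 = (0.0, 0.0), (10.0, 10.0), (1.0, 1.0), 10.0
print(in_F((1.0, 1.0), box_lo, box_hi, x_star, r0))    # True
print(in_F((10.0, 10.0), box_lo, box_hi, x_star, r0))  # False: inside the box, outside the disk
```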

The test functions can be written as generic Julia functions; hence, one can employ the package ForwardDiff.jl (see [175]) to determine derivative information by forward-mode automatic differentiation (see, e.g., [81], [156]). Since it is assumed that the six test functions in Table 2.1 are at least members of $C^1$, I depict in Figure 2.4 the image of $U$ under the vector field $\operatorname{grad} f$, i.e., $\operatorname{grad}(f)(x_1,x_2)$ where $(x_1,x_2)\in U$, as a projection onto the contour representations of the test functions (see Figure 2.2b and Figure 2.3b).⁶
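Forward-mode automatic differentiation propagates derivative information alongside function values. The following Python sketch shows this idea with a minimal dual-number class. Each pass yields one directional derivative, so two passes give the gradient in $\mathbb{R}^2$. It only illustrates the technique and makes no claim about how ForwardDiff.jl is implemented internally.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Tuple, Union

Number = Union[float, "Dual"]


@dataclass
class Dual:
    """Dual number val + der * eps with eps^2 = 0."""
    val: float
    der: float = 0.0

    @staticmethod
    def lift(x: Number) -> Dual:
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other: Number) -> Dual:
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __sub__(self, other: Number) -> Dual:
        o = Dual.lift(other)
        return Dual(self.val - o.val, self.der - o.der)

    def __rsub__(self, other: Number) -> Dual:
        return Dual.lift(other) - self

    def __mul__(self, other: Number) -> Dual:
        o = Dual.lift(other)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)  # product rule

    __rmul__ = __mul__

    def __pow__(self, n: int) -> Dual:
        return Dual(self.val**n, n * self.val ** (n - 1) * self.der)  # power rule, integer n


def gradient(f: Callable[[Dual, Dual], Dual], x1: float, x2: float) -> Tuple[float, float]:
    """Two forward passes, seeding the unit directions e_x1 and e_x2."""
    d1 = f(Dual(x1, 1.0), Dual(x2, 0.0)).der
    d2 = f(Dual(x1, 0.0), Dual(x2, 1.0)).der
    return d1, d2


def booth(x1, x2):
    return (x1 + 2 * x2 - 7) ** 2 + (2 * x1 + x2 - 5) ** 2


print(gradient(booth, 0.0, 0.0))  # (-34.0, -38.0)
print(gradient(booth, 1.0, 3.0))  # (0.0, 0.0) at the global minimum
```

Because `booth` is written generically, the same function accepts floats and dual numbers. This is the property that the remark on generic Julia functions above relies on.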

Note that if one assumes that the domain $U$ is convex, that is, the condition
\[ \forall p_1,p_2\in U.\;\forall t\in[0,1].\;(1-t)\,p_1 +_{\mathbb{R}^2} t\,p_2\in U \tag{2.47} \]
holds, and if one assumes that, e.g., the condition
\[ \forall p_1,p_2\in U.\; f(p_1)\geq f(p_2)+\operatorname{grad}(f)(p_2)\cdot_{\mathbb{R}^2}(p_1 -_{\mathbb{R}^2} p_2) \tag{2.48} \]
holds as well, then one can conclude that a test function $f$ is convex. However, for arbitrary domains and arbitrary functions in practical applications, one can in most cases not rely on analytical examinations of the convexity property. Furthermore, by means of numerical examinations, one can only test the conditions in (2.47) and (2.48) for some $p_1,p_2\in U$. A single violating pair disproves convexity, whereas finitely many satisfied samples provide at most a clue for convexity. In the present work, therefore, I refrain from elaborating on the convexity property for all the functions under consideration.
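The numerical examination of (2.48) mentioned above can be sketched as a random search for a violating pair: finding one disproves convexity, while finding none is only a clue. The Python sketch makes three assumptions: hand-coded gradients, the square $[-2,2]^2$ as $U$, and the Booth and (factor-5) Rosenbrock functions of Table 2.1 as test cases.

```python
import random
from typing import Callable, Optional, Tuple

Point = Tuple[float, float]


def booth(p: Point) -> float:
    return (p[0] + 2 * p[1] - 7) ** 2 + (2 * p[0] + p[1] - 5) ** 2


def grad_booth(p: Point) -> Point:
    u, v = p[0] + 2 * p[1] - 7, 2 * p[0] + p[1] - 5
    return (2 * u + 4 * v, 4 * u + 2 * v)


def rosenbrock(p: Point) -> float:
    return 5 * (p[1] - p[0] ** 2) ** 2 + (p[0] - 1) ** 2


def grad_rosenbrock(p: Point) -> Point:
    return (-20 * p[0] * (p[1] - p[0] ** 2) + 2 * (p[0] - 1), 10 * (p[1] - p[0] ** 2))


def find_convexity_violation(f: Callable[[Point], float], grad: Callable[[Point], Point],
                             lo: float = -2.0, hi: float = 2.0, n: int = 2000,
                             seed: int = 0, tol: float = 1e-9) -> Optional[Tuple[Point, Point]]:
    """Search for p1, p2 with f(p1) < f(p2) + grad f(p2) . (p1 - p2), i.e., a violation of (2.48)."""
    rng = random.Random(seed)
    for _ in range(n):
        p1 = (rng.uniform(lo, hi), rng.uniform(lo, hi))
        p2 = (rng.uniform(lo, hi), rng.uniform(lo, hi))
        g = grad(p2)
        tangent = f(p2) + g[0] * (p1[0] - p2[0]) + g[1] * (p1[1] - p2[1])
        if f(p1) < tangent - tol:
            return p1, p2
    return None


print(find_convexity_violation(booth, grad_booth))            # None: only a clue for convexity
print(find_convexity_violation(rosenbrock, grad_rosenbrock))  # a pair that disproves convexity
```

The tolerance `tol` guards against declaring a violation because of floating-point rounding alone.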

Assuming a map $s_1$ and a map $s_2$ that share the signature $C^1(U,\mathbb{R})\times U\to\mathbb{R}_+$, one can conceive these maps as local first-order sensitivity measures if one defines their assignment rules as
\[ s_1(f,p):=\bigl(\partial_{e_{x_1}}(f)(p)\bigr)^2,\qquad s_2(f,p):=\bigl(\partial_{e_{x_2}}(f)(p)\bigr)^2. \tag{2.49} \]
Hence, let us deploy a gradient-based interpretation of sensitivity measures (see, e.g., [129]).⁷ In Figure 2.5, $s_1(f,p)$ and $s_2(f,p)$ are depicted w.r.t. Figure 2.3b.

⁶ The map grad in (2.7) is overloaded in the sense that grad is equipped with the signature $(U\to\mathbb{R})\to(U\to\mathbb{R}^2)$. Given a point $p\in U$, one can extract the components of $\operatorname{grad}(f)(p)$, that is, $\partial_{e_{x_1}}(f)(p)$ and $\partial_{e_{x_2}}(f)(p)$, by setting $\partial_{e_{x_1}}(f)(p):=\operatorname{grad}(f)(p)\cdot_{\mathbb{R}^2}e_{x_1}$ and $\partial_{e_{x_2}}(f)(p):=\operatorname{grad}(f)(p)\cdot_{\mathbb{R}^2}e_{x_2}$, where $\cdot_{\mathbb{R}^2}$ denotes the Euclidean inner product on $\mathbb{R}^2$, and $e_{x_1}$ and $e_{x_2}$ refer to the unit vectors w.r.t. $x_1$ and $x_2$, respectively.

⁷ For more details on gradient-based sensitivity measures, such as definitions other than the one in (2.49), I refer to [129] and the references therein.


Exploiting the assignment rules in (2.49), one can define the maps $S_1$ and $S_2$ with the common signature $C^1(U,\mathbb{R})\to\mathbb{R}_+$, whose assignment rules read as
\[ S_1(f):=\int_U s_1(f,x)\,\mathrm{d}^2x,\qquad S_2(f):=\int_U s_2(f,x)\,\mathrm{d}^2x. \tag{2.50} \]
Thus, one can conceive the maps $S_1$ and $S_2$ as global first-order sensitivity measures.

In addition, one can define normalized global first-order sensitivity measures $S^N_1$ and $S^N_2$ whose assignment rules read as
\[ S^N_1(f):=\frac{S_1(f)}{\sum_{i=1}^{2}S_i(f)},\qquad S^N_2(f):=\frac{S_2(f)}{\sum_{i=1}^{2}S_i(f)}, \tag{2.51} \]
where $S^N_1(f)+S^N_2(f)=1$ holds in exact arithmetic for every $f$ with $\sum_{i=1}^{2}S_i(f)\neq 0$.
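The measures (2.49) to (2.51) can be sketched in Python as follows. Instead of an adaptive cubature such as HCubature.jl, the sketch assumes a composite tensor-product Gauss-Legendre rule with four nodes per subinterval. This rule integrates polynomials of degree up to seven in each variable exactly. That suffices for the unit sphere and the Rosenbrock function, whose gradients are hand-coded here, but not, e.g., for the Ackley function. For Rosenbrock on $U=[-2,2]^2$, the domain stated in Table 2.4 for $N_\xi=2$, the result agrees with the tabulated values 0.9109 and 0.0891.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Grad = Callable[[Point], Point]
Box = Tuple[Tuple[float, float], Tuple[float, float]]

# Four-point Gauss-Legendre rule on [-1, 1] (exact for polynomials of degree <= 7).
_a = math.sqrt(3 / 7 - 2 / 7 * math.sqrt(6 / 5))
_b = math.sqrt(3 / 7 + 2 / 7 * math.sqrt(6 / 5))
_wa = (18 + math.sqrt(30)) / 36
_wb = (18 - math.sqrt(30)) / 36
GL_NODES = [-_b, -_a, _a, _b]
GL_WEIGHTS = [_wb, _wa, _wa, _wb]


def composite_rule(lo: float, hi: float, m: int) -> List[Tuple[float, float]]:
    """Nodes and weights of a composite four-point Gauss-Legendre rule on [lo, hi]."""
    h = (hi - lo) / m
    rule = []
    for k in range(m):
        mid = lo + (k + 0.5) * h
        rule += [(mid + 0.5 * h * t, 0.5 * h * w) for t, w in zip(GL_NODES, GL_WEIGHTS)]
    return rule


def normalized_sensitivities(grad: Grad, box: Box, m: int = 8) -> Tuple[float, float]:
    """S^N_1, S^N_2 from (2.51), with S_i = integral over U of (partial_i f)^2, cf. (2.49)-(2.50)."""
    (x1_lo, x1_hi), (x2_lo, x2_hi) = box
    S1 = S2 = 0.0
    for x1, w1 in composite_rule(x1_lo, x1_hi, m):
        for x2, w2 in composite_rule(x2_lo, x2_hi, m):
            g1, g2 = grad((x1, x2))
            S1 += w1 * w2 * g1**2  # local measure s1(f, p), cf. (2.49)
            S2 += w1 * w2 * g2**2  # local measure s2(f, p)
    total = S1 + S2
    if total == 0.0:
        raise ValueError("f is constant on U; (2.51) is undefined")
    return S1 / total, S2 / total


def grad_sphere(p: Point) -> Point:
    return (2 * p[0], 2 * p[1])


def grad_rosenbrock(p: Point) -> Point:  # factor 5, cf. Table 2.1
    return (-20 * p[0] * (p[1] - p[0] ** 2) + 2 * (p[0] - 1), 10 * (p[1] - p[0] ** 2))


box = ((-2.0, 2.0), (-2.0, 2.0))
print(normalized_sensitivities(grad_sphere, box))      # approx. (0.5, 0.5) by symmetry
print(normalized_sensitivities(grad_rosenbrock, box))  # approx. (0.9109, 0.0891)
```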

By using the package HCubature.jl (see [105]), $S^N_1(f)$ and $S^N_2(f)$ in (2.51) can be computed by means of numerical integration over the domains shown in Figure 2.2b and in Figure 2.3b. The corresponding results are presented in Table 2.2 and in Table 2.3, respectively.⁸

TABLE 2.2: The normalized global first-order sensitivity measure $S^N_i$ with $i\in\{1,2\}$ evaluated at $f$ w.r.t. Figure 2.2b.

\begin{tabular}{lcccccc}
 & (i) & (ii) & (iii) & (iv) & (v) & (vi) \\
\hline
$S^N_1(f)$ & 0.5000 & 0.5000 & 0.4894 & 0.9965 & 0.3702 & 0.7234 \\
$S^N_2(f)$ & 0.5000 & 0.5000 & 0.5106 & 0.0035 & 0.6298 & 0.2766 \\
$\sum_{i=1}^{2}S^N_i(f)$ & 1.0000 & 1.0000 & 1.0000 & 1.0000 & 1.0000 & 1.0000 \\
\end{tabular}

TABLE 2.3: The normalized global first-order sensitivity measure $S^N_i$ with $i\in\{1,2\}$ evaluated at $f$ w.r.t. Figure 2.3b.

\begin{tabular}{lcccccc}
 & (i) & (ii) & (iii) & (iv) & (v) & (vi) \\
\hline
$S^N_1(f)$ & 0.5000 & 0.5000 & 0.5000 & 0.9109 & 0.2595 & 0.9277 \\
$S^N_2(f)$ & 0.5000 & 0.5000 & 0.5000 & 0.0891 & 0.7405 & 0.0723 \\
$\sum_{i=1}^{2}S^N_i(f)$ & 1.0000 & 1.0000 & 1.0000 & 1.0000 & 1.0000 & 1.0000 \\
\end{tabular}

If we consult Table 2.1, then the results in Table 2.2 and in Table 2.3 seem to pass some plausibility checks. More precisely, the choice of the domain $U$ can have an impact on the sensitivity measures if the evaluated test function does not exhibit some kind of symmetry. In cases (i) to (iii), the squared instantaneous rate of change, integrated over $U$, is roughly equal w.r.t. both variables $x_1$ and $x_2$. In cases (iv) to (vi), it is clearly greater w.r.t. either $x_1$ or $x_2$. Hence, from a practical applications viewpoint, Table 2.2 and Table 2.3 furnish us with a valuable quantitative screening of the importance of the variables for the respective test function.

⁸ Due to numerical inaccuracies, it is necessary to set $U\equiv[-29.9999,30.0]\times[-30.0,30.0]$ and $U\equiv[-1.9999,2.0]\times[-2.0,2.0]$, respectively, in the case of the test function (i), i.e., Ackley, in order to ensure that, for all test functions, the estimated error of the estimated integral is below $1\times10^{-4}$.

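Note that the elaborations are without loss of generality regarding the number of parameters $N_\xi$ (recall § 2.2.3). To exemplify this kind of generality, let us explore the Rosenbrock test function in Table 2.1, since it readily extends to $N_\xi$ parameters with $N_\xi\in\{2,3,4,5,6,7\}$. Hence, the $N_\xi$-dimensional Rosenbrock test function $f_{\mathrm{R}N_\xi}$ can be written as
\[ f_{\mathrm{R}N_\xi} = x\mapsto f_{\mathrm{R}N_\xi}(x) := \sum_{i=1}^{N_\xi-1}\bigl[5(x_{i+1}-x_i^2)^2+(x_i-1)^2\bigr] \colon U_{N_\xi}\to\mathbb{R}, \tag{2.52} \]
where $N_\xi\in\{2,3,4,5,6,7\}$ and $U_{N_\xi}\subset\mathbb{R}^{N_\xi}$ is an open set. Notice that, for each $N_\xi$, the global minimum of (2.52) is at the point $(1,1,\dots,1)\in\mathbb{R}^{N_\xi}$. Adapting the normalized global first-order sensitivity measures in (2.51) to the $N_\xi$-dimensional Rosenbrock test function $f_{\mathrm{R}N_\xi}$, I report the corresponding results in Table 2.4. The observable pattern is reasonable if one unrolls the sum in (2.52). The variable $x_1$ appears only in the first summand. Each interior variable $x_k$ with $1<k<N_\xi$ appears in two consecutive summands, once as $x_{i+1}$ and once as $x_i$. The variable $x_{N_\xi}$ appears only in the term $5(x_{N_\xi}-x_{N_\xi-1}^2)^2$. Since all variables range over the same interval, the interior variables obtain identical values.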

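TABLE 2.4: The normalized global first-order sensitivity measure $S^N_i$ evaluated at $f_{\mathrm{R}N_\xi}$ w.r.t. the domain $[-2.0,2.0]^{N_\xi}$ with $N_\xi\in\{2,3,4,5,6,7\}$.

\begin{tabular}{ccccccccc}
$N_\xi$ & $i:=1$ & $i:=2$ & $i:=3$ & $i:=4$ & $i:=5$ & $i:=6$ & $i:=7$ & $\sum_{i=1}^{N_\xi}S^N_i(f)$ \\
\hline
2 & 0.9109 & 0.0891 & $-$ & $-$ & $-$ & $-$ & $-$ & 1.0000 \\
3 & 0.4008 & 0.5600 & 0.0392 & $-$ & $-$ & $-$ & $-$ & 1.0000 \\
4 & 0.2569 & 0.3590 & 0.3590 & 0.0251 & $-$ & $-$ & $-$ & 1.0000 \\
5 & 0.1892 & 0.2641 & 0.2641 & 0.2641 & 0.0185 & $-$ & $-$ & 1.0000 \\
6 & 0.1494 & 0.2090 & 0.2090 & 0.2090 & 0.2090 & 0.0146 & $-$ & 1.0000 \\
7 & 0.1239 & 0.1728 & 0.1728 & 0.1728 & 0.1728 & 0.1728 & 0.0121 & 1.0000 \\
\end{tabular}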
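Table 2.4 can be cross-checked with a short Python sketch. It implements (2.52) and its gradient by hand. In place of the adaptive cubature of HCubature.jl, it uses a tensor-product four-point Gauss-Legendre rule. This rule is exact here because every squared partial derivative is a polynomial of degree at most six in each variable. The rule needs $4^{N_\xi}$ gradient evaluations, which is affordable for $N_\xi\le 7$. For $N_\xi\in\{2,3,4\}$, the output agrees with Table 2.4 to the four digits shown. For larger $N_\xi$, small differences in the last digit of the first column can occur.

```python
import itertools
import math
from typing import List, Sequence

# Four-point Gauss-Legendre rule on [-1, 1] as (node, weight) pairs.
_a = math.sqrt(3 / 7 - 2 / 7 * math.sqrt(6 / 5))
_b = math.sqrt(3 / 7 + 2 / 7 * math.sqrt(6 / 5))
_wa, _wb = (18 + math.sqrt(30)) / 36, (18 - math.sqrt(30)) / 36
GL = [(-_b, _wb), (-_a, _wa), (_a, _wa), (_b, _wb)]


def rosenbrock_nd(x: Sequence[float]) -> float:
    """N-dimensional Rosenbrock function (2.52) with factor 5."""
    return sum(5 * (x[i + 1] - x[i] ** 2) ** 2 + (x[i] - 1) ** 2 for i in range(len(x) - 1))


def grad_rosenbrock_nd(x: Sequence[float]) -> List[float]:
    """Gradient of (2.52): summand i contributes to the components i and i + 1."""
    g = [0.0] * len(x)
    for i in range(len(x) - 1):
        r = x[i + 1] - x[i] ** 2
        g[i] += -20 * x[i] * r + 2 * (x[i] - 1)
        g[i + 1] += 10 * r
    return g


def normalized_sensitivities_nd(n: int, lo: float = -2.0, hi: float = 2.0) -> List[float]:
    """S^N_i, i = 1..n, of (2.52) on [lo, hi]^n via a tensor-product Gauss-Legendre rule."""
    half, mid = 0.5 * (hi - lo), 0.5 * (hi + lo)
    S = [0.0] * n
    for combo in itertools.product(GL, repeat=n):
        x = [mid + half * t for t, _ in combo]
        w = math.prod(half * wt for _, wt in combo)
        for i, gi in enumerate(grad_rosenbrock_nd(x)):
            S[i] += w * gi * gi
    total = sum(S)
    return [s / total for s in S]


for n in range(2, 8):
    print(n, [round(s, 4) for s in normalized_sensitivities_nd(n)])
```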

Finally, let us invoke four packages that contain various types of optimization algorithms:

(Opkg1) the package NLopt.jl (see [107]), which provides an interface to the open-source NLopt library for nonlinear optimization,

(Opkg2) the package BlackBoxOptim.jl (see [64]), which provides some meta-heuristic⁹ stochastic algorithms for global optimization,

(Opkg3) the package Optim.jl (see [151]), which provides some deterministic and stochastic algorithms for box-constrained local and global optimization, and

(Opkg4) the package IntervalOptimisation.jl (see [186]), which provides guaranteed deterministic global optimization algorithms using interval arithmetic.

From (Opkg1), I employ two gradient-based local optimization algorithms on the admissible set $F$: a sequential quadratic programming (SQP) algorithm based on [128] and a method of moving asymptotes (MMA) algorithm based on [203]. I apply two derivative-free local optimization algorithms on the admissible sets $F_1$ and $F$, respectively: a Nelder-Mead simplex (NMS) algorithm based on [176] and a constrained optimization by linear approximations (COBYLA) algorithm based on [170]. Finally, I apply two derivative-free global optimization algorithms on the admissible set $F_1$: the dividing rectangles (DIRECT) algorithm based on [109] and a modified evolutionary algorithm (MEA) based on [193].¹⁰

⁹ Commonly, the term meta-heuristic (see, e.g., [217]) refers to a strategic search by trial and error without a theoretical guarantee of global optimality.

From (Opkg2), I pick an adaptive differential evolution (ADE) algorithm from a collection of stochastic algorithms in order to perform a stochastic global optimization on the admissible set $F_1$.

From (Opkg3), I utilize a primal interior-point algorithm (see, e.g., [158, ch. 19] or [116, ch. 10.9]) on the admissible set $F_1$, for which I employ three inner optimization algorithms: a gradient-based one, i.e., a limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) algorithm (see, e.g., [158, ch. 7.2]); a derivative-free one, i.e., a Nelder-Mead simplex algorithm (see above); and a stochastic global one, i.e., a particle swarm (PS) algorithm based on [228].

By providing the initial point $(\hat{x}_1,\hat{x}_2)=(1.1,1.1)$, let us exemplarily check the global optimal argument of the Ackley function, $(x_1^*,x_2^*)=(0.0,0.0)$, with the corresponding optimal function value $f(x_1^*,x_2^*)=0.0$ (see (i) in Figure 2.2b). As expected, all algorithms find the optimal solution within a certain numerical tolerance (see Table 2.5). Also as expected, when the initial point is chosen closer to the borders of the admissible set, the gradient-based local optimization algorithms tend to be trapped in one of the many local minima of the Ackley function. Similarly, one can recapitulate the behavior of the algorithms with respect to the other test functions (see (ii) to (vi) in Figure 2.2b), which has been well investigated in the literature.
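The influence of the initial point on the Ackley function can be sketched in Python with a derivative-free compass search. The compass search is a simple stand-in for a local method and is not one of the algorithms listed above. Started near the border of an assumed square $[-5,5]^2$ with small initial steps, it ends in a local minimum. Seeding the same search with the best point of a coarse $20\times20$ grid, a crude stand-in for a global stage, reaches the global minimum. The domain, grid size, and step sizes are illustrative assumptions.

```python
import math
from typing import Callable, Tuple

Point = Tuple[float, float]


def ackley(p: Point) -> float:
    """Ackley function as in Table 2.1."""
    r = math.sqrt(0.5 * (p[0] ** 2 + p[1] ** 2))
    c = 0.5 * (math.cos(2 * math.pi * p[0]) + math.cos(2 * math.pi * p[1]))
    return -20.0 * math.exp(-0.2 * r) - math.exp(c) + 20.0 + math.e


def compass_search(f: Callable[[Point], float], x0: Point, step: float,
                   tol: float = 1e-8) -> Point:
    """Derivative-free local search: try +/- step along each axis, halve the step on failure."""
    x, fx = x0, f(x0)
    while step > tol:
        improved = False
        for d in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            y = (x[0] + d[0], x[1] + d[1])
            fy = f(y)
            if fy < fx:
                x, fx, improved = y, fy, True
                break
        if not improved:
            step *= 0.5
    return x


def grid_then_local(f: Callable[[Point], float], lo: float, hi: float, n: int) -> Point:
    """Two-stage scheme: the best point of an n x n grid seeds the local search."""
    h = (hi - lo) / (n - 1)
    grid = [(lo + i * h, lo + j * h) for i in range(n) for j in range(n)]
    best = min(grid, key=f)
    return compass_search(f, best, step=h)


x_border = compass_search(ackley, (4.1, 4.1), step=0.25)  # trapped near (4, 4)
x_global = grid_then_local(ackley, -5.0, 5.0, 20)          # reaches (0, 0)
print(x_border, ackley(x_border))
print(x_global, ackley(x_global))
```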

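TABLE 2.5: Exemplary check of the global optimum of the Ackley function.

\begin{tabular}{cccccccccc}
Opkg & SQP & MMA & NMS & COBYLA & DIRECT & MEA & ADE & L-BFGS & PS \\
\hline
1 & $\checkmark$ & $\checkmark$ & $\checkmark$ & $\checkmark$ & $\checkmark$ & $\checkmark$ & & & \\
2 & & & & & & & $\checkmark$ & & \\
3 & & & $\checkmark$ & & & & & $\checkmark$ & $\checkmark$ \\
\end{tabular}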

In order to assess the ambit of the solution found, a common practice in many applications is to apply a global optimization algorithm and to use its solution as a starting point for a local optimization algorithm. Another possibility to assess the area of validity is to use interval arithmetic (see, e.g., [213], [104]) in the context of deterministic global optimization.¹¹ In (Opkg4), such a possibility is pursued by a Moore-Skelboe (MS) algorithm (see, e.g., [59]). Note that, unlike with the aforementioned algorithms, the result does not consist of the optimal component values $(x_1^*,x_2^*)$ and the optimal function value $f(x_1^*,x_2^*)$. Instead, the result consists of intervals that are guaranteed to contain the optimal component values, $[x^*_{1,l},x^*_{1,u}]\times[x^*_{2,l},x^*_{2,u}]$, and the optimal function value, $[f(x^*_{1,l},x^*_{2,l}),f(x^*_{1,u},x^*_{2,u})]$.
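The principle behind the Moore-Skelboe algorithm can be sketched in Python for the Booth function. The sketch uses a minimal interval type with only the operations that Booth needs. It repeatedly bisects the widest side of the box with the smallest lower bound and stops once that box is narrower than a tolerance. The returned value interval then encloses the global minimum. Unlike a proper interval arithmetic library, the sketch omits outward rounding, so its enclosures are not rigorous in floating-point arithmetic.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other: Union["Interval", float]) -> "Interval":
        if isinstance(other, Interval):
            return Interval(self.lo + other.lo, self.hi + other.hi)
        return Interval(self.lo + other, self.hi + other)

    __radd__ = __add__

    def scale(self, k: float) -> "Interval":
        return Interval(k * self.lo, k * self.hi) if k >= 0 else Interval(k * self.hi, k * self.lo)

    def sqr(self) -> "Interval":
        """Exact range of x^2 (tighter than x * x, which ignores the dependency)."""
        a, b = self.lo * self.lo, self.hi * self.hi
        if self.lo <= 0.0 <= self.hi:
            return Interval(0.0, max(a, b))
        return Interval(min(a, b), max(a, b))

    def mid(self) -> float:
        return 0.5 * (self.lo + self.hi)

    def width(self) -> float:
        return self.hi - self.lo


Box = Tuple[Interval, Interval]


def booth_interval(x1: Interval, x2: Interval) -> Interval:
    """Natural interval extension of the Booth function."""
    return (x1 + x2.scale(2.0) + (-7.0)).sqr() + (x1.scale(2.0) + x2 + (-5.0)).sqr()


def booth(x1: float, x2: float) -> float:
    return (x1 + 2 * x2 - 7) ** 2 + (2 * x1 + x2 - 5) ** 2


def moore_skelboe(box: Box, tol: float = 1e-4) -> Tuple[Box, Interval]:
    """Return a box of width < tol with the smallest lower bound and an enclosure of min f."""
    counter = itertools.count()  # tie-breaker, so that boxes are never compared directly
    best_upper = booth(box[0].mid(), box[1].mid())
    heap = [(booth_interval(*box).lo, next(counter), box)]
    while True:
        lower, _, (x1, x2) = heapq.heappop(heap)
        if max(x1.width(), x2.width()) < tol:
            return (x1, x2), Interval(lower, best_upper)
        if x1.width() >= x2.width():  # bisect the widest side
            m = x1.mid()
            children = [(Interval(x1.lo, m), x2), (Interval(m, x1.hi), x2)]
        else:
            m = x2.mid()
            children = [(x1, Interval(x2.lo, m)), (x1, Interval(m, x2.hi))]
        for child in children:
            best_upper = min(best_upper, booth(child[0].mid(), child[1].mid()))
            child_lower = booth_interval(*child).lo
            if child_lower <= best_upper:  # discard boxes that cannot contain the minimum
                heapq.heappush(heap, (child_lower, next(counter), child))


box, value = moore_skelboe((Interval(-10.0, 10.0), Interval(-10.0, 10.0)))
print(box)    # a narrow box near the minimizer (1, 3)
print(value)  # an interval enclosing the minimal value 0
```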

¹⁰ For more details on gradient-based optimization methods, I refer to, e.g., [158], [18]; on derivative-free optimization algorithms, to, e.g., [47], [10], [135]; and on deterministic and stochastic global optimization algorithms, to, e.g., [98], [67] and, e.g., [24], [195], respectively.

¹¹ For further deliberations on deterministic global optimization using interval arithmetic, I refer to, e.g., the survey in [157].

Finally, when one moves from test functions such as those in Figure 2.2a to functions from applications, one has to recall two common issues:

• The test functions exhibit a fairly complete picture that facilitates the choice of an appropriate algorithm. However, in many applications, choosing an appropriate optimization algorithm for a given problem is difficult due to incomplete preliminary information. Hence, there is also no a priori preference for a particular type of optimization algorithm.

• A test function evaluation is computationally cheap. However, in many applications, the evaluation of the objective function, the constraint functions, or both (see § 2.3.1) depends on a computationally expensive numerical simulation (see § 2.2), such that an exhaustive coverage of the parameter space is presumably prohibitive. Hence, an exhaustive reconstruction of the shape (or landscape) associated with the function under investigation is unlikely.

We encounter these two issues again in the upcoming chapter 3, which is concerned with optimization schemes using low-fidelity models.

Moreover, in chapter 5, we consider high-fidelity optimization problems as concrete instances of the abstract optimization problem in (2.36), where the semantics of the magnetoquasistatic model is applied. More precisely, we encounter functions that encode, for instance, the time-averaged ohmic loss, the time-averaged ohmic loss density, or the inductance at different operating frequencies. Hence, the investigation presented in this section is valuable as a preliminary study and as an anchor point for developing and assessing the studies of chapter 5.

FIGURE 2.2: Representations of the test functions in Table 2.1: (i) Ackley, (ii) Unit sphere, (iii) Booth, (iv) Rosenbrock, (v) Michalewicz, (vi) Modified Branin. (A) Surface representation of $z:=f(x_1,x_2)$. (B) Contour representation of $z:=f(x_1,x_2)$; the red cross indicates a global minimum.

36 Chapter 2. Magnetoquasistatic Maxwell’s theory – Modeling, simulation, and

optimization

FIGURE 2.3: Representations of the test functions in Table 2.1 (highlighting the neighborhood of the global minimum). (A) Surface representation of z := f(x1, x2). (B) Contour representation of z := f(x1, x2); the red cross indicates a global minimum. Panels: (i) Ackley, (ii) Unit sphere, (iii) Booth, (iv) Rosenbrock, (v) Michalewicz, (vi) Modified Branin.


FIGURE 2.4: Depicting grad(f)(x1, x2) with (x1, x2) ∈ U as a projection on the contour representation of the test functions in Table 2.1. (A) Depicting grad(f)(x1, x2) within Figure 2.2b. (B) Depicting grad(f)(x1, x2) within Figure 2.3b. Panels: (i) Ackley, (ii) Unit sphere, (iii) Booth, (iv) Rosenbrock, (v) Michalewicz, (vi) Modified Branin.


FIGURE 2.5: Depicting si(f, (x1, x2)) from (2.49) with i ∈ {1, 2} and (x1, x2) ∈ U w.r.t. the test functions in Table 2.1. (A) Depicting s1(f, (x1, x2)) w.r.t. Figure 2.3b. (B) Depicting s2(f, (x1, x2)) w.r.t. Figure 2.3b. Dark colors indicate low values; bright colors indicate high values. Panels: (i) Ackley, (ii) Unit sphere, (iii) Booth, (iv) Rosenbrock, (v) Michalewicz, (vi) Modified Branin.

2.4. In closing 39

2.4 In closing

The chapter's primary purpose has been to lay out the technical landscape in which the remaining chapters are placed. The languages of vector analysis, differential geometry, and functional analysis served as methodological and terminological guidance for formulating the relevant notions.

More precisely, we have elaborated the magnetoquasistatic model of Maxwell’s

theory by presenting the fundamental problem statement of electromagnetism and

the corresponding system of Maxwell’s equations. From this system, we have de-

rived the magnetoquasistatic subsystem and the magnetostatic subsystem.

Using the magnetoquasistatic model as a guiding representative, we have examined its numerical simulation following the common procedure, i.e., we have concisely recapitulated the concepts of the weak formulation, the numerical approximation, and the parametric mathematical model.

Finally, we have discussed notions regarding optimization constrained by a partial differential equation and its relation to nonlinear optimization problems. We have sketched various types of optimization algorithms and outlined a subset of optimization test functions. We have completed the discussion by applying a gradient-based interpretation of sensitivity measures to these test functions, whose derivative information can be determined by forward-mode automatic differentiation.
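A minimal Python sketch (not the Julia implementation used in this work) of how forward-mode automatic differentiation produces gradients such as those depicted in Figure 2.4: each input is seeded as a dual number, and one forward sweep per coordinate returns one partial derivative. It assumes the standard form of the Booth function, f(x1, x2) = (x1 + 2x2 − 7)² + (2x1 + x2 − 5)², which may differ in detail from the definition in Table 2.1; the class `Dual` and the helpers `lift`, `booth`, and `grad` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """Dual number val + eps * e with e**2 = 0; eps carries the derivative."""
    val: float
    eps: float = 0.0

    def __add__(self, other):
        o = lift(other)
        return Dual(self.val + o.val, self.eps + o.eps)

    __radd__ = __add__

    def __sub__(self, other):
        o = lift(other)
        return Dual(self.val - o.val, self.eps - o.eps)

    def __rsub__(self, other):
        o = lift(other)
        return Dual(o.val - self.val, o.eps - self.eps)

    def __mul__(self, other):
        o = lift(other)
        # product rule: (a + a'e)(b + b'e) = ab + (ab' + a'b)e
        return Dual(self.val * o.val, self.val * o.eps + self.eps * o.val)

    __rmul__ = __mul__

    def __pow__(self, n: int):
        # power rule for a positive integer exponent n
        return Dual(self.val ** n, n * self.val ** (n - 1) * self.eps)


def lift(x) -> Dual:
    """Treat plain numbers as constants (zero derivative)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def booth(x1, x2):
    """Booth function in its standard form; global minimum f(1, 3) = 0."""
    return (x1 + 2 * x2 - 7) ** 2 + (2 * x1 + x2 - 5) ** 2


def grad(f, point):
    """Gradient of f at point: one forward sweep per coordinate, seeding eps = 1."""
    g = []
    for i in range(len(point)):
        args = [Dual(float(p), 1.0 if j == i else 0.0) for j, p in enumerate(point)]
        g.append(f(*args).eps)
    return g


print(grad(booth, (0.0, 0.0)))  # [-34.0, -38.0]
print(grad(booth, (1.0, 3.0)))  # [0.0, 0.0] at the global minimum
```

Forward mode needs one sweep per input coordinate, which is inexpensive for two-dimensional test functions such as those in Table 2.1.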

Chapter 3

Surrogate optimization

In this chapter, I provide an in-depth elaboration of the key notion of surrogate optimization and of its partitioning, proposed in § 1.2, into three sub-notions: (1) surrogate modeling & simulation, (2) surrogate-based optimization, and (3) surrogate-guided optimization.

Within the limited scope of the explanations, let us consistently anticipate algebraic tools from the category-theoretical language of ch. 4 in order to tag the various notions of surrogate optimization with algebraic notes. Additionally, these algebraic tools facilitate a smooth transition between the various layers in Figure 1.4.

Concerning sub-notion (1), surrogate modeling & simulation, let us forge an abstract setting in which we state common classes of mathematical problems. Within the context of these classes, we embed the notions of a high-fidelity model and a low-fidelity model. Subsequently, we define the high-fidelity approximation error, the notion of a sampling plan, and the empirical surrogate modeling error as one of several indicators within surrogate optimization. Afterwards, let us attempt to sketch a holistic understanding of some deterministic data-fit low-fidelity models, i.e., multivariate polynomials and radial-basis functions, and of some probabilistic data-fit low-fidelity models, i.e., kriging low-fidelity models. We close this subpart by applying a formalization-driven perspective to simplified-physics low-fidelity models.

Concerning the sub-notion (2) surrogate-based optimization, let us examine the

optimization with the test functions in § 2.3.3 by data-fit low-fidelity models and

by emulated simplified-physics low-fidelity models. We carve out a numerical scaf-

folding of a benchmark-focused classification of test functions (more generally, high-

fidelity models) and we elucidate different procedures to find a solution of the high-

fidelity optimization problem.

Concerning the sub-notion (3) surrogate-guided optimization, let us dwell briefly

on the sequential kriging optimization and its construction principles as a subkind of

the model management strategy adaptation. Afterwards, we dwell on optimization

procedures within the space-mapping paradigm which are a subkind of the model

management strategy adaptation; and we dwell on the basic building blocks of the

co-kriging optimization which can be seen as a subkind of the model management

strategy fusion. By applying a formalization-oriented viewpoint, we attempt to illuminate potential hybrid model management strategies and to pin down properly, e.g., the conceptual distinction between a low-fidelity model and a surrogate model within the space-mapping paradigm. Furthermore, driven by heuristics, we formally construct some statements that provide a novel approach to the delicate aspect of convergence-related issues regarding optimization within the space-mapping paradigm and co-kriging optimization.

42 Chapter 3. Surrogate optimization

3.1 Surrogate modeling & simulation

Notice well that, because so many diverse research communities work in the vast field of surrogate optimization, it seems impossible to provide unifying methodological and terminological guidance concerning surrogate optimization that suits every research community.

Regarding probabilistic low-fidelity models, for instance, there is the delicate aspect of the interpretation of probability (see, e.g., [201, p. 29ff]). One main interpretation leads to the school of thought called Bayesian statistics (see, e.g., [155, ch. 5]); another main interpretation leads to the school of thought called frequentist statistics (see, e.g., [155, ch. 6]).

Therefore, let a general guiding principle of ours be to remain as indifferent as possible to potential issues of interpretation or semantics.

Thus, similarly to a first-principles approach, we focus on stripping surrogate optimization down to a bare syntactical minimum and then arguing from this minimum, adding layers of syntax and semantics when they are needed.

3.1.1 An abstract setting

After discussing different classes of mathematical problems, we discuss the concepts of the high-fidelity function approximation error, the sampling plan, and the empirical surrogate modeling error. Among others, we introduce the empirical generalization error and the squared sample Pearson correlation coefficient (SSPCC). We close this subsection by illuminating a link between the SSPCC and the normalized global first-order sensitivity measures (see § 2.3.3).

Classes of mathematical problems

In the previous chapter, we have encountered the concepts of modeling, simulation, and optimization in the context of the magnetoquasistatic Maxwell's theory. Adopting a map-based viewpoint, one can abstractly assign each of these concepts to one of the following classes of mathematical problems:

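$$\text{given } x \in X \text{ and } y \in Y, \text{ find } K \in \hom(X,Y) \text{ such that } K(x) = y, \tag{3.1a}$$

$$\text{given } K \in \hom(X,Y) \text{ and } x \in X, \text{ find } y \in Y \text{ such that } K(x) = y, \tag{3.1b}$$

$$\text{given } K \in \hom(X,Y) \text{ and } y \in Y, \text{ find } x \in X \text{ such that } K(x) = y, \tag{3.1c}$$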

where, for instance, $K$ denotes a linear map, $X$ and $Y$ denote linear spaces over an underlying field $\mathbb{F}$, and $\hom(X,Y)$ (or $\hom_{\mathbb{F}}(X,Y)$) is a vector space as well.

Following the terminology in [140, p. 23], let us call problems of the form (3.1a) identification problems, problems of the form (3.1b) direct problems, and problems of the form (3.1c) inverse problems.¹ Thus, in the context of the previous chapter, one can assign modeling to (3.1a), simulation to (3.1b), and optimization to (3.1c).
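To make the three classes in (3.1) concrete, the following Python sketch (using NumPy) takes K to be a hypothetical linear map on ℝ², represented by a matrix. The direct and inverse problems then reduce to a matrix-vector product and a linear solve. For the identification problem, a single pair (x, y) does not determine a linear map on ℝ² uniquely, so the sketch assumes several input/output pairs and recovers K in the least-squares sense.

```python
import numpy as np

# Hypothetical linear map K: R^2 -> R^2, playing the role of the "true" model
K_true = np.array([[2.0, 1.0],
                   [0.5, 3.0]])

# (3.1b) direct problem: given K and x, find y with K(x) = y
x = np.array([1.0, -2.0])
y = K_true @ x

# (3.1c) inverse problem: given K and y, find x with K(x) = y
x_rec = np.linalg.solve(K_true, y)

# (3.1a) identification problem: given inputs and outputs, find K.
# Several pairs (columns of X_in and Y_out) are needed to pin down K uniquely.
rng = np.random.default_rng(0)
X_in = rng.standard_normal((2, 5))
Y_out = K_true @ X_in
# K X_in = Y_out  <=>  X_in^T K^T = Y_out^T, solved in the least-squares sense
K_id = np.linalg.lstsq(X_in.T, Y_out.T, rcond=None)[0].T

print(y)      # direct solution
print(x_rec)  # recovered input
print(K_id)   # identified map
```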

Observe that, for example, the evaluation of the reduced parametric quantity of interest $\hat{Q}_{\boldsymbol{\xi}}(\boldsymbol{\xi})$ (see § 2.2.3) can be assigned to (3.1b). Taking up this example, let us pin down a few notions regarding a surrogate model. We draw on elements of approximation theory and statistical learning theory, which help us to frame the necessary notions coherently and to state them economically.

¹ An identification problem is also frequently called a recovery problem (see, e.g., [187, p. 551f]). A direct problem is often called a forward problem as well (see, e.g., [49, p. 5]).

3.1. Surrogate modeling & simulation 43

Let us return to the statement in (2.28) and reformulate it slightly using the terms in (3.1); hence, we deal with the statement

$$\forall x \in X.\; K(x) =_Y \tilde{K}(x), \tag{3.2}$$

where $K, \tilde{K} : X \rightrightarrows Y$. In calculations of practical interest, it is assumed that the map $K$ possesses certain undesired properties, e.g., its evaluation exceeds reasonable finite computing-time budgets, or it is not straightforwardly available for operations such as differentiation or integration. A consequence of these properties is that application-oriented optimizations relying on the map $K$ are prohibitive.

Therefore, the aim is to replace the map $K$ by a surrogate map $\tilde{K}$ that possesses user-prescribed desired properties. Commonly, the map $K$ is called a high-fidelity model (to emphasize the user's prescribed assessment of the model's predictive power) or a high-cost model (to emphasize the user's prescribed assessment of the model's computational costs). Correspondingly, the map $\tilde{K}$ is called a low-fidelity model, a low-cost model, a meta-model, or a surrogate model.

Technically, one can replace the map $K$ by the map $\tilde{K}$ if the statement in (3.2) holds, such that the maps are equal by function extensionality, i.e., $K =_{X \to Y} \tilde{K}$. However, this line of thought is deferred to the next chapter, since the usual starting point regarding surrogate models focuses on, e.g., interpolating polynomials or splines and linear or nonlinear regression models. Inspired by the origins of these concrete examples in approximation theory and statistical learning theory, one can define the class of data-fit low-fidelity models, which can be subdivided into the subclass of deterministic low-fidelity models and the subclass of probabilistic low-fidelity models (see, e.g., [76, p. 132], [116, p. 275]). In the context of numerical simulations, one can additionally define the class of projection-based low-fidelity models and the class of simplified-physics low-fidelity models.

As stated in § 1.2, the class of projection-based low-fidelity models is not pursued. Furthermore, note that the term meta-model is primarily used as a synonym for the term data-fit low-fidelity model, and vice versa; and that, in the context of the space-mapping paradigm or the defect-correction paradigm, the terms low-fidelity model and surrogate model are distinguished (see, e.g., [194, p. 28f] or [49, p. 56f]).

It is assumed that a high-fidelity model is deterministic in the sense that repeated

use of the same input results in the same output each time (see, e.g., [184, p. 409]).

If the chosen high-fidelity model is, e.g., the reduced parametric quantity of interest $\hat{Q}_{\boldsymbol{\xi}}$ obtained by an FE simulation, then a deterministic FE simulation is considered, as opposed to a stochastic FE simulation (see, e.g., [74]). However, some randomness in the form of a noise $\varepsilon$ in the image $\hat{Q}_{\boldsymbol{\xi}}(\boldsymbol{\xi})$ is taken into account, but this noise does not stem from randomness in the argument $\boldsymbol{\xi}$. The noise $\varepsilon$ encodes, for instance,

• "random errors from unobserved variables" [61, p. 39],

• errors in the presence of a "parameter controlling missing physics" [201, p. 93],

• or a "systematic error ... caused by insufficient mesh resolution" [70, p. 5].

Hence, let us summarize all kinds of noise, such as the computational noise with regard to the mesh size parameter $h$ (recall § 2.2.2), in a single observational noise $\varepsilon$ and represent this observational noise as a random variable, even though we utilize a deterministic high-fidelity model. Note that this representation is a common device to make the machinery of probabilistic low-fidelity models amenable to deterministic high-fidelity models (see, e.g., [70, p. 5]).


High-fidelity function approximation error

Let us suppose that $X$, $Y$, and $Y^X \equiv \hom(X,Y)$ possess a more finely layered structure; more precisely, let us suppose that they are normed linear spaces equipped with the corresponding norm-induced metrics $d_X$, $d_Y$, and $d_{Y^X}$ such that

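$$d_X = (x_1,x_2) \mapsto \|x_1 - x_2\|_X : X \times X \to \mathbb{R}_+, \tag{3.3a}$$

$$d_Y = (y_1,y_2) \mapsto \|y_1 - y_2\|_Y : Y \times Y \to \mathbb{R}_+, \tag{3.3b}$$

$$d_{Y^X} = (K_1,K_2) \mapsto \|K_1 - K_2\|_{Y^X} : Y^X \times Y^X \to \mathbb{R}_+. \tag{3.3c}$$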

Notice well that, for the sake of simplicity, we tacitly assume that additional properties such as compactness, Lipschitz continuity, and the like hold whenever the respective notions require them.

If we choose a surrogate model $\tilde{K}$ from a prescribed class of functions called the hypothesis space $H$ (cf. [51, p. 9ff]), that is, $\tilde{K} \in H$ and $H \subseteq Y^X$, then we introduce an error that one can capture by the metrics in (3.3). The theoretical capability of a surrogate model to approximate a high-fidelity model accurately can be investigated by analyzing the convergence of a sequence of surrogate models $(\tilde{K}_n)_{n\in\mathbb{N}}$ to the high-fidelity model $K$. It is assumed that $\tilde{K}_n : X \to Y$ and $(\tilde{K}_n)_{n\in\mathbb{N}} = n \mapsto \tilde{K}_n : \mathbb{N} \to Y^X$.

Often, the notion of uniform convergence (see, e.g., [183, p. 147ff] or [169, p. 203ff]) is deployed, where one sets

$$d_{Y^X}(K_1,K_2) := \sup_{x \in X} \|K_1(x) - K_2(x)\|_Y, \tag{3.4}$$

such that, in abbreviated form, $(\tilde{K}_n)_{n\in\mathbb{N}} \to K$ denotes the convergence of the sequence to the limit function, which can be expressed as

$$(\tilde{K}_n)_{n\in\mathbb{N}} \to K \;:\Longleftrightarrow\; \lim_{n\to\infty} d_{Y^X}(\tilde{K}_n, K) = 0. \tag{3.5}$$

There are various theorems, like the Stone-Weierstrass theorem for multivariate polynomials (see, e.g., [140], [169], [183]), which establish the convergence of different surrogate models in a sense similar to (3.5). The expressions in (3.4) and (3.5) guide the definition of the high-fidelity function approximation error $e_H(K)$ with respect to a fixed surrogate model $\tilde{K}_n$.

Definition 3.1.1 (High-fidelity function approximation error). Given a high-fidelity model $K : X \to Y$ and a fixed surrogate model $\tilde{K}_n : X \to Y$ from a hypothesis space $H$, i.e., $\tilde{K}_n \in H$ and $H \subseteq Y^X$, the high-fidelity function approximation error $e_H(K) \in \mathbb{R}_+$ with respect to the fixed surrogate model $\tilde{K}_n$ is constituted by

$$e_H(K) := \sup_{x \in X} \|K(x) - \tilde{K}_n(x)\|_Y. \tag{3.6}$$

Remark 3.1.1. If it is unambiguous, the qualifier high-fidelity is dropped.

Remark 3.1.2. An important special case of (3.6) is $e_H(\hat{Q}_{\boldsymbol{\xi}})$, that is, the function approximation error with respect to the $l^p$-norm regarding the reduced parametric quantity of interest.
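Definition 3.1.1 can be explored numerically. The Python sketch below assumes a hypothetical high-fidelity model K(x) = |x − 1/2| on X = [0, 1] and uses Bernstein polynomials, a classical constructive route to the Weierstrass approximation theorem, as the sequence of surrogates K̃n ∈ P≤n. The supremum in (3.6) is approximated by a maximum over a uniform grid, which only yields a lower bound on the true value.

```python
from math import comb


def bernstein(f, n):
    """Degree-n Bernstein polynomial of f on [0, 1], returned as a callable."""
    # Precompute f(k/n) * binom(n, k) once per surrogate
    weights = [f(k / n) * comb(n, k) for k in range(n + 1)]

    def B_n(x):
        return sum(w * x**k * (1 - x) ** (n - k) for k, w in enumerate(weights))

    return B_n


def sup_error(K, K_n, grid_size=1001):
    """Grid estimate of e_H(K) = sup_x |K(x) - K_n(x)| on [0, 1] (a lower bound)."""
    xs = [i / (grid_size - 1) for i in range(grid_size)]
    return max(abs(K(x) - K_n(x)) for x in xs)


def K(x):
    """Hypothetical high-fidelity model: continuous, not differentiable at 1/2."""
    return abs(x - 0.5)


for n in (10, 40, 160):
    print(n, sup_error(K, bernstein(K, n)))
```

With these choices, the grid estimates of e_H(K) decrease as n grows (roughly like 1/√n for this non-smooth K), in line with the convergence notion in (3.5).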

Comparing the function approximation error $e_H(K)$ in (3.6) to the modeling error $E(Q)$ in (2.22b), it is apparent that one can control $e_H(K)$ through a surrogate model's order $n$, where the positive integer $n+1$ can represent, for instance, the number of basis functions. Thus, assuming an $\mathbb{F}$-vector space structure on $Y^X$ and on $H$, a possible general representation of a surrogate $\tilde{K}_n$ is given by

$$\tilde{K}_n := \sum_{i=0}^{n} \tilde{c}^{\,i} \cdot_{\mathbb{F}} \tilde{\phi}_i, \tag{3.7}$$

where $\tilde{\phi}_i : X \to Y$ denote basis functions, i.e., members of a basis of a selected hypothesis space $H$ with dimension $n+1$, that is, $\dim(H) = n+1$; and $\tilde{c}^{\,i} \in \mathbb{F}$ denote coefficients, which are also referred to as components or coordinates of $\tilde{K}_n$ with respect to the chosen basis. A prototypical hypothesis space is the space $P_{\le n}$, that is, the set of all univariate algebraic polynomials of degree at most $n$ on an interval $[a,b]$, equipped with an $\mathbb{R}$-vector space structure and the finite monomial basis. The degree $n$ provides a notion of a characteristic size of the hypothesis space $H$ (cf. [51, p. 13]).²

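Supposing a Hilbert space structure on $Y^X$ and on $H$, one can define a notion of a best approximation in a least-squares sense. Thus, the best (or closest) surrogate model $\hat{K}_n \in H$ to the high-fidelity model $K \in Y^X$ is associated with the optimization problem

$$\min_{\tilde{K}_n \in H} R(\tilde{K}_n) := \tfrac{1}{2}\,\|K - \tilde{K}_n\|_{Y^X}^2 \tag{3.8a}$$

$$R(\tilde{K}_n) \equiv \tfrac{1}{2}\,\langle K - \tilde{K}_n,\, K - \tilde{K}_n \rangle_{Y^X}, \tag{3.8b}$$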

where $R : H \to \mathbb{R}_+$ denotes the residual objective functional. One can characterize the best surrogate model $\hat{K}_n$ by the minimizer functional $\operatorname{argmin} = R \mapsto \hat{K}_n : (H \to \mathbb{R}_+) \to H$ as a single-valued functional. If the residual objective functional $R$ possesses a unique global minimizer, then the minimizer functional returns $\hat{K}_n$ as an output, i.e.,

$$\hat{K}_n = \operatorname{argmin}(R) \tag{3.9}$$

is well-defined. If there are multiple global minimizers or if there is no global minimizer at all, the minimizer functional falls back to the common definition as a multi-valued functional whose output is the set of global minimizers of $R$, i.e.,

$$\operatorname{argmin}(R) \equiv \operatorname*{argmin}_{\tilde{K}_n \in H} R(\tilde{K}_n) := \Big\{ \tilde{K}_n \in H \;\Big|\; R(\tilde{K}_n) = \inf_{\tilde{K}'_n \in H} R(\tilde{K}'_n) \Big\}. \tag{3.10}$$

In the case of (3.10), the surrogate model $\hat{K}_n$ is a best solution that reads as

$$\hat{K}_n \in \operatorname*{argmin}_{\tilde{K}_n \in H} R(\tilde{K}_n). \tag{3.11}$$

In the present work, it is generally assumed that (3.9) holds. This assumption is particularly reasonable for a least-squares approach such as (3.8a) (see, e.g., [201, p. 69ff]), where the best (or closest) surrogate model $\hat{K}_n$ is the orthogonal (pseudo-)projection of the high-fidelity model $K$ onto the hypothesis space $H$ such that

$$(K - \hat{K}_n) \perp H. \tag{3.12}$$

² The notation in (3.7) follows the customary index convention from multilinear algebra in order to emphasize the linear combination of members of a vector space. The operation $\cdot_{\mathbb{F}}$ has the signature $\mathbb{F} \times \hom_{\mathbb{F}}(X,Y) \to \hom_{\mathbb{F}}(X,Y)$; its concrete implementation depends on the selected function space, for example, $P_{\le n}$.

If we consider, e.g., the prototypical hypothesis space $P_{\le n}$ in the context of the space of all square-integrable functions $L^2$, then we have the basic continuous least-squares $L^2$ approximation as an instance of (3.8a).³
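As a sketch of this continuous least-squares instance, the Python code below (using NumPy) computes the best L² approximation of a hypothetical high-fidelity model K = exp on [0, 1] in P≤n with the monomial basis. It solves the normal equations built from the Gram matrix of the basis, evaluates all inner products by Gauss-Legendre quadrature (an implementation choice, not prescribed by the text), and checks the orthogonality condition (3.12) numerically. The function names are hypothetical.

```python
import numpy as np


def l2_best_polynomial(K, n, a=0.0, b=1.0, quad_points=40):
    """Monomial coefficients c^0..c^n of the best L2[a, b] approximation of K in P_{<=n}.

    Solves G c = r with G_ij = <x^i, x^j> and r_i = <K, x^i>; the L2 inner
    products are evaluated by Gauss-Legendre quadrature.
    """
    nodes, weights = np.polynomial.legendre.leggauss(quad_points)
    x = 0.5 * (b - a) * nodes + 0.5 * (b + a)   # map nodes from [-1, 1] to [a, b]
    w = 0.5 * (b - a) * weights
    V = np.vander(x, n + 1, increasing=True)    # V[:, i] = x**i (monomial basis)
    G = V.T @ (w[:, None] * V)                  # Gram matrix (Hilbert matrix on [0, 1])
    r = V.T @ (w * K(x))                        # right-hand side <K, x^i>
    c = np.linalg.solve(G, r)
    return c, x, w, V


def residual_functional(c, K, x, w, V):
    """R = 1/2 * ||K - sum_i c^i x^i||^2 in L2, cf. (3.8a), by quadrature."""
    e = K(x) - V @ c
    return 0.5 * np.sum(w * e**2)


c, x, w, V = l2_best_polynomial(np.exp, n=3)
# Orthogonality (3.12): <K - K_hat, x^i> should vanish for every basis function
inner = V.T @ (w * (np.exp(x) - V @ c))
print(c, np.max(np.abs(inner)))
```

On [0, 1], the monomial Gram matrix is the Hilbert matrix, which becomes severely ill-conditioned as n grows; for higher degrees, an orthogonal basis such as shifted Legendre polynomials is the more robust choice.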

Sampling plan

If one possessed sufficient information to determine the high-fidelity model completely, then the previous considerations would suffice for the discussion of corresponding surrogates. A standard example is the approximation of special functions such as the sine function. However, a basic assumption in the present work states that a single evaluation of a high-fidelity model is costly; hence, it is desired to keep the total number of evaluations as low as possible. Therefore, let us create a sample $s \in W_m$ such that

$$s := \big((x_1,y_1),\dots,(x_m,y_m)\big) \in \underbrace{(X \times Y) \times \cdots \times (X \times Y)}_{m}, \tag{3.13}$$

where $m \in \mathbb{N}\setminus\{0\}$ and $\forall i \in \{1,\dots,m\}.\; y_i = K(x_i)$; furthermore, $W_m := \{(s_i)_{i\in\{1,\dots,m\}} \mid \forall i \in \{1,\dots,m\}.\; s_i \in X \times Y\}$. Let us refer to $m$ as the sample size. Note that, in theoretical considerations, the pairs in $X \times Y$ are assumed to be chosen independently at random, which involves tools from probability theory such as a probability measure (see, e.g., [51, p. 5]). For a more comprehensive discussion of measure and probability theory, see, e.g., [201, Ch. 2].

However, let us forgo the full theoretical toolkit: in the present work, we rather focus on quantities defined with respect to a sample, which carry the attribute sample or empirical, such as the sample (or empirical) mean, the sample (or empirical) variance, and similar.

Given a sample $s$ and given $X^m$ as the $m$-fold Cartesian product of $X$ with $X \subset \mathbb{R}^{N_\xi}$, it is more common to deploy the notion of a sampling plan $X_s \subseteq X^m$ defined as

$$X_s := \{(x_i)_{i\in\{1,\dots,m\}} \mid \forall i \in \{1,\dots,m\}.\; x_i \in X\}, \tag{3.14}$$

where, concerning the implementation and the choice of a data structure, it is common to identify the sampling plan $X_s$ with an $m \times N_\xi$ matrix, where $N_\xi$ denotes the number of parameters (see § 2.2.3).

Given a member $x \in X_s$ and invoking the projection maps $\pi_i : X_s \to X_{s,i}$ with $i \in \{1,\dots,m\}$ and $X_{s,i} \equiv X$, where $\pi_i = x \mapsto \pi_i(x) =: x_i$, let us refer to the $x_i$ as sampling plan points and to the corresponding $y_i$ in (3.13) as output points.

Let us discuss briefly some peculiarities regarding the design of a sampling plan.

For a more elaborate discussion on the design of sampling plans, I refer to, e.g., [116,

ch. 13], [70, ch. 1], [53, ch. 17], or [61, ch. 2].

Two desirable properties of a sampling plan $X_s$ are that it is:

• space-filling and

• non-collapsing (see, e.g., [100]).

³ I regard $Y^X := L^2$ solely as a powerful interface. The most important use case is given by $X := \mathbb{R}^d$ with $d \in \mathbb{N}$ and $Y := \mathbb{R}$. For all the subtleties regarding constructions such as the Borel σ-algebra, the Lebesgue measure, the extended real numbers, the Lebesgue integral, and similar, I refer to the literature (see, e.g., [201, ch. 3]).

The non-collapsing property requires that no two sampling plan points share the same value in any coordinate. More precisely: let $i \in \{1,\dots,m\}$ be fixed, and let $\pi_j$ denote the coordinate projection maps $\pi_j : X_{s,i} \to \mathbb{R}$ with $j \in \{1,\dots,N_\xi\}$, where $\pi_j = x_i \mapsto \pi_j(x_i)$; then $\forall x_k \in X_{s,k}, x_l \in X_{s,l}.\; \pi_j(x_k) \neq \pi_j(x_l)$, where $k, l \in \{1,\dots,m\}$ and $k \neq l$. The rationale for this property is, in a strong sense, to exclude the pathological case of two identical sampling plan points and, in a weak sense, to exclude the uneconomical cases where two sampling plan points differ only in coordinates to which the high-fidelity model is hardly sensitive anyway, so that, in fact, the two points can be regarded as equal.
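The non-collapsing condition translates directly into a check on the m × Nξ representation of a sampling plan. The Python sketch below stores the plan as a list of m coordinate tuples; the function name is hypothetical, and exact equality is used for simplicity (with floating-point plans, a tolerance may be preferable).

```python
def is_non_collapsing(plan):
    """Return True iff no two points of `plan` share a value in any coordinate.

    `plan` is a sequence of m points, each a sequence of N_xi coordinates.
    For every coordinate j, the projections pi_j(x_1), ..., pi_j(x_m)
    must be pairwise distinct.
    """
    n_xi = len(plan[0])
    for j in range(n_xi):
        column = [point[j] for point in plan]  # pi_j applied to every point
        if len(set(column)) != len(column):    # a repeated value means collapse
            return False
    return True


# A Latin hypercube on {0, 1, 2}^2 is non-collapsing by construction ...
print(is_non_collapsing([(0, 2), (1, 0), (2, 1)]))          # True
# ... whereas a full-factorial 2 x 2 grid collapses in both coordinates.
print(is_non_collapsing([(0, 0), (0, 1), (1, 0), (1, 1)]))  # False
```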

The space-filling property requires sampling the domain of the high-fidelity model such that the sampling plan error $e(X_s) \in \mathbb{R}_+$ is minimal, which results in a maximally uniform scattering of the sampling plan points in the domain. Notice well that there are many ways to quantify the space-filling property (see, e.g., [53, p. 600]). In order to achieve an optimized sampling plan, a basic idea is to minimize some objective function involving a distance measure of the sampling plan points with respect to the $l^p$-norm. When this idea is pursued within the class of Latin hypercubes, the resulting space-filling sampling plans are called optimized Latin hypercube (LHC) sampling plans. Another kind of space-filling sampling plans are quasi-random sequences or low-discrepancy sequences (see, e.g., [53, p. 615ff]). They are discussed, for instance, in the context of the numerical integration of multivariate functions (cf. [116, p. 245]).
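The low-discrepancy sequence used in this work is the Sobol sequence (via Sobol.jl). As a self-contained illustration of the same idea, the Python sketch below generates the Halton sequence instead, a different and simpler low-discrepancy construction built from van der Corput radical inverses in pairwise coprime bases; it is not a Sobol generator, and the function names are hypothetical.

```python
def radical_inverse(i, base):
    """Van der Corput radical inverse of the integer i >= 0 in the given base.

    The base-b digits of i are mirrored about the radix point,
    e.g. i = 6 = 110_2 maps to 0.011_2 = 0.375.
    """
    inv, scale = 0.0, 1.0 / base
    while i > 0:
        i, digit = divmod(i, base)
        inv += digit * scale
        scale /= base
    return inv


def halton(m, bases=(2, 3)):
    """First m points of the Halton sequence in [0, 1)^d with d = len(bases)."""
    return [tuple(radical_inverse(i, b) for b in bases) for i in range(1, m + 1)]


print(halton(4))  # approximately [(0.5, 1/3), (0.25, 2/3), (0.75, 1/9), (0.125, 4/9)]
```

Like the Sobol sequence, each new point fills the largest remaining gaps deterministically, so the points do not cluster the way independent random samples can.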

In [70, p. 17–27], the authors provide an implementation in the MATLAB® PL for creating an optimized LHC sampling plan based on the Morris-Mitchell criterion and an evolutionary operation. Using the package MATLAB.jl (see [101]), which makes it possible to interact with the MATLAB® PL from within the Julia PL, let us adapt the lines of code concerning this particular optimized LHC sampling plan to the Julia PL and label them (XSpkg1).

Additionally, let us invoke two Julia PL packages for the creation of sampling

plans:

(XSpkg2) the package LatinHypercubeSampling.jl (see [215] and [216]) provides an

implementation for creating an optimized LHC sampling plan based on the

Audze-Eglais criterion and a genetic algorithm to solve the corresponding opti-

mization problem,

(XSpkg3) the package Sobol.jl (see [106]) provides an implementation for creating a

Sobol quasi-random sequence.

Figure 3.1 and Figure 3.2 show representations of different sampling plans $X_s \subseteq X^m$, where $X := [0,1]^2$ denotes the unit 2-dimensional hypercube and the number of sampling plan points is given by $m \in \{10, 25, 50, 100\}$ and by $m \in \{10, 25, 100, 1000\}$, respectively. Using (XSpkg1), an optimized Latin hypercube sampling plan is generated, which is abbreviated to maximin LHC (cf. [100]). Using (XSpkg2), a random Latin hypercube and an optimized Latin hypercube sampling plan are generated, which are abbreviated to random LHC and Audze-Eglais LHC (cf. [100]), respectively. The unit hypercube is obtained by scaling the hypercube $[1,m] \times [1,m]$. Using (XSpkg3), a Sobol quasi-random sequence sampling plan is generated, which is constructed for the unit hypercube by default. Other hypercubes can be obtained by scaling the unit hypercube.
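The following Python sketch mirrors, in simplified form, the ingredients described above: a random Latin hypercube on the integer grid {1, …, m}², its linear scaling to the unit hypercube, and two space-filling criteria, namely the Audze-Eglais potential energy (sum of inverse squared pairwise distances) and the Morris-Mitchell criterion φq evaluated over all pairwise lᵖ distances. Instead of the evolutionary or genetic algorithms used by (XSpkg1) and (XSpkg2), it simply keeps the best of a number of random candidates; all function names are hypothetical, and the scaling map is an assumption.

```python
import itertools
import math
import random


def random_lhc(m, dim, rng):
    """Random Latin hypercube on the integer grid {1, ..., m}^dim.

    Every coordinate is an independent random permutation of 1..m, so each
    one-dimensional projection hits every level exactly once.
    """
    columns = [rng.sample(range(1, m + 1), m) for _ in range(dim)]
    return [tuple(col[i] for col in columns) for i in range(m)]


def to_unit_hypercube(plan, m):
    """Map [1, m]^dim linearly onto [0, 1]^dim (assumed scaling, m >= 2)."""
    return [tuple((c - 1) / (m - 1) for c in point) for point in plan]


def audze_eglais(plan):
    """Audze-Eglais potential energy: sum of 1 / d^2 over all point pairs (minimize)."""
    return sum(1.0 / math.dist(a, b) ** 2 for a, b in itertools.combinations(plan, 2))


def morris_mitchell(plan, q=2, p=2):
    """Morris-Mitchell criterion phi_q over all pairwise l_p distances (minimize)."""
    total = 0.0
    for a, b in itertools.combinations(plan, 2):
        d = sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1.0 / p)
        total += d ** (-q)
    return total ** (1.0 / q)


def best_random_lhc(m, dim, criterion, trials=200, seed=0):
    """Keep the best of `trials` random LHCs under `criterion` (a crude stand-in
    for the evolutionary and genetic optimizers mentioned in the text)."""
    rng = random.Random(seed)
    return min((random_lhc(m, dim, rng) for _ in range(trials)), key=criterion)


plan = best_random_lhc(10, 2, audze_eglais)
unit_plan = to_unit_hypercube(plan, 10)
print(audze_eglais(plan), morris_mitchell(plan))
```

Smaller values of either criterion indicate a more uniform spread. Because the candidates are random, the result depends on the seed and on the number of trials.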


Comparing the LHC sampling plans (see Figures 3.1a–3.1d), one can observe that, already at a low number of sampling plan points, utilizing a random LHC can lead to clustering instead of a uniform spreading. The comparison of the Audze-Eglais LHC and the maximin LHC is intricate due to the different underlying criteria and the randomness in the corresponding stochastic optimization algorithms (see § 2.3.3). However, both optimized Latin hypercubes exhibit a highly uniform scattering of the sampling plan points, as desired. For more details on a comparison of the Audze-Eglais LHC and the maximin LHC, see, e.g., [100].

In Figure 3.2, the Sobol quasi-random sequence sampling plan is investigated, which shows a highly uniform and nonrandom scattering of the sampling plan points.⁴

Interestingly, using (XSpkg1) for high numbers of sampling plan points such as

m>100 and a high accuracy regarding the solving of the underlying optimization

problem requires a significantly higher computational time than using (XSpkg2) and

(XSpkg3).

From a modeling and simulation viewpoint, though, the fundamental premise

is that a single evaluation of a high-fidelity model is expensive. Therefore, the aim

is to construct a sufficiently space-filling sampling plan in a short amount of time at

a low number of sampling plan points. With this aim in mind, all three presented

space-filling sampling plans are well-suited.

From an optimization viewpoint, the sampling plan points should ideally be lo-

cated in the vicinity of optimal points. In Figure 3.3, I illustrate this requirement

by adapting the Audze-Eglais Latin hypercube sampling plan in Figure 3.2 for the

contour representation in Figure 2.2b.

Notice well that a high number of sampling plan points can lower the sampling

plan error e(Xs). Lowering this error can improve the local accuracy of a surrogate

model built upon these points. A surrogate model’s global accuracy, though, is also

determined by the function approximation error in (3.6) which is independent of the

sample.

Empirical surrogate modeling error

In the context of the local and global accuracy of a surrogate model, let us introduce another notion regarding the entity $\hat{Q}_{\boldsymbol{\xi}}$: the empirical surrogate modeling error $e_{H,s}(\hat{Q}_{\boldsymbol{\xi}})$ with respect to the sampling plan $X_s$.

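Definition 3.1.2 (Empirical surrogate modeling error). Let us suppose a sampling plan $X_s$ (such that $e(X_s)$ is minimal) equipped with sampling plan points $x_i \in X_{s,i}$, where $i \in \{1,\dots,m\}$ and $X_{s,i} \equiv X$. Furthermore, let us assume a high-fidelity model $\hat{Q}_{\boldsymbol{\xi}} : X \to \mathbb{R}$ and a fixed surrogate model $\tilde{Q}_{\boldsymbol{\xi},n} : X \to \mathbb{R}$ from a hypothesis space $H \subseteq \mathbb{R}^X$. Then, the empirical surrogate modeling error $e_{H,s}(\hat{Q}_{\boldsymbol{\xi}}) \in \mathbb{R}_+$ with respect to the sampling plan $X_s$ is constituted by⁵

$$e_{H,s}(\hat{Q}_{\boldsymbol{\xi}}) := \frac{1}{m}\sum_{i=1}^{m} \big(\hat{Q}_{\boldsymbol{\xi}}(x_i) - \tilde{Q}_{\boldsymbol{\xi},n}(x_i)\big)^2. \tag{3.15}$$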
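Equation (3.15) is a plain mean squared error over the sampling plan points. The Python sketch below evaluates it for a hypothetical high-fidelity model Q̂(x) = x1² + x2² and a hypothetical linear surrogate Q̃(x) = x1 + x2 on X = [0, 1]². On the four corner points, the two functions coincide, so the empirical error vanishes, although sup over x ∈ X of |Q̂(x) − Q̃(x)| equals 1/2 (attained at (1/2, 1/2)). This illustrates that the empirical error depends on the sample, whereas the function approximation error in (3.6) does not.

```python
def empirical_error(Q_hat, Q_tilde, plan):
    """Empirical surrogate modeling error (3.15): mean squared difference on the plan."""
    m = len(plan)
    return sum((Q_hat(x) - Q_tilde(x)) ** 2 for x in plan) / m


def Q_hat(x):
    """Hypothetical high-fidelity model."""
    return x[0] ** 2 + x[1] ** 2


def Q_tilde(x):
    """Hypothetical (linear) surrogate model."""
    return x[0] + x[1]


corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
center = [(0.5, 0.5)]
print(empirical_error(Q_hat, Q_tilde, corners))  # 0.0
print(empirical_error(Q_hat, Q_tilde, center))   # 0.25
```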

⁴ I do not invoke the package's functionality to skip the initial portion of the Sobol sequence, which reportedly could further improve the uniform spreading of the sampling plan points.

⁵ From the viewpoint of statistical learning theory, this error is a representative of mean squared errors (see, e.g., [116, p. 265]). Mean squared errors, along with root mean squared errors (see, e.g., [70, p. 37]), can be considered as parts of squared error loss functions (see, e.g., [91, p. 219]).


FIGURE 3.1: Representations of different sampling plans $X_s \subseteq X^m$, where $X := [0,1]^2$ denotes the unit 2-dimensional hypercube. (A) m := 10; (B) m := 25; (D) m := 100 sampling plan points. Columns: (i) Random LHC, (ii) Audze-Eglais LHC, (iii) Maximin LHC.


FIGURE 3.2: Representations of different sampling plans $X_s \subseteq X^m$, where $X := [0,1]^2$ denotes the unit 2-dimensional hypercube. (A) m := 10; (B) m := 50; (D) m := 1000 sampling plan points. Columns: (i) Random LHC, (ii) Audze-Eglais LHC, (iii) Sobol quasi-random sequence.


FIGURE 3.3: The Audze-Eglais LHC sampling plan in Figure 3.2 adapted for the contour representation in Figure 2.2b. (A) m := 10; (B) m := 50 sampling plan points. Panels (i) to (vi) as in Figure 2.2b.


Remark 3.1.3. I point out that the size $m$ of the sample $s$ in (3.15) is fixed. If $m$ were to tend to infinity, this asymptotic consideration would particularly affect the sampling plan error $e(X_s)$. The function approximation error $e_H(\hat{Q}_{\boldsymbol{\xi}})$ in (3.6), though, would not be affected, since it is independent of the sample.

Remark 3.1.4. Recalling the representation of a surrogate in (3.7), we technically face a family of surrogates parameterized by the coefficients $\tilde{c}_i$ and by the degree $n$. The term hyperparameter serves to distinguish these kinds of parameters: let us regard every parameter to be determined, except the coefficients $\tilde{c}_i$, as a hyperparameter $\tilde{\chi}_i$. These hyperparameters can be organized in an ordered set whose size depends on the given surrogate modeling problem.

Remark 3.1.5. The presence of the coefficients and hyperparameters would strictly require changing the notation from $\tilde{Q}_{\boldsymbol{\xi},n}(x_i)$ to, for instance, $\tilde{Q}_{\boldsymbol{\xi}}(x_{i_1}, c_{i_2}, \tilde{\chi}_{i_3})$. However, where the risk of confusion is low, and for consistency with the common literature, let us accept an abuse of notation and keep writing $\tilde{Q}_{\boldsymbol{\xi},n}(x_i)$, which tacitly refers to, for instance, $\tilde{Q}_{\boldsymbol{\xi}}(x_{i_1}, c_{i_2}, \tilde{\chi}_{i_3})$.

Defining the error $e_{\mathcal{H},s}(\hat{Q}_{\boldsymbol{\xi}})$ has both conceptual and practical value. At a conceptual level, it expresses the problem dependency on $\hat{Q}_{\boldsymbol{\xi}}$. Furthermore, this error encodes the dependency on the surrogate model's membership in a prescribed class of functions, that is, the hypothesis space $\mathcal{H}$, as well as the dependency on the sample $s$.

At a practical level, the error $e_{\mathcal{H},s}(\hat{Q}_{\boldsymbol{\xi}})$ serves as a starting point for defining the empirical training error $e_{\mathcal{H},s_t}(\hat{Q}_{\boldsymbol{\xi}})$ and, more importantly, the empirical generalization error $e_{\mathcal{H},s_g}(\hat{Q}_{\boldsymbol{\xi}})$. These errors require a partition $X_{s,p}$ of the sampling plan $X_s$; more precisely, $X_s$ is represented as the disjoint union of two subsets $X_{s_t}$ and $X_{s_g}$, where $X_{s_t}$ denotes the training subset (or observed points subset) and $X_{s_g}$ denotes the testing subset (or prediction points subset). More formally, there exists an equivalence relation $\sim_{X_{s,p}}$ on $X_s$ associated with the given partition such that $X_{s,p} := X_s/{\sim_{X_{s,p}}}$; and by demanding $X_{s_t} \cap X_{s_g} = \emptyset$, we set

$$X_s := X_{s_t} \uplus X_{s_g}, \tag{3.16}$$

where $X_{s_t} \in X_{s,p}$ and $X_{s_g} \in X_{s,p}$. Given the number of sampling plan points $m$, the positive integer $m_t$ denotes the number of training points and the positive integer $m_g$ denotes the number of testing points, such that $m \equiv m_t + m_g$, where usually $m_g \ll m_t$. Moreover, given $m$ and a scalar $p_m \in (0,1)$ that encodes a fixed partition ratio for the sampling plan, we set

$$m_t := \lceil p_m \cdot m \rceil \quad \text{such that} \quad m_g := m - \lceil p_m \cdot m \rceil. \tag{3.17}$$
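The following is a minimal sketch of a randomly created two-cell partition according to (3.16) and (3.17). It assumes that the sampling plan is given as a plain Python list of points. The function name `random_partition` and the example plan are hypothetical illustrations, not part of the original method description. The points are permuted randomly, the first $m_t = \lceil p_m \cdot m \rceil$ of them form $X_{s_t}$, and the remaining $m_g$ points form $X_{s_g}$.

```python
import math
import random


def random_partition(points, p_m, seed=None):
    """Randomly split a sampling plan X_s into a training subset X_st and a
    testing subset X_sg with m_t = ceil(p_m * m) and m_g = m - m_t, cf. (3.17).
    Hypothetical helper for illustration only."""
    if not 0.0 < p_m < 1.0:
        raise ValueError("the partition ratio p_m must lie in (0, 1)")
    m = len(points)
    m_t = math.ceil(p_m * m)
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)  # random permutation of X_s
    return shuffled[:m_t], shuffled[m_t:]  # disjoint cells X_st and X_sg


if __name__ == "__main__":
    # Hypothetical 10-point plan on [0, 1]^2 (one point per row and column of a 10 x 10 grid).
    plan = [(i / 9, (7 * i % 10) / 9) for i in range(10)]
    train, test = random_partition(plan, p_m=0.8, seed=0)
    print(len(train), len(test))  # 8 2
```

With $p_m = 0.8$ and $m = 10$, the split yields 8 training points and 2 testing points, whatever the seed.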

Let us suppose that points in the sampling plan $X_s$ are scarce. Hence, we do not define yet another subset, namely a validating subset (see, e.g., [91, p. 222f]). The purpose of a validating subset is to help select, from a parameterized family of surrogates fitted to the training subset (recall Remark 3.1.4), a member that is subsequently assessed on the testing subset.

The members of the training subset $X_{s_t}$ are used in (3.15) to determine or estimate the parameters $\tilde{g} \in G$, such as the coefficients $\tilde{c}_i$ in (3.7) or the hyperparameters $\tilde{\chi}_i$, either by an interpolation problem (fitting the given data exactly) or by a regression problem (fitting the given data inexactly).


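Determining the coefficients by interpolation results in an empirical training error $e_{\mathcal{H},s_t}(\hat{Q}_{\boldsymbol{\xi}})$ of zero by virtue of the interpolation property. Supposing an evaluation functional $\mathrm{ev}_i \in \hom_{\mathbb{R}}(\mathbb{R}^X, \mathbb{R})$ such that $\mathrm{ev}_i(\tilde{Q}_{\boldsymbol{\xi},n}) := \tilde{Q}_{\boldsymbol{\xi},n}(x_i)$, let us conceive the interpolation property in the sense of

$$\forall x_i \in X_{s,i}.\ \hat{Q}_{\boldsymbol{\xi}}(x_i) = \tilde{Q}_{\boldsymbol{\xi},n}(x_i, \tilde{g}). \tag{3.18}$$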
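The following sketch illustrates why interpolation yields a vanishing empirical training error. It assumes a one-dimensional polynomial surrogate of degree $n = m - 1$ and uses hypothetical data. Because the number of coefficients equals the number of training points, the Vandermonde system is square and is solved exactly. The residual at the training points is therefore zero up to round-off.

```python
import numpy as np

# Hypothetical 1-D data: m = 5 training points of a smooth "high-fidelity" response.
x = np.linspace(0.0, 1.0, 5)
y = np.sin(2.0 * np.pi * x) + x**2

# Interpolation: a polynomial of degree n = m - 1 has exactly m coefficients, so the
# Vandermonde system V c = y is square and, for distinct nodes x_i, uniquely solvable.
V = np.vander(x, N=len(x), increasing=True)  # columns 1, x, x^2, ..., x^(m-1)
c = np.linalg.solve(V, y)

# Empirical training error: mean squared residual at the training points.
training_error = np.mean((y - V @ c) ** 2)
print(f"training error: {training_error:.1e}")  # zero up to round-off
```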

In the next section, we discuss the radial basis surrogate model and the Gaussian process regression (or kriging) surrogate model. In doing so, we briefly encounter the interpolation problem in a deterministic and in a stochastic setting, respectively. For an in-depth treatment of deterministic and stochastic interpolation, I refer to, e.g., [201, ch. 13] or [188].

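Recalling (2.35) and (3.8a), one can define an objective function $r : G \to \mathbb{R}$ in order to estimate the best parameters $\hat{g} \in G$ via a regression problem in the sense of a basic discrete least squares $\ell_2$ approximation

$$\underset{\tilde{g} \in G}{\text{minimize}}\ r(\tilde{g}) := \frac{1}{m_t} \sum_{i=1}^{m_t} \big\| \hat{Q}_{\boldsymbol{\xi}}(x_i) - \tilde{Q}_{\boldsymbol{\xi}}(x_i, \tilde{g}) \big\|_{\ell_2}^2. \tag{3.19}$$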

If we estimate the coefficients by regression, such as a least squares method, the training error $e_{\mathcal{H},s_t}(\hat{Q}_{\boldsymbol{\xi}})$ is generically greater than zero. A possible interpretation of this approach to parameter finding is the following: we know a priori that the sample $s$ in (3.13) fits the surrogate model, but the observed outputs within the sample are noisy, that is,

$$\forall i \in \{1, \ldots, m\}.\ y_i = K(x_i) + \varepsilon_i, \tag{3.20}$$

where the $\varepsilon_i$ are the components of a vector $\varepsilon \in \mathbb{R}^m$. They are independent random numbers drawn from the normal distribution with mean $\mu \equiv 0$ and constant variance $\sigma^2$, i.e., $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$.⁶ For ordinary least squares and a fixed standard deviation $\sigma > 0$, this interpretation can be embedded in the more general parameter finding by maximum likelihood estimation (see, e.g., [91, p. 31], [155, p. 217ff] or [201, p. 96]). We look at maximum likelihood estimation in the next section with regard to the Gaussian process regression (or kriging) low-fidelity model.
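The following sketch illustrates the noise model (3.20) together with the regression problem (3.19). It assumes a hypothetical quadratic high-fidelity model $K$, a hypothetical noise level $\sigma = 0.1$ and a quadratic surrogate. The coefficients are estimated by NumPy's least squares solver, so the training error is generically positive. This is an illustration under these assumptions, not the procedure used later in this work.

```python
import numpy as np

rng = np.random.default_rng(42)


def K(x):
    """Hypothetical high-fidelity model (a quadratic, so the surrogate class fits it)."""
    return 1.0 + 2.0 * x - 0.5 * x**2


# Noisy observations y_i = K(x_i) + eps_i with independent eps_i ~ N(0, sigma^2), cf. (3.20).
m_t, sigma = 30, 0.1
x = rng.uniform(-1.0, 1.0, size=m_t)
y = K(x) + rng.normal(0.0, sigma, size=m_t)

# Least squares regression with the quadratic basis 1, x, x^2, cf. (3.19).
V = np.vander(x, N=3, increasing=True)
c_hat, *_ = np.linalg.lstsq(V, y, rcond=None)

training_error = np.mean((y - V @ c_hat) ** 2)
print(c_hat)           # close to, but generically not equal to, [1.0, 2.0, -0.5]
print(training_error)  # generically > 0, of the order of sigma^2
```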

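The training error $e_{\mathcal{H},s_t}(\hat{Q}_{\boldsymbol{\xi}})$ associated with a surrogate model $\tilde{Q}_{\boldsymbol{\xi},n}$ is inadequate for assessing the surrogate model's predictive power at points not yet observed (see, e.g., [173, p. 108] or [91, p. 221]). Hence, derived from the empirical surrogate modeling error in (3.15), let us introduce the empirical generalization error $e_{\mathcal{H},s_g}(\hat{Q}_{\boldsymbol{\xi}}) \in \mathbb{R}_+$ with respect to the testing subset $X_{s_g}$ such that

$$e_{\mathcal{H},s_g}(\hat{Q}_{\boldsymbol{\xi}}) := \frac{1}{m_g} \sum_{i=1}^{m_g} \big( \hat{Q}_{\boldsymbol{\xi}}(x_i) - \tilde{Q}_{\boldsymbol{\xi},n}(x_i) \big)^2, \tag{3.21}$$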

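where $\forall i \in \{1, \ldots, m_g\}.\ x_i \in X_{s_g,i} := (X_s \setminus X_{s_t})_i$ and $(X_s \setminus X_{s_t})_i \equiv X$. It is often convenient to normalize the error $e_{\mathcal{H},s_g}(\hat{Q}_{\boldsymbol{\xi}})$ to the interval $[\hat{Q}^{\min}_{\boldsymbol{\xi}}, \hat{Q}^{\max}_{\boldsymbol{\xi}}]_{X_{s_g,i}}$. Here, $\hat{Q}^{\min}_{\boldsymbol{\xi}} \in \mathbb{R}$ denotes the minimal output value and $\hat{Q}^{\max}_{\boldsymbol{\xi}} \in \mathbb{R}$ denotes the maximal output value with respect to $X_{s_g,i}$. Hence, one can define the normalized empirical generalization error (NEGE) $e^{N}_{\mathcal{H},s_g}(\hat{Q}_{\boldsymbol{\xi}}) \in \mathbb{R}_+$ with respect to the testing subset $X_{s_g}$ such that

$$e^{N}_{\mathcal{H},s_g}(\hat{Q}_{\boldsymbol{\xi}}) := \frac{1}{\hat{Q}^{\max}_{\boldsymbol{\xi}} - \hat{Q}^{\min}_{\boldsymbol{\xi}}}\, e_{\mathcal{H},s_g}(\hat{Q}_{\boldsymbol{\xi}}). \tag{3.22}$$

⁶ In statistics vernacular (see, e.g., [61, p. 249–252]), the input entity, the output entity, and the error entity are conceived as random variables $X$, $Y$, and $\varepsilon$, respectively. Hence, the high-fidelity model $K$ is regarded as an unknown smooth regression function, i.e., as the conditional expectation $\mathbb{E}(Y \mid X = x) =: K(x)$. Regarding the conditional variance $\mathbb{V}_{\mu \equiv 0}(\varepsilon \mid X = x)$, it is assumed that the homoscedasticity property holds, that is, $\forall i.\ \mathbb{V}_{\mu \equiv 0}(\varepsilon_i \mid X = x) \equiv \sigma^2_{\varepsilon \mid X}$ with $\sigma^2_{\varepsilon \mid X} \in \mathbb{R}_+ \cup \{+\infty\}$. In addition, noise is modeled only as additive Gaussian noise.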

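As pointed out in the commentary on the empirical surrogate modeling error in (3.15), one can additionally introduce the root empirical generalization error $e^{R}_{\mathcal{H},s_g}(\hat{Q}_{\boldsymbol{\xi}}) \in \mathbb{R}_+$ such that

$$e^{R}_{\mathcal{H},s_g}(\hat{Q}_{\boldsymbol{\xi}}) := \big[ e_{\mathcal{H},s_g}(\hat{Q}_{\boldsymbol{\xi}}) \big]^{1/2} \tag{3.23}$$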

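and one can introduce the normalized root empirical generalization error (NREGE) $e^{NR}_{\mathcal{H},s_g}(\hat{Q}_{\boldsymbol{\xi}}) \in \mathbb{R}_+$, which reads as

$$e^{NR}_{\mathcal{H},s_g}(\hat{Q}_{\boldsymbol{\xi}}) := \frac{1}{\hat{Q}^{\max}_{\boldsymbol{\xi}} - \hat{Q}^{\min}_{\boldsymbol{\xi}}}\, e^{R}_{\mathcal{H},s_g}(\hat{Q}_{\boldsymbol{\xi}}). \tag{3.24}$$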

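Note that both $e^{R}_{\mathcal{H},s_g}(\hat{Q}_{\boldsymbol{\xi}})$ and $e^{NR}_{\mathcal{H},s_g}(\hat{Q}_{\boldsymbol{\xi}})$ can be seen as more conservative error measures than their counterparts in (3.21) and (3.22), respectively. This holds as long as the mean squared error lies below one, since $\sqrt{e} > e$ for $0 < e < 1$.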
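The four error measures (3.21) to (3.24) can be computed together on a testing subset, as in the following sketch. It assumes that the high-fidelity outputs and the surrogate predictions at the $m_g$ testing points are given as arrays, and that the normalization range is taken over the high-fidelity testing outputs. As in (3.22), the mean squared error itself (not its square root) is divided by that range. The function name and the example data are hypothetical.

```python
import numpy as np


def generalization_errors(y_hat, y_tilde):
    """Errors on a testing subset X_sg following (3.21)-(3.24).

    y_hat   -- high-fidelity outputs Q_hat(x_i), i = 1, ..., m_g
    y_tilde -- surrogate predictions Q_tilde(x_i), i = 1, ..., m_g
    """
    y_hat = np.asarray(y_hat, dtype=float)
    y_tilde = np.asarray(y_tilde, dtype=float)
    e = np.mean((y_hat - y_tilde) ** 2)   # (3.21): empirical generalization error
    q_range = y_hat.max() - y_hat.min()   # Q_max - Q_min over the testing subset
    e_n = e / q_range                     # (3.22): NEGE
    e_r = np.sqrt(e)                      # (3.23): root error
    e_nr = e_r / q_range                  # (3.24): NREGE
    return e, e_n, e_r, e_nr


if __name__ == "__main__":
    # Hypothetical testing data with m_g = 4 points.
    print(generalization_errors([0.0, 1.0, 2.0, 4.0], [0.1, 0.9, 2.2, 3.8]))
```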

Let us call a partition $X_{s,p}$, with, e.g., the cells $X_{s_t}$ and $X_{s_g}$, randomly created if and only if the members of the sampling plan $X_s$ are permuted randomly and assigned to the partition's cells in accordance with the partition ratio $p_m$. Then, given a fixed number of sampling plan points $m$, the membership of a sampling plan point in either the training subset or the testing subset in (3.16) is random.

In order to average over this membership randomness, let us use the notion of a mean generalization error $\overline{e}_{\mathcal{H},s_g}(\hat{Q}_{\boldsymbol{\xi}}) \in \mathbb{R}_+$. We compute it either by the hold-out method with random sub-sampling or by the $k$-fold cross-validation method. These methods are computationally tractable because they are non-exhaustive, i.e., they do not consider all possible ways of partitioning the sampling plan. For further elaboration on the technical intricacies, see, e.g., the survey in [5].

The basic version of the hold-out (or simple validation) method computes the generalization error in (3.21) by assuming a fixed partition ratio $p_m$ and a randomly created partition $X_{s,p}$ that consists of the two cells $X_{s_t}$ and $X_{s_g}$. An extended version of the hold-out method includes random sub-sampling (cf. [116, p. 267f]), viz. performing the basic hold-out method in a finite number of independent runs and computing the generalization error in (3.21) in each run. Finally, the mean generalization error $\overline{e}_{\mathcal{H},s_g}(\hat{Q}_{\boldsymbol{\xi}})$ is computed as the mean of all individual generalization errors. However, since the individual partitions are created randomly, this method does not guarantee that every sampling plan point is used as a testing point.
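The extended hold-out method can be sketched as follows. The sketch assumes one-dimensional NumPy arrays `x` and `y` and user-supplied `fit`/`predict` callables. These names, and the quadratic surrogate built with `np.polyfit` in the usage part, are illustrative choices only. Each run draws a new random permutation, so some points may never serve as testing points, as noted above.

```python
import math

import numpy as np


def holdout_random_subsampling(x, y, fit, predict, p_m=0.8, runs=5, seed=0):
    """Mean generalization error: average of (3.21) over `runs` independent
    hold-out splits, each with m_t = ceil(p_m * m) training points."""
    rng = np.random.default_rng(seed)
    m = len(x)
    m_t = math.ceil(p_m * m)
    errors = []
    for _ in range(runs):
        perm = rng.permutation(m)          # a new randomly created partition per run
        train, test = perm[:m_t], perm[m_t:]
        model = fit(x[train], y[train])
        errors.append(np.mean((y[test] - predict(model, x[test])) ** 2))
    return float(np.mean(errors))


def fit_quadratic(a, b):
    """Illustrative surrogate: least squares polynomial of degree 2."""
    return np.polyfit(a, b, deg=2)


if __name__ == "__main__":
    xs = np.linspace(-1.0, 1.0, 20)
    ys = np.cos(3.0 * xs)                  # hypothetical high-fidelity outputs
    print(holdout_random_subsampling(xs, ys, fit_quadratic, np.polyval))
```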

The $k$-fold cross-validation method supposes a randomly created partition $X_{s,p}$ that consists of $k$ cells $X^{(1)}_{s,p}, \ldots, X^{(k-1)}_{s,p}, X^{(k)}_{s,p}$, where the positive integer $k$ is selected such that $k \ll m$. The cells are either equal in size, i.e., given a positive integer $q$, $\forall i \in \{1, \ldots, k\}.\ |X^{(i)}_{s,p}| = q$, or only approximately equal in size. Generally, there are numerous options to construct the required training and testing subsets. However, the construction principle underlying this method demands that $k-1$ cells form the training subset and that the remaining cell forms the testing subset. Since each of the $k$ cells can play the role of the testing subset, there are $k$ different options to define the corresponding subsets. Therefore, one can define the $i$-th training subset and the $i$-th testing subset as

$$X^{(i)}_{s_t} := X_s \setminus X^{(i)}_{s,p}, \tag{3.25}$$

$$X^{(i)}_{s_g} := X^{(i)}_{s,p}, \tag{3.26}$$

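where $i \in \{1, \ldots, k\}$. For each of the $k$ options, the generalization error in (3.21) is computed. The mean generalization error $\overline{e}_{\mathcal{H},s_g}(\hat{Q}_{\boldsymbol{\xi}})$ is then the mean of all $k$ individual generalization errors. In order to emphasize the dependency of the generalization error on the $k$-fold cross-validation method, let us introduce the map $e_{\mathcal{H},s_g,\mathrm{cv}}$ that reads as

$$e_{\mathcal{H},s_g,\mathrm{cv}} = k \mapsto e_{\mathcal{H},s_g}(\hat{Q}_{\boldsymbol{\xi}})\big|_k := e_{\mathcal{H},s_g,\mathrm{cv}}(k) : \mathbb{Z}_+ \to \mathbb{R}_+, \tag{3.27}$$

where $e_{\mathcal{H},s_g}(\hat{Q}_{\boldsymbol{\xi}})|_k$ and $e_{\mathcal{H},s_g,\mathrm{cv}}(k)$ both denote the $k$-dependent generalization error. Accordingly, the mean $k$-dependent generalization error is denoted as $\overline{e}_{\mathcal{H},s_g}(\hat{Q}_{\boldsymbol{\xi}})|_k$ or $\overline{e}_{\mathcal{H},s_g,\mathrm{cv}}(k)$.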
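A compact sketch of the mean $k$-dependent generalization error follows. It assumes NumPy arrays and user-supplied `fit`/`predict` callables, which are illustrative names. `np.array_split` creates $k$ approximately equal cells from a random permutation. Each cell serves once as the testing subset (3.26), and the union of the others serves as the training subset (3.25).

```python
import numpy as np


def kfold_mean_error(x, y, fit, predict, k=5, seed=0):
    """Mean k-dependent generalization error: average of (3.21) over the k folds."""
    rng = np.random.default_rng(seed)
    cells = np.array_split(rng.permutation(len(x)), k)  # k (approximately) equal cells
    errors = []
    for i in range(k):
        test = cells[i]                                                     # (3.26)
        train = np.concatenate([c for j, c in enumerate(cells) if j != i])  # (3.25)
        model = fit(x[train], y[train])
        errors.append(float(np.mean((y[test] - predict(model, x[test])) ** 2)))
    return float(np.mean(errors)), errors


if __name__ == "__main__":
    xs = np.linspace(0.0, 1.0, 25)
    ys = np.exp(xs)                             # hypothetical high-fidelity outputs
    mean_err, errs = kfold_mean_error(xs, ys, lambda a, b: np.polyfit(a, b, 1), np.polyval)
    print(mean_err, len(errs))
```

Unlike the hold-out method with random sub-sampling, every sampling plan point is used exactly once as a testing point here.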

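In the context of (3.22), let us simplify the notation for the sake of conciseness, that is, let us introduce the map $e^{N}_{\mathrm{cv}}$ that reads as

$$e^{N}_{\mathrm{cv}} = k \mapsto e^{N}_{\mathcal{H},s_g}(\hat{Q}_{\boldsymbol{\xi}})\big|_k \equiv e^{N}_{\mathrm{cv}}(k) : \mathbb{Z}_+ \to \mathbb{R}_+, \tag{3.28}$$

where $e^{N}_{\mathcal{H},s_g}(\hat{Q}_{\boldsymbol{\xi}})|_k$ and $e^{N}_{\mathrm{cv}}(k)$ both denote the $k$-dependent normalized generalization error. Accordingly, the mean $k$-dependent normalized generalization error is denoted as $\overline{e}^{N}_{\mathcal{H},s_g}(\hat{Q}_{\boldsymbol{\xi}})|_k$ or $\overline{e}^{N}_{\mathrm{cv}}(k)$. Note that one can analogously define the mean $k$-dependent normalized root generalization error $\overline{e}^{NR}_{\mathcal{H},s_g}(\hat{Q}_{\boldsymbol{\xi}})|_k$, or $\overline{e}^{NR}_{\mathrm{cv}}(k)$.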

In both the extended hold-out method and the $k$-fold cross-validation method, the computational burden is dominated by fitting the surrogate model to the training subset for each individual generalization error. Hence, the number of runs (in the hold-out method) and the number of folds (in the cross-validation method) have to be chosen such that the burden stays low while the mean generalization error remains reliable. Let us choose the number of runs in the same way as the number of folds, for which computationally reasonable choices are $k \equiv 5$ or $k \equiv 10$ (see, e.g., [91, p. 242ff]).⁷

These computational considerations are relevant for both surrogate model assessment and surrogate model selection (as already mentioned above regarding the non-utilization of a validating subset). Both can be subsumed under the bias-variance problem (see, e.g., [51, p. 13f], [91, p. 223ff] or [155, p. 202]). Finding the optimal hyperparameters is a nontrivial task in the sense that an adequate tradeoff is required between the need for a small bias to avoid underfitting and the need for a small variance to avoid overfitting the points of the sample $s$ in (3.15), whose size $m$ is fixed. A rigorously proven and computationally cost-efficient approach to finding the optimal hyperparameters is lacking. Hence, a common practice to emulate the purpose of a validating subset is to let the user specify some hyperparameters and to estimate the remaining ones by, e.g., cross-validation or maximum likelihood (see, e.g., [69]).

⁷ Apart from these heuristic values, rigorously proven lower or upper bounds for the number of runs or the number of folds are lacking.

Supplementary to the error in (3.21), some authors (see, e.g., [70, p. 37]) suggest computing the squared sample Pearson correlation coefficient (SSPCC) $r^2_{\hat{y}\tilde{y}}$ with respect to the testing subset $X_{s_g}$, where $r_{\hat{y}\tilde{y}} \in [-1, 1]$. For the sake of clarity, let us partly apply the identifications

$$\hat{Y} := \hat{Q}_{\boldsymbol{\xi}}(x_i), \tag{3.29a}$$

$$\tilde{Y} := \tilde{Q}_{\boldsymbol{\xi},n}(x_i). \tag{3.29b}$$

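Since we consider solely the discrete setting, one can additionally set the sample means

$$\overline{\hat{Y}} := \frac{1}{m_g} \sum_{i=1}^{m_g} \hat{Y}_i, \tag{3.30a}$$

$$\overline{\tilde{Y}} := \frac{1}{m_g} \sum_{i=1}^{m_g} \tilde{Y}_i, \tag{3.30b}$$

where the identifications $\hat{Y}_i \equiv \hat{Y}$ and $\tilde{Y}_i \equiv \tilde{Y}$ are invoked. Moreover, we overload the covariance map $\mathrm{cov}$ and the variance map $\mathrm{var}$, which are defined for random variables in a continuous setting, to denote their sample counterparts. Hence, the coefficient $r^2_{\hat{y}\tilde{y}}$ reads as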

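$$
\begin{aligned}
r^2_{\hat{y}\tilde{y}} &:= \left( \frac{\mathrm{cov}(\hat{Y}, \tilde{Y})}{\sqrt{\mathrm{cov}(\hat{Y}, \hat{Y})\, \mathrm{cov}(\tilde{Y}, \tilde{Y})}} \right)^{2} && \text{(3.31a)} \\
&\equiv \left( \frac{\mathrm{cov}(\hat{Y}, \tilde{Y})}{\sqrt{\mathrm{var}(\hat{Y})\, \mathrm{var}(\tilde{Y})}} \right)^{2} && \text{(3.31b)} \\
&\equiv \left( \frac{\frac{1}{m_g - 1} \sum (\hat{Y} - \overline{\hat{Y}})(\tilde{Y} - \overline{\tilde{Y}})}{\sqrt{\frac{1}{m_g - 1} \sum (\hat{Y} - \overline{\hat{Y}})^2 \, \frac{1}{m_g - 1} \sum (\tilde{Y} - \overline{\tilde{Y}})^2}} \right)^{2} && \text{(3.31c)} \\
&\equiv \left( \frac{\sum (\hat{Y} - \overline{\hat{Y}})(\tilde{Y} - \overline{\tilde{Y}})}{\sqrt{\sum (\hat{Y} - \overline{\hat{Y}})^2 \sum (\tilde{Y} - \overline{\tilde{Y}})^2}} \right)^{2} && \text{(3.31d)} \\
&\equiv \left( \frac{m_g \sum \hat{Y}\tilde{Y} - \sum \hat{Y} \sum \tilde{Y}}{\sqrt{\big(m_g \sum \hat{Y}^2 - (\sum \hat{Y})^2\big)\big(m_g \sum \tilde{Y}^2 - (\sum \tilde{Y})^2\big)}} \right)^{2} && \text{(3.31e)} \\
&\overset{(3.29)}{\equiv} \left( \frac{m_g \sum_{i=1}^{m_g} \big[\hat{Q}_{\boldsymbol{\xi}}(x_i)\, \tilde{Q}_{\boldsymbol{\xi},n}(x_i)\big] - \big[\sum_{i=1}^{m_g} \hat{Q}_{\boldsymbol{\xi}}(x_i)\big]\big[\sum_{i=1}^{m_g} \tilde{Q}_{\boldsymbol{\xi},n}(x_i)\big]}{\sqrt{\Big(m_g \sum_{i=1}^{m_g} \big[\hat{Q}_{\boldsymbol{\xi}}(x_i)\big]^2 - \big[\sum_{i=1}^{m_g} \hat{Q}_{\boldsymbol{\xi}}(x_i)\big]^2\Big)\Big(m_g \sum_{i=1}^{m_g} \big[\tilde{Q}_{\boldsymbol{\xi},n}(x_i)\big]^2 - \big[\sum_{i=1}^{m_g} \tilde{Q}_{\boldsymbol{\xi},n}(x_i)\big]^2\Big)}} \right)^{2} && \text{(3.31f)}
\end{aligned}
$$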

In statistics parlance, let us regard the quantity $\hat{Y}$ as an encoding of the observed values and the quantity $\tilde{Y}$ as an encoding of the predicted (or computed or simulated) values. Then the choice of the SSPCC $r^2_{\hat{y}\tilde{y}}$ in (3.31) rests on the assumption that all observed values and all predicted values are equally important. Hence, each individual data point carries a weight of one.
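The following sketch checks numerically that the centered form (3.31d) and the raw-sum form (3.31e) of the SSPCC agree, and compares both with NumPy's `corrcoef`. The function names are hypothetical. The raw-sum form is mathematically equivalent but may suffer from cancellation for large output values, which is why the centered form serves as the primary implementation here. The usage part also illustrates two remarks from the text that follows: the SSPCC is invariant under affine maps, and two points always give an SSPCC of one.

```python
import numpy as np


def sspcc(y_hat, y_tilde):
    """Squared sample Pearson correlation coefficient, centered form (3.31d)."""
    a = np.asarray(y_hat, dtype=float) - np.mean(y_hat)
    b = np.asarray(y_tilde, dtype=float) - np.mean(y_tilde)
    return float((a @ b) ** 2 / ((a @ a) * (b @ b)))


def sspcc_raw_sums(y_hat, y_tilde):
    """Same quantity via the raw-sum form (3.31e); prone to cancellation."""
    y_hat = np.asarray(y_hat, dtype=float)
    y_tilde = np.asarray(y_tilde, dtype=float)
    m_g = len(y_hat)
    num = m_g * np.sum(y_hat * y_tilde) - np.sum(y_hat) * np.sum(y_tilde)
    den = (m_g * np.sum(y_hat**2) - np.sum(y_hat) ** 2) * (
        m_g * np.sum(y_tilde**2) - np.sum(y_tilde) ** 2)
    return float(num**2 / den)


if __name__ == "__main__":
    y_obs = np.array([0.0, 1.0, 2.0, 4.0])    # hypothetical observed values
    y_pred = np.array([0.1, 0.9, 2.2, 3.8])   # hypothetical predicted values
    print(sspcc(y_obs, y_pred), sspcc_raw_sums(y_obs, y_pred))
    print(sspcc(y_obs, 3.0 * y_obs + 5.0))    # affine relation: equals one
    print(sspcc([0.0, 1.0], [5.0, 2.0]))      # two points (negative slope): equals one
```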


Geometrically speaking, consider a list of abstract points in a Euclidean space. The list of predicted values and the list of observed values can then be interpreted as the Cartesian coordinates of these abstract points, where $e_{\tilde{Y}}$ and $e_{\hat{Y}}$ refer to the unit vectors in $\mathbb{R}^2$ with respect to $\tilde{Y}$ and $\hat{Y}$, respectively. Therefore, the SSPCC $r^2_{\hat{y}\tilde{y}}$ indicates how well the relationship between the abstract points can be described by a linear equation in $\mathbb{R}^2$. Notice that, by definition, $r^2_{\hat{y}\tilde{y}} \in [0, 1]$. If $r^2_{\hat{y}\tilde{y}} = 1$, there is a total linear correlation, i.e., $r_{\hat{y}\tilde{y}} = \pm 1$. In the case $r_{\hat{y}\tilde{y}} = 1$ (total positive linear correlation), the relationship between the abstract points can be described by the linear equation

$$\exists a \in \mathbb{R}_+.\ \exists b \in \mathbb{R}.\ \forall x_i \in X_{s_g,i}.\ \hat{Q}_{\boldsymbol{\xi}}(x_i) = a \cdot \tilde{Q}_{\boldsymbol{\xi},n}(x_i) + b. \tag{3.32}$$

Ideally, one can provide the numbers $a := 1$ and $b := 0$. However, the SSPCC $r^2_{\hat{y}\tilde{y}}$ in (3.31) cannot identify this case, because it is invariant under affine transformations $\tilde{Y} \mapsto a\tilde{Y} + b$ with $a \neq 0$.

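If $r^2_{\hat{y}\tilde{y}} = 0$, there is no linear correlation. More precisely, the least squares regression line through the abstract points has slope zero, so there are no numbers $a \in \mathbb{R}_+$ and $b \in \mathbb{R}$ such that (3.32) holds for all $x_i \in X_{s_g,i}$.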

Observe that the geometrical consideration of the SSPCC $r^2_{\hat{y}\tilde{y}}$ reveals that the number $m_g$ in (3.31) has to satisfy the condition $m_g > 2$. This translates into the requirement that there are at least three abstract points represented as members of the linear span of the unit vectors $e_{\tilde{Y}}$ and $e_{\hat{Y}}$, i.e., $\mathrm{span}(\{e_{\tilde{Y}}, e_{\hat{Y}}\})$. Otherwise, $r^2_{\hat{y}\tilde{y}}$ is immediately equal to one, since a line can always be drawn through two points with distinct coordinates.

In order to assess a low-fidelity model in a more nuanced way, one can, as mentioned above, use $r^2_{\hat{y}\tilde{y}}$ in combination with $e_{\mathcal{H},s_g}(\hat{Q}_{\boldsymbol{\xi}})$. However, if we use the SSPCC in combination with the mean generalization error determined by the $k$-fold cross-validation method, the condition $m_g > 2$ requires a minimum number of sampling points $m_{k,\min}$. This minimum depends on $k$, the number of cells of the randomly created partition: since each cell must contain at least three points, $m_{k,\min} = 3k$. Therefore, for instance, if $k := 5$, then $m_{k:=5,\min} := 15$; and if $k := 10$, then $m_{k:=10,\min} := 30$.

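In order to emphasize this kind of dependency of the SSPCC on the $k$-fold cross-validation method, let us introduce the map $r^2_{\hat{y}\tilde{y},\mathrm{cv}}$ that reads as

$$r^2_{\hat{y}\tilde{y},\mathrm{cv}} = k \mapsto r^2_{\hat{y}\tilde{y}}\big|_k := r^2_{\hat{y}\tilde{y},\mathrm{cv}}(k) : \mathbb{Z}_+ \to [0, 1], \tag{3.33}$$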

where $r^2_{\hat{y}\tilde{y}}|_k$ and $r^2_{\hat{y}\tilde{y},\mathrm{cv}}(k)$ both denote the $k$-dependent SSPCC. In analogy to $\overline{e}_{\mathcal{H},s_g}(\hat{Q}_{\boldsymbol{\xi}})$, the SSPCC in (3.31) is computed for each of the $k$ options in (3.25) and (3.26). Hence, the mean $k$-dependent SSPCC (or, in short, mean SSPCC) $\overline{r^2_{\hat{y}\tilde{y}}}|_k$ is computed as the mean of all $k$ individual SSPCCs.
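The mean $k$-dependent SSPCC can be sketched by reusing the fold construction of (3.25) and (3.26). The sketch assumes NumPy arrays and user-supplied `fit`/`predict` callables, which are illustrative names. It enforces $m \ge m_{k,\min} = 3k$ so that each testing cell contains at least three points. NumPy's `corrcoef` supplies the sample Pearson coefficient $r$, which is then squared. If the predictions on a cell are constant, `corrcoef` returns NaN; this sketch does not handle that case.

```python
import numpy as np


def mean_sspcc_cv(x, y, fit, predict, k=5, seed=0):
    """Mean k-dependent SSPCC: average of (3.31) over the k testing cells of (3.26)."""
    m = len(x)
    if m < 3 * k:  # every testing cell needs m_g > 2, hence m >= m_{k,min} = 3k
        raise ValueError(f"need at least {3 * k} sampling points for k = {k}")
    rng = np.random.default_rng(seed)
    cells = np.array_split(rng.permutation(m), k)
    scores = []
    for i in range(k):
        test = cells[i]
        train = np.concatenate([c for j, c in enumerate(cells) if j != i])
        y_pred = predict(fit(x[train], y[train]), x[test])
        r = np.corrcoef(y[test], y_pred)[0, 1]  # sample Pearson coefficient r
        scores.append(r**2)
    return float(np.mean(scores))


if __name__ == "__main__":
    xs = np.linspace(-1.0, 1.0, 15)             # m = 15 = m_{k,min} for k = 5
    ys = np.sin(2.0 * xs)                       # hypothetical high-fidelity outputs
    print(mean_sspcc_cv(xs, ys, lambda a, b: np.polyfit(a, b, 3), np.polyval))
```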

Whereas the error in (3.21) focuses on comparing the values of the high-fidelity model and the low-fidelity model, the SSPCC focuses on comparing their shapes (or landscapes). Thus, an SSPCC close to one hints at a high geometrical similarity of the corresponding shapes. Therefore, the SSPCC can be suitable as a supplementary tool to quantitatively assess the similarity of the high-fidelity model and a low-fidelity model.

Note that describing shapes by exploiting information about derivatives is a common theme in fields such as differential geometry (see, e.g., Detour 1 in § 2.1.2). Since we deploy a gradient-based interpretation of sensitivity measures (recall § 2.3.3), it is mathematically reasonable to associate the $k$-dependent SSPCC with a low-fidelity model's normalized global first-order sensitivity measures $S^N_{\tilde{y},i}(f)$, where $S^N_{\tilde{y},i}(f) \equiv S^N_i(\tilde{Q}_{\boldsymbol{\xi},n})$. Hence, I propose to normalize the $k$-dependent SSPCC $r^2_{\hat{y}\tilde{y}}|_k$, and the mean SSPCC $\overline{r^2_{\hat{y}\tilde{y}}}|_k$ as well, to the sum $\Sigma_{i=1}^{N_\xi} S^N_{\tilde{y},i}(f)$. More precisely, one can define the normalized $k$-dependent SSPCC and the normalized mean SSPCC such that

$$r^2_{\hat{y}\tilde{y}}\big|^N_k := \frac{r^2_{\hat{y}\tilde{y}}|_k}{\Sigma_{i=1}^{N_\xi} S^N_{\tilde{y},i}(f)}, \qquad \overline{r^2_{\hat{y}\tilde{y}}}\big|^N_k := \frac{\overline{r^2_{\hat{y}\tilde{y}}}|_k}{\Sigma_{i=1}^{N_\xi} S^N_{\tilde{y},i}(f)}. \tag{3.34}$$

Notice that $r^2_{\hat{y}\tilde{y}}|^N_k =_{[0,1]} r^2_{\hat{y}\tilde{y}}|_k$ and $\overline{r^2_{\hat{y}\tilde{y}}}|^N_k =_{[0,1]} \overline{r^2_{\hat{y}\tilde{y}}}|_k$. That is, the corresponding entities are equal as numbers, since the normalized sensitivity measures sum to one. However, they are conceptually different. A benefit of introducing $r^2_{\hat{y}\tilde{y}}|^N_k$, and $\overline{r^2_{\hat{y}\tilde{y}}}|^N_k$ as well, is that they highlight the connection between various information sources for assessing the shape (or landscape) of the low-fidelity model with regard to the high-fidelity model. Another benefit is that $r^2_{\hat{y}\tilde{y}}|^N_k$ and $\overline{r^2_{\hat{y}\tilde{y}}}|^N_k$ hint at how trustworthy the normalized global first-order sensitivity measures of a low-fidelity model, $S^N_{\tilde{y},i}(f)$, are as proxies for those of the high-fidelity model, $S^N_{\hat{y},i}(f)$, where $S^N_{\hat{y},i}(f) \equiv S^N_i(\hat{Q}_{\boldsymbol{\xi}})$. This conjecture can be stated more formally:

Conjecture (Trustworthiness of low-fidelity models' normalized global first-order sensitivity measures). Given $k$ and $m$ such that $m > m_{k,\min}$, there exist some low-fidelity models such that⁸

$$\forall i \in \{1, \ldots, N_\xi\}.\ S^N_{\tilde{y},i}(f) \to S^N_{\hat{y},i}(f) \text{ as } m \to \infty \implies r^2_{\hat{y}\tilde{y}}\big|^N_k \to 1 \text{ as } m \to \infty. \tag{3.35}$$

Remark 3.1.6. If and only if

$$\forall i \in \{1, \ldots, N_\xi\}.\ S^N_{\tilde{y},i}(f) = S^N_{\hat{y},i}(f) \tag{3.36}$$

holds, a low-fidelity model's sensitivity measures are considered totally trustworthy proxies for a high-fidelity model's sensitivity measures.

The contrapositive of the statement in (3.35) emphasizes the following: if $r^2_{\hat{y}\tilde{y}}|^N_k$ does not converge asymptotically to one as $m$ tends to infinity, then one cannot expect the low-fidelity model's sensitivity measures to be trustworthy proxies at all. To the best of my knowledge, a thorough formal investigation of this conjecture is lacking, and such an investigation is beyond the scope of the present work. The conjecture should rather be understood as an attempt to formally record an accumulation of experimental observations than as a logical inference from a bundle of theoretical insights.

In § 3.2, I present a small data-driven investigation of the statement in (3.35) by means of numerical experiments with regard to the test functions in Table 2.1. Notice, though, that the sample size is limited in this investigation. Hence, it does not examine the asymptotic behavior but, primarily, the pre-asymptotic behavior, which is of particular interest from an application-driven viewpoint.

⁸ It is implicitly supposed that the limit considerations are with regard to some appropriate norms.

In practical applications, the statement in (3.35) motivates introducing, for each $i \in \{1, \ldots, N_\xi\}$, a low-fidelity model's normalized global first-order sensitivity measures (LFSM) error $e_{m_j}(S^N_{\tilde{y},i})$ that reads as

$$e_{m_j}(S^N_{\tilde{y},i}) := \frac{S^N_{i,m_j}(f) - S^N_{i,m_{j-1}}(f)}{S^N_{i,m_j}(f)}, \tag{3.37}$$

where $m_j$ and $m_{j-1}$ denote sample sizes such that $m_j > m_{j-1}$. The errors in (3.37) track both the size and the orientation (sign) of the discrepancy between a low-fidelity model's normalized global first-order sensitivity measures for a sample size $m_j$ and for a sample size $m_{j-1}$. The notation $e_{m_\infty}(S^N_{\tilde{y},i})$ refers to the situation where a user provides a reference value, e.g., from an analytical calculation.
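The LFSM error (3.37) is a signed relative change, which a few lines of plain Python can evaluate. The sketch assumes that the normalized measures for the two sample sizes are given as equally long lists ordered by input index $i$. The function name and the numbers in the usage part are hypothetical. A positive entry means that the measure grew when the sample size increased from $m_{j-1}$ to $m_j$.

```python
def lfsm_errors(s_curr, s_prev):
    """Signed relative change (3.37) of the normalized first-order sensitivity
    measures between sample sizes m_j (s_curr) and m_{j-1} (s_prev), one per input i."""
    if len(s_curr) != len(s_prev):
        raise ValueError("both lists must contain N_xi measures")
    return [(c - p) / c for c, p in zip(s_curr, s_prev)]


if __name__ == "__main__":
    # Hypothetical measures for N_xi = 3 inputs at two sample sizes m_{j-1} < m_j.
    print(lfsm_errors([0.6, 0.25, 0.15], [0.5, 0.3, 0.2]))
```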

3.1.2 Deterministic and probabilistic data-fit low-fidelity models

I only cover a small part of the vast territory of available deterministic and probabilistic surrogate models. For more examples from the zoo of surrogates, I refer to, e.g., [61] and [91] and the references therein. Additionally, I recommend the new, still evolving Julia package Surrogates.jl (see [23]), which is part of the larger open-source software project SciML: Scientific Machine Learning (see https://sciml.ai/).

The choice of surrogate models, as well as the respective terminologies and technicalities, partly reflects a bias towards mesh-free and mesh-based numerical models (see, e.g., § 2.2), although these surrogate models are also connected to data-driven statistical models. All surrogate models described in this subsection are essentially representable as an expansion of basis functions, as shown in (3.7).

Multivariate polynomials

In the previous subsection 3.1.1, we encountered the prototypical hypothesis space $P_{\le n}$. It is the set of all univariate algebraic polynomials of degree at most $n$ on an interval $X \subset \mathbb{R}$, equipped with an $\mathbb{R}$-vector space structure and the finite monomial basis $B \subseteq P_{\le n}$ that reads as

$$B := \{1, x^1, \ldots, x^{n-1}, x^n\} \tag{3.38}$$

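such that $P_{\le n}$ can be regarded as the linear span of $B$, i.e., $P_{\le n} \equiv \mathrm{span}(B)$. A member of the space $P_{\le n}$ is a univariate polynomial $p = x \mapsto p(x) : X \to \mathbb{R}$, which can be written as

$$p(x) := c_0 x^0 + c_1 x^1 + \cdots + c_{n-1} x^{n-1} + c_n x^n \equiv \sum_{i=0}^{n} c_i x^i, \tag{3.39}$$

where $x^0 \equiv 1$, $n \in \mathbb{N}$, and the coefficients are $c_i \in \mathbb{R}$ with $i \in \{0, \ldots, n\}$; if $c_n \neq 0$, then $p$ has degree exactly $n$. Using the generic representation in (3.7), one can write (3.38) as

$$B := \{\tilde{\phi}_i(x) \equiv x^i \mid i \in \{0, 1, \ldots, n-1, n\}\}, \tag{3.40}$$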


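and one can define the linear span of $B$ as⁹

$$\mathrm{span}(B) := \Big\{ \sum_{i=0}^{n} \tilde{c}_i \cdot_{\mathbb{R}} \tilde{\phi}_i(x) \;\Big|\; n \in \mathbb{N} \wedge \tilde{\phi}_i(x) \in B \wedge \tilde{c}_i \in \mathbb{R} \Big\}. \tag{3.41}$$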

Following [201, p. 154ff], let us use the tensor product construction to build a space of multivariate polynomials. The underlying principle of the construction is to express a $d$-variate polynomial with the arguments $(x_1, \ldots, x_d) \in X \subseteq \mathbb{R}^d$ as a combination of $d$ univariate polynomials such as in (3.39).

Let $N := (n_1, \ldots, n_d) \in \mathbb{N}_0^d$ be the ordered set of the degrees of the $d$ univariate polynomials. Furthermore, let $I := (i_1, \ldots, i_d) \in \mathbb{N}_0^d$ denote an ordered set of $d$ indices, more precisely, a multi-index of $d$ members. Let $|I| := i_1 + \cdots + i_d$ designate the degree of the multi-index $I$, or the total degree of the monomial $x^I$, which can be written as

$$x^I := x_1^{i_1} x_2^{i_2} \cdots x_{d-1}^{i_{d-1}} x_d^{i_d}, \tag{3.42}$$

where $x^{(0,\ldots,0)} \equiv x_1^0 \cdots x_d^0$ and $x^{(0,\ldots,0)} \equiv 1$. Then, one can write a multivariate polynomial $p = x \mapsto p(x) : X \to \mathbb{R}$ as

$$p(x) := \sum_{I \le N} c_I x^I, \tag{3.43}$$

where the coefficients $c_I \in \mathbb{R}$ are scalars and $I \le N$ encodes $(i_1 \le n_1, \ldots, i_d \le n_d)$. The maximal degree of the monomial $x^I$ in (3.42) can be expressed as $\max\{I\} \le N$, which encodes $(\max\{i_1\} \le n_1, \ldots, \max\{i_d\} \le n_d)$. The degree $\deg(p)$ of a multivariate polynomial can be defined as

$$\deg(p) := \max\{|I| \mid c_I \neq 0\}. \tag{3.44}$$

Finally, the space of $d$-variate polynomials of total degree at most $k$ can be expressed as the direct sum of tensor products of $d$ spaces of univariate polynomials:¹⁰

$$P^d_{\le k} := \bigoplus_{|I| \le k} P_{i_1} \otimes \cdots \otimes P_{i_d}. \tag{3.45}$$

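Let us restrict ourselves to the case $N \equiv (k, \ldots, k)$ with $k \in \mathbb{N}_0$, that is, each $x_i$ with $i \in \{1, \ldots, d\}$ is associated with a univariate polynomial of degree $k$. Then, a basis of the space $P^d_{\le k}$ can be written as

$$
\begin{aligned}
\bigotimes_{i=1}^{d} B_i &:= B_1 \otimes \cdots \otimes B_d && \text{(3.46a)} \\
&:= \{\tilde{\phi}_{1 i_1}(x) \otimes \cdots \otimes \tilde{\phi}_{d i_d}(x) \equiv x^{i_1} \otimes \cdots \otimes x^{i_d} \mid I \in \{0, 1, \ldots, k\}^d \wedge |I| \le k\} && \text{(3.46b)} \\
&:= \{\tilde{\phi}_{1 i_1}(x_1) \cdots \tilde{\phi}_{d i_d}(x_d) \equiv x^I \mid I \in \{0, 1, \ldots, k\}^d \wedge |I| \le k\} && \text{(3.46c)} \\
&=: B^{\otimes d}, && \text{(3.46d)}
\end{aligned}
$$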

⁹ In $\mathrm{span}(B)$, technically, the object under the given predicate allows one to meaningfully state the predicate $p(x) \in P_{\le n}$, where $P_{\le n} \equiv \mathrm{span}(B)$. In order to meaningfully state the predicate $p \in P_{\le n}$ with $p = x \mapsto p(x) : X \to \mathbb{R}$, one should understand the object in $\mathrm{span}(B)$ as a shorthand notation for $x \mapsto \sum_{i=0}^{n} \tilde{c}_i \cdot_{\mathbb{R}} \tilde{\phi}_i(x) : X \to \mathbb{R}$.

¹⁰ Generally, constructing multivariate functions as tensor products of univariate functions involves many subtleties, such as quotient spaces or the universal property. For some subtleties of such constructions, see, e.g., [201, p. 45–52].


where $B_i$ is the basis $B$ in (3.40) with respect to $x_i$. Given the basis in (3.46), one can express the dimension of the space $P^d_{\le k}$ via the binomial coefficient $\binom{k+d}{d}$ such that

$$\dim(P^d_{\le k}) \equiv \binom{k+d}{d}. \tag{3.47}$$

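For example, the dimension of the space $P^2_{\le 2}$ is $\dim(P^2_{\le 2}) = 6$, and a basis of this space can be written as

$$
\begin{aligned}
\bigotimes_{i=1}^{2} B_i &:= B_1 \otimes B_2 && \text{(3.48a)} \\
&:= \{\tilde{\phi}_{1i}(x) \otimes \tilde{\phi}_{2j}(x) \equiv x^i \otimes x^j \mid (i, j) \in \{0, 1, 2\}^2 \wedge i + j \le 2\} && \text{(3.48b)} \\
&:= \{\tilde{\phi}_{1i}(x_1)\, \tilde{\phi}_{2j}(x_2) \equiv x_1^i x_2^j \mid (i, j) \in \{0, 1, 2\}^2 \wedge i + j \le 2\} && \text{(3.48c)} \\
&=: B^{\otimes 2}, && \text{(3.48d)}
\end{aligned}
$$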

where $B_1$ and $B_2$ are the basis $B$ in (3.40) for $x_1$ and $x_2$, respectively. A polynomial $p \in \mathrm{span}(B^{\otimes 2})$ can be represented as

$$p(x) := c_{(0,0)} + c_{(1,0)} x_1 + c_{(0,1)} x_2 + c_{(2,0)} x_1^2 + c_{(0,2)} x_2^2 + c_{(1,1)} x_1 x_2, \tag{3.49}$$

where the chosen ordering of the multi-index $I$ respects the degree $|I|$.

Occasionally, the basis associated with the space $P^d_{\le k}$ is called a complete polynomial basis. The basis associated with the space $P^d_k$, that is, the space of $d$-variate polynomials of maximal degree at most $k$, is called a tensor product polynomial basis. The space $P^d_k$ is constructed by applying the condition $\max\{I\} \le N$ in (3.45). Its dimension can be stated as

$$\dim(P^d_k) \equiv (1 + k)^d. \tag{3.50}$$

In Table 3.1, $\dim(P^d_{\le k})$ and $\dim(P^d_k)$ are listed for some pairs $(k, d) \in \mathbb{N}^2$.

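TABLE 3.1: Given some pairs $(k, d) \in \mathbb{N}^2$, the dimension of $P^d_{\le k}$ and $P^d_k$.

| $(k,d)$ | $\dim(P^d_{\le k})$ | $\dim(P^d_k)$ |
|---|---|---|
| (2,2) | 6 | 9 |
| (2,3) | 10 | 27 |
| (2,4) | 15 | 81 |
| (2,5) | 21 | 243 |
| (3,2) | 10 | 16 |
| (3,3) | 20 | 64 |
| (3,4) | 35 | 256 |
| (3,5) | 56 | 1024 |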

For a fixed integer $k > 0$ and a fixed integer $d > 1$, one has

$$\dim(P^d_{\le k}) < \dim(P^d_k). \tag{3.51}$$

For a given integer pair $(k, d)$, the computation time associated with the polynomials in the space $P^d_{\le k}$ is, empirically, frequently lower than for polynomials in the space $P^d_k$, while the approximation quality or accuracy is only slightly lower.
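The dimension formulas (3.47) and (3.50), and thereby Table 3.1, can be cross-checked by enumerating multi-indices directly. The following sketch uses only the Python standard library; the helper name `multi_indices` is hypothetical. It counts the multi-indices $I \in \{0, \ldots, k\}^d$ with $|I| \le k$ (complete basis) and without that restriction (tensor product basis). The loop prints the rows of Table 3.1.

```python
from itertools import product
from math import comb


def multi_indices(k, d, total=True):
    """Multi-indices I in {0, ..., k}^d. With total=True only |I| <= k is kept
    (space P^d_{<=k}); otherwise all of them are kept (space P^d_k, max{I} <= k)."""
    return [I for I in product(range(k + 1), repeat=d) if not total or sum(I) <= k]


# Reproduce the rows of Table 3.1 and check them against (3.47) and (3.50).
for k, d in [(2, 2), (2, 3), (2, 4), (2, 5), (3, 2), (3, 3), (3, 4), (3, 5)]:
    n_total = len(multi_indices(k, d))
    n_tensor = len(multi_indices(k, d, total=False))
    assert n_total == comb(k + d, d)    # (3.47)
    assert n_tensor == (k + 1) ** d     # (3.50)
    print((k, d), n_total, n_tensor)
```

For $(k, d) = (2, 2)$, `multi_indices(2, 2)` returns exactly the six exponent pairs of the basis (3.48c).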

Recalling the test functions in Table 2.1, one cannot expect a globally sufficiently accurate approximation of those test functions that include periodic parts. From an optimization point of view, however, one can expect a locally sufficiently accurate approximation in the neighborhood of the global optimum. The rationale behind these expectations is rooted in the relationship between the approximation quality of the polynomials in the space $P^d_{\le k}$ and the approximation quality of the $d$-variate Taylor-kind polynomials of degree $k$.

Admittedly, I do not elaborate on this relationship. It does, however, motivate an important special case regarding low-fidelity models, namely the space of $d$-variate polynomials of total degree at most two. The corresponding polynomials $p \in P^d_{\le 2}$ are called response surfaces (see, e.g., [61, p. 27]).

A $d$-variate polynomial of degree at most two can be presented in the form

$$p(x) := c_{(0,\ldots,0)} + \sum_{i=1}^{d} e_i x_i + \sum_{i=1}^{d} \sum_{j=1}^{d} a_{i,j} x_i x_j, \tag{3.52}$$

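where $e_i \in \mathbb{R}$ and $a_{i,j} \in \mathbb{R}$ are scalars. Invoking Householder's notation for matrix operations (see, e.g., [99, p. 1ff]) and given $p : \mathbb{R}^{d \times 1} \to \mathbb{R}$, we can write (3.52) as

$$p(x) := \beta_0 + e^T x + x^T A x, \tag{3.53}$$

where the scalar $\beta_0 \in \mathbb{R}$ corresponds to $c_{(0,\ldots,0)}$, and $x := [x_i] \in \mathbb{R}^{d \times 1}$ and $e := [e_i] \in \mathbb{R}^{d \times 1}$ represent column vectors (or column matrices). Furthermore, $A := [a_{i,j}] \in \mathbb{R}^d \times \mathbb{R}^d$, with $\mathbb{R}^d \times \mathbb{R}^d \cong \mathbb{R}^{d \times d}$, represents a square matrix.¹¹ By introducing the maps $l_e = x \mapsto e^T x : \mathbb{R}^{d \times 1} \to \mathbb{R}$ and $q_A = x \mapsto x^T A x : \mathbb{R}^{d \times 1} \to \mathbb{R}$, a map-oriented presentation of (3.53) can be achieved:

$$p(x) := \beta_0 + l_e(x) + q_A(x). \tag{3.54}$$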
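A quick numerical check confirms that the explicit double sum (3.52) and the matrix form (3.53) describe the same response surface. The sketch uses randomly drawn, hypothetical coefficients with $d = 3$. It also illustrates a standard fact that is not stated in the text above: only the symmetric part $(A + A^T)/2$ contributes to the quadratic form $q_A$. Hence $A$ may be assumed symmetric without loss of generality.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
beta0 = 0.7                   # hypothetical constant term c_(0,...,0)
e = rng.normal(size=d)        # hypothetical linear coefficients e_i
A = rng.normal(size=(d, d))   # hypothetical quadratic coefficients a_ij (not symmetric)
x = rng.normal(size=d)

# (3.52): explicit sums over i and over (i, j).
p_sum = beta0 + sum(e[i] * x[i] for i in range(d)) + sum(
    A[i, j] * x[i] * x[j] for i in range(d) for j in range(d))

# (3.53): matrix form beta0 + e^T x + x^T A x.
p_mat = beta0 + e @ x + x @ A @ x
assert np.isclose(p_sum, p_mat)

# Only the symmetric part of A contributes to the quadratic form q_A.
A_sym = 0.5 * (A + A.T)
assert np.isclose(x @ A @ x, x @ A_sym @ x)
print(p_sum, p_mat)
```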

A possible matrix representation of the basis in (3.48c) is the column vector $\tilde{b} \in \mathbb{R}^{6 \times 1}$ with

$$\tilde{b} := [1 \;\; x_1 \;\; x_2 \;\; x_1^2 \;\; x_2^2 \;\; x_1 x_2]^T. \tag{3.55}$$

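Given the order of the components in (3.55), the authors in [61, p. 133] present a general construction rule for obtaining the components of a column vector $\tilde{b} \in \mathbb{R}^{s \times 1}$ with $s := d(d+3)/2 + 1$ (cf. (3.47)) that represents the basis of a $d$-variate polynomial of degree at most two:

$$\tilde{b} := [\tilde{b}_0 \;\; \tilde{b}_1 \;\; \ldots \;\; \tilde{b}_d \;\; \tilde{b}_{d+1} \;\; \ldots \;\; \tilde{b}_{2d} \;\; \tilde{b}_{2d+1} \;\; \ldots \;\; \tilde{b}_{s-1}]^T, \tag{3.56}$$

where $\tilde{b}_0 = 1$, $\tilde{b}_1 = x_1$, $\tilde{b}_d = x_d$, $\tilde{b}_{d+1} = x_1^2$, $\tilde{b}_{2d} = x_d^2$, $\tilde{b}_{2d+1} = x_1 x_2$, and $\tilde{b}_{s-1} = x_{d-1} x_d$.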

Hence, if we introduce a column vector $\tilde{c} \in \mathbb{R}^{s \times 1}$ that contains the coefficients with regard to the components of $\tilde{b}$, one can reformulate (3.53) as

$$p(x) := \tilde{b}^T \tilde{c}. \tag{3.57}$$

Assuming a sample $s$ such as in (3.13), one can employ the corresponding sampling plan points in (3.57). Thus, one can define a column vector $y \in \mathbb{R}^{m \times 1}$ whose components are the output points $y_i$. One can also succinctly define a matrix $B \in \mathbb{R}^m \times \mathbb{R}^s$ with respect to the sampling plan points $x_i$:

$$B := \begin{bmatrix} \tilde{b}(x_1)^T \\ \vdots \\ \tilde{b}(x_m)^T \end{bmatrix}. \tag{3.58}$$

¹¹ Technically, representing a vector $y \in \mathbb{R}^d$ as a column vector $y \in \mathbb{R}^{d \times 1}$ and a $1 \times 1$ matrix $\gamma \in \mathbb{R}^{1 \times 1}$ as a scalar $\gamma \in \mathbb{R}$ involves isomorphisms, namely $\mathbb{R}^d \cong \mathbb{R}^{d \times 1}$ and $\mathbb{R} \cong \mathbb{R}^{1 \times 1}$.

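Written out in full, the matrix in (3.58) reads

$$B := \begin{bmatrix}
1 & x_{11} & \cdots & x_{1d} & x_{11}^2 & \cdots & x_{1d}^2 & x_{11} x_{12} & \cdots & x_{1,d-1} x_{1d} \\
1 & x_{21} & \cdots & x_{2d} & x_{21}^2 & \cdots & x_{2d}^2 & x_{21} x_{22} & \cdots & x_{2,d-1} x_{2d} \\
\vdots & \vdots & & \vdots & \vdots & & \vdots & \vdots & & \vdots \\
1 & x_{m1} & \cdots & x_{md} & x_{m1}^2 & \cdots & x_{md}^2 & x_{m1} x_{m2} & \cdots & x_{m,d-1} x_{md}
\end{bmatrix}, \tag{3.59}$$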
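The following sketch assembles the basis vector (3.56) and the matrix (3.59) for an arbitrary number of inputs $d$. The text fixes only the first and last cross terms ($x_1 x_2$ and $x_{d-1} x_d$). The lexicographic order of the intermediate cross terms produced by `itertools.combinations` is an assumption made here for illustration. The helper names and the three sample points are hypothetical.

```python
from itertools import combinations

import numpy as np


def quadratic_basis(x):
    """Components of b~ in the order of (3.56): 1, x_1..x_d, x_1^2..x_d^2, then the
    cross terms x_i x_j (i < j) in lexicographic order (assumed ordering)."""
    x = np.asarray(x, dtype=float)
    cross = [x[i] * x[j] for i, j in combinations(range(len(x)), 2)]
    return np.concatenate(([1.0], x, x**2, cross))


def design_matrix(X):
    """Matrix B of (3.59): one row b~(x_i)^T per sampling plan point x_i."""
    return np.vstack([quadratic_basis(xi) for xi in X])


if __name__ == "__main__":
    X = np.array([[0.1, 0.2], [0.3, 0.5], [0.9, 0.4]])  # hypothetical plan, d = 2
    B = design_matrix(X)
    print(B.shape)  # (3, 6), i.e., s = d(d+3)/2 + 1 = 6 columns for d = 2
```

For $d = 2$, each row reproduces the ordering of (3.55).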

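where $x_{md}$ denotes the $d$-th coordinate of the $m$-th sampling plan point. Using the matrix $B$, one can define a function $h = \tilde{c} \mapsto B\tilde{c} : \mathbb{R}^s \to \mathbb{R}^m$ and state an inverse problem (cf. (3.1c) in § 3.1.1) that reads as

$$\text{given } B \in \mathbb{R}^m \times \mathbb{R}^s \text{ and } y \in \mathbb{R}^{m \times 1}, \text{ find } \tilde{c} \in \mathbb{R}^{s \times 1} \text{ such that } B\tilde{c} = y. \tag{3.60}$$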

If the condition $m = s$ holds and $\mathrm{rank}(B)$ is full, the left and the right inverse of the matrix $B$ exist.¹² Hence, there is a unique solution $\tilde{c} \equiv B^{-1} y$, which determines the coefficients by interpolation (cf. (3.18) in § 3.1.1).

In the applications of the present work, however, $m > s$ usually holds. This leads to an overdetermined system of linear equations in (3.60) for which $B^{-1}$ does not exist. In this case, the role of the inverse is best taken over by the pseudoinverse. For more details on the properties of the pseudoinverse, I refer to, e.g., [200, p. 618f].

Given the space-filling and non-collapsing properties of a sampling plan, it is reasonable to assume that $B$ has full column rank, so that $(B^TB)^{-1}$ exists. Then the pseudoinverse $B^+ \in \mathbb{R}^{s\times m}$ can be stated as

$$B^+ := (B^TB)^{-1}B^T, \tag{3.61}$$

which, in practice, is computed in a numerically stable way via the singular value decomposition of $B$ rather than by forming $B^TB$ explicitly.

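Using the pseudoinverse $B^+$ reformulates the problem in (3.60) in terms of the projection matrix $P_B \in \mathbb{R}^{m\times m}$ with $P_B := BB^+$, where $\operatorname{tr}(P_B) \equiv \operatorname{rank}(B)$. This allows one to define a column vector $\hat{y} \in \mathbb{R}^{m\times 1}$ via $\hat{y} := P_B y$.13 The corresponding coefficient column vector $\hat{c} \in \mathbb{R}^{s\times 1}$ with

$$\hat{c} := B^+y \tag{3.62}$$

is the best solution in the sense of a linear multiple regression by the least squares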

is the best solution in the sense of a linear multiple regression by the least squares

12 Let us understand the rank of the $m\times s$ matrix $B$ in the sense that $\operatorname{rank}(B) \le \min\{m, s\}$. If $m = s$ and $\operatorname{rank}(B) = m$, then we say that $B$ has full rank.

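13 Let us understand the trace of a square matrix $A \in \mathbb{R}^{n\times n}$ as the map $\operatorname{tr} : \mathbb{R}^{n\times n} \to \mathbb{R}$ with $\operatorname{tr}(A) := \sum_{i=1}^{n} a_{i,i}$.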

64 Chapter 3. Surrogate optimization

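method. Recalling (3.19), the best solution $\hat{c} \in \mathbb{R}^{s\times 1}$ is associated with the optimization problem

$$\underset{\tilde{c} \in \mathbb{R}^{s\times 1}}{\text{minimize}}\ \ \operatorname{rss}(\tilde{c}) := \frac{1}{2}(y - B\tilde{c})^T(y - B\tilde{c}), \tag{3.63}$$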

where the objective function $\operatorname{rss} : \mathbb{R}^{s\times 1} \to \mathbb{R}$ is sometimes called the residual sum-of-squares function (see, e.g., [91, p. 30]). Here, $r \in \mathbb{R}^{m\times 1}$ designates the residual column vector (abbreviated to residual) with $r := y - B\tilde{c}$.
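A small sketch in the Julia PL may help to connect (3.58) with (3.63) for $d = 2$. It assumes the basis ordering of (3.55) and uses hypothetical helper names. It relies on `pinv` and the backslash operator from the LinearAlgebra standard library: `pinv` is SVD-based, whereas `B \ y` solves the least squares problem via a QR factorization.

```julia
using LinearAlgebra

# Minimal sketch for d = 2: assemble B and compute ĉ = B⁺y, cf. (3.58)-(3.62).
# All function names are illustrative; the rows of X are the sampling plan points x_i.
quad_row(x1, x2) = [1.0, x1, x2, x1^2, x2^2, x1 * x2]   # b̃(x)ᵀ in the ordering of (3.55)

function design_matrix(X::AbstractMatrix)
    m = size(X, 1)
    B = Matrix{Float64}(undef, m, 6)
    for i in 1:m
        B[i, :] = quad_row(X[i, 1], X[i, 2])
    end
    return B
end

# pinv computes the pseudoinverse via the SVD.
fit_quadratic(X, y) = pinv(design_matrix(X)) * y

# Orthogonal projection onto the column space of B, P_B = BB⁺, so that ŷ = P_B y.
projection(B) = B * pinv(B)
```

For exact quadratic data, both routes should recover the generating coefficients up to rounding. For a sampling plan with full column rank, $\operatorname{tr}(P_B)$ should agree with $\operatorname{rank}(B) = 6$.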

In order to ensure the well-posedness of the optimization problem, a fruitful generalization of the basic discrete least squares $l_2$ approximation problem in (3.63) is the Tikhonov regularized weighted least squares $l_2$ approximation problem. It can be stated as

$$\underset{\tilde{c} \in \mathbb{R}^{s\times 1}}{\text{minimize}}\ \ \operatorname{rss}(\tilde{c}) := \frac{1}{2}\|y - B\tilde{c}\|_W^2 + \frac{1}{2}\|\tilde{c} - \tilde{c}_0\|_R^2, \tag{3.64}$$

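which leads to the normal equations $(B^TWB + R)\hat{c} = B^TWy + R\tilde{c}_0$ and, in turn, to the best solution

$$\hat{c} := (B^TWB + R)^{-1}(B^TWy + R\tilde{c}_0). \tag{3.65}$$

Here, the Tikhonov matrix $R \in \mathbb{R}^{s\times s}$ is a symmetric positive semi-definite diagonal matrix. The weighting matrix $W \in \mathbb{R}^{m\times m}$, i.e., the inverse of the residual variance-covariance matrix, is a symmetric positive definite diagonal matrix. The associated norms are

$$\|\cdot\|_W = v \mapsto \sqrt{v^TWv} : \mathbb{R}^{m\times 1} \to \mathbb{R}, \tag{3.66}$$

$$\|\cdot\|_R = v \mapsto \sqrt{v^TRv} : \mathbb{R}^{s\times 1} \to \mathbb{R}, \tag{3.67}$$

and $\tilde{c}_0 \in \mathbb{R}^{s\times 1}$ denotes a column vector that represents the initial guess of the best solution (cf. [201, p. 71]).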

Since the Tikhonov matrix $R$ encodes the regularization, we set $R := LL^T$ with $L \in \mathbb{R}^{s\times s}$ defined as $L := \sqrt{\lambda}\,I$. Here, $\lambda \in [0, 1[$ is the regularization parameter and $I \in \mathbb{R}^{s\times s}$ the identity matrix, i.e., $R = \lambda I$. The weighting matrix $W$ encodes the weighting of the components of the squared residual in the sense that $W := \operatorname{diag}(\sigma^{-2}, \dots, \sigma^{-2})$, where $\sigma^2$ denotes the constant conditional error variance in (3.20).14

That is, we consider all components of the squared residual as uncorrelated, which is represented by setting all off-diagonal entries of $W$ to zero. We also consider them as on a par with each other, which is represented by setting all diagonal entries of $W$ to the same positive number. If we set $W := I$ with the identity matrix $I \in \mathbb{R}^{m\times m}$ and $\lambda \equiv 0$ in the Tikhonov matrix, we recover the problem in (3.63) as a special case.
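The best solution (3.65) can be sketched in the Julia PL under the assumptions stated above, namely $R = \lambda I$ and $W = \sigma^{-2}I$. The function name and keyword arguments are illustrative. The normal equations are solved with a factorization rather than by forming the inverse explicitly.

```julia
using LinearAlgebra

# Minimal sketch of (3.65): ĉ = (BᵀWB + R)⁻¹(BᵀWy + R c̃₀) with R = λI and W = σ⁻²I.
# tikhonov_wls and its keyword names are illustrative, not a library API.
function tikhonov_wls(B::AbstractMatrix, y::AbstractVector;
                      λ::Real = 0.0, σ::Real = 1.0,
                      c0::AbstractVector = zeros(size(B, 2)))
    m = size(B, 1)
    W = Diagonal(fill(1 / σ^2, m))       # uncorrelated residuals with equal weights
    R = λ * I                            # R = LLᵀ with L = √λ I (as a UniformScaling)
    A = B' * W * B + R                   # left-hand side of the normal equations
    rhs = B' * W * y + R * c0            # right-hand side of the normal equations
    return Symmetric(A) \ rhs            # factorize and solve, no explicit inverse
end
```

With $\lambda = 0$ and any $\sigma$, the result coincides with the ordinary least squares solution, which mirrors the special case mentioned above. A large $\lambda$ pulls $\hat{c}$ towards $\tilde{c}_0$.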

Mind, though, that if the original problem in (3.63) is ill-conditioned, choosing $\lambda$ too small hardly improves the conditioning. Choosing $\lambda$ too large, in contrast, detaches the solution considerably from that of the original problem. Thus, finding an optimal regularization parameter is not a trivial task; it depends highly on the problem at hand and on the judgment of the user. Let us interpret the regularization parameter as a hyperparameter (recall Remark 3.1.4).

Given the optimal coefficients as floating-point numbers, a numerically stable

approach to evaluate the function in (3.57) is Clenshaw’s recurrence formula (see, e.g.,

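14 Given a column vector $d \in \mathbb{R}^{n\times 1}$ and a square diagonal matrix $A \in \mathbb{R}^{n\times n}$ with $\forall i, j \in \{1, 2, \dots, n\}.\ i \neq j \implies a_{i,j} := 0$, let us understand diag as the map with the signature $\mathbb{R}^{n\times 1} \to \mathbb{R}^{n\times n}$ and the assignment $\operatorname{diag}(d) := [a_{i,i} \equiv d_i]$. Note well that, in the specific context in which the map diag is used, the term $\operatorname{diag}(d_1, \dots, d_n)$ is treated as a rewriting of the term $\operatorname{diag}([d_1, \dots, d_n]^T)$.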


[171, p. 222f]). In the case of a monomial sum, it reduces to the familiar Horner's method. It exploits the inherent recurrence relation and avoids the explicit evaluation of the individual basis functions in (3.56). For a more elaborate discussion on the propagation of the rounding error in the context of polynomial evaluation, see, e.g., [161]. In Listing 3.1, I present an example implementation of Clenshaw's algorithm for the evaluation of a univariate monomial sum in the Julia PL.

LISTING 3.1: An example implementation of Clenshaw’s algorithm

for the evaluation of a univariate monomial sum in the Julia PL.

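function monomial_clenshaw_eval_1d(c::Vector{T}, x::T) where {T<:Real}
    # c = [c_0, c_1, ..., c_N] holds the coefficients of c_0 + c_1 x + ... + c_N x^N
    N = size(c, 1) - 1          # polynomial degree; Julia uses 1-based indexing
    d = zeros(T, N + 2)         # recurrence values; d[N+2] = 0 closes the recurrence
    d[N+1] = c[N+1]
    for i in N:-1:2
        d[i] = x * d[i+1] + c[i]    # one Horner (Clenshaw) step
    end
    return x * d[2] + c[1]
end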

In the multivariate case, one can apply a plain greedy approach in the sense that one invokes multiple nested hierarchical univariate monomial sum evaluations. For instance, for the space $\mathcal{P}^2_{\le 2}$ with $\dim(\mathcal{P}^2_{\le 2}) \equiv 6$, one can introduce the column vectors $\tilde{\varphi}(x_2) \in \mathbb{R}^{6\times 1}$ and $\tilde{\psi}(x_1) \in \mathbb{R}^{6\times 1}$ and the diagonal matrix $\tilde{\Sigma} \in \mathbb{R}^{6\times 6}$ such that

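$$\tilde{\varphi}(x_2) := [x_2^0\ \ x_2^0\ \ x_2^1\ \ x_2^0\ \ x_2^2\ \ x_2^1]^T, \tag{3.68a}$$

$$\tilde{\psi}(x_1) := [x_1^0\ \ x_1^1\ \ x_1^0\ \ x_1^2\ \ x_1^0\ \ x_1^1]^T, \tag{3.68b}$$

$$\tilde{\Sigma} := \operatorname{diag}(\tilde{c}_1, \tilde{c}_2, \tilde{c}_3, \tilde{c}_4, \tilde{c}_5, \tilde{c}_6), \tag{3.68c}$$

$$p(x) := \tilde{\varphi}(x_2)^T\,\tilde{\Sigma}\,\tilde{\psi}(x_1). \tag{3.68d}$$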

A display that is favorable for the application of the multiple nested hierarchical evaluations is

$$p(x) := \sum_{j=1}^{\dim(\mathcal{P}^2_{\le 2})} \tilde{c}_j\,\tilde{\varphi}_j(x_2)\,\tilde{\psi}_j(x_1), \tag{3.69}$$

where, firstly, the terms $\tilde{c}_j\tilde{\varphi}_j(x_2)$ are evaluated and, secondly, these evaluated terms are used as the coefficients for the evaluation of the terms $\tilde{\psi}_j(x_1)$. The proper generalization of Horner's method to multivariate polynomials is still an active research area (see, e.g., [130]). Furthermore, notice well that the display in (3.69) hints at the connection to the vivid research area of computationally efficient representations of multivariate functions using low-rank tensor approximation techniques (see, e.g., [206], [90]). However, in the present work, let us leave it at that adumbration.
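For the quadratic case in (3.68) and (3.69), the nested evaluation can be sketched in the Julia PL with `Base.evalpoly`. That function evaluates a univariate polynomial from its ascending coefficients by Horner's method. The sketch assumes that $\tilde{c}_1, \dots, \tilde{c}_6$ belong to the basis ordering $[1\ x_1\ x_2\ x_1^2\ x_2^2\ x_1x_2]$ of (3.55). The function names are hypothetical.

```julia
# Minimal sketch of the nested evaluation in (3.69) for p ∈ P²_{≤2}, assuming the
# coefficient ordering c̃ = [c̃₁, ..., c̃₆] for the basis [1, x₁, x₂, x₁², x₂², x₁x₂].
function nested_eval_quadratic_2d(c::AbstractVector{<:Real}, x1::Real, x2::Real)
    # Step 1: inner univariate sums in x₂, grouped by the power of x₁ they multiply.
    a0 = evalpoly(x2, (c[1], c[3], c[5]))   # multiplies x₁⁰: c̃₁ + c̃₃x₂ + c̃₅x₂²
    a1 = evalpoly(x2, (c[2], c[6]))         # multiplies x₁¹: c̃₂ + c̃₆x₂
    a2 = c[4]                               # multiplies x₁²: c̃₄
    # Step 2: the inner results act as the coefficients of a univariate sum in x₁.
    return evalpoly(x1, (a0, a1, a2))
end

# Direct evaluation p(x) = b̃ᵀc̃ for comparison, cf. (3.57).
direct_eval_quadratic_2d(c, x1, x2) = sum(c .* [1, x1, x2, x1^2, x2^2, x1 * x2])
```

Both functions should agree up to rounding. The nested variant only ever performs univariate Horner steps.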

I have argued that, due to a sampling plan's space-filling and non-collapsing properties, it is reasonable to assume the generic case in which the matrix $B^TB$ is invertible (i.e., non-singular or non-degenerate). However, this premise could be challenged. Hence, let us glance briefly at the influence of the arrangement of a


sampling plan on the condition number $\kappa(B^TB)$ with the property

$$\kappa(B^TB) \equiv (\kappa(B))^2.\,{}^{15} \tag{3.70}$$

Thus, the condition number of $B^TB$ is never smaller than the condition number of $B$, and it is much larger whenever $\kappa(B)$ is large. Let us focus on instances of multicollinearity with respect to the chosen basis. In the theoretical absence of numerical errors, one could see in such cases that the column rank of $B$ is not full.

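Observing Figure 3.1 and Figure 3.2, a situation is conceivable in which a sampling plan is constructed as, for instance,

$$X_{s,1} := \{(0.1, 0.6), (0.2, 0.3), (0.4, 0.7), (0.5, 0.8), (0.6, 0.1), (0.3, 0.2), (0.7, 0.4), (0.8, 0.5)\}. \tag{3.71}$$

$X_{s,1}$ is represented by an $\mathbb{R}^{m\times d}$ matrix with $m = 8$ and $d = 2$, and the resulting condition number is $1.06\times 10^{4}$ (see (i) in Figure 3.4). The underlying construction principle is based on the Householder reflection matrix $H \in \mathbb{R}^{d\times d}$ with

$$H := I - 2\frac{vv^T}{v^Tv},$$

where $I \in \mathbb{R}^{d\times d}$ denotes the identity matrix and $v \in \mathbb{R}^{d\times 1}$ denotes a column vector. Let us choose $v$ such that it is orthogonal to the vector $v_t := \sum_{i=1}^{d} e_i$, where $e_i$ denote the standard basis vectors of $\mathbb{R}^d$. For $d = 2$, e.g., $v = [1\ {-1}]^T$ yields $H = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$. This matrix swaps the two coordinates, so the last four points in (3.71) are the reflections of the first four.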
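The construction of $X_{s,1}$ can be reproduced with a few lines of Julia PL. This sketch assumes $v = [1\ {-1}]^T$ for $d = 2$ and the basis ordering of (3.55). It computes $\kappa(B^TB)$ and $\kappa(B)$ with `cond` from the LinearAlgebra standard library and checks (3.70); it does not assert a specific value for $\kappa(B^TB)$.

```julia
using LinearAlgebra

# Sketch of the construction behind (3.71): reflect four points with the Householder
# matrix H = I - 2vvᵀ/(vᵀv), where v ⟂ Σᵢ eᵢ (for d = 2: v = [1, -1], a coordinate swap).
householder(v::AbstractVector) = I - 2 * (v * v') / dot(v, v)

H = householder([1.0, -1.0])
P = [0.1 0.6; 0.2 0.3; 0.4 0.7; 0.5 0.8]    # first four points of X_{s,1} (rows)
Xs1 = vcat(P, P * H')                       # append the reflected points

# Quadratic design matrix in the ordering of (3.55): [1, x₁, x₂, x₁², x₂², x₁x₂].
x1, x2 = Xs1[:, 1], Xs1[:, 2]
B = hcat(ones(length(x1)), x1, x2, x1 .^ 2, x2 .^ 2, x1 .* x2)

κ_BtB = cond(B' * B)    # condition number of BᵀB, cf. (3.70)
κ_B = cond(B)           # 2-norm condition number of the rectangular B (singular values)
```

Since a column permutation of $B$ leaves its singular values unchanged, the result does not depend on how the basis functions are ordered.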

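Another illustration is a sampling plan $X_{s,2}$ such that

$$\forall i \in \{1, \dots, m\}.\ x_{i,d-1} = x_{i,d} \implies \forall i \in \{1, \dots, m\}.\ b_{i,s-1} = b_{i,2d}. \tag{3.72}$$

Since $\tilde{b}_{s-1} = x_{d-1}x_d$ and $\tilde{b}_{2d} = x_d^2$, the corresponding two columns of $B$ coincide, and $B$ does not have full column rank.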

For an example with $m = 8$ and the condition number $8.67\times 10^{49}$, which numerically indicates a singular matrix, consult (ii) in Figure 3.4. Let us compare the sampling plans $X_{s,1}$ in (3.71) and $X_{s,2}$ in (3.72) with a sampling plan $X_{s,3}$ based on the Sobol quasi-random sequence, for which the condition number is $3.23\times 10^{3}$ (see (iii) in Figure 3.4). Even for a space-filling and non-collapsing sampling plan such as the Sobol quasi-random sequence, one can observe a high condition number. This indicates that the familiar ill-conditioned behavior of a monomial basis for the univariate space $\mathcal{P}_{\le 2}$ is mimicked by the monomial basis for the space $\mathcal{P}^d_{\le 2}$.

FIGURE 3.4: Sampling plan $X_s$ in the unit square $[0, 1]^2$ [condition number $\kappa(B^TB)$]. (i) $X_{s,1}$ [$1.06\times 10^{4}$], (ii) $X_{s,2}$ [$8.57\times 10^{49}$], (iii) $X_{s,3}$ [$3.23\times 10^{3}$].

15 Let us understand the condition number for inversion of a matrix $A \in \mathbb{R}^{m\times n}$ with $m \ge n$ as the map $\kappa : \mathbb{R}^{m\times n} \to \mathbb{R}_+$ with $\kappa(A) := \|A\|_2\|A^+\|_2$. Here, $\|\cdot\|_2 : \mathbb{R}^{m\times n} \to \mathbb{R}_+$ denotes the matrix norm induced by the $l_2$-norm for vectors. Given $A$'s largest singular value $\sigma_{\max}$ and its smallest singular value $\sigma_{\min}$, one can set $\kappa(A) \equiv \sigma_{\max}/\sigma_{\min}$. Given a positive integer $k$ with $\kappa(A) \approx 10^k$, then, very roughly speaking, only about $16 - k$, i.e., $16 - \log_{10}(\kappa(A))$, significant digits of an output's accuracy remain in the double-precision floating-point format.

In Appendix A, numerical experiments are conducted with regard to a repara-

metrization using mean-centered arguments, Bernstein polynomials, and Cheby-

shev polynomials.
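As a rough illustration of why a reparametrization can help, the following Julia PL sketch compares $\kappa(B^TB)$ for the univariate monomial basis $[1\ t\ t^2]$ on the illustrative points $\{0, 0.5, 1\}$, with and without mean-centered arguments. It is a toy example under these assumptions and not one of the experiments of Appendix A.

```julia
using LinearAlgebra

# Minimal sketch of the mean-centering idea for the univariate monomial basis [1, t, t²];
# the point set is an illustrative assumption.
vandermonde2(t::AbstractVector) = hcat(ones(length(t)), t, t .^ 2)

x = [0.0, 0.5, 1.0]
B_raw = vandermonde2(x)
κ_raw = cond(B_raw' * B_raw)

xc = x .- sum(x) / length(x)          # mean-centered arguments
B_centered = vandermonde2(xc)
κ_centered = cond(B_centered' * B_centered)
```

For these points, one should obtain roughly $\kappa \approx 228$ for the raw arguments and $\kappa \approx 76$ after centering.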

In Figure 3.5, the monomial basis in (3.46), the Bernstein basis in (A.3), and the Chebyshev basis in (A.10) for the space $\mathcal{P}^1_{\le 2}$ are shown. Note that there are Julia PL packages such as MultivariatePolynomials.jl (see doi:10.5281/zenodo.3839754) that make intensive use of several language features of the Julia PL, e.g., its type system or its metaprogramming capabilities, in their definition of various kinds of polynomials. Note further that there are MATLAB® PL toolboxes such as Chebfun3 (see [90]) that are written purely in MATLAB; these toolboxes avoid using, e.g., MEX files, i.e., MATLAB executables.

Let us not dwell on language-related issues, because there is a lack of comprehensive studies (cf. § 2.3.3) that thoroughly compare, for such particular use cases, the pros and cons of each programming language. Criteria for such a comparison include, to name but a few, performance, readability, and maintainability. For example, it is difficult to determine whether potential performance differences are due to different language designs, different implementations, or both.

FIGURE 3.5: Basis under consideration for the space $\mathcal{P}^1_{\le 2}$, plotted over $[0, 1]$ in (i) and (ii) and over $[-1, 1]$ in (iii). (i) Monomial basis, (ii) Bernstein basis, (iii) Chebyshev basis.

Mind that the Chebyshev grid (see Figure A.1) is closely linked to the numerical technique of sparse grids (see, e.g., [39]). This technique is particularly useful when one deals with the space $\mathcal{P}^d_k$ and chooses high values for $k$ and $d$, such that $\dim(\mathcal{P}^d_k)$ in (3.50) is high as well. Here, the dimensionality $d$ is a curse in the sense that the computational costs in terms of memory and time grow exponentially in $d$.

Based on ideas from information-based complexity theory (see, e.g., [207]), the authors in [39] formally encode the curse of dimensionality by the complexity estimate $O(\varepsilon^{-\alpha d})$. Here, the real number $\varepsilon$ denotes a desired accuracy of an approximate solution. The real number $\alpha$ depends on the properties of the high-fidelity model and the low-fidelity model as well as on the concrete implementation. It is assumed that $0 < \varepsilon < 1$ and $0 < \alpha$, and that the big O notation refers to the worst-case time complexity estimate.

The authors in [39] provide a link between the approximation error and the com-

plexity estimate which is adapted to the high-fidelity function approximation error


in (3.6) such that

$$\|K - \tilde{K}_n\|_{Y^X} = O(n^{-r/d}) \text{ as } n \to \infty \;:\Longleftrightarrow\; \|K - \tilde{K}_n\|_{Y^X} \in O(n^{-r/d}), \tag{3.73}$$

where $r$ denotes the isotropic smoothness of the high-fidelity model and $O(n^{-r/d})$ can be interpreted as the corresponding complexity class.
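The link between (3.73) and the complexity estimate $O(\varepsilon^{-\alpha d})$ can be made explicit by a short calculation. It rests on two simplifications made here and not taken from [39]. First, the bound in (3.73) holds as $\|K - \tilde{K}_n\|_{Y^X} \le C\,n^{-r/d}$ with a constant $C > 0$. Second, the cost grows linearly in $n$:

```latex
C\,n^{-r/d} \le \varepsilon
\;\Longleftrightarrow\;
n^{r/d} \ge \frac{C}{\varepsilon}
\;\Longleftrightarrow\;
n \ge \Big(\frac{C}{\varepsilon}\Big)^{d/r} = C^{d/r}\,\varepsilon^{-d/r}.
```

Under these assumptions, $n \in O(\varepsilon^{-d/r})$ degrees of freedom suffice. This corresponds to $O(\varepsilon^{-\alpha d})$ with $\alpha = 1/r$: the exponent grows linearly in $d$, whereas a higher smoothness $r$ mitigates it.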

Even though the sparse grids technique alleviates the curse of dimensionality to some extent, this technique is not pursued in the present work. One reason is that the focus is on low-fidelity models associated with the space $\mathcal{P}^d_{\le 2}$. For a fixed $d$, the curse of dimensionality does not appear there as heavily as for the space $\mathcal{P}^d$ (cf. Table 3.1). By focusing on these spaces, we accept a potential loss of accuracy, which is also partly due to the global character of these low-fidelity models.

Radial basis functions, with their local character, are another tool to alleviate the curse of dimensionality to some extent; we encounter them next. Mind that, in favor of radial basis functions, a discussion of multivariate splines (see, e.g., [91, ch. 5.7]) is skipped. However, radial basis functions are partly related to splines (see, e.g., [62, p. 311]).

Radial basis functions

Using a radial basis function as a low-fidelity model assumes that its corresponding hypothesis space (recall Definition 3.1.1) is a reproducing kernel Hilbert space $\mathcal{H}_K$. For an elaborate treatment of reproducing kernel Hilbert spaces, I refer to the literature (see, e.g., [187], [51, ch. 2.4], [91, ch. 5.8]).

Let us regard the space $\mathcal{H}_K$ solely in the context of radial basis functions. There, a generic kernel $\varphi = (x, t) \mapsto \varphi(x, t) : X \times X \to \mathbb{R}$ is specified as a radial kernel by setting $\varphi = r \mapsto \varphi(r) : \mathbb{R}_+ \to \mathbb{R}$ with $r := \|x - t\|_{l_2}$, where $t \in X$ denotes a center point. Technically, it is tacitly assumed that the basis functions are radially symmetric on $\mathbb{R}^d$.

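By invoking a function $\varphi_x = t \mapsto \varphi(x, t) : X \to \mathbb{R}$, a member of the space $\mathcal{H}_K$ is a function $\psi = x \mapsto \psi(x) : X \to \mathbb{R}$ that can be written as

$$\psi(x) := c_1\varphi_x(r_1) + \dots + c_{n-1}\varphi_x(r_{n-1}) + c_n\varphi_x(r_n) \tag{3.74a}$$

$$\equiv \sum_{i=1}^{n} c_i\,\varphi_x(r_i), \tag{3.74b}$$

where $r_i := \|x - t_i\|_{l_2}$ with $i \in \{1, \dots, n\}$, $n \in \mathbb{N}$, and coefficients $c_i \in \mathbb{R}$.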

Analogously to (3.41), one can state that

$$\psi \in \operatorname{span}(\{\varphi_x(r_i) \mid x \in X \wedge i \in \{1, \dots, n-1, n\}\}). \tag{3.75}$$

If there is no risk of confusion, let us reduce the amount of notation by omitting the subscript of $\varphi_x$.

In Table 3.2, six different assignment definitions for a generic radial basis function $\varphi = r \mapsto \varphi(r) : \mathbb{R}_+ \to \mathbb{R}$ are provided. Their description follows the notational convention in [70, p. 46] and [116, p. 262], respectively; thus, let us call the parameter $\sigma$ the shape parameter. Furthermore, let us interpret the shape parameter as a hyperparameter (recall Remark 3.1.4). For more elaborations on smoothness properties or convergence properties of radial basis functions, I refer to, e.g., [38].

Despite the favorable smoothness and convergence properties associated with radial basis functions, the selection of an appropriate radial basis function for a task at hand is heavily problem-dependent. A heuristic approach to the selection is therefore common.

TABLE 3.2: Given a generic radial basis function $\varphi = r \mapsto \varphi(r)$ with function signature $\mathbb{R}_+ \to \mathbb{R}$, six different definitions for the assignment $\varphi(r)$.

Linear | Cubic | Thin plate spline
$r$ | $r^3$ | $r^2\log(r)$
Gaussian | Multiquadric | Inverse multiquadric
$e^{-r^2/(2\sigma^2)}$ | $(r^2 + \sigma^2)^{1/2}$ | $(r^2 + \sigma^2)^{-1/2}$

In Figure 3.6, I illustrate the six different radial basis function definitions from

Table 3.2. For those radial basis functions involving the shape parameter σ, I sketch

the members of the corresponding family where σ∈{0.2,0.4,0.6}.
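For experimentation, the six assignments of Table 3.2 translate directly into Julia PL functions. The function names are illustrative. The thin plate spline is extended continuously by $\varphi(0) := 0$. This assumption is needed because $r^2\log(r)$ evaluates to `NaN` at $r = 0$ in floating-point arithmetic.

```julia
# The six radial basis functions of Table 3.2 as plain Julia functions of r ≥ 0;
# σ is the shape parameter. Function names are illustrative.
rbf_linear(r) = r
rbf_cubic(r) = r^3
rbf_thin_plate(r) = iszero(r) ? zero(float(r)) : r^2 * log(r)   # continuous extension at r = 0
rbf_gaussian(r, σ) = exp(-r^2 / (2σ^2))
rbf_multiquadric(r, σ) = sqrt(r^2 + σ^2)
rbf_inverse_multiquadric(r, σ) = 1 / sqrt(r^2 + σ^2)

# Values on the range used in Figure 3.6, r ∈ [0, 2], for σ ∈ {0.2, 0.4, 0.6}.
rs = range(0.0, 2.0; length = 201)
gaussian_curves = [rbf_gaussian.(rs, σ) for σ in (0.2, 0.4, 0.6)]
```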

FIGURE 3.6: The six radial basis functions in Table 3.2, plotted as $\varphi(r)$ over $r \in [0, 2]$; panels (iv) to (vi) show the members with $\sigma \in \{0.2, 0.4, 0.6\}$. (i) Linear, (ii) Cubic, (iii) Thin plate spline, (iv) Gaussian, (v) Multiquadric, (vi) Inverse multiquadric.

If we identify the number of basis functions $n$ with the number of sampling plan points, that is, $n \equiv m$, one can determine the coefficients in (3.74) by interpolation (cf. (3.18)). A common choice for the center points $t_i$ is to identify them with the sampling plan points $x_i$, that is, $t_i \equiv x_i$. The number of floating-point operations for determining the coefficients is $O(n^3)$, and the storage costs are $O(n^2)$. The costs of evaluating the model at one point are $O(n)$ (see, e.g., [181], [19], [63]). The corresponding interpolation matrix has no particular sparsity pattern that the solver could exploit. Hence, the arithmetic complexity of the solve is in the same class as that of Gaussian elimination, namely $O(n^3)$.
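The interpolation approach with $t_i \equiv x_i$ can be sketched in the Julia PL as follows. The sketch assumes the Gaussian from Table 3.2 with an illustrative shape parameter $\sigma = 0.3$. It uses a dense solve, which reflects the $O(n^3)$, $O(n^2)$, and $O(n)$ costs stated above. The function names are hypothetical.

```julia
using LinearAlgebra

# Minimal sketch of RBF interpolation with centers t_i ≡ x_i (n ≡ m), cf. (3.74):
# assemble Φ_{ik} = φ(‖x_i − x_k‖₂), solve Φc = y, and evaluate ψ(x) = Σᵢ cᵢ φ(‖x − t_i‖₂).
gaussian(r; σ = 0.3) = exp(-r^2 / (2σ^2))      # illustrative shape parameter

function rbf_fit(X::AbstractMatrix, y::AbstractVector; φ = gaussian)
    m = size(X, 1)                              # rows of X are the sampling plan points
    Φ = [φ(norm(X[i, :] - X[k, :])) for i in 1:m, k in 1:m]   # O(m²) storage
    return Φ \ y                                # dense solve, O(m³) operations
end

# Evaluation at a single point x costs O(n) kernel evaluations.
rbf_eval(c, T::AbstractMatrix, x::AbstractVector; φ = gaussian) =
    sum(c[i] * φ(norm(x - T[i, :])) for i in eachindex(c))
```

The fitted model reproduces the output points at the sampling plan points up to rounding. For distinct points, the Gaussian Gram matrix is positive definite.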


If the number of basis functions becomes too large, for instance, $n \gg 1\times 10^{4}$, then this low-fidelity model becomes impractical. However, the number of sampling plan points is kept rather small, i.e., $m \ll 1\times 10^{4}$ and hence $n \ll 1\times 10^{4}$. Let us therefore not dwell on fast schemes for the evaluation of radial basis functions. For more details on this active area of research, I refer to, e.g., [181], [19], [63].

The corresponding interpolation matrix is, more suited to the given context, the Gram matrix. For the Gaussian and the inverse multiquadric radial basis functions in Table 3.2, it is positive definite, provided that the center points are pairwise distinct (see, e.g., [187], [70, p. 46]). Given the output points, a unique coefficient column vector is then guaranteed to exist. For the remaining radial basis functions in Table 3.2, the interpolation matrix is only conditionally positive (or, up to sign, negative) definite (cf. [187], [116, p. 262]).

Nevertheless, a governing trade-off principle, or uncertainty principle, applies to all radial basis functions. It states that if we increase the accuracy, e.g., by increasing the number of sampling plan points $m$ or the shape parameter $\sigma$, then the condition number grows as well; the condition number serves as an indicator of numerical stability (see, e.g., [62, ch. 16], [187]). Note that using a sampling plan such as the Sobol quasi-random sequence can have a moderately beneficial influence on the condition number (see, e.g., [34]).
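The trade-off principle can be observed numerically with a small Julia PL sketch. It assumes ten equispaced points in $[0, 1]$ and the Gaussian of Table 3.2. It computes the condition number of the Gram matrix for increasing shape parameters; the point set and the values of $\sigma$ are illustrative choices.

```julia
using LinearAlgebra

# Sketch of the trade-off (uncertainty) principle for the Gaussian kernel of Table 3.2:
# on a fixed point set, a larger shape parameter σ (a flatter kernel) yields a larger
# condition number of the Gram matrix. The 1D point set is an illustrative assumption.
gram_gaussian(x::AbstractVector, σ) = [exp(-(xi - xk)^2 / (2σ^2)) for xi in x, xk in x]

x = collect(range(0.0, 1.0; length = 10))
σs = (0.05, 0.1, 0.15, 0.2)
κ = [cond(gram_gaussian(x, σ)) for σ in σs]
```

On this point set, the condition number should increase with $\sigma$. It starts close to one for $\sigma = 0.05$ and is several orders of magnitude larger for $\sigma = 0.2$.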

Observe that if we choose $n < m$, more precisely, if we do not take every sampling plan point as a center point, then one can determine the coefficients in (3.74) by regression, analogously to (3.62). However, to the best of my knowledge, there is no complete theory that explains the optimal selection of sampling plan points as center points in the case of regression (see, e.g., [62, p. 168ff], [70, p. 49]). Radial basis functions are related to splines (see, e.g., [91, p. 36]). Accordingly, the center point selection problem is mostly solved heuristically, similarly to the knot selection problem in multivariate spline regression.
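For the regression case $n < m$, a minimal Julia PL sketch could look as follows. It assumes the Gaussian from Table 3.2 with an illustrative $\sigma = 0.1$. It selects every second sampling plan point as a center, which is merely one heuristic. The rectangular system is solved in the least squares sense by the backslash operator, analogously to (3.62).

```julia
using LinearAlgebra

# Sketch of RBF regression with n < m centers, analogous to (3.62): the m×n matrix
# Φ_{ik} = φ(‖x_i − t_k‖₂) is rectangular, and ĉ is its least squares solution.
gauss(r; σ = 0.1) = exp(-r^2 / (2σ^2))          # illustrative shape parameter

function rbf_regression(X::AbstractMatrix, y::AbstractVector, T::AbstractMatrix; φ = gauss)
    Φ = [φ(norm(X[i, :] - T[k, :])) for i in axes(X, 1), k in axes(T, 1)]
    return Φ \ y                                # QR-based least squares for a tall Φ
end
```

Usage, under the same assumptions: with `X` holding 21 points in $[0, 1]$ and `T = X[1:2:end, :]`, the call `rbf_regression(X, y, T)` returns 11 coefficients. The corresponding residual is orthogonal to the columns of $\Phi$.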

For the sake of completeness, I briefly mention two kinds of extensions of the low-fidelity model in (3.74). The literature discusses them for reasons such as ensuring well-posedness, increasing accuracy, and the like.

The first kind of extension is related to the thin plate spline and the radial powers, such as the linear or the cubic one (see Table 3.2). More precisely, the low-fidelity model $\psi$ in (3.74) is extended linearly by a $d$-variate polynomial $p$ of total degree at most one, i.e., $p \in \mathcal{P}^d_{\le 1}$, usually in a monomial basis setting. One can then represent an extended radial basis low-fidelity model $\hat{\psi}$ as

$$\hat{\psi}(x) := p(x) + \psi(x), \tag{3.76}$$

where the signature of the operation $+$ is $\mathbb{R} \times \mathbb{R} \to \mathbb{R}$. For more details on the determination of the coefficients of $\hat{\psi}(x)$ and for some applications of $\hat{\psi}(x)$ in engineering, I refer to [27] and references therein.16
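The first extension can be sketched in the Julia PL for the cubic radial basis function of Table 3.2 and $p \in \mathcal{P}^d_{\le 1}$ in a monomial basis. The sketch assumes the commonly used side condition $P^Tc = 0$, where $P$ is the matrix of the monomials $1, x_1, \dots, x_d$ at the sampling plan points. This side condition leads to the block matrix mentioned in footnote 16. The function names are hypothetical.

```julia
using LinearAlgebra

# Sketch of the extension (3.76) for the cubic kernel φ(r) = r³ with a linear polynomial
# p ∈ P^d_{≤1}: solve the block (saddle point) system
#   [Φ  P; Pᵀ  0] [c; a] = [y; 0],  P = [1  X],
# where the side condition Pᵀc = 0 makes the coefficients unique.
function cubic_rbf_linear_tail(X::AbstractMatrix, y::AbstractVector)
    m, d = size(X)
    Φ = [norm(X[i, :] - X[k, :])^3 for i in 1:m, k in 1:m]
    P = hcat(ones(m), X)                       # monomial basis 1, x_1, ..., x_d
    A = [Φ P; P' zeros(d + 1, d + 1)]          # symmetric but indefinite block matrix
    sol = A \ vcat(y, zeros(d + 1))
    return sol[1:m], sol[m+1:end]              # RBF coefficients c, polynomial coefficients a
end

function eval_extended(c, a, X, x)
    ψ = sum(c[i] * norm(x - X[i, :])^3 for i in eachindex(c))
    p = a[1] + dot(a[2:end], x)
    return p + ψ                               # ψ̂(x) = p(x) + ψ(x)
end
```

A quick plausibility check: for data from a linear function, the solution should have $c \approx 0$ and reproduce the linear function exactly up to rounding.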

The second kind of extension is related to a statistical setting (see, e.g., [61, p. 177]). Here, the low-fidelity model $\psi$ in (3.74) is extended linearly by a $d$-variate polynomial $\hat{\mu}$ of total degree zero, i.e., a constant $\hat{\mu} \in \mathcal{P}^d_{\le 0}$. One can then define an extended radial basis low-fidelity model $\hat{\psi}$ as

$$\hat{\psi}(x) := \hat{\mu}(x) + \psi(x), \tag{3.77}$$

16 Note that the determination of the coefficients of the extended radial basis low-fidelity model $\hat{\psi}$ results in a block matrix that incorporates additional constraints in order to uniquely determine all the coefficients. By considering the corresponding Schur complements, it is probably beneficial to experiment with various combinations of polynomials $p \in \mathcal{P}^d_{\le k}$ of varying degree $k$ and radial basis functions $\psi$ in order to achieve desirable properties such as a symmetric positive definite block matrix.


where the term $\hat{\mu}(x)$ can be interpreted as the estimate of the mean of the high-fidelity model. Discussing the relationship between $\psi(x)$ and $\hat{\psi}(x)$ would require dwelling on affine spaces. We forgo this since the term $\hat{\mu}(x)$ is frequently considered as incorporated in the coefficients and in the basis functions of $\psi(x)$ (see, e.g., [91, p. 11f]). Otherwise, the determination of the coefficient column vector in (3.62) has to be adapted such that

$$\hat{c} := B^+(y - \boldsymbol{\mu}_y), \tag{3.78}$$

where $B^+$ has to be customized to the corresponding radial basis function and $\boldsymbol{\mu}_y$ denotes the constant unknown mean column vector with respect to $y$. Let us postpone, though, further discussions on the expression in (3.77) to the elaboration on stochastic interpolation via kriging low-fidelity models.

Kriging

Supposing a sampling plan $X_s$ such as in (3.14), a kriging low-fidelity model can be interpreted as an extended radial basis low-fidelity model $\hat{\psi}$ such as in (3.77) (see, e.g., [70, p. 60] or [109]), that is,

$$\hat{\psi}(x) := \hat{\mu}(x) + \sum_{i=1}^{n} c_i\,\varphi_x(r_i), \tag{3.79}$$

where $n \equiv m$, i.e., all sampling plan points are utilized as center points. Following the Gaussian in Table 3.2, the assignment $\varphi_x(r_i)$ in (3.79) is commonly chosen as

$$\varphi_x(r_i) := \exp(-r_i), \tag{3.80}$$

where the radii $r_i$ are not defined via an $l_2$-norm as in (3.74), but via a weighted distance measure, such that $r_i$ can be written as

$$r_i := \sum_{j=1}^{d} \theta_j\,|x^j - x_i^j|^{p_j}. \tag{3.81}$$

Here, $x_i^j$ refers to the $j$-th component of the $i$-th sampling plan point. $d$ refers to the total number of components of the $i$-th sampling plan point, i.e., $d \equiv N_\xi$. Finally, $\theta_j \in \mathbb{R}_+$ and $p_j \in [0, 2]$ refer to parameters that have to be determined. Mind that the coefficients $c_i$ in (3.79) depend on the parameters $\theta_j$ and $p_j$ and on $\hat{\mu}(x)$. These unknown quantities are determined by means of a statistical machinery, which we sketch next.

Building upon (3.20), an approach to a kriging low-fidelity model is to consider

$$\forall i \in \{1, \dots, m\}.\ z(x_i) = y_i - \mu(x_i), \tag{3.82}$$

where $\mu$ indicates a constant unknown mean function with respect to $y$ and the residual $y_i - \mu(x_i)$ indicates a realization of a Gaussian process $z(x)$ (cf. [61, p. 146]). Note that we only consider a constant unknown mean function. This choice of $\mu$ is associated with the so-called ordinary kriging. For other kinds of kriging, I refer to [221, p. 154].

From a statistical viewpoint, $z(x)$ is associated with a random error (recall the discussion on noise in § 3.1.1). However, from an interpolation viewpoint, we do not regard any errors in the output points $y_i$ (cf. [70, p. 55]).


Therefore, the sketch is inspired mainly by the approach to a kriging low-fidelity

model presented in [108] and in [70, ch. 2.4]. For more details regarding the sketch,

I refer to, e.g., [184], [108], [61, ch. 5.4], [173, ch. 5], [70, ch. 2.4] or [116, ch. 15].

With regard to a given sampling plan $X_s$, let us encode the output points $y_i$ in matrix representation as a column vector $y \in \mathbb{R}^{m\times 1}$ with

$$y := [y_1\ \ y_2\ \dots\ y_{m-1}\ \ y_m]^T. \tag{3.83}$$

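Furthermore, with regard to a given sample, one can define the probability density of an $m$-dimensional Gaussian distribution at $y$ as

$$\mathcal{N}_m(y \mid \boldsymbol{\mu}_y, \Sigma) := \frac{1}{(2\pi)^{m/2}|\Sigma|^{1/2}}\exp\!\Big(-\frac{1}{2}(y - \boldsymbol{\mu}_y)^T\Sigma^{-1}(y - \boldsymbol{\mu}_y)\Big), \tag{3.84}$$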

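where, technically, $y$ is associated with a corresponding random vector $Y$, and $\Sigma \in \mathbb{R}^{m\times m}$ denotes the covariance matrix. $\boldsymbol{\mu}_y \in \mathbb{R}^{m\times 1}$ denotes the constant unknown mean column vector with respect to $y$. Given a scalar $\mu_y \in \mathbb{R}$, it is defined as

$$\boldsymbol{\mu}_y := \mu_y \cdot [1\ 1\ \dots\ 1\ 1]^T \tag{3.85}$$

$$\equiv \mu_y \cdot \mathbf{1}, \tag{3.86}$$

where $\mathbf{1} := [1\ 1\ \dots\ 1\ 1]^T$ with $\mathbf{1} \in \mathbb{R}^{m\times 1}$.17 Observe that $\mathcal{N}_m(y \mid \boldsymbol{\mu}_y, \Sigma)$ in (3.84) is a slight abuse of notation in order to emphasize the sample-oriented viewpoint in the present work (recall § 3.1.1).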

Consider the constant unknown variance with respect to $y$, i.e., $\sigma_y^2 \in \mathbb{R}$. It is associated with the constant unknown standard deviation w.r.t. $y$, i.e., $\sigma_y \in \mathbb{R}$, by

$$\sigma_y := \sqrt{\sigma_y^2}. \tag{3.87}$$

The covariance matrix can then be expressed by the correlation matrix $\Psi \in \mathbb{R}^{m\times m}$ such that

$$\Sigma \equiv \sigma_y^2 \cdot \Psi. \tag{3.88}$$

In the context of a kriging low-fidelity model, the entries of the correlation matrix $\Psi := [\psi_{i,l}]$ are commonly defined by the radial basis function in (3.80) such that

$$\psi_{i,l} \equiv \exp\!\Big(-\sum_{j=1}^{d} \theta_j\,|x_i^j - x_l^j|^{p_j}\Big). \tag{3.89}$$

The choice of entries in (3.89) reveals two properties. If $i = l$, then $\psi_{i,l} = 1$. As $\|x_i - x_l\|_{l_2}$ grows, $\psi_{i,l}$ tends asymptotically to zero, provided that all $\theta_j > 0$ and $p_j > 0$.
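The entries in (3.89) translate into a short Julia PL function. It is a sketch with a hypothetical name and assumes that the rows of `X` are the sampling plan points. Since $\Psi$ is symmetric, it fills only the upper triangle and mirrors it.

```julia
# Sketch of the kriging correlation matrix Ψ = [ψ_{i,l}] in (3.89); the rows of X are the
# sampling plan points, and θ and p hold θ_j and p_j. The function name is illustrative.
function correlation_matrix(X::AbstractMatrix, θ::AbstractVector, p::AbstractVector)
    m, d = size(X)
    Ψ = ones(m, m)                              # ψ_{i,i} = exp(0) = 1
    for i in 1:m, l in i+1:m
        s = sum(θ[j] * abs(X[i, j] - X[l, j])^p[j] for j in 1:d)
        Ψ[i, l] = Ψ[l, i] = exp(-s)             # symmetric by construction
    end
    return Ψ
end
```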

Instead of a parametrization by its mean vector and its covariance matrix, one can parameterize the $m$-dimensional Gaussian distribution in (3.84) by its mean vector, its variance scalar, and its correlation matrix. Then (3.84) can be rewritten as

$$\mathcal{N}_m(y \mid \boldsymbol{\mu}_y, \sigma_y^2, \Psi) := \frac{1}{(2\pi\sigma_y^2)^{m/2}|\Psi|^{1/2}}\exp\!\Big(-\frac{1}{2\sigma_y^2}(y - \boldsymbol{\mu}_y)^T\Psi^{-1}(y - \boldsymbol{\mu}_y)\Big), \tag{3.90}$$

17 Let us set $|\Sigma| := \det(\Sigma)$, where we understand the map det as the determinant of a square matrix, more precisely, $\det = \Sigma \mapsto \det(\Sigma) : \mathbb{R}^{m\times m} \to \mathbb{R}$.


where it holds that $\forall \Psi \in \mathbb{R}^{m\times m}.\ \forall \sigma_y^2 \in \mathbb{R}.\ \det(\sigma_y^2\Psi) = (\sigma_y^2)^m\det(\Psi)$.

Due to the definition of the entries of $\Psi$ in (3.89), the matrix $\Psi$ is positive definite for pairwise distinct sampling plan points, i.e., all of its eigenvalues are positive. Therefore, the matrix $\Psi$ is non-singular, and its inverse $\Psi^{-1}$ exists. Furthermore, one can discern that $|\Psi| > 0$.18

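Abstractly, one can define the likelihood function $L = \vartheta \mapsto L(\vartheta) \equiv L(y \mid \vartheta)$ with the signature $\Theta \to [0, +\infty[$ and $\vartheta := (\mu_y, \sigma_y^2)$. The values of a density may exceed one, which is why the codomain is not $[0, 1]$. The assignment $L(\vartheta)$ reads as

$$L(\vartheta) := \frac{1}{(2\pi\sigma_y^2)^{m/2}|\Psi|^{1/2}}\exp\!\Big(-\frac{1}{2\sigma_y^2}(y - \boldsymbol{\mu}_y)^T\Psi^{-1}(y - \boldsymbol{\mu}_y)\Big). \tag{3.91}$$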

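Roughly speaking: given any $y$, the aim is to find the parameter $\vartheta$ such that the likelihood of observing $y$ is maximized.19 Hence, the maximum likelihood estimate (MLE) for $\vartheta$, i.e., $\hat{\vartheta}_{\mathrm{MLE}}$ or $\hat{\vartheta}$, is characterized by

$$\hat{\vartheta} := \underset{\vartheta \in \Theta}{\operatorname{argmax}}\ L(\vartheta). \tag{3.92}$$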

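It is computationally more amenable, though, to consider the ln-likelihood function $L_{\ln} = \vartheta \mapsto \ln(L(\vartheta))$ in (3.92). It has the signature $\Theta \to \mathbb{R}$, and the assignment $L_{\ln}(\vartheta)$ reads as

$$L_{\ln}(\vartheta) := -\frac{m}{2}\ln(2\pi) - \frac{m}{2}\ln(\sigma_y^2) - \frac{1}{2}\ln(|\Psi|) - \frac{1}{2\sigma_y^2}(y - \boldsymbol{\mu}_y)^T\Psi^{-1}(y - \boldsymbol{\mu}_y), \tag{3.93}$$

where, due to the definition of the natural logarithm, one has to suppose that $\sigma_y^2 > 0$ and $|\Psi| > 0$.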

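The maximum likelihood estimates for $\mu_y$ and $\sigma_y^2$ can be described by

$$\hat{\mu}_y := \frac{\mathbf{1}^T\Psi^{-1}y}{\mathbf{1}^T\Psi^{-1}\mathbf{1}}, \tag{3.94a}$$

$$\hat{\sigma}_y^2 := \frac{1}{m}(y - \hat{\boldsymbol{\mu}}_y)^T\Psi^{-1}(y - \hat{\boldsymbol{\mu}}_y), \tag{3.94b}$$

where $\hat{\boldsymbol{\mu}}_y$ is defined analogously to (3.85), that is,

$$\hat{\boldsymbol{\mu}}_y := \hat{\mu}_y \cdot \mathbf{1}. \tag{3.95}$$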

Evaluate the ln-likelihood function in (3.93) at the estimates in (3.94) and drop those terms of the assignment $L_{\ln}(\vartheta)$ that are constant. This yields the concentrated ln-likelihood function $L_{\mathrm{cln}} = (\theta, p) \mapsto L_{\mathrm{cln}}(\theta, p)$ with the signature $[0, +\infty[^d \times [0, 2]^d \to \mathbb{R}$, whose assignment reads as

$$L_{\mathrm{cln}}(\theta, p) := -\frac{m}{2}\ln(\hat{\sigma}_y^2) - \frac{1}{2}\ln(|\Psi|). \tag{3.96}$$
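The estimates in (3.94) and the concentrated ln-likelihood in (3.96) can be evaluated without forming $\Psi^{-1}$ explicitly. The following Julia PL sketch uses a hypothetical function name and assumes a given positive definite $\Psi$. It relies on the Cholesky factorization from the LinearAlgebra standard library, which also provides $\ln|\Psi|$ via `logdet`.

```julia
using LinearAlgebra

# Sketch of (3.94) and (3.96) for a given correlation matrix Ψ, using a Cholesky
# factorization Ψ = LLᵀ instead of an explicit inverse; logdet(F) = ln|Ψ|.
function concentrated_lnlikelihood(Ψ::AbstractMatrix, y::AbstractVector)
    m = length(y)
    F = cholesky(Symmetric(Ψ))          # throws if Ψ is not numerically positive definite
    one_m = ones(m)
    mu_hat = dot(one_m, F \ y) / dot(one_m, F \ one_m)       # (3.94a)
    res = y .- mu_hat                                        # y − μ̂·1, cf. (3.95)
    sigma2_hat = dot(res, F \ res) / m                       # (3.94b)
    lnL = -m / 2 * log(sigma2_hat) - logdet(F) / 2           # (3.96)
    return lnL, mu_hat, sigma2_hat
end
```

For $\Psi = I$, the estimate $\hat{\mu}_y$ reduces to the arithmetic mean of the output points.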

In order to determine the maximum likelihood estimates of (θ,p)numerically by

utilizing a suitable optimization algorithm (recall § 2.3.3), it is common to associate

18 Given $|\Psi| := \det(\Psi)$ and the eigenvalues $\lambda_i$ of the matrix $\Psi \in \mathbb{R}^{m\times m}$, one can invoke the statement $\det(\Psi) \equiv \prod_{i=1}^{m}\lambda_i$. Hence, if the matrix $\Psi$ is positive definite, i.e., all of its eigenvalues are positive, then $\det(\Psi) > 0$ holds.

19 For a more elaborate treatment of some aspects regarding the interpretation of the likelihood function, I refer to [201, p. 29ff].


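$(\hat{\theta}, \hat{p})$ with the expression

$$(\hat{\theta}, \hat{p}) := \underset{(\theta, p) \in [0, +\infty[^d \times [0, 2]^d}{\operatorname{argmin}}\ -L_{\mathrm{cln}}(\theta, p). \tag{3.97}$$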

In [70, p. 55–58], the authors mention further aspects regarding the numerical treatment of the expression in (3.97). For instance, due to the definition of the entries of the correlation matrix $\Psi$ in (3.89), the maximum likelihood estimates $(\hat{\theta}, \hat{p})$ are sensitive to the scaling of a given sampling plan $X_s$. Therefore, it is advisable to consider the normalized sampling plan $X_s$, that is, one in the unit $d$-dimensional hypercube (recall the case $d = 2$ in Figure 3.1 and Figure 3.2).

Furthermore, it is preferable to search for $\theta$ on a logarithmic scale within a closed interval such as $\theta \in [10^{-3}, 10^{2}]^d$.

Additionally, it is common to alleviate the computational burden in (3.97) by heuristically setting $\hat{p}$ to a fixed value in advance. A usual choice is $\hat{p} = [2\ 2\ \dots\ 2\ 2]^T$ with $\hat{p} \in \mathbb{R}^{d\times 1}$.
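A crude version of (3.97) with $\hat{p} = [2\ \dots\ 2]^T$ fixed can be sketched in the Julia PL. The sketch makes simplifying assumptions that are not part of the text. A single isotropic $\theta$ is shared by all components. It is searched on a $\log_{10}$ grid over $[10^{-3}, 10^2]$ instead of being found by an optimization algorithm. A small nugget $10^{-10}I$ is added to $\Psi$ for numerical safety. The function names are hypothetical.

```julia
using LinearAlgebra

# Sketch of (3.97) with p̂ fixed to 2 and one isotropic θ on a log10 grid over [1e-3, 1e2];
# the grid search stands in for an optimizer, and the 1e-10 nugget is an assumption.
function neg_concentrated_lnlikelihood(logθ, X, y)
    θ = 10.0^logθ
    m = size(X, 1)
    Ψ = [exp(-θ * sum(abs2, X[i, :] - X[l, :])) for i in 1:m, l in 1:m] + 1e-10 * I
    F = cholesky(Symmetric(Ψ))
    o = ones(m)
    μ = dot(o, F \ y) / dot(o, F \ o)                # (3.94a)
    r = y .- μ
    σ2 = dot(r, F \ r) / m                           # (3.94b)
    return m / 2 * log(σ2) + logdet(F) / 2           # −L_cln(θ, p̂), cf. (3.96)
end

function fit_theta_grid(X, y; grid = range(-3.0, 2.0; length = 51))
    values = [neg_concentrated_lnlikelihood(g, X, y) for g in grid]
    k = argmin(values)
    return 10.0^grid[k], values[k]
end
```

In practice, a proper optimizer and one $\theta_j$ per component would replace the grid. The sketch only illustrates the logarithmic scaling of the search interval.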

It should be recalled that kriging low-fidelity models are closely related to Gaussian radial basis functions. Hence, it is reasonable to assume that they suffer from ill-conditioning issues as well. Such issues could be mitigated by simple regularization techniques (see the discussion concerning (3.65)). For instance, instead of the correlation matrix $\Psi$, one can consider the regularized correlation matrix $\Psi + \lambda I$ with $I \in \mathbb{R}^{m\times m}$ and $\lambda \in [\epsilon, 1\times 10^{-6}]$, where $\epsilon$ denotes the machine epsilon. Hence, the matrix $\Psi + \lambda I$ has to be taken into account in (3.94) and in (3.96) as well. Technically, the regularized correlation matrix implicitly supposes a regression problem as opposed to an interpolation problem. Pragmatically, this specific regularized correlation matrix is still treated within an interpolation problem, since the range of values for the hyperparameter $\lambda$ is regarded as marginal (see, e.g., [70, p. 152]).

It is also reasonable to assume that kriging low-fidelity models resemble Gaussian radial basis functions regarding the number of floating-point operations, the storage costs, and the evaluation costs. Thus, it is possible to exploit the structure of the correlation matrix $\Psi$, which is a symmetric positive definite matrix. More precisely, a matrix inversion based on the Cholesky decomposition can be performed. It requires roughly half the floating-point operations of a lower–upper (LU) decomposition.

Notice well that if we apply a singular value decomposition (SVD) to the correlation matrix $\Psi$, then we obtain a least-squares kriging regression (cf. [70, p. 152]). Determining the inverse $\Psi^{-1}$ of a non-singular matrix $\Psi$ by means of the SVD can be seen as computationally equivalent to determining the pseudoinverse $\Psi^+$ of the matrix $\Psi$ by the SVD method, such as in (3.61).

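After the determination of the maximum likelihood estimates $(\hat{\theta}, \hat{p})$, one can specify the kriging low-fidelity model in (3.79) as

$$\hat{y}(x) := \hat{\mu}_y + r^T\Psi^{-1}(y - \hat{\boldsymbol{\mu}}_y), \tag{3.98}$$

where $\hat{y} : X \to \mathbb{R}$ such that $\hat{y}(x)$ indicates the prediction20 at an arbitrary point $x$. Furthermore, $r := [r_i] \in \mathbb{R}^{m\times 1}$ denotes the correlation column vector that reads as

$$r := [r_1\ \ r_2\ \dots\ r_{m-1}\ \ r_m]^T, \tag{3.99}$$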

20 More precisely, in statistics vernacular, $\hat{y}(x)$ indicates the best linear unbiased predictor (BLUP) (see, e.g., [61, p. 146f]).


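where the $i$-th component of $r$ is the correlation between $x$ and the $i$-th sampling plan point. That is, it is $\varphi_x$ in (3.80) evaluated at the radius in (3.81), using the estimates $(\hat{\theta}, \hat{p})$. For an in-depth derivation of the maximum likelihood estimate $\hat{y}(x)$, I refer to [70, p. 59-62].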

3.1.3 Simplified-physics low-fidelity models

Simplified-physics low-fidelity models depend on a user's domain-specific knowledge regarding the mathematical description of the physics associated with the high-fidelity model and regarding the numerical software associated with the high-fidelity model (recall chapter 2).

Depending on the degree of intervention in the implementation of the numerical software associated with the high-fidelity model, low-fidelity models are intrusive or non-intrusive. Let us consider all simplified-physics low-fidelity models as non-intrusive, especially if a low-fidelity model is based on, e.g., a coarse-grid discretization, a weakened termination criterion of an iterative solver, or a combination of both.

Recalling § 3.1.1, a basic postulate concerning a high-fidelity model $K$ and a low-fidelity model $\tilde{K}$ is that $K \in Y^{X}$ and $\tilde{K} \in Y^{X}$. Unlike for deterministic and probabilistic data-fit low-fidelity models, one cannot generally provide a hypothesis space $H$. Furthermore, the computational costs and the degrees of fidelity linked to the low-fidelity models under consideration are prescribed by the user who, abstractly speaking, implicitly imposes some kind of lexicographic ordering or lexicographic preference on the class that encompasses all models.

Besides a low-fidelity model based on, e.g., a coarse-grid discretization, another example of a simplified-physics low-fidelity model is a one-dimensional, linear boundary value problem (1D-LBVP) that is, in some sense, related to a two-dimensional, linear boundary value problem (2D-LBVP). Notice well that the 2D-LBVP can itself be seen as a simplified-physics low-fidelity model which, in turn, is in some sense related to a three-dimensional, non-linear boundary value problem (3D-NLBVP).

Hence, one can construct a hierarchy of low-fidelity models in which the high-fidelity model corresponds to a 3D-NLBVP. Figure 3.7 schematically depicts such a possible user-prescribed hierarchy.²¹

Invoking Figure 2.1a, Figure 3.8 concretizes Figure 3.7 with regard to a user-prescribed hierarchy of magnetoquasistatic and magnetostatic problems, respectively.

In (i) of Figure 3.8, a single conducting subdomain is depicted as a common representative of the domain of application of a magnetoquasistatic subsystem. In order to emphasize the subdomain's three-dimensionality, let us use the superscript 3D.

In (ii) of Figure 3.8, a single conducting subdomain exhibiting two-dimensionality (2D) is shown. The cross-sectional area indicated by Ωnc can be regarded as topologically equivalent to the closed 2-ball, which is topologically equivalent to the closed unit 2-cube $[0,1]^2$. Hence, the boundary ∂Ωnc of the cross-sectional area can be seen as topologically equivalent to the 1-sphere, which is topologically equivalent to the boundary of the closed unit 2-cube $[0,1]^2$. In applications, assuming an appropriate metric structure, a round and a rectangular cross-sectional area are commonly used for geometrically modeling a round conductor and a foil conductor, respectively. These conductor kinds are usually the building blocks of inductive component windings of varying complexity. For instance, the round conductor constitutes a basic building block of a litz wire winding (see, e.g., [154, pp. 110-113]).

²¹ Figure 3.7 is partly inspired by the depictions in [166].

76 Chapter 3. Surrogate optimization

[Figure 3.7, panels (a) and (b): the problems 3D-NLBVP, 3D-LBVP, 2D-NLBVP, 2D-LBVP, 1D-NLBVP, and 1D-LBVP, arranged along the axes "computational costs" and "degree of fidelity" in (a) and connected by arrows in (b).]

FIGURE 3.7: A schematic depiction of a user-prescribed hierarchy of problems associated with simplified-physics low-fidelity models. The problem 3D-NLBVP is associated with the high-fidelity model. (a) An arrangement of the user-prescribed hierarchy with regard to the degree of fidelity and the computational costs. (b) An encoding of the user-prescribed hierarchy as a relationships diagram in which each arrow points from a model with a higher degree of fidelity and higher computational costs to a model with a lower degree of fidelity and lower computational costs.

A corresponding two-dimensional boundary value problem can be associated with a three-dimensional boundary value problem in which the cross-sectional area is assumed to be homogeneous along the longitudinal direction.

In (iii) of Figure 3.8, an ohmic resistor, as an electric circuit component, is depicted as a symbolic representation of a function $R$. In applications, the function $R$ is often associated with special mathematical functions such as the natural logarithm or Bessel functions. Most often, though, the function $R$ is a multivariate rational function.

For instance, a conductor's ohmic resistance at the frequency 0 Hz can be expressed as a multivariate rational function depending on a parameter point $\boldsymbol{\xi}$ (recall § 2.2.3) that incorporates the geometrical parameters defining the conductor's length and its cross-sectional area. Assuming a spatially constant electric conductivity (recall § 2.1.2), that is, $\sigma(x) := \sigma_0$ with $\sigma_0 \in \mathbb{R}^{+}$, one could technically include the material parameter $\sigma_0$ in the parameter point $\boldsymbol{\xi}$ as well. However, a good conductor is usually assumed in the sense that the material characteristics of plain copper are used such that $\sigma_0$ is fixed as $\sigma_0 := \sigma_{\mathrm{Cu}}$ with $\sigma_{\mathrm{Cu}} := 5.96 \times 10^{7}$ S/m.
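A small Python sketch of the 0 Hz resistance $R = l/(\sigma A)$ for the two conductor kinds mentioned above, with $\sigma$ fixed to $\sigma_{\mathrm{Cu}}$. For a round conductor with diameter $d$, $A = \pi d^2/4$, so $R(l, d) = 4l/(\sigma \pi d^2)$ is a rational function with a numerator of total degree 1 and a denominator of total degree 2 in $(l, d)$. The function names and example dimensions are hypothetical.

```python
import math

SIGMA_CU = 5.96e7  # S/m, the value fixed for plain copper in the text

def dc_resistance_round(length, diameter, sigma=SIGMA_CU):
    """R = 4 l / (sigma * pi * d^2): numerator degree 1, denominator degree 2 in (l, d)."""
    return 4.0 * length / (sigma * math.pi * diameter ** 2)

def dc_resistance_foil(length, width, thickness, sigma=SIGMA_CU):
    """R = l / (sigma * w * t) for a rectangular (foil) cross-section."""
    return length / (sigma * width * thickness)

r_round = dc_resistance_round(1.0, 1e-3)      # 1 m of wire with 1 mm diameter, about 21.4 mOhm
r_foil = dc_resistance_foil(1.0, 1e-2, 1e-4)  # 1 m of 10 mm x 0.1 mm foil, about 16.8 mOhm
```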

Another example is an impedance that can be interpreted as a representative of a real inductive component in a circuit theory context. This impedance can be used, e.g., in the computation of a two-port S-parameter matrix. The components of the parameter point $\boldsymbol{\xi}$ then have the physical interpretation of an angular frequency, a capacitance, an inductance, and an electrical resistance.
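As a hedged illustration, the sketch below assumes one common equivalent circuit of a real inductor, namely an R-L series branch shunted by a parasitic capacitance C. This gives $Z(\omega, C, L, R) = (R + j\omega L)/(1 - \omega^2 L C + j\omega R C)$, a rational function of type (2, 4) in the four parameters. It also assumes the component is placed as a series element between two ports with a real reference impedance $Z_0$, which yields $S_{11} = Z/(Z + 2Z_0)$ and $S_{21} = 2Z_0/(Z + 2Z_0)$. Neither the topology nor the component values are taken from the thesis.

```python
def real_inductor_impedance(omega, C, L, R):
    """Z = (R + j omega L) / (1 - omega^2 L C + j omega R C): an R-L branch shunted by C."""
    return (R + 1j * omega * L) / (1.0 - omega ** 2 * L * C + 1j * omega * R * C)

def series_impedance_s_matrix(Z, Z0=50.0):
    """S-parameters of a series impedance Z between two ports with real reference impedance Z0."""
    denom = Z + 2.0 * Z0
    s11 = Z / denom
    s21 = 2.0 * Z0 / denom
    return [[s11, s21], [s21, s11]]

# hypothetical component values: 10 uH, 5 pF parasitic capacitance, 0.1 ohm winding resistance
S_dc = series_impedance_s_matrix(real_inductor_impedance(0.0, 5e-12, 10e-6, 0.1))
```

At $\omega = 0$, the impedance reduces to the winding resistance $R$, so $S_{21} = 2Z_0/(R + 2Z_0)$ is slightly below one.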

Abstractly, one can state that $R \in \tfrac{p}{q}P^{d}_{(m,n)}$ where, analogously to (3.50), the space $\tfrac{p}{q}P^{d}_{(m,n)}$ denotes the space of $d$-variate rational polynomials of total degree at most $m$ in the numerator polynomial $p$, i.e., $p \in P^{d}_{\le m}$, and of total degree at most $n$ in the denominator polynomial $q$, i.e., $q \in P^{d}_{\le n}$. Hence, the function $R$ can be called a multivariate rational polynomial function of type $(m,n)$ with $m, n \in \mathbb{Z}^{+}_{0}$. Finally, one can read the


[Figure 3.8, panels (i) to (iii): (i) three-dimensional domains $\Omega^{3D}$ with boundary $\partial\Omega^{3D}$, triangulations $T^{3D}$, and solver thresholds ${}_{1}\Delta^{3D}_{Ax=b}$ and ${}_{2}\Delta^{3D}_{Ax=b}$; (ii) two-dimensional domains $\Omega^{2D}$ with boundary $\partial\Omega^{2D}$, triangulations $T^{2D}$, and solver thresholds ${}_{1}\Delta^{2D}_{Ax=b}$ and ${}_{2}\Delta^{2D}_{Ax=b}$; (iii) ohmic resistors.]

FIGURE 3.8: A schematic depiction of a user-prescribed hierarchy of magnetoquasistatic and magnetostatic problems associated with simplified-physics low-fidelity models. The three-dimensional domains in (i) are associated with the high-fidelity model's underlying magnetoquasistatic or magnetostatic problem. The two-dimensional domains in (ii) are associated with a low-fidelity model's underlying magnetoquasistatic or magnetostatic problem. The ohmic resistors in (iii) are associated with a multivariate rational polynomial derived from a magnetoquasistatic or a magnetostatic problem. It is assumed that the user prefers the low-fidelity models in (ii) over those in (iii).

assignment of the $(m,n)$ multivariate rational function $R$ as

$$R(x) := \frac{p(x)}{q(x)} \quad \text{such that} \quad p \in P^{d}_{\le m},\; q \in P^{d}_{\le n}. \qquad (3.100)$$
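A minimal Python sketch of (3.100) follows. The polynomials are stored as hypothetical dictionaries that map exponent tuples $\alpha$ to coefficients $c_\alpha$, and the type $(m, n)$ is checked via the total degrees of numerator and denominator. The 0 Hz resistance $R(l, d) = 4l/(\sigma_{\mathrm{Cu}} \pi d^2)$ serves as an instance of type (1, 2). The class is illustrative only.

```python
import math

def total_degree(coeffs):
    """Largest total degree |alpha| among monomials with nonzero coefficients."""
    return max((sum(alpha) for alpha, c in coeffs.items() if c != 0), default=0)

def evaluate_polynomial(coeffs, x):
    """Evaluate sum_alpha c_alpha * x^alpha; coeffs maps exponent tuples alpha to c_alpha."""
    total = 0.0
    for alpha, c in coeffs.items():
        term = c
        for xi, ai in zip(x, alpha):
            term *= xi ** ai
        total += term
    return total

class RationalFunction:
    """R(x) := p(x) / q(x) with p in P^d_{<=m} and q in P^d_{<=n}, cf. (3.100)."""

    def __init__(self, p, q, m, n):
        if total_degree(p) > m or total_degree(q) > n:
            raise ValueError("p or q exceeds the prescribed total degree of type (m, n)")
        self.p, self.q, self.m, self.n = p, q, m, n

    def __call__(self, x):
        return evaluate_polynomial(self.p, x) / evaluate_polynomial(self.q, x)

# DC resistance of a round copper conductor with x = (l, d): type (1, 2)
R_dc = RationalFunction(p={(1, 0): 4.0}, q={(0, 2): 5.96e7 * math.pi}, m=1, n=2)
```

A denominator equal to the constant one, e.g. `q={(0, 0): 1.0}` with `n=0`, turns the same class into an ordinary multivariate polynomial.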

Unlike deterministic data-fit low-fidelity models, the function $R$ is derived from the system of Maxwell's equations (recall § 2.1.2), and its coefficients are fixed to known values. It might be beneficial to examine the usefulness of multivariate rational polynomials as deterministic data-fit low-fidelity models; however, I do not consider them in the present work, and they, as well as associated methods within the electromagnetics context such as vector fitting (see, e.g., [85]), are left for future investigations.


Notice well that the rationale for not considering them is driven by potential difficulties in properly handling spurious poles of a multivariate rational polynomial in an optimization context.²² Additionally, it is supposed that the potential benefits of the localized behavior of a multivariate rational polynomial, steered by its poles, are comparable to those of the localized behavior of a radial basis function, steered by the choice of the center points, such that radial basis functions are preferred over multivariate rational polynomials. A thorough juxtaposition of these two kinds of deterministic data-fit low-fidelity models is, however, beyond the scope of this work.

Notice well that the dotted arrows in Figure 3.8 are semantically overloaded in the sense that their vertical reading and their horizontal reading differ.

Considering the first and the second level within (i) of Figure 3.8, the dotted arrows indicate a relationship between a fine-grid discretization and a coarse-grid discretization; more precisely, the respective simplicial triangulations (recall § 2.2.2) $T^{3D}_{h_1}$ and $T^{3D}_{h_2}$ are governed by the characteristic $h_1 < h_2$. Furthermore, considering the second and the third level, the dotted arrows indicate a relationship between a stricter threshold ${}_{1}\Delta^{3D}_{Ax=b}$ and a weaker threshold ${}_{2}\Delta^{3D}_{Ax=b}$ for a termination criterion of an iterative solver; more precisely, the thresholds exhibit the characteristic ${}_{1}\Delta^{3D}_{Ax=b} < {}_{2}\Delta^{3D}_{Ax=b}$ with ${}_{1}\Delta^{3D}_{Ax=b} \in \mathbb{R}^{+}$ and ${}_{2}\Delta^{3D}_{Ax=b} \in \mathbb{R}^{+}$.
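The effect of a weakened termination criterion can be emulated with a plain conjugate gradient (CG) solver: the same linear system is solved once with a stricter and once with a weaker relative-residual threshold, and the weaker one stops after fewer iterations. The sketch below uses a hypothetical well-conditioned tridiagonal SPD matrix as a stand-in for a discretized magnetoquasistatic or magnetostatic problem. It is not the solver used in the thesis.

```python
import numpy as np

def conjugate_gradient(A, b, tol, max_iter=1000):
    """CG for a symmetric positive-definite A; stops once ||b - A x|| <= tol * ||b||."""
    x = np.zeros_like(b)
    r = b - A @ x
    d = r.copy()
    rs = r @ r
    b_norm = np.linalg.norm(b)
    for k in range(max_iter):
        if np.sqrt(rs) <= tol * b_norm:
            return x, k
        Ad = A @ d
        alpha = rs / (d @ Ad)
        x = x + alpha * d
        r = r - alpha * Ad
        rs_new = r @ r
        d = r + (rs_new / rs) * d
        rs = rs_new
    return x, max_iter

# hypothetical SPD test matrix (tridiagonal, diagonally dominant)
n = 200
A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x_strict, it_strict = conjugate_gradient(A, b, tol=1e-10)  # stricter threshold, cf. 1Delta
x_weak, it_weak = conjugate_gradient(A, b, tol=1e-3)       # weaker threshold, cf. 2Delta
```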

The explanation for the levels within (ii) of Figure 3.8 is analogous to the previous one. By contrast, the dotted arrows for the levels within (iii) of Figure 3.8 hint at a change such as the domain transformations in (A.1) or in (A.14). Alternatively, the arrows hint at a change, for instance, from a multivariate rational polynomial function of type $(m,n)$ to a multivariate rational polynomial function of type $(m,0)$ whose denominator is the constant one, that is, to a multivariate polynomial function from the space $P^{d}_{\le m}$.

The horizontal reading of the dotted arrows in Figure 3.8 reflects the relationships diagram within (b) of Figure 3.7. For the vertical reading, it is partially conceivable what a formal encoding of the arrows could look like, but it is not straightforward to conceive such a formal encoding for the horizontal reading. If we apply a structural perspective to Figure 3.8, similar to the structural perspectives in ch. 2, then one can exemplarily extract the formal encodings as a map-oriented representation in (Diagrams of Fig. 3.8), where, for the sake of clarity, the inverse maps are not extracted from Figure 3.8.

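From (Diagrams of Fig. 3.8), one can conclude that, theoretically, a less preferred problem can be constructed by a composition of maps associated with more preferred problems. Then, one can make statements such as

$$l_1 \circ f_3 \circ f_2 \circ f_1 = g_3 \circ g_2 \circ g_1 \qquad (3.101a)$$

$$l_1 \circ f_3 \circ f_2 \circ f_1 = g_3 \circ g_2 \circ g_1 \circ i_1 \qquad (3.101b)$$

$$l_2 \circ l_1 \circ f_3 \circ f_2 \circ f_1 = h_3 \circ h_2 \circ h_1 \circ i_2 \circ i_1. \qquad (3.101c)$$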

Note that the statements in (3.101) are valid under the assumption that all maps are set functions and their domains and co-domains are sets. However, this assumption does not adequately take into account the different algebraic characters of, e.g., $T^{3D}_{h_1}$,

²² Let us understand an unwanted pole as a spurious pole in the sense that it captures a singularity that does not correspond to a non-essential singularity of the high-fidelity model.


${}_{1}\Delta^{3D}_{Ax=b}$, and $R_1$.

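[Diagrams of Fig. 3.8: three copies of a grid of objects with the rows $(T^{3D}_{h_1}, T^{2D}_{h_1}, R_1)$, $(T^{3D}_{h_2}, T^{2D}_{h_2}, R_2)$, $({}_{1}\Delta^{3D}_{Ax=b}, {}_{1}\Delta^{2D}_{Ax=b}, R_3)$, and $({}_{2}\Delta^{3D}_{Ax=b}, {}_{2}\Delta^{2D}_{Ax=b}, R_4)$; vertical maps $f_1, f_2, f_3$, $g_1, g_2, g_3$, and $h_1, h_2, h_3$ within the three columns; horizontal maps $i_1, i_2$, $j_1, j_2$, $k_1, k_2$, and $l_1, l_2$ within the four rows; and further arrows labeled $F_1, F_2$, $G_1, G_2$, and $H_1, H_2$.]

(Diagrams of Fig. 3.8)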

Moreover, the statements in (3.101) do not adequately capture the idea that, for instance, moving from a high-fidelity model's underlying three-dimensional magnetoquasistatic problem to a low-fidelity model's underlying two-dimensional magnetoquasistatic problem technically corresponds to a loss of problem information, e.g., with regard to the boundary conditions. Therefore, it is more appropriate to consider the problems associated with the low-fidelity models as forgetful interpretations of the problem associated with the high-fidelity model. This viewpoint can be mediated by, for example, the maps $F_1$ and $F_2$ such that

$$F_1(T^{3D}_{h_1}) := T^{2D}_{h_1}, \quad F_1(T^{3D}_{h_2}) := T^{2D}_{h_2}, \quad F_1(f_1) := g_1, \qquad (3.102a)$$

$$(F_2 \circ F_1)(T^{3D}_{h_1}) := R_1, \quad (F_2 \circ F_1)(T^{3D}_{h_2}) := R_2, \quad (F_2 \circ F_1)(f_1) := h_1, \qquad (3.102b)$$

where the maps $F_1$ and $F_2$ are overloaded in order to deal with the different algebraic characters. We elaborate on the corresponding formal approach based on the category theoretical language in chapter 4.

To adjust the expectations correctly, notice well that the category theoretical language is not a panacea. In a nutshell, its merits stem from the absence of a cohesive theory that expresses in formal terms the relationships between the different problems associated with a high-fidelity model and the corresponding low-fidelity models (see Figure 3.8).

This absence, though, is the Achilles' heel of the mathematical analysis of any optimization approach that exploits simplified-physics low-fidelity models and relies on a resemblance, benign in some sense, between these low-fidelity models and the high-fidelity model (see, e.g., [49, p. 76]).

In order to mitigate the ramifications of this absence, the authors in [121] suggest assessing the resemblance based on a subset of observed points, i.e., the training subset, and on some quality factors that are derived in the context of the space mapping paradigm.

In a more general context, the author in [201, p. 76] suggests assessing and ordering different problems by analyzing their explanatory power based on a subset of observed points in a Bayesian setting.

Mind that these approaches are not pursued in the present work, although they are promising endeavors towards a quantification of the relationships between the different problems associated with the various models.

However, these approaches do not seem to be widely adopted in practical applications, presumably because the user-prescribed hierarchy of problems, which reflects the user's preferences, outranks other conceivable orderings of the problems.

Nevertheless, in § 3.3.2, I contribute partly to this overall discussion by briefly elaborating on the potential role of the NREGE in (3.24) and the SSPCC in (3.31) regarding the quantitative assessment of the quality of a low-fidelity model and a surrogate model within the space mapping paradigm.

In § 3.3.1, we discuss the efficient global optimization (or sequential kriging optimization) technique as a subtype of the model management strategy adaptation. This technique exploits solely a kriging low-fidelity model.

In § 3.3.2, we discuss the space mapping paradigm and the co-kriging approach as subtypes of the two model management strategies adaptation and fusion, respectively (recall § 1.3). Both the space mapping paradigm and the co-kriging approach are designed such that they especially exploit simplified-physics low-fidelity models.

A notable distinction between the space mapping paradigm and the co-kriging approach is how they deal with the statement in (3.2). Abstracting from the authors' perspective in [70, p. 167], it can be argued that the co-kriging approach focuses on an instance of the generic statement

$$\forall x \in X.\; K(x) =_{Y} Z_{\rho}(x) \cdot_{Y} \tilde{K}(x) +_{Y} Z_{\Delta}(x), \qquad (3.103)$$

where $Z_{\rho} \in Y^{X}$ and $Z_{\Delta} \in Y^{X}$ denote correction maps, and the maps $\cdot_{Y}\colon Y \times Y \to Y$ and $+_{Y}\colon Y \times Y \to Y$ denote a suitable multiplication and a suitable addition on $Y$, respectively. We elaborate on the instance associated with the co-kriging approach in § 3.3.2.
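To make (3.103) tangible, the following Python/NumPy sketch takes the most drastic simplification: both correction maps are constants, $Z_\rho(x) \equiv \rho$ and $Z_\Delta(x) \equiv \delta$, fitted by least squares from paired high- and low-fidelity responses. Co-kriging proper models $Z_\Delta$ as a Gaussian process, so this is only an illustration of the structure of the statement, with hypothetical data.

```python
import numpy as np

def fit_constant_corrections(k_low_values, k_high_values):
    """Least-squares fit of K(x) ~ rho * K_tilde(x) + delta with constant rho and delta."""
    A = np.column_stack([k_low_values, np.ones_like(k_low_values)])
    (rho, delta), *_ = np.linalg.lstsq(A, k_high_values, rcond=None)
    return rho, delta

x = np.linspace(0.0, 1.0, 20)
k_low_values = np.sin(6.0 * x)             # hypothetical low-fidelity responses
k_high_values = 2.0 * k_low_values + 0.5   # hypothetical high-fidelity responses
rho, delta = fit_constant_corrections(k_low_values, k_high_values)
```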

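Taking into account, in an abstract manner, the stance of the authors in [194, p. 32f], [56, ch. 2.5], and [49, p. 110ff], it can be argued that the space mapping paradigm focuses on instances of the generic statements

$$\forall x \in X.\; K(x) =_{Y} (\tilde{K} \circ_{X} \tilde{P})(x), \qquad (3.104a)$$

$$\forall x \in X.\; K(x) =_{Y} (\tilde{R} \circ_{Y} \tilde{K})(x), \qquad (3.104b)$$

$$\forall x \in X.\; K(x) =_{Y} (\tilde{R} \circ_{Y} \tilde{K} \circ_{X} \tilde{P})(x), \qquad (3.104c)$$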

where $\tilde{P} \in X^{X}$ denotes a domain-oriented correction map, $\tilde{R} \in Y^{Y}$ denotes a co-domain-oriented correction map, and the maps $\circ_{X}\colon Y^{X} \times X^{X} \to Y^{X}$ and $\circ_{Y}\colon Y^{Y} \times Y^{X} \to Y^{X}$ denote suitable composition maps. Notice well that, in (3.104), the maps $\tilde{K} \circ_{X} \tilde{P}\colon X \to Y$, $\tilde{R} \circ_{Y} \tilde{K}\colon X \to Y$, and $\tilde{R} \circ_{Y} \tilde{K} \circ_{X} \tilde{P}\colon X \to Y$ designate surrogate models. Hence, we encounter the conceptual distinction between the notion of a low-fidelity model and a surrogate model that has been mentioned in § 3.1.1.
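A toy instance of (3.104c) is sketched below in Python/NumPy. It assumes a shift $\tilde{P}(x) = x + c$ as the domain-oriented correction and a scaling $\tilde{R}(y) = s\,y$ as the co-domain-oriented correction. The shift is found by a grid search, and for each candidate shift the optimal scale follows in closed form from least squares. The models and parameter values are hypothetical, and this simple parameter extraction is not one of the space mapping algorithms discussed later.

```python
import numpy as np

def extract_shift_and_scale(x, k_high_values, k_low, shifts):
    """Fit K ~ R o K_tilde o P with P(x) = x + c (grid search) and R(y) = s * y (closed form)."""
    best_err, best_c, best_s = np.inf, None, None
    for c in shifts:
        k_mapped = k_low(x + c)                                  # (K_tilde o P)(x)
        s = (k_mapped @ k_high_values) / (k_mapped @ k_mapped)   # least-squares output scale
        err = np.sum((s * k_mapped - k_high_values) ** 2)
        if err < best_err:
            best_err, best_c, best_s = err, c, s
    return best_c, best_s

def k_low(x):                                 # hypothetical cheap model
    return np.exp(-x ** 2)

x = np.linspace(-2.0, 2.0, 41)
k_high_values = 1.5 * k_low(x - 0.4)          # hypothetical expensive model: c = -0.4, s = 1.5
c, s = extract_shift_and_scale(x, k_high_values, k_low, np.linspace(-1.0, 1.0, 41))
```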

Using the defect correction principle of numerical analysis as a scaffolding, the author in [56, ch. 2.5] investigates the space mapping paradigm. With regard to the corresponding numerical iteration schemes, the defect correction principle permits interpreting implementations of the map $\tilde{R}$ and the map $\tilde{P}$ as a left-preconditioner and a right-preconditioner, respectively. In § 3.3.1, I dwell on algorithmic instances associated with the space mapping paradigm.
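For orientation, the sketch below shows a generic nonlinear defect correction iteration in Python/NumPy, not the scheme of [56]. It solves $K(x) = y^{*}$ by applying an approximate inverse built from a cheap model $\tilde{K}$: $x_{j+1} = x_j + \tilde{K}^{-1}(y^{*}) - \tilde{K}^{-1}(K(x_j))$. The approximate inverse acts from the left on the defect, which hints at the preconditioner interpretation. The linear coarse model and the nonlinear fine model are hypothetical.

```python
import numpy as np

def defect_correction(k_high, k_low_inverse, y_target, x0, tol=1e-12, max_iter=100):
    """Solve K(x) = y* via x_{j+1} = x_j + K_tilde^{-1}(y*) - K_tilde^{-1}(K(x_j))."""
    x = np.asarray(x0, dtype=float)
    z_target = k_low_inverse(y_target)
    for j in range(max_iter):
        kx = k_high(x)
        if np.linalg.norm(kx - y_target) <= tol:
            return x, j
        x = x + z_target - k_low_inverse(kx)
    return x, max_iter

A = np.array([[4.0, 1.0], [1.0, 3.0]])
k_high = lambda x: A @ x + 0.2 * np.sin(x)       # hypothetical nonlinear "fine" model
k_low_inverse = lambda y: np.linalg.solve(A, y)  # inverse of the hypothetical linear "coarse" model
y_target = np.array([1.0, 2.0])
x_star, iterations = defect_correction(k_high, k_low_inverse, y_target, np.zeros(2))
```

The iteration converges here because $\tilde{K}^{-1}$ is close enough to the inverse of $K$ that the error is contracted in every step.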

3.2. Surrogate-based optimization 81

In chapter 4, the category theoretical language is employed as an algebraic modeling scaffolding in order to assess its capability to complement the primarily numerical-analytic narrative on simplified-physics low-fidelity models and the space mapping paradigm.

3.2 Surrogate-based optimization

A basic premise in the present work is that the acquisition of pairs of sampling plan points and output points with respect to a sample $s$ in (3.13) is computationally expensive. This forces a user to be parsimonious with regard to the sample size $m$.

From an engineering application viewpoint, though, imagine the use case in which an effortless interplay between hardware and software enables a much faster acquisition of a sample $s$ than without this interplay. More concretely, imagine that a user can readily exploit opportunities for parallel computing and GPU (graphics processing unit) computing.

Recalling Figure 1.4, this use case rather shifts the attention from the level of algorithms and the level of programs to a level of hardware technologies, which is beyond the scope of the present work.

However, if we abstract from the aforementioned concrete use case, then the main aim of this section becomes apparent: gaining some insight into the degree of similarity between a high-fidelity model and a low-fidelity model without the use of a model management strategy. This consideration corresponds to the assessment of the global and the local accuracy of a low-fidelity model with regard to a high-fidelity model.

Building upon a certain degree of established similarity between a high-fidelity model and a low-fidelity model, the basic idea underlying surrogate-based optimization (cf. § 1.2) is, first, to find a minimum associated with the low-fidelity model and, second, either to accept this minimum as a proxy, to some extent, of a minimum associated with the high-fidelity model or to use it as the starting point of the search for a minimum of the high-fidelity model.
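The two-stage idea can be sketched in a few lines of Python. Stage 1 minimizes a cheap low-fidelity model by an exhaustive grid search. Stage 2 uses that minimizer as the center of a bracket, where a golden-section search refines the minimum on the high-fidelity model. The one-dimensional models, the grid size, and the bracket radius are hypothetical choices for illustration.

```python
import math

def golden_section(f, a, b, tol=1e-10):
    """Minimize a unimodal function f on [a, b] by golden-section search."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - invphi * (b - a)
        else:
            a, c = c, d
            d = a + invphi * (b - a)
    return 0.5 * (a + b)

def surrogate_based_minimize(f_high, f_low, lower, upper, n_grid=201, radius=0.5):
    """Stage 1: grid search on the cheap f_low. Stage 2: local refinement on f_high."""
    grid = [lower + i * (upper - lower) / (n_grid - 1) for i in range(n_grid)]
    x_low = min(grid, key=f_low)
    a, b = max(lower, x_low - radius), min(upper, x_low + radius)
    return x_low, golden_section(f_high, a, b)

f_high = lambda x: (x - 1.2) ** 2 + 0.1 * x   # hypothetical high-fidelity model, minimum at 1.15
f_low = lambda x: (x - 1.0) ** 2              # hypothetical low-fidelity model, minimum at 1.0
x_low, x_high = surrogate_based_minimize(f_high, f_low, -2.0, 3.0)
```

Only the second stage queries `f_high`, and it does so inside a bracket that the cheap model has already narrowed down.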

In the subsequent subsections, let us examine optimization with the test functions in Figure 2.2 by means of data-fit low-fidelity models and emulated simplified-physics low-fidelity models.

3.2.1 Optimization with test functions by data-fit low-fidelity models

Using the Sobol quasi-random sequence sampling plan in Figure 3.2, let us invoke a 2-variate monomial polynomial model $p(x) \in P^{2}_{\le 2}$ via regression of the test functions (and high-fidelity models, respectively) in Figure 2.2, where the matrices $W$ and $R$ in (3.65) are chosen such that $W := I$ with $I \in \mathbb{R}^{m \times m}$ and $R := 0$ with $0 \in \mathbb{R}^{6 \times 6}$. In Figure 3.9, the corresponding contour representations are depicted for the cases in which the number of sampling plan points is given by $m := 10$ and by $m := 50$.
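Assuming that (3.65) reduces to ordinary least squares for $W := I$ and $R := 0$, the regression can be sketched with the six monomials spanning $P^{2}_{\le 2}$. Pseudo-random points stand in for the Sobol plan here. Because the Booth function is itself a quadratic polynomial, ten points suffice to recover it exactly, including its global minimum value 0 at $(1, 3)$.

```python
import numpy as np

def monomial_basis(X):
    """Columns 1, x1, x2, x1^2, x1*x2, x2^2 spanning P^2_{<=2} (6 coefficients)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones_like(x1), x1, x2, x1 ** 2, x1 * x2, x2 ** 2])

def fit_quadratic(X, y):
    """Ordinary least squares, i.e., the weighted and regularized problem with W = I and R = 0."""
    coeffs, *_ = np.linalg.lstsq(monomial_basis(X), y, rcond=None)
    return lambda X_new: monomial_basis(np.atleast_2d(X_new)) @ coeffs

def booth(X):
    return (X[:, 0] + 2.0 * X[:, 1] - 7.0) ** 2 + (2.0 * X[:, 0] + X[:, 1] - 5.0) ** 2

rng = np.random.default_rng(0)
X = rng.uniform(-10.0, 10.0, size=(10, 2))   # pseudo-random stand-in for the Sobol plan
model = fit_quadratic(X, booth(X))
```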

In Figure 3.10, a Sobol quasi-random sequence sampling plan is used with the number of sampling plan points again set to $m := 10$ and to $m := 50$. A radial basis function $\varphi = r \mapsto \varphi(r)$ with thin plate spline assignment (see Table 3.2) is invoked via interpolation of the test functions in Figure 2.2. Let us choose the thin plate spline assignment as a representative of radial basis functions without additional hyperparameters to be adjusted. Mind that, for visualization purposes, in-house Julia code is combined with the Julia PL package ScatteredInterpolation.jl (see [141] and [216]).
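A minimal NumPy sketch of thin plate spline interpolation with $\varphi(r) = r^2 \ln r$ follows. It augments the interpolant with a linear polynomial tail and the usual side conditions. That tail is an assumption made here and is not necessarily part of the assignment in Table 3.2 or of ScatteredInterpolation.jl. Apart from the sampling plan, the construction needs no hyperparameters.

```python
import numpy as np

def thin_plate_spline(r):
    """phi(r) = r^2 ln r, continuously extended by phi(0) = 0."""
    out = np.zeros_like(r)
    positive = r > 0
    out[positive] = r[positive] ** 2 * np.log(r[positive])
    return out

def fit_tps(X, y):
    """TPS interpolant with a linear polynomial tail (the tail is an assumption, cf. Table 3.2)."""
    m = len(y)
    Phi = thin_plate_spline(np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2))
    P = np.column_stack([np.ones(m), X])                  # basis 1, x1, x2
    A = np.block([[Phi, P], [P.T, np.zeros((3, 3))]])
    sol = np.linalg.solve(A, np.concatenate([y, np.zeros(3)]))
    w, c = sol[:m], sol[m:]

    def predict(X_new):
        X_new = np.atleast_2d(X_new)
        dist = np.linalg.norm(X_new[:, None, :] - X[None, :, :], axis=2)
        return thin_plate_spline(dist) @ w + np.column_stack([np.ones(len(X_new)), X_new]) @ c

    return predict

rng = np.random.default_rng(2)
X = rng.uniform(-5.0, 10.0, size=(10, 2))
y = np.sin(X[:, 0]) + 0.1 * X[:, 1] ** 2
predict = fit_tps(X, y)
```

With the linear tail, affine functions are reproduced exactly, which serves as a simple sanity check of the construction.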

Finally, again using the Sobol quasi-random sequence sampling plan with $m := 10$ and $m := 50$, let us invoke the last data-fit low-fidelity model, that is, a kriging low-fidelity model via interpolation of the test functions in Figure 2.2. Since the computational burden of numerically finding the optimal parameters $(\hat{\theta}, \hat{p})$ in (3.97) scales with the dimensionality $d$, let us choose a compromise with regard to the parameters $(\hat{\theta}, \hat{p})$ in the sense that we set $(\hat{\theta}_1, \hat{\theta}_2, \hat{p}_1, \hat{p}_2 \equiv \hat{p}_1)$. Thus, we slightly reduce the computational burden of numerically finding the optimal parameters while retaining the benefit of a numerical search for optimal parameters compared to a manually predefined set of parameters. Recalling § 2.3.3, notice that the Nelder-Mead simplex (NMS) algorithm is primarily applied to the optimization problem in (3.97). In a loose comparison with an adaptive differential evolution (ADE) algorithm, the results of the NMS algorithm mostly differ in one or two decimal places from those of the ADE algorithm, but, on average, the NMS algorithm obtains its results faster. However, it is hard to generalize this observation and to detect a firm preference for an optimization algorithm for the task at hand in (3.97). Mind that, for visualization purposes, in-house Julia code is combined with the Julia PL package Surrogates.jl.
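The sketch below assumes that (3.97) maximizes the usual concentrated ln-likelihood $-\tfrac{m}{2}\ln\hat{\sigma}^2 - \tfrac{1}{2}\ln|\Psi|$, with one $\theta_k$ per dimension and a shared exponent $p_1 = p_2$ as described above. It evaluates the objective through a Cholesky factor. A coarse exhaustive grid replaces the Nelder-Mead search purely to keep the example short and deterministic. The tiny nugget added to $\Psi$ is a stabilizing assumption, not part of the text.

```python
import numpy as np

def concentrated_ln_likelihood(X, y, theta, p):
    """-(m/2) ln(sigma2_hat) - (1/2) ln|Psi| with one theta per dimension and one shared p."""
    m = len(y)
    Psi = np.exp(-np.sum(theta * np.abs(X[:, None, :] - X[None, :, :]) ** p, axis=2))
    Psi = Psi + 1e-10 * np.eye(m)   # tiny nugget for numerical stability (an assumption)
    try:
        L = np.linalg.cholesky(Psi)
    except np.linalg.LinAlgError:
        return -np.inf
    u = np.linalg.solve(L, np.ones(m))   # L^{-1} 1
    v = np.linalg.solve(L, y)            # L^{-1} y
    mu_hat = (u @ v) / (u @ u)           # 1^T Psi^{-1} y / 1^T Psi^{-1} 1
    z = v - mu_hat * u                   # L^{-1} (y - 1 mu_hat)
    sigma2_hat = (z @ z) / m
    ln_det_psi = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * m * np.log(sigma2_hat) - 0.5 * ln_det_psi

def grid_search(X, y, log10_thetas, ps):
    """Exhaustive search over (theta_1, theta_2, p_1 = p_2); a crude stand-in for Nelder-Mead."""
    best_value, best_params = -np.inf, None
    for t1 in log10_thetas:
        for t2 in log10_thetas:
            for p in ps:
                theta = np.array([10.0 ** t1, 10.0 ** t2])
                value = concentrated_ln_likelihood(X, y, theta, p)
                if value > best_value:
                    best_value, best_params = value, (theta, p)
    return best_value, best_params

rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, size=(20, 2))
y = np.sin(3.0 * X[:, 0]) + X[:, 1]
best_value, (theta_hat, p_hat) = grid_search(X, y, np.linspace(-1.0, 2.0, 7), [1.5, 2.0])
```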

At a qualitative level, given $m := 10$ sampling plan points, one can observe that the 2-variate monomial polynomial model best recovers the contour, more precisely, the values and the shape, of the unit sphere and the Booth test function. Since both test functions are quadratic polynomials, i.e., elements of $P^{2}_{\le 2}$, they can be recovered exactly. Furthermore, the model satisfactorily recovers the contour of the Rosenbrock and the modified Branin test function, and it recovers the contour of the Ackley and the Michalewicz test function worst. Given the known definitions of the test functions, these observations are plausible. Mind that the polynomial model is invoked in a regression context; hence, the influence of another kind of polynomial model on the quality of the low-fidelity model with respect to the high-fidelity model is subdued. However, the influence of a higher number of sampling plan points is slightly bigger, especially if we consider the modified Branin test function, which can be seen as a rather prototypical function within the context of engineering applications (cf. [70, p. 196]).

In the case of $m := 10$, compared with the monomial polynomial model within the regression context, the thin plate spline radial basis function and the kriging low-fidelity model within the interpolation context moderately recover, for instance, the modified Branin function. If we contrast the thin plate spline radial basis function (TPS RBF for short) with the kriging low-fidelity model, then one can observe that the kriging low-fidelity model tends to retrieve the values of the test functions more accurately, whereas the TPS RBF tends to retrieve the shape of the test functions more accurately. However, in the case of $m := 50$, both low-fidelity models are able to recover the values and the shapes of the high-fidelity models satisfactorily, albeit the kriging low-fidelity model performs the recovery slightly better. Recall, though, that the TPS RBF does not involve any hyperparameters; thus, a computationally intensive hyperparameter optimization step is omitted.

At a quantitative level, let us look at the normalized mean generalization error $e^{N}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})$ and the mean SSPCC $r^{2}_{y\tilde{y}}\vert_{k}$ within the k-fold cross validation method w.r.t. deterministic and probabilistic data-fit low-fidelity models.²³
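A generic k-fold cross validation loop in Python/NumPy is sketched below. It returns a mean normalized root-mean-square error and a mean squared Pearson correlation coefficient over the folds. Normalizing the error by the range of $y$ is an assumption; the exact normalizations in (3.28) and (3.33) may differ. `fit` is any hypothetical function that maps training data to a predictor, such as the affine least-squares model used in the example.

```python
import numpy as np

def k_fold_scores(X, y, fit, k, seed=0):
    """Mean normalized RMSE and mean squared Pearson correlation over k folds.

    Normalizing the RMSE by the range of y is an assumption; (3.28) and (3.33) may differ."""
    idx = np.random.default_rng(seed).permutation(len(y))
    y_range = y.max() - y.min()
    errors, r2s = [], []
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        y_hat = fit(X[train], y[train])(X[test])
        errors.append(np.sqrt(np.mean((y[test] - y_hat) ** 2)) / y_range)
        r2s.append(np.corrcoef(y[test], y_hat)[0, 1] ** 2)
    return float(np.mean(errors)), float(np.mean(r2s))

def fit_linear(X_train, y_train):
    """Hypothetical example model: affine least-squares fit."""
    A = np.column_stack([np.ones(len(y_train)), X_train])
    coeffs, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coeffs

rng = np.random.default_rng(4)
X = rng.uniform(-1.0, 1.0, size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - X[:, 1]
err_5, r2_5 = k_fold_scores(X, y, fit_linear, k=5)
```

For $m = 50$ and $k = 5$, each fold holds out $m_g = 10$ testing points.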

²³ In the case of a Chebyshev polynomial, the common k-fold cross-validation method breaks the regular pattern of the Chebyshev grid (recall Figure A.1).


[Figure 3.9: contour plots in panels (i) to (vi) for both (A) and (B); axis ranges [-30, 30] in (i) and (ii), [-10, 10] in (iii) and (iv), [0, 4] in (v), and [-5, 10] in (vi).]

(A) The number of sampling plan points is given by $m := 10$. (B) The number of sampling plan points is given by $m := 50$. In both (A) and (B), the red cross refers to the global minimum of the high-fidelity models in Figure 2.2b.

FIGURE 3.9: Using the Sobol quasi-random sequence sampling plan in Figure 3.2, a 2-variate monomial polynomial model $p(x) \in P^{2}_{\le 2}$ via regression ($W := I$ with $I \in \mathbb{R}^{m \times m}$ and $R := 0$ with $0 \in \mathbb{R}^{6 \times 6}$ in (3.65)) of the test functions (and high-fidelity models, respectively) in Figure 2.2 (solely in contour representation).


[Figure 3.10: contour plots in panels (i) to (vi) for both (A) and (B); axis ranges [-30, 30] in (i) and (ii), [-10, 10] in (iii) and (iv), [0, 4] in (v), and [-5, 10] in (vi).]

(A) The number of sampling plan points is given by $m := 10$. (B) The number of sampling plan points is given by $m := 50$. In both (A) and (B), the red cross refers to the global minimum of the high-fidelity models in Figure 2.2b.

FIGURE 3.10: Using the Sobol quasi-random sequence sampling plan in Figure 3.2, a radial basis function $\varphi = r \mapsto \varphi(r)$ with thin plate spline assignment (see Table 3.2) via interpolation of the test functions (and high-fidelity models, respectively) in Figure 2.2 (solely in contour representation).


[Figure 3.11: contour plots in panels (i) to (vi) for both (A) and (B); axis ranges [-30, 30] in (i) and (ii), [-10, 10] in (iii) and (iv), [0, 4] in (v), and [-5, 10] in (vi).]

(A) The number of sampling plan points is given by $m := 10$. (B) The number of sampling plan points is given by $m := 50$. In both (A) and (B), the red cross refers to the global minimum of the high-fidelity models in Figure 2.2b.

FIGURE 3.11: Using the Sobol quasi-random sequence sampling plan in Figure 3.2, a kriging low-fidelity model via interpolation (where $(\hat{\theta}_1, \hat{\theta}_2, \hat{p}_1, \hat{p}_2 \equiv \hat{p}_1)$ in (3.97)) of the test functions (and high-fidelity models, respectively) in Figure 2.2 (solely in contour representation).


As already stated in § 3.1.1, we consider the 5-fold case and the 10-fold case. The sample size is set to $m := 50$ since we regard the results associated with this sample size as sample-based best-case error estimates and sample-based lower error bounds, respectively. The wording concerning the estimates and bounds is rather a pragmatism-driven ad hoc artifice, and it should not be viewed too strictly through the formally well-crafted lens of numerical simulations (see § 2.2).

Moreover, let us consider the normalized global first-order sensitivity measures $S^{N}_{y,i}$ with $i \in \{1,2\}$ evaluated at $f$ w.r.t. the data-fit low-fidelity models.
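A deterministic Python/NumPy sketch of normalized first-order sensitivity measures follows. On a uniform tensor grid, $S_i = \operatorname{Var}(\mathbb{E}[f \mid x_i]) / \operatorname{Var}(f)$ is computed for $i \in \{1, 2\}$ and then normalized as $S^{N}_{y,i} = S_i / (S_1 + S_2)$, so the two measures sum to one, as in the tables below. Both the grid-based estimator and the normalization by the sum are assumptions; the definitions used in the thesis may differ in detail.

```python
import numpy as np

def normalized_first_order_indices(f, bounds, n=201):
    """S_i = Var(E[f | x_i]) / Var(f) on a uniform tensor grid, normalized to sum to one."""
    g1 = np.linspace(*bounds[0], n)
    g2 = np.linspace(*bounds[1], n)
    X1, X2 = np.meshgrid(g1, g2, indexing="ij")   # axis 0 <-> x1, axis 1 <-> x2
    F = f(X1, X2)
    total_var = F.var()
    s1 = F.mean(axis=1).var() / total_var   # variance over x1 of E[f | x1]
    s2 = F.mean(axis=0).var() / total_var   # variance over x2 of E[f | x2]
    return s1 / (s1 + s2), s2 / (s1 + s2)

# the sphere function is symmetric in x1 and x2, so both measures equal 0.5
s1, s2 = normalized_first_order_indices(lambda a, b: a ** 2 + b ** 2, [(-30.0, 30.0), (-30.0, 30.0)])
```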

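TABLE 3.3: The normalized mean generalization error $e^{N}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})$ and the mean SSPCC $r^{2}_{y\tilde{y}}\vert_{k}$ within the k-fold cross validation method w.r.t. the 2-variate monomial polynomial in Figure 3.9 with sample size $m := 50$.

| | (i) | (ii) | (iii) | (iv) | (v) | (vi) |
|---|---|---|---|---|---|---|
| $e^{N}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})\vert_{k:=5}$ | 0.5098 | $\ll 1.0 \times 10^{-16}$ | $\ll 1.0 \times 10^{-16}$ | $\gg 1.0$ | 0.0657 | $> 1.0$ |
| $r^{2}_{y\tilde{y}}\vert_{k:=5}$ | 0.5461 | 1.0 | 1.0 | 0.9003 | 0.3369 | 0.6558 |
| $e^{N}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})\vert_{k:=10}$ | 0.4477 | $\ll 1.0 \times 10^{-16}$ | $\ll 1.0 \times 10^{-16}$ | $\gg 1.0$ | 0.0582 | $> 1.0$ |
| $r^{2}_{y\tilde{y}}\vert_{k:=10}$ | 0.7217 | 1.0 | 1.0 | 0.9352 | 0.3951 | 0.8137 |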

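TABLE 3.4: The normalized mean generalization error $e^{N}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})$ and the mean SSPCC $r^{2}_{y\tilde{y}}\vert_{k}$ within the k-fold cross validation method w.r.t. the radial basis function with thin plate spline assignment in Figure 3.10 with sample size $m := 50$.

| | (i) | (ii) | (iii) | (iv) | (v) | (vi) |
|---|---|---|---|---|---|---|
| $e^{N}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})\vert_{k:=5}$ | 0.3377 | 0.0149 | 0.4173 | $\gg 1.0$ | 0.0935 | 0.3707 |
| $r^{2}_{y\tilde{y}}\vert_{k:=5}$ | 0.6166 | 0.9999 | 0.9973 | 0.9610 | 0.1772 | 0.8905 |
| $e^{N}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})\vert_{k:=10}$ | 0.2330 | 0.0111 | 0.4064 | $\gg 1.0$ | 0.0677 | 0.2898 |
| $r^{2}_{y\tilde{y}}\vert_{k:=10}$ | 0.6963 | 0.9999 | 0.9963 | 0.9780 | 0.4328 | 0.9732 |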

Table 3.3 presents the normalized mean generalization error and the mean SSPCC within the k-fold cross validation method w.r.t. the 2-variate monomial polynomial in Figure 3.9. These values support the observations at the qualitative level. If we pick the modified Branin test function as an example, then the table additionally hints at the suitability of the monomial polynomial for recovering at least partly the shape of such a test function.

Table 3.4 lists the normalized mean generalization error and the mean SSPCC within the k-fold cross validation method w.r.t. a radial basis function with thin plate spline assignment. Compared to the monomial polynomial, the TPS RBF better recovers the values and the shape of the modified Branin test function.


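TABLE 3.5: The normalized mean generalization error $e^{N}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})$ and the mean SSPCC $r^{2}_{y\tilde{y}}\vert_{k}$ within the k-fold cross validation method w.r.t. the kriging low-fidelity model in Figure 3.11 with sample size $m := 50$.

| | (i) | (ii) | (iii) | (iv) | (v) | (vi) |
|---|---|---|---|---|---|---|
| $e^{N}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})\vert_{k:=5}$ | 0.2517 | 0.2818 | 0.0043 | 0.9956 | 0.0443 | 0.0242 |
| $r^{2}_{y\tilde{y}}\vert_{k:=5}$ | 0.5367 | 0.9987 | 0.9999 | 0.9998 | 0.5188 | 0.9983 |
| $e^{N}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})\vert_{k:=10}$ | 0.2555 | 0.0925 | 0.0011 | 0.3290 | 0.0130 | 0.0022 |
| $r^{2}_{y\tilde{y}}\vert_{k:=10}$ | 0.5838 | 0.9999 | 0.9999 | 0.9998 | 0.7665 | 0.9998 |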

TABLE 3.6: The normalized global first-order sensitivity measures $S^{N}_{y,i}$ with $i \in \{1,2\}$ evaluated at $f$ w.r.t. the 2-variate monomial polynomial in Figure 3.9b with sample size $m := 50$.

| | (i) | (ii) | (iii) | (iv) | (v) | (vi) |
|---|---|---|---|---|---|---|
| $S^{N}_{y,1}(f)$ | 0.4922 | 0.5000 | 0.4894 | 0.9935 | 0.8633 | 0.4455 |
| $S^{N}_{y,2}(f)$ | 0.5078 | 0.5000 | 0.5106 | 0.0065 | 0.1367 | 0.5545 |
| $\sum_{i=1}^{2} S^{N}_{y,i}(f)$ | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |

TABLE 3.7: The normalized global first-order sensitivity measures $S^{N}_{y,i}$ with $i \in \{1,2\}$ evaluated at $f$ w.r.t. the radial basis function with thin plate spline assignment in Figure 3.10b with sample size $m := 50$.

| | (i) | (ii) | (iii) | (iv) | (v) | (vi) |
|---|---|---|---|---|---|---|
| $S^{N}_{y,1}(f)$ | 0.5041 | 0.5000 | 0.4851 | 0.9733 | 0.5351 | 0.6226 |
| $S^{N}_{y,2}(f)$ | 0.4959 | 0.5000 | 0.5149 | 0.0267 | 0.4649 | 0.3774 |
| $\sum_{i=1}^{2} S^{N}_{y,i}(f)$ | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |

TABLE 3.8: The normalized global first-order sensitivity measures $S^{N}_{y,i}$ with $i \in \{1,2\}$ evaluated at $f$ w.r.t. the kriging low-fidelity model in Figure 3.11b with sample size $m := 50$.

| | (i) | (ii) | (iii) | (iv) | (v) | (vi) |
|---|---|---|---|---|---|---|
| $S^{N}_{y,1}(f)$ | 0.8994 | 0.5000 | 0.4893 | 0.9963 | 0.2609 | 0.7221 |
| $S^{N}_{y,2}(f)$ | 0.1006 | 0.5000 | 0.5107 | 0.0037 | 0.7391 | 0.2779 |
| $\sum_{i=1}^{2} S^{N}_{y,i}(f)$ | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |

Table 3.5 lists the normalized mean generalization error and the mean SSPCC within the k-fold cross validation method w.r.t. a kriging low-fidelity model. Compared to the TPS RBF and the monomial polynomial, the kriging low-fidelity model shows a relatively high mean SSPCC for all test functions except (i). Observe that the kriging low-fidelity model is capable of recovering the values and the shape of the modified Branin test function with a high degree of accuracy.

However, all data-fit low-fidelity models exhibit some difficulties in recovering the values of (iv) the Rosenbrock function. This observation indicates a need to normalize the test functions' values. Thus, let us automatically determine the decimal power of each test function's maximal value and normalize all of the test function's values with respect to this decimal power. Note that shape-related entities such as the SSPCC or the normalized global first-order sensitivity measures are not affected by such a normalization of the test function's values.
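As a minimal sketch of the normalization step, the following Python snippet assumes that the "decimal power" is $p := \lfloor \log_{10} \max_i |y_i| \rfloor$ of the high-fidelity values and that high- and low-fidelity values are divided by the same factor $10^{p}$; both conventions are assumptions. It also checks numerically that the squared sample Pearson correlation, i.e., the shape measure behind the SSPCC, is unchanged. The helper names and the sample values are hypothetical.

```python
import math
import numpy as np

def normalize_by_decimal_power(y_hf, y_lf):
    """Divide high- and low-fidelity values by 10**p, where p is the decimal
    power of the largest absolute high-fidelity value (assumed convention)."""
    y_hf = np.asarray(y_hf, dtype=float)
    y_lf = np.asarray(y_lf, dtype=float)
    peak = np.max(np.abs(y_hf))
    p = math.floor(math.log10(peak)) if peak > 0.0 else 0
    return y_hf / 10.0**p, y_lf / 10.0**p, p

def sspcc(y, y_tilde):
    """Squared sample Pearson correlation coefficient between y and y_tilde."""
    return float(np.corrcoef(y, y_tilde)[0, 1] ** 2)

# Hypothetical high-fidelity values and surrogate predictions
y = np.array([3.0, 250.0, 1.2e3, 3.6e3])
y_tilde = np.array([2.5, 260.0, 1.1e3, 3.7e3])
y_n, y_tilde_n, p = normalize_by_decimal_power(y, y_tilde)
print(p)                                         # 3
print(sspcc(y, y_tilde), sspcc(y_n, y_tilde_n))  # identical up to rounding
```

Since the Pearson correlation is invariant under a common positive scaling, any choice of $p$ leaves the shape measure untouched; only value-based errors change.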

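In Figure 3.12, Figure 3.13, and Figure 3.14, let us extend in a self-explanatory manner the definition of $e^{N}_{H,\mathrm{sg}}(\hat{Q}_{\xi})|_{k}$ in (3.28) (and likewise the definition of $e^{NR}_{H,\mathrm{sg}}(\hat{Q}_{\xi})|_{k}$) and the definition of $r^{2}_{y\tilde{y}}|_{k}$ in (3.33) in order to plot the corresponding paths, denoted in the figures by $e^{N}_{\mathrm{cv}}(k,m_g)$, $e^{NR}_{\mathrm{cv}}(k,m_g)$, and $r^{2}_{y\tilde{y},\mathrm{cv}}(k,m_g)$, against the number of testing points $m_g$.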

Observe that the number of testing points is truncated at $m_g := 10$, which determines the total number of points $m$ via $m := k m_g$: if $k := 10$, then $m := 100$. Thus, from an application-driven viewpoint with, e.g., weaker requirements w.r.t. the accuracy, let us regard such a sample size as an upper limit. An immediate potential drawback of such an upper limit is that, in general, one cannot expect to detect the kind of monotonicity that is usually associated with an asymptotic consideration.
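The bookkeeping between $m$, $m_g$, and the number of training points $m_t := (k-1)m_g$ (cf. the captions of Figures 3.12 to 3.14) can be illustrated with a small index partition. This is a sketch under the assumption that the sample is shuffled once and split into $k$ folds of equal size; `k_fold_partition` is a hypothetical helper, not the implementation used in the present work.

```python
import numpy as np

def k_fold_partition(m, k, seed=0):
    """Split m sample indices into k folds with m_g = m // k testing points each.
    Assumes m is a multiple of k, i.e., m := k * m_g."""
    if m % k != 0:
        raise ValueError("m must be a multiple of k (m := k * m_g)")
    rng = np.random.default_rng(seed)
    folds = np.split(rng.permutation(m), k)   # k arrays of length m_g
    splits = []
    for j in range(k):
        test = folds[j]
        # the remaining k - 1 folds form the m_t = (k - 1) * m_g training points
        train = np.concatenate([folds[i] for i in range(k) if i != j])
        splits.append((train, test))
    return splits

splits = k_fold_partition(m=100, k=10)
train, test = splits[0]
print(len(train), len(test))  # 90 10
```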

Inspecting Figure 3.12b, one can observe that the paths associated with the error $e^{NR}_{H,\mathrm{sg}}(\hat{Q}_{\xi})|_{k}$ reflect values that are orders of magnitude higher than the values corresponding to the paths associated with the error $e^{N}_{H,\mathrm{sg}}(\hat{Q}_{\xi})|_{k}$. This observation echoes the more conservative behavior of $e^{NR}_{H,\mathrm{sg}}(\hat{Q}_{\xi})|_{k}$ (cf. § 3.1.1).

Figure 3.12 and Figure 3.14 offer a valuable insight: the continuous deformation of the paths corresponding to the 5-fold cross validation into the paths corresponding to the 10-fold cross validation is relatively marginal, except for (i) the Ackley test function and (v) the Michalewicz test function, which, from an application-driven viewpoint, one can consider extreme use cases as opposed to the common use cases (ii), (iii), (iv), and (vi). Another valuable insight is that if the requirements regarding the values and the shapes of the low-fidelity models are relaxed, then, for the common use cases, the deterministic data-fit low-fidelity models can provide a computationally less expensive alternative to the probabilistic data-fit low-fidelity models.

In Table 3.6, Table 3.7, and Table 3.8, I present the results regarding the normalized global first-order sensitivity measure $S^{N}_{y,i}$ with $i \in \{1,2\}$ evaluated at $f$ w.r.t. the 2-variate monomial polynomial in Figure 3.9b, the radial basis function with thin plate spline assignment in Figure 3.10b, and the kriging low-fidelity model in Figure 3.11b, respectively. In all cases, the sample size $m$ is set to $m := 50$.

In Figure 3.15, let us extend the definition of $S^{N}_{y,i}(f)$ in (3.34) in an obvious way in order to plot the corresponding paths against the number of training points $m_t$.

A valuable insight is that if we focus only on limit considerations, then we are not able to tell some test functions apart (see, e.g., (i), (ii), and (iii) in Figure 3.15). However, the paths furnish us with some hints about the landscapes of the test functions and, therefore, partially help us to tell the test functions apart.

Another valuable insight is that, in the case of the interpolation-focused data-fit low-fidelity models, the entity $S^{N}_{y,i}(f)$ exhibits some kind of correlation with the entity $r^{2}_{y\tilde{y},\mathrm{cv}}(k,m_g)$. This insight drives the conjecture about the trustworthiness of low-fidelity models' normalized global first-order sensitivity measures in (3.35). Mind that this insight should rather be understood as an intriguing starting point for future investigations in which the behavior of regression-focused and interpolation-focused data-fit low-fidelity models is elaborated more thoroughly and in which the number of test functions under consideration is increased substantially. To the best of my knowledge, there is currently a lack of such extensive benchmarking in the style of Figure 3.12, Figure 3.13, Figure 3.14, and Figure 3.15.

Since the elaborations hold without loss of generality w.r.t. the dimensionality $d$, let us dwell briefly on this issue by exploring the Rosenbrock test function on the smaller domain $[-2.0, 2.0]^{d}$ (for the case $d = 2$, see Figure 2.3), where $d \in \{2,3,4,5,6,7\}$. Thus, we explore how the normalized mean generalization error, the mean SSPCC, and the normalized global first-order sensitivity measure evolve with the dimensionality.

Recalling § 2.3.3, the Rosenbrock test function in Table 2.1 immediately permits an arbitrary dimensionality, here $d \in \{2,3,4,5,6,7\}$; more precisely, we use $f_{\mathrm{R}N_{\xi}}$ from (2.52) with $N_{\xi} \equiv d$, where, in each case, the global minimum is at the point $(1,1,\dots,1) \in \mathbb{R}^{d}$. In Table 3.9, I depict the normalized mean generalization error and the mean SSPCC within the $k$-fold cross validation method w.r.t. a kriging low-fidelity model of this generalized version of the Rosenbrock test function, where the sample size $m$ is set to $m := 50$. Essentially, Table 3.9 reveals how strongly the curse of dimensionality kicks in: for instance, the error $e^{N}_{H,\mathrm{sg}}(\hat{Q}_{\xi})|_{k:=10}$ increases by more than four orders of magnitude (from $7.9288\times10^{-7}$ to $1.3579\times10^{-2}$) from $d := 2$ to $d := 7$, and the SSPCC $r^{2}_{y\tilde{y}}|_{k:=10}$ drops by almost 70% (from 0.9996 to 0.3013) from $d := 2$ to $d := 7$. Geometrically, these observations translate into shifts of the corresponding paths in Figure 3.13 and Figure 3.14. Thus, the findings in Table 3.9 can be conceived as a worst-case estimate of the evolution of the normalized mean generalization error and the mean SSPCC with the dimensionality regarding the findings in Figure 3.13 and Figure 3.14.
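For reference, a minimal Python sketch of the generalized Rosenbrock function follows. It assumes the standard chained form $\sum_{i=1}^{d-1} \left[100\,(x_{i+1}-x_i^2)^2 + (1-x_i)^2\right]$, whose global minimum $0$ lies at $(1,\dots,1)$; whether (2.52) uses exactly this scaling is an assumption. The usage lines merely print how the largest sampled value on $[-2.0,2.0]^{d}$ grows with $d$, which motivates reporting normalized values; the uniform random sample of 50 points is a hypothetical stand-in for the sampling plan.

```python
import numpy as np

def rosenbrock(x):
    """Standard d-variate Rosenbrock function (assumed form of (2.52))."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2))

rng = np.random.default_rng(0)
for d in range(2, 8):
    # Uniform random sample of m = 50 points in [-2, 2]^d (assumed sampling plan)
    X = rng.uniform(-2.0, 2.0, size=(50, d))
    values = np.array([rosenbrock(x) for x in X])
    print(d, rosenbrock(np.ones(d)), values.max())
```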

TABLE 3.9: Given the sample size $m := 50$, the normalized mean generalization error $e^{N}_{H,\mathrm{sg}}(\hat{Q}_{\xi})$ and the mean SSPCC $r^{2}_{y\tilde{y}}|_{k}$ within the $k$-fold cross validation method w.r.t. a kriging low-fidelity model of the Rosenbrock test function in Table 2.1 (with normalized values) generalized to the domain $[-2.0, 2.0]^{d}$ with $d \in \{2,3,4,5,6,7\}$.

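Dimensionality $d$ | $e^{N}_{H,\mathrm{sg}}(\hat{Q}_{\xi})|_{k:=5}$ | $r^{2}_{y\tilde{y}}|_{k:=5}$ | $e^{N}_{H,\mathrm{sg}}(\hat{Q}_{\xi})|_{k:=10}$ | $r^{2}_{y\tilde{y}}|_{k:=10}$
2 | $1.5808\times10^{-6}$ | 0.9998 | $7.9288\times10^{-7}$ | 0.9996
3 | $3.2861\times10^{-4}$ | 0.9652 | $9.0496\times10^{-5}$ | 0.9915
4 | $4.7462\times10^{-3}$ | 0.6747 | $3.5632\times10^{-3}$ | 0.6093
5 | $6.5692\times10^{-3}$ | 0.6483 | $5.7308\times10^{-3}$ | 0.6387
6 | $9.7461\times10^{-3}$ | 0.2742 | $8.7787\times10^{-3}$ | 0.3653
7 | $1.3586\times10^{-2}$ | 0.0393 | $1.3579\times10^{-2}$ | 0.3013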

Determining the normalized mean generalization error $e^{N}_{H,\mathrm{sg}}(\hat{Q}_{\xi})|_{k:=5}$ for the dimensionality $d := 7$ takes roughly 23 times longer than for the dimensionality $d := 2$. Furthermore, determining $e^{N}_{H,\mathrm{sg}}(\hat{Q}_{\xi})|_{k:=10}$ for $d := 7$ takes roughly 17 times longer than for $d := 2$. In both cases, the longer computation time for $d := 7$ is primarily due to the more involved optimization in (3.97). However, these factors should merely be conceived as rough estimates or rough proxies of


(A) $e^{N}_{\mathrm{cv}}(5,m_g)$ vs. $m_g$. (B) $e^{NR}_{\mathrm{cv}}(5,m_g)$ vs. $m_g$.

FIGURE 3.12: The value of $e^{N}_{\mathrm{cv}}$ and $e^{NR}_{\mathrm{cv}}$ evaluated at the number of folds $k$ and the number of testing points $m_g$ w.r.t. the test functions in Figure 2.2 with normalized values. The number $m_g$ corresponds to the number $m_t$ by $m_t := (k-1)m_g$ and to the number $m$ by $m := k m_g$. The data-fit low-fidelity models are ( ) a 2-variate monomial polynomial via regression, ( ) a TPS RBF, and ( ) a kriging model. For (ii) and (iii), $e^{N}_{\mathrm{cv}}(k,m_g)$ w.r.t. ( ) is below machine precision.


(A) $e^{N}_{\mathrm{cv}}(5,m_g)$ vs. $m_g$. (B) $e^{N}_{\mathrm{cv}}(10,m_g)$ vs. $m_g$.

FIGURE 3.13: The value of $e^{N}_{\mathrm{cv}}$ evaluated at the number of folds $k$ and the number of testing points $m_g$ w.r.t. the test functions in Figure 2.2 with normalized values. The number $m_g$ corresponds to the number $m_t$ by $m_t := (k-1)m_g$ and to the number $m$ by $m := k m_g$. The data-fit low-fidelity models are ( ) a 2-variate monomial polynomial via regression, ( ) a TPS RBF, and ( ) a kriging model. For (ii) and (iii), $e^{N}_{\mathrm{cv}}(k,m_g)$ w.r.t. ( ) is below machine precision.


(A) $r^{2}_{y\tilde{y},\mathrm{cv}}(5,m_g)$ vs. $m_g$. (B) $r^{2}_{y\tilde{y},\mathrm{cv}}(10,m_g)$ vs. $m_g$.

FIGURE 3.14: The value of $r^{2}_{y\tilde{y},\mathrm{cv}}$ evaluated at the number of folds $k$ and the number of testing points $m_g$ w.r.t. the test functions in Figure 2.2. The number $m_g$ corresponds to the number $m_t$ by $m_t := (k-1)m_g$ and to the number $m$ by $m := k m_g$. The data-fit low-fidelity models are ( ) a 2-variate monomial polynomial via regression, ( ) a TPS RBF, and ( ) a kriging model.


(A) $S^{N}_{y,1}(f,m_t)$ vs. $m_t$. (B) $S^{N}_{y,2}(f,m_t)$ vs. $m_t$.

FIGURE 3.15: The value of $S^{N}_{y,i}$ with $i \in \{1,2\}$ evaluated at $f$ w.r.t. the test functions in Figure 2.2 and the number of training points $m_t$. The data-fit low-fidelity models are ( ) a 2-variate monomial polynomial via regression, ( ) a TPS RBF, and ( ) a kriging model. The gray dotted line indicates the position $m_t := 12$ and the gray dashed line indicates the position $m_t := 27$. The thick black line refers to the value of $S^{N}_{i}(f)$ of the respective test function in Table 2.2.


the overhead in the optimization in (3.97) due to the increase in the dimensionality.

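TABLE 3.10: The normalized global first-order sensitivity measure $S^{N}_{y,i}(f)$ evaluated at $f_{\mathrm{R}N_{\xi}}$ from (2.52) w.r.t. the data in Table 3.9.

$d$ | $i:=1$ | $i:=2$ | $i:=3$ | $i:=4$ | $i:=5$ | $i:=6$ | $i:=7$ | $\sum_{i=1}^{d} S^{N}_{y,i}(f)$
2 | 0.9081 | 0.0919 | n/a | n/a | n/a | n/a | n/a | 1.0000
3 | 0.4125 | 0.5475 | 0.0400 | n/a | n/a | n/a | n/a | 1.0000
4 | 0.2004 | 0.4122 | 0.3568 | 0.0306 | n/a | n/a | n/a | 1.0000
5 | 0.2133 | 0.2161 | 0.2386 | 0.3227 | 0.0093 | n/a | n/a | 1.0000
6 | 0.1228 | 0.1842 | 0.2792 | 0.3246 | 0.0861 | 0.0031 | n/a | 1.0000
7 | 0.0614 | 0.2896 | 0.0592 | 0.1496 | 0.0158 | 0.2985 | 0.1259 | 1.0000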

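TABLE 3.11: Using the reference values in Table 2.4, the low-fidelity models' normalized global first-order sensitivity measures (LFSM) error in (3.37) evaluated at $f_{\mathrm{R}N_{\xi}}$ from (2.52) w.r.t. the data in Table 3.9.

$d$ | $e_{m\infty}(S^{N}_{y,1})$ | $e_{m\infty}(S^{N}_{y,2})$ | $e_{m\infty}(S^{N}_{y,3})$ | $e_{m\infty}(S^{N}_{y,4})$ | $e_{m\infty}(S^{N}_{y,5})$ | $e_{m\infty}(S^{N}_{y,6})$ | $e_{m\infty}(S^{N}_{y,7})$
2 | +0.0031 | −0.0314 | n/a | n/a | n/a | n/a | n/a
3 | −0.0292 | +0.0223 | −0.0204 | n/a | n/a | n/a | n/a
4 | +0.2199 | −0.1482 | +0.0061 | −0.2191 | n/a | n/a | n/a
5 | −0.1274 | +0.1817 | +0.0966 | −0.2219 | +0.4973 | n/a | n/a
6 | +0.1780 | +0.1187 | −0.3359 | −0.5531 | +0.5880 | +0.7877 | n/a
7 | +0.5044 | −0.6759 | +0.6574 | +0.1343 | +0.9086 | −0.7274 | −9.405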

In Table 3.10, I present the normalized global first-order sensitivity measure $S^{N}_{y,i}(f)$ evaluated at $f_{\mathrm{R}N_{\xi}}$ from (2.52) w.r.t. the data in Table 3.9. Additionally, in Table 3.11, I present the LFSM error in (3.37) w.r.t. the reference values in Table 2.4. The increase of the unsigned LFSM error in Table 3.11 from $d := 2$ to $d := 7$ correlates with the decrease of the SSPCC in Table 3.9 from $d := 2$ to $d := 7$. Hence, this relationship supports, in higher dimensions, the similar observations made in Figure 3.15 and Figure 3.14 for the case $d := 2$.

A benefit of the previously discussed tables and figures with regard to the deterministic and probabilistic data-fit low-fidelity models is that they provide some quantitative hints about the values and the shape (or landscape) associated with the high-fidelity models in the physics-oriented context of the applications in the present work, such that one can roughly assign the behavior of these high-fidelity models to one or more test functions from Table 2.1.

Thus, this by no means complete list of indicators or properties (i.e., tables and figures) enables a rough classification of the behaviors of application-driven high-fidelity models. More interestingly, if a high-fidelity model does not possess certain properties of a selected function from the list of test functions, then one can make an educated guess that the high-fidelity model is probably not a representative of that function.

However, there are a couple of caveats regarding such a benchmark-focused classification. First, it is not clear whether a reliable complete list of indicators or properties exists. Second, it is not clear how large the list of test functions has to be. Nevertheless, from an application-oriented viewpoint as well as from a theory-oriented viewpoint, such an attempt at a benchmark-focused classification is a worthwhile endeavor.

To conclude the discussion with respect to the data-fit low-fidelity models, let us

perform a surrogate-based optimization concerning the modified Branin test func-

tion according to the following schematic procedure that I refer to as SBO-DFLF:

1) create a sample w.r.t. the high-fidelity model (§ 3.1.1);

2) construct a data-fit low-fidelity model (§ 3.1.2) w.r.t. the sample from step 1);

3) invoke a global optimization algorithm (§ 2.3.3) w.r.t. the constructed data-fit low-fidelity model from step 2);

4) use the minimizer from step 3) as a starting point for a local optimization algorithm (§ 2.3.3) w.r.t. the high-fidelity model from step 1).
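The four steps of SBO-DFLF can be sketched in Python with SciPy. This is a minimal sketch under several assumptions: the modified Branin function is taken in the common variant with an additional $5x_1$ term on $[-5,10]\times[0,15]$ (global minimum $\approx -16.644$ at $\approx(-3.689, 13.630)$), which may differ from the definition in Table 2.1; a simple Latin hypercube stands in for the sampling plan of § 3.1.1; SciPy's `RBFInterpolator` with `kernel="thin_plate_spline"` stands in for the TPS RBF; and SciPy's `differential_evolution`, which is not an adaptive DE variant, stands in for the ADE algorithm. The helpers `CountingModel`, `latin_hypercube`, and `sbo_dflf` are hypothetical names.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator
from scipy.optimize import differential_evolution, minimize

BOUNDS = [(-5.0, 10.0), (0.0, 15.0)]   # assumed box constraints of the modified Branin function

def modified_branin(x):
    """Assumed modified Branin variant (Branin plus 5*x1); min approx. -16.644 at (-3.689, 13.630)."""
    x1, x2 = x
    b, c, t = 5.1 / (4.0 * np.pi**2), 5.0 / np.pi, 1.0 / (8.0 * np.pi)
    return (x2 - b * x1**2 + c * x1 - 6.0) ** 2 + 10.0 * (1.0 - t) * np.cos(x1) + 10.0 + 5.0 * x1

class CountingModel:
    """Wraps the expensive high-fidelity model and counts its evaluations."""
    def __init__(self, f):
        self.f = f
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return float(self.f(np.asarray(x, dtype=float)))

def latin_hypercube(m, bounds, rng):
    """Simple Latin hypercube sampling plan: one point per stratum and dimension."""
    d = len(bounds)
    strata = np.column_stack([rng.permutation(m) for _ in range(d)])
    u = (strata + rng.random((m, d))) / m          # points in [0, 1)^d
    lo, hi = np.array(bounds).T
    return lo + u * (hi - lo)

def sbo_dflf(hf, m=10, bounds=BOUNDS, seed=0):
    rng = np.random.default_rng(seed)
    # 1) sample the high-fidelity model
    X = latin_hypercube(m, bounds, rng)
    y = np.array([hf(x) for x in X])
    # 2) construct a data-fit low-fidelity model (TPS RBF)
    lf = RBFInterpolator(X, y, kernel="thin_plate_spline")
    # 3) global optimization on the cheap low-fidelity model
    de = differential_evolution(lambda x: float(lf(x[None, :])[0]), bounds, seed=seed)
    # 4) local Nelder-Mead on the high-fidelity model, started at the surrogate minimizer
    nm = minimize(hf, de.x, method="Nelder-Mead",
                  options={"adaptive": True, "xatol": 1e-4, "fatol": 1e-4})
    return de.x, nm

hf = CountingModel(modified_branin)
x_start, result = sbo_dflf(hf, m=20)
print(x_start, result.x, result.fun, hf.calls)
```

Because the wrapper counts every high-fidelity call, `hf.calls` amounts to $m$ plus the evaluations spent by Nelder-Mead, i.e., the quantity that is split into $m$ and $it_{\mathrm{NMS}}$ when counting high-fidelity evaluations.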

Notice that the proposed schematic procedure SBO-DFLF builds upon the com-

mon canon of optimization algorithms (recall § 2.3.3) such as the Nelder-Mead sim-

plex (NMS) algorithm and the adaptive differential evolution (ADE) algorithm.

Since the core mantra of the present work is that a function evaluation of the high-fidelity model is computationally expensive, let us count the number of high-fidelity function evaluations needed to find the modified Branin test function's global minimizer and its corresponding value presented in Table 2.1 within a certain accuracy.

We start with ten function evaluations because the sample size $m$ is set to $m := 10$, as in Figure 3.9a, Figure 3.10a, and Figure 3.11a, in order to construct the respective data-fit low-fidelity model.

Next, let us invoke an adaptive differential evolution (ADE) algorithm to determine the global minimizer and its corresponding value of the respective data-fit low-fidelity model. Mind that we only consider box constraints. Depending on the degree of rigor needed for the task at hand, a guaranteed deterministic global optimization algorithm using interval arithmetic can additionally be employed in order to certify the result of the ADE algorithm, i.e., the global minimizer and its corresponding value of the respective data-fit low-fidelity model.

Finally, let us use the minimizer from the ADE algorithm as a starting point for a Nelder-Mead simplex (NMS) algorithm with regard to the modified Branin test function. Note that the NMS algorithm from (Opkg3) is invoked instead of the NMS algorithm from (Opkg1) since, in addition to the common classic implementation, it provides a modern implementation in the sense that, e.g., it defaults to an adaptive scheme for the tuning parameters and it focuses on keeping the number of necessary function evaluations low.24
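One widely used adaptive scheme for the Nelder-Mead tuning parameters is that of Gao and Han (2012), in which the reflection, expansion, contraction, and shrink coefficients depend on the dimension $n$; SciPy's `minimize(..., method="Nelder-Mead", options={"adaptive": True})` uses these coefficients. Whether (Opkg3) uses exactly this scheme is not stated here, so the following Python sketch is only illustrative; `adaptive_nelder_mead_parameters` is a hypothetical helper.

```python
def adaptive_nelder_mead_parameters(n):
    """Gao-Han (2012) coefficients for dimension n >= 2:
    (reflection, expansion, contraction, shrink)."""
    if n < 2:
        raise ValueError("the adaptive scheme is intended for n >= 2")
    reflection = 1.0
    expansion = 1.0 + 2.0 / n
    contraction = 0.75 - 1.0 / (2.0 * n)
    shrink = 1.0 - 1.0 / n
    return reflection, expansion, contraction, shrink

for n in (2, 5, 20):
    print(n, adaptive_nelder_mead_parameters(n))
```

For $n = 2$, as for the modified Branin function, the scheme reduces to the classic coefficients $(1, 2, 1/2, 1/2)$, so the adaptivity matters mainly in higher dimensions.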

In Table 3.12, I present the results from the proposed schematic surrogate-based optimization procedure SBO-DFLF. The quantity $e^{r}_{x^{*}}$ refers to the relative error with respect to the global minimizer listed in Table 2.1, and the quantity $e^{r}_{f(x^{*})}$ refers to the relative error with respect to the global minimizer's function value, also listed in Table 2.1. The quantity $it_{\mathrm{NMS}}$ refers to the total number of function evaluations in the NMS algorithm.

24 For more details on the implementation of the NMS algorithm from (Opkg1) and the NMS algorithm from (Opkg3), I refer to the respective package documentation and references therein.


TABLE 3.12: Building upon Figure 3.9a, Figure 3.10a, and Figure 3.11a, surrogate-based optimization according to the proposed schematic procedure SBO-DFLF w.r.t. the modified Branin function using data-fit low-fidelity models.

Low-fidelity model | $m$ | $it_{\mathrm{NMS}}$ | $e^{r}_{x^{*}}$ | $e^{r}_{f(x^{*})}$
Polynomial | 10 | 11 | 0.0086 | 0.0018
TPS RBF | 20 | 17 | 0.0703 | 0.0400
Kriging | 10 | 11 | 0.0067 | 0.0023

Table 3.12 reveals that the procedures based on the monomial polynomial model and on the kriging low-fidelity model consume in total $10 + 11 = 21$ function evaluations of the high-fidelity model in order to produce relative errors $e^{r}_{x^{*}}$ and $e^{r}_{f(x^{*})}$ below one percent. Choosing the thin plate spline radial basis function requires increasing the sample size, since otherwise the TPS RBF is not capable of producing a sufficiently good starting point. Ultimately, the procedure based on the TPS RBF consumes in total $20 + 17 = 37$ function evaluations of the high-fidelity model in order to produce relative errors $e^{r}_{x^{*}}$ and $e^{r}_{f(x^{*})}$ below eight percent.
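The totals above are simply $m + it_{\mathrm{NMS}}$ from Table 3.12. The following Python sketch reproduces this bookkeeping and shows one way to compute relative errors such as $e^{r}_{x^{*}}$ and $e^{r}_{f(x^{*})}$; the use of the Euclidean norm, $\lVert x - x^{*}\rVert_2 / \lVert x^{*}\rVert_2$ (and of the absolute value for function values), is an assumption, and `relative_error` is a hypothetical helper.

```python
import numpy as np

def relative_error(approx, reference):
    """Relative error ||approx - reference|| / ||reference|| (Euclidean norm assumed)."""
    approx = np.atleast_1d(np.asarray(approx, dtype=float))
    reference = np.atleast_1d(np.asarray(reference, dtype=float))
    return float(np.linalg.norm(approx - reference) / np.linalg.norm(reference))

# Total high-fidelity evaluations: sample size m plus Nelder-Mead evaluations it_NMS
table_3_12 = {"Polynomial": (10, 11), "TPS RBF": (20, 17), "Kriging": (10, 11)}
totals = {name: m + it for name, (m, it) in table_3_12.items()}
print(totals)  # {'Polynomial': 21, 'TPS RBF': 37, 'Kriging': 21}
```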

In practical applications, there is usually no a priori knowledge about the minimizers of a high-fidelity model. Hence, the proposed schematic surrogate-based optimization procedure SBO-DFLF offers an approach to finding a global minimizer of a high-fidelity model relatively quickly and sufficiently accurately.

An advantage of Table 3.12 is that it serves as a quantitative hint at the potential quality of a surrogate-based optimization procedure, which is useful for the assessment of a surrogate-guided optimization procedure.

3.2.2 Optimization with test functions by emulated simplified-physics low-fidelity models

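Finally, let us emulate simplified-physics low-fidelity models for the modified Branin test function via the assignment rule

$$\tilde{K}(x) := \gamma +_{\mathbb{R}} \delta \cdot_{\mathbb{R}} K(\alpha +_{\mathbb{R}^{2}} \beta \odot x), \tag{3.105}$$

where $\gamma, \delta \in \mathbb{R}$ and $\alpha, \beta \in \mathbb{R}^{2\times 1}$ denote adjustment parameters and the map $\odot$ denotes the Hadamard product, i.e., the element-wise product.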
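A direct Python transcription of (3.105) helps to see what each adjustment parameter does: $\alpha$ shifts and $\beta$ scales the argument of the high-fidelity model $K$, while $\gamma$ shifts and $\delta$ scales its value. The sketch assumes the common modified Branin variant with an additional $5x_1$ term (an assumption about Table 2.1) and encodes the 4-tuples of Table 3.13; `emulated_low_fidelity` is a hypothetical name.

```python
import numpy as np

def modified_branin(x):
    """Modified Branin function (assumed variant with an added 5*x1 term)."""
    x1, x2 = x
    b, c, t = 5.1 / (4.0 * np.pi**2), 5.0 / np.pi, 1.0 / (8.0 * np.pi)
    return (x2 - b * x1**2 + c * x1 - 6.0) ** 2 + 10.0 * (1.0 - t) * np.cos(x1) + 10.0 + 5.0 * x1

def emulated_low_fidelity(K, alpha, beta, gamma, delta):
    """Return the map x -> gamma + delta * K(alpha + beta ⊙ x) from (3.105)."""
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)

    def K_tilde(x):
        # beta * x is the Hadamard (element-wise) product for NumPy arrays
        return gamma + delta * K(alpha + beta * np.asarray(x, dtype=float))

    return K_tilde

# 4-tuples (alpha, beta, gamma, delta) from Table 3.13
TABLE_3_13 = {
    "3.16a": {
        "i": ([0.0, 10.0], [1.0, 1.0], 0.0, 1.0),
        "ii": ([10.0, 0.0], [1.0, 1.0], 0.0, 1.0),
        "iii": ([10.0, 10.0], [1.0, 1.0], 0.0, 1.0),
        "iv": ([0.0, 0.0], [0.5, 1.0], 0.0, 1.0),
        "v": ([0.0, 0.0], [1.0, 0.5], 0.0, 1.0),
        "vi": ([0.0, 0.0], [1.5, 1.5], 0.0, 1.0),
    },
    "3.16b": {
        "i": ([0.0, 0.0], [1.0, 1.0], 1.0e3, 1.0),
        "ii": ([10.0, 10.0], [1.0, 1.0], 1.0e3, 1.0),
        "iii": ([0.0, 0.0], [1.5, 1.5], 1.0e3, 1.0),
        "iv": ([0.0, 0.0], [1.0, 1.0], 0.0, 1.0e3),
        "v": ([10.0, 10.0], [1.0, 1.0], 0.0, 1.0e3),
        "vi": ([0.0, 0.0], [1.5, 1.5], 0.0, 1.0e3),
    },
}

x = np.array([-3.689, 13.630])
for column, cases in TABLE_3_13.items():
    for case, params in cases.items():
        print(column, case, emulated_low_fidelity(modified_branin, *params)(x))
```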

TABLE 3.13: The choice of the 4-tuple of parameters $(\alpha,\beta,\gamma,\delta)$ in Figure 3.16a and in Figure 3.16b.

Case | Figure 3.16a | Figure 3.16b
(i) | $([0.0\ 10.0]^{T}, [1.0\ 1.0]^{T}, 0.0, 1.0)$ | $([0.0\ 0.0]^{T}, [1.0\ 1.0]^{T}, 1.0\times10^{3}, 1.0)$
(ii) | $([10.0\ 0.0]^{T}, [1.0\ 1.0]^{T}, 0.0, 1.0)$ | $([10.0\ 10.0]^{T}, [1.0\ 1.0]^{T}, 1.0\times10^{3}, 1.0)$
(iii) | $([10.0\ 10.0]^{T}, [1.0\ 1.0]^{T}, 0.0, 1.0)$ | $([0.0\ 0.0]^{T}, [1.5\ 1.5]^{T}, 1.0\times10^{3}, 1.0)$
(iv) | $([0.0\ 0.0]^{T}, [0.5\ 1.0]^{T}, 0.0, 1.0)$ | $([0.0\ 0.0]^{T}, [1.0\ 1.0]^{T}, 0.0, 1.0\times10^{3})$
(v) | $([0.0\ 0.0]^{T}, [1.0\ 0.5]^{T}, 0.0, 1.0)$ | $([10.0\ 10.0]^{T}, [1.0\ 1.0]^{T}, 0.0, 1.0\times10^{3})$
(vi) | $([0.0\ 0.0]^{T}, [1.5\ 1.5]^{T}, 0.0, 1.0)$ | $([0.0\ 0.0]^{T}, [1.5\ 1.5]^{T}, 0.0, 1.0\times10^{3})$


The assignment in (3.105) is a variation and a generalization to the two-dimensional case of the one-dimensional cases in [49, p. 86] and in [70, p. 195]. In Table 3.13, several choices of the 4-tuple of parameters $(\alpha,\beta,\gamma,\delta)$ are listed; their influence on the low-fidelity model is depicted in Figure 3.16.

In Figure 3.17, I illustrate $\operatorname{grad}(\tilde{K})(x_1,x_2)$ as a projection onto the contour representation of the emulated simplified-physics low-fidelity models.
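Applying the chain rule to (3.105) expresses the gradient of the emulated model through that of the high-fidelity model. This short derivation, added here for illustration, makes explicit that $\gamma$ drops out, $\alpha$ only relocates the evaluation point, and $\beta$ and $\delta$ rescale the gradient:

$$\frac{\partial \tilde{K}}{\partial x_j}(x) = \delta\,\beta_j\,\frac{\partial K}{\partial x_j}(\alpha + \beta \odot x),\quad j \in \{1,2\}, \qquad\text{i.e.,}\qquad \operatorname{grad}(\tilde{K})(x) = \delta\,\bigl(\beta \odot \operatorname{grad}(K)(\alpha + \beta \odot x)\bigr).$$

In particular, for the cases with $\delta = 1.0\times10^{3}$ in the second column of Table 3.13, the gradient is amplified by a factor of $10^{3}$ (times $\beta_j$), which is consistent with the different arrow scalings reported for Figure 3.17b.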

For the sake of completeness, in Table 3.14 and Table 3.15, I provide the normalized mean generalization error $e^{N}_{H,\mathrm{sg}}(\hat{Q}_{\xi})$ and the mean SSPCC $r^{2}_{y\tilde{y}}|_{k}$ within the $k$-fold cross validation method w.r.t. the emulated simplified-physics low-fidelity models in Figure 3.16 with sample size $m := 50$. For all combinations of the adjustment parameters in Table 3.13, the deterioration in the values is understandably large. The deterioration in the shape is largest for those combinations of the adjustment parameters in which the argument of the high-fidelity model is shifted.

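TABLE 3.14: The normalized mean generalization error $e^{N}_{H,\mathrm{sg}}(\hat{Q}_{\xi})$ and the mean SSPCC $r^{2}_{y\tilde{y}}|_{k}$ within the $k$-fold cross validation method w.r.t. emulated simplified-physics low-fidelity models in Figure 3.16a with normalized values and with sample size $m := 50$.

Quantity | (i) | (ii) | (iii) | (iv) | (v) | (vi)
$e^{N}_{H,\mathrm{sg}}(\hat{Q}_{\xi})|_{k:=5}$ | $6.2\times10^{-2}$ | $4.2\times10^{-2}$ | $9.6\times10^{-2}$ | $4.9\times10^{-3}$ | $1.7\times10^{-2}$ | $1.6\times10^{-2}$
$r^{2}_{y\tilde{y}}|_{k:=5}$ | 0.6391 | 0.0391 | 0.1092 | 0.7466 | 0.3434 | 0.8684
$e^{N}_{H,\mathrm{sg}}(\hat{Q}_{\xi})|_{k:=10}$ | $6.2\times10^{-2}$ | $4.2\times10^{-2}$ | $9.6\times10^{-2}$ | $4.9\times10^{-3}$ | $1.7\times10^{-2}$ | $1.6\times10^{-2}$
$r^{2}_{y\tilde{y}}|_{k:=10}$ | 0.6238 | 0.2691 | 0.3007 | 0.7224 | 0.3611 | 0.7565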

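TABLE 3.15: The normalized mean generalization error $e^{N}_{H,\mathrm{sg}}(\hat{Q}_{\xi})$ and the mean SSPCC $r^{2}_{y\tilde{y}}|_{k}$ within the $k$-fold cross validation method w.r.t. emulated simplified-physics low-fidelity models in Figure 3.16b with normalized values and with sample size $m := 50$.

Quantity | (i) | (ii) | (iii) | (iv) | (v) | (vi)
$e^{N}_{H,\mathrm{sg}}(\hat{Q}_{\xi})|_{k:=5}$ | $1.8\times10^{-1}$ | $1.4\times10^{-1}$ | $8.6\times10^{-2}$ | $<2.2\times10^{-16}$ | $9.6\times10^{-2}$ | $1.6\times10^{-2}$
$r^{2}_{y\tilde{y}}|_{k:=5}$ | 1.0 | 0.1558 | 0.8367 | 1.0 | 0.1558 | 0.8367
$e^{N}_{H,\mathrm{sg}}(\hat{Q}_{\xi})|_{k:=10}$ | $1.8\times10^{-1}$ | $1.4\times10^{-1}$ | $8.6\times10^{-2}$ | $<2.2\times10^{-16}$ | $9.6\times10^{-2}$ | $1.6\times10^{-2}$
$r^{2}_{y\tilde{y}}|_{k:=10}$ | 1.0 | 0.3522 | 0.7545 | 1.0 | 0.3522 | 0.7545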

Let us skip a discussion of the sensitivity measures since the procedure is analogous to the corresponding procedure in the previous section. However, an important insight from Table 3.14 and Table 3.15 is that if the mean SSPCC $r^{2}_{y\tilde{y}}|_{k}$ is above a threshold of 0.75, then it might be useful to adapt the proposed procedure SBO-DFLF to the case of simplified-physics low-fidelity models as well. Note that the threshold of 0.75 is rather user-dependent. Based on anecdotal evidence, however, the authors in [70, p. 37] allege that an SSPCC above 0.80 corresponds to a good low-fidelity model in terms of predictive power.


(A) Panels (i) to (vi) in contour representation (horizontal axis from −5 to 10). The concrete 4-tuples of parameters $(\alpha,\beta,\gamma,\delta)$ are listed in the first column of Table 3.13. The red cross refers to the global minimum of the high-fidelity model in Figure 2.2b.

(B) Panels (i) to (vi) in contour representation (horizontal axis from −5 to 10). The concrete 4-tuples of parameters $(\alpha,\beta,\gamma,\delta)$ are listed in the second column of Table 3.13. The red cross refers to the global minimum of the high-fidelity model in Figure 2.2b.

FIGURE 3.16: Emulated simplified-physics low-fidelity models for the modified Branin test function (i.e., the high-fidelity model) in Figure 2.2 via the assignment rule in (3.105) (solely in contour representation).


(A) Panels (i) to (vi): depicting $\operatorname{grad}(\tilde{K})(x_1,x_2)$ within Figure 3.16a (horizontal axis from −5 to 10). Same scaling from (i) to (vi).

(B) Panels (i) to (vi): depicting $\operatorname{grad}(\tilde{K})(x_1,x_2)$ within Figure 3.16b (horizontal axis from −5 to 10). Same scaling from (i) to (iii) and from (iv) to (vi). The scaling in (iv), (v), and (vi) is $1\times10^{-3}$ of the scaling in (i), (ii), and (iii).

FIGURE 3.17: Depicting $\operatorname{grad}(\tilde{K})(x_1,x_2)$ as a projection onto the contour representation of the emulated simplified-physics low-fidelity models for the modified Branin test function in Figure 3.16.


Let us apply the following schematic procedure that I refer to as SBO-SPLF:

1) create a sample w.r.t. the high-fidelity model (§ 3.1.1);

2) provide a simplified-physics low-fidelity model (§ 3.1.3) and compute $r^{2}_{y\tilde{y}}|_{k}$ w.r.t. the sample from step 1); if $r^{2}_{y\tilde{y}}|_{k} < 0.75$, break off the procedure, otherwise continue the procedure;

3) invoke a global optimization algorithm (§ 2.3.3) w.r.t. the provided simplified-physics low-fidelity model from step 2);

4) use the minimizer from step 3) as a starting point for a local optimization algorithm (§ 2.3.3) w.r.t. the high-fidelity model from step 1).
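Analogously, SBO-SPLF can be sketched in Python with SciPy. The sketch assumes that, for a low-fidelity model without training, $r^{2}_{y\tilde{y}}|_{k}$ is the mean over $k$ folds of the squared sample Pearson correlation between high- and low-fidelity values (an assumed reading of (3.33)). It further uses a uniform random sample with $m := 15$ and $k := 5$, the common modified Branin variant with an additional $5x_1$ term as high-fidelity model (an assumption about Table 2.1), and SciPy's `differential_evolution` as a stand-in for ADE. The names `mean_sspcc_k_fold` and `sbo_splf` are hypothetical.

```python
import numpy as np
from scipy.optimize import differential_evolution, minimize

BOUNDS = [(-5.0, 10.0), (0.0, 15.0)]   # assumed domain of the modified Branin function

def modified_branin(x):
    """Modified Branin function (assumed variant with an added 5*x1 term)."""
    x1, x2 = x
    b, c, t = 5.1 / (4.0 * np.pi**2), 5.0 / np.pi, 1.0 / (8.0 * np.pi)
    return (x2 - b * x1**2 + c * x1 - 6.0) ** 2 + 10.0 * (1.0 - t) * np.cos(x1) + 10.0 + 5.0 * x1

def mean_sspcc_k_fold(y_hf, y_lf, k):
    """Mean over k folds of the squared sample Pearson correlation coefficient."""
    folds = np.array_split(np.arange(len(y_hf)), k)
    return float(np.mean([np.corrcoef(y_hf[f], y_lf[f])[0, 1] ** 2 for f in folds]))

def sbo_splf(hf, lf, m=15, k=5, threshold=0.75, bounds=BOUNDS, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds).T
    # 1) sample the high-fidelity model (uniform random sampling plan for brevity)
    X = lo + rng.random((m, len(bounds))) * (hi - lo)
    y_hf = np.array([hf(x) for x in X])
    # 2) evaluate the simplified-physics model and check its trustworthiness
    y_lf = np.array([lf(x) for x in X])
    r2 = mean_sspcc_k_fold(y_hf, y_lf, k)
    if r2 < threshold:
        return None, r2          # break off, e.g., fall back to SBO-DFLF
    # 3) global optimization on the low-fidelity model; a small relative tolerance
    #    because an offset gamma makes SciPy's relative stopping criterion loose
    de = differential_evolution(lambda x: float(lf(x)), bounds, seed=seed, tol=1e-6)
    # 4) local Nelder-Mead on the high-fidelity model from the low-fidelity minimizer
    nm = minimize(lambda x: float(hf(x)), de.x, method="Nelder-Mead",
                  options={"adaptive": True})
    return nm, r2

# Emulated low-fidelity model, case (i) of Figure 3.16b: gamma = 1.0e3, no shift
lf = lambda x: 1.0e3 + modified_branin(x)
result, r2 = sbo_splf(modified_branin, lf)
print(r2, None if result is None else (result.x, result.fun))
```

Returning `None` at step 2) is the natural hook for the fallback branch to SBO-DFLF mentioned below.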

Similarly to the proposed procedure SBO-DFLF, the proposed schematic proce-

dure SBO-SPLF builds upon the common canon of optimization algorithms such

as the Nelder-Mead simplex (NMS) algorithm and the adaptive differential evolu-

tion (ADE) algorithm.

Depending on the desired degree of scrutiny and rigor, it is also conceivable to combine the proposed procedures SBO-DFLF and SBO-SPLF in sequence or in parallel and to compare, e.g., the respective minimizers that serve as starting points. Furthermore, the procedure SBO-DFLF is well suited as a fallback branch for the procedure SBO-SPLF in the case that $r^{2}_{y\tilde{y}}|_{k}$ is below the threshold 0.75.

Finally, let us observe that, in order to determine the quantities $e^{N}_{H,\mathrm{sg}}(\hat{Q}_{\xi})|_{k}$ and $r^{2}_{y\tilde{y}}|_{k}$ in Table 3.14 and Table 3.15, 50 evaluations of the simplified-physics low-fidelity model and 50 evaluations of the high-fidelity model are needed. Thus, in practical applications, the sample size has to be lowered to, e.g., 10, which corresponds to 10 evaluations of the high-fidelity model and 10 evaluations of the simplified-physics low-fidelity model. However, in order to compute $r^{2}_{y\tilde{y}}|_{k}$ with, e.g., $k := 5$, at least 15 evaluations of the high-fidelity model and 15 evaluations of the simplified-physics low-fidelity model are needed, i.e., at least $m_g = 3$ testing points per fold since $m := k m_g$ (recall § 3.1.1).

Under the common assumption that the evaluations of the simplified-physics low-fidelity model are computationally negligible compared to the evaluations of the high-fidelity model, the results from the proposed procedure SBO-SPLF are comparable to results from the proposed procedure SBO-DFLF, such as those in Table 3.12.

3.3 Surrogate-guided optimization

The fundamental philosophy in the present work is to keep the sample size $m$ as low as possible since, with regard to the high-fidelity model, the acquisition of pairs of sampling plan points and output points is computationally expensive.

A reasonable sample size $m$ and a reasonable number of high-fidelity function evaluations for determining the high-fidelity model's optimal solution are unknown a priori. Furthermore, as elucidated in the previous section § 3.2, these numbers are problem-dependent as well.

Suppose that the positive integer number of high-fidelity function evaluations $m_{\mathrm{DSO}}$ w.r.t. a direct solving of a high-fidelity optimization problem is higher than the positive integer number of high-fidelity function evaluations $m_{\mathrm{SBO}}$ w.r.t. a surrogate-based optimization. Then, hopefully, the positive integer number of high-fidelity function evaluations $m_{\mathrm{SGO}}$ w.r.t. a surrogate-guided optimization is lower than $m_{\mathrm{SBO}}$, such that the transitive relation

$$\forall m_{\mathrm{DSO}}, m_{\mathrm{SBO}}, m_{\mathrm{SGO}} \in \mathbb{N}\setminus\{0\}.\; m_{\mathrm{DSO}} > m_{\mathrm{SBO}} \land m_{\mathrm{SBO}} > m_{\mathrm{SGO}} \implies m_{\mathrm{DSO}} > m_{\mathrm{SGO}} \tag{3.106}$$

holds. From an application-driven viewpoint, the additional value of checking the relation

$$\forall m_{\mathrm{SBO}}, m_{\mathrm{SGO}} \in \mathbb{N}\setminus\{0\}.\; m_{\mathrm{SBO}} > m_{\mathrm{SGO}} \tag{3.107}$$

for a given problem is evident, for instance, in the context of validation and verification.
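The transitivity used in (3.106) is elementary; as a sanity check, it can be stated and proved in Lean 4 with the core lemma `Nat.lt_trans`. Note that the restriction to $\mathbb{N}\setminus\{0\}$ is not needed for transitivity itself; the theorem name is hypothetical.

```lean
-- h₂ : mSBO > mSGO unfolds to mSGO < mSBO, and h₁ : mDSO > mSBO to mSBO < mDSO
theorem evaluations_transitive (mDSO mSBO mSGO : Nat)
    (h₁ : mDSO > mSBO) (h₂ : mSBO > mSGO) : mDSO > mSGO :=
  Nat.lt_trans h₂ h₁
```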

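Both the relation in (3.106) and the relation in (3.107) rest on the hidden assumption that the corresponding solving procedures converge to the same optimal solution w.r.t. a user-defined tolerance. Let us encode this hidden assumption by the pre-condition

$$\exists m_{\mathrm{DSO}}, m_{\mathrm{SBO}}, m_{\mathrm{SGO}} \in \mathbb{N}\setminus\{0\}.\; m_{\mathrm{DSO}}, m_{\mathrm{SBO}}, m_{\mathrm{SGO}} > 0, \tag{3.108}$$

and let us encode the implicit order structure in (3.106) by the post-condition

$$\forall m_{\mathrm{DSO}}, m_{\mathrm{SBO}}, m_{\mathrm{SGO}} \in \mathbb{N}\setminus\{0\}.\; m_{\mathrm{DSO}} > m_{\mathrm{SBO}} > m_{\mathrm{SGO}} > 0. \tag{3.109}$$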

In those cases in which the hidden assumption is not satisfied or the relation in (3.107) does not hold, it might be preferable to perform a surrogate-based optimization instead of a surrogate-guided optimization.

However, mind that, generally, one cannot know with certainty in advance whether the relation in (3.107) holds for the task at hand; therefore, from an application-driven viewpoint embedded in the context of validation and verification, it might be reasonable to perform both a surrogate-based optimization and a surrogate-guided optimization for the task at hand.

The basic idea of surrogate-guided optimization is to provide some kind of in-

teraction between a high-fidelity model and a low-fidelity model, that is, to provide

some model management strategy (cf. § 1.2). In the subsequent subsections, let us

focus on the model management strategies adaptation and fusion (cf. § 1.3).

The essential idea underlying adaptation is to exploit information about a low-

fidelity model in each step of the solving procedure regarding the high-fidelity opti-

mization problem in order to adapt the procedure according to the low-fidelity model

information.

The key idea of fusion is to combine or to fuse information about the high-fidelity

model and information about the low-fidelity model into a single model that is ex-

ploited to constitute a proxy for the high-fidelity optimization problem.

For the sake of completeness of the discussion about the kriging low-fidelity

model in § 3.1.2, let us elaborate briefly on the so-called sequential kriging opti-

mization which is a subtype of the model management strategy adaptation.

Next, we discuss different algorithms from the context of the space-mapping

paradigm. Their underlying optimization procedures are subtypes of the model

management strategy adaptation as well.

Finally, we examine the co-kriging optimization that can be conceived as a sub-

type of the model management strategy fusion.


Regarding all optimization approaches under consideration, we elaborate on some convergence-related issues.

3.3.1 Sequential kriging optimization

An important feature of the kriging low-fidelity model in (3.98) is that one can provide a mean squared prediction error $(\hat{s}_y(x))^2$ at an arbitrary point $x$. More precisely, the error $(\hat{s}_y(x))^2$ that is associated with $\hat{y}(x)$ in (3.98) can be stated as

$$(\hat{s}_y(x))^2 := \hat{\sigma}_y^2\left(1 - r^{T}\Psi^{-1}r + \frac{(1 - \mathbf{1}^{T}\Psi^{-1}r)^2}{\mathbf{1}^{T}\Psi^{-1}\mathbf{1}}\right), \tag{3.110}$$

where $\hat{s}_y \colon X \to \mathbb{R}$. Note that the term involving the fraction is negligibly small; thus, let us consider the mean squared prediction error $(\hat{s}_y(x))^2$ as

$$(\hat{s}_y(x))^2 := \hat{\sigma}_y^2\left(1 - r^{T}\Psi^{-1}r\right). \tag{3.111}$$

In addition to the numerical justification for omitting the fraction term in (3.110), the authors in [70, p. 84] provide a method-based justification; more specifically, they argue that, from a Bayesian viewpoint, the fraction term does not appear at all (as in, e.g., [116, p. 280ff]).

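Instead of $(\hat{s}_y(x))^2$, it is common to employ the root mean squared prediction error $\hat{s}_y(x)$ as the measure of the uncertainty in the prediction $\hat{y}(x)$. The error $\hat{s}_y(x)$ is linked with the error $(\hat{s}_y(x))^2$ by

$$\hat{s}_y(x) := \sqrt{\left\lvert(\hat{s}_y(x))^2\right\rvert} \tag{3.112a}$$

$$\equiv \sqrt{\left\lvert\hat{\sigma}_y^2\left(1 - r^{T}\Psi^{-1}r\right)\right\rvert}. \tag{3.112b}$$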

Observe that the error is zero at a sampling plan point. More formally, if we provide an auxiliary map $a_{\hat{s}_y} := x \mapsto r^{T}\Psi^{-1}r \colon X \to \mathbb{R}$, then one can state that

$$\forall i \in \{1,\dots,m\}.\; a_{\hat{s}_y}(x_i) = 1 \implies \forall i \in \{1,\dots,m\}.\; \hat{s}_y(x_i) = 0. \tag{3.113}$$

This observation encodes the intuitive expectation that the prediction is exact at a given sampling plan point at which the corresponding output point is given as well.
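The following Python sketch evaluates (3.110), (3.111), and (3.112) for an ordinary kriging model and checks (3.113) numerically: at a sampling plan point, $r$ coincides with a column of $\Psi$, so $\Psi^{-1}r$ is a unit vector, $r^{T}\Psi^{-1}r = 1$, and $\mathbf{1}^{T}\Psi^{-1}r = 1$, so the fraction in (3.110) vanishes as well. The sketch assumes a Gaussian correlation $\Psi_{ij} = \exp\!\left(-\sum_l \theta_l (x_{il}-x_{jl})^2\right)$ with a fixed, hand-picked $\theta$ instead of maximum likelihood estimation, the usual estimates $\hat{\mu} = \mathbf{1}^{T}\Psi^{-1}y / \mathbf{1}^{T}\Psi^{-1}\mathbf{1}$ and $\hat{\sigma}_y^2 = (y-\mathbf{1}\hat{\mu})^{T}\Psi^{-1}(y-\mathbf{1}\hat{\mu})/m$, and a tiny nugget for numerical stability; whether these match (3.98) in every detail is an assumption, and `OrdinaryKriging` is a hypothetical class.

```python
import numpy as np

def gaussian_correlation(A, B, theta):
    """Correlation matrix with entries exp(-sum_l theta_l * (a_il - b_jl)**2)."""
    diff2 = (A[:, None, :] - B[None, :, :]) ** 2
    return np.exp(-np.sum(theta * diff2, axis=-1))

class OrdinaryKriging:
    def __init__(self, X, y, theta, nugget=1e-12):
        self.X = np.atleast_2d(np.asarray(X, dtype=float))
        self.theta = np.asarray(theta, dtype=float)
        y = np.asarray(y, dtype=float)
        m = len(y)
        self.Psi = gaussian_correlation(self.X, self.X, self.theta) + nugget * np.eye(m)
        self.one = np.ones(m)
        Psi_inv_one = np.linalg.solve(self.Psi, self.one)
        self.one_Psi_inv_one = self.one @ Psi_inv_one
        self.mu = (Psi_inv_one @ y) / self.one_Psi_inv_one      # estimate of the mean
        residual = y - self.mu
        self.weights = np.linalg.solve(self.Psi, residual)
        self.sigma2 = residual @ self.weights / m               # process variance estimate

    def _r(self, x):
        return gaussian_correlation(np.atleast_2d(x), self.X, self.theta)[0]

    def predict(self, x):
        return self.mu + self._r(x) @ self.weights

    def mse(self, x, with_fraction=True):
        r = self._r(x)
        Psi_inv_r = np.linalg.solve(self.Psi, r)
        s2 = 1.0 - r @ Psi_inv_r                                      # (3.111)
        if with_fraction:
            s2 += (1.0 - self.one @ Psi_inv_r) ** 2 / self.one_Psi_inv_one   # (3.110)
        return self.sigma2 * s2

    def rmse(self, x):
        return np.sqrt(abs(self.mse(x, with_fraction=False)))        # (3.112)

X = np.linspace(0.0, 1.0, 6)[:, None]
y = np.sin(2.0 * np.pi * X[:, 0])
model = OrdinaryKriging(X, y, theta=[10.0])
print(model.mse(X[2]), model.mse([0.1]), model.mse([0.1], with_fraction=False))
```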

Generally, if we utilize a data-fit low-fidelity model in an interpolation context,

then one can reduce the empirical surrogate modeling error (recall Definition 3.1.2)

by increasing sufficiently the sample size and by positioning appropriately the sam-

pling plan points.

However, for reasons of computational thrift, the total number of sampling plan

points has to be kept as low as possible. Hence, a usual economical approach is to

start with a sampling plan and to add sequentially new sampling plan points in a

guided way.

Ideally, this kind of adaptive interpolation balances the error-based global ex-

ploration for generating an overall accurate low-fidelity model and the prediction-

based local exploitation for determining an optimal value of the high-fidelity model.

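Using the error $\hat{s}_y(x)$, one can define the acquisition function (or infill criterion function or update point function) expected improvement $EI \colon X \to \mathbb{R}_{+}$, whose assignment is defined by the conditional expression

$$EI(x) := \begin{cases} 0, & \text{if } \hat{s}_y(x) \equiv 0,\\[4pt] \bigl(\min(y) - \hat{y}(x)\bigr)\left(\dfrac{1}{2} + \dfrac{1}{2}\operatorname{erf}\!\left(\dfrac{\min(y) - \hat{y}(x)}{\hat{s}_y(x)\sqrt{2}}\right)\right) + \dfrac{\hat{s}_y(x)}{\sqrt{2\pi}}\exp\!\left(-\dfrac{(\min(y) - \hat{y}(x))^2}{2(\hat{s}_y(x))^2}\right), & \text{if } \hat{s}_y(x) > 0, \end{cases} \tag{3.114}$$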

where a signature of the map min reads as Rm×1→Rand min(y)returns the min-

imal output point within the current column vector yin (3.83), thus, it is set that

ymin ≡min(y); and the map erf ∶R→[−1,1]denotes the Gauss error function. No-

tice that there is a slight abuse of notation in the sense that the expected improve-

ment acquisition function EI is regarded as an independent map instead of as the

expected value of an improvement function I, i.e., E[I(x)]. For more details, see,

e.g., [108], [70, p. 89ff] or [116, p. 294ff].

The acquisition function EI enables us to assess how much utility or improvement is to be expected from a potentially new sampling plan point. For other kinds of acquisition functions, see, e.g., [70, ch. 3.2] or [116, ch. 16] and references therein.
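A direct Python transcription of (3.114), together with the logarithmic variant described in footnote 25, might look as follows. The function names are hypothetical. The exponent is evaluated via the ratio $q = (\min(y) - \hat{y}(x))/\hat{s}_y(x)$, which is algebraically identical to (3.114) but avoids squaring a possibly tiny $\hat{s}_y(x)$ on its own.

```python
import math
import sys


def expected_improvement(y_min, y_hat, s_hat):
    """Expected improvement (3.114) given the current best output y_min,
    the kriging prediction y_hat(x), and the error s_hat(x)."""
    if s_hat <= 0.0:                        # first case of (3.114)
        return 0.0
    diff = y_min - y_hat
    q = diff / s_hat
    exploit = diff * (0.5 + 0.5 * math.erf(q / math.sqrt(2.0)))          # improvement term
    explore = s_hat / math.sqrt(2.0 * math.pi) * math.exp(-0.5 * q * q)  # uncertainty term
    return exploit + explore


# Smallest positive normalized double, equal to 2**-1022 (cf. footnote 25).
TINY = sys.float_info.min


def log10_expected_improvement(y_min, y_hat, s_hat):
    """log10(EI(x) + 2**-1022), which stays finite where EI(x) == 0."""
    return math.log10(expected_improvement(y_min, y_hat, s_hat) + TINY)


print(expected_improvement(0.0, 0.0, 1.0))   # 1/sqrt(2*pi), about 0.3989
```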

After determining $y_{\min}$ and the initial kriging low-fidelity model in (3.98) w.r.t. a given sampling plan, one can identify the new $(m+1)$-th sampling plan point $x_{m+1}$ as the maximizer of the corresponding expected improvement acquisition function in (3.114) or, equivalently, as the minimizer of the optimization problem²⁵

$$x_{m+1} := \operatorname*{argmin}_{x \in X}\ -\mathrm{EI}(x). \qquad (3.115)$$

Observe that the statement in (3.115) follows a common notation in which it is not emphasized that the assignment definition $\mathrm{EI}(x)$ in (3.114) depends on the sampling plan points and the corresponding output points as well as on the parameters of the kriging low-fidelity model.

Ideally, the iteration procedure in (3.115), i.e., determining $y_{\min}$ and the kriging low-fidelity model and finding a new sampling plan point $x_{m+1}$, terminates after finitely many steps at the global minimum such that $\hat{s}_y(x) \equiv 0$ and, consequently, $\mathrm{EI}(x) \equiv 0$.
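The iteration procedure (3.115) can be sketched in Python as a compact sequential kriging optimization loop. This is a minimal sketch under several assumptions that are not taken from the text: ordinary kriging with a Gaussian correlation and a fixed $\theta$ (no maximum likelihood estimation), a tiny nugget added to the diagonal of $\Psi$ for numerical stability, maximization of EI over a uniform grid instead of a proper optimizer, and the commonly used one-dimensional test function $f(x) = (6x-2)^2 \sin(12x-4)$ as a stand-in for an expensive high-fidelity model.

```python
import math


def objective(x):
    """Assumed stand-in for an expensive high-fidelity model on [0, 1]."""
    return (6.0 * x - 2.0) ** 2 * math.sin(12.0 * x - 4.0)


def corr(a, b, theta):
    """Gaussian correlation between two scalar inputs."""
    return math.exp(-theta * (a - b) ** 2)


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system A z = b."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def kriging(xs, ys, theta, nugget=1e-10):
    """Ordinary kriging with fixed theta; returns x -> (y_hat(x), s_hat(x))."""
    m = len(xs)
    Psi = [[corr(a, b, theta) + (nugget if i == j else 0.0) for j, b in enumerate(xs)]
           for i, a in enumerate(xs)]
    mu = sum(solve(Psi, ys)) / sum(solve(Psi, [1.0] * m))   # 1^T Psi^-1 y / 1^T Psi^-1 1
    res = [y - mu for y in ys]
    Psi_inv_res = solve(Psi, res)
    sigma2 = sum(a * b for a, b in zip(res, Psi_inv_res)) / m

    def predict(x):
        r = [corr(x, a, theta) for a in xs]
        y_hat = mu + sum(a * b for a, b in zip(r, Psi_inv_res))
        a_s = sum(a * b for a, b in zip(r, solve(Psi, r)))   # r^T Psi^-1 r
        return y_hat, math.sqrt(abs(sigma2 * (1.0 - a_s)))  # (3.112)
    return predict


def expected_improvement(y_min, y_hat, s_hat):
    """Expected improvement (3.114)."""
    if s_hat <= 0.0:
        return 0.0
    q = (y_min - y_hat) / s_hat
    return ((y_min - y_hat) * (0.5 + 0.5 * math.erf(q / math.sqrt(2.0)))
            + s_hat / math.sqrt(2.0 * math.pi) * math.exp(-0.5 * q * q))


def ego(f, xs, n_iter=8, theta=20.0, n_grid=201):
    """Sequential kriging optimization: x_{m+1} maximizes EI on a grid, cf. (3.115)."""
    xs = list(xs)
    ys = [f(x) for x in xs]
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    for _ in range(n_iter):
        predict = kriging(xs, ys, theta)
        y_min = min(ys)
        candidates = [g for g in grid if g not in xs]       # skip existing points
        x_new = max(candidates, key=lambda g: expected_improvement(y_min, *predict(g)))
        xs.append(x_new)
        ys.append(f(x_new))
    return xs, ys


xs, ys = ego(objective, [0.0, 0.5, 1.0])
best = min(range(len(ys)), key=ys.__getitem__)
print(xs[best], ys[best])
```

The fixed iteration budget `n_iter` replaces a convergence test on purpose, since, as noted next, the procedure need not converge quickly.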

However, the experimental rate of convergence of the iteration procedure in (3.115) might be very low, or the iteration procedure might not converge at all (cf. [70, p. 91]). Thus, for a given problem, the condition in (3.107) does not necessarily hold, or the condition cannot be applied because its assumption is violated.

The iteration procedure in (3.115) is associated with the so-called efficient global optimization (or sequential kriging optimization) technique, which can be considered as a subtype of the model management strategy adaptation (see [166, p. 555]).

Regarding the zoo of possible acquisition functions, it is intricate to find the most appropriate embedding of concepts associated with iteration procedures such as in (3.115) into the context of numerical optimization with the magnetoquasistatic model (recall § 2.3). Hence, concepts such as, for instance, constrained expected improvement (see, e.g., [70, ch. 5.4]) within the electromagnetics context demand a thorough examination of their own, which is out of the scope of the present work.

²⁵ For computational reasons, in a manner similar to (3.97), it can be useful to add the smallest positive normalized (IEEE 754 double-precision) floating-point number $2^{-1022}$ to the assignment in (3.114) and to consider the base-10 logarithm of this extended version, i.e., to utilize $\log_{10}(\mathrm{EI}(x) + 2^{-1022})$ rather than $\mathrm{EI}(x)$ itself (see, e.g., the book website associated with [70]).

In the present work, the previous considerations concerning sequential kriging optimization are primarily employed as internal checking tools at the level of programs (recall Figure 1.4).

3.3.2 Optimization within the space-mapping paradigm

Recalling § 2.3.2 and § 3.1.3, one can adapt the problem formulation in (2.40) to the generic statements in (3.104) in the sense that

$$\min_{x \in X_0}\ (\hat{j} \circ K)(x), \qquad (3.116)$$

where $K \in \hom(X_0, Y_0)$ denotes the high-fidelity model, $\hat{j} \in \hom(Y_0, Z_0)$ denotes the objective functional, and the composition map $\circ$ is overloaded such that its signature reads as $Z_0^{Y_0} \times Y_0^{X_0} \to Z_0^{X_0}$. Let us think of (3.116) as the high-fidelity optimization problem.

Referring to the low-fidelity model as $\tilde{K} \in \hom(X_1, Y_1)$ and overloading the objective functional $\hat{j} \in \hom(Y_1, Z_1)$ and the composition map $\circ \in \hom(Z_1^{Y_1} \times Y_1^{X_1}, Z_1^{X_1})$ in (3.116), one can think of the low-fidelity optimization problem as

$$\min_{\tilde{x} \in X_1}\ (\hat{j} \circ \tilde{K})(\tilde{x}). \qquad (3.117)$$

Furthermore, if we refer to the domain-oriented correction map as $\tilde{P} \in \hom(X_0, X_1)$ and to the codomain-oriented correction map as $\tilde{R} \in \hom(Y_1, Y_0)$, then the generic statements in (3.104) reduce to one legitimate generic statement from the perspective of function extensionality such as in (2.28), that is,

$$\forall x \in X_0.\ K(x) =_{Y_0} (\tilde{R} \circ_{Y_{10}} \tilde{K} \circ_{X_{01}} \tilde{P})(x), \qquad (3.118)$$

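where the map $\circ_{X_{01}} : Y_1^{X_1} \times X_1^{X_0} \to Y_1^{X_0}$ and the map $\circ_{Y_{10}} : Y_0^{Y_1} \times Y_1^{X_0} \to Y_0^{X_0}$ denote suitable composition maps which adhere to right-associativity, i.e., $\tilde{K} \circ_{X_{01}} \tilde{P}$ is formed first. Thus, the ideal property regarding the maps $\tilde{R}$, $\tilde{K}$, $\tilde{P}$ reads as

$$\forall \tilde{R} \in \hom(Y_1, Y_0).\ \forall \tilde{K} \in \hom(X_1, Y_1).\ \forall \tilde{P} \in \hom(X_0, X_1).\ K =_{X_0 \to Y_0} (\tilde{R} \circ_{Y_{10}} \tilde{K} \circ_{X_{01}} \tilde{P}). \qquad (3.119)$$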

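Recalling the commentary on (3.104), let us introduce the map $\tilde{K}_s$ such that

$$\tilde{K}_s = (\tilde{R}, \tilde{K}, \tilde{P}) \mapsto \tilde{K}_s(\tilde{R}, \tilde{K}, \tilde{P}) := \tilde{R} \circ_{Y_{10}} \tilde{K} \circ_{X_{01}} \tilde{P} : Y_0^{Y_1} \times Y_1^{X_1} \times X_1^{X_0} \to Y_0^{X_0}, \qquad (3.120)$$

in order to conceptually discriminate the notion of a low-fidelity model $\tilde{K}$ and the notion of a surrogate model $\tilde{K}_s$ within the context of the space-mapping paradigm.

For the sake of notational ease, let us define

$$K_s^{\tilde{R}, \tilde{K}, \tilde{P}} := \tilde{K}_s(\tilde{R}, \tilde{K}, \tilde{P}). \qquad (3.121)$$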

If we add another assumption to the list of assumptions regarding the maps $\tilde{R}$, $\tilde{K}$, and $\tilde{P}$, more specifically, if we assume that there is some kind of sub-structure between $X_0$ and $X_1$ as well as between $Y_1$ and $Y_0$, then it is meaningful to define the inclusion map $\iota_{\tilde{R}}$ and the inclusion map $\iota_{\tilde{P}}$ such that

$$
\begin{aligned}
\iota_{\tilde{P}} &= x \mapsto \iota_{\tilde{P}}(x) := x : X_0 \to X_1, && (3.122a)\\
\iota_{\tilde{R}} &= \tilde{y} \mapsto \iota_{\tilde{R}}(\tilde{y}) := \tilde{y} : Y_1 \to Y_0. && (3.122b)
\end{aligned}
$$

In particular, if we assume that $X \equiv X_0$ and $X_0 \equiv X_1$ as well as $Y \equiv Y_1$ and $Y_1 \equiv Y_0$, then we obtain $\iota_{\tilde{R}} \equiv \mathrm{id}_Y$ and $\iota_{\tilde{P}} \equiv \mathrm{id}_X$. Another possible assumption is that the corresponding entities are isomorphic regarding some prescribed algebraic structure, i.e., $X_0 \cong X_1$ and $Y_1 \cong Y_0$. We dwell on this assumption in ch. 4.

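Observe that if and only if the case $\iota_{\tilde{R}} \equiv \mathrm{id}_Y$ and $\iota_{\tilde{P}} \equiv \mathrm{id}_X$ is given, then

$$\forall \tilde{K} \in \hom(X, Y).\ K_s^{\mathrm{id}_Y, \tilde{K}, \mathrm{id}_X} =_{X \to Y} \tilde{K} \qquad (3.123)$$

holds such that the notion of a low-fidelity model $\tilde{K}$ and the notion of a surrogate model $\tilde{K}_s$ collapse within the context of the space-mapping paradigm.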
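The distinction between $\tilde{K}$ and the surrogate $K_s^{\tilde{R},\tilde{K},\tilde{P}}$ in (3.120) and (3.121) amounts to right-associative function composition, which can be mirrored with higher-order functions. In the following Python sketch, the concrete maps `R_tilde`, `K_tilde`, and `P_tilde` are hypothetical scalar examples; with identity correction maps, the surrogate coincides with the low-fidelity model, as stated in (3.123).

```python
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def compose(f: Callable[[B], C], g: Callable[[A], B]) -> Callable[[A], C]:
    """Return the composition f o g, i.e., x -> f(g(x))."""
    return lambda x: f(g(x))


def surrogate(R, K, P):
    """K_s(R, K, P) = R o (K o P), composed right-associatively as in (3.120)."""
    return compose(R, compose(K, P))


def identity(v):
    return v


K_tilde = lambda x: x * x            # hypothetical low-fidelity model
P_tilde = lambda x: 2.0 * x + 1.0    # hypothetical domain-oriented correction map
R_tilde = lambda y: 3.0 * y          # hypothetical codomain-oriented correction map

K_s = surrogate(R_tilde, K_tilde, P_tilde)
print(K_s(1.0))                                                    # 27.0
print(surrogate(identity, K_tilde, identity)(1.5) == K_tilde(1.5))  # True, cf. (3.123)
```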

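If the condition in (3.118) and in (3.119), respectively, holds, then one can substitute $K$ with $K_s^{\tilde{R}, \tilde{K}, \tilde{P}}$ in (3.116) such that

$$\min_{x \in X_0}\ \bigl(\hat{j} \circ (\tilde{R} \circ_{Y_{10}} \tilde{K} \circ_{X_{01}} \tilde{P})\bigr)(x). \qquad (3.124)$$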

However, according to the remarks regarding (2.28), one cannot expect the conditions in (3.118) and (3.119) to be satisfied for real-world applications. Thus, we have to incorporate the information about the high-fidelity model into the surrogate model to reduce a potential discrepancy between the two.

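If we assume that the low-fidelity model does not depend on the high-fidelity model, then, conceptually, one would have to extend the signature of the correction map $\tilde{R}$ and the signature of the correction map $\tilde{P}$ in the sense that

$$
\begin{aligned}
\tilde{P} &= (x, K, \tilde{K}) \mapsto \tilde{P}(x, K, \tilde{K}) : X_0 \times Y_0^{X_0} \times Y_1^{X_1} \to X_1, && (3.125a)\\
\tilde{R} &= (\tilde{y}, \tilde{K}, K) \mapsto \tilde{R}(\tilde{y}, \tilde{K}, K) : Y_1 \times Y_1^{X_1} \times Y_0^{X_0} \to Y_0. && (3.125b)
\end{aligned}
$$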

However, in the common treatment of theoretical issues regarding the space-mapping paradigm, the high-fidelity model and the low-fidelity model are assumed to be fixed. Thus, they are implicitly incorporated into the assignment rules of the correction maps (see, e.g., [49, ch. 3] and references therein). This tactic resembles the

definition of the empirical generalization error $e_{H,s}^{g}(\hat{Q}_{\xi}^{\xi})$ in (3.21).

From a program’s viewpoint (recall Figure 1.4), the low-fidelity model and the high-fidelity model are defined within the scope of the programs associated with the correction maps.

Subsequently, I deem it advisable to move from an abstract function definition to a concrete function definition. More precisely, let us assume $X_0 \subseteq \mathbb{R}^d$, $X_1 \subseteq \mathbb{R}^d$, $Y_0 \subseteq \mathbb{R}$, and $Y_1 \subseteq \mathbb{R}$, that is, let us cover the multivariate scalar-valued use case w.r.t. the high-fidelity model and the low-fidelity model. Then one can provide a possible definition of the assignment rule of the domain-oriented correction map $\tilde{P}$ by means of

$$\tilde{P}_s = x \mapsto \tilde{P}_s(x) := \operatorname*{argmin}_{\tilde{x} \in X_1} \left(\frac{1}{2}\bigl(\tilde{K}(\tilde{x}) - K(x)\bigr)^2 + \frac{\alpha}{2}\lVert \tilde{x} - x \rVert_{l_2}^2\right), \qquad (3.126)$$

where the index $s$ in $\tilde{P}_s$ emphasizes the scalar-valued use case and $\alpha \in \mathbb{R}_+$ is a user-assigned smoothing parameter for the purpose of existence and uniqueness of a solution (compare with the parameter $\beta$ in (2.31)). For more details regarding the definition in (3.126), see [95].
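For $d = 1$, (3.126) is a one-dimensional minimization. The following Python sketch, restricted to $d = 1$ for brevity, solves it by golden-section search on a bracket around $x$; this assumes that the objective is unimodal on that bracket, which holds, e.g., for an affine low-fidelity model. The models used below are hypothetical. For $\tilde{K}(\tilde{x}) = a\tilde{x} + b$, setting the derivative of the objective to zero gives the closed form $\tilde{x} = (a(K(x) - b) + \alpha x)/(a^2 + \alpha)$, which serves as a check.

```python
import math


def golden_section_min(phi, lo, hi, tol=1e-10):
    """Minimize phi on [lo, hi] by golden-section search (phi assumed unimodal there)."""
    inv_gr = (math.sqrt(5.0) - 1.0) / 2.0          # 1 / golden ratio
    a, b = lo, hi
    c, d = b - inv_gr * (b - a), a + inv_gr * (b - a)
    while b - a > tol:
        if phi(c) < phi(d):                        # minimizer lies in [a, d]
            b, d = d, c
            c = b - inv_gr * (b - a)
        else:                                      # minimizer lies in [c, b]
            a, c = c, d
            d = a + inv_gr * (b - a)
    return 0.5 * (a + b)


def P_s(x, K, K_low, alpha, radius=10.0):
    """Scalar parameter extraction (3.126) for d = 1:
    argmin over xt of 0.5 * (K_low(xt) - K(x))**2 + alpha / 2 * (xt - x)**2."""
    target = K(x)                                  # one high-fidelity evaluation
    phi = lambda xt: 0.5 * (K_low(xt) - target) ** 2 + 0.5 * alpha * (xt - x) ** 2
    return golden_section_min(phi, x - radius, x + radius)


K = lambda x: x ** 3                               # hypothetical high-fidelity model
K_low = lambda xt: 2.0 * xt + 1.0                  # hypothetical affine low-fidelity model
x, alpha = 1.5, 0.1
print(P_s(x, K, K_low, alpha), (2.0 * (K(x) - 1.0) + alpha * x) / (2.0 ** 2 + alpha))
```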

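Let us redefine the high-fidelity model and the low-fidelity model in the sense that

$$
\begin{aligned}
K &= (w, x) \mapsto K(w, x) : W_0 \times X_0 \to Y_0, && (3.127a)\\
\tilde{K} &= (\tilde{w}, \tilde{x}) \mapsto \tilde{K}(\tilde{w}, \tilde{x}) : W_1 \times X_1 \to Y_1, && (3.127b)
\end{aligned}
$$

then one can introduce the map $K_w$ and the map $\tilde{K}_{\tilde{w}}$ that read as

$$
\begin{aligned}
K_w &= w \mapsto \bigl(x \mapsto K(w, x)\bigr) : W_0 \to Y_0^{X_0}, && (3.128a)\\
\tilde{K}_{\tilde{w}} &= \tilde{w} \mapsto \bigl(\tilde{x} \mapsto \tilde{K}(\tilde{w}, \tilde{x})\bigr) : W_1 \to Y_1^{X_1}, && (3.128b)
\end{aligned}
$$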

where the operation currying is applied to the maps $K$ and $\tilde{K}$. The operation currying is conceived as an operation that transforms a function with multiple arguments into a sequence of functions with single arguments.

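If we suppose that $X_0 \subseteq \mathbb{R}^d$, $X_1 \subseteq \mathbb{R}^d$, $Y_0 \subseteq \mathbb{R}$, $Y_1 \subseteq \mathbb{R}$, $W_0 \subseteq \mathbb{R}$, and $W_1 \subseteq \mathbb{R}$, then one can consider $m_w$ points with $m_w \in \mathbb{N}$ such that one can construct lists of functions, i.e.,

$$
\begin{aligned}
\bigl(K_w(w_1), \dots, K_w(w_{m_w})\bigr) &\equiv \bigl(x \mapsto K(w_1, x), \dots, x \mapsto K(w_{m_w}, x)\bigr), && (3.129a)\\
\bigl(\tilde{K}_{\tilde{w}}(\tilde{w}_1), \dots, \tilde{K}_{\tilde{w}}(\tilde{w}_{m_w})\bigr) &\equiv \bigl(\tilde{x} \mapsto \tilde{K}(\tilde{w}_1, \tilde{x}), \dots, \tilde{x} \mapsto \tilde{K}(\tilde{w}_{m_w}, \tilde{x})\bigr), && (3.129b)
\end{aligned}
$$

where the operation $\equiv$ is conceived as a componentwise operation. Furthermore, if we evaluate the lists of functions at $x$ and at $\tilde{x}$, respectively, then one can construct lists of evaluated functions, i.e.,

$$
\begin{aligned}
\bigl(K_w(w_1)(x), \dots, K_w(w_{m_w})(x)\bigr) &\equiv \bigl(K(w_1, x), \dots, K(w_{m_w}, x)\bigr), && (3.130a)\\
\bigl(\tilde{K}_{\tilde{w}}(\tilde{w}_1)(\tilde{x}), \dots, \tilde{K}_{\tilde{w}}(\tilde{w}_{m_w})(\tilde{x})\bigr) &\equiv \bigl(\tilde{K}(\tilde{w}_1, \tilde{x}), \dots, \tilde{K}(\tilde{w}_{m_w}, \tilde{x})\bigr). && (3.130b)
\end{aligned}
$$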

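Finally, assuming the redefinition in (3.127), one can cover the multivariate vector-valued use case²⁶ w.r.t. the high-fidelity model and the low-fidelity model, that is, if we assume that $X_0 \subseteq \mathbb{R}^d$, $X_1 \subseteq \mathbb{R}^d$, $Y_0 \subseteq \mathbb{R}^{m_w}$, $Y_1 \subseteq \mathbb{R}^{m_w}$, then one can overload the maps in (3.127) such that

$$
\begin{aligned}
K &= x \mapsto K(x) := \bigl(K(w_1, x), \dots, K(w_{m_w}, x)\bigr) : X_0 \to Y_0, && (3.131a)\\
\tilde{K} &= \tilde{x} \mapsto \tilde{K}(\tilde{x}) := \bigl(\tilde{K}(\tilde{w}_1, \tilde{x}), \dots, \tilde{K}(\tilde{w}_{m_w}, \tilde{x})\bigr) : X_1 \to Y_1. && (3.131b)
\end{aligned}
$$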
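Currying (3.128), the lists (3.129) and (3.130), and the vector-valued overload (3.131a) translate naturally into closures. In this Python sketch, the response `K(w, x)` and the sample points `ws` are hypothetical, with $w$ playing the role of a frequency, $d = 2$, and $m_w = 3$.

```python
def K(w, x):
    """Hypothetical high-fidelity response K(w, x) with x = (x_1, x_2)."""
    return 1.0 / (1.0 + (w * x[0]) ** 2) + x[1]


def curry(f):
    """Currying as in (3.128): (w, x) -> f(w, x) becomes w -> (x -> f(w, x))."""
    return lambda w: (lambda x: f(w, x))


K_w = curry(K)
ws = [0.5, 1.0, 2.0]                  # hypothetical points w_1, ..., w_mw
funcs = [K_w(w) for w in ws]          # list of functions, cf. (3.129a)


def K_vec(x):
    """Vector-valued overload (3.131a): x -> (K(w_1, x), ..., K(w_mw, x))."""
    return [f(x) for f in funcs]      # list of evaluated functions, cf. (3.130a)


print(K_vec([1.0, 0.0]))              # [0.8, 0.5, 0.2]
```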

Notice that the entity $w$ is often associated with the time variable $t$ such as, e.g., in (2.9), or with the frequency $\omega$ such as, e.g., in (2.10), that is, $w \equiv t$ or $w \equiv \omega$. In this context, it is usually assumed that $W_1 \equiv W_0$ in (3.127). These associations constitute some common interpretations of the entity $w$ within the semantics of electromagnetics. Due to these interpretations, the case $d < m_w$ is often considered in practical applications (see, e.g., [194, p. 11] or [49, p. 91]).

In the vector-valued use case, technically, the dimensions of the domains and codomains do not necessarily have to match, i.e., the cases $\dim(X_0) \neq \dim(X_1)$ and $\dim(Y_0) \neq \dim(Y_1)$ are conceivable. For instance, if the low-fidelity model admits an application of automatic differentiation (see § 2.3.3), such as in the case of data-fit low-fidelity models, then one can unleash the machinery of sensitivity computation in order to determine an importance ranking of input variables (see § 3.2.1). Hence, by means of sensitivity computation, one could construct a low-fidelity model such that $\dim(X_0) > \dim(X_1)$. In the present work, though, this path is not pursued. For further elaborations regarding the dimension issue in the multivariate vector-valued use case, I refer to, e.g., [56, p. 60ff] or [95].

²⁶ In [138], multivariate functions are examined in a computational context for investigating partial derivatives and the corresponding chain rule for multivariate calculus.

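Observing (3.130), though, I conclude that, syntactically, there is another possible encoding of the vector-valued use case: if it is set that

$$
\begin{aligned}
\bigl(K_w(w_1)(x), \dots, K_w(w_{m_w})(x)\bigr) &\equiv \bigl(K_1(x), \dots, K_{m_w}(x)\bigr), && (3.132a)\\
\bigl(\tilde{K}_{\tilde{w}}(\tilde{w}_1)(\tilde{x}), \dots, \tilde{K}_{\tilde{w}}(\tilde{w}_{m_w})(\tilde{x})\bigr) &\equiv \bigl(\tilde{K}_1(\tilde{x}), \dots, \tilde{K}_{m_w}(\tilde{x})\bigr), && (3.132b)
\end{aligned}
$$

then one can rewrite (3.131) as

$$
\begin{aligned}
K &= x \mapsto K(x) := \bigl(K_1(x), \dots, K_{m_w}(x)\bigr) : X_0 \to Y_0, && (3.133a)\\
\tilde{K} &= \tilde{x} \mapsto \tilde{K}(\tilde{x}) := \bigl(\tilde{K}_1(\tilde{x}), \dots, \tilde{K}_{m_w}(\tilde{x})\bigr) : X_1 \to Y_1. && (3.133b)
\end{aligned}
$$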

Mind that, if we change the definitions in (3.128) such that $W_1 \equiv W_0$ and, therefore, $K_w$ and $\tilde{K}_w$ with $w \in \{1, \dots, m_w\}$ in (3.133) denote $m_w$ different component high-fidelity models and $m_w$ different component low-fidelity models, respectively, then the corresponding interpretation partly resembles multiobjective optimization (recall § 1.1). For some applications of this kind of interpretation of the vector-valued use case, I refer to, e.g., [56, ch. 5] or [49, ch. 6].

Given the definitions in (3.131), the assignment rule in (3.126) has to be adapted; more precisely, given $X_0 \subseteq \mathbb{R}^d$ and $X_1 \subseteq \mathbb{R}^d$, one can implement the map $\tilde{P}$ by

$$\tilde{P}_v = x \mapsto \tilde{P}_v(x) := \operatorname*{argmin}_{\tilde{x} \in X_1} \left(\frac{1}{2}\lVert \tilde{K}(\tilde{x}) - K(x) \rVert_{l_2}^2 + \frac{\alpha}{2}\lVert \tilde{x} - x \rVert_{l_2}^2\right), \qquad (3.134)$$

where the index $v$ in $\tilde{P}_v$ emphasizes the vector-valued use case and the corresponding entities are conceived as column vectors (see also the commentary on the representation of vectors as column vectors in § 3.1.2). For other possible definitions of $\tilde{P}_v(x)$, see, e.g., [49, p. 65] and references therein.
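If the low-fidelity model happens to be affine, $\tilde{K}(\tilde{x}) = A\tilde{x} + b$ (an assumption made only for this illustration), then (3.134) has a closed-form solution: setting the gradient to zero yields the normal equations $(A^T A + \alpha I)\tilde{x} = A^T(K(x) - b) + \alpha x$. The Python sketch below solves them for $d = 2$ by Cramer's rule; the function name `P_v_affine` and all model data are hypothetical.

```python
def P_v_affine(x, K, A, b, alpha):
    """Closed-form (3.134) for a hypothetical affine low-fidelity model A @ xt + b (d = 2)."""
    Kx = K(x)                                    # one high-fidelity evaluation
    m, d = len(A), len(A[0])
    if d != 2:
        raise ValueError("this sketch only handles d = 2")
    # Normal-equation matrix A^T A + alpha I and right-hand side A^T (K(x) - b) + alpha x.
    M = [[sum(A[k][i] * A[k][j] for k in range(m)) + (alpha if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    rhs = [sum(A[k][i] * (Kx[k] - b[k]) for k in range(m)) + alpha * x[i] for i in range(d)]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(rhs[0] * M[1][1] - M[0][1] * rhs[1]) / det,
            (M[0][0] * rhs[1] - rhs[0] * M[1][0]) / det]


A = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]         # m_w = 3, d = 2
b = [0.0, 0.0, 0.0]
K = lambda x: [x[0] + 0.3, 2.0 * x[1] - 0.1, x[0] + x[1]]   # hypothetical high-fidelity model
print(P_v_affine([1.0, 2.0], K, A, b, alpha=0.01))
```

If the high-fidelity model coincides with the affine low-fidelity model, the extracted point equals $x$ itself, which is a convenient sanity check.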

Depending on the choice of interpretation in (3.131) or in (3.133), one can recover the scalar-valued use case by setting $m_w := 1$ such that $\tilde{P}_v \equiv \tilde{P}_s$.

Assuming some kind of differentiability structure regarding the map $\tilde{P}$ (cf. [49, p. 64]), one can invoke the notion of a first-order Taylor series expansion for multivariate vector-valued functions; thus, one can define an affine map as a representative of the map $\tilde{P}$ in (3.125a), that is,

$$\tilde{P} = x \mapsto \tilde{P}(x) := \tilde{P}(x_0) + J_{\tilde{P}}(x_0)(x - x_0), \qquad (3.135)$$

where, in the context of the space-mapping paradigm, the value of $\tilde{P}$ at the expansion point $x_0$ is chosen as $\tilde{P}(x_0) := \tilde{P}_v(x_0)$; and $J_{\tilde{P}}(x_0) \in \mathbb{R}^{d \times d}$ denotes the Jacobi matrix w.r.t. the domain-oriented correction map $\tilde{P}$ in (3.125a) evaluated at a fixed argument $x_0$ of the high-fidelity model.²⁷

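²⁷ If we set $[C^1(U,\mathbb{R})]^m := C^1(U,\mathbb{R}) \times \dots \times C^1(U,\mathbb{R})$ ($m$ factors), assume a map $f \in [C^1(U,\mathbb{R})]^m$ with $U \subset \mathbb{R}^n$ being an open set, let $f_i \in C^1(U,\mathbb{R})$ with $i \in \{1, \dots, m\}$ denote the components of the map $f$ and $x_j \in \mathbb{R}$ with $j \in \{1, \dots, n\}$ denote the components of the argument of the component maps $f_i$, and suppose a fixed argument $p \in U$, then let us conceive the Jacobi matrix $J_f(p) \in \mathbb{R}^{m \times n}$ w.r.t. $f$ and evaluated at $p$ as $J_f(p) := [j_f(p)_{i,j}]$ with $j_f(p)_{i,j} := \partial_{e_{x_j}}(f_i)(p)$ for each $i \in \{1, \dots, m\}$ and for each $j \in \{1, \dots, n\}$, where $\partial_{e_{x_j}}(f_i)(p) := \operatorname{grad}(f_i)(p) \cdot_{\mathbb{R}^n} e_{x_j}$, $\cdot_{\mathbb{R}^n}$ denotes the Euclidean inner product w.r.t. $\mathbb{R}^n$, and $e_{x_j}$ refers to the unit vector w.r.t. $x_j$. See the commentary on the overloading of the map grad in § 2.3.3 as well.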

By assuming some kind of sub-structure such as in (3.122), one can determine $J_{\tilde{P}}(x_0)$ by the Jacobi matrix $J_{\tilde{K}}(\tilde{P}_v(x_0)) \in \mathbb{R}^{m_w \times d}$ w.r.t. the low-fidelity model evaluated at $\tilde{P}_v(x_0)$ and by the Jacobi matrix $J_{\tilde{K}_s(\iota_{\tilde{R}}, \tilde{K}, \tilde{P})}(x_0) \in \mathbb{R}^{m_w \times d}$ w.r.t. the surrogate model $\tilde{K}_s(\iota_{\tilde{R}}, \tilde{K}, \tilde{P})$ evaluated at $x_0$.

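Since $\tilde{K}_s(\iota_{\tilde{R}}, \tilde{K}, \tilde{P}) = \iota_{\tilde{R}} \circ_{Y_{10}} \tilde{K} \circ_{X_{01}} \tilde{P}$, where the inclusion $\iota_{\tilde{R}}$ has the identity as Jacobi matrix, and since $\tilde{P}(x_0) := \tilde{P}_v(x_0)$, a multivariate vector-valued version of the chain rule gives $J_{\tilde{K}_s(\iota_{\tilde{R}}, \tilde{K}, \tilde{P})}(x_0) = J_{\tilde{K}}(\tilde{P}_v(x_0))\, J_{\tilde{P}}(x_0)$. Multiplying from the left by a pseudoinverse of $J_{\tilde{K}}(\tilde{P}_v(x_0))$, the Jacobi matrix $J_{\tilde{P}}$ reads as

$$J_{\tilde{P}}(x_0) := J^+_{\tilde{K}}(\tilde{P}_v(x_0))\, J_{\tilde{K}_s(\iota_{\tilde{R}}, \tilde{K}, \tilde{P})}(x_0), \qquad (3.136)$$

where $J^+_{\tilde{K}}(\tilde{P}_v(x_0)) \in \mathbb{R}^{d \times m_w}$ indicates the pseudoinverse of $J_{\tilde{K}}(\tilde{P}_v(x_0))$ possessing at least a left inverse characteristic, i.e., $J^+_{\tilde{K}}(\tilde{P}_v(x_0))\, J_{\tilde{K}}(\tilde{P}_v(x_0)) \equiv I$ with $I \in \mathbb{R}^{d \times d}$ being the identity matrix; such a left inverse exists if and only if $J_{\tilde{K}}(\tilde{P}_v(x_0))$ has full column rank $d$, which requires $d \leq m_w$. The definition of the pseudoinverse follows the definition in (3.61).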
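The chain-rule construction (3.136) can be checked numerically in Python. The sketch below assumes the standard left pseudoinverse $(J^T J)^{-1} J^T$ for a Jacobi matrix with full column rank, which is one common instance of the pseudoinverse referred to in (3.61); the Jacobi matrices are hypothetical, with $m_w = 3$ and $d = 2$, and the surrogate Jacobi matrix is built from a known $J_{\tilde{P}}$ so that (3.136) should recover it.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def inv2(M):
    """Inverse of a non-singular 2 x 2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]


def left_pinv(J):
    """Left pseudoinverse (J^T J)^{-1} J^T of a tall J with full column rank 2."""
    Jt = transpose(J)
    return matmul(inv2(matmul(Jt, J)), Jt)


J_low = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]   # hypothetical J_K~(P_v(x0)), m_w x d
J_P_true = [[1.0, 0.5], [0.0, 1.0]]            # hypothetical J_P~(x0), d x d
J_sur = matmul(J_low, J_P_true)                # chain rule: Jacobi matrix of the surrogate
J_P = matmul(left_pinv(J_low), J_sur)          # (3.136)
print(J_P)                                     # approximately J_P_true
```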

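Furthermore, by assuming some kind of sub-structure such as in (3.122) and by assuming $Y_0 \subseteq \mathbb{R}^{m_w}$, $Y_1 \subseteq \mathbb{R}^{m_w}$, one can define an ideal affine map as a representative of the map $\tilde{R}$, that is,

$$\tilde{R} = \tilde{K}(\tilde{x}) \mapsto \tilde{R}(\tilde{K}(\tilde{x})) := K(x^*) + S\bigl(\tilde{K}(\tilde{x}) - \tilde{K}(\iota_{\tilde{P}}(x^*))\bigr), \qquad (3.137)$$

where $S \in \mathbb{R}^{m_w \times m_w}$ denotes the ideal rotation matrix that is defined by the Jacobian matrix $J_K(x^*) \in \mathbb{R}^{m_w \times d}$ w.r.t. the high-fidelity model $K$ evaluated at its optimal argument $x^*$ and by the Jacobian matrix $J_{\tilde{K}}(\iota_{\tilde{P}}(x^*)) \in \mathbb{R}^{m_w \times d}$ w.r.t. the low-fidelity model $\tilde{K}$ evaluated at $\iota_{\tilde{P}}(x^*)$.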

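Assuming that (3.118) adapted to the point $x^*$ holds for the surrogate model $\tilde{K}_s(\tilde{R}, \tilde{K}, \iota_{\tilde{P}})$ and utilizing a multivariate vector-valued version of the chain rule, differentiating $K = \tilde{R} \circ_{Y_{10}} \tilde{K} \circ_{X_{01}} \iota_{\tilde{P}}$ at $x^*$ gives $J_K(x^*) = S\, J_{\tilde{K}}(\iota_{\tilde{P}}(x^*))$, since the inclusion $\iota_{\tilde{P}}$ has the identity as Jacobi matrix. Multiplying from the right by a right inverse of $J_{\tilde{K}}(\iota_{\tilde{P}}(x^*))$, the ideal rotation matrix $S \equiv J_{\tilde{R}}(\tilde{K}(\iota_{\tilde{P}}(x^*)))$ reads as

$$S := J_K(x^*)\, J^+_{\tilde{K}}(\iota_{\tilde{P}}(x^*)), \qquad (3.138)$$

where $J^+_{\tilde{K}}(\iota_{\tilde{P}}(x^*)) \in \mathbb{R}^{d \times m_w}$ indicates the pseudoinverse of $J_{\tilde{K}}(\iota_{\tilde{P}}(x^*))$ possessing at least a right inverse characteristic, i.e., $J_{\tilde{K}}(\iota_{\tilde{P}}(x^*))\, J^+_{\tilde{K}}(\iota_{\tilde{P}}(x^*)) \equiv I$ with $I \in \mathbb{R}^{m_w \times m_w}$ being the identity matrix; such a right inverse exists if and only if $J_{\tilde{K}}(\iota_{\tilde{P}}(x^*))$ has full row rank $m_w$, which requires $m_w \leq d$. Hence, the definition of the pseudoinverse in (3.61) is adapted to the case

$$J^+_{\tilde{K}}(\iota_{\tilde{P}}(x^*)) := J_{\tilde{K}}(\iota_{\tilde{P}}(x^*))^T \left(J_{\tilde{K}}(\iota_{\tilde{P}}(x^*))\, J_{\tilde{K}}(\iota_{\tilde{P}}(x^*))^T\right)^{-1}. \qquad (3.139)$$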

Note that the attribute "ideal" reflects the fact that one cannot know the optimal solution of the high-fidelity optimization problem a priori. For more details regarding the ideal map $\tilde{R}$ in (3.137), I refer to, e.g., [56, p. 44ff].
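Analogously, (3.138) and (3.139) can be illustrated with a right pseudoinverse. In this Python sketch the Jacobi matrices are hypothetical, with $m_w = 2 \leq d = 3$ so that $J_{\tilde{K}}(\iota_{\tilde{P}}(x^*))$ can have full row rank; the high-fidelity Jacobi matrix is constructed as $S_{\mathrm{true}} J_{\tilde{K}}$ with a known rotation $S_{\mathrm{true}}$, which (3.138) should then recover.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def inv2(M):
    """Inverse of a non-singular 2 x 2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]


def right_pinv(J):
    """Right pseudoinverse J^T (J J^T)^{-1} of a wide J with full row rank 2, cf. (3.139)."""
    Jt = transpose(J)
    return matmul(Jt, inv2(matmul(J, Jt)))


J_low = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]     # hypothetical J_K~(iota_P~(x*)), m_w x d
S_true = [[0.0, -1.0], [1.0, 0.0]]             # hypothetical rotation by 90 degrees
J_high = matmul(S_true, J_low)                 # J_K(x*) = S J_K~ by the chain rule
S = matmul(J_high, right_pinv(J_low))          # (3.138) with (3.139)
print(S)                                       # approximately S_true
```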

From an algorithmic viewpoint (recall Figure 1.4 and § 2.3.3), the statement in (3.124) constitutes the anchor point at the map level for any optimization algorithm that follows the space-mapping paradigm.

Thus, one can articulate the essential aim of the corresponding iteration procedures by

$$x^{(k+1)} := \operatorname*{argmin}_{x \in X_0} \bigl(\hat{j} \circ (\tilde{R} \circ_{Y_{10}} \tilde{K} \circ_{X_{01}} \tilde{P})\bigr)(x^{(k)}), \qquad (3.140a)$$

where $k \in \mathbb{N}$ and $x^{(k)}$ and $x^{(k+1)}$ denote the $k$-th and the $(k+1)$-th iteration point, respectively, such that, with regard to some appropriate norm,

$$x^{(k)} \to x^* \quad \text{as } k \to \infty, \qquad (3.140b)$$

where $x^*$ refers to an existing optimal solution of the high-fidelity optimization problem in (3.116), more precisely,

$$x^* \in \operatorname*{argmin}_{x \in X_0}\ (\hat{j} \circ K)(x). \qquad (3.140c)$$

Analogous to (3.140c), one can refer to $\tilde{x}^*$ as an existing optimal solution of the low-fidelity optimization problem in (3.117), more precisely,

$$\tilde{x}^* \in \operatorname*{argmin}_{\tilde{x} \in X_1}\ (\hat{j} \circ \tilde{K})(\tilde{x}). \qquad (3.141)$$

Observe that, in the scalar-valued use case and the vector-valued use case, it is additionally supposed that $Z_0 \subseteq \mathbb{R}_+$ and $Z_1 \subseteq \mathbb{R}_+$ in (3.116) and in (3.117), respectively.

If we invoke an object $X_{11}$ such that there is some kind of sub-structure between $X_{11}$ and $X_1$, then one can utilize the inclusion map $\iota_{\tilde{P}}$ in (3.122a) in order to define a preimage of $X_{11}$ under $\iota_{\tilde{P}}$ in the sense that

$$\iota_{\tilde{P}}^{-1}(X_{11}) := \{x \in X_0 \mid \iota_{\tilde{P}}(x) \in X_{11}\}. \qquad (3.142)$$

If we suppose that $\tilde{x}^* \in X_{11} \subset X_1$, then one can define a proposal for an initial iteration point $x^{(0)}$ by

$$\iota_{\tilde{P}}(x^{(0)}) := \tilde{x}^* \qquad (3.143)$$

such that $x^{(0)} \in \iota_{\tilde{P}}^{-1}(X_{11})$. Bear in mind that other initial iteration points are plausible, too, since suitable choices are commonly problem dependent. However, the choice in (3.143) appears heuristically promising for keeping the total number of iterations as low as possible.

Over the years, many different algorithms have been presented to achieve the essential aim in (3.140). For elaborations on a large portion of corresponding optimization algorithms, see, e.g., [49, ch. 3] or [125] and references therein.

In the present work, though, solely a small subset of the large class of optimization algorithms within the space-mapping paradigm is considered. Notice that those algorithms are regarded as algorithms within the space-mapping paradigm that conceptually distinguish a low-fidelity model $\tilde{K}$ and a surrogate model $K_s^{\tilde{R}, \tilde{K}, \tilde{P}}$ in (3.121) at the function level (recall Figure 1.4).

From the small subset of algorithms under consideration, let us focus primarily on a Trust Region Aggressive Space Mapping (TRASM) algorithm, which assumes the surrogate model $\tilde{K}_s(\iota_{\tilde{R}}, \tilde{K}, \tilde{P})$ described with regard to (3.136).

Mind that the present work’s version of the TRASM algorithm, that is, algorithm 3.1, builds upon the discussion in [95] and in [49, p. 67ff] and extends their considerations to a more general set of admissible solutions and to the context of the Julia PL. A main novel use case of the TRASM algorithm 3.1 of the present work is its combination with a co-kriging low-fidelity model that we encounter in the next subsection.

Similarly to the proposed procedures SBO-DFLF and SBO-SPLF, the TRASM algorithm 3.1 builds upon the common canon of optimization algorithms (recall § 2.3.3). The TRASM algorithm’s basic building blocks are well covered by the theory of trust-region methods within nonlinear optimization (see, e.g., [158, ch. 4]). An essential overriding motivation for invoking a trust-region scaffolding for an optimization algorithm is to equip the algorithm with good global convergence guarantees, that is, to ensure that the iterates from any remote starting point eventually converge, in the unconstrained case, to a stationary accumulation point and, in the constrained case, to a KKT point, where a KKT point is conceived as a stationary accumulation point that satisfies the KKT conditions (see § 2.3.1). Regarding the experimental rate of convergence, the correspondingly equipped optimization algorithms exhibit satisfactory practical performance.

Let us assume some additional structure w.r.t. the domains and codomains of the respective models; more precisely, let us suppose some kind of topological vector space structure equipped with some metric structure, some norm structure, and some inner product structure. For the sake of simplicity, it is presupposed that the above-mentioned vector-valued use case incorporates all the structure needed. Furthermore, it is supposed that $\tilde{K}_s(\iota_{\tilde{R}}, \tilde{K}, \tilde{P}) \equiv \tilde{K}_s(\mathrm{id}_Y, \tilde{K}, \tilde{P})$ w.r.t. (3.122).

Let us define the $(k+1)$-th iteration point $x^{(k+1)}$ as

$$x^{(k+1)} := x^{(k)} + h^{(k)}, \qquad (3.144)$$

where $h^{(k)} \in X_0 \subseteq \mathbb{R}^d$ denotes the $k$-th step from $x^{(k)}$ to $x^{(k+1)}$, in which the step's direction and the step's length are encoded.

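The definition in (3.135) is adapted in the sense that the identification $(x - x_0) := h^{(k)}$ is made and it is set that $\tilde{P}(x_0) := \tilde{P}(x^{(k)})$ and $\tilde{P}_v(x_0) := \tilde{P}_v(x^{(k)})$ such that $\tilde{P}(x^{(k)}) := \tilde{P}_v(x^{(k)})$; finally, the Jacobi matrix $J_{\tilde{P}}(x_0)$ is approximated by means of Broyden's method for solving nonlinear equations (see, e.g., [158, p. 279–283]), i.e.,

$$J_{\tilde{P}}(x_0) := B^{(k)}, \quad B^{(0)} := I, \quad B^{(k+1)} := B^{(k)} + \frac{\bigl(y^{(k)} - B^{(k)} h^{(k)}\bigr) \otimes h^{(k)}}{\lVert h^{(k)} \rVert_{l_2}^2}, \qquad (3.145)$$

where $B^{(k)} \in \mathbb{R}^{d \times d}$ is referred to as the $k$-th iteration Broyden's matrix, the map $\otimes$ is conceived as the outer product w.r.t. two column vectors and is granted a higher precedence than the matrix addition $+_{\mathbb{R}^{d \times d}}$, $I \in \mathbb{R}^{d \times d}$ is the identity matrix as a representative of a non-singular matrix, and $y^{(k)} \in \mathbb{R}^d$ denotes the change in $\tilde{P}_v$ in (3.134) w.r.t. the step $h^{(k)}$, that is,

$$y^{(k)} := \tilde{P}_v(x^{(k+1)}) - \tilde{P}_v(x^{(k)}). \qquad (3.146)$$

By construction, the update in (3.145) satisfies the secant condition $B^{(k+1)} h^{(k)} = y^{(k)}$.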

Hence, the adaptation of the definition in (3.135) reads as

$$\tilde{P} = x^{(k+1)} \mapsto \tilde{P}(x^{(k+1)}) := \tilde{P}_v(x^{(k)}) + B^{(k)} h^{(k)}. \qquad (3.147)$$
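Broyden's rank-one update (3.145), with the change $y^{(k)}$ from (3.146) supplied by the caller, can be written in a few lines of Python. The sketch below uses hypothetical data and checks the two properties that characterize the update: the secant condition $B^{(k+1)} h^{(k)} = y^{(k)}$, and that $B^{(k+1)}$ acts like $B^{(k)}$ on vectors orthogonal to $h^{(k)}$.

```python
def matvec(B, h):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(bij * hj for bij, hj in zip(row, h)) for row in B]


def broyden_update(B, h, y):
    """Broyden's update (3.145): B + (y - B h) ⊗ h / ||h||^2."""
    Bh = matvec(B, h)
    nh2 = sum(hi * hi for hi in h)                    # squared l2-norm of the step
    u = [yi - bhi for yi, bhi in zip(y, Bh)]          # secant residual y - B h
    return [[B[i][j] + u[i] * h[j] / nh2 for j in range(len(h))] for i in range(len(B))]


B0 = [[1.0, 0.0], [0.0, 1.0]]                         # B^(0) := I
h = [0.5, -0.25]                                      # hypothetical step h^(0)
y = [1.0, 0.3]                                        # hypothetical change y^(0) in P_v
B1 = broyden_update(B0, h, y)
print(matvec(B1, h))                                  # approximately y
```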

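By moving from the definition in (3.135) to its adaptation in (3.147), one can obtain the $k$-th step $h^{(k)}$ by solving the trust-region optimization sub-problem regarding (3.140), in which the corresponding model function is set to be the surrogate model $\tilde{K}_s(\mathrm{id}_{Y_0}, \tilde{K}, \tilde{P}) \equiv \mathrm{id}_{Y_0} \circ_{Y_{10}} \tilde{K} \circ_{X_{01}} \tilde{P}$ w.r.t. $h^{(k)}$, that is,

$$
\begin{aligned}
&\min_{h^{(k)} \in F_0^{(k)}} \bigl(\hat{j} \circ (\mathrm{id}_{Y_0} \circ_{Y_{10}} \tilde{K} \circ_{X_{01}} \tilde{P})\bigr)(x^{(k)} + h^{(k)})\\
\overset{(3.147)}{\equiv}\ &\min_{h^{(k)} \in F_0^{(k)}} \bigl(\hat{j} \circ (\mathrm{id}_{Y_0} \circ_{Y_{10}} \tilde{K})\bigr)\bigl(\tilde{P}_v(x^{(k)}) + B^{(k)} h^{(k)}\bigr),
\end{aligned}
\qquad (3.148)
$$

where $F_0^{(k)} \subseteq \mathbb{R}^d$ incorporates the inequality constraint function $c_1$ that reads as

$$c_1 = h^{(k)} \mapsto c_1(h^{(k)}) := \lVert D h^{(k)} \rVert_{l_2} - \Delta^{(k)} : X_0 \to \mathbb{R}, \qquad (3.149)$$

with the constraint $c_1(h^{(k)}) \leq 0$, i.e., with the trust region $\{h^{(k)} \in X_0 \mid \lVert D h^{(k)} \rVert_{l_2} \leq \Delta^{(k)}\}$ in the scaled $l_2$-norm, $\Delta^{(k)} \in \mathbb{R}_+$ being the trust-region radius, and $D \in \mathbb{R}^{d \times d}$ being a diagonal matrix with positive entries that enables scaling in order to potentially enhance the solving process (cf. [158, p. 95ff]).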

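Adequately choosing the trust-region radius $\Delta^{(k)}$ in each iteration is an important part of an optimization algorithm based on the trust-region theory (cf. [158, p. 68]). In this context, let us define the trust-region reduction quotient $\rho^{(k)} \in \mathbb{R}$, that is,

$$\rho^{(k)} := \frac{\operatorname{ared}(x^{(k)}, h^{(k)})}{\operatorname{pred}(x^{(k)}, h^{(k)})}, \qquad (3.150)$$

where $\operatorname{ared} \in Z_0^{X_0 \times X_0}$ denotes the actual reduction function and $\operatorname{pred} \in Z_0^{X_0 \times X_0}$ denotes the predicted reduction function, whose assignment rules read as

$$
\begin{aligned}
\operatorname{ared}(x^{(k)}, h^{(k)}) &:= \bigl(\hat{j} \circ (\mathrm{id}_{Y_0} \circ_{Y_{10}} \tilde{K} \circ_{X_{01}} \tilde{P})\bigr)(x^{(k)}) - \bigl(\hat{j} \circ (\mathrm{id}_{Y_0} \circ_{Y_{10}} \tilde{K} \circ_{X_{01}} \tilde{P})\bigr)(x^{(k)} + h^{(k)}), && (3.151a)\\
\operatorname{pred}(x^{(k)}, h^{(k)}) &:= \bigl(\hat{j} \circ (\mathrm{id}_{Y_0} \circ_{Y_{10}} \tilde{K} \circ_{X_{01}} \tilde{P})\bigr)(x^{(k)}) - \bigl(\hat{j} \circ (\mathrm{id}_{Y_0} \circ_{Y_{10}} \tilde{K})\bigr)\bigl(\tilde{P}_v(x^{(k)}) + B^{(k)} h^{(k)}\bigr). && (3.151b)
\end{aligned}
$$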

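For the sake of the implementation of the TRASM algorithm 3.1, let us straightforwardly overload the function signature of ared by $Z_0^{X_1 \times X_1}$ and the function signature of pred by $Z_0^{X_1 \times X_0}$ such that the assignment rules read as

$$
\begin{aligned}
\operatorname{ared}(\tilde{x}^{(k)}, \tilde{x}^{(k+1)}) &:= \bigl(\hat{j} \circ (\mathrm{id}_{Y_0} \circ_{Y_{10}} \tilde{K})\bigr)(\tilde{x}^{(k)}) - \bigl(\hat{j} \circ (\mathrm{id}_{Y_0} \circ_{Y_{10}} \tilde{K})\bigr)(\tilde{x}^{(k+1)}), && (3.152a)\\
\operatorname{pred}(\tilde{x}^{(k)}, h^{(k)}) &:= \bigl(\hat{j} \circ (\mathrm{id}_{Y_0} \circ_{Y_{10}} \tilde{K})\bigr)(\tilde{x}^{(k)}) - \bigl(\hat{j} \circ (\mathrm{id}_{Y_0} \circ_{Y_{10}} \tilde{K})\bigr)(\tilde{x}^{(k)} + B^{(k)} h^{(k)}), && (3.152b)
\end{aligned}
$$

where $\tilde{x}^{(k)} := \tilde{P}_v(x^{(k)})$ and $\tilde{x}^{(k+1)} := \tilde{P}_v(x^{(k)} + h^{(k)})$.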

Note that $\rho^{(k)}$ in (3.150) quantifies the degree of justification for the identification in (3.148), since the statements in (3.147) and (3.135) are genuinely approximate rather than exact.

Given trust-region reduction thresholds $\eta_1 \in \,]0,1[$ and $\eta_2 \in \,]0,1[$ with $\eta_1 < \eta_2$, a trust-region reduction factor $\gamma \in \,]0,1[$, and a trust-region augmentation factor $\zeta \in \,]1,\infty[$, let us distinguish three cases regarding the $(k+1)$-th iteration trust-region radius $\Delta^{(k+1)}$, i.e.,

$$
\Delta^{(k+1)} :=
\begin{cases}
\gamma \Delta^{(k)}, & \text{if } \rho^{(k)} < \eta_1,\\
\Delta^{(k)}, & \text{if } \eta_1 \leq \rho^{(k)} < \eta_2,\\
\zeta \Delta^{(k)}, & \text{if } \rho^{(k)} \geq \eta_2.
\end{cases}
\qquad (3.153)
$$

Thus, depending on the value of $\rho^{(k)}$, the trust-region radius is decreased, unchanged, or increased. Furthermore, if the trust-region radius is decreased, then the $(k+1)$-th iteration point $x^{(k+1)}$ in (3.144) and the $(k+1)$-th iteration Broyden's matrix $B^{(k+1)}$ in (3.145) are not accepted, more precisely, $x^{(k+1)} := x^{(k)}$ and $B^{(k+1)} := B^{(k)}$; otherwise, they are accepted. For other strategies of updating $\Delta^{(k+1)}$, see, e.g., [49, p. 68] or [158, p. 69].
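The reduction quotient (3.150) and the three-case update (3.153) can be sketched in Python as follows. The default values of $\eta_1$, $\eta_2$, $\gamma$, and $\zeta$ are hypothetical choices within the admissible ranges, not values taken from the text; the returned flag indicates whether $x^{(k+1)}$ and $B^{(k+1)}$ are accepted.

```python
def reduction_ratio(f_old, f_new, f_model):
    """rho in (3.150) with ared = f_old - f_new and pred = f_old - f_model, cf. (3.152)."""
    return (f_old - f_new) / (f_old - f_model)


def update_radius(rho, delta, eta1=0.25, eta2=0.75, gamma=0.5, zeta=2.0):
    """Three-case radius update (3.153); returns (new radius, whether the step is accepted).
    The default parameter values are hypothetical."""
    if not (0.0 < eta1 < eta2 < 1.0 and 0.0 < gamma < 1.0 and zeta > 1.0):
        raise ValueError("parameters violate the ranges required for (3.153)")
    if rho < eta1:
        return gamma * delta, False        # reject: keep x^(k) and B^(k), shrink radius
    if rho < eta2:
        return delta, True                 # accept, keep radius
    return zeta * delta, True              # accept, enlarge radius


rho = reduction_ratio(f_old=1.0, f_new=0.2, f_model=0.0)   # ared = 0.8, pred = 1.0
print(rho, update_radius(rho, delta=1.0))                   # 0.8 (2.0, True)
```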

Mind that $F_0^{(k)}$ in (3.148) possibly also incorporates other constraints (recall § 2.3.2), such as box constraints inherited from the high-fidelity model’s domain. Hence, the corresponding inequality constraint functions $c_2$ and $c_3$ read as

$$
\begin{aligned}
c_2 &= h^{(k)} \mapsto c_2(h^{(k)}) := x_l - (x^{(k)} + h^{(k)}) : X_0 \to X_0, && (3.154a)\\
c_3 &= h^{(k)} \mapsto c_3(h^{(k)}) := (x^{(k)} + h^{(k)}) - x_u : X_0 \to X_0, && (3.154b)
\end{aligned}
$$

where the constraints $c_2(h^{(k)}) \leq 0$ and $c_3(h^{(k)}) \leq 0$ hold componentwise, $x_l \in X_0 \subseteq \mathbb{R}^d$ contains the lower bounds w.r.t. $x$, and $x_u \in X_0 \subseteq \mathbb{R}^d$ contains the upper bounds w.r.t. $x$. Note that, e.g., in the case of box constraints, it could be useful to change the norm from the $l_2$- to the $l_\infty$- or to the $l_1$-norm (cf. [158, p. 97]).
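Membership of a candidate step in $F_0^{(k)}$, as described by (3.149) and (3.154), can be checked with a small Python predicate. This is a sketch with a hypothetical function name; the diagonal scaling matrix $D$ is passed as the list of its diagonal entries.

```python
import math


def is_admissible(x, h, delta, D_diag, x_lower, x_upper):
    """Check whether the step h lies in F_0^(k): c1(h) <= 0 as in (3.149) and
    c2(h) <= 0, c3(h) <= 0 componentwise as in (3.154)."""
    c1 = math.sqrt(sum((di * hi) ** 2 for di, hi in zip(D_diag, h))) - delta
    if c1 > 0.0:
        return False                                             # outside the scaled trust region
    x_new = [xi + hi for xi, hi in zip(x, h)]
    c2_ok = all(lo - xn <= 0.0 for lo, xn in zip(x_lower, x_new))   # lower bounds
    c3_ok = all(xn - up <= 0.0 for xn, up in zip(x_new, x_upper))   # upper bounds
    return c2_ok and c3_ok


print(is_admissible([0.5, 0.5], [0.1, 0.0], 0.2, [1.0, 1.0], [0.0, 0.0], [1.0, 1.0]))  # True
```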

Given an evaluated quantity of interest which is supposed to be computationally expensive, if we incorporate this quantity in the constraints (recall § 2.3.1 and § 2.3.3), then, from an application-driven viewpoint, it might be advisable to replace this quantity by a low-fidelity model as well. Depending on the computational costs of a simplified-physics low-fidelity model in terms of memory storage and evaluation time, it can be more favorable to invoke a data-fit low-fidelity model than a simplified-physics low-fidelity model.

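Finally, the $k$-th step $h^{(k)}$ in (3.144) is computed by²⁸

$$h^{(k)} := \operatorname*{argmin}_{h \in F_0^{(k)}} \bigl(\hat{j} \circ (\mathrm{id}_{Y_0} \circ_{Y_{10}} \tilde{K})\bigr)\bigl(\tilde{P}_v(x^{(k)}) + B^{(k)} h\bigr). \qquad (3.155)$$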
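Footnote 28 mentions an alternative, Levenberg-Marquardt-type step. A Python sketch of the corresponding linear solve for $d = 2$, with hypothetical inputs, might look as follows; it is not the sub-problem (3.155) itself, which additionally involves $\hat{j}$ and the admissible set $F_0^{(k)}$. The residual $e^{(k)} := \tilde{P}_v(x^{(k)}) - \tilde{x}^*$ is assumed to be supplied by the caller.

```python
def lm_step(B, e, lam):
    """Solve ((B^T B) + lam I) h = -B^T e for d = 2 by Cramer's rule (cf. footnote 28)."""
    n = len(B)
    M = [[sum(B[k][i] * B[k][j] for k in range(n)) + (lam if i == j else 0.0)
          for j in range(2)] for i in range(2)]
    r = [-sum(B[k][i] * e[k] for k in range(n)) for i in range(2)]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(r[0] * M[1][1] - M[0][1] * r[1]) / det,
            (M[0][0] * r[1] - r[0] * M[1][0]) / det]


# Hypothetical residual e^(k) with B^(k) = I; lam > 0 shortens the step.
print(lm_step([[1.0, 0.0], [0.0, 1.0]], [1.0, -2.0], lam=1.0))   # [-0.5, 1.0]
```

Increasing $\lambda$ shrinks the step towards a scaled steepest-descent direction, which plays a role similar to reducing the trust-region radius.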

In order to translate the basic building blocks of the TRASM algorithm into a

practical implementation, i.e., into a program (recall Figure 1.4), one has to provide

at least one termination criterion. Mind that applying proper termination criteria for

practical purposes is an intricate endeavor.

Let us consider the maximal number $k_{\max} \in \mathbb{N}$ of iteration points in (3.144) as one termination criterion, more precisely, the condition

$$k < k_{\max} \qquad (3.156)$$

has to be true; otherwise, the algorithm terminates.

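Furthermore, let us consider the norm of the $k$-th step $h^{(k)}$ relative to the norm of the $k$-th iteration point $x^{(k)}$ in (3.144) such that

$$\frac{\lVert x^{(k+1)} - x^{(k)} \rVert_{l_2}}{\lVert x^{(k)} \rVert_{l_2}} \equiv \frac{\lVert h^{(k)} \rVert_{l_2}}{\lVert x^{(k)} \rVert_{l_2}}. \qquad (3.157)$$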

Given an absolute threshold w.r.t. the norm of the step $h^{(k)}$, i.e., $e_{\mathrm{abs}} \in \,]0,1[$, and a relative threshold w.r.t. the norm of the step $h^{(k)}$, i.e., $e_{\mathrm{rel}} \in \,]0,1[$, where it is demanded that $e_{\mathrm{abs}} < e_{\mathrm{rel}}$, one can reformulate (3.157) such that

$$\frac{\lVert h^{(k)} \rVert_{l_2}}{\lVert x^{(k)} \rVert_{l_2}} > e_{\mathrm{rel}} + \frac{e_{\mathrm{abs}}}{\lVert x^{(k)} \rVert_{l_2}} \qquad (3.158)$$

²⁸ The definition of the $k$-th step $h^{(k)}$ in (3.155) aligns itself with the approach in [95]. However, in [56, p. 18f] and in [49, p. 68f], another approach for computing $h^{(k)}$ is discussed within the context of the Levenberg-Marquardt method for least-squares problems (see, e.g., [158, p. 258–262]). There, $h^{(k)}$ is determined by solving the linear system of equations $\bigl((B^{(k)})^T B^{(k)} + \lambda I\bigr) h^{(k)} = -(B^{(k)})^T e^{(k)}$ with $e^{(k)} := \tilde{P}_v(x^{(k)}) - \tilde{x}^*$, where $\tilde{x}^*$ refers to an existing optimal solution of the low-fidelity optimization problem and $\lambda \in \mathbb{R}_+$ plays a similar role as the regularization parameter w.r.t. (3.65). For more details on this approach for computing $h^{(k)}$, I refer to, e.g., [56, p. 18f], [49, p. 68f], [158, p. 69ff], and [158, p. 258–262] and references therein.


Algorithm 3.1: Trust Region Aggressive Space Mapping (TRASM)

# Input:
#   x[0]           ∈ R^d                          initial solution, (3.143)
#   B[0]           ∈ R^(d×d)                      initial Broyden's matrix, (3.145)
#   Δ[0]           ∈ R+                           initial trust-region radius, (3.153)
#   η1, η2, γ, ζ   ∈ ]0,1[ × ]0,1[ × ]0,1[ × ]1,∞[ with η1 < η2, (3.153)
#   F0[0]          ⊆ R^d                          initial set of admissible solutions, (3.149), (3.154)
#   kmax           ∈ ]0,100]                      maximal number of iterations, (3.156)
#   eabs           ∈ ]0,1[                        absolute threshold w.r.t. the norm of the step h[k], (3.158)
#   erel           ∈ ]0,1[                        relative threshold w.r.t. the norm of the step h[k], (3.158)
# Output:
#   x[k+1]         ∈ R^d                          optimal solution after k+1 iterations

for k in 0:(kmax-1)                                                 # (3.156): k < kmax
    x̃[k] = Pv(x[k])                                                 # (3.134), a high-fidelity model evaluation
    h[k] = argmin_{h ∈ F0[k]} (ĵ ∘ (id_Y0 ∘ K̃))(x̃[k] + B[k]*h)       # (3.155)
    x̃[k+1] = Pv(x[k] + h[k])                                        # (3.134), a high-fidelity model evaluation
    ρ[k] = ared(x̃[k], x̃[k+1]) / pred(x̃[k], h[k])                    # (3.152)
    if ρ[k] < η1                                                    # (3.153): reject step, shrink radius
        x[k+1] = x[k]
        B[k+1] = B[k]
        Δ[k+1] = γ*Δ[k]
    elseif η1 ≤ ρ[k] && ρ[k] < η2                                   # (3.153): accept step, keep radius
        x[k+1] = x[k] + h[k]                                        # (3.144)
        y[k] = x̃[k+1] - x̃[k]                                        # (3.146)
        B[k+1] = B[k] + (y[k] - B[k]*h[k]) * h[k]' / norm(h[k])^2    # (3.145)
        Δ[k+1] = Δ[k]
    else                                                            # (3.153): accept step, enlarge radius
        x[k+1] = x[k] + h[k]                                        # (3.144)
        y[k] = x̃[k+1] - x̃[k]                                        # (3.146)
        B[k+1] = B[k] + (y[k] - B[k]*h[k]) * h[k]' / norm(h[k])^2    # (3.145)
        Δ[k+1] = ζ*Δ[k]
    end # if
    if norm(h[k]) / norm(x[k]) ≤ erel + eabs / norm(x[k])           # (3.158)
        break
    end # if
end # for

has to be true, otherwise the algorithm terminates. Technically, the termination

criterion in (3.158) consists of a combination of an absolute termination criterion and

a relative termination criterion.
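The termination logic of (3.156) and (3.158) can be sketched as a single Python predicate. As an implementation choice not stated in the text, (3.158) is multiplied through by $\lVert x^{(k)} \rVert_{l_2}$, i.e., the test $\lVert h^{(k)} \rVert_{l_2} > e_{\mathrm{rel}} \lVert x^{(k)} \rVert_{l_2} + e_{\mathrm{abs}}$ is used, which is equivalent for $x^{(k)} \neq 0$ and also remains well defined for $x^{(k)} = 0$. The function name is hypothetical.

```python
import math


def l2_norm(v):
    return math.sqrt(sum(vi * vi for vi in v))


def keep_iterating(k, k_max, x, h, e_abs, e_rel):
    """True while (3.156) and (3.158) hold; (3.158) is multiplied by ||x^(k)||,
    i.e. ||h|| > e_rel * ||x|| + e_abs, which also covers x^(k) = 0."""
    if k >= k_max:                                     # violates (3.156)
        return False
    return l2_norm(h) > e_rel * l2_norm(x) + e_abs     # (3.158)


print(keep_iterating(0, 10, [3.0, 4.0], [0.1, 0.0], e_abs=1e-6, e_rel=1e-3))   # True
```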

Optionally, as an additional safeguard, one can incorporate the norm of the evaluated actual reduction function in (3.150) regarding an absolute threshold and a relative threshold. If, in some way, the gradient information concerning $\bigl(\hat{j} \circ (\mathrm{id}_{Y_0} \circ_{Y_{10}} \tilde{K} \circ_{X_{01}} \tilde{P})\bigr)(x^{(k)})$ or $\bigl(\hat{j} \circ (\mathrm{id}_{Y_0} \circ_{Y_{10}} \tilde{K})\bigr)(\tilde{x}^{(k)})$ is available, then the gradient information can be utilized for a termination criterion as well.

Finally, let us choose an intermediate level between pseudocode and code in a programming language used in industry (cf. Listing 3.1) in order to represent the TRASM algorithm 3.1.

For the sake of completeness, let us briefly discuss the basic building blocks of the Manifold Mapping (MM) algorithm, which utilizes the surrogate model $\tilde{K}_s(\tilde{R}, \tilde{K}, \iota_{\tilde{P}})$. The corresponding family of algorithms is based upon theoretical considerations that, in the multivariate vector-valued use case, focus particularly on the situation in which $\dim(X_0) < \dim(Y_0)$ with $\dim(X_0) := d$ and $\dim(Y_0) := m_w$ (cf. [56, p. 44]). The elaborations are built upon the discussion in [56, p. 43–48] and in [49, p. 67ff].

Adapting the constructions in (3.137), in (3.138), and in (3.139), providing a desired high-fidelity function value $K_d\in\mathbb R^{m_w}$ (cf. $y_d$ in (2.31) and $Q_d$ in (2.33)) as well as an initial iteration point $x^{(0)}$ such as in (3.143), and setting the initial iteration matrices $S^{(0)}:=I$ and $T^{(0)}:=S^{(0)}$ with $I\in\mathbb R^{m_w\times m_w}$ being the identity matrix, one can concretely define an update scheme for $x^{(k+1)}$ by

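$$y^{(k)}:=(\iota_{\tilde R}\circ_{Y_{10}}\tilde K\circ_{X_{01}}\iota_{\tilde P})(x^{(k)})-T^{(k)}\big(K(x^{(k)})-K_d\big), \qquad (3.159a)$$

$$x^{(k+1)}:=\operatorname*{argmin}_{x\in X_0}\big\|(\iota_{\tilde R}\circ_{Y_{10}}\tilde K\circ_{X_{01}}\iota_{\tilde P})(x)-y^{(k)}\big\|^2_{l^2}, \qquad (3.159b)$$

$$D^{(k+1)}_K:=\big[K(x^{(k+1)})-K(x^{(k)}),\ \dots,\ K(x^{(k+1)})-K(x^{(\max(k+1-d,0))})\big], \qquad (3.159c)$$

$$D^{(k+1)}_{\tilde K}:=\big[(\iota_{\tilde R}\circ_{Y_{10}}\tilde K\circ_{X_{01}}\iota_{\tilde P})(x^{(k+1)})-(\iota_{\tilde R}\circ_{Y_{10}}\tilde K\circ_{X_{01}}\iota_{\tilde P})(x^{(k)}),\ \dots,\ (\iota_{\tilde R}\circ_{Y_{10}}\tilde K\circ_{X_{01}}\iota_{\tilde P})(x^{(k+1)})-(\iota_{\tilde R}\circ_{Y_{10}}\tilde K\circ_{X_{01}}\iota_{\tilde P})(x^{(\max(k+1-d,0))})\big], \qquad (3.159d)$$

$$[U^{(k+1)}_K,\Sigma^{(k+1)}_K,V^{(k+1)}_K]:=\operatorname{svd}(D^{(k+1)}_K), \qquad (3.159e)$$

$$[U^{(k+1)}_{\tilde K},\Sigma^{(k+1)}_{\tilde K},V^{(k+1)}_{\tilde K}]:=\operatorname{svd}(D^{(k+1)}_{\tilde K}), \qquad (3.159f)$$

$$A^{(k+1)}:=\big(I-U^{(k+1)}_K(U^{(k+1)}_K)^T\big), \qquad (3.159g)$$

$$S^{(k+1)}:=D^{(k+1)}_K(D^{(k+1)}_{\tilde K})^{+}+A^{(k+1)}\big(I-U^{(k+1)}_{\tilde K}(U^{(k+1)}_{\tilde K})^T\big), \qquad (3.159h)$$

$$T^{(k+1)}:=(S^{(k+1)})^{+}, \qquad (3.159i)$$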

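where the matrix $A^{(k+1)}\in\mathbb R^{m_w\times m_w}$ serves as a potential stabilizer for the MM algorithm (cf. [56, p. 46]), the superscript $+$ denotes the Moore-Penrose pseudoinverse, and the operation svd refers to the singular value decomposition, i.e., the operation svd performs the corresponding factorization of a given matrix such that

$$D^{(k+1)}_K=U^{(k+1)}_K\Sigma^{(k+1)}_K(V^{(k+1)}_K)^T, \qquad (3.160)$$

$$D^{(k+1)}_{\tilde K}=U^{(k+1)}_{\tilde K}\Sigma^{(k+1)}_{\tilde K}(V^{(k+1)}_{\tilde K})^T, \qquad (3.161)$$

with $U^{(k+1)}_K,U^{(k+1)}_{\tilde K}\in\mathbb R^{m_w\times m_w}$, $\Sigma^{(k+1)}_K,\Sigma^{(k+1)}_{\tilde K}\in\mathbb R^{m_w\times d}$, and $V^{(k+1)}_K,V^{(k+1)}_{\tilde K}\in\mathbb R^{d\times d}$. Note that the matrix $D^{(k+1)}_K\in\mathbb R^{m_w\times d}$ in (3.159c) and the matrix $D^{(k+1)}_{\tilde K}\in\mathbb R^{m_w\times d}$ in (3.159d) can be equivalently determined by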

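$$\forall j\in\{1,\dots,\min(\dim(X_0),k+1)\}.\ \forall i\in\{1,\dots,\dim(Y_0)\}.$$

$$[d_{i,j}]^{(k+1)}_K=\big[K(x^{(k+1)})-K(x^{(k+1-j)})\big]_i, \qquad (3.162)$$

$$[d_{i,j}]^{(k+1)}_{\tilde K}=\big[(\iota_{\tilde R}\circ_{Y_{10}}\tilde K\circ_{X_{01}}\iota_{\tilde P})(x^{(k+1)})-(\iota_{\tilde R}\circ_{Y_{10}}\tilde K\circ_{X_{01}}\iota_{\tilde P})(x^{(k+1-j)})\big]_i. \qquad (3.163)$$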


Observe that the definitions of the matrices $D^{(k+1)}_K$ and $D^{(k+1)}_{\tilde K}$ reveal that, at the program level (recall Figure 1.4), some allocation-sensitive bookkeeping is necessary for constructing $D^{(k+1)}_K$ and $D^{(k+1)}_{\tilde K}$.
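The bookkeeping for (3.162) and (3.163) and the matrix updates (3.159e) to (3.159i) can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the response histories are assumed to be kept as Python lists of vectors, and the sketch uses the thin SVD, whose $U$ has only as many columns as the difference matrix, because with a square orthogonal $U\in\mathbb R^{m_w\times m_w}$ the projector $I-UU^T$ would vanish identically.

```python
import numpy as np


def difference_matrix(history, d):
    """Columns r(x^(k+1)) - r(x^(k+1-j)), j = 1..min(d, k+1), cf. (3.162)/(3.163).

    history: list [r(x^(0)), ..., r(x^(k+1))] of response vectors (length >= 2).
    """
    latest = history[-1]
    ncols = min(d, len(history) - 1)
    return np.column_stack([latest - history[-1 - j] for j in range(1, ncols + 1)])


def mm_update(d_hf, d_sur):
    """Return (S, T) in the spirit of (3.159e)-(3.159i).

    d_hf:  high-fidelity difference matrix D_K   (m_w x r), cf. (3.159c)
    d_sur: surrogate difference matrix    D_K~  (m_w x r), cf. (3.159d)
    """
    eye = np.eye(d_hf.shape[0])
    # Thin SVDs: the columns of U span the range of each difference matrix.
    u_hf, _, _ = np.linalg.svd(d_hf, full_matrices=False)
    u_sur, _, _ = np.linalg.svd(d_sur, full_matrices=False)
    a = eye - u_hf @ u_hf.T                                          # (3.159g)
    s = d_hf @ np.linalg.pinv(d_sur) + a @ (eye - u_sur @ u_sur.T)   # (3.159h)
    t = np.linalg.pinv(s)                                            # (3.159i)
    return s, t
```

For a surrogate difference matrix with full column rank, $S^{(k+1)}D^{(k+1)}_{\tilde K}=D^{(k+1)}_K$, i.e., $S^{(k+1)}$ maps the recent surrogate response differences onto the high-fidelity ones, while the stabilizing term acts only on the orthogonal complement.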

Proper termination criteria for the MM algorithm can be defined analogously to the TRASM algorithm 3.1. Furthermore, in [124], the authors propose some heuristics in order to extend the basic building blocks of the MM algorithm in (3.159) by a trust-region framework, similar to the TRASM algorithm 3.1.

At the algorithm level (recall Figure 1.4), though, there are two notable differences between the MM algorithm and the TRASM algorithm 3.1. Firstly, the MM algorithm does not rely on solving an additional optimization problem such as the one for determining $\tilde P_v(x)$ in (3.134); instead, it relies on computing two singular value decompositions efficiently. And, secondly, the MM algorithm uses one high-fidelity model evaluation in the $(k+1)$-th iteration instead of the two high-fidelity model evaluations in the TRASM algorithm 3.1.

Building upon investigations of algorithms of Space Mapping (SM) kind, such as the TRASM algorithm 3.1, which utilize the surrogate model $\tilde K_s(\iota_{\tilde R},\tilde K,\tilde P)$, and investigations of algorithms of Manifold Mapping (MM) kind, which utilize the surrogate model $\tilde K_s(\tilde R,\tilde K,\iota_{\tilde P})$, the author in [49, p. 112ff] proposes the Response and Parameter Mapping (RPM) algorithm, which utilizes the surrogate model $\tilde K_s(\tilde R,\tilde K,\tilde P)$ and in which basic building blocks from SM algorithms and MM algorithms are combined. The RPM algorithm's additional computational overhead compared to an SM algorithm or an MM algorithm is justified by the hope of a higher accuracy.

In the RPM algorithm, the update scheme of the matrix $B^{(k+1)}$ in (3.145) is performed in a manner similar to the update scheme of the matrix $S^{(k+1)}$ in (3.159h), which could lead to an interesting adaptation of the TRASM algorithm 3.1. However, the data available for a comprehensive overall assessment of the RPM algorithm in terms of, e.g., accuracy, speed, or convergence are very limited. Hence, it is presumed that the benefit of a corresponding adaptation of the TRASM algorithm 3.1 may be limited as well.

Looking at the different kinds of algorithms, one can generally discern that the Achilles' heel (or the spot that requires the most attention) of all the optimization algorithms within the space-mapping paradigm is the need for a benign resemblance of the high-fidelity model and the low-fidelity model (cf. § 3.1.3). However, due to their dependence on properties of the low-fidelity model $\tilde K$, the surrogate model $\tilde K_s(\tilde R,\tilde K,\tilde P)$, and the high-fidelity model $K$, the convergence analysis of the corresponding iteration procedures for (3.140) is fairly intricate (see, e.g., [49, p. 76–84]).

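If we invoke the NREGE $e^{\mathrm{NR}}_{H,\mathrm{sg}}(\hat Q^{\xi}_{\xi})$ in (3.24) and the SSPCC $r^2_{y\tilde y}$ in (3.31) w.r.t. the low-fidelity model, and if we adapt the NREGE and the SSPCC w.r.t. the surrogate model, i.e., if we define ad hoc $e^{\mathrm{NR}}_{H,\mathrm{sg},\tilde K_s}(\hat Q^{\xi}_{\xi})$ and $r^2_{y\tilde y,\tilde K_s}$ by replacing the low-fidelity model in (3.24) and in (3.31) with the surrogate model $\tilde K_s(\tilde R,\tilde K,\tilde P)$, then, driven by heuristics, it is presumably reasonable to assert formally that, with regard to some appropriate norms,

$$\forall\tilde K_s.\ \forall\tilde K.\ \Big(e^{\mathrm{NR}}_{H,\mathrm{sg}}(\hat Q^{\xi}_{\xi})\to 0\ \wedge\ r^2_{y\tilde y}\to 1\Big)\wedge\Big(e^{\mathrm{NR}}_{H,\mathrm{sg},\tilde K_s}(\hat Q^{\xi}_{\xi})\to 0\ \wedge\ r^2_{y\tilde y,\tilde K_s}\to 1\Big)\ \text{as}\ m\to\infty\ \text{and}\ k\to\infty$$

$$\implies\ x^{(k)}\to x^*\ \text{as}\ k\to\infty\ \text{w.r.t. (3.140a)}, \qquad (3.164)$$

where $m$ denotes the number of sampling plan points such as in (3.35).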


Similarly to the comments on (3.35), note that, from an application-driven viewpoint, we are drawn more towards the pre-asymptotic behavior than towards the asymptotic behavior. Furthermore, note that the need for convergence w.r.t. both the low-fidelity model and the surrogate model becomes comprehensible if we plug in the high-fidelity model as the low-fidelity model, i.e., $K\equiv\tilde K$, in (3.164). In this case, convergence w.r.t. the low-fidelity model holds by definition, but one could still observe non-convergence w.r.t. the surrogate model due to an improper choice of correction maps.

Chiefly, the statement in (3.164) is a novel attempt to express formally the above-mentioned intricacy of a convergence analysis of iteration procedures associated with (3.140).

A practical value of the statement in (3.164) lies in the quantitative assessment of the quality of a low-fidelity model and a surrogate model. Hence, it contributes to the discussion regarding this issue (see, for instance, [121] or [49, p. 82ff]), where the NREGE and the SSPCC can serve as quality measures.

Another practical value of the statement in (3.164) resides in the treatment of the

NREGE and the SSPCC as potential safeguards for any optimization algorithm within

the space-mapping paradigm. Hence, keeping the concrete implementations sepa-

rated from the abstract specification in (3.140), I deem it beneficial to incorporate the

NREGE and the SSPCC at the zeroth step, at an intermediate step or at the last step

of an iteration procedure concerning (3.140). At the last step of a corresponding iter-

ation procedure, the NREGE and the SSPCC could serve as an ultimate termination

criterion.

Similarly to the sequential kriging optimization in § 3.3.1, the experimental rate of convergence of iteration procedures concerning (3.140) might be very low, or the corresponding iteration procedures might not converge at all (see, e.g., [49, p. 84] or [49, p. 101f]). Thus, for a given problem, the condition in (3.107) is either not applicable, since its assumption is not satisfied, or not necessarily true.

Therefore, from an application-driven viewpoint embedded in the context of val-

idation and verification, it might be judicious to adapt the procedure SBO-DFLF and

the procedure SBO-SPLF in order to construct the schematic procedure that I refer to

as SGO-SPLF:

1) create a sample w.r.t. the high-fidelity model (§ 3.1.1);

2) provide a simplified-physics low-fidelity model (§ 3.1.3) and compute $r^2_{y\tilde y\mid k}$ w.r.t. the sample from step 1), and if $r^2_{y\tilde y\mid k}<0.75$, break off the current procedure and invoke the procedure SBO-DFLF; otherwise, continue the current procedure;

3) invoke a global optimization algorithm (§ 2.3.3) w.r.t. the provided simplified-

physics low-fidelity model from step 2);

4) use the minimizer from step 3) as a starting point in order to construct in the

minimizer’s vicinity a data-fit low-fidelity model (§ 3.1.2) w.r.t. the simplified-

physics low-fidelity model from step 2);

5) invoke an optimization algorithm within the space-mapping paradigm where

the low-fidelity model is identical with the data-fit low-fidelity model from

step 4);

6) use the minimizer from step 3) as a starting point for a local optimization algorithm (§ 2.3.3) w.r.t. the high-fidelity model from step 1);


7) compare the minimizer from step 6) with the minimizer from step 5) and if

the discrepancy is much larger than a user-assigned threshold, improve the

simplified-physics low-fidelity model from step 2) or improve the data-fit low-

fidelity model from step 4) or improve both low-fidelity models.
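Step 2) of SGO-SPLF amounts to a simple gate on the correlation between high-fidelity and low-fidelity outputs at the sample points. The sketch below assumes that the SSPCC in (3.31) is the squared sample Pearson correlation coefficient and does not model the index $k$ in $r^2_{y\tilde y\mid k}$; the function names are hypothetical.

```python
import numpy as np


def sspcc(y_hf, y_lf):
    """Squared sample Pearson correlation coefficient r^2 between two samples."""
    r = np.corrcoef(y_hf, y_lf)[0, 1]
    return float(r * r)


def sgo_splf_gate(y_hf, y_lf, threshold=0.75):
    """Decision of step 2): fall back to SBO-DFLF if r^2 is below the threshold."""
    return "SBO-DFLF" if sspcc(y_hf, y_lf) < threshold else "continue"
```

Because $r^2$ is invariant under affine rescaling of either sample, a simplified-physics model that is off by a constant factor can still pass the gate; what is measured is the resemblance in trend, not equality of values.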

Similarly to the TRASM algorithm 3.1 and the proposed procedures SBO-SPLF

and SBO-DFLF, the proposed schematic procedure SGO-SPLF builds upon the com-

mon canon of optimization algorithms (recall § 2.3.3).

Note that, in the procedure SGO-SPLF, I extend an approach proposed in [124], which is incorporated by step 4) and step 5). More precisely, it is assumed that the simplified-physics low-fidelity model (e.g., a low-fidelity model based on a coarse-grid discretization, on a weakened termination criterion of an iterative solver, or on a combination of both; recall § 3.1.3) is computationally so expensive that there is a need to construct a data-fit low-fidelity model w.r.t. the simplified-physics low-fidelity model as well.

In order to mitigate the curse of dimensionality (recall § 3.1.2), the data-fit low-fidelity model is constructed in the vicinity of the simplified-physics low-fidelity model's minimizer, that is, the domain under consideration (see, e.g., Figure 2.2 and Figure 2.3) for the sampling plan $X_s$ is smaller.

Finally, observe that the $(k+1)$-th iteration point in (3.140a) of an optimization algorithm within the space-mapping paradigm can theoretically be interpreted as a new $(m+1)$-th sampling plan point in (3.115) of the sequential kriging optimization. This observation furnishes us with some semantics for loosely contrasting the corresponding optimization procedures, all of which are conceived as subtypes of the model management strategy adaptation.

Furthermore, if we select a kriging low-fidelity model within the procedure SGO-SPLF, then, technically, the aforementioned observation gives us the opportunity to suggest an extension of the procedure SGO-SPLF by an inner sequential kriging optimization in order to provide a mechanism for an adaptive improvement of the data-fit low-fidelity model in step 5). The intricacy of such an interweaving is captured by the statement in (3.164) as well.

3.3.3 Co-kriging optimization

Recalling our explanations regarding the mathematical machinery behind the kriging low-fidelity model in § 3.1.2 and regarding the simplified-physics low-fidelity models in § 3.1.3, especially the generic statement in (3.103), one can articulate the essential ideas behind the co-kriging optimization.

Conceptually, a co-kriging low-fidelity model is a kind of kriging low-fidelity model in which the prediction at an arbitrary point $x$, i.e., $\hat y(x)$ in (3.98), incorporates the information of a sample $s$ in (3.13) with respect to a high-fidelity model $K$ and the information of a sample $s_{\tilde K}$ with respect to a low-fidelity model $\tilde K$ (cf. [70, p. 168]). Due to this construction principle, the optimization based on a co-kriging low-fidelity model is conceived as a subtype of the model management strategy fusion.

A very rough intuition behind the prediction of the co-kriging low-fidelity model reveals that it behaves as an interpolation problem with regard to the sample $s$ and, as long as $s_{\tilde K}$ and $s$ do not coincide, as some kind of regression problem with regard to the sample $s_{\tilde K}$ (cf. [70, p. 172]).

Mind that the following technical explanations concerning a co-kriging low-fidelity model are condensed and built upon the discussion in [70, p. 167–177]. Hence, for more details regarding some derivations, I refer to the corresponding reference and the references therein.

Adapting the notation in § 3.3.2 regarding a high-fidelity model $K$ and a low-fidelity model $\tilde K$, respectively, let us assume a sampling plan $X_s\subseteq X_0^m$ in (3.14) with respect to a high-fidelity model $K$ and a sampling plan $X_{s_{\tilde K}}\subseteq X_1^{m_{\tilde K}}$ with respect to a low-fidelity model $\tilde K$, with $m$ and $m_{\tilde K}$ being the respective sample sizes such that $m<m_{\tilde K}$. Furthermore, supposing some kind of sub-structure between $X_0$ and $X_1$ as well as between $Y_1$ and $Y_0$ such as in (3.122), it is demanded that $X_s\subset X_{s_{\tilde K}}$. However, for the sake of brevity, let us omit the explicit mention of the inclusion maps.

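If we identify $X_s$ with an $m\times d$ matrix and the corresponding sampling plan points $x_i$ with $1\times d$ matrices, and if we identify $X_{s_{\tilde K}}$ with an $m_{\tilde K}\times d$ matrix and the corresponding sampling plan points $\tilde x_i$ with $1\times d$ matrices, where $d$ denotes the number of parameters, then one can define an $(m_{\tilde K}+m)\times d$ matrix $X_{s_{\tilde K},s}$ by

$$X_{s_{\tilde K},s}:=\big[X_{s_{\tilde K}}^T\ X_s^T\big]^T, \qquad (3.165a)$$

$$\equiv\big[\tilde x_1^T\ \dots\ \tilde x_{m_{\tilde K}}^T\ x_1^T\ \dots\ x_m^T\big]^T, \qquad (3.165b)$$

$$\equiv\begin{bmatrix}X_{s_{\tilde K}}\\ X_s\end{bmatrix}, \qquad (3.165c)$$

where $X_{s_{\tilde K},s}$ denotes the joint sampling plan.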

With regard to a given joint sampling plan $X_{s_{\tilde K},s}$, let us encode in matrix representation the joint output points as a column vector $y_{s_{\tilde K},s}\in\mathbb R^{(m_{\tilde K}+m)\times 1}$ with

$$y_{s_{\tilde K},s}:=\big[\tilde y_1\ \dots\ \tilde y_{m_{\tilde K}}\ y_1\ \dots\ y_m\big]^T, \qquad (3.166a)$$

$$:=\big[\tilde y^T\ y^T\big]^T, \qquad (3.166b)$$

where $\tilde y_i$ refer to the output points regarding the sampling plan $X_{s_{\tilde K}}$ with respect to a low-fidelity model $\tilde K$, and $y_i$ refer to the output points regarding the sampling plan $X_s$ with respect to a high-fidelity model $K$. Similarly to (3.83), the output points $\tilde y_i$ are represented by the column vector $\tilde y\in\mathbb R^{m_{\tilde K}\times 1}$, and the output points $y_i$ are represented by the column vector $y\in\mathbb R^{m\times 1}$.

Technically, $y_{s_{\tilde K},s}$ is associated with a corresponding random field $Y_{s_{\tilde K},s}$, i.e., a vector of random variables. In this context, a crucial ingredient is the so-called auto-regressive model assumption, which is a kind of memoryless property or Markov property; it states that

$$\forall x\neq x_i.\ \operatorname{cov}\big(Y_s(x_i),Y_{s_{\tilde K}}(x)\mid Y_{s_{\tilde K}}(x_i)\big)=0. \qquad (3.167)$$

The statement in (3.167) reflects the idea that one can conceive the high-fidelity model as an exact representation. More precisely, if one possesses all the information about the high-fidelity model at $x_i$, then the low-fidelity model will not provide any new information about $Y_s(x_i)$, and any potential errors are due solely to the low-fidelity model. For more details regarding the statement in (3.167), see, e.g., [70, p. 168] and the references therein.

Let us concretize the entities $K(x)$, $\tilde K(x)$, and $Z_\Delta(x)$ in the generic statement in (3.103) by Gaussian processes and the entity $Z_\rho(x)$ by a constant scaling factor $\rho\in\mathbb R$; then one obtains the concrete statement

$$\forall x\in X_s.\ Z(x)=\rho\tilde Z(x)+Z_\Delta(x), \qquad (3.168)$$

where the high-fidelity model evaluation and the low-fidelity model evaluation are redefined as Gaussian processes $Z(x)$ and $\tilde Z(x)$, respectively.

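Similarly to the covariance matrix in (3.88), one can construct the corresponding covariance matrix $C\in\mathbb R^{(m_{\tilde K}+m)\times(m_{\tilde K}+m)}$ from the five individual correlation matrices

$$\Psi_{\tilde K}(X_{s_{\tilde K}},X_{s_{\tilde K}})\in\mathbb R^{m_{\tilde K}\times m_{\tilde K}}, \qquad (3.169a)$$

$$\Psi_{\tilde K}(X_{s_{\tilde K}},X_s)\in\mathbb R^{m_{\tilde K}\times m}, \qquad (3.169b)$$

$$\Psi_{\tilde K}(X_s,X_{s_{\tilde K}})\in\mathbb R^{m\times m_{\tilde K}}, \qquad (3.169c)$$

$$\Psi_{\tilde K}(X_s,X_s)\in\mathbb R^{m\times m}, \qquad (3.169d)$$

$$\Psi_\Delta(X_s,X_s)\in\mathbb R^{m\times m}. \qquad (3.169e)$$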

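Let us interpret the matrices in (3.169) as the evaluations of maps $\Psi_\bullet$ with the signatures $\mathbb R^{a\times d}\times\mathbb R^{b\times d}\to\mathbb R^{a\times b}$ with $a,b,d\in\mathbb N$, and let us invoke (3.89) in order to determine the entries of the correlation matrices; then one can define the family of maps $\Psi_\bullet$ as

$$\Psi_\bullet=({}^1X,{}^2X)\mapsto\Psi_\bullet({}^1X,{}^2X):=[\psi_{i,l}]_\bullet\equiv\Big[\exp\Big(-\sum_{j=1}^{d}{}^{\bullet}\theta_j\,\big|{}^1x^j_i-{}^2x^j_l\big|^{{}^{\bullet}p_j}\Big)\Big]_\bullet, \qquad (3.170)$$

where $i\in\{1,\dots,a\}$ and $l\in\{1,\dots,b\}$, ${}^1x_i$ refers to the sampling plan points of the sampling plan ${}^1X$, and ${}^2x_l$ refers to the sampling plan points of the sampling plan ${}^2X$.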

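Finally, one can construct the covariance matrix $C$ as

$$C:=\begin{bmatrix}\sigma^2_{\tilde K}\Psi_{\tilde K}(X_{s_{\tilde K}},X_{s_{\tilde K}}) & \rho\sigma^2_{\tilde K}\Psi_{\tilde K}(X_{s_{\tilde K}},X_s)\\ \rho\sigma^2_{\tilde K}\Psi_{\tilde K}(X_s,X_{s_{\tilde K}}) & \rho^2\sigma^2_{\tilde K}\Psi_{\tilde K}(X_s,X_s)+\sigma^2_\Delta\Psi_\Delta(X_s,X_s)\end{bmatrix}, \qquad (3.171)$$

where, similarly to (3.87), $\sigma^2_{\tilde K},\sigma^2_\Delta\in\mathbb R$.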
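The correlation family (3.170) and the block covariance matrix (3.171) translate directly into array code. A minimal NumPy sketch, assuming scalar or length-$d$ arrays for the hyperparameters and given (not yet estimated) values for $\rho$, $\sigma^2_{\tilde K}$ and $\sigma^2_\Delta$; the function names are hypothetical, and the joint sampling plan (3.165) corresponds to stacking `x_lf` on top of `x_hf`.

```python
import numpy as np


def psi(x1, x2, theta, p):
    """Correlation map (3.170): x1 is (a x d), x2 is (b x d); returns (a x b)."""
    diff = np.abs(x1[:, None, :] - x2[None, :, :])    # |x1_i^j - x2_l^j|, shape (a, b, d)
    return np.exp(-np.sum(theta * diff ** p, axis=2))


def cokriging_covariance(x_lf, x_hf, th_lf, p_lf, th_d, p_d, rho, s2_lf, s2_d):
    """Assemble C of (3.171) from the five correlation matrices (3.169a)-(3.169e)."""
    top = np.hstack([s2_lf * psi(x_lf, x_lf, th_lf, p_lf),
                     rho * s2_lf * psi(x_lf, x_hf, th_lf, p_lf)])
    bottom = np.hstack([rho * s2_lf * psi(x_hf, x_lf, th_lf, p_lf),
                        rho**2 * s2_lf * psi(x_hf, x_hf, th_lf, p_lf)
                        + s2_d * psi(x_hf, x_hf, th_d, p_d)])
    return np.vstack([top, bottom])
```

Although $X_s\subset X_{s_{\tilde K}}$ means that the same points appear twice in the joint plan, $C$ stays positive definite: the Schur complement of its upper-left block is $\sigma^2_\Delta\Psi_\Delta(X_s,X_s)$.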

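Given the definitions in (3.171) and (3.170), it is observable that, similarly to (3.97), one has to determine maximum likelihood estimates of the parameters $(\theta_{\tilde K},p_{\tilde K},\theta_\Delta,p_\Delta,\rho)\in[0,+\infty[^d\times[0,2]^d\times[0,+\infty[^d\times[0,2]^d\times\mathbb R$. For the sake of notational ease (especially in (3.170)), let us treat the 5-tuple $(\theta_{\tilde K},p_{\tilde K},\theta_\Delta,p_\Delta,\rho)$ and the 5-tuple $({}^{\tilde K}\theta,{}^{\tilde K}p,{}^{\Delta}\theta,{}^{\Delta}p,\rho)$ as definitionally equal, i.e., let us demand that the expression $(\theta_{\tilde K},p_{\tilde K},\theta_\Delta,p_\Delta,\rho)\equiv({}^{\tilde K}\theta,{}^{\tilde K}p,{}^{\Delta}\theta,{}^{\Delta}p,\rho)$ holds componentwise.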

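It is assumed that the information associated with the low-fidelity model is independent of the information associated with the high-fidelity model. Hence, one can determine the maximum likelihood estimates for $\mu_{\tilde K}\in\mathbb R$ and $\sigma^2_{\tilde K}$ by

$$\hat\mu_{\tilde K}:=\frac{\mathbf 1_{\tilde K}^T\Psi_{\tilde K}(X_{s_{\tilde K}},X_{s_{\tilde K}})^{-1}\tilde y}{\mathbf 1_{\tilde K}^T\Psi_{\tilde K}(X_{s_{\tilde K}},X_{s_{\tilde K}})^{-1}\mathbf 1_{\tilde K}}, \qquad (3.172a)$$

$$\hat\sigma^2_{\tilde K}:=\frac{1}{m_{\tilde K}}(\tilde y-\hat{\boldsymbol\mu}_{\tilde K})^T\Psi_{\tilde K}(X_{s_{\tilde K}},X_{s_{\tilde K}})^{-1}(\tilde y-\hat{\boldsymbol\mu}_{\tilde K}), \qquad (3.172b)$$

where $\hat{\boldsymbol\mu}_{\tilde K}$ is defined similarly to (3.95), that is,

$$\hat{\boldsymbol\mu}_{\tilde K}:=\hat\mu_{\tilde K}\cdot\mathbf 1_{\tilde K}, \qquad (3.173)$$

where $\mathbf 1_{\tilde K}:=[1\ 1\ \dots\ 1\ 1]^T$ with $\mathbf 1_{\tilde K}\in\mathbb R^{m_{\tilde K}\times 1}$.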

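Let us define the concentrated ln-likelihood function associated with the information regarding the low-fidelity model, $L_{c\ln,\tilde K}=(\theta_{\tilde K},p_{\tilde K})\mapsto L_{c\ln,\tilde K}(\theta_{\tilde K},p_{\tilde K})$, with the signature $[0,+\infty[^d\times[0,2]^d\to\,]-\infty,0]$ such that the assignment $L_{c\ln,\tilde K}(\theta_{\tilde K},p_{\tilde K})$ reads

$$L_{c\ln,\tilde K}(\theta_{\tilde K},p_{\tilde K}):=-\frac{m_{\tilde K}}{2}\ln(\hat\sigma^2_{\tilde K})-\frac12\ln\big(|\Psi_{\tilde K}(X_{s_{\tilde K}},X_{s_{\tilde K}})|\big). \qquad (3.174)$$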

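By invoking a suitable optimization algorithm (recall § 2.3.3), one can compute the maximum likelihood estimates of $(\theta_{\tilde K},p_{\tilde K})$ by considering the expression

$$(\hat\theta_{\tilde K},\hat p_{\tilde K}):=\operatorname*{argmin}_{(\theta_{\tilde K},p_{\tilde K})\in[0,+\infty[^d\times[0,2]^d}-L_{c\ln,\tilde K}(\theta_{\tilde K},p_{\tilde K}). \qquad (3.175)$$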
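For fixed hyperparameters, (3.172a), (3.172b) and (3.174) can be evaluated with a single Cholesky factorization, which also yields $\ln|\Psi|$ without forming the determinant explicitly. The sketch below is an illustration under stated assumptions: it takes an already assembled correlation matrix, its name is hypothetical, and the outer minimization (3.175) over $(\theta_{\tilde K},p_{\tilde K})$ is left to whichever optimizer one prefers. Called with $\Psi_\Delta(X_s,X_s)$ and $y_\Delta$ instead, the same function evaluates the analogous quantities for the difference model.

```python
import numpy as np


def concentrated_ln_likelihood(psi_mat, y):
    """Return (mu_hat, sigma2_hat, ln_L) following (3.172a), (3.172b) and (3.174).

    psi_mat: symmetric positive definite correlation matrix (n x n)
    y:       outputs at the n sampling plan points (1-D array)
    """
    n = len(y)
    chol = np.linalg.cholesky(psi_mat)                # psi_mat = L L^T

    def solve(b):
        # psi_mat^{-1} b via two solves with the Cholesky factor
        return np.linalg.solve(chol.T, np.linalg.solve(chol, b))

    ones = np.ones(n)
    mu = (ones @ solve(y)) / (ones @ solve(ones))     # (3.172a)
    resid = y - mu * ones                             # y - mu_hat * 1, cf. (3.173)
    sigma2 = (resid @ solve(resid)) / n               # (3.172b)
    ln_det = 2.0 * np.sum(np.log(np.diag(chol)))      # ln |psi_mat|
    ln_l = -0.5 * n * np.log(sigma2) - 0.5 * ln_det   # (3.174)
    return mu, sigma2, ln_l
```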

In order to determine the maximum likelihood estimates of $(\theta_\Delta,p_\Delta,\rho)$, one has to provide a column vector $y_\Delta\in\mathbb R^{m\times 1}$ which encodes the difference between the column vector $y$ and the column vector $\tilde y$. Due to the auto-regressive model assumption in (3.167), one has to consider only those output points of $\tilde y$ that are associated with the sampling plan $X_s$. Hence, one has to construct a column vector $\tilde y|_{X_s}\in\mathbb R^{m\times 1}$ that can be interpreted as a restriction of $\tilde y$ to the sampling plan $X_s$.$^{29}$

Conceptually, it is favorable to construct $X_{s_{\tilde K}}$ and $\tilde y$ first and, subsequently, $X_s$ (such that $X_s\subset X_{s_{\tilde K}}$) and $y$. Mind, though, that a sampling plan should possess desirable properties: it should be space-filling and non-collapsing (recall § 3.1.1).

Hence, in the case of an Audze-Eglais LHC or a Maximin LHC (see Figure 3.1), one has to adopt an exchange algorithm (see [70, p. 28f]) in order to construct $X_s$ from $X_{s_{\tilde K}}$ such that $X_s\subset X_{s_{\tilde K}}$ and $X_s$ possesses the desirable properties of a sampling plan. More precisely: Let us randomly select an initial $X_s$ such that $X_s\subset X_{s_{\tilde K}}$ and $X_{s_{\tilde K}}/X_s\in\mathbb R^{(m_{\tilde K}-m)\times d}$.$^{30}$ Furthermore, let us set $X^{(1)}_s:=X_s$. Given the running index $k\in\{1,\dots,m\}$, let us compute the corresponding space-filling (and non-collapsing) criterion for $X^{(k)}_s$, i.e., the Audze-Eglais criterion for an Audze-Eglais LHC and the Morris-Mitchell criterion for a Maximin LHC. Given the running index $j\in\{1,\dots,m_{\tilde K}-m\}$, one can exchange the sampling plan point $x_k$ of $X^{(k)}_s$ with the $j$-th sampling plan point of $X_{s_{\tilde K}}/X^{(k)}_s$, thus theoretically constructing $m_{\tilde K}-m$ sampling plans $X^{(k,j)}_s$. For each $j$, one can compute the corresponding criterion of $X^{(k,j)}_s$. If there is a $j^*\in\{1,\dots,m_{\tilde K}-m\}$ such that the criterion for $X^{(k,j^*)}_s$ is optimal among all $X^{(k,j)}_s$ and better than the criterion for $X^{(k)}_s$, then one sets $X^{(k+1)}_s:=X^{(k,j^*)}_s$; otherwise, one sets $X^{(k+1)}_s:=X^{(k)}_s$, thus completing an iteration of the exchange procedure. The exchange procedure is continued for all sampling plan points $x_k$ of $X_s$ until it terminates for $k=m$ and returns $X_s:=X^{(m+1)}_s$.
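The exchange procedure above can be sketched as a single greedy pass over the positions $k$. This is a minimal sketch, assuming the Morris-Mitchell criterion $\Phi_q=\big(\sum_{i<l}\|x_i-x_l\|^{-q}\big)^{1/q}$ (smaller is better) as the space-filling measure; an Audze-Eglais criterion could be passed in instead. The function works on row indices of $X_{s_{\tilde K}}$, so the result satisfies $X_s\subset X_{s_{\tilde K}}$ by construction; the function names, the default $q$ and the random seed are hypothetical.

```python
import numpy as np
from itertools import combinations


def morris_mitchell(x, q=2.0):
    """Morris-Mitchell criterion Phi_q over all point pairs (smaller is better)."""
    dists = np.array([np.linalg.norm(a - b) for a, b in combinations(x, 2)])
    return float(np.sum(dists ** (-q)) ** (1.0 / q))


def exchange_subset(x_lf, m, criterion=morris_mitchell, seed=0):
    """Pick m rows of x_lf (the plan X_{s_K~}) as a space-filling nested plan X_s.

    Returns the sorted row indices of x_lf that form X_s.
    """
    rng = np.random.default_rng(seed)
    n = len(x_lf)
    chosen = [int(i) for i in rng.choice(n, size=m, replace=False)]  # initial X_s
    for k in range(m):                                 # position k holds x_k of X_s^(k)
        best_val = criterion(x_lf[chosen])
        best_j = None
        rest = [j for j in range(n) if j not in chosen]    # X_{s_K~} / X_s^(k)
        for j in rest:
            trial = chosen.copy()
            trial[k] = j                               # candidate plan X_s^(k, j)
            val = criterion(x_lf[trial])
            if val < best_val:                         # keep only strict improvements
                best_val, best_j = val, j
        if best_j is not None:
            chosen[k] = best_j                         # X_s^(k+1) := X_s^(k, j*)
    return sorted(chosen)
```

Since an exchange is only accepted if it strictly improves the criterion, the returned plan is never worse than the random initial selection.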

In the case of a Sobol quasi-random sequence (see Figure 3.2), though, one can construct $X_{s_{\tilde K}}$ first and, then, either pick the sampling plan points $\tilde x_k$ of $X_{s_{\tilde K}}$ with $k\in\{1,\dots,m\}$ as the sampling plan points $x_i$ of $X_s$ or, alternatively, construct $X_s$ as a Sobol quasi-random sequence with $m$ sampling plan points from

29 Since some kind of sub-structure is supposed between $X_0$ and $X_1$ as well as between $Y_1$ and $Y_0$ such as in (3.122), one can denote by $\tilde K|_{X_0}\in\hom(X_0,Y_1)$ the restriction of $\tilde K$ to $X_0$ at the function level (recall Figure 1.4). Furthermore, the map $\tilde K|_{X_0}$ can be composed with $\iota_{\tilde R}$ such that one can construct the map $\iota_{\tilde R}\circ_{Y_{10}}\tilde K|_{X_0}\in\hom(X_0,Y_0)$. I argue, therefore, that it is reasonable to conceive the column vector $\tilde y|_{X_s}$ within the context of the map $\iota_{\tilde R}\circ_{Y_{10}}\tilde K|_{X_0}$. Keep in mind that if we solely operate with various forms of the real numbers such as, e.g., $\mathbb R$, $\mathbb R^n$, and $\mathbb R^{n\times m}$ with $n,m\in\mathbb N$, then a lot of valuable conceptual distinction is probably lost.

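30 The operation $/$ is overloaded with the signature $\mathbb R^{m_{\tilde K}\times d}\times\mathbb R^{m\times d}\to\mathbb R^{(m_{\tilde K}-m)\times d}$, where the resulting difference sampling plan $X_{s_{\tilde K}}/X_s$ contains all the sampling points of $X_{s_{\tilde K}}$ that are not contained in $X_s$.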


scratch. In both approaches, the resulting sampling plan $X_s$ satisfies $X_s\subset X_{s_{\tilde K}}$ and is itself a Sobol quasi-random sequence; thus, it is itself a space-filling and non-collapsing sampling plan. Therefore, utilizing Sobol quasi-random sequences to construct the sampling plans $X_s$ and $X_{s_{\tilde K}}$ can be seen as a time-saving and cost-reducing alternative to using an Audze-Eglais LHC or a Maximin LHC together with the necessary exchange algorithm.

Imagine a use case in which $X_{s_{\tilde K}}$ is constructed as an Audze-Eglais LHC or a Maximin LHC and $X_s$ is constructed as a Sobol quasi-random sequence. In such a use case, it is very likely that no output points within $\tilde y$ can be associated with the sampling plan points of $X_s$. Hence, one has to invoke a fallback plan (cf. [70, p. 169]), i.e., given the maximum likelihood estimates $(\hat\theta_{\tilde K},\hat p_{\tilde K})$, let us construct a kriging low-fidelity model (see (3.98)) of the low-fidelity model $\tilde K$ as

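$$\hat y(\tilde x):=\hat\mu_{\tilde K}+\tilde r^T\Psi_{\tilde K}(X_{s_{\tilde K}},X_{s_{\tilde K}})^{-1}(\tilde y-\hat{\boldsymbol\mu}_{\tilde K}), \qquad (3.176)$$

where $\hat y:X_1\to\mathbb R$ such that $\hat y(\tilde x)$ indicates the prediction at an arbitrary point $\tilde x$, and $\tilde r:=[\tilde r_i]\in\mathbb R^{m_{\tilde K}\times 1}$ denotes the correlation column vector that reads as

$$\tilde r:=\big[\tilde r_1\ \tilde r_2\ \dots\ \tilde r_{m_{\tilde K}-1}\ \tilde r_{m_{\tilde K}}\big]^T, \qquad (3.177)$$

where, similarly to (3.81), the components $\tilde r_i$ are defined as

$$\tilde r_i:=\exp\Big(-\sum_{j=1}^{d}{}^{\tilde K}\theta_j\,\big|\tilde x^j-\tilde x^j_i\big|^{{}^{\tilde K}p_j}\Big). \qquad (3.178)$$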

By means of (3.176), let us forge a column vector $\hat y|_{X_s}\in\mathbb R^{m\times 1}$, which then takes the role of $\tilde y|_{X_s}$. Given $\tilde y|_{X_s}$, one can ultimately create the column vector $y_\Delta$ that reads as

$$y_\Delta:=y-\rho\,\tilde y|_{X_s}, \qquad (3.179)$$

where the term $\rho\,\tilde y|_{X_s}$ encodes a multiplication of the column vector $\tilde y|_{X_s}$ by the scalar $\rho$. At the program level (recall Figure 1.4), the term $\tilde y|_{X_s}$ forces us to filter the components of the vector $\tilde y$, i.e., to establish a map $\tilde y\mapsto\tilde y|_{X_s}$, in order to compute $y_\Delta$ in (3.179) adequately.
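The map $\tilde y\mapsto\tilde y|_{X_s}$ and the difference (3.179) can be sketched as a row-matching filter. This minimal sketch assumes that both sampling plans are stored as row-wise arrays and that coincident points agree up to a small tolerance `atol` (a hypothetical parameter, like the function names); if a point of $X_s$ is missing from $X_{s_{\tilde K}}$, it raises an error, which is exactly the situation in which the kriging fallback (3.176) is needed.

```python
import numpy as np


def restrict_to_plan(x_lf, y_lf, x_hf, atol=1e-12):
    """Build y~|_{X_s}: the entries of y_lf whose rows of x_lf coincide with rows of x_hf."""
    idx = []
    for x in x_hf:
        hits = np.flatnonzero(np.all(np.isclose(x_lf, x, atol=atol, rtol=0.0), axis=1))
        if hits.size == 0:
            raise ValueError("point of X_s not in X_{s_K~}; use the kriging fallback (3.176)")
        idx.append(hits[0])
    return y_lf[np.array(idx)]


def delta_vector(y_hf, y_lf_restricted, rho):
    """y_Delta := y - rho * y~|_{X_s}, cf. (3.179)."""
    return y_hf - rho * y_lf_restricted
```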

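Let us compute the maximum likelihood estimates for $\mu_\Delta\in\mathbb R$ and $\sigma^2_\Delta$ by

$$\hat\mu_\Delta:=\frac{\mathbf 1_\Delta^T\Psi_\Delta(X_s,X_s)^{-1}y_\Delta}{\mathbf 1_\Delta^T\Psi_\Delta(X_s,X_s)^{-1}\mathbf 1_\Delta}, \qquad (3.180a)$$

$$\hat\sigma^2_\Delta:=\frac1m(y_\Delta-\hat{\boldsymbol\mu}_\Delta)^T\Psi_\Delta(X_s,X_s)^{-1}(y_\Delta-\hat{\boldsymbol\mu}_\Delta), \qquad (3.180b)$$

where $\hat{\boldsymbol\mu}_\Delta$ is defined similarly to (3.95), that is,

$$\hat{\boldsymbol\mu}_\Delta:=\hat\mu_\Delta\cdot\mathbf 1_\Delta, \qquad (3.181)$$

where $\mathbf 1_\Delta:=[1\ 1\ \dots\ 1\ 1]^T$ with $\mathbf 1_\Delta\in\mathbb R^{m\times 1}$.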

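Let us define the concentrated ln-likelihood function associated with the information regarding the low-fidelity model and the high-fidelity model, i.e., let us define the map $L_{c\ln,\Delta}=(\theta_\Delta,p_\Delta,\rho)\mapsto L_{c\ln,\Delta}(\theta_\Delta,p_\Delta,\rho)$ such that the map's signature is $[0,+\infty[^d\times[0,2]^d\times\mathbb R\to\,]-\infty,0]$ and the assignment $L_{c\ln,\Delta}(\theta_\Delta,p_\Delta,\rho)$ reads as

$$L_{c\ln,\Delta}(\theta_\Delta,p_\Delta,\rho):=-\frac m2\ln(\hat\sigma^2_\Delta)-\frac12\ln\big(|\Psi_\Delta(X_s,X_s)|\big). \qquad (3.182)$$

By invoking a suitable optimization algorithm (recall § 2.3.3), one can compute the maximum likelihood estimates of $(\theta_\Delta,p_\Delta,\rho)$ by considering the expression

$$(\hat\theta_\Delta,\hat p_\Delta,\hat\rho):=\operatorname*{argmin}_{(\theta_\Delta,p_\Delta,\rho)\in[0,+\infty[^d\times[0,2]^d\times\mathbb R}-L_{c\ln,\Delta}(\theta_\Delta,p_\Delta,\rho).{}^{31} \qquad (3.183)$$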

In practical applications, it is advisable to restrict $\rho$ in (3.183) to a bounded interval, e.g., to the closed interval $[-a,a]$ such that $\rho\in[-a,a]$, where $a\in\mathbb R^+$ is a user-assigned and problem-dependent entity.

Additionally, mind that, in order for $\hat\rho$ to be a reliable estimate of the scaling in (3.168) and in (3.179), respectively, the sample size $m$ regarding $X_s$ has to be greater than or equal to a problem-dependent lower bound (cf. [70, p. 176]).

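After computing the maximum likelihood estimates $(\hat\theta_{\tilde K},\hat p_{\tilde K},\hat\theta_\Delta,\hat p_\Delta,\hat\rho)$, one can specify the co-kriging low-fidelity model as

$$\hat y(x):=\hat\mu_{\tilde K,K}+c^TC^{-1}(y_{s_{\tilde K},s}-\hat{\boldsymbol\mu}_{\tilde K,K}), \qquad (3.184)$$

where $\hat y:X_1\to\mathbb R$ such that $\hat y(x)$ indicates the prediction at an arbitrary point $x$. The maximum likelihood estimate for $\mu_{\tilde K,K}\in\mathbb R$ in (3.184) is given by

$$\hat\mu_{\tilde K,K}:=\frac{\mathbf 1_{\tilde K,K}^TC^{-1}y_{s_{\tilde K},s}}{\mathbf 1_{\tilde K,K}^TC^{-1}\mathbf 1_{\tilde K,K}}, \qquad (3.185a)$$

whereas $\hat{\boldsymbol\mu}_{\tilde K,K}$ is defined similarly to (3.95), that is,

$$\hat{\boldsymbol\mu}_{\tilde K,K}:=\hat\mu_{\tilde K,K}\cdot\mathbf 1_{\tilde K,K}, \qquad (3.186)$$

where $\mathbf 1_{\tilde K,K}:=[1\ 1\ \dots\ 1\ 1]^T$ with $\mathbf 1_{\tilde K,K}\in\mathbb R^{(m_{\tilde K}+m)\times 1}$.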

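Recalling (3.88), the column vector $c\in\mathbb R^{(m_{\tilde K}+m)\times 1}$ in (3.184) encodes the covariance between the sampling plan $X_{s_{\tilde K}}$ and an arbitrary point $x$ as well as the covariance between the sampling plan $X_s$ and an arbitrary point $x$. The column vector $c$ can be written as

$$c:=\big[c_1^T\ c_2^T\big]^T, \qquad (3.187)$$

where the column vector $c_1\in\mathbb R^{m_{\tilde K}\times 1}$ and the column vector $c_2\in\mathbb R^{m\times 1}$ are defined as

$$c_1:=\hat\rho\hat\sigma^2_{\tilde K}\Psi_{\tilde K}(X_{s_{\tilde K}},x), \qquad (3.188a)$$

$$c_2:=\hat\rho^2\hat\sigma^2_{\tilde K}\Psi_{\tilde K}(X_s,x)+\hat\sigma^2_\Delta\Psi_\Delta(X_s,x), \qquad (3.188b)$$

where $\Psi_{\tilde K}(X_{s_{\tilde K}},x)\in\mathbb R^{m_{\tilde K}\times 1}$ and $\Psi_{\tilde K}(X_s,x),\Psi_\Delta(X_s,x)\in\mathbb R^{m\times 1}$. Since the column vector $c$ plays a role similar to that of the column vector $r$ in (3.98), one can invoke the definition of the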

31 Regarding some applications, there might be numerical issues that are presumably caused predominantly by the estimate $\hat\rho$. In order to mitigate such potential numerical issues, it is advisable to round the numerical value associated with the estimate $\hat\rho$. However, an in-depth analysis of the propagation of, e.g., the corresponding round-off error is out of the scope of the present work.


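family of maps $\Psi_\bullet$ in (3.170) where, by setting $b:=1$, one can specify the signature by $\mathbb R^{a\times d}\times\mathbb R^{1\times d}\to\mathbb R^{a\times 1}$ with $a,d\in\mathbb N$.

Technically, one can apply the interpretation that ${}^2X$ is a sampling plan with a single sampling plan point ${}^2x_1$, which is defined as ${}^2x_1:=x$. Hence, the expression $\Psi_{\tilde K}(X_{s_{\tilde K}},x)$ in (3.188a) and the expression $\Psi_\Delta(X_s,x)$ in (3.188b) can be understood by adapting the assignment in (3.170) such that

$$\Psi_\bullet=({}^1X,x)\mapsto\Psi_\bullet({}^1X,x):=[\psi_{i,1}]_\bullet\equiv\Big[\exp\Big(-\sum_{j=1}^{d}{}^{\bullet}\theta_j\,\big|{}^1x^j_i-x^j\big|^{{}^{\bullet}p_j}\Big)\Big]_\bullet, \qquad (3.189)$$

where $i\in\{1,\dots,a\}$, ${}^1x_i$ refers to the sampling plan points of the sampling plan ${}^1X$, and $x$ refers to an arbitrary point in (3.184).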

An important observation regarding the co-kriging low-fidelity model in (3.184) is that if we choose a point $x$ that is a sampling plan point of $X_s$ in (3.165), then $\hat y(x)$ is the corresponding output point within $y$ in (3.166).

Thus, regarding the information associated with the high-fidelity model $K$, the co-kriging low-fidelity model behaves the same way as the kriging low-fidelity model in (3.98), i.e., it interpolates. Recalling the very rough intuition at the beginning of § 3.3.3, the co-kriging low-fidelity model does not show such a behavior with regard to the information associated with the low-fidelity model $\tilde K$.

Similarly to (3.111), one can provide a mean squared prediction error $(\hat s_y(x))^2$ at an arbitrary point $x$ for the co-kriging low-fidelity model $\hat y(x)$ in (3.184). Hence, the error $(\hat s_y(x))^2$ (cf. [70, p. 172]) can be defined as

$$(\hat s_y(x))^2:=\hat\rho^2\hat\sigma^2_{\tilde K}+\hat\sigma^2_\Delta-c^TC^{-1}c+\frac{\big(1-\mathbf 1_{\tilde K,K}^TC^{-1}c\big)^2}{\mathbf 1_{\tilde K,K}^TC^{-1}\mathbf 1_{\tilde K,K}}, \qquad (3.190)$$

where $\hat s_y:X\to\mathbb R$. Analogously to (3.111), the term in (3.190) involving the fraction is negligibly small; thus, let us reformulate the mean squared prediction error $(\hat s_y(x))^2$ as

$$(\hat s_y(x))^2:=\hat\rho^2\hat\sigma^2_{\tilde K}+\hat\sigma^2_\Delta-c^TC^{-1}c. \qquad (3.191)$$
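Putting (3.171) and (3.184) to (3.191) together gives the following minimal NumPy sketch of the co-kriging predictor and its simplified mean squared prediction error. It assumes that the estimates $\hat\theta_{\tilde K},\hat p_{\tilde K},\hat\theta_\Delta,\hat p_\Delta,\hat\rho,\hat\sigma^2_{\tilde K},\hat\sigma^2_\Delta$ are already available and are passed in as plain numbers (the function and argument names are hypothetical), and it uses linear solves instead of forming $C^{-1}$ explicitly.

```python
import numpy as np


def psi(x1, x2, theta, p):
    """Correlation map (3.170)/(3.189): rows of x1 (a x d) against rows of x2 (b x d)."""
    diff = np.abs(x1[:, None, :] - x2[None, :, :])
    return np.exp(-np.sum(theta * diff ** p, axis=2))


def cokriging(x_lf, y_lf, x_hf, y_hf, th_lf, p_lf, th_d, p_d, rho, s2_lf, s2_d):
    """Return the predictor (3.184) and the simplified MSE (3.191) as functions of x."""
    c_mat = np.block([
        [s2_lf * psi(x_lf, x_lf, th_lf, p_lf), rho * s2_lf * psi(x_lf, x_hf, th_lf, p_lf)],
        [rho * s2_lf * psi(x_hf, x_lf, th_lf, p_lf),
         rho**2 * s2_lf * psi(x_hf, x_hf, th_lf, p_lf) + s2_d * psi(x_hf, x_hf, th_d, p_d)],
    ])                                                   # (3.171)
    y_joint = np.concatenate([y_lf, y_hf])               # (3.166)
    ones = np.ones_like(y_joint)
    mu = (ones @ np.linalg.solve(c_mat, y_joint)) / (ones @ np.linalg.solve(c_mat, ones))  # (3.185a)
    weights = np.linalg.solve(c_mat, y_joint - mu)       # C^{-1} (y - mu_hat * 1)

    def c_vec(x):
        xx = np.atleast_2d(x)
        c1 = rho * s2_lf * psi(x_lf, xx, th_lf, p_lf)[:, 0]                   # (3.188a)
        c2 = (rho**2 * s2_lf * psi(x_hf, xx, th_lf, p_lf)
              + s2_d * psi(x_hf, xx, th_d, p_d))[:, 0]                        # (3.188b)
        return np.concatenate([c1, c2])                                       # (3.187)

    def predict(x):
        return mu + c_vec(x) @ weights                                        # (3.184)

    def mse(x):
        c = c_vec(x)
        return rho**2 * s2_lf + s2_d - c @ np.linalg.solve(c_mat, c)          # (3.191)

    return predict, mse
```

At a point of $X_s$ the vector $c$ coincides with a column of $C$, so the predictor returns the high-fidelity output and the error (3.191) vanishes there; away from the sample points the error is positive.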

Utilizing the mean squared prediction error (ˆ

sy(x))2in (3.191), one could formu-

late a sequential co-kriging optimization in the same fashion as the sequential kriging

optimization in § 3.3.1.

However, in the present work, we do not dwell on the sequential co-kriging optimization but rather on the co-kriging optimization. More precisely: Given a high-fidelity model $K$ and a low-fidelity model $\tilde K$ which, in the context at hand, is considered conceptually indistinguishable from a surrogate model, the high-fidelity optimization problem, for instance, in the formulation in (3.116), is replaced by a co-kriging low-fidelity optimization problem which one can state as, e.g.,

$$\min_{x\in X_0}\ (\hat j\circ\hat y)(x), \qquad (3.192)$$

where $\hat y(x)$ refers to the co-kriging low-fidelity model in (3.184).

Notice that, in a kriging low-fidelity optimization problem corresponding to (3.116), one can obtain information solely from the high-fidelity model in order to forge the kriging low-fidelity model.

Thus, let us conceive the corresponding optimization kind, i.e., kriging optimiza-

tion, as a subkind of surrogate-based optimization.


In a co-kriging low-fidelity optimization problem such as in (3.192), though, a

necessary minimum of information regarding the high-fidelity model is established,

and, then, one can steer the amount of information regarding the low-fidelity model

to, hopefully, improve the co-kriging low-fidelity model.

Thus, let us conceive the corresponding optimization kind, i.e., co-kriging opti-

mization, as a subkind of surrogate-guided optimization.

Depending on the computational costs of the low-fidelity model and the sur-

rogate model, respectively, in the co-kriging optimization, one can build a data-fit

low-fidelity model of the surrogate model – analogously to step 3) and step 4) in

the procedure SGO-SPLF – in order to utilize the data-fit low-fidelity model of the

surrogate model as a proxy for the original surrogate model in the construction of

the co-kriging low-fidelity model.

Finally, comparing the co-kriging optimization and the corresponding kriging optimization, one can check whether the relation in (3.107) holds. Interestingly, it is conceivable that the relation in (3.107) holds and that a measurable computation time regarding the co-kriging optimization is still larger than a measurable computation time regarding the kriging optimization. This behavior appears paradoxical.

The rationale for this seemingly paradoxical behavior is that the determination of the maximum likelihood estimates of the parameters $(\theta_{\tilde K},p_{\tilde K},\theta_\Delta,p_\Delta,\rho)$ in (3.175) and in (3.183) is much more involved than the determination of the maximum likelihood estimates of the parameters $(\theta,p)$ in (3.97).

Therefore, if the number of parameters and the dimensions of the matrices in the maximum likelihood estimations are too large (cf. [70, p. 171]), then a measurable computation time regarding the co-kriging optimization can be larger than a measurable computation time regarding the kriging optimization, even though the relation in (3.107) holds.

Observe that the sequential co-kriging optimization appears as a hybrid of the model management strategies fusion and adaptation. Mind that, to the best of my knowledge, such hybrids are not extensively discussed within the classification scheme of model management strategies proposed in [166].

For instance, recalling the end of § 3.3.2, the suggested extension of the procedure SGO-SPLF by an inner sequential kriging optimization can be interpreted as a hybrid of one kind of adaptation model management strategy and another kind of adaptation model management strategy.

Furthermore, if we apply a formalization-oriented viewpoint in the sense that we discuss the co-kriging low-fidelity model in (3.184) embedded in the context of the optimization within the space-mapping paradigm (see § 3.3.2), then one can replace the low-fidelity optimization problem in (3.117) by the co-kriging low-fidelity optimization problem in (3.192). In the context of § 3.3.2, the co-kriging low-fidelity model can be understood as a second-level low-fidelity model w.r.t. the first-level low-fidelity model $\tilde{K}$.

A first benefit of such a formalization-oriented viewpoint is that it grants us a novel procedure that can be interpreted as a hybrid of the model management strategies fusion and adaptation. Such a viewpoint therefore suggests regarding this hybrid (a co-kriging low-fidelity model within the space-mapping paradigm) as comparable to the hybrid constituted by the sequential co-kriging optimization – at least at the function level (recall Figure 1.4).

A second benefit of such a formalization-oriented viewpoint is that it nurtures a modular construction principle such that, e.g., the extension of the procedure SGO-SPLF could be realized by an inner sequential co-kriging optimization. Hence, this


novel suggested extension can be interpreted as a hybrid of one kind of adaptation model management strategy, a fusion model management strategy, and another kind of adaptation model management strategy.

A third benefit of such a formalization-oriented viewpoint is that it furnishes us with a formal suspicion about the important role of the low-fidelity model within the co-kriging low-fidelity model. Note that the numerical experiments in § 3.2 within the context of surrogate-based optimization (for instance, in terms of $e^{N}_{cv}$, $e^{NR}_{cv}$, $r^2_{y\tilde{y},cv}$, or $S^{N}_{y,i}$ as shown in Figures 3.12–3.15) furnish us with an initial numerical suspicion. To the best of my belief, however, the role of the low-fidelity model within the co-kriging low-fidelity model has not yet been exhaustively elaborated formally in the literature (see, e.g., [70, p. 167–177] and references therein). If we consider the hybrid constituted by a co-kriging low-fidelity model within the space-mapping paradigm, then one can invoke the statement in (3.164), which has to be adjusted regarding the sample size $m$ and the sample size $m_{\tilde{K}}$ in (3.165). Hence, let us write the adjusted statement, with regard to some appropriate norms, as

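$$
\begin{aligned}
&\forall K_{\tilde{R},\tilde{K}_2,\tilde{s}}.\ \forall \tilde{K}_2.\ \forall \tilde{K}_1.\
\Big(e^{NR}_{H,sg,\tilde{K}_1}(\hat{Q}^{\xi}_{\xi}) \to 0 \wedge r^2_{y\tilde{y},\tilde{K}_1} \to 1\Big) \wedge
\Big(e^{NR}_{H,sg,\tilde{K}_2}(\hat{Q}^{\xi}_{\xi}) \to 0 \wedge r^2_{y\tilde{y},\tilde{K}_2} \to 1\Big) \wedge {} \\
&\qquad \Big(e^{NR}_{H,sg,K_{\tilde{R},\tilde{K}_2,\tilde{s}}}(\hat{Q}^{\xi}_{\xi}) \to 0 \wedge r^2_{y\tilde{y},K_{\tilde{R},\tilde{K}_2,\tilde{s}}} \to 1\Big)
\ \text{as } m \to \infty \text{ and } m_{\tilde{K}_1} \to \infty \text{ and } k \to \infty \\
&\implies x^{(k)} \to x^{*} \ \text{as } k \to \infty \ \text{w.r.t. (3.140a)},
\end{aligned}
\tag{3.193}
$$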

where $\tilde{K}_1$ refers to the first-level low-fidelity model $\tilde{K}$ with $m_{\tilde{K}_1} \equiv m_{\tilde{K}}$, and $\tilde{K}_2$ refers to the co-kriging low-fidelity model in (3.176). The remaining entities in (3.193) are defined according to our comments on the statement in (3.164).

Hence, in view of the statement in (3.193), one can formulate the reasonable formal suspicion that the choice of the low-fidelity model within the co-kriging low-fidelity model obeys similar restrictions as the low-fidelity model within the space-mapping paradigm.

Or, to put it differently: by embedding the convergence issues related to the co-kriging low-fidelity model into the convergence issues related to the space-mapping paradigm, one can formally argue that the quality of the low-fidelity model within the co-kriging low-fidelity model has to satisfy some problem-dependent lower bounds.

Note that the present work has primarily focused on working out the benefits of a purely formalization-oriented viewpoint, which has led to novel insights of theoretical value (such as hybrid model management strategies) and of practical value (such as the quality of the low-fidelity model within the co-kriging low-fidelity model). These insights open up new research directions at the algorithm level, at the program level, and at the application level (recall Figure 1.4).

However, the scrutiny of these new research directions, i.e., their thorough and extensive examination, for instance, by comparing with the results in § 3.2.1 and in § 3.2.2 and by extending the corresponding database, has to be left for future work.


3.4 In closing

This chapter’s primary purpose has been to provide an in-depth elaboration of this thesis’s key notion, surrogate optimization, and of the partitioning of this notion proposed in § 1.2 into three sub-notions: (1) surrogate modeling & simulation, (2) surrogate-based optimization, and (3) surrogate-guided optimization.

Throughout the elaborations, we have anticipated algebraic tools from the category theoretical language of ch. 4 so that we could tag the various notions of surrogate optimization with algebraic notes. Like a lubricant, the algebraic tools enabled us to operate smoothly between the various layers in Figure 1.4.

Regarding the sub-notion (1) surrogate modeling & simulation, we have initially

discussed an abstract setting in order to introduce relevant notions from the common

methodological and terminological toolbox. We have looked at different classes of

mathematical problems and we have encountered various important terms such as

high-fidelity model and low-fidelity model.

Next, we have defined the high-fidelity function approximation error. In the discussion about sampling plans, we have encountered three different kinds of sampling plans and their peculiarities. To the best of my knowledge, some sampling plans, such as those constructed by a Sobol quasi-random sequence, are not yet widely represented in the literature on surrogate optimization. Afterwards, we have introduced the empirical surrogate modeling error, for which we have defined both the empirical training error and the empirical generalization error.

The empirical generalization error, in particular, is an important indicator within surrogate optimization, and we have presented this error in various guises, for instance, within the k-fold cross-validation method. Another important indicator is the squared sample Pearson correlation coefficient. We have carved out some nuances, not yet exhaustively discussed, that arise when this coefficient is used together with the empirical generalization error within the k-fold cross-validation method. These nuances gain in significance from the fact that, in the present work, the number of sampling plan points concerning the high-fidelity model is assumed to be small.
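The two indicators can be computed side by side from k-fold cross-validation predictions, as in the following sketch. It is an illustration under stated assumptions, not the procedure of this thesis: a least-squares line stands in for a data-fit low-fidelity model, the folds are interleaved rather than random, the RMS form of the generalization error is assumed, and all function names are hypothetical. The last two calls hint at the nuances mentioned above: $r^2$ ignores scaling and offset, and with very few points it is trivially 1.

```python
import math

def pearson_r2(a, b):
    """Squared sample Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab * sab / (saa * sbb)

def fit_line(xs, ys):
    """Least-squares straight line; only a stand-in for a data-fit low-fidelity model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return lambda x: my + slope * (x - mx)

def k_fold_cv(xs, ys, k, fit=fit_line):
    """Held-out predictions over k folds; returns (RMS generalization error, r^2)."""
    n = len(xs)
    preds = [0.0] * n
    for fold in range(k):
        test = set(range(fold, n, k))             # interleaved, deterministic folds
        train = [i for i in range(n) if i not in test]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            preds[i] = model(xs[i])
    rms = math.sqrt(sum((p - y) ** 2 for p, y in zip(preds, ys)) / n)
    return rms, pearson_r2(preds, ys)

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
print(k_fold_cv(xs, [3.0 * x + 1.0 for x in xs], k=3))
print(pearson_r2([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))  # off by a factor of 10, yet r^2 == 1
print(pearson_r2([0.0, 1.0], [5.0, -3.0]))              # any two points give r^2 == 1
```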

Furthermore, we have developed a potential link between the sample Pearson correlation coefficient and a low-fidelity model’s normalized global first-order sensitivity measures, which has culminated in a cautiously formulated conjecture about the trustworthiness of low-fidelity models’ normalized global first-order sensitivity measures.

Subsequently, we have examined deterministic data-fit low-fidelity models, i.e., multivariate polynomials and radial basis functions, and probabilistic data-fit low-fidelity models, that is, kriging low-fidelity models. We have investigated diverse aspects of these models in order to gain a holistic understanding of them and to spot possible pitfalls and potential room for improvement. Some of the investigated aspects are the underlying construction principles of the models, the computational significance of the dimension of the domain, the numerical relevance of the evaluation scheme, and a sampling plan’s influence on the condition number of a problem-representing matrix.
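A two-point worked example illustrates the last aspect. Assuming a Gaussian kernel with fixed $\theta = 1$ (an illustrative choice, not a setting from this thesis), the correlation matrix of two sampling points at distance $d$ is $\begin{pmatrix} 1 & c \\ c & 1 \end{pmatrix}$ with $c = \exp(-\theta d^2)$. Its eigenvalues are $1 \pm c$, so its 2-norm condition number is $\kappa_2 = (1 + c)/(1 - c) \approx 2/(\theta d^2)$ as $d \to 0$. Clustered sampling points therefore make the matrix nearly singular. The helper names are hypothetical.

```python
import math

def gaussian_corr(d, theta=1.0):
    """Correlation exp(-theta * d**2) of two points at distance d (assumed kernel)."""
    return math.exp(-theta * d * d)

def cond_2x2(c):
    """2-norm condition number of [[1, c], [c, 1]] for 0 <= c < 1 (eigenvalues 1 +/- c)."""
    return (1.0 + c) / (1.0 - c)

for d in (1.0, 0.1, 0.01):
    print(f"d = {d:5.2f}  cond = {cond_2x2(gaussian_corr(d)):.1f}")
```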

We have closed the subpart (1) surrogate modeling & simulation with an elaboration of simplified-physics low-fidelity models. We have examined the general concept of a user-prescribed hierarchy of problems with regard to the degree of fidelity


and the computational costs. We have concretized this general concept by presenting a user-prescribed hierarchy of magnetoquasistatic and magnetostatic problems which are associated with simplified-physics low-fidelity models. From this hierarchy, we have abstracted some diagrams in a loose category-theoretical style. At the end, we have paved the way for a purely formalization-oriented viewpoint on some surrogate-guided optimization approaches.

Regarding the sub-notion (2) surrogate-based optimization, we have examined the optimization with the test functions in § 2.3.3 by data-fit low-fidelity models and by emulated simplified-physics low-fidelity models. The essential idea is to solve an optimization problem associated with the low-fidelity model whose optimal solution is utilized as a starting point for the optimization problem associated with the high-fidelity model.

The proof of work in the form of, e.g., visualizations of the above-mentioned indicators within surrogate optimization appears valuable since, to the best of my knowledge, there is no comprehensive database of corresponding benchmarks. Hence, the proof of work equips us with a benchmark-focused classification of test functions (more generally, high-fidelity models) whose advantages and disadvantages we have briefly discussed.

An advantage is the opportunity to roughly classify the behavior of a corresponding optimization problem within the magnetoquasistatic and magnetostatic context. A disadvantage is that it is not clear whether a reliable, complete list of indicators exists.

We have closed the subpart (2) surrogate-based optimization with an elaboration of the proposed procedures SBO-DFLF and SBO-SPLF and their potential combinations.

Regarding the sub-notion (3) surrogate-guided optimization, I have argued from an application-driven viewpoint that it is worthwhile to check whether the number of high-fidelity model evaluations of a surrogate-based optimization approach is higher than the number of high-fidelity model evaluations of a surrogate-guided optimization approach. The added value of such a check becomes apparent in the context of validation and verification.

Subsequently, we have dwelt on the sequential kriging optimization as a subkind of the model management strategy adaptation, on optimization procedures within the space-mapping paradigm, which are also a subkind of the model management strategy adaptation, and on the co-kriging optimization, which can be seen as a subkind of the model management strategy fusion.

Concerning the optimization within the space-mapping paradigm, we have utilized a formalization-oriented viewpoint to pin down properly, e.g., the conceptual distinction between a low-fidelity model and a surrogate model. Note that conceptual information is lost if we solely operate with various representations of the real numbers such as, e.g., $\mathbb{R}$, $\mathbb{R}^n$, and $\mathbb{R}^{n \times m}$ with $n, m \in \mathbb{N}$.

Nevertheless, we have concretized the formal concepts to investigate the syntax

and the semantics of the multivariate scalar-valued use case and the multivariate

vector-valued use case.
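The conceptual distinction between a low-fidelity model and a surrogate model can be carried into code by giving the two different types, even though both are maps from lists of floats to lists of floats. The sketch below assumes an input space mapping of the textbook form $x \mapsto Bx + c$, one common choice in the space-mapping literature and not necessarily the mapping used in § 3.3.2; the class and field names are hypothetical. The surrogate object keeps a reference to its low-fidelity model and to the mapping parameters, which a bare representation in $\mathbb{R}^n$ or $\mathbb{R}^{n \times m}$ would lose.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]

@dataclass(frozen=True)
class LowFidelityModel:
    """Hypothetical wrapper around a coarse model map R^n -> R^m."""
    response: Callable[[Vector], Vector]

@dataclass(frozen=True)
class InputMappedSurrogate:
    """Hypothetical surrogate: the low-fidelity model composed with x -> B x + c."""
    low_fidelity: LowFidelityModel
    B: List[List[float]]
    c: Vector

    def response(self, x: Vector) -> Vector:
        z = [sum(b_ij * x_j for b_ij, x_j in zip(row, x)) + c_i
             for row, c_i in zip(self.B, self.c)]
        return self.low_fidelity.response(z)

lf = LowFidelityModel(lambda z: [(z[0] - 1.0) ** 2])
surrogate = InputMappedSurrogate(lf, B=[[2.0]], c=[0.5])
print(surrogate.response([1.0]))
```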

Subsequently, we have examined the basic building blocks of a representation of the Trust Region Aggressive Space Mapping (TRASM) algorithm, that is, Algorithm 3.1. Additionally, we have discussed the basic building blocks of some other algorithms proposed in the literature on the space-mapping paradigm.
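The core iteration of aggressive space mapping can be sketched in one dimension as follows. This is a heavily simplified illustration, not Algorithm 3.1: the trust-region safeguard of TRASM is omitted. It also assumes a linear coarse model $R_c(z) = 2z$, so that parameter extraction has a closed form, whereas in general parameter extraction is itself an optimization problem. The models, numbers, and function names are hypothetical. In one dimension, the Broyden update of the mapping Jacobian reduces to the secant slope.

```python
import math

def aggressive_space_mapping(fine, extract, z_star, x0, B0=1.0, tol=1e-10, max_iter=20):
    """1D aggressive space mapping (no trust region): solve p(x) = z_star.

    fine:    high-fidelity (fine) model response x -> R_f(x)
    extract: parameter extraction, response -> z with R_c(z) matching the response
    """
    x, B = x0, B0
    f = extract(fine(x)) - z_star        # residual of the space-mapping equation
    n_fine = 1
    for _ in range(max_iter):
        if abs(f) < tol:
            break
        h = -f / B                       # quasi-Newton step with the current estimate B
        x_new = x + h
        f_new = extract(fine(x_new)) - z_star
        n_fine += 1
        B = B + (f_new - f - B * h) / h  # Broyden update; in 1D this equals (f_new - f) / h
        x, f = x_new, f_new
    return x, n_fine

# Hypothetical example: coarse model R_c(z) = 2 z with optimal coarse design z* = 1.
fine = lambda x: 2.0 * (x - 0.3) + 0.1 * x ** 2
extract = lambda r: r / 2.0              # closed-form parameter extraction for R_c(z) = 2 z
x_sm, calls = aggressive_space_mapping(fine, extract, z_star=1.0, x0=1.0)
print(x_sm, calls)
```

The exact solution of $p(x) = z^*$ here is $x = (\sqrt{1.26} - 1)/0.1$, which the iteration reaches within a handful of fine-model evaluations.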

Driven by heuristics, we have formulated a convergence statement that incorporates some of the above-mentioned indicators and that, to the best of my belief, furnishes


us with a novel approach to the delicate convergence-related issues within the space-mapping paradigm.

At the end, we have elaborated on the proposed procedure SGO-SPLF.

Concerning the co-kriging optimization, we have examined the basic building blocks for constructing the co-kriging low-fidelity model. A special challenge is the handling of a sampling plan associated with the high-fidelity model together with a sampling plan associated with a low-fidelity model, where the use of algebraic notions facilitates the analysis.

An intriguing novel observation is that sampling plans constructed by a Sobol quasi-random sequence may help to reduce the overall computational costs of constructing a co-kriging low-fidelity model.

We have closed the subpart (3) surrogate-guided optimization by elaborating on the benefits of a purely formalization-oriented viewpoint that provides us with novel insights of theoretical value (such as potential hybrid model management strategies) and of practical value (such as convergence-related issues regarding the quality of the low-fidelity model within the co-kriging low-fidelity model).


Chapter 4

An algebraic modeling framework using the category theoretical language for applications in surrogate optimization

In § 1.3, I have briefly introduced the formal language of category theory as a holistic-structural approach to mathematics which can serve as a promising mediator between the tool set from logical analysis and the tool set from numerical analysis. Furthermore, I have pointed at the new opportunity that opens up when the category theoretical language is employed to complement the primarily numerical-analytic perspective in the context of surrogate optimization.

In § 2.1.2, we have made a detour to a structural perspective on a structural property; in § 2.2.2, we have made a detour to a structural perspective on another structural property; and, in § 2.3.2, we have made a detour to a structural perspective on the objective functions. Hence, I have used these detours to show by examples that the formal language of category theory is lurking in the background of some established perspectives on optimization within the electromagnetics context.

In ch. 3, various notions of surrogate optimization have been tagged with algebraic notes. In § 3.1.3, especially, algebraic tools from the category theoretical language have been anticipated in the elaborations on simplified-physics low-fidelity models. In § 3.3.2 and in § 3.3.3, some benefits of a formalization-oriented viewpoint on surrogate-guided optimization have been shown in order to, e.g., recognize hybrid model management strategies or formulate heuristics-driven convergence statements.

In the present chapter, let us head further into the research direction of the formalization-oriented viewpoint and aim at strengthening its theoretical foundations.

Firstly, we recapitulate some concepts from the previous chapters. Moreover, in addition to the context of validation and verification for the category theoretical language (see, e.g., § 2.3.2), we briefly sketch the emerging research direction of full automation of surrogate-guided optimization (SGO) and how it can serve as another potential context for the category theoretical language. Afterwards, I concisely mention some relevant related work.

Secondly, let us introduce the category theory toolset where we focus on core

tools. Initially, we foster some intuition about the toolset and, subsequently, we

apply some rigor in the reasoning. Finally, I illuminate a couple of computational

facets.

Thirdly, the category theory toolset is used for specifying a general optimization

problem and for specifying surrogate-guided optimization methods where the focus


is on methods within the space-mapping paradigm.

Fourthly, I examine other use cases for the category theory toolset related to

high-fidelity models and low-fidelity models relevant to applications in electrical

engineering.

We close the chapter, and thus also the advance in the research direction of the

formalization-oriented viewpoint, at a fork with three open roads for future use

cases.

4.1 Recapitulating and enlarging the contextual landscape

By recollecting some landmarks, let us concisely recapitulate the contextual landscape so far (cf. § 1.2). Next, we succinctly illuminate the context of full automation (recall § 1.1) of surrogate-guided optimization (recall § 3.3). A vague and intuitive idea underlying full automation of SGO – or, in general, the full automation of surrogate optimization (recall ch. 3) – is that, given an optimization problem by a user, an ideal software system ascertains the "best" (in some sense) SGO approach for the optimization problem at hand. Admittedly, an in-depth investigation of this idea is beyond the scope of the present work (cf. the third disclaimer in § 1.3). However, we at least sketch the potential practical contribution of the formal language of category theory to the discussion of this idea. We end the section by naming some work related to our subsequent elaborations.

4.1.1 Recapitulating & the context of full automation of SGO

In engineering applications, the class of surrogate-guided optimization (SGO) methods is beneficial for accelerating the numerical search for optimal solutions (see, e.g., [70], [116]). A fundamental assumption regarding SGO schemes is that the overall computational costs of the numerical optimization are dominated by the costs of evaluating the objective function (aka high-fidelity function). Since the aim is to quickly find the high-fidelity function’s optimal solution, the following basic ideas of SGO methods arise: (1) approximate the objective function by one or more surrogate functions (aka low-fidelity functions), which capture the high-fidelity function’s structure and, by design, have much lower evaluation costs than the high-fidelity function; and (2) draw sparingly on the high-fidelity function.
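A toy loop makes ideas (1) and (2) concrete. A quadratic data-fit surrogate is fitted through the three best samples, its minimizer is proposed, and the expensive function is evaluated only at that proposal, with every high-fidelity call counted. This is essentially successive parabolic interpolation, used here only as a stand-in. It has no trust region or other safeguards, the test function is hypothetical, and it is not any specific SGO method from this thesis.

```python
def parabola_vertex(a, fa, b, fb, c, fc):
    """Abscissa of the vertex of the quadratic interpolating three points."""
    num = (b - a) ** 2 * (fb - fc) - (b - c) ** 2 * (fb - fa)
    den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
    if den == 0.0:
        return None                         # collinear data: no finite vertex
    return b - 0.5 * num / den

def quadratic_surrogate_minimize(high_fidelity, x_init, tol=1e-8, max_calls=20):
    """Fit a cheap quadratic, minimize it, then sample the expensive function once."""
    pts = [(x, high_fidelity(x)) for x in x_init]
    calls = len(pts)
    while calls < max_calls:
        pts.sort(key=lambda p: p[1])        # best samples first
        pts = pts[:3]                       # surrogate uses the three best samples
        (a, fa), (b, fb), (c, fc) = pts
        x_new = parabola_vertex(a, fa, b, fb, c, fc)
        if x_new is None or min(abs(x_new - x) for x, _ in pts) < tol:
            break                           # surrogate proposes nothing new
        pts.append((x_new, high_fidelity(x_new)))
        calls += 1
    best = min(pts, key=lambda p: p[1])
    return best[0], calls

x_best, n_calls = quadratic_surrogate_minimize(
    lambda x: (x - 2.0) ** 2 + 0.1 * (x - 2.0) ** 4, x_init=(0.0, 1.0, 3.0))
print(x_best, n_calls)
```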

Understandably, a lot of research effort in the field of surrogate-guided optimization revolves around the numerical properties of the interplay between the high-fidelity function and different kinds of surrogate functions. There are two common ways to classify surrogate functions: (#1) data-fit, simplified-physics, or projection-based; and (#2) intrusive or non-intrusive, where intrusive means that the numerical software underlying the high-fidelity function needs to be modified. Note that, in general, both computational models and non-computational models such as physical experiments are conceivable.

Despite all the advances in this field, there remains a need to investigate how to achieve full automation of surrogate-guided optimization methods, as, for instance, commented in [119, p. 1513]: "Full automation of space mapping and other surrogate-assisted design methods is a necessary condition of widespread acceptance of such methods by the designers and industry." Furthermore, full automation is indirectly described by stressing aspects like "ensuring global convergence, immunity to coarse model inaccuracy, as well as robustness with respect to the surrogate model setup" (cf. [119, p. 1512]). All these aspects are undoubtedly


essential since they hint at important common language features underlying numerical analysis; but it seems unlikely that these common language features alone can express all the intuitive associations with the idea of full automation. Thus, there is an opportunity to point the research in this field in a new, orthogonal direction, more precisely, towards category theoretical language features.

Imagine the following conceivable realization of full automation: an ideal software system selects a surrogate model and chooses the appropriate algorithm for the given optimization problem without a user’s intervention. This thought experiment requires a formal language in which the optimization problem, the algorithms, and the surrogate models can be defined in a way suitable for the software system, because such a system can only deal with well-defined tasks.

I argue that thinking of full automation in terms of software systems shifts the analysis from point-by-point considerations of single concrete optimization problems to a holistic view of the modeling chain. This holistic perspective sheds some light on the hidden costs of the modeling chain, and it allows one to reassess the role of the empirically undeniable savings in terms of high-fidelity function evaluations.

Depending on a user’s capabilities and experience, the error-proneness of self-implemented surrogate-guided optimization schemes varies considerably, and even extensive testing cannot rule out hidden bugs. Therefore, striving for reproducibility, a thorough cost–benefit analysis could lead a user to stick to classical optimization schemes provided by commercial or non-commercial software systems, and to put more trust in solutions found by methods that have stood the test of time (see, e.g., [35], [158], [142] or [96]).

Let us examine how the formal language of category theory, which is a holistic-structural approach to mathematics (see, e.g., [11], [177] or [180]), can reduce the gap between the mathematical modeling involved in surrogate-guided optimization and a software system. Admittedly, the restriction to a formal language is a simplification, since real-world software systems are vastly more complex and more deeply anchored in physical machines than any mathematical abstraction or formalism could ever capture. Nevertheless, this simplification is reasonable because a crucial part of a software system is the programming language – and a programming language is a formal language, too.

Note that programming languages are a vibrant research field within the computer science community. In this context, type theory is an essential theoretical cornerstone that is related to a practical programming language’s type system. The elucidations of the present work assume only a working knowledge of type theory and type systems. For more details on type theory, see, e.g., [88] and references therein.

The design of a type system largely determines to what extent it can express all the properties of a well-defined task: at one end of the spectrum, there are dynamically-checked languages (such as, e.g., the MATLAB® PL or the Python PL), where a trend towards richer type systems can be seen for performance reasons (see, e.g., the Julia PL in [26]); at the other end of the spectrum, there are statically-checked languages (such as, e.g., the Haskell PL or the Agda¹ PL), where verification has been an impetus for very rich type systems.

¹ In [138], the Agda programming language is partially used for investigating partial derivatives and the corresponding chain rule for multivariate calculus.

Let us conceive a type as a set equipped with an equivalence relation, but without committing to one specific technical concept (for details about various technical concepts, see, e.g., [214]), since the focus of the present chapter is on category theory, not on type theory. The connection between the type theoretical language and the category theoretical language can be understood in a very narrow way by means of the Curry-Howard-Lambek correspondence (see, e.g., [167, p. 59ff]) or in a broader way by means of a pragmatic mental model in which a category serves as a model of a (functional) programming language (see, e.g., [16, p. 20-24]).

Let us put the pragmatic mental model to use such that the category theoretical language can act as a mediating instance between the mathematical modeling involved in surrogate-guided optimization and the type-theoretical aspects of a programming language. From this point of view, other formalization approaches, e.g., Hilbert spaces, manifolds, and similar, can be thought of as domain-specific languages embedded in the host language (or "general-purpose language") of category theory (cf. [102]).

The category theoretical language can offer a language-focused comparison of different surrogate-guided optimization methods, which has a very practical benefit, because, at least in the field of computational electromagnetics, there is a lack of well-defined equivalence classes of benchmarks that could enable a standardized benchmark-focused comparison. For instance, the popular T.E.A.M. (Testing Electromagnetic Analysis Methods) problems do not seem to be used much in the research field of surrogate-guided optimization.

Aiming at strengthening the theoretical foundations of the formalization-oriented viewpoint on surrogate optimization, one main concern in this chapter is the development of a novel comparison toolset for surrogate-guided optimization methods by explicitly category theoretical (CT) language features.

I illustrate the CT approach by discussing the space mapping paradigm’s basic building blocks (recall § 3.3.2) within the frame of model management strategies (recall § 1.2).

Furthermore, I demonstrate the usefulness of the CT approach with regard to formalization use cases within the electromagnetics context related to simplified-physics low-fidelity models and related to transformations – such as, e.g., coordinate transformations – of a high-fidelity model and a low-fidelity model.

4.1.2 Relevant related work

To the best of my knowledge, formalization issues have not been addressed exhaustively in the literature on surrogate-guided optimization in electrical engineering. Therefore, the following selection of articles aims at creating a context that makes the added value of this chapter’s contribution comprehensible.

In [166], the authors try to organize various numerical methods in the fields of uncertainty propagation, statistical inference, and optimization by means of the concept of multifidelity model management. Applying this concept to the field of optimization corresponds to the class of surrogate-guided optimization methods.

Originating as one correction methodology for surrogate functions, the space mapping notion has led to a widely ramified family of surrogate-guided optimization methods like, e.g., space mapping (see, e.g., [14]), manifold mapping (see, e.g., [56]), and many others (see a survey, e.g., in [126]).

In [166], the authors identify correction methodologies like space mapping or the first-order approximation and model management optimization (AMMO) paradigm (see, e.g., [4]) as members of one class of multifidelity model management strategies: adaptation; i.e., during the optimization process, the low-fidelity model is adapted using information from the high-fidelity model.


In [123], the authors present an automated low-fidelity model selection based

on correlation analysis between low- and high-fidelity models for surrogate-guided

design optimization of antennas.

Various approaches to automated algorithm selection based on machine learning techniques are discussed in [113]. Category theoretical approaches to machine learning and its underlying mathematical concepts (like Bayesian probability) have been pursued in, e.g., [68] or [52]. In [58], automatic differentiation, a key concept in machine learning techniques, is discussed in a functional language environment.

Using the idea of object-oriented programming, the authors in [111] discuss how to implement finite element software systems that imitate the mathematical structures of Maxwell’s equations. The emphasis on the mathematical structures appears very fruitful in the realm of computational electromagnetism (see, e.g., [205], [174], [8], [28], [32]).

In [65], the author discusses various category theoretical ideas in the realm of general software engineering that are beyond the scope of the present work, which limits itself to ideas in programming languages (see, e.g., [219]).

In order to justify certain statistical modeling approaches, the author in [148]

offers a precise mathematical definition of a statistical model by the language of

category theory.

4.2 Category theory toolset

Since the formal language of category theory (CT) operates at an even higher level of abstraction than languages such as functional analysis or differential geometry (recall ch. 2), the first step is to foster some intuition regarding the category theoretical language. From an application-driven viewpoint, this intuition is valuable in order to better comprehend the nature of the problems where the category theoretical language shines, and therefore to better anticipate its potential practical benefits.

The subsequent step is to apply some rigor to the intuitive reasoning about category theory. The corresponding elaborations rely only on elementary notions of category theory; more precisely, no deep theorems of category theory are applied. Category theory is primarily used as a strong and stable notational scaffolding, especially by diagrams of arrows (see, e.g., [144, p. 1ff]).

At the end, some computational facets regarding the category theoretical language are illuminated.

4.2.1 Fostering some intuition

In order to partly cope with the high level of abstraction of the category theoretical language, it is useful to recall occasionally some common concrete perspectives on categories.

From one perspective, a category can be viewed as a kind of algebraic structure

like a group or a vector space. A vector space models linearity, a group models

symmetry, and a category models composition.

Another perspective is to consider a category as a mathematical context. In the domain of linear algebra, for instance, the language of matrix dimensions and matrices and the language of vector spaces and linear maps describe essentially the same underlying structure and properties of this domain, which can be encoded by an equivalence of the corresponding categories.
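The "matrix dimensions and matrices" language can be written down directly as a category: objects are natural numbers, a morphism $f\colon n \to m$ is an $m \times n$ matrix, composition is matrix multiplication, and the identities are identity matrices. The sketch below uses integer entries so that equality is exact, and all names are hypothetical. It checks the unity and associativity laws (formalized in Definition 4.2.1) on one example only. It does not check the equivalence with the category of vector spaces and linear maps.

```python
def compose(G, F):
    """g o f for f: n -> m (an m x n matrix F) and g: m -> k (a k x m matrix G)."""
    if len(G[0]) != len(F):
        raise ValueError("cod(f) must equal dom(g)")
    return [[sum(G[i][l] * F[l][j] for l in range(len(F))) for j in range(len(F[0]))]
            for i in range(len(G))]

def identity(n):
    """Identity morphism id_n: n -> n."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def dom(F):
    return len(F[0])    # number of columns

def cod(F):
    return len(F)       # number of rows

F = [[1, 2], [0, 1], [3, -1]]   # f: 2 -> 3
G = [[2, 0, 1], [1, 1, 0]]      # g: 3 -> 2
H = [[1, -1]]                   # h: 2 -> 1

assert compose(identity(cod(F)), F) == F == compose(F, identity(dom(F)))   # unity
assert compose(H, compose(G, F)) == compose(compose(H, G), F)               # associativity
```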


Category theory emphasizes the structure-preserving maps between objects rather than the objects themselves; e.g., it emphasizes the linear maps between vector spaces rather than the vector spaces themselves. This observation is reflected in a third perspective, in which a category $A$ encodes a syntax, another category $B$ encodes a semantics, and the structure-preserving map $F$ from $A$ to $B$ encodes an interpretation of $A$ within $B$ (see, e.g., [136, p. 166ff]). The map $F\colon A \to B$ is called a functor; and, most abstractly, it is a tool for comparing categories (see, e.g., [36]).

Combining the second and the third perspective, a functor can also be regarded as an interpretation of one mathematical context within another. A different interpretation can be encoded in a different functor $G\colon A \to B$. The need for a comparison of the two interpretations $F$ and $G$ is covered by the notion of a natural transformation.²

$$
A \underset{G}{\overset{F}{\rightrightarrows}} B, \qquad \alpha\colon F \Rightarrow G
\tag{4.1}
$$

The diagram of arrows in (4.1) unites the very basic tools of category theory: categories ($A$, $B$), functors ($F$, $G$), and natural transformations ($\alpha$).
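In the pragmatic mental model of § 4.1.1, programs live in a category whose objects are types and whose morphisms are functions. Treating Python types and pure functions that way (an idealization that ignores side effects and non-termination), the sketch below takes $F$ = lists and $G$ = an optional value, encoded here as a tuple of length at most one. Unlike in (4.1), both functors act on one and the same category. The component $\alpha$ is `safe_head`. Naturality means that transforming and then mapping a function gives the same result as mapping the function and then transforming. All names are hypothetical.

```python
from typing import Callable, List, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")

def fmap_list(f: Callable[[A], B]) -> Callable[[List[A]], List[B]]:
    """List functor on morphisms: apply f elementwise."""
    return lambda xs: [f(x) for x in xs]

def fmap_maybe(f: Callable[[A], B]) -> Callable[[Tuple[A, ...]], Tuple[B, ...]]:
    """'Maybe' functor on morphisms, encoded as a tuple of length 0 or 1."""
    return lambda m: tuple(f(x) for x in m)

def safe_head(xs: List[A]) -> Tuple[A, ...]:
    """Component of a natural transformation from lists to 'Maybe'."""
    return (xs[0],) if xs else ()

def naturality_holds(f, xs) -> bool:
    """Check fmap_maybe(f)(safe_head(xs)) == safe_head(fmap_list(f)(xs))."""
    return fmap_maybe(f)(safe_head(xs)) == safe_head(fmap_list(f)(xs))

samples = [[], [3], [1, 2, 3]]
square = lambda n: n * n
assert all(naturality_holds(square, xs) for xs in samples)
assert all(naturality_holds(str, xs) for xs in samples)
# functor law for composition, checked on the same samples
assert all(fmap_list(lambda n: str(square(n)))(xs) == fmap_list(str)(fmap_list(square)(xs))
           for xs in samples)
```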

4.2.2 Applying some rigor

All following definitions build upon first-order logic and a primordial concept of a set, i.e., no specific axiomatic system of set theory is used. In addition, to express a proposition, we use the set membership symbol "∈"; to express a judgment, we use the type annotation symbol "∶". For instance, "one is an element of the natural numbers" is a judgment; hence, one would write "$1 : \mathbb{N}$". Similarly, in a statically-checked language (recall § 4.1.1), "one is an element of the integer type" is a judgment, not a proposition; hence, one would write "$1 : \mathtt{Int}$". Regarding further logical technicalities (hierarchy of universes, Grothendieck universes, axiom of choice, and similar), I refer to, e.g., [214] and references therein.

Let us define a category as a mathematical entity by emphasizing its three constituent parts: data, structure, and laws.

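Definition 4.2.1 (Category). A category $\mathcal{C}$ is constituted by
• data:
– a collection $\mathrm{obj}(\mathcal{C})$ of objects $X, Y, Z, \ldots$,
– $\forall X, Y \colon \mathrm{obj}(\mathcal{C})\ \exists$ a set $\hom_{\mathcal{C}}(X,Y)$ of morphisms (or arrows or "structure-preserving" maps) $f, g, h, \ldots$,
  * notation (general): $f \colon \hom_{\mathcal{C}}(X,Y)$, $f \colon Y^X$,
  * notation (specific): $f \colon X \to Y$, $X \xrightarrow{f} Y$,
  * all $\hom_{\mathcal{C}}(X,Y)$ are pairwise disjoint;
• structure:
– $\mathrm{dom}(f)$ is the domain $X$ of morphism $f$, $\mathrm{cod}(f)$ is the codomain $Y$ of morphism $f$,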

² The formal expression of the notion of natural transformation, motivated by applications in algebraic topology, is mostly considered to be the starting point of category theory (see, e.g., [177, p. 1f]).

4.2. Category theory toolset 135

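– $\forall X \colon \mathrm{obj}(\mathcal{C})\ \exists$ an identity morphism $\mathrm{id}_X \colon \hom_{\mathcal{C}}(X,X)$,
– $\forall f \colon \hom_{\mathcal{C}}(X,Y)\ \forall g \colon \hom_{\mathcal{C}}(\mathrm{cod}(f),Z)\ \exists$ a composite morphism $g \circ f \colon \mathrm{dom}(f) \to \mathrm{cod}(g)$;
• laws:
– $\forall f \colon \hom_{\mathcal{C}}(X,Y).\ \mathrm{id}_Y \circ f \equiv f \equiv f \circ \mathrm{id}_X$ (unity),
– $\forall f \colon \hom_{\mathcal{C}}(X,Y)\ \forall g \colon \hom_{\mathcal{C}}(\mathrm{cod}(f),Z)\ \forall h \colon \hom_{\mathcal{C}}(\mathrm{cod}(g),W)\ \exists\, h \circ g \circ f \colon \mathrm{dom}(f) \to \mathrm{cod}(h).\ h \circ (g \circ f) \equiv (h \circ g) \circ f$ (associativity).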
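To make the three parts of Definition 4.2.1 tangible, the following Python sketch encodes a hypothetical finite category. It has objects $X, Y, Z$ and non-identity morphisms $f \colon X \to Y$, $g \colon Y \to Z$ and $g \circ f$. The dictionary encoding and the helper names `morphisms`, `identity`, `compose`, `hom`, and `check_laws` are illustrative assumptions, not part of any library. The law check simply enumerates all composable pairs and triples.

```python
from itertools import product

# Hypothetical finite category (an illustration, not a general library):
# objects X, Y, Z; non-identity morphisms f: X -> Y, g: Y -> Z, and g∘f: X -> Z.
objects = ["X", "Y", "Z"]

# data: each morphism name maps to its (domain, codomain); one key per morphism
# makes the hom-sets pairwise disjoint by construction
morphisms = {
    "id_X": ("X", "X"), "id_Y": ("Y", "Y"), "id_Z": ("Z", "Z"),
    "f": ("X", "Y"), "g": ("Y", "Z"), "g∘f": ("X", "Z"),
}
identity = {"X": "id_X", "Y": "id_Y", "Z": "id_Z"}


def dom(m):
    return morphisms[m][0]


def cod(m):
    return morphisms[m][1]


def hom(a, b):
    """The hom-set hom(a, b) as a list of morphism names."""
    return [m for m, ends in morphisms.items() if ends == (a, b)]


def compose(second, first):
    """Structure: second ∘ first, defined only if cod(first) == dom(second)."""
    if cod(first) != dom(second):
        raise ValueError(f"{second} ∘ {first} is not composable")
    if first == identity[dom(first)]:
        return second
    if second == identity[cod(second)]:
        return first
    if (second, first) == ("g", "f"):
        return "g∘f"
    raise ValueError("composition table is incomplete")


def check_laws():
    """Laws: unity and associativity, checked on all (composable) morphisms."""
    for m in morphisms:
        assert compose(identity[cod(m)], m) == m == compose(m, identity[dom(m)])
    for a, b, c in product(morphisms, repeat=3):
        if cod(a) == dom(b) and cod(b) == dom(c):
            assert compose(c, compose(b, a)) == compose(compose(c, b), a)
    return True


print(hom("X", "Z"), check_laws())   # ['g∘f'] True
```

Calling `compose("f", "g")` raises a `ValueError`, mirroring the fact that composition is only defined when codomain and domain match.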

Remark 4.2.1. The notion "structure-preserving" originates from considerations of structured sets and the structure-preserving functions between them. However, morphisms need not be structure-preserving functions, or even functions at all.

Remark 4.2.2. By simply reversing all the arrows of a category $\mathcal{C}$, one can define the dual or opposite category $\mathcal{C}^{\mathrm{op}}$, which embodies the duality principle.

Remark 4.2.3. Regarding the size of a category, a category $\mathcal{C}$ is called small if both $\mathrm{obj}(\mathcal{C})$ and all $\hom_{\mathcal{C}}(X,Y)$ are sets. Moreover, we primarily assume locally small categories, in which, at least, the collection of morphisms $\hom_{\mathcal{C}}(X,Y)$ between any pair of objects $X, Y$ is a set, called a hom-set.

Some representations or implementations of categories are:
• $\mathbf{Set}$ (the category of (finite) sets and set functions),
• $\mathbf{Top}$ (the category of topological spaces and continuous maps),
• $\mathbf{Man}^{\infty}$ (the category of smooth manifolds and smooth, i.e., infinitely often continuously differentiable, maps),
• $\mathbf{Vect}^{\mathrm{fd}}_{\Bbbk}$ (the category of finite-dimensional vector spaces over the field $\Bbbk$ and $\Bbbk$-linear maps),
• $\mathbf{TVect}_{\Bbbk}$ (the category of topological vector spaces over the topological field $\Bbbk$ and continuous $\Bbbk$-linear maps),
• $\mathbf{Vect}^{\mathrm{basis}}_{\Bbbk}$ (the category of finite-dimensional vector spaces over the field $\Bbbk$ with chosen basis and $\Bbbk$-linear maps), and
• $\mathbf{Mat}_{\Bbbk}$ (the category of non-zero natural numbers [matrix dimensions $(m,n)$] and $m \times n$ matrices with entries in the field $\Bbbk$).
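As a concrete instance of the last item, here is a minimal sketch of $\mathbf{Mat}_{\Bbbk}$. It assumes $\Bbbk = \mathbb{Q}$, with exact `fractions.Fraction` arithmetic, and stores matrices as lists of rows. Objects are positive integers, a morphism $n \to m$ is an $m \times n$ matrix, identities are identity matrices, and composition is matrix multiplication. The helper names and the sample matrices are hypothetical.

```python
from fractions import Fraction

# Sketch of Mat_k with k = Q (exact Fractions), an assumption for illustration.
# Objects: positive integers n. A morphism n -> m is an m×n matrix (list of rows).


def mat(rows):
    return [[Fraction(x) for x in row] for row in rows]


def identity(n):
    """id_n: the n×n identity matrix."""
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def compose(B, A):
    """B ∘ A for A: n -> m (m×n) and B: m -> p (p×m); the result is p×n."""
    if len(B[0]) != len(A):
        raise ValueError("cod(A) must equal dom(B)")
    return [[sum((B[i][k] * A[k][j] for k in range(len(A))), Fraction(0))
             for j in range(len(A[0]))]
            for i in range(len(B))]


A = mat([[1, 2], [0, 1], [3, 0]])   # A: 2 -> 3
B = mat([[1, 0, 1], [2, 1, 0]])     # B: 3 -> 2
C = mat([[0, 1], [1, 1]])           # C: 2 -> 2

unity = compose(identity(3), A) == A == compose(A, identity(2))
associativity = compose(C, compose(B, A)) == compose(compose(C, B), A)
print(unity, associativity)         # True True
```

Exact rationals are chosen so that the equality checks test the category laws themselves rather than floating-point rounding.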

Bear in mind that the definition of a category encompasses merely its specification, i.e., its minimal set of essential characteristics; thus, constructions encountered in practice, e.g., inverse maps or products, lead to new features, such as a category $\mathcal{C}$ with isomorphisms.

In the following, the maps $f$ and $g$ are assumed to be ad hoc polymorphic, or overloaded; thus, the names $f$ and $g$ are used with different signatures.

Notice well that isomorphisms are special morphisms that encode the idea of sameness of objects. A morphism $f \colon X \to Y$ is called an isomorphism if and only if there exists a morphism $g \colon Y \to X$ such that, in equational form, $f \circ g = \mathrm{id}_Y$ and $g \circ f = \mathrm{id}_X$. In diagrammatic form, one can choose, for example, the following representation:

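$$X \underset{\exists g}{\overset{f}{\rightleftarrows}} Y, \qquad g \circ f \overset{\circlearrowright}{=} \mathrm{id}_X, \qquad f \circ g \overset{\circlearrowright}{=} \mathrm{id}_Y, \tag{4.2}$$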

where the symbol ⟳ is used to denote a commutative diagram. Furthermore, the objects $X$ and $Y$ are called isomorphic (symbolically: $X \cong_{\mathcal{C}} Y$). If the objects are isomorphic, then they are indistinguishable; hence, it is possible to substitute one object for the other.

A morphism whose domain is identical with its codomain, that is, a morphism $f \colon X \to X$ such that
$$\mathrm{dom}(f) \equiv \mathrm{cod}(f), \tag{4.3}$$
is called an endomorphism. An endomorphism which is also an isomorphism (see the diagram in (4.2)) is called an automorphism.
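In the category of finite sets, the isomorphism test of (4.2) can be carried out by brute force. The sketch below assumes that morphisms are encoded as Python dicts, an illustrative choice. The helpers `compose`, `identity`, and `inverse_if_iso` are hypothetical. The code builds the candidate inverse $g$ by reading $f$ backwards and then checks both equations of (4.2).

```python
# Sketch: morphisms of the category of finite sets encoded as Python dicts
# (an illustrative assumption); a dict f with keys X and values in Y is f: X -> Y.


def compose(g, f):
    """g ∘ f."""
    return {x: g[f[x]] for x in f}


def identity(S):
    return {s: s for s in S}


def inverse_if_iso(f, X, Y):
    """Return g: Y -> X with g∘f = id_X and f∘g = id_Y, or None if f is not an isomorphism."""
    g = {y: x for x, y in f.items()}          # candidate inverse: read f backwards
    if set(g) == set(Y) and compose(g, f) == identity(X) and compose(f, g) == identity(Y):
        return g
    return None


X, Y = {1, 2, 3}, {"a", "b", "c"}
f = {1: "a", 2: "b", 3: "c"}      # bijective: X ≅ Y
h = {1: "a", 2: "a", 3: "b"}      # not injective: no inverse exists
sigma = {1: 2, 2: 3, 3: 1}        # an invertible endomorphism of X: an automorphism

print(inverse_if_iso(h, X, Y))    # None
```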

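A notable example of isomorphic objects can be constructed in a category $\mathcal{C}$ with product objects $A \times B$. More precisely, given $A, B \colon \mathrm{obj}(\mathcal{C})$, the object $A \times B \colon \mathrm{obj}(\mathcal{C})$ equipped with a pair of morphisms $\pi_A \colon A \times B \to A$ and $\pi_B \colon A \times B \to B$ is a (binary) product object if and only if it satisfies a universal mapping property (UMP), i.e.,
$$\forall P \colon \mathrm{obj}(\mathcal{C}).\ \forall f \colon P \to A.\ \forall g \colon P \to B.\ \exists!\, h \colon P \to A \times B \tag{4.4a}$$
such that the following diagram commutes: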

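$$\begin{array}{ccccc} & & P & & \\ & {\scriptstyle f}\swarrow & \big\downarrow{\scriptstyle \exists! h} & \searrow{\scriptstyle g} & \\ A & \xleftarrow{\ \pi_A\ } & A \times B & \xrightarrow{\ \pi_B\ } & B \end{array} \qquad \text{i.e., } \pi_A \circ h = f,\ \ \pi_B \circ h = g. \tag{4.4b}$$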

We refer to the morphism $h$ as the (binary) product of the morphisms $f$ and $g$ (symbolically: $h := \langle f, g \rangle$). Finally, assuming ternary product objects in the category $\mathbf{Set}$, a notable example of isomorphic objects is
$$\forall A, B, C \colon \mathrm{obj}(\mathbf{Set}).\ (A \times B) \times C \cong_{\mathbf{Set}} A \times (B \times C). \tag{4.5}$$
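In $\mathbf{Set}$, both (4.4) and (4.5) can be spot-checked on finite examples. The sketch below assumes Cartesian products realized as Python tuples. The maps `f` and `g` are hypothetical. `pair` plays the role of $\langle f, g \rangle$, and `assoc`/`assoc_inv` re-bracket nested pairs.

```python
from itertools import product as cartesian

# Sketch in Set, with finite sets and Python functions (illustrative assumption).


def pair(f, g):
    """<f, g>: P -> A × B, the unique mediating morphism h of the UMP (4.4)."""
    return lambda p: (f(p), g(p))


def pi_A(ab):
    return ab[0]


def pi_B(ab):
    return ab[1]


P = range(-3, 4)
f = lambda p: p * p        # hypothetical f: P -> A
g = lambda p: p > 0        # hypothetical g: P -> B
h = pair(f, g)

# (4.4b) commutes: π_A ∘ h = f and π_B ∘ h = g, checked pointwise on P
commutes = all(pi_A(h(p)) == f(p) and pi_B(h(p)) == g(p) for p in P)


# The isomorphism (4.5) and its inverse, re-bracketing nested pairs.
def assoc(t):              # (A × B) × C -> A × (B × C)
    (a, b), c = t
    return (a, (b, c))


def assoc_inv(t):          # A × (B × C) -> (A × B) × C
    a, (b, c) = t
    return ((a, b), c)


A, B, C = {0, 1}, {"x", "y"}, {True, False}
left = set(cartesian(cartesian(A, B), C))    # elements ((a, b), c)
right = set(cartesian(A, cartesian(B, C)))   # elements (a, (b, c))

is_iso = (all(assoc_inv(assoc(t)) == t for t in left)
          and all(assoc(assoc_inv(t)) == t for t in right)
          and {assoc(t) for t in left} == right)
print(commutes, is_iso)    # True True
```

The uniqueness part of (4.4a) is not tested here. In $\mathbf{Set}$ it follows because any $h'$ with $\pi_A \circ h' = f$ and $\pi_B \circ h' = g$ must send $p$ to $(f(p), g(p))$.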

In order to show that a parallel pair of morphisms $f, g \colon X \rightrightarrows Y$ is equal, i.e.,
$$f =_{\tilde{\mathcal{C}}} g, \tag{4.6}$$

one needs a category $\tilde{\mathcal{C}}$ with a terminal object $1$. In a category $\tilde{\mathcal{C}}$, an object $1$ is terminal if and only if for all objects $X \colon \mathrm{obj}(\tilde{\mathcal{C}})$ there exists a unique morphism $X \to 1$. Invoking the duality principle (see Remark 4.2.2), one obtains the dual notion of an initial object $O$: in a category $\tilde{\mathcal{C}}$, an object $O$ is initial if and only if for all objects $X \colon \mathrm{obj}(\tilde{\mathcal{C}})$ there exists a unique morphism $O \to X$.

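Mind that, in category theory, there is no global set-membership relation; thus, the idea of an element "$x \in X$" is encoded in a morphism $x$ such that $x \colon 1 \to X$, or $1 \xrightarrow{x} X$. Hence, if the equation $f \circ x =_{1 \to Y} g \circ x$ holds for all morphisms $1 \xrightarrow{x} X$, then $f =_{\tilde{\mathcal{C}}} g$. This test is conclusive in categories whose morphisms are determined by their global elements, the so-called well-pointed categories such as $\mathbf{Set}$. If the morphisms are equal, then they are indistinguishable; hence, it is possible to substitute one morphism for the other.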

This kind of test for equality w.r.t. (4.6) corresponds to the common extensional equality of functions (recall Remark 2.2.2), in which functions are treated as black boxes and only their input-output behavior is considered. It is important to note that, for arbitrary computable functions, this equality problem can be decided by exhaustive testing only if the domain is finite, and such testing is practical only if the domain is relatively small. If one is also interested in how the output is calculated, i.e., in the particular "formulas" of the functions, one needs to consider intensional equality of functions. Finally, if one wants to know whether two functions point to the exact same instance in computer memory, one needs to invoke the notion of referential equality of functions.
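The three notions of equality can be contrasted in a few lines of Python. The sketch is illustrative only. The functions `double_a` and `double_b` are hypothetical. Comparing CPython bytecode (`__code__.co_code`) is used merely as a crude stand-in for intensional equality, not as a faithful definition of it.

```python
# Sketch: three notions of equality for Python functions (illustrative only).


def double_a(n: int) -> int:
    return n + n


def double_b(n: int) -> int:
    return 2 * n


def extensionally_equal(f, g, domain):
    """Black-box test: compare the input-output behavior on every element of a finite domain."""
    return all(f(x) == g(x) for x in domain)


domain = range(256)                          # finite and small: exhaustive testing is feasible
extensional = extensionally_equal(double_a, double_b, domain)
# crude proxy for intensional equality: compare the compiled bodies (CPython bytecode)
intensional = double_a.__code__.co_code == double_b.__code__.co_code
referential = double_a is double_b           # the same object in memory?
alias = double_a

print(extensional, intensional, referential, alias is double_a)   # True False False True
```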

Observe that the terminal object $1$, if it exists, is unique only up to (unique) isomorphism. More importantly, it belongs to a family of objects (like the product object) that follow the principle of universality. Hence, the definition of objects such as the terminal object is based on a universal mapping property similar to (4.4), in which the object's existence and uniqueness are related to all other objects of the category.

Given a category $\mathcal{C}$ and a category $\mathcal{D}$, one can define a map from $\mathcal{C}$ to $\mathcal{D}$ that preserves the structure of the category $\mathcal{C}$ (recall Definition 4.2.1) within the category $\mathcal{D}$. Such a structure-preserving map between two categories is called a functor.

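Definition 4.2.2 (Functor). Given a category $\mathcal{C}$ and a category $\mathcal{D}$, a functor $F \colon \mathcal{C} \to \mathcal{D}$ is constituted by
• a map on the data of $\mathcal{C}$ and $\mathcal{D}$, i.e., the functor's assignment rule reads
– $\forall X \colon \mathrm{obj}(\mathcal{C}).\ \exists\, F(X) \colon \mathrm{obj}(\mathcal{D})$,
– $\forall f \colon \hom_{\mathcal{C}}(X,Y).\ \exists\, F(f) \colon \hom_{\mathcal{D}}(\tilde{X},\tilde{Y})$;
• that preserves the structure of $\mathcal{C}$ within $\mathcal{D}$, i.e., the functor laws read
– $\tilde{X} \equiv F(X)$, $\tilde{Y} \equiv F(Y)$,
– $\forall X \colon \mathrm{obj}(\mathcal{C}).\ F(\mathrm{id}_X) \equiv \mathrm{id}_{F(X)}$,
– $\forall g \circ f \colon \hom_{\mathcal{C}}(\mathrm{dom}(f),\mathrm{cod}(g)).\ F(g) \circ F(f) \equiv F(g \circ f)$.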

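Remark 4.2.4. The map $F$ is ad hoc polymorphic, or overloaded, in the sense that its signature $\mathcal{C} \to \mathcal{D}$ is accompanied by the signature $\mathrm{obj}(\mathcal{C}) \to \mathrm{obj}(\mathcal{D})$ and the signature $\hom_{\mathcal{C}}(X,Y) \to \hom_{\mathcal{D}}(\tilde{X},\tilde{Y})$.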

Remark 4.2.5. Observe that the functors defined above are called covariant in order to distinguish them from contravariant functors, which reverse the direction of arrows. A contravariant functor from $\mathcal{C}$ to $\mathcal{D}$ can be written as a functor $F \colon \mathcal{C}^{\mathrm{op}} \to \mathcal{D}$, with $\mathcal{C}^{\mathrm{op}}$ being the opposite category (see Remark 4.2.2).
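The functor laws of Definition 4.2.2 can be spot-checked for a familiar example from programming. The sketch below assumes a category whose objects are Python types and whose morphisms are Python functions. It takes $F$ to be the list functor: on objects $A \mapsto$ `list[A]`, and on morphisms $f \mapsto$ `fmap(f)`. The maps `f` and `g` are hypothetical. Function equality is checked extensionally on a few sample inputs, so this is a test, not a proof.

```python
# Sketch (assumption): objects are Python types, morphisms are Python functions, and
# F is the list functor: A ↦ list[A] on objects, f ↦ fmap(f) on morphisms.


def fmap(f):
    """F(f): list[A] -> list[B], applying f elementwise."""
    return lambda xs: [f(x) for x in xs]


def compose(g, f):
    return lambda x: g(f(x))


def identity(x):
    return x


f = lambda n: n + 1          # hypothetical f: int -> int
g = lambda n: str(n)         # hypothetical g: int -> str
samples = [[], [0], [1, 2, 3], [-5, 7]]

# functor laws of Definition 4.2.2, checked extensionally on sample inputs
preserves_identity = all(fmap(identity)(xs) == identity(xs) for xs in samples)
preserves_composition = all(
    compose(fmap(g), fmap(f))(xs) == fmap(compose(g, f))(xs) for xs in samples
)
print(preserves_identity, preserves_composition)   # True True
```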

The aforementioned definitions of a category and a functor appear somewhat cumbersome, but they unfold their power when we shift from this algebraic description to a geometric one. Consider a category as a directed multigraph equipped with an algebra of paths: identity morphisms correspond to 0-paths, morphisms to 1-paths, and composition to 2-paths. A functor can then be considered as a graph morphism that preserves paths. For the sake of brevity, we do not give categorical definitions of graphs and related concepts; for more category theoretical details regarding these concepts, I refer to, e.g., [144], [11], or [177] and references therein.

Let us assume two finite categories $\mathcal{A}$ and $\mathcal{B}$. Note that a finite category has only a finite number of objects, identity arrows, and non-identity arrows. Furthermore, let us suppose a functor $F \colon \mathcal{A} \to \mathcal{B}$. The diagram in (4.7) then illustrates the geometric view of categories and functors.

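$$\begin{array}{ccc} X \xrightarrow{\ f\ } Y \xrightarrow{\ g\ } Z & \overset{F}{\longmapsto} & F(X) \xrightarrow{\ F(f)\ } F(Y) \xrightarrow{\ F(g)\ } F(Z) \\[4pt] X \xrightarrow{\ g \circ f\ } Z & & F(X) \xrightarrow{\ F(g \circ f)\ } F(Z) \\[4pt] \mathrm{id}_X,\ \mathrm{id}_Y,\ \mathrm{id}_Z & & F(\mathrm{id}_X),\ F(\mathrm{id}_Y),\ F(\mathrm{id}_Z) \end{array} \tag{4.7}$$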

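Moreover, the diagram in (4.7) elucidates geometrically that all functors preserve isomorphisms. If the commutative diagrams in (4.2) exist, then every functor preserves these commutative diagrams, in the sense that, if the objects $X, Y \colon \mathrm{obj}(\mathcal{A})$ are isomorphic, then
$$X \cong_{\mathcal{A}} Y \implies F(X) \cong_{\mathcal{B}} F(Y). \tag{4.8}$$
Indeed, if $g \circ f = \mathrm{id}_X$ and $f \circ g = \mathrm{id}_Y$, the functor laws give $F(g) \circ F(f) = F(g \circ f) = F(\mathrm{id}_X) = \mathrm{id}_{F(X)}$ and, analogously, $F(f) \circ F(g) = \mathrm{id}_{F(Y)}$.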

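Although all functors preserve isomorphisms, they do not necessarily reflect isomorphisms. That is, an isomorphism in the functor's codomain does not necessarily imply that a corresponding morphism in the functor's domain is an isomorphism. For example, the functor into a category with a single object and a single (identity) morphism sends every morphism to an identity, regardless of whether the original morphism is invertible.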

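Functors can have various attributes. Two very useful kinds are forgetful functors, which forget all or only some of the algebraic structure, such as, e.g.,
$$F \colon \mathbf{Vect}^{\mathrm{fd}}_{\Bbbk} \to \mathbf{Set}, \tag{4.9a}$$
$$G \colon \mathbf{Vect}^{\mathrm{basis}}_{\Bbbk} \to \mathbf{Vect}^{\mathrm{fd}}_{\Bbbk}, \tag{4.9b}$$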

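and faithful functors $F \colon \mathcal{C} \to \mathcal{D}$, which possess the defining property³
$$\text{for all } X, Y \colon \mathrm{obj}(\mathcal{C}),\ \text{the map } F \colon \hom_{\mathcal{C}}(X,Y) \to \hom_{\mathcal{D}}(F(X),F(Y)) \text{ is injective.} \tag{4.10}$$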

Forgetful and faithful functors are very useful since they help to pin down the concept of a structured set, i.e., a set equipped with extra structure: a structured set is an object of a category $\mathcal{C}$ equipped with a faithful functor $F \colon \mathcal{C} \to \mathbf{Set}$. Note that, in the setting of structured sets, the category $\mathcal{C}$ is called a concrete category. Some examples of concrete categories are the above-mentioned categories $\mathbf{Top}$, $\mathbf{Man}^{\infty}$, and $\mathbf{Vect}^{\mathrm{fd}}_{\Bbbk}$. The category $\mathbf{Mat}_{\Bbbk}$, whose objects are natural numbers rather than structured sets, is not a concrete category in this sense.

Recalling § 4.2.1 for the case of structured sets, one can observe that categories such as $\mathbf{Top}$, $\mathbf{Man}^{\infty}$, or $\mathbf{Vect}^{\mathrm{fd}}_{\Bbbk}$ encode a syntax and the category $\mathbf{Set}$ provides a semantics. A faithful functor from a syntax category to a semantics category encodes an interpretation of the syntax category within the semantics category.

To compare two different interpretations, one needs a proper notion of comparison between two functors. Hence, let us define an arrow between two functors, i.e., a natural transformation.

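Definition 4.2.3 (Natural transformation). Given a category $\mathcal{C}$, a category $\mathcal{D}$, a functor $F \colon \mathcal{C} \to \mathcal{D}$, and a functor $G \colon \mathcal{C} \to \mathcal{D}$, a natural transformation $\alpha \colon F \Rightarrow G$ comprises
• the components of $\alpha$ at $X$, i.e., a family of morphisms $\alpha_X$ in $\mathcal{D}$:
– $\forall X \colon \mathrm{obj}(\mathcal{C})\ \exists\, \alpha_X \colon F(X) \to G(X)$,
• such that $\forall f \colon \hom_{\mathcal{C}}(X,Y)$, i.e., $X \xrightarrow{f} Y$,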

³ Regarding the signature of $F$, see Remark 4.2.4.


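– $\alpha_Y \circ F(f) = G(f) \circ \alpha_X$ in $\mathcal{D}$, or
– the diagram
$$\begin{array}{ccc} F(X) & \xrightarrow{\ \alpha_X\ } & G(X) \\ {\scriptstyle F(f)}\big\downarrow & \circlearrowright & \big\downarrow{\scriptstyle G(f)} \\ F(Y) & \xrightarrow{\ \alpha_Y\ } & G(Y) \end{array}$$
commutes in $\mathcal{D}$.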

Remark 4.2.6. The lack of naturality (i.e., the lack of a natural transformation) is an even more interesting observation than its presence. In the former case, an "unnatural choice" is needed, which in some sense is a choice of basis. This view is motivated by the relation between a finite-dimensional vector space, its dual, and its double dual: a vector space $V$ is naturally isomorphic to its double dual $V^{**}$, whereas an isomorphism $V \cong V^{*}$ requires choosing a basis.

Remark 4.2.7. If we apply the geometric view such as in (4.7), then one can conceive a

natural transformation as a comparison of constructed paths.

Remark 4.2.8. A natural transformation $\alpha \colon F \Rightarrow G$ is a natural isomorphism (symbolically: $\alpha \colon F \cong G$) if and only if all components $\alpha_X$ are isomorphisms. The subscript denoting the corresponding category is usually omitted. This category is the so-called functor category $\mathcal{D}^{\mathcal{C}}$, whose objects are the functors $F, G, \ldots \colon \mathcal{C} \to \mathcal{D}$ and whose morphisms are natural transformations $\alpha \colon F \Rightarrow G$. Written in full, a natural isomorphism reads $\alpha \colon F \cong_{\mathcal{D}^{\mathcal{C}}} G$.
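A programming example helps to separate natural from non-natural families of maps. The sketch below assumes two functors on Python types: the list functor and an "optional" functor, whose values are either a value or `None`. `safe_head` is defined by one uniform rule and satisfies the naturality square of Definition 4.2.3 on the samples. The hypothetical `type_dependent` inspects the element type and breaks the square. All names are illustrative, and the checks are extensional tests on sample data.

```python
# Sketch (assumption): the list functor and the "optional" functor (a value or None)
# on Python types; safe_head is meant as a natural transformation list ⇒ optional.


def fmap_list(f):
    return lambda xs: [f(x) for x in xs]


def fmap_opt(f):
    return lambda m: None if m is None else f(m)


def safe_head(xs):
    """Component α_A: list[A] -> optional A, defined by the same rule for every A."""
    return xs[0] if xs else None


def type_dependent(xs):
    """A family that inspects the element type: it is NOT natural."""
    if xs and isinstance(xs[0], int):
        return xs[-1]
    return safe_head(xs)


def is_natural_on(alpha, f, samples):
    """Check the naturality square α_Y ∘ F(f) = G(f) ∘ α_X on sample lists."""
    return all(alpha(fmap_list(f)(xs)) == fmap_opt(f)(alpha(xs)) for xs in samples)


samples = [[], [3], [4, 5, 6]]
print(is_natural_on(safe_head, str, samples))        # True
print(is_natural_on(type_dependent, str, samples))   # False
```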

Using the notion of natural isomorphisms to define the concept of an equivalence of categories, we are able to tackle the question of when two categories are essentially the same.

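Definition 4.2.4 (Equivalence of categories). An equivalence of categories $\mathcal{C}, \mathcal{D}$ (symbolically: $\mathcal{C} \simeq \mathcal{D}$) is constituted by
• functors $F \colon \mathcal{C} \rightleftarrows \mathcal{D} \colon G$ and
• natural isomorphisms $\eta \colon \mathrm{id}_{\mathcal{C}} \cong G \circ F$ and $\varepsilon \colon F \circ G \cong \mathrm{id}_{\mathcal{D}}$.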

Remark 4.2.9. Analogously to the identity morphism in Definition 4.2.1, the functors $\mathrm{id}_{\mathcal{C}}$ and $\mathrm{id}_{\mathcal{D}}$ denote the identity functors of the categories $\mathcal{C}$ and $\mathcal{D}$, respectively.

Remark 4.2.10. If $F$ and $G$ constitute an equivalence of categories, then the functor $G$ is called an inverse functor (more precisely, a quasi-inverse) of the functor $F$.

Remark 4.2.11. If two categories are equivalent, then they are indistinguishable as categories; hence, it is possible to substitute one category for the other.

One can state stronger forms of sameness: e.g., one can state equality $\mathcal{C} = \mathcal{D}$. Alternatively, one can replace the demand for natural isomorphisms in Definition 4.2.4 by the demand for $F$ to be an isomorphism in the category $\mathbf{Cat}$. This is the category of (locally) small categories (recall Remark 4.2.3), whose objects are (locally) small categories $\mathcal{C}, \mathcal{D}, \ldots$ and whose morphisms are functors $F \colon \mathcal{C} \to \mathcal{D}$. Hence, a functor $F \colon \mathcal{C} \to \mathcal{D}$ is called an isomorphism w.r.t. $\mathbf{Cat}$ if and only if there exists a morphism $G \colon \mathcal{D} \to \mathcal{C}$ such that $\mathrm{id}_{\mathcal{C}} = G \circ F$ and $F \circ G = \mathrm{id}_{\mathcal{D}}$.

However, these two stronger forms of sameness are less helpful since they unnecessarily restrict the available expressive power. For instance, the second form tells us that, by going around through $F$ and then $G$, we land at exactly the same starting point. In most circumstances, though, we only land at a spot resembling our starting point, and this idea of resemblance is encoded by demanding natural isomorphisms in Definition 4.2.4.

There is also a weaker form of sameness that replaces the natural isomorphisms in the definition above by natural transformations ($\mathrm{id}_{\mathcal{C}} \overset{\eta}{\Longrightarrow} G \circ F$ and $F \circ G \overset{\varepsilon}{\Longrightarrow} \mathrm{id}_{\mathcal{D}}$) subject to some coherence conditions (the triangle identities). This leads to the notion of an adjunction.

Regarding linear algebra (recall § 4.2.1), the important equivalence of categories is captured by
$$\mathbf{Vect}^{\mathrm{fd}}_{\Bbbk} \simeq \mathbf{Mat}_{\Bbbk} \quad (\text{for any field } \Bbbk). \tag{4.11}$$

For more details regarding the proof of (4.11), I refer to [177, p. 33]. Remember that, in the setting of structured sets, the category $\mathbf{Vect}^{\mathrm{fd}}_{\Bbbk}$ is a concrete category and the category $\mathbf{Mat}_{\Bbbk}$ is not. However, the statement in (4.11) regards $\mathbf{Vect}^{\mathrm{fd}}_{\Bbbk}$ and $\mathbf{Mat}_{\Bbbk}$ solely through the lens of a category (recall Definition 4.2.1).
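One direction of the correspondence behind (4.11) can be made tangible. Once bases are fixed, sending a linear map to its matrix respects identities and composition, i.e., it behaves like a functor from $\mathbf{Vect}^{\mathrm{basis}}_{\Bbbk}$ to $\mathbf{Mat}_{\Bbbk}$. The sketch below assumes $\Bbbk = \mathbb{Q}$, standard bases of $\mathbb{Q}^n$, and linear maps given as Python functions on tuples. `matrix_of` and the maps `f`, `g` are hypothetical, and the code is not the proof referenced above.

```python
from fractions import Fraction

# Sketch (assumption): k = Q, standard bases of Q^n, and linear maps given as Python
# functions on tuples. matrix_of sends a linear map Q^n -> Q^m to its m×n matrix.


def basis(n):
    return [tuple(Fraction(int(i == j)) for j in range(n)) for i in range(n)]


def matrix_of(L, n):
    """Column j is the image of the j-th standard basis vector of Q^n."""
    cols = [L(e) for e in basis(n)]
    return [[cols[j][i] for j in range(n)] for i in range(len(cols[0]))]


def matmul(B, A):
    """Matrix product B·A, i.e., composition in Mat_k."""
    return [[sum((B[i][k] * A[k][j] for k in range(len(A))), Fraction(0))
             for j in range(len(A[0]))]
            for i in range(len(B))]


def f(v):          # hypothetical linear map f: Q^2 -> Q^3
    return (v[0] + v[1], 2 * v[1], v[0])


def g(w):          # hypothetical linear map g: Q^3 -> Q^2
    return (w[0] - w[2], w[1] + w[2])


M_f, M_g = matrix_of(f, 2), matrix_of(g, 3)
M_gf = matrix_of(lambda v: g(f(v)), 2)

# functoriality: the matrix of g∘f equals the product of the matrices
print(M_gf == matmul(M_g, M_f))                          # True
print(matrix_of(lambda v: v, 2) == [[1, 0], [0, 1]])     # True (identity preserved)
```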

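The diagram in (4.12) depicts a possible interplay of some of the aforementioned categories.

(4.12) [Diagram: functors relating the categories $\mathbf{Vect}^{\mathrm{basis}}_{\Bbbk}$, $\mathbf{Mat}_{\Bbbk}$, $\mathbf{Vect}^{\mathrm{fd}}_{\Bbbk}$, $\mathbf{Set}$, $\mathbf{TVect}_{\Bbbk}$, and $\mathbf{Top}$.]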

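Notice well that most of the functors in (4.12) are forgetful. The functor $D$, in particular, is not only forgetful but also faithful and full. Full functors $F \colon \mathcal{C} \to \mathcal{D}$ possess the defining property
$$\text{for all } X, Y \colon \mathrm{obj}(\mathcal{C}),\ \text{the map } F \colon \hom_{\mathcal{C}}(X,Y) \to \hom_{\mathcal{D}}(F(X),F(Y)) \text{ is surjective.} \tag{4.13}$$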

In general, a diagram such as in (4.12) is not necessarily commutative.

4.2.3 Computational facets

The notion of computation in the category theoretical language depends on the chosen context. This reflects the observation that the programming-language point of view on computation differs from the differential-geometric and the functional-analytic one.

In a programming language setting, the (typed) λ-calculus is a model of computation. A typed λ-calculus (see, e.g., [15]) constitutes, directly or indirectly, the core of functional and imperative programming languages.⁴

In a differential geometric setting, a connection to computation is established by moving from the manifold level to the chart level and hence, ultimately, to the field of real numbers and to matrices over the field of real numbers. Note, for instance, that the authors of [117] give a category-theory-oriented exposition of classical differential geometry.

In a functional analytic setting, one has a concept of "a space approximating another space" and a notion of error, but ultimately, the connection to computation is established by moving to the field of real numbers and to matrices over the field of real numbers.

Observe that approximating is related to iterating a function, and iterating a function is related to composing functions. Furthermore, observe that, in general, the notion of error is formalized in the context of complete normed vector spaces, more precisely, Banach spaces. Notice that there is a well-developed connection between Banach space theory and category theory (see, e.g., [43]). Finally, theories based on real numbers commonly face a tension when they are applied on machines, which inevitably resort to floating-point arithmetic: one takes a leap of faith that the theory's high-level properties still hold on the machine. Moreover, there is a tension between a theory's high-level data structures and their programming language counterparts (cf. [133]).

⁴ For some applications of λ-calculus with a connection to electromagnetics, I refer to, e.g., [138].
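Here is a minimal illustration of this tension. It uses only Python's built-in `float` (IEEE 754 double precision in CPython) and the standard-library `fractions` module. The associativity law, which holds for addition of real numbers, fails for floating-point addition, while exact rational arithmetic preserves it.

```python
from fractions import Fraction

# Binary floating point (IEEE 754 double precision in CPython): associativity can fail.
lhs = (0.1 + 0.2) + 0.3
rhs = 0.1 + (0.2 + 0.3)

# The same expressions in exact rational arithmetic: associativity holds.
exact_lhs = (Fraction(1, 10) + Fraction(2, 10)) + Fraction(3, 10)
exact_rhs = Fraction(1, 10) + (Fraction(2, 10) + Fraction(3, 10))

print(lhs, rhs, lhs == rhs)          # 0.6000000000000001 0.6 False
print(exact_lhs == exact_rhs)        # True
```

Any high-level argument that silently relies on associativity of addition therefore needs extra justification before it is transferred to floating-point code.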

Relating the category theoretical language to programming languages, one observation is the correspondence between cartesian closed categories (CCCs) and the (simply) typed λ-calculus. The defining properties of a CCC are the existence of a terminal object and, for every pair of objects, of a product object and an exponential object. For a definition of an exponential object and of a CCC, see, e.g., [167, p. 33f] and [167, p. 53-57], respectively.

Another observation is that a category can serve as a model for a functional programming language. Categorical definitions can guide how software is organized. In addition, the recent extension of functional programming languages with dependent types (see, e.g., the Agda PL or the Idris PL) enables implementing categorical definitions directly in software.

For technical details regarding dependent types, I refer to, e.g., [214]. As an example, recalling Definition 4.2.1, the hom-set type $\hom_{\mathcal{C}}(X,X)$ is an instance of a dependent type, since its definition depends on the value $X$ of the object type $\mathrm{obj}(\mathcal{C})$. Mostly, though, dependent types are used to encode the quantifiers "∀" and "∃". Implementing categorical definitions directly in software is a promising approach, but it is still very young (see, e.g., [72]); thus, its industrial application is still limited.

Relating the category theoretical language to matrix-focused environments, the key observation is the equivalence in (4.11), which has to be adapted to a computational context in the sense that
$$\mathbf{Vect}^{\mathrm{fd}}_{\Bbbk} \simeq \mathbf{Mat}_{\Bbbk} \quad (\text{for any computable field } \Bbbk). \tag{4.14}$$
Thus, the category $\mathbf{Mat}_{\Bbbk}$ is used as a computational model for the category $\mathbf{Vect}^{\mathrm{fd}}_{\Bbbk}$.

Moreover, observe that the categories $\mathbf{Vect}^{\mathrm{fd}}_{\Bbbk}$ and $\mathbf{Mat}_{\Bbbk}$ are examples of so-called Abelian categories (see, e.g., [144, ch. VIII Abelian Categories] or [71]), in which morphisms can be added and objects can be combined via direct sums.

For instance, the authors of [143] utilize the category $\mathbf{Mat}_{\Bbbk}$ to develop a simple categorical type system for concrete linear algebra. Another example is the CAP project (cf. [168], [86]), which is devoted to enabling computation within a category. That is, it calculates objects, morphisms, etc. of a given representation of a category via a computational model such as, e.g., the category $\mathbf{Mat}_{\Bbbk}$. The CAP project encompasses various software packages implemented in GAP (a system for computational discrete algebra). Note that GAP does not support dependent types.

In the zoo of dynamically-checked programming languages, there is, e.g., the new and still developing Julia PL package Catlab.jl (see [163]). It is part of the larger open source software project AlgebraicJulia, which focuses on developing category theoretical approaches for technical computing. For more details on this software project, visit https://www.algebraicjulia.org/. However, the type system of the Julia PL is not expressive enough to fully encode the formal language of category theory.

Finally, let us address the general question of error in connection with category theory. This question is inevitably related to numerical analysis, where, classically, a distinction is made between modeling error, approximation error, and numerical error.

To handle numerical errors due to, e.g., the representation of numbers as floating-point numbers, there are methods such as interval arithmetic (see, e.g., [213], [104]) that produce reliably verified results; however, these methods are not yet widespread.
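To indicate what such verified results look like, the following is a minimal sketch of interval addition with outward rounding. It is not the method of [213] or [104]. It assumes Python 3.9+ for `math.nextafter` and simply widens each bound by one unit in the last place. This is a simplification of the directed rounding used by dedicated interval libraries. The class `Interval` is hypothetical.

```python
import math
from fractions import Fraction


class Interval:
    """Closed interval [lo, hi] with float bounds: a minimal interval-arithmetic sketch."""

    def __init__(self, lo: float, hi: float):
        self.lo, self.hi = lo, hi

    @staticmethod
    def around(x: float) -> "Interval":
        # Widen a float literal by one ulp on each side; assumes x is the float nearest
        # to the intended decimal, so that the decimal value lies inside the interval.
        return Interval(math.nextafter(x, -math.inf), math.nextafter(x, math.inf))

    def __add__(self, other: "Interval") -> "Interval":
        # Outward rounding: move each rounded bound one ulp away from the result.
        return Interval(math.nextafter(self.lo + other.lo, -math.inf),
                        math.nextafter(self.hi + other.hi, math.inf))

    def contains(self, q: Fraction) -> bool:
        # Fraction(float) converts the binary float exactly.
        return Fraction(self.lo) <= q <= Fraction(self.hi)


s = Interval.around(0.1) + Interval.around(0.2)
print(s.contains(Fraction(3, 10)))                   # True: the exact sum 3/10 is enclosed
print(Interval(0.1, 0.1).contains(Fraction(1, 10)))  # False: the float 0.1 is not 1/10
```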

To deal with approximation errors, that is, errors due to discrete representations of continuous objects, there is, as already mentioned above, the well-established machinery of functional analysis, which would be invaluable even on computers with exact arithmetic.

In most cases, the weakest link in the chain of errors is the modeling error. This error results from representing a physical problem as a mathematical problem, and it is the most difficult to capture formally.

Considering all kinds of errors, the greatest impact of a CT approach is most likely to be expected on the modeling error. This expectation is plausible since category theory can be regarded as a "mathematical model of mathematical modeling" (see, e.g., [197]); hence, it can provide, in some sense, a consistency test for the modeling. For instance, in differential geometry, both a linear map and a bilinear map can be represented by a matrix. However, the matrix associated with the linear map transforms differently under a change of coordinates than the matrix associated with the bilinear map. This important information would go unnoticed by focusing merely on the matrix representation.
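The following sketch makes this concrete with exact rational arithmetic. It assumes the convention "old coordinates $= P \cdot$ new coordinates" for an invertible matrix $P$. Under this convention, the matrix $M$ of a linear map transforms as $P^{-1} M P$, whereas the matrix $M$ of a bilinear form transforms as $P^{\mathsf{T}} M P$. The concrete matrices are arbitrary choices for illustration.

```python
from fractions import Fraction


def to_frac(rows):
    return [[Fraction(x) for x in row] for row in rows]


def matmul(A, B):
    """Matrix product A·B."""
    return [[sum((A[i][k] * B[k][j] for k in range(len(B))), Fraction(0))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


M = to_frac([[1, 0], [0, 2]])        # one array of numbers, read in two ways
P = to_frac([[1, 1], [0, 1]])        # change of basis: old coordinates = P · new coordinates
P_inv = to_frac([[1, -1], [0, 1]])   # inverse of P

as_linear_map = matmul(P_inv, matmul(M, P))              # M ↦ P⁻¹ M P
as_bilinear_form = matmul(transpose(P), matmul(M, P))    # M ↦ Pᵀ M P

print(as_linear_map == [[1, -1], [0, 2]])      # True
print(as_bilinear_form == [[1, 1], [1, 3]])    # True
```

The same array $M$ yields two different matrices in the new coordinates. The trace, an invariant of the linear map, stays 3. The transformed bilinear form stays symmetric, but its diagonal changes.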

However, it is still too early to expect from a CT approach to the modeling error a result comparable to the so-called fundamental theorem of numerical analysis (roughly, consistency plus stability implies convergence; see, e.g., [7]).

Nevertheless, in the upcoming section, guided by heuristics, I propose and briefly discuss a means of quantifying the modeling error by employing a problem-dependent degree of forgetfulness.

4.3 Using the CT toolset for SGO methods

In the area of optimization, it is difficult to provide a generally accepted taxonomy of the numerous solution methods. However, if we apply engineering-driven pragmatism as a classifier, then surrogate-guided optimization (SGO) methods have gained traction as an outstanding sub-area over the last decades (recall § 4.1).

I argue, though, that distinctive features of the category theoretical (CT) language (recall § 4.2) can help to find a better match for pragmatic notions such as model hierarchies or fidelity, capturing their intended meaning more rigorously. By focusing on structure-related issues, CT offers formal methods that complement the usual comparison toolset for SGO methods (recall § 4.1). Due to its tight connection to (functional) programming, the CT approach highlights a useful blueprint for software design. In this blueprint, the focus is on the specification (or the interface), which is kept conceptually separate from the implementation.

We first specify a general optimization problem and subsequently specify surrogate-guided optimization methods.

4.3. Using the CT toolset for SGO methods 143

4.3.1 Specifying a general optimization problem

Recalling § 2.3, let us assume the objects $X$, $Y$, and $Z$ within a category $\mathcal{C}$; then we prescribe the $Z$-valued objective function as
$$J = (y, x) \mapsto z \colon Y \times X \to Z, \tag{4.15}$$
where $x \colon X$ denotes the control (or input) variable, $y \colon Y$ denotes the state (or intermediate) variable, and $z \colon Z$ denotes the output variable. Mind that we use the barred arrow notation "$\mapsto$" for the internal representation of a function and the straight arrow notation "$\to$" for the external representation of a function. Notice well that the CT approach primarily exploits the external representation.

In (4.15), it is commonly assumed that there exists a unique control-to-state map $f$ such that
$$f = x \mapsto y \colon X \to Y. \tag{4.16}$$
Hence, one can write $J(f(x), x)$ or, assuming that $J$ is ad hoc polymorphic or overloaded, simply $J(f(x))$. The statements in (4.15) and (4.16) can be generalized by making the assignment $Y := B^A$ (see § 4.2.2) in order to highlight the search for a function of type $A \to B$. Furthermore, one can make the assignment $X := X_1 \times X_2 \times \cdots \times X_n$ and/or $Z := Z_1 \times Z_2 \times \cdots \times Z_m$ to emphasize the arity of the input object and the output object.

Recalling the statements in (3.9) and (3.10), let us prescribe the $X$-valued minimizer function argmin as
$$\operatorname{argmin} = J_f \mapsto x^{*} \colon (X \to Z) \to X, \tag{4.17}$$
where $x^{*} \colon X$ denotes the minimizer of the composite function $J_f$ with $J_f \equiv J \circ f$ and $J \circ f \colon \hom_{\mathcal{C}}(X,Z)$.

If we invoke set-builder notation and utilize an order structure on $Z$, one can define the set of optimal solutions $X^{*}$ as
$$X^{*} := \{\, x^{*} \colon X \mid \forall x \colon X.\ J(x^{*}) \le J(x) \,\}, \tag{4.18}$$
and the set of corresponding optimal codomain values $Y^{*}$ as
$$Y^{*} := \{\, J(x) \colon Z \mid x \in X^{*} \,\}, \tag{4.19}$$
which is encoded in the minimizer proposition
$$\operatorname{minimize}\ J(x). \tag{4.20}$$

If we use an order structure on $X$, one can define a set of admissible solutions $X_F$, due to, e.g., box constraints with given lower and upper bounds $x_l, x_u \colon X$, as
$$X_F := \{\, x \colon X \mid x_l \le x \le x_u \,\}. \tag{4.21}$$

Supposing a differentiability structure for the maps $J$ and $f$, gradient- or Hessian-exploiting solution methods can be utilized.
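The building blocks (4.15) to (4.21) can be instantiated on a toy problem to see how they fit together. This is a minimal sketch under several assumptions: the control set is a finite range of integers, so that argmin can be computed exhaustively, and the maps `f` and `J` as well as the bounds are hypothetical choices, not models from this work. The reduced objective $J_f$ is formed as $J \circ \langle f, \mathrm{id}_X \rangle$, mirroring the product construction (4.4).

```python
# Hypothetical toy instance of (4.15)-(4.21) on a finite, integer-valued control set.


def f(x: int) -> int:
    """Control-to-state map (4.16): x ↦ y (toy choice)."""
    return x * x


def J(y: int, x: int) -> int:
    """Objective function (4.15): (y, x) ↦ z (toy choice)."""
    return (y - 4) ** 2 + x


def pair(f, g):
    """<f, g>: X -> Y × X, as in the product UMP (4.4)."""
    return lambda x: (f(x), g(x))


def reduced_objective(J, f):
    """J_f = J ∘ <f, id_X>, i.e., x ↦ J(f(x), x)."""
    f_and_id = pair(f, lambda x: x)
    return lambda x: J(*f_and_id(x))


def argmin_set(J_f, admissible):
    """X* as in (4.18) and Y* as in (4.19), restricted to a finite admissible set."""
    xs = list(admissible)
    best = min(J_f(x) for x in xs)
    X_star = [x for x in xs if J_f(x) == best]
    return X_star, {J_f(x) for x in X_star}


x_l, x_u = -3, 3                                         # box constraints (4.21)
X_F = [x for x in range(-10, 11) if x_l <= x <= x_u]
X_star, Y_star = argmin_set(reduced_objective(J, f), X_F)
print(X_star, Y_star)                                    # [-2] {-2}
```

Exhaustive search stands in for a real solver only to keep the sketch self-contained. With a differentiability structure, the gradient- or Hessian-exploiting methods mentioned above would replace `argmin_set`.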

Finally, let us apply the diagrammatic notation from the CT toolset (see, e.g., the diagram in (4.7)) as a graphical tool for a compact representation of the specification's basic building blocks (4.15), (4.16), and (4.17).

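$$\begin{aligned} & X \xrightarrow{\ \langle f, \mathrm{id}_X \rangle\ } Y \times X \xrightarrow{\ J\ } Z \\ & X \xrightarrow{\ f\ } Y \xrightarrow{\ J\ } Z \\ & Z^X \xrightarrow{\ \operatorname{argmin}\ } X \end{aligned} \tag{4.22}$$
In each branch, every object carries its identity morphism ($\mathrm{id}_X$, $\mathrm{id}_{Y \times X}$, $\mathrm{id}_Y$, $\mathrm{id}_Z$, $\mathrm{id}_{Z^X}$).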

Bear in mind that the statements in (4.18), (4.19), (4.20), and (4.21) are considered to be implicitly encoded in the compact representation in (4.22). Furthermore, the diagrammatic notation in (4.22) is primarily a graphical tool and not a graphical language. A graphical language, i.e., one that enables diagrammatic reasoning (see, e.g., [1] or [45]), requires completeness theorems that match the diagrammatic representation with the algebraic representation. However, these kinds of considerations are left for future investigations.

Nevertheless, the abstract specification by means of the CT approach is indispensable since it keeps the objects and morphisms conceptually separate from a possible implementation, e.g., as sets and set functions in the category $\mathbf{Set}$. In general, there need not be a single prevailing way of implementation, and implementations in different categories are conceivable. For more details on the method of categorical definition as a kind of abstract specification, and on the utility of keeping the specification separate from the implementation, see, e.g., [16].

Hereinafter, let us suppose the commonly defined sequential composition, such that the corresponding laws of a category are satisfied (recall Definition 4.2.1). Hence, if we consider the three individual branches in (4.22) as depictions of three finite categories $\mathcal{A}$, $\mathcal{B}$, $\mathcal{C}$ (similarly to (4.7)), then an interaction between these three finite categories can be conducted by functors. Given, for example, the functors $F \colon \mathcal{A} \to \mathcal{B}$, $G \colon \mathcal{B} \to \mathcal{C}$, and $H \colon \mathcal{A} \to \mathcal{C}$, one can adapt the diagrams in (4.22), in a simplistic manner, to a version that can be drawn as

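$$\begin{aligned} (\mathcal{A})\quad & X \xrightarrow{\ \langle f, \mathrm{id}_X \rangle\ } Y \times X \xrightarrow{\ J\ } Z \\ (\mathcal{B})\quad & X \xrightarrow{\ f\ } Y \xrightarrow{\ J\ } Z \\ (\mathcal{C})\quad & Z^X \xrightarrow{\ \mathrm{id}_{Z^X}\ } Z^X \xrightarrow{\ \operatorname{argmin}\ } X \end{aligned} \tag{4.23}$$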

Especially since the differentiation operator can be viewed as a functor (see, e.g., [177, p. 14f]), some applications of CT ideas in optimization are already emerging (see, e.g., [191]). In this spirit, diagrams such as (4.23) furnish us with a fresh perspective on a general optimization problem.
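In its simplest form, the functoriality of differentiation mentioned above is the chain rule. The sketch below assumes smooth maps $\mathbb{R} \to \mathbb{R}$ supplied together with hand-coded derivatives. It checks, on integer sample points, that the tangent map $T(f)(x, v) = (f(x), f'(x)\,v)$ satisfies $T(g \circ f) = T(g) \circ T(f)$ and $T(\mathrm{id}) = \mathrm{id}$. The maps and helper names are hypothetical, and this is not the construction of [177] or [191].

```python
# Sketch (assumption): maps R -> R supplied with hand-coded derivatives.
# Tangent map T(f)(x, v) = (f(x), f'(x)·v); functoriality of T is the chain rule.


def T(f, df):
    return lambda x, v: (f(x), df(x) * v)


def f(x):
    return x * x            # f(x) = x^2


def df(x):
    return 2 * x


def g(y):
    return 3 * y + 1        # g(y) = 3y + 1


def dg(y):
    return 3


def gf(x):
    return g(f(x))          # (g∘f)(x) = 3x^2 + 1


def dgf(x):
    return 6 * x            # derivative of g∘f, computed by hand


def Tg_after_Tf(x, v):
    return T(g, dg)(*T(f, df)(x, v))


points = [(x, v) for x in range(-3, 4) for v in (-1, 1, 2)]
chain_rule_holds = all(T(gf, dgf)(x, v) == Tg_after_Tf(x, v) for x, v in points)
identity_preserved = T(lambda x: x, lambda x: 1)(5, 2) == (5, 2)
print(chain_rule_holds, identity_preserved)   # True True
```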

Between the first branch ($\mathcal{A}$) and the second branch ($\mathcal{B}$) in (4.23), a forgetful functor $F$ that forgets the product object can be deployed, such as, e.g., in (4.9). Interestingly, one cannot provide an appropriate functor to show an equivalence of categories (recall Definition 4.2.4). According to [177, p. 30f], a functor $F \colon \mathcal{C} \to \mathcal{D}$ has to be faithful (as in (4.10)), full (as in (4.13)), and essentially surjective on objects in order to define an equivalence of categories, that is, $\mathcal{C} \simeq \mathcal{D}$. Mind that a functor $F \colon \mathcal{C} \to \mathcal{D}$ that is essentially surjective on objects possesses the defining property⁵
$$\text{for all } X \colon \mathrm{obj}(\mathcal{D}),\ \text{there exists } \tilde{X} \colon \mathrm{obj}(\mathcal{C}) \text{ such that } X \cong_{\mathcal{D}} F(\tilde{X}). \tag{4.24}$$

In conclusion, the first branch ($\mathcal{A}$) and the second branch ($\mathcal{B}$) in (4.23) cannot be treated as indistinguishable categories; therefore, one cannot be substituted for the other.

Consider the arrow $Y \times X \xrightarrow{J} Z$ in isolation within the corresponding category ($\mathcal{A}$). Given appropriate functors $F$ and $G$, one can observe a natural isomorphism $\alpha \colon F \cong G$ (recall Remark 4.2.8) that relates the arrow $Y \times X \xrightarrow{J} Z$ to another arrow $Y \xrightarrow{\hat{J}} Z^X$. In computer science, this specific natural isomorphism $\alpha \colon F \cong G$ is called currying (see, e.g., [177, p. 54]). For more elaborations on the constructions behind currying, I refer to, e.g., [167, p. 33f] or [177, p. 129].
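The two directions of currying can be written down directly. In this sketch, `curry` and `uncurry` realize the bijection between morphisms $Y \times X \to Z$ and morphisms $Y \to Z^X$, and the code checks extensionally on sample pairs that they are mutually inverse. Naturality itself is not tested. The objective `J` is a hypothetical toy function in the shape of (4.15).

```python
# Sketch: the natural isomorphism hom(Y × X, Z) ≅ hom(Y, Z^X), realized by curry/uncurry.


def curry(J):
    """J: Y × X -> Z  ↦  Ĵ: Y -> (X -> Z)."""
    return lambda y: lambda x: J(y, x)


def uncurry(J_hat):
    """Ĵ: Y -> (X -> Z)  ↦  J: Y × X -> Z."""
    return lambda y, x: J_hat(y)(x)


def J(y, x):
    return (y - 4) ** 2 + x      # hypothetical objective in the shape of (4.15)


J_hat = curry(J)
pairs = [(y, x) for y in range(10) for x in range(-3, 4)]

# the two maps are mutually inverse (checked extensionally on sample pairs)
round_trip_1 = all(uncurry(curry(J))(y, x) == J(y, x) for y, x in pairs)
round_trip_2 = all(curry(uncurry(J_hat))(y)(x) == J_hat(y)(x) for y, x in pairs)
print(round_trip_1, round_trip_2, J_hat(4)(-2))   # True True -2
```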

Exploiting the information regarding the first branch ($\mathcal{A}$) and the third branch ($\mathcal{C}$) in (4.23), let us suppose a category $\mathcal{D}$ in which one can draw the diagram

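$$X \xrightarrow{\ f\ } Y \xrightarrow{\ \hat{J}\ } Z^X \xrightarrow{\ \operatorname{argmin}\ } X, \tag{4.25}$$
where each object again carries its identity morphism ($\mathrm{id}_X$, $\mathrm{id}_Y$, $\mathrm{id}_{Z^X}$).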

Observe that performance-related issues regarding a general optimization problem boil down to executing the composition in (4.25) as efficiently as possible. Recalling Figure 1.4, the CT approach therefore helps to organize concisely the needed knowledge at the level of generalized functions. This level precedes the level of associated algorithms, which in turn precedes the level of programs, i.e., the corresponding implementations in a programming language. Informally, programs are a subpart of algorithms, and algorithms are a subpart of generalized functions. For more details, I refer to, e.g., [225].

Admittedly, the CT approach is too abstract to utilize the standard analysis toolset; however, it enables a complementary kind of rigor in reasoning by highlighting an adequate specification.

Recalling, for the sake of technicalities, the category Cat, whose objects are (locally) small categories and whose morphisms are functors (see § 4.2.2), the CT approach notably undertakes a meaningful shift in perspective: it lifts the classical set-oriented modeling paradigm "(set) functions as models" to the category-oriented modeling paradigm "categories as models" and "functors as model transformations".

4.3.2 Specifying surrogate-guided optimization methods

At the level of generalized functions (see Figure 1.4), surrogate-guided optimization (SGO) methods try to address the aforementioned performance-related issue by introducing a notion of fidelity. Inevitably, this adds complexity to the inherent complexity of the general optimization problem (see the diagram in (4.25)), since it forces the use of various models that exhibit some kind of fidelity, that is, some kind of indexing.

⁵ Regarding the signature of F, see Remark § 4.2.4.

146 Chapter 4. An algebraic modeling framework using the category theoretical

language for applications in surrogate optimization

In a nutshell, I argue that there are two kinds of fidelity: an order-oriented fidelity and a hierarchy-oriented fidelity. Both kinds are based on a user-defined indexing driven by the idea of a "best fit" to the semantics of physics, but the hierarchy-oriented fidelity additionally demands a meaningful definition of a limit. The hierarchy-oriented fidelity is at the core of multi-level methods, whereas the order-oriented fidelity is at the core of multi-fidelity methods. In the remainder, I focus exclusively on multi-fidelity methods.

To give an example of the "best fit" idea, let us use the semantics of electromagnetics to interpret the syntactical considerations of the previous subsection, especially those in (4.15): the control variable $x$ as geometric parameters, the state variable $y$ as a current density vector field, the map $f$ as an assignment to a solution of a well-posed boundary value problem (WPBVP), and the map $J$ as a loss map. Solving the WPBVP numerically allows one, e.g., to associate a "best fit" model with the highest degree of discretization and a "low fit" model with a low degree of discretization; hence, let us call the latter a low-fidelity model and the former a high-fidelity model.
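To make the discretization-based fidelity index concrete, here is a minimal sketch on a hypothetical one-dimensional stand-in for the WPBVP (not an electromagnetic problem): the scalar control x is a source amplitude, $f_n$ solves $-u'' = x$ on $(0, 1)$ with homogeneous Dirichlet conditions using n interior finite-difference nodes, and J integrates the state with the trapezoidal rule. The number of nodes n plays the role of the fidelity index.

```python
def f(x: float, n: int) -> list[float]:
    """Hypothetical state map f_n : X -> Y.

    Solves -u'' = x on (0, 1) with u(0) = u(1) = 0 by second-order finite
    differences on n interior nodes; the tridiagonal system
    -u[i-1] + 2 u[i] - u[i+1] = x h^2 is solved with the Thomas algorithm.
    """
    h = 1.0 / (n + 1)
    rhs = x * h * h
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0], d[0] = -0.5, rhs / 2.0
    for i in range(1, n):
        m = 2.0 + c[i - 1]            # pivot b - a * c[i-1] with a = -1, b = 2
        c[i] = -1.0 / m
        d[i] = (rhs + d[i - 1]) / m
    u = [0.0] * n
    u[-1] = d[-1]
    for i in range(n - 2, -1, -1):    # back substitution
        u[i] = d[i] - c[i] * u[i + 1]
    return u


def J(u: list[float]) -> float:
    """Hypothetical loss map J : Y -> Z: trapezoidal integral of u over (0, 1)."""
    h = 1.0 / (len(u) + 1)
    return h * sum(u)                 # the boundary values are zero


def model(x: float, n: int) -> float:
    """The composite J o f_n; the fidelity index is the number of nodes n."""
    return J(f(x, n))


exact = 1.0 / 12.0                    # integral of u(t) = t (1 - t) / 2 for x = 1
low, high = model(1.0, 3), model(1.0, 99)
print(abs(low - exact), abs(high - exact))
```

The coarse model (n = 3) is far cheaper but less accurate than the fine model (n = 99), which mirrors the low-fidelity/high-fidelity pairing described above.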

Exploiting the category-oriented paradigm shift mentioned above in § 4.3.1, there

are two options for representing a model. For the sake of conciseness, let us omit the

identity arrows in the depictions:

$$X \xrightarrow{f} Y \xrightarrow{J} Z \qquad \text{or} \qquad X \xrightarrow{J_f} Z. \tag{4.26}$$

The option to choose depends on the information available. If we select, for example, the first model representation as our high-fidelity model $M_1$, then one can ascribe a low-fidelity model $M_2$ to the high-fidelity model by using a functor F.

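$$\begin{array}{ccccc}
X & \xrightarrow{f} & Y & \xrightarrow{J} & Z \\
\downarrow{\scriptstyle F} & & \downarrow{\scriptstyle F} & & \downarrow{\scriptstyle F} \\
F(X) & \xrightarrow{F(f)} & F(Y) & \xrightarrow{F(J)} & F(Z)
\end{array} \tag{4.27}$$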

Given the functor $F \colon M_1 \to M_2$, one can interpret (recall § 4.2.1) the high-fidelity model $M_1$ within the low-fidelity model $M_2$. Hence, the CT approach gives a clear mathematical meaning to the colloquial idea in SGO methods that models of variable fidelity should share some structural similarity.
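To make "interpreting $M_1$ within $M_2$" concrete, here is a minimal Python sketch that checks the functor laws for finite categories given by explicit hom tables. The representation (morphism name mapped to a domain/codomain pair, identities named `id_<object>`, the composite $J \circ f$ named `J.f`) is a hypothetical choice for illustration, and `compose` assumes at most one non-trivial composite, as in (4.27).

```python
from itertools import product

# Finite categories as hom tables: morphism name -> (domain, codomain).
M1 = {"id_X": ("X", "X"), "id_Y": ("Y", "Y"), "id_Z": ("Z", "Z"),
      "f": ("X", "Y"), "J": ("Y", "Z"), "J.f": ("X", "Z")}
M2 = {"id_FX": ("FX", "FX"), "id_FY": ("FY", "FY"), "id_FZ": ("FZ", "FZ"),
      "Ff": ("FX", "FY"), "FJ": ("FY", "FZ"), "FJ.Ff": ("FX", "FZ")}


def compose(hom: dict, g: str, f: str) -> str:
    """g o f for a composable pair, assuming at most one non-trivial composite."""
    if hom[f][1] != hom[g][0]:
        raise ValueError(f"{g} o {f} is undefined")
    if f.startswith("id_"):
        return g
    if g.startswith("id_"):
        return f
    return f"{g}.{f}"


def is_functor(obj: dict, mor: dict, src: dict, tgt: dict) -> bool:
    """Check that (obj, mor) preserves domains, codomains, identities, composition."""
    for m, (d, c) in src.items():
        if tgt[mor[m]] != (obj[d], obj[c]):
            return False
        if m.startswith("id_") and mor[m] != f"id_{obj[d]}":
            return False
    return all(mor[compose(src, g, f)] == compose(tgt, mor[g], mor[f])
               for g, f in product(src, repeat=2) if src[f][1] == src[g][0])


F_obj = {"X": "FX", "Y": "FY", "Z": "FZ"}
F_mor = {"id_X": "id_FX", "id_Y": "id_FY", "id_Z": "id_FZ",
         "f": "Ff", "J": "FJ", "J.f": "FJ.Ff"}
print(is_functor(F_obj, F_mor, M1, M2))  # True
```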

Moreover, it establishes a level of abstraction that formally encompasses all kinds of models conceivable in SGO methods, together with their interconnections (recall § 4.1).

In particular, the CT approach offers guidance for developing formalized guarantees regarding the models in SGO methods. For instance, if two models $K$ and $L$ are given, then one can analyze their relationship by studying the various functors in $\mathrm{hom}_{\mathrm{Cat}}(K, L)$. To quantify the modeling error (see § 4.2.3), one can employ a problem-dependent degree of forgetfulness (DoFF). For example, if two or three forgetful functors from TVect towards Set (cf. the diagram in (4.12)) are provided, then one can assign the value two or three to the degree of forgetfulness from TVect to Set. Thus, in this case, the modeling error associated with TVect would be two or three. One can stipulate that a larger number reflects a greater knowledge of the model and therefore indicates a more trustworthy model.
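A minimal sketch of the DoFF as a count of forgetful functors, assuming a hypothetical graph of standard forgetful functors (TVect to Vect and Top, Vect to Ab and Set, Ab and Top to Set); this graph is an illustrative assumption, not the diagram in (4.12). Each chain from TVect to Set is enumerated, and its length is read as the degree of forgetfulness along that chain.

```python
# Hypothetical forgetful functors: category -> categories it forgets structure into.
FORGETFUL = {
    "TVect": ["Vect", "Top"],
    "Vect": ["Set", "Ab"],
    "Ab": ["Set"],
    "Top": ["Set"],
    "Set": [],
}


def forgetful_chains(source: str, target: str, graph: dict = FORGETFUL) -> list[list[str]]:
    """All chains of composable forgetful functors from source to target."""
    if source == target:
        return [[target]]
    chains = []
    for nxt in graph[source]:
        for chain in forgetful_chains(nxt, target, graph):
            chains.append([source] + chain)
    return chains


for chain in forgetful_chains("TVect", "Set"):
    print(" -> ".join(chain), "DoFF =", len(chain) - 1)
```

Depending on the chosen chain, the DoFF from TVect to Set comes out as two or three, which matches the problem-dependence stressed in the text.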

By using a new functor G, one can construct a new low-fidelity model $M_3$ from the low-fidelity model $M_2$. Likewise, this construction can be interpreted as an assignment of the new low-fidelity model $M_3$ to the high-fidelity model $M_1$ by using


a composite functor G○F.

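$$\begin{array}{ccccc}
X & \xrightarrow{f} & Y & \xrightarrow{J} & Z \\
\downarrow{\scriptstyle F} & & \downarrow{\scriptstyle F} & & \downarrow{\scriptstyle F} \\
F(X) & \xrightarrow{F(f)} & F(Y) & \xrightarrow{F(J)} & F(Z) \\
\downarrow{\scriptstyle G} & & \downarrow{\scriptstyle G} & & \downarrow{\scriptstyle G} \\
G(F(X)) & \xrightarrow{G(F(f))} & G(F(Y)) & \xrightarrow{G(F(J))} & G(F(Z))
\end{array} \tag{4.28}$$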
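The composite functor $G \circ F$ acts by applying F first and G second, on objects and morphisms alike. A minimal sketch, assuming functors are given as dictionaries of their assignments (the functor H is a further hypothetical example), also checks that composition of functors is associative.

```python
def compose(G: dict, F: dict) -> dict:
    """(G o F)(a) := G(F(a)) for every object or morphism a in the domain of F."""
    return {a: G[F[a]] for a in F}


# Assignments of F : M1 -> M2 and G : M2 -> M3 on the objects and morphisms of (4.28).
F = {a: f"F({a})" for a in ("X", "Y", "Z", "f", "J")}
G = {b: f"G({b})" for b in F.values()}
H = {c: f"H({c})" for c in G.values()}   # a further, hypothetical functor

GF = compose(G, F)
print(GF["f"])  # G(F(f))
```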

One can also directly ascribe a new low-fidelity model $M_4$ to the high-fidelity model $M_1$ by using a functor H. Note that if, in addition, a functor $\tilde{G}$ is provided

such that ˜

G/≅G and ˜

∶=G, and if it is set that H ∶=˜

G∘F, then we have a situation similar to the one in the diagram in (4.28); mind that it is not the same situation, though. Thus, the low-fidelity models $M_4$ and $M_3$ are still distinguishable when regarded as models.

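$$\begin{array}{ccccc}
F(X) & \xrightarrow{F(f)} & F(Y) & \xrightarrow{F(J)} & F(Z) \\
\uparrow{\scriptstyle F} & & \uparrow{\scriptstyle F} & & \uparrow{\scriptstyle F} \\
X & \xrightarrow{f} & Y & \xrightarrow{J} & Z \\
\downarrow{\scriptstyle H} & & \downarrow{\scriptstyle H} & & \downarrow{\scriptstyle H} \\
H(X) & \xrightarrow{H(f)} & H(Y) & \xrightarrow{H(J)} & H(Z)
\end{array} \tag{4.29}$$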

For $M_4$ and $M_3$ to be regarded as indistinguishable, one has to test for an equivalence $M_4 \simeq M_3$ (recall Definition 4.2.4) by furnishing adequate functors $U \colon M_4 \rightleftarrows M_3 \colon V$. Generally, this equivalence test is a good initial tool whenever an arbitrary new model is stated.
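A minimal sketch of the strictest version of this test, assuming finite models whose objects and morphisms are listed by name and functors given as dictionaries (U, V, and all names are hypothetical). It checks $V \circ U = \mathrm{id}$ and $U \circ V = \mathrm{id}$, i.e., an isomorphism of categories; this is sufficient, but not necessary, for an equivalence, which only requires natural isomorphisms to the identities.

```python
def compose(G: dict, F: dict) -> dict:
    """(G o F)(a) := G(F(a)); entries without an image under G become None."""
    return {a: G.get(F[a]) for a in F}


def is_identity(mapping: dict) -> bool:
    return all(k == v for k, v in mapping.items())


def mutually_inverse(U: dict, V: dict) -> bool:
    """Strict test V o U = id and U o V = id (an isomorphism of categories).
    It is sufficient, but not necessary, for an equivalence M4 ~ M3."""
    return is_identity(compose(V, U)) and is_identity(compose(U, V))


# Hypothetical object and morphism assignments of U : M4 -> M3 and V : M3 -> M4.
U = {"A4": "A3", "B4": "B3", "m4": "m3"}
V = {"A3": "A4", "B3": "B4", "m3": "m4"}
print(mutually_inverse(U, V))  # True
```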

Further practical considerations have to cover, for instance, the issue of how to check for equivalence by normalization. Admittedly, an even weaker form of equivalence may sometimes be more appropriate (recall § 4.2.2).

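[Diagram (4.30): the rows $F(X) \xrightarrow{F(f)} F(Y) \xrightarrow{F(J)} F(Z)$; $X \xrightarrow{f} Y \xrightarrow{J} Z$; $G(F(X)) \xrightarrow{G(F(f))} \tilde{G}(F(Y)) \xrightarrow{\tilde{G}(F(J))} \tilde{G}(F(Z))$; and $G(F(X)) \xrightarrow{G(F(f))} G(F(Y)) \xrightarrow{G(F(J))} G(F(Z))$, the latter reached from the middle row via the composite functor $G \circ F$.] (4.30)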

One can scale diagrams such as the one in (4.30) arbitrarily while maintaining the interpretability, which facilitates correct reasoning about the corresponding algorithms and programs. This matters because the inherent complexity at the level of generalized functions propagates through the level of algorithms to the level of programs (see Figure 1.4), and it increases at each level, since each level adds some level-specific complexity, such as programming-language-dependent features.

Observe that, as instances of the presented diagrams such as the one in (4.27), one can apply the category Set with appropriate additional properties and make use of the corresponding Set-valued functors, that is, functors whose codomain is the category Set. However, the level of abstraction of the CT approach also allows us to apply, for example, a category Graph of graphs and graph homomorphisms. For more details on such graph-related categories, see, e.g., the references regarding the diagrams in (4.7).

Such a category Graph is a reasonable choice in practical terms, since a programming language's compiler operates on abstract syntax trees. Moreover, regarding the intrusiveness property (see § 4.1.1) assigned to the low-fidelity models in a traditional setting, this new context suggests assigning this property to the involved functors instead.
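Morphisms in Graph are graph homomorphisms, i.e., vertex maps that send edges to edges. A minimal sketch, assuming directed graphs given as edge sets; the abstract syntax tree of `(x + y) * x` and the coarse "shape" graph are hypothetical examples.

```python
def is_graph_homomorphism(vertex_map: dict, src_edges: set, tgt_edges: set) -> bool:
    """True iff every edge (u, v) of the source graph is sent to an edge
    (vertex_map[u], vertex_map[v]) of the target graph."""
    return all((vertex_map[u], vertex_map[v]) in tgt_edges for u, v in src_edges)


# Hypothetical abstract syntax tree of (x + y) * x as parent -> child edges.
AST = {("mul", "add"), ("mul", "x_2"), ("add", "x_1"), ("add", "y")}
# Hypothetical coarse "shape" graph: operators point to operators or leaves.
SHAPE = {("op", "op"), ("op", "leaf")}

h = {"mul": "op", "add": "op", "x_1": "leaf", "x_2": "leaf", "y": "leaf"}
print(is_graph_homomorphism(h, AST, SHAPE))  # True
```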

If we choose the second model representation in (4.26) as our high-fidelity model

M1, then one can proceed analogously to the other choice. For the sake of clarity, let

us omit the parentheses.

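$$\begin{array}{ccccc}
X & & FX & & GFX \\
\downarrow{\scriptstyle J_f} & \xrightarrow{F} & \downarrow{\scriptstyle FJ_f} & \xrightarrow{G} & \downarrow{\scriptstyle GFJ_f} \\
Z & & FZ & & GFZ
\end{array} \tag{4.31}$$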

The previous considerations provide a useful supplement to the concept of multifidelity model management (see § 4.1.2). Using the CT approach, one can interpret this concept's basic building block as "a finite category $C_{\mathrm{mmm}}$ with morphisms as models". Again, one can sketch two cases depending on the available information. To illustrate them, let us choose one high-fidelity model (index 0) and two low-fidelity models (index 1 and index 2, respectively).

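$$\begin{array}{ccccc}
X_0 & \xrightarrow{f_0} & Y_0 & \xrightarrow{J_0} & Z_0 \\
X_1 & \xrightarrow{f_1} & Y_1 & \xrightarrow{J_1} & Z_1 \\
X_2 & \xrightarrow{f_2} & Y_2 & \xrightarrow{J_2} & Z_2
\end{array}
\qquad\qquad
\begin{array}{ccc}
X_0 & \xrightarrow{J_{f_0}} & Z_0 \\
X_1 & \xrightarrow{J_{f_1}} & Z_1 \\
X_2 & \xrightarrow{J_{f_2}} & Z_2
\end{array} \tag{4.32}$$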

This concept mainly requires that there are isomorphisms (recall the diagram in (4.2)) between the input objects ($X_0 \cong_C X_1$, $X_0 \cong_C X_2$, and $X_1 \cong_C X_2$), the intermediate objects ($Y_0 \cong_C Y_1$, $Y_0 \cong_C Y_2$, and $Y_1 \cong_C Y_2$), and the output objects ($Z_0 \cong_C Z_1$, $Z_0 \cong_C Z_2$, and $Z_1 \cong_C Z_2$) of the models. If two objects are isomorphic, one can substitute one for the other.

$$X_0 \longrightarrow Y_0 \longrightarrow Z_0 \qquad\qquad J_{f_0},\ J_{f_1},\ J_{f_2} \colon X_0 \to Z_0 \tag{4.33}$$

The notion of sameness of models is encoded by the equality of a parallel pair of morphisms (such as in (4.6)), for instance, $J_{f_0}, J_{f_1} \colon X_0 \rightrightarrows Z_0$, $J_{f_0}, J_{f_2} \colon X_0 \rightrightarrows Z_0$, or $J_{f_1}, J_{f_2} \colon X_0 \rightrightarrows Z_0$. For this purpose, one needs to presume the existence of a terminal object $1$ within the given category.
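In Set, two parallel morphisms are equal exactly when they agree after precomposition with every global element $e \colon 1 \to X_0$, which is one way the terminal object enters the comparison. A minimal sketch, assuming a finite $X_0$ and hypothetical models $J_{f_0}, J_{f_1}, J_{f_2}$ with values in $Z_0 = \{0, 1\}$:

```python
POINT = "*"                      # the unique element of the terminal object 1 = {"*"}
X0 = range(8)                    # a finite input object in Set


def global_elements(X):
    """All morphisms e : 1 -> X in Set, one per element of X."""
    return [lambda _p, x=x: x for x in X]


def same_model(g, h, X) -> bool:
    """Decide g = h for parallel g, h : X -> Z by comparing g o e and h o e
    for every global element e : 1 -> X (valid in Set)."""
    return all(g(e(POINT)) == h(e(POINT)) for e in global_elements(X))


# Hypothetical parallel morphisms J_f0, J_f1, J_f2 : X0 -> Z0.
def Jf0(x):
    return x % 2


def Jf1(x):
    return (3 * x) % 2


def Jf2(x):
    return 0


print(same_model(Jf0, Jf1, X0), same_model(Jf0, Jf2, X0))  # True False
```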

Note that the semantic link between the models in (4.32) is established by notions such as model evaluation costs and correlation coefficients, whereas in diagrams such as (4.30), the semantic link between the models is primarily established by functors.
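The two quantities named here can be sketched directly: a sample correlation coefficient between the high-fidelity output and each low-fidelity output, and the relative evaluation cost. The three models and their costs below are hypothetical stand-ins for $J_{f_0}$, $J_{f_1}$, $J_{f_2}$ in (4.32), not models from the text.

```python
import math


def pearson(a: list[float], b: list[float]) -> float:
    """Sample Pearson correlation coefficient of two equally long lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical models: J_f0 (high fidelity), J_f1 and J_f2 (low fidelity).
MODELS = {
    "Jf0": lambda x: math.sin(x) + 0.1 * x,
    "Jf1": lambda x: math.sin(x),
    "Jf2": lambda x: x - x ** 3 / 6,          # truncated Taylor series of sin
}
COST = {"Jf0": 1.0, "Jf1": 0.1, "Jf2": 0.01}  # hypothetical evaluation costs

samples = [i / 10 for i in range(-20, 21)]
high = [MODELS["Jf0"](x) for x in samples]
for name in ("Jf1", "Jf2"):
    low = [MODELS[name](x) for x in samples]
    print(name, round(pearson(high, low), 4), COST[name] / COST["Jf0"])
```

Note that nothing in this computation refers to a functor: the link between the models is purely statistical, which is the contrast drawn in the paragraph above.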

To examine the space mapping notion (see § 4.1.2), let us also interpret its basic building blocks as "a finite category $C_{\mathrm{sm}}$ with morphisms as models". Then, one can observe that it makes use of the intermediate objects.⁶

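[Diagrams (4.34): on the left, the three models $X_i \xrightarrow{f_i} Y_i \xrightarrow{J_i} Z_i$ for $i \in \{0, 1, 2\}$; on the right, the same objects connected across the fidelity levels by the morphisms $p_{01}, p_{12}$ between the input objects, $r_{01}, r_{12}$ between the intermediate objects, and $o_{01}, o_{12}$ between the output objects, together with their counterparts $\tilde{p}_{ij}$, $\tilde{r}_{ij}$, and $\tilde{o}_{ij}$ in the opposite direction.] (4.34)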

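In (4.34), there is no need to require any isomorphisms. More precisely, it is not necessary that, for any (unordered) $(i, j) \colon \{0, 1, 2\}^2$, the morphisms $p_{ij} \colon X_i \rightleftarrows X_j \colon \tilde{p}_{ij}$ satisfy the conditions

$$\tilde{p}_{ij} \circ p_{ij} = \mathrm{id}_{X_i} \tag{4.35a}$$

$$p_{ij} \circ \tilde{p}_{ij} = \mathrm{id}_{X_j} \tag{4.35b}$$

or that the morphisms $r_{ij} \colon Y_i \rightleftarrows Y_j \colon \tilde{r}_{ij}$ satisfy the conditions

$$\tilde{r}_{ij} \circ r_{ij} = \mathrm{id}_{Y_i} \tag{4.36a}$$

$$r_{ij} \circ \tilde{r}_{ij} = \mathrm{id}_{Y_j} \tag{4.36b}$$

or that the morphisms $o_{ij} \colon Z_i \rightleftarrows Z_j \colon \tilde{o}_{ij}$ satisfy the conditions

$$\tilde{o}_{ij} \circ o_{ij} = \mathrm{id}_{Z_i} \tag{4.37a}$$

$$o_{ij} \circ \tilde{o}_{ij} = \mathrm{id}_{Z_j}. \tag{4.37b}$$

However, the corresponding morphisms in (4.34) can satisfy the conditions in (4.35), (4.36), and (4.37).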
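For finite sets, the conditions (4.35a) and (4.35b) can be checked separately. In the following minimal sketch, with hypothetical input objects and maps, the pair $p_{01}, \tilde{p}_{01}$ satisfies both conditions (an isomorphism), whereas the pair $p_{12}, \tilde{p}_{12}$ onto a smaller set satisfies only (4.35b), which is the kind of non-isomorphic link that (4.34) permits.

```python
def iso_conditions(p: dict, p_tilde: dict, Xi: list, Xj: list) -> tuple[bool, bool]:
    """Evaluate (4.35a) p~ o p = id_Xi and (4.35b) p o p~ = id_Xj for finite maps."""
    cond_a = all(p_tilde[p[x]] == x for x in Xi)
    cond_b = all(p[p_tilde[x]] == x for x in Xj)
    return cond_a, cond_b


# Hypothetical input objects and maps between fidelity levels.
X0, X1, X2 = [0, 1, 2], ["a", "b", "c"], ["lo", "hi"]
p01, p01_tilde = {0: "a", 1: "b", 2: "c"}, {"a": 0, "b": 1, "c": 2}
p12, p12_tilde = {"a": "lo", "b": "lo", "c": "hi"}, {"lo": "a", "hi": "c"}

print(iso_conditions(p01, p01_tilde, X0, X1))  # (True, True): an isomorphism
print(iso_conditions(p12, p12_tilde, X1, X2))  # (False, True): only (4.35b) holds
```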

Similarly to (4.33), the notion of sameness of models in (4.34) is encoded by the

equality of a parallel pair of morphisms with the signature X0→Z0. Observe that

various combinations of path compositions in (4.34) can exhibit the signature X0→Z0.

To concretize the semantic link between the models in (4.34), various representations have been associated with some of the maps, e.g., affine maps with $\tilde{r}_{ij}$ (see, e.g., [56]) or argmin maps with $p_{ij}$ (see, e.g., [14], [95], or [145]).
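An argmin-based $p_{ij}$ can be sketched as a generic parameter extraction: for a fine-model input x, find the coarse-model input z whose response best matches the fine response. This is a minimal illustration under hypothetical fine and coarse models and a grid search; it is not the specific method of the cited references.

```python
Z_GRID = [i / 100 for i in range(-300, 301)]   # search space of the coarse parameter


def fine(x: float) -> tuple[float, float]:
    """Hypothetical high-fidelity response (two output quantities)."""
    return (x + 0.2, (x + 0.2) ** 2)


def coarse(z: float) -> tuple[float, float]:
    """Hypothetical low-fidelity response."""
    return (z, z ** 2)


def extract(x: float) -> float:
    """Parameter extraction p(x) := argmin_z ||coarse(z) - fine(x)||^2 (grid search)."""
    target = fine(x)
    return min(Z_GRID, key=lambda z: sum((c - t) ** 2 for c, t in zip(coarse(z), target)))


print(extract(1.0))  # 1.2
```

Here the extracted map happens to be the shift $x \mapsto x + 0.2$; in general, such a $p_{ij}$ need not be invertible.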

Concerning the semantic link, bear in mind that different representations and methods, respectively, presuppose different isomorphisms regarding the objects in (4.34).

⁶ Notice that, compared with (3.124), there is a slight semantic change in the notation in (4.34), made to harmonize it with the notation of the CT toolset (recall § 4.2). However, the identification of the entities in (3.124) with the entities in (4.34) should be clear.


Hence, taking into account the individual isomorphisms, which are reflected by the conditions in (4.35), (4.36), and (4.37), resembles a normalization process towards the state where there is essentially only one input object, one intermediate object, and one output object (see the diagrams in (4.33)).

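[Diagrams (4.38): two normalized variants of (4.34). The first involves $X_0, Y_0, Z_0, X_1, Y_1, X_2, Y_2$ with the morphisms $p_{01}, \tilde{p}_{01}, p_{12}, \tilde{p}_{12}$ between the input objects and $r_{01}, \tilde{r}_{01}, r_{12}, \tilde{r}_{12}$ between the intermediate objects; the second involves only $X_0, Y_0, Z_0$ with the morphisms $r_{01}, \tilde{r}_{01}, r_{12}$, and $J_2$.] (4.38)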

On the one hand, the normalization process argument elucidates that the modeling decisions concerning the choice of isomorphisms can separate the concept of multifidelity model management encoded in $C_{\mathrm{mmm}}$ (see the diagrams in (4.32)) from the space mapping notion encoded in $C_{\mathrm{sm}}$ (see the diagrams in (4.34)).

On the other hand, the modeling decisions concerning the choice of isomorphisms reveal diagrammatically the requirements for the indistinguishability of the concept of multifidelity model management (see the diagrams in (4.32)) and the space mapping notion (see the diagrams in (4.34)).

In essence, the normalization process argument delivers a classification tool at the level of generalized functions (see Figure 1.4) for the concept of multifidelity model management and the space mapping notion.

An allied argument, which is algebraic-geometric in nature, utilizes functors. Recalling the commentary on the diagrams in (4.7), we know that every functor preserves isomorphisms. Therefore, given the finite category $C_{\mathrm{mmm}}$ associated with (4.32) and the finite category $C_{\mathrm{sm}}$ associated with (4.34), one cannot provide a functor $Q \colon C_{\mathrm{mmm}} \to C_{\mathrm{sm}}$ that maps the objects and the morphisms in the most obvious way and, at the same time, maps isomorphisms in $C_{\mathrm{mmm}}$ to non-isomorphisms in $C_{\mathrm{sm}}$.
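Why functors preserve isomorphisms follows from the functor laws alone: $F(\tilde{p}) \circ F(p) = F(\tilde{p} \circ p) = F(\mathrm{id}) = \mathrm{id}$. The sketch below checks this on one concrete Set-valued functor, the tuple (free monoid) functor, which is a swapped-in example and not a functor from the text; p is a hypothetical bijection.

```python
from itertools import product


def fmap(p: dict):
    """Action of the tuple (free monoid) functor on a set function p, given as a dict."""
    return lambda xs: tuple(p[x] for x in xs)


A = [0, 1, 2]
p = {0: "a", 1: "b", 2: "c"}                  # hypothetical isomorphism A -> B
p_tilde = {b: a for a, b in p.items()}         # its inverse B -> A

# F(p~) o F(p) = F(p~ o p) = F(id_A) = id_{F(A)}, checked on all tuples of length <= 3.
words = [w for n in range(4) for w in product(A, repeat=n)]
print(all(fmap(p_tilde)(fmap(p)(w)) == w for w in words))  # True
```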

Furthermore, a potential indistinguishability of the concept of multifidelity model management (see the diagrams in (4.32)) and the space mapping notion (see the diagrams in (4.34)) can be expressed by an equivalence of categories $C_{\mathrm{mmm}} \simeq C_{\mathrm{sm}}$ (recall Definition 4.2.4).

Hence, comparing the category $C_{\mathrm{mmm}}$ with the category $C_{\mathrm{sm}}$ provides another classification tool at the level of generalized functions (see Figure 1.4) for the concept of multifidelity model management and the space mapping notion.

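Finally, recalling the diagrams in (4.22), let us consider the last part of the optimization. Continuing our prior thread, we ascribe a low-fidelity model optimization $O_1$ to the high-fidelity model optimization $O_0$ by using a functor P.

$$\begin{array}{ccc}
Z^X & \xrightarrow{\operatorname{argmin}} & X \\
\downarrow{\scriptstyle P} & & \downarrow{\scriptstyle P} \\
P(Z^X) & \xrightarrow{P(\operatorname{argmin})} & P(X)
\end{array} \tag{4.39}$$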

Given the functor $P \colon O_0 \to O_1$, one can interpret (recall § 4.2.1) the high-fidelity model optimization $O_0$ within the low-fidelity model optimization $O_1$. Thus, the CT approach pins down the intuitive idea that there should be some structural similarity between the high-fidelity model optimization and the low-fidelity model optimization.

Moreover, the diagram in (4.39) structurally signifies the distinction between surrogate-based optimization and surrogate-guided optimization: in the former case, there is no further interaction between $O_0$ and $O_1$ after the functor P has been established; in the latter case, there is further interaction between $O_0$ and $O_1$ after the functor P has been established.
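The structural difference can be sketched with two loops under hypothetical one-dimensional objectives. The surrogate-based variant optimizes the low-fidelity model once and never consults $O_0$ again; the surrogate-guided variant re-evaluates $O_0$ at every iterate and corrects $O_1$ with a first-order additive correction, which is one common choice, assumed here for illustration.

```python
GRID = [i / 100 for i in range(401)]    # discretized control space X = [0, 4]


def high(x: float) -> float:            # hypothetical objective of O_0
    return (x - 2.0) ** 2


def low(x: float) -> float:             # hypothetical objective of O_1
    return (x - 1.8) ** 2


def d_high(x: float) -> float:
    return 2.0 * (x - 2.0)


def d_low(x: float) -> float:
    return 2.0 * (x - 1.8)


def argmin(g) -> float:
    return min(GRID, key=g)


def surrogate_based() -> float:
    """Optimize the low-fidelity model once; O_0 is never consulted again."""
    return argmin(low)


def surrogate_guided(x: float = 0.0, iterations: int = 3) -> float:
    """Re-evaluate O_0 at every iterate and correct O_1 additively (first order)."""
    for _ in range(iterations):
        xk = x
        a, b = high(xk) - low(xk), d_high(xk) - d_low(xk)
        x = argmin(lambda t: low(t) + a + b * (t - xk))
    return x


print(surrogate_based(), surrogate_guided())  # 1.8 2.0
```

Without interaction, the result inherits the bias of the low-fidelity model; with interaction, the corrected surrogate recovers the high-fidelity minimizer in this example.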

Ideally, if we furnish adequate functors $P \colon O_0 \rightleftarrows O_1 \colon \tilde{P}$, then one can establish an equivalence of categories $O_0 \simeq O_1$; hence, one can consider them as indistinguishable when regarded as model optimizations.

However, if we take the viewpoint of the finite category $C_{\mathrm{mmm}}$ associated with (4.32) and the finite category $C_{\mathrm{sm}}$ associated with (4.34), then the expressive power of the diagram in (4.39) is reduced. In the case of $C_{\mathrm{mmm}}$, for instance, I conclude from the diagrams in (4.33) that the high-fidelity model optimization and the low-fidelity model optimizations are all encoded together within the following representation:

$$Z_0^{X_0} \xrightarrow{\operatorname{argmin}} X_0. \tag{4.40}$$

Since various combinations of path compositions in (4.34) can exhibit the signature $X_0 \to Z_0$, the high-fidelity model optimization and the low-fidelity model optimizations concerning $C_{\mathrm{sm}}$ can all be encoded together similarly to (4.40).

At the level of generalized functions (see Figure 1.4), the encodings in (4.39) and (4.40) do not contain any value judgment in the sense that one encoding is preferable to the other; they simply hint at possible logical mismatches between intuitive ideas and their mathematical encodings that are due to the choice of the modeling paradigm.

Let us continue with the encoding in (4.39). Notice that the X-valued minimizer function argmin in (4.17) contains an internal representation (indicated by "↦") and an external representation (indicated by "→").

Since the CT approach primarily exploits the external representation (recall § 4.3.1), we additionally have to assume a terminal object in order to encode the minimizer $x^*$ properly for (4.39).

Hence, let us introduce $1 \xrightarrow{x^*} X$ and $x^* \colon X^1$, respectively. Supposing that argmin is polymorphic, one can extend its signature such that

$$\operatorname{argmin} \colon Z^X \to X^1. \tag{4.41}$$

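Finally, let us adapt the diagram in (4.39) according to the statement in (4.41).

$$\begin{array}{ccc}
Z^X & \xrightarrow{\operatorname{argmin}} & X^1 \\
\downarrow{\scriptstyle P} & & \downarrow{\scriptstyle P} \\
P(Z^X) & \xrightarrow{P(\operatorname{argmin})} & P(X^1)
\end{array} \tag{4.42}$$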

The adapted diagram in (4.42) suitably formalizes the idea of preserving both the corresponding structure and the minimizer. For the sake of brevity, however, I keep the adapted signature implicit in the remaining exposition.
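The signature $Z^X \to X^1$ can be mirrored in code by returning the minimizer not as a bare value but as a morphism $x^* \colon 1 \to X$, i.e., a function on a one-element set. A minimal sketch, assuming a finite domain and representing the terminal object by the set containing only the empty tuple (a hypothetical encoding):

```python
from typing import Callable, Sequence, TypeVar

X = TypeVar("X")
UNIT = ()   # the unique element of the terminal object 1 = {()}


def argmin_point(g: Callable[[X], float], domain: Sequence[X]) -> Callable[[tuple], X]:
    """argmin : Z^X -> X^1 on a finite domain: returns the morphism x* : 1 -> X."""
    x_star = min(domain, key=g)

    def point(u: tuple) -> X:
        if u != UNIT:
            raise ValueError("the terminal object 1 has the single element ()")
        return x_star

    return point


x_star = argmin_point(lambda x: (x - 3) ** 2, range(10))
print(x_star(UNIT))  # 3
```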


As in the cases discussed above, when we include multiple low-fidelity model optimizations, the CT ansatz unfolds its strength of formally correct bookkeeping of the involved interactions. Let us set $\mathrm{am} := \operatorname{argmin}$.

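[Diagram (4.43): the optimization $Z^X \xrightarrow{\mathrm{am}} X$ is mapped by the functors $P_1$ and $P_2$ to $P_1(Z^X) \xrightarrow{P_1(\mathrm{am})} P_1(X)$ and $P_2(Z^X) \xrightarrow{P_2(\mathrm{am})} P_2(X)$, respectively; the functors $P_{12}$ and $\tilde{P}_{12}$ connect $P_1(Z^X)$ and $P_2(Z^X)$.] (4.43)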

The functors $P_{12} \colon O_1 \rightleftarrows O_2 \colon \tilde{P}_{12}$ indicate a possible test for equivalence (recall Definition 4.2.4) of the corresponding low-fidelity optimization problems.

Similarly to the insight about scalability gained from the diagrams in (4.30), one can arbitrarily scale diagrams such as the one in (4.43) while maintaining the interpretability (see § 4.2.1), a vital component for correct reasoning about the associated algorithms and programs (see Figure 1.4).

The following depiction highlights the observation regarding the scalability. For

the sake of clarity, let us omit the potential functors between the low-fidelity model

optimizations whose existence is implicitly supposed.

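[Diagram (4.44): the optimization $Z^X \xrightarrow{\mathrm{am}} X$ is mapped by four functors to the low-fidelity model optimizations $P_k(Z^X) \xrightarrow{P_k(\mathrm{am})} P_k(X)$, $k = 1, \dots, 4$.] (4.44)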

I remark that, by default, a possible implementation of the discussed categories, such as $C_{\mathrm{mmm}}$ associated with (4.32) and $C_{\mathrm{sm}}$ associated with (4.34), is the category Set, i.e., the category of (finite) sets and set functions (see § 4.2.2). In the category Set, any singleton set can serve as a terminal object, and the isomorphisms are exactly the bijective set functions.
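Both facts about Set can be checked by brute force on small finite sets: there is exactly one function from any set into a singleton, and the isomorphisms between two three-element sets are the 3! = 6 bijections. A minimal sketch with hypothetical example sets:

```python
from itertools import product


def all_maps(A: list, B: list) -> list[dict]:
    """All set functions A -> B, each represented as a dict."""
    return [dict(zip(A, values)) for values in product(B, repeat=len(A))]


def is_bijection(m: dict, A: list, B: list) -> bool:
    image = [m[a] for a in A]
    return len(set(image)) == len(A) and set(image) == set(B)


ONE = ["*"]   # a singleton set serving as terminal object
for A in ([], [1], [1, 2, 3]):
    print(len(all_maps(A, ONE)))          # exactly one map A -> 1 each time

A, B = [1, 2, 3], ["a", "b", "c"]
print(sum(is_bijection(m, A, B) for m in all_maps(A, B)))  # 6 isomorphisms A -> B
```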

An implementation of the discussed categories by the category Set seizes the

difference between the classical set-oriented modeling paradigm "(set) functions as

models" and the category-oriented modeling paradigm "categories as models" and

"functors as model transformations" (see § 4.3.1). However, as shown in this section,

one can discuss uniformly both modeling paradigms within the CT approach.

Admittedly, the set-oriented modeling paradigm benefits from a wealth of diverse insights at the level of programs and at the level of algorithms (see Figure 1.4).

Nevertheless, the category-oriented modeling paradigm flourishes at the level of generalized functions (see Figure 1.4): it helps to encode many colloquial or intuitive ideas more rigorously, and it provides new tools to compare and classify methods; thus, it beneficially complements the tools of the set-oriented modeling paradigm.

A final remark regarding the two modeling paradigms is concerned with the

direction of the arrows involved in the corresponding diagrams. In (4.26), the choice

of the direction of the arrows is dictated by the semantics commonly provided by a

general optimization problem (see § 4.3.1).

4.4. Use cases of the CT toolset

within the electromagnetics context 153

From a purely syntactical viewpoint, though, if we simply turn around the ar-

rows (cf. Remark 4.2.2) in (4.26), then combinatorial deliberations reveal that there

are four distinguishable diagrams conceivable regarding the three objects and two

arrows in (4.26). The corresponding list of distinguishable diagrams reads as

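$$X \xrightarrow{f} Y \xrightarrow{J} Z \tag{4.45a}$$

$$X \xrightarrow{f} Y \xleftarrow{J} Z \tag{4.45b}$$

$$X \xleftarrow{f} Y \xleftarrow{J} Z \tag{4.45c}$$

$$X \xleftarrow{f} Y \xrightarrow{J} Z. \tag{4.45d}$$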

Notice that the possibilities in (4.45b), (4.45c), and (4.45d) are ruled out by semantics-based deliberations.
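The count of four follows because each of the two arrows can independently keep or reverse its direction, giving $2^2 = 4$ diagrams. A minimal sketch that enumerates them, with arrows encoded as (label, source, target) triples (a hypothetical encoding):

```python
from itertools import product

# The two arrows of (4.26) as (label, source, target).
ARROWS = (("f", "X", "Y"), ("J", "Y", "Z"))


def orientations():
    """Yield every diagram obtained by (possibly) reversing each arrow."""
    for flips in product((False, True), repeat=len(ARROWS)):
        yield tuple((label, tgt, src) if flip else (label, src, tgt)
                    for (label, src, tgt), flip in zip(ARROWS, flips))


for diagram in orientations():
    print(", ".join(f"{label}: {s} -> {t}" for label, s, t in diagram))
```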

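If we adapt (4.45) to categories A, B, C and functors F, G, then we obtain the following list of distinguishable combinations:

$$A \xrightarrow{F} B \xrightarrow{G} C \tag{4.46a}$$

$$A \xrightarrow{F} B \xleftarrow{G} C \tag{4.46b}$$

$$A \xleftarrow{F} B \xleftarrow{G} C \tag{4.46c}$$

$$A \xleftarrow{F} B \xrightarrow{G} C. \tag{4.46d}$$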

Let us invoke the category-oriented modeling paradigm in a verbose mode such that

one can spell out the statements in (4.46), for instance, in the following manner:

1. "model C follows from model B, and model B follows from model A",

2. "model B follows from model A, and model B follows from model C",

3. "model A follows from model B, and model B follows from model C",

4. "model A follows from model B, and model C follows from model B".

Thus, in some sense, the category-oriented modeling paradigm implicitly offers a causal viewpoint. Such a viewpoint hints at potential additional facets of the notion of fidelity introduced at the beginning of the section. However, especially in light of the research on causal modeling (see, e.g., [196]), a critical examination of these facets is left for future investigations. A popular adage in statistical parlance is: "Correlation does not imply causation." This adage could serve as a starting point for a critical examination of statements such as those in (4.46).

4.4 Use cases of the CT toolset

within the electromagnetics context

4.4.1 Use case #1: Simplified-physics low-fidelity models

Recalling § 3.1.3, one can observe that it is difficult to formalize all intuitive ideas w.r.t. figures such as Figure 3.8. In other words, to the best of my knowledge, there is no comprehensive theory to formally express all conceivable relationships between different problems associated with a high-fidelity model and corresponding low-fidelity models.

In (Diagrams of Fig. 3.8), some possible diagrams that can be associated with Figure 3.8 are presented abstractly. In (3.102), some statements concerning (Diagrams of Fig. 3.8) are formulated in equational form. Mind that these statements and diagrams can be understood somewhat more rigorously by applying the category theory toolset (see § 4.2).

From the viewpoint of the category theory toolset, one can make different modeling decisions in (Diagrams of Fig. 3.8). I illustrate some modeling decisions by the diagrams in (4.47).

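[Diagrams (4.47): two modeling decisions for (Diagrams of Fig. 3.8). In the first, the entities $T^{3D}_{h_1}, T^{2D}_{h_1}, R_1$; $T^{3D}_{h_2}, T^{2D}_{h_2}, R_2$; ${}^{1}\Delta^{3D}_{Ax=b}, {}^{1}\Delta^{2D}_{Ax=b}, R_3$; and ${}^{2}\Delta^{3D}_{Ax=b}, {}^{2}\Delta^{2D}_{Ax=b}, R_4$ are categories connected by the functors $I_1, I_2$; $J_1, J_2$; $K_1, K_2$; and $L_1, L_2$, respectively. In the second, the same entities are objects of the categories A, B, and C with the morphisms $f_1, f_2, f_3$; $g_1, g_2, g_3$; and $h_1, h_2, h_3$, connected by the functors $F_1, F_2$, $G_1, G_2$, and $H_1, H_2$.] (4.47)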

Observe that, in (4.47), a modeling decision is exhibited that takes into account the different algebraic characters of, e.g., $T^{3D}_{h_1}$, $T^{2D}_{h_1}$, and $R_1$ by lifting them up to the level of categories. More precisely, entities such as, e.g., $T^{3D}_{h_1}$, $T^{2D}_{h_1}$, and $R_1$ are conceived as categories, and their interaction is mediated by functors such as, e.g., the functor $I_1 \colon T^{3D}_{h_1} \to T^{2D}_{h_1}$ and the functor $I_2 \colon T^{2D}_{h_1} \to R_1$.

These categories can be understood as specifications (or interfaces) (recall § 4.3) of the corresponding numerical entities (recall § 3.1.3). However, I do not dwell on this modeling decision.

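Regarding the other modeling decision in (4.47), it is supposed that, e.g., three finite categories A, B, C exist (similarly to (4.7)). In addition, it is presupposed that the category A (recall Definition 4.2.1) is constituted by the six morphisms $f_1, f_2, f_3, f_4, f_5$, and $f_6$, where

$$\begin{aligned}
&f_1 \colon \mathrm{hom}_A\big(T^{3D}_{h_1}, T^{3D}_{h_2}\big) && f_2 \colon \mathrm{hom}_A\big(T^{3D}_{h_2}, {}^{1}\Delta^{3D}_{Ax=b}\big) && f_3 \colon \mathrm{hom}_A\big({}^{1}\Delta^{3D}_{Ax=b}, {}^{2}\Delta^{3D}_{Ax=b}\big) && \text{(4.48a)}\\
&f_4 \colon \mathrm{hom}_A\big(T^{3D}_{h_1}, {}^{2}\Delta^{3D}_{Ax=b}\big) && f_5 \colon \mathrm{hom}_A\big(T^{3D}_{h_1}, {}^{1}\Delta^{3D}_{Ax=b}\big) && f_6 \colon \mathrm{hom}_A\big(T^{3D}_{h_2}, {}^{2}\Delta^{3D}_{Ax=b}\big). && \text{(4.48b)}
\end{aligned}$$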

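Analogously to (4.48), one can constitute the category B and the category C. Thus, it is supposed that the category B is constituted by the six morphisms $g_1, g_2, g_3, g_4, g_5$, and $g_6$, where

$$\begin{aligned}
&g_1 \colon \mathrm{hom}_B\big(T^{2D}_{h_1}, T^{2D}_{h_2}\big) && g_2 \colon \mathrm{hom}_B\big(T^{2D}_{h_2}, {}^{1}\Delta^{2D}_{Ax=b}\big) && g_3 \colon \mathrm{hom}_B\big({}^{1}\Delta^{2D}_{Ax=b}, {}^{2}\Delta^{2D}_{Ax=b}\big) && \text{(4.49a)}\\
&g_4 \colon \mathrm{hom}_B\big(T^{2D}_{h_1}, {}^{2}\Delta^{2D}_{Ax=b}\big) && g_5 \colon \mathrm{hom}_B\big(T^{2D}_{h_1}, {}^{1}\Delta^{2D}_{Ax=b}\big) && g_6 \colon \mathrm{hom}_B\big(T^{2D}_{h_2}, {}^{2}\Delta^{2D}_{Ax=b}\big). && \text{(4.49b)}
\end{aligned}$$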

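Finally, it is presupposed that the category C is constituted by the six morphisms $h_1, h_2, h_3, h_4, h_5$, and $h_6$, where

$$\begin{aligned}
&h_1 \colon \mathrm{hom}_C(R_1, R_2) && h_2 \colon \mathrm{hom}_C(R_2, R_3) && h_3 \colon \mathrm{hom}_C(R_3, R_4) && \text{(4.50a)}\\
&h_4 \colon \mathrm{hom}_C(R_1, R_4) && h_5 \colon \mathrm{hom}_C(R_1, R_3) && h_6 \colon \mathrm{hom}_C(R_2, R_4). && \text{(4.50b)}
\end{aligned}$$

For the sake of conciseness, let us implicitly assume that all category laws are satisfied in A, B, and C.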
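Since every hom-set of A listed in (4.48) contains at most one morphism, composition in A is forced: $g \circ f$ must be the unique morphism from the domain of f to the codomain of g. A minimal sketch, using hypothetical ASCII stand-ins for the object symbols (e.g., `1Delta3D` for ${}^{1}\Delta^{3D}_{Ax=b}$), recovers the decomposition (4.51) of $f_4$.

```python
# ASCII stand-ins for the objects of A (hypothetical names).
OBJ_A = ["T3D_h1", "T3D_h2", "1Delta3D", "2Delta3D"]
HOM_A = {
    "f1": ("T3D_h1", "T3D_h2"), "f2": ("T3D_h2", "1Delta3D"),
    "f3": ("1Delta3D", "2Delta3D"), "f4": ("T3D_h1", "2Delta3D"),
    "f5": ("T3D_h1", "1Delta3D"), "f6": ("T3D_h2", "2Delta3D"),
}
HOM_A.update({f"id_{o}": (o, o) for o in OBJ_A})
BY_ENDPOINTS = {ends: name for name, ends in HOM_A.items()}


def compose(g: str, f: str) -> str:
    """g o f in A. Every hom-set of A has at most one element, so g o f must be
    the unique morphism from dom(f) to cod(g)."""
    (f_dom, f_cod), (g_dom, g_cod) = HOM_A[f], HOM_A[g]
    if f_cod != g_dom:
        raise ValueError(f"{g} o {f} is undefined")
    return BY_ENDPOINTS[(f_dom, g_cod)]


print(compose("f3", compose("f2", "f1")))  # f4, cf. (4.51)
```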

Notice that, given a morphism $f_3 \circ f_2 \circ f_1$ as a decomposition of the morphism $f_4$ in (4.48b), where

$$f_3 \circ f_2 \circ f_1 \colon \mathrm{hom}_A\big(T^{3D}_{h_1}, {}^{2}\Delta^{3D}_{Ax=b}\big), \tag{4.51}$$

one can conceive the morphism $f_3 \circ f_2 \circ f_1$, for instance, as a representation of an operation that encodes the change from a fine-grid discretization of a three-dimensional space and a high threshold for the termination criterion of an iterative solver to a coarse-grid discretization of a three-dimensional space and a low threshold for the termination criterion of an iterative solver (cf. § 3.1.3).

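Similarly, given a morphism $g_3 \circ g_2 \circ g_1$ as a decomposition of the morphism $g_4$ in (4.49b), where

$$g_3 \circ g_2 \circ g_1 \colon \mathrm{hom}_B\big(T^{2D}_{h_1}, {}^{2}\Delta^{2D}_{Ax=b}\big), \tag{4.52}$$

one can conceive the morphism $g_3 \circ g_2 \circ g_1$ as an analogue of the morphism $f_3 \circ f_2 \circ f_1$ in (4.51) with respect to a two-dimensional space.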

Given a morphism $h_3 \circ h_2 \circ h_1$ as a decomposition of the morphism $h_4$ in (4.50b), where

$$h_3 \circ h_2 \circ h_1 \colon \mathrm{hom}_C(R_1, R_4), \tag{4.53}$$

one can conceive the morphism $h_3 \circ h_2 \circ h_1$, for instance, as a representation of an operation that encodes the change from a multivariate rational polynomial function of type $(m, 3)$ to a multivariate rational polynomial function of type $(m, 0)$ with leading coefficient one (cf. § 3.1.3).

In addition to the categories A, B, C, it is supposed that the functors $F_1 \colon A \to B$ and $F_2 \colon B \to C$, the functors $G_1 \colon A \to B$ and $G_2 \colon B \to C$, and the functors $H_1 \colon A \to B$ and $H_2 \colon B \to C$ exist.

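Hence, one can reformulate and extend the statements in (3.102) by means of the functor $F_1$ and the functor $F_2$ (recall Definition 4.2.2), and with $F_2 \circ F_1 \colon A \to C$, i.e.,

$$\begin{aligned}
&F_1\big(T^{3D}_{h_1}\big) := T^{2D}_{h_1} && F_1\big(T^{3D}_{h_2}\big) := T^{2D}_{h_2} && F_1(f_1) := g_1 && \text{(4.54a)}\\
&(F_2 \circ F_1)\big(T^{3D}_{h_1}\big) := R_1 && (F_2 \circ F_1)\big(T^{3D}_{h_2}\big) := R_2 && (F_2 \circ F_1)(f_1) := h_1 && \text{(4.54b)}\\
&F_1\big({}^{1}\Delta^{3D}_{Ax=b}\big) := {}^{1}\Delta^{2D}_{Ax=b} && F_1(f_2) := g_2 && && \text{(4.54c)}\\
&(F_2 \circ F_1)\big({}^{1}\Delta^{3D}_{Ax=b}\big) := R_3 && (F_2 \circ F_1)(f_2) := h_2 && && \text{(4.54d)}\\
&F_1\big({}^{2}\Delta^{3D}_{Ax=b}\big) := {}^{2}\Delta^{2D}_{Ax=b} && F_1(f_3) := g_3 && && \text{(4.54e)}\\
&(F_2 \circ F_1)\big({}^{2}\Delta^{3D}_{Ax=b}\big) := R_4 && (F_2 \circ F_1)(f_3) := h_3 && && \text{(4.54f)}\\
&F_1(f_4) := g_4 && F_1(f_5) := g_5 && F_1(f_6) := g_6 && \text{(4.54g)}\\
&(F_2 \circ F_1)(f_4) := h_4 && (F_2 \circ F_1)(f_5) := h_5 && (F_2 \circ F_1)(f_6) := h_6. && \text{(4.54h)}
\end{aligned}$$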

For the sake of conciseness, let us assume implicitly that, regarding F1, F2, and F2○F1,

all functor laws are satisfied.

If we suppose that, similarly to (4.9), the functor $F_1$ and the functor $F_2$ are forgetful functors, then one obtains a precise encoding of the intuitive idea of considering the problems associated with the low-fidelity models as forgetful interpretations of the problem associated with the high-fidelity model (cf. the commentary on (3.102)).

Moreover, recalling § 4.3.2, one can invoke the heuristics-driven notion of a problem-dependent degree of forgetfulness (DoFF) in order to quantify the modeling error. Bear in mind that my proposed notion of a DoFF is not cast in stone. The DoFF serves primarily as an auxiliary means in the attempt to grasp the fidelity notion attached to models more formally. I argue that the DoFF as an auxiliary means is especially useful when many different models are used.

If we fix the category C as the model with the lowest fidelity, then one can associate the category B with a DoFF of 1, i.e., using one forgetful functor $F_2$ such that $\mathrm{cod}(F_2) \equiv C$, and one can associate the category A with a DoFF of 2, i.e., using two forgetful functors $F_1$ and $F_2$ such that $\mathrm{cod}(F_2 \circ F_1) \equiv C$.

If we invoke the functor $G_1$, the functor $G_2$, and the functor $G_2 \circ G_1 \colon A \to C$, then one can encode an interpretation (see § 4.2.1) different from the one encoded by the functor $F_1$ and the functor $F_2$. For instance, the functor $G_1$ can encode a "partial shift" w.r.t. the functor $F_1$, and the functor $G_2$ can encode a similar behavior w.r.t. the functor $F_2$, in the sense that the assignments in (4.54) can be adapted as

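$$\begin{aligned}
&G_1\big(T^{3D}_{h_1}\big) := T^{2D}_{h_2} && G_1\big(T^{3D}_{h_2}\big) := {}^{1}\Delta^{2D}_{Ax=b} && G_1(f_1) := g_2 && \text{(4.55a)}\\
&(G_2 \circ G_1)\big(T^{3D}_{h_1}\big) := R_1 && (G_2 \circ G_1)\big(T^{3D}_{h_2}\big) := R_2 && (G_2 \circ G_1)(f_1) := h_1 && \text{(4.55b)}\\
&G_1\big({}^{1}\Delta^{3D}_{Ax=b}\big) := {}^{2}\Delta^{2D}_{Ax=b} && G_1(f_2) := g_3 && && \text{(4.55c)}\\
&(G_2 \circ G_1)\big({}^{1}\Delta^{3D}_{Ax=b}\big) := R_3 && (G_2 \circ G_1)(f_2) := h_2 && && \text{(4.55d)}\\
&G_1\big({}^{2}\Delta^{3D}_{Ax=b}\big) := {}^{2}\Delta^{2D}_{Ax=b} && G_1(f_3) := \mathrm{id}_{{}^{2}\Delta^{2D}_{Ax=b}} && && \text{(4.55e)}\\
&(G_2 \circ G_1)\big({}^{2}\Delta^{3D}_{Ax=b}\big) := R_4 && (G_2 \circ G_1)(f_3) := h_3 && && \text{(4.55f)}\\
&F_1(f_4) := g_4 && F_1(f_5) := g_5 && F_1(f_6) := g_6 && \text{(4.55g)}\\
&(F_2 \circ F_1)(f_4) := h_4 && (F_2 \circ F_1)(f_5) := h_5 && (F_2 \circ F_1)(f_6) := h_6. && \text{(4.55h)}
\end{aligned}$$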

If we replace the assignments in (4.55e) by the assignments

$$G_1\big({}^{2}\Delta^{3D}_{Ax=b}\big) := T^{2D}_{h_1} \qquad G_1(f_3) := g_1, \tag{4.56}$$

then one would violate the functor laws in the sense that

$$G_1(f_3 \circ f_2 \circ f_1) \equiv g_1 \circ g_3 \circ g_2, \tag{4.57}$$

where $g_1 \circ g_3 \circ g_2$ is not a legitimate composite morphism within B, since the codomain ${}^{2}\Delta^{2D}_{Ax=b}$ of $g_3$ differs from the domain $T^{2D}_{h_1}$ of $g_1$. From an application-driven viewpoint, one can conceive this violation, e.g., as a restriction w.r.t. the construction of an operation within another context that is grounded in, e.g., the representation of the morphism $f_3 \circ f_2 \circ f_1$ in (4.51).
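Because A and B each contain exactly one morphism from the i-th to the j-th object whenever i ≤ j (identities plus the six morphisms in (4.48) and (4.49)), both are chains, and an object assignment extends to a functor A → B exactly when it is monotone. A minimal sketch, with hypothetical ASCII stand-ins for the object symbols, confirms that the object assignments of $F_1$ in (4.54) and $G_1$ in (4.55) pass this test while the replacement (4.56) fails it.

```python
# Positions of the objects in the chains A and B (hypothetical ASCII stand-ins).
RANK_A = {"T3D_h1": 0, "T3D_h2": 1, "1Delta3D": 2, "2Delta3D": 3}
RANK_B = {"T2D_h1": 0, "T2D_h2": 1, "1Delta2D": 2, "2Delta2D": 3}


def is_functor(obj_map: dict) -> bool:
    """A and B have exactly one morphism a -> b whenever rank(a) <= rank(b), so an
    object map extends to a functor A -> B iff it is monotone."""
    return all(RANK_B[obj_map[a]] <= RANK_B[obj_map[b]]
               for a in RANK_A for b in RANK_A if RANK_A[a] <= RANK_A[b])


F1 = {"T3D_h1": "T2D_h1", "T3D_h2": "T2D_h2", "1Delta3D": "1Delta2D", "2Delta3D": "2Delta2D"}
G1 = {"T3D_h1": "T2D_h2", "T3D_h2": "1Delta2D", "1Delta3D": "2Delta2D", "2Delta3D": "2Delta2D"}
G1_56 = {**G1, "2Delta3D": "T2D_h1"}   # object assignment of (4.56)

print(is_functor(F1), is_functor(G1), is_functor(G1_56))  # True True False
```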

If we invoke the functor $H_1$, the functor $H_2$, and the functor $H_2 \circ H_1 \colon A \to C$, then one can encode a third interpretation. For instance, the functor $H_1$ can encode a behavior similar to that of the functor $F_1$, and the functor $H_2$ can encode a "partial shift" w.r.t. the functor $F_2$, in the sense that the assignments in (4.54) can be adapted as

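$$\begin{aligned}
&F_1\big(T^{3D}_{h_1}\big) := T^{2D}_{h_1} && F_1\big(T^{3D}_{h_2}\big) := T^{2D}_{h_2} && F_1(f_1) := g_1 && \text{(4.58a)}\\
&(F_2 \circ F_1)\big(T^{3D}_{h_1}\big) := R_2 && (F_2 \circ F_1)\big(T^{3D}_{h_2}\big) := R_3 && (F_2 \circ F_1)(f_1) := h_2 && \text{(4.58b)}\\
&F_1\big({}^{1}\Delta^{3D}_{Ax=b}\big) := {}^{1}\Delta^{2D}_{Ax=b} && F_1(f_2) := g_2 && && \text{(4.58c)}\\
&(F_2 \circ F_1)\big({}^{1}\Delta^{3D}_{Ax=b}\big) := R_4 && (F_2 \circ F_1)(f_2) := h_3 && && \text{(4.58d)}\\
&F_1\big({}^{2}\Delta^{3D}_{Ax=b}\big) := {}^{2}\Delta^{2D}_{Ax=b} && F_1(f_3) := g_3 && && \text{(4.58e)}\\
&(F_2 \circ F_1)\big({}^{2}\Delta^{3D}_{Ax=b}\big) := R_1 && (F_2 \circ F_1)(f_3) := h_2 && && \text{(4.58f)}\\
&F_1(f_4) := g_4 && F_1(f_5) := g_5 && F_1(f_6) := g_6 && \text{(4.58g)}\\
&(F_2 \circ F_1)(f_4) := h_4 && (F_2 \circ F_1)(f_5) := h_5 && (F_2 \circ F_1)(f_6) := h_6. && \text{(4.58h)}
\end{aligned}$$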

For the sake of brevity, let us omit the consideration of natural transformations

(recall Definition 4.2.3) regarding the functors F1,F2, the functors G1,G2, and the

functors H1,H2, that is, let us omit the comparison of the corresponding constructed

paths (recall Remark 4.2.7).

However, mind that, if we suppose that the functors in (4.54), (4.55), and (4.58) are forgetful functors, then one can at least exclude that the categories A, B, and C pass a test for equivalence (recall Definition 4.2.4).

4.4.2 Use case #2: Coordinate transformations⁷

The second use case of the CT approach within the electromagnetics context is related to coordinate transformations. I construe the term "coordinate transformations" as an umbrella term for different applications within the electromagnetics context that are associated with coordinates in a narrower sense (e.g., w.r.t. a physical space) or with coordinates in a broader sense (e.g., w.r.t. an abstract vector space).

An application that is associated with coordinates in a narrower sense is depicted in Figure 4.1, which is inspired by [138]. In Figure 4.1, a three-dimensional helical coil of 5 turns (cf. (i) in Figure 1.3) is represented in two different coordinate systems: one is constituted by xyz-coordinates and the other by uvw-coordinates.

In Figure 4.1, it is supposed that an initial coordinate system is provided by a user. Furthermore, I claim that the helical coil in xyz-coordinates has the kind of shape that is intuitively expected from common experience in physics.

However, the helical coil in uvw-coordinates, i.e., a three-dimensional solid identified as a cylinder, has a curvilinear geometric shape that is rather counterintuitive compared to common experience in physics.

I do not dwell on the subtleties involved in such coordinate transformations, e.g., the proper transformation of the corresponding field quantities w.r.t. the well-posed boundary value problems (recall (2.16)). For more details concerning these subtleties, I refer to, e.g., [174] or [138].

⁷ The elaborations are part of a joint publication in preparation for submission called "Formalization Issues of Surrogate Modeling in Electromagnetic Compatibility" (M. Hadžiefendić, R. S. Rezende, R. Schuhmann).


[Figure 4.1: two panels, "xyz-coordinates" (initial xyz-coordinates) and "uvw-coordinates" (uvw-coordinates from initial xyz-coordinates), connected by a "change of coordinates".]

FIGURE 4.1: A schematic depiction of a three-dimensional helical coil of 5 turns in xyz-coordinates (cf. (i) in Figure 1.3) and in uvw-coordinates (inspired by [138]). For more elaborated considerations of such coordinate transformations within the electromagnetics context, I refer to, e.g., [174] or [138].

For the considerations in the present work, though, the significant aspect is that the evaluated quantities of interest (recall § 2.2.1) in both coordinate systems in Figure 4.1 should be identical. This aspect reflects the premise that (derived) measurable physical entities, e.g., the magnetic energy or the power loss, should be coordinate-invariant. More precisely, the (derived) measurable physical entities, and therefore the evaluated quantities of interest, should be independent of the choice of coordinates.
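As a minimal illustration of the coordinate-invariance premise (a sketch, not part of the tool chain used in this work), the following Python/NumPy snippet evaluates one hypothetical scalar quantity, the integral of w = r² over the unit disk, once on a Cartesian grid and once in polar coordinates, where the Jacobian r enters the area element. The integrand and the grid sizes are illustrative assumptions; both results approximate π/2 up to discretization error.

```python
import numpy as np

# Hypothetical scalar density w = x^2 + y^2 = r^2 on the unit disk,
# standing in for a field-derived density such as a magnetic energy density

# Cartesian coordinates: midpoint rule on a uniform grid, masked to the disk
n = 1000
h = 2.0 / n
c = -1.0 + h * (np.arange(n) + 0.5)            # cell midpoints in [-1, 1]
X, Y = np.meshgrid(c, c, indexing="ij")
inside = X**2 + Y**2 <= 1.0
W_cart = np.sum((X**2 + Y**2)[inside]) * h * h

# Polar coordinates: the area element is dA = r dr dtheta (Jacobian r)
m = 1000
dr = 1.0 / m
r = dr * (np.arange(m) + 0.5)                  # radial midpoints in [0, 1]
W_polar = 2.0 * np.pi * np.sum(r**2 * r) * dr  # integrand independent of theta

print(W_cart, W_polar, np.pi / 2)  # both values are close to pi/2
```

The Cartesian value carries a small error from the staircase approximation of the circular boundary, whereas the polar value is accurate to roughly 1e-6; the quantity itself does not depend on the coordinates.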

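An application that is associated with coordinates in a broader sense is encoded by the following matrix equation: given a matrix $S \in \mathbb{R}^{4\times 4}$, then

$$\forall S_{\mathrm{MM}} \in \mathbb{R}^{4\times 4}.\ \exists M \in \mathbb{R}^{4\times 4}.\ S_{\mathrm{MM}} = M S M^{-1}. \tag{4.59}$$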

If we assume an endomorphism $s : V \to V$ within the category $\mathbf{Vect}_{\mathrm{basis}}$ (recall § 4.2.2) w.r.t. an ordered basis $V_{b_1}$, then one can associate the endomorphism $s$ w.r.t. the ordered basis $V_{b_1}$ with the matrix $S$ in (4.59) (cf. (4.11)).

Furthermore, one can associate the endomorphism $s$ w.r.t. another ordered basis $V_{b_2}$ with the matrix $S_{\mathrm{MM}}$ such that one can conceive the matrix $M$ as the change-of-basis matrix from the ordered basis $V_{b_1}$ to the ordered basis $V_{b_2}$.

Hence, given $s$ w.r.t. $V_{b_2}$ and given an input element $a : V$ w.r.t. $V_{b_2}$, we receive an output element $b : V$ w.r.t. $V_{b_2}$ by setting

$$b := s \circ a. \tag{4.60}$$

If we instantiate the assignment in (4.60) by the matrix $S_{\mathrm{MM}}$, the column vector $a \in \mathbb{R}^{4\times 1}$, and the column vector $b \in \mathbb{R}^{4\times 1}$, then the statement in (4.60) can be written as

$$b := S_{\mathrm{MM}}\, a \tag{4.61a}$$

$$b := M S M^{-1} a, \tag{4.61b}$$

where the term $M^{-1}a$ refers to the representation of $a : V$ w.r.t. $V_{b_1}$ and the term $S M^{-1} a$ refers to the representation of $b : V$ w.r.t. $V_{b_1}$.
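The bookkeeping behind (4.59) to (4.61) can be checked numerically. The sketch below assumes one particular port pairing for the mixed-mode conversion (single-ended ports 1 and 2 form differential pair 1, ports 3 and 4 form pair 2, differential modes ordered first); other orderings of the entries of $S_{\mathrm{MM}}$ lead to other matrices $M$. The matrix $S$ and the vector $a$ are random placeholders rather than measured S-parameters, and, unlike (4.59), complex entries are used because S-parameters are complex-valued in general.

```python
import numpy as np

# Assumed port pairing: single-ended ports 1 & 2 form differential pair 1,
# ports 3 & 4 form differential pair 2; rows ordered as (d1, d2, c1, c2)
M = np.array([
    [1, -1, 0,  0],
    [0,  0, 1, -1],
    [1,  1, 0,  0],
    [0,  0, 1,  1],
]) / np.sqrt(2)

rng = np.random.default_rng(0)
# Placeholder single-ended 4-port S-matrix w.r.t. V_b1 (not measured data)
S = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))

M_inv = np.linalg.inv(M)
S_MM = M @ S @ M_inv                 # (4.59): S_MM = M S M^{-1}

# Input element a, represented w.r.t. the mixed-mode basis V_b2
a = rng.normal(size=4) + 1j * rng.normal(size=4)

b_direct = S_MM @ a                  # (4.61a)
a_b1 = M_inv @ a                     # representation of a w.r.t. V_b1
b_b1 = S @ a_b1                      # representation of b w.r.t. V_b1
b_via_b1 = M @ b_b1                  # (4.61b): mapped back to V_b2

print(np.allclose(b_direct, b_via_b1))   # True
print(np.allclose(M_inv, M.T))           # True: this M is orthogonal
```

Because this particular $M$ is orthogonal, $M^{-1} = M^{\mathsf T}$; `np.linalg.inv` is used only to mirror (4.59) literally.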

A semantics for the statements in (4.61) within the electromagnetics context is provided by, for instance, the notion of 4-port S-parameters and the notion of 4-port mixed-mode S-parameters (see, e.g., [29]) within the context of electromagnetic compatibility (recall, e.g., the EMC filter in Figure 1.1).

Thus, in (4.61), the matrix $S$ corresponds to the 4-port S-parameters and the matrix $S_{\mathrm{MM}}$ corresponds to the 4-port mixed-mode S-parameters, which incorporate the idea of differential-mode and common-mode scattering parameters. The entries of the change-of-basis matrix $M$ depend on the ordering of the entries of the matrix $S_{\mathrm{MM}}$. Let us omit an in-depth elaboration on these two kinds of S-parameters. For more details, I refer to, e.g., [29] and references therein.

Given the S-parameter semantics, the key insight from (4.61) is that, by providing the entries of the matrix $S$, which are (derived) measurable physical entities, one can determine the matrix $S_{\mathrm{MM}}$.

In the considerations of the present work, let us focus on how to embed applications such as the application in Figure 4.1 or the application in (4.59) into the body of ideas of surrogate optimization.

Recalling § 3.1, one can intuitively state the following regarding Figure 4.1: if we associate a kriging low-fidelity model with a high-fidelity model that is linked to the problem corresponding to the helical coil in xyz-coordinates, and if we associate a kriging low-fidelity model with a high-fidelity model that is linked to the helical coil in uvw-coordinates, then these two kriging low-fidelity models should behave equally. Let us refer to this situation as formalization issue A (FI-A).
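FI-A can be made tangible with a toy experiment. The following sketch (Python/NumPy; a hypothetical illustration, not the setup of this work) builds a simple kriging predictor with an isotropic squared-exponential kernel and fixed hyperparameters, and trains it on the same quantity-of-interest values expressed in three coordinate systems. Only linear coordinate maps are used here, whereas the xyz-to-uvw map in Figure 4.1 is nonlinear. Under a rigid rotation the two kriging models coincide because pairwise distances are preserved; under an anisotropic stretch they generally do not, so "behaving equally" is not automatic.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=0.3):
    # Isotropic squared-exponential kernel: depends only on Euclidean distances
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def kriging_mean(X_train, y_train, X_test, nugget=1e-8):
    # Simple-kriging predictor (zero prior mean), i.e., the GP posterior mean
    K = rbf_kernel(X_train, X_train) + nugget * np.eye(len(X_train))
    return rbf_kernel(X_test, X_train) @ np.linalg.solve(K, y_train)

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(12, 2))            # samples, first coordinate system
y = np.sin(3.0 * X[:, 0]) * np.cos(2.0 * X[:, 1])    # hypothetical high-fidelity QoI
X_new = rng.uniform(-1.0, 1.0, size=(5, 2))

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])     # rigid rotation
A = np.diag([3.0, 0.5])                             # anisotropic stretch

# The QoI values y are coordinate-invariant; only the sample coordinates change
pred_ref = kriging_mean(X, y, X_new)
pred_rot = kriging_mean(X @ R.T, y, X_new @ R.T)
pred_str = kriging_mean(X @ A.T, y, X_new @ A.T)

print(np.allclose(pred_ref, pred_rot))        # True: distances are preserved
print(np.max(np.abs(pred_ref - pred_str)))    # generally nonzero
```

The design choice that makes the rotated case work is the kernel's invariance under the coordinate map; a coordinate-invariant surrogate would, roughly speaking, require such an invariance for the transformation at hand.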

Regarding (4.59), given the $S_{ij}$-parameter w.r.t. the matrix $S := [s_{i,j}] \in \mathbb{R}^{4\times 4}$, which one can define as

$$S_{ij} := [s_{i,j}], \tag{4.62}$$

one can, for instance, associate with every $S_{ij}$-parameter a high-fidelity model and a low-fidelity model similarly to the multivariate vector-valued use case in (3.131).⁸

Therefore, we receive a matrix $S_K \in (Y_0^{X_0})^{4\times 4}$ that corresponds to the high-fidelity models and a matrix $S_{\tilde K} \in (Y_1^{X_1})^{4\times 4}$ that corresponds to the low-fidelity models.⁹

If we employ the matrix $S_K$ and the matrix $S_{\tilde K}$ in (4.59), we obtain the matrix $S_{\mathrm{MM}K}$ and the matrix $S_{\mathrm{MM}\tilde K}$, respectively. Note that the change-of-basis matrix $M$ operates componentwise w.r.t. $Y_0 \subseteq \mathbb{R}^{m_w}$ on the evaluated matrix $S_K(x) : (Y_0)^{4\times 4}$ and componentwise w.r.t. $Y_1 \subseteq \mathbb{R}^{m_w}$ on the evaluated matrix $S_{\tilde K}(\tilde x) : (Y_1)^{4\times 4}$, respectively.

One can also construct a matrix $S_{\mathrm{MM}K,\tilde K}$ whose entries are constituted by low-fidelity models of the entries of $S_{\mathrm{MM}K}$.

Observe, however, that one cannot necessarily expect the matrix $S_{\mathrm{MM}K,\tilde K}$ and the matrix $S_{\mathrm{MM}\tilde K}$ to behave equally. Let us refer to this situation as formalization issue B (FI-B).
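One mechanism behind FI-B can be seen in a small experiment. If every entrywise low-fidelity model is linear in its training data (e.g., kriging with fixed hyperparameters), then applying $M$ entrywise commutes with building the surrogates, so the analogues of $S_{\mathrm{MM}K,\tilde K}$ and $S_{\mathrm{MM}\tilde K}$ coincide. A data-dependent, hence nonlinear, construction such as a per-entry maximum-likelihood choice of the length scale breaks this commutation; taking magnitudes in dB is another nonlinear step. The sketch below is a hypothetical illustration with synthetic real-valued data, an assumed mixed-mode matrix $M$, and a one-dimensional design parameter $x$.

```python
import numpy as np

def kernel(a, b, ls):
    # 1-D squared-exponential kernel
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

def fit_predict(x, y, x_new, ls, nugget=1e-6):
    # Kriging (GP posterior) mean; for a fixed ls it is linear in the data y
    K = kernel(x, x, ls) + nugget * np.eye(len(x))
    return kernel(x_new, x, ls) @ np.linalg.solve(K, y)

def ml_length_scale(x, y, grid=(0.05, 0.1, 0.2, 0.4), nugget=1e-6):
    # Data-dependent choice: maximize the GP log marginal likelihood on a grid
    def loglik(ls):
        K = kernel(x, x, ls) + nugget * np.eye(len(x))
        _, logdet = np.linalg.slogdet(K)
        return -0.5 * y @ np.linalg.solve(K, y) - 0.5 * logdet
    return max(grid, key=loglik)

def entrywise_surrogate(x, S_samples, x_new, tuned):
    # One low-fidelity model per entry of a stack of 4x4 matrices
    out = np.empty((len(x_new), 4, 4))
    for i in range(4):
        for j in range(4):
            y = S_samples[:, i, j]
            ls = ml_length_scale(x, y) if tuned else 0.1
            out[:, i, j] = fit_predict(x, y, x_new, ls)
    return out

# Hypothetical high-fidelity samples over a scalar design parameter x;
# the 16 entries vary at different rates (real-valued for simplicity)
x = np.linspace(0.0, 1.0, 12)
x_new = np.linspace(0.05, 0.95, 7)
freqs = np.linspace(0.25, 2.0, 16).reshape(4, 4)
S_hf = np.sin(2.0 * np.pi * freqs[None, :, :] * x[:, None, None])

# Assumed mixed-mode change-of-basis matrix (orthogonal, so M^{-1} = M.T)
M = np.array([[1, -1, 0, 0], [0, 0, 1, -1],
              [1, 1, 0, 0], [0, 0, 1, 1]]) / np.sqrt(2)

gap = {}
for tuned in (False, True):
    # Analogue of S_MM~K: transform the entrywise low-fidelity models
    S_MM_lf = M @ entrywise_surrogate(x, S_hf, x_new, tuned) @ M.T
    # Analogue of S_MMK,~K: low-fidelity models of the transformed entries
    S_MM_hf_lf = entrywise_surrogate(x, M @ S_hf @ M.T, x_new, tuned)
    gap[tuned] = np.max(np.abs(S_MM_lf - S_MM_hf_lf))

print(gap)  # gap[False] is at round-off level; gap[True] typically is not
```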

Even if we limit our considerations to the surrogate modeling & simulation sub-part of surrogate optimization, one can already recognize that a substantial amount of bookkeeping is needed to examine situations involving coordinate transformations.

⁸ Technically, an $S_{ij}$-parameter is a complex number, i.e., $S_{ij} \in \mathbb{C}$. However, I use the common abbreviated interpretation in which the term $S_{ij}$ refers to the magnitude of the corresponding $S_{ij}$-parameter. Since, roughly speaking, an $S_{ij}$-parameter encodes a ratio of waves, it is usual to express the term $S_{ij}$ in the relative unit of measurement decibel (dB). Therefore, I interpret the term $S_{ij}$ as a real number, i.e., $S_{ij} \in \mathbb{R}$.

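⁹ Respecting the notation in (3.131), the entry $S_{ijK}$ has the signature $Y_0^{X_0}$, i.e., $S_{ijK} : Y_0^{X_0}$, where the matrix $S_K := [s_{i,j}]_K \in (Y_0^{X_0})^{4\times 4}$ and $S_{ijK} := [s_{i,j}]_K$. Analogously, the entry $S_{ij\tilde K}$ has the signature $Y_1^{X_1}$, i.e., $S_{ij\tilde K} : Y_1^{X_1}$, where the matrix $S_{\tilde K} := [s_{i,j}]_{\tilde K} \in (Y_1^{X_1})^{4\times 4}$ and $S_{ij\tilde K} := [s_{i,j}]_{\tilde K}$.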


Adequately formalizing situations involving coordinate transformations in surrogate optimization can be beneficial from a theory-driven viewpoint as well as from an application-driven viewpoint. From a theory-driven viewpoint, such an adequate formalization hints at the possibility of, for lack of a better word, a coordinate-invariant surrogate modeling & simulation. From an application-driven viewpoint, such an adequate formalization sheds light on questions such as to what extent post-processing entities of a high-fidelity model based on a numerical simulation (cf. § 2.2) can be captured by post-processing entities of a corresponding low-fidelity model.

I argue that the CT approach can at least assist in formalizing such intricate situations involving coordinate transformations in surrogate optimization, which, to the best of my knowledge, have not been studied exhaustively.

By recalling the diagrams from (4.27) to (4.31), one can abstract a common pattern from the above-mentioned situations involving coordinate transformations in the surrogate modeling & simulation sub-notion of surrogate optimization. Hence, I propose the following diagram as a means of formalizing this common pattern.

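$$
\begin{array}{ll}
\mathcal{M}_1: & X \xrightarrow{\ f\ } Y \xrightarrow{\ J\ } Z \\
\mathcal{M}_3: & T(X) \xrightarrow{T(f)} T(Y) \xrightarrow{T(J)} T(Z) \\
\mathcal{M}_4: & G(T(X)) \xrightarrow{G(T(f))} G(T(Y)) \xrightarrow{G(T(J))} G(T(Z)) \\
\mathcal{M}_2: & F(X) \xrightarrow{F(f)} F(Y) \xrightarrow{F(J)} F(Z) \\
\mathcal{M}_5: & S(F(X)) \xrightarrow{S(F(f))} S(F(Y)) \xrightarrow{S(F(J))} S(F(Z))
\end{array}
\qquad
\begin{array}{l}
F : \mathcal{M}_1 \to \mathcal{M}_2, \quad T : \mathcal{M}_1 \to \mathcal{M}_3, \quad T^{-1} : \mathcal{M}_3 \to \mathcal{M}_1, \\
G : \mathcal{M}_3 \to \mathcal{M}_4, \quad S : \mathcal{M}_2 \to \mathcal{M}_5, \quad S^{-1} : \mathcal{M}_5 \to \mathcal{M}_2, \\
U : \mathcal{M}_4 \to \mathcal{M}_5, \quad V : \mathcal{M}_5 \to \mathcal{M}_4
\end{array}
\tag{4.63}
$$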

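Observe that, in (4.63), there are five categories (recall Definition 4.2.1) $\mathcal{M}_1, \mathcal{M}_2, \mathcal{M}_3, \mathcal{M}_4, \mathcal{M}_5$ accompanied by the six functors (recall Definition 4.2.2) $F : \mathcal{M}_1 \to \mathcal{M}_2$, $T : \mathcal{M}_1 \to \mathcal{M}_3$, $G : \mathcal{M}_3 \to \mathcal{M}_4$, $S : \mathcal{M}_2 \to \mathcal{M}_5$, $U : \mathcal{M}_4 \to \mathcal{M}_5$, and $V : \mathcal{M}_5 \to \mathcal{M}_4$. Given the functors $F, T, G, S, U, V$, the identification of the categories $\mathcal{M}_1, \mathcal{M}_2, \mathcal{M}_3, \mathcal{M}_4, \mathcal{M}_5$ with their diagrammatic representation in (4.63) should be clear.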

Let us associate $\mathcal{M}_1$ with a high-fidelity model, link $\mathcal{M}_3$ with the corresponding coordinate-transformed high-fidelity model, and relate $\mathcal{M}_4$ to the low-fidelity model of the coordinate-transformed high-fidelity model. Furthermore, let us associate $\mathcal{M}_2$ with the corresponding low-fidelity model, and relate $\mathcal{M}_5$ to the corresponding coordinate-transformed low-fidelity model.

The inverse functor (recall Remark 4.2.10) $T^{-1} : \mathcal{M}_3 \to \mathcal{M}_1$ and the inverse functor $S^{-1} : \mathcal{M}_5 \to \mathcal{M}_2$ indicate a coordinate transformation at the level of the high-fidelity model and a coordinate transformation at the level of the low-fidelity model, respectively. Notice that the existence of the inverse functors implies the equivalences of categories (recall Definition 4.2.4) $\mathcal{M}_1 \simeq \mathcal{M}_3$ and $\mathcal{M}_2 \simeq \mathcal{M}_5$, respectively. From an application-driven viewpoint, a potential reading of these equivalences is that, roughly speaking, the high-fidelity model $\mathcal{M}_1$ and the corresponding coordinate-transformed high-fidelity model $\mathcal{M}_3$ are essentially the same, and that the low-fidelity model $\mathcal{M}_2$ and the corresponding coordinate-transformed low-fidelity model $\mathcal{M}_5$ are essentially the same.

The formalization issue FI-A can be expressed within the diagram in (4.63) by setting $\mathcal{M}_2 \equiv \mathcal{M}_5$ and $S := \mathrm{id}_{\mathcal{M}_2}$, and by demanding that the functor $V$ is an inverse functor to the functor $U$, i.e., $V \equiv U^{-1}$, such that there is an equivalence of categories $\mathcal{M}_4 \simeq \mathcal{M}_2$. Moreover, by assuming $\mathcal{M}_1 \simeq \mathcal{M}_3$, one can ultimately regard $F \equiv G$. In other words: the low-fidelity model $\mathcal{M}_2$ and the low-fidelity model of the coordinate-transformed high-fidelity model $\mathcal{M}_4$ are essentially the same.

Using the diagram in (4.63), the essence of the formalization issue FI-B is that there is not necessarily a functor $V$ such that $V \equiv U^{-1}$, so that, in general, one may have $\mathcal{M}_4 \not\simeq \mathcal{M}_5$. One can roughly construe this non-equivalence as follows: the low-fidelity model of the coordinate-transformed high-fidelity model $\mathcal{M}_4$ and the coordinate-transformed low-fidelity model $\mathcal{M}_5$ are not essentially the same.

4.5 Future use cases for the CT toolset

I point out that, compared to, e.g., the usage of the language of functional analysis

and the language of differential geometry (recall ch. 2), the usage of the language of

category theory is still in an early stage regarding surrogate optimization within the

electromagnetics context.

However, by emphasizing the strength of the category theoretical language as a notational scaffolding based on diagrams of arrows, we have discussed in particular its potential usefulness concerning some formal aspects of surrogate optimization within the electromagnetics context.

Although we have harnessed much of the CT toolset's rigor, more research is, admittedly, needed to put the applications of the CT toolset on an even more rigorous footing. I remark that the language of category theory offers many more tools to be explored.

Besides the need for an even more rigorous footing of the applications of the CT

toolset, the findings of the present chapter can serve as a good starting point for

three interesting paths to pursue regarding future use cases for the CT toolset.

Concerning implementations of the CT toolset for surrogate optimization within the electromagnetics context, the first path points at theory-driven implementations in programming languages that follow different design principles: dynamically checked, statically checked, imperative, functional, and similar. The expressiveness of the chosen programming language will determine how many work-around artifacts are needed in the actual programs and how the incorporation of, e.g., different numerical solvers is concretized.
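To hint at what a theory-driven implementation in a dynamically checked language might look like, the following Python sketch encodes morphisms with named domain and codomain objects and checks the composability condition only at runtime. The class `Morphism`, the function `identity`, and the toy maps `f` and `J` are hypothetical names introduced for illustration; a statically checked language could instead reject ill-typed compositions before the program runs.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Morphism:
    dom: str                  # name of the domain object
    cod: str                  # name of the codomain object
    fn: Callable[[Any], Any]  # underlying function

    def __matmul__(self, other: "Morphism") -> "Morphism":
        # `g @ f` stands for the composite g ∘ f; the typing condition
        # cod(f) == dom(g) is checked only at runtime (dynamic checking)
        if other.cod != self.dom:
            raise TypeError(f"cannot compose: codomain {other.cod!r} "
                            f"does not match domain {self.dom!r}")
        return Morphism(other.dom, self.cod, lambda v: self.fn(other.fn(v)))

def identity(obj: str) -> Morphism:
    return Morphism(obj, obj, lambda v: v)

# Hypothetical toy pipeline X --f--> Y --J--> Z (model, then objective)
f = Morphism("X", "Y", lambda x: [x[0] ** 2, x[1]])
J = Morphism("Y", "Z", lambda y: sum(y))

Jf = J @ f
print(Jf.dom, Jf.cod, Jf.fn([2.0, 1.0]))    # X Z 5.0
print((J @ identity("Y")).fn([4.0, 1.0]))   # 5.0 (identity law on a sample)
```

Writing `f @ J` raises a `TypeError`, since the codomain `"Z"` of `J` does not match the domain `"X"` of `f`; this is exactly the kind of error a statically checked implementation would move to compile time.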

The second path is inspired by [114], where the authors promote category theory as a multidisciplinary language in order to discuss a magneto-elasticity problem. Thus, the CT toolset could act as a guiding principle in designing surrogate-guided optimization methods for multidisciplinary design optimization (recall § 1.1). In constructing the corresponding numerical algorithms, one strength of the CT toolset could be particularly useful: the coherent change of perspective from an algebraic to a geometric presentation.

The third path is concerned with a thorough development of a modeling and reasoning environment for parallel SGO methods. Recall that the definition of a category (see Definition 4.2.1) deals solely with sequential composition. However, monoidal categories (see, e.g., [46, p. 72f]) seem promising for this path since they incorporate parallel composition in addition to sequential composition. Recalling the diagrams in (4.32), monoidal categories can assist in meaningfully encoding expressions such as¹⁰

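$$\forall J_0, J_1, J_2.\ \forall f_0, f_1, f_2.\ (J_0 \circ f_0) \otimes (J_1 \circ f_1) \otimes (J_2 \circ f_2) = \big((J_0 \otimes J_1) \circ (f_0 \otimes f_1)\big) \otimes (J_2 \circ f_2) \tag{4.64a}$$

$$\forall J_0, J_1, J_2.\ \forall f_0, f_1, f_2.\ (J_0 \circ f_0) \otimes (J_1 \circ f_1) \otimes (J_2 \circ f_2) = (J_0 \otimes J_1 \otimes J_2) \circ (f_0 \otimes f_1 \otimes f_2). \tag{4.64b}$$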
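As a concrete instance of (4.64), consider the monoidal category of sets and functions with the cartesian product as $\otimes$ (an assumption chosen for illustration; other monoidal structures are possible). In this setting, $\otimes$ runs morphisms side by side on the components of a tuple, and both equalities in (4.64) reduce to the interchange law. The Python sketch below checks them on toy morphisms; the helper names `compose`, `tensor`, and `flatten` are hypothetical, and `flatten` makes explicit the associativity bookkeeping that (4.64a) leaves implicit.

```python
def compose(g, f):
    # sequential composition g ∘ f
    return lambda x: g(f(x))

def tensor(*fs):
    # parallel composition f0 ⊗ f1 ⊗ ...: acts componentwise on a tuple
    def run(xs):
        if len(xs) != len(fs):
            raise ValueError("arity mismatch")
        return tuple(fi(xi) for fi, xi in zip(fs, xs))
    return run

def flatten(t):
    # ((a, b), c) -> (a, b, c): explicit associativity bookkeeping
    out = []
    for item in t:
        out.extend(flatten(item) if isinstance(item, tuple) else [item])
    return tuple(out)

# Hypothetical toy morphisms f_k: X_k -> Y_k and J_k: Y_k -> Z_k
f0, f1, f2 = (lambda x: x + 1), (lambda x: 2 * x), (lambda x: x ** 2)
J0, J1, J2 = (lambda y: 10 * y), (lambda y: y - 1), (lambda y: -y)

xs = (1.0, 2.0, 3.0)
lhs = tensor(compose(J0, f0), compose(J1, f1), compose(J2, f2))(xs)
rhs_a = tensor(compose(tensor(J0, J1), tensor(f0, f1)),
               compose(J2, f2))(((1.0, 2.0), 3.0))     # right side of (4.64a)
rhs_b = compose(tensor(J0, J1, J2), tensor(f0, f1, f2))(xs)  # right side of (4.64b)

print(lhs)                                   # (20.0, 3.0, -9.0)
print(flatten(rhs_a) == lhs, rhs_b == lhs)   # True True
```

In a parallel SGO setting, the right side of (4.64b) corresponds to evaluating all models in one parallel stage and all objectives in a second one, while the left side evaluates three independent sequential pipelines side by side.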

Moreover, monoidal categories help to formalize, e.g., the subsequent diagrams.

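$$
\begin{array}{c}
X_0 \xrightarrow{f_0} Y_0 \xrightarrow{J_0} Z_0 \quad \otimes \quad X_1 \xrightarrow{f_1} Y_1 \xrightarrow{J_1} Z_1 \quad \otimes \quad X_2 \xrightarrow{f_2} Y_2 \xrightarrow{J_2} Z_2 \\[6pt]
X_0 \xrightarrow{J_{f_0}} Z_0 \quad \otimes \quad X_1 \xrightarrow{J_{f_1}} Z_1 \quad \otimes \quad X_2 \xrightarrow{J_{f_2}} Z_2
\end{array}
\tag{4.65}
$$

$$X_0 \otimes X_1 \otimes X_2 \xrightarrow{f_0 \otimes f_1 \otimes f_2} Y_0 \otimes Y_1 \otimes Y_2 \xrightarrow{J_0 \otimes J_1 \otimes J_2} Z_0 \otimes Z_1 \otimes Z_2 \tag{4.66}$$

$$X_0 \otimes X_1 \otimes X_2 \xrightarrow{J_{f_0} \otimes J_{f_1} \otimes J_{f_2}} Z_0 \otimes Z_1 \otimes Z_2 \tag{4.67}$$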

¹⁰ For more details regarding expressions such as in (4.64), I refer to, e.g., [46, p. 173–177] and references therein.


4.6 In closing

The major aims of the chapter have been (1) to furnish us with a critical examination of the algebraic tools from the language of category theory (CT) that have been anticipated in the preceding chapters; and (2) to investigate to what extent the category theoretical language is beneficial in the development of an algebraic modeling framework for applications in surrogate optimization.

In order to illustrate the principal application-oriented advantages of an algebraic modeling framework, we have recapitulated the contextual landscape of surrogate optimization in the previous chapters and have partly extended this landscape by the emerging context of full automation of surrogate-guided optimization (SGO).

I have argued that to sufficiently comply with this emerging context, there is a

need for dealing with issues regarding software systems, too. Currently, however,

there is a gap between issues regarding software systems and the emerging context

of full automation of SGO.

Observing this gap, I have proposed the use of the category theoretical language

which can serve as a mediating instance between certain language-focused aspects

of software systems and the mathematical modeling involved in SGO methods.

Furthermore, I have mentioned some relevant related work regarding the devel-

opment of an algebraic modeling framework using the category theoretical language

for applications in surrogate optimization.

Subsequently, we have elaborated on the category theory toolset used in the present work. This CT toolset, though, represents only a subset of the large number of tools available within category theory. By illuminating the strengths of the CT toolset as a notational scaffolding based on diagrams of arrows, the power of the coherent change of perspective from an algebraic presentation to a geometric presentation, and vice versa, has been shown.

There has been an attempt to balance the need for rigor and the need for applicability as well as possible. Admittedly, more research is needed to put the applications of the CT toolset on an even more rigorous footing. However, an important concern has been to foster some intuition regarding the CT toolset. From an application-oriented viewpoint, this intuition is particularly important to promote the CT toolset's wider acceptance.

Besides intuition, computational facets are another important point for the CT toolset's wider acceptance from an application-oriented viewpoint. Although this is not the primary ambition of the present work, computational facets of the CT toolset have been signposted and the relationship to more established toolsets (recall ch. 2) has been illuminated. Finally, I have suggested a heuristics-driven notion of a problem-dependent degree of forgetfulness (DoFF) in order to quantify the so-called modeling error.

Afterwards, one way of specifying a general optimization problem by using the

CT toolset has been shown. For instance, we have carved out a diagram of arrows

that enables a discussion of performance-related issues regarding a general opti-

mization problem.

In specifying SGO methods by using the CT toolset, we have examined the classical set-oriented modeling paradigm "(set) functions as models" and the category-oriented modeling paradigm "categories as models" and "functors as model transformations".

By assuming an order-oriented fidelity notion regarding multi-fidelity methods, we have in particular investigated the comparability of the concept of multifidelity model management and the space mapping notion. Thus, I have propounded some classification tools at the level of generalized functions (recall Figure 1.4) for the concept of multifidelity model management and the space mapping notion.

By exploring the direction of the arrows in the category-oriented modeling para-

digm, some potential additional facets regarding the fidelity notion have been indi-

cated.

Furthermore, we have discussed the scalability of diagrams of arrows – while

maintaining the interpretability – concerning the handling of a high-fidelity opti-

mization problem and multiple low-fidelity optimization problems.

Ultimately, we have examined two use cases of the CT toolset that are rele-

vant to surrogate optimization for applications within the electromagnetics context:

(1) simplified-physics low-fidelity models, and (2) coordinate transformations.

Regarding the first use case, it has been exemplified how the machinery of cate-

gory theory can be applied to the construction process and the interaction of simpli-

fied-physics low-fidelity models.

It has been shown, for instance, how certain modeling decisions can lead to re-

strictions w.r.t. certain constructions. Furthermore, we have invoked the heuristics-

driven notion of a problem-dependent DoFF as a useful auxiliary means to quan-

tify simplified-physics low-fidelity models as forgetful interpretations of the high-

fidelity model.

Regarding the second use case, it has been exemplified how the machinery of

category theory can be applied to handle certain formalization issues concerning the

surrogate modeling & simulation sub-notion of surrogate optimization.

One formalization issue is related to coordinates w.r.t. a physical space; another

formalization issue is related to coordinates w.r.t. an abstract vector space. A context

for the first formalization issue is provided by the coordinate transformation of a

helical coil w.r.t. a well-posed boundary value problem. A context for the second

formalization issue is provided by the transformation of 4-port S-parameters.

By using the CT toolset, I have proposed a diagram of arrows as a common generic interface for both formalization issues. It has been demonstrated that, by adequately setting the entities within the diagram, the essential statement of the respective formalization issue can be instantiated.

We have ended the investigation by exhibiting three potential future use cases

for the CT toolset based on the findings of the present chapter: (1) implementations

in programming languages, (2) SGO methods for multidisciplinary design optimiza-

tion, and (3) a modeling and reasoning environment for parallel SGO methods.


Chapter 5

Surrogate optimization with the

magnetoquasistatic model

Invoking the insights from the previous chapters, let us discuss how to apply a subset of these insights to four optimization problems relevant within an electrical engineering design context. To this end, I deem it beneficial to use the image of the rapid prototyping part of a product development cycle as a means to contextualize the subsequent sections. Hence, I present a strategy of using surrogate optimization in order to support preexisting electrical engineering design workflows.

In the first section, I elaborate on two optimization problems with regard to a

single inductive component, more precisely, to a solenoid with a core within the set-

ting of a 2D-LBVP. In the preliminary considerations, I present all the basic building

blocks to formulate the two optimization problems. Given an operating frequency,

the first optimization problem uses the semantics of the time-averaged ohmic loss

to encode the objective functional and it uses the semantics of the inductance to

encode the constraints besides some box constraints. The second optimization prob-

lem extends the discussion of the first optimization problem to the case of multiple

operating frequencies under test.

In the second section, let us discuss briefly one optimization problem with re-

gard to a common-mode choke within a prototypical version of a simplistic EMC

filter that is embedded in the setting of a 3D-LBVP. Furthermore, we discuss another

optimization problem with regard to two common-mode chokes within the setting

of a 2D-LBVP.

Concerning the first optimization problem, the aim is to minimize the magnitude of one scattering parameter for a very narrow frequency range. From a theoretical viewpoint, the notion of scattering parameters is not encoded within the magnetoquasistatic model. From an application-driven viewpoint, though, the magnetoquasistatic subsystem of Maxwell's equations and the complete system of Maxwell's equations usually share a fuzzily bounded, overlapping modeling region that allows one to gain some knowledge about a given application for low frequency ranges, e.g., from 0 Hz to 200 MHz. Since the focus is mainly on the surrogate optimization, we consider the corresponding high-fidelity model primarily as a black-box model.

Regarding the second optimization problem, I examine the optimal positioning

of two common-mode chokes in order to lower their inductive coupling. I present

an approximate encoding of the high-fidelity optimization problem and, for this use

case, let us discuss concisely a surrogate optimization strategy as well.


5.1 Solenoid with a core

5.1.1 Preliminary consideration

The device under test is a solenoid with a core as a representative of the class of

inductive components.

In order to formulate a meaningful optimization problem within the syntactical framework discussed in § 2.3, let us briefly illustrate two physical behaviors of inductive components that are relevant in practical applications: (1) losses and (2) electromagnetic compatibility (EMC). These two main aspects are organized in a short list. For a comprehensive treatment of the specific sub-aspects in this short list, I refer to, e.g., [154] or [164] and references therein.

The short list reads as:

(1) losses

(1.1) losses in the winding

(1.1.1) losses due to a direct current

(1.1.2) losses due to an alternating current

(1.1.2.1) losses due to the skin effect

(1.1.2.2) losses due to the proximity effect

(1.2) losses in the core

(1.2.1) losses due to hysteresis

(1.2.2) losses due to eddy currents

(1.2.3) losses due to relaxation processes

(2) electromagnetic compatibility

(2.1) galvanic coupling or ohmic coupling

(2.2) electric coupling or capacitive coupling

(2.3) magnetic coupling or inductive coupling

(2.4) radiative coupling.

Invoking the so-called "Emitter-Coupler-Receiver" scaffolding of EMC theory, one can conceive the electromagnetic compatibility of an inductive component in terms of the component's capability to interact, in a mostly unwanted way, with another electric device. In this chapter's exposition, let us be mainly concerned with the magnetic coupling. Recalling Figure 1.1, an interesting use case is the magnetic coupling of inductive components within an electromagnetic system such as an EMC filter. Therefore, the positioning of the inductive components within such an EMC filter can influence the behavior of the filter.

Notice that I roughly comprehend all the losses of an inductive component as power losses converted into heat. With regard to the BVPs under consideration, though, the core is assumed to be lossless, which is a reasonable approximation of real inductive components if their core material is ferrite.

In order to encode the solenoid with a core within a 2D-LBVP, one can, metaphorically speaking, break the helicoidal winding with $N \in \mathbb{N}$ turns of an actual solenoid into $N$ toroids (recall Figure 1.3). Finally, one can exploit the rotational symmetry of the corresponding 3D-LBVP, which is geometrically composed of $N$ toroids and a cylindrical core, such that the 3D-LBVP turns into a 2D-LBVP (cf. Figure 3.7 and Figure 3.8). Hence, for the purpose of a numerical simulation within FEMM4.2, let us encode the solenoid with a core according to the schematic description in (i) of Figure 5.1.¹

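[Figure 5.1, panel (i): schematic with the outer boundary $\partial\Omega^{2D}$, the subdomain $\Omega^{2D}_{nc,1}$, the core subdomain $\Omega^{2D}_{nc,2}$, the conductor subdomains $\Omega^{2D}_{c,1}, \Omega^{2D}_{c,2}, \ldots, \Omega^{2D}_{c,N}$, the axis of symmetry, the label $d_c + 2x_1$, the core height $h_{\Omega^{2D}_{c,2}}(x_1)$, and the core width $w_{\Omega^{2D}_{c,2}}(x_1, x_2)$; panel (ii): mesh.]

FIGURE 5.1: (i) A schematic illustration of a solenoid with a core with an axially symmetric domain. (ii) A simplicial triangulation $T_h$ of the space region $\Omega^{2D}$ via FEMM4.2.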

In (i) of Figure 5.1, the entity $d_c \in \mathbb{R}^+$ refers to a fixed distance between the $N$ toroids. It is supposed that the radii of the cross-sections of the $N$ toroids all have the same length, which is encoded by the parameter $x_1$. The parameter $x_2$ denotes the radius of the $N$ toroids themselves. The height of the core $h_{\Omega^{2D}_{c,2}}$ is an affine function with the signature $\mathbb{R}^+ \to \mathbb{R}^+$ that depends on the parameter $x_1$, more precisely,

$$h_{\Omega^{2D}_{c,2}} = x_1 \mapsto N(d_c + x_1) + a_0 : \mathbb{R}^+ \to \mathbb{R}^+, \tag{5.1}$$

where $a_0 \in \mathbb{R}^+$ is a fixed parameter to set up the core's vertical expansion below the lowest circle and above the highest circle in (ii) of Figure 5.1. Furthermore, note that the width of the core $w_{\Omega^{2D}_{c,2}}$ is an affine function with the signature $\mathbb{R}^+ \times \mathbb{R}^+ \to \mathbb{R}^+$ that depends on the parameters $x_1$ and $x_2$, more precisely,

$$w_{\Omega^{2D}_{c,2}} = (x_1, x_2) \mapsto 2(x_2 - x_1) - 2a_1 : \mathbb{R}^+ \times \mathbb{R}^+ \to \mathbb{R}^+, \tag{5.2}$$

¹ Note that early-stage parts of the corresponding simulation code were developed during a bachelor thesis titled "Numerische Optimierung in der Magnetoquasistatik mit dem Space Mapping Ansatz" [in English: "Numerical Optimization in Magnetoquasistatics using the Space Mapping Approach"] (Albert Piwonski, summer term 2017; unpublished) under my scientific supervision and reviewed by the first reviewer of the present work and Prof. Dr.-Ing. Ronald Plath (TU Berlin).


where $a_1 \in \mathbb{R}^+$ is a fixed parameter to set up a minimal distance with respect to the core's horizontal expansion towards the series of circles in (ii) of Figure 5.1. In the test cases, it is set that $N := 10$, $d_c := 0.1\,\mathrm{mm}$, $a_0 := 2.0\,\mathrm{mm}$, and $a_1 := 0.5\,\mathrm{mm}$.
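For bookkeeping within an optimization loop, (5.1) and (5.2) with the test-case values can be transcribed directly. The following Python sketch is a minimal transcription; the function names are hypothetical, lengths are in millimetres, and the positivity check in `core_width` is an added safeguard reflecting the codomain $\mathbb{R}^+$ stated in (5.2), not part of the original formulation.

```python
# Parameter values from the test cases (lengths in mm)
N = 10      # number of toroids
d_c = 0.1   # fixed distance between the toroids
a_0 = 2.0   # parameter for the core's vertical expansion in (5.1)
a_1 = 0.5   # parameter for the minimal horizontal distance in (5.2)

def core_height(x1: float) -> float:
    # (5.1): h(x1) = N (d_c + x1) + a_0
    return N * (d_c + x1) + a_0

def core_width(x1: float, x2: float) -> float:
    # (5.2): w(x1, x2) = 2 (x2 - x1) - 2 a_1, required to be positive
    w = 2.0 * (x2 - x1) - 2.0 * a_1
    if w <= 0.0:
        raise ValueError("invalid geometry: x2 must exceed x1 + a_1")
    return w

# Hypothetical design point: x1 = 0.5 mm (cross-section radius), x2 = 5.0 mm
print(f"{core_height(0.5):.3f} mm, {core_width(0.5, 5.0):.3f} mm")  # 8.000 mm, 8.000 mm
```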

With regard to the constitutive equations in (2.2), the material characteristics of the subdomain $\Omega^{2D}_{nc,1}$ are set equal to the material characteristics of air. The material properties of the subdomain $\Omega^{2D}_{nc,2}$ are identified with the material properties of the subdomain $\Omega^{2D}_{nc,1}$, except for the relative magnetic permeability, which is set to $5 \times 10^{3}$. Finally, for the subdomains $\Omega^{2D}_{c,1}$ to $\Omega^{2D}_{c,N}$, the material characteristics of plain copper are assumed (recall § 3.1.3).

With regard to the source term $J_{\mathrm{src}}$ in (2.16), a fixed current intensity (or, in short, a fixed current) $I_0 \in \mathbb{R}^+$ with $I_0 := 1\,\mathrm{A}$ is associated with the subdomains $\Omega^{2D}_{c,1}$ to $\Omega^{2D}_{c,N}$ such that, in the case of direct current and in the case of alternating current of sinusoidal waveform, the value of the root mean square current $I_{\mathrm{rms}}$ is identical to the value of $I_0$, i.e.,

$$I_{\mathrm{rms}} \equiv I_0. \tag{5.3}$$

Observe that, in the case of alternating current of sinusoidal waveform, the requirement in (5.3) implies that, since the relationship

$$\forall I_{\mathrm{rms}} \in \mathbb{R}^+.\ \exists I_{\mathrm{peak}} \in \mathbb{R}^+.\ I_{\mathrm{rms}} = \frac{I_{\mathrm{peak}}}{\sqrt{2}} \tag{5.4}$$

holds, the value of the peak current $I_{\mathrm{peak}}$ has to be set equal to the value of $\sqrt{2}\, I_0$, i.e.,

$$I_{\mathrm{peak}} \equiv \sqrt{2}\, I_0. \tag{5.5}$$
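The factor $\sqrt{2}$ in (5.5) can be verified numerically: sampling one full period of $i(t) = I_{\mathrm{peak}} \sin(\omega t)$ uniformly and computing the root mean square recovers $I_0$. The operating frequency below is a hypothetical choice; the result does not depend on it.

```python
import numpy as np

I0 = 1.0                      # prescribed rms current in A, cf. (5.3)
I_peak = np.sqrt(2.0) * I0    # (5.5)

f = 50.0                      # hypothetical operating frequency in Hz
T = 1.0 / f
t = np.linspace(0.0, T, 1000, endpoint=False)   # one full period, uniform samples
i_t = I_peak * np.sin(2.0 * np.pi * f * t)

I_rms = np.sqrt(np.mean(i_t ** 2))
print(I_rms)   # approximately 1.0, i.e., I_rms equals I0 as required by (5.3)
```

Uniform sampling over exactly one period makes the discrete mean of $\sin^2$ equal to $1/2$, so the agreement holds up to round-off.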

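Furthermore, it is demanded that the source current density's orientation is the same for each subdomain $\Omega^{2D}_{c,i}$ with $i \in \{1, 2, \ldots, N\}$. Note that, from an electrical network viewpoint, on which we elaborate briefly below, one can consider the conducting subdomains as galvanically connected in series such that the following conditions hold:

$$\forall i \in \{1, \ldots, N\}.\ \int_{\Omega^{2D}_{c,i}} \operatorname{Re}(J_{\mathrm{cond}}) \cdot \mathrm{d}A := \begin{cases} I_0, & \text{if } \omega \equiv 2\pi \cdot f \text{ with } f := 0\,\mathrm{Hz}, \\ I_0\sqrt{2}, & \text{if } \omega \equiv 2\pi \cdot f \text{ with } f > 0\,\mathrm{Hz}, \end{cases} \tag{5.6a}$$

$$\exists I = \{1, \ldots, N\}.\ \int_{\biguplus_{i \in I} \Omega^{2D}_{c,i}} \operatorname{Re}(J_{\mathrm{cond}}) \cdot \mathrm{d}A := \begin{cases} I_0, & \text{if } \omega \equiv 2\pi \cdot f \text{ with } f := 0\,\mathrm{Hz}, \\ I_0\sqrt{2}, & \text{if } \omega \equiv 2\pi \cdot f \text{ with } f > 0\,\mathrm{Hz}, \end{cases} \tag{5.6b}$$

where the common map $\operatorname{Re} : \mathbb{C} \to \mathbb{R}$, which maps a complex number to its real part, is overloaded.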

Considering the boundary conditions in (2.16), only Dirichlet boundary conditions are imposed on $\partial\Omega^{2D}$. More precisely, the magnetic vector potential $A$ is prescribed to be equal to the zero vector field along $\partial\Omega^{2D}$.

In the context of the numerical simulation of the magnetoquasistatic model (recall § 2.2), one has to pay attention to two points regarding the boundary $\partial\Omega^{2D}$.

First, the boundary $\partial\Omega^{2D}$ is topologically isomorphic (or homeomorphic) to the boundary of a disk, i.e., to a one-dimensional sphere. If one examines other kinds of isomorphisms as well, then, in general, the shape of the boundary $\partial\Omega^{2D}$ and, in particular, the distance from the subdomains $\Omega^{2D}_{nc,2}$ and $\Omega^{2D}_{c,i}$ for all $i \in \{1, 2, \ldots, N\}$ to the boundary $\partial\Omega^{2D}$ may be relevant to the 2D-LBVP. However, let us invoke a pragmatic approach and heuristically set the distance $d_{\Omega^{2D}_{nc,2},\partial\Omega^{2D}} \in \mathbb{R}^+$, that is, the distance from the vertical half of the core to the end of the boundary $\partial\Omega^{2D}$, to at least four times $h_{\Omega^{2D}_{c,2}}(x_1)$ in (5.1), more precisely,

$$\forall h_{\Omega^{2D}_{c,2}}(x_1) \in \mathbb{R}^+.\ d_{\Omega^{2D}_{nc,2},\partial\Omega^{2D}} \geq 4\, h_{\Omega^{2D}_{c,2}}(x_1). \tag{5.7}$$

Second, the distance $d_{\Omega^{2D}_{nc,2},\partial\Omega^{2D}}$ is also relevant regarding the change of the parameters $x_1$ and $x_2$, e.g., within a numerical optimization. The parameters $x_1$ and $x_2$ are associated with the geometry of the space region $\Omega^{2D}$, i.e., $\Omega^{2D}(x_1, x_2)$ (recall § 2.2.3); thus, in a non-intrusive environment (recall § 3.1.3), a change of these parameters inevitably leads to a re-meshing. More precisely, a map $\phi_a$ is provided that reads as

$$\phi_a = (x_{1a}, x_{2a}) \mapsto \phi_a(x_{1a}, x_{2a}) := (x_{1b}, x_{2b}) : \mathbb{R}^+ \times \mathbb{R}^+ \to \mathbb{R}^+ \times \mathbb{R}^+, \tag{5.8}$$

where $\forall a, b \in \mathbb{N}.\ a \neq b \implies (x_{1a}, x_{2a}) \neq (x_{1b}, x_{2b})$; otherwise, $\phi_a$ is identical to the corresponding identity map $\mathrm{id}_{\phi_a}$. If we associate a 2-tuple $(x_{1a}, x_{2a})$ with a simplicial triangulation $T^{2D}_{h_a}$ and a 2-tuple $(x_{1b}, x_{2b})$ with a simplicial triangulation $T^{2D}_{h_b}$, then, by invoking the map $\phi_a$ in (5.8), one can figuratively encode re-meshing by the map $\Phi_a$ that reads as

$$\Phi_a = T^{2D}_{h_a} \mapsto \Phi_a(T^{2D}_{h_a}) := T^{2D}_{h_b} : \Omega^{2D}(x_{1a}, x_{2a}) \to \Omega^{2D}(\phi_a(x_{1a}, x_{2a})), \tag{5.9}$$

where the number of nodes, edges, and surfaces is preserved if and only if $\Phi_a$ is identical to the corresponding identity map $\mathrm{id}_{\Phi_a}$.

In order to mitigate the potential spurious influence of re-meshing on entities that depend on the parameters $x_1$ and $x_2$, such as quantities of interest, let us apply a heuristic approach in the sense that

• the mesh resolution is fixed so as to balance the highest possible resolution against a reasonable computation time (for a representative of the corresponding simplicial triangulation, see (ii) in Figure 5.1);

• any kind of adaptive meshing is disabled; and

• the distance $d_{\Omega^{2D}_{nc,2},\partial\Omega^{2D}}$ in (5.7) is fixed to at least four times $h_{\Omega^{2D}_{c,2}}(x_{1,\max})$, where $x_{1,\max}$ is the upper end of the bounded interval $[x_{1,\min},x_{1,\max}]$.

For more details on topics such as re-meshing, I refer to research within, for instance, the field of shape optimization (see, e.g., [54] and references therein).

In technical applications (recall Figure 1.1), it is common to adopt an electrical network viewpoint regarding an inductive component within an electromagnetic system, such that the inductive component is expressed by a circuit diagram representation.

In Appendix B.1, the electrical network viewpoint is concisely elaborated. Further, I recall the relationship between entities at the field-theoretical level and entities at the circuit-theoretical level. At the field-theoretical level, $P_L \in \mathbb{R}^+$ denotes the time-averaged ohmic loss in $\Omega^{2D}$ and $W_m \in \mathbb{R}^+$ denotes the time-averaged magnetic energy in $\Omega^{2D}$. At the circuit-theoretical level, $R \in \mathbb{R}^+$ denotes the resistance and $L \in \mathbb{R}^+$ denotes the inductance.
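The relationship itself is given in Appendix B.1 and is not reproduced here. As a hedged illustration, the following Python sketch uses the standard time-harmonic circuit relations $P_L = R\,I_{rms}^2$ and $W_m = \tfrac{1}{2} L\,I_{rms}^2$. It assumes sinusoidal steady state and a known RMS terminal current $I_{rms}$. The function name `circuit_parameters` is hypothetical. The conventions of Appendix B.1 may differ; for instance, peak-value phasors give factors of 1/2 and 1/4 instead.

```python
def circuit_parameters(p_loss: float, w_mag: float, i_rms: float) -> tuple:
    """Return (R, L) from the time-averaged ohmic loss P_L (W) and the
    time-averaged magnetic energy W_m (J) for an RMS terminal current I_rms (A).

    Assumes sinusoidal steady state:
        P_L = R * I_rms**2        ->  R = P_L / I_rms**2
        W_m = 0.5 * L * I_rms**2  ->  L = 2 * W_m / I_rms**2
    """
    if i_rms <= 0.0:
        raise ValueError("i_rms must be positive")
    resistance = p_loss / i_rms**2
    inductance = 2.0 * w_mag / i_rms**2
    return resistance, inductance


print(circuit_parameters(0.1, 1.25e-6, 1.0))   # (R in ohm, L in H)
```

For example, $P_L = 0.1$ W and $W_m = 1.25\times10^{-6}$ J at $I_{rms} = 1$ A give $R = 0.1\,\Omega$ and $L = 2.5\times10^{-6}$ H.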

170 Chapter 5. Surrogate optimization with the magnetoquasistatic model

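In order to formulate an optimization problem in accordance with our elaborations in § 2.3, let us introduce the map $\hat{j}_{P_L}$ and the map $\hat{Q}_L$ such that

\begin{align}
\hat{j}_{P_L} &= (\omega,x_1,x_2) \mapsto P_L \equiv \hat{j}_{P_L}(\omega,x_1,x_2) : \mathbb{R}^+\times\mathbb{R}^+\times\mathbb{R}^+ \to \mathbb{R}^+, \tag{5.10a}\\
\hat{Q}_L &= (\omega,x_1,x_2) \mapsto L \equiv \hat{Q}_L(\omega,x_1,x_2) : \mathbb{R}^+\times\mathbb{R}^+\times\mathbb{R}^+ \to \mathbb{R}^+, \tag{5.10b}
\end{align}

where $x_1$ and $x_2$ refer to the entities in (i) of Figure 5.1. Next, recalling (3.128), one can construct a new map $\hat{j}_{P_L,\omega}$ and a new map $\hat{Q}_{L,\omega}$ by currying such that

\begin{align}
\hat{j}_{P_L,\omega} &= \omega \mapsto \big((x_1,x_2) \mapsto \hat{j}_{P_L}(\omega,x_1,x_2)\big) : \mathbb{R}^+ \to (\mathbb{R}^+)^{\mathbb{R}^+\times\mathbb{R}^+}, \tag{5.11a}\\
\hat{Q}_{L,\omega} &= \omega \mapsto \big((x_1,x_2) \mapsto \hat{Q}_L(\omega,x_1,x_2)\big) : \mathbb{R}^+ \to (\mathbb{R}^+)^{\mathbb{R}^+\times\mathbb{R}^+}. \tag{5.11b}
\end{align}

If one fixes the operating frequency $f_0$ and the angular operating frequency $\omega_0 \equiv 2\pi f_0$, then one can define the map $\hat{j}_{P_L,\omega_0}$ and the map $\hat{Q}_{L,\omega_0}$ such that

\begin{align}
\hat{j}_{P_L,\omega_0} &= (x_1,x_2) \mapsto \hat{j}_{P_L}(\omega_0,x_1,x_2) : \mathbb{R}^+\times\mathbb{R}^+ \to \mathbb{R}^+, \tag{5.12a}\\
\hat{Q}_{L,\omega_0} &= (x_1,x_2) \mapsto \hat{Q}_L(\omega_0,x_1,x_2) : \mathbb{R}^+\times\mathbb{R}^+ \to \mathbb{R}^+, \tag{5.12b}
\end{align}

where $\hat{j}_{P_L,\omega_0} \equiv \hat{j}_{P_L,\omega}(\omega_0)$ and $\hat{Q}_{L,\omega_0} \equiv \hat{Q}_{L,\omega}(\omega_0)$.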
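The currying in (5.11) and the fixed-frequency restriction in (5.12) map directly onto closures. The following minimal Python sketch uses a hypothetical placeholder `j_PL` in place of the finite element evaluation; it is not the thesis's model. Only the structure of the maps matters here.

```python
import math
from functools import partial


def j_PL(omega: float, x1: float, x2: float) -> float:
    """Hypothetical placeholder for (omega, x1, x2) -> P_L in (5.10a).

    Illustrative only: a DC-type loss x2 / x1**2 times a frequency factor.
    """
    return math.sqrt(1.0 + omega) * x2 / x1**2


def curry(f):
    """Turn f(omega, x1, x2) into omega -> ((x1, x2) -> f(omega, x1, x2)), cf. (5.11a)."""
    return lambda omega: (lambda x1, x2: f(omega, x1, x2))


j_PL_omega = curry(j_PL)                    # analogue of (5.11a)
omega_0 = 2.0 * math.pi * 1.0e5             # omega_0 = 2*pi*f0 with f0 = 100 kHz
j_PL_omega0 = j_PL_omega(omega_0)           # analogue of (5.12a): (x1, x2) -> P_L
j_PL_omega0_alt = partial(j_PL, omega_0)    # the same map via partial application
```

`partial` and the explicit closure are interchangeable here. The closure form makes the signature $\mathbb{R}^+ \to (\mathbb{R}^+)^{\mathbb{R}^+\times\mathbb{R}^+}$ of (5.11a) visible.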

Subsequently, the operating frequency $f_0$ is set to $f_0 := 1\times10^{5}\,\mathrm{Hz}$. This choice is based on a rough heuristic estimate of the scale of the winding losses, which reads as

$$e_{P_L} = (x_1,f_0) \mapsto 2x_1/\delta_S(f_0) : \mathbb{R}^+\times\mathbb{R}^+ \to \mathbb{R}^+, \tag{5.13}$$

where $\delta_S$ has the signature $\mathbb{R}^+\to\mathbb{R}^+$ and $\delta_S(f_0) := (\pi f_0 \mu_0 \sigma_{Cu})^{-1/2}$ denotes the so-called skin depth (cf. [103, p. 220]) w.r.t. a good single conductor (recall § 3.1.3), evaluated at the operating frequency $f_0$.

If a lower bound of $x_1$ is set to $x_{1,\min} := 1\times10^{-3}\,\mathrm{m}$ and an upper bound of $x_1$ is set to $x_{1,\max} := 3\times10^{-3}\,\mathrm{m}$, then the corresponding values of the estimate $e_{P_L}(x_1,f_0)$ for various operating frequencies are listed in Table 5.1.

Table 5.1 shows that the scale of the winding losses is, very roughly estimated, forty-five times greater at the frequency $f_0 := 1\times10^{5}\,\mathrm{Hz}$ than at the frequency $f_0 := 5\times10^{1}\,\mathrm{Hz}$. Since $e_{P_L}$ scales with $\sqrt{f_0}$, this factor is $\sqrt{10^{5}/50} = \sqrt{2000} \approx 44.7$. Thus, given the frequency $f_0 := 1\times10^{5}\,\mathrm{Hz}$, I conclude that the winding losses reach a significant level that can be critical in technical applications.
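The estimate (5.13) is cheap to evaluate. The following Python sketch recomputes the entries of Table 5.1, assuming $\mu_0 = 4\pi\times10^{-7}$ H/m and $\sigma_{Cu} \approx 5.96\times10^{7}$ S/m. The conductivity is not stated in this section. That value is an assumption, chosen because it appears to reproduce the tabulated entries up to rounding.

```python
import math

MU_0 = 4.0e-7 * math.pi   # vacuum permeability in H/m (classical value)
SIGMA_CU = 5.96e7         # assumed copper conductivity in S/m


def skin_depth(f0: float) -> float:
    """delta_S(f0) = (pi * f0 * mu_0 * sigma_Cu)^(-1/2) in m."""
    return (math.pi * f0 * MU_0 * SIGMA_CU) ** -0.5


def e_PL(x1: float, f0: float) -> float:
    """Rough heuristic estimate (5.13): conductor diameter 2*x1 over the skin depth."""
    return 2.0 * x1 / skin_depth(f0)


freqs = [5e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8]   # Hz, as in Table 5.1
for x1 in (1e-3, 3e-3):                             # x1_min and x1_max in m
    print(x1, [round(e_PL(x1, f), 3) for f in freqs])

# The ratio between 100 kHz and 50 Hz is sqrt(1e5 / 50) = sqrt(2000), independent of x1.
print(e_PL(3e-3, 1e5) / e_PL(3e-3, 5e1))
```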

Notice that the estimate in (5.13) can also be used to illustrate the decrease of the magnetic energy caused by a conductor's stronger magnetic shielding effect at higher operating frequencies. However, the magnetic energy decreases at a much lower rate than the winding losses increase.²

From an electromagnetic system point of view, it might be more adequate to

regard the time-averaged ohmic loss density, that is, the time-averaged ohmic loss

² For more details on computations w.r.t. magnetic fields in the magnetoquasistatic model, I refer to, e.g., [103, p. 218–224] and references therein.


TABLE 5.1: Given a lower bound $x_{1,\min}$ and an upper bound $x_{1,\max}$, the rough heuristic estimate of the scale of the winding losses in (5.13) for various operating frequencies $f_0$.

(A) The lower bound $x_{1,\min}$ set to 1×10⁻³ m.

$f_0$ [Hz]    $e_{P_L}(x_{1,\min},f_0)$ [1]
5×10¹         0.217
1×10²         0.307
1×10³         0.970
1×10⁴         3.068
1×10⁵         9.701
1×10⁶         30.678
1×10⁷         97.014
1×10⁸         306.784

(B) The upper bound $x_{1,\max}$ set to 3×10⁻³ m.

$f_0$ [Hz]    $e_{P_L}(x_{1,\max},f_0)$ [1]
5×10¹         0.651
1×10²         0.920
1×10³         2.910
1×10⁴         9.204
1×10⁵         29.104
1×10⁶         92.035
1×10⁷         291.041
1×10⁸         920.353

normalized to a volume under test $V_{ut}$. The time-averaged ohmic loss density thus enables a comparison of different solenoids with a core (recall Figure 5.1) that takes into account the restricted space available within an electromagnetic system.

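Given a map $V_{ut} = (x_1,x_2) \mapsto V_{ut}(x_1,x_2)$ with the signature $\mathbb{R}^+\times\mathbb{R}^+\to\mathbb{R}^+$, one can define the maps $\hat{j}_{P_L,V_{ut}}$, $\hat{j}_{P_L,V_{ut},\omega}$, and $\hat{j}_{P_L,V_{ut},\omega_0}$ analogously to (5.10a), (5.11a), and (5.12a), that is,

\begin{align}
\hat{j}_{P_L,V_{ut}} &= (\omega,x_1,x_2) \mapsto \hat{j}_{P_L}(\omega,x_1,x_2)/V_{ut}(x_1,x_2) : \mathbb{R}^+\times\mathbb{R}^+\times\mathbb{R}^+ \to \mathbb{R}^+, \tag{5.14a}\\
\hat{j}_{P_L,V_{ut},\omega} &= \omega \mapsto \big((x_1,x_2) \mapsto \hat{j}_{P_L,V_{ut}}(\omega,x_1,x_2)\big) : \mathbb{R}^+ \to (\mathbb{R}^+)^{\mathbb{R}^+\times\mathbb{R}^+}, \tag{5.14b}\\
\hat{j}_{P_L,V_{ut},\omega_0} &= (x_1,x_2) \mapsto \hat{j}_{P_L,V_{ut}}(\omega_0,x_1,x_2) : \mathbb{R}^+\times\mathbb{R}^+ \to \mathbb{R}^+, \tag{5.14c}
\end{align}

where $\hat{j}_{P_L,V_{ut},\omega_0} \equiv \hat{j}_{P_L,V_{ut},\omega}(\omega_0)$.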

I conceive $V_{ut}(x_1,x_2) \in \mathbb{R}^+$ in (5.14) as the volume of the core $V_c(x_1,x_2) \in \mathbb{R}^+$ combined with the volume of the winding $V_w(x_1,x_2) \in \mathbb{R}^+$, that is,

$$V_{ut}(x_1,x_2) \equiv V_c(x_1,x_2) + V_w(x_1,x_2), \tag{5.15}$$

where $V_c(x_1,x_2)$ and $V_w(x_1,x_2)$ are determined by means of numerical integration.

Recall that a cylindrical core and $N$ toroids are assumed to be associated with the spatial domain in Figure 5.1. Hence, one can also determine $V_c(x_1,x_2)$ and $V_w(x_1,x_2)$ by means of the following formulae, where $V_w$ is the volume of $N$ tori with minor radius $x_1$ and major radius $x_2$:

$$V_c(x_1,x_2) := \pi\big(0.5\,w_{\Omega^{2D}_{c,2}}(x_1,x_2)\big)^2\, h_{\Omega^{2D}_{c,2}}(x_1,x_2), \qquad V_w(x_1,x_2) := N\, 2\pi^2 x_1^2 x_2. \tag{5.16}$$

However, let us proceed with the computation of $V_{ut}(x_1,x_2)$ by numerical integration, since it is the more general approach. Due to numerical inaccuracies, though, the values of $V_{ut}(x_1,x_2)$ obtained by numerical integration differ slightly from those obtained by the formulae in (5.16).
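The closed-form alternative (5.16) and the normalization (5.14a) can be sketched in Python as follows. The core width $w_{\Omega^{2D}_{c,2}}$ and height $h_{\Omega^{2D}_{c,2}}$ depend on $(x_1,x_2)$ through (5.1), which is not reproduced here. The sketch therefore takes them as plain numbers. All function names are hypothetical.

```python
import math


def core_volume(w_core: float, h_core: float) -> float:
    """V_c = pi * (0.5 * w)**2 * h for a cylindrical core, cf. (5.16)."""
    return math.pi * (0.5 * w_core) ** 2 * h_core


def winding_volume(n_toroids: int, x1: float, x2: float) -> float:
    """V_w = N * 2 * pi**2 * x1**2 * x2: N tori with minor radius x1 and major radius x2, cf. (5.16)."""
    return n_toroids * 2.0 * math.pi**2 * x1**2 * x2


def loss_density(p_loss: float, w_core: float, h_core: float,
                 n_toroids: int, x1: float, x2: float) -> float:
    """P_L / V_ut with V_ut = V_c + V_w, cf. (5.14a) and (5.15)."""
    v_ut = core_volume(w_core, h_core) + winding_volume(n_toroids, x1, x2)
    return p_loss / v_ut
```

Comparing `core_volume(...) + winding_volume(...)` with a numerically integrated $V_{ut}$ is one way to quantify the slight differences mentioned above.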


5.1.2 Optimization problem I

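Without loss of general applicability, the subsequent exemplification regarding surrogate optimization focuses on the term $\hat{j}_{P_L,\omega_0}(x_1,x_2)$ in (5.12a) rather than on the term $\hat{j}_{P_L,V_{ut},\omega_0}(x_1,x_2)$ in (5.14c). In other words, we focus on the ohmic loss rather than on the ohmic loss density. Hence, I investigate the following high-fidelity optimization problem as a concrete instance of the abstract optimization problem in (2.36):

\begin{align}
\min\;\; & \hat{j}_{P_L,\omega_0}(x_1,x_2) \tag{5.17a}\\
\text{s.t.}\;\; & x_{1,\min} \leq x_1 \leq x_{1,\max}, \tag{5.17b}\\
& x_{2,\min} \leq x_2 \leq x_{2,\max}, \tag{5.17c}\\
& L_{\min} - \hat{Q}_{L,\omega_0}(x_1,x_2) \leq 0, \tag{5.17d}\\
& \hat{Q}_{L,\omega_0}(x_1,x_2) - L_{\max} \leq 0, \tag{5.17e}
\end{align}

where $\omega_0 := 2\pi\cdot 100\,\mathrm{kHz}$, $x_{1,\min} := 1\times10^{-3}\,\mathrm{m}$, $x_{1,\max} := 3\times10^{-3}\,\mathrm{m}$, $x_{2,\min} := 5\times10^{-3}\,\mathrm{m}$, $x_{2,\max} := 10\times10^{-3}\,\mathrm{m}$, $L_{\min} := 2.5\times10^{-6}\,\mathrm{H}$, and $L_{\max} := 3.5\times10^{-6}\,\mathrm{H}$.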

Remark 5.1.1. Due to production considerations in practical applications, only integer length quantities, or a mix of real and integer length quantities, might be admissible. This kind of modeling issue might therefore be covered more adequately by an integer or a mixed-integer optimization problem. In the present work, however, I regard the problem in (5.17) as a reasonable approximation of the potential modeling implications of such production considerations.

Remark 5.1.2. Recalling § 2.3.1, one can observe that, in (5.17), an evaluated reduced parametric quantity of interest appears both in the objective functional and in the constraints.

It is supposed that the relation in (3.106) holds, at least in a worst-case sense. Regarding the direct solution of the high-fidelity optimization problem in (5.17) by means of an adequate optimization algorithm from § 2.3.3, a worst-case scenario can be conceived as, e.g., an unfavorable choice of the initial point for a locally convergent algorithm.³

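For instance, we apply the COBYLA algorithm (recall § 2.3.3) to the problem in (5.17) with the initial point $x^{(0)}$ and the maximum number of high-fidelity function evaluations $m_{DSO,\max}$ that read as

$$m_{DSO,\max} \equiv 30, \qquad x^{(0)} := (1.1\times10^{-3}\,\mathrm{m},\, 9.9\times10^{-3}\,\mathrm{m}). \tag{5.18}$$

This yields the following log data in an abridged version:

\begin{align}
& m_{DSO} \equiv 30, \qquad x^* := (2.98\times10^{-3}\,\mathrm{m},\, 8.86\times10^{-3}\,\mathrm{m}), \tag{5.19a}\\
& \hat{j}_{P_L,\omega_0}(x_1^*,x_2^*) \equiv 96.32\times10^{-3}\,\mathrm{W}, \qquad \hat{Q}_{L,\omega_0}(x_1^*,x_2^*) := 2.51\times10^{-6}\,\mathrm{H}, \tag{5.19b}
\end{align}

where the 3-tuple $(\Delta_{\hat{j}_{P_L,\omega_0}}, \Delta_{\hat{Q}_{L,\omega_0}}, \Delta_{x^*}) \in \mathbb{R}^+\times\mathbb{R}^+\times\mathbb{R}^+$ is set to

$$(\Delta_{\hat{j}_{P_L,\omega_0}}, \Delta_{\hat{Q}_{L,\omega_0}}, \Delta_{x^*}) := (1.0\times10^{-4},\, 1.0\times10^{-4},\, 1.0\times10^{-8}), \tag{5.20}$$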

³ Even if some physical intuition concerning a suitable initial point is available, such a worst-case scenario is highly probable in practical applications, where the shape of the admissible set and the landscape of the evaluated objective functional are usually unknown.


which consists of the absolute accuracy threshold for the evaluated objective functional $\Delta_{\hat{j}_{P_L,\omega_0}}$, the absolute accuracy threshold for the constraints $\Delta_{\hat{Q}_{L,\omega_0}}$, and the relative accuracy threshold for the optimal solution $\Delta_{x^*}$. I do not dwell on these thresholds since they are problem-dependent. However, the 3-tuple $(\Delta_{\hat{j}_{P_L,\omega_0}}, \Delta_{\hat{Q}_{L,\omega_0}}, \Delta_{x^*})$ is retained for all optimization problems under consideration.

In other words, it is assumed that suitable inclusion maps (similar to, e.g., (3.122)) exist for all optimization problems under consideration.

The initial point $x^{(0)}$ in (5.18) is not admissible: its inductance $\hat{Q}_{L,\omega_0}(x_1^{(0)},x_2^{(0)}) := 3.98\times10^{-6}\,\mathrm{H}$ exceeds $L_{\max}$ and thus violates (5.17e). Moreover, if one instead heuristically selects an admissible initial point,⁴ the optimization algorithm still detects the solution $x^*$ in (5.19a). Hence, I conclude that, at the level of programs (recall Figure 1.4), the implementation of the COBYLA algorithm possesses a coping mechanism for inadmissible initial points.
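The following Python sketch makes the structure of (5.17) concrete. It solves a toy analogue with SciPy's COBYLA (`scipy.optimize.minimize(method="COBYLA")`), starting from an inadmissible point as in (5.18).

Everything model-related is an assumption:
- The loss uses the DC-type approximation $N(2\pi x_2)(\pi x_1^2)^{-1}\sigma_{Cu}^{-1}I_{rms}^2$ mentioned later in this section, with hypothetical $N$ and $I_{rms}$.
- The inductance is a made-up monotone stand-in, not the finite element model.

Lengths are expressed in mm and inductances in µH so that the trust-region radii are sensibly scaled (cf. the scaling remarks in § 3.2.1). The thesis's COBYLA implementation and its thresholds in (5.20) may differ from SciPy's.

```python
import math

from scipy.optimize import minimize

# Hypothetical stand-ins for the high-fidelity maps; NOT the finite element model.
N_TOROIDS = 10       # hypothetical number of toroids
I_RMS = 1.0          # hypothetical RMS current in A
SIGMA_CU = 5.96e7    # assumed copper conductivity in S/m

X1_MIN, X1_MAX = 1.0, 3.0    # mm, cf. (5.17b)
X2_MIN, X2_MAX = 5.0, 10.0   # mm, cf. (5.17c)
L_MIN, L_MAX = 2.5, 3.5      # uH, cf. (5.17d) and (5.17e)


def loss_mw(x):
    """DC-type loss N * 2*pi*x2 / (pi * x1**2 * sigma) * I_rms**2 in mW (x in mm)."""
    x1, x2 = x[0] * 1e-3, x[1] * 1e-3
    return 1e3 * N_TOROIDS * 2.0 * math.pi * x2 / (math.pi * x1**2 * SIGMA_CU) * I_RMS**2


def inductance_uh(x):
    """Made-up inductance in uH: grows with x2 and falls with x1."""
    return 0.25 * x[1] ** 2 / x[0]


# SciPy expects c(x) >= 0, so each constraint g(x) <= 0 of (5.17) is passed as -g(x).
constraints = [
    {"type": "ineq", "fun": lambda x: x[0] - X1_MIN},
    {"type": "ineq", "fun": lambda x: X1_MAX - x[0]},
    {"type": "ineq", "fun": lambda x: x[1] - X2_MIN},
    {"type": "ineq", "fun": lambda x: X2_MAX - x[1]},
    {"type": "ineq", "fun": lambda x: inductance_uh(x) - L_MIN},
    {"type": "ineq", "fun": lambda x: L_MAX - inductance_uh(x)},
]

x0 = [1.1, 9.9]  # mm; inadmissible here because inductance_uh(x0) is about 22 uH
res = minimize(loss_mw, x0, method="COBYLA", constraints=constraints,
               options={"rhobeg": 0.5, "maxiter": 200})
# Toy optimum: x1 = X1_MAX and inductance = L_MIN, i.e., (3, sqrt(30)) mm.
print(res.x, loss_mw(res.x), inductance_uh(res.x), res.nfev)
```

For this toy, the minimizer lies where $x_1 = x_{1,\max}$ meets $L = L_{\min}$. This mirrors the structure, though not the values, of (5.19a) and of the level-set argument that follows. The toy uses a larger evaluation budget than $m_{DSO,\max} = 30$ because each evaluation is cheap.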

Numerical experiments and theoretical considerations indicate that the optimal solution $x^*$ in (5.19a) is plausible. For instance, if one removes the inductance constraints (5.17d) and (5.17e) from (5.17), the computed optimal solution is located at $(x_{1,\max},x_{2,\min})$. Recalling Figure 5.1, this can be put in 3D terms: the computed optimal solution consists of the largest possible cross-sectional radius of the $N$ toroids and the smallest possible radius of the $N$ toroids themselves.

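The computed optimal solution $(x_{1,\max},x_{2,\min})$, though, corresponds to the lowest possible inductance with regard to the constraints in (5.17b) and (5.17c). Hence, I deem it reasonable to assume that the theoretical optimal solution of (5.17) lies in the level set $\mathcal{L}_{L_{\min}}(\hat{Q}_{L,\omega_0})$. This is consistent with $\hat{Q}_{L,\omega_0}(x_1^*,x_2^*) = 2.51\times10^{-6}\,\mathrm{H} \approx L_{\min}$ in (5.19b). In set-builder notation, the level set reads

$$\mathcal{L}_{L_{\min}}(\hat{Q}_{L,\omega_0}) := \{(x_1,x_2) \in \mathbb{R}^+\times\mathbb{R}^+ \mid \hat{Q}_{L,\omega_0}(x_1,x_2) - L_{\min} = 0\}. \tag{5.21}$$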

Furthermore, comparing the solution $(x_{1,\max},x_{2,\min})$ with the solution $x^*$ in (5.19a), I cautiously infer that, concerning $\hat{j}_{P_L,\omega_0}(x_1,x_2)$ in (5.17), the parameter $x_1$ is probably more relevant than the parameter $x_2$. This inference seems reasonable from a physical viewpoint.

As a conceptual approximation, imagine that the $N$ toroids corresponding to Figure 5.1 are replaced by $N$ cylinders of length $x_2$ and cross-sectional radius $x_1$. The resistance of a conductor scales with its length divided by its cross-sectional area. Roughly speaking, $\hat{j}_{P_L,\omega_0}(x_1,x_2)$ is then proportional to $x_2$ and to $1/x_1^2$. These proportionalities indicate the order of relevance of the parameters $x_1$ and $x_2$ w.r.t. $\hat{j}_{P_L,\omega_0}(x_1,x_2)$ in (5.17).
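Under the idealized cylinder picture just described, the relevance ordering can be made explicit with a short worked computation. This is an illustration under the stated proportionality assumption, with $c \in \mathbb{R}^+$ a hypothetical constant, not a property of the high-fidelity model:

$$\hat{j}_{P_L,\omega_0}(x_1,x_2) \approx c\,\frac{x_2}{x_1^2} \quad\Longrightarrow\quad \frac{\partial \ln \hat{j}_{P_L,\omega_0}}{\partial \ln x_1} = -2, \qquad \frac{\partial \ln \hat{j}_{P_L,\omega_0}}{\partial \ln x_2} = 1,$$

$$\frac{\hat{j}_{P_L,\omega_0}(x_{1,\min},x_2)}{\hat{j}_{P_L,\omega_0}(x_{1,\max},x_2)} \approx \left(\frac{x_{1,\max}}{x_{1,\min}}\right)^2 = 9, \qquad \frac{\hat{j}_{P_L,\omega_0}(x_1,x_{2,\max})}{\hat{j}_{P_L,\omega_0}(x_1,x_{2,\min})} \approx \frac{x_{2,\max}}{x_{2,\min}} = 2.$$

Across the box constraints (5.17b) and (5.17c), $x_1$ thus changes the idealized loss by a factor of about nine. By contrast, $x_2$ changes it only by a factor of about two.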

For a surrogate-based optimization, let us use a Sobol quasi-random sequence sampling plan with $m := 21$ and the data-fit low-fidelity models in § 3.2.1. In Appendix B.2, I present a visualization of the evaluated data-fit low-fidelity models regarding (5.12), (5.14c), and (5.15). There, potential scaling issues are taken into account (recall § 3.2.1) by choosing appropriate units of measure for the corresponding physical dimensions.
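A Sobol sampling plan of this kind can be generated, e.g., with `scipy.stats.qmc`. Whether the thesis used scrambling, point skipping, or another implementation is not stated here, so the settings below are assumptions. Because $m = 21$ is not a power of two, SciPy warns that the balance properties of the Sobol sequence are not guaranteed.

```python
import numpy as np
from scipy.stats import qmc

# Box constraints (5.17b) and (5.17c), expressed in mm to avoid scaling issues.
l_bounds = [1.0, 5.0]
u_bounds = [3.0, 10.0]

sampler = qmc.Sobol(d=2, scramble=False)          # unscrambled for reproducibility
unit_sample = sampler.random(n=21)                # m = 21 points in [0, 1)^2 (emits a UserWarning)
plan = qmc.scale(unit_sample, l_bounds, u_bounds)  # 21 x 2 sampling plan in mm
print(plan[:4])
```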

Recalling § 3.2.1, one can most likely rule out that the corresponding unknown

evaluated high-fidelity models behave like the Ackley function or the Michalewicz

function. It is probable that these models behave like one of the other functions

⁴ A strategy to determine a feasible point for the optimization problem in (5.17) is, e.g., to solve an auxiliary optimization problem. Its evaluated objective function is defined as $\|\hat{Q}_{L,\omega_0}(x_1,x_2) - L_{\min}\|^2_{l^2}$ or as $\|\hat{Q}_{L,\omega_0}(x_1,x_2) - L_{\max}\|^2_{l^2}$, and its constraints are defined by (5.17b) and (5.17c).


(i.e., the Unit sphere function, the Booth function, the Rosenbrock function or the

Modified Branin function) in its outer regions.

Concerning the evaluated data-fit low-fidelity models in Appendix B.2, observe that Table 5.2 lists the normalized global first-order sensitivity measures $S^N_1$ and $S^N_2$ (cf. (2.51)). Given $m_j \equiv 50$ and $m_{j-1} \equiv 21$, it also lists the low-fidelity models' normalized global first-order sensitivity measures (LFSM) error $e_{m_j}(S^N_{y,i})$ (cf. (3.37)). The data relevant to the exemplification regarding surrogate optimization are emphasized by coloring. They concern $\tilde{j}_{P_L,\omega_0}$ and $\tilde{Q}_{L,\omega_0}$, i.e., columns (1a)–(3a) in (Ia) and columns (1b)–(3b) in (IIa). Note that, as mentioned above, the higher relevance of the parameter $x_1$ compared to the parameter $x_2$ is plausible from a physical viewpoint.

TABLE 5.2: (Ia) $S^N_i$ with $i\in\{1,2\}$ evaluated at (a) $f \equiv \tilde{j}_{P_L,\omega_0}$ and (b) $f \equiv \tilde{V}_{ut}$ w.r.t. Figure B.6; (Ib) given $m_j := 50$, LFSM error $e_{m_j}(S^N_{y,i})$ w.r.t. (Ia); (IIa) $S^N_i$ with $i\in\{1,2\}$ evaluated at (a) $f \equiv \tilde{j}_{P_L,V_{ut},\omega_0}$ and (b) $f \equiv \tilde{Q}_{L,\omega_0}$ w.r.t. Figure B.7; (IIb) given $m_j := 50$, LFSM error $e_{m_j}(S^N_{y,i})$ w.r.t. (IIa).

(Ia)
                              (1a)     (2a)     (3a)     (1b)     (2b)     (3b)
$S^N_1(f)$                    0.6751   0.6854   0.6860   0.8930   0.8927   0.8631
$S^N_2(f)$                    0.3249   0.3146   0.3140   0.1070   0.1073   0.1369
$\sum_{i=1}^{2} S^N_i(f)$     1.0000   1.0000   1.0000   1.0000   1.0000   1.0000

(Ib)
                              (1a)     (2a)     (3a)     (1b)     (2b)     (3b)
$e_{m_j}(S^N_{y,1})$          −0.0066  −0.0245  −0.0551  −0.0009  −0.0029  +0.0014
$e_{m_j}(S^N_{y,2})$          +0.0134  +0.0495  +0.1023  +0.0074  +0.0237  −0.0088

(IIa)
                              (1a)     (2a)     (3a)     (1b)     (2b)     (3b)
$S^N_1(f)$                    0.9720   0.9700   0.9704   0.6074   0.6166   0.6223
$S^N_2(f)$                    0.0280   0.0300   0.0296   0.3926   0.3834   0.3777
$\sum_{i=1}^{2} S^N_i(f)$     1.0000   1.0000   1.0000   1.0000   1.0000   1.0000

(IIb)
                              (1a)     (2a)     (3a)     (1b)     (2b)     (3b)
$e_{m_j}(S^N_{y,1})$          −0.0016  −0.0010  −0.0203  −0.0018  −0.0123  −0.0114
$e_{m_j}(S^N_{y,2})$          +0.0541  +0.0323  +0.3947  +0.0028  +0.0192  +0.0182

Owing to the examination in § 3.2.1, one can most probably rule out that the unknown evaluated high-fidelity models belong to the same class as the Ackley function or the Michalewicz function. If so, it is probably reasonable to expect that the low-fidelity models' sensitivity measures converge relatively fast w.r.t. the number of training sampling points, especially in the case of the kriging low-fidelity model.

Hence, in practical applications such as, e.g., the rapid prototyping part of a product development cycle, Table 5.2 can serve as an approximate proxy for other indicators (recall the conjecture in § 3.1.1) and as an overview of the relevance of the parameters.
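The definition in (2.51) is not reproduced here and may differ from what follows. For illustration, assume that the normalized global first-order sensitivity measures can be read as first-order Sobol indices, rescaled so that they sum to one. The following Python sketch then gives a minimal Monte Carlo estimator for such indices. It uses a standard Saltelli-type pick-freeze scheme, and the function name is hypothetical.

```python
import numpy as np


def normalized_first_order_indices(f, l_bounds, u_bounds, n=100_000, seed=0):
    """Monte Carlo estimate of first-order Sobol indices, rescaled to sum to one.

    f maps an (n, d) array of inputs to an (n,) array of outputs; inputs are
    sampled uniformly within [l_bounds, u_bounds].
    """
    rng = np.random.default_rng(seed)
    lo = np.asarray(l_bounds, dtype=float)
    hi = np.asarray(u_bounds, dtype=float)
    d = lo.size
    a = lo + (hi - lo) * rng.random((n, d))
    b = lo + (hi - lo) * rng.random((n, d))
    fa, fb = f(a), f(b)
    variance = np.var(np.concatenate([fa, fb]))
    s = np.empty(d)
    for i in range(d):
        ab = a.copy()
        ab[:, i] = b[:, i]  # column i from B, all other columns from A
        s[i] = np.mean(fb * (f(ab) - fa)) / variance  # Saltelli-type estimator
    return s / s.sum()


# Toy usage with the idealized loss x2 / x1**2 over the box of (5.17), in mm.
print(normalized_first_order_indices(lambda x: x[:, 1] / x[:, 0] ** 2, [1.0, 5.0], [3.0, 10.0]))
```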


In the subsequent considerations, let us invoke the kriging low-fidelity model. We apply the COBYLA algorithm to the corresponding low-fidelity optimization problem with the initial point $\tilde{x}^{(0)} := (1.1\times10^{-3}\,\mathrm{m},\, 9.9\times10^{-3}\,\mathrm{m})$. This yields the optimal solution $\tilde{x}^*$ that reads as

$$\tilde{x}^* := (3.00\times10^{-3}\,\mathrm{m},\, 8.87\times10^{-3}\,\mathrm{m}). \tag{5.22}$$
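Kriging itself is described in § 3.2.1 and is not reproduced in this section. As a hedged stand-in, the following Python sketch implements a minimal ordinary-kriging interpolator with a Gaussian correlation and a constant mean. The correlation parameters are fixed by hand instead of being estimated, so this is not the thesis's kriging model. A predictor of this kind can replace an expensive map in a low-fidelity optimization problem such as the one leading to (5.22).

```python
import numpy as np


class OrdinaryKriging:
    """Minimal ordinary-kriging interpolator with a Gaussian correlation.

    theta is fixed by hand here; a full kriging model would tune it, e.g.,
    by maximizing the concentrated likelihood.
    """

    def __init__(self, theta, nugget=1e-10):
        self.theta = np.asarray(theta, dtype=float)
        self.nugget = nugget

    def _corr(self, xa, xb):
        diff = xa[:, None, :] - xb[None, :, :]
        return np.exp(-np.sum(self.theta * diff**2, axis=-1))

    def fit(self, x, y):
        self.x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        r = self._corr(self.x, self.x) + self.nugget * np.eye(y.size)
        ones = np.ones_like(y)
        # Generalized least-squares estimate of the constant mean.
        self.mu = (ones @ np.linalg.solve(r, y)) / (ones @ np.linalg.solve(r, ones))
        self.weights = np.linalg.solve(r, y - self.mu)
        return self

    def predict(self, x_new):
        x_new = np.atleast_2d(np.asarray(x_new, dtype=float))
        return self.mu + self._corr(x_new, self.x) @ self.weights


# Toy usage: fit the idealized loss x2 / x1**2 (x in mm) on a 4 x 4 grid.
grid = np.array([[a, b] for a in np.linspace(1.0, 3.0, 4) for b in np.linspace(5.0, 10.0, 4)])
model = OrdinaryKriging(theta=[1.0, 0.1]).fit(grid, grid[:, 1] / grid[:, 0] ** 2)
print(model.predict([[3.0, 8.87]]))
```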

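We then apply the COBYLA algorithm to the problem in (5.17) with the maximum number of high-fidelity function evaluations $\tilde{m}_{DSO,\max} := 2$ and the initial point $x^{(0)} := \tilde{x}^*$. This yields the surrogate-based optimization's log data in an abridged version:

\begin{align}
& m_{SBO} \equiv 23, \qquad x^* := (3.00\times10^{-3}\,\mathrm{m},\, 8.87\times10^{-3}\,\mathrm{m}), \tag{5.23a}\\
& \hat{j}_{P_L,\omega_0}(x_1^*,x_2^*) \equiv 96.26\times10^{-3}\,\mathrm{W}, \qquad \hat{Q}_{L,\omega_0}(x_1^*,x_2^*) := 2.50\times10^{-6}\,\mathrm{H}, \tag{5.23b}
\end{align}

where $m_{SBO} := m + \tilde{m}_{DSO,\max} = 21 + 2 = 23$. The low-fidelity models regarding the entities in (5.17) are constructed in parallel, i.e., from the same $m$ high-fidelity evaluations. If one constructs the low-fidelity models sequentially, then $m_{SBO}$ in (5.23) is determined by $m_{SBO} := 2m + \tilde{m}_{DSO,\max}$.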

For additional investigations, one can apply the procedure SBO-DFLF in the

neighborhood of the optimal solution in (5.23).

Recalling § 4.4.1, one can construct simplified-physics low-fidelity models by adapting, e.g., $\mathcal{T}^{2D}_{h_1}$ or ${}^{1}\Delta^{2D}_{Ax=b}$. Moreover, one can associate $R_1$ in § 4.4.1 with physics-oriented approximation formulae for $\hat{j}_{P_L,\omega_0}(x_1,x_2)$ and $\hat{Q}_{L,\omega_0}(x_1,x_2)$. Examples are $N(2\pi x_2)(\pi x_1^2)^{-1}\sigma_{Cu}^{-1} I_{rms}^2$ for the term $\hat{j}_{P_L,\omega_0}(x_1,x_2)$ and $\mu N^2(\pi x_2^2)\, h_{\Omega^{2D}_{c,2}}(x_1)^{-1}$ for the term $\hat{Q}_{L,\omega_0}(x_1,x_2)$. However, in the remainder, let us focus on $\mathcal{T}^{2D}_{h_1}$ and ${}^{1}\Delta^{2D}_{Ax=b}$.

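The entities $\mathcal{T}^{2D}_{h_1}$ and $\mathcal{T}^{2D}_{h_2}$ are associated with ≈1×10⁵ elements and ≈1×10³ elements, respectively (recall § 2.2.2), i.e., $\mathcal{T}^{2D}_{h_1} \mapsto$ ≈1×10⁵ elements and $\mathcal{T}^{2D}_{h_2} \mapsto$ ≈1×10³ elements. Further, ${}^{1}\Delta^{2D}_{Ax=b} := 1\times10^{-16}$ and ${}^{2}\Delta^{2D}_{Ax=b} := 1\times10^{-8}$ are set.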

In Table 5.3, given $m := 21$, I present the corresponding mean SSPCC $r^2_{y\tilde{y}|k:=5}$ within the k-fold cross validation method w.r.t. the simplified-physics low-fidelity model regarding $\hat{j}_{P_L,\omega_0}$ and $\hat{Q}_{L,\omega_0}$ in (5.17). The reference setting is defined by the combination $(\mathcal{T}^{2D}_{h_1}, {}^{2}\Delta^{2D}_{Ax=b})$.⁵
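Read SSPCC as the squared sample Pearson correlation coefficient between high-fidelity outputs $y$ and low-fidelity outputs $\tilde{y}$. One hedged way to obtain a mean value over $k = 5$ folds is sketched below in Python. This section does not specify how the folds are used in the thesis's k-fold cross validation method. The sketch therefore simply averages the per-fold values, and the function name is hypothetical.

```python
import numpy as np


def mean_sspcc_kfold(y_high, y_low, k=5, seed=0):
    """Mean over k disjoint folds of the squared sample Pearson correlation
    between high-fidelity outputs y_high and low-fidelity outputs y_low."""
    y_high = np.asarray(y_high, dtype=float)
    y_low = np.asarray(y_low, dtype=float)
    idx = np.random.default_rng(seed).permutation(y_high.size)
    scores = []
    for fold in np.array_split(idx, k):
        r = np.corrcoef(y_high[fold], y_low[fold])[0, 1]
        scores.append(r**2)
    return float(np.mean(scores))


# Toy usage with m = 21 points: an affine low-fidelity model gives a mean SSPCC of 1.
y = np.linspace(0.0, 1.0, 21)
print(mean_sspcc_kfold(y, 2.0 * y + 1.0))
```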

From Table 5.3, I infer that the combination $(\mathcal{T}^{2D}_{h_2}, {}^{2}\Delta^{2D}_{Ax=b})$ probably yields a useful simplified-physics low-fidelity model, since its mean SSPCC is 0.93 for $\hat{j}_{P_L,\omega_0}$ and 0.99 for $\hat{Q}_{L,\omega_0}$. Hence, one can proceed analogously to the procedure regarding the log data in (5.23). For additional investigations, one can apply the procedure SBO-SPLF in the neighborhood of the corresponding optimal solution.

If solving the optimization problem regarding the selected simplified-physics low-fidelity model is perceived as slow, one can additionally construct a data-fit low-fidelity model w.r.t. the selected simplified-physics low-fidelity model (see, e.g., procedure SGO-SPLF). Owing to the exposition in § 4.4.2, coordinate transformations also offer novel opportunities for constructing simplified-physics low-fidelity models.

⁵ In [123], the authors investigate, among other things, the relationship between the SSPCC and the degree of grid discretization in the context of antenna design. However, an in-depth examination of relationships such as the one between $\mathcal{T}^{2D}_{h_1}$ and $r^2_{y\tilde{y}|k:=5}$ in the light of re-meshing (see (5.9)) is beyond the scope of the present work.


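TABLE 5.3: Given $\mathcal{T}^{2D}_{h_1} \mapsto$ ≈1×10⁵ elements, $\mathcal{T}^{2D}_{h_2} \mapsto$ ≈1×10³ elements, ${}^{1}\Delta^{2D}_{Ax=b} := 1\times10^{-16}$, ${}^{2}\Delta^{2D}_{Ax=b} := 1\times10^{-8}$, and $m := 21$, the mean SSPCC $r^2_{y\tilde{y}|k:=5}$ within the k-fold cross validation method w.r.t. the simplified-physics low-fidelity model regarding $\hat{j}_{P_L,\omega_0}$ and $\hat{Q}_{L,\omega_0}$ in (5.17). The reference setting is marked with square brackets.

(A) The mean SSPCC $r^2_{y\tilde{y}|k:=5}$ regarding $\hat{j}_{P_L,\omega_0}$.

                                $\mathcal{T}^{2D}_{h_1}$    $\mathcal{T}^{2D}_{h_2}$
${}^{1}\Delta^{2D}_{Ax=b}$       1.00                        0.90
${}^{2}\Delta^{2D}_{Ax=b}$      [1.00]                       0.93

(B) The mean SSPCC $r^2_{y\tilde{y}|k:=5}$ regarding $\hat{Q}_{L,\omega_0}$.

                                $\mathcal{T}^{2D}_{h_1}$    $\mathcal{T}^{2D}_{h_2}$
${}^{1}\Delta^{2D}_{Ax=b}$       1.00                        0.99
${}^{2}\Delta^{2D}_{Ax=b}$      [1.00]                       0.99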

Let us now explicitly recall the context: we are in the midst of the rapid prototyping part of a product development cycle. It is thus supposed that the main concern is to deploy a surrogate-guided optimization for the validation and verification of given results of a surrogate-based optimization (recall § 3.3). In this case, it is economical and sustainable to re-use as much of the surrogate-based optimization's data as possible.

More tangibly, for a co-kriging optimization (recall § 3.3.3), let us re-use the sample of size $m = 21$ associated with the data regarding $\tilde{j}_{P_L,\omega_0}$ and $\tilde{Q}_{L,\omega_0}$ w.r.t. the surrogate-based optimization's log data in (5.23). That is, a proper subsample of size $m_K = 15$ serves as the data of a high-fidelity model. The full sample of size $m_{\tilde{K}} = 21$ serves as the data of a low-fidelity model whose output points are defined as

$$y := \rho_{ck}\, y_{SBO}, \tag{5.24}$$

where $y_{SBO} \in \mathbb{R}^{m_{\tilde{K}}\times 1}$ refers to the column vector representing the output points of the sample w.r.t. (5.23) and $\rho_{ck} \in \mathbb{R}$ refers to a scaling parameter.

There are two main reasons for the modeling choice in (5.24).

(1) It is a correctness check at the level of programs (recall Figure 1.4), devised based on the elaborations in [70, p. 173–176]. More precisely, one can expect that the estimate $\hat{\rho}$ in (3.183) is computed as

$$\hat{\rho} := \frac{1}{\rho_{ck}}. \tag{5.25}$$

(2) Owing to the examinations in § 3.2.2, one can expect that, e.g., the SSPCC in (3.193) is close to one. This makes it possible to partly emulate the conditions of the situation in (3.193).

Notice that $\rho_{ck} := 1.05$ is set, so that (5.25) yields $1/1.05 \approx 0.952$. Due to numerical issues (see the commentary on (3.183)), the corresponding estimate $\hat{\rho}$ is rounded to two decimal places, i.e., $\hat{\rho} := 0.95$ is set in (3.184).
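The expectation (5.25) can be checked with a one-line least-squares argument. If the low-fidelity outputs are exactly $\rho_{ck}$ times the high-fidelity outputs, the scaling that best maps low to high fidelity is $1/\rho_{ck}$. The Python sketch below uses hypothetical output values and a plain least-squares fit. This simplifies the estimate in (3.183), which additionally models a difference process.

```python
import numpy as np

RHO_CK = 1.05  # scaling parameter in (5.24)


def scaling_estimate(y_high, y_low):
    """Least-squares scaling rho minimizing ||y_high - rho * y_low||_2.

    A simplification: co-kriging estimates rho jointly with a Gaussian-process
    model of the difference y_high - rho * y_low.
    """
    y_high = np.asarray(y_high, dtype=float)
    y_low = np.asarray(y_low, dtype=float)
    return float((y_low @ y_high) / (y_low @ y_low))


y_sbo = np.random.default_rng(1).uniform(0.09, 0.12, size=21)  # hypothetical outputs of the m = 21 sample
y_low = RHO_CK * y_sbo                                          # low-fidelity data, cf. (5.24)
rho_hat = scaling_estimate(y_sbo[:15], y_low[:15])              # m_K = 15 high-fidelity points
print(rho_hat, round(rho_hat, 2))                               # 1/1.05 ≈ 0.952, rounded to 0.95
```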

The objective function in (5.17a) is adapted in order to be consistent with the formulation in (3.192): a desired value $\hat{j}_{P_L,\omega_0,d} \in \mathbb{R}^+$ (cf. (2.33)) is provided, such that one can instantiate $\hat{j}$ as

$$\hat{j} = (x_1,x_2) \mapsto \big\|\hat{j}_{P_L,\omega_0}(x_1,x_2) - \hat{j}_{P_L,\omega_0,d}\big\|^2_{l^2} : X_0 \to \mathbb{R}^+, \tag{5.26}$$

where $\hat{j}_{P_L,\omega_0,d} := 0.0\times10^{-3}\,\mathrm{W}$ and $\hat{j}(x_1,x_2) = \hat{j}(\hat{y}(x_1,x_2))$ (cf. (2.40)) in (3.192). Due to this choice of the desired value $\hat{j}_{P_L,\omega_0,d}$, the optimal solutions associated with the objective function in (5.17a) and with the adapted objective function in (5.26) are essentially the same. Since $\hat{j}_{P_L,\omega_0}$ takes values in $\mathbb{R}^+$, the adapted objective equals $\hat{j}_{P_L,\omega_0}^2$, and squaring is strictly increasing on $\mathbb{R}^+$.

Invoking the initial point in (5.18), we receive the co-kriging optimization's log data in an abridged version:

\begin{align}
& m_{SGO,ck} \equiv 15, \qquad x^* := (3.00\times10^{-3}\,\mathrm{m},\, 8.86\times10^{-3}\,\mathrm{m}), \tag{5.27a}\\
& \hat{j}_{P_L,\omega_0}(x_1^*,x_2^*) \equiv 96.53\times10^{-3}\,\mathrm{W}, \qquad \hat{Q}_{L,\omega_0}(x_1^*,x_2^*) := 2.50\times10^{-6}\,\mathrm{H}, \tag{5.27b}
\end{align}

where $m_{SGO,ck} := m_K$. Note that the counting method for $m_{SGO,ck}$ mimics a particular situation. In it, one possesses a low-fidelity model whose SSPCC $r^2_{y\tilde{y},\tilde{K}_1}$ is close to one and which is not equal to the high-fidelity model.

Observe that the relative deviation, in a suitable norm, between the optimal solution in (5.23) and the optimal solution in (5.27) is well below one percent.

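Recalling § 3.3.2, the TRASM algorithm 3.1 is applied using the co-kriging low-fidelity models of $\tilde{j}_{P_L,\omega_0}$ and $\tilde{Q}_{L,\omega_0}$ corresponding to (5.27). Let us invoke the initial point in (5.18) and set the remaining input entities of the TRASM algorithm 3.1 as

\begin{align}
& B^{(0)} := I, \qquad \Delta^{(0)} := 10, \tag{5.28a}\\
& (\eta_1,\eta_2,\gamma,\zeta) \equiv (1.0\times10^{-5},\, 1.0\times10^{-1},\, 0.25,\, 2), \qquad (k_{\max}, e_{abs}, e_{rel}) := (2,\, 1.0\times10^{-3},\, 1.0\times10^{-4}). \tag{5.28b}
\end{align}

Further, $F^{(0)}_0$ is defined by using (3.149) and by adapting (5.17b)–(5.17e) similarly to (3.154). One then receives the TRASM algorithm's log data in an abridged version:

\begin{align}
& m_{SGO,sm} \equiv 19, \qquad x^* := (3.00\times10^{-3}\,\mathrm{m},\, 8.86\times10^{-3}\,\mathrm{m}), \tag{5.29a}\\
& \hat{j}_{P_L,\omega_0}(x_1^*,x_2^*) \equiv 96.53\times10^{-3}\,\mathrm{W}, \qquad \hat{Q}_{L,\omega_0}(x_1^*,x_2^*) := 2.50\times10^{-6}\,\mathrm{H}, \tag{5.29b}
\end{align}

where $m_{SGO,sm} := m_{SGO,ck} + 2k = 15 + 2\cdot 2 = 19$ with $k \equiv 2$ within the TRASM algorithm 3.1. Note that the optimal solution $x^*$ in (5.29) is evaluated using the corresponding co-kriging low-fidelity models.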

Similarly to (5.27), observe that the relative deviation, in a suitable norm, between the optimal solution in (5.23) and the optimal solution in (5.29) is well below one percent.

The entities in (5.28) are partly inspired by choices in [95], but, in general, they are chiefly determined heuristically. Therefore, in practical applications, a preprocessing step might be needed to determine useful values for the entities in (5.28). Additionally, in practical applications, an adaptive restart strategy might be needed to prevent a potentially low experimental rate of convergence. Such a low rate can be caused by, for instance, excessively small step sizes $h^{(k)}$ within the TRASM algorithm 3.1.
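How Algorithm 3.1 uses $(\eta_1,\eta_2,\gamma,\zeta)$ is specified in § 3.3.2 and is not reproduced here. The following Python sketch assumes the common textbook reading:
- $\eta_1$ and $\eta_2$ are thresholds on the ratio of actual to predicted reduction;
- $\gamma$ shrinks the trust-region radius;
- $\zeta$ enlarges it.

The values from (5.28b) serve as defaults.

```python
def trust_region_update(ratio, radius, eta1=1.0e-5, eta2=1.0e-1, gamma=0.25, zeta=2.0):
    """Generic trust-region acceptance test and radius update (textbook form).

    ratio: actual reduction of the high-fidelity objective divided by the
    reduction predicted by the surrogate. Returns (accept_step, new_radius).
    Defaults are the values of (eta1, eta2, gamma, zeta) in (5.28b).
    """
    if ratio < eta1:          # surrogate prediction unreliable: reject and shrink
        return False, gamma * radius
    if ratio > eta2:          # good agreement: accept and enlarge
        return True, zeta * radius
    return True, radius       # acceptable agreement: accept, keep the radius


print(trust_region_update(0.5, 10.0))   # initial radius 10 as Delta^(0) in (5.28a)
```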

In summary, we have developed a particularly relevant application-driven workflow concerning the relation in (3.106). More precisely, we have carved out a use case such that

$$\exists\, m_{DSO}, m_{SBO}, m_{SGO,sm}, m_{SGO,ck} \in \mathbb{N}\setminus\{0\}.\; m_{DSO} > m_{SBO} > m_{SGO,sm} > m_{SGO,ck} \tag{5.30}$$

holds, namely with $30 > 23 > 19 > 15$. To put the statement in (5.30) more poignantly, assume that one high-fidelity model evaluation takes approximately 5 min. Then, $m_{DSO}$ corresponds to 150 min, $m_{SBO}$ to 115 min, $m_{SGO,sm}$ to 95 min, and $m_{SGO,ck}$ to 75 min. In this use case, the computational time associated with a high-fidelity optimization problem can thus be halved. With some additional effort, the corresponding optimal solution can also be validated and verified.
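The evaluation counts behind (5.30) and the time estimate can be tallied in a few lines of Python. The 5 min per high-fidelity evaluation is the same illustrative assumption as in the text.

```python
M = 21                               # Sobol sampling plan size
M_DSO = 30                           # direct COBYLA run, (5.19)
M_SBO = M + 2                        # m + m~_DSO,max, (5.23)
M_SGO_CK = 15                        # m_K, (5.27)
K_TRASM = 2                          # k within the TRASM algorithm, (5.29)
M_SGO_SM = M_SGO_CK + 2 * K_TRASM    # m_SGO,ck + 2k, (5.29)
MINUTES_PER_EVAL = 5                 # illustrative cost of one high-fidelity evaluation

budgets = {"DSO": M_DSO, "SBO": M_SBO, "SGO,sm": M_SGO_SM, "SGO,ck": M_SGO_CK}
assert M_DSO > M_SBO > M_SGO_SM > M_SGO_CK   # the ordering stated in (5.30)
for name, count in budgets.items():
    print(f"{name}: {count} evaluations, about {count * MINUTES_PER_EVAL} min")
```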

Note, though, that the numbers in the statement in (5.30) should be treated with caution with regard to their potential for generalization, e.g., in the sense of (3.106). Furthermore, recalling Figure 1.4, the choice of some entities, e.g., the problem-dependent entities in (5.28), limits comparability at the level of programs. For the issue of comparability at the level of functions, see § 4.3.

5.1.3 Optimization problem II

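Note that if we extend the optimization problem in (5.17) such that we consider $m_w \in \mathbb{N}$ operating frequencies, then one can heuristically formulate an indexed family of high-fidelity optimization problems $O$ whose assignment rule reads as, e.g.,

\begin{align}
O = i \mapsto \min\;\; & \hat{j}_{P_L,\omega_i}(x_1,x_2) \tag{5.31a}\\
\text{s.t.}\;\; & x_{1,\min} \leq x_1 \leq x_{1,\max}, \tag{5.31b}\\
& x_{2,\min} \leq x_2 \leq x_{2,\max}, \tag{5.31c}\\
& L_{\min} - \hat{Q}_{L,\omega_i}(x_1,x_2) \leq 0, \tag{5.31d}\\
& \hat{Q}_{L,\omega_i}(x_1,x_2) - L_{\max} \leq 0, \tag{5.31e}
\end{align}

where $i \in I$ with $I := \{1,\dots,m_w\}$, $\omega_i \in W_I$ with $W_I := \{\omega_1,\dots,\omega_{m_w}\}$, $x_{1,\min} := 1\times10^{-3}\,\mathrm{m}$, $x_{1,\max} := 3\times10^{-3}\,\mathrm{m}$, $x_{2,\min} := 5\times10^{-3}\,\mathrm{m}$, $x_{2,\max} := 10\times10^{-3}\,\mathrm{m}$, $L_{\min} := 2.5\times10^{-6}\,\mathrm{H}$, and $L_{\max} := 3.5\times10^{-6}\,\mathrm{H}$.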

A possible aim concerning (5.31) is to determine, among all individual high-fidelity optimization problems $O(i)$ in (5.31), the optimal solution $(\omega_{i^*},x^*)$ for which

$$\forall (\omega_i,x) \in W_I \times X.\; \hat{j}_{P_L,\omega_{i^*}}(x^*) \leq \hat{j}_{P_L,\omega_i}(x) \tag{5.32}$$

holds (cf. (4.18)). Assume that the entities $\hat{j}_{P_L,\omega_i}(x_1,x_2)$ and $\hat{Q}_{L,\omega_i}(x_1,x_2)$ are not quickly available for all $\omega_i \in W_I$, so that one cannot immediately invoke the multivariate vector-valued use case in (3.131a). Then each high-fidelity optimization problem $O(i)$ in (5.31) has to be considered individually.

Suppose the above-mentioned application-driven workflow is applied to each $O(i)$, and the numbers $m_{DSO}$, $m_{SBO}$, $m_{SGO,sm}$, and $m_{SGO,ck}$ in (5.30) are the same for each $\omega_i$. Then one can roughly estimate the worst-case computational burden by

$$\exists\, m_{DSO}, m_{SBO}, m_{SGO,sm}, m_{SGO,ck} \in \mathbb{N}\setminus\{0\}.\; m_w m_{DSO} > m_w m_{SBO} > m_w m_{SGO,sm} > m_w m_{SGO,ck}. \tag{5.33}$$


In Table 5.4, I present the log data in an abridged version w.r.t. the setting in (5.19) for the operating frequencies 5×10¹ Hz, 1×10⁵ Hz, and 1×10⁸ Hz. Accordingly, $\omega_1 := 2\pi\cdot 50\,\mathrm{Hz}$, $\omega_2 := 2\pi\cdot 100\,\mathrm{kHz}$, and $\omega_3 := 2\pi\cdot 100\,\mathrm{MHz}$ in (5.31).

TABLE 5.4: Given the operating frequencies 5×10¹ Hz, 1×10⁵ Hz, and 1×10⁸ Hz, the log data in an abridged version w.r.t. the setting of the log data in (5.19).

$f_0$ [Hz]   $m_{DSO}$ [1]   $x^*$ [(m, m)]              $\hat{j}_{P_L,\omega_0}(x_1^*,x_2^*)$ [W]   $\hat{Q}_{L,\omega_0}(x_1^*,x_2^*)$ [H]
5×10¹        30              (2.99×10⁻³, 7.23×10⁻³)      27.63×10⁻⁵                                  3.27×10⁻⁶
1×10⁵        30              (2.98×10⁻³, 8.86×10⁻³)      96.32×10⁻³                                  2.51×10⁻⁶
1×10⁸        30              (1.45×10⁻³, 9.78×10⁻³)      26.24×10⁻¹                                  3.39×10⁻⁶

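Let us set $\hat{j}_{P_L,\omega_1}(x_1^*,x_2^*) := 27.63\times10^{-5}\,\mathrm{W}$, $\hat{j}_{P_L,\omega_2}(x_1^*,x_2^*) := 96.32\times10^{-3}\,\mathrm{W}$, and $\hat{j}_{P_L,\omega_3}(x_1^*,x_2^*) := 26.24\times10^{-1}\,\mathrm{W}$. Then one can define the following ratios regarding $\hat{j}_{P_L,\omega_0}(x_1^*,x_2^*)$ in Table 5.4:

$$\frac{\hat{j}_{P_L,\omega_3}(x_1^*,x_2^*)}{\hat{j}_{P_L,\omega_1}(x_1^*,x_2^*)} := 9496.92, \qquad \frac{\hat{j}_{P_L,\omega_2}(x_1^*,x_2^*)}{\hat{j}_{P_L,\omega_1}(x_1^*,x_2^*)} := 348.61, \qquad \frac{\hat{j}_{P_L,\omega_3}(x_1^*,x_2^*)}{\hat{j}_{P_L,\omega_2}(x_1^*,x_2^*)} := 27.25. \tag{5.34}$$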

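If we invoke Table 5.1 and set $e_{P_L}(x_{1,\max},f_1) := 0.651$, $e_{P_L}(x_{1,\max},f_2) := 29.104$, and $e_{P_L}(x_{1,\max},f_3) := 920.353$ with $f_i \equiv \omega_i/(2\pi)$, then one can define the following ratios regarding $e_{P_L}(x_{1,\max},f_0)$ in Table 5.1:

$$\frac{e_{P_L}(x_{1,\max},f_3)}{e_{P_L}(x_{1,\max},f_1)} := 1413.75, \qquad \frac{e_{P_L}(x_{1,\max},f_2)}{e_{P_L}(x_{1,\max},f_1)} := 44.71, \qquad \frac{e_{P_L}(x_{1,\max},f_3)}{e_{P_L}(x_{1,\max},f_2)} := 31.62. \tag{5.35}$$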

Comparing (5.35) with (5.34), one can observe that the rough heuristic estimate of the scale of the winding losses in (5.13) at least partly captures the overall trend expected from the electromagnetic field theory's realm of magnetoquasistatics. Hence, Table 5.4 furnishes a physically plausible optimal solution in the sense of the condition in (5.32).
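The ratios in (5.34) and (5.35) can be recomputed from the rounded table entries with the Python snippet below. From these rounded values, the third ratio in (5.34) evaluates to about 27.24 rather than 27.25, a difference at the level of rounding.

```python
# Optimal losses from Table 5.4 (W) and estimates e_PL(x1_max, f) from Table 5.1 ([1]).
loss = {"f1": 27.63e-5, "f2": 96.32e-3, "f3": 26.24e-1}
e_pl = {"f1": 0.651, "f2": 29.104, "f3": 920.353}


def ratios(values):
    """Return the three pairwise ratios f3/f1, f2/f1, f3/f2 used in (5.34) and (5.35)."""
    return (values["f3"] / values["f1"],
            values["f2"] / values["f1"],
            values["f3"] / values["f2"])


print([round(r, 2) for r in ratios(loss)])   # cf. (5.34)
print([round(r, 2) for r in ratios(e_pl)])   # cf. (5.35)
```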

Note that, especially for the operating frequency $f_0 := 1\times10^{8}\,\mathrm{Hz}$ in Table 5.4, the optimal solution determined by the COBYLA algorithm is sensitive to the choice of the initial point $x^{(0)}$.

TABLE 5.5: Given the operating frequency 1×10⁸ Hz and the initial points in (5.36), the log data in an abridged version w.r.t. the setting of the log data in (5.19).

$x^{(0)}$ [(m, m)]   $m_{DSO}$ [1]   $x^*$ [(m, m)]              $\hat{j}_{P_L,\omega_0}(x_1^*,x_2^*)$ [W]   $\hat{Q}_{L,\omega_0}(x_1^*,x_2^*)$ [H]
$x^{(0)}_1$          30              (1.45×10⁻³, 9.78×10⁻³)      26.24×10⁻¹                                  3.39×10⁻⁶
$x^{(0)}_2$          30              (2.79×10⁻³, 9.38×10⁻³)      23.25×10⁻¹                                  2.56×10⁻⁶

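Using the initial points $x^{(0)}_1$ and $x^{(0)}_2$ that read as

$$x^{(0)}_1 := (1.1\times10^{-3}\,\mathrm{m},\, 9.9\times10^{-3}\,\mathrm{m}), \qquad x^{(0)}_2 := (3.0\times10^{-3}\,\mathrm{m},\, 9.9\times10^{-3}\,\mathrm{m}), \tag{5.36}$$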


I report in Table 5.5 the corresponding log data in an abridged version w.r.t. the setting in (5.19) for the operating frequency 1×10⁸ Hz.

My inference draws on three sources:
- Table 5.5, where the two initial points in (5.36) lead to different computed solutions at $\omega_3$;
- the elaborations regarding the log data in (5.19);
- the particularly relevant application-driven workflow that culminated in the statement in (5.30).

From these, I infer that $\hat{j}_{P_L,\omega_3}(x_1,x_2)$ likely has a more intricate shape than $\hat{j}_{P_L,\omega_2}(x_1,x_2)$. By contrast, the shape of $\hat{Q}_{L,\omega_3}(x_1,x_2)$ is essentially the same as that of $\hat{Q}_{L,\omega_2}(x_1,x_2)$.

Let us test this inference by constructing corresponding kriging low-fidelity models $\tilde{j}_{P_L,\omega_i}$ and $\tilde{Q}_{L,\omega_i}$ in (5.31) with $m_w := 3$. It is supposed that the individual high-fidelity models in (5.31) are uniformly built upon the combination $(\mathcal{T}^{2D}_{h_1}, {}^{2}\Delta^{2D}_{Ax=b})$ in Table 5.3. Furthermore, it is assumed that the corresponding kriging low-fidelity models are uniformly built upon a Sobol quasi-random sequence sampling plan with $m := 50$ (recall § 3.1.1).

In Figure 5.2, I depict the evaluated kriging low-fidelity models in contour representation for $\omega_1 := 2\pi\cdot 50\,\mathrm{Hz}$, $\omega_2 := 2\pi\cdot 100\,\mathrm{kHz}$, and $\omega_3 := 2\pi\cdot 100\,\mathrm{MHz}$.

[Figure 5.2: six contour plots, panels (1a), (2a), (3a), (1b), (2b), (3b); horizontal axis $x_1$ [mm], vertical axis $x_2$ [mm].]

FIGURE 5.2: Given the combination $(\mathcal{T}^{2D}_{h_1}, {}^{2}\Delta^{2D}_{Ax=b})$ in Table 5.3 and using the Sobol quasi-random sequence sampling plan with $m := 50$ and kriging low-fidelity models with (1) $\omega_1 := 2\pi\cdot 50\,\mathrm{Hz}$, (2) $\omega_2 := 2\pi\cdot 100\,\mathrm{kHz}$, and (3) $\omega_3 := 2\pi\cdot 100\,\mathrm{MHz}$; contour representation of (a) $\tilde{j}_{P_L,\omega_i}$ and (b) $\tilde{Q}_{L,\omega_i}$ in (5.31) where $m_w := 3$. Dark colors indicate low values; bright colors indicate high values (cf. Figure B.4 and Figure B.5).

Invoking the application-driven workflow concerning (5.30) for each $O(i)$ in (5.31) with $m_w := 3$, one can unleash the machinery of surrogate-based optimization and the machinery of surrogate-guided optimization for each $O(i)$, analogous to the strategies regarding (5.17).


This yields one strategy for tackling the optimization problem in (5.31) within the context of surrogate optimization. Let us end the present subsection by briefly discussing some conceivable variations of this strategy based on an examination of Figure 5.2.

Examining Figure 5.2, I conclude that the observed behavior of $\tilde{j}_{P_L,\omega_i}$ and $\tilde{Q}_{L,\omega_i}$ for $\omega_1 := 2\pi\cdot50\,\mathrm{Hz}$ passes a physics-driven plausibility check. More precisely, given the rough heuristic estimate of the scale of the winding losses in (5.13), one can expect the observed behavior of $\tilde{j}_{P_L,\omega_i}$ and $\tilde{Q}_{L,\omega_i}$ for $\omega_1$ to be approximately explainable within the realm of magnetostatics of electromagnetic field theory, in addition to the realm of magnetoquasistatics.

The similar behavior of $\tilde{Q}_{L,\omega_i}$ for $\omega_2 := 2\pi\cdot100\,\mathrm{kHz}$ and $\omega_3 := 2\pi\cdot100\,\mathrm{MHz}$ can be deduced physically from the setup of the underlying boundary value problem (see the commentary on Figure 5.1). Given the material characteristics in Figure 5.1, it is to be expected that, above a certain threshold frequency, the electromagnetic field is almost completely shielded from the conducting domains. Hence, it is to be expected that $Q_L(\omega,x_1,x_2)$ in (5.10b) changes more at lower frequencies than at higher frequencies.

Recalling (5.34) and (5.35), I argue that the essentially different behavior of $\tilde{j}_{P_L,\omega_i}$ for $\omega_2 := 2\pi\cdot100\,\mathrm{kHz}$ and $\omega_3 := 2\pi\cdot100\,\mathrm{MHz}$ can be explained by the extent to which the skin effect and the proximity effect act.
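The frequency dependence behind this argument can be made tangible with the classical skin depth $\delta = \sqrt{2/(\omega\mu\sigma)}$, evaluated at the three angular frequencies of Figure 5.2. The following Python sketch assumes copper conductors ($\sigma \approx 5.8\times10^{7}$ S/m) and $\mu = \mu_0$. Both material values are illustrative assumptions that are not taken from the setup of Figure 5.1, and the sketch is not the estimate in (5.13).

```python
import math

MU_0 = 4e-7 * math.pi   # vacuum permeability in H/m
SIGMA_CU = 5.8e7        # assumed conductivity of copper in S/m (illustrative)

def skin_depth(omega, sigma=SIGMA_CU, mu=MU_0):
    """Classical skin depth delta = sqrt(2 / (omega * mu * sigma)) in metres."""
    return math.sqrt(2.0 / (omega * mu * sigma))

# the three angular frequencies of Figure 5.2
omegas = {
    "omega_1 = 2*pi*50 Hz": 2 * math.pi * 50.0,
    "omega_2 = 2*pi*100 kHz": 2 * math.pi * 100e3,
    "omega_3 = 2*pi*100 MHz": 2 * math.pi * 100e6,
}
for label, omega in omegas.items():
    print(f"{label}: delta = {skin_depth(omega) * 1e3:.4f} mm")
```

Under these assumptions, $\delta$ shrinks from roughly 9.3 mm at $\omega_1$ to about 0.21 mm at $\omega_2$ and about 6.6 µm at $\omega_3$. This is consistent with the qualitative picture of increasingly pronounced skin and proximity effects toward higher frequencies.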

However, a legitimate objection from a numerical analysis viewpoint is whether it is justified to assume that the high-fidelity models in (5.31) are uniformly built upon the combination $(T^{\mathrm{2D}}_{h_{1,2}}, \Delta^{\mathrm{2D}}_{Ax=b})$ in Table 5.3. Implicit in this assumption is the further assumption that this combination resolves the action of the skin effect and the proximity effect sufficiently well for all frequencies under consideration.

From an application-driven viewpoint, the chosen combination $(T^{\mathrm{2D}}_{h_{1,2}}, \Delta^{\mathrm{2D}}_{Ax=b})$, which is based on numerical investigations at the median of the ordered set $W_I$ in (5.31), can be interpreted as a compromise between accuracy and speed. In the case of Figure 5.2, $W_I := \{\omega_1,\omega_2,\omega_3\}$, so the median of the ordered set $W_I$ is $\omega_2 := 2\pi\cdot100\,\mathrm{kHz}$.

Alternatively, one could base the combination $(T^{\mathrm{2D}}_{h_{1,2}}, \Delta^{\mathrm{2D}}_{Ax=b})$ on numerical investigations at the highest frequency of the ordered set $W_I$ in (5.31). This choice emphasizes accuracy over speed, since the setup required for the highest frequency is also used for all lower frequencies.

Another possible path is to choose the combination $(T^{\mathrm{2D}}_{h_{1,2}}, \Delta^{\mathrm{2D}}_{Ax=b})$ adaptively for each frequency of the ordered set $W_I$ in (5.31).
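The three paths can be contrasted in a small Python sketch. It replaces the actual combination in Table 5.3 by a single mesh size and uses a hypothetical resolution rule: at least two elements per skin depth, with a copper-like conductivity. Neither the rule nor the numbers stem from this work. `statistics.median_low` is used so that the "median" is always an element of $W_I$.

```python
import math
import statistics

MU_0 = 4e-7 * math.pi   # vacuum permeability in H/m
SIGMA = 5.8e7           # assumed (copper-like) conductivity in S/m, illustrative only

def skin_depth(omega):
    """Classical skin depth sqrt(2 / (omega * mu_0 * sigma)) in metres."""
    return math.sqrt(2.0 / (omega * MU_0 * SIGMA))

def required_mesh_size(omega, elements_per_depth=2):
    """Hypothetical resolution rule: at least `elements_per_depth` elements per skin depth."""
    return skin_depth(omega) / elements_per_depth

def mesh_plan(omegas, strategy):
    """Mesh size used at each frequency of the ordered set W_I under a given strategy."""
    ordered = sorted(omegas)
    if strategy == "median":      # compromise between accuracy and speed
        h = required_mesh_size(statistics.median_low(ordered))
        return {w: h for w in ordered}
    if strategy == "highest":     # accuracy over speed
        h = required_mesh_size(ordered[-1])
        return {w: h for w in ordered}
    if strategy == "adaptive":    # one setup per frequency
        return {w: required_mesh_size(w) for w in ordered}
    raise ValueError(f"unknown strategy: {strategy!r}")

W_I = [2 * math.pi * 50.0, 2 * math.pi * 100e3, 2 * math.pi * 100e6]
for strategy in ("median", "highest", "adaptive"):
    sizes = mesh_plan(W_I, strategy)
    print(strategy, [f"{h * 1e6:.1f} um" for h in sizes.values()])
```

With three frequencies, the "median" strategy resolves $\omega_3$ more coarsely than this hypothetical rule demands, which mirrors the objection raised above. The "highest" strategy over-resolves $\omega_1$ and $\omega_2$ and thus trades speed for accuracy.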

Recalling § 4.4.1, one can conceptualize the above-mentioned paths concerning the combination $(T^{\mathrm{2D}}_{h_{1,2}}, \Delta^{\mathrm{2D}}_{Ax=b})$ by employing, for instance, a diagrammatic notation such as in (4.47) for each frequency of the ordered set $W_I$ in (5.31).

Observing Figure 5.2 and recalling § 3.2.1, one may also choose a different sample size $m$ for each frequency of the ordered set $W_I$ in (5.31). Note that such a choice affects the estimate in (5.33).
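To illustrate how per-frequency sample sizes would enter such a count, the sketch below assumes, for illustration only, that the estimate adds up $m_i$ sampling-plan evaluations and a fixed budget of refinement evaluations for each frequency. The exact form of (5.33) is not restated here, and all numbers are hypothetical.

```python
def total_hf_evaluations(sample_sizes, refinement_budgets):
    """Hypothetical count of high-fidelity evaluations when each O^(i) is treated individually.

    sample_sizes[i]       -- number m_i of sampling-plan points for frequency omega_i
    refinement_budgets[i] -- maximum number of refinement evaluations for omega_i
    """
    if len(sample_sizes) != len(refinement_budgets):
        raise ValueError("need one sample size and one refinement budget per frequency")
    return sum(m_i + d_i for m_i, d_i in zip(sample_sizes, refinement_budgets))

# uniform choice: m_w * (m + d) with m_w = 3, m = 50 and a hypothetical d = 5
uniform = total_hf_evaluations([50, 50, 50], [5, 5, 5])
# per-frequency choice with hypothetical sample sizes
per_frequency = total_hf_evaluations([30, 50, 80], [5, 5, 5])
print(uniform, per_frequency)  # 165 175
```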


5.2 Common-Mode Choke

5.2.1 Preliminary consideration

The device under test is a common-mode choke (CMC), another representative of the class of inductive components. The notational and methodological exposition is similar to that in § 5.1. Thus, the focus is primarily on adding further aspects to the discussion of surrogate optimization with the magnetoquasistatic model.

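[Figure 5.3: panels (i) to (iv); labels recovered from the figure: boundary $\partial\Omega^{\mathrm{2D}}$ and subdomains $\Omega^{\mathrm{2D}}_{\mathrm{nc},1}$, $\Omega^{\mathrm{2D}}_{\mathrm{nc},2}$, $\Omega^{\mathrm{2D}}_{\mathrm{nc},3}$, $\Omega^{\mathrm{2D}}_{\mathrm{c},1b_1}$, $\Omega^{\mathrm{2D}}_{\mathrm{c},2b_1}$, $\Omega^{\mathrm{2D}}_{\mathrm{c},1a_N}$, $\Omega^{\mathrm{2D}}_{\mathrm{c},2a_N}$]

FIGURE 5.3: (i) A schematic illustration of a common-mode choke with a longitudinally symmetric domain. (ii) A simplicial triangulation $T_h$ of the space region $\Omega^{\mathrm{2D}}$ via FEMM 4.2. (iii) Magnetic field lines of a common-mode choke due to common-mode (CM) currents. (iv) Magnetic field lines of a common-mode choke due to differential-mode (DM) currents.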

For a three-dimensional representation of a common-mode choke, see, e.g., Figure 1.1. Part (i) of Figure 5.3 shows a schematic illustration of a two-dimensional representation of a common-mode choke with a longitudinally symmetric domain.⁶

The subdomains $\Omega^{\mathrm{2D}}_{\mathrm{nc},1}$ and $\Omega^{\mathrm{2D}}_{\mathrm{nc},3}$ behave similarly to the subdomain $\Omega^{\mathrm{2D}}_{\mathrm{nc},1}$ in Figure 5.1, and the subdomain $\Omega^{\mathrm{2D}}_{\mathrm{nc},2}$ behaves similarly to the subdomain $\Omega^{\mathrm{2D}}_{\mathrm{nc},2}$ in Figure 5.1. Furthermore, the conducting subdomains (indexed by c) behave similarly to the conducting subdomains in Figure 5.1. A notable difference in Figure 5.3 is the existence of a primary winding (that is, pairs of subdomains $(\Omega^{\mathrm{2D}}_{\mathrm{c},1a_i}, \Omega^{\mathrm{2D}}_{\mathrm{c},1b_i})$ with $i \in \{1,\dots,N\}$) and a secondary winding (i.e., pairs of subdomains $(\Omega^{\mathrm{2D}}_{\mathrm{c},2a_i}, \Omega^{\mathrm{2D}}_{\mathrm{c},2b_i})$ with $i \in \{1,\dots,N\}$).

Part (iii) of Figure 5.3 shows the magnetic field lines of a common-mode choke due to common-mode (CM) currents. By abuse of notation regarding (5.6), let us denote the two possible directions of the current density orthogonal to the two-dimensional domain $\Omega^{\mathrm{2D}}_{\mathrm{nc},1}$ by $I_0^+$ and $I_0^-$. Hence, let us encode CM currents as the ordered pair $(I_0^+, I_0^-)$ for each ordered pair $(\Omega^{\mathrm{2D}}_{\mathrm{c},1a_i}, \Omega^{\mathrm{2D}}_{\mathrm{c},1b_i})$ and $(\Omega^{\mathrm{2D}}_{\mathrm{c},2a_i}, \Omega^{\mathrm{2D}}_{\mathrm{c},2b_i})$.

Part (iv) of Figure 5.3 shows the magnetic field lines of a common-mode choke due to differential-mode (DM) currents. Accordingly, let us encode DM currents as the ordered pair $(I_0^+, I_0^-)$ for each ordered pair $(\Omega^{\mathrm{2D}}_{\mathrm{c},1a_i}, \Omega^{\mathrm{2D}}_{\mathrm{c},1b_i})$ and the ordered pair $(I_0^-, I_0^+)$ for each ordered pair $(\Omega^{\mathrm{2D}}_{\mathrm{c},2a_i}, \Omega^{\mathrm{2D}}_{\mathrm{c},2b_i})$. For an in-depth elaboration on the different modes of operation of a common-mode choke, I refer to, e.g., [164, pp. 346–352] and references therein.
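The CM and DM encodings can be written down as a small lookup table. As an idealized illustration, and not a result of the FEMM model, the Python sketch below assumes that both windings have the same winding sense. Under that assumption, summing the a-side contributions reproduces the usual textbook behavior: CM currents add up in the core, whereas DM currents cancel there. The function names and the choice of nine turns are hypothetical.

```python
# Sign of the current density orthogonal to the 2D plane: +1 for I0^+, -1 for I0^-.
# Each winding is described by its (a-side, b-side) pair of conducting subdomains.
CM = {"primary": (+1, -1), "secondary": (+1, -1)}   # common-mode encoding
DM = {"primary": (+1, -1), "secondary": (-1, +1)}   # differential-mode encoding

def is_go_and_return(mode):
    """Every winding carries the same current out of and back into the plane."""
    return all(a + b == 0 for a, b in mode.values())

def core_mmf(mode, turns, i0):
    """Idealized net magnetomotive force (ampere-turns) driving flux around the core.

    Assumption of this sketch: both windings have the same winding sense, so the
    a-side currents of both windings drive core flux in the same direction.
    """
    return sum(turns * i0 * a_side for a_side, _b_side in mode.values())

for name, mode in (("CM", CM), ("DM", DM)):
    print(name, is_go_and_return(mode), core_mmf(mode, turns=9, i0=1.0))
```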

In § 5.1, we discussed two optimization problems concerning a single inductive component. Hence, I regard these optimization problems as embedded in the component level of numerical investigations concerning inductive components.

If one considers these optimization problems with regard to two or more induc-

tive components or other components such as, e.g., in Appendix B.1, then I conceive

the corresponding optimization problems as lifted onto the system level of numerical

investigations concerning inductive components.

Instead of lifting the optimization problems in (5.17) and (5.31) onto the system level, let us consider two other kinds of optimization problems at the system level in order to broaden our view of the applications of surrogate optimization for inductive components.

In what follows, however, let us restrict our attention within surrogate optimization to surrogate-based optimization. Observe that the use of surrogate-guided optimization for the validation and verification of the results of a surrogate-based optimization is analogous to the exposition in § 5.1.2.

5.2.2 Optimization problem I

Recalling the EMC filter in Figure 1.1, let us consider a prototypical version of a simple EMC filter in Figure 5.4.⁷

⁶ Note that early-stage parts of the corresponding simulation code were developed during a student project titled "Spulen-Optimierung mit Ersatzmodellen" [in English: "Coil optimization with surrogate models"] (Marie Krause, Mandy Domke, winter term 2017/2018; unpublished) under my scientific supervision and reviewed by the first reviewer of the present work.

⁷ Note that early-stage parts of the corresponding simulation code were developed during a master thesis titled "Multi-fidelity modeling for electromagnetic compatibility problems" (Rodrigo Silva Rezende, summer term 2020; unpublished) under my scientific supervision and the scientific supervision of Dr. Jan Hansen (Robert Bosch GmbH), and reviewed by the first reviewer of the present work and Prof. Dr. Stefan Kurz (TU Darmstadt). Moreover, the example in Figure 5.4 is part of a series of numerical studies for joint publications in preparation for submission titled "Vector-Valued Multi-Fidelity Surrogate Modeling for Microwave Components Design" (R. S. Rezende, M. Hadžiefendić, R. Schuhmann) and "Multi-Output Variable-Fidelity Bayesian Optimization of a Common Mode Choke" (R. S. Rezende, M. Hadžiefendić, J. Hansen, R. Schuhmann).

From a theory-driven viewpoint, examples such as in Figure 5.4 or in Figure B.2b leave the realm of the magnetoquasistatic model (recall § 2.1.3).

However, from an application-driven viewpoint, I deem it reasonable to argue that, roughly speaking, the chosen frequency range determines whether the magnetoquasistatic subsystem of Maxwell's equations, the complete system of Maxwell's equations, or both are useful for gaining knowledge about the application at hand.

If one considers a frequency range such as, e.g., from 50 Hz to 100 MHz (recall Figure 5.2) or from 0 Hz to 200 MHz, then I claim that, for a given task, it might be beneficial to take into account both the magnetoquasistatic subsystem and the complete system of Maxwell's equations. Note, though, that the boundary of the region in which the modeling domains of the subsystem and the complete system overlap is probably fuzzy. For one classification of frequency ranges, I refer to, e.g., [164, p. 19].

[Figure 5.4: 3D model with labels $x_1$, $C_1$, $\Omega^{\mathrm{3D}}_{\mathrm{c},1}$, $\Omega^{\mathrm{3D}}_{\mathrm{c},2}$, $\Omega^{\mathrm{3D}}_{\mathrm{nc},1}$, $\Omega^{\mathrm{3D}}_{\mathrm{nc},2}$]

FIGURE 5.4: A prototypical version of a simple EMC filter created within CST Studio Suite®. The nomenclature from (i) of Figure 5.3 is adapted. The values $C_1 := 1\times10^{-5}\,\mathrm{F}$ and $C_2 := 2\times10^{-5}\,\mathrm{F}$ refer to the respective capacitances (see Figure B.1). The boundary $\partial\Omega^{\mathrm{3D}}$, the parts relevant to the excitation (four discrete ports and the ground planes), and other $(x_1,x_2,x_3)$-dependent geometrical entities (cf. Figure 5.1) are not depicted.

In the present work, however, I do not dwell on epistemological issues regarding the overlapping modeling region of the magnetoquasistatic subsystem and the complete system of Maxwell's equations, nor on the intricacies of boundary value problems for the complete system.

Let us mainly treat the application example in Figure 5.4 as a high-fidelity model in the sense of a black-box model (recall Figure 1.2) within the rapid-prototyping part of a product development cycle.

Hence, it is assumed that, for the application in Figure 5.4, there are suitable parts relevant to the excitation, such as four discrete ports and ground planes, and suitable conditions at the boundary $\partial\Omega^{\mathrm{3D}}$, such as a mix of open boundary conditions and electric boundary conditions. Concerning the domain $\Omega^{\mathrm{3D}}$, let us take as given the existence of $(x_1,x_2,x_3)$-dependent geometrical entities (cf. Figure 5.1) other than those depicted in Figure 5.4.
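The frequency-range argument made earlier in this subsection can be quantified very roughly by comparing a characteristic length of the device with the free-space wavelength. The Python sketch below uses one common rule of thumb that is not taken from this work: quasistatic modeling is regarded as plausible if the length is below a tenth of the wavelength. The threshold is itself a fuzzy choice, and the wavelength inside materials is shorter than in free space. The lengths used are hypothetical and are not the dimensions of Figure 5.4.

```python
C0 = 299_792_458.0  # speed of light in vacuum in m/s

def electrical_size(length, frequency):
    """Ratio of a characteristic length (m) to the free-space wavelength at `frequency` (Hz)."""
    return length * frequency / C0

def quasistatic_plausible(length, frequency, threshold=0.1):
    """Rule-of-thumb check (assumption): quasistatic modeling is plausible if length < threshold * wavelength."""
    return electrical_size(length, frequency) < threshold

# hypothetical characteristic lengths; they are not the dimensions of Figure 5.4
for length, frequency in ((0.05, 50.0), (0.05, 100e6), (0.5, 200e6)):
    print(length, frequency, quasistatic_plausible(length, frequency))
```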


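Therefore, one can read out, for instance, the $S_{22}$-parameter as a function of the angular frequency $\omega \in \mathbb{R}^+$ and the three geometrical parameters $x_1, x_2, x_3 \in \mathbb{R}^+$ in Figure 5.4. Recalling our commentary on S-parameters in § 4.4.2, let us conceive $S_{22}$ as the magnitude in dB of the corresponding $S_{22}$-parameter. Finally, one can overload the $S_{22}$-parameter in the sense that, similarly to (5.10), one defines the function $S_{22}$ that reads

$$S_{22} = (\omega,x_1,x_2,x_3) \mapsto S_{22}(\omega,x_1,x_2,x_3) : \mathbb{R}^+\times\mathbb{R}^+\times\mathbb{R}^+\times\mathbb{R}^+ \to \mathbb{R}. \tag{5.37}$$

Analogously to (5.11) and (5.12), one can construct a map $S_{22,\omega}$ and, given a fixed angular operating frequency $\omega_0$, a map $S_{22,\omega_0}$ such that

$$\begin{aligned}
S_{22,\omega} &= \omega \mapsto \big((x_1,x_2,x_3) \mapsto S_{22}(\omega,x_1,x_2,x_3)\big) : \mathbb{R}^+ \to \mathbb{R}^{\mathbb{R}^+\times\mathbb{R}^+\times\mathbb{R}^+}, && \text{(5.38a)}\\
S_{22,\omega_0} &= (x_1,x_2,x_3) \mapsto S_{22}(\omega_0,x_1,x_2,x_3) : \mathbb{R}^+\times\mathbb{R}^+\times\mathbb{R}^+ \to \mathbb{R}, && \text{(5.38b)}
\end{aligned}$$

where $S_{22,\omega_0} \equiv S_{22,\omega}(\omega_0)$.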
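The passage from (5.37) to (5.38) is a currying step: fixing $\omega$ first yields a function of the geometry alone. The following Python sketch shows this step. The function `s22_placeholder` is a hypothetical stand-in with an arbitrary formula, not the CST high-fidelity model; only the structure of the maps is meant to carry over.

```python
import math
from typing import Callable, Tuple

Point = Tuple[float, float, float]

def s22_placeholder(omega: float, x1: float, x2: float, x3: float) -> float:
    """Hypothetical stand-in for the high-fidelity S22 response in dB (NOT the CST model)."""
    return -20.0 * math.log10(1.0 + omega * 1e-9 * (x1 + x2) / x3)

def curry_in_frequency(
    s22: Callable[[float, float, float, float], float]
) -> Callable[[float], Callable[[Point], float]]:
    """(5.38a): omega |-> ((x1, x2, x3) |-> S22(omega, x1, x2, x3))."""
    def s22_omega(omega: float) -> Callable[[Point], float]:
        def s22_omega0(x: Point) -> float:   # (5.38b) for the fixed omega
            return s22(omega, *x)
        return s22_omega0
    return s22_omega

S22_omega = curry_in_frequency(s22_placeholder)
S22_omega0 = S22_omega(2 * math.pi * 120e6)   # S22_{omega_0} = S22_omega(omega_0)
print(S22_omega0((45e-3, 20e-3, 5e-3)))
```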

Similarly to the application in Figure 5.1, those geometrical parameters that affect, among other things, the shape of the core of an inductive component inevitably affect the corresponding inductance (see Figure B.1) of the inductive component as well (recall, e.g., Figure 5.2). Moreover, Figure B.3 indicates that a change of the inductance affects the impedance of an inductive component. Supposing an appropriate relationship between the impedance of the inductive component and the corresponding S-parameters, it is reasonable to assume that a change of the input parameters in (5.37) affects the shape of $S_{22}(\omega,x_1,x_2,x_3)$ in a similar way as the change of parameters affects the shape of $Z_1(\omega)$ and $Z_2(\omega)$ in Figure B.3.⁸
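The assumed link between impedance and S-parameters can be illustrated with the simplest possible case. Consider an idealized two-port consisting only of a series impedance $Z$ between two ports with reference impedance $Z_0$, for which $S_{22} = Z/(Z + 2Z_0)$. This circuit is an assumption made for illustration and is not the filter of Figure 5.4. The Python sketch shows that changing a (hypothetical) series inductance changes $|S_{22}|$ in dB.

```python
import math

def s22_series_impedance(z, z0=50.0):
    """S22 of an idealized two-port consisting only of a series impedance z between two z0 ports."""
    return z / (z + 2.0 * z0)

def magnitude_db(s):
    """Magnitude of an S-parameter in dB, i.e. 20 * log10(|S|)."""
    return 20.0 * math.log10(abs(s))

omega = 2 * math.pi * 120e6                 # angular frequency in rad/s
for inductance in (50e-9, 100e-9, 200e-9):  # hypothetical inductances in H
    z = 1j * omega * inductance             # impedance of an ideal inductor
    print(f"L = {inductance * 1e9:.0f} nH: |S22| = {magnitude_db(s22_series_impedance(z)):.2f} dB")
```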

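Thus, by varying the optimization problem in (5.31), one can formulate an optimization problem for the application in Figure 5.4 that reads

$$\begin{aligned}
O = i \mapsto \min.\ & S_{22,\omega_i}(x_1,x_2,x_3) && \text{(5.39a)}\\
\text{s.t. } & x_{1,\min} \le x_1 \le x_{1,\max}, && \text{(5.39b)}\\
& x_{2,\min} \le x_2 \le x_{2,\max}, && \text{(5.39c)}\\
& x_{3,\min} \le x_3 \le x_{3,\max}, && \text{(5.39d)}
\end{aligned}$$

where $I := \{1,\dots,m_w\}$, $\omega_i \in W_I$ with $W_I := \{\omega_1,\dots,\omega_{m_w}\}$, and $x_{1,\min} := 40\times10^{-3}\,\mathrm{m}$, $x_{1,\max} := 50\times10^{-3}\,\mathrm{m}$, $x_{2,\min} := 10\times10^{-3}\,\mathrm{m}$, $x_{2,\max} := 30\times10^{-3}\,\mathrm{m}$, $x_{3,\min} := 2\times10^{-3}\,\mathrm{m}$, and $x_{3,\max} := 7\times10^{-3}\,\mathrm{m}$.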

A possible aim concerning (5.39) is to determine the optimal solution $x^*$ such that

$$\forall (\omega_i, x) \in W_I \times X.\ S_{22,\omega_i}(x^*) \le S_{22,\omega_i}(x) \tag{5.40}$$

holds (cf. (4.18)). Observe, though, the slight difference between the formulation of the optimal solution in (5.40) and that in (5.32).
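Condition (5.40) asks for a single point $x^*$ that is optimal for all $\omega_i \in W_I$ simultaneously, whereas a per-frequency view yields one minimizer per frequency. The following Python sketch contrasts the two on a finite candidate list. Replacing the feasible set $X$ by finitely many candidates, and the numbers in the usage example, are assumptions of this sketch.

```python
def uniformly_optimal_candidates(values):
    """Indices k with values[i][k] <= values[i][j] for every frequency i and every candidate j.

    values[i][k] is the objective S22 (in dB) at frequency omega_i for candidate point x_k;
    the feasible set X is replaced by a finite candidate list (assumption of this sketch).
    """
    n_candidates = len(values[0])
    return [k for k in range(n_candidates)
            if all(row[k] <= min(row) for row in values)]

def per_frequency_optima(values):
    """One minimizer per frequency, i.e. each frequency considered individually."""
    return [min(range(len(row)), key=row.__getitem__) for row in values]

# hypothetical values: rows = frequencies, columns = candidate points
shared = [[-10.0, -12.0, -11.0], [-9.0, -13.0, -8.0]]
conflicting = [[-10.0, -12.0], [-14.0, -11.0]]
print(uniformly_optimal_candidates(shared), per_frequency_optima(shared))             # [1] [1, 1]
print(uniformly_optimal_candidates(conflicting), per_frequency_optima(conflicting))   # [] [1, 0]
```

In the second example, no point satisfies (5.40) on the candidate list, although each frequency on its own has a minimizer.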

In practical applications, it is common to observe $S_{22,\omega_i}$ in (5.39) over, for instance, a frequency range from 0 Hz to 150 MHz with $m_w := 1001$. Thus, it is assumed that $S_{22,\omega_i}$ is quickly available for all $\omega_i \in W_I$. Therefore, the high-fidelity optimization problems $O^{(i)}$ in (5.39) are not considered individually.

⁸ I conceive $S_{22}(\omega,x_1,x_2,x_3)$ as the magnitude in dB of the corresponding $S_{22}(\omega,x_1,x_2,x_3)$-parameter, whereas I conceive $Z_1(\omega)$ and $Z_2(\omega)$ as the magnitudes in Ω of the corresponding impedances $Z_1(\omega)$ and $Z_2(\omega)$, respectively. Hence, it is assumed that suitable maps exist to compare the shapes of $S_{22}(\omega,x_1,x_2,x_3)$, $Z_1(\omega)$, and $Z_2(\omega)$.

Hence, I propose to replace $m_w$ in the rough estimate in (5.33) by a problem-dependent number $c_{m_w} \in \mathbb{N}$ with $c_{m_w} \ge 1$. The number $c_{m_w}$ encodes potential differences in the counting methods of the corresponding surrogate optimization methods.

For the usage of surrogate-based optimization with respect to (5.39), let us focus on the case $i \in I$ with $I := \{1,2,3\}$ and $\omega_i \in W_I$ with $W_I := \{2\pi\cdot115\,\mathrm{MHz},\ 2\pi\cdot120\,\mathrm{MHz},\ 2\pi\cdot125\,\mathrm{MHz}\}$. This case reflects, for instance, the aim of finding a potential minimal value of an S-parameter around a certain frequency or within a frequency subrange.

Thus, using a Maximin LHC (recall Figure 3.1) with $m := 35$, one can construct the corresponding kriging low-fidelity model for each $O^{(i)}$ and compute the optimal solution with regard to each kriging low-fidelity model. Of the three resulting candidate optimal solutions, let us pick the one that evaluates to the lowest value. Hence, the optimal solution $\tilde{x}^*$ reads

$$\tilde{x}^* := (40.00\times10^{-3}\,\mathrm{m},\ 30.00\times10^{-3}\,\mathrm{m},\ 2.00\times10^{-3}\,\mathrm{m}). \tag{5.41}$$
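A Maximin LHC with $m := 35$ points in the box of (5.39) can be sketched as follows. The construction used here is a deliberately simple best-of-N random search over Latin hypercubes, which is an assumption of this sketch; the construction behind Figure 3.1 may differ. The seed and the number of candidates are hypothetical choices.

```python
import math
import random

def random_lhc(m, d, rng):
    """One random Latin hypercube: m points in [0, 1)^d, one point per stratum in every dimension."""
    columns = []
    for _ in range(d):
        strata = list(range(m))
        rng.shuffle(strata)
        # place each point uniformly inside its stratum [k/m, (k+1)/m)
        columns.append([(k + rng.random()) / m for k in strata])
    return [tuple(col[j] for col in columns) for j in range(m)]

def min_pairwise_distance(points):
    """Smallest Euclidean distance between any two points (the maximin criterion)."""
    return min(math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:])

def maximin_lhc(m, d, n_candidates=200, seed=0):
    """Best-of-N random search: keep the LHC whose minimum pairwise distance is largest."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    best, best_score = None, -1.0
    for _ in range(n_candidates):
        candidate = random_lhc(m, d, rng)
        score = min_pairwise_distance(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best

def scale(points, bounds):
    """Map points from the unit cube to the box constraints."""
    return [tuple(lo + u * (hi - lo) for u, (lo, hi) in zip(p, bounds)) for p in points]

# box constraints (5.39b)-(5.39d) in metres
BOUNDS = [(40e-3, 50e-3), (10e-3, 30e-3), (2e-3, 7e-3)]
plan = scale(maximin_lhc(m=35, d=3), BOUNDS)
```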

If we employ the optimal solution in (5.41) as an initial point for the implementation of the NMS algorithm (recall § 2.3.3) in CST Studio Suite® (recall Figure 5.4), then no significant change with respect to the optimal solution in (5.41) is observed after the maximum number of high-fidelity function evaluations $\tilde{m}_{\mathrm{DSO,max}} := 2$.

Consider a random initial point $x^{(0)}$ given by

$$x^{(0)} := (50.00\times10^{-3}\,\mathrm{m},\ 10.00\times10^{-3}\,\mathrm{m},\ 7.00\times10^{-3}\,\mathrm{m}). \tag{5.42}$$

Then, on average, $S_{22,\omega_i}$ at $\tilde{x}^*$ is more than 40% lower than $S_{22,\omega_i}$ at $x^{(0)}$.

If we use the initial point $x^{(0)}$ in (5.42) for the implementation of the NMS algorithm in CST Studio Suite®, then, after the maximum number of high-fidelity function evaluations $m_{\mathrm{DSO,max}} := 40$, it returns an optimal solution such as in (5.41). Setting $m_{\mathrm{DSO}} := m_{\mathrm{DSO,max}}$ and $m_{\mathrm{SBO}} := m + \tilde{m}_{\mathrm{DSO,max}}$, one can observe a use case such that

$$\exists\, m_{\mathrm{DSO}}, m_{\mathrm{SBO}} \in \mathbb{N}\setminus\{0\}.\ m_{\mathrm{DSO}} > m_{\mathrm{SBO}} \tag{5.43}$$

holds. However, concerning the statement in (5.43), the same caveats apply as for the statement in (5.30).
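With the numbers reported in this subsection ($m := 35$, $\tilde{m}_{\mathrm{DSO,max}} := 2$, $m_{\mathrm{DSO,max}} := 40$), the comparison behind (5.43) reduces to the following arithmetic:

$$m_{\mathrm{SBO}} = m + \tilde{m}_{\mathrm{DSO,max}} = 35 + 2 = 37 \;<\; 40 = m_{\mathrm{DSO,max}} = m_{\mathrm{DSO}}.$$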

For even more complex issues regarding surrogate optimization for the application example in Figure 5.4, the tools of the category-theoretical language in ch. 4 can provide useful notational and methodological guidance (see, e.g., § 4.4.2).

5.2.3 Optimization problem II

Figure 5.5 shows a representation of two common-mode chokes within FEMM 4.2 and their magnetic field lines due to DM currents.⁹

In the test cases, some relevant choices regarding the geometric modeling of the two CMCs are as follows: the number of turns of the primary winding and the number of turns of the secondary winding are both set to 9, and the radii of the cross-sectional areas of the primary winding and the secondary winding are both set to 2 mm. Furthermore, the geometrical specifications of the cores of both CMCs are defined as follows: the inner radius is set to 18 mm, the width to 12 mm, and the height to 15 mm.

⁹ The example in Figure 5.5 is part of a series of numerical studies for a joint publication in preparation for submission titled "Surrogate-guided Optimization based on the Space-Mapping Paradigm and the Co-Kriging Approach with Application in Electromagnetic Compatibility" (M. Hadžiefendić, R. S. Rezende, R. Schuhmann).

[Figure 5.5: panels (i) to (iv)]

FIGURE 5.5: A representation of two common-mode chokes within FEMM 4.2 and their magnetic field lines due to DM currents. (i) A schematic illustration of the two CMCs with geometrical parameters $(x_1,x_2) \in \mathbb{R}^+\times\mathbb{R}^+$, where $x_1 \equiv r_{\mathrm{cmc}}$ and $x_2 \equiv \phi_{\mathrm{cmc}}$. (ii) Magnetic field lines for the choice $(x_1,x_2) := (r_{\mathrm{cm}}, 0^\circ)$. (iii) Magnetic field lines for the choice $(x_1,x_2) := (r_{\mathrm{cm}}, 45^\circ)$. (iv) Magnetic field lines for the choice $(x_1,x_2) := (r_{\mathrm{cm}}, 90^\circ)$.

However, analogously to the exposition in § 5.2.2, let us mainly treat the application example in Figure 5.5 as a high-fidelity model in the sense of a black-box model within the rapid-prototyping part of a product development cycle.

Recalling the short list in § 5.1.1, let us formulate an optimization problem that is

concerned with the inductive coupling between the two CMCs.

From a field-theoretical perspective, I regard a situation such as in (ii) of Figure 5.5 as one of low inductive coupling between the two CMCs, and a situation such as in (iv) of Figure 5.5 as one of high inductive coupling between the two CMCs. For an electrical network-theoretical perspective on inductive coupling, see, e.g., [164, ch. 11] and references therein.
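From the network viewpoint referred to above, the strength of inductive coupling is commonly summarized by the coupling coefficient $k = M/\sqrt{L_1 L_2}$ with $|k| \le 1$. The optimization problem formulated below uses a different, field-based proxy. The Python sketch only illustrates the network quantity with hypothetical inductance values.

```python
import math

def coupling_coefficient(l1, l2, mutual):
    """Coupling coefficient k = M / sqrt(L1 * L2); |k| <= 1 for physical inductors."""
    if l1 <= 0 or l2 <= 0:
        raise ValueError("self-inductances must be positive")
    return mutual / math.sqrt(l1 * l2)

# hypothetical values: two 1 mH chokes with 10 uH and 100 uH mutual inductance
print(coupling_coefficient(1e-3, 1e-3, 10e-6))   # weak coupling
print(coupling_coefficient(1e-3, 1e-3, 100e-6))  # ten times stronger coupling
```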

Given this viewpoint, if we impose box constraints on the geometric parameters $x_1$ and $x_2$ depicted in (i) of Figure 5.5 such that $x_1 \in [x_{1,\min}, x_{1,\max}]$ and $x_2 \in [x_{2,\min}, x_{2,\max}]$, where $x_{1,\min} := 81.25\times10^{-3}\,\mathrm{m}$, $x_{1,\max} := 146.25\times10^{-3}\,\mathrm{m}$, $x_{2,\min} := 0^\circ$, and $x_{2,\max} := 90^\circ$, then one can expect the solution $x^*_H$ given by

$$x^*_H := (x_{1,\min}, x_{2,\max}) \tag{5.44}$$

to correspond to the situation of highest inductive coupling between the two CMCs.

Thus, in practical applications, it is desirable to detect a solution such as in (5.44) in order to avoid it. Note, though, that from a design viewpoint it is probably reasonable to regard

$$\forall x_1 \in [x_{1,\min}, x_{1,\max}].\ x^*_H := (x_1, x_{2,\max}) \tag{5.45}$$

as solutions to be avoided. That is, all solutions in (5.45) are perceived as equally bad.

Note that, analogously to (5.44) and (5.45), it is probably reasonable from a design viewpoint to regard

$$\forall x_1 \in [x_{1,\min}, x_{1,\max}].\ x^*_L := (x_1, x_{2,\min}) \tag{5.46}$$

as solutions to be pursued. That is, all solutions in (5.46) are perceived as equally good.

Given these preliminary thoughts, let us anticipate some kind of symmetric behavior of the evaluated objective function about the parameter configurations $(x_1, 45^\circ)$ (see (iii) of Figure 5.5).

Thus, let us heuristically choose a map $\hat{Q}_{L,\omega_0}$, defined analogously to (5.12b), as the objective function. The map $\hat{Q}_{L,\omega_0}$ serves as an approximate proxy that encodes the considerations about the inductive coupling of the two CMCs in Figure 5.5. Regarding the map $\hat{Q}_{L,\omega_0}$, it is supposed that $V \equiv \Omega^{\mathrm{2D}}$ in (B.4) and (B.5).

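Hence, let us investigate the following high-fidelity optimization problem

$$\begin{aligned}
\min.\ & \hat{Q}_{L,\omega_0}(x_1,x_2) && \text{(5.47a)}\\
\text{s.t. } & x_{1,\min} \le x_1 \le x_{1,\max}, && \text{(5.47b)}\\
& x_{2,\min} \le x_2 \le x_{2,\max}, && \text{(5.47c)}
\end{aligned}$$

where $\omega_0 := 2\pi\cdot50\,\mathrm{Hz}$, $x_{1,\min} := 81.25\times10^{-3}\,\mathrm{m}$, $x_{1,\max} := 146.25\times10^{-3}\,\mathrm{m}$, $x_{2,\min} := 0^\circ$, and $x_{2,\max} := 90^\circ$.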

In Figure 5.6, I present a contour representation of an evaluated kriging low-fidelity model $\tilde{Q}_{L,\omega_0}$ with respect to (5.47), using the Sobol quasi-random sequence sampling plan with $m := 50$.
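A Sobol sampling plan with $m := 50$ points in the box of (5.47) can be sketched without external libraries for the two-dimensional case. The Python sketch uses the standard first two Sobol dimensions: the base-2 van der Corput sequence, and the sequence built from the primitive polynomial $x + 1$ with $m_1 = 1$. Points are generated in natural (non-Gray-code) order, so the ordering may differ from library implementations. Whether this matches the generator used for Figure 5.6 is an assumption. The prefix check in the test reflects the reusability of samples mentioned in the conclusion.

```python
BITS = 32  # number of bits used for the direction numbers

def direction_numbers(dim):
    """Direction numbers V_1..V_BITS (as BITS-bit integers) for the first two Sobol dimensions."""
    if dim == 0:
        # dimension 1: base-2 van der Corput sequence, V_k = 2^(BITS - k)
        return [1 << (BITS - k) for k in range(1, BITS + 1)]
    # dimension 2: primitive polynomial x + 1, i.e. m_1 = 1 and m_k = 2 m_{k-1} XOR m_{k-1}
    v = [1 << (BITS - 1)]
    for _ in range(1, BITS):
        v.append(v[-1] ^ (v[-1] >> 1))
    return v

def sobol_2d(n):
    """First n points of the 2D Sobol sequence in natural (non-Gray-code) order."""
    dirs = [direction_numbers(0), direction_numbers(1)]
    points = []
    for i in range(n):
        coords = []
        for v in dirs:
            x, k, j = 0, 0, i
            while j:                  # XOR the direction numbers of the set bits of i
                if j & 1:
                    x ^= v[k]
                j >>= 1
                k += 1
            coords.append(x / 2**BITS)
        points.append(tuple(coords))
    return points

def scale(points, bounds):
    """Map points from the unit square to the box constraints."""
    return [tuple(lo + u * (hi - lo) for u, (lo, hi) in zip(p, bounds)) for p in points]

# box constraints of (5.47): x1 in metres, x2 in degrees
BOUNDS = [(81.25e-3, 146.25e-3), (0.0, 90.0)]
plan = scale(sobol_2d(50), BOUNDS)
```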

If we choose $m := 21$ and $\tilde{m}_{\mathrm{DSO,max}} := 5$, then one needs $m_{\mathrm{SBO}} := m + \tilde{m}_{\mathrm{DSO,max}} = 26$ high-fidelity function evaluations to find the optimal solution

$$x^*_L := (x_{1,\min}, x_{2,\min}), \tag{5.48}$$

which satisfies the statement in (5.46). However, if we set $m_{\mathrm{DSO,max}} := 30$ and choose, for instance, a random initial point $x^{(0)}$ given by

$$x^{(0)} := (146.25\times10^{-3}\,\mathrm{m},\ 90^\circ), \tag{5.49}$$

then one observes that, for $m_{\mathrm{DSO}} := m_{\mathrm{DSO,max}}$, we do not arrive at the optimal solution in (5.48). Hence, one can observe a use case such that

$$\exists\, m_{\mathrm{DSO}}, m_{\mathrm{SBO}} \in \mathbb{N}\setminus\{0\}.\ m_{\mathrm{DSO}} > m_{\mathrm{SBO}} \tag{5.50}$$

holds. Regarding the statement in (5.50), though, the same caution should be exercised as for the statement in (5.43).

[Figure 5.6: contour plot; horizontal axis $x_1$ [mm] (ticks 91.25, 111.25, 131.25), vertical axis $x_2$ [°]]

FIGURE 5.6: Given the combination $(T^{\mathrm{2D}}_{h_{1,2}}, \Delta^{\mathrm{2D}}_{Ax=b})$ in Table 5.3 and using the Sobol quasi-random sequence sampling plan with $m := 50$ and a kriging low-fidelity model with $\omega_0 := 2\pi\cdot50\,\mathrm{Hz}$; contour representation of $\tilde{Q}_{L,\omega_0}$ in (5.47). Dark colors indicate low values; bright colors indicate high values (cf. Figure 5.2).

With regard to the application example in Figure 5.5, the statement in (5.50) reflects the focus on a strategy for surrogate optimization rather than an investigation of formulations of the high-fidelity optimization problem.

The high-fidelity optimization problem in (5.47) is merely one approximate encoding of the issue concerning the inductive coupling of the two CMCs in Figure 5.5 within the mathematical framework of § 2.3. The in-depth investigation of other potential encodings is left for future work.


5.3 In closing

Assuming the context of an electrical engineering design workflow, we have developed a strategy for using the tools from ch. 3 in practical applications. Furthermore, we have identified some relevant spots where the tools from ch. 4 can have a beneficial impact as well.

We have elaborated on four high-fidelity optimization problems that are embed-

ded within the setting of a 2D-LBVP and a 3D-LBVP, respectively.

From the viewpoint of § 2.3, we have narrowed down our attention to optimiza-

tion problems that have an evaluated reduced parametric quantity of interest in the

objective functional and an evaluated reduced parametric quantity of interest in the

constraints besides box constraints; and to optimization problems that have an eval-

uated reduced parametric quantity of interest in the objective functional and box

constraints.

Moreover, given the semantics of multiple operating frequencies, we have briefly addressed its impact on the computational burden in terms of the number of high-fidelity function evaluations and its peculiarity regarding the formulation of a corresponding optimal solution. We have also discussed that if the high-fidelity optimization problems associated with each frequency are considered individually, then, figuratively speaking, the space of options concerning surrogate optimization expands, in the sense that, e.g., the number of sampling plan points of a data-fit low-fidelity model can be chosen adaptively for each frequency.

We have seen that various high-fidelity models with respect to the magnetoquasistatic model exhibit different behaviors, such that the insights from the investigations, e.g., in § 3.2.1 can be exploited, for instance, to roughly identify the behavior of the high-fidelity models with the behavior of a test function from Figure 2.2.

Additionally, we have exemplified the quick use of normalized global first-order

sensitivity measures as survey tools for the relevance of parameters and as poten-

tial approximate proxies for other indicators regarding the quality of a low-fidelity

model.

Using the methodological guidance from § 4.4.1, we have constructed some simp-

lified-physics low-fidelity models and we have computed the corresponding SSPCC

of these models.

We have deployed surrogate-guided optimization (see § 3.3) mainly in the con-

text of validation and verification of given results of a surrogate-based optimization.

A peculiarity is that we have utilized the co-kriging low-fidelity model in (3.184)

within the TRASM algorithm 3.1 in the spirit of hybrid model management strate-

gies (cf. § 3.4).

Note, however, that we have identified use cases where the number $m_{\mathrm{DSO}}$ of high-fidelity function evaluations for directly solving a high-fidelity optimization problem exceeds the number $m_{\mathrm{SBO}}$ for surrogate-based optimization; and we have identified a use case where, in addition, $m_{\mathrm{SBO}}$ exceeds the number $m_{\mathrm{SGO}}$ for surrogate-guided optimization. Concerning these numbers, we have also addressed some caveats and limits of comparability.


Chapter 6

Conclusion and outlook

To close, I distill a conclusion from § 2.4, § 3.4, § 4.6, and § 5.3; more precisely, I select a few particular insights from these sections to illustrate, from the frog's-eye view (i.e., at a more technical level), some of the present work's achievements.

Moreover, I adopt a bird’s-eye view to illustrate in which respect this work has

made some progress in the scientific thicket of full automation of the virtual pro-

totyping of power electronic systems (see chapter 1); and I present an outlook for

potential new endeavors that may stem from this work.

6.1 Conclusion

The whole zoo of optimization algorithms in § 2.3.3 has proven useful both for finding a solution of a given optimization problem and for cross-checking a computed optimal solution in the sense of validation and verification.

The gradient-based interpretation of sensitivity measures is well-suited for func-

tions that permit the determination of derivative information by forward mode auto-

matic differentiation. At the level of programs (see Figure 1.4), this interpretation is

especially beneficial if there is a sound interaction between a module for automatic

differentiation and a module for numerical integration to facilitate this interpreta-

tion’s embedding into production-level code.

To the best of my knowledge, sampling plans constructed from a Sobol quasi-random sequence are not yet widely represented in the literature on surrogate optimization. From a theoretical viewpoint, such sampling plans make samples of a given size reproducible. From a data management viewpoint, they enable an economical and sustainable handling of data by ensuring its reusability. Furthermore, such sampling plans may help to lower the computational burden of constructing a co-kriging low-fidelity model.

If we assume only a small number of sampling plan points for the high-fidelity model, then using the squared sample Pearson correlation coefficient in combination with the empirical generalization error within the k-fold cross-validation method requires a minimum number of sampling plan points that depends on k.
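Why such a minimum arises can be made concrete, assuming that $r^2$ is computed on each held-out fold. For two distinct points, $r^2$ equals 1 trivially, so the Python sketch below assumes at least three points per fold, which gives $m \ge 3k$. This threshold is an assumption of the sketch, not the bound derived in this work, and the helper names are hypothetical.

```python
def squared_pearson(a, b):
    """Squared sample Pearson correlation coefficient r^2 of two equally long sequences."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov * cov / (va * vb)

def kfold_indices(m, k):
    """Split the indices 0..m-1 into k contiguous folds whose sizes differ by at most one."""
    sizes = [m // k + (1 if i < m % k else 0) for i in range(k)]
    folds, start = [], 0
    for s in sizes:
        folds.append(list(range(start, start + s)))
        start += s
    return folds

def min_sampling_points(k, min_per_fold=3):
    """Hypothetical lower bound: each held-out fold needs >= min_per_fold points for a non-trivial r^2."""
    return k * min_per_fold

print(squared_pearson([0.0, 1.0], [5.0, -3.0]))  # two points: r^2 is trivially 1
print(min_sampling_points(5))                    # 15 for 5-fold cross-validation
```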

By carving out a potential link between the correlation coefficient and the sensi-

tivity measures, I have cautiously formulated a conjecture about the trustworthiness

of low-fidelity models’ normalized global first-order sensitivity measures. An im-

plication of this conjecture is that, in addition to their feature as survey tools for the

relevance of parameters, the sensitivity measures can serve as possible approximate

proxies for other indicators regarding the quality of a low-fidelity model.

However, it is unclear whether there exists a reliable complete list of indicators.

Despite this open point, a benchmark-focused classification of test functions has been provided that makes it possible to classify, very roughly, the behavior of a corresponding optimization problem within the magnetoquasistatic context.

Note that use cases have been provided where the number of high-fidelity function evaluations is higher for directly solving a high-fidelity optimization problem than for a surrogate-based optimization approach and for a surrogate-guided optimization approach. From an application-driven viewpoint in the context of validation and verification, I have argued that it is of additional value to check whether the number of high-fidelity model evaluations is higher for a surrogate-based optimization approach than for a surrogate-guided optimization approach. Regarding all these numbers, however, there are some caveats and limits of comparability, too.

Driven by heuristics, a purely formalization-oriented viewpoint has been ex-

ploited that has provided us with novel insights of theoretical value (such as poten-

tial hybrid model management strategies) and of practical value (such as convergence-

related issues within the space-mapping paradigm and regarding the quality of the

low-fidelity model within the co-kriging low-fidelity model).

The formalization-oriented viewpoint has culminated in the exposition of the category theory toolset, which represents merely a subset of the many tools available within category theory. The capability of the CT toolset as an algebraic modeling framework for applications in surrogate optimization within the electromagnetics context has been shown. More precisely, the strengths of the CT toolset as a notational scaffolding by diagrams of arrows have been illuminated.

In order to quantify the so-called modeling error, I have suggested a heuristics-

driven notion of a problem-dependent degree of forgetfulness as an auxiliary means.

Moreover, some classification tools at the level of generalized functions (see Fig-

ure 1.4) for the concept of multifidelity model management and the space mapping

notion have been propounded. Furthermore, a diagram of arrows has been pro-

posed as a common generic interface of two formalization issues related to coordi-

nate transformations.

From an application-oriented viewpoint, the intuition concerning the CT toolset

is especially relevant in order to facilitate the CT toolset’s wider acceptance. Hence,

there has been an attempt to balance the need for rigor and the need for intuition.

Admittedly, however, further investigations are necessary to set the applications of

the category theory toolset on an even more rigorous foundation.

Finally, representatives of the class of inductive components have been invoked

and I have examined four high-fidelity optimization problems that are embedded

within the setting of a two-dimensional linear boundary value problem and a three-

dimensional linear boundary value problem, respectively.

By supposing the context of an electrical engineering design workflow, I have

propounded a strategy of using the surrogate optimization tools of the present work

in practical applications. Moreover, some promising spots for a beneficial utilization

of the category theory toolset have been illuminated, too.

Finally, let me elucidate briefly in which respect this thesis achieves some pro-

gress concerning its ideal long-term goal, that is, the full automation of the virtual

prototyping of power electronic systems. This ideal goal can be approximately con-

ceived as the development of a user-independent software system that performs the

mathematical modeling, numerical simulation and optimization given an applica-

tion within the electromagnetics context.

The sheer complexity of such a goal demands expertise from numerous specialties. Thus, any endeavor towards this goal is likely to be highly interdisciplinary by nature.


This thesis has provided some indication that the category theoretical language

can be a serious candidate for the important position of a mediator that enables a

smooth interplay between diverse fields such as computer science, numerical anal-

ysis, and electrical engineering.

Moreover, this thesis has provided some indication that surrogate optimization

can be valuable for power electronic applications at a component level as well as at a

system level in the context of performance-oriented optimization and in the context

of validation and verification.

Category theory’s inherent emphasis on formal aspects of a given problem and

its closeness to type theory in programming language theory suggest that it is a good

companion in the pursuit of a user-independent software system. Furthermore, its

high level of mathematical abstraction and its close relationship to logic also make it a good companion in the pursuit of mathematical modeling, simulation, and optimization of a given application within any physics-inspired semantics.

In a nutshell, the category theoretical toolset offers some help to handle the com-

plexity of this thesis’ ideal long-term goal by providing algebraic and visual tools

to convey complex formalization ideas that arise naturally in the context of, for instance, multi-fidelity modeling in surrogate optimization. Thus, the CT toolset can assist in the ongoing challenging search for novel surrogate-guided optimization methods for power electronic applications.

6.2 Outlook

I will first mention some specific potential new endeavors regarding chapter 2, chapter 3, chapter 4, and chapter 5; and then I will mention a general potential new endeavor that is associated with the Disclaimer in § 1.3.

Concerning ch. 2, it is desirable to extend the size of the subset of optimization

test functions. Moreover, I also deem it advisable to include more test functions

that admit a natural generalization to higher dimensions. With regard to sensitivity

measures, it seems worthwhile to compare various interpretations.

Concerning ch. 3, it might be fruitful to extend the numerical investigations re-

garding the optimization with test functions by data-fit low-fidelity models. For

instance, more data-fit low-fidelity models could be taken into account. The thorough investigation of sequential kriging optimization and sequential co-kriging optimization with regard to the optimization problems in ch. 5 appears as an intriguing

venture, too. Furthermore, the proper incorporation of many low-fidelity models in

a surrogate-guided optimization approach might be a promising path as well.

Concerning ch. 4, future use cases for the CT toolset have been discussed a bit more extensively in § 4.5. Undoubtedly, however, it is highly desirable to seek out

more examples of applications within the electromagnetics context for the category

theory toolset.

Concerning ch. 5, it is worthwhile to extend the number of parameters and the

number of optimization problems. Moreover, given the semantics of multiple op-

erating frequencies, I deem it beneficial to explore exhaustively the many possible

options concerning surrogate optimization.

Finally, recalling the Disclaimer in § 1.3, it might be fruitful to incorporate uncertainty quantification, parallel computing, and automation aspects all together into

the context of surrogate optimization under the guidance of the CT toolset in or-

der to develop a, for lack of a better word, robust, parallel, and automated surrogate

optimization guided by category theoretical ideas.


Appendix A

Multivariate polynomials (§ 3.1.2)

A.1 Reparametrization using mean-centered arguments

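With regard to fostering numerically stable computations, some authors (see, e.g., [61, p. 27]) recommend performing a reparametrization using mean-centered arguments, i.e., the argument $x$ in (3.54) is mapped to $x - \bar{x}$ via the map $T_{\bar{x}} := x \mapsto x - \bar{x} : \mathbb{R}^{d\times 1} \to \mathbb{R}^{d\times 1}$, where, supposing a sampling plan $X_s$, the components of $\bar{x} \in \mathbb{R}^{d\times 1}$ are determined by the componentwise means of the sampling plan points in (3.14). The map $T_{\bar{x}}$ enables one to define a map $p_{\bar{x}} := (p \circ T_{\bar{x}})$ that can be graphically represented as

$$\mathbb{R}^{d\times 1} \xrightarrow{\;T_{\bar{x}}\;} \mathbb{R}^{d\times 1} \xrightarrow{\;p\;} \mathbb{R}, \qquad p_{\bar{x}} := (p \circ T_{\bar{x}}), \tag{A.1}$$

where its assignment is encoded by

$$(p \circ T_{\bar{x}})(x) := \beta_0 + l_e(x - \bar{x}) + q_A(x - \bar{x}). \tag{A.2}$$

Analogous to the construction of the matrix $B$ in (3.59), one can define a matrix $B_{\bar{x}}$.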

In Table A.1, one can observe that there is an improvement regarding the condition number $\kappa(B_{\bar{x}}^T B_{\bar{x}})$ w.r.t. $X_{s,1}$ and $X_{s,3}$. In the case of the sampling plan $X_{s,1}$, the absolute difference is $7.67\times 10^{3}$ and the relative difference is 72.36%. In the case of the sampling plan $X_{s,3}$, the absolute difference is $2.2\times 10^{3}$ and the relative difference is 68.11%. In the case of the sampling plan $X_{s,2}$, the matrix $B_{\bar{x}}^T B_{\bar{x}}$ is singular, which shows that a de facto singular matrix $B^TB$ remains singular in the course of a reparametrization of the form encoded in (A.2).

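TABLE A.1: The condition number w.r.t. a sampling plan from Figure 3.4 without and with reparametrization in (A.2).

Sampling plan | $X_{s,1}$ | $X_{s,2}$ | $X_{s,3}$
$\kappa(B^TB)$ | $1.06\times 10^{4}$ | $8.57\times 10^{49}$ | $3.23\times 10^{3}$
$\kappa(B_{\bar{x}}^TB_{\bar{x}})$ | $2.93\times 10^{3}$ | $\infty$ | $1.03\times 10^{3}$
$\lvert\kappa(B^TB) - \kappa(B_{\bar{x}}^TB_{\bar{x}})\rvert$ | $7.67\times 10^{3}$ | $\infty$ | $2.2\times 10^{3}$
$\lvert\kappa(B^TB) - \kappa(B_{\bar{x}}^TB_{\bar{x}})\rvert \,/\, \lvert\kappa(B^TB)\rvert$ | $72.36\times 10^{-2}$ | $\infty$ | $68.11\times 10^{-2}$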
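The effect of mean-centering can be illustrated in a few lines of Julia. The following is a minimal one-dimensional sketch, not the setup behind Table A.1: it assumes an equidistant sampling plan on [0, 1] and a quadratic monomial basis with the columns 1, x, x², and the helper name `design_matrix` is hypothetical. It compares κ(BᵀB) with the condition number of the matrix built from the mean-centered arguments x − x̄.

```julia
using LinearAlgebra   # cond, rank
using Statistics      # mean

# Hypothetical helper: quadratic monomial design matrix with columns 1, x, x^2 (d = 1)
design_matrix(x::AbstractVector{<:Real}) = hcat(ones(length(x)), x, x .^ 2)

x = collect(0.0:0.1:1.0)            # assumed sampling plan with 11 equidistant points
xbar = mean(x)                      # mean of the sampling plan points
B = design_matrix(x)                # basis evaluated at the original arguments
B_xbar = design_matrix(x .- xbar)   # basis evaluated at the mean-centered arguments

κ_B = cond(B' * B)
κ_B_xbar = cond(B_xbar' * B_xbar)
println("κ(BᵀB) = ", κ_B, ",  κ(B_xbarᵀ B_xbar) = ", κ_B_xbar)
```

In this symmetric setting, centering decouples the odd column x − x̄ from the even columns, which is the mechanism behind the improvement. Since centering is an affine change of variables within the same polynomial space, it does not change the rank of the matrix, which is why it cannot help in de facto singular cases such as $X_{s,2}$.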


A.2 Bernstein polynomials

The reparametrization strategy in (A.1) and (A.2) motivates a small numerical experiment that investigates the condition number with regard to polynomials in Bernstein form. For more details on the properties of Bernstein polynomials, see, e.g., [55, p. 205–211].

The Bernstein basis $B_\varsigma \subseteq P_{\le n}$ reads as

$$B_\varsigma := \left\{ \tilde{b}_{i,n}(x) \equiv \binom{n}{i} x^i (1-x)^{n-i} \;\middle|\; i \in \{0,1,\dots,n-1,n\} \wedge x \in [0,1] \right\}. \tag{A.3}$$
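A direct transcription of (A.3) is handy for checking implementations. The sketch below (the function name `bernstein_basis` is hypothetical) evaluates all basis polynomials of degree n at a point x ∈ [0, 1]. The basis functions are non-negative on [0, 1] and sum to one (partition of unity), which follows from the binomial theorem applied to (x + (1 − x))ⁿ.

```julia
# Hypothetical helper: evaluates the Bernstein basis of (A.3),
# [b_{0,n}(x), ..., b_{n,n}(x)], at a point x ∈ [0, 1].
function bernstein_basis(n::Integer, x::Real)
    return [binomial(n, i) * x^i * (1 - x)^(n - i) for i in 0:n]
end

b = bernstein_basis(3, 0.3)
println(b, "  sum = ", sum(b))   # the entries add up to 1 up to rounding
```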

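In order to transform the domain from $[a_l,b_l]^d \equiv [0,1]^d$ to $[\tilde{a}_l,\tilde{b}_l]^d$ with $l \in \{1,\dots,d\}$, let us apply the affine map $\gamma_l : [a_l,b_l] \to [\tilde{a}_l,\tilde{b}_l]$ and the affine map $\nu_l : [\tilde{a}_l,\tilde{b}_l] \to [a_l,b_l]$ such that

$$[a_l,b_l] \xrightarrow{\;\gamma_l\;} [\tilde{a}_l,\tilde{b}_l] \xrightarrow{\;\nu_l\;} [a_l,b_l], \qquad \forall l.\ \mathrm{id}_{[a_l,b_l]} := (\nu_l \circ \gamma_l), \tag{A.4}$$

where $\mathrm{id}_{[a_l,b_l]}$ denotes the identity map on the domain $[a_l,b_l]$. The assignments of the affine maps $\gamma_l$ and $\nu_l$ are encoded by

$$\forall l \in \{1,\dots,d\}.\ \gamma_l(x_l) := (\tilde{b}_l - \tilde{a}_l)\,x_l + \tilde{a}_l, \tag{A.5a}$$
$$\forall l \in \{1,\dots,d\}.\ \nu_l(x_l) := \frac{1}{\tilde{b}_l - \tilde{a}_l}\,(x_l - \tilde{a}_l). \tag{A.5b}$$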
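The affine maps in (A.5) can be applied componentwise to a point in [0, 1]^d. The following sketch uses the hypothetical names `gamma_map` and `nu_map` and assumes, purely for illustration, the box [1.5, 2.5] × [5.5, 9.5]; it checks that ν_l ∘ γ_l acts as the identity, as required by (A.4).

```julia
# Hypothetical helpers for (A.5a) and (A.5b), applied componentwise for l = 1, ..., d.
# lo and hi hold the bounds ã_l and b̃_l of the target box.
gamma_map(x, lo, hi) = (hi .- lo) .* x .+ lo     # [0,1]^d → [ã, b̃]
nu_map(y, lo, hi) = (y .- lo) ./ (hi .- lo)      # [ã, b̃] → [0,1]^d

lo = [1.5, 5.5]      # assumed lower bounds ã_l
hi = [2.5, 9.5]      # assumed upper bounds b̃_l
x = [0.25, 0.75]     # a point in [0, 1]^2

y = gamma_map(x, lo, hi)
println(y, " ↦ ", nu_map(y, lo, hi))
```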

Assuming that the Bernstein coefficients are given in their floating-point representation, de Casteljau's algorithm is a numerically stable tool to evaluate a polynomial $p \in \mathrm{span}(B_\varsigma)$. For a more elaborate discussion on the properties of de Casteljau's algorithm and, especially, its application in the context of Bézier curves, see, e.g., [55, p. 211–218].

In Listing A.1, I present an example implementation of de Casteljau’s algorithm

for the evaluation of a univariate Bernstein sum in the Julia PL.

LISTING A.1: An example implementation of de Casteljau’s algo-

rithm for the evaluation of a univariate Bernstein sum in the Julia PL.

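function bernstein_deCasteljau_eval_1d(c::Vector{T}, x::T) where {T<:Real}
    N = length(c)              # number of Bernstein coefficients, degree n = N - 1 (1-based indexing)
    D = zeros(T, N, N)         # row j holds the j-th level of the de Casteljau triangle
    D[1, :] = c
    for j in 2:N
        for i in 1:N-(j-1)
            # convex combination of two neighboring entries of the previous level
            D[j, i] = (1 - x) * D[j-1, i] + x * D[j-1, i+1]
        end
    end
    return D[N, 1]
end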
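Listing A.1 stores the whole de Casteljau triangle in an N × N matrix. Since each level only depends on the previous one, the same recurrence can also be run in place on a single vector of length N. The sketch below (hypothetical names `decasteljau_inplace` and `bernstein_sum_direct`) shows this variant and compares it against a direct, less stable evaluation of the Bernstein sum; it is an illustration, not the implementation used in this work.

```julia
# In-place de Casteljau variant with O(N) memory: level r overwrites the first N - r entries.
function decasteljau_inplace(c::AbstractVector{<:Real}, x::Real)
    b = float.(collect(c))          # working copy of the Bernstein coefficients
    n = length(b) - 1               # polynomial degree
    for r in 1:n
        for i in 1:(n - r + 1)
            # b[i+1] still holds the value of the previous level when it is read here
            b[i] = (1 - x) * b[i] + x * b[i + 1]
        end
    end
    return b[1]
end

# Reference: direct evaluation of Σ_i c_i b_{i,n}(x) via the explicit basis polynomials of (A.3)
function bernstein_sum_direct(c::AbstractVector{<:Real}, x::Real)
    n = length(c) - 1
    return sum(c[i + 1] * binomial(n, i) * x^i * (1 - x)^(n - i) for i in 0:n)
end

c = [1.0, -2.0, 3.0, 0.5]
println(decasteljau_inplace(c, 0.4), "  ", bernstein_sum_direct(c, 0.4))
```

A convenient sanity check: with the coefficients c_i = i/n, the Bernstein sum reproduces x itself (linear precision).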

Generalizing to the multivariate case by employing the tensor product construction, one can provide a Bernstein basis for the space $P^d_k$; however, one cannot provide a Bernstein basis for the space $P^d_{\le k}$ in this way.


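For a small numerical experiment, let us consider the spaces $P^2_{\le 2}$, $P^3_{\le 2}$, $P^2_2$, and $P^3_2$. Taking the space $P^2_2$ as an example, let us utilize the corresponding matrix representations of the tensor product basis as column vectors $\tilde{b} \in \mathbb{R}^{9\times 1}$ and $\tilde{b}_\varsigma \in \mathbb{R}^{9\times 1}$ by invoking the Kronecker product $\otimes$ with the signature $\mathbb{R}^{m\times n} \times \mathbb{R}^{p\times q} \to \mathbb{R}^{pm\times qn}$. Hence, $\tilde{b}$ and $\tilde{b}_\varsigma$ can be written as

$$\tilde{b} := [1\ \ x_1\ \ x_1^2]^T \otimes [1\ \ x_2\ \ x_2^2]^T, \tag{A.6a}$$
$$\tilde{b}_\varsigma := [(1-x_1)^2\ \ 2x_1(1-x_1)\ \ x_1^2]^T \otimes [(1-x_2)^2\ \ 2x_2(1-x_2)\ \ x_2^2]^T. \tag{A.6b}$$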
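In Julia, the Kronecker products in (A.6) map directly onto `kron` from the LinearAlgebra standard library. The following sketch (hypothetical function names) builds both nine-dimensional tensor product bases of $P^2_2$ and checks the entry ordering induced by ⊗ as well as the partition of unity of the Bernstein variant.

```julia
using LinearAlgebra   # kron

# (A.6a): monomial tensor product basis of P^2_2
monomial_tp(x1, x2) = kron([1.0, x1, x1^2], [1.0, x2, x2^2])

# (A.6b): Bernstein tensor product basis of P^2_2
bernstein_tp(x1, x2) = kron([(1 - x1)^2, 2 * x1 * (1 - x1), x1^2],
                            [(1 - x2)^2, 2 * x2 * (1 - x2), x2^2])

b = monomial_tp(0.5, 2.0)
b_sigma = bernstein_tp(0.2, 0.7)
# kron(u, v) lists u[1]*v, then u[2]*v, ...; hence b[5] equals x1 * x2
println(b, "\n", b_sigma)
```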

Additionally, let us use sampling plans based on the Sobol quasi-random sequence in order to ensure reproducibility and to avoid the averaging that would be needed in the case of an Audze-Eglais LHC or a Maximin LHC. In Table A.3, the condition numbers $\kappa(B^TB + \lambda I)$ and $\kappa(B_\varsigma^T B_\varsigma + \lambda I)$ are reported.¹

When the Tikhonov regularization parameter $\lambda$ is set to zero, one can observe that the condition number decreases as the number of sampling plan points $m$ increases. From a statistical point of view, one can interpret this observation intuitively: a larger sample size leads to a better estimate of the so-called true coefficients vector; or, more formally, the best coefficients column vector $\hat{c}$ is consistent for the true coefficients column vector $c$. For more details on this particular notion of consistency, I refer to [82, p. 65f], where the author elaborates on a theorem that links consistency to the question of whether $\frac{1}{m}(B^TB)$ approaches a symmetric positive definite matrix as $m \to \infty$.

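If a matrix is symmetric positive definite, then its trace is greater than zero; hence, a positive trace is a necessary, though not sufficient, indication of this property. Let us therefore check computationally $\mathrm{tr}(\frac{1}{m}(B^TB))$ and $\mathrm{tr}(\frac{1}{m}(B_\varsigma^T B_\varsigma))$, respectively. In Table A.2, the corresponding results are reported.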

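TABLE A.2: The trace of $\frac{1}{m}(B^TB)$ and the trace of $\frac{1}{m}(B_\varsigma^TB_\varsigma)$ w.r.t. the number of sampling plan points $m$ and the Tikhonov regularization parameter $\lambda = 0$, assuming the spaces $P^2_{\le 2}$, $P^3_{\le 2}$, $P^2_2$, and $P^3_2$ with monomial basis and the spaces $P^2_2$ and $P^3_2$ with Bernstein basis.

$m$ | $B_{P^2_{\le 2}}$ | $B_{P^2_2}$ | $B_{\varsigma,P^2_2}$ | $B_{P^3_{\le 2}}$ | $B_{P^3_2}$ | $B_{\varsigma,P^3_2}$
$m = 10$ | 2.131 | 2.310 | 0.267 | 2.862 | 3.328 | 0.133
$m = 50$ | 2.147 | 2.304 | 0.270 | 2.899 | 3.427 | 0.140
$m = 100$ | 2.162 | 2.334 | 0.278 | 2.909 | 3.573 | 0.145
$m = 1000$ | 2.176 | 2.348 | 0.283 | 2.930 | 3.600 | 0.150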

In Table A.2, one can observe that, for all cases under consideration, the trace is positive and changes only slightly, mostly increasing, with an increasing number of sampling plan points. Thus, these results provide some empirical evidence for the intuition underlying the notion of consistency described above. Note that the trace of a matrix is equal to the sum of its eigenvalues, and that a symmetric matrix is positive definite if and only if all of its eigenvalues are positive. In Table A.2, solely the entries for the combinations ($m = 10$, $B_{P^3_2}$) and ($m = 10$, $B_{\varsigma,P^3_2}$) do not correspond to a case where all eigenvalues are positive. This is expected: $m = 10$ is smaller than $\dim(P^3_2) = 27$, so the corresponding $27\times 27$ matrix has rank at most 10 and is therefore singular.
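As a rough cross-check of the trend in Table A.2, the limit of tr((1/m)BᵀB) for the monomial basis of $P^2_{\le 2}$ can be computed by hand if the sampling plan points are (asymptotically) uniformly distributed on [0, 1]². The trace equals the mean of the squared basis vector norms, which tends to ∫(1 + x₁² + x₂² + x₁⁴ + x₂⁴ + x₁²x₂²) dx = 1 + 2/3 + 2/5 + 1/9 ≈ 2.178, close to the value 2.176 reported for m = 1000. The sketch below assumes the basis ordering shown and replaces the Sobol sequence by a deterministic midpoint grid, so it does not reproduce the table entries themselves; the function names are hypothetical.

```julia
using LinearAlgebra   # tr

# Monomial basis of P^2_{≤2}; the ordering is an assumption made for illustration
basis_P2le2(x1, x2) = [1.0, x1, x2, x1^2, x2^2, x1 * x2]

# tr((1/m) BᵀB) for a midpoint grid with n × n points on [0, 1]^2 (stand-in for a Sobol plan)
function scaled_gram_trace(n::Integer)
    pts = [(i - 0.5) / n for i in 1:n]
    rows = [basis_P2le2(x1, x2) for x1 in pts for x2 in pts]
    B = permutedims(reduce(hcat, rows))    # m × 6 matrix with m = n^2
    return tr(B' * B) / size(B, 1)
end

trace_limit = 1 + 2/3 + 2/5 + 1/9          # ≈ 2.178
println(scaled_gram_trace(10), "  ", scaled_gram_trace(100), "  ", trace_limit)
```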

¹Given $m = 1000$, the time needed to construct the sampling plan based on the Sobol quasi-random sequence is approximately 198 s for $d = 2$ and 227 s for $d = 3$ on a notebook with an Intel® Core™ i7-6500U CPU @ 2.50 GHz. This time represents the main bottleneck during the construction of the matrices $(B^TB + \lambda I)$ and $(B_\varsigma^TB_\varsigma + \lambda I)$, respectively.

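TABLE A.3: The condition numbers $\kappa(B^TB + \lambda I)$ and $\kappa(B_\varsigma^TB_\varsigma + \lambda I)$ w.r.t. the number of sampling plan points $m$ and the Tikhonov regularization parameter $\lambda$, assuming the spaces $P^2_{\le 2}$, $P^3_{\le 2}$, $P^2_2$, and $P^3_2$ with monomial basis and the spaces $P^2_2$ and $P^3_2$ with Bernstein basis.

(A) The condition number assuming $P^2_{\le 2}$ with monomial basis.

$\lambda$ | $m = 10$ | $m = 50$ | $m = 100$ | $m = 1000$
$0.0$ | $3.971\times 10^{3}$ | $1.176\times 10^{3}$ | $0.922\times 10^{3}$ | $0.861\times 10^{3}$
$0.2$ | $0.091\times 10^{3}$ | $0.334\times 10^{3}$ | $0.464\times 10^{3}$ | $0.789\times 10^{3}$
$0.5$ | $0.037\times 10^{3}$ | $0.161\times 10^{3}$ | $0.266\times 10^{3}$ | $0.700\times 10^{3}$
$0.8$ | $0.023\times 10^{3}$ | $0.106\times 10^{3}$ | $0.187\times 10^{3}$ | $0.630\times 10^{3}$

(B) The condition number assuming $P^3_{\le 2}$ with monomial basis.

$\lambda$ | $m = 10$ | $m = 50$ | $m = 100$ | $m = 1000$
$0.0$ | $0.182\times 10^{6}$ | $2.464\times 10^{3}$ | $1.782\times 10^{3}$ | $1.363\times 10^{3}$
$0.2$ | $0.119\times 10^{3}$ | $0.476\times 10^{3}$ | $0.716\times 10^{3}$ | $1.224\times 10^{3}$
$0.5$ | $0.048\times 10^{3}$ | $0.216\times 10^{3}$ | $0.378\times 10^{3}$ | $1.062\times 10^{3}$
$0.8$ | $0.030\times 10^{3}$ | $0.139\times 10^{3}$ | $0.257\times 10^{3}$ | $0.938\times 10^{3}$

(C) The condition number assuming $P^2_2$ with monomial basis.

$\lambda$ | $m = 10$ | $m = 50$ | $m = 100$ | $m = 1000$
$0.0$ | $2.292\times 10^{6}$ | $0.369\times 10^{6}$ | $0.316\times 10^{6}$ | $0.275\times 10^{6}$
$0.2$ | $0.098\times 10^{3}$ | $0.490\times 10^{3}$ | $0.983\times 10^{3}$ | $9.564\times 10^{3}$
$0.5$ | $0.039\times 10^{3}$ | $0.196\times 10^{3}$ | $0.394\times 10^{3}$ | $3.907\times 10^{3}$
$0.8$ | $0.025\times 10^{3}$ | $0.123\times 10^{3}$ | $0.247\times 10^{3}$ | $2.455\times 10^{3}$

(D) The condition number assuming $P^3_2$ with monomial basis.

$\lambda$ | $m = 10$ | $m = 50$ | $m = 100$ | $m = 1000$
$0.0$ | $7.994\times 10^{19}$ | $6.938\times 10^{8}$ | $3.702\times 10^{8}$ | $1.563\times 10^{8}$
$0.2$ | $0.130\times 10^{3}$ | $0.668\times 10^{3}$ | $1.388\times 10^{3}$ | $1.395\times 10^{4}$
$0.5$ | $0.052\times 10^{3}$ | $0.268\times 10^{3}$ | $0.555\times 10^{3}$ | $5.581\times 10^{3}$
$0.8$ | $0.033\times 10^{3}$ | $0.167\times 10^{3}$ | $0.347\times 10^{3}$ | $3.488\times 10^{3}$

(E) The condition number assuming $P^2_2$ with Bernstein basis.

$\lambda$ | $m = 10$ | $m = 50$ | $m = 100$ | $m = 1000$
$0.0$ | $1.092\times 10^{3}$ | $0.129\times 10^{3}$ | $0.104\times 10^{3}$ | $0.098\times 10^{3}$
$0.2$ | $0.006\times 10^{3}$ | $0.023\times 10^{3}$ | $0.036\times 10^{3}$ | $0.084\times 10^{3}$
$0.5$ | $0.003\times 10^{3}$ | $0.011\times 10^{3}$ | $0.019\times 10^{3}$ | $0.069\times 10^{3}$
$0.8$ | $0.002\times 10^{3}$ | $0.007\times 10^{3}$ | $0.013\times 10^{3}$ | $0.058\times 10^{3}$

(F) The condition number assuming $P^3_2$ with Bernstein basis.

$\lambda$ | $m = 10$ | $m = 50$ | $m = 100$ | $m = 1000$
$0.0$ | $8.471\times 10^{18}$ | $4.578\times 10^{3}$ | $2.269\times 10^{3}$ | $1.063\times 10^{3}$
$0.2$ | $0.003\times 10^{3}$ | $0.010\times 10^{3}$ | $0.019\times 10^{3}$ | $0.158\times 10^{3}$
$0.5$ | $0.001\times 10^{3}$ | $0.004\times 10^{3}$ | $0.008\times 10^{3}$ | $0.070\times 10^{3}$
$0.8$ | $0.001\times 10^{3}$ | $0.003\times 10^{3}$ | $0.005\times 10^{3}$ | $0.045\times 10^{3}$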


Given a fixed $\lambda > 0$ in Table A.3, one can observe that the condition number increases as the number of sampling plan points $m$ increases. I cannot back up this particular behavior in a formal way. Given a fixed $m$, one can observe that the condition number decreases as the regularization parameter increases. This behavior matches the expectation regarding the regularization parameter.
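One heuristic way to read both trends for $\lambda > 0$, offered as a sketch rather than a formal proof, assumes that $\frac{1}{m}B^TB$ has essentially settled to a fixed symmetric positive definite matrix $M$ with extreme eigenvalues $0 < \mu_{\min} \le \mu_{\max}$, in line with the consistency discussion above. Then $B^TB \approx mM$, adding $\lambda I$ shifts every eigenvalue by $\lambda$, and the spectral condition number of the symmetric matrix $B^TB + \lambda I$ is the ratio of its extreme eigenvalues:

```latex
\kappa(B^TB + \lambda I) \approx g(m,\lambda) := \frac{m\,\mu_{\max} + \lambda}{m\,\mu_{\min} + \lambda},
\qquad
\frac{\partial g}{\partial m} = \frac{\lambda\,(\mu_{\max} - \mu_{\min})}{(m\,\mu_{\min} + \lambda)^2} \ge 0,
\qquad
\frac{\partial g}{\partial \lambda} = \frac{-m\,(\mu_{\max} - \mu_{\min})}{(m\,\mu_{\min} + \lambda)^2} \le 0,
\qquad
\lim_{m \to \infty} g(m,\lambda) = \frac{\mu_{\max}}{\mu_{\min}} = \kappa(M).
```

Under this assumption, a fixed $\lambda$ loses its relative influence as $m$ grows, so the regularized condition number rises towards the unregularized one. This is consistent with, e.g., Table A.3 (A), where for $\lambda = 0.2$ the value at $m = 1000$ ($0.789\times 10^{3}$) is already close to the value for $\lambda = 0$ ($0.861\times 10^{3}$).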

If we exclude pathological cases spanning several orders of magnitude from consideration, then, in practical applications, the assessment of a condition number as an indication of an ill-conditioned problem mostly depends on the judgment of the user and on the desired accuracy for the problem under investigation.

Therefore, the key empirical insight from Table A.3 is that, for a low number of sampling plan points, one can achieve moderately small condition numbers with a monomial basis for the spaces $P^2_{\le 2}$ and $P^3_{\le 2}$ by invoking the regularization parameter $\lambda$. These condition numbers are comparable to those associated with their tensor product polynomial basis counterparts.

A.3 Chebyshev polynomials

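Another possibility to mitigate by design the multicollinearity with respect to the chosen basis is to apply, instead of a monomial basis $B \subseteq P_{\le n}$ such as in (3.40), a finite basis of orthogonal polynomials $Q \subseteq P_{\le n}$ in which the polynomials are pairwise orthogonal with respect to an inner product $\langle\cdot,\cdot\rangle_{\mathbb{R}^X}$ that can be considered in a continuous form or in a discrete form (see, e.g., [48, p. 251f]); more precisely,

$$\langle\cdot,\cdot\rangle_{\mathbb{R}^X} := (f,g) \mapsto \int_a^b f(x)\,g(x)\,w(x)\,\mathrm{d}x : \mathbb{R}^X \times \mathbb{R}^X \to \mathbb{R} \quad \text{or} \tag{A.7a}$$
$$\langle\cdot,\cdot\rangle_{\mathbb{R}^{X_s}} := (f,g) \mapsto \sum_{i=1}^{m} f(x_i)\,g(x_i)\,w(x_i) : \mathbb{R}^{X_s} \times \mathbb{R}^{X_s} \to \mathbb{R}, \tag{A.7b}$$

where $x \in X := [a,b] \subset \mathbb{R}$, $x_i \in X_s \subseteq X^m$ with $m \in \mathbb{N}$ and $i \in \{1,2,\dots,m\}$, and $w$ denotes a weight function. Let us focus on the discrete form in (A.7b).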

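Using orthogonal polynomials $q_i \in Q$, one can concisely define a matrix $Q \in \mathbb{R}^{m\times s}$ with respect to the sampling plan points $x_i$, whose $i$-th row is $\tilde{q}(x_i)^T$:

$$Q := \begin{bmatrix} \tilde{q}(x_1)^T \\ \vdots \\ \tilde{q}(x_m)^T \end{bmatrix}, \tag{A.8}$$

such that, in exact arithmetic (i.e., in the absence of floating-point approximation errors), $Q^TQ$ is exactly diagonal in the sense of

$$Q^TQ \equiv \mathrm{diag}\big(\langle\tilde{q}_0,\tilde{q}_0\rangle_{\mathbb{R}^{X_s}},\dots,\langle\tilde{q}_{s-1},\tilde{q}_{s-1}\rangle_{\mathbb{R}^{X_s}}\big), \tag{A.9}$$

where $\tilde{q}_i$ denote the components of a basis vector $\tilde{q} \in \mathbb{R}^{s\times 1}$. If $Q$ is orthonormal, then $Q^TQ = I$, where $I \in \mathbb{R}^{s\times s}$ denotes the identity matrix.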

Let us focus on univariate Chebyshev polynomials of the first kind $T_n(x)$ with the leading coefficient $2^{n-1}$ for $n \ge 1$ and the domain $X := [-1,1]$. For more details on the properties of this sequence of orthogonal polynomials, see, e.g., [208].


The corresponding recurrence formula reads as

$$T_n(x) := \begin{cases} 1, & \text{if } n \equiv 0,\\ x, & \text{if } n \equiv 1,\\ 2x\,T_{n-1}(x) - T_{n-2}(x), & \text{if } n \ge 2. \end{cases} \tag{A.10}$$
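The recurrence (A.10) translates directly into a loop. The sketch below (hypothetical name `chebyshev_T`) evaluates T_n(x) and can be checked against the identity T_n(cos θ) = cos(nθ), which characterizes the Chebyshev polynomials of the first kind on [−1, 1].

```julia
# Evaluates T_n(x) with the three-term recurrence (A.10)
function chebyshev_T(n::Integer, x::Real)
    n < 0 && throw(ArgumentError("n must be non-negative"))
    t_prev, t_curr = one(float(x)), float(x)    # T_0(x), T_1(x)
    n == 0 && return t_prev
    for _ in 2:n
        t_prev, t_curr = t_curr, 2x * t_curr - t_prev
    end
    return t_curr
end

println(chebyshev_T(2, 0.3))    # T_2(0.3) = 2 * 0.3^2 - 1
```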

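Based on the recurrence formula in (A.10), one can construct a basis for the space $P_{\le 2}$; furthermore, one can construct a basis for the space $P^d_{\le 2}$ in the same manner as in (3.45). Hence, one can define the column vector $\tilde{q} \in \mathbb{R}^{s\times 1}$ that represents the basis of a $d$-variate Chebyshev polynomial of the first kind of degree at most two:

$$\tilde{q} := [\tilde{q}_0\ \ \tilde{q}_1\ \dots\ \tilde{q}_d\ \ \tilde{q}_{d+1}\ \dots\ \tilde{q}_{2d}\ \ \tilde{q}_{2d+1}\ \dots\ \tilde{q}_{s-1}]^T, \tag{A.11}$$

where $\tilde{q}_0 = T_0(x_1)\cdots T_0(x_d)$, $\tilde{q}_1 = T_1(x_1)$, $\tilde{q}_d = T_1(x_d)$, $\tilde{q}_{d+1} = T_2(x_1)$, $\tilde{q}_{2d} = T_2(x_d)$, $\tilde{q}_{2d+1} = T_1(x_1)T_1(x_2)$, and $\tilde{q}_{s-1} = T_1(x_{d-1})T_1(x_d)$; more specifically, $\tilde{q}_0 = 1$, $\tilde{q}_1 = x_1$, $\tilde{q}_d = x_d$, $\tilde{q}_{d+1} = 2x_1^2 - 1$, $\tilde{q}_{2d} = 2x_d^2 - 1$, $\tilde{q}_{2d+1} = x_1x_2$, and $\tilde{q}_{s-1} = x_{d-1}x_d$. Thus, analogously to (3.57), one can define a polynomial as

$$p(x) := \tilde{q}^T\tilde{c}. \tag{A.12}$$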
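For illustration, the basis vector $\tilde{q}$ of (A.11) can be assembled for an arbitrary dimension d. The sketch below (hypothetical helper names) follows the ordering of (A.11): the constant, then $T_1(x_l)$ for all l, then $T_2(x_l)$ for all l, then the mixed products $T_1(x_k)T_1(x_l)$ with k < l. Its length is s = 1 + 2d + d(d − 1)/2 = (d + 1)(d + 2)/2.

```julia
# Chebyshev polynomials of the first kind up to degree two, cf. (A.10)
cheb_T0(x) = one(x)
cheb_T1(x) = x
cheb_T2(x) = 2x^2 - 1

# Basis vector q̃ of (A.11) at a point x ∈ [-1, 1]^d
function chebyshev_basis_le2(x::AbstractVector{<:Real})
    d = length(x)
    q = [prod(cheb_T0, x)]               # q̃_0 = T_0(x_1)⋯T_0(x_d) = 1
    append!(q, cheb_T1.(x))              # q̃_1, ..., q̃_d
    append!(q, cheb_T2.(x))              # q̃_{d+1}, ..., q̃_{2d}
    for k in 1:d-1, l in k+1:d           # q̃_{2d+1}, ..., q̃_{s-1}
        push!(q, cheb_T1(x[k]) * cheb_T1(x[l]))
    end
    return q
end

q = chebyshev_basis_le2([0.5, -0.2])
println(q)     # entries correspond to 1, x1, x2, 2x1^2 - 1, 2x2^2 - 1, x1*x2
```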

Note that the sampling plan $X_s \subset ([-1,1]^d)^{m_1\times\cdots\times m_d}$ is constituted by the Chebyshev nodes

$$\forall l \in \{1,\dots,d\}.\ \forall k \in \{1,\dots,m_l\}.\ x_{kl} := \cos\!\left(\frac{2k-1}{2m_l}\,\pi\right), \tag{A.13}$$

where $\prod_{l=1}^{d} m_l := m$. Let us refer to this specific construction of a sampling plan $X_s$ as a Chebyshev grid, in which the sampling plan $X_s$ can be conceived as a tensor in the sense of a multidimensional array (see, e.g., [79]). For a brief and succinct comment on the smoothness requirements for a high-fidelity model and on the influence of the positioning of the nodes in polynomial interpolation (the Runge phenomenon, the Faber theorem), see, e.g., [209].

In Figure A.1, three Chebyshev grids with d∶=2 are illustrated.

[Figure A.1: panels (i) to (iii) showing the grid points; both axes range from −1.0 to 1.0 in panel (i) and from 0.0 to 1.0 in panels (ii) and (iii).]

FIGURE A.1: Instances of a Chebyshev grid as a sampling plan $X_s \subset (X^2)^{m_1\times m_2}$. (i) $X := [-1,1]$, $m_1 \equiv m_2 := 5$; (ii) $X := [0,1]$, $m_1 \equiv m_2 := 5$; (iii) $X := [0,1]$, $m_1 \equiv m_2 := 10$.

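In order to transform the domains from $[a_l,b_l]^2 \equiv [-1,1]^2$ to $[\tilde{a}_l,\tilde{b}_l]^2 \equiv [0,1]^2$ with $l \in \{1,2\}$, let us overload the maps and override the domains and the variables in (A.4) and in (A.5). Thus, let us apply the affine map $\gamma_l : [a_l,b_l] \to [\tilde{a}_l,\tilde{b}_l]$ and the affine map $\nu_l : [\tilde{a}_l,\tilde{b}_l] \to [a_l,b_l]$ such that

$$[a_l,b_l] \xrightarrow{\;\gamma_l\;} [\tilde{a}_l,\tilde{b}_l] \xrightarrow{\;\nu_l\;} [a_l,b_l], \qquad \forall l.\ \mathrm{id}_{[a_l,b_l]} := (\nu_l \circ \gamma_l), \tag{A.14}$$

where $\mathrm{id}_{[a_l,b_l]}$ denotes the identity map on the domain $[a_l,b_l]$. Invoking (A.13), the assignments of the affine maps $\gamma_l$ and $\nu_l$ are encoded by

$$\forall l \in \{1,\dots,d\}.\ \forall k \in \{1,\dots,m_l\}.\ \gamma_l(x_{kl}) := \tfrac{1}{2}\big((\tilde{b}_l - \tilde{a}_l)\,x_{kl} + (\tilde{a}_l + \tilde{b}_l)\big), \tag{A.15a}$$
$$\forall l \in \{1,\dots,d\}.\ \forall k \in \{1,\dots,m_l\}.\ \nu_l(x_{kl}) := \frac{1}{\tilde{b}_l - \tilde{a}_l}\big(2x_{kl} - (\tilde{a}_l + \tilde{b}_l)\big). \tag{A.15b}$$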
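Combining the Chebyshev nodes (A.13) with the affine map (A.15a) yields a Chebyshev grid on [0, 1]², as in Figure A.1 (ii). The following sketch uses hypothetical helper names and represents the grid simply as a vector of coordinate tuples rather than as a tensor.

```julia
# Chebyshev nodes (A.13) on [-1, 1] for one coordinate direction with ml nodes
chebyshev_nodes(ml::Integer) = [cos((2k - 1) * π / (2ml)) for k in 1:ml]

# Affine map (A.15a) from [-1, 1] to the interval [lo, hi]
to_interval(x, lo, hi) = ((hi - lo) * x + (lo + hi)) / 2

# Tensor-product grid with m1 × m2 points on [0, 1]^2
function chebyshev_grid_01(m1::Integer, m2::Integer)
    x1 = to_interval.(chebyshev_nodes(m1), 0.0, 1.0)
    x2 = to_interval.(chebyshev_nodes(m2), 0.0, 1.0)
    return [(a, b) for a in x1 for b in x2]
end

grid = chebyshev_grid_01(5, 5)    # cf. Figure A.1 (ii)
println(length(grid), " points, e.g. ", grid[1])
```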

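If we use the orthogonality condition in the discrete form for the univariate case, i.e., for $0 \le i, j < m_l$,

$$\forall l \in \{1,\dots,d\}.\ \sum_{k=1}^{m_l} T_i(x_{kl})\,T_j(x_{kl}) = \begin{cases} 0, & \text{if } i \ne j,\\ m_l, & \text{if } i \equiv j \wedge i = 0,\\ m_l/2, & \text{if } i \equiv j \wedge i \ne 0, \end{cases} \tag{A.16}$$

one can express the orthogonality condition in the discrete form for the multivariate case in terms of the components $\tilde{q}_i$ in (A.11), i.e.,

$$\langle\tilde{q}_i,\tilde{q}_j\rangle_{\mathbb{R}^{X_s}} = \begin{cases} 0, & \text{if } i \ne j,\\ m, & \text{if } i \equiv j \wedge i = 0,\\ m/2, & \text{if } i \equiv j \wedge i \in [1,2d],\\ m/4, & \text{if } i \equiv j \wedge i \in [2d+1,s-1]. \end{cases} \tag{A.17}$$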
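The univariate relation (A.16) is easy to verify numerically. The sketch below (hypothetical name `chebyshev_discrete_gram`) uses $T_i(\cos\theta) = \cos(i\theta)$ to evaluate $T_i$ at the nodes (A.13) and assembles the matrix of sums $\sum_k T_i(x_{kl})T_j(x_{kl})$ for $0 \le i, j \le n_{\max}$. With $m_l = 7$ nodes and $n_{\max} = 4 < m_l$, the diagonal should read $m_l, m_l/2, \dots, m_l/2$, and all off-diagonal entries should vanish up to rounding.

```julia
using LinearAlgebra   # Diagonal, diag

# G[i+1, j+1] = Σ_k T_i(x_k) T_j(x_k) over the ml Chebyshev nodes x_k = cos(θ_k) of (A.13)
function chebyshev_discrete_gram(ml::Integer, nmax::Integer)
    θ = [(2k - 1) * π / (2ml) for k in 1:ml]
    return [sum(cos(i * t) * cos(j * t) for t in θ) for i in 0:nmax, j in 0:nmax]
end

G = chebyshev_discrete_gram(7, 4)
println(round.(diag(G); digits=12))
```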

Analogously to Listing 3.1, if we have the optimal coefficients as floating-point numbers, then a numerically stable approach to evaluate the function in (A.12) is Clenshaw's recurrence formula.

In Listing A.2, I present an example implementation of Clenshaw’s algorithm

for the evaluation of a univariate Chebyshev sum in the Julia PL. For a neat graph

representation of the structure of Clenshaw’s algorithm, see, e.g., [78, p. 199].

To avoid clutter, a modified version of this algorithm (see [75, p. 78f]) is left out. The modified version attempts to alleviate the hazard of rounding errors in cases in which the argument is close to the domain's boundary, e.g., $x := -1 + 1\times 10^{3}\,e$ or $x := 1 - 1\times 10^{3}\,e$, where the number $e$ encodes the machine epsilon in floating-point arithmetic.

LISTING A.2: An example implementation of Clenshaw’s algorithm

for the evaluation of a univariate Chebyshev sum in the Julia PL.

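function chebyshev_clenshaw_eval_1d(c::Vector{T}, x::T) where {T<:Real}
    N = length(c) - 1          # degree of the Chebyshev sum; c[k+1] multiplies T_k (1-based indexing)
    d = zeros(T, N + 3)        # d[N+2] = d[N+3] = 0 start the backward recurrence (also covers N = 0)
    d[N+1] = c[N+1]
    for i in N:-1:2
        # backward step: b_k = c_k + 2x b_{k+1} - b_{k+2}
        d[i] = 2x * d[i+1] - d[i+2] + c[i]
    end
    return x * d[2] - d[3] + c[1]
end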


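Concerning the multivariate case, if we consider the space $P^2_{\le 2}$ with $\dim(P^2_{\le 2}) \equiv 6$, then one can introduce the column vectors $\tilde{\tau}(x_2) \in \mathbb{R}^{6\times 1}$ and $\tilde{\eta}(x_1) \in \mathbb{R}^{6\times 1}$ and the diagonal matrix $\tilde{\Delta} \in \mathbb{R}^{6\times 6}$ such that

$$\tilde{\tau}(x_2) := [T_0(x_2)\ \ T_0(x_2)\ \ T_0(x_2)\ \ T_1(x_2)\ \ T_2(x_2)\ \ T_1(x_2)]^T, \tag{A.18a}$$
$$\tilde{\eta}(x_1) := [T_0(x_1)\ \ T_1(x_1)\ \ T_2(x_1)\ \ T_0(x_1)\ \ T_0(x_1)\ \ T_1(x_1)]^T, \tag{A.18b}$$
$$\tilde{\Delta} := \mathrm{diag}(\tilde{c}_1,\tilde{c}_2,\tilde{c}_3,\tilde{c}_4,\tilde{c}_5,\tilde{c}_6), \tag{A.18c}$$
$$p(x) := \tilde{\tau}(x_2)^T\,\tilde{\Delta}\,\tilde{\eta}(x_1). \tag{A.18d}$$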

Finally, similarly to (3.69), one can apply an evaluation scheme based on multiple nested hierarchical evaluations of univariate Chebyshev sums to the formulation

$$p(x) := \sum_{j=1}^{\dim(P^2_{\le 2})} \tilde{c}_j\,\tilde{\tau}_j(x_2)\,\tilde{\eta}_j(x_1). \tag{A.19}$$
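The factorized form (A.18d) and the sum (A.19) describe the same polynomial. The sketch below checks this for d = 2 with hypothetical coefficient values. The degree tuples encode the entries of $\tilde{\tau}(x_2)$ and $\tilde{\eta}(x_1)$ in (A.18a) and (A.18b), and the Chebyshev polynomials are evaluated with the recurrence (A.10). This is only a consistency check of the two formulations, not the nested evaluation scheme itself.

```julia
using LinearAlgebra   # Diagonal

# T_n(x) via the recurrence (A.10), written recursively for brevity (fine for small n)
cheb(n::Integer, x::Real) = n == 0 ? one(x) : n == 1 ? x : 2x * cheb(n - 1, x) - cheb(n - 2, x)

# Degrees of the entries of η̃(x1) in (A.18b) and τ̃(x2) in (A.18a)
const DEG_X1 = (0, 1, 2, 0, 0, 1)
const DEG_X2 = (0, 0, 0, 1, 2, 1)

tau(x2) = [cheb(k, x2) for k in DEG_X2]
eta(x1) = [cheb(k, x1) for k in DEG_X1]

p_factorized(c, x1, x2) = tau(x2)' * Diagonal(c) * eta(x1)            # (A.18d)
p_sum(c, x1, x2) = sum(c[j] * tau(x2)[j] * eta(x1)[j] for j in 1:6)   # (A.19)

c = [1.0, -0.5, 0.25, 2.0, -1.0, 0.75]    # hypothetical coefficients c̃_1, ..., c̃_6
println(p_factorized(c, 0.3, -0.6), "  ", p_sum(c, 0.3, -0.6))
```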

For more details on evaluation algorithms for multivariate orthogonal polynomials

such as Chebyshev polynomials, see, e.g., [17].

Notably, the authors in [206] allude to a numerically stable evaluation scheme for a univariate Chebyshev sum based on the second (true) form of the barycentric formula in rational interpolation, where the number of Chebyshev nodes of the second kind has to be sufficiently large (cf. [22]). This evaluation scheme relies only on the nodes and the corresponding values of the high-fidelity model. However, since the basic presumption in the present work is to keep the number of evaluation points fairly low, this formula is employed solely for testing purposes.

Regarding the usage of a deterministic data-fit low-fidelity model in an interpolation context, an additional aspect that is restricted by the basic presumption in the present work is the reduction of the empirical surrogate modeling error (see Definition 3.1.2) by sufficiently increasing the sample size and by appropriately positioning the sampling plan points. Adaptive interpolation, in the sense that an initial sampling plan is extended sequentially in a controlled manner, is considered solely for the probabilistic data-fit low-fidelity model, that is, the kriging low-fidelity model.

Regarding the condition number $\kappa(Q^TQ)$: if $Q$ is orthonormal, then, by design, it holds that $\kappa(Q^TQ) \equiv 1$; and if $Q$ is only orthogonal, then, in principle, it holds that $\kappa(Q^TQ) < 9$. Comparing these observations to the observations in Table A.3, it can be argued that a change of basis, especially a change to a basis of Chebyshev polynomials, can have a favorable effect on the condition number. Note, however, that from an application-oriented view the regularization approach offers more flexibility concerning the choice of a basis and the choice of a sampling plan.
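For a Chebyshev grid, the diagonal structure of $Q^TQ$ can be checked directly. The sketch below (hypothetical helper names) builds Q for d = 2 on a 5 × 5 Chebyshev grid on [−1, 1]², using the ordering of (A.11), and computes $Q^TQ$. In exact arithmetic the diagonal is (m, m/2, m/2, m/2, m/2, m/4) with m = 25, so the spectral condition number of this particular $Q^TQ$ is m/(m/4) = 4.

```julia
using LinearAlgebra   # cond, Diagonal, diag

chebyshev_nodes(ml::Integer) = [cos((2k - 1) * π / (2ml)) for k in 1:ml]   # (A.13)

# Row q̃(x)ᵀ for d = 2 in the ordering of (A.11): 1, T1(x1), T1(x2), T2(x1), T2(x2), T1(x1)T1(x2)
qrow(x1, x2) = [1.0, x1, x2, 2x1^2 - 1, 2x2^2 - 1, x1 * x2]

# Q ∈ R^{m×s} on an m1 × m2 Chebyshev grid, cf. (A.8)
function chebyshev_Q(m1::Integer, m2::Integer)
    rows = [qrow(a, b) for a in chebyshev_nodes(m1) for b in chebyshev_nodes(m2)]
    return permutedims(reduce(hcat, rows))
end

Q = chebyshev_Q(5, 5)
G = Q' * Q
println(round.(diag(G); digits=10), "  κ(QᵀQ) ≈ ", cond(G))
```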


Appendix B

Solenoid with a core (§ 5.1)

B.1 An electrical network viewpoint

In Figure B.1, I depict the circuit diagram representation of the three fundamental

passive electrical components where the circles encode external terminal nodes of

the passive electrical components.


FIGURE B.1: Circuit diagram representation of the three fundamental

passive electrical components: resistance R, inductance L, and capac-

itance C.

If we use the map $U \in \mathbb{C}^{\mathbb{R}_+}$ and the map $I \in \mathbb{C}^{\mathbb{R}_+}$ to denote a complex-valued voltage drop and a complex-valued current intensity, respectively, where both maps depend on the angular frequency $\omega$, then, with respect to the passive electrical components in Figure B.1, the following equations hold:

$$U_R(\omega) = R\cdot I_R(\omega),\qquad U_L(\omega) = j\omega L\cdot I_L(\omega),\qquad I_C(\omega) = j\omega C\cdot U_C(\omega), \tag{B.1}$$

where it is assumed that the entities $R, L, C \in \mathbb{C}$ with $\mathrm{Re}(R), \mathrm{Re}(L), \mathrm{Re}(C) \in \mathbb{R}_+$ and $\mathrm{Im}(R) := 0$, $\mathrm{Im}(L) := 0$, and $\mathrm{Im}(C) := 0$. Hence, all the multiplication maps in (B.1) have the same signature $\mathbb{C}\times\mathbb{C} \to \mathbb{C}$, and the corresponding assignment rules refer to the common algebraic rules for complex numbers.

In Figure B.2, two representatives from the class of circuit diagrams for real in-

ductive components (cf. [112, p. 520ff]) are depicted where a rough colloquial equiv-

alence relation can be defined by "has the same four-tuple of fundamental passive

electrical components (L0,Rw,Rc,Cp)as".

Note that in Figure B.2, due to the exposition in § 2.1.3, one can solely consider the resistance $R_w$ (associated with the losses in the winding), the resistance $R_c$ (associated with the losses in the core), and the inductance $L_0$ (associated with the magnetic energy) within the magnetoquasistatic model where, in practical applications, the entity $L_0$ refers to the nominal inductance provided by a choke manufacturer's data sheet.

In the context of electromagnetic compatibility, though, it is common to take the capacitance $C_p$ into account as a parasitic component, too, in order to properly reconstruct a real inductive component's impedance map $Z \in \mathbb{C}^{\mathbb{R}_+}$ with $Z = \omega \mapsto \frac{U(\omega)}{I(\omega)}$ over a wide range of frequencies. Notice that, in contrast to the 2D-LBVP in § 5.1, the aim for a proper reconstruction of a real inductive component's impedance map prompts one to also incorporate the resistance $R_c$.

[Figure B.2: two circuit diagrams, (A) Representative #1 and (B) Representative #2.]

FIGURE B.2: Two representatives from the equivalence class of circuit diagrams for real inductive components with the equivalence relation "has the same four-tuple of fundamental passive electrical components $(L_0,R_w,R_c,C_p)$ as". (A) Representative #1. (B) Representative #2.

The assignment rules for the representatives in Figure B.2 read as

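$$Z_1(\omega) := \frac{\big((R_c + R_w) + j\omega L_0\big)\,\frac{1}{j\omega C_p}}{(R_w + R_c) + j\omega L_0 + \frac{1}{j\omega C_p}},\qquad Z_2(\omega) := \frac{1}{\frac{1}{R_w + j\omega L_0} + \frac{1}{R_c} + j\omega C_p}, \tag{B.2}$$

where $Z_1(\omega)$ corresponds to the circuit diagram in Figure B.2a and $Z_2(\omega)$ corresponds to the circuit diagram in Figure B.2b.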

If we overload the impedance map $Z$ (and the maps $U$ and $I$ as well) in the sense that $Z \in \mathbb{C}^{\mathbb{C}}$ with $Z = s \mapsto \frac{U(s)}{I(s)}$, then one can rewrite the assignment rules in (B.2) by substituting the term $j\omega$ with the term $s$ (which we conceive as $s := \sigma_s + j\omega_s$ with $\sigma_s, \omega_s \in \mathbb{R}$) such that

$$Z_1(s) := \frac{(R_c + R_w) + L_0 s}{1 + (R_w + R_c)C_p s + C_p L_0 s^2},\qquad Z_2(s) := \frac{R_c R_w + R_c L_0 s}{R_c + R_w + (L_0 + C_p R_c R_w)s + C_p R_c L_0 s^2}, \tag{B.3a}$$
$$Z_1(s) := \frac{N_1(s)}{D_1(s)},\qquad Z_2(s) := \frac{N_2(s)}{D_2(s)}, \tag{B.3b}$$
$$Z_1(s) := k_1\,\frac{s - z_{11}}{(s - p_{11})(s - p_{12})},\qquad Z_2(s) := k_2\,\frac{s - z_{21}}{(s - p_{21})(s - p_{22})}, \tag{B.3c}$$

where $N_i \in \mathbb{C}^{\mathbb{C}}$ with $i \in \{1,2\}$ denotes the complex numerator polynomial with real coefficients of the respective impedance map, and the map $D_i \in \mathbb{C}^{\mathbb{C}}$ denotes the corresponding complex denominator polynomial with real coefficients. The Julia PL package SymPy.jl (see [110]) is utilized in order to perform symbolic computations with regard to (B.2) and (B.3).


In (B.3), the positive numbers $k_1$ and $k_2$ are defined as $k_1 := \frac{L_0}{L_0 C_p}$ and $k_2 := \frac{L_0 R_c}{L_0 C_p R_c}$, i.e., both reduce to $\frac{1}{C_p}$. Moreover, $z_{i1} \in \mathbb{C}$ with $i \in \{1,2\}$ refers to the zero of the respective impedance map, and $p_{ij} \in \mathbb{C}$ with $i \in \{1,2\}$ and $j \in \{1,2\}$ refers to a pole of the respective impedance map. If the poles have non-zero imaginary parts, i.e., $\forall i.\forall j.\ \mathrm{Im}(p_{ij}) \ne 0$, then one can define the resonance frequency $f_{ri} \in \mathbb{R}_+$ for each $i$ as $f_{ri} := \left|\frac{\mathrm{Im}(p_{i1})}{2\pi}\right|$. A useful approximation of the resonance frequency is given by $f_{ri} \approx \frac{1}{2\pi\sqrt{C_p L_0}}$. Observe that we limit our consideration of a real inductive component's impedance map $Z$ to the case of one resonance frequency.
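The rational forms (B.3a) can be cross-checked numerically against the circuit-level expressions (B.2), and the resonance frequency obtained from the poles can be compared with the approximation $1/(2\pi\sqrt{C_pL_0})$. The sketch below uses $L_0 = 2\,\mu$H, $R_c = 1\,$mΩ, and $C_p = 10\,$pF from Figure B.3 (A) together with an assumed winding resistance $R_w = 100\,$mΩ. All function names are hypothetical, and the poles are computed with the quadratic formula rather than with the symbolic tooling (SymPy.jl) mentioned above.

```julia
# Synthetic parameters: L0, Rc, Cp as in Figure B.3 (A); Rw = 100 mΩ is an assumed value
L0, Rw, Rc, Cp = 2e-6, 0.1, 1e-3, 10e-12

# (B.2): impedances as functions of the angular frequency ω
Z1_jw(ω) = ((Rc + Rw) + im * ω * L0) * (1 / (im * ω * Cp)) /
           ((Rw + Rc) + im * ω * L0 + 1 / (im * ω * Cp))
Z2_jw(ω) = 1 / (1 / (Rw + im * ω * L0) + 1 / Rc + im * ω * Cp)

# (B.3a): the same impedances as rational functions of s
Z1_s(s) = ((Rc + Rw) + L0 * s) / (1 + (Rw + Rc) * Cp * s + Cp * L0 * s^2)
Z2_s(s) = (Rc * Rw + Rc * L0 * s) / (Rc + Rw + (L0 + Cp * Rc * Rw) * s + Cp * Rc * L0 * s^2)

# Poles of Z1: roots of D1(s) = Cp*L0*s^2 + (Rw + Rc)*Cp*s + 1
qa, qb, qc = Cp * L0, (Rw + Rc) * Cp, 1.0
p11 = (-qb + sqrt(complex(qb^2 - 4 * qa * qc))) / (2 * qa)
fr1 = abs(imag(p11) / (2π))              # resonance frequency from the pole
fr_approx = 1 / (2π * sqrt(Cp * L0))     # approximation 1 / (2π sqrt(Cp L0))

println("fr1 = ", fr1, " Hz, approximation = ", fr_approx, " Hz")
```

For these values the damping term is tiny compared with $1/(C_pL_0)$, so both frequencies agree to many digits (about 35.6 MHz).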

In Figure B.3, I illustrate the magnitude (or modulus) $|Z(\omega)|$ and the phase (or argument) $\theta(\omega)$ of the impedances in (B.2) associated with the representatives in Figure B.2 for synthetic data w.r.t. the four-tuple $(L_0,R_w,R_c,C_p)$ and the frequency range $[1\times 10^{2}\,\mathrm{Hz}, 1\times 10^{8}\,\mathrm{Hz}]$.¹

[Figure B.3: four panels (A) to (D), each showing the magnitude $|Z(\omega)|$ in Ω (logarithmic axis from $10^{-3}$ to $10^{7}$) and the phase $\theta(\omega)$ in ° versus the frequency in Hz ($10^{2}$ to $10^{8}$).
(A) Given $Z_1(\omega)$, choose $(2\,\mu\mathrm{H}, R_w, 1\,\mathrm{m}\Omega, 10\,\mathrm{pF})$ where $R_w \in [1\,\mathrm{m}\Omega, 10\,\Omega]$.
(B) Given $Z_1(\omega)$, choose $(L_0, 100\,\mathrm{m}\Omega, 1\,\mathrm{m}\Omega, 10\,\mathrm{pF})$ where $L_0 \in [1\,\mu\mathrm{H}, 9\,\mathrm{mH}]$.
(C) … where $R_w \in [1\,\mathrm{m}\Omega, 10\,\Omega]$.
(D) Given $Z_2(\omega)$, choose $(L_0, 10\,\mathrm{m}\Omega, 1\,\mathrm{k}\Omega, 1\,\mathrm{nF})$ where $L_0 \in [1\,\mu\mathrm{H}, 9\,\mathrm{mH}]$.]

FIGURE B.3: Given the frequency range $[1\times 10^{2}\,\mathrm{Hz}, 1\times 10^{8}\,\mathrm{Hz}]$, the magnitude $|Z(\omega)|$ and the phase $\theta(\omega)$ of the impedances in (B.2) associated with the representatives in Figure B.2 for synthetic data w.r.t. the four-tuple $(L_0,R_w,R_c,C_p)$.

¹It is supposed that the identification $\mathrm{abs}(Z) \equiv |Z(\omega)|$ is given, where abs refers to the single-valued absolute value function with the signature $\mathbb{C}^{\mathbb{R}_+} \to \mathbb{R}_+$, and it is supposed that the identification $\arg(Z) \equiv \theta(\omega)$ is given, where arg refers to the single-valued argument function with the signature $\mathbb{C}^{\mathbb{R}_+} \to \mathbb{R}$.


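In order to move from the field theoretical level in (2.1) to the circuit theoretical level in (B.1), let us invoke Poynting's theorem (see, e.g., [139, p. 108ff]) for the frequency domain such that one can determine the three fundamental passive electrical components by the following identifications:

$$P_L \equiv \int_{\Omega_{2D}} \tfrac{1}{2}\,\mathbf{J}_{\mathrm{cond}}\cdot\mathbf{E}\,\mathrm{d}V,\qquad W_m \equiv \int_{\Omega_{2D}} \tfrac{1}{4}\,\mathbf{B}\cdot\mathbf{H}\,\mathrm{d}V,\qquad W_e \equiv \int_{\Omega_{2D}} \tfrac{1}{4}\,\mathbf{E}\cdot\mathbf{D}\,\mathrm{d}V, \tag{B.4}$$
$$R \equiv \frac{P_L}{I_{\mathrm{rms}}^2},\qquad L \equiv \frac{2W_m}{I_{\mathrm{rms}}^2},\qquad C \equiv \frac{2W_e}{U_{\mathrm{rms}}^2}, \tag{B.5}$$

where $P_L \in \mathbb{R}_+$ denotes the time-averaged ohmic loss in $\Omega_{2D}$, $W_m \in \mathbb{R}_+$ denotes the time-averaged magnetic energy in $\Omega_{2D}$, and $W_e \in \mathbb{R}_+$ denotes the time-averaged electric energy in $\Omega_{2D}$. In accordance with the elaborations in § 2.2.1, one can conceive the 3-tuple $(P_L,W_m,W_e)$ and the 3-tuple $(R,L,C)$ as 3-tuples of evaluated quantities of interest.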
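The identifications (B.5) amount to simple scalar conversions once the time-averaged quantities from (B.4) and the rms terminal values are available. The sketch below uses hypothetical function names and made-up numbers chosen only to demonstrate a round trip; they are not simulation results from this work.

```julia
# Lumped elements from time-averaged field quantities, cf. (B.5)
lumped_R(PL, Irms) = PL / Irms^2          # resistance from the ohmic loss
lumped_L(Wm, Irms) = 2 * Wm / Irms^2      # inductance from the magnetic energy
lumped_C(We, Urms) = 2 * We / Urms^2      # capacitance from the electric energy

# Round trip with assumed element values and rms excitations
R, L, C = 0.05, 2e-6, 10e-12
Irms, Urms = 1.5, 3.0
PL = R * Irms^2            # time-averaged power dissipated in R
Wm = L * Irms^2 / 2        # time-averaged magnetic energy stored in L
We = C * Urms^2 / 2        # time-averaged electric energy stored in C

println(lumped_R(PL, Irms), "  ", lumped_L(Wm, Irms), "  ", lumped_C(We, Urms))
```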

Notice that $W_e$ and $C$ are excluded from the considerations with regard to the 2D-LBVP in § 5.1. Expressed in terms of the circuit diagram representative in Figure B.2a, one can set $R_c \equiv 0\,\mathrm{m}\Omega$, $C_p \equiv 0\,\mathrm{pF}$, $R_w \equiv R$, and $L_0 \equiv L$ such that the circuit diagram representative reduces to a series connection of the impedances associated with the resistance $R_w$ and the inductance $L_0$.

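Furthermore, if we consider the map $U \in \mathbb{C}^{\mathbb{R}_+}$ and the map $I \in \mathbb{C}^{\mathbb{R}_+}$ with respect to the external terminal nodes, then one can express the resistance $R$ and the inductance $L$ as

$$R \equiv \mathrm{Re}\!\left(\frac{U(\omega)}{I(\omega)}\right),\qquad L \equiv \mathrm{Re}\!\left(\frac{\Psi(\omega)}{I(\omega)}\right), \tag{B.6}$$

where the map $\Psi \in \mathbb{C}^{\mathbb{R}_+}$ denotes the complex-valued total magnetic flux. By invoking the definition (ii) in (2.5) and applying the theorem of Stokes, the assignment rule of the map $\Psi$ can be stated as

$$\Psi = \omega \mapsto \int_{\partial A} \mathbf{A}\cdot\mathrm{d}\mathbf{s}. \tag{B.7}$$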

Notice well that due to the numerical integration that is involved in determining the

entities such as Wmin (B.4) or such as Ψ(ω)in (B.6), there might be slight differences

in the decimal places with regard to the resistance Rand the inductance Ldepending

on the method of computation – even if we assume that the domain of integration is

properly chosen.

B.2 A visualization of evaluated data-fit low-fidelity models

regarding (5.12), (5.14c), and (5.15)


[Figure B.4: panels (1a), (2a), (3a) show surface plots and panels (1b), (2b), (3b) show contour plots over the axes $x_1$ [mm] and $x_2$ [mm]. (A) Representation of $z := \tilde{j}_{P_L,\omega_0}(x_1,x_2)$ in mW. (B) Representation of $z := \tilde{V}_{ut}(x_1,x_2)$ in cm³.]

FIGURE B.4: By using the Sobol quasi-random sequence sampling plan with $m := 21$ and the data-fit low-fidelity models in § 3.2.1, i.e., (1) Polynomial, (2) TPS RBF, and (3) Kriging; representations (surface (a) and contour (b)) of $\tilde{j}_{P_L,\omega_0}(x_1,x_2)$ and $\tilde{V}_{ut}(x_1,x_2)$ where $\omega_0 := 2\pi\cdot 100\,\mathrm{kHz}$.


[Figure B.5: surface plots (1a), (2a), (3a) and contour plots (1b), (2b), (3b); axes $x_1$ [mm] and $x_2$ [mm]. (A) Representation of $z := \hat{j}_{P_L,V_{ut},\omega_0}(x_1,x_2)$ in [mW/cm$^3$]. (B) Representation of $z := \hat{Q}_{L,\omega_0}(x_1,x_2)$.]

FIGURE B.5: Representations (surface (a) and contour (b)) of $\tilde{j}_{P_L,V_{ut},\omega_0}(x_1,x_2)$ and $\tilde{Q}_{L,\omega_0}(x_1,x_2)$, where $\omega_0 := 2\pi \cdot 100\,\mathrm{kHz}$, obtained with the Sobol quasi-random sequence sampling plan with $m := 21$ and the data-fit low-fidelity models of § 3.2.1, i.e., (1) polynomial, (2) TPS RBF, and (3) Kriging.
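Model (2) of these captions, the TPS RBF, can be sketched in Python as follows. The sketch assumes the common thin-plate-spline formulation $s(\mathbf{x}) = \sum_{i=1}^{m} w_i\,\phi(\lVert \mathbf{x}-\mathbf{x}_i\rVert) + c_0 + c_1 x_1 + c_2 x_2$ with $\phi(r) = r^2 \log r$ and the side conditions $\sum_i w_i\, p(\mathbf{x}_i) = 0$ for $p \in \{1, x_1, x_2\}$; the exact variant of § 3.2.1 may differ. The names `fit_tps` and `solve` are hypothetical, and the dense Gaussian elimination only keeps the example free of dependencies.

```python
import math


def tps_kernel(r):
    """Thin-plate spline phi(r) = r**2 * log(r), continuously extended by phi(0) = 0."""
    return 0.0 if r == 0.0 else r * r * math.log(r)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (copies inputs)."""
    a = [row[:] for row in a]
    b = b[:]
    n = len(b)
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (b[r] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def fit_tps(nodes, values):
    """Fit s(x) = sum_i w_i phi(|x - x_i|) + c0 + c1*x1 + c2*x2 to 2-D data.

    Solves the saddle-point system [[Phi, P], [P^T, 0]] [w; c] = [f; 0].
    """
    m = len(nodes)
    n = m + 3
    a = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(nodes):
        for j, (xj, yj) in enumerate(nodes):
            a[i][j] = tps_kernel(math.hypot(xi - xj, yi - yj))
        for k, p in enumerate((1.0, xi, yi)):  # polynomial blocks P and P^T
            a[i][m + k] = p
            a[m + k][i] = p
    coeffs = solve(a, list(values) + [0.0, 0.0, 0.0])
    w, c = coeffs[:m], coeffs[m:]

    def model(x1, x2):
        s = c[0] + c[1] * x1 + c[2] * x2
        for wi, (xi, yi) in zip(w, nodes):
            s += wi * tps_kernel(math.hypot(x1 - xi, x2 - yi))
        return s

    return model
```

Because a linear polynomial tail is included, the fitted model reproduces any linear function exactly, which is a convenient sanity check on the assembled system; the nodes must not all be collinear, otherwise the system is singular.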


[Figure B.6: gradient plots (1a), (2a), (3a) and (1b), (2b), (3b); axes $x_1$ [mm] and $x_2$ [mm]. (A) Depicting $\operatorname{grad}(\tilde{K})(x_1,x_2)$ w.r.t. Figure B.4a. (B) Depicting $\operatorname{grad}(\tilde{K})(x_1,x_2)$ w.r.t. Figure B.4b.]

FIGURE B.6: The gradient $\operatorname{grad}(\tilde{K})(x_1,x_2)$ depicted as a projection onto the contour representations of the data-fit low-fidelity models for the functions in Figure B.4.
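Vector fields such as $\operatorname{grad}(\tilde{K})(x_1,x_2)$ in this figure can be sampled on a grid and then drawn as arrows over a contour plot of the same model. The Python sketch below assumes central finite differences with a hypothetical step `h`; an analytic or automatic-differentiation gradient of the fitted surrogate would serve equally well, and `toy_surrogate` is only a hypothetical stand-in for $\tilde{K}$.

```python
def linspace(a, b, n):
    """n equally spaced values from a to b (inclusive)."""
    return [a + (b - a) * i / (n - 1) for i in range(n)]


def gradient_field(model, xs, ys, h=1e-6):
    """Central-difference approximation of grad(model) at every grid node.

    Returns a list of ((x1, x2), (d/dx1, d/dx2)) pairs, e.g. for a quiver
    plot drawn on top of a contour plot of the same model.
    """
    field = []
    for x in xs:
        for y in ys:
            gx = (model(x + h, y) - model(x - h, y)) / (2.0 * h)
            gy = (model(x, y + h) - model(x, y - h)) / (2.0 * h)
            field.append(((x, y), (gx, gy)))
    return field


def toy_surrogate(x1, x2):
    """Hypothetical stand-in for a fitted surrogate K~(x1, x2)."""
    return x1**2 + 3.0 * x1 * x2


grid = gradient_field(toy_surrogate, linspace(1.0, 3.0, 5), linspace(5.0, 10.0, 6))
```

For the quadratic stand-in, the central difference is exact up to rounding, so each sampled vector matches $(2x_1 + 3x_2,\; 3x_1)$.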


[Figure B.7: gradient plots (1a), (2a), (3a) and (1b), (2b), (3b); axes $x_1$ [mm] and $x_2$ [mm]. (A) Depicting $\operatorname{grad}(\tilde{K})(x_1,x_2)$ w.r.t. Figure B.5a. (B) Depicting $\operatorname{grad}(\tilde{K})(x_1,x_2)$ w.r.t. Figure B.5b.]

FIGURE B.7: The gradient $\operatorname{grad}(\tilde{K})(x_1,x_2)$ depicted as a projection onto the contour representations of the data-fit low-fidelity models for the functions in Figure B.5.
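Model (3), Kriging, admits a closed-form gradient, which is convenient for gradient plots of this kind. The Python sketch below is an ordinary-kriging predictor with a fixed Gaussian correlation parameter `theta`. This is an assumption: a full Kriging model would normally estimate it, for instance by maximum likelihood, and the variant of § 3.2.1 may differ. It uses $\hat{y}(\mathbf{x}) = \hat{\mu} + \mathbf{r}(\mathbf{x})^\top \mathbf{R}^{-1}(\mathbf{y} - \hat{\mu}\mathbf{1})$ with $\hat{\mu} = \mathbf{1}^\top\mathbf{R}^{-1}\mathbf{y} / \mathbf{1}^\top\mathbf{R}^{-1}\mathbf{1}$, so the gradient follows by differentiating the correlation vector $\mathbf{r}(\mathbf{x})$. The function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (copies inputs)."""
    a = [row[:] for row in a]
    b = b[:]
    n = len(b)
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (b[r] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def fit_ordinary_kriging(nodes, values, theta=5.0):
    """Ordinary-kriging predictor and its gradient for 2-D data, using the
    fixed Gaussian correlation exp(-theta * |x - x'|^2)."""
    def corr(p, q):
        return math.exp(-theta * ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2))

    big_r = [[corr(p, q) for q in nodes] for p in nodes]
    r_inv_y = solve(big_r, list(values))
    r_inv_1 = solve(big_r, [1.0] * len(nodes))
    mu = sum(r_inv_y) / sum(r_inv_1)  # generalized least-squares mean
    # Weights alpha = R^{-1} (y - mu * 1), obtained by linearity of R^{-1}.
    alpha = [ry - mu * r1 for ry, r1 in zip(r_inv_y, r_inv_1)]

    def predict(x1, x2):
        return mu + sum(a * corr((x1, x2), q) for a, q in zip(alpha, nodes))

    def gradient(x1, x2):
        g1 = g2 = 0.0
        for a, q in zip(alpha, nodes):
            c = a * corr((x1, x2), q)
            g1 += -2.0 * theta * (x1 - q[0]) * c  # d/dx1 of the Gaussian
            g2 += -2.0 * theta * (x2 - q[1]) * c  # d/dx2 of the Gaussian
        return g1, g2

    return predict, gradient
```

Comparing `gradient` against a central difference of `predict` at an arbitrary point is a quick consistency check on the closed-form derivative.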


Bibliography

[1] S. Abramsky and N. Tzevelekos. “Introduction to Categories and Categorical Logic”. In: New Structures for Physics. Ed. by B. Coecke. Vol. 813. Lecture Notes in Physics. Springer, 2011, pp. 3–94.

[2] N. M. Alexandrov and M. Y. Hussaini (editors). Multidisciplinary Design Optimization: State of the Art. SIAM, 1997.

[3] N. M. Alexandrov and R. M. Lewis. “First-Order Approximation and Model Management in Optimization”. In: Large-Scale PDE-Constrained Optimization. Ed. by L. T. Biegler, M. Heinkenschloss, O. Ghattas, and B. van Bloemen Waanders. Vol. 30. Lecture Notes in Computational Science and Engineering. Springer, 2003, pp. 63–79.

[4] N. M. Alexandrov, R. M. Lewis, C. R. Gumbert, L. L. Green, and P. A. Newman. “Approximation and model management in aerodynamic optimization with variable-fidelity models”. In: Journal of Aircraft 38.6 (2001), pp. 1093–1101.

[5] S. Arlot and A. Celisse. “A survey of cross-validation procedures for model selection”. In: Statistics Surveys 4 (2010), pp. 40–79.

[6] D. N. Arnold. Finite Element Exterior Calculus. SIAM, 2018.

[7] D. N. Arnold. “Stability, Consistency, and Convergence of Numerical Discretizations”. In: Encyclopedia of Applied and Computational Mathematics. Ed. by B. Engquist. Springer, 2015.

[8] D. N. Arnold, R. S. Falk, and R. Winther. “Finite element exterior calculus, homological techniques, and applications”. In: Acta Numerica 15 (2006), pp. 1–155.

[9] K. Atkinson and W. Han. Theoretical Numerical Analysis: A Functional Analysis Framework. 3rd ed. Springer, 2009.

[10] C. Audet and M. Kokkolaras (editors). “Special Issue on Blackbox and Derivative-Free Optimization”. In: Optimization and Engineering 17.1 (2016), pp. 1–262.

[11] S. Awodey. Category Theory. 2nd ed. Oxford University Press, 2009.

[12] I. Babuška and J. T. Oden. “Verification and validation in computational engineering and science: basic concepts”. In: Comput. Methods Appl. Mech. Eng. 193.36 (2004), pp. 4057–4066.

[13] J. C. Baez and B. Fong. “A Compositional Framework for Passive Linear Networks”. In: Theory and Applications of Categories 33.38 (2018), pp. 1158–1222.

[14] J. W. Bandler, R. M. Biernacki, S. H. Chen, R. H. Hemmers, and K. Madsen. “Electromagnetic optimization exploiting aggressive space mapping”. In: IEEE Trans. Microwave Theory Tech. 43.12 (1995), pp. 2874–2882.

[15] H. P. Barendregt. The Lambda Calculus: Its Syntax and Semantics. Revised ed. North-Holland, 1985.


[16] M. Barr and C. Wells. “Category Theory for Computing Science”. In: Reprints in Theory and Applications of Categories 22 (2012), pp. 1–538.

[17] R. Barrio, J. M. Peña, and T. Sauer. “Three term recurrence for the evaluation of multivariate orthogonal polynomials”. In: Journal of Approximation Theory 162.2 (2010), pp. 407–420.

[18] M. S. Bazaraa, H. D. Sherali, and C. M. Shetty. Nonlinear Programming: Theory and Algorithms. 3rd ed. Wiley, 2006.

[19] R. K. Beatson, M. J. D. Powell, and A. M. Tan. “Fast evaluation of polyharmonic splines in three dimensions”. In: IMA Journal of Numerical Analysis 27.3 (2007), pp. 427–450.

[20] P. Benner, S. Gugercin, and K. Willcox. “A Survey of Projection-Based Model Reduction Methods for Parametric Dynamical Systems”. In: SIAM Review 57.4 (2015), pp. 483–531.

[21] P. Benner, M. Hinze, and E. J. W. ter Maten (editors). Model Reduction for Circuit Simulation. 1st ed. Springer, 2011.

[22] J.-P. Berrut and L. N. Trefethen. “Barycentric Lagrange Interpolation”. In: SIAM Review 46.3 (2004), pp. 501–517.

[23] L. Bessi. Surrogates in Julia. https://github.com/SciML/Surrogates.jl. [Online; accessed 12-February-2020]. 2020.

[24] H.-G. Beyer and H.-P. Schwefel. “Evolution strategies: A comprehensive introduction”. In: Natural Computing 1 (2002), pp. 3–52.

[25] J. Bezanson, J. Chen, B. Chung, S. Karpinski, V. B. Shah, J. Vitek, and L. Zoubritzky. “Julia: Dynamism and Performance Reconciled by Design”. In: Proc. ACM Program. Lang. 2.OOPSLA (2018), pp. 1–23.

[26] J. Bezanson, A. Edelman, S. Karpinski, and V. B. Shah. “Julia: A Fresh Approach to Numerical Computing”. In: SIAM Review 59.1 (2017), pp. 65–98.

[27] M. E. Biancolini. Fast Radial Basis Functions for Engineering Applications. 1st ed. Springer, 2017.

[28] P. B. Bochev and A. C. Robinson. “Matching algorithms with physics: exact sequences of finite element spaces”. In: Collected lectures on preservation of stability under discretization. Ed. by D. Estep and S. Tavener. Workshop on the preservation of stability under discretization. SIAM, 2001, pp. 145–166.

[29] D. E. Bockelman and W. R. Eisenstadt. “Combined Differential and Common-Mode Scattering Parameters: Theory and Simulation”. In: IEEE Transactions on Microwave Theory and Techniques 43.7 (1995), pp. 1530–1539.

[30] Z. Bontinck. Simulation and Robust Optimization for Electric Devices with Uncertainties. Technische Universität Darmstadt, PhD thesis. 2018.

[31] A. J. Booker, J. E. Dennis, P. D. Frank, D. B. Serafini, V. Torczon, and M. W. Trosset. “A rigorous framework for optimization of expensive functions by surrogates”. In: Structural Optimization 17.1 (1999), pp. 1–13.

[32] A. Bossavit. Computational Electromagnetism: Variational Formulations, Complementarity, Edge Elements. Academic Press, 1998.

[33] A. Bossavit. “Discretization of Electromagnetic Problems: The ‘Generalized Finite Differences’ Approach”. In: Numerical Methods in Electromagnetics. Ed. by P. G. Ciarlet. Vol. 13. Handbook of Numerical Analysis. Elsevier, 2005, pp. 105–197.


[34] J. P. Boyd and K. W. Gildersleeve. “Numerical experiments on the condition number of the interpolation matrices for radial basis functions”. In: Applied Numerical Mathematics 61.4 (2011), pp. 443–459.

[35] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[36] R. Brown and T. Porter. “Category Theory: an abstract setting for analogy and comparison”. In: What is Category Theory? Advanced Studies in Mathematics and Logic, Polimetrica Publisher. 2006, pp. 257–274.

[37] R. Brown and T. Porter. “The methodology of mathematics”. In: The Mathematical Gazette 79.485 (1995), pp. 321–334.

[38] M. D. Buhmann. Radial Basis Functions – Theory and Implementation. 1st ed. Cambridge University Press, 2003.

[39] H.-J. Bungartz and M. Griebel. “Sparse grids”. In: Acta Numerica 13 (2004), pp. 147–269.

[40] S. Burgard, O. Farle, P. Loew, and R. Dyczij-Edlinger. “Fast Shape Optimization of Microwave Devices Based on Parametric Reduced-Order Models”. In: IEEE Transactions on Magnetics 50.2 (2014), pp. 629–632.

[41] R. M. Burkart. Advanced Modeling and Multi-Objective Optimization of Power Electronic Converter Systems. ETH Zürich, PhD thesis. 2016.

[42] W. Burke. Applied Differential Geometry. 1st ed. Cambridge University Press, 1985.

[43] J. M. F. Castillo. “The Hitchhiker Guide to Categorical Banach Space Theory. Part I”. In: Extracta Mathematicae 25.2 (2010), pp. 103–149.

[44] L. Codecasa. “Novel Approach to Model Order Reduction for Nonlinear Eddy-Current Problems”. In: IEEE Transactions on Magnetics 51.3 (2015), pp. 1–4.

[45] B. Coecke and A. Kissinger. Picturing Quantum Processes – A First Course in Quantum Theory and Diagrammatic Reasoning. Cambridge University Press, 2017.

[46] B. Coecke (editor). New Structures for Physics. Springer, 2011.

[47] A. R. Conn, K. Scheinberg, and L. N. Vicente. Introduction to Derivative-Free Optimization. 1st ed. SIAM, 2009.

[48] S. D. Conte and C. de Boor. Elementary Numerical Analysis: An Algorithmic Approach. 3rd ed. McGraw-Hill, 1980.

[49] G. Crevecoeur. Numerical Methods for Low Frequency Electromagnetic Optimization and Inverse Problems using Multi-Level Techniques. Ghent University, PhD thesis. 2009.

[50] G. Crevecoeur and R. H. De Staelen. “On cost function transformations for the reduction of uncertain model parameters’ impact towards the optimal solutions”. In: J. Comput. Appl. Math. 289 (2015), pp. 392–399.

[51] F. Cucker and D.-X. Zhou. Learning Theory: An Approximation Theory Viewpoint. Cambridge University Press, 2007.

[52] J. Culbertson and K. Sturtz. “A Categorical Foundation for Bayesian Probability”. In: Applied Categorical Structures 22.4 (2014), pp. 647–662.

[53] A. Dean, M. Morris, J. Stufken, and D. Bingham (editors). Handbook of Design and Analysis of Experiments. 1st ed. CRC Press, 2015.


[54] M. C. Delfour and J.-P. Zolésio. Shapes and Geometries – Metrics, Analysis, Differential Calculus, and Optimization. 2nd ed. SIAM, 2011.

[55] P. Deuflhard and A. Hohmann. Numerical Analysis in Modern Scientific Computing: An Introduction. 2nd ed. Springer, 2003.

[56] D. Echeverría Ciaurri. Multi-Level Optimization: Space Mapping and Manifold Mapping. Amsterdam University, PhD thesis. 2007.

[57] M. S. Eldred and D. M. Dunlavy. “Formulations for surrogate-based optimization with data-fit, multifidelity and reduced-order models”. In: 2006 11th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference. Portsmouth, U.S.A., September 6–8, 2006, pp. 1–20.

[58] C. Elliott. “The simple essence of automatic differentiation”. In: 2018 23rd ACM SIGPLAN International Conference on Functional Programming (ICFP). St. Louis, United States, September 23–29, 2018, pp. 1–29.

[59] M. H. van Emden and B. Moa. “Termination Criteria in the Moore-Skelboe Algorithm for Global Optimization by Interval Arithmetic”. In: Frontiers in Global Optimization. Ed. by C. A. Floudas and P. Pardalos. Springer, 2004.

[60] R. W. Erickson and D. Maksimovic. Fundamentals of Power Electronics. 2nd ed. Springer, 2001.

[61] K.-T. Fang, R. Li, and A. Sudjianto. Design and Modeling for Computer Experiments. 1st ed. CRC Press, 2006.

[62] G. E. Fasshauer. Meshfree Approximation Methods with Matlab. 1st ed. World Scientific Publishing, 2007.

[63] G. E. Fasshauer and M. J. McCourt. “Stable Evaluation of Gaussian Radial Basis Function Interpolants”. In: SIAM Journal on Scientific Computing 34.2 (2012), pp. 737–762.

[64] R. Feldt. Black-box optimization package for Julia. https://github.com/robertfeldt/BlackBoxOptim.jl. [Online; accessed 12-February-2020]. 2020.

[65] J. L. Fiadeiro. Categories for Software Engineering. Springer, 2005.

[66] H. Flanders. Differential Forms with Applications to the Physical Sciences. 1st ed. Academic Press, 1963.

[67] C. A. Floudas. Deterministic Global Optimization: Theory, Methods and Applications. 1st ed. Springer, 2000.

[68] B. Fong, D. I. Spivak, and R. Tuyéras. “Backprop as Functor: A compositional perspective on supervised learning”. In: 2019 34th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS). Vancouver, Canada, June 24–27, 2019, pp. 1–13.

[69] A. Forrester and A. Keane. “Recent advances in surrogate-based optimization”. In: Progress in Aerospace Sciences 45.1–3 (2009), pp. 50–79.

[70] A. Forrester, A. Sóbester, and A. Keane. Engineering Design via Surrogate Modelling - A Practical Guide. Wiley, 2008.

[71] P. Freyd. Abelian categories. An Introduction to the Theory of Functors. 1st ed. Harper and Row, 1964.

[72] F. Genovese, A. Gryzlov, J. Herold, A. Knispel, M. Perone, E. Post, and A. Videla. “idris-ct: A library to do category theory in Idris”. In: 2019 2nd ACT Applied Category Theory Conference. Oxford, United Kingdom, July 15–19, 2019, pp. 1–13.


[73] R. Geroch. Mathematical Physics. University of Chicago Press, 1985.

[74] R. G. Ghanem and P. D. Spanos. Stochastic Finite Elements: A Spectral Approach. 1st ed. Springer, 1991.

[75] A. Gil, J. Segura, and N. M. Temme. Numerical Methods for Special Functions. 1st ed. SIAM, 2007.

[76] D. Ginsbourger, R. Le Riche, and L. Carraro. “Kriging Is Well-Suited to Parallelize Optimization”. In: Computational Intelligence in Expensive Optimization Problems. Ed. by Y. Tenne and C. K. Goh. Springer, 2010.

[77] L. Giraldi, A. Litvinenko, D. Liu, H. G. Matthies, and A. Nouy. “To Be or Not to Be Intrusive? The Solution of Parametric and Stochastic Equations - the ‘Plain Vanilla’ Galerkin Case”. In: SIAM Journal on Scientific Computing 36.6 (2014), pp. 2720–2744.

[78] R. Goldman. Pyramid Algorithms: A Dynamic Programming Approach to Curves and Surfaces for Geometric Modeling. 1st ed. Morgan Kaufmann, 2003.

[79] L. Grasedyck, D. Kressner, and C. Tobler. “A literature survey of low-rank tensor approximation techniques”. In: GAMM-Mitteilungen 36.1 (2013), pp. 53–78.

[80] R. Griesse and B. Vexler. “Numerical Sensitivity Analysis for the Quantity of Interest in PDE-Constrained Optimization”. In: SIAM J. Sci. Comput. 29.1 (2007), pp. 22–48.

[81] A. Griewank and A. Walther. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. 2nd ed. SIAM, 2008.

[82] J. Groß. Linear Regression. 1st ed. Springer, 2003.

[83] P. W. Gross and P. R. Kotiuga. Electromagnetic Theory and Computation: A Topological Approach. 1st ed. Cambridge University Press, 2004.

[84] NEOS Guide. NEOS Server: State-of-the-Art Solvers for Numerical Optimization. https://neos-guide.org/. [Online; accessed 12-February-2020]. 2020.

[85] B. Gustavsen and A. Semlyen. “Rational approximation of frequency domain responses by vector fitting”. In: IEEE Transactions on Power Delivery 14.3 (1999), pp. 1052–1061.

[86] S. Gutsche. Constructive Category Theory and Applications to Algebraic Geometry. Universität Siegen, PhD thesis. 2017.

[87] R. T. Haftka, D. Villanueva, and A. Chaudhuri. “Parallel surrogate-assisted global optimization with expensive functions - a survey”. In: Struct Multidisc Optim 54 (2016), pp. 3–13.

[88] R. Harper. Practical Foundations for Programming Languages. 2nd ed. Cambridge University Press, 2015.

[89] D. W. Hart. Introduction to Power Electronics. 1st ed. Prentice Hall, 1996.

[90] B. Hashemi and L. N. Trefethen. “Chebfun in Three Dimensions”. In: SIAM Journal on Scientific Computing 39.5 (2017), pp. C341–C363.

[91] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. 2nd ed. Springer, 2008.

[92] A. Hatcher. Algebraic Topology. 1st ed. Cambridge University Press, 2002.

[93] F. W. Hehl and Y. N. Obukhov. Foundations of Classical Electrodynamics. 1st ed. Birkhäuser, 2003.


[94] J. S. Hesthaven, G. Rozza, and B. Stamm. Certified Reduced Basis Methods for Parametrized Partial Differential Equations. Springer, 2015.

[95] M. Hintermüller and L. Vicente. “Space mapping for optimal control of partial differential equations”. In: SIAM J. Optim 15.4 (2005), pp. 1002–1025.

[96] M. Hinze, R. Pinnau, M. Ulbrich, and S. Ulbrich. Optimization with PDE Constraints. Springer, 2009.

[97] R. Hiptmair, F. Krämer, and J. Ostrowski. “Robust Maxwell Formulations for all Frequencies”. In: IEEE Transactions on Magnetics 44.6 (2008), pp. 682–685.

[98] R. Horst and H. Tuy. Global optimization: Deterministic approaches. 3rd ed. Springer, 1995.

[99] A. S. Householder. The Theory of Matrices in Numerical Analysis. Dover, 2006.

[100] B. G. M. Husslage, G. Rennen, E. R. van Dam, and D. den Hertog. “Space-filling Latin hypercube designs for computer experiments”. In: Optimization and Engineering 12 (2011), pp. 611–630.

[101] Julia Interop. An interface for using MATLAB™ from Julia using the MATLAB C API. https://github.com/JuliaInterop/MATLAB.jl. [Online; accessed 12-February-2020]. 2020.

[102] C. Ionescu and P. Jansson. “Domain-specific languages of mathematics: Presenting mathematical analysis using functional programming”. In: 4th and 5th International Workshop on Trends in Functional Programming in Education. 2016, pp. 1–15.

[103] J. D. Jackson. Classical Electrodynamics. 3rd ed. John Wiley & Sons, 1999.

[104] L. Jaulin, M. Kieffer, O. Didrit, and E. Walter. Applied Interval Analysis: With Examples in Parameter and State Estimation, Robust Control and Robotics. 1st ed. Springer, 2001.

[105] S. G. Johnson. HCubature.jl – A pure-Julia multidimensional h-adaptive integration. https://github.com/JuliaMath/HCubature.jl. [Online; accessed 12-February-2020]. 2020.

[106] S. G. Johnson. Sobol low-discrepancy-sequence (LDS) package for Julia. https://github.com/stevengj/Sobol.jl. [Online; accessed 12-February-2020]. 2020.

[107] S. G. Johnson. The NLopt nonlinear-optimization package. http://github.com/stevengj/nlopt. [Online; accessed 12-February-2020]. 2020.

[108] D. R. Jones. “A Taxonomy of Global Optimization Methods Based on Response Surfaces”. In: Journal of Global Optimization 21.4 (2001), pp. 345–383.

[109] D. R. Jones, C. D. Perttunen, and B. E. Stuckman. “Lipschitzian optimization without the Lipschitz constant”. In: J. Optimization Theory and Applications 79.1 (1993), pp. 157–181.

[110] JuliaPy. A Julia interface to SymPy (a Python library for symbolic mathematics) via PyCall. https://github.com/JuliaPy/SymPy.jl. [Online; accessed 12-February-2020]. 2020.

[111] J. Kangas, T. Tarhasaari, and L. Kettunen. “Maxwell equations and finite element software systems: object-oriented coding needs well defined objects”. In: IEEE Transactions on Magnetics 36.4 (2000), pp. 1645–1648.

[112] M. K. Kazimierczuk. High-Frequency Magnetic Components. 2nd ed. Wiley, 2014.


[113] P. Kerschke, H. H. Hoos, F. Neumann, and H. Trautmann. “Automated Algorithm Selection: Survey and Perspectives”. In: Evolutionary Computation 27.1 (2019), pp. 3–45.

[114] L. Kettunen, T. Kovanen, and T. Tarhasaari. “Electromagnetism and cross-disciplinary problems”. In: 2016 URSI International Symposium on Electromagnetic Theory (EMTS). Espoo, Finland, August 14–18, 2016, pp. 500–501.

[115] D. Klis, M. Jochum, O. Farle, and R. Dyczij-Edlinger. “Application of nonlinear model-order reduction to 3D eddy current problems”. In: 2013 International Conference on Electromagnetics in Advanced Applications (ICEAA). Torino, Italy, September 09–13, 2013, pp. 344–347.

[116] M. J. Kochenderfer and T. A. Wheeler. Algorithms for Optimization. The MIT Press, 2019.

[117] I. Kolář, P. W. Michor, and J. Slovák. Natural operations in differential geometry. Springer, 1993.

[118] Y. Konkel, O. Farle, A. Köhler, A. Schultschik, and R. Dyczij-Edlinger. “Adaptive strategies for fast frequency sweeps”. In: COMPEL 30.2 (2011), pp. 1855–1869.

[119] S. Koziel. “Space mapping: Performance, reliability, open problems and perspectives”. In: 2017 IEEE MTT-S International Microwave Symposium (IMS). Honolulu, U.S.A., June 4–9, 2017, pp. 1512–1514.

[120] S. Koziel and J. W. Bandler. “Coarse and Surrogate Model Assessment for Engineering Design Optimization with Space Mapping”. In: 2007 IEEE/MTT-S International Microwave Symposium (IMS). Honolulu, U.S.A., June 3–8, 2007, pp. 107–110.

[121] S. Koziel, J. W. Bandler, and K. Madsen. “Quality assessment of coarse models and surrogates for space mapping optimization”. In: Optimization and Engineering 9 (2008), pp. 375–391.

[122] S. Koziel and A. Bekasiewicz. “Sequential approximate optimisation for statistical analysis and yield optimisation of circularly polarised antennas”. In: IET Microw. Antennas Propag 12.13 (2018), pp. 2060–2064.

[123] S. Koziel and A. Bekasiewicz. “Variable-fidelity design optimization of antennas with automated model selection”. In: 2016 10th European Conference on Antennas and Propagation (EuCAP). Davos, Switzerland, April 10–15, 2016, pp. 1–5.

[124] S. Koziel and D. Echeverría Ciaurri. “Reliable Simulation-Driven Design Optimization of Microwave Structures Using Manifold Mapping”. In: Progress In Electromagnetics Research B 26.1 (2010), pp. 361–382.

[125] S. Koziel, D. Echeverría Ciaurri, and L. Leifsson. “Chapter 3: Surrogate-Based Methods”. In: Computational Optimization, Methods and Algorithms. Ed. by S. Koziel and X. S. Yang. Vol. 356. Studies in Computational Intelligence. Springer, 2011, pp. 33–59.

[126] S. Koziel, L. Leifsson, and X. S. Yang (editors). Solving Computationally Expensive Engineering Problems: Methods and Applications. Springer, 2014.

[127] S. Koziel and L. Leifsson (editors). Surrogate-Based Modeling and Optimization: Applications in Engineering. Springer, 2013.


[128] D. Kraft. “Algorithm 733: TOMP–Fortran modules for optimal control calculations”. In: ACM Transactions on Mathematical Software 20.3 (1994), pp. 262–281.

[129] S. Kucherenko and B. Iooss. “Derivative-Based Global Sensitivity Measures”. In: Handbook of Uncertainty Quantification. Ed. by R. Ghanem, D. Higdon, and H. Owhadi. Cham: Springer, 2017. Chap. 36, pp. 1241–1263.

[130] J. Kuipers, A. Plaat, J. A. M. Vermaseren, and H. J. van den Herik. “Improving multivariate Horner schemes with Monte Carlo tree search”. In: Computer Physics Communications 184.11 (2013), pp. 2391–2395.

[131] S. Kurz and B. Auchmann. “Differential Forms and Boundary Integral Equations for Maxwell-Type Problems”. In: Fast Boundary Element Methods in Engineering and Industrial Applications. Ed. by U. Langer, M. Schanz, O. Steinbach, and W. Wendland. Springer, 2012.

[132] V. Lahtinen. Searching for Frontiers in Contemporary Eddy Current Model Based Hysteresis Loss Modelling of Superconductors. Tampere University of Technology, PhD thesis. 2014.

[133] V. Lahtinen, P. R. Kotiuga, and A. Stenvall. An electrical engineering perspective on missed opportunities in computational physics. arXiv:1809.01002v2. 2018.

[134] S. Lang. Differential manifolds. 2nd ed. Springer, 1985.

[135] J. Larson, M. Menickelly, and S. M. Wild. “Derivative-free optimization methods”. In: Acta Numerica 28 (2019), pp. 287–404.

[136] F. W. Lawvere and S. H. Schanuel. Conceptual Mathematics. 2nd ed. Cambridge University Press, 2009.

[137] L. Lebensztajn, C. A. R. Marretto, M. C. Costa, and J.-L. Coulomb. “Kriging: A Useful Tool for Electromagnetic Device Optimization”. In: IEEE Transactions on Magnetics 40.2 (2004), pp. 1196–1199.

[138] M. C. Lehmann, M. Hadžiefendić, A. Piwonski, and R. Schuhmann. “Encoding Electromagnetic Transformation Laws for Dimensional Reduction”. In: International Journal of Numerical Modelling: Electronic Networks, Devices and Fields 33.1 (2020), e2747. https://doi.org/10.1002/jnm.2747.

[139] G. Lehner. Electromagnetic Field Theory for Engineers and Physicists. 1st ed. Springer, 2010.

[140] P. Linz. Theoretical Numerical Analysis: An Introduction to Advanced Techniques. 1st ed. John Wiley & Sons, 1979.

[141] E. Ljungskog. Interpolation of scattered data in Julia. https://github.com/eljungsk/ScatteredInterpolation.jl. [Online; accessed 12-February-2020]. 2020.

[142] D. G. Luenberger. Optimization by Vector Space Methods. Wiley, 1969.

[143] H. D. Macedo and J. N. Oliveira. “Typing linear algebra: A biproduct-oriented approach”. In: Science of Computer Programming 78.11 (2013), pp. 2160–2191.

[144] S. MacLane. Categories for the Working Mathematician. Springer, 1971.

[145] N. Marheineke and R. Pinnau. “Model hierarchies in space mapping optimization: Feasibility study for transport processes”. In: J. Comput. Meth. Sci. Eng 12.1,2 (2012), pp. 63–74.


[146] R. T. Marler and J. S. Arora. “Survey of Multi-Objective Optimization Methods for Engineering”. In: Structural and Multidisciplinary Optimization 26.6 (2004), pp. 369–395.

[147] J. R. R. A. Martins and A. B. Lambe. “Multidisciplinary Design Optimization: A Survey of Architectures”. In: AIAA Journal 51.9 (2013), pp. 2049–2075.

[148] P. McCullagh. “What is a statistical model?” In: The Annals of Statistics 30.5 (2002), pp. 1225–1310.

[149] D. Meeker. Finite Element Method Magnetics (FEMM4.2). http://www.femm.info/. 2017.

[150] K. Miettinen. Nonlinear Multiobjective Optimization. Kluwer Academic Publishers, 1999.

[151] P. K. Mogensen and A. N. Riseth. “Optim: A mathematical optimization package for Julia”. In: Journal of Open Source Software 3.24 (2018), p. 615.

[152] N. Mohan, T. M. Undeland, and W. P. Robbins. Power Electronics: Converters, Applications, and Design. 3rd ed. Wiley, 2002.

[153] P. Monk. Finite Element Methods for Maxwell’s Equations. 1st ed. Oxford University Press, 2003.

[154] J. Mühlethaler. Modeling and multi-objective optimization of inductive power components. ETH Zürich, PhD thesis. 2012.

[155] K. P. Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.

[156] R. D. Neidinger. “Introduction to Automatic Differentiation and MATLAB Object-Oriented Programming”. In: SIAM Review 52.3 (2010), pp. 545–563.

[157] A. Neumaier. “Complete search in continuous global optimization and constraint satisfaction”. In: Acta Numerica 13 (2004), pp. 271–369.

[158] J. Nocedal and S. J. Wright. Numerical Optimization. 2nd ed. Springer, 2006.

[159] W. L. Oberkampf and C. J. Roy. Verification and Validation in Scientific Computing. 1st ed. Cambridge University Press, 2010.

[160] J. T. Oden and S. Prudhomme. “Estimation of Modeling Error in Computational Mechanics”. In: Journal of Computational Physics 182.2 (2002), pp. 496–515.

[161] J. Oliver. “Rounding error propagation in polynomial evaluation schemes”. In: Journal of Computational and Applied Mathematics 5.2 (1979), pp. 85–97.

[162] S. Patel. LaTeX Templates: Masters/Doctoral Thesis. https://www.latextemplates.com/template/masters-doctoral-thesis. [Online; accessed 12-February-2020]. 2020.

[163] E. Patterson. Catlab – A framework for applied category theory in the Julia language. https://github.com/AlgebraicJulia/Catlab.jl. [Online; accessed 12-February-2020]. 2020.

[164] C. R. Paul. Introduction to Electromagnetic Compatibility. 2nd ed. Wiley, 2006.

[165] B. Peherstorfer, K. Willcox, and M. Gunzburger. “Optimal model management for multifidelity Monte Carlo estimation”. In: SIAM Journal on Scientific Computing 38.5 (2016), pp. 3163–3194.

[166] B. Peherstorfer, K. Willcox, and M. Gunzburger. “Survey of Multifidelity Methods in Uncertainty Propagation, Inference, and Optimization”. In: SIAM Review 60.3 (2018), pp. 550–591.


[167] B. C. Pierce. Basic Category Theory for Computer Scientists. MIT Press, 1991.

[168] S. Posur. Constructive Category Theory and Applications to Equivariant Sheaves. Universität Siegen, PhD thesis. 2017.

[169] M. J. D. Powell. Approximation Theory and Methods. 1st ed. Cambridge University Press, 1981.

[170] M. J. D. Powell. “Direct search algorithms for optimization calculations”. In: Acta Numerica 7 (1998), pp. 287–336.

[171] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes - The Art of Scientific Computing. 3rd ed. Cambridge University Press, 2007.

[172] C. Psarras, H. Barthels, and P. Bientinesi. The Linear Algebra Mapping Problem. arXiv:1911.09421v1. 2019.

[173] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.

[174] P. Raumonen. Mathematical structures for dimensional reduction and equivalence classification of electromagnetic boundary value problems. Tampere University of Technology, PhD thesis. 2009.

[175] J. Revels, M. Lubin, and T. Papamarkou. Forward-Mode Automatic Differentiation in Julia. arXiv:1607.07892. 2016.

[176] J. A. Richardson and J. L. Kuester. “The complex method for constrained optimization”. In: Commun. ACM 16.8 (1973), pp. 487–489.

[177] E. Riehl. Category Theory in Context. Dover, 2016.

[178] U. Römer. Numerical Approximation of the Magnetoquasistatic Model with Uncertainties and its Application to Magnet Design. Technische Universität Darmstadt, PhD thesis. 2015.

[179] A. A. Rodríguez and A. Valli. Eddy Current Approximation of Maxwell Equations – Theory, algorithms and applications. 1st ed. Springer, 2010.

[180] S. Roman. An Introduction to the Language of Category Theory. Birkhäuser, 2017.

[181] G. Roussos and B. J. C. Baxter. “Rapid evaluation of radial basis functions”. In: Journal of Computational and Applied Mathematics 180.1 (2005), pp. 51–70.

[182] C. J. Roy and W. L. Oberkampf. “A comprehensive framework for verification, validation, and uncertainty quantification in scientific computing”. In: Computer Methods in Applied Mechanics and Engineering 200.25–28 (2011), pp. 2131–2144.

[183] W. Rudin. Principles of Mathematical Analysis. 3rd ed. McGraw-Hill, 1976.

[184] J. Sacks, W. J. Welch, T. J. Mitchell, and H. P. Wynn. “Design and Analysis of Computer Experiments”. In: Statistical Science 4.4 (1989), pp. 409–435.

[185] B. S. Saini, M. Lopez-Ibanez, and K. Miettinen. “Automatic surrogate modelling technique selection based on features of optimization problems”. In: 2019 The Genetic and Evolutionary Computation Conference (GECCO). Prague, Czech Republic, July 13–17, 2019, pp. 1765–1772.

[186] D. P. Sanders. Rigorous global optimisation package for Julia. https://github.com/JuliaIntervals/IntervalOptimisation.jl. [Online; accessed 12-February-2020]. 2020.


[187] R. Schaback and H. Wendland. “Kernel techniques: From machine learning

to meshless methods”. In: Acta Numerica 15 (2006), pp. 543–639.

[188] M. Scheuerer, R. Schaback, and M. Schlather. “Interpolation of spatial data

– A stochastic or a deterministic problem?” In: European Journal of Applied

Mathematics 24.4 (2013), pp. 601–629.

[189] W. H. Schilders, H. A. van der Vorst, and J. Rommes (editors). Model Order

Reduction: Theory, Research Aspects and Applications. 1st ed. Springer, 2008.

[190] R. B. Schnabel. “Parallel Nonlinear Optimization: Limitations, Challenges, and Opportunities”. In: Algorithms for Continuous Optimization: The State of the Art. Ed. by E. Spedicato. Vol. 434. NATO ASI Series (Series C: Mathematical and Physical Sciences). Springer, 1994, pp. 531–559.

[191] S. Y. Serovajsky. “Differentiation Functor and Its Application in the Optimization Control Theory”. In: Fourier Analysis. Ed. by Michael Ruzhansky and Ville Turunen. Springer International Publishing, 2014, pp. 335–347.

[192] S. Shalev-Shwartz and S. Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.

[193] C. H. da Silva Santos, M. S. Gonçalves, and H. E. Hernandez-Figueroa. “Designing Novel Photonic Devices by Bio-Inspired Computing”. In: IEEE Photonics Technology Letters 22.15 (2010), pp. 1177–1179.

[194] J. Søndergaard. Optimization using surrogate models – by the space mapping technique. Technical University of Denmark, PhD thesis. 2003.

[195] J. C. Spall. Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control. 1st ed. Wiley, 2003.

[196] P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search. 2nd ed. MIT Press, 2001.

[197] D. I. Spivak. “Categories as Mathematical Models”. In: Categories for the Working Philosopher. Ed. by Elaine Landry. Oxford University Press, 2017, pp. 381–401.

[198] T. Steinmetz, S. Kurz, and M. Clemens. “Domains of validity of quasistatic and quasistationary field approximations”. In: COMPEL 30.4 (2011), pp. 1237–1247.

[199] A. Stenvall and V. Lahtinen. “The Methodology of HTS AC-Loss Modeling”. In: IEEE Transactions on Applied Superconductivity 29.5 (2019), pp. 1–7.

[200] G. Strang. Computational Science and Engineering. 1st ed. Wellesley-Cambridge Press, 2007.

[201] T. J. Sullivan. Introduction to Uncertainty Quantification. 1st ed. Springer, 2015.

[202] S. Surjanovic and D. Bingham. Virtual Library of Simulation Experiments: Test Functions and Datasets. http://www.sfu.ca/~ssurjano. [Online; accessed 12-February-2020]. 2020.

[203] K. Svanberg. “A class of globally convergent optimization methods based on conservative convex separable approximations”. In: SIAM J. Optim. 12.2 (2002), pp. 555–573.

[204] W. P. Thurston. “On proof and progress in mathematics”. In: Bulletin of the American Mathematical Society 30.2 (1994), pp. 161–177.

[205] E. Tonti. The Mathematical Structure of Classical and Relativistic Physics: A General Classification Diagram. Springer, 2013.


[206] A. Townsend and L. N. Trefethen. “An Extension of Chebfun to Two Dimensions”. In: SIAM Journal on Scientific Computing 35.6 (2013), pp. C495–C518.

[207] J. F. Traub and A. G. Werschulz. Complexity and Information. 1st ed. Cambridge University Press, 1998.

[208] L. N. Trefethen. Approximation Theory and Approximation Practice. 1st ed. SIAM, 2013.

[209] L. N. Trefethen. “Six myths of polynomial interpolation and quadrature”. In: Mathematics Today 47.4 (2012), pp. 184–188.

[210] F. Tröltzsch. Optimal Control of Partial Differential Equations – Theory, Methods and Applications. Graduate Studies in Mathematics, Vol. 112. American Mathematical Society, 2010.

[211] F. Tröltzsch and A. Valli. “Optimal control of low-frequency electromagnetic fields in multiply connected conductors”. In: Optimization 65.9 (2016), pp. 1651–1673.

[212] R. Trobec and G. Kosec. Parallel Scientific Computing: Theory, Algorithms, and Applications of Mesh Based and Meshless Methods. Springer, 2015.

[213] W. Tucker. Validated Numerics: A Short Introduction to Rigorous Computations. Princeton University Press, 2011.

[214] The Univalent Foundations Program. Homotopy Type Theory: Univalent Foundations of Mathematics. Institute for Advanced Study, 2013. https://homotopytypetheory.org/book.

[215] M. Urquhart. Julia package for the creation of optimised Latin Hypercube Sampling Plans. https://github.com/MrUrq/LatinHypercubeSampling.jl. [Online; accessed 12-February-2020]. 2020.

[216] M. Urquhart, E. Ljungskog, and S. Sebben. “Surrogate-based optimisation using adaptively scaled radial basis functions”. In: Applied Soft Computing 88 (2020), pp. 1–17.

[217] S. Voß. “Meta-heuristics: The State of the Art”. In: Local Search for Planning and Scheduling (LSPS 2000). Ed. by A. Nareyek. Springer, 2001.

[218] P. Šolin. Partial Differential Equations and the Finite Element Method. 1st ed. John Wiley & Sons, 2006.

[219] R. F. C. Walters. Categories and Computer Science. Cambridge Computer Science Texts. Cambridge University Press, 1991.

[220] Q. Wang, X. Zhang, R. Burgos, D. Boroyevich, A. White, and M. Kheraluwala. “Design and optimization of a high performance isolated three phase AC/DC converter”. In: 2016 IEEE Energy Conversion Congress and Exposition (ECCE). Milwaukee, U.S.A., September 18–22, 2016, pp. 1–10.

[221] R. Webster and M. A. Oliver. Geostatistics for Environmental Scientists. 2nd ed. John Wiley & Sons, 2007.

[222] B. Wen et al. “Integrated Design by Optimization of Electrical Power Systems for More Electric Aircraft”. In: 2015 More Electric Aircraft (MEA). Toulouse, France, February 3–5, 2015, pp. 1–4.

[223] T. Wittig. Zur Reduzierung der Modellordnung in elektromagnetischen Feldsimulationen. Technische Universität Darmstadt, PhD thesis. 2004.

[224] D. H. Wolpert and W. G. Macready. “No Free Lunch Theorems for Optimization”. In: IEEE Transactions on Evolutionary Computation 1.1 (1997), pp. 67–82.


[225] N. S. Yanofsky. “Towards a Definition of an Algorithm”. In: Journal of Logic and Computation 21.2 (2010), pp. 253–286.

[226] K. Yosida. Functional Analysis. 6th ed. Springer, 1995.

[227] S. Zaglmayr. High Order Finite Element Methods for Electromagnetic Field Computation. Johannes Kepler Universität Linz, PhD thesis. 2006.

[228] Z.-H. Zhan, J. Zhang, Y. Li, and H. S. Chung. “Adaptive Particle Swarm Optimization”. In: IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 39.6 (2009), pp. 1362–1381.