Surrogate Optimization with
Algebraic Notes and Applications within
the Electromagnetics Context
submitted by
M.Sc.
Mirsad Hadžiefendić
to Faculty IV - Electrical Engineering and Computer Science
of Technische Universität Berlin
for the award of the academic degree
Doktor der Ingenieurwissenschaften
- Dr.-Ing. -
approved dissertation
Doctoral committee:
Chair: Prof. Dr. Jürgen Bruns (TU Berlin, Faculty IV)
Reviewer: Prof. Dr. Rolf Schuhmann (TU Berlin, Faculty IV)
Reviewer: Prof. Dr. Fredi Tröltzsch (TU Berlin, Faculty II)
Reviewer: Jun.-Prof. Dr. Ulrich Römer (TU Braunschweig)
Date of the scientific defense: 14 July 2021
Berlin 2022
Declaration of Authorship
I, Mirsad Hadžiefendić, M.Sc., declare that this thesis titled, “Surrogate Optimization with Algebraic Notes and Applications within the Electromagnetics Context”
and the work presented in it are my own. I confirm that:
• This work was done wholly or mainly while in candidature for a research de-
gree at this University.
• Where any part of this thesis has previously been submitted for a degree or
any other qualification at this University or any other institution, this has been
clearly stated.
• Where I have consulted the published work of others, this is always clearly
attributed.
• Where I have quoted from the work of others, the source is always given. With
the exception of such quotations, this thesis is entirely my own work.
• I have acknowledged all main sources of help.
• Where the thesis is based on work done by myself jointly with others, I have
made clear exactly what was done by others and what I have contributed my-
self.
Signed:
Date:
“Ima jedna modra rijeka – valja nama preko rijeke.”
Mehmedalija Mak Dizdar
TECHNISCHE UNIVERSITÄT BERLIN
Abstract
Fakultät Elektrotechnik und Informatik
Institut für Hochfrequenz- und Halbleiter-Systemtechnologien
Fachgebiet Theoretische Elektrotechnik
Doktor der Ingenieurwissenschaften (Dr.-Ing.)
Surrogate Optimization with
Algebraic Notes and Applications within
the Electromagnetics Context
by Mirsad Hadžiefendić, M.Sc.
This thesis deals with surrogate optimization for applications within the electro-
magnetics context. Regarding the electromagnetics context, in particular, the mag-
netoquasistatic model of Maxwell’s theory is discussed. Moreover, relevant points
regarding the magnetoquasistatic model’s numerical simulation and numerical op-
timization are examined.
The key notion of surrogate optimization is elaborated thoroughly and partitioned
into three sub-notions: (1) surrogate modeling & simulation, (2) surrogate-based
optimization, and (3) surrogate-guided optimization. The various notions of
surrogate optimization are tagged with algebraic notes in order to anticipate the
toolset of the formal language of category theory. Moreover, the capability of the
category theory toolset as an algebraic modeling framework for applications in sur-
rogate optimization is investigated.
Finally, representatives of the class of inductive components are invoked and the
surrogate optimization tools of the present work are applied to four high-fidelity op-
timization problems that are embedded within the setting of a two-dimensional lin-
ear boundary value problem and a three-dimensional linear boundary value prob-
lem, respectively. Concerning these optimization problems, some promising spots
for a useful application of the category theory toolset are illuminated.
From the bird’s-eye view, this thesis achieves some progress in the scientific
thicket of full automation of the virtual prototyping of power electronic systems.
From the frog’s-eye view, i.e., at a more technical level, some of the present
work’s achievements deal with hybrid model management strategies of surrogate-
guided optimization methods, the repercussions of the choice of a sampling plan
on these methods, and formalization issues regarding surrogate optimization with
multiple low-fidelity models.
TECHNISCHE UNIVERSITÄT BERLIN
Kurzzusammenfassung
Fakultät Elektrotechnik und Informatik
Institut für Hochfrequenz- und Halbleiter-Systemtechnologien
Fachgebiet Theoretische Elektrotechnik
Doktor der Ingenieurwissenschaften (Dr.-Ing.)
Surrogate Optimization with
Algebraic Notes and Applications within
the Electromagnetics Context
by Mirsad Hadžiefendić, M.Sc.
This thesis deals with optimization with surrogate models for applications within electromagnetic field theory. In the context of electromagnetic field theory, the thesis addresses in particular the magnetoquasistatic model of Maxwell’s equations. In addition, relevant points regarding the numerical simulation and optimization of the magnetoquasistatic model are discussed.
The key notion “surrogate optimization” is elaborated in detail and, within this thesis, divided into three sub-notions: (1) “surrogate modeling & simulation”, (2) “surrogate-based optimization”, and (3) “surrogate-guided optimization”. Note that the various notions of surrogate optimization are furnished with algebraic notes in order to anticipate the mathematical toolset of the formal language of category theory. In particular, the possibilities of using the category theory toolset as an algebraic modeling environment for applications in surrogate optimization are investigated.
Finally, representatives of the class of inductive components are presented, and the surrogate optimization tools introduced in this thesis are applied to four high-fidelity optimization problems. These optimization problems are embedded in the setting of two-dimensional and three-dimensional linear boundary value problems. With regard to these optimization problems, some promising spots for a useful application of the category theory toolset are illuminated.
From the bird’s-eye view, this dissertation achieves some progress in the scientific thicket of the full automation of the virtual prototyping of power electronic systems.
From the frog’s-eye view, i.e., at a more technical level, some of the achievements of this thesis deal with hybrid model management strategies of surrogate-guided optimization methods, with the repercussions of the choice of the sampling plan on these methods, and with formalization issues regarding surrogate optimization in the case of multiple low-fidelity models.
Acknowledgements
I sincerely thank Prof. Dr. Rolf Schuhmann for his mentorship, advice, and open-mindedness. I also thank all my colleagues in the group Theoretische Elektrotechnik for a pleasant and inspirational working environment. Special thanks go to Marcus Christian Lehmann, Albert Piwonski, and Rodrigo Silva Rezende, with whom I have shared many pursuits of knowledge.
I thank Prof. Dr. Fredi Tröltzsch and Jun.-Prof. Dr. Ulrich Römer for the time and energy they devoted to providing the second and third reviews of the present work, respectively. And I would like to thank Prof. Dr. Jürgen Bruns for taking over the chairmanship of the doctoral committee.
Finally, I thank all my friends and my whole family for their support. My greatest
and warmest thanks are due to my parents, Mensur and Abasa, and my brothers,
Admir and Emir.
Contents
Declaration of Authorship
Abstract
Kurzzusammenfassung
Acknowledgements
1 Introduction
1.1 A bigger picture: The ideal long-term goal
1.2 Glimpse at the details: Surrogate optimization
1.3 Setting a horizon: The research scope & goals
2 Magnetoquasistatic Maxwell’s theory – Modeling, simulation, and optimization
2.1 Magnetoquasistatic Model of Maxwell’s theory
2.1.1 The fundamental problem statement of electromagnetism
2.1.2 The system of Maxwell’s equations
2.1.3 The magnetoquasistatic subsystem & the magnetostatic subsystem
2.2 Numerical simulation of the magnetoquasistatic model
2.2.1 The weak formulation
2.2.2 Numerical approximation
2.2.3 Parametric mathematical model
2.3 Numerical optimization with the magnetoquasistatic model
2.3.1 Optimization with a partial differential equation
2.3.2 Nonlinear optimization problem
2.3.3 Optimization algorithms
2.4 In closing
3 Surrogate optimization
3.1 Surrogate modeling & simulation
3.1.1 An abstract setting
3.1.2 Deterministic and probabilistic data-fit low-fidelity models
3.1.3 Simplified-physics low-fidelity models
3.2 Surrogate-based optimization
3.2.1 Optimization with test functions by data-fit low-fidelity models
3.2.2 Optimization with test functions by emulated simplified-physics low-fidelity models
3.3 Surrogate-guided optimization
3.3.1 Sequential kriging optimization
3.3.2 Optimization within the space-mapping paradigm
3.3.3 Co-kriging optimization
3.4 In closing
4 An algebraic modeling framework using the category theoretical language for applications in surrogate optimization
4.1 Recapitulating and enlarging the contextual landscape
4.1.1 Recapitulating & the context of full automation of SGO
4.1.2 Relevant related work
4.2 Category theory toolset
4.2.1 Fostering some intuition
4.2.2 Applying some rigor
4.2.3 Computational facets
4.3 Using the CT toolset for SGO methods
4.3.1 Specifying a general optimization problem
4.3.2 Specifying surrogate-guided optimization methods
4.4 Use cases of the CT toolset within the electromagnetics context
4.4.1 Use case #1: Simplified-physics low-fidelity models
4.4.2 Use case #2: Coordinate transformations
4.5 Future use cases for the CT toolset
4.6 In closing
5 Surrogate optimization with the magnetoquasistatic model
5.1 Solenoid with a core
5.1.1 Preliminary consideration
5.1.2 Optimization problem I
5.1.3 Optimization problem II
5.2 Common-Mode Choke
5.2.1 Preliminary consideration
5.2.2 Optimization problem I
5.2.3 Optimization problem II
5.3 In closing
6 Conclusion and outlook
6.1 Conclusion
6.2 Outlook
A Multivariate polynomials (§ 3.1.2)
A.1 Reparametrization using mean-centered arguments
A.2 Bernstein polynomials
A.3 Chebyshev polynomials
B Solenoid with a core (§ 5.1)
B.1 An electrical network viewpoint
B.2 A visualization of evaluated data-fit low-fidelity models regarding (5.12), (5.14c), and (5.15)
Bibliography
List of Figures
1.1 An inductive component in various representations.
1.2 A schematic depiction of a low-fidelity model.
1.3 Illustrating two generic devices under test.
1.4 A schematic orientation aid for the present work.
2.1 Illustrations of a magnetoquasistatic subsystem’s domain.
2.2 Representations of the six test functions in Table 2.1.
2.3 Representations of the six test functions in Table 2.1 (highlighting the neighborhood of the global minimum).
2.4 Depicting $\operatorname{grad}(f)(x_1, x_2)$ with $(x_1, x_2) \in U$ as a projection on the contour representation of the six test functions in Table 2.1.
2.5 Depicting $s_i(f, (x_1, x_2))$ from (2.49) with $i \in \{1, 2\}$ and $(x_1, x_2) \in U$ w.r.t. the test functions in Table 2.1.
3.1 Representations of different sampling plans (part one).
3.2 Representations of different sampling plans (part two).
3.3 The Audze-Eglais LHC sampling plan in Figure 3.2 adapted for the contour representation in Figure 2.2b.
3.4 Sampling plan $X_s$ (condition number $\kappa(B^T B)$).
3.5 Basis under consideration for the space $P^1_{\le 2}$: (i) monomial basis, (ii) Bernstein basis, (iii) Chebyshev basis.
3.6 The six radial basis functions in Table 3.2.
3.7 A schematic depiction of a user-prescribed hierarchy of generic boundary value problems.
3.8 A schematic depiction of a user-prescribed hierarchy of magnetoquasistatic and magnetostatic problems.
3.9 Using the Sobol quasi-random sequence sampling plan in Figure 3.2, a 2-variate monomial polynomial model $p(x) \in P^2_{\le 2}$ via regression of the test functions in Figure 2.2.
3.10 Using the Sobol quasi-random sequence sampling plan in Figure 3.2, a radial basis function $\varphi = r \mapsto \varphi(r)$ with thin plate spline assignment via interpolation of the test functions in Figure 2.2.
3.11 Using the Sobol quasi-random sequence sampling plan in Figure 3.2, a kriging low-fidelity model via interpolation of the test functions in Figure 2.2.
3.12 The value of $e^{N}_{cv}$ and $e^{NR}_{cv}$ evaluated at the number of folds $k$ and the number of testing points $m_g$ w.r.t. the test functions in Figure 2.2 with normalized values.
3.13 The value of $e^{N}_{cv}$ evaluated at the number of folds $k$ and the number of testing points $m_g$ w.r.t. the test functions in Figure 2.2 with normalized values.
3.14 The value of $r^2_{\hat{y}\tilde{\hat{y}},cv}$ evaluated at the number of folds $k$ and the number of testing points $m_g$ w.r.t. the test functions in Figure 2.2.
3.15 The value of $S^{N}_{\tilde{\hat{y}},i}$ with $i \in \{1, 2\}$ evaluated at $f$ w.r.t. the test functions in Figure 2.2 and the number of training points $m_t$.
3.16 Emulated simplified-physics low-fidelity models for the modified Branin test function in Figure 2.2 via the assignment rule in (3.105).
3.17 Depicting $\operatorname{grad}(\tilde{K})(x_1, x_2)$ as a projection on the contour representation of the emulated simplified-physics low-fidelity models for the modified Branin test function in Figure 3.16.
4.1 A schematic depiction of a three-dimensional helical coil of 5 turns in $xyz$-coordinates and in $uvw$-coordinates.
5.1 A representation of a solenoid with core within FEMM4.2.
5.2 Contour representation of $\tilde{\hat{j}}_{P_L,\omega_i}$ and $\tilde{\hat{Q}}_{L,\omega_i}$ in (5.31) for various frequencies.
5.3 A representation of a common-mode choke within FEMM4.2.
5.4 A prototypical version of a simplistic EMC filter.
5.5 A representation of two common-mode chokes within FEMM4.2.
5.6 Contour representation of $\tilde{\hat{Q}}_{L,\omega_0}$ in (5.47).
A.1 Instances of a Chebyshev grid as a sampling plan $X_s \subset (X^2)^{m_1 \times m_2}$.
B.1 Circuit diagram representation of the three fundamental passive electrical components.
B.2 Two representatives from the equivalence class of circuit diagrams for real inductive components.
B.3 The magnitude and the phase of the impedances associated with the representatives in Figure B.2.
B.4 By using the Sobol quasi-random sequence sampling plan with $m := 21$ and the data-fit low-fidelity models in § 3.2.1; representations of $\tilde{\hat{j}}_{P_L,\omega_0}(x_1, x_2)$ and $\tilde{V}_{ut}(x_1, x_2)$ where $\omega_0 := 2\pi \cdot 100\,\mathrm{kHz}$.
B.5 By using the Sobol quasi-random sequence sampling plan with $m := 21$ and the data-fit low-fidelity models in § 3.2.1; representations of $\tilde{\hat{j}}_{P_L,V_{ut},\omega_0}(x_1, x_2)$ and $\tilde{\hat{Q}}_{L,\omega_0}(x_1, x_2)$ where $\omega_0 := 2\pi \cdot 100\,\mathrm{kHz}$.
B.6 Depicting $\operatorname{grad}(\tilde{K})(x_1, x_2)$ as a projection on the contour representation of the data-fit low-fidelity models for the functions in Figure B.4.
B.7 Depicting $\operatorname{grad}(\tilde{K})(x_1, x_2)$ as a projection on the contour representation of the data-fit low-fidelity models for the functions in Figure B.5.
List of Tables
2.1 Test functions of form $f = (x_1, x_2) \mapsto f(x_1, x_2) : U \to \mathbb{R}$.
2.2 The normalized global first-order sensitivity measure $S^{N}_{i}$ with $i \in \{1, 2\}$ evaluated at $f$ w.r.t. Figure 2.2b.
2.3 The normalized global first-order sensitivity measure $S^{N}_{i}$ with $i \in \{1, 2\}$ evaluated at $f$ w.r.t. Figure 2.3b.
2.4 The normalized global first-order sensitivity measure $S^{N}_{i}$ evaluated at $f_{R_{N_\xi}}$ w.r.t. the domain $[-2.0, 2.0]^{N_\xi}$ with $N_\xi \in \{2, 3, 4, 5, 6, 7\}$.
2.5 Check exemplarily the Ackley function’s global optimum.
3.1 Given some pairs $(k, d) \in \mathbb{N}^2$, the dimension of $P^{d}_{\le k}$ and $P^{d}_{k}$.
3.2 Given a generic radial basis function $\varphi = r \mapsto \varphi(r)$ with function signature $\mathbb{R}_+ \to \mathbb{R}$, six different definitions for the assignment $\varphi(r)$.
3.3 The normalized mean generalization error and the mean SSPCC within the k-fold cross validation method w.r.t. a 2-variate monomial polynomial.
3.4 The normalized mean generalization error and the mean SSPCC within the k-fold cross validation method w.r.t. a radial basis function with thin plate spline assignment.
3.5 The normalized mean generalization error and the mean SSPCC within the k-fold cross validation method w.r.t. a kriging low-fidelity model.
3.6 The normalized global first-order sensitivity measure $S^{N}_{\tilde{\hat{y}},i}$ with $i \in \{1, 2\}$ evaluated at $f$ w.r.t. the 2-variate monomial polynomial in Figure 3.9b.
3.7 The normalized global first-order sensitivity measure $S^{N}_{\tilde{\hat{y}},i}$ with $i \in \{1, 2\}$ evaluated at $f$ w.r.t. the radial basis function with thin plate spline assignment in Figure 3.10b.
3.8 The normalized global first-order sensitivity measure $S^{N}_{\tilde{\hat{y}},i}$ with $i \in \{1, 2\}$ evaluated at $f$ w.r.t. the kriging low-fidelity model in Figure 3.11b.
3.9 The normalized mean generalization error and the mean SSPCC within the k-fold cross validation method w.r.t. a kriging low-fidelity model of a generalized version of the Rosenbrock test function in Table 2.1 (without normalized values).
3.10 The normalized global first-order sensitivity measure $S^{N}_{i}$ evaluated at $f_{R_{N_\xi}}$ from (2.52) w.r.t. the domain $[-2.0, 2.0]^{N_\xi}$ with $N_\xi \in \{2, 3, 4, 5, 6, 7\}$.
3.11 The low-fidelity models’ normalized global first-order sensitivity measures (LFSM) error in (3.37) evaluated at $f_{R_{N_\xi}}$ from (2.52) w.r.t. the domain $[-2.0, 2.0]^{N_\xi}$ with $N_\xi \in \{2, 3, 4, 5, 6, 7\}$.
3.12 Surrogate-based optimization w.r.t. the modified Branin function using data-fit low-fidelity models.
3.13 The choice of the 4-tuple of parameters $(\alpha, \beta, \gamma, \delta)$ in Figure 3.16.
3.14 The normalized mean generalization error and the mean SSPCC within the k-fold cross validation method w.r.t. emulated simplified-physics low-fidelity models in Figure 3.16a.
3.15 The normalized mean generalization error and the mean SSPCC within the k-fold cross validation method w.r.t. emulated simplified-physics low-fidelity models in Figure 3.16b.
5.1 The rough heuristic estimate of the scale of the winding losses in (5.13) for various operating frequencies $f_0$.
5.2 (Ia) $S^{N}_{i}$ with $i \in \{1, 2\}$ evaluated at (a) $f \equiv \tilde{\hat{j}}_{P_L,\omega_0}$ and (b) $f \equiv \tilde{V}_{ut}$ w.r.t. Figure B.6; (Ib) given $m_j := 50$, LFSM error $e_{m_j}(S^{N}_{\tilde{\hat{y}},i})$ w.r.t. (Ia); (IIa) $S^{N}_{i}$ with $i \in \{1, 2\}$ evaluated at (a) $f \equiv \tilde{\hat{j}}_{P_L,V_{ut},\omega_0}$ and (b) $f \equiv \tilde{\hat{Q}}_{L,\omega_0}$ w.r.t. Figure B.7; (IIb) given $m_j := 50$, LFSM error $e_{m_j}(S^{N}_{\tilde{\hat{y}},i})$ w.r.t. (IIa).
5.3 The mean SSPCC $r^2_{\hat{y}\tilde{\hat{y}}}\big|_{k := 5}$ within the k-fold cross validation method w.r.t. the simplified-physics low-fidelity model regarding the entities in (5.17).
5.4 The log data in an abridged version w.r.t. the setting of the log data in (5.19) for different operating frequencies.
5.5 The log data in an abridged version w.r.t. the setting of the log data in (5.19) for the operating frequency $1 \times 10^{8}\,\mathrm{Hz}$ and different initial points.
A.1 The condition number w.r.t. a sampling plan from Figure 3.4 without and with reparametrization in (A.2).
A.2 The trace of $\frac{1}{m}(B^T B)$ and the trace of $\frac{1}{m}(B_\varsigma^T B_\varsigma)$.
A.3 The condition number $\kappa(B^T B + \lambda I)$ and $\kappa(B_\varsigma^T B_\varsigma + \lambda I)$.
List of Abbreviations
1D One-dimensional
2D Two-dimensional
3D Three-dimensional
ADE Adaptive Differential Evolution
AMMO Approximation and Model Management Optimization
BVP Boundary Value Problem
CCC Cartesian Closed Category
CM Common-Mode
CMC Common-Mode Choke
COBYLA Constrained Optimization by Linear Approximation
CT Category Theoretical/Category Theory
DIRECT Dividing Rectangles
DM Differential-Mode
DoFF Degree of Forgetfulness
EMC Electromagnetic Compatibility
FE Finite Element
GA Genetic Algorithm
GPU Graphics Processing Unit
KKT Karush-Kuhn-Tucker
L-BFGS Limited-memory Broyden-Fletcher-Goldfarb-Shanno
LBVP Linear Boundary Value Problem
LFSM Low-Fidelity Models’ Normalized Global First-Order Sensitivity Measures
LHC Latin Hypercube
LU Lower-Upper
MEA Modified Evolutionary Algorithm
MLE Maximum Likelihood Estimate
MM Manifold Mapping
MMA Method of Moving Asymptotes
MS Moore-Skelboe
NEGE Normalized Empirical Generalization Error
NLBVP Non-Linear Boundary Value Problem
NMS Nelder-Mead Simplex
NREGE Normalized Root Empirical Generalization Error
PDE Partial Differential Equation
PL Programming Language
PS Particle Swarm
RPM Response and Parameter Mapping
S- Scattering
SGO Surrogate-guided Optimization
SM Space Mapping
SQP Sequential Quadratic Programming
SSPCC Squared Sample Pearson Correlation Coefficient
SVD Singular Value Decomposition
TPS RBF Thin Plate Spline Radial Basis Function
TRASM Trust Region Aggressive Space Mapping
UMP Universal Mapping Property
WPBVP Well-Posed Boundary Value Problem
Physical & Mathematical Constants
speed of light in vacuum    $c_0 = 2.99792458 \times 10^{8}\,\mathrm{m\,s^{-1}}$
vacuum magnetic permeability    $\mu_0 = 4\pi \times 10^{-7}\,\mathrm{H\,m^{-1}}$
vacuum electric permittivity    $\varepsilon_0 = 1/(\mu_0 c_0^2)$
pi    $\pi = 3.1415926535897\ldots$
List of Symbols (Selection)
$J$    electric current flux density
$\mu$    magnetic permeability
$\sigma$    electric conductivity
$\omega$    angular frequency
$A$    magnetic vector potential
$\varphi$    electric scalar potential
$I_0$    fixed current intensity
$P_L$    time-averaged ohmic loss
$W_m$    time-averaged magnetic energy
$R$    resistance
$L$    inductance
$f_0$    operating frequency
$\omega_0$    operating angular frequency
$V_{ut}$    volume under test
$\mathbb{R}$    set of real numbers
$x$    space variable
$\Omega$    space region
$\partial\Omega$    boundary of $\Omega$
$f : A \to B$    domain $A$ and codomain $B$ of function $f$ with the signature $A \to B$
$f = x \mapsto f(x)$    function $f$ with the assignment rule $x \mapsto f(x)$
div, grad, curl    differential operators: divergence, gradient, and curl
$\forall$    universal quantifier
$\exists$    existential quantifier
$V$    Hilbert space
$\|\cdot\|_V$    appropriate norm on $V$
$Q$    quantity of interest
$h$    mesh size parameter
$T_h$    simplicial triangulation
$u_h$    discrete solution
$N_\xi$    number of parameters
$\boldsymbol{\xi}$    parameter point
$f$    parametric solution function
$\hat{Q}_{\boldsymbol{\xi}}$    reduced parametric quantity of interest
$\hat{j}$    reduced objective function
$\circ$    composition operator
$s_i$    local first-order sensitivity measure w.r.t. component $i$
$S^{N}_{i}$    normalized global first-order sensitivity measure w.r.t. component $i$
$d$    dimensionality
$e_H(K)$    high-fidelity function approximation error
$s$    sample
$X_s$    sampling plan
$m$    sample size
$e_{H,s}(\hat{Q}_{\boldsymbol{\xi}})$    empirical surrogate modeling error
$e^{N}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})$    normalized empirical generalization error
$r^2_{\hat{y}\tilde{\hat{y}}}$    squared sample Pearson correlation coefficient
$P^{d}_{\le k}$    space of $d$-variate polynomials of total degree at most $k$
$N_m(y \mid \mu_y, \Sigma)$    probability density of an $m$-dimensional Gaussian distribution at $y$
$\Psi$    correlation matrix
$C$    covariance matrix
$L_{\ln}$    ln-likelihood function
$\Delta_{Ax=b}$    threshold for a termination criterion of an iterative solver
$\tilde{P}$    domain-oriented correction map
$\Delta^{(k+1)}$    $(k+1)$-th iteration trust-region radius
$x^*$    optimal solution of the high-fidelity optimization problem
$X$    object
$g$    morphism
$A$    category
$F$    functor
$\alpha$    natural transformation
$S_{ij}$    $(i,j)$-th scattering parameter
$N$    number of turns of a winding
$m_{SGO,sm}$    number of high-fidelity function evaluations (space-mapping)
$m_{SGO,ck}$    number of high-fidelity function evaluations (co-kriging)
$m_w$    number of operating frequencies
Dedicated to my parents, Mensur & Abasa
Chapter 1
Introduction
This chapter presents the background, the scope, and the research goals of the present dissertation:
• First, I discuss the bigger picture, more precisely, the ideal long-term goal, which originates in the engineering domain of power electronics and inspires the starting point and the direction of the research project.
• Second, I sketch the general path to which the dissertation contributes, that is, the development and application of surrogate modeling, simulation, and optimization methods, primarily in the magnetostatic and magnetoquasistatic regimes of electromagnetic field theory.
• Third, I conclude the chapter by providing the path-dependent research goals that guide the remainder of the work.¹
1.1 A bigger picture: The ideal long-term goal
Implicitly or explicitly, every research project has an ideal long-term goal that helps to establish the investigation’s concrete context and the actual research goals. The ideal long-term goal of this thesis is the full automation of the virtual prototyping of power electronic systems. Let me briefly elucidate this goal.
The domain of power electronics is concerned with the control and the conversion of electrical energy by means of fast-switching semiconductor components (see, e.g., [152], [89], [60]); two representatives of the broad class of power electronic systems are three-phase rectifiers and electromagnetic compatibility (EMC) filters (see Fig. 1.1).² Given this domain, my notion of “full automation” is that an ideal software system processes a user’s input specifications and outputs an appropriate power electronic system without any further user intervention. Finally, I understand the term “virtual prototyping” as a proxy for “mathematical modeling, numerical simulation & optimization”.
¹ Notice that the present dissertation’s typesetting builds upon a free and open-source LaTeX typesetting template provided by LaTeX Templates (see [162]).
² Mind that, for the purpose of drawing figures in the present dissertation, I use the free and open-source vector graphics editor Inkscape (see version 1.0.1 at https://inkscape.org/), the free vector graphics editor Ipe (see version 7.2.20 at http://ipe.otfried.org/), and the free and open-source 3D computer graphics software Blender (see version 2.91.0 at https://www.blender.org/).

FIGURE 1.1: An inductive component in various representations: (a) in a circuit diagram (figure from [154, p. 7]), (b) in a real-world EMC filter (source: Fraunhofer-Institut für Zuverlässigkeit und Mikrointegration IZM), (c) in the 3D simulation tool CST Studio Suite®³, and (d) in the 2D simulation tool FEMM4.2 (see [149]).

Note that full automation is still far away, but there has already been prolific research regarding virtual prototyping of power electronic systems (see, e.g., [222], [220], [41]). The corresponding real-world engineering optimization problems span different levels of design complexity: from the design of materials, through the design of components, to the design of systems. These problems involve intricate interactions between various physical domains such as electromagnetics, fluid dynamics, or structural mechanics; additionally, they involve several conflicting objectives such as performance, cost, or efficiency. Hence, formalizing these problems properly and solving them efficiently are challenging tasks.
To date, the reported optimization procedures predominantly use concepts from the area of multidisciplinary design optimization (see, e.g., [2], [147]) and from the area of multiobjective optimization (see, e.g., [150], [146]):
Multiobjective optimization. If multiple objective functions are taken into account, then multiobjective optimization (or vector optimization) expresses an optimal design by the notion of Pareto optimality, i.e., a design is Pareto-optimal if an improvement concerning one objective inevitably leads to a degradation concerning another objective. Common multiobjective optimization techniques include a transformation of the multiple objectives into a single objective, for instance, by the weighted sum method: First, the objectives are multiplied by weights (non-negative numbers that add up to one); next, the weighted objectives are summed up. An immediate complication is the need to select a specific combination of weights, which reflects a specific preference among the objectives.
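The weighted sum method can be illustrated with a minimal Python sketch. The two objectives `loss` and `volume`, the candidate grid, and the helper names `weighted_sum` and `best_design` are hypothetical stand-ins chosen for illustration only; they are not taken from the thesis.

```python
def weighted_sum(objectives, weights):
    """Combine several objective functions into one via a weighted sum."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must be non-negative and add up to one")

    def scalarized(x):
        return sum(w * f(x) for w, f in zip(weights, objectives))

    return scalarized


# Two hypothetical, conflicting objectives of a single design variable x.
def loss(x):
    return (x - 1.0) ** 2  # minimal at x = 1


def volume(x):
    return (x + 1.0) ** 2  # minimal at x = -1


def best_design(weights, candidates):
    """Pick the candidate that minimizes the scalarized objective."""
    objective = weighted_sum([loss, volume], weights)
    return min(candidates, key=objective)


# Candidate designs on a uniform grid over [-2, 2] (assumed design space).
candidates = [-2.0 + 0.01 * i for i in range(401)]

x_balanced = best_design([0.5, 0.5], candidates)
x_loss_heavy = best_design([0.9, 0.1], candidates)
```

For these two quadratic objectives and weights (w, 1 - w), the scalarized minimizer is x = 2w - 1, so shifting weight toward `loss` moves the selected design toward x = 1. This makes the complication named above tangible: each weight combination encodes one particular preference and yields one particular design.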
³ CST Studio Suite® is a proprietary commercial 3D electromagnetic analysis software package by Dassault Systèmes (see version 2019 at https://www.3ds.com/).
Multidisciplinary design optimization. Optimal input variables attained by an optimization using a single physical discipline rarely equal the optimal input variables attained by an optimization using multiple physical disciplines, especially if there are interdependencies between the different disciplines. In view of this observation, multidisciplinary design optimization offers a framework to keep track of the input variables and all the involved output variables. However, one important issue is how to establish compatibility regarding the variables; another important issue is how to choose an adequate architecture, i.e., how to coordinate the analysis of the multiple interdependent physical disciplines. These issues influence the selection of a solution method and the reasoning about the optimal design.
Regarding virtual prototyping of power electronic systems, a noteworthy feature of the deployed procedures is that they use different computational and noncomputational models of varying degrees of fidelity, e.g., finite element simulations, closed-form expressions, and physical experiments. The areas of surrogate optimization (see, e.g., [70]) and multifidelity optimization (see, e.g., [166]) are dedicated to exploiting such different models for optimization purposes.
To the best of my knowledge, concepts from surrogate optimization have not yet been exhaustively discussed in the context of virtual prototyping of power electronic systems. However, as elaborated above, the complexity of designing real-world power electronic systems is tremendous. Therefore, the applications of the present work focus on particular optimization problems concerning inductive components (see Fig. 1.1). Inductive components are significant devices under test since they contribute heavily to the losses of a power electronic system and occupy a lot of space within it (cf. [154, p. 2]).
All in all, the surrogate optimization of inductive components constitutes the starting point and the direction of the research project.
1.2 Glimpse at the details: Surrogate optimization
Surrogate optimization for engineering design problems is a vast research area that
spans several decades of intensive investigations (see, e.g., surveys in [125], [127],
and [126]). At a conceptual level, the notion of surrogate optimization encompasses
three sub-notions:
• surrogate modeling & simulation,
• surrogate-based optimization, and
• surrogate-guided optimization.⁴
⁴ Note that I employ the affix “guided” in the term “surrogate-guided”. This term is
by no means part of the usual terminology, in which “surrogate-guided” would be a
synonym for “surrogate-based”. However, I consider the terminological supplement a
helpful tool to enable better conceptual differentiation of the corresponding
mechanisms.
Surrogate modeling & simulation. The basic assumption is that the evaluation
of a given function – aka high-fidelity function or model – is too expensive; hence,
there is a need to approximate this function in a meaningful manner by another
function – aka low-fidelity function or model – whose evaluation costs are, by design,
much lower than those of the high-fidelity model. An example of a high-fidelity
model is the Joule loss functional computed by a high-order finite element simulation
(see, e.g., [227]). An example of a low-fidelity model is a fit to data collected by
sampling the high-fidelity model.
Immediate issues concern the error bounds or error estimates that are needed to
assess the quality of the low-fidelity model.
Concerning this sub-notion, the surrogate model is identical to the low-fidelity
model; and surrogate simulation means evaluating the low-fidelity model.
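To make the data-fit idea concrete, the following minimal sketch samples a stand-in
for an expensive model on a coarse grid and uses a piecewise-linear fit as the
low-fidelity model. The closed-form function high_fidelity, the sampling range, and
the number of samples are assumptions chosen purely for illustration; the error
estimate is a crude comparison on a finer grid, not a rigorous bound.

```python
import bisect
import math


def high_fidelity(x):
    """Stand-in for an expensive model (hypothetical closed-form placeholder)."""
    return math.sin(x) + 0.1 * x * x


def build_data_fit_surrogate(model, lower, upper, n_samples):
    """Sample `model` on a uniform grid and return a piecewise-linear fit."""
    step = (upper - lower) / (n_samples - 1)
    xs = [lower + i * step for i in range(n_samples)]
    ys = [model(x) for x in xs]  # the only calls to the expensive model

    def surrogate(x):
        if not lower <= x <= upper:
            raise ValueError("x lies outside the sampled range")
        # Index of the right end of the sampling interval that contains x.
        i = min(max(bisect.bisect_right(xs, x), 1), n_samples - 1)
        t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
        return (1.0 - t) * ys[i - 1] + t * ys[i]

    return surrogate


# Surrogate modeling: build the low-fidelity model from nine samples.
low_fidelity = build_data_fit_surrogate(high_fidelity, 0.0, 4.0, 9)

# Surrogate simulation plus a crude error estimate on a finer test grid.
test_points = [4.0 * k / 80 for k in range(81)]
max_error = max(abs(high_fidelity(x) - low_fidelity(x)) for x in test_points)
```

In this sketch, evaluating low_fidelity never calls high_fidelity again, which mirrors
the statement that surrogate simulation means evaluating the low-fidelity model.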
Surrogate-based optimization. Assuming that the surrogate model is sufficiently
accurate, the basic idea of surrogate-based optimization is to replace the optimiza-
tion problem regarding the high-fidelity function by an optimization problem re-
garding the low-fidelity function – without any additional interaction with the high-
fidelity function.
Next, the optimal solution corresponding to the low-fidelity optimization problem
is computed, for instance, by deterministic algorithms such as sequential quadratic
programming (SQP) (see, e.g., [158, ch. 18]) or stochastic algorithms such as genetic
algorithms (GA) (see, e.g., [49, pp. 39–43]). Finally, the computed optimal solution
is checked within the high-fidelity optimization problem (cf. [31, p. 2]).
An issue concerning this sub-notion is connected to the assessment of the computed
optimal solution, since the optimal solution of the high-fidelity optimization
problem is unknown a priori.
Thus, the low-fidelity optimization problem’s optimal solution is either accepted
as a proxy – to some extent – of a high-fidelity optimization problem’s optimal solu-
tion; or it is utilized as a starting point within the high-fidelity optimization problem.
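The workflow of surrogate-based optimization can be sketched as follows. Both model
functions are hypothetical placeholders, and a golden-section search stands in for
the SQP or GA solvers mentioned above only to keep the example self-contained; the
point is that the high-fidelity function is evaluated once, for the final check.

```python
import math


def high_fidelity(x):
    """Hypothetical expensive objective (placeholder for a field simulation)."""
    return (x - 2.0) ** 2 + 0.3 * math.sin(5.0 * x)


def low_fidelity(x):
    """Hypothetical cheap surrogate: the smooth trend without the ripple."""
    return (x - 2.0) ** 2


def golden_section_minimize(f, a, b, tol=1e-8):
    """Minimize a unimodal function f on [a, b] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            b = d
        else:
            a = c
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 0.5 * (a + b)


# Step 1: optimize the surrogate only (no high-fidelity evaluations).
x_surrogate = golden_section_minimize(low_fidelity, 0.0, 4.0)

# Step 2: a single high-fidelity evaluation to check the candidate.
predicted = low_fidelity(x_surrogate)
verified = high_fidelity(x_surrogate)
discrepancy = abs(verified - predicted)
```

Whether the resulting discrepancy is acceptable, or whether x_surrogate should
instead serve as a starting point for a high-fidelity search, remains a judgment
that the sketch does not automate.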
Surrogate-guided optimization. Compared to the previous optimization approach,
the key difference is that there is an interaction between the high-fidelity
optimization problem and the low-fidelity optimization problem. During the search
for the optimal solution of the high-fidelity optimization problem, the role of the
low-fidelity function is to speed up the search, whereas the role of the
high-fidelity function is to ensure convergence of the search.
A common issue concerns the general theoretical characterization of optimal
solutions by the first-order necessary conditions – i.e., the Karush–Kuhn–Tucker
(KKT) conditions (see, e.g., [210, p. 17f]).
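For orientation, the KKT conditions can be stated for a generic nonlinear program.
The symbols f, g_i, h_j, λ_i, and ν_j are notation introduced here for illustration
and are not taken from [210]. Assuming a problem of the form "minimize f(x) subject
to g_i(x) ≤ 0 and h_j(x) = 0" with continuously differentiable functions, and
assuming that a suitable constraint qualification holds at a local minimizer x*,
there exist multipliers λ_i and ν_j such that:

```latex
\begin{aligned}
&\nabla f(x^\ast) + \sum_{i} \lambda_i \nabla g_i(x^\ast)
   + \sum_{j} \nu_j \nabla h_j(x^\ast) = 0
  && \text{(stationarity)} \\
&g_i(x^\ast) \le 0, \quad h_j(x^\ast) = 0
  && \text{(primal feasibility)} \\
&\lambda_i \ge 0
  && \text{(dual feasibility)} \\
&\lambda_i \, g_i(x^\ast) = 0
  && \text{(complementary slackness)}
\end{aligned}
```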
Concerning this sub-notion, the surrogate model is not necessarily identical to the
low-fidelity model (cf. [194, p. 28]) since it depends on the type of interaction – or
model management strategy (cf. [166, p. 554f]) – between the high-fidelity model and
the low-fidelity model.
In the remaining text, I use the terms surrogate-guided optimization and multifi-
delity optimization (recall § 1.1) interchangeably.
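The interaction between the two models can be sketched with a simple trust-region
loop in which the low-fidelity function is corrected so that its value and slope
agree with the high-fidelity function at the current iterate. This is one possible
model management strategy, chosen here as an assumption for illustration; the model
functions, the acceptance thresholds, and the grid-based inner search are
hypothetical, and no convergence guarantee is claimed for the sketch.

```python
import math


def high_fidelity(x):
    """Hypothetical expensive objective."""
    return (x - 2.0) ** 2 + 0.3 * math.sin(5.0 * x)


def low_fidelity(x):
    """Hypothetical cheap model of the same quantity."""
    return (x - 2.0) ** 2


def derivative(f, x, h=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def surrogate_guided_minimize(x, radius=1.0, max_iter=50, tol=1e-6):
    """Trust-region loop with a first-order additive correction (sketch)."""
    f_x = high_fidelity(x)
    for _ in range(max_iter):
        # Correct the low-fidelity model so that value and slope match at x.
        offset = f_x - low_fidelity(x)
        slope = derivative(high_fidelity, x) - derivative(low_fidelity, x)
        surrogate = lambda z, x=x: low_fidelity(z) + offset + slope * (z - x)

        # Cheap inner search: minimize the surrogate on a grid in the region.
        grid = [x + radius * (k / 100.0 - 1.0) for k in range(201)]
        candidate = min(grid, key=surrogate)

        predicted = f_x - surrogate(candidate)
        if predicted <= tol:
            break  # the surrogate predicts no further meaningful decrease
        f_candidate = high_fidelity(candidate)  # high-fidelity check
        ratio = (f_x - f_candidate) / predicted
        if ratio > 0.1:       # accept the step
            x, f_x = candidate, f_candidate
        if ratio > 0.75:      # good agreement: enlarge the region
            radius *= 2.0
        elif ratio < 0.25:    # poor agreement: shrink the region
            radius *= 0.5
    return x, f_x


x_opt, f_opt = surrogate_guided_minimize(x=0.5)
```

Here the corrected function plays the role of the surrogate model, so the surrogate
model and the low-fidelity model differ, in line with the remark above that the two
are not necessarily identical.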
Motivated by the field of application in the present work (recall § 1.1), the seman-
tics of the models is mainly determined by the electromagnetic field theory’s realms
of magnetostatics and magnetoquasistatics (see, e.g., [139] or [103]).
The preceding elaborations already hint at the pivotal role played by the
low-fidelity model in the area of surrogate optimization. In Fig. 1.2, there is a
schematic depiction of a low-fidelity model depending on the available information
about the high-fidelity model. Considering the high-fidelity model as a black-box, a
gray-box or a white-box model influences one classification of low-fidelity models
into (1) data-fit, (2) simplified-physics, and (3) projection-based models (cf. [166,
p. 556]).⁵ Another possible way to classify low-fidelity models (indicated by the
vertical black line in Fig. 1.2) is to ask whether the models are intrusive or
non-intrusive – where I understand “intrusive” as a need to modify the numerical
software underlying the high-fidelity model (cf. [77, p. 3f]).
[Figure 1.2: diagram relating x ↦ K(x) and x ↦ K̃(x) via boxes labeled (1), (2),
and (3).]
FIGURE 1.2: A schematic depiction of a low-fidelity model (encoded by K̃) depending
on the available information about the high-fidelity model (encoded by K). The
vertical arrows merely emphasize a connection between a high-fidelity model and a
low-fidelity model; the horizontal arrows indicate input and output entities. The
boxes associated with the low-fidelity model K̃ solely indicate schematically
different potential representations of K̃. The high-fidelity model K is considered
as (1) a black-box model, (2) a gray-box model or (3) a white-box model. The vertical
black line separating (1) and (2) from (3) indicates that, in the present work, the
focus is on (1) and (2).
In my investigation, I narrow the focus to low-fidelity models of data-fit
type – for instance, kriging models (see, e.g., [137]) – and of simplified-physics
type – for instance, coarse-grid discretization models (see, e.g., [125, p. 159]). In
order to provide a complete picture and to put the narrowed focus into perspective,
I briefly address low-fidelity models of projection-based type.
Brief digression: projection-based low-fidelity models. The upcoming expo-
sition is very condensed. Thus, for a more elaborate exposition, I refer to [189]
and [20].
The basic mechanism behind this type of low-fidelity models is: The high-fidelity
model is given as a system of equations in a high-dimensional space and a corre-
sponding low-dimensional subspace is constructed such that some desired charac-
teristics of the system are preserved. The low-fidelity model constitutes the projec-
tion of the high-fidelity model onto the low-dimensional subspace.
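The projection mechanism can be illustrated with a small linear system. The sketch
assumes a Galerkin projection, which is one common choice and not necessarily the
one used in [189] or [20]; the matrix, the right-hand side, and the two basis
vectors are hypothetical, and the basis is chosen by hand rather than constructed to
preserve particular characteristics of the system.

```python
def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    """Euclidean inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def galerkin_reduce_and_solve(A, b, basis):
    """Project A x = b onto span(basis) (two vectors) and lift the solution."""
    v1, v2 = basis
    Av1, Av2 = matvec(A, v1), matvec(A, v2)
    # Reduced 2x2 system: (V^T A V) y = V^T b.
    m11, m12 = dot(v1, Av1), dot(v1, Av2)
    m21, m22 = dot(v2, Av1), dot(v2, Av2)
    r1, r2 = dot(v1, b), dot(v2, b)
    det = m11 * m22 - m12 * m21
    y1 = (r1 * m22 - m12 * r2) / det
    y2 = (m11 * r2 - m21 * r1) / det
    # Lift the reduced solution back to the high-dimensional space.
    return [y1 * p + y2 * q for p, q in zip(v1, v2)]


# Hypothetical "high-fidelity" system: a small tridiagonal matrix.
A = [[2.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, -1.0, 2.0]]
b = [1.0, 2.0, 3.0, 4.0]
basis = ([1.0, 1.0, 1.0, 1.0], [0.0, 1.0, 1.0, 0.0])

x_reduced = galerkin_reduce_and_solve(A, b, basis)
residual = [bi - ri for bi, ri in zip(b, matvec(A, x_reduced))]
```

The reduced solution does not solve the full system exactly, but its residual is
orthogonal to both basis vectors, which is the defining property of the Galerkin
projection assumed here.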
In the context of electrical engineering, this type of low-fidelity models is
intensively discussed for circuit simulations and electromagnetic field simulations.
Regarding applications in circuit simulations, see [21] for a collection of detailed
investigations. Regarding applications in electromagnetic field simulations, there
are various investigations depending on the meaning of the parameter under
consideration (encoded by x in Fig. 1.2). Common meanings of the parameter are:
frequency (see, e.g., [223], [118]), material (see, e.g., [115], [44]), and geometry
(see, e.g., [40], [30]).
⁵ In the literature (see, e.g., [49, p. 45]), data-fit low-fidelity models are also
called metamodels. Moreover, in [57], the authors suggest the wording data-fit,
multifidelity, and reduced-order.
Undoubtedly, the projection-based type of low-fidelity models is a very important
type because it does not depend on domain-specific knowledge of an expert. On the
one hand, this independence is valuable for the automated construction of a low-
fidelity model; especially, if theoretically sound error bounds and error estimators
are available. On the other hand, it is questionable why the domain-specific
knowledge of an expert – for example, in the form of a large number of different
models – should not be exploited.
With regard to the complexity of a real-world engineering design problem (re-
call § 1.1), there are, inevitably, a lot of open challenges concerning the theory and
the implementation of low-fidelity models of projection-based type. However, it is
arguably reasonable to state that a harmoniously balanced interaction between all
three types of low-fidelity models has the potential to be a fruitful approach in the
long run – as recent promising results (see, e.g., [165, p. A3163]) indicate.
Finally, I have sketched the general path to which the present dissertation con-
tributes. Next, I identify critical points on this general path and specify the research
scope and goals.
1.3 Setting a horizon: The research scope & goals
In order to have a chance to reconcile the ideal long-term goal (see § 1.1) and
surrogate optimization (see § 1.2), there are at least two critical points that one
encounters:
(1) In real-world design optimization problems, various high-fidelity models and
low-fidelity models from various sources are used informally, i.e., they lack
rigorously proven error bounds and error estimators (see, e.g., [179] or [32]);
despite the informal usage, these models and their relationships have proven
to be useful in practice.
(2) In general, the task of comparing optimization algorithms is non-trivial (see,
e.g., [224]). With regard to surrogate optimization, there is a variety of methods
discussed in the literature but the task of choosing an appropriate method for
a given problem is non-trivial. An obstacle is to find a proper way to classify
the numerous methods. Especially, there is a lack of well-defined benchmarks
that could enable a standardized benchmark-focused comparison.
To illustrate the two critical points, I briefly present two examples:
(E1) An example concerning (1) is connected to the computation of the ohmic loss
(as the quantity of interest) of a three-dimensional helical coil of N turns (as
the device under test, see (i) in Fig. 1.3).⁶ If we replace this helical coil by a
collection of N toroids (see (ii) in Fig. 1.3), then the ohmic loss computation
associated with the coil represents the high-fidelity model, and the ohmic loss
computation associated with the toroids represents the low-fidelity model. The
comparison of the two models is usually based on the comparison of the respective
computed ohmic loss encoded as a non-negative real number.
Note that, for instance from a topological viewpoint (see, e.g., [92]) or a
boundary value problem viewpoint (see, e.g., [174]), the two devices under test are
not necessarily the same in general. However, they are commonly assumed to be
approximately the same regarding the ohmic loss computation – i.e., the sameness of
the high-fidelity model and the low-fidelity model implies sameness of the
respective devices under test.
⁶ Unfortunately, there is some ambiguity regarding the term "ohmic loss". On the one
hand, it refers to the non-negative real number resulting from the ohmic loss
integral computation; on the other hand, it refers to the ohmic loss integral itself
as a map. In the present context, I refer to the map.
[Figure 1.3: two panels labeled (i) and (ii).]
FIGURE 1.3: Two generic devices under test: (i) a three-dimensional helical coil of
5 turns, and (ii) a collection of 5 toroids. The devices are created within CST
Studio Suite®.
(E2) An example concerning (2) is to choose a surrogate-guided optimization method
from the class of methods following the space mapping paradigm (see, e.g.,
[125, p. 50]) for the optimization of a three-dimensional helical coil of N turns
as a device under test.
Note that, in the space mapping paradigm, the low-fidelity model and the sur-
rogate model are not identical (cf. [194, p. 28]). Various approaches have been
proposed to construct the surrogate model (see, e.g., [49, ch. 3]). An attempt to
classify some methods within this class is by assessing the quality of the low-
fidelity models and the surrogate models with regard to convergence proper-
ties of the corresponding algorithms (see, e.g., [120], [121]).
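To indicate what a method within the space mapping paradigm may look like, the
following sketch uses a parameter-extraction step that finds the low-fidelity input
whose response best matches a given high-fidelity response, followed by a unit-step
update of the high-fidelity input. All model functions, sample points, and the
design goal are hypothetical, and the fixed unit-step update is a simplifying
assumption that suits this toy problem, in which the two models differ only by a
shift of the input.

```python
import math

SAMPLES = [0.5, 1.0, 1.5, 2.0]  # hypothetical sample points of the response


def fine_response(x):
    """Hypothetical high-fidelity response (expensive in practice)."""
    return [math.exp(-(x + 0.3) * t) for t in SAMPLES]


def coarse_response(z):
    """Hypothetical low-fidelity response (cheap)."""
    return [math.exp(-z * t) for t in SAMPLES]


def misfit(r, s):
    """Sum of squared differences between two responses."""
    return sum((a - b) ** 2 for a, b in zip(r, s))


def golden_section_minimize(f, a, b, tol=1e-10):
    """Minimize a unimodal function f on [a, b] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            b = d
        else:
            a = c
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 0.5 * (a + b)


def extract_parameter(x):
    """Parameter extraction: coarse input whose response matches the fine one."""
    target = fine_response(x)  # one high-fidelity evaluation
    return golden_section_minimize(
        lambda z: misfit(coarse_response(z), target), 0.0, 5.0)


# Design goal: reproduce a prescribed response (assumed for illustration).
goal = coarse_response(1.0)
z_star = golden_section_minimize(
    lambda z: misfit(coarse_response(z), goal), 0.0, 5.0)

# Space-mapping iteration with a unit-step update of the fine-model input.
x = z_star
for _ in range(5):
    z = extract_parameter(x)
    if abs(z - z_star) < 1e-6:
        break
    x += z_star - z
```

In this sketch, the surrogate would be the low-fidelity response composed with the
extracted input shift rather than the low-fidelity response itself, which reflects
the distinction between the low-fidelity model and the surrogate model noted above.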
I argue that the two critical points (1) and (2) are natural bounds to the full recog-
nition of surrogate optimization methods by practitioners in the industrial sector.
Additionally, the two points bound naturally the research scope in the present work
at a problem- or application-oriented level and at a theory-oriented level.
Bound at a problem- or application-oriented level. Since there is no realistic
possibility to test all conceivable classes of use cases with all surrogate
optimization methods, there is a need to restrict the investigation to a subclass of
use cases and a subclass of methods. Therefore, the use cases are restricted to
applications associated with inductive components; and the methods are restricted to
those that use simplified-physics and data-fit models as low-fidelity models and
that use the space mapping paradigm (see, e.g., [125, p. 50]) and the co-kriging
approach (see, e.g., [70, p. 167]) as model management strategies. According to the
terminology in [166, p. 555], the space mapping paradigm is a subtype of the model
management strategy adaptation, and the co-kriging approach is a subtype of the
model management strategy fusion.
At the transition between the two levels, there are inevitable software issues
regarding, for instance, finite element (FE) simulation tools or programming
languages (PLs). Commercial FE software (e.g., CST Studio Suite®), open-source FE
software (e.g., FEMM 4.2), and in-house programs for the algorithms (written, e.g.,
in MATLAB®⁷ and Julia⁸) are all employed in the present work.
Bound at a theory-oriented level. If one applies the high-fidelity model and the
low-fidelity model of example (E1) in the context of example (E2), then one can
observe that the current formal languages in surrogate-guided optimization (see,
e.g., [127], [166]) are insufficient for encoding the semantics (or interpretation)
that one model is derived from the other. Considering point (1), such an encoding is
beneficial in order to formally preserve and organize the practical prior knowledge
about the models and their relationships – which is also beneficial as a stage of
model preparation in the context of point (2).
Commonly, questions concerning semantics (and syntax) are investigated with tools
from logical analysis rather than with tools from numerical analysis. In numerical
analysis, questions regarding a logically sound footing for reliable reasoning
about numerical models are mostly associated with the notions of validation and
verification (see, e.g., [159]).
However, a promising mediator between these apparently different tool sets
is the formal language of category theory which is a holistic-structural approach to
mathematics (see, e.g., [11], [177], [180]). Its usefulness in physics (see, e.g., [73],
[46]) and in computer science (see, e.g., [167], [16]) has already been recognized.
Moreover, its usefulness is gradually getting recognition in electrical engineering
and computational electromagnetics (see, e.g., [13], [133]). Thus, the category theo-
retical language opens up a new opportunity to complement the primarily numeri-
cal analytic perspective in the context of surrogate optimization.
To draw the research scope completely, it is also necessary to mention directions
that are closely related to the present work but which will not be pursued.
Disclaimer: What is not considered in the dissertation. I provide a list of three
trends in the context of surrogate optimization (see § 1.2). Note that the list is cer-
tainly not exhaustive, though:
1. In real-world applications, there are many sources of uncertainties such as
manufacturing imperfections that result in, for instance, uncertain material,
shape or excitation information of a problem under consideration. Hence, the
first trend is to investigate mathematical methods of uncertainty quantifica-
tion (see, e.g., [201], [178], [50], [30], [122]).
2. The need to quickly find an optimal solution associated with a high-fidelity
model is a reason for using surrogate optimization. However, an accelerated search
is also conceivable if the overall computational costs of a high-fidelity model are
reduced by utilizing concepts from parallel computing (see, e.g., [190], [212]).
Thus, a second trend is to explore the applicability of parallel computing (see a
survey, e.g., in [87]).
3. In surrogate optimization, as mentioned before, selecting a proper method for
a given problem is a non-trivial task since the selection depends heavily on
the given problem. Therefore, there is a lack of generally valid guiding principles
for the selection process. However, considering machine learning techniques (see,
e.g., [192]), a third trend is concerned with the automation of the selection
process (see, e.g., [185]).
⁷ MATLAB® is a proprietary commercial programming language by MathWorks (see version
R2019b at https://www.mathworks.com/).
⁸ Julia is a free and open-source programming language (see version v1.5.3 at
https://julialang.org/). For more details on the dynamically-checked programming
language Julia, I refer to, e.g., [26] or [25].
After providing the research scope, I can state the superordinate research goals:
• investigate the applicability of a surrogate optimization’s subsegment to ap-
plications associated with inductive components;
• investigate the benefits and drawbacks of the category theoretical language as
an algebraic modeling toolbox in the context of surrogate optimization.
In order to assess the achievements of the present work, it helps to consider the
superordinate research goals from a methodological point of view (cf. [199, p. 3]⁹):
The first goal is largely concerned with utilizing long-researched techniques for new
applications, whereas the second goal is largely concerned with introducing a new
area of knowledge to a long-researched area of knowledge.
Finally, I present the outline of the work:
• In chapter 2, I discuss in particular the magnetoquasistatic model of Maxwell’s
theory. Moreover, some relevant aspects regarding the numerical simulation
of the magnetoquasistatic model are presented. Finally, a few key points concerning
the numerical optimization with the magnetoquasistatic model are illuminated. Note
that, in the exposition, I also take a few small detours in order to show in
advance, by familiar examples, some facets of the formal language of category
theory. At the end, I address a zoo of optimization algorithms for nonlinear
optimization problems. Furthermore, six test functions are introduced, and a
gradient-based interpretation of sensitivity measures is deployed; it is primarily
applied to models, such as data-fit low-fidelity models, that permit the
determination of derivative information by forward mode automatic differentiation.
• In chapter 3, I elaborate thoroughly on the key notion surrogate optimization and
on the proposed partitioning of this notion in § 1.2 into the three sub-notions:
(1) surrogate modeling & simulation, (2) surrogate-based optimization, and
(3) surrogate-guided optimization. The various notions of surrogate optimiza-
tion are tagged with algebraic notes in order to anticipate the toolset of the
category theoretical language. Concerning the sub-notion (1), the notion of a
high-fidelity model, a low-fidelity model, and a surrogate model are pinned
down, for instance. Concerning the sub-notion (2), a numerical scaffolding of
a benchmark-focused classification of test functions is carved out, for example.
Concerning the sub-notion (3), given an optimization procedure within the
space-mapping paradigm and a co-kriging low-fidelity model, we encounter,
e.g., the elucidation of potential hybrid model management strategies.
⁹ In [199], the authors offer a general classification of research methodologies.
Although their field of application is the numerical modeling of AC-loss in
high-temperature superconductors, their classification has an application-independent
general validity. Note that their classification can be regarded as an extension of
the classification in [37] of the methodology of mathematics.
• In chapter 4, a formalization-oriented viewpoint is deepened by introducing
the category theory toolset. I focus solely on core tools and attempt to bal-
ance intuition and rigor regarding this toolset. Moreover, the toolset is used
to specify a general optimization problem and to specify surrogate-guided
optimization methods, where the focus rests on optimization procedures within
the space-mapping paradigm. In addition, we also encounter other use cases for the
toolset related to high- and low-fidelity models associated with examples of
applications in electrical engineering.
• In chapter 5, we look at a solenoid with a core and a common-mode choke as
representatives of the class of inductive components, for which I elaborate on four
optimization problems within the setting of a two-dimensional linear boundary
value problem and a three-dimensional linear boundary value problem,
respectively. Supposing the context of an electrical engineering design workflow,
a strategy for using the tools from chapter 3 in practical applications is
presented, and some relevant spots are carved out where the tools from chapter 4
can have a favorable influence, too.
• In chapter 6, I distill a conclusion from the presented research and present an
outlook.
As a visual orientation aid for navigating the present work, Fig. 1.4 schematically
depicts four generic levels (the level of programs, the level of algorithms, the
level of (generalized) functions, and the level of applications) to which the
essence of the respective discussion in chapters 2, 3, 4, and 5 can be roughly
assigned. In addition, some of the above-mentioned terms are associated with these
four levels as well.
[Figure 1.4: four large ellipses labeled Programs (ch. 2, 3), Algorithms (ch. 2, 3),
Functions (ch. 4), and Applications (ch. 5); the use cases stated as colored text
are “space mapping_PL”, “space mapping”, “find the minimum”, and “quantity of
interest”.]
FIGURE 1.4: A schematic orientation aid for the present work (inspired by [225,
p. 3]). The index PL refers to "programming language". The assignment of ch. 6 is
omitted. The dotted lines merely indicate a connection between the large ellipses.
Each large ellipse represents one of the generic levels: programs, algorithms,
(generalized) functions, and applications. The respective small ellipse within a
large ellipse symbolizes a sub-area of interest. A colored asterisk within a small
ellipse encodes a use case that is stated as colored text.
Chapter 2
Magnetoquasistatic Maxwell’s theory – Modeling, simulation, and optimization
In the present work, the physical framework is primarily restricted to macroscopic
scale electromagnetic phenomena described by Maxwell’s theory – in which the
magnetic energy and the power loss (weighted with a time of oscillation) are much
larger than the electric energy such that physical effects concerning electromagnetic
wave propagation can be disregarded. The majority of the thesis’ central applica-
tions under investigation is embedded in this particular physical framework. There-
fore, I choose to expand on this particular physical framework in the subsequent
sections and to leave the common details of the general physical framework to the
standard literature (see, e.g., [139], [103]).
If the operating frequency is greater than zero, then, as customary, the mathe-
matical representation of the physical framework is given by the magnetoquasistatic
model of Maxwell’s theory; otherwise the mathematical representation is given by
the magnetostatic model of Maxwell’s theory.
Respecting the standard approach in electrical engineering, let us discuss the
corresponding mathematical models in the language of vector analysis. Hence, in
order to express Maxwell’s theory (see, e.g., [139], [103]), one has to assume
familiarity with notions such as vector fields, scalar fields, the differential
operators div, grad, curl, etc.
We are not concerned with a thorough numerical analysis of the models – since
we abstract over most of their inner workings in the remaining chapters. However,
the modern treatment regarding the numerical simulation and optimization of these
models makes it necessary to involve some basic concepts from the languages of
functional analysis (see, e.g., [226], [140], [218], [9], [153]) and differential geome-
try (see, e.g., [93], [42], [83], [66], [134]) that provide methodological and termino-
logical guidance. Thus, one has to suppose a working knowledge of elementary
definitions and results concerning notions such as Hilbert spaces, bounded linear
operators, manifolds and similar; but the explanations do not follow strictly the so-
called "definition-theorem-proof model of mathematics" (cf. [204, p. 3]).
Furthermore, these two languages assist in tracing some intuitions concerning
the structural perspective that is emphasized by the language of category theory
that I employ in ch. 4.
2.1 Magnetoquasistatic Model of Maxwell’s theory
This section provides a detailed description of the physical realm of the present work:
Starting from a brief exposition of the fundamental problem statement of electro-
magnetism, we discuss the statement’s mathematical representation by the system
of Maxwell’s equations. From the general system, we derive the magnetoquasistatic
subsystem and the magnetostatic subsystem; and for the former subsystem, we ar-
rive at a strong formulation that serves as an orientation point for the discussion
about the numerical simulation in the subsequent section.
2.1.1 The fundamental problem statement of electromagnetism
In the present work, we focus exclusively on Maxwell’s theory of electromagnetism
and corresponding mathematical models. For more details on the distinction be-
tween theory, model, and formulation, I refer to, e.g., [132, p. 5–9].
In [205, p. 273], the author states the fundamental problem of electromagnetism:
• Given a space region and a time interval,
• Given the nature of the materials that fill the region,
• Given the boundary conditions,
• Given the initial values of the configuration variables,
• Given the space and time distribution of charges and currents,
• Find the configuration of the field at every point and at every later instant.¹
2.1.2 The system of Maxwell’s equations
Associated with the problem statement in the previous section is the system of Max-
well’s equations that represents its mathematical model. Let us formalize the model
by the language of vector analysis.
The system of Maxwell’s equations contains the following field functions: the
electric field intensity E, the electric field flux density D, the magnetic field flux den-
sity B, the magnetic field intensity H, the electric charge density ρ, and the electric
current flux density J.
All field functions are defined as functions of space and time. It is assumed that
a three-dimensional Euclidean space as a model for space and a one-dimensional
Euclidean space as a model for time are given (see, e.g., [33, p. 109]). Additionally,
it is assumed that there are no mechanically moving parts involved. The space
variable x is a member of the space region Ω ⊂ ℝ³ and the time variable t is a
member of the time interval I_T := [0, T] ⊂ ℝ.
¹ The bullet points are a direct quotation of the listing in [205, p. 273], but the
italic face and bold face are my emphasis.
Moreover, the field functions are categorized into two types: vector fields and
scalar fields, i.e., given an instant in time, vector fields map a point in space to
a vector and scalar fields map a point in space to a scalar. Thus, the field
functions E, D, B, H, and J are vector fields with the function signature
Ω × I_T → ℝ³; the field function ρ is a scalar field with the function signature
Ω × I_T → ℝ.² Regarding the notation, however, it should be noted that the symbols
for the field functions can also mean the evaluated field functions – for instance,
E ≡ E(x, t), D ≡ D(x, t), etc.
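The two function signatures can be mirrored by type annotations in a programming
language. The following sketch uses Python's typing module; the type aliases and
the two example fields are hypothetical and carry no physical meaning beyond
illustrating the signatures Ω × I_T → ℝ³ and Ω × I_T → ℝ.

```python
from typing import Callable, Tuple

Point = Tuple[float, float, float]     # a point x of the space region
Vector = Tuple[float, float, float]    # a value in R^3
Time = float                           # an instant t of the time interval

VectorField = Callable[[Point, Time], Vector]   # signature: Omega x I_T -> R^3
ScalarField = Callable[[Point, Time], float]    # signature: Omega x I_T -> R


def uniform_field(x: Point, t: Time) -> Vector:
    """Hypothetical vector field: constant in space, linear in time."""
    return (0.0, 0.0, t)


def charge_density(x: Point, t: Time) -> float:
    """Hypothetical scalar field: depends on the first coordinate only."""
    return x[0] * x[0]


# The symbol for a field function versus the evaluated field function.
E: VectorField = uniform_field
rho: ScalarField = charge_density
E_evaluated = E((0.0, 0.0, 0.0), 2.0)
rho_evaluated = rho((3.0, 0.0, 0.0), 0.0)
```

The distinction between E and E_evaluated corresponds to the notational remark that
a symbol may denote either the field function or its evaluation.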
Notice that, e.g., the field functions E and B constitute configuration variables
(see § 2.1.1). The electric current density can be decomposed into a conduction part
J_cond due to an electrically conductive medium, and a source part J_src that is
imposed externally; hence, J := J_cond + J_src.
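Customarily, the system of Maxwell’s equations is displayed in the integral version
or in the differential version. Let A ⊂ Ω be an oriented surface with boundary ∂A,
and let V ⊂ Ω denote a volume with boundary ∂V. Note that the symbol ∂ behaves
polymorphically, i.e., it is utilized to declare a boundary operator and a
partial time derivative operator ∂_t. The integral version reads as
\[
\begin{aligned}
\text{(i)}\quad & \forall A.\ \int_{\partial A} H \cdot \mathrm{d}s
  = \int_{A} J \cdot \mathrm{d}A + \mathrm{d}_t \int_{A} D \cdot \mathrm{d}A, \\
\text{(ii)}\quad & \forall A.\ \int_{\partial A} E \cdot \mathrm{d}s
  = -\mathrm{d}_t \int_{A} B \cdot \mathrm{d}A, \\
\text{(iii)}\quad & \forall V.\ \int_{\partial V} D \cdot \mathrm{d}A
  = \int_{V} \rho \, \mathrm{d}V, \\
\text{(iv)}\quad & \forall V.\ \int_{\partial V} B \cdot \mathrm{d}A = 0.
\end{aligned}
\tag{2.1}
\]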
The system of Maxwell’s equations is completed by the three constitutive equa-
tions that relate the corresponding field functions and express their interaction with
matter. Assuming time-invariant, linear, homogeneous, and isotropic material, the
equations are given by
\[
\begin{aligned}
\text{(i)}\quad & D \overset{\text{mat}}{=} \varepsilon E, \\
\text{(ii)}\quad & B \overset{\text{mat}}{=} \mu H, \\
\text{(iii)}\quad & J_{\text{cond}} \overset{\text{mat}}{=} \sigma E,
\end{aligned}
\tag{2.2}
\]
where the notation $\overset{\text{mat}}{=}$ follows the style of [205, p. 33]. The
electric permittivity ε, the magnetic permeability µ, and the electric conductivity
σ are considered as functions of space. The absolute electric permittivity ε₀ is
incorporated in ε and the absolute magnetic permeability µ₀ is incorporated in µ.
Notice that it depends on the context whether ε ≡ ε(x), µ ≡ µ(x), and σ ≡ σ(x). In
the case of non-linear and inhomogeneous magnetic material, it is customary to
introduce the magnetization M as an additional field function (see, e.g., [178,
p. 3]). In the presence of permanent magnets, it is customary to introduce an
additional magnetic field strength H_pm (see, e.g., [30, p. 10]). However, driven by
the domain of applications in the present work (recall § 1.1), we are mainly
concerned with constitutive equations given by (2.2).
² Borrowing from programming language theory (see, e.g., [88]), I conceive the term
"function signature" similarly to the sense of the term "type signature". For
example, given a function called f that maps a real number x to a real number
x ⋅_ℝ x, where the function ⋅_ℝ indicates the real-valued binary multiplication map,
one can write f = x ↦ x ⋅_ℝ x : ℝ → ℝ such that ℝ → ℝ is the type signature (or type
annotation) of the function f (cf. the discussion in [32, p. 279–282]). Setting
A ≡ ℝ and B ≡ ℝ in this example, then, roughly speaking, the type of x refers to A
(such that x : A), the type of x ⋅_A x refers to B (such that x ⋅_A x : B), and the
type of f refers to A → B or B^A (such that f : A → B and f : B^A, respectively).
If we prescribe a unit normal vector to ∂Ω, then one can extract the tangential
and normal components of the field functions in (2.1) at the boundary of the space
region. If subregions Ω₁ and Ω₂ of the space region Ω exhibit different material
properties, then additional conditions have to be taken into account at the material
interfaces. For more details on the handling of all these conditions – especially by
trace operators in a functional analytic setting –, I refer to [218], [32], and [179].
Providing initial values of the corresponding field functions, and the space and
time information of the sources, all requirements according to the problem statement
in § 2.1.1 are fulfilled. Applying the theorem of Stokes and the theorem of Gauss
to (2.1), we derive the system of Maxwell’s equations in the differential version
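\[
\begin{cases}
\text{(i)} & \forall (x,t) \in \Omega \times I_T.\ \operatorname{curl} H = J + \partial_t D, \\
\text{(ii)} & \forall (x,t) \in \Omega \times I_T.\ \operatorname{curl} E = -\partial_t B, \\
\text{(iii)} & \forall (x,t) \in \Omega \times I_T.\ \operatorname{div} D = \rho, \\
\text{(iv)} & \forall (x,t) \in \Omega \times I_T.\ \operatorname{div} B = 0,
\end{cases}
\tag{2.3}
\]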
where suitable boundary conditions and reasonable properties regarding the space
region Ω are assumed – which will be discussed later.
From (2.3), we can recover the continuity equation that encodes local charge conservation
\[
\forall (x,t) \in \Omega \times I_T.\ \operatorname{div} J = -\partial_t \rho. \tag{2.4}
\]
Commonly, potential field functions such as the electric scalar potential φ and the
magnetic vector potential A are also employed in the context of Maxwell’s equations.
Starting from (2.3), these two potential field functions are introduced, e.g., in the
representation
\[
\text{(i)}\ E =: -\operatorname{grad} \varphi - \partial_t A, \qquad
\text{(ii)}\ B =: \operatorname{curl} A. \tag{2.5}
\]
The potential field functions are particularly relevant for the formulation of
magnetostatic and magnetoquasistatic problems where, for instance, the so-called
A-φ formulation plays an important role in the numerical approximation of
these problems (see, e.g., [179, ch. 6]).
Before we move on to the approximation of the system of Maxwell’s equations
in magnetostatics and in magnetoquasistatics, let us take a brief detour
to discuss an important structural property of the differential operators
regarding the physical space. By discussing this property, I want to carve out some
intuitions concerning the structural approach of category theory in ch. 4.
Detour 1: a structural perspective on a structural property. Recall that the three
differential operators grad, curl, and div exhibit an important structural property
for contractible domains: Firstly, given a field function expressed via grad, applying
curl to this function results in the zero vector field; secondly, given a field function
expressed via curl, applying div to this function results in the zero scalar field.
Simplistically, the structural property is expressed as curl grad ≡ 0 and div curl ≡ 0,
which is encoded in the Poincaré lemma (cf. [32, p. 298]).
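The two identities can be observed numerically. The following is a minimal sketch, assuming a periodic cube sampled on a uniform grid and periodic central differences as discrete stand-ins for grad, curl, and div; the grid size, the test fields, and all function names are illustrative choices and not part of the present work. Since the difference operators along different axes commute, the discrete compositions vanish up to rounding errors.

```python
import math

N = 8                      # grid points per direction (illustrative choice)
H = 2.0 * math.pi / N      # grid spacing on the periodic cube [0, 2*pi)^3


def sample(func):
    """Sample a scalar function on the periodic N x N x N grid."""
    return [[[func(i * H, j * H, k * H) for k in range(N)]
             for j in range(N)] for i in range(N)]


def diff(field, axis):
    """Periodic central difference of a sampled scalar field along one axis."""
    out = [[[0.0] * N for _ in range(N)] for _ in range(N)]
    for i in range(N):
        for j in range(N):
            for k in range(N):
                idx = [i, j, k]
                idx[axis] = (idx[axis] + 1) % N      # forward neighbour
                plus = field[idx[0]][idx[1]][idx[2]]
                idx[axis] = (idx[axis] - 2) % N      # backward neighbour
                minus = field[idx[0]][idx[1]][idx[2]]
                out[i][j][k] = (plus - minus) / (2.0 * H)
    return out


def sub(a, b):
    """Pointwise difference of two sampled scalar fields."""
    return [[[a[i][j][k] - b[i][j][k] for k in range(N)]
             for j in range(N)] for i in range(N)]


def grad(phi):
    return [diff(phi, axis) for axis in range(3)]


def curl(v):
    return [sub(diff(v[2], 1), diff(v[1], 2)),
            sub(diff(v[0], 2), diff(v[2], 0)),
            sub(diff(v[1], 0), diff(v[0], 1))]


def div(v):
    parts = [diff(v[axis], axis) for axis in range(3)]
    return [[[parts[0][i][j][k] + parts[1][i][j][k] + parts[2][i][j][k]
              for k in range(N)] for j in range(N)] for i in range(N)]


def max_abs(field):
    return max(abs(x) for plane in field for row in plane for x in row)


phi = sample(lambda x, y, z: math.sin(x) * math.cos(2.0 * y) + math.sin(z))
v = [sample(lambda x, y, z: math.sin(y) * math.cos(z)),
     sample(lambda x, y, z: math.cos(x) * math.sin(z)),
     sample(lambda x, y, z: math.sin(x + y))]

curl_grad = max(max_abs(component) for component in curl(grad(phi)))
div_curl = max_abs(div(curl(v)))
```

Both `curl_grad` and `div_curl` are expected to be of the order of the floating-point rounding error, whereas the gradient itself is clearly non-zero.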
If we emphasize the function signature of the three differential operators, i.e.,
grad has the type "scalar field → vector field", curl has the type "vector field → vector
field", and div has the type "vector field → scalar field", then one can systematize the
previous structural property as
\[
\text{``scalar field} \xrightarrow{\ \operatorname{grad}\ } \text{vector field}
\xrightarrow{\ \operatorname{curl}\ } \text{vector field}
\xrightarrow{\ \operatorname{div}\ } \text{scalar field''}.
\]
The inverted commas indicate that the systematization of the structural property
cannot be formalized properly in the standard language of vector analysis (see,
e.g., [83, p. 31]). For this purpose, there is a need to state the types and maps
more precisely. The language of differential geometry and the language of functional
analysis are capable of properly encoding the structural property which, in these
languages, is the algebraic expression called exact sequence (cf. [32, p. 132f]). Let us
briefly expose this expression in these two languages. Leaving the majority of details
to the numerous textbooks that have been mentioned at the chapter’s beginning, we
focus only on the bare minimum of technicalities since the exposition’s purpose is to
abstract the structural essence of the common algebraic expression.
Within the manifold-based differential geometric approach, the full system of
Maxwell’s equations is formulated based on the machinery of differential forms and
exterior calculus. Using this approach, the vector field functions in (2.1) are called
"vector proxies" (cf. [33, p. 132]). For instance, the electric field strength is merely
a representative of an observable entity, more precisely, the assignment of a voltage
(i.e., the electromotive force) to an oriented line. Hence, the map
\[
e = l \mapsto \int_l E \cdot \mathrm{d}s
\]
is called a differential form of degree 1 – abbreviated as 1-form. Assuming a smooth
manifold M, we denote the space of 1-forms as Λ^1(M); thus, e is an element of Λ^1(M).
If we associate other field functions with other geometric objects such as points, surfaces,
and volumes, then one can designate the corresponding spaces: the space of 0-forms
as Λ^0(M), the space of 2-forms as Λ^2(M), and the space of 3-forms as Λ^3(M),
respectively. Additionally, one can instantiate a notion of a differential operator via
the exterior derivative d, which maps a differential form of degree k to a differential
form of degree k+1, such that one can formalize the abovementioned systematization
of the structural property in vector analysis as the algebraic expression
\[
0 \to \Lambda^0(M) \xrightarrow{\ d_1\ } \Lambda^1(M) \xrightarrow{\ d_2\ } \Lambda^2(M)
\xrightarrow{\ d_3\ } \Lambda^3(M) \to 0. \tag{2.6}
\]
Observe that the structural property itself is encoded in a defining property of the
exterior derivative: ∀a ∈ Λ^k(M). (d_{k+1} ∘ d_k)(a) ≡ 0; or concisely: d_{k+1} ∘ d_k ≡ 0.
Picking the functional analytic approach, technically, we are deploying the machinery
of Sobolev spaces and weak differential operators such that, given a regular
bounded, contractible domain D of the Euclidean space, we have to choose
appropriate Sobolev spaces for the field functions in (2.3) and in (2.5), i.e.,
L^2_grad(D), L^2_curl(D), L^2_div(D), and L^2(D), where the notational convention by [32, p. 128] is
employed in which L^2(D) denotes the space of square-integrable functions over D
and its boldface counterpart denotes the space of square-integrable vector fields over D (see, e.g., [32,
p. 69]). Moreover, we have to set the domains and codomains of the weak differential
operators such that, by construction, we arrive at the algebraic expression –
which is conceptually similar to (2.6):
\[
0 \to L^2_{\mathrm{grad}}(D) \xrightarrow{\ \operatorname{grad}\ } L^2_{\mathrm{curl}}(D)
\xrightarrow{\ \operatorname{curl}\ } L^2_{\mathrm{div}}(D)
\xrightarrow{\ \operatorname{div}\ } L^2(D) \to 0. \tag{2.7}
\]
One significance of diagrams such as (2.6) and (2.7) is that they provide guidance
for the construction of a discrete representation of the full system of Maxwell’s
equations in the sense that, in numerical approximations, such diagrams
should be preserved (cf. [33, p. 145]) in order to mimic the continuous properties
at the discrete level. Therefore, such diagrams enable some kind of consistency
check (cf. [111]). Following the spirit of the diagrammatic notation in (2.6) and
in (2.7), one can systematize the full system of Maxwell’s equations in the so-called
Maxwell’s house (cf. [32, p. 134]). See, in addition, the so-called Tonti’s classification
diagrams of electromagnetism, in short: Tonti’s diagrams (cf. [205, p. 307–323]).
Regarding an in-depth elaboration on the relationship between (2.6) and (2.7), I
refer to the discussion in [6] about the de Rham complex and Hilbert complexes, and
I refer to the discussion in [131] about the Sobolev space setting.
From a purely structural perspective, though, the essence of the algebraic expression
in (2.6) and in (2.7) is that there are, in general, four spaces U, V, W, X and three
maps f_1, f_2, f_3 such that the algebraic expression reads as
\[
0 \to U \xrightarrow{\ f_1\ } V \xrightarrow{\ f_2\ } W \xrightarrow{\ f_3\ } X \to 0. \tag{2.8}
\]
At an intuitive level, one can regard the expression in (2.8) as the syntax, whereas
one can consider the expression in (2.6) as one possible semantics and the expression
in (2.7) as another possible semantics. More interestingly, one can identify another
semantics if, in (2.6) and in (2.7), one regards the spaces as vector spaces and the
maps as linear maps. This interplay of syntax and semantics shows us a flavor of a
structural perspective that foreshadows the category theoretical language which we
encounter in ch. 4.
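To make the "vector spaces and linear maps" semantics of (2.8) tangible, the following minimal sketch builds the signed incidence matrices of a single tetrahedron and checks that consecutive maps compose to zero. The choice of a tetrahedron, the orientation convention, and all function names are assumptions made for this illustration only. Note that the sketch checks only the property f_{k+1} ∘ f_k = 0 and not exactness of the sequence.

```python
from itertools import combinations


def simplices(n_vertices, dim):
    """All dim-dimensional simplices on the given vertices (sorted tuples)."""
    return list(combinations(range(n_vertices), dim + 1))


def coboundary(n_vertices, dim):
    """Matrix of the linear map from dim-cochains to (dim+1)-cochains."""
    low = simplices(n_vertices, dim)
    high = simplices(n_vertices, dim + 1)
    index = {s: i for i, s in enumerate(low)}
    matrix = [[0] * len(low) for _ in high]
    for row, simplex in enumerate(high):
        for pos in range(len(simplex)):
            face = simplex[:pos] + simplex[pos + 1:]   # omit one vertex
            matrix[row][index[face]] = (-1) ** pos     # alternating sign
    return matrix


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def is_zero(matrix):
    return all(entry == 0 for row in matrix for entry in row)


# U = R^4 (vertices), V = R^6 (edges), W = R^4 (faces), X = R^1 (volume).
f1 = coboundary(4, 0)   # discrete analogue of grad
f2 = coboundary(4, 1)   # discrete analogue of curl
f3 = coboundary(4, 2)   # discrete analogue of div
```

Here, `matmul(f2, f1)` and `matmul(f3, f2)` are zero matrices, which mirrors curl grad ≡ 0 and div curl ≡ 0 at the level of plain linear algebra.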
2.1.3 The magnetoquasistatic subsystem & the magnetostatic subsystem
Due to the applications addressed in the present work, we are chiefly interested in
subsystems of Maxwell’s equations in (2.1) and in (2.3), respectively, where wave
propagation effects are neglected, and, therefore, the term ∂_t D is neglected. Additionally,
the electric charge density ρ is assumed to be the zero scalar field function.
These restrictions lead to the magnetoquasistatic subsystem of Maxwell’s equations –
which, in the literature (see, e.g., [179, p. 7]), is also called eddy current approximation
or magnetoquasistatic approximation of Maxwell’s equations. Furthermore,
if one neglects all time-dependencies, then one arrives at the magnetostatic subsystem
of Maxwell’s equations.
The two subsystems represent approximations; hence, there is a need for justification.
Let us assume that the magnetic energy and power loss (weighted with a
time of oscillation) are much larger than the electric energy. There are additional
quantitative criteria (cf. [178, p. 6]) to check our assumption: (1) Given an operating
angular frequency ω in a domain, the product ωε has to be much smaller than σ;
and (2) the diameter of a bounded domain has to be much smaller than the
corresponding minimal wavelength within the bounded domain. For more details
on the mathematical justification, see, e.g., [179, ch. 2] or [198].
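The two criteria can be evaluated with a few lines of arithmetic. The following sketch is only an illustration under stated assumptions: the function name `mqs_indicators` is hypothetical, the conductivity 5.8e7 S/m is a typical textbook value for copper, the relative permeability is set to one, and the wavelength is estimated by the free-space formula scaled with the relative permittivity.

```python
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity in F/m
C0 = 299_792_458.0        # speed of light in vacuum in m/s


def mqs_indicators(frequency_hz, sigma, eps_r=1.0, diameter_m=1.0):
    """Return (omega * eps / sigma, diameter / wavelength).

    Both numbers should be much smaller than one for the
    magnetoquasistatic approximation to be plausible.
    """
    omega = 2.0 * math.pi * frequency_hz
    ratio = omega * eps_r * EPS0 / sigma
    wavelength = C0 / (frequency_hz * math.sqrt(eps_r))
    return ratio, diameter_m / wavelength


# Illustrative values: 50 Hz, copper-like conductivity, 0.5 m device size.
ratio, relative_size = mqs_indicators(50.0, 5.8e7, diameter_m=0.5)
```

For these illustrative values, both indicators are many orders of magnitude below one, which is consistent with the use of the eddy current approximation at power frequencies.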
Reducing the system in (2.3) according to the corresponding restrictions, the
magnetoquasistatic subsystem of Maxwell’s equations reads as
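\[
\begin{cases}
\text{(i)} & \forall (x,t) \in \Omega \times I_T.\ \operatorname{curl} H = \sigma E + J_{\mathrm{src}}, \\
\text{(ii)} & \forall (x,t) \in \Omega \times I_T.\ \operatorname{curl} E = -\partial_t (\mu H), \\
\text{(iii)} & \forall (x,t) \in \Omega \times I_T.\ \operatorname{div}(\varepsilon E) = 0, \\
\text{(iv)} & \forall (x,t) \in \Omega \times I_T.\ \operatorname{div}(\mu H) = 0.
\end{cases}
\tag{2.9}
\]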
If we focus on the time-harmonic case where the field functions exhibit a sinu-
soidal time-dependency, one can formulate the magnetoquasistatic subsystem of
Maxwell’s equations in the frequency domain as
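\[
\begin{cases}
\text{(i)} & \operatorname{curl} H = \sigma E + J_{\mathrm{src}} & \text{in } \Omega, \\
\text{(ii)} & \operatorname{curl} E = -j\omega\mu H & \text{in } \Omega, \\
\text{(iii)} & \operatorname{div}(\varepsilon E) = 0 & \text{in } \Omega_{\mathrm{nc}}.
\end{cases}
\tag{2.10}
\]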
Remark 2.1.1. In the frequency domain, the notation of field functions indicates complex-valued
field functions. As mentioned before, the symbols for the complex-valued field functions
can also mean the evaluated complex-valued field functions – for instance, E ≡ E(x, jω).
Hence, the notation Ē indicates the componentwise conjugation of E and E(x, jω),
respectively.
Remark 2.1.2. In Figure 2.1, I illustrate schematically two common representatives of a
magnetoquasistatic subsystem’s domain in application.
Remark 2.1.3. Moving from (2.9) to (2.10) means that we have moved from an initial-boundary
value problem (IBVP) to a boundary value problem (BVP). In the time-harmonic
case, the equation (iv) in (2.9) can be dropped since it can be recovered from the
equation (ii) in (2.10). Moreover, the equation (iii) in (2.10) holds for all non-conducting
subregions (Ω_nc), whereas the equations (i) and (ii) refer to the whole physical (or computational)
domain under consideration (Ω). However, mind that the electric conductivity σ is
supposed to be greater than zero in a conducting subregion (Ω_c), and to be equal to zero in a
non-conducting subregion (Ω_nc). Finally, one has to assume that div J_src = 0 in Ω_nc, which
follows immediately from the continuity equation in (2.4).
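The two observations in Remark 2.1.3 can be retraced by applying div to the equations (ii) and (i) in (2.10). The following lines are a sketch that assumes ω ≠ 0, sufficiently regular fields, and the identity div curl ≡ 0; the second line is an equivalent way to arrive at the condition on J_src.

```latex
\begin{align*}
0 &= \operatorname{div}(\operatorname{curl} E)
   = -j\omega \operatorname{div}(\mu H)
  &&\Longrightarrow\quad \operatorname{div}(\mu H) = 0
  \quad \text{in } \Omega \quad (\omega \neq 0), \\
0 &= \operatorname{div}(\operatorname{curl} H)
   = \operatorname{div}(\sigma E) + \operatorname{div} J_{\mathrm{src}}
  &&\Longrightarrow\quad \operatorname{div} J_{\mathrm{src}} = 0
  \quad \text{in } \Omega_{\mathrm{nc}} \quad (\sigma = 0).
\end{align*}
```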
FIGURE 2.1: A schematic illustration of two common representatives
of a magnetoquasistatic subsystem’s domain in application.
(A) Representative #1: A single conducting subdomain (labels: Ω_nc, ∂Ω, Ω_c).
(B) Representative #2: Multiple conducting subdomains (labels: Ω_nc, ∂Ω, Ω_c,1, Ω_c,2).
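If we apply the frequency-domain representation of the potential field functions
from (2.5) to the subsystem in (2.10), one can state this subsystem in the A-φ formulation:
\[
\begin{cases}
\text{(i)} & \operatorname{curl}(\mu^{-1} \operatorname{curl} A)
  = -\sigma \operatorname{grad} \varphi - j\omega\sigma A + J_{\mathrm{src}} & \text{in } \Omega, \\
\text{(ii)} & \operatorname{div}(-\varepsilon \operatorname{grad} \varphi - j\omega\varepsilon A) = 0
  & \text{in } \Omega_{\mathrm{nc}}.
\end{cases}
\tag{2.11}
\]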
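By setting ω ≡ 0 in (2.11), one can immediately derive the magnetostatic subsystem
of Maxwell’s equations in the A-φ formulation:
\[
\begin{cases}
\text{(i)} & \operatorname{curl}(\mu^{-1} \operatorname{curl} A)
  = -\sigma \operatorname{grad} \varphi + J_{\mathrm{src}} & \text{in } \Omega, \\
\text{(ii)} & \operatorname{div}(-\varepsilon \operatorname{grad} \varphi) = 0
  & \text{in } \Omega_{\mathrm{nc}}.
\end{cases}
\tag{2.12}
\]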
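The step from (2.10) to (2.11) can be retraced by substitution. The following sketch assumes that the time derivative in (2.5) is replaced by the factor jω in the frequency domain and that the constitutive equation B = µH from (2.2) is used with an invertible µ.

```latex
\begin{align*}
E &= -\operatorname{grad} \varphi - j\omega A,
\qquad
H = \mu^{-1} B = \mu^{-1} \operatorname{curl} A, \\
\operatorname{curl} H = \sigma E + J_{\mathrm{src}}
&\;\Longrightarrow\;
\operatorname{curl}(\mu^{-1} \operatorname{curl} A)
  = -\sigma \operatorname{grad} \varphi - j\omega\sigma A + J_{\mathrm{src}}, \\
\operatorname{div}(\varepsilon E) = 0
&\;\Longrightarrow\;
\operatorname{div}(-\varepsilon \operatorname{grad} \varphi - j\omega\varepsilon A) = 0 .
\end{align*}
```

Equation (ii) in (2.10) is satisfied identically by this substitution since curl grad ≡ 0, and setting ω ≡ 0 removes the two terms proportional to jω, which yields (2.12).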
A necessary remark is concerned with the gradient field grad φ in (2.12) and in (2.11).
Some authors (see, e.g., [178, p. 9]) neglect this term; other authors (see, e.g., [30,
p. 11]) introduce the field function J_src via the term −σ grad φ. In either case, the
investigation focuses only on finding the vector potential A. For a treatment of the
term grad φ in a more general setting, see, e.g., the discussion in [97].
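In order to obtain uniqueness of the vector potential A, it is necessary to introduce
appropriate gauge conditions and boundary conditions. A common gauge
condition is the Coulomb gauge
\[
\operatorname{div} A = 0 \quad \text{in } \Omega. \tag{2.13}
\]
Let n denote the exterior unit normal at the computational domain’s boundary ∂Ω,
then common boundary conditions are Dirichlet boundary conditions
\[
A \cdot n = 0 \quad \text{on } \Gamma_D \subset \partial\Omega, \tag{2.14}
\]
and Neumann boundary conditions
\[
\mu^{-1} (\operatorname{curl} A) \times n = 0 \quad \text{on } \Gamma_N \subset \partial\Omega, \tag{2.15}
\]
where it is assumed that Γ_D ∪ Γ_N ≡ ∂Ω. If we assume a simply-connected computational
domain Ω, then, based on (i) in (2.11) – and, analogously, based on (i)
in (2.12) –, one can wrap up the previous pieces of information in a so-called strong
formulation
\[
\begin{cases}
\operatorname{curl}(\mu^{-1} \operatorname{curl} A) + \sigma \operatorname{grad} \varphi + j\omega\sigma A
  = J_{\mathrm{src}} & \text{in } \Omega, \\
\operatorname{div} A = 0 & \text{in } \Omega, \\
A \cdot n = 0 & \text{on } \Gamma_D, \\
\mu^{-1} (\operatorname{curl} A) \times n = 0 & \text{on } \Gamma_N.
\end{cases}
\tag{2.16}
\]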
It is customary that due to, for instance, rotational or translational symmetry (see,
e.g., (ii) in Fig. 1.3), a two-dimensional setting is applied. Hence, the formulation
in (2.16) has to be adapted accordingly. For further discussion on this adaptation, I
refer to [178, p. 17f], [30, p. 11f], or, more generally, [83], [32].
2.2 Numerical simulation of the magnetoquasistatic model
As mentioned at the very beginning of the chapter, instead of a thorough numerical
analysis of the model, the primary concern is rather to utilize the methodological and
terminological guidance by basic concepts from functional analysis in the discussion
of the numerical simulation of the model. Hence, let us discuss abstractly the weak
formulation and its numerical approximation; and let us conclude the section by an
exposition of a parametric mathematical model.
2.2.1 The weak formulation
A strong formulation such as in (2.16) is the starting point of the numerical simulation
of the magnetoquasistatic model. However, due to continuity issues, the
solvability of the given problem in (2.16) is not generally guaranteed. Starting from
the strong formulation, a weak formulation has to be derived by formally multiplying
(2.16) with a test function v as a member of a Hilbert space V(Ω) and integrating
over the space region Ω. Under certain conditions, a solution to the weak formulation
is a solution to the strong formulation.
The weak formulation is a means to show that the problem in (2.16) is well-posed
in the sense of Hadamard, i.e., there exists a solution that is unique and that depends
continuously on the given data (e.g., boundary conditions or source). Furthermore,
the weak formulation is used for a finite element numerical approximation.
For the model-specific technicalities regarding the weak formulation of (2.16), I
refer to, e.g., [179, ch. 6] because, hereafter, I illuminate the weak formulation merely
abstractly in a Hilbert space setting.
First, let us set V ∶= V(Ω) and encode the magnetic vector potential A by the solution
function u which is a member of the Hilbert space W ∶= W(Ω). Second, looking
ahead to the numerical approximation by the finite element method considered as
a special case of the Ritz-Galerkin method (see, e.g., [32, p. 73], [218, p. 45]), we set
the solution function space W equal to the test function space V, i.e., W ∶= V. Third,
assuming that the Hilbert space’s underlying field is ℝ, let the map a ∶ V × V → ℝ be
the bilinear form, and let the map l ∶ V → ℝ be the linear form. Finally, one can state
the weak formulation abstractly as
\[
\text{find } u \in V \text{ such that } \forall v \in V.\ a(u, v) = l(v). \tag{2.17}
\]
Remark 2.2.1. In (2.17), the Hilbert space V has to provide a notion of weak derivatives,
thus, V has to be a Sobolev space (see, e.g., [218, p. 419f]). For instance, the corresponding
spaces in (2.7) are considered as Sobolev spaces.
The boundary conditions are incorporated in the weak formulation, and, conventionally,
the excitation is incorporated in the linear form. If the bilinear form satisfies certain
requirements such as boundedness and coercivity and if the linear form is bounded as well, then
the weak formulation is well-posed.
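For orientation, the requirements mentioned in Remark 2.2.1 can be written out as follows. This is the standard statement of the Lax–Milgram lemma for a real Hilbert space V; the constants M and α are generic symbols introduced here for illustration and do not appear elsewhere in the present work.

```latex
\begin{align*}
&\exists M > 0.\ \forall u, v \in V.\ \lvert a(u, v) \rvert \le M \lVert u \rVert_V \lVert v \rVert_V
  && \text{(boundedness of } a\text{)}, \\
&\exists \alpha > 0.\ \forall v \in V.\ a(v, v) \ge \alpha \lVert v \rVert_V^2
  && \text{(coercivity of } a\text{)}, \\
&\forall v \in V.\ \lvert l(v) \rvert \le \lVert l \rVert_{V'} \lVert v \rVert_V
  && \text{(boundedness of } l\text{)}, \\
&\Longrightarrow\ \exists!\, u \in V.\ \forall v \in V.\ a(u, v) = l(v),
  \qquad \lVert u \rVert_V \le \tfrac{1}{\alpha} \lVert l \rVert_{V'} .
\end{align*}
```

The final estimate expresses the continuous dependence of the solution on the data in the sense of Hadamard.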
Let us consider two restatements of (2.17). By observing that the linear form l is
a member of V′, the dual of V, one can introduce the so-called natural pairing,
a non-degenerate bilinear map ⟨⋅,⋅⟩ ∶ V′ × V → ℝ such that l(v) = ⟨l, v⟩.
Hence, the first restatement of (2.17) is
\[
\text{find } u \in V \text{ such that } \forall v \in V.\ a(u, v) = \langle l, v \rangle. \tag{2.18}
\]
A benefit of this presentation is that it aids the conceptual distinction of the
various quantities involved, since this distinction can be overlooked when moving
from the infinite-dimensional to the finite-dimensional case.
A second restatement of (2.17) is achieved if we only partially evaluate the bilinear
form a regarding the first argument such that a(u, ⋅) ∶ V → ℝ. One can observe
that the map a(u, ⋅) and the linear form l are members of V′. By introducing a map L
such that L = u ↦ a(u, ⋅) ∶ V → V′, one can restate (2.17) as
\[
\text{find } u \in V \text{ such that } \forall v \in V.\ (Lu)(v) = l(v), \tag{2.19}
\]
where, by omitting additional brackets such as (L(u))(v), the conventional order of
evaluation is assumed.
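The partial evaluation behind (2.19) corresponds to currying in programming terms. The following minimal sketch assumes a two-dimensional stand-in for V, a hypothetical matrix `A` that induces the bilinear form, and a hypothetical right-hand side; it only illustrates that L(u) is itself a map on V.

```python
from typing import Callable, List

Vector = List[float]
Functional = Callable[[Vector], float]   # stands in for an element of V'

# Hypothetical 2x2 matrix that induces the bilinear form a(u, v) = v^T A u.
A = [[2.0, -1.0],
     [-1.0, 2.0]]


def a(u: Vector, v: Vector) -> float:
    """Bilinear form on the stand-in space V = R^2."""
    return sum(v[i] * A[i][j] * u[j] for i in range(2) for j in range(2))


def L(u: Vector) -> Functional:
    """Partial evaluation in the first argument: L = u |-> a(u, .)."""
    return lambda v: a(u, v)


def l(v: Vector) -> float:
    """Linear form l(v) = b^T v with the illustrative right-hand side b = (1, 0)."""
    return 1.0 * v[0] + 0.0 * v[1]


u = [2.0 / 3.0, 1.0 / 3.0]               # solves A u = b for b = (1, 0)
basis = [[1.0, 0.0], [0.0, 1.0]]
# By linearity, testing (L u)(v) = l(v) on a basis of V is sufficient.
residuals = [L(u)(v) - l(v) for v in basis]
```

In this finite-dimensional setting, the statement "∀v ∈ V" reduces to a check on the basis vectors, which is the mechanism exploited by the Ritz-Galerkin method.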
Conceiving the map L from (2.19) as a member of the collection hom(V, V′), that
is, the collection of all structure-preserving maps from V to V′, one can represent,
e.g., a homogeneous partial differential equation (PDE) by the equation L(u) = 0
(cf. [96, p. 2]). Hence, compared to (2.17), a benefit of the presentation in (2.19) is
the more explicit representation of the mathematical model involved. Examining
parametric mathematical models in the last subsection, another benefit becomes ap-
parent by the logical connection between the parametric mathematical model and
its corresponding weak formulation.
In application, the solution function u is especially utilized to determine an
observable physical quantity. Such a quantity is formally encoded in the so-called quantity
of interest which is denoted by a non-linear functional Q ∶ V → ℝ. For instance,
one can express a quantity of interest with an appropriate norm on a space region
under investigation Ω_i ⊂ Ω (cf. [211, p. 6]) such that one can write
\[
Q = u \mapsto \lVert u \rVert_{\Omega_i} : V \to \mathbb{R}. \tag{2.20}
\]
If one denotes another functional by q ∶ V → ℝ (cf. [178, p. 35]), one can also
represent Q in the form
\[
Q = u \mapsto \int_{\Omega_i} q(u)\, \mathrm{d}x : V \to \mathbb{R}. \tag{2.21}
\]
In the magnetoquasistatic model, two common interpretations of the evaluated
quantity of interest Q(u) are the magnetic energy and the power loss.
The notion of a quantity of interest is also relevant in the context of error quan-
tification regarding high-fidelity models and corresponding low-fidelity models. We
take a closer look at these models in the next chapters.
The authors in [160] discuss the estimation of errors in quantities of interest of
two related solution functions u ∈ V and u_0 ∈ V, embedding their discussion
within the topic of model validation.
The solution function u is determined by (2.17), which represents a high-fidelity
model; the solution function u_0 is determined by using a different bilinear form a_0
in (2.17), which represents a low-fidelity model. A conceivable distinction between
the models lies, e.g., in the different modeling of the material properties. Hence, the
error regarding the solution function E(u) ∈ ℝ_+ and the modeling error E(Q) ∈ ℝ_+
can be defined as
\begin{align}
E(u) &:= \lVert u - u_0 \rVert_V, \tag{2.22a} \\
E(Q) &:= \lVert Q(u) - Q(u_0) \rVert_{l^2}, \tag{2.22b}
\end{align}
where ‖⋅‖_V denotes an appropriate norm on V and ‖⋅‖_{l^2} denotes the standard l^2-norm.
In (2.22b), choosing the absolute-value norm |⋅| instead of the standard l^2-norm is
possible as well.
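The two error measures in (2.22) can be illustrated on sampled functions. The following sketch rests on several assumptions: the interval (0, 1) as space region, a discrete L2 norm computed with the trapezoidal rule as a stand-in for the norm on V, a norm-type quantity of interest as in (2.20), and two arbitrary stand-in functions for the high-fidelity and the low-fidelity solution.

```python
import math


def l2_norm(values, h):
    """Discrete L2 norm via the composite trapezoidal rule on a uniform grid."""
    squares = [v * v for v in values]
    integral = h * (sum(squares) - 0.5 * (squares[0] + squares[-1]))
    return math.sqrt(integral)


n = 100
h = 1.0 / n
xs = [i * h for i in range(n + 1)]
u = [x * (1.0 - x) for x in xs]                   # stand-in "high-fidelity" solution
u0 = [0.25 * math.sin(math.pi * x) for x in xs]   # stand-in "low-fidelity" solution


def Q(values):
    """Norm-type quantity of interest in the spirit of (2.20)."""
    return l2_norm(values, h)


error_u = l2_norm([p - q for p, q in zip(u, u0)], h)   # analogue of (2.22a)
error_Q = abs(Q(u) - Q(u0))                            # analogue of (2.22b)
```

For a norm-type quantity of interest, the reverse triangle inequality gives `error_Q <= error_u`, so the modeling error cannot exceed the error regarding the solution function in this particular setting.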
Note that, in (2.22), it is assumed that both solutions are members of the same
space V. However, in the more generic setting of surrogate optimization, error
estimates or error bounds for (2.22) might not exist. There are various situations in
which the quantity of interest to be compared has to be represented by two different
linear functionals. Some examples are: if u is determined in a three-dimensional
space region and u_0 is determined in a two-dimensional space region; or if the
space region under investigation (cf. (2.20)) may exhibit different topological
properties (see, e.g., Fig. 1.3); or if different numerical methods are employed. In such a
generic setting, a comparison relying merely on the real number E(Q) conceals the
characters of the models under investigation and their relationships. The category
theoretical language in ch. 4 provides tools to express formally at least parts of these
characters and relationships.
2.2.2 Numerical approximation
Recalling (2.16), the main focus of the exposition is on the time-harmonic case, thus,
let us solely pay attention to the spatial discretization in the context of the finite
element method.
The initial step of this method is the simplicial triangulation T_h of the space
region Ω, i.e., the space region is spatially subdivided into a collection of tetrahedra.
Notice that if Ω ⊂ ℝ^2, then the triangulation T_h is a subdivision of Ω into a collection
of triangles. Let us refer to h as the mesh size parameter.
The next step is to choose a family of finite dimensional subspaces V_h of V such
that one can seek the discrete solution u_h ∈ V_h by solving the discrete problem of the
weak formulation in (2.17), more precisely,
\[
\text{find } u_h \in V_h \text{ such that } \forall v \in V_h.\ a_h(u_h, v) = l_h(v), \tag{2.23}
\]
where the map a_h ∶ V_h × V_h → ℝ denotes a bilinear form and the map l_h ∶ V_h → ℝ
denotes a linear form. Note that, recalling § 2.2.1, it is tacitly assumed that
the spaces’ underlying field is ℝ. However, technically speaking, the formulation
in (2.16) requires considering the field of complex numbers ℂ, which, in turn,
demands invoking the notion of a sesquilinear form and an anti-linear form. For the
sake of exposition, let us not dwell on these specific technicalities and their
implications, though.
By choosing an appropriate basis of V_h, the corresponding matrix representation
of (2.23) expresses the computation of u_h by solving a system of linear equations. For
the construction of the finite element subspaces by associating each element of T_h
with shape functions and degrees of freedom, I refer to, e.g., [6, p. 82f].
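To illustrate how (2.23) turns into a system of linear equations, the following sketch swaps in a much simpler model problem than (2.16): the one-dimensional Poisson problem −u″ = 1 on (0, 1) with homogeneous Dirichlet conditions, discretized by piecewise linear (hat) basis functions over the real numbers. The function name and the choice of the Thomas algorithm as solver are assumptions for this illustration only.

```python
def solve_poisson_1d(n_elements):
    """P1 finite elements for -u'' = 1 on (0, 1) with u(0) = u(1) = 0."""
    h = 1.0 / n_elements
    n = n_elements - 1                  # number of interior nodes (needs n >= 2)
    diag = [2.0 / h] * n                # a_h(phi_i, phi_i)
    off = [-1.0 / h] * (n - 1)          # a_h(phi_i, phi_{i+1})
    rhs = [h] * n                       # l_h(phi_i) = integral of phi_i

    # Thomas algorithm: forward elimination for the tridiagonal system.
    for i in range(1, n):
        factor = off[i - 1] / diag[i - 1]
        diag[i] -= factor * off[i - 1]
        rhs[i] -= factor * rhs[i - 1]

    # Back substitution.
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (rhs[i] - off[i] * u[i + 1]) / diag[i]

    return [0.0] + u + [0.0]            # add the Dirichlet boundary values


nodal_values = solve_poisson_1d(8)
```

For this particular model problem, the nodal values coincide with the exact solution u(x) = x(1 − x)/2 up to rounding, which makes the sketch easy to check. The curl-curl problem in (2.16) requires edge-based finite element spaces and complex arithmetic and is not covered by this illustration.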
To construct a convergent numerical method that approximates properly the
solution u, the family of finite dimensional subspaces V_h has to fulfill certain
properties (cf. [6, p. 55ff]). For a more elaborate discussion on the consistency, stability,
and convergence of numerical methods, see, e.g., [7].
To close this paragraph, let us look closer at a structural property that is related
to the detour in § 2.1.2.
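Detour 2: a structural perspective on another structural property. Recall the
diagrammatic presentation of the algebraic expression in (2.7). I have argued that
such a diagram is significant since it provides guidance as one moves from the
continuous to the discrete representation. The mimicry of the continuous level’s
structural property at the discrete level can be encoded by the commuting diagram
of the form
\[
\begin{array}{ccccccc}
L^2_{\mathrm{grad}}(D) & \xrightarrow{\ \operatorname{grad}\ } & L^2_{\mathrm{curl}}(D)
  & \xrightarrow{\ \operatorname{curl}\ } & L^2_{\mathrm{div}}(D)
  & \xrightarrow{\ \operatorname{div}\ } & L^2(D) \\
\big\downarrow {\scriptstyle \pi_h^{\mathrm{grad}}} & & \big\downarrow {\scriptstyle \pi_h^{\mathrm{curl}}}
  & & \big\downarrow {\scriptstyle \pi_h^{\mathrm{div}}} & & \big\downarrow {\scriptstyle \pi_h} \\
L^2_{\mathrm{grad},h}(D) & \xrightarrow{\ \operatorname{grad}\ } & L^2_{\mathrm{curl},h}(D)
  & \xrightarrow{\ \operatorname{curl}\ } & L^2_{\mathrm{div},h}(D)
  & \xrightarrow{\ \operatorname{div}\ } & L^2_h(D)
\end{array}
\tag{2.24}
\]
where L^2_{grad,h}(D), L^2_{curl,h}(D), L^2_{div,h}(D), and L^2_h(D) denote the finite element
subspaces (see, e.g., [178, p. 21]); the differential operators behave polymorphically; and
the maps π_h^grad, π_h^curl, π_h^div, π_h indicate projections (see, e.g., [218, p. 401–405]).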
From a purely structural perspective, the essence of the algebraic expression
in (2.24) is that there are different spaces and maps equipped with a notion of com-
position in a certain context in which one can draw diagrams of the form
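\[
\begin{array}{ccccccc}
U & \xrightarrow{\ f_1\ } & V & \xrightarrow{\ f_2\ } & W & \xrightarrow{\ f_3\ } & X \\
\big\downarrow {\scriptstyle p_1} & & \big\downarrow {\scriptstyle p_2}
  & & \big\downarrow {\scriptstyle p_3} & & \big\downarrow {\scriptstyle p_4} \\
\tilde{U} & \xrightarrow{\ \tilde{f}_1\ } & \tilde{V} & \xrightarrow{\ \tilde{f}_2\ }
  & \tilde{W} & \xrightarrow{\ \tilde{f}_3\ } & \tilde{X}
\end{array}
\tag{2.25}
\]
which are commutative, i.e., they reflect the equality of the various paths. Mind that,
in a more generic context, diagrams of the form (2.25) are not necessarily commutative.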
Utilizing the intuition from the detour in § 2.1.2, one can regard the expression
in (2.25) as the syntax, whereas one can consider the expression in (2.24) as a pos-
sible semantics. Thus, this example hints more accurately at the style of reasoning
employed in the category theoretical language.
2.2.3 Parametric mathematical model
Up to this point, the solution function u is only considered within a space region Ω.
In application, though, one is additionally interested in a solution function that is
dependent on N_ξ parameters where N_ξ ∈ ℕ. These parameters are encoded in the
parameter point ξ ∈ X ⊂ ℝ^{N_ξ}. The map f = ξ ↦ u ∶ X → V encodes the parametric
solution function. The expression f(ξ) denotes the solution function for the parameter
point ξ.
The corresponding partial differential equation depends on u and on ξ such that
the map L has to be extended to L ∶ X × V → V′, which leads to L(ξ, u) = 0
(cf. § 2.2.1). Commonly (see, e.g., [96, p. 2f]), it is assumed that the corresponding
partial differential equation is well-posed, and ∀ξ ∈ X. ∃! f(ξ) ∈ V.
In the next step, let us adapt the weak formulation in (2.17) in order to state a
parametric weak formulation (cf. [94, p. 16]). Therefore, we have to extend the bilinear
form and the linear form to a ∶ V × V × X → ℝ and l ∶ V × X → ℝ, respectively. The
bilinearity and linearity are with respect to the V-related arguments. The parametric
weak formulation (aka strong-weak formulation) reads as
\[
\text{given } \xi \in X, \text{ find } f(\xi) \in V \text{ such that }
\forall v \in V.\ a(f(\xi), v, \xi) = l(v, \xi). \tag{2.26}
\]
Regarding the well-posedness of the formulation (2.26), one has to suppose the
requirements of the non-parametric case (2.19). For further details, I refer to, e.g., [94].
Recalling the strong formulation in (2.16), one can observe that the physical meaning
of the parameters originates from either the material, the geometry, or the source.
In general, the individual components of the parameter point ξ can have different
physical meanings.
Similarly to (2.21), let us introduce two functionals Q_ξ and Q̂_ξ: the parametric
quantity of interest Q_ξ ∶ V × X → ℝ and the reduced parametric quantity of
interest Q̂_ξ ∶ X → ℝ. Given the matching of the two functionals’ codomains such
that cod(Q̂_ξ) ≡ cod(Q_ξ), it is assumed that the evaluations of the functionals yield
the same numerical result, i.e.,
\[
\forall \xi \in X.\ \hat{Q}_{\xi}(\xi) = Q_{\xi}(f(\xi), \xi). \tag{2.27}
\]
The evaluated functional Q̂_ξ(ξ) can be interpreted as, for instance, the numerical
value of the magnetic energy – or the numerical value of the power loss – for certain
geometry parameters. Analogously, the evaluated functional Q_ξ(f(ξ), ξ) can be
interpreted as the numerical value of the magnetic energy or the power loss; however,
it emphasizes the role of the parametric solution function f as well.
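The relationship (2.27) between the two functionals can be mimicked in a finite-dimensional toy setting. In the following sketch, the parameter-dependent matrix `A`, the right-hand side `B`, and the energy-like quantity of interest are hypothetical choices; the solution map `f` is obtained by Cramer's rule for a 2×2 system and stands in for solving the parametric weak formulation (2.26).

```python
def A(xi):
    """Hypothetical parameter-dependent system matrix (positive definite for xi >= 0)."""
    return [[2.0 + xi[0], -1.0],
            [-1.0, 2.0 + xi[1]]]


B = [1.0, 1.0]   # illustrative, parameter-independent right-hand side


def f(xi):
    """Parametric solution map xi |-> u, here via Cramer's rule for A(xi) u = B."""
    (a11, a12), (a21, a22) = A(xi)
    det = a11 * a22 - a12 * a21
    return [(B[0] * a22 - a12 * B[1]) / det,
            (a11 * B[1] - a21 * B[0]) / det]


def Q(u, xi):
    """Parametric quantity of interest: energy-like value 0.5 * u^T A(xi) u."""
    m = A(xi)
    return 0.5 * sum(u[i] * m[i][j] * u[j] for i in range(2) for j in range(2))


def Q_hat(xi):
    """Reduced parametric quantity of interest, cf. (2.27)."""
    return Q(f(xi), xi)
```

The functional `Q` takes a solution and a parameter point, whereas `Q_hat` takes only the parameter point and hides the solution step. For example, `Q_hat([0.0, 0.0])` evaluates to 1.0 up to rounding, since f maps this parameter point to u = (1, 1).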
For further discussion on parametric mathematical models in the context of mag-
netoquasistatic Maxwell’s theory, I refer to the functional analytic setting, e.g., in [178],
and I refer to the differential geometric setting, e.g., in [174].
Remark 2.2.2. Since the functionals’ domains do not match, i.e., dom(Q̂_ξ) ≢ dom(Q_ξ), one
cannot conclude from (2.27) that the maps Q̂_ξ and Q_ξ are equal by function extensionality.
In principle, one should be cautious with the equality of the evaluated quantities of
interest such as in (2.27). For instance, consider the so-called magnetic energy functional
and the so-called magnetic coenergy functional. Their assignment rules are different and the
numerical results of their evaluations are only equal in the linear case (cf. [32, p. 194]).
Additionally, recall the example (E1) in § 1.3 regarding the loss computation of a three-dimensional
helical coil and the loss computation of a corresponding representation by toroids.
Let ξ comprise geometry parameters that are the same in some sense for both the coil and the
toroids, let Q̂_{ξ,1}(ξ) denote the loss of the coil, and let Q̂_{ξ,2}(ξ) denote the loss of the toroids,
respectively. Then, an elementary tool of comparison is to check
\[
\forall \xi \in X.\ \hat{Q}_{\xi,1}(\xi) =_{\mathbb{R}} \hat{Q}_{\xi,2}(\xi), \tag{2.28}
\]
where the notation =_ℝ indicates a test of equality of real numbers. If the statement in (2.28)
holds true, then one can conclude that the maps Q̂_{ξ,1} and Q̂_{ξ,2} are equal by function
extensionality, thus, Q̂_{ξ,1} =_{X→ℝ} Q̂_{ξ,2} – such that one can substitute one map for the other.
However, the general statement in (2.28) might be undecidable. For the sake of completeness,
it is unlikely that the maps are equal by function intensionality as well, because it is unlikely
that the internal definitions of the maps are equal. In (3.2) in ch. 3, we encounter a situation
similar to the statement in (2.28) from the perspective of approximation theory.
Mind that the previous considerations are relevant from a rather logical analysis viewpoint.
Especially, if one imagines other loss computations by representations of the helical coil
that correspond to it in some sense, then it becomes appealing to look out for further tools of
comparison at the map level. Putting an emphasis on the map level is a peculiarity of the
category theoretical language.
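The distinction between extensional and intensional equality in Remark 2.2.2 can be made concrete with two toy maps. In the following sketch, the functions `q1` and `q2` and the sampling-based check `agree_on_samples` are hypothetical; they have no relation to actual loss computations. A check on finitely many sample points can only support, but never prove, a statement of the form (2.28).

```python
def q1(x):
    """First toy map, defined via a squared binomial."""
    return (x + 1.0) ** 2


def q2(x):
    """Second toy map with a different internal definition (expanded form)."""
    return x * x + 2.0 * x + 1.0


def agree_on_samples(g1, g2, samples, tol=1e-12):
    """Test g1(x) = g2(x) on finitely many points, up to a relative tolerance."""
    return all(abs(g1(x) - g2(x)) <= tol * max(1.0, abs(g1(x))) for x in samples)


samples = [i / 10.0 for i in range(-50, 51)]
equal_on_samples = agree_on_samples(q1, q2, samples)
```

The two maps differ intensionally since their defining expressions are different, yet they agree on every sample point up to rounding. Replacing `q2` by a genuinely different map, e.g., x ↦ x ⋅ x, makes the check fail.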
2.3 Numerical optimization with the magnetoquasistatic model
Establishing the well-posedness property of a mathematical model is a demanding
task in its own right. This property is a prerequisite for any optimization
procedure that is built on top of it. Hence, in the present work, it is assumed that all
mathematical models under investigation are well-posed.
Let us begin the section by outlining some theoretical considerations and
limitations regarding the optimization theory with partial differential equations and its
finite dimensional formulation as a nonlinear optimization problem. A particular
feature of the optimization problems is that the evaluation of the objective function
or of the constraints or of both requires solving a PDE. In the present work, as
opposed to its treatment as an explicit equality constraint, the discrete version of the
PDE is only considered implicitly within a given optimization problem.
We end the section by an illustration of a subset of optimization test functions
and various types of optimization algorithms.
2.3.1 Optimization with a partial differential equation
The modern solution theory regarding optimization problems with partial differential
equations is in tandem with the modern solution theory regarding partial differential
equations; more precisely, it is rooted in the infinite-dimensional Banach space
setting and Hilbert space setting, respectively. However, since a thorough discussion
in such settings is out of the scope of the present work, let us only consider briefly
some aspects in order to be consistent with the abstract discussion in the previous
section, and to shine a light on some questions regarding optimization problems.
For an in-depth look at the infinite-dimensional case, I refer to, e.g., [96], [210].
Let us use the parametric mathematical model $L(\boldsymbol{\xi},u)=0$ from § 2.2.3 as a basis
for the investigation of the objective functional (or cost functional) $J\colon X\times V\to\mathbb{R}$.
Moreover, let us introduce a Hilbert space $R$ and a closed convex cone $K\subset R$ such
that one can define a map $C\colon X\times V\to R$ in order to encode an abstract inequality
constraint as $C(\boldsymbol{\xi},u)\in K$. Hence, one can define abstractly the following optimization
problem (cf. [96, p. 2])
$$\text{minimize}\quad J(\boldsymbol{\xi},u)\quad\text{over}\quad(\boldsymbol{\xi},u)\in X\times V, \tag{2.29a}$$
$$\text{subject to}\quad L(\boldsymbol{\xi},u)=0,\quad C(\boldsymbol{\xi},u)\in K, \tag{2.29b}$$
where it is assumed that the objective functional $J$ is sufficiently smooth. The evaluation
of the objective functional $J$ relies on solving accurately the discrete version of
the partial differential equation that is encoded in $L(\boldsymbol{\xi},u)=0$. Supposing the
well-posedness of the partial differential equation and using the map $f$ from § 2.2.3, one
can redefine the optimization problem as
$$\text{min.}\quad J(\boldsymbol{\xi},f(\boldsymbol{\xi}))\quad\text{over}\quad(\boldsymbol{\xi},f(\boldsymbol{\xi}))\in X\times V, \tag{2.30a}$$
$$\text{s.t.}\quad L(\boldsymbol{\xi},f(\boldsymbol{\xi}))=0,\quad C(\boldsymbol{\xi},f(\boldsymbol{\xi}))\in K. \tag{2.30b}$$
Analogously to (2.27), one can define the reduced objective functional $\hat{J}\colon X\to\mathbb{R}$ such
that $\hat{J}(\boldsymbol{\xi})=J(\boldsymbol{\xi},f(\boldsymbol{\xi}))$. Let us briefly consider an instance of the reduced objective
functional. For the sake of presentation, I replace the finite-dimensional space $X$ by
the infinite-dimensional space $\tilde{X}$ and introduce the variable $\tilde{\boldsymbol{\xi}}$ such that $\tilde{\boldsymbol{\xi}}\in\tilde{X}$. Hence,
the instantiated reduced objective functional $\hat{J}$ reads as
$$\hat{J}=\tilde{\boldsymbol{\xi}}\mapsto\frac{\alpha}{2}\,\|f(\tilde{\boldsymbol{\xi}})-y_d\|_A^2+\frac{\beta}{2}\,\|\tilde{\boldsymbol{\xi}}\|_B^2\colon\tilde{X}\to\mathbb{R}, \tag{2.31}$$
where $\alpha,\beta>0$ (with $0<\alpha+\beta$) indicate some fixed scalars for, e.g., weighting or
regularization purposes, $\|\cdot\|_A$ and $\|\cdot\|_B$ indicate some appropriate norms, and $y_d$ denotes
a fixed desired solution function. Objective functionals similar to (2.31) are under
investigation, for instance, in the context of $\tilde{\boldsymbol{\xi}}$ being the source current density. The
corresponding optimization problems deal with the optimal control of electromagnetic
fields (see, e.g., [211]).
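To make the structure of (2.31) concrete, the following Python sketch evaluates a reduced objective of this tracking type in a finite-dimensional toy setting. It is only an illustration under stated assumptions: the function `solve_state` is a hypothetical stand-in for the parameter-to-solution map $f$ (here a fixed 2×2 linear system instead of a discretized PDE), both norms are taken to be Euclidean, and the default weights are arbitrary.

```python
def solve_state(xi):
    """Hypothetical stand-in for the map f: solves A u = xi for a fixed
    2x2 matrix A by Cramer's rule (no PDE solver is involved here)."""
    a11, a12, a21, a22 = 4.0, 1.0, 1.0, 3.0
    det = a11 * a22 - a12 * a21
    u1 = (a22 * xi[0] - a12 * xi[1]) / det
    u2 = (a11 * xi[1] - a21 * xi[0]) / det
    return (u1, u2)


def reduced_objective(xi, y_d, alpha=1.0, beta=1e-2):
    """J_hat(xi) = alpha/2 * ||f(xi) - y_d||^2 + beta/2 * ||xi||^2,
    with Euclidean norms assumed for both terms."""
    u = solve_state(xi)  # every evaluation of J_hat triggers one "solve"
    misfit = sum((ui - yi) ** 2 for ui, yi in zip(u, y_d))
    penalty = sum(x * x for x in xi)
    return 0.5 * alpha * misfit + 0.5 * beta * penalty


if __name__ == "__main__":
    # xi = (5, 4) reproduces the desired state (1, 1); only the penalty remains.
    print(reduced_objective((5.0, 4.0), (1.0, 1.0)))
```

The sketch shows why the solver cost dominates: the objective itself is a few arithmetic operations, whereas each call hides one evaluation of the state map.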
From an application viewpoint, it is desirable to consider the role of (reduced)
parametric quantities of interest regarding the optimization problem in (2.30). There
are various possible combinations. For instance, the instantiated reduced objective
functional $\hat{J}$ in (2.31) does not necessarily encode a reduced parametric quantity of
interest $\hat{Q}_{\tilde{\boldsymbol{\xi}}}$ (see, e.g., [80]). Thus, the value $\hat{Q}_{\tilde{\boldsymbol{\xi}}}(\tilde{\boldsymbol{\xi}})$ would only be determined in a
post-optimization step. Nevertheless, it is conceivable to instantiate the reduced
2.3. Numerical optimization with the magnetoquasistatic model 25
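objective functional $\hat{J}$ as
$$\hat{J}=\tilde{\boldsymbol{\xi}}\mapsto\hat{Q}_{\tilde{\boldsymbol{\xi}}}(\tilde{\boldsymbol{\xi}})\colon\tilde{X}\to\mathbb{R}, \tag{2.32}$$
or if a desired value $Q_d\in\mathbb{R}$ is provided, then one can instantiate $\hat{J}$ as
$$\hat{J}=\tilde{\boldsymbol{\xi}}\mapsto\|\hat{Q}_{\tilde{\boldsymbol{\xi}}}(\tilde{\boldsymbol{\xi}})-Q_d\|_{l^2}^2\colon\tilde{X}\to\mathbb{R}. \tag{2.33}$$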
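If $N_{\hat{Q}}$ quantities of interest with $N_{\hat{Q}}\in\mathbb{N}$ are considered, then a possible instantiation
of $\hat{J}$ can be written as
$$\hat{J}=\tilde{\boldsymbol{\xi}}\mapsto\sum_{i=1}^{N_{\hat{Q}}}\eta_i\cdot_{\mathbb{R}}\hat{Q}_{\tilde{\boldsymbol{\xi}},i}(\tilde{\boldsymbol{\xi}})\colon\tilde{X}\to\mathbb{R}, \tag{2.34}$$
or, analogous to (2.33), if $N_{\hat{Q}}$ desired values $Q_{d,1},\dots,Q_{d,N_{\hat{Q}}}\in\mathbb{R}$ are provided, then
one can extend (2.34) to the expression
$$\hat{J}=\tilde{\boldsymbol{\xi}}\mapsto\sum_{i=1}^{N_{\hat{Q}}}\eta_i\cdot_{\mathbb{R}}\|\hat{Q}_{\tilde{\boldsymbol{\xi}},i}(\tilde{\boldsymbol{\xi}})-Q_{d,i}\|_{l^2}^2\colon\tilde{X}\to\mathbb{R}, \tag{2.35}$$
where $\eta_i>0$ (with, e.g., $\eta_1+\dots+\eta_{N_{\hat{Q}}}=1$) denote $N_{\hat{Q}}$ fixed weighting constants and $\cdot_{\mathbb{R}}$
indicates the standard multiplication on the real numbers. Additionally, one could
incorporate the evaluated quantities of interest in the constraints (2.30b) as well.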
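The weighted instantiation (2.35) can be sketched in a few lines of Python. The two quantities of interest below (`qoi_sum`, `qoi_product`) are purely hypothetical placeholders for $\hat{Q}_{\tilde{\boldsymbol{\xi}},i}$, and the scalar targets and weights are arbitrary illustrative values; the sketch only mirrors the algebraic form of the sum.

```python
def weighted_qoi_objective(xi, qois, targets, weights):
    """Evaluate sum_i eta_i * (Q_i(xi) - Q_{d,i})^2 for scalar-valued
    quantities of interest, mirroring the form of (2.35)."""
    if not (len(qois) == len(targets) == len(weights)):
        raise ValueError("qois, targets and weights must have equal length")
    if any(eta <= 0.0 for eta in weights):
        raise ValueError("all weights eta_i must be positive")
    return sum(eta * (q(xi) - q_d) ** 2
               for q, q_d, eta in zip(qois, targets, weights))


# Hypothetical quantities of interest, used for illustration only.
def qoi_sum(xi):
    return xi[0] + xi[1]


def qoi_product(xi):
    return xi[0] * xi[1]


if __name__ == "__main__":
    value = weighted_qoi_objective((1.0, 2.0), (qoi_sum, qoi_product),
                                   targets=(3.0, 0.0), weights=(0.5, 0.5))
    print(value)  # first target is met exactly, the second one is missed by 2
```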
From a solution theory viewpoint, there are a number of important questions
(cf. [96, p. 3f]) regarding the optimization problem in (2.30):
(a) One question is whether there exists an optimal argument for an optimal objective
functional value.
(b) An immediate second question is whether this optimal argument is unique.
(c) A third question is whether the optimal argument respects the constraints,
hence, whether the optimality conditions, the so-called Karush-Kuhn-Tucker
(KKT) conditions, are satisfied. Mostly, first-order necessary optimality
conditions are elaborated since the investigation of second-order necessary
and sufficient optimality conditions is harder.
(d) The fourth question is concerned with corresponding optimization algorithms.
Ideally, the algorithms respect the KKT conditions; thus, in the search for the
optimal solution, they rely on information about the first derivative (gradient)
or the second derivative (Hessian) of, e.g., the objective functional. Generally,
such algorithms are guaranteed to find local minimal objective function values
of the problem in (2.30). Under certain conditions such as, e.g., convexity, they
even find the global minimal objective function value.
For instance, in the case of linear elliptic partial differential equations such as in (2.16),
there is much understanding concerning the questions (a) - (d) in the context of the
optimization problem in (2.30). However, to the extent of my present understanding,
there is not yet a complete general solution theory regarding the consideration
of quantities of interest in different combinations in the optimization problem and
for different physical meanings of the parameters.
The above-mentioned algorithms’ tendency to find local minima inspires the
introduction of some kind of randomness in the search for a potential global minimum.
If the algorithms exhibit some kind of randomness in the search, let us label
them as stochastic; otherwise, let us label them as deterministic. For stochastic
algorithms, there is, to the best of my knowledge, no established theory dealing with
guaranteed optimal solutions of an optimization problem such as (2.30).
Finally, in order to solve numerically the optimization problem in (2.30), one has
to transform it into a nonlinear optimization problem, i.e., one has to move from the
infinite-dimensional case to the finite-dimensional case.
2.3.2 Nonlinear optimization problem
Transforming the infinite-dimensional optimization problem in (2.30) into a finite-
dimensional optimization problem leads to the identification of the spaces involved
with subspaces of the standard Euclidean space and the restriction to finitely many
equality constraints and inequality constraints. Then, one can define abstractly the
so-called nonlinear optimization problem (see, e.g., [158], [18], [35], [142]) as
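$$\text{min.}\quad j(\boldsymbol{\xi},f(\boldsymbol{\xi}))\quad\text{over}\quad(\boldsymbol{\xi},f(\boldsymbol{\xi}))\in\mathbb{R}^{N_\xi}\times\mathbb{R}^{n}, \tag{2.36a}$$
$$\text{s.t.}\quad\forall i\in D.\ l_i(\boldsymbol{\xi},f(\boldsymbol{\xi}))=0, \tag{2.36b}$$
$$\phantom{\text{s.t.}}\quad\forall i\in E.\ c_i(\boldsymbol{\xi},f(\boldsymbol{\xi}))\le 0, \tag{2.36c}$$
where $j\colon X_{\mathrm{ad}}\times V_{\mathrm{ad}}\to\mathbb{R}$ is the smooth objective function, $X_{\mathrm{ad}}\subseteq\mathbb{R}^{N_\xi}$ is the set of
admissible parameter points, $V_{\mathrm{ad}}\subseteq\mathbb{R}^{n}$ is the set of admissible parametric solution
functions for a given parameter point, $D$ is the set of indices for the equality
constraints $l_i\colon\mathbb{R}^{N_\xi}\times\mathbb{R}^{n}\to\mathbb{R}$, and $E$ is the set of indices for the inequality
constraints $c_i\colon\mathbb{R}^{N_\xi}\times\mathbb{R}^{n}\to\mathbb{R}$. Let us leave the arguments $\boldsymbol{\xi}$ and $f(\boldsymbol{\xi})$ unchanged since
the altered context is clear; thus, there is no risk of confusion. If one incorporates the
constraints into a set of admissible arguments $W_{\mathrm{ad}}:=X_{\mathrm{ad}}\times V_{\mathrm{ad}}$, then one can express
the optimization problem in (2.36) compactly as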
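$$\underset{(\boldsymbol{\xi},f(\boldsymbol{\xi}))\in W_{\mathrm{ad}}}{\text{min.}}\ j(\boldsymbol{\xi},f(\boldsymbol{\xi})). \tag{2.37}$$
Introducing a reduced objective function $\hat{j}\colon X_{\mathrm{ad}}\to\mathbb{R}$ such that $\hat{j}(\boldsymbol{\xi}):=j(\boldsymbol{\xi},f(\boldsymbol{\xi}))$ and
invoking set-builder notation to define the set of admissible parameter points $X_{\mathrm{ad}}$ as
$$X_{\mathrm{ad}}:=\{\boldsymbol{\xi}\in\mathbb{R}^{N_\xi}\colon\ \forall i\in D.\ l_i(\boldsymbol{\xi},f(\boldsymbol{\xi}))=0\ \land\ \forall i\in E.\ c_i(\boldsymbol{\xi},f(\boldsymbol{\xi}))\le 0\}, \tag{2.38}$$
one can state the reduced optimization problem compactly as
$$\underset{\boldsymbol{\xi}\in X_{\mathrm{ad}}}{\text{min.}}\ \hat{j}(\boldsymbol{\xi}), \tag{2.39}$$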
where, technically, one can assume that there are reduced constraint functions $l_i(\boldsymbol{\xi})$
and $c_i(\boldsymbol{\xi})$ such that $l_i(\boldsymbol{\xi}):=l_i(\boldsymbol{\xi},f(\boldsymbol{\xi}))$ and $c_i(\boldsymbol{\xi}):=c_i(\boldsymbol{\xi},f(\boldsymbol{\xi}))$. A decisive property of
the class of nonlinear optimization problems such as in (2.36), in (2.37), and in (2.39),
respectively, is that the evaluation of the objective function or of the constraints or
of both requires the solving of the discrete version of a partial differential equation.
In the present work, however, the discrete version of $L(\boldsymbol{\xi},f(\boldsymbol{\xi}))=0$ is only implicitly
considered (see, e.g., [3]) as opposed to its treatment as an explicit equality constraint
(see, e.g., [96] or [210]).
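The implicit treatment can be pictured as follows: the optimizer only ever sees functions of $\boldsymbol{\xi}$, and the state equation is hidden inside a solver call that both the reduced objective and the reduced constraints share. The Python sketch below illustrates this under loud assumptions: `solve_state` is a hypothetical cheap stand-in for the discrete solver, the objective and the inequality constraint are invented for illustration, and caching via `functools.lru_cache` is just one possible way to avoid solving twice for the same parameter point.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def solve_state(xi):
    """Hypothetical stand-in for the discrete solver, i.e. the map f.
    The parameter point xi must be a tuple so that it can be cached."""
    return (xi[0] + xi[1], xi[0] * xi[1])


def reduced_objective(xi):
    """j_hat(xi) := j(xi, f(xi)); the state never appears as a variable."""
    u = solve_state(xi)
    return (u[0] - 3.0) ** 2 + (u[1] - 2.0) ** 2


def reduced_inequality(xi):
    """c(xi) := c(xi, f(xi)); admissible points satisfy c(xi) <= 0."""
    u = solve_state(xi)
    return u[0] - 4.0


def is_admissible(xi):
    """Membership test for a set of the form (2.38), inequality part only."""
    return reduced_inequality(xi) <= 0.0


if __name__ == "__main__":
    xi = (1.0, 2.0)
    print(is_admissible(xi), reduced_objective(xi))
    print(solve_state.cache_info().misses)  # number of actual solver calls
```

In this toy setting, checking admissibility and evaluating the objective at the same point costs a single solver call, which is the property one would want when the solver is an expensive simulation.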
Regarding the above-mentioned class of nonlinear optimization problems, the
fundamental assumption is that the numerical simulation of the corresponding math-
ematical model – in our case, the magnetoquasistatic model – dominates the overall
computational costs of the optimization procedure. This assumption inspires the
use of so-called low-fidelity mathematical models in order to reliably accelerate the
optimization procedure. Notice that the low-fidelity models are implicitly associated
with low computational costs. Chapter 3 is devoted to the discussion of
optimization schemes using low-fidelity models.
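In ch. 3, we discuss a particular class of optimization schemes that follow
the so-called space-mapping paradigm (see, e.g., [125, p. 47]). Within this class, there is
an emphasis on a reduced objective function $\hat{\hat{j}}\circ f\colon X_{\mathrm{ad}}\to\mathbb{R}$ where $\hat{\hat{j}}\colon V_{\mathrm{ad}}\to\mathbb{R}$ and the
condition $\hat{\hat{j}}(f(\boldsymbol{\xi}))=j(\boldsymbol{\xi},f(\boldsymbol{\xi}))$ is applied. Considering (2.39), the condition
$\hat{\hat{j}}(f(\boldsymbol{\xi}))=\hat{j}(\boldsymbol{\xi})$ is supposed, too. Finally, one can state the reduced optimization problem as
$$\underset{\boldsymbol{\xi}\in X_{\mathrm{ad}}}{\text{min.}}\ (\hat{\hat{j}}\circ f)(\boldsymbol{\xi}), \tag{2.40}$$
where the composition operator $\circ\colon\hom(V_{\mathrm{ad}},\mathbb{R})\times\hom(X_{\mathrm{ad}},V_{\mathrm{ad}})\to\hom(X_{\mathrm{ad}},\mathbb{R})$ is
assumed.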
Confronted with the optimization problems in (2.37), in (2.39), and in (2.40), let
us apply briefly the structural perspective from the detours in § 2.1.2 and in § 2.2.2.
We discuss this perspective more thoroughly in ch. 4.
Detour 3: a structural perspective on the objective functions. Recall the diagrammatic
presentation of the abstract algebraic expressions in (2.8) and in (2.25).
Using this style of presentation in the context of the optimization problems in (2.37),
in (2.39), and in (2.40) results in, among other aspects, stressing the function signatures
of the involved objective functions. Therefore, the objective function in (2.37)
can be expressed by its assignment rule together with its signature, i.e.,
$$j=(\boldsymbol{\xi},f(\boldsymbol{\xi}))\mapsto j(\boldsymbol{\xi},f(\boldsymbol{\xi}))\colon X_{\mathrm{ad}}\times V_{\mathrm{ad}}\to\mathbb{R}, \tag{2.41}$$
the objective function in (2.39) can be expressed by its assignment rule together with
its signature, i.e.,
$$\hat{j}=\boldsymbol{\xi}\mapsto\hat{j}(\boldsymbol{\xi})\colon X_{\mathrm{ad}}\to\mathbb{R}, \tag{2.42}$$
and the objective function in (2.40) can be expressed by its assignment rule together
with its signature, i.e.,
$$\hat{\hat{j}}\circ f=\boldsymbol{\xi}\mapsto(\hat{\hat{j}}\circ f)(\boldsymbol{\xi})\colon X_{\mathrm{ad}}\to\mathbb{R}. \tag{2.43}$$
Recalling (2.28), an elemental tool of comparison for (2.42) and (2.43) is to check
$$\forall\boldsymbol{\xi}\in X_{\mathrm{ad}}.\ \hat{j}(\boldsymbol{\xi})=_{\mathbb{R}}(\hat{\hat{j}}\circ f)(\boldsymbol{\xi}). \tag{2.44}$$
If the statement in (2.44) holds true, then one can conclude that the maps $\hat{j}$ and $\hat{\hat{j}}\circ f$
are equal by function extensionality, thus, $\hat{j}=_{X_{\mathrm{ad}}\to\mathbb{R}}\hat{\hat{j}}\circ f$ – such that one can substitute
one map for the other. However, from a purely structural perspective, the essence
of the previously designated objective functions can be captured by four distinct
objects $A$, $B$, $C$, and $A\times B$ and by five distinct maps $j_0$, $j_1$, $j_2$, $f_2$, and $j_2\circ f_2$ such that one
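can draw three diagrams
$$A\times B\xrightarrow{\;j_0\;}C,\qquad A\xrightarrow{\;j_1\;}C,\qquad A\xrightarrow{\;f_2\;}B\xrightarrow{\;j_2\;}C\quad\text{with composite}\quad A\xrightarrow{\;j_2\circ f_2\;}C. \tag{2.45}$$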
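On a computer, the statement in (2.44) can only be probed at finitely many points. The following Python sketch illustrates such a pointwise comparison of $\hat{j}$ and $\hat{\hat{j}}\circ f$; the concrete maps `f`, `j`, and `j_hat_hat` are hypothetical toy choices, and agreement on samples is merely a plausibility check (a disagreement refutes equality, whereas agreement does not prove it).

```python
import random


def f(xi):
    """Hypothetical parameter-to-solution map A -> B."""
    return (xi[0] + xi[1], xi[0] * xi[1])


def j(xi, u):
    """Toy objective on A x B that happens to depend on u only."""
    return (u[0] - 3.0) ** 2 + (u[1] - 2.0) ** 2


def j_hat(xi):
    """Reduced objective A -> C, cf. (2.42)."""
    return j(xi, f(xi))


def j_hat_hat(u):
    """Objective on the solution space B -> C, cf. (2.43)."""
    return (u[0] - 3.0) ** 2 + (u[1] - 2.0) ** 2


def compose(outer, inner):
    """Composition operator: returns the map x -> outer(inner(x))."""
    return lambda x: outer(inner(x))


def agree_on_samples(g, h, samples, tol=1e-12):
    """Pointwise comparison of two maps on a finite set of sample points."""
    return all(abs(g(x) - h(x)) <= tol for x in samples)


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(rng.uniform(-2, 2), rng.uniform(-2, 2)) for _ in range(100)]
    print(agree_on_samples(j_hat, compose(j_hat_hat, f), pts))
```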
The benefits of such diagrams are diverse:
(1) Such diagrams disclose pictorially the different decisions for formulating the
objective function at the signature level. Hence, they reflect in some sense the
available information of the problem at hand.
(2) From the viewpoint of syntax and semantics (recall the detours in § 2.1.2 and
in § 2.2.2), such diagrams’ level of abstraction is particularly useful as a unifying
guiding tool. For instance, the map $f_2$ can be a parametric quantity of interest
and the map $j_2$ can be chosen such that the map $j_2\circ f_2$ encodes an objective
function similar to (2.33).
(3) Especially in the wider context of validation and verification (see, e.g., [178,
p. 11ff] and some references therein such as, e.g., [12], [182], and [160]), the di-
agrammatic presentation is helpful as a formal organizing tool when we con-
sider various models of various fidelity at the signature level.
2.3.3 Optimization algorithms
In order to solve the optimization problem in (2.37), in (2.39) or in (2.40) by means
of a computer, we have to apply an appropriate optimization algorithm which seeks
iteratively the solution. It is intricate to define a generally accepted taxonomy of
optimization algorithms (see, e.g., [158, p. 422]) as it is intricate to define a gener-
ally accepted taxonomy of optimization problems (see, e.g., the internet website of
the NEOS Server [84]). Therefore, it depends on the user to select an appropriate
algorithm for a given problem (cf. [158, p. 2]).
Another user-dependent decision is a software-related issue, more precisely, the
choice of the programming language (PL) in which to implement the algorithm. Two
dynamically-checked programming languages – and corresponding libraries – are
employed: the well-known MATLAB® PL and the relatively young Julia PL. A thorough
discussion of this software-related issue is out of the scope of the present work.
Nevertheless, two of my reasons to utilize the Julia PL are:
(1) its promising outlook to reconcile performance issues and productivity issues;
(2) and its expressive type system and support of functional programming idioms.
The rationale behind point (1) is the observation that performance issues are
predominantly discussed at the algorithm level; thus, performance issues at the
program level are rarely tackled explicitly in the literature. However, in order to
comprehensively assess the Julia PL’s capabilities and limitations in comparison to other
programming languages, many more benchmarks of test problems from
various sources are needed (see, e.g., [172]).
The rationale behind point (2) is the observation that the category theoretical lan-
guage (which we encounter in ch. 4) is closely related to type theory and functional
programming languages. However, the expressiveness of the Julia PL’s type system
and the range of its functional programming features are not sufficient to fully match
the category theoretical language.
Let us utilize the Julia PL to discuss some widely used optimization algorithms.
In our discussion, let us take a pragmatic viewpoint in the sense that we leave the
details about the algorithms to the corresponding references and we describe concisely
their behavior regarding a subset of test functions of form $f=(x_1,x_2)\mapsto f(x_1,x_2)\colon U\to\mathbb{R}$
which are at least members of the differentiability class $C^1$ on
the open set $U\subset\mathbb{R}^2$. The test functions are: Ackley, the Unit sphere, Booth, Rosenbrock,
Michalewicz, and the modified Branin (see, e.g., [116], [70], [202]). These test
functions (see Table 2.1) cover various shapes that pose challenges for the algorithms.
Figure 2.2a provides the test functions’ surface representation and
Figure 2.2b provides the test functions’ contour representation together with a mark
of a global minimum $(x_1^*,x_2^*)$. Additionally, Figure 2.3 provides a close-up of the
neighborhood of the test functions’ global minimum.
TABLE 2.1: Test functions of form $f=(x_1,x_2)\mapsto f(x_1,x_2)\colon U\to\mathbb{R}$.

| Test function $f$ | Definition $f(x_1,x_2)$ | Global minimum $(x_1^*,x_2^*)$ |
|---|---|---|
| Ackley | $-20\exp\bigl(-0.2\sqrt{\tfrac{1}{2}\sum_{i=1}^{2}x_i^2}\bigr)-\exp\bigl(\tfrac{1}{2}\sum_{i=1}^{2}\cos(2\pi x_i)\bigr)+20+\exp(1)$ | $(0,0)$ s.t. $f(0,0)=0$ |
| Unit sphere | $\sum_{i=1}^{d=2}x_i^2$ | $(0,0)$ s.t. $f(0,0)=0$ |
| Booth | $(x_1+2x_2-7)^2+(2x_1+x_2-5)^2$ | $(1,3)$ s.t. $f(1,3)=0$ |
| Rosenbrock³ | $\sum_{i=1}^{2-1}\bigl[5(x_{i+1}-x_i^2)^2+(x_i-1)^2\bigr]$ | $(1,1)$ s.t. $f(1,1)=0$ |
| Michalewicz⁴ | $-\sum_{i=1}^{2}\sin(x_i)\sin^{20}\bigl(\tfrac{i\,x_i^2}{\pi}\bigr)$ | $(2.20,1.57)$ s.t. $f(2.20,1.57)=-1.801$ |
| Modified Branin⁵ | $1\cdot\bigl(x_2-\tfrac{5.1}{4\pi^2}x_1^2+\tfrac{5}{\pi}x_1-6\bigr)^2+10\cdot\bigl((1-\tfrac{1}{8\pi})\cos(x_1)+1\bigr)+5x_1$ | $(-3.689,13.630)$ s.t. $f(-3.689,13.630)=-16.644$ |

³ Often, the factor 100 is used instead of the factor 5. However, the factor 5 is employed analogously
to [116, p. 431].
⁴ Mind that the global minimum $(2.20,1.57)$ is just an approximation (see, e.g., [116, p. 430]).
⁵ The additional term $5x_1$ is the modification (cf. [70, p. 196]) to the Branin function (see, e.g., [116,
p. 427]). The global minimum $(-3.70,13.63)$ is just an approximation.

Let us set up the nonlinear optimization problem under test by choosing the
respective test function as the objective function and by defining the admissible
sets $F_1$ (also known as box constraints), $F_2$, and $F:=F_1\cap F_2$ such that
$$F_1:=\{(x_1,x_2)\in\mathbb{R}\times\mathbb{R}\colon\ x_{1,l}-x_1\le 0\ \land\ x_1-x_{1,u}\le 0\ \land\ x_{2,l}-x_2\le 0\ \land\ x_2-x_{2,u}\le 0\}, \tag{2.46a}$$
$$F_2:=\{(x_1,x_2)\in\mathbb{R}\times\mathbb{R}\colon\ (x_1-x_1^*)^2+(x_2-x_2^*)^2-r_0^2\le 0\}, \tag{2.46b}$$
where the point $(x_{1,l},x_{1,u})$ and the point $(x_{2,l},x_{2,u})$ represent the lower and upper
bound of the component $x_1$ and $x_2$, respectively; the point $(x_1^*,x_2^*)$ denotes an optimal
argument of the respective test function that is known a priori; and the scalar $r_0$
encodes the radius of a disk $D(M,r_0)$ with the midpoint $M:=(x_1^*,x_2^*)$. The scalar $r_0$
is set to 10, i.e., $r_0:=10$, and the lower and upper bounds are set such that the optimal
argument’s quadrant is captured, e.g., for the Rosenbrock function in Fig. 2.2b,
it holds that $(x_{1,l},x_{1,u})=(0,10)$ and $(x_{2,l},x_{2,u})=(0,10)$. In the cases in which the
optimal argument is the zero point (see (i) and (ii) in Fig. 2.2b), the first quadrant is
selected.
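For readers who wish to experiment, the definitions of Table 2.1 and the admissible sets in (2.46) can be transcribed directly. The Python sketch below is such a transcription and is not the Julia code used in this work; the helper names `in_box`, `in_disk`, and `in_F` are illustrative assumptions.

```python
import math


def ackley(x1, x2):
    r = math.sqrt(0.5 * (x1 * x1 + x2 * x2))
    c = 0.5 * (math.cos(2 * math.pi * x1) + math.cos(2 * math.pi * x2))
    return -20.0 * math.exp(-0.2 * r) - math.exp(c) + 20.0 + math.e


def unit_sphere(x1, x2):
    return x1 * x1 + x2 * x2


def booth(x1, x2):
    return (x1 + 2 * x2 - 7) ** 2 + (2 * x1 + x2 - 5) ** 2


def rosenbrock(x1, x2):
    # factor 5 instead of the frequently used factor 100 (see Table 2.1)
    return 5 * (x2 - x1 * x1) ** 2 + (x1 - 1) ** 2


def michalewicz(x1, x2):
    return -sum(math.sin(x) * math.sin(i * x * x / math.pi) ** 20
                for i, x in ((1, x1), (2, x2)))


def modified_branin(x1, x2):
    b, c, t = 5.1 / (4 * math.pi ** 2), 5 / math.pi, 1 / (8 * math.pi)
    return ((x2 - b * x1 ** 2 + c * x1 - 6) ** 2
            + 10 * ((1 - t) * math.cos(x1) + 1) + 5 * x1)


def in_box(x, lower, upper):
    """Membership in F1 from (2.46a): componentwise box constraints."""
    return all(lo <= xi <= up for xi, lo, up in zip(x, lower, upper))


def in_disk(x, center, r0=10.0):
    """Membership in F2 from (2.46b): disk of radius r0 around center."""
    return (x[0] - center[0]) ** 2 + (x[1] - center[1]) ** 2 - r0 ** 2 <= 0


def in_F(x, lower, upper, center, r0=10.0):
    """Membership in F := F1 intersected with F2."""
    return in_box(x, lower, upper) and in_disk(x, center, r0)


if __name__ == "__main__":
    print(booth(1, 3), rosenbrock(1, 1), michalewicz(2.20, 1.57))
    # Rosenbrock setting: box [0, 10] x [0, 10], disk centred at (1, 1)
    print(in_F((10.0, 10.0), (0.0, 0.0), (10.0, 10.0), (1.0, 1.0)))
```

In the Rosenbrock setting, the corner $(10,10)$ of the box lies outside the disk, which shows that $F$ is a proper subset of $F_1$ there.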
The test functions can be composed of generic Julia functions, hence, one can
employ the package ForwardDiff.jl (see [175]) in order to determine derivative
information by forward mode automatic differentiation (see, e.g., [81], [156]). Since
it is assumed that the six test functions in Table 2.1 are at least members of $C^1$, I
depict in Figure 2.4 the image of $U$ under the vector field $\operatorname{grad}f$, i.e., $\operatorname{grad}(f)(x_1,x_2)$
where $(x_1,x_2)\in U$, as a projection on the test functions’ contour representation (see
Figure 2.2b and Figure 2.3b).⁶
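The idea behind forward mode automatic differentiation can be illustrated with dual numbers: every value carries a directional derivative, and each arithmetic operation propagates both. The following Python sketch is a deliberately minimal illustration of that principle and does not reproduce the API of ForwardDiff.jl; the class `Dual` and the helper `grad` are hypothetical names, and only addition, subtraction, multiplication, and integer powers are supported.

```python
class Dual:
    """A value together with a directional derivative (forward-mode AD)."""

    def __init__(self, val, der=0.0):
        self.val, self.der = float(val), float(der)

    @staticmethod
    def lift(x):
        """Treat plain numbers as constants with zero derivative."""
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __sub__(self, other):
        o = Dual.lift(other)
        return Dual(self.val - o.val, self.der - o.der)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        o = Dual.lift(other)
        # product rule
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__

    def __pow__(self, n):
        # power rule, integer exponents only
        return Dual(self.val ** n, n * self.val ** (n - 1) * self.der)


def grad(f, x1, x2):
    """Gradient of f at (x1, x2): one forward pass per input direction."""
    d1 = f(Dual(x1, 1.0), Dual(x2, 0.0)).der
    d2 = f(Dual(x1, 0.0), Dual(x2, 1.0)).der
    return (d1, d2)


def booth(x1, x2):
    return (x1 + 2 * x2 - 7) ** 2 + (2 * x1 + x2 - 5) ** 2


def rosenbrock(x1, x2):
    return 5 * (x2 - x1 ** 2) ** 2 + (x1 - 1) ** 2


if __name__ == "__main__":
    print(grad(booth, 0.0, 0.0))       # gradient away from the minimum
    print(grad(rosenbrock, 1.0, 1.0))  # gradient at the global minimum
```

Because the test functions are written generically, the same function body works for plain floats and for dual numbers, which is the property the text above refers to as "generic functions".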
Mind that if one assumes that the domain $U$ is convex, that is, the condition
$$\forall p_1,p_2\in U.\ \forall t\in[0,1].\ (1-t)\,p_1+_{\mathbb{R}^2}t\,p_2\in U \tag{2.47}$$
holds; and if one assumes that, e.g., the condition
$$\forall p_1,p_2\in U.\ f(p_1)\ge f(p_2)+\operatorname{grad}(f)(p_2)\cdot_{\mathbb{R}^2}(p_1-_{\mathbb{R}^2}p_2) \tag{2.48}$$
holds as well, then one can conclude that a test function $f$ is convex.
However, for arbitrary domains and arbitrary functions in practical applications, in
most cases, one cannot exploit analytical examinations of the convexity property.
Furthermore, by means of numerical examinations, one can only test the condition
in (2.47) and the condition in (2.48) for some $p_1,p_2\in U$; hence, one can gain at most a
clue for convexity. In the present work, therefore, I refrain from elaborating on the
convexity property for all the functions under consideration.
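The asymmetry of such numerical examinations (a single violated pair refutes convexity, whereas many satisfied pairs only hint at it) can be made tangible with a small sampling experiment. The Python sketch below tests the first-order condition (2.48) on random pairs from a box; the function names, the sample size, and the tolerance are illustrative assumptions, and the gradients are coded by hand.

```python
import random


def sphere(p):
    return p[0] ** 2 + p[1] ** 2


def sphere_grad(p):
    return (2 * p[0], 2 * p[1])


def rosenbrock(p):
    return 5 * (p[1] - p[0] ** 2) ** 2 + (p[0] - 1) ** 2


def rosenbrock_grad(p):
    r = p[1] - p[0] ** 2
    return (-20 * p[0] * r + 2 * (p[0] - 1), 10 * r)


def first_order_gap(f, grad_f, p1, p2):
    """f(p1) - f(p2) - grad f(p2) . (p1 - p2); negative values violate (2.48)."""
    g = grad_f(p2)
    return f(p1) - f(p2) - sum(gi * (a - b) for gi, a, b in zip(g, p1, p2))


def find_violation(f, grad_f, lower, upper, trials=2000, seed=0, tol=1e-9):
    """Return a pair (p1, p2) violating (2.48), or None if none was sampled."""
    rng = random.Random(seed)
    for _ in range(trials):
        p1 = tuple(rng.uniform(lo, up) for lo, up in zip(lower, upper))
        p2 = tuple(rng.uniform(lo, up) for lo, up in zip(lower, upper))
        if first_order_gap(f, grad_f, p1, p2) < -tol:
            return (p1, p2)
    return None  # only a clue for convexity, never a proof


if __name__ == "__main__":
    box = ((-2.0, -2.0), (2.0, 2.0))
    print(find_violation(sphere, sphere_grad, *box))
    print(find_violation(rosenbrock, rosenbrock_grad, *box))
```

For instance, the pair $p_1=(0.5,1.5)$, $p_2=(0,1.5)$ already violates (2.48) for the Rosenbrock function with factor 5, so this function is not convex on $[-2,2]^2$.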
Assuming a map $s_1$ and a map $s_2$ that share the same signature
$C^1(U,\mathbb{R})\times U\to\mathbb{R}_+$, one can conceive these maps as local first-order sensitivity
measures if one defines their assignment rules as
$$s_1(f,p):=\bigl(\partial_{e_{x_1}}(f)(p)\bigr)^2,\qquad s_2(f,p):=\bigl(\partial_{e_{x_2}}(f)(p)\bigr)^2. \tag{2.49}$$
Hence, let us deploy a gradient-based interpretation of sensitivity measures (see,
e.g., [129]).⁷ In Figure 2.5, $s_1(f,p)$ and $s_2(f,p)$ are depicted w.r.t. Figure 2.3b.
⁶ The map grad in (2.7) is overloaded in the sense that grad is equipped with the signature
$(U\to\mathbb{R})\to(U\to\mathbb{R}^2)$. Given a point $p\in U$, one can extract the components of $\operatorname{grad}(f)(p)$, that is,
$\partial_{e_{x_1}}(f)(p)$ and $\partial_{e_{x_2}}(f)(p)$, by setting $\partial_{e_{x_1}}(f)(p):=\operatorname{grad}(f)(p)\cdot_{\mathbb{R}^2}e_{x_1}$ and by setting
$\partial_{e_{x_2}}(f)(p):=\operatorname{grad}(f)(p)\cdot_{\mathbb{R}^2}e_{x_2}$ where $\cdot_{\mathbb{R}^2}$ denotes the Euclidean inner product w.r.t. $\mathbb{R}^2$; and
$e_{x_1}$ and $e_{x_2}$ refer to the unit vectors w.r.t. $x_1$ and $x_2$, respectively.
⁷ For more details on gradient-based sensitivity measures such as, e.g., other possible definitions
than the definition in (2.49), I refer to [129] and references therein.
Exploiting the assignment rules in (2.49), one can define the maps $S_1$ and $S_2$ that
possess the same signature, that is, $C^1(U,\mathbb{R})\to\mathbb{R}_+$, whose assignment rules read as
$$S_1(f):=\int_U s_1(f,x)\,\mathrm{d}^2x,\qquad S_2(f):=\int_U s_2(f,x)\,\mathrm{d}^2x. \tag{2.50}$$
Thus, one can conceive the maps $S_1$ and $S_2$ as global first-order sensitivity measures.
In addition, one can define normalized global first-order sensitivity measures $S_1^N$ and $S_2^N$
whose assignment rules read as
$$S_1^N(f):=\frac{S_1(f)}{\sum_{i=1}^{2}S_i(f)},\qquad S_2^N(f):=\frac{S_2(f)}{\sum_{i=1}^{2}S_i(f)}, \tag{2.51}$$
where $\forall f.\ S_1^N(f)+S_2^N(f)=1.0$ holds in exact arithmetic.
By using the package HCubature.jl (see [105]), $S_1^N(f)$ and $S_2^N(f)$ in (2.51) can
be computed by means of numerical integration with regard to Figure 2.2b and
with regard to Figure 2.3b. In Table 2.2 and in Table 2.3, respectively, the
corresponding results are presented.⁸
TABLE 2.2: The normalized global first-order sensitivity measure $S_i^N$
with $i\in\{1,2\}$ evaluated at $f$ w.r.t. Figure 2.2b.

| | (i) | (ii) | (iii) | (iv) | (v) | (vi) |
|---|---|---|---|---|---|---|
| $S_1^N(f)$ | 0.5000 | 0.5000 | 0.4894 | 0.9965 | 0.3702 | 0.7234 |
| $S_2^N(f)$ | 0.5000 | 0.5000 | 0.5106 | 0.0035 | 0.6298 | 0.2766 |
| $\sum_{i=1}^{2}S_i^N(f)$ | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |

TABLE 2.3: The normalized global first-order sensitivity measure $S_i^N$
with $i\in\{1,2\}$ evaluated at $f$ w.r.t. Figure 2.3b.

| | (i) | (ii) | (iii) | (iv) | (v) | (vi) |
|---|---|---|---|---|---|---|
| $S_1^N(f)$ | 0.5000 | 0.5000 | 0.5000 | 0.9109 | 0.2595 | 0.9277 |
| $S_2^N(f)$ | 0.5000 | 0.5000 | 0.5000 | 0.0891 | 0.7405 | 0.0723 |
| $\sum_{i=1}^{2}S_i^N(f)$ | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
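The normalized measures in (2.51) can be approximated with elementary means. The Python sketch below replaces the adaptive cubature of HCubature.jl and the automatic differentiation of ForwardDiff.jl by two simpler techniques, a midpoint rule on a uniform grid and central finite differences; these substitutions, the grid size, and the step width are assumptions made for illustration, so only a few digits of agreement with the tables should be expected.

```python
def partials(f, x1, x2, h=1e-5):
    """Central finite-difference approximation of both partial derivatives."""
    d1 = (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)
    d2 = (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)
    return d1, d2


def normalized_sensitivities(f, box, n=200):
    """Midpoint-rule approximation of (S1^N, S2^N) from (2.51) on a box
    ((a1, b1), (a2, b2)) using n x n cells."""
    (a1, b1), (a2, b2) = box
    h1, h2 = (b1 - a1) / n, (b2 - a2) / n
    s1 = s2 = 0.0
    for i in range(n):
        x1 = a1 + (i + 0.5) * h1
        for k in range(n):
            x2 = a2 + (k + 0.5) * h2
            d1, d2 = partials(f, x1, x2)
            s1 += d1 * d1  # local measure s1(f, p) from (2.49)
            s2 += d2 * d2  # local measure s2(f, p) from (2.49)
    s1 *= h1 * h2
    s2 *= h1 * h2
    total = s1 + s2
    return s1 / total, s2 / total


def unit_sphere(x1, x2):
    return x1 * x1 + x2 * x2


def rosenbrock(x1, x2):
    return 5 * (x2 - x1 * x1) ** 2 + (x1 - 1) ** 2


if __name__ == "__main__":
    box = ((-2.0, 2.0), (-2.0, 2.0))  # domain of Figure 2.3b for (ii) and (iv)
    print(normalized_sensitivities(unit_sphere, box))
    print(normalized_sensitivities(rosenbrock, box))
```

On $[-2,2]^2$, this sketch should reproduce the entries of columns (ii) and (iv) of Table 2.3 up to the discretization error of the midpoint rule.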
Consulting Table 2.1, one can assert that the results in Table 2.2 and
in Table 2.3 seem to pass some plausibility checks. More precisely: The choice of
the domain $U$ can have an impact on the sensitivity measures if the corresponding
evaluated test function does not show some kind of symmetry. In the cases from
(i) to (iii), for all points $p\in U$ the evaluated squared instantaneous rate of change is,
roughly speaking, equal w.r.t. both variables $x_1$ and $x_2$; whereas, in the cases from
(iv) to (vi), for all points $p\in U$ the evaluated squared instantaneous rate of change is,
roughly speaking, either greater w.r.t. $x_1$ or greater w.r.t. $x_2$. Hence, from a practical
applications viewpoint, Table 2.2 and Table 2.3 furnish us with a valuable
quantitative screening of the importance of the variables regarding the respective
test function.
⁸ Due to numerical inaccuracies, it is necessary to set $U\equiv[-29.9999,30.0]\times[-30.0,30.0]$ and
$U\equiv[-1.9999,2.0]\times[-2.0,2.0]$, respectively, in the case of the test function (i), i.e., Ackley, in order
to ensure that, for all test functions, the estimated error with regard to the estimated integral is at least
below $1\times10^{-4}$.
Mind that the elaborations are without loss of generality regarding the number
of parameters $N_\xi$ (recall § 2.2.3). To exemplify this kind of generality, let us explore
the Rosenbrock test function in Table 2.1 since, without much ado, it can be extended to
$N_\xi$ parameters with $N_\xi\in\{2,3,4,5,6,7\}$. Hence, the $N_\xi$-dimensional
Rosenbrock test function $f_{R_{N_\xi}}$ can be written as
$$f_{R_{N_\xi}}=x\mapsto f_{R_{N_\xi}}(x):=\sum_{i=1}^{N_\xi-1}\bigl[5(x_{i+1}-x_i^2)^2+(x_i-1)^2\bigr]\colon U_{N_\xi}\to\mathbb{R} \tag{2.52}$$
where $N_\xi\in\{2,3,4,5,6,7\}$, and $U_{N_\xi}$ is an open set $U_{N_\xi}\subset\mathbb{R}^{N_\xi}$. Notice that, in each case
of $N_\xi$, the global minimum w.r.t. (2.52) is at the point $(1,1,\dots,1)\in\mathbb{R}^{N_\xi}$. Adapting the
normalized global first-order sensitivity measures in (2.51) to the use case of the
$N_\xi$-dimensional Rosenbrock test function $f_{R_{N_\xi}}$, I report in Table 2.4 the corresponding
results. The observable pattern is reasonable if one unrolls the term $f_{R_{N_\xi}}(x)$ in (2.52).
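TABLE 2.4: The normalized global first-order sensitivity measure $S_i^N$ evaluated at
$f_{R_{N_\xi}}$ w.r.t. the domain $[-2.0,2.0]^{N_\xi}$ with $N_\xi\in\{2,3,4,5,6,7\}$.

| $N_\xi$ | $S_1^N(f)$ | $S_2^N(f)$ | $S_3^N(f)$ | $S_4^N(f)$ | $S_5^N(f)$ | $S_6^N(f)$ | $S_7^N(f)$ | $\sum_{i=1}^{N_\xi}S_i^N(f)$ |
|---|---|---|---|---|---|---|---|---|
| 2 | 0.9109 | 0.0891 | − | − | − | − | − | 1.0000 |
| 3 | 0.4008 | 0.5600 | 0.0392 | − | − | − | − | 1.0000 |
| 4 | 0.2569 | 0.3590 | 0.3590 | 0.0251 | − | − | − | 1.0000 |
| 5 | 0.1892 | 0.2641 | 0.2641 | 0.2641 | 0.0185 | − | − | 1.0000 |
| 6 | 0.1494 | 0.2090 | 0.2090 | 0.2090 | 0.2090 | 0.0146 | − | 1.0000 |
| 7 | 0.1239 | 0.1728 | 0.1728 | 0.1728 | 0.1728 | 0.1728 | 0.0121 | 1.0000 |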
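The pattern in Table 2.4 can be cross-checked independently. Since the squared partial derivatives of (2.52) are polynomials of degree at most 6 in each variable, a tensor-product Gauss-Legendre rule with 4 nodes per variable (exact up to degree 7) integrates them exactly up to rounding. The Python sketch below uses this observation; it is an alternative route chosen for illustration and not the HCubature.jl computation of this work, and the gradient is coded by hand.

```python
from itertools import product

# 4-point Gauss-Legendre nodes and weights on [-1, 1]
GL_NODES = (-0.8611363115940526, -0.3399810435848563,
            0.3399810435848563, 0.8611363115940526)
GL_WEIGHTS = (0.3478548451374538, 0.6521451548625461,
              0.6521451548625461, 0.3478548451374538)


def rosenbrock_grad(x):
    """Gradient of the N-dimensional Rosenbrock function (2.52), factor 5."""
    n = len(x)
    g = [0.0] * n
    for i in range(n - 1):
        r = x[i + 1] - x[i] * x[i]
        g[i] += -20.0 * x[i] * r + 2.0 * (x[i] - 1.0)
        g[i + 1] += 10.0 * r
    return g


def normalized_sensitivities(n, half_width=2.0):
    """Normalized global measures S_i^N on [-half_width, half_width]^n."""
    s = [0.0] * n
    for idx in product(range(4), repeat=n):
        x = [half_width * GL_NODES[k] for k in idx]
        w = 1.0
        for k in idx:
            w *= half_width * GL_WEIGHTS[k]  # weights scaled to the box
        g = rosenbrock_grad(x)
        for i in range(n):
            s[i] += w * g[i] * g[i]
    total = sum(s)
    return [si / total for si in s]


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, [round(v, 4) for v in normalized_sensitivities(n)])
```

Unrolling (2.52) also explains the pattern: the first variable appears in one summand only as $x_i$, the last variable only as $x_{i+1}$, and every interior variable in both roles, so the interior entries of each row coincide.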
Finally, let us invoke four packages that contain various types of optimization
algorithms:
(Opkg1) the package NLopt.jl (see [107]) provides an interface to the open-source
NLopt library for nonlinear optimization,
(Opkg2) the package BlackBoxOptim.jl (see [64]) provides some meta-heuristic⁹
stochastic algorithms for global optimization,
(Opkg3) the package Optim.jl (see [151]) provides some deterministic and stochastic
algorithms for box-constrained local and global optimization, and
(Opkg4) the package IntervalOptimisation.jl (see [186]) provides guaranteed
deterministic global optimization algorithms using interval arithmetic.
From (Opkg1), I employ two gradient-based local optimization algorithms –
more precisely, a sequential quadratic programming (SQP) algorithm based on [128]
and a method of moving asymptotes (MMA) algorithm based on [203] – on the
admissible set $F$; I apply two derivative-free local optimization algorithms – i.e., a
Nelder-Mead simplex (NMS) algorithm based on [176] and a constrained optimization
by linear approximations (COBYLA) algorithm based on [170] – on the admissible
set $F_1$ and on the admissible set $F$, respectively; and I apply two derivative-free
global optimization algorithms – that is, the dividing rectangles (DIRECT) algorithm
based on [109] and a modified evolutionary algorithm (MEA) based on [193] – on the
admissible set $F_1$.¹⁰
⁹ Commonly, the term meta-heuristic (see, e.g., [217]) refers to a strategic search by trial and error
without a theoretical guarantee of global optimality.
From (Opkg2), I pick an adaptive differential evolution (ADE) algorithm from a
collection of stochastic algorithms in order to perform a stochastic global optimization
on the admissible set $F_1$.
From (Opkg3), I utilize a primal interior-point algorithm (see, e.g., [158, ch. 19]
or [116, ch. 10.9]) on the admissible set $F_1$, for which I employ three different inner
optimization algorithms: a gradient-based one, i.e., a limited-memory
Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) algorithm (see, e.g., [158, ch. 7.2]); a
derivative-free one, i.e., a Nelder-Mead simplex algorithm (see above); and a stochastic
global one, i.e., a particle swarm (PS) algorithm based on [228].
By providing the initial point $(\hat{x}_1,\hat{x}_2)=(1.1,1.1)$, let us check exemplarily the
Ackley function’s global optimal argument $(x_1^*,x_2^*)=(0.0,0.0)$ with the corresponding
optimal function value $f(x_1^*,x_2^*)=0.0$ (see (i) in Figure 2.2b). As expected, all
algorithms find the optimal solution within a certain numerical tolerance (see
Table 2.5). But also as expected, when an initial point closer to the admissible set’s
borders is chosen, the gradient-based local optimization algorithms tend to be trapped in one
of the Ackley function’s many local minima. Similarly, the behavior of the algorithms
with respect to the other test functions (see (ii)–(vi) in Figure 2.2b) – that has
been well investigated in the literature – can be recapitulated.
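TABLE 2.5: Check exemplarily the Ackley function’s global optimum.

| Opkg | SQP | MMA | NMS | COBYLA | DIRECT | MEA | ADE | L-BFGS | PS |
|---|---|---|---|---|---|---|---|---|---|
| 1 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | | |
| 2 | | | | | | | ✓ | | |
| 3 | | | ✓ | | | | | ✓ | ✓ |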
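The trapping of local, gradient-based searches in the Ackley landscape can be reproduced with a much simpler scheme than the SQP or MMA algorithms used above. The Python sketch below applies plain gradient descent with a fixed step size and a hand-coded gradient, without any constraints; the step size, the iteration count, and the two starting points are assumptions chosen for illustration, and the basins of attraction of this simple scheme need not coincide with those of the algorithms in Table 2.5.

```python
import math


def ackley(x1, x2):
    r = math.sqrt(0.5 * (x1 * x1 + x2 * x2))
    c = 0.5 * (math.cos(2 * math.pi * x1) + math.cos(2 * math.pi * x2))
    return -20.0 * math.exp(-0.2 * r) - math.exp(c) + 20.0 + math.e


def ackley_grad(x1, x2):
    """Hand-coded gradient; at the origin (a cusp) the radial part is set to 0."""
    r = math.sqrt(0.5 * (x1 * x1 + x2 * x2))
    c = 0.5 * (math.cos(2 * math.pi * x1) + math.cos(2 * math.pi * x2))
    k = 0.0 if r == 0.0 else 2.0 * math.exp(-0.2 * r) / r
    osc = math.pi * math.exp(c)
    return (k * x1 + osc * math.sin(2 * math.pi * x1),
            k * x2 + osc * math.sin(2 * math.pi * x2))


def gradient_descent(start, step=0.01, iters=2000):
    """Unconstrained gradient descent with a fixed step size."""
    x1, x2 = start
    for _ in range(iters):
        g1, g2 = ackley_grad(x1, x2)
        x1 -= step * g1
        x2 -= step * g2
    return (x1, x2)


if __name__ == "__main__":
    for start in ((0.3, 0.3), (7.3, 7.3)):
        end = gradient_descent(start)
        print(start, "->", end, "f =", ackley(*end))
```

Started at $(0.3,0.3)$, the iteration ends close to the global minimum, whereas started at $(7.3,7.3)$ it settles in a local minimum near $(7,7)$ with a function value far above zero.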
In order to assess the ambit of the solution found, a common practice in many
applications is: Apply a global optimization algorithm; and use its solution as a starting
point for a local optimization algorithm. However, another possibility to assess
the area of validity is to use interval arithmetic (see, e.g., [213], [104]) in the context
of deterministic global optimization.¹¹ In (Opkg4), such a possibility is pursued by a
Moore-Skelboe (MS) algorithm (see, e.g., [59]). Mind that the result does not consist
of the optimal component values $(x_1^*,x_2^*)$ and the optimal function value $f(x_1^*,x_2^*)$
as with the aforementioned algorithms; instead, the result consists of intervals
that are guaranteed to contain the optimal component values, $[x_{1,l}^*,x_{1,u}^*]\times[x_{2,l}^*,x_{2,u}^*]$, and the
optimal function value, $[f(x_{1,l}^*,x_{2,l}^*),f(x_{1,u}^*,x_{2,u}^*)]$.
Finally, when one moves from the test functions such as in Figure 2.2a to func-
tions from applications, one has to recall two common issues:
¹⁰ For more details on gradient-based optimization methods, I refer to, e.g., [158], [18]; on
derivative-free optimization algorithms, I refer to, e.g., [47], [10], [135]; and on deterministic and
stochastic global optimization algorithms, I refer to, e.g., [98], [67], and, e.g., [24], [195], respectively.
¹¹ For further deliberations on deterministic global optimization using interval arithmetic, I refer to,
e.g., the survey in [157].
• The test functions exhibit a fairly complete picture that facilitates the choice
of an appropriate algorithm. However, in many applications, the choice of
an appropriate optimization algorithm for a given problem is difficult due to
incomplete preliminary information. Hence, there is also no preference for a
particular type of optimization algorithms.
• A test function evaluation is computationally cheap. However, in many appli-
cations, the evaluation of the objective function or the constraint functions or
both (see § 2.3.1) depends on a computationally expensive numerical simula-
tion (see § 2.2) such that, presumably, an exhaustive coverage of the parameter
space is prohibitive. Hence, an exhaustive reconstruction of the shape (or land-
scape) that can be associated with the function under investigation is unlikely.
We encounter these two issues again in the upcoming chapter 3, which is concerned
with the discussion of optimization schemes using low-fidelity models.
Moreover, in chapter 5, we consider high-fidelity optimization problems as
concrete instances of the abstract optimization problem in (2.36) where the semantics of
the magnetoquasistatic model is applied. More precisely, we encounter functions
that encode, for instance, the time-averaged ohmic loss, the time-averaged ohmic
loss density or the inductance at different operating frequencies. Hence, the investigation
presented in this section is valuable as a preliminary study and anchor point
to develop and assess the studies of chapter 5.
[Figure 2.2: twelve plot panels over the axes $x_1$, $x_2$ (and $z$ in the surface plots). Plotted ranges:
(i), (ii) $x_1,x_2\in[-30,30]$; (iii), (iv) $x_1,x_2\in[-10,10]$; (v) $x_1,x_2\in[0,4]$; (vi) $x_1\in[-5,10]$, $x_2\in[0,15]$.]
(A) Surface representation of $z:=f(x_1,x_2)$.
(B) Contour representation of $z:=f(x_1,x_2)$. The red cross indicates a global minimum.
FIGURE 2.2: Representations of the test functions in Table 2.1. (i) Ackley, (ii) Unit sphere,
(iii) Booth, (iv) Rosenbrock, (v) Michalewicz, (vi) Modified Branin.
[Figure 2.3: twelve plot panels over the axes $x_1$, $x_2$ (and $z$ in the surface plots). Plotted ranges:
(i), (ii), (iv) $x_1,x_2\in[-2,2]$; (iii) $x_1\in[-1,3]$, $x_2\in[1,5]$; (v) $x_1,x_2\in[1,3]$; (vi) $x_1\in[-5,-1]$, $x_2\in[11,15]$.]
(A) Surface representation of $z:=f(x_1,x_2)$.
(B) Contour representation of $z:=f(x_1,x_2)$. The red cross indicates a global minimum.
FIGURE 2.3: Representations of the test functions in Table 2.1 (highlighting the neighborhood
of the global minimum). (i) Ackley, (ii) Unit sphere, (iii) Booth, (iv) Rosenbrock,
(v) Michalewicz, (vi) Modified Branin.
[Figure 2.4: twelve plot panels over the axes $x_1$, $x_2$, using the same plotted ranges as Figure 2.2b
in part (A) and as Figure 2.3b in part (B).]
(A) Depicting $\operatorname{grad}(f)(x_1,x_2)$ within Figure 2.2b.
(B) Depicting $\operatorname{grad}(f)(x_1,x_2)$ within Figure 2.3b.
FIGURE 2.4: Depicting $\operatorname{grad}(f)(x_1,x_2)$ with $(x_1,x_2)\in U$ as a projection on the contour
representation of the test functions in Table 2.1. (i) Ackley, (ii) Unit sphere, (iii) Booth,
(iv) Rosenbrock, (v) Michalewicz, (vi) Modified Branin.
[Figure 2.5: panels (i)–(vi) over the axes x1 and x2, each shown as (A) a depiction of s1(f, (x1, x2)) w.r.t. Figure 2.3b and (B) a depiction of s2(f, (x1, x2)) w.r.t. Figure 2.3b. Dark colors indicate low values; bright colors indicate high values.]
FIGURE 2.5: Depicting si(f, (x1, x2)) from (2.49) with i ∈ {1, 2} and (x1, x2) ∈ U w.r.t. the test functions in Table 2.1. (i) Ackley, (ii) Unit sphere, (iii) Booth, (iv) Rosenbrock, (v) Michalewicz, (vi) Modified Branin.
2.4 In closing
The chapter’s primary purpose has been to lay out the technical landscape in which the remaining chapters are placed. The languages of vector analysis, differential geometry, and functional analysis have served as methodological and terminological guidance for formulating the relevant notions.
More precisely, we have elaborated the magnetoquasistatic model of Maxwell’s theory by presenting the fundamental problem statement of electromagnetism and the corresponding system of Maxwell’s equations. From this system, we have derived the magnetoquasistatic subsystem and the magnetostatic subsystem.
Using the magnetoquasistatic model as a guiding representative, we have examined its numerical simulation along the common procedure, i.e., we have concisely recapitulated the concepts of the weak formulation, the numerical approximation, and the parametric mathematical model.
Finally, we have discussed notions regarding the optimization with a partial differential equation and its relation to nonlinear optimization problems. We have sketched various types of optimization algorithms and outlined a subset of optimization test functions. We have completed the discussion by applying a gradient-based interpretation of sensitivity measures to the test functions, which permit the determination of derivative information by forward-mode automatic differentiation.
Chapter 3
Surrogate optimization
In this chapter, I provide an in-depth elaboration of the key notion surrogate optimiza-
tion. Furthermore, I provide an in-depth elaboration of the proposed partitioning of
this notion in § 1.2 into the three sub-notions: (1) surrogate modeling & simulation,
(2) surrogate-based optimization, and (3) surrogate-guided optimization.
Within the limited scope of the explanations, let us consistently anticipate algebraic tools from the category-theoretical language in ch. 4 in order to tag the various notions of surrogate optimization with algebraic notes. Additionally, these algebraic tools facilitate the smooth transition between the various layers in Figure 1.4.
Concerning the sub-notion (1) surrogate modeling & simulation, let us forge an abstract setting in which we state common classes of mathematical problems. Within the context of these classes, we embed the notion of a high-fidelity model and a low-fidelity model. Subsequently, we define the high-fidelity approximation error, the notion of a sampling plan, and the empirical surrogate modeling error as one among other indicators within surrogate optimization. Afterwards, let us attempt to sketch a holistic understanding of some deterministic data-fit low-fidelity models, i.e., multivariate polynomials and radial-basis functions, and some probabilistic data-fit low-fidelity models, i.e., kriging low-fidelity models. We close this subpart by applying a formalization-driven perspective on simplified-physics low-fidelity models.
Concerning the sub-notion (2) surrogate-based optimization, let us examine the
optimization with the test functions in § 2.3.3 by data-fit low-fidelity models and
by emulated simplified-physics low-fidelity models. We carve out a numerical scaf-
folding of a benchmark-focused classification of test functions (more generally, high-
fidelity models) and we elucidate different procedures to find a solution of the high-
fidelity optimization problem.
Concerning the sub-notion (3) surrogate-guided optimization, let us dwell briefly on the sequential kriging optimization and its construction principles as a subkind of the model management strategy adaptation. Afterwards, we dwell on optimization procedures within the space-mapping paradigm, which are a subkind of the model management strategy adaptation; and we dwell on the basic building blocks of the co-kriging optimization, which can be seen as a subkind of the model management strategy fusion. By applying a formalization-oriented viewpoint, we attempt to illuminate potential hybrid model management strategies and to pin down properly, e.g., the conceptual distinction between a low-fidelity model and a surrogate model within the space-mapping paradigm. Furthermore, driven by heuristics, we formally construct some statements to provide a novel access to the delicate aspect of convergence-related issues regarding the optimization within the space-mapping paradigm and the co-kriging optimization.
3.1 Surrogate modeling & simulation
Notice well that, due to the work of so many diverse research communities in the
vast field of surrogate optimization, it seems impossible to provide a unifying metho-
dological and terminological guidance concerning surrogate optimization that suits
every research community.
Regarding probabilistic low-fidelity models, for instance, there is the delicate aspect of the interpretation of probability (see, e.g., [201, p. 29ff]). One main interpretation leads to the school of thought called Bayesian statistics (see, e.g., [155, ch. 5]); another main interpretation leads to the school of thought called Frequentist statistics (see, e.g., [155, ch. 6]).
Therefore, let a general guiding principle of ours be that we aim at being as indifferent as possible to potential issues of interpretation or semantics.
Thus, similarly to a first-principles approach, we focus on stripping surrogate optimization down to a bare syntactical minimum and then on arguing from this bare syntactical minimum, adding layers of syntax and semantics when they are needed.
3.1.1 An abstract setting
After discussing different classes of mathematical problems, we discuss the concepts
high-fidelity function approximation error, sampling plan, and empirical surrogate
modeling error. Among others, we introduce the empirical generalization error and
the squared sample Pearson correlation coefficient (SSPCC). We close this subsec-
tion by illuminating a link between the SSPCC and the normalized global first-order
sensitivity measures (see § 2.3.3).
Classes of mathematical problems
In the previous chapter, we have encountered the concepts of modeling, simulation
and optimization in the context of the magnetoquasistatic Maxwell’s theory. If we
apply a map-based viewpoint, one can assign abstractly each of these concepts to
one of the following classes of mathematical problems:
\[ \text{given } x \in X \text{ and } y \in Y, \text{ find } K \in \hom(X,Y) \text{ such that } K(x) = y, \tag{3.1a} \]
\[ \text{given } K \in \hom(X,Y) \text{ and } x \in X, \text{ find } y \in Y \text{ such that } K(x) = y, \tag{3.1b} \]
\[ \text{given } K \in \hom(X,Y) \text{ and } y \in Y, \text{ find } x \in X \text{ such that } K(x) = y, \tag{3.1c} \]
where, for instance, $K$ denotes a linear map, $X$ and $Y$ denote linear spaces over an underlying field $\mathbb{F}$, and $\hom(X,Y)$ (or $\hom_{\mathbb{F}}(X,Y)$) connotes a vector space as well.
Following the terminology in [140, p. 23], let us call problems of the form (3.1a) identification problems, problems of the form (3.1b) direct problems, and problems of the form (3.1c) inverse problems.¹ Thus, in the context of the previous chapter, one can assign modeling to (3.1a), and simulation and optimization to (3.1c).
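The three problem classes in (3.1) can be made tangible for a small linear map. The following is a minimal sketch under stated assumptions, not a construction from the cited literature: $K$ is taken to be a 2×2 real matrix, and the identification problem is posed with two linearly independent input/output pairs (a single pair would leave $K$ underdetermined). All function names are hypothetical.

```python
def apply(K, x):
    """Direct problem (3.1b): given K and x, compute y = K(x)."""
    return [sum(k * xj for k, xj in zip(row, x)) for row in K]


def solve_2x2(K, y):
    """Inverse problem (3.1c): given K and y, find x with K(x) = y (Cramer's rule)."""
    (a, b), (c, d) = K
    det = a * d - b * c
    if det == 0:
        raise ValueError("K is singular; no unique solution exists")
    return [(y[0] * d - b * y[1]) / det, (a * y[1] - c * y[0]) / det]


def identify_2x2(xs, ys):
    """Identification problem (3.1a): recover K from two pairs (x, y) with K(x) = y.

    Assumption: the two inputs in xs are linearly independent.
    Row i of K solves the linear system whose matrix has the inputs as rows
    and whose right-hand side collects the i-th components of the outputs.
    """
    inputs = [list(xs[0]), list(xs[1])]
    return [solve_2x2(inputs, [ys[0][i], ys[1][i]]) for i in range(2)]


K = [[2.0, 1.0], [1.0, 3.0]]
y = apply(K, [1.0, 2.0])                 # direct problem
x = solve_2x2(K, y)                      # inverse problem
xs = [[1.0, 2.0], [3.0, 1.0]]
ys = [apply(K, xi) for xi in xs]
K_identified = identify_2x2(xs, ys)      # identification problem
```

In this toy setting all three problems reduce to elementary linear algebra; the point is only to show which of the three objects ($K$, $x$, $y$) is the unknown in each class.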
Observe that, for example, the evaluation of the reduced parametric quantity of interest $\hat{Q}_{\boldsymbol{\xi}}(\boldsymbol{\xi})$ (see § 2.2.3) can be assigned to (3.1b). Seizing this example, let us pin down a few notions regarding a surrogate model. Some chunks of approximation theory and statistical learning theory are utilized, which aid us in framing coherently and stating economically the necessary notions.
¹ An identification problem is also frequently named recovery problem (see, e.g., [187, p. 551f]). A direct problem is often called a forward problem as well (see, e.g., [49, p. 5]).
Let us return to the statement in (2.28) and reformulate it slightly using the terms in (3.1); hence, we deal with the statement
\[ \forall x \in X.\; K(x) =_{Y} \tilde{K}(x), \tag{3.2} \]
where $K, \tilde{K} \colon X \rightrightarrows Y$. In calculations that are of practical interest, it is assumed that the map $K$ possesses certain undesired properties, e.g., its evaluation exceeds reasonable finite computing time budgets or it is not straightforwardly available for operations such as differentiation or integration. A consequence of these properties is that application-oriented optimizations relying on the map $K$ are prohibitively expensive. Therefore, the aim is to surrogate the map $K$ with the map $\tilde{K}$ that possesses user-prescribed desired properties. Commonly, the map $K$ is called a high-fidelity model – to emphasize the user’s prescribed assessment of the model’s predictive power – or a high-cost model – to emphasize the user’s prescribed assessment of the model’s computational costs. Then, the map $\tilde{K}$ is called a low-fidelity model, a low-cost model, a meta-model or a surrogate model.
Technically, one can replace the map $K$ by the map $\tilde{K}$ if the statement in (3.2) holds true such that the maps are equal by function extensionality, thus, $K =_{X \to Y} \tilde{K}$. However, this line of thought is left to the next chapter since the usual starting point regarding surrogate models focuses on, e.g., interpolating polynomials or splines and linear or nonlinear regression models. Inspired by the origins of these concrete examples in approximation theory and statistical learning theory, one can define the class of data-fit low-fidelity models, which can be subdivided into the subclass of deterministic low-fidelity models and the subclass of probabilistic low-fidelity models (see, e.g., [76, p. 132], [116, p. 275]). Facing numerical simulations, one can additionally define the class of projection-based low-fidelity models and the class of simplified-physics low-fidelity models.
Recalling § 1.2, the class of projection-based low-fidelity models is not pursued.
Furthermore, mind that the term meta-model is primarily a paraphrase for the term
data-fit low-fidelity model, and vice versa; and that, in the context of the space map-
ping paradigm or the defect correction paradigm, the term low-fidelity model and
the term surrogate model are distinguished (see, e.g., [194, p. 28f] or [49, p. 56f]).
It is assumed that a high-fidelity model is deterministic in the sense that repeated use of the same input results in the same output each time (see, e.g., [184, p. 409]). If the choice of a high-fidelity model is, e.g., the reduced parametric quantity of interest $\hat{Q}_{\boldsymbol{\xi}}$ that is attained by a FE simulation, then a deterministic FE simulation is considered as opposed to a stochastic FE simulation (see, e.g., [74]). However, some kind of randomness in the form of noise $\varepsilon$ in the image $\hat{Q}_{\boldsymbol{\xi}}(\boldsymbol{\xi})$ is taken into account, but this noise does not stem from randomness in the argument $\boldsymbol{\xi}$. The noise $\varepsilon$ encodes, for instance,
• "random errors from unobserved variables" [61, p. 39],
• errors in the presence of a "parameter controlling missing physics" [201, p. 93],
• or a "systematic error ... caused by insufficient mesh resolution" [70, p. 5].
Hence, let us summarize in a single observational noise $\varepsilon$ all kinds of noise such as the computational noise with regard to the mesh size parameter $h$ (recall § 2.2.2) and represent this observational noise as a random variable, even though we utilize a deterministic high-fidelity model. Mind that this representation is a usual trick in order to make the machinery of probabilistic low-fidelity models amenable to deterministic high-fidelity models (see, e.g., [70, p. 5]).
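The idea of attaching a random observational noise to a deterministic model can be sketched as follows. This is a minimal illustration and not the thesis’s implementation: the stand-in function `high_fidelity`, the choice of zero-mean Gaussian noise with standard deviation `sigma`, and all names are assumptions introduced here.

```python
import random


def high_fidelity(x):
    """Hypothetical deterministic high-fidelity model: same input, same output."""
    return (x - 0.3) ** 2


def noisy_observation(x, sigma, rng):
    """Observed output = deterministic output + observational noise epsilon.

    Assumption: epsilon is modeled as zero-mean Gaussian noise with standard
    deviation sigma; the noise does not depend on randomness in the input x.
    """
    epsilon = rng.gauss(0.0, sigma)
    return high_fidelity(x) + epsilon


rng = random.Random(0)  # fixed seed for reproducibility
observations = [noisy_observation(0.5, 0.01, rng) for _ in range(5)]
```

With `sigma = 0.0`, the observation coincides with the deterministic output, which reflects that the noise is layered on top of the model rather than built into it.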
High-fidelity function approximation error
Let us suppose that $X$, $Y$, and $Y^X \equiv \hom(X,Y)$ comprise a more finely layered structure; more precisely, let us suppose that they are normed linear spaces equipped with corresponding norm-induced metrics $d_X$, $d_Y$, and $d_{Y^X}$ such that
\[ d_X = (x_1, x_2) \mapsto \|x_1 - x_2\|_X \colon X \times X \to \mathbb{R}_+, \tag{3.3a} \]
\[ d_Y = (y_1, y_2) \mapsto \|y_1 - y_2\|_Y \colon Y \times Y \to \mathbb{R}_+, \tag{3.3b} \]
\[ d_{Y^X} = (K_1, K_2) \mapsto \|K_1 - K_2\|_{Y^X} \colon Y^X \times Y^X \to \mathbb{R}_+. \tag{3.3c} \]
Notice well that, for the sake of simplicity, it is merely tacitly assumed that, for all elaborations, additional properties such as compactness, Lipschitz continuity, and the like hold true if the respective notions require these properties.
If we choose a surrogate model $\tilde{K}$ from a prescribed class of functions called hypothesis space $H$ (cf. [51, p. 9ff]), that is, $\tilde{K} \in H$ and $H \subseteq Y^X$, then we introduce an error which one can capture by the metrics in (3.3). The theoretical capability of a surrogate model to approximate accurately a high-fidelity model can be investigated by an analysis of the convergence of a sequence of surrogate models $(\tilde{K}_n)_{n \in \mathbb{N}}$ to the high-fidelity model $K$. It is assumed that $\tilde{K}_n \colon X \to Y$ and $(\tilde{K}_n)_{n \in \mathbb{N}} = n \mapsto \tilde{K}_n \colon \mathbb{N} \to Y^X$. Often, the notion of uniform convergence (see, e.g., [183, p. 147ff] or [169, p. 203ff]) is deployed where it is set that
\[ d_{Y^X}(K_1, K_2) := \sup_{x \in X} \|K_1(x) - K_2(x)\|_Y, \tag{3.4} \]
such that, in abbreviated form, $(\tilde{K}_n)_{n \in \mathbb{N}} \to K$ denotes the convergence of the sequence to the limit function, which can be expressed figuratively as
\[ (\tilde{K}_n)_{n \in \mathbb{N}} \to K \;:\Leftrightarrow\; \lim_{n \to \infty} d_{Y^X}(\tilde{K}_n, K) = 0. \tag{3.5} \]
There are various theorems like the Stone-Weierstrass theorem for multivariate polynomials (see, e.g., in [140], [169], [183]) which establish the convergence of different surrogate models in a sense that is similar to (3.5). The expressions in (3.4) and (3.5) guide the definition of the high-fidelity function approximation error $e_H(K)$ with respect to a fixed surrogate model $\tilde{K}_n$.
Definition 3.1.1 (High-fidelity function approximation error). Given a high-fidelity model $K \colon X \to Y$ and a fixed surrogate model $\tilde{K}_n \colon X \to Y$ from a hypothesis space $H$, i.e., $\tilde{K}_n \in H$ and $H \subseteq Y^X$, the high-fidelity function approximation error $e_H(K) \in \mathbb{R}_+$ with respect to a fixed surrogate model $\tilde{K}_n$ is constituted by
\[ e_H(K) := \sup_{x \in X} \|K(x) - \tilde{K}_n(x)\|_Y. \tag{3.6} \]
Remark 3.1.1. If it is unambiguous, then the adjunct high-fidelity is dropped.
Remark 3.1.2. An important special case of (3.6) is $e_H(\hat{Q}_{\boldsymbol{\xi}})$, that is, the function approximation error with respect to the $l_p$-norm regarding the reduced parametric quantity of interest.
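To illustrate (3.5) and (3.6), the supremum can be estimated on a fine grid for a sequence of surrogates of increasing order. The sketch below is purely illustrative and rests on assumptions not made in the text: the high-fidelity model is taken to be the exponential function on $[0,1]$, the surrogates are its Taylor polynomials around 0, and the grid maximum only yields a lower bound on the true supremum. All names are hypothetical.

```python
import math


def taylor_exp(n):
    """Surrogate K_n: Taylor polynomial of exp of degree n around 0."""
    def K_n(x):
        return sum(x ** i / math.factorial(i) for i in range(n + 1))
    return K_n


def sup_error_on_grid(K, K_n, a, b, num_points=1001):
    """Grid-based estimate (a lower bound) of sup over [a, b] of |K(x) - K_n(x)|."""
    step = (b - a) / (num_points - 1)
    return max(
        abs(K(a + i * step) - K_n(a + i * step)) for i in range(num_points)
    )


# Estimated function approximation error for surrogate orders n = 1, ..., 6.
errors = [
    sup_error_on_grid(math.exp, taylor_exp(n), 0.0, 1.0) for n in range(1, 7)
]
```

In this example the estimated error decreases monotonically with the order $n$, which mirrors the uniform convergence statement in (3.5) for this particular choice of surrogates.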
Comparing the function approximation error $e_H(K)$ in (3.6) to the modeling error $E(Q)$ in (2.22b), it is apparent that one can control $e_H(K)$ by a surrogate model’s order $n$ where the positive integer $n+1$ can represent, for instance, the number of basis functions. Thus, assuming an $\mathbb{F}$-vector space structure on $Y^X$ and on $H$, a possible general presentation of a surrogate $\tilde{K}_n$ is given by
\[ \tilde{K}_n := \sum_{i=0}^{n} \tilde{c}_i \cdot_{\mathbb{F}} \tilde{\phi}_i, \tag{3.7} \]
where $\tilde{\phi}_i \colon X \to Y$ signifies basis functions, i.e., members of a basis of a selected hypothesis space $H$ with dimension $n$, that is, $\dim(H) \equiv n$; and $\tilde{c}_i \in \mathbb{F}$ tags coefficients which are also referred to as components or coordinates of $\tilde{K}_n$ with respect to the chosen basis. A prototypical hypothesis space is the space $P_{\leq n}$, that is, the set of all univariate algebraic polynomials of degree at most $n$ on an interval $[a,b]$ equipped with an $\mathbb{R}$-vector space structure and the finite monomial basis. The degree $n$ provides a notion of a characteristic size of the hypothesis space $H$ (cf. [51, p. 13]).²
Supposing a Hilbert space structure on $Y^X$ and on $H$, one can define a notion of a best approximation in a least-squares sense. Thus, the best or the closest surrogate model $\hat{\tilde{K}}_n \in H$ to the high-fidelity model $K \in Y^X$ is associated with the optimization problem
\begin{align}
\min_{\tilde{K}_n \in H}.\; R(\tilde{K}_n) &:= \tfrac{1}{2} \|K - \tilde{K}_n\|^2_{Y^X} \tag{3.8a} \\
&\equiv \tfrac{1}{2} \langle K - \tilde{K}_n, K - \tilde{K}_n \rangle_{Y^X}, \tag{3.8b}
\end{align}
where $R \colon H \to \mathbb{R}_+$ denotes the residual objective functional. One can characterize the best surrogate model $\hat{\tilde{K}}_n$ by the minimizer functional $\operatorname{argmin} = R \mapsto \hat{\tilde{K}}_n \colon (H \to \mathbb{R}_+) \to H$ as a single-valued functional. If the residual objective functional $R$ possesses a unique global minimizer, then the minimizer functional returns $\hat{\tilde{K}}_n$ as an output, i.e.,
\[ \hat{\tilde{K}}_n = \operatorname{argmin}(R) \tag{3.9} \]
is well-defined. If there are multiple global minimizers or if there is no global minimizer at all, the minimizer functional falls back to the common definition as a multi-valued functional where the output is constituted by the set of global minimizers of $R$, i.e.,
\[ \operatorname{argmin}(R) \equiv \operatorname*{argmin}_{\tilde{K}_n \in H} R(\tilde{K}_n) := \Bigl\{ \tilde{K}_n \in H \;\Big|\; R(\tilde{K}_n) = \inf_{\tilde{K}'_n \in H} R(\tilde{K}'_n) \Bigr\}. \tag{3.10} \]
In the case of (3.10), the surrogate model $\hat{\tilde{K}}_n$ is a best solution that reads as
\[ \hat{\tilde{K}}_n \in \operatorname*{argmin}_{\tilde{K}_n \in H} R(\tilde{K}_n). \tag{3.11} \]
In the present work, it is generally assumed that the expression (3.9) holds. This assumption is particularly reasonable for a least squares approach such as in (3.8a) (see, e.g., [201, p. 69ff]) where the best (or the closest) surrogate model $\hat{\tilde{K}}_n$ is the orthogonal (pseudo-) projection of the high-fidelity model $K$ onto the hypothesis space $H$ such that
\[ (K - \hat{\tilde{K}}_n) \perp H. \tag{3.12} \]
² The notation in (3.7) follows the customary index convention from multilinear algebra in order to emphasize the linear combination of members of a vector space. The operation $\cdot_{\mathbb{F}}$ implies a signature $\mathbb{F} \times \hom_{\mathbb{F}}(X,Y) \to \hom_{\mathbb{F}}(X,Y)$. The operation’s concrete implementation depends on the selected concrete function space like, for example, $P_{\leq n}$.
If we consider, e.g., the prototypical hypothesis space $P_{\leq n}$ in the context of the space of all square-integrable functions $L^2$, then we have the basic continuous least squares $L^2$ approximation as an instance of (3.8a).³
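The continuous least squares $L^2$ approximation and the orthogonality relation (3.12) can be checked exactly in a small case. The sketch below makes several assumptions that are not part of the text: the interval is $[0,1]$, the high-fidelity model is itself a polynomial (so that all inner products are exact rationals), the monomial basis is used, and the normal equations are solved by a plain Gauss-Jordan elimination. All function names are hypothetical.

```python
from fractions import Fraction


def inner_monomials(i, j):
    """Inner product of x^i and x^j in L2(0, 1): integral of x^(i+j) over [0, 1]."""
    return Fraction(1, i + j + 1)


def solve(A, b):
    """Solve A c = b by Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                M[r] = [vr - M[r][col] * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def best_l2_polynomial(k_coeffs, n):
    """Coefficients of the L2(0,1)-best approximation in P_{<=n}
    of K(x) = sum_j k_coeffs[j] * x^j (normal equations with the Gram matrix)."""
    gram = [[inner_monomials(i, j) for j in range(n + 1)] for i in range(n + 1)]
    rhs = [
        sum(Fraction(k) * inner_monomials(i, j) for j, k in enumerate(k_coeffs))
        for i in range(n + 1)
    ]
    return solve(gram, rhs)


def residual_inner_products(k_coeffs, c, n):
    """Inner products of K - K_n with x^i for i = 0..n; all zero means (K - K_n) is orthogonal to H."""
    size = max(len(k_coeffs), len(c))
    r = [
        Fraction(k_coeffs[j] if j < len(k_coeffs) else 0)
        - (c[j] if j < len(c) else 0)
        for j in range(size)
    ]
    return [
        sum(rj * inner_monomials(i, j) for j, rj in enumerate(r))
        for i in range(n + 1)
    ]


# Best linear approximation of K(x) = x^3 on [0, 1].
c = best_l2_polynomial([0, 0, 0, 1], 1)
orthogonality_check = residual_inner_products([0, 0, 0, 1], c, 1)
```

For $K(x) = x^3$ and $n = 1$, the computed best approximation is $-\tfrac{1}{5} + \tfrac{9}{10}x$, and the residual is orthogonal to both basis monomials, as (3.12) requires.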
Sampling plan
If one possessed sufficient information to determine the high-fidelity model completely, then the previous considerations would suffice for the discussion of corresponding surrogates. A standard example is the approximation of special functions such as the sine function. However, a basic assumption in the present work states that a single evaluation of a high-fidelity model is costly; hence, it is desired to keep the total number of evaluations as low as possible. Therefore, let us create a sample $s \in W_m$ such that
\[ s := ((x_1, y_1), \dots, (x_m, y_m)) \in \underbrace{(X \times Y) \times \dots \times (X \times Y)}_{m}, \tag{3.13} \]
where $m \in \mathbb{N} \setminus \{0\}$ and $\forall i \in \{1, \dots, m-1, m\}.\; y_i = K(x_i)$ and, furthermore, it is set that $W_m := \{(s_i)_{i \in \{1, \dots, m-1, m\}} \mid \forall i \in \{1, \dots, m-1, m\}.\; s_i \in X \times Y\}$. Let us refer to $m$ as sample size. Mind that, in theoretical considerations, it is assumed that the pairs in $X \times Y$ are chosen independently at random, which involves utilizing means from the probability theory toolkit such as a probability measure (see, e.g., [51, p. 5]). For a more comprehensive discussion on the toolkit concerning measure and probability theory, see, e.g., [201, Ch. 2].
However, let us forgo using all of the corresponding theoretical toolkit: In the
present work, we rather focus on the entities with respect to a sample, which are
equipped with the attribute sample or empirical – such as the sample or empirical
mean, the sample or empirical variance, and similar.
Given a sample $s$ and given $X^m$ as the $m$-fold Cartesian product of $X$ with $X \subset \mathbb{R}^{N_\xi}$, it is more common to deploy the notion of a sampling plan $X_s \subseteq X^m$ defined as
\[ X_s := \{(x_i)_{i \in \{1, \dots, m\}} \mid \forall i \in \{1, \dots, m\}.\; x_i \in X\}, \tag{3.14} \]
where, concerning the implementation and the choice of a data structure, it is customary to identify the sampling plan $X_s$ with an $m \times N_\xi$ matrix where $N_\xi$ denotes the number of parameters (see § 2.2.3).
Given a member $x \in X_s$ and invoking the projection maps $\pi_i \colon X_s \to X_{s,i}$ with $i \in \{1, \dots, m\}$ and $X_{s,i} \equiv X$ where $\pi_i = x \mapsto \pi_i(x) =: x_i$, let us refer to the $x_i$ as sampling plan points and to the corresponding $y_i$ in (3.13) as output points.
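The identification of a sampling plan with an $m \times N_\xi$ matrix and the construction of the sample in (3.13) can be sketched as follows. This is a minimal illustration under assumptions: the stand-in high-fidelity model is a sum of squares (the assumed form of the unit sphere test function), the matrix is represented as a list of rows, and all names are hypothetical.

```python
def unit_sphere(x):
    """Stand-in high-fidelity model: f(x) = x1^2 + x2^2 (assumed form)."""
    return sum(xi ** 2 for xi in x)


def build_sample(K, sampling_plan):
    """Evaluate K once per sampling plan point; return the sample of (x_i, y_i) pairs."""
    return [(tuple(x), K(x)) for x in sampling_plan]


# Sampling plan as an m x N_xi matrix: m = 3 points (rows), N_xi = 2 parameters (columns).
sampling_plan = [
    [0.0, 0.0],
    [1.0, 2.0],
    [-1.0, 0.5],
]

sample = build_sample(unit_sphere, sampling_plan)
sampling_plan_points = [x for x, _ in sample]
output_points = [y for _, y in sample]
```

Each row of the matrix corresponds to one (costly) evaluation of the high-fidelity model, so the number of rows $m$ directly reflects the evaluation budget.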
Let us discuss briefly some peculiarities regarding the design of a sampling plan.
For a more elaborate discussion on the design of sampling plans, I refer to, e.g., [116,
ch. 13], [70, ch. 1], [53, ch. 17], or [61, ch. 2].
³ I regard $Y^X := L^2$ solely as a powerful interface. The most important use case is given by $X := \mathbb{R}^d$ with $d \in \mathbb{N}$ and $Y := \mathbb{R}$. For all the subtleties regarding the constructions such as Borel σ-algebra, Lebesgue measure, the extended real numbers, the Lebesgue integral, and similar, I refer to the literature (see, e.g., [201, ch. 3]).
Two desirable properties of a sampling plan $X_s$ are that it is:
• space-filling and
• non-collapsing (see, e.g., [100]).
The non-collapsing property requires that the coordinates of the sampling plan points are not identical. More precisely: Let $i \in \{1, \dots, m\}$ be fixed, and let $\pi_j$ denote the coordinate projection maps such that $\pi_j \colon X_{s,i} \to \mathbb{R}$ with $j \in \{1, \dots, N_\xi\}$ where $\pi_j = x_i \mapsto \pi_j(x_i)$; then $\forall x_k \in X_{s,k}, x_l \in X_{s,l}.\; \pi_j(x_k) \neq \pi_j(x_l)$ where $k \in \{1, \dots, m\}$ and $l \in \{1, \dots, m\}$ and $k \neq l$. The rationale for this property is, in a strong sense, to exclude the pathological case where there are two identical sampling plan points; and, in a weak sense, to exclude the non-economical cases where two sampling plan points differ only in coordinates to which the high-fidelity model is not very sensitive anyway, so that, in fact, the two points can be seen as equal.
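The non-collapsing property translates directly into a check on the columns of the sampling plan matrix. The following sketch assumes the sampling plan is given as a list of rows (one row per sampling plan point) and uses exact equality of coordinates; the function name is hypothetical.

```python
def is_non_collapsing(sampling_plan):
    """Return True if, for every coordinate j, the j-th coordinates of all
    sampling plan points are pairwise distinct."""
    num_coords = len(sampling_plan[0])
    for j in range(num_coords):
        column = [point[j] for point in sampling_plan]
        if len(set(column)) != len(column):
            return False
    return True


non_collapsing_plan = [[0.1, 0.5], [0.3, 0.9], [0.7, 0.2]]
collapsing_plan = [[0.1, 0.5], [0.1, 0.9], [0.7, 0.2]]  # first coordinate repeats

result_ok = is_non_collapsing(non_collapsing_plan)
result_bad = is_non_collapsing(collapsing_plan)
```

In the second plan, two points share the value 0.1 in the first coordinate; if the high-fidelity model were insensitive to the second coordinate, these two evaluations would carry essentially the same information.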
The space-filling property requires sampling the domain of the high-fidelity model in such a way that the sampling plan error $e(X_s) \in \mathbb{R}_+$ is minimal, which results in a maximal uniform scattering of the sampling plan points in the domain. Notice well that there are many ways to quantify the space-filling property (see, e.g., [53, p. 600]). In order to achieve an optimized sampling plan, a basic idea is to minimize some objective function involving a distance measure of the sampling plan points with respect to the $l_p$-norm. Pursuing this idea, the corresponding space-filling sampling plans are generally called Latin hypercube (LHC) sampling plans. Another kind of space-filling sampling plans are quasi-random sequences or low-discrepancy sequences ([53, p. 615ff]). They are discussed, for instance, in the context of the numerical integration of multivariate functions (cf. [116, p. 245]).
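A random (non-optimized) Latin hypercube on the unit hypercube can be generated in a few lines. The sketch below is not the implementation of any of the packages referenced in this section; it assumes the usual stratified construction in which each coordinate axis is divided into $m$ equal-width bins and every bin is occupied by exactly one point. All names are hypothetical.

```python
import random


def random_lhc(m, num_params, rng):
    """Random Latin hypercube sampling plan in [0, 1]^num_params with m points.

    For each coordinate, a random permutation assigns one of the m bins
    [k/m, (k+1)/m) to each point, and the point is placed at a random
    position inside its bin.
    """
    columns = []
    for _ in range(num_params):
        bins = list(range(m))
        rng.shuffle(bins)
        columns.append([(k + rng.random()) / m for k in bins])
    # Transpose the columns into an m x num_params list of rows.
    return [[columns[j][i] for j in range(num_params)] for i in range(m)]


plan = random_lhc(10, 2, random.Random(0))
```

By construction, such a plan is non-collapsing with probability one, but nothing prevents the points from clustering along a diagonal; this is the reason for optimizing the plan with respect to a space-filling criterion.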
In [70, p. 17–27], the authors provide an implementation in the MATLAB® PL for creating an optimized LHC sampling plan based on the Morris-Mitchell criterion and an evolutionary operation. By exploiting the package MATLAB.jl (see [101]), which provides the capability to interact with the MATLAB® PL within the Julia PL, let us adapt the lines of code concerning this particular optimized LHC sampling plan to the Julia PL and label them (XSpkg1).
Additionally, let us invoke two Julia PL packages for the creation of sampling
plans:
(XSpkg2) the package LatinHypercubeSampling.jl (see [215] and [216]) provides an
implementation for creating an optimized LHC sampling plan based on the
Audze-Eglais criterion and a genetic algorithm to solve the corresponding opti-
mization problem,
(XSpkg3) the package Sobol.jl (see [106]) provides an implementation for creating a
Sobol quasi-random sequence.
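The two space-filling criteria named above can be written down compactly. The sketch below is an illustration and not the code of (XSpkg1) or (XSpkg2): it assumes the commonly used forms of the criteria, namely the Audze-Eglais objective as the sum of inverse squared Euclidean distances over all point pairs and the Morris-Mitchell objective $\Phi_q$ as the $q$-th root of the sum of the pairwise distances raised to the power $-q$. The parameter names `p` and `q` are assumptions; for both criteria, smaller values indicate a more uniform scattering.

```python
from itertools import combinations


def pairwise_distances(plan, p=2):
    """Distances between all pairs of sampling plan points in the l_p-norm."""
    return [
        sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)
        for x, y in combinations(plan, 2)
    ]


def audze_eglais(plan):
    """Assumed Audze-Eglais objective: sum over pairs of 1 / d^2 (Euclidean d)."""
    return sum(1.0 / d ** 2 for d in pairwise_distances(plan, p=2))


def morris_mitchell(plan, q=2, p=2):
    """Assumed Morris-Mitchell objective: (sum over pairs of d^(-q))^(1/q)."""
    return sum(d ** (-q) for d in pairwise_distances(plan, p)) ** (1.0 / q)


clustered = [[0.10, 0.10], [0.12, 0.12], [0.90, 0.90]]
spread = [[0.10, 0.10], [0.50, 0.90], [0.90, 0.50]]

ae_clustered, ae_spread = audze_eglais(clustered), audze_eglais(spread)
mm_clustered, mm_spread = morris_mitchell(clustered), morris_mitchell(spread)
```

An optimized LHC is then obtained by searching over Latin hypercubes for one that minimizes the chosen criterion; the search strategy itself (evolutionary operation or genetic algorithm) is outside the scope of this sketch.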
In Figure 3.1 and in Figure 3.2, there are representations of different sampling plans $X_s \subseteq X^m$ where $X := [0,1]^2$ denotes the unit 2-dimensional hypercube and the number of sampling plan points $m$ is given by $m \in \{10, 25, 50, 100\}$ and by $m \in \{10, 50, 100, 1000\}$, respectively. Using (XSpkg1), an optimized Latin hypercube sampling plan is generated, which is abbreviated to maximin LHC (cf. [100]). Using (XSpkg2), a random Latin hypercube sampling plan and an optimized Latin hypercube sampling plan are generated, which are abbreviated to random LHC and Audze-Eglais LHC (cf. [100]), respectively. The unit hypercube is achieved by scaling the hypercube $[1,m] \times [1,m]$. Using (XSpkg3), a Sobol quasi-random sequence sampling plan is generated, which is by default constructed for the unit hypercube. Other hypercubes can be achieved by scaling the unit hypercube.
Comparing the LHC sampling plans (see Figures 3.1a–3.1d), one can observe that, already at a low number of sampling plan points, utilizing a random LHC can lead to a clustering instead of a uniform spreading. The comparison of the Audze-Eglais LHC and the maximin LHC is intricate due to the different underlying criteria and the randomness in the corresponding stochastic optimization algorithms (see § 2.3.3). However, both optimized Latin hypercubes exhibit a highly uniform scattering of the sampling plan points as desired. For more details on a comparison of the Audze-Eglais LHC and the maximin LHC, see, e.g., [100].
In Figure 3.2, the Sobol quasi-random sequence sampling plan is investigated, which shows a highly uniform and nonrandom scattering of the sampling plan points.⁴
Interestingly, using (XSpkg1) for high numbers of sampling plan points such as
m>100 and a high accuracy regarding the solving of the underlying optimization
problem requires a significantly higher computational time than using (XSpkg2) and
(XSpkg3).
From a modeling and simulation viewpoint, though, the fundamental premise
is that a single evaluation of a high-fidelity model is expensive. Therefore, the aim
is to construct a sufficiently space-filling sampling plan in a short amount of time at
a low number of sampling plan points. With this aim in mind, all three presented
space-filling sampling plans are well-suited.
From an optimization viewpoint, the sampling plan points should ideally be lo-
cated in the vicinity of optimal points. In Figure 3.3, I illustrate this requirement
by adapting the Audze-Eglais Latin hypercube sampling plan in Figure 3.2 for the
contour representation in Figure 2.2b.
Notice well that a high number of sampling plan points can lower the sampling
plan error e(Xs). Lowering this error can improve the local accuracy of a surrogate
model built upon these points. A surrogate model’s global accuracy, though, is also
determined by the function approximation error in (3.6) which is independent of the
sample.
Empirical surrogate modeling error
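In the context of the local and global accuracy of a surrogate model, let us introduce another notion regarding the entity $\hat{Q}_{\boldsymbol{\xi}}$: the empirical surrogate modeling error $e_{H,s}(\hat{Q}_{\boldsymbol{\xi}})$ with respect to the sampling plan $X_s$.
Definition 3.1.2 (Empirical surrogate modeling error). Let us suppose a sampling plan $X_s$ (such that $e(X_s)$ is minimal) equipped with sampling plan points $x_i \in X_{s,i}$ where $i \in \{1, \dots, m\}$ and $X_{s,i} \equiv X$. Furthermore, let us assume a high-fidelity model $\hat{Q}_{\boldsymbol{\xi}} \colon X \to \mathbb{R}$ and a fixed surrogate model $\tilde{\hat{Q}}_{\boldsymbol{\xi},n} \colon X \to \mathbb{R}$ from a hypothesis space $H \subseteq \mathbb{R}^X$. Then, the empirical surrogate modeling error $e_{H,s}(\hat{Q}_{\boldsymbol{\xi}}) \in \mathbb{R}_+$ with respect to the sampling plan $X_s$ is constituted by
\[ e_{H,s}(\hat{Q}_{\boldsymbol{\xi}}) := \frac{1}{m} \sum_{i=1}^{m} \bigl( \hat{Q}_{\boldsymbol{\xi}}(x_i) - \tilde{\hat{Q}}_{\boldsymbol{\xi},n}(x_i) \bigr)^2. \tag{3.15} \]
(See footnote 5 regarding (3.15).)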
⁴ I do not invoke the package’s functionality to skip the initial portion of the Sobol sequence which, allegedly, could further improve the uniform spreading of the sampling plan’s points.
⁵ From the viewpoint of statistical learning theory, this error is a representative of mean squared errors (see, e.g., [116, p. 265]). Mean squared errors along with root mean squared errors (see, e.g., [70, p. 37]) can be considered as parts of squared error loss functions (see, e.g., [91, p. 219]).
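The empirical surrogate modeling error in (3.15) is a mean squared error over the sampling plan points and can be computed directly. The following is a minimal sketch; the two one-dimensional stand-in models and all names are assumptions chosen only to make the arithmetic easy to follow.

```python
def empirical_surrogate_modeling_error(high_fidelity, surrogate, sampling_plan):
    """Mean of the squared differences between the high-fidelity model and the
    surrogate model over the m sampling plan points, cf. (3.15)."""
    m = len(sampling_plan)
    return sum(
        (high_fidelity(x) - surrogate(x)) ** 2 for x in sampling_plan
    ) / m


def quadratic(x):
    """Hypothetical high-fidelity model."""
    return x ** 2


def linear(x):
    """Hypothetical surrogate model."""
    return x


# The models agree at x = 0 and x = 1 and differ by 2 at x = 2,
# so the error is (0 + 0 + 4) / 3.
error = empirical_surrogate_modeling_error(quadratic, linear, [0.0, 1.0, 2.0])
```

Note that the value depends on all three ingredients named in the definition: the high-fidelity model, the chosen surrogate, and the sampling plan.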
[Figure 3.1: scatter plots of the sampling plan points over the axes x1 and x2 on [0, 1] × [0, 1], panels (i)–(iii), for (A) m := 10, (B) m := 25, (C) m := 50, and (D) m := 100 sampling plan points.]
FIGURE 3.1: Representations of different sampling plans $X_s \subseteq X^m$ where $X := [0,1]^2$ denotes the unit 2-dimensional hypercube. (i) Random LHC, (ii) Audze-Eglais LHC, (iii) Maximin LHC.
[Figure 3.2: scatter plots of the sampling plan points over the axes x1 and x2 on [0, 1] × [0, 1], panels (i)–(iii), for (A) m := 10, (B) m := 50, (C) m := 100, and (D) m := 1000 sampling plan points.]
FIGURE 3.2: Representations of different sampling plans $X_s \subseteq X^m$ where $X := [0,1]^2$ denotes the unit 2-dimensional hypercube. (i) Random LHC, (ii) Audze-Eglais LHC, (iii) Sobol quasi-random sequence.
[Figure 3.3: sampling plan points overlaid on the contour representations over the axes x1 and x2, panels (i)–(vi), for (A) m := 10 and (B) m := 50 sampling plan points.]
FIGURE 3.3: The Audze-Eglais LHC sampling plan in Figure 3.2 adapted for the contour representation in Figure 2.2b.
Remark 3.1.3. I point out that the size $m$ of the sample $s$ in (3.15) is fixed. If $m$ theoretically tended to infinity, then this asymptotic consideration would particularly affect the sampling plan error $e(X_s)$. The function approximation error $e_H(\hat{Q}_{\boldsymbol{\xi}})$ in (3.6), though, would not be affected since it is independent of the sample.
Remark 3.1.4. If we recall the presentation of a surrogate in (3.7), then, technically, we face a family of surrogates parameterized by the coefficients $\tilde{c}_i$ and by the degree $n$. One can distinguish the kinds of parameters by utilizing the term hyperparameter. Let us comprehend all parameters to be determined as hyperparameters $\tilde{\chi}_i$ – except the coefficients $\tilde{c}_i$; one can organize these hyperparameters within an ordered set whose size depends on the given surrogate modeling problem.
Remark 3.1.5. Notice well that the presence of the coefficients and hyperparameters pressures us to translate our notation from $\tilde{\hat{Q}}_{\boldsymbol{\xi},n}(x_i)$ to, for instance, $\tilde{\hat{Q}}_{\boldsymbol{\xi}}(x_{i_1}, c_{i_2}, \tilde{\chi}_{i_3})$. However, if the peril of confusion is low, in order to be consistent with the common literature, let us accept an abuse of notation, that is, let us put $\tilde{\hat{Q}}_{\boldsymbol{\xi},n}(x_i)$ to work, which refers tacitly to, for instance, $\tilde{\hat{Q}}_{\boldsymbol{\xi}}(x_{i_1}, c_{i_2}, \tilde{\chi}_{i_3})$.
Defining the error $e_{H,s}(\hat{Q}_{\boldsymbol{\xi}})$ has a conceptual and a practical value. At a conceptual level, it expresses the problem-dependency on $\hat{Q}_{\boldsymbol{\xi}}$. Furthermore, this error encodes the dependency on the surrogate model's membership to a prescribed class of functions, that is, the hypothesis space $H$; and it encodes the dependency on the sample $s$.
At a practical level, the error $e_{H,s}(\hat{Q}_{\boldsymbol{\xi}})$ serves as a starting point in order to define the empirical training error $e_{H,s_t}(\hat{Q}_{\boldsymbol{\xi}})$ and, more importantly, the empirical generalization error $e_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})$. These errors require a partition $X_{s,p}$ of the sampling plan $X_s$; more precisely, $X_s$ is represented as the disjoint union of two subsets $X_{s_t}$ and $X_{s_g}$, where $X_{s_t}$ denotes the training subset (or observed points subset) and $X_{s_g}$ denotes the testing subset (or prediction points subset). More formally, there exists an equivalence relation $\sim_{X_{s,p}}$ on $X_s$ associated with the given partition such that $X_{s,p} := X_s / \sim_{X_{s,p}}$; and by demanding $X_{s_t} \cap X_{s_g} = \emptyset$, it is set that
\[ X_s := X_{s_t} \uplus X_{s_g}, \tag{3.16} \]
where $X_{s_t} \in X_{s,p}$ and $X_{s_g} \in X_{s,p}$. Given the number of sampling plan points $m$, the positive integer $m_t$ denotes the number of training points and the positive integer $m_g$ denotes the number of testing points such that $m \equiv m_t + m_g$, where, mostly, $m_g \ll m_t$. Moreover, given $m$ and a scalar $p_m \in (0,1)$ that encodes a fixed partition ratio for the sampling plan, it is set that
\[ m_t := \lceil p_m \cdot m \rceil \quad \text{such that} \quad m_g := m - \lceil p_m \cdot m \rceil. \tag{3.17} \]
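The partition in (3.16) together with the sizes in (3.17) can be sketched in a few lines. The following is a minimal illustration only; the function name `split_sampling_plan`, the seed argument, and the toy sampling plan are assumptions made for this example and are not part of the present work.

```python
import math
import random

def split_sampling_plan(points, p_m, seed=0):
    """Randomly split `points` into a training and a testing subset.

    The sizes follow (3.17): m_t = ceil(p_m * m) and m_g = m - m_t.
    """
    if not 0.0 < p_m < 1.0:
        raise ValueError("p_m must lie in the open interval (0, 1)")
    m = len(points)
    m_t = math.ceil(p_m * m)
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)  # random permutation of the plan
    return shuffled[:m_t], shuffled[m_t:]  # disjoint by construction

# Hypothetical toy plan with m = 10 distinct points in the unit square.
plan = [(i / 10.0, ((i * 7) % 10) / 10.0) for i in range(10)]
train, test = split_sampling_plan(plan, p_m=0.75)
print(len(train), len(test))
```

Since the two returned lists are slices of one permutation, they are disjoint and their union is the whole sampling plan, which mirrors the disjoint union in (3.16).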
Let us suppose a scarcity of points in the sampling plan $X_s$. Hence, we do not define yet another subset; more precisely, we do not define a validating subset (see, e.g., [91, p. 222f]). A validating subset's purpose is to aid in selecting a member from a parameterized family of surrogates (recall Remark 3.1.4) adapted to the training subset, before this preselected member is assessed by the testing subset.
The members of the training subset $X_{s_t}$ are deployed in (3.15) to determine or to estimate the parameters $\tilde{g} \in G$, such as the coefficients $\tilde{c}_i$ in (3.7) or the hyperparameters $\tilde{\chi}_i$, either by an interpolation problem (exactly fitting the given data) or by a regression problem (inexactly fitting the given data).
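Determining the coefficients by interpolation results in an empirical training error $e_{H,s_t}(\hat{Q}_{\boldsymbol{\xi}})$ of zero, by virtue of the interpolation property: Supposing an evaluation functional $\mathrm{ev}_i \in \hom_{\mathbb{R}}(\mathbb{R}^X, \mathbb{R})$ such that $\mathrm{ev}_i(\tilde{\hat{Q}}_{\boldsymbol{\xi},n}) := \tilde{\hat{Q}}_{\boldsymbol{\xi},n}(x_i)$, let us conceive the interpolation property in the sense of
\[ \forall x_i \in X_{s,i}.\ \hat{Q}_{\boldsymbol{\xi}}(x_i) = \tilde{\hat{Q}}_{\boldsymbol{\xi},n}(x_i, \tilde{g}). \tag{3.18} \]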
In the next section, we discuss the radial basis surrogate model and the Gaussian process regression (or kriging) surrogate model, respectively. In this discussion, we briefly encounter the interpolation problem in a deterministic setting and in a stochastic setting, respectively. For an in-depth elaboration on deterministic and stochastic interpolation, I refer to, e.g., [201, ch. 13] or [188].
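Recalling (2.35) and (3.8a), one can define an objective function $r : G \to \mathbb{R}$ in order to estimate the best parameters $\hat{\tilde{g}} \in G$ via a regression problem in the sense of a basic discrete least squares $l_2$ approximation
\[ \operatorname*{minimize}_{\tilde{g} \in G} \; r(\tilde{g}) := \frac{1}{2} \sum_{i=1}^{m} \left\| \hat{Q}_{\boldsymbol{\xi}}(x_i) - \tilde{\hat{Q}}_{\boldsymbol{\xi}}(x_i, \tilde{g}) \right\|^2_{l_2}. \tag{3.19} \]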
If we estimate the coefficients by regression such as a least squares method, then, generically, the training error $e_{H,s_t}(\hat{Q}_{\boldsymbol{\xi}})$ is greater than zero. A possible interpretation of this approach to parameter finding is the following: We know a priori that the sample $s$ in (3.13) fits the surrogate model but the observed outputs within the sample are noisy, that is,
\[ \forall i \in \{1,\dots,m\}.\ y_i = K(x_i) + \varepsilon_i, \tag{3.20} \]
where $\varepsilon_i$ indicate members of a vector $\varepsilon \in \mathbb{R}^m$ which are independent random numbers distributed with regard to the normal distribution with mean $\mu \equiv 0$ and constant variance $\sigma^2$, i.e., $\varepsilon \sim N(0, \sigma^2)$.⁶ For ordinary least squares and a fixed standard deviation $\sigma > 0$, this interpretation can be embedded in the more general parameter finding by maximum likelihood estimation (see, e.g., [91, p. 31], [155, p. 217ff] or [201, p. 96]). We look at the maximum likelihood estimation in the next section with regard to the Gaussian process regression (or kriging) low-fidelity model.
The training error $e_{H,s_t}(\hat{Q}_{\boldsymbol{\xi}})$ associated with a surrogate model $\tilde{\hat{Q}}_{\boldsymbol{\xi},n}$ is inadequate to assess the surrogate model's predictive power concerning points not yet observed (see, e.g., [173, p. 108] or [91, p. 221]). Hence, derived from the empirical surrogate modeling error in (3.15), let us introduce the empirical generalization error $e_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}}) \in \mathbb{R}_+$ with respect to the testing subset $X_{s_g}$ such that
\[ e_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}}) := \frac{1}{m_g} \sum_{i=1}^{m_g} \left( \hat{Q}_{\boldsymbol{\xi}}(x_i) - \tilde{\hat{Q}}_{\boldsymbol{\xi},n}(x_i) \right)^2, \tag{3.21} \]
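where $\forall i \in \{1,\dots,m_g\}.\ x_i \in X_{s_g,i} := (X_s \setminus X_{s_t})_i$ and $(X_s \setminus X_{s_t})_i \equiv X$. Often, it is convenient to normalize the error $e_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})$ to the interval $[\hat{Q}^{\min}_{\boldsymbol{\xi}}, \hat{Q}^{\max}_{\boldsymbol{\xi}}]_{X_{s_g,i}}$, where $\hat{Q}^{\min}_{\boldsymbol{\xi}} \in \mathbb{R}$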
⁶ In statistics vernacular (see, e.g., [61, p. 249–252]), the input entity, the output entity, and the error entity are conceived as random variables $X$, $Y$, and $\varepsilon$, respectively. Hence, the high-fidelity model $K$ is regarded as an unknown smooth regression function, i.e., as the conditional expectation $E(Y \mid X = x) =: K(x)$. Regarding the conditional variance $V_{\mu \equiv 0}(\varepsilon \mid X = x)$, it is assumed that the homoscedasticity property holds, that is, $\forall i.\ V_{\mu \equiv 0}(\varepsilon_i \mid X = x) \equiv \sigma^2_{\varepsilon \mid X}$ with $\sigma^2_{\varepsilon \mid X} \in \mathbb{R}_+ \cup \{+\infty\}$. In addition, noise is only modeled by additive Gaussian noise.
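denotes the minimal output point with respect to $X_{s_g,i}$ and $\hat{Q}^{\max}_{\boldsymbol{\xi}} \in \mathbb{R}$ denotes the maximal output point with respect to $X_{s_g,i}$. Hence, one can define the normalized empirical generalization error (NEGE) $e^{N}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}}) \in \mathbb{R}_+$ with respect to the testing subset $X_{s_g}$ such that
\[ e^{N}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}}) := \frac{1}{\hat{Q}^{\max}_{\boldsymbol{\xi}} - \hat{Q}^{\min}_{\boldsymbol{\xi}}} \, e_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}}). \tag{3.22} \]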
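As has been pointed out in the commentary on the empirical surrogate modeling error in (3.15), one can additionally introduce the root empirical generalization error $e^{R}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}}) \in \mathbb{R}_+$ such that
\[ e^{R}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}}) := \left[ e_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}}) \right]^{\frac{1}{2}} \tag{3.23} \]
and one can introduce the normalized root empirical generalization error (NREGE) $e^{NR}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}}) \in \mathbb{R}_+$ that reads as
\[ e^{NR}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}}) := \frac{1}{\hat{Q}^{\max}_{\boldsymbol{\xi}} - \hat{Q}^{\min}_{\boldsymbol{\xi}}} \, e^{R}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}}). \tag{3.24} \]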
Note that both the error $e^{R}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})$ and the error $e^{NR}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})$ can be seen as more conservative error measures than their counterparts in (3.21) and in (3.22), respectively.
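To make the four error measures in (3.21) to (3.24) concrete, here is a minimal sketch that evaluates them for two short lists of testing outputs. The function name `generalization_errors` and the toy numbers are assumptions for illustration; the normalization uses the range of the high-fidelity outputs on the testing subset, as described above.

```python
import math

def generalization_errors(y_high, y_low):
    """Return the errors (3.21)-(3.24) for paired outputs on the testing subset.

    y_high: outputs of the high-fidelity model at the testing points
    y_low:  outputs of the surrogate (low-fidelity) model at the same points
    """
    m_g = len(y_high)
    e = sum((h - l) ** 2 for h, l in zip(y_high, y_low)) / m_g   # (3.21)
    q_range = max(y_high) - min(y_high)   # Q_max - Q_min, assumed nonzero
    e_root = math.sqrt(e)                                        # (3.23)
    return {
        "e": e,
        "e_N": e / q_range,        # (3.22)
        "e_R": e_root,
        "e_NR": e_root / q_range,  # (3.24)
    }

# Hypothetical testing outputs; only the last prediction is off by 2.
errors = generalization_errors([0.0, 1.0, 2.0, 4.0], [0.0, 1.0, 2.0, 2.0])
print(errors)
```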
Let us call a partition $X_{s,p}$ (with, e.g., the cells $X_{s_t}$ and $X_{s_g}$) randomly created if and only if the members of the sampling plan $X_s$ are permuted randomly and assigned to the partition's cells in accordance with the partition ratio $p_m$. Then, given a fixed number of sampling plan points $m$, a sampling plan point's membership to either the training subset or to the testing subset in (3.16) is random, i.e., the partition $X_{s,p}$ is randomly created.
In order to average over this membership randomness, let us utilize the notion of a mean generalization error $\overline{e}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}}) \in \mathbb{R}_+$, which we compute by deploying the hold-out with random sub-sampling method or the k-fold cross-validation method. These methods are computationally tractable since they are non-exhaustive in the sense that they do not consider all possible ways of partitioning the sampling plan. For further elaboration on the technical intricacies, see, e.g., the survey in [5].
The basic version of the hold-out (or simple validation) method computes the generalization error in (3.21) by assuming a fixed partition ratio $p_m$ and a randomly created partition $X_{s,p}$ that is constituted of the two cells $X_{s_t}$ and $X_{s_g}$. An extended version of the hold-out method includes random sub-sampling (cf. [116, p. 267f]), viz. performing the basic hold-out method in a finite number of independent runs where at each run the generalization error in (3.21) is computed; finally, the mean generalization error $\overline{e}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})$ is computed as the mean of all individual generalization errors. However, due to the random creation of the individual partitions, this method does not provide any guarantee that all sampling plan points will be exploited properly as testing points.
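The extended hold-out method can be sketched as follows. This is a minimal illustration under stated assumptions: the surrogate is replaced by a plain least-squares line in one variable (`fit_line`, a stand-in chosen only to keep the example short), and the names `holdout_mean_error`, `runs`, and `seed` are hypothetical.

```python
import math
import random

def fit_line(xs, ys):
    """Least-squares line y = a*x + b; a stand-in surrogate for this sketch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: a * x + (my - a * mx)

def holdout_mean_error(xs, ys, p_m=0.8, runs=10, seed=0):
    """Mean of the generalization errors (3.21) over `runs` random partitions."""
    rng = random.Random(seed)
    m = len(xs)
    m_t = math.ceil(p_m * m)          # training size as in (3.17)
    if m_t >= m:
        raise ValueError("the testing subset would be empty")
    errors = []
    for _ in range(runs):
        idx = list(range(m))
        rng.shuffle(idx)              # randomly created partition
        train, test = idx[:m_t], idx[m_t:]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors.append(sum((ys[i] - model(xs[i])) ** 2 for i in test) / len(test))
    return sum(errors) / runs

xs = [i / 10.0 for i in range(20)]
print(holdout_mean_error(xs, [x * x for x in xs]))
```

Because every run draws a fresh permutation, some points may never serve as testing points, which is the drawback noted above.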
The k-fold cross-validation method supposes a randomly created partition $X_{s,p}$ that is constituted of $k$ cells $X^{(1)}_{s,p}, \dots, X^{(k-1)}_{s,p}, X^{(k)}_{s,p}$, where the positive integer $k$ is selected as $k \ll m$. Either the cells are equal in size, i.e., given a positive integer $q$, then $\forall i \in \{1,\dots,k\}.\ |X^{(i)}_{s,p}| = q$; or the cells are only approximately equal in size. Generally, there are numerous options to construct the required training and testing subsets. However, the construction principle underlying this method demands to define $k-1$
cells as the training subset and to define the $k$-th cell as the testing subset. Since the cells' ordering is not preserved when assigned to the subsets, there are $k$ different options to define the corresponding subsets. Therefore, one can define the $i$-th training subset and the $i$-th testing subset as
\[ X^{(i)}_{s_t} := X_{s,p} \setminus X^{(i)}_{s,p}, \tag{3.25} \]
\[ X^{(i)}_{s_g} := X^{(i)}_{s,p}, \tag{3.26} \]
where $i \in \{1,\dots,k\}$. For each of the $k$ options, the generalization error in (3.21) is computed. Hence, the mean generalization error $\overline{e}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})$ is computed as the mean of all the $k$ individual generalization errors. In order to emphasize the dependency of the generalization error on the k-fold cross-validation method, let us introduce the map $e_{H,s_g,\mathrm{cv}}$ that reads as
\[ e_{H,s_g,\mathrm{cv}} = k \mapsto e_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})|_k := e_{H,s_g,\mathrm{cv}}(k) : \mathbb{Z}_+ \to \mathbb{R}_+, \tag{3.27} \]
where $e_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})|_k$ and $e_{H,s_g,\mathrm{cv}}(k)$, respectively, denote the k-dependent generalization error. Hence, the mean k-dependent generalization error is denoted as $\overline{e}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})|_k$ and $\overline{e}_{H,s_g,\mathrm{cv}}(k)$, respectively.
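The k-fold construction in (3.25) and (3.26) can be sketched as below. This is a minimal illustration, not the implementation used in the present work: `k_fold_cells`, `k_fold_mean_error`, and the constant stand-in surrogate `fit_constant` are hypothetical names, and the cells are only approximately equal in size when $k$ does not divide $m$.

```python
import random

def k_fold_cells(m, k, seed=0):
    """Randomly partition the indices 0..m-1 into k cells of near-equal size."""
    idx = list(range(m))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_constant(xs, ys):
    """Stand-in surrogate for this sketch: predicts the training mean."""
    mean = sum(ys) / len(ys)
    return lambda x: mean

def k_fold_mean_error(xs, ys, k, fit, seed=0):
    """Mean over the k generalization errors (3.21), one per testing cell."""
    cells = k_fold_cells(len(xs), k, seed)
    errors = []
    for i, test_cell in enumerate(cells):
        # (3.25): all cells except the i-th form the training subset
        train = [j for c, cell in enumerate(cells) if c != i for j in cell]
        model = fit([xs[j] for j in train], [ys[j] for j in train])
        # (3.26): the i-th cell is the testing subset
        errors.append(
            sum((ys[j] - model(xs[j])) ** 2 for j in test_cell) / len(test_cell)
        )
    return sum(errors) / k

xs = [float(i) for i in range(12)]
print(k_fold_mean_error(xs, xs, k=4, fit=fit_constant))
```

In contrast to the hold-out method with random sub-sampling, every sampling plan point is used exactly once as a testing point here.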
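In the context of (3.22), let us ease the notation for the sake of conciseness, that is, let us introduce the map $e^{N}_{\mathrm{cv}}$ that reads as
\[ e^{N}_{\mathrm{cv}} = k \mapsto e^{N}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})|_k \equiv e^{N}_{\mathrm{cv}}(k) : \mathbb{Z}_+ \to \mathbb{R}_+, \tag{3.28} \]
where $e^{N}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})|_k$ and $e^{N}_{\mathrm{cv}}(k)$, respectively, denote the k-dependent normalized generalization error. Hence, the mean k-dependent normalized generalization error is denoted as $\overline{e}^{N}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})|_k$ and $\overline{e}^{N}_{\mathrm{cv}}(k)$, respectively. Notice well that one can analogously define the mean k-dependent normalized root generalization error $\overline{e}^{NR}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})|_k$ and $\overline{e}^{NR}_{\mathrm{cv}}(k)$, respectively.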
In both the extended hold-out method and the k-fold cross-validation method, the computational burden is dominated by determining the surrogate model via the training subset for the computation of the individual generalization errors. Hence, the number of runs (in the hold-out method) and the number of folds (in the cross-validation method), respectively, have to be chosen in such a way that the burden is low while still producing a reliable mean generalization error. Let us choose the number of runs similar to the number of folds, for which a computationally reasonable choice is $k \equiv 5$ or $k \equiv 10$ (see, e.g., [91, p. 242ff]).⁷
These computational considerations are relevant for both the surrogate model assessment and the surrogate model selection (as already mentioned above regarding the non-utilization of a validating subset), which can be subsumed under the bias-variance problem (see, e.g., [51, p. 13f], [91, p. 223ff] or [155, p. 202]). It is a non-trivial task to find the optimal hyperparameters in the sense that there is an adequate tradeoff between the need for a small bias to avoid underfitting and the need for a small variance to avoid overfitting the points regarding the sample $s$ in (3.15) where the size $m$ is fixed. Since there is a lack of a rigorously proven and computationally cost-efficient approach to finding the optimal hyperparameters, a common practice to emulate a validating subset's purpose is to specify some hyperparameters
⁷ Apart from these heuristic values, there is a lack of rigorously proven lower or upper bounds for the number of runs or the number of folds.
by the user and to estimate the remaining hyperparameters by, e.g., cross-validation
or maximum-likelihood (see, e.g., [69]).
Supplementary to the error in (3.21), some authors (see, e.g., [70, p. 37]) suggest computing the squared sample Pearson correlation coefficient (SSPCC) $r^2_{\hat{y}\tilde{\hat{y}}}$ with respect to the testing subset $X_{s_g}$, where $r_{\hat{y}\tilde{\hat{y}}} \in [-1,1]$. For the sake of lucidity, let us apply partly the identifications
\[ \hat{Y} := \hat{Q}_{\boldsymbol{\xi}}(x_i), \tag{3.29a} \]
\[ \tilde{\hat{Y}} := \tilde{\hat{Q}}_{\boldsymbol{\xi},n}(x_i). \tag{3.29b} \]
Since we consider solely the discrete setting, one can additionally set the sample means
\[ \bar{\hat{Y}} := \frac{1}{m_g} \sum_{i=1}^{m_g} \hat{Y}_i, \tag{3.30a} \]
\[ \bar{\tilde{\hat{Y}}} := \frac{1}{m_g} \sum_{i=1}^{m_g} \tilde{\hat{Y}}_i, \tag{3.30b} \]
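where the identifications $\hat{Y}_i \equiv \hat{Y}$ and $\tilde{\hat{Y}}_i \equiv \tilde{\hat{Y}}$ are invoked. Moreover, one can overload the meaning of the covariance map cov and the variance map var regarding random variables in a continuous setting. Hence, the coefficient $r^2_{\hat{y}\tilde{\hat{y}}}$ reads as
\begin{align}
r^2_{\hat{y}\tilde{\hat{y}}}
&:= \left( \frac{\operatorname{cov}(\hat{Y}, \tilde{\hat{Y}})}{\sqrt{\operatorname{cov}(\hat{Y}, \hat{Y}) \operatorname{cov}(\tilde{\hat{Y}}, \tilde{\hat{Y}})}} \right)^2 \tag{3.31a} \\
&\equiv \left( \frac{\operatorname{cov}(\hat{Y}, \tilde{\hat{Y}})}{\sqrt{\operatorname{var}(\hat{Y}) \operatorname{var}(\tilde{\hat{Y}})}} \right)^2 \tag{3.31b} \\
&\equiv \left( \frac{\frac{1}{m_g - 1} \sum (\hat{Y} - \bar{\hat{Y}})(\tilde{\hat{Y}} - \bar{\tilde{\hat{Y}}})}{\sqrt{\frac{1}{m_g - 1} \sum (\hat{Y} - \bar{\hat{Y}})^2 \, \frac{1}{m_g - 1} \sum (\tilde{\hat{Y}} - \bar{\tilde{\hat{Y}}})^2}} \right)^2 \tag{3.31c} \\
&\equiv \left( \frac{\sum (\hat{Y} - \bar{\hat{Y}})(\tilde{\hat{Y}} - \bar{\tilde{\hat{Y}}})}{\sqrt{\sum (\hat{Y} - \bar{\hat{Y}})^2 \sum (\tilde{\hat{Y}} - \bar{\tilde{\hat{Y}}})^2}} \right)^2 \tag{3.31d} \\
&\equiv \left( \frac{m_g \sum \hat{Y} \tilde{\hat{Y}} - \sum \hat{Y} \sum \tilde{\hat{Y}}}{\sqrt{\left( m_g \sum \hat{Y}^2 - (\sum \hat{Y})^2 \right) \left( m_g \sum \tilde{\hat{Y}}^2 - (\sum \tilde{\hat{Y}})^2 \right)}} \right)^2 \tag{3.31e} \\
&\overset{(3.29)}{\equiv} \left( \frac{m_g \sum_{i=1}^{m_g} \left[ \hat{Q}_{\boldsymbol{\xi}}(x_i) \tilde{\hat{Q}}_{\boldsymbol{\xi},n}(x_i) \right] - \left[ \sum_{i=1}^{m_g} \hat{Q}_{\boldsymbol{\xi}}(x_i) \right] \left[ \sum_{i=1}^{m_g} \tilde{\hat{Q}}_{\boldsymbol{\xi},n}(x_i) \right]}{\sqrt{\left( m_g \sum_{i=1}^{m_g} [\hat{Q}_{\boldsymbol{\xi}}(x_i)]^2 - \left[ \sum_{i=1}^{m_g} \hat{Q}_{\boldsymbol{\xi}}(x_i) \right]^2 \right) \left( m_g \sum_{i=1}^{m_g} [\tilde{\hat{Q}}_{\boldsymbol{\xi},n}(x_i)]^2 - \left[ \sum_{i=1}^{m_g} \tilde{\hat{Q}}_{\boldsymbol{\xi},n}(x_i) \right]^2 \right)}} \right)^2. \tag{3.31f}
\end{align}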
In statistics parlance, if we regard the quantity $\hat{Y}$ as an encoding of the observed values, and if we regard the quantity $\tilde{\hat{Y}}$ as an encoding of the predicted (or computed or simulated) values, then the choice of the SSPCC $r^2_{\hat{y}\tilde{\hat{y}}}$ in (3.31) rests on the assumption that all observed values and all predicted values are equally important. Hence, the weighting of the individual data points is one.
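The equivalence of the centered form (3.31d) and the raw-sum form (3.31e) can be checked numerically with a minimal sketch. The function names `sspcc_centered` and `sspcc_raw_sums` and the toy values are assumptions for illustration; lists with zero variance are not handled.

```python
def sspcc_centered(y_obs, y_pred):
    """SSPCC via centered sums, following (3.31d)."""
    m_g = len(y_obs)
    mean_o, mean_p = sum(y_obs) / m_g, sum(y_pred) / m_g
    s_op = sum((o - mean_o) * (p - mean_p) for o, p in zip(y_obs, y_pred))
    s_oo = sum((o - mean_o) ** 2 for o in y_obs)
    s_pp = sum((p - mean_p) ** 2 for p in y_pred)
    return s_op ** 2 / (s_oo * s_pp)

def sspcc_raw_sums(y_obs, y_pred):
    """The same coefficient via raw sums, following (3.31e)."""
    m_g = len(y_obs)
    sum_o, sum_p = sum(y_obs), sum(y_pred)
    num = m_g * sum(o * p for o, p in zip(y_obs, y_pred)) - sum_o * sum_p
    den = (m_g * sum(o * o for o in y_obs) - sum_o ** 2) * (
        m_g * sum(p * p for p in y_pred) - sum_p ** 2
    )
    return num ** 2 / den

observed = [1.0, 2.0, 3.0, 4.0]
shifted = [2.0 * o + 1.0 for o in observed]   # perfectly linear, but a=2, b=1
scrambled = [1.0, 3.0, 2.0, 4.0]
print(sspcc_centered(observed, shifted), sspcc_raw_sums(observed, shifted))
print(sspcc_centered(observed, scrambled), sspcc_raw_sums(observed, scrambled))
```

The first pair of lists is related by an affine map with slope two and offset one, yet both forms report a coefficient of one; the SSPCC alone therefore says nothing about how close the predicted values are to the observed ones.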
Geometrically speaking, assuming a list of abstract points from a Euclidean space, the list of predicted values and the list of observed values can be interpreted as the Cartesian coordinates of the abstract points, where $e_{\tilde{\hat{Y}}}$ and $e_{\hat{Y}}$ refer to the unit vectors in $\mathbb{R}^2$ w.r.t. $\tilde{\hat{Y}}$ and $\hat{Y}$, respectively. Therefore, the SSPCC $r^2_{\hat{y}\tilde{\hat{y}}}$ indicates how well the relationship between the abstract points can be described by a linear equation in $\mathbb{R}^2$. Notice that, by definition, $r^2_{\hat{y}\tilde{\hat{y}}} \in [0,1]$. Thus, if $r^2_{\hat{y}\tilde{\hat{y}}} = 1$, then there is a total positive linear correlation and the relationship between the abstract points can be described by the linear equation that reads as
\[ \exists a \in \mathbb{R}_+.\ \exists b \in \mathbb{R}.\ \forall x_i \in X_{s_g,i}.\ \hat{Q}_{\boldsymbol{\xi}}(x_i) = a \cdot \tilde{\hat{Q}}_{\boldsymbol{\xi},n}(x_i) + b. \tag{3.32} \]
Ideally, one can provide the number $a$ and the number $b$ such that $a := 1$ and $b := 0$. However, the choice of the SSPCC $r^2_{\hat{y}\tilde{\hat{y}}}$ in (3.31) is not capable of identifying the case where $a := 1$ and $b := 0$.
If $r^2_{\hat{y}\tilde{\hat{y}}} = 0$, then there is no linear correlation; more precisely, one cannot provide a number $a$ and a number $b$ such that (3.32) holds to be true at least for some $x_i \in X_{s_g,i}$.
Observe that the geometrical consideration of the SSPCC $r^2_{\hat{y}\tilde{\hat{y}}}$ reveals that the number $m_g$ in (3.31) has to satisfy the condition $m_g > 2$, which can be translated into the requirement that there are at least three abstract points represented as members of the linear span constituted by the unit vectors $e_{\tilde{\hat{Y}}}$ and $e_{\hat{Y}}$, i.e., $\operatorname{span}(\{e_{\tilde{\hat{Y}}}, e_{\hat{Y}}\})$; otherwise $r^2_{\hat{y}\tilde{\hat{y}}}$ is immediately equal to one.
In order to assess a low-fidelity model in a more nuanced way, as mentioned above, one can use $r^2_{\hat{y}\tilde{\hat{y}}}$ in combination with $e_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})$. However, if we use the SSPCC in combination with the mean generalization error determined by the k-fold cross-validation method, then the condition $m_g > 2$ requires a minimum number of sampling points $m_{k,\min}$ depending on the number $k$, that is, the number of cells of the randomly created partition. Therefore, for instance, if $k := 5$, then $m_{k:=5,\min} := 15$; and if $k := 10$, then $m_{k:=10,\min} := 30$.
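The two quoted values follow from a short count. As a hedged reading, assume that the $k$ cells are equal in size, so that each testing cell holds $m_g = m/k$ points; the integer condition $m_g > 2$ then gives

```latex
\[
  m_g = \frac{m}{k} > 2
  \;\Longleftrightarrow\; m_g \geq 3
  \;\Longrightarrow\; m_{k,\min} = 3k ,
  \qquad
  m_{k:=5,\min} = 3 \cdot 5 = 15 ,
  \quad
  m_{k:=10,\min} = 3 \cdot 10 = 30 .
\]
```

If the cells are only approximately equal in size, the same bound applies to the smallest cell.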
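In order to emphasize this kind of dependency of the SSPCC on the k-fold cross-validation method, let us introduce the map $r^2_{\hat{y}\tilde{\hat{y}},\mathrm{cv}}$ that reads as
\[ r^2_{\hat{y}\tilde{\hat{y}},\mathrm{cv}} = k \mapsto r^2_{\hat{y}\tilde{\hat{y}}}|_k := r^2_{\hat{y}\tilde{\hat{y}},\mathrm{cv}}(k) : \mathbb{Z}_+ \to [0,1], \tag{3.33} \]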
where $r^2_{\hat{y}\tilde{\hat{y}}}|_k$ and $r^2_{\hat{y}\tilde{\hat{y}},\mathrm{cv}}(k)$, respectively, denote the k-dependent SSPCC. In analogy to $\overline{e}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})$, for each of the $k$ options in (3.25) and in (3.26), the SSPCC in (3.31) is computed. Hence, the mean k-dependent SSPCC (or short: mean SSPCC) $\overline{r}^2_{\hat{y}\tilde{\hat{y}}}|_k$ is computed as the mean of all the $k$ individual SSPCCs.
Whereas the error in (3.21) focuses on the comparison of the values of the high-
fidelity model and the low-fidelity model, the SSPCC focuses on the comparison
of the shapes (or landscapes) of the high-fidelity model and the low-fidelity model.
Thus, if the SSPCC is close to the number 1, then it hints at a high geometrical sim-
ilarity of the corresponding shapes. Therefore, the SSPCC can be suitable as a sup-
plementary tool to assess quantitatively the similarity of the high-fidelity model and
a low-fidelity model.
Notice well that describing shapes by exploiting information about derivatives is a common theme in languages such as, for instance, differential geometry (see, e.g., Detour 1 in § 2.1.2). Since we deploy a gradient-based interpretation of sensitivity measures (recall § 2.3.3), it is mathematically reasonable to associate the k-dependent
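SSPCC with a low-fidelity model's normalized global first-order sensitivity measures $S^{N}_{\tilde{\hat{y}},i}(f)$ with $S^{N}_{\tilde{\hat{y}},i}(f) \equiv S^{N}_{i}(\tilde{\hat{Q}}_{\boldsymbol{\xi},n})$. Hence, I propose to normalize the k-dependent SSPCC $r^2_{\hat{y}\tilde{\hat{y}}}|_k$, and the mean SSPCC $\overline{r}^2_{\hat{y}\tilde{\hat{y}}}|_k$ as well, to the sum $\sum_{i=1}^{N_\xi} S^{N}_{\tilde{\hat{y}},i}(f)$. More precisely, one can define the normalized k-dependent SSPCC and the normalized mean SSPCC such that
\[ r^2_{\hat{y}\tilde{\hat{y}}}|^{N}_{k} := \frac{r^2_{\hat{y}\tilde{\hat{y}}}|_k}{\sum_{i=1}^{N_\xi} S^{N}_{\tilde{\hat{y}},i}(f)}, \qquad \overline{r}^2_{\hat{y}\tilde{\hat{y}}}|^{N}_{k} := \frac{\overline{r}^2_{\hat{y}\tilde{\hat{y}}}|_k}{\sum_{i=1}^{N_\xi} S^{N}_{\tilde{\hat{y}},i}(f)}. \tag{3.34} \]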
Notice that $r^2_{\hat{y}\tilde{\hat{y}}}|^{N}_{k} =_{[0,1]} r^2_{\hat{y}\tilde{\hat{y}}}|_k$ and $\overline{r}^2_{\hat{y}\tilde{\hat{y}}}|^{N}_{k} =_{[0,1]} \overline{r}^2_{\hat{y}\tilde{\hat{y}}}|_k$, that is, the corresponding entities are equal as numbers. However, they are conceptually different. A benefit of introducing the entity $r^2_{\hat{y}\tilde{\hat{y}}}|^{N}_{k}$, and the entity $\overline{r}^2_{\hat{y}\tilde{\hat{y}}}|^{N}_{k}$ as well, is to highlight the connection of various information sources for assessing the shape (or landscape) of the low-fidelity model with regard to the high-fidelity model. Another benefit is that $r^2_{\hat{y}\tilde{\hat{y}}}|^{N}_{k}$ and $\overline{r}^2_{\hat{y}\tilde{\hat{y}}}|^{N}_{k}$ hint at the trustworthiness of the normalized global first-order sensitivity measures associated with a low-fidelity model, that is, $S^{N}_{\tilde{\hat{y}},i}(f)$, as proxies for the normalized global first-order sensitivity measures associated with the high-fidelity model, that is, $S^{N}_{\hat{y},i}(f)$ with $S^{N}_{\hat{y},i}(f) \equiv S^{N}_{i}(\hat{Q}_{\boldsymbol{\xi}})$. To put the conjecture in more formal words:
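Conjecture (Trustworthiness of low-fidelity models' normalized global first-order sensitivity measures). Given $k$ and $m$ such that $m > m_{k,\min}$, then there exist some low-fidelity models such that
\[ \forall i \in \{1,\dots,N_\xi\}.\ S^{N}_{\tilde{\hat{y}},i}(f) \to S^{N}_{\hat{y},i}(f) \text{ as } m \to \infty \;\Longrightarrow\; r^2_{\hat{y}\tilde{\hat{y}}}|^{N}_{k} \to 1 \text{ as } m \to \infty.\text{⁸} \tag{3.35} \]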
Remark 3.1.6. If and only if the case
\[ \forall i \in \{1,\dots,N_\xi\}.\ S^{N}_{\tilde{\hat{y}},i}(f) = S^{N}_{\hat{y},i}(f) \tag{3.36} \]
holds, a low-fidelity model's sensitivity measures are considered totally trustworthy proxies for a high-fidelity model's sensitivity measures.
The contrapositive of the statement in (3.35) emphasizes that if $r^2_{\hat{y}\tilde{\hat{y}}}|^{N}_{k}$ does not converge asymptotically to one as $m$ tends to infinity, then one cannot expect the low-fidelity model's sensitivity measures to be trustworthy proxies at all. To the best of my knowledge, a thorough formal investigation of the above-mentioned conjecture is lacking. Mind that such a thorough formal investigation is out of the scope of the present work. The conjecture should rather be understood as an attempt to jot down formally an accumulation of experimental observations than as an attempt to infer logically from a bundle of theoretical insights.
In § 3.2, I present a small data-driven investigation of the statement in (3.35) by
means of numerical experiments with regard to the test functions in Table 2.1. Notice
that, in this investigation, the sample size is limited, though. Hence, the asymptotic
behavior is not examined but, primarily, the pre-asymptotic behavior – since, from
⁸ It is implicitly supposed that the limit considerations are with regard to some appropriate norms.
an application-driven viewpoint, the pre-asymptotic behavior is particularly interesting.
In practical applications, the statement in (3.35) inspires the introduction, for each $i \in \{1,\dots,N_\xi\}$, of a low-fidelity model's normalized global first-order sensitivity measures (LFSM) error $e_{m_j}(S^{N}_{\tilde{\hat{y}},i})$ that reads as
\[ e_{m_j}(S^{N}_{\tilde{\hat{y}},i}) := \frac{S^{N}_{i,m_j}(f) - S^{N}_{i,m_{j-1}}(f)}{S^{N}_{i,m_j}(f)}, \tag{3.37} \]
where $m_j$ and $m_{j-1}$ denote sample sizes such that $m_j > m_{j-1}$. The errors in (3.37) track the size as well as the orientation of the discrepancy between a low-fidelity model's normalized global first-order sensitivity measures w.r.t. a sample size $m_j$ and a sample size $m_{j-1}$. The notation $e_{m_\infty}(S^{N}_{\tilde{\hat{y}},i})$ refers to the situation where a reference value, e.g., from an analytical calculation, is provided by a user.
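The LFSM error in (3.37) is a signed relative change between consecutive sample sizes, which can be sketched as follows. The function name `lfsm_errors`, the dictionary layout, and the toy sensitivity values are assumptions made for this illustration only.

```python
def lfsm_errors(sensitivities_by_size):
    """Signed relative change (3.37) between consecutive sample sizes.

    sensitivities_by_size maps a sample size m_j to the list of normalized
    first-order sensitivity measures [S_1, ..., S_N] obtained with that size.
    Returns a dict that maps each m_j (except the smallest) to its errors.
    """
    sizes = sorted(sensitivities_by_size)
    result = {}
    for m_prev, m_cur in zip(sizes, sizes[1:]):
        s_prev = sensitivities_by_size[m_prev]
        s_cur = sensitivities_by_size[m_cur]
        # the sign records the orientation of the discrepancy
        result[m_cur] = [(c - p) / c for p, c in zip(s_prev, s_cur)]
    return result

# Hypothetical sensitivity measures for two parameters at m = 10 and m = 20.
history = {10: [0.75, 0.25], 20: [0.5, 0.5]}
print(lfsm_errors(history))
```

A sequence of such errors that shrinks toward zero as the sample size grows indicates that the sensitivity measures have stabilized; it does not by itself indicate that they agree with those of the high-fidelity model.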
3.1.2 Deterministic and probabilistic data-fit low-fidelity models
I only cover a small part of the vast territory of available deterministic and probabilistic surrogate models. For more examples from the zoo of surrogates, I refer to, e.g., [61] and [91] and references therein. Additionally, I recommend the new, still developing Julia programming language package Surrogates.jl (see [23]), which is part of the larger open source software project called SciML: Scientific Machine Learning (see https://sciml.ai/).
The choice of surrogate models as well as the respective terminologies and the
technicalities reflect partly a bias towards mesh-free and mesh-based numerical mod-
els (see, e.g., § 2.2) – albeit, these surrogate models are also connected to data-driven
statistical models. All the surrogate models described in this subsection are essen-
tially representable as an expansion of basis functions as shown in (3.7).
Multivariate polynomials
In the previous subsection 3.1.1, we have encountered the prototypical hypothesis space $P_{\leq n}$, which is the set of all univariate algebraic polynomials of degree at most $n$ on an interval $X \subset \mathbb{R}$, equipped with an $\mathbb{R}$-vector space structure and the finite monomial basis $B \subseteq P_{\leq n}$ that reads as
\[ B := \{1, x^1, \dots, x^{n-1}, x^n\} \tag{3.38} \]
such that $P_{\leq n}$ can be regarded as the linear span of $B$, i.e., $P_{\leq n} \equiv \operatorname{span}(B)$. A member of the space $P_{\leq n}$ is a univariate polynomial $p = x \mapsto p(x) : X \to \mathbb{R}$ which can be portrayed as
\[ p(x) := c_0 x^0 + c_1 x^1 + \dots + c_{n-1} x^{n-1} + c_n x^n \equiv \sum_{i=0}^{n} c_i x^i, \tag{3.39} \]
where it holds that $x^0 \equiv 1$, $n \in \mathbb{N}$ and the coefficients $c_i \in \mathbb{R}$ with $i \in \{0,\dots,n\}$ and $c_n \neq 0$. Using the generic representation in (3.7), one can write (3.38) as
\[ B := \{ \tilde{\phi}_i(x) \equiv x^i \mid i \in \{0,1,\dots,n-1,n\} \}, \tag{3.40} \]
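and one can define the linear span of $B$ as
\[ \operatorname{span}(B) := \left\{ \sum_{i=0}^{n} \tilde{c}_i \cdot_{\mathbb{R}} \tilde{\phi}_i(x) \;\middle|\; n \in \mathbb{N} \wedge \tilde{\phi}_i(x) \in B \wedge \tilde{c}_i \in \mathbb{R} \right\}.\text{⁹} \tag{3.41} \]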
Following the path in [201, p. 154ff], let us wield the tensor product construction in order to articulate a space of multivariate polynomials. The construction's underlying principle is to express a d-variate polynomial with the arguments $(x_1,\dots,x_d) \in X \subseteq \mathbb{R}^d$ as a combination of $d$ univariate polynomials such as in (3.39).
Let $N := (n_1,\dots,n_d) \in \mathbb{N}^d_0$ be the ordered set of the degrees of the $d$ univariate polynomials. Furthermore, let $I := (i_1,\dots,i_d) \in \mathbb{N}^d_0$ denote an ordered set of $d$ indices, more precisely, a multi-index of $d$ members, and $|I| := i_1 + \dots + i_d$ designates the degree of the multi-index $I$ or the total degree of the monomial $x^I$, which can be written as
\[ x^I := x_1^{i_1} x_2^{i_2} \cdots x_{d-1}^{i_{d-1}} x_d^{i_d}, \tag{3.42} \]
where $x^{(0,\dots,0)} \equiv x_1^0 \cdots x_d^0$ and $x^{(0,\dots,0)} \equiv 1$. Then, one can write a multivariate polynomial $p = x \mapsto p(x) : X \to \mathbb{R}$ as
\[ p(x) := \sum_{I \leq N} c_I x^I, \tag{3.43} \]
where the coefficients $c_I \in \mathbb{R}$ are scalars and $I \leq N$ encodes $(i_1 \leq n_1, \dots, i_d \leq n_d)$. The maximal degree of the monomial $x^I$ in (3.42) can be expressed as $\max\{I\} \leq N$ that encodes $(\max\{i_1\} \leq n_1, \dots, \max\{i_d\} \leq n_d)$; and the notion of a multivariate polynomial's degree $\deg(p)$ can be defined as
\[ \deg(p) := \max\{ |I| \mid c_I \neq 0 \}. \tag{3.44} \]
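Finally, the space of d-variate polynomials of total degree at most $k$ can be expressed as the direct sums of tensor products of $d$ spaces of univariate polynomials:
\[ P^d_{\leq k} := \bigoplus_{|I| \leq k} P_{i_1} \otimes \cdots \otimes P_{i_d}.\text{¹⁰} \tag{3.45} \]
Let us restrict to the case in which $N \equiv (k,\dots,k)$ with $k \in \mathbb{N}_0$, that is, each $x_i$ with $i \in \{1,\dots,d\}$ is associated with a univariate polynomial of degree $k$. Then, a basis of the space $P^d_{\leq k}$ can be written as
\begin{align}
\bigotimes_{i=1}^{d} B_i &:= B_1 \otimes \cdots \otimes B_d \tag{3.46a} \\
&:= \{ \tilde{\phi}_{1 i_1}(x) \otimes \cdots \otimes \tilde{\phi}_{d i_d}(x) \equiv x^{i_1} \otimes \cdots \otimes x^{i_d} \mid I \in \{0,1,\dots,k\}^d \wedge |I| \leq k \} \tag{3.46b} \\
&:= \{ \tilde{\phi}_{1 i_1}(x_1) \cdots \tilde{\phi}_{d i_d}(x_d) \equiv x^I \mid I \in \{0,1,\dots,k\}^d \wedge |I| \leq k \} \tag{3.46c} \\
&=: B^{\otimes d}, \tag{3.46d}
\end{align}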
⁹ In $\operatorname{span}(B)$, technically, the object under the given predicate allows one to meaningfully state the predicate $p(x) \in P_{\leq n}$ where $P_{\leq n} \equiv \operatorname{span}(B)$. In order to meaningfully state the predicate $p \in P_{\leq n}$ with $p = x \mapsto p(x) : X \to \mathbb{R}$, one should understand the object in $\operatorname{span}(B)$ as a shorthand notation for $x \mapsto \sum_{i=0}^{n} \tilde{c}_i \cdot_{\mathbb{R}} \tilde{\phi}_i(x) : X \to \mathbb{R}$.
¹⁰ Generally, constructing multivariate functions as tensor products of univariate functions involves many subtleties such as a quotient space or the universal property. For some subtleties of such constructions, see, e.g., [201, p. 45–52].
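where $B_i$ is the basis $B$ in (3.40) with respect to $x_i$. Given the basis in (3.46), one can express the dimension of the space $P^d_{\leq k}$ via the binomial coefficient $\binom{k+d}{d}$ such that
\[ \dim(P^d_{\leq k}) \equiv \binom{k+d}{d}. \tag{3.47} \]
For example, the dimension of the space $P^2_{\leq 2}$ is $\dim(P^2_{\leq 2}) := 6$ and a basis of this space can be written as
\begin{align}
\bigotimes_{i=1}^{2} B_i &:= B_1 \otimes B_2 \tag{3.48a} \\
&:= \{ \tilde{\phi}_{1i}(x) \otimes \tilde{\phi}_{2j}(x) \equiv x^i \otimes x^j \mid (i,j) \in \{0,1,2\}^2 \wedge i + j \leq 2 \} \tag{3.48b} \\
&:= \{ \tilde{\phi}_{1i}(x_1) \tilde{\phi}_{2j}(x_2) \equiv x_1^i x_2^j \mid (i,j) \in \{0,1,2\}^2 \wedge i + j \leq 2 \} \tag{3.48c} \\
&=: B^{\otimes 2}, \tag{3.48d}
\end{align}
where $B_1$ and $B_2$ are the basis $B$ in (3.40) for $x_1$ and $x_2$, respectively. A polynomial $p \in \operatorname{span}(B^{\otimes 2})$ can be represented as
\[ p(x) := c_{(0,0)} + c_{(1,0)} x_1 + c_{(0,1)} x_2 + c_{(2,0)} x_1^2 + c_{(0,2)} x_2^2 + c_{(1,1)} x_1 x_2, \tag{3.49} \]
where an ordering for the multi-index $I$ is chosen which does respect the degree $|I|$.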
Occasionally, the basis associated with the space $P^d_{\leq k}$ is called a complete polynomial basis. The basis associated with the space $P^d_k$, that is, the space of d-variate polynomials of maximal degree at most $k$, is called a tensor product polynomial basis. The space $P^d_k$ is constructed by applying the condition $\max\{I\} \leq N$ in (3.45). Its dimension can be stated as
\[ \dim(P^d_k) \equiv (1+k)^d. \tag{3.50} \]
In Table 3.1, $\dim(P^d_{\leq k})$ and $\dim(P^d_k)$ are listed for some pairs $(k,d) \in \mathbb{N}^2$.
TABLE 3.1: Given some pairs $(k,d) \in \mathbb{N}^2$, the dimension of $P^d_{\leq k}$ and $P^d_k$.

(k,d)    dim(P^d_{≤k})    dim(P^d_k)
(2,2)     6                 9
(2,3)    10                27
(2,4)    15                81
(2,5)    21               243
(3,2)    10                16
(3,3)    20                64
(3,4)    35               256
(3,5)    56              1024
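The entries of Table 3.1 can be reproduced by enumerating the admissible multi-indices directly. The following minimal sketch counts them and compares the counts with (3.47) and (3.50); the function names are assumptions for this illustration.

```python
from itertools import product
from math import comb

def total_degree_indices(k, d):
    """Multi-indices I in {0,...,k}^d with |I| <= k (complete polynomial basis)."""
    return [I for I in product(range(k + 1), repeat=d) if sum(I) <= k]

def tensor_product_indices(k, d):
    """Multi-indices I in {0,...,k}^d with max{I} <= k (tensor product basis)."""
    return list(product(range(k + 1), repeat=d))

for k in (2, 3):
    for d in (2, 3, 4, 5):
        n_total = len(total_degree_indices(k, d))
        n_tensor = len(tensor_product_indices(k, d))
        # compare the enumeration with the closed forms (3.47) and (3.50)
        assert n_total == comb(k + d, d)
        assert n_tensor == (1 + k) ** d
        print((k, d), n_total, n_tensor)
```

The enumeration also makes the gap between the two spaces tangible: the tensor product count grows exponentially in $d$, whereas the total-degree count grows only polynomially for fixed $k$.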
For a fixed integer $k > 0$ and a fixed integer $d > 1$, in general, one can observe that
\[ \dim(P^d_{\leq k}) < \dim(P^d_k). \tag{3.51} \]
For a given integer pair $(k,d)$, empirically, the computation time associated with the polynomials in the space $P^d_{\leq k}$ is frequently lower than for polynomials in the space $P^d_k$, while the approximation quality or accuracy is only slightly lower.
Recalling the test functions in Table 2.1, one cannot expect a globally sufficiently accurate approximation of those test functions that include periodic parts. From an optimization point of view, however, one can expect a locally sufficiently accurate approximation in the neighborhood of the global optimum. The rationale behind
these expectations is rooted in the relationship of the notion of approximation quality for the polynomials in the space $P^d_{\leq k}$ with the notion of approximation quality for the d-variate Taylor-kind polynomials of degree $k$.
Admittedly, I do not elaborate on this relationship; it triggers, though, an important special case regarding low-fidelity models, which is the space of d-variate polynomials of total degree at most two. The corresponding polynomials $p \in P^d_{\leq 2}$ are called response surfaces (see, e.g., [61, p. 27]).
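A d-variate polynomial of degree at most two can be presented in the form
\[ p(x) := c_{(0,\dots,0)} + \sum_{i=1}^{d} e_i x_i + \sum_{i=1}^{d} \sum_{j=1}^{d} a_{i,j} x_i x_j, \tag{3.52} \]
where $e_i \in \mathbb{R}$ and $a_{i,j} \in \mathbb{R}$ are scalars. Invoking Householder's notation for matrix operations (see, e.g., [99, p. 1ff]) and given $p : \mathbb{R}^{d \times 1} \to \mathbb{R}$, we can write (3.52) as
\[ p(x) := \beta_0 + e^T x + x^T A x, \tag{3.53} \]
where $\beta_0 \in \mathbb{R}$ represents a scalar, $x := [x_i] \in \mathbb{R}^{d \times 1}$ and $e := [e_i] \in \mathbb{R}^{d \times 1}$ represent column vectors (or column matrices) and $A := [a_{i,j}] \in \mathbb{R}^d \times \mathbb{R}^d$ (with $\mathbb{R}^d \times \mathbb{R}^d \cong \mathbb{R}^{d \times d}$) represents a quadratic matrix (or square matrix).¹¹ By introducing the maps $l_e = x \mapsto e^T x : \mathbb{R}^{d \times 1} \to \mathbb{R}$ and $q_A = x \mapsto x^T A x : \mathbb{R}^{d \times 1} \to \mathbb{R}$, a map-oriented presentation of (3.53) can be achieved:
\[ p(x) := \beta_0 + l_e(x) + q_A(x). \tag{3.54} \]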
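A possible matrix representation of the basis in (3.48c) is the column vector $\tilde{b} \in \mathbb{R}^{6 \times 1}$ with
\[ \tilde{b} := [\, 1 \;\; x_1 \;\; x_2 \;\; x_1^2 \;\; x_2^2 \;\; x_1 x_2 \,]^T. \tag{3.55} \]
Given the order of the components in (3.55), the authors in [61, p. 133] show a general construction rule for obtaining the components of a column vector $\tilde{b} \in \mathbb{R}^{s \times 1}$ with $s := d(d+3)/2 + 1$ (cf. (3.47)) that represents the basis of a d-variate polynomial of degree at most two:
\[ \tilde{b} := [\, \tilde{b}_0 \;\; \tilde{b}_1 \;\dots\; \tilde{b}_d \;\; \tilde{b}_{d+1} \;\dots\; \tilde{b}_{2d} \;\; \tilde{b}_{2d+1} \;\dots\; \tilde{b}_{s-1} \,]^T, \tag{3.56} \]
where $\tilde{b}_0 = 1$, $\tilde{b}_1 = x_1$, $\tilde{b}_d = x_d$, $\tilde{b}_{d+1} = x_1^2$, $\tilde{b}_{2d} = x_d^2$, $\tilde{b}_{2d+1} = x_1 x_2$, and $\tilde{b}_{s-1} = x_{d-1} x_d$. Hence, if we introduce a column vector $\tilde{c} \in \mathbb{R}^{s \times 1}$ which encapsulates the coefficients with regard to the components of $\tilde{b}$, one can reformulate (3.53) as
\[ p(x) := \tilde{b}^T \tilde{c}. \tag{3.57} \]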
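The construction rule behind (3.56) can be sketched as a small function that maps a point to its basis vector. This is an illustration under an assumption: only the first and last cross terms are fixed by (3.56), so the lexicographic order of the cross terms in between (as produced by `itertools.combinations`) is a choice made here; the function name `quadratic_basis` is hypothetical.

```python
from itertools import combinations

def quadratic_basis(x):
    """Basis vector for a point x = (x_1, ..., x_d), ordered as in (3.56):
    constant, linear terms, pure squares, then cross terms x_i * x_j (i < j)."""
    linear = [float(xi) for xi in x]
    squares = [xi * xi for xi in linear]
    cross = [xi * xj for xi, xj in combinations(linear, 2)]
    return [1.0] + linear + squares + cross

def evaluate_response_surface(x, coefficients):
    """p(x) as the inner product of the basis and coefficient vectors, cf. (3.57)."""
    return sum(b * c for b, c in zip(quadratic_basis(x), coefficients))

print(quadratic_basis((2.0, 3.0)))   # matches the order in (3.55) for d = 2
print(evaluate_response_surface((2.0, 3.0), [1.0, 0.0, 0.0, 1.0, 1.0, 0.0]))
```

For $d$ variables the vector has $1 + d + d + d(d-1)/2 = d(d+3)/2 + 1$ components, which agrees with the value of $s$ stated above.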
Assuming a sample $s$ such as in (3.13), one can employ the corresponding sampling plan points in (3.57). Thus, one can define a column vector $y \in \mathbb{R}^{m \times 1}$ whose components are the output points $y_i$ and one can succinctly define a matrix $B \in \mathbb{R}^m \times \mathbb{R}^s$
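¹¹ Technically, the representation of a vector $y \in \mathbb{R}^d$ as a column vector $y \in \mathbb{R}^{d \times 1}$ and the representation of a $1 \times 1$ matrix $\gamma \in \mathbb{R}^{1 \times 1}$ as a scalar $\gamma \in \mathbb{R}$ involve some kind of isomorphisms, in order to state $\mathbb{R}^d \cong \mathbb{R}^{d \times 1}$ and $\mathbb{R} \cong \mathbb{R}^{1 \times 1}$.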
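with respect to the sampling plan points $x_i$:
\[ B := \begin{bmatrix} \tilde{b}^T_{x_1} \\ \tilde{b}^T_{x_2} \\ \vdots \\ \tilde{b}^T_{x_m} \end{bmatrix}. \tag{3.58} \]
In a verbose mode, the matrix in (3.58) displays
\[ B := \begin{bmatrix}
1 & x_{11} & \dots & x_{1d} & x_{11}^2 & \dots & x_{1d}^2 & x_{11} x_{12} & \dots & x_{1,d-1} x_{1d} \\
1 & x_{21} & \dots & x_{2d} & x_{21}^2 & \dots & x_{2d}^2 & x_{21} x_{22} & \dots & x_{2,d-1} x_{2d} \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
1 & x_{m1} & \dots & x_{md} & x_{m1}^2 & \dots & x_{md}^2 & x_{m1} x_{m2} & \dots & x_{m,d-1} x_{md}
\end{bmatrix}, \tag{3.59} \]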
where $x_{md}$ denotes the d-th coordinate of the m-th sampling plan point. Using the matrix $B$, one can define a function $h = \tilde{c} \mapsto B\tilde{c} : \mathbb{R}^s \to \mathbb{R}^m$ and one can state an inverse problem (cf. (3.1c) in § 3.1.1) that reads as
\[ \text{given } B \in \mathbb{R}^m \times \mathbb{R}^s \text{ and } y \in \mathbb{R}^{m \times 1}, \text{ find } \tilde{c} \in \mathbb{R}^{s \times 1} \text{ such that } B\tilde{c} = y. \tag{3.60} \]
If the condition $m = s$ holds and $\operatorname{rank}(B)$ is full, the left and the right inverse of the matrix $B$ exist.¹² Hence, there is a unique solution $\tilde{c} \equiv B^{-1} y$, which is a determination of the coefficients by interpolation (cf. (3.18) in § 3.1.1).
In the applications of the present work, however, the condition $m > s$ usually holds, which leads to an overdetermined system of linear equations in (3.60) such that $B^{-1}$ does not exist. In this case, the inverse's purpose is most adequately emulated by the pseudoinverse. For more details on the properties of the pseudoinverse, I refer to, e.g., [200, p. 618f].
Given the space-filling property and the non-collapsing property of a sampling plan, it is reasonable to assume that the column rank of $B$ is full such that the inverse $(B^T B)^{-1}$ exists. Then the pseudoinverse $B^+ \in \mathbb{R}^s \times \mathbb{R}^m$ can be stated as
\[ B^+ := (B^T B)^{-1} B^T, \tag{3.61} \]
which can be computed efficiently by the singular value decomposition method. Using the pseudoinverse $B^+$ results in a reformulation of the problem in (3.60) in terms of a projection matrix $P_B \in \mathbb{R}^m \times \mathbb{R}^m$ with $P_B := B B^+$, where $\operatorname{tr}(P_B) \equiv \operatorname{rank}(B)$, such that one can define a column vector $\hat{y} \in \mathbb{R}^{m \times 1}$ via $\hat{y} := P_B y$.¹³ The corresponding coefficients column vector $\hat{\tilde{c}} \in \mathbb{R}^{s \times 1}$ such that
\[ \hat{\tilde{c}} := B^+ y \tag{3.62} \]
is the best solution in the sense of a linear multiple regression by the least squares
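¹² Let us conceive the rank of the $m \times s$ matrix $B$ in the sense that $\operatorname{rank}(B) \leq \min\{m,s\}$. If $m = s$ and $\operatorname{rank}(B) = m$, then we say that the rank of the matrix $B$ is full.
¹³ Let us comprehend the trace of a square matrix $A \in \mathbb{R}^n \times \mathbb{R}^n$ as the map $\operatorname{tr} : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}_+$ with $\operatorname{tr}(A) := \sum_{i=1}^{n} a_{i,i}$.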
64 Chapter 3. Surrogate optimization
method. Recalling (3.19), the best solution ˆ
˜
c∈Rs×1is associated with the optimiza-
tion problem
minimize
˜
c∈Rs×1rss(˜
c)∶=1
2(y−B˜
c)T(y−B˜
c), (3.63)
where the objective function rss ∶Rs×1→Ris sometimes called the residual sum-of-
squares function (see, e.g., [91, p. 30]) with r∈Rm×1designating the residual column
vector (abbreviated to residual) such that r∶=(y−B˜
c).
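To make the relationship between (3.61), (3.62), and (3.63) tangible, the following minimal sketch in the Julia PL computes the least squares coefficients in three equivalent ways for the case $d = 2$. The sampling plan, the test function generating the output points, and all variable names are assumptions chosen for illustration only; the column order follows (3.59).

```julia
using LinearAlgebra

# Hypothetical sampling plan with m = 8 points in the unit square (d = 2).
x1 = [0.1, 0.2, 0.4, 0.5, 0.6, 0.3, 0.7, 0.8]
x2 = [0.6, 0.3, 0.7, 0.8, 0.1, 0.2, 0.4, 0.5]

# Design matrix B in the column order of (3.59): 1, x1, x2, x1^2, x2^2, x1*x2.
B = hcat(ones(8), x1, x2, x1 .^ 2, x2 .^ 2, x1 .* x2)   # m × s with s = 6

# Illustrative output points from an arbitrary smooth function (assumption).
y = sin.(3 .* x1) .+ x2 .^ 2

B_plus   = pinv(B)                 # pseudoinverse computed via the SVD
c_pinv   = B_plus * y              # coefficients as in (3.62)
c_normal = (B' * B) \ (B' * y)     # normal equations, cf. (3.61)
c_qr     = B \ y                   # least squares via a QR factorization

P_B   = B * B_plus                 # projection onto the column space of B
y_hat = P_B * y                    # fitted output points
r     = y - B * c_pinv             # residual column vector of (3.63)
```

In practice, the backslash operator is usually preferable to forming $B^T B$ explicitly because the latter squares the condition number.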
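In order to ensure the well-posedness of the optimization problem, a fruitful generalization of the basic discrete least squares $l_2$ approximation problem in (3.63) is the Tikhonov regularized weighted least squares $l_2$ approximation problem that can be stated as
\[
\underset{\tilde{c} \in \mathbb{R}^{s \times 1}}{\text{minimize}} \quad \operatorname{rss}(\tilde{c}) := \tfrac{1}{2} \lVert y - B\tilde{c} \rVert_W^2 + \tfrac{1}{2} \lVert \tilde{c} - \tilde{c}_0 \rVert_R^2,
\tag{3.64}
\]
that leads to the normal equations and the best solution, respectively,
\[
\hat{\tilde{c}} := (B^T W B + R)^{-1} (B^T W y + R \tilde{c}_0),
\tag{3.65}
\]
where the Tikhonov matrix $R \in \mathbb{R}^s \times \mathbb{R}^s$ and the residual variance-covariance matrix $W \in \mathbb{R}^m \times \mathbb{R}^m$ denote symmetric positive definite diagonal matrices such that
\[
\lVert \cdot \rVert_W^2 = v \mapsto v^T W v : \mathbb{R}^{m \times 1} \to \mathbb{R},
\tag{3.66}
\]
\[
\lVert \cdot \rVert_R^2 = v \mapsto v^T R v : \mathbb{R}^{s \times 1} \to \mathbb{R},
\tag{3.67}
\]
and $\tilde{c}_0 \in \mathbb{R}^{s \times 1}$ denotes a column vector that represents the initial guess about the best solution (cf. [201, p. 71]).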
Since the Tikhonov matrix $R$ encodes the regularization, it is set that $R := LL^T$ where one can define $L \in \mathbb{R}^s \times \mathbb{R}^s$ as $L := \sqrt{\lambda} I$ with the regularization parameter $\lambda \in [0,1[$ and $I \in \mathbb{R}^s \times \mathbb{R}^s$ being the identity matrix. The residual variance-covariance matrix $W$ encodes the weighting of the components of the squared residual in the sense that $W := \operatorname{diag}(\sigma^{-2}, \dots, \sigma^{-2})$ where $\sigma^2$ denotes the constant conditional error variance in (3.20).¹⁴
Hence, let us consider all the components of the squared residual as uncorrelated (represented by setting all of $W$'s off-diagonal entries to zero) and on a par with each other (represented by setting all of $W$'s diagonal entries to the same positive number). If we set $W := I$ with $I \in \mathbb{R}^{m \times m}$ being the identity matrix and $\lambda \equiv 0$ in the Tikhonov matrix, we recover the problem in (3.63) as a special case.
Mind, though, that if the original problem in (3.63) is ill-conditioned, choosing $\lambda$ too small will not change much, whereas choosing $\lambda$ too big detaches the regularized problem from the original problem. Thus, finding an optimal regularization parameter is not a trivial task; it depends highly on the problem at hand and on the judgment of the user. Let us interpret the regularization parameter as a hyperparameter (recall Remark 3.1.4).
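The closed-form solution in (3.65) can be sketched in a few lines of the Julia PL. The function name `tikhonov_solve`, its keyword arguments, and the data are hypothetical and serve only as an illustration of the special choices $R := \lambda I$ and $W := \sigma^{-2} I$ made above.

```julia
using LinearAlgebra

# Tikhonov regularized weighted least squares solution as in (3.65),
# with R := λ*I and W := σ^(-2)*I as chosen in the text.
function tikhonov_solve(B, y; λ = 0.0, σ = 1.0, c0 = zeros(size(B, 2)))
    W = Diagonal(fill(σ^(-2), size(B, 1)))   # weighting of the squared residual
    R = Diagonal(fill(λ, size(B, 2)))        # R = L*L' with L = sqrt(λ)*I
    return (B' * W * B + R) \ (B' * W * y + R * c0)
end

# Illustrative data (assumption): m = 8 points, quadratic monomial basis, d = 2.
x1 = [0.1, 0.2, 0.4, 0.5, 0.6, 0.3, 0.7, 0.8]
x2 = [0.6, 0.3, 0.7, 0.8, 0.1, 0.2, 0.4, 0.5]
B  = hcat(ones(8), x1, x2, x1 .^ 2, x2 .^ 2, x1 .* x2)
y  = sin.(3 .* x1) .+ x2 .^ 2

c_plain = tikhonov_solve(B, y)               # λ = 0 and W = I recover (3.63)
c_reg   = tikhonov_solve(B, y; λ = 1e-2)     # regularized coefficients
```

With $\tilde{c}_0 = 0$ and $\lambda > 0$, the regularized coefficients are shrunk toward zero, which is the detachment from the original problem mentioned above.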
Given the optimal coefficients as floating-point numbers, a numerically stable approach to evaluate the function in (3.57) is Clenshaw's recurrence formula (see, e.g., [171, p. 222f]), which, in the case of a monomial sum, is the familiar Horner's method. It exploits the inherent recurrence relation and avoids the explicit evaluation of the polynomial functions in (3.56). For a more elaborate discussion on the propagation of the rounding error in the context of polynomial evaluation, see, e.g., [161]. In Listing 3.1, I present an example implementation of Clenshaw's algorithm for the evaluation of a univariate monomial sum in the Julia PL.

¹⁴ Given a column vector $d \in \mathbb{R}^{n \times 1}$ and a square diagonal matrix $A \in \mathbb{R}^{n \times n}$ where $\forall i,j \in \{1, 2, \dots, n\}.\; i \neq j \Longrightarrow a_{i,j} := 0$, let us comprehend $\operatorname{diag}$ as the map with the signature $\mathbb{R}^{n \times 1} \to \mathbb{R}^{n \times n}$ and the assignment $\operatorname{diag}(d) := [a_{i,i} \equiv d_i]$. Note well that, in the specific context in which the map $\operatorname{diag}$ lives, the term $\operatorname{diag}(d_1, \dots, d_n)$ is treated as a rewriting of the term $\operatorname{diag}([d_1, \dots, d_n]^T)$.
LISTING 3.1: An example implementation of Clenshaw's algorithm for the evaluation of a univariate monomial sum in the Julia PL.

    function monomial_clenshaw_eval_1d(c::Vector{T}, x::T) where {T<:Real}
        N = length(c) - 1        # polynomial degree; c[i+1] multiplies x^i (1-based indexing)
        d = zeros(T, N + 2)      # recurrence workspace; d[N+2] stays zero
        d[N+1] = c[N+1]          # start with the leading coefficient
        for i in N:-1:2
            d[i] = x * d[i+1] + c[i]   # Horner step of the recurrence
        end
        return x * d[2] + c[1]   # last step adds the constant coefficient
    end
In the multivariate case, one can apply a plain greedy approach in the sense that one can invoke multiple nested hierarchical univariate monomial sum evaluations. For instance, if we possess the space $P^2_{\le 2}$ with $\dim(P^2_{\le 2}) \equiv 6$, then one can introduce the column vectors $\tilde{\varphi}(x_2) \in \mathbb{R}^{6 \times 1}$ and $\tilde{\psi}(x_1) \in \mathbb{R}^{6 \times 1}$ and the diagonal matrix $\tilde{\Sigma} \in \mathbb{R}^6 \times \mathbb{R}^6$ such that
\begin{align}
\tilde{\varphi}(x_2) &:= [x_2^0 \;\; x_2^0 \;\; x_2^0 \;\; x_2^1 \;\; x_2^2 \;\; x_2^1]^T, \tag{3.68a} \\
\tilde{\psi}(x_1) &:= [x_1^0 \;\; x_1^1 \;\; x_1^2 \;\; x_1^0 \;\; x_1^0 \;\; x_1^1]^T, \tag{3.68b} \\
\tilde{\Sigma} &:= \operatorname{diag}(\tilde{c}_1, \tilde{c}_2, \tilde{c}_3, \tilde{c}_4, \tilde{c}_5, \tilde{c}_6), \tag{3.68c} \\
p(x) &:= \tilde{\varphi}(x_2)^T \, \tilde{\Sigma} \, \tilde{\psi}(x_1). \tag{3.68d}
\end{align}
A display that is favorable for the application of the multiple nested hierarchical evaluations is
\[
p(x) := \sum_{j=1}^{\dim(P^2_{\le 2})} \tilde{c}_j \, \tilde{\varphi}_j(x_2) \, \tilde{\psi}_j(x_1),
\tag{3.69}
\]
where, firstly, the terms $\tilde{c}_j \tilde{\varphi}_j(x_2)$ are evaluated, and, secondly, these evaluated terms are used as the coefficients for the evaluation of the terms $\tilde{\psi}_j(x_1)$. The proper generalization of Horner's method to multivariate polynomials is still an active research area (see, e.g., [130]). Furthermore, notice well that the display in (3.69) hints at the connection to the vibrant research area of the computationally efficient representation of multivariate functions using low-rank tensor approximation techniques (see, e.g., [206], [90]). However, in the present work, let us leave it at that adumbration.
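The nested evaluation described above can be sketched as follows for the space $P^2_{\le 2}$. The function names `horner` and `eval_quadratic_2d` are hypothetical, and grouping the terms by powers of $x_1$ is one possible reading of the two-stage scheme, not necessarily the only one; the coefficient ordering is the one implied by (3.68).

```julia
# Horner's method for the monomial sum c[1] + c[2]*t + ... + c[end]*t^(length(c)-1).
function horner(c, t)
    acc = c[end]
    for i in (length(c) - 1):-1:1
        acc = acc * t + c[i]
    end
    return acc
end

# Nested evaluation with the coefficient ordering implied by (3.68):
# p(x) = c[1] + c[2]*x1 + c[3]*x1^2 + c[4]*x2 + c[5]*x2^2 + c[6]*x1*x2.
function eval_quadratic_2d(c, x1, x2)
    a0 = horner([c[1], c[4], c[5]], x2)   # coefficient of x1^0, a monomial sum in x2
    a1 = horner([c[2], c[6]], x2)         # coefficient of x1^1, a monomial sum in x2
    a2 = c[3]                             # coefficient of x1^2, a constant
    return horner([a0, a1, a2], x1)       # outer evaluation in x1
end

c = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]        # illustrative coefficients
p_nested = eval_quadratic_2d(c, 0.5, -2.0)
```

The inner evaluations in $x_2$ produce the coefficients of the outer monomial sum in $x_1$, so that no power of $x_1$ or $x_2$ is formed explicitly.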
I have argued that due to a sampling plan's space-filling property and its non-collapsing property, it is reasonable to assume the generic case in which the matrix $B^T B$ is invertible (or non-singular or non-degenerate). However, this premise could be challenged. Hence, let us glance briefly at the influence of the arrangement of a sampling plan on the condition number $\kappa(B^T B)$ with the property¹⁵
\[
\kappa(B^T B) \equiv (\kappa(B))^2.
\tag{3.70}
\]
Thus, the condition number with respect to $B^T B$ is at least as large as the condition number with respect to $B$ and, whenever $\kappa(B) > 1$, strictly larger. Let us focus on instances of a multicollinearity with respect to the chosen basis where, in a theoretical absence of numerical errors, one could spot that the column rank of $B$ is not full.
Observing Figure 3.1 and Figure 3.2, a situation is conceivable where a sampling plan could be constructed as, for instance,
\[
X_{s,1} := \{(0.1, 0.6), (0.2, 0.3), (0.4, 0.7), (0.5, 0.8), (0.6, 0.1), (0.3, 0.2), (0.7, 0.4), (0.8, 0.5)\},
\tag{3.71}
\]
where $X_{s,1}$ is represented by a $\mathbb{R}^m \times \mathbb{R}^d$ matrix with $m = 8$ and $d = 2$ such that the condition number is $1.06 \times 10^4$ (see (i) in Figure 3.4). The underlying construction principle is based on the Householder reflection matrix $H \in \mathbb{R}^d \times \mathbb{R}^d$ with $H := I - 2 \frac{v v^T}{v^T v}$ where $I \in \mathbb{R}^d \times \mathbb{R}^d$ denotes the identity matrix and $v \in \mathbb{R}^{d \times 1}$ denotes a column vector. Let us choose $v$ such that it is orthogonal to the vector $v_t := \sum_{i=1}^{d} e_i$ where $e_i$ denote the standard basis vectors of $\mathbb{R}^d$.
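As a minimal sketch of this construction principle for $d = 2$, the following Julia PL snippet reflects the first four points of (3.71) and thereby reproduces the remaining four points. The concrete choice $v = [1, -1]^T$ and the helper name `householder` are assumptions made for illustration.

```julia
using LinearAlgebra

# Householder reflection matrix H := I - 2*v*v'/(v'*v) for a nonzero vector v.
householder(v) = I - 2 * (v * v') / dot(v, v)

v_t = [1.0, 1.0]        # sum of the standard basis vectors of R^2
v   = [1.0, -1.0]       # one possible choice orthogonal to v_t (assumption)
H   = householder(v)    # reflection across the line x1 = x2

half = [0.1 0.6; 0.2 0.3; 0.4 0.7; 0.5 0.8]   # first four points of (3.71)
X_s1 = vcat(half, half * H')                   # append the reflected points
```

For this choice of $v$, the matrix $H$ merely swaps the two coordinates, which explains the mirror symmetry of the plan with respect to the diagonal of the unit square.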
Another illustration is a sampling plan $X_{s,2}$ such that
\[
\forall i \in \{1, \dots, m\}.\; x_{i,d-1} = x_{i,d} \;\Longrightarrow\; \forall i \in \{1, \dots, m\}.\; b_{i,s-1} = b_{i,2d}.
\tag{3.72}
\]
For an example regarding the case $m = 8$ where the condition number is $8.67 \times 10^{49}$, which, numerically, indicates a singular matrix, consult (ii) in Figure 3.4. Let us compare the sampling plan $X_{s,1}$ in (3.71) and $X_{s,2}$ in (3.72) with a sampling plan $X_{s,3}$ based on the Sobol quasi-random sequence where the condition number is $3.23 \times 10^3$ (see (iii) in Figure 3.4).

FIGURE 3.4: Sampling plan $X_s$ [condition number $\kappa(B^T B)$]. (i) $X_{s,1}$ [$1.06 \times 10^4$], (ii) $X_{s,2}$ [$8.57 \times 10^{49}$], (iii) $X_{s,3}$ [$3.23 \times 10^3$]. [Three scatter plots over the unit square with the axes $x_1$ and $x_2$, each ranging from 0.0 to 1.0; plots omitted.]

¹⁵ Let us comprehend the condition number for inversion of a matrix $A \in \mathbb{R}^m \times \mathbb{R}^n$ with $m \ge n$ as the map $\kappa : \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}_+$ with $\kappa(A) := \lVert A \rVert_2 \lVert A^+ \rVert_2$ where $\lVert \cdot \rVert_2 : \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}_+$ denotes the matrix norm induced by the $l_2$-norm for vectors. If we possess $A$'s largest singular value $\sigma_{\max}$ and its smallest singular value $\sigma_{\min}$, then one can set $\kappa(A) \equiv \frac{\sigma_{\max}}{\sigma_{\min}}$. Given a positive integer $k$ where $\kappa(A) \propto 10^k$, then, very roughly speaking, it is supposed that there are only $16 - k$ or $16 - \log_{10}(\kappa(A))$ significant digits of an output's accuracy in a double-precision floating-point format.

Even in the case of a space-filling and non-collapsing sampling plan such as the Sobol quasi-random sequence, one can observe a high condition number. It is an indication that the familiar ill-conditioned behavior of a monomial basis for the space $P_{\le 2}$ is mimicked by the monomial basis for the space $P^d_{\le 2}$.
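The squaring property in (3.70) and the rank deficiency behind (3.72) can be checked numerically with a few lines of the Julia PL. The helper `design_matrix` and the second, collapsing plan are hypothetical constructions for $d = 2$; the sketch does not attempt to reproduce the condition numbers reported in Figure 3.4.

```julia
using LinearAlgebra

# Design matrix for d = 2 in the column order of (3.59).
design_matrix(X) = hcat(ones(size(X, 1)), X[:, 1], X[:, 2],
                        X[:, 1] .^ 2, X[:, 2] .^ 2, X[:, 1] .* X[:, 2])

# Sampling plan of (3.71).
X_s1 = [0.1 0.6; 0.2 0.3; 0.4 0.7; 0.5 0.8; 0.6 0.1; 0.3 0.2; 0.7 0.4; 0.8 0.5]
B1    = design_matrix(X_s1)
κ_B   = cond(B1)            # 2-norm condition number from the singular values
κ_BtB = cond(B1' * B1)      # condition number of the normal equations matrix

# Hypothetical plan in the spirit of (3.72): both coordinates coincide,
# so several columns of the design matrix are identical.
t  = collect(range(0.1, 0.8; length = 8))
B2 = design_matrix(hcat(t, t))
deficient = rank(B2) < size(B2, 2)
```

The second plan is space-filling along the diagonal only; its design matrix has identical columns, so $B^T B$ is singular up to rounding errors.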
In Appendix A, numerical experiments are conducted with regard to a reparametrization using mean-centered arguments, Bernstein polynomials, and Chebyshev polynomials.
In Figure 3.5, the monomial basis in (3.46), the Bernstein basis in (A.3), and the Chebyshev basis in (A.10) for the space $P^1_{\le 2}$ are exhibited. Note that there are Julia PL packages such as MultivariatePolynomials.jl (see doi:10.5281/zenodo.3839754) that, in their definition of various kinds of polynomials, make intensive use of several language features of the Julia PL, e.g., its type system or its metaprogramming capabilities. Note further that there are MATLAB® PL toolboxes such as Chebfun3 (see [90]) that are written purely in MATLAB. These toolboxes avoid using, e.g., MEX files, i.e., MATLAB executables. Let us not dwell on language-related issues because there is a lack of comprehensive studies (cf. § 2.3.3) that, for such particular use cases, compare thoroughly the pros and cons of each programming language in terms of performance, readability, or maintainability, to name but a few criteria. For example, it is difficult to determine whether potential performance differences are due to different language designs or different implementations, or both.
FIGURE 3.5: Basis under consideration for the space $P^1_{\le 2}$. (i) Monomial basis ($b_1$, $b_2$, $b_3$), (ii) Bernstein basis ($b^{\varsigma}_1$, $b^{\varsigma}_2$, $b^{\varsigma}_3$), (iii) Chebyshev basis ($q_1$, $q_2$, $q_3$). [Three plots of $p(x)$ over $x$; panels (i) and (ii) on $[0, 1]$, panel (iii) on $[-1, 1]$; plots omitted.]
Mind that the Chebyshev grid (see Figure A.1) is closely linked to the numerical technique of sparse grids (see, e.g., [39]). This technique is particularly useful when one is dealing with the space $P^d_k$ where high values are chosen for $k$ and $d$ such that $\dim(P^d_k)$ in (3.50) is high as well. Thus, the dimensionality $d$ is a curse in the sense that the computational costs in terms of memory and time grow exponentially in the dimensionality $d$.
Based on ideas from information-based complexity theory (see, e.g., [207]), the authors in [39] invoke a formal encoding of the curse of dimensionality by the complexity estimate $O(\varepsilon^{-\alpha d})$ where the non-negative real number $\varepsilon$ denotes a desired accuracy of an approximate solution and the non-negative real number $\alpha$ is dependent on the properties of the high-fidelity model and the low-fidelity model, and the concrete implementation as well. It is assumed that $0 < \varepsilon < 1$ and $0 < \alpha$ and that the big $O$ notation refers to the worst-case time complexity estimate.
The authors in [39] provide a link between the approximation error and the complexity estimate which is adapted to the high-fidelity function approximation error in (3.6) such that
\[
\lVert K - \tilde{K}_n \rVert_{Y^X} = O(n^{-r/d}) \text{ as } n \to \infty \;:\Leftrightarrow\; \lVert K - \tilde{K}_n \rVert_{Y^X} \in O(n^{-r/d}),
\tag{3.73}
\]
where $r$ denotes the isotropic smoothness of the high-fidelity model and $O(n^{-r/d})$ can be interpreted as the corresponding complexity class.
Even though the sparse grids technique is a tool to alleviate the curse of dimensionality to some extent, this technique is ignored in the present work. A reason is that the focus is on low-fidelity models associated with the space $P^d_{\le 2}$ where, for a fixed $d$, the curse of dimensionality does not appear as heavily as for the space $P^d_2$ (cf. Table 3.1). By focusing on these spaces, a potential loss of accuracy is acceptable which is also partly due to the global character of these low-fidelity models.
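To give a rough feeling for the difference in growth, the following sketch compares two dimension counts for $k = 2$. It rests on an assumption that is not restated in this section: polynomials of total degree at most $k$ in $d$ variables span a space of dimension $\binom{d+k}{k}$, whereas polynomials of degree at most $k$ in each variable separately span a space of dimension $(k+1)^d$. The function names are hypothetical.

```julia
# Assumption: total degree at most k in d variables gives binomial(d + k, k) monomials;
# degree at most k in each variable separately gives (k + 1)^d monomials.
dim_total_degree(d, k) = binomial(d + k, k)
dim_max_degree(d, k)   = (k + 1)^d

# Print both counts for k = 2 and a few dimensionalities d.
for d in (1, 2, 5, 10, 20)
    println("d = ", d, ": ", dim_total_degree(d, 2), " vs. ", dim_max_degree(d, 2))
end
```

For $d = 2$ and $k = 2$, the first count yields the value 6 stated earlier for $\dim(P^2_{\le 2})$; the first count grows quadratically in $d$, the second one exponentially.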
Further tools to alleviate the curse of dimensionality to some extent are radial basis functions, with their local character, which we encounter next. Mind that, in favor of radial basis functions, a discussion on multivariate splines (see, e.g., [91, ch. 5.7]) is skipped. However, radial basis functions are partly related to splines (see, e.g., [62, p. 311]).
Radial basis functions
Using a radial basis function as a low-fidelity model assumes that its corresponding hypothesis space (recall Definition 3.1.1) is a reproducing kernel Hilbert space $H_K$. For an elaborate treatment of reproducing kernel Hilbert spaces, I refer to the literature (see, e.g., [187], [51, ch. 2.4], [91, ch. 5.8]).
Let us regard the space $H_K$ solely in the context of radial basis functions where a generic kernel $\varphi = (x, t) \mapsto \varphi(x, t) : X \times X \to \mathbb{R}$ is specified as a radial kernel by setting $\varphi = r \mapsto \varphi(r) : \mathbb{R}_+ \to \mathbb{R}$ where $r := \lVert x - t \rVert_{l_2}$ and $t \in X$ denotes a center point. Technically, it is tacitly assumed that the basis functions are radially symmetric on $\mathbb{R}^d$.
By invoking a function $\varphi_x = t \mapsto \varphi(x, t) : X \to \mathbb{R}$, a member of the space $H_K$ is a function $\psi = x \mapsto \psi(x) : X \to \mathbb{R}$ which can be portrayed as
\begin{align}
\psi(x) &:= c_1 \varphi_x(r_1) + \dots + c_{n-1} \varphi_x(r_{n-1}) + c_n \varphi_x(r_n) \tag{3.74a} \\
&\equiv \sum_{i=1}^{n} c_i \varphi_x(r_i), \tag{3.74b}
\end{align}
where $r_i := \lVert x - t_i \rVert_{l_2}$ with $i \in \{1, \dots, n\}$ and it holds that $n \in \mathbb{N}$ and the coefficients $c_i \in \mathbb{R}$. Analogously to (3.41), one can state that
\[
\psi \in \operatorname{span}(\{\varphi_x(r_i) \mid x \in X \wedge i \in \{1, \dots, n-1, n\}\}).
\tag{3.75}
\]
Notice well that if there is no peril of confusion, let us reduce the amount of notation by omitting the subscript of $\varphi_x$.
In Table 3.2, six different assignment definitions for a generic radial basis function $\varphi = r \mapsto \varphi(r) : \mathbb{R}_+ \to \mathbb{R}$ are provided. Their description follows the notational convention in [70, p. 46] and [116, p. 262], respectively; thus, let us call the parameter $\sigma$ the shape parameter. Furthermore, let us interpret the shape parameter as a hyperparameter (recall Remark 3.1.4). For more elaborations on smoothness properties or convergence properties of radial basis functions, I refer to, e.g., [38].
Despite favorable smoothness and convergence properties associated with radial basis functions, the selection of an appropriate radial basis function for a task at hand is heavily problem-dependent such that a heuristic approach to the selection is common.

TABLE 3.2: Given a generic radial basis function $\varphi = r \mapsto \varphi(r)$ with function signature $\mathbb{R}_+ \to \mathbb{R}$, six different definitions for the assignment $\varphi(r)$.

    Name                      Assignment φ(r)
    Linear                    $r$
    Cubic                     $r^3$
    Thin plate spline         $r^2 \log(r)$
    Gaussian                  $e^{-\frac{r^2}{2\sigma^2}}$
    Multiquadratic            $(r^2 + \sigma^2)^{\frac{1}{2}}$
    Inverse multiquadratic    $(r^2 + \sigma^2)^{-\frac{1}{2}}$
In Figure 3.6, I illustrate the six different radial basis function definitions from
Table 3.2. For those radial basis functions involving the shape parameter σ, I sketch
the members of the corresponding family where σ∈{0.2,0.4,0.6}.
FIGURE 3.6: The six radial basis functions in Table 3.2. (i) Linear, (ii) Cubic, (iii) Thin plate spline, (iv) Gaussian, (v) Multiquadratic, (vi) Inverse multiquadratic. [Six plots of $\varphi(r)$ over $r \in [0, 2]$; panels (iv) to (vi) each show the curves for $\sigma := 0.2$, $\sigma := 0.4$, and $\sigma := 0.6$; plots omitted.]
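For reference, the six assignments of Table 3.2 can be written down directly in the Julia PL. The function names are hypothetical; the explicit treatment of $r = 0$ for the thin plate spline is an implementation detail added here because the floating-point product of zero and $\log(0)$ is not a number.

```julia
# Radial basis functions of Table 3.2; σ denotes the shape parameter.
rbf_linear(r)              = r
rbf_cubic(r)               = r^3
rbf_thin_plate(r)          = r == 0 ? zero(r) : r^2 * log(r)   # limit value 0 at r = 0
rbf_gaussian(r, σ)         = exp(-r^2 / (2 * σ^2))
rbf_multiquadric(r, σ)     = sqrt(r^2 + σ^2)
rbf_inv_multiquadric(r, σ) = 1 / sqrt(r^2 + σ^2)

# Members of the Gaussian family for the shape parameters used in Figure 3.6.
gauss_at_half = [rbf_gaussian(0.5, σ) for σ in (0.2, 0.4, 0.6)]
```

A larger shape parameter widens the Gaussian, so the values in `gauss_at_half` increase with $\sigma$.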
If we identify the number of basis functions $n$ with the number of sampling plan points, that is, $n \equiv m$, one can execute a determination of the coefficients in (3.74) by interpolation (cf. (3.18)). A common choice regarding the center points $t_i$ is to identify the center points with the sampling plan points $x_i$, that is, $t_i \equiv x_i$. The floating-point arithmetic operations complexity is $O(n^3)$, the storage costs are $O(n^2)$, and the evaluation costs are $O(n)$ (see, e.g., [181], [19], [63]). Since there is no particular sparsity pattern associated with the corresponding interpolation matrix that could be exploited during the solving, the solving's arithmetic complexity is in the same class as the Gaussian elimination algorithm's arithmetic complexity of $O(n^3)$.
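A minimal sketch of the interpolation with $t_i \equiv x_i$ is given below for the Gaussian radial basis function. The sampling plan, the output points, the shape parameter value, and the function names are assumptions for illustration only.

```julia
using LinearAlgebra

gaussian(r, σ) = exp(-r^2 / (2 * σ^2))

# Interpolation matrix with entries φ(‖x_i − t_l‖) for centers t_l ≡ x_l.
function gram_matrix(X, σ)
    n = size(X, 1)
    return [gaussian(norm(X[i, :] - X[l, :]), σ) for i in 1:n, l in 1:n]
end

# Evaluate ψ(x) = Σ c_i φ(‖x − t_i‖) as in (3.74); O(n) operations per point.
function rbf_eval(x, X, c, σ)
    return sum(c[i] * gaussian(norm(x - X[i, :]), σ) for i in 1:size(X, 1))
end

X = [0.1 0.6; 0.2 0.3; 0.4 0.7; 0.5 0.8; 0.6 0.1; 0.3 0.2; 0.7 0.4; 0.8 0.5]
y = [sin(3 * X[i, 1]) + X[i, 2]^2 for i in 1:size(X, 1)]   # illustrative outputs
σ = 0.2                    # illustrative shape parameter
Φ = gram_matrix(X, σ)      # dense n × n matrix: O(n^2) storage
c = Φ \ y                  # dense solve: O(n^3) arithmetic operations
```

By construction, the resulting model reproduces the output points at the sampling plan points up to rounding errors.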
If the number of basis functions becomes too large, for instance, $n \gg 1 \times 10^4$, then this low-fidelity model becomes impractical. However, since the number of sampling plan points is kept rather small, i.e., $m \ll 1 \times 10^4 \Longrightarrow n \ll 1 \times 10^4$, let us not dwell on schemes for the evaluation of radial basis functions. For more details on this active area of research, I refer to, e.g., [181], [19], [63].
By definition, the corresponding interpolation matrix (or, more suited to the given context, the Gram matrix) is for all radial basis functions in Table 3.2 at least positive semi-definite (see, e.g., [187], [116, p. 262]). In the case of the Gaussian radial basis function, the corresponding interpolation matrix is even positive definite (see, e.g., [187], [70, p. 46]) such that, given the output points, it is guaranteed that a coefficient column vector exists. Nevertheless, for all radial basis functions, there is still a governing trade-off principle or uncertainty principle which states that if we increase the accuracy, e.g., by increasing the number of sampling plan points $m$ or by increasing the shape parameter $\sigma$, then the condition number, as an indicator for numerical stability, grows as well (see, e.g., [62, ch. 16], [187]). Note that using a sampling plan such as the Sobol quasi-random sequence can have a moderately beneficial influence on the condition number (see, e.g., [34]).
Observe that if we make the choice $n < m$, more precisely, if we do not take every sampling plan point as a center point, then one can execute a determination of the coefficients in (3.74) by regression, analogously to (3.62). However, to the best of my knowledge, there is no complete theory to explain the optimal selection of sampling plan points as center points in the case of regression (see, e.g., [62, p. 168ff], [70, p. 49]). Due to some relationship between radial basis functions and splines (see, e.g., [91, p. 36]), the center point selection problem is mostly solved by heuristic selection, similarly to the heuristic approach to the knot selection problem in multivariate spline regression.
For the sake of completeness, I mention briefly two kinds of extensions of the
low-fidelity model in (3.74) which are discussed in the literature for reasons such as
ensuring well-posedness, increasing accuracy, and the like.
The first kind of extension is related to the thin plate spline and the radial powers, such as the linear or the cubic (see Table 3.2). More precisely, the low-fidelity model $\psi$ in (3.74) is extended linearly by a $d$-variate polynomial $p$ of total degree at most one, i.e., $p \in P^d_{\le 1}$ and, usually, in a monomial basis setting, such that one can represent an extended radial basis low-fidelity model $\hat{\psi}$ as
\[
\hat{\psi}(x) := p(x) + \psi(x),
\tag{3.76}
\]
where the signature of the operation $+$ is $\mathbb{R} \times \mathbb{R} \to \mathbb{R}$. For more details on the determination of the coefficients of $\hat{\psi}(x)$ and for some applications of $\hat{\psi}(x)$ in engineering, I refer to [27] and references therein.¹⁶
¹⁶ Note that the determination of the coefficients of the extended radial basis low-fidelity model $\hat{\psi}$ results in a block matrix which incorporates additional constraints in order to uniquely determine all the coefficients. By considering the corresponding Schur complements, it is probably beneficial to experiment with various combinations of polynomials $p \in P^d_{\le k}$ of varying degree $k$ and radial basis functions $\psi$ in order to achieve desirable properties such as a symmetric positive definite block matrix.

The second kind of extension is related to a statistical setting (see, e.g., [61, p. 177]), that is, the low-fidelity model $\psi$ in (3.74) is extended linearly by a $d$-variate polynomial $\hat{\mu}$ of total degree at most zero, i.e., $\hat{\mu} \in P^d_{\le 0}$ such that one can define an extended radial basis low-fidelity model $\hat{\psi}$ as
\[
\hat{\psi}(x) := \hat{\mu}(x) + \psi(x),
\tag{3.77}
\]
where the term $\hat{\mu}(x)$ can be interpreted as the estimate of the mean of the high-fidelity model. In order to discuss the relationship between $\psi(x)$ and $\hat{\psi}(x)$, one would have to dwell on affine spaces, which we forgo since, frequently, the term $\hat{\mu}(x)$ is considered as incorporated in the coefficients and in the basis functions of $\psi(x)$ (see, e.g., [91, p. 11f]). Otherwise, the determination of the coefficients column vector in (3.62) has to be adapted such that
\[
\hat{\tilde{c}} := B^+ (y - \boldsymbol{\mu}_y),
\tag{3.78}
\]
where $B^+$ has to be customized to the corresponding radial basis function and $\boldsymbol{\mu}_y$ denotes the constant unknown mean column vector with respect to $y$. Let us postpone, though, further discussions on the expression in (3.77) to the elaboration on stochastic interpolation via kriging low-fidelity models.
Kriging
Supposing a sampling plan $X_s$ such as in (3.14), a kriging low-fidelity model can be interpreted as an extended radial basis low-fidelity model $\hat{\psi}$ such as in (3.77) (see, e.g., [70, p. 60] or [109]), that is,
\[
\hat{\psi}(x) := \hat{\mu}(x) + \sum_{i=1}^{n} c_i \varphi_x(r_i),
\tag{3.79}
\]
where $n \equiv m$, i.e., all sampling plan points are utilized as center points. Following the Gaussian in Table 3.2, the assignment $\varphi_x(r_i)$ in (3.79) is commonly chosen as
\[
\varphi_x(r_i) := \exp(-r_i),
\tag{3.80}
\]
where the radii $r_i$ are not defined via an $l_2$-norm such as in (3.74), but via a metric such that $r_i$ can be written as
\[
r_i := \sum_{j=1}^{d} \theta_j \lvert x^j - x_i^j \rvert^{p_j},
\tag{3.81}
\]
where $x_i^j$ refers to the $j$-th component of the $i$-th sampling plan point, $d$ refers to the total number of components of the $i$-th sampling plan point, i.e., $d \equiv N_\xi$, and $\theta_j \in \mathbb{R}_+$ and $p_j \in [0, 2]$ refer to parameters that have to be determined. Mind that the coefficients $c_i$ in (3.79) depend on the parameters $\theta_j$, $p_j$, and $\hat{\mu}(x)$. These unknown quantities are determined by means of a statistical machinery which we sketch out next.
Building upon (3.20), an approach to a kriging low-fidelity model is to consider
\[
\forall i \in \{1, \dots, m\}.\; z(x_i) = y_i - \mu(x_i),
\tag{3.82}
\]
where $\mu$ indicates a constant unknown mean function with respect to $y$ and the residual $y_i - \mu(x_i)$ indicates a realization of a Gaussian process $z(x)$ (cf. [61, p. 146]). Note that we only consider a constant unknown mean function. This choice of $\mu$ is associated with the so-called ordinary kriging. For other kinds of kriging, I refer to [221, p. 154].
From a statistical viewpoint, $z(x)$ is associated with a random error (recall the discussion on noise in § 3.1.1). However, from an interpolation viewpoint, we do not regard any errors in the output points $y_i$ (cf. [70, p. 55]).
Therefore, the sketch is inspired mainly by the approach to a kriging low-fidelity
model presented in [108] and in [70, ch. 2.4]. For more details regarding the sketch,
I refer to, e.g., [184], [108], [61, ch. 5.4], [173, ch. 5], [70, ch. 2.4] or [116, ch. 15].
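With regard to a given sampling plan $X_s$, let us encode in matrix representation the output points $y_i$ as a column vector $y \in \mathbb{R}^{m \times 1}$ with
\[
y := [y_1 \;\; y_2 \;\; \dots \;\; y_{m-1} \;\; y_m]^T.
\tag{3.83}
\]
Furthermore, with regard to a given sample, one can define the probability density of an $m$-dimensional Gaussian distribution at $y$ as
\[
\mathcal{N}_m(y \mid \boldsymbol{\mu}_y, \Sigma) := \frac{1}{(2\pi)^{m/2} \lvert \Sigma \rvert^{1/2}} \exp\!\left( -\frac{1}{2} (y - \boldsymbol{\mu}_y)^T \Sigma^{-1} (y - \boldsymbol{\mu}_y) \right),
\tag{3.84}
\]
where, technically, $y$ is associated with a corresponding random vector $Y$, $\Sigma \in \mathbb{R}^{m \times m}$ denotes the covariance matrix and $\boldsymbol{\mu}_y \in \mathbb{R}^{m \times 1}$ denotes the constant unknown mean column vector with respect to $y$ that, given a scalar $\mu_y \in \mathbb{R}$, is defined as
\begin{align}
\boldsymbol{\mu}_y &:= \mu_y \cdot [1 \;\; 1 \;\; \dots \;\; 1 \;\; 1]^T \tag{3.85} \\
&\equiv \mu_y \cdot \mathbf{1}, \tag{3.86}
\end{align}
where $\mathbf{1} := [1 \;\; 1 \;\; \dots \;\; 1 \;\; 1]^T$ with $\mathbf{1} \in \mathbb{R}^{m \times 1}$.¹⁷ Observe that $\mathcal{N}_m(y \mid \boldsymbol{\mu}_y, \Sigma)$ in (3.84) is a slight abuse of notation in order to emphasize the sample-oriented viewpoint in the present work (recall § 3.1.1).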
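If we consider the constant unknown variance with respect to $y$, i.e., $\sigma_y^2 \in \mathbb{R}$, that is associated with the constant unknown standard deviation with respect to $y$, i.e., $\sigma_y \in \mathbb{R}$, by
\[
\sigma_y := \sqrt{\sigma_y^2},
\tag{3.87}
\]
then the covariance matrix can be expressed by the correlation matrix $\Psi \in \mathbb{R}^{m \times m}$ such that
\[
\Sigma \equiv \sigma_y^2 \cdot \Psi.
\tag{3.88}
\]
In the context of a kriging low-fidelity model, the entries of the correlation matrix $\Psi := [\psi_{i,l}]$ are commonly defined by the radial basis function in (3.80) such that
\[
\psi_{i,l} \equiv \exp\!\left( -\sum_{j=1}^{d} \theta_j \lvert x_i^j - x_l^j \rvert^{p_j} \right).
\tag{3.89}
\]
The choice of entries in (3.89) reveals that if $i = l$, then $\psi_{i,l} = 1$, and if $\lVert x_i - x_l \rVert_{l_2}$ grows, then, for positive parameters $\theta_j$ and $p_j$, $\psi_{i,l}$ decays exponentially and tends asymptotically to zero.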
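The assembly of the correlation matrix in (3.89) can be sketched as follows. The parameter values for $\theta$ and $p$ are illustrative choices and not estimates; the function names are hypothetical.

```julia
using LinearAlgebra

# Correlation between two points as in (3.89); θ and p hold one entry per coordinate.
kriging_corr(xi, xl, θ, p) = exp(-sum(θ .* abs.(xi .- xl) .^ p))

function correlation_matrix(X, θ, p)
    m = size(X, 1)
    return [kriging_corr(X[i, :], X[l, :], θ, p) for i in 1:m, l in 1:m]
end

X = [0.1 0.6; 0.2 0.3; 0.4 0.7; 0.5 0.8; 0.6 0.1; 0.3 0.2; 0.7 0.4; 0.8 0.5]
θ = [10.0, 10.0]      # illustrative values (assumption)
p = [2.0, 2.0]        # common fixed choice for the exponents
Ψ = correlation_matrix(X, θ, p)
```

The diagonal entries equal one, and the off-diagonal entries lie strictly between zero and one for distinct sampling plan points.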
Instead of a parametrization by its mean vector and its covariance matrix, one can parameterize the $m$-dimensional Gaussian distribution in (3.84) by its mean vector, its variance scalar, and its correlation matrix such that (3.84) can be rewritten as
\[
\mathcal{N}_m(y \mid \boldsymbol{\mu}_y, \sigma_y^2, \Psi) := \frac{1}{(2\pi\sigma_y^2)^{m/2} \lvert \Psi \rvert^{1/2}} \exp\!\left( -\frac{1}{2\sigma_y^2} (y - \boldsymbol{\mu}_y)^T \Psi^{-1} (y - \boldsymbol{\mu}_y) \right),
\tag{3.90}
\]
where it holds that $\forall \Psi \in \mathbb{R}^{m \times m}.\; \forall \sigma_y^2 \in \mathbb{R}.\; \det(\sigma_y^2 \Psi) = (\sigma_y^2)^m \det(\Psi)$.

¹⁷ Let us set $\lvert \Sigma \rvert := \det(\Sigma)$ where we comprehend the map $\det$ as the determinant of a square matrix, more precisely, $\det = \Sigma \mapsto \det(\Sigma) : \mathbb{R}^{m \times m} \to \mathbb{R}$.

Due to the definition of the entries of $\Psi$ in (3.89), the matrix $\Psi$ is positive definite and all of its eigenvalues are positive, respectively. Therefore, the matrix $\Psi$ is non-singular and the inverse matrix $\Psi^{-1}$ exists, respectively. Furthermore, one can discern that $\lvert \Psi \rvert > 0$.¹⁸
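Abstractly, one can define the likelihood function $L = \vartheta \mapsto L(\vartheta) \equiv L(y \mid \vartheta)$ with the signature $\Theta \to [0, 1]$ and $\vartheta := (\mu_y, \sigma_y^2)$ such that the assignment $L(\vartheta)$ reads as
\[
L(\vartheta) := \frac{1}{(2\pi\sigma_y^2)^{m/2} \lvert \Psi \rvert^{1/2}} \exp\!\left( -\frac{1}{2\sigma_y^2} (y - \boldsymbol{\mu}_y)^T \Psi^{-1} (y - \boldsymbol{\mu}_y) \right).
\tag{3.91}
\]
Roughly speaking: given any $y$, the aim is to find the parameter $\vartheta$ such that the likelihood of observing $y$ is maximized.¹⁹ Hence, the maximum likelihood estimate (MLE) for $\vartheta$, i.e., $\hat{\vartheta}_{\mathrm{MLE}}$ or $\hat{\vartheta}$, is characterized by
\[
\hat{\vartheta} := \underset{\vartheta \in \Theta}{\operatorname{argmax}} \; L(\vartheta).
\tag{3.92}
\]
Computationally more amenable regarding (3.92), though, is to consider the ln-likelihood function $L_{\ln} = \vartheta \mapsto \ln(L(\vartheta))$ with the signature $\Theta \to \left]-\infty, 0\right]$ such that the assignment $L_{\ln}(\vartheta)$ reads as
\[
L_{\ln}(\vartheta) := -\frac{m}{2} \ln(2\pi) - \frac{m}{2} \ln(\sigma_y^2) - \frac{1}{2} \ln(\lvert \Psi \rvert) - \frac{1}{2\sigma_y^2} (y - \boldsymbol{\mu}_y)^T \Psi^{-1} (y - \boldsymbol{\mu}_y),
\tag{3.93}
\]
where, due to the definition of the natural logarithm, one has to suppose that $\sigma_y^2 > 0$ and $\lvert \Psi \rvert > 0$.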
ycan be described by
ˆ
µy∶=1TΨ−1
1TΨ−11y, (3.94a)
ˆ
σ2
y∶=1
m(y−ˆ
µy)TΨ−1(y−ˆ
µy), (3.94b)
where ˆ
µyis defined analogously to (3.85), that is,
ˆ
µy∶=ˆ
µy⋅1. (3.95)
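For completeness, the estimates in (3.94) follow from the first-order optimality conditions of the ln-likelihood function in (3.93). The following short derivation is a sketch that uses the symmetry of $\Psi$ and treats $\Psi$ as fixed:

```latex
\begin{align*}
\frac{\partial L_{\ln}}{\partial \mu_y}
  &= \frac{1}{\sigma_y^2} \, \mathbf{1}^T \Psi^{-1} (y - \mu_y \mathbf{1}) = 0
  &&\Longrightarrow\quad
  \hat{\mu}_y = \frac{\mathbf{1}^T \Psi^{-1} y}{\mathbf{1}^T \Psi^{-1} \mathbf{1}}, \\
\frac{\partial L_{\ln}}{\partial \sigma_y^2}
  &= -\frac{m}{2 \sigma_y^2}
     + \frac{1}{2 \sigma_y^4} (y - \hat{\mu}_y \mathbf{1})^T \Psi^{-1} (y - \hat{\mu}_y \mathbf{1}) = 0
  &&\Longrightarrow\quad
  \hat{\sigma}_y^2 = \frac{1}{m} (y - \hat{\mu}_y \mathbf{1})^T \Psi^{-1} (y - \hat{\mu}_y \mathbf{1}).
\end{align*}
```

The first condition is a generalized least squares estimate of a constant mean, and the second one is the corresponding weighted mean squared residual.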
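By evaluating the ln-likelihood function in (3.93) at the estimates in (3.94) and truncating those terms of the assignment $L_{\ln}(\vartheta)$ that represent solely numbers, one can define the concentrated ln-likelihood function $L_{\mathrm{cln}} = (\theta, p) \mapsto L_{\mathrm{cln}}(\theta, p)$ with the signature $[0, +\infty[^d \times [0, 2]^d \to \left]-\infty, 0\right]$ such that the assignment $L_{\mathrm{cln}}(\theta, p)$ reads as
\[
L_{\mathrm{cln}}(\theta, p) := -\frac{m}{2} \ln(\hat{\sigma}_y^2) - \frac{1}{2} \ln(\lvert \Psi \rvert).
\tag{3.96}
\]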
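A minimal sketch of the evaluation of (3.96) for a given correlation matrix is shown below. It relies on a Cholesky factorization for the linear solves and for the logarithm of the determinant; the function name `concentrated_lnlik`, the one-dimensional data, and the trial values of $\theta$ are assumptions for illustration.

```julia
using LinearAlgebra

# Concentrated ln-likelihood (3.96) for a correlation matrix Ψ and outputs y,
# using the closed-form estimates (3.94a) and (3.94b).
function concentrated_lnlik(Ψ, y)
    m = length(y)
    F = cholesky(Symmetric(Ψ))        # requires Ψ to be positive definite
    one_vec = ones(m)
    μ_hat = dot(one_vec, F \ y) / dot(one_vec, F \ one_vec)   # (3.94a)
    res = y .- μ_hat
    σ2_hat = dot(res, F \ res) / m                            # (3.94b)
    return -m / 2 * log(σ2_hat) - logdet(F) / 2, μ_hat, σ2_hat
end

# Illustrative one-dimensional data (assumption), exponent p = 2 fixed in advance.
x = [0.0, 0.25, 0.5, 0.75, 1.0]
y = [0.0, 0.7, 1.0, 0.7, 0.0]
corr(θ) = [exp(-θ * abs(x[i] - x[l])^2) for i in 1:5, l in 1:5]

# Scan a few trial values of θ; an optimizer would search over θ instead.
vals = [concentrated_lnlik(corr(θ), y)[1] for θ in (5.0, 10.0, 50.0)]
```

In an actual estimation, the negative of this function would be passed to an optimization algorithm over $\theta$, with $p$ either optimized as well or fixed in advance.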
¹⁸ Given $\lvert \Psi \rvert := \det(\Psi)$ and if $\lambda_i$ denote the eigenvalues of the matrix $\Psi \in \mathbb{R}^{m \times m}$, then one can invoke the statement $\det(\Psi) \equiv \prod_{i=1}^{m} \lambda_i$. Hence, if the matrix $\Psi$ is positive definite and all of its eigenvalues are positive, then $\det(\Psi) > 0$ holds.
¹⁹ For a more elaborate treatment of some aspects regarding the interpretation of the likelihood function, I refer to [201, p. 29ff].

In order to determine the maximum likelihood estimates of $(\theta, p)$ numerically by utilizing a suitable optimization algorithm (recall § 2.3.3), it is common to associate $(\hat{\theta}, \hat{p})$ with the expression
\[
(\hat{\theta}, \hat{p}) := \underset{(\theta, p) \in [0, +\infty[^d \times [0, 2]^d}{\operatorname{argmin}} \; -L_{\mathrm{cln}}(\theta, p).
\tag{3.97}
\]
In [70, p. 55–58], the authors mention further aspects regarding the numerical treatment of the expression in (3.97). For instance, due to the definition of the entries of the correlation matrix $\Psi$ in (3.89), the maximum likelihood estimates $(\hat{\theta}, \hat{p})$ are sensitive to the scaling of a given sampling plan $X_s$. Therefore, it is advisable to consider the normalized sampling plan $X_s$, that is, a sampling plan scaled to the unit $d$-dimensional hypercube (recall the case $d = 2$ in Figure 3.1 and Figure 3.2).
Furthermore, it is preferable to search for the entity $\theta$ on a logarithmic scale over a closed interval such as $\theta \in [10^{-3}, 10^{2}]^d$.
Additionally, it is common to alleviate the computational burden in (3.97) by setting heuristically the entity $\hat{p}$ to a fixed value in advance. A usual choice is $\hat{p} = [2 \;\; 2 \;\; \dots \;\; 2 \;\; 2]^T$ with $\hat{p} \in \mathbb{R}^{d \times 1}$.
It should be recalled that the kriging low-fidelity models are closely related to Gaussian radial basis functions. Hence, it is reasonable to assume that the kriging low-fidelity models suffer from ill-conditioning issues as well, which could be mitigated by simple regularization techniques (see the discussion concerning (3.65)) such that, for instance, instead of the correlation matrix $\Psi$, the regularized correlation matrix $\Psi + \lambda I$ is considered with $\lambda \in [e, 1 \times 10^{-6}]$ and $I \in \mathbb{R}^{m \times m}$. Hence, the matrix $\Psi + \lambda I$ has to be taken into account in (3.94) and in (3.96) as well. Technically, the regularized correlation matrix implicitly supposes a regression problem as opposed to an interpolation problem. Pragmatically, due to regarding the range of values for the hyperparameter $\lambda$ as marginal, this specific regularized correlation matrix is still treated within an interpolation problem (see, e.g., [70, p. 152]).
It is also reasonable to assume that the kriging low-fidelity models exhibit a similar character regarding the floating-point arithmetic operations complexity, the storage costs, and the evaluation costs as the Gaussian radial basis functions. Still, it is possible to exploit the matrix structure of the correlation matrix $\Psi$, which is a symmetric positive definite matrix. More precisely, a matrix inversion based on Cholesky decomposition can be performed in order to reduce the number of floating-point arithmetic operations compared to a lower–upper (LU) decomposition.
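The effect of the regularized correlation matrix on the condition number, together with a Cholesky-based solve, can be sketched as follows. The one-dimensional sampling plan, the value of $\theta$, the value $\lambda = 10^{-6}$, and the right-hand side are assumptions chosen so that the unregularized matrix is numerically singular.

```julia
using LinearAlgebra

# Hypothetical one-dimensional plan with m = 20 points and a wide Gaussian correlation.
x = collect(range(0.0, 1.0; length = 20))
Ψ = [exp(-1.0 * abs(x[i] - x[l])^2) for i in 1:20, l in 1:20]

λ = 1e-6                           # illustrative regularization parameter (assumption)
Ψ_reg = Ψ + λ * I                  # regularized correlation matrix

F = cholesky(Symmetric(Ψ_reg))     # exploits symmetry and positive definiteness
b = sin.(2π .* x)                  # illustrative right-hand side
w = F \ b                          # solve Ψ_reg * w = b without forming an inverse

κ_plain = cond(Ψ)
κ_reg   = cond(Ψ_reg)
```

Solving with the factorization avoids forming the inverse explicitly; the same factor can be reused for (3.94) and for the logarithm of the determinant in (3.96).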
Notice well that if we apply a singular value decomposition (SVD) to the correlation matrix $\Psi$, then we obtain a least-squares kriging regression (cf. [70, p. 152]). The determination of the inverse $\Psi^{-1}$ of a non-singular matrix $\Psi$ by the SVD can be seen as computationally equivalent to the determination of the pseudoinverse $\Psi^+$ of the matrix $\Psi$ by the SVD method such as in (3.61).
After the determination of the maximum likelihood estimates $(\hat{\theta}, \hat{p})$, one can specify the kriging low-fidelity model in (3.79) as
\[
\hat{y}(x) := \hat{\mu}_y + r^T \Psi^{-1} (y - \hat{\boldsymbol{\mu}}_y),
\tag{3.98}
\]
where $\hat{y} : X \to \mathbb{R}$ such that $\hat{y}(x)$ indicates the prediction²⁰ at an arbitrary point $x$ and $r := [r_i] \in \mathbb{R}^{m \times 1}$ denotes the correlation column vector that reads as
\[
r := [r_1 \;\; r_2 \;\; \dots \;\; r_{m-1} \;\; r_m]^T,
\tag{3.99}
\]
where, with a slight overloading of the symbol $r_i$, the components of $r$ denote the correlations between $x$ and the sampling plan points $x_i$, that is, they are obtained by evaluating (3.80) at the radii in (3.81). For an in-depth derivation of the maximum likelihood estimate $\hat{y}(x)$, I refer to [70, p. 59-62].

²⁰ More precisely, in statistics vernacular, $\hat{y}(x)$ indicates the best linear unbiased predictor (BLUP) (see, e.g., [61, p. 146f]).
3.1.3 Simplified-physics low-fidelity models
Simplified-physics low-fidelity models depend on a user’s domain-specific knowl-
edge regarding the mathematical description of the physics associated with the high-
fidelity model and regarding the numerical software associated with the high-fidelity
model (recall chapter 2).
Depending on the degree of intervention in the implementation of the numerical software regarding the high-fidelity model, the low-fidelity models are intrusive or non-intrusive. Let us consider all simplified-physics low-fidelity models as non-intrusive, especially if a low-fidelity model is based on, e.g., a coarse-grid discretization or a weakened termination criterion of an iterative solver or a combination of both.
Recalling § 3.1.1, a basic postulate concerning a high-fidelity model $K$ and a low-fidelity model $\tilde{K}$ is that $K \in Y^X$ and $\tilde{K} \in Y^X$. Unlike the deterministic and probabilistic data-fit low-fidelity models, one cannot generally provide a hypothesis space $H$. Furthermore, the computational costs and the degrees of fidelity linked to the low-fidelity models under consideration are prescribed by the user who, abstractly speaking, imposes implicitly some kind of lexicographic ordering or lexicographic preference on the class that encompasses all models.
Besides the low-fidelity model based on, e.g., a coarse-grid discretization, an-
other example of a simplified-physics low-fidelity model is a one-dimensional, lin-
ear boundary value problem (1D-LBVP) that is, in some sense, related to a two-
dimensional, linear boundary value problem (2D-LBVP). Notice well that the two-
dimensional, linear boundary value problem can be seen as a simplified-physics
low-fidelity model which in turn, in some sense, is related to a three-dimensional,
non-linear boundary value problem (3D-NLBVP).
Hence, one can construct a hierarchy of low-fidelity models where the high-fidelity model corresponds to a 3D-NLBVP. In Figure 3.7, there is a schematic depiction of such a possible user-prescribed hierarchy.²¹
If we invoke the Figure 2.1a, then one can concretize the Figure 3.7 with regard
to a user-prescribed hierarchy of magnetoquasistatic and magnetostatic problems,
respectively, by means of the Figure 3.8.
In (i) of Figure 3.8, a single conducting subdomain is depicted as a common
representative of a magnetoquasistatic subsystem's domain of application. In order
to emphasize the subdomain's three-dimensionality, let us utilize the superscript 3D.
In (ii) of Figure 3.8, a single conducting subdomain exhibiting two-dimensionality
(2D) is shown. The cross-sectional area indicated by $\Omega_{nc}$ can be regarded as
topologically equivalent to the closed 2-ball, which is topologically equivalent to
the closed unit 2-cube $[0,1]^2$. Hence, the cross-sectional area's boundary
$\partial\Omega_{nc}$ can be seen as topologically equivalent to the 1-sphere, which is
topologically equivalent to the boundary of the closed unit 2-cube $[0,1]^2$. In
applications, assuming an appropriate metric structure, a round and a rectangular
cross-sectional area are commonly utilized for geometrically modeling a round
conductor and a foil conductor, respectively. These conductor kinds are usually the
building blocks of an inductive component's winding of varying complexity. For
instance, the round conductor constitutes a basic building block of a litz wire
winding (see, e.g., [154, p. 110-113]).
$^{21}$ Figure 3.7 is partly inspired by the depictions in [166].
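[Figure 3.7, panels (a) and (b). Panel (a): the problems 3D-NLBVP, 3D-LBVP, 2D-NLBVP, 2D-LBVP, 1D-NLBVP, and 1D-LBVP arranged along the axes "computational costs" and "degree of fidelity". Panel (b): relationships diagram over the same six problems.]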
FIGURE 3.7: A schematic depiction of a user-prescribed hierarchy of
problems which are associated with simplified-physics low-fidelity
models. The problem 3D-NLBVP is associated with the high-fidelity
model. (a) An arrangement of the user-prescribed hierarchy with re-
gard to the degree of fidelity and the computational costs. (b) An en-
coding of the user-prescribed hierarchy as a relationships diagram in
which the arrow points from a model with higher degree of fidelity
and higher computational costs to a model with lower degree of fi-
delity and lower computational costs.
A corresponding two-dimensional boundary value problem can be associated
with a three-dimensional boundary value problem where the cross-sectional area is
assumed to be spatially longitudinally homogeneous.
In (iii) of Figure 3.8, an ohmic resistor from electric circuit components is
depicted as a symbolic representation of a function $R$. In applications, the
function $R$ is often associated with special mathematical functions such as the
natural logarithm or Bessel functions. Prevalently though, the function $R$ is
associated with a multivariate rational function.
For instance, a conductor's ohmic resistance at the frequency 0 Hz can be expressed
as a multivariate rational function depending on a parameter point $\boldsymbol{\xi}$
(recall § 2.2.3) in which geometrical parameters are incorporated that define the
conductor's length and its cross-sectional area. Assuming a spatially constant
electric conductivity (recall § 2.1.2), that is, $\sigma(x) := \sigma_0$ with
$\sigma_0 \in \mathbb{R}^+$, one could, technically, include the material parameter
$\sigma_0$ in the parameter point $\boldsymbol{\xi}$ as well. However, a good conductor
is usually assumed in the sense that the material characteristics of plain copper
are utilized such that $\sigma_0$ is fixed as $\sigma_0 := \sigma_{Cu}$ with
$\sigma_{Cu} := 5.96 \times 10^{7}$ S/m.
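As a minimal sketch of such a rational function, one may assume the textbook relation $R = \ell / (\sigma_0 A)$ for a straight conductor of length $\ell$ with a uniform cross-sectional area $A$. The function name and the parameter layout are illustrative assumptions and are not taken from the present work.

```python
SIGMA_CU = 5.96e7  # electric conductivity of plain copper in S/m (value from the text)


def dc_resistance(length_m: float, area_m2: float, sigma: float = SIGMA_CU) -> float:
    """Ohmic resistance at 0 Hz, assuming R = length / (sigma * area).

    Seen as a function of (length, area), this is a rational function of
    type (1, 1): numerator p = length, denominator q = sigma * area.
    """
    if area_m2 <= 0.0 or sigma <= 0.0:
        raise ValueError("area and conductivity must be positive")
    return length_m / (sigma * area_m2)


# Hypothetical parameter point: 1 m of wire with a 1 mm^2 cross-section.
r_dc = dc_resistance(1.0, 1.0e-6)
```

With $\sigma_0$ fixed, the parameter point only carries the two geometrical parameters; passing `sigma` explicitly corresponds to including $\sigma_0$ in the parameter point.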
Another example is an impedance which can be interpreted as a representative of a
real inductive component in a circuit theory context. This impedance can be used,
e.g., in the computation of a two-port S-parameter matrix. The components of the
parameter point $\boldsymbol{\xi}$ adhere to the physical interpretation as an angular
frequency, a capacitance, an inductance, and an electrical resistance.
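The text does not fix a particular circuit topology. Purely for illustration, the following sketch assumes a series RLC impedance that is inserted as a series element between two ports with a real reference impedance of 50 Ω; the topology, the reference impedance, and all names are assumptions.

```python
def series_rlc_impedance(omega, capacitance, inductance, resistance):
    """Z(omega) = R + j*omega*L + 1/(j*omega*C) for an assumed series RLC branch."""
    return resistance + 1j * omega * inductance + 1.0 / (1j * omega * capacitance)


def s_matrix_series_element(z, z0=50.0):
    """S-parameters of a series impedance z between two ports (reference z0)."""
    s11 = z / (z + 2.0 * z0)
    s21 = 2.0 * z0 / (z + 2.0 * z0)
    return ((s11, s21), (s21, s11))


# Hypothetical parameter point (omega, C, L, R), chosen at resonance where the
# reactances cancel and Z reduces to R.
L_H, C_F, R_OHM = 1.0e-6, 1.0e-9, 2.0
omega0 = 1.0 / (L_H * C_F) ** 0.5
z_res = series_rlc_impedance(omega0, C_F, L_H, R_OHM)
s_res = s_matrix_series_element(z_res)
```

Each entry of the resulting matrix is again a rational function of the four parameters, albeit with complex coefficients.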
Abstractly, one can state that $R \in {}^{p}_{q}P^{d}_{(m,n)}$ where, analogous to (3.50),
the space ${}^{p}_{q}P^{d}_{(m,n)}$ denotes the space of $d$-variate rational polynomials of
total degree at most $m$ in the numerator polynomial $p$, i.e., $p \in P^{d}_{\le m}$, and
total degree at most $n$ in the denominator polynomial $q$, i.e., $q \in P^{d}_{\le n}$.
Hence, the function $R$ can be called a multivariate rational polynomial function of
type $(m,n)$ with $m,n \in \mathbb{Z}^{+}_{0}$. Finally, one can read the
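[Figure 3.8, columns (i), (ii), and (iii). Column (i): three-dimensional subdomains $\Omega^{3D}_{c}$ and $\Omega^{3D}_{nc}$ with boundary $\partial\Omega^{3D}$ and the levels $T^{3D}_{h_1}$, $T^{3D}_{h_2}$, ${}^{1}\Delta^{3D}_{Ax=b}$, ${}^{2}\Delta^{3D}_{Ax=b}$. Column (ii): two-dimensional subdomains $\Omega^{2D}_{c}$ and $\Omega^{2D}_{nc}$ with boundary $\partial\Omega^{2D}$ and the levels $T^{2D}_{h_1}$, $T^{2D}_{h_2}$, ${}^{1}\Delta^{2D}_{Ax=b}$, ${}^{2}\Delta^{2D}_{Ax=b}$. Column (iii): ohmic resistor symbols with labels such as $R_1$, $R_2$, $R_3$. Dotted arrows connect the levels vertically and the columns horizontally.]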
FIGURE 3.8: A schematic depiction of a user-prescribed hierarchy of
magnetoquasistatic and magnetostatic problems which are associated
with simplified-physics low-fidelity models. The three-dimensional
domains in (i) are associated with the high-fidelity model’s un-
derlying magnetoquasistatic or magnetostatic problem. The two-
dimensional domains in (ii) are associated with a low-fidelity model’s
underlying magnetoquasistatic or magnetostatic problem. The ohmic
resistors in (iii) are associated with a multivariate rational polynomial
derived from a magnetoquasistatic or a magnetostatic problem. It is
assumed that the user prefers the low-fidelity models in (ii) over those
low-fidelity models in (iii).
assignment of the multivariate rational function $R$ of type $(m,n)$ as
$$R(x) := \frac{p(x)}{q(x)} \quad \text{such that} \quad p \in P^{d}_{\le m}, \; q \in P^{d}_{\le n}. \tag{3.100}$$
Unlike deterministic data-fit low-fidelity models, the function $R$ is derived from
the system of Maxwell's equations (recall § 2.1.2) and its coefficients are fixed to
known values. It might be beneficial to examine the usefulness of multivariate
rational polynomials as deterministic data-fit low-fidelity models. However, I
ignore them in the present work, and they, as well as associated methods within the
electromagnetics context such as vector fitting (see, e.g., [85]), are left for
future investigations. Notice well that the rationale for ignoring them is driven by
potential difficulties in properly handling spurious poles of a multivariate
rational polynomial in an optimization context.$^{22}$ Additionally, it is supposed
that the potential benefits of a multivariate rational polynomial's localized
behavior steered by the poles are comparable with a radial basis function's
localized behavior steered by the choice of the center points, such that the radial
basis functions are preferred over the multivariate rational polynomials. A
thoroughly elaborated juxtaposition of these two kinds of deterministic data-fit
low-fidelity models is out of the scope of this work, though.
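The assignment in (3.100) and the caveat about poles can be made tangible with a small sketch. The dictionary-based polynomial representation, the tolerance, and the example function are illustrative assumptions.

```python
from math import prod

# A polynomial is stored as {exponent tuple (a_1, ..., a_d): coefficient}.


def total_degree(poly):
    """Largest sum of exponents among the terms with non-zero coefficient."""
    return max(sum(exps) for exps, coeff in poly.items() if coeff != 0)


def evaluate(poly, x):
    """Evaluate the polynomial at the point x = (x_1, ..., x_d)."""
    return sum(c * prod(xi ** a for xi, a in zip(x, exps)) for exps, c in poly.items())


def rational(p, q, x, tol=1e-12):
    """R(x) = p(x) / q(x); refuses to evaluate on (or too close to) a pole."""
    denominator = evaluate(q, x)
    if abs(denominator) < tol:
        raise ZeroDivisionError(f"x = {x} lies on or too close to a pole of R")
    return evaluate(p, x) / denominator


# Example with d = 2: R(x1, x2) = (1 + x1*x2) / (x1 - x2), i.e. type (2, 1).
p = {(0, 0): 1.0, (1, 1): 1.0}
q = {(1, 0): 1.0, (0, 1): -1.0}
rational_type = (total_degree(p), total_degree(q))
value = rational(p, q, (2.0, 1.0))
```

In this example the whole line $x_1 = x_2$ consists of poles. An optimizer that evaluates $R$ along its search path would have to detect and avoid such a set, which hints at the difficulty mentioned above.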
Notice well that the dotted arrows in Figure 3.8 are semantically overloaded in
the sense that their vertical reading and their horizontal reading differ.
Considering the first level and the second level within (i) of Figure 3.8, the dotted
arrows indicate a relationship between a fine-grid discretization and a coarse-grid
discretization; more precisely, the respective simplicial triangulations (recall
§ 2.2.2) $T^{3D}_{h_1}$ and $T^{3D}_{h_2}$ are governed by the characteristic $h_1 < h_2$.
Furthermore, considering the second and the third level, the dotted arrows indicate
a relationship between a higher (i.e., stricter) threshold ${}^{1}\Delta^{3D}_{Ax=b}$ and a
lower (i.e., weaker) threshold ${}^{2}\Delta^{3D}_{Ax=b}$ for a termination criterion of an
iterative solver; more precisely, the thresholds exhibit the characteristic
${}^{1}\Delta^{3D}_{Ax=b} < {}^{2}\Delta^{3D}_{Ax=b}$ with ${}^{1}\Delta^{3D}_{Ax=b} \in \mathbb{R}^{+}$ and
${}^{2}\Delta^{3D}_{Ax=b} \in \mathbb{R}^{+}$.
The explanation for the levels within (ii) of Figure 3.8 is analogous to the previous
one. By contrast, the dotted arrows for the levels within (iii) of Figure 3.8 hint at
a change such as the domain transformations in (A.1) or in (A.14). Or the arrows
hint at a change, for instance, from a multivariate rational polynomial function of
type $(m,n)$ to a multivariate rational polynomial function of type $(m,0)$ and
leading coefficient of one, that is, to a multivariate polynomial function from the
space $P^{d}_{\le k}$.
The horizontal reading of the dotted arrows in Figure 3.8 reflects the relationships
diagram within (b) of Figure 3.7. In the vertical reading, it is partially
conceivable what a formal encoding of the arrows could look like, but it is not
straightforward to conceive such a formal encoding with regard to the horizontal
reading. If we apply a structural perspective to Figure 3.8, similarly to the
structural perspectives in ch. 2, then one can extract exemplarily the formal
encodings by a map-oriented representation in (Diagrams of Fig. 3.8) where, for the
sake of clarity, the inverse maps are not extracted from Figure 3.8.
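From (Diagrams of Fig. 3.8), one can conclude that, theoretically, a less preferred
problem can be constructed by a composition of maps associated with more preferred
problems. Then, one can make statements such as
$$l_1 \circ f_3 \circ f_2 \circ f_1 = g_3 \circ g_2 \circ g_1 \tag{3.101a}$$
$$l_1 \circ f_3 \circ f_2 \circ f_1 = g_3 \circ g_2 \circ g_1 \circ i_1 \tag{3.101b}$$
$$l_2 \circ l_1 \circ f_3 \circ f_2 \circ f_1 = h_3 \circ h_2 \circ h_1 \circ i_2 \circ i_1. \tag{3.101c}$$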
Note that the statements in (3.101) are valid under the assumption that all maps are
set functions and their domains and co-domains are sets. However, this assumption
does not adequately take into account the different algebraic characters of, e.g.,
$T^{3D}_{h_1}$, ${}^{1}\Delta^{3D}_{Ax=b}$ and $R_1$.
$^{22}$ Let us comprehend an unwanted pole as a spurious pole in the sense that it captures a singularity
that does not correspond to a non-essential singularity of the high-fidelity model.
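[Three diagrams, each over the objects $T^{3D}_{h_1}$, $T^{2D}_{h_1}$, $R_1$ (first row), $T^{3D}_{h_2}$, $T^{2D}_{h_2}$, $R_2$ (second row), ${}^{1}\Delta^{3D}_{Ax=b}$, ${}^{1}\Delta^{2D}_{Ax=b}$, $R_3$ (third row), and ${}^{2}\Delta^{3D}_{Ax=b}$, ${}^{2}\Delta^{2D}_{Ax=b}$, $R_4$ (fourth row). The arrows carry the labels $f_1, f_2, f_3$, $g_1, g_2, g_3$, $h_1, h_2, h_3$, $i_1, i_2$, $j_1, j_2$, $k_1, k_2$, $l_1, l_2$, $F_1, F_2$, $G_1, G_2$, and $H_1, H_2$.]
(Diagrams of Fig. 3.8)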
Moreover, the statements in (3.101) do not adequately capture the idea that, for
instance, moving from a high-fidelity model's underlying three-dimensional
magnetoquasistatic problem to a low-fidelity model's underlying two-dimensional
magnetoquasistatic problem corresponds technically to a loss of problem information,
e.g., with regard to the boundary conditions. Therefore, it is more appropriate to
consider the problems associated with the low-fidelity models as forgetful
interpretations of the problem associated with the high-fidelity model. This
viewpoint can be mediated by, for example, the maps $F_1$ and $F_2$ such that
$$F_1(T^{3D}_{h_1}) := T^{2D}_{h_1}, \quad F_1(T^{3D}_{h_2}) := T^{2D}_{h_2}, \quad F_1(f_1) := g_1, \tag{3.102a}$$
$$(F_2 \circ F_1)(T^{3D}_{h_1}) := R_1, \quad (F_2 \circ F_1)(T^{3D}_{h_2}) := R_2, \quad (F_2 \circ F_1)(f_1) := h_1, \tag{3.102b}$$
where the map $F_1$ and the map $F_2$ are overloaded in order to deal with the different
algebraic characters. We elaborate on the corresponding formal approach based on
the category theoretical language in chapter 4.
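The overloading in (3.102), i.e., one map acting on objects and on arrows alike, can be mimicked with plain dictionaries. This is only a bookkeeping sketch and not the formal approach of chapter 4. The direction of the arrows (from the first to the second level) and the values of $F_2$ on the two-dimensional entries, which follow from combining (3.102a) with (3.102b), are spelled out here as assumptions.

```python
# Arrows with their (source, target) objects; directions are assumed.
arrows = {
    "f1": ("T3D_h1", "T3D_h2"),
    "g1": ("T2D_h1", "T2D_h2"),
    "h1": ("R1", "R2"),
}

# F1 as in (3.102a); F2 is chosen such that F2 after F1 reproduces (3.102b).
F1 = {"T3D_h1": "T2D_h1", "T3D_h2": "T2D_h2", "f1": "g1"}
F2 = {"T2D_h1": "R1", "T2D_h2": "R2", "g1": "h1"}


def compose(outer, inner):
    """Return the dictionary that represents outer after inner."""
    return {key: outer[value] for key, value in inner.items()}


def respects_endpoints(mapping, arrow):
    """Check that the image of an arrow runs between the images of its endpoints."""
    source, target = arrows[arrow]
    return arrows[mapping[arrow]] == (mapping[source], mapping[target])


F2_after_F1 = compose(F2, F1)
```

The check `respects_endpoints` is the elementary consistency requirement that any such overloaded map has to satisfy before a more formal treatment becomes meaningful.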
To adjust the expectations correctly, notice well that the category theoretical lan-
guage is not a panacea at all. Its merits stem from the fact that, in a nutshell, there
is an absence of a cohesive theory to express in formal terms the relationships be-
tween different problems associated with a high-fidelity model and corresponding
low-fidelity models (see Figure 3.8).
This absence, though, is the Achilles’ heel of the mathematical analysis of any
optimization approach that exploits simplified-physics low-fidelity models and re-
lies on a, in some sense, benign resemblance between these low-fidelity models and
the high-fidelity model (see, e.g., [49, p. 76]).
In order to mitigate the ramifications of the absence, the authors in [121] suggest
assessing the resemblance based on an observed points subset, i.e., the training
subset, and some quality factors that are derived in the context of the space
mapping paradigm.
In a more general context, the author in [201, p. 76] suggests the assessment and
ordering of different problems by analyzing their explanatory power based on an
observed points subset in a Bayesian setting.
Mind that these approaches are not pursued in the present work, although these
approaches are promising endeavors towards a quantification of the relationships
between different problems associated with the various models.
However, these approaches do not seem widely adopted in practical applica-
tions – because, presumably, the user-prescribed hierarchy of problems, which re-
flects the user’s preferences, outranks other conceivable orderings of the problems.
Nevertheless, in § 3.3.2, I contribute partly to this overall discussion by elabo-
rating briefly on the potential role of the NREGE in (3.24) and the SSPCC in (3.31)
regarding the quantitative assessment of the quality of a low-fidelity model and a
surrogate model within the space-mapping paradigm.
In § 3.3.1, we discuss the efficient global optimization (or sequential kriging
optimization) technique as a subtype of the model management strategy adaptation.
This technique exploits solely a kriging low-fidelity model.
In § 3.3.2 and in § 3.3.2, we discuss the space mapping paradigm and the co-kriging
approach as subtypes of the two model management strategies adaptation and fusion,
respectively (recall § 1.3). Both the space mapping paradigm and the co-kriging
approach are designed such that they exploit especially simplified-physics
low-fidelity models.
A notable distinction between the space mapping paradigm and the co-kriging approach
is how they deal with the statement in (3.2). Abstracting from the authors'
perspective in [70, p. 167], it can be argued that the co-kriging approach focuses on
an instance of the generic statement
$$\forall x \in X.\; K(x) =_{Y} Z_{\rho}(x) \cdot_{Y} \tilde{K}(x) +_{Y} Z_{\Delta}(x), \tag{3.103}$$
where $Z_{\rho} \in Y^{X}$ and $Z_{\Delta} \in Y^{X}$ denote correction maps and the map
$\cdot_{Y} : Y \times Y \to Y$ and the map $+_{Y} : Y \times Y \to Y$ denote a suitable
multiplication on $Y$ and a suitable addition on $Y$, respectively. We elaborate on
the instance associated with the co-kriging approach in § 3.3.2.
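The structure of (3.103) can be sketched for $Y = \mathbb{R}$ without any Gaussian process machinery. In the sketch below, $Z_{\rho}$ is assumed to be a known constant and $Z_{\Delta}$ is a piecewise-linear interpolant of the residuals at a few sample points; both choices, as well as the two model functions, are simplifying assumptions and do not represent the co-kriging instance itself.

```python
import math
from bisect import bisect_right


def k_hi(x):
    """Illustrative stand-in for an expensive high-fidelity model K."""
    return (6.0 * x - 2.0) ** 2 * math.sin(12.0 * x - 4.0)


def k_lo(x):
    """Illustrative cheap model: a scaled and shifted variant of k_hi."""
    return 0.5 * k_hi(x) + 10.0 * (x - 0.5) - 5.0


def build_surrogate(sample_points, rho):
    """Return x -> rho * k_lo(x) + z_delta(x) with an interpolated z_delta."""
    xs = sorted(sample_points)
    residuals = [k_hi(x) - rho * k_lo(x) for x in xs]  # samples of Z_Delta

    def z_delta(x):
        # Locate the interval [xs[j], xs[j + 1]] and interpolate linearly.
        j = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
        t = (x - xs[j]) / (xs[j + 1] - xs[j])
        return (1.0 - t) * residuals[j] + t * residuals[j + 1]

    return lambda x: rho * k_lo(x) + z_delta(x)


surrogate = build_surrogate([0.0, 0.5, 1.0], rho=2.0)
```

Here the residual happens to be linear in $x$, so three evaluations of the expensive model suffice to reproduce it on $[0,1]$. In general, both correction maps have to be estimated from data.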
Taking into account in an abstract manner the stance of the authors in [194, p. 32f],
[56, ch. 2.5], and [49, p. 110ff], it can be argued that the space-mapping paradigm
focuses on instances of the generic statements
$$\forall x \in X.\; K(x) =_{Y} (\tilde{K} \circ_{X} \tilde{P})(x), \tag{3.104a}$$
$$\forall x \in X.\; K(x) =_{Y} (\tilde{R} \circ_{Y} \tilde{K})(x), \tag{3.104b}$$
$$\forall x \in X.\; K(x) =_{Y} (\tilde{R} \circ_{Y} \tilde{K} \circ_{X} \tilde{P})(x), \tag{3.104c}$$
where $\tilde{P} \in X^{X}$ denotes a domain-oriented correction map and
$\tilde{R} \in Y^{Y}$ denotes a co-domain-oriented correction map and the map
$\circ_{X} : Y^{X} \times X^{X} \to Y^{X}$ and the map $\circ_{Y} : Y^{Y} \times Y^{X} \to Y^{X}$
denote suitable composition maps. Notice well that, in (3.104), the maps
$\tilde{K} \circ_{X} \tilde{P} : X \to Y$, $\tilde{R} \circ_{Y} \tilde{K} : X \to Y$, and
$\tilde{R} \circ_{Y} \tilde{K} \circ_{X} \tilde{P} : X \to Y$ designate surrogate models. Hence,
we encounter the conceptual distinction between the notion of a low-fidelity model
and a surrogate model that has been mentioned in § 3.1.1.
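The three compositions in (3.104) can be written down directly for $X = Y = \mathbb{R}$. In the following sketch the correction maps are picked by hand so that the composed surrogate matches the illustrative high-fidelity model exactly; in an actual space mapping algorithm they would have to be estimated. All functions are assumptions.

```python
def compose(*maps):
    """compose(f, g, h)(x) == f(g(h(x)))."""
    def composed(x):
        for single_map in reversed(maps):
            x = single_map(x)
        return x
    return composed


def k_hi(x):
    """Illustrative high-fidelity model K."""
    return 2.0 * (x - 1.0) ** 2 + 3.0


def k_lo(x):
    """Illustrative low-fidelity model."""
    return x ** 2


def p_map(x):
    """Domain-oriented correction map (shifts the input)."""
    return x - 1.0


def r_map(y):
    """Co-domain-oriented correction map (rescales and shifts the output)."""
    return 2.0 * y + 3.0


input_surrogate = compose(k_lo, p_map)          # structure of (3.104a)
output_surrogate = compose(r_map, k_lo)         # structure of (3.104b)
full_surrogate = compose(r_map, k_lo, p_map)    # structure of (3.104c)
```

Note that `k_lo` itself never changes. Only the surrogate models built around it differ, which is the distinction between a low-fidelity model and a surrogate model.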
By using the defect correction principle of numerical analysis as a scaffolding, the
author in [56, ch. 2.5] investigates the space mapping paradigm. With regard to the
corresponding numerical iteration schemes, the defect correction principle permits
interpreting implementations of the map $\tilde{R}$ and the map $\tilde{P}$ as a
left-preconditioner and a right-preconditioner, respectively. In § 3.3.1, I dwell on
algorithmic instances associated with the space mapping paradigm.
In § 4, the category theoretical language is employed as an algebraic modeling
scaffolding in order to assess its capability to complement the primarily numerical
analytic narrative on simplified-physics low-fidelity models and the space mapping
paradigm.
3.2 Surrogate-based optimization
A basic premise in the present work is that the acquisition of pairs of sampling plan
points and output points with respect to a sample $s$ in (3.13) is computationally
expensive. Thus, it forces a user to be parsimonious with regard to the sample
size $m$.
From an engineering application viewpoint, though, imagine the use case in which an
effortless interplay between hardware and software enables a much faster acquisition
of a sample $s$ than without using this interplay. More concretely, imagine that a
user can, without much ado, exploit opportunities for parallel computing and GPU
(graphics processing unit) computing.
Recalling Figure 1.4, this use case rather shifts the attention from the level of
algorithms and the level of programs to a level of hardware technologies, which is
out of the scope of the present work.
However, if we abstract from the aforementioned concrete use case, then this
section's main aim becomes apparent: gaining some insights about the degree of
similarity between a high-fidelity model and a low-fidelity model without the usage
of a model management strategy. This consideration corresponds to the assessment of
the global and the local accuracy of a low-fidelity model with regard to a
high-fidelity model.
Building upon a certain degree of established similarity between a high-fidelity
model and a low-fidelity model, the basic idea underlying the surrogate-based op-
timization (cf. § 1.2) is, first, to find a minimum associated with the low-fidelity
model and, second, either to accept this low-fidelity model’s minimum as a proxy –
to some extent – of a minimum associated with the high-fidelity model or to use
this low-fidelity model’s minimum as a starting point of the search for a minimum
within the high-fidelity model.
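The two-step idea can be sketched as follows. The derivative-free compass search, the Booth function as a stand-in for an expensive model, and the perturbed cheap model are illustrative assumptions; they are not the optimizers or models used in the present work.

```python
def compass_search(f, x0, step=1.0, tol=1e-6, max_evals=10_000):
    """Derivative-free pattern search; returns (x, f(x), number of evaluations)."""
    x, fx, evals = list(x0), f(x0), 1
    while step > tol and evals < max_evals:
        improved = False
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                trial = x[:]
                trial[i] += sign * step
                f_trial = f(trial)
                evals += 1
                if f_trial < fx:
                    x, fx, improved = trial, f_trial, True
        if not improved:
            step *= 0.5  # refine the pattern once no axis move helps
    return x, fx, evals


def k_hi(x):
    """Booth function, used here as a stand-in for an expensive model."""
    return (x[0] + 2.0 * x[1] - 7.0) ** 2 + (2.0 * x[0] + x[1] - 5.0) ** 2


def k_lo(x):
    """Hypothetical cheap model: a shifted and rescaled Booth function."""
    return 0.8 * ((x[0] + 2.0 * x[1] - 6.5) ** 2 + (2.0 * x[0] + x[1] - 5.5) ** 2)


# Step 1: search on the low-fidelity model only.
x_lo, _, evals_lo = compass_search(k_lo, [-8.0, 8.0])
# Option (a): accept x_lo as a proxy and evaluate the expensive model once.
proxy_value = k_hi(x_lo)
# Option (b): use x_lo as the starting point of a search on the expensive model.
x_refined, f_refined, evals_hi = compass_search(k_hi, x_lo, step=0.25)
```

In this toy setting the proxy lands near (1.5, 2.5), whereas the refinement recovers the minimum of the Booth function at (1, 3). How much is gained in practice depends on the degree of similarity between the two models.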
In the subsequent subsections, let us examine the optimization with the test
functions in Figure 2.2 by data-fit low-fidelity models and by emulated simplified-
physics low-fidelity models.
3.2.1 Optimization with test functions by data-fit low-fidelity models
Using the Sobol quasi-random sequence sampling plan in Figure 3.2, let us invoke a
2-variate monomial polynomial model $p(x) \in P^{2}_{\le 2}$ via regression of the test
functions (and high-fidelity models, respectively) in Figure 2.2 where the matrices
$W$ and $R$ in (3.65) are chosen such that $W := I$ with $I \in \mathbb{R}^{m \times m}$ and
$R := 0$ with $0 \in \mathbb{R}^{6 \times 6}$. In Figure 3.9, the corresponding contour
representations are depicted in the cases where the number of sampling plan points is
given by $m := 10$ and by $m := 50$.
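A minimal sketch of this regression, assuming that (3.65) reduces to the ordinary normal equations $(A^{T}A)c = A^{T}y$ for $W := I$ and $R := 0$, is given below. A two-dimensional Halton sequence is used as a simple stand-in for the Sobol sampling plan, and the Booth function on $[-10,10]^2$ serves as the test function; these choices and all names are assumptions.

```python
def halton(index, base):
    """Radical inverse of a 1-based index in the given base, a value in (0, 1)."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def monomials(x1, x2):
    """The 6 monomials spanning the bivariate polynomials of total degree <= 2."""
    return [1.0, x1, x2, x1 * x1, x1 * x2, x2 * x2]


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def fit_quadratic(points, values):
    """Least squares via the normal equations (A^T A) c = A^T y."""
    rows = [monomials(*p) for p in points]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(6)] for i in range(6)]
    aty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(6)]
    return solve(ata, aty)


def booth(x1, x2):
    return (x1 + 2.0 * x2 - 7.0) ** 2 + (2.0 * x1 + x2 - 5.0) ** 2


m = 50
points = [(-10.0 + 20.0 * halton(i, 2), -10.0 + 20.0 * halton(i, 3)) for i in range(1, m + 1)]
coeffs = fit_quadratic(points, [booth(*p) for p in points])
```

Since the Booth function is itself a polynomial of total degree two, the regression recovers it up to rounding errors, which is in line with the near-zero errors reported for such test functions.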
In Figure 3.10, a Sobol quasi-random sequence sampling plan is utilized with the
number of sampling plan points set to $m := 10$ and to $m := 50$ as well. A radial
basis function $\varphi = r \mapsto \varphi(r)$ with thin plate spline assignment (see
Table 3.2) is invoked via interpolation of the test functions in Figure 2.2. Let us
choose the thin plate spline assignment as a representative of radial basis functions
without additional hyperparameters to be adjusted. Mind that, for visualization
purposes, in-house Julia code is combined with the Julia PL package
ScatteredInterpolation.jl (see [141] and [216]).
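For orientation, a thin plate spline interpolant in two variables can be sketched in a few lines. The sketch assumes the kernel $\varphi(r) = r^{2}\ln r$ and augments it with a linear polynomial tail, which is a common textbook construction; whether the package mentioned above uses the same augmentation is not claimed here.

```python
import math


def tps(r):
    """Thin plate spline kernel r^2 ln(r), continuously extended by 0 at r = 0."""
    return 0.0 if r == 0.0 else r * r * math.log(r)


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def fit_tps(points, values):
    """Interpolant s(x) = sum_j w_j * tps(|x - x_j|) + c0 + c1*x1 + c2*x2."""
    m = len(points)
    a = [[0.0] * (m + 3) for _ in range(m + 3)]
    for i, p_i in enumerate(points):
        for j, p_j in enumerate(points):
            a[i][j] = tps(math.dist(p_i, p_j))
        for k, entry in enumerate((1.0, p_i[0], p_i[1])):
            a[i][m + k] = a[m + k][i] = entry  # polynomial tail and side conditions
    solution = solve(a, list(values) + [0.0, 0.0, 0.0])
    weights, (c0, c1, c2) = solution[:m], solution[m:]

    def interpolant(x):
        kernel_part = sum(w * tps(math.dist(x, p)) for w, p in zip(weights, points))
        return kernel_part + c0 + c1 * x[0] + c2 * x[1]

    return interpolant


nodes = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.25), (0.2, 0.8)]
s_curved = fit_tps(nodes, [x1 * x1 + x2 for x1, x2 in nodes])
s_linear = fit_tps(nodes, [1.0 + 2.0 * x1 - 3.0 * x2 for x1, x2 in nodes])
```

The construction involves no shape parameter, so no hyperparameter search is needed; the price is a dense linear system whose size grows with the number of sampling plan points.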
Finally, utilizing the Sobol quasi-random sequence sampling plan with $m := 10$ and
$m := 50$ again, let us invoke the last data-fit low-fidelity model, that is, a
kriging low-fidelity model via interpolation of the test functions in Figure 2.2.
Since the computational burden of finding numerically the optimal parameters
$(\hat{\theta}, \hat{p})$ in (3.97) scales with the dimensionality $d$, let us choose a
compromise with regard to the parameters $(\hat{\theta}, \hat{p})$ in the sense that
it is set that $(\hat{\theta}_1, \hat{\theta}_2, \hat{p}_1, \hat{p}_2 \equiv \hat{p}_1)$.
Thus, we slightly reduce the computational burden of finding numerically the optimal
parameters whilst retaining the benefit of a numerical search for optimal parameters
compared to a manually predefined set of parameters. Recalling § 2.3.3, notice that
the Nelder-Mead simplex (NMS) algorithm is primarily applied to the optimization
problem in (3.97). Compared loosely to an adaptive differential evolution (ADE)
algorithm, the NMS algorithm's results differ mostly in one or two decimal places
from the ADE algorithm's results but, on average, the NMS algorithm's results are
achieved faster than the ADE algorithm's results. It is hard, though, to generalize
this observation and to detect a firm preference for an optimization algorithm for
the task at hand in (3.97). Mind that, for visualization purposes, in-house Julia
code is combined with the Julia PL package Surrogates.jl.
At a qualitative level, given the number of sampling plan points by $m := 10$, one
can observe that the 2-variate monomial polynomial model best recovers the contour,
more precisely, the value and the shape, of the Unit sphere and the Booth test
function. Furthermore, it satisfactorily recovers the contour of the Rosenbrock and
the modified Branin test function, and it recovers the contour of the Ackley and the
Michalewicz test function worst. Due to the known definitions of the test functions,
these observations are plausible. Mind that the polynomial model is invoked in a
regression context; hence, the influence of another kind of polynomial model on the
quality of the low-fidelity model with respect to the high-fidelity model is
subdued. However, the influence of a higher number of sampling plan points is
slightly bigger, especially if we consider the modified Branin test function, which
can be seen as a rather prototypical function within an engineering applications
context (cf. [70, p. 196]).
In the case of $m := 10$, if compared to the monomial polynomial model within the
regression context, the thin plate spline radial basis function and the kriging
low-fidelity model within the interpolation context recover, for instance, the
modified Branin function moderately. If we contrast the thin plate spline radial
basis function (TPS RBF for short) with the kriging low-fidelity model, then one can
observe that the kriging low-fidelity model tends to retrieve the values of the test
functions more accurately, whereas the thin plate spline radial basis function tends
to retrieve the shape of the test functions more accurately. However, in the case of
$m := 50$, both low-fidelity models are able to recover the values and the shapes of
the high-fidelity models satisfactorily, albeit the kriging low-fidelity model
performs the recovery slightly better. Recall, though, that the thin plate spline
radial basis function does not involve any hyperparameters; thus, a computationally
intensive hyperparameter optimization step is omitted.
At a quantitative level, let us look at the normalized mean generalization error
$e^{N}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})$ and the mean SSPCC $r^{2}_{\hat{y}\tilde{\hat{y}}}|_{k}$
within the k-fold cross validation method w.r.t. deterministic and probabilistic
data-fit low-fidelity models.$^{23}$
$^{23}$ In the case of a Chebyshev polynomial, the common k-fold cross-validation method breaks the
regular pattern of the Chebyshev grid (recall Figure A.1).
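The mechanics of the k-fold cross validation can be sketched as follows. The sketch assumes that SSPCC stands for the squared sample Pearson correlation coefficient between the held-out outputs and the model predictions, uses a univariate least-squares line as a stand-in model, and forms contiguous folds without shuffling. The normalized error in (3.28) is deliberately not reproduced since its definition lies outside this passage.

```python
from statistics import mean


def k_fold_indices(m, k):
    """Split range(m) into k disjoint folds of equal size m_g = m // k."""
    if m % k:
        raise ValueError("this sketch assumes that k divides m")
    m_g = m // k
    return [list(range(j * m_g, (j + 1) * m_g)) for j in range(k)]


def squared_pearson(u, v):
    """Squared sample Pearson correlation coefficient of two sequences."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov * cov / (var_u * var_v)


def fit_line(xs, ys):
    """Stand-in model: univariate least-squares line."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)


def mean_sspcc(xs, ys, k):
    """Train on k - 1 folds, test on the held-out fold, average over all folds."""
    scores = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        predicted = [model(xs[i]) for i in fold]
        scores.append(squared_pearson([ys[i] for i in fold], predicted))
    return mean(scores)


xs = [i / 49.0 for i in range(50)]
ys = [3.0 * x + 1.0 for x in xs]
score_5_fold = mean_sspcc(xs, ys, k=5)
```

With a sample size of 50, the 5-fold case tests on 10 points per fold and the 10-fold case on 5 points per fold.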
[Figure 3.9: contour plots over $x_1$ (horizontal axis) and $x_2$ (vertical axis) in six panels (i)-(vi) for each of the two cases (A) and (B). Axis ranges: (i) and (ii) $[-30, 30]^2$; (iii) and (iv) $[-10, 10]^2$; (v) $[0, 4]^2$; (vi) $x_1 \in [-5, 10]$, $x_2 \in [0, 15]$.]
(A) The number of sampling plan points is given by $m := 10$.
The red cross refers to the global minimum of the high-fidelity models in Figure 2.2b.
(B) The number of sampling plan points is given by $m := 50$.
The red cross refers to the global minimum of the high-fidelity models in Figure 2.2b.
FIGURE 3.9: Using the Sobol quasi-random sequence sampling plan
in Figure 3.2, a 2-variate monomial polynomial model $p(x) \in P^{2}_{\le 2}$ via
regression ($W := I$ with $I \in \mathbb{R}^{m \times m}$ and $R := 0$ with $0 \in \mathbb{R}^{6 \times 6}$ in (3.65))
of the test functions (and high-fidelity models, respectively) in Figure 2.2
(solely in contour representation).
[Figure 3.10: contour plots over $x_1$ (horizontal axis) and $x_2$ (vertical axis) in six panels (i)-(vi) for each of the two cases (A) and (B). Axis ranges: (i) and (ii) $[-30, 30]^2$; (iii) and (iv) $[-10, 10]^2$; (v) $[0, 4]^2$; (vi) $x_1 \in [-5, 10]$, $x_2 \in [0, 15]$.]
(A) The number of sampling plan points is given by $m := 10$.
The red cross refers to the global minimum of the high-fidelity models in Figure 2.2b.
(B) The number of sampling plan points is given by $m := 50$.
The red cross refers to the global minimum of the high-fidelity models in Figure 2.2b.
FIGURE 3.10: Using the Sobol quasi-random sequence sampling plan
in Figure 3.2, a radial basis function $\varphi = r \mapsto \varphi(r)$ with thin plate spline
assignment (see Table 3.2) via interpolation of the test functions (and
high-fidelity models, respectively) in Figure 2.2
(solely in contour representation).
[Figure 3.11: contour plots over $x_1$ (horizontal axis) and $x_2$ (vertical axis) in six panels (i)-(vi) for each of the two cases (A) and (B). Axis ranges: (i) and (ii) $[-30, 30]^2$; (iii) and (iv) $[-10, 10]^2$; (v) $[0, 4]^2$; (vi) $x_1 \in [-5, 10]$, $x_2 \in [0, 15]$.]
(A) The number of sampling plan points is given by $m := 10$.
The red cross refers to the global minimum of the high-fidelity models in Figure 2.2b.
(B) The number of sampling plan points is given by $m := 50$.
The red cross refers to the global minimum of the high-fidelity models in Figure 2.2b.
FIGURE 3.11: Using the Sobol quasi-random sequence sampling plan
in Figure 3.2, a kriging low-fidelity model via interpolation (where
$(\hat{\theta}_1, \hat{\theta}_2, \hat{p}_1, \hat{p}_2 \equiv \hat{p}_1)$ in (3.97)) of the test functions (and high-fidelity
models, respectively) in Figure 2.2 (solely in contour representation).
As already stated in § 3.1.1, we consider the 5-fold case and the 10-fold case. The
sample size is set to $m := 50$ since we consider the results associated with this
sample size as sample-based best-case error estimates and sample-based lower error
bounds, respectively. The wording concerning the estimates and bounds is a pragmatic
ad-hoc device, and it should not be judged too strictly by the formal standards that
apply in the context of numerical simulations (see § 2.2).
Moreover, let us consider the normalized global first-order sensitivity measures
$S^{N}_{\tilde{\hat{y}},i}$ with $i \in \{1,2\}$ evaluated at $f$ w.r.t. the data-fit
low-fidelity models.
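TABLE 3.3: The normalized mean generalization error $e^{N}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})$ and
the mean SSPCC $r^{2}_{\hat{y}\tilde{\hat{y}}}|_{k}$ within the k-fold cross validation method
w.r.t. the 2-variate monomial polynomial in Figure 3.9 with sample size $m := 50$.

| | (i) | (ii) | (iii) | (iv) | (v) | (vi) |
|---|---|---|---|---|---|---|
| $e^{N}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})|_{k:=5}$ | 0.5098 | $\ll 1.0 \times 10^{-16}$ | $\ll 1.0 \times 10^{-16}$ | $\gg 1.0$ | 0.0657 | $> 1.0$ |
| $r^{2}_{\hat{y}\tilde{\hat{y}}}|_{k:=5}$ | 0.5461 | 1.0 | 1.0 | 0.9003 | 0.3369 | 0.6558 |
| $e^{N}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})|_{k:=10}$ | 0.4477 | $\ll 1.0 \times 10^{-16}$ | $\ll 1.0 \times 10^{-16}$ | $\gg 1.0$ | 0.0582 | $> 1.0$ |
| $r^{2}_{\hat{y}\tilde{\hat{y}}}|_{k:=10}$ | 0.7217 | 1.0 | 1.0 | 0.9352 | 0.3951 | 0.8137 |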
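TABLE 3.4: The normalized mean generalization error $e^{N}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})$ and
the mean SSPCC $r^{2}_{\hat{y}\tilde{\hat{y}}}|_{k}$ within the k-fold cross validation method w.r.t.
the radial basis function with thin plate spline assignment in Figure 3.10 with sample size $m := 50$.

| | (i) | (ii) | (iii) | (iv) | (v) | (vi) |
|---|---|---|---|---|---|---|
| $e^{N}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})|_{k:=5}$ | 0.3377 | 0.0149 | 0.4173 | $\gg 1.0$ | 0.0935 | 0.3707 |
| $r^{2}_{\hat{y}\tilde{\hat{y}}}|_{k:=5}$ | 0.6166 | 0.9999 | 0.9973 | 0.9610 | 0.1772 | 0.8905 |
| $e^{N}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})|_{k:=10}$ | 0.2330 | 0.0111 | 0.4064 | $\gg 1.0$ | 0.0677 | 0.2898 |
| $r^{2}_{\hat{y}\tilde{\hat{y}}}|_{k:=10}$ | 0.6963 | 0.9999 | 0.9963 | 0.9780 | 0.4328 | 0.9732 |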
In Table 3.3, the normalized mean generalization error and the mean SSPCC within the
k-fold cross validation method w.r.t. the 2-variate monomial polynomial in Figure 3.9
are presented. The table supports the observations at the qualitative level. If we
pick the modified Branin test function as an example, then the table additionally
hints at the monomial polynomial's convenience for recovering at least partly the
shape of such a test function.
In Table 3.4, the normalized mean generalization error and the mean SSPCC within the
k-fold cross validation method w.r.t. a radial basis function with thin plate spline
assignment are listed. Compared to the monomial polynomial, the thin plate spline
radial basis function better recovers the values and the shape of the modified
Branin test function.
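TABLE 3.5: The normalized mean generalization error $e^{N}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})$ and
the mean SSPCC $r^{2}_{\hat{y}\tilde{\hat{y}}}|_{k}$ within the k-fold cross validation method w.r.t.
the kriging low-fidelity model in Figure 3.11 with sample size $m := 50$.

| | (i) | (ii) | (iii) | (iv) | (v) | (vi) |
|---|---|---|---|---|---|---|
| $e^{N}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})|_{k:=5}$ | 0.2517 | 0.2818 | 0.0043 | 0.9956 | 0.0443 | 0.0242 |
| $r^{2}_{\hat{y}\tilde{\hat{y}}}|_{k:=5}$ | 0.5367 | 0.9987 | 0.9999 | 0.9998 | 0.5188 | 0.9983 |
| $e^{N}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})|_{k:=10}$ | 0.2555 | 0.0925 | 0.0011 | 0.3290 | 0.0130 | 0.0022 |
| $r^{2}_{\hat{y}\tilde{\hat{y}}}|_{k:=10}$ | 0.5838 | 0.9999 | 0.9999 | 0.9998 | 0.7665 | 0.9998 |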
TABLE 3.6: The normalized global first-order sensitivity measure $S^{N}_{\tilde{\hat{y}},i}$
with $i \in \{1,2\}$ evaluated at $f$ w.r.t. the 2-variate monomial polynomial
in Figure 3.9b with sample size $m := 50$.

| | (i) | (ii) | (iii) | (iv) | (v) | (vi) |
|---|---|---|---|---|---|---|
| $S^{N}_{\tilde{\hat{y}},1}(f)$ | 0.4922 | 0.5000 | 0.4894 | 0.9935 | 0.8633 | 0.4455 |
| $S^{N}_{\tilde{\hat{y}},2}(f)$ | 0.5078 | 0.5000 | 0.5106 | 0.0065 | 0.1367 | 0.5545 |
| $\sum_{i=1}^{2} S^{N}_{\tilde{\hat{y}},i}(f)$ | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
TABLE 3.7: The normalized global first-order sensitivity measure $S^{N}_{\tilde{\hat{y}},i}$
with $i \in \{1,2\}$ evaluated at $f$ w.r.t. the radial basis function with thin
plate spline assignment in Figure 3.10b with sample size $m := 50$.

| | (i) | (ii) | (iii) | (iv) | (v) | (vi) |
|---|---|---|---|---|---|---|
| $S^{N}_{\tilde{\hat{y}},1}(f)$ | 0.5041 | 0.5000 | 0.4851 | 0.9733 | 0.5351 | 0.6226 |
| $S^{N}_{\tilde{\hat{y}},2}(f)$ | 0.4959 | 0.5000 | 0.5149 | 0.0267 | 0.4649 | 0.3774 |
| $\sum_{i=1}^{2} S^{N}_{\tilde{\hat{y}},i}(f)$ | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
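TABLE 3.8: The normalized global first-order sensitivity measure $S^{N}_{\tilde{\hat{y}},i}$
with $i \in \{1,2\}$ evaluated at $f$ w.r.t. the kriging low-fidelity model in
Figure 3.11b with sample size $m := 50$.

| | (i) | (ii) | (iii) | (iv) | (v) | (vi) |
|---|---|---|---|---|---|---|
| $S^{N}_{\tilde{\hat{y}},1}(f)$ | 0.8994 | 0.5000 | 0.4893 | 0.9963 | 0.2609 | 0.7221 |
| $S^{N}_{\tilde{\hat{y}},2}(f)$ | 0.1006 | 0.5000 | 0.5107 | 0.0037 | 0.7391 | 0.2779 |
| $\sum_{i=1}^{2} S^{N}_{\tilde{\hat{y}},i}(f)$ | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |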
In Table 3.5, the normalized mean generalization error and the mean SSPCC within the
k-fold cross validation method w.r.t. a kriging low-fidelity model are listed.
Compared to the thin plate spline radial basis function and the monomial polynomial,
the kriging low-fidelity model shows a relatively high mean SSPCC for all test
functions. Observe that the kriging low-fidelity model is capable of recovering the
values and the shape of the modified Branin test function with a high degree of
accuracy.
However, all data-fit low-fidelity models exhibit some difficulties in recovering
the values of (iv) the Rosenbrock function. This observation indicates that there is
a need for a normalization of the test function's values. Thus, let us determine
automatically the decimal power of the test function's maximal value and normalize
all the test function's values regarding this decimal power. Notice well that the
shape-related entities such as the SSPCC or the normalized global first-order
sensitivity measures are not affected by a normalization of the test function's
values.
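One possible reading of this normalization is sketched below: the decimal power is taken as the largest power of ten that does not exceed the maximal absolute value, and all values are divided by it. This reading, the helper names, and the sample data are assumptions. The second part of the sketch checks on the sample data that a squared Pearson correlation coefficient is invariant under such a rescaling.

```python
import math
from statistics import mean


def decimal_power(values):
    """Largest power of ten not exceeding max |value|, e.g. 3.6e5 -> 1e5."""
    peak = max(abs(v) for v in values)
    if peak == 0.0:
        return 1.0
    return 10.0 ** math.floor(math.log10(peak))


def normalize(values):
    """Divide all values by the decimal power of the maximal absolute value."""
    scale = decimal_power(values)
    return [v / scale for v in values]


def squared_pearson(u, v):
    """Squared sample Pearson correlation coefficient of two sequences."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov * cov / (sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


outputs = [1.0, -250.0, 3.6e5, 4.2e4, 9.0e3]        # hypothetical model outputs
predictions = [10.0, -100.0, 3.1e5, 5.0e4, 1.2e4]   # hypothetical predictions
scale = decimal_power(outputs)
r2_raw = squared_pearson(outputs, predictions)
r2_scaled = squared_pearson(normalize(outputs), [p / scale for p in predictions])
```

Since the same positive factor rescales both sequences, the correlation-based measure stays the same while value-based error measures operate on numbers of order one.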
In Figure 3.12, in Figure 3.13 and in Figure 3.14, let us extend in a
self-explanatory manner the definition of $e^{N}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})|_{k}$ in
(3.28), the definition of $e^{NR}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})|_{k}$ as well, and the
definition of $r^{2}_{\hat{y}\tilde{\hat{y}}}|_{k}$ in (3.33) in order to depict the
corresponding courses of the paths regarding the number of testing points $m_g$.
Observe that the number of testing points is truncated at $m_g := 10$, which
corresponds to the total number of points $m$ by $m := k\,m_g$. If $k := 10$, then
$m := 100$. Thus, from an application-driven viewpoint with, e.g., weaker
requirements w.r.t. the accuracy, let us regard such a sample size as an upper limit.
An immediate potential drawback of such an upper limit is that, in general, one
cannot expect to detect some kind of monotonicity which is usually associated with
an asymptotic consideration.
Inspecting Figure 3.12b, one can observe that the paths associated with the error
$e^{NR}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})|_{k}$ reflect values that are orders of magnitude
higher than the values corresponding to the paths associated with the error
$e^{N}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})|_{k}$. This observation echoes the more conservative
behavior of $e^{NR}_{H,s_g}(\hat{Q}_{\boldsymbol{\xi}})|_{k}$ (cf. § 3.1.1).
Observing Figure 3.12 and Figure 3.14, a valuable insight is that the continuous
deformation of the courses of paths corresponding to the 5-fold cross validation
into the courses of paths corresponding to the 10-fold cross validation is relatively
marginal, except for (i) the Ackley test function and (v) the Michalewicz test function
which, from an application-driven viewpoint, one can consider as extreme use
cases as opposed to the common use cases (ii), (iii), (iv), and (vi). Another valuable
insight is that if the requirements regarding the values and the shapes of the
low-fidelity models are weakened, then, for the common use cases, the deterministic
data-fit low-fidelity models can provide a computationally less expensive alternative
to the probabilistic data-fit low-fidelity models.
In Table 3.6, in Table 3.7, and in Table 3.8, I present the results regarding the
normalized global first-order sensitivity measure $S^{N}_{\tilde{\hat{y}},i}$ with
$i \in \{1,2\}$ evaluated at $f$ w.r.t. the 2-variate monomial polynomial in Figure 3.9b,
the radial basis function with thin plate spline assignment in Figure 3.10b, and the
kriging low-fidelity model in Figure 3.11b, respectively. In all cases the sample size
$m$ is set to $m := 50$.
In Figure 3.15, let us extend in an obvious way the definition of
$S^{N}_{\tilde{\hat{y}},i}(f)$ in (3.34) in order to depict the corresponding courses of
the paths regarding the number of training points $m_t$.
A valuable insight is that if we only focus on limit considerations, then we are not
able to tell some test functions apart (see, e.g., (i), (ii), or (iii) in Figure 3.15).
However, the courses of the paths furnish us with some hints about the landscapes of the
test functions; therefore, they partially help us to tell the test functions apart.
Another valuable insight is that, in the case of the interpolation-focused data-fit
low-fidelity models, the entity $S^{N}_{\tilde{\hat{y}},i}(f)$ exhibits some kind of
correlation with the entity $r^{2}_{\hat{y}\tilde{\hat{y}},cv}(k, m_g)$. This insight
drives the conjecture about the trustworthiness of low-fidelity models’ normalized
global first-order sensitivity measures in (3.35). Mind that this insight should rather
be understood as an intriguing starting point for future investigations where the
behavior of regression-focused and interpolation-focused data-fit low-fidelity models
is elaborated more thoroughly, and where the number of test functions under
consideration is increased substantially. To the best of my knowledge, there is
currently a lack of such extensive benchmarking in the style of Figure 3.12,
Figure 3.13, Figure 3.14, and Figure 3.15.
Since the elaborations are without loss of generality w.r.t. the dimensionality $d$,
let us dwell briefly on this issue by exploring the Rosenbrock test function on the
smaller domain $[-2.0, 2.0]^{d}$ (for the case $d = 2$, see Figure 2.3) where the
dimensionality $d$ is governed by $d \in \{2,3,4,5,6,7\}$. Thus, we explore the evolution
of the normalized mean generalization error, the mean SSPCC, and the normalized global
first-order sensitivity measure with the dimensionality.
Recalling § 2.3.3, the Rosenbrock test function in Table 2.1 permits considering
immediately an arbitrary dimensionality with $d \in \{2,3,4,5,6,7\}$, more precisely,
$f_{RN_{\xi}}$ from (2.52) with $N_{\xi} \equiv d$, where, in each case, the global
minimum is at the point $(1,1,\dots,1) \in \mathbb{R}^{d}$. In Table 3.9, I depict the
normalized mean generalization error and the mean SSPCC within the k-fold cross
validation method w.r.t. a kriging low-fidelity model of a generalized version of the
Rosenbrock test function where the sample size $m$ is set to $m := 50$. Essentially,
Table 3.9 reveals how strongly the curse of dimensionality kicks in: for instance, the
error $e^{N}_{H,sg}(\hat{Q}_{\boldsymbol{\xi}})|_{k:=10}$ increases by more than four
orders of magnitude from $d := 2$ to $d := 7$, and the SSPCC
$r^{2}_{\hat{y}\tilde{\hat{y}}}|_{k:=10}$ drops by almost 70% from $d := 2$ to $d := 7$.
Geometrically, this observation translates into shifts of the corresponding courses of
paths in Figure 3.13 and in Figure 3.14. Thus, the findings in Table 3.9 can be conceived
as a worst-case estimate of the evolution of the normalized mean generalization error
and the mean SSPCC with the dimensionality regarding the findings in Figure 3.13 and in
Figure 3.14.
TABLE 3.9: Given the sample size $m := 50$, the normalized mean generalization error
$e^{N}_{H,sg}(\hat{Q}_{\boldsymbol{\xi}})$ and the mean SSPCC
$r^{2}_{\hat{y}\tilde{\hat{y}}}|_{k}$ within the k-fold cross validation method w.r.t. a
kriging low-fidelity model of the Rosenbrock test function in Table 2.1 (with normalized
values) generalized to the domain $[-2.0, 2.0]^{d}$ with $d \in \{2,3,4,5,6,7\}$.
Dimensionality $d$   $e^{N}_{H,sg}|_{k:=5}$   $r^{2}|_{k:=5}$   $e^{N}_{H,sg}|_{k:=10}$   $r^{2}|_{k:=10}$
2                    1.5808×10^−6             0.9998            7.9288×10^−7              0.9996
3                    3.2861×10^−4             0.9652            9.0496×10^−5              0.9915
4                    4.7462×10^−3             0.6747            3.5632×10^−3              0.6093
5                    6.5692×10^−3             0.6483            5.7308×10^−3              0.6387
6                    9.7461×10^−3             0.2742            8.7787×10^−3              0.3653
7                    1.3586×10^−2             0.0393            1.3579×10^−2              0.3013
Determining the normalized mean generalization error
$e^{N}_{H,sg}(\hat{Q}_{\boldsymbol{\xi}})|_{k:=5}$ for the dimensionality $d := 7$ takes
roughly 23 times longer than for the dimensionality $d := 2$. Furthermore, determining
the normalized mean generalization error
$e^{N}_{H,sg}(\hat{Q}_{\boldsymbol{\xi}})|_{k:=10}$ for the dimensionality $d := 7$ takes
roughly 17 times longer than for the dimensionality $d := 2$. In both cases, the longer
computation time for the dimensionality $d := 7$ is primarily dominated by the more
involved optimization in (3.97). However, these factors should merely be conceived as
rough estimates or rough proxies of the overhead in the optimization in (3.97) due to
the increase in the dimensionality.

(A) $e^{N}_{cv}(5, m_g)$ vs. $m_g$.
(B) $e^{NR}_{cv}(5, m_g)$ vs. $m_g$.
FIGURE 3.12: The value of $e^{N}_{cv}$ and $e^{NR}_{cv}$ evaluated at the number of
folds $k$ and the number of testing points $m_g$ w.r.t. the test functions in Figure 2.2
with normalized values. The number $m_g$ corresponds to the number $m_t$ by
$m_t := (k-1) m_g$ and to the number $m$ by $m := k m_g$.
The data-fit low-fidelity models are ( ) a 2-variate monomial polynomial via regression,
( ) a TPS RBF, and ( ) a kriging model.
In (ii) and (iii), $e^{N}_{cv}(k, m_g)$ w.r.t. ( ) is below machine precision.

(A) $e^{N}_{cv}(5, m_g)$ vs. $m_g$.
(B) $e^{N}_{cv}(10, m_g)$ vs. $m_g$.
FIGURE 3.13: The value of $e^{N}_{cv}$ evaluated at the number of folds $k$ and the
number of testing points $m_g$ w.r.t. the test functions in Figure 2.2 with normalized
values. The number $m_g$ corresponds to the number $m_t$ by $m_t := (k-1) m_g$ and to
the number $m$ by $m := k m_g$.
The data-fit low-fidelity models are ( ) a 2-variate monomial polynomial via regression,
( ) a TPS RBF, and ( ) a kriging model.
In (ii) and (iii), $e^{N}_{cv}(k, m_g)$ w.r.t. ( ) is below machine precision.

(A) $r^{2}_{\hat{y}\tilde{\hat{y}},cv}(5, m_g)$ vs. $m_g$.
(B) $r^{2}_{\hat{y}\tilde{\hat{y}},cv}(10, m_g)$ vs. $m_g$.
FIGURE 3.14: The value of $r^{2}_{\hat{y}\tilde{\hat{y}},cv}$ evaluated at the number of
folds $k$ and the number of testing points $m_g$ w.r.t. the test functions in Figure 2.2.
The number $m_g$ corresponds to the number $m_t$ by $m_t := (k-1) m_g$ and to the
number $m$ by $m := k m_g$.
The data-fit low-fidelity models are ( ) a 2-variate monomial polynomial via regression,
( ) a TPS RBF, and ( ) a kriging model.

(A) $S^{N}_{\tilde{\hat{y}},1}(f, m_t)$ vs. $m_t$.
(B) $S^{N}_{\tilde{\hat{y}},2}(f, m_t)$ vs. $m_t$.
FIGURE 3.15: The value of $S^{N}_{\tilde{\hat{y}},i}$ with $i \in \{1,2\}$ evaluated at
$f$ w.r.t. the test functions in Figure 2.2 and the number of training points $m_t$.
The data-fit low-fidelity models are ( ) a 2-variate monomial polynomial via regression,
( ) a TPS RBF, and ( ) a kriging model.
The gray dotted line indicates the position $m_t := 12$ and the gray dashed line
indicates the position $m_t := 27$. The thick black line refers to the value of
$S^{N}_{i}(f)$ of the respective test function in Table 2.2.
TABLE 3.10: The normalized global first-order sensitivity measure $S^{N}_{i}$ evaluated
at $f_{RN_{\xi}}$ from (2.52) w.r.t. the data in Table 3.9. The entries are
$S^{N}_{\tilde{\hat{y}},i}(f)$; the last column is $\sum_{i=1}^{d} S^{N}_{\tilde{\hat{y}},i}(f)$.
d   i:=1     i:=2     i:=3     i:=4     i:=5     i:=6     i:=7     Sum
2   0.9081   0.0919   −        −        −        −        −        1.0000
3   0.4125   0.5475   0.0400   −        −        −        −        1.0000
4   0.2004   0.4122   0.3568   0.0306   −        −        −        1.0000
5   0.2133   0.2161   0.2386   0.3227   0.0093   −        −        1.0000
6   0.1228   0.1842   0.2792   0.3246   0.0861   0.0031   −        1.0000
7   0.0614   0.2896   0.0592   0.1496   0.0158   0.2985   0.1259   1.0000
TABLE 3.11: Using the reference values in Table 2.4, the low-fidelity models’ normalized
global first-order sensitivity measures (LFSM) error in (3.37) evaluated at
$f_{RN_{\xi}}$ from (2.52) w.r.t. the data in Table 3.9. The entries are
$e_{m_{\infty}}(S^{N}_{\tilde{\hat{y}},i})$.
d   i:=1      i:=2      i:=3      i:=4      i:=5      i:=6      i:=7
2   +0.0031   −0.0314   −         −         −         −         −
3   −0.0292   +0.0223   −0.0204   −         −         −         −
4   +0.2199   −0.1482   +0.0061   −0.2191   −         −         −
5   −0.1274   +0.1817   +0.0966   −0.2219   +0.4973   −         −
6   +0.1780   +0.1187   −0.3359   −0.5531   +0.5880   +0.7877   −
7   +0.5044   −0.6759   +0.6574   +0.1343   +0.9086   −0.7274   −9.405
In Table 3.10, I present the normalized global first-order sensitivity measure
$S^{N}_{i}$ evaluated at $f_{RN_{\xi}}$ from (2.52) w.r.t. the data in Table 3.9.
Additionally, in Table 3.11, I present the LFSM error in (3.37) w.r.t. Table 2.4. The
increase of the unsigned LFSM error in Table 3.11 from $d := 2$ to $d := 7$ correlates
with the decrease of the SSPCC in Table 3.9 from $d := 2$ to $d := 7$. Hence, this
relationship supports in higher dimensions the similar observations in Figure 3.15 and
in Figure 3.14 for the case $d := 2$.
A benefit of the previously discussed tables and figures with regard to the
deterministic and probabilistic data-fit low-fidelity models is that they provide us with
some quantitative hints about the values and the shape (or landscape) associated
with the high-fidelity models in the physics-oriented context of the applications in
the present work such that one can roughly assign these high-fidelity models’ behavior
to one or more test functions from Table 2.1.
Thus, this by no means complete list of indicators or properties (i.e., tables and
figures) enables some kind of rough classification of application-driven high-fidelity
models’ behaviors. More interestingly, if a high-fidelity model does not possess
certain properties of a selected function from the list of test functions, then one can
make an educated guess that the high-fidelity model is probably not a representative of
the selected function from the list of test functions.
However, there are a couple of caveats regarding such a benchmark-focused
classification. First, it is not clear whether there exists a reliable complete list of
indicators or properties. Second, it is not clear how big the list of test functions
has to be. Nevertheless, from an application-oriented viewpoint as well as from a
theory-oriented viewpoint, such an attempt at a benchmark-focused classification is
a worthwhile endeavor.
To conclude the discussion with respect to the data-fit low-fidelity models, let us
perform a surrogate-based optimization concerning the modified Branin test func-
tion according to the following schematic procedure that I refer to as SBO-DFLF:
1) create a sample w.r.t. the high-fidelity model (§ 3.1.1);
2) construct a data-fit low-fidelity model (§ 3.1.2) w.r.t. the sample from step 1);
3) invoke a global optimization algorithm (§ 2.3.3) w.r.t. the constructed data-fit
low-fidelity model from step 2);
4) use the minimizer from step 3) as a starting point for a local optimization algo-
rithm (§ 2.3.3) w.r.t. the high-fidelity model from step 1).
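The four steps above can be sketched in pure Python to make the counting of high-fidelity evaluations tangible. Everything concrete here is an assumption for illustration: the standard Branin function stands in for the modified Branin test function of Table 2.1, a quadratic least-squares polynomial stands in for the data-fit low-fidelity model, an exhaustive grid search on the surrogate replaces the ADE algorithm, and a bounded compass search replaces the NMS algorithm. It is not the thesis' implementation.

```python
import math
import random

BOUNDS = [(-5.0, 10.0), (0.0, 15.0)]  # assumed box constraints


def branin(x):
    """Standard Branin function, an assumed placeholder for the high-fidelity model."""
    x1, x2 = x
    b, c, t = 5.1 / (4 * math.pi ** 2), 5.0 / math.pi, 1.0 / (8 * math.pi)
    return (x2 - b * x1 ** 2 + c * x1 - 6.0) ** 2 + 10.0 * (1 - t) * math.cos(x1) + 10.0


def features(x):
    """Monomials up to total degree 2 in two variables."""
    x1, x2 = x
    return [1.0, x1, x2, x1 * x1, x1 * x2, x2 * x2]


def solve(a, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (m[r][n] - sum(m[r][k] * sol[k] for k in range(r + 1, n))) / m[r][r]
    return sol


def fit_quadratic(xs, ys):
    """Step 2: least-squares fit via the normal equations (six coefficients)."""
    phi = [features(x) for x in xs]
    n = len(phi[0])
    ata = [[sum(p[i] * p[j] for p in phi) for j in range(n)] for i in range(n)]
    aty = [sum(p[i] * y for p, y in zip(phi, ys)) for i in range(n)]
    coef = solve(ata, aty)
    return lambda x: sum(c * f for c, f in zip(coef, features(x)))


def grid_argmin(model, n=61):
    """Step 3: exhaustive grid search on the cheap surrogate (stand-in for ADE)."""
    best = None
    for i in range(n):
        for j in range(n):
            x = [lo + (hi - lo) * k / (n - 1) for (lo, hi), k in zip(BOUNDS, (i, j))]
            value = model(x)
            if best is None or value < best[0]:
                best = (value, x)
    return best[1]


def compass_search(f, x0, step=1.0, tol=1e-4, max_evals=200):
    """Step 4: bounded compass search (stand-in for NMS) that counts evaluations."""
    x, fx, evals = list(x0), f(x0), 1
    while step > tol and evals < max_evals:
        improved = False
        for d in range(len(x)):
            for sign in (1.0, -1.0):
                y = list(x)
                y[d] = min(max(y[d] + sign * step, BOUNDS[d][0]), BOUNDS[d][1])
                fy = f(y)
                evals += 1
                if fy < fx:
                    x, fx, improved = y, fy, True
        if not improved:
            step *= 0.5  # shrink the stencil once no axis move helps
    return x, fx, evals


def sbo_dflf(m=10, seed=0):
    rng = random.Random(seed)
    xs = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(m)]  # step 1
    ys = [branin(x) for x in xs]              # m high-fidelity evaluations
    surrogate = fit_quadratic(xs, ys)         # step 2
    x_start = grid_argmin(surrogate)          # step 3, no high-fidelity cost
    x_opt, f_opt, n_local = compass_search(branin, x_start)  # step 4
    return {"x_start": x_start, "x_opt": x_opt, "f_opt": f_opt, "hf_evals": m + n_local}


print(sbo_dflf())
```

The design point worth noting is that only steps 1) and 4) touch the high-fidelity model, so the reported `hf_evals` is the sample size plus the local-search evaluations, while the global search runs entirely on the surrogate.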
Notice that the proposed schematic procedure SBO-DFLF builds upon the com-
mon canon of optimization algorithms (recall § 2.3.3) such as the Nelder-Mead sim-
plex (NMS) algorithm and the adaptive differential evolution (ADE) algorithm.
Since the present work’s core mantra is that a function evaluation of the high-
fidelity model is computationally expensive, let us count the number of high-fidelity
model function evaluations in order to find the modified Branin test function’s global
minimizer and its corresponding value presented in Table 2.1 within a certain accu-
racy.
We start with ten function evaluations because the sample size $m$ is set to $m := 10$,
as in Figure 3.9a, Figure 3.10a, and Figure 3.11a, in order to construct the
respective data-fit low-fidelity model.
Next, let us invoke an adaptive differential evolution (ADE) algorithm to determine
the global minimizer and its corresponding value of the respective data-fit
low-fidelity model. Mind that we only consider box constraints. Depending on the
degree of rigor needed for a task at hand, in addition, a guaranteed deterministic
global optimization algorithm using interval arithmetic can be employed in order to
certify the result of the ADE algorithm, i.e., the global minimizer and its correspond-
ing value of the respective data-fit low-fidelity model.
Finally, let us use the minimizer from the ADE algorithm as a starting point for
a Nelder-Mead simplex (NMS) algorithm with regard to the modified Branin test
function. Note that, instead of the NMS algorithm from (Opkg1), the NMS algorithm
from (Opkg3) is invoked since, in addition to the common classic implementation,
it provides a modern implementation in the sense that, e.g., it defaults to an adap-
tive tuning parameters scheme and it focuses on keeping the number of necessary
function evaluations low.24
In Table 3.12, I present the results from the proposed schematic surrogate-based
optimization procedure SBO-DFLF. The quantity $e^{r}_{x^{*}}$ refers to the relative
error with respect to the global minimizer listed in Table 2.1 and the quantity
$e^{r}_{f(x^{*})}$ refers to the relative error with respect to the global minimizer’s
function value listed in Table 2.1 as well. The quantity $\mathrm{it}_{\mathrm{NMS}}$
refers to the total number of function evaluations in the NMS algorithm.
24 For more details on the implementation of the NMS algorithm from (Opkg1) and the NMS
algorithm from (Opkg3), I refer to the respective package documentation and references therein.
TABLE 3.12: Building upon Figure 3.9a, Figure 3.10a, and Figure 3.11a, surrogate-based
optimization according to the proposed schematic procedure SBO-DFLF w.r.t. the modified
Branin function using data-fit low-fidelity models.
Low-fidelity model   $m$   $\mathrm{it}_{\mathrm{NMS}}$   $e^{r}_{x^{*}}$   $e^{r}_{f(x^{*})}$
Polynomial           10    11                             0.0086            0.0018
TPS RBF              20    17                             0.0703            0.0400
Kriging              10    11                             0.0067            0.0023
Table 3.12 reveals that the procedures based on the monomial polynomial
model and the kriging low-fidelity model each consume in total 21 function evaluations
of the high-fidelity model in order to produce a relative error $e^{r}_{x^{*}}$ and a
relative error $e^{r}_{f(x^{*})}$ that, expressed as percentages, are less than one
percent. Choosing the thin plate spline radial basis function requires increasing the
sample size; otherwise the TPS RBF is not capable of producing a sufficiently good
starting point. Ultimately, the procedure based on the TPS RBF consumes in total 37
function evaluations of the high-fidelity model in order to produce a relative error
$e^{r}_{x^{*}}$ and a relative error $e^{r}_{f(x^{*})}$ that, expressed as percentages,
are less than eight percent.
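The totals quoted from Table 3.12 follow from adding the sample size to the number of function evaluations of the local search, since both are spent on the high-fidelity model. The symbol $m_{\mathrm{SBO}}$ for this total is borrowed from § 3.3; using it for this sum is an assumption of this note rather than a definition given in the text.

```latex
\begin{align*}
m_{\mathrm{SBO}} &= m + \mathrm{it}_{\mathrm{NMS}}, \\
\text{Polynomial and Kriging:}\quad m_{\mathrm{SBO}} &= 10 + 11 = 21, \\
\text{TPS RBF:}\quad m_{\mathrm{SBO}} &= 20 + 17 = 37 .
\end{align*}
```

Likewise, the largest relative errors per group read $0.0086 = 0.86\,\%$ for the polynomial and kriging rows and $0.0703 = 7.03\,\%$ for the TPS RBF row, which is where the bounds of one and eight percent come from.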
In practical applications, there is usually no a priori knowledge about the
minimizers of a high-fidelity model. Hence, the proposed schematic surrogate-based
optimization procedure SBO-DFLF offers an approach to find relatively quickly and
sufficiently accurately a global minimizer of a high-fidelity model.
An advantage of Table 3.12 is that it serves as a quantitative hint at the
potential quality of a surrogate-based optimization procedure that is useful for the
assessment of a surrogate-guided optimization procedure.
3.2.2 Optimization with test functions
by emulated simplified-physics low-fidelity models
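Finally, let us emulate simplified-physics low-fidelity models for the modified Branin
test function via the assignment rule
$$\tilde{K}(x) := \gamma +_{\mathbb{R}} \delta \cdot_{\mathbb{R}} K(\alpha +_{\mathbb{R}^{2}} \beta \odot x), \qquad (3.105)$$
where $\gamma, \delta \in \mathbb{R}$ and $\alpha, \beta \in \mathbb{R}^{2 \times 1}$
denote adjustment parameters and the map $\odot$ is understood as the Hadamard product
or the element-wise product.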
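The assignment rule (3.105) translates directly into a higher-order function. In the minimal sketch below, the standard Branin function is an assumed placeholder for the thesis' modified Branin test function (whose definition in Table 2.1 is not reproduced here), and the name `emulate` is hypothetical.

```python
import math


def branin(x1, x2):
    """Standard Branin function (assumed placeholder for the modified variant)."""
    b = 5.1 / (4.0 * math.pi ** 2)
    c = 5.0 / math.pi
    t = 1.0 / (8.0 * math.pi)
    return (x2 - b * x1 ** 2 + c * x1 - 6.0) ** 2 + 10.0 * (1.0 - t) * math.cos(x1) + 10.0


def emulate(high_fidelity, alpha, beta, gamma, delta):
    """Build K~(x) = gamma + delta * K(alpha + beta ⊙ x) for a 2-variate model K."""
    def low_fidelity(x1, x2):
        z1 = alpha[0] + beta[0] * x1  # shift and element-wise scaling of the argument
        z2 = alpha[1] + beta[1] * x2
        return gamma + delta * high_fidelity(z1, z2)
    return low_fidelity


# Parameter choices in the format (alpha, beta, gamma, delta) of Table 3.13.
identity = emulate(branin, (0.0, 0.0), (1.0, 1.0), 0.0, 1.0)    # reproduces K itself
shifted = emulate(branin, (0.0, 10.0), (1.0, 1.0), 0.0, 1.0)    # Figure 3.16a, (i)
offset = emulate(branin, (0.0, 0.0), (1.0, 1.0), 1.0e3, 1.0)    # Figure 3.16b, (i)

print(identity(1.0, 2.0), shifted(1.0, 2.0), offset(1.0, 2.0))
```

With this reading, $\gamma$ and $\delta$ act on the values only, whereas $\alpha$ and $\beta$ act on the argument and therefore move or stretch the landscape.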
TABLE 3.13: The choice of the 4-tuple of parameters $(\alpha, \beta, \gamma, \delta)$ in
Figure 3.16a and in Figure 3.16b.
        Figure 3.16a                               Figure 3.16b
(i)     ([0.0 10.0]^T, [1.0 1.0]^T, 0.0, 1.0)      ([0.0 0.0]^T, [1.0 1.0]^T, 1.0×10^3, 1.0)
(ii)    ([10.0 0.0]^T, [1.0 1.0]^T, 0.0, 1.0)      ([10.0 10.0]^T, [1.0 1.0]^T, 1.0×10^3, 1.0)
(iii)   ([10.0 10.0]^T, [1.0 1.0]^T, 0.0, 1.0)     ([0.0 0.0]^T, [1.5 1.5]^T, 1.0×10^3, 1.0)
(iv)    ([0.0 0.0]^T, [0.5 1.0]^T, 0.0, 1.0)       ([0.0 0.0]^T, [1.0 1.0]^T, 0.0, 1.0×10^3)
(v)     ([0.0 0.0]^T, [1.0 0.5]^T, 0.0, 1.0)       ([10.0 10.0]^T, [1.0 1.0]^T, 0.0, 1.0×10^3)
(vi)    ([0.0 0.0]^T, [1.5 1.5]^T, 0.0, 1.0)       ([0.0 0.0]^T, [1.5 1.5]^T, 0.0, 1.0×10^3)
The assignment in (3.105) is a variation and a generalization to the two-dimensional
case of the one-dimensional cases in [49, p. 86] and in [70, p. 195]. In Table 3.13,
several choices of the 4-tuple of parameters $(\alpha, \beta, \gamma, \delta)$ are listed
whose influences on the low-fidelity model are depicted in Figure 3.16.
In Figure 3.17, I illustrate $\operatorname{grad}(\tilde{K})(x_1, x_2)$ as a projection
on the contour representation of the emulated simplified-physics low-fidelity models.
For the sake of completeness, in Table 3.14 and in Table 3.15, I provide the normalized
mean generalization error $e^{N}_{H,sg}(\hat{Q}_{\boldsymbol{\xi}})$ and the mean SSPCC
$r^{2}_{\hat{y}\tilde{\hat{y}}}|_{k}$ within the k-fold cross validation method w.r.t.
the emulated simplified-physics low-fidelity models in Figure 3.16 with sample size
$m := 50$. For all combinations of the adjustment parameters in Table 3.13, the
deterioration in the value is understandably large. The deterioration in the shape is
largest for those combinations of the adjustment parameters in which the argument of
the high-fidelity model is shifted.
TABLE 3.14: The normalized mean generalization error
$e^{N}_{H,sg}(\hat{Q}_{\boldsymbol{\xi}})$ and the mean SSPCC
$r^{2}_{\hat{y}\tilde{\hat{y}}}|_{k}$ within the k-fold cross validation method w.r.t.
emulated simplified-physics low-fidelity models in Figure 3.16a with normalized values
and with sample size $m := 50$.
                           (i)         (ii)        (iii)       (iv)        (v)         (vi)
$e^{N}_{H,sg}|_{k:=5}$     6.2×10^−2   4.2×10^−2   9.6×10^−2   4.9×10^−3   1.7×10^−2   1.6×10^−2
$r^{2}|_{k:=5}$            0.6391      0.0391      0.1092      0.7466      0.3434      0.8684
$e^{N}_{H,sg}|_{k:=10}$    6.2×10^−2   4.2×10^−2   9.6×10^−2   4.9×10^−3   1.7×10^−2   1.6×10^−2
$r^{2}|_{k:=10}$           0.6238      0.2691      0.3007      0.7224      0.3611      0.7565
TABLE 3.15: The normalized mean generalization error
$e^{N}_{H,sg}(\hat{Q}_{\boldsymbol{\xi}})$ and the mean SSPCC
$r^{2}_{\hat{y}\tilde{\hat{y}}}|_{k}$ within the k-fold cross validation method w.r.t.
emulated simplified-physics low-fidelity models in Figure 3.16b with normalized values
and with sample size $m := 50$.
                           (i)         (ii)        (iii)       (iv)           (v)         (vi)
$e^{N}_{H,sg}|_{k:=5}$     1.8×10^−1   1.4×10^−1   8.6×10^−2   <2.2×10^−16    9.6×10^−2   1.6×10^−2
$r^{2}|_{k:=5}$            1.0         0.1558      0.8367      1.0            0.1558      0.8367
$e^{N}_{H,sg}|_{k:=10}$    1.8×10^−1   1.4×10^−1   8.6×10^−2   <2.2×10^−16    9.6×10^−2   1.6×10^−2
$r^{2}|_{k:=10}$           1.0         0.3522      0.7545      1.0            0.3522      0.7545
Let us skip a discussion about the sensitivity measures since the procedure is
analogous to the corresponding procedure in the previous section. However, an
important insight from Table 3.14 and Table 3.15 is that if the mean SSPCC
$r^{2}_{\hat{y}\tilde{\hat{y}}}|_{k}$ is above a threshold of 0.75, then it might be
useful to adapt the proposed procedure SBO-DFLF to the case of simplified-physics
low-fidelity models as well. Notice that the threshold of 0.75 is rather user-dependent.
Based on anecdotal evidence, however, the authors in [70, p. 37] state that a threshold
above 0.80 corresponds to a good low-fidelity model in terms of predictive power.
(A) The concrete 4-tuples of parameters $(\alpha, \beta, \gamma, \delta)$ are listed in
the first column of Table 3.13. The red cross refers to the global minimum of the
high-fidelity model in Figure 2.2b.
(B) The concrete 4-tuples of parameters $(\alpha, \beta, \gamma, \delta)$ are listed in
the second column of Table 3.13. The red cross refers to the global minimum of the
high-fidelity model in Figure 2.2b.
FIGURE 3.16: Emulated simplified-physics low-fidelity models for the modified Branin
test function (and high-fidelity model, respectively) in Figure 2.2 via the assignment
rule in (3.105) (solely in contour representation). Each of the panels (i) to (vi) is
drawn over the axes $x_1$ (ticks from −5 to 10) and $x_2$ (ticks from 0 to 15).
(A) Depicting $\operatorname{grad}(\tilde{K})(x_1, x_2)$ within Figure 3.16a. Same
scaling from (i) to (vi).
(B) Depicting $\operatorname{grad}(\tilde{K})(x_1, x_2)$ within Figure 3.16b. Same
scaling from (i) to (iii) and from (iv) to (vi). The scaling in (iv), in (v), and in (vi)
is $1 \times 10^{-3}$ of the scaling in (i), in (ii), and in (iii).
FIGURE 3.17: Depicting $\operatorname{grad}(\tilde{K})(x_1, x_2)$ as a projection on the
contour representation of the emulated simplified-physics low-fidelity models for the
modified Branin test function in Figure 3.16. Each of the panels (i) to (vi) is drawn
over the axes $x_1$ (ticks from −5 to 10) and $x_2$ (ticks from 0 to 15).
Let us apply the following schematic procedure that I refer to as SBO-SPLF:
1) create a sample w.r.t. the high-fidelity model (§ 3.1.1);
2) provide a simplified-physics low-fidelity model (§ 3.1.3) and compute
$r^{2}_{\hat{y}\tilde{\hat{y}}}|_{k}$ w.r.t. the sample from step 1); if
$r^{2}_{\hat{y}\tilde{\hat{y}}}|_{k} < 0.75$, break off the procedure, otherwise
continue the procedure;
3) invoke a global optimization algorithm (§ 2.3.3) w.r.t. the provided simplified-
physics low-fidelity model from step 2);
4) use the minimizer from step 3) as a starting point for a local optimization algo-
rithm (§ 2.3.3) w.r.t. the high-fidelity model from step 1).
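The gate in step 2) can be sketched as a small decision function. The sketch assumes that the mean SSPCC is the average, over $k$ contiguous folds of testing points, of the squared sample Pearson correlation between high-fidelity and low-fidelity outputs; the thesis' exact definition is the one in (3.33). All function names are hypothetical, and the fallback to SBO-DFLF is one possible reaction to a failed gate.

```python
def squared_pearson(a, b):
    """Squared sample Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov * cov / (var_a * var_b)


def mean_fold_r2(y_high, y_low, k):
    """Average the squared correlation over k contiguous folds of equal size."""
    m = len(y_high)
    if m != len(y_low) or m % k != 0:
        raise ValueError("need equally long samples whose size is a multiple of k")
    m_g = m // k
    folds = [slice(i * m_g, (i + 1) * m_g) for i in range(k)]
    return sum(squared_pearson(y_high[s], y_low[s]) for s in folds) / k


def select_procedure(y_high, y_low, k, threshold=0.75):
    """Gate of step 2): continue with SBO-SPLF or fall back to SBO-DFLF."""
    r2 = mean_fold_r2(y_high, y_low, k)
    return ("SBO-SPLF" if r2 >= threshold else "SBO-DFLF"), r2


y_high = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(select_procedure(y_high, [2.0 * v + 1.0 for v in y_high], k=2))  # same shape
print(select_procedure(y_high, [1.0, 3.0, 2.0, 6.0, 4.0, 5.0], k=2))   # distorted shape
```

A low-fidelity model that differs from the high-fidelity model only by an offset and a positive scaling passes the gate, whereas a model with a distorted shape is rejected.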
Similarly to the proposed procedure SBO-DFLF, the proposed schematic proce-
dure SBO-SPLF builds upon the common canon of optimization algorithms such
as the Nelder-Mead simplex (NMS) algorithm and the adaptive differential evolu-
tion (ADE) algorithm.
Depending on the desired degree of scrutiny and rigor, it is also conceivable to
combine the proposed procedures SBO-DFLF and SBO-SPLF in sequence or in par-
allel and to compare, e.g., the respective minimizers that serve as starting points.
Furthermore, the procedure SBO-DFLF is well suited as a fallback branch for the
procedure SBO-SPLF in the case that $r^{2}_{\hat{y}\tilde{\hat{y}}}|_{k}$ is below the
threshold 0.75.
Finally, let us observe that, in order to determine the quantities
$e^{N}_{H,sg}(\hat{Q}_{\boldsymbol{\xi}})|_{k}$ and $r^{2}_{\hat{y}\tilde{\hat{y}}}|_{k}$
in Table 3.14 and in Table 3.15, 50 evaluations of the simplified-physics low-fidelity
model and 50 evaluations of the high-fidelity model are needed. Thus, in practical
applications, the sample size has to be lowered to, e.g., 10, which corresponds to 10
evaluations of the high-fidelity model and 10 evaluations of the simplified-physics
low-fidelity model. However, in order to compute $r^{2}_{\hat{y}\tilde{\hat{y}}}|_{k}$
with, e.g., $k := 5$, at least 15 evaluations of the high-fidelity model and 15
evaluations of the simplified-physics low-fidelity model are needed (recall § 3.1.1).
Under the common assumption that, compared to the number of evaluations
of the high-fidelity model, the number of evaluations of the simplified-physics low-
fidelity model is negligible, results from the proposed procedure SBO-SPLF are com-
parable to results such as in Table 3.12 from the proposed procedure SBO-DFLF.
3.3 Surrogate-guided optimization
The fundamental philosophy in the present work is to keep the sample size $m$ as low
as possible since, with regard to the high-fidelity model, the acquisition of pairs of
sampling plan points and output points is computationally expensive.
In order to determine the high-fidelity model’s optimal solution, a reasonable
sample size $m$ and a reasonable number of high-fidelity function evaluations,
respectively, are unknown a priori. Furthermore, as elucidated in the elaborations of
the previous section § 3.2, these numbers are problem-dependent as well.
If we suppose that the positive integer number of high-fidelity function evaluations
$m_{DSO}$ w.r.t. a direct solving of a high-fidelity optimization problem is higher
than the positive integer number of high-fidelity function evaluations $m_{SBO}$ w.r.t.
a surrogate-based optimization, then, hopefully, the positive integer number of
high-fidelity function evaluations $m_{SGO}$ w.r.t. a surrogate-guided optimization is
lower than $m_{SBO}$ such that the transitive relation
$$\forall m_{DSO}, m_{SBO}, m_{SGO} \in \mathbb{N} \setminus \{0\}.\ m_{DSO} > m_{SBO} \land m_{SBO} > m_{SGO} \implies m_{DSO} > m_{SGO} \qquad (3.106)$$
holds. From an application-driven viewpoint, the additional value of checking the relation
$$\forall m_{SBO}, m_{SGO} \in \mathbb{N} \setminus \{0\}.\ m_{SBO} > m_{SGO} \qquad (3.107)$$
for a given problem is, for instance, comprehensible in the context of validation and
verification.
The relation in (3.106) as well as the relation in (3.107) possess the hidden
assumption that the corresponding solving procedures converge to the same optimal
solution w.r.t. a user-defined tolerance. Let us encode this hidden assumption by the
pre-condition
$$\exists m_{DSO}, m_{SBO}, m_{SGO} \in \mathbb{N} \setminus \{0\}.\ m_{DSO}, m_{SBO}, m_{SGO} > 0, \qquad (3.108)$$
and let us encode the implicit order structure in (3.106) by the post-condition
$$\forall m_{DSO}, m_{SBO}, m_{SGO} \in \mathbb{N} \setminus \{0\}.\ m_{DSO} > m_{SBO} > m_{SGO} > 0. \qquad (3.109)$$
In those cases in which the hidden assumption is not satisfied or the relation
in (3.107) does not hold, it might be preferable to perform a surrogate-based
optimization instead of a surrogate-guided optimization.
However, mind that, generally, one cannot know with certainty in advance whether
the relation in (3.107) holds for a task at hand; therefore, from an application-driven
viewpoint embedded in the context of validation and verification, it might be
reasonable to perform a surrogate-based optimization as well as a surrogate-guided
optimization for a task at hand.
The basic idea of surrogate-guided optimization is to provide some kind of in-
teraction between a high-fidelity model and a low-fidelity model, that is, to provide
some model management strategy (cf. § 1.2). In the subsequent subsections, let us
focus on the model management strategies adaptation and fusion (cf. § 1.3).
The essential idea underlying adaptation is to exploit information about a low-
fidelity model in each step of the solving procedure regarding the high-fidelity opti-
mization problem in order to adapt the procedure according to the low-fidelity model
information.
The key idea of fusion is to combine or to fuse information about the high-fidelity
model and information about the low-fidelity model into a single model that is ex-
ploited to constitute a proxy for the high-fidelity optimization problem.
For the sake of completeness of the discussion about the kriging low-fidelity
model in § 3.1.2, let us elaborate briefly on the so-called sequential kriging opti-
mization which is a subtype of the model management strategy adaptation.
Next, we discuss different algorithms from the context of the space-mapping
paradigm. Their underlying optimization procedures are subtypes of the model
management strategy adaptation as well.
Finally, we examine the co-kriging optimization that can be conceived as a sub-
type of the model management strategy fusion.
Regarding all optimization approaches under consideration, we elaborate some
convergence-related issues.
3.3.1 Sequential kriging optimization
An important feature of the kriging low-fidelity model in (3.98) is that one can
provide a mean squared prediction error $(\hat{s}_y(x))^{2}$ at an arbitrary point $x$.
More precisely, the error $(\hat{s}_y(x))^{2}$ that is associated with $\hat{y}(x)$
in (3.98) can be stated as
$$(\hat{s}_y(x))^{2} := \hat{\sigma}^{2}_{y}\left(1 - r^{T}\Psi^{-1}r + \frac{(1 - \mathbf{1}^{T}\Psi^{-1}r)^{2}}{\mathbf{1}^{T}\Psi^{-1}\mathbf{1}}\right), \qquad (3.110)$$
where $\hat{s}_y: X \to \mathbb{R}$. Note that the term involving the fraction is
negligibly small; thus, let us consider the mean squared prediction error
$(\hat{s}_y(x))^{2}$ as
$$(\hat{s}_y(x))^{2} := \hat{\sigma}^{2}_{y}\left(1 - r^{T}\Psi^{-1}r\right). \qquad (3.111)$$
In addition to the numerical justification for omitting the fraction term in (3.110),
the authors in [70, p. 84] provide a method-based justification, more specifically,
they argue that, from a Bayesian viewpoint, the fraction term is not mentioned at all
(such as in, e.g., [116, p. 280ff]).
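Instead of $(\hat{s}_y(x))^{2}$, it is common to employ the root mean squared prediction
error $\hat{s}_y(x)$ as the measure of the uncertainty in the prediction $\hat{y}(x)$.
The error $\hat{s}_y(x)$ is linked with the error $(\hat{s}_y(x))^{2}$ by
$$\hat{s}_y(x) := \sqrt{\left|(\hat{s}_y(x))^{2}\right|} \qquad (3.112a)$$
$$\equiv \sqrt{\left|\hat{\sigma}^{2}_{y}\left(1 - r^{T}\Psi^{-1}r\right)\right|}. \qquad (3.112b)$$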
Observe that, if a sampling plan point is given, then the error is zero. More formally,
if we provide an auxiliary map
$a_{\hat{s}_y} = x \mapsto r^{T}\Psi^{-1}r : X \to \mathbb{R}$, then one can state that
$$\forall i \in \{1,\dots,m\}.\ a_{\hat{s}_y}(x_i) = 1 \implies \forall i \in \{1,\dots,m\}.\ \hat{s}_y(x_i) = 0. \qquad (3.113)$$
This observation encodes the intuitive expectation that the prediction is exact at a
given sampling plan point where the corresponding output point is given as well.
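A short derivation may help to see why the premise of (3.113) is met by an interpolating kriging model. It assumes, beyond what is stated here, that the correlation vector at a sampling plan point is the $i$-th column of the correlation matrix, i.e., $r(x_i) = \Psi e_i$ with the $i$-th unit vector $e_i$, that $\Psi$ is symmetric with unit diagonal, and that no regularization term is added to $\Psi$.

```latex
\begin{align*}
a_{\hat{s}_y}(x_i) &= r(x_i)^{T}\Psi^{-1}r(x_i)
  = e_i^{T}\Psi^{T}\Psi^{-1}\Psi e_i
  = e_i^{T}\Psi e_i = \Psi_{ii} = 1, \\
\mathbf{1}^{T}\Psi^{-1}r(x_i) &= \mathbf{1}^{T}e_i = 1 .
\end{align*}
```

Under these assumptions the bracket in (3.111) vanishes at every sampling plan point, and so does the numerator of the fraction term in (3.110); hence $\hat{s}_y(x_i) = 0$ in either form.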
Generally, if we utilize a data-fit low-fidelity model in an interpolation context,
then one can reduce the empirical surrogate modeling error (recall Definition 3.1.2)
by increasing sufficiently the sample size and by positioning appropriately the sam-
pling plan points.
However, for reasons of computational thrift, the total number of sampling plan
points has to be kept as low as possible. Hence, a usual economical approach is to
start with a sampling plan and to add sequentially new sampling plan points in a
guided way.
Ideally, this kind of adaptive interpolation balances the error-based global ex-
ploration for generating an overall accurate low-fidelity model and the prediction-
based local exploitation for determining an optimal value of the high-fidelity model.
Using the error $\hat{s}_y(x)$, one can define the acquisition function (or infill
criterion function or update point function) expected improvement
$\mathrm{EI}: X \to \mathbb{R}_{+}$ whose assignment is defined by the conditional
expression
$$\mathrm{EI}(x) := \begin{cases} 0, & \text{if } \hat{s}_y(x) \equiv 0 \\ (\min(y) - \hat{y}(x))\left(\dfrac{1}{2} + \dfrac{1}{2}\operatorname{erf}\left(\dfrac{\min(y) - \hat{y}(x)}{\hat{s}_y(x)\sqrt{2}}\right)\right) + \dfrac{\hat{s}_y(x)}{\sqrt{2\pi}}\exp\left(-\dfrac{(\min(y) - \hat{y}(x))^{2}}{2(\hat{s}_y(x))^{2}}\right), & \text{if } \hat{s}_y(x) > 0, \end{cases} \qquad (3.114)$$
where a signature of the map $\min$ reads as $\mathbb{R}^{m \times 1} \to \mathbb{R}$ and
$\min(y)$ returns the minimal output point within the current column vector $y$
in (3.83), thus, it is set that $y_{\min} \equiv \min(y)$; and the map
$\operatorname{erf}: \mathbb{R} \to [-1, 1]$ denotes the Gauss error function. Notice
that there is a slight abuse of notation in the sense that the expected improvement
acquisition function $\mathrm{EI}$ is regarded as an independent map instead of as the
expected value of an improvement function $I$, i.e., $E[I(x)]$. For more details, see,
e.g., [108], [70, p. 89ff] or [116, p. 294ff].
The acquisition function EI enables us to assert how much utility or improve-
ment is to be expected from a potentially new sampling plan point. For other kinds
of acquisition function, see, e.g., [70, ch. 3.2] or [116, ch. 16] and references therein.
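The conditional expression (3.114) can be evaluated with the standard library alone once the prediction and its root mean squared prediction error at a point are known. The sketch below takes these two numbers as plain arguments, so no kriging model is fitted here; the function names are hypothetical. The second function implements the logarithmic variant mentioned in footnote 25, using `sys.float_info.min`, which equals $2^{-1022}$ for IEEE 754 double precision.

```python
import math
import sys


def expected_improvement(y_min, y_hat, s_hat):
    """Expected improvement (3.114) for a prediction y_hat with RMSE s_hat."""
    if s_hat <= 0.0:
        return 0.0  # no uncertainty, e.g. at a sampling plan point
    gap = y_min - y_hat
    cdf_term = gap * (0.5 + 0.5 * math.erf(gap / (s_hat * math.sqrt(2.0))))
    pdf_term = s_hat / math.sqrt(2.0 * math.pi) * math.exp(-gap ** 2 / (2.0 * s_hat ** 2))
    return cdf_term + pdf_term


def log_expected_improvement(y_min, y_hat, s_hat):
    """log10(EI + 2**-1022), the variant described in footnote 25."""
    return math.log10(expected_improvement(y_min, y_hat, s_hat) + sys.float_info.min)


# A prediction equal to the current minimum still has positive utility
# because of its uncertainty; a more promising prediction has more.
print(expected_improvement(1.0, 1.0, 1.0))
print(expected_improvement(1.0, 0.0, 1.0))
print(log_expected_improvement(1.0, 2.0, 0.0))  # finite although EI is zero
```

The two summands mirror the balance described earlier: the first rewards a prediction below the current minimum (exploitation), the second rewards a large prediction error (exploration).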
After determining $y_{\min}$ and the initial kriging low-fidelity model in (3.98) w.r.t.
a given sampling plan, one can identify the new $(m+1)$-th sampling plan point as the
optimal value that maximizes the corresponding expected improvement acquisition
function in (3.114) or, equivalently, as the minimizer of the optimization problem
(see footnote 25)
$$x_{m+1} := \operatorname*{argmin}_{x \in X} -\mathrm{EI}(x). \qquad (3.115)$$
Observe that the statement in (3.115) follows a common notation in which it is
not emphasized that the assignment definition $\mathrm{EI}(x)$ in (3.114) depends on the
sampling plan points and the corresponding output points as well as on the parameters
of the kriging low-fidelity model.
Ideally, the iteration procedure in (3.115) of determining $y_{\min}$ and the kriging
low-fidelity model, and finding a new sampling plan point $x_{m+1}$, terminates after
finitely many steps at the global minimum such that $\hat{s}_y(x) \equiv 0$, and,
consequently, $\mathrm{EI}(x) \equiv 0$.
However, the experimental rate of convergence of the iteration procedure in
(3.115) might be very low or the iteration procedure might not be convergent at all
(cf. [70, p. 91]). Thus, the condition in (3.107) for a given problem does not neces-
sarily hold to be true or the condition cannot be applied because its assumption is
violated.
The iteration procedure in (3.115) is associated with the so-called efficient global
optimization (or sequential kriging optimization) technique which can be consid-
ered as a subtype of the model management strategy adaptation (see [166, p. 555]).
Regarding the zoo of possible acquisition functions, it is intricate to find the most
appropriate embedding of concepts associated with iteration procedures such as
in (3.115) into the context of numerical optimization with the magnetoquasistatic
25 For computational reasons in a manner similar to (3.97), it can be useful to add the smallest
positive normalized floating-point number $2^{-1022}$ to the assignment in (3.114) and to consider the
logarithm with base 10 of this extended version, i.e., to utilize $\log_{10}(\mathrm{EI}(x) + 2^{-1022})$ rather than
$\mathrm{EI}(x)$ itself (see, e.g., the book website associated with [70]).
104 Chapter 3. Surrogate optimization
model (recall § 2.3). Hence, concepts such as, for instance, constrained expected im-
provement (see, e.g., [70, ch. 5.4]) within the electromagnetics context demand a thor-
ough examination all on their own – which is out of the scope of the present work.
In the present work, the previous considerations concerning sequential kriging
optimization are primarily employed as internal checking tools at the level of pro-
grams (recall Figure 1.4).
3.3.2 Optimization within the space-mapping paradigm
Recalling § 2.3.2 and § 3.1.3, one can adapt the problem formulation in (2.40) to the
generic statements in (3.104) in the sense that
\[
\operatorname*{min.}_{x \in X_0} \; (\hat{\hat{j}} \circ K)(x), \tag{3.116}
\]
where $K \in \hom(X_0, Y_0)$ denotes the high-fidelity model and $\hat{\hat{j}} \in \hom(Y_0, Z_0)$ denotes
the objective functional and the composition map $\circ$ is overloaded such that its
signature reads as $Z_0^{Y_0} \times Y_0^{X_0} \to Z_0^{X_0}$. Let us think of (3.116) as the high-fidelity
optimization problem.
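Referring to the low-fidelity model as $\tilde{K} \in \hom(X_1, Y_1)$ and overloading the
objective functional $\hat{\hat{j}} \in \hom(Y_1, Z_1)$ and the composition map
$\circ \in \hom(Z_1^{Y_1} \times Y_1^{X_1}, Z_1^{X_1})$ in (3.116), one can think of the low-fidelity
optimization problem as
\[
\operatorname*{min.}_{\tilde{x} \in X_1} \; (\hat{\hat{j}} \circ \tilde{K})(\tilde{x}). \tag{3.117}
\]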
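Furthermore, if we refer to the domain-oriented correction map as $\tilde{P} \in \hom(X_0, X_1)$,
and to the codomain-oriented correction map as $\tilde{R} \in \hom(Y_1, Y_0)$, then the generic
statements in (3.104) reduce to one legitimate generic statement from the perspective
of function extensionality such as in (2.28), that is,
\[
\forall x \in X_0 .\; K(x) =_{Y_0} (\tilde{R} \circ_{Y_{10}} \tilde{K} \circ_{X_{01}} \tilde{P})(x), \tag{3.118}
\]
where the map $\circ_{X_{01}} \colon Y_1^{X_1} \times X_1^{X_0} \to Y_1^{X_0}$ and the map
$\circ_{Y_{10}} \colon Y_0^{Y_1} \times Y_1^{X_1} \to Y_0^{X_1}$ denote suitable composition maps which
adhere to right-associativity. Thus, the ideal property regarding the maps $\tilde{R}$, $\tilde{K}$, $\tilde{P}$ reads as
\[
\forall \tilde{R} \in \hom(Y_1, Y_0) .\; \forall \tilde{K} \in \hom(X_1, Y_1) .\; \forall \tilde{P} \in \hom(X_0, X_1) .\;
K =_{X_0 \to Y_0} (\tilde{R} \circ_{Y_{10}} \tilde{K} \circ_{X_{01}} \tilde{P}). \tag{3.119}
\]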
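Recalling the commentary on (3.104), let us introduce the map $\tilde{K}_s$ such that
\[
\tilde{K}_s = (\tilde{R}, \tilde{K}, \tilde{P}) \mapsto \tilde{K}_s(\tilde{R}, \tilde{K}, \tilde{P})
:= \tilde{R} \circ_{Y_{10}} \tilde{K} \circ_{X_{01}} \tilde{P}
\;\colon\; Y_0^{Y_1} \times Y_1^{X_1} \times X_1^{X_0} \to Y_0^{X_0}, \tag{3.120}
\]
in order to conceptually discriminate the notion of a low-fidelity model $\tilde{K}$ and the
notion of a surrogate model $\tilde{K}_s$ within the context of the space-mapping paradigm.
For the sake of notational ease, let us define
\[
K_s^{\tilde{R}, \tilde{K}, \tilde{P}} := \tilde{K}_s(\tilde{R}, \tilde{K}, \tilde{P}). \tag{3.121}
\]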
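The distinction between a low-fidelity model and a surrogate model in (3.120) can be mimicked with a higher-order function. The following sketch is illustrative only: the toy maps and all names are hypothetical, and the typed composition maps of (3.118) are replaced by ordinary function application.

```python
def surrogate(R, K_low, P):
    """Return the surrogate model x -> R(K_low(P(x))), cf. (3.120)."""
    def K_s(x):
        return R(K_low(P(x)))
    return K_s


def identity(v):
    """Stand-in for an identity (or inclusion) map."""
    return v


# Hypothetical toy maps, chosen only for illustration.
K_low = lambda x_tilde: x_tilde ** 2        # low-fidelity model
P = lambda x: x - 0.2                       # domain-oriented correction map
R = lambda y_tilde: 1.1 * y_tilde + 0.05    # codomain-oriented correction map

K_s = surrogate(R, K_low, P)                        # corrected surrogate model
K_collapsed = surrogate(identity, K_low, identity)  # both corrections trivial
```

With both correction maps set to the identity, the surrogate returned by `surrogate` agrees with the low-fidelity model at every argument, which is the situation in which the two notions coincide.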
If we add another assumption to the list of assumptions regarding the maps $\tilde{R}$,
$\tilde{K}$, and $\tilde{P}$, more specifically, if we assume that there is some kind of sub-structure
between $X_0$ and $X_1$ as well as between $Y_1$ and $Y_0$, then it is meaningful to define the
inclusion map $\iota_{\tilde{R}}$ and the inclusion map $\iota_{\tilde{P}}$ such that
\[
\iota_{\tilde{P}} = x \mapsto \iota_{\tilde{P}}(x) := x \;\colon\; X_0 \to X_1, \tag{3.122a}
\]
\[
\iota_{\tilde{R}} = \tilde{y} \mapsto \iota_{\tilde{R}}(\tilde{y}) := \tilde{y} \;\colon\; Y_1 \to Y_0. \tag{3.122b}
\]
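However, if we assume that $X \equiv X_0$ and $X_0 \equiv X_1$ as well as $Y \equiv Y_1$ and $Y_1 \equiv Y_0$, then
we obtain $\iota_{\tilde{R}} \equiv \mathrm{id}_Y$ and $\iota_{\tilde{P}} \equiv \mathrm{id}_X$. Another possible assumption is that the
corresponding entities are isomorphic regarding some prescribed algebraic structure, i.e.,
$X_0 \cong X_1$ and $Y_1 \cong Y_0$. We dwell on this assumption in ch. 4, though.
Observe that if and only if the case $\iota_{\tilde{R}} \equiv \mathrm{id}_Y$ and $\iota_{\tilde{P}} \equiv \mathrm{id}_X$ is given, then
\[
\forall \tilde{K} \in \hom(X, Y) .\; K_s^{\mathrm{id}_Y, \tilde{K}, \mathrm{id}_X} =_{X \to Y} \tilde{K} \tag{3.123}
\]
holds to be true such that the notion of a low-fidelity model $\tilde{K}$ and the notion of a
surrogate model $\tilde{K}_s$ collapse within the context of the space-mapping paradigm.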
If the condition in (3.118) and in (3.119), respectively, holds to be true, then one
can substitute $K$ with $K_s^{\tilde{R}, \tilde{K}, \tilde{P}}$ in (3.116) such that
\[
\operatorname*{min.}_{x \in X_0} \; (\hat{\hat{j}} \circ (\tilde{R} \circ_{Y_{10}} \tilde{K} \circ_{X_{01}} \tilde{P}))(x). \tag{3.124}
\]
However, according to the remarks regarding (2.28), one cannot expect that the
condition in (3.118) and in (3.119), respectively, is satisfied for real-world applications.
Thus, we have to incorporate the information about the high-fidelity model into the
surrogate model to reduce a potential discrepancy between the two.
If we assume that the low-fidelity model does not depend on the high-fidelity
model, then, conceptually, one would have to extend the signature of the correction
map $\tilde{R}$ and the signature of the correction map $\tilde{P}$ in the sense that
\[
\tilde{P} = (x, K, \tilde{K}) \mapsto \tilde{P}(x, K, \tilde{K}) \;\colon\; X_0 \times Y_0^{X_0} \times Y_1^{X_1} \to X_1, \tag{3.125a}
\]
\[
\tilde{R} = (\tilde{y}, \tilde{K}, K) \mapsto \tilde{R}(\tilde{y}, \tilde{K}, K) \;\colon\; Y_1 \times Y_1^{X_1} \times Y_0^{X_0} \to Y_0. \tag{3.125b}
\]
However, in the common treatment of theoretical issues regarding the space-mapping
paradigm, the high-fidelity model and the low-fidelity model are assumed to be
fixed. Thus, they are implicitly incorporated into the assignment rules of the
correction maps (see, e.g., [49, ch. 3] and references therein). This tactic resembles the
definition of the empirical generalization error $e_{H,\mathrm{sg}}(\hat{Q}_{\boldsymbol{\xi}})$ in (3.21).
From a program's viewpoint (recall Figure 1.4), the low-fidelity model and the
high-fidelity model are defined within the scope of the programs associated with
the correction maps.
Subsequently, I deem it advisable to move from an abstract function definition
to a concrete function definition. More precisely, let us assume $X_0 \subseteq \mathbb{R}^d$, $X_1 \subseteq \mathbb{R}^d$,
$Y_0 \subseteq \mathbb{R}$, and $Y_1 \subseteq \mathbb{R}$, that is, let us cover the multivariate scalar-valued use case w.r.t.
the high-fidelity model and the low-fidelity model. Then one can provide a possible
definition of the assignment rule of the domain-oriented correction map $\tilde{P}$ by means
of
\[
\tilde{P}_s = x \mapsto \tilde{P}_s(x) := \operatorname*{argmin}_{\tilde{x} \in X_1}
\left( \frac{1}{2} \bigl(\tilde{K}(\tilde{x}) - K(x)\bigr)^2 +_{\mathbb{R}} \frac{\alpha}{2} \|\tilde{x} - x\|_{l_2}^2 \right), \tag{3.126}
\]
where the index $s$ in $\tilde{P}_s$ emphasizes the scalar-valued use case, and $\alpha \in \mathbb{R}_+$ is a
user-assigned smoothing parameter for the purpose of existence and uniqueness of a
solution (compare with the parameter $\beta$ in (2.31)). For more details regarding the
definition in (3.126), see [95].
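To make the role of the two terms in (3.126) tangible, the following sketch evaluates the regularized misfit on a finite candidate set for $d = 1$. It is an illustration under stated assumptions and not the method of [95]: the exhaustive search, the toy models, the grid, and all names are hypothetical, and a practical implementation would call a proper numerical optimizer instead.

```python
def extract_parameter(K_high, K_low, x, alpha, candidates):
    """Approximate P_s(x) from (3.126) by exhaustive search over `candidates`."""
    target = K_high(x)  # one high-fidelity model evaluation

    def objective(x_tilde):
        misfit = 0.5 * (K_low(x_tilde) - target) ** 2
        penalty = 0.5 * alpha * (x_tilde - x) ** 2
        return misfit + penalty

    return min(candidates, key=objective)


# Hypothetical toy models: the low-fidelity model is a shifted copy
# of the high-fidelity model.
K_high = lambda x: (x - 1.0) ** 2
K_low = lambda x_tilde: (x_tilde - 0.8) ** 2

grid = [i / 1000.0 for i in range(-2000, 2001)]  # search grid on [-2, 2]
x_tilde = extract_parameter(K_high, K_low, x=1.5, alpha=1e-3, candidates=grid)
```

In this toy setting the low-fidelity response matches the high-fidelity response at two arguments (near 0.3 and near 1.3). The penalty term weighted by `alpha` selects the one closer to `x`, which illustrates why the smoothing parameter supports uniqueness.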
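Let us redefine the high-fidelity model and the low-fidelity model in the sense
that
\[
K = (w, x) \mapsto K(w, x) \;\colon\; W_0 \times X_0 \to Y_0, \tag{3.127a}
\]
\[
\tilde{K} = (\tilde{w}, \tilde{x}) \mapsto \tilde{K}(\tilde{w}, \tilde{x}) \;\colon\; W_1 \times X_1 \to Y_1, \tag{3.127b}
\]
then one can introduce the map $K_w$ and the map $\tilde{K}_{\tilde{w}}$ that read as
\[
K_w = w \mapsto (x \mapsto K(w, x)) \;\colon\; W_0 \to Y_0^{X_0}, \tag{3.128a}
\]
\[
\tilde{K}_{\tilde{w}} = \tilde{w} \mapsto (\tilde{x} \mapsto \tilde{K}(\tilde{w}, \tilde{x})) \;\colon\; W_1 \to Y_1^{X_1}, \tag{3.128b}
\]
where the operation currying is applied to the maps $K$ and $\tilde{K}$. The operation currying
is conceived as an operation that transforms a function with multiple arguments into
a sequence of functions with single arguments.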
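If we suppose that $X_0 \subseteq \mathbb{R}^d$, $X_1 \subseteq \mathbb{R}^d$, $Y_0 \subseteq \mathbb{R}$, $Y_1 \subseteq \mathbb{R}$, $W_0 \subseteq \mathbb{R}$, and $W_1 \subseteq \mathbb{R}$, then
one can consider $m_w$ points with $m_w \in \mathbb{N}$ such that one can construct lists of
functions, i.e.,
\[
(K_w(w_1), \ldots, K_w(w_{m_w})) \equiv (x \mapsto K(w_1, x), \ldots, x \mapsto K(w_{m_w}, x)), \tag{3.129a}
\]
\[
(\tilde{K}_{\tilde{w}}(\tilde{w}_1), \ldots, \tilde{K}_{\tilde{w}}(\tilde{w}_{m_w})) \equiv
(\tilde{x} \mapsto \tilde{K}(\tilde{w}_1, \tilde{x}), \ldots, \tilde{x} \mapsto \tilde{K}(\tilde{w}_{m_w}, \tilde{x})), \tag{3.129b}
\]
where the operation $\equiv$ is conceived as a componentwise operation. Furthermore, if
we evaluate the lists of functions at $x$ and at $\tilde{x}$, respectively, then one can construct
lists of evaluated functions, i.e.,
\[
(K_w(w_1)(x), \ldots, K_w(w_{m_w})(x)) \equiv (K(w_1, x), \ldots, K(w_{m_w}, x)), \tag{3.130a}
\]
\[
(\tilde{K}_{\tilde{w}}(\tilde{w}_1)(\tilde{x}), \ldots, \tilde{K}_{\tilde{w}}(\tilde{w}_{m_w})(\tilde{x})) \equiv
(\tilde{K}(\tilde{w}_1, \tilde{x}), \ldots, \tilde{K}(\tilde{w}_{m_w}, \tilde{x})). \tag{3.130b}
\]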
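The passage from a two-argument model via currying to a list of functions and then to a list of evaluated functions can be sketched with closures. The toy model `K` and the sample points below are hypothetical and serve only to illustrate the construction in (3.128) to (3.130).

```python
def curry(K):
    """Turn K(w, x) into w -> (x -> K(w, x)), cf. (3.128a)."""
    def K_w(w):
        def K_at_w(x):
            return K(w, x)
        return K_at_w
    return K_w


# Hypothetical two-argument model, e.g. with w as a frequency-like quantity
# and x as a scalar design variable.
def K(w, x):
    return w * x + 1.0


K_w = curry(K)
ws = [1.0, 2.0, 3.0]                    # the m_w points w_1, ..., w_{m_w}
functions = [K_w(w) for w in ws]        # list of functions, cf. (3.129a)
values = [f(0.5) for f in functions]    # list of evaluated functions, cf. (3.130a)
```

Here `values` evaluates to `[1.5, 2.0, 2.5]`, i.e., the componentwise identification in (3.130a) holds by construction.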
Finally, assuming the redefinition in (3.127), one can cover the multivariate
vector-valued use case$^{26}$ w.r.t. the high-fidelity model and the low-fidelity model, that is,
if we assume that $X_0 \subseteq \mathbb{R}^d$, $X_1 \subseteq \mathbb{R}^d$, $Y_0 \subseteq \mathbb{R}^{m_w}$, $Y_1 \subseteq \mathbb{R}^{m_w}$, then one can overload the
maps in (3.127) such that
\[
K = x \mapsto K(x) := (K(w_1, x), \ldots, K(w_{m_w}, x)) \;\colon\; X_0 \to Y_0, \tag{3.131a}
\]
\[
\tilde{K} = \tilde{x} \mapsto \tilde{K}(\tilde{x}) := (\tilde{K}(\tilde{w}_1, \tilde{x}), \ldots, \tilde{K}(\tilde{w}_{m_w}, \tilde{x})) \;\colon\; X_1 \to Y_1. \tag{3.131b}
\]
Notice that the entity $w$ is often associated with the time variable $t$ such as, e.g.,
in (2.9); or it is associated with the frequency $\omega$ such as, e.g., in (2.10), that is, $w \equiv t$
or $w \equiv \omega$. In this context, it is usually assumed that $W_1 \equiv W_0$ in (3.127). These
associations constitute some common interpretations of the entity $w$ within the semantics
of electromagnetics. Due to these interpretations, the case $\forall d . \forall m_w .\; d < m_w$ is often
considered in practical applications (see, e.g., [194, p. 11] or [49, p. 91]).
In the vector-valued use case, technically, the dimensions of the domains and
co-domains do not necessarily have to match, i.e., the cases dim(X0)≠dim(X1)and
dim(Y0)≠dim(Y1)are conceivable. For instance, if the low-fidelity model admits
an application of automatic differentiation (see § 2.3.3) such as in the case of data-fit
low-fidelity models, then one can unleash the machinery of sensitivity computation
in order to determine an importance ranking of input variables (see § 3.2.1). Hence,
26In [138], multivariate functions are examined in a computational context for investigating partial
derivatives and the corresponding chain rule for multivariate calculus.
by means of sensitivity computation, one could construct a low-fidelity model such
that dim(X0)>dim(X1). In the present work, though, this path is not pursued. For
further elaborations regarding the dimension issue in the multivariate vector-valued
use case, I refer to, e.g., [56, p. 60ff] or [95].
Observing (3.130), though, I conclude that, syntactically, there is another possible
encoding of the vector-valued use case: if it is set that
\[
(K_w(w_1)(x), \ldots, K_w(w_{m_w})(x)) \equiv (K_1(x), \ldots, K_{m_w}(x)), \tag{3.132a}
\]
\[
(\tilde{K}_{\tilde{w}}(\tilde{w}_1)(\tilde{x}), \ldots, \tilde{K}_{\tilde{w}}(\tilde{w}_{m_w})(\tilde{x})) \equiv
(\tilde{K}_1(\tilde{x}), \ldots, \tilde{K}_{m_w}(\tilde{x})), \tag{3.132b}
\]
then one can rewrite (3.131) as
\[
K = x \mapsto K(x) := (K_1(x), \ldots, K_{m_w}(x)) \;\colon\; X_0 \to Y_0, \tag{3.133a}
\]
\[
\tilde{K} = \tilde{x} \mapsto \tilde{K}(\tilde{x}) := (\tilde{K}_1(\tilde{x}), \ldots, \tilde{K}_{m_w}(\tilde{x})) \;\colon\; X_1 \to Y_1. \tag{3.133b}
\]
Mind that if we change the definitions in (3.128) such that it is set $W_1 \equiv W_0$, and
therefore, in (3.133), $K_w$ and $\tilde{K}_w$ with $w \in \{1, \ldots, m_w\}$ denote $m_w$ different component
high-fidelity models and $m_w$ different component low-fidelity models, respectively,
then the corresponding interpretation partly resembles multiobjective optimization
(recall § 1.1). For some applications of this kind of interpretation of the vector-valued
use case, I refer to, e.g., [56, ch. 5] or [49, ch. 6].
Given the definitions in (3.131), the assignment rule in (3.126) has to be adapted.
More precisely, given $X_0 \subseteq \mathbb{R}^d$ and $X_1 \subseteq \mathbb{R}^d$, one can implement the map $\tilde{P}$ by
\[
\tilde{P}_v = x \mapsto \tilde{P}_v(x) := \operatorname*{argmin}_{\tilde{x} \in X_1}
\left( \frac{1}{2} \|\tilde{K}(\tilde{x}) - K(x)\|_{l_2}^2 +_{\mathbb{R}} \frac{\alpha}{2} \|\tilde{x} - x\|_{l_2}^2 \right), \tag{3.134}
\]
where the index $v$ in $\tilde{P}_v$ emphasizes the vector-valued use case and the corresponding
entities are conceived as column vectors (see also the commentary on the
representation of vectors as column vectors in § 3.1.2). For other possible definitions
of $\tilde{P}_v(x)$, see, e.g., [49, p. 65] and references therein.
Depending on the choice of interpretation in (3.131) or in (3.133), one can recover
the scalar-valued use case by setting $m_w := 1$ such that $\tilde{P}_v \equiv \tilde{P}_s$.
Assuming some kind of differentiability structure regarding the map $\tilde{P}$ (cf. [49,
p. 64]), one can invoke the notion of a first-order Taylor series expansion for
multivariate vector-valued functions, thus, one can define an affine map as a
representative of the map $\tilde{P}$ in (3.125a), that is,
\[
\tilde{P} = x \mapsto \tilde{P}(x) := \tilde{P}(x_0) + J_{\tilde{P}}(x_0)(x - x_0), \tag{3.135}
\]
where, in the context of the space-mapping paradigm, the value of $\tilde{P}$ at the expansion
point $x_0$ is chosen as $\tilde{P}(x_0) := \tilde{P}_v(x_0)$; and $J_{\tilde{P}}(x_0) \in \mathbb{R}^{d \times d}$ denotes the Jacobi
matrix w.r.t. the domain-oriented correction map $\tilde{P}$ in (3.125a) and evaluated at a fixed
argument $x_0$ of the high-fidelity model.$^{27}$
27 If we set $[C^1(U, \mathbb{R})]^m := C^1(U, \mathbb{R}) \times \overset{m-1}{\cdots} \times C^1(U, \mathbb{R})$, and if we assume a map $f \in [C^1(U, \mathbb{R})]^m$
with $U \subset \mathbb{R}^n$ being an open set, and $f_i \in C^1(U, \mathbb{R})$ with $i \in \{1, \ldots, m\}$ denote the components of
the map $f$, and $x_j \in \mathbb{R}$ with $j \in \{1, \ldots, n\}$ denote the components of the component maps $f_i$, and
if we suppose a fixed argument $p \in U$, then let us conceive the Jacobi matrix $J_f(p) \in \mathbb{R}^{m \times n}$ w.r.t. $f$
and evaluated at $p$ as $J_f(p) := [j_f(p)_{i,j}]$ with $j_f(p)_{i,j} := \partial_{e_{x_j}}(f_i)(p)$ for each $i \in \{1, \ldots, m\}$ and for each
$j \in \{1, \ldots, n\}$ with $\partial_{e_{x_j}}(f_i)(p) := \operatorname{grad}(f_i)(p) \cdot_{\mathbb{R}^n} e_{x_j}$ where $\cdot_{\mathbb{R}^n}$ denotes the Euclidean inner product w.r.t.
$\mathbb{R}^n$; and $e_{x_j}$ refer to the unit vectors w.r.t. $x_j$, respectively. See the commentary on the overloading of
the map grad in § 2.3.3 as well.
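By assuming some kind of sub-structure such as in (3.122), one can determine
$J_{\tilde{P}}(x_0)$ by the Jacobi matrix $J_{\tilde{K}}(\tilde{P}_v(x_0)) \in \mathbb{R}^{m_w \times d}$ w.r.t. the low-fidelity model
evaluated at $\tilde{P}_v(x_0)$ and by the Jacobi matrix $J_{\tilde{K}_s(\iota_{\tilde{R}}, \tilde{K}, \tilde{P})}(x_0) \in \mathbb{R}^{m_w \times d}$ w.r.t. the surrogate
model $\tilde{K}_s(\iota_{\tilde{R}}, \tilde{K}, \tilde{P})$ and evaluated at $x_0$.
Utilizing a multivariate vector-valued version of the chain rule, the Jacobi
matrix $J_{\tilde{P}}$ reads as
\[
J_{\tilde{P}}(x_0) := J^{+}_{\tilde{K}}(\tilde{P}_v(x_0)) \, J_{\tilde{K}_s(\iota_{\tilde{R}}, \tilde{K}, \tilde{P})}(x_0), \tag{3.136}
\]
where $J^{+}_{\tilde{K}}(\tilde{P}_v(x_0)) \in \mathbb{R}^{d \times m_w}$ indicates the pseudoinverse of $J_{\tilde{K}}(\tilde{P}_v(x_0))$ possessing at
least a left inverse characteristic, i.e., $J^{+}_{\tilde{K}}(\tilde{P}_v(x_0)) \, J_{\tilde{K}}(\tilde{P}_v(x_0)) \equiv I$ with $I \in \mathbb{R}^{d \times d}$ being
the identity matrix. The definition of the pseudoinverse follows the definition in
(3.61).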
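Furthermore, by assuming some kind of sub-structure such as in (3.122) and by
assuming $Y_0 \subseteq \mathbb{R}^{m_w}$, $Y_1 \subseteq \mathbb{R}^{m_w}$, one can define an ideal affine map as a representative
of the map $\tilde{R}$, that is,
\[
\tilde{R} = \tilde{K}(\tilde{x}) \mapsto \tilde{R}(\tilde{K}(\tilde{x})) := K(x^{*}) +_{\mathbb{R}^{m_w}} S \bigl(\tilde{K}(\tilde{x}) - \tilde{K}(\iota_{\tilde{P}}(x^{*}))\bigr), \tag{3.137}
\]
where $S \in \mathbb{R}^{m_w \times m_w}$ denotes the ideal rotation matrix that is defined by the Jacobian
matrix $J_K(x^{*}) \in \mathbb{R}^{m_w \times d}$ w.r.t. the high-fidelity model $K$ and evaluated at its optimal
argument $x^{*}$ and by the Jacobian matrix $J_{\tilde{K}}(\iota_{\tilde{P}}(x^{*})) \in \mathbb{R}^{m_w \times d}$ w.r.t. the low-fidelity
model $\tilde{K}$ and evaluated at $\iota_{\tilde{P}}(x^{*})$.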
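Assuming that (3.118) adapted to the point $x^{*}$ holds to be true for the surrogate
model $\tilde{K}_s(\tilde{R}, \tilde{K}, \iota_{\tilde{P}})$ and utilizing a multivariate vector-valued version of the chain
rule, the ideal rotation matrix $S \equiv J_{\tilde{R}}(\tilde{K}(\iota_{\tilde{P}}(x^{*})))$ reads as
\[
S := J_K(x^{*}) \, J^{+}_{\tilde{K}}(\iota_{\tilde{P}}(x^{*})), \tag{3.138}
\]
where $J^{+}_{\tilde{K}}(\iota_{\tilde{P}}(x^{*})) \in \mathbb{R}^{d \times m_w}$ indicates the pseudoinverse of $J_{\tilde{K}}(\iota_{\tilde{P}}(x^{*}))$ possessing at
least a right inverse characteristic, i.e., $J_{\tilde{K}}(\iota_{\tilde{P}}(x^{*})) \, J^{+}_{\tilde{K}}(\iota_{\tilde{P}}(x^{*})) \equiv I$ with $I \in \mathbb{R}^{m_w \times m_w}$
being the identity matrix. Hence, the definition of the pseudoinverse in (3.61) is adapted
to the case
\[
J^{+}_{\tilde{K}}(\iota_{\tilde{P}}(x^{*})) := J_{\tilde{K}}(\iota_{\tilde{P}}(x^{*}))^{T}
\bigl( J_{\tilde{K}}(\iota_{\tilde{P}}(x^{*})) \, J_{\tilde{K}}(\iota_{\tilde{P}}(x^{*}))^{T} \bigr)^{-1}. \tag{3.139}
\]
Note that the attribute "ideal" reflects the fact that one cannot know a priori the
optimal solution regarding the high-fidelity model optimization problem. For more
details regarding the ideal map $\tilde{R}$ in (3.137), I refer to, e.g., [56, p. 44ff].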
From an algorithmic viewpoint (recall Figure 1.4 and recall § 2.3.3), the state-
ment in (3.124) constitutes the anchor point at the map level for any optimization
algorithm that follows the space-mapping paradigm.
Thus, one can articulate the essential aim of the corresponding iteration
procedures by
\[
x^{(k+1)} := \operatorname*{argmin}_{x \in X_0} \; (\hat{\hat{j}} \circ (\tilde{R} \circ_{Y_{10}} \tilde{K} \circ_{X_{01}} \tilde{P}))(x^{(k)}), \tag{3.140a}
\]
where $k \in \mathbb{N}$ and $x^{(k)}$ and $x^{(k+1)}$ denote the $k$-th iteration point and the $(k+1)$-th iteration
point, respectively, such that, with regard to some appropriate norm,
\[
x^{(k)} \to x^{*} \quad \text{as } k \to \infty, \tag{3.140b}
\]
where $x^{*}$ refers to an existing optimal solution of the high-fidelity optimization
problem in (3.116), more precisely,
\[
x^{*} \in \operatorname*{argmin}_{x \in X_0} \; (\hat{\hat{j}} \circ K)(x). \tag{3.140c}
\]
Analogous to (3.140c), one can refer to $\tilde{x}^{*}$ as an existing optimal solution of the
low-fidelity optimization problem in (3.117), more precisely,
\[
\tilde{x}^{*} \in \operatorname*{argmin}_{\tilde{x} \in X_1} \; (\hat{\hat{j}} \circ \tilde{K})(\tilde{x}). \tag{3.141}
\]
Observe that, in the scalar-valued use case and the vector-valued use case, it is
additionally supposed that $Z_0 \subseteq \mathbb{R}_+$ and $Z_1 \subseteq \mathbb{R}_+$ in (3.116) and in (3.117), respectively.
If we invoke an object $X_{11}$ such that there is some kind of substructure between $X_{11}$
and $X_1$, then one can utilize the inclusion map $\iota_{\tilde{P}}$ in (3.122a) in order to define a
preimage of $X_{11}$ under $\iota_{\tilde{P}}$ in the sense that
\[
\iota_{\tilde{P}}^{-1}(X_{11}) := \{ x \in X_0 \mid \iota_{\tilde{P}}(x) \in X_{11} \}. \tag{3.142}
\]
If we suppose that $\tilde{x}^{*} \in X_{11} \subset X_1$, then one can define a proposal for an initial iteration
point $x^{(0)}$ by
\[
\iota_{\tilde{P}}(x^{(0)}) := \tilde{x}^{*} \tag{3.143}
\]
such that $x^{(0)} \in \iota_{\tilde{P}}^{-1}(X_{11})$. Bear in mind that other initial iteration points are plausible,
too, since they are commonly problem dependent. However, the choice in (3.143)
appears heuristically as a promising decision in order to, hopefully, keep the total
number of iterations as low as possible.
Over the years, many different algorithms have been presented to
achieve the essential aim in (3.140). For elaborations on a large portion of
corresponding optimization algorithms, see, e.g., [49, ch. 3] or [125] and references
therein.
In the present work, though, solely a small subset of the large class of optimization
algorithms within the space-mapping paradigm is considered. Notice that those
algorithms are regarded as algorithms within the space-mapping paradigm that
conceptually distinguish a low-fidelity model $\tilde{K}$ and a surrogate model $K_s^{\tilde{R}, \tilde{K}, \tilde{P}}$ in (3.121)
at the function level (recall Figure 1.4).
From the small subset of algorithms under consideration, let us focus primarily
on a Trust Region Aggressive Space Mapping (TRASM) algorithm which assumes the
surrogate model $\tilde{K}_s(\iota_{\tilde{R}}, \tilde{K}, \tilde{P})$ that is described with regard to (3.136).
Mind that the present work's version of the TRASM algorithm, that is,
algorithm 3.1, builds upon the discussion in [95] and in [49, p. 67ff] and extends their
considerations by the context of a more general set of admissible solutions and the
context of the Julia PL. A main novel use case of the TRASM algorithm 3.1 of the
present work is its combination with a co-kriging low-fidelity model that we
encounter in the next subsection.
Similarly to the proposed procedures SBO-DFLF and SBO-SPLF, the TRASM
algorithm 3.1 builds upon the common canon of optimization algorithms (recall § 2.3.3).
The TRASM algorithm's basic building blocks are well covered by the theory of
trust-region methods within nonlinear optimization (see, e.g., [158, ch. 4]). An essential
overriding motivation of invoking a trust-region scaffolding for an optimization
algorithm is to equip the algorithm with good global convergence guarantees, that
is, to ensure that the iterates generated from any remote starting point will eventually
converge, in the unconstrained case, to a stationary accumulation point, and, in the
constrained case, to a KKT point where a KKT point is conceived as a stationary
accumulation point that satisfies the KKT conditions (see § 2.3.1). Regarding the
experimental rate of convergence, the correspondingly equipped optimization algorithms
exhibit satisfactory practical performance.
Let us assume some additional structure w.r.t. the domains and codomains of
the respective models; more precisely, let us suppose some kind of topological vector
space structure equipped with some metric structure, some norm structure, and
some inner product structure. Hence, for the sake of simplicity, it is presupposed that
the above-mentioned vector-valued use case incorporates all the structure needed.
Furthermore, it is supposed that $\tilde{K}_s(\iota_{\tilde{R}}, \tilde{K}, \tilde{P}) \equiv \tilde{K}_s(\mathrm{id}_Y, \tilde{K}, \tilde{P})$ w.r.t. (3.122).
Let us define the $(k+1)$-th iteration point $x^{(k+1)}$ as
\[
x^{(k+1)} := x^{(k)} + h^{(k)}, \tag{3.144}
\]
where $h^{(k)} \in X_0 \subseteq \mathbb{R}^d$ denotes the $k$-th step from $x^{(k)}$ to $x^{(k+1)}$ in which the step's
direction and the step's length are encoded.
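The definition in (3.135) is adapted in the sense that the identification $(x - x_0) := h^{(k)}$
is made, it is set that $\tilde{P}(x_0) := \tilde{P}(x^{(k)})$ and $\tilde{P}_v(x_0) := \tilde{P}_v(x^{(k)})$ such that
$\tilde{P}(x^{(k)}) := \tilde{P}_v(x^{(k)})$; and, finally, the Jacobi matrix $J_{\tilde{P}}(x_0)$ is approximated by means of
Broyden's method for solving nonlinear equations (see, e.g., [158, p. 279–283]), i.e.,
\[
J_{\tilde{P}}(x_0) := B^{(k)}, \quad B^{(0)} := I, \quad
B^{(k+1)} := B^{(k)} +_{\mathbb{R}^{d \times d}}
\frac{\bigl(y^{(k)}_{\tilde{P}_v} -_{\mathbb{R}^d} B^{(k)} h^{(k)}\bigr) \otimes h^{(k)}}{\|h^{(k)}\|_{l_2}^2}, \tag{3.145}
\]
where $B^{(k)} \in \mathbb{R}^{d \times d}$ is referred to as the $k$-th iteration Broyden's matrix, the map $\otimes$ is
conceived as the outer product w.r.t. two column vectors and the map $\otimes$ is granted
a higher precedence than the map $+_{\mathbb{R}^{d \times d}}$, $I \in \mathbb{R}^{d \times d}$ is the identity matrix as a
representative of a non-singular matrix and $y^{(k)}_{\tilde{P}_v} \in \mathbb{R}^d$ denotes the change in $\tilde{P}_v$ in (3.134)
w.r.t. the step $h^{(k)}$, that is,
\[
y^{(k)}_{\tilde{P}_v} := \tilde{P}_v(x^{(k+1)}) - \tilde{P}_v(x^{(k)}). \tag{3.146}
\]
Hence, the adaptation of the definition in (3.135) reads as
\[
\tilde{P} = x^{(k+1)} \mapsto \tilde{P}(x^{(k+1)}) := \tilde{P}_v(x^{(k)}) + B^{(k)} h^{(k)}. \tag{3.147}
\]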
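A rank-one update of Broyden type can be sketched without any linear algebra library. The sketch below uses the standard secant form of Broyden's method as found in the nonlinear-equations literature, i.e., the correction is built from the residual $y - Bh$ so that the updated matrix maps the step $h$ exactly onto $y$. Matrices are stored as lists of rows, and the numerical values are hypothetical.

```python
def broyden_update(B, h, y):
    """Return B + ((y - B h) outer h) / ||h||^2 for a square matrix B (list of rows)."""
    d = len(h)
    Bh = [sum(B[i][j] * h[j] for j in range(d)) for i in range(d)]
    residual = [y[i] - Bh[i] for i in range(d)]
    h_norm_sq = sum(hj * hj for hj in h)
    if h_norm_sq == 0.0:
        raise ValueError("the step h must be nonzero")
    return [[B[i][j] + residual[i] * h[j] / h_norm_sq for j in range(d)]
            for i in range(d)]


B0 = [[1.0, 0.0], [0.0, 1.0]]   # B^(0) := I
h0 = [0.5, -0.25]               # hypothetical step h^(0)
y0 = [0.4, -0.1]                # hypothetical change of P_v along h^(0)
B1 = broyden_update(B0, h0, y0)

# Secant condition: applying the updated matrix to the step reproduces y0
# up to round-off.
B1_h0 = [sum(B1[i][j] * h0[j] for j in range(2)) for i in range(2)]
```

The update changes the matrix only along the direction of the step, which is why a single additional evaluation of $\tilde{P}_v$ per iteration suffices to maintain the approximation.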
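By moving from the definition in (3.135) to its adaptation in (3.147), one can
obtain the $k$-th step $h^{(k)}$ by solving the trust-region optimization sub-problem
regarding (3.140) in which the corresponding model function is set to be the surrogate
model $\tilde{K}_s(\mathrm{id}_{Y_0}, \tilde{K}, \tilde{P}) \equiv \mathrm{id}_{Y_0} \circ_{Y_{10}} \tilde{K} \circ_{X_{01}} \tilde{P}$ w.r.t. $h^{(k)}$, that is,
\[
\operatorname*{min.}_{h^{(k)} \in F_0^{(k)}} \; (\hat{\hat{j}} \circ (\mathrm{id}_{Y_0} \circ_{Y_{10}} \tilde{K} \circ_{X_{01}} \tilde{P}))(x^{(k)} + h^{(k)})
\;\overset{(3.147)}{\equiv}\;
\operatorname*{min.}_{h^{(k)} \in F_0^{(k)}} \; (\hat{\hat{j}} \circ (\mathrm{id}_{Y_0} \circ_{Y_{10}} \tilde{K}))(\tilde{P}_v(x^{(k)}) + B^{(k)} h^{(k)}), \tag{3.148}
\]
where $F_0^{(k)} \subseteq \mathbb{R}^d$ incorporates the inequality constraint function $c_1$ that reads as
\[
c_1 = h^{(k)} \mapsto c_1(h^{(k)}) := \|D h^{(k)}\|_{l_2} - \Delta^{(k)} \;\colon\; X_0 \to \mathbb{R}_{-}, \tag{3.149}
\]
with $\forall h^{(k)} .\; c_1(h^{(k)}) \leq 0$ and $\|D h^{(k)}\|_{l_2}$ being the trust-region in the $l_2$-norm and $\Delta^{(k)} \in \mathbb{R}_+$
being the trust-region radius and $D \in \mathbb{R}^{d \times d}$ being a diagonal matrix with positive
entries that enables scaling in order to potentially enhance the solving process (cf. [158,
p. 95ff]).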
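Choosing the trust-region radius $\Delta^{(k)}$ adequately in each iteration is an important
part of an optimization algorithm based on the trust-region theory (cf. [158, p. 68]).
In this context, let us define the trust-region reduction quotient $\rho^{(k)} \in \mathbb{R}$, that is,
\[
\rho^{(k)} := \frac{\mathrm{ared}(x^{(k)}, h^{(k)})}{\mathrm{pred}(x^{(k)}, h^{(k)})}, \tag{3.150}
\]
where $\mathrm{ared} \colon Z_0^{X_0 \times X_0}$ denotes the actual reduction function and $\mathrm{pred} \colon Z_0^{X_0 \times X_0}$ denotes
the predicted reduction function whose assignment rules read as
\[
\mathrm{ared}(x^{(k)}, h^{(k)}) := (\hat{\hat{j}} \circ (\mathrm{id}_{Y_0} \circ_{Y_{10}} \tilde{K} \circ_{X_{01}} \tilde{P}))(x^{(k)})
- (\hat{\hat{j}} \circ (\mathrm{id}_{Y_0} \circ_{Y_{10}} \tilde{K} \circ_{X_{01}} \tilde{P}))(x^{(k)} + h^{(k)}), \tag{3.151a}
\]
\[
\mathrm{pred}(x^{(k)}, h^{(k)}) := (\hat{\hat{j}} \circ (\mathrm{id}_{Y_0} \circ_{Y_{10}} \tilde{K} \circ_{X_{01}} \tilde{P}))(x^{(k)})
- (\hat{\hat{j}} \circ (\mathrm{id}_{Y_0} \circ_{Y_{10}} \tilde{K}))(\tilde{P}_v(x^{(k)}) + B^{(k)} h^{(k)}). \tag{3.151b}
\]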
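For the sake of the implementation of the TRASM algorithm 3.1, let us overload
straightforwardly the function signature of ared by $Z_0^{X_1 \times X_1}$ and the function
signature of pred by $Z_0^{X_1 \times X_0}$ such that the assignment rules read as
\[
\mathrm{ared}(\tilde{x}^{(k)}, \tilde{x}^{(k+1)}) := (\hat{\hat{j}} \circ (\mathrm{id}_{Y_0} \circ_{Y_{10}} \tilde{K}))(\tilde{x}^{(k)})
- (\hat{\hat{j}} \circ (\mathrm{id}_{Y_0} \circ_{Y_{10}} \tilde{K}))(\tilde{x}^{(k+1)}), \tag{3.152a}
\]
\[
\mathrm{pred}(\tilde{x}^{(k)}, h^{(k)}) := (\hat{\hat{j}} \circ (\mathrm{id}_{Y_0} \circ_{Y_{10}} \tilde{K}))(\tilde{x}^{(k)})
- (\hat{\hat{j}} \circ (\mathrm{id}_{Y_0} \circ_{Y_{10}} \tilde{K}))(\tilde{x}^{(k)} + B^{(k)} h^{(k)}), \tag{3.152b}
\]
where $\tilde{x}^{(k)} := \tilde{P}_v(x^{(k)})$ and $\tilde{x}^{(k+1)} := \tilde{P}_v(x^{(k)} + h^{(k)})$.
Note that $\rho^{(k)}$ in (3.150) quantifies the degree of justification for the identification
in (3.148) since the genuine nature of the statement in (3.147) and the statement
in (3.135), respectively, is approximate instead of exact.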
Given a trust-region reduction threshold $\eta_1 \in \,]0,1[$ and a trust-region reduction
threshold $\eta_2 \in \,]0,1[$ with $\forall \eta_1, \eta_2 .\; \eta_1 < \eta_2$, a trust-region reduction factor $\gamma \in \,]0,1[$, and a
trust-region augmentation factor $\zeta \in \,]1,\infty[$, let us distinguish three cases regarding the
$(k+1)$-th iteration trust-region radius $\Delta^{(k+1)}$, i.e.,
\[
\Delta^{(k+1)} :=
\begin{cases}
\gamma \Delta^{(k)}, & \text{if } \rho^{(k)} < \eta_1, \\
\Delta^{(k)}, & \text{if } \eta_1 \leq \rho^{(k)} < \eta_2, \\
\zeta \Delta^{(k)}, & \text{if } \rho^{(k)} \geq \eta_2.
\end{cases}
\tag{3.153}
\]
Thus, depending on the value of $\rho^{(k)}$, the trust-region radius is decreased or
unchanged or increased. Furthermore, if the trust-region radius is decreased, then
the $(k+1)$-th iteration point $x^{(k+1)}$ in (3.144) and the $(k+1)$-th iteration Broyden's
matrix $B^{(k+1)}$ in (3.145) are not accepted, more precisely, $x^{(k+1)} := x^{(k)}$ and $B^{(k+1)} := B^{(k)}$;
otherwise they are accepted. For other strategies of updating $\Delta^{(k+1)}$, see, e.g., [49,
p. 68] or [158, p. 69].
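The three cases of the radius update, together with the acceptance rule for the step, can be sketched as a small function. The function names and the numerical parameter values in the usage below are illustrative assumptions; the thresholds and factors are validated against the intervals stated above.

```python
def update_trust_region(rho, delta, eta1, eta2, gamma, zeta):
    """Return (new_radius, step_accepted) following the three cases in (3.153)."""
    if not (0.0 < eta1 < eta2 < 1.0 and 0.0 < gamma < 1.0 < zeta):
        raise ValueError("expected 0 < eta1 < eta2 < 1 and 0 < gamma < 1 < zeta")
    if rho < eta1:
        return gamma * delta, False   # poor agreement: shrink, reject the step
    if rho < eta2:
        return delta, True            # acceptable agreement: keep the radius
    return zeta * delta, True         # good agreement: enlarge the radius


def reduction_quotient(ared, pred):
    """rho = ared / pred as in (3.150); pred is assumed to be nonzero."""
    return ared / pred
```

For instance, with the hypothetical values `eta1=0.1`, `eta2=0.75`, `gamma=0.5`, `zeta=2.0` and a current radius of `1.0`, a quotient of `0.05` halves the radius and rejects the step, whereas a quotient of `0.9` doubles the radius and accepts the step.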
Mind that $F_0^{(k)}$ in (3.148) contingently also incorporates some other constraints
(recall § 2.3.2) such as, e.g., box constraints, inherited from the high-fidelity
model's domain. Hence, the corresponding inequality constraint functions $c_2$
and $c_3$ read as
\[
c_2 = h^{(k)} \mapsto c_2(h^{(k)}) := x_l - (x^{(k)} + h^{(k)}) \;\colon\; X_0 \to X_0, \tag{3.154a}
\]
\[
c_3 = h^{(k)} \mapsto c_3(h^{(k)}) := (x^{(k)} + h^{(k)}) - x_u \;\colon\; X_0 \to X_0, \tag{3.154b}
\]
where $\forall h^{(k)} .\; c_2(h^{(k)}) \leq 0$, $\forall h^{(k)} .\; c_3(h^{(k)}) \leq 0$, and $x_l \in X_0 \subseteq \mathbb{R}^d$ includes the lower
bounds w.r.t. $x$, and $x_u \in X_0 \subseteq \mathbb{R}^d$ includes the upper bounds w.r.t. $x$. Note that, e.g., in the
case of box constraints, it could be useful to change the norm from the $l_2$- to the $l_\infty$-
or to the $l_1$-norm (cf. [158, p. 97]).
Given an evaluated quantity of interest which is supposed to be computation-
ally expensive, if we incorporate this quantity in the constraints (recall § 2.3.1 and
§2.3.3), then, from an application-driven viewpoint, it might be advisable to replace
this quantity by a low-fidelity model as well. Depending on the computational costs
of a simplified-physics low-fidelity model in terms of memory storage and evalu-
ation time, it can be more favorable to invoke a data-fit low-fidelity model than a
simplified-physics low-fidelity model.
Finally, the $k$-th step $h^{(k)}$ in (3.144) is computed by
\[
h^{(k)} := \operatorname*{argmin}_{h \in F_0^{(k)}} \; (\hat{\hat{j}} \circ (\mathrm{id}_{Y_0} \circ_{Y_{10}} \tilde{K}))(\tilde{P}_v(x^{(k)}) + B^{(k)} h).^{28} \tag{3.155}
\]
In order to translate the basic building blocks of the TRASM algorithm into a
practical implementation, i.e., into a program (recall Figure 1.4), one has to provide
at least one termination criterion. Mind that applying proper termination criteria for
practical purposes is an intricate endeavor.
Let us consider the maximal number $k_{\max} \in \mathbb{N}$ of iteration points in (3.144) as one
termination criterion, more precisely, the condition
\[
\forall k .\; k < k_{\max} \tag{3.156}
\]
has to be true, otherwise the algorithm terminates.
Furthermore, let us consider the norm of the $k$-th step $h^{(k)}$ relative to the norm of
the $k$-th iteration point $x^{(k)}$ in (3.144) such that
\[
\frac{\|x^{(k+1)} - x^{(k)}\|_{l_2}}{\|x^{(k)}\|_{l_2}} \equiv \frac{\|h^{(k)}\|_{l_2}}{\|x^{(k)}\|_{l_2}}. \tag{3.157}
\]
Given an absolute threshold w.r.t. the norm of the step $h^{(k)}$, i.e., $e_{\mathrm{abs}} \in \,]0,1[$, and
a relative threshold w.r.t. the norm of the step $h^{(k)}$, i.e., $e_{\mathrm{rel}} \in \,]0,1[$, where it is
demanded that $\forall e_{\mathrm{abs}}, e_{\mathrm{rel}} .\; e_{\mathrm{abs}} < e_{\mathrm{rel}}$, then one can reformulate (3.157) such that
\[
\forall x^{(k)}, h^{(k)} .\; \frac{\|h^{(k)}\|_{l_2}}{\|x^{(k)}\|_{l_2}} > e_{\mathrm{rel}} + \frac{e_{\mathrm{abs}}}{\|x^{(k)}\|_{l_2}} \tag{3.158}
\]
28 The definition of the $k$-th step $h^{(k)}$ in (3.155) aligns itself with the approach in [95]. However,
in [56, p. 18f] and in [49, p. 68f], another approach for computing $h^{(k)}$ is discussed within the context
of the Levenberg-Marquardt method for least-squares problems (see, e.g., [158, p. 258–262]). Hence,
$h^{(k)}$ is determined by solving the linear system of equations $((B^{(k)})^{T} B^{(k)} + \lambda I) h^{(k)} = -(B^{(k)})^{T} e^{(k)}$
with $e^{(k)} := \tilde{P}_v(x^{(k)}) - \tilde{x}^{*}$ where $\tilde{x}^{*}$ refers to an existing optimal solution of the low-fidelity optimization
problem and $\lambda \in \mathbb{R}_+$ plays a similar role as the regularization parameter w.r.t. (3.65). For more details
on this approach for computing $h^{(k)}$, I refer to, e.g., [56, p. 18f], [49, p. 68f], [158, p. 69ff], and [158,
p. 258–262] and references therein.
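Algorithm 3.1: Trust Region Aggressive Space Mapping (TRASM)
# Input:
$x^{(0)}$ # $\in \mathbb{R}^d$ ... initial solution # (3.143)
$B^{(0)}$ # $\in \mathbb{R}^{d \times d}$ ... initial Broyden's matrix # (3.145)
$\Delta^{(0)}$ # $\in \mathbb{R}_+$ ... initial trust-region radius # (3.153)
$\eta_1, \eta_2, \gamma, \zeta$ # $\in \,]0,1[ \,\times\, ]0,1[ \,\times\, ]0,1[ \,\times\, ]1,\infty[$ # (3.153)
$F_0^{(0)}$ # $\subseteq \mathbb{R}^d$ ... initial set of admissible solutions # (3.149) # (3.154)
$k_{\max}$ # $\in \,]0,100]$ ... maximal number of iterations # (3.156)
$e_{\mathrm{abs}}$ # $\in \,]0,1[$ ... absolute threshold w.r.t. the norm of the step $h^{(k)}$ # (3.158)
$e_{\mathrm{rel}}$ # $\in \,]0,1[$ ... relative threshold w.r.t. the norm of the step $h^{(k)}$ # (3.158)
# Output:
$x^{(k+1)}$ # $\in \mathbb{R}^d$ ... optimal solution after $k+1$ iterations
for $k$ in $0:1:k_{\max}$ # (3.156)
    $\tilde{x}^{(k)} := \tilde{P}_v(x^{(k)})$ # (3.134) # a high-fidelity model evaluation
    $h^{(k)} := \operatorname*{argmin}_{h \in F_0^{(k)}} (\hat{\hat{j}} \circ (\mathrm{id}_{Y_0} \circ_{Y_{10}} \tilde{K}))(\tilde{x}^{(k)} + B^{(k)} h)$ # (3.155)
    $\tilde{x}^{(k+1)} := \tilde{P}_v(x^{(k)} + h^{(k)})$ # (3.134) # a high-fidelity model evaluation
    $\rho^{(k)} := \mathrm{ared}(\tilde{x}^{(k)}, \tilde{x}^{(k+1)}) \,/\, \mathrm{pred}(\tilde{x}^{(k)}, h^{(k)})$ # (3.152)
    if $\rho^{(k)} < \eta_1$ # (3.153)
        $x^{(k+1)} := x^{(k)}$
        $B^{(k+1)} := B^{(k)}$
        $\Delta^{(k+1)} := \gamma \Delta^{(k)}$
    elseif $\eta_1 \leq \rho^{(k)}$ and $\rho^{(k)} < \eta_2$ # (3.153)
        $x^{(k+1)} := x^{(k)} + h^{(k)}$ # (3.144)
        $y^{(k)}_{\tilde{P}_v} := \tilde{x}^{(k+1)} - \tilde{x}^{(k)}$ # (3.146)
        $B^{(k+1)} := B^{(k)} +_{\mathbb{R}^{d \times d}} \bigl(y^{(k)}_{\tilde{P}_v} -_{\mathbb{R}^d} B^{(k)} h^{(k)}\bigr) \otimes h^{(k)} \,/\, \|h^{(k)}\|_{l_2}^2$ # (3.145)
        $\Delta^{(k+1)} := \Delta^{(k)}$
    else # (3.153)
        $x^{(k+1)} := x^{(k)} + h^{(k)}$ # (3.144)
        $y^{(k)}_{\tilde{P}_v} := \tilde{x}^{(k+1)} - \tilde{x}^{(k)}$ # (3.146)
        $B^{(k+1)} := B^{(k)} +_{\mathbb{R}^{d \times d}} \bigl(y^{(k)}_{\tilde{P}_v} -_{\mathbb{R}^d} B^{(k)} h^{(k)}\bigr) \otimes h^{(k)} \,/\, \|h^{(k)}\|_{l_2}^2$ # (3.145)
        $\Delta^{(k+1)} := \zeta \Delta^{(k)}$
    end # if
    if $\|h^{(k)}\|_{l_2} / \|x^{(k)}\|_{l_2} \leq e_{\mathrm{rel}} + e_{\mathrm{abs}} / \|x^{(k)}\|_{l_2}$ # (3.158)
        break
    end # if
end # for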
has to be true, otherwise the algorithm terminates. Technically, the termination
criterion in (3.158) consists of a combination of an absolute termination criterion and
a relative termination criterion.
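The combination of the iteration limit and the step-size criterion can be sketched as a single predicate. The sketch multiplies the step-size inequality by the norm of the current iteration point, which is an equivalent reformulation for nonzero iteration points and additionally avoids a division by zero at the origin. The function name and the argument names are assumptions.

```python
import math


def keep_iterating(x, h, k, k_max, e_abs, e_rel):
    """True while both (3.156) and (3.158) hold; x and h are sequences of floats."""
    norm_x = math.sqrt(sum(v * v for v in x))
    norm_h = math.sqrt(sum(v * v for v in h))
    # (3.158) multiplied by ||x||: ||h|| > e_rel * ||x|| + e_abs.
    step_is_large = norm_h > e_rel * norm_x + e_abs
    return k < k_max and step_is_large
```

For iteration points of large norm the relative threshold dominates, whereas near the origin the absolute threshold takes over, which is the practical motivation for combining the two.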
Optionally, as an additional safeguard, one can incorporate the norm of the
evaluated actual reduction function in (3.150) regarding an absolute threshold and a
relative threshold. If, in some way, the gradient information concerning
$(\hat{\hat{j}} \circ (\mathrm{id}_{Y_0} \circ_{Y_{10}} \tilde{K} \circ_{X_{01}} \tilde{P}))(x^{(k)})$ or $(\hat{\hat{j}} \circ (\mathrm{id}_{Y_0} \circ_{Y_{10}} \tilde{K}))(\tilde{x}^{(k)})$ is available, then the
gradient information can be utilized for a termination criterion as well.
Finally, let us choose an intermediate level between a pseudocode and a code
from a programming language in industry (cf. the Listing 3.1) in order to represent
the TRASM algorithm 3.1.
For the sake of completeness, let us discuss briefly the basic building blocks of the
Manifold Mapping (MM) algorithm which utilizes the surrogate model $\tilde{K}_s(\tilde{R}, \tilde{K}, \iota_{\tilde{P}})$.
The corresponding family of algorithms is based upon theoretical considerations
that, in the multivariate vector-valued use case, focus particularly on the situation in
which $\dim(X_0) < \dim(Y_0)$ with $\dim(X_0) := d$ and $\dim(Y_0) := m_w$ (cf. [56, p. 44]). The
elaborations are built upon the discussion in [56, p. 43–48] and in [49, p. 67ff].
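Adapting the constructions in (3.137), in (3.138), and in (3.139), providing
a desired high-fidelity function value $K_d \in \mathbb{R}^{m_w}$ (cf. $y_d$ in (2.31) and $Q_d$ in (2.33)) as
well as an initial iteration point $x^{(0)}$ such as in (3.143), and setting the initial iteration
matrix $S^{(0)} := I$ and the initial iteration matrix $T^{(0)} := S^{(0)}$ with $I \in \mathbb{R}^{m_w \times m_w}$ being the
identity matrix, one can concretely define an update scheme for $x^{(k+1)}$ by
\[
y^{(k)} := (\iota_{\tilde{R}} \circ_{Y_{10}} \tilde{K} \circ_{X_{01}} \iota_{\tilde{P}})(x^{(k)}) - T^{(k)} (K(x^{(k)}) - K_d), \tag{3.159a}
\]
\[
x^{(k+1)} := \operatorname*{argmin}_{x \in X_0} \; \|(\iota_{\tilde{R}} \circ_{Y_{10}} \tilde{K} \circ_{X_{01}} \iota_{\tilde{P}})(x) - y^{(k)}\|_{l_2}^2, \tag{3.159b}
\]
\[
D_K^{(k+1)} := \bigl[ K(x^{(k+1)}) - K(x^{(k)}), \ldots, K(x^{(k+1)}) - K(x^{(\max(k+1-d,0))}) \bigr], \tag{3.159c}
\]
\[
D_{\tilde{K}}^{(k+1)} := \bigl[ (\iota_{\tilde{R}} \circ_{Y_{10}} \tilde{K} \circ_{X_{01}} \iota_{\tilde{P}})(x^{(k+1)}) - (\iota_{\tilde{R}} \circ_{Y_{10}} \tilde{K} \circ_{X_{01}} \iota_{\tilde{P}})(x^{(k)}), \ldots,
(\iota_{\tilde{R}} \circ_{Y_{10}} \tilde{K} \circ_{X_{01}} \iota_{\tilde{P}})(x^{(k+1)}) - (\iota_{\tilde{R}} \circ_{Y_{10}} \tilde{K} \circ_{X_{01}} \iota_{\tilde{P}})(x^{(\max(k+1-d,0))}) \bigr], \tag{3.159d}
\]
\[
[U_K^{(k+1)}, \Sigma_K^{(k+1)}, V_K^{(k+1)}] := \operatorname{svd}(D_K^{(k+1)}), \tag{3.159e}
\]
\[
[U_{\tilde{K}}^{(k+1)}, \Sigma_{\tilde{K}}^{(k+1)}, V_{\tilde{K}}^{(k+1)}] := \operatorname{svd}(D_{\tilde{K}}^{(k+1)}), \tag{3.159f}
\]
\[
A^{(k+1)} := \bigl( I - U_K^{(k+1)} (U_K^{(k+1)})^{T} \bigr), \tag{3.159g}
\]
\[
S^{(k+1)} := D_K^{(k+1)} (D_{\tilde{K}}^{(k+1)})^{+} + A^{(k+1)} \bigl( I - U_{\tilde{K}}^{(k+1)} (U_{\tilde{K}}^{(k+1)})^{T} \bigr), \tag{3.159h}
\]
\[
T^{(k+1)} := (S^{(k+1)})^{+}, \tag{3.159i}
\]
where the matrix $A^{(k+1)} \in \mathbb{R}^{m_w \times m_w}$ serves as a potential stabilizer for the MM
algorithm (cf. [56, p. 46]), and the operation svd refers to the singular value
decomposition method, i.e., the operation svd performs the corresponding factorization for a
given matrix such that
\[
D_K^{(k+1)} := U_K^{(k+1)} \Sigma_K^{(k+1)} (V_K^{(k+1)})^{T}, \tag{3.160}
\]
\[
D_{\tilde{K}}^{(k+1)} := U_{\tilde{K}}^{(k+1)} \Sigma_{\tilde{K}}^{(k+1)} (V_{\tilde{K}}^{(k+1)})^{T}, \tag{3.161}
\]
with $U_K^{(k+1)}, U_{\tilde{K}}^{(k+1)} \in \mathbb{R}^{m_w \times m_w}$, $\Sigma_K^{(k+1)}, \Sigma_{\tilde{K}}^{(k+1)} \in \mathbb{R}^{m_w \times d}$, and $V_K^{(k+1)}, V_{\tilde{K}}^{(k+1)} \in \mathbb{R}^{d \times d}$. Note
that the matrix $D_K^{(k+1)} \in \mathbb{R}^{m_w \times d}$ in (3.159c) and the matrix $D_{\tilde{K}}^{(k+1)} \in \mathbb{R}^{m_w \times d}$ in (3.159d)
can be equivalently determined by
\[
\forall j \in \{1, \ldots, \min(\dim(X_0), k+1)\} .\; \forall i \in \{1, \ldots, \dim(Y_0)\} .
\]
\[
[d_{i,j}]_K^{(k+1)} = K(x^{(k+1)}) - K(x^{(k+1-j)}), \tag{3.162}
\]
\[
[d_{i,j}]_{\tilde{K}}^{(k+1)} = (\iota_{\tilde{R}} \circ_{Y_{10}} \tilde{K} \circ_{X_{01}} \iota_{\tilde{P}})(x^{(k+1)}) - (\iota_{\tilde{R}} \circ_{Y_{10}} \tilde{K} \circ_{X_{01}} \iota_{\tilde{P}})(x^{(k+1-j)}). \tag{3.163}
\]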
Observe that the definition of the matrix $D_K^{(k+1)}$ and the definition of the matrix $D_{\tilde{K}}^{(k+1)}$
reveal that, at the program level (recall Figure 1.4), there is some kind of
allocation-sensitive bookkeeping necessary for constructing $D_K^{(k+1)}$ and $D_{\tilde{K}}^{(k+1)}$.
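One possible way to organize this bookkeeping is a bounded buffer that retains only the most recent model responses. The following sketch is an assumption-laden illustration, not the implementation of the present work: responses are plain lists of length $m_w$, the difference matrix is stored as a list of its columns, and the numerical values are hypothetical.

```python
from collections import deque


def difference_columns(history):
    """Columns K(x^(k+1)) - K(x^(k+1-j)) for j = 1, ..., len(history) - 1.

    `history` holds model responses ordered from oldest to newest.
    """
    newest = history[-1]
    older = list(history)[:-1]
    return [[a - b for a, b in zip(newest, past)] for past in reversed(older)]


d = 2                            # dim(X_0), hypothetical
history = deque(maxlen=d + 1)    # retains only the d + 1 most recent responses
for response in ([1.0, 1.0, 1.0], [2.0, 1.0, 0.0], [4.0, 1.0, 1.0], [7.0, 0.0, 1.0]):
    history.append(response)     # the oldest entry is discarded automatically

columns = difference_columns(history)
```

After the four appended responses, the buffer holds the last three, and `columns` contains the two difference vectors `[3.0, -1.0, 0.0]` and `[5.0, -1.0, 1.0]`. The same buffer type can be reused for the low-fidelity responses.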
Proper termination criteria for the MM algorithm can be defined analogously to
the TRASM algorithm 3.1. Furthermore, in [124], the authors propose some heuristics
in order to extend the basic building blocks of the MM algorithm in (3.159) by a
trust-region framework similar to the one in the TRASM algorithm 3.1.
At the algorithm level (recall Figure 1.4), though, there are two notable differences
between the MM algorithm and the TRASM algorithm 3.1. Firstly, the MM
algorithm does not rely on solving an additional optimization problem such as for
determining $\tilde{P}_v(x)$ in (3.134), but it relies on performing two singular value
decompositions in a computationally efficient manner. And, secondly, the MM algorithm
exploits one high-fidelity model evaluation in the $(k+1)$-th iteration instead of two
high-fidelity model evaluations in the TRASM algorithm 3.1.
Building upon investigations of algorithms of the Space Mapping (SM) kind, such as
the TRASM algorithm 3.1, which utilize the surrogate model $\tilde{K}_s(\iota_{\tilde{R}}, \tilde{K}, \tilde{P})$, and
investigations of algorithms of the Manifold Mapping (MM) kind, which utilize the surrogate
model $\tilde{K}_s(\tilde{R}, \tilde{K}, \iota_{\tilde{P}})$, the author in [49, p. 112ff] proposes the Response and
Parameter Mapping (RPM) algorithm, which utilizes the surrogate model $\tilde{K}_s(\tilde{R}, \tilde{K}, \tilde{P})$ and in
which basic building blocks from SM algorithms and MM algorithms are combined.
The RPM algorithm’s additional computational overhead compared to an SM algorithm
or an MM algorithm is justified by a hopefully higher accuracy.
In the RPM algorithm, the update scheme of the matrix $B^{(k+1)}$ in (3.145) is performed
in a manner similar to the update scheme of the matrix $S^{(k+1)}$ in (3.159h),
which could lead to an interesting adaptation of the TRASM algorithm 3.1. However,
the data available for a broad overall assessment of the RPM
algorithm in terms of, e.g., accuracy, speed, or convergence are very limited. Hence,
it is presumed that the scope of the benefit of a corresponding adaptation of the
TRASM algorithm 3.1 may be limited as well.
Observing the different kinds of algorithms, one can generally discern that
the Achilles’ heel, i.e., the spot that requires the most attention, of all the
optimization algorithms within the space-mapping paradigm is the need for a benign
resemblance of the high-fidelity model and the low-fidelity model (cf. § 3.1.3).
However, due to their dependence on properties of the low-fidelity model $\tilde{K}$, the
surrogate model $K_s^{\tilde{R},\tilde{K},\tilde{P}}$, and the high-fidelity model $K$, the convergence analysis of
the corresponding iteration procedures for (3.140) is fairly intricate (see, e.g., [49,
p. 76–84]).
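If we invoke the NREGE $e^{NR}_{H,sg}(\hat{Q}_{\boldsymbol{\xi}})$ in (3.24) and the SSPCC $r^{2}_{\hat{y}\tilde{\hat{y}}}$ in (3.31) w.r.t. the
low-fidelity model and if we adapt the NREGE and the SSPCC w.r.t. the surrogate
model, i.e., if we define ad hoc $e^{NR}_{H,sg,K_s^{\tilde{R},\tilde{K},\tilde{P}}}(\hat{Q}_{\boldsymbol{\xi}})$ and $r^{2}_{\hat{y}\tilde{\hat{y}},K_s^{\tilde{R},\tilde{K},\tilde{P}}}$ by replacing the
low-fidelity model in (3.24) and in (3.31) with the surrogate model, then, driven by
heuristics, it is presumably reasonable to assert formally that, with regard to some
appropriate norms,
\begin{align}
&\forall K_s^{\tilde{R},\tilde{K},\tilde{P}}.\ \forall \tilde{K}.\ \bigl(e^{NR}_{H,sg}(\hat{Q}_{\boldsymbol{\xi}}) \to 0 \wedge r^{2}_{\hat{y}\tilde{\hat{y}}} \to 1\bigr) \wedge \bigl(e^{NR}_{H,sg,K_s^{\tilde{R},\tilde{K},\tilde{P}}}(\hat{Q}_{\boldsymbol{\xi}}) \to 0 \wedge r^{2}_{\hat{y}\tilde{\hat{y}},K_s^{\tilde{R},\tilde{K},\tilde{P}}} \to 1\bigr) \notag\\
&\quad \text{as } m \to \infty \text{ and } k \to \infty \notag\\
&\implies x^{(k)} \to x^{*} \text{ as } k \to \infty \text{ w.r.t. (3.140a)}, \tag{3.164}
\end{align}
where $m$ denotes the number of sampling plan points such as in (3.35).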
Similarly to the comments on (3.35), note that, from an application-driven viewpoint,
we are drawn towards the pre-asymptotic behavior rather than towards the
asymptotic behavior. Furthermore, note that the need for convergence w.r.t. both
the low-fidelity model and the surrogate model becomes comprehensible if we plug in the
high-fidelity model as the low-fidelity model, i.e., $K \equiv \tilde{K}$, in (3.164). In this case, one
observes convergence w.r.t. the low-fidelity model by definition, but one could still
observe non-convergence w.r.t. the surrogate model due to an improper choice of
correction maps.
Chiefly, the statement in (3.164) is a novel attempt to express formally the
above-mentioned intricateness of a convergence analysis of iteration procedures associated
with (3.140).
A practical value of the statement in (3.164) resides in the issue concerning the
quantitative assessment of the quality of a low-fidelity model and a surrogate model.
Hence, it contributes to the discussion regarding this issue (see, for instance, in [121]
or in [49, p. 82ff]) where the NREGE and the SSPCC can serve as quality measures.
Another practical value of the statement in (3.164) resides in the treatment of the
NREGE and the SSPCC as potential safeguards for any optimization algorithm within
the space-mapping paradigm. Hence, keeping the concrete implementations sepa-
rated from the abstract specification in (3.140), I deem it beneficial to incorporate the
NREGE and the SSPCC at the zeroth step, at an intermediate step or at the last step
of an iteration procedure concerning (3.140). At the last step of a corresponding iter-
ation procedure, the NREGE and the SSPCC could serve as an ultimate termination
criterion.
Similarly to the sequential kriging optimization in § 3.3.1, the experimental rate
of convergence of iteration procedures concerning (3.140) might be very low or the
corresponding iteration procedures might not be convergent at all (see, e.g., [49,
p. 84] or [49, p. 101f]). Thus, the condition in (3.107) for a given problem is either
not applicable since its assumption is not satisfied or not necessarily true at all.
Therefore, from an application-driven viewpoint embedded in the context of val-
idation and verification, it might be judicious to adapt the procedure SBO-DFLF and
the procedure SBO-SPLF in order to construct the schematic procedure that I refer to
as SGO-SPLF:
1) create a sample w.r.t. the high-fidelity model (§ 3.1.1);
2) provide a simplified-physics low-fidelity model (§ 3.1.3) and compute $r^{2}_{\hat{y}\tilde{\hat{y}}}|_{k}$
w.r.t. the sample from step 1), and if $r^{2}_{\hat{y}\tilde{\hat{y}}}|_{k} < 0.75$, break off the current
procedure and invoke the procedure SBO-DFLF, otherwise continue the current
procedure;
3) invoke a global optimization algorithm (§ 2.3.3) w.r.t. the provided simplified-
physics low-fidelity model from step 2);
4) use the minimizer from step 3) as a starting point in order to construct in the
minimizer’s vicinity a data-fit low-fidelity model (§ 3.1.2) w.r.t. the simplified-
physics low-fidelity model from step 2);
5) invoke an optimization algorithm within the space-mapping paradigm where
the low-fidelity model is identical with the data-fit low-fidelity model from
step 4);
6) use the minimizer from step 3) as a starting point for a local optimization algorithm
(§ 2.3.3) w.r.t. the high-fidelity model from step 1);
7) compare the minimizer from step 6) with the minimizer from step 5) and if
the discrepancy is much larger than a user-assigned threshold, improve the
simplified-physics low-fidelity model from step 2) or improve the data-fit low-
fidelity model from step 4) or improve both low-fidelity models.
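The gate in step 2) can be sketched in a few lines. This is a minimal sketch which assumes that the SSPCC is the squared sample Pearson correlation coefficient between the high-fidelity outputs and the low-fidelity outputs on the same sample; the definition in (3.31) is not repeated here, and the function names `squared_pearson` and `choose_procedure` are hypothetical.

```python
def squared_pearson(y_high, y_low):
    """Squared sample Pearson correlation of two equally long output lists."""
    n = len(y_high)
    mean_h = sum(y_high) / n
    mean_l = sum(y_low) / n
    cov = sum((h - mean_h) * (l - mean_l) for h, l in zip(y_high, y_low))
    var_h = sum((h - mean_h) ** 2 for h in y_high)
    var_l = sum((l - mean_l) ** 2 for l in y_low)
    if var_h == 0.0 or var_l == 0.0:
        return 0.0  # correlation undefined; treated here as "no resemblance"
    return cov * cov / (var_h * var_l)

def choose_procedure(y_high, y_low, threshold=0.75):
    """Step 2): fall back to SBO-DFLF if the low-fidelity model correlates poorly."""
    r2 = squared_pearson(y_high, y_low)
    return ("SBO-DFLF" if r2 < threshold else "SGO-SPLF"), r2
```

The threshold 0.75 is the value stated in step 2); it is exposed as a parameter because a suitable value may be problem-dependent.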
Similarly to the TRASM algorithm 3.1 and the proposed procedures SBO-SPLF
and SBO-DFLF, the proposed schematic procedure SGO-SPLF builds upon the com-
mon canon of optimization algorithms (recall § 2.3.3).
Note that, in the procedure SGO-SPLF, I extend an approach proposed in [124]
which is incorporated by step 4) and step 5). More precisely, it is assumed that
the simplified-physics low-fidelity model (e.g., a low-fidelity model based on a
coarse-grid discretization, a weakened termination criterion of an iterative solver,
or a combination of both; recall § 3.1.3) is computationally too expensive, such that
there is a need to construct a data-fit low-fidelity model w.r.t. the simplified-physics
low-fidelity model as well.
In order to mitigate the curse of dimensionality (recall § 3.1.2), the data-fit
low-fidelity model is constructed in the vicinity of the simplified-physics low-fidelity
model’s minimizer, that is, the domain under consideration (see, e.g., Figure 2.2 and
Figure 2.3) for the sampling plan $X_s$ is smaller.
Finally, observe that the $(k+1)$-th iteration point in (3.140a) of an optimization
algorithm within the space-mapping paradigm can be theoretically interpreted as a
new, $(m+1)$-th sampling plan point in (3.115) of the sequential kriging optimization. This
contemplation furnishes us with some kind of semantics in order to contrast loosely
the corresponding optimization procedures that are all conceived as subtypes of the
model management strategy adaptation.
Furthermore, if we select a kriging low-fidelity model within the procedure SGO-
SPLF, then, technically, the afore-mentioned contemplation gives us the opportunity
to suggest an extension of the procedure SGO-SPLF by an inner sequential kriging
optimization in order to provide a mechanism for an adaptive improvement of the
data-fit low-fidelity model in step 5). The intricateness of such an interweavement
is captured by the statement in (3.164) as well.
3.3.3 Co-kriging optimization
Recalling our explanations regarding the mathematical machinery behind the krig-
ing low-fidelity model in § 3.1.2 and regarding the simplified-physics low-fidelity
models in § 3.1.3, especially, the generic statement in (3.103), one can articulate the
essential ideas behind the co-kriging optimization.
Conceptually, a co-kriging low-fidelity model is a kind of kriging low-fidelity
model in which the prediction at an arbitrary point $x$, i.e., $\hat{y}(x)$ in (3.98), incorporates
the information of a sample $s$ in (3.13) with respect to a high-fidelity model $K$
and the information of a sample $s_{\tilde{K}}$ with respect to a low-fidelity model $\tilde{K}$ (cf. [70,
p. 168]). Due to this construction principle, the optimization based on a co-kriging
low-fidelity model is conceived as a subtype of the model management strategy fusion.
A very rough intuition behind the prediction of the co-kriging low-fidelity model
reveals that it behaves as an interpolation problem with regard to the sample $s$ and,
as long as there is no coincidence between $s_{\tilde{K}}$ and $s$, it behaves as some kind of
regression problem with regard to the sample $s_{\tilde{K}}$ (cf. [70, p. 172]).
Mind that the following technical explanations concerning a co-kriging low-fidelity
model are condensed and built upon the discussion in [70, p. 167 – 177]. Hence, for
more details regarding some derivations, I refer to the corresponding reference and
the references therein.
Adapting the notation in § 3.3.2 regarding a high-fidelity model $K$ and a
low-fidelity model $\tilde{K}$, respectively, let us assume a sampling plan $X_s \subseteq X_0^{m}$ in (3.14) with
respect to a high-fidelity model $K$ and a sampling plan $X_{s_{\tilde{K}}} \subseteq X_1^{m_{\tilde{K}}}$ with respect to
a low-fidelity model $\tilde{K}$, with $m$ and $m_{\tilde{K}}$ being the respective sample sizes such that
$\forall m, m_{\tilde{K}}.\ m < m_{\tilde{K}}$. Furthermore, supposing some kind of sub-structure between $X_0$
and $X_1$ as well as between $Y_1$ and $Y_0$ such as in (3.122), it is demanded that $X_s \subset X_{s_{\tilde{K}}}$.
However, for the sake of brevity, let us omit the explicit mentioning of the inclusion
maps.
If we identify $X_s$ with an $m \times d$ matrix and the corresponding sampling plan
points $x_i$ with $1 \times d$ matrices, and if we identify $X_{s_{\tilde{K}}}$ with an $m_{\tilde{K}} \times d$ matrix and the
corresponding sampling plan points $\tilde{x}_i$ with $1 \times d$ matrices, where $d$ denotes the number
of parameters, then one can define an $(m_{\tilde{K}} + m) \times d$ matrix $X_{s_{\tilde{K}},s}$ by
\begin{align}
X_{s_{\tilde{K}},s} &:= [X_{s_{\tilde{K}}}^{T}\ X_{s}^{T}]^{T}, \tag{3.165a}\\
&\equiv [\tilde{x}_1^{T} \ldots \tilde{x}_{m_{\tilde{K}}}^{T}\ x_1^{T} \ldots x_m^{T}]^{T}, \tag{3.165b}\\
&\equiv \begin{bmatrix} X_{s_{\tilde{K}}} \\ X_{s} \end{bmatrix}, \tag{3.165c}
\end{align}
where $X_{s_{\tilde{K}},s}$ denotes the joint sampling plan.
With regard to a given joint sampling plan $X_{s_{\tilde{K}},s}$, let us encode in matrix
representation the joint output points as a column vector $y_{s_{\tilde{K}},s} \in \mathbb{R}^{(m_{\tilde{K}}+m) \times 1}$ with
\begin{align}
y_{s_{\tilde{K}},s} &:= [\tilde{y}_1 \ldots \tilde{y}_{m_{\tilde{K}}}\ y_1 \ldots y_m]^{T}, \tag{3.166a}\\
&:= [\tilde{y}^{T}\ y^{T}]^{T}, \tag{3.166b}
\end{align}
where $\tilde{y}_i$ refer to the output points regarding the sampling plan $X_{s_{\tilde{K}}}$ with respect to a
low-fidelity model $\tilde{K}$ and $y_i$ refer to the output points regarding the sampling plan $X_s$
with respect to a high-fidelity model $K$. Similarly to (3.83), the output points $\tilde{y}_i$ are
represented by the column vector $\tilde{y} \in \mathbb{R}^{m_{\tilde{K}} \times 1}$ and the output points $y_i$ are represented
by the column vector $y \in \mathbb{R}^{m \times 1}$.
Technically, $y_{s_{\tilde{K}},s}$ is associated with a corresponding random field $Y_{s_{\tilde{K}},s}$, i.e., a
vector of random variables. In this context, a crucial ingredient is the so-called
auto-regressive model assumption, which is a kind of memoryless property or Markov
property; it states that
\[ \forall x \neq x_i.\ \operatorname{cov}(Y_s(x_i), Y_{s_{\tilde{K}}}(x) \mid Y_{s_{\tilde{K}}}(x_i)) = 0. \tag{3.167} \]
The statement in (3.167) reflects the idea that one can conceive the high-fidelity
model as an exact representation. More precisely, if one possesses all the information
about the high-fidelity model at $x_i$, then the low-fidelity model will not provide
any new information about $Y_s(x_i)$, but any potential errors are only due to the
low-fidelity model. For more details regarding the statement in (3.167), see, e.g., [70,
p. 168] and references therein.
If we concretize the entities $K(x)$, $\tilde{K}(x)$, and $Z_{\Delta}(x)$ in the generic statement
in (3.103) by Gaussian processes and the entity $Z_{\rho}(x)$ by a constant scaling factor
$\rho \in \mathbb{R}$, then one obtains the concrete statement
\[ \forall x \in X_s.\ Z(x) = \rho \tilde{Z}(x) + Z_{\Delta}(x), \tag{3.168} \]
where the high-fidelity model evaluation and the low-fidelity model evaluation are
redefined as the Gaussian processes $Z(x)$ and $\tilde{Z}(x)$, respectively.
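Similarly to the covariance matrix in (3.88), one can construct the corresponding
covariance matrix $C \in \mathbb{R}^{(m_{\tilde{K}}+m) \times (m_{\tilde{K}}+m)}$ by the five individual correlation matrices
\begin{align}
\Psi_{\tilde{K}}(X_{s_{\tilde{K}}}, X_{s_{\tilde{K}}}) &\in \mathbb{R}^{m_{\tilde{K}} \times m_{\tilde{K}}}, \tag{3.169a}\\
\Psi_{\tilde{K}}(X_{s_{\tilde{K}}}, X_{s}) &\in \mathbb{R}^{m_{\tilde{K}} \times m}, \tag{3.169b}\\
\Psi_{\tilde{K}}(X_{s}, X_{s_{\tilde{K}}}) &\in \mathbb{R}^{m \times m_{\tilde{K}}}, \tag{3.169c}\\
\Psi_{\tilde{K}}(X_{s}, X_{s}) &\in \mathbb{R}^{m \times m}, \tag{3.169d}\\
\Psi_{\Delta}(X_{s}, X_{s}) &\in \mathbb{R}^{m \times m}. \tag{3.169e}
\end{align}
Let us interpret the matrices in (3.169) as the evaluations of maps $\Psi_{\bullet}$ with the
signatures $\mathbb{R}^{a \times d} \times \mathbb{R}^{b \times d} \to \mathbb{R}^{a \times b}$ with $a, b, d \in \mathbb{N}$ and let us invoke (3.89) in order to
determine the entries of the correlation matrices, then one can define the family of
maps $\Psi_{\bullet}$ as
\[ \Psi_{\bullet} = ({}^{1}X, {}^{2}X) \mapsto \Psi_{\bullet}({}^{1}X, {}^{2}X) := [\psi_{i,l}]_{\bullet} \equiv \Bigl[\exp\Bigl(-\sum_{j=1}^{d} {}^{\bullet}\theta_j \,\bigl\lvert {}^{1}x_i^{j} - {}^{2}x_l^{j} \bigr\rvert^{{}^{\bullet}p_j}\Bigr)\Bigr]_{\bullet}, \tag{3.170} \]
where $i \in \{1, \ldots, a\}$ and $l \in \{1, \ldots, b\}$, ${}^{1}x_i$ refers to sampling plan points of the
sampling plan ${}^{1}X$, and ${}^{2}x_l$ refers to sampling plan points of the sampling plan ${}^{2}X$.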
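The entrywise assignment in (3.170) can be sketched directly. This is a minimal pure-Python sketch; the function name `correlation_matrix` and the representation of sampling plans as lists of rows are assumptions made for illustration.

```python
import math

def correlation_matrix(X1, X2, theta, p):
    """Evaluate Psi(X1, X2) entrywise as in (3.170).

    X1: a rows of length d, X2: b rows of length d,
    theta, p: length-d parameter lists. Returns an a x b nested list.
    """
    return [
        [
            math.exp(-sum(t * abs(u - v) ** q for u, v, t, q in zip(x1, x2, theta, p)))
            for x2 in X2
        ]
        for x1 in X1
    ]
```

Passing a single-row plan as `X2` yields an a x 1 column, which corresponds to the special case b = 1 of the signature.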
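Finally, one can construct the covariance matrix $C$ as
\[ C := \begin{bmatrix} \sigma^{2}_{\tilde{K}} \Psi_{\tilde{K}}(X_{s_{\tilde{K}}}, X_{s_{\tilde{K}}}) & \rho \sigma^{2}_{\tilde{K}} \Psi_{\tilde{K}}(X_{s_{\tilde{K}}}, X_{s}) \\ \rho \sigma^{2}_{\tilde{K}} \Psi_{\tilde{K}}(X_{s}, X_{s_{\tilde{K}}}) & \rho^{2} \sigma^{2}_{\tilde{K}} \Psi_{\tilde{K}}(X_{s}, X_{s}) + \sigma^{2}_{\Delta} \Psi_{\Delta}(X_{s}, X_{s}) \end{bmatrix}, \tag{3.171} \]
where, similarly to (3.87), $\sigma^{2}_{\tilde{K}}, \sigma^{2}_{\Delta} \in \mathbb{R}$.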
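The block structure in (3.171) can be assembled as in the following minimal NumPy sketch. The names `psi` and `cokriging_covariance`, as well as the suffixes `_c` (low-fidelity), `_e` (high-fidelity), and `_d` (difference), are hypothetical and chosen for this illustration only.

```python
import numpy as np

def psi(X1, X2, theta, p):
    """Correlation matrix of (3.170) for plans X1 (a x d) and X2 (b x d)."""
    diff = np.abs(X1[:, None, :] - X2[None, :, :])   # shape (a, b, d)
    return np.exp(-np.sum(theta * diff ** p, axis=2))

def cokriging_covariance(X_c, X_e, theta_c, p_c, theta_d, p_d,
                         sigma2_c, sigma2_d, rho):
    """Assemble C of (3.171) from its four blocks."""
    top_left = sigma2_c * psi(X_c, X_c, theta_c, p_c)
    top_right = rho * sigma2_c * psi(X_c, X_e, theta_c, p_c)
    # Psi(X_e, X_c) is the transpose of Psi(X_c, X_e) for this kernel.
    bottom_left = top_right.T
    bottom_right = (rho ** 2 * sigma2_c * psi(X_e, X_e, theta_c, p_c)
                    + sigma2_d * psi(X_e, X_e, theta_d, p_d))
    return np.block([[top_left, top_right], [bottom_left, bottom_right]])
```

Reusing the transposed upper-right block for the lower-left block saves one kernel evaluation and keeps $C$ symmetric by construction.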
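Given the definition in (3.171) and the definition in (3.170), it is observable that,
similarly to (3.97), one has to determine maximum likelihood estimates of the
parameters $(\theta_{\tilde{K}}, p_{\tilde{K}}, \theta_{\Delta}, p_{\Delta}, \rho) \in [0,+\infty[^{d} \times [0,2]^{d} \times [0,+\infty[^{d} \times [0,2]^{d} \times \mathbb{R}$. For the sake of
notational ease, especially in (3.170), let us treat the 5-tuple $(\theta_{\tilde{K}}, p_{\tilde{K}}, \theta_{\Delta}, p_{\Delta}, \rho)$ and
the 5-tuple $({}^{\tilde{K}}\theta, {}^{\tilde{K}}p, {}^{\Delta}\theta, {}^{\Delta}p, \rho)$ as definitionally equal, i.e., let us demand that the
expression $(\theta_{\tilde{K}}, p_{\tilde{K}}, \theta_{\Delta}, p_{\Delta}, \rho) \equiv ({}^{\tilde{K}}\theta, {}^{\tilde{K}}p, {}^{\Delta}\theta, {}^{\Delta}p, \rho)$ holds componentwise.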
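It is assumed that the information associated with the low-fidelity model is
independent of the information associated with the high-fidelity model. Hence, one can
determine the maximum likelihood estimates for $\mu_{\tilde{K}} \in \mathbb{R}$ and $\sigma^{2}_{\tilde{K}}$ by
\begin{align}
\hat{\mu}_{\tilde{K}} &:= \frac{\mathbf{1}_{\tilde{K}}^{T} \Psi_{\tilde{K}}(X_{s_{\tilde{K}}}, X_{s_{\tilde{K}}})^{-1}}{\mathbf{1}_{\tilde{K}}^{T} \Psi_{\tilde{K}}(X_{s_{\tilde{K}}}, X_{s_{\tilde{K}}})^{-1} \mathbf{1}_{\tilde{K}}}\, \tilde{y}, \tag{3.172a}\\
\hat{\sigma}^{2}_{\tilde{K}} &:= \frac{1}{m_{\tilde{K}}} (\tilde{y} - \hat{\boldsymbol{\mu}}_{\tilde{K}})^{T} \Psi_{\tilde{K}}(X_{s_{\tilde{K}}}, X_{s_{\tilde{K}}})^{-1} (\tilde{y} - \hat{\boldsymbol{\mu}}_{\tilde{K}}), \tag{3.172b}
\end{align}
where the column vector $\hat{\boldsymbol{\mu}}_{\tilde{K}}$ is defined similarly to (3.95), that is,
\[ \hat{\boldsymbol{\mu}}_{\tilde{K}} := \hat{\mu}_{\tilde{K}} \cdot \mathbf{1}_{\tilde{K}}, \tag{3.173} \]
where $\mathbf{1}_{\tilde{K}} := [1\ 1\ \ldots\ 1\ 1]^{T}$ with $\mathbf{1}_{\tilde{K}} \in \mathbb{R}^{m_{\tilde{K}} \times 1}$.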
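Let us define the concentrated ln-likelihood function associated with the information
regarding the low-fidelity model, $L_{cln,\tilde{K}} = (\theta_{\tilde{K}}, p_{\tilde{K}}) \mapsto L_{cln,\tilde{K}}(\theta_{\tilde{K}}, p_{\tilde{K}})$, with the
signature $[0,+\infty[^{d} \times [0,2]^{d} \to\ ]-\infty, 0]$ such that the assignment $L_{cln,\tilde{K}}(\theta_{\tilde{K}}, p_{\tilde{K}})$ reads
as
\[ L_{cln,\tilde{K}}(\theta_{\tilde{K}}, p_{\tilde{K}}) := -\frac{m_{\tilde{K}}}{2} \ln(\hat{\sigma}^{2}_{\tilde{K}}) - \frac{1}{2} \ln(\lvert \Psi_{\tilde{K}}(X_{s_{\tilde{K}}}, X_{s_{\tilde{K}}}) \rvert). \tag{3.174} \]
By invoking a suitable optimization algorithm (recall § 2.3.3), one can compute
the maximum likelihood estimates of $(\theta_{\tilde{K}}, p_{\tilde{K}})$ by considering the expression
\[ (\hat{\theta}_{\tilde{K}}, \hat{p}_{\tilde{K}}) := \operatorname*{argmin}_{(\theta_{\tilde{K}}, p_{\tilde{K}}) \in [0,+\infty[^{d} \times [0,2]^{d}} -L_{cln,\tilde{K}}(\theta_{\tilde{K}}, p_{\tilde{K}}). \tag{3.175} \]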
In order to determine the maximum likelihood estimates of $(\theta_{\Delta}, p_{\Delta}, \rho)$, one has
to provide a column vector $y_{\Delta} \in \mathbb{R}^{m \times 1}$ which encodes the difference between the
column vector $y$ and the column vector $\tilde{y}$. Due to the auto-regressive model assumption
in (3.167), one has to consider only those output points of $\tilde{y}$ that are associated with
the sampling plan $X_s$. Hence, one has to construct a column vector $\tilde{y}|_{X_s} \in \mathbb{R}^{m \times 1}$ that
can be interpreted as a restriction of $\tilde{y}$ to the sampling plan $X_s$.$^{29}$
Conceptually, it is favorable to construct initially $X_{s_{\tilde{K}}}$ and $\tilde{y}$, and, subsequently,
to construct $X_s$, such that $X_s \subset X_{s_{\tilde{K}}}$, and $y$. Mind, though, that a sampling plan
should possess desirable properties: it should be space-filling and non-collapsing
(recall § 3.1.1).
Hence, in the case of an Audze-Eglais LHC or a Maximin LHC (see Figure 3.1),
one has to adopt an exchange algorithm (see [70, p. 28f]) in order to construct $X_s$
from $X_{s_{\tilde{K}}}$ such that $X_s \subset X_{s_{\tilde{K}}}$ and $X_s$ possesses the desirable properties for a
sampling plan. More precisely: Let us randomly select an initial $X_s$ such that $X_s \subset X_{s_{\tilde{K}}}$
and $X_{s_{\tilde{K}}} / X_s \in \mathbb{R}^{(m_{\tilde{K}}-m) \times d}$.$^{30}$
Furthermore, it is set that $X_s^{(1)} := X_s$. Given the running index $k \in \{1, \ldots, m\}$, let
us compute the corresponding space-filling (and non-collapsing) criterion for $X_s^{(k)}$,
i.e., the Audze-Eglais criterion for an Audze-Eglais LHC and the Morris-Mitchell
criterion for a Maximin LHC. Given the running index $j \in \{1, \ldots, m_{\tilde{K}} - m\}$, one can
exchange the sampling plan point $x_k$ of $X_s^{(k)}$ with the $j$-th sampling plan point of $X_{s_{\tilde{K}}} / X_s$
for each $j$, thus constructing theoretically $m_{\tilde{K}} - m$ sampling plans $X_s^{(k,j)}$. For each $j$, one can
compute the corresponding criterion of $X_s^{(k,j)}$. If there is a $j^{*} \in \{1, \ldots, m_{\tilde{K}} - m\}$ such that
the criterion for $X_s^{(k,j^{*})}$ is optimal among all $j$ and better than the criterion
for $X_s^{(k)}$, then one can set $X_s^{(k+1)} := X_s^{(k,j^{*})}$, otherwise one can set $X_s^{(k+1)} := X_s^{(k)}$,
thus completing an iteration of the exchange procedure. The exchange procedure
is continued for all sampling plan points $x_k$ of $X_s$ until it terminates for $k = m$ and it
returns $X_s$ with $X_s := X_s^{(m)}$.
In the case of a Sobol quasi-random sequence (see Figure 3.2), though, one can
construct initially $X_{s_{\tilde{K}}}$ and then either pick the sampling plan points $\tilde{x}_k$ of $X_{s_{\tilde{K}}}$
with $k \in \{1, \ldots, m\}$ as the sampling plan points $x_i$ of $X_s$ or, alternatively,
construct $X_s$ as a Sobol quasi-random sequence with $m$ sampling plan points from
$^{29}$ Since some kind of sub-structure is supposed between $X_0$ and $X_1$ as well as between $Y_1$ and $Y_0$
such as in (3.122), one can denote $\tilde{K}|_{X_0} \in \hom(X_0, Y_1)$ as the restriction of $\tilde{K}$ to $X_0$ at the function level
(recall Figure 1.4). Furthermore, the map $\tilde{K}|_{X_0}$ can be composed with $\iota_{\tilde{R}}$ such that one can construct
the map $\iota_{\tilde{R}} \circ_{Y_{10}} \tilde{K}|_{X_0} \in \hom(X_0, Y_0)$. I argue, therefore, that it is reasonable to conceive the column
vector $\tilde{y}|_{X_s}$ within the context of the map $\iota_{\tilde{R}} \circ_{Y_{10}} \tilde{K}|_{X_0}$. Keep in mind that if we solely operate with
various forms of the real numbers such as, e.g., $\mathbb{R}$, $\mathbb{R}^{n}$, and $\mathbb{R}^{n \times m}$ with $n, m \in \mathbb{N}$, then a lot of valuable
conceptual distinction is probably lost.
$^{30}$ The operation $/$ is overloaded with the signature $\mathbb{R}^{m_{\tilde{K}} \times d} \times \mathbb{R}^{m \times d} \to \mathbb{R}^{(m_{\tilde{K}}-m) \times d}$, where a resulting
difference sampling plan $X_{s_{\tilde{K}}} / X_s$ contains all the sampling plan points of $X_{s_{\tilde{K}}}$ that are not contained in $X_s$.
scratch. In both approaches, the resulting sampling plan $X_s$ satisfies $X_s \subset X_{s_{\tilde{K}}}$ and it
is a Sobol quasi-random sequence itself; thus, it is a space-filling and non-collapsing
sampling plan itself. Therefore, utilizing Sobol quasi-random sequences to construct
the sampling plans $X_s$ and $X_{s_{\tilde{K}}}$ can be seen as a computationally time-saving and
cost-reducing alternative to the usage of an Audze-Eglais LHC or a Maximin LHC
and the necessity of an exchange algorithm.
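Imagine a use case in which $X_{s_{\tilde{K}}}$ is constructed as an Audze-Eglais LHC or a
Maximin LHC and $X_s$ is constructed as a Sobol quasi-random sequence. In such
a use case, it is very likely that there are no output points within $\tilde{y}$ which can be
associated with the sampling plan points of $X_s$. Hence, one has to invoke a fallback
plan (cf. [70, p. 169]), i.e., given the maximum likelihood estimates $(\hat{\theta}_{\tilde{K}}, \hat{p}_{\tilde{K}})$, let us
construct a kriging low-fidelity model (see (3.98)) of the low-fidelity model $\tilde{K}$ as
\[ \hat{\tilde{y}}(\tilde{x}) := \hat{\mu}_{\tilde{K}} + \tilde{r}^{T} \Psi_{\tilde{K}}(X_{s_{\tilde{K}}}, X_{s_{\tilde{K}}})^{-1} (\tilde{y} - \hat{\boldsymbol{\mu}}_{\tilde{K}}), \tag{3.176} \]
where $\hat{\tilde{y}} : X_1 \to \mathbb{R}$ such that $\hat{\tilde{y}}(\tilde{x})$ indicates the prediction at an arbitrary point $\tilde{x}$ and
$\tilde{r} := [\tilde{r}_i] \in \mathbb{R}^{m_{\tilde{K}} \times 1}$ denotes the correlation column vector that reads as
\[ \tilde{r} := [\tilde{r}_1\ \tilde{r}_2\ \ldots\ \tilde{r}_{m_{\tilde{K}}-1}\ \tilde{r}_{m_{\tilde{K}}}]^{T}, \tag{3.177} \]
where, similarly to (3.81), the components $\tilde{r}_i$ are defined as
\[ \tilde{r}_i := \sum_{j=1}^{d} {}^{\tilde{K}}\theta_j \,\bigl\lvert \tilde{x}^{j} - \tilde{x}_i^{j} \bigr\rvert^{{}^{\tilde{K}}p_j}. \tag{3.178} \]
By means of (3.176), let us forge a column vector $\hat{\tilde{y}}|_{X_s} \in \mathbb{R}^{m \times 1}$.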
y∣Xs, one can ultimately create the column vector y∆that reads as
y∆∶=y−ρ˜
y∣Xs, (3.179)
where the term ρ˜
y∣Xsencodes a multiplication of the column vector ˜
y∣Xswith the
scalar ρ. At the programs level (recall Figure 1.4), the term ˜
y∣Xsforces us to ensure
that we filter those components of the vector ˜
ysuch that we establish a map ˜
y↦˜
y∣Xs
in order to adequately compute y∆in (3.179).
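The filtering map and the difference vector in (3.179) can be sketched as follows. This is a minimal pure-Python sketch; the function names are hypothetical, and it assumes that every row of the high-fidelity plan is an exact copy of a row of the low-fidelity plan, so that points can be matched by equality without a tolerance.

```python
def restrict_to_plan(X_c, y_c, X_e):
    """Pick the low-fidelity outputs that belong to the points of X_e.

    X_c: low-fidelity plan (list of rows), y_c: its outputs,
    X_e: high-fidelity plan whose rows are assumed to occur verbatim in X_c.
    """
    lookup = {tuple(x): y for x, y in zip(X_c, y_c)}
    missing = [tuple(x) for x in X_e if tuple(x) not in lookup]
    if missing:
        # No matching low-fidelity output: a fallback such as (3.176) is needed.
        raise KeyError(f"no low-fidelity output for points {missing}")
    return [lookup[tuple(x)] for x in X_e]

def difference_vector(y_e, y_c_restricted, rho):
    """Componentwise y - rho * (restricted low-fidelity outputs), cf. (3.179)."""
    return [ye - rho * yc for ye, yc in zip(y_e, y_c_restricted)]
```

Raising an error for unmatched points makes the need for the kriging-based fallback explicit instead of silently misaligning the two output vectors.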
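Let us compute the maximum likelihood estimates for $\mu_{\Delta} \in \mathbb{R}$ and $\sigma^{2}_{\Delta}$ by
\begin{align}
\hat{\mu}_{\Delta} &:= \frac{\mathbf{1}_{\Delta}^{T} \Psi_{\Delta}(X_s, X_s)^{-1}}{\mathbf{1}_{\Delta}^{T} \Psi_{\Delta}(X_s, X_s)^{-1} \mathbf{1}_{\Delta}}\, y_{\Delta}, \tag{3.180a}\\
\hat{\sigma}^{2}_{\Delta} &:= \frac{1}{m} (y_{\Delta} - \hat{\boldsymbol{\mu}}_{\Delta})^{T} \Psi_{\Delta}(X_s, X_s)^{-1} (y_{\Delta} - \hat{\boldsymbol{\mu}}_{\Delta}), \tag{3.180b}
\end{align}
where the column vector $\hat{\boldsymbol{\mu}}_{\Delta}$ is defined similarly to (3.95), that is,
\[ \hat{\boldsymbol{\mu}}_{\Delta} := \hat{\mu}_{\Delta} \cdot \mathbf{1}_{\Delta}, \tag{3.181} \]
where $\mathbf{1}_{\Delta} := [1\ 1\ \ldots\ 1\ 1]^{T}$ with $\mathbf{1}_{\Delta} \in \mathbb{R}^{m \times 1}$.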
Let us define the concentrated ln-likelihood function associated with the information
regarding the low-fidelity model and the high-fidelity model, i.e., let us define
the map $L_{cln,\Delta} = (\theta_{\Delta}, p_{\Delta}, \rho) \mapsto L_{cln,\Delta}(\theta_{\Delta}, p_{\Delta}, \rho)$ such that the map’s signature is
$[0,+\infty[^{d} \times [0,2]^{d} \times \mathbb{R} \to\ ]-\infty, 0]$ and the assignment $L_{cln,\Delta}(\theta_{\Delta}, p_{\Delta}, \rho)$ reads as
\[ L_{cln,\Delta}(\theta_{\Delta}, p_{\Delta}, \rho) := -\frac{m}{2} \ln(\hat{\sigma}^{2}_{\Delta}) - \frac{1}{2} \ln(\lvert \Psi_{\Delta}(X_s, X_s) \rvert). \tag{3.182} \]
By invoking a suitable optimization algorithm (recall § 2.3.3), one can compute
the maximum likelihood estimates of $(\theta_{\Delta}, p_{\Delta}, \rho)$ by considering the expression
\[ (\hat{\theta}_{\Delta}, \hat{p}_{\Delta}, \hat{\rho}) := \operatorname*{argmin}_{(\theta_{\Delta}, p_{\Delta}, \rho) \in [0,+\infty[^{d} \times [0,2]^{d} \times \mathbb{R}} -L_{cln,\Delta}(\theta_{\Delta}, p_{\Delta}, \rho).^{31} \tag{3.183} \]
In practical applications, it is advisable to restrict $\rho$ in (3.183) to a bounded
interval, e.g., to the closed interval $[-a, a]$ such that $\rho \in [-a, a]$, where $a \in \mathbb{R}^{+}$ is a
user-assigned and problem-dependent entity.
Additionally, mind that, in order for $\hat{\rho}$ to be a reliable estimate of the scaling
in (3.168) and in (3.179), respectively, the sample size $m$ regarding $X_s$ has to be
greater than or equal to a problem-dependent lower bound (cf. [70, p. 176]).
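After computing the maximum likelihood estimates $(\hat{\theta}_{\tilde{K}}, \hat{p}_{\tilde{K}}, \hat{\theta}_{\Delta}, \hat{p}_{\Delta}, \hat{\rho})$, one can
specify the co-kriging low-fidelity model as
\[ \hat{y}(x) := \hat{\mu}_{\tilde{K},K} + c^{T} C^{-1} (y_{s_{\tilde{K}},s} - \hat{\boldsymbol{\mu}}_{\tilde{K},K}), \tag{3.184} \]
where $\hat{y} : X_1 \to \mathbb{R}$ such that $\hat{y}(x)$ indicates the prediction at an arbitrary point $x$. The
maximum likelihood estimate for $\mu_{\tilde{K},K} \in \mathbb{R}$ in (3.184) is given by
\[ \hat{\mu}_{\tilde{K},K} := \frac{\mathbf{1}_{\tilde{K},K}^{T} C^{-1}}{\mathbf{1}_{\tilde{K},K}^{T} C^{-1} \mathbf{1}_{\tilde{K},K}}\, y_{s_{\tilde{K}},s}, \tag{3.185a} \]
whereas the column vector $\hat{\boldsymbol{\mu}}_{\tilde{K},K}$ is defined similarly to (3.95), that is,
\[ \hat{\boldsymbol{\mu}}_{\tilde{K},K} := \hat{\mu}_{\tilde{K},K} \cdot \mathbf{1}_{\tilde{K},K}, \tag{3.186} \]
where $\mathbf{1}_{\tilde{K},K} := [1\ 1\ \ldots\ 1\ 1]^{T}$ with $\mathbf{1}_{\tilde{K},K} \in \mathbb{R}^{(m_{\tilde{K}}+m) \times 1}$.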
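Recalling (3.88), the column vector $c \in \mathbb{R}^{(m_{\tilde{K}}+m) \times 1}$ in (3.184) encodes the covariance
between the sampling plan $X_{s_{\tilde{K}}}$ and an arbitrary point $x$ as well as the covariance
between the sampling plan $X_s$ and an arbitrary point $x$. The column vector $c$
can be written as
\[ c := [c_1^{T}\ c_2^{T}]^{T}, \tag{3.187} \]
where the column vector $c_1 \in \mathbb{R}^{m_{\tilde{K}} \times 1}$ and the column vector $c_2 \in \mathbb{R}^{m \times 1}$ are defined as
\begin{align}
c_1 &:= \hat{\rho}\, \hat{\sigma}^{2}_{\tilde{K}} \Psi_{\tilde{K}}(X_{s_{\tilde{K}}}, x), \tag{3.188a}\\
c_2 &:= \hat{\rho}^{2} \hat{\sigma}^{2}_{\tilde{K}} \Psi_{\tilde{K}}(X_s, x) + \hat{\sigma}^{2}_{\Delta} \Psi_{\Delta}(X_s, x), \tag{3.188b}
\end{align}
where $\Psi_{\tilde{K}}(X_{s_{\tilde{K}}}, x) \in \mathbb{R}^{m_{\tilde{K}} \times 1}$ and $\Psi_{\tilde{K}}(X_s, x), \Psi_{\Delta}(X_s, x) \in \mathbb{R}^{m \times 1}$. Since the column vector $c$ plays a
role similar to that of the column vector $r$ in (3.98), one can invoke the definition of the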
$^{31}$ Regarding some applications, there might be numerical issues that are presumably caused
predominantly by the estimate $\hat{\rho}$. In order to mitigate such potential numerical issues, it is advisable
to round the numerical value associated with the estimate $\hat{\rho}$. However, an in-depth analysis of the
propagation of, e.g., the corresponding round-off error is out of the scope of the present work.
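family of maps $\Psi_{\bullet}$ in (3.170) where, by setting $b := 1$, one can specify the signature
by $\mathbb{R}^{a \times d} \times \mathbb{R}^{1 \times d} \to \mathbb{R}^{a \times 1}$ with $a, d \in \mathbb{N}$.
Technically, one can apply the interpretation that ${}^{2}X$ is a sampling plan with
a single sampling plan point ${}^{2}x_1$ which is defined as ${}^{2}x_1 := x$. Hence, the
expression $\Psi_{\tilde{K}}(X_{s_{\tilde{K}}}, x)$ in (3.188a) and the expression $\Psi_{\Delta}(X_s, x)$ in (3.188b) can be
understood by adapting the assignment in (3.170) such that
\[ \Psi_{\bullet} = ({}^{1}X, x) \mapsto \Psi_{\bullet}({}^{1}X, x) := [\psi_{i,1}]_{\bullet} \equiv \Bigl[\exp\Bigl(-\sum_{j=1}^{d} {}^{\bullet}\theta_j \,\bigl\lvert {}^{1}x_i^{j} - x^{j} \bigr\rvert^{{}^{\bullet}p_j}\Bigr)\Bigr]_{\bullet}, \tag{3.189} \]
where $i \in \{1, \ldots, a\}$, ${}^{1}x_i$ refers to sampling plan points of the sampling plan ${}^{1}X$,
and $x$ refers to an arbitrary point in (3.184).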
An important observation regarding the co-kriging low-fidelity model in (3.184)
is that if we choose a point $x$ such that $x$ is a sampling plan point of $X_s$ in (3.165),
then $\hat{y}(x)$ reproduces the corresponding output point within $y$ in (3.166).
Thus, regarding the information associated with the high-fidelity model $K$, the
co-kriging low-fidelity model behaves the same way as the kriging low-fidelity model
in (3.98). Recalling the very rough intuition at the beginning of § 3.3.3,
the co-kriging low-fidelity model does not show such a behavior with regard to the
information associated with the low-fidelity model $\tilde{K}$.
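The predictor in (3.184) and its interpolation behavior at the high-fidelity sampling plan points can be checked with the following minimal NumPy sketch. The names `psi`, `cokriging_predict`, the dictionary keys, and the suffixes `_c` (low-fidelity), `_e` (high-fidelity), and `_d` (difference) are hypothetical, and the hyperparameters are assumed to be given rather than estimated. The first term of $c_2$ uses the low-fidelity correlation between $X_s$ and $x$, mirroring the lower-right block of (3.171); the interpolation property relies on this.

```python
import numpy as np

def psi(X1, X2, theta, p):
    """Correlation matrix of (3.170) for plans X1 (a x d) and X2 (b x d)."""
    diff = np.abs(X1[:, None, :] - X2[None, :, :])
    return np.exp(-np.sum(theta * diff ** p, axis=2))

def cokriging_predict(x, X_c, y_c, X_e, y_e, hyper):
    """Evaluate the co-kriging predictor (3.184) at a single point x."""
    th_c, p_c = hyper["theta_c"], hyper["p_c"]
    th_d, p_d = hyper["theta_d"], hyper["p_d"]
    s2_c, s2_d, rho = hyper["sigma2_c"], hyper["sigma2_d"], hyper["rho"]
    P_ce = psi(X_c, X_e, th_c, p_c)
    C = np.block([                                              # (3.171)
        [s2_c * psi(X_c, X_c, th_c, p_c), rho * s2_c * P_ce],
        [rho * s2_c * P_ce.T,
         rho ** 2 * s2_c * psi(X_e, X_e, th_c, p_c) + s2_d * psi(X_e, X_e, th_d, p_d)],
    ])
    x_row = np.atleast_2d(x)
    c1 = rho * s2_c * psi(X_c, x_row, th_c, p_c)[:, 0]          # (3.188a)
    c2 = (rho ** 2 * s2_c * psi(X_e, x_row, th_c, p_c)[:, 0]
          + s2_d * psi(X_e, x_row, th_d, p_d)[:, 0])            # (3.188b)
    c = np.concatenate([c1, c2])
    y = np.concatenate([y_c, y_e])
    ones = np.ones_like(y)
    mu = (ones @ np.linalg.solve(C, y)) / (ones @ np.linalg.solve(C, ones))  # (3.185a)
    return mu + c @ np.linalg.solve(C, y - mu * ones)
```

At a point of the high-fidelity plan, the vector `c` coincides with the corresponding column of `C`, so the solve returns a unit vector and the prediction equals the stored high-fidelity output.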
Similarly to (3.111), one can provide a mean squared prediction error $(\hat{s}_y(x))^{2}$ at
an arbitrary point $x$ for the co-kriging low-fidelity model $\hat{y}(x)$ in (3.184). Hence, the
error $(\hat{s}_y(x))^{2}$ (cf. [70, p. 172]) can be defined as
\[ (\hat{s}_y(x))^{2} := \hat{\rho}^{2} \hat{\sigma}^{2}_{\tilde{K}} + \hat{\sigma}^{2}_{\Delta} - c^{T} C^{-1} c + \frac{(1 - \mathbf{1}_{\tilde{K},K}^{T} C^{-1} c)^{2}}{\mathbf{1}_{\tilde{K},K}^{T} C^{-1} \mathbf{1}_{\tilde{K},K}}, \tag{3.190} \]
where $\hat{s}_y : X \to \mathbb{R}$. Analogously to (3.111), the term in (3.190) involving the fraction is
negligibly small; thus, let us reformulate the mean squared prediction error $(\hat{s}_y(x))^{2}$
as
\[ (\hat{s}_y(x))^{2} := \hat{\rho}^{2} \hat{\sigma}^{2}_{\tilde{K}} + \hat{\sigma}^{2}_{\Delta} - c^{T} C^{-1} c. \tag{3.191} \]
Utilizing the mean squared prediction error $(\hat{s}_y(x))^{2}$ in (3.191), one could formulate
a sequential co-kriging optimization in the same fashion as the sequential kriging
optimization in § 3.3.1.
However, in the present work, we do not dwell on the sequential co-kriging
optimization, but rather on the co-kriging optimization. More precisely: Given
a high-fidelity model $K$ and a low-fidelity model $\tilde{K}$ which, in the context at hand, is
considered conceptually indistinguishable from a surrogate model, the high-fidelity
optimization problem, for instance, in the formulation in (3.116), is replaced by a
co-kriging low-fidelity optimization problem which one can state as, e.g.,
\[ \min_{x \in X_0} (\hat{\hat{j}} \circ \hat{y})(x), \tag{3.192} \]
where $\hat{y}(x)$ refers to the co-kriging low-fidelity model in (3.184).
Notice well that, in a kriging low-fidelity optimization problem corresponding
to (3.116), one is solely capable of obtaining information from the high-fidelity model
in order to forge the kriging low-fidelity model.
Thus, let us conceive the corresponding optimization kind, i.e., kriging optimiza-
tion, as a subkind of surrogate-based optimization.
In a co-kriging low-fidelity optimization problem such as in (3.192), though, a
necessary minimum of information regarding the high-fidelity model is established,
and, then, one can steer the amount of information regarding the low-fidelity model
to, hopefully, improve the co-kriging low-fidelity model.
Thus, let us conceive the corresponding optimization kind, i.e., co-kriging opti-
mization, as a subkind of surrogate-guided optimization.
Depending on the computational costs of the low-fidelity model and the sur-
rogate model, respectively, in the co-kriging optimization, one can build a data-fit
low-fidelity model of the surrogate model – analogously to step 3) and step 4) in
the procedure SGO-SPLF – in order to utilize the data-fit low-fidelity model of the
surrogate model as a proxy for the original surrogate model in the construction of
the co-kriging low-fidelity model.
Finally, comparing the co-kriging optimization and the corresponding kriging
optimization, one can check whether the relation in (3.107) holds. Interestingly,
it is conceivable that the relation in (3.107) holds and that the
measured computation time of the co-kriging optimization is still larger
than the measured computation time of the kriging optimization. At first, this
appears paradoxical.
The rationale for this seemingly paradoxical behavior is that the determination of
the maximum likelihood estimates of the parameters $(\theta_{\tilde{K}}, p_{\tilde{K}}, \theta_{\Delta}, p_{\Delta}, \rho)$ in (3.175) and
in (3.183) is much more involved than the determination of the maximum likelihood
estimates of the parameters $(\theta, p)$ in (3.97).
Therefore, if the number of parameters and the dimensions of the matrices in the
maximum likelihood estimations are too large (cf. [70, p. 171]), then the measured
computation time of the co-kriging optimization can be larger than that of the
kriging optimization, even though the relation in (3.107) holds.
Observe that the sequential co-kriging optimization appears as a hybrid of the
model management strategies fusion and adaptation. Mind that, to my best knowl-
edge, such hybrids are not extensively discussed within the classification scheme of
model management strategies proposed in [166].
For instance, recalling the end of § 3.3.2, the suggested extension of the proce-
dure SGO-SPLF by an inner sequential kriging optimization can be interpreted as a
hybrid of one kind of adaptation model management strategy and another kind of
adaptation model management strategy.
Furthermore, if we apply a formalization-oriented viewpoint in the sense that
we discuss the co-kriging low-fidelity model in (3.184) embedded in the context of
the optimization within the space-mapping paradigm (see § 3.3.2), then one can re-
place the low-fidelity optimization problem in (3.117) by the co-kriging low-fidelity
optimization problem in (3.192). In the context of § 3.3.2, the co-kriging low-fidelity
model can be understood as a second-level low-fidelity model w.r.t. the first-level
low-fidelity model $\tilde{K}$.
A first benefit of such a formalization-oriented viewpoint is that it grants us
a novel procedure that can be interpreted as a hybrid of the model management
strategies fusion and adaptation. Such a viewpoint suggests, therefore, conceiving
this hybrid (a co-kriging low-fidelity model within the space-mapping paradigm)
as somehow comparable with the hybrid constituted by the sequential co-kriging
optimization, at least at the function level (recall Figure 1.4).
A second benefit of such a formalization-oriented viewpoint is that it nurtures a
modular construction principle such that, e.g., the extension of the procedure SGO-
SPLF could be executed by an inner sequential co-kriging optimization. Hence, this
novel suggested extension can be interpreted as a hybrid of one kind of adaptation
model management strategy, and a fusion model management strategy, and another
kind of adaptation model management strategy.
A third benefit of such a formalization-oriented viewpoint is that it furnishes us
with a formal suspicion about the important role of the low-fidelity model within the
co-kriging low-fidelity model. Note that the numerical experiments in § 3.2 within
the context of surrogate-based optimization, for instance, in terms of $e^{N}_{cv}$, $e^{NR}_{cv}$, $r^{2}_{\hat{y}\tilde{\hat{y}},cv}$,
or $S^{N}_{\tilde{\hat{y}},i}$ as shown in Figures 3.12–3.15, furnish us with an initial numerical
suspicion. To the best of my knowledge, the role of the low-fidelity model within the co-kriging
low-fidelity model is not exhaustively formally elaborated in the literature (see, e.g.,
[70, p. 167 – 177] and references therein), though. Mind, however, that if we consider
the hybrid constituted by a co-kriging low-fidelity model within the space-mapping
paradigm, then one can invoke the statement in (3.164), which has to be adjusted
regarding the sample size $m$ and the sample size $m_{\tilde{K}}$ in (3.165). Hence, let us write the
adjusted statement as, with regard to some appropriate norms,
∀K˜
R,˜
K2,˜
P
s.∀˜
K2.∀˜
K1.(eNR
H,sg,˜
K1(ˆ
Qξ
ξ
ξ)→0∧r2
ˆ
y˜
ˆ
y,˜
K1→1)∧(eNR
H,sg,˜
K2(ˆ
Qξ
ξ
ξ)→0∧r2
ˆ
y˜
ˆ
y,˜
K2→1)∧
(eNR
H,sg,K˜
R,˜
K2,˜
P
s(ˆ
Qξ
ξ
ξ)→0∧r2
ˆ
y˜
ˆ
y,K˜
R,˜
K2,˜
P
s
→1)as m→∞and m˜
K1→∞and k→∞
Ô⇒ x(k)→x∗as k→∞w.r.t (3.140a), (3.193)
where $\tilde{K}_1$ refers to the first-level low-fidelity model $\tilde{K}$ with $m_{\tilde{K}_1} \equiv m_{\tilde{K}}$, and $\tilde{K}_2$ refers
to the co-kriging low-fidelity model in (3.176). The remaining entities in (3.193) are
defined according to our comments on the statement in (3.164).
Hence, by observing the statement in (3.193), one can utter the formal reasonable
suspicion that the choice of the low-fidelity model within the co-kriging low-fidelity
model obeys similar restrictions as the low-fidelity model within the space-mapping
paradigm.
Or to put it differently: By embedding the convergence issues related to the co-
kriging low-fidelity model into the convergence issues related to the space-mapping
paradigm, one can formally argue that the quality of the low-fidelity model within
the co-kriging low-fidelity model has to satisfy some problem-dependent lower bou-
nds.
Mind that the present work has primarily focused on working out the benefits of
a purely formalization-oriented viewpoint which has led to fertile novel insights
of theoretical value (such as hybrid model management strategies) and practical
value (such as the quality of the low-fidelity model within the co-kriging low-fidelity
model). These novel insights reveal novel research directions at the algorithm level,
at the program level, and at the application level as well (recall Figure 1.4).
However, the scrutiny of these new research directions, i.e., their thorough
and extensive examination, for instance, by comparing with the results in (3.2.1) and
in (3.2.2) and by extending the corresponding database, has to be left for future work.
3.4 In closing
The chapter’s primary purpose has been to provide us with an in-depth elaboration
of this thesis’s key notion surrogate optimization and an in-depth elaboration of the
proposed partitioning of this notion in ch. 1.2 into the three sub-notions: (1) sur-
rogate modeling & simulation, (2) surrogate-based optimization, and (3) surrogate-
guided optimization.
Throughout the elaborations, we have anticipated algebraic tools from the category
theoretical language in ch. 4 such that we have beneficially tagged the various
notions of surrogate optimization with algebraic notes. Similarly to a lubricant, the
algebraic tools enabled us to smoothly operate between the various layers in Fig-
ure 1.4.
Regarding the sub-notion (1) surrogate modeling & simulation, we have initially
discussed an abstract setting in order to introduce relevant notions from the common
methodological and terminological toolbox. We have looked at different classes of
mathematical problems and we have encountered various important terms such as
high-fidelity model and low-fidelity model.
Next, we have defined the high-fidelity function approximation error. In the
discussion about sampling plans, we have encountered three different kinds of sam-
pling plans and their peculiarities. In the literature regarding surrogate optimiza-
tion, to my best knowledge, some sampling plans, such as, e.g., those constructed
by a Sobol quasi-random sequence, are not widely represented, yet. Afterwards, we
have introduced the empirical surrogate modeling error where we have defined the
empirical training error and the empirical generalization error as well.
The empirical generalization error, in particular, is an important indicator within
surrogate optimization and we have presented this error in various guises, for in-
stance, within the k-fold cross-validation method. Another important indicator is
the squared sample Pearson correlation coefficient. We have carved out some not ex-
haustively discussed nuances in the case that this coefficient is being used together
with the empirical generalization error within the k-fold cross validation method.
These nuances gain in significance through the fact that, in the present work, the
number of sampling plan points concerning the high-fidelity model is assumed to
be sparse.
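As a minimal sketch of how these two indicators can be computed together on a sparse sampling plan, the following Python snippet pools the out-of-fold predictions of a k-fold cross-validation and evaluates both a root-mean-square generalization error and the squared sample Pearson correlation coefficient on them. The stand-in function `high_fidelity`, the straight-line surrogate `fit_line`, and the interleaved fold assignment are assumptions made for illustration only; they are not the models or the exact error definitions used in § 3.1.

```python
import math

def high_fidelity(x):
    # Hypothetical stand-in for an expensive model (assumption, not from the thesis).
    return x + 0.3 * math.sin(3.0 * x)

def fit_line(xs, ys):
    """Least-squares straight line; returns a callable surrogate."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: mean_y + slope * (x - mean_x)

def k_fold_predictions(xs, ys, k):
    """Predict every sample with a surrogate trained on the other folds."""
    n = len(xs)
    preds = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, n, k):  # held-out indices of this fold
            preds[i] = model(xs[i])
    return preds

def rmse(ys, preds):
    """Root-mean-square error between true values and predictions."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))

def squared_pearson(a, b):
    """Squared sample Pearson correlation coefficient of two sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov * cov / (var_a * var_b)

xs = [0.5 * i for i in range(8)]          # sparse sampling plan with 8 points
ys = [high_fidelity(x) for x in xs]
cv_preds = k_fold_predictions(xs, ys, k=4)
cv_error = rmse(ys, cv_preds)
cv_r2 = squared_pearson(ys, cv_preds)
```

With only eight samples and four folds, each fold holds out two points, so both indicators rest on very few predictions per fold.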
Furthermore, we have developed a potential link between the sample Pearson
correlation coefficient and a low-fidelity model’s normalized global first-order
sensitivity measures, which has culminated in a cautiously formulated conjecture about
the trustworthiness of low-fidelity models’ normalized global first-order sensitivity
measures.
Subsequently, we have examined deterministic data-fit low-fidelity models, i.e.,
multivariate polynomials and radial basis functions, and probabilistic data-fit low-
fidelity models, that is, kriging low-fidelity models. We have investigated diverse
aspects of these models in order to gain a holistic understanding of these models
and to spot possible pitfalls and a potential room for improvement. Some of the
investigated aspects are: the underlying construction principles of the models, the
computational significance of the dimension of the domain, the numerical relevance
of the evaluation scheme, and a sampling plan’s influence on the condition number
of a problem-representing matrix.
We have closed the subpart (1) surrogate modeling & simulation by an elabora-
tion of simplified-physics low-fidelity models. We have examined the general con-
cept of a user-prescribed hierarchy of problems with regard to the degree of fidelity
and the computational costs. We have concretized this general concept by present-
ing a user-prescribed hierarchy of magnetoquasistatic and magnetostatic problems
which are associated with simplified-physics low-fidelity models. From this hierar-
chy, we have abstracted some diagrams in a loose category-theoretical style. At the
end, we have paved the way for a purely formalization-oriented viewpoint on some
surrogate-guided optimization approaches.
Regarding the sub-notion (2) surrogate-based optimization, we have examined
the optimization with the test functions in § 2.3.3 by data-fit low-fidelity models and
by emulated simplified-physics low-fidelity models. The essential idea is to solve
an optimization problem associated with the low-fidelity model whose optimal so-
lution is utilized as a starting point for the optimization problem associated with the
high-fidelity model.
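The essential idea can be sketched in a few lines of Python. The two quadratic functions below are hypothetical stand-ins (assumptions for illustration, not the test functions of § 2.3.3), and the plain gradient descent with a central-difference gradient merely plays the role of an arbitrary optimizer. The sketch only contrasts the number of high-fidelity evaluations with and without the low-fidelity starting point.

```python
def high_fidelity(x):
    # Hypothetical stand-in for an expensive objective (assumption).
    return (x - 2.0) ** 2

def low_fidelity(x):
    # Hypothetical cheap model with a slightly shifted optimum (assumption).
    return (x - 1.8) ** 2

def gradient_descent(f, x0, step=0.1, tol=1e-8, max_iter=10_000, h=1e-6):
    """Minimise a 1-D function; returns (minimiser, number of f evaluations)."""
    x, evals = x0, 0
    for _ in range(max_iter):
        grad = (f(x + h) - f(x - h)) / (2.0 * h)  # central difference
        evals += 2
        if abs(grad) < tol:
            break
        x -= step * grad
    return x, evals

x_start = -5.0
# Step 1: optimise the cheap model; its evaluations are assumed to be negligible.
x_lf, _ = gradient_descent(low_fidelity, x_start)
# Step 2: use the low-fidelity optimum as the starting point for the expensive model.
x_warm, evals_warm = gradient_descent(high_fidelity, x_lf)
# Reference: optimise the expensive model directly from the original start.
x_cold, evals_cold = gradient_descent(high_fidelity, x_start)
```

In this toy setting, `evals_warm` is smaller than `evals_cold` because the low-fidelity optimum already lies close to the high-fidelity one; how large the saving is in practice depends on how well the low-fidelity model is aligned with the high-fidelity model.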
The proof of work in the form of, e.g., visualizations of the above-mentioned indicators
within surrogate optimization appears valuable since, to the best of my knowledge,
there is a lack of a comprehensive database of corresponding benchmarks. Hence,
the proof of work equips us with a benchmark-focused classification of test func-
tions (more generally, high-fidelity models) whose advantages and disadvantages
we have briefly discussed.
An advantage is the opportunity to classify very roughly the behavior of a cor-
responding optimization problem within the magnetoquasistatic and magnetostatic
context. A disadvantage is that it is not clear whether there exists a reliable complete
list of indicators.
We have closed the subpart (2) surrogate-based optimization by an elaboration
of the proposed procedures SBO-DFLF and SBO-SPLF and their potential combina-
tions.
Regarding the sub-notion (3) surrogate-guided optimization, I have argued from
an application-driven viewpoint that it is worthwhile to check whether the num-
ber of high-fidelity model evaluations regarding a surrogate-based optimization ap-
proach is higher than the number of high-fidelity model evaluations regarding a
surrogate-guided optimization approach. The additional value of such a check is
comprehensible in the context of validation and verification.
Subsequently, we have dwelled on the sequential kriging optimization as a sub-
kind of the model management strategy adaptation, on optimization procedures
within the space-mapping paradigm which are a subkind of the model manage-
ment strategy adaptation, and on the co-kriging optimization which can be seen as
a subkind of the model management strategy fusion.
Concerning the optimization within the space-mapping paradigm, we have utilized
a formalization-oriented viewpoint to pin down properly, e.g., the conceptual
distinction between a low-fidelity model and a surrogate model. Mind that there is
a loss of conceptual information if we solely operate with various representations
of the real numbers such as, e.g., $\mathbb{R}$, $\mathbb{R}^{n}$, and $\mathbb{R}^{n \times m}$ with $n, m \in \mathbb{N}$.
Nevertheless, we have concretized the formal concepts to investigate the syntax
and the semantics of the multivariate scalar-valued use case and the multivariate
vector-valued use case.
Subsequently, we have examined the basic building blocks of a representation of
the Trust Region Aggressive Space Mapping (TRASM) algorithm, that is, the algo-
rithm 3.1. Additionally, we have discussed the basic building blocks of some other
proposed algorithms within the literature about the space-mapping paradigm.
Driven by heuristics, we have formulated a convergence statement that incorpo-
rates some of the above-mentioned indicators that, to the best of my belief, furnishes
us with a novel access to the delicate aspect of convergence-related issues within the
space-mapping paradigm.
At the end, we have elaborated on the proposed procedure SGO-SPLF.
Concerning the co-kriging optimization, we have examined the basic building
blocks for constructing the co-kriging low-fidelity model. A special challenge is the
handling of a sampling plan associated with the high-fidelity model and a sampling
plan associated with a low-fidelity model where the usage of algebraic notions facil-
itates the consideration.
An intriguing novel observation is that sampling plans constructed by a Sobol
quasi-random sequence may help to reduce the overall computational costs of con-
structing a co-kriging low-fidelity model.
We have closed the subpart (3) surrogate-guided optimization by elaborating
on the benefits of a purely formalization-oriented viewpoint that provides us with
novel insights of theoretical value (such as potential hybrid model management strate-
gies) and of practical value (such as convergence-related issues regarding the quality
of the low-fidelity model within the co-kriging low-fidelity model).
Chapter 4
An algebraic modeling framework
using the category theoretical
language for applications in
surrogate optimization
In § 1.3, I have briefly adduced the formal language of category theory as a holistic-structural
approach to mathematics which can serve as a promising mediator between
the tool set from logical analysis and the tool set from numerical analysis.
Furthermore, I have pointed at the potential new opportunity that opens up by em-
ploying the category theoretical language in order to complement the primarily nu-
merical analytic perspective in the context of surrogate optimization.
In § 2.1.2, we have made a detour to a structural perspective on a structural
property; in § 2.2.2, we have made a detour to a structural perspective on another
structural property; and, in § 2.3.2, we have made a detour to a structural perspective
on the objective functions. Hence, I have used these detours to show by examples
that the formal language of category theory is lurking in the background of some
established perspectives on optimization within the electromagnetics context.
In ch. 3, various notions of surrogate optimization have been tagged with alge-
braic notes. In § 3.1.3, especially, algebraic tools from the category theoretical lan-
guage have been anticipated in the elaborations on simplified-physics low-fidelity
models. In § 3.3.2 and in § 3.3.3, some benefits of a formalization-oriented viewpoint
on surrogate-guided optimization have been shown in order to, e.g., recognize hy-
brid model management strategies or formulate heuristics-driven convergence state-
ments.
In the present chapter, let us head further into the research direction of the formal-
ization-oriented viewpoint and aim at strengthening its theoretical foundations.
Firstly, we recapitulate some conceptualities from the previous chapters. More-
over, in addition to the context of validation and verification for the category theoret-
ical language (see, e.g., § 2.3.2), we briefly sketch the emerging research direction of
full automation of surrogate-guided optimization (SGO) and how it can serve as an-
other potential context for the category theoretical language. Afterwards, I concisely
mention some relevant related works.
Secondly, let us introduce the category theory toolset where we focus on core
tools. Initially, we foster some intuition about the toolset and, subsequently, we
apply some rigor in the reasoning. Finally, I illuminate a couple of computational
facets.
Thirdly, the category theory toolset is used for specifying a general optimization
problem and for specifying surrogate-guided optimization methods where the focus
is on methods within the space-mapping paradigm.
Fourthly, I examine other use cases for the category theory toolset related to
high-fidelity models and low-fidelity models relevant to applications in electrical
engineering.
We close the chapter, and thus also the advance in the research direction of the
formalization-oriented viewpoint, at a fork with three open roads for future use
cases.
4.1 Recapitulating and enlarging the contextual landscape
By recollecting some landmarks, let us concisely recapitulate the contextual land-
scape so far (cf. § 1.2). Next, we succinctly illuminate the context of full automa-
tion (recall § 1.1) of surrogate-guided optimization (recall § 3.3). A vague and in-
tuitive idea underlying full automation of SGO – or, in general, the full automation
of surrogate optimization (recall ch. 3) – is that, given an optimization problem by
a user, an ideal software system ascertains the "best" (in some sense) SGO approach
for the optimization problem at hand. Admittedly, the in-depth investigation of this
idea is out of the scope of the present work (cf. the third disclaimer in § 1.3). How-
ever, we at least sketch the potential practical contribution of the formal language of
category theory to the discussion of this idea. We end the section by naming some
relevant related work to our subsequent elaborations.
4.1.1 Recapitulating & the context of full automation of SGO
In engineering applications, the class of surrogate-guided optimization (SGO) methods
is useful for accelerating the numerical search for optimal solutions (see, e.g.,
[70], [116]). A fundamental assumption regarding SGO schemes is that the over-
all computational costs of the numerical optimization are dominated by the costs
of evaluating the objective function (aka high-fidelity function). Since the aim is to
quickly find the high-fidelity function’s optimal solution, the following basic ideas
of SGO methods arise: (1) Approximate the objective function by one or more surro-
gate functions (aka low-fidelity functions) – which capture the high-fidelity function’s
structure and, by design, have much lower evaluation costs than those of the high-
fidelity function; and (2) draw sparingly on the high-fidelity function.
Understandably, a lot of research effort in the field of surrogate-guided optimization
centers on the numerical properties of the interplay between the high-fidelity
function and different kinds of surrogate functions. There are two common
ways to classify surrogate functions: (#1) data-fit, simplified-physics, or projection-based;
and (#2) intrusive or non-intrusive – where intrusive means that there is a need to
modify the numerical software that is underlying the high-fidelity function. Note
that, in general, computational models and non-computational models such as phys-
ical experiments are conceivable.
Despite all the advances in this field, there remains a need for investigating how
to achieve full automation of surrogate-guided optimization methods – as, for in-
stance, it has been commented in [119, p. 1513]: "Full automation of space mapping
and other surrogate-assisted design methods is a necessary condition of widespread
acceptance of such methods by the designers and industry." Furthermore, full au-
tomation is indirectly described by stressing aspects like "ensuring global conver-
gence, immunity to coarse model inaccuracy, as well as robustness with respect to
the surrogate model setup" (cf. [119, p. 1512]). All these aspects are undoubtedly
essential since they are hinting at important common language features underlying
numerical analysis; but it seems unlikely that these common language features alone
can express all the intuitive associations with the idea of full automation. Thus, there
is an opportunity to point the research in this field at a new orthogonal direction,
more precisely, at category theoretical language features.
Imagine the following conceivable realization of full automation: An ideal soft-
ware system selects a surrogate model and chooses the appropriate algorithm for
the given optimization problem – without a user’s intervention. For this thought
experiment, there is a need for a formal language in which one can define the opti-
mization problem, the algorithms and the surrogate models in a suitable way for the
software system because it can only deal with well-defined tasks.
I argue that thinking of full automation in terms of software systems shifts the
analysis from point-by-point considerations of single concrete optimization problems
to a holistic view of the modeling chain. This holistic perspective sheds some
light on the hidden costs of the modeling chain. It also allows one to reassess the role of
the empirically undeniable savings in terms of high-fidelity function evaluations.
Depending on a user’s capabilities and experiences, the error-proneness of self-
implemented surrogate-guided optimization schemes will vary heavily. And even
a lot of testing cannot prevent potential hidden bugs. Therefore, striving for repro-
ducibility, a thorough cost–benefit analysis could lead a user to stick to classical op-
timization schemes provided by commercial or non-commercial software systems,
and to put more trust in solutions found by methods that have stood the test of
time (see, e.g., [35], [158], [142] or [96]).
Let us examine how to reduce the gap between mathematical modeling involved
in surrogate guided-optimization and a software system by the formal language of
category theory which is a holistic-structural approach to mathematics (see, e.g., [11],
[177] or [180]). I admit that, obviously, the restriction to a formal language is a sim-
plification since real-world software systems are vastly more complex and more
deeply anchored in physical machines than any mathematical abstraction or for-
malism could ever capture. Nevertheless, this simplification is reasonable because a
crucial part of a software system is the programming language – and a programming
language is a formal language, too.
Note that programming languages are a vivid research field within the computer
science community. In this context, type theory is an essential theoretical cornerstone
that is related to a practical programming language’s type system. In the elucida-
tions of the present work, solely a working or rudimentary knowledge of type the-
ory and type systems is supposed. For more details on type theory, see, e.g., [88] and
references therein.
The design of a type system heavily determines to which extent it can express
all the properties of a well-defined task: At one end of the spectrum, there are
dynamically-checked languages (such as, e.g., the MATLAB® PL or the Python PL),
where a trend can be seen towards richer type systems for performance reasons
(see, e.g., the Julia PL in [26]); at the other end of the spectrum, there are
statically-checked languages (such as, e.g., the Haskell PL or the Agda¹ PL), where verification
reasons have been an impetus for very rich type systems.
Let us conceive a type as a set equipped with an equivalence relation – but without
committing to one specific technical concept (see more details about various
technical concepts, e.g., in [214]), since, in the present chapter, the focus is on category
theory, not on type theory. But the connection between the type theoretical
language and the category theoretical language can be understood in a very narrow
way by means of the Curry-Howard-Lambek correspondence (see, e.g., [167, p. 59ff]) or
in a broader way by means of a pragmatic mental model in which a category serves
as a model of a (functional) programming language (see, e.g., [16, p. 20-24]).
¹ In [138], the Agda programming language is partially used for investigating partial derivatives
and the corresponding chain rule for multivariate calculus.
Let us put the pragmatic mental model to use such that the category theoreti-
cal language can act as a mediating instance between the mathematical modeling
involved in surrogate-guided optimization and the type-theoretical aspects of a pro-
gramming language. From this point of view, other formalization approaches, e.g.,
Hilbert spaces, manifolds, and similar, can be thought of as domain-specific lan-
guages embedded in the host language (or "general-purpose language") of category
theory (cf. [102]).
The category theoretical language can offer a language-focused comparison of
different surrogate-guided optimization methods – which has a very practical bene-
fit; because, at least in the field of computational electromagnetics, there is a lack
of well-defined equivalence classes of benchmarks that could enable a standard-
ized benchmark-focused comparison. For instance, the popular T.E.A.M. (Testing
Electromagnetic Analysis Methods) problems do not seem to be used much in the
research field of surrogate-guided optimization.
Aiming at strengthening the theoretical foundations of the formalization-orien-
ted viewpoint on surrogate optimization, one main concern in this chapter is the
development of a novel comparison toolset for surrogate-guided optimization meth-
ods by explicitly category theoretical (CT) language features.
I illustrate the CT approach by discussing the space mapping paradigm’s basic
building blocks (recall § 3.3.2) within the frame of model management strategies (re-
call § 1.2).
Furthermore, I depict the usefulness of the CT approach with regard to formal-
ization use cases within the electromagnetics context related to simplified-physics
low-fidelity models and related to transformations – such as, e.g., coordinate trans-
formations – of a high-fidelity model and a low-fidelity model.
4.1.2 Relevant related work
To my best knowledge, in the literature on surrogate-guided optimization in electri-
cal engineering, formalization issues have not been addressed exhaustively. There-
fore, the selection of the following articles aims at creating a context in order to make
the added value of this chapter’s contribution comprehensible.
In [166], the authors try to organize various numerical methods in the fields of
uncertainty propagation, statistical inference, and optimization by means of the con-
cept of multifidelity model management. Applying this concept to the field of optimiza-
tion corresponds to the class of surrogate-guided optimization methods.
Originating as one correction methodology for surrogate functions, the space
mapping notion led to a very ramified family of surrogate-guided optimization
methods like, e.g., space mapping (see, e.g. [14]), manifold mapping (see, e.g. [56]),
and many others (see a survey, e.g., in [126]).
In [166], the authors identify correction methodologies like space mapping or the
first-order approximation and model management optimization (AMMO) paradigm
(see, e.g., [4]) as members of one class of multifidelity model management strategies:
adaptation; i.e., during the optimization process, the low-fidelity model is adapted by
high-fidelity model’s information.
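To make the notion of adaptation more tangible, here is a minimal Python sketch of one possible adaptation loop: at each iterate, the cheap model is corrected additively so that its gradient agrees with the high-fidelity gradient at that point, and the corrected model is then minimized. The two quadratic models, the function names, and the step sizes are assumptions chosen for illustration; the sketch is in the spirit of first-order corrections and is not the space mapping or AMMO algorithm as published.

```python
def hf_grad(x):
    # Gradient of a hypothetical expensive model hf(x) = (x - 2)^2 (assumption).
    return 2.0 * (x - 2.0)

def lf_grad(x):
    # Gradient of a hypothetical cheap model lf(x) = 1.5 (x - 1.8)^2 (assumption).
    return 3.0 * (x - 1.8)

def minimise_cheap(grad, x0, step=0.1, tol=1e-12, max_iter=100_000):
    """Gradient descent on a cheap function given by its gradient."""
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x

def adapted_optimisation(x0, outer_iters=40, tol=1e-10):
    """Re-align the cheap model with the expensive one at each iterate."""
    x, hf_calls = x0, 0
    for _ in range(outer_iters):
        # First-order correction: make the surrogate gradient match hf at x.
        shift = hf_grad(x) - lf_grad(x)
        hf_calls += 1  # one high-fidelity gradient call per outer iteration
        x_new = minimise_cheap(lambda z: lf_grad(z) + shift, x)
        converged = abs(x_new - x) < tol
        x = x_new
        if converged:
            break
    return x, hf_calls

x_opt, hf_calls = adapted_optimisation(x0=-5.0)
```

Although the uncorrected cheap model has its optimum at 1.8, the adapted iteration approaches the high-fidelity optimum at 2.0, using one high-fidelity gradient call per outer iteration while all inner minimization work is done on the cheap model.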
In [123], the authors present an automated low-fidelity model selection based
on correlation analysis between low- and high-fidelity models for surrogate-guided
design optimization of antennas.
Various approaches to automated algorithm selection based on machine learn-
ing techniques are discussed in [113]. Category theoretical approaches to machine
learning and its underlying mathematical concepts (like Bayesian probability) have
been conducted in, e.g., [68] or [52]. In [58], automatic differentiation – a key concept
in machine learning techniques – is discussed in a functional language environment.
Using the idea of object-oriented coding, the authors in [111] discuss how to
implement finite element software systems that imitate the mathematical structures
of Maxwell’s equations. The emphasis on the mathematical structures appears very
fruitful in the realm of computational electromagnetism (see, e.g., [205], [174], [8],
[28], [32]).
In [65], the author discusses various category theoretical ideas in the realm of
general software engineering that are beyond the scope of the present work that
limits itself to ideas in programming languages (see, e.g., [219]).
In order to justify certain statistical modeling approaches, the author in [148]
offers a precise mathematical definition of a statistical model by the language of
category theory.
4.2 Category theory toolset
Since the formal language of category theory (CT) operates at an even higher level
of abstraction than the languages such as functional analysis or differential geom-
etry (recall ch. 2), the first step is to foster some intuition regarding the category
theoretical language. From an application-driven viewpoint, this intuition is valu-
able in order to better comprehend the nature of the problems where the category
theoretical language shines, and therefore to better anticipate its potential practical
benefits.
The subsequent step is to apply some rigor to the intuitive reasoning about cat-
egory theory. The corresponding elaborations rely only on elementary notions of
category theory, more precisely, no deep theorems of category theory are applied.
It is primarily used as a strong and stable notational scaffolding, especially, by dia-
grams of arrows (see, e.g., [144, p. 1ff]).
At the end, some computational facets regarding the category theoretical lan-
guage are illuminated.
4.2.1 Fostering some intuition
In order to harness partly the high level of abstraction regarding the category theo-
retical language, it is useful to recall occasionally some common concrete perspec-
tives on categories.
From one perspective, a category can be viewed as a kind of algebraic structure
like a group or a vector space. A vector space models linearity, a group models
symmetry, and a category models composition.
Another perspective is to consider a category as a mathematical context. In the
domain of linear algebra, for instance, the language of matrix dimensions and ma-
trices and the language of vector spaces and linear maps describe essentially the
same underlying structure and properties of this domain that can be encoded by an
equivalence of the corresponding categories.
Category theory emphasizes the structure-preserving maps between objects rather
than the objects themselves; e.g., it emphasizes the linear maps between
vector spaces rather than the vector spaces themselves. This observation is reflected in a
third perspective in which a category $A$ encodes a syntax, another category $B$
encodes a semantics, and the structure-preserving map $F$ from $A$ to $B$ encodes an
interpretation of $A$ within $B$ (see, e.g., [136, p. 166ff]). The map $F : A \to B$ is called a
functor; and, most abstractly, it is a tool for comparing categories (see, e.g., [36]).
Inferring from the second and the third perspective, a functor can also be regarded
as an interpretation of one mathematical context within another. A different
interpretation can be encoded in a different functor $G : A \to B$. The need for a
comparison of the two interpretations $F$ and $G$ is covered by the notion of a natural
transformation.²
\begin{equation}
\begin{tikzcd}
A \arrow[r, bend left=50, "F", ""{name=U, below}] \arrow[r, bend right=50, "G"', ""{name=D}] & B
\arrow[Rightarrow, from=U, to=D, "\alpha"]
\end{tikzcd}
\tag{4.1}
\end{equation}
The diagram of arrows in (4.1) unites the very basic tools of category theory: categories
($A$, $B$), functors ($F$, $G$), and natural transformations ($\alpha$).
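A small executable illustration of these three tools, written as a hedged Python sketch: the list construction and the "optional value" construction act as two functors on (an informal category of) Python values and functions, and `safe_head` plays the role of a natural transformation between them. All names are chosen for illustration; here both functors map one and the same category to itself, and `None` is assumed never to occur as a list element.

```python
def fmap_list(f):
    """Action of the list functor on a function f."""
    return lambda xs: [f(x) for x in xs]

def fmap_optional(f):
    """Action of the optional functor on f; None encodes 'no value'."""
    return lambda m: None if m is None else f(m)

def safe_head(xs):
    """Component of a natural transformation from lists to optionals."""
    return xs[0] if xs else None

def naturality_square_commutes(f, xs):
    # Going "map then head" must equal going "head then map".
    return safe_head(fmap_list(f)(xs)) == fmap_optional(f)(safe_head(xs))

checks = [naturality_square_commutes(lambda n: n * n, xs)
          for xs in ([], [3], [2, 5, 7])]
```

The check expresses the naturality condition pointwise on a few sample inputs; it is a test on examples, not a proof.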
4.2.2 Applying some rigor
All following definitions build upon first-order logic and a primordial concept of a
set, i.e., no specific axiomatic system of set theory is used. In addition, if one wants to
express a proposition, let us use the set membership symbol "∈"; and if one wants to
express a judgment, let us use the type annotation symbol "∶". For instance, "one is an
element of the natural numbers" is a judgment, hence, one would write "1 ∶N". Simi-
larly, in a statically-checked language (recall § 4.1.1), "one is an element of the integer
type" is a judgment, not a proposition; hence, one would write "1 ∶Int". Regarding
further logical technicalities (hierarchy of universes, Grothendieck universes, axiom
of choice, and similar), I refer to, e.g., [214] and references therein.
Let us provide a definition of a category by emphasizing its three constituting
parts (data, structure, laws) as a mathematical entity.
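Definition 4.2.1 (Category). A category $C$ is constituted by
• data:
  – a collection $\mathrm{obj}(C)$ of objects $X, Y, Z, \ldots$
  – $\forall X, Y : \mathrm{obj}(C)$ $\exists$ a set $\hom_C(X,Y)$ of morphisms (or arrows or "structure-preserving" maps) $f, \tilde{g}, \tilde{\tilde{h}}, \ldots$
    * notation (general): $f : \hom_C(X,Y)$, $f : Y^{X}$,
    * notation (specific): $f : X \to Y$, $X \xrightarrow{f} Y$,
    * all $\hom_C(X,Y)$ are pairwise disjoint;
• structure:
  – $\mathrm{dom}(f)$ is the domain $X$ of morphism $f$,
    $\mathrm{cod}(f)$ is the codomain $Y$ of morphism $f$,
  – $\forall X : \mathrm{obj}(C)$ $\exists$ an identity morphism $\mathrm{id}_X : \hom_C(X,X)$,
  – $\forall f : \hom_C(X,Y)$ $\forall g : \hom_C(\mathrm{cod}(f),Z)$
    $\exists$ a composite morphism $g \circ f : \mathrm{dom}(f) \to \mathrm{cod}(g)$;
• laws:
  – $\forall f : \hom_C(X,Y).\ \mathrm{id}_Y \circ f \equiv f \equiv f \circ \mathrm{id}_X$ (unity),
  – $\forall f : \hom_C(X,Y)$ $\forall g : \hom_C(\mathrm{cod}(f),Z)$
    $\forall h : \hom_C(\mathrm{cod}(g),W)$ $\exists\, h \circ g \circ f : \mathrm{dom}(f) \to \mathrm{cod}(h).$
    $h \circ (g \circ f) \equiv (h \circ g) \circ f$ (associativity).
² Expressing formally the notion of natural transformation for applications in the field of algebraic
topology is mostly considered as the starting point of category theory (see, e.g., [177, p. 1f]).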
Remark 4.2.1. The notion "structure-preserving" originates from considerations of struc-
tured sets and the structure-preserving functions between them. But morphisms do not
necessarily have to be structure-preserving functions – or functions at all.
Remark 4.2.2. By simply turning around all the arrows of a category $C$, one can define the
dual or opposite category $C^{\mathrm{op}}$ that embodies the duality principle.
Remark 4.2.3. Addressing the issue concerning the size of a category, a category is called
small if both $\mathrm{obj}(C)$ and $\hom_C(X,Y)$ are sets. Moreover, let us suppose primarily locally
small categories such that, at least, for all pairs of objects $X, Y$, the collection of morphisms
between them, i.e., $\hom_C(X,Y)$, is a set – called a hom-set.
Some representations or implementations of categories are:
• Set (the category of (finite) sets and set functions),
• Top (the category of topological spaces and continuous maps),
• $\mathbf{Man}^{\infty}$ (the category of smooth manifolds and $\infty$-times continuously differentiable
maps),
• $\mathbf{Vect}^{\mathrm{fd}}_{k}$ (the category of finite dimensional vector spaces over the field $k$ and
$k$-linear maps),
• $\mathbf{TVect}_{k}$ (the category of topological vector spaces over the topological field $k$
and continuous $k$-linear maps),
• $\mathbf{Vect}^{\mathrm{basis}}_{k}$ (the category of finite dimensional vector spaces over the field $k$ with
chosen basis and $k$-linear maps), and
• $\mathbf{Mat}_{k}$ (the category of non-zero natural numbers [matrix dimensions $(m,n)$]
and $m \times n$-matrices with values in the field $k$).
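To connect Definition 4.2.1 with the last item of this list, the following Python sketch models a matrix category in which objects are matrix dimensions, a morphism from n to m is an m×n matrix, composition is matrix multiplication, and identities are identity matrices. The class and function names are hypothetical, and integer entries are an assumption made to keep the arithmetic exact (the list above speaks of entries in a field k). The sketch checks the unity and associativity laws on one example.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Matrix:
    """Morphism n -> m in the sketch: an m x n integer matrix stored row-wise."""
    rows: Tuple[Tuple[int, ...], ...]

    @property
    def dom(self) -> int:   # number of columns
        return len(self.rows[0])

    @property
    def cod(self) -> int:   # number of rows
        return len(self.rows)

def identity(n: int) -> Matrix:
    """Identity morphism on the object n."""
    return Matrix(tuple(tuple(int(i == j) for j in range(n)) for i in range(n)))

def compose(g: Matrix, f: Matrix) -> Matrix:
    """Composite g after f, defined only if cod(f) equals dom(g)."""
    if f.cod != g.dom:
        raise TypeError("cod(f) must equal dom(g)")
    return Matrix(tuple(
        tuple(sum(g.rows[i][k] * f.rows[k][j] for k in range(f.cod))
              for j in range(f.dom))
        for i in range(g.cod)))

f = Matrix(((1, 2), (3, 4), (5, 6)))   # f : 2 -> 3
g = Matrix(((1, 0, -1),))              # g : 3 -> 1
h = Matrix(((2,), (7,)))               # h : 1 -> 2

unity_holds = compose(identity(f.cod), f) == f == compose(f, identity(f.dom))
associativity_holds = compose(h, compose(g, f)) == compose(compose(h, g), f)
```

The `TypeError` raised for mismatched dimensions mirrors the requirement in the definition that a composite exists only when the codomain of the first morphism equals the domain of the second.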
Bear in mind that the category’s definition encompasses merely its specification,
i.e., its minimal amount of essential characteristics; thus, various real encounters,
e.g., inverse maps or products, lead to new features – like a category $C$ with isomorphisms.
In the subsequent elaborations, it is supposed that the maps $f$ and $g$ are ad hoc
polymorphic or overloaded. Thus, let us invoke the maps $f$ and $g$ with different signatures.
Notice well that isomorphisms are special morphisms that allow to encode the
idea of sameness of objects. A morphism f∶X→Yis called an isomorphism if and
only if there exits a morphism g∶Y→Xsuch that, in equational form, f○g=idY
136 Chapter 4. An algebraic modeling framework using the category theoretical
language for applications in surrogate optimization
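and $g \circ f = \mathrm{id}_X$. In diagrammatic form, one can choose, as an example, the subsequent representation:
[Diagram (4.2): a commutative square whose top and bottom arrows are $f \colon X \to Y$, whose vertical arrows are $\mathrm{id}_X \colon X \to X$ and $\mathrm{id}_Y \colon Y \to Y$, and whose diagonal is $\exists g \colon Y \to X$; both triangles commute ($\circlearrowright$).] (4.2)
where the symbol $\circlearrowright$ is used to denote a commutative diagram. Furthermore, the objects $X$ and $Y$ are called isomorphic (symbolically: $X \cong_C Y$). If the objects are isomorphic, then they are indistinguishable; hence, it is possible to substitute one object with the other.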
A morphism whose domain is identical with its codomain, that is, a morphism $f \colon X \to X$ such that
$\mathrm{dom}(f) \equiv \mathrm{cod}(f)$ (4.3)
is called an endomorphism. An endomorphism which is also an isomorphism (see the diagram in (4.2)) is called an automorphism.
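One can construct a notable example for isomorphic objects by providing a category $C$ with product objects $A \times B$. More precisely, given $A, B \colon \mathrm{obj}(C)$, then the object $A \times B \colon \mathrm{obj}(C)$ equipped with a pair of morphisms $\pi_A \colon A \times B \to A$ and $\pi_B \colon A \times B \to B$ is a (binary) product object if and only if it satisfies a universal mapping property (UMP), i.e.,
$\forall P \colon \mathrm{obj}(C).\ \forall f \colon P \to A.\ \forall g \colon P \to B.\ \exists! h \colon P \to A \times B$ (4.4a)
such that the subsequent diagram commutes:
[Diagram (4.4b): $P$ at the top with the arrows $f \colon P \to A$, $\exists! h \colon P \to A \times B$, and $g \colon P \to B$; below, $A \xleftarrow{\pi_A} A \times B \xrightarrow{\pi_B} B$; both triangles commute ($\circlearrowright$), i.e., $\pi_A \circ h = f$ and $\pi_B \circ h = g$.] (4.4b)
Let us refer to the morphism $h$ as the (binary) product of the morphisms $f$ and $g$ (symbolically: $h := \langle f, g \rangle$). Finally, assuming ternary product objects in the category Set, a notable example for isomorphic objects is
$\forall A, B, C \colon \mathrm{obj}(\mathrm{Set}).\ (A \times B) \times C \cong_{\mathrm{Set}} A \times (B \times C)$. (4.5)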
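As a rough illustration of the universal mapping property in (4.4) and of the isomorphism in (4.5), the sketch below uses Python tuples as product objects of sets. The helper names (`pair`, `proj_a`, `proj_b`, `reassociate`, `reassociate_inv`) and the sample maps are hypothetical and chosen only for this example.

```python
def proj_a(ab):
    """Projection pi_A : A x B -> A."""
    return ab[0]


def proj_b(ab):
    """Projection pi_B : A x B -> B."""
    return ab[1]


def pair(f, g):
    """Return h = <f, g> : P -> A x B with h(p) = (f(p), g(p))."""
    return lambda p: (f(p), g(p))


def reassociate(t):
    """One direction of (A x B) x C ≅ A x (B x C)."""
    (a, b), c = t
    return (a, (b, c))


def reassociate_inv(t):
    """The inverse direction A x (B x C) -> (A x B) x C."""
    a, (b, c) = t
    return ((a, b), c)


# Hypothetical maps f : P -> A and g : P -> B on a small finite set P.
P = range(5)
f = lambda p: p % 2
g = lambda p: str(p)
h = pair(f, g)

# Both triangles of (4.4b) commute on every element of P.
triangles_commute = all(proj_a(h(p)) == f(p) and proj_b(h(p)) == g(p) for p in P)

# The two re-bracketing maps are mutually inverse on a sample element.
sample = ((1, "x"), 2.0)
round_trip_ok = reassociate_inv(reassociate(sample)) == sample
```

The uniqueness part of the UMP is not tested here; it follows because any `h` making both triangles commute is forced to return `(f(p), g(p))`.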
In order to show that a parallel pair of morphisms $f, g \colon X \rightrightarrows Y$ is equal, i.e.,
$f =_{\tilde{C}} g$, (4.6)
one needs a category $\tilde{C}$ with a terminal object $1$. In a category $\tilde{C}$, an object is terminal if and only if for all objects $X \colon \mathrm{obj}(\tilde{C})$, there exists a unique morphism $X \to 1$. Invoking the duality principle (see Remark 4.2.2), there is an initial object $O$ as well. In a category $\tilde{C}$, an object is initial if and only if for all objects $X \colon \mathrm{obj}(\tilde{C})$, there exists a unique morphism $O \to X$.
Mind that, in category theory, there is no global set-membership relation; thus, the idea of an element "$x \in X$" is encoded in a map $x$ such that $x \colon 1 \to X$ or $1 \xrightarrow{x} X$. Hence, if for all morphisms $1 \xrightarrow{x} X$, the equation $f \circ x =_{1 \to Y} g \circ x$ holds, then $f =_{\tilde{C}} g$. If the morphisms are equal, then they are indistinguishable; hence, it is possible to substitute one morphism with the other.
This kind of test for equality w.r.t. (4.6) corresponds to the common extensional
equality of functions (recall Remark 2.2.2) in which one treats functions as black-boxes
such that one only considers their input-output behavior. It is important to note
that this equality problem is algorithmically decidable only if the domain is finite
and relatively small. If one is also interested in how the output is calculated, i.e.,
the particular "formulas" of the functions, one needs to consider intensional equality
of functions. Finally, if one wants to know whether two functions point to the exact
same instance in computer memory, one needs to invoke the notion of referential
equality of functions.
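The difference between extensional and referential equality can be sketched in a few lines of Python. The two doubling functions and the finite test domain below are assumptions for illustration only; intensional equality (comparing the "formulas") is deliberately not covered by this sketch.

```python
def extensionally_equal(f, g, domain) -> bool:
    """Black-box test: f and g agree on every element of a finite domain."""
    return all(f(x) == g(x) for x in domain)


def double_by_multiplication(n: int) -> int:
    return 2 * n


def double_by_addition(n: int) -> int:
    return n + n


# The test is decidable here only because the domain is finite and small.
finite_domain = range(-1000, 1001)

same_behavior = extensionally_equal(
    double_by_multiplication, double_by_addition, finite_domain
)

# Referential equality asks whether both names point to the same object in memory.
same_object = double_by_multiplication is double_by_addition
```

Here the two functions are extensionally equal on the chosen domain although they are neither the same object nor given by the same formula.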
Observe that the terminal object $1$, if it exists, is unique only up to isomorphism. But, more importantly, it belongs to the objects (like the product object) that follow the principle of universality. Hence, the definition of objects such as the terminal object is based on a universal mapping property that is similar to (4.4) where the object’s existence and uniqueness is related to all other objects of the category.
Given a category $C$ and a category $D$, one can define a map from $C$ to $D$ that preserves the structure of the category $C$ (recall Definition 4.2.1) within the category $D$. This structure-preserving map between two categories is called a functor.
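Definition 4.2.2 (Functor). Given a category $C$ and a category $D$, a functor $F \colon C \to D$ is constituted by
• a map on the data of $C$ and $D$, i.e., a functor’s assignment rule reads as
– $\forall X \colon \mathrm{obj}(C).\ \exists F(X) \colon \mathrm{obj}(D)$
– $\forall f \colon \mathrm{hom}_C(X, Y).\ \exists F(f) \colon \mathrm{hom}_D(\tilde{X}, \tilde{Y})$;
• that preserves the structure of $C$ within $D$, i.e., the functor laws read as
– $\tilde{X} \equiv F(X)$, $\tilde{Y} \equiv F(Y)$
– $\forall X \colon \mathrm{obj}(C).\ F(\mathrm{id}_X) \equiv \mathrm{id}_{F(X)}$
– $\forall g \circ f \colon \mathrm{hom}_C(\mathrm{dom}(f), \mathrm{cod}(g)).\ F(g) \circ F(f) \equiv F(g \circ f)$.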
Remark 4.2.4. The map $F$ is ad hoc polymorphic or overloaded in the sense that its signature $C \to D$ is accompanied by the signature $\mathrm{obj}(C) \to \mathrm{obj}(D)$ and the signature $\mathrm{hom}_C(X, Y) \to \mathrm{hom}_D(\tilde{X}, \tilde{Y})$.
Remark 4.2.5. Observe that the defined functors are called covariant in order to distinguish them from contravariant functors $F \colon C^{\mathrm{op}} \to D$ – with $C^{\mathrm{op}}$ being the opposite category (see Remark 4.2.2) – where the direction of the arrows in the domain-category $C^{\mathrm{op}}$ is swapped in the codomain-category $D$.
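A familiar programming instance of a covariant functor is the list construction: it sends a type $X$ to lists over $X$ and a function $f$ to the element-wise map. The sketch below checks the two functor laws on sample data; the names `fmap`, `compose`, and `identity` and the sample functions are assumptions for illustration, and the check covers only the chosen samples.

```python
def identity(x):
    """Identity morphism; it works for any object, including lists."""
    return x


def compose(g, f):
    """Return g ∘ f (apply f first)."""
    return lambda x: g(f(x))


def fmap(f):
    """Morphism part of the list functor: lift f : X -> Y to lists over X."""
    return lambda xs: [f(x) for x in xs]


# Hypothetical morphisms f : int -> int and g : int -> str.
f = lambda n: n + 1
g = lambda n: str(n)
sample = [1, 2, 3]

# F(id_X) ≡ id_F(X)
preserves_identity = fmap(identity)(sample) == identity(sample)

# F(g) ∘ F(f) ≡ F(g ∘ f)
preserves_composition = (
    compose(fmap(g), fmap(f))(sample) == fmap(compose(g, f))(sample)
)
```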
The aforementioned definitions of a category and a functor appear a bit cumbersome, but they unfold their power when we shift from this algebraic to a geometric description. If we consider a category as a directed multigraph equipped with an algebra of paths – the identity morphisms corresponding to 0-paths, the morphisms corresponding to 1-paths, and the composition corresponding to 2-paths – then one can consider a functor as a graph-morphism that preserves paths. For the sake of brevity, let us not give categorical definitions of concepts such as graph. For more category theoretical details regarding these concepts, I refer to, e.g., [144], [11] or [177] and references therein.
Let us assume two finite categories $A$ and $B$. Note that a finite category has only a finite number of objects, identity arrows, and non-identity arrows. Furthermore, let us suppose a functor $F \colon A \to B$. Hence, the diagram in (4.7) illustrates the geometric view regarding categories and functors.
[Diagram (4.7): on the left, the category $A$ with the objects $X$, $Y$, $Z$, the identity arrows $\mathrm{id}_X$, $\mathrm{id}_Y$, $\mathrm{id}_Z$, and the arrows $f$, $g$, and $g \circ f$; on the right, its image in $B$ with the objects $F(X)$, $F(Y)$, $F(Z)$ and the arrows $F(\mathrm{id}_X)$, $F(\mathrm{id}_Y)$, $F(\mathrm{id}_Z)$, $F(f)$, $F(g)$, and $F(g \circ f)$; the functor $F$ points from the left part to the right part.] (4.7)
Moreover, the diagram in (4.7) elucidates geometrically that all functors preserve isomorphisms, that is, if the commutative diagrams in (4.2) exist, then all functors preserve these commutative diagrams – in the sense that, if the objects $X, Y \in \mathrm{obj}(A)$ are isomorphic, then
$X \cong_A Y \implies F(X) \cong_B F(Y)$. (4.8)
Although all functors preserve isomorphisms, it is not necessarily true that they reflect
isomorphisms, i.e., an isomorphism in the functor’s codomain does not mean neces-
sarily that the corresponding morphism in the functor’s domain is an isomorphism.
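The preservation statement in (4.8) follows directly from the functor laws in Definition 4.2.2. As a short worked step, assume an isomorphism $f \colon X \to Y$ with inverse $g \colon Y \to X$ as in (4.2); then $F(g)$ is an inverse of $F(f)$:

```latex
\begin{align*}
F(g) \circ F(f) &\equiv F(g \circ f) = F(\mathrm{id}_X) \equiv \mathrm{id}_{F(X)}, \\
F(f) \circ F(g) &\equiv F(f \circ g) = F(\mathrm{id}_Y) \equiv \mathrm{id}_{F(Y)}.
\end{align*}
```

The converse direction is not available in general, since an inverse of $F(f)$ in the codomain-category need not be of the form $F(g)$.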
Functors can have various attributes. Two very useful examples are forgetful functors, which forget all or only some of the algebraic structure such as, e.g.,
$F \colon \mathrm{Vect}^{\mathrm{fd}}_k \to \mathrm{Set}$, (4.9a)
$G \colon \mathrm{Vect}^{\mathrm{basis}}_k \to \mathrm{Vect}^{\mathrm{fd}}_k$, (4.9b)
and faithful functors $F \colon C \to D$, which possess the defining property
for all $X, Y \colon \mathrm{obj}(C)$, the map $F \colon \mathrm{hom}_C(X, Y) \to \mathrm{hom}_D(F(X), F(Y))$ is injective.³ (4.10)
Forgetful and faithful functors are very useful since they help to pin down the concept of a structured set, i.e., a set equipped with extra structure; hence, a structured set is an object of a category $C$ equipped with a faithful functor $F \colon C \to \mathrm{Set}$. Note that, in the setting of structured sets, the category $C$ is called a concrete category. Some examples of concrete categories are the abovementioned categories Top, $\mathrm{Man}^{\infty}$, and $\mathrm{Vect}^{\mathrm{fd}}_k$. The category $\mathrm{Mat}_k$ is not a concrete category.
Recalling § 4.2.1 for the case of structured sets, one can observe that categories such as Top, $\mathrm{Man}^{\infty}$, or $\mathrm{Vect}^{\mathrm{fd}}_k$ encode a syntax and the category Set provides a semantics. A faithful functor between a syntax category and a semantics category encodes an interpretation of a syntax category within a semantics category.
If one wants to compare two different interpretations, then one needs a proper
notion of comparing two different functors. Hence, let us define properly the notion
of an arrow between two functors, i.e., let us define a natural transformation.
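Definition 4.2.3 (Natural transformation). Given a category $C$, a category $D$, a functor $F \colon C \to D$, and a functor $G \colon C \to D$, then a natural transformation $\alpha \colon F \Rightarrow G$ comprises
• the components of $\alpha$ at $X$, i.e., a family of morphisms $\alpha_X$ in $D$:
– $\forall X \colon \mathrm{obj}(C)$ $\exists \alpha_X \colon F(X) \to G(X)$
• such that $\forall f \colon \mathrm{hom}_C(X, Y)$, i.e., $X \xrightarrow{f} Y$,
– $\alpha_Y \circ F(f) = G(f) \circ \alpha_X$ in $D$ or
– the diagram
\[
\begin{array}{ccc}
F(X) & \xrightarrow{\ \alpha_X\ } & G(X) \\
{\scriptstyle F(f)}\Big\downarrow & \circlearrowright & \Big\downarrow{\scriptstyle G(f)} \\
F(Y) & \xrightarrow{\ \alpha_Y\ } & G(Y)
\end{array}
\]
commutes in $D$.
³ Regarding the signature of $F$, see Remark 4.2.4.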
Remark 4.2.6. The lack of naturality (i.e., the lack of a natural transformation) is an even
more interesting observation than the existence of naturality – because, in the former case,
there is a need for an "unnatural choice of basis" in some sense that is motivated by the
relation between a finite-dimensional vector space and its dual and its double dual.
Remark 4.2.7. If we apply the geometric view such as in (4.7), then one can conceive a
natural transformation as a comparison of constructed paths.
Remark 4.2.8. A natural transformation $\alpha \colon F \Rightarrow G$ is a natural isomorphism (symbolically: $\alpha \colon F \cong G$) if and only if all components $\alpha_X$ are isomorphisms. Let us omit writing the corresponding category, that is, the so-called functor category $D^C$ where the objects are the functors $F \colon C \to D$, $G \colon C \to D$, ..., and the morphisms are natural transformations $\alpha \colon F \Rightarrow G$. Hence, a natural isomorphism can symbolically be written as $\alpha \colon F \cong_{D^C} G$.
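The naturality condition $\alpha_Y \circ F(f) = G(f) \circ \alpha_X$ can be probed on sample data. In the sketch below, both $F$ and $G$ are taken to be the list construction (an assumption for illustration), list reversal plays the role of $\alpha$, and sorting serves as a counterexample that fails the naturality square. All names are hypothetical, and the checks only cover the chosen samples.

```python
def fmap(f):
    """Morphism part of the list functor: lift f to act element-wise."""
    return lambda xs: [f(x) for x in xs]


def reverse(xs):
    """Candidate component alpha_X : List(X) -> List(X)."""
    return list(reversed(xs))


def naturality_square_commutes(alpha, f, xs) -> bool:
    """Check alpha_Y ∘ F(f) == G(f) ∘ alpha_X on one sample list xs."""
    return alpha(fmap(f)(xs)) == fmap(f)(alpha(xs))


samples = [[], ["a"], ["ab", "c", "def"]]

# Reversal commutes with element-wise maps on all samples.
reverse_is_natural = all(naturality_square_commutes(reverse, len, xs) for xs in samples)

# Reversal is its own inverse, so every tested component is an isomorphism.
reverse_is_iso = all(reverse(reverse(xs)) == xs for xs in samples)

# Sorting depends on an order of the elements and fails the square for negation.
negate = lambda n: -n
sorting_is_natural = naturality_square_commutes(sorted, negate, [1, 2])
```

The failure of `sorted` reflects that sorting needs an extra choice (an order) which arbitrary maps do not respect.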
Using the notion of natural isomorphisms in order to define the concept of equiv-
alence of categories, we are able to tackle the question of when two categories are
essentially the same.
Definition 4.2.4 (Equivalence of categories). An equivalence of categories $C$, $D$ (symbolically: $C \simeq D$) is constituted by
• functors $F \colon C \rightleftarrows D \colon G$ and
• natural isomorphisms $\eta \colon \mathrm{id}_C \cong G \circ F$ and $\epsilon \colon F \circ G \cong \mathrm{id}_D$.
Remark 4.2.9. Analogously to the identity morphism in Definition 4.2.1, the functors $\mathrm{id}_C$ and $\mathrm{id}_D$ denote the identity functors corresponding to the categories $C$ and $D$, respectively.
Remark 4.2.10. The functor $G$ is called the inverse functor to the functor $F$ if and only if there is an equivalence of categories.
Remark 4.2.11. If two categories are equivalent, then they are indistinguishable regarded
as categories, hence, it is possible to substitute one category with the other.
One can state stronger forms of sameness, e.g., one can state equality $C = C$; or one can replace the demand for natural isomorphisms in Definition 4.2.4 by the demand for $F$ to be an isomorphism regarding the category Cat, i.e., the category of (locally) small categories (recall Remark 4.2.3) where the objects are (locally) small categories $C$, $D$, ..., and the morphisms are functors $F \colon C \to D$. Hence, a functor $F \colon C \to D$ is called an isomorphism w.r.t. Cat if and only if there exists a morphism $G \colon D \to C$ such that $\mathrm{id}_C = G \circ F$ and $F \circ G = \mathrm{id}_D$.
However, these two stronger forms of sameness are less helpful since they restrict
unnecessarily the available expressive power. For instance, the second form tells
us that, by going around through F and G, we will land at the exact same starting
point. In most circumstances, though, we will land at a spot resembling our starting
point; and this idea of resembling is encoded by demanding natural isomorphisms
in Definition 4.2.4.
There is also a weaker form of sameness that replaces the natural isomorphisms in the definition above by natural transformations ($\mathrm{id}_C \overset{\eta}{\Longrightarrow} G \circ F$ and $F \circ G \overset{\epsilon}{\Longrightarrow} \mathrm{id}_D$) and some coherence conditions (triangle identities). This process leads to the notion of adjunction.
Regarding linear algebra (recall § 4.2.1), the important equivalence of categories is captured by
$\mathrm{Vect}^{\mathrm{fd}}_k \simeq \mathrm{Mat}_k$ (for any field $k$). (4.11)
For more details regarding the proof of (4.11), I refer to [177, p. 33]. Remember that, in the setting of structured sets, the category $\mathrm{Vect}^{\mathrm{fd}}_k$ is a concrete category and the category $\mathrm{Mat}_k$ is not a concrete category. However, the statement in (4.11) regards $\mathrm{Vect}^{\mathrm{fd}}_k$ and $\mathrm{Mat}_k$ solely through the lens of a category (recall Definition 4.2.1).
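The diagram in (4.12) depicts a possible interplay of some aforementioned categories.
[Diagram (4.12): functors labeled A, B, C, D, E, F, G, H, and J between the categories $\mathrm{Vect}^{\mathrm{basis}}_k$, $\mathrm{Mat}_k$, $\mathrm{Vect}^{\mathrm{fd}}_k$, Set, $\mathrm{TVect}_k$, and Top.] (4.12)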
Notice well that most of the functors in (4.12) are forgetful. However, the functor D is forgetful, faithful, and full. Full functors $F \colon C \to D$ possess the defining property
for all $X, Y \colon \mathrm{obj}(C)$, the map $F \colon \mathrm{hom}_C(X, Y) \to \mathrm{hom}_D(F(X), F(Y))$ is surjective. (4.13)
In general, a diagram such as in (4.12) is not necessarily commutative.
4.2.3 Computational facets
The notion of computation in the category theoretical language depends on the chosen context, thereby reflecting the observation that the point of view of programming languages on computation differs from that of differential geometry or functional analysis.
In a programming language setting, (typed) λ-calculus is a model of computation. A typed λ-calculus (see, e.g., [15]) constitutes, directly or indirectly, the core of functional and imperative programming languages.⁴
In a differential geometric setting, a connection to computation is established
by moving from the manifold level to the chart level – and hence ultimately, to the
field of real numbers and matrices over the field of real numbers. Observe that, for
instance, the authors in [117] give a category theory oriented exposition of classical
differential geometry.
In a functional analytic setting, one has a concept of "a space approximating an-
other space" and a notion of error, but ultimately, the connection to computation is
established by moving to the field of real numbers and matrices over the field of real
numbers.
Observe that approximating is related to iterating a function; and iterating a function is related to composing functions. Furthermore, observe that, in general, the notion of error is formalized in the context of complete normed vector spaces, more precisely, Banach spaces. Notice that there is a well-designed connection between Banach space theory and category theory (see, e.g., [43]). Finally, observe that there is a common tension arising in theories using real numbers caused by the application to machines – which, inevitably, resort to floating point arithmetic – and a leap of faith that the theory’s high-level properties still hold on the machine. Moreover, there is a tension between a theory’s high-level data structures and their programming language counterparts (cf. [133]).
⁴ For some applications of λ-calculus with connection to electromagnetics, I refer to, e.g., [138].
Relating the category theoretical language to programming languages, one ob-
servation is the correspondence between a cartesian closed category (CCC) and a typed
λ-calculus. The defining properties specifying a CCC are the existence of a terminal
object, a product object, and an exponential object. For a definition of an exponen-
tial object and a cartesian closed category, see, e.g., [167, p. 33f] and [167, p. 53 - 57],
respectively.
Another observation is that a category can serve as a model for a functional programming language. Note that, in addition to exploiting categorical definitions for guidance on how to organize software, in recent times, the extension of functional programming languages with dependent types (see, e.g., the Agda PL or the Idris PL) enables categorical definitions to be implemented directly in software.
For technical details regarding dependent types, I refer to, e.g., [214]. However, recalling Definition 4.2.1, an instance of a dependent type is the hom-set type $\mathrm{hom}_C(X, X)$ since its definition depends on the value $X$ of the obj type $\mathrm{obj}(C)$. Mostly, though, dependent types are used to encode the quantifiers "∀" and "∃". Although implementing categorical definitions directly in software is a promising approach, it is still a very young approach (see, e.g., [72]); thus, its industrial application is not yet widespread.
Relating the category theoretical language to matrix-focused environments, the key observation is the equivalence in (4.11) which has to be adapted for a computational context in the sense that
$\mathrm{Vect}^{\mathrm{fd}}_k \simeq \mathrm{Mat}_k$ (for any computable field $k$). (4.14)
Thus, the category $\mathrm{Mat}_k$ is used as a computational model for the category $\mathrm{Vect}^{\mathrm{fd}}_k$. Moreover, observe that the categories $\mathrm{Vect}^{\mathrm{fd}}_k$ and $\mathrm{Mat}_k$ are examples of so-called Abelian categories (see, e.g., [144, ch. VIII Abelian Categories] or [71]) in which the notion of adding morphisms (and objects, respectively) exists.
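To indicate how $\mathrm{Mat}_k$ can act as a computational model, the sketch below treats natural numbers as objects and exact rational matrices as morphisms, with matrix multiplication as composition. It assumes the convention that an $m \times n$ matrix is a morphism from $n$ to $m$, and it uses the rationals (via Python's `fractions.Fraction`) as one example of a computable field; both choices, like the helper names, are assumptions and not taken from the thesis.

```python
from fractions import Fraction


def matrix(rows):
    """Build a matrix over the rationals from nested lists of integers."""
    return [[Fraction(entry) for entry in row] for row in rows]


def identity(n):
    """Identity morphism on the object n: the n x n identity matrix."""
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def compose(b, a):
    """Return the matrix of b ∘ a, i.e., the product b·a (apply a first)."""
    inner = len(a)
    if any(len(row) != inner for row in b):
        raise ValueError("column count of b must equal row count of a")
    return [
        [sum(b[i][k] * a[k][j] for k in range(inner)) for j in range(len(a[0]))]
        for i in range(len(b))
    ]


def add(a, b):
    """Entry-wise sum of two parallel morphisms (matrices of the same shape)."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


a = matrix([[1, 2], [3, 4], [5, 6]])  # 3 x 2 matrix: a morphism 2 -> 3
b = matrix([[1, 0, 1]])               # 1 x 3 matrix: a morphism 3 -> 1

composite = compose(b, a)             # 1 x 2 matrix: a morphism 2 -> 1
unity_holds = compose(identity(3), a) == a == compose(a, identity(2))
```

The function `add` hints at the additive structure on parallel morphisms mentioned for Abelian categories; exact rational arithmetic sidesteps the floating point issues discussed in this subsection.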
For instance, the authors in [143] utilize the category $\mathrm{Mat}_k$ in order to develop a simple categorical type system for concrete linear algebra. Another example is the CAP project (cf. [168], [86]), which is devoted to enabling computing within a category, i.e., to calculating objects, morphisms, etc. of a given representation of a category via a computational model such as, e.g., the category $\mathrm{Mat}_k$. The CAP project encompasses various software packages implemented in GAP (a system for computational discrete algebra). Note that GAP does not support dependent types.
In the zoo of dynamically-checked programming languages, there is, e.g., the new, still developing, Julia PL package Catlab.jl (see [163]) that is part of the larger open source software project called AlgebraicJulia which focuses on developing category theoretical approaches for technical computing. For more details on this software project, visit https://www.algebraicjulia.org/. However, the expressiveness of the Julia PL’s type system is too limited to fully encode the formal language of category theory.
Finally, let us address the general question of error in connection with category theory – which is inevitably related to numerical analysis where, classically, a distinction is made between modeling error, approximation error, and numerical error.
In order to handle numerical errors due to, e.g., number representations as floating point numbers, there are methods, not yet widespread, such as interval arithmetic (see, e.g., [213], [104]) to produce reliably verified results.
In order to deal with approximation errors – that is, errors due to discrete representations of continuous representations – there is, as already mentioned above, the well-oiled machinery of functional analysis, which would be invaluable even if one had computers with exact arithmetic.
Mostly, the weakest link in the chain of errors is the modeling error – because it is most difficult to formally capture this error, which is a result of representing a physical problem as a mathematical problem.
And if we consider all kinds of errors, then a CT approach can be expected to have the greatest impact on the modeling error. This consideration is plausible
since there is a take on category theory as a "mathematical model of mathematical
modeling" (see, e.g., [197]); and, hence, it can provide in some sense a consistency
test for the modeling. For instance, in differential geometry, both a linear map and a
bilinear map can be represented by a matrix but the matrix associated with the linear
map transforms differently under a change of coordinates than the matrix associated
with the bilinear map. This important information would go unnoticed by merely
focusing on the matrix representation.
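The difference between the two transformation rules can be written out explicitly. As a sketch under the assumption of a single invertible change-of-coordinates matrix $P$ with $x = P x'$ (the symbols $A$, $B$, and $P$ are introduced here only for illustration), a linear map and a bilinear map on the same space transform as follows:

```latex
\begin{align*}
x = P x' &\;\Longrightarrow\; A' = P^{-1} A P
  && \text{(linear map } x \mapsto A x\text{)}, \\
x = P x' &\;\Longrightarrow\; B' = P^{\top} B P
  && \text{(bilinear map } (u, v) \mapsto u^{\top} B v\text{)}.
\end{align*}
```

The two rules coincide only if $P^{-1} = P^{\top}$, i.e., for orthogonal changes of coordinates, which is why the matrix alone does not reveal which kind of map it represents.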
It is still too early, though, to expect from a CT approach w.r.t. the modeling error something similar to the so-called fundamental theorem of numerical analysis (see, e.g., [7]).
However, in the upcoming section, driven by heuristics, I propose and briefly discuss a means of quantifying the modeling error by employing a problem-dependent degree of forgetfulness.
4.3 Using the CT toolset for SGO methods
In the area of optimization, it is difficult to provide a generally accepted taxonomy of the numerous solution methods. However, if we apply an engineering-driven pragmatism as a classifier, then surrogate-guided optimization methods have gained traction as an outstanding sub-area over the last decades (recall § 4.1).
I argue, though, that using unique category theoretical (CT) language features (recall § 4.2) can help to find a better match for pragmatic notions such as model hierarchies or fidelity – in order to capture more rigorously their intended meaning. By focusing on structure-related issues, CT offers formal methods that complement the usual comparison toolset for SGO methods (recall § 4.1). Due to its tight connection to (functional) programming, the CT approach also suggests a blueprint for a software design in which the focus is on the specification (or the interface), which is kept conceptually separated from the implementation.
Let us first devote ourselves to specifying a general optimization problem. Sub-
sequently, we dedicate ourselves to specifying surrogate-guided optimization meth-
ods.
4.3.1 Specifying a general optimization problem
Recalling § 2.3, let us assume the objects $X$, $Y$, and $Z$ within a category $C$; then we prescribe the $Z$-valued objective function as
$J = (y, x) \mapsto z \colon Y \times X \to Z$, (4.15)
where $x \colon X$ denotes the control (or input) variable, $y \colon Y$ denotes the state (or intermediate) variable, and $z \colon Z$ denotes the output variable. Mind that we use the barred arrow notation "$\mapsto$" for the internal representation of a function and the straight arrow notation "$\to$" for the external representation of a function. Notice well that the CT approach exploits primarily the external representation.
In (4.15), it is commonly assumed that there exists a unique control-to-state map $f$ such that
$f = x \mapsto y \colon X \to Y$. (4.16)
Hence, one can write $J(f(x), x)$ or, assuming that $J$ is ad hoc polymorphic or overloaded, one can write shortly $J(f(x))$. One can generalize the statements in (4.15) and in (4.16) in the sense that one can make the assignment $Y := B^A$ (see § 4.2.2) in order to highlight the search for a function of type $A \to B$. Furthermore, one can make the assignment $X := X_1 \times X_2 \times \dots \times X_n$ and/or $Z := Z_1 \times Z_2 \times \dots \times Z_n$ to emphasize the arity of the input-object and the output-object.
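The external representations in (4.15) and (4.16) can be mirrored by plain function signatures. The following sketch uses real numbers for $X$, $Y$, and $Z$ and invents a toy control-to-state map and objective; these concrete formulas, as well as the names `pair_with_identity` and `J_f`, are assumptions for illustration and carry no physical meaning.

```python
def f(x: float) -> float:
    """Hypothetical control-to-state map f : X -> Y."""
    return x * x


def J(y: float, x: float) -> float:
    """Hypothetical objective J : Y x X -> Z, penalizing state and control."""
    return (y - 1.0) ** 2 + 0.1 * x ** 2


def pair_with_identity(control_to_state):
    """Return <f, id_X> : X -> Y x X, i.e., x |-> (f(x), x)."""
    return lambda x: (control_to_state(x), x)


def compose(g, h):
    """Return g ∘ h (apply h first)."""
    return lambda x: g(h(x))


# Composite objective J_f : X -> Z, written as J ∘ <f, id_X>.
J_f = compose(lambda yx: J(*yx), pair_with_identity(f))
```

For instance, `J_f(1.0)` evaluates `J(f(1.0), 1.0)`, so the state is computed from the control before the objective is applied.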
Recalling the statements in (3.9) and in (3.10), let us prescribe the $X$-valued minimizer function argmin that reads as
$\operatorname{argmin} = J_f \mapsto x^* \colon (X \to Z) \to X$, (4.17)
where $x^* \colon X$ denotes the minimizer of the composite function $J_f$ with $J_f \equiv J \circ f$ and $J \circ f \colon \mathrm{hom}_C(X, Z)$.
If we invoke set-builder notation and if we utilize an order structure on $Z$, one can define the set of optimal solutions $X^*$ as
$X^* := \{ x^* \colon X \mid \forall x \colon X.\ J(x^*) \le J(x) \}$, (4.18)
and one can define the set of corresponding optimal co-domain values $Y^*$ as
$Y^* := \{ J(x) \colon Z \mid x \in X^* \}$, (4.19)
which is encoded in the minimizer proposition
$\text{minimize } J(x)$. (4.20)
If we use an order structure on $X$, one can define a set of admissible solutions $X_F$ due to, e.g., box constraints, as
$X_F := \{ x \colon X \mid \forall x_l, x_u \colon X.\ x_l \le x \le x_u \}$. (4.21)
Supposing a differentiability structure for the maps $J$ and $f$, gradient- or Hessian-exploiting solution methods can be utilized.
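For a finite candidate set, the sets in (4.18), (4.19), and (4.21) can be computed by exhaustive search. The sketch below is a brute-force illustration only: the function names, the integer grid, and the toy objective are assumptions, and `argmin` here returns the whole set $X^*$ rather than a single minimizer as in (4.17).

```python
def argmin(objective, candidates):
    """Return the set X* of all minimizers of `objective` over finite `candidates`."""
    candidates = list(candidates)
    if not candidates:
        raise ValueError("the candidate set must not be empty")
    best = min(objective(x) for x in candidates)
    return {x for x in candidates if objective(x) == best}


def box(candidates, lower, upper):
    """Admissible set: keep the candidates x with lower <= x <= upper."""
    return [x for x in candidates if lower <= x <= upper]


# Hypothetical objective with two global minimizers on an integer grid.
objective = lambda x: abs(abs(x) - 2)
grid = range(-5, 6)

all_minimizers = argmin(objective, grid)                     # the set X*
optimal_values = {objective(x) for x in all_minimizers}      # the set of optimal values
constrained_minimizers = argmin(objective, box(grid, 0, 5))  # minimizers within the box
```

The example shows that $X^*$ may contain more than one element and that box constraints can single out one of them.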
Finally, let us apply the diagrammatic notation from the CT toolset (see, e.g., the diagram in (4.7)) as a graphical tool for the compact representation of the specification’s basic building blocks (4.15), (4.16), and (4.17).
\[
\begin{aligned}
& X \xrightarrow{\langle f, \mathrm{id}_X \rangle} Y \times X \xrightarrow{J} Z \\
& X \xrightarrow{f} Y \xrightarrow{J} Z \\
& Z^X \xrightarrow{\operatorname{argmin}} X
\end{aligned}
\tag{4.22}
\]
Each object in (4.22) additionally carries its identity arrow ($\mathrm{id}_X$, $\mathrm{id}_{Y \times X}$, $\mathrm{id}_Y$, $\mathrm{id}_Z$, $\mathrm{id}_{Z^X}$).
Bear in mind that the statements in (4.18), in (4.19), in (4.20), and in (4.21) are consid-
ered as implicitly encoded in the compact representation in (4.22). Furthermore, the
diagrammatic notation in (4.22) is primarily only a graphical tool and not a graph-
ical language. In order to have a graphical language, i.e., to enable diagrammatic
reasoning (see, e.g. [1] or [45]), completeness theorems are necessary that match the di-
agrammatic representation with the algebraic representation. However, these kinds
of considerations are left for future investigations.
Nevertheless, the abstract specification by means of the CT approach is indis-
pensable since it keeps the objects and morphisms conceptually separated from a
possible implementation, e.g., as sets and set functions in a category Set. And, gen-
erally, it is conceivable that there is no prevailing way of implementation in different
categories. For more details on the method of categorical definition as a kind of ab-
stract specification and the utility of keeping the specification separated from the
implementation, see, e.g., [16].
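Hereinafter, let us suppose the commonly defined sequential composition such that the corresponding laws of a category are satisfied (recall Definition 4.2.1). Hence, if we consider the three individual branches in (4.22) as depictions of three finite categories $A$, $B$, $C$ (similarly to (4.7)), then an interaction between these three finite categories can be conducted by functors. Given, as an example, the functors $F \colon A \to B$, $G \colon B \to C$, and $H \colon A \to C$, one can adapt in a simplistic manner the diagrams in (4.22) to a version that can be drawn as
\[
\begin{aligned}
A \colon\quad & X \xrightarrow{\langle f, \mathrm{id}_X \rangle} Y \times X \xrightarrow{J} Z \\
B \colon\quad & X \xrightarrow{f} Y \xrightarrow{J} Z \\
C \colon\quad & Z^X \xrightarrow{\mathrm{id}_{Z^X}} Z^X \xrightarrow{\operatorname{argmin}} X
\end{aligned}
\tag{4.23}
\]
where the functor $F$ points from the first branch to the second branch, the functor $G$ from the second branch to the third branch, and the functor $H$ from the first branch to the third branch.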
Especially due to the observation of the differentiation operator as a functor (see,
e.g., [177, p. 14f]), there are already some emerging applications of CT ideas in opti-
mization (see, e.g., [191]). Hence, diagrams such as in (4.23) furnish us with a fresh
and novel perspective on a general optimization problem.
Between the first branch (A) and the second branch (B) in (4.23), a forgetful functor $F$ – such as, e.g., in (4.9) – can be deployed that forgets the product object. Interestingly, one cannot provide an appropriate functor in order to show an equivalence of categories (recall Definition 4.2.4). According to [177, p. 30f], a functor $F \colon C \to D$ has to be faithful (such as in (4.10)), full (such as in (4.13)), and essentially surjective on objects in order to define an equivalence of categories, that is, $C \simeq D$. Mind that a functor $F \colon C \to D$ that is essentially surjective on objects possesses the defining property
for all $X \colon \mathrm{obj}(D)$, there exists $\tilde{X} \colon \mathrm{obj}(C)$ such that $X \simeq_D F(\tilde{X})$.⁵ (4.24)
In conclusion, one cannot treat the first branch (A) and the second branch (B) in (4.23) as indistinguishable regarded as categories and, therefore, one cannot substitute one with the other.
If we look at the arrow $Y \times X \xrightarrow{J} Z$ isolated within the corresponding category (A) and if we provide appropriate functors $F$ and $G$, one can observe a natural isomorphism $\alpha \colon F \cong G$ (recall Remark 4.2.8) that relates the arrow $Y \times X \xrightarrow{J} Z$ to another arrow $Y \xrightarrow{\hat{J}} Z^X$. In computer science, such a specific natural isomorphism $\alpha \colon F \cong G$ is called currying (see, e.g., [177, p. 54]). For more elaborations regarding the constructions behind currying, I refer to, e.g., [167, p. 33f] or [177, p. 129].
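In programming terms, currying turns a two-argument function $J \colon Y \times X \to Z$ into a function $\hat{J} \colon Y \to Z^X$ that returns a function. The minimal sketch below shows both directions; the names `curry` and `uncurry` and the toy objective are assumptions for illustration.

```python
def curry(J):
    """Turn J : Y x X -> Z into J_hat : Y -> (X -> Z)."""
    return lambda y: lambda x: J(y, x)


def uncurry(J_hat):
    """Turn J_hat : Y -> (X -> Z) back into J : Y x X -> Z."""
    return lambda y, x: J_hat(y)(x)


# Hypothetical objective on integers.
J = lambda y, x: 10 * y + x
J_hat = curry(J)

# Fixing the state y = 3 yields an element of Z^X, i.e., a function X -> Z.
objective_for_state_3 = J_hat(3)

# Going back and forth returns a function with the same input-output behavior.
round_trip_ok = all(
    uncurry(curry(J))(y, x) == J(y, x) for y in range(4) for x in range(4)
)
```

The round-trip check reflects, on a finite sample, that the two directions are mutually inverse up to extensional equality.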
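Exploiting the information regarding the first branch (A) and the third branch (C) in (4.23), let us suppose a category $D$ in which one can draw a diagram such that
$X \xrightarrow{f} Y \xrightarrow{\hat{J}} Z^X \xrightarrow{\operatorname{argmin}} X$, (4.25)
where each object additionally carries its identity arrow ($\mathrm{id}_X$, $\mathrm{id}_Y$, $\mathrm{id}_{Z^X}$).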
Observe that performance-related issues regarding a general optimization problem boil down to a matter of executing the composition in (4.25) as efficiently as possible. Recalling Figure 1.4, the CT approach helps therefore with organizing concisely
the needed knowledge at the level of generalized functions which precedes the level
of associated algorithms which in turn precedes the level of programs that are cor-
responding implementations in a programming language. In informal parlance, one
can say that programs are a subpart of algorithms and algorithms are a subpart of
generalized functions. For more details, I refer to, e.g., [225].
Obviously, the CT approach is too abstract to utilize the standard analysis toolset; but it enables a complementary, useful rigor in reasoning by highlighting an adequate specification.
For the sake of technicalities, recall the category Cat, which is the category of (locally) small categories as objects and functors as morphisms (see § 4.2.2). Notably, the CT approach then undertakes a meaningful shift in perspective by lifting the classical set-oriented modeling paradigm "(set) functions as models" onto the category-oriented modeling paradigm: "Categories as models" and "functors as model transformations".
4.3.2 Specifying surrogate-guided optimization methods
At the level of generalized functions (see Figure 1.4), surrogate-guided optimization (SGO) methods try to address the aforementioned performance-related issue by introducing a fidelity notion. Inevitably, this adds complexity to the inherent complexity of the general optimization problem (see the diagram in (4.25)) by forcing the use of various models that exhibit some kind of fidelity, that is, some kind of indexing.
⁵ Regarding the signature of $F$, see Remark 4.2.4.
In a nutshell, I argue that there are two kinds of fidelity: an order-oriented fidelity and a hierarchy-oriented fidelity. Both kinds are based on a user-defined indexing driven by the idea of a "best fit" to physics’ semantics; but the hierarchy-oriented fidelity demands additionally a meaningful limit definition. The hierarchy-oriented fidelity is at the core of multi-level methods, the order-oriented fidelity is at the core of multi-fidelity methods. In the remainder, I will dwell exclusively on the multi-fidelity methods.
To give an example of the "best fit" idea, let us use a semantics of electromagnetics to interpret the syntactical considerations in the previous subsection – especially in (4.15): the control variable $x$ as geometric parameters, the state variable $y$ as a current density vector field, the map $f$ as an assignment to a solution of a well-posed boundary value problem (WPBVP), and the map $J$ as a loss map. Solving the WPBVP numerically allows one, e.g., to associate a "best fit" model with the highest degree of discretization and a "low fit" model with a low degree of discretization – hence, let us call the latter a low-fidelity model and the former a high-fidelity model.
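The link between the degree of discretization and fidelity can be mimicked with a deliberately simple stand-in. The sketch below does not solve a boundary value problem; instead, it assumes a toy control-to-state map given by the integral of $\sin(x t)$ over $t \in [0, 1]$, approximated by the trapezoidal rule with a coarse or a fine grid. All names, the integrand, and the cell counts are hypothetical choices for illustration.

```python
import math


def control_to_state(x: float, cells: int) -> float:
    """Trapezoidal approximation of the integral of sin(x*t) over t in [0, 1]."""
    h = 1.0 / cells
    inner = sum(math.sin(x * i * h) for i in range(1, cells))
    return h * (0.5 * math.sin(0.0) + inner + 0.5 * math.sin(x))


def exact_state(x: float) -> float:
    """Closed-form value of the same integral (valid for x != 0)."""
    return (1.0 - math.cos(x)) / x


def high_fidelity(x: float) -> float:
    """Stand-in for the 'best fit' model: fine discretization."""
    return control_to_state(x, cells=1000)


def low_fidelity(x: float) -> float:
    """Stand-in for the 'low fit' model: coarse discretization."""
    return control_to_state(x, cells=4)


x = 2.0
high_error = abs(high_fidelity(x) - exact_state(x))
low_error = abs(low_fidelity(x) - exact_state(x))
```

In this toy setting, the low-fidelity model is cheaper to evaluate and less accurate, which is the trade-off that multi-fidelity methods try to exploit.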
Exploiting the category-oriented paradigm shift mentioned above in § 4.3.1, there are two options for representing a model. For the sake of conciseness, let us omit the identity arrows in the depictions:
$X \xrightarrow{f} Y \xrightarrow{J} Z$ or $X \xrightarrow{J_f} Z$. (4.26)
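The option to choose depends on the information available. If we select, for example, the first model representation as our high-fidelity model $M_1$, then one can ascribe a low-fidelity model $M_2$ to the high-fidelity model by using a functor $F$.
\[
\begin{array}{ccccc}
X & \xrightarrow{\ f\ } & Y & \xrightarrow{\ J\ } & Z \\
 & & \Big\downarrow{\scriptstyle F} & & \\
F(X) & \xrightarrow{F(f)} & F(Y) & \xrightarrow{F(J)} & F(Z)
\end{array}
\tag{4.27}
\]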
Given the functor $F \colon M_1 \to M_2$, one can interpret (recall § 4.2.1) the high-fidelity model $M_1$ within the low-fidelity model $M_2$. Hence, the CT approach gives a clear mathematical meaning to the colloquial idea in SGO methods that the models of variable fidelity should share some structural similarity.
Moreover, it establishes a good level of abstraction to encompass formally all
kinds of models – and the models’ interconnection – which are conceivable in SGO
methods (recall § 4.1).
In particular, the CT approach offers guidance to develop formalized guarantees regarding the models in SGO methods. For instance, if two models $K$ and $L$ are given, then one can analyze their relationship by studying various functors originating in $\mathrm{hom}_{\mathrm{Cat}}(K, L)$. To quantify the modeling error (see § 4.2.3), one can employ a problem-dependent degree of forgetfulness (DoFF). E.g., if two or three forgetful functors from $\mathrm{TVect}_k$ towards Set (cf. the diagram in (4.12)) are provided, then one can assign the value two or three to the degree of forgetfulness from $\mathrm{TVect}_k$ to Set. Thus, in this case, the modeling error associated with $\mathrm{TVect}_k$ would be two or three. It can be stipulated that a larger number reflects a greater knowledge of the model; and therefore a larger number indicates a more trustworthy model.
By using a new functor G, one can construct a new low-fidelity model M3 from
the low-fidelity model M2. Likewise, this construction can be interpreted as an
assignment of the new low-fidelity model M3 to the high-fidelity model M1 by using
a composite functor G○F.
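\[
\begin{array}{ccccc}
X & \xrightarrow{\;f\;} & Y & \xrightarrow{\;J\;} & Z \\
 & & \Big\downarrow{\scriptstyle F} & & \\
F(X) & \xrightarrow{\;F(f)\;} & F(Y) & \xrightarrow{\;F(J)\;} & F(Z) \\
 & & \Big\downarrow{\scriptstyle G} & & \\
G(F(X)) & \xrightarrow{\;G(F(f))\;} & G(F(Y)) & \xrightarrow{\;G(F(J))\;} & G(F(Z))
\end{array}
\qquad G \circ F \colon \text{first row} \to \text{third row}
\tag{4.28}
\]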
One can also directly ascribe a new low-fidelity model M4 to the high-fidelity
model M1 by using a functor H. Note that if, in addition, a functor G̃ is provided
such that G̃ ≇ G and G̃ is not defined as G, and if we set H ∶= G̃○F, then we have a
situation similar to the previous diagram in (4.28); mind that it is not the same
situation, though. Thus, the low-fidelity models M4 and M3 are still distinguishable
as models.
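\[
\begin{array}{ccccc}
F(X) & \xrightarrow{\;F(f)\;} & F(Y) & \xrightarrow{\;F(J)\;} & F(Z) \\
 & & \Big\uparrow{\scriptstyle F} & & \\
X & \xrightarrow{\;f\;} & Y & \xrightarrow{\;J\;} & Z \\
 & & \Big\downarrow{\scriptstyle H} & & \\
H(X) & \xrightarrow{\;H(f)\;} & H(Y) & \xrightarrow{\;H(J)\;} & H(Z)
\end{array}
\qquad \tilde{G} \colon \text{first row} \to \text{third row}
\tag{4.29}
\]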
For M4 and M3 to be regarded as indistinguishable, one has to test for the
equivalence M4 ≃ M3 (recall Definition 4.2.4) by furnishing adequate functors
U ∶ M4 ⇄ M3 ∶ V. Generally, this equivalence test is a good initial tool whenever an
arbitrary new model is stated.
Further practical considerations include, for instance, the issue of how
to check for equivalence by normalization. Furthermore, admittedly, a weaker form
of equivalence may sometimes be more appropriate (recall § 4.2.2).
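\[
\begin{array}{ccccc}
F(X) & \xrightarrow{\;F(f)\;} & F(Y) & \xrightarrow{\;F(J)\;} & F(Z) \\
 & & \Big\uparrow{\scriptstyle F} & & \\
X & \xrightarrow{\;f\;} & Y & \xrightarrow{\;J\;} & Z \\
 & & \Big\downarrow{\scriptstyle \tilde{G} \circ F} & & \\
\tilde{G}(F(X)) & \xrightarrow{\;\tilde{G}(F(f))\;} & \tilde{G}(F(Y)) & \xrightarrow{\;\tilde{G}(F(J))\;} & \tilde{G}(F(Z)) \\
 & & {\scriptstyle U}\Big\downarrow\Big\uparrow{\scriptstyle V} & & \\
G(F(X)) & \xrightarrow{\;G(F(f))\;} & G(F(Y)) & \xrightarrow{\;G(F(J))\;} & G(F(Z))
\end{array}
\qquad \tilde{G} \colon \text{first row} \to \text{third row}
\tag{4.30}
\]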
One can see that diagrams such as (4.30) can be scaled arbitrarily while
maintaining the interpretability which facilitates correct reasoning about the
corresponding algorithms and programs. Note that the inherent complexity at
the level of generalized functions propagates through the level of algorithms to the
level of programs (see Figure 1.4), and it increases at each level since each level
adds some level-specific complexity such as programming language-dependent
features.
Observe that, as instances for the presented diagrams such as in (4.27), one can
apply the category Set with appropriate additional properties and one can make
use of the corresponding Set-valued functors, that is, functors whose
codomain is the category Set. However, the level of abstraction of the
CT approach also allows us to apply, for example, a category Graph of graphs and
graph homomorphisms. For more details on such graph-related categories, see,
e.g., the references regarding the diagrams in (4.7).
Such a category Graph is a reasonable choice in practical terms since a
programming language's compiler operates on abstract syntax trees. Moreover, regarding
the intrusiveness property (see § 4.1.1) assigned to the low-fidelity models in a
traditional setting, this new context suggests assigning this property to the involved
functors instead.
If we choose the second model representation in (4.26) as our high-fidelity model
M1, then one can proceed analogously to the other choice. For the sake of clarity, let
us omit the parentheses.
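\[
\begin{array}{ccc}
X & \xrightarrow{\;Jf\;} & Z \\
 & \Big\downarrow{\scriptstyle F} & \\
FX & \xrightarrow{\;FJf\;} & FZ \\
 & \Big\downarrow{\scriptstyle G} & \\
GFX & \xrightarrow{\;GFJf\;} & GFZ
\end{array}
\qquad G \circ F \colon \text{first row} \to \text{third row}
\tag{4.31}
\]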
These previous considerations provide a useful supplement to the concept of
multifidelity model management (see § 4.1.2). Using the CT approach, one can interpret
this concept’s basic building block as having "a finite category Cmmm with morphisms
as models". Again, one can sketch two cases depending on the available information.
To illustrate the two cases, let us choose one high-fidelity model (index 0) and two
low-fidelity models (index 1 and index 2, respectively).
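\[
\begin{array}{l}
X_0 \xrightarrow{\;f_0\;} Y_0 \xrightarrow{\;J_0\;} Z_0 \\
X_1 \xrightarrow{\;f_1\;} Y_1 \xrightarrow{\;J_1\;} Z_1 \\
X_2 \xrightarrow{\;f_2\;} Y_2 \xrightarrow{\;J_2\;} Z_2
\end{array}
\qquad\qquad
\begin{array}{l}
X_0 \xrightarrow{\;Jf_0\;} Z_0 \\
X_1 \xrightarrow{\;Jf_1\;} Z_1 \\
X_2 \xrightarrow{\;Jf_2\;} Z_2
\end{array}
\tag{4.32}
\]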
This concept mostly requires that there are isomorphisms (recall the diagram
in (4.2)) between the input objects ($X_0 \cong_C X_1$, $X_0 \cong_C X_2$, and $X_1 \cong_C X_2$), the intermediate
objects ($Y_0 \cong_C Y_1$, $Y_0 \cong_C Y_2$, and $Y_1 \cong_C Y_2$), and the output objects ($Z_0 \cong_C Z_1$, $Z_0 \cong_C Z_2$,
and $Z_1 \cong_C Z_2$) of the models. One can substitute one object for another if they are
isomorphic.
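\[
X_0 \;\overset{f_0,\,f_1,\,f_2}{\Rrightarrow}\; Y_0 \;\overset{J_0,\,J_1,\,J_2}{\Rrightarrow}\; Z_0
\qquad\qquad
X_0 \;\overset{Jf_0,\,Jf_1,\,Jf_2}{\Rrightarrow}\; Z_0
\tag{4.33}
\]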
The notion of sameness of models is encoded by the equality of a parallel pair
of morphisms (such as in (4.6)), for instance, Jf0, Jf1 ∶ X0 ⇉ Z0, or Jf0, Jf2 ∶ X0 ⇉ Z0,
or Jf1, Jf2 ∶ X0 ⇉ Z0. For this purpose, one needs to presume the existence of the
terminal object 1 within the given category.
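A minimal Python sketch of this sameness test, assuming an implementation in the category Set with a small finite input object: two parallel arrows are equal exactly when they agree on every element, and an element of X0 is modeled as an arrow from a one-point set. The concrete functions and the helper names `same_model`, `element`, and `evaluate` are made up for illustration.

```python
# Finite stand-in for the input object X0 (illustrative values).
X0 = [0, 1, 2, 3]

def Jf0(x):
    """Stands in for the high-fidelity model X0 -> Z0."""
    return x * x

def Jf1(x):
    """A low-fidelity model that agrees with Jf0 on all of X0."""
    return x ** 2

def Jf2(x):
    """A low-fidelity model that differs from Jf0 on some elements of X0."""
    return 3 * x

def same_model(g, h, domain):
    """Parallel arrows g, h: X0 -> Z0 are equal iff they agree on every element."""
    return all(g(x) == h(x) for x in domain)

def element(x):
    """An element of X0, encoded as an arrow 1 -> X0 from a one-point set."""
    return lambda _unit=(): x

def evaluate(model, point):
    """Precompose the model with the arrow 1 -> X0 and read off the value."""
    return model(point())
```

The role of the terminal object shows up in `element`: comparing two models pointwise amounts to comparing their composites with all arrows 1 → X0.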
Note that the semantic link between the models in (4.32) is carried by
notions such as model evaluation costs and correlation coefficients, whereas in
diagrams such as (4.30), the semantic link between the models is primarily carried
by functors.
To examine the space mapping notion (see § 4.1.2), let us also interpret its basic
building blocks as having "a finite category Csm with morphisms as models". Then,
one can observe that it makes use of the intermediate objects.6
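\[
\begin{array}{l}
X_0 \xrightarrow{\;f_0\;} Y_0 \xrightarrow{\;J_0\;} Z_0 \\
X_1 \xrightarrow{\;f_1\;} Y_1 \xrightarrow{\;J_1\;} Z_1 \\
X_2 \xrightarrow{\;f_2\;} Y_2 \xrightarrow{\;J_2\;} Z_2
\end{array}
\qquad\qquad
\begin{array}{ccccc}
X_0 & \xrightarrow{\;f_0\;} & Y_0 & \xrightarrow{\;J_0\;} & Z_0 \\
{\scriptstyle p_{01}}\Big\downarrow\Big\uparrow{\scriptstyle \tilde{p}_{01}} & & {\scriptstyle r_{01}}\Big\downarrow\Big\uparrow{\scriptstyle \tilde{r}_{01}} & & {\scriptstyle o_{01}}\Big\downarrow\Big\uparrow{\scriptstyle \tilde{o}_{01}} \\
X_1 & \xrightarrow{\;f_1\;} & Y_1 & \xrightarrow{\;J_1\;} & Z_1 \\
{\scriptstyle p_{12}}\Big\downarrow\Big\uparrow{\scriptstyle \tilde{p}_{12}} & & {\scriptstyle r_{12}}\Big\downarrow\Big\uparrow{\scriptstyle \tilde{r}_{12}} & & {\scriptstyle o_{12}}\Big\downarrow\Big\uparrow{\scriptstyle \tilde{o}_{12}} \\
X_2 & \xrightarrow{\;f_2\;} & Y_2 & \xrightarrow{\;J_2\;} & Z_2
\end{array}
\tag{4.34}
\]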
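In (4.34), there is no need to require any isomorphisms. More precisely, there is no
necessity that, for any (unordered) $(i, j) \colon \{0, 1, 2\}^2$, the morphisms
$p_{ij} \colon X_i \rightleftarrows X_j \colon \tilde{p}_{ij}$ have to satisfy the conditions
\begin{align}
\tilde{p}_{ij} \circ p_{ij} &= \mathrm{id}_{X_i} \tag{4.35a} \\
p_{ij} \circ \tilde{p}_{ij} &= \mathrm{id}_{X_j} \tag{4.35b}
\end{align}
or the morphisms $r_{ij} \colon Y_i \rightleftarrows Y_j \colon \tilde{r}_{ij}$ have to satisfy the conditions
\begin{align}
\tilde{r}_{ij} \circ r_{ij} &= \mathrm{id}_{Y_i} \tag{4.36a} \\
r_{ij} \circ \tilde{r}_{ij} &= \mathrm{id}_{Y_j} \tag{4.36b}
\end{align}
or the morphisms $o_{ij} \colon Z_i \rightleftarrows Z_j \colon \tilde{o}_{ij}$ have to satisfy the conditions
\begin{align}
\tilde{o}_{ij} \circ o_{ij} &= \mathrm{id}_{Z_i} \tag{4.37a} \\
o_{ij} \circ \tilde{o}_{ij} &= \mathrm{id}_{Z_j} . \tag{4.37b}
\end{align}
However, the corresponding morphisms in (4.34) can satisfy the conditions in (4.35),
in (4.36), and in (4.37).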
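The conditions in (4.35) can be checked mechanically once the objects are finite sets. The following Python sketch assumes that the maps are stored as dictionaries; the sets, the maps, and the helper names are hypothetical and serve only to show one pair that fails (4.35a) while satisfying (4.35b), and one pair that satisfies both.

```python
def compose(g, f):
    """Return g after f for finite maps stored as dicts."""
    return {x: g[f[x]] for x in f}

def identity(obj):
    """Identity map on a finite set."""
    return {x: x for x in obj}

def is_inverse_pair(p, p_tilde, X_i, X_j):
    """Check (4.35a) and (4.35b) for p: X_i -> X_j and p_tilde: X_j -> X_i."""
    return (compose(p_tilde, p) == identity(X_i)
            and compose(p, p_tilde) == identity(X_j))

# Illustrative finite input objects.
X0 = {"a", "b", "c"}
X1 = {"u", "v"}      # a coarser input object
X2 = {1, 2, 3}

# p01 collapses two points, so (4.35a) cannot hold.
p01 = {"a": "u", "b": "u", "c": "v"}
p01_tilde = {"u": "a", "v": "c"}

# p02 is a bijection with a two-sided inverse, so both conditions hold.
p02 = {"a": 1, "b": 2, "c": 3}
p02_tilde = {1: "a", 2: "b", 3: "c"}
```

The pair `p01`, `p01_tilde` is the kind of situation that (4.34) tolerates: the maps exist in both directions, yet they do not make X0 and X1 isomorphic.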
Similarly to (4.33), the notion of sameness of models in (4.34) is encoded by the
equality of a parallel pair of morphisms with the signature X0→Z0. Observe that
various combinations of path compositions in (4.34) can exhibit the signature X0→Z0.
To concretize the semantic link between the models in (4.34), various
representations have been associated with some of the maps, e.g., affine maps with
$\tilde{r}_{ij}$ (see, e.g., [56]) or argmin maps with $p_{ij}$ (see, e.g., [14], [95], or [145]).
Concerning the semantic link, bear in mind that different representations and
methods presuppose different isomorphisms regarding the objects in (4.34).
6 Notice that, compared with (3.124), there is a slight semantic change in the notation
in (4.34) in order to harmonize it with the notation of the CT toolset (recall § 4.2).
However, the identification of the entities in (3.124) with the entities in (4.34) should be clear.
Hence, taking into account the individual isomorphisms – reflected by the condi-
tions in (4.35), in (4.36), and in (4.37) – resembles a normalization process towards
the state where there is essentially only one input object, one intermediate object,
and one output object (see the diagrams in (4.33)).
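\[
\begin{aligned}
&\text{left:} && f_i \colon X_i \to Y_i, \quad J_i \colon Y_i \to Z_0 \quad (i = 0, 1, 2), \\
& && p_{ij} \colon X_i \rightleftarrows X_j \colon \tilde{p}_{ij}, \quad r_{ij} \colon Y_i \rightleftarrows Y_j \colon \tilde{r}_{ij} \quad ((i, j) \in \{(0, 1), (1, 2)\}); \\
&\text{right:} && f_i \colon X_0 \to Y_i, \quad J_i \colon Y_i \to Z_0 \quad (i = 0, 1, 2), \\
& && r_{ij} \colon Y_i \rightleftarrows Y_j \colon \tilde{r}_{ij} \quad ((i, j) \in \{(0, 1), (1, 2)\}).
\end{aligned}
\tag{4.38}
\]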
On the one hand, the normalization process argument shows that the modeling
decisions concerning the choice of isomorphisms can separate the concept of
multifidelity model management encoded in Cmmm (see the diagrams in (4.32)) from the
space mapping notion encoded in Csm (see the diagrams in (4.34)).
On the other hand, the modeling decisions concerning the choice of isomorphisms
reveal diagrammatically the requirements for indistinguishability of the
concept of multifidelity model management (see the diagrams in (4.32)) and the space
mapping notion (see the diagrams in (4.34)).
In essence, the normalization process argument delivers a classification tool at
the level of generalized functions (see Figure 1.4) for the concept of multifidelity
model management and the space mapping notion.
An allied argument, which is algebraic-geometric in nature, utilizes functors.
Recalling the commentary on the diagrams in (4.7), we know that all
functors preserve isomorphisms. Therefore, given the finite category Cmmm
associated with (4.32) and the finite category Csm associated with (4.34), one cannot
provide a functor Q between these two categories, i.e., Q ∶ Cmmm → Csm, that maps the
objects and the morphisms in the most obvious way and, at the same time, maps
isomorphisms in Cmmm to non-isomorphisms in Csm.
Furthermore, a potential indistinguishability of the concept of multifidelity
model management (see the diagrams in (4.32)) and the space mapping notion (see the
diagrams in (4.34)) can be expressed by an equivalence of categories (recall
Definition 4.2.4) Cmmm ≃ Csm.
Hence, comparing the category Cmmm and the category Csm provides another clas-
sification tool at the level of generalized functions (see Figure 1.4) for the concept of
multifidelity model management and the space mapping notion.
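Finally, recalling the diagrams in (4.22), let us consider the last part of the
optimization. We continue our prior thread and ascribe a low-fidelity model
optimization O1 to the high-fidelity model optimization O0 by using a functor P.
\[
Z^X \xrightarrow{\;\operatorname{argmin}\;} X
\qquad \underset{\tilde{P}}{\overset{P}{\rightleftarrows}} \qquad
P(Z^X) \xrightarrow{\;P(\operatorname{argmin})\;} P(X)
\tag{4.39}
\]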
Given the functor P ∶ O0 → O1, one can interpret (recall § 4.2.1) the high-fidelity
model optimization O0 within the low-fidelity model optimization O1. Thus, the
CT approach pins down the intuitive idea that there should be some structural
similarity between the high-fidelity model optimization and the low-fidelity model
optimization.
Moreover, the diagram in (4.39) signifies structurally the distinction between
a surrogate-based optimization and a surrogate-guided optimization: In the former
case, after establishing the functor P, there is no further interaction between O0
and O1. In the latter case, after establishing the functor P, there is further
interaction between O0 and O1.
Ideally, if we furnish adequate functors P ∶ O0 ⇄ O1 ∶ P̃, then one can establish
an equivalence of categories O0 ≃ O1; hence, one can consider them as
indistinguishable as model optimizations.
However, if we adopt the viewpoint of the finite category Cmmm associated with
(4.32) and the finite category Csm associated with (4.34), then the expressive power of
the diagram in (4.39) is reduced. In the case of Cmmm, for instance, I conclude
from the diagrams in (4.33) that the high-fidelity model optimization and the
low-fidelity model optimizations are all encoded together within the subsequent
representation:
\[
Z_0^{X_0} \xrightarrow{\;\operatorname{argmin}\;} X_0 .
\tag{4.40}
\]
Due to various combinations of path compositions in (4.34) that can exhibit the
signature X0 → Z0, the high-fidelity model optimization and the low-fidelity model
optimizations concerning Csm can all be encoded together similarly to (4.40).
At the level of generalized functions (see Figure 1.4), the encoding in (4.39) and the
encoding in (4.40) do not contain any value judgment in the sense that one encoding
is preferred over the other; they simply hint at possible logical mismatches
between intuitive ideas and their mathematical encodings due to the choice
of the modeling paradigm.
Let us continue with the encoding in (4.39). Notice that the X-valued minimizer
function argmin in (4.17) contains an internal representation (indicated by "↦")
and an external representation (indicated by "→").
Since the CT approach primarily exploits the external representation (recall
§ 4.3.1), though, we additionally have to assume a terminal object in order to
properly encode the minimizer x∗ for (4.39).
Hence, let us introduce $1 \xrightarrow{\;x^*\;} X$ and $x^* \colon X^1$, respectively. Supposing that argmin
is polymorphic, one can extend its signature such that
\[
\operatorname{argmin} \colon Z^X \to X^1 .
\tag{4.41}
\]
Finally, let us adapt the diagram in (4.39) according to the statement in (4.41).
\[
Z^X \xrightarrow{\;\operatorname{argmin}\;} X^1
\qquad \underset{\tilde{P}}{\overset{P}{\rightleftarrows}} \qquad
P(Z^X) \xrightarrow{\;P(\operatorname{argmin})\;} P(X^1)
\tag{4.42}
\]
The adapted diagram in (4.42) formalizes suitably the idea of preserving the cor-
responding structure and minimizer. For the sake of brevity, I keep, however, the
adapted signature implicit in the remaining exposition.
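The extended signature in (4.41) can be mimicked in Python under the assumption that X is a finite collection and that objectives are ordinary callables. In this sketch, `argmin` returns a zero-argument callable, which plays the role of the arrow from the terminal object into X; the domain and the two objectives are made up for illustration.

```python
from typing import Callable, Hashable, Iterable

def argmin(domain: Iterable[Hashable]) -> Callable[[Callable], Callable[[], Hashable]]:
    """Return a map from objectives (elements of Z^X) to arrows 1 -> X."""
    points = list(domain)

    def am(objective: Callable) -> Callable[[], Hashable]:
        best = min(points, key=objective)   # first minimizer in iteration order
        return lambda: best                 # the element x*, encoded as 1 -> X
    return am

# Illustrative finite input object and two objectives.
X = [-2, -1, 0, 1, 2]

def high(x):
    """Stands in for a high-fidelity objective."""
    return (x - 1) ** 2

def low(x):
    """Stands in for a low-fidelity objective with the same minimizer."""
    return abs(x - 1)

am = argmin(X)
x_star_high = am(high)()
x_star_low = am(low)()
```

Here both objectives share the minimizer, which is the favorable case of "preserving the minimizer"; nothing in the sketch guarantees this for arbitrary low-fidelity objectives.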
As in the cases above, when we include multiple low-fidelity
model optimizations, the CT ansatz unfolds its strength of formally correct
bookkeeping of the involved interactions. Let us set am ∶= argmin.
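\[
\begin{gathered}
P_2(X) \xleftarrow{\;P_2(am)\;} P_2(Z^X)
\;\underset{P_{12}}{\overset{\tilde{P}_{12}}{\rightleftarrows}}\;
P_1(Z^X) \xrightarrow{\;P_1(am)\;} P_1(X) \\
{\scriptstyle P_2}\big\uparrow\big\downarrow{\scriptstyle \tilde{P}_2}
\qquad\qquad Z^X \xrightarrow{\;am\;} X \qquad\qquad
{\scriptstyle P_1}\big\uparrow\big\downarrow{\scriptstyle \tilde{P}_1}
\end{gathered}
\tag{4.43}
\]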
The functors P12 ∶ O1 ⇄ O2 ∶ P̃12 indicate a possible test for equivalence (recall
Definition 4.2.4) of the corresponding low-fidelity optimization problems.
Similarly to the insight about the scalability from the diagrams in (4.30), one can
observe that one can arbitrarily scale diagrams such as in (4.43) while maintaining
the interpretability (see § 4.2.1) – a vital component for the correct reasoning about
associated algorithms and programs (see Figure 1.4).
The following depiction highlights the observation regarding the scalability. For
the sake of clarity, let us omit the potential functors between the low-fidelity model
optimizations whose existence is implicitly supposed.
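\[
Z^X \xrightarrow{\;am\;} X, \qquad
P_i(Z^X) \xrightarrow{\;P_i(am)\;} P_i(X), \qquad
P_i \colon \big(Z^X \to X\big) \rightleftarrows \big(P_i(Z^X) \to P_i(X)\big) \colon \tilde{P}_i
\qquad (i = 1, 2, 3, 4)
\tag{4.44}
\]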
I remark that, by default, a possible implementation of the discussed categories,
such as Cmmm associated with (4.32) and Csm associated with (4.34), is the
category Set, i.e., the category of (finite) sets and set functions (see § 4.2.2). In
the category Set, a singleton set can serve as a terminal object and the isomorphisms
are the ordinary bijective set functions.
An implementation of the discussed categories by the category Set captures the
difference between the classical set-oriented modeling paradigm "(set) functions as
models" and the category-oriented modeling paradigm "categories as models" and
"functors as model transformations" (see § 4.3.1). However, as shown in this section,
one can discuss both modeling paradigms uniformly within the CT approach.
The set-oriented modeling paradigm benefits from a conglomeration
of diverse insights at the level of programs and at the level of algorithms (see
Figure 1.4).
The category-oriented modeling paradigm, in turn, flourishes at the level of
generalized functions (see Figure 1.4), that is, it helps to encode many colloquial
or intuitive ideas more rigorously, and it provides new tools to compare and to
classify methods; thus, it beneficially complements the tools from the set-oriented
modeling paradigm.
A final remark regarding the two modeling paradigms is concerned with the
direction of the arrows involved in the corresponding diagrams. In (4.26), the choice
of the direction of the arrows is dictated by the semantics commonly provided by a
general optimization problem (see § 4.3.1).
From a purely syntactical viewpoint, though, if we simply turn around the ar-
rows (cf. Remark 4.2.2) in (4.26), then combinatorial deliberations reveal that there
are four distinguishable diagrams conceivable regarding the three objects and two
arrows in (4.26). The corresponding list of distinguishable diagrams reads as
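\begin{align}
X \xrightarrow{\;f\;} Y \xrightarrow{\;J\;} Z \tag{4.45a} \\
X \xrightarrow{\;f\;} Y \xleftarrow{\;J\;} Z \tag{4.45b} \\
X \xleftarrow{\;f\;} Y \xleftarrow{\;J\;} Z \tag{4.45c} \\
X \xleftarrow{\;f\;} Y \xrightarrow{\;J\;} Z . \tag{4.45d}
\end{align}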
Notice that the possibilities in (4.45b), in (4.45c), and in (4.45d) are ruled out by
semantics-based deliberations.
If we adapt (4.45) to categories A, B, C and functors F, G, then we obtain the
subsequent list of distinguishable combinations:
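\begin{align}
A \xrightarrow{\;F\;} B \xrightarrow{\;G\;} C \tag{4.46a} \\
A \xrightarrow{\;F\;} B \xleftarrow{\;G\;} C \tag{4.46b} \\
A \xleftarrow{\;F\;} B \xleftarrow{\;G\;} C \tag{4.46c} \\
A \xleftarrow{\;F\;} B \xrightarrow{\;G\;} C . \tag{4.46d}
\end{align}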
Let us invoke the category-oriented modeling paradigm in a verbose mode such that
one can spell out the statements in (4.46), for instance, in the following manner:
1. "model C follows from model B, and model B follows from model A",
2. "model B follows from model A, and model B follows from model C",
3. "model A follows from model B, and model B follows from model C",
4. "model A follows from model B, and model C follows from model B".
Thus, the category-oriented modeling paradigm offers implicitly a causal viewpoint
in some sense. Such a viewpoint hints at potential additional facets regarding the
notion of fidelity made at the beginning of the section. However, especially in the
light of the research on causal modeling (see, e.g., [196]), the critical examination of
these facets is left for future investigations. A popular adage in statistical parlance is:
“Correlation does not imply causation.” This adage could serve as a starting point
for a critical examination of statements such as in (4.46).
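The combinatorial count behind (4.45) and (4.46) can be reproduced with a short Python sketch. It assumes that a diagram is fully described by the orientation of its two arrows; the function names `diagrams` and `verbose` are illustrative, and the "follows from" wording mirrors the verbose list above.

```python
from itertools import product

def diagrams():
    """Yield every orientation of the arrow between A and B (F) and between B and C (G)."""
    for fwd_F, fwd_G in product((True, False), repeat=2):
        first = ("A", "B") if fwd_F else ("B", "A")
        second = ("B", "C") if fwd_G else ("C", "B")
        yield {"F": first, "G": second}

def verbose(diagram):
    """Spell out each arrow (source, target) as 'target follows from source'."""
    return [f"model {dst} follows from model {src}" for src, dst in diagram.values()]

all_diagrams = list(diagrams())
statements = [verbose(d) for d in all_diagrams]
```

Two arrows with two orientations each yield the four diagrams; the sketch only enumerates the syntax and leaves aside which orientations are admissible semantically.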
4.4 Use cases of the CT toolset
within the electromagnetics context
4.4.1 Use case #1: Simplified-physics low-fidelity models
Recalling § 3.1.3, one can observe that there is a difficulty regarding the formalization
of all intuitive ideas w.r.t. figures such as Figure 3.8. In other words: To
the best of my knowledge, there is a lack of a comprehensive theory to express formally all
conceivable relationships between different problems associated with a high-fidelity
model and corresponding low-fidelity models.
In (Diagrams of Fig. 3.8), some possible diagrams are abstractly presented that
can be associated with Figure 3.8. In (3.102), some statements concerning
(Diagrams of Fig. 3.8) are formulated in equational form. Note that these statements and
diagrams can be understood a bit more rigorously by applying the category theory
toolset (see § 4.2).
From the viewpoint of the category theory toolset, one can make different
modeling decisions in (Diagrams of Fig. 3.8). I illustrate some modeling decisions by the
diagrams in (4.47).
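\[
\begin{array}{ccccc}
T^{3D}_{h_1} & \xrightarrow{\;I_1\;} & T^{2D}_{h_1} & \xrightarrow{\;I_2\;} & R_1 \\
T^{3D}_{h_2} & \xrightarrow{\;J_1\;} & T^{2D}_{h_2} & \xrightarrow{\;J_2\;} & R_2 \\
{}^{1}\Delta^{3D}_{Ax=b} & \xrightarrow{\;K_1\;} & {}^{1}\Delta^{2D}_{Ax=b} & \xrightarrow{\;K_2\;} & R_3 \\
{}^{2}\Delta^{3D}_{Ax=b} & \xrightarrow{\;L_1\;} & {}^{2}\Delta^{2D}_{Ax=b} & \xrightarrow{\;L_2\;} & R_4
\end{array}
\qquad
\begin{array}{ccccc}
T^{3D}_{h_1} & & T^{2D}_{h_1} & & R_1 \\
\big\downarrow{\scriptstyle f_1} & \xrightarrow{\;F_1\;} & \big\downarrow{\scriptstyle g_1} & \xrightarrow{\;F_2\;} & \big\downarrow{\scriptstyle h_1} \\
T^{3D}_{h_2} & & T^{2D}_{h_2} & & R_2 \\
\big\downarrow{\scriptstyle f_2} & \xrightarrow{\;G_1\;} & \big\downarrow{\scriptstyle g_2} & \xrightarrow{\;G_2\;} & \big\downarrow{\scriptstyle h_2} \\
{}^{1}\Delta^{3D}_{Ax=b} & & {}^{1}\Delta^{2D}_{Ax=b} & & R_3 \\
\big\downarrow{\scriptstyle f_3} & \xrightarrow{\;H_1\;} & \big\downarrow{\scriptstyle g_3} & \xrightarrow{\;H_2\;} & \big\downarrow{\scriptstyle h_3} \\
{}^{2}\Delta^{3D}_{Ax=b} & & {}^{2}\Delta^{2D}_{Ax=b} & & R_4
\end{array}
\tag{4.47}
\]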
Observe that, in (4.47), a modeling decision is exhibited that takes into account the
different algebraic characters of, e.g., $T^{3D}_{h_1}$, $T^{2D}_{h_1}$, and $R_1$, by lifting them up to the
level of categories. More precisely, entities such as, e.g., $T^{3D}_{h_1}$, $T^{2D}_{h_1}$, and $R_1$ are
conceived as categories and their interaction is mediated by functors such as, e.g., the
functor $I_1 \colon T^{3D}_{h_1} \to T^{2D}_{h_1}$ and the functor $I_2 \colon T^{2D}_{h_1} \to R_1$.
These categories can be understood as specifications (or interfaces) (recall § 4.3)
of the corresponding numerical entities (recall § 3.1.3). However, I do not dwell on
this modeling decision.
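Regarding the other modeling decision in (4.47), it is supposed that, e.g., three
finite categories A, B, C exist (similarly to (4.7)). In addition, it is presupposed that
the category A (recall Definition 4.2.1) is constituted by the six morphisms f1, f2, f3,
f4, f5, and f6 where
\begin{align}
f_1 &\colon \hom_A(T^{3D}_{h_1}, T^{3D}_{h_2}) &
f_2 &\colon \hom_A(T^{3D}_{h_2}, {}^{1}\Delta^{3D}_{Ax=b}) &
f_3 &\colon \hom_A({}^{1}\Delta^{3D}_{Ax=b}, {}^{2}\Delta^{3D}_{Ax=b}) \tag{4.48a} \\
f_4 &\colon \hom_A(T^{3D}_{h_1}, {}^{2}\Delta^{3D}_{Ax=b}) &
f_5 &\colon \hom_A(T^{3D}_{h_1}, {}^{1}\Delta^{3D}_{Ax=b}) &
f_6 &\colon \hom_A(T^{3D}_{h_2}, {}^{2}\Delta^{3D}_{Ax=b}). \tag{4.48b}
\end{align}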
Analogously to (4.48), one can constitute the category B and the category C.
Thus, it is supposed that the category B is constituted by the six morphisms g1, g2,
g3, g4, g5, and g6 where
\begin{align}
g_1 &\colon \hom_B(T^{2D}_{h_1}, T^{2D}_{h_2}) &
g_2 &\colon \hom_B(T^{2D}_{h_2}, {}^{1}\Delta^{2D}_{Ax=b}) &
g_3 &\colon \hom_B({}^{1}\Delta^{2D}_{Ax=b}, {}^{2}\Delta^{2D}_{Ax=b}) \tag{4.49a} \\
g_4 &\colon \hom_B(T^{2D}_{h_1}, {}^{2}\Delta^{2D}_{Ax=b}) &
g_5 &\colon \hom_B(T^{2D}_{h_1}, {}^{1}\Delta^{2D}_{Ax=b}) &
g_6 &\colon \hom_B(T^{2D}_{h_2}, {}^{2}\Delta^{2D}_{Ax=b}). \tag{4.49b}
\end{align}
Finally, it is presupposed that the category C is constituted by the six morphisms h1,
h2, h3, h4, h5, and h6 where
\begin{align}
h_1 &\colon \hom_C(R_1, R_2) & h_2 &\colon \hom_C(R_2, R_3) & h_3 &\colon \hom_C(R_3, R_4) \tag{4.50a} \\
h_4 &\colon \hom_C(R_1, R_4) & h_5 &\colon \hom_C(R_1, R_3) & h_6 &\colon \hom_C(R_2, R_4). \tag{4.50b}
\end{align}
For the sake of conciseness, let us assume implicitly that, regarding A, B, C, all
category laws are satisfied.
Notice that given a morphism f3○f2○f1 as a decomposition of the morphism f4
in (4.48b) where
\[
f_3 \circ f_2 \circ f_1 \colon \hom_A(T^{3D}_{h_1}, {}^{2}\Delta^{3D}_{Ax=b}),
\tag{4.51}
\]
one can conceive the morphism f3○f2○f1, for instance, as a representation of an
operation that encodes the change from a fine-grid discretization regarding a
three-dimensional space and a high threshold for a termination criterion of an iterative
solver to a coarse-grid discretization regarding a three-dimensional space and a low
threshold for a termination criterion of an iterative solver (cf. § 3.1.3).
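The decomposition in (4.51) can be checked mechanically with a small Python sketch of the category A. It assumes, beyond what is stated above, that A has at most one non-identity morphism per pair of source and target, so that a composite is determined by its signature; the plain-text object names and the helper `compose` are illustrative.

```python
# Objects of A as plain-text shorthands (illustrative names).
T1, T2, D1, D2 = "T3D_h1", "T3D_h2", "1Delta3D", "2Delta3D"

# Non-identity morphisms of A with (source, target), following (4.48).
hom = {
    "f1": (T1, T2), "f2": (T2, D1), "f3": (D1, D2),
    "f4": (T1, D2), "f5": (T1, D1), "f6": (T2, D2),
}

def compose(g, f):
    """Return the name of g after f, or None if the pair is not composable."""
    if hom[f][1] != hom[g][0]:
        return None
    signature = (hom[f][0], hom[g][1])
    # Assumption: at most one non-identity arrow per signature, so the
    # composite is identified by its source and target.
    matches = [name for name, sig in hom.items() if sig == signature]
    return matches[0] if matches else None

f2_f1 = compose("f2", "f1")        # T3D_h1 -> 1Delta3D
f3_f2_f1 = compose("f3", f2_f1)    # T3D_h1 -> 2Delta3D
```

Under this assumption, the intermediate composite is f5 and the full composite is f4, in line with (4.48b); the categories B and C can be treated in the same way.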
Hence, given a morphism g3○g2○g1 as a decomposition of the morphism g4
in (4.49b) where
\[
g_3 \circ g_2 \circ g_1 \colon \hom_B(T^{2D}_{h_1}, {}^{2}\Delta^{2D}_{Ax=b}),
\tag{4.52}
\]
one can conceive the morphism g3○g2○g1 as an analogue of the morphism f3○f2○f1
in (4.51) with respect to a two-dimensional space.
Given a morphism h3○h2○h1 as a decomposition of the morphism h4 in (4.50b)
where
\[
h_3 \circ h_2 \circ h_1 \colon \hom_C(R_1, R_4),
\tag{4.53}
\]
one can conceive the morphism h3○h2○h1, for instance, as a representation of an
operation that encodes the change from a multivariate rational polynomial function of
type (m, 3) to a multivariate rational polynomial function of type (m, 0) and leading
coefficient of one (cf. § 3.1.3).
In addition to the categories A, B, C, it is supposed that the functors F1 ∶ A → B,
F2 ∶ B → C, the functors G1 ∶ A → B, G2 ∶ B → C, and the functors H1 ∶ A → B,
H2 ∶ B → C exist.
Hence, one can reformulate and extend the statements in (3.102) by means of the
functor F1 and the functor F2 (recall Definition 4.2.2), and with F2○F1 ∶ A → C, i.e.,
\begin{align}
F_1(T^{3D}_{h_1}) &:= T^{2D}_{h_1} & F_1(T^{3D}_{h_2}) &:= T^{2D}_{h_2} & F_1(f_1) &:= g_1 \tag{4.54a} \\
(F_2 \circ F_1)(T^{3D}_{h_1}) &:= R_1 & (F_2 \circ F_1)(T^{3D}_{h_2}) &:= R_2 & (F_2 \circ F_1)(f_1) &:= h_1 \tag{4.54b} \\
F_1({}^{1}\Delta^{3D}_{Ax=b}) &:= {}^{1}\Delta^{2D}_{Ax=b} & F_1(f_2) &:= g_2 \tag{4.54c} \\
(F_2 \circ F_1)({}^{1}\Delta^{3D}_{Ax=b}) &:= R_3 & (F_2 \circ F_1)(f_2) &:= h_2 \tag{4.54d} \\
F_1({}^{2}\Delta^{3D}_{Ax=b}) &:= {}^{2}\Delta^{2D}_{Ax=b} & F_1(f_3) &:= g_3 \tag{4.54e} \\
(F_2 \circ F_1)({}^{2}\Delta^{3D}_{Ax=b}) &:= R_4 & (F_2 \circ F_1)(f_3) &:= h_3 \tag{4.54f} \\
F_1(f_4) &:= g_4 & F_1(f_5) &:= g_5 & F_1(f_6) &:= g_6 \tag{4.54g} \\
(F_2 \circ F_1)(f_4) &:= h_4 & (F_2 \circ F_1)(f_5) &:= h_5 & (F_2 \circ F_1)(f_6) &:= h_6 . \tag{4.54h}
\end{align}
For the sake of conciseness, let us assume implicitly that, regarding F1, F2, and F2○F1,
all functor laws are satisfied.
If we suppose that, similarly to (4.9), the functor F1 and the functor F2 are
forgetful functors, then one obtains a precise encoding for the intuitive idea of
considering the problems associated with the low-fidelity models as forgetful
interpretations of the problem associated with the high-fidelity model (cf. the commentary
on (3.102)).
Moreover, recalling § 4.3.2, one can invoke the heuristics-driven notion of a pro-
blem-dependent degree of forgetfulness (DoFF) in order to quantify the modeling
error. Bear in mind that my proposed notion of a DoFF is not cast in stone. The
DoFF serves primarily as an auxiliary means regarding the attempt to grasp more
formally the fidelity notion attached to models. I argue that the DoFF as an auxiliary
means is especially useful when using many different models.
If we fix the category C as the model with the lowest fidelity, then one can
associate the category B with the DoFF of 1, i.e., using one forgetful functor F2 such
that cod(F2) ≡ C, and one can associate the category A with the DoFF of 2, i.e., using
two forgetful functors F1 and F2 such that cod(F2○F1) ≡ C.
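The DoFF assignment just described amounts to counting the forgetful functors on the way down to the category with the lowest fidelity. A minimal Python sketch, assuming that the forgetful functors form a single chain without branching; the function name `doff` and the dictionary layout are illustrative and not part of the proposed notion.

```python
# Forgetful functors as (source, target) pairs, named as in the text.
forgetful = {"F1": ("A", "B"), "F2": ("B", "C")}

def doff(category, lowest, functors):
    """Count the forgetful functors on the chain from `category` down to `lowest`.

    Assumes that the functors form one chain without branching or cycles.
    """
    step = {src: dst for src, dst in functors.values()}
    count, current = 0, category
    while current != lowest:
        current = step[current]   # raises KeyError if no chain reaches `lowest`
        count += 1
    return count

scores = {c: doff(c, "C", forgetful) for c in ("A", "B", "C")}
```

With C fixed as the reference, the sketch reproduces the values 2 for A and 1 for B; for branching collections of functors one would have to decide which path to count, which this sketch leaves open.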
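If we invoke the functor G1 and the functor G2 and the functor G2○G1 ∶ A → C,
then one can encode a different interpretation (see § 4.2.1) than with the functor F1 and
the functor F2. For instance, the functor G1 can encode a "partial shift" w.r.t. the
functor F1 and the functor G2 can encode a similar behavior w.r.t. the functor F2 in
the sense that the assignments in (4.54) can be adapted as
\begin{align}
G_1(T^{3D}_{h_1}) &:= T^{2D}_{h_2} & G_1(T^{3D}_{h_2}) &:= {}^{1}\Delta^{2D}_{Ax=b} & G_1(f_1) &:= g_2 \tag{4.55a} \\
(G_2 \circ G_1)(T^{3D}_{h_1}) &:= R_1 & (G_2 \circ G_1)(T^{3D}_{h_2}) &:= R_2 & (G_2 \circ G_1)(f_1) &:= h_1 \tag{4.55b} \\
G_1({}^{1}\Delta^{3D}_{Ax=b}) &:= {}^{2}\Delta^{2D}_{Ax=b} & G_1(f_2) &:= g_3 \tag{4.55c} \\
(G_2 \circ G_1)({}^{1}\Delta^{3D}_{Ax=b}) &:= R_3 & (G_2 \circ G_1)(f_2) &:= h_2 \tag{4.55d} \\
G_1({}^{2}\Delta^{3D}_{Ax=b}) &:= {}^{2}\Delta^{2D}_{Ax=b} & G_1(f_3) &:= \mathrm{id}_{{}^{2}\Delta^{2D}_{Ax=b}} \tag{4.55e} \\
(G_2 \circ G_1)({}^{2}\Delta^{3D}_{Ax=b}) &:= R_4 & (G_2 \circ G_1)(f_3) &:= h_3 \tag{4.55f} \\
F_1(f_4) &:= g_4 & F_1(f_5) &:= g_5 & F_1(f_6) &:= g_6 \tag{4.55g} \\
(F_2 \circ F_1)(f_4) &:= h_4 & (F_2 \circ F_1)(f_5) &:= h_5 & (F_2 \circ F_1)(f_6) &:= h_6 . \tag{4.55h}
\end{align}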
If we replace the assignments in (4.55e) by the assignments
\[
G_1({}^{2}\Delta^{3D}_{Ax=b}) := T^{2D}_{h_1} \qquad G_1(f_3) := g_1 ,
\tag{4.56}
\]
then one would violate the functor laws in the sense that
\[
G_1(f_3 \circ f_2 \circ f_1) \equiv g_1 \circ g_3 \circ g_2 ,
\tag{4.57}
\]
where g1○g3○g2 is not a legitimate composite morphism within B. From an
application-driven viewpoint, one can conceive this violation, e.g., as a restriction w.r.t. the
construction of an operation within another context that is grounded in, e.g., the
representation of the morphism f3○f2○f1 in (4.51).
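The violation in (4.57) can be detected mechanically by comparing sources and targets. The following Python sketch encodes the assignments of G1 on f1, f2, f3 from (4.55a), (4.55c), and (4.55e), and the variant obtained via (4.56); the plain-text object names and the helpers `respects_signatures` and `composable_chain` are illustrative shorthands.

```python
# (source, target) signatures; object names are plain-text shorthands.
hom_A = {
    "f1": ("T3D_h1", "T3D_h2"),
    "f2": ("T3D_h2", "1Delta3D"),
    "f3": ("1Delta3D", "2Delta3D"),
}
hom_B = {
    "g1": ("T2D_h1", "T2D_h2"),
    "g2": ("T2D_h2", "1Delta2D"),
    "g3": ("1Delta2D", "2Delta2D"),
    "id_2Delta2D": ("2Delta2D", "2Delta2D"),
}

def respects_signatures(on_obj, on_arr):
    """A functor must send f: a -> b to an arrow on_obj[a] -> on_obj[b]."""
    return all(hom_B[on_arr[f]] == (on_obj[a], on_obj[b])
               for f, (a, b) in hom_A.items())

def composable_chain(arrows):
    """Arrows are listed in order of application; each target must match the next source."""
    return all(hom_B[f][1] == hom_B[g][0] for f, g in zip(arrows, arrows[1:]))

# G1 on objects and arrows as in (4.55a), (4.55c), and (4.55e).
G1_obj = {"T3D_h1": "T2D_h2", "T3D_h2": "1Delta2D",
          "1Delta3D": "2Delta2D", "2Delta3D": "2Delta2D"}
G1_arr = {"f1": "g2", "f2": "g3", "f3": "id_2Delta2D"}

# Variant with (4.55e) replaced by (4.56).
G1_obj_bad = {**G1_obj, "2Delta3D": "T2D_h1"}
G1_arr_bad = {**G1_arr, "f3": "g1"}

valid = respects_signatures(G1_obj, G1_arr)
broken = respects_signatures(G1_obj_bad, G1_arr_bad)
image_chain_ok = composable_chain([G1_arr_bad[f] for f in ("f1", "f2", "f3")])
```

In the variant, the image of f3 would have to start at the image of its source, which g1 does not, and the chain g2, g3, g1 cannot be composed; both checks therefore fail, matching the remark on (4.57).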
If we invoke the functor H1 and the functor H2, and the functor H2○H1 ∶ A → C,
then one can encode a third interpretation. For instance, the functor H1 can encode a
similar behavior w.r.t. the functor F1 and the functor H2 can encode a "partial shift"
w.r.t. the functor F2 in the sense that the assignments in (4.54) can be adapted as
\begin{align}
F_1(T^{3D}_{h_1}) &:= T^{2D}_{h_1} & F_1(T^{3D}_{h_2}) &:= T^{2D}_{h_2} & F_1(f_1) &:= g_1 \tag{4.58a} \\
(F_2 \circ F_1)(T^{3D}_{h_1}) &:= R_2 & (F_2 \circ F_1)(T^{3D}_{h_2}) &:= R_3 & (F_2 \circ F_1)(f_1) &:= h_2 \tag{4.58b} \\
F_1({}^{1}\Delta^{3D}_{Ax=b}) &:= {}^{1}\Delta^{2D}_{Ax=b} & F_1(f_2) &:= g_2 \tag{4.58c} \\
(F_2 \circ F_1)({}^{1}\Delta^{3D}_{Ax=b}) &:= R_4 & (F_2 \circ F_1)(f_2) &:= h_3 \tag{4.58d} \\
F_1({}^{2}\Delta^{3D}_{Ax=b}) &:= {}^{2}\Delta^{2D}_{Ax=b} & F_1(f_3) &:= g_3 \tag{4.58e} \\
(F_2 \circ F_1)({}^{2}\Delta^{3D}_{Ax=b}) &:= R_1 & (F_2 \circ F_1)(f_3) &:= h_2 \tag{4.58f} \\
F_1(f_4) &:= g_4 & F_1(f_5) &:= g_5 & F_1(f_6) &:= g_6 \tag{4.58g} \\
(F_2 \circ F_1)(f_4) &:= h_4 & (F_2 \circ F_1)(f_5) &:= h_5 & (F_2 \circ F_1)(f_6) &:= h_6 . \tag{4.58h}
\end{align}
For the sake of brevity, let us omit the consideration of natural transformations
(recall Definition 4.2.3) regarding the functors F1, F2, the functors G1, G2, and the
functors H1, H2, that is, let us omit the comparison of the corresponding constructed
paths (recall Remark 4.2.7).
However, note that, if we suppose that the functors in (4.54), in (4.55), and
in (4.58) are forgetful functors, then one can at least exclude that the categories A, B,
and C satisfy a test for equivalence (recall Definition 4.2.4).
4.4.2 Use case #2: Coordinate transformations7
The second use case of the CT approach within the electromagnetics context is
related to coordinate transformations. I construe the term "coordinate transformations"
as an umbrella term for different applications within the electromagnetics context
that are associated with coordinates in a narrower sense (e.g., w.r.t. a physical space) or
with coordinates in a broader sense (e.g., w.r.t. an abstract vector space).
An application that is associated with coordinates in a narrower sense is depicted in
Figure 4.1, which is inspired by [138]. In Figure 4.1, a three-dimensional helical coil
of 5 turns (cf. (i) in Figure 1.3) is represented in two different coordinate systems.
One coordinate system is constituted by xyz-coordinates and the other coordinate
system is constituted by uvw-coordinates.
In Figure 4.1, it is supposed that an initial coordinate system is provided by a
user. Furthermore, I claim that the helical coil in xyz-coordinates has the kind
of shape that is intuitively expected from common experience in physics.
However, the helical coil in uvw-coordinates, i.e., a three-dimensional solid
identified as a cylinder, has a curvilinear geometric shape that is rather
counterintuitive compared to common experience in physics.
I do not dwell on the involved subtleties regarding such coordinate transforma-
tions such as, e.g., the proper transformation of the corresponding field quantities
w.r.t. the well-posed boundary value problems (recall (2.16)). For more details con-
cerning these subtleties, I refer to, e.g., [174] or [138].
7 The elaborations are part of a joint publication in preparation for submission called "Formalization
Issues of Surrogate Modeling in Electromagnetic Compatibility" (M. Hadžiefendić, R. S. Rezende, R.
Schuhmann).
[Figure 4.1, panel labels: initial xyz-coordinates; xyz-coordinates from initial xyz-coordinates; change of coordinates; uvw-coordinates from initial xyz-coordinates.]
FIGURE 4.1: A schematic depiction of a three-dimensional helical
coil of 5 turns in xyz-coordinates (cf. (i) in Figure 1.3) and in uvw-
coordinates (inspired by [138]). For more elaborated considerations
of such coordinate transformations within the electromagnetics con-
text, I refer to, e.g., [174] or [138].
For the considerations in the present work, though, the significant aspect is that
the evaluated quantities of interest (recall § 2.2.1) in both coordinate systems in Fig-
ure 4.1 should be identical. This aspect reflects the premise that (derived) measur-
able physical entities such as, e.g., the magnetic energy or the power loss, should be
coordinate-invariant. More precisely, the (derived) measurable physical entities, and
therefore the evaluated quantities of interest, should be independent of the choice of
coordinates.
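An application that is associated with coordinates in a broader sense is encoded by
the subsequent matrix equation, that is, given a matrix $S \in \mathbb{R}^{4 \times 4}$, then
\[
\forall S_{MM} \in \mathbb{R}^{4 \times 4} .\; \exists M \in \mathbb{R}^{4 \times 4} .\; S_{MM} = M S M^{-1} .
\tag{4.59}
\]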
If we assume an endomorphism s ∶ V → V within the category $\mathrm{Vect}^{\mathrm{basis}}_k$ (recall
§ 4.2.2) regarding an ordered basis $V_{b_1}$, then one can associate the endomorphism s
regarding the ordered basis $V_{b_1}$ with the matrix S in (4.59) (cf. (4.11)).
Furthermore, one can associate the endomorphism s regarding another ordered
basis $V_{b_2}$ with the matrix $S_{MM}$ such that one can conceive the matrix M as the
change-of-basis matrix from the ordered basis $V_{b_1}$ to the ordered basis $V_{b_2}$.
Hence, given s regarding $V_{b_2}$ and given an input element a ∶ V regarding $V_{b_2}$,
we obtain an output element b ∶ V regarding $V_{b_2}$ by setting
\[
b := s \circ a .
\tag{4.60}
\]
If we instantiate the assignment in (4.60) by the matrix $S_{MM}$, the column vector $a \in \mathbb{R}^{4 \times 1}$,
and the column vector $b \in \mathbb{R}^{4 \times 1}$, then the statement in (4.60) can be written as
\begin{align}
b &:= S_{MM}\, a \tag{4.61a} \\
b &:= M S M^{-1} a , \tag{4.61b}
\end{align}
where the term $M^{-1} a$ refers to the representation of a ∶ V regarding $V_{b_1}$ and the
term $S M^{-1} a$ refers to the representation of b ∶ V regarding $V_{b_1}$.
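The statements in (4.59) and (4.61) can be checked numerically with a small Python sketch. The change-of-basis matrix below is one commonly used orthogonal choice and is an assumption of this sketch (the text does not fix the entries of M); the entries of S and a are made-up numbers, and the helper functions are illustrative.

```python
import math

def matmul(A, B):
    """Plain matrix product for lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

c = 1 / math.sqrt(2)
# Assumed change-of-basis matrix; it is orthogonal, so its inverse is its transpose.
M = [[c, -c, 0, 0],
     [0, 0, c, -c],
     [c, c, 0, 0],
     [0, 0, c, c]]
M_inv = transpose(M)

# Illustrative real-valued matrix S (made-up numbers).
S = [[0.1, 0.2, 0.7, 0.0],
     [0.2, 0.1, 0.0, 0.7],
     [0.7, 0.0, 0.1, 0.2],
     [0.0, 0.7, 0.2, 0.1]]

S_MM = matmul(matmul(M, S), M_inv)                            # (4.59)

a = [[1.0], [0.0], [0.5], [0.0]]                              # input w.r.t. the second basis
b_direct = matmul(S_MM, a)                                    # (4.61a)
b_via_first_basis = matmul(M, matmul(S, matmul(M_inv, a)))    # (4.61b)
```

Both routes yield the same output vector up to rounding, and basis-independent quantities such as the trace agree for S and S_MM, which is the coordinate-invariance premise in miniature.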
A semantics for the statements in (4.61) within the electromagnetics context is
provided by, for instance, the notion of 4-port S-parameters and the notion of
4-port mixed-mode S-parameters (see, e.g., [29]) within the context of electromagnetic
compatibility (recall, e.g., the EMC filter in Figure 1.1).
Thus, in (4.61), the matrix S corresponds to the 4-port S-parameters and the
matrix $S_{MM}$ corresponds to the 4-port mixed-mode S-parameters, which incorporate the
idea of differential-mode and common-mode scattering parameters. The entries
of the change-of-basis matrix M depend on the ordering of the entries of the
matrix $S_{MM}$. Let us omit an in-depth elaboration on these two kinds of S-parameters.
For more details, I refer to, e.g., [29] and references therein.
Given the S-parameters semantics, the key insight from (4.61) is that, by
providing the entries of the matrix S, which are (derived) measurable physical entities, one
can determine the matrix $S_{MM}$.
In the considerations of the present work, let us focus on how to embed applica-
tions such as, e.g., the application in Figure 4.1 or the application in (4.59) into the
body of ideas of surrogate optimization.
Recalling § 3.1, one can intuitively state that, regarding Figure 4.1, if we associate
a kriging low-fidelity model with a high-fidelity model that is linked to the problem
corresponding to the helical coil in xyz-coordinates and if we associate a kriging
low-fidelity model with a high-fidelity model that is linked to the helical coil in uvw-
coordinates, then these two kriging low-fidelity models should behave equally. Let
us refer to this situation as formalization issue A (FI-A).
Regarding (4.59), given the $S_{ij}$-parameter w.r.t. the matrix
$S := [s_{i,j}] \in \mathbb{R}^{4\times 4}$ which one can define as
$$S_{ij} := [s_{i,j}], \tag{4.62}$$
then, for instance, one can associate with every $S_{ij}$-parameter a high-fidelity model
and a low-fidelity model similarly to the multivariate vector-valued use case in
(3.131).⁸
Therefore, we receive a matrix $S_K \in (Y_0^{X_0})^{4\times 4}$ that corresponds to the
high-fidelity models and we receive a matrix $S_{\tilde{K}} \in (Y_1^{X_1})^{4\times 4}$ that
corresponds to the low-fidelity models.⁹
If we employ the matrix $S_K$ and the matrix $S_{\tilde{K}}$ in (4.59), we obtain the matrix
$S_{MM_K}$ and the matrix $S_{MM_{\tilde{K}}}$, respectively. Mind that the change-of-basis
matrix $M$ operates componentwise w.r.t. $Y_0 \subseteq \mathbb{R}^{m_w}$ on the evaluated matrix
$S_K(x) : (Y_0)^{4\times 4}$ and it operates componentwise w.r.t. $Y_1 \subseteq \mathbb{R}^{m_w}$ on
the evaluated matrix $S_{\tilde{K}}(\tilde{x}) : (Y_1)^{4\times 4}$, respectively.
One can also construct a matrix $S_{MM_{K,\tilde{K}}}$ whose entries are constituted by
low-fidelity models regarding the entries of $S_{MM_K}$.
Observe that, however, one cannot necessarily expect that the matrix
$S_{MM_{K,\tilde{K}}}$ and the matrix $S_{MM_{\tilde{K}}}$ behave equally. Let us refer to this
situation as formalization issue B (FI-B).
Even if we limit our considerations to the surrogate modeling & simulation sub-
part of surrogate optimization, one can already recognize that a substantial amount
of bookkeeping is needed to examine situations regarding coordinate transforma-
tions.
⁸ Technically, an $S_{ij}$-parameter is defined as a member of the complex numbers, i.e.,
$S_{ij} \in \mathbb{C}$. However, I utilize the common abbreviated interpretation in which the
term $S_{ij}$ refers to the magnitude of the corresponding $S_{ij}$-parameter. And since, roughly
speaking, an $S_{ij}$-parameter encodes the ratio of waves, it is usual to express the term
$S_{ij}$ in the relative unit of measurement decibel (dB). Therefore, I interpret the term
$S_{ij}$ as a member of the real numbers, i.e., $S_{ij} \in \mathbb{R}$.
⁹ Respecting the notation in (3.131), the entry $S_{ij_K}$ has the signature $Y_0^{X_0}$, i.e.,
$S_{ij_K} : Y_0^{X_0}$, where the matrix $S_K := [s_{i,j}]_K \in (Y_0^{X_0})^{4\times 4}$ and
$S_{ij_K} := [s_{i,j}]_K$. Analogously, the entry $S_{ij_{\tilde{K}}}$ has the signature $Y_1^{X_1}$,
i.e., $S_{ij_{\tilde{K}}} : Y_1^{X_1}$, where the matrix
$S_{\tilde{K}} := [s_{i,j}]_{\tilde{K}} \in (Y_1^{X_1})^{4\times 4}$ and
$S_{ij_{\tilde{K}}} := [s_{i,j}]_{\tilde{K}}$.
Adequately formalizing situations regarding coordinate transformations in surrogate
optimization can be beneficial from a theory-driven viewpoint as well as
from an application-driven viewpoint. From a theory-driven viewpoint, such a
formalization hints at the possibility of, for lack of a better word, a coordinate-invariant
surrogate modeling & simulation. From an application-driven viewpoint,
such a formalization illuminates the path towards the question of, e.g., to what
extent post-processing entities of a high-fidelity model based on a numerical
simulation (cf. § 2.2) can be captured by post-processing entities of a corresponding
low-fidelity model.
I argue that the CT approach can at least assist in formalizing such intricate
situations regarding coordinate transformations in surrogate optimization that, to the
best of my knowledge, have not been exhaustively studied.
By recalling the diagrams from (4.27) to (4.31), one can abstract a common pattern
from the above-mentioned situations regarding coordinate transformations in the
surrogate modeling & simulation sub-notion of surrogate optimization. Hence, I
propose the subsequent diagram as a means of formalizing this common pattern.
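[Diagram (4.63), given here in linear form as recovered from the extracted labels:
  M1:  X --f--> Y --J--> Z
  M3:  T(X) --T(f)--> T(Y) --T(J)--> T(Z)
  M4:  G(T(X)) --G(T(f))--> G(T(Y)) --G(T(J))--> G(T(Z))
  M2:  F(X) --F(f)--> F(Y) --F(J)--> F(Z)
  M5:  S(F(X)) --S(F(f))--> S(F(Y)) --S(F(J))--> S(F(Z))
  Arrows between the rows: F from M1 to M2; T from M1 to M3 and T^{-1} from M3 to M1;
  G from M3 to M4; S from M2 to M5 and S^{-1} from M5 to M2; U from M4 to M5;
  V from M5 to M4.]
(4.63)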
Observe that, in (4.63), there are five categories (recall Definition 4.2.1) $M_1$, $M_2$,
$M_3$, $M_4$, $M_5$ accompanied by the six functors (recall Definition 4.2.2)
$F : M_1 \to M_2$, $T : M_1 \to M_3$, $G : M_3 \to M_4$, $S : M_2 \to M_5$, $U : M_4 \to M_5$, and
$V : M_5 \to M_4$. Notice that, by providing the functors $F$, $T$, $G$, $S$, $U$, $V$, the
identification of the categories $M_1$, $M_2$, $M_3$, $M_4$, $M_5$ with their diagrammatic
representation in (4.63) should be clear.
Let us associate $M_1$ with a high-fidelity model, and link $M_3$ with the corresponding
coordinate-transformed high-fidelity model, and relate $M_4$ to the low-fidelity
model of the coordinate-transformed high-fidelity model. Furthermore, let
us associate $M_2$ with the corresponding low-fidelity model, and relate $M_5$ to the
corresponding coordinate-transformed low-fidelity model.
The inverse functor (recall Remark 4.2.10) $T^{-1} : M_3 \to M_1$ and the inverse
functor $S^{-1} : M_5 \to M_2$ indicate a coordinate transformation at the level of the
high-fidelity model and a coordinate transformation at the level of the low-fidelity
model, respectively. Notice that the existence of the inverse functors implies the
equivalence of categories (recall Definition 4.2.4) $M_1 \simeq M_3$ and $M_2 \simeq M_5$,
respectively. From an application-driven viewpoint, a potential reading of these
equivalences is that, roughly speaking, the high-fidelity model $M_1$ and the corresponding
coordinate-transformed high-fidelity model $M_3$ are essentially the same, and likewise
the low-fidelity model $M_2$ and the corresponding coordinate-transformed low-fidelity
model $M_5$ are essentially the same.
The formalization issue FI-A can be expressed within the diagram in (4.63) by
setting $M_2 \equiv M_5$ and $S := \mathrm{id}_{M_2}$, and by demanding that the functor $V$ is an
inverse functor to the functor $U$, i.e., $V \equiv U^{-1}$, such that there is an equivalence of
categories $M_4 \simeq M_2$. Moreover, by assuming $M_1 \simeq M_3$, one can ultimately regard
$F \equiv G$. Or to put it in other words: the low-fidelity model $M_2$ and the low-fidelity model
of the coordinate-transformed high-fidelity model $M_4$ are essentially the same.
Using the diagram in (4.63), the essence of the formalization issue FI-B is that
there is not necessarily a functor $V$ such that $V \equiv U^{-1}$, and therefore
$M_4 \not\simeq M_5$. One can roughly construe this non-equivalence as follows: the low-fidelity
model of the coordinate-transformed high-fidelity model $M_4$ and the coordinate-transformed
low-fidelity model $M_5$ are not essentially the same.
4.5 Future use cases for the CT toolset
I point out that, compared to, e.g., the usage of the language of functional analysis
and the language of differential geometry (recall ch. 2), the usage of the language of
category theory is still in an early stage regarding surrogate optimization within the
electromagnetics context.
However, by emphasizing the category theoretical language’s ability as a strong
notational scaffolding by diagrams of arrows, we have discussed in particular its po-
tential usefulness concerning some formal aspects of surrogate optimization within
the electromagnetics context.
Despite the fact that we have harnessed a lot of the CT toolset’s rigor, there is,
admittedly, more research needed to put the applications of the CT toolset on an
even more rigorous footing. I remark that the language of category theory offers
many more tools to be explored.
Besides the need for an even more rigorous footing of the applications of the CT
toolset, the findings of the present chapter can serve as a good starting point for
three interesting paths to pursue regarding future use cases for the CT toolset.
Concerning implementations of the CT toolset for surrogate optimization within
the electromagnetics context, the first path points at theory-driven implementations
in programming languages that follow different design principles: dynamically
checked, statically checked, imperative, functional, and similar. The expressiveness
of the chosen programming language will determine how many work-around artifacts
are needed in the actual programs and how the incorporation of, e.g., different
numerical solvers is concretized.
The second path is inspired by [114] where the authors promote category theory
as a multidisciplinary language in order to discuss a magneto-elasticity problem.
Thus, the CT toolset could act as a guiding principle in designing surrogate-guided
optimization methods for multidisciplinary design optimization (recall § 1.1). In
constructing the corresponding numerical algorithms, one strength of the CT toolset
could be particularly useful: the coherent change of perspective from an algebraic to
a geometric presentation.
The third path is concerned with a thorough development of a modeling and
reasoning environment for parallel SGO methods. Recall that the definition of a
category (see Definition 4.2.1) deals solely with sequential composition. However,
monoidal categories (see, e.g., [46, p. 72f]) seem promising for this path since they
incorporate sequential composition and parallel composition as well. Recalling the
diagrams in (4.32), monoidal categories can assist in meaningfully encoding
expressions such as¹⁰
$$\forall J_0, J_1, J_2.\ \forall f_0, f_1, f_2.\ (J_0 \circ f_0) \otimes (J_1 \circ f_1) \otimes (J_2 \circ f_2) = ((J_0 \otimes J_1) \circ (f_0 \otimes f_1)) \otimes (J_2 \circ f_2), \tag{4.64a}$$
$$\forall J_0, J_1, J_2.\ \forall f_0, f_1, f_2.\ (J_0 \circ f_0) \otimes (J_1 \circ f_1) \otimes (J_2 \circ f_2) = (J_0 \otimes J_1 \otimes J_2) \circ (f_0 \otimes f_1 \otimes f_2). \tag{4.64b}$$
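The statement in (4.64b) can be illustrated by a minimal sketch in which sequential composition is ordinary function composition and parallel composition acts componentwise on flat tuples. This is an assumption-laden toy setting (sets and functions with a strictly associative product), not the general monoidal machinery; the helper names `compose` and `tensor` as well as the maps `f0`, ..., `J2` are hypothetical stand-ins.

```python
def compose(g, f):
    """Sequential composition g after f (apply f first, then g)."""
    return lambda x: g(f(x))


def tensor(*maps):
    """Parallel composition: the i-th map acts on the i-th tuple component."""
    def parallel(xs):
        if len(xs) != len(maps):
            raise ValueError("tuple length must match the number of maps")
        return tuple(m(x) for m, x in zip(maps, xs))
    return parallel


# Hypothetical stand-ins: f_i maps a design parameter to a response,
# J_i maps a response to a cost value.
f0, f1, f2 = (lambda x: x + 1.0), (lambda x: 2.0 * x), (lambda x: x * x)
J0, J1, J2 = (lambda y: y * y), (lambda y: y - 3.0), (lambda y: abs(y))

# Left-hand side of (4.64b): three composed pipelines placed side by side.
lhs = tensor(compose(J0, f0), compose(J1, f1), compose(J2, f2))
# Right-hand side of (4.64b): all f_i in parallel, followed by all J_i in parallel.
rhs = compose(tensor(J0, J1, J2), tensor(f0, f1, f2))

xs = (0.5, -1.0, 2.0)
print(lhs(xs) == rhs(xs))
```

In this setting, both sides yield the same tuple for every admissible input, which is the content of the interchange between sequential and parallel composition.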
Moreover, monoidal categories help to formalize, e.g., the subsequent diagrams.
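[Diagrams (4.65) to (4.67), given here in linear form as recovered from the extracted labels:

  X0 --f0--> Y0 --J0--> Z0            X0 --Jf0--> Z0
             ⊗                             ⊗
  X1 --f1--> Y1 --J1--> Z1            X1 --Jf1--> Z1
             ⊗                             ⊗
  X2 --f2--> Y2 --J2--> Z2            X2 --Jf2--> Z2            (4.65)

  X0⊗X1⊗X2 --f0⊗f1⊗f2--> Y0⊗Y1⊗Y2 --J0⊗J1⊗J2--> Z0⊗Z1⊗Z2      (4.66)

  X0⊗X1⊗X2 --Jf0⊗Jf1⊗Jf2--> Z0⊗Z1⊗Z2                           (4.67)]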
¹⁰ For more details regarding expressions such as in (4.64), I refer to, e.g., [46, pp. 173–177] and
references therein.
4.6 In closing
The major aims of the chapter have been (1) to furnish us with a critical examina-
tion of the algebraic tools from the language of category theory (CT) that have been
anticipated in the preceding chapters; and (2) to investigate to what extent the
category theoretical language is beneficial in the development of an algebraic modeling
framework for applications in surrogate optimization.
In order to illustrate the principal application-oriented advantages of an alge-
braic modeling framework, we have recapitulated the contextual landscape of sur-
rogate optimization in the previous chapters and we have partly enlarged this land-
scape by the emerging context of full automation of surrogate-guided optimiza-
tion (SGO).
I have argued that to sufficiently comply with this emerging context, there is a
need for dealing with issues regarding software systems, too. Currently, however,
there is a gap between issues regarding software systems and the emerging context
of full automation of SGO.
Observing this gap, I have proposed the use of the category theoretical language
which can serve as a mediating instance between certain language-focused aspects
of software systems and the mathematical modeling involved in SGO methods.
Furthermore, I have mentioned some relevant related work regarding the devel-
opment of an algebraic modeling framework using the category theoretical language
for applications in surrogate optimization.
Subsequently, we have elaborated on the category theory toolset used in the
present work. This CT toolset, though, represents solely a subset of the large amount
of tools available within category theory. By illuminating the strengths of the CT
toolset as a strong notational scaffolding by diagrams of arrows, the power of the
coherent change of perspective from an algebraic presentation to a geometric pre-
sentation and vice versa has been shown.
There has been an attempt to balance the need for rigor and the need for
applicability as well as possible. Admittedly, more research is needed to put the
applications of the CT toolset on an even more rigorous footing. However, an important
concern has been to foster some intuition regarding the CT toolset. From an
application-oriented viewpoint, this intuition is in particular important to promote
the CT toolset’s wider acceptance.
Besides the intuition, for the CT toolset’s wider acceptance from an application-
oriented viewpoint, computational facets are another important point. Although
it is not the primary ambition of the present work, computational facets of the CT
toolset have been signposted and the relationship to more established toolsets (recall
ch. 2) has been illuminated. Finally, I have suggested a heuristics-driven notion of a
problem-dependent degree of forgetfulness (DoFF) in order to quantify the so-called
modeling error.
Afterwards, one way of specifying a general optimization problem by using the
CT toolset has been shown. For instance, we have carved out a diagram of arrows
that enables a discussion of performance-related issues regarding a general opti-
mization problem.
In specifying SGO methods by using the CT toolset, we have examined the clas-
sical set-oriented modeling paradigm "(set) functions as models" and the category-
oriented modeling paradigm: "Categories as models" and "functors as model trans-
formations".
By assuming an order-oriented fidelity notion regarding multi-fidelity methods,
we have in particular investigated the comparability of the concept of multifidelity
model management and the space mapping notion. Thus, I have propounded some
classification tools at the level of generalized functions (recall Figure 1.4) for the
concept of multifidelity model management and the space mapping notion.
By exploring the direction of the arrows in the category-oriented modeling para-
digm, some potential additional facets regarding the fidelity notion have been indi-
cated.
Furthermore, we have discussed the scalability of diagrams of arrows – while
maintaining the interpretability – concerning the handling of a high-fidelity opti-
mization problem and multiple low-fidelity optimization problems.
Ultimately, we have examined two use cases of the CT toolset that are rele-
vant to surrogate optimization for applications within the electromagnetics context:
(1) simplified-physics low-fidelity models, and (2) coordinate transformations.
Regarding the first use case, it has been exemplified how the machinery of cate-
gory theory can be applied to the construction process and the interaction of simpli-
fied-physics low-fidelity models.
It has been shown, for instance, how certain modeling decisions can lead to re-
strictions w.r.t. certain constructions. Furthermore, we have invoked the heuristics-
driven notion of a problem-dependent DoFF as a useful auxiliary means to quan-
tify simplified-physics low-fidelity models as forgetful interpretations of the high-
fidelity model.
Regarding the second use case, it has been exemplified how the machinery of
category theory can be applied to handle certain formalization issues concerning the
surrogate modeling & simulation sub-notion of surrogate optimization.
One formalization issue is related to coordinates w.r.t. a physical space; another
formalization issue is related to coordinates w.r.t. an abstract vector space. A context
for the first formalization issue is provided by the coordinate transformation of a
helical coil w.r.t. a well-posed boundary value problem. A context for the second
formalization issue is provided by the transformation of 4-port S-parameters.
By using the CT toolset, I have proposed a diagram of arrows as a common
generic interface of both formalization issues. It has been demonstrated that, by
setting adequately the entities within the diagram, the essential statement of the re-
spective formalization issue can be instantiated.
We have ended the investigation by exhibiting three potential future use cases
for the CT toolset based on the findings of the present chapter: (1) implementations
in programming languages, (2) SGO methods for multidisciplinary design optimiza-
tion, and (3) a modeling and reasoning environment for parallel SGO methods.
Chapter 5
Surrogate optimization with the
magnetoquasistatic model
Invoking the insights from the previous chapters, let us discuss how to apply a sub-
set of these insights to four optimization problems relevant within an electrical en-
gineering design context. Therefore, I deem it beneficial to utilize the image of a
rapid prototyping part of a product development cycle as a means to contextualize
the subsequent sections. Hence, I present a strategy of using surrogate optimization
in order to support preexisting electrical engineering design workflows.
In the first section, I elaborate on two optimization problems with regard to a
single inductive component, more precisely, to a solenoid with a core within the set-
ting of a 2D-LBVP. In the preliminary considerations, I present all the basic building
blocks to formulate the two optimization problems. Given an operating frequency,
the first optimization problem uses the semantics of the time-averaged ohmic loss
to encode the objective functional and it uses the semantics of the inductance to
encode the constraints besides some box constraints. The second optimization prob-
lem extends the discussion of the first optimization problem to the case of multiple
operating frequencies under test.
In the second section, let us discuss briefly one optimization problem with re-
gard to a common-mode choke within a prototypical version of a simplistic EMC
filter that is embedded in the setting of a 3D-LBVP. Furthermore, we discuss another
optimization problem with regard to two common-mode chokes within the setting
of a 2D-LBVP.
Concerning the first optimization problem, the aim is to minimize the magnitude
of one scattering parameter for a very narrow frequency range. From a theoretical
viewpoint, the notion of scattering parameters is not encoded within the magneto-
quasistatic model. From an application-driven viewpoint, though, there is usually
a fuzzy bounded overlapping modeling region of the magnetoquasistatic subsys-
tem of Maxwell’s equations and the complete system of Maxwell’s equations to gain
some knowledge about a given application for low frequency ranges such as, e.g.,
from 0 Hz to 200 MHz. The focus is mainly on the surrogate optimization; thus, we
consider the corresponding high-fidelity model primarily as a black-box model.
Regarding the second optimization problem, I examine the optimal positioning
of two common-mode chokes in order to lower their inductive coupling. I present
an approximate encoding of the high-fidelity optimization problem and, for this use
case, let us discuss concisely a surrogate optimization strategy as well.
5.1 Solenoid with a core
5.1.1 Preliminary consideration
The device under test is a solenoid with a core as a representative of the class of
inductive components.
In order to formulate a meaningful optimization problem within the syntacti-
cal framework discussed in § 2.3, let us briefly illustrate two physical behaviors of
inductive components that are relevant in practical applications: (1) losses and (2)
electromagnetic compatibility (EMC). These two main aspects are organized within
a short list. For a comprehensive treatment of the specific sub-aspects in this
short list, though, I refer to, e.g., [154] or [164] and references therein.
The short list reads as:
(1) losses
(1.1) losses in the winding
(1.1.1) losses due to a direct current
(1.1.2) losses due to an alternating current
(1.1.2.1) losses due to the skin effect
(1.1.2.2) losses due to the proximity effect
(1.2) losses in the core
(1.2.1) losses due to hysteresis
(1.2.2) losses due to eddy currents
(1.2.3) losses due to relaxation processes
(2) electromagnetic compatibility
(2.1) galvanic coupling or ohmic coupling
(2.2) electric coupling or capacitive coupling
(2.3) magnetic coupling or inductive coupling
(2.4) radiative coupling.
Invoking the EMC theory’s so-called "Emitter-Coupler-Receiver" scaffolding, one
can conceive the electromagnetic compatibility of an inductive component as the
component’s capability to interact in a mostly unwanted way with another electric
device. In the chapter’s exposition, let us be mainly concerned with the magnetic
coupling. Recalling Figure 1.1, an interesting use case is the magnetic coupling of
inductive components within an electromagnetic system such as, e.g., an EMC filter.
Therefore, the positioning of the inductive components within such an EMC filter
can influence the behavior of the filter.
Notice that I comprehend all the losses of an inductive component roughly as
power losses converted into heat. With regard to the BVPs under consideration,
though, it is assumed that the core is lossless which is a reasonable approximation
of real inductive components if their core material is ferrite.
In order to encode the solenoid with a core within a 2D-LBVP, one can break,
metaphorically speaking, the helicoidal winding with $N \in \mathbb{N}$ turns of an actual
solenoid into $N$ toroids (recall Figure 1.3). Finally, one can exploit the rotational
symmetry of the corresponding 3D-LBVP, which is geometrically composed of $N$ toroids
and a cylindrical core, in order to turn the 3D-LBVP into a 2D-LBVP (cf. Figure 3.7
and Figure 3.8). Hence, for the purpose of a numerical simulation within FEMM 4.2,
let us encode the solenoid with a core according to the schematic description in (i) of
Figure 5.1.¹
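[Figure 5.1. Panel (i): schematic with the labels $\Omega^{2D}_{nc,1}$, $\Omega^{2D}_{nc,2}$,
$\Omega^{2D}_{c,1}$, $\Omega^{2D}_{c,2}$, ..., $\Omega^{2D}_{c,N}$, $\partial\Omega^{2D}$, the axis of symmetry,
$x_2$, $d_c + 2x_1$, $h_{\Omega^{2D}_{c,2}}(x_1)$, and $w_{\Omega^{2D}_{c,2}}(x_1, x_2)$. Panel (ii): mesh.]
FIGURE 5.1: (i) A schematic illustration of a solenoid with core with
an axially symmetric domain. (ii) A simplicial triangulation $T_h$ of the
space region $\Omega^{2D}$ via FEMM 4.2.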
In (i) of Figure 5.1, the entity $d_c \in \mathbb{R}_+$ refers to a fixed distance between the $N$
toroids. It is supposed that the radii of the cross-section areas of the $N$ toroids all
have the same length, which is encoded by the parameter $x_1$. The parameter $x_2$ denotes the
radius of the $N$ toroids themselves. The height of the core $h_{\Omega^{2D}_{c,2}}$ is an affine
function with the signature $\mathbb{R}_+ \to \mathbb{R}_+$ that depends on the parameter $x_1$, more
precisely,
$$h_{\Omega^{2D}_{c,2}} = x_1 \mapsto N(d_c + x_1) + a_0 : \mathbb{R}_+ \to \mathbb{R}_+, \tag{5.1}$$
where $a_0 \in \mathbb{R}_+$ is a fixed parameter to set up the core's vertical expansion below the
lowest circle and above the highest circle in (ii) of Figure 5.1. Furthermore, note that
the width of the core $w_{\Omega^{2D}_{c,2}}$ is an affine function with the signature
$\mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}_+$ that depends on the parameters $x_1$ and $x_2$, more precisely,
$$w_{\Omega^{2D}_{c,2}} = (x_1, x_2) \mapsto 2(x_2 - x_1) - 2a_1 : \mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}_+, \tag{5.2}$$
¹ Mind that parts of the corresponding simulation code at an early stage have been developed
during a bachelor thesis called "Numerische Optimierung in der Magnetoquasistatik mit dem Space
Mapping Ansatz" [in English: "Numerical Optimization in Magnetoquasistatics using the Space
Mapping Approach"] (Albert Piwonski, summer term 2017; unpublished) under my scientific supervision
and reviewed by the first reviewer of the present work and Prof. Dr.-Ing. Ronald Plath (TU Berlin).
where $a_1 \in \mathbb{R}_+$ is a fixed parameter to set up a minimal distance with respect to the
core's horizontal expansion towards the series of circles in (ii) of Figure 5.1. In the
test cases, it is set that $N := 10$, $d_c := 0.1\,\mathrm{mm}$, $a_0 := 2.0\,\mathrm{mm}$, and
$a_1 := 0.5\,\mathrm{mm}$.
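The two affine maps in (5.1) and (5.2) can be transcribed directly; the following minimal sketch does so with the test-case values stated above, converted to SI units. The function and constant names are my own choice for illustration and do not stem from the simulation code.

```python
# Test-case parameters from the text, converted to metres.
N = 10          # number of toroids (turns)
D_C = 0.1e-3    # fixed distance d_c between the toroids
A_0 = 2.0e-3    # vertical margin parameter a_0 of the core
A_1 = 0.5e-3    # horizontal margin parameter a_1 of the core


def core_height(x1):
    """Height of the core according to (5.1): N * (d_c + x1) + a_0."""
    return N * (D_C + x1) + A_0


def core_width(x1, x2):
    """Width of the core according to (5.2): 2 * (x2 - x1) - 2 * a_1."""
    return 2.0 * (x2 - x1) - 2.0 * A_1


h = core_height(1.0e-3)          # x1 at its lower bound
w = core_width(1.0e-3, 5.0e-3)   # x1 and x2 at their lower bounds
print(h, w)
```

Mind that the width in (5.2) is positive only if $x_2 - x_1 > a_1$, which holds for the parameter ranges considered in the optimization problems of this chapter.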
With regard to the constitutive equations in (2.2), the material characteristics of the
subdomain $\Omega^{2D}_{nc,1}$ are set equal to the material characteristics of air. The
material properties of the subdomain $\Omega^{2D}_{nc,2}$ are identified with the material
properties of the subdomain $\Omega^{2D}_{nc,1}$, except for the relative magnetic permeability,
which is set to $5 \times 10^3$. Finally, for the subdomains from $\Omega^{2D}_{c,1}$ to
$\Omega^{2D}_{c,N}$, the material characteristics of plain copper are assumed (recall § 3.1.3).
With regard to the source term $J_{src}$ in (2.16), a fixed current intensity (or in short,
a fixed current) $I_0 \in \mathbb{R}_+$ where $I_0 := 1\,\mathrm{A}$ is associated with the subdomains
from $\Omega^{2D}_{c,1}$ to $\Omega^{2D}_{c,N}$ such that, in the case of direct current and in the case
of alternating current of sinusoidal waveform, the value of the root mean squared current
$I_{rms}$ is identical to the value of $I_0$, i.e.,
$$I_{rms} \equiv I_0. \tag{5.3}$$
Observe that, in the case of alternating current of sinusoidal waveform, the requirement
in (5.3) implies that, due to the fact that the relationship
$$\forall I_{rms} \in \mathbb{R}_+.\ \exists I_{peak} \in \mathbb{R}_+.\ I_{rms} = \frac{I_{peak}}{\sqrt{2}} \tag{5.4}$$
holds, the value of the peak current $I_{peak}$ has to be set equal to the value of
$\sqrt{2}\, I_0$, i.e.,
$$I_{peak} \equiv \sqrt{2}\, I_0. \tag{5.5}$$
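The relationship between (5.3), (5.4), and (5.5) can be checked numerically. The following minimal sketch samples one period of a sinusoid with peak value $\sqrt{2}\, I_0$ and computes its root mean square; the helper name `rms_of_sinusoid` and the number of samples are arbitrary choices made for illustration.

```python
import math


def rms_of_sinusoid(i_peak, samples=10_000):
    """RMS of i(t) = i_peak * cos(2*pi*t) over one period via uniform sampling."""
    total = sum((i_peak * math.cos(2.0 * math.pi * k / samples)) ** 2
                for k in range(samples))
    return math.sqrt(total / samples)


I_0 = 1.0                        # fixed current in ampere, as in the text
I_peak = math.sqrt(2.0) * I_0    # peak value demanded by (5.5)
I_rms = rms_of_sinusoid(I_peak)  # should reproduce I_0, cf. (5.3)
print(I_rms)
```

Up to floating-point round-off, the computed root mean square coincides with `I_0`.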
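Furthermore, it is demanded that the source current density's orientation is in the
same direction for each subdomain $\Omega^{2D}_{c,i}$ with $i \in \{1, 2, \dots, N\}$. Mind that, from
an electrical network viewpoint on which we elaborate briefly below, one can consider
the conducting subdomains as galvanically connected in series such that the following
conditions hold:
$$\forall i \in \{1, \dots, N\}.\ \int_{\Omega^{2D}_{c,i}} \mathrm{Re}(J_{cond}) \cdot \mathrm{d}A := \begin{cases} I_0, & \text{if } \omega \equiv 2\pi \cdot f \text{ with } f := 0\,\mathrm{Hz} \\ I_0 \sqrt{2}, & \text{if } \omega \equiv 2\pi \cdot f \text{ with } f > 0\,\mathrm{Hz}, \end{cases} \tag{5.6a}$$
$$\exists I = \{1, \dots, N\}.\ \int_{\biguplus_{i \in I} \Omega^{2D}_{c,i}} \mathrm{Re}(J_{cond}) \cdot \mathrm{d}A := \begin{cases} I_0, & \text{if } \omega \equiv 2\pi \cdot f \text{ with } f := 0\,\mathrm{Hz} \\ I_0 \sqrt{2}, & \text{if } \omega \equiv 2\pi \cdot f \text{ with } f > 0\,\mathrm{Hz}, \end{cases} \tag{5.6b}$$
where the common map $\mathrm{Re} : \mathbb{C} \to \mathbb{R}$ that maps a complex number to its real part is
overloaded.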
Considering the boundary conditions in (2.16), solely Dirichlet boundary conditions
are imposed on $\partial\Omega^{2D}$. More precisely, it is prescribed that the magnetic vector
potential $A$ is equal to the zero vector field along $\partial\Omega^{2D}$.
In the context of the numerical simulation of the magnetoquasistatic model (recall
§ 2.2), one has to pay attention to two points regarding the boundary $\partial\Omega^{2D}$.
First, the boundary $\partial\Omega^{2D}$ is topologically isomorphic (or homeomorphic) to the
boundary of a disk, i.e., to a one-dimensional sphere. If one examines other kinds
of isomorphisms as well, then, in general, the shape of the boundary $\partial\Omega^{2D}$ and, in
particular, the distance from the subdomains $\Omega^{2D}_{nc,2}$ and $\Omega^{2D}_{c,i}$ for all
$i \in \{1, 2, \dots, N\}$ to the boundary $\partial\Omega^{2D}$ may be relevant to the 2D-LBVP. However, let
us invoke a pragmatic approach such that we heuristically set the distance
$d_{\Omega^{2D}_{nc,2},\partial\Omega^{2D}} \in \mathbb{R}_+$, that is, the distance from the vertical half of the core
to the end of the boundary $\partial\Omega^{2D}$, to at least four times $h_{\Omega^{2D}_{c,2}}(x_1)$ in (5.1),
more precisely,
$$\forall h_{\Omega^{2D}_{c,2}}(x_1) \in \mathbb{R}_+.\ d_{\Omega^{2D}_{nc,2},\partial\Omega^{2D}} \geq 4\, h_{\Omega^{2D}_{c,2}}(x_1). \tag{5.7}$$
nc,2,∂Ω2Dis also relevant regarding the change of the pa-
rameters x1and x2such as within a numerical optimization. The parameters x1
and x2are associated with the geometry of the space region Ω2D, i.e., Ω2D(x1,x2)
(recall § 2.2.3), thus, in a non-intrusive environment (recall § 3.1.3), a change of these
parameters leads inevitably to a re-meshing. More precisely, a map ϕais provided
that reads as
ϕa=(x1a,x2a)↦ϕa(x1a,x2a)∶=(x1b,x2b)∶R+×R+→R+×R+, (5.8)
where ∀a,b∈N.a≠bÔ⇒ (x1a,x2a)≠(x1b,x2b), otherwise ϕais identical to the cor-
responding identity map idϕa. If we associate a 2-tuple (x1a,x2a)with a simplicial
triangulation T2D
haand if we associate a 2-tuple (x1b,x2b)with a simplicial triangu-
lation T2D
hb, then, by invoking the map ϕain (5.8), one can encode re-meshing figura-
tively by the map Φathat reads as
Φa=T2D
ha↦Φa(T2D
ha)∶=T2D
hb∶Ω2D(x1a,x2a)→Ω2D(ϕa(x1a,x2a)), (5.9)
where if and only if Φais identical to the corresponding identity map idΦa, the num-
ber of nodes, edges, and surfaces is preserved.
In order to mitigate the potential spurious influence of re-meshing on entities
depending on the parameters x1and x2such as quantities of interest, let us apply a
heuristic approach in the sense that
• the meshing is fixed to balance a highest possible resolution and a reasonable
computation time (for a representative of the corresponding simplicial trian-
gulation, see (ii) in Figure 5.1);
• any kind of adaptive meshing is disabled; and
• the distance $d_{\Omega^{2D}_{nc,2},\partial\Omega^{2D}}$ in (5.7) is fixed to at least four times
$h_{\Omega^{2D}_{c,2}}(x_{1,\max})$, where $x_{1,\max}$ is conceived as the maximal number in a bounded
interval $[x_{1,\min}, x_{1,\max}]$.
For more details on topics such as re-meshing and the like, I refer to the research, for
instance, within the field of shape optimization (see, e.g., [54] and references therein).
In technical applications (recall Figure 1.1), it is common to adopt an electrical
network viewpoint regarding an inductive component within an electromagnetic
system such that the inductive component is expressed by a circuit diagram repre-
sentation.
In Appendix B.1, the electrical network viewpoint is concisely elaborated. Further,
I recall the relationship between entities at the field theoretical level such as
$P_L \in \mathbb{R}_+$, which denotes the time-averaged ohmic loss in $\Omega^{2D}$, and $W_m \in \mathbb{R}_+$, which
denotes the time-averaged magnetic energy in $\Omega^{2D}$, and entities at the circuit theoretical
level such as $R \in \mathbb{R}_+$, which denotes the resistance, and $L \in \mathbb{R}_+$, which denotes the
inductance.
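In order to formulate an optimization problem in accordance with our elaborations
in § 2.3, let us introduce the map $\hat{j}_{P_L}$ and the map $\hat{Q}_L$ such that
$$\hat{j}_{P_L} = (\omega, x_1, x_2) \mapsto P_L \equiv \hat{j}_{P_L}(\omega, x_1, x_2) : \mathbb{R}_+ \times \mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}_+, \tag{5.10a}$$
$$\hat{Q}_L = (\omega, x_1, x_2) \mapsto L \equiv \hat{Q}_L(\omega, x_1, x_2) : \mathbb{R}_+ \times \mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}_+, \tag{5.10b}$$
where $x_1$ and $x_2$ refer to the entities in (i) of Figure 5.1. Next, recalling (3.128), one
can construct a new map $\hat{j}_{P_L,\omega}$ and a new map $\hat{Q}_{L,\omega}$ by currying such that
$$\hat{j}_{P_L,\omega} = \omega \mapsto ((x_1, x_2) \mapsto \hat{j}_{P_L}(\omega, x_1, x_2)) : \mathbb{R}_+ \to \mathbb{R}_+^{\mathbb{R}_+ \times \mathbb{R}_+}, \tag{5.11a}$$
$$\hat{Q}_{L,\omega} = \omega \mapsto ((x_1, x_2) \mapsto \hat{Q}_L(\omega, x_1, x_2)) : \mathbb{R}_+ \to \mathbb{R}_+^{\mathbb{R}_+ \times \mathbb{R}_+}. \tag{5.11b}$$
If one fixes the operating frequency $f_0$ and the angular operating frequency
$\omega_0 \equiv 2\pi f_0$, then one can define the map $\hat{j}_{P_L,\omega_0}$ and the map $\hat{Q}_{L,\omega_0}$ such that
$$\hat{j}_{P_L,\omega_0} = (x_1, x_2) \mapsto \hat{j}_{P_L}(\omega_0, x_1, x_2) : \mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}_+, \tag{5.12a}$$
$$\hat{Q}_{L,\omega_0} = (x_1, x_2) \mapsto \hat{Q}_L(\omega_0, x_1, x_2) : \mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}_+, \tag{5.12b}$$
where $\hat{j}_{P_L,\omega_0} \equiv \hat{j}_{P_L,\omega}(\omega_0)$ and $\hat{Q}_{L,\omega_0} \equiv \hat{Q}_{L,\omega}(\omega_0)$.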
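The currying step from (5.10) via (5.11) to (5.12) can be sketched with nested closures. Mind that `toy_ohmic_loss` is a hypothetical stand-in with an arbitrary formula; it merely replaces the simulation-based map for the purpose of illustration, and the helper name `curry_frequency` is likewise my own choice.

```python
import math


def curry_frequency(model):
    """Turn model(omega, x1, x2) into omega -> ((x1, x2) -> model(omega, x1, x2))."""
    def at_frequency(omega):
        def at_geometry(x1, x2):
            return model(omega, x1, x2)
        return at_geometry
    return at_frequency


def toy_ohmic_loss(omega, x1, x2):
    """Hypothetical stand-in for the simulated loss; NOT the FEMM-based model."""
    return 1.0e-9 * math.sqrt(omega) * x2 / x1


omega_0 = 2.0 * math.pi * 1.0e5             # angular operating frequency
j_omega = curry_frequency(toy_ohmic_loss)   # counterpart of (5.11a)
j_omega_0 = j_omega(omega_0)                # counterpart of (5.12a)

print(j_omega_0(2.0e-3, 7.0e-3))
```

In Python, `functools.partial(toy_ohmic_loss, omega_0)` yields an equivalent two-argument map; the explicit closures are used here only because they mirror the signature in (5.11) more closely.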
Subsequently, the operating frequency $f_0$ is set to $f_0 := 1 \times 10^5\,\mathrm{Hz}$. This choice is
based on a rough heuristic estimate of the scale of the winding losses. This heuristic
estimate reads as
$$e_{P_L} = (x_1, f_0) \mapsto 2 x_1 / \delta_S(f_0) : \mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}_+, \tag{5.13}$$
where $\delta_S$ has the signature $\mathbb{R}_+ \to \mathbb{R}_+$ and
$\delta_S(f_0) := (\pi f_0 \mu_0 \sigma_{Cu})^{-\frac{1}{2}}$ denotes the so-called skin depth
(cf. [103, p. 220]) w.r.t. a good single conductor (recall § 3.1.3) evaluated at the
operating frequency $f_0$.
If a lower bound of $x_1$ is set to $x_{1,\min} := 1 \times 10^{-3}\,\mathrm{m}$ and if an upper bound of
$x_1$ is set to $x_{1,\max} := 3 \times 10^{-3}\,\mathrm{m}$, then one can ascertain the corresponding values
of the estimate $e_{P_L,f_0}$ for various operating frequencies in Table 5.1.
In Table 5.1, one can observe that the scale of the winding losses is, very roughly
estimated, forty-five times greater at the frequency $f_0 := 1 \times 10^5\,\mathrm{Hz}$ than at the
frequency $f_0 := 5 \times 10^1\,\mathrm{Hz}$. Thus, given the frequency $f_0 := 1 \times 10^5\,\mathrm{Hz}$, I conclude
that the winding losses are at a level significant enough to be critical in technical
applications.
Notice that the estimate in (5.13) can also be used to illustrate the decrease of
the magnetic energy due to a conductor's higher magnetic shielding effect at higher
operating frequencies. However, the rate of the magnetic energy's decrease is much
lower compared to the rate of the winding losses' increase.²
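The heuristic estimate in (5.13) is straightforward to evaluate. In the following minimal sketch, the copper conductivity `SIGMA_CU` is an assumption on my part, since its numerical value is not stated in this section; with the assumed value of 5.96e7 S/m, the sketch yields numbers close to those listed in Table 5.1. The function names are chosen for illustration only.

```python
import math

MU_0 = 4.0e-7 * math.pi   # vacuum permeability in H/m
SIGMA_CU = 5.96e7         # ASSUMED conductivity of copper in S/m


def skin_depth(f):
    """Skin depth (pi * f * mu_0 * sigma_Cu)^(-1/2) in metres, for f > 0."""
    return 1.0 / math.sqrt(math.pi * f * MU_0 * SIGMA_CU)


def loss_scale_estimate(x1, f):
    """Estimate (5.13): wire diameter 2 * x1 relative to the skin depth."""
    return 2.0 * x1 / skin_depth(f)


e_low = loss_scale_estimate(1.0e-3, 5.0e1)    # x1 at lower bound, 50 Hz
e_high = loss_scale_estimate(1.0e-3, 1.0e5)   # x1 at lower bound, 100 kHz
ratio = e_high / e_low                        # equals sqrt(1e5 / 5e1)
print(e_low, e_high, ratio)
```

Since the estimate scales with the square root of the frequency, the ratio between 100 kHz and 50 Hz is independent of the assumed conductivity and amounts to $\sqrt{2000}$, i.e., roughly forty-five.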
² For more details on computations w.r.t. magnetic fields in the magnetoquasistatic model, I refer
to, e.g., [103, pp. 218–224] and references therein.

TABLE 5.1: Given a lower bound $x_{1,\min}$ and an upper bound $x_{1,\max}$,
the rough heuristic estimate of the scale of the winding losses in (5.13)
for various operating frequencies $f_0$. (A) The lower bound $x_{1,\min}$ set to
$1 \times 10^{-3}\,\mathrm{m}$. (B) The upper bound $x_{1,\max}$ set to $3 \times 10^{-3}\,\mathrm{m}$.

  f0 [Hz]    (A) e_PL(x1,min, f0) [1]    (B) e_PL(x1,max, f0) [1]
  5×10^1       0.217                       0.651
  1×10^2       0.307                       0.920
  1×10^3       0.970                       2.910
  1×10^4       3.068                       9.204
  1×10^5       9.701                      29.104
  1×10^6      30.678                      92.035
  1×10^7      97.014                     291.041
  1×10^8     306.784                     920.353

From an electromagnetic system point of view, it might be more adequate to
regard the time-averaged ohmic loss density, that is, the time-averaged ohmic loss
normalized to a volume under test $V_{ut}$. Hence, the time-averaged ohmic loss density
enables a comparison of different solenoids with core (recall Figure 5.1) by taking
into account a restriction of the available space within an electromagnetic system.
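Given a map $V_{ut} = (x_1, x_2) \mapsto V_{ut}(x_1, x_2)$ with the signature
$\mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}_+$, one can define the map $\hat{j}_{P_L,V_{ut}}$, the map
$\hat{j}_{P_L,V_{ut},\omega}$, and the map $\hat{j}_{P_L,V_{ut},\omega_0}$ analogous to (5.10a),
(5.11a), and (5.12a), that is,
$$\hat{j}_{P_L,V_{ut}} = (\omega, x_1, x_2) \mapsto \hat{j}_{P_L}(\omega, x_1, x_2) / V_{ut}(x_1, x_2) : \mathbb{R}_+ \times \mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}_+, \tag{5.14a}$$
$$\hat{j}_{P_L,V_{ut},\omega} = \omega \mapsto ((x_1, x_2) \mapsto \hat{j}_{P_L,V_{ut}}(\omega, x_1, x_2)) : \mathbb{R}_+ \to \mathbb{R}_+^{\mathbb{R}_+ \times \mathbb{R}_+}, \tag{5.14b}$$
$$\hat{j}_{P_L,V_{ut},\omega_0} = (x_1, x_2) \mapsto \hat{j}_{P_L,V_{ut}}(\omega_0, x_1, x_2) : \mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}_+, \tag{5.14c}$$
where $\hat{j}_{P_L,V_{ut},\omega_0} \equiv \hat{j}_{P_L,V_{ut},\omega}(\omega_0)$.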
I conceive $V_{ut}(x_1,x_2) \in \mathbb{R}_+$ in (5.14) as the volume of the core $V_c(x_1,x_2) \in \mathbb{R}_+$ combined with the volume of the winding $V_w(x_1,x_2) \in \mathbb{R}_+$, that is,
$$V_{ut}(x_1,x_2) \equiv V_c(x_1,x_2) + V_w(x_1,x_2), \tag{5.15}$$
where $V_c(x_1,x_2)$ and $V_w(x_1,x_2)$ are determined by means of numerical integration. Recall that it is assumed that a cylindrical core and $N$ toroids are associated with the spatial domain in Figure 5.1. Hence, one can also determine $V_c(x_1,x_2)$ and $V_w(x_1,x_2)$ by means of the following formulae:
$$V_c(x_1,x_2) := \pi \big(0.5\, w_{\Omega^{2D}_{c,2}}(x_1,x_2)\big)^2\, h_{\Omega^{2D}_{c,2}}(x_1,x_2), \qquad V_w(x_1,x_2) := N\, 2\pi^2 x_1^2 x_2. \tag{5.16}$$
However, let us proceed with the computation of $V_{ut}(x_1,x_2)$ by numerical integration since it is a more general approach. Due to numerical inaccuracies, though, there are slight differences between the computation of $V_{ut}(x_1,x_2)$ by numerical integration and by the formulae in (5.16).
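The slight mismatch between numerical integration and the closed formula for the winding volume in (5.16) can be illustrated with a minimal sketch. It assumes a simple midpoint rule over the circular cross-section of one toroid and is not the integration routine used in this work; the function names, the grid resolution `n_cells`, and the sample point are hypothetical.

```python
import math

def winding_volume_formula(x1, x2, n_turns=1):
    """Closed form from (5.16): N * 2 * pi^2 * x1^2 * x2."""
    return n_turns * 2.0 * math.pi**2 * x1**2 * x2

def winding_volume_midpoint(x1, x2, n_turns=1, n_cells=400):
    """Midpoint rule for the solid of revolution: sum of 2*pi*rho*dA over the disk."""
    h = 2.0 * x1 / n_cells
    total = 0.0
    for i in range(n_cells):
        u = -x1 + (i + 0.5) * h          # radial offset from the toroid centre line
        for j in range(n_cells):
            v = -x1 + (j + 0.5) * h      # axial offset
            if u * u + v * v <= x1 * x1:  # cell centre lies inside the cross-section
                total += 2.0 * math.pi * (x2 + u) * h * h
    return n_turns * total

# Hypothetical sample point inside the box used later in (5.17).
v_exact = winding_volume_formula(2.0e-3, 7.5e-3)
v_num = winding_volume_midpoint(2.0e-3, 7.5e-3)
rel_diff = abs(v_num - v_exact) / v_exact
print(v_exact, v_num, rel_diff)
```

The relative difference is small but non-zero, which mirrors the remark on numerical inaccuracies; a finer grid would reduce it at a higher cost.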
5.1.2 Optimization problem I
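Without loss of general applicability, in the subsequent exemplification regarding surrogate optimization, let us focus on the term $\hat{j}_{P_L,\omega_0}(x_1,x_2)$ in (5.12a) rather than on the term $\hat{j}_{P_L,V_{ut},\omega_0}(x_1,x_2)$ in (5.14c); or to put it differently, let us focus on the ohmic loss rather than the ohmic loss density. Hence, I investigate the following high-fidelity optimization problem as a concrete instance of the abstract optimization problem in (2.36) that reads as
$$\min.\quad \hat{j}_{P_L,\omega_0}(x_1,x_2) \tag{5.17a}$$
$$\text{s.t.}\quad x_{1,\min} \le x_1 \le x_{1,\max}, \tag{5.17b}$$
$$x_{2,\min} \le x_2 \le x_{2,\max}, \tag{5.17c}$$
$$L_{\min} - \hat{Q}_{L,\omega_0}(x_1,x_2) \le 0, \tag{5.17d}$$
$$\hat{Q}_{L,\omega_0}(x_1,x_2) - L_{\max} \le 0, \tag{5.17e}$$
where $\omega_0 := 2\pi\,100\,\mathrm{kHz}$, $x_{1,\min} := 1\times10^{-3}\,\mathrm{m}$, $x_{1,\max} := 3\times10^{-3}\,\mathrm{m}$, $x_{2,\min} := 5\times10^{-3}\,\mathrm{m}$, $x_{2,\max} := 10\times10^{-3}\,\mathrm{m}$, $L_{\min} := 2.5\times10^{-6}\,\mathrm{H}$, and $L_{\max} := 3.5\times10^{-6}\,\mathrm{H}$.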
Remark 5.1.1. Due to production considerations in practical applications, there might be
only integer length quantities or a mix of a real length quantity and an integer length quan-
tity. Hence, this kind of modeling issue might be covered adequately by an integer optimiza-
tion problem or a mixed-integer optimization problem. However, in the present work, I regard
the problem in (5.17) as a reasonable approximation of potential modeling implications due
to production considerations in practical applications.
Remark 5.1.2. Recalling § 2.3.1, one can observe that in (5.17), there is an evaluated re-
duced parametric quantity of interest in the objective functional and there is an evaluated
reduced parametric quantity of interest in the constraints, as well.
It is supposed that the relation in (3.106) holds, at least in a worst-case sense. Regarding the direct solving of the high-fidelity optimization problem in (5.17) by means of an adequate optimization algorithm from § 2.3.3, a worst-case scenario can be conceived as, e.g., the unfavorable choice of an initial point for a locally convergent algorithm.³

³ Even if some potential physical intuition is available concerning a suitable candidate for an initial point, such a worst-case scenario is highly probable in practical applications where the shape of the admissible set of solutions and the landscape of the evaluated objective functional are usually unknown.

For instance, if we apply the COBYLA algorithm (recall § 2.3.3) to the problem in (5.17) with the initial point $x^{(0)}$ and the maximum number of high-fidelity function evaluations $m_{DSO,\max}$ that read as
$$m_{DSO,\max} \equiv 30, \qquad x^{(0)} := (1.1\times10^{-3}\,\mathrm{m},\ 9.9\times10^{-3}\,\mathrm{m}), \tag{5.18}$$
then one obtains the subsequent log data in an abridged version, i.e.,
$$m_{DSO} \equiv 30, \qquad x^{*} := (2.98\times10^{-3}\,\mathrm{m},\ 8.86\times10^{-3}\,\mathrm{m}), \tag{5.19a}$$
$$\hat{j}_{P_L,\omega_0}(x_1^{*},x_2^{*}) \equiv 96.32\times10^{-3}\,\mathrm{W}, \qquad \hat{Q}_{L,\omega_0}(x_1^{*},x_2^{*}) := 2.51\times10^{-6}\,\mathrm{H}, \tag{5.19b}$$
where the 3-tuple $(\Delta_{\hat{j}_{P_L,\omega_0}}, \Delta_{\hat{Q}_{L,\omega_0}}, \Delta_{x^{*}}) \in \mathbb{R}_+ \times \mathbb{R}_+ \times \mathbb{R}_+$ is set to
$$(\Delta_{\hat{j}_{P_L,\omega_0}}, \Delta_{\hat{Q}_{L,\omega_0}}, \Delta_{x^{*}}) := (1.0\times10^{-4},\ 1.0\times10^{-4},\ 1.0\times10^{-8}), \tag{5.20}$$
which is constituted by the absolute accuracy threshold for the evaluated objective functional $\Delta_{\hat{j}_{P_L,\omega_0}}$, the absolute accuracy threshold for the constraints $\Delta_{\hat{Q}_{L,\omega_0}}$, and the relative accuracy threshold for the optimal solution $\Delta_{x^{*}}$. Notice that I do not dwell on these thresholds since they appear problem-dependent. However, let us retain the 3-tuple $(\Delta_{\hat{j}_{P_L,\omega_0}}, \Delta_{\hat{Q}_{L,\omega_0}}, \Delta_{x^{*}})$ for all optimization problems under consideration. To put it differently: It is assumed that inclusion maps that are adequate in some sense (similar to, e.g., (3.122)) exist for all optimization problems under consideration.
The initial point $x^{(0)}$ in (5.18) is not an admissible point since the inductance is $\hat{Q}_{L,\omega_0}(x^{(0)}_1,x^{(0)}_2) := 3.98\times10^{-6}\,\mathrm{H}$, which exceeds $L_{\max}$. However, even if one heuristically⁴ selects an admissible initial point, the optimization algorithm still detects the solution $x^{*}$ in (5.19a). Hence, I conclude that, at the level of programs (recall Figure 1.4), the implementation of the COBYLA algorithm possesses a coping mechanism to deal with inadmissible initial points.
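The admissibility statements can be checked directly against the constraints (5.17b) to (5.17e). The sketch below is a minimal illustration: the inductance values are the ones reported in the text and are passed in as plain numbers, since the high-fidelity model itself is not reproduced here. The function names are hypothetical.

```python
# Bounds and inductance limits as stated for (5.17).
X1_MIN, X1_MAX = 1e-3, 3e-3       # m
X2_MIN, X2_MAX = 5e-3, 10e-3      # m
L_MIN, L_MAX = 2.5e-6, 3.5e-6     # H

def constraint_values(x1, x2, inductance):
    """Return the constraints in the form g <= 0 for (5.17b)-(5.17e)."""
    return {
        "x1_lower": X1_MIN - x1,
        "x1_upper": x1 - X1_MAX,
        "x2_lower": X2_MIN - x2,
        "x2_upper": x2 - X2_MAX,
        "L_lower": L_MIN - inductance,   # (5.17d)
        "L_upper": inductance - L_MAX,   # (5.17e)
    }

def is_admissible(x1, x2, inductance, tol=0.0):
    """A point is admissible if no constraint value exceeds the tolerance."""
    return all(g <= tol for g in constraint_values(x1, x2, inductance).values())

# Reported inductance at the initial point in (5.18) and at x* in (5.19).
initial_ok = is_admissible(1.1e-3, 9.9e-3, 3.98e-6)
optimum_ok = is_admissible(2.98e-3, 8.86e-3, 2.51e-6)
print(initial_ok, optimum_ok)
```

The initial point violates only the upper inductance bound (5.17e), whereas the reported optimum satisfies all constraints and lies close to the lower inductance bound (5.17d).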
Numerical experiments and theoretical considerations indicate that the optimal solution $x^{*}$ in (5.19a) is plausible. For instance, if one removes the constraints regarding the inductance in (5.17), more precisely, if one removes (5.17d) and (5.17e), then the computed optimal solution is located at $(x_{1,\max}, x_{2,\min})$. Recalling Figure 5.1, one can put it in 3D terms: The largest possible radius of the cross-section areas of the $N$ toroids and the smallest possible radius of the $N$ toroids themselves constitute the computed optimal solution.
The computed optimal solution $(x_{1,\max}, x_{2,\min})$, though, corresponds to the lowest possible inductance regarding the constraint in (5.17b) and the constraint in (5.17c). Hence, I deem it reasonable to assume that the theoretical optimal solution of (5.17) is at least a member of the level set $\mathcal{L}_{L_{\min}}(\hat{Q}_{L,\omega_0})$ that can be defined by set-builder notation as
$$\mathcal{L}_{L_{\min}}(\hat{Q}_{L,\omega_0}) := \{(x_1,x_2) \in \mathbb{R}_+ \times \mathbb{R}_+ \mid \hat{Q}_{L,\omega_0}(x_1,x_2) - L_{\min} = 0\}. \tag{5.21}$$
Furthermore, comparing the solution $(x_{1,\max}, x_{2,\min})$ with the solution $x^{*}$ in (5.19a), I prudently infer that, concerning $\hat{j}_{P_L,\omega_0}(x_1,x_2)$ in (5.17), the parameter $x_1$ probably possesses a higher relevance than the parameter $x_2$. This inference seems reasonable from a physical viewpoint: If a conceptual approximation is made in the sense that one imagines that the $N$ toroids corresponding to Figure 5.1 are replaced by $N$ cylinders of the length $x_2$ and the radius $x_1$ of the cross-section area, then one can state, roughly speaking, that $\hat{j}_{P_L,\omega_0}(x_1,x_2)$ is proportional to $x_2$ and proportional to $1/x_1^2$. These proportional relations furnish us with some indications regarding the order of relevance of the parameters $x_1$ and $x_2$ w.r.t. $\hat{j}_{P_L,\omega_0}(x_1,x_2)$ in (5.17).
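To make the order of relevance tangible, the following sketch evaluates the rough proxy $x_2/x_1^2$ over the box defined by (5.17b) and (5.17c). The proxy carries only the stated proportionalities; constants such as the number of turns, the conductivity, and the current are deliberately left out, so the values are in arbitrary units. The function name is hypothetical.

```python
from itertools import product

X1_MIN, X1_MAX = 1e-3, 3e-3     # m, bounds from (5.17b)
X2_MIN, X2_MAX = 5e-3, 10e-3    # m, bounds from (5.17c)

def loss_proxy(x1, x2):
    """Cylinder-based proxy: proportional to x2 and to 1 / x1**2 (arbitrary units)."""
    return x2 / x1**2

# Spread of the proxy when only one parameter sweeps its admissible interval.
spread_x1 = loss_proxy(X1_MIN, X2_MIN) / loss_proxy(X1_MAX, X2_MIN)
spread_x2 = loss_proxy(X1_MAX, X2_MAX) / loss_proxy(X1_MAX, X2_MIN)

# The proxy is monotone in each parameter, so its box minimum sits at a corner.
corners = list(product((X1_MIN, X1_MAX), (X2_MIN, X2_MAX)))
best_corner = min(corners, key=lambda c: loss_proxy(*c))
print(spread_x1, spread_x2, best_corner)
```

Under this proxy, sweeping $x_1$ changes the loss by a factor of about nine and sweeping $x_2$ by a factor of about two, and the unconstrained minimum lies at $(x_{1,\max}, x_{2,\min})$, in line with the observations above.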
For a surrogate-based optimization, let us use a Sobol quasi-random sequence sampling plan with $m := 21$ and the data-fit low-fidelity models in § 3.2.1. In Appendix B.2, I present a visualization of the evaluated data-fit low-fidelity models regarding (5.12), (5.14c), and (5.15). In Appendix B.2, potential scaling issues are taken into account (recall § 3.2.1) by setting up the appropriate units of measure for the corresponding physical dimensions.
Recalling § 3.2.1, one can most likely rule out that the corresponding unknown evaluated high-fidelity models behave like the Ackley function or the Michalewicz function. It is probable that these models behave like one of the other functions (i.e., the Unit sphere function, the Booth function, the Rosenbrock function or the Modified Branin function) in its outer regions.

⁴ A strategy to determine a feasible point regarding the optimization problem in (5.17) is, e.g., to solve an optimization problem where the evaluated objective function is defined as $\|\hat{Q}_{L,\omega_0}(x_1,x_2) - L_{\min}\|_{l_2}^{2}$ or as $\|\hat{Q}_{L,\omega_0}(x_1,x_2) - L_{\max}\|_{l_2}^{2}$ and the constraints are defined by (5.17b) and (5.17c).
Concerning the evaluated data-fit low-fidelity models in Appendix B.2, observe that, in Table 5.2, the normalized global first-order sensitivity measures $S^N_1$ and $S^N_2$ (cf. (2.51)) and, given $m_j \equiv 50$ and $m_{j-1} \equiv 21$, the low-fidelity models' normalized global first-order sensitivity measures (LFSM) error $e_{m_j}(S^N_{\tilde{\hat{y}},i})$ (cf. (3.37)) are computed. I emphasize the data relevant to the exemplification regarding surrogate optimization by means of coloring, i.e., $\tilde{\hat{j}}_{P_L,\omega_0}$ and $\tilde{\hat{Q}}_{L,\omega_0}$. Mind that, as mentioned above, the higher relevance of the parameter $x_1$ compared to the parameter $x_2$ is plausible from a physical viewpoint.
TABLE 5.2: (Ia) $S^N_i$ with $i \in \{1,2\}$ evaluated at (a) $f \equiv \tilde{\hat{j}}_{P_L,\omega_0}$ and (b) $f \equiv \tilde{V}_{ut}$ w.r.t. Figure B.6; (Ib) given $m_j := 50$, LFSM error $e_{m_j}(S^N_{\tilde{\hat{y}},i})$ w.r.t. (Ia); (IIa) $S^N_i$ with $i \in \{1,2\}$ evaluated at (a) $f \equiv \tilde{\hat{j}}_{P_L,V_{ut},\omega_0}$ and (b) $f \equiv \tilde{\hat{Q}}_{L,\omega_0}$ w.r.t. Figure B.7; (IIb) given $m_j := 50$, LFSM error $e_{m_j}(S^N_{\tilde{\hat{y}},i})$ w.r.t. (IIa).

(Ia)
                                        (1a)      (2a)      (3a)      (1b)      (2b)      (3b)
$S^N_1(f)$                              0.6751    0.6854    0.6860    0.8930    0.8927    0.8631
$S^N_2(f)$                              0.3249    0.3146    0.3140    0.1070    0.1073    0.1369
$\sum_{i=1}^{2} S_i(f)$                 1.0000    1.0000    1.0000    1.0000    1.0000    1.0000

(Ib)
                                        (1a)      (2a)      (3a)      (1b)      (2b)      (3b)
$e_{m_j}(S^N_{\tilde{\hat{y}},1})$      −0.0066   −0.0245   −0.0551   −0.0009   −0.0029   +0.0014
$e_{m_j}(S^N_{\tilde{\hat{y}},2})$      +0.0134   +0.0495   +0.1023   +0.0074   +0.0237   −0.0088

(IIa)
                                        (1a)      (2a)      (3a)      (1b)      (2b)      (3b)
$S^N_1(f)$                              0.9720    0.9700    0.9704    0.6074    0.6166    0.6223
$S^N_2(f)$                              0.0280    0.0300    0.0296    0.3926    0.3834    0.3777
$\sum_{i=1}^{2} S_i(f)$                 1.0000    1.0000    1.0000    1.0000    1.0000    1.0000

(IIb)
                                        (1a)      (2a)      (3a)      (1b)      (2b)      (3b)
$e_{m_j}(S^N_{\tilde{\hat{y}},1})$      −0.0016   −0.0010   −0.0203   −0.0018   −0.0123   −0.0114
$e_{m_j}(S^N_{\tilde{\hat{y}},2})$      +0.0541   +0.0323   +0.3947   +0.0028   +0.0192   +0.0182
Notice well that, due to the examination in § 3.2.1, if one can most probably rule out that the unknown evaluated high-fidelity models are members of the same class as the Ackley function or the Michalewicz function, then it is probably reasonable to expect that the corresponding low-fidelity models' sensitivity measures converge relatively fast w.r.t. the number of training sampling points, especially in the case of the kriging low-fidelity model.
Hence, in practical applications such as, e.g., the rapid prototyping part of a product development cycle, Table 5.2 can serve as an approximate proxy for other indicators (recall the conjecture in § 3.1.1) and as an overview of the relevance of the parameters.
In the subsequent considerations, let us invoke the kriging low-fidelity model. If we apply the COBYLA algorithm to the corresponding low-fidelity optimization problem with the initial point $\tilde{x}^{(0)} := (1.1\times10^{-3}\,\mathrm{m},\ 9.9\times10^{-3}\,\mathrm{m})$, then one can compute the optimal solution $\tilde{x}^{*}$ that reads as
$$\tilde{x}^{*} := (3.00\times10^{-3}\,\mathrm{m},\ 8.87\times10^{-3}\,\mathrm{m}). \tag{5.22}$$
And if we apply the COBYLA algorithm to the problem in (5.17) with the maximum number of high-fidelity function evaluations $\tilde{m}_{DSO,\max} := 2$ and the initial point $x^{(0)} := \tilde{x}^{*}$, one obtains the surrogate-based optimization's log data in an abridged version, i.e.,
$$m_{SBO} \equiv 23, \qquad x^{*} := (3.00\times10^{-3}\,\mathrm{m},\ 8.87\times10^{-3}\,\mathrm{m}), \tag{5.23a}$$
$$\hat{j}_{P_L,\omega_0}(x_1^{*},x_2^{*}) \equiv 96.26\times10^{-3}\,\mathrm{W}, \qquad \hat{Q}_{L,\omega_0}(x_1^{*},x_2^{*}) := 2.50\times10^{-6}\,\mathrm{H}, \tag{5.23b}$$
where $m_{SBO} := m + \tilde{m}_{DSO,\max}$. The low-fidelity models regarding the entities in (5.17) are constructed in parallel. If one constructs the low-fidelity models sequentially, then $m_{SBO}$ in (5.23) is determined by $m_{SBO} := 2m + \tilde{m}_{DSO,\max}$.
For additional investigations, one can apply the procedure SBO-DFLF in the neighborhood of the optimal solution in (5.23).
Recalling § 4.4.1, one can construct simplified-physics low-fidelity models by adapting, e.g., $T^{2D}_{h_1}$ or ${}^{1}\Delta^{2D}_{Ax=b}$. Moreover, one can associate $R_1$ in § 4.4.1, e.g., with a physics-oriented approximation formula of $\hat{j}_{P_L,\omega_0}(x_1,x_2)$ and $\hat{Q}_{L,\omega_0}(x_1,x_2)$ such as $N (2\pi x_2)(\pi x_1^2)^{-1} \sigma_{Cu}^{-1} I_{rms}^{2}$ for the term $\hat{j}_{P_L,\omega_0}(x_1,x_2)$ and $\mu N^2 (\pi x_2^2)\, h_{\Omega^{2D}_{c,2}}(x_1)^{-1}$ for the term $\hat{Q}_{L,\omega_0}(x_1,x_2)$. However, in the remaining, let us focus on $T^{2D}_{h_1}$ and ${}^{1}\Delta^{2D}_{Ax=b}$.
The entities $T^{2D}_{h_1}$ and $T^{2D}_{h_2}$ are associated with $\approx 1\times10^{5}$ elements and $\approx 1\times10^{3}$ elements, respectively (recall § 2.2.2), i.e., $T^{2D}_{h_1} \mapsto\ \approx 1\times10^{5}$ elements, and $T^{2D}_{h_2} \mapsto\ \approx 1\times10^{3}$ elements. Further, it is set that ${}^{1}\Delta^{2D}_{Ax=b} := 1\times10^{-16}$ and ${}^{2}\Delta^{2D}_{Ax=b} := 1\times10^{-8}$.
In Table 5.3, given $m := 21$, I present the corresponding mean SSPCC $r^2_{\hat{y}\tilde{\hat{y}}}\big|_{k:=5}$ within the k-fold cross validation method w.r.t. the simplified-physics low-fidelity model regarding $\hat{j}_{P_L,\omega_0}$ and $\hat{Q}_{L,\omega_0}$ in (5.17). The reference setting is defined by the combination $(T^{2D}_{h_1}, {}^{2}\Delta^{2D}_{Ax=b})$.⁵
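A mean correlation measure over $k := 5$ folds with $m := 21$ sample points can be sketched as follows. The sketch assumes that SSPCC denotes the squared sample Pearson correlation coefficient and that each fold simply compares reference outputs with low-fidelity outputs on its own points; the precise fold handling used for Table 5.3 is not reproduced here. All data and function names are hypothetical.

```python
import math

def squared_pearson(y, y_tilde):
    """Squared sample Pearson correlation between two equally long sequences."""
    n = len(y)
    mean_y = sum(y) / n
    mean_t = sum(y_tilde) / n
    cov = sum((a - mean_y) * (b - mean_t) for a, b in zip(y, y_tilde))
    var_y = sum((a - mean_y) ** 2 for a in y)
    var_t = sum((b - mean_t) ** 2 for b in y_tilde)
    return cov * cov / (var_y * var_t)

def mean_kfold_r2(y, y_tilde, k=5):
    """Average the squared correlation over k interleaved folds."""
    folds = [list(range(i, len(y), k)) for i in range(k)]
    scores = [
        squared_pearson([y[i] for i in fold], [y_tilde[i] for i in fold])
        for fold in folds
    ]
    return sum(scores) / k

# Hypothetical outputs at m = 21 sample points (not data from this work).
y_ref = [0.1 + 0.01 * i * i for i in range(21)]
y_scaled = [1.05 * v for v in y_ref]                       # perfectly correlated
y_perturbed = [v + 0.05 * math.sin(3.0 * i) for i, v in enumerate(y_ref)]

r2_scaled = mean_kfold_r2(y_ref, y_scaled)
r2_perturbed = mean_kfold_r2(y_ref, y_perturbed)
print(r2_scaled, r2_perturbed)
```

A purely rescaled model yields a mean value of one, whereas a perturbed model falls slightly below one; this is the kind of gap that the entries of Table 5.3 quantify for coarser meshes.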
From Table 5.3, I infer that one can probably invoke a useful simplified-physics low-fidelity model by the combination $(T^{2D}_{h_2}, {}^{2}\Delta^{2D}_{Ax=b})$. Hence, one can proceed analogously to the procedure regarding the log data in (5.23). For additional investigations, one can apply the procedure SBO-SPLF in the neighborhood of the corresponding optimal solution.
If the solving of the optimization problem regarding the selected simplified-physics low-fidelity model is perceived as being slow, then one can additionally construct a data-fit low-fidelity model w.r.t. the selected simplified-physics low-fidelity model (see, e.g., procedure SGO-SPLF). Due to the exposition in § 4.4.2, there are also novel construction opportunities for simplified-physics low-fidelity models based on coordinate transformations.
⁵ In [123], the authors investigate among others the relationship between the SSPCC and the degree of the grid discretization in the context of antenna design. However, mind that an in-depth examination of relationships such as, e.g., the relationship between $T^{2D}_{h_1}$ and $r^2_{\hat{y}\tilde{\hat{y}}}\big|_{k:=5}$ in the light of re-meshing (see (5.9)), is out of the scope of the present work.
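TABLE 5.3: Given $T^{2D}_{h_1} \mapsto\ \approx 1\times10^{5}$ elements, $T^{2D}_{h_2} \mapsto\ \approx 1\times10^{3}$ elements, ${}^{1}\Delta^{2D}_{Ax=b} := 1\times10^{-16}$, ${}^{2}\Delta^{2D}_{Ax=b} := 1\times10^{-8}$, and $m := 21$, the mean SSPCC $r^2_{\hat{y}\tilde{\hat{y}}}\big|_{k:=5}$ within the k-fold cross validation method w.r.t. the simplified-physics low-fidelity model regarding $\hat{j}_{P_L,\omega_0}$ and $\hat{Q}_{L,\omega_0}$ in (5.17). The reference setting is indicated by a gray box.

(A) The mean SSPCC $r^2_{\hat{y}\tilde{\hat{y}}}\big|_{k:=5}$ regarding $\hat{j}_{P_L,\omega_0}$.
                                 $T^{2D}_{h_1}$    $T^{2D}_{h_2}$
${}^{1}\Delta^{2D}_{Ax=b}$       1.00              0.90
${}^{2}\Delta^{2D}_{Ax=b}$       1.00              0.93

(B) The mean SSPCC $r^2_{\hat{y}\tilde{\hat{y}}}\big|_{k:=5}$ regarding $\hat{Q}_{L,\omega_0}$.
                                 $T^{2D}_{h_1}$    $T^{2D}_{h_2}$
${}^{1}\Delta^{2D}_{Ax=b}$       1.00              0.99
${}^{2}\Delta^{2D}_{Ax=b}$       1.00              0.99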
Subsequently, let us recall explicitly the context that we are in the midst of a
rapid prototyping part of a product development cycle. Thus, it is supposed that
the concern is mainly to deploy a surrogate-guided optimization in the context of
validation and verification of given results of a surrogate-based optimization (re-
call § 3.3). Then, it is economical and sustainable to re-use as much of the data of a
surrogate-based optimization as possible.
More tangibly, for a co-kriging optimization (recall § 3.3.3), let us re-use the sample of size $m = 21$ associated with the data regarding $\tilde{\hat{j}}_{P_L,\omega_0}$ and $\tilde{\hat{Q}}_{L,\omega_0}$ w.r.t. the surrogate-based optimization's log data in (5.23). That is, we use a proper subsample of size $m_K = 15$ as the data of a high-fidelity model and we use a subsample of size $m_{\tilde{K}} = 21$ as the data of a low-fidelity model whose output points are defined as
$$\tilde{y} := \rho_{ck}\, y_{SBO}, \tag{5.24}$$
where $y_{SBO} \in \mathbb{R}^{m_{\tilde{K}} \times 1}$ refers to the column vector representing the output points of the sample w.r.t. (5.23) and $\rho_{ck} \in \mathbb{R}$ refers to a scaling parameter.
The two main reasons for the modeling choice in (5.24) are: (1) It is a correctness check at the level of programs (recall Figure 1.4) that is devised based on the elaborations in [70, p. 173 – 176]. More precisely, one can expect that the estimate $\hat{\rho}$ in (3.183) is computed as
$$\hat{\rho} := \frac{1}{\rho_{ck}}. \tag{5.25}$$
(2) Due to the examinations in § 3.2.2, one can expect that, e.g., the SSPCC $r^2_{\hat{y}\tilde{\hat{y}},\tilde{K}_1}$ in (3.193) is close to one such that one can partly emulate the conditions concerning the situation in (3.193).
Notice that it is set that $\rho_{ck} := 1.05$. Due to numerical issues (see the commentary on (3.183)), let us round the corresponding estimate $\hat{\rho}$ to two decimal places, thus, let us set $\hat{\rho} := 0.95$ in (3.184).
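The expectation in (5.25) and the rounded value can be reproduced with a small sketch. It assumes a plain least-squares fit of a single scaling factor as a stand-in for the estimator in (3.183), which is not reproduced here; the output values in `y_sbo` are hypothetical.

```python
def least_squares_scale(y_high, y_low):
    """Scalar rho that minimizes sum((y_high - rho * y_low)**2)."""
    numerator = sum(a * b for a, b in zip(y_high, y_low))
    denominator = sum(b * b for b in y_low)
    return numerator / denominator

RHO_CK = 1.05                                      # scaling parameter from the text
y_sbo = [0.0963, 0.1421, 0.2210, 0.3178, 0.4059]   # hypothetical output points
y_low = [RHO_CK * v for v in y_sbo]                # low-fidelity outputs as in (5.24)

rho_hat = least_squares_scale(y_sbo, y_low)        # expected to equal 1 / rho_ck
rho_hat_rounded = round(rho_hat, 2)
print(rho_hat, rho_hat_rounded)
```

Because the low-fidelity outputs are an exact multiple of the high-fidelity outputs, any reasonable scale estimator should recover $1/\rho_{ck} \approx 0.952$, which rounds to the value $0.95$ used in the text.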
The objective function in (5.17a) is adapted in order to be consistent with the formulation in (3.192) in the sense that a desired value $\hat{j}_{P_L,\omega_0,d} \in \mathbb{R}_+$ (cf. (2.33)) is provided such that one can instantiate $\hat{j}$ as
$$\hat{j} = (x_1,x_2) \mapsto \|\hat{j}_{P_L,\omega_0}(x_1,x_2) - \hat{j}_{P_L,\omega_0,d}\|_{l_2}^{2} : X_0 \to \mathbb{R}_+, \tag{5.26}$$
where $\hat{j}_{P_L,\omega_0,d} := 0.0\times10^{-3}\,\mathrm{W}$ and $\hat{j}(x_1,x_2) = \hat{\hat{j}}(\hat{y}(x_1,x_2))$ (cf. (2.40)) in (3.192). Due to the choice of the desired value $\hat{j}_{P_L,\omega_0,d}$, the optimal solution associated with the objective function in (5.17a) and the optimal solution associated with the adapted objective function in (5.26) are essentially the same.
Invoking the initial point in (5.18), we obtain the co-kriging optimization's log data in an abridged version, i.e.,
$$m_{SGO,ck} \equiv 15, \qquad x^{*} := (3.00\times10^{-3}\,\mathrm{m},\ 8.86\times10^{-3}\,\mathrm{m}), \tag{5.27a}$$
$$\hat{j}_{P_L,\omega_0}(x_1^{*},x_2^{*}) \equiv 96.53\times10^{-3}\,\mathrm{W}, \qquad \hat{Q}_{L,\omega_0}(x_1^{*},x_2^{*}) := 2.50\times10^{-6}\,\mathrm{H}, \tag{5.27b}$$
where $m_{SGO,ck} := m_K$. Mind that the counting method regarding $m_{SGO,ck}$ mimics the situation where one possesses a low-fidelity model whose SSPCC $r^2_{\hat{y}\tilde{\hat{y}},\tilde{K}_1}$ is close to one and which is not equal to the high-fidelity model.
Observe that the relative deviation in a suitable norm between the optimal solution in (5.23) and the optimal solution in (5.27) is well below one percent.
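The size of this deviation can be recomputed from the reported log data. The sketch assumes the Euclidean norm as one possible "suitable norm" and uses only the rounded values printed in (5.23) and (5.27); the function name is hypothetical.

```python
import math

def relative_deviation(x_ref, x_other):
    """Euclidean distance between two points relative to the norm of the reference."""
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_ref, x_other)))
    return diff / math.sqrt(sum(a * a for a in x_ref))

x_sbo = (3.00e-3, 8.87e-3)    # optimal solution in (5.23a), in m
x_ck = (3.00e-3, 8.86e-3)     # optimal solution in (5.27a), in m

dev_x = relative_deviation(x_sbo, x_ck)

# Relative deviation of the reported objective values, (5.23b) versus (5.27b).
dev_j = abs(96.53e-3 - 96.26e-3) / 96.26e-3
print(dev_x, dev_j)
```

With these rounded values, the deviation of the optimal solutions is on the order of a tenth of a percent, and the deviation of the objective values stays below three tenths of a percent.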
Recalling § 3.3.2, the TRASM algorithm 3.1 is applied by using the co-kriging low-fidelity models of $\tilde{\hat{j}}_{P_L,\omega_0}$ and $\tilde{\hat{Q}}_{L,\omega_0}$ corresponding to (5.27). Let us invoke the initial point in (5.18) and set the remaining input entities of the TRASM algorithm 3.1 as
$$B^{(0)} := I, \qquad \Delta^{(0)} := 10, \tag{5.28a}$$
$$(\eta_1,\eta_2,\gamma,\zeta) \equiv (1.0\times10^{-5},\ 1.0\times10^{-1},\ 0.25,\ 2), \qquad (k_{\max}, e_{abs}, e_{rel}) := (2,\ 1.0\times10^{-3},\ 1.0\times10^{-4}), \tag{5.28b}$$
and we define $F^{(0)}_0$ by using (3.149) and by adapting (5.17b)–(5.17e) similarly to (3.154).
Hence, one obtains the TRASM algorithm's log data in an abridged version, i.e.,
$$m_{SGO,sm} \equiv 19, \qquad x^{*} := (3.00\times10^{-3}\,\mathrm{m},\ 8.86\times10^{-3}\,\mathrm{m}), \tag{5.29a}$$
$$\hat{j}_{P_L,\omega_0}(x_1^{*},x_2^{*}) \equiv 96.53\times10^{-3}\,\mathrm{W}, \qquad \hat{Q}_{L,\omega_0}(x_1^{*},x_2^{*}) := 2.50\times10^{-6}\,\mathrm{H}, \tag{5.29b}$$
where $m_{SGO,sm} := m_{SGO,ck} + 2k$ with $k \equiv 2$ within the TRASM algorithm 3.1. Let us evaluate the optimal solution $x^{*}$ in (5.29) by using the corresponding co-kriging low-fidelity models.
Similarly to (5.27), observe that the relative deviation in a suitable norm between the optimal solution in (5.23) and the optimal solution in (5.29) is well below one percent.
The entities in (5.28) are partly inspired by choices in [95], but, in general, these entities are chiefly heuristically determined. Therefore, in practical applications, there might be a need for a preprocessing step in which useful values for the entities in (5.28) are determined. Additionally, in practical applications, there might be a need for an adaptive restart strategy in order to prevent a potential low experimental rate of convergence due to, for instance, too small step sizes $h^{(k)}$ within the TRASM algorithm 3.1.
In summary, we have developed a particularly relevant application-driven workflow concerning the relation in (3.106). More precisely, we have carved out a use case such that
$$\exists\, m_{DSO}, m_{SBO}, m_{SGO,sm}, m_{SGO,ck} \in \mathbb{N}\setminus\{0\}.\ m_{DSO} > m_{SBO} > m_{SGO,sm} > m_{SGO,ck} \tag{5.30}$$
holds. To make the statement in (5.30) more tangible, let us assume that the computational time concerning a high-fidelity model evaluation is approximately 5 min. Then, $m_{DSO}$ maps to 150 min, $m_{SBO}$ maps to 115 min, $m_{SGO,sm}$ maps to 95 min, and $m_{SGO,ck}$ maps to 75 min. Thus, we have carved out a use case in which the computational time associated with a high-fidelity optimization problem can be reduced by half. And with some additional effort, the corresponding optimal solution can be validated and verified, too.
Mind, though, that the numbers of the statement in (5.30) should be treated with caution in the light of their potential for generalization such as in (3.106). Furthermore, recalling Figure 1.4, the choice of some entities such as, e.g., the problem-dependent entities in (5.28), puts a few limits of comparability at the level of programs. For the issue of comparability at the level of functions, see § 4.3.
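The bookkeeping behind (5.30) and the quoted run times can be written down in a few lines. The sketch uses only the counts reported in (5.19), (5.23), (5.27), and (5.29) together with the assumed 5 min per high-fidelity evaluation; the variable names are illustrative.

```python
MINUTES_PER_EVALUATION = 5      # assumed cost of one high-fidelity evaluation

m = 21                          # sample size of the sampling plan
m_dso = 30                      # (5.19): direct optimization
m_sbo = m + 2                   # (5.23): m plus two refinement evaluations
m_sgo_ck = 15                   # (5.27): size of the high-fidelity subsample
m_sgo_sm = m_sgo_ck + 2 * 2     # (5.29): m_SGO,ck + 2k with k = 2

counts = {"DSO": m_dso, "SBO": m_sbo, "SGO,sm": m_sgo_sm, "SGO,ck": m_sgo_ck}
minutes = {name: n * MINUTES_PER_EVALUATION for name, n in counts.items()}

# Check the strict ordering stated in (5.30).
ordering_holds = m_dso > m_sbo > m_sgo_sm > m_sgo_ck
saving = 1.0 - minutes["SGO,ck"] / minutes["DSO"]
print(minutes, ordering_holds, saving)
```

The saving of one half refers to the comparison between the direct optimization and the co-kriging optimization; the other two workflows lie in between.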
5.1.3 Optimization problem II
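Note that if we extend the optimization problem in (5.17) in such a way that we consider $m_w \in \mathbb{N}$ operating frequencies, then one can heuristically formulate an indexed family of high-fidelity optimization problems $O$ whose assignment rule reads as, e.g.,
$$O = i \mapsto\quad \min.\quad \hat{j}_{P_L,\omega_i}(x_1,x_2) \tag{5.31a}$$
$$\text{s.t.}\quad x_{1,\min} \le x_1 \le x_{1,\max}, \tag{5.31b}$$
$$x_{2,\min} \le x_2 \le x_{2,\max}, \tag{5.31c}$$
$$L_{\min} - \hat{Q}_{L,\omega_i}(x_1,x_2) \le 0, \tag{5.31d}$$
$$\hat{Q}_{L,\omega_i}(x_1,x_2) - L_{\max} \le 0, \tag{5.31e}$$
where $i \in I$ with $I := \{1,\ldots,m_w\}$, $\omega_i \in W_I$ with $W_I := \{\omega_1,\ldots,\omega_{m_w}\}$, $x_{1,\min} := 1\times10^{-3}\,\mathrm{m}$, $x_{1,\max} := 3\times10^{-3}\,\mathrm{m}$, $x_{2,\min} := 5\times10^{-3}\,\mathrm{m}$, $x_{2,\max} := 10\times10^{-3}\,\mathrm{m}$, $L_{\min} := 2.5\times10^{-6}\,\mathrm{H}$, and $L_{\max} := 3.5\times10^{-6}\,\mathrm{H}$.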
A possible aim concerning (5.31) can be to determine the optimal solution $(\omega_{i^{*}}, x^{*})$ regarding all individual high-fidelity optimization problems $O(i)$ in (5.31) for which
$$\forall (\omega_i, x) \in W_I \times X.\ \hat{j}_{P_L,\omega_{i^{*}}}(x^{*}) \le \hat{j}_{P_L,\omega_i}(x) \tag{5.32}$$
holds (cf. (4.18)). If we assume that the entity $\hat{j}_{P_L,\omega_i}(x_1,x_2)$ and the entity $\hat{Q}_{L,\omega_i}(x_1,x_2)$ are not available quickly for all $\omega_i \in W_I$ such that one cannot immediately invoke the multivariate vector-valued use case in (3.131a), then we have to consider each high-fidelity optimization problem $O(i)$ in (5.31) individually.
By applying the above-mentioned application-driven workflow to each $O(i)$ and by supposing that the numbers $m_{DSO}$, $m_{SBO}$, $m_{SGO,sm}$, and $m_{SGO,ck}$ in (5.30) are the same for each $\omega_i$, one can roughly estimate the worst-case computational burden by
$$\exists\, m_{DSO}, m_{SBO}, m_{SGO,sm}, m_{SGO,ck} \in \mathbb{N}\setminus\{0\}.\ m_w m_{DSO} > m_w m_{SBO} > m_w m_{SGO,sm} > m_w m_{SGO,ck}. \tag{5.33}$$
In Table 5.4, I present the log data in an abridged version w.r.t. the setting in (5.19) for the operating frequencies $5\times10^{1}\,\mathrm{Hz}$, $1\times10^{5}\,\mathrm{Hz}$, and $1\times10^{8}\,\mathrm{Hz}$ such that it is defined that $\omega_1 := 2\pi\,50\,\mathrm{Hz}$, $\omega_2 := 2\pi\,100\,\mathrm{kHz}$, and $\omega_3 := 2\pi\,100\,\mathrm{MHz}$ in (5.31).
TABLE 5.4: Given the operating frequencies $5\times10^{1}\,\mathrm{Hz}$, $1\times10^{5}\,\mathrm{Hz}$, and $1\times10^{8}\,\mathrm{Hz}$, the log data in an abridged version w.r.t. the setting of the log data in (5.19).

$f_0$ [Hz]    $m_{DSO}$ [1]    $x^{*}$ [(m, m)]               $\hat{j}_{P_L,\omega_0}(x_1^{*},x_2^{*})$ [W]    $\hat{Q}_{L,\omega_0}(x_1^{*},x_2^{*})$ [H]
5×10^1        30               (2.99×10^-3, 7.23×10^-3)       27.63×10^-5                                      3.27×10^-6
1×10^5        30               (2.98×10^-3, 8.86×10^-3)       96.32×10^-3                                      2.51×10^-6
1×10^8        30               (1.45×10^-3, 9.78×10^-3)       26.24×10^-1                                      3.39×10^-6
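Let us set $\hat{j}_{P_L,\omega_1}(x_1^{*},x_2^{*}) := 27.63\times10^{-5}\,\mathrm{W}$, $\hat{j}_{P_L,\omega_2}(x_1^{*},x_2^{*}) := 96.32\times10^{-3}\,\mathrm{W}$, and $\hat{j}_{P_L,\omega_3}(x_1^{*},x_2^{*}) := 26.24\times10^{-1}\,\mathrm{W}$; then one can define the subsequent ratios regarding $\hat{j}_{P_L,\omega_0}(x_1^{*},x_2^{*})$ in Table 5.4, that is,
$$\frac{\hat{j}_{P_L,\omega_3}(x_1^{*},x_2^{*})}{\hat{j}_{P_L,\omega_1}(x_1^{*},x_2^{*})} := 9496.92, \qquad \frac{\hat{j}_{P_L,\omega_2}(x_1^{*},x_2^{*})}{\hat{j}_{P_L,\omega_1}(x_1^{*},x_2^{*})} := 348.61, \qquad \frac{\hat{j}_{P_L,\omega_3}(x_1^{*},x_2^{*})}{\hat{j}_{P_L,\omega_2}(x_1^{*},x_2^{*})} := 27.25. \tag{5.34}$$
If we invoke Table 5.1 and if we set $e_{P_L}(x_{1,\max},f_1) := 0.651$, $e_{P_L}(x_{1,\max},f_2) := 29.104$, and $e_{P_L}(x_{1,\max},f_3) := 920.353$, then one can define the subsequent ratios regarding $e_{P_L}(x_{1,\max},f_0)$ in Table 5.1, that is,
$$\frac{e_{P_L}(x_{1,\max},f_3)}{e_{P_L}(x_{1,\max},f_1)} := 1413.75, \qquad \frac{e_{P_L}(x_{1,\max},f_2)}{e_{P_L}(x_{1,\max},f_1)} := 44.71, \qquad \frac{e_{P_L}(x_{1,\max},f_3)}{e_{P_L}(x_{1,\max},f_2)} := 31.62. \tag{5.35}$$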
Comparing (5.35) with (5.34), one can observe that the rough heuristic estimate of the scale of the winding losses in (5.13) captures at least partly an overall trend that one can expect from the electromagnetic field theory's realm of magnetoquasistatics. Hence, Table 5.4 furnishes us with a physics-driven plausible optimal solution in the sense of the condition in (5.32).
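The ratios in (5.34) and (5.35) follow from the tabulated values by plain division, which the sketch below reproduces. It uses the rounded entries of Table 5.4 and Table 5.1 (B); the dictionary keys and the helper name are illustrative. Evaluated from these rounded entries, the third loss ratio comes out near 27.24, marginally below the 27.25 quoted in (5.34), presumably a rounding difference.

```python
# Rounded entries taken from Table 5.4 (losses in W) and Table 5.1 (B) (estimate).
loss = {"f1": 27.63e-5, "f2": 96.32e-3, "f3": 26.24e-1}
estimate = {"f1": 0.651, "f2": 29.104, "f3": 920.353}

def ratios(values):
    """Pairwise ratios in the order used in (5.34) and (5.35)."""
    return {
        "f3/f1": values["f3"] / values["f1"],
        "f2/f1": values["f2"] / values["f1"],
        "f3/f2": values["f3"] / values["f2"],
    }

loss_ratios = ratios(loss)
estimate_ratios = ratios(estimate)

for key in loss_ratios:
    print(key, round(loss_ratios[key], 2), round(estimate_ratios[key], 2))
```

The estimate reproduces the order of magnitude of the highest ratio pair $f_3/f_2$ fairly well, whereas it underestimates the ratios that involve the lowest frequency; this matches the wording that the trend is captured "at least partly".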
Notice well that, especially regarding the operating frequency $f_0 := 1\times10^{8}\,\mathrm{Hz}$ in Table 5.4, one can observe that the optimal solution determined by the COBYLA algorithm is sensitive to the choice of the initial point $x^{(0)}$.
Using the initial points $x^{(0)}_1$ and $x^{(0)}_2$ that read as
$$x^{(0)}_1 := (1.1\times10^{-3}\,\mathrm{m},\ 9.9\times10^{-3}\,\mathrm{m}), \qquad x^{(0)}_2 := (3.0\times10^{-3}\,\mathrm{m},\ 9.9\times10^{-3}\,\mathrm{m}), \tag{5.36}$$
I report in Table 5.5 the corresponding log data in an abridged version w.r.t. the setting in (5.19) for the operating frequency $1\times10^{8}\,\mathrm{Hz}$.

TABLE 5.5: Given the operating frequency $1\times10^{8}\,\mathrm{Hz}$ and the initial points in (5.36), the log data in an abridged version w.r.t. the setting of the log data in (5.19).

$x^{(0)}$ [(m, m)]    $m_{DSO}$ [1]    $x^{*}$ [(m, m)]               $\hat{j}_{P_L,\omega_0}(x_1^{*},x_2^{*})$ [W]    $\hat{Q}_{L,\omega_0}(x_1^{*},x_2^{*})$ [H]
$x^{(0)}_1$           30               (1.45×10^-3, 9.78×10^-3)       26.24×10^-1                                      3.39×10^-6
$x^{(0)}_2$           30               (2.79×10^-3, 9.38×10^-3)       23.25×10^-1                                      2.56×10^-6
From Table 5.5 and the elaborations regarding the log data in (5.19) and the particularly relevant application-driven workflow that culminated in the statement in (5.30), I infer that it is likely that the shape of $\hat{j}_{P_L,\omega_3}(x_1,x_2)$ behaves more intricately than the shape of $\hat{j}_{P_L,\omega_2}(x_1,x_2)$ and that the shape of $\hat{Q}_{L,\omega_3}(x_1,x_2)$ behaves essentially the same as the shape of $\hat{Q}_{L,\omega_2}(x_1,x_2)$.
Let us test the inference by constructing corresponding kriging low-fidelity models $\tilde{\hat{j}}_{P_L,\omega_i}$ and $\tilde{\hat{Q}}_{L,\omega_i}$ in (5.31) where $m_w := 3$. It is supposed that the individual high-fidelity models in (5.31) uniformly build upon the combination $(T^{2D}_{h_1}, {}^{2}\Delta^{2D}_{Ax=b})$ in Table 5.3. Furthermore, it is assumed that the corresponding kriging low-fidelity models uniformly build upon a Sobol quasi-random sequence sampling plan with $m := 50$ (recall § 3.1.1).
In Figure 5.2, I depict the evaluated kriging low-fidelity models in contour representation for $\omega_1 := 2\pi\,50\,\mathrm{Hz}$, $\omega_2 := 2\pi\,100\,\mathrm{kHz}$, and $\omega_3 := 2\pi\,100\,\mathrm{MHz}$.
[Figure 5.2: six contour panels (1a), (2a), (3a), (1b), (2b), (3b); horizontal axis $x_1$ [mm], vertical axis $x_2$ [mm].]
FIGURE 5.2: Given the combination $(T^{2D}_{h_1}, {}^{2}\Delta^{2D}_{Ax=b})$ in Table 5.3 and using the Sobol quasi-random sequence sampling plan with $m := 50$ and kriging low-fidelity models with (1) $\omega_1 := 2\pi\,50\,\mathrm{Hz}$, (2) $\omega_2 := 2\pi\,100\,\mathrm{kHz}$, and (3) $\omega_3 := 2\pi\,100\,\mathrm{MHz}$; contour representation of (a) $\tilde{\hat{j}}_{P_L,\omega_i}$ and (b) $\tilde{\hat{Q}}_{L,\omega_i}$ in (5.31) where $m_w := 3$. Dark colors indicate low values; bright colors indicate high values (cf. Figure B.4 and Figure B.5).
Applying the application-driven workflow concerning (5.30) to each $O(i)$ in (5.31) with $m_w := 3$, one can unleash the machinery of surrogate-based optimization and the machinery of surrogate-guided optimization for each $O(i)$, analogous to the strategies regarding (5.17).
Since this unleashing results in one strategy to tackle the optimization problem in (5.31) within the context of surrogate optimization, let us end the present subsection by briefly discussing some conceivable variations of this strategy based on the examination of Figure 5.2.
Examining Figure 5.2, I conclude that the observable behavior regarding $\tilde{\hat{j}}_{P_L,\omega_i}$ and $\tilde{\hat{Q}}_{L,\omega_i}$ for $\omega_1 := 2\pi\,50\,\mathrm{Hz}$ passes a physics-driven plausibility check. More precisely, due to the rough heuristic estimate of the scale of the winding losses in (5.13), one can expect that the observable behavior regarding $\tilde{\hat{j}}_{P_L,\omega_i}$ and $\tilde{\hat{Q}}_{L,\omega_i}$ for $\omega_1$ is approximately explainable by the electromagnetic field theory's realm of magnetostatics, besides the realm of magnetoquasistatics.
The similar behavior of $\tilde{\hat{Q}}_{L,\omega_i}$ for $\omega_2 := 2\pi\,100\,\mathrm{kHz}$ and $\omega_3 := 2\pi\,100\,\mathrm{MHz}$ is physically deducible from the setup of the underlying boundary value problem (see the commentary on Figure 5.1). Given the material characteristics concerning Figure 5.1, it is to be expected that, after a certain threshold frequency, the electromagnetic field is completely shielded from conducting domains. Hence, it is to be expected that $\hat{Q}_{L}(\omega,x_1,x_2)$ in (5.10b) changes more for lower values of the frequency than for higher values of the frequency.
Recalling (5.35) and (5.34), I argue that the essentially different behavior of $\tilde{\hat{j}}_{P_L,\omega_i}$ for $\omega_2 := 2\pi\,100\,\mathrm{kHz}$ and $\omega_3 := 2\pi\,100\,\mathrm{MHz}$ is explicable by the degree of the action of the skin effect and the proximity effect.
However, a legitimate objection from a numerical analysis viewpoint is whether it is justified to assume that the high-fidelity models in (5.31) uniformly build upon the combination $(T^{2D}_{h_1}, {}^{2}\Delta^{2D}_{Ax=b})$ in Table 5.3. Hidden behind this assumption is the assumption that the combination $(T^{2D}_{h_1}, {}^{2}\Delta^{2D}_{Ax=b})$ provides a sufficient resolution of the action of the skin effect and the proximity effect for all frequencies under consideration.
From an application-driven viewpoint, the chosen combination $(T^{2D}_{h_1}, {}^{2}\Delta^{2D}_{Ax=b})$ that is based upon the numerical investigations concerning a median w.r.t. the ordered set $W_I$ in (5.31) can be interpreted as a compromise between accuracy and speed. In the case of Figure 5.2, it is set that $W_I := \{\omega_1,\omega_2,\omega_3\}$ such that the median w.r.t. the ordered set $W_I$ is $\omega_2 := 2\pi\,100\,\mathrm{kHz}$.
On an alternative path, one could base the combination $(T^{2D}_{h_1}, {}^{2}\Delta^{2D}_{Ax=b})$ upon the numerical investigations concerning the highest frequency w.r.t. the ordered set $W_I$ in (5.31). This choice puts an emphasis on accuracy rather than on speed since for lower frequencies the same setup is utilized as for the highest frequency.
Another potential path is to adaptively choose the combination $(T^{2D}_{h_1}, {}^{2}\Delta^{2D}_{Ax=b})$ for each frequency w.r.t. the ordered set $W_I$ in (5.31).
Recalling § 4.4.1, one can conceptualize the above-mentioned paths concerning the combination $(T^{2D}_{h_1}, {}^{2}\Delta^{2D}_{Ax=b})$ by employing, for instance, a diagrammatic notation such as in (4.47) for each frequency w.r.t. the ordered set $W_I$ in (5.31).
Observing Figure 5.2 and recalling § 3.2.1, it is possible to choose a different sample size $m$ for each frequency w.r.t. the ordered set $W_I$ in (5.31) as well. Notice well that such a choice has an impact on the estimate in (5.33).
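How a frequency-dependent sample size feeds into the estimate in (5.33) can be sketched for the surrogate-based part of the workflow. The sketch assumes that each problem $O(i)$ costs its own sample size plus two refinement evaluations, as in (5.23); the per-frequency sample sizes in the second call are hypothetical and only serve as an illustration.

```python
def sbo_budget(sample_sizes, refinement_evals=2):
    """Total high-fidelity evaluations when each O(i) is treated individually."""
    return sum(m_i + refinement_evals for m_i in sample_sizes)

# Uniform choice: m = 21 for all three frequencies, i.e. m_w * m_SBO with m_SBO = 23.
uniform_budget = sbo_budget([21, 21, 21])

# Hypothetical adaptive choice: fewer points at 50 Hz, more points at 100 MHz.
adaptive_budget = sbo_budget([15, 21, 50])
print(uniform_budget, adaptive_budget)
```

With a uniform sample size the total reduces to $m_w\, m_{SBO}$, whereas an adaptive choice replaces the product by a sum over the individual budgets and can shift the ordering of the terms in (5.33).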
5.2 Common-Mode Choke
5.2.1 Preliminary consideration
The device under test is a common-mode choke (CMC) as another representative of
the class of inductive components. The exposition is similar to the notational and
methodological exposition in § 5.1. Thus, the focus is primarily on adding other
aspects to the discussion regarding surrogate optimization with the magnetoqua-
sistatic model.
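[Figure 5.3: four panels (i)–(iv); panel labels include the subdomains $\Omega^{2D}_{nc,1}$, $\Omega^{2D}_{nc,2}$, $\Omega^{2D}_{nc,3}$, the boundary $\partial\Omega^{2D}$, and the conducting subdomains $\Omega^{2D}_{c,1b_1}$, $\Omega^{2D}_{c,2b_1}$, $\Omega^{2D}_{c,1a_N}$, $\Omega^{2D}_{c,2a_N}$.]
FIGURE 5.3: (i) A schematic illustration of a common-mode choke with a longitudinal symmetric domain. (ii) A simplicial triangulation $T_h$ of the space region $\Omega^{2D}$ via FEMM4.2. (iii) Magnetic field lines of a common-mode choke due to common-mode (CM) currents. (iv) Magnetic field lines of a common-mode choke due to differential-mode (DM) currents.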
For a three-dimensional representation of a common-mode choke, see, e.g., Fig-
ure 1.1. In (i) of Figure 5.3, there is a schematic illustration of a two-dimensional
5.2. Common-Mode Choke 183
representation of a common-mode choke with a longitudinal symmetric domain.6
The subdomains Ω2D
nc,1 and Ω2D
nc,3 are behaviorally similar to the subdomain Ω2D
nc,1
in Figure 5.1; and the subdomain Ω2D
nc,2 is behaviorally similar to the subdomain Ω2D
nc,2
in Figure 5.1. Furthermore, the conducting subdomains – indexed by c– are be-
haviorally similar to the conducting subdomains in Figure 5.1. A notable differ-
ence in Figure 5.3 is the existence of a primary winding (that is, pairs of subdo-
mains (Ω2D
c,1ai,Ω2D
c,1bi)with i∈{1,. .., N}) and a secondary winding (i.e., pairs of sub-
domains (Ω2D
c,2ai,Ω2D
c,2bi)with i∈{1,. .., N}).
In (iii) of Figure 5.3, the magnetic field lines of a common-mode choke due to
common-mode (CM) currents are shown. By abuse of notation regarding (5.6), let
us indicate the two possible orthogonal directions of the current density w.r.t. the
two-dimensional domain Ω2d
nc,1 as I+
0and I−
0. Hence, let us encode CM currents as
the ordered pair (I+
0,I−
0)for each ordered pair (Ω2d
c,1ai,Ω2d
c,1bi)and (Ω2d
c,2ai,Ω2d
c,2bi).
In (iv) of Figure 5.3, the magnetic field lines of a common-mode choke due to
differential-mode (DM) currents are shown. Thus, let us encode DM currents as the
ordered pair $(I_0^{+}, I_0^{-})$ for each ordered pair $(\Omega^{2D}_{c,1a_i}, \Omega^{2D}_{c,1b_i})$
and the ordered pair $(I_0^{-}, I_0^{+})$ for each ordered pair
$(\Omega^{2D}_{c,2a_i}, \Omega^{2D}_{c,2b_i})$. For an in-depth elaboration on the different
modes of operation of a common-mode choke, I refer to, e.g., [164, pp. 346–352] and
references therein.
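The two encodings can be made concrete with a small sketch. It represents the directions $I_0^{+}$ and $I_0^{-}$ by the signs +1 and -1 and attaches one ordered pair of signs to each winding. The names `CM`, `DM`, `windings_aligned`, and `assign_turns` are illustrative assumptions and are not taken from the simulation code behind Figure 5.3.

```python
# Sketch: encode current directions as signs (+1 for I_0^+, -1 for I_0^-).
# The keys "primary" and "secondary" stand for the subdomain pairs
# (Omega_c,1a_i, Omega_c,1b_i) and (Omega_c,2a_i, Omega_c,2b_i).
# All names here are hypothetical.
PLUS, MINUS = +1, -1

CM = {"primary": (PLUS, MINUS), "secondary": (PLUS, MINUS)}
DM = {"primary": (PLUS, MINUS), "secondary": (MINUS, PLUS)}


def windings_aligned(mode):
    """True if both windings carry the same ordered pair of directions."""
    return mode["primary"] == mode["secondary"]


def assign_turns(mode, n_turns):
    """Replicate the ordered pair of a winding for every turn i in {1, ..., N}."""
    return {winding: [pair] * n_turns for winding, pair in mode.items()}
```

Under this encoding, the CM case is the one in which both windings share the same ordered pair, whereas the DM case swaps the pair on the secondary winding.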
In § 5.1, we discuss two optimization problems with regard to a single inductive
component. Hence, I consider these optimization problems as embedded into the
component level of numerical investigations concerning inductive components.
If one considers these optimization problems with regard to two or more induc-
tive components or other components such as, e.g., in Appendix B.1, then I conceive
the corresponding optimization problems as lifted onto the system level of numerical
investigations concerning inductive components.
Instead of lifting the optimization problem in (5.17) and the optimization prob-
lem in (5.31) onto the system level, let us consider two other kinds of optimization
problems at the system level in order to enlarge the point of view regarding the
applications of surrogate optimization for inductive components.
In the subsequent elaborations, however, let us restrict our attention to
surrogate-based optimization. Observe that the utilization
of surrogate-guided optimization in the context of validation and verification of
the given results of a surrogate-based optimization is analogous to the exposition
in § 5.1.2.
5.2.2 Optimization problem I
Recalling the EMC filter in Figure 1.1, let us consider a prototypical version of a
simplistic EMC filter in Figure 5.4.⁷
⁶ Mind that parts of the corresponding simulation code at an early stage have been developed during
a student project called "Spulen-Optimierung mit Ersatzmodellen" [in English: "Coil optimization
with surrogate models"] (Marie Krause, Mandy Domke, winter term 2017/2018; unpublished) under
my scientific supervision and reviewed by the first reviewer of the present work.
⁷ Mind that parts of the corresponding simulation code at an early stage have been developed
during a master's thesis called "Multi-fidelity modeling for electromagnetic compatibility problems"
(Rodrigo Silva Rezende, summer term 2020; unpublished) under my scientific supervision and the
scientific supervision of Dr. Jan Hansen (Robert Bosch GmbH) and reviewed by the first reviewer of the
present work and Prof. Dr. Stefan Kurz (TU Darmstadt). Moreover, the example in Figure 5.4 is part of
a series of numerical studies regarding joint publications in preparation for submission called
"Vector-Valued Multi-Fidelity Surrogate Modeling for Microwave Components Design" (R. S. Rezende, M.
From a theory-driven viewpoint, examples such as in Figure 5.4 or in Figure B.2b
leave the realm of the magnetoquasistatic model (recall § 2.1.3).
However, from an application-driven viewpoint, I deem it reasonable to argue
that, roughly speaking, it depends on the chosen frequency range whether, for the
numerical investigation of an application at hand, the magnetoquasistatic subsystem
of Maxwell’s equations and the complete system of Maxwell’s equations itself appear
appropriately useful to gain knowledge about the application at hand.
If one considers a frequency range such as, e.g., from 50 Hz to 100 MHz (recall
Figure 5.2) or from 0 Hz to 200 MHz, then I claim that, for a given task, it might be
beneficial to take into account both the magnetoquasistatic subsystem and the complete
system of Maxwell’s equations. Notice, though, that the boundary of the overlapping
modeling region of the subsystem and the complete system is probably fuzzy.
For one classification of frequency ranges, I refer to, e.g., [164, p. 19].
[Figure 5.4 labels: geometrical parameters $x_1$, $x_2$, $x_3$; capacitances $C_1$, $C_2$;
subdomains $\Omega^{3D}_{c,1}$, $\Omega^{3D}_{c,2}$, $\Omega^{3D}_{nc,1}$, $\Omega^{3D}_{nc,2}$.]
FIGURE 5.4: A prototypical version of a simplistic EMC filter created
within CST Studio Suite®. The nomenclature from (i) of Figure 5.3 is
adapted. The value $C_1 := 1\times10^{-5}$ F and the value $C_2 := 2\times10^{-5}$ F refer
to a respective capacitance (see Figure B.1).
The boundary $\partial\Omega^{3D}$, the parts relevant to the excitation (four discrete
ports and the ground planes), and other $(x_1,x_2,x_3)$-dependent
geometrical entities (cf. Figure 5.1) are not depicted.
In the present work, however, I do not dwell on epistemological issues regarding
the overlapping modeling region of the magnetoquasistatic subsystem and the com-
plete system of Maxwell’s equations. Furthermore, I do not dwell on the intricacies
of boundary value problems regarding the complete system.
Let us mainly consider the example of application in Figure 5.4 in the spirit of a
high-fidelity model as a black-box model (recall Figure 1.2) within a rapid prototyp-
ing part of a product development cycle.
Hence, it is assumed that, regarding the application in Figure 5.4, there are
eligible parts relevant to the excitation such as, e.g., four discrete ports and ground
planes, and suitable conditions at the boundary $\partial\Omega^{3D}$ such as, e.g., a mix of open
boundary conditions and electric boundary conditions. Concerning the domain $\Omega^{3D}$,
let us consider the existence of other $(x_1,x_2,x_3)$-dependent geometrical entities (cf.
Figure 5.1) than those in Figure 5.4 as given.
Hadžiefendić, R. Schuhmann) and "Multi-Output Variable-Fidelity Bayesian Optimization of a
Common Mode Choke" (R. S. Rezende, M. Hadžiefendić, J. Hansen, R. Schuhmann).
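Therefore, one can read out, for instance, the $S_{22}$-parameter depending on the
angular frequency $\omega \in \mathbb{R}^{+}$ and the three geometrical parameters
$x_1, x_2, x_3 \in \mathbb{R}^{+}$ in Figure 5.4. Recalling our commentary on S-parameters in
§ 4.4.2, let us conceive $S_{22}$ as the magnitude in dB of the corresponding
$S_{22}$-parameter. Finally, one can overload the $S_{22}$-parameter in the sense that,
similarly to (5.10), one defines the function $S_{22}$ that reads as
$$S_{22} = (\omega,x_1,x_2,x_3) \mapsto S_{22}(\omega,x_1,x_2,x_3) : \mathbb{R}^{+}\times\mathbb{R}^{+}\times\mathbb{R}^{+}\times\mathbb{R}^{+} \to \mathbb{R}. \tag{5.37}$$
Analogously to (5.11) and (5.12), one can construct a map $S_{22,\omega}$ and, given a fixed
angular operating frequency $\omega_0$, a map $S_{22,\omega_0}$ such that
$$S_{22,\omega} = \omega \mapsto \big((x_1,x_2,x_3) \mapsto S_{22}(\omega,x_1,x_2,x_3)\big) : \mathbb{R}^{+} \to \mathbb{R}^{\mathbb{R}^{+}\times\mathbb{R}^{+}\times\mathbb{R}^{+}}, \tag{5.38a}$$
$$S_{22,\omega_0} = (x_1,x_2,x_3) \mapsto S_{22}(\omega_0,x_1,x_2,x_3) : \mathbb{R}^{+}\times\mathbb{R}^{+}\times\mathbb{R}^{+} \to \mathbb{R}, \tag{5.38b}$$
where $S_{22,\omega_0} \equiv S_{22,\omega}(\omega_0)$.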
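The passage from (5.37) to (5.38) is a currying step: the frequency argument is fixed first, and a function of the geometrical parameters remains. A minimal sketch of this step follows. The function `s22` is a hypothetical toy stand-in for the black-box high-fidelity model and carries no physical meaning; only the structure of the maps is of interest.

```python
import math
from functools import partial


def s22(omega, x1, x2, x3):
    """Hypothetical stand-in for the black-box S22 magnitude, cf. (5.37)."""
    return -20.0 * math.log10(1.0 + omega * x2 / (x1 + x3))


def s22_omega(omega):
    """Map (5.38a): a frequency is sent to a function of (x1, x2, x3)."""
    return partial(s22, omega)


omega_0 = 120e6                    # an arbitrary fixed operating frequency
s22_omega0 = s22_omega(omega_0)    # map (5.38b)
value = s22_omega0(45e-3, 20e-3, 5e-3)
```

By construction, `s22_omega0(x1, x2, x3)` and `s22(omega_0, x1, x2, x3)` return the same number, which mirrors the identity stated below (5.38b).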
Similarly to the application in Figure 5.1, those geometrical parameters that affect,
among other things, the shape of the core of an inductive component inevitably affect
the corresponding inductance (see Figure B.1) of the inductive component
(recall, e.g., Figure 5.2), too. And, due to Figure B.3, we possess an indication that a
change of the inductance affects the impedance of an inductive component. Supposing
an appropriate relationship between the impedance of the inductive component
and the corresponding S-parameters, it is reasonable to assume that a change of the
input parameters in (5.37) affects the shape of $S_{22}(\omega,x_1,x_2,x_3)$ in a similar way as
the change of parameters affects the shape of $Z_1(\omega)$ and $Z_2(\omega)$ in Figure B.3.⁸
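Thus, by performing a variation of the optimization problem in (5.31), one can
formulate an optimization problem regarding the application in Figure 5.4 that reads
as
$$
\begin{aligned}
O = i \mapsto \min.\quad & S_{22,\omega_i}(x_1,x_2,x_3) && \text{(5.39a)} \\
\text{s.t.}\quad & x_{1,\min} \le x_1 \le x_{1,\max}, && \text{(5.39b)} \\
& x_{2,\min} \le x_2 \le x_{2,\max}, && \text{(5.39c)} \\
& x_{3,\min} \le x_3 \le x_{3,\max}, && \text{(5.39d)}
\end{aligned}
$$
where $I := \{1,\dots,m_w\}$, $\omega_i \in W_I$ with $W_I := \{\omega_1,\dots,\omega_{m_w}\}$, and
$x_{1,\min} := 40\times10^{-3}$ m, $x_{1,\max} := 50\times10^{-3}$ m, $x_{2,\min} := 10\times10^{-3}$ m,
$x_{2,\max} := 30\times10^{-3}$ m, $x_{3,\min} := 2\times10^{-3}$ m, and $x_{3,\max} := 7\times10^{-3}$ m.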
A possible aim concerning (5.39) can be to determine the optimal solution $x^{*}$
such that
$$\forall (\omega_i, x) \in W_I \times X.\ S_{22,\omega_i}(x^{*}) \le S_{22,\omega_i}(x) \tag{5.40}$$
holds (cf. (4.18)). Observe, though, the slight difference regarding the formulation of
the optimal solution in (5.40) compared with (5.32).
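The condition in (5.40) asks for a single point that is optimal for every frequency at once, and such a point need not exist. The following sketch checks the condition by brute force on a finite candidate set. The toy objectives and the coarse grid are assumptions chosen for illustration only; they are not the high-fidelity model of Figure 5.4.

```python
from itertools import product


def joint_minimizer(objectives, candidates):
    """Return x* with f(x*) <= f(x) for every objective f and every candidate x,
    or None if the candidate set contains no such common minimizer."""
    for x_star in candidates:
        if all(f(x_star) <= f(x) for f in objectives for x in candidates):
            return x_star
    return None


def grid(bounds, n):
    """n equidistant values per parameter inside the box constraints."""
    axes = [[lo + k * (hi - lo) / (n - 1) for k in range(n)] for lo, hi in bounds]
    return list(product(*axes))


# Box constraints (5.39b)-(5.39d) in metres.
bounds = [(40e-3, 50e-3), (10e-3, 30e-3), (2e-3, 7e-3)]

# Toy objectives (assumptions), one per frequency index i.
toy = [lambda x, w=w: w * (x[0] - x[1] + x[2]) for w in (1.0, 1.5, 2.0)]

x_star = joint_minimizer(toy, grid(bounds, 3))

# Two conflicting objectives have no common minimizer.
no_common = joint_minimizer([lambda x: x[0], lambda x: -x[0]], [(0.0,), (1.0,)])
```

For the toy objectives, all of which are positive multiples of the same function, a common minimizer exists; for the conflicting pair, the function returns `None`.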
In practical applications, it is usual to observe the entity $S_{22,\omega_i}$ in (5.39) w.r.t.,
for instance, a frequency range from 0 Hz to 150 MHz where it is set that $m_w := 1001$.
Thus, it is assumed that the entity $S_{22,\omega_i}$ is quickly available for all $\omega_i \in W_I$. Therefore,
⁸ I conceive $S_{22}(\omega,x_1,x_2,x_3)$ as the magnitude in dB of the corresponding
$S_{22}(\omega,x_1,x_2,x_3)$-parameter whereas I conceive $Z_1(\omega)$ and $Z_2(\omega)$ as the
magnitude in $\Omega$ of the corresponding impedance $Z_1(\omega)$ and $Z_2(\omega)$, respectively.
Hence, it is assumed that suitable maps exist in order to compare the shapes of the entities
$S_{22}(\omega,x_1,x_2,x_3)$, $Z_1(\omega)$ and $Z_2(\omega)$.
the high-fidelity optimization problems $O(i)$ in (5.39) are not considered individually.
Hence, I propose to replace $m_w$ in the rough estimate in (5.33) by a problem-dependent
number $c_{m_w} \in \mathbb{N}$ with $c_{m_w} \ge 1$. The number $c_{m_w}$ encodes a potential
difference in the counting methods of the corresponding surrogate optimization
methods.
For the usage of surrogate-based optimization w.r.t. (5.39), let us focus on the case
$i \in I$ with $I := \{1,2,3\}$ and $\omega_i \in W_I$ with
$W_I := \{115\,\text{MHz}, 120\,\text{MHz}, 125\,\text{MHz}\}$. This
case reflects, for instance, the aim to find a potential minimal value of an S-parameter
around a certain frequency or within a frequency subrange.
Thus, using a Maximin LHC (recall Figure 3.1) with $m := 35$, one can construct
the corresponding kriging low-fidelity model for each $O(i)$ and one can compute
the optimal solution with regard to each kriging low-fidelity model. Out of the three
possible optimal solutions, let us pick the one that evaluates to the lowest
value. Hence, the optimal solution $\tilde{x}^{*}$ reads as
$$\tilde{x}^{*} := (40.00\times10^{-3}\,\text{m},\ 30.00\times10^{-3}\,\text{m},\ 2.00\times10^{-3}\,\text{m}). \tag{5.41}$$
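The sampling step of this procedure can be sketched as follows. The code draws random Latin hypercubes and keeps the one whose smallest pairwise distance is largest, which is a crude random-search approximation of a Maximin LHC; the construction actually used for Figure 3.1 may differ. The function names, the number of candidates, and the seed are assumptions. The kriging models and the subsequent optimization are not part of the sketch.

```python
import random


def latin_hypercube(m, d, rng):
    """One random Latin hypercube with m points in [0, 1]^d (cell midpoints)."""
    columns = []
    for _ in range(d):
        perm = list(range(m))
        rng.shuffle(perm)
        columns.append([(k + 0.5) / m for k in perm])
    return [tuple(col[i] for col in columns) for i in range(m)]


def min_pairwise_distance(points):
    """Smallest Euclidean distance between any two points of the plan."""
    return min(
        sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
        for i, p in enumerate(points)
        for q in points[i + 1:]
    )


def maximin_lhc(m, d, n_candidates=200, seed=0):
    """Keep the candidate hypercube whose smallest pairwise distance is largest."""
    rng = random.Random(seed)
    candidates = (latin_hypercube(m, d, rng) for _ in range(n_candidates))
    return max(candidates, key=min_pairwise_distance)


def scale(points, bounds):
    """Map unit-cube points into the box constraints."""
    return [tuple(lo + u * (hi - lo) for u, (lo, hi) in zip(p, bounds)) for p in points]


bounds = [(40e-3, 50e-3), (10e-3, 30e-3), (2e-3, 7e-3)]  # (5.39b)-(5.39d), in metres
plan = scale(maximin_lhc(m=35, d=3), bounds)
```

Each of the 35 points in `plan` would then be passed once to the high-fidelity model, and the resulting values at the three frequencies would serve as training data for the three kriging low-fidelity models.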
If we employ the optimal solution in (5.41) as an initial point within the implementation
of the NMS algorithm (recall § 2.3.3) of CST Studio Suite® (recall Figure 5.4),
then one can observe no significant change w.r.t. the optimal solution in (5.41) after
the maximum number of high-fidelity function evaluations $\tilde{m}_{DSO,\max} := 2$.
Given a random initial point $x^{(0)}$ that can be written as
$$x^{(0)} := (50.00\times10^{-3}\,\text{m},\ 10.00\times10^{-3}\,\text{m},\ 7.00\times10^{-3}\,\text{m}), \tag{5.42}$$
one finds that, on average, $S_{22,\omega_i}$ w.r.t. $\tilde{x}^{*}$ is more than 40% lower than
$S_{22,\omega_i}$ w.r.t. $x^{(0)}$.
If we utilize the initial point $x^{(0)}$ in (5.42) within the implementation of the NMS
algorithm of CST Studio Suite®, then, after the maximum number of high-fidelity
function evaluations $m_{DSO,\max} := 40$, it returns an optimal solution such as in (5.41).
Setting $m_{DSO} := m_{DSO,\max}$ and $m_{SBO} := m + \tilde{m}_{DSO,\max}$, one can observe a use case
such that
$$\exists\, m_{DSO}, m_{SBO} \in \mathbb{N}\setminus\{0\}.\ m_{DSO} > m_{SBO} \tag{5.43}$$
holds to be true. However, concerning the statement in (5.43), the same caveats apply
as for the statement in (5.30).
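With the values stated in this subsection, the counts behind (5.43) can be written out explicitly. The following lines merely restate the numbers given above; the caveats regarding comparability remain untouched.

```latex
\begin{aligned}
m_{\mathrm{SBO}} &= m + \tilde{m}_{\mathrm{DSO,max}} = 35 + 2 = 37, \\
m_{\mathrm{DSO}} &= m_{\mathrm{DSO,max}} = 40, \\
m_{\mathrm{DSO}} - m_{\mathrm{SBO}} &= 40 - 37 = 3 > 0 .
\end{aligned}
```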
For even more complex issues regarding surrogate optimization w.r.t. the example
of application in Figure 5.4, the tools from the category theoretical language
in ch. 4 can beneficially provide notational and methodological guidance (see, e.g.,
§ 4.4.2).
5.2.3 Optimization problem II
In Figure 5.5, a representation of two common-mode chokes within FEMM4.2 and
their magnetic field lines due to DM currents is provided.⁹
In the test cases, some relevant choices regarding the geometric modeling of the
two CMCs are that both the number of turns of the primary winding and the number
⁹ The example in Figure 5.5 is part of a series of numerical studies regarding a joint publication
in preparation for submission called "Surrogate-guided Optimization based on the Space-Mapping
Paradigm and the Co-Kriging Approach with Application in Electromagnetic Compatibility" (M.
Hadžiefendić, R. S. Rezende, R. Schuhmann).
[Figure 5.5, panels (i) to (iv); labels shown: geometrical parameters $x_1$, $x_2$ and axis labels x, y.]
FIGURE 5.5: A representation of two common-mode chokes within
FEMM4.2 and their magnetic field lines due to DM currents.
(i) A schematic illustration of the two CMCs with geometrical parameters
$(x_1,x_2) \in \mathbb{R}^{+}\times\mathbb{R}^{+}$, where $x_1 \equiv r_{\mathrm{cmc}}$ and $x_2 \equiv \varphi_{\mathrm{cmc}}$.
(ii) Magnetic field lines for the choice (x1,x2)∶=(rcm,0°).
(iii) Magnetic field lines for the choice (x1,x2)∶=(rcm,45°).
(iv) Magnetic field lines for the choice (x1,x2)∶=(rcm,90°).
of turns of the secondary winding are set to 9 and the radii of the cross-section
areas w.r.t. the primary winding and the secondary winding are both set to 2 mm.
Furthermore, the geometrical specifications of the cores of both CMCs are defined
as follows: the inner radius is set to 18 mm, the width is set to 12 mm, and the
height is set to 15 mm.
However, analogously to the exposition in § 5.2.2, let us mainly consider the
example of application in Figure 5.5 in the spirit of a high-fidelity model as a black-
box model within a rapid prototyping part of a product development cycle.
Recalling the short list in § 5.1.1, let us formulate an optimization problem that is
concerned with the inductive coupling between the two CMCs.
From a field theoretical perspective, I conceive a situation such as in (ii) of Fig-
ure 5.5 as a situation of low inductive coupling between the two CMCs and I con-
ceive a situation such as in (iv) of Figure 5.5 as a situation of high inductive coupling
between the two CMCs. For an electrical network theoretical perspective on induc-
tive coupling, see, e.g., [164, ch. 11] and references therein.
Given this viewpoint, if we provide box constraints regarding the geometric parameters
$x_1$ and $x_2$ that are depicted in (i) of Figure 5.5 such that $x_1 \in [x_{1,\min}, x_{1,\max}]$
and $x_2 \in [x_{2,\min}, x_{2,\max}]$ where $x_{1,\min} := 81.25\times10^{-3}$ m, $x_{1,\max} := 146.25\times10^{-3}$ m,
$x_{2,\min} := 0^\circ$, and $x_{2,\max} := 90^\circ$, then one can expect the solution $x^{*}_{H}$ that reads as
$$x^{*}_{H} := (x_{1,\min}, x_{2,\max}) \tag{5.44}$$
to correspond to the situation of highest inductive coupling between the two CMCs.
Thus, in practical applications, it is desirable to detect a solution such as in (5.44)
in order to avoid it. Mind, though, that from a design viewpoint, it is probably
reasonable to assume that
$$\forall x_1 \in [x_{1,\min}, x_{1,\max}].\ x^{*}_{H} := (x_1, x_{2,\max}) \tag{5.45}$$
holds to be true as solutions to be avoided. That is, all solutions in (5.45) are
perceived as equally bad.
Notice that, analogously to (5.44) and to (5.45), it is probably reasonable to
assume that, from a design viewpoint,
$$\forall x_1 \in [x_{1,\min}, x_{1,\max}].\ x^{*}_{L} := (x_1, x_{2,\min}) \tag{5.46}$$
holds to be true as solutions to be pursued. That is, all solutions in (5.46) are
perceived as equally good.
Given these preliminary thoughts, let us anticipate some kind of symmetrical
behavior of the evaluated objective function w.r.t. all parameter configurations $(x_1, 45^\circ)$
(see (iii) of Figure 5.5).
Thus, let us heuristically choose a map $\hat{Q}_{L,\omega_0}$ as an objective function that is
defined analogously to (5.12b). The map $\hat{Q}_{L,\omega_0}$ serves as an approximate proxy to encode
the considerations about the inductive coupling of the two CMCs in Figure 5.5.
Regarding the map $\hat{Q}_{L,\omega_0}$, it is supposed that $V \equiv \Omega^{2D}$ in (B.4) and (B.5).
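Hence, let us investigate the following high-fidelity optimization problem
$$
\begin{aligned}
\min.\quad & \hat{Q}_{L,\omega_0}(x_1,x_2) && \text{(5.47a)} \\
\text{s.t.}\quad & x_{1,\min} \le x_1 \le x_{1,\max}, && \text{(5.47b)} \\
& x_{2,\min} \le x_2 \le x_{2,\max}, && \text{(5.47c)}
\end{aligned}
$$
where $\omega_0 := 2\pi \cdot 50$ Hz, $x_{1,\min} := 81.25\times10^{-3}$ m, $x_{1,\max} := 146.25\times10^{-3}$ m,
$x_{2,\min} := 0^\circ$, $x_{2,\max} := 90^\circ$.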
In Figure 5.6, I present a contour representation of an evaluated kriging low-fidelity
model $\tilde{\hat{Q}}_{L,\omega_0}$ w.r.t. (5.47) by using the Sobol quasi-random sequence sampling
plan with $m := 50$.
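A deterministic low-discrepancy sequence returns the same points on every run, and a shorter sampling plan is a prefix of a longer one, so that high-fidelity evaluations can be reused when the plan is enlarged. The sketch below illustrates this nesting for the box constraints in (5.47). It uses a two-dimensional Halton sequence as a dependency-free stand-in for the Sobol sequence; this substitution, the function names, and the choice of bases are assumptions and do not reproduce the sampling plan behind Figure 5.6.

```python
def radical_inverse(k, base):
    """Van der Corput radical inverse of the integer k in the given base."""
    value, fraction = 0.0, 1.0 / base
    while k > 0:
        k, digit = divmod(k, base)
        value += digit * fraction
        fraction /= base
    return value


def halton_plan(m, bounds, bases=(2, 3)):
    """First m points of a 2D Halton sequence, scaled to the box constraints."""
    return [
        tuple(lo + radical_inverse(k, b) * (hi - lo) for (lo, hi), b in zip(bounds, bases))
        for k in range(1, m + 1)
    ]


bounds = [(81.25e-3, 146.25e-3), (0.0, 90.0)]  # x1 in metres, x2 in degrees, cf. (5.47)
plan_50 = halton_plan(50, bounds)
plan_21 = halton_plan(21, bounds)
nested = plan_50[:21] == plan_21  # the smaller plan is a prefix of the larger one
```

In this stand-in, the 21 points of the smaller plan are exactly the first 21 points of the 50-point plan, so evaluations computed for one plan carry over to the other.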
If we choose $m := 21$ and $\tilde{m}_{DSO,\max} := 5$, then one needs $m_{SBO} := m + \tilde{m}_{DSO,\max}$
high-fidelity function evaluations in order to find the optimal solution
$$x^{*}_{L} := (x_{1,\min}, x_{2,\min}), \tag{5.48}$$
which satisfies the statement in (5.46). If, however, we set $m_{DSO,\max} := 30$ and choose,
for instance, a random initial point $x^{(0)}$ that can be written as
$$x^{(0)} := (146.25\times10^{-3}\,\text{m},\ 90^\circ), \tag{5.49}$$
[Figure 5.6: contour plot; horizontal axis $x_1$ [mm] with ticks 91.25, 111.25, 131.25; vertical axis $x_2$ [°] with ticks 0, 30, 60, 90.]
FIGURE 5.6: Given the combination $(T^{2D}_{h_1}, 2\Delta^{2D}_{Ax=b})$ in Table 5.3 and
using the Sobol quasi-random sequence sampling plan with $m := 50$
and a kriging low-fidelity model with $\omega_0 := 2\pi \cdot 50$ Hz;
contour representation of $\tilde{\hat{Q}}_{L,\omega_0}$ in (5.47).
Dark colors indicate low values; bright colors indicate high values
(cf. Figure 5.2).
then one can record that, for $m_{DSO} := m_{DSO,\max}$, we do not arrive at the optimal
solution in (5.48). Hence, one can observe a use case such that
$$\exists\, m_{DSO}, m_{SBO} \in \mathbb{N}\setminus\{0\}.\ m_{DSO} > m_{SBO} \tag{5.50}$$
holds to be true. Regarding the statement in (5.50), though, the same prudence ought
to be exercised as for the statement in (5.43).
With regard to the example of application in Figure 5.5, the statement in (5.50) re-
flects rather the focus on a strategy for surrogate optimization than an investigation
of formulations of the high-fidelity optimization problem.
The high-fidelity optimization problem in (5.47) is solely one approximate en-
coding of the issue concerning the inductive coupling of the two CMCs in Figure 5.5
within the mathematical framework of § 2.3. The in-depth investigation of other
potential encodings is left for future work.
5.3 In closing
Assuming the context of an electrical engineering design workflow, we have developed
a strategy of using the tools from ch. 3 in practical applications. Furthermore,
we have carved out some relevant spots where the tools from ch. 4 can have a
beneficial impact as well.
We have elaborated on four high-fidelity optimization problems that are embed-
ded within the setting of a 2D-LBVP and a 3D-LBVP, respectively.
From the viewpoint of § 2.3, we have narrowed down our attention to optimiza-
tion problems that have an evaluated reduced parametric quantity of interest in the
objective functional and an evaluated reduced parametric quantity of interest in the
constraints besides box constraints; and to optimization problems that have an eval-
uated reduced parametric quantity of interest in the objective functional and box
constraints.
Moreover, given the semantics of multiple operating frequencies, we have suc-
cinctly addressed its impact on the computational burden in terms of the number
of high-fidelity function evaluations and its peculiarity regarding the formulation of
a corresponding optimal solution. We have also discussed that if the high-fidelity
optimization problems associated with each frequency are considered individually,
then, figuratively speaking, the space of options concerning surrogate optimization
expands – in the sense that, e.g., a data-fit low-fidelity model’s number of sampling
plan points can be adaptively chosen for each frequency.
We have seen that various high-fidelity models w.r.t. the magnetoquasistatic model
exhibit different behaviors, so that the insights from the investigations in,
e.g., § 3.2.1 can be exploited, for instance, to roughly identify the behavior of the
high-fidelity models with the behavior of a test function from Figure 2.2.
Additionally, we have exemplified the quick use of normalized global first-order
sensitivity measures as survey tools for the relevance of parameters and as poten-
tial approximate proxies for other indicators regarding the quality of a low-fidelity
model.
Using the methodological guidance from § 4.4.1, we have constructed some simp-
lified-physics low-fidelity models and we have computed the corresponding SSPCC
of these models.
We have deployed surrogate-guided optimization (see § 3.3) mainly in the con-
text of validation and verification of given results of a surrogate-based optimization.
A peculiarity is that we have utilized the co-kriging low-fidelity model in (3.184)
within the TRASM algorithm 3.1 in the spirit of hybrid model management strate-
gies (cf. § 3.4).
Mind, however, that we have carved out use cases where the number $m_{DSO}$ of
high-fidelity function evaluations regarding direct solving of a high-fidelity
optimization problem exceeds the number $m_{SBO}$ regarding surrogate-based optimization;
and we have carved out a use case where, in addition, the number $m_{SBO}$ exceeds
the number $m_{SGO}$ regarding surrogate-guided optimization. Concerning these numbers,
we have also addressed some caveats and limits of comparability.
Chapter 6
Conclusion and outlook
At the end, I distill a conclusion from § 2.4, § 3.4, § 4.6, and § 5.3. More precisely, I
select a few particular insights from these sections to illustrate from the frog’s-eye
view, i.e., at a more technical level, some of the present work’s achievements.
Moreover, I adopt a bird’s-eye view to illustrate in which respect this work has
made some progress in the scientific thicket of full automation of the virtual pro-
totyping of power electronic systems (see chapter 1); and I present an outlook for
potential new endeavors that may stem from this work.
6.1 Conclusion
The whole zoo of optimization algorithms in § 2.3.3 has proven to be useful for the
purpose of finding a solution of a given optimization problem and for the purpose
of cross-checking a computed optimal solution in the sense of validation and verifi-
cation.
The gradient-based interpretation of sensitivity measures is well-suited for func-
tions that permit the determination of derivative information by forward mode auto-
matic differentiation. At the level of programs (see Figure 1.4), this interpretation is
especially beneficial if there is a sound interaction between a module for automatic
differentiation and a module for numerical integration to facilitate this interpreta-
tion’s embedding into production-level code.
To the best of my knowledge, sampling plans constructed by a Sobol quasi-random
sequence are not yet widely represented in the literature concerning surrogate opti-
mization. From a theoretical viewpoint, these kinds of sampling plans enable a re-
producibility of samples of a given size. From a data management viewpoint, these
kinds of sampling plans enable an economical and sustainable handling of data by
ensuring reusability of data. Furthermore, such sampling plans may help to lower
the computational burden of constructing a co-kriging low-fidelity model.
If we assume a sparse number of sampling plan points concerning the high-
fidelity model, then using the squared sample Pearson correlation coefficient in com-
bination with the empirical generalization error within the k-fold cross validation
method requires a minimum number of sampling plan points depending on the
number k.
By carving out a potential link between the correlation coefficient and the sensi-
tivity measures, I have cautiously formulated a conjecture about the trustworthiness
of low-fidelity models’ normalized global first-order sensitivity measures. An im-
plication of this conjecture is that, in addition to their feature as survey tools for the
relevance of parameters, the sensitivity measures can serve as possible approximate
proxies for other indicators regarding the quality of a low-fidelity model.
However, it is unclear whether there exists a reliable complete list of indicators.
Despite this unclear point, a benchmark-focused classification of test functions has
been provided that creates the opportunity to classify very roughly the behavior of
a corresponding optimization problem within the magnetoquasistatic context.
Mind that use cases have been provided where the number of high-fidelity func-
tion evaluations is higher for a direct solving of a high-fidelity optimization prob-
lem than for a surrogate-based optimization approach and for a surrogate-guided
optimization approach. From an application-driven viewpoint in the context of val-
idation and verification, I have argued that there is an additional value of checking
whether the number of high-fidelity model evaluations is higher for a surrogate-
based optimization approach than for a surrogate-guided optimization approach.
Regarding all these numbers, however, there are some caveats and limits of compa-
rability, too.
Driven by heuristics, a purely formalization-oriented viewpoint has been ex-
ploited that has provided us with novel insights of theoretical value (such as poten-
tial hybrid model management strategies) and of practical value (such as convergence-
related issues within the space-mapping paradigm and regarding the quality of the
low-fidelity model within the co-kriging low-fidelity model).
The formalization-oriented viewpoint has culminated in the exposition of the
category theory toolset which represents solely a subset of the large amount of tools
available within category theory. The capability of the CT toolset as an algebraic
modeling framework for applications in surrogate optimization within the electro-
magnetics context has been shown. More precisely, the strengths of the CT toolset as
a strong notational scaffolding by diagrams of arrows have been illuminated.
In order to quantify the so-called modeling error, I have suggested a heuristics-
driven notion of a problem-dependent degree of forgetfulness as an auxiliary means.
Moreover, some classification tools at the level of generalized functions (see Fig-
ure 1.4) for the concept of multifidelity model management and the space mapping
notion have been propounded. Furthermore, a diagram of arrows has been pro-
posed as a common generic interface of two formalization issues related to coordi-
nate transformations.
From an application-oriented viewpoint, the intuition concerning the CT toolset
is especially relevant in order to facilitate the CT toolset’s wider acceptance. Hence,
there has been an attempt to balance the need for rigor and the need for intuition.
Admittedly, however, further investigations are necessary to set the applications of
the category theory toolset on an even more rigorous foundation.
Finally, representatives of the class of inductive components have been invoked
and I have examined four high-fidelity optimization problems that are embedded
within the setting of a two-dimensional linear boundary value problem and a three-
dimensional linear boundary value problem, respectively.
By supposing the context of an electrical engineering design workflow, I have
propounded a strategy of using the surrogate optimization tools of the present work
in practical applications. Moreover, some promising spots for a beneficial utilization
of the category theory toolset have been illuminated, too.
Finally, let me elucidate briefly in which respect this thesis achieves some pro-
gress concerning its ideal long-term goal, that is, the full automation of the virtual
prototyping of power electronic systems. This ideal goal can be approximately con-
ceived as the development of a user-independent software system that performs the
mathematical modeling, numerical simulation and optimization given an applica-
tion within the electromagnetics context.
The sheer complexity of such a goal demands expertise from numerous special-
ties. Thus, it is probable that any endeavor towards this goal is very interdisciplinary
by nature.
This thesis has provided some indication that the category theoretical language
can be a serious candidate for the important position of a mediator that enables a
smooth interplay between diverse fields such as computer science, numerical anal-
ysis, and electrical engineering.
Moreover, this thesis has provided some indication that surrogate optimization
can be valuable for power electronic applications at a component level as well as at a
system level in the context of performance-oriented optimization and in the context
of validation and verification.
Category theory’s inherent emphasis on formal aspects of a given problem and
its closeness to type theory in programming language theory suggest that it is a good
companion in the pursuit of a user-independent software system. Furthermore, its
high level of mathematical abstraction and its close relationship to logic make it
also a good companion in the pursuit of mathematical modeling, simulation, and
optimization of a given application within any physics-inspired semantics.
In a nutshell, the category theoretical toolset offers some help to handle the com-
plexity of this thesis’ ideal long-term goal by providing algebraic and visual tools
to convey complex formalization ideas that arise naturally in the context of, for in-
stance, multi-fidelity modeling of surrogate optimization. Thus, the CT toolset can
surely assist in the ongoing challenging search for novel surrogate-guided optimiza-
tion methods for power electronic applications.
6.2 Outlook
I will first mention some specific potential new endeavors regarding chapter 2,
chapter 3, chapter 4, and chapter 5; and then I will mention a general potential new
endeavor that is associated with the Disclaimer in § 1.3.
Concerning ch. 2, it is desirable to extend the size of the subset of optimization
test functions. Moreover, I also deem it advisable to include more test functions
that admit a natural generalization to higher dimensions. With regard to sensitivity
measures, it seems worthwhile to compare various interpretations.
Concerning ch. 3, it might be fruitful to extend the numerical investigations
regarding the optimization with test functions by data-fit low-fidelity models. For
instance, more data-fit low-fidelity models could be taken into account. The thorough
investigation of sequential kriging optimization and sequential co-kriging
optimization with regard to the optimization problems in ch. 5 appears to be an intriguing
venture, too. Furthermore, the proper incorporation of many low-fidelity models in
a surrogate-guided optimization approach might be a promising path as well.
Concerning ch. 4, future use cases for the CT toolset have been discussed a bit
more extensively in § 4.5. Undoubtedly, however, it is highly desirable to seek out
more examples of applications within the electromagnetics context for the category
theory toolset.
Concerning ch. 5, it is worthwhile to extend the number of parameters and the
number of optimization problems. Moreover, given the semantics of multiple op-
erating frequencies, I deem it beneficial to explore exhaustively the many possible
options concerning surrogate optimization.
Finally, recalling the Disclaimer in § 1.3, it might be fruitful to incorporate
uncertainty quantification, parallel computing, and automation aspects all together into
the context of surrogate optimization under the guidance of the CT toolset in order
to develop what may be called, for lack of a better word, a robust, parallel, and
automated surrogate optimization guided by category theoretical ideas.
Appendix A
Multivariate polynomials (§ 3.1.2)
A.1 Reparametrization using mean-centered arguments
With regard to fostering numerically stable computations, some authors (see, e.g., [61,
p. 27]) recommend performing a reparametrization using mean-centered arguments,
i.e., the argument $x$ in (3.54) is mapped to $x - \bar{x}$ via the map
$T_{\bar{x}} = x \mapsto x - \bar{x} : \mathbb{R}^{d\times1} \to \mathbb{R}^{d\times1}$
where, supposing a sampling plan $X_s$, the components of $\bar{x} \in \mathbb{R}^{d\times1}$ are determined by
the componentwise means of the sampling plan points in (3.14). The map $T_{\bar{x}}$ enables
us to define a map $p_{\bar{x}} := (p \circ T_{\bar{x}})$ that can be graphically represented as
$$\mathbb{R}^{d\times1} \xrightarrow{\;T_{\bar{x}}\;} \mathbb{R}^{d\times1} \xrightarrow{\;p\;} \mathbb{R}, \qquad p_{\bar{x}} := (p \circ T_{\bar{x}}) : \mathbb{R}^{d\times1} \to \mathbb{R}, \tag{A.1}$$
where its assignment is encoded by
$$(p \circ T_{\bar{x}})(x) := \beta_0 + l_e(x - \bar{x}) + q_A(x - \bar{x}). \tag{A.2}$$
Analogous to the construction of the matrix $B$ in (3.59), one can define a matrix $B_{\bar{x}}$.
In Table A.1, one can observe that there is an improvement regarding the condition
number $\kappa(B_{\bar{x}}^{T} B_{\bar{x}})$ w.r.t. $X_{s,1}$ and $X_{s,3}$. In the case of the sampling plan $X_{s,1}$, the
absolute difference is $7.67\times10^{3}$ and the percentage difference is 72.36%. In the case of
the sampling plan $X_{s,3}$, the absolute difference is $2.2\times10^{3}$ and the percentage
difference is 68.11%. In the case of the sampling plan $X_{s,2}$, the matrix $B_{\bar{x}}^{T} B_{\bar{x}}$ is singular,
which shows that a de facto singular matrix $B$ remains singular in the course of a
reparametrization of the form encoded in (A.2).
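TABLE A.1: The condition number w.r.t. a sampling plan from Figure 3.4
without and with reparametrization in (A.2).

Condition number / Sampling plan                                    $X_{s,1}$             $X_{s,2}$             $X_{s,3}$
$\kappa(B^{T}B)$                                                      $1.06\times10^{4}$     $8.57\times10^{49}$    $3.23\times10^{3}$
$\kappa(B_{\bar{x}}^{T}B_{\bar{x}})$                                  $2.93\times10^{3}$     $\infty$               $1.03\times10^{3}$
$|\kappa(B^{T}B) - \kappa(B_{\bar{x}}^{T}B_{\bar{x}})|$               $7.67\times10^{3}$     $\infty$               $2.2\times10^{3}$
$|\kappa(B^{T}B) - \kappa(B_{\bar{x}}^{T}B_{\bar{x}})| / |\kappa(B^{T}B)|$   $72.36\times10^{-2}$   $\infty$               $68.11\times10^{-2}$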
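The last two rows of Table A.1 follow from the first two by elementary arithmetic. As a small plausibility check, the sketch below recomputes them for the sampling plans $X_{s,1}$ and $X_{s,3}$ from the rounded condition numbers printed in the table; the function name `differences` is an assumption.

```python
def differences(kappa_plain, kappa_centered):
    """Absolute and relative change of the condition number."""
    absolute = abs(kappa_plain - kappa_centered)
    return absolute, absolute / abs(kappa_plain)


abs_1, rel_1 = differences(1.06e4, 2.93e3)  # sampling plan X_s,1
abs_3, rel_3 = differences(3.23e3, 1.03e3)  # sampling plan X_s,3
```

Rounded to two decimal places, `100 * rel_1` and `100 * rel_3` agree with the percentages stated in the table.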
A.2 Bernstein polynomials
The reparametrization strategy in (A.1) and in (A.2), respectively, motivates a
small numerical experiment on the condition number with regard to polynomials
in Bernstein form. For more details on the properties of Bernstein polynomials,
see, e.g., [55, pp. 205–211].
The Bernstein basis $B_{\varsigma} \subseteq P_{\le n}$ reads as
$$B_{\varsigma} := \left\{ \tilde{b}_{i,n}(x) \equiv \binom{n}{i} x^{i} (1-x)^{n-i} \;\middle|\; i \in \{0,1,\dots,n-1,n\} \wedge x \in [0,1] \right\}. \tag{A.3}$$
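In order to transform the domain from $[a_l, b_l]^d \equiv [0,1]^d$ to $[\tilde{a}_l, \tilde{b}_l]^d$ with $l \in \{1, \dots, d\}$, let us apply the affine map $\gamma_l \colon [a_l, b_l] \to [\tilde{a}_l, \tilde{b}_l]$ and the affine map $\nu_l \colon [\tilde{a}_l, \tilde{b}_l] \to [a_l, b_l]$ such that

$$[a_l, b_l] \xrightarrow{\;\gamma_l\;} [\tilde{a}_l, \tilde{b}_l] \xrightarrow{\;\nu_l\;} [a_l, b_l], \qquad \forall l.\ \mathrm{id}_{[a_l, b_l]} := (\nu_l \circ \gamma_l), \tag{A.4}$$

where $\mathrm{id}_{[a_l, b_l]}$ denotes the identity map on the domain $[a_l, b_l]$. The assignments of the affine maps $\gamma_l$ and $\nu_l$ are encoded by

$$\forall l \in \{1, \dots, d\}.\ \gamma_l(x_l) := (\tilde{b}_l - \tilde{a}_l)\, x_l + \tilde{a}_l, \tag{A.5a}$$

$$\forall l \in \{1, \dots, d\}.\ \nu_l(x_l) := \frac{1}{\tilde{b}_l - \tilde{a}_l}\,(x_l - \tilde{a}_l). \tag{A.5b}$$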
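The pair of affine maps can be sketched in a few lines. This is an illustrative sketch only: the function names `gamma` and `nu`, the argument names `lo` and `hi` for the bounds of the target interval, and the numerical values are assumptions, not part of the thesis code.

```julia
# gamma maps the reference interval [0, 1] onto the target interval [lo, hi];
# nu is its inverse, so that nu after gamma is the identity on [0, 1].
gamma(x, lo, hi) = (hi - lo) * x + lo
nu(y, lo, hi) = (y - lo) / (hi - lo)

lo, hi = 1.5, 2.5              # illustrative target bounds
x = 0.25                       # point in the reference interval [0, 1]
y = gamma(x, lo, hi)           # image in [lo, hi]
x_back = nu(y, lo, hi)         # round trip back to [0, 1]
println("x = ", x, ", gamma(x) = ", y, ", nu(gamma(x)) = ", x_back)
```

In the multivariate case the same pair would be applied componentwise, with one pair of bounds per coordinate $l$.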
Assuming that the Bernstein coefficients are given in their floating-point representation, de Casteljau's algorithm is a numerically stable tool to evaluate a polynomial $p \in \operatorname{span}(\mathcal{B}_\varsigma)$. For a more elaborate discussion of the properties of de Casteljau's algorithm and, especially, its application in the context of Bézier curves, see, e.g., [55, p. 211–218].
In Listing A.1, I present an example implementation of de Casteljau's algorithm for the evaluation of a univariate Bernstein sum in the Julia PL.
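LISTING A.1: An example implementation of de Casteljau's algorithm for the evaluation of a univariate Bernstein sum in the Julia PL.

    function bernstein_deCasteljau_eval_1d(c::Vector{T}, x::T) where {T<:Real}
        N = length(c)         # number of Bernstein coefficients (1-based indexing)
        D = zeros(T, N, N)    # triangular scheme; row j holds the j-th reduction step
        D[1, :] .= c          # row 1: the Bernstein coefficients themselves
        for j in 2:N
            for i in 1:N-(j-1)
                # convex combination of two neighbors from the previous row
                D[j, i] = (1 - x) * D[j-1, i] + x * D[j-1, i+1]
            end
        end
        return D[N, 1]        # value of the Bernstein sum at x
    end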
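As a plausibility check, de Casteljau's scheme can be compared with a direct evaluation of the Bernstein sum. The sketch below is self-contained and uses its own hypothetical function names (`bernstein_direct`, `bernstein_decasteljau`); the second function is a variant with a single work vector, not the implementation of Listing A.1, and the coefficients are illustrative.

```julia
# Direct evaluation of sum_i c[i+1] * binomial(n, i) * x^i * (1 - x)^(n - i).
function bernstein_direct(c::Vector{Float64}, x::Float64)
    n = length(c) - 1
    return sum(c[i+1] * binomial(n, i) * x^i * (1 - x)^(n - i) for i in 0:n)
end

# de Casteljau with one work vector instead of an N-by-N table.
function bernstein_decasteljau(c::Vector{Float64}, x::Float64)
    d = copy(c)
    for j in 1:length(c)-1
        for i in 1:length(c)-j
            # d[i+1] still holds the value of the previous reduction step here
            d[i] = (1 - x) * d[i] + x * d[i+1]
        end
    end
    return d[1]
end

c = [1.0, -2.0, 0.5, 3.0]   # illustrative Bernstein coefficients (degree 3)
x = 0.3
println(bernstein_direct(c, x), " vs. ", bernstein_decasteljau(c, x))
```

At the interval ends the Bernstein sum reproduces the first and the last coefficient, which gives a quick sanity check for any implementation.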
Generalizing to the multivariate case by employing the tensor product construction, one can provide a Bernstein basis for the space $\mathcal{P}^d_k$; however, one cannot provide a Bernstein basis for the space $\mathcal{P}^d_{\le k}$ in this way.
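For a small numerical experiment, let us consider the spaces $\mathcal{P}^2_{\le 2}$, $\mathcal{P}^3_{\le 2}$, $\mathcal{P}^2_2$, and $\mathcal{P}^3_2$. Regarding the space $\mathcal{P}^2_2$, as an example, let us write the tensor product bases as column vectors $\tilde{b} \in \mathbb{R}^{9 \times 1}$ and $\tilde{b}_\varsigma \in \mathbb{R}^{9 \times 1}$ by invoking the Kronecker product $\otimes$ with the signature $\mathbb{R}^{m \times n} \times \mathbb{R}^{p \times q} \to \mathbb{R}^{pm \times qn}$. Hence, $\tilde{b}$ and $\tilde{b}_\varsigma$ can be written as

$$\tilde{b} := \begin{bmatrix} 1 & x_1 & x_1^2 \end{bmatrix}^T \otimes \begin{bmatrix} 1 & x_2 & x_2^2 \end{bmatrix}^T, \tag{A.6a}$$

$$\tilde{b}_\varsigma := \begin{bmatrix} (1-x_1)^2 & 2x_1(1-x_1) & x_1^2 \end{bmatrix}^T \otimes \begin{bmatrix} (1-x_2)^2 & 2x_2(1-x_2) & x_2^2 \end{bmatrix}^T. \tag{A.6b}$$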
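The two tensor product basis vectors can be formed with `kron` from Julia's `LinearAlgebra` standard library. The following is a minimal sketch; the helper names `monomial_1d` and `bernstein_1d` and the evaluation point are assumptions chosen for illustration.

```julia
using LinearAlgebra

# Univariate bases of degree two evaluated at a point x.
monomial_1d(x) = [1.0, x, x^2]
bernstein_1d(x) = [(1 - x)^2, 2 * x * (1 - x), x^2]

x1, x2 = 0.3, 0.7                                        # illustrative point in [0, 1]^2
b = kron(monomial_1d(x1), monomial_1d(x2))               # tensor-product monomial basis
b_bern = kron(bernstein_1d(x1), bernstein_1d(x2))        # tensor-product Bernstein basis
println(length(b), " entries; sum of Bernstein entries = ", sum(b_bern))
```

Both vectors have nine entries, and the Bernstein entries sum to one (partition of unity), a property the monomial basis does not share.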
Additionally, let us use sampling plans based on the Sobol quasi-random sequence in order to ensure reproducibility and to avoid the averaging that would be needed in the case of an Audze-Eglais LHC or a Maximin LHC. In Table A.3, the condition numbers $\kappa(B^T B + \lambda I)$ and $\kappa(B_\varsigma^T B_\varsigma + \lambda I)$ are reported (see footnote 1).
If the Tikhonov regularization parameter $\lambda$ is set to zero, one can observe that the condition number decreases as the number of sampling plan points $m$ increases. From a statistics point of view, this observation has an intuitive interpretation: a larger sample size leads to a better estimate of the so-called true coefficients vector; or, more formally, the best coefficients column vector $\hat{\tilde{c}}$ is consistent for the true coefficients column vector $c$. For more details on this particular notion of consistency, I refer to [82, p. 65f], where the author elaborates on a theorem that links consistency to an approximate check of whether $\frac{1}{m}(B^T B)$ approaches a symmetric positive definite matrix as $m \to \infty$.
If a symmetric positive definite matrix is given, then its trace is greater than zero; hence, let us check computationally $\operatorname{tr}(\frac{1}{m}(B^T B))$ and $\operatorname{tr}(\frac{1}{m}(B_\varsigma^T B_\varsigma))$, respectively. In Table A.2, the corresponding results are reported.
TABLE A.2: The trace of $\frac{1}{m}(B^T B)$ and the trace of $\frac{1}{m}(B_\varsigma^T B_\varsigma)$ w.r.t. the number of sampling plan points $m$ and the Tikhonov regularization parameter $\lambda = 0$, assuming the spaces $\mathcal{P}^2_{\le 2}$, $\mathcal{P}^3_{\le 2}$, $\mathcal{P}^2_2$, and $\mathcal{P}^3_2$ with monomial basis and the spaces $\mathcal{P}^2_2$ and $\mathcal{P}^3_2$ with Bernstein basis.

            B_{P^2_≤2}   B_{P^2_2}   B_{ς,P^2_2}   B_{P^3_≤2}   B_{P^3_2}   B_{ς,P^3_2}
m = 10      2.131        2.310       0.267         2.862        3.328       0.133
m = 50      2.147        2.304       0.270         2.899        3.427       0.140
m = 100     2.162        2.334       0.278         2.909        3.573       0.145
m = 1000    2.176        2.348       0.283         2.930        3.600       0.150
In Table A.2, one can observe that, for all cases under consideration, the trace is positive and increases very slowly with an increasing number of sampling plan points. Thus, these results provide some empirical evidence for the intuition underlying the notion of consistency described above. Note that the trace of a matrix is equal to the sum of its eigenvalues, and that a symmetric matrix is positive definite if and only if all of its eigenvalues are positive; a positive trace is therefore necessary but not sufficient for positive definiteness. In Table A.2, solely the entries with regard to the combinations $(m = 10, B_{\mathcal{P}^3_2})$ and $(m = 10, B_{\varsigma, \mathcal{P}^3_2})$ do not correspond to a case where all eigenvalues are positive.
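The gap between a positive trace and positive definiteness can be made concrete for the case of ten points and the 27-dimensional tensor product space. The sketch below is an illustration under stated assumptions: the points are generated from fractional parts of irrational multiples as a stand-in for the Sobol sequence, and the helper names `basis_1d` and `basis_3d` are hypothetical.

```julia
using LinearAlgebra

# Tensor-product monomial basis of degree two per variable in d = 3 (27 functions).
basis_1d(x) = [1.0, x, x^2]
basis_3d(p) = kron(basis_1d(p[1]), kron(basis_1d(p[2]), basis_1d(p[3])))

# Ten synthetic points in [0, 1]^3; NOT the sampling plan used for Table A.2.
m = 10
pts = [[mod(i * sqrt(2), 1), mod(i * sqrt(3), 1), mod(i * sqrt(5), 1)] for i in 1:m]

B = permutedims(reduce(hcat, [basis_3d(p) for p in pts]))   # m-by-27 matrix
M = Symmetric((B' * B) / m)

println("trace = ", tr(M))
println("smallest eigenvalue = ", eigmin(M), ", largest = ", eigmax(M))
```

Since $B$ has only ten rows, $B^T B$ has rank at most ten, so at least 17 of its 27 eigenvalues vanish (up to rounding) although the trace is clearly positive.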
Footnote 1: Given $m = 1000$, the time needed to construct the sampling plan based on the Sobol quasi-random sequence is approximately 198 s for $d = 2$ and 227 s for $d = 3$ on a notebook with an Intel® Core™ i7-6500U CPU @ 2.50 GHz. This time represents the main bottleneck during the construction of the matrices $(B^T B + \lambda I)$ and $(B_\varsigma^T B_\varsigma + \lambda I)$, respectively.
TABLE A.3: The condition numbers $\kappa(B^T B + \lambda I)$ and $\kappa(B_\varsigma^T B_\varsigma + \lambda I)$ w.r.t. the number of sampling plan points $m$ and the Tikhonov regularization parameter $\lambda$, assuming the spaces $\mathcal{P}^2_{\le 2}$, $\mathcal{P}^3_{\le 2}$, $\mathcal{P}^2_2$, and $\mathcal{P}^3_2$ with monomial basis and the spaces $\mathcal{P}^2_2$ and $\mathcal{P}^3_2$ with Bernstein basis.

(A) The condition number assuming $\mathcal{P}^2_{\le 2}$ with monomial basis.
           m = 10         m = 50         m = 100        m = 1000
λ = 0.0    3.971 × 10^3   1.176 × 10^3   0.922 × 10^3   0.861 × 10^3
λ = 0.2    0.091 × 10^3   0.334 × 10^3   0.464 × 10^3   0.789 × 10^3
λ = 0.5    0.037 × 10^3   0.161 × 10^3   0.266 × 10^3   0.700 × 10^3
λ = 0.8    0.023 × 10^3   0.106 × 10^3   0.187 × 10^3   0.630 × 10^3

(B) The condition number assuming $\mathcal{P}^3_{\le 2}$ with monomial basis.
           m = 10         m = 50         m = 100        m = 1000
λ = 0.0    0.182 × 10^6   2.464 × 10^3   1.782 × 10^3   1.363 × 10^3
λ = 0.2    0.119 × 10^3   0.476 × 10^3   0.716 × 10^3   1.224 × 10^3
λ = 0.5    0.048 × 10^3   0.216 × 10^3   0.378 × 10^3   1.062 × 10^3
λ = 0.8    0.030 × 10^3   0.139 × 10^3   0.257 × 10^3   0.938 × 10^3

(C) The condition number assuming $\mathcal{P}^2_2$ with monomial basis.
           m = 10         m = 50         m = 100        m = 1000
λ = 0.0    2.292 × 10^6   0.369 × 10^6   0.316 × 10^6   0.275 × 10^6
λ = 0.2    0.098 × 10^3   0.490 × 10^3   0.983 × 10^3   9.564 × 10^3
λ = 0.5    0.039 × 10^3   0.196 × 10^3   0.394 × 10^3   3.907 × 10^3
λ = 0.8    0.025 × 10^3   0.123 × 10^3   0.247 × 10^3   2.455 × 10^3

(D) The condition number assuming $\mathcal{P}^3_2$ with monomial basis.
           m = 10          m = 50         m = 100        m = 1000
λ = 0.0    7.994 × 10^19   6.938 × 10^8   3.702 × 10^8   1.563 × 10^8
λ = 0.2    0.130 × 10^3    0.668 × 10^3   1.388 × 10^3   1.395 × 10^4
λ = 0.5    0.052 × 10^3    0.268 × 10^3   0.555 × 10^3   5.581 × 10^3
λ = 0.8    0.033 × 10^3    0.167 × 10^3   0.347 × 10^3   3.488 × 10^3

(E) The condition number assuming $\mathcal{P}^2_2$ with Bernstein basis.
           m = 10         m = 50         m = 100        m = 1000
λ = 0.0    1.092 × 10^3   0.129 × 10^3   0.104 × 10^3   0.098 × 10^3
λ = 0.2    0.006 × 10^3   0.023 × 10^3   0.036 × 10^3   0.084 × 10^3
λ = 0.5    0.003 × 10^3   0.011 × 10^3   0.019 × 10^3   0.069 × 10^3
λ = 0.8    0.002 × 10^3   0.007 × 10^3   0.013 × 10^3   0.058 × 10^3

(F) The condition number assuming $\mathcal{P}^3_2$ with Bernstein basis.
           m = 10          m = 50         m = 100        m = 1000
λ = 0.0    8.471 × 10^18   4.578 × 10^3   2.269 × 10^3   1.063 × 10^3
λ = 0.2    0.003 × 10^3    0.010 × 10^3   0.019 × 10^3   0.158 × 10^3
λ = 0.5    0.001 × 10^3    0.004 × 10^3   0.008 × 10^3   0.070 × 10^3
λ = 0.8    0.001 × 10^3    0.003 × 10^3   0.005 × 10^3   0.045 × 10^3
Given a fixed $\lambda > 0$ in Table A.3, one can observe that the condition number increases as the number of sampling plan points $m$ increases. I cannot back up this particular behavior in a formal way. Given a fixed $m$, one can observe that the condition number decreases as the regularization parameter increases. This behavior matches the expectation regarding the regularization parameter.
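The expected behavior for a fixed $m$ follows from the eigenvalues: if $\mu_{\max} \ge \mu_{\min} \ge 0$ denote the extreme eigenvalues of $B^T B$, then $\kappa(B^T B + \lambda I) = (\mu_{\max} + \lambda)/(\mu_{\min} + \lambda)$, which does not increase with $\lambda$. The following sketch illustrates this on a small synthetic design matrix; the points and the univariate monomial basis are assumptions and unrelated to the sampling plans of Table A.3.

```julia
using LinearAlgebra

# Synthetic design matrix: univariate monomials 1, x, x^2 at five points in [0, 1].
x = [0.0, 0.25, 0.5, 0.75, 1.0]
B = hcat(ones(length(x)), x, x .^ 2)
M = B' * B

lambdas = [0.0, 0.2, 0.5, 0.8]
kappas = [cond(M + λ * I) for λ in lambdas]   # I is the UniformScaling identity

# Cross-check one entry against the eigenvalue formula.
mu = eigvals(Symmetric(M))
kappa_formula = (maximum(mu) + 0.2) / (minimum(mu) + 0.2)

for (λ, κ) in zip(lambdas, kappas)
    println("λ = ", λ, ": κ = ", κ)
end
```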
If we exclude pathological cases in which the condition number is larger by several orders of magnitude, then, in practical applications, the assessment of a condition number as an indication of an ill-conditioned problem is mostly incumbent upon the judgment of the user and their desired accuracy for the problem under investigation.
Therefore, the key empirical insight from Table A.3 is that, for a low number of sampling plan points, by invoking the regularization parameter $\lambda$, one can achieve moderately small condition numbers with a monomial basis for the spaces $\mathcal{P}^2_{\le 2}$ and $\mathcal{P}^3_{\le 2}$. These condition numbers are comparable to those associated with their tensor product polynomial basis counterparts.
A.3 Chebyshev polynomials
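Another possibility to mitigate by design the multicollinearity with respect to the chosen basis is, instead of a monomial basis $\mathcal{B} \subseteq \mathcal{P}_{\le n}$ such as in (3.40), to apply a finite basis of orthogonal polynomials $\mathcal{Q} \subseteq \mathcal{P}_{\le n}$ in which the polynomials are pairwise orthogonal with respect to an inner product $\langle \cdot, \cdot \rangle_{\mathbb{R}^X}$ that can be considered in a continuous form or in a discrete form (see, e.g., [48, p. 251f]), more precisely,

$$\langle \cdot, \cdot \rangle_{\mathbb{R}^{X}} := (f, g) \mapsto \int_a^b f(x)\, g(x)\, w(x)\, dx \colon \mathbb{R}^{X} \times \mathbb{R}^{X} \to \mathbb{R} \quad \text{or} \tag{A.7a}$$

$$\langle \cdot, \cdot \rangle_{\mathbb{R}^{X_s}} := (f, g) \mapsto \sum_{i=1}^{m} f(x_i)\, g(x_i)\, w(x_i) \colon \mathbb{R}^{X_s} \times \mathbb{R}^{X_s} \to \mathbb{R}, \tag{A.7b}$$

where $x \in X := [a, b] \subset \mathbb{R}$, $x_i \in X_s \subseteq X^m$ with $m \in \mathbb{N}$ and $i \in \{1, 2, \dots, m\}$, and $w$ denotes a weight function. Let us focus on the discrete form in (A.7b).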
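Using orthogonal polynomials $q_i \in \mathcal{Q}$, one can concisely define a matrix $Q \in \mathbb{R}^{m \times s}$ with respect to the sampling plan points $x_i$:

$$Q := \begin{bmatrix} \tilde{q}^T_{x_1} \\ \tilde{q}^T_{x_2} \\ \vdots \\ \tilde{q}^T_{x_m} \end{bmatrix}, \tag{A.8}$$

such that, in the absence of the floating-point approximation error, $Q^T Q$ is exactly diagonal in the sense of

$$Q^T Q \equiv \operatorname{diag}\left( \langle \tilde{q}_0, \tilde{q}_0 \rangle_{\mathbb{R}^{X_s}}, \dots, \langle \tilde{q}_{s-1}, \tilde{q}_{s-1} \rangle_{\mathbb{R}^{X_s}} \right), \tag{A.9}$$

where $\tilde{q}_i$ denote the components of a basis vector $\tilde{q} \in \mathbb{R}^{s \times 1}$. If $Q$ is orthonormal, then $Q^T Q = I$, where $I \in \mathbb{R}^{s \times s}$ denotes the identity matrix.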
Let us turn our attention to univariate Chebyshev polynomials of the first kind $T_n(x)$ with the leading coefficient $2^{n-1}$ for $n \ge 1$ and the domain $X := [-1, 1]$. For more details on the properties of those sequences of orthogonal polynomials called Chebyshev polynomials, see, e.g., [208].
The corresponding recurrence formula reads as

$$T_n(x) := \begin{cases} 1, & \text{if } n \equiv 0, \\ x, & \text{if } n \equiv 1, \\ 2x\,T_{n-1}(x) - T_{n-2}(x), & \text{if } n \ge 2. \end{cases} \tag{A.10}$$
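The recurrence translates directly into a short loop. The sketch below is an illustrative helper (the name `chebyshev_T` is an assumption, not part of the thesis code); it evaluates $T_n(x)$ iteratively and can be checked against the closed form $T_n(\cos\theta) = \cos(n\theta)$.

```julia
# T_n(x) via the three-term recurrence, evaluated iteratively.
function chebyshev_T(n::Int, x::Float64)
    n == 0 && return 1.0
    t_prev, t_curr = 1.0, x                 # T_0(x) and T_1(x)
    for _ in 2:n
        t_prev, t_curr = t_curr, 2 * x * t_curr - t_prev
    end
    return t_curr
end

θ = 0.4
println(chebyshev_T(5, cos(θ)), " vs. ", cos(5θ))
```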
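Based on the recurrence formula in (A.10), one can construct a basis for the space $\mathcal{P}_{\le 2}$; furthermore, one can construct a basis for the space $\mathcal{P}^d_{\le 2}$ in the same manner as in (3.45). Hence, one can define the column vector $\tilde{q} \in \mathbb{R}^{s \times 1}$ that represents the basis of a $d$-variate Chebyshev polynomial of the first kind of degree at most two:

$$\tilde{q} := \begin{bmatrix} \tilde{q}_0 & \tilde{q}_1 & \dots & \tilde{q}_d & \tilde{q}_{d+1} & \dots & \tilde{q}_{2d} & \tilde{q}_{2d+1} & \dots & \tilde{q}_{s-1} \end{bmatrix}^T, \tag{A.11}$$

where $\tilde{q}_0 = T_0(x_1) \cdots T_0(x_d)$, $\tilde{q}_1 = T_1(x_1)$, $\tilde{q}_d = T_1(x_d)$, $\tilde{q}_{d+1} = T_2(x_1)$, $\tilde{q}_{2d} = T_2(x_d)$, $\tilde{q}_{2d+1} = T_1(x_1) T_1(x_2)$, and $\tilde{q}_{s-1} = T_1(x_{d-1}) T_1(x_d)$; more specifically, $\tilde{q}_0 = 1$, $\tilde{q}_1 = x_1$, $\tilde{q}_d = x_d$, $\tilde{q}_{d+1} = 2x_1^2 - 1$, $\tilde{q}_{2d} = 2x_d^2 - 1$, $\tilde{q}_{2d+1} = x_1 x_2$, and $\tilde{q}_{s-1} = x_{d-1} x_d$. Thus, analogously to (3.57), one can define a polynomial as

$$p(x) := \tilde{q}^T \tilde{c}. \tag{A.12}$$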
Mind that the sampling plan $X_s \subset ([-1,1]^d)^{m_1 \times \cdots \times m_d}$ is constituted by the Chebyshev nodes

$$\forall l \in \{1, \dots, d\}.\ \forall k \in \{1, \dots, m_l\}.\ x_{kl} := \cos\left( \frac{2k-1}{2 m_l}\, \pi \right), \tag{A.13}$$

where $\prod_{l=1}^{d} m_l := m$. Let us refer to this specific construction of a sampling plan $X_s$ as a Chebyshev grid, in which the sampling plan $X_s$ can be conceived as a tensor in the sense of a multidimensional array (see, e.g., [79]). For a succinct comment on the smoothness requirements for a high-fidelity model and on the influence of the positioning of the nodes in polynomial interpolation (the Runge phenomenon, the Faber theorem), see, e.g., [209].
In Figure A.1, three Chebyshev grids with d∶=2 are illustrated.
[Figure: three scatter plots (i), (ii), and (iii) of grid points in the plane with the axes x_1 and x_2.]
FIGURE A.1: Instances of a Chebyshev grid as a sampling plan $X_s \subset (X^2)^{m_1 \times m_2}$. (i) $X := [-1, 1]$, $m_1 \equiv m_2 := 5$; (ii) $X := [0, 1]$, $m_1 \equiv m_2 := 5$; (iii) $X := [0, 1]$, $m_1 \equiv m_2 := 10$.
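In order to transform the domains from $[a_l, b_l]^2 \equiv [-1, 1]^2$ to $[\tilde{a}_l, \tilde{b}_l]^2 \equiv [0, 1]^2$ with $l \in \{1, 2\}$, let us overload the maps and override the domains and the variables in (A.4) and in (A.5). Thus, let us apply the affine map $\gamma_l \colon [a_l, b_l] \to [\tilde{a}_l, \tilde{b}_l]$ and the affine map $\nu_l \colon [\tilde{a}_l, \tilde{b}_l] \to [a_l, b_l]$ such that

$$[a_l, b_l] \xrightarrow{\;\gamma_l\;} [\tilde{a}_l, \tilde{b}_l] \xrightarrow{\;\nu_l\;} [a_l, b_l], \qquad \forall l.\ \mathrm{id}_{[a_l, b_l]} := (\nu_l \circ \gamma_l), \tag{A.14}$$

where $\mathrm{id}_{[a_l, b_l]}$ denotes the identity map on the domain $[a_l, b_l]$. Invoking (A.13), the assignments of the affine maps $\gamma_l$ and $\nu_l$ are encoded by

$$\forall l \in \{1, \dots, d\}.\ \forall k \in \{1, \dots, m_l\}.\ \gamma_l(x_{kl}) := \frac{1}{2}\left( (\tilde{b}_l - \tilde{a}_l)\, x_{kl} + (\tilde{a}_l + \tilde{b}_l) \right), \tag{A.15a}$$

$$\forall l \in \{1, \dots, d\}.\ \forall k \in \{1, \dots, m_l\}.\ \nu_l(x_{kl}) := \frac{1}{\tilde{b}_l - \tilde{a}_l}\left( 2 x_{kl} - (\tilde{a}_l + \tilde{b}_l) \right). \tag{A.15b}$$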
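If we use the orthogonality condition in the discrete form for the univariate case, which holds for indices $i, j < m_l$, i.e.,

$$\forall l \in \{1, \dots, d\}.\ \sum_{k=1}^{m_l} T_i(x_{kl})\, T_j(x_{kl}) = \begin{cases} 0, & \text{if } i \ne j, \\ m_l, & \text{if } i \equiv j \wedge i = 0, \\ \frac{m_l}{2}, & \text{if } i \equiv j \wedge i \ne 0, \end{cases} \tag{A.16}$$

one can express the orthogonality condition in the discrete form for the multivariate case in terms of the components $\tilde{q}_i$ in (A.11), i.e.,

$$\langle \tilde{q}_i, \tilde{q}_j \rangle_{\mathbb{R}^{X_s}} = \begin{cases} 0, & \text{if } i \ne j, \\ m, & \text{if } i \equiv j \wedge i = 0, \\ \frac{m}{2}, & \text{if } i \equiv j \wedge i \in [1, 2d], \\ \frac{m}{4}, & \text{if } i \equiv j \wedge i \in [2d+1, s-1]. \end{cases} \tag{A.17}$$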
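The discrete orthogonality can be checked numerically for $d = 2$ on a Chebyshev grid on $[-1, 1]^2$. The following is a minimal sketch with hypothetical helper names (`cheb_nodes`, `q`) and an illustrative choice of five nodes per coordinate; it assembles $Q$ as in (A.8) and compares $Q^T Q$ with the diagonal predicted by (A.17).

```julia
using LinearAlgebra

# Chebyshev nodes: the n roots of T_n on [-1, 1].
cheb_nodes(n) = [cos((2k - 1) * pi / (2n)) for k in 1:n]

# Basis of degree at most two for d = 2, ordered as in (A.11):
# [1, T1(x1), T1(x2), T2(x1), T2(x2), T1(x1)*T1(x2)].
q(x1, x2) = [1.0, x1, x2, 2 * x1^2 - 1, 2 * x2^2 - 1, x1 * x2]

n = 5                                                        # nodes per coordinate (illustrative)
nodes = cheb_nodes(n)
grid = [(a, b) for a in nodes for b in nodes]                # Chebyshev grid with m = n^2 points
m = length(grid)

Q = permutedims(reduce(hcat, [q(a, b) for (a, b) in grid]))  # m-by-6 matrix
G = Q' * Q
expected = Diagonal([m, m / 2, m / 2, m / 2, m / 2, m / 4])
println("largest deviation from the diagonal form: ", maximum(abs.(G - expected)))
```

With $m = 25$ the diagonal entries are 25, 12.5 (four times), and 6.25, so the condition number of $Q^T Q$ equals 4 for this configuration.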
Analogously to Listing 3.1, if we have the optimal coefficients as floating-point numbers, then a numerically stable approach to evaluate the function in (A.12) is Clenshaw's recurrence formula.
In Listing A.2, I present an example implementation of Clenshaw's algorithm for the evaluation of a univariate Chebyshev sum in the Julia PL. For a neat graph representation of the structure of Clenshaw's algorithm, see, e.g., [78, p. 199].
To avoid clutter, a modified version of this algorithm (see [75, p. 78f]) is left out. The modified version attempts to alleviate the hazard of rounding errors in cases in which the argument is close to the domain's boundary, e.g., the case $x := -1 + 1 \times 10^{3}\,\epsilon$ or the case $x := 1 - 1 \times 10^{3}\,\epsilon$, where the number $\epsilon$ denotes the machine epsilon in floating-point arithmetic.
LISTING A.2: An example implementation of Clenshaw's algorithm for the evaluation of a univariate Chebyshev sum in the Julia PL.

    function chebyshev_clenshaw_eval_1d(c::Vector{T}, x::T) where {T<:Real}
        N = length(c) - 1       # polynomial degree; c[k+1] is the coefficient of T_k
        d = zeros(T, N + 3)     # work vector; the spare zero entries also cover N = 0
        d[N+1] = c[N+1]         # start of the backward recurrence
        for i in N:-1:2
            d[i] = 2x * d[i+1] - d[i+2] + c[i]
        end
        return x * d[2] - d[3] + c[1]
    end
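A simple way to gain confidence in a Clenshaw implementation is to compare it with the direct sum based on $T_k(x) = \cos(k \arccos x)$. The sketch below is self-contained; the function names and the coefficients are hypothetical, and the Clenshaw variant shown here keeps two scalar accumulators instead of a work vector, so it is not the implementation of Listing A.2.

```julia
# Direct evaluation of sum_k c[k+1] * T_k(x) using T_k(x) = cos(k * acos(x)).
chebyshev_direct(c::Vector{Float64}, x::Float64) =
    sum(c[k+1] * cos(k * acos(x)) for k in 0:length(c)-1)

# Clenshaw recurrence with two scalar accumulators.
function chebyshev_clenshaw(c::Vector{Float64}, x::Float64)
    b1, b2 = 0.0, 0.0                       # b_{k+1} and b_{k+2}, both zero initially
    for k in length(c):-1:2                 # from the highest coefficient down to c_1
        b1, b2 = 2 * x * b1 - b2 + c[k], b1
    end
    return x * b1 - b2 + c[1]
end

c = [0.5, -1.0, 2.0, 0.25]                  # illustrative Chebyshev coefficients
x = 0.3
println(chebyshev_direct(c, x), " vs. ", chebyshev_clenshaw(c, x))
```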
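Concerning the multivariate case, if we consider the space $\mathcal{P}^2_{\le 2}$ with $\dim(\mathcal{P}^2_{\le 2}) \equiv 6$, then one can introduce the column vectors $\tilde{\tau}(x_2) \in \mathbb{R}^{6 \times 1}$ and $\tilde{\eta}(x_1) \in \mathbb{R}^{6 \times 1}$ and the diagonal matrix $\tilde{\Delta} \in \mathbb{R}^{6 \times 6}$ such that

$$\tilde{\tau}(x_2) := \begin{bmatrix} T_0(x_2) & T_0(x_2) & T_0(x_2) & T_1(x_2) & T_2(x_2) & T_1(x_2) \end{bmatrix}^T, \tag{A.18a}$$

$$\tilde{\eta}(x_1) := \begin{bmatrix} T_0(x_1) & T_1(x_1) & T_2(x_1) & T_0(x_1) & T_0(x_1) & T_1(x_1) \end{bmatrix}^T, \tag{A.18b}$$

$$\tilde{\Delta} := \operatorname{diag}(\tilde{c}_1, \tilde{c}_2, \tilde{c}_3, \tilde{c}_4, \tilde{c}_5, \tilde{c}_6), \tag{A.18c}$$

$$p(x) := \tilde{\tau}(x_2)^T\, \tilde{\Delta}\, \tilde{\eta}(x_1). \tag{A.18d}$$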
Finally, similarly to (3.69), one can apply an evaluation scheme based on multiple nested hierarchical evaluations of univariate Chebyshev sums to the formulation

$$p(x) := \sum_{j=1}^{\dim(\mathcal{P}^2_{\le 2})} \tilde{c}_j\, \tilde{\tau}_j(x_2)\, \tilde{\eta}_j(x_1). \tag{A.19}$$

For more details on evaluation algorithms for multivariate orthogonal polynomials such as Chebyshev polynomials, see, e.g., [17].
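The bivariate formulation with the two vectors and the diagonal coefficient matrix can be written down directly. The sketch below only mirrors the algebraic structure and does not implement the nested hierarchical scheme; the helper name `cheb`, the coefficient values, and the evaluation point are assumptions for illustration.

```julia
using LinearAlgebra

# Chebyshev polynomials T_0, T_1, T_2 (sufficient for degree at most two).
cheb(n, x) = n == 0 ? 1.0 : (n == 1 ? x : 2 * x^2 - 1)

tau(x2) = [cheb(0, x2), cheb(0, x2), cheb(0, x2), cheb(1, x2), cheb(2, x2), cheb(1, x2)]
eta(x1) = [cheb(0, x1), cheb(1, x1), cheb(2, x1), cheb(0, x1), cheb(0, x1), cheb(1, x1)]

c = [1.0, 0.5, -0.25, 2.0, 0.1, -1.5]             # illustrative coefficients
p(x1, x2) = tau(x2)' * Diagonal(c) * eta(x1)      # matrix form
p_sum(x1, x2) = sum(c .* tau(x2) .* eta(x1))      # equivalent sum form

x1, x2 = 0.3, -0.6
println(p(x1, x2), " vs. ", p_sum(x1, x2))
```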
Notably, the authors in [206] allude to a numerically stable evaluation scheme for a univariate Chebyshev sum based on the second (true) form of the barycentric formula in rational interpolation, where the number of Chebyshev nodes of the second kind has to be sufficiently large (cf. [22]). This evaluation scheme relies only on the information about the nodes and the corresponding values of the high-fidelity model. However, since the basic presumption in the present work is to keep the number of evaluation points fairly low, this formula is solely employed for testing purposes.
Regarding the usage of a deterministic data-fit low-fidelity model in an interpolation context, an additional aspect that is restricted by the basic presumption in the present work is the reduction of the empirical surrogate modeling error (see Definition 3.1.2) by sufficiently increasing the sample size and by appropriately positioning the sampling plan points. Some kind of adaptive interpolation, in the sense that an initial sampling plan is extended sequentially in a controlled manner, is solely regarded for the probabilistic data-fit low-fidelity model, that is, the kriging low-fidelity model.
Regarding the condition number $\kappa(Q^T Q)$, if $Q$ is orthonormal, then, by design, it holds that $\kappa(Q^T Q) \equiv 1$; and if $Q$ is only orthogonal, then, in principle, it holds that $\kappa(Q^T Q) < 9$. Comparing these observations to the observations in Table A.3, it can be argued that a change of basis, especially a change to a basis of Chebyshev polynomials, can have a favorable effect on the condition number. Mind that, from an application-oriented view, the regularization approach offers more flexibility concerning the choice of a basis and the choice of a sampling plan.
Appendix B
Solenoid with a core (§ 5.1)
B.1 An electrical network viewpoint
In Figure B.1, I depict the circuit diagram representation of the three fundamental passive electrical components, where the circles encode external terminal nodes of the passive electrical components.
FIGURE B.1: Circuit diagram representation of the three fundamental passive electrical components: resistance R, inductance L, and capacitance C.
If we use the map $U \in \mathbb{C}^{\mathbb{R}_+}$ and the map $I \in \mathbb{C}^{\mathbb{R}_+}$ to denote a complex-valued voltage drop and a complex-valued current intensity, respectively, where both maps depend on the angular frequency $\omega$, then, with respect to the passive electrical components in Figure B.1, the following equations hold:

$$U_R(\omega) = R \cdot I_R(\omega), \qquad U_L(\omega) = j\omega L \cdot I_L(\omega), \qquad I_C(\omega) = j\omega C \cdot U_C(\omega), \tag{B.1}$$

where it is assumed that the entities $R, L, C \in \mathbb{C}$ with $\operatorname{Re}(R), \operatorname{Re}(L), \operatorname{Re}(C) \in \mathbb{R}_+$ and $\operatorname{Im}(R) := 0$, $\operatorname{Im}(L) := 0$, and $\operatorname{Im}(C) := 0$. Hence, all the multiplication maps in (B.1) have the same signature $\mathbb{C} \times \mathbb{C} \to \mathbb{C}$ and the corresponding assignment rules refer to the common algebraic rules for complex numbers.
In Figure B.2, two representatives from the class of circuit diagrams for real inductive components (cf. [112, p. 520ff]) are depicted, where a rough colloquial equivalence relation can be defined by "has the same four-tuple of fundamental passive electrical components $(L_0, R_w, R_c, C_p)$ as".
Mind that in Figure B.2, due to the exposition in § 2.1.3, one can solely consider the resistance $R_w$ (associated with the losses in the winding), the resistance $R_c$ (associated with the losses in the core), and the inductance $L_0$ (associated with the magnetic energy) within the magnetoquasistatic model, where, in practical applications, the entity $L_0$ refers to the nominal inductance provided by a choke manufacturer's data sheet.
In the context of electromagnetic compatibility, though, it is common to take the capacitance $C_p$ into account as a parasitic component, too, in order to properly reconstruct a real inductive component's impedance map $Z \in \mathbb{C}^{\mathbb{R}_+}$ with $Z = \omega \mapsto \frac{U(\omega)}{I(\omega)}$ over a wide range of frequencies. Notice that, in contrast to the 2D-LBVP in § 5.1, the aim of a proper reconstruction of a real inductive component's impedance map prompts one to also incorporate the resistance $R_c$.
[Figure: two circuit diagrams built from the components $C_p$, $R_w$, $R_c$, and $L_0$. (A) Representative #1. (B) Representative #2.]
FIGURE B.2: Two representatives from the equivalence class of circuit diagrams for real inductive components with the equivalence relation "has the same four-tuple of fundamental passive electrical components $(L_0, R_w, R_c, C_p)$ as".
The assignment rules for the representatives in Figure B.2 read as
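$$Z_1(\omega) := \frac{\left( (R_c + R_w) + j\omega L_0 \right) \frac{1}{j\omega C_p}}{(R_w + R_c) + j\omega L_0 + \frac{1}{j\omega C_p}}, \qquad Z_2(\omega) := \frac{1}{\frac{1}{R_w + j\omega L_0} + \frac{1}{R_c} + j\omega C_p}, \tag{B.2}$$

where $Z_1(\omega)$ corresponds to the circuit diagram in Figure B.2a and $Z_2(\omega)$ corresponds to the circuit diagram in Figure B.2b.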
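The two impedance maps can be evaluated with Julia's built-in complex arithmetic. The sketch below is illustrative: the function names and the component values are assumptions (the values are synthetic), and the second pair of functions is obtained by multiplying out the fractions with $s = j\omega$, which provides a cross-check.

```julia
# Circuit form of the impedances for representative #1 and #2.
function Z1(ω, L0, Rw, Rc, Cp)
    series = (Rc + Rw) + im * ω * L0        # series R-L branch
    zc = 1 / (im * ω * Cp)                  # impedance of the parasitic capacitance
    return series * zc / (series + zc)      # parallel combination
end
Z2(ω, L0, Rw, Rc, Cp) = 1 / (1 / (Rw + im * ω * L0) + 1 / Rc + im * ω * Cp)

# Rational form in s = jω, obtained by multiplying out the fractions above.
Z1_rational(s, L0, Rw, Rc, Cp) =
    ((Rc + Rw) + L0 * s) / (1 + (Rw + Rc) * Cp * s + Cp * L0 * s^2)
Z2_rational(s, L0, Rw, Rc, Cp) =
    (Rc * Rw + Rc * L0 * s) / (Rc + Rw + (L0 + Cp * Rc * Rw) * s + Cp * Rc * L0 * s^2)

ω = 2π * 1e6                                # 1 MHz, illustrative
z1 = Z1(ω, 2e-6, 0.1, 1e-3, 10e-12)
z2 = Z2(ω, 10e-6, 0.01, 100.0, 1e-9)
println("|Z1| = ", abs(z1), " Ω, phase = ", rad2deg(angle(z1)), "°")
println("|Z2| = ", abs(z2), " Ω, phase = ", rad2deg(angle(z2)), "°")
```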
If we overload the impedance map $Z$ (and the maps $U$ and $I$ as well) in the sense that $Z \in \mathbb{C}^{\mathbb{C}}$ with $Z = s \mapsto \frac{U(s)}{I(s)}$, then one can rewrite the assignment rules in (B.2) by substituting the term $j\omega$ with the term $s$, which we conceive as $s := \sigma_s + j\omega_s$ with $\sigma_s, \omega_s \in \mathbb{R}$, such that

$$Z_1(s) := \frac{(R_c + R_w) + L_0 s}{1 + (R_w + R_c) C_p s + C_p L_0 s^2}, \qquad Z_2(s) := \frac{R_c R_w + R_c L_0 s}{R_c + R_w + (L_0 + C_p R_c R_w) s + C_p R_c L_0 s^2}, \tag{B.3a}$$

$$Z_1(s) := \frac{N_1(s)}{D_1(s)}, \qquad Z_2(s) := \frac{N_2(s)}{D_2(s)}, \tag{B.3b}$$

$$Z_1(s) := k_1 \frac{s - z_{11}}{(s - p_{11})(s - p_{12})}, \qquad Z_2(s) := k_2 \frac{s - z_{21}}{(s - p_{21})(s - p_{22})}, \tag{B.3c}$$

where $N_i \in \mathbb{C}^{\mathbb{C}}$ with $i \in \{1, 2\}$ denotes the complex numerator polynomial with real coefficients of the respective impedance map, and the map $D_i \in \mathbb{C}^{\mathbb{C}}$ denotes the corresponding complex denominator polynomial with real coefficients. The Julia PL package SymPy.jl (see [110]) is utilized in order to perform symbolic computations with regard to (B.2) and (B.3).
In (B.3), the non-negative numbers $k_1$ and $k_2$ are defined as $k_1 := \frac{L_0}{L_0 C_p}$ and $k_2 := \frac{L_0 R_c}{L_0 C_p R_c}$. Moreover, $z_{i1} \in \mathbb{C}$ with $i \in \{1, 2\}$ refers to the zero of the respective impedance map and $p_{ij} \in \mathbb{C}$ with $i \in \{1, 2\}$ and $j \in \{1, 2\}$ refers to the poles of the respective impedance map. If the poles have non-zero imaginary parts, i.e., $\forall i. \forall j.\ \operatorname{Im}(p_{ij}) \ne 0$, then one can define the resonance frequency $f_{r_i} \in \mathbb{R}_+$ for each $i$ as $f_{r_i} := \left| \frac{\operatorname{Im}(p_{i1})}{2\pi} \right|$.
A useful approximation of the resonance frequency is given by $f_{r_i} \approx \frac{1}{2\pi} \frac{1}{\sqrt{C_p L_0}}$. Observe that we limit our consideration of a real inductive component's impedance map $Z$ to the case of one resonance frequency.
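The relation between the poles and the resonance frequency can be illustrated for representative #1, whose denominator polynomial is $C_p L_0 s^2 + (R_w + R_c) C_p s + 1$. The sketch below uses synthetic component values (an assumption made only for illustration) and solves the quadratic directly rather than via a symbolic package.

```julia
# Synthetic component values, chosen only for illustration.
L0, Rw, Rc, Cp = 2e-6, 0.1, 1e-3, 10e-12

# Denominator of Z1(s): a*s^2 + b*s + c.
a, b, c = Cp * L0, (Rw + Rc) * Cp, 1.0
disc = complex(b^2 - 4 * a * c)             # negative here, hence complex poles
p11 = (-b + sqrt(disc)) / (2 * a)
p12 = (-b - sqrt(disc)) / (2 * a)

f_exact = abs(imag(p11)) / (2π)             # resonance frequency from the pole
f_approx = 1 / (2π * sqrt(Cp * L0))         # approximation without damping
println("f_r from pole: ", f_exact, " Hz, approximation: ", f_approx, " Hz")
```

For these weakly damped values both frequencies agree to many digits; the approximation is expected to degrade as the resistive terms grow relative to $\sqrt{L_0 / C_p}$.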
In Figure B.3, I illustrate the magnitude (or modulus) $Z(\omega)$ and the phase (or argument) $\theta(\omega)$ of the impedances in (B.2) associated with the representatives in Figure B.2 for synthetic data w.r.t. the four-tuple $(L_0, R_w, R_c, C_p)$ and the frequency range $[1 \times 10^{2}\,\mathrm{Hz}, 1 \times 10^{8}\,\mathrm{Hz}]$ (see footnote 1).
[Figure: four pairs of plots (A) to (D); each pair shows the magnitude $Z_i(\omega)$ in Ω on a logarithmic axis from $10^{-3}$ to $10^{7}$ and the phase $\theta_i(\omega)$ in degrees from −90 to 90 over the frequency in Hz from $10^{2}$ to $10^{8}$.]
(A) Given $Z_1(\omega)$, choose (2 µH, $R_w$, 1 mΩ, 10 pF) where $R_w \in$ [1 mΩ, 10 Ω].
(B) Given $Z_1(\omega)$, choose ($L_0$, 100 mΩ, 1 mΩ, 10 pF) where $L_0 \in$ [1 µH, 9 mH].
(C) Given $Z_2(\omega)$, choose (10 µH, $R_w$, 100 Ω, 1 nF) where $R_w \in$ [1 mΩ, 10 Ω].
(D) Given $Z_2(\omega)$, choose ($L_0$, 10 mΩ, 1 kΩ, 1 nF) where $L_0 \in$ [1 µH, 9 mH].
FIGURE B.3: Given the frequency range $[1 \times 10^{2}\,\mathrm{Hz}, 1 \times 10^{8}\,\mathrm{Hz}]$, the magnitude $Z(\omega)$ and the phase $\theta(\omega)$ of the impedances in (B.2) associated with the representatives in Figure B.2 for synthetic data w.r.t. the four-tuple $(L_0, R_w, R_c, C_p)$.
Footnote 1: It is supposed that the identification $\operatorname{abs}(Z) \equiv Z(\omega)$ is given, where abs refers to the single-valued absolute value function with the signature $\mathbb{C}^{\mathbb{R}_+} \to \mathbb{R}_+$, and it is supposed that the identification $\arg(Z) \equiv \theta(\omega)$ is given, where arg refers to the single-valued argument function with the signature $\mathbb{C}^{\mathbb{R}_+} \to \mathbb{R}_+$.
In order to move from the field theoretical level in (2.1) to the circuit theoretical level in (B.1), let us invoke Poynting's theorem (see, e.g., [139, p. 108ff]) for the frequency domain such that one can determine the three fundamental passive electrical components by the following identifications:

$$P_L \equiv \int_V \frac{1}{2}\, J_{\mathrm{cond}} \cdot E \, dV, \qquad W_m \equiv \int_V \frac{1}{4}\, B \cdot H \, dV, \qquad W_e \equiv \int_V \frac{1}{4}\, E \cdot D \, dV, \tag{B.4}$$

$$R \equiv \frac{P_L}{I_{\mathrm{rms}}^2}, \qquad L \equiv \frac{2 W_m}{I_{\mathrm{rms}}^2}, \qquad C \equiv \frac{2 W_e}{I_{\mathrm{rms}}^2}, \tag{B.5}$$

where $P_L \in \mathbb{R}_+$ denotes the time-averaged ohmic loss in $\Omega_{2\mathrm{D}}$, $W_m \in \mathbb{R}_+$ denotes the time-averaged magnetic energy in $\Omega_{2\mathrm{D}}$, and $W_e \in \mathbb{R}_+$ denotes the time-averaged electric energy in $\Omega_{2\mathrm{D}}$. In accordance with the elaborations in § 2.2.1, one can conceive the 3-tuple $(P_L, W_m, W_e)$ and the 3-tuple $(R, L, C)$ as 3-tuples of evaluated quantities of interest.
Notice that $W_e$ and $C$ are excluded from the considerations with regard to the 2D-LBVP in § 5.1. Expressing this in terms of the circuit diagram representative in Figure B.2a, one can set $R_c \equiv 0\,\mathrm{m\Omega}$, $C_p \equiv 0\,\mathrm{pF}$, $R_w \equiv R$, and $L_0 \equiv L$ such that the circuit diagram representative reduces to a series connection of the impedances associated with the resistance $R_w$ and the inductance $L_0$.
Furthermore, if we consider the map $U \in \mathbb{C}^{\mathbb{R}_+}$ and the map $I \in \mathbb{C}^{\mathbb{R}_+}$ with respect to the external terminal nodes, then one can express the resistance $R$ and the inductance $L$ as

$$R \equiv \operatorname{Re}\left( \frac{U(\omega)}{I(\omega)} \right), \qquad L \equiv \operatorname{Re}\left( \frac{\Psi(\omega)}{I(\omega)} \right), \tag{B.6}$$

where the map $\Psi \in \mathbb{C}^{\mathbb{R}_+}$ denotes the complex-valued total magnetic flux. By invoking the definition (ii) in (2.5) and applying the theorem of Stokes, the assignment rule of the map $\Psi$ can be stated as

$$\Psi = \omega \mapsto \oint_{\partial A} \mathbf{A} \cdot d\mathbf{s}. \tag{B.7}$$

Notice that, due to the numerical integration that is involved in determining entities such as $W_m$ in (B.4) or $\Psi(\omega)$ in (B.6), there might be slight differences in the decimal places with regard to the resistance $R$ and the inductance $L$ depending on the method of computation, even if we assume that the domain of integration is properly chosen.
B.2 A visualization of evaluated data-fit low-fidelity models
regarding (5.12), (5.14c), and (5.15)
[Figure: surface plots (1a), (2a), (3a) and contour plots (1b), (2b), (3b) over the axes $x_1$ [mm] and $x_2$ [mm]; the vertical axis $z$ is given in mW for panel (A) and in cm³ for panel (B).]
(A) Representation of $z := \tilde{\hat{j}}_{P_L, \omega_0}(x_1, x_2)$.
(B) Representation of $z := \tilde{V}_{ut}(x_1, x_2)$.
FIGURE B.4: By using the Sobol quasi-random sequence sampling plan with $m := 21$ and the data-fit low-fidelity models in § 3.2.1, i.e., (1) Polynomial, (2) TPS RBF, and (3) Kriging; representations (surface (a) and contour (b)) of $\tilde{\hat{j}}_{P_L, \omega_0}(x_1, x_2)$ and $\tilde{V}_{ut}(x_1, x_2)$ where $\omega_0 := 2\pi \cdot 100\,\mathrm{kHz}$.
[Figure: surface plots (1a), (2a), (3a) and contour plots (1b), (2b), (3b) over the axes $x_1$ [mm] and $x_2$ [mm]; the vertical axis $z$ is given in mW/cm³ for panel (A) and in µH for panel (B).]
(A) Representation of $z := \hat{j}_{P_L, V_{ut}, \omega_0}(x_1, x_2)$.
(B) Representation of $z := \hat{Q}_{L, \omega_0}(x_1, x_2)$.
FIGURE B.5: By using the Sobol quasi-random sequence sampling plan with $m := 21$ and the data-fit low-fidelity models in § 3.2.1, i.e., (1) Polynomial, (2) TPS RBF, and (3) Kriging; representations (surface (a) and contour (b)) of $\tilde{\hat{j}}_{P_L, V_{ut}, \omega_0}(x_1, x_2)$ and $\tilde{\hat{Q}}_{L, \omega_0}(x_1, x_2)$ where $\omega_0 := 2\pi \cdot 100\,\mathrm{kHz}$.
[Figure: contour plots (1a), (2a), (3a) and (1b), (2b), (3b) over the axes $x_1$ [mm] and $x_2$ [mm] with the projected gradient field.]
(A) Depicting $\operatorname{grad}(\tilde{K})(x_1, x_2)$ w.r.t. Figure B.4a.
(B) Depicting $\operatorname{grad}(\tilde{K})(x_1, x_2)$ w.r.t. Figure B.4b.
FIGURE B.6: Depicting $\operatorname{grad}(\tilde{K})(x_1, x_2)$ as a projection on the contour representation of the data-fit low-fidelity models for the functions in Figure B.4.
[Figure: contour plots (1a), (2a), (3a) and (1b), (2b), (3b) over the axes $x_1$ [mm] and $x_2$ [mm] with the projected gradient field.]
(A) Depicting $\operatorname{grad}(\tilde{K})(x_1, x_2)$ w.r.t. Figure B.5a.
(B) Depicting $\operatorname{grad}(\tilde{K})(x_1, x_2)$ w.r.t. Figure B.5b.
FIGURE B.7: Depicting $\operatorname{grad}(\tilde{K})(x_1, x_2)$ as a projection on the contour representation of the data-fit low-fidelity models for the functions in Figure B.5.
Bibliography
[1] S. Abramsky and N. Tzevelekos. “Introduction to Categories and Categorical Logic”. In: New Structures for Physics. Ed. by B. Coecke. Vol. 813. Lecture Notes in Physics. Springer, 2011, pp. 3–94.
[2] N. M. Alexandrov and M. Y. Hussaini (editors). Multidisciplinary Design Optimization: State of the Art. SIAM, 1997.
[3] N. M. Alexandrov and R. M. Lewis. “First-Order Approximation and Model Management in Optimization”. In: Large-Scale PDE-Constrained Optimization. Ed. by L. T. Biegler, M. Heinkenschloss, O. Ghattas, and B. van Bloemen Waanders. Vol. 30. Lecture Notes in Computational Science and Engineering. Springer, 2003, pp. 63–79.
[4] N. M. Alexandrov, R. M. Lewis, C. R. Gumbert, L. L. Green, and P. A. Newman. “Approximation and model management in aerodynamic optimization with variable-fidelity models”. In: Journal of Aircraft 38.6 (2001), pp. 1093–1101.
[5] S. Arlot and A. Celisse. “A survey of cross-validation procedures for model selection”. In: Statistics Surveys 4 (2010), pp. 40–79.
[6] D. N. Arnold. Finite Element Exterior Calculus. SIAM, 2018.
[7] D. N. Arnold. “Stability, Consistency, and Convergence of Numerical Discretizations”. In: Encyclopedia of Applied and Computational Mathematics. Ed. by B. Engquist. Springer, 2015.
[8] D. N. Arnold, R. S. Falk, and R. Winther. “Finite element exterior calculus, homological techniques, and applications”. In: Acta Numerica 15 (2006), pp. 1–155.
[9] K. Atkinson and W. Han. Theoretical Numerical Analysis: A Functional Analysis Framework. 3rd ed. Springer, 2009.
[10] C. Audet and M. Kokkolaras (editors). “Special Issue on Blackbox and Derivative-Free Optimization”. In: Optimization and Engineering 17.1 (2016), pp. 1–262.
[11] S. Awodey. Category Theory. 2nd ed. Oxford University Press, 2009.
[12] I. Babuška and J. T. Oden. “Verification and validation in computational engineering and science: basic concepts”. In: Comput. Methods Appl. Mech. Eng. 139.36 (2004), pp. 4057–4066.
[13] J. C. Baez and B. Fong. “A Compositional Framework for Passive Linear Networks”. In: Theory and Applications of Categories 33.38 (2018), pp. 1158–1222.
[14] J. W. Bandler, R. M. Biernacki, S. H. Chen, R. H. Hemmers, and K. Madsen. “Electromagnetic optimization exploiting aggressive space mapping”. In: IEEE Trans. Microwave Theory Tech. 43.12 (1995), pp. 2874–2882.
[15] H. P. Barendregt. The Lambda Calculus: Its Syntax and Semantics. Revised ed. North-Holland, 1985.
[16] M. Barr and C. Wells. “Category Theory for Computing Science”. In: Reprints in Theory and Applications of Categories 22 (2012), pp. 1–538.
[17] R. Barrio, J. M. Peña, and T. Sauer. “Three term recurrence for the evaluation of multivariate orthogonal polynomials”. In: Journal of Approximation Theory 162.2 (2010), pp. 407–420.
[18] M. S. Bazaraa, H. D. Sherali, and C. M. Shetty. Nonlinear Programming: Theory and Algorithms. 3rd ed. Wiley, 2006.
[19] R. K. Beatson, M. J. D. Powell, and A. M. Tan. “Fast evaluation of polyharmonic splines in three dimensions”. In: IMA Journal of Numerical Analysis 27.3 (2007), pp. 427–450.
[20] P. Benner, S. Gugercin, and K. Willcox. “A Survey of Projection-Based Model Reduction Methods for Parametric Dynamical Systems”. In: SIAM Review 57.4 (2015), pp. 483–531.
[21] P. Benner, M. Hinze, and E. J. W. ter Marten (editors). Model Reduction for Circuit Simulation. 1st ed. Springer, 2011.
[22] J.-P. Berrut and L. N. Trefethen. “Barycentric Lagrange Interpolation”. In: SIAM Review 46.3 (2004), pp. 501–517.
[23] L. Bessi. Surrogates in Julia. https://github.com/SciML/Surrogates.jl. [Online; accessed 12-February-2020]. 2020.
[24] H.-G. Beyer and H.-P. Schwefel. “Evolution strategies: A comprehensive introduction”. In: Natural Computing 1 (2002), pp. 3–52.
[25] J. Bezanson, J. Chen, B. Chung, S. Karpinski, V. B. Shah, J. Vitek, and L. Zoubritzky. “Julia: Dynamism and Performance Reconciled by Design”. In: Proc. ACM Program. Lang. 2.OOPSLA (2018), pp. 1–23.
[26] J. Bezanson, J. Edelman, S. Karpinski, and V. B. Shah. “Julia: A Fresh Approach to Numerical Computing”. In: SIAM Review 59.1 (2017), pp. 65–98.
[27] M. E. Biancolini. Fast Radial Basis Functions for Engineering Applications. 1st ed. Springer, 2017.
[28] P. B. Bochev and A. C. Robinson. “Matching algorithms with physics: exact sequences of finite element spaces”. In: Collected lectures on preservation of stability under discretization. Ed. by D. Estep and S. Tavener. Workshop on the preservation of stability under discretization. SIAM, 2001, pp. 145–166.
[29] D. E. Bockelman and W. R. Eisenstadt. “Combined Differential and Common-Mode Scattering Parameters: Theory and Simulation”. In: IEEE Transactions on Microwave Theory and Techniques 43.7 (1995), pp. 1530–1539.
[30] Z. Bontinck. Simulation and Robust Optimization for Electric Devices with Uncertainties. PhD thesis. Technische Universität Darmstadt, 2018.
[31] A. J. Booker, J. E. Dennis, P. D. Frank, D. B. Serafini, V. Torczon, and M. W. Trosset. “A rigorous framework for optimization of expensive functions by surrogates”. In: Structural Optimization 17.1 (1999), pp. 1–13.
[32] A. Bossavit. Computational Electromagnetism: Variational Formulations, Complementarity, Edge Elements. Academic Press, 1998.
[33] A. Bossavit. “Discretization of Electromagnetic Problems: The “Generalized Finite Differences” Approach”. In: Numerical Methods in Electromagnetics. Ed. by P. G. Ciarlet. Vol. 13. Handbook of Numerical Analysis. Elsevier, 2005, pp. 105–197.
[34] J. P. Boyd and K. W. Gildersleeve. “Numerical experiments on the condition
number of the interpolation matrices for radial basis functions”. In: Applied
Numerical Mathematics 61.4 (2011), pp. 443–459.
[35] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University
Press, 2004.
[36] R. Brown and T. Porter. “Category Theory: an abstract setting for analogy
and comparison”. In: What is Category Theory? Advanced Studies in Math-
ematics and Logic, Polimetrica Publisher. 2006, pp. 257–274.
[37] R. Brown and T. Porter. “The methodology of mathematics”. In: The Mathe-
matical Gazette 79.485 (1995), pp. 321–334.
[38] M. D. Buhmann. Radial Basis Functions – Theory and Implementation. 1st ed.
Cambridge University Press, 2003.
[39] H.-J. Bungartz and M. Griebel. “Sparse grids”. In: Acta Numerica 13 (2004),
pp. 147–269.
[40] S. Burgard, O. Farle, P. Loew, and R. Dyczij-Edlinger. “Fast Shape Optimiza-
tion of Microwave Devices Based on Parametric Reduced-Order Models”. In:
IEEE Transactions on Magnetics 50.2 (2014), pp. 629–632.
[41] R. M. Burkart. Advanced Modeling and Multi-Objective Optimization of Power
Electronic Converter Systems. ETH Zürich, PhD thesis. 2016.
[42] W. Burke. Applied Differential Geometry. 1st ed. Cambridge University Press,
1985.
[43] J. M. F. Castillo. “The Hitchhiker Guide to Categorical Banach Space Theory.
Part I”. In: Extracta Mathematicae 25.2 (2010), pp. 103–149.
[44] L. Codecasa. “Novel Approach to Model Order Reduction for Nonlinear Eddy-
Current Problems”. In: IEEE Transactions on Magnetics 51.3 (2015), pp. 1–4.
[45] B. Coecke and A. Kissinger. Picturing Quantum Processes – A First Course in
Quantum Theory and Diagrammatic Reasoning. Cambridge University Press,
2017.
[46] B. Coecke (editor). New Structures for Physics. Springer, 2011.
[47] A. R. Conn, K. Scheinberg, and L. N. Vicente. Introduction to Derivative-Free
Optimization. 1st ed. SIAM, 2009.
[48] S. D. Conte and C. de Boor. Elementary Numerical Analysis: An Algorithmic
Approach. 3rd ed. McGraw-Hill, 1980.
[49] G. Crevecoeur. Numerical Methods for Low Frequency Electromagnetic Optimiza-
tion and Inverse Problems using Multi-Level Techniques. Ghent University, PhD
thesis. 2009.
[50] G. Crevecoeur and R. H. De Staelen. “On cost function transformations for
the reduction of uncertain model parameters’ impact towards the optimal
solutions”. In: J. Comput. App. Math. 289 (2015), pp. 392–399.
[51] F. Cucker and D.-X. Zhou. Learning Theory: An Approximation Theory View-
point. Cambridge University Press, 2007.
[52] J. Culbertson and K. Sturtz. “A Categorical Foundation for Bayesian Proba-
bility”. In: Applied Categorical Structures 22.4 (2014), pp. 647–662.
[53] A. Dean, M. Morris, J. Stufken, and D. Bingham (editors). Handbook of Design
and Analysis of Experiments. 1st ed. CRC Press, 2015.
[54] M. C. Delfour and J.-P. Zolésio. Shapes and Geometries – Metrics, Analysis, Dif-
ferential Calculus, and Optimization. 2nd ed. SIAM, 2011.
[55] P. Deuflhard and A. Hohmann. Numerical Analysis in Modern Scientific Com-
puting: An Introduction. 2nd ed. Springer, 2003.
[56] D. Echeverría Ciaurri. Multi-Level Optimization: Space Mapping and Manifold
Mapping. Amsterdam University, PhD thesis. 2007.
[57] M. S. Eldred and D. M. Dunlavy. “Formulations for surrogate-based opti-
mization with data-fit, multifidelity and reduced-order models”. In: 2006 11th
AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference.
Portsmouth, U.S.A., September 6–8, 2006, pp. 1–20.
[58] C. Elliott. “The simple essence of automatic differentiation”. In: 2018 23rd
ACM SIGPLAN International Conference on Functional Programming (ICFP).
St. Louis, United States, September 23–29, 2018, pp. 1–29.
[59] M. H. van Emden and B. Moa. “Termination Criteria in the Moore-Skelboe
Algorithm for Global Optimization by Interval Arithmetic”. In: Frontiers in
Global Optimization. Ed. by C. A. Floudas and P. Pardalos. Springer, 2004.
[60] R. W. Erickson and D. Maksimovic. Fundamentals of Power Electronics. 2nd ed.
Springer, 2001.
[61] K.-T. Fang, R. Li, and A. Sudjianto. Design and Modeling for Computer Experi-
ments. 1st ed. CRC Press, 2006.
[62] G. E. Fasshauer. Meshfree Approximation Methods with Matlab. 1st ed. World
Scientific Publishing, 2007.
[63] G. E. Fasshauer and M. J. McCourt. “Stable Evaluation of Gaussian Radial
Basis Function Interpolants”. In: SIAM Journal on Scientific Computing 34.2
(2012), pp. 737–762.
[64] R. Feldt. Black-box optimization package for Julia.
https://github.com/robertfeldt/BlackBoxOptim.jl. [Online; accessed 12-February-2020]. 2020.
[65] J. L. Fiadeiro. Categories for Software Engineering. Springer, 2005.
[66] H. Flanders. Differential Forms with Applications to the Physical Sciences. 1st ed.
Academic Press, 1963.
[67] C. A. Floudas. Deterministic Global Optimization: Theory, Methods and Applica-
tions. 1st ed. Springer, 2000.
[68] B. Fong, D. I. Spivak, and R. Tuyéras. “Backprop as Functor: A compositional
perspective on supervised learning”. In: 2019 34th Annual ACM/IEEE Sym-
posium on Logic in Computer Science (LICS). Vancouver, Canada, June 24–
27, 2019, pp. 1–13.
[69] A. Forrester and A. Keane. “Recent advances in surrogate-based optimiza-
tion”. In: Progress in Aerospace Sciences 45.1–3 (2009), pp. 50–79.
[70] A. Forrester, A. Sóbester, and A. Keane. Engineering Design via Surrogate Mod-
elling - A Practical Guide. Wiley, 2008.
[71] P. Freyd. Abelian categories. An Introduction to the Theory of Functors. 1st ed.
Harper and Row, 1964.
[72] F. Genovese, A. Gryzlov, J. Herold, A. Knispel, M. Perone, E. Post, and A.
Videla. “idris-ct: A library to do category theory in Idris”. In: 2019 2nd ACT
Applied Category Theory Conference. Oxford, United Kingdom, July 15–19,
2019, pp. 1–13.
[73] R. Geroch. Mathematical Physics. University of Chicago Press, 1985.
[74] R. G. Ghanem and P. D. Spanos. Stochastic Finite Elements: A Spectral Approach.
1st ed. Springer, 1991.
[75] A. Gil, J. Segura, and N. M. Temme. Numerical Methods for Special Functions.
1st ed. SIAM, 2007.
[76] D. Ginsbourger, R. Le Riche, and L. Carraro. “Kriging Is Well-Suited to Par-
allelize Optimization”. In: Computational Intelligence in Expensive Optimization
Problems. Ed. by Y. Tenne and C. K. Goh. Springer, 2010.
[77] L. Giraldi, A. Litvinenko, D. Liu, H. G. Matthies, and A. Nouy. “To Be or Not
to Be Intrusive? The Solution of Parametric and Stochastic Equations - the
“Plain Vanilla” Galerkin Case”. In: SIAM Journal on Scientific Computing 36.6
(2014), pp. 2720–2744.
[78] R. Goldman. Pyramid Algorithms: A Dynamic Programming Approach to Curves
and Surfaces for Geometric Modeling. 1st ed. Morgan Kaufmann, 2003.
[79] L. Grasedyck, D. Kressner, and C. Tobler. “A literature survey of low-rank
tensor approximation techniques”. In: GAMM-Mitteilungen 36.1 (2013), pp. 53–
78.
[80] R. Griesse and B. Vexler. “Numerical Sensitivity Analysis for the Quantity
of Interest in PDE-Constrained Optimization”. In: SIAM J. Sci. Comput. 29.1
(2007), pp. 22–48.
[81] A. Griewank and A. Walther. Evaluating Derivatives: Principles and Techniques
of Algorithmic Differentiation. 2nd ed. SIAM, 2008.
[82] J. Groß. Linear Regression. 1st ed. Springer, 2003.
[83] P. W. Gross and P. R. Kotiuga. Electromagnetic Theory and Computation: A Topo-
logical Approach. 1st ed. Cambridge University Press, 2004.
[84] NEOS Guide. NEOS Server: State-of-the-Art Solvers for Numerical Optimization.
https://neos-guide.org/. [Online; accessed 12-February-2020]. 2020.
[85] B. Gustavsen and A. Semlyen. “Rational approximation of frequency do-
main responses by vector fitting”. In: IEEE Transactions on Power Delivery 14.3
(1999), pp. 1052–1061.
[86] S. Gutsche. Constructive Category Theory and Applications to Algebraic Geometry.
Universität Siegen, PhD thesis. 2017.
[87] R. T. Haftka, D. Villanueva, and A. Chaudhuri. “Parallel surrogate-assisted
global optimization with expensive functions - a survey”. In: Struct Multidisc
Optim 54 (2016), pp. 3–13.
[88] R. Harper. Practical Foundations for Programming Languages. 2nd ed. Cambridge
University Press, 2015.
[89] D. W. Hart. Introduction to Power Electronics. 1st ed. Prentice Hall, 1996.
[90] B. Hashemi and L. N. Trefethen. “Chebfun in Three Dimensions”. In: SIAM
Journal on Scientific Computing 39.5 (2017), pp. C341–C363.
[91] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning.
2nd ed. Springer, 2008.
[92] A. Hatcher. Algebraic Topology. 1st ed. Cambridge University Press, 2002.
[93] F. W. Hehl and Y. N. Obukhov. Foundations of Classical Electrodynamics. 1st ed.
Birkhäuser, 2003.
[94] J. S. Hesthaven, G. Rozza, and B. Stamm. Certified Reduced Basis Methods for
Parametrized Partial Differential Equations. Springer, 2015.
[95] M. Hintermüller and L. Vicente. “Space mapping for optimal control of par-
tial differential equations”. In: SIAM J. Optim 15.4 (2005), pp. 1002–1025.
[96] M. Hinze, R. Pinnau, M. Ulbrich, and S. Ulbrich. Optimization with PDE Con-
straints. Springer, 2009.
[97] R. Hiptmair, F. Krämer, and J. Ostrowski. “Robust Maxwell Formulations for
all Frequencies”. In: IEEE Transactions on Magnetics 44.6 (2008), pp. 682–685.
[98] R. Horst and H. Tuy. Global optimization: Deterministic approaches. 3rd ed.
Springer, 1995.
[99] A. S. Householder. The Theory of Matrices in Numerical Analysis. Dover, 2006.
[100] B. G. M. Husslage, G. Rennen, E. R. van Dam, and D. den Hertog. “Space-
filling Latin hypercube designs for computer experiments”. In: Optimization
and Engineering 12 (2011), pp. 611–630.
[101] Julia Interop. An interface for using MATLAB™ from Julia using the MATLAB
C api. https://github.com/JuliaInterop/MATLAB.jl. [Online; accessed
12-February-2020]. 2020.
[102] C. Ionescu and P. Jansson. “Domain-specific languages of mathematics: Pre-
senting mathematical analysis using functional programming”. In: 4th and
5th International Workshop on Trends in Functional Programming in Educa-
tion. 2016, pp. 1–15.
[103] J. D. Jackson. Classical Electrodynamics. 3rd ed. John Wiley & Sons, 1999.
[104] L. Jaulin, M. Kieffer, O. Didrit, and E. Walter. Applied Interval Analysis: With
Examples in Parameter and State Estimation, Robust Control and Robotics. 1st ed.
Springer, 2001.
[105] S. G. Johnson. HCubature.jl – A pure-Julia multidimensional h-adaptive integration.
https://github.com/JuliaMath/HCubature.jl. [Online; accessed
12-February-2020]. 2020.
[106] S. G. Johnson. Sobol low-discrepancy-sequence (LDS) package for Julia.
https://github.com/stevengj/Sobol.jl. [Online; accessed 12-February-2020].
2020.
[107] S. G. Johnson. The NLopt nonlinear-optimization package.
http://github.com/stevengj/nlopt. [Online; accessed 12-February-2020]. 2020.
[108] D. R. Jones. “A Taxonomy of Global Optimization Methods Based on Re-
sponse Surfaces”. In: Journal of Global Optimization 21.4 (2001), pp. 345–383.
[109] D. R. Jones, C. D. Perttunen, and B. E. Stuckman. “Lipschitzian optimization
without the Lipschitz constant”. In: J. Optimization Theory and Application 79.1
(1993), pp. 157–181.
[110] JuliaPy. A Julia interface to SymPy (a Python library for symbolic mathematics)
via PyCall. https://github.com/JuliaPy/SymPy.jl. [Online; accessed
12-February-2020]. 2020.
[111] J. Kangas, T. Tarhasaari, and L. Kettunen. “Maxwell equations and finite el-
ement software systems: object-oriented coding needs well defined objects”.
In: IEEE Transactions on Magnetics 36.4 (2000), pp. 1645–1648.
[112] M. K. Kazimierczuk. High-Frequency Magnetic Components. 2nd ed. Wiley, 2014.
[113] P. Kerschke, H. H. Hoos, F. Neumann, and H. Trautmann. “Automated Algo-
rithm Selection: Survey and Perspectives”. In: Evolutionary Computation 27.1
(2019), pp. 3–45.
[114] L. Kettunen, T. Kovanen, and T. Tarhasaari. “Electromagnetism and cross-
disciplinary problems”. In: 2016 URSI International Symposium on Electro-
magnetic Theory (EMTS). Espoo, Finland, August 14–18, 2016, pp. 500–501.
[115] D. Klis, M. Jochum, O. Farle, and R. Dyczij-Edlinger. “Application of nonlin-
ear model-order reduction to 3D eddy current problems”. In: 2013 Interna-
tional Conference on Electromagnetics in Advanced Applications (ICEAA).
Torino, Italy, September 09–13, 2013, pp. 344–347.
[116] M. J. Kochenderfer and T. A. Wheeler. Algorithms for Optimization. The MIT
Press, 2019.
[117] I. Kolář, P. W. Michor, and J. Slovák. Natural operations in differential geometry.
Springer, 1993.
[118] Y. Konkel, O. Farle, A. Köhler, A. Schultschik, and R. Dyczij-Edlinger. “Adap-
tive strategies for fast frequency sweeps”. In: COMPEL 30.2 (2011), pp. 1855–
1869.
[119] S. Koziel. “Space mapping: Performance, reliability, open problems and per-
spectives”. In: 2017 IEEE MTT-S International Microwave Symposium (IMS).
Honolulu, U.S.A., June 4–9, 2017, pp. 1512–1514.
[120] S. Koziel and J. W. Bandler. “Coarse and Surrogate Model Assessment for En-
gineering Design Optimization with Space Mapping”. In: 2007 IEEE/MTT-
S International Microwave Symposium (IMS). Honolulu, U.S.A., June 3–8,
2007, pp. 107–110.
[121] S. Koziel, J. W. Bandler, and K. Madsen. “Quality assessment of coarse models
and surrogates for space mapping optimization”. In: Optimization and Engi-
neering 9 (2008), pp. 375–391.
[122] S. Koziel and A. Bekasiewicz. “Sequential approximate optimisation for sta-
tistical analysis and yield optimisation of circularly polarised antennas”. In:
IET Microw. Antennas Propag 12.13 (2018), pp. 2060–2064.
[123] S. Koziel and A. Bekasiewicz. “Variable-fidelity design optimization of anten-
nas with automated model selection”. In: 2016 10th European Conference on
Antennas and Propagation (EuCAP). Davos, Switzerland, April 10–15, 2016,
pp. 1–5.
[124] S. Koziel and D. Echeverría Ciaurri. “Reliable Simulation-Driven Design Op-
timization of Microwave Structures Using Manifold Mapping”. In: Progress
In Electromagnetics Research B 26.1 (2010), pp. 361–382.
[125] S. Koziel, D. Echeverría Ciaurri, and L. Leifsson. “Chapter 3: Surrogate-Based
Methods”. In: Computational Optimization, Methods and Algorithms. Ed. by S.
Koziel and X. S. Yang. Vol. 356. Studies in Computational Intelligence.
Springer, 2011, pp. 33–59.
[126] S. Koziel, L. Leifsson, and X. S. Yang (editors). Solving Computationally Expen-
sive Engineering Problems: Methods and Applications. Springer, 2014.
[127] S. Koziel and L. Leifsson (editors). Surrogate-Based Modeling and Optimization:
Applications in Engineering. Springer, 2013.
[128] D. Kraft. “Algorithm 733: TOMP–Fortran modules for optimal control calcu-
lations”. In: ACM Transactions on Mathematical Software 20.3 (1994), pp. 262–
281.
[129] S. Kucherenko and B. Iooss. “Derivative-Based Global Sensitivity Measures”.
In: Handbook of Uncertainty Quantification. Ed. by R. Ghanem, D. Higdon, and
H. Owhadi. Cham: Springer, 2017. Chap. 36, pp. 1241–1263.
[130] J. Kuipers, A. Plaat, J. A. M. Vermaseren, and H. J. van den Herik. “Improving
multivariate Horner schemes with Monte Carlo tree search”. In: Computer
Physics Communications 184.11 (2013), pp. 2391–2395.
[131] S. Kurz and B. Auchmann. “Differential Forms and Boundary Integral Equa-
tions for Maxwell-Type Problems”. In: Fast Boundary Element Methods in Engi-
neering and Industrial Applications. Ed. by U. Langer, M. Schanz, O. Steinbach,
and W. Wendland. Springer, 2012.
[132] V. Lahtinen. Searching for Frontiers in Contemporary Eddy Current Model Based
Hysteresis Loss Modelling of Superconductors. Tampere University of Technol-
ogy, PhD thesis. 2014.
[133] V. Lahtinen, P. R. Kotiuga, and A. Stenvall. An electrical engineering perspective
on missed opportunities in computational physics. arXiv:1809.01002v2. 2018.
[134] S. Lang. Differential manifolds. 2nd ed. Springer, 1985.
[135] J. Larson, M. Menickelly, and S. M. Wild. “Derivative-free optimization meth-
ods”. In: Acta Numerica 28 (2019), pp. 287–404.
[136] F. W. Lawvere and S. H. Schanuel. Conceptual Mathematics. 2nd ed. Cambridge
University Press, 2009.
[137] L. Lebensztajn, C. A. R. Marretto, M. C. Costa, and J.-L. Coulomb. “Kriging: A
Useful Tool for Electromagnetic Device Optimization”. In: IEEE Transactions
on Magnetics 40.2 (2004), pp. 1196–1199.
[138] M. C. Lehmann, M. Hadžiefendić, A. Piwonski, and R. Schuhmann. “Encoding
Electromagnetic Transformation Laws for Dimensional Reduction”. In:
International Journal of Numerical Modelling: Electronic Networks, Devices and
Fields 33.1 (2020). https://doi.org/10.1002/jnm.2747, e2747.
[139] G. Lehner. Electromagnetic Field Theory for Engineers and Physicists. 1st ed.
Springer, 2010.
[140] P. Linz. Theoretical Numerical Analysis: An Introduction to Advanced Techniques.
1st ed. John Wiley & Sons, 1979.
[141] E. Ljungskog. Interpolation of scattered data in Julia.
https://github.com/eljungsk/ScatteredInterpolation.jl. [Online; accessed 12-February-2020].
2020.
[142] D. G. Luenberger. Optimization by Vector Space Methods. Wiley, 1969.
[143] H. D. Macedo and J. N. Oliveira. “Typing linear algebra: A biproduct-oriented
approach”. In: Science of Computer Programming 78.11 (2013), pp. 2160–2191.
[144] S. MacLane. Categories for the Working Mathematician. Springer, 1971.
[145] N. Marheineke and R. Pinnau. “Model hierarchies in space mapping opti-
mization: Feasibility study for transport processes”. In: J. Comput. Meth. Sci.
Eng 12.1,2 (2012), pp. 63–74.
[146] R. T. Marler and J. S. Arora. “Survey of Multi-Objective Optimization Meth-
ods for Engineering”. In: Structural and Multidisciplinary Optimization 26.6
(2004), pp. 369–395.
[147] J. R. R. A. Martins and A. B. Lambe. “Multidisciplinary Design Optimization:
A Survey of Architectures”. In: AIAA Journal 51.9 (2013), pp. 2049–2075.
[148] P. McCullagh. “What is a statistical model?” In: The Annals of Statistics 30.5
(2002), pp. 1225–1310.
[149] D. Meeker. Finite Element Method Magnetics (FEMM4.2).
http://www.femm.info/. 2017.
[150] K. Miettinen. Nonlinear Multiobjective Optimization. Kluwer Academic Pub-
lishers, 1999.
[151] P. K. Mogensen and A. N. Riseth. “Optim: A mathematical optimization pack-
age for Julia”. In: Journal of Open Source Software 3.24 (2018), p. 615.
[152] N. Mohan, T. M. Undeland, and W. P. Robbins. Power Electronics: Converters,
Applications, and Design. 3rd ed. Wiley, 2002.
[153] P. Monk. Finite Element Methods for Maxwell’s Equations. 1st ed. Oxford Uni-
versity Press, 2003.
[154] J. Mühlethaler. Modeling and multi-objective optimization of inductive power com-
ponents. ETH Zürich, PhD thesis. 2012.
[155] K. P. Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
[156] R. D. Neidinger. “Introduction to Automatic Differentiation and MATLAB
Object-Oriented Programming”. In: SIAM Review 52.3 (2010), pp. 545–563.
[157] A. Neumaier. “Complete search in continuous global optimization and con-
straint satisfaction”. In: Acta Numerica 13 (2004), pp. 271–369.
[158] J. Nocedal and S. J. Wright. Numerical Optimization. 2nd ed. Springer, 2006.
[159] W. L. Oberkampf and C. J. Roy. Verification and Validation in Scientific Comput-
ing. 1st ed. Cambridge University Press, 2010.
[160] J. T. Oden and S. Prudhomme. “Estimation of Modeling Error in Computa-
tional Mechanics”. In: Journal of Computational Physics 182.2 (2002), pp. 496–
515.
[161] J. Oliver. “Rounding error propagation in polynomial evaluation schemes”.
In: Journal of Computational and Applied Mathematics 5.2 (1979), pp. 85–97.
[162] S. Patel. LaTeX Templates: Masters/Doctoral Thesis.
https://www.latextemplates.com/template/masters-doctoral-thesis. [Online; accessed
12-February-2020]. 2020.
[163] E. Patterson. Catlab – A framework for applied category theory in the Julia language.
https://github.com/AlgebraicJulia/Catlab.jl. [Online; accessed
12-February-2020]. 2020.
[164] C. R. Paul. Introduction to Electromagnetic Compatibility. 2nd ed. Wiley, 2006.
[165] B. Peherstorfer, K. Willcox, and M. Gunzburger. “Optimal model management
for multifidelity Monte Carlo estimation”. In: SIAM Journal on Scientific
Computing 38.5 (2016), pp. 3163–3194.
[166] B. Peherstorfer, K. Willcox, and M. Gunzburger. “Survey of Multifidelity
Methods in Uncertainty Propagation, Inference, and Optimization”. In: SIAM
Review 60.3 (2018), pp. 550–591.
[167] B. C. Pierce. Basic Category Theory for Computer Scientists. MIT Press, 1991.
[168] S. Posur. Constructive Category Theory and Applications to Equivariant Sheaves.
Universität Siegen, PhD thesis. 2017.
[169] M. J. D. Powell. Approximation Theory and Methods. 1st ed. Cambridge Univer-
sity Press, 1981.
[170] M. J. D. Powell. “Direct search algorithms for optimization calculations”. In:
Acta Numerica 7 (1998), pp. 287–336.
[171] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical
Recipes - The Art of Scientific Computing. 3rd ed. Cambridge University Press,
2007.
[172] C. Psarras, H. Barthels, and P. Bientinesi. The Linear Algebra Mapping Problem.
arXiv:1911.09421v1. 2019.
[173] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning.
MIT Press, 2006.
[174] P. Raumonen. Mathematical structures for dimensional reduction and equivalence
classification of electromagnetic boundary value problems. Tampere University of
Technology, PhD thesis. 2009.
[175] J. Revels, M. Lubin, and T. Papamarkou. Forward-Mode Automatic Differentia-
tion in Julia. arXiv:1607.07892. 2016.
[176] J. A. Richardson and J. L. Kuester. “The complex method for constrained op-
timization”. In: Commun. ACM 16.8 (1973), pp. 487–489.
[177] E. Riehl. Category Theory in Context. Dover, 2016.
[178] U. Römer. Numerical Approximation of the Magnetoquasistatic Model with Un-
certainties and its Application to Magnet Design. Technische Universität Darm-
stadt, PhD thesis. 2015.
[179] A. A. Rodríguez and A. Valli. Eddy Current Approximation of Maxwell Equations
– Theory, algorithms and applications. 1st ed. Springer, 2010.
[180] S. Roman. An Introduction to the Language of Category Theory. Birkhäuser, 2017.
[181] G. Roussos and B. J. C. Baxter. “Rapid evaluation of radial basis functions”.
In: Journal of Computational and Applied Mathematics 180.1 (2005), pp. 51–70.
[182] C. J. Roy and W. L. Oberkampf. “A comprehensive framework for verifi-
cation, validation, and uncertainty quantification in scientific computing”.
In: Computer Methods in Applied Mechanics and Engineering 200.25-28 (2011),
pp. 2131–2144.
[183] W. Rudin. Principles of Mathematical Analysis. 3rd ed. McGraw–Hill, 1976.
[184] J. Sacks, W. J. Welch, T. J. Mitchell, and H. P. Wynn. “Design and Analysis of
Computer Experiments”. In: Statistical Science 4.4 (1989), pp. 409–435.
[185] B. S. Saini, M. Lopez-Ibanez, and K. Miettinen. “Automatic surrogate mod-
elling technique selection based on features of optimization problems”. In:
2019 The Genetic and Evolutionary Computation Conference (GECCO).
Prague, Czech Republic, July 13–17, 2019, pp. 1765–1772.
[186] D. P. Sanders. Rigorous global optimisation package for Julia.
https://github.com/JuliaIntervals/IntervalOptimisation.jl. [Online; accessed
12-February-2020]. 2020.
[187] R. Schaback and H. Wendland. “Kernel techniques: From machine learning
to meshless methods”. In: Acta Numerica 15 (2006), pp. 543–639.
[188] M. Scheuerer, R. Schaback, and M. Schlather. “Interpolation of spatial data
– A stochastic or a deterministic problem?” In: European Journal of Applied
Mathematics 24.4 (2013), pp. 601–629.
[189] W. H. Schilders, H. A. van der Vorst, and J. Rommes (editors). Model Order
Reduction: Theory, Research Aspects and Applications. 1st ed. Springer, 2008.
[190] R. B. Schnabel. “Parallel Nonlinear Optimization: Limitations, Challenges,
and Opportunities”. In: Algorithms for Continuous Optimization: The State of the
Art. Ed. by E. Spedicato. Vol. 434. NATO ASI Series (Series C: Mathematical
and Physical Sciences). Springer, 1994, pp. 531–559.
[191] S. Y. Serovajsky. “Differentiation Functor and Its Application in the Optimiza-
tion Control Theory”. In: Fourier Analysis. Ed. by Michael Ruzhansky and
Ville Turunen. Springer International Publishing, 2014, pp. 335–347.
[192] S. Shalev-Shwartz and S. Ben-David. Understanding Machine Learning: From
Theory to Algorithms. Cambridge University Press, 2014.
[193] C. H. da Silva Santos, M. S. Gonçalves, and H. E. Hernandez-Figueroa. “De-
signing Novel Photonic Devices by Bio-Inspired Computing”. In: IEEE Pho-
tonics Technology Letters 22.15 (2010), pp. 1177–1179.
[194] J. Søndergaard. Optimization using surrogate models - by the space mapping tech-
nique. Technical University of Denmark, PhD thesis. 2003.
[195] J. C. Spall. Introduction to Stochastic Search and Optimization: Estimation, Simu-
lation, and Control. 1st ed. Wiley, 2003.
[196] P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search. 2nd
ed. MIT Press, 2001.
[197] D. I. Spivak. “Categories as Mathematical Models”. In: Categories for the
Working Philosopher (ed. by Elaine Landry), Oxford University Press. 2017,
pp. 381–401.
[198] T. Steinmetz, S. Kurz, and M. Clemens. “Domains of validity of quasistatic
and quasistationary field approximations”. In: COMPEL 30.4 (2011), pp. 1237–
1247.
[199] A. Stenvall and V. Lahtinen. “The Methodology of HTS AC-Loss Modeling”.
In: IEEE Transactions on Applied Superconductivity 29.5 (2019), pp. 1–7.
[200] G. Strang. Computational Science and Engineering. 1st ed. Wellesley-Cambridge
Press, 2007.
[201] T. J. Sullivan. Introduction to Uncertainty Quantification. 1st ed. Springer, 2015.
[202] S. Surjanovic and D. Bingham. Virtual Library of Simulation Experiments: Test
Functions and Datasets. http://www.sfu.ca/~ssurjano. [Online; accessed
12-February-2020]. 2020.
[203] K. Svanberg. “A class of globally convergent optimization methods based
on conservative convex separable approximations”. In: SIAM J. Optim. 12.2
(2002), pp. 555–573.
[204] W. P. Thurston. “On proof and progress in mathematics”. In: Bulletin of the
American Mathematical Society 30.2 (1994), pp. 161–177.
[205] E. Tonti. The Mathematical Structure of Classical and Relativistic Physics: A Gen-
eral Classification Diagram. Springer, 2013.
[206] A. Townsend and L. N. Trefethen. “An Extension of Chebfun to Two Dimen-
sions”. In: SIAM Journal on Scientific Computing 35.6 (2013), pp. C495–C518.
[207] J. F. Traub and A. G. Werschulz. Complexity and Information. 1st ed. Cambridge
University Press, 1998.
[208] L. N. Trefethen. Approximation Theory and Approximation Practice. 1st ed. SIAM,
2013.
[209] L. N. Trefethen. “Six myths of polynomial interpolation and quadrature”. In:
Mathematics Today 47.4 (2012), pp. 184–188.
[210] F. Tröltzsch. Optimal Control of Partial Differential Equations – Theory, Methods
and Applications. Graduate Studies in Mathematics, Volume: 112. American
Mathematical Society, 2010.
[211] F. Tröltzsch and A. Valli. “Optimal control of low-frequency electromagnetic
fields in multiply connected conductors”. In: Optimization 65.9 (2016), pp. 1651–
1673.
[212] R. Trobec and G. Kosec. Parallel Scientific Computing: Theory, Algorithms, and
Applications of Mesh Based and Meshless Methods. Springer, 2015.
[213] W. Tucker. Validated Numerics: A Short Introduction to Rigorous Computations.
Princeton University Press, 2011.
[214] The Univalent Foundations Program. Homotopy Type Theory: Univalent Foundations
of Mathematics. Institute for Advanced Study:
https://homotopytypetheory.org/book, 2013.
[215] M. Urquhart. Julia package for the creation of optimised Latin Hypercube Sampling
Plans. https://github.com/MrUrq/LatinHypercubeSampling.jl. [Online;
accessed 12-February-2020]. 2020.
[216] M. Urquhart, E. Ljungskog, and S. Sebben. “Surrogate-based optimisation
using adaptively scaled radial basis functions”. In: Applied Soft Computing 88
(2020), pp. 1–17.
[217] S. Voß. “Meta-heuristics: The State of the Art”. In: Local Search for Planning
and Scheduling (LSPS 2000). Ed. by A. Nareyek. Springer, 2001.
[218] P. Šolin. Partial Differential Equations and the Finite Element Method. 1st ed. John
Wiley & Sons, 2006.
[219] R. F. C. Walters. Categories and Computer Science. Cambridge Computer Sci-
ence Texts, 1991.
[220] Q. Wang, X. Zhang, R. Burgos, D. Boroyevich, A. White, and M. Kheraluwala.
“Design and optimization of a high performance isolated three phase AC/DC
converter”. In: 2016 IEEE Energy Conversion Congress and Exposition (ECCE).
Milwaukee, U.S.A., September 18–22, 2016, pp. 1–10.
[221] R. Webster and M. A. Oliver. Geostatistics for Environmental Scientists. 2nd ed.
John Wiley & Sons, 2007.
[222] B. Wen et al. “Integrated Design by Optimization of Electrical Power Systems
for More Electric Aircraft”. In: 2015 More Electric Aircraft (MEA). Toulouse,
France, February 3–5, 2015, pp. 1–4.
[223] T. Wittig. Zur Reduzierung der Modellordnung in elektromagnetischen Feldsimu-
lationen. Technische Universität Darmstadt, PhD thesis. 2004.
[224] D. H. Wolpert and W. G. Macready. “No Free Lunch Theorems for Optimiza-
tion”. In: IEEE Transactions on Evolutionary Computation 1.1 (1997), pp. 67–82.
[225] N. S. Yanofsky. “Towards a Definition of an Algorithm”. In: Journal of Logic
and Computation 21.2 (2010), pp. 253–286.
[226] K. Yosida. Functional Analysis. 6th ed. Springer, 1995.
[227] S. Zaglmayr. High Order Finite Element Methods for Electromagnetic Field Com-
putation. Johannes Kepler Universität Linz, PhD thesis. 2006.
[228] Z.-H. Zhan, J. Zhang, Y. Li, and H. S. Chung. “Adaptive Particle Swarm Op-
timization”. In: IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cy-
bernetics) 39.6 (2009), pp. 1362–1381.