CIplus
Band 3/2017
Simulation-based Test Functions for Optimization Algorithms
Martin Zaefferer, Andreas Fischbach, Boris Naujoks, Thomas Bartz-Beielstein
[first name].[last name]@th-koeln.de
TH Köln, Faculty of Computer Science and Engineering Science
Steinmüllerallee 1, 51643 Gummersbach, Germany
ABSTRACT
When designing or developing optimization algorithms, test functions are crucial to evaluate performance. Often, test functions are not sufficiently difficult, diverse, flexible or relevant to real-world applications. Previously, test functions with real-world relevance were generated by training a machine learning model based on real-world data. The model estimation is used as a test function. We propose a more principled approach using simulation instead of estimation. Thus, relevant and varied test functions are created which represent the behavior of real-world fitness landscapes. Importantly, estimation can lead to excessively smooth test functions while simulation may avoid this pitfall. Moreover, the simulation can be conditioned by the data, so that the simulation reproduces the training data but features diverse behavior in unobserved regions of the search space. The proposed test function generator is illustrated with an intuitive, one-dimensional example. To demonstrate the utility of this approach it is applied to a protein sequence optimization problem. This application demonstrates the advantages as well as practical limits of simulation-based test functions.
CCS CONCEPTS
• Theory of computation → Mathematical optimization; Gaussian processes; • Computing methodologies → Modeling and simulation;
KEYWORDS
Optimization, Test function generator, Simulation, Modeling
ACM Reference format:
Martin Zaefferer, Andreas Fischbach, Boris Naujoks, Thomas Bartz-Beielstein. 2017. Simulation-based Test Functions for Optimization Algorithms. In Proceedings of GECCO ’17, Berlin, Germany, July 15-19, 2017, 8 pages.
DOI: http://dx.doi.org/10.1145/3071178.3071190
1 INTRODUCTION
A crucial issue for the development, improvement and understanding of optimization algorithms are performance tests or benchmarks. Test functions are required to evaluate the performance of algorithms. It is particularly difficult to provide test functions for expensive optimization problems, where evaluations require high computational effort or other limited resources. Often, expensive optimization problems necessitate access to complex, confidential simulation codes, or access to expensive laboratory equipment. Even if access is granted, the evaluation costs make extensive tests infeasible. Only a limited number of expensive problems is openly available to the research community.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
GECCO ’17, Berlin, Germany
© 2017 Copyright held by the owner/author(s). Publication rights licensed to ACM. 978-1-4503-4920-8/17/07...$15.00
DOI: http://dx.doi.org/10.1145/3071178.3071190

Thus, we need a generator for test functions which satisfy certain criteria. Besides important features, which are listed in well-known publications (e.g., [6, 30]) we focus on the following criteria.
(C.1) Difficulty: Test functions should be sufficiently complex [21]. Whitley [30] states that test “problems should be resistant to hill-climbing”.
(C.2) Diversity: The problem instances are varied, randomized and not known a priori. This criterion is a standard in machine learning, because the available set of problem instances is partitioned into a training, a validation, and a test set [13].
(C.3) Flexibility: They should not be restricted to one specific problem instance. Flexibility is used in machine learning to characterize the number of parameters that are necessary to specify a model [15]. Flexibility will be used in our framework for characterizing functions. Some authors use the term “generalizability” to characterize this feature [2].
(C.4) Relevance: They should reflect real-world problem behavior.
(C.5) Evaluation cost: They should be inexpensive to evaluate, allowing for numerous tests.
One way to provide test functions that satisfy criteria (C.1)-(C.5) is to generate data-driven regression models of the objective function and use the derived predictor to test algorithms [2, 7, 8, 25]. This approach has an inherent problem: Almost all regression models interpolate the data they are trained on and hence yield smoothed fitness landscapes. Thus, the derived instances may be less rugged and easier to solve than the real-world problem. Therefore, data-driven test functions should in addition respect the following criterion:
(C.6) Non-smoothing, i.e., the test instances reflect the ruggedness of the original problem.
Thus, the main research question examined in this study is: How to generate test functions that satisfy criteria (C.1)-(C.6)? To that end, we propose a general framework for generating test functions based on real-world data using simulation rather than estimation (prediction) techniques. Decisively, a simulation has the potential to avoid the pitfall of smoothing. Furthermore, it provides a principled way to generate diverse test function instances. To illustrate these features, we train Kriging models [5] on real-world data. The key idea is to use non-conditional and conditional simulation [5] of Kriging models to generate varied problem instances that do not smoothen the potentially rugged structure of the real-world problem. The simulation-based test functions can reflect the behavior of the real-world problem rather than just the data itself. These test functions are especially interesting for expensive optimization problems but obviously also apply to the cheap case.

"© Martin Zaefferer, Andreas Fischbach, Boris Naujoks, Thomas Bartz-Beielstein. 2017. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive version was published in the proceedings of GECCO'17, Berlin, Germany, http://dx.doi.org/10.1145/3071178.3071190."

Related approaches will be explained in Sec. 2. Afterwards, Sec. 3 will provide the details of the (non-)conditional Kriging simulation-based generator. A simple example is given in Sec. 4. To present a more complex application, the method is applied to a real-world data set in Sec. 5. Here, we also investigate practical limits of the approach. Section 6 discusses the applicability of our approach. Finally, Sec. 7 presents a summary and outlook of this work.
2 RELATED WORK
The most basic test function is a simple mathematical expression, e.g., the sphere function, which reflects the behavior of many test functions in the vicinity of the optimum [21]. In many studies, sets of such expressions are used as test beds, e.g., combining the sphere, Branin, or Rosenbrock functions. These test suites should obey certain principles, e.g., nonlinearity, non-separability, and scalability [30]. The benefit of using sets of well established functions is that they enable comparability between different studies and can be used to guarantee generalizable results. Still, certain algorithms could easily be tailored to overfit a specific test bed, because the test functions are known in advance, i.e., before the study is performed. Furthermore, the capability of representing complex real-world behavior is probably limited. The generating principle of these classical test function suites [6, 21, 27] can be described as inductive, because single, simple features such as symmetry or multimodality are combined to generate a complex test function.
A more comprehensive approach is taken by the Comparing Continuous Optimizer platform (COCO), also known as the Black Box Optimization Benchmark (BBOB) [11]. BBOB comprises a framework that automates the experimental procedure involved in testing of continuous optimization algorithms. BBOB takes an inductive approach, relying on artificial test functions [12]. BBOB provides an enhanced procedure for post-processing of experimental results to enable a standardized comparison and analysis.
The Gaussian Landscape Generator (GLG), which was proposed by Gallagher and Yuan [10], is also an inductive approach. However, it is not based on a fixed set of functions. Rather, it randomly composes Gaussian curves. The overall fitness value is the maximum of all curves at a given point. One advantage of the GLG is the ability to control the number of local optima. Thus, the complexity of the resulting test function instances can be controlled. Also, the randomized process allows for a large variety of test functions. However, the relevance of the resulting functions is debatable. Similar test function generators are described in [1].
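To make the "maximum of all curves" idea concrete, the following is a minimal sketch and not the GLG implementation of [10]. It assumes isotropic Gaussian bumps with a common width; the function name `make_gaussian_landscape`, the width, and the sampling ranges for centers and heights are hypothetical choices for illustration only.

```python
import math
import random

def make_gaussian_landscape(n_curves, dim, seed=0):
    """Return a fitness function that is the maximum over random Gaussian bumps."""
    rng = random.Random(seed)
    # Assumed setup: centers uniform in the unit cube, heights in [0.5, 1].
    centers = [[rng.uniform(0.0, 1.0) for _ in range(dim)] for _ in range(n_curves)]
    heights = [rng.uniform(0.5, 1.0) for _ in range(n_curves)]
    width = 0.1  # assumed common width of all bumps

    def fitness(x):
        values = []
        for c, h in zip(centers, heights):
            sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
            values.append(h * math.exp(-sq_dist / (2.0 * width ** 2)))
        return max(values)  # overall fitness: maximum of all curves at x

    return fitness, centers, heights

f, centers, heights = make_gaussian_landscape(n_curves=5, dim=2)
```

In this simplified setting the number of bumps bounds the number of local optima from above, which illustrates how such a generator gives control over complexity.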
The Krigifier approach creates random Kriging models, or Gaussian processes [29]. With a user-specified trend and covariance structure, the Krigifier randomly creates a process that can be used as a non-linear test function. Thus, varied and difficult functions can be generated based on an inductive approach. The relevance of the resulting functions relies on the assumption that real-world processes are also Gaussian, but it is unclear how the resulting test functions relate to a specific real-world application.
A deductive approach has been employed for a practical application by Rudolph et al. [25]. Deductive approaches take a complex data set and extract important features using data-driven methods. Rudolph et al. aim to improve algorithm performance on the real problem (optimization of a ship propulsion system) by tuning performance on a Kriging surrogate model.
Bartz-Beielstein [2] proposed a deductive approach for optimization benchmarks in general. Data from a real-world system are used to train a model and for generating test function instances. Model parameters can be stochastically varied to enable diversity. Statistical tools such as mixed models are also discussed [4].
Similarly, Flasch [8] and Fischbach et al. [7] used a deductive approach based on Kriging models to generate test functions. Firstly, real-world data is taken from some experiment. Secondly, a Kriging model is trained with the data. The Kriging model is varied by making controlled changes to the model parameters, e.g., the nugget parameter or parameters of the correlation function. Hence, it will be referred to as the parameter-variation approach. Then, the predictor of the varied Kriging model can be used as a test function. In principle, this approach can be applied to arbitrary models and it is not restricted to Kriging or Gaussian processes. An extension by Fischbach et al. [7] takes two problems into account: (i) if a model is insensitive to some parameter, the derived test function instances will be nearly identical and (ii) if a parameter has a drastic impact, a random change may create a function without any resemblance to the original function. Both problems are handled by computing various measures of (dis-)similarity between the test function instances and the unmodified model. The computed values are required to be within user-specified bounds. Thus, simple copies (too similar) and strong distortions (not similar enough) are avoided.
Based on these results, we propose a new deductive simulation approach. The goal is to generate data-driven test functions that fulfill criteria (C.1) to (C.5) and avoid the pitfall of smoothing (C.6). The corresponding methods and required foundations are introduced in the following.
3 SIMULATION-BASED TEST FUNCTION GENERATOR
3.1 Kriging Estimation
Kriging is a modeling procedure that understands observations as realizations of a Gaussian process. A detailed description is given by Forrester et al. [9]. In optimization, Kriging is a popular choice, as it additionally provides an estimate of prediction uncertainty, which can be used to balance exploration and exploitation by computing the expected improvement of candidate solutions [19]. This approach is most famously employed in the Efficient Global Optimization algorithm (EGO) [16]. Our study utilizes Kriging to simulate responses of a Gaussian process.
Kriging approximates the data by modeling the correlation between observations, e.g., using a Gaussian correlation function (kernel) $k(x, x') = \exp(-\theta d(x, x'))$, where $x, x' \in \mathcal{X}$. Here, $\mathcal{X}$ is some non-empty set, called the search space. If $\mathcal{X}$ is a real coordinate space, then $x$ is a real vector of the corresponding dimension. Furthermore, $\theta \in \mathbb{R}$ is a kernel parameter and $d(x, x')$ is a distance function, e.g., $d(x, x') = |x - x'|$ with $x \in \mathbb{R}$.
Based on this, a correlation matrix $K$ is computed, which collects all pairwise correlations of the training data $X = \{x_1, \ldots, x_n\}$. It is used in the predictor as follows:
$$\hat{y}(x) = \hat{\mu} + k^T K^{-1} (y - 1\hat{\mu}), \quad (1)$$
where $y$ are the training observations, $\hat{y}(x)$ is the predicted function value of a new sample $x$, $\hat{\mu}$ represents the process mean, $1$ is a vector of ones and $k$ is the column vector of correlations between the set of training samples $X$ and the new sample $x$. All parameters (e.g., $\theta$, $\hat{\mu}$) are determined by Maximum Likelihood Estimation (MLE).
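As an illustration of Eq. (1), the following NumPy sketch evaluates the predictor for one-dimensional inputs. It is not the authors' implementation: $\theta$ is passed in as a fixed value instead of being optimized by MLE, $\hat{\mu}$ is computed as the generalized least squares estimate for that fixed $\theta$, and the function names `kernel` and `kriging_predict` as well as the toy data are assumptions made for this example.

```python
import numpy as np

def kernel(a, b, theta):
    """Correlation exp(-theta * |a - b|) between two sets of 1-D inputs."""
    return np.exp(-theta * np.abs(a[:, None] - b[None, :]))

def kriging_predict(x_new, X, y, theta):
    """Evaluate Eq. (1) for a fixed theta (theta is not optimized here)."""
    K = kernel(X, X, theta)
    ones = np.ones(len(X))
    # Process mean for fixed theta: generalized least squares estimate.
    mu = ones @ np.linalg.solve(K, y) / (ones @ np.linalg.solve(K, ones))
    k = kernel(X, x_new, theta)  # one column of correlations per new sample
    return mu + k.T @ np.linalg.solve(K, y - mu * ones)

# Toy data (hypothetical): four training samples of a sine curve.
X = np.array([0.0, 0.3, 0.5, 0.9])
y = np.sin(6.0 * X)
y_hat = kriging_predict(X, X, y, theta=5.0)
```

Evaluating the predictor at the training locations returns the training observations, which is the interpolation property that the later sections rely on.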
3.2 Kriging Simulation
The predictor in Eq. (1) estimates a function value at a new sample $x$. The goal of estimation (or prediction) is to produce values that are as close to the true values as possible. In contrast, the goal of simulation is to produce values whose moments are as close to the moments of the real data as possible [17]. In case of Kriging, the simulation approach creates realizations of a Gaussian process with the same mean and covariances as the modeled process.
While the predictor is also based on the process mean and covariance matrix estimated during model training, the predictor itself does not have these very same properties: it smooths the data and may even be non-stationary [5]. On the other hand, simulations actually have the respective mean and covariances. This is important when generating test functions: a smoothed landscape may obviously lack important features of the real, original fitness landscape, e.g., it may have a smaller number of local optima.
Different approaches to simulation of Gaussian processes exist [5]. Our approach is based on the square root of the covariance matrix [5]. This choice is made for computational reasons. Firstly, a set $X_s$ of $m$ samples is selected. The process will be simulated at these samples. Secondly, the correlation matrix $K_s$ of the set $X_s$ is computed. It is decomposed as $\sigma^2 K_s = C_s = U \Lambda U^T$, where the process variance $\sigma^2$ is determined by MLE and $C_s$ is the covariance matrix with the eigenvector matrix $U$ and diagonal eigenvalue matrix $\Lambda = \mathrm{diag}(\lambda)$. The square root of $C_s$ is $C_s^{1/2} = U \, \mathrm{diag}(\lambda_1^{1/2}, \ldots, \lambda_m^{1/2}) \, U^T$ and the simulation is given by
$$\hat{y}_{nc} = 1\hat{\mu} + C_s^{1/2} \epsilon. \quad (2)$$
Here, $\epsilon$ is a vector of $m$ independent, normally distributed random numbers with zero mean and unit variance.
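The decomposition and Eq. (2) can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' code: inputs are one-dimensional, the model parameters (`mu`, `sigma2`, `theta`) are passed in as fixed numbers rather than estimated by MLE, and the function names are hypothetical.

```python
import numpy as np

def matrix_sqrt(C):
    """Symmetric square root U diag(sqrt(lambda)) U^T of a covariance matrix."""
    lam, U = np.linalg.eigh(C)
    lam = np.clip(lam, 0.0, None)  # guard against tiny negative round-off
    return (U * np.sqrt(lam)) @ U.T  # scales column j of U by sqrt(lam[j])

def simulate_nonconditional(Xs, mu, sigma2, theta, rng):
    """Draw one realization following Eq. (2) at the 1-D locations Xs."""
    Ks = np.exp(-theta * np.abs(Xs[:, None] - Xs[None, :]))
    eps = rng.standard_normal(len(Xs))  # independent N(0, 1) numbers
    return mu + matrix_sqrt(sigma2 * Ks) @ eps

rng = np.random.default_rng(1)  # the seed is an arbitrary choice
Xs = np.linspace(0.0, 1.0, 50)
y_nc = simulate_nonconditional(Xs, mu=0.0, sigma2=1.0, theta=5.0, rng=rng)
```

Each call with a fresh `eps` yields a different realization with the same mean and covariance structure, which is the source of diversity among the generated test functions.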
3.3 Kriging Conditional Simulation
The goal of conditional simulation is reproducing the moments of the training data. At the same time, the simulation can be made conditional on the training data. That means, the training data is reproduced exactly by the simulation, i.e., $\hat{y}_s = y$ if $X_s = X$. Thus, the conditional simulation is a better approximation of the training data, compared to the non-conditional simulation. Still, due to the different goals of estimation and simulation, the conditional simulation error is twice as large as the estimation error [5, 17].
At a first glance, this feature may seem undesirable. The usefulness of conditional simulation can be demonstrated by a simple example given by Lantuejoul [18]. The curves in Fig. 1 represent depth measurements along an undersea cable. The interpolation (thin black line) gives a good estimate of the true depth. The conditional simulation (dashed red line) on the other hand may have a larger error. However, if the goal is to determine the required length of the cable, the estimation approach may severely underestimate the true value, while the simulation may work well. Interestingly, both estimation and simulation are a result of the same trained model, but represent different features.

[Figure 1: Undersea cable depth estimation (black line) and conditional simulation (dashed red line) and the given data (black dots). Based on the example presented in [18]. Axes: distance [km], depth [m].]

The conditional simulation may result in more realistic shapes than the predictor. For example, higher frequency behavior may not be visible in the predictor, but may be visible in a (conditional) simulation [5]. Similar to the non-conditional simulation, different conditional simulation approaches exist. Here, we use a straightforward approach that directly simulates the conditional Gaussian process [28]. This is not necessarily the most efficient choice. But it allows for a rather simple and transparent implementation. A more advanced approach can be substituted when needed, e.g., when growing data size renders the following approach infeasible.
As described in Sec. 3.2, we have the correlation matrix of the training data (observed data) $K$ and the matrix of correlations between the samples to be simulated, $K_s$. Furthermore, $K_x$ denotes the matrix of the cross-correlations between training and simulation samples. Correlations of the combined training and simulation samples can be arranged in a block matrix as follows:
$$K_{all} = \begin{pmatrix} K & K_x^T \\ K_x & K_s \end{pmatrix}.$$
Following [28], the conditioned correlation matrix can be calculated as $K_{cs} = K_s - K_x K^{-1} K_x^T$. Intuitively, identical training and simulation data results in a zero correlation matrix $K_{cs}$, which follows from $K_s = K_x = K$. To simulate the conditional process, $C_{cs} = \sigma^2 K_{cs}$ is used in a similar manner as in (2), with
$$\hat{y}_c = \hat{y} + C_{cs}^{1/2} \epsilon. \quad (3)$$
The estimations $\hat{y}$ of the simulation samples are derived with Eq. (1).
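A compact NumPy sketch of Eq. (3) may help to see how the pieces fit together. It assumes one-dimensional inputs and fixed, user-supplied model parameters (`mu`, `sigma2`, `theta`) instead of MLE estimates; the function names and the toy data are hypothetical and not taken from the authors' code.

```python
import numpy as np

def corr(a, b, theta):
    """Correlation exp(-theta * |a - b|) between two sets of 1-D inputs."""
    return np.exp(-theta * np.abs(a[:, None] - b[None, :]))

def simulate_conditional(Xs, X, y, mu, sigma2, theta, rng):
    """Draw one realization following Eq. (3) at the 1-D locations Xs."""
    K = corr(X, X, theta)     # training correlations, n x n
    Kx = corr(Xs, X, theta)   # cross-correlations, m x n
    Ks = corr(Xs, Xs, theta)  # simulation correlations, m x m
    y_hat = mu + Kx @ np.linalg.solve(K, y - mu)         # Eq. (1)
    Ccs = sigma2 * (Ks - Kx @ np.linalg.solve(K, Kx.T))  # conditioned covariance
    lam, U = np.linalg.eigh(Ccs)
    lam = np.clip(lam, 0.0, None)  # remove tiny negative round-off
    return y_hat + (U * np.sqrt(lam)) @ U.T @ rng.standard_normal(len(Xs))

rng = np.random.default_rng(2)  # the seed is an arbitrary choice
X = np.array([0.1, 0.4, 0.8])
y = np.array([0.5, -0.2, 1.0])
Xs = np.concatenate([X, np.linspace(0.0, 1.0, 20)])  # training samples first
y_c = simulate_conditional(Xs, X, y, mu=0.0, sigma2=1.0, theta=5.0, rng=rng)
```

Because the conditioned covariance vanishes at the training locations, the first three simulated values coincide with `y` up to round-off, while the remaining values vary between realizations.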
3.4 Test Function Generator
The main proposal of this work is to use the (conditional) simulation approach to generate test functions as described in Algorithm 1. The function generator first creates or receives data of the problem (line 2-7). Then, a Gaussian process model is trained with that data (line 8) and simulation samples are created (line 9). For each desired test function, a separate simulation is performed (line 11).
Algorithm 1 Simulation-based test function generation
1: Given: number of training samples $n$, simulation samples $m$ (usually $m \gg n$), required test functions $n_{sim}$ and (optionally) the expensive real-world objective function $f(x)$.
2: if $f(x)$ is available then
3:     Create $n$ samples $X = \{x_1, \ldots, x_n\}$.
4:     Determine observations $y$: $y_i = f(x_i)$.
5: else
6:     User provides data set $\{X, y\}$.
7: end if
8: Train Gaussian process model $M$ based on $\{X, y\}$.
9: Create $m$ samples $X_s = \{x_1, \ldots, x_m\}$.
10: for all $j \in 1, \ldots, n_{sim}$ do
11:     Create (non-)conditional simulations $\hat{y}_s^{(j)}$ with Eq. (2) or (3).
12:     if $X_s = \mathcal{X}$ then
13:         Simulation $\hat{y}_s^{(j)}$ is the required $j$-th test function.
14:     else
15:         Provide $j$-th test function as interpolation of simulated samples using Eq. (1): $\hat{y}_s^{(j)}(x) = \hat{\mu} + k_s^T K_s^{-1} (\hat{y}_s^{(j)} - 1\hat{\mu})$.
16:     end if
17: end for
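The interpolation in line 15 of Algorithm 1 can be sketched as a closure that precomputes $K_s^{-1}(\hat{y}_s^{(j)} - 1\hat{\mu})$ once and then evaluates cheaply at arbitrary locations. The sketch assumes one-dimensional inputs and fixed parameters; `make_test_function` is a hypothetical name, and the cosine values merely stand in for one simulated realization.

```python
import numpy as np

def make_test_function(Xs, ys, mu, theta):
    """Interpolate simulated values ys at 1-D locations Xs (Algorithm 1, line 15)."""
    Ks = np.exp(-theta * np.abs(Xs[:, None] - Xs[None, :]))
    weights = np.linalg.solve(Ks, ys - mu)  # K_s^{-1} (y_s - 1 mu), computed once

    def test_function(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        ks = np.exp(-theta * np.abs(Xs[:, None] - x[None, :]))
        return mu + ks.T @ weights

    return test_function

Xs = np.linspace(0.0, 1.0, 11)
ys = np.cos(4.0 * Xs)  # stand-in for one simulated realization
f_test = make_test_function(Xs, ys, mu=0.0, theta=5.0)
```

The returned function reproduces the simulated values at `Xs` and is inexpensive to evaluate elsewhere, since only one matrix-vector product per call is required.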
If the simulation covers the whole search space, the resulting values already represent the test function. If not, an interpolation step (line 15 in Algorithm 1) is necessary. The chosen simulation approach only simulates at the given sample locations $X_s$, and does not present an explicit formula. Hence, the interpolation is necessary when only a subset of the search space is simulated. To guarantee that the interpolation step actually reproduces the training data in the conditional simulation case, it is useful to ensure that $X \subset X_s$.
The interpolation step has to be used with care. As this step is based on estimation, it may violate the non-smoothing criterion (C.6). However, since $m$ simulation samples are interpolated, rather than just the $n$ training samples, this issue is less severe than in the simple estimation case. The advantages of this test function generator are:
• It can make use of real-world data and does not require access to the actual objective function, providing access to inexpensive test functions (C.5).
• Test functions represent a problem class rather than a single problem. Diverse test functions can be produced at random (C.2).
• Test functions can reproduce the behavior of real-world problems (non-conditional simulation) and the underlying data (conditional simulation), thus satisfying the relevance criterion (C.4).
• Estimation-based test functions do not necessarily provide the most realistic landscape. E.g., higher frequency behavior may be ignored by the predictor. Contrarily, simulation allows to respect such behavior [5]. Thus, simulation may avoid the main pitfall of data-driven test function generation and satisfies the non-smoothing criterion (C.6). Since this avoids overly simplified test problems, this also helps to meet the difficulty (C.1) and relevance (C.4) criteria.
• Kriging models are very flexible (C.3). By adapting the kernel function (or its parameters) most problems can be approximated. Due to this feature, the simulation approach is even independent of the solution representation (data type of $x$) [20, 32].
• Unlike the parameter-variation approach (cf. Sec. 2), the simulation approach is a more principled way of generating diverse test functions (C.2). In the parameter-variation approach, controlled changes of the parameters may have drastic effects on the resulting functions. Simulation, and especially conditional simulation, on the other hand guarantees that certain structures of the real-world data are preserved in the test function.
• Unlike the Krigifier approach, we propose to use data from real-world problems to derive the simulations (C.4). Furthermore, we outline the difficulties of excessive smoothing (C.6), which also affect the Krigifier approach. In addition, conditional simulation is used, which has not been explored by the Krigifier.
Disadvantages or possible problems are:
• The training data may introduce bias. If insufficient data is collected, the model may not learn the problem structure.
• The model selection and configuration may also introduce bias. It may be unreasonable to compare certain surrogate models or their configurations based on this procedure: The models that use the same configuration and type as the simulation model would have an unfair advantage.
• The number, value and location of local and global optima is unknown. This is in contrast to classical test functions, BBOB or the GLG. If required, such features have to be approximated.
• The number of simulation samples $m$ has to be set to a good value. Very large values of $m$ may lead to computational issues due to memory and time requirements. Rather small values of $m$ may lead to excessive smoothness, due to the final interpolation step. Importantly, this smoothness issue is less severe than in the estimation case: $m$ is not restricted by the cost of evaluation of $f(x)$ (unlike the number of training samples $n$).
• Conditional simulation may produce test functions that have little variation if the trained Kriging model fits the data exceedingly well, thus violating the diversity criterion (C.2). The model estimates low variances and all realizations of the simulation will be nearly identical. To detect such a case, the estimated variances at the simulation sample locations can be compared against a threshold value. In many use-cases, sparsity of training data due to high costs of evaluation will render this issue unlikely. This issue is irrelevant to non-conditional simulation.
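The variance check mentioned in the last item could look as follows. This is only a sketch under assumptions: inputs are one-dimensional, the variances are taken as the diagonal of $\sigma^2 (K_s - K_x K^{-1} K_x^T)$, and the threshold value as well as the rule "all variances below the threshold" are hypothetical choices, since no concrete values are prescribed in the text.

```python
import numpy as np

def conditional_variances(Xs, X, sigma2, theta):
    """Diagonal of sigma2 * (K_s - K_x K^{-1} K_x^T) for 1-D inputs."""
    K = np.exp(-theta * np.abs(X[:, None] - X[None, :]))
    Kx = np.exp(-theta * np.abs(Xs[:, None] - X[None, :]))
    # diag(K_s) is 1 for a correlation matrix, so only the reduction term is needed.
    reduction = np.einsum("ij,ji->i", Kx, np.linalg.solve(K, Kx.T))
    return sigma2 * (1.0 - reduction)

X = np.linspace(0.0, 1.0, 6)     # training locations (toy example)
Xs = np.linspace(0.0, 1.0, 101)  # simulation locations
var = conditional_variances(Xs, X, sigma2=1.0, theta=5.0)
threshold = 1e-3                 # hypothetical tolerance
low_diversity = bool(np.max(var) < threshold)
```

If `low_diversity` is true, all conditional realizations would be nearly identical to the predictor, and non-conditional simulation or fewer conditioning samples may be the more useful option.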
4 ONE-DIMENSIONAL EXAMPLE
To demonstrate the intuition behind the simulation-based test function generator, we first present a simple example. The source code can be requested from the authors. The example is based on the real-valued, one-dimensional function
$$f_{1dim}(x) = \exp(-20x) + \sin(6x^2) + x, \quad (4)$$
with $x \in [0, 1]$. Here, $n = 5$ samples are created with uniform random sampling and are evaluated with $f_{1dim}$. The Kriging model is trained with the data and is simulated at $m = 100$ locations. The simulation samples include the five training samples. The remaining $m - n = 95$ samples are drawn from a uniform random distribution. This process is repeated $n_{sim} = 10$ times, so that ten test functions are created. Figure 2 shows the objective function, the Kriging estimation and the (non-)conditional simulations.
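The data setup of this example can be sketched in a few lines of NumPy. The random seed and the variable names are arbitrary choices for illustration; the authors' own source code is not reproduced here.

```python
import numpy as np

def f_1dim(x):
    """Objective function of Eq. (4) on [0, 1]."""
    return np.exp(-20.0 * x) + np.sin(6.0 * x ** 2) + x

rng = np.random.default_rng(0)     # the seed is an arbitrary choice
n, m = 5, 100
X = rng.uniform(0.0, 1.0, size=n)  # training locations
y = f_1dim(X)                      # training observations
# Simulation locations: the n training samples plus m - n new uniform samples.
Xs = np.concatenate([X, rng.uniform(0.0, 1.0, size=m - n)])
```

Placing the training samples inside `Xs` ensures $X \subset X_s$, so that a conditional simulation followed by interpolation reproduces the training data.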
[Figure 2: Top: The function $f_{1dim}(x)$ (dashed, black) and the Kriging estimation (solid, black) based on training data (dots). Middle and bottom: 10 realizations of the (conditional) simulations. Panels: Estimation, Non-conditional, Conditional; horizontal axis: x.]
The non-conditional simulation test functions do not reproduce the training data. While the resulting functions look rather chaotic, they all share the same covariance structure and hence have similar smoothness, as well as a similar number of optima. The test functions are of similar difficulty as $f_{1dim}(x)$ (C.1), are diverse (C.2), flexible (C.3), relevant to the original problem (C.4), inexpensive (C.5), and sufficiently rugged (C.6). The motivation for using these kinds of test functions would be to test performance on functions that have similar structure as the real objective function, but are not necessarily identical to it.
Contrarily, the training data is reproduced by the conditional simulation test functions. The conditional simulation's deviation from the estimation increases with increasing distance to observed data. In general, the functions are less varied than the ones based on non-conditional simulation, but match the true function $f_{1dim}(x)$ more closely. Hence, the motivation to use conditional simulation would be to estimate performance on potential realizations of the same (black-box) $f_{1dim}(x)$.
5 PROTEIN LANDSCAPE APPLICATION
5.1 Data and Problem
To showcase the application of simulation-based test function generation in practice, this section presents a real-world example. To that end, an openly available data set from the field of computational biology is used [3, 23]. It contains the corresponding fitness values of all DNA sequences of length ten. Here, fitness refers to the affinity to a fluorescent target protein: allophycocyanin. The data set has previously been used for the assessment of evolutionary algorithms, using a finite state machine model [24].
Candidate solutions $x$ are DNA sequences with $d = 10$ bases, i.e., strings with ten letters that are either A, C, T, or G. The fitness $f_{affinity}(x)$ is the result of the complex measurements described in [23], and is part of the data set. It has to be maximized.
5.2 Test Function Generation
The data set comprises all possible 10-base sequences. For our tests, $n = 100$ sequences $x$ are selected (randomly, uniformly) and the corresponding fitness values are taken from the data set. The model $M_{complete}$ is trained with the resulting data and is simulated at $m = 1000$ additional samples, and $n_{sim} = 10$ test functions are created. Since the complete data set is available, there is little motivation for a test function based on conditional simulation. Hence, we use the non-conditional simulation approach. The idea is to create test functions that show similar behavior as the given protein fitness landscape. Since $m$ is smaller than the size of the complete search space, this requires the interpolation step during evaluation of the test function. The derived test functions are denoted with $f_{sim}(M_{complete}, \text{interpolate})$. Here, the first argument refers to the employed model and the second argument specifies that we interpolate between the simulated samples.
To show the effect that the interpolation has on the fitness landscape, we will create two additional sets of simulation-based test functions. In both of these cases, the search space is restricted to a subspace of just 1024 sequences. To that end, the last 5 elements of each sequence are fixed to ACGTA. In the first case, the above described model $M_{complete}$ is used to simulate all 1024 sequences in the subspace. This case will be denoted $f_{sim}(M_{complete}, \text{simulate-only})$. In the second case, a new model $M_{subspace}$ is trained with 100 sequences selected (randomly, uniformly) from the 1024 sequences of the subspace. This case will be denoted $f_{sim}(M_{subspace}, \text{simulate-only})$.
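The restricted subspace is small enough to be enumerated explicitly. The following sketch lists all sequences whose last five bases are fixed to ACGTA; the variable names are chosen for illustration only.

```python
from itertools import product

ALPHABET = "ACGT"
SUFFIX = "ACGTA"  # the last five bases are fixed, as described above

# All 4^5 = 1024 sequences of length ten that end with the fixed suffix.
subspace = ["".join(prefix) + SUFFIX for prefix in product(ALPHABET, repeat=5)]
```

Since every member of this list can be simulated directly, no interpolation step is needed when a test function is restricted to the subspace.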
Since the candidate solutions or samples are not real-valued, the correlation function described in Sec. 3.1 can not be used. Instead, the correlation function has to be changed [20, 32]. Here, the exponential correlation function $k(x, x') = \exp(-\theta d(x, x'))$ is used with the Hamming distance
$$d(x, x') = \sum_{i=1}^{d} w_i \quad \text{with} \quad w_i = \begin{cases} 1 & \text{if } x_i \neq x'_i, \\ 0 & \text{otherwise.} \end{cases}$$
The Hamming distance also proved to yield good results in other studies [31] and has the additional advantage of low computational cost. The Hamming distance was also used in the original study that introduced the considered data set [23].
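For sequences represented as strings, the Hamming distance and the resulting correlation take only a few lines of plain Python. The function names and the two example sequences are hypothetical; the value of `theta` is set to the reciprocal of the correlation length of 4.48 that is reported in Sec. 5.3, purely for illustration.

```python
import math

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(ai != bi for ai, bi in zip(a, b))

def correlation(a, b, theta):
    """Exponential correlation exp(-theta * d(a, b)) with Hamming distance d."""
    return math.exp(-theta * hamming(a, b))

d = hamming("ACGTACGTAC", "ACGTTCGTAA")  # differs at positions 5 and 10
k = correlation("ACGTACGTAC", "ACGTTCGTAA", theta=1.0 / 4.48)
```

Only the distance function changes compared to the real-valued case, so the estimation and simulation equations of Sec. 3 apply without modification.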
Since the problem is discrete and of a rather manageable size, we can use brute force to estimate the global optimum of each generated test function. Also, the number of local optima is determined, i.e., the number of samples whose neighbors do not have a better fitness. Here, Hamming neighborhood is employed. That means, the neighbors of a sequence are all sequences that differ in exactly one element from the original one.
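The brute-force count of local optima under the Hamming neighborhood can be sketched as follows. The sketch assumes maximization and treats ties as "not better"; the function names and the toy fitness (number of A letters) are hypothetical and only serve to make the example checkable.

```python
from itertools import product

ALPHABET = "ACGT"

def neighbors(seq):
    """All sequences that differ from seq in exactly one position."""
    for i, old in enumerate(seq):
        for new in ALPHABET:
            if new != old:
                yield seq[:i] + new + seq[i + 1:]

def count_local_optima(fitness, length):
    """Count sequences with no strictly better Hamming neighbor (maximization)."""
    count = 0
    for letters in product(ALPHABET, repeat=length):
        seq = "".join(letters)
        if all(fitness(nb) <= fitness(seq) for nb in neighbors(seq)):
            count += 1
    return count

# Toy fitness (hypothetical): number of 'A' letters, which has a single optimum.
n_optima = count_local_optima(lambda s: s.count("A"), length=3)
```

Each sequence of length $d$ has $3d$ neighbors, so the full enumeration remains feasible for the 1024-sequence subspace considered here.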
5.3 Landscape Analysis
Firstly, we report some landscape characteristics of the test functions based on simulations in the complete, unrestricted search space, i.e., $f_{sim}(M_{complete}, \text{interpolate})$. Rowe et al. [23] report a correlation length of roughly 4.5. They estimate this value by calculating the auto-correlation of random walks in the fitness landscape. Their result is nicely reproduced by the Kriging model, which is trained with just 100 samples: the correlation length (the reciprocal of the kernel parameter $\theta$) determined by maximum likelihood estimation during model training is 4.48. Another good match is the reported fitness distance correlation: $-0.32$ (in [23]) versus $-0.37$ with standard deviation 0.09 (in this study).
Unfortunately, there is also a strong mismatch. Rowe et al. [23] report that the data set has 6805 local optima. The simulated test functions $f_{sim}(M_{complete}, \text{interpolate})$ have 49 local optima or less. This is a major problem, as the test functions do not seem to represent the underlying problem very well. The test functions would be far too easy to solve, violating criteria C.1, C.4, and C.6.
This problem can clearly be linked to the last step of the function generation: interpolation. The interpolation is again based on estimation. It can not represent higher frequency changes in the landscape very well, as it introduces too much smoothness. Essentially, the interpolation between simulated samples will remove potential local optima which would otherwise be present in a simulation of the complete search space. Since the number of simulated samples $m = 1000$ is much smaller than the number of local optima in the real landscape, the resulting test function is necessarily much smoother than the real problem.
This problem can be revealed in a down-sized experiment. By restricting the analysis to a subspace of just 1024 DNA sequences, i.e., $f_{sim}(M_{complete}, \text{simulate-only})$, interpolation can be avoided. This results in landscapes that have between 2 and 7 local optima (in this subspace). This result matches more closely the real landscape's behavior, which should on average have $6805 / 4^5 \approx 6.6$ local optima in a subspace of this size.
If the model is also directly trained as well as simulated in the subspace, i.e., $f_{sim}(M_{subspace}, \text{simulate-only})$, this results in 10 to 19 local optima. The real landscape has exactly 16 optima in the subspace. The earlier violated criteria (C.1, C.4, C.6) are satisfied. In theory, we could do the same pure simulation experiment on the complete search space, but that is computationally infeasible: even just storing the required $4^{10} \times 4^{10}$ covariance matrix is prohibitive.
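A short calculation illustrates the storage requirement of the full covariance matrix. It assumes dense storage with 8-byte floating point entries, which is an assumption of this sketch and not a statement about the authors' software.

```python
n_sequences = 4 ** 10         # all DNA sequences of length ten
n_entries = n_sequences ** 2  # entries of the full covariance matrix
bytes_needed = 8 * n_entries  # assuming 8-byte floating point entries
tebibytes = bytes_needed / 2 ** 40
```

Under this assumption the matrix alone would occupy 8 TiB, before any eigendecomposition is attempted.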
Clearly, this is a central issue. As shown, small discrete search
spaces may make it possible to skip the interpolation step and the
corresponding problems. Large, rugged search spaces remain a challenge. We
do not resolve this numerical and computational issue in this article,
but rather point to some potential solution approaches. As noted
in Sec. 3, we used rather simple and straightforward simulation
techniques. There are more efficient simulation techniques that
can deal with large numbers of simulation samples. In the
continuous case, one could try to adopt the spectral simulation
technique, which does not directly rely on a set of simulated samples
but is based on a sum of cosine functions [5]. Spectral simulation
does not require computing the complete covariance matrix of the
simulation samples. In the discrete case, Gaussian Markov Random
Field models [26] may be of interest. Here, the Markov property
induces sparsity in the inverse of the covariance matrix, which may
be exploited to deal with large sample sizes.
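The idea of a simulation built from a sum of cosine functions can be sketched as follows. This is an illustration under stated assumptions, not necessarily the exact procedure of [5]: it assumes a zero-mean, unit-variance process with the Gaussian kernel $\exp(-\theta \lVert x - x' \rVert^2)$, whose spectral density is a normal distribution with variance $2\theta$ per dimension. All names are hypothetical.

```python
import numpy as np

def spectral_simulation(theta, dim, n_terms, rng):
    """Return one random function f built from n_terms cosines.

    Assumed kernel: exp(-theta * ||x - x'||^2). Frequencies are drawn from
    its spectral density N(0, 2 * theta), phases uniformly from [0, 2 * pi).
    """
    omega = rng.normal(0.0, np.sqrt(2.0 * theta), size=(n_terms, dim))
    phase = rng.uniform(0.0, 2.0 * np.pi, size=n_terms)
    scale = np.sqrt(2.0 / n_terms)

    def f(x):
        x = np.asarray(x, dtype=float).reshape(-1, dim)   # (n_points, dim)
        return scale * np.cos(x @ omega.T + phase).sum(axis=1)

    return f

rng = np.random.default_rng(1)
f = spectral_simulation(theta=1.0, dim=1, n_terms=100, rng=rng)
values = f(np.linspace(0.0, 5.0, 11))      # evaluate anywhere, no covariance matrix
```

The function can be evaluated at arbitrary points after construction, so neither a set of simulation samples nor their covariance matrix is needed; the cost grows linearly with the number of cosine terms.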
As the one-dimensional example in Sec. 4 showed, interpolating
a small number of simulation samples should provide satisfactory
results if the problem itself is rather smooth. Hence, it is desirable
to estimate the required number of simulation samples $m$ that
leads to an interpolation that reflects the ruggedness of the actual
problem. Clearly, the model is fully specified once all parameters
are determined. It should be possible to estimate $m$ based on the
resulting covariance structure. In that sense, small parameters $\theta$
of the kernel function yield smoother landscapes that require fewer
simulation samples to approximate. Large $\theta$ yield more rugged
landscapes that require more simulation samples. One could also
take an empirical approach to determine $m$, by increasing it in steps
and observing the convergence of a suitable measure (e.g., some
non-parametric measure of ruggedness of the simulation). Where
possible, expert knowledge about the problem may also help to
determine a suitable $m$, if, e.g., the number of local optima is known.
Large, rugged search spaces remain a challenge, and should receive
more attention in future research.
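The empirical approach to choosing $m$ can be sketched in a one-dimensional toy setting. The sketch below is only an illustration under several assumptions: a Gaussian kernel $\exp(-\theta (x - x')^2)$, equally spaced simulation samples on $[0, 1]$, a zero-mean interpolation, and the number of local minima on a fine grid as the ruggedness measure. None of these choices is prescribed by the text, and all names are hypothetical.

```python
import numpy as np

def gauss_kernel(a, b, theta):
    return np.exp(-theta * (a[:, None] - b[None, :]) ** 2)

def interpolated_simulation(m, theta, grid, rng, nugget=1e-6):
    """Simulate m samples of the process and interpolate them on the grid."""
    x = np.linspace(grid[0], grid[-1], m)
    K = gauss_kernel(x, x, theta) + nugget * np.eye(m)   # nugget for stability
    y = np.linalg.cholesky(K) @ rng.standard_normal(m)   # simulated samples
    return gauss_kernel(grid, x, theta) @ np.linalg.solve(K, y)

def count_local_minima(values):
    inner = values[1:-1]
    return int(np.sum((inner < values[:-2]) & (inner < values[2:])))

def ruggedness_by_m(theta, m_values, n_rep=20, seed=0):
    """Mean number of local minima of the interpolation for each m."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 1001)
    result = []
    for m in m_values:
        counts = [count_local_minima(interpolated_simulation(m, theta, grid, rng))
                  for _ in range(n_rep)]
        result.append(float(np.mean(counts)))
    return result

print(ruggedness_by_m(theta=500.0, m_values=[5, 10, 20, 40, 80]))
```

One would increase $m$ until the reported measure stops changing noticeably. For larger $\theta$ (more rugged landscapes), the measure should level off only at larger $m$.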
At the same time, the experimental results demonstrate that
simulation-based test functions should be preferred to pure estimation.
The estimation of the model $M_{\mathrm{subspace}}$ trained in the 5-base
subspace has only 2 optima (in that same subspace). This stresses
that excessive smoothing may suppress local optima. This problem
extends to any kind of estimation-based test function generator.
5.4 Performance Analysis
As the landscape analysis showed, the interpolated simulation is
not a good representation of the real problem behavior. Hence, we
use test functions derived from a model $M_{\mathrm{subspace}}$ that is trained
and simulated in the earlier introduced 5-base subspace, i.e.,
$\mathrm{sim}(M_{\mathrm{subspace}}, \text{simulate-only})$. By restricting the performance
analysis to the 5-base subspace, we avoid the interpolation problem.
We use the derived simulation-based test functions to evaluate the
performance of optimization algorithms. Ten different test functions
are created, and each algorithm is run twenty times on each
function, resulting in 200 replications.
In addition, we want to show the advantage of this approach
in comparison to an estimation-based test function. Therefore, a
baseline test function is derived from the estimation (prediction)
of the same model. Finally, the algorithm tests were repeated
on the actual objective function, i.e., directly using the real data.
Since these last two cases only involve a single objective function
instance, all 200 replications were spent on that single function.
The tested optimization algorithm is a variant of EGO for
combinatorial optimization [16, 31]. The algorithm first generates a set
of $k$ samples (randomly, uniformly) and evaluates them with the
objective function (or test function). Different values of the initial
design size parameter $k$ will be tested: $k \in \{5, 10, 20, 50\}$. A Kriging
surrogate model is learned with the resulting data. An optimization
algorithm (here: brute force; in large search spaces, evolution
strategies or related methods are more appropriate) is used to optimize
an infill criterion. We compare two infill criteria: expected
improvement (EI) and the predicted mean. The former is, e.g., described
in [16]. The latter is derived from Eq. (1). The sample that optimizes
the infill criterion will be evaluated by the objective function and the
result is used to update the surrogate model. This procedure is iterated
until a budget of 100 function evaluations is exhausted. Random search
is used as a baseline comparison. Hence, we compare 9 algorithms:
random search and 8 combinations of the infill criterion and $k$.
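The loop described above can be sketched in Python as follows. This is a simplified illustration, not the implementation used in the experiments: it assumes minimization, an ordinary Kriging model with the kernel $\exp(-\theta \cdot d_{\mathrm{Hamming}})$ and a fixed $\theta$ (in practice, $\theta$ would be estimated from the data), and it never re-evaluates a sequence. All function names are hypothetical.

```python
import itertools
import math
import numpy as np

def hamming(A, B):
    """Pairwise Hamming distances between integer-coded sequences (rows)."""
    return (A[:, None, :] != B[None, :, :]).sum(axis=2)

def kriging(X, y, Xnew, theta, nugget=1e-8):
    """Ordinary Kriging mean and standard deviation at the rows of Xnew."""
    n = len(X)
    K = np.exp(-theta * hamming(X, X)) + nugget * np.eye(n)
    k = np.exp(-theta * hamming(Xnew, X))
    ones = np.ones(n)
    mu = (ones @ np.linalg.solve(K, y)) / (ones @ np.linalg.solve(K, ones))
    resid = np.linalg.solve(K, y - mu)
    mean = mu + k @ resid
    sigma2 = (y - mu) @ resid / n
    var = sigma2 * (1.0 - np.sum(k * np.linalg.solve(K, k.T).T, axis=1))
    return mean, np.sqrt(np.maximum(var, 0.0))

def expected_improvement(mean, sd, y_best):
    """Expected improvement for minimization; zero where sd is zero."""
    ei = np.zeros_like(mean)
    ok = sd > 0
    z = (y_best - mean[ok]) / sd[ok]
    cdf = 0.5 * (1.0 + np.vectorize(math.erf, otypes=[float])(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    ei[ok] = (y_best - mean[ok]) * cdf + sd[ok] * pdf
    return ei

def ego(objective, length=5, k=10, budget=100, infill="EI", theta=0.5, seed=0):
    rng = np.random.default_rng(seed)
    candidates = np.array(list(itertools.product(range(4), repeat=length)))
    evaluated = list(rng.choice(len(candidates), size=k, replace=False))
    y = [objective(candidates[i]) for i in evaluated]       # initial design
    while len(y) < budget:
        mean, sd = kriging(candidates[evaluated], np.array(y), candidates, theta)
        if infill == "EI":
            score = expected_improvement(mean, sd, min(y))
        else:                                               # predicted mean
            score = -mean
        score[evaluated] = -np.inf                          # no re-evaluation
        best = int(np.argmax(score))                        # brute force
        evaluated.append(best)
        y.append(objective(candidates[best]))
    return candidates[evaluated], np.array(y)
```

A call such as `ego(objective, k=20, infill="mean")` corresponds to one of the eight EGO configurations; `objective` is any function that maps an integer-coded sequence to a fitness value.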
Similarly to the COCO framework [11], we use a set of target
values and the respective runtime required to reach these targets
to measure algorithm performance. To that end, the global optimum
$y_{\mathrm{opt}} = f(x_{\mathrm{opt}})$ is determined by brute force. Based on
the determined optimum, the fitness gap is specified as follows:
$\mathrm{gap}(x) = f(x) - f(x_{\mathrm{opt}})$. Finally, evenly spaced targets of the gap
are specified on a logarithmic scale:
$10^{0}, 10^{-0.2}, \ldots, 10^{-6}$.
For each algorithm run, it is recorded after what runtime a certain
target is reached. For each set of test functions, we aggregate the
resulting data by calculating the fraction of all targets reached at a
certain runtime. The resulting curves (fraction of targets reached
against runtime) are called run length distributions [14], empirical
cumulative distribution functions (ECDF) [11] or data profiles [22].
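The target-based performance measure can be sketched in a few lines of Python. The sketch assumes that a target counts as reached once the best gap found so far is less than or equal to it, and that runtime is measured in function evaluations; the function names are hypothetical.

```python
import numpy as np

def gap_targets():
    """Targets 10^0, 10^-0.2, ..., 10^-6 (31 values)."""
    return 10.0 ** np.linspace(0.0, -6.0, 31)

def runtime_to_targets(gaps, targets):
    """First evaluation count at which each target is reached (inf if never).

    gaps: fitness gap f(x) - f(x_opt) of each evaluated sample of one run,
    in the order of evaluation.
    """
    best = np.minimum.accumulate(np.asarray(gaps, dtype=float))
    runtimes = np.full(len(targets), np.inf)
    for j, target in enumerate(targets):
        hit = np.nonzero(best <= target)[0]
        if hit.size:
            runtimes[j] = hit[0] + 1
    return runtimes

def ecdf(runs, targets, budget):
    """Fraction of all (run, target) pairs reached after 1..budget evaluations."""
    rt = np.concatenate([runtime_to_targets(g, targets) for g in runs])
    return np.array([(rt <= t).mean() for t in range(1, budget + 1)])

# One toy run: gaps of the four evaluated samples, in order.
curve = ecdf([[2.0, 0.5, 0.5, 0.0]], gap_targets(), budget=4)
```

In the toy run, the second evaluation reaches the two coarsest targets ($10^{0}$ and $10^{-0.2}$), and the fourth evaluation finds the optimum and thereby reaches all 31 targets.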
For our experiments, the ECDF plots are depicted in Fig. 3. The
results show that using the EI infill criterion is crucial. Without
EI, algorithm performance is close to the random search baseline.
Results after 100 evaluations seem to be insensitive to $k$. Earlier
in the runs, smaller design sizes yield superior results. The strong
positive effect of EI can be partly attributed to the ruggedness and
relatively large number of local optima. Since the estimation-based
test function is overly smooth, this effect is less visible there: the
runs without EI are more easily distinguished from the random
search baseline in case of estimation. This stresses the importance
of using simulation rather than estimation for testing. Compared
to estimation, the simulation-based results are much closer to the
results on the true function. The estimation-based test function
severely underestimates the difficulty of the problem. This stresses
that simulation approaches are more suited to satisfy the required
criteria of test function generators, especially with regard to
difficulty (C.1), relevance (C.4) and non-smoothness (C.6).
A slight difference between simulation-based tests and the real
function performance can be observed. Note that the goal of the
simulation-based test functions was not to produce exact surrogate
functions of the true, real-world problem. Rather, it was desired
to create test function instances with similar behavior. Still, the
selected kernel of the Kriging model may be improved. The isotropic
nature of the kernel may not be a perfect choice. Rowe et al. [23]
report that bases at the start of a sequence have larger impacts
than the bases later in the sequence. It may be more exact to use
an anisotropic kernel, which allows learning the importance of
each base. This comes at the cost of introducing additional kernel
parameters.
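The difference between an isotropic and an anisotropic kernel for sequences can be illustrated with a short sketch. It assumes an exponential kernel on the Hamming distance with integer-coded bases; the exact kernel form used in the experiments may differ, and the function names are hypothetical.

```python
import numpy as np

def isotropic_kernel(a, b, theta):
    """One parameter: every mismatching position counts the same."""
    return np.exp(-theta * np.sum(a != b))

def anisotropic_kernel(a, b, thetas):
    """One parameter per position: mismatches are weighted individually."""
    return np.exp(-np.sum(thetas * (a != b)))

a = np.array([0, 1, 2, 3, 0])
first_differs = np.array([1, 1, 2, 3, 0])
last_differs = np.array([0, 1, 2, 3, 1])
thetas = np.array([2.0, 1.0, 1.0, 1.0, 0.5])   # early bases weighted more

print(anisotropic_kernel(a, first_differs, thetas))   # lower correlation
print(anisotropic_kernel(a, last_differs, thetas))    # higher correlation
```

With five bases, the anisotropic variant has five kernel parameters to estimate instead of one, which is the additional cost mentioned above.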
6 DISCUSSION
One remaining question is: When should we use which test function
generation approach? The answer clearly depends on the context
and goals of the analysis.
[Figure 3, three panels. x-axis: log10(eval); y-axis: reached targets / all targets (0.0 to 1.0). Curves in each panel: RandomSearch, Mean.5, Mean.10, Mean.20, Mean.50, EI.5, EI.10, EI.20, EI.50.]
Figure 3: Logarithmic ECDF plots of three test cases: test
functions based on non-conditional simulation (top), estimation
(middle), and the real objective function (bottom).
The labels inside the plot indicate the configuration of the
employed algorithm, that is, whether EI or the predicted
mean was used as an infill criterion and the size of the initial
design. The x-axis depicts the logarithm of the number
of fitness function evaluations (eval).
We strongly recommend using a simulation approach if data-driven,
model-based test functions are desired. Simulation is a
principled way of generating diversity and avoiding too much
smoothness. But simulation-based test functions are not supposed
to replace classical test function sets. These test function sets do
have merits, e.g., their properties and behavior are well understood.
If an algorithm is assessed without any specific application in mind,
a mix of both would be ideal. If an algorithm is assessed with the
desire to determine performance on problems with specific features
(e.g., separability, unimodality), classical test functions are
probably preferable. In contrast, if an algorithm is assessed in the
context of a specific real-world application (i.e., C.4 is important), a
simulation-based test function generator should be preferred.
In the latter case, if performance on a class of problems with
similar behavior as the real objective function is of interest,
non-conditional simulation would be more appropriate. Conditional