Simulation-based Test Functions for Optimization Algorithms

Author: Zaefferer, Martin,Fischbach, Andreas,Naujoks, Boris,Bartz-Beielstein, Thomas

Year: 2017

Source: https://cos.bibl.th-koeln.de/files/477/zaef17acos.pdf

CIplus
Band 3/2017
Simulation-based Test Functions for Optimization Algorithms
Martin Zaefferer, Andreas Fischbach, Boris Naujoks, Thomas Bartz-Beielstein
[firstname].[lastname]@th-koeln.de
TH Köln, Faculty of Computer Science and Engineering Science
Steinmüllerallee 1, 51643 Gummersbach, Germany
ABSTRACT
When designing or developing optimization algorithms, test functions are crucial to evaluate performance. Often, test functions are not sufficiently difficult, diverse, flexible or relevant to real-world applications. Previously, test functions with real-world relevance were generated by training a machine learning model based on real-world data. The model estimation is used as a test function. We propose a more principled approach using simulation instead of estimation. Thus, relevant and varied test functions are created which represent the behavior of real-world fitness landscapes. Importantly, estimation can lead to excessively smooth test functions while simulation may avoid this pitfall. Moreover, the simulation can be conditioned by the data, so that the simulation reproduces the training data but features diverse behavior in unobserved regions of the search space. The proposed test function generator is illustrated with an intuitive, one-dimensional example. To demonstrate the utility of this approach it is applied to a protein sequence optimization problem. This application demonstrates the advantages as well as practical limits of simulation-based test functions.
CCS CONCEPTS
• Theory of computation → Mathematical optimization; Gaussian processes; • Computing methodologies → Modeling and simulation;
KEYWORDS
Optimization, Test function generator, Simulation, Modeling
ACM Reference format:
Martin Zaefferer, Andreas Fischbach, Boris Naujoks, Thomas Bartz-Beielstein. 2017. Simulation-based Test Functions for Optimization Algorithms. In Proceedings of GECCO '17, Berlin, Germany, July 15-19, 2017, 8 pages.
DOI: http://dx.doi.org/10.1145/3071178.3071190
1 INTRODUCTION
Performance tests, or benchmarks, are crucial for the development, improvement and understanding of optimization algorithms. Test functions are required to evaluate the performance of algorithms. It is particularly difficult to provide test functions for expensive optimization problems, where evaluations require high
computational effort or other limited resources. Often, expensive optimization problems necessitate access to complex, confidential simulation codes, or access to expensive laboratory equipment. Even if access is granted, the evaluation costs make extensive tests infeasible. Only a limited number of expensive problems is openly available to the research community.
Thus, we need a generator for test functions which satisfy certain criteria. Besides important features, which are listed in well-known publications (e.g., [6, 30]), we focus on the following criteria.
(C.1) Difficulty: Test functions should be sufficiently complex [21]. Whitley [30] states that test “problems should be resistant to hill-climbing”.
(C.2) Diversity: The problem instances are varied, randomized and not known a priori. This criterion is a standard in machine learning, because the available set of problem instances is partitioned into a training, a validation, and a test set [13].
(C.3) Flexibility: They should not be restricted to one specific problem instance. Flexibility is used in machine learning to characterize the number of parameters that are necessary to specify a model [15]. Flexibility will be used in our framework for characterizing functions. Some authors use the term “generalizability” to characterize this feature [2].
(C.4) Relevance: They should reflect real-world problem behavior.
(C.5) Evaluation cost: They should be inexpensive to evaluate, allowing for numerous tests.
One way to provide test functions that satisfy criteria (C.1)-(C.5) is to generate data-driven regression models of the objective function and use the derived predictor to test algorithms [2, 7, 8, 25]. This approach has an inherent problem: Almost all regression models interpolate the data they are trained on and hence yield smoothed fitness landscapes. Thus, the derived instances may be less rugged and easier to solve than the real-world problem. Therefore, data-driven test functions should in addition respect the following criterion:
(C.6) Non-smoothing, i.e., the test instances reflect the ruggedness of the original problem.
Thus, the main research question examined in this study is: How to generate test functions that satisfy criteria (C.1)-(C.6)? To that end, we propose a general framework for generating test functions based on real-world data using simulation rather than estimation (prediction) techniques. Decisively, a simulation has the potential to avoid the pitfall of smoothing. Furthermore, it provides a principled way to generate diverse test function instances. To illustrate these features, we train Kriging models [5] on real-world data. The key idea is to use non-conditional and conditional simulation [5] of Kriging models to generate varied problem instances that do not smoothen the potentially rugged structure of the real-world problem. The simulation-based test functions can reflect the behavior of the real-world problem rather than just the data itself. These test functions are especially interesting for expensive optimization problems but obviously also apply to the cheap case.
Related approaches will be explained in Sec. 2. Afterwards, Sec. 3 will provide the details of the (non-)conditional Kriging simulation-based generator. A simple example is given in Sec. 4. To present a more complex application, the method is applied to a real-world data set in Sec. 5. Here, we also investigate practical limits of the approach. Section 6 discusses the applicability of our approach. Finally, Sec. 7 presents a summary and outlook of this work.
2 RELATED WORK
The most basic test function is a simple mathematical expression, e.g., the sphere function, which reflects the behavior of many test functions in the vicinity of the optimum [21]. In many studies, sets of such expressions are used as test beds, e.g., combining the sphere, Branin, or Rosenbrock functions. These test suites should obey certain principles, e.g., nonlinearity, non-separability, and scalability [30]. The benefit of using sets of well established functions is that they enable comparability between different studies and can be used to guarantee generalizable results. Still, certain algorithms could easily be tailored to overfit a specific test bed, because the test functions are known in advance, i.e., before the study is performed. Furthermore, the capability of representing complex real-world behavior is probably limited. The generating principle of these classical test function suites [6, 21, 27] can be described as inductive, because single, simple features such as symmetry or multimodality are combined to generate a complex test function.
A more comprehensive approach is taken by the Comparing Continuous Optimizer platform (COCO), also known as the Black Box Optimization Benchmark (BBOB) [11]. BBOB comprises a framework that automates the experimental procedure involved in testing of continuous optimization algorithms. BBOB takes an inductive approach, relying on artificial test functions [12]. BBOB provides an enhanced procedure for post-processing of experimental results to enable a standardized comparison and analysis.
The Gaussian Landscape Generator (GLG), which was proposed by Gallagher and Yuan [10], is also an inductive approach. However, it is not based on a fixed set of functions. Rather, it randomly composes Gaussian curves. The overall fitness value is the maximum of all curves at a given point. One advantage of the GLG is the ability to control the number of local optima. Thus, the complexity of the resulting test function instances can be controlled. Also, the randomized process allows for a large variety of test functions. However, the relevance of the resulting functions is debatable. Similar test function generators are described in [1].
The Krigifier approach creates random Kriging models, or Gaussian processes [29]. With a user-specified trend and covariance structure, the Krigifier randomly creates a process that can be used as a non-linear test function. Thus, varied and difficult functions can be generated based on an inductive approach. The relevance of the resulting functions relies on the assumption that real-world processes are also Gaussian, but it is unclear how the resulting test functions relate to a specific real-world application.
A deductive approach has been employed for a practical application by Rudolph et al. [25]. Deductive approaches take a complex data set and extract important features using data-driven methods. Rudolph et al. aim to improve algorithm performance on the real problem (optimization of a ship propulsion system) by tuning performance on a Kriging surrogate model.
Bartz-Beielstein [2] proposed a deductive approach for optimization benchmarks in general. Data from a real-world system are used to train a model and for generating test function instances. Model parameters can be stochastically varied to enable diversity. Statistical tools such as mixed models are also discussed [4].
Similarly, Flasch [8] and Fischbach et al. [7] used a deductive approach based on Kriging models to generate test functions. Firstly, real-world data is taken from some experiment. Secondly, a Kriging model is trained with the data. The Kriging model is varied by making controlled changes to the model parameters, e.g., the nugget parameter or parameters of the correlation function. Hence, it will be referred to as the parameter-variation approach. Then, the predictor of the varied Kriging model can be used as a test function. In principle, this approach can be applied to arbitrary models and it is not restricted to Kriging or Gaussian processes. An extension by Fischbach et al. [7] takes two problems into account: (i) if a model is insensitive to some parameter, the derived test function instances will be nearly identical and (ii) if a parameter has a drastic impact, a random change may create a function without any resemblance to the original function. Both problems are handled by computing various measures of (dis-)similarity between the test function instances and the unmodified model. The computed values are required to be within user-specified bounds. Thus, simple copies (too similar) and strong distortions (not similar enough) are avoided.
Based on these results, we propose a new deductive simulation approach. The goal is to generate data-driven test functions that fulfill criteria (C.1) to (C.5) and avoid the pitfall of smoothing (C.6). The corresponding methods and required foundations are introduced in the following.
3 SIMULATION-BASED TEST FUNCTION GENERATOR
3.1 Kriging Estimation
Kriging is a modeling procedure that understands observations as realizations of a Gaussian process. A detailed description is given by Forrester et al. [9]. In optimization, Kriging is a popular choice, as it additionally provides an estimate of prediction uncertainty, which can be used to balance exploration and exploitation by computing the expected improvement of candidate solutions [19]. This approach is most famously employed in the Efficient Global Optimization algorithm (EGO) [16]. Our study utilizes Kriging to simulate responses of a Gaussian process.
Kriging approximates the data by modeling the correlation between observations, e.g., using a Gaussian correlation function (kernel) $k(x, x') = \exp(-\theta\, d(x, x'))$, where $x, x' \in \mathcal{X}$. Here, $\mathcal{X}$ is some non-empty set, called the search space. If $\mathcal{X} = \mathbb{R}^d$ then $x$ is a $d$-dimensional real vector. Furthermore, $\theta \in \mathbb{R}$ is a kernel parameter and $d(x, x')$ is a distance function, e.g., $d(x, x') = |x - x'|$ with $x \in \mathbb{R}$.
Based on this, a correlation matrix $K$ is computed, which collects all pairwise correlations of the training data $X = \{x_1, \ldots, x_n\}$. It is used in the predictor as follows:
$$\hat{y}(x) = \hat{\mu} + \mathbf{k}^T K^{-1} (\mathbf{y} - \mathbf{1}\hat{\mu}), \tag{1}$$
where $\mathbf{y}$ are the training observations, $\hat{y}(x)$ is the predicted function value of a new sample $x$, $\hat{\mu}$ represents the process mean, $\mathbf{1}$ is a vector of ones and $\mathbf{k}$ is the column vector of correlations between the set of training samples $X$ and the new sample $x$. All parameters (e.g., $\theta$, $\hat{\mu}$) are determined by Maximum Likelihood Estimation (MLE).
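The predictor in Eq. (1) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: inputs are one-dimensional, the kernel parameter `theta` is fixed by hand instead of being fitted by MLE, and the process mean uses the standard closed-form estimate for a fixed correlation matrix. The function names `corr_matrix` and `kriging_predict` are hypothetical.

```python
import numpy as np

def corr_matrix(A, B, theta):
    """Correlation k(x, x') = exp(-theta * |x - x'|) between two 1-D sample sets."""
    A = np.asarray(A, dtype=float).reshape(-1, 1)
    B = np.asarray(B, dtype=float).reshape(1, -1)
    return np.exp(-theta * np.abs(A - B))

def kriging_predict(X, y, x_new, theta):
    """Eq. (1): y_hat(x) = mu_hat + k^T K^{-1} (y - 1 mu_hat)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    K = corr_matrix(X, X, theta)
    ones = np.ones(len(X))
    # Closed-form estimate of the process mean for a fixed theta (assumption:
    # theta itself is supplied by the caller, not optimized here).
    mu_hat = (ones @ np.linalg.solve(K, y)) / (ones @ np.linalg.solve(K, ones))
    k = corr_matrix(X, x_new, theta)  # shape (n, q): training vs. new samples
    return mu_hat + k.T @ np.linalg.solve(K, y - mu_hat)
```

Evaluating `kriging_predict(X, y, X, theta)` returns the training observations, which is the interpolation property the text refers to.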
3.2 Kriging Simulation
The predictor in Eq. (1) estimates a function value at a new sample $x$. The goal of estimation (or prediction) is to produce values that are as close to the true values as possible. In contrast, the goal of simulation is to produce values whose moments are as close to the moments of the real data as possible [17]. In case of Kriging, the simulation approach creates realizations of a Gaussian process with the same mean and covariances as the modeled process.
While the predictor is also based on the process mean and covariance matrix estimated during model training, the predictor itself does not have these very same properties: it smooths the data and may even be non-stationary [5]. On the other hand, simulations actually have the respective mean and covariances. This is important when generating test functions: a smoothed landscape may obviously lack important features of the real, original fitness landscape, e.g., it may have a smaller number of local optima.
Different approaches to simulation of Gaussian processes exist [5]. Our approach is based on the square root of the covariance matrix [5]. This choice is made for computational reasons. Firstly, a set $X_s$ of $m$ samples is selected. The process will be simulated at these samples. Secondly, the correlation matrix $K_s$ of the set $X_s$ is computed. It is decomposed as $\sigma^2 K_s = C_s = U \Lambda U^T$, where the process variance $\sigma^2$ is determined by MLE and $C_s$ is the covariance matrix with the eigenvector matrix $U$ and diagonal eigenvalue matrix $\Lambda = \mathrm{diag}(\lambda)$. The square root of $C_s$ is $C_s^{1/2} = U\, \mathrm{diag}(\lambda_1^{1/2}, \ldots, \lambda_m^{1/2})\, U^T$ and the simulation is given by
$$\hat{\mathbf{y}}_{nc} = \mathbf{1}\hat{\mu} + C_s^{1/2} \boldsymbol{\epsilon}. \tag{2}$$
Here, $\boldsymbol{\epsilon}$ is a vector of $m$ independent, normally distributed random numbers with zero mean and unit variance.
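The square-root construction and Eq. (2) can be sketched as follows. This is an illustrative sketch only: it assumes one-dimensional samples with the kernel $\exp(-\theta |x - x'|)$, takes `theta`, `sigma2` and `mu_hat` as already estimated values, and clips tiny negative eigenvalues caused by round-off. The names `matrix_sqrt` and `simulate_unconditional` are hypothetical.

```python
import numpy as np

def matrix_sqrt(C):
    """Symmetric square root C^{1/2} = U diag(sqrt(lambda)) U^T."""
    lam, U = np.linalg.eigh(C)
    lam = np.clip(lam, 0.0, None)  # guard against round-off below zero
    return (U * np.sqrt(lam)) @ U.T  # scales column j of U by sqrt(lam[j])

def simulate_unconditional(Xs, theta, sigma2, mu_hat, rng):
    """Eq. (2): y_nc = 1 mu_hat + C_s^{1/2} eps, with eps ~ N(0, I)."""
    Xs = np.asarray(Xs, dtype=float)
    Ks = np.exp(-theta * np.abs(Xs[:, None] - Xs[None, :]))  # correlation matrix
    Cs = sigma2 * Ks                                         # covariance matrix
    eps = rng.standard_normal(len(Xs))
    return mu_hat + matrix_sqrt(Cs) @ eps
```

Each call with a fresh random vector yields one realization, for example `simulate_unconditional(np.linspace(0, 1, 100), 5.0, 1.0, 0.0, np.random.default_rng(1))`.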
3.3 Kriging Conditional Simulation
The goal of conditional simulation is reproducing the moments of the training data. At the same time, the simulation can be made conditional on the training data. That means, the training data is reproduced exactly by the simulation, i.e., $\hat{\mathbf{y}}_s = \mathbf{y}$ if $X_s = X$. Thus, the conditional simulation is a better approximation of the training data, compared to the non-conditional simulation. Still, due to the different goals of estimation and simulation, the conditional simulation error is twice as large as the estimation error [5, 17].
At a first glance, this feature may seem undesirable. The usefulness of conditional simulation can be demonstrated by a simple example given by Lantuejoul [18]. The curves in Fig. 1 represent depth measurements along an undersea cable. The interpolation (thin black line) gives a good estimate of the true depth. The conditional simulation (dashed red line) on the other hand may have a larger error. However, if the goal is to determine the required length of the cable, the estimation approach may severely underestimate the true value, while the simulation may work well. Interestingly, both estimation and simulation are a result of the same trained model, but represent different features.
[Figure 1: Undersea cable depth estimation (black line) and conditional simulation (dashed red line) and the given data (black dots). Axes: distance [km], depth [m]. Based on the example presented in [18].]
The conditional simulation may result in more realistic shapes than the predictor. For example, higher frequency behavior may not be visible in the predictor, but may be visible in a (conditional) simulation [5]. Similar to the non-conditional simulation, different conditional simulation approaches exist. Here, we use a straightforward approach that directly simulates the conditional Gaussian process [28]. This is not necessarily the most efficient choice. But it allows for a rather simple and transparent implementation. A more advanced approach can be substituted when needed, e.g., when growing data size renders the following approach infeasible.
As described in Sec. 3.2, we have the correlation matrix of the training data (observed data) $K$ and the matrix of correlations between the samples to be simulated, $K_s$. Furthermore, $K_x$ denotes the matrix of the cross-correlations between training and simulation samples. Correlations of the combined training and simulation samples can be arranged in a block matrix as follows:
$$K_{all} = \begin{pmatrix} K & K_x^T \\ K_x & K_s \end{pmatrix}.$$
Following [28], the conditioned correlation matrix can be calculated as $K_{cs} = K_s - K_x K^{-1} K_x^T$. Intuitively, identical training and simulation data results in a zero correlation matrix $K_{cs}$, which follows from $K_s = K_x = K$. To simulate the conditional process, $C_{cs} = \sigma^2 K_{cs}$ is used in a similar manner as in (2), with
$$\hat{\mathbf{y}}_c = \hat{\mathbf{y}} + C_{cs}^{1/2} \boldsymbol{\epsilon}. \tag{3}$$
The estimations $\hat{\mathbf{y}}$ of the simulation samples are derived with Eq. (1).
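A compact sketch of Eq. (3) may help to see how the conditioned correlation matrix enters the simulation. It is written under the same assumptions as before (one-dimensional inputs, kernel $\exp(-\theta |x - x'|)$, parameters `theta`, `sigma2` and `mu_hat` supplied rather than fitted); `corr` and `simulate_conditional` are hypothetical names, and the explicit symmetrization is a numerical convenience that is not part of the original description.

```python
import numpy as np

def corr(A, B, theta):
    """Pairwise correlations exp(-theta * |a - b|) for 1-D sample sets."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    return np.exp(-theta * np.abs(A[:, None] - B[None, :]))

def simulate_conditional(X, y, Xs, theta, sigma2, mu_hat, rng):
    """Eq. (3): y_c = y_hat + C_cs^{1/2} eps, conditioned on the data (X, y)."""
    y = np.asarray(y, dtype=float)
    K = corr(X, X, theta)     # training correlations, n x n
    Kx = corr(Xs, X, theta)   # cross-correlations, m x n
    Ks = corr(Xs, Xs, theta)  # simulation correlations, m x m
    y_hat = mu_hat + Kx @ np.linalg.solve(K, y - mu_hat)  # Eq. (1) at Xs
    Kcs = Ks - Kx @ np.linalg.solve(K, Kx.T)              # conditioned correlations
    Kcs = 0.5 * (Kcs + Kcs.T)                             # remove round-off asymmetry
    lam, U = np.linalg.eigh(sigma2 * Kcs)
    root = (U * np.sqrt(np.clip(lam, 0.0, None))) @ U.T   # C_cs^{1/2}
    return y_hat + root @ rng.standard_normal(len(Xs))
```

When the training locations are part of `Xs`, the corresponding rows of the conditioned matrix vanish, so the realization passes through the training observations up to numerical error.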
3.4 Test Function Generator
The main proposal of this work is to use the (conditional) simulation approach to generate test functions as described in Algorithm 1. The function generator first creates or receives data of the problem (line 2-7). Then, a Gaussian process model is trained with that data (line 8) and simulation samples are created (line 9). For each desired test function, a separate simulation is performed (line 11).

Algorithm 1 Simulation-based test function generation
1: Given: number of training samples $n$, simulation samples $m$ (usually $m \gg n$), required test functions $n_{sim}$ and (optionally) the expensive real-world objective function $f(x)$.
2: if $f(x)$ is available then
3:   Create $n$ samples $X = \{x_1, \ldots, x_n\}$.
4:   Determine observations $\mathbf{y}$: $y_i = f(x_i)$.
5: else
6:   Use provided data set $\{X, \mathbf{y}\}$.
7: end if
8: Train Gaussian process model $M$ based on $\{X, \mathbf{y}\}$.
9: Create $m$ samples $X_s = \{x_1, \ldots, x_m\}$.
10: for all $j \in 1, \ldots, n_{sim}$ do
11:   Create (non-)conditional simulations $\hat{\mathbf{y}}_s^{(j)}$ with Eq. (2) or (3).
12:   if $X_s = \mathcal{X}$ then
13:     Simulation $\hat{\mathbf{y}}_s^{(j)}$ is the required $j$-th test function.
14:   else
15:     Provide $j$-th test function as interpolation of simulated samples using Eq. (1): $\hat{y}_s^{(j)}(x) = \hat{\mu} + \mathbf{k}_s^T K_s^{-1} (\hat{\mathbf{y}}_s^{(j)} - \mathbf{1}\hat{\mu})$.
16:   end if
17: end for
If the simulation covers the whole search space, the resulting values already represent the test function. If not, an interpolation step (line 15 in Algorithm 1) is necessary. The chosen simulation approach only simulates at the given sample locations $X_s$, and does not present an explicit formula. Hence, the interpolation is necessary when only a subset of the search space is simulated. To guarantee that the interpolation step actually reproduces the training data in the conditional simulation case, it is useful to ensure that $X \subset X_s$.
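The interpolation step (line 15 of Algorithm 1) can be sketched as a closure that turns one simulated vector into a callable test function. The sketch assumes one-dimensional inputs and the kernel $\exp(-\theta |x - x'|)$; `make_test_function` is a hypothetical name, and `theta` and `mu_hat` are assumed to come from the trained model.

```python
import numpy as np

def make_test_function(Xs, ys, theta, mu_hat):
    """Interpolate simulated values ys at locations Xs with the predictor of Eq. (1)."""
    Xs = np.asarray(Xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    Ks = np.exp(-theta * np.abs(Xs[:, None] - Xs[None, :]))
    weights = np.linalg.solve(Ks, ys - mu_hat)  # K_s^{-1} (y_s - 1 mu_hat), computed once

    def test_function(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        ks = np.exp(-theta * np.abs(x[:, None] - Xs[None, :]))  # correlations to Xs
        return mu_hat + ks @ weights

    return test_function
```

Precomputing the weight vector keeps each later evaluation at a cost linear in $m$, which matters because the resulting function is meant to be evaluated many times by an optimizer.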
The interpolation step has to be used with care. As this step is based on estimation, it may violate the non-smoothing criterion (C.6). However, since $m$ simulation samples are interpolated, rather than just the $n$ training samples, this issue is less severe than in the simple estimation case. The advantages of this test function generator are:
• It can make use of real-world data and does not require access to the actual objective function, providing access to inexpensive test functions (C.5).
• Test functions represent a problem class rather than a single problem. Diverse test functions can be produced at random (C.2).
• Test functions can reproduce the behavior of real-world problems (non-conditional simulation) and the underlying data (conditional simulation), thus satisfying the relevance criterion (C.4).
• Estimation-based test functions do not necessarily provide the most realistic landscape. E.g., higher frequency behavior may be ignored by the predictor. Contrarily, simulation allows to respect such behavior [5]. Thus, simulation may avoid the main pitfall of data-driven test function generation and satisfies the non-smoothing criterion (C.6). Since this avoids overly simplified test problems, this also helps to meet the difficulty (C.1) and relevance (C.4) criteria.
• Kriging models are very flexible (C.3). By adapting the kernel function (or its parameters) most problems can be approximated. Due to this feature, the simulation approach is even independent of the solution representation (data type of $x$) [20, 32].
• Unlike the parameter-variation approach (cf. Sec. 2), the simulation approach is a more principled way of generating diverse test functions (C.2). In the parameter-variation approach, controlled changes of the parameters may have drastic effects on the resulting functions. Simulation, and especially conditional simulation, on the other hand guarantees that certain structures of the real-world data are preserved in the test function.
• Unlike the Krigifier approach, we propose to use data from real-world problems to derive the simulations (C.4). Furthermore, we outline the difficulties of excessive smoothing (C.6), which also affect the Krigifier approach. In addition, conditional simulation is used, which has not been explored by the Krigifier.
Disadvantages or possible problems are:
• The training data may introduce bias. If insufficient data is collected, the model may not learn the problem structure.
• The model selection and configuration may also introduce bias. It may be unreasonable to compare certain surrogate models or their configurations based on this procedure: The models that use the same configuration and type as the simulation model would have an unfair advantage.
• The number, value and location of local and global optima is unknown. This is in contrast to classical test functions, BBOB or the GLG. If required, such features have to be approximated.
• The number of simulation samples $m$ is important to be set to a good value. Very large values of $m$ may lead to computational issues due to memory and time requirements. Rather small values of $m$ may lead to excessive smoothness, due to the final interpolation step. Importantly, this smoothness issue is less severe than in the estimation case: $m$ is not restricted by the cost of evaluation of $f(x)$ (unlike the number of training samples $n$).
• Conditional simulation may produce test functions that have little variation if the trained Kriging model fits the data exceedingly well, thus violating the diversity criterion (C.2). The model estimates low variances and all realizations of the simulation will be nearly identical. To detect such a case, the estimated variances at the simulation sample locations can be compared against a threshold value. In many use-cases, sparsity of training data due to high costs of evaluation will render this issue unlikely. This issue is irrelevant for non-conditional simulation.
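The variance check mentioned in the last item could look roughly like the sketch below. The text does not specify how the variances are aggregated or which threshold is used, so both are assumptions here: the function flags a problem only if every conditional variance falls below a caller-chosen `threshold`. Inputs are one-dimensional, the kernel is $\exp(-\theta |x - x'|)$, and `has_low_variation` is a hypothetical name.

```python
import numpy as np

def has_low_variation(X, Xs, theta, sigma2, threshold):
    """Return (flag, variances): flag is True if all conditional variances at Xs
    are below `threshold`, i.e. conditional simulations would be nearly identical."""
    X = np.asarray(X, dtype=float)
    Xs = np.asarray(Xs, dtype=float)
    K = np.exp(-theta * np.abs(X[:, None] - X[None, :]))    # n x n
    Kx = np.exp(-theta * np.abs(Xs[:, None] - X[None, :]))  # m x n
    # Diagonal of K_cs = K_s - K_x K^{-1} K_x^T, using diag(K_s) = 1.
    reduction = np.einsum("ij,ji->i", Kx, np.linalg.solve(K, Kx.T))
    variances = sigma2 * (1.0 - reduction)
    return bool(np.all(variances < threshold)), variances
```

Only the diagonal of the conditioned matrix is computed, so the check stays cheap even when the number of simulation samples is large.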
4 ONE-DIMENSIONAL EXAMPLE
To demonstrate the intuition behind the simulation-based test function generator, we first present a simple example. The source code can be requested from the authors. The example is based on the real valued, one-dimensional function
$$f_{1dim}(x) = \exp(-20x) + \sin(6x^2) + x, \tag{4}$$
with $x \in [0, 1]$. Here, $n = 5$ samples are created with uniform random sampling and are evaluated with $f_{1dim}$. The Kriging model is trained with the data and is simulated at $m = 100$ locations. The simulation samples include the five training samples. The remaining $m - n = 95$ samples are drawn from a uniform random distribution. This process is repeated $n_{sim} = 10$ times, so that ten test functions are created. Figure 2 shows the objective function, the Kriging estimation and the (non-)conditional simulations.
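The data setup of this example is small enough to write down directly. The following sketch only reproduces the function of Eq. (4) and the sampling design ($n = 5$, $m = 100$, training samples included in the simulation samples); the seed and the variable names are arbitrary choices, and model training and simulation are not shown.

```python
import math
import random

def f_1dim(x):
    """Eq. (4): f(x) = exp(-20 x) + sin(6 x^2) + x on [0, 1]."""
    return math.exp(-20.0 * x) + math.sin(6.0 * x ** 2) + x

rng = random.Random(42)  # arbitrary seed, only for repeatability
n, m = 5, 100

X = [rng.random() for _ in range(n)]          # n uniform training locations
y = [f_1dim(x) for x in X]                    # training observations
Xs = X + [rng.random() for _ in range(m - n)]  # simulation locations include X
```

Placing the training locations at the front of `Xs` makes it easy to check later that a conditional simulation reproduces `y` at those positions.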
[Figure 2: Top: The function $f_{1dim}(x)$ (dashed, black) and the Kriging estimation (solid, black) based on training data (dots). Middle and bottom: 10 realizations of the (conditional) simulations. Panels: Estimation, Non-conditional, Conditional; horizontal axis: $x$.]
The non-conditional simulation test functions do not reproduce the training data. While the resulting functions look rather chaotic, they all share the same covariance structure and hence have similar smoothness, as well as a similar number of optima. The test functions are of similar difficulty as $f_{1dim}(x)$ (C.1), are diverse (C.2), flexible (C.3), relevant to the original problem (C.4), inexpensive (C.5), and sufficiently rugged (C.6). The motivation for using these kinds of test functions would be to test performance on functions that have similar structure as the real objective function, but are not necessarily identical to it.
Contrarily, the training data is reproduced by the conditional simulation test functions. The conditional simulation's deviation from the estimation increases with increasing distance to observed data. In general, the functions are less varied than the ones based on non-conditional simulation, but match the true function $f_{1dim}(x)$ more closely. Hence, the motivation to use conditional simulation would be to estimate performance on potential realizations of the same (black-box) $f_{1dim}(x)$.
5 PROTEIN LANDSCAPE APPLICATION
5.1 Data and Problem
To showcase the application of simulation-based test function generation in practice, this section presents a real-world example. To that end, an openly available data set from the field of computational biology is used [3, 23]. It contains the corresponding fitness values of all DNA sequences of length ten. Here, fitness refers to the affinity to a fluorescent target protein: allophycocyanin. The data set has previously been used for the assessment of evolutionary algorithms, using a finite state machine model [24].
Candidate solutions $x$ are DNA sequences with $d = 10$ bases, i.e., strings with ten letters that are either A, C, T, or G. The fitness $f_{affinity}(x)$ is the result of the complex measurements described in [23], and is part of the data set. It has to be maximized.
5.2 Test Function Generation
The data set comprises all possible 10-base sequences. For our tests, $n = 100$ sequences $x$ are selected (randomly, uniformly) and the corresponding fitness values are taken from the data set. The model $M_{complete}$ is trained with the resulting data and is simulated at $m = 1000$ additional samples, and $n_{sim} = 10$ test functions are created. Since the complete data set is available, there is little motivation for a test function based on conditional simulation. Hence, we use the non-conditional simulation approach. The idea is to create test functions that show similar behavior as the given protein fitness landscape. Since $m$ is smaller than the size of the complete search space, this requires the interpolation step during evaluation of the test function. The derived test functions are denoted with $f_{sim}(M_{complete}, \text{interpolate})$. Here, the first argument refers to the employed model and the second argument specifies that we interpolate between the simulated samples.
To show the effect that the interpolation has on the fitness landscape, we will create two additional sets of simulation-based test functions. In both of these cases, the search space is restricted to a subspace of just 1024 sequences. To that end, the last 5 elements of each sequence are fixed to ACGTA. In the first case, the above described model $M_{complete}$ is used to simulate all 1024 sequences in the subspace. This case will be denoted $f_{sim}(M_{complete}, \text{simulate-only})$. In the second case, a new model $M_{subspace}$ is trained with 100 sequences selected (randomly, uniformly) from the 1024 sequences of the subspace. This case will be denoted $f_{sim}(M_{subspace}, \text{simulate-only})$.
Since the candidate solutions or samples are not real-valued, the correlation function described in Sec. 3.1 can not be used. Instead, the correlation function has to be changed [20, 32]. Here, the exponential correlation function $k(x, x') = \exp(-\theta\, d(x, x'))$ is used with the Hamming distance
$$d(x, x') = \sum_{i=1}^{d} w_i \quad \text{with} \quad w_i = \begin{cases} 1 & \text{if } x_i \neq x'_i, \\ 0 & \text{otherwise.} \end{cases}$$
The Hamming distance also proved to yield good results in other studies [31] and has the additional advantage of low computational cost. The Hamming distance was also used in the original study that introduced the considered data set [23].
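The Hamming-distance kernel is straightforward to express for DNA strings. The following is a small sketch with hypothetical function names (`hamming`, `exp_kernel`); the value of `theta` is left to the caller and would in practice come from model fitting.

```python
import math

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(ca != cb for ca, cb in zip(a, b))

def exp_kernel(a, b, theta):
    """Exponential correlation k(x, x') = exp(-theta * d(x, x')) with Hamming distance d."""
    return math.exp(-theta * hamming(a, b))
```

For instance, `hamming("AAAAAAAAAA", "AAAAACGTAA")` counts the three differing positions, and identical sequences always have correlation 1.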
Since the problem is discrete and of a rather manageable size, we can use brute force to estimate the global optimum of each generated test function. Also, the number of local optima is determined, i.e., the number of samples whose neighbors do not have a better fitness. Here, Hamming neighborhood is employed. That means, the neighbors of a sequence are all sequences that differ in exactly one element from the original one.
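The brute-force count of local optima under the Hamming neighborhood can be sketched as below. It assumes maximization and treats a sequence as a local optimum if no neighbor has strictly better fitness, which is one reading of the definition above; `fitness` is any callable on strings, and the function names are hypothetical. In pure Python this is practical for short sequences and would be slow for the full $4^{10}$ space.

```python
from itertools import product

ALPHABET = "ACGT"

def neighbors(seq):
    """All sequences that differ from seq in exactly one position."""
    for i, current in enumerate(seq):
        for base in ALPHABET:
            if base != current:
                yield seq[:i] + base + seq[i + 1:]

def count_local_optima(fitness, length):
    """Count sequences whose Hamming neighbors all have fitness <= their own."""
    count = 0
    for letters in product(ALPHABET, repeat=length):
        seq = "".join(letters)
        value = fitness(seq)
        if all(fitness(nb) <= value for nb in neighbors(seq)):
            count += 1
    return count
```

A sequence of length $d$ has $3d$ neighbors, so each of the ten-base sequences in the study has 30 neighbors.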
5.3 Landscape Analysis
Firstly, we report some landscape characteristics of the test functions based on simulations in the complete, unrestricted search space, i.e., $f_{sim}(M_{complete}, \text{interpolate})$. Rowe et al. [23] report a correlation length of roughly 4.5. They estimate this value by calculating the auto-correlation of random walks in the fitness landscape. Their result is nicely reproduced by the Kriging model, which is trained with just 100 samples: the correlation length (the reciprocal of the kernel parameter $\theta$) determined by maximum likelihood estimation during model training is 4.48. Another good match is the reported fitness distance correlation: $-0.32$ (in [23]) versus $-0.37$ with standard deviation 0.09 (in this study).
Unfortunately, there is also a strong mismatch. Rowe et al. [23] report that the data set has 6805 local optima. The simulated test functions $f_{sim}(M_{complete}, \text{interpolate})$ have 49 local optima or fewer. This is a major problem, as the test functions do not seem to represent the underlying problem very well. The test functions would be far too easy to solve, violating criteria C.1, C.4, and C.6.
This problem can clearly be linked to the last step of the function generation: interpolation. The interpolation is again based on estimation. It can not represent higher frequency changes in the landscape very well, as it introduces too much smoothness. Essentially, the interpolation between simulated samples will remove potential local optima which would otherwise be present in a simulation of the complete search space. Since the number of simulated samples $m = 1000$ is much smaller than the number of local optima in the real landscape, the resulting test function is necessarily much smoother than the real problem.
This problem can be revealed in a down-sized experiment. By restricting the analysis to a subspace of just 1024 DNA sequences, i.e., $f_{sim}(M_{complete}, \text{simulate-only})$, interpolation can be avoided. This results in landscapes that have between 2 and 7 local optima (in this subspace). This result matches more closely the real landscape's behavior, which should on average have $6805 / 4^5 \approx 6.6$ local optima in a subspace of this size.
If the model is also directly trained as well as simulated in the subspace, i.e., $f_{sim}(M_{subspace}, \text{simulate-only})$, this results in 10 to 19 local optima. The real landscape has exactly 16 optima in the subspace. The earlier violated criteria (C.1, C.4, C.6) are satisfied. In theory, we could do the same pure simulation experiment on the complete search space, but that is computationally infeasible: even just storing the required $4^{10} \times 4^{10}$ covariance matrix is prohibitive.
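The two numbers in this passage can be checked with a few lines of arithmetic. The storage estimate assumes a dense matrix of 8-byte floating point values, which is an assumption of this sketch and not stated in the text.

```python
n_sequences = 4 ** 10   # size of the complete search space
subspace_size = 4 ** 5  # sequences with the last five bases fixed

# Expected number of local optima in a subspace of this size, if the
# 6805 optima were spread evenly over the complete landscape.
expected_optima = 6805 / subspace_size

# Dense covariance matrix over the complete space, 8 bytes per entry (assumed).
matrix_bytes = n_sequences ** 2 * 8
matrix_tebibytes = matrix_bytes / 2 ** 40

print(n_sequences, round(expected_optima, 2), matrix_tebibytes)
```

Under that assumption the full covariance matrix alone would need 8 TiB of memory, before any eigendecomposition is attempted.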
Clearly, this is a central issue. As shown, small discrete search
spaces may allow skipping the interpolation step and the corresponding
problems. Large, rugged search spaces remain a challenge. We
do not resolve this numerical and computational issue in this article,
but rather point to some potential solution approaches. As noted
in Sec. 3, we used rather simple and straightforward simulation
techniques. There are more efficient simulation techniques that
can deal with large numbers of simulation samples. In the
continuous case, one could try to adopt the spectral simulation
technique, which does not directly rely on a set of simulated samples
but is based on a sum of cosine functions [5]. Spectral simulation
does not require computing the complete covariance matrix of the
simulation samples. In the discrete case, Gaussian Markov Random
Field models [26] may be of interest. Here, the Markov property
induces sparsity in the inverse of the covariance matrix, which may
be exploited to deal with large sample sizes.
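
The idea of a simulation built from a sum of cosine functions can be
sketched as follows. This is not necessarily the variant described in
[5]; it is the common random-frequency construction, written under the
assumption of a one-dimensional Gaussian kernel exp(-theta * d^2),
whose spectral density is a zero-mean normal distribution with
variance 2 * theta. The function name and defaults are hypothetical.

```python
import math
import random


def spectral_sample(xs, theta, n_cos=200, rng=random):
    """Draw one approximate realization of a zero-mean Gaussian process
    with kernel exp(-theta * d**2) at the 1-D points xs."""
    # Frequencies follow the spectral density of the assumed kernel.
    omegas = [rng.gauss(0.0, math.sqrt(2.0 * theta)) for _ in range(n_cos)]
    # Independent uniform phase shifts.
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_cos)]
    scale = math.sqrt(2.0 / n_cos)  # gives unit variance at every point
    return [
        scale * sum(math.cos(w * x + p) for w, p in zip(omegas, phases))
        for x in xs
    ]


rng = random.Random(1)
xs = [i / 10 for i in range(11)]
y = spectral_sample(xs, theta=5.0, n_cos=200, rng=rng)
```

The realization can be evaluated at any new point by reusing the drawn
frequencies and phases, and no covariance matrix between simulation
samples is ever formed.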
As the one-dimensional example in Sec. 4 showed, interpolating
a small number of simulation samples should provide satisfactory
results if the problem itself is rather smooth. Hence, it is desirable
to estimate the required number of simulation samples m that
leads to an interpolation that reflects the ruggedness of the actual
problem. Clearly, the model is fully specified once all parameters
are determined. It should be possible to estimate m based on the
resulting covariance structure. In that sense, small parameters θ
of the kernel function yield smoother landscapes that require fewer
simulation samples to approximate. Large θ yield more rugged
landscapes that require more simulation samples. One could also
take an empirical approach to determine m, by increasing it in steps
and observing the convergence of a suitable measure (e.g., some
non-parametric measure of ruggedness of the simulation). Where
possible, expert knowledge about the problem may also help to
determine a suitable m, if, e.g., the number of local optima is known.
Large, rugged search spaces remain a challenge, and should receive
more attention in future research.
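
The empirical approach of increasing m in steps can be sketched as a
simple stopping rule. Everything here is hypothetical: `measure`
stands for any ruggedness measure computed on an interpolated
simulation with m samples, and the relative tolerance is an arbitrary
choice. The toy measure only imitates a quantity that saturates as m
grows.

```python
def choose_m(measure, m_values, rel_tol=0.05):
    """Return the first m at which `measure` changes by less than rel_tol
    relative to the previous step, or None if no step converges."""
    previous = None
    for m in m_values:
        current = measure(m)
        if previous is not None:
            change = abs(current - previous) / max(abs(previous), 1e-12)
            if change < rel_tol:
                return m
        previous = current
    return None


def toy_measure(m):
    """Stand-in for a ruggedness measure that levels off for large m."""
    return 20.0 * (1.0 - 1.0 / m)


m_star = choose_m(toy_measure, [10, 20, 40, 80, 160, 320])
print(m_star)  # prints: 40
```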
At the same time, the experimental results demonstrate that
simulation-based test functions should be preferred to pure
estimation. The estimation of the model $M_{\mathrm{subspace}}$ trained
in the 5-base subspace has only 2 optima (in that same subspace). This
stresses that excessive smoothing may suppress local optima. This
problem extends to any kind of estimation-based test function generator.
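
Counting local optima in a small discrete sequence space can be done
by exhaustive enumeration. The sketch below makes assumptions that
this section does not state: neighbors are sequences at Hamming
distance 1, the objective is minimized, and a local optimum must be
strictly better than all of its neighbors. The toy landscape only
serves to exercise the counter.

```python
from itertools import product

ALPHABET = "ACGT"


def neighbors(seq):
    """Yield all sequences at Hamming distance 1 from seq."""
    for i, base in enumerate(seq):
        for other in ALPHABET:
            if other != base:
                yield seq[:i] + other + seq[i + 1:]


def count_local_optima(fitness, length):
    """Count sequences that are strictly better (lower) than all neighbors."""
    count = 0
    for letters in product(ALPHABET, repeat=length):
        seq = "".join(letters)
        value = fitness(seq)
        if all(value < fitness(nb) for nb in neighbors(seq)):
            count += 1
    return count


def hamming(a, b):
    """Number of positions at which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def toy(seq):
    """Toy landscape with two basins, at AAA and CCC."""
    return min(hamming(seq, "AAA"), hamming(seq, "CCC"))


print(count_local_optima(toy, 3))  # prints: 2
```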
5.4 Performance Analysis
As the landscape analysis showed, the interpolated simulation is
not a good representation of the real problem behavior. Hence, we
use test functions derived from a model $M_{\mathrm{subspace}}$ that is
trained and simulated in the earlier introduced 5-base subspace, i.e.,
$\mathrm{sim}(M_{\mathrm{subspace}}, \text{simulate-only})$. By
restricting the performance analysis to the 5-base subspace, we avoid
the interpolation problem. We use the derived simulation-based test
functions to evaluate the performance of optimization algorithms. Ten
different test functions are created, and each algorithm is run twenty
times on each function, resulting in 200 replications.
In addition, we want to show the advantage of this approach
in comparison to an estimation-based test function. Therefore, a
baseline test function is derived from the estimation (prediction)
of the same model. Finally, the algorithm tests were repeated
on the actual objective function, i.e., directly using the real data.
Since these last two cases only involve a single objective function
instance, all 200 replications were spent on that single function.
The tested optimization algorithm is a variant of EGO for
combinatorial optimization [16, 31]. The algorithm first generates a
set of k samples (randomly, uniformly) and evaluates them with the
objective function (or test function). Different values of the initial
design size parameter k will be tested: $k \in \{5, 10, 20, 50\}$. A
Kriging surrogate model is learned with the resulting data. An
optimization algorithm (here: brute force¹) is used to optimize an
infill criterion. We compare two infill criteria: expected improvement
(EI) and the predicted mean. The former is, e.g., described in [16].
The latter is the predicted mean derived from Eq. (1). The sample that
optimizes the infill criterion will be evaluated by the objective
function and the result is used to update the surrogate model. This
procedure is iterated until a budget of 100 function evaluations is
exhausted. Random search is used as a baseline comparison. Hence, we
compare 9 algorithms: Random search and 8 combinations of the infill
criterion and k.
¹ In large search spaces, evolution strategies or related methods are more appropriate.
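
For readers unfamiliar with the EI criterion, the following sketch
evaluates the standard closed-form expected improvement for a
minimization problem at a single candidate point. Eq. (1) and the
Kriging model are not reproduced here, so the predicted mean and the
predicted standard deviation are simply passed in as numbers; the
function and argument names are hypothetical.

```python
import math


def expected_improvement(mean, sd, y_best):
    """Expected improvement (minimization) of a candidate whose prediction
    is normally distributed with the given mean and standard deviation."""
    if sd <= 0.0:
        # No predictive uncertainty: improvement is deterministic.
        return max(y_best - mean, 0.0)
    z = (y_best - mean) / sd
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (y_best - mean) * cdf + sd * pdf


# Two candidates with the same predicted mean: the more uncertain one
# receives the larger EI, which is what drives exploration.
ei_certain = expected_improvement(mean=1.0, sd=0.1, y_best=1.0)
ei_uncertain = expected_improvement(mean=1.0, sd=1.0, y_best=1.0)
```

Using the predicted mean as the infill criterion corresponds to
ignoring `sd` altogether and always picking the candidate with the
best `mean`.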
Similarly to the COCO framework [11], we use a set of target
values and the respective runtime required to reach these targets
to measure algorithm performance. To that end, the global optimum
$y_{opt} = f(x_{opt})$ is determined by brute force. Based on
the determined optimum, the fitness gap is specified as follows:
$\mathrm{gap}(x) = f(x) - f(x_{opt})$. Finally, evenly spaced targets
of gap are specified on a logarithmic scale:
$\mathrm{target}_{\mathrm{gap}} = 10^{0}, 10^{-0.2}, \ldots, 10^{-6}$.
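
The target values can be generated directly from their exponents. A
minimal sketch, assuming the exponents run from 0 down to -6 in steps
of 0.2 (which gives 31 targets); the names are illustrative.

```python
# Exponents 0, -0.2, ..., -6 correspond to i / 5 for i = 0, ..., 30.
target_gap = [10.0 ** (-i / 5) for i in range(31)]


def gap(y, y_opt):
    """Fitness gap of an observed value y relative to the global optimum."""
    return y - y_opt


print(len(target_gap))  # prints: 31
```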
For each algorithm run, it is recorded after what runtime a certain
target is reached. For each set of test functions, we aggregate the
resulting data by calculating the fraction of all targets reached at a
certain runtime. The resulting curves (fraction of targets reached
against runtime) are called run length distributions [14], empirical
cumulative distribution functions (ECDF) [11] or data profiles [22].
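
The aggregation into an ECDF curve can be sketched as follows. The
data layout is an assumption made for illustration: each run is a list
holding the fitness gap observed at every evaluation, and runtime is
measured in function evaluations.

```python
def ecdf(runs, targets, budget):
    """Fraction of (run, target) pairs reached after 1..budget evaluations.

    runs: list of runs; each run lists the gap observed at every evaluation.
    """
    total = len(runs) * len(targets)
    # reached_at[t]: number of pairs first reached at evaluation t
    reached_at = [0] * (budget + 1)
    for run in runs:
        for target in targets:
            best = float("inf")
            for t, g in enumerate(run[:budget], start=1):
                best = min(best, g)  # best-so-far gap
                if best <= target:
                    reached_at[t] += 1
                    break
    fractions, cumulative = [], 0
    for t in range(1, budget + 1):
        cumulative += reached_at[t]
        fractions.append(cumulative / total)
    return fractions


runs = [[1.0, 0.1, 0.1], [0.5, 0.5, 0.01]]
curve = ecdf(runs, targets=[1.0, 0.1, 0.01], budget=3)
# curve is [2/6, 3/6, 5/6]: one of the six pairs is never reached.
```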
For our experiments, the ECDF plots are depicted in Fig. 3. The
results show that using the EI infill criterion is crucial. Without
EI, algorithm performance is close to the random search baseline.
Results after 100 evaluations seem to be insensitive to k. Earlier
in the runs, smaller design sizes yield superior results. The strong
positive effect of EI can be partly attributed to the ruggedness and
relatively large number of local optima. Since the estimation-based
test function is overly smooth, this effect is less visible there: the
runs without EI are more easily distinguished from the random
search baseline in case of estimation. This stresses the importance
of using simulation rather than estimation for testing. Compared
to estimation, the simulation-based results are much closer to the
results on the true function. The estimation-based test function
severely underestimates the difficulty of the problem. This stresses
that simulation approaches are more suited to satisfy the required
criteria of test function generators, especially with regard to
difficulty (C.1), relevance (C.4) and non-smoothness (C.6).
A slight difference between simulation-based tests and the real
function performance can be observed. Note that the goal of the
simulation-based test functions was not to produce exact surrogate
functions of the true, real-world problem. Rather, it was desired
to create test function instances with similar behavior. Still, the
selected kernel of the Kriging model may be improved. The isotropic
nature of the kernel may not be a perfect choice. Rowe et al. [23]
report that bases at the start of a sequence have larger impacts
than the bases later in the sequence. It may be more exact to use
an anisotropic kernel, which allows learning the importance of
each base. This comes at the cost of introducing additional kernel
parameters.
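
The difference between an isotropic and an anisotropic kernel on
sequences can be illustrated with a small sketch. The kernel form
used here, an exponential of a (weighted) Hamming distance, is an
assumption and may differ from the kernel actually used in the
experiments; the weights are hypothetical values chosen so that early
positions matter more.

```python
import math


def isotropic_kernel(a, b, theta):
    """exp(-theta * Hamming distance): every position counts equally."""
    return math.exp(-theta * sum(x != y for x, y in zip(a, b)))


def anisotropic_kernel(a, b, thetas):
    """One parameter per position, so mismatches are weighted individually."""
    return math.exp(-sum(t for t, x, y in zip(thetas, a, b) if x != y))


thetas = [1.0, 0.5, 0.25, 0.1, 0.1]  # hypothetical, decreasing along the sequence
k_first = anisotropic_kernel("ACGTA", "CCGTA", thetas)  # mismatch at first base
k_last = anisotropic_kernel("ACGTA", "ACGTC", thetas)   # mismatch at last base
```

With these weights, a mismatch in the first base reduces the
correlation far more than a mismatch in the last base, whereas the
isotropic kernel treats both cases identically. The price is one
parameter per position instead of a single θ.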
6 DISCUSSION
One remaining question is: When should we use which test function
generation approach? The answer clearly depends on the context
and goals of the analysis.
[Figure 3 (plot not reproduced): three stacked panels, each showing
curves labeled RandomSearch, Mean.5, Mean.10, Mean.20, Mean.50, EI.5,
EI.10, EI.20 and EI.50. x-axis: log10(eval), ticks from 0 to 2.
y-axis: reached targets / all targets, ticks from 0.0 to 1.0.]
Figure 3: Logarithmic ECDF plots of three test cases: test
functions based on non-conditional simulation (top), estimation
(middle), and the real objective function (bottom).
The labels inside the plot indicate the configuration of the
employed algorithm, that is, whether EI or the predicted
mean was used as an infill criterion and the size of the initial
design. The x-axis depicts the logarithm of the number
of fitness function evaluations (eval).
We strongly recommend using a simulation approach if data-driven,
model-based test functions are desired. Simulation is a
principled way of generating diversity and avoiding too much
smoothness. But simulation-based test functions are not supposed
to replace classical test function sets. These test function sets do
have merits, e.g., their properties and behavior are well understood.
If an algorithm is assessed without any specific application in mind,
a mix of both would be ideal. If an algorithm is assessed with the
desire to determine performance on problems with specific features
(e.g., separability, unimodality), classical test functions are
probably preferable. Contrarily, if an algorithm is assessed in the
context of a specific real-world application (i.e., C.4 is important), a
simulation-based test function generator should be preferred.
In the latter case, if performance on a class of problems with
similar behavior as the real objective function is of interest,
non-conditional simulation would be more appropriate. Conditional
