Schriftenreihe CIplus, Volume 1/2015
Editors: T. Bartz-Beielstein, W. Konen, H. Stenzel, B. Naujoks
Meaningful Problem Instances and Generalizable Results∗
Thomas Bartz-Beielstein
SPOTSeven Lab
Cologne University of Applied Sciences
Steinmüllerallee 1
51643 Gummersbach
thomas.bartz-beielstein@fh-koeln.de
www.spotseven.de
February 12, 2015
Abstract
Computational intelligence methods have gained importance in several real-world domains such as process optimization, system identification, data mining, or statistical quality control. Tools that determine the applicability of computational intelligence methods in these application domains in an objective manner are missing. Statistics provide methods for comparing algorithms on certain data sets. In the past, several test suites were presented and considered as state of the art. However, there are several drawbacks of these test suites, namely: (i) problem instances are somehow artificial and have no direct link to real-world settings; (ii) since there is a fixed number of test instances, algorithms can be fitted or tuned to this specific and very limited set of test functions; (iii) statistical tools for comparisons of several algorithms on several test problem instances are relatively complex and not easy to analyze. We propose a methodology to overcome these difficulties. It is based on standard ideas from statistics: analysis of variance and its extension to mixed models. This chapter combines essential ideas from two approaches: problem generation and statistical analysis of computer experiments.
∗This is a preprint of the publication T. Bartz-Beielstein. How to create generalizable results. In J. Kacprzyk and W. Pedrycz, editors, Springer Handbook of Computational Intelligence, chapter 56. Springer, 2015 (in print). The original publication is available at www.springerlink.com
1 Introduction
Computational intelligence (CI) methods have gained importance in several real-world domains such as process optimization, system identification, data mining, or statistical quality control. Tools that determine the applicability of CI methods in these application domains in an objective manner are missing. Statistics provide methods for comparing algorithms on certain data sets. In the past, several test suites were presented and considered as state of the art. However, there are several drawbacks of these test suites, namely:
• problem instances are mostly artificial and have no direct link to real-world settings;
• since there is a fixed number of test instances, algorithms can be fitted or tuned to this specific and very limited set of test functions. As a consequence, studies (benchmarks) provide insight into how these algorithms perform on this specific set of test instances, but no insight into how they perform in general;
• statistical tools for comparisons of several algorithms on several test problem instances are relatively complex and not easy to analyze.
We propose a methodology to overcome these difficulties. This methodology, which generates problem classes rather than using one instance, is constructed as follows.
1. First, we pre-process the underlying real-world data.
2. In a second step, features from these data are extracted. This extraction relies on the assumption that mathematical variables can be used to represent real-world features. For example, if we are using time-series data, decomposition techniques can be applied to model the underlying data structures. The original time series is deconstructed into a number of component series, where each of these reflects a certain type of behavior, e.g., a trend or seasonality [9]. We obtain an analytic model of the data.
3. Then, we parameterize this model. Based on this parameterization and randomization, we can generate infinitely many new problem instances.
4. If no real-world data are available, problem instances can be generated using test-problem generators. The generation of test problems, which are well-founded and have practical relevance, has been an ongoing field of research for several decades.
5. From this infinite set, we can draw a limited number of problem instances, which will be used for the comparison.
6. Since problem instances are selected randomly, we apply random and mixed models for the analysis [15]. Mixed models include fixed and random effects. A fixed effect is an unknown constant. Its estimation from the data is a common practice in analysis of variance (ANOVA) or regression. A random effect is a random variable. We estimate the parameters that describe its distribution because, in contrast to fixed effects, it makes no sense to estimate the random effect itself.
This chapter combines ideas from two approaches: problem generation and statistical analysis of computer experiments. The work presented by Chiarandini and Goegebeur [11] provides the basis of our statistical analysis. They present a systematic and well-developed framework for mixed models. Related modeling approaches were suggested by McGeoch [14] and Birattari [7]. Gallagher and Yuan [13] present a problem instance (landscape) generator that is parameterized by a small number of parameters, and the values of these parameters have a direct and intuitive interpretation in terms of the geometric features of the landscapes that they produce. Castiñeiras, Cauwer, and O'Sullivan [10] present a parameterizable benchmark generator for bin packing instances based on the well-known Weibull distribution. Using the shape and scale parameters of the Weibull distribution, the authors generate benchmarks that contain a variety of item size distributions. They report that for all bin capacities, the number of bins required in an optimal solution increases as the Weibull shape parameter increases. Using this feature, scalability is enabled.
Basically, this chapter tries to find answers to the following fundamental questions in experimental research.
(Q-1) How to generate problem instances?
(Q-2) How to generalize experimental results?
The chapter is structured as follows. Section 2 introduces real-world and artificial optimization problems. Algorithms are described in Sect. 3. Objective functions and statistical models are introduced in Sect. 4. These models take problem and algorithm features into consideration. Section 5 presents case studies, which illustrate our methodology. This chapter closes with a summary and an outlook.
2 Features of Optimization Problems
2.1 Problem Classes and Instances
Nowadays, it is a common practice in optimization to choose a fixed set of problem instances in advance and to apply classical ANOVA or regression analysis. In many experimental studies a few problem instances $\pi_i$ ($i = 1, 2, \ldots, q$) are used and results of some runs of the algorithms $\alpha_j$ ($j = 1, 2, \ldots, h$) on these instances are collected. The instances can be treated as blocks and all algorithms are run on each single instance. Results are grouped per instance $\pi_i$. Analyses of these experiments shed some light on the performance of the algorithms on those specific instances. However, the interest of the researcher should not be just the performance of the algorithms on the specific instances chosen, but rather the generalization of the results to the entire class $\Pi$. Generalizations about the algorithm's performance on new problem instances are difficult or impossible in this setting.
Based on ideas from Chiarandini and Goegebeur [11], to overcome this difficulty, we propose the following approach: A small set of problem instances $\{\pi_i \in \Pi \mid i = 1, 2, \ldots, q\}$ is chosen at random from a large set, or class $\Pi$, of possible instances of the problem. Problem instances are considered as factor levels. However, this factor is of a different nature from the fixed algorithmic factors in the classical ANOVA setting. Indeed, the levels are chosen at random and the interest is not in these specific levels but in the problem class $\Pi$ from which they are sampled. Therefore, the levels and the factor are random. Consequently, our results are not based on a limited, fixed number of problem instances. They are randomly drawn from an infinite set, which enables generalization.
2.2 Feature Extraction and Instance Generation
A problem class $\Pi$ can be generated in different manners. We will consider artificial and natural problem class generators. Artificially generated problems allow feature generation based on some predefined characteristics. They are basically theory driven, i.e., the researcher defines certain features such as linearity or multimodality. Based on these features, a model (formula) is constructed. By integrating parameters into this formula, many problem instances can be generated by parameter variation. We will exemplify this approach in the following paragraph. The second way, which will generate natural problem classes, uses a three-stage approach. First, the real-world system and its components are described. Then, features are extracted from the real-world system. Based on this feature set, a model is defined. Adding parameters to this model, new problem instances can be generated. There is also a third way to "generate" test instances: if we are lucky, many data are available. In this case, we can sample a limited number of problem instances from the larger set of real-world data. The statistical analysis is similar for these three cases.
2.2.1 Artificial Test Functions
Several problem instance generators have been proposed over the last years. For example, Gallagher and Yuan present a landscape test generator, which can be used to set up problem instances for continuous, bound-constrained optimization problems [13]. The Max-Set of Gaussian Landscape Generator (MSG) uses the maximum of $m$ weighted Gaussian functions
$$G(x) = \max_{i \in \{1, 2, \ldots, m\}} \bigl( w_i g_i(x) \bigr),$$
where $g: \mathbb{R}^n \to \mathbb{R}$ denotes an $n$-dimensional Gaussian function
$$g(x) = \left( \frac{\exp\left( -\frac{1}{2} (x - \mu) \Sigma^{-1} (x - \mu)^T \right)}{(2\pi)^{n/2} |\Sigma|^{1/2}} \right)^{1/n},$$
$\mu$ is an $n$-dimensional vector of means, and $\Sigma$ is an $(n \times n)$ covariance matrix. The mean of each Gaussian corresponds to an optimum on the landscape and the location of all optima is known. The global optimum is the one with the largest value. We will use the MSG problem instance generator in Sect. 5 to demonstrate our approach.
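The two MSG formulas can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption that is not part of [13]: every covariance matrix is taken to be diagonal, so that the inverse and the determinant can be computed without a linear algebra library. The function and parameter names are hypothetical.

```python
import math


def gaussian_component(x, mean, variances):
    """n-th root of an n-dimensional Gaussian density (diagonal covariance assumed)."""
    n = len(x)
    # Quadratic form (x - mu) Sigma^{-1} (x - mu)^T for a diagonal Sigma.
    quad = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, variances))
    det = math.prod(variances)  # determinant of a diagonal matrix
    density = math.exp(-0.5 * quad) / ((2.0 * math.pi) ** (n / 2.0) * math.sqrt(det))
    return density ** (1.0 / n)


def msg_landscape(x, means, variances, weights):
    """G(x): maximum over the m weighted Gaussian components."""
    return max(
        w * gaussian_component(x, m, v)
        for m, v, w in zip(means, variances, weights)
    )


# Two components in two dimensions; the first one carries the larger weight.
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
weights = [1.0, 0.5]
value_at_global_optimum = msg_landscape([0.0, 0.0], means, variances, weights)
value_at_local_optimum = msg_landscape([3.0, 3.0], means, variances, weights)
```

Because each component attains its maximum at its own mean, the landscape value at a mean is determined by the weight and the covariance of that component, provided no other component dominates there.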
2.2.2 Natural Problem Classes
This section exemplifies the three fundamental steps for generating real-world problem instances, namely
1. Describing the real-world system and its data
2. Feature extraction and model construction
3. Instance generation.
We will illustrate this procedure by using the classic Box and Jenkins airline data [8]. These data contain the monthly totals of international airline passengers from 1949 to 1961. The feature extraction is based on methods from time-series analysis. Because of its simplicity, the Holt-Winters method is popular in many application domains. It is able to adapt to changes in trends and seasonal patterns. The Holt-Winters prediction function requires the estimation of three parameters, i.e., $\alpha$, $\beta$, and $\gamma$, which can be estimated from the original time-series data. Their optimal values are determined by minimizing the squared one-step prediction error. To generate new problem instances, these parameters can be slightly modified. Based on these modified values, the model is re-fitted. Finally, we can extract the new time series. One typical result from this instance generation is shown in Fig. 1. Bartz-Beielstein [2] describes this procedure in detail.
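The instance generation idea can be sketched as follows. The sketch makes several assumptions that are not taken from [2]: it uses the additive form of Holt-Winters smoothing with a simple initialization, it takes the smoothing parameters as given instead of optimizing them, and it perturbs each parameter by a small relative random amount. All function names and the `scale` parameter are hypothetical.

```python
import random


def holt_winters_additive(series, period, alpha, beta, gamma):
    """One-step-ahead predictions of additive Holt-Winters smoothing.

    Requires at least two full seasons of data for the initialization.
    """
    level = sum(series[:period]) / period
    trend = (sum(series[period:2 * period]) - sum(series[:period])) / period ** 2
    season = [series[i] - level for i in range(period)]
    predictions = []
    for t in range(period, len(series)):
        s = season[t % period]
        predictions.append(level + trend + s)  # forecast made before seeing series[t]
        new_level = alpha * (series[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        season[t % period] = gamma * (series[t] - new_level) + (1 - gamma) * s
        level = new_level
    return predictions


def generate_instance(series, period, params, scale=0.05, rng=None):
    """New problem instance: predictions under slightly modified (alpha, beta, gamma)."""
    rng = rng or random.Random()
    perturbed = [
        min(1.0, max(0.0, p * (1.0 + rng.uniform(-scale, scale)))) for p in params
    ]
    return holt_winters_additive(series, period, *perturbed)


# Toy seasonal series with period 4 (not the airline data).
series = [10, 20, 30, 40, 12, 22, 32, 42, 14, 24, 34, 44, 16, 26, 36, 46]
base = holt_winters_additive(series, 4, 0.3, 0.1, 0.2)
instance = generate_instance(series, 4, (0.3, 0.1, 0.2), scale=0.1, rng=random.Random(1))
```

Drawing several instances with different random seeds yields a sample from the problem class defined by the fitted model.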
To illustrate the wide applicability of this approach, we will list further real-world problem domains, which are subject of our current research.
Smart Metering. The development of accurate forecasting methods for electrical energy consumption profiles is an important task. We consider time series collected from a manufacturing process. Each time series contains quarter-hourly samples of the energy consumption of a bakery. A detailed data description can be found in [3].
Water Industry. Canary is a software developed by the United States Environmental Protection Agency (US EPA) and Sandia National Laboratories. Its purpose is to detect events in the context of water contamination. An event is in this context defined as a certain time period where a contaminant significantly deteriorates the water quality. Distinguishing events from (i) background changes, (ii) maintenance and modification due to operation, and (iii) outliers is an essential task, which was implemented in the Canary software. Therefore, deviations are compared to regular patterns and short-term changes. The corresponding data contain multivariate time-series data. They are a selection from a larger dataset shipped with the open source event-detection software CANARY [19].
Figure 1: Holt-Winters problem instance generator, plotting Air Passengers (100 to 600) over Time (1950 to 1960). The solid line represents the real data, the dotted line the predictions from the Holt-Winters model, and the fine dotted line the modified predictions.
Finance. The data are real-world data from intraday foreign exchange (FX) trading. The FX market is a financial market for trading currencies to enable international trade and investment. It is the largest and most liquid financial market in the world. Currencies can be traded via a wide variety of different financial instruments, ranging from simple spot trades over to highly complex derivatives. We are using three foreign exchange (currency rate) time series collected from Bloomberg. Each time series contains hourly samples of the change in currency exchange rate [12].
One typical goal in forecasting is the minimization of the forecast errors, i.e., the differences between real (observed) values, say $y_i$, and predicted values, say $\hat{y}_i$. This goal can be considered as an optimization problem.
As stated in Sect. 2.2, the statistical analysis is similar for artificial and natural problem classes. Our goal can be stated as follows: For a given problem class $\Pi$, which can be artificial or natural, we are trying to determine if an optimization algorithm $\alpha$ or several algorithm instances $\alpha_j$ show similar behavior on randomly selected problem instances $\pi_i \in \Pi$. This question will be formulated as a statistical hypothesis. Based on the related statistical framework, we can determine confidence intervals for the performance of the algorithm on unseen problem instances.
3 Algorithm Features
3.1 Factors and Levels
Evolutionary algorithms (EA) belong to the large class of bio-inspired search heuristics. They combine specific components, which may be qualitative, like the recombination operator, or quantitative, like the population size. Our interest is in understanding the contribution of these components. In statistical terms, these components are called factors. The interest is in the effects of the specific levels chosen for these factors. Hence, we say that the levels and consequently the factors are fixed. Although modern search techniques like sequential parameter optimization or Pareto genetic programming [18] allow multi-objective performance measures (solution quality versus variability or description length), we restrict ourselves to analyzing the effect of these factors on a univariate measure of performance. We will use the quality of the solutions returned by the algorithm at termination as the performance measure.
3.2 Example: Evolution Strategy
Evolution strategies (ES) are prominent representatives of evolutionary algorithms, which include genetic algorithms and genetic programming as well [17]. They can be classified as generic population-based metaheuristic optimization algorithms for global optimization that in some sense mimic natural evolution. Evolution strategies are applied to hard real-valued optimization problems. Mutation is performed by adding a normally distributed random value to each vector component. The standard deviation of these random values is modified by self-adaptation. Evolution strategies can use a population of several solutions. Each solution is considered as an individual and consists of object and strategy variables. Object variables represent the position in the search space, whereas strategy variables store the step sizes, i.e., the standard deviations for the mutation. We analyze the basic ES variant, which has been proposed in [6].
[Figure 2 diagram: initialization and evaluation; test for termination; mating selection; recombination (crossover); mutation; evaluation; environmental selection (replacement)]
Figure 2: The evolutionary cycle, basic working scheme of all ES and EA. Terms common for describing evolution strategies are used, alternative terms are added below in blue.
Mutation means neighborhood-based movement in search space that includes the exploration of the "outer space" currently not covered by a population, whereas recombination rearranges existing information and so focuses on the "inner space". Selection is meant to introduce a bias towards better fitness values. A concrete ES may contain specific mutation, recombination, or selection operators, or call them only with a certain probability, but the control flow is usually left unchanged. Each of the consecutive cycles is termed a generation. The control flow is shown in Fig. 2. Concerning the representation, it should be noted that most empirical studies are based on canonical forms such as binary strings or real-valued vectors, whereas many real-world applications require specialized, problem-dependent ones. Table 1 summarizes important ES parameters. This chapter presents two case studies. The first case study is based on a fixed ES parameter setting, whereas the second case study modifies the recombination operator for object variables. We are convinced that the applicability of the methods presented in this chapter goes far beyond the simplified case studies. Our main contribution is a framework that allows conclusions which are not limited to a small number of problem instances but extend to problem classes.
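The ES ingredients described in this section (object variables, a self-adapted step size as strategy variable, mutation, and selection) can be sketched as follows. This is a deliberately simplified plus-selection variant without recombination and with a single step size per individual; it is not the ES variant from [6], and the default parameter values, the initialization range, and the learning rate are assumptions made for illustration.

```python
import math
import random


def evolution_strategy(f, n, mu=5, lam=20, generations=100, sigma0=1.0, seed=0):
    """Minimize f over R^n with a simple self-adaptive (mu + lambda) ES."""
    rng = random.Random(seed)
    tau = 1.0 / math.sqrt(n)  # learning rate for the step size (common heuristic)
    # Each individual is a tuple: (fitness, object variables, step size).
    population = []
    for _ in range(mu):
        x = [rng.uniform(-5.0, 5.0) for _ in range(n)]
        population.append((f(x), x, sigma0))
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            _, x, sigma = rng.choice(population)  # mating selection
            # Self-adaptation: mutate the strategy variable first (log-normal).
            new_sigma = sigma * math.exp(tau * rng.gauss(0.0, 1.0))
            # Mutation: add normally distributed values to each component.
            child = [xi + new_sigma * rng.gauss(0.0, 1.0) for xi in x]
            offspring.append((f(child), child, new_sigma))
        # Environmental selection: keep the mu best of parents and offspring.
        population = sorted(population + offspring, key=lambda ind: ind[0])[:mu]
    return population[0]


def sphere(x):
    """Simple test function with minimum 0 at the origin."""
    return sum(v * v for v in x)


best_fitness, best_x, best_sigma = evolution_strategy(sphere, n=2)
```

Replacing the mating step by a recombination of two or more parents would give the factor whose levels are varied in the second case study.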
4 Objective Functions
We will use the following optimization framework: An ES is applied as a minimizer on the test function $f(x)$. Formally speaking, let $S$ denote some set, e.g., $S \subseteq \mathbb{R}^n$. We are seeking values $f^*$ and $x^*$ such that $f^* = \min_{x \in S} f(x)$ and $x^* = \arg\min_{x \in S} f(x)$. This approach can be extended in many ways. For example, if $S$ denotes time-series data, then an optimization algorithm can be applied to minimize the empirical mean squared prediction error.
Test problem instances will be drawn from Gallagher's and Yuan's MSG test function generator. The following parameters can be used to specify the MSG generator.
• The number of Gaussian components $m$.
• The mean vector $\mu$ of each component.
• The covariance matrix $\Sigma$ of each component.
• The weight of each component $w_i$.
• A maximum threshold $t \in [0; 1]$ can be specified for local optima and the fitness value of the global optimum $G^*$. Local optima are randomly generated within $[0; t \times G^*]$.
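How a random instance could be drawn from these parameters is sketched below. The sketch only covers the optimum locations and the optimum values; covariance matrices and the resulting weights are omitted. The function name, the search-space bounds, and the use of a seed to identify an instance are assumptions for illustration and are not taken from the MSG implementation.

```python
import random


def random_msg_instance(m, n, t, g_star=1.0, bounds=(-1.0, 1.0), seed=None):
    """Draw optimum locations and optimum values for one MSG-like instance."""
    rng = random.Random(seed)
    lo, hi = bounds
    # One mean vector (optimum location) per Gaussian component.
    means = [[rng.uniform(lo, hi) for _ in range(n)] for _ in range(m)]
    # The first component is the global optimum with value G*; the remaining
    # components are local optima with values drawn from [0, t * G*].
    optimum_values = [g_star] + [rng.uniform(0.0, t * g_star) for _ in range(m - 1)]
    return {"means": means, "optimum_values": optimum_values}


# A small sample of instances from the same class, identified by their seeds.
instances = {seed: random_msg_instance(m=5, n=2, t=0.8, seed=seed) for seed in range(1, 10)}
```

Keeping the class parameters fixed and varying only the seed yields instances that differ randomly but share the same characteristics.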
Table 2: ANOVA table for one-factor fixed and random effects models

Source of    Sum of       Degrees of   Mean         EMS                                                    EMS
variation    squares      freedom      square       fixed                                                  random
Treatment    SS_treat     q - 1        MS_treat     $\sigma^2 + r \sum_{i=1}^{q} \tau_i^2 / (q - 1)$       $\sigma^2 + r\sigma_\tau^2$
Error        SS_err       q(r - 1)     MS_err       $\sigma^2$                                             $\sigma^2$
Total        SS_total     qr - 1
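and
$$\mathrm{MS}_{\mathrm{err}} = \frac{\mathrm{SS}_{\mathrm{err}}}{q(r-1)} = \frac{\sum_{i=1}^{q} \sum_{j=1}^{r} (Y_{ij} - \bar{Y}_{i\cdot})^2}{q(r-1)}.$$
It can be shown that
$$E(\mathrm{MS}_{\mathrm{treat}}) = \sigma^2 + r\sigma_\tau^2 \quad \text{and} \quad E(\mathrm{MS}_{\mathrm{err}}) = \sigma^2, \tag{9}$$
cf. [15]. Therefore, the estimators of the variance components are
$$\hat{\sigma}^2 = \mathrm{MS}_{\mathrm{err}}, \tag{10}$$
$$\hat{\sigma}_\tau^2 = \frac{\mathrm{MS}_{\mathrm{treat}} - \mathrm{MS}_{\mathrm{err}}}{r}. \tag{11}$$
The corresponding ANOVA table is shown in Table 2.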
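The estimators (10) and (11) can be computed directly from a balanced data layout. The following sketch assumes the usual one-way ANOVA sums of squares and a list-of-lists layout `y[i][j]` (run j on instance i); the function name and the returned dictionary keys are hypothetical.

```python
def variance_components(y):
    """ANOVA (method of moments) estimators for the one-factor random effects model.

    y[i][j] is the result of run j on instance i; the design must be balanced
    (q instances with r runs each).
    """
    q = len(y)
    r = len(y[0])
    grand_mean = sum(sum(row) for row in y) / (q * r)
    instance_means = [sum(row) / r for row in y]
    ss_treat = r * sum((m - grand_mean) ** 2 for m in instance_means)
    ss_err = sum((v - m) ** 2 for row, m in zip(y, instance_means) for v in row)
    ms_treat = ss_treat / (q - 1)
    ms_err = ss_err / (q * (r - 1))
    return {
        "mu": grand_mean,
        "ms_treat": ms_treat,
        "ms_err": ms_err,
        "sigma2": ms_err,                         # Eq. (10)
        "sigma2_tau": (ms_treat - ms_err) / r,    # Eq. (11), can become negative
        "f0": ms_treat / ms_err,
    }


# Toy data: q = 2 instances, r = 2 runs each.
estimates = variance_components([[1.0, 3.0], [5.0, 7.0]])
```

Whenever the mean square of the treatments is smaller than the mean square of the errors, the estimate from (11) is negative.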
Based on ANOVA calculations, we obtain with (10) an estimator of the first variance component, $\hat{\sigma}^2 = 11.32854$, and from (11) we obtain the second component, $\hat{\sigma}_\tau^2 = -0.4848257$. The model variance can be determined as $\hat{\sigma}^2 + \hat{\sigma}_\tau^2 = 10.84372$. The mean $\mu = -12.05554$ from (8) can be extracted. Finally, the p value in the ANOVA table is calculated as 0.7979083.
Note that we have obtained a negative variance estimate. Since negative variances are not feasible, we can set such values to zero and proceed with the modified values. A more elegant way is presented in the following.
Restricted maximum likelihood. In some cases, the standard ANOVA, which was used in our example, produces a negative estimate of a variance component. This can be seen in (11): If $\mathrm{MS}_{\mathrm{err}} > \mathrm{MS}_{\mathrm{treat}}$, negative values occur. By definition, variance components are nonnegative. Methods that always yield nonnegative variance components have been developed. Here, we will use restricted maximum likelihood estimators (REML). The ANOVA method of variance component estimation, which is a method of moments procedure, and REML estimation may lead to different results. Output from an R-based analysis with the function lmer from the package lme4 reads as follows (Seed denotes the problem instance) [16]:
    Linear mixed model fit by REML
    Formula: yLog ~ 1 + (1 | Seed)
       Data: samp.df
       AIC   BIC logLik deviance REMLdev
     475.6 483.1 -234.8    469.3   469.6
    Random effects:
     Groups   Name        Variance Std.Dev.
     Seed     (Intercept)  0.000   0.0000
     Residual             10.893   3.3004
    Number of obs: 90, groups: Seed, 9

    Fixed effects:
                Estimate Std. Error t value
    (Intercept) -12.0555     0.3479  -34.65

Figure 5: Left: Q-Q plot of the residuals of raw data. Right: Q-Q plot of the log-transformed responses. Both panels plot Sample Quantiles against Theoretical Quantiles.
Compared to the ANOVA setting, different values for $\hat{\sigma}^2$, $\hat{\sigma}_\tau^2$, and $\mu$ were obtained. However, the REML based analysis also shows that the variability in the response observations can be attributed to the variability of the algorithm.
SAMP-3 Validation of the Model Assumptions. Before performing hypothesis testing based on the models introduced in SAMP-2, the validity of the model assumptions has to be investigated. If the model is adequate, the residuals should exhibit no structure. Residuals are plotted against fitted values to check the assumption of homoscedasticity, and quantile-quantile (Q-Q) plots are used to check if residuals meet the normality assumption. Quantile-quantile plots of the residuals are shown in Fig. 5 for the raw and the log-transformed responses. These plots provide a good way to compare the distribution of a sample with a theoretical distribution. Large deviations from the line indicate non-normality of the sample data. These Q-Q plots indicate that a log transformation of the response might be useful in our setting.
SAMP-4 Hypothesis Testing. Testing hypotheses about individual treatments (instances) is useless, because the problem instances $\pi_i$ are here considered as samples from some larger population of instances $\Pi$. We test hypotheses about the variance component $\sigma_\tau^2$, i.e., the null hypothesis
$$H_0: \sigma_\tau^2 = 0 \quad \text{versus} \quad H_1: \sigma_\tau^2 > 0. \tag{12}$$
Under $H_0$, the algorithm performance is identical on all problem instances ("all treatments are identical"), i.e., $\sigma_\tau^2$ is very small. Based on (9), we conclude that $E(\mathrm{MS}_{\mathrm{treat}}) = \sigma^2 + r\sigma_\tau^2$ and $E(\mathrm{MS}_{\mathrm{err}}) = \sigma^2$ are similar. Under the alternative, variability exists between treatments. Standard analysis shows that $\mathrm{SS}_{\mathrm{err}}/\sigma^2$ is distributed as chi-square with $q(r-1)$ degrees of freedom. Let $F_{u,v}$ denote the $F$ distribution with $u$ numerator and $v$ denominator degrees of freedom. Under $H_0$, the ratio
$$F_0 = \frac{\mathrm{SS}_{\mathrm{treat}}/(q-1)}{\mathrm{SS}_{\mathrm{err}}/(q(r-1))} = \frac{\mathrm{MS}_{\mathrm{treat}}}{\mathrm{MS}_{\mathrm{err}}}$$
is distributed as $F_{q-1,\,q(r-1)}$. To test hypotheses in (8), we require that $\tau_1, \ldots, \tau_q$ are i.i.d. $N(0, \sigma_\tau^2)$, that $\varepsilon_{ij}$, $i = 1, \ldots, q$, $j = 1, \ldots, r$, are i.i.d. $N(0, \sigma^2)$, and that all $\tau_i$ and $\varepsilon_{ij}$ are independent of each other. These considerations lead to the decision rule to reject $H_0$ at the significance level $\alpha$ if
$$f_0 > F(1-\alpha;\, q-1,\, q(r-1)), \tag{13}$$
where $f_0$ is the realization of $F_0$ from the observed data. An intuitive motivation for the form of the statistic $F_0$ can be obtained from the expected mean squares. Under $H_0$ both $\mathrm{MS}_{\mathrm{treat}}$ and $\mathrm{MS}_{\mathrm{err}}$ estimate $\sigma^2$ in an unbiased way, and $F_0$ can be expected to be close to one. On the other hand, large values of $F_0$ give evidence against $H_0$.
Regarding the SAMP case, we obtain the following values: Based on (9) and (13), we can determine the $F$ statistic and the p value. We get $\mathrm{MS}_{\mathrm{treat}} = \mathrm{MS}_{\mathrm{err}} = 10.89275$ and $f_0 = 1$, which results in a large p value: 0.4426363. The null hypothesis $H_0: \sigma_\tau^2 = 0$ from (12) cannot be rejected, i.e., we conclude that there is no instance effect. A similar conclusion was obtained from the ANOVA method of variance component estimation as introduced in Table 2.
SAMP-5 Confidence Intervals and Prediction. An unbiased estimator of the overall mean $\mu$ is
$$\hat{\mu} = \bar{y}_{\cdot\cdot} = \sum_{i=1}^{q} \sum_{j=1}^{r} y_{ij}/(qr).$$
Its variance is given by
$$V(\bar{y}_{\cdot\cdot}) = V\left( \sum_{i=1}^{q} \sum_{j=1}^{r} y_{ij}/(qr) \right) = \frac{r\sigma_\tau^2 + \sigma^2}{qr}.$$
With (9) and (10), we obtain an estimator of the variance of the overall mean $\mu$ as
$$\hat{V}(\bar{y}_{\cdot\cdot}) = \mathrm{MS}_{\mathrm{treat}}/(qr).$$
Since
$$\frac{\bar{Y}_{\cdot\cdot} - \mu}{\sqrt{\mathrm{MS}_{\mathrm{treat}}/(qr)}} \sim t_{q(r-1)},$$
the confidence limits for $\mu$ can be derived as
$$\bar{y}_{\cdot\cdot} \pm t_{1-\alpha/2;\, q(r-1)} \sqrt{\mathrm{MS}_{\mathrm{treat}}/(qr)}. \tag{14}$$
We conclude the SAMP case study with a prediction of the algorithm's performance on a new instance from the same class. Based on (14), we obtain the following 95% confidence interval: [2.6773e-06; 1.262e-05]. Again, confidence intervals from the REML and ANOVA methods are very similar. Summarizing, we can conclude that the ES performs similarly on instances from $\Pi_{\mathrm{MSG}}$, which were generated with Eq. 2.
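The confidence limits in (14) can be sketched as a small function. Since the Python standard library offers no t distribution, the quantile $t_{1-\alpha/2}$ is passed in by the caller (for example, taken from a table); this interface, the function name, and the toy data are assumptions for illustration.

```python
import math


def mean_confidence_interval(y, t_quantile):
    """Confidence limits for the overall mean following Eq. (14).

    y[i][j] is the result of run j on instance i (balanced design);
    t_quantile is the t quantile with the degrees of freedom stated in (14).
    """
    q = len(y)
    r = len(y[0])
    grand_mean = sum(sum(row) for row in y) / (q * r)
    instance_means = [sum(row) / r for row in y]
    ms_treat = r * sum((m - grand_mean) ** 2 for m in instance_means) / (q - 1)
    half_width = t_quantile * math.sqrt(ms_treat / (q * r))
    return grand_mean - half_width, grand_mean + half_width


# Toy data with q = 2 instances and r = 2 runs; 4.303 is the 0.975 quantile
# of the t distribution with 2 degrees of freedom.
lower, upper = mean_confidence_interval([[1.0, 3.0], [5.0, 7.0]], t_quantile=4.303)
```

If the analysis is performed on log-transformed responses, the limits have to be transformed back to the original scale before they are reported.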
5.3 MAMP: Multiple Algorithms, Multiple Problems
In the MAMP case study, fixed effects are included in the conditional structure of (6), which leads to a mixed model. Instead of one fixed algorithm as in the SAMP case, we consider either several algorithms or algorithms with several parameters. Both situations can be treated while considering algorithms as levels of a fixed factor, whereas problem instances are drawn randomly from the population of instances $\Pi_{\mathrm{MSG}}$.
MAMP-1 Algorithm and Problem Instances
MAMP-2 ANOVA and REML Model Building
MAMP-3 Validation of the Model Assumptions
MAMP-4 Hypothesis Testing
a) Random effects
b) Fixed effects
MAMP-5 Confidence Intervals and Prediction
MAMP-1 Algorithm and Problem Instances. We aim at comparing the performance of the ES with different recombination operators over an instance class. More precisely, we have four ES instances using recombination operators {1, 2, 3, 4} and nine instances randomly sampled from the class $\Pi_{\mathrm{MSG}}$ as illustrated in Fig. 3. Each run is repeated ten times. In this study 4 × 9 × 10 = 360 data were used. We are interested in the following questions:
• Is there an instance effect?
• Do the mean performances of the ES with different recombination operators differ?
• Do the instance-algorithm interactions contribute to the variability of the response?
A first visual inspection, which plots the performance of the algorithm within each problem instance, is shown in Fig. 6. In eight of the nine instances the linear regression line has a negative slope and the intercepts do not differ very much. This indicates that there is no significant interaction between the fixed and the random factors.
Figure 6: Four algorithms (ES with modified recombination operators) on nine test problem instances. Each panel represents one problem instance and problem instances are labeled from 124 to 132. Performance (y) is plotted against the level of the recombination operator (objreco, 1 to 4).
MAMP-2 ANOVA and REML Model Building. The variability in the performance measure can be decomposed according to the following mixed-effects ANOVA model:
$$Y_{ijk} = \mu + \alpha_j + \tau_i + \gamma_{ij} + \varepsilon_{ijk}, \tag{15}$$
where $\mu$ is an overall performance level common to all observations, $\alpha_j$ is a fixed effect due to the algorithm $j$, $\tau_i$ is a random effect associated with instance $i$, $\gamma_{ij}$ is a random interaction between instance $i$ and algorithm $j$, and $\varepsilon_{ijk}$ is a random error for replication $k$ of algorithm $j$ on instance $i$. We assume that the $\alpha_j$'s are fixed effects such that $\sum_{j=1}^{h} \alpha_j = 0$ and that the random elements satisfy the following: $\tau_i$ are i.i.d. $N(0, \sigma_\tau^2)$, $\gamma_{ij}$ are i.i.d. $N(0, \sigma_\gamma^2)$, $\varepsilon_{ijk}$ are i.i.d. $N(0, \sigma^2)$, and $\tau_i$, $\gamma_{ij}$, and $\varepsilon_{ijk}$ are mutually independent random variables. Similar to (6), the conditional distribution of the performance measure given the instance and the instance-algorithm interaction is given by
$$Y_{ijk} \mid \tau_i, \gamma_{ij} \sim N(\mu + \alpha_j + \tau_i + \gamma_{ij},\, \sigma^2), \tag{16}$$
with $i = 1, \ldots, q$, $j = 1, \ldots, h$, and $k = 1, \ldots, r$. The marginal model reads (after integrating out the random effects $\tau_i$ and $\gamma_{ij}$):
$$Y_{ijk} \sim N(\mu + \alpha_j,\, \sigma^2 + \sigma_\tau^2 + \sigma_\gamma^2). \tag{17}$$
Based on these statistical assumptions, hypothesis tests can be performed about fixed and random factor effects. Using the mixed model (16), we are interested in testing whether there is a difference between the factor level means $\mu + \alpha_j$ ($j = 1, \ldots, h$). The hypotheses for testing the fixed effects can be formulated as
$$H_0: \alpha_j = 0 \ \ \forall j \quad \text{against} \quad H_1: \exists\, \alpha_j \neq 0. \tag{18}$$
Regarding random effects, tests about particular levels are useless. This is similar to the random-effects model (8). Again, we perform tests about the variance components $\sigma_\tau^2$ and $\sigma_\gamma^2$ instead. These can be formulated as follows:
$$H_0: \sigma_\tau^2 = 0, \quad H_1: \sigma_\tau^2 > 0, \qquad \text{and} \qquad H_0: \sigma_\gamma^2 = 0, \quad H_1: \sigma_\gamma^2 > 0, \tag{19}$$
respectively. If all treatment (problem instance) combinations have the same number of observations, i.e., if the design is balanced, the test statistics for these hypotheses are ratios of mean squares that are chosen such that the expected mean squares of the numerator differ from the expected mean squares of the denominator only by the variance components of the random factor under test. Chiarandini and Goegebeur [11] present the resulting analysis of variance, which is shown in Table 3.
Table 3: Expected mean squares and consequent appropriate test statistics for a mixed two-factor model with a fixed factor with h levels, a random factor with q levels, and r repeats. From [11].

Effects         Mean squares   df              Expected mean squares                                                  Test statistics
Fixed factor    MSA            h - 1           $\sigma^2 + r\sigma_\gamma^2 + rq \sum_{j=1}^{h} \alpha_j^2 / (h-1)$   MSA/MSAB
Random factor   MSB            q - 1           $\sigma^2 + r\sigma_\gamma^2 + rh\sigma_\tau^2$                        MSB/MSAB
Interaction     MSAB           (h - 1)(q - 1)  $\sigma^2 + r\sigma_\gamma^2$                                          MSAB/MSE
Error           MSE            hq(r - 1)       $\sigma^2$

Table 4: ANOVA for the MAMP case

Mean squares  Factors        Df    Sum Sq  Mean Sq  F value  Pr(>F)
MSA           objreco         3    154.59    51.53    11.05  0.0000
MSB           Seed            8    251.79    31.47     6.75  0.0000
MSAB          objreco:Seed   24    185.60     7.73     1.66  0.0288
MSE           Residuals     324   1511.27     4.66

ANOVA Model Building. The ANOVA table for the experiments from the MAMP case study is shown in Table 4. Equating the observed mean squares in the lines of the ANOVA table to their expected values and solving for the variance components leads to the following equations [15]:
$$\hat{\sigma}_\tau^2 = \frac{\mathrm{MSB} - \mathrm{MSAB}}{rh} = 0.593502,$$
$$\hat{\sigma}_\gamma^2 = \frac{\mathrm{MSAB} - \mathrm{MSE}}{r} = 0.306907,$$
$$\hat{\sigma}^2 = \mathrm{MSE} = 4.664423.$$
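The arithmetic behind these three estimates can be retraced with the rounded mean squares from Table 4, using h = 4 algorithm levels and r = 10 repeats. The function name and the dictionary keys below are hypothetical; because the table values are rounded to two decimals, the results agree with the reported estimates only up to rounding.

```python
def mixed_model_components(ms_b, ms_ab, ms_e, h, r):
    """Method-of-moments estimators for the mixed two-factor model.

    ms_b: mean square of the random factor (instances),
    ms_ab: mean square of the interaction, ms_e: error mean square,
    h: number of levels of the fixed factor, r: number of repeats.
    """
    return {
        "sigma2_tau": (ms_b - ms_ab) / (r * h),
        "sigma2_gamma": (ms_ab - ms_e) / r,
        "sigma2": ms_e,
    }


# Rounded mean squares taken from Table 4.
estimates_mamp = mixed_model_components(ms_b=31.47, ms_ab=7.73, ms_e=4.66, h=4, r=10)
```

With these inputs, the instance component is about 0.5935 and the interaction component about 0.307.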
Next, we will compare these results to the REML based analysis of the mixed model.
REML Model Building. We have specified sum contrasts instead of the default treatment contrasts used in lmer(). Again, Seed represents the problem instance, whereas the algorithm instance $\alpha_j$, $j = 1, \ldots, 4$, is represented by objreco.

    Linear mixed model fit by REML
    Formula: yLog ~ objreco + (1 | Seed)
                            + (1 | Seed:objreco)
    Random effects:
     Groups       Name        Variance Std.Dev.
     Seed:objreco (Intercept) 0.30691  0.55399
     Seed         (Intercept) 0.59351  0.77039
     Residual                 4.66442  2.15973
    Number of obs: 360,
    groups: Seed:objreco, 36; Seed, 9

    Fixed effects:
                Estimate Std. Error t value
    (Intercept)  -6.0222     0.2956 -20.370
    objreco1      0.6176     0.2539   2.433
    objreco2      0.6918     0.2539   2.725
    objreco3     -0.6671     0.2539  -2.628
As can be seen from the Random effects section of the REML model output, the estimated variances for the problem instance and the instance-interaction random effects are $\hat{\sigma}_\tau^2 = 0.59351$ and $\hat{\sigma}_\gamma^2 = 0.30691$, respectively. The Fixed effects section presents the estimates of the fixed effects model parameters, i.e., objreco.
MAMP-3 Validation of the Model Assumptions. Again, the check of the diagnostic plots (Fig. 7) reveals that a log transformation of the response improves the model adequacy.
[Figure 7: two panels, each titled "Q-Q plot of residuals", plotting Sample Quantiles against Theoretical Quantiles]
Figure 7: Left: Q-Q plot of the residuals of raw data. Right: Q-Q plot of the
log-transformed responses.
MAMP-4a Hypothesis Testing: Random Effects We will consider random
effects first. Regarding problem instances, tests about levels are meaningless.
Hence, we perform tests about the variance components $\sigma^2_\tau$ and
$\sigma^2_\gamma$, which were presented in (19). First, we are testing the null
hypothesis, which states that the components of the random effects are zero.
Based on the ANOVA from Table 3, we obtain the values for the MAMP case that
are shown in Table 4. The values reveal that there are main factor effects
(fixed and random), but no significant interaction effects.
Alternatively, we can compute the likelihood ratios of models with and
without the factors under observation.
Data: mamp.df
Models:
mamp.lmer2: yLog ~ objreco + (1 | Seed)
mamp.lmer3: yLog ~ objreco + (1 | Seed)
            + (1 | Seed:objreco)
           Df    AIC    BIC  logLik
mamp.lmer2  6 1616.7 1640.0 -802.35
mamp.lmer3  7 1616.6 1643.8 -801.31
            Chisq Chi Df Pr(>Chisq)
           2.0929      1      0.148
These tests also indicate that there are no significant instance-algorithm
interactions. Additional likelihood-ratio tests show that the fixed factor and
random factor effects are significant.
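The likelihood-ratio statistic can be recomputed from the two log-likelihoods in the model comparison. The sketch below uses only the rounded values printed there, so it gives about 2.08 instead of the reported 2.0929, which was presumably computed from unrounded log-likelihoods. The variable names are illustrative.

```r
# Log-likelihoods and parameter counts as printed in the model comparison
logLik2 <- -802.35; df2 <- 6   # model without the interaction term
logLik3 <- -801.31; df3 <- 7   # model with (1 | Seed:objreco)

# Likelihood-ratio statistic: twice the gain in log-likelihood
chisq <- 2 * (logLik3 - logLik2)

# Upper tail of the chi-square distribution with df3 - df2 = 1 degree of freedom
p.value <- pchisq(chisq, df = df3 - df2, lower.tail = FALSE)

print(c(Chisq = chisq, p = p.value))
```

The resulting p value is roughly 0.15, so the interaction term does not improve the fit significantly at the usual 5% level.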
MAMP-4b Hypothesis Testing: Fixed Factor Effects Regarding fixed
factors, we are interested in testing for differences in the factor level means
$\mu + \alpha_i$. These tests were formulated in (18), i.e., we are testing
$H_0$: all $\alpha_i$ are equal to 0 versus $H_1$: at least one $\alpha_j \neq 0$.
Here, we use the test statistic from [15, p. 523]. The appropriate test
statistic for testing that the means of the fixed factor effects are equal,
i.e., that $H_0$ is true, is
\[ F_0 = \frac{MS_A}{MS_{AB}} = \frac{154.59/3}{185.6/24} = 6.663362, \]
with values taken from Table 4. The reference distribution is
$F_{n-1,(n-1)(q-1)}$, which has 3 and 24 degrees of freedom here.
We calculate the p value for the test on the fixed-effect term. The obtained p
value is 0.002, hence the results collected indicate that the factor
recombination (objreco) has a statistically significant impact on the
performance of the algorithm. Using sum contrasts implies that
$\sum_j \alpha_j = 0$. The point estimates for the mean algorithm performance
with the $j$th fixed factor setting can be obtained by
$\mu_{\cdot j} = \mu + \alpha_j$. The fixed factor effects can be estimated in
the mixed model as
\[ \hat{\mu} = \bar{y}_{\cdot\cdot\cdot}, \qquad \hat{\alpha}_j = \bar{y}_{\cdot j \cdot} - \bar{y}_{\cdot\cdot\cdot}, \]
which results in the following estimates: $\hat{\alpha}_1 = 0.6175519$,
$\hat{\alpha}_2 = 0.6918047$, $\hat{\alpha}_3 = -0.6671266$, and
$\hat{\alpha}_4 = -0.6423659$.
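The test statistic and its p value can be reproduced from the sums of squares and degrees of freedom quoted above. This is a minimal sketch in base R; the variable names are illustrative.

```r
# Sums of squares and degrees of freedom as quoted from Table 4
SS.A  <- 154.59; df.A  <- 3    # fixed factor (objreco)
SS.AB <- 185.6;  df.AB <- 24   # instance-algorithm interaction

# In the mixed model the fixed factor is tested against the interaction
# mean square, not against the residual mean square
F0 <- (SS.A / df.A) / (SS.AB / df.AB)

# Upper tail probability of the F distribution with (3, 24) degrees of freedom
p.value <- pf(F0, df1 = df.A, df2 = df.AB, lower.tail = FALSE)

print(c(F0 = F0, p = p.value))
```

The p value lies between 0.001 and 0.01, in line with the reported value of 0.002.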
The same estimates were obtained with the REML analysis, as can be seen
from the REML model output shown above. The corresponding fixed effects are
listed in the Fixed effects section of the REML output. For example, we
obtain the following value: objreco1 = $\hat{\alpha}_1 = 0.6176$.
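The link between the REML output and the ANOVA quantities can be cross-checked with a few lines of arithmetic. The sketch below assumes a balanced design with 9 instances and 10 repeats per instance-algorithm cell, inferred from the 360 observations in 36 groups. All numbers are copied from the output and the text; the variable names are illustrative.

```r
# Fixed effects from the REML output (sum contrasts)
mu    <- -6.0222
alpha <- c(0.6176, 0.6918, -0.6671)

# The fourth effect follows from the sum-to-zero constraint
alpha4 <- -sum(alpha)

# Point estimates of the level means mu + alpha_j, j = 1..4
level.means <- mu + c(alpha, alpha4)

# Expected interaction mean square sigma^2 + r * sigma_gamma^2 with r = 10,
# compared with MS_AB = 185.6 / 24 from the ANOVA
r <- 10
ms.ab.reml  <- 4.66442 + r * 0.30691
ms.ab.anova <- 185.6 / 24

# Standard error of an effect estimate in the balanced design:
# sqrt((a - 1) / a * MS_AB / (q * r)) with a = 4 levels and q = 9 instances
se.alpha <- sqrt(3 / 4 * ms.ab.anova / (9 * r))

print(round(c(alpha4 = alpha4, se.alpha = se.alpha), 4))
print(round(level.means, 4))
```

The derived fourth effect is close to the reported $\hat{\alpha}_4$, and the standard error agrees with the value 0.2539 in the REML output.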
MAMP-5 Confidence Intervals and Prediction We generate paired
comparisons plots, which are based on confidence intervals. The wrapper function
intervals() from Chiarandini and Goegebeur [11] was used for visualizing
these confidence intervals as shown in Fig. 8. When intervals overlap, we
conclude that there is no significant difference. Here, we can conclude that the
recombination operators (1) and (2) show a similar performance, whereas the
performances of (3) and (2) are different. Intermediate recombination of
the object variables, i.e., (3) and (4), results in a significant improvement of
the performance.
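The pairwise conclusions can be checked with a simple least-significant-difference calculation. This sketch is not the intervals() wrapper from [11]. It assumes unadjusted 95% t intervals, the interaction mean square as error term, and a balanced design with 9 instances and 10 repeats, inferred from the REML output. The fourth effect is derived from the sum-to-zero constraint.

```r
# Effect estimates for the four recombination settings (sum contrasts)
est <- c(0.6176, 0.6918, -0.6671, -0.6423)
names(est) <- paste0("objreco", 1:4)

# Error term for fixed-factor comparisons in the mixed model
MS.AB <- 185.6 / 24
df.AB <- 24
n.per.level <- 9 * 10   # instances x repeats per recombination setting

# Standard error of a difference between two level means
se.diff <- sqrt(2 * MS.AB / n.per.level)

# Least significant difference at the 5% level (no multiplicity adjustment)
lsd <- qt(0.975, df = df.AB) * se.diff

# Matrix of pairwise differences; TRUE marks a significant difference
diffs <- outer(est, est, "-")
signif.diff <- abs(diffs) > lsd
print(signif.diff)
```

Under these assumptions, settings (1) and (2) do not differ significantly, nor do (3) and (4), whereas each of (1) and (2) differs from each of (3) and (4).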
6 Summary and Outlook
In order to answer question (Q-1), we propose an approach to generate
natural problem classes, which are based on real-world data. If no such data are
available, artificial problem generators such as MSG can be used. Since our
approach uses a model, say $M$, to generate new problem instances, one
conceptual problem arises: this approach is not applicable if the final goal is
the determination of a model for the data, because $M$ is by definition the best
model in this case and the search for good models will result in $M$. But there
is a simple solution to this problem. In this case, the feature extraction and