CIplus
Band 3/2016
Modeling and Optimization of a Robust Gas Sensor
Margarita A. Rebolledo C., Sebastian Krey, Thomas Bartz-Beielstein, Oliver Flasch, Andreas Fischbach, Jörg Stork
MODELING AND OPTIMIZATION OF A
ROBUST GAS SENSOR
Margarita A. Rebolledo C., Sebastian Krey, Thomas Bartz-Beielstein,
Oliver Flasch, Andreas Fischbach, Jörg Stork
SPOTSeven Lab, TH Köln University of Applied Sciences
firstname.secondname@th-koeln.de
Abstract  In this paper we present a comparison of different data-driven modeling methods. A first instance of a data-driven linear Bayesian model is compared with several linear regression models, a Kriging model and a genetic programming (GP) model. The models are built on industrial data for the development of a robust gas sensor. The data contain a limited number of samples and a high variance. The mean squared error of the models on a test dataset is used as the comparison criterion. The results indicate that standard linear regression approaches as well as Kriging and GP show good results, whereas the Bayesian approach, despite the fact that it requires additional resources, does not lead to improved results.
Keywords: Bayesian modeling, BMA, Design of experiments, Genetic programming, Linear regression, Lasso, Kriging.
1. Introduction
Theoretically, there are many advantages to the implementation of Bayesian analysis [5]. The use of Bayesian models might represent a good alternative for industrial applications, as they produce more informative results. The generation of a data-driven model to optimize the development of a carbon-monoxide sensor provides an opportunity to test these assertions on limited and sparse data. As a first approach, Bayesian robust linear regression is implemented and compared to standard regression methods and a genetic programming approach. Our goal is to learn how the tested methods differ in performance when applied to this kind of data, and to set out considerations for future work with Bayesian models in a more demanding fashion.
In recent years the need to reduce air pollution levels has gained more importance in the automotive industry. Increasing the efficiency of the motor combustion process plays an important role in the reduction of pollution levels. This efficiency can be measured indirectly by monitoring the concentrations of carbon monoxide and other harmful gases released into the atmosphere. This paper focuses on the modeling and optimization of a carbon-monoxide in-situ sensor. The sensor should be able to discern the carbon-monoxide concentration apart from the other exhaust gases. This is a difficult goal, because the sensor is exposed to and influenced by the other gases. Thus, the sensor output is not expected to be a direct result of the concentration of the gas of interest. Instead, it will be the result of an underlying process influenced by all the other gases. At the end of the analysis we hope to obtain models from different methods with an improved sensitivity to carbon-monoxide concentrations. The models will be compared in order to check the differences in performance and possible opportunities for improvement.
This paper is structured as follows: Section 2 describes the research configuration, i.e., data and experimental designs. Key features of the algorithms are introduced in Sec. 3. Section 4 presents results from the experiments. Finally, a discussion of the results is given in Sec. 5.
2. Problem
2.1 Data Description
The data were collected following a response surface design of experiments (RS-DoE). This design restricts itself to the maximum and minimum expected concentration values of each gas under normal working conditions. Given the cost and time required for the experiments, only a limited number of samples could be measured. The minimum number of samples required for a good system description was balanced against the practical limit on the number of samples realizable in the industrial testing station. Finally, a sample size of 80 was chosen. A summary of the data is shown in Table 1. This application example is anonymized for confidentiality reasons. The data were standardized, meaning that every variable had its mean subtracted and was then divided by its standard deviation. The different gases are denoted as the variables X1 to X7. The values of interest correspond to the columns denoted Y1 and Y2, which are the sensor measurements. All the models will use this dataset as the training set.
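The standardization step described above can be sketched in a few lines of Python. This is only an illustration: the raw values are hypothetical (the original measurements are confidential), the function name `standardize` is an assumption, and the use of the sample standard deviation (denominator n - 1) is also an assumption, since the text does not state which variant was used.

```python
import statistics

def standardize(values):
    """Return z-scores: (value - mean) / sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(v - mean) / sd for v in values]

# Hypothetical raw concentrations of one gas, in arbitrary units.
raw = [12.0, 15.5, 9.8, 20.1, 14.2, 11.7]
z = standardize(raw)

print([round(v, 2) for v in z])
print(round(statistics.fmean(z), 6), round(statistics.stdev(z), 6))
```

After this transformation each variable has mean zero and unit standard deviation, which matches the "Mean" row of Table 1.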
A general idea of the system behavior can be obtained by examining the correlation between the system outputs and inputs, as shown in Table 2. Some assumptions can be made about the influence each variable has on the sensor output: not all parameters seem to have the same influence on the sensor output. Also, the sensors do not behave identically. Figure 1 shows the effect that the two most strongly correlated parameters, X1 and X4, have on the sensor signal.

Table 1: Overview of the standardized dataset used to generate the models of the sensors. Here every input of the model is denoted by an X and every sensor output is denoted by a Y.

               X1     X2     X3     X4     X5     X6     X7     Y1     Y2
Minimum       −1.13  −1.21  −1.16  −1.13  −1.15  −1.17  −1.00  −1.94  −2.06
1st Quartile  −1.13  −1.21  −1.16  −1.13  −1.15  −1.17  −0.82  −0.63  −0.58
Median         0.09   0.03   0.12   0.08   0.08   0.05  −0.39   0.06   0.09
Mean           0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
3rd Quartile   1.30   1.26   1.40   1.29   1.28   1.28   0.59   0.66   0.67
Maximum        1.30   1.26   1.40   1.29   1.31   1.28   3.79   2.32   2.28

Table 2: Correlation between the system outputs and inputs of the training dataset.

      X1     X2     X3     X4     X5     X6     X7
Y1   0.34  −0.19  −0.27   0.73   0.01  −0.00  −0.21
Y2   0.31  −0.16  −0.18   0.78   0.00  −0.03  −0.22

A second dataset, denoted the test set, which follows the characteristics of the previously described training set, was made available to validate the results of the obtained models.
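A correlation screening of the kind summarized in Table 2 can be sketched as follows. The data here are synthetic and the coefficients used to generate them are assumptions chosen only so that one input dominates, loosely mimicking the role of X4. The helper `pearson` is a plain implementation of the Pearson correlation coefficient and is not taken from the original study.

```python
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

random.seed(0)
n = 80  # same sample size as the training set
x1 = [random.gauss(0, 1) for _ in range(n)]
x4 = [random.gauss(0, 1) for _ in range(n)]
x5 = [random.gauss(0, 1) for _ in range(n)]
# Hypothetical sensor response: strong X4 effect, weaker X1 effect, no X5 effect.
y = [0.3 * a + 0.9 * b + random.gauss(0, 0.3) for a, b in zip(x1, x4)]

corr = {name: pearson(col, y) for name, col in (("X1", x1), ("X4", x4), ("X5", x5))}
for name, value in corr.items():
    print(name, round(value, 2))
```

Such a table only captures linear, pairwise relationships, so it cannot reveal the interaction effects that are modeled later.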
2.2 Experimental Design Considerations
The data described in Sec. 2.1 were obtained during tests based on an experimental design suitable for fitting models using the response surface methodology (RSM) [13]. First experimental results taken from screening design experiments indicated that a first-order polynomial model is not sufficient, due to the cross-sensitivity of the sensors. Therefore, it was decided to use an RSM with two-factor interactions and quadratic effects. In this case central composite designs would have been a logical choice. They are a combination of a box design, typically a full factorial or fractional factorial design, and additional star or center points.
A full factorial design (FFD) with three levels for each of the seven factors, to estimate main effects and all quadratic terms, would lead to $3^7 = 2187$ experiment runs at minimum. If the factors are restricted to two levels, a two-level FFD would still need at least $2^7 = 128$ runs, without any repetitions or center points. A valid system description would need even more runs.

Figure 1: Scatter plots showing the general behavior of Y1 with respect to: a) the influence of X1 and b) the influence of X4. Axes: Concentration X1 (a) or Concentration X4 (b), and Measured response Y1.
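The run counts quoted above follow directly from enumerating all level combinations. A minimal sketch, assuming coded levels -1, 0, 1 (the helper name `full_factorial` is illustrative):

```python
from itertools import product

def full_factorial(levels, n_factors):
    """Enumerate every combination of the given levels over n_factors factors."""
    return list(product(levels, repeat=n_factors))

three_level = full_factorial((-1, 0, 1), 7)  # three levels per factor
two_level = full_factorial((-1, 1), 7)       # two levels per factor

print(len(three_level), len(two_level))
print(three_level[0], two_level[-1])
```

The exponential growth in the number of factors is what makes a full factorial design impractical when each run is costly.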
To reduce the number of runs and still be capable of fitting second-order polynomial models, a Box-Behnken design comes into consideration. However, because a linear constraint on the input variables limits the sum of their values, almost none of the standard design methods meet the requirements. Therefore, a more flexible design is needed. Using the statistical software JMP, the RS-DoE was generated following the given constraints and applying the I-optimality criterion [12, 11]. The design was run and optimized a total of 80 times with 80 data points in order to obtain the best possible inference accuracy.
3. Algorithms
3.1 OLS
A linear model estimated by ordinary least squares (OLS) is the natural first modeling attempt for data generated by an experimental design. Our baseline model for the comparison of the different modeling methods is the linear main effects model
\[ \text{Model 1:} \quad \hat{y} = \beta_0 + \sum_{i=1}^{7} \beta_i x_i. \]
In this work we used an RSM design, so besides the main effects the parameters of all two-way interactions and quadratic terms of the input variables can be estimated. This results in the full linear model
\[ \text{Model 7:} \quad \hat{y} = \beta_0 + \sum_{i=1}^{7} \beta_i x_i + \sum_{i=1}^{7} \sum_{j=i}^{7} \beta_{ij} x_i x_j. \]
Based on the full linear model 7 we applied variable selection based on an analysis of variance to obtain a sparser model, which can be interpreted more easily. With an F-test p-value threshold of $\alpha = 0.01$ as the decision boundary for inclusion in the final model, we obtained the model
\[ \text{Model 2:} \quad \hat{y} = \beta_0 + \sum_{i=1}^{4} \beta_i x_i + \beta_{14} x_1 x_4 + \beta_{34} x_3 x_4. \]
The mean squared error (MSE), as defined in Eq. 4 on page 11, was used for our comparisons. While the full linear model 7 has a lower training error, i.e., an MSE of 0.11 for sensor Y1 and 0.10 for sensor Y2, compared to the baseline model (MSE of 0.24 for sensor Y1 and 0.23 for sensor Y2), its prediction performance on the test dataset is very weak (MSE of 7.76 and 9.08). This is a strong indicator of overfitting.
The model 2 has an MSE of 0.79 for sensor Y1 and 0.80 for sensor Y2, which is comparable (for sensor Y1) and a little higher (for sensor Y2) than the baseline model. The residual standard error for the training set is lower (0.43 compared to 0.52 for sensor Y1 and 0.41 compared to 0.51 for sensor Y2), resulting in narrower confidence intervals for the parameters. The adjusted coefficient of determination (adjusted $R^2$) is 0.81 compared to 0.73 for sensor Y1 and 0.83 compared to 0.74 for sensor Y2. This means that the inclusion of the two two-way interactions X1:X4 and X3:X4 contributes substantially to the explanation of the variance in the dataset, while the input variables X5, X6 and X7 contribute very little or nothing and can be left out of the model.
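The contrast between a main effects model and a full quadratic model can be illustrated with a small pure-Python sketch. Everything here is an assumption made for illustration: the data are synthetic, the generating coefficients are hypothetical, and the fit uses the normal equations with Gaussian elimination rather than whatever routine the original study used. The sketch shows how the number of terms grows from 8 to 36 for seven inputs and lets the training and test MSE of both models be compared.

```python
import random
from itertools import combinations_with_replacement

def main_effects(x):
    """Feature row of the main effects model: intercept plus x_1..x_p."""
    return [1.0] + list(x)

def full_quadratic(x):
    """Main effects plus all products x_i * x_j with j >= i."""
    pairs = combinations_with_replacement(range(len(x)), 2)
    return main_effects(x) + [x[i] * x[j] for i, j in pairs]

def solve(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - s) / m[r][r]
    return w

def fit_ols(rows, y):
    """Least squares coefficients from the normal equations (F^T F) w = F^T y."""
    k = len(rows[0])
    ftf = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    fty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(ftf, fty)

def mse(rows, y, w):
    """Mean squared error of the linear predictions rows @ w."""
    preds = [sum(wi * fi for wi, fi in zip(w, r)) for r in rows]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

def make_data(n):
    """Synthetic data: seven inputs, hypothetical X1, X4 and X1:X4 effects."""
    xs = [[random.uniform(-1, 1) for _ in range(7)] for _ in range(n)]
    ys = [0.3 * x[0] + 0.7 * x[3] + 0.4 * x[0] * x[3] + random.gauss(0, 0.5)
          for x in xs]
    return xs, ys

random.seed(1)
x_train, y_train = make_data(80)
x_test, y_test = make_data(80)

results = {}
for name, features in (("main effects", main_effects),
                       ("full quadratic", full_quadratic)):
    f_train = [features(x) for x in x_train]
    f_test = [features(x) for x in x_test]
    w = fit_ols(f_train, y_train)
    results[name] = (len(w), mse(f_train, y_train, w), mse(f_test, y_test, w))
    print(name, results[name])
```

Because the main effects model is nested in the full model, the training MSE of the full model can never be higher; only the test MSE reveals whether the extra 28 terms generalize.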
3.2 Lasso
The Least Absolute Shrinkage and Selection Operator (Lasso) implements a selection method for linear models [9]. It selects solutions with fewer non-zero parameter values, effectively reducing the number of variables upon which the given solution depends. The Lasso trains a linear model with an L1 prior as regularizer.
Given a set of input measurements $X = \{x_i\}_{i=1}^{n}$ and an outcome measurement $y$, the Lasso fits a linear model
\[ \hat{y} = \beta_0 + \sum_{i=1}^{p} \beta_i x_i. \]
Let $\alpha \ge 0$ be a constant. The Lasso uses the following optimization criterion:
\[ \min_{\beta} \; \frac{1}{2n} \lVert X\beta - y \rVert_2^2 \quad \text{under the constraint} \quad \lVert \beta \rVert_1 \le \alpha, \tag{1} \]
where $\lVert \cdot \rVert_1$ and $\lVert \cdot \rVert_2$ denote the L1- and L2-norm, respectively. The positive constant $\alpha$ is a tuning parameter. For large values of $\alpha$, the constraint $\lVert \beta \rVert_1 \le \alpha$ in Eq. 1 has no effect and the usual linear least squares regression is performed. For smaller values of $\alpha$, the solutions are shrunken versions of the least squares estimates. Decreasing the value of $\alpha$ forces some of the coefficients $\beta_i$ to become exactly zero, i.e., choosing $\alpha$ amounts to selecting the number of predictors to use in a regression model. The Lasso can recover the exact set of non-zero weights (under certain conditions). Coordinate descent is used to fit the coefficients.
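Coordinate descent for the Lasso can be sketched in pure Python as follows. Note the swapped-in formulation: instead of the constrained form of Eq. 1, this sketch minimizes the equivalent penalized form $\frac{1}{2n} \lVert y - \beta_0 - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1$, where a larger penalty `lam` plays the role of a smaller bound $\alpha$. The function names, the synthetic data and the fixed iteration count are assumptions for illustration and not the implementation used in the study.

```python
import random

def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_cd(xs, y, lam, n_iter=200):
    """Cyclic coordinate descent for the penalized Lasso objective."""
    n, p = len(xs), len(xs[0])
    cols = [[row[j] for row in xs] for j in range(p)]
    scale = [sum(c * c for c in col) / n for col in cols]
    beta = [0.0] * p
    intercept = sum(y) / n
    resid = [t - intercept for t in y]
    for _ in range(n_iter):
        # The intercept is not penalized: absorb the mean residual.
        shift = sum(resid) / n
        intercept += shift
        resid = [r - shift for r in resid]
        for j, col in enumerate(cols):
            # Correlation of column j with the partial residual (j excluded).
            rho = sum(c * (r + beta[j] * c) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam) / scale[j]
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * c for r, c in zip(resid, col)]
                beta[j] = new
    return intercept, beta

random.seed(2)
xs = [[random.uniform(-1, 1) for _ in range(7)] for _ in range(80)]
# Hypothetical response: only the first and fourth inputs matter.
y = [0.3 * x[0] + 0.7 * x[3] + random.gauss(0, 0.3) for x in xs]

dense = lasso_cd(xs, y, lam=0.0)     # no penalty: least squares fit
sparse = lasso_cd(xs, y, lam=0.05)   # moderate penalty: shrunken coefficients
empty = lasso_cd(xs, y, lam=100.0)   # very large penalty: all coefficients zero

for label, (b0, beta) in (("dense", dense), ("sparse", sparse), ("empty", empty)):
    print(label, round(b0, 3), [round(b, 3) for b in beta])
```

The soft-thresholding step is what produces exact zeros, which is why the Lasso performs variable selection while ordinary shrinkage methods do not.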
3.3 Kriging
Kriging, or Gaussian process regression, is a method for interpolation [10]. The $n$ observations in an arbitrary data set, $Y = \{y_i\}_{i=1}^{n}$, can be regarded as a single point sampled from some multivariate ($n$-variate) Gaussian distribution. The observations and the Gaussian process are related to each other by the covariance or kernel function $k(x_i, x_j)$. Kernel functions compute the distance between two samples in an arbitrary metric and apply a radial function to this distance. The squared exponential kernel, also known as the Gaussian radial basis function (RBF) kernel, is used in our study. This kernel is given by
\[ k_1(x_i, x_j) = \sigma^2 \exp\left(-\theta \lVert x_i - x_j \rVert_2^2\right) \quad \text{with} \quad \theta = \frac{1}{2 l^2}. \tag{2} \]
The RBF kernel can be interpreted as a similarity measure, because values of this kernel decrease with distance. For $\sigma^2 = 1$ they range between zero (in the limit) and one. The length parameter $l$ in Eq. 2 determines the effect of other observations during interpolation at new $x$ values. The RBF kernel was selected in our study because Gaussian processes with this kernel generate smooth functions. Since noisy data were analyzed in our study, the white noise kernel $k_2(x_i, x_j) = \sigma^2 \delta(x_i, x_j)$, where $\delta(x_i, x_j)$ denotes the Kronecker delta function, was added to the RBF kernel. Hence, we used the kernel function $k = k_1 + k_2$.
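The combined kernel $k = k_1 + k_2$ can be sketched as a covariance matrix builder. This is an illustration under stated assumptions: the parameter values are arbitrary, the noise variance is given its own name `noise` to keep it apart from the RBF variance `sigma2`, and the Kronecker delta is implemented by comparing sample indices, which assumes that every row is a distinct observation.

```python
import math

def rbf_kernel(xi, xj, sigma2=1.0, length=1.0):
    """Squared exponential kernel of Eq. 2 with theta = 1 / (2 * length**2)."""
    theta = 1.0 / (2.0 * length ** 2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return sigma2 * math.exp(-theta * sq_dist)

def covariance_matrix(points, sigma2=1.0, length=1.0, noise=0.1):
    """RBF kernel plus white noise on the diagonal (k = k1 + k2)."""
    n = len(points)
    return [
        [rbf_kernel(points[i], points[j], sigma2, length)
         + (noise if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]

# Three hypothetical two-dimensional input points.
points = [(0.0, 0.0), (0.5, 0.0), (2.0, 1.0)]
K = covariance_matrix(points)

for row in K:
    print([round(v, 4) for v in row])
```

Nearby points receive a covariance close to `sigma2`, distant points a covariance close to zero, and the noise term only inflates the diagonal.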
3.4 Robust Bayesian Modeling
From previous knowledge of the combustion process it is expected that not only the main predictors but also the interactions between predictors have an effect on the sensor reading. As a general rule, if the data contain $K$ variables, then the number of possible models is $2^K$. The total number of variables in the dataset with all the interactions included amounts to 22, that is, $2^{22} \approx 4.19 \times 10^{6}$ possible model combinations to describe the sensor reading. To reduce the dimensionality of the problem, Bayesian model averaging (BMA) is implemented. This provides a way to account for the uncertainty in model selection and provides, on average, a better predictive ability [3]. BMA is implemented in the statistical programming language R using the Bayesian Model Sampling (BMS) package [8]. The results show that the predictors X1, X3, X4, X1:X4, X3:X4 and X2 seem to be the most important for a good model.
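The size of the model space can be checked with a short sketch. Each candidate model corresponds to one subset of the $K$ candidate variables, so counting subsets gives $2^K$. The helper name `count_models` is illustrative.

```python
from itertools import combinations

def count_models(k):
    """Count all subsets of k candidate variables by explicit enumeration."""
    return sum(1 for size in range(k + 1)
               for _ in combinations(range(k), size))

small = count_models(4)   # enumeration is feasible for small k
large = 2 ** 22           # closed form for the 22 candidate variables

print(small, large, f"{large:.2e}")
```

Explicit enumeration is only practical for small $K$, which is why sampling-based approaches such as BMA are used to explore the model space.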
Bayesian modeling is the mathematical reallocation of credibility across the parameter values of a model according to what can be inferred from the data. As a first approach, the reduced model containing only 6 out of the 22 variables is defined using a linear relationship. The model was implemented using Just Another Gibbs Sampler (JAGS), which is a program for Bayesian modeling using Markov Chain Monte Carlo (MCMC) [6], and rjags [7] as a link between R and JAGS.
The sensor responses Y1 and Y2 are modeled following a non-standardized Student's t-distribution. This distribution was selected assuming that the variance present in the sensor output, illustrated in Figure 1, served as an indicator of variance in the model response. The mean of the distribution is defined by the canonical linear formula of the linear regression. The spread of the data was set to have a wide range of probable values defined by a uniform distribution. The normality factor, whose prior is expressed as an exponential distribution, has a preference for values close to one. Given the limited prior information available for the experiment, weakly informative priors are assigned to the parameters. The priors of the coefficients $\beta_i$ are defined to have a normal distribution centered around zero with a large variance. The prior of the t-distribution normality factor $\upsilon$ favors values smaller than 30, and the prior of $\sigma$ allows for a wide enough distribution. The prior distributions were chosen as follows (Eq. 3):
α∼N(0,4), βi∼N(0,4), υ ∼Exp(30), σ ∼U(−1−4,10) (3)
The MCMC simulations are executed on the defined model to sample the posterior distribution of the parameters of interest, $\alpha$, $\beta_i$, $\sigma$, and $\upsilon$. The chains were specified to run 500 adaptive iterations, followed by 1,500 burn-in iterations. Afterwards, 15,000 samples were taken from the posterior distribution with a thinning factor of 20 steps. Investigating the trace plots and diagnostic statistics of the resulting MCMC object reveals that the chains have converged. A value of the Gelman-Rubin diagnostic statistic [2] below 1.1 suggests good convergence. The effective sample size (ESS) backs up this assumption. The visual and numeric diagnostics allow us to conclude that the resulting MCMC sample is a representative and accurate description of the posterior distribution of the different parameters. The posterior distributions obtained from the MCMC sampling for each parameter can be seen in Table 3 together with the 95% high density intervals (HDI). The MSE of the fitted models is 0.16 and 0.15 for the sensors Y1 and Y2, respectively.
Table 3: Posterior mean of the coefficients βi for i = 1, ..., 7 and the parameters σ and υ of the models for Y1 and Y2. The lower HDI (L-HDI) and upper HDI (U-HDI) limits are indicated for each entry.

Y1
          B0     B1     B2     B3     B4     B5     B6     σ       υ
Mean    −0.01   0.32  −0.14  −0.28   0.72  −0.23  −0.14   0.41   37.67
L-HDI   −0.09   0.23  −0.24  −0.38   0.63  −0.33  −0.42   0.34    5.13
U-HDI    0.09   0.42  −0.04  −0.18   0.82  −0.14  −0.05   0.49  117.66

Y2
          B0     B1     B2     B3     B4     B5     B6     σ       υ
Mean    −0.00   0.29  −0.10  −0.20   0.78  −0.26  −0.13   0.39   34.20
L-HDI   −0.09   0.20  −0.19  −0.29   0.69  −0.35  −0.22   0.31    4.44
U-HDI    0.09   0.39  −0.10  −0.11   0.88  −0.17  −0.04   0.47  113.50
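The Gelman-Rubin convergence check mentioned in this section can be sketched in pure Python. This is the basic (non-split) potential scale reduction factor computed from within-chain and between-chain variances. It is shown only to illustrate the idea and is not necessarily the exact variant computed by the R tooling used in the study. The simulated chains are stand-ins for real MCMC output, and the function name is an assumption.

```python
import random
import statistics

def gelman_rubin(chains):
    """Basic potential scale reduction factor for equally long chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average within-chain variance; B: between-chain variance times n.
    w = statistics.fmean([statistics.variance(c) for c in chains])
    b = n * statistics.variance(chain_means)
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5

random.seed(3)
# Three hypothetical chains sampling the same distribution (converged case).
mixed = [[random.gauss(0.3, 0.05) for _ in range(1000)] for _ in range(3)]
# Two hypothetical chains stuck in different regions (non-converged case).
stuck = [[random.gauss(0.0, 1.0) for _ in range(1000)],
         [random.gauss(5.0, 1.0) for _ in range(1000)]]

r_mixed = gelman_rubin(mixed)
r_stuck = gelman_rubin(stuck)
print(round(r_mixed, 3), round(r_stuck, 3))
```

Values close to one indicate that the chains agree with each other, while values well above the 1.1 threshold signal that they have not yet explored the same distribution.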
3.5 Genetic Programming
Genetic programming is an evolutionary algorithm that searches the set of symbolic expressions defined by a set of basis expressions (building blocks) for expressions that minimize one or multiple loss (fitness) functions. Symbolic regression is the application of genetic programming to