Máster Universitario en Modelización e Investigación
Matemática, Estadística y Computación 2019/2020
Master's Thesis
On the use of Neural Networks to solve
Differential Equations
Alberto García Molina
Advisor(s)
Carlos Gorria Corres
Planned place and date of presentation
12 October 2020
Abstract
English.
Artificial neural networks are parametric models, generally adjusted to solve regression and
classification problems. For a long time, a question has lingered regarding the possibility
of using these types of models to approximate the solutions of initial and boundary value
problems, as a means of numerical integration. Recent improvements in deep-learning have
made this approach much more attainable, and integration methods based on training (fitting)
artificial neural networks have begun to spring up, motivated mostly by their mesh-free nature and
scalability to high dimensions. In this work, we go all the way from the most basic elements,
such as the definition of artificial neural networks and the well-posedness of the problems, to
solving several linear and quasi-linear PDEs using this approach. Throughout this work we
explain general theory concerning artificial neural networks, including topics such as vanishing
gradients, non-convex optimization and regularization, and we adapt them to better suit the
nature of initial and boundary value problems. Some of the original contributions in this work
include: an analysis of the vanishing gradient problem with respect to the input derivatives, a
custom regularization technique based on the derivatives of the network's parameters, and a method
to rescale the subgradients of the multi-objective loss function used to optimize the
network.
Spanish (translated).
Neural networks are parametric models generally used to solve regression and classification
problems. For quite some time, the question has lingered of whether this type of model can be
used to approximate solutions of initial and boundary value problems, as a means of numerical
integration. Recent changes in deep-learning have made this approach more viable, and methods
based on training (fitting) neural networks have begun to emerge, motivated by the fact that they
need no mesh and scale well to high dimensions. In this work, we go from the most basic elements,
such as the definition of a neural network or the well-posedness of the problems, to being able
to solve various linear and quasi-linear PDEs. Throughout the work we explain the general theory
related to neural networks, including topics such as vanishing gradients, non-convex optimization
and regularization techniques, and we adapt them to the nature of initial and boundary value
problems. Some of the original contributions of this work include: an analysis of vanishing
gradients with respect to the input variables, a custom regularization technique based on the
derivatives of the neural network's parameters, and a method to rescale the subgradients of the
multi-objective cost function used to optimize the neural network.
Acknowledgements
To my advisor Carlos Gorria Corres, for his advice, and to my family and friends who have
given me their support in all these months.
Preamble
The structure of this work is divided into 5 chapters and 2 annexes.
Chapter 0 starts by giving an initial pragmatic overview of multi-linear algebra. Its purpose
is to give anyone foreign to this subject a working knowledge of tensors: defining their notation
and how to operate with them. Tensors will be extensively used throughout Chapter 2 when
describing artificial neural networks.
Chapter 1 contains the actual introduction to the problem at hand. Here we explore
the motivations for using artificial neural networks to numerically integrate initial/boundary
value problems. On top of this, we also list the differential operators that will be
used, describe the general conditions under which we guarantee well-posedness, and
examine the state of the art.
Chapter 2 lays out the theoretical framework of artificial neural networks. It covers
everything necessary to define and train a deep-learning model from ground
zero. The topics covered in this chapter include: definition and design choices, establishment
of an objective (loss) function and non-convex optimization, and the use of regularization
techniques. Although these topics are general to deep-learning, throughout this whole chapter
we have adapted them, where necessary, to fit the subject of this work.
Chapter 3 is the experimental part of this work. The first three sections contain the
discussion of some practical issues, namely, the programming, the approximating capacities of
artificial neural networks and training multi-objective functions. Following these sections lie
the experiments and simulations of this work. Here we put into practice all the previous
knowledge that we have built up to numerically integrate some instances of initial/boundary
value problems. On each instance we benchmark and discuss the results of several set-ups
based on the different architectures and training options seen up to this point.
Chapter 4 has the final conclusions of this work. It analyses the limitations and the
advantages of this technique with respect to others as a way to approximate solutions of
differential equations. Also, based on the experience from this work, we suggest
possible lines of work and open related questions, which can be considered for further work.
Annexes A & B include: a linear algebra perspective of some expressions in Chapter 2
for further clarity, and the code, respectively.
Contents
Abstract
Preamble
List of Figures
Table Index
0 Overview of Multi-linear Algebra
0.1 What is a tensor?
0.2 Tensor Operations and Summation Convention
0.3 Linear Algebra as Multi-linear Algebra
0.4 Derivatives of Vector Functions and Tensors
0.5 The Chain Rule in Tensor Notation
1 Introduction
1.1 Posing the Problem
1.2 Relevant Literature
2 Artificial Neural Networks Framework
2.1 What are Artificial Neural Networks?
2.2 From Numerical Integration to Deep-Learning
2.3 Derivatives: Back Propagation and Gradient Issues
2.3.1 Derivatives Behaviour (Vanishing and Exploding Gradients)
2.4 Optimizers
2.4.1 First Order Methods
2.4.2 Second Order Methods
2.5 Activation Functions and Parameter Initialization
2.5.1 Parameter Initialization
2.6 Regularization
2.6.1 Noise-based Regularizations
2.6.2 Restriction-based Regularizations
2.6.3 Other Regularizations
3 Case Studies and Simulations
3.1 Coding Artificial Neural Networks
3.2 Approximating a Function
3.3 Training with Multi-Objective Loss Functions
3.4 Model Simulation
3.4.1 Model 1: The 1D Divergence Operator
3.4.2 Model 2: The 2D Divergence Operator
3.4.3 Model 3: The 2D Laplacian Operator
3.4.4 Model 4: The 1D Advection Operator
3.4.5 Model 5: The 2D Clairaut Operator
3.4.6 Model 6: The 2D Burgers Operator
4 Conclusions
4.1 Author's Final Thoughts
4.2 Further Work
A Linear Algebra Formulation of 2.3.1
B The Code
B.1 imports Cell
B.2 auxiliryPlotting Class
B.3 myDataSets Class
B.4 problemInstance Class
B.5 secondOrderOptimizers Class
B.6 myLayer Class
B.7 myModel Class
B.8 execution Cell
Bibliography
List of Figures
2.1 Perceptron scheme.
2.2 A directed graph which could be a possible representation of the architecture
of an artificial neural network. Nodes are artificial neurons and edges indicate
which neurons feed into each other.
2.3 General scheme of a perceptron based fully-connected feed-forward artificial
neural network.
2.4 Computational graph of example (2.6).
2.5 Computational graph (derivatives) of example (2.6). In green, the flow of nodes
required to compute $\partial f(x,y)/\partial x$.
2.6 Example model: A 2-3-4-2 artificial neural network.
2.7 Main activation functions and their first order derivatives.
2.8 Combination of sigmoid functions.
2.9 Secondary activation functions and their first order derivatives.
2.10 Example of overfitting of a model.
2.11 Example of a model adding noisy input.
3.1 Comparison of the training performance of different activation functions for a
[3,4,1]-ANN, with Adam $\eta = 0.01$, $\beta_1 = 0.9$, $\beta_2 = 0.999$. Log10 scale.
3.2 Comparison of the training performance of different first order optimizers for a
[3,4,1]-ANN, with sigmoid activations. Lower image in log10 scale.
3.3 Training performance of a [3,4,1]-ANN with sigmoid activations, to fit (3.1),
using BFGS and L-BFGS.
3.4 Training performance of a [3,4,1]-ANN with sigmoid activations, to fit (3.1),
using Adam with $\eta = 0.01$.
3.5 Example of possible multi-objective functions. Component and total
representation.
3.6 Example of possible multi-objective functions. Adjusted factors.
3.7 Training performance of 3 models trained with a [1,5,5,1]-ANN scheme, with no
regularization, using Adam with $\eta = 0.01$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, on 3000
epochs. (3.5)
3.8 Final results. Best performing trained model (tanh) for (3.7) against the exact
solution.
3.9 Result of a [1,10,10,1]-ANN model with tanh activations, trained with no
regularization, using Adam with $\eta = 0.01$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, on 12000
epochs. Left plot: model against exact solution. Right plot: MSE error of the
model, for each point in the domain.
3.10 Comparison of different regularization techniques in the training performance of
3 models trained with a [1,10,10,1]-ANN scheme, using Adam with $\eta = 0.01$,
$\beta_1 = 0.9$, $\beta_2 = 0.999$, on 8000 epochs. (3.10)
3.11 Comparison of different regularization techniques in the training performance of
3 models trained with a [1,40,40,1]-ANN scheme, using Adam with $\eta = 0.01$,
$\beta_1 = 0.9$, $\beta_2 = 0.999$, on 8000 epochs. (3.10)
3.12 Final results of the best performing trained model ([1,40,40,1]-ANN, trained
with the custom regularization (2.58)) for (3.7) against the exact solution.
3.13 Results and performance of the model trained for (3.11).
3.14 Positive and negative sign solutions of 3.13.
3.15 Results and performance of the model trained for (3.16).
3.16 Results and performance of the model trained for (3.17).
3.17 Results and performance of the model trained for (3.18).
Table Index
1.1 Comparison between FEMs and the Artificial Neural Network Methods.
1.2 List of differential operators used in Chapter 3.
2.1 List of main activation functions.
2.2 List of secondary activation functions.
3.1 Results of 3 models trained with a [1,5,5,1]-ANN scheme, with no regularization,
using Adam with $\eta = 0.01$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, on 3000 epochs. (3.5)
3.2 Results of 6 models with different architectures, trained for (3.10), using
Adam with $\eta = 0.01$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, on 8000 epochs and different
regularization techniques.
Chapter 0
Overview of Multi-linear Algebra
Generally, when working in the context of artificial neural networks, the framework of linear
algebra is more than enough to describe the elements and operations taking place. Even when
dealing with convolutional networks, which may involve operations on 3 dimensional arrays
of objects, one can decompose everything into vectors, matrices, matrix multiplications and
element-wise products. Thus, many times, when in the context of artificial neural networks,
any explicit reference to multi-linear algebra or the tensor nature of such objects is disregarded.
In this work, however, we will be taking the multi-linear algebra approach. There are two
main draws to doing this, i.e. generalizing vectors and matrices to tensors:
– First, the tensor notation is very powerful. This notation serves two purposes: it allows
us to represent operations between tensors in a very compact way and it also helps to
keep track of dimensions at any time.
– Second, multi-linear algebra provides a simple and natural framework to characterize
high order derivatives of multidimensional objects such as vectors or matrices, which is
a particularity of this work. In this framework derivatives and the chain rule are really
easy to interpret as they visually take the form of the one dimensional case.
For the next part of this chapter we will be covering the basics of tensors. However, since
the objective of this work is not to discuss multi-linear algebra, and the only purpose of this
chapter is to serve as an entry point to the concepts and the notation of tensors, we will be
taking a hands-on informal approach. This means that there will be no formal definitions
and every concept will be explained through an example. For a proper introduction with due
rigour one can refer to chapters 2 to 4 in [1].
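0.1 What is a tensor?
Perhaps the simplest way to define a tensor is as an element in a tensor space, which is
nothing else than a tensor product of vector spaces and dual vector spaces. So, for example,
let's imagine a random tensor $T$ in the following tensor space:
$$T \in \mathbb{R}^{4*} \otimes \mathbb{R}^{2*} \otimes \mathbb{R}^{3}. \tag{1}$$
The simplest elements of this space are of the form $T = v \otimes w \otimes z$, where
$v \in \mathbb{R}^{4*}$, $w \in \mathbb{R}^{2*}$, $z \in \mathbb{R}^{3}$, and a general $T$ is a linear
combination of such products. Observe that $T$ is uniquely defined in the tensor space by
$4 \times 2 \times 3 = 24$ scalar components (its coordinates with respect to products of basis
vectors, having fixed a base in each (dual) vector space).
The previous is essentially a definition of a tensor, but in practice we want to describe a
tensor not by a product of vectors but by a set of scalar coordinates, the same way we
do with a vector space. This is achieved by defining a tensor base. So, given a base for each
of the (dual) vector spaces in the tensor space, for the previous example
$\{e_1, e_2, e_3, e_4\}_{\mathbb{R}^{4*}}$, $\{\tilde{e}_1, \tilde{e}_2\}_{\mathbb{R}^{2*}}$,
$\{\hat{e}_1, \hat{e}_2, \hat{e}_3\}_{\mathbb{R}^{3}}$, we can intuitively build a base of the tensor space as follows:
$$\{\, e_i \otimes \tilde{e}_j \otimes \hat{e}_k \mid i = 1,2,3,4,\; j = 1,2,\; k = 1,2,3 \,\}_{\mathbb{R}^{4*} \otimes \mathbb{R}^{2*} \otimes \mathbb{R}^{3}} \tag{2}$$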
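As a concrete illustration of the component view, the following pure-Python sketch builds the 24 components of a tensor of the form $v \otimes w \otimes z$ as a nested list of shape 4×2×3. The vectors and the helper names (`outer3`, `shape`) are made up for this example, and the sketch ignores the distinction between vectors and dual vectors, which only matters when changing base.

```python
# Illustrative sketch: components of a tensor of the form T = v ⊗ w ⊗ z.
# The vectors below are arbitrary example values, not taken from the text.

def outer3(v, w, z):
    """Return the nested list T[i][j][k] = v[i] * w[j] * z[k]."""
    return [[[vi * wj * zk for zk in z] for wj in w] for vi in v]

def shape(t):
    """Shape of a rectangular nested list, e.g. (4, 2, 3)."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return tuple(dims)

v = [1.0, 2.0, 3.0, 4.0]   # coordinates in the 4-dimensional factor
w = [1.0, -1.0]            # coordinates in the 2-dimensional factor
z = [0.5, 0.0, 2.0]        # coordinates in the 3-dimensional factor

T = outer3(v, w, z)
n_components = sum(len(row) for block in T for row in block)

print(shape(T))        # (4, 2, 3)
print(n_components)    # 24
print(T[3][1][2])      # fourth, second and third base elements: 4 * (-1) * 2 = -8.0
```

Each entry `T[i][j][k]` is the coordinate of the tensor along one element of the product base, which is why 24 numbers are enough to describe it.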
High dimensional systems are not very common in physics, but arise in many fields such
as sociology and economics. For example, if we were to consider option pricing in finance,
assuming the market parameters constant (so as not to incur a stochastic problem), the system
would be modelled after a PDE which has at least as many variables and dimensions as stocks
in the portfolio, plus the time, which is generally a large number [2]. In cases such as
the one we have just described, FEMs are impractical and Monte Carlo methods are used
[3], but these still have some stability limitations. For this reason in recent times, with the many
improvements in artificial neural networks, new machine learning methods have resurged as
potential candidates to deal with these kinds of high dimensional problems. The main idea is
based on using the good qualities of artificial neural networks as function approximators.
An artificial neural network is just a complex parametrized function $\mathcal{N}(x;\theta)$, which uses
a modular architecture based on the concept of neurons, has a structure optimized for computer
processing, and makes use of non-linear optimization algorithms to train its parameters to
fit some model (we will cover this in Chapter 2). Basing ourselves on the previous simplified
definition, the deep-learning approach should be straightforward. Simply put: the method
will approximate the exact solution by taking an artificial neural network, replacing it into
the differential equation and using an optimization algorithm to train its parameters so that
the equation is satisfied; all while making use of deep-learning strategies to speed up the
process. In [4] this type of method is referred to as “Deep Galerkin Method”, the reason being
that: both methodologies revolve around approximating the exact solutions of a differential
equation via a parametrized function, either a linear combination of base functions or an
artificial neural network; and both involve replacing this approximation into the differential
equation and solving an inverse problem to find its coefficients or parameters. However,
the artificial neuron strategy differs much in nature and lacks many of the elements of the
methods in the Galerkin family, as it does not: take into account the idea of weak formulation
(which we have not explained here for simplicity); use linear combinations of base functions
and projections; or lead to an inverse problem that amounts to solving a linear system
of equations, relying instead on a pure non-convex optimization. In fact, it is because
of these differences that this machine learning approach should be, in theory, able to scale
well with dimension, since in using non-convex optimization, all dimensions are trained at the
same time, which should not increase the computational cost much. On the other hand, one of
the main problems is that the error is not bounded by an order and is unpredictably subject to
optimization and training particularities. The following table summarizes all the above:
Galerkin Methods (FEM) | Artificial Neural Network Methods
-----------------------|----------------------------------
Approximates the solution with a base of linear functions. | Approximates the solution with an artificial neural network.
Requires computing some integrals (or quadratures) and solving a linear system. | Requires solving a non-convex optimization problem.
Error order and stability properties known. | Error and stability unknown and dependent on the specifics of the optimization.
The complexity scales exponentially with the dimension. | Generalizes well to higher dimensions with just a few more neurons.
Table 1.1: Comparison between FEMs and the Artificial Neural Network Methods.
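The approach summarized above (substitute a parametrized function into the equation and tune its parameters until the residual is small) can be sketched in a few lines. The example below is only an illustration under assumptions that are not in the text: the toy problem $u'(x) = 2x$, $u(0) = 0$, a single hidden layer of three tanh neurons, and finite-difference gradients with a backtracking gradient descent standing in for the architectures, automatic differentiation and optimizers discussed in Chapter 2.

```python
import math
import random

# Toy problem (an assumption for illustration): u'(x) = 2x on [0, 1], u(0) = 0.
def f(x):
    return 2.0 * x

X0, U0 = 0.0, 0.0
HIDDEN = 3  # number of tanh neurons; theta = [w1 (3), b1 (3), w2 (3), b2 (1)]

def net(x, theta):
    """N(x; theta) = sum_j w2_j * tanh(w1_j * x + b1_j) + b2."""
    w1, b1, w2 = theta[0:3], theta[3:6], theta[6:9]
    return sum(w2[j] * math.tanh(w1[j] * x + b1[j]) for j in range(HIDDEN)) + theta[9]

def net_dx(x, theta):
    """Exact derivative of the network with respect to its input x."""
    w1, b1, w2 = theta[0:3], theta[3:6], theta[6:9]
    return sum(w2[j] * w1[j] * (1.0 - math.tanh(w1[j] * x + b1[j]) ** 2)
               for j in range(HIDDEN))

def loss(theta, points):
    """Mean squared residual of the ODE plus the squared initial-condition error."""
    residual = sum((net_dx(x, theta) - f(x)) ** 2 for x in points) / len(points)
    return residual + (net(X0, theta) - U0) ** 2

def numerical_grad(theta, points, h=1e-6):
    """Central finite-difference gradient (stand-in for back propagation)."""
    grad = []
    for i in range(len(theta)):
        up = theta[:i] + [theta[i] + h] + theta[i + 1:]
        down = theta[:i] + [theta[i] - h] + theta[i + 1:]
        grad.append((loss(up, points) - loss(down, points)) / (2.0 * h))
    return grad

def train(theta, points, steps=100):
    """Gradient descent that halves the step until the loss does not increase."""
    current = loss(theta, points)
    for _ in range(steps):
        grad = numerical_grad(theta, points)
        step = 0.1
        while step > 1e-12:
            candidate = [t - step * g for t, g in zip(theta, grad)]
            value = loss(candidate, points)
            if value <= current:
                theta, current = candidate, value
                break
            step *= 0.5
    return theta, current

random.seed(0)
collocation = [random.random() for _ in range(20)]       # random points in [0, 1]
theta0 = [random.uniform(-1.0, 1.0) for _ in range(10)]  # random initial parameters
initial_loss = loss(theta0, collocation)
theta_trained, final_loss = train(theta0, collocation)
print(initial_loss, final_loss)
```

The loss never increases here because a step is only accepted when it does not raise the loss; how close it gets to zero depends on the initialization and the optimizer, which is the unpredictability noted in the right-hand column of the table.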
In this work, we will be using deep-learning techniques and methodologies to try and solve
some instances of differential equations. The objective will be to analyse the viability and
capabilities of these methods. Although the main interest of these methods is in integrating
PDEs (as the existing ODE integration methods are already very efficient), we will start in
a progressive way, by studying their application to ODEs (which can be seen as a particular
case of PDEs). Then we will scale up the complexity of the operators until we are able to
solve some low-dimensional PDEs. Higher dimensional equations will be out of scope since
the aim is to illustrate the feasibility, and the strengths and weaknesses, of this strategy, for
which two dimensions will be enough.
1.1 Posing the Problem
In its most general form, a system of differential equations with solution in the real space
may be represented as follows:
$$\mathcal{L}[u(x)] = f(x), \quad x \in \Omega \subseteq \mathbb{R}^n, \tag{1.1}$$
where $\Omega \subseteq \mathbb{R}^n$ is a compact manifold, $\mathcal{L}[\cdot]$ is a differential operator,
$f : \Omega \to \mathbb{R}^m$ is the external force, and $u : \Omega \to \mathbb{R}^m$ is a solution to the system.
Note that (1.1) may represent either a system of ODEs or PDEs depending on the differential
operator. The operators which we will be solving in Chapter 3 are:
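Name | Expression
-----|-----------
Identity Operator | $\mathcal{L}[u(x)] = u(x)$
1D Divergence Operator | $\mathcal{L}[u(x)] = \frac{\partial u(x)}{\partial x}$
2D Divergence Operator | $\mathcal{L}[u(x,y)] = \frac{\partial u(x,y)}{\partial x} + \frac{\partial u(x,y)}{\partial y}$
2D Laplacian Operator | $\mathcal{L}[u(x,y)] = \frac{\partial^2 u(x,y)}{\partial x^2} + \frac{\partial^2 u(x,y)}{\partial y^2}$
1D Advection Operator | $\mathcal{L}[u(x)] = u(x) \cdot \frac{\partial u(x)}{\partial x}$
2D Clairaut Operator | $\mathcal{L}[u(x,y)] = x \cdot \frac{\partial u(x,y)}{\partial x} + y \cdot \frac{\partial u(x,y)}{\partial y}$
2D Burgers Operator | $\mathcal{L}[\mathbf{u}(x,y)] = \mathcal{L}[(u_x(x,y), u_y(x,y))] = \left( u_x \cdot \frac{\partial u_x}{\partial x} + u_y \cdot \frac{\partial u_x}{\partial y},\; u_x \cdot \frac{\partial u_y}{\partial x} + u_y \cdot \frac{\partial u_y}{\partial y} \right)$, with every term evaluated at $(x,y)$
Table 1.2: List of differential operators used in Chapter 3.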
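To make the operators of Table 1.2 concrete, the sketch below applies two of them to known functions. It is an illustration only: the derivatives are approximated with central finite differences (the step sizes and test functions are arbitrary choices for this example), whereas a neural network implementation would normally obtain them by automatic differentiation.

```python
# Illustrative finite-difference versions of two operators from the table.
# Test functions and step sizes are assumptions made for this example.

def laplacian_2d(u, x, y, h=1e-3):
    """Approximate u_xx + u_yy at (x, y) with second-order central differences."""
    u_xx = (u(x + h, y) - 2.0 * u(x, y) + u(x - h, y)) / h ** 2
    u_yy = (u(x, y + h) - 2.0 * u(x, y) + u(x, y - h)) / h ** 2
    return u_xx + u_yy

def clairaut_2d(u, x, y, h=1e-5):
    """Approximate x * u_x + y * u_y at (x, y) with central differences."""
    u_x = (u(x + h, y) - u(x - h, y)) / (2.0 * h)
    u_y = (u(x, y + h) - u(x, y - h)) / (2.0 * h)
    return x * u_x + y * u_y

def paraboloid(x, y):
    return x ** 2 + y ** 2   # its Laplacian is exactly 4 everywhere

def product(x, y):
    return x * y             # x * u_x + y * u_y = 2 * x * y

lap = laplacian_2d(paraboloid, 0.3, -0.7)
cla = clairaut_2d(product, 0.5, 2.0)
print(lap)   # close to 4
print(cla)   # close to 2 * 0.5 * 2.0 = 2
```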
Recall that at the start of this section, in defining (1.1) we indicated that $u$ was “a” solution
to the system of differential equations. In fact, there are usually many solutions, or none may
even exist. To ensure existence and uniqueness of the solution we need to impose some
additional conditions on (1.1), namely initial conditions on ODEs and boundary conditions
on PDEs. The most common sets of these conditions are:
Cauchy (ODE): $u(x_0) = u_0$ (1.2)
Dirichlet (PDE): $u(x) = g(x), \quad x \in \Gamma \equiv \partial\Omega$ (1.3)
Neumann (PDE): $\frac{\partial u(x)}{\partial n(x)} = g(x), \quad x \in \Gamma \equiv \partial\Omega$ (1.4)
Cauchy (PDE): $u(x) = g_1(x) \,\wedge\, \frac{\partial u(x)}{\partial n(x)} = g_2(x), \quad x \in \Gamma \equiv \partial\Omega$ (1.5)
where $\Gamma$ or $\partial\Omega$ (depending on the convention) is the border of the domain $\Omega$ and $n(x)$ is the
normal vector at the point $x \in \Gamma$. Descriptively, Cauchy initial conditions fix the solution
value at a certain point; Dirichlet border conditions fix the solution values at the border of
the domain; Neumann border conditions fix the flow coming in and out of the domain; and
finally, Cauchy border conditions are a mix of Dirichlet and Neumann conditions [5].
Before proceeding, one observation has to be made on ODEs. Given a single ODE of $n$-th
order (the highest derivative in the equation has order $n$) with $n > 1$, it is common practice to
transform the equation into a system of first order ODEs by simply introducing the following
set of $n-1$ equations $u_1 = u'$, ..., $u_{n-1} = u'_{n-2} = u^{(n-1)}$, and using them to replace any
derivatives of order higher than one in the original equation. This means that any given
$n$-th order ODE is equivalent to a system of $n$ first order ODEs; thus finding a solution
of the original $n$-th order equation, $u(x)$, is equivalent to finding an extended manifold
solution of the corresponding system of first order equations, $\mathbf{u}(x) = (u, u_1, \ldots, u_{n-1})(x)$,
which includes its derivatives. The (1.2) definition of Cauchy initial conditions is based on
this last paradigm where we consider systems of first order ODEs. Hence, when considering an
$n$-th order ODE in its original form, the equivalent of fixing an initial point on the manifold
solution $\mathbf{u}(x_0) = (u_0, u_{1,0}, \ldots, u_{n-1,0})$ is to fix the value of $u$ and of its first $n-1$ derivatives,
and on those premises these conditions should be written as $u(x_0) = u_0$, ..., $u^{(n-1)}(x_0) = u_{n-1,0}$.
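The reduction to a first order system is also what makes standard numerical integrators applicable. As a small worked example (the equation $u'' = -u$ with $u(0) = 0$, $u'(0) = 1$, whose solution is $\sin x$, and the classical Runge-Kutta scheme are choices made here for illustration, not taken from the text), the second order ODE is rewritten as the system $(u, u_1)' = (u_1, -u)$ and integrated:

```python
import math

def rhs(x, state):
    """First order system for u'' = -u, with state = (u, u1) and u1 = u'."""
    u, u1 = state
    return (u1, -u)

def rk4_step(fun, x, state, h):
    """One classical fourth-order Runge-Kutta step for a system of ODEs."""
    k1 = fun(x, state)
    k2 = fun(x + h / 2, tuple(s + h / 2 * k for s, k in zip(state, k1)))
    k3 = fun(x + h / 2, tuple(s + h / 2 * k for s, k in zip(state, k2)))
    k4 = fun(x + h, tuple(s + h * k for s, k in zip(state, k3)))
    return tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def integrate(fun, x0, state0, x_end, n_steps):
    """Integrate from x0 to x_end with n_steps equal RK4 steps."""
    h = (x_end - x0) / n_steps
    x, state = x0, state0
    for _ in range(n_steps):
        state = rk4_step(fun, x, state, h)
        x += h
    return state

# Cauchy initial conditions u(0) = 0, u'(0) = 1 fix a point of the extended solution.
u_end, u1_end = integrate(rhs, 0.0, (0.0, 1.0), 1.0, 100)
print(u_end, math.sin(1.0))    # both approximately 0.8415
print(u1_end, math.cos(1.0))   # both approximately 0.5403
```

The integrator returns the whole extended solution $(u, u_1)$, so the derivative comes for free alongside $u$.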
Summarizing, we will be considering systems of differential equations (1.1) in combination
with some initial/boundary conditions (1.2-1.5), mainly Cauchy conditions, to formulate what
are known as initial/boundary value problems. The main objective is to formulate a
“well-posed” problem: a set of basic properties which is required to apply any numerical
integration successfully. A system of differential equations is said to be well-posed in the
Hadamard sense [5], if it satisfies the following three conditions:
– A solution exists.
– The solution is unique.
– The solution is stable, i.e. it changes continuously with small variations of its initial
conditions, boundary conditions and external force.
Proving that a given problem is well-posed is a really tricky matter. There are very few
general results and many of the proofs are case specific: they may apply to certain types
of differential operators (for example linear or Poisson operators), require a certain type of
boundary conditions and impose several degrees of regularity.
As shown in Table 1.2, in this work we will be using very simple differential operators,
all of them linear or quasi-linear. Also, the external force terms will always be analytic
functions (actually polynomials) and the initial/boundary conditions will be for the most part
of Cauchy type. Under these specific conditions the Cauchy-Kovalevskaya theorem guarantees
the existence of a unique analytic solution to the expression in both the ODE and PDE cases.
Nonetheless, this theorem has its limitations:
– First, it is a local theorem, although this can be remedied if all the terms are analytic
everywhere, forming a global version by “stitching” the local solutions in several local
neighbourhoods that form a cover of the domain to build a global solution. Since the
solution has to be unique in the intersection of the neighbourhoods, the global solution
has to be unique.
– Second, the proof is very dependent on the analyticity of the coefficients in the operator
and external force, as it relies on the method of majorants. The sketch of this
proof goes as follows [6, 7]: first we assume that the solution can be written as a power
series in some neighbourhood $U \subseteq \Omega$, the coefficients of which are obtained from the
initial conditions and from replacing the power series into the differential equation. Then we
attempt to find some power series that majorates the solution power series, the definition
of which is that $\sum_k b_k x^k$ majorates $\sum_k a_k x^k$ if $|a_k| \le b_k$ for every $k$. Finally, we use the
property that states that if a series is majorated by a series that converges, then it also converges. If
the majorating series of the solution power series is adequately chosen and converges,
so does the solution power series, which converges to the local unique analytic solution.
When the differential equation is linear or quasi-linear, for every $x_0 \in U \subseteq \Omega$, there is
always a systematic change of variables $h : U \to V$ such that $h(x_0) = 0$. Then, on this new
domain $V$, we can always construct a power series about $0$ with radius of
convergence $\rho = 1$ that majorates the power series solution $u(h^{-1}(\cdot)) : V \to \mathbb{R}^m$. This
makes the proof independent of the differential operator as long as it is linear
or quasi-linear. Note that this proof is constructive, as the power series solution
satisfies the differential operator and initial/boundary conditions, and it converges on
a neighbourhood $h^{-1}(D_0(1))$ of $x_0$. Therefore, it is a valid local solution, and likewise
its uniqueness is proven from a similar argument.
The assumptions of analyticity and the constructiveness of the proof in this theorem imply
that the theorem is proving that there is a unique analytic solution. This is much
different than claiming that the only solution is analytic. Hence, the initial/boundary
problem could still have other non-analytic solutions. (Actually, in the case of ODEs,
the Picard–Lindelöf theorem guarantees general uniqueness over the solutions, so the
analytic one is the only one; but there are no similar results for PDEs.)
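A minimal worked example of the majorant argument, chosen here for illustration (it does not appear in the cited proofs, and the symbols $c_k$, $M_k$ are introduced only for it): for the ODE $u' = u$ with $u(0) = 1$, substituting a power series fixes the coefficients, and the geometric series serves as a convergent majorant with radius of convergence $1$.

```latex
% Worked example (illustrative): u' = u, u(0) = 1.
\begin{align*}
u(x) = \sum_{k \ge 0} c_k x^k,\quad u' = u
  &\;\Longrightarrow\; (k+1)\,c_{k+1} = c_k,\quad c_0 = 1
   \;\Longrightarrow\; c_k = \frac{1}{k!}, \\
|c_k| = \frac{1}{k!} \le 1 = M_k
  &\;\Longrightarrow\; \sum_{k \ge 0} M_k x^k = \frac{1}{1-x}
   \text{ bounds } \sum_{k \ge 0} c_k x^k \text{ coefficient by coefficient}, \\
\sum_{k \ge 0} M_k x^k \text{ converges for } |x| < 1
  &\;\Longrightarrow\; \sum_{k \ge 0} c_k x^k \text{ converges for } |x| < 1 .
\end{align*}
```

In this particular case the solution series is $e^x$ and actually converges everywhere; the geometric bound only certifies convergence on $|x| < 1$, which is all a local existence argument needs.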
Despite the two potential limitations in applying the Cauchy-Kovalevskaya theorem that
we have just seen, this will be enough for the artificial neural network to approximate the
analytic solution of the problem. The reason for this assumption is that the artificial neural
networks will be composed of analytic functions (almost everywhere), thus we expect them to
fit preferentially that solution. From now on, there will be no further discussions about the
well-posedness of the initial/boundary problems that we will attempt to solve in this work;
the Cauchy-Kovalevskaya theorem will always apply.
1.2 Relevant Literature
The approach of solving differential equation systems with artificial neural networks is
relatively “old”; at least, we have found an article [8] dating back to 1994. Although this
article uses a graph-like structure acknowledged as an artificial neural network to solve ODEs,
it applies a FEM type of “trend-like” activation functions and does not rely on “training” in the
modern sense, i.e. defining a non-convex optimization problem, opting instead for some kind of
Galerkin method hybrid. Throughout this article there are some references to papers which use
some kind of mean square error and non-convex optimization (the most standard approach nowadays),
but the author regards them as computationally expensive. This shows that the state of the
field of deep-learning back then did not allow for these strategies to be viable candidates to
integrate differential equations.
Moving to more recent times, articles explicitly integrating ODEs with artificial neural
networks are hard to come by, since as explained before, there are very efficient methods
already to integrate ODEs, and the main interest is in PDEs. A related case that we found
very interesting and worth mentioning is [9], which uses a reverse approach. Instead of training
an artificial neural network to integrate an ODE, it uses ODE numerical integrators to train
artificial neural networks.
With regards to PDEs, [4] is a very complete work. It defines a loss by the discretization, over
a random collocation of points, of the error of the artificial neural network with respect to the
boundary value problem (the same idea we will be using to define a loss in 2.2). Then, it goes
on to solving very high dimensional free boundary PDEs (for American options), and boundary
problems (for the Hamilton-Jacobi-Bellman equation). Two interesting features in this work are:
what the article calls a Monte Carlo method for fast computation of second derivatives, which
is a type of synthetic gradient; and a proof of a restricted version of a universal approximation
theorem for the solution of PDEs. A synthetic gradient [10] is usually used for very large
networks or very large amounts of data. Instead of computing the exact derivatives of the
loss function with respect to the parameters required to minimize the loss, the derivatives are
drawn from a distribution which is updated on every step of the training. This technique
trades off not having the exact derivatives for less computational cost and the possibility of
asynchronous training. The authors of [4] use this Monte Carlo method to avoid the expensive
cost of a second order automatic differentiation over very high dimensions. In this work, we
will not be considering this technique since, unlike [4] which integrates PDEs of up to 200
dimensions, we only integrate PDEs of up to 2 dimensions (not really high dimensions), like
almost all of the other papers that we will review next. However, the use of a synthetic
gradient is a recurrent theme in papers dealing with very high dimensional systems.
Other papers focused on high dimensions are [11] and [12]. Both are similar in that they
do not consider deterministic PDEs but BSDEs (Backward Stochastic Differential Equations)
such as the Allen-Cahn (physics) or Black-Scholes (economics) equations, and they consider
systems of up to 100 dimensions.
Closer to the line of work of this project are [13, 14, 15, 16, 17, 18]. The outlines of these
works are quite similar: they simulate PDEs of up to 2 dimensions and do some kind of error
analysis. One characteristic of [15] is that it analyses the effect on the error of the mesh
and the number of hidden nodes, and in [16, 17] the method is compared to a standard FEM
method. On the more interesting side of things lie [13] and [18].
[18] follows up on the architecture of [4], which uses a special kind of feed-forward artificial
neural network. In a regular feed-forward artificial neural network the neurons are divided
into sequential layers, and the outputs of a layer are strictly fed as inputs to the next layer
(we will see this in section 2.1). However, [4] used an architecture where the outputs of a
layer feed all of its successive layers. This seems to yield good results although there is no
comparison to other types of architectures.
[13] focuses, instead of using the artificial neural network approach for its capability to integrate systems in high dimensions, on using its mesh-free nature to integrate over irregular domains. In this paper artificial neural networks are trained to fit the advection and diffusion operators on collocations of very irregular domains. It also applies a very original idea, which is to consider the approximated solution as $\hat{u}(x) = g(x) + D(x)\cdot\tilde{u}(x)$, where $g(x)$ is the boundary condition, $D(x)$ is a distance function to the border such that $D(x) = 0$ if $x \in \Gamma$, and $\tilde{u}(x)$ is a regular artificial neural network. This way $\hat{u}(x)$ always satisfies the boundary conditions by construction and it is only required to train the model to fit the differential equation; thus one can pre-compute the distance from the collocation to the border, since it does not change throughout the training, and focus on a single objective loss function. For this work we reckoned that this idea would only work well with Neumann or Dirichlet boundary conditions, but not with Cauchy boundary conditions, which include both at the same time.
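The construction used in [13] can be illustrated with a minimal one-dimensional sketch. Everything below is an assumption made for illustration: the domain is the interval [0, 1], `g` is a hypothetical boundary function, `D` is the distance to the two end points, and `network` is a fixed stand-in for a trained artificial neural network. The only point is that the trial solution reproduces the boundary values whatever the network outputs.

```python
import math

def g(x):
    # Hypothetical Dirichlet data, extended to the whole interval: g(0) = 1, g(1) = 3
    return 1.0 + 2.0 * x

def D(x):
    # Distance from x to the border {0, 1}; it vanishes exactly on the border
    return min(x, 1.0 - x)

def network(x, w=0.7, b=-0.2):
    # Stand-in for a regular artificial neural network (any function of x would do)
    return math.tanh(w * x + b)

def u_trial(x):
    # Trial solution g + D * network: the network term is switched off on the border
    return g(x) + D(x) * network(x)

print(u_trial(0.0), u_trial(1.0), u_trial(0.5))
```

Since `D` is fixed, its values on the collocation points could be computed once before training, which is the pre-computation mentioned above.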
Other papers that relate artificial neural networks to PDEs are: [19], which is a version of [9] relating PDEs to the dynamics of non-convex optimization in convolutional networks; and [20], which draws the same relation between the dynamics of the optimization of general artificial neural networks and PDEs, using statistical physics techniques. Also, [21, 22, 23] are a series of papers by the same author which train artificial neural networks with experimental data from physics to learn the underlying behaviours modelled by PDEs.
Finally, we want to remark that the need for the derivatives of artificial neural networks with respect to inputs, which seems to be something that would not appear in simple regression or classification problems and would thus only be related to this topic, has been used in other contexts. [24, 25] are examples of this: both make use of information about the derivatives as regularization techniques and to speed up training in problems with no imperative use of them.
Chapter 2
Artificial Neural Networks Framework
In this chapter we will be covering from scratch everything about artificial neural networks that we will be using to solve differential equations in the next chapter. We will start by defining what an artificial neuron and an artificial neural network are; explain the architecture of a fully-connected feed-forward neural network, its possible activation functions and initializations; show how to assign a loss function to train the model to fit the initial/boundary value problem; discuss the pros and cons of the main optimizer options available to train the model; and examine diverse regularization techniques which help improve training results.
2.1 What are Artificial Neural Networks?
Artificial neural networks are ensembles of units called artificial neurons. There are many designs of these artificial neurons, and by combining and arranging them in different ways we can create networks with very different behaviours and results.
In this work, the only type of artificial neuron that we will be using is known as the perceptron, which is probably the simplest and the most widely used. Figure 2.1 shows the basic scheme of a perceptron. A perceptron takes in $n$ inputs, which we can view as coordinates of a vector $x_n$; it combines them linearly, multiplying by weights and adding a bias; and then applies a (mostly) non-linear function $a$ to the linear combination.
Figure 2.1: Perceptron scheme.
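As a minimal sketch of the perceptron just described, the function below computes the linear combination of the inputs with the weights, adds the bias and applies the activation. The function name and the default choice of `tanh` as activation are assumptions made for illustration.

```python
import math

def perceptron(x, w, b, a=math.tanh):
    """Single perceptron: activation applied to a weighted sum plus a bias."""
    # Linear combination of the n inputs with the n weights, plus the bias
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    # (Mostly) non-linear activation applied to the linear combination
    return a(z)

# Identity activation: the perceptron reduces to a linear model
print(perceptron([1.0, 2.0], [1.0, 1.0], 0.5, a=lambda z: z))
```

With the identity activation the neuron is a linear model, and with a sigmoid activation it has the form of a logistic regression model.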
Some other popular designs are convolutional neurons, which are used in image recognition; and memory cells, which are used in data with time dependencies such as video processing.
The way in which we combine artificial neurons to form artificial neural networks is such that the outputs of a group of neurons become the inputs of another group of neurons. In this light, one can think of an artificial neural network as a directed graph with entry and exit edges, where each of the nodes corresponds to a neuron and the directed edges indicate which neurons' outputs feed another neuron as inputs. This is why the neurons in a network are often referred to as nodes.
Figure 2.2: A directed graph which could be a possible representation of the architecture of an artificial neural network. Nodes are artificial neurons and edges indicate which neurons feed into each other.
One approach to organizing these neurons is using a feed-forward scheme. By this scheme we divide the neurons into sequential layers (groups). Then the outputs of a layer can only become inputs of the next layer. Using the graph characterization, this would correspond to artificial neural networks defined by directed graphs without cycles. The main advantage of this scheme is that it allows the use of the standard back-propagation algorithm (which we will be explaining in the next sections) to train the parameters in the neurons to fit a certain model.
A particular case of feed-forward artificial neural networks is the fully-connected one. This happens when all the neurons in a layer are connected to all the neurons in the next layer.
Figure 2.3: General scheme of a perceptron-based fully-connected feed-forward artificial neural network.
Throughout this work we will be exclusively using perceptron fully-connected feed-forward artificial neural networks with different numbers of layers and different numbers of nodes per layer to approximate the solutions of differential equations. Figure 2.3 shows the standard graph representation of the structure of such neural networks based on the concepts explained up to this point. Note that the edges are not directed, as it is unnecessary, since by standard convention the flow of the neurons goes from left to right.
An artificial neural network provides a frame to define complex parametric functions in a modular way as a composition of simple operations encapsulated in artificial neurons (we will insist on this point in the next sections).
The following observations are a way to better understand artificial neural networks:
– A single neuron can form a neural network. In that case, if the activation function is the identity, and we adjust the parameters of the neuron so the outputs fit a continuous dataset, we perform a linear regression. Indeed, $w_i$ and $b$ are simply the slope and the intercept. Similarly, if the activation function is a sigmoid, and we adjust the parameters of the neuron so the outputs fit a binary dataset, we perform a logistic regression.
– In more complex deep neural networks, we expand on the same ideas as in the single neuron. In general, in artificial neural networks, what happens in regression problems is that we fit the parameters to make the hyper-surface defined by the network structure get as close as possible to the sample data; and in classification problems we fit the parameters to match as closely as possible the underlying marginal distribution of each category with the network structure.
Finally we will end this introductory section with some nomenclature for the rest of the work:
– A feed-forward neural network is said to be deep when it contains more than one layer.
– In deep neural networks, layers are classified as input, hidden and output layers. The input layer is the one that simply takes in the inputs and does no transformations; the output layer is the last layer of the sequence of layers and its outputs are the output results of the whole network; and the hidden layers are all that lie between the input and output layers. Based on Figure 2.3, the input vector corresponds to the input layer (layer 0, even though it is not explicitly referenced as that), layers 1 to $l-1$ would be the hidden layers and layer $l$ would be the output layer.
– A feed-forward fully-connected network is defined by its number of layers, the number of neurons (nodes) in each layer, and its types of neurons. A neuron is defined by its weights, bias and activation function. Therefore, from now on we will use the following standard nomenclature, which contains all the elements that we need to completely define our networks:
$\ell$: layer indexes,
$n_\ell, m_\ell$: neuron indexes of layer $\ell$,
$w[\ell]^{m_{\ell-1}}_{n_\ell}$: weights of the neuron $n_\ell$ in the layer $\ell$ (parameters),
$b[\ell]_{n_\ell}$: bias of the neuron $n_\ell$ in the layer $\ell$ (parameters),
$a[\ell]_{n_\ell}$: activation function of the neuron $n_\ell$ in the layer $\ell$,
$z[\ell]_{n_\ell}$: result of the linear combination of the neuron $n_\ell$ in the layer $\ell$,
$y[\ell]_{n_\ell}$: result of applying the activation function of the neuron $n_\ell$ in the layer $\ell$.
By this notation we will consider the input vector as layer 0, on which no transformations are performed; therefore, $x_i = z[0]_i = y[0]_i$. Also, the output layer (the $n$-th layer) never has an activation function; therefore, $\hat{u}_i = z[n]_i = y[n]_i$.
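The nomenclature above can be read as a recipe for evaluating the network. The following sketch, with hypothetical names and plain nested lists for the weights, computes $z[\ell]$ and $y[\ell]$ layer by layer, treats the input as layer 0 and applies no activation in the output layer. Using the same `tanh` activation in every hidden layer is an assumption made to keep the example short.

```python
import math

def forward(x, weights, biases, activation=math.tanh):
    """Fully-connected feed-forward pass.

    weights[k][n][m] is the weight linking neuron m of layer k to neuron n
    of layer k + 1; biases[k][n] is the bias of neuron n of layer k + 1.
    """
    y = list(x)                      # layer 0: x = z[0] = y[0]
    last = len(weights) - 1
    for k, (W, b) in enumerate(zip(weights, biases)):
        # z: linear combination computed by every neuron of the layer
        z = [sum(w_nm * y_m for w_nm, y_m in zip(row, y)) + b_n
             for row, b_n in zip(W, b)]
        # y: activation of z, except in the output layer where y = z
        y = z if k == last else [activation(z_n) for z_n in z]
    return y

# Two inputs, one hidden layer with two neurons, one output neuron
weights = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]]
biases = [[0.0, 0.0], [0.5]]
print(forward([1.0, 2.0], weights, biases))
```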
2.2 From Numerical Integration to Deep-Learning
The intended use of the artificial neural networks in this work is for them to approximate the solution of some initial/boundary value problem. In order to achieve this, the parameters of the network should be tweaked to minimize some measure representing how well the network satisfies the differential operator and initial/boundary conditions.
Given the artificial neural network approximation of the solution, $\hat{u}(x;w,b)$, which depends parametrically on the set of all weights $w$ and all biases $b$, we can define a non-negative loss or cost function, $L(w,b)$, that quantifies the degree to which the network satisfies the problem, in the following terms:
$$L(w,b) = L_1(w,b) + L_2(w,b) + R(w,b), \tag{2.1}$$
where $L_1(w,b)$ is the loss term measuring how well the neural network approximates the differential operator (1.1), $L_2(w,b)$ is the loss term measuring how well the neural network approximates the initial/boundary conditions (1.2-1.5), and $R(w,b)$ is the regularization term of the loss, which is a term that will help to stabilize and improve the convergence of the optimization (this term will be covered in a later section). In particular, using Cauchy boundary conditions, which are the most complex of all, the terms would be:
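$$L_1(w,b) = \left\|\mathcal{L}[\hat{u}(x;w,b)] - f(x)\right\|_{\Omega,2}^{2} = \int_{\Omega}\left(\mathcal{L}[\hat{u}(x;w,b)] - f(x)\right)^2 dx, \tag{2.2}$$
$$\begin{aligned}
L_2(w,b) &= \left\|\hat{u}(x;w,b) - g_1(x)\right\|_{\Gamma,2}^{2} + \left\|\frac{\partial \hat{u}(x;w,b)}{\partial n(x)} - g_2(x)\right\|_{\Gamma,2}^{2} \\
&= \int_{\Gamma}\left(\hat{u}(x;w,b) - g_1(x)\right)^2 dx + \int_{\Gamma}\left(\frac{\partial \hat{u}(x;w,b)}{\partial n(x)} - g_2(x)\right)^2 dx,
\end{aligned} \tag{2.3}$$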
with $\|\cdot\|_{\Omega,2}$ the norm of the $L^2(\Omega)$ Hilbert space (which is the space of square integrable functions in $\Omega$), and the same concept applies to $\|\cdot\|_{\Gamma,2}$. However, in practice, as the integrals in (2.2-2.3) are virtually impractical to compute, instead of using the $\|\cdot\|_{\Omega,2}$ and $\|\cdot\|_{\Gamma,2}$ norms, a discrete approximation is used. This is achieved by taking a random collocation of $N_\Omega$ points in $\Omega$ and $N_\Gamma$ points in $\Gamma$, which can be obtained by using a Monte Carlo hit-and-miss approach, and discretizing as $\int_{\Omega} \to 1/N_\Omega \sum^{N_\Omega}$ and $\int_{\Gamma} \to 1/N_\Gamma \sum^{N_\Gamma}$. Thus the actual loss terms become:
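$$L_1(w,b) \approx \frac{1}{N_\Omega}\sum_{i\in N_\Omega}\left(\mathcal{L}[\hat{u}(x_i;w,b)] - f(x_i)\right)^2, \tag{2.4}$$
$$L_2(w,b) \approx \frac{1}{N_\Gamma}\sum_{i\in N_\Gamma}\left(\hat{u}(x_i;w,b) - g_1(x_i)\right)^2 + \frac{1}{N_\Gamma}\sum_{i\in N_\Gamma}\left(\frac{\partial \hat{u}(x_i;w,b)}{\partial n(x_i)} - g_2(x_i)\right)^2. \tag{2.5}$$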
Observe that we have transformed the continuous loss functions into the MSE (mean squared error) on a random collocation of points of the domain and the border. Now, on these premises, the problem has changed in nature, from a numerical integration problem to an almost purely deep-learning regression type of problem. Other discretizations using the absolute error or the Huber error would have yielded equally valid approximations.
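To make the discretization concrete, the sketch below builds a hit-and-miss collocation on the unit disk and evaluates the two mean squared error terms for a toy problem. All specifics are assumptions chosen for illustration: the domain is the unit disk, the operator is the Laplacian with $f(x,y) = 6(x+y)$, only a Dirichlet condition is imposed, and the "network" is replaced by the one-parameter model $c\,(x^3+y^3)$, whose Laplacian is known in closed form so that no automatic differentiation is needed.

```python
import math
import random

def sample_domain(n, rng):
    """Hit-and-miss: draw in the bounding square, keep the points inside the disk."""
    points = []
    while len(points) < n:
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        if x * x + y * y < 1.0:          # hit: the point lies in the domain
            points.append((x, y))
    return points

def sample_border(n, rng):
    """Random points on the unit circle (the border of the disk)."""
    angles = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    return [(math.cos(t), math.sin(t)) for t in angles]

def g1(x, y):
    return x ** 3 + y ** 3               # boundary data (also the exact solution)

def f(x, y):
    return 6.0 * (x + y)                 # right-hand side: Laplacian of g1

def loss(c, domain_pts, border_pts):
    # Toy model u(x, y; c) = c * (x^3 + y^3), so Laplacian(u) = c * f(x, y)
    l1 = sum((c * f(x, y) - f(x, y)) ** 2 for x, y in domain_pts) / len(domain_pts)
    l2 = sum((c * g1(x, y) - g1(x, y)) ** 2 for x, y in border_pts) / len(border_pts)
    return l1 + l2                       # regularization term omitted

rng = random.Random(0)
domain_pts = sample_domain(200, rng)
border_pts = sample_border(50, rng)
print(loss(0.5, domain_pts, border_pts), loss(1.0, domain_pts, border_pts))
```

The loss vanishes at `c = 1.0`, where the toy model coincides with the exact solution, and is positive elsewhere.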
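Showing this second type of gradient problem is a bit less obvious than the classic one. To be able to give an intuition of the problem we will consider the case in which all the activation functions are exponential, $a[\ell]_{n_\ell}(x) = e^x$. In such case the derivatives of the activation function are simply:
$$\begin{aligned}
D_{n_\ell}\big(a[\ell]_{n_\ell}(z[\ell]_{n_\ell})\big) &= a[\ell]_{n_\ell}(z[\ell]_{n_\ell})\cdot\mathbb{1}_{n_\ell},\\
D_{n_\ell,n_\ell}\big(a[\ell]_{n_\ell}(z[\ell]_{n_\ell})\big) &= a[\ell]_{n_\ell}(z[\ell]_{n_\ell})\cdot\mathbb{1}_{n_\ell,n_\ell},
\end{aligned} \tag{2.22}$$
where $\mathbb{1}_{n_\ell}$ is the tensor whose every component is 1, and $\mathbb{1}_{n_\ell,n_\ell} = \mathbb{1}_{n_\ell}\cdot\mathbb{1}_{n_\ell}$. Then, for an exponential activation function, the derivatives (2.17) and (2.19) for the weights $w[1]$ in the first layer become:
$$\frac{\partial \hat{u}_{n_3}}{\partial w[1]^{n_0}_{m_1}} = w[3]^{n_2}_{n_3}\cdot\big(a[2]_{n_2}(z[2]_{n_2})\big)\cdot\mathbb{1}_{n_2}\cdot w[2]^{n_1}_{n_2}\cdot\big(a[1]_{n_1}(z[1]_{n_1})\big)\cdot\mathbb{1}_{n_1}\cdot\delta^{m_1}_{n_1}\cdot y[0]_{n_0}, \tag{2.23}$$
$$\begin{aligned}
\frac{\partial^2 \hat{u}_{n_3}}{\partial w[1]^{n_0}_{m_1}\,\partial x_{m_0}} ={}& w[3]^{n_2}_{n_3}\cdot\big(a[2]_{n_2}(z[2]_{n_2})\big)\cdot\mathbb{1}_{n_2,n_2}\\
&\cdot\Big(w[2]^{n_1}_{n_2}\cdot\big(a[1]_{n_1}(z[1]_{n_1})\big)\cdot\mathbb{1}_{n_1}\cdot\delta^{m_1}_{n_1}\cdot y[0]_{n_0}\Big)\cdot\Big(w[2]^{n_1}_{n_2}\cdot\big(a[1]_{n_1}(z[1]_{n_1})\big)\cdot\mathbb{1}_{n_1}\cdot w[1]^{m_0}_{n_1}\Big)\\
&+ w[3]^{n_2}_{n_3}\cdot\big(a[2]_{n_2}(z[2]_{n_2})\big)\cdot\mathbb{1}_{n_2}\cdot w[2]^{n_1}_{n_2}\cdot\big(a[1]_{n_1}(z[1]_{n_1})\big)\cdot\mathbb{1}_{n_1,n_1}\cdot\delta^{m_1}_{n_1}\cdot y[0]_{n_0}\cdot w[1]^{m_0}_{n_1}\\
&+ w[3]^{n_2}_{n_3}\cdot\big(a[2]_{n_2}(z[2]_{n_2})\big)\cdot\mathbb{1}_{n_2}\cdot w[2]^{n_1}_{n_2}\cdot\big(a[1]_{n_1}(z[1]_{n_1})\big)\cdot\mathbb{1}_{n_1}\cdot\delta^{m_0}_{n_0}\cdot\delta^{m_1}_{n_1},
\end{aligned} \tag{2.24}$$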
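Since the derivative of the exponential is essentially itself, and the activation function tensor is made of copies of exponentials for which differentiating means multiplying by $\mathbb{1}_{n_\ell}$ tensors, both expressions (2.23) and (2.24) are written in the same terms, and hence are easy to compare. As the tensor operations (sums and products) are commutative, and given a tensor $T^{i,j}_{k}$, which let us say has non-zero components for simplicity, we can define a “pseudo-inverse” of the contraction, $(T^{i,j}_{k})^{-1} = [T^{-1}]^{k}_{i,j}$ (element-wise), such that $T^{i,j}_{k}\cdot[T^{-1}]^{k}_{i,j} = i\cdot j\cdot k$ holds (if there are zero values, we would have to fix an inverse element for the zero components and discount the zeroes from the count $i\cdot j\cdot k$; here we will assume there are no zeroes for clarity); then we can easily replace (2.23) in (2.24), yielding:
$$\begin{aligned}
\frac{\partial^2 \hat{u}_{n_3}}{\partial w[1]^{n_0}_{m_1}\,\partial x_{m_0}} ={}& \frac{\partial \hat{u}_{n_3}}{\partial w[1]^{n_0}_{m_1}}\cdot\Big(\mathbb{1}_{n_2}\cdot w[2]^{n_1}_{n_2}\cdot\big(a[1]_{n_1}(z[1]_{n_1})\big)\cdot\mathbb{1}_{n_1}\cdot w[1]^{m_0}_{n_1}\Big)\\
&+ \frac{\partial \hat{u}_{n_3}}{\partial w[1]^{n_0}_{m_1}}\cdot\mathbb{1}_{n_1}\cdot w[1]^{m_0}_{n_1} + \frac{\partial \hat{u}_{n_3}}{\partial w[1]^{n_0}_{m_1}}\cdot\delta^{m_0}_{n_0}\cdot(n_0)^{-1}\cdot(y[0]_{n_0})^{-1},
\end{aligned} \tag{2.25}$$
and grouping,
$$\begin{aligned}
\frac{\partial^2 \hat{u}_{n_3}}{\partial w[1]^{n_0}_{m_1}\,\partial x_{m_0}} ={}& \Big(\mathbb{1}_{n_2}\cdot w[2]^{n_1}_{n_2}\cdot\big(a[1]_{n_1}(z[1]_{n_1})\big)\cdot\mathbb{1}_{n_1}\cdot w[1]^{m_0}_{n_1}\\
&+ \mathbb{1}_{n_1}\cdot w[1]^{m_0}_{n_1} + \delta^{m_0}_{n_0}\cdot(n_0)^{-1}\cdot(y[0]_{n_0})^{-1}\Big)\cdot\frac{\partial \hat{u}_{n_3}}{\partial w[1]^{n_0}_{m_1}}.
\end{aligned} \tag{2.26}$$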
From (2.26) we see that, in the case of exponential activation functions, we can write the derivatives with respect to a given weight and input in terms of a lower order derivative in terms of the input (this can be done for any order of the input). Now, regarding the factor in parentheses in (2.26), assuming that the input is normalized and we do not shuffle data: the third term behaves as a very large constant ($\gg 1$) which increases with the input dimension; when the weights are small, the second term is the largest, thus since the weights are small, the whole factor is small ($\ll 1$); and when the weights are large, it is the first term that dominates, making the whole factor very large ($\gg 1$). This creates the vanishing/exploding gradient effect described previously in (2.21). Also, this same analysis can be carried out with respect to any other weight or bias parameter making a few changes, but in case of considering other activation functions, although the end conclusion is the same and can be empirically visualized, the analytic study becomes much harder.
This issue is important when considering operators that contain derivatives of different order or products of derivatives, which is always ignored. For example, the Laplacian operator, which contains only additions of second order derivatives, does not have this problem, as the effect of no derivative vastly dominates the effect of another when deriving over the parameters, but the Burgers' operator presents it.
2.4 Optimizers
Recall that in section 2.2 we transformed the problem of approximating the solution of an initial/boundary problem into a non-convex optimization problem, whereby we had to find the minimum (or a sufficiently small value) of a loss function $L(w,b)$, defined by (2.1), (2.4) and (2.5). In that section we went on to anticipate that, in order to do that, we would be using a gradient based optimization technique (optimizer), which required the computation of the loss derivatives with respect to the parameters, i.e. the gradient $\nabla_{(w,b)}L(w,b)$. This led to section 2.3, where we explained the back-propagation algorithm to compute such derivatives and discussed their potential problems, namely the vanishing/exploding gradients. Now everything is set up and it is finally time to get into the process of actually making the parameters of the artificial neural network approximate the solution (optimizing the loss function), which in deep-learning jargon is known as the training process.
The following discussion will be devoted to explaining the design of some of the most important gradient based optimizers used in deep-learning, and the ones we will be using in this work. These optimizers are the so-called methods of steepest descent or methods of gradient descent, which are a family of methods used to solve general non-linear (convex or non-convex) unrestricted optimization problems. Intuitively, the idea behind these methods relies on thinking of the loss function as a hyper-surface $L(w,b)\colon\mathbb{R}^n\times\mathbb{R}^m\to\mathbb{R}$, where $w\in\mathbb{R}^n$ and $b\in\mathbb{R}^m$. Then, starting at some initial point $(w_0,b_0)$, the method goes on to calculate new points which should reduce the loss function value by moving at a certain rate, $\eta$, named the learning rate, in the direction of $-\nabla_{(w,b)}L(w,b)$. The typical analogy of this idea is thinking of it as having a ball (initial point), and letting it roll downhill along the slope (the direction opposite to the gradient) until it reaches the bottom.
As simple as these methods look conceptually, in practice it is not that easy to reach the minimum. If we were to apply one of these methods to a linear or quadratic bowl loss function ($(ax+b)^2$, $a>0$), we are guaranteed that the negative gradient at any point would always point in the direction of the only existing minimum; thus, given adequate learning rates, these methods would have perfect convergence. However, with almost every other loss function, the direction of steepest descent does not necessarily point to the global minimum. Moreover, if the problem is non-convex, as are all the ones we will be considering in this work, we are almost guaranteed that there are many local minima, and the direction of steepest descent may lead the method to a local minimum and not the global one.
Another tricky issue of these methods is the presence of saddle or “saddle-like” regions of the loss function. These are regions in which the gradient has very small derivatives in certain directions, and very large ones in others. Visualizing these regions in the loss function, they resemble, and thus are often called, “valleys”. What happens in these areas is that, in the ball analogy, the ball starts oscillating up and down along the valley's walls (directions of large value derivatives) but is unable to make any progress across the valley (directions of low value derivatives). When using steepest descent, this “saddle-like” region effect, as well as the effect of being unable to escape a local minimum, is often reflected in the method when the point and loss function start oscillating between the same two very similar values.
These are the main three problems with steepest descent: the gradient not pointing in the direction of the global minimum; getting trapped in a local minimum; and stagnating when passing through “saddle-like” regions of the loss function. In order to avoid or mitigate these issues as much as possible, there are also three measures that can be applied: choosing a good initialization (starting point); applying some regularization technique, which somewhat has the effect of smoothing the loss function; and adjusting “properly” the learning rate at each step. In the next sections we will be looking at the initialization (which is tightly related to the selection of activation function), and the regularization techniques. For the rest of this section we will see different designs of steepest descent methods which adjust the learning rates at every step based on different ideas. We will divide these designs into first order if they require only the gradient, and second order if they also require estimates of the curvature.
For an extensive qualitative survey on gradient based methods, [28] has good coverage; in particular, in Table I and Table II there is a very complete comparison among first and higher order methods respectively. Other, non-gradient-based methods are quite rare; for instance, in [29] a bio-inspired approach is used: a population of artificial neural networks is generated using different weights and architectures (hyper-parameters); the networks get tested and ranked by complexity and performance; then, a new population is generated based on the best performing networks with small alterations; and the process gets repeated.
2.4.1 First Order Methods
As we have already explained, these methods only depend on the gradient. The idea behind there being so many variations is to have the method correct its learning rate by keeping some kind of memory of the gradients at previous points (steps) to improve convergence [30]. Next, we will discuss these methods, grouping them in the following categories, from least to most refined:
– Vanilla (No Learning Rate Correction)
– Momentum Learning Rate Correction
– Component Learning Rate Adaptation
– Momentum + Component Learning Rate Adaptation
Vanilla (No Learning Rate Correction)
This group is the simplest and easiest to implement. It is actually the plain idea we have just explained; thus at every new step $t+1$ we update the previous point with the formula:
$$(w_{t+1},b_{t+1}) \longrightarrow (w_t,b_t) - \eta\,\nabla_{(w,b)}L(w_t,b_t). \tag{2.27}$$
Generally, if the batch of input data (here the random collocation of points in $\Omega$) is large, computing $\nabla_{(w,b)}L(w_t,b_t)$ in every step can be computationally very expensive. Recall that the factors in the loss function are of the form $1/N\sum_{i=1}^{N}(\dots)$, which means that if $N$ is large, in every step we have to compute a large sum of sub-gradients, $1/N\sum_{i=1}^{N}\nabla_{(w,b)}(\dots)$, which can be costly. The fix is to take what is known as a stochastic or on-line approach, that is, to divide the input data into partitions, $\{x_i\}_{i\in M_1},\dots,\{x_i\}_{i\in M_k}$, with $|M_1|+\dots+|M_k| = N$, and compute the gradient in every step just for the data of one of those partitions. If the input data is well randomized into the partitions, and the partitions are rotated consistently at every step, the differences in the gradient from not using the whole batch should be evened out throughout the many steps. When partitions of more than one data input are used the method is called mini-batch gradient descent, when the partitions contain a single data input the method is called stochastic gradient descent (SGD), and when the full batch is used the method is simply called gradient descent (GD). Oftentimes, no distinction is made between mini-batch and stochastic, and both get referred to as stochastic gradient descent. This stochastic approach can be applied to all the variations that we will be seeing next. In this work, however, we will not be sampling very large input data batches and the artificial neural networks will not be very large either, so we will always take full-batch approaches.
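The vanilla update and the splitting of the data into partitions can be sketched in a few lines. The names below are hypothetical, the parameters $(w,b)$ are flattened into a single list `p`, and the quadratic bowl used as a test loss is an assumption for illustration; `partitions` presumes the data has already been shuffled.

```python
def gradient_descent(grad, p0, eta, steps):
    """Full-batch gradient descent: p <- p - eta * grad(p) at every step."""
    p = list(p0)
    for _ in range(steps):
        g = grad(p)
        p = [p_i - eta * g_i for p_i, g_i in zip(p, g)]
    return p

def partitions(data, k):
    """Split already shuffled data into k mini-batches covering every item once."""
    return [data[i::k] for i in range(k)]

def bowl_grad(p):
    # Gradient of the quadratic bowl sum_i (p_i - 1)^2, whose minimum is at p_i = 1
    return [2.0 * (p_i - 1.0) for p_i in p]

print(gradient_descent(bowl_grad, [0.0, 3.0], eta=0.1, steps=200))
print(partitions(list(range(10)), 3))
```

A mini-batch variant would call `grad` with one of the partitions at each step, rotating through them, instead of with the full collocation.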
In terms of usage one would think that this version, being the least refined, would also be the least used, but that is far from the truth. It is true that the learning rate, $\eta$, has to be manually adapted in every step, which requires much trial-and-error experimentation. This is done by setting a learning schedule, which is the set of instructions on how to vary the learning rate (for example, one could be as follows: for the first 1000 steps use $\eta = 0.001$, then every 1000 steps reduce $\eta$ to $\eta/10$). Nonetheless, in recent times, and especially for artificial neural networks with large amounts of parameters (something that happens in very deep perceptron neural networks, or in convolutional networks by design), there have been many papers that claim plain SGD (or at most the momentum version we will be seeing next) can outclass any other variation that we will see here. The strategy in these papers is to use an unusually large learning rate, which means moving too far in the direction of the gradient and straying from the optimal path of minimal loss value, in order to create an annealing effect [31]. This annealing effect is a direct parallel of its homonym in metallurgy. Using these very “long jumps” allows for greater mobility of the point we are at in the method, giving it the capacity to get over “walls” and explore the loss hyper-surface to get into a better region, before switching to the regular small learning rate strategy used to achieve convergence. This works the same way as heating a metal to allow for greater mobility of its molecules, and then letting them settle by cooling the metal. Adding noise to the gradient has been for a long time an extremely successful regularization technique in deep-learning problems following the same principle of annealing or adding some exploration component. However, this concept takes it further, the objective being to achieve super-convergence, which happens when entering a very good region where the method suffers a drastic drop in its loss value, and convergence can be obtained many orders of magnitude faster than with a standard approach. In [32] successive cycles of short and long learning rates are used to obtain super-convergence, and [33] develops an adapted version called SGD with Entropy following these same ideas.
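The example learning schedule mentioned above (a constant rate for the first 1000 steps, then a division by 10 every 1000 steps) can be written as a small function of the step counter. The function and parameter names are hypothetical, and reading "every 1000 steps" as starting right at step 1000 is an assumption.

```python
def step_schedule(step, eta0=0.001, warmup=1000, every=1000, factor=10.0):
    """Piecewise-constant learning rate: eta0 first, then divided by `factor` periodically."""
    if step < warmup:
        return eta0
    # Number of reductions applied so far (the first one happens at step == warmup)
    drops = (step - warmup) // every + 1
    return eta0 / factor ** drops

for step in (0, 999, 1000, 2500):
    print(step, step_schedule(step))
```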
As we will see, the idea behind the next variations is to speed up the method by auto-adjusting the initial learning rate at every step. This implies less tweaking of the learning rates, as the method will reduce it natively when the loss is worsening to stay on the right track, and increase it when the loss is improving to go faster. This also makes these variations incompatible with super-convergence, as in the first step where the loss worsens, the method will immediately damp the learning rate.
Momentum Learning Rate Correction
Adding momentum to correct the learning rate in GD is very old and one of the first improvements on GD, the idea being based on keeping the inertia. In the ball analogy, if a ball is located at a certain point but was carrying some velocity in some direction, at that point it would not stop cold and resume its movement following the steepest descent. The ball will combine its previous inertia with the movement defined by the slope it is on. This is the idea behind classical momentum (CM), where the effective change $v_{t+1}$ at the step $t+1$ is not only given by the gradient at that point, but also by a certain proportion $\mu$ of the effective change of the previous step $v_t$:
$$\begin{aligned}
v_{t+1} &\longrightarrow \mu v_t - \eta\,\nabla_{(w,b)}L(w_t,b_t),\\
(w_{t+1},b_{t+1}) &\longrightarrow (w_t,b_t) + v_{t+1}.
\end{aligned} \tag{2.28}$$
Intuitively, if we were moving in a very consistent direction through the loss hyper-surface, the inertial term from the previous step $v_t$ adds to the gradient, making larger jumps in that direction. Conversely, if the direction suddenly changes, $v_t$ dampens the jump, as we might have overstepped into a bad area by taking too large a jump in the previous step. A second variation of this idea is Nesterov's accelerated gradient (NAG), which tends to yield better results than CM. The difference is that in NAG we look at the gradient not at the point we are in, but at a point projected ahead as if we had done a second jump like the one in the previous step:
$$\begin{aligned}
v_{t+1} &\longrightarrow \mu v_t - \eta\,\nabla_{(w,b)}L\big((w_t,b_t) + \mu v_t\big),\\
(w_{t+1},b_{t+1}) &\longrightarrow (w_t,b_t) + v_{t+1}.
\end{aligned} \tag{2.29}$$
This subsection is based on the article [34].
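Both momentum updates, (2.28) and (2.29), differ only in the point where the gradient is evaluated, which the following sketch makes explicit. The function name, the flattened parameter list `p` and the quadratic test loss are assumptions made for illustration.

```python
def momentum_step(p, v, grad, eta, mu, nesterov=False):
    """One step of classical momentum (CM) or Nesterov's accelerated gradient (NAG)."""
    # CM evaluates the gradient at p; NAG evaluates it at the look-ahead point p + mu * v
    look = [p_i + mu * v_i for p_i, v_i in zip(p, v)] if nesterov else p
    g = grad(look)
    v_new = [mu * v_i - eta * g_i for v_i, g_i in zip(v, g)]
    p_new = [p_i + v_i for p_i, v_i in zip(p, v_new)]
    return p_new, v_new

def bowl_grad(p):
    # Gradient of the quadratic bowl sum_i p_i^2, whose minimum is at the origin
    return [2.0 * p_i for p_i in p]

for use_nag in (False, True):
    p, v = [1.0], [0.0]
    for _ in range(500):
        p, v = momentum_step(p, v, bowl_grad, eta=0.1, mu=0.9, nesterov=use_nag)
    print("NAG" if use_nag else "CM", p)
```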
Component Learning Rate Adaptation
While momentum uses the information of the previous gradient to speed up or slow down the method when there is consistent or changing behaviour, it does little to move forward in saddle regions. Recall that this kind of region occurs when some derivatives are several orders of magnitude larger than others, i.e. some components of the gradient are much larger than others, which can be caused by vanishing or exploding gradient problems. In these cases we cannot increase the global learning rate to take longer jumps in the flatter directions because this would also make the jumps longer in the steeper directions, which require shorter steps so as not to stray from the convergence path. Also, momentum cannot help either, as it only adds up on the previous gradient, which is still small for the flatter directions. The solution is to rescale the learning rate for each component in the gradient individually based on previous gradients. So, if there have been directions which have had consistently small derivatives, we want to take larger jumps just in those directions, and conversely, for directions which have had consistently large derivatives, we want to make smaller jumps so as not to overstep out of the convergence path.
The first method that we are going to review is the Adaptive Gradient Algorithm (AdaGrad). In its original paper [35], the method is presented as follows:
$$\begin{aligned}
G_t &= \sum_{\tau=1}^{t}\nabla_{(w,b)}L(w_\tau,b_\tau)\cdot\big(\nabla_{(w,b)}L(w_\tau,b_\tau)\big)^{\top}\in\mathbb{R}^{(n+m)\times(n+m)},\\
(w_{t+1},b_{t+1}) &\longrightarrow (w_t,b_t) - \eta\,\big(\mathrm{diag}(G_t)+\varepsilon\,Id\big)^{-1/2}\,\nabla_{(w,b)}L(w_t,b_t),
\end{aligned} \tag{2.30}$$
where $G_t$ is the cumulative matrix of products of the past gradients, $\mathrm{diag}(G_t)$ is the diagonal of such matrix, $Id$ corresponds to the identity matrix, and $\varepsilon$ is a small constant to avoid dividing by zero. As the matrix $G_t$ can be computed cumulatively and only its diagonal elements are used, we suggest rewriting the method in the following vectorized way:
$$\begin{aligned}
\mathcal{G}_t &\longrightarrow \mathcal{G}_{t-1} + \nabla_{(w,b)}L(w_t,b_t)\odot\nabla_{(w,b)}L(w_t,b_t)\in\mathbb{R}^{n+m},\\
(w_{t+1},b_{t+1}) &\longrightarrow (w_t,b_t) - \eta\,(\mathcal{G}_t+\varepsilon\mathbb{1})^{-1/2}\odot\nabla_{(w,b)}L(w_t,b_t),
\end{aligned} \tag{2.31}$$
where products, inverses and roots are all element-wise, and $\mathbb{1}$ is the vector consisting of all ones. Observe that if some direction of the gradient has consistently small derivatives, the cumulative value of $\mathcal{G}_t$ will be small, and thus dividing by the square root of that value will increase the learning rate for that direction (the inverse happens for components with large derivatives). This is sort of approximating the curvature in the principal directions by the values of its past gradients. However, this cumulative nature is this method's main problem: as we are constantly accumulating positive values, $\mathcal{G}_t$ becomes increasingly large at each step, and since we are constantly dividing the learning rate by it, the method halts its progress and is unable to escape local minima as time passes.
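The vectorized form (2.31) translates almost literally into code. In the sketch below the names are hypothetical, the parameters are flattened into a list `p`, and the test loss is a "valley" $100x^2 + 0.01y^2$ chosen as an assumption to show how the per-component rescaling equalizes the first step along a steep and a flat direction.

```python
import math

def adagrad_step(p, G, grad, eta, eps=1e-8):
    """One AdaGrad step: accumulate squared gradients, rescale each component."""
    g = grad(p)
    # Element-wise accumulation of the squared gradient
    G_new = [G_i + g_i * g_i for G_i, g_i in zip(G, g)]
    # Element-wise update with an individual effective learning rate per component
    p_new = [p_i - eta * g_i / math.sqrt(G_i + eps)
             for p_i, g_i, G_i in zip(p, g, G_new)]
    return p_new, G_new

def valley_grad(p):
    # Gradient of 100 x^2 + 0.01 y^2: very steep in x, very flat in y
    return [200.0 * p[0], 0.02 * p[1]]

p, G = adagrad_step([1.0, 1.0], [0.0, 0.0], valley_grad, eta=0.1)
print(p, G)
```

Both components move by roughly `eta` in the first step, even though their derivatives differ by four orders of magnitude; since `G` can only grow, later steps become progressively shorter.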
An improvement on AdaGrad comes with AdaDelta [36], which mitigates the effect of the strong decay in learning rates of AdaGrad. Instead of using the accumulated information of all the squared previous gradients, it uses an exponentially decaying moving average of the squared values of the gradient. That is, instead of $\mathcal{G}_t$, it uses $E[(\nabla_{(w,b)}L(w_t,b_t))^2]$:
$$E[(\nabla_{(w,b)}L(w_t,b_t))^2] = \rho\,E[(\nabla_{(w,b)}L(w_{t-1},b_{t-1}))^2] + (1-\rho)\,\nabla_{(w,b)}L(w_t,b_t)\odot\nabla_{(w,b)}L(w_t,b_t), \tag{2.32}$$
where $\rho$ is the decay rate of the moving average. On top of this change, the method also adds a second idea. Since dividing by the square root of (2.32) is sort of a very brute approximation of dividing by the local curvature, in an attempt to resemble a second order Newton method, a term approximating the slope is multiplied. This term is a moving average of the squares of previous increments, $E[(\Delta(w_t,b_t))^2]$, which uses the same decay rate as before:
$$E[(\Delta(w_t,b_t))^2] = \rho\,E[(\Delta(w_{t-1},b_{t-1}))^2] + (1-\rho)\,\Delta(w_t,b_t)\odot\Delta(w_t,b_t), \tag{2.33}$$
then the final algorithm at each step $t$ works as:
$$\begin{aligned}
&\text{Compute (2.32)},\\
&\Delta(w_t,b_t) \longrightarrow -\frac{\sqrt{E[(\Delta(w_{t-1},b_{t-1}))^2]+\varepsilon\mathbb{1}}}{\sqrt{E[(\nabla_{(w,b)}L(w_t,b_t))^2]+\varepsilon\mathbb{1}}}\odot\nabla_{(w,b)}L(w_t,b_t),\\
&\text{Compute (2.33)},\\
&(w_{t+1},b_{t+1}) \longrightarrow (w_t,b_t) + \Delta(w_t,b_t).
\end{aligned} \tag{2.34}$$
As a general note on the equations posed for this method, all the operations in (2.32-2.34) are element-wise. Finally, observe that the increment $\Delta(w_t,b_t)$ in (2.34) has in its numerator a term that approximates the slope and in its denominator a term that approximates the curvature, which tries to replicate the structure $H(f)^{-1}\cdot\nabla f$ of a Newton method.
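A minimal sketch of one AdaDelta step is given below. It follows the ordering of the algorithm as published in [36], which is stated here as an assumption about the intended reading of (2.32)-(2.34): the gradient average is updated first, the increment is the gradient rescaled by the ratio of the two root mean squares (using the increment average of the previous step) with a negative sign, and only then is the increment average updated. Function and variable names are hypothetical.

```python
import math

def adadelta_step(p, Eg2, Ed2, grad, rho=0.95, eps=1e-6):
    """One AdaDelta step; note that no learning rate appears."""
    g = grad(p)
    # Decaying average of the squared gradient
    Eg2 = [rho * e + (1.0 - rho) * g_i * g_i for e, g_i in zip(Eg2, g)]
    # Increment: -RMS(previous increments) / RMS(gradients) * gradient, element-wise
    delta = [-math.sqrt(d + eps) / math.sqrt(e + eps) * g_i
             for d, e, g_i in zip(Ed2, Eg2, g)]
    # Decaying average of the squared increments
    Ed2 = [rho * d + (1.0 - rho) * dl * dl for d, dl in zip(Ed2, delta)]
    p = [p_i + dl for p_i, dl in zip(p, delta)]
    return p, Eg2, Ed2

def bowl_grad(p):
    # Gradient of the quadratic bowl sum_i p_i^2
    return [2.0 * p_i for p_i in p]

p, Eg2, Ed2 = [1.0], [0.0], [0.0]
for _ in range(20):
    p, Eg2, Ed2 = adadelta_step(p, Eg2, Ed2, bowl_grad)
print(p)
```

The first steps are very short because the increment average starts at zero and only `eps` keeps the numerator away from zero; this slow start is a known trait of the method rather than an artefact of the sketch.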
A method developed in parallel, very popular and much simpler than AdaDelta, to solve the fast damping of AdaGrad is RMSProp. This is an unpublished method proposed in a Coursera course by Geoffrey Hinton, in lecture 6.5 [37]. This method was thought of as an adaptation of RProp, which is a method originally designed for full-batches, to be able to account for mini-batches. The RProp method does not take into account the magnitude of the derivatives in the gradient, and instead only takes into consideration the sign of the derivatives. Each direction's learning rate is increased slightly every time the sign of its corresponding derivative is preserved, and drastically decreased whenever the sign of the derivative changes, everything within a certain threshold. When working with mini-batches this method can have many problems, as some sub-gradient may change the sign of some derivative due to the characteristics of that specific mini-batch, and not because the method has entered a region with a different behaviour. For instance, if 9 out of the last 10 derivatives in a direction have been positive and only one has been negative, we do not want to drastically reduce its learning rate. To fix this lack of resilience RMSProp uses the following moving average:
$$\begin{aligned}
E[(\nabla_{(w,b)}L(w_t,b_t))^2] &= 0.9\,E[(\nabla_{(w,b)}L(w_{t-1},b_{t-1}))^2] + 0.1\,\nabla_{(w,b)}L(w_t,b_t)\odot\nabla_{(w,b)}L(w_t,b_t),\\
(w_{t+1},b_{t+1}) &\longrightarrow (w_t,b_t) - \eta\,\big(E[(\nabla_{(w,b)}L(w_t,b_t))^2]+\varepsilon\mathbb{1}\big)^{-1/2}\odot\nabla_{(w,b)}L(w_t,b_t),
\end{aligned} \tag{2.35}$$
where again all the operations are element-wise. Note that the directional adaptation of the gradient at the current step $t$ is introduced with a factor of 0.1. This gives robustness to the method, as it requires persistence in the change of a sign through several steps to change the behaviour of the method. RMSProp also works better than RProp with full-batch, due to this robustness.
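Equation (2.35) can be sketched as follows. The names are hypothetical, the decay 0.9 is exposed as the parameter `rho`, and the default learning rate and the quadratic test loss are assumptions for illustration.

```python
import math

def rmsprop_step(p, Eg2, grad, eta=0.001, rho=0.9, eps=1e-8):
    """One RMSProp step with decay rho (0.9 in the text)."""
    g = grad(p)
    # Decaying average of the squared gradient: the new gradient enters with weight 1 - rho
    Eg2 = [rho * e + (1.0 - rho) * g_i * g_i for e, g_i in zip(Eg2, g)]
    # Element-wise rescaling of the gradient by the root of that average
    p = [p_i - eta * g_i / math.sqrt(e + eps) for p_i, g_i, e in zip(p, g, Eg2)]
    return p, Eg2

def bowl_grad(p):
    # Gradient of the quadratic bowl sum_i p_i^2
    return [2.0 * p_i for p_i in p]

p, Eg2 = rmsprop_step([1.0], [0.0], bowl_grad)
print(p, Eg2)
```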
Momentum + Component Learning Rate Adaptation
This last type of methods combines the ideas of momentum and component learning rate adaptation. Recall that momentum introduced information about the slope by preserving some of the gradient of the last step, and component adaptation rescaled the components of the gradient in each direction by dividing by the square root of the square of the gradient, which is some kind of approximation of the curvature in the principal directions corresponding to the elements in the diagonal of the Hessian, and tells us about the variation of the slope and the directions in which we can go faster. Combining slope and curvature to get some sort of first order Newton method has already been done by AdaDelta; however, as the information of the slope came from previous increments (already corrected gradients) and not strictly from previous gradients (the definition of momentum), we refrained from including it in this section.
The first main method that embraced this approach is Adam [38] (2014), not taking into account AdaDelta (2012). Adam, at the present time (2020), is one of the best result-yielding first order methods, and it has become the almost de facto optimizer in deep-learning applications. It combines the versatility of pure classical momentum (to escape local minima) and component learning rate adaptation (to escape saddle-points), and it is quite fast.
The me hod design is as ollows:
𝑚𝑡⟶1
1−(𝛽1)𝑡(𝛽1𝑚𝑡−1+(1−𝛽1)∇(𝑤,𝑏)𝐿(𝑤𝑡,𝑏𝑡)),
𝑣𝑡⟶1
1−(𝛽2)𝑡(𝛽2𝑣𝑡−1+(1−𝛽2)∇(𝑤,𝑏)𝐿(𝑤𝑡,𝑏𝑡)⊙∇(𝑤,𝑏)𝐿(𝑤𝑡,𝑏𝑡)),
(𝑤𝑡,𝑏𝑡)⟶(𝑤𝑡−1,𝑏𝑡−1)−𝜂 𝑚𝑡
√𝑣𝑡+𝜀,
(2.36)
whe e all ope a ions (sums, p oduc s, oo s and in e ses) a e elemen wise, 𝑚𝑡is called
he i s o de momen um in 𝑡and 𝛽1is i s decay a e, and 𝑣𝑡is called he second o de
momen um in 𝑡and 𝛽2is i s decay a e.
Observe that inside the big parenthesis of $m_t$ we have a decaying average of the gradient,
which resembles the classical momentum as defined in (2.28), and the big parenthesis of $v_t$
is exactly the same as the decaying average approximation of the principal direction curvature in
AdaDelta (2.32). Each of the momentums is given an exponential rescale factor in the
form of the term $1/(1-(\beta)^t)$, which tends to 1 as $t$ increases. Hence, since $0<\beta_2<\beta_1<1$,
in the beginning $m_t$ dominates over $v_t$, giving some sort of annealing effect by prioritizing
momentum over curvature in the first steps. Finally, the steps are updated as in AdaDelta,
following a Newton-like approach.
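As an illustration, the update (2.36) can be sketched in a few lines of plain Python. This is only a minimal sketch: the function name `adam_step`, the flat-list representation of the parameters, and the numerical values of `eta`, `beta1`, `beta2` and `eps` are assumptions made for the example, not the implementation used in this work. The rescale factors are applied inside the recursion exactly as (2.36) is written, which may differ from other presentations of Adam.

```python
import math


def adam_step(params, grads, m, v, t, eta, beta1, beta2, eps=1e-8):
    """One elementwise update following the form of (2.36).

    params, grads, m, v are flat lists of floats of equal length (an
    assumption of this sketch); t is the step counter, starting at 1.
    """
    scale1 = 1.0 / (1.0 - beta1 ** t)  # rescale factor of m_t
    scale2 = 1.0 / (1.0 - beta2 ** t)  # rescale factor of v_t
    new_params, new_m, new_v = [], [], []
    for p, g, m_prev, v_prev in zip(params, grads, m, v):
        m_t = scale1 * (beta1 * m_prev + (1.0 - beta1) * g)      # decaying average of the gradient
        v_t = scale2 * (beta2 * v_prev + (1.0 - beta2) * g * g)  # decaying average of its square
        new_params.append(p - eta * m_t / (math.sqrt(v_t) + eps))
        new_m.append(m_t)
        new_v.append(v_t)
    return new_params, new_m, new_v


# Illustrative values only, chosen so that 0 < beta2 < beta1 < 1 as in the text.
params, m, v = adam_step([1.0, -2.0], [0.5, -4.0], [0.0, 0.0], [0.0, 0.0],
                         t=1, eta=0.1, beta1=0.9, beta2=0.8)
```

In the first step, starting from zero momentums, each parameter moves by approximately `eta` against the sign of its gradient, regardless of the gradient magnitude.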
Other notable variations of Adam are: AMSGrad [39], which tries to fix the convergence
problem of Adam in some instances (however, it is argued that the very specific instances
that AMSGrad fixes do not really occur in real problems, thus it is sometimes regarded as a more
complex and noisier version of Adam); Nadam [40], which uses Nesterov's momentum instead
of classical momentum (hence the N in its name); and AdamW, which tries
to incorporate a Tikhonov regularization (which we will see in the Regularization section)
inside the optimizer, instead of adding it to the loss function. As an additional comment,
very recently a new type of more sophisticated first order methods, which do not even require
specifying a learning rate, has appeared, apparently yielding better results than Adam and
its variations. One such method is YellowFin [41].
2.4.2 Second Order Methods
The previous first order methods yield good results in relatively small artificial neural
networks (a few layers deep). They are not very computationally intensive and have linear
convergence (which is oftentimes enough), all at the expense of fixing a hyper-parameter,
namely the learning rate. In particular, AdaDelta and Adam have proven to work really well
against vanishing/exploding gradient and sparse gradient problems. Sparse gradients are
a "kind" of vanishing gradients which happens when the dataset is sparse, i.e. there are rare
features that occur in very few data points. Recall that, given the form of the loss function,
the gradient is actually a sum of sub-gradients, each associated to an individual data
point, $\frac{1}{N}\sum_{i=1}^{N}\nabla_{(w,b)}(\ldots)$. Hence, the contributions to the gradient to fit these rare features
are small in comparison to more common features, as there are fewer points and sub-gradients
that can add to the sum. In that case it is said that there is a weak signal of that feature,
and in practice this means that the parameters associated with that feature have smaller
derivatives, creating saddle regions as the vanishing gradient problem does.
Nevertheless, however well these first order methods work in many small problems with a
somewhat homogeneous dataset, there are two related motives occurring in more complex
problems that may require the consideration of higher order methods:
– The first motive is computational cost: As we consider larger artificial neural networks,
the number of parameters scales up, and the smaller number of steps required with the
quadratic (or almost quadratic) convergence of second order methods starts to become a
computational advantage over the simpler but more numerous steps required with the linear
convergence of first order methods.
– The second motive, closely related to the first, is the high slope variation: As the number
of parameters increases or the input dataset becomes noisier, the loss function
hyper-surface starts becoming more "bumpy", meaning that when using first order methods
the strides in the direction of the gradient have to be shorter to account for its variation,
i.e. the learning rate has to be reduced. Recall that AdaDelta and Adam corrected the
learning rate based on some sort of approximation of the diagonal of the Hessian. Therefore,
when the Hessian grows (which happens when the number of parameters increases),
the effects of the elements outside of the diagonal aggregate to become more relevant, and the
methods lose part of their effectiveness.
Out of all the second order methods, the classic optimization Newton method is the
principal one. This method relies on the Taylor expansion up to second order to approximate
the function to be optimized by a quadratic function, in a local neighbourhood or region of
confidence of a point $(w_0,b_0)$. In our case, the loss function can be approximated by:
\[
L((w_0,b_0)+p)\approx L(w_0,b_0)+\nabla_{(w,b)}L(w_0,b_0)^{\top}p+\frac{1}{2}\,p^{\top}H(w_0,b_0)\,p, \tag{2.37}
\]
where $p$ is an increment within the region of confidence and $H(w_0,b_0)$ is the Hessian matrix
at $(w_0,b_0)$. Then, as (2.37) is a quadratic function of $p$, it should have a unique minimum $p_0$
(provided the Hessian is positive definite). Thus, differentiating the expression (2.37) with respect
to $p\in\mathbb{R}^{n\times m}$, the minimum $p_0$ must satisfy:
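\[
\begin{aligned}
\frac{\partial}{\partial p}\,L((w_0,b_0)+p) &\approx \frac{\partial}{\partial p}\left(L(w_0,b_0)+\nabla_{(w,b)}L(w_0,b_0)^{\top}p+\frac{1}{2}\,p^{\top}H(w_0,b_0)\,p\right),\\
\frac{\partial}{\partial p}\,L((w_0,b_0)+p) &\approx \nabla_{(w,b)}L(w_0,b_0)+H(w_0,b_0)\,p,\\
0=\frac{\partial}{\partial p}\,L((w_0,b_0)+p_0) &\approx \nabla_{(w,b)}L(w_0,b_0)+H(w_0,b_0)\,p_0,\\
p_0 &\approx -H^{-1}(w_0,b_0)\,\nabla_{(w,b)}L(w_0,b_0).
\end{aligned}
\tag{2.38}
\]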
Therefore the optimization Newton method, being at $(w_t,b_t)$ in step $t$, computes a new
step point by approximating the original function in a confidence region around $(w_t,b_t)$ by
a quadratic function using Taylor's theorem, then finds the increment $p_t$ that minimizes
that quadratic approximation of the original function, and moves using that increment. In
summary, the optimization Newton method update rule is:
\[
\begin{aligned}
p_t &\rightarrow -H^{-1}(w_t,b_t)\,\nabla_{(w,b)}L(w_t,b_t),\\
(w_{t+1},b_{t+1}) &\rightarrow (w_t,b_t)+p_t.
\end{aligned}
\tag{2.39}
\]
The increment $p_t$ is known as the search direction in these types of methods.
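The Newton update can be sketched for a toy problem with two parameters. This is only an illustrative example under stated assumptions: the quadratic loss, the helper names `newton_step` and `grad_L`, and the starting point are all hypothetical and not taken from this work. Because the toy loss is exactly quadratic, the second order approximation is exact and a single step lands on the minimum.

```python
def newton_step(grad, hess):
    """Return p = -H^{-1} grad for a 2x2 Hessian, using the explicit 2x2 inverse."""
    (h11, h12), (h21, h22) = hess
    det = h11 * h22 - h12 * h21
    if det == 0.0:
        raise ValueError("singular Hessian: the Newton increment is undefined")
    g1, g2 = grad
    p1 = -(h22 * g1 - h12 * g2) / det
    p2 = -(-h21 * g1 + h11 * g2) / det
    return p1, p2


# Hypothetical quadratic loss L(x, y) = x^2 + x*y + 2*y^2 - 3*x.
def grad_L(x, y):
    return (2.0 * x + y - 3.0, x + 4.0 * y)


HESSIAN = ((2.0, 1.0), (1.0, 4.0))  # constant for a quadratic loss

x0, y0 = 5.0, -1.0                   # arbitrary starting point
p = newton_step(grad_L(x0, y0), HESSIAN)
x1, y1 = x0 + p[0], y0 + p[1]        # one Newton step: the gradient vanishes here
```

For a general loss the Hessian changes at every step, and both its computation and its inversion grow quickly with the number of parameters.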
Note that no hyper-parameters are required, and second order convergence is guaranteed
(locally) by Taylor's theorem. As a major drawback, the method involves computing the Hessian
matrix and inverting it, which is an extremely impractical and computationally expensive
task, even if the number of parameters is just moderately large. Thus, there is a series of
methods that modify this optimization Newton method to use approximations instead of the
whole inverse of the Hessian, but preserve many of the good properties of the original. As a
trade-off for their decrease in computational complexity, these methods lose their quadratic
convergence, but they still achieve a much better than linear convergence, usually referred to as
super-linear convergence, which outclasses the convergence of any first order method.
Quasi-Newton Method
In the Quasi-Newton family each step update uses the same idea as in the Newton method,
with a small variation. Instead of computing and using the Hessian matrix $H(w_t,b_t)$, we
use an approximation matrix $B_t$ which we have to update in every step. This means that, in
essence, all the reasoning and derivation of the update rule are completely analogous to that
of (2.37-2.38), with the only difference being writing $B_t$ instead of $H(w_t,b_t)$. The final update
rule will in fact have the same blueprint as Newton's,
\[
\begin{aligned}
p_t &\rightarrow -B_t^{-1}\,\nabla_{(w,b)}L(w_t,b_t),\\
(w_{t+1},b_{t+1}) &\rightarrow (w_t,b_t)+p_t,
\end{aligned}
\tag{2.40}
\]
with a few additions (two in particular). The first is that, in every step, $H(w_t,b_t)$ is being
replaced by $B_t$, which is a matrix that changes with the curvature, but does not necessarily
have to be an exact approximation of the Hessian matrix (for instance, it could be a scaled
down version or be displaced). This means that we can trust the search direction $p_t$ for its
direction but not for its magnitude; thus, we will require a learning rate $\alpha_t$ to scale $p_t$ at
every step. There are two ways to compute $\alpha_t$ at each step, namely inexact line search and
trust regions. Coincidentally, the Quasi-Newton methods that we will be seeing next use inexact
line search, and the truncated Newton methods of the next subsection use trust regions. In
particular, the inexact line search that we will use is the satisfaction of the Wolfe conditions,
which are given by the following set of inequalities:
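\[
\begin{aligned}
L((w_t,b_t)+\alpha_t p_t) &\leq L(w_t,b_t)+c_1\,\alpha_t\,\nabla L(w_t,b_t)^{\top}p_t,\\
\nabla L((w_t,b_t)+\alpha_t p_t)^{\top}p_t &\geq c_2\,\nabla L(w_t,b_t)^{\top}p_t,
\end{aligned}
\tag{2.41}
\]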
with $0<c_1<c_2<1$. Using the Wolfe conditions, $\alpha_t$ is progressively decreased until the
inequalities (2.41) are satisfied. This guarantees that the learning rate satisfies the sufficient
decrease and curvature conditions.
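The check of the Wolfe conditions (2.41) can be sketched for the simplest case of a single scalar parameter. The function names, the constants `c1 = 1e-4` and `c2 = 0.9` (commonly quoted choices, assumed here), the halving factor and the toy loss are all assumptions of this example. Note also that simply shrinking the step is not guaranteed to reach a step size that fulfils the curvature inequality in general, so this sketch gives up after a fixed number of trials.

```python
def satisfies_wolfe(loss, grad, x, p, alpha, c1=1e-4, c2=0.9):
    """Check both inequalities of (2.41) for a scalar parameter x and direction p."""
    slope0 = grad(x) * p                      # directional derivative at the current point
    x_new = x + alpha * p
    sufficient_decrease = loss(x_new) <= loss(x) + c1 * alpha * slope0
    curvature = grad(x_new) * p >= c2 * slope0
    return sufficient_decrease and curvature


def backtracking_wolfe(loss, grad, x, p, alpha0=1.0, shrink=0.5, max_iter=50):
    """Decrease alpha progressively until (2.41) holds; return None if it never does."""
    alpha = alpha0
    for _ in range(max_iter):
        if satisfies_wolfe(loss, grad, x, p, alpha):
            return alpha
        alpha *= shrink
    return None


# Toy example: L(x) = x^2 at x = 1, searching along the negative gradient p = -2.
alpha = backtracking_wolfe(lambda x: x * x, lambda x: 2.0 * x, x=1.0, p=-2.0)
```

Here the full step `alpha = 1` overshoots to `x = -1` without decreasing the loss, so it is rejected, while the halved step reaches the minimum and is accepted.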
The second issue is how to compute the matrices $B_t$ at every step. In principle, two
general requirements are demanded, which help calculate the matrix: it has to be symmetric
like the Hessian, and it must satisfy the secant equation (or Quasi-Newton equation). This
secant equation can be obtained by differentiating, with respect to the increment variable $p$,
the quadratic approximation (2.37 with $B_t$) at a given step $t$:
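\[
\begin{aligned}
\frac{\partial}{\partial p}\,L((w_t,b_t)+p) &\approx \frac{\partial}{\partial p}\left(L(w_t,b_t)+\nabla_{(w,b)}L(w_t,b_t)^{\top}p+\frac{1}{2}\,p^{\top}B_t\,p\right),\\
\frac{\partial L((w_t,b_t)+p)}{\partial((w_t,b_t)+p)}\cdot\frac{\partial((w_t,b_t)+p)}{\partial p} &\approx \nabla_{(w,b)}L(w_t,b_t)+B_t\,p,
\end{aligned}
\tag{2.42}
\]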
An overfitted model can be harder to train. This is because, in the optimization process,
the gradient that updates the parameters contains contributions of the extra degrees of
freedom of the model. This can make the optimization process much noisier and susceptible
to over-representing outliers. Also, overfitted models tend to generalize poorly, i.e. their
ability to predict on new data decreases, as the prediction for any data not strictly in the
training set would not simply be the extrapolation of the relations among the training points,
but would also contain the contributions of the extra degrees of freedom. Regularization
techniques help reduce the number of degrees of freedom of the models.
Figure 2.10: Example of overfitting of a model.
Figure 2.10 shows an example of an overfitted model. Both the blue line
and the orange line are polynomial models that fit the training data (red points). The orange
model (which has more coefficients than the blue model) is overfitted though, since, most
prominently at the extreme points, it generates a bumpy behaviour completely unrelated to
that of the training data, caused by the unnecessary extra parameters that we have added.
In practice, finding the exact right number of hyper-parameters required in a model is an
almost impossible task due to the sheer amount of possibilities. Besides, a trial and error strategy
would be impractical for comparing models, since training is a computationally costly process
that is sensitive to the optimizer and the initial conditions. Hence, effectively, since there is no
general rule that can be followed, the hyper-parameters are often chosen within a reasonable rough
margin (mostly based on the results of the first few steps of the optimization), guaranteeing
some overfitting. Then regularization techniques are used to clamp down on the extra degrees
of freedom of the model. This is a much more viable approach than training an almost
exponentially increasing amount of models with different numbers of hyper-parameters to
narrow down the right number, which neither underfits nor overfits the data.
Each of the following subsections will be dedicated to a different regularization
technique. We will improvise two categories to group these techniques based on the main
general principles behind them, namely noise-based regularizations and restriction-based
regularizations. In this work we will only be using restriction-based regularizations though.
2.6.1 Noise-based Regularizations
Behind the noise-based regularizations lies the idea of adding a stochastic component
throughout the training process to introduce some (small) variance into the model. Too much variance
can lead to a chaotic model (undesirable), but adding a small variance during training can
be very beneficial, as it can somewhat be seen as employing the extra degrees of freedom to
account for the extra variability of the model.
Figure 2.11: Example of a model adding noisy input.
Intuitively, the concept in its most trivial form can be seen in Figure 2.11, which is nothing
more than Figure 2.10 to which we have added some extra noisy points (the original points
plus some extra noise). It now becomes apparent that when fitting the model the end result
would be closer to the well-defined model (blue line) than to the overfitted model (orange
line), since now the model also has to account for the green dots, for which the blue line has
a smaller error, especially at the extremes. Thus, if the variance is small, the extra degrees of
freedom are spent to ensure that small variations in the data do not yield overwhelmingly
large changes in the model.
Now that we have explained how noise works, we will be looking at how to introduce it
into the model or the training phase. The most obvious way to introduce variance into
the model is the direct approach, that is, adding noise to the input data, like for
example $x_i+\mathcal{N}(\mu,\sigma)$. A much smarter way to apply this concept is making use of some
invariant to generate new data, in what is called data augmentation. This happens a lot
in object recognition, whereby a cat in a picture is a cat independently of the image being
rotated 90º or the cat appearing in the centre or a corner of the picture; thus we can rotate
or shift the pictures to generate new inputs.
There are much more sophisticated ways to introduce variance into the model. One of these
is using noisy neurons, which implies that, only during the training phase, we add noise to
the output of each neuron, that is, $y^{[\ell]}_{n_\ell}+\mathcal{N}(\mu,\sigma)$. Another one is the dropout technique
[52], which only during the training phase uses a probability to suppress the output of a given
neuron. Hence, for example, for every neuron at each step we would draw a number from a
uniform distribution $p\sim\mathcal{U}(0,1)$, and if $0.9<p<1$ we would set its output to 0 (this would
be a dropout of 10%). Lastly, we recall that the annealing effect of vanilla stochastic gradient
descent, explained in section 2.4.1, can also be considered some sort of noise-based
regularization technique.
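Both noisy neurons and dropout, as described above, can be sketched in plain Python. The function names, the use of a zero-mean Gaussian, and the representation of a layer's outputs as a list of floats are assumptions of this example. Many library implementations of dropout additionally rescale the surviving outputs; this sketch does not, to match the description in the text.

```python
import random


def noisy_output(y, sigma, rng):
    """Noisy neuron: add zero-mean Gaussian noise to a neuron output (training only)."""
    return y + rng.gauss(0.0, sigma)


def dropout(outputs, rate, rng, training=True):
    """Suppress each output with probability `rate`, only during training."""
    if not training:
        return list(outputs)
    # Draw p ~ U(0, 1) per neuron and set the output to 0 when p > 1 - rate.
    return [0.0 if rng.random() > 1.0 - rate else y for y in outputs]


rng = random.Random(0)                        # seeded only to make the example repeatable
layer_outputs = [1.0] * 10000
dropped = dropout(layer_outputs, rate=0.1, rng=rng)
fraction_suppressed = dropped.count(0.0) / len(dropped)   # close to 0.1
```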
One of the main issues with noise is that we want to introduce some small variance
throughout the model during training, but we want this variance to remain controlled and small
along the process, so that the end result is not a chaotic model fitting only noise. By this we
mean that we do not want the effect of the variance introduced in the early layers (the ones
closest to the input) to explode in the following layers. We want the variance contribution
to remain small as it gets fed into the next layers. The principal tools to ensure this, as
well as providing other very good properties, are the normalization techniques, which have
become a staple in many deep-learning models, namely batch and layer [53] normalization. They consist
in normalizing either the input batch or the outputs of the neurons of every layer, respectively.
In general, small models (especially in their number of layers) with well initialized
parameters do not require normalization, because at those dimensions the noise will most
likely not scale up. One inconvenience of normalization is that it correlates the gradients.
Recall from (2.4-2.5) that the loss function is a summation over each of the individual losses
at the collocation points, which makes the derivatives of the loss with respect
to the parameters a sum of uncorrelated derivatives. Normalization entangles these derivatives, which
makes the update gradient of the optimizer correlated with respect to the collocation points;
thus, the gradient computation at every step of the training becomes less sparse and more
computationally expensive. Additionally, normalization does not work well with dropout.
The reason we have disregarded noise-based in favour of restriction-based regularizations,
although both are mutually compatible, is that understanding the behaviour of noise in
a context where we are considering the derivatives of the artificial neural network is very
risky and becomes exponentially more complex with the order of the derivatives. Also, in
the case of normalization (which we have tested for this work), where we do not actually
introduce variance but limit its effects, the extra increase in computational cost, caused by
the correlation of the gradient, builds up on the already unavoidable computation of the higher
order derivatives of the neural network required in this work, making the training process
many times slower and impractical. On the contrary, the techniques that we have categorized
as restriction-based are mostly (soft or hard) binds on the parameters. Thus, their application
does not interfere with the computation of $L_1$ and $L_2$, and so their effect is applied afterwards.
2.6.2 Restriction-based Regularizations
On the other side of the spectrum lie what we have named restriction-based regularization
techniques. These are a set of soft or hard constraints on the parameters, which are either added
to the loss function or applied independently. Hence, any extra degrees of freedom in the model may
be invested in fulfilling these constraints.
The most common of all these are the weight penalties (strictly speaking, parameter
penalties). These regularization techniques rely on an extra term, which is added to the
loss function and imposes some preference on the parameters (this would correspond to the
placeholder term $R$ introduced in (2.1)). The most notable of these regularizations is the
popular Tikhonov regularization, which implies adding the following term:
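\[
R=\lambda\left(\|w\|_2^2+\|b\|_2^2\right):=\lambda\sum_{\ell,\,n_\ell,\,m_{\ell-1}}\left(\left\|w^{[\ell]\,m_{\ell-1}}_{n_\ell}\right\|_2^2+\left\|b^{[\ell]}_{n_\ell}\right\|_2^2\right), \tag{2.57}
\]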
where $\lambda$ is the regularization scale factor. Technically, this is the same Tikhonov regularization
as the one used in the least squares method for linear regression. One possible interpretation
of it is that we impose a "minimum energy" model, i.e. we are looking for a model where
the biases and especially the slopes are the smallest possible (yielding a "flatter model"), which
happens naturally as we minimize the term $R$ in the loss function. Another common reading
is that the additive contribution of $\nabla_{(w,b)}R$ to the total update gradient introduces some sort
of dampening force, which penalizes the optimization method when moving in directions where
the loss function, $L(w,b)$, is less smooth (which happens with large values of $w$ and $b$). Going
back to Figure 2.10, the well-defined model would best satisfy the term (2.57).
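The term (2.57) and its additive contribution to the update gradient can be sketched as follows. The function names and the flattening of all weights and biases into plain lists are assumptions of this example; the gradient is simply $2\lambda$ times each parameter, which is the "dampening force" pulling every parameter towards zero.

```python
def tikhonov_penalty(weights, biases, lam):
    """R = lam * (||w||_2^2 + ||b||_2^2) over flat lists of parameters."""
    return lam * (sum(w * w for w in weights) + sum(b * b for b in biases))


def tikhonov_gradient(weights, biases, lam):
    """Gradient of R with respect to each weight and bias: 2 * lam * parameter."""
    return [2.0 * lam * w for w in weights], [2.0 * lam * b for b in biases]


# Hypothetical parameter values, for illustration only.
R = tikhonov_penalty([1.0, -2.0], [0.5], lam=0.1)          # 0.1 * (1 + 4 + 0.25)
grad_w, grad_b = tikhonov_gradient([1.0, -2.0], [0.5], lam=0.1)
```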
Furthermore, the reason why these are called weight penalties and not parameter penalties
is that in most cases this regularization only affects the slopes (weights), and $\|b\|_2^2$, the bias
terms in (2.57), are dropped. For this work though, we are considering the bias term, as we
believe it provides some sort of centring effect on the neuron outputs which (in the particular
problem instances chosen for this work) helps in speeding up the training. However, in a general
context, this could prove tricky and is generally undesirable, especially in instances where the
solution of the initial/boundary problem we want to approximate has many different localized
features, i.e. the solution is very bumpy (which will not be the case in this work).

The reason for this is that biases, although not strictly necessary in an artificial neural network (a
weights-only network is absolutely functional), when applied help to optimize the differentiation
among neurons in the same layer. For example, in a neuron using a sigmoid activation function,
given two inputs from two different regions and combining them with the weights, suppose
we obtain values $0.5$ and $1.5$. Then, computing the activation with a neuron bias $b=0$
yields a difference in output between the two inputs of $\sim 0.19$, whereas for $b=2$ it yields a
difference in output between the two inputs of $\sim 0.05$. This is apparent from Figure 2.7, as we
see that the maximal slope is centred around zero. Hence, using a bias to shift the product
of inputs and weights in a neuron can lead to an increase or decrease of the difference in
outputs among values in different regions, an effect that applying a Tikhonov term, which
pushes $b\to 0$, can even negate, since, contrary to the multiplicative contribution of the weights
in the neuron, biases have an additive one, requiring much larger values to have a significant
effect. A second interpretation can be drawn from the explanation given in Figure 2.8, whereby we
argued that two increasing functions could be combined to form a sort of dissipative square
pulse function. By reducing the biases, we reduce the amplitude of the plateaus (width of
the windows) of these arrangements, which reduces some of the specificity that can be achieved
for certain regions in the model.

Finally, we could still argue that the loss of some localized
specialization in the neurons due to the term $\|b\|_2^2$ should not pose a problem for the
adaptability of the model, as it would simply make the contributions of the neurons in a
layer for a region more overlapping: why should this be a concern and undesirable? Although
this last statement is true, a generally desired feature of a good artificial neural network is
for it to be a sparse artificial neural network, i.e. that for any input given to the network
almost all of the signal is contributed by just a few neurons (or in other words, any input only
requires passing through a few relevant neurons, and not all of them have to be active
at the same time). Because of the objective of this work, which is proving that it is possible to
approximate solutions of initial/boundary problems by artificial neural networks, we would
rather have the extra regularization effects of the term than obtain a sparse network.
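The two figures quoted in the sigmoid example can be checked with a short computation. The helper names below are hypothetical; the pre-activation values $0.5$ and $1.5$ and the biases $0$ and $2$ are those of the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def output_gap(z_a, z_b, bias):
    """Difference in sigmoid output between two pre-activations shifted by a bias."""
    return abs(sigmoid(z_b + bias) - sigmoid(z_a + bias))


gap_b0 = output_gap(0.5, 1.5, bias=0.0)   # about 0.195
gap_b2 = output_gap(0.5, 1.5, bias=2.0)   # about 0.047
```

With the larger bias both inputs fall on the flat part of the sigmoid, so the neuron distinguishes them roughly four times less.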
Another custom weight penalty that we have devised, and which seems to work rather well in this
work, is the following:
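\[
\begin{aligned}
R &=\lambda\left(\left\|\frac{\partial\hat{u}(x)}{\partial w}-\frac{\partial}{\partial w}\frac{\partial\hat{u}(x)}{\partial x}\right\|_2^2+\left\|\frac{\partial\hat{u}(x)}{\partial b}-\frac{\partial}{\partial b}\frac{\partial\hat{u}(x)}{\partial x}\right\|_2^2\right)\\
&:=\lambda\sum_{\ell,\,n_\ell,\,m_{\ell-1}}\left(\left\|\frac{\partial\hat{u}(x)}{\partial w^{[\ell]\,m_{\ell-1}}_{n_\ell}}-\frac{\partial}{\partial w^{[\ell]\,m_{\ell-1}}_{n_\ell}}\frac{\partial\hat{u}(x)}{\partial x}\right\|_2^2+\left\|\frac{\partial\hat{u}(x)}{\partial b^{[\ell]}_{n_\ell}}-\frac{\partial}{\partial b^{[\ell]}_{n_\ell}}\frac{\partial\hat{u}(x)}{\partial x}\right\|_2^2\right),
\end{aligned}
\tag{2.58}
\]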
with the idea being, instead of using the extra degrees of freedom to obtain the solution with
minimal slopes, to find a solution whose derivatives with respect to the parameters, and with
respect to both parameters and inputs, are similar in magnitude. This turns out to work quite well,
as we will see in the next section, and in fact it requires no extra computations, as all the
derivatives involved in (2.58) are automatically calculated when computing $\nabla_{(w,b)}L_1$ (the loss
of the differential operator).
A topic that we have not covered yet is the setting of the regularization coefficient $\lambda$. This
has to be done manually, and it has to be revised and, when needed, updated at every step of the
training process. We want the model to prioritize minimizing its errors $L_1+L_2$ (main objective)
over fitting the term $R$ (secondary objective). Typically, to establish priority, as with any other
multi-objective function, we use the coefficient $\lambda$ to limit the magnitude with which the
regularization term $R$ contributes to the total loss, without it becoming irrelevant. One possible
reasonable demand would be to ask for the regularization term magnitude to be between 10% and 20%
of the main term magnitude, or in other words $0.1\cdot(|L_1|+|L_2|)\lessapprox|R|\lessapprox 0.2\cdot(|L_1|+|L_2|)$,
and adjust $\lambda$ every time the criterion is not met. A seemingly more logical approach at first
sight would be to adapt $\lambda$ as a function of the magnitudes of the terms. For example, in the
Tikhonov case, we could always make $\lambda=0.1\,(|L_1|+|L_2|)/\left|\left(\|w\|_2^2+\|b\|_2^2\right)\right|$, for $R$ to always be 10% of the other
two terms. However, bear in mind that every time we modify $\lambda$ we are in fact changing the
total loss function $L(w,b)$, which is detrimental to the robustness and convergence of the
optimization process. Thus, it is better to use a threshold that allows the optimizer to train
on a fixed hyper-surface for many steps until a correction has to be made, than to have the
optimizer move on a hyper-surface that changes in every step, more so considering that all
but vanilla SGD use some kind of memory from previous gradients, which becomes irrelevant
if the hyper-surface has changed. Nevertheless, in section 3.3 in the next chapter, we will
propose an alternative approach to deal with this issue as part of a larger framework to deal
with multi-objective loss functions, which seems to perform much better than this classical
threshold strategy and achieves faster convergence.
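The threshold strategy just described can be sketched as a small helper that leaves $\lambda$ untouched while the regularization term stays within the 10%-20% band, and only corrects it otherwise. The function name, the choice of resetting to the middle of the band (15%), and the convention that `penalty` is the unscaled term (so that $R=\lambda\cdot$ `penalty`) are assumptions of this example, not the scheme implemented in this work.

```python
def adjust_lambda(lam, main_loss, penalty, low=0.1, high=0.2, target=0.15):
    """Return lam unchanged while low <= R / main_loss <= high, else reset it.

    main_loss stands for |L1| + |L2| and penalty for the unscaled
    regularization term, so that R = lam * penalty.
    """
    if main_loss <= 0.0 or penalty <= 0.0:
        return lam                      # nothing sensible to rescale against
    ratio = lam * penalty / main_loss
    if low <= ratio <= high:
        return lam                      # inside the band: keep the hyper-surface fixed
    return target * main_loss / penalty


kept = adjust_lambda(1.0, main_loss=10.0, penalty=1.5)       # ratio 0.15: unchanged
corrected = adjust_lambda(1.0, main_loss=10.0, penalty=5.0)  # ratio 0.5: reset
```

Keeping $\lambda$ fixed inside the band is what lets optimizers with gradient memory accumulate meaningful information between corrections.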
An alternative to weight penalties are the weight constraints (or parameter constraints),
which are sets of inequalities that can be applied to the parameters, either by component
value, like $a<\left|w^{[\ell]\,m_{\ell-1}}_{n_\ell}\right|<b$, or by node or layer norm, $a<\left\|w^{[\ell]}_{n_\ell}\right\|_2^2<b$. The way to apply
these inequalities is usually by clipping: for example, if we update a parameter in a
given training step and it surpasses the upper bound $b$, then the parameter is set to $b$. This effectively
stalls the training of parameters that have become too large (usually with very dominant effects)
or prevents parameters that have become too small (usually with very negligible
effects) from vanishing, forcing a more even distribution in the relevance of the parameters. A more common
clipping practice is gradient clipping, which imposes an upper bound on the gradients with respect to
the parameters that are used to update them. Hence, this limits the
effect of any exploding derivative, as any extremely large derivative, which would mean
an extremely large update, is instantly reduced by gradient clipping to a
maximum reasonable range. In the training of all the models in this work we have implemented
a component-wise upper bound of $10^3$ for weight and bias clipping, which is reasonable enough for the
small scale of the models, and an upper bound of 1 for gradient clipping by norm over all the parameters
in each layer.
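Both clipping operations can be sketched in plain Python with the bounds quoted above. The function names, the flat-list representation of a layer's parameters and gradients, and the symmetric reading of the component bound (absolute value at most $10^3$) are assumptions of this example.

```python
import math


def clip_by_value(params, bound=1e3):
    """Component-wise clipping: force every parameter into [-bound, bound]."""
    return [max(-bound, min(bound, p)) for p in params]


def clip_by_norm(grads, max_norm=1.0):
    """Rescale a layer's gradient so that its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    return [g * max_norm / norm for g in grads]


clipped_params = clip_by_value([2000.0, -5000.0, 1.0])   # [1000.0, -1000.0, 1.0]
clipped_grads = clip_by_norm([3.0, 4.0])                 # norm 5 rescaled to norm 1
```

Clipping by norm preserves the direction of the gradient and only shortens it, whereas clipping by value can change the direction.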
Finally, we propose and will implement the following idea for a regularization, which can be
drawn from the context of this work. Since we want to train artificial neural networks to satisfy
differential equations, and thus approximate their unique exact solutions, any conservation
law satisfied by the exact solutions must also be approximately satisfied by the artificial neural
network. Hence we can add conservation laws to the loss function in the same way we did
with the other weight penalty regularizations, by simply placing the conservation laws into
the placeholder $R$ (and optionally adding some regularization constant). Actually, we could
argue that, since the network must also satisfy the conservation law as closely as
the exact solution does, the term is not actually a regularization but a legitimate extra term
which speeds up training, rather than an extra condition which helps select some specific model
among the many that approximate the solution (essentially a regularization).
Not many differential equations have known conservation laws though, and thus they are hard to
come by. Besides, in cases where conservation laws are known, adding an external force
to the equation (something we will be doing in this work) usually invalidates such laws. Sometimes,
although it is rare, this can be accounted for by deriving the conservation law again with the
external force, and this can lead to the original law with some extra terms, like the integral
of the external force over the domain, if the domain is bounded.
2.6.3 Other Regularizations
In this subsection we will look at two very common practices that might as well fall in the
category of regularization. The first is known as pre-training, which consists of, instead of
initializing a new artificial neural network to approximate an initial/boundary problem,
using an already existing one as a starting point, in the hope that this network is
already close to the desired outcome. All the attempts in this work to use pre-training with
artificial neural networks trained to only fit the initial/border data, to only fit the domain, or
to fit the differential equation dropping any of its terms, have either had the same performance
as using no pre-training, or a worse one. The most plausible explanation might be that, the
loss function being multi-objective, it is best to keep a balanced agreement between the two parts
from the start, rather than starting by fitting either $L_1$ or $L_2$, as the region in the parameter
space we can fall in during these one-term optimizations might be useless or even detrimental
to the other term, thus making the combined optimization worse.
The second is early stop, which is not only always applicable, but useful in many ways. Data
used to fit artificial neural networks (or any model, for that matter) is usually split into two groups,
the training data and the validation data. The training data is used to fit the model (it is
the data fed into the loss function during the optimization), and the validation data is used
as a control mechanism to prevent overfitting of the model. Therefore, every certain number
of iterations in the process (for example, 1000 iterations), we evaluate the loss with respect to
the validation set, and if this validation set loss has worsened with respect to its previous
evaluation, then we stop the training (this is early stop). The principle here is that the
training process is blind to the validation data (it is not used), yet the model should still fit
this data as part of its capability to generalize beyond the training points. Similarly to the
workings of noise, if the validation loss gets worse, the behaviour beyond the training points
becomes undesirable, and thus the model is overfitting. When this happens we can simply
stop the training completely. Alternatively, it might be a sign that the learning rate of the optimizer
is too large and we have to reduce it; we can try to introduce some extra regularization or adapt the
regularization coefficient $\lambda$ to correct the model; or we can generate a new batch, which is
also a regularization technique, and resume the training. Hence, early stopping is very useful
not only as a regularization technique, but also because it gives a cue to rectify the training of the model
when the process is stalled. Throughout this work we use validation intervals (we check the
validation loss) every 1000 steps of training.
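The early stop criterion can be sketched as a generic training loop. The function name and the two callables `train_step` and `validation_loss` are hypothetical placeholders for whatever performs one optimizer step and evaluates the validation loss; the toy usage feeds made-up loss values and a short interval of 10 steps purely to keep the example small.

```python
def train_with_early_stop(train_step, validation_loss, max_steps, check_every=1000):
    """Train until the validation loss worsens between two consecutive checks.

    Returns the step at which training stopped (max_steps if it never worsened).
    """
    previous = float("inf")
    for step in range(1, max_steps + 1):
        train_step()
        if step % check_every == 0:
            current = validation_loss()
            if current > previous:       # validation loss got worse: early stop
                return step
            previous = current
    return max_steps


# Toy usage with made-up validation losses that worsen at the third check.
fake_losses = iter([1.0, 0.5, 0.7, 0.6])
stopped_at = train_with_early_stop(lambda: None, lambda: next(fake_losses),
                                   max_steps=100, check_every=10)
```

Instead of returning, the same check could trigger any of the corrective actions listed above, such as reducing the learning rate or adapting $\lambda$.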
As a final note, we want to address the reason why we do not check the validation loss at
every step. The first motive is that first order optimizers are not always smooth, i.e. the
loss function can be decreasing but in an oscillating (conjugate) manner (especially around
valleys); thus, over a very short span, early stop could mistake one of these fluctuations, where
there is a local maximum, for a stop criterion. For second order methods, which use a
line search that guarantees that there is always a decrease in loss (or they simply stop), the
reason is instead that it is computationally more expensive to evaluate an extra loss at each step, and
a few more steps past the early stop criterion will not substantially change the model.
Chapter 3
Case Studies and Simulations
In this chapter we will finally be training artificial neural networks to approximate the
solutions of some instances of initial/boundary value problems, starting with the simplest case
and building up to more complex operators.
The layout of this chapter will be fairly consistent. The first three sections are
dedicated to general implementation topics of practical relevance to this work: the coding
framework, the function approximation capabilities of artificial neural networks, and the adaptation
of multi-objective function training. Each of the remaining sections follows the same structure
of posing a problem instance, training, and result analysis, with different operators. All of the
operators used in this chapter have already been detailed in Table 1.2, and as mentioned in
the introduction, we will be using only Cauchy initial/boundary conditions. With regard to
the external forces, we will be selecting them ad hoc in every problem so that the solution is
a simple known polynomial. This way we can benchmark the artificial neural network results
against the exact solution with ease.
3.1 Coding Artificial Neural Networks
From an implementation standpoint, training deep-learning models requires the computation
of many operations, especially linear combinations (tensor operations). Recall that an artificial
neural network neuron is composed of a linear combination of the outputs of the previous layer
and the application of a non-linear activation function. In terms of activation functions,
little can be done to improve performance, but the many sums and products of the linear
combinations are susceptible to high parallelization, as they are mostly independent among
neurons, low in computational cost and high in number. Therefore, in order to speed up
learning, instead of using CPUs, which at the time of this work have up to 8/16 cores (i.e.
processing units, and thus the maximum amount of operations that can be performed in parallel), we can
make use of the already existing GPU hardware. GPUs are optimized for image processing,
a process which relies heavily on matrix multiplication. Thus, contrary to CPUs, composed of
a few powerful cores, GPUs are built using a large number of lower-end cores, which at the
time of this work can be on average 120 cores. By parallelizing the linear combination
operations over the many cores of a GPU, we can reduce the training time of an artificial neural
network manifold, especially in large networks. Nowadays, a new piece of hardware specially
designed for deep-learning training has emerged, called TPUs (Tensor Processing Units). This
hardware contains an even larger number of cores and its architecture is designed ad hoc to
parallelize tensor operations, improving on the capabilities of GPUs.
In order to manage and distribute the flow of operations to make the most of GPUs
and TPUs, several software solutions have been developed. We will briefly give a basic
understanding of the most prominent high/medium/low level options.
On the lowest level, almost exclusively, lies the API named CUDA, developed by the GPU
maker NVidia. This API allows for direct control and distribution of operations to the cores
in a GPU/TPU. However, from a practical perspective, unless we really want to customize
and micromanage the resources in our GPU/TPUs, this level of control is too much. Thus,
there are several middle-level libraries used in deep-learning that automatically handle these
tasks, the most popular ones being TensorFlow, developed by Google, and PyTorch, which
is open source (both running on CUDA). The way these libraries work is by implementing
their own class of multi-dimensional objects (like arrays), and every time tensor operations
are performed on these objects, they use CUDA under the hood to distribute the operations
over the GPU and/or TPU cores automatically. This simplifies the work by allowing us to
concentrate on programming the mathematical framework of the models without having to
deal with the management of the parallelization tasks. Lastly, on top of these libraries, there
are also higher level ones, such as the Keras library, which hinges on TensorFlow. These build
on the multi-dimensional class to further implement classes for layers, models, optimizers,
training, and more, with many options, creating a structure that allows a model to be built
and trained in a very simple and encapsulated manner.
The code of this work has been written using the Python version of TensorFlow 2.3. Some of
the principal classes of the Keras library that implement the layers, models and optimizers
have been imported but only serve as a structure: since they were not applicable to the special
formulation of this work, they had to be completely overwritten. Moreover, the execution has
been done through Jupyter Notebooks in the Google Colab cloud environment, which offers a
free Nvidia K80/T4 GPU. For the code, refer to Appendix B.
3.2 Approximating a Function
Here we will be studying the approximating capabilities of an artificial neural network to
model a function. This can be considered, in the context of this work, as the simplest case of
differential equation possible, the trivial case of the identity operator, whereby the artificial
neural network should be adjusted to satisfy:
\[ \mathcal{L}[\hat{u}(x)] = f(x) \;\Rightarrow\; \hat{u}(x) = f(x), \tag{3.1} \]
which is equivalent to simply having the artificial neural network model the external force
function. This operator being of order zero, initial/boundary conditions are irrelevant, and
thus, the loss function to optimize is:
\[ L(w,b) = L_1(w,b) + R = \frac{1}{N_\Omega} \sum_{i \in N_\Omega} \left( \hat{u}(x_i; w,b) - f(x_i) \right)^2 + R. \tag{3.2} \]
The (external force) function that we will be approximating in this section will be the
polynomial $f(x) = x(x-1)$. Using this instance as an example, we will compare how well
different optimizers and activation functions work at the task of modelling functions, as well
as explain some of the behaviours of training. Here, the basic metric to assess performance is
the relation between the loss function and the iterations, which represents how well the model fits the solution at
every step. As the loss function can have very steep decreases in value, we will be using
logarithmic scales for better representations. Moreover, for every model we will be showing
a plot of the end result compared to the real solution, and in future evaluations,
where it actually applies, we will also be decomposing the total loss into its components $L_1$
and $L_2$.
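As a minimal sketch of the metric described above, the data term of (3.2) can be evaluated for any candidate model as a plain mean squared error against $f(x) = x(x-1)$. The collocation grid and the two candidate functions below are illustrative assumptions (they are not trained networks), and the regularization term $R$ is left out.

```python
import math

def f(x):
    """External force used in this section: f(x) = x(x - 1)."""
    return x * (x - 1.0)

def mse_loss(u_hat, xs):
    """Data term L1 of (3.2): mean squared mismatch between u_hat and f."""
    return sum((u_hat(x) - f(x)) ** 2 for x in xs) / len(xs)

# Hypothetical collocation points: 101 equally spaced samples in [0, 1].
xs = [i / 100.0 for i in range(101)]

# Two illustrative candidates standing in for a network output.
def exact(x):
    return x * x - x

def crude(x):
    return -0.25 * math.sin(math.pi * x)

loss_exact = mse_loss(exact, xs)
loss_crude = mse_loss(crude, xs)

# The loss is reported on a log10 scale, as in the figures of this section.
log10_crude = math.log10(loss_crude)
print(loss_exact, loss_crude, log10_crude)
```

The log10 value is what would be plotted against the iteration number; a candidate that matches $f$ exactly drives the data term to (numerically) zero.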
Lastly, we will be using an artificial neural network with an input layer with 1 neuron,
two hidden layers with 3 and 4 neurons respectively, and an output layer with 1 neuron, which we
will call a [3,4,1]-ANN, to approximate (3.1). First, we will start by comparing how different
choices of activation functions work for the same network layout.
For this purpose we will use an Adam optimizer fixed to $\eta = 0.01$, $\beta_1 = 0.9$,
and $\beta_2 = 0.999$, and we will look at the performance over the first 10000 iterations without any
adjustments. The only regularization applied will be a parameter upper bound of 10e3 and a
gradient clipping by layer norm of 1, which, as explained in the regularization section of this
work, will be the standard. From here on, initializations are done as detailed in 2.5.1, using the
normal distribution versions.
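The optimizer set-up above can be sketched in a few lines. The following is a hedged, self-contained illustration of the standard Adam update with the stated hyper-parameters, combined with clipping of the gradient to norm 1. The function names, the `eps` constant and the toy quadratic objective are assumptions made for illustration; the thesis code relies on the Keras optimizer instead, and clips by layer rather than over a single flat vector.

```python
import math

def clip_by_norm(grad, max_norm):
    """Rescale the gradient so that its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > max_norm:
        return [g * max_norm / norm for g in grad]
    return list(grad)

def adam_minimize(grad_fn, params, steps, eta=0.01, beta1=0.9, beta2=0.999,
                  eps=1e-8, max_norm=1.0):
    """Standard Adam iteration with bias correction and gradient clipping."""
    m = [0.0] * len(params)   # first moment estimate
    v = [0.0] * len(params)   # second moment estimate
    params = list(params)
    for t in range(1, steps + 1):
        g = clip_by_norm(grad_fn(params), max_norm)
        for i in range(len(params)):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1.0 - beta2) * g[i] ** 2
            m_hat = m[i] / (1.0 - beta1 ** t)
            v_hat = v[i] / (1.0 - beta2 ** t)
            params[i] -= eta * m_hat / (math.sqrt(v_hat) + eps)
    return params

# Toy objective (hypothetical): (p0 - 2)^2 + (p1 + 1)^2, minimum at (2, -1).
def toy_grad(p):
    return [2.0 * (p[0] - 2.0), 2.0 * (p[1] + 1.0)]

result = adam_minimize(toy_grad, [0.0, 0.0], steps=2000)
print(result)
```

Because Adam normalizes each step by the running second moment, early steps have a size close to $\eta$ regardless of the gradient scale, which is why the learning rate largely sets how fast the initial decay is.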
Figure 3.1: Comparison of different activation functions training performance for a
[3,4,1]-ANN, with Adam $\eta = 0.01$, $\beta_1 = 0.9$, $\beta_2 = 0.999$. Log10 scale.
From Figure 3.1 we can observe that by the end of the training, for many of these activation
functions, the loss value stagnates and an oscillating behaviour starts to appear. In some
sources this is called saturation, meaning that the model is unable to learn more. Fundamentally, this
is intrinsic to the model, because we are approximating functions which may have (and actually
do have) a very different analytic structure from the parametric model we are using. Hence, in the
same way as with a Taylor series expansion, where we have to truncate at
some order to obtain a finite model and thereby incur an error, here we will also have an intrinsic
minimal error for the model. However, this being a non-convex optimization problem, we do
not know whether these saturations correspond to reaching the intrinsic error of the model (global
minimum), or to a local minimum or a valley. When this happens, if we
have implemented early stopping in the training loop, the process will stop (which happened for
the exponential and softplus activations in Figure 3.1). Then, we can choose to strengthen
the regularization (not very effective), to reduce the learning rate of the optimizer, or to use a
second order optimizer, in the hope that it is a valley and we can escape it. If we use a second order method
(in this work L-BFGS) and the method stops, we can be almost completely sure that the loss
is at some minimum and we will not be able to escape it. This is because the stopping criterion
is triggered when the line search cannot find any step ratio along the search direction that actually decreases
the loss function (and the line search looks for this ratio with exponential decay), thus almost
guaranteeing that we are at a minimum. Here is where the luck involved in non-convex optimization comes into
play: a different initialization (even a different draw from the same initialization scheme), a simpler or more
complex network layout, or an apparently worse performing optimizer can lead to a different
optimization path through the loss hyper-surface, leading to a better fitting model.
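The stopping argument above can be illustrated with a simplified line search. The sketch below shrinks the step ratio geometrically and reports failure when no ratio decreases the loss. It uses a plain decrease test rather than the Wolfe-type conditions commonly used with L-BFGS, and the function name, its defaults and the toy loss are assumptions for illustration only.

```python
def backtracking_step(loss, params, direction, alpha0=1.0, decay=0.5,
                      max_shrinks=50):
    """Return a step ratio that strictly decreases `loss`, or None if none is found.

    The trial ratio decays exponentially: alpha0, alpha0*decay, alpha0*decay^2, ...
    """
    base = loss(params)
    alpha = alpha0
    for _ in range(max_shrinks):
        trial = [p + alpha * d for p, d in zip(params, direction)]
        if loss(trial) < base:
            return alpha
        alpha *= decay
    return None  # no decrease found: the optimizer would stop here

# Toy loss (hypothetical): (p - 1)^2, with gradient 2(p - 1).
def toy_loss(p):
    return (p[0] - 1.0) ** 2

def descent_direction(p):
    return [-2.0 * (p[0] - 1.0)]

step_far = backtracking_step(toy_loss, [3.0], descent_direction([3.0]))
step_min = backtracking_step(toy_loss, [1.0], descent_direction([1.0]))
print(step_far, step_min)
```

Away from the minimum a valid ratio is found after one shrink; at the minimum the search exhausts its ratios and returns `None`, which is the situation in which the second order method halts.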
In this benchmark we have used quite a minimal model to ensure it is not too overfitted
(regularization can only fix some overfitting), the loss function is quite smooth, and we
have used a fairly robust optimizer. Therefore, we can assume with some confidence that the
saturation corresponds at least to some minimum close to the global one. This leads us to
extrapolate, as a general criterion, that the activation functions that saturate the latest and
at lower values, i.e. sigmoid, hyperbolic tangent and swish, are preferable to the exponential
or softplus activation functions, and thus we will prioritize them in the upcoming simulations
(which does not rule out that, for some particular instance, an exponential or softplus activation
could outperform the others).
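For reference, the five activation functions compared above can be written down directly. This is a sketch using the usual textbook definitions; in particular, taking swish as $z \cdot \sigma(z)$ and the exponential activation as $e^z$ are assumptions, since the definitions used in the thesis appear in an earlier chapter.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    return math.tanh(z)

def swish(z):
    # Assumed form: z * sigmoid(z) (swish with unit slope parameter).
    return z * sigmoid(z)

def softplus(z):
    return math.log1p(math.exp(z))

def exponential(z):
    # Assumed form: plain exponential.
    return math.exp(z)

activations = {
    "sigmoid": sigmoid,
    "tanh": tanh,
    "swish": swish,
    "softplus": softplus,
    "exponential": exponential,
}

# Values at a few sample inputs, to compare growth and boundedness.
for name, fn in activations.items():
    print(name, [round(fn(z), 4) for z in (-5.0, 0.0, 5.0)])
```

Sigmoid and tanh are bounded, swish and softplus grow linearly for large positive inputs, and the exponential grows without any such limit.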
As a second part of this benchmarking, we will compare possibilities for the other crucial
choice in training, the optimizers. We will be using the same set-up as before, but this time,
instead of fixing the optimizer, we will be fixing the activation functions to be sigmoids.
Figure 3.2: Comparison of different first order optimizers training performance for a
[3,4,1]-ANN, with sigmoid activations. Lower image in log10 scale.
From these first 10000 iterations, all using a learning rate of $\eta = 0.01$ (the rest of the hyper-
parameters of the optimizers can be read from the legend in Figure 3.2), we can see many
of the behaviours expected from section 2.4.1. Vanilla, classic momentum and Nesterov momentum SGD
all had a very similar performance, reaching an almost-saturation around the same loss value,
which means that at that point we should have manually reduced the learning rate. Although
it is hard to discern from Figure 3.2, Nesterov momentum had the fastest initial decrease until
reaching the state of almost-saturation, followed by classic momentum and vanilla SGD, as
expected.
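The three SGD variants compared above differ only in how the update is built from the gradient. The sketch below writes the textbook forms of the three updates side by side on a one-dimensional toy problem. The momentum coefficient `mu = 0.9`, the number of steps and the quadratic objective are assumptions for illustration; the actual hyper-parameters are those in the legend of Figure 3.2.

```python
def sgd_variants(grad, theta0, eta=0.01, mu=0.9, steps=500):
    """Run vanilla, classic momentum and Nesterov momentum SGD side by side."""
    vanilla = theta0
    classic, v_c = theta0, 0.0
    nesterov, v_n = theta0, 0.0
    for _ in range(steps):
        # Vanilla: step against the gradient at the current point.
        vanilla -= eta * grad(vanilla)
        # Classic momentum: accumulate a velocity, then move along it.
        v_c = mu * v_c - eta * grad(classic)
        classic += v_c
        # Nesterov: evaluate the gradient at the look-ahead point.
        v_n = mu * v_n - eta * grad(nesterov + mu * v_n)
        nesterov += v_n
    return vanilla, classic, nesterov

# Toy objective (hypothetical): theta^2, with gradient 2*theta.
final_vanilla, final_classic, final_nesterov = sgd_variants(
    lambda th: 2.0 * th, theta0=1.0)
print(final_vanilla, final_classic, final_nesterov)
```

On this convex toy problem all three variants reach the minimum; the differences in the initial decay discussed above only become relevant on the harder, non-convex losses of the actual models.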
For this instance we will focus on the effect on the models of varying the activation functions.
We will be training 3 models, with sigmoid, hyperbolic tangent and swish activation functions
respectively. All models will be trained for 3000 iterations (epochs), using a [1,5,5,1]-ANN,
with no regularization, and using Adam with $\eta = 0.01$, $\beta_1 = 0.9$ and $\beta_2 = 0.999$. Table 3.1
shows the end result losses, and Figure 3.7 plots the performance of the training (the losses
in the figure have been broken into each of their components).
Activation    $L$                     $L_{sol}$
Sigmoid       $1.66 \cdot 10^{-4}$    $1.71 \cdot 10^{-6}$
Tanh          $1.20 \cdot 10^{-4}$    $4.02 \cdot 10^{-7}$
Swish         $6.41 \cdot 10^{-5}$    $1.75 \cdot 10^{-6}$
Table 3.1: Results of 3 models trained for a [1,5,5,1]-ANN scheme, with no regularization,
using Adam with $\eta = 0.01$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, on 3000 epochs. (3.5)
Figure 3.7: Training performance of 3 models trained for a [1,5,5,1]-ANN scheme, with no
regularization, using Adam with $\eta = 0.01$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, on 3000 epochs. (3.5)
From the previous results we can see some interesting behaviours. First of all, the worst
performing activation (the swish) with regard to the solution loss $L_{sol}$ was actually the best
in terms of the objective loss $L$; and the best performing activation (the tanh) with regard
to the solution loss $L_{sol}$ was not the best in terms of the objective loss $L$. Also, we see that
there is a correlation between $L$ (3rd plot of Figure 3.7) and $L_{sol}$ (4th plot of Figure 3.7), and
that $L_1$ dominates over $L_2$, meaning $L_2$ is natively much smaller than $L_1$. As well as this, we
observe that the swish model had a much later initial decay than the other two, but the three
of them start saturating at the same time. All these are expected behaviours that we have
explained before.
Finally, in the next figure we plot the output of the best performing model (the one with
tanh activations) against the exact solution. Note that in just 3000 epochs (2 min) the match
is almost perfect.
Figure 3.8: Final results. Best performing trained model (tanh) for (3.7) against the exact
solution.
3.4.2 Model 2: The 2D Divergence Operator
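Here we will take the previous model to the next level by adding a dimension, and in doing so
we will consider our first PDE. Still, this will be a very simple problem. The instance we will
be considering first is:
\[ \begin{aligned}
\nabla_{(x,y)} \cdot u(x,y) \cdot \mathbf{1} &= \frac{\partial u(x,y)}{\partial x} + \frac{\partial u(x,y)}{\partial y} = (2x-1)\cdot(y^2-y) + (x^2-x)\cdot(2y-1), \\
u(x,0) &= 0, \quad x \in (-\infty,\infty),
\end{aligned} \tag{3.8} \]
which has exact solution $u(x,y) = (x^2-x)\cdot(y^2-y)$. The loss function for (3.8) would be:
\[ \begin{aligned}
L(w,b) ={}& \frac{1}{N_\Omega} \sum_{1 \le i \le N_\Omega} \left( \frac{\partial \hat{u}(x_i,y_i;w,b)}{\partial x} + \frac{\partial \hat{u}(x_i,y_i;w,b)}{\partial y} - (2x_i-1)(y_i^2-y_i) - (x_i^2-x_i)(2y_i-1) \right)^2 \\
&+ \frac{1}{N_\Gamma} \sum_{1 \le i \le N_\Gamma} \left( \hat{u}(x_i,0;w,b) - 0 \right)^2 + R(w,b).
\end{aligned} \tag{3.9} \]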
However, there is an issue when using the loss function (3.9). The border conditions are
described by a curve over $x \in (-\infty,\infty)$, but in practice we cannot draw samples from such a
wide range. Since we are limiting ourselves to approximating the solutions in the domain
$\Omega = [0,1] \times [0,1]$ for practical reasons, we will sample $x$ from $(-10,10)$ for the $L_2$ term.
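As a hedged illustration of how the two terms of (3.9) are assembled, the sketch below evaluates the domain term on random points of $[0,1] \times [0,1]$ and the border term on $x$ sampled from $(-10,10)$. Central finite differences stand in for the automatic differentiation used in the actual code, the regularization $R$ is omitted, and the sample sizes, seed and function names are assumptions.

```python
import random

def exact_u(x, y):
    """Exact solution of (3.8)."""
    return (x * x - x) * (y * y - y)

def force(x, y):
    """Right-hand side of (3.8)."""
    return (2 * x - 1) * (y * y - y) + (x * x - x) * (2 * y - 1)

def divergence(u, x, y, h=1e-3):
    """du/dx + du/dy by central differences (stand-in for autodiff)."""
    du_dx = (u(x + h, y) - u(x - h, y)) / (2 * h)
    du_dy = (u(x, y + h) - u(x, y - h)) / (2 * h)
    return du_dx + du_dy

def loss_terms(u, domain_pts, border_xs):
    """Domain term L1 and border term L2 of (3.9), without regularization."""
    l1 = sum((divergence(u, x, y) - force(x, y)) ** 2
             for x, y in domain_pts) / len(domain_pts)
    l2 = sum((u(x, 0.0) - 0.0) ** 2 for x in border_xs) / len(border_xs)
    return l1, l2

rng = random.Random(0)
domain_pts = [(rng.random(), rng.random()) for _ in range(200)]
border_xs = [rng.uniform(-10.0, 10.0) for _ in range(200)]

l1_exact, l2_exact = loss_terms(exact_u, domain_pts, border_xs)
l1_zero, l2_zero = loss_terms(lambda x, y: 0.0, domain_pts, border_xs)
print(l1_exact, l2_exact, l1_zero, l2_zero)
```

The exact solution makes both terms vanish (up to rounding), whereas the zero function satisfies the border term but not the domain term, which shows that the two components measure different things and are worth tracking separately.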
Figure 3.9 shows the results of training a model under the previous
assumptions (specifics in the caption). Observe that the left plot shows that the solution
approximated by the model has two separate regions, one approximating the exact
solution really well, and another one that misses it by a large margin. If we turn to the right plot, we see
that the MSE of the individual points with respect to the differential operator/external force
is very even, meaning every point is equally well fitted.
Figure 3.9: Result of a [1,10,10,1]-ANN model with tanh activations, trained with no
regularization, using Adam with $\eta = 0.01$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, on 12000 epochs. Left
plot: model against exact solution. Right plot: MSE error of the model, for each point in the
domain.
This occurs because, in practice, when we draw sample points for the border conditions,
we are limiting ourselves to $x \in (-10,10)$. Hence, for all purposes we are solving (3.8) with
boundary conditions $u(x,0) = 0$, $x \in (-10,10)$, which are no longer Cauchy conditions and
do not guarantee uniqueness. The solution we want to find is also a solution to the problem
we are fitting in practice, but there are many more. In fact, what we see in Figure 3.9 is the
artificial neural network overlapping two different solutions to the problem (the one closest
to $y = 0$ is the one we would want). Thus, this is a good example of what happens when
integrating a problem which is not well-posed.
In order to fix this issue we will change the Cauchy “border” conditions, which for infinite
domains would be simply an open curve, to their finite domain version, which requires the
information over the whole border. This means, for $\Omega = [0,1] \times [0,1]$, changing (3.8) to:
\[ \begin{aligned}
\nabla_{(x,y)} \cdot u(x,y) \cdot \mathbf{1} &= \frac{\partial u(x,y)}{\partial x} + \frac{\partial u(x,y)}{\partial y} = (2x-1)\cdot(y^2-y) + (x^2-x)\cdot(2y-1), \\
u(x,0) &= 0, \quad u(x,1) = 0, \quad x \in (0,1), \\
u(0,y) &= 0, \quad u(1,y) = 0, \quad y \in (0,1),
\end{aligned} \tag{3.10} \]
with the respective change in the loss function (3.9). The solution to this problem is the same
as before.
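A minimal sketch of how border collocation points for (3.10) could be drawn is given below: points are sampled uniformly on each of the four edges of the unit square and the border term of the loss is evaluated on them. The function name, the number of points per edge and the seed are assumptions; the thesis code generates these sets through its own data-set class.

```python
import random

def sample_unit_square_border(n_per_edge, seed=0):
    """Draw n_per_edge random points on each edge of [0,1] x [0,1]."""
    rng = random.Random(seed)
    points = []
    for _ in range(n_per_edge):
        s = rng.random()
        points.append((s, 0.0))  # bottom edge, y = 0
        points.append((s, 1.0))  # top edge, y = 1
        points.append((0.0, s))  # left edge, x = 0
        points.append((1.0, s))  # right edge, x = 1
    return points

def exact_u(x, y):
    """Exact solution of (3.10)."""
    return (x * x - x) * (y * y - y)

border_pts = sample_unit_square_border(50)

# Border term of the loss: mean squared mismatch with the condition u = 0.
border_loss = sum(exact_u(x, y) ** 2 for x, y in border_pts) / len(border_pts)
print(len(border_pts), border_loss)
```

With the conditions imposed on the whole border, the exact solution still gives a zero border term, while the alternative solutions of the truncated problem are now penalized on the three additional edges.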
In the actual experiments for (3.10) we will take more interesting features to compare than
the simple activation functions of the previous instance. Here we will analyse the effects of
the size of the artificial neural network and of the regularization.
When choosing an artificial neural network architecture, the general rule is that deeper
neural networks are able to learn more complex functions, although at a greater training cost
[54]. Furthermore, papers such as [55], focused on learning polynomials with artificial neural
networks, suggest that a fully-connected network with a single hidden layer, with a number
of nodes equal to the degree of the polynomial, would be enough to learn a polynomial (this
is a rough and imprecise extraction of what [55] states, but holds for the most part). In
this work though, we have been using two hidden layers so far (and will keep using them),
and a much larger number of neurons than the theoretical minimum suggests for the underlying
solutions we want to approximate. The reason for doing this is to better account for the
information of the derivatives during training and to make use of regularization techniques, to
obtain better minima.
For this instance (3.10) we will be training 6 models, all using hyperbolic tangent activations
and trained with Adam with $\eta = 0.001$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, on 8000 epochs. The
models will either have a [1,10,10,1]-ANN structure or a [1,40,40,1]-ANN structure, and be
trained using no regularization, the custom regularization (2.58) with $\lambda = 1$, or a Tikhonov
regularization with $\lambda = 1$, which makes for a total of 6 combinations. Moreover, all the models
with the same architecture have been initialized with exactly the same parameters. This has
been done to rule out the possible effect of luck of starting at a slightly better point for the
optimization, and to ensure that the differences in training are caused by the regularization.
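Two ingredients of this set-up lend themselves to a short sketch: sharing the initial parameters between models of the same architecture, and adding a Tikhonov penalty to the loss. The code below assumes the common form $R = \lambda \sum w^2$ over all weights for the Tikhonov term and a seeded standard normal initialization; the layer sizes, seed and function names are hypothetical, and the custom regularization (2.58) is not reproduced here.

```python
import random

def init_layers(sizes, seed):
    """Seeded normal initialization: one weight matrix per pair of layers."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def tikhonov_penalty(layers, lam):
    """Assumed Tikhonov (L2) term: lam times the sum of squared weights."""
    return lam * sum(w * w for matrix in layers for row in matrix for w in row)

sizes = [2, 10, 10, 1]  # hypothetical layout for illustration

# Same seed, same starting point: differences in training can then be
# attributed to the regularization rather than to the initialization.
model_a = init_layers(sizes, seed=7)
model_b = init_layers(sizes, seed=7)

penalty_small = tikhonov_penalty([[[1.0, 2.0]], [[3.0]]], lam=0.1)
print(model_a == model_b, penalty_small)
```

The penalty is simply added to the objective, so a larger $\lambda$ pulls every weight towards zero regardless of how much that weight contributes to fitting the differential equation.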
In Table 3.2 we show the final results of the models, and in Figures 3.10 and 3.11 we
show the performance of the training. First, we observe that for this kind of problem Tikhonov
regularization does not work well and its training incurs early stopping. For the custom
regularization (2.58) we see that, in the smaller [1,10,10,1]-ANN model, the training is
actually hindered and yields awful results, but used with the larger [1,40,40,1]-ANN model it
outperforms any other set-up. This is due to what we have already explained in section 2.6:
regularization clamps down on the extra degrees of freedom that overfit the model. Hence,
for the smaller model, which is adequately parametrized, it becomes an extra condition drawing
resources from the model, while for the larger model it narrows the parameters down to those that fit the
model. Furthermore, not only does the larger model with regularization outperform the smaller
one without it, but if we compare their performances in Figures 3.10 and 3.11, we note that
by the end of the training the smaller model has saturated (stagnated), while the larger one is
still steadily decreasing (thus, it has more room for improvement). This shows that it is much
preferable to have a larger model with regularization than simply a well adjusted one.
Architecture - Regularization Technique                              $L$                     $L_{sol}$
[1,10,10,1]-ANN - No Regularization                                  $1.04 \cdot 10^{-4}$    $7.52 \cdot 10^{-6}$
[1,10,10,1]-ANN - (2.58) Regularization with $\lambda = 0.1$         $7.18 \cdot 10^{-3}$    $2.84 \cdot 10^{-4}$
[1,10,10,1]-ANN - Tikhonov Regularization with $\lambda = 0.1$       $6.33 \cdot 10^{-4}$    $4.19 \cdot 10^{-5}$
[1,40,40,1]-ANN - No Regularization                                  $6.36 \cdot 10^{-4}$    $1.95 \cdot 10^{-5}$
[1,40,40,1]-ANN - (2.58) Regularization with $\lambda = 0.1$         $2.93 \cdot 10^{-4}$    $2.32 \cdot 10^{-6}$
[1,40,40,1]-ANN - Tikhonov Regularization with $\lambda = 0.1$       $3.88 \cdot 10^{-3}$    $7.91 \cdot 10^{-5}$
Table 3.2: Results of 6 models with different architectures, trained for (3.10), using Adam
with $\eta = 0.01$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, on 8000 epochs and different regularization techniques.
Figure 3.10: Comparison of different regularization techniques in training performance for 3
models trained for a [1,10,10,1]-ANN scheme, using Adam with $\eta = 0.01$, $\beta_1 = 0.9$, $\beta_2 = 0.999$,
on 8000 epochs. (3.10)
Figure 3.11: Comparison of different regularization techniques in training performance for 3
models trained for a [1,40,40,1]-ANN scheme, using Adam with $\eta = 0.01$, $\beta_1 = 0.9$, $\beta_2 = 0.999$,
on 8000 epochs. (3.10)
Figure 3.12: Final results of the best performing trained model ([1,40,40,1]-ANN, trained
with the custom regularization (2.58)) for (3.7) against the exact solution.
3.4.3 Model 3: The 2D Laplacian Operator
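At this point we will complicate the differential operator a bit more by considering second
order derivatives. Thus, we will consider the following boundary value problem for the
Laplacian operator in 2 dimensions:
\[ \begin{aligned}
\Delta u(x,y) &= \frac{\partial^2 u(x,y)}{\partial x^2} + \frac{\partial^2 u(x,y)}{\partial y^2} = 2\cdot(y^2-y) + 2\cdot(x^2-x), \\
u(\Gamma) = g_1(\Gamma) &: \begin{cases}
u(x,0) = 0, \quad u(x,1) = 0, & x \in (0,1), \\
u(0,y) = 0, \quad u(1,y) = 0, & y \in (0,1),
\end{cases} \\
\frac{\partial u(\Gamma)}{\partial (x,y)} \cdot n(\Gamma) = g_2(\Gamma) &: \begin{cases}
\dfrac{\partial u(x,0)}{\partial y} = -(x^2-x), \quad \dfrac{\partial u(x,1)}{\partial y} = (x^2-x), & x \in (0,1), \\[2mm]
\dfrac{\partial u(0,y)}{\partial x} = -(y^2-y), \quad \dfrac{\partial u(1,y)}{\partial x} = (y^2-y), & y \in (0,1),
\end{cases}
\end{aligned} \tag{3.11} \]
which has exact solution $u(x,y) = (x^2-x)\cdot(y^2-y)$, as with the previous problem. Problem
(3.11) in its general form, for any dimension and external force, constitutes what is
called the Poisson equation, which is important throughout physics, as it is the interpretation
of Gauss's law in terms of potentials.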
Before training an artificial neural network to fit this model, we would like to make a brief
note regarding the coding of higher order derivatives in TensorFlow. Looking at the official
documentation of TensorFlow, the method given to obtain higher order derivatives in one
variable is by nesting auto-differentiation calls. However, note that TensorFlow is used in the
context of training artificial neural networks, where the gradient of a vector-valued target is
taken as the gradient of the sum of its components. Thus, when auto-differentiating twice we obtain:
\[ \begin{aligned}
\nabla_{(x)} f(x_1,\dots,x_n) &= \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right), \\
\nabla^2_{(x)} f(x_1,\dots,x_n) &= \left( \frac{\partial}{\partial x_1} \sum_{m=1}^{n} \frac{\partial f}{\partial x_m}, \dots, \frac{\partial}{\partial x_n} \sum_{m=1}^{n} \frac{\partial f}{\partial x_m} \right),
\end{aligned} \tag{3.12} \]
which is not the Laplacian. There are two ways to overcome this issue: either use the unstack
and stack functions to decouple the inputs and compute the gradients tracking only an
individual variable (the option we have used in the code); or use the hessian function
to compute the Hessian matrix and then compute its trace, which is highly inefficient as we
only require the elements on the diagonal. Without [56], where this observation is pointed out,
we would not have been able to carry out this simulation.
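The difference between (3.12) and the Laplacian can be checked by hand on the exact solution $u(x,y) = (x^2-x)(y^2-y)$. The sketch below uses the analytic Hessian of $u$ instead of TensorFlow calls, so it only illustrates the arithmetic: nesting the gradient yields the row sums of the Hessian, whereas the Laplacian is its trace. The function names and the sample point are assumptions.

```python
def hessian_exact(x, y):
    """Analytic Hessian of u(x, y) = (x^2 - x)(y^2 - y)."""
    u_xx = 2.0 * (y * y - y)
    u_yy = 2.0 * (x * x - x)
    u_xy = (2.0 * x - 1.0) * (2.0 * y - 1.0)
    return [[u_xx, u_xy],
            [u_xy, u_yy]]

def nested_gradient(hessian):
    """Components of (3.12): d/dx_i of the summed first derivatives (row sums)."""
    return [sum(row) for row in hessian]

def laplacian(hessian):
    """Trace of the Hessian: only the diagonal second derivatives."""
    return sum(hessian[i][i] for i in range(len(hessian)))

x, y = 0.25, 0.75
H = hessian_exact(x, y)
nested = nested_gradient(H)
lap = laplacian(H)
force = 2.0 * (y * y - y) + 2.0 * (x * x - x)  # right-hand side of (3.11)
print(nested, lap, force)
```

At this point the nested gradient gives $(-0.625, -0.625)$, contaminated by the mixed derivative, while the trace gives $-0.75$, which matches the external force of (3.11). Differentiating each input separately avoids the mixed terms altogether.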
At this point we have already experimented with all the principal options and hyper-parameter
choices covered in this work, and we have studied their performance. So, from now on, we
will drop the comparisons and limit ourselves to simply solving the next models with
the best possible set-up, based on what we have discussed.
The artificial neural network model trained for (3.11) has achieved a final global loss of
$L = 1.23 \cdot 10^{-3}$ and a final loss with respect to the solution of $L_{sol} = 4.25 \cdot 10^{-6}$. This model
consisted of a [1,40,40,1]-ANN with tanh activations, trained for 6000 epochs (when early stopping
was triggered), using Adam with $\eta = 0.001$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, and the custom regularization
(2.58) with $\lambda = 0.1$. The results can be seen in Figure 3.13.
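Since training here ended through early stopping, a simple patience-based criterion is sketched below for orientation. This is an assumed, generic formulation (stop once the loss has not improved for a given number of consecutive epochs); the exact criterion implemented in the thesis code is not described in this section, and the function name, `patience` and `min_delta` are hypothetical.

```python
def early_stop_epoch(losses, patience=3, min_delta=0.0):
    """Return the epoch at which training would stop, or None if it never does.

    Training stops once `patience` consecutive epochs fail to improve the best
    loss seen so far by more than `min_delta`.
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(losses):
        if loss < best - min_delta:
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return None

# Hypothetical loss histories.
stagnating = [1.0, 0.5, 0.4, 0.41, 0.42, 0.43, 0.1]
improving = [1.0, 0.5, 0.25, 0.125]

stop_a = early_stop_epoch(stagnating)
stop_b = early_stop_epoch(improving)
print(stop_a, stop_b)
```

Note that the stagnating history would have improved again at its last entry; a short patience can therefore cut off a run that is merely crossing a plateau, which is one reason to keep the full loss history for inspection.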
Figure 3.13: Results and performance of the model trained for (3.11).
3.4.4 Model 4: The 1D Advection Operator
For this simulation we step down from the 2D PDE cases to go back to an ODE. The
reason for this downgrade is to explain a certain issue occurring with this operator. This issue
also arises in the final case of this section, the 2D Burgers operator, and since the
advection operator we are proposing coincides with the Burgers operator in 1D, we see this
as a much simpler example with which to introduce the discussion.
The initial value problem we want to consider is:
\[ \begin{aligned}
u(x) \cdot \nabla_{(x)} \cdot u(x) &= u(x) \cdot \frac{\partial u(x)}{\partial x} = 2x^3 - 3x^2 + x, \\
u(0) &= 0,
\end{aligned} \tag{3.13} \]
which has exact solution $u(x) = x^2 - x$, the same as the 1D divergence case. This problem looks
as if it falls under the Cauchy-Kovalevskaya conditions, so existence and uniqueness should be
guaranteed. However, there is a subtlety hidden here. If we write the equation in its canonical
form (isolating the highest derivative), which is required to apply the Cauchy-Kovalevskaya
theorem,
\[ \frac{\partial u(x)}{\partial x} = \frac{2x^3 - 3x^2 + x}{u(x)}, \tag{3.14} \]
we note that the equation is quasi-linear and its terms are analytic everywhere except at the
zeroes of $u(x)$. Hence we have local existence and uniqueness almost everywhere, but since
it can fail at some points, we cannot build a unique global solution using the theorem. This
can be verified easily in this case, as the differential equation is separable and can be solved
by the method of separation of variables:
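\[ \begin{aligned}
\int_0^x u(x) \frac{\partial u(x)}{\partial x} \, dx &= \int_0^x \left( 2x^3 - 3x^2 + x \right) dx, \\
\left. \frac{1}{2} (u(x))^2 \right|_0^x &= \left. \left( \frac{1}{2} x^4 - x^3 + \frac{1}{2} x^2 \right) \right|_0^x, \\
\frac{1}{2} (u(x))^2 - 0 &= \frac{1}{2} x^4 - x^3 + \frac{1}{2} x^2 - 0, \\
u(x) &= \pm \sqrt{x^4 - 2x^3 + x^2} = \pm (x^2 - x).
\end{aligned} \tag{3.15} \]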
Looking at Figure 3.14 we observe that the solutions intersect (hence, they are not unique) at
the roots of $u(x)$.
Figure 3.14: Positive and negative sign solutions of (3.13).
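To fix this issue and pin down a single solution, it is enough to provide information about an extra
derivative, one order higher than those required by the Cauchy conditions. Therefore, the
well-posed problem that we will consider will be:
\[ \begin{aligned}
u(x) \cdot \nabla_{(x)} \cdot u(x) &= u(x) \cdot \frac{\partial u(x)}{\partial x} = 2x^3 - 3x^2 + x, \\
u(0) &= 0, \quad u'(0) = -1.
\end{aligned} \tag{3.16} \]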
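The role of the extra condition can be checked numerically. The short sketch below verifies that both branches $\pm(x^2-x)$ of (3.15) satisfy the differential equation and the condition $u(0)=0$, and that only one of them has $u'(0)=-1$. The grid and the function names are illustrative assumptions.

```python
def rhs(x):
    """External force of (3.13)."""
    return 2 * x ** 3 - 3 * x ** 2 + x

def residual(u, du, x):
    """Residual of the advection equation u * u' = rhs."""
    return u(x) * du(x) - rhs(x)

# The two branches obtained in (3.15), with their derivatives.
def u_plus(x):
    return x * x - x

def du_plus(x):
    return 2 * x - 1

def u_minus(x):
    return -(x * x - x)

def du_minus(x):
    return -(2 * x - 1)

grid = [i / 50.0 for i in range(51)]
max_res_plus = max(abs(residual(u_plus, du_plus, x)) for x in grid)
max_res_minus = max(abs(residual(u_minus, du_minus, x)) for x in grid)

print(max_res_plus, max_res_minus)      # both at rounding level
print(u_plus(0.0), u_minus(0.0))        # both satisfy u(0) = 0
print(du_plus(0.0), du_minus(0.0))      # only the first has u'(0) = -1
```

Both branches pass the checks of (3.13), so a loss built from (3.13) alone cannot distinguish them; the derivative condition in (3.16) is what selects $u(x) = x^2 - x$.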
The artificial neural network model trained for (3.16) has achieved a final global loss of
$L = 1.05 \cdot 10^{-3}$ and a final loss with respect to the solution of $L_{sol} = 2.74 \cdot 10^{-7}$. This model
consisted of a [1,20,20,1]-ANN with sigmoid activations, trained for 3000 epochs using Adam
with $\eta = 0.01$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, and the custom regularization (2.58) with $\lambda = 0.1$. The
results can be seen in Figure 3.15.
Figure 3.15: Results and performance of the model trained for (3.16).
3.4.5 Model 5: The 2D Clairaut Operator
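The Clairaut operator can be seen as an upgrade of the 2D Advection case. It may not
be much more complicated than what we have seen before, but it is the first PDE with
non-constant coefficients that we integrate in this work. We pose its boundary problem as:
\[ \begin{aligned}
(x,y) \cdot \nabla_{(x,y)} u(x,y) &= x \cdot \frac{\partial u(x,y)}{\partial x} + y \cdot \frac{\partial u(x,y)}{\partial y} \\
&= x \cdot (2x-1) \cdot (y^2-y) + (x^2-x) \cdot y \cdot (2y-1), \\
u(x,0) &= 0, \quad u(x,1) = 0, \quad x \in (0,1), \\
u(0,y) &= 0, \quad u(1,y) = 0, \quad y \in (0,1),
\end{aligned} \tag{3.17} \]
with solution $u(x,y) = (x^2-x)\cdot(y^2-y)$, as always.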
The artificial neural network model trained for (3.17) has achieved a final global loss of
$L = 3.04 \cdot 10^{-6}$ and a final loss with respect to the solution of $L_{sol} = 3.96 \cdot 10^{-6}$. This model
consisted of a [1,40,40,1]-ANN with tanh activations, trained for 8000 epochs, using Adam
with $\eta = 0.001$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, and the custom regularization (2.58) with $\lambda = 0.1$.
The results can be seen in Figure 3.16.
Figure 3.16: Results and performance of the model trained for (3.17).
3.4.6 Model 6: The 2D Burgers Operator
Finally, we will numerically integrate the last, and most complex, boundary problem of this
work. This is the 2D Burgers operator, which can be regarded as the multi-dimensional
case of the advection operator. While the advection operator is applied to scalar functions,
the Burgers operator is applied to vector fields.
Appendix B
The Code
As already introduced in section 3.1, the code has been implemented using the Python version of
TensorFlow 2.3. The code was implemented in a Google Colab notebook, hence each class was
encapsulated in a cell. Next, there is a brief simplified description of what each cell/class
contains:
– imports Cell: Imports the main libraries, which include TensorFlow for tensor
manipulation, Time to get the time stamp, Pickle to save the models, and MatPlotLib
to plot the models, among many others. It also suppresses warnings.
– auxiliryPlotting Class: Encapsulates the methods for plotting results. It contains
functions to: plot the distribution of the dataset collocation points, plot the output of
the model along with the exact solution, plot the loss vs epoch graph of the training of the
model, plot the errors of individual points in the training set, and plot the loss vs epoch
of multiple models in the same graph.
– myDataSets Class: Used to create instances of myDataSets. Each of these instances
mainly generates, for different options, and contains the collocation points for the
training and validation sets.
– problemInstance Class: Encapsulates the methods for the specifics of the diverse
instances of the initial/boundary problems. It contains functions that, given the training
or validation set, and the artificial neural network output and derivatives, return the
values of: the differential operator, the external force, the initial/boundary conditions
lhs and rhs, and the exact solution of the problem.
– secondOrderOptimizers Class: Used to create instances of secondOrderOptimizers
implementing the BFGS and L-BFGS optimizers. Since Keras only contains first order
optimizers, this custom class uses the implementation in the tensorflow_probability library,
which is generic, and adapts it to take artificial neural network models as input.
– myLayer Class: Subclasses the keras.Layer class and is used to create object instances
of myLayer. These objects contain the parameters and composition of the neurons in
an artificial neuron layer, and the feed method, which processes an input to obtain the
corresponding layer output.
– myModel Class: Subclasses the keras.Model class to create instances of myModel,
which implements the artificial neural network models. These objects are based on
collections of myLayer instances, and contain either a first order optimizer instance
from Keras or a second order optimizer instance from secondOrderOptimizers, which can
be accessed and changed at any moment. Through the methods in these objects, and
given a myDataSets instance, one can: obtain the model output, or train the model for a
problem set-up, which calls on problemInstance for its specifics. Historical information
about the loss performance during training is stored in the object. Also, there are
methods to save and load models in *.pickle files, for later use.
– execution Cell: These are the snippets of code that call on the previous classes
to perform the experiments. One of these calls usually consists of: a call to instantiate a
myDataSets and a myModel, with some options; a call to the fit method in myModel, to
train the artificial neural network; a call to one of the auxiliryPlotting methods to plot
the results; and optionally, saving the model. B.8 has an example showing in comments
all of the variations that can be used.
There are more functionalities implemented throughout these classes. For more details, read
the comments throughout the code. (The code font has been reduced to preserve indentation).
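As a rough idea of what saving and loading a model in a *.pickle file involves, the sketch below round-trips a plain dictionary through the standard pickle module. The dictionary layout, file name and helper names are hypothetical stand-ins; the actual methods live in the myModel class and store the model's own attributes.

```python
import os
import pickle
import tempfile

def save_model_state(state, path):
    """Serialize a model state (here a plain dictionary) to a pickle file."""
    with open(path, "wb") as handle:
        pickle.dump(state, handle)

def load_model_state(path):
    """Load a model state previously written by save_model_state."""
    with open(path, "rb") as handle:
        return pickle.load(handle)

# Hypothetical state: layout, activation, parameters and loss history.
state = {
    "layout": [1, 5, 5, 1],
    "activation": "tanh",
    "weights": [[0.1, -0.2], [0.3, 0.4]],
    "losses": [0.5, 0.1, 0.01],
}

path = os.path.join(tempfile.mkdtemp(), "model.pickle")
save_model_state(state, path)
restored = load_model_state(path)
print(restored == state)
```

Keeping the loss history inside the saved state is what allows the training curves to be re-plotted later without retraining.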
B.1 imports Cell
"""
@author: Alberto Garcia Molina
@latest_update: 12/10/2020
"""

import math
from math import log
import numpy as np

import time
import matplotlib.pyplot as plt
from pylab import rcParams
from mpl_toolkits.mplot3d import Axes3D

import pickle
from google.colab import files # Only for the colab environment.

import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow import keras
from tensorflow.keras import layers

import logging, os

# Suppress Warnings.

logging.disable(logging.WARNING)
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"

# Optional code to check if there is a GPU available.

#%tensorflow_version 2.x
#device_name = tf.test.gpu_device_name()
#if device_name != '/device:GPU:0':
#    raise SystemError('GPU device not found')
#print('Found GPU at: {}'.format(device_name))
B.2 auxiliryPlotting Class
"""
@author: Alberto Garcia Molina
@latest_update: 12/10/2020
"""

class auxiliryPlotting:

    ####################
    # Plots the generated sets (Only 2D).
    ####################
    def plot_datasets(data_set):

        %matplotlib inline
        training_set, border_training_set, validation_set = data_set.get_sets()
        _, _, _, input_dim, output_dim = data_set.get_set_dimensions()

        if (input_dim == 2):
            plt.scatter(training_set[:,0], training_set[:,1], s=0.1)
            plt.title('Training Set')
            plt.show()

            plt.scatter(border_training_set[0][:,0], border_training_set[0][:,1], s=0.1)
            plt.title('Border Training Set')
            plt.show()

            plt.scatter(validation_set[:,0], validation_set[:,1], s=0.1)
            plt.title('Validation Set')
            plt.show()
        else:
            print('Invalid dimensions to plot.')

    ####################
    # Plot loss (Only for training set).
    ####################
    def plot_loss_function(model,
                           init_range = 0,
                           end_range = -1,
                           subdivide_losses = False,
                           use_log_scale = False):

        %matplotlib inline

        # Sets the x range of the plot.
        plot_real_sol_loss = True
        if (end_range < 0):
            end_range = len(model._losses)

        # Plots the real loss.
        min_loss = min(model._losses[init_range:end_range])
        max_loss = max(model._losses[init_range:end_range])

        if (use_log_scale == True):
            min_loss = min(loss for loss in model._losses[init_range:end_range] if loss > 0)
            plt.plot(range(init_range, end_range),
                     [log(y,10) if y != 0 else None
                      for y in model._losses[init_range:end_range]],
                     label='Loss')
            plt.ylim(log(min_loss,10), log(max_loss,10))
            plt.title('Loss (log) - Epoch')
        else:
            plt.plot(range(init_range, end_range),
                     model._losses[init_range:end_range],
                     label='Loss')
            plt.ylim(min_loss, max_loss)
            plt.title('Loss - Epoch')

        plt.xlim(init_range, end_range)
        plt.legend()
        plt.show()
        print('Minimum Loss at:', str(min_loss))

        # Plots the loss wrt the real solution.
        min_loss_wrt_solution = min(model._losses_solution[init_range:end_range])
        max_loss_wrt_solution = max(model._losses_solution[init_range:end_range])

        if (use_log_scale == True):
            min_loss_wrt_solution = min(loss for loss in model._losses_solution[init_range:end_range] if loss > 0)
            plt.plot(range(init_range, end_range),
                     [log(y,10) if y != 0 else None
                      for y in model._losses_solution[init_range:end_range]],
                     label='Loss wrt Solution')
            plt.ylim(log(min_loss_wrt_solution,10), log(max_loss_wrt_solution,10))
            plt.title('Loss wrt Exact Sol (log) - Epoch')
        else:
            plt.plot(range(init_range, end_range),
                     model._losses_solution[init_range:end_range],
                     label='Loss')
            plt.ylim(min_loss_wrt_solution, max_loss_wrt_solution)
            plt.title('Loss wrt Exact Sol - Epoch')
        plt.xlim(init_range, end_range)
        plt.legend()
        plt.show()
        print('Minimum Loss wrt Solution at:', str(min_loss_wrt_solution))

        # Plots the subdivision of the loss by its components.
        if (subdivide_losses == True):

            # Domain Component
            min_domain_loss = min(model._losses_domain[init_range:end_range])
            max_domain_loss = max(model._losses_domain[init_range:end_range])

            if (use_log_scale == True):
                min_domain_loss = min(loss for loss in model._losses_domain[init_range:end_range] if loss > 0)
                plt.plot(range(init_range, end_range),
                         [log(y,10) if y != 0 else None
                          for y in model._losses_domain[init_range:end_range]],
                         label='Loss')
                plt.ylim(log(min_domain_loss,10), log(max_domain_loss,10))
                plt.title('Domain Loss (log) - Epoch')
            else:
                plt.plot(range(init_range, end_range),
                         model._losses_domain[init_range:end_range],
                         label='Loss')
                plt.ylim(min_domain_loss, max_domain_loss)
                plt.title('Domain Loss - Epoch')

            plt.xlim(init_range, end_range)
            plt.legend()
            plt.show()
            print('Minimum Domain Loss at:', str(min_domain_loss))

            # Border Component
            min_border_loss = min(model._losses_border[init_range:end_range])
            max_border_loss = max(model._losses_border[init_range:end_range])

            if (use_log_scale == True):
                min_border_loss = min(loss for loss in model._losses_border[init_range:end_range] if loss > 0)
                plt.plot(range(init_range, end_range),
                         [log(y,10) if y != 0 else None
                          for y in model._losses_border[init_range:end_range]],
                         label='Loss')
                plt.ylim(log(min_border_loss,10), log(max_border_loss,10))
                plt.title('Border Loss (log) - Epoch')
            else:
                plt.plot(range(init_range, end_range),
                         model._losses_border[init_range:end_range],
                         label='Loss')
                plt.ylim(min_border_loss, max_border_loss)
                plt.title('Border Loss - Epoch')

            plt.xlim(init_range, end_range)
            plt.legend()
            plt.show()
            print('Minimum Border Loss at:', str(min_border_loss))

            # Regularization Component
            if (model._regularization != None):
                min_reg_loss = min(model._losses_regularization[init_range:end_range])
                max_reg_loss = max(model._losses_regularization[init_range:end_range])

                if (use_log_scale == True):
                    min_reg_loss = min(loss for loss in model._losses_regularization[init_range:end_range] if loss > 0)
                    plt.plot(range(init_range, end_range),
                             [log(y,10) if y != 0 else None
                              for y in model._losses_regularization[init_range:end_range]],
                             label='Loss')
                    plt.ylim(log(min_reg_loss,10), log(max_reg_loss,10))
                    plt.title('Regularization Loss (log) - Epoch')
                else:
                    plt.plot(range(init_range, end_range),
                             model._losses_regularization[init_range:end_range],
                             label='Loss')
                    plt.ylim(min_reg_loss, max_reg_loss)
                    plt.title('Regularization Loss - Epoch')
                plt.xlim(init_range, end_range)
                plt.legend()
                plt.show()
                # Report the regularization minimum (the listing printed the border one here).
                print('Minimum Regularization Loss at:', str(min_reg_loss))

    ####################
    # Plots the model (Uses the validation set).
    ####################
    def plot_model(data_set,
                   model,
                   plot_real_sol = False):

        if (model._input_dim == 1):
            if (model._output_dim == 1):
                outputs, _ = model.predict(data_set._validation_set)
                exact_sol = problemInstance.exact_solution(inputs = data_set._validation_set,
                                                           exact_solution = model._exact_solution,
                                                           input_dim = 1,
                                                           output_dim = 1)
                plt.scatter(data_set._validation_set, outputs, s=0.1, label='Model')
                plt.scatter(data_set._validation_set, exact_sol, s=0.1, label='Exact Solution')
                plt.xlabel('x')
                plt.ylabel(r'$\hat{u}(x)$')
                plt.legend()
                plt.title('Model')
            else:
                print('Invalid dimensions to plot.')

        elif (model._input_dim == 2):
            x = data_set._validation_set[:,0]
            y = data_set._validation_set[:,1]

            if (model._output_dim == 1):
                outputs, _ = model.predict(data_set._validation_set)
                exact_sol = problemInstance.exact_solution(inputs = data_set._validation_set,
                                                           exact_solution = model._exact_solution,
                                                           input_dim = 2,
                                                           output_dim = 1)
                # Adjust to the actual limits.
                plt.rcParams['figure.figsize'] = [8,8]
                fig = plt.figure()
                ax = plt.axes(projection='3d')
                ax.set_title('Model')
                ax.set_xlim(tf.math.reduce_min(x), tf.math.reduce_max(x))
                ax.set_ylim(tf.math.reduce_min(y), tf.math.reduce_max(y))
                ax.set_xlabel('x')
                ax.set_ylabel('y')
                ax.set_zlabel('u(x,y)')
                ax.scatter3D(x, y, outputs, cmap='Greens', s=1, label='Model')
                fig = plt.figure()

                ax.scatter3D(x, y, exact_sol, cmap='Greens', s=1, label='Exact Solution')
                fig = plt.figure()
                ax.set_title('Model vs Exact Solution')
                ax.legend()
                # Optional
                ax.view_init(20, 230)

            elif (model._output_dim == 2):
                outputs, _ = model.predict(data_set._validation_set)
                exact_sol = problemInstance.exact_solution(inputs = data_set._validation_set,
                                                           exact_solution = model._exact_solution,
                                                           input_dim = 2,
                                                           output_dim = 2)
                # Adjust to the actual limits.
                plt.rcParams['figure.figsize'] = [7,7]
                fig = plt.figure()
                ax = plt.axes(projection='3d')
                ax.set_title('Model')
                ax.set_xlim(tf.math.reduce_min(x), tf.math.reduce_max(x))
                ax.set_ylim(tf.math.reduce_min(y), tf.math.reduce_max(y))
                ax.set_xlabel('x')
                ax.set_ylabel('y')
                ax.set_zlabel('u_x(x,y)')
                ax.scatter3D(x, y, outputs[:,0], cmap='Greens', s=0.2)
                fig = plt.figure()

                fig = plt.figure()
                ax = plt.axes(projection='3d')
                ax.set_title('Exact Solution')
                ax.set_xlim(tf.math.reduce_min(x), tf.math.reduce_max(x))
                ax.set_ylim(tf.math.reduce_min(y), tf.math.reduce_max(y))
                ax.set_xlabel('x')
                ax.set_ylabel('y')
                ax.set_zlabel('u_x(x,y)')
                ax.scatter3D(x, y, exact_sol[:,0], cmap='Greens', s=0.2)
                fig = plt.figure()
                plt.legend()

                # Adjust to the actual limits.
                plt.rcParams['figure.figsize'] = [7,7]
                fig = plt.figure()
                ax = plt.axes(projection='3d')
                ax.set_title('Model')
                ax.set_xlim(tf.math.reduce_min(x), tf.math.reduce_max(x))
                ax.set_ylim(tf.math.reduce_min(y), tf.math.reduce_max(y))
                ax.set_xlabel('x')
                ax.set_ylabel('y')
                ax.set_zlabel('u_y(x,y)')
                ax.scatter3D(x, y, outputs[:,1], cmap='Greens', s=0.2)
                fig = plt.figure()

                fig = plt.figure()
                ax = plt.axes(projection='3d')
                ax.set_title('Exact Solution')
                ax.set_xlim(tf.math.reduce_min(x), tf.math.reduce_max(x))
                ax.set_ylim(tf.math.reduce_min(y), tf.math.reduce_max(y))
                ax.set_xlabel('x')
                ax.set_ylabel('y')
                ax.set_zlabel('u_y(x,y)')
                ax.scatter3D(x, y, exact_sol[:,1], cmap='Greens', s=0.2)
                fig = plt.figure()
                plt.legend()

        else:
            print('Invalid dimensions to plot.')

    ####################
    # Plots the squared error (Prototype).
    ####################
    def plot_error(data_set,
                   model):

        if (model._input_dim == 1):
            if (model._output_dim == 1):
                outputs, _ = model.predict(data_set._validation_set)
                exact_sol = problemInstance.exact_solution(inputs = data_set._validation_set,
                                                           exact_solution = model._exact_solution,
                                                           input_dim = 1,
                                                           output_dim = 1)
                plt.scatter(data_set._validation_set, tf.square(outputs - exact_sol), s=0.1, label='Square Error')
                plt.legend()
                plt.title('Model')
            else:
                print('Invalid dimensions to plot.')

        elif (model._input_dim == 2):
            x = data_set._validation_set[:,0]
            y = data_set._validation_set[:,1]

            if (model._output_dim == 1):
                # Loss w.r.t. the operator and force
                domain_ind_loss = tf.reduce_sum(
                    tf.square(
                        problemInstance.differential_operator(
                            inputs = data_set._training_set,
                            outputs = model.predict(data_set._training_set,
                                                    model._required_derivative_order)[0],
                            outputs_derivatives = model.predict(data_set._training_set,
                                                                model._required_derivative_order)[1],
                            differential_operator = model._differential_operator,
                            input_dim = 2,
                            output_dim = 1)
                        - problemInstance.external_force(inputs = data_set._training_set,
                                                         external_force = model._external_force,
                                                         input_dim = 2,
                                                         output_dim = 1)),
                    axis = 1,
                    keepdims = True)

                #border_ind_loss = tf.reduce_sum(
                #    tf.square(data_set._border_training_set[1]
                #              - model.predict(data_set._border_training_set[0])[0],
                #              model._required_derivative_order),
                #    axis = 1,
                #    keepdims = True)

                # Adjust to the actual limits.
                plt.rcParams['figure.figsize'] = [7,7]
                fig = plt.figure()
                ax = plt.axes(projection='3d')
                ax.set_xlim(0, 1)
                ax.set_ylim(0, 1)
                ax.set_xlabel('x')
                ax.set_ylabel('y')
                ax.set_zlabel(r'$L_{1}(x;w,b)$')
                ax.scatter3D(#tf.concat([data_set._training_set[:,0], data_set._border_training_set[0][:,0]], axis=0),
                             #tf.concat([data_set._training_set[:,1], data_set._border_training_set[0][:,1]], axis=0),
                             #tf.square(tf.concat([domain_ind_loss, border_ind_loss], axis=0)),
                             tf.concat([data_set._training_set[:,0]], axis=0),
                             tf.concat([data_set._training_set[:,1]], axis=0),
                             tf.square(tf.concat([domain_ind_loss], axis=0)),
                             cmap='Greens',
                             s=0.2)
                fig = plt.figure()
                ax.set_title(r'MSE of the individual domain points: $||\mathcal{L}[\hat{u}(x,y)]-f(x,y)||^{2}_{2}$')
                # Optional
                ax.view_init(30, 40)

                # Loss w.r.t. the real sol.
                #input_set = tf.concat([data_set._training_set[:], data_set._border_training_set[0][:]], axis=0)
                input_set = tf.concat([data_set._training_set[:]], axis=0)
                outputs, _ = model.predict(input_set)
                exact_sol = problemInstance.exact_solution(inputs = input_set,
                                                           exact_solution = model._exact_solution,
                                                           input_dim = 2,
                                                           output_dim = 1)
                # Adjust to the actual limits.
                plt.rcParams['figure.figsize'] = [7,7]
                fig = plt.figure()
                ax = plt.axes(projection='3d')
                ax.set_title('Model')
                ax.set_xlim(0, 1)
                ax.set_ylim(0, 1)
                ax.set_xlabel('x')
                ax.set_ylabel('y')
                ax.set_zlabel('Square Error')
                ax.scatter3D(input_set[:,0],
                             input_set[:,1],
                             tf.square(outputs - exact_sol),
                             cmap='Greens',
                             s=0.2)
                fig = plt.figure()
                ax.set_title('Square Error of the Real Sol')

            if (model._output_dim == 2):
                # Loss w.r.t. the operator and force
                domain_ind_loss = tf.reduce_sum(
                    tf.square(
                        problemInstance.differential_operator(
                            inputs = data_set._training_set,
                            outputs = model.predict(data_set._training_set,
                                                    model._required_derivative_order)[0],
                            outputs_derivatives = model.predict(data_set._training_set,
                                                                model._required_derivative_order)[1],
                            differential_operator = model._differential_operator,
                            input_dim = 2,
                            output_dim = 2)
                        - problemInstance.external_force(inputs = data_set._training_set,
                                                         external_force = model._external_force,
                                                         input_dim = 2,
                                                         output_dim = 2)),
                    axis = 1,
                    keepdims = True)

                # The derivative order is an argument of predict, not of tf.square
                # (the original listing had the closing parenthesis misplaced).
                border_ind_loss = tf.reduce_sum(
                    tf.square(data_set._border_training_set[1]
                              - model.predict(data_set._border_training_set[0],
                                              model._required_derivative_order)[0]),
                    axis = 1,
                    keepdims = True)

                # Adjust to the actual limits.
                plt.rcParams['figure.figsize'] = [7,7]
                fig = plt.figure()
                ax = plt.axes(projection='3d')
                ax.set_title('Model')
                ax.set_xlim(0, 1)
                ax.set_ylim(0, 1)
                ax.set_xlabel('x')
                ax.set_ylabel('y')
                ax.set_zlabel('Square Error')
                ax.scatter3D(tf.concat([data_set._training_set[:,0], data_set._border_training_set[0][:,0]], axis=0),
                             tf.concat([data_set._training_set[:,1], data_set._border_training_set[0][:,1]], axis=0),
                             tf.square(tf.concat([domain_ind_loss, border_ind_loss], axis=0)),
                             cmap='Greens',
                             s=0.2)
                fig = plt.figure()
                ax.set_title('Square Error of the Loss Formula')

        else:
            print('Invalid dimensions to plot.')

    def plot_loss_comparison(models,
                             names,
                             title,
                             init_range = 0,
                             end_range = -1,
                             subdivide_losses = False,
                             use_log_scale = False):

        %matplotlib inline
        # rcParams['figure.figsize'] = 15, 5
        rcParams['figure.figsize'] = 20, 4

        # Sets the x range of the plot.
        if (end_range < 0):
            end_range = 0
            for model in models:
                end_range_var = len(model._losses)
                if (end_range < end_range_var):
                    end_range = end_range_var

        # Plots the global loss.
        min_loss = 1e30
        max_loss = 0
        for model in models:
            min_loss_var = min(loss for loss in model._losses[init_range:end_range] if loss > 0)
            max_loss_var = max(model._losses[init_range:end_range])
            if (min_loss_var < min_loss):
                min_loss = min_loss_var
            if (max_loss_var > max_loss):
                max_loss = max_loss_var

        if (use_log_scale == True):
            for model_ind in range(len(models)):
                plt.plot(range(init_range, len(models[model_ind]._losses)),
                         [log(y, 10) if y != 0 else None
                          for y in models[model_ind]._losses[init_range:len(models[model_ind]._losses)]],
                         label = names[model_ind])
            plt.ylim(log(min_loss, 10), log(max_loss, 10))
            plt.title(r'Global Loss Logarithm, $log_{10}(L(w,b))$ vs Epoch - ' + title)
        else:
            for model_ind in range(len(models)):
                plt.plot(range(init_range, len(models[model_ind]._losses)),
                         models[model_ind]._losses[init_range:len(models[model_ind]._losses)],
                         label = names[model_ind])
            plt.ylim(min_loss, max_loss)
            plt.title(r'Global Loss, L(w,b) vs Epoch - ' + title)

        plt.xlim(init_range, end_range)
        plt.xlabel('Iterations')
        plt.ylabel('Loss')
        plt.legend()
        plt.show()

        # Plots the loss w.r.t. the real solution.
        min_loss = 1e30
        max_loss = 0
        for model in models:
            min_loss_var = min(loss for loss in model._losses_solution[init_range:end_range] if loss > 0)
            max_loss_var = max(model._losses_solution[init_range:end_range])
            if (min_loss_var < min_loss):
                min_loss = min_loss_var
            if (max_loss_var > max_loss):
                max_loss = max_loss_var

        if (use_log_scale == True):
            for model_ind in range(len(models)):
                plt.plot(range(init_range, len(models[model_ind]._losses_solution)),
                         [log(y, 10) if y != 0 else None
                          for y in models[model_ind]._losses_solution[init_range:len(models[model_ind]._losses_solution)]],
                         label = names[model_ind])
            plt.ylim(log(min_loss, 10), log(max_loss, 10))
            plt.title(r'Exact Solution Loss Logarithm, $log_{10}(L_{sol}(w,b))$ vs Epoch - ' + title)
        else:
            for model_ind in range(len(models)):
                plt.plot(range(init_range, len(models[model_ind]._losses_solution)),
                         models[model_ind]._losses_solution[init_range:len(models[model_ind]._losses_solution)],
                         label = names[model_ind])
            plt.ylim(min_loss, max_loss)
            plt.title(r'Exact Solution Loss, $L_{sol}(w,b)$ vs Epoch - ' + title)

        plt.xlim(init_range, end_range)
        plt.xlabel('Iterations')
        plt.ylabel('Loss')
        plt.legend()
        plt.show()

        # Plots the border loss.
        min_loss = 1e30
        max_loss = 0
        for model in models:
            min_loss_var = min(loss for loss in model._losses_border[init_range:end_range] if loss > 0)
            max_loss_var = max(model._losses_border[init_range:end_range])
            if (min_loss_var < min_loss):
                min_loss = min_loss_var
            if (max_loss_var > max_loss):
                max_loss = max_loss_var

        if (use_log_scale == True):
            for model_ind in range(len(models)):
                plt.plot(range(init_range, len(models[model_ind]._losses_border)),
                         [log(y, 10) if y != 0 else None
                          for y in models[model_ind]._losses_border[init_range:len(models[model_ind]._losses_border)]],
                         label = names[model_ind])
            plt.ylim(log(min_loss, 10), log(max_loss, 10))
            plt.title(r'Initial Condition Loss Logarithm, $log_{10}(L_{2}(w,b))$ vs Epoch - ' + title)
        else:
            for model_ind in range(len(models)):
                plt.plot(range(init_range, len(models[model_ind]._losses_border)),
                         models[model_ind]._losses_border[init_range:len(models[model_ind]._losses_border)],
                         label = names[model_ind])
            plt.ylim(min_loss, max_loss)
            plt.title(r'Initial Condition Loss, $L_{2}(w,b)$ vs Epoch - ' + title)

        plt.xlim(init_range, end_range)
        plt.xlabel('Iterations')
        plt.ylabel('Loss')
        plt.legend()
        plt.show()

        # Plots the domain loss.
        min_loss = 1e30
        max_loss = 0
        for model in models:
            min_loss_var = min(loss for loss in model._losses_domain[init_range:end_range] if loss > 0)
            max_loss_var = max(model._losses_domain[init_range:end_range])
            if (min_loss_var < min_loss):
                min_loss = min_loss_var
            if (max_loss_var > max_loss):
                max_loss = max_loss_var

        if (use_log_scale == True):
            for model_ind in range(len(models)):
                plt.plot(range(init_range, len(models[model_ind]._losses_domain)),
                         [log(y, 10) if y != 0 else None
                          for y in models[model_ind]._losses_domain[init_range:len(models[model_ind]._losses_domain)]],
                         label = names[model_ind])
            plt.ylim(log(min_loss, 10), log(max_loss, 10))
            plt.title(r'Domain Loss Logarithm, $log_{10}(L_{1}(w,b))$ vs Epoch - ' + title)
        else:
            for model_ind in range(len(models)):
                plt.plot(range(init_range, len(models[model_ind]._losses_domain)),
                         models[model_ind]._losses_domain[init_range:len(models[model_ind]._losses_domain)],
                         label = names[model_ind])
            plt.ylim(min_loss, max_loss)
            plt.title(r'Domain Loss, $L_{1}(w,b)$ vs Epoch - ' + title)

        plt.xlim(init_range, end_range)
        plt.xlabel('Iterations')
        plt.ylabel('Loss')
        plt.legend()
        plt.show()
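
The plotting routines above all prepare a loss history for a base-10 logarithmic axis in the same way: entries that cannot be shown are replaced by None, and the vertical limits come from the smallest positive loss and the largest loss. The following is a minimal, library-free sketch of that preparation. The helper names log10_series and log10_limits are hypothetical and do not appear in the listings; the sketch also masks every non-positive entry (y > 0) and takes the upper limit over positive entries only, whereas the listings test y != 0 and take the maximum over all entries.

```python
import math

def log10_series(losses):
    """Return log10 of each loss, or None where the loss is not positive."""
    return [math.log10(y) if y > 0 else None for y in losses]

def log10_limits(losses):
    """Return (lower, upper) axis limits in log10 units from the positive losses."""
    positive = [y for y in losses if y > 0]
    if not positive:
        raise ValueError("no positive losses to show on a logarithmic scale")
    return math.log10(min(positive)), math.log10(max(positive))

# Illustrative loss history with one exact zero (made-up values).
history = [10.0, 1.0, 0.0, 0.01]
series = log10_series(history)
limits = log10_limits(history)
```

Matplotlib skips None entries when drawing a line, so a zero loss appears as a gap in the curve instead of raising a math domain error.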
B.3 myDataSets Class
"""
@author: Alberto Garcia Molina
@latest_update: 12/10/2020
"""

class myDataSets:

    # Initialize myDataSets object.
    def __init__(self,
                 training_batch_size = 2000,
                 border_training_batch_size = 20,
                 validation_batch_size = 1000,
                 input_dim = 1,
                 method = 'uniform-hit-collocation',
                 domain = 'hypercube-0-1',
                 border = 'side-x_1-y_0',
                 seed = None):

        self._training_batch_size = training_batch_size
        self._border_training_batch_size = border_training_batch_size
        self._validation_batch_size = validation_batch_size
        self._input_dim = input_dim

        self.method = method
        self.domain = domain
        self.border = border

        # Derive a different seed for each of the three sets.
        seed_1 = None
        seed_2 = None
        seed_3 = None
        if (seed != None):
            seed_1 = seed
            seed_2 = 2*seed
            seed_3 = 3*seed

        self._training_set = self.generate_domain_set(training_batch_size, input_dim,
                                                      method, domain, seed_1)

        self._border_training_set = self.generate_border_set(border_training_batch_size,
                                                             input_dim, method, border, seed_2)

        self._validation_set = self.generate_domain_set(validation_batch_size, input_dim,
                                                        method, domain, seed_3)

    # Generates a distribution of points inside the solving domain.
    def generate_domain_set(self,
                            batch_size = 2000,
                            input_dim = 1,
                            method = 'uniform-hit-collocation',
                            domain = 'hypercube-0-1',
                            seed = None):

        if (seed != None):
            tf.random.set_seed(seed)

        if (method == 'uniform-hit-collocation'):
            if (domain == 'hypercube-0-1'):
                domain_set = tf.random.uniform(shape=[batch_size, input_dim],
                                               minval=0., maxval=1., dtype=tf.float32)
            elif (domain == 'quarter-hypercube-0-1'):
                domain_set = tf.random.uniform(shape=[batch_size, input_dim],
                                               minval=0., maxval=0.5, dtype=tf.float32)
            elif (domain == 'hypercube-0-2'):
                domain_set = tf.random.uniform(shape=[batch_size, input_dim],
                                               minval=0., maxval=2., dtype=tf.float32)

        return domain_set

    # Generates a distribution of points on the border of the solving domain.
    def generate_border_set(self,
                            batch_size = 2,
                            input_dim = 1,
                            method = 'uniform-hit-collocation',
                            border = 'hypercube-0-1',
                            seed = None):

        if (seed != None):
            tf.random.set_seed(seed)

        if (method == 'uniform-hit-collocation'):
            if (border == 'hypercube-0-1'):
                if (input_dim == 1):
                    x1 = tf.constant(0., shape=[1, input_dim], dtype=tf.float32)
                    x2 = tf.constant(1., shape=[1, input_dim], dtype=tf.float32)
                    border_set = tf.concat([x1, x2], axis=0)

                elif (input_dim == 2):
                    x1 = tf.random.uniform(shape=[batch_size//4],
                                           minval=0.,
                                           maxval=1.,
                                           dtype=tf.float32)
                    y1 = tf.constant(0.,
                                     shape=[batch_size//4],
                                     dtype=tf.float32)
                    border_set_1 = tf.stack([x1, y1], axis=1) # y=0

                    x2 = tf.random.uniform(shape=[batch_size//4],
                                           minval=0.,
                                           maxval=1.,
                                           dtype=tf.float32)
                    y2 = tf.constant(1.,
                                     shape=[batch_size//4],
                                     dtype=tf.float32)
                    border_set_2 = tf.stack([x2, y2], axis=1) # y=1

                    x3 = tf.constant(0.,
                                     shape=[batch_size//4],
                                     dtype=tf.float32)
                    y3 = tf.random.uniform(shape=[batch_size//4],
                                           minval=0.,
                                           maxval=1.,
                                           dtype=tf.float32)
                    border_set_3 = tf.stack([x3, y3], axis=1) # x=0

                    x4 = tf.constant(1.,
                                     shape=[batch_size//4],
                                     dtype=tf.float32)
                    y4 = tf.random.uniform(shape=[batch_size//4],
                                           minval=0.,
                                           maxval=1.,
                                           dtype=tf.float32)
                    border_set_4 = tf.stack([x4, y4], axis=1) # x=1

                    border_set = tf.concat([border_set_1, border_set_2, border_set_3, border_set_4],
                                           axis=0)

            elif (border == 'side-x_1-y_0'):
                if (input_dim == 1):
                    border_set = tf.constant(0., shape=[1, input_dim], dtype=tf.float32)
                elif (input_dim == 2):
                    x1 = tf.random.uniform(shape=[batch_size],
                                           minval=-1.,
                                           maxval=2.,
                                           dtype=tf.float32)
                    y1 = tf.constant(0.,
                                     shape=[batch_size],
                                     dtype=tf.float32)
                    border_set = tf.stack([x1, y1], axis=1)

            elif (border == 'side-x_1-y_0_expanded'):
                if (input_dim == 1):
                    border_set = tf.constant(0., shape=[1, input_dim], dtype=tf.float32)
                elif (input_dim == 2):
                    x1 = tf.random.uniform(shape=[batch_size],
                                           minval=-1.,
                                           maxval=2.,
                                           dtype=tf.float32)
                    y1 = tf.constant(0.,
                                     shape=[batch_size],
                                     dtype=tf.float32)
                    border_set = tf.stack([x1, y1], axis=1)

            elif (border == 'two_sides-x_0-y_0'):
                if (input_dim == 1):
                    border_set = tf.constant(0., shape=[1, input_dim], dtype=tf.float32)
                elif (input_dim == 2):
                    x1 = tf.random.uniform(shape=[batch_size//2],
                                           minval=0.,
                                           maxval=1.,
                                           dtype=tf.float32)
                    y1 = tf.constant(0.,
                                     shape=[batch_size//2],
                                     dtype=tf.float32)

                    x2 = tf.constant(0.,
                                     shape=[batch_size//2],
                                     dtype=tf.float32)
                    y2 = tf.random.uniform(shape=[batch_size//2],
                                           minval=0.,
                                           maxval=1.,
                                           dtype=tf.float32)

                    border_set_1 = tf.stack([x1, y1], axis=1) # y=0
                    border_set_2 = tf.stack([x2, y2], axis=1) # x=0
                    border_set = tf.concat([border_set_1, border_set_2],
                                           axis=0)

        return border_set

    # Returns the sets stored in this object.
    def get_sets(self):
        return self._training_set, self._border_training_set, self._validation_set

    # Returns the metadata of the sets stored in this object.
    def get_set_metadata(self):
        # Parentheses keep the two-line tuple a single return expression.
        return (self._training_batch_size, self._border_training_batch_size, self._validation_batch_size,
                self._input_dim, self.method, self.domain, self.border)

    # Drops the values which have negative loss.
    def drop_negative_loss(data_set,
                           model):

        # TBD
        domain_ind_diff = tf.reduce_sum(
            problemInstance.differential_operator(
                inputs = data_set._training_set,
                outputs = model.predict(data_set._training_set,
                                        model._required_derivative_order)[0],
                outputs_derivatives = model.predict(data_set._training_set,
                                                    model._required_derivative_order)[1],
                differential_operator = model._differential_operator,
                input_dim = model._input_dim,
                output_dim = model._output_dim)
            - problemInstance.external_force(inputs = data_set._training_set,
                                             external_force = model._external_force,
                                             input_dim = model._input_dim,
                                             output_dim = model._output_dim),
            axis = 1,
            keepdims = False)

        # TBD
        border_ind_diff = tf.reduce_sum(
            data_set._border_training_set[1]
            - model.predict(data_set._border_training_set[0], 0)[0],
            axis = 1,
            keepdims = False)

        # Mask
        filtered_training_set = tf.boolean_mask(tensor = data_set._training_set,
                                                mask = domain_ind_diff > 0,
                                                axis = 0)
        filtered_border_training_set_0 = tf.boolean_mask(tensor = data_set._border_training_set[0],
                                                         mask = border_ind_diff > 0,
                                                         axis = 0)
        filtered_border_training_set_1 = tf.boolean_mask(tensor = data_set._border_training_set[1],
                                                         mask = border_ind_diff > 0,
                                                         axis = 0)

        # Replace the Dataset
        if (filtered_training_set.shape[0] != 0):
            data_set._training_set = filtered_training_set
            data_set._training_batch_size = filtered_training_set.shape[0]
        if (filtered_border_training_set_0.shape[0] != 0):
            data_set._border_training_set[0] = filtered_border_training_set_0
            data_set._border_training_set[1] = filtered_border_training_set_1
            data_set._border_training_batch_size = filtered_border_training_set_0.shape[0]

    # Drops the values which have negative loss.
    def drop_best_loss(data_set,
                       model):

        # TBD
        domain_ind_diff = tf.reduce_sum(tf.square(
            problemInstance.differential_operator(
                inputs = data_set._training_set,
                outputs = model.predict(data_set._training_set,
                                        model._required_derivative_order)[0],
                outputs_derivatives = model.predict(data_set._training_set,
                                                    model._required_derivative_order)[1],
                differential_operator = model._differential_operator,
                input_dim = model._input_dim,
                output_dim = model._output_dim)
            - problemInstance.external_force(inputs = data_set._training_set,
                                             external_force = model._external_force,
                                             input_dim = model._input_dim,
                                             output_dim = model._output_dim)),
            axis = 1,
            keepdims = False)
              output_dim = 2,
              activation = 'sigmoid',
              weight_initializer = 'xavier',
              bias_initializer = 'xavier',
              seed = None,
              batch_normalization = False,
              supress_bias = False,
              epsilon = 1e-12):

        self._input_dim = input_dim
        self._output_dim = output_dim
        self._activation = activation
        self._weight_initializer = weight_initializer
        self._bias_initializer = bias_initializer
        self._batch_normalization = batch_normalization
        self._has_bias = not supress_bias
        self._epsilon = epsilon

        if (weight_initializer == 'zeros'):
            wInit = tf.keras.initializers.Zeros()
        elif (weight_initializer == 'ones'):
            wInit = tf.keras.initializers.Ones()
        elif (weight_initializer == 'normal_0_1'):
            wInit = RandomNormal(mean=0., stddev=1., seed=seed)
        elif (weight_initializer == 'uniform_-1_1'):
            wInit = tf.keras.initializers.RandomUniform(minval=-1., maxval=1., seed=seed)
        elif (weight_initializer == 'xavier'):
            wInit = tf.keras.initializers.GlorotNormal(seed=seed)
        elif (weight_initializer == 'he'):
            wInit = tf.keras.initializers.he_normal(seed=seed)

        if (bias_initializer == 'zeros'):
            bInit = tf.keras.initializers.Zeros()
        elif (bias_initializer == 'ones'):
            bInit = tf.keras.initializers.Ones()
        elif (bias_initializer == 'normal_0_1'):
            bInit = RandomNormal(mean=0., stddev=1., seed=seed)
        elif (bias_initializer == 'uniform_-1_1'):
            bInit = tf.keras.initializers.RandomUniform(minval=-1., maxval=1., seed=seed)
        elif (bias_initializer == 'xavier'):
            bInit = tf.keras.initializers.GlorotNormal(seed=seed)
        elif (bias_initializer == 'he'):
            bInit = tf.keras.initializers.he_normal(seed=seed)

        self.w = self.add_weight(
            name = self._name + ' W',
            shape = (self._input_dim, self._output_dim),
            initializer = wInit,
            trainable = True)
        tf.cast(self.w, tf.float32)

        if (self._has_bias == True):
            self.b = self.add_weight(
                name = self._name + ' b',
                shape = (self._output_dim,),
                initializer = bInit,
                trainable = True)
        else:
            bInit = tf.keras.initializers.Zeros()
            self.b = self.add_weight(
                name = self._name + ' b',
                shape = (self._output_dim,),
                initializer = bInit,
                trainable = True)
        tf.cast(self.b, tf.float32)

    ####################
    # Feeds the input into the layer.
    ####################
    def feed(self,
             inputs = None):

        if (self._has_bias == True):
            outputs = tf.matmul(inputs, self.w) + self.b
        else:
            outputs = tf.matmul(inputs, self.w)

        if (self._activation == 'sigmoid'):
            outputs = tf.nn.sigmoid(outputs)
        if (self._activation == 'tanh'):
            outputs = tf.keras.activations.tanh(outputs)
        if (self._activation == 'relu'):
            outputs = tf.nn.relu(outputs)
        if (self._activation == 'exponential'):
            outputs = tf.keras.activations.exponential(outputs)
        if (self._activation == 'elu'):
            outputs = tf.keras.activations.elu(outputs, alpha=1.0)
        if (self._activation == 'swish'):
            outputs = tf.keras.activations.swish(outputs)
        if (self._activation == 'softplus'):
            outputs = tf.nn.softplus(outputs)


        if (self._batch_normalization == True):
            mean, var = tf.nn.moments(outputs, axes=0, keepdims=True)
            outputs = (outputs - mean)/(tf.math.sqrt(var + self._epsilon))

        return outputs
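
To make the arithmetic of the feed method above concrete, the following is a small pure-Python sketch of the same computation for the sigmoid case: an affine map, the activation, and an optional normalization of each output unit over the batch using the population variance (which is what tf.nn.moments returns). The standalone function feed, its list-based arguments and the numbers used are illustrative assumptions, not part of the thesis code, which operates on TensorFlow tensors.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def feed(inputs, w, b, batch_normalization=False, epsilon=1e-12):
    """Dense layer on a batch: sigmoid(inputs @ w + b), optionally normalized per unit."""
    n_out = len(b)
    outputs = [[sigmoid(sum(x_i * w[i][j] for i, x_i in enumerate(x)) + b[j])
                for j in range(n_out)]
               for x in inputs]
    if batch_normalization:
        n = len(outputs)
        for j in range(n_out):
            column = [row[j] for row in outputs]
            mean = sum(column) / n
            var = sum((v - mean) ** 2 for v in column) / n  # population variance
            for row in outputs:
                row[j] = (row[j] - mean) / math.sqrt(var + epsilon)
    return outputs

# Hypothetical batch of three 2-dimensional inputs and a 2x2 weight matrix.
batch = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
weights = [[0.5, -1.0], [2.0, 0.3]]  # shape (input_dim, output_dim)
bias = [0.1, -0.2]
plain = feed(batch, weights, bias)
normalized = feed(batch, weights, bias, batch_normalization=True)
```

After normalization each output unit has zero mean and, up to the epsilon term, unit variance over the batch, so the statistics depend on the batch that is fed in.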
B.7 myModel Class
"""
@author: Alberto Garcia Molina
@latest_update: 12/10/2020
"""

class myModel(tf.keras.Model):

    ####################
    # Initializes the model instance.
    ####################
    def __init__(self,
                 name = 'myModel'):

        super(myModel, self).__init__()

        # Initializes the name and flags of the model.
        self._name = name
        self.built = False
        self._is_compiled = False
        self._has_dataset = False

        # Initializes the historical training variables of the model.
        self._num_epochs_trained = 0
        self._losses = []
        self._losses_domain = []
        self._losses_border = []
        self._losses_regularization = []
        self._losses_solution = []
        self._losses_validation = []

    ####################
    # Builds the layers of the model.
    ####################
    def build(self,
              input_dim = 2,
              hidden_dim = [5,5],
              output_dim = 2,
              activations = 'sigmoid',
              weight_initializers = 'xavier',
              bias_initializers = 'xavier',
              batch_normalization = False,
              supress_bias = False,
              seed = None,
              epsilon = 1e-12):

        # Sets up the basic characteristics of the layers in the model.
        self._input_dim = input_dim
        self._hidden_dim = hidden_dim
        self._output_dim = output_dim
        self._num_hidden_layers = len(hidden_dim)-1
        self._activations = activations
        self._weight_initializers = weight_initializers
        self._bias_initializers = bias_initializers
        self._batch_normalization = batch_normalization
        self._has_bias = not supress_bias

        self._layers = []

        # Constructs the input layer.
        layer = myLayer('Input_Layer')
        layer.build(input_dim = self._input_dim,
                    output_dim = self._hidden_dim[0],
                    activation = self._activations,
                    weight_initializer = self._weight_initializers,
                    bias_initializer = self._bias_initializers,
                    seed = seed,
                    batch_normalization = self._batch_normalization,
                    supress_bias = supress_bias,
                    epsilon = epsilon)
        self._layers.append(layer)
        self._trainable_weights.append(layer.variables[0])
        self._trainable_weights.append(layer.variables[1])

        # Constructs the hidden layers.
        for layer_num in range(1, self._num_hidden_layers+1):
            layer = myLayer('Hidden_Layer_'+str(layer_num))
            layer.build(input_dim = self._hidden_dim[layer_num-1],
                        output_dim = self._hidden_dim[layer_num],
                        activation = self._activations,
                        weight_initializer = self._weight_initializers,
                        bias_initializer = self._bias_initializers,
                        seed = seed,
                        batch_normalization = self._batch_normalization,
                        supress_bias = supress_bias,
                        epsilon = epsilon)
            self._layers.append(layer)
            self._trainable_weights.append(layer.variables[0])
            self._trainable_weights.append(layer.variables[1])

        # Constructs the output layer.
        layer = myLayer('Output_Layer')
        layer.build(input_dim = self._hidden_dim[-1],
                    output_dim = self._output_dim,
                    activation = None,
                    weight_initializer = self._weight_initializers,
                    bias_initializer = self._bias_initializers,
                    seed = seed,
                    batch_normalization = False,
                    supress_bias = supress_bias,
                    epsilon = None)
        self._layers.append(layer)
        self._trainable_weights.append(layer.variables[0])
        self._trainable_weights.append(layer.variables[1])

        # Raise flag if the neural network has been built successfully.
        self.built = True

    ####################
    # Sets up the optimizer.
    ####################
    def set_up_optimizer(self,
                         optimizer_selection,
                         learning_rate = 1e-03,
                         epsilon = 1e-07):

        # Sets up the information of the optimizer.
        self._optimizer_selection = optimizer_selection
        self._learning_rate = learning_rate
        self._epsilon = epsilon

        # Adam Optimizer (1st Order)
        if (self._optimizer_selection == 'Adam'):
            self._optimizer1 = tf.keras.optimizers.Adam(learning_rate = self._learning_rate,
                                                        epsilon = self._epsilon,
                                                        amsgrad = False)
            self._optimizer2 = tf.keras.optimizers.Adam(learning_rate = self._learning_rate,
                                                        epsilon = self._epsilon,
                                                        amsgrad = False)
        # AMSGrad Optimizer (1st Order)
        elif (self._optimizer_selection == 'AMSGrad'):
            self._optimizer1 = tf.keras.optimizers.Adam(learning_rate = self._learning_rate,
                                                        epsilon = self._epsilon,
                                                        amsgrad = True)
            self._optimizer2 = tf.keras.optimizers.Adam(learning_rate = self._learning_rate,
                                                        epsilon = self._epsilon,
                                                        amsgrad = True)
        # Nadam Optimizer (1st Order)
        elif (self._optimizer_selection == 'Nadam'):
            self._optimizer1 = tf.keras.optimizers.Nadam(learning_rate = self._learning_rate,
                                                         epsilon = self._epsilon)
            self._optimizer2 = tf.keras.optimizers.Nadam(learning_rate = self._learning_rate,
                                                         epsilon = self._epsilon)
        # AdaGrad Optimizer (1st Order)
        elif (self._optimizer_selection == 'AdaGrad'):
            self._optimizer1 = tf.keras.optimizers.Adagrad(learning_rate = self._learning_rate,
                                                           epsilon = self._epsilon)
            self._optimizer2 = tf.keras.optimizers.Adagrad(learning_rate = self._learning_rate,
                                                           epsilon = self._epsilon)
        # AdaDelta Optimizer (1st Order)
        elif (self._optimizer_selection == 'AdaDelta'):
            self._optimizer1 = tf.keras.optimizers.Adadelta(learning_rate = self._learning_rate,
                                                            rho = 0.95,
                                                            epsilon = self._epsilon)
            self._optimizer2 = tf.keras.optimizers.Adadelta(learning_rate = self._learning_rate,
                                                            rho = 0.95,
                                                            epsilon = self._epsilon)
        # RMSProp Optimizer (1st Order)
        elif (self._optimizer_selection == 'RMSProp'):
            self._optimizer1 = tf.keras.optimizers.RMSprop(learning_rate = self._learning_rate,
                                                           epsilon = self._epsilon)
            self._optimizer2 = tf.keras.optimizers.RMSprop(learning_rate = self._learning_rate,
                                                           epsilon = self._epsilon)
        # Vanilla SGD Optimizer (1st Order)
        elif (self._optimizer_selection == 'Vanilla_SGD'):
            self._optimizer1 = tf.keras.optimizers.SGD(learning_rate = self._learning_rate,
                                                       nesterov = False)
            self._optimizer2 = tf.keras.optimizers.SGD(learning_rate = self._learning_rate,
                                                       nesterov = False)
        # SGD with Momentum Optimizer (1st Order)
        # (tf.keras is used here for consistency; the original listing referenced a bare keras module.)
        elif (self._optimizer_selection == 'Momentum_SGD'):
            self._optimizer1 = tf.keras.optimizers.SGD(learning_rate = self._learning_rate,
                                                       momentum = 0.9,
                                                       nesterov = False)
            self._optimizer2 = tf.keras.optimizers.SGD(learning_rate = self._learning_rate,
                                                       momentum = 0.9,
                                                       nesterov = False)
        # SGD with Nesterov Momentum Optimizer (1st Order)
        elif (self._optimizer_selection == 'Nesterov_SGD'):
            self._optimizer1 = tf.keras.optimizers.SGD(learning_rate = self._learning_rate,
                                                       momentum = 0.9,
                                                       nesterov = True)
            self._optimizer2 = tf.keras.optimizers.SGD(learning_rate = self._learning_rate,
                                                       momentum = 0.9,
                                                       nesterov = True)
        # BFGS Optimizer (2nd Order)
        elif (self._optimizer_selection == 'BFGS'):
            self._optimizer = secondOrderOptimizers(name = 'BFGS',
                                                    model = self)
        # L-BFGS Optimizer (2nd Order)
        elif (self._optimizer_selection == 'L-BFGS'):
            self._optimizer = secondOrderOptimizers(name = 'L-BFGS',
                                                    model = self)
        else:
            self._is_compiled = False
            raise Exception("Invalid optimizer.")

    ####################
    # Builds the problem instance and training setup.
    ####################
    def compile(self,
                differential_operator = None,
                external_force = None,
                exact_solution = None,
                optimizer_selection = None,
                learning_rate = 1e-03,
                epsilon = 1e-07,
                scale_factor = 1,
                loss_fuction = 'square_L2_error',
                regularization = None,
                regularization_coef = 0,
                clip_gradient = 'global'):

        # Sets up the problem solved by the model.
        self._differential_operator = differential_operator
        self._external_force = external_force
        self._exact_solution = exact_solution

        # Sets up the regularization and loss options.
        self._scale_factor = scale_factor
        self._loss_fuction = loss_fuction
        self._regularization = regularization
        self._regularization_coef = regularization_coef
        self._clip_gradient = clip_gradient

        # Constructs the optimizer and validates the instances.
        if (regularization == None):
            print('No regularization introduced, using default None')
        self._required_derivative_order = problemInstance.instance_exists(
            differential_operator = self._differential_operator,
            external_force = self._external_force,
            exact_solution = self._exact_solution)
        self.set_up_optimizer(optimizer_selection = optimizer_selection,
                              learning_rate = learning_rate,
                              epsilon = epsilon)

        # Raise flag if the problem and training instance has been built successfully.
        self._is_compiled = True

    ####################
    # Feed forward of the neural network, returning also the gradient wrt inputs.
    ####################
    def predict(self,
                inputs,
                return_derivative_order = 0):

        if (self.built == False):
            raise Exception("Cannot feed forward, the model is not built.")

        outputs_derivatives = []
        # Order 3 is rejected until the commented-out block below is corrected.
        if (return_derivative_order in (0, 1, 2)):

            # Output with 0 order derivative.
            if (return_derivative_order == 0):
                outputs = self._layers[0].feed(inputs)
                for layer_ind in range(1, self._num_hidden_layers+2):
                    outputs = self._layers[layer_ind].feed(outputs)
                tf.debugging.check_numerics(outputs, message = 'NaN occurred in network output.')

            # Output with 1st order derivatives.
            elif (return_derivative_order == 1):
                with tf.GradientTape(persistent = False) as tape_ord1:
                    tape_ord1.watch(inputs)
                    outputs = self._layers[0].feed(inputs)
                    for layer_ind in range(1, self._num_hidden_layers+2):
                        outputs = self._layers[layer_ind].feed(outputs)
                # Adaptation to multi-valued functions in one derivative (Burgers Operator).
                if (self._output_dim < 2):
                    outputs_1st_der = tape_ord1.gradient(outputs,
                                                         inputs)
                else:
                    outputs_1st_der = tape_ord1.batch_jacobian(outputs,
                                                               inputs)
                outputs_derivatives.append(outputs_1st_der)
                del tape_ord1
                tf.debugging.check_numerics(outputs,
                                            message = 'NaN occurred in network output.')
                tf.debugging.check_numerics(outputs_1st_der,
                                            message = 'NaN occurred in network 1st derivative output.')

            # Output with 2nd order derivatives.
            elif (return_derivative_order == 2):
                outputs_1st_der = []
                outputs_2nd_der = []
                input_component_list = tf.unstack(inputs, axis = 1)
                for dim in range(inputs.shape[1]):
                    input_component_list[dim] = tf.expand_dims(input_component_list[dim], axis = 1)
                for dim in range(inputs.shape[1]):
                    with tf.GradientTape(persistent = True) as tape_ord2:
                        with tf.GradientTape(persistent = True) as tape_ord1:
                            tape_ord2.watch(input_component_list[dim])
                            tape_ord1.watch(input_component_list[dim])
                            reconst_inputs = tf.squeeze(tf.stack(input_component_list, axis = 1), axis = 2)
                            outputs = self._layers[0].feed(reconst_inputs)
                            for layer_ind in range(1, self._num_hidden_layers+2):
                                outputs = self._layers[layer_ind].feed(outputs)
                        # The first derivative is taken inside tape_ord2 so that it can be differentiated again.
                        outputs_1st_der_var = tape_ord1.gradient(outputs,
                                                                 input_component_list[dim])
                    outputs_2nd_der_var = tape_ord2.gradient(outputs_1st_der_var,
                                                             input_component_list[dim])
                    outputs_1st_der.append(outputs_1st_der_var)
                    outputs_2nd_der.append(outputs_2nd_der_var)
                    del tape_ord1
                    del tape_ord2

                outputs_derivatives.append(tf.squeeze(tf.stack(outputs_1st_der, axis = 1), axis = 2))
                outputs_derivatives.append(tf.squeeze(tf.stack(outputs_2nd_der, axis = 1), axis = 2))

                tf.debugging.check_numerics(outputs,
                                            message = 'NaN occurred in network output.')
                tf.debugging.check_numerics(outputs_derivatives[0],
                                            message = 'NaN occurred in network 1st derivative output.')
                tf.debugging.check_numerics(outputs_derivatives[1],
                                            message = 'NaN occurred in network 2nd derivative output.')

            # Output with 3rd order derivatives. (CORRECT FOR THE THIRD ORDER DERIVATIVE RIGHT)
            #elif (return_derivative_order == 3):
            #    with tf.GradientTape(persistent = False) as tape_ord3:
            #        tape_ord3.watch(inputs)
            #        with tf.GradientTape(persistent = False) as tape_ord2:
            #            tape_ord2.watch(inputs)
            #            with tf.GradientTape(persistent = False) as tape_ord1:
            #                tape_ord1.watch(inputs)
            #                outputs = self._layers[0].feed(inputs)
            #                for layer_ind in range(1, self._num_hidden_layers+2):
            #                    outputs = self._layers[layer_ind].feed(outputs)
            #            outputs_1st_der = tape_ord1.gradient(outputs,
            #                                                 inputs)
            #            outputs_derivatives.append(outputs_1st_der)
            #            del tape_ord1
            #        outputs_2nd_der = tape_ord2.gradient(outputs_1st_der,
            #                                             inputs)
            #        outputs_derivatives.append(outputs_2nd_der)
            #        del tape_ord2
            #    outputs_3rd_der = tape_ord3.gradient(outputs_2nd_der,
            #                                         inputs)
            #    outputs_derivatives.append(outputs_3rd_der)
            #    del tape_ord3
            #    tf.debugging.check_numerics(outputs,
            #                                message = 'NaN occurred in network output.')
            #    tf.debugging.check_numerics(outputs_1st_der,
            #                                message = 'NaN occurred in network 1st derivative output.')
            #    tf.debugging.check_numerics(outputs_2nd_der,
            #                                message = 'NaN occurred in network 2nd derivative output.')
            #    tf.debugging.check_numerics(outputs_3rd_der,
            #                                message = 'NaN occurred in network 3rd derivative output.')

        else:
            raise Exception("Invalid order of network derivative computation.")

        return outputs, outputs_derivatives

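As a rough cross-check of what `predict` returns for `return_derivative_order = 2` (per-coordinate first derivatives and unmixed second derivatives with respect to the same coordinate), the values can be compared against central finite differences. The sketch below is not part of the thesis code and does not use automatic differentiation; the helper `central_differences` and the test function `u` are hypothetical names chosen for illustration, and only the Python standard library is assumed.

```python
import math

def central_differences(u, x, h=1e-4):
    """Approximate du/dx_i and d2u/dx_i^2 at the point x for every coordinate i."""
    first, second = [], []
    u0 = u(x)
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        # Second-order accurate central stencils.
        first.append((u(x_plus) - u(x_minus)) / (2.0 * h))
        second.append((u(x_plus) - 2.0 * u0 + u(x_minus)) / (h * h))
    return first, second

# Hypothetical test function u(x, y) = sin(x) * exp(y).
def u(p):
    return math.sin(p[0]) * math.exp(p[1])

first, second = central_differences(u, [0.3, 0.1])
```

For this test function the exact values are cos(x)exp(y) and sin(x)exp(y) for the first derivatives, and -sin(x)exp(y) and sin(x)exp(y) for the unmixed second derivatives, so the same comparison could in principle be run against the network output once it has been trained on a known solution.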
    ####################
    # Calculates the loss function.
    ####################
    def loss_function(self,
                      inputs_domain,
                      border_data = None,
                      is_training = False,
                      use_only_domain = False,
                      use_only_border = False):

        # Initializes the losses variables.
        loss_domain = tf.constant(0.)
        loss_border = tf.constant(0.)
        loss_regularization = tf.constant(0.)
        loss_solution = tf.constant(0.)

        # Evaluates the left-hand-side and the right-hand-side of the differential equation and solution.

        # Use the gradient balancing regularization.
        if (self._regularization != 'Gradient_Type'):
            outputs, outputs_derivatives = self.predict(
                inputs = inputs_domain,
                return_derivative_order = self._required_derivative_order)
        else:
            with tf.GradientTape(persistent = True) as tape_reg:
                tape_reg.watch(self._trainable_weights)
                outputs, outputs_derivatives = self.predict(
                    inputs = inputs_domain,
                    return_derivative_order = self._required_derivative_order)
            outputs_param_der = tape_reg.gradient(
                outputs,
                self._trainable_weights,
                unconnected_gradients = tf.UnconnectedGradients.ZERO)
            outputs_derivatives_param_der = tape_reg.gradient(
                outputs_derivatives[0],
                self._trainable_weights,
                unconnected_gradients = tf.UnconnectedGradients.ZERO)
            for weight_ind in range(self._num_hidden_layers+2):
                loss_regularization += tf.reduce_mean(tf.square(outputs_derivatives_param_der[2*weight_ind]
                                                                - outputs_param_der[2*weight_ind]))
                loss_regularization += tf.reduce_mean(tf.square(outputs_derivatives_param_der[2*weight_ind+1]
                                                                - outputs_param_der[2*weight_ind+1]))
            del tape_reg

        diff_op_output = problemInstance.differential_operator(
            inputs = inputs_domain,
            outputs = outputs,
            outputs_derivatives = outputs_derivatives,
            differential_operator = self._differential_operator,
            input_dim = self._input_dim,
            output_dim = self._output_dim)
        ext_force_output = problemInstance.external_force(
            inputs = inputs_domain,
            external_force = self._external_force,
            input_dim = self._input_dim,
            output_dim = self._output_dim)

        exact_sol_output = problemInstance.exact_solution(
            inputs = inputs_domain,
            exact_solution = self._exact_solution,
            input_dim = self._input_dim,
            output_dim = self._output_dim)

        # Evaluates the left-hand-side of the border conditions.
        # Right-hand-side already computed in border_data[1:n].
        if (border_data is not None):
            outputs_border, outputs_derivatives_border = self.predict(
                inputs = border_data[0],
                return_derivative_order = self._required_derivative_order)
            lhs_border = problemInstance.lhs_boundary_condtions(
                inputs = border_data[0],
                outputs = outputs_border,
                outputs_derivatives = outputs_derivatives_border,
                border_type = self._border_type,
                border_batch_size = self._border_training_batch_size,
                external_force = self._external_force,
                required_derivative_order = self._required_derivative_order,
                input_dim = self._input_dim,
                output_dim = self._output_dim)

        # Computes the loss function for the L2 Error.
        if (self._loss_fuction == 'L2_error'):
            loss_domain = tf.reduce_mean(
                tf.norm(diff_op_output-ext_force_output,
                        ord = 'euclidean',
                        axis = 1))
            if (border_data is not None):
                for ind in range(len(border_data)-1):
                    loss_border += tf.reduce_mean(
                        tf.norm(lhs_border[ind] - border_data[ind+1],
                                ord = 'euclidean',
                                axis = 1))
            if (self._regularization == 'Tikhonov'):
                for weight_ind in range(self._num_hidden_layers+2):
                    loss_regularization += tf.reduce_mean(
                        tf.norm(self._trainable_weights[2*weight_ind],
                                ord = 'euclidean',
                                axis = 1))
                    loss_regularization += tf.reduce_mean(
                        tf.norm(self._trainable_weights[2*weight_ind+1],
                                ord = 'euclidean',
                                axis = 0))
            elif (self._regularization == None
                  or self._regularization == 'Gradient_Type'
                  or self._regularization == 'Quadratic_Balance'):
                pass
            else:
                print('Invalid regularization option, defaulting to none.')
                self._regularization = None
            loss_solution = tf.reduce_mean(
                tf.norm(outputs-exact_sol_output,
                        ord = 'euclidean',
                        axis = 1))

        # Computes the loss function for the Square L2 Error (MSE).
        elif (self._loss_fuction == 'square_L2_error'):
            loss_domain = tf.reduce_mean(
                tf.reduce_sum(tf.square(diff_op_output-ext_force_output),
                              axis = 1,
                              keepdims = True))
            if (border_data is not None):
                for ind in range(len(border_data)-1):
                    loss_border += tf.reduce_mean(
                        tf.reduce_sum(tf.square(lhs_border[ind] - border_data[ind+1]),
                                      axis = 1,
                                      keepdims = True))
            if (self._regularization == 'Tikhonov'):
                for weight_ind in range(self._num_hidden_layers+2):
                    loss_regularization += tf.reduce_mean(
                        tf.reduce_mean(tf.square(self._trainable_weights[2*weight_ind]),
                                       axis = 1,
                                       keepdims = True))
                    loss_regularization += tf.reduce_mean(
                        tf.reduce_mean(tf.square(self._trainable_weights[2*weight_ind+1]),
                                       axis = 0,
                                       keepdims = True))
            elif (self._regularization == None
                  or self._regularization == 'Gradient_Type'
                  or self._regularization == 'Quadratic_Balance'):
                pass
            else:
                print('Invalid regularization option, defaulting to none.')
                self._regularization = None
            loss_solution = tf.reduce_mean(
                tf.reduce_sum(tf.square(outputs-exact_sol_output),
                              axis = 1,
                              keepdims = True))

        # Computes the loss function for the Absolute Error (L1).
        elif (self._loss_fuction == 'absolute_error'):
            loss_domain = tf.reduce_mean(
                tf.reduce_sum(tf.abs(diff_op_output-ext_force_output),
                              axis = 1,
                              keepdims = True))
            if (border_data is not None):
                for ind in range(len(border_data)-1):
                    loss_border += tf.reduce_mean(
                        tf.reduce_sum(tf.abs(lhs_border[ind] - border_data[ind+1]),
                                      axis = 1,
                                      keepdims = True))
            if (self._regularization == 'Tikhonov'):
                for weight_ind in range(self._num_hidden_layers+2):
                    loss_regularization += tf.math.reduce_mean(
                        tf.reduce_mean(tf.abs(self._trainable_weights[2*weight_ind]),
                                       axis = 1,
                                       keepdims = True))
                    loss_regularization += tf.math.reduce_mean(
                        tf.reduce_mean(tf.abs(self._trainable_weights[2*weight_ind+1]),
                                       axis = 0,
                                       keepdims = True))
            elif (self._regularization == None
                  or self._regularization == 'Gradient_Type'
                  or self._regularization == 'Quadratic_Balance'):
                pass
            else:
                print('Invalid regularization option, defaulting to none.')
                self._regularization = None
            loss_solution = tf.reduce_mean(
                tf.reduce_sum(tf.abs(outputs - exact_sol_output),
                              axis = 1,
                              keepdims = True))

        # Experimental: Computes the loss (MSE) proportional to the external force.
        elif (self._loss_fuction == 'force_proportional_error'):
            loss_domain = tf.reduce_mean(
                tf.square((diff_op_output-ext_force_output)
                          *(ext_force_output+1e-12)))
            if (border_data is not None):
                for ind in range(len(border_data)-1):
                    loss_border += tf.reduce_mean(
                        tf.square(lhs_border[ind] - border_data[ind+1]))
            if (self._regularization == 'Tikhonov'):
                for weight_ind in range(self._num_hidden_layers+2):
                    loss_regularization += tf.reduce_mean(
                        tf.reduce_mean(tf.square(self._trainable_weights[2*weight_ind]),
                                       axis = 1,
                                       keepdims = True))
                    loss_regularization += tf.reduce_mean(
                        tf.reduce_mean(tf.square(self._trainable_weights[2*weight_ind+1]),
                                       axis = 0,
                                       keepdims = True))
            elif (self._regularization == None
                  or self._regularization == 'Gradient_Type'
                  or self._regularization == 'Quadratic_Balance'):
                pass
            else:
                print('Invalid regularization option, defaulting to none.')
                self._regularization = None
            loss_solution = tf.reduce_mean(
                tf.square(outputs-exact_sol_output))

        # Computes the mean of the summed fourth powers of the residual, i.e. the ||.||^{4}_{4} error.
        elif (self._loss_fuction == 'square_MSE'):
            loss_domain = tf.reduce_mean(
                tf.reduce_sum(tf.square(tf.square(diff_op_output-ext_force_output)),
                              axis = 1,
                              keepdims = True))
            if (border_data is not None):
                for ind in range(len(border_data)-1):
                    loss_border += tf.reduce_mean(
                        tf.reduce_sum(tf.square(tf.square(lhs_border[ind] - border_data[ind+1])),
                                      axis = 1,
                                      keepdims = True))
            if (self._regularization == 'Tikhonov'):
                for weight_ind in range(self._num_hidden_layers+2):
                    loss_regularization += tf.reduce_mean(
                        tf.reduce_mean(tf.square(tf.square(self._trainable_weights[2*weight_ind])),
                                       axis = 1,
                                       keepdims = True))
                    loss_regularization += tf.reduce_mean(
                        tf.reduce_mean(tf.square(tf.square(self._trainable_weights[2*weight_ind+1])),
                                       axis = 0,
                                       keepdims = True))
            elif (self._regularization == None
                  or self._regularization == 'Gradient_Type'
                  or self._regularization == 'Quadratic_Balance'):
                pass
            else:
                print('Invalid regularization option, defaulting to none.')
                self._regularization = None
            loss_solution = tf.reduce_mean(
                tf.reduce_sum(tf.square(outputs-exact_sol_output),
                              axis = 1,
                              keepdims = True))

        # Experimental: Computes the loss (MSE) proportional to the square of the inputs.
        elif (self._loss_fuction == 'input_proportional_error'):
            loss_domain = tf.reduce_mean(
                tf.square((diff_op_output-ext_force_output)
                          *inputs_domain*inputs_domain))
            if (border_data is not None):
                for ind in range(len(border_data)-1):
                    loss_border += tf.reduce_mean(
                        tf.square(lhs_border[ind] - border_data[ind+1]))
            if (self._regularization == 'Tikhonov'):
                for weight_ind in range(self._num_hidden_layers+2):
                    loss_regularization += tf.reduce_mean(
                        tf.reduce_mean(tf.square(self._trainable_weights[2*weight_ind]),
                                       axis = 1,
                                       keepdims = True))
                    loss_regularization += tf.reduce_mean(
                        tf.reduce_mean(tf.square(self._trainable_weights[2*weight_ind+1]),
                                       axis = 0,
                                       keepdims = True))
            elif (self._regularization == None
                  or self._regularization == 'Gradient_Type'
                  or self._regularization == 'Quadratic_Balance'):
                pass
            else:
                print('Invalid regularization option, defaulting to none.')
                self._regularization = None
            loss_solution = tf.reduce_mean(
                tf.square(outputs-exact_sol_output))

        # Computes the loss (MSE) with respect to one component.
        elif (self._loss_fuction == 'square_L2_error_1st_comp'):
            loss_domain = tf.reduce_mean(tf.square(diff_op_output-ext_force_output)[:,0])
            if (border_data is not None):
                for ind in range(len(border_data)-1):
                    loss_border += tf.reduce_mean(
                        tf.reduce_sum(tf.square(lhs_border[ind] - border_data[ind+1]),
                                      axis = 1,
                                      keepdims = True))
            if (self._regularization == 'Tikhonov'):
                for weight_ind in range(self._num_hidden_layers+2):
                    loss_regularization += tf.reduce_mean(
                        tf.reduce_mean(tf.square(self._trainable_weights[2*weight_ind]),
                                       axis = 1,
                                       keepdims = True))
                    loss_regularization += tf.reduce_mean(
                        tf.reduce_mean(tf.square(self._trainable_weights[2*weight_ind+1]),
                                       axis = 0,
                                       keepdims = True))
            elif (self._regularization == None
                  or self._regularization == 'Gradient_Type'
                  or self._regularization == 'Quadratic_Balance'):
                pass
            else:
                print('Invalid regularization option, defaulting to none.')
                self._regularization = None
            # Selects the first output component, matching loss_domain above.
            loss_solution = tf.reduce_mean(tf.square(outputs - exact_sol_output)[:,0])

        # Error for invalid loss option.
        else:
            raise Exception("Invalid loss option. Please recompile with a valid name.")

        # Experimental: Implements the quadratic loss balance regularization.
        if (self._regularization == 'Quadratic_Balance'):
            # Only meaningful when a border loss has been computed.
            if (border_data is not None):
                loss_regularization += tf.sqrt(tf.square(loss_domain-loss_border))

        # Implements the train only domain or border options
        coeff_domain = 1
        coeff_border = 1
        if (use_only_domain == True):
            coeff_border = 0.
        if (use_only_border == True):
            coeff_domain = 0.

        # Computes the total loss and checks for explosions.
        loss = coeff_domain*loss_domain + coeff_border*loss_border
        tf.debugging.check_numerics(loss_domain, message='NaN occurred in domain loss function.')
        tf.debugging.check_numerics(loss_border, message='NaN occurred in border loss function.')
        tf.debugging.check_numerics(loss_regularization, message='NaN occurred in regularization loss function.')
        tf.debugging.check_numerics(loss, message='NaN occurred in total loss function.')

        # If the method is set to training mode, the losses are saved on the historical training variables.
        if (is_training == True):
            self._losses_domain.append(loss_domain.numpy())
            self._losses_border.append(loss_border.numpy())
            self._losses_regularization.append(loss_regularization.numpy())
            self._losses.append(loss.numpy())
            self._losses_solution.append(loss_solution.numpy())

        return loss, loss_domain, loss_border, loss_regularization

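To make the difference between the main loss options concrete, the following sketch reproduces the reductions used by the 'L2_error', 'square_L2_error' and 'absolute_error' branches on a tiny batch of residuals (one row per sample, one column per output component). It is an illustration written with plain Python lists rather than TensorFlow tensors; the function names and the sample residuals are hypothetical and not taken from the thesis code.

```python
import math

def l2_error(residuals):
    # Mean over the batch of the Euclidean norm of each residual row.
    return sum(math.sqrt(sum(r * r for r in row)) for row in residuals) / len(residuals)

def square_l2_error(residuals):
    # Mean over the batch of the sum of squared components of each row.
    return sum(sum(r * r for r in row) for row in residuals) / len(residuals)

def absolute_error(residuals):
    # Mean over the batch of the sum of absolute components of each row.
    return sum(sum(abs(r) for r in row) for row in residuals) / len(residuals)

# Hypothetical residuals for a batch of two samples with two output components.
residuals = [[3.0, 4.0], [0.0, 0.0]]

print(l2_error(residuals))         # 2.5
print(square_l2_error(residuals))  # 12.5
print(absolute_error(residuals))   # 3.5
```

The squared variant penalizes the large residual of the first sample much more heavily than the other two options, which is one reason the choice of loss can change how the domain and border terms compete during training.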
    ####################
    # Computes the gradient wrt the weights.
    ####################
    def back_propagation(self,
                         inputs,
                         border_data,
                         is_training,
                         split_gradient = False,
                         display_gradient_norm = False,
                         normalize_gradient = False,
                         train_only_domain = False,
                         train_only_border = False):

        # Executes the back propagation
        with tf.GradientTape(persistent = True) as tape_bp:
            tape_bp.watch(self._trainable_weights)
            (loss, loss_domain, loss_border,
             loss_regularization) = self.loss_function(inputs_domain = inputs,
                                                       border_data = border_data,
                                                       is_training = is_training,
                                                       use_only_domain = train_only_domain,
                                                       use_only_border = train_only_border)

            total_loss_ = loss+self._regularization_coeff*loss_regularization

        gradient_update = tape_bp.gradient(total_loss_,
                                           self._trainable_weights,
                                           unconnected_gradients = tf.UnconnectedGradients.ZERO)

        # Splits the gradient wrt each individual loss component.
        if (split_gradient == True):
            gradient_update_domain = tape_bp.gradient(loss_domain,
                                                      self._trainable_weights,
                                                      unconnected_gradients = tf.UnconnectedGradients.ZERO)
            gradient_update_border = tape_bp.gradient(loss_border,
                                                      self._trainable_weights,
                                                      unconnected_gradients = tf.UnconnectedGradients.ZERO)
            gradient_update_regularization = tape_bp.gradient(loss_regularization,
                                                              self._trainable_weights,
                                                              unconnected_gradients = tf.UnconnectedGradients.ZERO)
        del tape_bp

        # Avoids NaN propagation by rightfully setting them to 0.
        gradient_update = [tf.where(tf.math.is_nan(g), tf.zeros_like(g), g)
                           for g in gradient_update]
        if (split_gradient == True):
            gradient_update_domain = [tf.where(tf.math.is_nan(g), tf.zeros_like(g), g)
                                      for g in gradient_update_domain]
            gradient_update_border = [tf.where(tf.math.is_nan(g), tf.zeros_like(g), g)
                                      for g in gradient_update_border]
            gradient_update_regularization = [tf.where(tf.math.is_nan(g), tf.zeros_like(g), g)
                                              for g in gradient_update_regularization]

        # Applies clipping regularization to bound the gradients.
        # The split gradients only exist when split_gradient is True, so they are clipped only in that case.
        if (self._clip_gradient == 'global'):
            gradient_update = tf.clip_by_global_norm(gradient_update, 1e+1)[0]
            if (split_gradient == True):
                gradient_update_domain = tf.clip_by_global_norm(gradient_update_domain, 1e+1)[0]
                gradient_update_border = tf.clip_by_global_norm(gradient_update_border, 1e+1)[0]
                gradient_update_regularization = tf.clip_by_global_norm(gradient_update_regularization, 1e+1)[0]
        elif (self._clip_gradient == 'value'):
            gradient_update = [tf.clip_by_value(g, clip_value_min = -1e+1, clip_value_max = 1e+1)
                               for g in gradient_update]
            if (split_gradient == True):
                gradient_update_domain = [tf.clip_by_value(g, clip_value_min = -1e+1, clip_value_max = 1e+1)
                                          for g in gradient_update_domain]
                gradient_update_border = [tf.clip_by_value(g, clip_value_min = -1e+1, clip_value_max = 1e+1)
                                          for g in gradient_update_border]
                gradient_update_regularization = [tf.clip_by_value(g, clip_value_min = -1e+1, clip_value_max = 1e+1)
                                                  for g in gradient_update_regularization]
        elif (self._clip_gradient == 'norm'):
            gradient_update = [tf.clip_by_norm(g, 1e+1) for g in gradient_update]
            if (split_gradient == True):
                gradient_update_domain = [tf.clip_by_norm(g, 1e+1) for g in gradient_update_domain]
                gradient_update_border = [tf.clip_by_norm(g, 1e+1) for g in gradient_update_border]
                gradient_update_regularization = [tf.clip_by_norm(g, 1e+1) for g in gradient_update_regularization]
        elif (self._clip_gradient == None):
            pass
        else:
            print('Invalid clipping option, defaulting to global.')
            self._clip_gradient = 'global'

        # Applies gradient normalization regularization.
        if (normalize_gradient == True):
            norm = tf.linalg.global_norm(gradient_update)
            gradient_update = [g/norm for g in gradient_update]

        # Rescale Gradient Regularization (always on when border data and the split gradients are available).
        if (split_gradient == True and border_data is not None):
            for layer_num in range(self._num_hidden_layers+2):
                weight_norm_domain = tf.norm(gradient_update_domain[2*layer_num],
                                             ord = 'euclidean',
                                             axis = 0)
                bias_norm_domain = tf.norm(gradient_update_domain[2*layer_num+1],
                                           ord = 'euclidean',
                                           axis = 0)
                weight_norm_border = tf.norm(gradient_update_border[2*layer_num],
                                             ord = 'euclidean',
                                             axis = 0)
                bias_norm_border = tf.norm(gradient_update_border[2*layer_num+1],
                                           ord = 'euclidean',
                                           axis = 0)
                weight_norm_regularization = tf.norm(gradient_update_regularization[2*layer_num],
                                                     ord = 'euclidean',
                                                     axis = 0)
                bias_norm_regularization = tf.norm(gradient_update_regularization[2*layer_num+1],
                                                   ord = 'euclidean',
                                                   axis = 0)

                # Target norm: the smaller of the domain and border gradient norms.
                weight_norm = tf.minimum(weight_norm_domain, weight_norm_border)
                bias_norm = tf.minimum(bias_norm_domain, bias_norm_border)

                gradient_update_domain[2*layer_num] = (gradient_update_domain[2*layer_num]
                                                       *weight_norm/(weight_norm_domain+1e-31))
                gradient_update_domain[2*layer_num+1] = (gradient_update_domain[2*layer_num+1]
                                                         *bias_norm/(bias_norm_domain+1e-31))

                gradient_update_border[2*layer_num] = (gradient_update_border[2*layer_num]
                                                       *weight_norm/(weight_norm_border+1e-31))
                gradient_update_border[2*layer_num+1] = (gradient_update_border[2*layer_num+1]
                                                         *bias_norm/(bias_norm_border+1e-31))

                gradient_update_regularization[2*layer_num] = (gradient_update_regularization[2*layer_num]
                                                               *weight_norm/(weight_norm_regularization+1e-31))
                # The bias gradient is rescaled with the bias norm (the listing used weight_norm here).
                gradient_update_regularization[2*layer_num+1] = (gradient_update_regularization[2*layer_num+1]
                                                                 *bias_norm/(bias_norm_regularization+1e-31))

                gradient_update[2*layer_num] = (self._scale_factor*gradient_update_domain[2*layer_num]
                                                + gradient_update_border[2*layer_num]
                                                + self._regularization_coeff*gradient_update_regularization[2*layer_num])
                gradient_update[2*layer_num+1] = (self._scale_factor*gradient_update_domain[2*layer_num+1]
                                                  + gradient_update_border[2*layer_num+1]
                                                  + self._regularization_coeff*gradient_update_regularization[2*layer_num+1])

        # Displays the gradient(s) if the option is selected (requires the split gradients).
        if (display_gradient_norm == True and split_gradient == True):
            total_norm = tf.linalg.global_norm(gradient_update)
            domain_norm = tf.linalg.global_norm(gradient_update_domain)
            border_norm = tf.linalg.global_norm(gradient_update_border)
            regularization_norm = tf.linalg.global_norm(gradient_update_regularization)
            print(' Total Gradient Norm', str(total_norm.numpy()))
            print(' Domain Gradient Norm', str(domain_norm.numpy()))
            print(' Border Gradient Norm', str(border_norm.numpy()))
            print(' Regularization Gradient Norm', str(regularization_norm.numpy()))
            for layer_num in range(self._num_hidden_layers+2):
                print(' ', self._layers[layer_num]._name, ' Total Weight Gradient Norm: ',
                      str(tf.norm(gradient_update[2*layer_num],
                                  ord = 'euclidean', axis = 1).numpy()))
                print(' ', self._layers[layer_num]._name, ' Domain Weight Gradient Norm: ',
                      str(tf.norm(gradient_update_domain[2*layer_num],
                                  ord = 'euclidean', axis = 1).numpy()))
                print(' ', self._layers[layer_num]._name, ' Border Weight Gradient Norm: ',
                      str(tf.norm(gradient_update_border[2*layer_num],
                                  ord = 'euclidean', axis = 1).numpy()))
                print(' ', self._layers[layer_num]._name, ' Regularization Weight Gradient Norm: ',
                      str(tf.norm(gradient_update_regularization[2*layer_num],
                                  ord = 'euclidean', axis = 1).numpy()))
                print(' ', self._layers[layer_num]._name, ' Total Bias Gradient Norm: ',
                      str(tf.norm(gradient_update[2*layer_num+1],
                                  ord = 'euclidean', axis = 0).numpy()))
                print(' ', self._layers[layer_num]._name, ' Domain Bias Gradient Norm: ',
                      str(tf.norm(gradient_update_domain[2*layer_num+1],
                                  ord = 'euclidean', axis = 0).numpy()))
                print(' ', self._layers[layer_num]._name, ' Border Bias Gradient Norm: ',
                      str(tf.norm(gradient_update_border[2*layer_num+1],
                                  ord = 'euclidean', axis = 0).numpy()))
                print(' ', self._layers[layer_num]._name, ' Regularization Bias Gradient Norm: ',
                      str(tf.norm(gradient_update_regularization[2*layer_num+1],
                                  ord = 'euclidean', axis = 0).numpy()))

        return gradient_update

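The gradient rescaling step in `back_propagation` brings the domain and border gradients of each layer to the smaller of their two norms before they are added, so that neither loss term dominates the update. The sketch below shows the same idea on two flat gradient vectors using plain Python lists; it ignores the regularization term and the per-column norms (axis 0) used in the listing, and the names `norm` and `rescale_and_combine` as well as the sample gradients are hypothetical.

```python
import math

def norm(v):
    # Euclidean norm of a flat vector.
    return math.sqrt(sum(x * x for x in v))

def rescale_and_combine(grad_domain, grad_border, scale_factor=1.0, eps=1e-31):
    """Rescale both gradients to the smaller of their norms, then add them."""
    n_dom = norm(grad_domain)
    n_bor = norm(grad_border)
    target = min(n_dom, n_bor)
    # eps guards against division by zero, as in the listing.
    dom = [g * target / (n_dom + eps) for g in grad_domain]
    bor = [g * target / (n_bor + eps) for g in grad_border]
    combined = [scale_factor * d + b for d, b in zip(dom, bor)]
    return dom, bor, combined

# Hypothetical gradients: the domain term is 50 times larger than the border term.
dom, bor, combined = rescale_and_combine([30.0, 40.0], [0.0, 1.0])
```

After the rescaling both vectors have norm 1 here, so the combined update weighs the two directions equally; the `scale_factor` argument plays the role of `self._scale_factor`, which can tilt the balance back toward the domain term if desired.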
    ####################
    # Applies an optimization step.
    ####################
    def apply_training_step(self,
                            inputs,
                            border_data,
                            split_gradient = False,
                            display_gradient_norm = False,
                            normalize_gradient = False,
                            train_only_domain = False,
                            train_only_border = False):

        if (self._optimizer_selection != 'L-BFGS' and self._optimizer_selection != 'BFGS'):
            gradient_update = self.back_propagation(inputs = inputs,
                                                    border_data = border_data,
                                                    is_training = True,
                                                    split_gradient = split_gradient,
                                                    display_gradient_norm = display_gradient_norm,
                                                    normalize_gradient = normalize_gradient,
                                                    train_only_domain = train_only_domain,
                                                    train_only_border = train_only_border)
            if (train_only_domain == True):
                self._optimizer1.apply_gradients(zip(gradient_update, self._trainable_weights))
            elif (train_only_border == True):
                self._optimizer2.apply_gradients(zip(gradient_update, self._trainable_weights))
            else:
                self._optimizer1.apply_gradients(zip(gradient_update, self._trainable_weights))
        else:
            self._optimizer.apply_gradients()

        # Applies clipping regularization to bound the weights.
        for layer_num in range(self._num_hidden_layers+2):
            self._trainable_weights[2*layer_num].assign(
                tf.clip_by_value(self._trainable_weights[2*layer_num],
                                 clip_value_min = -1e+5,
                                 clip_value_max = +1e+5))
            self._trainable_weights[2*layer_num+1].assign(
                tf.clip_by_value(self._trainable_weights[2*layer_num+1],
                                 clip_value_min = -1e+5,
                                 clip_value_max = +1e+5))

            # Clip by magnitude
            #weight_tensor = self._trainable_weights[2*layer_num]
            #weight_sign = tf.math.sign(self._trainable_weights[2*layer_num])
            #clipped_weight_tensor = tf.clip_by_value(tf.abs(weight_tensor),
            #                                         clip_value_min = 1e-3,
            #                                         clip_value_max = 1e+2)
            #self._trainable_weights[2*layer_num].assign(weight_sign * clipped_weight_tensor)
            #
            #bias_tensor = self._trainable_weights[2*layer_num+1]
            #bias_sign = tf.math.sign(self._trainable_weights[2*layer_num+1])
            #clipped_bias_tensor = tf.clip_by_value(tf.abs(bias_tensor),
            #                                       clip_value_min = 1e-3,
            #                                       clip_value_max = 1e+2)
            #self._trainable_weights[2*layer_num+1].assign(bias_sign * clipped_bias_tensor)

    ####################
    # Loads the training sets to train the network.
    ####################
    def use_training_sets(self,
                          data_set):

        if (data_set == None):
            raise Exception("No data set loaded.")

        (training_batch_size, border_training_batch_size, validation_batch_size,
         input_dim, method, domain, border) = data_set.get_set_metadata()
[45] R. Bollapragada, D. Mudigere, J. Nocedal, H.-J. M. Shi, and P. Tang, “A progressive batching L-BFGS method for machine learning,” in ICML, 2018.
[46] J. Martens, “Deep learning via Hessian-free optimization,” in ICML (J. Fürnkranz and T. Joachims, eds.), pp. 735–742, Omnipress, 2010.
[47] P. Ramachandran, B. Zoph, and Q. V. Le, “Searching for activation functions,” ArXiv, vol. abs/1710.05941, 2018.
[48] S. Elfwing, E. Uchibe, and K. Doya, “Sigmoid-weighted linear units for neural network function approximation in reinforcement learning,” Neural Networks: the official journal of the International Neural Network Society, vol. 107, pp. 3–11, 2018.
[49] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proceedings of the 27th International Conference on Machine Learning (ICML-10) (J. Fürnkranz and T. Joachims, eds.), pp. 807–814, 2010.
[50] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” vol. 9 of Proceedings of Machine Learning Research, (Chia Laguna Resort, Sardinia, Italy), pp. 249–256, JMLR Workshop and Conference Proceedings, 13–15 May 2010.
[51] K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), December 2015.
[52] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” Journal of Machine Learning Research, vol. 15, no. 56, pp. 1929–1958, 2014.
[53] J. Ba, J. Kiros, and G. E. Hinton, “Layer normalization,” ArXiv, vol. abs/1607.06450, 2016.
[54] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, “Greedy layer-wise training of deep networks,” in NIPS, 2006.
[55] A. Andoni, R. Panigrahy, G. Valiant, and L. Zhang, “Learning polynomials with neural networks,” in ICML, 2014.
[56] Z.-Q. J. Xu, “A note of using Tensorflow to code Laplacian operator in high dimension.” https://ins.sjtu.edu.cn/people/xuzhiqin/pub/laplaciancode.pdf. Accessed: 2020-09-01.