
Boosted Neural Networks for Tabular Regression

Author: Musaliyarakath, Rizeen; A.S Abbas, Jessica
Publisher: Zenodo
DOI: 10.5281/zenodo.17721216
Source: https://zenodo.org/records/17721216/files/SRP.pdf
Student Research Project
Research Topic: Boosted Neural Networks for Tabular Regression
Authors: 1747556 Rizeen Musaliyarakath, 1748971 Jessica A.S Abbas, 1749059 Gaya a Gunaseke a
Supervisor: Kiran Madhusudhanan
15th March 2025
Contents
I Abstract
1 Introduction
I Motivation
I.1 Problem Setting
I.2 Research Idea
I.3 Objective
2 Related Works
I The Boosting Framework
II Gradient-Boosted Decision Trees (GBDTs)
III Boosted Neural Network Architectures
3 Methodology
I Integrated NODE + PLE Architecture
II Boosted Fully Connected Networks (BFCN)
III Boosted Residual Networks
4 Experiments and Results
I Datasets
II Analysis of Results
5 Conclusion
I Contributions
I Abstract
Tabular data represent one of the most prevalent forms of data in machine learning contexts. Despite recent advancements in using neural nets (NNs) to handle tabular data, there remains an active and ongoing debate about whether NNs outperform gradient-boosted decision trees (GBDTs) when examined with respect to tabular data, with some recent work suggesting either that GBDTs are consistently better than NNs or that NNs are consistently better than GBDTs. In this work, we attempt to bridge this gap by exploring various boosted neural network architectures on tabular data. We propose and evaluate three frameworks: Integrated Neural Oblivious Decision Ensembles (NODE) with Piecewise Linear Encoding (PLE), Boosted Fully Connected Networks (BFCN), and Boosted Residual Networks. Our experimental results reveal a mixed performance landscape for the proposed boosted neural network architectures when compared to established baselines. Against gradient-boosted decision trees, NODE+PLE demonstrates competitive performance primarily on regression tasks. However, significant underperformance is evident across classification tasks, with particularly poor results on HELOC and moderate performance on the Adult dataset. BFCN consistently underperforms GBDTs across all metrics, failing to achieve competitive results on any dataset. When evaluated against deep learning baselines, NODE+PLE shows more promising results, achieving state-of-the-art performance on California Housing regression and competitive accuracy on Adult classification. The results underscore the persistent challenge of achieving GBDT-level performance with neural architectures on tabular data, while simultaneously demonstrating that boosted neural networks can advance the state-of-the-art within the deep learning paradigm for specific dataset characteristics.
Keywords
Deep Learning (DL), Neural Oblivious Decision Ensembles (NODE), Deep Neural Networks (DNNs), Piecewise Linear Encoding (PLE), Gradient-boosted decision trees (GBDTs), Boosted Dynamic Neural Networks (BoostNet)
Codebase: https://www.uni-hildesheim.de/gitlab/srp-group
Chapter 1
Introduction
I Motivation
Deep neural networks (DNNs) have achieved exceptional performance across a wide array of domains, including computer vision, natural language processing, and speech recognition, particularly when dealing with homogeneous data such as images, audio, and text [1, 8, 11]. However, their effectiveness on heterogeneous tabular data remains a significant and persistent challenge [14, 46, 37]. Tabular data, unlike image or language data, is inherently heterogeneous, often comprising a mix of dense numerical and sparse categorical features, with correlations among these features typically weaker and more irregular [7]. This challenge is critical because tabular data represents the most commonly used form of data and is indispensable for numerous vital and computationally demanding applications [3, 7, 46].
Despite the proven success of DNNs in other domains, gradient-boosted decision trees (GBDTs), such as XGBoost [9], LightGBM [27], and CatBoost [38], still largely outperform deep learning models on supervised learning tasks involving tabular data [7]. This indicates a potential stagnation in research progress for competitive deep learning models in this domain [7]. Empirical comparisons often show that GBDTs offer superior accuracy, training efficiency, inference speed, and hyperparameter optimization time [7]. There is an active debate on whether neural networks or GBDTs generally outperform each other on tabular data, with various works arguing for [4, 26, 30, 36, 41] or against [7, 20, 21, 46] NNs. However, McElfresh et al. [32] reported that this "NN vs. GBDT" debate may be overemphasized, as for a significant number of datasets, either the performance difference is negligible, or light hyperparameter tuning on a GBDT is more impactful than the choice between NN and GBDT.
Nevertheless, deep neural networks possess several inherent advantages that make their adaptation to tabular data a compelling research direction. DNNs are highly flexible [42], allow for efficient and iterative training, and are particularly valuable in AutoML contexts [23, 45]. Furthermore, neural networks can be deployed for multimodal learning problems where tabular data serves as one input modality [45], for tabular data distillation [33, 31], and in federated learning scenarios [40].
Given the persistent performance gap and the unique characteristics of tabular data, this project aims to bridge the divide by exploring and enhancing boosted neural network frameworks for tabular data prediction. The core idea is to investigate how an ensemble approach to modeling neural networks can adapt the strengths of traditional tree-boosted models to achieve improved predictive performance.
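I.1 Problem Setting
We are given a dataset $X \in \mathbb{R}^{M \times N}$ and corresponding labels $y \in \mathbb{R}^{1 \times N}$, where $N$ represents the number of training instances and $M$ represents the number of input features. Let $\ell : \mathbb{R}^N \times \mathbb{R}^N \to \mathbb{R}$ denote a loss function, and let $f_\theta : \mathbb{R}^M \to \mathbb{R}$ represent a neural network parameterized by learnable parameters $\theta$.
Our objective is to find the optimal parameter vector $\theta^*$ that minimizes the empirical risk:

$$\theta^* = \arg\min_{\theta} L(\theta) = \arg\min_{\theta} \ell\big(y, f_\theta(X)\big)$$

where $f_\theta(X) \in \mathbb{R}^N$ represents the network's predictions over all training instances, and $\theta$ encompasses all trainable parameters including weights and biases across all layers of the neural network.
The loss function is defined for the different task types as follows, where $n$ is the number of instances and $\hat{y}_j$ is the prediction for instance $j$:
Regression: Mean Squared Error (MSE) is applied

$$L_{MSE} = \frac{1}{n} \sum_{j=1}^{n} (y_j - \hat{y}_j)^2.$$

Binary Classification: Binary Cross-Entropy Loss

$$L_{BCE} = -\frac{1}{n} \sum_{j=1}^{n} \big[\, y_j \log \hat{y}_j + (1 - y_j) \log(1 - \hat{y}_j) \,\big].$$

Multiclass classification: Cross-Entropy Loss

$$L_{CE} = -\sum_{j=1}^{n} y_j \log \hat{y}_j$$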
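The three losses above can be written out directly. The following is a minimal pure-Python sketch, not the project's implementation: the function names, the clipping constant `eps`, and the one-hot layout assumed for the multiclass labels are illustrative choices that do not come from the report.

```python
import math

def mse(y, y_hat):
    """Mean squared error: (1/n) * sum_j (y_j - y_hat_j)^2."""
    n = len(y)
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / n

def binary_cross_entropy(y, p, eps=1e-12):
    """Binary cross-entropy; p holds predicted probabilities of class 1."""
    n = len(y)
    total = 0.0
    for yj, pj in zip(y, p):
        pj = min(max(pj, eps), 1.0 - eps)  # clip to avoid log(0) (assumed safeguard)
        total += yj * math.log(pj) + (1 - yj) * math.log(1.0 - pj)
    return -total / n

def cross_entropy(y_onehot, probs, eps=1e-12):
    """Multiclass cross-entropy, summed over instances as in the formula above.

    Assumes each label is a one-hot list and each prediction a probability list.
    """
    total = 0.0
    for yj, pj in zip(y_onehot, probs):
        total += sum(t * math.log(max(q, eps)) for t, q in zip(yj, pj))
    return -total
```

Note that, following the formula in the text, the multiclass loss is summed rather than averaged over instances.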
I.2 Research Idea
Gradient boosting techniques, particularly models like XGBoost, LightGBM, and CatBoost, have become the established standard for tabular data prediction, consistently achieving state-of-the-art performance [7, 32]. This stands in contrast to the exceptional performance of deep neural networks (DNNs) in other homogeneous data domains such as computer vision and natural language processing [48, 18]. The effectiveness of DNNs on heterogeneous tabular data, which often consists of a mix of dense numerical and sparse categorical features with weaker and more irregular correlations, remains a significant challenge [43, 52, 48]. Indeed, tabular datasets have been called the "last 'unconquered castle'" for deep neural network models [7]. While the "NN vs. GBDT" debate can be overemphasized for many datasets where performance differences are negligible or hyperparameter tuning is more impactful, GBDTs are generally better at handling skewed or heavy-tailed feature distributions and other data irregularities, and tend to perform better on large datasets [32].
This project is grounded in the hypothesis that employing neural networks within a boost-like structure can enhance their predictive accuracy on tabular datasets. Specifically, we propose to investigate the possibility of combining structured feature engineering, such as Piecewise Linear Encoding (PLE) [19], with hierarchical, boost-like learning principles through neural networks. We aspire to explore how a thoughtful combination of neural networks in a consecutive boosting format can adapt the strengths of traditional tree-boosted models, aiming for improved predictive performance over conventional methods for tabular data prediction. The ultimate goal is to demonstrate that architectures employing these boosted neural network frameworks could potentially surpass the performance of both traditional GBDTs and standalone deep neural networks, offering versatile and potentially interpretable solutions to complex tabular data challenges.
I.3 Objective
The primary objective of this study is to explore and advance the use of boosted neural network techniques for tabular data prediction. To achieve this, we define the following goals:
• Conduct a thorough review of recent advancements in boosted neural network architectures and ensemble strategies.
• Reproduce and validate the reported results of these approaches to establish a reliable baseline.
• Adapt the identified neural network architectures specifically for tabular data applications where applicable.
• Benchmark and evaluate the adapted models against state-of-the-art gradient-boosted tree algorithms as well as existing neural network baselines.
• Enhance model performance through structured feature engineering and optimization of ensemble configurations.
• Analyze the resulting performance to develop a deeper understanding of the suitability, strengths, and limitations of boosted neural networks for tabular data.
Chapter 2
Related Works
The concept of boosting, an ensemble algorithm that combines multiple weak learners into a single strong learner, has significantly influenced predictive performance on tabular data. This section reviews the foundational gradient-boosted decision tree models and the main neural architectures that inform our approach, by considering the models that incorporate boosting-inspired methodologies [32].
I The Boosting Framework
The algorithmic foundation of boosting was established with the AdaBoost (Adaptive Boosting) algorithm by Freund & Schapire [16]. AdaBoost works by adaptively changing the weights of training instances, making subsequent weak learners (e.g., decision stumps) focus on previously misclassified examples. This concept was expanded by Friedman [17] with the introduction of gradient boosting, which models boosting as a numerical optimization problem in function space. Here, the model is built sequentially in a greedy, stage-wise manner where each new learner is trained to minimize the loss by correcting the errors of its predecessors. The core idea is to iteratively add weak learners to an ensemble, with each new model trained to predict the negative gradients (the "pseudo-residuals") of the current ensemble's loss.
Formally, given a differentiable loss function $L(y, F(x))$, the gradient boosting algorithm proceeds as follows [17]:
1. Initialize the model with a constant value:

$$F_0(x) = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma) \tag{2.1}$$

2. For $m = 1$ to $M$ (number of boosting rounds), perform the following steps:
(a) Compute the pseudo-residuals for each instance $i$:

$$r_i^{(m)} = -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F(x) = F_{m-1}(x)} \tag{2.2}$$

This equation defines the "error" that the new weak learner must fit.
(b) Fit a weak learner $h_m(x)$ (e.g., a decision tree) to the pseudo-residuals $\{r_i^{(m)}\}$.
(c) Find the optimal step size $\gamma_m$ via line search:

$$\gamma_m = \arg\min_{\gamma} \sum_{i=1}^{n} L\big(y_i, F_{m-1}(x_i) + \gamma h_m(x_i)\big) \tag{2.3}$$

(d) Update the model:

$$F_m(x) = F_{m-1}(x) + \nu \cdot \gamma_m h_m(x) \tag{2.4}$$

Here, $\nu \in (0, 1]$ is the shrinkage or learning rate, a hyperparameter that controls the contribution of each weak learner to prevent overfitting [17].
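To make steps 1 and 2(a) to 2(d) concrete, here is a minimal pure-Python sketch for one special case: the squared loss $L = \tfrac{1}{2}(y - F)^2$ on a single numerical feature, with decision stumps as weak learners. These choices, and all function names, are assumptions made for illustration. Under this loss the pseudo-residuals reduce to $y_i - F_{m-1}(x_i)$ and the line search has a closed form.

```python
def fit_stump(x, r):
    """Least-squares stump on one feature: returns (threshold, left_value, right_value)."""
    mean_r = sum(r) / len(r)
    best = (float("inf"), x[0], mean_r, mean_r)  # fallback: constant prediction
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2  # candidate threshold between two distinct values
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
        if err < best[0]:
            best = (err, t, lv, rv)
    return best[1:]

def stump_predict(stump, xi):
    t, lv, rv = stump
    return lv if xi <= t else rv

def gradient_boost(x, y, rounds=60, nu=0.5):
    """Gradient boosting for squared loss; nu is the shrinkage from Eq. (2.4)."""
    f0 = sum(y) / len(y)  # step 1: the mean minimises the squared loss
    pred = [f0] * len(y)
    model = []
    for _ in range(rounds):  # step 2
        r = [yi - pi for yi, pi in zip(y, pred)]     # (a) pseudo-residuals
        stump = fit_stump(x, r)                       # (b) fit weak learner
        h = [stump_predict(stump, xi) for xi in x]
        denom = sum(hi * hi for hi in h)
        # (c) closed-form line search for squared loss: gamma = <r, h> / <h, h>
        gamma = sum(ri * hi for ri, hi in zip(r, h)) / denom if denom > 0 else 0.0
        pred = [pi + nu * gamma * hi for pi, hi in zip(pred, h)]  # (d) update
        model.append((gamma, stump))
    return f0, model, nu

def boosted_predict(fitted, xi):
    f0, model, nu = fitted
    return f0 + sum(nu * g * stump_predict(s, xi) for g, s in model)
```

For other losses, the residual computation in (a) and the line search in (c) would have to be replaced by the corresponding gradient and a numerical one-dimensional minimisation.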
II Gradient-Boosted Decision Trees (GBDTs)
GBDT models form the baselines for any research on tabular data regression. These models build an ensemble of decision trees iteratively and correct the errors made by the existing ensemble of trees. Each new tree is trained to model the gradient of the loss function [32]. Gradient-boosted decision tree (GBDT) models such as XGBoost [9], LightGBM [27], and CatBoost [38] are well known for their robust performance, efficiency, and ability to handle heterogeneous features, which has made them dominate the area of tabular data prediction.
XGBoost (Extreme Gradient Boosting) is an implementation of the gradient boosting framework that enhances the core GBDT algorithm through a second-order approximation of the loss function for more efficient boosting, regularization (L1/L2) to prevent overfitting, and a sparsity-aware algorithm that efficiently handles missing values. It learns the best direction to send a data point with a missing value at each split, allowing it to process sparse data without expensive preprocessing [9].
at the last layer,

$$F(x) = w^{\top} h_T(x),$$

where $w$ is the linear classifier attached to the output of the final block.
BoostResNet modifies this by decomposing the network into weak module classifiers. Each module consists of a residual block paired with its own linear classifier $w_t$. Formally, the module classifier at step $t$ is defined as

$$o_t(x) = w_t^{\top} h_t(x),$$

where $h_t(x)$ is the output of the residual mapping at that stage. The final prediction of the network is then represented as a telescoping sum of these module classifiers:

$$F(x) = \sum_{t=0}^{T} \alpha_t\, o_t(x),$$

with coefficients $\alpha_t$ chosen such that the ensemble exactly reconstructs the standard ResNet output [24].
To support the theory, Huang et al. (2018) conduct extensive experiments on CIFAR-10, CIFAR-100, and SVHN. The results show that BoostResNet "achieves test performance comparable to that of end-to-end ResNet training" while requiring substantially less GPU memory (p. 2065). Because the method trains blocks sequentially, only one shallow block needs to be loaded into memory at a time, making it efficient for very deep architectures. The experiments also confirm the theoretical predictions: as the number of modules increases, training error decreases steadily, and test accuracy improves correspondingly. Overall, BoostResNet offers a theoretically grounded reinterpretation of residual networks by formalizing them as a boosting ensemble, with a telescoping-sum representation that preserves the ResNet output. Its sequential training procedure provides both computational savings and provable convergence guarantees, positioning it as a unique contribution in the literature on deep residual learning.
Neural Oblivious Decision Ensembles (NODE)
Neural Oblivious Decision Ensembles (NODE) is a deep learning architecture that combines the strengths of tree-based models and deep neural networks. Its main innovation is the creation of a differentiable ensemble of oblivious decision trees (ODTs), which makes end-to-end training via gradient descent possible [36]. An Oblivious Decision Tree (ODT) is a type of decision tree in which all nodes at the same depth must use the same feature and the same threshold for splitting. For a tree of depth $d$, this homogeneity transforms the tree from a standard branching structure into a decision table with $2^d$ entries, where each entry represents a unique combination of binary decisions. While this constraint reduces the capacity of a single tree, it makes ensembles of ODTs highly efficient for inference and remarkably resistant to overfitting [36]. The CatBoost algorithm [38] uses ODTs as weak learners in gradient boosting, and this has contributed significantly to its success [38].
This idea is expanded upon in the NODE architecture, which generalizes ensembles of ODTs into a fully differentiable framework that can be trained end-to-end via gradient descent [36]. Several differentiable ODTs make up a NODE layer. The key to differentiability lies in replacing the hard, non-differentiable operations of a standard decision tree (feature selection and binary routing) with soft, learnable alternatives. A single NODE layer contains $m$ differentiable ODTs, and the forward pass of one tree is built to approximate the function of a classical ODT while maintaining differentiability. In a classical ODT, the non-differentiable output is given by:

$$h(x) = R\big(\mathbb{1}(f_1(x) - b_1), \mathbb{1}(f_2(x) - b_2), \ldots, \mathbb{1}(f_d(x) - b_d)\big) \tag{2.10}$$

where $\mathbb{1}(\cdot)$ is the Heaviside step function, $f_i$ is the selected feature at the $i$-th split, $b_i$ is the corresponding threshold, and $R$ is a $d$-dimensional response tensor that holds the leaf values. The hard selection of a single feature $f_i$ is replaced by a sparse, weighted combination of all features using the $\alpha$-entmax transformation [35] applied to a learnable feature selection matrix $F \in \mathbb{R}^{d \times n}$:

$$\hat{f}_i(x) = \sum_{j=1}^{n} x_j \cdot \mathrm{entmax}_{\alpha}(F_{ij}) \tag{2.11}$$

Here, $\alpha = 1.5$ is used to induce sparsity, ensuring the output closely mimics a hard feature selection. The classical Heaviside step function is replaced by a scaled, two-class variant of entmax, defined as:

$$\sigma_{\alpha}(x) = \mathrm{entmax}_{\alpha}([x, 0]) \tag{2.12}$$

The soft routing probability for the $i$-th split is then:

$$c_i(x) = \sigma_{\alpha}\!\left(\frac{\hat{f}_i(x) - b_i}{\tau_i}\right), \tag{2.13}$$

where the threshold $b_i$ and the scale $\tau_i$ are learnable parameters. The value $c_i(x) \in [0, 1]$ represents the soft probability of taking the right branch at the $i$-th split. The final tree output is computed as a weighted sum over all leaves. The weights are given by the outer product of the soft choice vectors of all $d$ splits, forming a "choice tensor" $C(x)$:

$$C(x) = \begin{bmatrix} c_1(x) \\ 1 - c_1(x) \end{bmatrix} \otimes \begin{bmatrix} c_2(x) \\ 1 - c_2(x) \end{bmatrix} \otimes \cdots \otimes \begin{bmatrix} c_d(x) \\ 1 - c_d(x) \end{bmatrix} \tag{2.14}$$

The final prediction of a single tree is:

$$\hat{h}(x) = \sum_{i_1, \ldots, i_d \in \{0,1\}^d} R_{i_1, \ldots, i_d} \cdot C_{i_1, \ldots, i_d}(x) \tag{2.15}$$

The output of a NODE layer is the concatenation of the outputs of all $m$ trees:

$$\big(\hat{h}_1(x), \hat{h}_2(x), \ldots, \hat{h}_m(x)\big)$$
Figure 2.4: Architecture of a single differentiable Oblivious Decision Tree (ODT) within a NODE layer. The splitting features and the splitting thresholds are shared across all the internal nodes of the same depth. The output is a sum of leaf responses scaled by the choice weights [36].
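The forward pass of one differentiable ODT, Eqs. (2.11) to (2.15), can be sketched in a few lines. As a loudly stated simplification, the sketch below swaps α-entmax for its α = 1 special case: softmax for feature selection and the logistic sigmoid for the two-class routing function. It therefore does not produce the sparse weights that α = 1.5 would. The function names and the dictionary layout of the response tensor `R` are illustrative assumptions.

```python
import math
from itertools import product

def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]

def sigmoid(z):
    # Numerically stable logistic function; equals softmax([z, 0])[0].
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_odt(x, F, b, tau, R):
    """Soft oblivious decision tree of depth d = len(F).

    x:   list of n feature values
    F:   d rows of n feature-selection logits
    b:   d thresholds, tau: d scales
    R:   dict mapping each index tuple in {0,1}^d to a leaf response
    """
    d = len(F)
    c = []
    for i in range(d):
        w = softmax(F[i])                                # soft feature selection
        f_hat = sum(xj * wj for xj, wj in zip(x, w))     # Eq. (2.11) with alpha = 1
        c.append(sigmoid((f_hat - b[i]) / tau[i]))       # Eq. (2.13) with alpha = 1
    out = 0.0
    for idx in product((0, 1), repeat=d):                # all 2^d leaves
        weight = 1.0
        for i, bit in enumerate(idx):
            # Index 0 selects c_i, index 1 selects 1 - c_i, as in Eq. (2.14).
            weight *= c[i] if bit == 0 else 1.0 - c[i]
        out += R[idx] * weight                           # Eq. (2.15)
    return out
```

Because the leaf weights form a product of two-element probability vectors, they always sum to one, so the output is a convex combination of the leaf responses.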
Multi-Layer Hierarchical Architecture. Several NODE layers stacked in a DenseNet-like fashion form the full NODE model [51], where each layer uses a concatenation of all previous layers. Thus, the input to the $k$-th layer is a concatenation of the original input features and the outputs from all previous layers 0 to $k-1$. Due to this design, the model is able to learn hierarchical feature interactions, where trees in deeper layers can learn complex rules based on high-level representations extracted by earlier layers. The final prediction is computed as the average of the outputs from all trees across all NODE layers:

$$\hat{y} = \frac{1}{K} \sum_{k=1}^{K} \hat{h}_k(x) \tag{2.16}$$

Figure 2.5: The multi-layer NODE architecture with DenseNet-style feature reuse across layers. The architecture consists of densely connected NODE layers. Each layer contains several trees whose outputs are concatenated and serve as input to the subsequent layer. The final prediction is obtained by averaging the outputs of all trees from all the layers [36].
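The dense stacking and final averaging can be illustrated independently of how each tree is implemented. In the sketch below, a "tree" is any callable that maps the current feature list to a number; this representation and the name `node_forward` are assumptions for illustration, with $K$ in Eq. (2.16) read as the total number of trees across all layers.

```python
def node_forward(x, layers):
    """Dense stacking of layers, then averaging of all tree outputs.

    x:      list of input features
    layers: list of layers, each a list of callables tree(features) -> float
    """
    features = list(x)
    all_outputs = []
    for trees in layers:
        outs = [tree(features) for tree in trees]
        all_outputs.extend(outs)
        # DenseNet-style connection: the next layer sees the original inputs
        # plus the outputs of every previous layer.
        features = features + outs
    return sum(all_outputs) / len(all_outputs)  # Eq. (2.16)
```

In a full model, each callable would be a differentiable ODT whose feature-selection matrix has one column per entry of the growing feature list.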
On Embeddings for Numerical Features in Tabular Deep Learning
This research addressed an underexplored aspect of deep learning (DL) for tabular data, namely the embedding of numerical features. The authors introduce two approaches for constructing embeddings of numerical features: Piecewise Linear Encoding (PLE) and Periodic Activation (P) functions [19]. The PLE method is inspired by classical feature binning techniques, where the value range of a numerical feature is divided into intervals (bins), and the feature values are encoded in a piecewise linear manner. The results of the research demonstrate that the technique helps to improve the performance of deep learning models on tabular data. This approach allows simple MLP models to compete with more complex Transformer-based models. The authors also show that integrating this approach into the deep learning pipeline produces state-of-the-art results in tabular deep learning, closing the performance gap with GBDTs [19].
Piecewise Linear Encoding (PLE):
The design of PLE is motivated by the limitations of deep learning on tabular data. While Multilayer Perceptrons are known to be universal approximators [24][19], their learning capabilities in practice are often hampered by optimization difficulties [39][19]. Recent work by Tancik et al. [47] demonstrated that transforming the input space can substantially alleviate these optimization issues. This finding directly inspires the core premise of PLE: that altering the representation of the original scalar numerical feature values can enhance the learning capabilities of tabular deep learning models [16]. The authors build on the one-hot encoding algorithm, a method that is widely and successfully used for representing discrete entities (e.g., categorical features, NLP tokens) [19]. The one-hot representation sits at the opposite end of the spectrum from a scalar representation in the trade-off between parameter efficiency and expressivity [19]. To test whether a one-hot-like approach could benefit deep learning models on numerical data, PLE is designed as a continuous alternative to one-hot encoding, making it applicable to numerical features [16][19].
For a given numerical feature $x$, PLE defines $T$ bins (intervals) with boundaries $b_0, b_1, \ldots, b_T$. The encoding transforms the scalar value $x$ into a $T$-dimensional vector [19], where the $t$-th element (bin) is calculated as:

$$e_t = \begin{cases} 0, & \text{if } x < b_{t-1} \text{ and } t > 1, \\ 1, & \text{if } x \ge b_t \text{ and } t < T, \\ \dfrac{x - b_{t-1}}{b_t - b_{t-1}}, & \text{otherwise.} \end{cases}$$

The full PLE encoding vector is:

$$\mathrm{PLE}(x) = [e_1, e_2, \ldots, e_T].$$
Figure 2.6: The Piecewise Linear Encoding (PLE) in action for $T = 4$ [19]. The scalar input value $x$ is mapped to a 4-dimensional vector $[e_1, e_2, e_3, e_4]$ based on its position within the bins, creating a structured and interpretable representation.
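The piecewise definition of $e_t$ translates directly into code. The following minimal sketch takes the boundaries $b_0, \ldots, b_T$ as a plain list; the function name `ple_encode` is an illustrative choice and not taken from the report.

```python
def ple_encode(x, boundaries):
    """Piecewise linear encoding of a scalar x given boundaries b_0 < ... < b_T."""
    T = len(boundaries) - 1
    encoding = []
    for t in range(1, T + 1):
        lo, hi = boundaries[t - 1], boundaries[t]
        if x < lo and t > 1:
            encoding.append(0.0)                    # bin lies entirely above x
        elif x >= hi and t < T:
            encoding.append(1.0)                    # bin lies entirely below x
        else:
            encoding.append((x - lo) / (hi - lo))   # x falls in (or beyond) this bin
    return encoding
```

For example, with boundaries `[0, 1, 2, 3, 4]` the value `2.5` fills the first two bins completely and the third bin halfway. Because of the conditions `t > 1` and `t < T`, values outside the boundary range are extrapolated linearly in the first or last bin rather than clipped.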
The authors rely on classic binning algorithms [12]. One of the two algorithms is unsupervised, while the other utilizes labels for constructing bins.
Obtaining bins from quantiles (Unsupervised Binning): A natural baseline way to construct the bins is by splitting value ranges according to uniformly chosen empirical quantiles of the corresponding individual feature distributions [19]:

$$b_t = q_{t/T}\big(\{x_i^j\}\big) \quad \text{for } j \in \text{training set},$$

where $q_{t/T}$ denotes the empirical $t/T$-quantile. Trivial bins of zero size are removed.
Supervised Target-aware Binning (Building target-aware bins): This supervised approach employs training labels for constructing bins, identical in spirit to the C4.5 Discretization [29] algorithm [19]. For each feature, we recursively split its value range in a greedy manner using the target as guidance. This is equivalent to building a decision tree (which uses only this one feature and the target for growing) and treating the regions corresponding to its leaves as the bins for PLE [19]. We define

$$b_0^i = \min_{j \in J_{train}} x_i^j \quad \text{and} \quad b_T^i = \max_{j \in J_{train}} x_i^j$$

[?].
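The unsupervised variant can be sketched with Python's standard library. The helper name `quantile_bin_boundaries` is hypothetical, and the choice of the `inclusive` interpolation method of `statistics.quantiles` is an assumption, since the text does not fix a particular empirical quantile definition.

```python
import statistics

def quantile_bin_boundaries(values, T):
    """Boundaries b_0..b_T from uniformly spaced empirical quantiles of one feature."""
    # statistics.quantiles returns the T - 1 inner cut points q_{1/T}, ..., q_{(T-1)/T}.
    inner = statistics.quantiles(values, n=T, method="inclusive")
    boundaries = [min(values)] + inner + [max(values)]
    # Dropping duplicate boundaries removes trivial bins of zero size.
    return sorted(set(boundaries))
```

For heavily repeated values, several quantiles coincide and the function returns fewer than $T + 1$ boundaries, which corresponds to the removal of zero-size bins described above.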
Periodic Activation Function (P)
This design projects scalar values into a periodic space using learnable frequencies [47]. For a feature $x$, the embedding is constructed as:

$$f_i(x) = \mathrm{Periodic}(x) = \mathrm{concat}\big(\sin(v), \cos(v)\big), \tag{2.17}$$

where

$$v = (2\pi c_1 x,\; 2\pi c_2 x,\; \ldots,\; 2\pi c_k x), \tag{2.18}$$

and $c_i$ are trainable parameters initialized from a normal distribution

$$c_i \sim \mathcal{N}(0, \sigma). \tag{2.19}$$

The hyperparameters $\sigma$ (initial frequency scale) and $k$ (number of frequencies) are crucial and are tuned on the validation set.
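A minimal sketch of Eqs. (2.17) to (2.19) follows. It only covers initialization and the forward computation, not training of the frequencies. The function names, the fixed seed, and the reading of $\sigma$ as a standard deviation (rather than a variance) are assumptions.

```python
import math
import random

def init_frequencies(k, sigma, seed=0):
    """Draw k initial frequencies c_i from N(0, sigma), sigma taken as std. dev."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(k)]

def periodic_embedding(x, c):
    """Map scalar x to concat(sin(v), cos(v)) with v_i = 2 * pi * c_i * x."""
    v = [2.0 * math.pi * ci * x for ci in c]
    return [math.sin(a) for a in v] + [math.cos(a) for a in v]
```

The embedding has dimension $2k$, and a larger $\sigma$ produces higher initial frequencies, which is why the text singles out both hyperparameters for tuning.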
Chapter 3
Methodology
This project aims to bridge the performance gap between deep learning models and Gradient Boosted Decision Trees (GBDTs) on heterogeneous tabular data. To achieve this, we explore two directions:
• Integrating a deep learning architecture that mimics tree ensembles and introducing a novel method for representing numerical features into this architecture. The core of our methodology is the integration of Piecewise Linear Encoding (PLE) into the Neural Oblivious Decision Ensembles (NODE) architecture, creating a powerful and differentiable model for tabular regression and classification.
• Adapting deep neural network models to be more competitive with gradient-boosted decision tree models (GBDTs) on heterogeneous tabular data.
I Integrated NODE + PLE Architecture
The integration of the piecewise linear encoding (PLE) of numerical embeddings with the Neural Oblivious Decision Ensembles (NODE) architecture is an innovation of this work, with the aim of addressing the limitations of deep learning on tabular data. While the components are powerful individually, we believe their combination will result in a more robust model.
The integrated architecture leverages the strengths of both NODE and PLE to process data. Each numerical feature $x_i$ is processed separately by its own PLE module. Based on the chosen strategy (unsupervised or supervised PLE), the scalar value is transformed into a high-dimensional, piecewise-linear representation $\mathrm{PLE}(x_i) \in \mathbb{R}^T$. This vector is then passed through a feature-specific linear layer to obtain a final dense embedding [37]:

$$e_i^{num} = W_i \cdot \mathrm{PLE}(x_i) + b_i.$$

Each categorical feature is processed through a standard embedding layer, mapping each category to a dense vector $e_j^{cat}$. All resulting numerical embeddings $e_i^{num}$ and categorical embeddings $e_j^{cat}$ are concatenated to form a joint, rich input representation vector $z$. The vector $z$ is fed into the multi-layer NODE architecture [37]. The differentiable oblivious decision trees within each NODE layer now operate on this pre-enriched embedding space rather than on raw, normalized scalars. The DenseNet-like structure allows subsequent layers to use the transformed, high-level features learned by earlier layers, enabling the learning of complex hierarchical interactions between the PLE-encoded features. The final output is a simple average of the outputs from all trees across all NODE layers.
Figure 3.1: Architecture of Integrated NODE + PLE. From the top left, input features are encoded via Piecewise Linear Encoding (PLE). All embeddings are concatenated and processed by a multi-layer NODE architecture, which learns hierarchical interactions through its ensembles of differentiable oblivious decision trees. The final prediction is an average of all tree outputs [37].
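The construction of the joint input vector $z$ can be sketched as follows. This is an illustration under stated assumptions rather than the project's code: the function names are hypothetical, weights are passed in as plain nested lists instead of being learned, and categories are assumed to be integer ids that index rows of an embedding table.

```python
def ple_encode(x, boundaries):
    """Piecewise linear encoding of scalar x for boundaries b_0 < ... < b_T."""
    T = len(boundaries) - 1
    enc = []
    for t in range(1, T + 1):
        lo, hi = boundaries[t - 1], boundaries[t]
        if x < lo and t > 1:
            enc.append(0.0)
        elif x >= hi and t < T:
            enc.append(1.0)
        else:
            enc.append((x - lo) / (hi - lo))
    return enc

def linear(W, v, b):
    """Dense layer W v + b with W given as a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) + bi for row, bi in zip(W, b)]

def build_node_input(num_values, num_boundaries, num_weights, num_biases,
                     cat_ids, cat_tables):
    """Concatenate per-feature numerical embeddings and categorical embeddings into z."""
    z = []
    for x, bounds, W, b in zip(num_values, num_boundaries, num_weights, num_biases):
        z.extend(linear(W, ple_encode(x, bounds), b))   # e_i^num = W_i PLE(x_i) + b_i
    for cid, table in zip(cat_ids, cat_tables):
        z.extend(table[cid])                            # e_j^cat: row lookup
    return z
```

The resulting list `z` is what the first NODE layer would receive in place of the raw, normalized scalars.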
Why PLE for NODE Integration
The choice of Piecewise Linear Encoding (PLE) over other embeddings for integration with NODE is due to a shared inductive bias. Both methods are fundamentally based on the concept of splitting data on feature thresholds. This is the core operational principle that makes tree-based models like Gradient Boosted Decision Trees (GBDTs) powerful on tabular data. NODE explicitly mimics this principle by constructing a differentiable ensemble of oblivious decision trees that learn differentiable splits. PLE directly encodes this principle into the input representation. It transforms a scalar value into a vector based on its position within learned bins (or intervals), defined by thresholds $(b_0, b_1, \ldots, b_T)$.
Therefore, PLE provides an input signal that is already structured in the form that NODE is designed to understand. The tree-like output of PLE is a fit for the tree-based learning of NODE, creating a more powerful and aligned model than with other, less compatible embeddings.
Why This Combination is Innovative
Previous attempts to make deep learning competitive with GBDTs focused primarily on designing novel backbone architectures (e.g., Transformers [48, 20], other attention mechanisms) or on creating completely differentiable tree structures (e.g., NODE [37]). This work is unique in focusing on the critical but underexplored input representation layer for numerical features. The innovation lies in recognizing that:
• The input representation (simple scalars) is a key bottleneck.
• A technique (PLE) exists to create a much richer representation.
• A specific architecture (NODE) exists that is perfectly suited to exploit this richer representation due to its tree-like nature.
By integrating PLE with NODE, we aim to create a unified architecture that directly addresses the core weaknesses of Deep Neural Networks on tabular data, pushing their performance closer to and beyond that of state-of-the-art GBDTs.
II Boosted Fully Connected Networks (BFCN)
A concerted attempt was made to extend the advantages of dynamic inference to heterogeneous tabular data by integrating the Boosted Dynamic Neural Network (BoostNet) model into a fully-connected network framework, which will be referred to as Boosted Fully Connected Networks (BFCN). This modification was essential to resolving the well-known issues that deep neural networks encounter when working with tabular datasets,
Table 4.3: Performance metrics of NODE + PLE and GBDTs and Machine Learning Models

| Method | HELOC Acc↑ | HELOC AUC↑ | Adult Acc↑ | Adult AUC↑ | HIGGS Acc↑ | HIGGS AUC↑ | Covertype Acc↑ | Covertype AUC↑ | Cal. Housing MSE↓ |
|---|---|---|---|---|---|---|---|---|---|
| Linear Model | 73.0±0.0 | 80.1±0.1 | 82.5±0.2 | 85.4±0.2 | 64.1±0.0 | 68.4±0.0 | 72.4±0.0 | 92.8±0.0 | 0.528±0.008 |
| KNN | 72.2±0.0 | 79.0±0.1 | 83.2±0.2 | 87.5±0.2 | 62.3±0.1 | 67.1±0.0 | 70.2±0.1 | 90.1±0.2 | 0.421±0.009 |
| Decision Tree | 80.3±0.0 | 89.3±0.1 | 85.3±0.2 | 89.8±0.1 | 71.3±0.0 | 78.7±0.0 | 79.1±0.0 | 95.0±0.0 | 0.404±0.007 |
| Random Forest | 82.1±0.3 | 90.0±0.2 | 86.1±0.2 | 91.7±0.2 | 71.9±0.0 | 79.7±0.0 | 78.1±0.1 | 96.1±0.0 | 0.272±0.006 |
| XGBoost | 83.5±0.2 | 92.2±0.0 | 87.3±0.2 | 92.8±0.1 | 77.6±0.0 | 85.9±0.0 | 97.3±0.0 | 99.9±0.0 | 0.206±0.005 |
| LightGBM | 83.5±0.1 | 92.3±0.0 | 87.4±0.2 | 92.9±0.1 | 77.1±0.0 | 85.5±0.0 | 93.5±0.0 | 99.7±0.0 | 0.195±0.005 |
| CatBoost | 83.6±0.3 | 92.4±0.1 | 87.2±0.2 | 92.8±0.1 | 77.5±0.0 | 85.8±0.0 | 96.4±0.0 | 99.8±0.0 | 0.196±0.004 |
| Model Trees | 82.6±0.2 | 91.5±0.0 | 85.0±0.2 | 90.4±0.1 | 69.8±0.0 | 76.7±0.0 | - | - | 0.385±0.019 |
| NODE + PLE | 78.6±0.0 | 72.2±0.0 | 86.1±0.5 | 91.4±0.4 | 73.3±0.0 | 81.4±0.0 | 74.2±0.0 | 95.1±0.0 | 0.215±0.009 |
| BFCN | 71.0±1.1 | 77.6±1.3 | 85.1±0.5 | 90.9±0.4 | 70.6±1.1 | 77.7±0.9 | 73.4±0.2 | 93.2±0.6 | 0.277±0.017 |
The performance of NODE + PLE is highly dataset-dependent. NODE + PLE proves highly effective on the California Housing regression task, where it achieves an MSE (0.215) that is highly competitive with top-performing GBDTs like LightGBM (0.195), which shows its potential on numerical data. It also performs comparably well on the Adult dataset but is still outperformed by GBDTs. However, its performance collapses on the HELOC dataset, where its AUC (72.2) is a huge outlier, falling below even the simple linear model (80.1) and probably indicating a misconfiguration specific to that dataset. The results of NODE + PLE on HIGGS and Covertype are good but not the best, which is likely due to the subsampling of the datasets and the reduction of embedding dimensions that were needed in order to avoid running out of memory during training.
Table 4.4: Performance metrics of NODE + PLE and Deep Learning Models.

| Method | HELOC Acc↑ | HELOC AUC↑ | Adult Acc↑ | Adult AUC↑ | HIGGS Acc↑ | HIGGS AUC↑ | Covertype Acc↑ | Covertype AUC↑ | Cal. Housing MSE↓ |
|---|---|---|---|---|---|---|---|---|---|
| MLP | 73.2±0.3 | 80.3±0.1 | 84.8±0.1 | 90.3±0.2 | 77.1±0.0 | 85.6±0.0 | 91.0±0.4 | 76.1±3.0 | 0.263±0.008 |
| VIME | 72.7±0.0 | 79.2±0.0 | 84.8±0.2 | 90.5±0.2 | 76.9±0.2 | 85.5±0.1 | 90.9±0.1 | 82.9±0.7 | 0.275±0.007 |
| DeepFM | 73.6±0.2 | 80.4±0.1 | 86.1±0.2 | 91.7±0.1 | 76.9±0.0 | 83.4±0.0 | - | - | 0.260±0.006 |
| DeepGBM | 78.0±0.4 | 84.1±0.1 | 84.6±0.3 | 90.8±0.1 | 74.5±0.0 | 83.0±0.0 | - | - | 0.856±0.065 |
| NAM | 73.3±0.1 | 80.7±0.3 | 83.4±0.1 | 86.6±0.1 | 53.9±0.6 | 55.0±1.2 | - | - | 0.725±0.022 |
| Net-DNF | 82.6±0.4 | 91.5±0.2 | 85.7±0.2 | 91.3±0.1 | 76.6±0.1 | 85.1±0.1 | 94.2±0.1 | 99.1±0.0 | - |
| TabNet | 81.0±0.1 | 90.0±0.1 | 85.4±0.2 | 91.1±0.1 | 76.5±1.3 | 84.9±1.4 | 93.1±0.2 | 99.4±0.0 | 0.346±0.007 |
| TabTransformer | 73.3±0.1 | 80.1±0.2 | 85.2±0.2 | 90.6±0.2 | 73.8±0.0 | 81.9±0.0 | 76.5±0.3 | 72.9±2.3 | 0.451±0.014 |
| SAINT | 82.1±0.3 | 90.7±0.2 | 86.1±0.3 | 91.6±0.2 | 79.8±0.0 | 88.3±0.0 | 96.3±0.1 | 99.8±0.0 | 0.226±0.004 |
| RLN | 73.2±0.4 | 80.1±0.4 | 81.0±1.6 | 75.9±8.2 | 71.8±0.2 | 79.4±0.2 | 77.2±1.5 | 92.0±0.9 | 0.348±0.013 |
| STG | 73.1±0.1 | 80.0±0.1 | 85.4±0.1 | 90.9±0.1 | 73.9±0.1 | 81.9±0.1 | 81.8±0.3 | 96.2±0.0 | 0.285±0.006 |
| NODE | 79.8±0.2 | 87.5±0.2 | 85.6±0.3 | 91.1±0.2 | 76.9±0.1 | 85.4±0.1 | 89.9±0.1 | 98.7±0.0 | 0.276±0.005 |
| NODE + PLE | 78.6±0.0 | 72.2±0.0 | 86.1±0.5 | 91.4±0.4 | 73.3±0.0 | 81.4±0.0 | 74.2±0.0 | 95.1±0.0 | 0.215±0.009 |
| BFCN | 71.0±1.1 | 77.6±1.3 | 85.1±0.5 | 90.9±0.4 | 70.6±1.1 | 77.7±0.9 | 73.4±0.2 | 93.2±0.6 | 0.277±0.017 |
| BoostResNet | 69.6.0±0.2 | - | 85.7±0.1 | - | - | - | - | - | - |
Compared to the deep learning models, NODE + PLE achieves state-of-the-art results on the California Housing regression task (MSE: 0.215) and competitive accuracy on the Adult dataset (Acc: 86.1). However, it fails badly on the HELOC dataset, and its results on HIGGS and Covertype are good but not the best, likely due to the subsampling of the datasets and the reduction of embedding dimensions that were needed in order to avoid running out of memory during training.
Chapter 5
Conclusion
We explored three different model architectures, along with their integration and tuning for tabular data. Our model, NODE+PLE, shows significant potential for achieving performance competitive with GBDTs, especially on the California Housing regression task and also on the Adult dataset, which supports the effectiveness of PLE. However, its performance on the other datasets was not the best.
Limi a ions
• The results suggest that NODE + PLE's performance depends on the kind
of dataset, as its performance was inconsistent across the datasets.
• The complexity of NODE + PLE resulted in longer training times and
higher memory requirements compared to Gradient Boosted Decision Trees
(GBDTs) and simple MLPs. For instance, training on the Covertype and
Higgs datasets consistently ran out of memory.
• The PLE layer relies on fixed binning strategies. In an environment
where feature distributions keep changing, this binning may not adapt
quickly, resulting in a decline in model performance.
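The fixed-binning limitation can be made concrete with a minimal sketch of piecewise linear encoding. The encoding rule follows the usual definition of PLE (a bin contributes 0 below its range, 1 above it, and a linear ratio inside it, with the first and last bins left unclamped on their outer side). The quantile-based binning, the function names `fit_quantile_bins` and `ple_encode`, and the toy data are assumptions for illustration and are not taken from the implementation evaluated here.

```python
import numpy as np

def fit_quantile_bins(x_train, n_bins):
    """Compute bin edges once, from training data only (assumed quantile binning)."""
    edges = np.quantile(x_train, np.linspace(0.0, 1.0, n_bins + 1))
    return np.unique(edges)  # merge duplicate edges caused by repeated values

def ple_encode(x, edges):
    """Piecewise linear encoding of a 1-D feature for fixed bin edges."""
    x = np.asarray(x, dtype=float)[:, None]
    left, right = edges[:-1][None, :], edges[1:][None, :]
    enc = (x - left) / (right - left)           # position of x inside each bin
    enc[:, 1:] = np.maximum(enc[:, 1:], 0.0)    # bins after the first: 0 if x lies below
    enc[:, :-1] = np.minimum(enc[:, :-1], 1.0)  # bins before the last: 1 if x lies above
    return enc

# Toy training feature: the integers 0..100, split into 4 quantile bins.
x_train = np.arange(101, dtype=float)
edges = fit_quantile_bins(x_train, n_bins=4)

in_range = ple_encode([10.0, 60.0], edges)
drifted = ple_encode([150.0, 400.0], edges)  # values far beyond the training range
print(edges)
print(in_range)
print(drifted)
```

In this sketch the in-range values are spread across the four bins, while both drifted values saturate the first three components at 1 and differ only in the last one. Once the feature distribution moves outside the range seen when the edges were fitted, most of the encoding carries no information until the edges are recomputed.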
Future Work
• It is still unclear when to use NODE + PLE: on large datasets or not,
and with what kinds of features?
• The anomalous results on the HELOC dataset need to be further
investigated. They suggest that PLE can memorize noise and needs to be
well regularized.