Boosted Neural Networks for Tabular Regression

Author: Musaliyarakath, Rizeen; A.S Abbas, Jessica

Publisher: Zenodo

DOI: 10.5281/zenodo.17721216

Source: https://zenodo.org/records/17721216/files/SRP.pdf

Student Research Project
Research Topic: Boosted Neural Networks for Tabular Regression
Authors: 1747556 Rizeen Musaliyarakath, 1748971 Jessica A.S Abbas, 1749059 Gaya a Gunaseke a
Supervisor: Kiran Madhusudhanan
15th March 2025
Contents
I Abstract
1 Introduction
  I Motivation
    I.1 Problem Setting
    I.2 Research Idea
    I.3 Objective
2 Related Works
  I The Boosting Framework
  II Gradient-Boosted Decision Trees (GBDTs)
  III Boosted Neural Network Architectures
3 Methodology
  I Integrated NODE + PLE Architecture
  II Boosted Fully Connected Networks (BFCN)
  III Boosted Residual Networks
4 Experiments and Results
  I Datasets
  II Analysis of Results
5 Conclusion
  I Contributions
I Abstract
Tabular data represent one of the most prevalent forms of data in machine learning contexts. Despite recent advancements in using neural nets (NNs) to handle tabular data, there remains an active and ongoing debate about whether NNs outperform gradient-boosted decision trees (GBDTs) on tabular data, with some recent work suggesting either that GBDTs are consistently better than NNs or that NNs are consistently better than GBDTs. In this work, we attempt to bridge this gap by exploring various boosted neural network architectures on tabular data. We propose and evaluate three frameworks: Integrated Neural Oblivious Decision Ensembles (NODE) with Piecewise Linear Encoding (PLE), Boosted Fully Connected Networks (BFCN), and Boosted Residual Networks. Our experimental results reveal a mixed performance landscape for the proposed boosted neural network architectures when compared to established baselines. Against gradient-boosted decision trees, NODE+PLE demonstrates competitive performance primarily on regression tasks. However, significant underperformance is evident across classification tasks, with particularly poor results on HELOC and moderate performance on the Adult dataset. BFCN consistently underperforms GBDTs across all metrics, failing to achieve competitive results on any dataset. When evaluated against deep learning baselines, NODE+PLE shows more promising results, achieving state-of-the-art performance on California Housing regression and competitive accuracy on Adult classification. The results underscore the persistent challenge of achieving GBDT-level performance with neural architectures on tabular data, while simultaneously demonstrating that boosted neural networks can advance the state-of-the-art within the deep learning paradigm for specific dataset characteristics.
Keywords
Deep Learning (DL), Neural Oblivious Decision Ensembles (NODE), Deep Neural Networks (DNNs), Piecewise Linear Encoding (PLE), Gradient-boosted decision trees (GBDTs), Boosted Dynamic Neural Networks (BoostNet)
Codebase: https://www.uni-hildesheim.de/gitlab/srp-group
Chapter 1
Introduction
I Motivation
Deep neural networks (DNNs) have achieved exceptional performance across a wide array of domains, including computer vision, natural language processing, and speech recognition, particularly when dealing with homogeneous data such as images, audio, and text [1, 8, 11]. However, their effectiveness on heterogeneous tabular data remains a significant and persistent challenge [14, 46, 37]. Tabular data, unlike image or language data, is inherently heterogeneous, often comprising a mix of dense numerical and sparse categorical features, with correlations among these features typically weaker and more irregular [7]. This challenge is critical because tabular data represents the most commonly used form of data and is indispensable for numerous vital and computationally demanding applications [3, 7, 46].
Despite the proven success of DNNs in other domains, gradient-boosted decision trees (GBDTs), such as XGBoost [9], LightGBM [27], and CatBoost [38], still largely outperform deep learning models on supervised learning tasks involving tabular data [7]. This indicates a potential stagnation in research progress for competitive deep learning models in this domain [7]. Empirical comparisons often show that GBDTs offer superior accuracy, training efficiency, inference speed, and hyperparameter optimization time [7]. There is an active debate on whether neural networks or GBDTs generally outperform each other on tabular data, with various works arguing for [4, 26, 30, 36, 41] or against [7, 20, 21, 46] NNs. However, McElfresh et al. [32] reported that this "NN vs. GBDT" debate may be overemphasized, as for a significant number of datasets, either the performance difference is negligible, or light hyperparameter tuning on a GBDT is more impactful than the choice between NN and GBDT.
Nevertheless, deep neural networks possess several inherent advantages that make their adaptation to tabular data a compelling research direction. DNNs are highly flexible [42], allow for efficient and iterative training, and are particularly valuable in AutoML contexts [23, 45]. Furthermore, neural networks can be deployed for multimodal learning problems where tabular data serves as one input modality [45], for tabular data distillation [33, 31], and in federated learning scenarios [40].
Given the persistent performance gap and the unique characteristics of tabular data, this project aims to bridge the divide by exploring and enhancing boosted neural network frameworks for tabular data prediction. The core idea is to investigate how an ensemble approach to modeling neural networks can adapt the strengths of traditional tree-boosted models to achieve improved predictive performance.
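I.1 Problem Setting
We are given a dataset $X \in \mathbb{R}^{M \times N}$ and corresponding labels $y \in \mathbb{R}^{1 \times N}$, where $N$ represents the number of training instances and $M$ represents the number of input features. Let $\ell : \mathbb{R}^{N} \times \mathbb{R}^{N} \to \mathbb{R}$ denote a loss function, and let $f_\theta : \mathbb{R}^{M} \to \mathbb{R}$ represent a neural network parameterized by learnable parameters $\theta$.
Our objective is to find the optimal parameter vector $\theta^{*}$ that minimizes the empirical risk:
$$\theta^{*} = \arg\min_{\theta} \mathcal{L}(\theta) = \arg\min_{\theta} \ell\big(y, f_\theta(X)\big)$$
where $f_\theta(X) \in \mathbb{R}^{N}$ represents the network's predictions over all training instances, and $\theta$ encompasses all trainable parameters, including weights and biases across all layers of the neural network.
The loss function is defined for the different task types as follows, where $n$ is the number of instances, $y_j$ the label, and $\hat{y}_j$ the prediction for instance $j$:
Regression: Mean Squared Error (MSE)
$$\mathcal{L}_{\mathrm{MSE}} = \frac{1}{n} \sum_{j=1}^{n} (y_j - \hat{y}_j)^2$$
Binary Classification: Binary Cross-Entropy Loss
$$\mathcal{L}_{\mathrm{BCE}} = -\frac{1}{n} \sum_{j=1}^{n} \big[ y_j \log \hat{y}_j + (1 - y_j) \log(1 - \hat{y}_j) \big]$$
Multiclass Classification: Cross-Entropy Loss
$$\mathcal{L}_{\mathrm{CE}} = -\sum_{j=1}^{n} y_j \log \hat{y}_j$$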
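As an illustration of the three losses above, the following Python sketch evaluates them on plain lists. It is a minimal sketch rather than the project's training code: the function names and the clipping constant `eps` are assumptions introduced here, and the two classification losses assume that the predictions are probabilities.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error for regression."""
    n = len(y_true)
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Binary cross-entropy; p_pred holds predicted probabilities of class 1."""
    n = len(y_true)
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0); eps is an assumption
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / n

def cross_entropy(y_onehot, p_pred, eps=1e-12):
    """Multiclass cross-entropy summed over instances (no 1/n factor, as in the text)."""
    total = 0.0
    for y_row, p_row in zip(y_onehot, p_pred):
        total += sum(y * math.log(max(p, eps)) for y, p in zip(y_row, p_row))
    return -total
```

For multiclass problems, `y_onehot` holds one one-hot row per instance and `p_pred` the matching rows of class probabilities.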
I.2 Research Idea
Gradient boosting techniques, particularly models like XGBoost, LightGBM, and CatBoost, have become the established standard for tabular data prediction, consistently achieving state-of-the-art performance [7, 32]. This stands in contrast to the exceptional performance of deep neural networks (DNNs) in other, homogeneous data domains such as computer vision and natural language processing [48, 18]. The effectiveness of DNNs on heterogeneous tabular data, which often consists of a mix of dense numerical and sparse categorical features with weaker and more irregular correlations, remains a significant challenge [43, 52, 48]. Indeed, tabular datasets have been called the "last 'unconquered castle'" for deep neural network models [7]. While the "NN vs. GBDT" debate can be overemphasized for many datasets where performance differences are negligible or hyperparameter tuning is more impactful, GBDTs are generally better at handling skewed or heavy-tailed feature distributions and other data irregularities, and tend to perform better on large datasets [32].
This project is grounded in the hypothesis that employing neural networks within a boost-like structure can enhance their predictive accuracy on tabular datasets. Specifically, we propose to investigate the possibility of combining structured feature engineering, such as Piecewise Linear Encoding (PLE) [19], with hierarchical, boost-like learning principles through neural networks. We aspire to explore how a thoughtful combination of neural networks in a consecutive boosting format can adapt the strengths of traditional tree-boosted models, aiming for improved predictive performance over conventional methods for tabular data prediction. The ultimate goal is to demonstrate that architectures employing these boosted neural network frameworks could potentially surpass the performance of both traditional GBDTs and standalone deep neural networks, offering versatile and potentially interpretable solutions to complex tabular data challenges.
I.3 Objective
The primary objective of this study is to explore and advance the use of boosted neural network techniques for tabular data prediction. To achieve this, we define the following goals:
• Conduct a thorough review of recent advancements in boosted neural network architectures and ensemble strategies.
• Reproduce and validate the reported results of these approaches to establish a reliable baseline.
• Adapt the identified neural network architectures specifically to tabular data applications where applicable.
• Benchmark and evaluate the adapted models against state-of-the-art gradient-boosted tree algorithms as well as existing neural network baselines.
• Enhance model performance through structured feature engineering and optimization of ensemble configurations.
• Analyze the resulting performance to develop a deeper understanding of the suitability, strengths, and limitations of boosted neural networks for tabular data.
Chapter 2
Related Works
The concept of boosting, an ensemble algorithm that combines multiple weak learners into a single strong learner, has significantly influenced predictive performance on tabular data. This section reviews the foundational gradient-boosted decision tree models and the main neural architectures that inform our approach, by considering the models that incorporate boosting-inspired methodologies [32].
I The Boosting Framework
The algorithmic foundation of boosting was established with the AdaBoost (Adaptive Boosting) algorithm by Freund & Schapire [16]. AdaBoost works by adaptively changing the weights of training instances, making subsequent weak learners (e.g., decision stumps) focus on previously misclassified examples. This concept was expanded by Friedman [17] with the introduction of gradient boosting, which models boosting as a numerical optimization problem in function space. Here, the model is built sequentially in a greedy, stage-wise manner where each new learner is trained to minimize the loss by correcting the errors of its predecessors. The core idea is to iteratively add weak learners to an ensemble, with each new model trained to predict the negative gradients (the "pseudo-residuals") of the current ensemble's loss.
Formally, given a differentiable loss function $L(y, F(x))$, the gradient boosting algorithm proceeds as follows [17]:
1. Initialize the model with a constant value:
$$F_0(x) = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma) \tag{2.1}$$
2. For $m = 1$ to $M$ (number of boosting rounds), perform the following steps:
(a) Compute the pseudo-residuals for each instance $i$:
$$r_i^{(m)} = -\left[ \frac{\partial L(y_i, F(x_i))}{\partial F(x_i)} \right]_{F(x) = F_{m-1}(x)} \tag{2.2}$$
This equation defines the "error" that the new weak learner must fit.
(b) Fit a weak learner $h_m(x)$ (e.g., a decision tree) to the pseudo-residuals $\{r_i^{(m)}\}$.
(c) Find the optimal step size $\gamma_m$ via line search:
$$\gamma_m = \arg\min_{\gamma} \sum_{i=1}^{n} L\big(y_i, F_{m-1}(x_i) + \gamma h_m(x_i)\big) \tag{2.3}$$
(d) Update the model:
$$F_m(x) = F_{m-1}(x) + \nu \cdot \gamma_m h_m(x) \tag{2.4}$$
Here, $\nu \in (0, 1]$ is the shrinkage or learning rate, a hyperparameter that controls the contribution of each weak learner to prevent overfitting [17].
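The steps above can be sketched in Python for the special case of the squared loss $L = \frac{1}{2}(y - F)^2$, for which the pseudo-residuals reduce to $y_i - F_{m-1}(x_i)$ and the line search has a closed form. The single-feature regression stump used as weak learner and all function names are illustrative assumptions; the sketch also assumes that the feature takes at least two distinct values.

```python
def fit_stump(x, r):
    """Least-squares regression stump on one feature: (threshold, left value, right value)."""
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        thr = (lo + hi) / 2
        left = [ri for xi, ri in zip(x, r) if xi <= thr]
        right = [ri for xi, ri in zip(x, r) if xi > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - lm) ** 2 for ri in left) + sum((ri - rm) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    return best[1:]

def stump_predict(stump, xi):
    thr, lm, rm = stump
    return lm if xi <= thr else rm

def gradient_boost(x, y, rounds=50, nu=0.1):
    """Gradient boosting for the squared loss, following steps 1 and 2(a)-(d)."""
    f0 = sum(y) / len(y)                      # step 1: the best constant is the mean
    preds = [f0] * len(y)
    ensemble = []
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]    # (2.2) for squared loss
        stump = fit_stump(x, residuals)                       # step (b)
        h = [stump_predict(stump, xi) for xi in x]
        denom = sum(hi * hi for hi in h)
        # (2.3): closed-form line search, gamma = <r, h> / <h, h>
        gamma = sum(ri * hi for ri, hi in zip(residuals, h)) / denom if denom else 0.0
        preds = [pi + nu * gamma * hi for pi, hi in zip(preds, h)]   # (2.4)
        ensemble.append((gamma, stump))
    return f0, nu, ensemble

def boosted_predict(model, xi):
    f0, nu, ensemble = model
    return f0 + sum(nu * gamma * stump_predict(stump, xi) for gamma, stump in ensemble)
```

With a small learning rate, each round removes only a fraction of the remaining residual, which is the shrinkage effect that $\nu$ controls.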
II Gradient-Boosted Decision Trees (GBDTs)
GBDT models form the baselines for any research on tabular data regression. These models build an ensemble of decision trees iteratively, with each new tree correcting the errors made by the existing ensemble of trees. Each new tree is trained to model the negative gradient of the loss function [32]. GBDT models such as XGBoost [9], LightGBM [27], and CatBoost [38] are well known for their robust performance, efficiency, and ability to handle heterogeneous features, which has made them dominate the area of tabular data prediction.
XGBoost (Extreme Gradient Boosting) is an implementation of the gradient boosting framework that enhances the core GBDT algorithm through a second-order approximation of the loss function for more efficient boosting, regularization (L1/L2) to prevent overfitting, and a sparsity-aware algorithm that efficiently handles missing values. It learns the best direction to send a data point with a missing value at each split, allowing it to process sparse data without expensive preprocessing [9].
at the last layer,
$$F(x) = w^{\top} h_T(x),$$
where $w$ is the linear classifier attached to the output of the final block.
BoostResNet modifies this by decomposing the network into weak module classifiers. Each module consists of a residual block paired with its own linear classifier $w_t$. Formally, the module classifier at step $t$ is defined as
$$o_t(x) = w_t^{\top} h_t(x),$$
where $h_t(x)$ is the output of the residual mapping at that stage. The final prediction of the network is then represented as a telescoping sum of these module classifiers:
$$F(x) = \sum_{t=0}^{T} \alpha_t o_t(x),$$
with coefficients $\alpha_t$ chosen such that the ensemble exactly reconstructs the standard ResNet output [24].
To support the theory, Huang et al. (2018) conduct extensive experiments on CIFAR-10, CIFAR-100, and SVHN. The results show that BoostResNet "achieves test performance comparable to that of end-to-end ResNet training" while requiring substantially less GPU memory (p. 2065). Because the method trains blocks sequentially, only one shallow block needs to be loaded into memory at a time, making it efficient for very deep architectures. The experiments also confirm the theoretical predictions: as the number of modules increases, training error decreases steadily, and test accuracy improves correspondingly. Overall, BoostResNet offers a theoretically grounded reinterpretation of residual networks by formalizing them as a boosting ensemble, with a telescoping-sum representation that preserves the ResNet output. Its sequential training procedure provides both computational savings and provable convergence guarantees, positioning it as a unique contribution in the literature on deep residual learning.
Neural Oblivious Decision Ensembles (NODE)
Neural Oblivious Decision Ensembles (NODE) is a deep learning architecture that combines the strengths of tree-based models and deep neural networks. Its main innovation is the creation of a differentiable ensemble of oblivious decision trees (ODTs), which makes end-to-end training via gradient descent possible [36]. An Oblivious Decision Tree (ODT) is a type of decision tree in which all nodes at the same depth must use the same feature and the same threshold for splitting. For a tree of depth $d$, this homogeneity transforms the tree from a standard branching structure into a decision table with $2^d$ entries, where each entry represents a unique combination of binary decisions. While this constraint reduces the capacity of a single tree, it makes ensembles of ODTs highly efficient for inference and remarkably resistant to overfitting [36]. The CatBoost algorithm [38] uses ODTs as weak learners in gradient boosting, and this has contributed significantly to its success [38].
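To make the decision-table view concrete, the following Python sketch evaluates a classical (hard) ODT. The function name, the argument layout, and the convention that a feature value equal to the threshold counts as 1 are assumptions made for illustration only.

```python
def odt_predict(x, features, thresholds, response):
    """Classical oblivious decision tree of depth d.

    features[i] and thresholds[i] define the single split shared by all nodes
    at depth i; response is the decision table with 2**d leaf values.
    """
    index = 0
    for f, b in zip(features, thresholds):
        bit = 1 if x[f] - b >= 0 else 0   # step function; the tie convention is an assumption
        index = (index << 1) | bit        # the d binary decisions form the table index
    return response[index]
```

Because every level contributes exactly one bit, inference needs only $d$ comparisons and one table lookup, which is why ensembles of ODTs are cheap to evaluate.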
This idea is expanded upon in the NODE architecture, which generalizes ensembles of ODTs into a fully differentiable framework that can be trained end-to-end via gradient descent [36]. A NODE layer is made up of several differentiable ODTs. The key to differentiability lies in replacing the hard, non-differentiable operations of a standard decision tree (feature selection and binary routing) with soft, learnable alternatives. A single NODE layer contains $m$ differentiable ODTs, and the forward pass of one tree is built to approximate the function of a classical ODT while maintaining differentiability. In a classical ODT, the non-differentiable output is given by:
$$h(x) = R\big(\mathbb{1}(f_1(x) - b_1), \mathbb{1}(f_2(x) - b_2), \ldots, \mathbb{1}(f_d(x) - b_d)\big) \tag{2.10}$$
where $\mathbb{1}(\cdot)$ is the Heaviside step function, $f_i$ is the selected feature at the $i$-th split, $b_i$ is the corresponding threshold, and $R$ is a $d$-dimensional response tensor that holds the leaf values. To allow differentiability, NODE replaces these hard operations with soft, learnable alternatives. The hard selection of a single feature $f_i$ is replaced by a sparse, weighted combination of all features using the $\alpha$-entmax transformation [35] applied to a learnable feature selection matrix $F \in \mathbb{R}^{d \times n}$:
$$\hat{f}_i(x) = \sum_{j=1}^{n} x_j \cdot \mathrm{entmax}_{\alpha}(F_{ij}) \tag{2.11}$$
Here, $\alpha = 1.5$ is used to induce sparsity, ensuring the output closely mimics a hard feature selection. The classical Heaviside step function is replaced by a scaled, two-class variant of entmax, defined as:
$$\sigma_{\alpha}(x) = \mathrm{entmax}_{\alpha}([x, 0]) \tag{2.12}$$
The soft routing probability for the $i$-th split is then:
$$c_i(x) = \sigma_{\alpha}\left( \frac{f_i(x) - b_i}{\tau_i} \right), \tag{2.13}$$
where $b_i$ and $\tau_i$ are learnable threshold and scale parameters. The value $c_i(x) \in [0, 1]$ represents the soft probability of taking the right branch at the $i$-th split.
The final tree output is computed as a weighted sum over all leaves. The weights are given by the outer product of the soft choice vectors of all $d$ splits, forming a "choice tensor" $C(x)$:
$$C(x) = \begin{bmatrix} c_1(x) \\ 1 - c_1(x) \end{bmatrix} \otimes \begin{bmatrix} c_2(x) \\ 1 - c_2(x) \end{bmatrix} \otimes \cdots \otimes \begin{bmatrix} c_d(x) \\ 1 - c_d(x) \end{bmatrix} \tag{2.14}$$
The final prediction of a single tree is:
$$\hat{h}(x) = \sum_{(i_1, \ldots, i_d) \in \{0,1\}^d} R_{i_1, \ldots, i_d} \cdot C_{i_1, \ldots, i_d}(x) \tag{2.15}$$
The output of a NODE layer is the concatenation of the outputs of all $m$ trees:
$$\big[\hat{h}_1(x), \hat{h}_2(x), \ldots, \hat{h}_m(x)\big]$$
Figure 2.4: Architecture of a single differentiable Oblivious Decision Tree (ODT) within a NODE layer. The splitting features and the splitting thresholds are shared across all the internal nodes of the same depth. The output is a sum of leaf responses scaled by the choice weights [36].
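The forward pass of one differentiable ODT, equations (2.11) to (2.15), can be sketched in plain Python as follows. This is an illustrative sketch, not the NODE reference implementation: $\alpha$-entmax is computed here by bisection on its threshold, using the form $p_i = [(\alpha - 1) z_i - \tau]_+^{1/(\alpha - 1)}$ with $\alpha > 1$, the soft feature value $\hat{f}_i(x)$ is used inside the routing function, and the function names and the dictionary layout of the response tensor `R` are assumptions.

```python
import itertools

def entmax(z, alpha=1.5, iters=60):
    """alpha-entmax (alpha > 1) via bisection on the threshold tau."""
    s = [(alpha - 1.0) * zi for zi in z]
    lo, hi = max(s) - 1.0, max(s)          # the total is >= 1 at lo and 0 at hi
    power = 1.0 / (alpha - 1.0)
    for _ in range(iters):
        tau = (lo + hi) / 2.0
        total = sum(max(si - tau, 0.0) ** power for si in s)
        if total >= 1.0:
            lo = tau
        else:
            hi = tau
    p = [max(si - lo, 0.0) ** power for si in s]
    norm = sum(p)
    return [pi / norm for pi in p]         # renormalize away the bisection error

def soft_odt(x, F, b, tau, R, alpha=1.5):
    """x: n feature values; F: d x n selection logits; b, tau: d thresholds and scales;
    R: dict mapping each bit tuple (i_1, ..., i_d) to a leaf value."""
    d = len(F)
    c = []
    for i in range(d):
        w = entmax(F[i], alpha)                              # sparse feature weights, (2.11)
        f_hat = sum(xj * wj for xj, wj in zip(x, w))
        c.append(entmax([(f_hat - b[i]) / tau[i], 0.0], alpha)[0])   # (2.12) and (2.13)
    out = 0.0
    for bits in itertools.product((0, 1), repeat=d):         # all 2^d leaves
        weight = 1.0
        for i, bit in enumerate(bits):
            weight *= c[i] if bit == 0 else 1.0 - c[i]       # entry of the choice tensor, (2.14)
        out += R[bits] * weight                              # (2.15)
    return out
```

With large selection logits and a small scale $\tau_i$, the soft tree reproduces a hard decision table; with moderate values, several leaves receive non-zero weight and gradients can flow to all parameters.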
Multi-Layer Hierarchical Architecture
Several NODE layers stacked in a DenseNet-like fashion form the full NODE model [51], where each layer uses a concatenation of all previous layers. The input to the $k$-th layer is therefore a concatenation of the original input features and the outputs from all previous layers $0$ to $k-1$. Due to this design, the model is able to learn hierarchical feature interactions, where trees in deeper layers can learn complex rules based on high-level representations extracted by earlier layers. The final prediction is computed as the average of the outputs from all trees across all NODE layers:
$$\hat{y} = \frac{1}{K} \sum_{k=1}^{K} \hat{h}_k(x) \tag{2.16}$$
Figure 2.5: The multi-layer NODE architecture with DenseNet-style feature reuse across layers. The architecture consists of densely connected NODE layers. Each layer contains several trees whose outputs are concatenated and serve as input to the subsequent layer. The final prediction is obtained by averaging the outputs of all trees from all the layers [36].
On Embeddings for Numerical Features in Tabular Deep Learning
This research introduced an underexplored aspect of deep learning (DL) for tabular data: the embedding of numerical features. The authors introduce two approaches to constructing embeddings for numerical features: Piecewise Linear Encoding (PLE) and Periodic Activation (P) functions [19]. The PLE method is inspired by classical feature binning techniques, where the value range of a numerical feature is divided into intervals (bins), and the feature values are encoded in a piecewise linear manner. The results of the research demonstrate that the technique helps to improve the performance of deep learning models on tabular data. This approach allows simple MLP models to compete with more complex Transformer-based models. They also show that the integration of this approach into the deep learning pipeline produces state-of-the-art results in tabular deep learning, closing the performance gap with GBDTs [19].
Piecewise Linear Encoding (PLE):
The design of PLE is motivated by the limitations of deep learning on tabular data. While Multilayer Perceptrons are known to be universal approximators [24][19], their learning capabilities in practice are often hampered by optimization difficulties [39][19]. Recent work by Tancik et al. [47] demonstrated that transforming the input space can substantially alleviate these optimization issues. This finding directly inspires the core premise of PLE: that altering the representation of original scalar numerical feature values can enhance the learning capabilities of tabular deep learning models [16].
The authors start from the one-hot encoding algorithm, a method that is widely successful for representing discrete entities (e.g., categorical features, NLP tokens) [19]. The one-hot representation sits at the opposite end of the spectrum from a scalar representation in the trade-off between parameter efficiency and expressivity [19]. To test if a one-hot-like approach could benefit deep learning models on numerical data, PLE is designed as a continuous alternative to one-hot encoding, making it applicable to numerical features [16][19].
For a given numerical feature $x$, PLE defines $T$ bins (intervals) with boundaries $b_0, b_1, \ldots, b_T$. The encoding transforms the scalar value $x$ into a $T$-dimensional vector [19] where the $t$-th element (bin) is calculated as:
$$e_t = \begin{cases} 0, & \text{if } x < b_{t-1} \text{ and } t > 1, \\ 1, & \text{if } x \geq b_t \text{ and } t < T, \\ \dfrac{x - b_{t-1}}{b_t - b_{t-1}}, & \text{otherwise.} \end{cases}$$
The full PLE encoding vector is:
$$\mathrm{PLE}(x) = [e_1, e_2, \ldots, e_T].$$
Figure 2.6: The Piecewise Linear Encoding (PLE) in action for $T = 4$ [19]. The scalar input value $x$ is mapped to a 4-dimensional vector $[e_1, e_2, e_3, e_4]$ based on its position within the bins, creating a structured and interpretable representation.
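The piecewise definition of $e_t$ translates directly into code. The following Python sketch is a minimal illustration for a single scalar; the function name `ple_encode` and the list-based bin representation are assumptions.

```python
def ple_encode(x, bins):
    """Piecewise linear encoding of scalar x; bins = [b_0, ..., b_T] with b_0 < ... < b_T."""
    T = len(bins) - 1
    e = []
    for t in range(1, T + 1):
        lo, hi = bins[t - 1], bins[t]
        if x < lo and t > 1:
            e.append(0.0)                      # x lies entirely below bin t
        elif x >= hi and t < T:
            e.append(1.0)                      # x lies entirely above bin t
        else:
            e.append((x - lo) / (hi - lo))     # x falls in bin t (or beyond the outer bins)
    return e
```

For bins `[0, 1, 2, 3, 4]` and `x = 2.5`, the first two bins are fully "filled" (value 1), the third is half filled, and the fourth is empty. Because the first and last bins are excluded from the clamping conditions, values outside $[b_0, b_T]$ are extrapolated linearly rather than clipped.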
The authors rely on classic binning algorithms [12]: one of the two algorithms is unsupervised, while the other utilizes labels for constructing bins.
Obtaining bins from quantiles (Unsupervised Binning): A natural baseline way to construct the bins is by splitting value ranges according to uniformly chosen empirical quantiles of the corresponding individual feature distributions [19]:
$$b_t = q_{t/T}\big(\{x_i^j\}\big) \quad \text{for } j \in \text{training set},$$
where $q_{t/T}$ denotes the empirical $t/T$ quantile. Trivial bins of zero size are removed.
Supervised Target-aware Binning (Building target-aware bins): This supervised approach employs training labels for constructing bins, identical in spirit to the C4.5 Discretization [29] algorithm [19]. For each feature, its value range is recursively split in a greedy manner using the target as guidance. This is equivalent to building a decision tree (which uses only this one feature and the target for growing) and treating the regions corresponding to its leaves as the bins for PLE [19]. The outermost boundaries are defined as
$$b_0^i = \min_{j \in J_{\mathrm{train}}} x_i^j \quad \text{and} \quad b_T^i = \max_{j \in J_{\mathrm{train}}} x_i^j \; [?].$$
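The quantile-based (unsupervised) construction can be sketched as follows. The linear interpolation between order statistics is one common quantile convention and is an assumption here, as are the function names; the cited work may use a different convention.

```python
def quantile(sorted_vals, q):
    """Empirical quantile with linear interpolation between order statistics."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def quantile_bins(train_values, T):
    """Bin boundaries b_0..b_T from uniformly spaced quantiles of one feature."""
    vals = sorted(train_values)
    edges = [quantile(vals, t / T) for t in range(T + 1)]
    bins = [edges[0]]
    for e in edges[1:]:
        if e > bins[-1]:          # drop trivial bins of zero size
            bins.append(e)
    return bins
```

Features with many repeated values can therefore end up with fewer than $T$ bins, since duplicate quantile edges collapse into one boundary.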
Periodic Activation Function (P)
This design projects scalar values into a periodic space using learnable frequencies [47]. For a feature $x$, the embedding is constructed as:
$$f_i(x) = \mathrm{Periodic}(x) = \mathrm{concat}\big(\sin(v), \cos(v)\big), \tag{2.17}$$
where
$$v = (2\pi c_1 x, 2\pi c_2 x, \ldots, 2\pi c_k x), \tag{2.18}$$
and $c_i$ are trainable parameters initialized from a normal distribution
$$c_i \sim \mathcal{N}(0, \sigma). \tag{2.19}$$
The hyperparameters $\sigma$ (initial frequency scale) and $k$ (number of frequencies) are crucial and are tuned on the validation set.
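Equations (2.17) to (2.19) can be illustrated with a short Python sketch. The function names and the fixed seed are assumptions, and $\sigma$ is treated here as the standard deviation of the initial frequencies, which the text does not state explicitly. In an actual model the frequencies $c_i$ would be trained by gradient descent; here they are only initialized.

```python
import math
import random

def init_frequencies(k, sigma, seed=0):
    """Draw k initial frequencies c_i from a zero-mean normal distribution (2.19)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(k)]

def periodic_embedding(x, c):
    """Map scalar x to concat(sin(v), cos(v)) with v_i = 2 * pi * c_i * x, (2.17) and (2.18)."""
    v = [2.0 * math.pi * ci * x for ci in c]
    return [math.sin(vi) for vi in v] + [math.cos(vi) for vi in v]
```

The embedding has dimension $2k$, and a larger $\sigma$ yields higher initial frequencies and therefore a representation that varies faster with $x$.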
Chapter 3
Methodology
This project aims to bridge the performance gap between deep learning models and Gradient Boosted Decision Trees (GBDTs) on heterogeneous tabular data. To achieve this, we explore advanced concepts for:
• Integrating a deep learning architecture that mimics tree ensembles and introducing a novel method of representing numerical features into this architecture. The core of our methodology is the integration of Piecewise Linear Encoding (PLE) into the Neural Oblivious Decision Ensembles (NODE) architecture, creating a powerful and differentiable model for tabular regression and classification.
• Adapting deep neural network models to be more comparable with gradient boosting decision tree models (GBDTs) on heterogeneous tabular data.
I Integrated NODE + PLE Architecture
The integration of Piecewise Linear Encoding (PLE) for numerical embeddings with the Neural Oblivious Decision Ensembles (NODE) architecture is an innovation of this work, with the aim of addressing the limitations of deep learning on tabular data. While the components are powerful individually, we believe their combination will result in a more robust model.
The integrated architecture leverages the strengths of both NODE and PLE to process data. Each numerical feature $x_i$ is processed separately by its own PLE module. Based on the chosen strategy (unsupervised or supervised PLE), the scalar value is transformed into a high-dimensional, piecewise-linear representation $\mathrm{PLE}(x_i) \in \mathbb{R}^{T}$. This vector is then passed through a feature-specific linear layer to obtain a final dense embedding [37]:
$$e_i^{\mathrm{num}} = W_i \cdot \mathrm{PLE}(x_i) + b_i.$$
Each categorical feature is processed through a standard embedding layer, mapping each category to a dense vector $e_j^{\mathrm{cat}}$. All resulting numerical embeddings $e_i^{\mathrm{num}}$ and categorical embeddings $e_j^{\mathrm{cat}}$ are concatenated to form a joint, rich input representation vector $z$. The vector $z$ is fed into the multi-layer NODE architecture [37]. The differentiable oblivious decision trees within each NODE layer now operate on this pre-enriched embedding space rather than on raw, normalized scalars. The DenseNet-like structure allows subsequent layers to use the transformed, high-level features learned by earlier layers, enabling the learning of complex hierarchical interactions between the PLE-encoded features. The final output is a simple average of the outputs from all trees across all NODE layers.
Figure 3.1: Architecture of Integrated NODE + PLE. From the top left, input features are encoded via Piecewise Linear Encoding (PLE). All embeddings are concatenated and processed by a multi-layer NODE architecture, which learns hierarchical interactions through its ensembles of differentiable oblivious decision trees. The final prediction is an average of all tree outputs [37].
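The construction of the joint input vector $z$ described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the project's implementation: the function names, the nested-list weight layout, and the lookup-table form of the categorical embeddings are hypothetical, and the PLE vectors are assumed to be computed beforehand.

```python
def numeric_embedding(ple_vec, W, b):
    """Feature-specific linear layer: e_num = W . PLE(x) + b, with W of shape (dim, T)."""
    return [sum(w * p for w, p in zip(row, ple_vec)) + bi for row, bi in zip(W, b)]

def build_input(ple_vectors, num_params, cat_ids, cat_tables):
    """Concatenate all numerical and categorical embeddings into the vector z.

    ple_vectors: one PLE vector per numerical feature
    num_params:  one (W, b) pair per numerical feature
    cat_ids:     one category index per categorical feature
    cat_tables:  one embedding table (list of vectors) per categorical feature
    """
    z = []
    for ple_vec, (W, b) in zip(ple_vectors, num_params):
        z.extend(numeric_embedding(ple_vec, W, b))
    for cid, table in zip(cat_ids, cat_tables):
        z.extend(table[cid])                  # embedding lookup for the category
    return z
```

The resulting vector `z` is what the first NODE layer receives in place of the raw, normalized scalars.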
Why PLE for NODE Integration
The choice of Piecewise Linear Encoding (PLE) over other embeddings for integration with NODE is due to a shared inductive bias. Both methods are fundamentally based on the concept of splitting data on feature thresholds. This is the core operational principle that makes tree-based models like Gradient Boosted Decision Trees (GBDTs) powerful on tabular data. NODE explicitly mimics this principle by constructing a differentiable ensemble of oblivious decision trees that learn differentiable splits. PLE directly encodes this principle into the input representation. It transforms a scalar value into a vector based on its position within learned bins (or intervals), defined by thresholds $(b_0, b_1, \ldots, b_T)$.
Therefore, PLE provides an input signal that is already structured in the form that NODE is designed to understand. The tree-like output of PLE is a fit for the tree-based learning of NODE, creating a more powerful and aligned model than with other, less compatible embeddings.
Why This Combination is Innovative
Previous attempts to make deep learning competitive with GBDTs focused primarily on designing novel backbone architectures (e.g., Transformers [48, 20], other attention mechanisms) or on creating completely differentiable tree structures (e.g., NODE [37]). This work differs by focusing on the critical but underexplored input representation layer for numerical features. The innovation lies in recognizing that:
• The input representation (simple scalars) is a key bottleneck.
• A technique (PLE) exists to create a much richer representation.
• A specific architecture (NODE) exists that is well suited to exploit this richer representation due to its tree-like nature.
By integrating PLE with NODE, we aim to create a unified architecture that directly addresses the core weaknesses of Deep Neural Networks on tabular data, pushing their performance closer to and beyond that of state-of-the-art GBDTs.
II Boosted Fully Connected Networks (BFCN)
A concerted attempt was made to extend the advantages of dynamic inference to heterogeneous tabular data by integrating the Boosted Dynamic Neural Network (BoostNet) model into a fully-connected network framework, which will be referred to as Boosted Fully Connected Networks (BFCN). This modification was essential to resolving the well-known issues that deep neural networks encounter when working with tabular datasets,
Table 4.3: Performance metrics of NODE + PLE, GBDTs, and machine learning models (↑ higher is better, ↓ lower is better).

| Method | HELOC Acc ↑ | HELOC AUC ↑ | Adult Acc ↑ | Adult AUC ↑ | HIGGS Acc ↑ | HIGGS AUC ↑ | Covertype Acc ↑ | Covertype AUC ↑ | Cal. Housing MSE ↓ |
| Linear Model | 73.0±0.0 | 80.1±0.1 | 82.5±0.2 | 85.4±0.2 | 64.1±0.0 | 68.4±0.0 | 72.4±0.0 | 92.8±0.0 | 0.528±0.008 |
| KNN | 72.2±0.0 | 79.0±0.1 | 83.2±0.2 | 87.5±0.2 | 62.3±0.1 | 67.1±0.0 | 70.2±0.1 | 90.1±0.2 | 0.421±0.009 |
| Decision Tree | 80.3±0.0 | 89.3±0.1 | 85.3±0.2 | 89.8±0.1 | 71.3±0.0 | 78.7±0.0 | 79.1±0.0 | 95.0±0.0 | 0.404±0.007 |
| Random Forest | 82.1±0.3 | 90.0±0.2 | 86.1±0.2 | 91.7±0.2 | 71.9±0.0 | 79.7±0.0 | 78.1±0.1 | 96.1±0.0 | 0.272±0.006 |
| XGBoost | 83.5±0.2 | 92.2±0.0 | 87.3±0.2 | 92.8±0.1 | 77.6±0.0 | 85.9±0.0 | 97.3±0.0 | 99.9±0.0 | 0.206±0.005 |
| LightGBM | 83.5±0.1 | 92.3±0.0 | 87.4±0.2 | 92.9±0.1 | 77.1±0.0 | 85.5±0.0 | 93.5±0.0 | 99.7±0.0 | 0.195±0.005 |
| CatBoost | 83.6±0.3 | 92.4±0.1 | 87.2±0.2 | 92.8±0.1 | 77.5±0.0 | 85.8±0.0 | 96.4±0.0 | 99.8±0.0 | 0.196±0.004 |
| Model Trees | 82.6±0.2 | 91.5±0.0 | 85.0±0.2 | 90.4±0.1 | 69.8±0.0 | 76.7±0.0 | - | - | 0.385±0.019 |
| NODE + PLE | 78.6±0.0 | 72.2±0.0 | 86.1±0.5 | 91.4±0.4 | 73.3±0.0 | 81.4±0.0 | 74.2±0.0 | 95.1±0.0 | 0.215±0.009 |
| BFCN | 71.0±1.1 | 77.6±1.3 | 85.1±0.5 | 90.9±0.4 | 70.6±1.1 | 77.7±0.9 | 73.4±0.2 | 93.2±0.6 | 0.277±0.017 |

The performance of NODE + PLE is highly dataset-dependent. NODE + PLE proves highly effective on the California Housing regression task, where it achieves an MSE (0.215) that is highly competitive with top-performing GBDTs like LightGBM (0.195); this shows its potential on numerical data. It also performs comparably well on the Adult dataset (Acc: 86.1) but is still outperformed by the GBDTs (Acc: 87.2 to 87.4). However, its performance collapses on the HELOC dataset, where its AUC (72.2) is a huge outlier, falling below even the simple linear model (80.1) and suggesting a probable misconfiguration specific to that data. On HIGGS and Covertype, NODE + PLE is not among the best models (Acc: 73.3 and 74.2, versus 77.6 and 97.3 for XGBoost), which is likely due to the subsampling of the datasets and the reduction of embedding dimensions in order to solve the issue of memory running out during training.
Table 4.4: Performance metrics of NODE + PLE and deep learning models (↑ higher is better, ↓ lower is better).

| Method | HELOC Acc ↑ | HELOC AUC ↑ | Adult Acc ↑ | Adult AUC ↑ | HIGGS Acc ↑ | HIGGS AUC ↑ | Covertype Acc ↑ | Covertype AUC ↑ | Cal. Housing MSE ↓ |
| MLP | 73.2±0.3 | 80.3±0.1 | 84.8±0.1 | 90.3±0.2 | 77.1±0.0 | 85.6±0.0 | 91.0±0.4 | 76.1±3.0 | 0.263±0.008 |
| VIME | 72.7±0.0 | 79.2±0.0 | 84.8±0.2 | 90.5±0.2 | 76.9±0.2 | 85.5±0.1 | 90.9±0.1 | 82.9±0.7 | 0.275±0.007 |
| DeepFM | 73.6±0.2 | 80.4±0.1 | 86.1±0.2 | 91.7±0.1 | 76.9±0.0 | 83.4±0.0 | - | - | 0.260±0.006 |
| DeepGBM | 78.0±0.4 | 84.1±0.1 | 84.6±0.3 | 90.8±0.1 | 74.5±0.0 | 83.0±0.0 | - | - | 0.856±0.065 |
| NAM | 73.3±0.1 | 80.7±0.3 | 83.4±0.1 | 86.6±0.1 | 53.9±0.6 | 55.0±1.2 | - | - | 0.725±0.022 |
| Net-DNF | 82.6±0.4 | 91.5±0.2 | 85.7±0.2 | 91.3±0.1 | 76.6±0.1 | 85.1±0.1 | 94.2±0.1 | 99.1±0.0 | - |
| TabNet | 81.0±0.1 | 90.0±0.1 | 85.4±0.2 | 91.1±0.1 | 76.5±1.3 | 84.9±1.4 | 93.1±0.2 | 99.4±0.0 | 0.346±0.007 |
| TabTransformer | 73.3±0.1 | 80.1±0.2 | 85.2±0.2 | 90.6±0.2 | 73.8±0.0 | 81.9±0.0 | 76.5±0.3 | 72.9±2.3 | 0.451±0.014 |
| SAINT | 82.1±0.3 | 90.7±0.2 | 86.1±0.3 | 91.6±0.2 | 79.8±0.0 | 88.3±0.0 | 96.3±0.1 | 99.8±0.0 | 0.226±0.004 |
| RLN | 73.2±0.4 | 80.1±0.4 | 81.0±1.6 | 75.9±8.2 | 71.8±0.2 | 79.4±0.2 | 77.2±1.5 | 92.0±0.9 | 0.348±0.013 |
| STG | 73.1±0.1 | 80.0±0.1 | 85.4±0.1 | 90.9±0.1 | 73.9±0.1 | 81.9±0.1 | 81.8±0.3 | 96.2±0.0 | 0.285±0.006 |
| NODE | 79.8±0.2 | 87.5±0.2 | 85.6±0.3 | 91.1±0.2 | 76.9±0.1 | 85.4±0.1 | 89.9±0.1 | 98.7±0.0 | 0.276±0.005 |
| NODE + PLE | 78.6±0.0 | 72.2±0.0 | 86.1±0.5 | 91.4±0.4 | 73.3±0.0 | 81.4±0.0 | 74.2±0.0 | 95.1±0.0 | 0.215±0.009 |
| BFCN | 71.0±1.1 | 77.6±1.3 | 85.1±0.5 | 90.9±0.4 | 70.6±1.1 | 77.7±0.9 | 73.4±0.2 | 93.2±0.6 | 0.277±0.017 |
| BoostResNet | 69.6.0±0.2 | - | 85.7±0.1 | - | - | - | - | - | - |

Compared to the deep learning models, NODE + PLE achieves state-of-the-art results on the California Housing regression task (MSE: 0.215, ahead of SAINT at 0.226) and competitive accuracy on the Adult dataset (Acc: 86.1). However, it fails greatly on the HELOC dataset, and its results on HIGGS and Covertype are not the best, likely due to the subsampling of the datasets and the reduction of embedding dimensions in order to solve the issue of memory running out during training.
Chapter 5
Conclusion
We explored three different model architectures, their integration, and their tuning for tabular data. Our model, NODE+PLE, shows significant potential for achieving competitive performance with GBDTs, especially on the California Housing regression task and also on the Adult dataset, which validates the effectiveness of PLE. However, its performance on the other datasets was not the best.
Limitations
• The results suggest that NODE + PLE's performance depends on the kind
of dataset, as its performance was inconsistent across the datasets.
• The complexity of NODE + PLE resulted in longer training times and
higher memory requirements compared to Gradient-Boosted Decision
Trees (GBDTs) and simple MLPs. One instance is the Covertype and
Higgs datasets, which consistently ran out of memory during
training.
• The PLE layer relies on fixed binning strategies. In an environment
where feature distributions keep changing, this binning may not
adapt quickly, resulting in a decline in model performance.
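
The fixed-binning limitation can be made concrete with a minimal sketch of piecewise linear encoding. It assumes quantile-based bin edges fitted once on the training data, and it clips every component to [0, 1], which is a simplification chosen for this illustration. The function names `fit_quantile_edges` and `ple_encode` are hypothetical and are not taken from the project code.

```python
import numpy as np


def fit_quantile_edges(x_train, n_bins):
    """Compute fixed bin edges from training-set quantiles (assumed strategy)."""
    qs = np.linspace(0.0, 1.0, n_bins + 1)
    # np.unique sorts the edges and drops duplicates, so no bin has zero width.
    return np.unique(np.quantile(x_train, qs))


def ple_encode(x, edges):
    """Piecewise linear encoding of a 1-D feature.

    Component t is 0 below bin t, 1 above it, and the fractional
    position inside the bin otherwise. Clipping to [0, 1] is a
    simplification made for this sketch.
    """
    x = np.asarray(x, dtype=float)[:, None]  # shape (n, 1)
    left, right = edges[:-1], edges[1:]      # each of shape (T,)
    frac = (x - left) / (right - left)       # broadcast to shape (n, T)
    return np.clip(frac, 0.0, 1.0)
```

With edges `[0, 1, 2, 3]`, the value 1.5 is encoded as `[1.0, 0.5, 0.0]`. Any value above 3 is encoded as `[1.0, 1.0, 1.0]`, so if the feature distribution drifts beyond the range seen when the edges were fitted, distinct inputs become indistinguishable until the edges are refitted.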
Future Works
• It is still unclear when to use NODE + PLE: on large datasets or not,
and with which features?
• The highly anomalous results on the HELOC dataset need to be further
investigated. They suggest that PLE can memorize noise and needs to be
well regularized.
Bibliography
[1] Rishabh Agarwal, Levi Melnick, Nicholas Frosst, Xuezhou Zhang, Ben Lengerich, Rich Caruana, and Geoffrey E. Hinton. Neural additive models: Interpretable machine learning with neural nets. In Advances in Neural Information Processing Systems, 2021.
[2] Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, and Masanori Koyama. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1–10, Jul. 2019.
[3] Edesio Alcobaça, Felipe Siqueira, Adriano Rivolli, Luís P. F. Garcia, Jefferson T. Oliva, and André C. P. L. F. de Carvalho. MFE: Towards reproducible meta-feature extraction. Journal of Machine Learning Research, 21(111):1–5, 2020.
[4] Sercan Ö. Arik and Tomas Pfister. TabNet: Attentive interpretable tabular learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 6679–6687, 2021.
[5] Sabuhi Badirli, Xiaowen Liu, Zhao Xing, Arko Bhowmik, Khanh Doan, and Sathiya S. Keerthi. Gradient boosting neural networks: GrowNet. arXiv preprint, arXiv:2002.07971, 2020.
[6] Pierre Baldi, Peter Sadowski, and Daniel Whiteson. Searching for exotic particles in high-energy physics with deep learning. Nature Communications, 5(1):1–9, Sep. 2014.
[7] R.V. Borisov, T. Leemann, K. Seßler, J. Haug, M. Pawelczyk, and G. Kasneci. Deep neural networks and tabular data: A survey. 2022.
[8] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020.
[9] Tianqi Chen and Carlos Guestrin. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.
[10] Corinna Cortes, Xavier Gonzalvo, Vitaly Kuznetsov, Mehryar Mohri, and Scott Yang. AdaNet: Adaptive structural learning of artificial neural networks. In International Conference on Machine Learning, pages 874–883. PMLR, 2017.
[11] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021.
[12] James Dougherty, Ron Kohavi, and Mehran Sahami. Supervised and unsupervised discretization of continuous features. In Proceedings of the 12th International Conference on Machine Learning (ICML), pages 194–202, 1995.
[13] Dheeru Dua and Casey Graff. UCI machine learning repository. Online, 2017.
[14] S. Elsayed, D. Thyssens, A. Rashed, H. S. Jomaa, and L. Schmidt-Thieme. Do we really need deep learning models for time series forecasting? arXiv preprint, arXiv:2101.02118, 2021.
[15] FICO. Home equity line of credit (HELOC) dataset, 2019. Accessed: Jun. 15, 2022. [Online]. Available: https://community.fico.com/s/explainable-machine-learning-challenge.
[16] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.
[17] Jerome H. Friedman. Greedy function approximation: a gradient boosting machine. Annals of Statistics, 29(5):1189–1232, 2001.
[18] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
[19] Yury Gorishniy, Ivan Rubachev, and Artem Babenko. On embeddings for numerical features in tabular deep learning. In Advances in Neural Information Processing Systems, volume 35, pages 24991–25004, 2022.
[20] Yury Gorishniy, Ivan Rubachev, Valentin Khrulkov, and Artem Babenko. Revisiting deep learning models for tabular data. In Advances in Neural Information Processing Systems, volume 34, pages 18932–18943, 2021.
[21] Lucien Grinsztajn, Edouard Oyallon, and Gaël Varoquaux. Why do tree-based models still outperform deep learning on typical tabular data? Advances in Neural Information Processing Systems, 35:507–520, 2022.
[22] Hao Li, Hong Zhang, Xiaojuan Qi, Ruigang Yang, and Gao Huang. Improved techniques for training adaptive deep networks. In Computer Vision and Pattern Recognition, 2019.
[23] Xiang He, Ke Zhao, and Xiaowen Chu. AutoML: A survey of the state-of-the-art. Knowledge-Based Systems, 212:106622, 2021.
[24] Furong Huang, Jordan Ash, and Robert Schapire. Learning deep ResNet blocks sequentially using boosting theory. In International Conference on Machine Learning, 2018.
[25] Gao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, and Kilian Q. Weinberger. Multi-scale dense convolutional networks for efficient prediction. arXiv preprint, arXiv:1703.09844, 2017.
[26] Arlind Kadra, Marius Lindauer, Frank Hutter, and Josif Grabocka. Well-tuned simple nets excel on tabular datasets. In Advances in Neural Information Processing Systems, volume 34, 2021.
[27] Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. LightGBM: A highly efficient gradient boosting decision tree. 2017.
[28] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint, arXiv:1412.6980, 2014.
[29] Ron Kohavi and Mehran Sahami. Error-based and entropy-based discretization of continuous features. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD), pages 114–119. AAAI Press, 1996.
[30] Roman Levin, Valeriia Cherepanova, Avi Schwarzschild, Arpit Bansal, C Bayan Bruss, Tom Goldstein, Andrew Gordon Wilson, and Micah Goldblum. Transfer learning with deep tabular models. In International Conference on Learning Representations, 2023.
[31] J. Li, Y. Li, X. Xiang, S.-T. Xia, S. Dong, and Y. Cai. TNT: An interpretable tree-network-tree learning framework using knowledge distillation. Entropy, 22(11):1203, 2020.
[32] Duncan McElfresh, Saurabh Khandagale, Jose Valverde, Chaitanya V. Prasad, Goutham Ramakrishnan, Micah Goldblum, and Colin White. When do neural nets outperform boosted trees on tabular data? In Advances in Neural Information Processing Systems, volume 36, pages 76336–76369, 2023.
[33] D. Medvedev and A. D'yakonov. New properties of the data distillation method when working with tabular data. arXiv preprint, arXiv:2010.09839, 2020.
[34] Christopher Z. Mooney. Monte Carlo Simulation. SAGE, Newbury Park, CA, USA, 1997.
[35] Ben Peters, Vlad Niculae, and André FT Martins. Sparse sequence-to-sequence models. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), pages 1504–1519. Association for Computational Linguistics, 2019.
[36] Sergei Popov, Stanislav Morozov, and Artem Babenko. Neural oblivious decision ensembles for deep learning on tabular data. In International Conference on Learning Representations, 2020.
[37] Sergey Popov, Stanislav Morozov, and Andrey Babenko. Neural oblivious decision ensembles for deep learning on tabular data. arXiv preprint, arXiv:1909.06312, 2019.
[38] Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna V. Dorogush, and Andrey Gulin. CatBoost: unbiased boosting with categorical features. 2018.
[39] Nasim Rahaman, Aristide Baratin, Devansh Arpit, Felix Draxler, Min Lin, Fred A. Hamprecht, Yoshua Bengio, and Aaron Courville. On the spectral bias of neural networks. In Proceedings of the 36th International Conference on Machine Learning (ICML), pages 5301–5310. PMLR, 2019.
[40] D. Roschewitz, M.-A. Hartley, L. Corinzia, and M. Jaggi. IFedAvg: Interpretable data-interoperability for federated learning. arXiv preprint, arXiv:2107.06580, 2021.
[41] Ivan Rubachev, Artem Alekberov, Yury Gorishniy, and Artem Babenko. Revisiting pretraining objectives for tabular deep learning. arXiv preprint, arXiv:2207.03208, 2022.
[42] Debo Sahoo, Quang Pham, Jing Lu, and Steven C. Hoi. Online deep learning: Learning deep neural networks on the fly. arXiv preprint, arXiv:1711.03705, 2017.
[43] Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.
[44] Shai Shalev-Shwartz. SelfieBoost: A boosting algorithm for deep learning. In International Conference on Machine Learning, 2014.
[45] Xiang Shi, Johannes Mueller, Nicholas Erickson, Ming Li, and Alexander Smola. Multimodal AutoML on structured tables with text fields. In 8th ICML Workshop on Automated Machine Learning (AutoML), 2021.
[46] Ravid Shwartz-Ziv and Amit Armon. Tabular data: Deep learning is not all you need. 2021.
[47] Matthew Tancik, Pratul P. Srinivasan, Ben Mildenhall, Sara Fridovich-Keil, Nithin Raghavan, Utkarsh Singhal, Ravi Ramamoorthi, Jonathan T. Barron, and Ren Ng. Fourier features let networks learn high frequency functions in low dimensional domains. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
[48] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008, 2017.
[49] Le Yang, Xiaoyang Huang, Hao Zhang, Yu Wang, Zhiwei Liu, and Gao Huang. Resolution adaptive networks for efficient inference. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.
[50] Hanting Yu, Hao Li, Gang Hua, Gao Huang, and Honghui Shi. Boosted dynamic neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, 2023.
[51] Hanting Yu, Hao Li, Gang Hua, Gao Huang, and Honghui Shi. Boosted dynamic neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 10989–10997, 2023.
[52] Shuai Zhang, Lina Yao, Aixin Sun, and Yi Tay. Deep learning based recommender system: A survey and new perspectives. ACM Computing Surveys, 52(1):1–38, 2019.