Isotropic Deep Learning:
You Should Consider Your (Foundational) Biases
George Bird
Department of Computer Science
& Department of Physics and Astronomy
University of Manchester
[email protected]
May 1, 2025
Abstract
This position paper explores an alternative mathematical formulation, ‘Isotropic Deep Learning’, by analysing the implications of current functional forms in deep learning. Modern networks almost universally rely on foundational forms respecting a discrete permutation symmetry. However, this is an underappreciated choice of form, argued to introduce unrecognised biases in the absence of suitable alternatives. Initially, this discrete symmetry observation is promoted to a continuous, rotation-defined framework, then broadened to primitive sets defined by various other symmetries. This constitutes a new symmetry-led design-axis: rather than enforcing symmetry through model design, which transfers symmetry through the structure, it studies how foundational form symmetries inherently act on and interact within general architectures. One objective is a systematic approach to the consequences of network symmetry breaking, in addition to symmetry making, emerging from the primitive level. A further objective is to determine whether non-trivial expressibility is contingent on which function symmetries are preserved and, moreover, which are broken. The goal is to expose and leverage unintended biases by deducing principles applicable in broader contexts for beneficial computation. Proposed is a systematic reformulation of all foundational primitives into classes that respect particular groups, together with a determination of the resultant implications. This constitutes an inverted ontology framework where general symmetries are situated definitionally prior to neurons, rather than a permutation symmetry being deduced from them. This design axis motivates reselection of compositions upwards, as they underpin current constructions and may enable new models contingent on alternative foundations. Hence, the paper advocates for a distinctly bottom-up reformulation aiming to deduce general principles of broad leverage.
This is motivated by prior work demonstrating that current functional forms influence activation distributions: discrete symmetries in functions induce similar discrete structure in embedded representations through training. Thus, geometric artefacts can arise in learned representations solely due to human-imposed design choices rather than task-driven necessity. Therefore, the prevailing choice is shown to carry unappreciated and unintended task-agnostic biases. Moreover, there appears to be no compelling a priori justification for why such representations or functional forms are universally desirable; this paper hypothesises three testable pathologies of the current formulation with significant connections to mechanistic interpretability. Hence, this motivates the construction and analysis of alternative foundational primitives, aiming to alter geometric constraints on representations and improve performance. The underlying inductive biases of the isotropic approach may constitute a preferable default which could be adopted if a wide array of suitable and well-performing functions are developed. A variety of preliminary functions are proposed, including new activation functions, normalisers, and operations, and an audit is provided across various primitives in use. The symmetry-principled construction is then generalised, enabling a broad class of group-defined reformulations across primitives, positing a new foundational design axis with distinct inductive biases. Thus, Isotropic Deep Learning becomes just one case study among such parallel implementations of all models.
This initial group-theoretic generalisation of primitives is systematically extended upwards to encompass their hierarchical compositions, motivating its applicability across all scales in architectures. This yields an initial three generations of symmetry strength for categorisation in the framework. This extension recovers Geometric Deep Learning as the strongest generation when composing functions, ensuring model-scale compliance with the symmetry constraint derived from data for specialist applications. This substantially contrasts with this paper’s enquiry, which diverges through a bottom-up philosophy starting from primitives rather than working recursively top-down from model constraints. This establishes a further role of symmetry emerging within deep learning. This taxonomic formalism also encompasses the Parameter Symmetry approach as a distinct compositional case, studying the consequences of computational equivalences under reparameterisations deduced from current permutation-like primitives. In contrast, this work redefines primitives through a symmetry-led design-axis and investigates the ramifications more broadly, not just restricted to discrete parameter degeneracies. Hence, this “Taxonomic Deep Learning” approach reveals all three to be distinct special cases, characteristic of various compositional scales and strengths: a unification of contemporary approaches to symmetry in an intuitive, hierarchical, and complementary formalism. This may facilitate a better, comprehensive comparison and exploration of their interplay, while clarifying further regimes that may remain to be considered. Encouraged is a systematic audit into the influence of symmetry generally, but particularly the reformulation and comparison of various group-defined primitive sets. From this, the study of downstream phenomena can proceed after a primitive algebra is fixed. This can span determining representation biases, reassessing theorems contingent on prior primitives, optimisation, performance, and diverse new model architectures.
1 Introduction
Elementwise functional forms singularly dominate contemporary deep learning [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]. This is particularly evident in, but not limited to, activation functions, sometimes referred to as ‘ridge’ [11] activation functions. Activation functions are often displayed in univariate form [12, 13, 14], generally characterised by the form shown in Eqn. 1, with $\sigma$ being a placeholder activation function, e.g., ReLU ($f(x) = \max(0, x)$) [15], Tanh ($f(x) = \tanh(x)$), etc.
$$f : \mathbb{R} \to \mathbb{R}, \quad x \mapsto f(x) = \sigma(x) \tag{1}$$
However, this display choice obfuscates a crucial (standard) basis dependence. This dependence is made explicit in Eqn. 2, which displays the multivariate functional form of a given activation function. This should be considered a more implementation-faithful form¹. This reveals the functional form’s usually hidden $\hat{e}_i$ basis dependence. The multivariate form is depicted for an $n$-neuron layer, with activation vector $\vec{x} \in \mathbb{R}^n$. This standard basis dependence is arbitrary and appears to be largely a historical precedent, rather than a problem-aligned, intentional inductive bias. This is discussed further in App. F.
$$\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^n, \quad \vec{x} \mapsto \mathbf{f}(\vec{x}) = \sum_{i=1}^{n} \sigma(\vec{x} \cdot \hat{e}_i)\, \hat{e}_i \tag{2}$$
Due to this basis dependence, non-linear transformations differ angularly in effect [16, 17]. Therefore, this will be termed an anisotropic function, indicating this rotational asymmetry. Particularly, it could be termed a standard-anisotropic function, indicating its dependence on the standard basis. Due to the pervasive use of these functional forms, including activation functions, normalisers, initialisers, regularisers, optimisers, architectures, operations, and gradient clipping, amongst others, contemporary deep learning as a paradigm may consequently be termed a form of ‘anisotropic deep learning’. Despite its implications and prevalence, this choice of basis-dependent anisotropic form appears underappreciated and incidental in the development of most contemporary models.
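The basis dependence described above can be checked numerically. The following is a minimal sketch in standard-library Python, not code from the paper: the helper names (`elementwise`, `multivariate_form`, `rotate2d`) and the test vector are illustrative assumptions. It confirms that the usual elementwise implementation coincides with the sum over standard basis vectors in Eqn. 2, and that elementwise tanh does not commute with a generic rotation of the plane.

```python
import math

def elementwise(sigma, x):
    """Usual implementation: apply sigma to each standard-basis component."""
    return [sigma(xi) for xi in x]

def multivariate_form(sigma, x):
    """Eqn. 2 written out: sum_i sigma(x . e_i) e_i over the standard basis."""
    n = len(x)
    out = [0.0] * n
    for i in range(n):
        e_i = [1.0 if j == i else 0.0 for j in range(n)]
        coeff = sigma(sum(xj * ej for xj, ej in zip(x, e_i)))
        out = [o + coeff * ej for o, ej in zip(out, e_i)]
    return out

def rotate2d(theta, x):
    """Rotate a 2-vector anticlockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * x[0] - s * x[1], s * x[0] + c * x[1]]

x = [1.5, -0.5]          # arbitrary illustrative activation vector
theta = math.pi / 6      # arbitrary rotation angle, not a multiple of pi/2

# The two ways of writing the activation agree component by component.
same = all(
    math.isclose(a, b)
    for a, b in zip(elementwise(math.tanh, x), multivariate_form(math.tanh, x))
)

# Rotating before or after the activation gives different vectors.
lhs = elementwise(math.tanh, rotate2d(theta, x))   # f(R x)
rhs = rotate2d(theta, elementwise(math.tanh, x))   # R f(x)
rotation_gap = math.dist(lhs, rhs)

print(same, rotation_gap)
```

A non-zero `rotation_gap` is the angular dependence that motivates the term ‘anisotropic’; the gap vanishes only for special angles that map the standard basis onto itself.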
Anisotropic forms have largely become an unquestioned default, approaching an axiomatic-like definition of the general field rather than a considered choice. Hence, re-evaluating their impact and then systematically reformulating this foundational aspect of modern deep learning, with potentially wide-reaching consequences, is suggested to constitute distinct approaches, such as ‘Isotropic Deep Learning’, effectively based upon differing foundational, axiomatic-like, primitive definitions. This is emphasised by such choices underlying all downstream compositions, including those in models. It should be determined whether their respective phenomena, theoretical results, and various consequences are contingent upon these foundational choices. It is arguably the general composition of such primitives, including parameterised maps, which defines the current ontology of deep learning, contrasting it with other machine learning approaches.
The asymmetry in current non-linear transforms is usually about the standard (Kronecker) basis vectors, and frequently their negatives, $\{+\hat{e}_i, -\hat{e}_i\}\ \forall i \in \{1, \dots, n\}$ for $n$-width layers, and these are equal in privilege. This is often due to their elementwise application. Therefore, it can be said to distinguish the standard basis: a ‘distinguished basis’². This basis dependence is often overlooked in consequence, and this work argues that it acts as an implicit inductive bias for representational geometry; therefore, it must be evaluated. For example, the standard basis’ activation space distortions are visible in Fig. 1, showing the mapping of elementwise-tanh on a variety of test shapes.
Crucially, one can consider this choice of functional form to break a continuous rotational-like symmetry, and reduce it to a discrete rotational-like symmetry (alongside some specific mirrors). The latter is specifically referred to as a permutation ($S_n$) symmetry of the standard basis, and the former is an orthogonal ($\mathrm{O}(n)$) symmetry about the origin. In effect, if a function is treated in its multivariate form, in the current formulation of deep learning, it is equivariant to a permutation of the components of its vector decomposed in the standard basis; this also defines the notion and individuality of neurons. For (a representation of) an element of the permutation group, notated in shorthand by $P \in S_n$, the following equivariance relation holds: $\mathbf{f}(P\vec{x}) = P\,\mathbf{f}(\vec{x})$. However, permutation symmetry is a subgroup of the orthogonal symmetry proposed: $S_n \subset \mathrm{O}(n)$; therefore, this discrete permutation symmetry could be considered a broken continuous orthogonal symmetry³. Further details on the categorisation and nuance of such symmetries are discussed in Sec. 5.2.
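The equivariance relation above, and its failure for the wider orthogonal group, can be illustrated with a short check. This is a hedged sketch in standard-library Python. The function `radial_tanh` is a hypothetical norm-based map introduced here purely as an example of an O(n)-equivariant function; it is not asserted to be the isotropic-tanh that the paper defines in Sec. 3.1. All other names and the test vector are likewise illustrative assumptions.

```python
import itertools
import math

def elementwise_tanh(x):
    """Standard anisotropic activation: tanh on each standard-basis component."""
    return [math.tanh(xi) for xi in x]

def radial_tanh(x):
    """Hypothetical norm-based map (illustration only): rescale x by tanh(|x|)/|x|."""
    r = math.sqrt(sum(xi * xi for xi in x))
    if r == 0.0:
        return list(x)
    return [math.tanh(r) * xi / r for xi in x]

def permute(perm, x):
    """Action of a permutation P in S_n on the components of x."""
    return [x[p] for p in perm]

def rotate_xy(theta, x):
    """Rotation in the (e_1, e_2) plane of R^3: in O(3), but not a permutation."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * x[0] - s * x[1], s * x[0] + c * x[1], x[2]]

def close(a, b):
    return all(math.isclose(u, v, abs_tol=1e-12) for u, v in zip(a, b))

x = [0.3, -1.2, 2.0]   # arbitrary illustrative activation vector
theta = 0.7            # arbitrary rotation angle

# f(P x) == P f(x) for every one of the 3! permutations.
perm_equivariant = all(
    close(elementwise_tanh(permute(p, x)), permute(p, elementwise_tanh(x)))
    for p in itertools.permutations(range(3))
)

# The same relation tested with a rotation R in place of P.
elementwise_rot_equivariant = close(
    elementwise_tanh(rotate_xy(theta, x)), rotate_xy(theta, elementwise_tanh(x))
)
radial_rot_equivariant = close(
    radial_tanh(rotate_xy(theta, x)), rotate_xy(theta, radial_tanh(x))
)

print(perm_equivariant, elementwise_rot_equivariant, radial_rot_equivariant)
```

Under these assumptions the elementwise form passes the permutation check but fails the rotation check, whereas the norm-based map passes the rotation check, which is the sense in which the permutation symmetry is a broken orthogonal symmetry.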
Non-linearities are usually pivotal for the network’s ability to achieve a desired computation, as seen through the universal approximation theorem’s [18] explicit dependence on the form of the activation function [19]. Non-linearities produce differing local transformations, such as stretching, compressing, and generally reshaping a manifold, displayed elegantly in Olah [20]⁴. Consequently, the network may be expected to adapt by moving representations to geometries about these distinguished directions, using specific localised mappings to achieve the desired computation, discussed further in Sec. 2.2. Hence, an anisotropy about distinguished vectors may be expected to be induced into the activation distribution, through optimisation in general networks. This anisotropic inductive bias on representations appears to be reinforced directly through most functional forms, indirectly through many optimisers, and is inherent in the connectivities of many architectures. Hence, it seems largely systematic to contemporary deep learning through each aspect of this triad, all of which typically share the same underlying and characterising permutation symmetry. Each is hypothesised to contribute to such representational structures.
¹ Softmax has an extra denominator term, but still displays the basis-dependent nature of elementwise forms.
² This is suggested as a generalisation from a ‘privileged basis’ discussed in Elhage et al. [16]. ‘Distinguished directions’ may better reflect how representations can be encouraged to or deterred in alignment about these directions; whereas ‘privileged basis’ would suggest a greater preference for alignment. The term ‘basis’ will often be retained despite the set of ‘distinguished vectors’ potentially being under-/overcomplete for spanning the activation space, as demonstrated by Bird [17].
³ The mirror transforms in the group $\mathrm{O}(n)$ appear to result in no change in (single-argument) functional forms from those defined through the pure rotation group $\mathrm{SO}(n)$. Hence, $\mathrm{O}(n)$ is used since Isotropic deep learning automatically respects this larger group.
⁴ Olah also stated disillusionment with the elementwise form for other reasons in this article.
Figure 1: Left shows a 2-dimensional plane, $\mathbb{R}^2$, populated with various shapes: black concentric circles, green lines through the origin, red parallel lines and, in faint black, the standard (cartesian) coordinate axes $\hat{e}_1$ and $\hat{e}_2$ (which remain untransformed). If this space is then imaged through elementwise-tanh, the individual pointwise coordinates making up the shapes are passed through the standard tanh activation. The resultant shapes are shown in the centre plot. The rightmost plot is similar to the centre plot, but for the so-called isotropic-tanh presented in Sec. 3.1. One can see that the objects in the centre plot are distorted around the basis directions, whilst in the rightmost plot, they are not distorted due to the basis directions. For example, the green lines are significantly curved towards the corners of the boundary. An interactive demonstration of these functions is available here.
The direct effect has been empirically demonstrated in activation functions [17]: through training, the discrete symmetry of the functional forms induces a broken symmetry in the activations, which transforms with transformations of the distinguished directions of the form. Since these non-linear zones are centred around the distinguished directions, the embedded representations are expected to adopt advantageous angular arrangements with respect to the arbitrarily imposed geometry of the distinguished basis. For example, they appear to move towards the non-linearities’ extremums, aligned, anti-aligned, or other geometries [16], through training [17]. This may correspond to a local, dense, sparse coding [21] or superposition [16], respectively. This indicates that such directions mark out an absolute reference structure about which representations are observably shaped. Therefore, the network has adapted its representations through optimisation due to the properties of the foundational functional form choices present.
These general representational biases are entirely distinct considerations from enforcing a specific end-to-end symmetry in a network. In such cases, the form is leveraged to preserve a data structure through the network for a targeted application, where representations predictably transform or remain unchanged with respect to the group. In these circumstances, the symmetries do not ‘act’ on the network internally as a bias in the general manner being suggested in this work; it is these latter considerations that are argued to be unappreciated and unintentional, in universal settings. Thus, these are differing approaches to symmetry’s role in deep learning: a task-driven versus a function-driven approach. Due to this separate motivation, this formalism is a tangent line of enquiry, and the group-theoretic reasoning emerged independently in response. It is argued to be an important consideration, regardless of the underlying data structure, and hence its applicability is considered general. Nevertheless, such approaches both draw on group-theoretic roots and can be unified under a formalism presented in Sec. 5.2, which may enable beneficial cross-considerations and sharing of tooling at times.
Moreover, this function-driven causal hypothesis aids in explaining the observed tendency of distinguished-direction alignment. This is the hypothesis underlying the encouraged position: functional forms should be deliberate and carefully considered design choices, with a suitably optimal and minimally harmful default since they can induce a representational structure not required by the task. Currently, anisotropic primitives appear to induce a human-imposed representational collapse onto the distinguished directions. Hence, this was shown to frequently not be a task-necessitated collapse, but instead a task-agnostic structure induced by function primitives. There appears to be little justification for why this particular form and induced structure is universally desirable, with several key negative implications predicted in Sec. 2. Without a priori justification, this inductive bias may be detrimental to computation; therefore, unconstraining the activation is argued to be generally preferable. In addition, the added structure in functional forms, which produces these distinguished directions, may be considered a needless additional assumption for some applications applying deep-learning models.
Throughout the rest of this paper, it is argued that a departure from this anisotropic functional form paradigm towards the isotropic reformulation may be generally preferable as a universal inductive bias, unless supported by task-aligned justification for differing primitive algebra. This paper encourages consideration of these choices when designing a model generally, alongside the usual architectural toolkit. In particular, isotropic choices, initially led by a basis independence principle, are argued to unconstrain the representations into more optimal arrangements for general tasks and architectures, free from imposed discrete structure. Some instances where isotropy may be particularly beneficial, such as the amendments for self-attention, are discussed in App. D. However, the development of functions, then models, which suitably leverage isotropy may require substantial time to parallel the existing approach to deep learning in empirical results, since they are based upon a fundamentally differing foundational set of primitives. These will then require downstream verification of the argued optimality. Additionally, many contemporary architectures may be a product of selection over anisotropic primitives and anisotropic benchmarks, of which their intrinsic anisotropies may be indicative. Such outcomes of selection may not pertain to isotropically defined primitives or benchmarks (or other taxonomies). Therefore, reselection of architectures from a clean-slate foundational approach may be necessitated, but generally productive ideas could be analogised. Hence, due to these considerations, such reformulations may require considerable development before they mature into practical implementations. The understanding of such phenomena, and the potential resultant impact of this, is argued to be a worthwhile exploratory avenue.
The most fundamental addition of this work is that these inductive bias considerations motivate a broader symmetry-unifying construction of functional forms, discussed in Sec. 5. This produces a taxonomic class of deep learning extending bottom-up from sets of primitives defined through their group structures. Isotropic and Anisotropic deep learning represent just two among the array of possible groups capable of generating functional form branches using the tools presented, a seemingly rich and unexplored proposal. Exploring such alternatives may offer better optimised functional forms beyond those discussed in this paper. Thus, the specific case study of Isotropic deep learning should not detract from the wider scope of primitives defined over the broader symmetry taxonomy and general groups.
This taxonomic approach extends a group-theoretic formalism for considering and imposing symmetry constraints on functional forms, thereby organising them under distinct branches, each generating a complete set of primitives. This taxonomy is organised by group criteria across three degree-tiers/generations and three flavours, discussed further in Sec. 5.2. Through these group-defined sets, one can choose which primitive class to implement. This is argued to approach an axiom-like choice in ramifications, since it is these primitives which are later composed into all downstream models. Hence, the primitive selected occupies an underivable base choice preceding any model design, and its group-defined form is usually assumed instead of being a leveraged design-axis. Therefore, reformulation requires a broad reevaluation, extending from the reselection of models following from primitive changes and the investigation of each one’s resultant emergent phenomena, to the reanalysis of theorems predicated on the current form of primitive, among numerous further implications. Comparison between choices may be advantageous in yielding more fundamental insights into the innate properties of deep learning.
Such sets of primitives can be constructed a priori for general applications; yet, the taxonomy also indicates how arbitrary graphs can inherently break such symmetries. This symmetry-breaking is argued to be either useful, if its influences are leveraged correctly, or perhaps detrimental, if it occurs haphazardly; therefore, establishing such a link is critical, and this paper advocates for it being a careful design choice. This connection is achieved through evaluating automorphisms of an arbitrary graph structure. Typically, one would then set the functional form symmetry constraints as a subset of the available automorphisms to fix a primitive class from those available⁵. Hence, generally, forms are applied which remain unbroken by the inherent connectivity. This, in turn, would specify how to typically apply the primitive form, such as channel-wise for convolution, since these connectivities are not inherently symmetry broken by the graph’s connectivities. This motivates the expanded set of generation strengths for which group-theoretic constraints can apply. ‘Closures’ are the broadest functional class, which would typically be chosen and derived from automorphisms of an arbitrary graph and can then be selectively elevated to stronger generations in design for narrower classes. Geometric deep learning’s considerations arise when a single group is elevated to the strongest level network-wide, and hence provides a restriction on the subset of architectures and functions which can provide these initial closures. However, it is argued that awareness of the generalised group-theoretic choices and their consequences may be beneficial in universal settings.
Due to the argued pervasive applicability of such a symmetry-formalism approach across foundational primitives, the terms "branch" or "fork" are used, e.g. "Isotropic-fork". These are felt to be appropriate descriptors for the groups of distinct forms produced, as well as downstream architectures, theorems, and phenomena contingent upon them. This indicates that the use of graph-based computation, from the continued use of linear algebra, is preserved; however, all intermediate functions have parallel implementations that respect their new chosen symmetries. These alternative classes of primitives would typically diverge substantially from the form of contemporary functions, and likely their respective models and consequences, warranting a differing subclassification system. Encouraged are differing semi-autonomous subdisciplines of exploration to determine the implications and leveragability of each, a systematic analysis revealing how they may incur different biases through their various reformulations. However, the consistent use of linear algebra and a primitive in a layered structure makes it appropriate to continue grouping them under the "deep learning" heading, rather than as a distinct machine learning approach. Nevertheless, it is argued that in most of the meaningful ways, the forks may be largely distinct, likely preferring differing architectures, applications, and resultant phenomena (such as interpretability consequences). This highlights a potentially broader ontology of what may constitute a deep learning system, where these taxonomically-organised and underpinning choices offer axiomatic-like branching within the field.
This axiomatic-like nature is underscored by each symmetry functional form requiring a respective Universal Approximation Theorem due to the current theorems [18, 22, 19] being contingent on the contemporary activation function primitive. This should be undertaken, for each new class of primitives, providing existence proofs for dense networks as standard. It is also hypothesised that theoretical efforts may be able to extend this through the group structure, enabling the determination of which symmetry families may yield useful functional forms a priori. This speculation would constitute a more overarching Universal Approximation Theorem, and would likely be a desirable long-term objective in any case. This could be termed a ‘Group Universal Approximation Theorem’ GU(A)T for discussion purposes. Additionally, the symmetry automorphisms are derivable from arbitrary graphs, including intrinsic privileged directions. This extended UAT approach could perhaps be further developed to derive bounds for a given symmetry on any given graph structure (generalising the typical dense network assumptions), which could be referred to as a ‘Group Universal Bound Theory’ GU(B)T for discussion purposes. Both remain conjectures and may serve as long-term aspirational objectives, representing a beneficial theoretical direction to be pursued, particularly in terms of the proposed symmetry formalism. This may aid in further narrowing down which groups are suitable for deep learning and enable a better directed search, alongside inductive bias considerations.
⁵ Then one would choose to implement a beneficial instantiation from a selection of functions which abide by the chosen functional form.
Overall, the approach outlined, which also utilises symmetry and a new taxonomic organisation, stems from a group-theoretic formalism as a definitional tool for all primitives. Then the implications of differing sets of primitive-level algebras are extended upwards in general compositions, considering their respective inductive biases, generalised resultant model architectures, scale-interplays, theorems, and characteristic phenomena. This is distinct from both Geometric Deep Learning’s Invariant/Equivariant networks [23, 24, 25, 26, 27] as well as recent observations, and leveraging, of Parameter Symmetries [28, 29]. The former is a task-driven end-to-end respect of a particular symmetry, such that the entire model transforms predictably under its action. Hence, it is a model-level consideration that can extend down to achieve this, such as into layer maps for group convolution and further. The latter is a compositional consideration that concerns the computational equivalences under reparameterisation of surrounding affine layers, deduced from contemporary permutation-like activation function algebras. All can be similarly united under the overarching formalism of Sec. 5.2. Hence, the taxonomic system, with varying generation strengths, flavours, and scales, can be shown to recover the new and prior approaches to symmetry in deep learning as particular regimes/philosophies of an overarching group-theoretic perspective. This unification also enables novel findings when considering differing compositional regimes and layerwise constructions that have not been researched thus far.
In conclusion, this paper argues that the existence of such a definitional choice of functional forms, and their consequences extending upwards, has remained a substantially underappreciated approach and should be investigated thoroughly with an aim to leverage findings generally. Such choices and their effects are typically obfuscated, neglected, and seldom [30] questioned in general model-design. A basis dependence has resulted in an internal absolute frame that appears to have become ubiquitous throughout most primitives in nearly every model. This may be partly a result of accidental notational oversimplification, suppressing basis factors, enabling the consequences of that form to remain obscure and unquestioned. There is also a decades-long history of successful and practical precedent behind it, which has become entrenched in even hardware alignment, having formed around and potentially having also shaped the wider practice. Additionally, its practicality has so far surfaced minimal apparent tensions in observations⁶. Hence, it is argued that it has become largely an unintended default, as there is a lack of suitable alternative forms, much less primitive sets, in wide circulation and unrecognised consequences arising from the current form. However, a causal link between the current arbitrary basis’s transforms and internal representations has recently been empirically demonstrated as significant [17]. An influence on models’ internal representations in turn will alter their behaviour and likely downstream performance, where it is hypothesised to display some pathological consequences. These motivate the need for a reconsideration.
Hence, alternative choices and establishing their implications can now be systematically explored and developed. This includes a reselection of instantiations for each form to leverage their symmetries better, which has not been undertaken even when alternative forms have sometimes surfaced. Well-justified and understood decisions can then be drawn from a range of choices. A suitable, minimally detrimental default can also be selected, and specialist choices can be made for particular applications. This culminates in an extended formalism which provides a unifying perspective for several naturally emerging group-theoretic approaches. This has had the effect of demarcating complementary but distinct regimes and scales to consider. Pursuing this may eventually yield cross-disciplinary findings if these disparate approaches are bridged, while other scales and compositions may yield further insights beyond what is currently established. This is argued to be a good motivation for considering this broad and unifying approach.
The following section discusses the hypothesised pathological consequences, which initially motivated such reconsiderations.
2 Predicted Detriments of Anisotropy
This section outlines a non-exhaustive set of predicted pathologies that anisotropic functional forms may introduce. These mainly centre on the role of the activation functions, since this is the area that has been primarily explored thus far. However, similar considerations may be equally applicable to other primitives (particularly quantisation). To the author’s knowledge, some of these failure modes are newly characterised phenomena, such as the so-called ‘neural refractive problem’. They may indicate that, if Isotropic Deep Learning becomes substantially mature, it may form a better default inductive bias unless an alternative is task-necessitated. A further intuition is expressed in App. F.1.
2.1 The Neural Refractive Problem
The ‘neural refractive problem’ describes how linear and origin-intersecting trajectories of activations may converge or diverge from their initial path after an activation function is applied. This is analogous to a light ray refracting through optically varying media or boundaries.
This phenomenon appears to occur in all anisotropic activation functions examined to date. The ‘refraction effect’ typically occurs more significantly at large magnitudes, potentially producing a failure mode under network extrapolation. Neural refraction is demonstrated by curvature of previously straight origin-intersecting lines (in green) in the centre plot of Fig. 1, but is absent in the rightmost plot of the same figure. This refraction is a mathematical consequence of the form, but its impact on representations and pathological nature requires validation.
⁶ Except, perhaps, in observations of interpretability phenomena that may emerge from the form’s structure.
Mathematically, this phenomenon has several representations: a magnitude-varying ‘dynamic refraction’ shown in Eqn. 3, or differentially in Eqn. 4. Also defined is a ‘static refraction’, shown in Eqn. 5. Geodesic-based constructions may also be defined. These are intended only as provisional formalisms of the phenomenon. These current formalisms are described for a multivariate activation function $\mathbf{f}$ and vector $\vec{x} = \alpha\hat{x}$, where $\hat{x}$ is a unit vector. This relation may be satisfied for a single direction, a subset, or the space of all directions, $\hat{x} \in X \subseteq S^{n-1}$. The relations generally show how the activation function alters the direction of its input vector in an anisotropic manner.
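$$\exists \hat{x} \in S^{n-1},\ \exists \alpha_1 \neq \alpha_2 > 0 : \quad \frac{\mathbf{f}(\alpha_1 \hat{x})}{\left\| \mathbf{f}(\alpha_1 \hat{x}) \right\|} \neq \frac{\mathbf{f}(\alpha_2 \hat{x})}{\left\| \mathbf{f}(\alpha_2 \hat{x}) \right\|} \tag{3}$$
$$\exists \hat{x} \in S^{n-1},\ \exists \alpha_0 : \quad \left. \frac{\partial}{\partial \alpha} \frac{\mathbf{f}(\alpha \hat{x})}{\left\| \mathbf{f}(\alpha \hat{x}) \right\|} \right|_{\alpha_0} \neq \vec{0} \tag{4}$$
$$\exists \hat{x} \in S^{n-1},\ \exists \alpha : \quad \frac{\mathbf{f}(\alpha \hat{x})}{\left\| \mathbf{f}(\alpha \hat{x}) \right\|} \neq \hat{x} \tag{5}$$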
It can be seen that along a straight-line trajectory in direction $\hat{x}$, the result of the activation function is a curved line if dynamically refracted. Therefore, if the linear feature hypothesis is followed, then every linear feature, in refracted directions, becomes curved following the activation function. The network may exploit some of this curvature to construct new linear features in the subsequent layers; however, there may be many instances where this curvature is detrimental to established semantics. The network may lose semantic separability, produce magnitude-based semantic inconsistency or produce compensatory maladaptations in later layers. However, due to the non-linear nature of refractions in generalised directions, which continues to be compounded over subsequent layers, the network may struggle to mitigate the effect. Hence, these maladaptations may fail disproportionately for out-of-distribution samples. This may hinder the generalisation performance of the network and indicates a mode which may make representations more susceptible to adversarial attacks. Isotropic choices would resolve the refraction, potentially resulting in fewer such adaptations. An illustrative example of neural refraction is shown in Fig. 2.
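The distinction between the dynamic and static definitions (Eqns. 3 and 5) can be checked numerically. The following is a minimal sketch, not taken from the paper: the helper names, the Leaky-ReLU slope of 0.1 and the test directions are all illustrative assumptions. It compares the output direction of elementwise Tanh and Leaky-ReLU at a small and a large input magnitude along a fixed direction.

```python
import math

def elementwise_tanh(x):
    """Standard (anisotropic) Tanh applied to each component."""
    return [math.tanh(v) for v in x]

def leaky_relu(x, slope=0.1):
    """Elementwise Leaky-ReLU; the slope is an arbitrary illustrative choice."""
    return [v if v >= 0.0 else slope * v for v in x]

def unit(x):
    """Normalise a non-zero vector to unit length."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]

def output_direction(f, alpha, x_hat):
    """Direction f(alpha * x_hat) / ||f(alpha * x_hat)|| used in Eqns. 3 and 5."""
    return unit(f([alpha * v for v in x_hat]))

# Tanh: the output direction depends on the magnitude alpha (dynamic refraction).
x_hat = unit([1.0, 0.5])
tanh_small = output_direction(elementwise_tanh, 0.1, x_hat)
tanh_large = output_direction(elementwise_tanh, 10.0, x_hat)

# Leaky-ReLU: the output direction differs from the input direction,
# but does not change with alpha (static refraction only).
x_neg = unit([1.0, -1.0])
leaky_small = output_direction(leaky_relu, 0.1, x_neg)
leaky_large = output_direction(leaky_relu, 10.0, x_neg)

print("tanh :", tanh_small, tanh_large)
print("leaky:", leaky_small, leaky_large, "input:", x_neg)
```

Under these assumptions, the Tanh direction stays close to the input direction at small magnitude and drifts towards the orthant diagonal at large magnitude, whereas the Leaky-ReLU direction is deflected by a fixed amount, in line with the static and dynamic cases described for Fig. 2.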
[Figure 2 panels, each on axes spanning -1 to 1: Identity, Leaky-ReLU, Standard-Tanh, Softmax, Isotropic Tanh]
Figure 2: Displays the $\mathbf{f}: \mathbb{R}^2 \to \mathbb{R}^2$ maps, for the identity map (leftmost), standard Leaky-ReLU (centre-left), standard Tanh (centre), Softmax (centre-right), and isotropic-Tanh (rightmost). These maps transform various objects within the space, including two lines with zero-intercept, shown in red and green, as well as sets of horizontal and vertical lines in pale grey. The three centre plots demonstrate 'neural refraction' in its static form for Leaky-ReLU and its dynamic form for standard Tanh, as well as a more general case for Softmax. The identity plot and isotropic plot do not cause such refractions of these objects.
Postulated to be especially detrimental, in both refraction cases, is the loss of semantic separability. If two distinct trajectories, representing different semantics, are transformed into curves which intersect or converge, then the separability of these concepts is lost or misrepresented. For example, suppose one direction is a linear feature for the presence of a dog in an image, whilst the other is for a horse. In that case, if these activations are of sufficient magnitude where the activation function causes convergence, the identity of the activation's meaning can be misconstrued. This is in addition to the aforementioned deflection of linear trajectories, which may reduce the effectiveness of following linear transforms to effectively separate representations.
The convergence may be particularly consequential for functions such as Sigmoid and Tanh, since large magnitude inputs end up at particular limit points (discussed as trivial representational alignments in Bird [17]). For example, Tanh produces the limit points shown in Eqn. 6 when $\hat{x} \cdot \hat{e}_i \neq 0$ for all $i$. If there exists an $i$ such that $\hat{x} \cdot \hat{e}_i = 0$, then the transformed vector has a $0$ in the corresponding index. Therefore, all vectors end up at limit points with sufficient magnitude when using elementwise-Tanh or Sigmoid. A fully-connected layer would typically only effectively separate two such converging directions at a time, which are then further curved by a subsequent activation function.
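$$\lim_{\substack{\alpha \to \infty \\ \forall i,\ \hat{x} \cdot \hat{e}_i \neq 0}} \mathbf{f}(\alpha \hat{x}) = \sum_{i=1}^{N} \tanh(\alpha \hat{x} \cdot \hat{e}_i)\, \hat{e}_i \approx \sum_{i=1}^{N} \pm \hat{e}_i = (\pm 1, \cdots, \pm 1)^T \tag{6}$$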
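The convergence in Eqn. 6 can be illustrated with a short numerical sketch. The two direction vectors below, named after the dog and horse example in the text, and the magnitude of 50 are hypothetical values chosen only for illustration.

```python
import math

def elementwise_tanh(x):
    return [math.tanh(v) for v in x]

def predicted_limit_point(x_hat):
    """Large-magnitude limit of elementwise Tanh: the sign of each component, zeros kept."""
    return [float((v > 0) - (v < 0)) for v in x_hat]

def scale(alpha, x):
    return [alpha * v for v in x]

# Two hypothetical feature directions in the same orthant (names are illustrative only).
dog = [0.9, 0.3, 0.0]
horse = [0.3, 0.9, 0.0]

alpha = 50.0
dog_out = elementwise_tanh(scale(alpha, dog))
horse_out = elementwise_tanh(scale(alpha, horse))

# Largest componentwise gap between the two transformed activations.
gap = max(abs(a - b) for a, b in zip(dog_out, horse_out))

# Each component tends to -1, 0 or +1, giving 3**n candidate limit points.
n = len(dog)
num_limit_points = 3 ** n

print(dog_out, horse_out, gap, num_limit_points)
```

In this sketch both inputs are mapped to essentially the same point, so a subsequent layer receives almost no information distinguishing the two directions at this magnitude.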
Consequently, semantic separability is lost for large magnitude representations except for $3^n$ discrete limit points of Tanh and Sigmoid. Therefore, embedded activations may be expected to align with these limit points. This explains some results empirically observed by Bird [17]. Similarly, ReLU has one distinct limit point, $\vec{0}$, but otherwise an orthant unaffected by neural refraction. It is speculated that this is an additional reason for the success of ReLU, as only a subset of directions experiences the neural refraction phenomenon. Furthermore, this suggests an advantage of Leaky-ReLU: despite featuring static refraction, directions do not become overlapped, so semantic separability is retained. The network may otherwise 'expend' training time on producing robust semantic separability, having a potentially discretising effect on representations. This would be a needless compensatory adaptation, which may lower representational capacity and extend training as a result of inefficiency.
More generally, the dynamic deflection of trajectories may cause semantic ambiguity for the network, where only samples interpolable from training samples are reliably semantically identifiable. Particularly, the more significant the deflection, the greater the semantic ambiguity may be expected due to the resultant position of representations becoming unpredictable. Therefore, a magnitude-dependent semantic inconsistency may arise due to such deflections. A deflection function can be a trivial diagnostic measure, defined by Eqn. 7 for a particular activation function.
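$$\theta(\alpha; \hat{x}, \mathbf{f}) = \arccos\left( \frac{\mathbf{f}(\alpha \hat{x}) \cdot \hat{x}}{\lVert \mathbf{f}(\alpha \hat{x}) \rVert} \right) \tag{7}$$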
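As a hedged illustration of how Eqn. 7 might be used as a diagnostic, the sketch below evaluates the deflection angle of elementwise Tanh at three magnitudes along one direction. The function names, the direction and the magnitudes are assumptions made for this example, not values from the paper.

```python
import math

def deflection_angle(f, alpha, x_hat):
    """Angle in radians between unit vector x_hat and f(alpha * x_hat), following Eqn. 7."""
    y = f([alpha * v for v in x_hat])
    norm_y = math.sqrt(sum(v * v for v in y))
    cos_theta = sum(a * b for a, b in zip(y, x_hat)) / norm_y
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_theta)))

def elementwise_tanh(x):
    return [math.tanh(v) for v in x]

def identity(x):
    return list(x)

# Unit vector along (1, 0.5); an arbitrary direction off the axes and off the diagonal.
norm = math.sqrt(1.0 ** 2 + 0.5 ** 2)
x_hat = [1.0 / norm, 0.5 / norm]

angles = [deflection_angle(elementwise_tanh, a, x_hat) for a in (0.1, 1.0, 10.0)]
identity_angle = deflection_angle(identity, 10.0, x_hat)

print("tanh deflection (rad):", angles)
print("identity deflection (rad):", identity_angle)
```

For this direction the deflection grows with magnitude and approaches the angle between the input direction and the orthant diagonal, while the identity map gives an angle of approximately zero.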
This may result in an additional mode of degraded performance of a network, especially on out-of-training-distribution samples. For example, suppose a linear feature roughly represents the quantity of cows in a field. In that case, the network may fail to extrapolate its function when an anomalous number of cows is present. This would be due to a considerably large magnitude of the linear feature, which is typically deflected significantly. Therefore, the deflection is unprecedented and becomes uninterpretable. The activation function would result in a loss of semantic consistency. Consequently, a network seeking to preserve linear features may constrain activation magnitudes through training to regions where the non-linear response is approximately predictable and stable, thereby avoiding the damaging consequences of neural refractions. Moreover, the network may move representations towards locally linear positions, limiting the beneficial transformative properties of the non-linearity.
Current angular anisotropies fundamentally cause the refraction phenomenon. If compression and rarefaction occur in certain angular regions, linear features will be deflected in various ways. A fix to this is to introduce isotropy (or norm-based forms of quasi-isotropy). This is the initial motivation for developing the approach. Isotropy does not prevent compression and rarefaction of activation distributions in general, as a bias can be added to reintroduce these useful phenomena predictably. It is argued that these issues only arise when they affect linear features, not affine ones, in a potentially unpredictable and thus semantically uninterpretable manner. Applying this to all affine features would be restrictive enough to return linear approximations only; whilst shifting the origin of linear features could be considered through a symmetry-broken transformation.
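The phenomenon is eliminated from networks by rearranging Eqn. 5 as shown in Eqn. 8, then applying the simplification $\lVert \mathbf{f}(\alpha \hat{x}) \rVert = \sigma(\alpha)$ in Eqn. 9.
$$\mathbf{f}(\alpha \hat{x}) = \lVert \mathbf{f}(\alpha \hat{x}) \rVert\, \hat{x}' \tag{8}$$
$$\mathbf{f}(\alpha \hat{x}) = \sigma(\alpha)\, \hat{x}' \tag{9}$$
Finally, choosing $\hat{x}' = R\hat{x}$ for isotropy and $R\hat{x} = I_n \hat{x} = \hat{x}$ for simplicity gives Eqn. 10.
$$\mathbf{f}(\alpha \hat{x}) = \sigma(\alpha)\, \hat{x} \tag{10}$$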
In standard notation, Eqn. 10 can be rewritten into the functional form of isotropic activation functions shown in Eqn. 11. This should be a piecewise function, defined using the identity at $\vec{x} = \vec{0}$, but this is suppressed for simplicity. Alongside an appropriate smoothness condition on the Jacobian, this ensures the apparent 'singularity' at $\vec{x} = \vec{0}$ is only a coordinate singularity, present only due to how the functional form is denoted. Future work involves establishing a universal approximation theorem for this functional form, which is currently an ongoing area of research for the author. This form can be generalised to other functional forms in App. A, but is discussed briefly below using symmetry equivariance.
$$\mathbf{f}: \mathbb{R}^n \to \mathbb{R}^n, \quad \vec{x} \mapsto \mathbf{f}(\vec{x}) = \sigma(\lVert \vec{x} \rVert)\, \hat{x} \tag{11}$$
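A minimal sketch of the functional form in Eqn. 11 is given below, assuming $\sigma = \tanh$ and handling the origin with the identity as described in the text. The function name and the example vector are illustrative assumptions, and no smoothness condition on the Jacobian is enforced here.

```python
import math

def isotropic_activation(x, sigma=math.tanh):
    """Sketch of Eqn. 11: f(x) = sigma(||x||) * x / ||x||, with the identity used at x = 0."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        # Piecewise case at the origin (assumes sigma(0) = 0, as for tanh).
        return list(x)
    scale = sigma(norm) / norm  # the non-linearity is evaluated once per vector
    return [scale * v for v in x]

x = [3.0, 4.0]
y = isotropic_activation(x)
y_norm = math.sqrt(sum(v * v for v in y))

print("output:", y)
print("output norm:", y_norm, "expected:", math.tanh(5.0))
```

The output keeps the input direction exactly and only rescales the magnitude, so no refraction in the sense of Eqns. 3 to 5 occurs for this form.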
The form of Eqn. 11 is $O(n)$ time for $\mathbb{R}^n$ activation vectors, and only computes the non-linear term once, unlike the $n$ computations of the non-linear term in current activation functions. In addition, radial basis functions suffered from an $O(nm)$ cost. This bilinear scaling arguably impeded the widespread adoption of this functional form in ever-larger models, displayed in Eqn. 12 and Tab. 3.1.
$$\mathbf{f}: \mathbb{R}^n \to \mathbb{R}^m, \quad \vec{x} \mapsto \mathbf{f}(\vec{x}) = \sum_{i=1}^{m} \sigma(\lVert \vec{x} - \vec{c}_i \rVert)\, \hat{e}_i \tag{12}$$
Isotropy can be generalised to a result of rotational equivariance of the function, expressed as a condition in Eqn. 13. This uses a commutator bracket for convenience, holding $\forall R \in O(n)$. This bracket can be used to similarly define the current anisotropic discrete rotational (permutation) paradigm, by using the transform $\forall P \in S_n$ instead of the rotation. Connecting forms of machine learning through symmetry is further elaborated on in Sec. 5, including the apparent functional form indifference between $O(n)$ and $SO(n)$. Hence, these constraints constitute an effective definitional tool for generating and categorising all primitives across various taxonomies. The equivariance relation may be recognised as superficially similar to equivariant neural networks, due to an analogous equivariance relation; however, the differences in both implementation and motivations are substantial, and discussed further in App. E.1.
$$[R, \mathbf{f}] = (R\mathbf{f} - \mathbf{f}R) = \vec{0} \tag{13}$$
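The relation may be more familiar as $\mathbf{f}(R\vec{x}) = R\,\mathbf{f}(\vec{x})$. This relation only applies to single-argument functions and requires generalising to more circumstances, shown in App. A. A similar condition suffices: $\mathbf{f}(R\vec{x}_1, \cdots, R\vec{x}_N) = R\,\mathbf{f}(\vec{x}_1, \cdots, \vec{x}_N)$ for $\mathbf{f}: \bigotimes^N \mathbb{R}^n \to \mathbb{R}^n$.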
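The bracket condition of Eqn. 13 can be probed pointwise. The sketch below is an illustration under stated assumptions: it works in two dimensions, uses a rotation angle of 0.7 radians and a single test vector, and the helper names are hypothetical. It evaluates $\lVert R\,\mathbf{f}(\vec{x}) - \mathbf{f}(R\vec{x}) \rVert$ for an isotropic Tanh and for elementwise Tanh, and repeats the elementwise case with a permutation in place of the rotation.

```python
import math

def iso_tanh(x):
    """Isotropic Tanh in the form of Eqn. 11, with the identity at the origin."""
    norm = math.sqrt(sum(v * v for v in x))
    return list(x) if norm == 0.0 else [math.tanh(norm) * v / norm for v in x]

def elementwise_tanh(x):
    return [math.tanh(v) for v in x]

def rotation(theta):
    """Return the 2D rotation by theta as a function on vectors."""
    c, s = math.cos(theta), math.sin(theta)
    return lambda x: [c * x[0] - s * x[1], s * x[0] + c * x[1]]

def swap(x):
    """The non-trivial permutation of the 2D standard basis."""
    return [x[1], x[0]]

def commutator_norm(g, f, x):
    """|| g(f(x)) - f(g(x)) ||: the bracket of Eqn. 13 evaluated at a single point x."""
    a, b = g(f(x)), f(g(x))
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

x = [2.0, 0.5]
R = rotation(0.7)

iso_rot = commutator_norm(R, iso_tanh, x)            # vanishes up to rounding
elem_rot = commutator_norm(R, elementwise_tanh, x)   # clearly non-zero
elem_swap = commutator_norm(swap, elementwise_tanh, x)  # vanishes for permutations

print(iso_rot, elem_rot, elem_swap)
```

A single point and a single rotation cannot establish equivariance, but a non-zero value is enough to show that elementwise Tanh does not commute with general rotations while it does commute with the basis permutation.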
The 'neural refractive problem' outlines how semantic meanings may become intertwined or ambiguous due to current functional forms skewing linear features in undesirable ways. This is predicted to be especially detrimental for out-of-distribution activations, which are likely to be most deflected and hence most semantically corrupted. Thus, the network's generalisation may then fail in such circumstances. It may be expected that the network produces compensatory adaptations to the phenomenon, which may be narrow in the scope of their corrections. Since neural refraction is a non-linear and anisotropic phenomenon, it cannot be inverted by a single subsequent layer, potentially incurring unnecessary training overhead on producing corrections due to unintended refraction.
2.2 Quantised Representations, Emergence of Linear Features and Semantic Interpolatability
Symmetry-broken functional forms have been shown to induce symmetry-broken representations which transform with the basis [17], which indicates a dependency on the anisotropy and offers an explanation why approximately discrete embedding directions are tended towards [31, 16, 17]. In this section, that conjecture of dependence will be made clear. Additionally, it can be hypothesised that because embedded activations are often discretised and meaningful directions may be expected to align with these embeddings, then semantic directions also become quantised. This generally appears to be the case in observations [31, 32, 17]. Reversing this proposed causality would indicate that a continuous rotational symmetry may enable a continuous embedding. Functional forms would not directly induce arbitrary direction-based symmetry breaking in their embeddings through training; such a structure would only emerge from task necessity.
One can start with the prediction of form-induced representational collapse. In this context, representational collapse is the following heuristic: the induced discretisation of what would otherwise be an approximately smooth continuum of representations as samples are drawn over a dataset. Here, discretisation is the increasing concentration of representations to clusters through training, until they eventually approach a nearly discrete-like cluster in representation space. Hence, this is also described as a quantisation, to differentiate it from other representational collapses. Quantisation would be the induced discretisation of an otherwise continuous quantity. At this early stage, until the nature of this predicted phenomenon is suitably understood, this heuristic may be more appropriate than a premature, rigid mathematical definition. The following discussion provides an informal motivation for the prediction of quantisation, followed by a more principled discussion.
One may expect that the angular unevenness of various anisotropic primitives will result in some form of general effect on optimisation. Particularly, such unevenness would likely result in slight preferred directions for embeddings and slightly discouraged directions for embeddings organised around the anisotropic distinguished directions. It is the degree to which this effect may occur which is of interest. For example, in extreme cases, this may result in the absence of representations over discouraged directions and a clustering of representations over encouraged directions. This is the suggested form-induced quantisation into discrete-like clusters. Such a collapse results in informative representational degrees of freedom being suppressed in otherwise approximately connected data. This is suggested to be pathological when arbitrarily imposed through task-agnostic functional form inductive biases. This may be beneficial where redundancy can be suppressed, as discussed in Sec. 4; however, the form-induced arbitrary structure may remain detrimental. This extreme discretisation would be very distinct and may aid in detection, which is suggested to have observably already appeared [16, 17]. However, extreme discretisation itself may not be ubiquitous; it is the more general production of task-agnostic 'structure' about these directions which constitutes the general inductive bias, and these are suggested to be indicative of the algebraic symmetries of the forms. Without such initial unevenness, preferential angular regions would not exist, and representations may distribute more 'naturally', perhaps smoothly or be indicative of structure in the dataset, rather than task-agnostic structure due to the choice of primitives.
Practically, this may materialise in numerous modes through optimisation, depending on the function's particular analytical qualities; however, these are suggested to all result from the underlying group structure of the forms, particularly in primitives defined through discrete group algebras. This may arise regarding both forward and backward pass considerations.
For example, current anisotropic primitives respect at least a standard-basis permutation symmetry ($S_n$), which can result in a discrete orthant partitioning of the space in two or more dimensions. A functional form can then be described piecewise through this partitioning. Given a univariate function, $f$, which is applied elementwise, it can be represented piecewise as two differing functions $f_{<}$ and $f_{\geq}$ for the negative and positive semi-definite domains, respectively. When applied elementwise, the representation space's orthants have various combinations of these two functions acting on elements, dependent upon the particular orthant. Several of these orthants are hence analytically equivalent, but rotated, in function. Generally, there are $n + 1$ distinct orthants for $n$-width layers, with a $\binom{n}{m}$ degeneracy for $m \in \{0, 1, \cdots, n\}$. This is indicative of the underlying $S_n$ standard-basis permutation symmetry arising from the elementwise application. Effects on optimisation may then result, where representations shift over different orthants to leverage the differing localised maps for computation. Hence, structure is expected to arise as a consequence of this symmetry partitioning. Hence, all $S_n$ functions are expected to be influenced in this generalised symmetric manner, with specific modes contingent upon each orthant's particular map, yet remaining tied through this underlying construction.
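The orthant count can be verified by enumeration. The sketch below is illustrative only; grouping orthants by their number of negative coordinates is an assumption about how the $n + 1$ classes are indexed. It lists every sign pattern for a width of 3 and compares the class sizes with the binomial coefficients.

```python
from collections import Counter
from itertools import product
from math import comb

def orthant_classes(n):
    """Group the 2**n orthants of R^n by their number of negative coordinates."""
    return Counter(signs.count(-1) for signs in product((1, -1), repeat=n))

n = 3
classes = orthant_classes(n)

# One class per value of m, each containing comb(n, m) permutation-related orthants.
for m in range(n + 1):
    print(m, classes[m], comb(n, m))
```

Orthants within a class are related by a permutation of the standard basis, which is why an elementwise function acts identically on them up to that relabelling.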
Other permutation-based symmetries in functional forms can be considered. For example, hyperoctahedral $B_n$, including the standard-basis permutation with sign-flip symmetry, makes all orthants analytically degenerate when correcting for rotation; hence, its effect on representations through optimisation may indicate this. The same holds for even-sign flips, which produce two sets of analytically degenerate orthants, if constructed piecewise using $0 \leq \prod_{i=1}^{n} \operatorname{sign}(\vec{x} \cdot \hat{e}_i)$ (these are denoted $D_n$ but are not to be confused with the dihedral group). Other discrete symmetries can result in other partitionings to be considered, such as the simplex-based symmetries in Bird [17]. Such a partitioning does not occur in the continuous-symmetry definition of isotropic primitives under $O(n)$, so such an aligned structure emerging through optimisation is not expected. This is illustrated in Fig. 3 for 3D multivariate maps under symmetry, while Fig. 2 demonstrates the phenomenon in 2D for Leaky-ReLU and standard Tanh, where the $S_n$ and $B_n$ respective symmetries produce quadrant partitioning of their maps (isotropic-Tanh does not partition in this manner).
[Figure 3 panels, left to right: Identity ($1_n$) ⊂ Permutation ($S_n$) ⊂ Even-Sign Flips & Permutation ($D_n$) ⊂ Hyperoctahedral ($B_n$) ⊂ Orthogonal ($O(n)$)]
Figure 3: Illustrates the effect of the various symmetries in 3D about the standard basis. The standard bases are shown as red, green and blue arrows, with the various octants (orthants in $n$D) demonstrated for discrete symmetries. Left-to-right shows the identity symmetry of elementwise functions $I_n$, the permutation symmetry $S_n$, the even-sign permutation symmetry $D_n$, the hyperoctahedral symmetry $B_n$, and the continuous orthogonal symmetry $O(n)$ producing an angularly continuous depiction. The colour-shading of the various octants demonstrates which octants are analytically identical under a rotation/permutation of their map. This intuitively shows how different octant regions relate in their maps and may influence the representation space. Particularly, the discrete maps effectively incur an absolute frame on the internal representation spaces, whilst orthogonal maps only incur an absolute origin.
Additionally, there may be a hierarchical interplay of influences on representations from various functional forms. These may interact non-trivially, potentially privileging differing bases, with an overall privilege which may evolve through training. Potentially, accumulation may occur, as suggested, up to a point where an alternative basis becomes privileged and begins to disperse the existing structure; this may result in interesting dynamics, additional phase-changes and steady-state equilibrium behaviour. Whether this occurs is speculation, but it could be explored.
One may also consider the consequences on the associated semantics being represented through these embeddings. Many real-life semantics are continuums: colours, positions and poses of objects, broad morphology, even within a single species of objects. Induced representational collapse onto a single discrete semantic may lose vital nuance and meaningful degrees of freedom. Discretised representations encouraged by functional forms appear to be a poor default inductive bias under these considerations. Without spurious structure added to representations from functions, the quantising bias would vanish, potentially enabling more continuous representation for the task. In this manner, isotropic functions would not prevent discrete semantics, which can be clustered through bias parameters; however, they do not promote discretisation either. Hence, Isotropic deep learning would be well-positioned to enable networks to acquire more naturally distributed representations, driven by the task and free from structure. Therefore, moving towards isotropy is hypothesised to encourage embeddings to be more smoothly distributed and better representative of the task and data. In addition, this is expected to better enable interpolatable semantics for intermediate representations between typically discrete linear features. This may substantially enhance the expressivity and representational capacity of networks, limited only by concept interferences. This may position an Isotropic approach as producing more optimal representations.
Moreover, in such a case, the discrete concept of 'representation capacity' may become inapplicable. Each layer may express different continuous arrangements, where differing concepts are angularly suppressed and expressed in analogy to the linear features hypothesis [33]. Instead, the 'magnitude-direction hypothesis' is proposed as a continuous extension: with magnitudes indicating the amount of stimulus present, direction indicating the concept. Activations then populate this more continuous manifold, which is argued to enable more meaningful interpolations.
This continuous semanticity may also produce a better-organised semantic map at each network layer, since intermediate representations may now relate otherwise discrete features. The lack of discretising bias may allow semantics to be brought continuously into proximity (which 'weight locking', discussed in Sec. 2.3, may typically prevent). A manifold without form-induced structure may aid researchers in the emerging field of representational alignment, discussed further in App. D.4.
Therefore, in terms of representations, the inductive bias of isotropy appears more appropriate as a default, due to many real-world semantics being continuous and not being quantised into discrete bins through functional form induced structure. However, anisotropy may also be a good inductive bias if universal discretisation of concepts at all scales and abstraction levels is expected along the standard basis. Isotropy can be thought of as introducing an inductive bias that enables continuous and interpolatable semantics while retaining discrete semantics when task-necessitated, as opposed to being design-imposed structure. Hence, it generalises the discrete linear features paradigm into a more continuous setting.
2.3 Weight Locking, Optimisation Barriers and Disconnected Basins
'Weight locking' is a term to describe how, particularly, the weight parameter may suffer from being stuck in local minima found further into loss valleys, encountered only after a sufficient amount of training. Similar locking of biases near $\vec{0}$ may also occur. This optimisation artefact is predicted to occur through two modes, both a result of the anisotropic functional
form. Similarly, if used definitionally for generating new functional forms, they are constrained by this symmetry maximally. For example, if a function is left invariant to both $S_n$ and $O(n)$, denote only the latter, since $S_n \subset O(n)$.
Additionally, the three primary categories also form a hierarchy, with probabilistic ensuring symmetry closure and algebraic ensuring the other two for their respective subdivisions. This is indicated by Eqn. 30.
$$\text{Algebraic} \Rightarrow \text{Probabilistic} \Rightarrow \text{Closure} \tag{30}$$
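Each category will be discussed, followed by examples that motivate it, and then a table will outline the symmetries of several functions.
Symmetry-closure indicates that any element of a functional class can be transformed under a symmetry, and the result is also a member of the class. This is displayed in Eqns. 31, 32 and 33 for left-invariant, right-invariant and equivariant, respectively. The class $\mathcal{F}$ would be a chosen subset of all maps.
$$\forall \mathbf{f} \in \mathcal{F},\ \forall g \in G :\quad (\mathbf{f} \circ g) \in \mathcal{F} \tag{31}$$
$$\forall \mathbf{f} \in \mathcal{F},\ \forall g \in G :\quad (g \circ \mathbf{f}) \in \mathcal{F} \tag{32}$$
Equivariant closure is a differing requirement, involving the pairing of its group-inverse (but can be generalised):
$$\forall \mathbf{f} \in \mathcal{F},\ \forall g \in G :\quad g^{-1} \circ \mathbf{f} \circ g \in \mathcal{F} \tag{33}$$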
These statements indicate that any function in a class which is transformed under symmetry is still a member of the class. Using representation theoretic generalisation it can be denoted $\forall \mathbf{f} \in \mathcal{F}$, $\forall g \in G$, $\rho^{(1)}(g^{-1}) \circ \mathbf{f} \circ \rho^{(2)}(g) \in \mathcal{F}$ for two representations $\rho^{(1)}$ and $\rho^{(2)}$. Closure conditions do not say if these two instances of the class are equally likely to be initialised, which is a stronger condition.
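This latter case is the probabilistic condition and is given in Eqns. 34, 35 and 36 for left-invariant, right-invariant and equivariant, respectively. $P$ gives the probability of the member of the class $\mathcal{F}$ to be initialised. These could be termed a 'weak' accordance with a symmetry group.
$$\forall \mathbf{f} \in \mathcal{F},\ \forall g \in G :\quad P(\mathbf{f} \circ g) = P(\mathbf{f}) \tag{34}$$
$$\forall \mathbf{f} \in \mathcal{F},\ \forall g \in G :\quad P(g \circ \mathbf{f}) = P(\mathbf{f}) \tag{35}$$
Again, probabilistic equivariance is a differing requirement:
$$\forall \mathbf{f} \in \mathcal{F},\ \forall g \in G :\quad P\left(g^{-1} \circ \mathbf{f} \circ g\right) = P(\mathbf{f}) \tag{36}$$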
The probabilistic condition can be specified as time-like, initialisation-like, data-like, or any combination of these. This depends on how the probability is considered. Time-like would be, for example, over subsequent iterations of forward passes in the network, discussed in App. C. Initialisation-like indicates the distributions of parameters which are spontaneously symmetry broken on initialisation. Data-like can consider the probability defined over samples of the dataset. Other situational subcategories of probabilistic conditions may exist and require extension of the formalism. Using representation theoretic generalisation it can be denoted $\forall \mathbf{f} \in \mathcal{F}$, $\forall g \in G$, $P(\rho^{(1)}(g^{-1}) \circ \mathbf{f} \circ \rho^{(2)}(g)) = P(\mathbf{f})$ for two representations $\rho^{(1)}$ and $\rho^{(2)}$.
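Finally, there are algebraic symmetry relations, or a 'strong' accordance with a symmetry. These indicate that every instance of a function in the functional class respects a symmetry which leaves the computation unchanged. They are defined by Eqns. 37, 38 and 39 for left-invariant, right-invariant and equivariant, respectively. These are the familiar bracket relations.
$$\forall \mathbf{f} \in \mathcal{F},\ \forall g \in G :\quad \mathbf{f} \circ g = \mathbf{f} \tag{37}$$
$$\forall \mathbf{f} \in \mathcal{F},\ \forall g \in G :\quad g \circ \mathbf{f} = \mathbf{f} \tag{38}$$
Again, algebraic equivariance would be a differing requirement:
$$\forall \mathbf{f} \in \mathcal{F},\ \forall g \in G :\quad \mathbf{f} \circ g = g \circ \mathbf{f} \tag{39}$$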
Using representation theoretic generalisation it can be denoted $\forall \mathbf{f} \in \mathcal{F}$, $\forall g \in G$, $\rho^{(1)}(g) \circ \mathbf{f} = \mathbf{f} \circ \rho^{(2)}(g)$ for two representations $\rho^{(1)}$ and $\rho^{(2)}$. One can also consider if the function has multiple arguments or concatenated output spaces. These generalised domains and codomains can have these conditions applied in various ways, e.g. direct sums or tensor products. Hence, one can extend the above definitions to functions with multiple arguments, including differing relations applied to any combination of arguments. Additionally, weight sharing can be considered another extension of the model. The following discussion concerns several applications of the formalism.
To begin with, App. B.1 discusses a functional form which introduces anisotropy but in an isotropically initialised manner. This motivated the construction of this formalism and enabled such nuance in classifications beyond algebraic constraints. For example, $\mathbf{f}(\vec{x}; W) = W\vec{x}$ is considered anisotropic since its algebraic equivariance is the identity, but it could still be weakly isotropic. Then considering $\mathbf{f}(\vec{x}) = \sum_{i=1}^{N} f(\vec{x} \cdot \hat{e}_i)\, \hat{e}_i$, which has algebraic permutation equivariance, one may consider their composition: $\mathbf{f}(\vec{x}) = \sum_{i=1}^{N} f((W\vec{x}) \cdot \hat{e}_i)\, \hat{e}_i$. This latter class is not strongly isotropic but could be weakly isotropic under suitable initialisation; this difference is significant and motivated classifying forms through this meta-framework. This produced the original distinction between weak and strong symmetry accordance, which was extended with closure. This is an example of how the composition of layers can then undergo spontaneous symmetry breaking.
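The difference between weak (probabilistic) and strong (algebraic) accordance for the linear map can be sketched numerically. The example below rests on assumptions not stated in the paper: the weights are initialised with i.i.d. standard normal entries, the group element is a single 2D rotation, and the weight values are arbitrary. Under those assumptions it checks the equivariant conditions of Eqns. 36 and 39 for $W\vec{x}$.

```python
import math

def matmul(A, B):
    """Plain nested-list matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def log_density_iid_normal(W):
    """Log-density of W under i.i.d. standard normal entries, up to an additive constant."""
    return -0.5 * sum(w * w for row in W for w in row)

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

W = [[0.5, -1.2], [2.0, 0.3]]  # arbitrary illustrative weights
theta = 0.7
R = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]

# Transformed member g^{-1} f g of the class; R is orthogonal, so R^{-1} = R^T.
W_conj = matmul(matmul(transpose(R), W), R)

# Probabilistic condition: the initialisation density is unchanged.
density_gap = abs(log_density_iid_normal(W_conj) - log_density_iid_normal(W))

# Algebraic condition: would require W_conj == W, which fails for this W.
weight_gap = max_abs_diff(W_conj, W)

print(density_gap, weight_gap)
```

The transformed matrix is still a linear map (closure) and is equally likely under the assumed initialiser (weak accordance), yet it is a different function from the original (no strong accordance), which is the distinction drawn in the paragraph above.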
Deep learning models are hypothesised to leverage symmetry-breaking phenomena, which are further adapted through training, to achieve practical computation. Therefore, defining an alternative approach which prevents any such phenomenon would be impractical for purpose. This is one of the primary motivations of this paper: elucidating the role of symmetry breaking in networks systematically by exploring this taxonomy and altered primitives.
The approach of defining each fork generally considers algebraic symmetries in primitives, while allowing parameterised maps to be closed under the general linear group, in whichever relevant flavour. However, a pure branch would also initialise such parameters under probabilistic constraints of the symmetry of the branch. Hence, this would result in primitives respecting the overall intended symmetry before symmetry-breaking initialisations and learning.
As stated, the specific group used in such constraints is chosen from a selection that is considered derivable from arbitrary directed graph automorphisms, due to their connectivity, and then chosen to be applied to associated primitives. This can then result in the production of functional classes which have a closure under these respective automorphism symmetries. These maps, with closure under general automorphisms, can then be chosen to be upgraded probabilistically or up to algebraic constraints of specific groups they are closed under. Such considerations can also be applied at all scales: each individual primitive functional form and any possible composition of these through the arbitrary graph. In effect, this indicates which symmetries are in principle available before specific downstream choices of functional forms of primitives are chosen or freshly formulated. This interplay between architectures and primitive-constraints outlines the family of deep learning approaches and is suggested to be axiomatic-like.
As stated, from the closures available, one can choose a specific automorphism to form a stronger constraint of. This can be a subgroup of, or equal to, the whole automorphism group produced by the graph. For 'pure' branches, this choice results in parameterised mappings being upgraded to probabilistic, whilst functional forms over nodes pick up algebraic constraints. Functions from this functional class, which maximally abide by such restrictions, are then used to produce particular primitives for the specific network.
For example, this would typically be a restriction to a permutation family over the standard basis in contemporary deep learning. This constraint is then applied either probabilistically (such as affine maps $\mathbf{f}(\vec{x}; W, \vec{b}) = W\vec{x} + \vec{b}$, which can have left and right probabilistic invariance) or algebraically (such as in activation functional forms $\mathbf{f}(\vec{x}) = \sum_{i=1}^{n} f(\vec{x} \cdot \hat{e}_i)\, \hat{e}_i$, which is algebraically-equivariant). A different choice can yield Isotropic probabilistic and algebraic forms, or any other symmetry fork of primitives. Some orthogonal probabilistic initialisers are used; this can now be termed a hybridisation.
Overall, the suggestion is that these are a considered choice in design. Isotropic networks are the statement that, at minimum, the primitives should be probabilistically in/equivariant maximally to orthogonal family actions, but strive for algebraic isotropy wherever feasible. Yet this is not dogmatic: a linear map would be too restrictive for computation if algebraically-orthogonal in/equivariant, so in such cases, only probabilistic-orthogonal in/equivariance, as spontaneous symmetry breaking, would be desirable. It is the consideration of such choices which is important, and knowledge of their induced biases, not the restriction of them.
Hybridisations between differing symmetry group definitions in a single model can be explored and appear to have justification already; these would occupy intermediate positions compared to the more purely defined branches proposed. Therefore, although the pure branches are rigid in their form constraints, explorations of hybridisations are also encouraged, analysis of which is enabled through this formalism. To the best of our knowledge, these are believed to be distinct research directions compared to previous symmetry-based literature.
As stated, this formalism is effective at distinguishing several approaches, including the considerations of this paper from Geometric Deep Learning's end-to-end symmetry-defined networks, such as equivariant networks, and Parameter Symmetries. The latter two regard a different scale to which these relations apply.
Geometric deep learning's equivariant networks can be recovered from the framework by considering the function class of the entire model and restricting the model class to those which are algebraically equivariant to the intended symmetry group, observed in the underlying data distribution. A similar argument applies to invariant flavoured constraints on models. Additionally, one can consider group convolution as the next compositional scale down to which these apply: functional class blocks over which these constraints are applied. Such components do occupy smaller-scale compositions, though they are in furtherance of an end-to-end algebraic symmetry of the full model. This is an entirely different objective from those presented in this paper, which are intended for general application in arbitrary architectures, allowing and shaping symmetry breaking through considering differing taxonomic generations composed in models. This represents a significant distinction in approach. These taxonomic considerations concern the implicit inductive biases which act on and interplay within the network's dynamics; they are not solely focused on constructing models about a desired equivariance or invariance of the group necessary to preserve the data structure. Hence, one can consider this proposal as the consequences of symmetry breaking in general networks. Nevertheless, this overall formalism can encompass both approaches as a special case of symmetries applied over functional classes at differing scales and philosophies.
Hence, this typically differs in scale from most of the considerations in this paper's approach, which focuses on the functional form primitive's relation and its increasing interacting compositions, as opposed to the model as a whole and the restricted functional classes required downwards to achieve such model-scale results under composition. These are complementary but differing approaches, with independent objectives and philosophies; yet, some interplay can likely be established. Such interplay may be considered through this broader formalism. However, at present, the approaches between these primitive foundation reformulations and equivariant networks appear to often be mutually exclusive in many GDL models due to the latter's frequent dependence on elementwise primitives, but this is not always the case. However, this may be reconciled over time under alternative implementations and is discussed further in App. E.1. This significantly differentiates the typical scales at which Geometric Deep Learning's equivariant models consider symmetry from those at which this paper considers them. An algebraic equivariance over the entire model has not been the primary concern of this paper. Moreover, other differences arise, such as how the reformulation of primitives pertains to symmetry families due to the different dimensionalities of their respective maps, whereas models adhere to a single group preserved throughout.
Additionally, this formalism appears to suitably distinguish recent observations and consequences due to Parameter Symmetries, alongside their parameter-space degeneracies. This is the avenue investigating how specific symmetry actions can leave aspects of existing networks unchanged in functionality, resulting in parameter degeneracies. This phenomenon can be unified into the framework as a compositional consequence. Three such relations make this evident: a right-closure of a general linear layer, an algebraic-permutation equivariance of activation functions and a general linear left-closure of a linear layer. Using this formalism, one can now extend the considerations to other primitive compositions for analysis as well.
For example, a linear layer $f_1(x) = W_1 x + b_1$ ($\mathbb{R}^m \to \mathbb{R}^n$) has a right-closure of $n \times n$ general linear transforms, since the functional class can take on differing parameter values. Similarly for $f_3(x) = W_3 x + b_3$ ($\mathbb{R}^n \to \mathbb{R}^p$), which has a left-closure of $n \times n$ general linear transforms. Informally, this means these layers have the capacity to 'absorb' any general-linear transform, or subset thereof, whilst remaining in the class. Finally, the activation function, say $f(x) = \sum_{i=1}^{n} \tanh(x \cdot \hat{e}_i)\,\hat{e}_i$, has a signed-permutation algebraic equivariance, meaning Eqn. 40 follows from the algebraic equivariance for $P \in B_n$.

$$f(x) = f(I_n x) = f(P^{-1} P x) = P^{-1} f(P x) \tag{40}$$

Following this, the composition $f_3 \circ f \circ f_1$ means that the left and right general-linear closures can 'absorb' these $P$ and $P^{-1}$ transforms whilst remaining in the class. This is because $f_3 \circ P \in \mathcal{F}_3$ and $P^{-1} \circ f_1 \in \mathcal{F}_1$. This combination of properties reproduces the parameter symmetries under the composition $f_3 \circ f \circ f_1$.
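The absorption argument can be checked numerically. The following is a minimal sketch, not the paper's implementation: the helper names (`matvec`, `matmul`, `signed_permutation`, `network`), the layer sizes and the random values are all assumptions chosen for illustration. It applies a signed permutation $P$ to the first layer and $P^{-1}$ to the last layer (the roles of $P$ and $P^{-1}$ can equally be swapped) and confirms that the composed function is unchanged although the parameters differ.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def matmul(A, B):
    """Multiply matrices A and B (lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def signed_permutation(perm, signs):
    """Build P in B_n with P[i][perm[i]] = signs[i] (each sign is +1 or -1)."""
    n = len(perm)
    P = [[0.0] * n for _ in range(n)]
    for i, (j, s) in enumerate(zip(perm, signs)):
        P[i][j] = float(s)
    return P


def network(W1, b1, W3, b3, x):
    """Affine layer, elementwise (standard-basis) tanh, affine layer."""
    h = [u + b for u, b in zip(matvec(W1, x), b1)]      # f1: R^m -> R^n
    a = [math.tanh(v) for v in h]                        # f:  R^n -> R^n
    return [u + b for u, b in zip(matvec(W3, a), b3)]   # f3: R^n -> R^p


rng = random.Random(0)  # fixed seed; the values are illustrative only
m, n, p = 3, 4, 2       # hypothetical layer widths
W1 = [[rng.uniform(-1, 1) for _ in range(m)] for _ in range(n)]
b1 = [rng.uniform(-1, 1) for _ in range(n)]
W3 = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(p)]
b3 = [rng.uniform(-1, 1) for _ in range(p)]
x = [rng.uniform(-1, 1) for _ in range(m)]

P = signed_permutation([2, 0, 3, 1], [1, -1, -1, 1])
P_inv = transpose(P)  # signed permutation matrices are orthogonal

# Absorb P into the first layer and P^{-1} into the last layer.
W1_new, b1_new = matmul(P, W1), matvec(P, b1)
W3_new = matmul(W3, P_inv)

y_old = network(W1, b1, W3, b3, x)
y_new = network(W1_new, b1_new, W3_new, b3, x)
max_diff = max(abs(u - v) for u, v in zip(y_old, y_new))
print(max_diff)  # expected to sit at floating-point rounding level
```

The sign flips rely on tanh being an odd function; for an activation that is only $S_n$ equivariant, the same check would be expected to hold only for unsigned permutations.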
This recasting of parameter symmetries under the above symmetry formalism may aid generalised considerations and comparisons. This approach reveals a large number of degeneracies: per such sandwiched construction of an algebraic-equivariant map $\mathbb{R}^k \to \mathbb{R}^k$ and associated closed linear layers, the network acquires a multiplicative $(k!)^2$ factor degeneracy in its parameter space if the activation function is $S_n$ algebraically equivariant. Independently, this was considered a pathology between anisotropy and isotropy within this work, and can be connected to the discussion in Sec. 2.3.
Additionally, this also highlights that the emergence of a $B_n$ or $S_n$ symmetry, in particular, is not due to the parameters (which are general linear invariant closures); instead, it is the function they sandwich, in this case the activation function, which is algebraically equivariant to a transform. This aligns with Godfrey et al. [28], which identified and explored permutation-related symmetries over existing activation functions. Their intertwiner groups in activation functions correspond to those of parameters, and can be directly mapped to the formalism discussed above (see footnote 8).
Extensions to this can be considered under this generalised formalism, such as making clear the consequences if the affine maps are not left/right closed under a group $G$ when they sandwich a map that is algebraically equivariant to $G$. In such cases, parameter symmetries are not applicable and the network can become unique. One could also consider the promotion or prevention of such closure symmetries up to probabilistic conditions in a similar manner. Prevention may cause the initialiser always to favour particular arrangements of the network under symmetry, potentially aiding in alignment efforts.
Furthermore, one can consider the compositional consequences of adding noise under a probabilistic constraint. This is a suggested regulariser discussed in App. C. It may also have consequences for generative efforts, which could be explored.
In addition, a more careful treatment of symmetries in Radial Basis Functions [30] ($\mathbb{R}^n \to \mathbb{R}^m$) can be explored. These appear to feature a permutation right-closure ($S_m$) and an orthogonal left-closure ($O(n)$), which, when combined with linear layers, can form similar parameter symmetries to those discussed.
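As a rough illustration of these two closures, the sketch below assumes a Gaussian radial basis layer $\phi_j(x) = \exp(-\gamma \lVert x - c_j \rVert^2)$ with centres $c_j$; this specific form, the function names and the numeric values are assumptions, not taken from the paper. An orthogonal transform applied to the input can be absorbed by rotating the centres, and a permutation of the outputs can be absorbed by reordering the centres, so in both cases the layer stays within its functional class.

```python
import math


def rbf_layer(centres, x, gamma=1.0):
    """Gaussian radial basis layer R^n -> R^m with one output per centre."""
    return [
        math.exp(-gamma * sum((xi - ci) ** 2 for xi, ci in zip(x, c)))
        for c in centres
    ]


def rotate2d(theta, v):
    """Apply a planar rotation, an element of O(2), to the vector v."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]


centres = [[0.5, -1.0], [2.0, 0.3], [-1.2, 0.7]]  # hypothetical centres, m = 3
x = [0.4, 1.1]                                     # hypothetical input, n = 2
theta = 0.9

base = rbf_layer(centres, x)

# Input side: rotating the input is absorbed by rotating every centre.
rotated = rbf_layer([rotate2d(theta, c) for c in centres], rotate2d(theta, x))
rotation_gap = max(abs(u - v) for u, v in zip(base, rotated))

# Output side: permuting the outputs is absorbed by reordering the centres.
perm = [2, 0, 1]
permuted = rbf_layer([centres[j] for j in perm], x)

print(rotation_gap)
```

The check only uses that distances are preserved by orthogonal maps, so the same reasoning would apply to any radial profile, not just the Gaussian assumed here.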
One can also consider other compositions, such as the maps which are composed to form residuals $f(x) = x + g(x)$, where one can consider how $f$ acquires the symmetry of $g$. This is because the function is the summation of an identity map with a general map $g$. The identity commutes with any algebraic symmetry, so the resulting symmetry of their composition is only defined through $g$'s subset symmetry.
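The inheritance can be written out in one line. As a sketch, assume $g$ is algebraically equivariant to a group $G$ acting linearly through matrices $P$, so that $g(Px) = P\,g(x)$ for all $P \in G$ (the symbol $P$ is reused here purely for illustration):

```latex
\begin{aligned}
f(Px) &= Px + g(Px) \\
      &= Px + P\,g(x) \\
      &= P\bigl(x + g(x)\bigr) \\
      &= P\,f(x), \qquad \forall P \in G .
\end{aligned}
```

The second step uses the equivariance of $g$ and the third uses linearity of $P$; no property of the identity branch beyond commuting with $P$ is needed.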
Furthermore, it appears that there is a tradeoff between the level of the maximal constraint on the functional form and the result on representations. For example, algebraic $S_n$ (permutation) equivariance is less of a constraint on functional forms than algebraic $O(n)$ equivariance. Yet, the latter appears to produce fewer constraints on representations by removing absolute reference directions, the distinguished directions. This tradeoff between algebraic constraints and resultant representational constraints presents an interesting avenue for exploration.
Overall, this classification construction seems capable of both demarcating and unifying multiple differing approaches to symmetries within deep learning. Hence, this broader symmetry formalism may be highly advantageous to explore further. The present focus of each approach is pictorially demonstrated in Fig. 4. It also indicates how naturally the hierarchy of constructing models may require analysis of smaller compositions, such as discrete convolutions for equivariant networks. Still, it is in furtherance of the model-scale regime to which the symmetry constraints are applied. Due to its lowest hierarchical positioning, if a reformulation of the primitive foundation occurs, then it has consequences for all layers, all compositions, and all models in all applications, extending upwards. It is the study of the interplay and emergent phenomena between symmetry and networks that these bring, as well as using it as a definitional tool across all primitives to generate group-defined classes, from which reselection for specific instantiations can occur.

8 They also demonstrate a connection to representations. This is further supporting evidence for this paper's hypothesis that activation functions produce an inductive bias on representations.
[Figure 4 graphic. Left: Network Construction Hierarchy (Foundational Primitives, Layers, Composed Layers, Models; arrows labelled "Depends On" and "Built From"). Centre-top: Example Groups. Centre-bottom: Approaches (Primitive-First Programme, Parameter Symmetries, Equivariant Networks, Group-Convolutions). Right: Taxonomic Table of Layouts (Left Invariant, Right Invariant, Equivariant) against Generations (Closure, Probabilistic, Algebraic).]
Figure 4: Pictorially demonstrates the various approaches to symmetry in contemporary deep learning, through their generation and layout regimes as well as their typical scales. Left demonstrates the hierarchical dependencies in the construction of deep learning systems. Dark orange indicates a top-down approach philosophy, which seeks model-scale symmetries derived for purpose-built, targeted applications, and consequently constrains algebraically downwards to ensure this. In contrast, the mint colour represents a bottom-up, causal-effect, and group-theoretic philosophy, where new compositions are generated from and are contingent upon smaller-scale constructions and may occupy more general generations within the taxonomy. Centre-top provides several group taxonomies which may be considered for implementation. The centre-bottom specifies several approaches to symmetry and is colour-coded to identify regimes in the leftmost and rightmost diagrams. The rightmost depicts the taxonomic organisation put forward. Parameter symmetries are the composition of left/right-invariant closures with contemporary activation function equivariance to permutation-like groups, which is indicated by the fused triangle in the taxonomy. This also raises the problem of defining the notions of layers and primitives, which is addressed in an upcoming paper.
As a consequence, end-to-end models can contain instances where specific primitives can be reformulated such that the model as a whole respects the symmetry; however, they cannot encompass the primitive-first paradigm as a whole, as this is a superset due to forming the foundational base and its argued consequences for all general models. This indicates the present approach's universal, axiomatic-like importance for consideration, as any consequences further interact with compositions contingent upon them. This is not to imply that one philosophy is superior to another; they target differing objectives yet may be complementary. One is already well-established and growing, with several state-of-the-art results that already indicate success in achieving its intended goals. The other is attempting to determine how symmetries from functional forms may interact with networks as unintended inductive biases, and then reformulating primitives for beneficial purposes in general architectures. Distinctions are drawn to avoid confusion between their objectives and considerations, as they independently share a group-theoretic root as their natural expression, which may appear similar at first due to its relative infrequency in deep learning. This is similar to how parameter symmetries also share a group-theoretic root and are again distinct. The objective of this formalism is to draw on this shared overlap of group-theoretic considerations to construct an overarching framework that provides a more comprehensive and high-level perspective.
Hence, this group-theoretic approach provides unifying terms to compare these within the context of deep learning. The approach of this paper is to be primarily conscious of such decisions regarding functional classes, at all scales, in their introduction of inductive biases and interactions. This process begins with generating numerous new families of foundational primitive implementations through careful searches and selection within these functional classes, followed by building these upwards in compositions to novel architectures and potentially improved, generally applicable models.
Tab. 5.2 roughly indicates the typical symmetry properties of conventional forms. For example, a linear layer can be chosen to be initialised isotropically, but itself does not display the associated algebraic symmetry. Each such choice restricts the functional class and incurs specific inductive biases to consider. Question marks on the equivariant network indicate the specific model's chosen initialisations.
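Columns L, R and E denote the left-invariant, right-invariant and equivariant layouts within each generation (Closure, Probabilistic, Algebraic).

| Comments | Function | Closure L | Closure R | Closure E | Probabilistic L | Probabilistic R | Probabilistic E | Algebraic L | Algebraic R | Algebraic E |
|---|---|---|---|---|---|---|---|---|---|---|
| Affine Layer | $Wx + b$ | $GL(n)$ | $GL(n)$ | $GL(n)$ | $O(n)$ | $O(n)$ | $O(n)$ | $I_n$ | $I_n$ | $I_n$ |
| Standard-Tanh | $\sum_{i=1}^{N} \tanh(x \cdot \hat{e}_i)\,\hat{e}_i$ | $I_n$ | $I_n$ | $B_n$ | $I_n$ | $I_n$ | $B_n$ | $I_n$ | $I_n$ | $B_n$ |
| Composed | $\sum_{i=1}^{N} \tanh((Wx) \cdot \hat{e}_i)\,\hat{e}_i$ | $GL(n)$ | $S_n$ | $S_n$ | $O(n)$ | $S_n$ | $S_n$ | $I_n$ | $I_n$ | $I_n$ |
| Equivariant-Nets | models | $G$? | $G$? | $G$ | $G$? | $G$? | $G$ | $I_n$ | $I_n$ | $G$ |
| CE Loss | $L: \mathbb{R}^n \to \mathbb{R}$ | $S_n$ | $I_n$ | $I_n$ | $S_n$ | $I_n$ | $I_n$ | $S_n$ | $I_n$ | $I_n$ |
| Isotropic-Tanh | $\sigma(\lVert x \rVert)\,\hat{x}$ | $I_n$ | $I_n$ | $O(n)$ | $I_n$ | $I_n$ | $O(n)$ | $I_n$ | $I_n$ | $O(n)$ |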
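The algebraic-equivariance entries for the two tanh rows can be probed numerically. The sketch below is illustrative only: it takes $\sigma = \tanh$ for the isotropic form (an assumption based on the row name), works in two dimensions, and uses hypothetical helper names. A generic rotation commutes with the isotropic form but not with the standard elementwise form, whereas a quarter turn, which is a signed permutation in $B_2$, commutes with both.

```python
import math


def standard_tanh(x):
    """Elementwise tanh, tied to the standard basis."""
    return [math.tanh(v) for v in x]


def isotropic_tanh(x):
    """tanh applied to the norm, keeping the direction of x unchanged."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        return [0.0 for _ in x]
    scale = math.tanh(norm) / norm
    return [scale * v for v in x]


def rotate2d(theta, v):
    """Apply a planar rotation to the vector v."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]


def equivariance_gap(fn, theta, x):
    """Largest component of fn(R x) - R fn(x); zero means fn commutes with R."""
    lhs = fn(rotate2d(theta, x))
    rhs = rotate2d(theta, fn(x))
    return max(abs(u - v) for u, v in zip(lhs, rhs))


x = [1.5, 0.2]             # hypothetical activation vector
generic = math.pi / 5      # a rotation that is not a signed permutation
quarter = math.pi / 2      # a rotation that is a signed permutation

iso_gap = equivariance_gap(isotropic_tanh, generic, x)
std_gap = equivariance_gap(standard_tanh, generic, x)
std_quarter_gap = equivariance_gap(standard_tanh, quarter, x)

print(iso_gap, std_gap, std_quarter_gap)
```

The non-zero gap for the standard form under a generic rotation is one concrete reading of its equivariance being restricted to $B_n$ rather than $O(n)$.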
One can extend this formalism in numerous ways. For example, one can upgrade the group-theoretic considerations to gauge-theoretic, if an application suitably justifies such a generalisation of the approach.
Additionally, one can consider more general symmetries, for example $O(1,3)$, where a metric tensor can be inserted into the functional form to produce a pseudo-norm: $f(x) = x^{\beta} g_{\beta\gamma} x^{\gamma}\,\hat{x}^{\alpha}$. This has some interesting consequences.
Particularly, one can stack the metric for such a function, similar to the manner in App. D.3, producing $f(x) = (x^{\beta} g^{h}_{\beta\gamma} x^{\gamma})\,\hat{x}^{\alpha}$. If the indices indicating contravariant and covariant components are dropped, whilst allowing non-symmetric metrics for later pairwise comparisons, then the following equation can be considered: $f(x) = (x_{\beta}\, g^{h}_{\beta\gamma}\, x_{\gamma})\,\hat{x}_{\alpha}$. Moreover, the metric $g^{h}_{\beta\gamma}$ can then be expressed generally as the product of two matrices $W^{h}_{k}$ and $W^{h}_{q}$, returning to standard matrix notation: $f(x) = (x^{T} (W^{h}_{q})^{T} W^{h}_{k}\, x)\,\hat{x}$. Considering $K = W_k x$ and $Q = W_q x$, a self-attention-like structure, $f(x) = (Q_h^{T} K_h)\,\hat{x}$, is closely recoverable, and could be generalised for the value matrix, pairwise inner-products and normalisation factor (see footnote 9). Changing to a non-softmax activation function, which doesn't depend on other elements or components, reinforces this generalised symmetry consideration and partially motivates the discussion in App. D.1.
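The factorisation step can be verified with a small numeric sketch. Everything concrete here is an assumption for illustration: a single head, the widths `n` and `d`, the random matrices and the helper names. It builds the apparent metric as $g = W_q^{T} W_k$, confirms that the bilinear form $x^{T} g\, x$ equals the query-key inner product $Q^{T} K$, and scales the unit vector $\hat{x}$ by that score.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def matmul(A, B):
    """Multiply matrices A and B (lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


rng = random.Random(1)  # fixed seed; the values are illustrative only
n, d = 4, 3             # hypothetical embedding width and head width
W_q = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(d)]
W_k = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(d)]
x = [rng.uniform(-1, 1) for _ in range(n)]

# Apparent metric for one head: g = W_q^T W_k, generally non-symmetric.
g = matmul(transpose(W_q), W_k)
bilinear = sum(x[i] * g[i][j] * x[j] for i in range(n) for j in range(n))

# The same scalar written as a query-key inner product.
Q, K = matvec(W_q, x), matvec(W_k, x)
score = sum(q * k for q, k in zip(Q, K))

# Scale the unit direction of x by the score, as in f(x) = (Q^T K) x_hat.
norm = math.sqrt(sum(v * v for v in x))
x_hat = [v / norm for v in x]
output = [score * v for v in x_hat]

print(abs(bilinear - score))
```

This sketch compares a vector with itself only; the pairwise comparisons, value matrix and normalisation factor mentioned in the text are not modelled.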
Furthermore, one can make the apparent metric position dependent, having representational similarity follow from metrics over a non-linearly contracting and expanding space. This could be a meaningful avenue to explore, clustering regions of representation space or dispersing others. Hence, this symmetry formalism enables a recontextualisation which may have consequences for different comparisons between representations and potentially improved expressibility in attention. Therefore, this symmetry approach also allows the reinterpretation of self-attention-like operations within this symmetry formalism.
Overall, this formalism is highly versatile in categorising symmetry techniques within deep learning, potentially enabling improved cross-communications whilst also making clear alternative avenues to explore. The intention was to enable further comparisons and generalisation to more examples, using this approach both to audit existing functions into a categorised taxonomy and to use it as a definitional and generative method for producing functions and models collated by group structure. This can occur in parallel to understanding the ramifications of such symmetry definitions through representational geometry and mechanistic interpretability, optimisation and performance, parameter degeneracies and more.
5.2.1 Note on Representations
The following three paragraphs briefly discuss additional representation-theoretic considerations that may be important for taxonomisation, but are nuances that may complicate the utility of the overall taxonomy for general practice.
Moreover, these layouts can all be reformulated in terms of representations, which encompass and extend the equivariance/invariance formulae detailed below, and could be organised using a highest weight approach, such as indexing isotropy with Casimir operators. This can be used to label these representations and provide a more principled, primitive, and compositional framework. Hence, using a representation-theoretic approach, for corresponding left and right actions, can add further nuance to the primary layout categorisations notated, and representation theory may likely better organise foundational biases and their interactions. This should be undertaken, but is notationally suppressed for the sake of approachability, and is assumed as an implicit and vital part of the taxonomisation. Furthermore, considering the countless groups possible, and all the various possible representations, it is encouraged that general foundational bias principles are distilled down over particular families of groups defining primitives, e.g. orthogonal as opposed to particular instances $O(5)$, $O(8)$, $O(100)$, etc. This generalised collation of biases may often offer better practical benefit and leverageability than more niche instances of specific groups, so this is encouraged foremost. For example, it was shown in Bird [17] that the axis alignment and anti-alignment generally persisted independent of the network width, which would indicate differing specific instances of the permutation family: $S_{24}$, $S_{32}$, etc.; this is the primary objective of this programme. Therefore, although a more fine-grained categorisation of primitives is possible, it may be advantageous in general to categorise them only by group and dissimilar representations.
Additionally, representations connected under a conjugated transform $\rho'(g) = A^{-1}\rho(g)A$ may have meaningfully differing inductive biases depending on $A$, particularly whether there exists an element $g' \in G$ for which $A = \rho(g')$ is a representation. If this is not the case, then the resultant bias may be non-trivial and could be organised by what group $A$ is a representation of. An example is that if a weight-decay regularisation is used, and two differing equivalent representations are chosen for a primitive composed with affine layers, then if $A$ has column-wise or row-wise vectors which do not normalise to one, there may be meaningful compositional biases between L2 regulariser, affine and activation function interactions. A similar argument applies if $A$ is not in the representation of the permutation defining an anisotropic activation function; then it will interact meaningfully with an L1 regulariser. This suggests that more than just irreducible representations may be relevant as foundational biases, and further investigation is required. This will also inflate the considerations of the foundational bias scheme; however, it again is likely beneficial to consolidate these into general principles for wider adoption, although specific instances could still be applied if desirable. An example of this is the permutations used in Bird [17] for rotated and non-standard-basis representations, where it was found that no observable differences in biases arose. Furthermore, there is an exponential growth in considerations when considering compositions; the desire is to distil general principles over constructions of only group-defined primitives.
Overall, it is suggested that despite fine distinctions being possible through representation-theoretic principles, the practicality of the taxonomy may remain primarily within group-theoretic definitions of primitives and general principles of their induced biases. This adds to why the group-theoretic approach is primarily showcased, with fewer primary layouts, alongside the notational approachability for this primitive-first programme.

9 Such a normalisation factor may be similarly applicable to standard isotropic activation functions too.
6 Conclusion
This paper focuses on a novel case study of an isotropic functional form as a hypothesised better default inductive bias for deep learning. Current forms have been demonstrated in previous literature to produce task-unmotivated representational artefacts [17], which this work hypothesised may limit the networks' semantic expressibility. It is further argued that the current anisotropic functional forms may have detrimental effects on performance and learning through the predicted 'neural refraction', 'discrete semantics', and 'weight locking' phenomena. Removing such constraints from the model is also argued to unconstrain the representations from any particular basis. Hence, it is expected to produce a more natural activation representation based upon task necessities, rather than a structure induced by human-imposed functional forms. This may improve semantic structure and produce high-capacity embeddings, particularly important for applications discussed in App. D.
In isotropic networks, functional forms are promoted from the existing discrete permutation symmetry to a continuous orthogonal symmetry. This has substantial consequences for the form of almost every function in modern-day deep learning. This paper and appendices also outline a framework for connecting future work in these directions. Several preliminary activation functions, normalisers, optimisers, regularisers and operations are also described as a starting point. The tenets of reviewing such functional form choices are encouraged. Through comparison studies, this philosophy should, at the very least, reveal which characteristics of functional forms most significantly contribute to performance. This includes the role of symmetry breaking, which is one of the foremost concerns when developing this taxonomy, as it determines how various compositions may induce hierarchical interactions that aid or detract from learning.
A change to isotropic deep learning is argued to be generally advantageous, but may need substantial time for development as a mature alternative. New models and benchmarks may also require development to determine the practicality of this alternative approach. Additionally, better-optimised implementations that suitably leverage isotropy are preferable, as the ones described remain illustrative placeholders. The functions proposed so far are analogous to existing functions, which may not be inherently optimal for an isotropic network, even if they share superficial similarities. Therefore, empirical work on these placeholder functions will be presented in future papers, so as not to distract from the primary motivation for this shift to isotropic deep learning and the wider group-theoretic alternatives. The proposed ideas aim to stimulate the community's interest in conducting a directed search for better isotropic functions and determining whether this approach should be adopted in wider applications. The predicted pathologies may also offer a suitable falsifiable mode to validate the principles at this early stage. Finally, other approaches to symmetry in deep learning are shown to be distinct from the approach proposed in this paper; however, an overarching symmetry formalism is also introduced to unify these disparate approaches and make clear other avenues to explore in a similar regard.
It is proposed that the breadth of the reformulations may constitute an alternative direction for deep learning: Isotropic deep learning, with the taxonomy of Sec. 5 extending this further. Connecting this to graph automorphisms yields axiomatic-like choices, forming distinct 'branches' of primitives and downstream models to consider. In general, it situates symmetry prior to neurons, rather than symmetry deduced from the neuron-defined computation graph: an ontological inversion of what is defined or deduced from a neural network. Extending this further may provide a complete axiomatic construction for deep learning. Additionally, new categories of primitives may be highly distinct from all that have come before, particularly with potential consequences on representations and learning. This may be leveraged to offer new approaches to deep learning in both practice and theory. This may require the development of various subdisciplines for the study of respective branches and any interrelatedness, such as the conjectured group universal approximation and bound theorems. Reanalysis of existing phenomena and results may also be undertaken to determine if they are predicated upon specific primitive choices.
In general, this taxonomic group classification may be an effective approach in categorising new functional forms and judging their interactions, foundational biases and symmetry-breaking phenomena. However, it is not a replacement for other analytical factors that may be considered in setting the functional form or after a group-defined functional form is fixed. For example, the identity group can generate such a broad repertoire of functional forms that it is insufficiently distinguished by group theory alone. Hence, other analytical tools, besides group-theoretic considerations, remain crucial to distinguish their foundational biases. Yet, a group-defined approach does enable a principled way to generalise over sets of primitives and is argued to be important in considering the useful computational maps that networks may leverage, such as in the orthant setup. Overall, it is argued that group theory provides a good foundation for defining initial primitive forms and an initial framework for categorising interactions. However, the categorisation of foundational biases is likely to outgrow the limits of group and representation-theoretic taxonomisation, alongside symmetry breaking. It may practically require an extension of the taxonomy beyond this, and group theory should be employed to extend the exploration, but not limit it.
Overall, this generates a considerable and novel design axis for the field of deep learning, which may be explored with the aim of producing better-performing models, gaining a better mechanistic understanding of their functions, discovering fundamental phenomena and results, and hopefully generalising to more applications.
6.1 Final Note on Philosophical Implications
Philosophically, this paper advocates for a shift in perspective on symmetry in relation to deep learning and an ontological shift in what could be considered a deep learning model, as well as the emergent consequences that stem from this.
This paper concerns the emergence of symmetry within deep learning itself and how it may, inherently and crucially, task-agnostically, act on a model's computation. This approach covers exploring the implications of this generally on models of all applications, contingent upon various choices of functional form definitions.
This is a markedly different assumption about the relationship between symmetry and deep learning compared to end-to-end model approaches, which leverage the established symmetries of the natural world using observations and arguments frequently emerging externally to the discipline, and extend these into deep learning for models to adhere to. These are extensions of a known external physical approach to symmetry, transferred into models. However, this paper proceeds from a drastically differing assumption regarding the emergence of symmetry, emphasising that it is internal to deep learning as well and has importance in its own right. It is making the argument and assumption that group-theoretic considerations are natively important characterising tools for the field, reinforced by it being already unintentionally set within the contemporary defining choices of primitives. Hence, it is not a perspective on emulating natural symmetries in a system, but on considering that the functional form structure already within the system carries its own symmetry biases, naturally attributable to and categorisable through symmetry: an externalist top-down versus internalist bottom-up perspective on symmetry for deep learning.
Hence, this constitutes a philosophical reorientation of symmetry, not just as a tool for aligning models with the external world, but also as an internal, function-driven influence acting task-agnostically on representations, optimisation, interpretability, and more generally. This is the primary and generalised perspective shift this paper advocates for. Primarily, it is argued that these considerations may naturally arise internally, existing as important factors within the field, in addition to, but not requiring, motivation from the external world. Hence, it is contingent on an assumption that symmetry's importance as a categorising principle is not only imported but also native to deep learning.
Perhaps one of the most crucial aspects is the new ontology of what constitutes a deep learning approach, which this reframing provokes. It highlights that it may no longer be contingent on a computational system constructed upon a neuron-wise interconnectivity approach, but generalised across various group-defined primitive sets. This even generalises the notion of a neuron as an object generalised to higher-dimensional considerations, which is a downstream consequence after a symmetry definition is set. Definitions reverse from neurons preceding deduced symmetries, to symmetries defining neurons; and if symmetries are derived from automorphisms, then graphs precede these. It also contends that this is likely not limited to foundational biases stemming from parameter degeneracies, as a consequence contingent on current formalisms, such as affine layers plus a non-linearity; it also generalises the implications beyond this compositional structure, suggesting that even in such cases they do not need to be emergent phenomena solely predicated and formulated on degeneracies, but general function-driven consequences attributable atomically, such as neural refraction, and more in general compositions.
Hence, this differs from parameter symmetries, which identify computational equivalences under reparameterisations deduced when assuming the existing fixed set of primitives and the consequences therein. Instead, this work also changes the assumptions upon which those deductions are predicated by broadening the definition of primitives to a plethora of group-defined relations and considering the implications of these more broadly within the new ontology. Moreover, these function-driven implications are argued to exist for a model even where parameter-induced computational degeneracies may not. Hence, this is being considered as an exploratory approach to phenomena which may extend beyond those which are predicated on computational equivalences through parameter degeneracies, and therefore is not contingent on these to exist. For example, identity-defined functions may have similar foundational biases. Overall, the emphasis extends to more general ramifications, from all primitive definitions, both atomically and in general compositional cases of networks. A trivial counterexample of how these extend past parameter degeneracies could be considering the diagonal, basis-dependent and permutation-defined approximation of Hessian-like scalings for adaptive optimisers. This is a more trivial example, but similar constructions could be considered for activation functions and many other primitives, which may not depend on the specific parameter-symmetry composition structure to have influence, such as in cases where a restricted linear map may not have the required closures, but still has phenomena dependent on the definition of the primitives used. As a single example, consequences of refraction can exist independent of degeneracy.
This also reframes several interpretability approaches, which may also be dependent on the existing set of primitives. Determining the semantics of representations may be mediated by these axiomatic-like choices, where artefactual structure may arise from primitive algebra alone. By broadening over differing group-defined sets, the geometry of representations may also alter. This may also affect the knowledge and deductions a model can make, broadening its epistemic horizon. This interplay between imposed geometry of primitives and emergent consequences on representations and optimisations may constitute a 'no-free-geometry' consideration: reframing that observation of representation structure may be as much conditioned on the primitives as the data. Hence, care should be taken not to use it as evidence in support of the primitives circularly, but perhaps to ascertain more fundamental and shared organisations of representations as insight into semantic relations and learning. This also motivates a shift in perspective on representations, from solely copying the external structure of reality to also being strongly structured by the non-derivable choices of functional forms inherent to the model.
Overall, this is a notably differing philosophy of symmetry in deep learning. It assumes that it already pre-exists and argues that it may be a constitutive and intrinsic property of deep learning, forming a leverageable, native, and foundational design axis. Philosophically, this is suggesting a pluralist redefinition of what constitutes deep learning by generalising it across various group-theoretic components constituting a model and determining and categorising their implications, such that they can be beneficially leveraged.
References
[1] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In F. Pereira, C.J. Burges, L. Bottou, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 25. Curran Associates, Inc., 2012. URL https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf.
[2] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(56):1929–1958, 2014. URL http://jmlr.org/papers/v15/srivastava14a.html.
[3] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions, 2014. URL https://arxiv.org/abs/1409.4842.
[4] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition, 2015. URL https://arxiv.org/abs/1409.1556.
[5] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift, 2015. URL https://arxiv.org/abs/1502.03167.
[6] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition, 2015. URL https://arxiv.org/abs/1512.03385.
[7] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization, 2017. URL https://arxiv.org/abs/1412.6980.
[8] Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger. Densely connected convolutional networks, 2018. URL https://arxiv.org/abs/1608.06993.
[9] Mingxing Tan and Quoc V. Le. Efficientnet: Rethinking model scaling for convolutional neural networks, 2020. URL https://arxiv.org/abs/1905.11946.
[10] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need, 2023. URL https://arxiv.org/abs/1706.03762.
[11] Benjamin F Logan and Larry A Shepp. Optimal reconstruction of a function from its projections. 1975.
[12] Vinod Nair and Geoffrey E Hinton. Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th international conference on machine learning (ICML-10), pages 807–814, 2010.
[13] Prajit Ramachandran, Barret Zoph, and Quoc V. Le. Searching for activation functions, 2017. URL https://arxiv.org/abs/1710.05941.
[14] Dan Hendrycks and Kevin Gimpel. Gaussian error linear units (gelus), 2023. URL https://arxiv.org/abs/1606.08415.
[15] Andrew L Maas, Awni Y Hannun, Andrew Y Ng, et al. Rectifier nonlinearities improve neural network acoustic models. In Proc. icml, volume 30, page 3. Atlanta, GA, 2013.
[16] Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, Tom Henighan, Shauna Kravec, Zac Hatfield-Dodds, Robert Lasenby, Dawn Drain, Carol Chen, Roger Grosse, Sam McCandlish, Jared Kaplan, Dario Amodei, Martin Wattenberg, and Christopher Olah. Toy models of superposition, 2022. URL https://arxiv.org/abs/2209.10652.
[17] George Bird. The spotlight resonance method: Resolving the alignment of embedded activations. In Second Workshop on Representational Alignment at ICLR 2025, 2025. URL https://openreview.net/forum?id=alxPpqVRzX.
[18] George Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of control, signals and systems, 2(4):303–314, 1989.
[19] Kurt Hornik. Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251–257, 1991. ISSN 0893-6080. doi: https://doi.org/10.1016/0893-6080(91)90009-T. URL https://www.sciencedirect.com/science/article/pii/089360809190009T.
[20] Chris Olah. Neural networks, manifolds, and topology - colah.github.io. https://colah.github.io/posts/2014-03-NN-Manifolds-Topology/, April 2014. [Accessed 15-05-2025].
[21] Peter Foldiak and Dominik Endres. Sparse coding, Jan 2008. URL http://www.scholarpedia.org/article/Sparse_coding#:~:text=Sparse%20coding%20is%20the%20representation,subset%20of%20all%20available%20neurons.
[22] Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural networks, 2(5):359–366, 1989.
[23] Taco S. Cohen and Max Welling. Group equivariant convolutional networks, 2016. URL https://arxiv.org/abs/1602.07576.
[24] Taco S. Cohen and Max Welling. Steerable cnns, 2016. URL https://arxiv.org/abs/1612.08498.
[25] Daniel E. Worrall, Stephan J. Garbin, Daniyar Turmukhambetov, and Gabriel J. Brostow. Harmonic networks: Deep translation and rotation equivariance, 2017. URL https://arxiv.org/abs/1612.04642.
[26] Taco S. Cohen, Mario Geiger, Jonas Koehler, and Max Welling. Spherical cnns, 2018. URL https://arxiv.org/abs/1801.10130.
[27] Michael M. Bronstein, Joan Bruna, Taco Cohen, and Petar Veličković. Geometric deep learning: Grids, groups, graphs, geodesics, and gauges, 2021. URL https://arxiv.org/abs/2104.13478.
[28] Charles Godfrey, Davis Brown, Tegan Emerson, and Henry Kvinge. On the symmetries of deep learning models and their internal representations, 2023. URL https://arxiv.org/abs/2205.14258.
[29] Derek Lim, Theo Moe Putterman, Robin Walters, Haggai Maron, and Stefanie Jegelka. The empirical impact of neural parameter symmetries, or lack thereof, 2024. URL https://arxiv.org/abs/2405.20231.
[30] David Lowe and D Broomhead. Multivariable functional interpolation and adaptive networks. Complex systems, 2(3):321–355, 1988.
[31] Chris Olah, Alexander Mordvintsev, and Ludwig Schubert. Feature visualization, Nov 2017. URL https://distill.pub/2017/feature-visualization/.
[32] David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, and Antonio Torralba. Network dissection: Quantifying interpretability of deep visual representations, 2017. URL https://arxiv.org/abs/1704.05796.
[33] Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, and Andrej Risteski. Linear algebraic structure of word senses, with applications to polysemy. Transactions of the Association for Computational Linguistics, 6:483–495, 2018.
[34] Jonathan Frankle and Michael Carbin. The lottery ticket hypothesis: Finding sparse, trainable neural networks, 2019. URL https://arxiv.org/abs/1803.03635.
[35] Sara Sabour, Nicholas Frosst, and Geoffrey E Hinton. Dynamic routing between capsules, 2017. URL https://arxiv.org/abs/1710.09829.
[36] R. Hadsell, S. Chopra, and Y. LeCun. Dimensionality reduction by learning an invariant mapping. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), volume 2, pages 1735–1742, 2006. doi: 10.1109/CVPR.2006.100.
[37] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations, 2020. URL https://arxiv.org/abs/2002.05709.
[38] Vardan Papyan, X. Y. Han, and David L. Donoho. Prevalence of neural collapse during the terminal phase of deep learning training. Proceedings of the National Academy of Sciences, 117(40):24652–24663, September 2020. ISSN 1091-6490. doi: 10.1073/pnas.2015509117. URL http://dx.doi.org/10.1073/pnas.2015509117.
[39] Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, and Aleksander Madry. How does batch normalization help optimization?, 2019. URL https://arxiv.org/abs/1805.11604.
[40] Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. Layer normalization, 2016. URL https://arxiv.org/abs/1607.06450.
[41] Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky. Instance normalization: The missing ingredient for fast stylization, 2017. URL https://arxiv.org/abs/1607.08022.
[42] Yuxin Wu and Kaiming He. Group normalization, 2018. URL https://arxiv.org/abs/1803.08494.
[43] Pascal Mettes, Elise van der Pol, and Cees G. M. Snoek. Hyperspherical prototype networks, 2019. URL https://arxiv.org/abs/1901.10514.
[44] Sergey Ioffe. Batch renormalization: Towards reducing minibatch dependence in batch-normalized models, 2017. URL https://arxiv.org/abs/1702.03275.
[45] Andrew M Saxe, James L McClelland, and Surya Ganguli. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. arXiv preprint arXiv:1312.6120, 2013.
[46] Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In Yee Whye Teh and Mike Titterington, editors, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, volume 9 of Proceedings of Machine Learning Research, pages 249–256, Chia Laguna Resort, Sardinia, Italy, 13–15 May 2010. PMLR. URL https://proceedings.mlr.press/v9/glorot10a.html.
[47] John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(61):2121–2159, 2011. URL http://jmlr.org/papers/v12/duchi11a.html.
[48] Matthew D. Zeiler. Adadelta: An adaptive learning rate method, 2012. URL https://arxiv.org/abs/1212.5701.
[49] Aston Zhang, Zachary C Lipton, Mu Li, and Alexander J Smola. Dive into deep learning. Cambridge University Press, 2023.
[50] Jiaxuan Wang and Jenna Wiens. Adasgd: Bridging the gap between sgd and adam, 2020. URL https://arxiv.org/abs/2006.16541.
[51] Charles George Broyden. The convergence of a class of double-rank minimization algorithms 1. general considerations. IMA Journal of Applied Mathematics, 6(1):76–90, 1970.
[52] Roger Fletcher. A new approach to variable metric algorithms. The computer journal, 13(3):317–322, 1970.
[53] Donald Goldfarb. A family of variable-metric methods derived by variational means. Mathematics of computation, 24(109):23–26, 1970.
[54] David F Shanno. Conditioning of quasi-newton methods for function minimization. Mathematics of computation, 24(111):647–656, 1970.
[55] Jorge Nocedal. Updating quasi-newton matrices with limited storage. Mathematics of computation, 35(151):773–782, 1980.
[56] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 9:1735–80, 12 1997. doi: 10.1162/neco.1997.9.8.1735.
[57] Ward Cheney and David Kincaid. Linear algebra: Theory and applications. The Australian Mathematical Society, 110:544–550, 2009.
[58] G. W. Stewart. The efficient generation of random orthogonal matrices with an application to condition estimators. SIAM Journal on Numerical Analysis, 17(3):403–409, 1980. ISSN 00361429. URL http://www.jstor.org/stable/2156882.
[59] Francesco Mezzadri. How to generate random matrices from the classical compact groups, 2007. URL https://arxiv.org/abs/math-ph/0609050.
[60] Kai Hu and Barnabas Poczos. Rotationout as a regularization method for neural network, 2020. URL https://openreview.net/forum?id=r1e7M6VYwH.
[61] Wallace Givens. Computation of plain unitary rotations transforming a general matrix to triangular form. Journal of the Society for Industrial and Applied Mathematics, 6(1):26–50, 1958. doi: 10.1137/0106004. URL https://doi.org/10.1137/0106004.
[62] Adelaide P Yiu, Valentina Mercaldo, Chen Yan, Blake Richards, Asim J Rashid, Hwa-Lin Liz Hsiang, Jessica Pressey, Vivek Mahadevan, Matthew M Tran, Steven A Kushner, Melanie A Woodin, Paul W Frankland, and Sheena A Josselyn. Neurons are recruited to a memory trace based on relative neuronal excitability immediately before training. Neuron, 83(3):722–735, August 2014.
[63] Lingxuan Chen, Kirstie A Cummings, William Mau, Yosif Zaki, Zhe Dong, Sima Rabinowitz, Roger L Clem, Tristan Shuman, and Denise J Cai. The role of intrinsic excitability in the evolution of memory: Significance in memory allocation, consolidation, and updating. Neurobiol. Learn. Mem., 173(107266):107266, September 2020.
[64] John Bridle. Training stochastic model recognition algorithms as networks can lead to maximum mutual information estimation of parameters. In D. Touretzky, editor, Advances in Neural Information Processing Systems, volume 2. Morgan-Kaufmann, 1989. URL https://proceedings.neurips.cc/paper_files/paper/1989/file/0336dcbab05b9d5ad24f4333c7658a0e-Paper.pdf.
[65] Ilia Sucholutsky, Lukas Muttenthaler, Adrian Weller, Andi Peng, Andreea Bobu, Been Kim, Bradley C. Love, Christopher J. Cueva, Erin Grant, Iris Groen, Jascha Achterberg, Joshua B. Tenenbaum, Katherine M. Collins, Katherine L. Hermann, Kerem Oktar, Klaus Greff, Martin N. Hebart, Nathan Cloos, Nikolaus Kriegeskorte, Nori Jacoby, Qiuyi Zhang, Raja Marjieh, Robert Geirhos, Sherol Chen, Simon Kornblith, Sunayana Rane, Talia Konkle, Thomas P. O'Connell, Thomas Unterthiner, Andrew K. Lampinen, Klaus-Robert Müller, Mariya Toneva, and Thomas L. Griffiths. Getting aligned on representational alignment, 2024. URL https://arxiv.org/abs/2310.13018.
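However, the optimiser of Wang and Wiens [50] is Adam-like with isotropic definitions. This AdaSGD algorithm is displayed in Eqn. 51 and may be a starting point for a more optimal isotropic adaptive optimiser.
$$
\begin{aligned}
\theta_{t+1} &= \theta_t - \eta_t m_t \\
\eta_t &= \eta \sqrt{\frac{1-\beta_2^t}{v_t / \dim\theta}} \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\,\|g_t\|_2^2, \quad v_0 = 0 \\
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \quad m_0 = 0
\end{aligned}
\tag{51}
$$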
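As a minimal sketch of one update of Eqn. 51, the step can be written with NumPy as below. The function name `adasgd_step`, the default hyper-parameters and the small guard `eps` are assumptions made for illustration and are not taken from Wang and Wiens [50]. The point to note is that the second moment is a single scalar, so every direction of the parameter vector shares one step size.

```python
import numpy as np

def adasgd_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-12):
    """One update following Eqn. 51; t counts steps starting from 1.

    eps is a hypothetical guard against division by zero, not part of Eqn. 51.
    """
    m = beta1 * m + (1.0 - beta1) * grad                 # first moment (a vector)
    v = beta2 * v + (1.0 - beta2) * float(grad @ grad)   # second moment (one scalar)
    # A single step size shared by all directions, hence no privileged basis.
    lr_t = lr * np.sqrt((1.0 - beta2 ** t) / (v / theta.size + eps))
    return theta - lr_t * m, m, v
```

Because only the norm of the gradient enters `v`, rotating both the parameters and the gradient rotates the update by the same amount, which is the isotropic property of interest here.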
Alternatively, a return to inverse-Hessian-approximating quasi-Newton algorithms, such as BFGS [51, 52, 53, 54] or L-BFGS [55], may be another approach. Gradient clipping is another operation that requires consideration, as its anisotropy can produce a representational bias. The author is actively researching both directions.
A.4 Operations:
Several new operations can also be defined within the Isotropic framework; these include min-like and max-like functions, as well as a new multiplication operation. A link demonstrating these is available at
h ps://www.desmos.com/
calcula o / m is7a 4.
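The standard minimum function and maximum function are displayed in Eqns. 52 and 53 respectively, for a function $\mathbf{f}: \mathbb{R}^N \times \mathbb{R}^N \to \mathbb{R}^N$.
$$
\mathbf{f}(\vec{x}, \vec{n}) = \min(\vec{x}, \vec{n}) = \sum_{i=1}^{N} \min\left(\vec{x}\cdot\hat{e}_i,\ \vec{n}\cdot\hat{e}_i\right)\hat{e}_i \tag{52}
$$
$$
\mathbf{f}(\vec{x}, \vec{n}) = \max(\vec{x}, \vec{n}) = \sum_{i=1}^{N} \max\left(\vec{x}\cdot\hat{e}_i,\ \vec{n}\cdot\hat{e}_i\right)\hat{e}_i \tag{53}
$$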
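These functions are applied elementwise to components of $\vec{x}$ and $\vec{n}$, as indicated by the sum over standard basis directions, $\hat{e}_i$. This basis dependency is an example of an inductive bias which affects representations. These functions have two arguments, which requires generalising the framework to multi-argument cases. One way to generalise such an isotropic equivariance is by applying the group action to both arguments in a modified equivariance relation: $R\,\mathbf{f}(\vec{x}, \vec{n}) = \mathbf{f}(R\vec{x}, R\vec{n})$, for all $R \in \mathrm{O}(n)$. Accordingly, one can define Eqns. 54 and 55 as functional forms which follow this relation. In the case where $\vec{n}\cdot\vec{x} = 0$, the operation should be defined as the identity map.
$$
\mathbf{f}(\vec{x}, \vec{n}) = \min\left(\operatorname{sign}(\vec{x}\cdot\vec{n}),\ \frac{\vec{n}\cdot\vec{n}}{\vec{x}\cdot\vec{n}}\right)\operatorname{sign}(\vec{x}\cdot\vec{n})\,\vec{x} \tag{54}
$$
$$
\mathbf{f}(\vec{x}, \vec{n}) = \max\left(\operatorname{sign}(\vec{x}\cdot\vec{n}),\ \frac{\vec{n}\cdot\vec{n}}{\vec{x}\cdot\vec{n}}\right)\operatorname{sign}(\vec{x}\cdot\vec{n})\,\vec{x} \tag{55}
$$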
These operations are a suggestion for such an isotropic alternative, but may not be optimal. They clip vectors which cross a hyperplane boundary defined by the choice of $\vec{n} \in \mathbb{R}^N$; the hyperplane equation is $\vec{x}\cdot\vec{n} = \|\vec{n}\|_2^2$. The minimum function preserves the coordinates of all samples on the origin side of the hyperplane and projects all other coordinates onto the hyperplane. This projection is carefully chosen so that the projected point remains on a line passing through the origin and the original coordinates. Consequently, neural refraction does not occur on the plane boundary for origin-intersecting lines, as it does in the standard functions. The maximum function computes the opposing case, where points are kept constant on the far side of the hyperplane, which does not include the origin. In this case, points within the other sector are similarly projected onto the hyperplane.
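The clipping behaviour of Eqns. 54 and 55 can be sketched numerically as follows. This is only an illustration under one assumed reading of the equations, namely that the factor `sign(x . n) * x` sits outside the min or max; the function names are hypothetical.

```python
import numpy as np

def isotropic_min(x, n):
    """Eqn. 54, read as min(sign, ratio) * sign * x (an assumed grouping)."""
    d = float(x @ n)
    if d == 0.0:
        return x.copy()              # identity map when n . x = 0
    s = float(np.sign(d))
    return min(s, float(n @ n) / d) * s * x

def isotropic_max(x, n):
    """Eqn. 55 under the same assumed grouping."""
    d = float(x @ n)
    if d == 0.0:
        return x.copy()
    s = float(np.sign(d))
    return max(s, float(n @ n) / d) * s * x
```

For a point beyond the hyperplane, such as `x = (3, 1.5)` with `n = (1, 0)`, `isotropic_min` rescales it to `(1, 0.5)`, which lies on the hyperplane and on the same line through the origin. Since only dot products appear, rotating both arguments rotates the output identically.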
Similarly, Hadamard multiplication, also termed elementwise multiplication, is inherently basis-dependent, as seen through its dependence on $\hat{e}_i$ in Eqn. 56. It is used in settings such as the LSTM gate mechanism [56], and the dropout mask can also be reinterpreted in this form [2], alongside many more applications.
$$
\mathbf{f}(\vec{x}, \vec{n}) = \vec{x} \otimes \vec{n} = \sum_{i=1}^{N} \left(\vec{x}\cdot\hat{e}_i\right)\left(\vec{n}\cdot\hat{e}_i\right)\hat{e}_i \tag{56}
$$
One interpretation of Eqn. 56 is that it scales each component of $\vec{x}$ by the corresponding component of $\vec{n}$, such that each axis in the standard basis is rescaled. This is equivalent to premultiplying $\vec{x}$ with a diagonal matrix $W = \operatorname{diag}(\vec{n}) \in \mathbb{R}^{N\times N}$, where $\operatorname{diag}$ decomposes $\vec{n}$ along the standard basis and puts these components along the diagonal of a square matrix. This axis-wise rescaling is anisotropic; in particular, it exhibits a two-component algebraic permutation equivariance similar to before.
To reproduce a similar behaviour, one could use the isotropic-multiplication shown in Eqn. 57. This rescales the component of $\vec{x}$ which lies in the $\vec{n}$ direction by an amount determined by $\|\vec{n}\|$.
$$
\mathbf{f}(\vec{x}, \vec{n}) = \vec{x} + \left(\left(\|\vec{n}\| - 1\right)\vec{x}\cdot\hat{n}\right)\hat{n} \tag{57}
$$
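The contrast between Eqn. 56 and Eqn. 57 can be checked with a short sketch. The function names are hypothetical, and raising an error for a zero-length `n` is an assumption made here because the direction of `n` is then undefined.

```python
import numpy as np

def hadamard(x, n):
    """Eqn. 56: every standard-basis axis of x is rescaled by a component of n."""
    return x * n

def isotropic_multiply(x, n):
    """Eqn. 57: rescale only the component of x along n, by a factor |n|."""
    scale = np.linalg.norm(n)
    if scale == 0.0:
        raise ValueError("n must be non-zero so that its direction is defined")
    n_hat = n / scale
    return x + (scale - 1.0) * (x @ n_hat) * n_hat
```

With `x = (3, 4)` and `n = (2, 0)` the isotropic form gives `(6, 4)`: the component along `n` doubles and the orthogonal component is untouched. Rotating both inputs rotates the isotropic output by the same rotation, which does not hold for the Hadamard product.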
These are just examples and may not be a simple drop-in replacement for existing operations, due to the differences in how the operations act. The initial anisotropic operations rescale in multiple axes/hyperplanes at once, whereas the isotropic operations only act in single directions. This may make them suboptimal as gate mechanisms, such as in LSTMs, since they can only collapse the representations in one direction at a time: $\mathbf{f}: \mathbb{R}^N \times \mathbb{R}^N \to \mathbb{R}^{N-1} \hookrightarrow \mathbb{R}^N$. These single-direction gates may be limited in how they reshape the representations and have more significant computational cost; therefore, further development is undoubtedly needed. These specific implementations may require a custom isotropic operation that fully replicates their desired effect in an isotropic and basis-independent manner.
Finally, there may be some instances where representations must be clipped into a hypercube shape. This is not an isotropic operation, nor basis independent; however, this formulation does reduce neural refraction, so it is included for convenience. Using Eqn. 58 results in substantial neural refraction along the boundary of the hypercube: lines passing through the origin are projected across the boundary, substantially changing direction.
$$
\mathbf{f}(\vec{x}) = \max\left(\min\left(\vec{x}, 1\right), -1\right) = \sum_{i=1}^{N} \max\left(\min\left(\vec{x}\cdot\hat{e}_i, 1\right), -1\right)\hat{e}_i \tag{58}
$$
Using Eqn. 59 achieves the same result without neural refraction at the boundary. Any coordinate which is outside the boundary is projected back onto the boundary, along a line between the origin and the original coordinate. As a result, direction is preserved. An affine transform can recentre this box at arbitrary coordinates and reshape the box due to the linear transform.
$$
\mathbf{f}(\vec{x}) = \min\left(\frac{1}{\max_i\left(\vec{x}\cdot\hat{e}_i\right)},\ 1\right)\vec{x} \tag{59}
$$
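The difference between Eqns. 58 and 59 can be seen in a small sketch. The names are hypothetical, and the radial version takes the absolute value of each component before the maximum; that is an assumption made here so that the negative faces of the box are handled in the same way as the positive ones.

```python
import numpy as np

def clip_elementwise(x):
    """Eqn. 58: clip each standard-basis component into [-1, 1]."""
    return np.clip(x, -1.0, 1.0)

def clip_radial(x):
    """Eqn. 59 (with |x_i| assumed): shrink x along its own direction onto the box."""
    peak = np.max(np.abs(x))
    if peak <= 1.0:
        return x.copy()
    return x / peak
```

For `x = (2, 1)` the elementwise clip returns `(1, 1)`, which points in a new direction, whereas the radial clip returns `(1, 0.5)`, which is still parallel to `x`.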
B Quasi-Isotropic Functional Forms
A middle ground, following from Sec. 4, balancing the predicted problems of anisotropy whilst enabling representation compression, not just through the bias, is to relax the hard isotropy condition and introduce slight symmetry breaking in many directions. This can be achieved using many small perturbations to the direction unit vector only, producing a softer symmetry breaking. Then the network has many distinguished vectors, a subset of which it may align its representations with in a task-dependent manner. Therefore, it does not favour a particular basis, but still introduces some desirable consequences of anisotropy, for problems such as classification. If one further restricts the functions from featuring dynamic refraction, it limits detrimental anisotropic effects. This recovers an arguably more principled and slightly more basis-independent form of anisotropy.
One method is to apply a non-linearity based on rounding the vector's directions. This is shown in Eqn. 60, where $[\cdot]$ indicates the rounding operation and $\phi(\vec{x}) = \vec{x}$. In fact, the anisotropic perturbation may be implemented as simply as $\phi(\vec{x}) = \beta\vec{x}$ for $\beta = 1$. The overall angular term is approximately unit-normed, but can be trivially modified to be exactly norm-1.
$$
\Phi(\hat{x}; \alpha) = \frac{[\alpha\hat{x}]}{\alpha} + \phi\left(\hat{x} - \frac{[\alpha\hat{x}]}{\alpha}\right) \approx \hat{x} \tag{60}
$$
This produces a quasi-isotropic functional form shown in Eqn. 61, with an isotropy-breaking parameter $\alpha$. Slight anisotropic refraction is added, independent of magnitude, so it is predictable and thus extrapolatable for the network. Due to the angular rarefaction and compression by the proposed non-linearity, representation over- and underdensities may then occur, where semanticity may begin to be assigned. However, for $\alpha \to \infty$, isotropy is continuously reintroduced, and $\alpha$ could be an optimisable parameter. Such an approach may be beneficial for contrastive learning methods [36, 37].
$$
\mathbf{f}(\vec{x}) = \sigma(\|\vec{x}\|)\,\Phi(\hat{x}) \tag{61}
$$
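A minimal sketch of Eqns. 60 and 61 is given below. It assumes the linear choice `phi(u) = beta * u`, with `beta` exposed as a parameter so that values slightly different from one can be explored, and it uses `tanh` as a stand-in for the radial function `sigma`; the function names and these defaults are illustrative assumptions.

```python
import numpy as np

def quasi_isotropic_direction(x_hat, alpha, beta):
    """Eqn. 60 with phi(u) = beta * u: snap to a lattice of spacing 1/alpha,
    then add back a scaled residual."""
    snapped = np.round(alpha * x_hat) / alpha
    return snapped + beta * (x_hat - snapped)

def quasi_isotropic_activation(x, alpha, beta=0.9, sigma=np.tanh):
    """Eqn. 61: radial non-linearity times the perturbed direction."""
    radius = np.linalg.norm(x)
    if radius == 0.0:
        return np.zeros_like(x)
    return sigma(radius) * quasi_isotropic_direction(x / radius, alpha, beta)
```

With `beta = 1` the direction is returned unchanged, and for large `alpha` the lattice becomes so fine that the perturbation vanishes, matching the limit in which isotropy is reintroduced.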
Similarly, one could construct Eqn. 62 as another form of quasi-isotropic activation functional form. In this case, a finite set of unit-vectors $\hat{b}_i$ is distributed over a lattice across $S^{n-1}$ in $\mathbb{R}^n$ and a discrete rotation group transforms between various of these unit-vectors. This method may be less computationally efficient than the aforementioned method due to the numerous dot-product evaluations; however, it remains faithful to the $\Psi_n$ hierarchical group structure. Moreover, one can recover a $B_n$ symmetry without neural-refraction with this method, which may be desirable.
$$
\mathbf{f}: \mathbb{R}^n \to \mathbb{R}^n, \quad \vec{x} \mapsto \mathbf{f}(\vec{x}) = \sigma(\|\vec{x}\|)\,\frac{\hat{x}}{\max_{\hat{b}_i}\left(\hat{b}_i\cdot\hat{x}\right)} \tag{62}
$$
It is likely that many other functional forms could be considered.
B.1 Parameterised Probabilistic-Isotropy
Following from Eqn. 62, an alternative activation functional form can be constructed which appears similar but uses trainable $\hat{b}_i$ vectors, which notationally can be stacked into a matrix $\hat{Z}_{ij} \in \mathbb{R}^{m\times n}$.
Following an appropriate initialisation of trainable parameters $\hat{Z}_{ij}$, as discussed in App. A, the functional form can become weakly-isotropic. Yet it can undergo spontaneous symmetry breaking to reproduce desirable behaviours of anisotropy whilst preventing phenomena like neural refraction. Whether the denominator terms $\hat{x}$ and $\hat{Z}$ should be unit-normalised can be evaluated, as well as the specific use of a max function.
$$
\mathbf{f}: \mathbb{R}^n \to \mathbb{R}^n, \quad \vec{x} \mapsto \mathbf{f}(\vec{x}) = \sigma(\|\vec{x}\|)\,\frac{\hat{x}}{\max_i\left(\hat{Z}_{ij}\hat{x}_j\right)} \tag{63}
$$
C Stochastic Isotropy: Producing Immediate Anisotropic Analogues
One method to approximate isotropy with current functions is to stochastically choose a basis on which the anisotropic function operates. This enables anisotropic functions to be used in an isotropic network, without inducing a representational alignment to an arbitrary basis. The example is a training-time probabilistic condition; each batch exhibits spontaneous symmetry breaking, which 'averages out' over multiple batches. This approach can be generalised, and its overall utility assessed.
For example, current (anisotropic) dropout by Srivastava et al. [2] appears to privilege the basis anti-aligned with the standard basis, thereby maximally preserving information when a direction of the standard basis is collapsed. So it is expected to incur an arbitrary basis dependence in the representations. However, anisotropic dropout can be applied to a stochastically chosen basis. This randomness is hypothesised to prevent a representational-anisotropy induced by an arbitrarily chosen basis.
This can be achieved by producing a basis, uniformly drawn from the layer's orthogonal symmetry: $B \sim \mathrm{SO}(n)$ (see footnote 10).
There are several methods to produce a uniform random matrix, each with varying computational costs. These include the exponentiation of a Lie generator scaled by an appropriately drawn random variable, the Gram-Schmidt procedure [57], and many others [58, 59]. The below matrix-multiplication procedure is computationally cumbersome; simpler formulations may be desirable. Hence, the following proposed forms are only a starting point for converting anisotropic functions directly to probabilistically-isotropic forms. In practice, isotropic functions should be constructed from the ground up rather than merely analogous functions converted from existing anisotropic ones. Therefore, this remains a placeholder, but with some interesting extensions.
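One of the sampling routes mentioned above, the QR-based construction described by Mezzadri [59], can be sketched in a few lines of NumPy. The function name `random_rotation` is hypothetical, and flipping a single column to land in SO(n) is a choice made here for illustration.

```python
import numpy as np

def random_rotation(n, rng):
    """Draw B from SO(n), uniform with respect to the Haar measure."""
    gauss = rng.standard_normal((n, n))
    q, r = np.linalg.qr(gauss)
    # A plain QR factor is not uniformly distributed; rescaling each column
    # by the sign of the matching diagonal entry of r removes that bias [59].
    q = q * np.sign(np.diag(r))
    # Flip one column if needed so that det = +1 (a rotation, not a reflection).
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

rng = np.random.default_rng(0)
B = random_rotation(5, rng)
```

The cost is dominated by the QR factorisation, which grows cubically with the layer width, so drawing a fresh basis for every batch may be noticeable for wide layers.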
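Standard dropout, shown in Eqn. 64, can be made stochastically-isotropic by including the basis-transform shown in Eqn. 65. Here, $\vec{x}$ is the activation vector, $S_a$ and $S_{si}$ are normalisation factors, $M_i = \vec{M}\cdot\hat{e}_i$ is the dropout-mask and $\hat{e}_i$ are the standard-basis vectors.
$$
\vec{x}' = S_a \sum_{i=1}^{N} M_i \left(\vec{x}\cdot\hat{e}_i\right)\hat{e}_i \tag{64}
$$
$$
\vec{x}' = S_{si} \sum_{i=1}^{N} \left(\vec{M}\cdot\hat{e}_i\right)\left(\vec{x}\cdot B\hat{e}_i\right) B\hat{e}_i \tag{65}
$$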
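Dropout applied in a chosen orthonormal basis can be sketched as below. The function name is hypothetical, the factor `1 / (1 - drop_prob)` is the usual inverted-dropout scaling and is only assumed to play the role of the normalisation factor, and the basis in the demo is merely some orthonormal basis produced by a QR factorisation rather than a carefully uniform draw.

```python
import numpy as np

def rotated_basis_dropout(x, basis, keep_mask, drop_prob):
    """Dropout on the coordinates of x in an orthonormal basis.

    basis: (n, n) orthogonal matrix whose columns are the basis vectors.
    keep_mask: boolean array of shape (n,), True where a direction is kept.
    """
    coords = basis.T @ x                      # coordinates of x in the chosen basis
    kept = np.where(keep_mask, coords, 0.0)   # collapse the dropped directions
    return basis @ kept / (1.0 - drop_prob)   # map back and rescale

rng = np.random.default_rng(0)
n, p = 4, 0.25
basis, _ = np.linalg.qr(rng.standard_normal((n, n)))  # demo basis only
mask = rng.random(n) >= p
x = rng.standard_normal(n)
out = rotated_basis_dropout(x, basis, mask, p)
```

Passing the identity matrix as `basis` recovers standard dropout, so the standard form is the special case in which the dropped directions are always aligned with the standard basis.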
Similar formulations, such as RotationOut by Hu and Poczos [60], generally improved performance when the basis is stochastically rotated. However, this method remains stochastically anisotropic, as the rotations are generated through Givens rotations [61], which are not uniform over the space of orthogonal matrices, a necessity for full stochastic-isotropy. Nevertheless, the implementation by Hu and Poczos [60] is somewhat encouraging.
This procedure can be generalised and applied to any existing anisotropic function. Yet, as stated, it is generally preferable to construct an isotropic function from first principles rather than relying on stochastic isotropy, which may be computationally costly.
C.1 Considering Correlating the Stochastic-Isotropy
Correlating the random bases in time would be a curious extension, particularly for stochastically isotropic dropout. This may produce a time-like structure in the embedded activation distribution of a network. This also constitutes an initialisation-based symmetry breaking, since a random walk may not evenly cover the space in practice and its starting point may bias training. Nevertheless, this is a suggested extension that is considered interesting and informative.
Consider a random walk in the Lie-algebra space of rotation matrices (no mirror action): $\mathrm{SO}(n) \ni R(t + dt) = R(t)\,\delta R$, where $\delta R$ is the exponential of the corresponding (normalised) anti-symmetric generators of rotations contracted with coefficients $\vec{n} \sim \mathcal{N}(0, \sigma I_n)$, with $0 < \sigma \ll 1$. This procedure results in a random walk of the rotation matrix at each time step.
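The walk can be sketched as follows. To keep the example free of extra dependencies, the matrix exponential is swapped for the Cayley transform, which also maps anti-symmetric matrices to rotations and agrees with the exponential up to second order for small steps; this substitution, the function names and the way the random coefficients are drawn are all assumptions made for illustration.

```python
import numpy as np

def small_rotation(n, sigma, rng):
    """A rotation close to the identity, built from a random anti-symmetric matrix."""
    coeffs = sigma * rng.standard_normal((n, n))
    skew = coeffs - coeffs.T                 # anti-symmetric combination of generators
    eye = np.eye(n)
    # Cayley transform (I - A/2)^{-1} (I + A/2), used here in place of exp(A).
    return np.linalg.solve(eye - 0.5 * skew, eye + 0.5 * skew)

def rotation_walk(n, steps, sigma, rng):
    """Return [R(0), R(1), ...] with R(k+1) = R(k) @ delta_R and R(0) = I."""
    r = np.eye(n)
    path = [r]
    for _ in range(steps):
        r = r @ small_rotation(n, sigma, rng)
        path.append(r)
    return path
```

Each element of the returned path stays orthogonal with unit determinant, and consecutive elements differ only slightly when `sigma` is small, which is what gives the bases their correlation in time.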
Following this, a time-correlated Bernoulli distribution can be defined. Beginning with $D^{(0)}$, divide up the layer of neurons into two sets: inactive $I_n = \{ n_i \mid D_i^{(n-1)} = 0 \}$ and active $A_n = \{ n_i \mid D_i^{(n-1)} = 1 \}$. Then we have two hyper-parameters: the standard dropout probability $\lambda$ and an overlap probability $\Gamma$, such that $|A_n|\,q + |I_n|\,\Gamma = (|A_n| + |I_n|)\,\lambda$, where $q$ is not a free parameter. If $|A_n| = 0$ or $|I_n| = 0$, then temporarily define $q = \Gamma = \lambda$. If not, one must prevent unnormalised probabilities as shown in Eqn. 66.
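$$
\Gamma = \max\left(0,\ \max\left(\lambda + \frac{|A_n|}{|I_n|}(\lambda - 1),\ \min\left(1,\ \min\left(\lambda + \frac{|A_n|}{|I_n|}\lambda,\ \Gamma\right)\right)\right)\right) \tag{66}
$$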
This leads to $q = \lambda + \frac{|I_n|}{|A_n|}(\lambda - \Gamma)$. Then use one Bernoulli function across all active neurons, $\mathbb{R}^{|A_n|} \ni D^{(n)}_{(A)} \sim \mathrm{BernoulliDist.}^{|A_n|}(q)$, and likewise for inactive neurons, $\mathbb{R}^{|I_n|} \ni D^{(n)}_{(I)} \sim \mathrm{BernoulliDist.}^{|I_n|}(\Gamma)$. This correlates the inactive 'neurons' (see footnote 11) across the time steps, whilst still introducing a degree of random dropout. Thus, the 'basis of dropout' undergoes a random walk at every time step, and 'neurons' are randomly chosen to be dropped from the network, with a differing likelihood if they were just previously dropped. The coherence time can be adjusted through $\Gamma$ for the specific time-dependent task needed.
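The clamping of Eqn. 66 and the resulting value of q can be sketched as below. The sketch reads both probabilities as probabilities of being dropped at the next step, with `gamma` applying to neurons that were just dropped and `q` to the rest; that reading, and the function names, are assumptions made for illustration.

```python
import numpy as np

def clamp_overlap(gamma, lam, n_active, n_inactive):
    """Return (gamma, q) with gamma clamped as in Eqn. 66 so that q lies in [0, 1]."""
    if n_active == 0 or n_inactive == 0:
        return lam, lam                       # temporary definition q = gamma = lambda
    ratio = n_active / n_inactive
    lower = max(0.0, lam + ratio * (lam - 1.0))
    upper = min(1.0, lam + ratio * lam)
    gamma = max(lower, min(upper, gamma))
    q = lam + (n_inactive / n_active) * (lam - gamma)
    return gamma, q

def next_drop_mask(prev_dropped, lam, gamma, rng):
    """Sample the next drop mask given the previous one (True means dropped)."""
    n_inactive = int(prev_dropped.sum())
    n_active = prev_dropped.size - n_inactive
    gamma, q = clamp_overlap(gamma, lam, n_active, n_inactive)
    prob = np.where(prev_dropped, gamma, q)   # per-neuron drop probability
    return rng.random(prev_dropped.size) < prob
```

For example, with five active and five inactive neurons, `lam = 0.5` and `gamma = 0.9`, the clamp leaves `gamma` unchanged and gives `q = 0.1`, so the expected number of dropped neurons stays at half the layer.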
10 Generating $\mathrm{SO}(n)$ may be computationally simpler than generating $\mathrm{O}(n)$ due to the former's connected nature. Moreover, in either case, there is no effect on dropout.
11 These 'neurons' do have a stochastic drift in their definition due to the random walk. In general, individual 'neurons' are an ambiguous concept in an isotropic network.
This creates a link between the stimulus's presentation time to the network and the 'neurons' it alters, such that stimuli presented in a smaller time window perturb a similar subset of the network's neurons. This may produce an encoding qualitatively similar to that found in human cognition, where neurons are thought to go through excitability cycles of slightly differing frequencies and phases. When the excitability is higher, information (engrams) preferentially encodes upon those neurons [62, 63]. As groups of 'neurons' begin to decohere, some overlap remains, such that memories are interlaced if they occur within a temporal window of coherence. This potentially gives neural networks using isotropic dropout an advantage in time-series data. However, this is not suggested as a model of such neurological processes, only a similar behaviour in deep learning, which is made possible through isotropic choices. Similar correlations could be considered for anisotropic dropout, likely to have a similar resultant effect.
D Potential Applications
Besides the proposed general applicability of the isotropic modifications, the following are some areas where they may yield significant performance benefits or enable desirable network behaviours.
D.1 Isotropy In Transformers
It is argued that isotropic deep learning may be a more appropriate inductive bias for deep learning. However, there may also be some architectures which are particularly enhanced by its inclusion. One of these is the self-attention step of transformers [10], where isotropic-tanh may be of particular benefit in replacing the softmax operation [64].
Softmax is defined through elements being bounded between zero and one, $\mathbf{f}(\vec{x})\cdot\hat{e}_i \in [0, 1]$, and summing to one. Consequently, it is non-negative, and there are regimes where this may be a limiting factor. It forms an $(n-1)$-simplex embedded in the $\mathbb{R}^n$ space, normal to $\vec{1}$; therefore a degree-of-freedom is also lost: $\mathbf{f}: \mathbb{R}^n \to \triangle^{n-1} \hookrightarrow \mathbb{R}^n$.
It has been shown that representations can exist in an antipodal superposition [16], particularly when stimuli do not tend to coexist in samples; thus, antipodal arrangements can exist with minimal interference. Such a stimulus may be a continuous quantity, but its two extremes are mutually exclusive. Many of these semantics are present in the real world, such as daytime-to-nighttime, motion towards or away, and smiling versus frowning. These could be represented through a zero-to-one scale; however, a $[-1, 1]$ scale may be a better representation, with zero as a better neutral middle point. This is because, in the linear features hypothesis, the magnitude often indicates the strength of the stimulus's presence. In this case, the negative of a semantic direction may be equally meaningful and present in varying amounts. It may be expected that enabling this behaviour within the self-attention step is favourable.
Moreover, the sum-to-one case may not always be desirable: it always encourages a change to the semantics when considering the residual-step-modification. This may encourage a semantic correction to an activation in transformers, even when it is inappropriate, or optimisation may force the existence of a near-zero value vector to prevent corrections. The residual step only encourages an identity transform independent of the activation, in the linear layer, rather than the identity dependent on a particular activation.
The self-attention step compares the pairwise similarities between several vectors grouped into the so-called 'keys' and 'queries'. The degree of similarity then affects how much of another semantic is expressed: the 'values'. However, the softmax layer is basis-dependent and prevents a negative expression of these value semantics.
A more suitable choice may be isotropic-tanh. In analogy to softmax's sum-to-one constraint, its vector-magnitude is at most one, $0 \le \|\mathbf{f}(\vec{x})\| \le 1$, whilst elementwise its values are $-1 \le \mathbf{f}(\vec{x})\cdot\hat{e}_i \le 1$. Hence, it can express a negative of the value semantic, or any scaling of it between $-1$ and $1$. This suggests that isotropic-tanh may be an appealing drop-in replacement for softmax in the attention step, at least conceptually. Its continuous rotational symmetry may also offer an advantage, since the underlying pairwise similarity of self-attention, $QK^T = \vec{x}^T W_Q^T W_K \vec{x} \mathrel{\hat{=}} \vec{x}^T W'_{kq}\vec{x}$, is also basis-independent in $\vec{x}$ for isotropic initialisations of $W$, which somewhat aligns with the principles of isotropic deep learning. Hence, removing further bases may enable a more even interpolation between, and perturbation to, the value vectors. An isotropic adaptation of self-attention may therefore appear as shown in Eqn. 67, which will be explored in future work.
$$
\operatorname{Attention}(Q, K, V) = \operatorname{Isotropic\text{-}Tanh}\left(\frac{QK^T}{\sqrt{d_k}}\right)V \tag{67}
$$
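A minimal single-head sketch of Eqn. 67 follows. It assumes that isotropic-tanh takes the form tanh(|x|) x / |x|, which is consistent with the bound on its magnitude stated above, and that it is applied separately to each query's row of scores; both choices, together with the function names, are assumptions made for illustration.

```python
import numpy as np

def isotropic_tanh(x, axis=-1):
    """tanh(|x|) * x / |x| along `axis` (assumed form); zero vectors map to zero."""
    radius = np.linalg.norm(x, axis=axis, keepdims=True)
    safe = np.where(radius > 0.0, radius, 1.0)   # avoid dividing by zero
    return np.tanh(radius) * x / safe

def isotropic_attention(q, k, v):
    """Eqn. 67 for one head: q is (n_q, d_k), k is (n_k, d_k), v is (n_k, d_v)."""
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = isotropic_tanh(scores, axis=-1)    # each row has norm at most one
    return weights @ v
```

Unlike softmax, the resulting weights may be negative, so a query that is anti-aligned with a key subtracts that key's value vector rather than merely ignoring it.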
However, this does not make transformers 'isotropic' as a whole, since there are many further anisotropic steps present. Nevertheless, single-layer isotropic adaptations remain compatible with a larger anisotropic network or radial-basis network. Therefore, one may hybridise these approaches if appropriate. In general, isotropy may not be applicable to current transformers, due to their likely selection upon anisotropic primitives. Hence, a ground-up approach may be generally required, whilst borrowing concepts from the established transformers.
D.2 Real-Time Dynamical Network Topology
An appealing feature of isotropic deep learning is the relation displayed in Eqn. 14, showing that, due to rotational equivariance, a rotation of one weight matrix can be counteracted with the inverse-rotation of another, preserving the network's function. Consequently, a gauge freedom is formed due to functional forms commuting with symmetries. With gauge freedom, a particular gauge that expresses the weights beneficially without affecting network functionality can be chosen.
One such gauge expresses the parameters in a basis with a magnitude ordering of the singular values of the matrix rows/columns. One could then set a threshold for the singular value, and bias, to determine if each corresponding direction in such a matrix has a meaningful contribution to the overall functionality. If it is deemed to have negligible value, it can be pruned with little adverse effect on the network.
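The gauge choice and pruning described above can be sketched for a two-layer network. The sketch assumes an isotropic-tanh hidden activation of the form tanh(|z|) z / |z| and uses the left singular vectors of the first weight matrix as the rotation; the function names and the pruning rule are illustrative assumptions.

```python
import numpy as np

def isotropic_tanh(z):
    radius = np.linalg.norm(z)
    return np.tanh(radius) * z / radius if radius > 0.0 else np.zeros_like(z)

def network(x, w1, b1, w2):
    return w2 @ isotropic_tanh(w1 @ x + b1)

def singular_value_gauge(w1, b1, w2):
    """Rotate the hidden layer so that the rows of w1 are ordered by singular value."""
    u, s, _ = np.linalg.svd(w1, full_matrices=True)
    rot = u.T                            # orthogonal, so the network function is unchanged
    return rot @ w1, rot @ b1, w2 @ rot.T, s

def prune(w1, b1, w2, s, threshold):
    """Drop hidden directions whose singular value and bias are both below threshold."""
    keep = np.zeros(w1.shape[0], dtype=bool)
    keep[: s.size] = s > threshold       # directions beyond s.size have zero singular value
    keep |= np.abs(b1) > threshold
    return w1[keep], b1[keep], w2[:, keep]
```

After the rotation, row `i` of the first weight matrix has norm equal to the `i`-th singular value, so negligible directions can be read off directly. The compensating rotation on the second matrix relies on the activation commuting with rotations, which an elementwise activation would not satisfy.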
Moreover, $\zeta$ latent neurons can be included, with zero-initialised singular values fully connected to existing neurons. These do not impact performance, but enlarge the activation space and parameters available. Since the Jacobians of the isotropic activation functions are not strictly diagonal, these latent neurons may be rapidly trained if required. Therefore, the otherwise static, fully connected network is now dynamic, growing and shrinking in response to task-necessitated demand, with minimal impact on performance.
This is enabled through an isotropic functional form. It poses an interesting research direction, enabled by the continuous rotational symmetry available. Transfer learning and task-swapping may become more straightforward. The network may grow to accommodate new tasks, or prune to optimise the model and stabilise computation. For example, it may stabilise by removing near-negligible, but sometimes significantly non-zero effects, which may be maladaptive due to their infrequent effect. Output and input neurons could also be appended and removed in such a way, allowing for real-time changes to a dataset, or even training on multiple datasets. Such a procedure could be extended to convolutional networks, allowing a dynamic number of kernels. Similar possibilities may exist for other architectures.
It appears that this may sidestep the Lottery Ticket Hypothesis [34] in choosing the optimal network size before training. Due to the computational cost, this does not need to be computed at every step; instead, it can be performed periodically and layerwise.
This could offer substantial insight into how parameters may be shared between tasks in real-time. For example, the author postulates that if a new dataset is introduced partway through training on a different dataset, there might be a short-term increase in parameters until the network's parameter-sharing begins, followed by a pruning phase until a more compact architecture is reached. Questions such as these could motivate the development of a subfield offering insight into these research avenues. These network dynamics may be incredibly insightful. Overall, this enables task-driven, real-time neural plasticity in deep learning. Other continuous symmetry-primitives may enable similar approaches.
D.3 Multi-Headed Layers
Although not limited to isotropic deep learning, producing 'multi-head' feed-forward layers may be desirable, enabling perturbative-like corrections to activations at each layer. This could be achieved by summing over a series of activation functions in each layer. An example of this is shown in Eqn. 68, or a more general construction in Eqn. 69, for weight matrices $W_j$ and $W'_j$, biases $\vec{b}_j$ and $\vec{b}'_j$ and an activation function $\mathbf{f}: \mathbb{R}^n \to \mathbb{R}^n$.
$$
\vec{x}_{l+1} = \sum_j \mathbf{f}\left(W_j \vec{x}_l + \vec{b}_j\right) \tag{68}
$$
$$
\vec{x}_{l+1} = \sum_j \left( W'_j\, \mathbf{f}\left(W_j \vec{x}_l + \vec{b}_j\right) + \vec{b}'_j \right) \tag{69}
$$
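A short sketch of the more general construction in Eqn. 69 is given below. The representation of each head as a tuple of arrays, the function names and the isotropic-tanh default of the form tanh(|z|) z / |z| are assumptions made for illustration.

```python
import numpy as np

def isotropic_tanh(z):
    radius = np.linalg.norm(z)
    return np.tanh(radius) * z / radius if radius > 0.0 else np.zeros_like(z)

def multi_head_layer(x, heads, activation=isotropic_tanh):
    """Eqn. 69: sum over heads of W'_j f(W_j x + b_j) + b'_j.

    heads: iterable of (w_in, b_in, w_out, b_out) tuples, one per head.
    """
    return sum(w_out @ activation(w_in @ x + b_in) + b_out
               for w_in, b_in, w_out, b_out in heads)
```

Setting every `w_out` to the identity and every `b_out` to zero reduces this to the simpler sum of Eqn. 68.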
This is effectively a sum over several feed-forward layers, which could increase the expressibility of a layer. It may be especially beneficial for isotropic functions, such as isotropic-tanh, where each term of the sum produces a further perturbative 'correction' to an output vector. A scaling could be enforced if an ordering of such perturbative corrections was desired.
Furthermore, one could consider an alternative, but similar, structure based upon a fibre-bundle symmetry extension over these various heads. Each layer could constitute a base space, and the multiple heads could be considered a fibre, or vice versa. This can enable a definitional form for matrix-valued functions. Additionally, in such a case, one may also consider the heads as a vector representation over each base neuron, and restructure the parameter maps accordingly for computation between various vector-valued-like neurons. This generalisation may find applicability in certain specialisms, whilst the general structure above may have broader applicability.
D.4 Isotropic Representations May Aid Semantic Alignment
An emerging interdisciplinary field of semantic alignment [65] is trying to produce comparable representations between deep learning and the brain. The author believes it is worthwhile to investigate how the representations generated by the Isotropic Deep Learning approach can aid in achieving this objective. This is because anisotropic functional forms have been shown to create representational structure; this structure is not a naturally emerging consequence of the data [17]. These artificial structures may be detrimental to representational alignment objectives. Removing anisotropy may help with alignment methods that use continuous rotation-like transformations, such as in the work of Williams et al. [66], since the inductive bias of isotropy is equivariance to continuous rotation. This connection is largely speculative, but is included as a point of discussion and potential research avenue for isotropic deep learning. For isotropy, this may provide a testable route for the hypothesis that isotropic deep learning forms more 'natural' semantic structure in representations.
However, this is not to suggest that the brain is likely to be isotropic, especially since the approach produces delocalised functional forms. In isotropic networks, neurons instead act as a collective and are arbitrarily decomposable into any set of individual neurons, due to the gauge invariance. Consequently, there is likely no identifiable and agreeable definition of a neuron in an isotropic network. This is fundamentally incompatible with the brain's structures. On the other hand, observations of distortion in deep learning due to anisotropy do not imply that the brain also produces anisotropic and discrete features when time-averaging its neuron firings. Therefore, despite the incompatibility of functional forms with biological neuron behaviour, their representations may have substantial similarity, and it may be possible to produce better alignment between their respective activation distributions once anisotropic-incurred structures are removed from deep learning.
A similar approach may be extendable to inferring meaning from languages where a large corpus is available. If isotropy produces a continuous representation, free from basis distortions, then one may expect a more structured and interpolatable semantic structure. One may speculate whether an approximately language-agnostic structure may develop, in analogy to representational alignment; the latter field is arguably assuming (and often encouraging) an approximately model-agnostic representation structure. Hence, there may be a chance that representations free from artificial anisotropies may aid in deducing an alignment between known and unknown vocabulary. This may be an interdisciplinary application of Isotropic Deep Learning. Smaller-scale empirical approaches could begin with a hybrid of the two methods to establish whether embedded activation representations can be aligned between other simple biological systems' neural activity or vocalisations, and deep learning models of a similar domain. This may give insight into the approximate semanticity of some of these. Something similar can be tried for various models within deep learning, determining if convergence onto one, or several, universal-like representations occurs within a similar domain.
Despite this, the success of ensemble models may limit this alignment. Ensemble models use diverse individual models to collectively approximate a task solution. The model diversity would suggest that there are more minima than those connected through a permutation or continuous symmetry, which would not yield diverse individual solutions, as they are functionally identical. Therefore, there are likely functionally diverse models with substantially different internal representations. This may challenge assumptions of a universal and comparable representational structure of semantics. Despite this, they may all produce different approximations to a universal representation. Substantial testing will elucidate this, and the basis-undistorted representations produced by isotropic networks may be particularly beneficial in such an endeavour.
E Comparisons With Geometric Deep Learning Approaches
E.1 Distinction from Equivariant Networks
Geometric Deep Learning and this Primitive-First approach both consider deep learning through the use of symmetries and group-theoretic lenses. Naturally, due to this, several implementation, formulaic and terminological convergences occur, and such interdisciplinary considerations and understanding of their consequences may be practically advantageous in general. For example, similarities include the (algebraic) invariant and equivariant relation in their definition, which extends to other group-theoretic usage, representation theory, Lie Groups, Gauges, etc., which can make them appear similar due to a common formalism. Yet, these perspectives on symmetry in deep learning remain distinct in their intended purpose, the resultant consequences for applications, their broadly different regimes of consideration, their independent motivation for such symmetry principles, and their differing potential impacts and implications for the field. Several places in this work already briefly outline overlaps and dichotomies, particularly regarding taxonomic unification. This section aims to provide an extended outline of the parallels and divergences between these philosophies, establishing a holistic picture of these complementary considerations of symmetries in deep learning.
Primarily, the guiding philosophy of Geometric Deep Learning is to ascertain which symmetries (specifically the algebraic symmetries in the taxonomic terminology) are inherent to a given dataset and in the intended application. Using this, one then constructs models which are guaranteed to respect this data-derived structure. A clear example of this approach is the broad array of equivariant networks that have been developed, and several of these in particular will be discussed further below to compare similarities and differences. These have required the construction of individual maps to further this model-scale equivariance when composed together. This does create some overlap, specifically within algebraic considerations, but it is primarily a top-down approach: it starts with model-scale constraints and recursively applies them downwards to ensure it is retained over all the functions and their compositions.
For primitive-first reformulations, it starts with determining the implications of primitive functions as foundational biases and their compositional biases, which are expected to be hierarchical in nature. This is its initial line of enquiry before leveraging such findings generally. This is not just pertinent to activation functions but extends over primitives more generally: optimisers, normalisers, operations, initialisations and many more foundational maps. These can be collated into sets defined by particular symmetries through a taxonomised system of three generations and three flavours to indicate the strength/degree and type of symmetry categorising each map; this merges into the broader taxonomic approach. In this, Geometric Deep Learning occupies the algebraic sector in furtherance of model-scale symmetries. This is not the sole constraint nor scale-remit of the primitive-first approach, for which the whole taxonomy is relevant at all scales. This approach is the suggestion that these primitive-characterising symmetries may have important implications internally to networks. Their form may have direct and indirect implications for the internal dynamics of networks, through representations and learning dynamics, which, once suitably understood, can be leveraged in more general applications. It is hypothesised that such symmetries may be an important and generalising analytical quality of many primitives of each set.¹² Hence, the development and investigative audit of all such primitives and their consequences, followed by the rebuilding of general architectures for applications, is the primary guiding principle of this work. It therefore constitutes a bottom-up approach which is less predicated on specific data-driven structures.
To extend this brief comparative analysis, this section will continue with a discussion and outline of the several key approaches within Geometric Deep Learning, namely Equivariant Group-Convolutions [23] and discussion of Steerable-CNNs [24], Harmonic networks [25] and Spherical-CNNs [26]. Following this is a summary of the similarities between these methods and the primitive-first approach; namely, one can produce some alignments when considering only algebraic symmetries of the taxonomy. Finally, crucial differences will be highlighted to demonstrate the distinctiveness of these approaches in general, particularly in terms of symmetry in deep learning. Overall, this indicates distinct but potentially complementary group-theoretic approaches.
An outline of Cohen and Welling [23]'s Group Equivariant Networks is to utilise a specific symmetry, particularly the one expressed in the underlying task's data-structure domain, and ensure the network as a whole respects the task-relevant symmetry through use of a modified convolution operation: Group Convolutional Neural Networks (G-CNNs). This is generalising the traditional translation equivariance of the convolution operation (ignoring edge effects) to instead be equivariant to a general discrete group $G$. This symmetry group is chosen a priori by considering the given dataset, so the approach is to leverage the known symmetries of the task as a strong and constraining inductive bias to ensure accurate desired solutions. Consequently, this symmetry-respecting constraint is applied end-to-end over the whole network architecture. This can have a wealth of benefits, including increased weight-sharing efficiency, physically accurate modelling, and a resultant increased expressive capacity.
A summary of this design procedure is: identify whether the specific task has its data distributed over a particular linear base space and whether applications upon this data are expected to follow a symmetry of that space. If so, then the data can be 'lifted' onto a symmetry group acting on that space, and an associated equivariant model can then be used to achieve the intended application. The connection to symmetry can be denoted for data and general activations by the following map: $f: G \to \mathbb{R}^n$. Every element within the group is assigned an $n$-dimensional vector by $f$, such that the resultant vectors are interrelated through the intended group action. For Cohen and Welling [23], their convolution operation then preserves group-structure in its convolution map for discrete symmetries, producing new features that are still structured over the group. These can then be stacked to form a model which respects the end-to-end symmetry.
¹² Although it is recognised that other analytical qualities specific to individual implementations also contribute, so to some extent could be considered an 'effective theory'.
The group-convolution is implemented as a modification of the classic discrete convolution operation: by applying the filter over the group, as shown in Eqn. 70. In Eqn. 70, $k$ indexes the filter $\psi$ and notably the sum is over the base space $X$ in the first layer: $h \in X$, not $h \in G$. Then, the resultant equivariant group convolution respects the symmetries of the task at every layer, given by the aforementioned data structure $f: G \to \mathbb{R}^n$. A naive implementation results in augmenting the number of filters to accommodate every action of the group; however, crucially, these are shown to be related through permutation, so can be achieved more efficiently in practice by an indexing that exploits the group's structure. For more details of precise implementation, see Cohen and Welling [23].
$$[f \star \psi](g) = \sum_{h \in G} \sum_{k} f_k(h)\, \psi_k\left(g^{-1} h\right) \tag{70}$$
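As a small worked instance of Eqn. 70, the sketch below evaluates the group correlation on the cyclic group $\mathbb{Z}_N$, where $g^{-1}h$ reduces to $(h - g) \bmod N$, and checks equivariance numerically. The choice of group, the sizes `N` and `K`, the random signals, and the function names are illustrative assumptions; this is not the indexing scheme of Cohen and Welling [23].

```python
import random

N = 6  # order of the cyclic group Z_N (an illustrative choice)
K = 2  # number of input channels

def group_correlate(f, psi):
    """[f * psi](g) = sum_h sum_k f[h][k] * psi[(h - g) % N][k] on Z_N."""
    return [
        sum(f[h][k] * psi[(h - g) % N][k] for h in range(N) for k in range(K))
        for g in range(N)
    ]

def act(u, f):
    """Left action of u on a signal over the group: (L_u f)(h) = f(u^{-1} h)."""
    return [f[(h - u) % N] for h in range(N)]

random.seed(0)
f = [[random.gauss(0, 1) for _ in range(K)] for _ in range(N)]
psi = [[random.gauss(0, 1) for _ in range(K)] for _ in range(N)]

out = group_correlate(f, psi)
u = 2
lhs = group_correlate(act(u, f), psi)  # transform the input, then correlate
rhs = act(u, out)                      # correlate, then transform the output
max_gap = max(abs(a - b) for a, b in zip(lhs, rhs))
```

Here `max_gap` is zero up to floating-point rounding, which is the equivariance property: acting on the input and then correlating matches correlating and then acting on the output.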
Overall, this adapts the convolutional operation and padding to respect the symmetries in the underlying data structure. It is also shown that several existing primitives commute with these group actions. Particularly, the existing elementwise non-linearities commute with the considered group actions, and hence must be retained in their current functional form to ensure end-to-end adherence to the symmetry. In other cases, only small modifications to other primitives, such as those specified for normalisations, need to be made, which still enables them to retain their current functional form. Hence, in this approach, primitive reformulations are potentially detrimental, as they may break the end-to-end equivariance of the construction; in this case, retaining the current elementwise form is important for commuting with the group action. In short, the current primitives are retained.
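The commutation claim can be illustrated with a toy check in $\mathbb{R}^2$. This is a sketch under stated assumptions: the group action is modelled as a coordinate permutation (which is how discrete group actions appear on feature indices), a continuous rotation is included for contrast, and all function names are hypothetical.

```python
import math

def elementwise_tanh(v):
    """Standard elementwise (anisotropic) non-linearity."""
    return [math.tanh(c) for c in v]

def swap(v):
    """Permutation action on R^2: exchange the two coordinates."""
    return [v[1], v[0]]

def rotate(v, theta):
    """Continuous rotation action on R^2."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]

def gap(a, b):
    """Largest componentwise difference between two vectors."""
    return max(abs(p - q) for p, q in zip(a, b))

x = [2.0, 0.0]
theta = math.pi / 4

# Elementwise tanh commutes with the permutation action.
perm_gap = gap(elementwise_tanh(swap(x)), swap(elementwise_tanh(x)))
# It does not commute with a generic continuous rotation.
rot_gap = gap(elementwise_tanh(rotate(x, theta)), rotate(elementwise_tanh(x), theta))
```

The permutation gap vanishes while the rotation gap does not, which is consistent with elementwise forms being the natural choice when the relevant symmetry acts by permuting indices.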
In the subsequent works of Cohen and Welling [24], Worrall et al. [25], and Cohen et al. [26], considerable progress is made in developing models capable of a greater range of equivariance symmetries through extending the architectures and tools. In Cohen and Welling [24], the authors build upon earlier work by creating steerable capsules, in which vectors transform under irreducible representations of the discrete group. These steerable filters are constructed as linear combinations of base filters, resulting in a more parameter-efficient design. In [26], they generalise these concepts to images over a spherical shell, $S^2$, and lift it to an $SO(3)$ continuous symmetry equivariance, using a Fourier transform-like method. Through these and others, networks are made algebraically equivariant to discrete group transformations and extended to specific continuous group transformations. This subfield is highly active and rich with many other successful discoveries along the same vein of data-driven symmetry approaches to the model scale. The key differences can be clarified with the stated examples.
Returning to Worrall et al. [25], the authors use the steerable filters to construct equivariance to continuous patch rotation using finite filters constrained onto circular harmonic functions exhibiting the desirable rotational equivariance. This develops into complex-valued activations and maps, where the initial input data becomes the real part in the complexification. To maintain the rotational equivariance in the harmonic network, an activation function is introduced which must act upon the complex-valued activations but is also constrained to ensure rotational equivariance. The result is an activation function which acts on the absolute value of the complex number elementwise. This instance of an activation function, which applies over the absolute value, can be considered to constitute a single activation function of the algebraic $S_n \times U(1)$ branch, and if expressed multivariately could be manipulated into a functional form such as Eqn. 29. This indicates that application-driven instances of primitives do occur and could be retrospectively classified through the taxonomic considerations of this paper, and particularly strictly as algebraic generations. However, these also emerged in the context of a primitive that must further the broader model's constraints, so consideration of their foundational biases is not discussed in relation to its wider impact on general networks outside of model-scale equivariance, which contributes further divergences in philosophy.
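A minimal sketch of a modulus-based non-linearity of the kind described above is given here. The specific form (a ReLU applied to the modulus plus a bias, with the phase untouched), the bias value, and the name `modulus_relu` are assumptions for illustration; the exact formulation in Worrall et al. [25] may differ.

```python
import cmath

def modulus_relu(z, b):
    """Apply ReLU to |z| + b while leaving the phase of z untouched."""
    r = abs(z)
    if r == 0.0:
        return 0j
    return max(r + b, 0.0) * (z / r)

z = complex(1.5, -0.8)
b = -0.3    # hypothetical learned bias
phi = 0.7   # arbitrary rotation angle
rot = cmath.exp(1j * phi)  # an element of U(1)

lhs = modulus_relu(rot * z, b)   # rotate the phase, then activate
rhs = rot * modulus_relu(z, b)   # activate, then rotate the phase
gap = abs(lhs - rhs)
```

Since only the modulus is altered, a global phase rotation passes through the activation unchanged and `gap` vanishes up to rounding, which is the $U(1)$ equivariance referred to in the text.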
Building upon the Harmonic network, Thomas et al. [67] utilise a more generalised instance intended for geometric tensors which enables equivariance to local rotation, translation and permutation, extending the prior work of Harmonic networks to various geometric tensors. As a necessity, the non-linearity must act on the tensors in a manner which is not changed under the specified group actions. For rank-1 tensors, this is manifestly a form of norm-based activation function instance to ensure equivariance, drawing some parallels with instances of isotropic activation functions but only for tensor objects. Meanwhile, for rank-0 tensors (scalars) this reduces to the standard elementwise activation function of the typical anisotropic forms, and higher-rank tensors are similarly demonstrated. These choices ensure that the non-linearity does not break the equivariance to these actions.
Overall, the shared language of group theory emerged naturally in both approaches as a result of their respective objectives. One uses it in ensuring model-scale adherence to a specified algebraic symmetry group lifted from the data, whilst the other considers sets of primitives with respective function-driven foundational and compositional biases. These do sometimes constitute overlapping considerations and potential for shared tooling, mostly limited to instances of activation functions of specific algebraic symmetries. Yet, they remain tangent in both their motivating purpose, particular versus general applications, and the consequences of symmetry in a network. Although the primitive reformulation of deep learning is of geometrical and deep learning construction, it does not appear to sit cleanly into the current field of geometrical deep learning [27]. Instead, it is constructed around the geometry of embedded representations, altering the internal symmetries of general networks rather than a network-wide externally applied symmetry instilled by a predominantly task-driven inductive bias. In many ways, the primitive-first approach could be considered the consequences and leveraging of symmetry breaking in many circumstances, where a network is not enforced to be end-to-end symmetric and instead these actions can be thought to act on the network and influence its behaviour in unintentional and peculiar ways. These are then conjectured to reemerge as a plethora of scattered phenomena discussed. A further discussion of the differences is provided below.
There are several more distinct differences in the approach, particularly concerning where symmetries arise, the distinct inductive biases considerations, and how these may influence vector spaces. These are further detailed below.
Local coding, as discussed in App. F, is characterised by a one-to-one correspondence between a neuron and a semantic. Therefore, each neuron's activation is associated with the prevalence of a distinct semantic concept. In contrast, distributed codes have semantic meaning that is dispersed across a population of neurons. Consequently, semantics no longer tend to align with individual neurons. There are conveniences to local coding; its simplicity makes networks very interpretable, a benefit to both AI safety and diagnosing network pathologies. It is therefore often given as an intuitive first-order approximation of the action of deep learning models. However, this first-order heuristic may inadvertently suggest that such inductive biases are benign, a position this paper challenges.
Modulation of distinct semantics is a desirable action. Operations such as bounding the strength, rectifying the signal, and distorting aspects of its stimulus to neuron-response curves may all be beneficial semantic actions for a network, altering its internal expressions and enabling better interactions between concepts. These actions can all be achieved using the application of an activation function: Tanh and Sigmoid, ReLU [15] or Leaky-ReLU [15], Swish [13] and SiLU [89], respectively. The same holds for comparative and logical actions between semantics, such as those achieved by Softmax or gate mechanisms [56], respectively. If each neuron encodes a single semantic feature, then under this assumption, it is appropriate to apply such operations elementwise, independently acting on each neuron to scale their semantic representation. Under a local coding assumption, elementwise operations suffice in modulating and enabling interactions between entire semantic features, since each neuron corresponds to an independent semantic.
An additional influence may have been the expert system approach, where fixed if-then logic produces a condition on each meaningful quantity to produce the desired output. This methodology may also have influenced the on-off activation function approach, such as the Heaviside step function. This approach aligns conceptually with local coding.
A consensus is emerging that whilst networks often tend to this local coding [31], the representations are often more nuanced in practice [31, 88, 16]. It is argued that the network may balance local coding with interference and representational capacity needs [16]. This paper and Bird [17]'s work have further explored the issue by implicating functional forms as responsible for the formation of this coding tendency.
Under these more generalised distributed codes, single activation actions and comparisons are insufficient for interacting with whole semantics. Such operations may only distort the fraction of the semantic represented through a single neuron. This underpins the neural refractive problem. Instead, a distributed logic is required. Instead of assigning operations upon numbers as components of arrays, it is suggested to consider a full vector space treatment (inner-product space), with magnitude and directions as foundational quantities rather than components. It is argued that operations should act on these more fundamental and basis-free quantities.
This is reinforced by the changing affine transforms, due to parameters adapting during training from random symmetry-breaking initialisations. Therefore, vector directions may be unpredictably distributed and change rapidly during training, causing activations to move around the representation space. As activations evolve around the representation space through training, so too do the semantics they individually and collectively represent. Therefore, activation functions as semantic moderators should not be chosen to only act upon single neurons, as semantics are not wholly expressed by individual neurons, especially since the tendency towards symmetry-broken local coding only emerges after considerable training [17]. Moreover, this is compounded by lasting polysemanticity in specific neurons [31, 16] even after training.
Since semantics are often found to be distributed in the representation space, and adapt through learning, without a predictive theory of their trajectories, a sensible inductive bias is to apply the modulation isotropically. This ensures that any linear feature, including those off-axis, is modulated consistently by isotropic forms. This respects both distributed coding schemes and polysemantic neurons. Furthermore, the application of functions isotropically still correctly modulates locally coded semantics since their functionality can be identical along the standard basis. Thus, applying non-linearities exclusively along the standard basis is inconsistent with the goal of manipulating internal semantic meaning in distributed codes. Removing such an inductive bias is expected to remove this local coding bias, enabling more distributed representations with increased representation capacity whilst balancing concept interferences.
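The consistency argument can be made concrete by comparing how two features of equal magnitude, one on-axis and one off-axis, are modulated. This is a minimal sketch: the isotropic-tanh form used here (rescaling a vector to length $\tanh\lVert\mathbf{v}\rVert$) is assumed for illustration, and the two test vectors are arbitrary choices.

```python
import math

def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(c * c for c in v))

def elementwise_tanh(v):
    """Anisotropic form: tanh applied along each standard-basis component."""
    return [math.tanh(c) for c in v]

def isotropic_tanh(v):
    """Assumed isotropic form: tanh applied to the magnitude only."""
    r = norm(v)
    return [0.0 for _ in v] if r == 0.0 else [math.tanh(r) / r * c for c in v]

on_axis = [2.0, 0.0]
off_axis = [math.sqrt(2.0), math.sqrt(2.0)]  # same magnitude, rotated 45 degrees

elem_on = norm(elementwise_tanh(on_axis))
elem_off = norm(elementwise_tanh(off_axis))
iso_on = norm(isotropic_tanh(on_axis))
iso_off = norm(isotropic_tanh(off_axis))
```

Under the elementwise form the output magnitude depends on where the feature sits relative to the standard basis, whereas the isotropic form returns the same magnitude for both directions.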
In conclusion, the intuitive local coding approach may have had some encouragement on the elementwise functional form used in contemporary deep learning; however, relaxing this inductive prior to a distributed neural code suggests that isotropic activation functions may be considerably more appropriate. Had distributed coding been prevalent during the early developments of deep learning, then isotropic functional forms may have emerged as the default paradigm: modulating vector magnitudes instead of standard components decomposed on an arbitrary basis. This could have been seen as a more biologically inspired computational model. While exact Isotropic forms are likely biologically implausible, due to constraints of individual neurons with localised responses, such limitations do not constrain artificial systems. Hence, Isotropy may be considered a better inductive bias to approximate coding intuitions.