Master thesis on Sound and Music Computing
Universitat Pompeu Fabra
Neural Engine Sound Synthesis with
Physics-Informed Inductive Biases and
Differentiable Signal Processing
Robin Doerfler
Supervisor: Lonce Wyse
August 2025
Acknowledgements
First and foremost, I would like to thank my advisor, Lonce Wyse, for his thoughtful
supervision and guidance throughout this project. Our discussions have not only
shaped this work but have also been genuinely enjoyable and inspiring.
I extend my gratitude to Xavier Serra and Esteban Gutiérrez for being on my defense
committee and taking the time to read through and evaluate this work. I truly
appreciate your expertise and valuable feedback.
Special thanks to Anmol and Satya for being amazing friends and wonderful
company throughout this program. You made everything so much more enjoyable, and
I’m sure our friendship will continue far beyond this academic chapter.
I am sincerely grateful to my company for trusting me to pursue this degree while
working full-time. Balancing both commitments wasn’t always easy, but their
flexibility and support made it possible.
Thanks to my family and friends for all their encouragement and unwavering belief
in me along the way. Your support meant the world to me, especially during the
more challenging moments.
Finally, my deepest gratitude goes to my partner Beste for your patience and
understanding throughout this entire process. Thank you for your unconditional support
and for celebrating every milestone, big and small, with me.
It has been quite a journey, and I am grateful to everyone who helped make it
possible.
Abstract
Engine sound synthesis is increasingly important in automotive audio and interactive
media, yet presents unique challenges for neural audio generation that distinguish
it from musical audio paradigms. Unlike sustained musical tones, where periodic
oscillations exist inherently in the acoustic vibration, engine sounds emerge from
sequential combustion events that generate sharp pressure transients recurring at rates
from 600 to over 8000 RPM. This creates acoustic phenomena exhibiting significant
inharmonicity, extremely low fundamental frequencies (down to 5 Hz), and rapid
temporal sequences with intervals below 2 milliseconds, demanding approaches that
can model both precision in timing and complexity in timbral evolution, beyond
conventional musical audio assumptions.
While existing differentiable digital signal processing (DDSP) methods have
demonstrated success across various audio synthesis tasks, they often rely on generic
synthesis modules that do not explicitly recognize or incorporate the acoustic principles
and physical mechanisms underlying engine sounds.
This thesis presents a novel approach to engine sound synthesis through systematic
integration of physics-informed inductive biases within the entire differentiable
synthesis pipeline. It proposes the Procedural Engines Model (PRCE), a deep learning
architecture that combines time-varying embeddings of RPM and torque parameters
(including their temporal derivatives) and derived conditioning signals
(throttle position and deceleration fuel cutoff, DFCO) with specialized model
heads for physics-informed parameter conversion. These heads drive two custom
differentiable synthesizer configurations that incorporate domain-specific acoustic
principles. To guide learning toward accurate engine timbre reproduction, a custom loss
function is introduced that prioritizes spectral energy near engine-order harmonics,
drawing inspiration from Campbell diagrams commonly used in noise, vibration, and
harshness (NVH) analysis.
Engine sounds present a fundamental duality: while in reality a sum of structured
noise-like pressure pulses, they manifest as distinctly harmonic acoustic phenomena.
This motivates two complementary synthesis strategies that provide contrasting
optimization pathways toward the same acoustic target: direct spectral-temporal
reconstruction that implicitly reflects the underlying pulse structure, and explicit pulse
sequence modeling through acoustic simulation of individual combustion events,
their temporal alignment, and exhaust system propagation.
The PRCE framework implements both perspectives as two configurations. The
Harmonic-Plus-Noise (HPN) variant employs modified harmonic synthesis with
systematic inharmonicity and temporal-spectral structuring of noise components to
model observable acoustic characteristics. The Pulse-Train-Resonator (PTR)
configuration directly models physical–acoustic phenomena by composing combustion
pulses aligned to engine firing patterns and propagating them through differentiable
resonator networks simulating exhaust acoustics.
Evaluation on procedurally generated engine sound datasets totaling 2.5 hours across
varied operating conditions reveals complementary strengths between synthesis
approaches. PTR achieves modestly superior validation performance (5.7%
improvement in total loss) and demonstrates more consistent training-validation transfer,
while HPN shows greater flexibility across diverse engine configurations and
robustness to harmonic irregularities. Both variants successfully capture authentic engine
acoustic behaviors despite distinct synthesis strategies and their audible signatures.
This research demonstrates systematic integration of physics-informed inductive
biases into differentiable synthesis architectures, providing a methodological
framework applicable to physically-constrained audio generation beyond automotive
contexts. The work reveals that domain-specific biases produce distinct acoustic
signatures that influence both optimization strategies and perceptual outcomes. To
support future research, we openly publish the Procedural Engines Dataset, a
comprehensive collection of procedurally generated engine audio with time-aligned
control annotations and the complete PRCE model pipeline.
Keywords: Engine Sound Synthesis, Differentiable Signal Processing,
Physics-Informed Neural Networks, Inductive Biases, Neural Audio Synthesis
Contents
Acknowledgements
Abstract
List of Figures
List of Tables
1 Introduction
1.1 Motivation and Research Question
1.1.1 Motivation
1.1.2 Research Question
1.2 Objectives and Rationale
1.2.1 Research Objectives
1.2.2 Rationale
1.3 Structure of the Thesis
2 Background - State of the Art
2.1 Applications of Engine Sound Synthesis
2.2 Engine Acoustics Fundamentals
2.3 Traditional Synthesis Methods
2.3.1 Sample-Based Approaches
2.3.2 Procedural Synthesis
2.3.3 Limitations and Gaps
2.4 Neural Synthesis: DDSP
2.4.1 DDSP Foundations
2.4.2 DDSP in Engine Sound Synthesis
2.4.3 Related DDSP Developments
2.4.4 Engine Sound Datasets
2.5 Research Landscape and Gaps
3 Procedural Engine Sounds Dataset
3.1 Dataset Generation Methodology
3.1.1 Feature Extraction from Real Engine Data
3.1.2 Real-Time Synthesis Architecture
3.1.3 Dataset Augmentation Strategy
3.1.4 Data Format and Synchronization
3.2 Dataset Validation and Analysis
3.2.1 Representativeness and Realism Assessment
3.2.2 Statistical Distribution and Variability Analysis
3.3 Applications and Research Implications
3.3.1 Automotive Audio Processing Applications
3.3.2 Parameter Estimation and Inverse Modeling
4 Procedural Engines Model (PRCE)
4.1 Architecture Overview
4.2 Input Feature Engineering
4.3 Conditioning Signal Design
4.3.1 Design Rationale
4.3.2 Harmonic Frequency Scaling
4.3.3 Virtual Throttle and DFCO Derivation
5 Synthesizers and Model Heads
5.1 Differentiable Synthesis Methodology
5.1.1 Bridging Traditional Methods and Neural Synthesis
5.1.2 The Impulsive Origin of Engine Harmonicity
5.1.3 Dual Synthesis Strategy
5.2 Harmonic-Plus-Noise Engine Sound Synthesis
5.2.1 Physically-Informed Synthesis Components
5.2.2 Neural Parameter Control
5.2.3 Complete Algorithm Formulation
5.3 Pulse-Train-Resonator Engine Sound Synthesis
5.3.1 Pulse Wave Composition
5.3.2 Firing Order Sequence
5.3.3 Neural Parameter Control
5.3.4 Differentiable Karplus-Strong Resonator
6 Loss Functions
6.1 Multi-Resolution STFT Loss
6.2 Harmonic Loss
7 Training
7.1 Model Architecture
7.2 Training and Validation Data
7.3 Data Processing
7.4 Optimization Setup
7.5 Parameter Initialization
8 Results
8.1 Quantitative Analysis
8.1.1 Convergence and Performance
8.1.2 Loss Component Analysis
8.1.3 Generalization Across Datasets
8.2 Subjective Assessment
8.2.1 Overall Reproduction Quality
8.2.2 Audible Signatures of Inductive Biases
8.2.3 Dynamic Response to Control Parameters
8.2.4 Noise and Artifacts
9 Conclusions and Future Work
9.1 Summary of Contributions
9.2 Key Findings
9.3 Implications
9.4 Future Work
Data and Code Availability
Appendices
A Harmonic Deviation Analysis
B Supplementary Training Results
Bibliography
List of Figures
1 Audio and Control Data
2 Dataset Comparison
3 Procedural Engines Model overview
4 HPN Signal flow
5 Pulse shapes for different parameter settings
6 Karplus-Strong Algorithm
7 Harmonic Mask applied to predicted and target spectral frame
8 Convergence Comparison of Total Loss
9 Convergence Comparison of Harmonic Loss
10 Convergence Comparison of Multi-Resolution STFT Loss
11 Group Engine Order Harmonic Deviations
12 Single Continuous Engine Order Harmonic Deviations
List of Tables
1 Dataset Structure
2 RPM and Torque Distributions
3 Validation Loss Comparison
4 Training Loss Comparison
5 Validation Loss at Early Stopping
6 Training Loss at Early Stopping
instrument paradigms and may not effectively capture the continuous mechanical
processes governing engine acoustics. The development of dual synthesis approaches
addresses this limitation by providing both intuitive harmonic-based modeling and
physically-grounded pressure-wave simulation within the same theoretical
framework.
The core methodological challenge lies in translating insights about engine sound
acoustics into differentiable synthesis components that can guide the learning
process toward plausible and interpretable solutions through fundamentally different
computational pathways.
The comparative evaluation of the two modeling approaches, one targeting acoustic
characteristics and the other physical processes, provides valuable insights into the
relationship between physical understanding and synthesis architecture design.
By developing both a synthetic dataset and dual physically-informed synthesis
architectures, this work addresses the data bottleneck while making the primary
contribution of demonstrating how systematic integration of physical knowledge can
enhance neural audio synthesis performance through multiple perspectives,
extending beyond traditional music-centric applications and providing a methodological
framework applicable to other complex acoustic modeling tasks.
1.3 Structure of the Thesis
This thesis is structured across nine chapters, progressing from theoretical
foundations through methodological development to empirical validation. It follows
a systematic approach: establishing the research context and identifying gaps in
existing approaches, developing the necessary data infrastructure, designing dual
synthesis architectures based on physics-informed principles, and finally evaluating
their comparative performance. An overview of each chapter is provided below.
Chapter 1 – Introduction: Establishes the research motivation by identifying
the unique challenges of engine sound synthesis within neural audio generation.
Introduces the dual modeling paradigm, formulates the central research questions, and
outlines the specific objectives for investigating physics-informed synthesis
architectures.
Chapter 2 – Background - State of the Art: Provides comprehensive
coverage of engine acoustics fundamentals, traditional synthesis methods, and recent
developments in neural audio synthesis with particular focus on DDSP approaches.
Reviews existing applications, identifies limitations in current methodologies, and
establishes the theoretical foundation for physics-informed synthesis. Concludes by
highlighting research gaps that motivate the dual modeling approach.
Chapter 3 – Procedural Engine Sounds Dataset: Details the development
of a comprehensive synthetic dataset designed to support controlled investigation of
both synthesis paradigms. Describes the real-time synthesis architecture used for
data generation, feature extraction methodologies from real engine recordings, and
validation procedures ensuring acoustic realism. Establishes the data foundation
necessary for robust comparative evaluation of the proposed models.
Chapter 4 – Procedural Engines Model (PRCE): Presents the core neural
architecture shared by both synthesis approaches, including input feature
engineering strategies and conditioning signal design. Introduces virtual throttle and DFCO
signal derivation that enable physics-informed parameter control. Establishes the
common foundation upon which both HPN and PTR synthesis variants are built.
Chapter 5 – Synthesizers and Model Heads: Develops the theoretical
framework for differentiable synthesis methodology and presents detailed implementations
of both modeling approaches. The Harmonic-Plus-Noise (HPN) configuration
implements direct spectral-temporal modeling through systematic domain-specific
adaptations such as learnable inharmonicity and RPM-synchronized, temporally
structured noise components. The Pulse-Train-Resonator (PTR) configuration realizes
pulse sequence modeling through acoustic pressure pulse simulation and exhaust
system resonance modeling. Establishes the acoustic duality as the theoretical bridge
between these approaches.
Chapter 6 – Loss Functions: Introduces domain-specific loss function design
that guides learning toward physically plausible solutions. Develops multi-resolution
STFT loss for broad spectral coverage and harmonic loss components that prioritize
engine-order harmonic content, drawing inspiration from NVH analysis principles
used in automotive engineering.
Chapter 7 – Training: Documents the training methodology including
hyperparameter selection, initialization strategies, and data preparation procedures.
Establishes consistent training protocols that enable fair comparison between HPN and
PTR approaches while accounting for their different computational requirements
and convergence characteristics.
Chapter 8 – Results: Presents comprehensive comparative evaluation of both
synthesis approaches across multiple dimensions: training dynamics and convergence
behavior, reconstruction quality metrics, loss decomposition, generalization
performance across different engine configurations, and subjective assessment of
perceptual quality. Analyzes the practical manifestation of the dual modeling paradigm
regarding objective metrics and beyond.
Chapter 9 – Conclusions and Future Work: Summarizes findings from the
comparative investigation, discusses the broader implications of physics-informed
neural engine sound synthesis, and identifies opportunities for extending this
methodology. Outlines future research directions and provides access information for the
open-sourced code, models, and datasets developed in this work.
Chapter 2
Background - State of the Art
This thesis investigates the synthesis of engine sounds through neural synthesis
methodologies. This chapter provides a foundational review of the core concepts
underlying this research area. It begins with an examination of engine sound
synthesis application domains, followed by a review of traditional signal processing and
synthesis methodologies for engine sound analysis and synthesis. An overview of
contemporary neural synthesis approaches is then presented, with particular focus
on differentiable signal processing techniques demonstrating promising efficacy in
this field. Finally, the availability of relevant training datasets is assessed, and a
summary is provided that highlights the current state of the art and identifies
existing research gaps.
2.1 Applications of Engine Sound Synthesis
Engine sound synthesis finds significant applications across multiple industries, with
the automotive sector being the primary domain of implementation. The sound of
internal combustion engines has evolved into a key perceptual aspect of vehicle
performance and brand identity: Chang et al. [1] related brand positioning attributes
to certain sonic characteristics of combustion engine sounds. Moon et al. [2] further
investigated distinct inter-harmonic relationships and their relation to perceptual
aspects. Melman et al. [3] demonstrated that vehicle sound modifications influence
driving behavior, thereby serving even beyond the mere auditory experience.
Schramm et al. [4] built on this concept by adaptively modifying engine sounds,
matching predicted driver engagement and studying their interplay. As illustrated
by Belschner and Bodden [5], the artificial modification and enhancement of engine
sounds by Active Sound Design departments represents a decades-old procedure,
which remains relevant in contemporary applications, including electric vehicles.
This is supported by the work of Küppers et al. [6], suggesting that the application
of driving sounds and their perceptual benefits remain desired, even in the context
of electrified mobility, where traditional engine sounds are absent.
Beyond automotive applications, real-time generation of engine noises is
widespread in the game audio industry (Collins et al. [7]), where adaptive
synthesis techniques maintain immersion in interactive environments.
2.2 Engine Acoustics Fundamentals
Engine sounds constitute a complex acoustic phenomenon with distinctive
characteristics that differ significantly from musical instruments. Understanding these
physical principles is crucial for developing effective synthesis methods aligned with
the domain-specific characteristics of engine acoustics, as comprehensively reviewed
by Jones [8].
Combustion engines produce propulsion by compressing and igniting an air-fuel
mixture in cylinders. The resulting expansion of gases applies force to the piston, which
transfers energy through the crankshaft to the drivetrain, ultimately generating
rotational motion that propels the vehicle. The fundamental components of this acoustic
phenomenon arise from alternating pressure fluctuations, driven by the sequential,
periodic processes of air intake, compression, combustion, and exhaust within the
engine’s combustion chambers. These pressure waves propagate through the
complex geometry of the exhaust system, where they undergo interference, compression,
resonance, and fluctuations in pressure, temperature, and air density, resulting in a
highly complex and nonlinear acoustic phenomenon, as noted by Balistreri [9].
What is commonly referred to as engine sound primarily originates from exhaust
pressure pulses rather than direct combustion noise. While intake systems and
combustion processes also contribute acoustic signatures, exhaust pressure pulses
dominate the perceived sound character. Combustion generates broadband
pulsating noise as a function of engine speed, load, and fuel input, and primarily acts as
an exciter signal. Load and throttle position significantly influence acoustic
characteristics by modulating backpressure, combustion intensity, and the amplitude of
pressure waves. The characteristic tonal qualities arise from the lower-frequency
exhaust pulses that occur at specific points in the engine’s rotation cycle. These
pulses produce distinctive harmonic patterns, referred to in domain literature as
engine orders, which are determined by the engine’s configuration, cylinder count,
and firing sequence.
In a four-stroke engine, the most common combustion engine, each piston produces
one exhaust pulse for every two crankshaft rotations, corresponding to the 0.5th
order (one pulse every other revolution).
Summed over all cylinders, the fundamental frequency of a four-stroke engine sound
is given by the simple relationship
\[ f_0\,[\mathrm{Hz}] = \frac{N \cdot \mathrm{RPM}}{120}, \tag{2.1} \]
where N is the number of cylinders, and RPM is the engine rotational speed in
revolutions per minute.
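As a quick numerical check of Eq. (2.1), the following minimal Python sketch evaluates the firing frequency and the frequency of an arbitrary engine order. The function names are illustrative and not part of the thesis code; the sketch assumes a four-stroke engine with evenly spaced firing events.

```python
def firing_frequency_hz(num_cylinders: int, rpm: float) -> float:
    """Fundamental (firing) frequency of a four-stroke engine, Eq. (2.1)."""
    # Each cylinder fires once every two crankshaft revolutions,
    # so N cylinders produce N / 2 pulses per revolution.
    return num_cylinders * rpm / 120.0


def order_frequency_hz(order: float, rpm: float) -> float:
    """Frequency of an engine order, i.e. a multiple of the crankshaft rate."""
    return order * rpm / 60.0
```

Under these assumptions, a four-cylinder engine at 3000 RPM has a firing frequency of 100 Hz, and the 0.5th order at 600 RPM lies at 5 Hz.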
The path these pressure pulses travel significantly transforms the final acoustic
output. Header design acts as an acoustic filter that dramatically alters the timing and
interaction of these pressure waves:
• Equal-length headers preserve the original timing sequence of pulses.
• Unequal-length headers scramble pulse timing, creating distinctive acoustic
signatures.
• Manifold shape, diameter changes, and junction geometry further modify the
sound through reflection and interference patterns.
These acoustic transformations effectively function as a complex, time-variant
resonator system shaping the initial pressure pulses of distinct firing sequences,
resulting in a complex, partially harmonic signal with both static and rhythmically
varying noise components.
The synthesis of such signals presents several unique challenges:
1. Dynamic Spectral Evolution: Unlike musical instruments with relatively
stable harmonic structures, engine harmonics shift dramatically with changes
in performance parameters, creating complex, continuously evolving timbres.
2. Stochastic Components: These are intrinsically tied to the engine
revolution cycles and firing sequence. They occur as periodic pressure pulses,
spanning from distinct audible rhythmic patterns at low engine speeds to dense,
high-frequency pulse trains that blend into being perceived as periodic signals
rather than distinct pulses.
3. Non-Linear Acoustic Interactions: Pressure waves interact through
constructive and destructive interference in the exhaust system, creating emergent
acoustic phenomena difficult to model with simple additive or subtractive
synthesis techniques.
These physical characteristics and signal analytical observations directly inform the
design considerations for neural synthesis methods discussed in subsequent chapters.
The complex interaction between excitation signals (combustion events) and acoustic
filtering (exhaust system resonance) suggests that neural architectures modeling
this excitation-resonator relationship may prove more effective than general-purpose
harmonic-plus-noise models.
The acoustic signatures of engines thus represent fascinating study subjects at the
intersection of physics, signal processing, and perceptual acoustics, demanding
computational approaches that can accurately model both their physical principles and
perceptual characteristics.
2.3 Traditional Synthesis Methods
Engine sound synthesis has evolved through several methodological approaches,
broadly categorized into sample-based methods and procedural synthesis techniques.
This section reviews these traditional approaches, examining their strengths,
limitations, and applications before introducing neural synthesis methods as the focus
of this thesis.
2.3.1 Sample-Based Approaches
Sample-based approaches rely on recorded audio segments that are processed and
reassembled according to control parameters. These methods can be further
categorized into loop- or wavetable-based methods, concatenative synthesis, and granular
synthesis:
Loop Playback and Wavetable Synthesis
Loop-based methods, as described by Koivisto [10], operate by recording engine noise
at a few static engine speeds. These recordings undergo editing to obtain seamless
periodic audio loops that avoid discontinuities at their boundaries. They can be
as small as wavetables of a few hundred samples, merely spanning a single period
of the fundamental harmonic oscillation, as shown, for example, by Heitbrink and
Cable [11]. The processed samples are then implemented in audio sampling engines.
To achieve intermediate engine speeds between the recorded samples, the system
either under-samples or over-samples the audio playback, effectively modulating the
speed and pitch of the original sample within an acceptable range of pitch shift.
This approach offers computational efficiency and a low memory footprint, but can
produce acoustic artifacts when pitch modulation exceeds natural-sounding limits.
Moreover, it provides limited flexibility beyond the recorded data and lacks the
natural variance inherent in physical systems.
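The pitch-shifting step of loop playback can be sketched as a variable-rate read from a seamless loop. This is a minimal illustration with linear interpolation, not the implementation of the cited systems; the function names and the simple RPM ratio used to derive the playback rate are assumptions.

```python
import numpy as np


def play_loop(loop: np.ndarray, rate: np.ndarray) -> np.ndarray:
    """Read a seamless loop with a per-sample playback rate (1.0 = original pitch)."""
    n = len(loop)
    # Accumulate the read position; the first output sample reads position 0.
    pos = np.concatenate(([0.0], np.cumsum(rate[:-1]))) % n
    i0 = np.floor(pos).astype(int)
    i1 = (i0 + 1) % n  # wrap around so the loop boundary stays seamless
    frac = pos - i0
    return (1.0 - frac) * loop[i0] + frac * loop[i1]


def rate_for_rpm(target_rpm, recorded_rpm: float) -> np.ndarray:
    """Hypothetical mapping: playback rate as the ratio of target to recorded RPM."""
    return np.asarray(target_rpm, dtype=float) / recorded_rpm
```

A rate of 2.0 reads every second sample and doubles the pitch, which illustrates why large deviations from the recorded engine speed quickly sound unnatural.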
Concatenative Synthesis
Concatenative synthesis strategies, in contrast, rely on recordings that continuously
span the entire engine speed range, ensuring comprehensive coverage for synthesis
at arbitrary speeds. As demonstrated by Jagla et al. [12], target engine speed noise
is recreated by dynamically selecting and extracting sound segments from the
original recordings and concatenating them using overlap-add methods. This approach
efficiently reconstructs engine noises across all operating speeds, providing a
computationally feasible solution for interactive simulations and virtual environments.
More sophisticated approaches further decompose recordings into component parts.
Chen et al. [13] and Li et al. [14] propose methods that separate low-frequency
harmonics from high-frequency narrow-band amplitude-modulated signals. This
decomposition enables frame-wise resynthesis, eliminates the need for phase-accurate
sample alignment via Griffin-Lim or overlap-add, and forms a hybrid approach that
combines component-based resynthesis with sample-based techniques.
Overall, concatenative methods differ in flexibility, real-time capability, and sonic
realism, often relying primarily on engine speed as the control parameter.
Granular Synthesis
Granular synthesis methods, widely used in the game industry as noted by
MacGregor [15], split audio examples into small units called “grains.” Different sound
qualities and timbres are achieved by varying the waveform, envelope, duration, and
density of these grains. This approach can produce more natural-sounding results
through grain-level randomization, including small position shifts, pitch variations,
gain adjustments, and modifications to grain size.
While granular synthesis offers greater flexibility and the potential for more natural
results, its grain-level real-time processing typically makes it more computationally
demanding than other sample-based playback methods.
It also presents several practical challenges: meticulous parameter tuning is required
due to the numerous interrelated settings, which can be time-consuming and
unpredictable; precise alignment of grains with engine events, such as piston firings, is
difficult and may result in timing artifacts or loss of rhythmic coherence; and the
setup is labor-intensive, involving careful selection and cleanup of source recordings
to avoid unwanted noise and ensure consistent grain material.
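To make the grain-level randomization concrete, the following minimal sketch overlap-adds Hann-windowed grains whose read positions and gains are jittered. It is an illustration under simplifying assumptions (fixed grain size, no pitch variation, no alignment to firing events); all names and the gain range are hypothetical.

```python
import numpy as np


def granular_resynth(source, num_grains, grain_len, hop, jitter, rng):
    """Overlap-add Hann-windowed grains taken from jittered source positions."""
    window = np.hanning(grain_len)
    out = np.zeros(hop * (num_grains - 1) + grain_len)
    max_start = len(source) - grain_len
    for g in range(num_grains):
        # The nominal read position advances with the hop; jitter randomizes it.
        start = g * hop + rng.integers(-jitter, jitter + 1)
        start = int(np.clip(start, 0, max_start))
        gain = rng.uniform(0.8, 1.0)  # small random gain variation per grain
        grain = source[start:start + grain_len]
        out[g * hop:g * hop + grain_len] += gain * window * grain
    return out
```

Even in this reduced form, grain length, hop, jitter, and gain range interact, which hints at the tuning effort described above.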
2.3.2 Procedural Synthesis
In contrast to sample-based methods, procedural synthesis generates engine sounds
directly through signal-processing techniques, primarily using additive, subtractive,
and physically informed synthesis methods.
Additive Synthesis
Additive synthesis of engine harmonics has emerged as the predominant method
within the automotive industry, as discussed by Bodden and Belschner [16] and
Boussard et al. [17]. This approach is particularly suitable in applications where the
actual engine noise is present but requires timbral modification through the addition
and enhancement of single engine harmonics. The amplitudes of these harmonics
are modulated as a function of engine parameters, including RPM, load, and gear,
providing an efficient means to precisely modify the harmonic timbre of the original
engine noise.
While effective, the harmonic-based approach represents an idealized
implementation that often neglects stochastic components and reduces the harmonics to a
manageable subset, as the number of tunable parameters grows rapidly with each
harmonic, requiring amplitude and frequency definitions across all combinations of
engine performance parameters.
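A minimal sketch of order-locked additive synthesis is given below. It assumes that amplitude envelopes per engine order are supplied by the caller (in practice they would be looked up from RPM, load, and gear), and it is not the implementation of the cited systems; names and the default sample rate are illustrative.

```python
import numpy as np


def additive_engine_orders(rpm, orders, amplitudes, sample_rate=48000):
    """Sum sinusoids locked to engine orders for a time-varying RPM curve.

    rpm:        (T,) engine speed per audio sample
    orders:     (K,) engine orders (multiples of the crankshaft rate)
    amplitudes: (K, T) per-order amplitude envelopes
    """
    rpm = np.asarray(rpm, dtype=float)
    orders = np.asarray(orders, dtype=float)
    crank_hz = rpm / 60.0  # crankshaft rotation rate in Hz
    # Integrate frequency to phase so that RPM sweeps stay click-free.
    crank_phase = 2.0 * np.pi * np.cumsum(crank_hz) / sample_rate
    freqs = orders[:, None] * crank_hz[None, :]
    # Mute partials at or above Nyquist to avoid aliasing.
    amps = np.where(freqs < sample_rate / 2.0, amplitudes, 0.0)
    return np.sum(amps * np.sin(orders[:, None] * crank_phase[None, :]), axis=0)
```

The (K, T) amplitude table makes the parameter growth visible: every additional order adds a full envelope that has to be defined across all operating conditions.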
Spectral Synthesis
Cao et al. [18] evaluate engine order synthesis beyond the presence of a physical
engine, specifically in the context of electrified vehicles. They extract magnitude
spectra from engine recordings and synthesize new signals using the inverse
short-time Fourier transform combined with overlap-add.
challenge, Hayes et al. [31] proposed a complex exponential surrogate approach that
replaces direct frequency optimization with optimization of complex parameters.
By leveraging PyTorch’s Wirtinger derivatives for complex-valued parameters, this
method enables gradient-based frequency estimation that avoids the local minima
typical of direct sinusoidal frequency optimization.
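The parameterization behind such a surrogate can be illustrated in a few lines. The sketch below only shows how a single complex parameter z encodes a sinusoid and how the frequency is read back from its angle; it uses NumPy rather than PyTorch, performs no optimization, and the function names are assumptions made for illustration.

```python
import numpy as np


def surrogate_sinusoid(z: complex, num_samples: int) -> np.ndarray:
    """Real part of z**n: a (possibly decaying) sinusoid set by one complex number."""
    n = np.arange(num_samples)
    return np.real(z ** n)


def frequency_hz(z: complex, sample_rate: float) -> float:
    """Read the frequency back from the angle of the complex parameter."""
    return float(np.angle(z)) * sample_rate / (2.0 * np.pi)
```

For |z| = 1 the surrogate equals cos(ωn) with ω the angle of z, while |z| < 1 yields an exponentially decaying sinusoid; a gradient-based optimizer would update z itself instead of the frequency.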
Stochastic Synthesis
Barahona-Rios and Collins introduced NoiseBandNet [32], a model that addresses
the time–frequency trade-off of spectral noise synthesis while simplifying the overall
architecture. The system predicts time-varying amplitude envelopes for each noise
band at frame rate, then upsamples them linearly to the audio sampling rate before
applying them to a precomputed filtered noise bank. This design achieves high
resolution in both time and frequency, closely aligning with our objective of a synthesizer
operating directly at audio rate for all predicted parameters.
Interestingly, they also found that providing MFCCs as input features was not
essential, and that even non-deterministic, perceptually motivated control signals –
manually drawn according to human intuition – were sufficient to guide the model
during training and inference. This suggests that physically meaningful
parameters, such as engine torque, can provide reliably structured information for learning
acoustic patterns, even if they are not strict mathematical audio descriptors like
loudness or f0.
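The envelope upsampling step described for NoiseBandNet can be sketched as follows. This is a simplified illustration, not the published implementation: the noise bank is passed in as a plain array standing in for precomputed band-filtered noise, and the function names and the hop parameter are assumptions.

```python
import numpy as np


def upsample_envelopes(frame_amps: np.ndarray, hop: int) -> np.ndarray:
    """Linearly interpolate (bands, frames) envelopes to audio rate."""
    num_frames = frame_amps.shape[1]
    frame_times = np.arange(num_frames) * hop
    sample_times = np.arange((num_frames - 1) * hop + 1)
    return np.stack([np.interp(sample_times, frame_times, band) for band in frame_amps])


def apply_to_noise_bank(frame_amps: np.ndarray, noise_bank: np.ndarray, hop: int) -> np.ndarray:
    """Scale each precomputed noise band by its envelope and sum the bands."""
    env = upsample_envelopes(frame_amps, hop)
    length = min(env.shape[1], noise_bank.shape[1])
    return np.sum(env[:, :length] * noise_bank[:, :length], axis=0)
```

Because the envelopes act sample by sample on fixed band signals, time resolution is set by the audio rate while frequency resolution is set by the filter bank.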
Resonator Modeling
The original DDSP framework used learned impulse responses applied via
convolution for reverberation in musical applications. For engine sound synthesis, however,
reverberation is less relevant; instead, resonators – capturing the dynamic acoustic
interplay within exhaust pipes – play a central role.
Resonators are typically described as infinite impulse response (IIR) filters: systems
whose outputs depend not only on current excitation but also on past states. Their
recursive nature makes them efficient, but it also complicates gradient-based
optimization, as discussed in Section 5.3.4. Recent work has proposed several strategies
for implementing differentiable IIR filters and resonators:
Recurrent implementations. A direct approach is to unroll the recursion and implement
IIR filters as recurrent layers [33, 34, 35]. While exact, this method becomes
prohibitively slow when scaling to many filters operating at audio sample rate.
Frequency-domain methods. An alternative is to avoid time-domain recursion entirely
by evaluating the filter's transfer function on a frequency grid. This
"frequency-sampling trick" converts coefficients into a differentiable frequency response,
which is then applied in the spectral domain. Notable examples include Nercessian et
al. [36], who modeled audio distortion effects with interpretable IIR systems, and
Diaz et al. [37], who applied the method to differentiable modal synthesis of
rigid-body resonances.
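As a minimal sketch of the frequency-sampling idea (not the implementation used in [36] or [37]), the snippet below evaluates the transfer function of a single biquad on a uniform frequency grid using only the Python standard library. The function name, the coefficient layout (b0, b1, b2) and (a0, a1, a2), and the grid are illustrative assumptions; in a differentiable setting the same arithmetic would be carried out on tensors so that gradients reach the coefficients.

```python
import cmath
import math


def biquad_frequency_response(b, a, n_bins):
    """Evaluate H(e^{jw}) = B(e^{jw}) / A(e^{jw}) on n_bins points in [0, pi]."""
    response = []
    for i in range(n_bins):
        w = math.pi * i / (n_bins - 1)      # normalized angular frequency
        z1 = cmath.exp(-1j * w)             # z^-1 on the unit circle
        z2 = z1 * z1                        # z^-2
        numerator = b[0] + b[1] * z1 + b[2] * z2
        denominator = a[0] + a[1] * z1 + a[2] * z2
        response.append(numerator / denominator)
    return response


# Hypothetical example: one-pole low-pass y[n] = 0.5*x[n] + 0.5*y[n-1],
# written in biquad form with unused coefficients set to zero.
H = biquad_frequency_response(b=(0.5, 0.0, 0.0), a=(1.0, -0.5, 0.0), n_bins=5)
magnitudes = [abs(h) for h in H]  # unity gain at DC, 1/3 at Nyquist
```

No recursion over time samples is needed: the filter is applied by multiplying this sampled response with the spectrum of the input, which is what makes the approach fast but approximate for time-varying coefficients.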
Data-driven regression. Colonel et al. [38] inverted the problem by training a neural
network (IIRNet) to map target magnitude responses directly to cascaded biquad
coefficients. This amortized approach bypasses iterative optimization and yields
coefficients that can be used in a standard time-domain implementation.
Direct differentiable IIRs. Yu and Fazekas [39, 40] introduced differentiable all-pole
filters, successfully applying them as LPC vocoders and to effects such as phasers,
subtractive synthesizers, and compressors.
Waveguide-inspired resonators. Among the most elegant historical frameworks for
modeling resonant systems such as tubes, membranes, and strings are waveguide
formulations, with the Karplus–Strong algorithm, originally proposed by Karplus
and Strong [41] and extended by Smith [42], being a particularly influential
approximation. However, differentiable waveguide-inspired resonators remain largely
unexplored, despite their potential for efficient and interpretable modeling of
physical resonance phenomena.
In summary, a variety of promising strategies exist for making resonant systems
differentiable, ranging from recurrent formulations and frequency-domain
approximations to data-driven regression and direct IIR designs. Yet, waveguide-based
approaches – long valued in physical modeling synthesis – have not been fully
leveraged in the differentiable domain, representing a compelling direction for this work.
2.4.4 Engine Sound Datasets
The datasets used in the aforementioned neural approaches to engine sound synthesis
(Section 2.4.2) were collected by the respective authors. They are not publicly
available and are reported to be limited in size and annotation.
Existing engine sound collections focus primarily on classification tasks and
anomalous sound detection rather than synthesis, with audio typically provided with
insufficient quality and annotations describing vehicle classes or machine health
conditions rather than parameters relevant to real-time sound generation [43, 44, 45,
46, 47]. These datasets lack the audio quality, standardization, and detailed
time-aligned annotations necessary for developing advanced generative models.
Based on the promising applications of deep learning in this field, it is assumed that
one of the main bottlenecks hindering research may be the lack of available data
required for training of generative models.
2.5 Research Landscape and Gaps
The field of engine sound synthesis encompasses a wide range of approaches, yet
the state-of-the-art remains unclear. Perceptually high-quality methods are often
insufficiently documented, while more theoretical contributions provide limited
audio examples for evaluation. The most rigorous research originates from adjacent
disciplines, such as noise–vibration–harshness (NVH) and acoustics, though some
advanced techniques likely remain undisclosed due to corporate interests in gaming
and automotive industries.
A major obstacle to progress, particularly in adapting contemporary methods, is
the lack of suitable and publicly available datasets for training.
Chapter 3
Procedural Engine Sounds Dataset
The acquisition and preparation of comprehensive datasets for engine sound analysis
present several critical challenges that significantly impact research progress in this
field. Obtaining high-quality recordings is inherently difficult due to the need for
expensive and complex setups, including specialized measurement equipment,
dynamometers, and acoustically treated chambers. However, even with proper
equipment, real-world recordings are often contaminated by confounding noise sources,
such as dynamometer interference, vehicle aggregates, and other mechanical
components that are not part of the actual engine sound signature. Since self-supervised
generative neural network based approaches require vast quantities of standardized
and annotated training data, this represents a significant bottleneck for research
advancement in this domain.
To address these limitations, this work adopts a synthetically generated dataset –
an established methodology in audio research [48, 49, 50] – which enables precise
parametric control over data complexity, elimination of confounding noise, unlimited
scalability with consistent quality, and inherent access to ground truth labels. By
leveraging traditional engine sound modeling techniques and analyzing a limited set
of real recordings, fundamental acoustic principles can be extracted and implemented
in a parametric synthesizer capable of generating extensive datasets suitable for
neural audio synthesis model training and other applications beyond this thesis.
To enable the research conducted in this thesis, a dedicated dataset of engine
recordings was created and made openly available as the Procedural Engine Sounds
Dataset [51]. It comprises synchronized engine audio and time-aligned RPM and
torque measurements, serving as the foundation for the experiments presented herein.
3.1 Dataset Generation Methodology
This section documents the dataset used throughout the thesis. While the dataset
creation is not the primary research contribution of this work, a detailed account of
its generation, structure, and preprocessing is provided to ensure transparency and
to establish the trustworthiness of the experimental results.
3.1.1 Feature Extraction from Real Engine Data
The parametric foundation of the synthesis system derives from systematic analysis
of engine recordings obtained from various vehicle and engine configurations.
To identify and quantify underlying harmonic deviation and magnitude patterns, a
dedicated spectral analysis and harmonic extraction pipeline was implemented,
processing approximately 2.12 GB of raw multi-channel engine data spanning complete
RPM and torque operating ranges.
The audio content, sampled at 16 kHz, was segmented into frames of 65,536 samples
(4.096-second chunks). Frames exhibiting strong RPM variance or zero RPM
conditions (engine off) were excluded from the analysis. To ensure spectral stability,
each audio frame underwent RPM-based resampling such that the pitch envelope
remained constant at the frame's mean fundamental frequency (f0). This preprocessing
eliminates pitch variations within frames, preventing harmonics from drifting
between frequency bins and causing spectral artifacts. To minimize spectral leakage,
an adaptive FFT size calculation was implemented on a frame-by-frame basis,
ensuring that FFT bins consistently align with expected harmonic frequencies based
on each frame's fundamental frequency.
Spectral centroid analysis (center of mass calculation) was employed around each
harmonic position, with non-overlapping regions between adjacent harmonics, as
it demonstrated reduced susceptibility to peak detection errors compared to
conventional peak picking [25]. Harmonic deviations were quantified by calculating
the ratio between the ideal harmonic frequency and the detected harmonic centroid
frequency, with deviation values computed and stored for all processed frames.
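The centroid-based deviation measure can be sketched as follows. This is an illustrative reconstruction rather than the analysis pipeline itself: the function name, the magnitude weighting (as opposed to power weighting), the band half-width, and the toy spectrum are all assumptions.

```python
def harmonic_centroid(freqs, mags, f_ideal, half_width):
    """Magnitude-weighted mean frequency of the bins within
    +/- half_width of the ideal harmonic frequency."""
    weighted_sum = 0.0
    total_mag = 0.0
    for f, m in zip(freqs, mags):
        if abs(f - f_ideal) <= half_width:
            weighted_sum += f * m
            total_mag += m
    # Fall back to the ideal frequency if the band holds no energy.
    return weighted_sum / total_mag if total_mag > 0.0 else f_ideal


# Hypothetical spectrum: energy centred slightly above the ideal 100 Hz order.
freqs = [98.0, 99.0, 100.0, 101.0, 102.0, 103.0, 104.0]
mags = [0.0, 0.1, 0.5, 1.0, 0.5, 0.1, 0.0]

centroid = harmonic_centroid(freqs, mags, f_ideal=100.0, half_width=4.0)
deviation_ratio = 100.0 / centroid  # ideal frequency / detected centroid
```

Choosing the half-width as half the spacing between adjacent engine orders keeps the analysis regions non-overlapping, as described above.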
This analysis extracts comprehensive parameter sets that capture the essential
acoustic fingerprint of each engine configuration, including harmonic magnitude
distributions, inharmonic deviations, and their dependencies on operating conditions.
3.1.2 Real-Time Synthesis Architecture
The dataset generation employs a comprehensive Max/MSP-based real-time synthesis
system integrating three complementary synthesis approaches: additive,
subtractive, and resonator-based processing.
Additive Harmonic Synthesis: The core architecture uses 128 independent sine
wave oscillators with dynamically controllable amplitude and frequency parameters
driven by incoming RPM and torque data streams. The synthesizer operates through
lookup tables storing the engine-specific harmonic characteristics extracted in the
feature analysis stage. These stored parameter values are interpolated in real time,
ensuring smooth transitions across the operational parameter space while maintaining
acoustically realistic harmonic signatures derived from authentic recordings.
Stochastic Component Synthesis: While additive synthesis captures the harmonic
foundation, broadband spectral characteristics require stochastic modeling.
Filtered noise modulated by the time-domain signal of the summed harmonics,
combined with a distinct low-frequency oscillation linked to engine speed, captures the
inherent turbulence and mechanical noise of combustion events absent in pure
sinusoidal synthesis.
Resonator-Based Acoustic Modeling: The combined additive and stochastic
signals are processed through a complex resonator bank consisting of eight parallel
Karplus-Strong-inspired delay-line networks. These produce non-linear
transformations modeling the interference patterns, comb-filtering effects, and resonances
occurring as engine pressure pulses propagate through exhaust system geometry.
This resonator architecture transforms the otherwise synthetic output into
acoustically convincing engine sounds with realistic spatial and timbral complexity, while
maintaining parametric control for systematic dataset augmentation.
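For intuition, the basic building block of such a delay-line resonator can be sketched as a feedback comb filter. This is only the simplest linear form, written under assumptions: the Max/MSP patch is not specified in the text, the function name and parameters are hypothetical, and a Karplus-Strong loop would additionally place a low-pass filter in the feedback path.

```python
def feedback_comb(x, delay, feedback):
    """Feedback comb filter: y[n] = x[n] + feedback * y[n - delay]."""
    y = []
    for n, xn in enumerate(x):
        past = y[n - delay] if n >= delay else 0.0  # delayed output sample
        y.append(xn + feedback * past)
    return y


# A unit impulse produces decaying echoes every `delay` samples, which
# corresponds to resonance peaks at multiples of sample_rate / delay.
impulse = [1.0] + [0.0] * 9
out = feedback_comb(impulse, delay=3, feedback=0.5)
```

A bank of several such lines with different delays, run in parallel, yields the overlapping comb-filter patterns described above.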
3.1.3 Dataset Augmentation Strategy
While the feature extraction process yields authentic engine-specific parameters,
the limited scope of real-world recordings constrains dataset diversity. To generate
comprehensive training datasets spanning broader operational ranges and timbral
variations, a systematic augmentation methodology is employed.
Cross-Configuration Recombination: The synthesis system enables application
of control data trajectories from any engine configuration to the harmonic parameter
sets of any other configuration. This cross-recombination extends the control data
available for each subset with physically consistent operating scenarios not present
in the original recordings, substantially expanding the dataset.
Parametric Variation: Beyond recombination, targeted parameter modifications
augment timbral diversity. This includes systematic variation of stochastic noise
shaping and intensity to model different exhaust configurations or microphone
positions, adjustment of resonator characteristics to simulate acoustic propagation
differences, and controlled modification of harmonic magnitude distributions within
physically realistic bounds.
Augmentation Scale: This methodology enables generation of extensive datasets
from each engine configuration, transforming limited original recordings into
thousands of synthesized samples covering comprehensive operational ranges and diverse
timbral characteristics. The result is a dataset sufficient for robust machine learning
training while maintaining grounding in authentic acoustic characteristics extracted
from real vehicles, with physically accurate control parameter trajectories.
3.1.4 Data Format and Synchronization
To ensure realistic training data that captures authentic engine behavior, the
synthesis system operates under control of actual vehicle performance data recorded
during real driving scenarios.
We record synthesized stereo audio alongside corresponding control parameters in
a synchronized four-channel format sampled at 48 kHz, ensuring perfect temporal
alignment between acoustic content and performance parameters throughout the
dataset, as shown in Figure 1.
Figure 1: Stereo audio channels with time-aligned control parameters: RPM and
torque (Nm).
Control parameters – rotational speed as revolutions per minute (RPM) and engine
torque in Newton-meters (Nm) – are normalized to the range [-1, 1] using predefined
scaling boundaries of 10,000 RPM and 1,000 Nm respectively. At a bit depth of 16
bit, the resolution of RPM and torque is determined by the quantization step size
$\Delta = \frac{2}{2^{16}} = \frac{2}{65536} \approx 3.0518 \times 10^{-5}$. Given the
normalization scaling factors of $10^{-4}$ for RPM and $10^{-3}$ for torque, the
real-world parameter resolution becomes
$\Delta_{\text{RPM}} = \frac{\Delta}{10^{-4}} \approx 0.305$ RPM and
$\Delta_{\text{Nm}} = \frac{\Delta}{10^{-3}} \approx 0.0305$ Nm, providing sufficient
granularity for accurate reconstruction of engine operating conditions from the
encoded audio data.
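The resolution figures above can be checked with a few lines of arithmetic. The constants follow the text; the `quantize` helper and its round-to-nearest behaviour are assumptions made for illustration, since the exact encoding routine is not described.

```python
BIT_DEPTH = 16
RPM_SCALE = 1e-4   # 10,000 RPM maps to 1.0
NM_SCALE = 1e-3    # 1,000 Nm maps to 1.0

# Quantization step of a signal spanning [-1, 1] at 16 bit.
delta = 2.0 / 2 ** BIT_DEPTH

# Resolution expressed in physical units.
delta_rpm = delta / RPM_SCALE   # about 0.305 RPM
delta_nm = delta / NM_SCALE     # about 0.0305 Nm


def quantize(value_norm):
    """Round a normalized value to the nearest 16-bit step (assumed scheme)."""
    return round(value_norm / delta) * delta


# Encode 3000 RPM, quantize, and decode back to physical units.
rpm_decoded = quantize(3000.0 * RPM_SCALE) / RPM_SCALE
```

Under round-to-nearest quantization the decoded value differs from the original by at most half a step, that is, roughly 0.15 RPM.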
3.2 Dataset Validation and Analysis
3.2.1 Representativeness and Realism Assessment
This dataset, though entirely synthetically generated, preserves representative
realism through its grounding in authentic engine behavior. The synthesis process
reproduces essential acoustic characteristics of real engines – including engine order
magnitude structures, harmonic deviations, and performance parameter dependencies
– while providing the controlled, noise-free environment necessary for systematic
machine learning research. As a result, the dataset combines the analytical advantages
of synthetic generation with the acoustic authenticity derived from real-world
engine characteristics and operational dynamics.
To validate this representativeness, a thorough analysis of the harmonic structure at
each RPM and torque combination was conducted for both real-world data and the
procedurally generated dataset, comparing magnitudes and harmonic deviations of
engine orders.
Figure 2 shows the engine order magnitudes as functions of RPM (top row) and
torque (bottom row), comparing the real recordings displayed on the left with the
corresponding synthetic engine sound set on the right.
The comparison highlights similarity across overall trends, harmonic distributions
between engine orders, and performance parameter dependencies. Characteristics
specific to the engine type (in this case V8) – such as a dominant 4th engine order
– are observable in both datasets, suggesting that the synthetic dataset contains
realistic features with generalization potential.
Beyond demonstrating similarity, the plots also reveal differences in timbre. They
show how empirical modification of synthesis parameters beyond the extracted features
transformed the engine sound past the limited available data points while
maintaining cohesiveness and plausible acoustic behavior.
Figure 2: Comparison of engine order magnitudes between real recordings (left) and
synthetic engine sounds (right) as functions of RPM (top) and torque (bottom)
3.2.2 Statistical Distribution and Variability Analysis
The dataset comprises 5,935 audio files representing 19.01 hours of recorded data
(24.47 GB total). The data is organized into eight sets: Full Sets (A-D) containing
3,068 files and Large Sets (E-H) containing 2,867 files, as detailed in Table 1.
Category            Files    Duration (hrs)    Size (GB)
Full Sets (A-D)     3,068     9.83             12.65
Large Sets (E-H)    2,867     9.18             11.82
Total               5,935    19.01             24.47
Table 1: Dataset organization and structure
The stochastic nature of the data is evident in the rotational speed and torque
distributions, which exhibit significant variability characteristic of real-world
operational conditions. RPM values demonstrate a wide operational range from 0.0 to
7,007.45 RPM, with a mean of 3,170.53 RPM and standard deviation of 1,713.87
RPM. The distribution shows a relatively symmetric pattern around the median
value of 2,971.19 RPM, with the interquartile range spanning from 1,654.97 RPM
To provide physics-informed inductive biases that capture dynamic engine behaviors,
we augment the control signals with first- and second-order temporal differences,
standard automotive engineering parameters used to characterize engine behavior:
Engine speed deltas:
• First-order difference (angular acceleration): $\text{RPM}(t) - \text{RPM}(t-h)$: captures
rotational acceleration and deceleration phases
• Second-order difference (angular jerk): $\Delta\text{RPM}(t) - \Delta\text{RPM}(t-h)$:
identifies sudden rotational dynamics from gear shifts, clutch engagement, or load
transitions
Torque deltas:
• First-order difference (torque rate): $\text{Nm}(t) - \text{Nm}(t-h)$: distinguishes
between steady-state and transient loading conditions
• Second-order difference (torque jerk): $\Delta\text{Nm}(t) - \Delta\text{Nm}(t-h)$: captures
abrupt mechanical events (gear shifts, torque holes) that manifest as distinctive
acoustic signatures
We compute direct differences rather than normalized derivatives
($\frac{df}{dt} \approx \frac{f(t) - f(t-h)}{h}$), as they represent the same temporal
dynamics while preserving comparable magnitude and scale of the input data.
To obtain input features matching the model frame rate, we subsequently
downsample the control signal sequences by segmenting them into frames and averaging
each feature over the time dimension within each frame.
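The delta computation and the frame-wise averaging can be sketched in plain Python. The helper names, the zero padding of the first `h` samples, the frame size, and the toy RPM trajectory are assumptions for illustration only.

```python
def first_difference(x, h=1):
    """Direct difference x[n] - x[n - h]; the first h samples are padded
    with zeros (an assumption, the padding scheme is not specified)."""
    return [0.0] * h + [x[i] - x[i - h] for i in range(h, len(x))]


def frame_average(x, frame_size):
    """Downsample by averaging non-overlapping frames of frame_size samples."""
    n_frames = len(x) // frame_size
    return [
        sum(x[i * frame_size:(i + 1) * frame_size]) / frame_size
        for i in range(n_frames)
    ]


# Hypothetical normalized RPM trajectory at control rate.
rpm = [0.10, 0.10, 0.12, 0.16, 0.16, 0.16, 0.14, 0.14]

d1 = first_difference(rpm)   # first-order difference
d2 = first_difference(d1)    # second-order difference

# One averaged value per frame and per feature (frame size 4 here).
features = [frame_average(signal, 4) for signal in (rpm, d1, d2)]
```

Applying the same two helpers to the torque channel yields the remaining three input features.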
All input features are standardized using the training-set mean and standard
deviation to ensure comparable magnitudes across features. The same transformation,
based on the training-set statistics, is applied to the validation and test sets to
ensure unbiased evaluation and consistent inference.
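A minimal sketch of this standardization, using only the Python standard library, is shown below. The function names and the use of the population standard deviation are assumptions; the point illustrated is that the statistics are fitted on the training split only and then reused unchanged.

```python
import statistics


def fit_standardizer(train_values):
    """Compute mean and standard deviation on the training split only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values)
    if std == 0.0:
        std = 1.0  # guard against constant features
    return mean, std


def standardize(values, mean, std):
    """Apply previously fitted statistics to any split."""
    return [(v - mean) / std for v in values]


# Hypothetical feature values for one input feature.
train = [1.0, 2.0, 3.0, 4.0, 5.0]
val = [3.0, 6.0]

mean, std = fit_standardizer(train)
train_z = standardize(train, mean, std)
val_z = standardize(val, mean, std)   # reuses training statistics
```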
4.3 Conditioning Signal Design
Following parameter prediction by the model heads and upsampling to audio sample
rate, synthesis parameters undergo conditioning that enforces physically-motivated
behavior. This approach shifts the model's learning task from absolute parameter
prediction to relative parameter adjustment, where assignment to engine operating
states is handled deterministically.
4.3.1 Design Rationale
Conditioning enforces physical constraints without requiring the model to learn these
relationships from data alone. This approach addresses a fundamental optimization
challenge: certain parameter relationships are not directly detectable from spectral
characteristics and would require specialized loss functions to learn correctly. For
example, noise modulation depth optimization suffers from phase ambiguity between
output and target signals: spectral losses provide inconsistent gradients as
modulations sometimes coincide and sometimes oppose, preventing clear optimization
pathways.
4.3.2 Harmonic Frequency Scaling
Predicted harmonic ratios are converted to absolute frequencies through fundamental
frequency scaling derived from RPM control signals:
$$f_k(t) = f_0(t) \cdot r_k(t), \tag{4.1}$$
where $f_0(t) = \text{RPM}(t) \times \frac{10000}{60}$ reverses the normalization of the
embedded RPM values and converts revolutions per minute to Hz, and $r_k(t)$ represents
the predicted harmonic ratios that already incorporate 0.5 intervals (0.5, 1.0, 1.5, ...)
corresponding to engine orders of four-stroke engines.
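Equation 4.1 amounts to a de-normalization followed by a per-order multiplication, sketched below. The function names and the example ratios are illustrative; in the model the ratios are predicted per time step rather than fixed.

```python
def fundamental_hz(rpm_norm):
    """Undo the 1/10000 normalization and convert RPM to Hz."""
    return rpm_norm * 10000.0 / 60.0


def harmonic_frequencies(rpm_norm, ratios):
    """Absolute frequency of each engine order: f_k = f_0 * r_k."""
    f0 = fundamental_hz(rpm_norm)
    return [f0 * r for r in ratios]


# Ideal half-order ratios 0.5, 1.0, 1.5, 2.0 for a four-stroke engine.
ratios = [0.5 * k for k in range(1, 5)]

# A normalized value of 0.3 corresponds to 3000 RPM, i.e. f_0 = 50 Hz.
freqs = harmonic_frequencies(0.3, ratios)
```

At 3000 RPM the first four engine orders therefore fall at 25, 50, 75, and 100 Hz.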
4.3.3 Virtual Throttle and DFCO Derivation
Engine operating mode conditioning derives virtual throttle and deceleration fuel
cutoff (DFCO) factors from torque measurements to condition noise gains according
to distinct acoustic characteristics:
$$\text{throttle}_{\text{virtual}}(t) = \max(\text{torque}(t), \epsilon)^{0.7}, \tag{4.2}$$
$$\text{DFCO}_{\text{factor}}(t) = \max(-\text{torque}(t), \epsilon). \tag{4.3}$$
The virtual throttle derivation addresses the non-linear relationship between torque
measurements and acoustic output. While measured torque reflects external factors
(traction, friction, torque vectoring), throttle position directly governs intake airflow
and butterfly valve dynamics that determine acoustic characteristics. The power law
scaling (0.7) captures the non-linear acoustic response to throttle opening,
particularly for intake noise and combustion pulse intensity, while maintaining minimum
gains ($\epsilon$) during idle operation.
Negative torque conditions activate the DFCO factor, emphasizing turbulent flow
noise characteristics of deceleration fuel cutoff scenarios where combustion ceases but
airflow turbulence continues to dominate the acoustic signature. This conditioning
enables the synthesizer to automatically distinguish between propulsion-dominated
and turbulence-dominated operating modes without explicit training on these
distinct acoustic regimes.
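Equations 4.2 and 4.3 translate directly into two small functions. The value of the floor `EPS` is an assumption, as the text does not state it, and torque is taken to be the normalized value in [-1, 1].

```python
EPS = 1e-3  # assumed minimum gain; the actual value is not given in the text


def virtual_throttle(torque, eps=EPS):
    """Equation 4.2: power-law throttle estimate from positive torque."""
    return max(torque, eps) ** 0.7


def dfco_factor(torque, eps=EPS):
    """Equation 4.3: deceleration fuel cutoff factor from negative torque."""
    return max(-torque, eps)


# Positive torque: propulsion-dominated, DFCO stays at its floor.
propulsion = (virtual_throttle(0.5), dfco_factor(0.5))

# Negative torque: throttle stays at its floor, DFCO follows |torque|.
overrun = (virtual_throttle(-0.2), dfco_factor(-0.2))
```

Because both factors are clamped from below by the floor, neither gain path is ever switched off completely; the two modes are blended rather than gated.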
Chapter 5
Synthesizers and Model Heads
5.1 Differentiable Synthesis Methodology
Having established a neural network architecture capable of learning mappings
between engine performance parameters and synthesis parameters, we now turn to the
core challenge: defining differentiable synthesis modules that can generate
acoustically realistic engine sounds while remaining amenable to gradient-based
optimization.
5.1.1 Bridging Traditional Methods and Neural Synthesis
Given our requirement for fully differentiable and continuous synthesis functions,
sample-based approaches from the traditional methods reviewed in Section 2.3 can
be excluded due to their inherent lack of differentiability and limited generalization
capabilities.
Examining the remaining procedural methods reveals a fundamental philosophical
divide in how engine sounds are approached. Most traditional methods focus
primarily on recreating the acoustic characteristics observed in recorded engine sounds.
These methods adjust synthesis parameters to achieve perceptual similarity to target
recordings, using general-purpose audio synthesis techniques to match spectral
content, temporal envelopes, and timbral qualities.
In contrast, a smaller subset of methods attempts to model the underlying physical
processes and their acoustic outputs that generate engine sounds in the first place.
These approaches simulate combustion events, pressure wave propagation, exhaust
resonances, and mechanical vibrations, with the expectation that accurate
physics-informed modeling will naturally produce realistic acoustic output.
This fundamental distinction between recreating acoustic characteristics versus
modeling underlying processes becomes particularly relevant when translated into
neural synthesis architectures, where both perspectives can be explored systematically
within a unified framework.
5.1.2 The Impulsive Origin of Engine Harmonicity
Unlike resonant instruments where harmonic oscillations exist inherently in the
physical vibration, engine sounds derive their harmonic structure from periodic repetition
of broadband impulse events. Each combustion event generates acoustically complex,
noise-like pressure waves; the perceived harmonicity emerges from temporal
regularity at engine-order frequencies.
This creates a fundamental duality: engine sounds are at once a sequence of
inherently noise-like impulsive events at their source and a distinctly harmonic
acoustic structure. This duality suggests two complementary modeling paradigms
that directly parallel the philosophical divide in traditional methods: we can either
model the perceptual acoustic qualities directly, or we can synthesize the individual
pressure pulses and allow the complete acoustic characteristics to emerge from
their shapes, temporal organization, and propagation. For these impulsive–periodic
sources:
1. Forward modeling: Accurate modeling of individual pressure pulses and
their temporal sequences should yield correct spectral and perceptual
characteristics
2. Inverse modeling: Precise acoustic spectrum reconstruction should implicitly
capture underlying pulse structure
5.1.3 Dual Synthesis Strategy
This duality motivates implementing both modeling perspectives as differentiable
synthesis modules:
1. Harmonic-plus-noise synthesis: Models spectral characteristics through
adaptive harmonic generation with physically-coupled, temporally structured
noise components
2. Pulse-train-resonator synthesis: Models temporal causality through
parameterized pulse sequences and exhaust system propagation
Both approaches incorporate physical constraints while allowing neural networks
to discover optimal parameter mappings. This dual approach provides contrasting
optimization pathways toward the same acoustic target, enabling systematic
comparison of spectral-structure versus impulse-source modeling paradigms.
In the following sections, we detail the implementation of each synthesis module,
beginning with the harmonic-plus-noise approach due to its established theoretical
foundation and intuitive parameter space, which also provides crucial building blocks
for the pulse-train-resonator synthesis method.
5.2 Harmonic-Plus-Noise Engine Sound Synthesis
The first configuration of the PRCE model – HPN – employs a harmonics-plus-noise
based synthesis approach, directly modeling spectral and temporal dynamics while
adapting the general method to the specific acoustic characteristics of engine sounds.
An overview of the information flow through the components detailed in the
following sections is provided in Figure 4:
[Figure 4, block diagram: Engine Parameters (RPM, Torque, Derivatives) → Neural
Network (Latent Embeddings x_t) → Harmonic Head and Noise Head → Complex
Oscillators H_k(t) and Noise Components N_*(t) → Synthesized Audio y(t)]
Figure 4: System architecture showing information flow from engine control
parameters through model heads to synthesis components. Components in bold are unique
to the HPN variant.
5.2.1 Physically-Informed Synthesis Components
Our harmonic-plus-noise synthesis architecture implements multiple physically-informed
inductive biases that constrain the synthesis toward acoustically plausible engine
sounds while maintaining full differentiability for gradient-based optimization.
Systematic Inharmonicity via Gaussian Harmonic Bending
One of the key limitations of standard harmonic synthesis for engine sound modeling
lies in its assumption of perfect harmonic relationships. Real engine sounds exhibit
systematic deviations from pure harmonic content due to complex acoustic and
mechanical interactions. However, unlike musical instruments where inharmonicity has
been extensively studied and modeled [52, 53, 54], there exists no scientifically
rigorous physical model of inharmonicity in engine sounds comparable to established
theories of stiff strings in pianos or other musical instruments.
Given the absence of established theoretical frameworks, we adopted an empirical
approach based on systematic analysis of real engine recordings. During dataset
creation and analysis (Section 3.2), we conducted detailed spectral analysis to extract
actual harmonic deviations from recorded engine sounds. This analysis revealed that
observed deviations can reach ±4-7%, which is quite substantial given that human
perception thresholds for frequency shifts fall below 1% [55].
Despite the absence of obvious universal trends (unlike the increasing inharmonicity
related to stiffness of piano strings), several interesting observations emerged. Engine
orders tend to exhibit group-like behavior in response to specific RPM and torque
situations, while others show more constant deviations across wide RPM ranges.
These findings are detailed in Appendix A.
Rather than learning arbitrary per-harmonic offsets that risk mere dataset
memorization, we propose a time-varying Gaussian modulation of harmonic frequencies.
For engine orders spaced at intervals of 0.5 (corresponding to four-stroke engine
harmonics), we define
$$f_k^{\text{bent}}(t) = f_k^{\text{ideal}}(t) \cdot \left[ 1 + \sum_{g=1}^{G} \Delta_g(t) \exp\left( -\frac{(k - \mu_g(t))^2}{2\sigma_g(t)^2} \right) \right], \tag{5.1}$$
where $f_k^{\text{ideal}}(t) = (k \times 0.5) \times f_0(t)$ represents the ideal engine order
frequencies, $k \in \{1, 2, \ldots, K\}$ indexes the harmonic array, and $G$ learnable
Gaussians with parameters $\{\Delta_g(t), \mu_g(t), \sigma_g(t)\}$ enable systematic control
over inharmonic patterns. The frequency deviation factors $\Delta_g(t)$ are constrained to
$[-0.08, 0.08]$ to provide sufficient range for observed dataset variations with
exploration headroom for future datasets.
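For a single time step, Equation 5.1 can be sketched as below. The function name, the tuple layout of the Gaussian parameters, and the hard clamp used to enforce the ±0.08 range are assumptions; the text only states the range, not how the constraint is realized.

```python
import math


def bent_frequencies(f0, n_orders, gaussians, max_dev=0.08):
    """Equation 5.1 for one time step.

    gaussians: list of (delta, mu, sigma) tuples, one per Gaussian.
    delta is clamped to [-max_dev, max_dev] (assumed constraint mechanism).
    """
    freqs = []
    for k in range(1, n_orders + 1):
        f_ideal = (k * 0.5) * f0  # ideal half-order frequency
        bend = 0.0
        for delta, mu, sigma in gaussians:
            delta = max(-max_dev, min(max_dev, delta))
            bend += delta * math.exp(-((k - mu) ** 2) / (2.0 * sigma ** 2))
        freqs.append(f_ideal * (1.0 + bend))
    return freqs


# One Gaussian centred on harmonic index k = 4 raises it by 4 percent and
# its neighbours by progressively less.
freqs = bent_frequencies(f0=50.0, n_orders=8, gaussians=[(0.04, 4.0, 1.0)])
```

With only three parameters per Gaussian, neighbouring engine orders are bent together, which matches the group-like deviations reported above and avoids one free offset per harmonic.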
Complex Harmonic Oscillators
Because frequency and phase are now time-varying optimization parameters rather
than fixed values, sinusoidal parameter estimation faces a well-known challenge: the
gradient landscape of spectral loss functions with respect to frequency is highly
non-convex, hindering convergence [30]. To address this, we adopt a complex exponential
representation of the harmonic oscillators, as proposed by [31], which improves
gradient flow during frequency optimization:
$$H_k(t) = A_k(t) \cdot \operatorname{real}\left( e^{j\phi_k(t)} \right) = A_k(t) \cdot \cos(\phi_k(t)), \tag{5.2}$$
where the phase $\phi_k(t)$ is computed through cumulative integration of the predicted
angular frequency:
$$\phi_k(t) = \int_0^t \omega_k(\tau)\,d\tau + \phi_{k,0} = \int_0^t 2\pi f_k^{\text{bent}}(\tau)\,d\tau + \phi_{k,0}. \tag{5.3}$$
In discrete time, this integration is implemented as cumulative summation over
time steps. Random initial phases $\phi_{k,0} \sim U(0, 2\pi)$ are used to avoid constructive
interference artifacts.
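The discrete-time form of Equations 5.2 and 5.3 can be sketched for one harmonic as follows. The function name is hypothetical, and updating the phase before emitting each sample is one of several equally valid conventions. A fixed initial phase is used here for reproducibility where the text draws it at random.

```python
import cmath
import math


def harmonic_oscillator(freqs_hz, amps, sample_rate, phase0=0.0):
    """Phase-accumulating oscillator for a time-varying frequency track.

    The integral in Equation 5.3 becomes a running sum of per-sample
    phase increments 2*pi*f/sample_rate.
    """
    phase = phase0
    samples = []
    for f, a in zip(freqs_hz, amps):
        phase += 2.0 * math.pi * f / sample_rate   # cumulative summation
        samples.append(a * cmath.exp(1j * phase).real)  # Equation 5.2
    return samples


# Constant 1 kHz tone at 8 kHz: one full cycle every 8 samples.
sig = harmonic_oscillator([1000.0] * 8, [1.0] * 8, sample_rate=8000)
```

Because the phase is accumulated rather than computed as 2πft, the waveform stays continuous when the frequency track changes from sample to sample.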
Pressure Wave-Coupled Noise Synthesis
The core physical inductive bias of our approach lies in the multiplicative coupling
between harmonic and noise components, reflecting the acoustic reality that engine
noise originates from the same combustion-generated pressure waves that produce
the harmonic spectrum.
We implement three distinct noise models, each with specific physical motivation:
Noise Bursts model sharp pressure pulses from individual combustion events:
$$N_{\text{bursts}}(t) = g_{\text{pulse}}(t) \cdot \left( \sum_{b=1}^{B} \alpha_b(t)\,\eta_b(t) \right) \cdot \sum_{m=1}^{M} w_m(t)\,\left| \tilde{H}_m(t) \right|^{p_m(t)}, \tag{5.4}$$
where $\tilde{H}_m(t)$ is converted from sinusoidal to triangular waveforms using an arcsin
transformation to better match the impulsive nature of real combustion events (as
observed in time-domain analysis of recorded engine data):
$$\tilde{H}_m(t) = \frac{2}{\pi} \arcsin(0.99 \cdot H_m(t)). \tag{5.5}$$
The safety factor of 0.99 prevents numerical instability at the arcsin domain
boundaries, where $\frac{d}{dx}\arcsin(x) = \frac{1}{\sqrt{1 - x^2}} \to \infty$ as $|x| \to 1$.
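The arcsin shaping of Equation 5.5 and the modulator sum of Equation 5.4 can be sketched for a single time step as follows. The function names are hypothetical, and the noise-band sum and pulse gain from Equation 5.4 are left out to keep the example focused on the waveform shaping.

```python
import math


def to_triangular(h, safety=0.99):
    """Equation 5.5: map a sinusoid sample in [-1, 1] toward a triangle wave.

    The safety factor keeps the argument away from +/-1, where the
    derivative of arcsin is unbounded.
    """
    return (2.0 / math.pi) * math.asin(safety * h)


def burst_modulator(harmonics, weights, exponents):
    """Weighted sum of |H~_m|^p_m, the last factor of Equation 5.4."""
    return sum(
        w * abs(to_triangular(h)) ** p
        for h, w, p in zip(harmonics, weights, exponents)
    )


# Larger exponents narrow the pulse around the oscillator peaks.
peak = to_triangular(1.0)   # slightly below 1 because of the safety factor
mod = burst_modulator([1.0, 0.5], weights=[0.5, 0.5], exponents=[2.0, 4.0])
```

The shaped output peaks at about 0.91 rather than 1.0, which is the price of keeping gradients finite at the waveform extremes.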
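Turbulence Distortion captures aerodynamic turbulence effects coupled to the
overall pressure wave:
$$N_{\text{turbulence}}(t) = \gamma_{\text{turb}} \cdot \left[ g_{\text{pulse}}(t) + g_{\text{flow}}(t) \right] \cdot H_{\text{total}}(t) \cdot \left( \sum_{b=1}^{B} \alpha_b(t)\,\eta_b(t) \right), \tag{5.6}$$
where $H_{\text{total}}(t) = \sum_{k=1}^{K} H_k(t)$ denotes the sum of all harmonic components.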
Flow Noise represents deceleration fuel cutoff and airflow phenomena without
direct harmonic coupling:
$$N_{\text{flow}}(t) = \gamma_{\text{flow}} \cdot g_{\text{flow}}(t) \cdot \left( \sum_{b=1}^{B} \alpha_b(t)\,\eta_b(t) \right). \tag{5.7}$$
Here, $B$ represents the number of ERB-filtered noise bands and $M$ denotes the
number of low-order harmonic oscillations used for modulator pulse signal generation.
The variables $\eta_b(t)$ are pre-computed ERB-filtered noise bands with time-varying
amplitudes $\alpha_b(t)$, $w_m(t)$ are weights balancing contributions of harmonics, and $p_m(t)$
are shaping exponents transforming the harmonic modulator oscillations $\tilde{H}_m(t)$ to
pulses with varying sharpness. The empirically-determined mixing coefficients are
set to $\gamma_{\text{turb}} = 0.7$ and $\gamma_{\text{flow}} = 0.3$ based on perceptual evaluation of
plausible magnitude ratios.
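As described in Section 4.3.3, the different noise components are activated according
to specific throttle and engine operation states:
$$g_{\text{pulse}}(t) = \left[ 0.75 \cdot \sigma(\text{pulse\_gain}(t)) + 0.25 \right] \cdot \text{throttle}_{\text{virtual}}(t), \tag{5.8}$$
$$g_{\text{flow}}(t) = \left[ 0.75 \cdot \sigma(\text{flow\_gain}(t)) + 0.25 \right] \cdot \text{DFCO}_{\text{factor}}(t). \tag{5.9}$$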
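Equations 5.8 and 5.9 share one gating pattern, sketched below together with the flow noise of Equation 5.7 for a single time step. The function names and the toy band values are assumptions; the constants 0.75, 0.25, and 0.3 are taken from the text.

```python
import math

GAMMA_FLOW = 0.3  # mixing coefficient from the text


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_gain(raw_gain, condition):
    """Equations 5.8 and 5.9: the learned gain is squashed into
    [0.25, 1.0] and then scaled by the deterministic condition."""
    return (0.75 * sigmoid(raw_gain) + 0.25) * condition


def flow_noise(raw_gain, dfco, band_amps, band_noise):
    """Equation 5.7 for one time step."""
    noise_sum = sum(a * n for a, n in zip(band_amps, band_noise))
    return GAMMA_FLOW * gated_gain(raw_gain, dfco) * noise_sum


# A neutral network output (0.0) gives a gain of 0.625 times the condition.
g_flow = gated_gain(0.0, condition=0.8)
n_flow = flow_noise(0.0, dfco=0.8, band_amps=[1.0, 0.5], band_noise=[0.2, -0.4])
```

The 0.25 offset means the network can attenuate a noise component by at most a factor of four; whether the component is active at all is decided by the throttle or DFCO condition.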
5.2.2 Neural Parameter Control
Let $x_t$ denote the latent embedding vectors from the final hidden layer of our neural
network, which encode time-varying engine control parameters (RPM, torque) and
their temporal derivatives. These embeddings are processed by specialized neural
where $\nu < 1$ is a learnable parameter describing the depth and curvature of the
bending, constrained by a scaled sigmoid function (Equation 5.37). Because the
exponent is smaller than 1, the phase advances faster at the beginning of the pulse
and slows down toward the end.
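When mapped to angular phase and applied to the derivative-of-cosine sum, this
results in:
$$\frac{d}{dt} \left[ \sum_{k=1}^{K} a_k \cos\left( k\,2\pi\,\phi_{\text{mod}}(t) \right) \right] = -\sum_{k=1}^{K} a_k\,k\,2\pi\,\phi_{\text{mod}}(t)'\,\sin\left( k\,2\pi\,\phi_{\text{mod}}(t) \right). \tag{5.24}$$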
Intuitively, this "phase bending" compresses the pulse in time at its onset, producing
a higher instantaneous frequency that corresponds to the rapid release of hot
combustion gases. The pulse then gradually stretches toward its end, lowering the pitch
as the gas cools and the speed of sound normalizes along the exhaust manifold. This
approach allows the synthesized pulse train to more accurately reproduce the natural
pitch envelopes, enriching timbre and producing the characteristic "grunting" of
low-speed operation. The effect of this phase bending is shown in the bottom-left
panel of Figure 5.
Exponential Amplitude Envelope. Real exhaust valve events produce finite-time
pressure transitions, characterized by a rapid pressure release from the
combustion chamber into the exhaust pipe, resulting in a sharp rise followed by a slower
decay. To capture this behavior, we apply an exponential rise-and-fall envelope to
the derivative pulse trains,
$$E(t) = \left( 1 - \exp(-\alpha t) \right) \exp(-\beta t), \tag{5.25}$$
where $\alpha$ and $\beta$ are learnable coefficients controlling the rise and decay slopes of
each pulse. This formulation allows the waveform to reproduce the asymmetrical,
gradient-driven pressure variations observed in real exhaust manifolds, while
preserving the bipolar characteristic of the derivative pulse. The effect of this
amplitude envelope is depicted in the top-right plot of Figure 5.
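The envelope of Equation 5.25 is easy to explore numerically. In the sketch below, the coefficient values are arbitrary examples and `t` is assumed to be the time elapsed since the start of the pulse. The peak position follows from setting the derivative of E(t) to zero, which gives t* = ln((α + β)/β)/α.

```python
import math


def envelope(t, alpha, beta):
    """Equation 5.25: exponential rise (alpha) multiplied by decay (beta)."""
    return (1.0 - math.exp(-alpha * t)) * math.exp(-beta * t)


# Example coefficients (arbitrary): fast rise, slower decay.
alpha, beta = 20.0, 5.0

# Closed-form location of the maximum, from dE/dt = 0.
t_peak = math.log((alpha + beta) / beta) / alpha

values = [envelope(n / 100.0, alpha, beta) for n in range(100)]
```

A larger α moves the peak toward the pulse onset, while β controls how quickly the tail dies away, so the two coefficients shape the asymmetry of each pulse independently.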
Combined Bipolar Pulse Train. The final physically-informed pulse train,
incorporating both the thermodynamically-inspired frequency modulation and the
exponential amplitude envelope, can be written as:
$$P(t) = E(t) \cdot \sum_{k=1}^{K} a_k \left[ -\sin\left( k\,2\pi\,\phi_{\text{mod}}(t) \right) \right], \tag{5.26}$$
where $E(t)$ is the learnable exponential rise-and-fall envelope, $\phi_{\text{mod}}(t) = \phi_{\text{norm}}(t)^{\nu}$
is the exponentially bent normalized phase, and $a_k$ are the normalized harmonic
amplitude distributions derived as in the HPN model (Equation 5.10), incorporating
a learned natural harmonic decay factor $\lambda$. This factor has a significant influence
on the resulting pulse shape, as illustrated in the top-left plot of Figure 5.
Figure 5: Pulse shapes for different parameter settings. The four subplots show:
(top-left) the base derivative-of-cosine pulse with varying numbers of harmonic decay
factors $\lambda_i$; (top-right) pulses shaped by exponential envelopes $(\alpha_i, \beta_i)$;
(bottom-left) phase-modulated pulses with factors $f_{m,i}$; and (bottom-right) $P(t)$, the
combined effect of envelope and modulation. Multiple parameter settings are shown per
subplot, with curves distinguished by gradients.
Notice that in Equation 5.26 we omitted the scaling by angular frequency
$k \cdot \omega$ that appears in the derivative (Equation 5.24), thereby avoiding the
implicit emphasis on higher harmonics in the summation. Since each individual
harmonic component contributes equally to shaping the resulting pulse wave, the
amplitudes are learned as a normalized distribution across all harmonics ($a_k$).
With the $k \cdot \omega$ scaling retained, gradient magnitudes during backpropagation would be
unequal, with higher harmonics receiving disproportionately large gradient
updates and dominating the learning process.
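Equation 5.26 can be sketched for a single pulse period as follows. This is an illustrative sketch, not the thesis code: it assumes that the envelope is evaluated on the normalized phase, that the harmonic weights follow a plain normalized exponential decay, and all names and parameter values are hypothetical. As in Equation 5.26, the harmonic terms carry no factor $k$.

```python
import math


def normalized_harmonic_weights(num_harmonics: int, decay: float) -> list:
    """Weights a_k proportional to exp(-decay * k), normalized to sum to 1."""
    raw = [math.exp(-decay * k) for k in range(1, num_harmonics + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def bipolar_pulse(phi_norm, weights, nu, alpha, beta):
    """P = E * sum_k a_k * (-sin(k * 2*pi * phi_mod)), with phi_mod = phi_norm ** nu."""
    phi_mod = phi_norm ** nu
    # Assumption: the envelope argument is the normalized phase, not absolute time.
    env = (1.0 - math.exp(-alpha * phi_norm)) * math.exp(-beta * phi_norm)
    harmonic_sum = sum(
        a_k * -math.sin(k * 2.0 * math.pi * phi_mod)
        for k, a_k in enumerate(weights, start=1)
    )
    return env * harmonic_sum


weights = normalized_harmonic_weights(num_harmonics=16, decay=0.3)
period = [
    bipolar_pulse(i / 512, weights, nu=0.9, alpha=40.0, beta=4.0)
    for i in range(512)
]
```

The resulting waveform starts at zero, swings negative and then positive within one period, which is the bipolar shape that the envelope is meant to preserve.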
Noise Components. Turbulence ($N_{\mathrm{turbulence}}(t)$) and flow noise ($N_{\mathrm{flow}}(t)$) are
integrated into the pulse train model using the same methodology as the HPN
variant, detailed in Section 5.2.1. However, the noise bursts ($N_{\mathrm{bursts}}(t)$) are no
longer coupled to amplitude-modulating oscillators. Instead, they are shaped using
the same exponential rise-and-decay mechanics employed for pulse generation in
Section 5.3.1, with independently predicted shaping coefficients.
Complete Pulse Train Signal. With the noise components in place, we arrive
at our final pulse train signal:
\[
y(t) = P(t) + N_{\mathrm{bursts}}(t) + N_{\mathrm{turbulence}}(t) + N_{\mathrm{flow}}(t). \tag{5.27}
\]
5.3.2 Firing Order Sequence
So far, we have been constructing only the periodic pressure pulse of a single cylinder.
In order to create the rhythm and harmonic spectrum of a complete multi-cylinder
engine, we need to sequence each of the cylinder pulse trains according to their
physical arrangement and the engine's firing order.
Sequence Implementation
To achieve this, we map the full four-stroke power cycle of 720° crankshaft rotation
onto a $2\pi$ radian interval. This allows for direct compatibility with our $2\pi$-periodic
continuous functions, such as the harmonic basis and phase-bent oscillators.
Each cylinder's contribution is then defined by introducing a fixed phase offset
according to the firing sequence and physical cylinder arrangement of the V8, as
represented below.
Left bank    Right bank
    1            5
    2            6
    3            7
    4            8
In our configuration, the firing order is specified as: [1, 5, 4, 8, 6, 3, 7, 2].
This distributes the ignition events evenly across the cycle, but also leads to an
alternating rhythm between the two cylinder banks. This alternation is a key
perceptual cue of V-type engines, producing the characteristic uneven, "burbling"
exhaust rhythm that distinguishes them from inline configurations. The individual
piston phases are now computed by offsetting the global power-cycle phase by the
corresponding firing angle and wrapping the result modulo $2\pi$.
This way, inharmonicity, roughness, and modulation are generated through phase
interference between cylinder outputs, rather than requiring explicit acoustic modeling
of these effects.
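The offset-and-wrap step can be sketched as follows. This is a minimal sketch under stated assumptions: the eight firing events are spaced evenly over the $2\pi$ power cycle (90° of crank rotation each), the offset is subtracted from the global phase (the sign convention is an assumption), and the function names are hypothetical.

```python
import math

FIRING_ORDER = [1, 5, 4, 8, 6, 3, 7, 2]  # as specified in the text
TWO_PI = 2.0 * math.pi                   # one full 720-degree power cycle


def firing_offsets(order):
    """Map each cylinder to its phase offset, assuming evenly spaced firing events."""
    step = TWO_PI / len(order)
    return {cylinder: slot * step for slot, cylinder in enumerate(order)}


def cylinder_phase(cycle_phase, offset):
    """Offset the global power-cycle phase and wrap the result into [0, 2*pi)."""
    return (cycle_phase - offset) % TWO_PI


offsets = firing_offsets(FIRING_ORDER)
# The same offsets expressed in degrees of crank rotation (0, 90, ..., 630).
crank_degrees = {cyl: round(off / TWO_PI * 720.0) for cyl, off in offsets.items()}
```

Each cylinder then evaluates the same pulse function on its own wrapped phase, so the rhythm of the engine follows from the offsets alone.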
Cylinder Bank Summation
After generating the individual cylinder pressure pulses, we sum the contributions
within each cylinder bank to reflect the physical grouping of the V8 engine:
\[
\text{Bank A (Left)} = \sum_{i=1}^{4} y_i(t), \qquad
\text{Bank B (Right)} = \sum_{i=5}^{8} y_i(t), \tag{5.28}
\]
where $y_i(t)$ represents the pressure waveform of cylinder $i$. This separation allows
us to propagate the pressure waves further through independent signal paths.
5.3.3 Neural Parameter Control
As in the HPN version, the input parameters of this synthesis architecture are
derived from logits $x_i$ and transformed to meaningful time-varying synthesis
parameters using dedicated model heads featuring activation functions and processing.
The Pulse Wave Head extends the harmonic synthesis strategy by incorporating
per-cylinder parameterization, phase offsets, and valve-shaping constraints, making
it suitable for excitation modeling of individual combustion events.
Cylinder Amplitudes. Each cylinder is assigned a time-varying amplitude
\[
C_i(t) = \sigma\big(\mathrm{PulseWaveHead}_{\mathrm{cyl\_amp}}(x_t)\big)_i, \tag{5.29}
\]
which ensures positive contributions while allowing the network to control relative
cylinder balance and dynamics across operating conditions.
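Phase Offsets and Jitter. Per-cylinder ignition phase offsets are predicted through
\[
\phi_i^{\mathrm{offset}}(t) = \tanh\big(\mathrm{PulseWaveHead}_{\mathrm{cyl\_phase}}(x_t)\big)_i \cdot \Delta\phi, \tag{5.30}
\]
where $\Delta\phi = \frac{40}{720} \cdot 2\pi$ bounds deviations to ±40° crank angle. To capture stochastic
firing variability, a jitter term $\phi_i^{\mathrm{jitter}}(t) \sim \mathcal{N}(0, \sigma_\phi^2)$ is added with
$\sigma_\phi = \frac{2}{720} \cdot 2\pi$, corresponding to a ±1% crank fluctuation. The effective cylinder
phase is then:
\[
\phi_i(t) = \phi_i^{\mathrm{offset}}(t) + \phi_i^{\mathrm{jitter}}(t). \tag{5.31}
\]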
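Harmonic Distribution with Cylinder Decay. Raw harmonic amplitudes are
predicted as
\[
A_k^{\mathrm{raw}}(t) = \mathrm{clamp}\big(\mathrm{PulseWaveHead}_{\mathrm{harm\_amps}}(x_t)_k,\, -10,\, 10\big), \tag{5.32}
\]
normalized with a softmax:
\[
A_k^{\mathrm{norm}}(t) = \frac{\exp\big(A_k^{\mathrm{raw}}(t)\big)}{\sum_j \exp\big(A_j^{\mathrm{raw}}(t)\big)}. \tag{5.33}
\]
Per-cylinder exponential harmonic decays enforce natural damping of higher orders:
\[
\lambda_i(t) = \sigma\big(\mathrm{PulseWaveHead}_{\mathrm{harm\_decay}}(x_t)_i\big)^{0.75}, \tag{5.34}
\]
\[
A_{i,k}(t) = A_k^{\mathrm{norm}}(t) \cdot \exp\big(-\lambda_i(t) \cdot k\big), \tag{5.35}
\]
with re-normalization across $k$. An overall gain term
\[
G(t) = 2 \cdot \big[\sigma\big(\mathrm{PulseWaveHead}_{\mathrm{gain}}(x_t)\big)\big]^{\ln(10)} + \epsilon \tag{5.36}
\]
scales energy globally, ensuring stable synthesis.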
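The chain from raw logits to per-cylinder harmonic amplitudes (Equations 5.32 to 5.35) can be sketched for one frame and one cylinder. This is an illustrative sketch, not the thesis code: plain Python lists stand in for the network head outputs, the function names are hypothetical, and the max-shift inside the softmax is a standard numerical-stability step that is not stated in the text.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def harmonic_distribution(raw_logits, decay_logit):
    """Clamp, softmax, apply per-cylinder exponential decay, then re-normalize."""
    clamped = [max(-10.0, min(10.0, a)) for a in raw_logits]       # Eq. 5.32
    peak = max(clamped)
    exps = [math.exp(a - peak) for a in clamped]                   # shifted for stability
    exp_sum = sum(exps)
    norm = [e / exp_sum for e in exps]                             # Eq. 5.33
    lam = sigmoid(decay_logit) ** 0.75                             # Eq. 5.34
    decayed = [a * math.exp(-lam * k)
               for k, a in enumerate(norm, start=1)]               # Eq. 5.35
    total = sum(decayed)
    return [a / total for a in decayed]                            # re-normalization over k


# With equal logits, the decay alone determines the spectral tilt.
amps = harmonic_distribution([0.0] * 8, decay_logit=0.0)
```

Because the result is re-normalized, the decay changes only the relative balance between harmonics; overall level is left to the gain term of Equation 5.36.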
Phase Modulation. A modulation depth parameter controls the curvature of the "phase
bending" as an exponent slightly below unity:
\[
\nu(t) = 1 - 0.2 \cdot \sigma\big(\mathrm{PulseWaveHead}_{\mathrm{fm\_depth}}(x_t)\big), \tag{5.37}
\]
resulting in variation within $[0.8, 1.0]$, centered at 0.9.
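Pulse Shaping via Valve Geometry. Each cylinder receives shaping coefficients
$\alpha_i, \beta_i$ from
\[
[\alpha_i, \beta_i] = \sigma\big(\mathrm{PulseWaveHead}_{\mathrm{shape}}(x_t)\big)_i^{0.5}, \tag{5.38}
\]
where exponents bias the network toward longer decays (smoother pulses). After
rescaling, $\alpha_i$ corresponds to rounded opening angles (10–15° crank), and $\beta_i$ to
broader closing envelopes (50–100° crank):
\[
\alpha_i = \mathrm{deg2rad}(5 \cdot \alpha_i + 10), \tag{5.39}
\]
\[
\beta_i = \mathrm{deg2rad}(50 \cdot \beta_i + 50). \tag{5.40}
\]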
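The bounded ranges stated for Equations 5.37 to 5.40 follow directly from the sigmoid. The sketch below makes this explicit for scalar logits; it is illustrative only, the function names are hypothetical, and `math.radians` stands in for deg2rad.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def modulation_exponent(logit: float) -> float:
    """Eq. 5.37: nu lies in (0.8, 1.0) and equals 0.9 for a zero logit."""
    return 1.0 - 0.2 * sigmoid(logit)


def shaping_coefficients(alpha_logit: float, beta_logit: float):
    """Eqs. 5.38 to 5.40: square-rooted sigmoids rescaled to crank-angle ranges."""
    a = sigmoid(alpha_logit) ** 0.5
    b = sigmoid(beta_logit) ** 0.5
    alpha = math.radians(5.0 * a + 10.0)   # 10 to 15 degrees crank, in radians
    beta = math.radians(50.0 * b + 50.0)   # 50 to 100 degrees crank, in radians
    return alpha, beta


nu_mid = modulation_exponent(0.0)
alpha_mid, beta_mid = shaping_coefficients(0.0, 0.0)
```

The square root pushes a zero logit to about 0.71 rather than 0.5, so the untrained network starts in the upper part of each range.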
Final Output. The head outputs four parameter groups per frame:
\[
\text{Harmonic Amplitudes: } A_{i,k}(t), \tag{5.41}
\]
\[
\text{Frequency Modulation: } \nu(t), \tag{5.42}
\]
\[
\text{Pulse Shapes: } [\alpha_i, \beta_i], \tag{5.43}
\]
\[
\text{Phase Offsets: } \theta_i(t), \tag{5.44}
\]
which together define the time-aligned, per-cylinder harmonic pulses of the
Pulse-Train Synthesizer.
Up to this point, all synthesis parameters have been predicted by the network as
time-varying outputs based on the embedding and decoding of control input
sequences. In the following section, we present the PTR's final processing stage and
explain why its parameters are optimized directly, rather than being driven by
time-varying control embeddings.
5.3.4 Differentiable Karplus-Strong Resonator
Having generated pulse train signals for each cylinder bank through the pulse wave
composition process, we now address the critical acoustic transformation that occurs
between combustion events and the final audible engine sound. As established in
the background review, the combustion noise itself is not what we perceive as
engine sound; rather, it is the exhaust pulses after propagation through the complex
exhaust system geometry.
This motivates providing the network with differentiable processors capable of
modeling the intricate acoustic phenomena occurring within exhaust systems: wave
reflections, comb filtering effects, and complex interactions between oncoming pulses
and pressure wave propagation. The exhaust system comprises a network of tubes
with varying lengths, diameters, intersections, and conjunctions, typically
engineered for optimal pressure flow and constructive pressure coupling to achieve
streamlined exhaust flow.
To model this system, we implement two independent resonators processing the
cylinder bank outputs separately, representing the distinct exhaust manifold paths
with their characteristic lengths and geometries. These processed signals are
subsequently combined within a final shared resonator, modeling the confluence into the
common exhaust pipe system where further acoustic shaping occurs through the
remaining exhaust components.
Karplus-Strong Algorithm: A Recursive Feedback-Delay Line
The Karplus-Strong algorithm provides an elegant framework for modeling resonant
systems, such as strings, tubes and membranes, through a feedback delay line [59].
As shown in Figure 6, the basic structure consists of three key components: a delay
line that stores previous output samples, a filter that processes the delayed signal
before feedback, and a feedback mechanism that adds the scaled filtered delayed
signal to the current input.
Figure 6: Schematic of the Karplus-Strong algorithm showing signal flow from input
$x[n]$ to output $y[n]$ via delay line $Z^{-L}$ and second-order filter
$\alpha \cdot Z^{-L} + \beta \cdot Z^{-L-1}$ with feedback gain $g$.
This can be expressed as the difference equation:
\[
y[n] = x[n] + g \cdot h[n-L], \tag{5.45}
\]
where $L$ is the delay length (determining fundamental pitch), $g$ is the feedback gain,
and $h[n-L]$ represents the filtered delayed signal:
\[
h[n-L] = \alpha \cdot y[n-L] + \beta \cdot y[n-L-1]. \tag{5.46}
\]
This two-coefficient formulation ($\alpha$ and $\beta$) provides greater flexibility than the
traditional Karplus-Strong implementation (where $\alpha = \beta = 0.5$), allowing independent
control over the filter characteristics within the feedback path. This enables more
sophisticated timbre evolution across the delay, as different coefficient combinations
can emphasize or attenuate specific spectral components. Extended versions of the
Karplus-Strong algorithm incorporate additional filter stages and nonlinear transfer
functions to more accurately model the physical behavior of stiff plucked strings [60].
For the purpose of engine sound synthesis, however, we adopt the original, more
general form of the Karplus-Strong algorithm, in which a second-order filter with free
(i.e., learnable or tunable) coefficients is applied in the feedback path, which is scaled
by the feedback gain $g$:
\[
y[n] = x[n] + g \cdot \big(\alpha \cdot y[n-L] + \beta \cdot y[n-L-1]\big) \tag{5.47}
\]
or equivalently,
\[
y[n] = x[n] + g \cdot \sum_{i=0}^{1} a_i \cdot y[n-L-i], \tag{5.48}
\]
where $a_0 = \alpha$ and $a_1 = \beta$.
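The recursion of Equation 5.47 can be written directly as a sample-by-sample loop. The sketch below is a plain, non-differentiable reference version for illustration only; it assumes zero initial conditions, and the function name and parameter values are hypothetical.

```python
def karplus_strong(x, delay, gain, alpha, beta):
    """y[n] = x[n] + gain * (alpha * y[n - delay] + beta * y[n - delay - 1])."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        # Samples before the start of the signal are treated as zero.
        d0 = y[n - delay] if n >= delay else 0.0
        d1 = y[n - delay - 1] if n >= delay + 1 else 0.0
        y[n] = x[n] + gain * (alpha * d0 + beta * d1)
    return y


# Impulse response with the traditional averaging filter (alpha = beta = 0.5).
impulse = [1.0] + [0.0] * 15
response = karplus_strong(impulse, delay=4, gain=0.9, alpha=0.5, beta=0.5)
```

With a delay of 4 samples, the impulse reappears at samples 4 and 5 with amplitude 0.45 each, and every further round trip both attenuates and smears the echo. The loop also makes the sequential dependency visible: each output sample needs earlier output samples.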
Challenges with Recursivity
As shown in the Karplus-Strong difference equation 5.48, the algorithm is inherently
formulated as a recursive feedback system. This presents significant challenges when
integrated into neural network architectures:
Backpropagation Through Time (BPTT) Complexity. The recursive dependency
means that $y[n]$ depends on all previous outputs $y[n-1], y[n-2], \ldots, y[0]$.
For gradient computation, this creates a dependency chain that must be unrolled
completely:
• Computing $\partial \mathcal{L} / \partial \alpha$ requires gradients through the entire sequence
• Each time step depends on all previous time steps
• The computational graph becomes a long sequential chain
This results in:
• Computational complexity:
  – $O(N \cdot M)$ where $N$ is sequence length and $M$ is the number of parameters
  – Sequential processing that cannot be parallelized (each example in a batch
    has unique recursions), making efficient implementation difficult
• Gradient issues: Vanishing/exploding gradients over long sequences
Stability Concerns. Without proper constraints, recursive feedback filters can
exhibit unstable behavior with outputs growing without bound. This is particularly
problematic during neural network training, where parameters change rapidly
and small numerical errors can accumulate over long sequences, leading to unstable
coefficients that cause NaN values.
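One simple way to reason about such constraints is a sufficient (not necessary) bound: for the recursion of Equation 5.47, the output stays bounded for bounded input whenever $|g|\,(|\alpha| + |\beta|) < 1$, since the feedback term then cannot equal 1 in magnitude anywhere on or outside the unit circle. The sketch below checks this bound and compares impulse-response tails; it is an illustration only, not the constraint used in the thesis, and all names and values are hypothetical.

```python
def satisfies_stability_bound(gain, alpha, beta):
    """Sufficient (not necessary) condition: |g| * (|alpha| + |beta|) < 1."""
    return abs(gain) * (abs(alpha) + abs(beta)) < 1.0


def impulse_tail_peak(gain, alpha, beta, delay=4, num_samples=400):
    """Largest |y[n]| over the last quarter of the impulse response."""
    y = [0.0] * num_samples
    for n in range(num_samples):
        x_n = 1.0 if n == 0 else 0.0
        d0 = y[n - delay] if n >= delay else 0.0
        d1 = y[n - delay - 1] if n >= delay + 1 else 0.0
        y[n] = x_n + gain * (alpha * d0 + beta * d1)
    return max(abs(v) for v in y[-(num_samples // 4):])


decaying_tail = impulse_tail_peak(gain=0.9, alpha=0.5, beta=0.5)
growing_tail = impulse_tail_peak(gain=1.1, alpha=0.5, beta=0.5)
```

With a gain of 0.9 the tail decays toward zero, whereas a gain of 1.1 makes it grow with every round trip through the delay line, which is the behavior that eventually overflows during training.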
A Related Solution: Differentiable All-Pole Filters
The BPTT challenge is not unique to the Karplus-Strong algorithm; it arises in any
recursive system. In digital signal processing, such recursive structures are most
commonly encountered in digital filters.
For the specific case of all-pole filters, Yu et al. [40] address the fundamental
challenge of computing gradients through recursive IIR filters by reformulating the
forward pass to eliminate recursion, thereby enabling stable and efficient
backpropagation.
Chapter 6
Loss Functions
The model is trained with a combination of loss functions to capture global spectral
structure and harmonic details.
6.1 Multi-Resolution STFT Loss
We employ a Multi-Resolution Short-Time Fourier Transform (STFT) loss composed
of spectral convergence, linear- and log-magnitude terms, supplemented by a spectral
energy loss. The multi-resolution STFT was introduced to capture spectral structure
across several time-frequency resolutions. Engel et al. [24] adopt this framework,
though without both the log-magnitude and spectral convergence terms that were
originally proposed by Arik et al. [64] and extended to multi-resolution by Yamamoto
et al. [65].
Our implementation is based on the auraloss library [66], which provides
multi-resolution STFT losses with optional scale invariance. We extend it in two ways.
First, we add an explicit spectral energy loss that supervises frame-wise energy,
\[
E_{x,b,t} = \sum_{f} x_{\mathrm{mag},b,f,t}^{2}, \qquad
E_{y,b,t} = \sum_{f} y_{\mathrm{mag},b,f,t}^{2}, \tag{6.1}
\]
\[
E(x_{\mathrm{mag}}, y_{\mathrm{mag}}) = \frac{1}{BT} \sum_{b,t} d\big(E_{x,b,t}, E_{y,b,t}\big), \tag{6.2}
\]
where $d(a, b)$ is the L1 distance of log-scaled energies, normalized across frames.
This disentangles absolute loudness supervision from spectral shape, improving
optimization stability by providing independent gradient signals for timbre and energy.
Second, we introduce an equal-contribution normalization across resolutions,
\[
x_{\mathrm{mag}} \leftarrow \frac{x_{\mathrm{mag}}}{F}, \qquad
y_{\mathrm{mag}} \leftarrow \frac{y_{\mathrm{mag}}}{F}, \tag{6.3}
\]
which compensates for differing FFT sizes, ensuring that each resolution contributes
comparably to the total loss.
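By scale-invariant rescaling [66], the target spectrum is normalized before computing
STFT losses:
\[
\alpha_b = \frac{\sum_{f,t} x_{\mathrm{mag},b,f,t}\, y_{\mathrm{mag},b,f,t}}{\sum_{f,t} y_{\mathrm{mag},b,f,t}^{2}}, \qquad
y_{\mathrm{mag},b} \leftarrow \alpha_b\, y_{\mathrm{mag},b}. \tag{6.4}
\]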
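The individual STFT terms are defined as:
\[
L_{\mathrm{SC}} = \frac{\lVert x_{\mathrm{mag}} - y_{\mathrm{mag}} \rVert_F}{\lVert y_{\mathrm{mag}} \rVert_F}, \tag{6.5}
\]
\[
L_{\mathrm{lin}} = \frac{1}{BFT} \sum_{b,f,t} \big| x_{\mathrm{mag},b,f,t} - y_{\mathrm{mag},b,f,t} \big|, \tag{6.6}
\]
\[
L_{\mathrm{log}} = \frac{1}{BFT} \sum_{b,f,t} \big| \log(x_{\mathrm{mag},b,f,t} + \epsilon) - \log(y_{\mathrm{mag},b,f,t} + \epsilon) \big|. \tag{6.7}
\]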
The total spectral loss is a weighted sum:
\[
L = w_{\mathrm{SC}} L_{\mathrm{SC}} + w_{\mathrm{lin}} L_{\mathrm{lin}} + w_{\mathrm{log}} L_{\mathrm{log}} + w_{\mathrm{energy}} E(x_{\mathrm{mag}}, y_{\mathrm{mag}}), \tag{6.8}
\]
where the weights $w_{\mathrm{SC}}, w_{\mathrm{lin}}, w_{\mathrm{log}}, w_{\mathrm{energy}}$ control the relative contributions of each
term and are all set to 1 in our experiments. The energy loss is computed before
scale-invariant normalization to preserve absolute loudness supervision, while
STFT-based terms are scale-invariant to focus on spectral shape.
This formulation provides disentangled supervision of harmonic structure and
loudness, while ensuring fair contribution from all resolutions, leading to more stable
and interpretable optimization. It also aligns conceptually with our model
architecture, which predicts overall energy and harmonic distribution separately (see
Section 5.2.2).
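The individual terms of Equations 6.4 to 6.7 can be sketched for a single batch item whose magnitude spectrogram has been flattened into a list. This is an illustrative standard-library sketch and does not reproduce the auraloss API or the thesis code; the function names and the toy values are hypothetical.

```python
import math


def scale_factor(x, y):
    """Eq. 6.4: least-squares scale alpha_b that best matches y to x."""
    return sum(a * b for a, b in zip(x, y)) / sum(b * b for b in y)


def spectral_convergence(x, y):
    """Eq. 6.5: Frobenius norm of the difference, relative to the target norm."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    den = math.sqrt(sum(b * b for b in y))
    return num / den


def linear_magnitude(x, y):
    """Eq. 6.6: mean absolute difference of magnitudes."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def log_magnitude(x, y, eps=1e-8):
    """Eq. 6.7: mean absolute difference of log magnitudes."""
    return sum(abs(math.log(a + eps) - math.log(b + eps))
               for a, b in zip(x, y)) / len(x)


target = [0.5, 1.0, 0.25, 2.0]           # toy flattened magnitudes
prediction = [2.0 * v for v in target]   # same spectral shape, twice the level
alpha_b = scale_factor(prediction, target)
rescaled_target = [alpha_b * v for v in target]
```

In this toy case the prediction differs from the target only in level, so the scale-invariant STFT terms vanish after rescaling, and the level mismatch is left for the separate energy term to penalize.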
6.2 Harmonic Loss
To supervise fine-grained harmonic content, we introduce a Harmonic Loss, inspired
by Campbell diagrams from rotating machinery analysis. In a Campbell diagram,
harmonic orders appear as diagonal lines when plotting resonant frequencies against
rotational speed. Analogously, our loss emphasizes spectral energy along predicted
harmonic tracks derived from the instantaneous RPM (or $f_0$) of each frame, while
de-emphasizing inter-harmonic content. This allows the model to prioritize harmonic
magnitudes over non-harmonic spectral components.
The loss is instantiated with high spectral resolution (FFT size 65,536, window size
16,384, hop size 256 at a 16 kHz sample rate) and zero-padding to minimize spectral
leakage, ensuring that harmonic peaks are accurately captured. For vectorized
computation, we construct a sinusoidal mask $M_{b,f,t}$ whose maxima align with expected
harmonic bins, and whose negative or inter-harmonic regions are clipped to reduce
the contribution of non-harmonic energy. Each batch example $b$ and frame $t$
receives a mask based on the instantaneous RPM, capturing harmonics 0.5, 1, 1.5, ...
up to the desired number of harmonics.
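Let $x_{\mathrm{mag}}, y_{\mathrm{mag}} \in \mathbb{R}^{B \times F \times T}$ denote the masked magnitude spectrograms of the
generated and target audio. The Harmonic Loss is computed as the frame-wise energy
distance along the harmonic mask:
\[
E^{\mathrm{harm}}_{b,t} = \sum_{f=1}^{F} \big(x_{\mathrm{mag},b,f,t} \cdot M_{b,f,t}\big)^{2}, \qquad
E^{\mathrm{harm}}_{b,t} = \sum_{f=1}^{F} \big(y_{\mathrm{mag},b,f,t} \cdot M_{b,f,t}\big)^{2}, \tag{6.9}
\]
\[
L_{\mathrm{harm}} = \frac{1}{B(T-1)} \sum_{b=1}^{B} \sum_{t=1}^{T-1}
\log\big(E^{\mathrm{harm}}_{b,t+1} + \epsilon\big) - \log\big(E^{\mathrm{harm}}_{b,t} + \epsilon\big), \tag{6.10}
\]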
where $\epsilon$ is a small constant for numerical stability. This formulation measures the
temporal differences of logarithmic energy along harmonic bins, ensuring the model
reproduces the dynamics of harmonic magnitudes.
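One possible form of such a clipped sinusoidal mask is sketched below. The exact mask used in the thesis is not given in the text, so this is an assumption: harmonic orders are taken relative to the crank rotation frequency RPM/60 with a spacing of 0.5 orders, and a cosine with that period is clipped at zero. All names are hypothetical.

```python
import math


def harmonic_mask(freq_hz: float, rpm: float, order_step: float = 0.5) -> float:
    """Clipped cosine with maxima at multiples of order_step * (rpm / 60) Hz."""
    rotation_hz = rpm / 60.0                # crank rotation frequency (assumed base)
    spacing_hz = order_step * rotation_hz   # distance between masked harmonics
    return max(0.0, math.cos(2.0 * math.pi * freq_hz / spacing_hz))


def masked_frame_energy(magnitudes, bin_freqs_hz, rpm):
    """Frame-wise harmonic energy: sum over bins of (magnitude * mask) ** 2."""
    return sum((m * harmonic_mask(f, rpm)) ** 2
               for m, f in zip(magnitudes, bin_freqs_hz))


RPM = 3000.0                               # rotation frequency of 50 Hz
on_order_1 = harmonic_mask(50.0, RPM)      # on order 1.0
on_order_half = harmonic_mask(25.0, RPM)   # on order 0.5
between = harmonic_mask(37.5, RPM)         # midway between orders 0.5 and 1.0
```

Energy that falls exactly on a harmonic order passes through the mask at full weight, while energy midway between two orders is removed entirely.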
Figure 7: Log-magnitude spectra of prediction X and target Y after applying the
Harmonic Mask M. The mask suppresses energy differences between harmonics
and emphasizes harmonic regions, inspired by Campbell diagrams commonly used
in NVH and engineering acoustics.
Chapter 7
Training
7.1 Model Architecture
The model variants are both configured with 128 harmonic voices and 256 noise
bands. The neural network architecture uses a hidden size of 256 dimensions with a
single-layer GRU of 512 units, resulting in a total of 1,435,697 trainable parameters.
Memory requirements include 5.74 MB for parameters, 14.33 MB for forward/backward
passes, and an estimated total size of 20.61 MB.
7.2 Training and Validation Data
We train our models on the first three subsets (A, B, C) of the Procedural Engine
Sounds Dataset [51] described in Chapter 3. Each subset has nearly identical size,
rpm and torque distributions, and coverage, enabling direct comparability while
controlling for factors other than timbre. This design allows us to study generalization
across different engine timbres.
The subsets originate from three distinct real-world engines, with additional
modifications to increase stochasticity and spectral complexity. Dataset A is characterized
by the harmonic spectrum of an inline-four engine, B by a V8 configuration with
low-frequency resonant exhaust characteristics, and C by a V8 engine exhibiting
prominent mid-range frequencies, intake noise, and metallic resonances. Based on
these traits, the sets can be ordered as A < B < C in terms of increasing spectral
and temporal complexity.
Each subset comprises approximately 2.5 hours (149 minutes) of audio data, which
we split into 90% training and 10% validation data.
7.3 Data Processing
For model training, audio data is processed in batches containing fixed-length
monophonic chunks of 65,536 samples at 16 kHz sampling rate (approximately 4 seconds
duration). These chunks are extracted at random positions throughout the dataset
with 50% overlap to maximize data utilization. Audio signals are normalized to
the range $[-1, 1]$ across the entire dataset, preserving the full dynamic range while
maintaining relative energy differences that are essential for effective learning.
Control data undergoes preprocessing to align with the model's temporal resolution.
Specifically, control signals are downsampled to match the model's frame rate of 125
Hz (corresponding to a hop size of 128 samples) and subsequently standardized using
zero mean and unit variance normalization. The standardization parameters (mean
and standard deviation) are computed across the entire training dataset to ensure
comparable feature magnitudes while preserving relative scaling relationships. In
contrast, conditioning signals are provided at the original audio sample rate without
scaling or normalization, as they capture absolute physical values that must be
preserved.
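The framing figures quoted above are mutually consistent, as the short sketch below verifies. The constants are taken from the text; the `standardize` helper and the example RPM statistics are hypothetical and only illustrate zero-mean, unit-variance scaling with dataset-level statistics.

```python
SAMPLE_RATE_HZ = 16_000
CHUNK_SAMPLES = 65_536
HOP_SAMPLES = 128

chunk_seconds = CHUNK_SAMPLES / SAMPLE_RATE_HZ    # 4.096 s, i.e. roughly 4 seconds
frame_rate_hz = SAMPLE_RATE_HZ / HOP_SAMPLES      # 125.0 control frames per second
frames_per_chunk = CHUNK_SAMPLES // HOP_SAMPLES   # 512 control frames per chunk


def standardize(values, mean, std):
    """Zero-mean, unit-variance scaling with statistics from the training set."""
    return [(v - mean) / std for v in values]


# Hypothetical RPM statistics, used only to show the scaling.
rpm_frames = standardize([1000.0, 3000.0, 5000.0], mean=3000.0, std=1000.0)
```

Each 65,536-sample chunk therefore pairs with exactly 512 control frames.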
These preprocessing choices, including the 16 kHz sampling rate, 4-second chunk
duration, and batch size of 8, align with established practices in DDSP-based
timbre transfer and controllable synthesis systems. This configuration is well-suited
for scenarios involving limited training data with narrow acoustic target timbres
and minimal variety, leveraging strong inductive biases to circumvent the need for
learning fundamental audio synthesis principles from raw data [24, 67, 32].
7.4 Optimization Setup
Training is conducted using the AdamW optimizer with an initial learning rate of
$1 \times 10^{-3}$, weight decay of $1 \times 10^{-2}$, and beta parameters of (0.9, 0.999). Learning rate
scheduling follows a one-cycle policy with a maximum learning rate of $2 \times 10^{-4}$, where
15% of training steps constitute the warm-up phase. The learning rate increases by
a factor of 10 from the initial value, then decreases by a factor of 100 in the final
phase, following a cosine annealing strategy throughout the cycle. The complete
training process spans 100 epochs, totaling approximately 45,000 training steps.
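The shape of such a schedule can be sketched in plain Python. This is one possible reading of the description, not the training code: it assumes the warm-up starts at one tenth of the maximum rate ($2 \times 10^{-5}$) and that the final rate is one hundredth of the maximum ($2 \times 10^{-6}$); library schedulers may define the final divisor relative to the starting rate instead. The function name and defaults are hypothetical.

```python
import math


def one_cycle_lr(step, total_steps=45_000, max_lr=2e-4,
                 warmup_frac=0.15, start_div=10.0, final_div=100.0):
    """Cosine warm-up to max_lr, followed by cosine annealing to max_lr / final_div."""
    warmup_steps = round(total_steps * warmup_frac)
    start_lr = max_lr / start_div   # assumption: warm-up starts at max_lr / 10
    final_lr = max_lr / final_div   # assumption: final rate is max_lr / 100
    if step < warmup_steps:
        progress = step / warmup_steps
        return start_lr + (max_lr - start_lr) * 0.5 * (1.0 - math.cos(math.pi * progress))
    progress = (step - warmup_steps) / (total_steps - 1 - warmup_steps)
    return final_lr + (max_lr - final_lr) * 0.5 * (1.0 + math.cos(math.pi * progress))


lr_start = one_cycle_lr(0)
lr_peak = one_cycle_lr(6_750)    # 15% of 45,000 steps
lr_end = one_cycle_lr(44_999)
```

Under these assumptions the rate peaks after 6,750 steps and then decays smoothly over the remaining 85% of training.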
7.5 Parameter Initialization
All linear layers employ Xavier initialization, which is appropriate for the tanh
activations used in the synthesis parameter heads. GRU layers follow PyTorch's
default initialization scheme, where weights are drawn from a uniform distribution
$U(-\sqrt{k}, \sqrt{k})$ with $k = \frac{1}{\text{hidden\_size}}$ and biases initialized to zero. Layer normalization
parameters are initialized to unity for scale ($\gamma = 1$) and zero for shift ($\beta = 0$).
Chapter 8
Results
In the subsequent sections, we compare the two model variants, HPN and PTR, by
analyzing training dynamics on train and validation sets, evaluating cross-dataset
generalization, and complementing objective metrics with subjective assessment.
Throughout, A, B, C denote the respective training subsets introduced in Section 7.2.
8.1 Quantitative Analysis
The following section examines convergence and performance, analyses loss
components, and evaluates generalization capabilities across datasets.
8.1.1 Convergence and Performance
Figure 8 illustrates the training dynamics of both model variants across each
dataset. Both model variants demonstrate rapid convergence within the first 8,000
training steps, with total log-transformed validation loss (black dotted lines)
decreasing sharply from approximately 0.55 to 0.16 on average across datasets.
Convergence is effectively achieved by 15,000–25,000 training steps across all
configurations, as indicated by the vertical dashed lines marking early stopping candidates
(defined as positions with 8 following epochs of no significant improvement). These
candidates represent potential stopping points that could have achieved similar
performance with significantly fewer training steps compared to the full 100-epoch
training run.
However, training was continued to capture the complete convergence dynamics, as
validation loss showed continued gradual improvement without overfitting concerns,
despite diminishing returns. Early stopping candidate loss values are detailed in
Appendix B, and all subsequent analysis uses the final best models from the complete
training run.
Figure 8: Log-transformed Total Loss (training and validation) for both model
variants (HPN, PTR; columns) across three datasets (A, B, C; rows). The x-axis
denotes training steps and the y-axis the loss. Vertical dashed lines indicate early
stop candidates (Esc), defined as points where no significant improvement occurred
for 8 consecutive epochs. Training curves show per-step sampling while validation
curves represent epoch-level measurements.
As shown in Table 3, PTR consistently outperforms HPN in total validation loss
across all datasets, achieving a mean validation loss of 0.949 compared to HPN's
1.006 (5.7% improvement). PTR shows superior validation performance across all
three datasets, with improvements ranging from 3.8% to 7.6%.
                      HPN                            PTR
Dataset    Harmonic   STFT    Total      Harmonic   STFT    Total
A          0.107      1.781   0.944      0.090      1.649   0.872
B          0.059      1.824   0.943      0.055      1.754   0.907
C          0.166      2.093   1.132      0.117      2.017   1.069
mean       0.111      1.899   1.006      0.088      1.807   0.949
Table 3: Best validation losses for HPN and PTR models, separated into harmonic,
STFT, and total loss components.
The final training losses in Table 4 reveal complementary strengths. HPN achieves
marginally superior harmonic reconstruction during training (0.007 vs. 0.008), but
this advantage reverses on validation data (0.111 vs. 0.088), suggesting HPN may
overfit to harmonic details rather than learning generalizable synthesis patterns.
PTR achieves superior STFT performance in both training (1.322 vs. 1.377) and
validation, indicating more robust spectral reconstruction.
                      HPN                            PTR
Dataset    Harmonic   STFT    Total      Harmonic   STFT    Total
A          0.007      1.245   0.643      0.009      1.220   0.646
B          0.008      1.401   0.728      0.007      1.354   0.702
C          0.005      1.484   0.764      0.008      1.391   0.728
mean       0.007      1.377   0.712      0.008      1.322   0.692
Table 4: Best training losses for HPN and PTR models, separated into harmonic,
STFT, and total loss components.
The training-validation gap confirms this interpretation: PTR demonstrates
superior consistency with a mean training-validation gap of 0.257 for total loss, while
HPN exhibits a larger gap of 0.294. This indicates PTR's learned representations
and reconstruction strategies transfer more effectively to validation data.
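The percentages and gaps quoted above can be recomputed from the total-loss columns of Tables 3 and 4. The sketch below does this with the tabulated values only; the dictionary layout and function name are illustrative.

```python
# Total losses as (HPN, PTR) pairs, copied from Tables 3 and 4.
VALIDATION_TOTAL = {
    "A": (0.944, 0.872),
    "B": (0.943, 0.907),
    "C": (1.132, 1.069),
    "mean": (1.006, 0.949),
}
TRAINING_TOTAL_MEAN = (0.712, 0.692)


def relative_improvement(hpn: float, ptr: float) -> float:
    """Reduction of the PTR loss relative to the HPN loss, in percent."""
    return 100.0 * (hpn - ptr) / hpn


improvements = {
    name: round(relative_improvement(hpn, ptr), 1)
    for name, (hpn, ptr) in VALIDATION_TOTAL.items()
}
gap_hpn = round(VALIDATION_TOTAL["mean"][0] - TRAINING_TOTAL_MEAN[0], 3)
gap_ptr = round(VALIDATION_TOTAL["mean"][1] - TRAINING_TOTAL_MEAN[1], 3)
```

The per-dataset improvements come out at 7.6% (A), 3.8% (B) and 5.6% (C), matching the quoted 3.8% to 7.6% range, and the gaps reproduce the reported 0.294 and 0.257.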
with synthesis parameters alternating in an unstable manner. This behavior
correlates with subset A's characteristics: despite being spectrally least complex, it
exhibits higher degrees of inharmonicities and shifts of large harmonic groups
compared to other subsets. Furthermore, this engine timbre features typical inline-four
characteristics, fundamentally different from the V8 firing sequence used as bias
for the PTR variant. While the PTR model is theoretically capable of muting
individual cylinders to utilize only four cylinders, this strategy appears difficult for
the model to discover. Without this cylinder muting strategy, the PTR variant
faces a fundamental limitation: firing eight cylinders instead of four inherently
produces fundamental frequencies and harmonic content at twice the target values, a
discrepancy that cannot be compensated through pulse shaping alone.
Additionally, subset A's pronounced harmonic deviations can be effectively
addressed by the HPN variant's Gaussian harmonic bending mechanism, while the
PTR variant lacks equivalent direct means to target these singular harmonic
deviations as effectively. The HPN variant does not exhibit this behavior, revealing
superior robustness to transitional dynamics and greater flexibility in adapting to
strong harmonic deviations.
Idle State Challenges: Idle states present the most significant synthesis
challenges, likely due to low frequency content and explicit rhythmic patterns that blur
optimization targets without phase-alignment strategies. Additionally, idle sounds
are underrepresented in the dataset compared to normal driving conditions
(2000-3500 RPM), receiving proportionally less attention during training.
Loss Function Limitations: Our loss functions do not incorporate phase-alignment
strategies to synchronize output and target regarding engine cycle phase, naturally
promoting an averaging strategy over time-aligned acoustic events. To circumvent
this informative blindspot, both variants were architecturally constrained to explain
the data in an inherently synchronized manner, as detailed in Chapter 5. While this
approach successfully produces perceptually plausible rhythmic characteristics, the
non-synchronized loss function may still provide inadequate informative feedback
for precise envelope and modulation pattern updates.
Chapter 9
Conclusions and Future Work
This chapter summarizes the main contributions and key findings, discusses their
implications, and outlines directions for future work.
9.1 Summary of Contributions
This thesis makes the following key contributions:
• It integrates engine acoustics domain knowledge as inductive biases into neural
  audio architectures and differentiable synthesis components, proposing design
  patterns that may be transferable to other contexts of physically grounded
  audio synthesis.
• It introduces the Procedural Engine Sounds Dataset [51], addressing the lack
  of suitable datasets for engine sound modeling and related inverse problems.
• It provides the Procedural Engines Model (PRCE), including the complete
  training, inference, and analysis pipeline with model checkpoints and weights.
9.2 Key Findings
The conducted experiments yield the following central insights:
• Both PRCE model variants reproduce engine sounds with high perceptual
  plausibility, capturing timbre, dynamics, and responsiveness to control
  parameters beyond what objective loss metrics alone can describe.
• Inductive biases leave audible signatures: while the PTR variant emphasizes
  rhythmic, mechanically grounded features, the HPN variant produces
  smoother and more flexible approximations across engine types. These
  differences confirm that model architecture directly influences solution strategies
  and shapes perceptual outcomes.
• Both models generalize well by focusing on essential acoustic components while
  disregarding stochastic variations in target samples, indicating effective
  denoising and robustness for real-world applications.
• Limitations include difficulties in reproducing idle states due to phase
  alignment challenges, underrepresentation of idle data, and the absence of
  phase-sensitive loss functions. These shortcomings motivate the architectural and
  methodological directions outlined in Section 9.4.
9.3 Implications
This work advances research and practice in several ways:
• Academic Impact: The integration of domain knowledge into neural
  architectures illustrates the value of inductive biases for perceptually grounded
  sound synthesis. The openly released dataset and model provide benchmarks
  and resources that address a central bottleneck in the field, enabling
  reproducibility and further experimentation.
• Practical Applications: The controllable synthesis of realistic engine sounds
  has direct relevance for applications in automotive acoustics, gaming, and
  virtual prototyping. The responsiveness and accuracy of the PRCE model
  variants to control parameter variations make them suitable for interactive
  use cases requiring real-time sound generation with a high degree of perceptual
  realism.
• Advancement of the State of the Art: By bridging traditional procedural
  sound generation with differentiable synthesis and machine learning, this
  thesis demonstrates a pathway toward scalable, explainable, and high-quality
  engine sound modeling. The released dataset, model, and pipeline establish a
  foundation for both generative and inverse tasks, including NVH analysis and
  simulation studies beyond the scope of engine sound reproduction.
9.4 Future Work
This work opens several promising research directions for further advancements in
neural engine sound synthesis.
Learned Timbre Representations: The diverse engine configurations in our
procedural dataset provide an opportunity to investigate whether the model naturally
organizes learned representations into meaningful timbre spaces. This would reveal
whether domain-specific inductive biases enable plausible and robust interpolation
between engine types, allowing navigation through acoustic characteristics not
explicitly present in the training data.
Transfer Learning from Procedural Foundation: Our comprehensive procedural
engine sounds dataset could serve as a pre-training foundation for timbre-specific
fine-tuning with limited real-world data. This approach would leverage the
systematic parameter coverage achieved through our procedural generation to enable
high-quality synthesis with minimal additional training samples.
Inverse Parameter Prediction: Our detailed parameter annotations and
demonstrated audio-to-parameter relationships enable training models on the inverse
problem: predicting engine performance parameters from audio. This would enable
automatic dataset enhancement with non-annotated real-world data, enabling real-world
validation, architectural extensions, as well as non-generative, analytical tasks
common to NVH and acoustic engineering, where time-aligned RPM and torque
measurements are essential for identifying acoustic phenomena related to specific engine
operation states.
Real-World Validation: Incorporating real annotated vehicle recordings would
validate whether our proposed synthesis approaches show robust performance on
authentic engine acoustics, including scenarios with confounding environmental noises
and limited data quantities that characterize real-world recording conditions. This
validation step is crucial for establishing the practical applicability of our procedural
generation approach and would inform subsequent research directions.
Audio-Driven Architecture Extensions: Building on the inverse parameter
prediction foundation, our network architecture could be extended to use audio as
the primary input source, predicting the underlying performance parameters at an
intermediate stage (similarly to $f_0$ and loudness in the original DDSP framework).
This end-to-end audio-driven approach would enable learning from non-annotated
audio collections and dramatically expand available training data beyond our current
procedural generation approach.
Broader Applications in Vehicle Noise Studies: Both our dataset and
generative network architecture extend beyond the isolated scope of engine sounds, serving
as integrable components in simulation setups that investigate other acoustically
relevant sources within the same context. Potential applications include optimization
of in-cabin speech detection systems, studies of urban noise pollution, and other
research that benefits from realistic, holistic simulations with precise parametric
control in the broader domain of vehicle acoustics.
Data and Code Availability
The datasets, models, and code developed in this work are made publicly available
to support reproducible research and facilitate future developments in engine sound
synthesis:
Procedural Engine Sounds Dataset: The complete dataset is available on
Zenodo (https://doi.org/10.5281/zenodo.16883336) and Hugging Face Datasets
(https://huggingface.co/datasets/rdoerfler/procedural-engine-sounds).
PRCE Model and Code: The complete implementation, including training and
inference pipelines, model checkpoints, and pre-trained weights, is available in
the GitHub repository: https://github.com/rdoerfler/prce-model
Audio Examples: Supplementary audio examples demonstrating model outputs are
available at: https://rdoerfler.github.io/prce-examples/
Appendix A
Harmonic Deviation Analysis
This appendix provides supplementary material motivating the systematic Gaussian
bending strategy for inharmonicity, as described in Section 5.2.1.
Harmonic deviations are reported as the ratio between identified harmonic
components and their theoretical ideals, and have been obtained across the full
dataset of real engine recordings, including all microphone positions, by
systematic feature extraction and analysis. These deviations are plotted as
functions of RPM and torque, enabling the detection of systematic inharmonicity
phenomena, along with their magnitudes and patterns.
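The deviation measure described above can be illustrated with a minimal sketch. It assumes that each identified component has already been matched to a harmonic index, and that the fundamental is the four-stroke cycle frequency RPM/120. Both are assumptions made for illustration, as are the function names and the example frequencies, none of which are taken from the analysis pipeline used in this work.

```python
def cycle_frequency_hz(rpm):
    # Assumption: four-stroke engine, one full cycle per two revolutions.
    return rpm / 120.0


def harmonic_deviation(measured_hz, harmonic_index, f0_hz):
    """Relative deviation of a measured partial from its ideal harmonic.

    Computed as measured / ideal - 1, so 0.0 means perfectly harmonic and
    +0.04 means the partial lies 4% above harmonic_index * f0_hz.
    """
    ideal_hz = harmonic_index * f0_hz
    return measured_hz / ideal_hz - 1.0


f0 = cycle_frequency_hz(3000)  # 25.0 Hz at 3000 RPM

# Hypothetical measured partials: harmonic index -> frequency in Hz.
partials = {4: 100.0, 8: 208.5, 12: 297.0}

deviations = {k: harmonic_deviation(f, k, f0) for k, f in partials.items()}

# Flag partials whose deviation exceeds the 4.0% magnitude discussed here.
flagged = [k for k, d in deviations.items() if abs(d) > 0.04]
```

Subtracting 1 from the ratio centers the measure around 0, so positive and negative values distinguish upward from downward shifts.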
Figure 11 shows that at specific rotational speeds, multiple engine order groups
exhibit synchronized systematic deviations from their ideal harmonic ratios,
exceeding ±4.0%.
Furthermore, as highlighted in Figure 12, certain individual engine orders
display independent positive deviations that remain relatively stable across wide
RPM ranges, representing order-specific phenomena that persist throughout various
operating conditions.
These phenomena are consistently observable across different microphone
positions, effectively ruling out microphone-specific measurement artifacts and
suggesting that this behavior is inherent to the engine's acoustic
characteristics.
Figure 11: Deviations from ideal harmonics (defined as ratios between measured
and ideal harmonics, centered around 0) plotted against RPM (X-axis). The Y-axis
shows harmonic indices; color intensity indicates the degree and direction of
deviation. The main plot shows the analysis of channel 1; the supplementary plots
on the right show channels 2, 3, and 4, respectively. Patterns of harmonic
deviation observable across multiple microphone positions demonstrate that
several groups of engine orders exhibit synchronized systematic drifts exceeding
±4.0% at specific rotational speeds.
Figure 12: Distinct dominant engine orders exhibit independent, stable frequency
shifts exceeding 4.0% across wide RPM ranges. Layout and conventions as in
Figure 11.
Appendix B
Supplementary Training Results
This appendix contains detailed metrics that support the quantitative analysis in
Chapter 8.
Dataset  Model  Early Stopping            Best Loss Values
                Best Epoch  Stop Epoch    Harmonic  STFT   Total
A        HPN    67          75            0.133     1.793  0.963
A        PTR    59          67            0.131     1.681  0.906
B        HPN    54          62            0.097     1.827  0.962
B        PTR    42          50            0.097     1.774  0.935
C        HPN    26          34            0.223     2.306  1.265
C        PTR    43          51            0.201     2.072  1.136
Mean     HPN    -           -             0.151     1.975  1.063
Mean     PTR    -           -             0.143     1.842  0.993
Table 5: Model performance summary showing final validation loss values at early
stopping points, defined as epochs after which no significant improvement
occurred for 8 consecutive epochs. HPN and PTR models are compared across
datasets A, B, and C, with loss components including harmonic reconstruction,
STFT spectral matching, and total combined loss. Mean values represent averages
across all three datasets. Bold values indicate superior (lower) performance
between HPN and PTR models for each dataset.
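The early-stopping criterion stated in the caption of Table 5 can be sketched as follows. This is a minimal illustration rather than the training code used in this work: the function name is hypothetical, and `min_delta` is an assumed stand-in for what counts as a "significant improvement", since the threshold is not specified here.

```python
def early_stopping_epochs(val_losses, patience=8, min_delta=0.0):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Training stops once `patience` consecutive epochs fail to improve on
    the best validation loss by more than `min_delta` (assumed threshold).
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    # The loss curve ended before the patience was exhausted.
    return best_epoch, len(val_losses)


# Hypothetical validation curve: improves for 3 epochs, then plateaus.
losses = [1.0, 0.9, 0.8] + [0.85] * 10
best, stop = early_stopping_epochs(losses)
```

Under this rule the stop epoch trails the best epoch by exactly the patience, consistent with the gap of 8 between the Best Epoch and Stop Epoch entries reported in Table 5.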
[52] Fletcher, N. H. The nonlinear physics of musical instruments. Reports on
Progress in Physics 62, 723–764 (1999).
[53] Murray, C. & Whitfield, D. S. Musical String Inharmonicity. ASTRA - The
McNair Scholars' Journal (2021).
[54] Wolfe, J. Inharmonic resonances in wind instruments (2025).
[55] Tervaniemi, M., Just, V., Koelsch, S., Widmann, A. & Schröger, E. Pitch
discrimination accuracy in musicians vs nonmusicians: An event-related potential
and behavioral study. Experimental Brain Research 161, 1–10 (2005).
[56] Roads, C. Sound Composition with Pulsars. Journal of Audio Engineering
Society 49, 134–147 (2001).
[57] Kar, K., Roberts, S., Stone, R., Oldfield, M. & French, B. Instantaneous
Exhaust Temperature Measurements Using Thermocouple Compensation Techniques.
In SAE 2004 World Congress & Exhibition, 2004–01–1418 (2004).
[58] Venkataraman, V., Hong, B. & Cronhjort, A. Analyzing Engine Exhaust Gas
Temperature Pulsations and Gas-Dynamics Using Thin-Wire Thermocouples.
Journal of Engineering for Gas Turbines and Power 146, 071002 (2024).
[59] Smith, J. O. Physical Audio Signal Processing (W3K Publishing).
[60] Jaffe, D. A. & Smith, J. O. Extensions of the Karplus-Strong Plucked-String
Algorithm. Computer Music Journal 7, 56 (1983). 3680063.
[61] Jang, E., Gu, S. & Poole, B. Categorical Reparameterization with
Gumbel-Softmax (2017). 1611.01144.
[62] Smith, J. O. Introduction to Digital Filters with Audio Applications (W3K
Publishing, 2007).
[63] Yamaki, S., Abe, M. & Kawamata, M. Closed Form Solutions to L2-Sensitivity
Minimization Subject to L2-Scaling Constraints for Second-Order State-Space
Digital Filters with Real Poles. IEICE Transactions on Fundamentals of
Electronics, Communications and Computer Sciences E93-A, 476–487 (2010).
[64] Arik, S. O., Jun, H. & Diamos, G. Fast Spectrogram Inversion using
Multi-head Convolutional Neural Networks. IEEE Signal Processing Letters 26,
94–98 (2019). 1808.06719.
[65] Yamamoto, R., Song, E. & Kim, J.-M. Parallel WaveGAN: A fast waveform
generation model based on generative adversarial networks with multi-resolution
spectrogram (2020). 1910.11480.
[66] Steinmetz, C. J. & Reiss, J. D. Auraloss: Audio-focused loss functions in
PyTorch (2020). 2010.10291.
[67] Hayes, B., Saitis, C. & Fazekas, G. Neural Waveshaping Synthesis (2021).
2107.05050.