
Technische Universität Berlin

Fakultät I - Geistes- und Bildungswissenschaften

Institut für Sprache und Kommunikation

Fachgebiet Audiokommunikation

Master’s Thesis in Audiokommunikation und -technologie

Autogenous Spatialization for Arbitrary Loudspeaker Setups

Author: Zeyu Yang

1. Supervisor: Dr. Henrik von Coler

2. Supervisor: Prof. Dr. Stefan Weinzierl

Start Date: 03-11-2023

Submission Date: 30-04-2024

Statutory Declaration

I hereby declare in lieu of oath to Faculty I of Technische Universität Berlin that the present thesis, attached to this declaration, was prepared independently and only with the aid of the sources and resources listed in the bibliography. All passages of the thesis that were taken from other works, either verbatim or in substance, are marked as such. I am submitting this thesis as an examination paper for the first time. I affirm that this thesis, or substantial parts of it, has not previously served as the basis for earning credit in another course.

With my signature I confirm that I have been instructed in the citation rules customary in the field and have understood them. The citation conventions customary in the field concerned have been observed.

The thesis may be checked for plagiarism using electronic tools.

____________ ____________

Place, Date Signature

Acknowledgements

I would like to express my profound gratitude to my supervisor, Henrik von Coler,

for his invaluable support and guidance. His inﬂuence was pivotal in my decision to

join the Audiokommunikation und -technologie program, and his insights helped

me to discover and pursue my true interests. Special thanks to Miller Puckette for his advice on the implementation in Pure Data.

Above all, my deepest appreciation goes to Jiayu Ding for her unwavering mental

and emotional support throughout my master’s thesis journey. Her presence is

indispensable, and I could not have achieved this without her.


Abstract

This thesis introduces Zerr*, a novel approach to spatial music production that

navigates between conventional sound spatialization and spatial sound synthesis.

Following an extensive review of sound spatialization techniques, as well as his-

torical and contemporary developments in spatial music, a tailored coordinate

system has been developed to categorize Zerr* alongside existing spatial music

authoring tools.

Zerr* employs an innovative algorithmic framework that leverages the intrinsic properties of the audio signal, coupled with a special mapping system, to autonomously distribute audio to arbitrary loudspeaker setups. This approach facilitates dynamic, context-sensitive spatialization and enables unique spatial sound synthesis effects. It also circumvents the limitations imposed by traditional spatialization techniques, thus introducing new sonic experiences and creative paradigms.

The core modules of Zerr* are implemented in C++ and are further extended

as a Pure Data package and JACK clients. The system is designed to integrate

seamlessly into existing creative ecosystems, supporting real-time audio manipu-

lation and sample-level spatialization. This functionality has proven particularly

eﬀective for live performances and improvisational contexts.

Comprehensive listening tests conducted with participants from diverse back-

grounds have validated the system’s eﬀectiveness. Their feedback has conﬁrmed

the system’s innovative approach and its practical applicability in real-world set-

tings.

Zusammenfassung

In dieser Arbeit wird Zerr* vorgestellt, ein neuartiger Ansatz zur räumlichen

Musikproduktion, der sich zwischen konventioneller Klangverräumlichung und

räumlicher Klangsynthese bewegt. Nach einer ausführlichen Untersuchung der

Techniken zur Klangverräumlichung sowie der historischen und aktuellen Entwick-

lungen in der räumlichen Musik wurde ein maßgeschneidertes Koordinatensystem

entwickelt, um Zerr* neben den bestehenden Tools zur Erstellung räumlicher

Musik einzuordnen.

Zerr* verwendet ein innovatives algorithmisches Framework, das die intrinsischen

Eigenschaften des Audiosignals in Verbindung mit einem speziellen Mapping-Sys-

tem nutzt, um Audiosignale autonom auf beliebige Lautsprecherkonﬁgurationen

zu verteilen. Dieser Ansatz erleichtert eine dynamische, kontextabhängige Räum-

lichkeit und ermöglicht einzigartige räumliche Klangsyntheseeﬀekte. Dieser Ansatz

umgeht eﬀektiv die Beschränkungen, die durch herkömmliche Raumklangtech-

niken auferlegt werden, und führt so zu neuen Klangerlebnissen und kreativen

Paradigmen.

Die Kernmodule von Zerr* sind in C++ implementiert und werden durch ein Pure

Data-Paket und JACK-Clients erweitert. Das System ist so konzipiert, dass es sich

nahtlos in bestehende kreative Ökosysteme einfügt und Audiomanipulationen in

Echtzeit sowie Räumlichkeit auf Sample-Ebene unterstützt. Diese Funktionalität

hat sich als besonders eﬀektiv für Live-Performances und Improvisationskontexte

erwiesen.

Umfassende Hörtests mit Teilnehmern aus verschiedenen Bereichen haben die Ef-

fektivität des Systems bestätigt. Ihr Feedback hat den innovativen Ansatz des

Systems und seine praktische Anwendbarkeit in realen Umgebungen bestätigt.

Contents

1 Introduction
1.1 Motivation
1.2 Research Paths and Objectives
1.3 Structure of the Work
2 Technical and Theoretical Foundations
2.1 Conventional Sound Spatialization
2.1.1 Channel-based Method
2.1.2 Object-based Method
2.1.2.1 Panning Algorithms
2.1.2.2 Sound Field Reconstruction
2.1.3 Discussion
2.2 Early Spatial Music
2.2.1 Spatialization as Performance
2.2.2 Spatialization as Composition
2.2.3 Discussion
2.3 Loudspeakers as Musical Instruments
2.3.1 Reconsideration of Channel-based Method
2.3.2 Loudspeaker Characteristics
2.3.2.1 Loudspeaker Orchestras
2.3.2.2 Unconventional Loudspeakers
2.3.3 Sonic Trajectories
2.3.3.1 Historical Milestone
2.3.3.2 Current Direction and Limitation
2.4 Spatial Sound Control
2.4.1 Composition Oriented
2.4.1.1 Trajectory Editing
2.4.1.2 Composition Toolchain
2.4.2 Performance Oriented
2.4.2.1 Controllers in History
2.4.2.2 Spatial Instruments
2.4.2.3 Matrix-based Diffusion
2.4.3 Discussion
2.5 Spatial Texture
2.5.1 Relevant Theories
2.5.1.1 Spectromorphology & Spatiomorphology
2.5.1.2 Textural Composition
2.5.2 Creation Techniques
2.5.2.1 Spectral Spatialization
2.5.2.2 Spatial Granulation
2.5.2.3 Panning & Decorrelation
2.5.3 Spatialization as Synthesis
2.5.4 Related Tools
2.6 Discussion
3 Zerr* Approach
3.1 Approach Classification
3.2 Signal Flows in Live Performance
3.3 Zerr* Concept
3.4 System Design
3.4.1 Feature Tracker
3.4.2 Feature Processor
3.4.3 Speaker Manager
3.4.4 Envelope Generator
3.4.4.1 Speaker Selection
3.4.4.2 Distribution Processing
3.4.5 Envelope Combinator
3.4.6 Audio Disperser
3.5 Discussions
3.5.1 Creative Use of Audio Features
3.5.2 Sample-level Processing
3.5.3 Irregular Loudspeaker Setups
3.5.4 “Incorrect” Usage
4 Implementation
4.1 Aims and Priorities
4.2 Modular Design & Profiles
4.3 Core Modules
4.3.1 Feature Tracker
4.3.2 Feature Processor
4.3.3 Speaker Manager
4.3.4 Envelope Generator
4.3.5 Envelope Combinator
4.3.6 Audio Disperser
4.4 Encapsulations
4.4.1 Pure Data Package
4.4.2 JACK Client
4.4.3 Encapsulations in Development
4.5 Discussion
5 Evaluation
5.1 Goals & Expectations
5.2 Study Design
5.2.1 Test Environment
5.2.2 Test System
5.2.3 Experience Assessment
5.2.4 Test Process
5.3 Procedure
5.3.1 Questionnaire
5.3.2 Recruitment
5.3.3 Test Scenario
5.4 Analysis
5.4.1 General Feedbacks
5.4.1.1 Feedback for the Presets
5.4.1.2 Comprehensive Feedback
5.4.2 Experience-related Feedbacks
5.5 Discussion
6 Conclusions & Future Work


Chapter 1

Introduction

1.1 Motivation

The concept of spatial audio is no longer foreign to the general public. The technologies have progressed beyond the realm of research institutes and have been incorporated into a multitude of commercial applications. Examples include the spatial audio experience in Apple Music,¹ Dolby Cinema,² the Apple Vision Pro,³ and the recently constructed sound system in the Sphere in Las Vegas.⁴ It is becoming increasingly evident that the general public is beginning to recognize the existence of a spatial dimension in sound perception.

¹https://www.dolby.com/experience/apple-music/

²https://www.dolby.com/movies-tv/cinema/

³https://www.apple.com/apple-vision-pro/

⁴https://holoplot.com/insights/case-studies/msg-sphere-case-study

The impact of spatial audio on the music industry is also significant. In January 2024, Apple Music decided to pay 10% higher royalties for Spatial Audio tracks (supported by Dolby Atmos) than for tracks not available in this format. The Dolby Atmos format⁵ continues to gain market share and has the potential to become the dominant format for music production and distribution. The concept of sound spatialization, which refers to the distribution of sound in an acoustic space, is now being considered by traditional music producers and musicians.

⁵https://www.dolby.com/technologies/dolby-atmos/

Prior to this, however, attempts at spatializing sound have been an integral part of electroacoustic music since its early days. The spatialization of previously produced tape music on multichannel loudspeaker setups, as exemplified by the potentiomètre d’espace spatial control system (Teruggi, 2007), was already practiced in the 1950s by the Groupe de Recherches Musicales (GRM). The current commercially successful applications therefore cannot be separated from the experiments and theories put forward by pioneering musicians in their attempts at sound spatialization.

While substantial progress has been made in accurately reproducing the spa-

tial characteristics of sound within the ﬁeld of sound spatialization, this represents

only a fraction of the broader spectrum of challenges. In many cases, particularly

in musical practices, standard spatial audio techniques may not fully exploit the

spatial attributes of sound, potentially limiting the scope for innovation due to

their well-established methodologies. This thesis acknowledges the achievements

in traditional spatial audio techniques while proposing a considered exploration

into alternative approaches, particularly within the electroacoustic music sector.

There are already a number of pioneering approaches in this ﬁeld, and although


some may initially appear rudimentary or overly ambitious, they hold the poten-

tial to drive signiﬁcant innovations in how we perceive and interact with sound.

The motivation for this master’s thesis is rooted in a desire to critically explore

and develop these experimental methodologies. This research aims to incremen-

tally contribute to the existing body of knowledge by proposing new solutions that

address less explored challenges in sound spatialization. Through this process, the

work seeks to enhance the toolkit available for artists and technologists, poten-

tially enriching the auditory experiences and expanding the creative possibilities

within the ﬁeld.

1.2 Research Paths and Objectives

This thesis outlines a structured path to the creation of a new experimental musical tool, divided into four phases: background research, theoretical innovation, technical deployment, and empirical testing. It begins with a comprehensive review of historical and contemporary applications of spatial audio technologies and spatial attributes in music, including a reevaluation of loudspeakers as active parts of musical instruments. Tools for creating spatial music⁶ are also categorized, and innovative aesthetics and techniques that go beyond traditional sound field reconstruction and panning are introduced. This foundational study establishes the theoretical framework for the Zerr* method, a novel approach to sound spatialization and spatial sound synthesis. The Zerr* method is detailed through its core concepts and specific algorithmic framework, which distinguish it from existing tools. The design of the system emphasizes ease of use and adaptability to different environments, making it suitable for diverse applications. Empirical validation is performed through extensive listening tests designed to assess the effectiveness of the system. The feedback from these tests helps to refine the system, confirming its practicality and identifying areas for improvement. The overall goal of this thesis is to develop a spatialization system that balances theoretical depth with practical usability. This system will contribute to the advancement of spatial music production technologies while maintaining an experimental focus.

⁶Spatial music is a term used extensively in this thesis to represent all musical genres that take into account spatial properties.

1.3 Structure of the Work

Section2 provides an overview of the technical and theoretical foundations un-

derlying the Zerr* system. Section3 elucidates the fundamental concept of Zerr*

and the design principles that inform its development. Section4 delves into the

implementation details of the Zerr* system. Section5 presents the results of a

listening test conducted on the Zerr* system. Finally, Section6 oﬀers a summary

of the conclusion and outlines future research directions.


Chapter 2

Technical and Theoretical Foundations

2.1 Conventional Sound Spatialization

Sound spatialization is still traditionally associated with the industry in which it was first commercially successful, namely the film industry. Stereophony was first used in 1940 in Disney’s animated film Fantasia, which was a significant commercial success (Ross, 2012). To date, the film industry has been the primary domain of sound spatialization, and as a result, a plethora of novel technologies have been devised for the cinematic context. Many of their assumptions and solutions are designed to facilitate film production. This, in turn, creates a subtle tension for musicians who use sound spatialization technologies in their music (Baalman, 2010). Commercial success has given these technologies an overwhelming advantage in development. When technologies that are not inherently music-specific are used to create music, the methodologies built into them for their original scenarios inevitably shape the result. Consequently, to a certain extent, they impede the full potential of using space as a musical element.

Due to its conceptual richness, music oﬀers more expansive developmental

possibilities compared to other, more straightforward scenes. Furthermore, music

theorists and pioneering musicians have successfully developed various theories,

techniques, and technologies for utilizing space musically, particularly within the

realm of electroacoustic music. Prior to delving into the concepts of spatialization

in music, it is essential to elucidate the conventional concepts of sound spatializa-

tion in order to achieve a lucid comprehension of the distinctions.

2.1.1 Channel-based Method

In its narrowest deﬁnition, channel-based spatialization represents the most tra-

ditional and straightforward method of sound spatialization. It is also the ﬁrst

attempt at spatialization through the assignment of sound to particular speakers

in a multi-channel sound system (Wenzel et al., 2017). The multi-channel sound

system is constrained to a limited number of predeﬁned speaker layouts, which

can be as simple as stereo (2 channels) or as complex as 9.1.2 (12 channels). The speaker layouts are frequently described using this X.Y.Z format: X represents the number of main speakers positioned around the listener at ear level, Y the number of subwoofers, and Z the number of overhead speakers. In the channel-based method, sound is often spatially rendered through the use of volume level differences or time delays between channels, which serve as spatial cues. The position of a sound is the primary concern; its movement is rarely considered.
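The X.Y.Z naming can be illustrated with a minimal sketch. The function name `parse_layout` and the returned dictionary keys are hypothetical and chosen only for this illustration; the sketch assumes that a missing Z component (as in "5.1") means there are no overhead speakers.

```python
def parse_layout(label: str) -> dict:
    """Split a layout label such as "9.1.2" into speaker counts."""
    parts = [int(p) for p in label.split(".")]
    if len(parts) not in (2, 3):
        raise ValueError(f"expected X.Y or X.Y.Z, got {label!r}")
    main, subwoofers = parts[0], parts[1]
    # Z is treated as optional here, e.g. "5.1" has no overhead speakers.
    overhead = parts[2] if len(parts) == 3 else 0
    return {
        "main": main,
        "subwoofers": subwoofers,
        "overhead": overhead,
        "channels": main + subwoofers + overhead,
    }


print(parse_layout("9.1.2")["channels"])  # 12, matching the example in the text
```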


The simplicity of this approach has resulted in its continued use in a vari-

ety of settings, including music studios and home theaters. It has become the

prevailing concept of immersive sound for the majority of users. Novel forms of

sound spatialization systems are designed to retain compatibility with conven-

tional channel-based conﬁgurations. Nevertheless, the channel-based method has

a much more expansive meaning in the context of music, which will be elucidated in Section 2.3.1.

2.1.2 Object-based Method

Object-based spatialization attempts to emulate the natural behavior of sound

sources by generating the impression of sound coming from speciﬁc directions or

moving in certain ways. Typically, such spatialization techniques work with the

point source paradigm, in which a sound source is assigned a virtual position in

the listening space. In theory, the position of the point source is independent of the

actual position of the loudspeaker, and it is thus often regarded as a true source. Object-based spatialization can be achieved through panning algorithms, such as Vector Base Amplitude Panning (VBAP) and Distance-based Amplitude Panning (DBAP), or through sound field reconstruction methods, including Ambisonics and Wave Field Synthesis (WFS).

2.1.2.1 Panning Algorithms

Panning algorithms achieve sound source positioning by calculating the relative

gains of the speakers. These calculations are based on the geometric relationship

between the loudspeakers used for generation and the target source position. The origins of the concept can be traced back to the stereophony research that began

in the 1930s (Blumlein, 1933). This research delved into the perceived eﬀects of

level diﬀerence and time diﬀerence between two loudspeakers on the localization

of a virtual sound source. Nevertheless, a number of enhancements have been im-

plemented, rendering contemporary spatial panning algorithms far more versatile

tools for sound spatialization than conventional stereo panning. Two widely uti-

lized algorithms are presented here.

Vector Base Amplitude Panning (VBAP) overcomes the limitation of

stereophony, which is exclusive to two speakers positioned in front of the listener,

by employing vectors to describe the positions of speakers and virtual sound

sources within a listening space (Pulkki, 1997; 1998; 2001). This approach extends

the stereo panning to two-dimensional and three-dimensional multiple speaker se-

tups. In order to determine the active speakers for the generation of the virtual

sound source, the VBAP algorithm selects the pair or triplet of speakers that

form the smallest angle with the direction of the virtual sound source. The precise

and ﬂexible spatialization of sound can be achieved through low-cost calculations,

particularly in high-density loudspeaker setups such as conventional ring or dome

shapes.
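The pairwise case of this idea can be sketched as follows. This is a minimal two-dimensional illustration, not the reference implementation: the function name `vbap_2d` is hypothetical, speakers are assumed to sit on a ring around the listener and are described by azimuth angles in degrees, and the gains are normalized to constant power. The active pair is found as the adjacent pair for which both gains are non-negative, which is one common way of identifying the pair that encloses the source direction.

```python
import math


def vbap_2d(source_deg, speaker_degs):
    """Return one gain per speaker for a source direction on a 2D ring."""

    def unit(deg):
        rad = math.radians(deg)
        return math.cos(rad), math.sin(rad)

    px, py = unit(source_deg)
    # Visit adjacent speaker pairs in order of azimuth.
    order = sorted(range(len(speaker_degs)), key=lambda i: speaker_degs[i] % 360.0)
    gains = [0.0] * len(speaker_degs)
    for k in range(len(order)):
        i, j = order[k], order[(k + 1) % len(order)]
        ax, ay = unit(speaker_degs[i])
        bx, by = unit(speaker_degs[j])
        det = ax * by - ay * bx
        if abs(det) < 1e-9:
            continue  # identical or opposite speakers cannot span the plane
        # Solve g_i * a + g_j * b = p for the two gains (Cramer's rule).
        gi = (px * by - py * bx) / det
        gj = (ax * py - ay * px) / det
        if gi >= -1e-9 and gj >= -1e-9:
            gi, gj = max(gi, 0.0), max(gj, 0.0)
            norm = math.hypot(gi, gj)  # constant-power normalization
            gains[i], gains[j] = gi / norm, gj / norm
            return gains
    raise ValueError("source direction is not enclosed by any speaker pair")
```

For a quadraphonic ring at 45, 135, 225 and 315 degrees, a source at 90 degrees is shared equally by the first two speakers, while the other two stay silent.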


Distance-based amplitude panning (DBAP) is a lightweight panning method

proposed to address the practical challenges of implementing sound spatialization

systems in real-world settings (Lossius et al., 2009). In such environments, it is

not feasible to assume that listeners will be situated in the optimal listening position, and conventional speaker configurations may not be viable. The algorithm derives the gain of each speaker from its distance to the virtual sound source and normalizes the gains so that the overall sound intensity remains constant, regardless of the source’s position. A further distinction between DBAP and VBAP is the assumption that all speakers are active and contribute to the perception of the virtual source. Additionally, spatial blur is incorporated to circumvent localization issues that may arise when a sound source aligns with a single speaker. The authors identify a limitation of DBAP in its ability to deal with virtual sources outside the loudspeaker array. To address this, they propose first projecting the virtual sound source onto the edge of the convex hull of the array and then adjusting the intensity to mimic the effect of a source outside the array.
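The gain computation described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the function name `dbap_gains` and its parameter names are hypothetical, positions are two-dimensional, per-speaker weights are omitted, and the handling of sources outside the convex hull is left out. The rolloff exponent and the constant-power normalization follow the formulation commonly given for DBAP.

```python
import math


def dbap_gains(source, speakers, rolloff_db=6.0, blur=0.1):
    """Return one gain per speaker for a 2D source position.

    source   -- (x, y) position of the virtual source
    speakers -- list of (x, y) speaker positions
    rolloff_db -- level drop per doubling of distance (assumed default: 6 dB)
    blur     -- spatial blur; keeps the distance above zero at a speaker
    """
    a = rolloff_db / (20.0 * math.log10(2.0))  # rolloff exponent
    dists = [
        math.sqrt((sx - source[0]) ** 2 + (sy - source[1]) ** 2 + blur ** 2)
        for sx, sy in speakers
    ]
    # Scale factor so that the squared gains always sum to one.
    k = 1.0 / math.sqrt(sum(d ** (-2.0 * a) for d in dists))
    return [k / d ** a for d in dists]
```

Unlike the pairwise selection in VBAP, every speaker receives a non-zero gain, and the nearest speaker receives the largest one.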

DBAP is particularly well-suited to scenarios where the loudspeaker

conﬁguration is determined by considerations related to artistic, architectural, or

acoustical design, providing creators with the ﬂexibility to arrange speakers in a

variety of conﬁgurations across diﬀerent physical spaces. The prevalence of the

DBAP algorithm illustrates the discrepancy between the research scenarios and

the practical applications of sound spatialization algorithms in the real world. A

number of fundamental assumptions, including the regular distribution of speakers

and ﬁxed listening positions, are not applicable in practice. This makes it chal-

lenging to apply algorithms that perform well in laboratory settings in real-world

scenarios.

2.1.2.2 Sound Field Reconstruction

Another type of object-based method is designed to reconstruct the sound ﬁeld,

rather than manipulate human perception based on psychoacoustic principles as

is the case with panning methods. Recreating the sound field ensures physical correctness, thereby enabling robust and highly accurate sound localization. The most common algorithms for reconstruction are Ambisonics and Wave Field Synthesis (WFS). These algorithms consider reconstruction

from diﬀerent perspectives, although under certain conditions, they are essentially

the same.

Ambisonics is a method of encoding the sound ﬁeld around a point in space,

capturing the directionality and intensity of sound from all directions (Malham &

Myatt, 1995). It can be used with both ring (2D) and sphere (3D) speaker setups.

This technique employs spherical harmonics to represent the sound ﬁeld, which

can then be decoded for playback over a multi-speaker setup or headphones. The

sound ﬁeld at this point is approximated using spherical harmonics of diﬀerent or-

ders. As the order of the spherical harmonics increases, the approximation becomes

more precise, allowing for enhanced spatial resolution in the reproduced sound


(Gerzon, 1973). One signiﬁcant advancement in the application of Ambisonics is

the development of Near-ﬁeld-corrected Higher-Order Ambisonics (NFC-HOA),

now more commonly referred to as Distance-coded Ambisonics (DCA) (Daniel, 2003). This refinement addresses one of the traditional limitations of Ambisonics

related to the decoding accuracy for listeners close to the speakers. NFC-HOA,

or DCA, incorporates distance information into the decoding process, optimiz-

ing sound reproduction for near-ﬁeld listening environments. This enhancement

is particularly beneﬁcial in settings where listeners may be positioned at varying

distances from the speakers, improving the accuracy and quality of the auditory

experience across a wider listening area.
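As an illustration of the encoding step, the following sketch encodes a single sample into first-order Ambisonics and computes the channel count for a given order. The function names are hypothetical, and the sketch assumes the ACN channel ordering (W, Y, Z, X) with SN3D normalization; other conventions use different orderings and scaling factors. Azimuth is measured counterclockwise from the front, and near-field correction is not included.

```python
import math


def encode_first_order(sample, azimuth_deg, elevation_deg):
    """Encode one sample as first-order components [W, Y, Z, X]."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample                                # omnidirectional component
    y = sample * math.sin(az) * math.cos(el)  # left-right
    z = sample * math.sin(el)                 # up-down
    x = sample * math.cos(az) * math.cos(el)  # front-back
    return [w, y, z, x]


def ambisonic_channels(order, dims=3):
    """Number of components needed for a given order (2D ring or 3D sphere)."""
    return (order + 1) ** 2 if dims == 3 else 2 * order + 1
```

The channel count grows quickly with the order: a third-order 3D representation already needs 16 components, whereas its 2D counterpart needs 7.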

Ambisonics provides a highly adaptable framework for the manipulation

of sound ﬁelds in the context of complete music production, encompassing the

processes of recording, manipulation, composition, and reproduction. The algo-

rithm’s open-source nature renders it a popular choice among artists for spatial-

izing sound. Nevertheless, despite improvements like DCA, the most apparent

drawback of Ambisonics remains: listeners are conﬁned to a narrow sweet spot for

high-quality reproduction, particularly when traditional Ambisonic methods are

used without near-ﬁeld correction.

In contrast, Wave Field Synthesis (WFS) is capable of producing an acoustically accurate synthesized sound field through the generation of waves that replicate the natural sound waves emitted by authentic sound sources (Berkhout et al., 1993; Theile & Wittek, 2004). This implies that there is no concept of a sweet spot: listeners perceive the sound differently as they move, just as if they were listening to a real sound source. WFS is based on Huygens’ principle, which states that the subsequent wavefront can be created by an infinite number of small sources located on the current wavefront. It employs a multitude of closely spaced loudspeakers to synthesize the sound field. Theoretically, the high density of loudspeakers allows for an accurate reproduction of sound over a larger listening area. However, the physical size of the loudspeakers limits how closely they can be spaced, which makes spatial aliasing at high frequencies inevitable.
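The link between speaker spacing and spatial aliasing can be made concrete with a commonly used rule of thumb: in the worst case, aliasing starts at roughly the frequency whose half wavelength equals the spacing between adjacent loudspeakers. The sketch below applies this approximation; the function name is hypothetical, the speed of sound is assumed to be 343 m/s, and the actual limit in a given system depends on the array geometry and the direction of the synthesized wave.

```python
def spatial_aliasing_frequency(spacing_m, speed_of_sound=343.0):
    """Approximate worst-case aliasing frequency in Hz for a speaker spacing in meters."""
    if spacing_m <= 0.0:
        raise ValueError("speaker spacing must be positive")
    return speed_of_sound / (2.0 * spacing_m)
```

With a spacing of 10 cm, this estimate yields about 1.7 kHz, which illustrates why even densely packed arrays cannot avoid aliasing in the upper part of the audible range.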

A number of practical considerations have impeded WFS from attaining significant popularity. Firstly, the algorithm necessitates the use of costly, specialized equipment with a substantial number of speakers integrated, in contrast to Ambisonics playback systems that can be constructed using conventional monitor speakers. Secondly, WFS necessitates a considerable amount of computation to perform large-scale multichannel audio processing, which remains a significant challenge on personal computers. Nevertheless, these practical considerations can arguably be addressed gradually through engineering solutions, with the theoretical advantages of WFS subsequently becoming apparent. As previously stated in Section 1.1, there are already successful commercial applications of this technology (Start, 2024).


2.1.3 Discussion

The object-based method represents an advanced abstraction of spatial sound

placement, moving beyond the straightforward logic of one channel corresponding

to one speaker. This approach necessitates a shift in mentality as it abstracts sound

objects, making it generally agnostic of the loudspeaker system. This system is

assumed to be as dense, evenly distributed, and homogeneous as possible to create

realistic and immersive listening experiences. Consequently, works utilizing the

object-based method are highly portable. Audio material and source movement

data can be stored separately and rendered to any loudspeaker setup that meets

the necessary speciﬁcations. While the individual experience and quality depend

heavily on the loudspeaker systems, they are considered interchangeable compo-

nents of the technical infrastructure.

Enumerating the advantages of the object-based method does not suggest it

is a one-size-ﬁts-all solution for sound spatialization. As discussed in the Dolby

Atmos White Paper (Dolby, 2014), sound objects are beneﬁcial for controlling

instantaneous eﬀects in movies, while ambient eﬀects, reverberations, and back-

ground music are more suitably transmitted directly to an array of loudspeakers.

Therefore, as a commercial standard, it still supports “beds” in the channel-based

tradition as a complement to objects in the rendering process.

Diﬀerent scenarios demand diﬀerent technologies. The sound spatialization

technologies discussed are primarily aimed at environments similar to cinemas,

theaters, or for research purposes in laboratories. In these settings, the core re-

quirement is to reconstruct the sound ﬁeld as closely as possible to real scenes,

providing an immersive sound experience. Historically, terms like immersive sound

or surround sound have been more prevalent in early promotions.

Conventional sound spatialization methods excel at reconstructing sound fields and accurately localizing sounds. However, when the focus shifts back to music, the perspective changes. A perfect imitation of physical sound sources does not necessarily equate to a memorable musical experience. Admittedly, the sound field simulation paradigm can significantly benefit music. The movement of sound objects introduces a new dimension of expression to musical elements. Accurate sound localization not only recreates the ambiance of live music scenes, particularly in classical music, but also offers a novel mixing approach that frees music producers from the constraints of limited panning positions and spectral bandwidth.

Furthermore, there are broader possibilities for utilizing spatial properties in

music. Spatiality, an often overlooked but inherent attribute in music, can be more

intimately linked with diﬀerent musical perspectives. The next section delves into

the historical explorations of spatial properties in music.


2.2 Early Spatial Music

De même que la musique est une dialectique de la durée et de l’intensité, le

nouveau procédé est une dialectique du son dans l’espace et je pense que le

terme de musique spatiale lui conviendrait mieux que celui de stéréophonie.

— Abraham Moles

The divergent conceptualizations of spatialization in electroacoustic music, ex-

tending beyond its original roots in the ﬁlm industry, have been thoroughly doc-

umented. In examining the origins of electroacoustic music, discussions typically

center on Musique Concrète, which originated in France, and Elektronische Musik

from Germany. Early developments in both countries already incorporated sound

spatialization as an integral aspect of musical practice. Chronologically, these

movements unfolded in a sequential manner, with signiﬁcant cross-communication

between them. However, this thesis maintains a distinction between the two be-

cause the pioneers from each movement embodied distinctly diﬀerent mindsets

regarding spatialization in music. One approach primarily viewed spatialization

as a performance tool, while the other integrated it as a fundamental aspect of

composition. Both perspectives have been pivotal in shaping the direction of sub-

sequent research. Throughout this section, the term spatial music is employed

to collectively refer to any music that involves exploration of spatial properties,

simplifying the complex taxonomy of music genres.

2.2.1 Spatialization as Performance

A particularly notable example from the early explorations is Jacques Poullin’s

work with the potentiomètre d’espace system (Valiquet, 2012). This occurred dur-

ing his time as a member of the Groupe de Recherche de Musique Concrète

(GRMC), which was organized by Pierre Schaeﬀer. The most renowned iteration

of this system, known as the pupitre d’espace (space desk) or pupitre de relief (re-

lief desk), was initially employed in live performances by Pierre Henry and Pierre

Schaeﬀer in 1951. This system featured a unique setup with four speakers: two

in front, one at the rear, and another overhead. A single-track recording was

played back as the input signal, while a performer controlled the position of the

sound in real time via a handheld transmitter. The transmitter coil interacted

with four receiver coils positioned around the performer, thereby controlling

the amplitude of the four loudspeakers (Teruggi, 2007).

Poullin believed that the innovative aspect of this system was its transfor-

mation of the listening experience, allowing sound to emanate not just from the

traditional frontal plane but from the surrounding space. However, from today’s

perspective, the contributions of this system extend far beyond that initial inno-

vation. Schaeﬀer attempted to diﬀerentiate this novel process from stereophony by

emphasizing that the objective is to generate a corresponding spatial development


for the sound rather than an exact replication. Furthermore, there were a few for-

ward-thinking remarks, as Abraham Moles notes in the quotation, “Just as music

is a dialectic of duration and intensity, the new process is a dialectic of sound in

space, and I think the term Spatial Music would suit it better than stereophony”.

The signiﬁcant contribution of this system is that it deepens the integration of

sound spatialization into live performance, allowing performers to actively shape

spatial properties as a core element of musical expression. This practice gradually

evolved into what is now known as sound diffusion (Austin & Smalley, 2000; Dack,

2001), which became central to electroacoustic music. In sound diffusion, performers,

often referred to as diffusers, dynamically sculpt the sound in real time across the

performance space.

Sound diﬀusion not only enhances the expressive potential of tape music but also

reshapes traditional concepts of audience engagement. By integrating spatializa-

tion directly into the performance, it creates a dynamic interaction among the

audio, the space, and the listener. This approach eﬀectively dissolves the conven-

tional roles of composer, performer, and audience, fostering a more immersive and

collaborative experience (Harrison, 1998).

Figure1: Pierre Henry performing with the pupitre d’espace

2.2.2 Spatialization as Composition

From the origins of Elektronische Musik, Karlheinz Stockhausen investigated the potential of sound spatialization, offering a perspective that diverged from the performance-centric approaches owing to his exposure to serialism. In 1956, he composed Gesang der Jünglinge as his first attempt to employ the spatial properties of sound in composition (Smalley, 2000). In this piece, a boy's voice was blended with generated white noise and sine tones in such a way as to obscure the distinction between them. The positioning and movement of sounds were meticulously crafted for five groups of loudspeakers, further emphasizing the serialist connection between the two groups of timbres and becoming integral to the comprehension of the work (Decroupet et al., 1998).


In his subsequent theoretical development, Stockhausen considered spatial

properties to be of equal importance to other musical elements, and thus asserted

that they should be articulated similarly (Morgan, 1975). However, he encountered

diﬃculties in serializing musical aspects beyond pitch, particularly timbre and

space, due to the lack of a clear serialist relationship among these high-dimensional

elements. To address this issue, he developed a novel approach that emphasized

the overall character of large, proportionally related groups of material, which

could also be applied to spatial properties. This approach was successfully applied

to his composition Gruppen (“groups” in German), with three orchestras located at

the front left, center, and right of the auditorium. A further example is Carré

(“square” in French), with four orchestras positioned at the four corners of a

square centered on the audience. The impression of spatial movement was created

via overlapping crescendos and decrescendos, which are embedded in the scores of

the compositions (Bates, 2009).

Figure2: Karlheinz Stockhause manipulate the rotating loudspeaker

Furthermore, Stockhausen posited that the direction of sound is of greater consequence than its distance, as the latter can be derived from musical parameters such as timbre and loudness. His treatment of sound direction as a primary compositional element is evidenced by the rotating loudspeaker mechanism he devised for the composition of Kontakte. The apparatus comprises a turntable with a loudspeaker mounted at its centre and four microphones arranged in a circle around it, which record the sound produced by the loudspeaker during playback. The rotation of the loudspeaker produces a multitude of acoustic effects that extend beyond mere amplitude changes between the four microphones. These include phenomena such as Doppler and phase shifts, which are challenging to simulate using purely electronic devices. The spatial variation of the sound was captured on a four-track recording, allowing Kontakte to be played back in any venue with a matching setup of speakers in all four directions, without the need for on-site control by a diffuser. Stockhausen's pursuit of precise control over sound direction is still evident in the current development of spatial music authoring tools.


2.2.3 Discussion

The preceding sections have examined the dimensions of sound spatialization in both performance and composition contexts. Section 2.2.1 described how performers, acting as diffusers, actively manipulate the spatial attributes of sound in real time within live electroacoustic music, thereby fostering a dynamic interaction between audio, spatial environment, and audience. This approach effectively dissolves traditional performer roles, creating a deeply immersive auditory experience. In contrast, Section 2.2.2 examined how composers integrate spatial properties directly into their compositions, emphasizing creative intent through precise control of spatial attributes. Both methods place loudspeakers in a pivotal role, transcending their traditional function as mere conduits of sound to become active, color-imparting components of the musical expression.

If the initial phase of electroacoustic music, as is often asserted, replaced the

traditional process, in which the composer writes a score that instrumental

performers then realize, with the direct manipulation of sound materials, then

the spatial control of sound represents a further

extension. In the past, the role of the instrument performer was regarded as that

of an intermediary between the composer and the listener. This meant that the

composer had no control over the performer’s interpretation of the music. Once the

composer begins to directly manipulate sound materials, there is no longer a need

to translate abstract notation into sound by performers. The only remaining iso-

lated interpreter is the playback system. The addition of loudspeakers inevitably

introduces a degree of coloration when reproducing sound. When musicians begin

to consider the distinctive attributes of loudspeakers and loudspeaker setups in

musical performance, and are able to regulate the audio playback during the per-

formance, it becomes evident that the loudspeaker can be considered an integral

component of the instrument. The subsequent section delineates various method-

ologies for employing the loudspeaker as an instrument for musical expression

within the domain of spatial music.

2.3 Loudspeakers as Musical Instruments

2.3.1 Reconsideration of Channel-based Method

Prior to a detailed examination of the manner in which speakers function as musical instruments, it is necessary to revisit a concept from Section 2.1: channel-based spatialization.

In contrast to the traditional understanding of a channel-based method, which

involves restricted speaker layouts, the term “channel-based” in spatial music is

employed to describe the one-to-one correspondence between channels and loud-

speakers. The musician must engage in a profound analysis of the utility of each

speaker and must possess direct control over each one. In this instance, the

channel-based method represents a shift in definition from a highly structured

format to a more flexible approach to thinking.

The channel-based and object-based methods of spatial music creation are,

in essence, neutral with regard to advantages or disadvantages. The decision to

employ the object-based or channel-based approach is contingent upon the speciﬁc

intent. The primary function of abstracting the concept of the sound object is to

facilitate precise control of position and movement. This is in accordance with the

discussion about Stockhausen’s use of serialism in composing spatial music.

However, the channel-based concept is more rudimentary. A useful analogy is the

distinction between a specialized cell and a stem cell: like a stem cell, the

channel-based approach carries no preconceived notions regarding its capabilities,

which makes it more likely to yield applications that diverge from the norm and

to realize a greater array of potentialities.

2.3.2 Loudspeaker Characteristics

2.3.2.1 Loudspeaker Orchestras

As is the case with different instruments in a symphony orchestra, loudspeakers possess their own characteristics and techniques of use. From the perspective of audio technology, these qualities are reflected in the frequency response curve of the speaker, its distortion rate, and other parameters. From an acoustic perspective, they are also reflected in the interaction between the positions of the speakers and the listening environment. When a multitude of speakers with disparate qualities are assembled in a specific configuration, they can achieve an effect similar to that of a symphony orchestra, because they complement each other's qualities and produce a musical experience that is not possible with a single type of instrument.

Loudspeaker orchestras, which serve as a paradigm for exploring the qualities of loudspeakers and the corresponding spatial experience, have been realized in a number of versions throughout history and up to the present. The most notable example is the Acousmonium (Desantos et al., 1997), which was introduced by François Bayle in the 1970s. The system combines loudspeakers with distinct characteristics and employs these differences in the diffusion of tape music. Another significant development in loudspeaker orchestration was the Gmebaphone, created by the Groupe de Musique Expérimentale during the late 1970s (Clozier & Olsson, 2001). In addition to investigating the spatial effects of loudspeakers in a manner analogous to the Acousmonium, the project's most significant contribution was the influence of its controller design on the advancement of tools for real-time sound spatialization control. This will be further elucidated in Section 2.4 on control tools. This tradition has been refined and expanded upon by contemporary implementations, such as the Birmingham ElectroAcoustic Sound Theatre (BEAST) (Wilson & Harrison, 2010).


Table1: Acousmonium (left) and Gmebaphone-1 (right)

2.3.2.2 Unconventional Loudspeakers

In contrast to the formation of loudspeaker orchestras, an alternative approach

involves the use of unconventionally constructed loudspeakers. Spherical speaker

arrays, such as the IKO (Zotter et al., 2017), represent a departure from the tra-

ditional center-oriented array arrangement, with a diﬀuse speaker arrangement

providing a novel solution for composing directivity of sound. The distribution

of sound in space is achieved through the use of reﬂections, with the room itself

becoming an integral part of the instrument.

Parametric loudspeakers produce audible sound by emitting modulated ultra-

sonic waves that extend beyond the upper limit of human hearing. The modulated

ultrasonic waves interact with the nonlinear properties of the air to demodulate

into frequencies that can be heard (Shi & Gan, 2010). The aforementioned phys-

ical mechanism gives rise to the highly directional characteristics of parametric

loudspeakers. Although the majority of parametric loudspeaker manufacturers are

focused on the function of avoiding disruptions in scenarios such as home enter-

tainment or online conferences, the unique spatial characteristics of parametric

loudspeakers are ideal for sound installations that aim to create a strong connec-

tion between the sound and an exact listening location (Alunno & Yarce Botero,

2017).

An alternative approach to strong directivity is to minimize the perceptibility of speakers. The OmniWave virtual speaker generates a stable, vertical phantom sound source through its OmniDrive 360-degree radiators⁷. The system's design ensures acoustical transparency, preserving the original sound quality and maintaining stability across varying room conditions. Currently, the 4DSOUND system (Oomen et al., 2016), which is based on OmniWave, is being employed in the development of spatial music and has produced a series of works and performances.

⁷ https://bloomline.com/

The X1 Matrix array, manufactured by HOLOPLOT, has demonstrated remarkable capabilities for precisely controlling the sound field based on the Wave Field Synthesis (WFS) algorithm (Start, 2024). This technology offers musicians the potential to manipulate sound and space in a seamless manner, a capability that has not previously been available. Regrettably, the majority of applications for the X1 Matrix are still confined to creating precise coverage areas and providing a consistent high-fidelity sound experience for all listeners. There are fewer examples of its use as a music creation tool.

As the preceding examples show, the use of unconventional speakers is a promising avenue for further investigation. However, there is still a paucity of research in this area. The primary reason is that these speakers are typically prohibitively expensive and not widely available, which makes it challenging for the majority of researchers to gain access to them and to conduct prolonged experiments. Most research into sound spatialization has therefore been conducted using conventional studio monitors and arrays constructed from such speakers. As stated in the preceding section on early spatial music, the investigation of the spatial position and movement of sound has long been a fundamental aspect of musical exploration.

2.3.3 Sonic Trajectories

2.3.3.1 Historical Milestone

In Search of a Concrete Music, written by Pierre Schaeffer and published in 1952, discusses how sound travels along a sonic trajectory and creates spatial depth through the contrast between stationary and manually controlled movements (Schaeffer, 2012). The initial investigations into sonic trajectories have already been discussed from both the performance and compositional perspectives in Section 2.2.

It is also worth noting the application of sonic trajectories in the history of music. One such example is Edgard Varèse's masterpiece Poème Électronique, which was presented in the Philips Pavilion, designed by Le Corbusier and Iannis Xenakis, at the 1958 Brussels World's Fair (Lukes, 1996). A highly intricate spatialization scheme utilizing 350 loudspeakers was devised for this composition. All of the loudspeakers were integrated into the Philips Pavilion as a component of its architectural design. They were positioned to create a series of trajectories that align with the distinctive hyperbolic paraboloid structure of the pavilion, so that sounds could traverse specific pathways to create the illusion of sonic trajectories. In 1958, when the pavilion was constructed, there was no established technology for object-based spatialization in three-dimensional sound space. Nevertheless, the pioneering architects and musicians spared no expense to achieve their desired result, creating an immersive sound experience that transformed the entire pavilion into a musical instrument. Following the conclusion of the Expo, the pavilion was dismantled. Since then, a great deal of multifaceted research has been conducted on it, including attempts to virtually reconstruct the integral experience (Lombardo et al., 2009). While the Poème Électronique reached an unprecedented level in combining music and space, this kind of large-scale engineering feat, which could only have been realized in a specific period of history, remains very difficult, if not meaningless, to replicate today.

Figure3: Sonic Trajectory for Poème Électronique

2.3.3.2 Current Direction and Limitation

A significant proportion of subsequent research into sonic trajectories has taken a more pragmatic approach. A tendency has emerged to explore solutions that are more portable than site-specific sound spatialization. Such portable solutions are characterized by more regular, precisely specified speaker array structures, exemplified by the 8-channel ring. Furthermore, there is a demand for underlying technology that realizes sonic trajectories in space more precisely. This topic is related to the discussion in Section 2.1.2 about research interests in the field of sound field reconstruction. The techniques outlined in Section 2.1 represent the fundamental tools utilized by researchers in this field.

Similarly, a significant proportion of software, hardware, and user interfaces designed for spatial music are oriented towards the control and characterisation of sonic trajectories. Whether this focus is justified is difficult to determine objectively. In my view, as already argued in Section 2.1.3, accurate control of sound spatialization does not necessarily equate to a superior spatial musical experience.

It is crucial to keep in mind that a sonic trajectory remains an abstract concept. The trajectory of a sound object is not as intuitive as one might expect. It is challenging for the human ear to accurately recognize the trajectory of a sound object without the aid of visual information (Schumacher et al., 2021). Even when there is only a single sound source, people already experience difficulties in distinguishing whether a sound comes from the front or the back. The capacity to perceive sound trajectories is significantly diminished when multiple objects are in motion in the same space. Even in the aforementioned Poème Électronique, which employs physical loudspeaker routes to convey movements in a vast space, the trajectories are not perceived with complete accuracy, and the work relies on the input of other sensory modalities. Despite the emphasis on sound, the work is, in fact, a Gesamtkunstwerk, a comprehensive artwork encompassing architecture, lighting, film, and music.

2.4 Spatial Sound Control

In considering the contemporary concept of musical instruments, the use of speakers and speaker arrays as described in Section 2.3 does not realize the complete process of interaction between a musical instrument and a musician. Following the models proposed by Wanderley (2001) and Magnusson (2019), a musical instrument can be defined as comprising three fundamental elements: a gestural controller, a mapping engine, and a sound engine. A simplified diagram is presented in Figure 4. Speakers and speaker arrays primarily serve the function of the sound engine. The remaining two elements are addressed in this section.
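The three-element model can be made concrete with a short sketch. The code below is only an illustration of the controller, mapping, and sound engine chain under simplifying assumptions; the class `SpatialInstrument`, the fader-to-azimuth mapping, and the cosine-weighted four-speaker ring are hypothetical and are not taken from Wanderley (2001) or Magnusson (2019).

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List

Params = Dict[str, float]


@dataclass
class SpatialInstrument:
    """Three-stage chain: gestural controller -> mapping engine -> sound engine."""
    read_gesture: Callable[[], Params]        # gestural controller
    map_gesture: Callable[[Params], Params]   # mapping engine
    render: Callable[[Params], List[float]]   # sound engine (returns loudspeaker gains)

    def step(self) -> List[float]:
        """Run one control cycle through all three stages."""
        return self.render(self.map_gesture(self.read_gesture()))


def fader_to_azimuth(gesture: Params) -> Params:
    """Hypothetical mapping: a fader in [0, 1] sweeps the source once around the ring."""
    return {"azimuth_deg": 360.0 * gesture["fader"]}


def ring_gains(params: Params,
               speaker_azimuths_deg=(0.0, 90.0, 180.0, 270.0)) -> List[float]:
    """Toy sound engine: cosine-weighted gains, clipped at zero, for a four-speaker ring."""
    az = math.radians(params["azimuth_deg"])
    return [max(0.0, math.cos(az - math.radians(s))) for s in speaker_azimuths_deg]


if __name__ == "__main__":
    # A fixed fader value stands in for a real hardware controller.
    instrument = SpatialInstrument(lambda: {"fader": 0.25}, fader_to_azimuth, ring_gains)
    print([round(g, 3) for g in instrument.step()])
```

Keeping the three stages as separate callables mirrors the point made above: the loudspeaker array only fills the last slot, while the controller and the mapping can be exchanged independently.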

Figure4: Interaction schematic between instrument and musician

In the early history of spatial sound control, controllers were conceived primarily as physical hardware, such as the pupitre d’espace mentioned in Section 2.2.1. Today, spatial sound controllers are predominantly conceived as software, or as a combination of hardware and software. The advent of multi-channel audio transmission protocols, such as MADI (Lidbetter, 1988) and Dante⁸, has significantly lowered the threshold for developing spatial audio applications.

⁸ https://global.audinate.com/meet-dante/what-is-dante


Software can be categorized based on the environment in which it operates. One category includes modules or libraries used in audio programming environments, such as Max/MSP⁹, Pure Data (Puckette et al., 1996), and SuperCollider (McCartney, 2002). Another category consists of plugins for commercial Digital Audio Workstations (DAWs) like Reaper¹⁰, Pro Tools, and Ableton Live. Among these, Reaper is particularly favored for spatial music creation due to its robust support for multi-channel audio and the availability of a free evaluation version. In addition, there are standalone applications and web browser-based control software.

⁹ https://cycling74.com/products/max
¹⁰ https://www.reaper.fm/

The characteristics of a tool are influenced by the environment in which it is used. Tools for audio programming environments are designed with a greater emphasis on real-time control and performance, as well as the capacity to carry out a greater number of experimental trials. Conversely, the toolchain employed in a digital audio workstation is more systematic in nature, with the objective of providing stable and controlled spatialization during the production process. Standalone applications are more effective in fully realizing the design concept of the developer. In addition, relatively recent developments in web-based technologies have enabled collaborative multi-user operation, which is challenging to achieve in other environments. This aspect is beyond the scope of this thesis; for further information, see related studies (Barbosa, 2003; Coler et al., 2020; Leslie et al., 2010).

This section adheres to the categorization established in the discussion of early spatial music. Accordingly, tools designed for composition and for performance are discussed separately. Since many tools can be used for both tasks, the following subsections assign each tool to the principal context in which it is employed.

2.4.1 Composition Oriented

In a broader sense, all of the scores and notations used in the early spatial music explorations to facilitate sound diffusion could be considered composition-oriented tools. Scores and notations are arguably not entirely obsolete, given that there is little difference between scripting spatialization schemes and manually drawing automation lines in a digital audio workstation (DAW) when using plugins that follow an object-based spatialization paradigm, such as IEM-Ambisonics¹¹, SPARTA¹², and the Dolby Atmos Renderer¹³. Although the majority of researchers are reluctant to acknowledge it, these remain the most prevalent tools by which most musicians are introduced to composing spatial music.

¹¹ https://plugins.iem.at/
¹² https://leomccormack.github.io/sparta-site/
¹³ https://professional.dolby.com/product/dolby-atmos-content-creation/dolby-atmos-renderer/


One notable distinction is that the early practice of sound diffusion required the control of only a single track of tape music. In that situation, hand-drawn scores and notations, in conjunction with manual control, remained a viable approach. Today, spatial music creation requires the simultaneous control of a multitude of sound objects (object-based view) or speakers (channel-based view). Direct manual control of all of them is no longer feasible, and manually defining all spatial parameters during the composition process has become an extremely tedious task. The fundamental objective of authoring tools is to reduce this workload, which in turn frees composers to devise more creative methods of composition.

2.4.1.1 Trajectory Editing

The prevailing trend of object-based spatialization has given rise to a plethora of assistive authoring tools that aim to streamline the editing of the spatial trajectories of sound objects. For example, the early graphical spatial trajectory editing software NeXTStep (Todoroff et al., 1997), implemented on the NeXT Computer, offered a multitude of commonly used presets for 2D and 3D trajectories, in addition to a graphical user interface for parameter editing. The software also provided an alternative way to create spatial trajectories by connecting to other devices via a MIDI interface, notably the direct recording of spatial gestures via a wearable controller, the Data Glove (Harada, 1992; Sturman & Zeltzer, 1994). In a similar vein, Holophon, released the following year, offers 2D graphical trajectory editing functions in addition to a rich set of algorithm-based trajectory generation functions (Pottier, 1998). Holophon has since undergone further development¹⁴. In addition to the original HoloEdit graphical editor, HoloPad, an iPad application, has been introduced for controlling DBAP-based sound spatialization (Bascou, 2013). It duplicates a single-channel input according to the number of fingers touching the screen and the pressure of each finger, then localizes the resulting sound objects at positions defined by the speaker array setup and the finger positions. Numerous analogous studies on the evolution of graphical user interfaces exist and cannot be exhaustively enumerated or analyzed here (Carpentier, 2015; Dilger, 2013; Thiébaut, 2005).

¹⁴ https://en.gmem.org/holophon
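To make the DBAP-based localization step more tangible, the following sketch computes loudspeaker gains for one source position, for instance one finger position on a touch surface. It assumes the commonly published formulation of Distance-Based Amplitude Panning, in which each gain is inversely proportional to a power of the source-to-speaker distance and the gains are normalized to constant total power. The function name, the 6 dB rolloff, the blur value, and the square layout are illustrative assumptions and do not describe HoloPad's actual implementation.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dbap_gains(source: Point, speakers: Sequence[Point],
               rolloff_db: float = 6.0, blur: float = 0.1) -> List[float]:
    """Distance-based amplitude panning gains for one source in 2D.

    rolloff_db: level drop per doubling of distance (assumed default 6 dB).
    blur: small positive offset that keeps distances non-zero when the source
          coincides with a speaker (must be > 0 in this sketch).
    """
    a = rolloff_db / (20.0 * math.log10(2.0))  # distance exponent
    dists = [math.sqrt((sx - source[0]) ** 2 + (sy - source[1]) ** 2 + blur ** 2)
             for sx, sy in speakers]
    # Normalization constant so that the squared gains sum to one.
    k = 1.0 / math.sqrt(sum(1.0 / d ** (2.0 * a) for d in dists))
    return [k / d ** a for d in dists]


if __name__ == "__main__":
    # Hypothetical square setup; any arbitrary layout works the same way.
    square = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)]
    for finger in [(0.0, 0.0), (0.9, 0.9)]:
        print(finger, [round(g, 3) for g in dbap_gains(finger, square)])
```

Because the gains depend only on distances, the same function applies to irregular loudspeaker layouts without any assumption about a central listening position.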

It is important to note that the aforementioned trajectory editing tools typically

do not incorporate specific sound spatialization algorithms and, as a result,

are not directly related to audio processing. The underlying sound spatialization

technique employed in conjunction with these tools may be any algorithmic

process, including those mentioned in Section 2.1.2, or any algorithm with

a similar functionality. The trajectories generated by these systems are typically

interpreted as control signals that indicate the position of the sound object. Some

software applications utilize more general transmission protocols, typically MIDI

(Rothstein, 1995) in the early stages of development and OSC (Wright, 2005) in

the contemporary era. Other software programs employ specialized sound descrip-

tion formats, including SDIF (Wright et al., 1999), ASDF (Geier et al., 2010), and

SpatDIF (Peters et al., 2013).
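
As a minimal sketch of how a trajectory can be turned into a stream of position control signals, the following Python example samples a circular path and packs each position into a binary OSC message. The address /source/1/xyz, the function names, and the use of Cartesian coordinates are assumptions made for illustration; they are not taken from any of the tools cited above, and an actual renderer defines its own address space.

```python
import math
import struct


def osc_string(text):
    """Encode an OSC string: ASCII bytes, null-terminated, padded to a multiple of 4."""
    data = text.encode("ascii") + b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def osc_message(address, *floats):
    """Build a binary OSC 1.0 message whose arguments are all float32."""
    type_tags = "," + "f" * len(floats)
    payload = b"".join(struct.pack(">f", value) for value in floats)
    return osc_string(address) + osc_string(type_tags) + payload


def circle_trajectory(radius, steps):
    """Sample one revolution of a horizontal circle as (x, y, z) positions."""
    points = []
    for i in range(steps):
        angle = 2.0 * math.pi * i / steps
        points.append((radius * math.cos(angle), radius * math.sin(angle), 0.0))
    return points


# "/source/1/xyz" is a hypothetical address, not one defined by any cited tool.
messages = [osc_message("/source/1/xyz", *p) for p in circle_trajectory(1.0, 8)]
```

Each element of messages could then be sent as one UDP datagram to whichever spatialization engine interprets the positions, which is what keeps the trajectory editor independent of the rendering algorithm.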

2.4.1.2 Composition Toolchain

The editing of trajectories represents a minor aspect of spatial music compositions.

Many long-term projects are dedicated to the development of a comprehensive

toolchain, encompassing the editing of spatial properties (trajectory, shape, time-

stamp), the generation of multichannel audio, and numerous other functions. A

project that provides only the underlying audio processing hardware or software

does not fit the categorization criteria used in this section. In essence,

these systems are technically neutral and do not explicitly incorporate musical

tendencies. For instance, IEM-Ambisonics is typically viewed as a set of DAW

plugins designed for composition. However, its standalone builds and OSC control

features make it an adaptable and engaging software tool. In contrast, the Spat

system developed by IRCAM (Carpentier, 2018; Jot & Warusfel, 1995) serves as

an external library in Max/MSP. It oﬀers real-time control and improvisation

capabilities, without limiting its integration as a foundational support in other

composition tools. It is more accurate to refer to such projects as toolkits rather

than toolchains. A number of similar toolkits are available, including SoundScape

Renderer (Geier et al., 2012), the Ambisonic Toolkit (ATK; https://www.ambisonictoolkit.net/),

and numerous others.

IanniX, named in honor of Iannis Xenakis, is not a software program designed specifically

for spatial music (Coduys & Ferry, 2004). The team that created it defines

IanniX as “a graphical open-source sequencer for digital art” (https://www.iannix.org). However, IanniX

is well suited for composing complex spatial sound patterns, and is often cited in

papers on spatial authoring tools (Garcia et al., 2017; Jaroszewicz, 2015). IanniX

abstracts four core elements that comprise the sequential patterns: curves, trajec-

tories, triggers, and cursors. This is consistent with the design concepts of the tool

that will be presented in this thesis. The four fundamental elements can interact

with one another in a multitude of ways, thereby conferring upon the system a

high degree of ﬂexibility in the generation of complex control signals. The control

signals, which utilize the OSC protocol, can be mapped to the sound spatialization

algorithms, such as the toolkit previously described. The IanniX software is highly

sophisticated, yet its ﬂexibility and programmability also present signiﬁcant chal-

lenges for users. The necessity of user-deﬁned settings, for instance, makes it more

diﬃcult to use.

Zirkonium was developed by the Center for Art and Media Karlsruhe (ZKM)

with the primary objective of composing for the Sound Dome at ZKM Kubus

(Miyama & Dipper, 2016). Nevertheless, as an extensively developed software, it

can be utilised in a variety of settings with disparate loudspeaker systems. It oﬀers

a comprehensive toolchain for the creation of spatial music, encompassing but not

limited to the following capabilities: the generation of 2D or 3D loudspeaker setup

proﬁles, the creation and editing of parameter-based trajectories, the visualiza-

tion of real-time sound environments, the implementation of sound spatialization

algorithms (e.g., VBAP, HOA), and the integration of plugins for collaboration

with other software, synchronizing videos (ZirkVideoPlayer), and remote control

(ZirkPad). The initial release of the Zirkonium MK1 in 2006 marked the inception

of a novel concept: the integration of mathematical event-based sound movements,

rotations, and timing. This innovation served as a foundational framework for

subsequent development.

Like Zirkonium, other composition toolchains were designed for specific

venues. Examples include the SeamLess system for the TU Studio and Humboldt Forum

Listening Room (Coler et al., 2021) and the complete spatial music solution provided

by 4DSOUND (https://4dsound.net/), which includes both control software and speaker systems.

The latter encompasses all the requisite functionality, covering both software and

hardware (Cross, n.d.; Oomen et al., 2016).

Another signiﬁcant undertaking is the development of a series of spatial mu-

sic composition tools based on the visual programming language for computer-

assisted music composition, OpenMusic (Bresson et al., 2011). Garcia, Bresson,

and other research colleagues have been engaged in the active exploration of spa-

tial music composition workﬂows and tools based on OpenMusic for several years

(Agger et al., 2017; Bresson, 2012; Bresson et al., 2017; Jérémie Garcia, Jean

Bresson, & Carpentier, 2015; Jérémie Garcia, Jean Bresson, Schumacher, et al.,

2015; Garcia et al., 2016; 2017). They have developed SPAT-SCENE, an auxil-

iary module in OpenMusic for interacting with the Spat toolkit for timelines and

spatio-temporal speciﬁcation, the 3DC module for displaying spatial trajectories,

and Trajectoires, a mobile application for real-time drawing and control of spatial

trajectories by ﬁnger touch, among other applications. In addition to the tradi-

tional spatial music composition toolchain that has been the subject of current

discussion, more experimental workﬂows are proposed. These will be highlighted

in Section2.5.

2.4.2 Performance Oriented

Performance has been central to the development of spatial sound control tools,

from the earliest sound diffusion practices to the present day. However, traditional

sound diﬀusion, with its roots in tape music, is now considered somewhat anti-

quated due to its reliance on ﬁxed sound materials and inherent limitations. It is

challenging for a diﬀuser to be fully dynamic, akin to a conventional instrumental

performer, given that it is a time-varying system that they are playing with. The

tools utilized for sound diﬀusion-type practice have gradually transitioned from

performance-oriented types to those oriented towards composition, as discussed

in Section2.4.1.

Contemporary performance-oriented spatial sound control tools emphasize

real-time interaction, generation, and collaboration, alongside essential control

functions. These tools are heavily inﬂuenced by the improvisational music mind-

set, where creative processes unfold in real-time. Unlike traditional sound diﬀusion

—where spatial properties are manipulated live but the audio is pre-composed—

modern performers must simultaneously manage both sound and space.

This section is structured into three parts: the ﬁrst provides a historical

overview of recognized spatial music performance devices; the second introduces

recent spatial sound control instruments for real-time performance; the third oﬀers

a concise overview of a particular branch that utilizes automation in performance,

which then leads into the subsequent discussion.

2.4.2.1 Controllers in History

As previously discussed in Section2.3.2.1, it is pertinent to highlight the GME-

Baphone, which was developed with the objective of regulating the loudspeaker

orchestra (Clozier & Olsson, 2001). To be precise, the GMEBaphone is an instru-

mentarium. It is a complete system containing all the necessary equipment, from

speaker arrays to signal processing units and control consoles. The concept of

GMEBaphone here refers speciﬁcally to the consoles that have been used in more

than two decades of iterative development, starting with the GMEBaphone 2 and

continuing with the GMEBaphone 6/Cybernéphone.

The device appears to be more akin to a mixing console in the contemporary

sense than an instrument. The method of operation is to regulate the volume of

a speciﬁc speaker or group of speakers by interacting with a set of fader boards.

This can be regarded as the core playing style of common sound diﬀusion. How-

ever, what distinguishes GMEBaphone from a mixing console is its programmable

mapping engine, which is designed speciﬁcally for GMEBaphone loudspeaker sys-

tems. This enables performers to control the loudspeaker orchestra in real time

with greater accuracy and eﬃciency. This style is still evident in the contemporary

BEAST system (Wilson & Harrison, 2010).

To facilitate performances at Expo 1970 in Osaka, a spherical

sound controller was constructed to control the spherical speaker array within the

German Pavilion (Brech, 2015). The controller comprised 50 sensor buttons, each

of which was mapped to a speciﬁc loudspeaker group. The sound direction was

altered when a button was pressed. This instrument enabled the speciﬁc volume

control to be carried out by the control circuit, thus moving away from the sliding-

type gesture to the push-type gesture, which facilitated faster and denser control

signals.

In 1984, Luigi Nono’s Prometeo premiered, featuring an instrument called the

Halaphon (Brech et al., 2015). This hybrid analog-digital spatialization system

could route input signals to the speakers during the performance. The Halaphon

originated as a digital musical instrument that enabled more complex control logic.

Following this, spatial sound controllers with purely analog circuits were oﬃcially

consigned to history.

2.4.2.2 Spatial Instruments

In recent years, there has been a notable surge in the development of instruments

for real-time spatial music performance. A comprehensive review by Pysiewicz

and Weinzierl has already been conducted (Pysiewicz & Weinzierl, 2017), thus

obviating the need for this thesis to reiterate the list of relevant works for analy-

sis. In their review, the instruments are classiﬁed according to three dimensions:

the controller type/interface, the controlled spatial parameters, and the scope of

control. The controller type/interface refers to the manner in which the controller

interacts with the player. The controlled spatial parameters were classiﬁed into

three categories based on the proximity to spatial properties, ranging from the

basic spatial location to the acoustic properties of the listening environment. The

scope of control refers to whether the controller solely provides functionality for

control signal generation or whether it involves a sound synthesis system. It is

strongly advised that readers consult the original article for a more comprehensive

analysis.

It is important to note that this review explicitly addresses the spatial sound

control tools discussed here in the context of Digital Musical Instruments (DMI) and

Human-Computer Interaction (HCI). Therefore, software for automatic spatial

sound control without an explicit user interface or physical controller, as well as

algorithms, are not included in this review.

2.4.2.3 Matrix-based Diffusion

Most composition tools, as noted in Section 2.4.1, incorporate automation features

such as automated trajectory control and sequencer-style automation. While au-

tomation in composition feels intuitive, its role in performance raises concerns

due to the performer’s need for greater control. However, human capability to

manage control is inherently limited. Introducing more automation can alleviate

the burden of repetitive tasks and reduce the complexity of multitasking during

performances. Simultaneously, it can enhance the playability and dynamic inter-

action with the music.

The use of faders for group control, as discussed in Section 2.4.2.1, marks

an early form of performance automation. Many contemporary tools have tran-

sitioned from traditional sound diﬀusion methods to a point source paradigm,

where automation is predominantly built on object-based sound spatialization

technology. Despite this trend, some software still seeks to oﬀer more adaptable

automation within the traditional sound diﬀusion framework. These programs of-

ten bypass object-based spatialization techniques in favor of matrix-based control

logic, allowing for greater ﬂexibility in automation without departing from estab-

lished practices.

The DM8 system was ﬁrst proposed by Barry Truax in his article on the

concepts of “space in sound” and “sound in space” (Truax, 1998). This is a ma-

trix-based system for mapping eight input signals to eight output signals. The

user has the option of manually assigning mapping relationships between inputs

and outputs, either statically or dynamically. During a performance, the user can

also cross-fade between eight diﬀerent mapping patterns, thus creating complex

variations in sound diﬀusion. Similarly, the M2 system employs a matrix-based

mapping architecture for the software component and a straightforward 32-fader

architecture for the control hardware (Mooney et al., 2004). The software does not

preset any speaker array structure or fader mapping. Instead, it provides a ﬂexible

mapping editing function that allows users to customize the input/output mapping

mode and fader control mapping according to their own needs. The M2 system

was originally designed to provide the greatest possible freedom of expression in

the context of improvisation, with the objective of facilitating the discovery of new

compositional avenues through improvisation. Resound represents a novel gener-

ation of matrix-based sound diﬀusion software that has been developed through

the accumulation of experience with the M2 system over an extended period of

time (Mooney & Moore, 2007; 2008; Stefani & Mooney, 2009). It presents a series

of creative mapping strategies, integrated as presets, which collectively constitute

a highly playable semi-automatic control device.
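
The matrix-based control logic described above can be illustrated with a minimal Python sketch: two 8-by-8 gain matrices serve as mapping patterns, and a single parameter t cross-fades between them. This is not the DM8, M2, or Resound implementation; the function names, the linear interpolation law, and the two example patterns are assumptions chosen for brevity.

```python
def crossfade_matrices(a, b, t):
    """Linearly interpolate two gain matrices; t=0 gives a, t=1 gives b."""
    return [[(1.0 - t) * ga + t * gb for ga, gb in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def apply_matrix(matrix, inputs):
    """Mix one frame of input samples; matrix[o][i] is the gain from input i to output o."""
    return [sum(g * x for g, x in zip(row, inputs)) for row in matrix]


def identity(n):
    """Pattern that routes input i to output i."""
    return [[1.0 if o == i else 0.0 for i in range(n)] for o in range(n)]


def rotated(n, shift):
    """Pattern that routes input i to output (i + shift) mod n."""
    return [[1.0 if o == (i + shift) % n else 0.0 for i in range(n)] for o in range(n)]


pattern_a = identity(8)
pattern_b = rotated(8, 1)
frame = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # signal on input 0 only
halfway = apply_matrix(crossfade_matrices(pattern_a, pattern_b, 0.5), frame)
```

At t = 0.5 the signal on input 0 is shared equally between outputs 0 and 1. A fader mapped to t would therefore move the whole routing pattern at once, rather than a single source position; an equal-power law could replace the linear one if constant loudness is required.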

Admittedly, this particular oﬀshoot appears somewhat incongruous within

the prevailing mainstream trends. Moreover, the continued reliance on the fader

as the controller is somewhat uninspiring. Nevertheless, it is evident that this type

of system continues to possess intrinsic value, as it facilitates the accomplishment

of tasks that would be challenging to achieve within object-based spatialization,

as discussed Section2.3.1.

2.4.3 Discussion

The role of faders extends beyond their traditional hardware implementations. As

demonstrated in the Resound system, faders do more than merely adjust volume

and sound trajectories; they fundamentally inﬂuence the “behaviors” of the spa-

tialization system. By reconceptualizing faders as tools for parameter mapping

—from simple one-to-one mappings to intricate conﬁgurations—they enable deep

and nuanced manipulation, oﬀering richer and more complex control over the sys-

tem’s behaviors. Embracing this expanded mindset allows for the exploration of

a broader range of control algorithms for sound spatialization. In this approach,

performers or composers inﬂuence the behavior of these algorithms rather than

directly manipulating the spatial properties of individual sound materials. This

method facilitates the realization of unique spatial eﬀects that alter not only

the spatial properties but also the timbre of sound materials. Such algorithmic

control fosters a more abstract form of spatial music aesthetics that provides a

deeply integrated sound-spatial experience, going beyond traditional localization

and movement of sound sources. This evolution blurs the lines between composing

sound timbre and spatiality. The term “spatial texture” is used throughout this

thesis to describe all related topics. The next section will delve deeper into spatial

texture from both theoretical and technical perspectives.

2.5 Spatial Texture

Most tools and algorithms discussed so far adhere closely to the traditional objec-

tives of sound diﬀusion. In these contexts, spatial properties of sound are lever-

aged to enhance other intrinsic attributes such as dynamics and timbre, which

are largely determined by recording or synthesizing methods. Although spatial

attributes are recognized as critical elements in these works, they often play a

secondary role or are considered in later stages of composition or performance.

This relatively loose integration limits the utilization of spatiality to a macro-scale

musical structure.

In the realm of electroacoustic music, the focus has traditionally been on tim-

bre, with scholars shifting their attention from broad sonic features to the more

intricate details of sound texture. This trend highlights a growing interest in the

nuanced aspects of sound perception and manipulation. Such a concentrated ex-

amination of ﬁner details has fostered a rigorous exploration of spatial texture,

deﬁned as the integration of spatial attributes with sound’s textural qualities. The

distinction between sound texture and spatial texture is becoming increasingly

subtle, reﬂecting a broader trend towards a more integrated understanding of

sound’s spatial and timbral dimensions.

This section will commence with an introduction to the pertinent theories,

after which it will proceed to a categorization and analysis of the various speciﬁc

creation techniques.

2.5.1 Relevant Theories

2.5.1.1 Spectromorphology & Spatiomorphology

Spectromorphology, a term coined by Denis Smalley (Smalley, 1997), oﬀers a de-

tailed lens through which to describe and analyse the listening experience by fo-

cusing on the interaction between sound spectra (spectro-) and the ways in which

these sounds change and are shaped over time (-morphology). This approach al-

lows for a nuanced examination of the temporal and spectral structure of sounds

as they evolve, providing a vocabulary for discussing the otherwise abstract expe-

rience of listening to electroacoustic compositions. Spatiomorphology, introduced

in the same paper, further extends these concepts by incorporating the spatial

dimensions of sound.

The concept of “source bonding” plays a pivotal role in Smalley’s theory,

which focuses on the listener’s innate inclination to connect sounds with their

perceived origins or causes. This notion intricately weaves the extrinsic charac-

teristics of sound—its source and the method of its creation—with the listener’s

experience, whereby the origin of a sound, when obscured or abstracted, shifts the

listening focus towards the sound’s inherent qualities. This shift in focus becomes

particularly compelling in the context of spatial attributes of sound. When the

external characteristics, such as the source’s movement or position, become indis-

tinct, the listener’s attention is naturally drawn to the intrinsic spatial qualities

of the sound. This emphasis on the intrinsic spatial properties aligns closely with

Smalley’s exploration of spatial texture, understood as the revelation of spatial

perspective over time. It is not merely about the movement or position of sound in

space but about how the listener perceives and interprets the spatial dimensions

and qualities of sound as it unfolds. The absence or abstraction of clear external

sources encourages a deeper engagement with these spatial textures, allowing lis-

teners to appreciate the subtleties of spatial expression and the nuanced interplay

between sound and space.

The theoretical frameworks of spectromorphology are of paramount impor-

tance, providing structured approaches to the analysis and understanding of the

complex interplays between sound’s spatial and timbral properties. Subsequent

explorations have been to a greater or lesser extent inﬂuenced by this theory, de-

spite their diﬀerent foci.

2.5.1.2 Textural Composition

Textural composition is a method of creating real-time computer music based

on acousmatic and stochastic concepts, manifested as sound metaobjects (Hagan,

2017). It relies on agile sounds that do not require conventional trajectory-based

spatial techniques. It serves as a bridge linking tape music and real-time computer

music, blurring the lines between sound objects and soundscapes, as well as point-

source and trajectory-based spatialization. In textural composition, the sonic tex-

ture is given precedence over other musical elements, with the aim of creating

distinct spatial and temporal experiences. The intention is to provide listeners

with a broad, immersive auditory experience characterised by slow, environmen-

tal shifts in time. The underlying philosophy of textural composition draws upon

the aesthetic qualities of sound objects found in acousmatic music, as well as the

expansive sound masses inﬂuenced by Iannis Xenakis. The aesthetic concepts pre-

sented in the textural composition are in alignment with the potential outcomes

achievable through the system outlined in this thesis.

2.5.2 Creation Techniques

For the purpose of introducing techniques for spatial texture, the classiﬁcation

method by Lynch (Lynch & Sazdov, 2011) serves as a reference. This method di-

vides spatial texture techniques into three categories according to the underlying

implementation logic: Spectral Spatialization, Spatial Granulation, and Panning &

Decorrelation. This thesis will reorganize the secondary classiﬁcations under the

broad categories and add some new examples from recent years.

2.5.2.1 Spectral Spatialization

The ﬁeld of frequency domain analysis and processing has been a pivotal tool in

the understanding of sound characteristics and the processing of sound details. The

analysis of the spectrum can be employed as a method of intuitively understanding

and manipulating sound. The majority of frequency domain processing in music is

conducted through the use of the Fast Fourier Transform (FFT) algorithm. Some

sophisticated audio analysis or Music Information Retrieval (MIR) tools employ

alternative frequency domain transform algorithms that are not addressed here.

Spectral spatialization may be conceptualized as a process wherein the spectrum

components are regarded as the fundamental basis for spatialization. Regardless

of the speciﬁc details of the various spectral spatialization algorithms, the funda-

mental concept is the same: the application of distinct spatialization treatments

to each group of spectrum components.

Normandeau put forth a method he referred to as timbre spatialization (Nor-

mandeau, 2009). The underlying concept is that by directly assigning diﬀerent

frequency components of the sound to diﬀerent loudspeakers, the entire spectrum

of sound is virtually reassembled in the listening space, resulting in a sound that

has an integral spatiality within the original timbre. Such eﬀects can be achieved

by assigning diﬀerent bandwidth ﬁlters to each speaker. The endeavors of the

loudspeaker orchestra, which employ a variety of speakers, exhibit a certain degree

of consistency in their approach to timbre spatialization. This is because we can

simply understand the diﬀerent types of speakers as consisting of perfect playback

speakers and pre-ﬁlter setups. Timbre spatialization can also be manipulated in

a multitude of ways, as evidenced by Garcia et al.’s workﬂow, which combines

bandpass ﬁlter banks and sound movement (Jérémie Garcia, Jean Bresson, Schu-

macher, et al., 2015). A patch was constructed in OpenMusic for the purpose

of distributing an audio signal to a ﬁxed eight-channel loudspeaker ring. Each

loudspeaker is assigned a speciﬁc band-pass ﬁlter. The sound shifts between the

speakers, thereby inducing a coherent spectral and positional change.

The method of analysis/re-synthesis spatialization is not fundamentally dis-

tinct from timbre spatialization. The only diﬀerence is that, in this instance, the

frequency component manipulation is no longer conducted via a ﬁlter bank, but

rather through direct control of the frequency bins subsequent to the FFT trans-

form. There are additional, more elaborate techniques that build upon this foundation.

One technique is the extended spectral delay effect, whereby the delayed, resynthesized

sound from individual FFT bins is sent to individual channels or

sound objects (Kim-Boyle, 2008). Another approach is to generate spatialization

patterns through analysis and mapping of spectral properties to spatialize another

input signal. Further creative techniques can be found in the original papers

(Jaroszewicz, 2015; Torchia & Lippe, 2004). Other, more complex techniques involve

the use of particle systems, as proposed by Kim-Boyle, which are categorized

in Section 2.5.2.2 (Kim-Boyle, 2008).

Wave terrain synthesis represents a relatively self-contained multimodal

sound synthesis method, or alternatively, it can be interpreted as a kind of ex-

tended wavetable synthesis. In contrast to the majority of algorithms, which are

exclusively audio-centric, wave terrain synthesis employs graphical multidimen-

sional surfaces analogous to topographical maps as the foundation for sound gen-

eration (James, 2005). James has been engaged in an extensive investigation of

wave terrain synthesis as a means of spectral spatialization for an extended period

(James, 2012; 2015; 2016). The term wave terrain spatialization has been selected

to encapsulate the sound spatialization approach that employs this concept. The

fundamental concept is to ﬁrst construct a topographical map as the target for

spectral distribution, then map the height information in this terrain with the de-

sired spectral components, and ﬁnally employ the enhanced spatial panning algo-

rithm with audio-rate control signal to achieve spatialization. The precise speciﬁcs

of the implementation vary from article to article. The explanation provided here

is largely based on the version from 2015 (James, 2015). This version explicitly

states in the title that the algorithm is inspired by the theories related to spec-

tromorphology. The distinguishing feature of this method is its inversion of the

spatialization method with respect to the generation of the spectral distribution.

This results in a representation that is both straightforward and visually appeal-

ing. The image of the terrain surface and the corresponding image of the sound

trajectory permit the user to comprehend the current spatialization eﬀect in an

intuitive manner and to regulate it in a dynamic manner. Furthermore, the study

places signiﬁcant emphasis on the importance of the audio-rate control signal in

the spatialization process and fully exploits the capabilities of gen~ in Max/MSP.
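
For readers unfamiliar with the underlying synthesis method, the following Python sketch shows only the basic wave terrain lookup, in which a trajectory is traced over a two-dimensional surface and the surface height is read at audio rate. It does not reproduce James's spatialization algorithm; the terrain function, the circular orbit, and all parameter values are hypothetical choices for illustration.

```python
import math


def terrain(x, y):
    """A hypothetical terrain surface; any smooth function of (x, y) could be used."""
    return math.sin(math.pi * x) * math.cos(math.pi * y)


def scan_terrain(num_samples, sample_rate, orbit_hz, radius):
    """Read the terrain height along a circular orbit, one value per audio sample."""
    out = []
    for n in range(num_samples):
        phase = 2.0 * math.pi * orbit_hz * n / sample_rate
        x = radius * math.cos(phase)
        y = radius * math.sin(phase)
        out.append(terrain(x, y))
    return out


# Two periods of a 100 Hz orbit at an 8 kHz sampling rate.
samples = scan_terrain(160, 8000, 100, 0.5)
```

Changing the shape of the surface or of the orbit changes the resulting waveform. In a spatialization context, the height values would be interpreted as a distribution target instead of being used directly as audio samples.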

2.5.2.2 Spatial Granulation

In conjunction with the spatialization techniques that operate within the frequency

domain, temporal processing represents another primary method for the genera-

tion of spatial texture. The principal method for modifying sound texture in the

time domain has its origins in a long history of research in the ﬁeld of microsound

(Thomson, 2004), which concerns the manipulation of sound fragments that last

for very short periods of time. The most widely recognized practical application of

this research is granular synthesis (Roads, 1978; Truax, 1988). There has always

been a close relationship between granular synthesis and sound spatialization.

Barry Truax, the author of the inaugural real-time granular synthesis algorithm,

posits that granular synthesis can be employed as a means of inﬂuencing the per-

ception of spatiality in sound (Truax, 1998). Algorithms that employ real-time

granular synthesis for sound spatialization were implemented at an early stage in

Max/FTS (Todoroﬀ, 1995), the predecessor of the current Max/MSP. Examples

of recent tools may include the ambisonic-based GranularEncoder plug-in within

the IEM Plug-in Suite (https://plugins.iem.at/docs/granularencoder/).

The concept of spatial granulation can be elucidated as the temporal disas-

sembly (granulation) of the input signals and their corresponding distribution to

the multi-channel system. The technical challenge of achieving granulation and

distribution has been overcome with the advent of today’s computer performance.

The diﬃculty of the algorithm lies in the eﬀective control of the behavior of hun-

dreds or thousands of grains and their distribution to the speaker array.
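
The two steps named above, temporal disassembly and distribution, can be sketched in a few lines of Python. The sketch assumes the simplest possible control strategy, namely a uniformly random channel per grain; the function names, the Hann envelope, and the grain parameters are illustrative assumptions and do not correspond to a specific published system.

```python
import math
import random


def hann(n):
    """Hann window of length n (n > 1), used as the grain envelope."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def granulate_to_channels(signal, grain_len, hop, num_channels, rng):
    """Cut the signal into overlapping windowed grains and send each grain to one random channel."""
    outputs = [[0.0] * len(signal) for _ in range(num_channels)]
    window = hann(grain_len)
    assignments = []
    for start in range(0, len(signal) - grain_len + 1, hop):
        ch = rng.randrange(num_channels)
        assignments.append(ch)
        for i in range(grain_len):
            outputs[ch][start + i] += signal[start + i] * window[i]
    return outputs, assignments


rng = random.Random(0)  # fixed seed so that the sketch is repeatable
signal = [math.sin(2.0 * math.pi * 220.0 * t / 8000.0) for t in range(800)]
outputs, assignments = granulate_to_channels(
    signal, grain_len=80, hop=40, num_channels=4, rng=rng)
```

Replacing the random choice of channel with a rule driven by grain behaviour is precisely where the control problem discussed in this section begins.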

With regard to the behaviour of grains, the most well-known approach is to

control the overall behaviour of a group of homogeneous individuals based on the

Boids algorithm (Reynolds, 1987). The algorithm identifies three fundamental

behavioral patterns exhibited by individuals in a group: separation, cohesion, and

alignment. It then determines the optimal next step for each individual in the

group, with the goal of achieving uniﬁed control over the collective behavior of the

group. The concept has been previously explored in granular synthesis algorithms

without sound spatialization (Blackwell & Young, 2004).
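
The three rules can be made concrete with a minimal two-dimensional update step. The sketch below is a simplified reading of the Boids model, assuming that every boid perceives all others (no neighbourhood radius for cohesion and alignment) and that the flock contains at least two boids; the weights and the separation distance are arbitrary illustrative values.

```python
import random


def boids_step(positions, velocities, sep_dist=0.2,
               w_sep=0.05, w_coh=0.01, w_ali=0.05, dt=1.0):
    """Advance all boids by one step using separation, cohesion and alignment."""
    n = len(positions)
    new_velocities = []
    for i in range(n):
        px, py = positions[i]
        vx, vy = velocities[i]
        others = [j for j in range(n) if j != i]
        # Cohesion: steer towards the centre of mass of the other boids.
        cx = sum(positions[j][0] for j in others) / len(others)
        cy = sum(positions[j][1] for j in others) / len(others)
        # Alignment: steer towards the average velocity of the other boids.
        ax = sum(velocities[j][0] for j in others) / len(others)
        ay = sum(velocities[j][1] for j in others) / len(others)
        # Separation: move away from boids that are closer than sep_dist.
        sx = sy = 0.0
        for j in others:
            dx, dy = px - positions[j][0], py - positions[j][1]
            if (dx * dx + dy * dy) ** 0.5 < sep_dist:
                sx += dx
                sy += dy
        new_vx = vx + w_coh * (cx - px) + w_ali * (ax - vx) + w_sep * sx
        new_vy = vy + w_coh * (cy - py) + w_ali * (ay - vy) + w_sep * sy
        new_velocities.append((new_vx, new_vy))
    new_positions = [(p[0] + v[0] * dt, p[1] + v[1] * dt)
                     for p, v in zip(positions, new_velocities)]
    return new_positions, new_velocities


rng = random.Random(1)  # fixed seed so that the sketch is repeatable
positions = [(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)) for _ in range(5)]
velocities = [(0.0, 0.0)] * 5
for _ in range(10):
    positions, velocities = boids_step(positions, velocities)
```

In a spatial granulation context, each boid position would be read as the position of one grain or grain stream, so that a handful of weights steers the collective motion of many sound particles.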

Among the most successful practices in spatial texture are the swarm lab

system (Davis & Rebelo, 2005) and spatial swarm granulation (Wilson, 2008).

Another prevalent approach is the utilization of corpus-based or dictionary-based

methodologies (Einbond & Schwarz, 2010; McLeran et al., 2008). Although the

algorithms of these two methods are entirely distinct, the underlying concept is

to reduce the dimensionality of the grains and cluster them to construct a dimen-

sionally controllable representation space, upon which the mapping between the

representation space and the three-dimensional physical space can be realized.

In regard to the strategy of distributing grains towards the speaker array,

the most straightforward approach is to leverage the underlying support of the

object-based spatialization technique. The most prevalent approach is to associate

each grain stream with a sound source in ambisonics, or to spatially localize it

via VBAP. This approach obviates the diﬃculty of making speciﬁc channel map-

pings. Nevertheless, in the context of other specialized speaker arrays, there will

also be evident methodologies for the design of channel-based distribution strate-

gies. To illustrate, the spatial swarm granulation previously mentioned is based

on the BEAST loudspeaker system. The grain ﬂow distribution strategy employs

the kd-tree algorithm to assign each boid to the closest speaker. Moreover, for

the High-Density Loudspeaker Array (HDLA), which is frequently employed in

contemporary research facilities, the attempts to map a grain stream to a speciﬁc

speaker are just as eﬃcacious as the previous approaches (Garavaglia, 2016).
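
A channel-based distribution strategy of the nearest-speaker kind can be sketched as follows. Note that the example uses a brute-force linear search in place of the kd-tree mentioned above, which gives the same assignment but scales less well to large arrays; the eight-speaker ring and the grain positions are hypothetical.

```python
import math


def nearest_speaker(position, speakers):
    """Return the index of the loudspeaker closest to a grain or boid position."""
    return min(range(len(speakers)), key=lambda i: math.dist(position, speakers[i]))


# Hypothetical ring of eight loudspeakers on the unit circle at ear height.
ring = [(math.cos(2.0 * math.pi * i / 8), math.sin(2.0 * math.pi * i / 8), 0.0)
        for i in range(8)]

grain_positions = [(0.9, 0.1, 0.0), (0.0, 0.9, 0.0), (-0.8, -0.1, 0.0)]
assignments = [nearest_speaker(p, ring) for p in grain_positions]
```

Each grain is then played back by exactly one loudspeaker, without any panning between adjacent speakers.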

It is possible to utilise both spectral spatialisation and spatial granulation in

conjunction with one another. For instance, Kim-Boyle proposed a method for generating

spatial texture by mapping the particle positions in a particle system controlled

by the Boids algorithm to speciﬁc Fast Fourier Transform (FFT) bins (Kim-Boyle,

2008).

2.5.2.3 Panning & Decorrelation

Both spectral spatialization and spatial granulation seek to deconstruct sound

into its constituent elements, thereby transforming the control of a single sound

source by the overall spatial properties into the control of a multitude of sound

elements. The notion that sound must be miniaturised in order to exert control

over its microscopic properties is a concept that is intuitively appealing. However,

there are alternative approaches that can be employed to achieve a subtle and

infectious spatial texture without the necessity of dismantling the sound. Two dis-

tinct algorithms are presented here: decorrelation and panning. The decorrelation

algorithm is designed to produce diﬀerences in detail between audio streams, while

the panning algorithm employs rapid movement of the sound source.

Decorrelation modiﬁes an original audio signal into multiple outputs with

distinct waveforms that are perceived similarly to the source (Kendall, 1995).

Strictly, these sounds diﬀer physically but share perceptual qualities, making them

indistinguishable as separate sources. This physical variance enhances spatial per-

ception through psychoacoustic eﬀects. This principle underpins stereophony and

head-related transfer function (HRTF)-based spatial audio. When the phenome-

non of decorrelation is discussed outside the context of binaural hearing, it is ca-

pable of producing not only a psychoacoustic sense of spatiality, but also acoustic

phenomena that actually exist in space. In the most basic instance, a comb ﬁlter is

the result of superimposing a source in the sound ﬁeld with a second source that is

slightly delayed in time. This phenomenon is typically avoided in traditional room

acoustic design, yet it can be utilized in spatial music creation. Other methods of

decorrelation can facilitate a more engaging spatial listening experience.
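The comb-filter case mentioned above can be made explicit with a short worked equation. The symbols $\tau$ (delay), $f$ (frequency) and $k$ (integer index) are introduced here for illustration, and both sources are assumed to arrive with equal amplitude.

```latex
y(t) = x(t) + x(t - \tau)
\quad\Longrightarrow\quad
H(f) = 1 + e^{-j 2 \pi f \tau},
\qquad
\lvert H(f) \rvert = 2 \,\lvert \cos(\pi f \tau) \rvert

% Notches (cancellation) and peaks (reinforcement), k = 0, 1, 2, \dots
f_k^{\text{notch}} = \frac{2k + 1}{2 \tau},
\qquad
f_k^{\text{peak}} = \frac{k}{\tau}
```

For a delay of $\tau = 1\,\text{ms}$, for example, the notches fall at 500 Hz, 1500 Hz, 2500 Hz and so on, which is the regularly spaced spectral pattern that gives the comb filter its name.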

Multiple decorrelated signals can be generated from a single sound source

either by passing it through all-pass filters that alter the phase, or by applying

an FFT and resetting the phases of the bins before the signals are

resynthesised. Another approach is to convolve the source with

different signals, which is not fundamentally different from using an all-pass filter.

However, it should be noted that this approach does not adhere to the strict deﬁ-

nition of hearing consistency. The generation of spatial texture can tolerate certain

audible diﬀerences between the sources. When spatialized sound is synthesized by

a synthesizer, it is possible to create a decorrelation-like eﬀect by using sounds in

diﬀerent channels with a small diﬀerence in the synthesizer parameters. This is the

concept of topographic synthesis (Nystrom, 2018). The authors of this algorithm

do not propose categorizing it under decorrelation.
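The FFT-based variant of decorrelation can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of any cited author: phases are replaced with uniformly random values, a naive DFT is used so that the example needs only the standard library, and the names `dft`, `idft_real` and `decorrelate` are hypothetical.

```python
import cmath
import math
import random

def dft(x):
    """Naive O(n^2) discrete Fourier transform (fine for short illustrative blocks)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft_real(X):
    """Inverse DFT, returning only the real part of each sample."""
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]

def decorrelate(x, rng):
    """Keep every bin's magnitude but replace its phase with a random one."""
    n = len(x)
    X = dft(x)
    Y = [0j] * n
    Y[0] = X[0]                          # leave the DC bin untouched
    for k in range(1, n // 2 + 1):
        if 2 * k == n:
            Y[k] = X[k]                  # the Nyquist bin must stay real
        else:
            Y[k] = abs(X[k]) * cmath.exp(1j * rng.uniform(-math.pi, math.pi))
            Y[n - k] = Y[k].conjugate()  # Hermitian symmetry keeps the output real
    return idft_real(Y)

# One short test block, decorrelated twice with different (arbitrary) seeds.
x = [math.sin(2 * math.pi * 3 * t / 32) + 0.5 * math.sin(2 * math.pi * 7 * t / 32)
     for t in range(32)]
left = decorrelate(x, random.Random(1))
right = decorrelate(x, random.Random(2))
```

The two outputs share the magnitude spectrum of the input but have different waveforms, which is the property the text describes. A practical implementation would process overlapping windowed blocks rather than a single block.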


The concept of panning is referenced on several occasions in Section 2.1, where

spatial audio algorithms developed on top of panning are also described. This

operation is fundamentally linked to traditional spatial audio applications and

spatial music aesthetics. As with other basic audio processing operations, such as

amplitude modulation and frequency modulation, panning has a dominant effect

on the auditory experience when applied at high rates and modulation depths.

Furthermore, panning can be applied with extreme parameter settings, resulting

in the original signal becoming unrecognisable.

The spatial rapid panning technique is usually underpinned by object-based

spatialization algorithms (Schmele & Lopez, 2022). It is assumed that a control

signal at audio rate is applied to a virtual sound source, causing it to move rapidly

between two points in space to produce signiﬁcant amplitude modulations and

Doppler shifts (Schmele, 2011). Furthermore, when the rate of panning exceeds a

certain threshold, the sound enters the domain of microsound, resulting in a

texture that is similar to that of granular techniques, as discussed in Section 2.5.2.2

(McGee, 2015).
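The amplitude-modulation side of rapid panning can be illustrated with a two-speaker sketch. It assumes an equal-power pan law and a sinusoidal control signal, both chosen here for illustration; Doppler shift is not modelled, and the rate of 200 Hz and the name `rapid_pan` are hypothetical.

```python
import math

SAMPLE_RATE = 48000

def rapid_pan(signal, pan_rate_hz, sample_rate=SAMPLE_RATE):
    """Pan `signal` between two speakers with an audio-rate sinusoidal control signal."""
    left, right = [], []
    for n, s in enumerate(signal):
        # Unipolar control signal in [0, 1]: 0 = fully left, 1 = fully right.
        pos = 0.5 * (1.0 + math.sin(2.0 * math.pi * pan_rate_hz * n / sample_rate))
        theta = pos * math.pi / 2.0
        # Equal-power law: cos^2 + sin^2 = 1, so the summed power is preserved.
        left.append(s * math.cos(theta))
        right.append(s * math.sin(theta))
    return left, right

# 10 ms of a 440 Hz tone, panned back and forth 200 times per second.
tone = [math.sin(2.0 * math.pi * 440.0 * n / SAMPLE_RATE) for n in range(480)]
left, right = rapid_pan(tone, pan_rate_hz=200.0)
```

At a few hertz this produces an audible back-and-forth movement. At audio rates each channel is effectively an amplitude-modulated copy of the source, so sidebands appear and the result is heard as a change of timbre and texture rather than as motion.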

The concepts of the panning and decorrelation approach are more readily

comprehensible than the aforementioned time/frequency domain disassembly spa-

tialization approach. However, this does not imply that either approach is any

less eﬃcacious. Both methods entail the actual acoustic phenomena, whereas the

decomposition methods are still predominantly focused on the audio signal. In

practice, the panning and decorrelation approach may result in a multitude of

unexpected outcomes.

2.5.3 Spatialization as Synthesis

Once the three primary categories of techniques have been established, the concept

of spatial sound synthesis becomes more readily comprehensible. It is erroneous to

view spatial sound synthesis as a distinct branch of synthesis algorithms existing

in isolation. Rather, it should be understood as a more forward-looking way of

thinking (Clarke, 1999). In the techniques above, space and timbre have been

unified to the point that controlling timbre and controlling spatiality have

become nearly indistinguishable. Nevertheless, the presupposition persists

that the sounds exist before being spatialized, even if the spatialized sound

is entirely distinct from the original. At this juncture, the appropriate term for

these methods is the spatial eﬀect. However, if one introduces the concept of spa-

tial sound synthesis, the spatial aspects are considered at an early stage of the

synthesis process and thus become an integral part of the concept. Consequently,

sound is ultimately “synthesized” in real physical space.

Some of the algorithms are deﬁned by the researchers as spatial sound syn-

thesis. A signiﬁcant proportion of the concepts associated with spatial sound syn-

thesis can be considered spatialised extensions of conventional sound synthesis

algorithms. For instance, spatial granulation techniques in Section 2.5.2.2 are still


regarded as either granular synthesis or concatenative synthesis methods. The

Spatio Operational Synthesis (SOS) (Topper et al., 2003) involves the rotation of

single partial components of basic waveforms in an additive synthesis process on

a circular loudspeaker setup using VBAP. This is consistent with the primary

concept of timbre spatialization, particularly the authors' proposed extension method

using subband decomposition. Some panning-based methods define themselves

as spatial modulation synthesis (McGee, 2015) or rapid panning modulation syn-

thesis (Schmele, 2011). Spectro-Spatial Sound Synthesis (Coler, 2019), which dis-

tributes the sounds of musical instruments as point clouds, represents a hybrid

use of spectral and temporal decomposition, albeit with a subtle reverse-thinking

approach. In topographic synthesis (Nystrom, 2018), each loudspeaker of a mul-

tichannel system is assigned an individual instance of a synthesis process. This

may be any general synthesis technique. When these parallel processes are de-

terministic and driven with the same input parameters, all speakers’ signals are

identical. Parameter distributions can be used to create instantaneous or evolutive

spatial textures. Topographic synthesis is indiﬀerent to the spatial conﬁguration

of loudspeaker systems, treating the loudspeakers as a sorted array. The author

provides a comprehensive examination of the similarities and distinctions between

topographic synthesis and decorrelation.
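The principle of topographic synthesis described above can be sketched with the simplest possible synthesis process, a sine oscillator per loudspeaker. The choice of oscillator, the frequency offsets, and the name `topographic_sines` are assumptions made for illustration only.

```python
import math

def topographic_sines(base_freq, offsets, n_samples, sample_rate=48000):
    """Run one sine oscillator per loudspeaker.

    `offsets` is the parameter distribution: the frequency deviation in Hz
    applied to each speaker's instance, in the order of the speaker array.
    """
    return [
        [math.sin(2.0 * math.pi * (base_freq + off) * n / sample_rate)
         for n in range(n_samples)]
        for off in offsets
    ]

# Same parameters everywhere: all speaker signals are identical.
identical = topographic_sines(220.0, [0.0, 0.0, 0.0, 0.0], 256)

# A small parameter spread across the array: the channels slowly drift apart.
textured = topographic_sines(220.0, [0.0, 0.5, 1.0, 1.5], 256)
```

Note that the function only sees the speakers as an ordered list, which mirrors the statement that topographic synthesis is indifferent to the physical configuration of the loudspeaker system.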

2.5.4 Related Tools

The techniques presented in this section are more experimental and less widely

used than those presented in Section 2.4. Consequently, it is challenging to provide

a comprehensive overview of the diverse array of tools presented in that section.

The majority of the techniques are implemented in audio programming software

and have not been further developed to the extent that they can be easily used by

others. The following list provides a brief overview of some of the projects that

focus on this area.

BEASTmulchLib is a SuperCollider class library developed for the BEAST

project that oﬀers advanced spatial techniques including the spatial swarm gran-

ulation and other unconventional signal routing techniques (Wilson, 2009). The

OMPrisma and OM-Sox libraries, which are part of the OpenMusic suite of tools,

provide a general framework for controlling spatial sound synthesis and

incorporate sound spatialization (Schumacher & Bresson, 2010). The framework

conceptualizes spatial sound rendering as an essential element of sound synthesis, thereby

elevating spatial parameters to the status of abstract musical materials within a

comprehensive compositional framework. Live 4 Life is a spatial performance tool

designed to facilitate the creation of sound across multiple loudspeakers in Super-

Collider (Lengelé, 2018). By focusing on spatial rhythmic patterns and synthesis

parameter loops, Live 4 Life aims to enhance the interaction between sound ob-

jects and their spatial attributes, catering to a diverse range of performance setups

and experimental sound exploration. The ImmLib software has been developed

for the composition of spatial music on grid-based loudspeaker systems (Negrao,


2014). It enables the creation of multiple decorrelated sound streams at diﬀerent

locations, aiming to form a broad sound source with unique spatial qualities. Built

in SuperCollider, ImmLib simpliﬁes the generation of these streams from a single

synthesis deﬁnition and oﬀers tools for crafting spatial patterns on a virtual sur-

face by modulating synthesis parameters.

2.6 Discussion

This chapter provides a detailed overview of the theoretical and technical aspects

of spatial music applications, aiming to establish a comprehensive coordinate sys-

tem to clearly deﬁne the Zerr* approach. This system will clarify what Zerr* is and

what it is not, delineating its suitability for speciﬁc outcomes and its limitations

in certain scenarios.

Four key aspects that have been identiﬁed as categorizing related algorithms,

software, instrumentation, and other items can be summarized as follows:

•Speaker-centric & Source-centric: This

distinction determines whether the application has a deﬁned notion of a virtual

source or manipulates the speakers directly.

•Composition-oriented & Performance-oriented: Determines if the appli-

cation is designed to produce ﬁxed works or address performance-related chal-

lenges.

•Scope of Realization: Considers whether the application is tailored for spe-

ciﬁc speakers or arrays, integrates a deﬁned sound engine, or correlates with

speciﬁc musician gestures, akin to instrument design analysis.

•Aesthetic Inclination: Evaluates whether the application adheres to tradi-

tional aesthetic principles or aligns with preferences for spatial texture.

While the applications discussed in this chapter can be classiﬁed according to the

four key indicators, a detailed taxonomy will not be attempted here for brevity.

The next chapter will deﬁne the Zerr* System using these metrics and discuss its

design concept in depth, comparing it to other applications.


Chapter 3

Zerr* Approach

3.1 Approach Classification

Before outlining the methodology that underpins the Zerr* approach and the de-

tails of the concept design, it is ﬁrst necessary to categorize the Zerr* approach

according to the four sets of indicators previously mentioned. It is hoped that this

will enable a clear delineation of the boundaries of the discussion.

•Speaker-centric or Source-centric: The Zerr* approach is explicitly

speaker-centric. It controls the signals delivered to each speaker directly, with-

out utilizing a concept of a virtual source. As a result, Zerr* cannot achieve

accurate spatial positioning and movement. Although it is possible to simulate

the eﬀect of sound source movement under speciﬁc parameter settings, compar-

ing the realism and stability with those achieved by object-based spatialization

algorithms would be inappropriate.

•Composition-oriented or Performance-oriented: The Zerr* approach is

primarily used in live performances, particularly for real-time spatial music

improvisation. Unlike composition-oriented tools, it lacks non-linear editing

and scoring capabilities. All audio and control signals within the system are

processed and transmitted in real time.

•Scope of Realization: The Zerr* system is compatible with any loudspeaker

setup that allows direct control of the input signals for each loudspeaker. In

addition, Zerr* can process any type of audio input, allowing musicians to

interact with Zerr* using any type of gesture. In essence, it functions as an

audio distribution engine that dynamically assigns the input audio signal to specific

loudspeaker setups.

•Aesthetic Inclination: Zerr* excels at creating complex spatial textures, and

the characteristics mentioned in the ﬁrst three points predestine it to be a very

experimental approach. Among the techniques related to spatial texture, Zerr*

employs pure amplitude panning. By adjusting the rate of the panning signal,

Zerr* can move seamlessly from standard spatialization eﬀects to the creation

of distinctive spatial textures.

In essence, Zerr* is a speaker-centric sound spatialization approach designed for

arbitrary audio sources and loudspeaker setups. It excels in generating spatial

textures during live improvisations. This will be rigorously reviewed in subsequent

analyses and presentations to ensure the validity of the discussion.


3.2 Signal Flows in Live Performance

The initial intention to develop an experimental system must ﬁrst be analyzed

in terms of its use scenarios. Most of Zerr*’s conceptual design is related to its

intended scenario, which is live improvisational performance. This section presents

a possible new system design methodology for the problem in a live performance

scenario, and the concept of Zerr* as a concrete proposal for realizing this method-

ology will be introduced in detail in the next section.

The real-time demands of live music performance scenarios have turned many

simple operations in the composition process into complex human-computer in-

teraction problems. And performances with real-time sound spatialization have

an extra dimension of complexity compared to a normal performance. Marshall

et al. analyzed the problems of live spatial music performance in detail (Marshall

et al., 2009). They divide the roles of those involved in the performance into spatial

performers, instrumental performers, and spatial conductors, and emphasize the

importance of considering the cognitive load¹⁹ of the performers. In this section, this

analysis is used as a reference to examine the differences in signal flow and performer

cognitive load in different performance scenarios.

¹⁹ Cognitive load is defined as “The total amount of mental activity imposed on working memory at an instance

in time” in the original paper.

With the help of Figure4, all setups in a live performance can be understood

as a holistic instrument that is controlled by several performers in diﬀerent roles

at the same time. This sound engine can be simpliﬁed and divided into three core

modules corresponding to the three diﬀerent types of roles proposed by Marshall

et al., namely Sound Source, Sound Spatializer and Sound System. Assuming that

the performers’ gesture inputs are ignored, there are only pure audio signal ﬂows

between modules in the performance, as shown in (A) of Figure 5. The Sound Source

module in this context refers to anything that can produce audio signals, including but

not limited to acoustic instruments, playback devices, and synthesis algorithms.

The Sound Spatializer module refers generically to all algorithms and devices that

process and distribute the input audio signal. The Sound System module refers

to the equipment that ultimately presents the sound waves, including but not

limited to the loudspeaker arrays and the listening environment/room. In actual

scenarios, the three modules are merged. However, the current division is more

useful for later analysis.

In addition to the basic audio signal ﬂow, performers generate gesture inputs

that vary depending on their role and the performance context. In a traditional

live performance without sound spatialization, the performers’ gesture inputs only

aﬀect the sound sources. Here, the sound spatializer is a ﬁxed system that collects

all input signals and routes them to the sound system, as shown in (B) of

Figure 5. In contrast, in sound diffusion-type performances, the performer directly

manipulates the sound spatializer with a ﬁxed sound source system, as shown in


corresponding to the spatial performer and the instrumental performers. The spa-

tial conductor in this model is the performer who inputs gestures into the sound

system. As discussed by Marshall et al., a modern spatial music performance typ-

ically involves a number of instrumental performers, a spatial performer, and a

spatial conductor, as shown in (D) of Figure 5. This division of roles proves

highly eﬀective and eﬃcient for live performances that are either well rehearsed

over time or structured around ﬁxed scores. From personal observation, it’s worth

noting that there can be more than one spatial performer, and the role of spatial

conductor is relatively rare in actual performances. This role is simulated in the

original article by allowing the performer to control the size of a virtual space.

Thus, the spatial conductor is not central to the following discussion and will be

omitted.

Figure5: Signal ﬂow in live performance

However, the model is less applicable in an improvised live performance. First,

from the point of view of the spatial performers, it is diﬃcult for them to respond

effectively in time to the audio signals generated according to the improvised

gestures of the instrumental performers. Because the audio input at this point is

a time-varying signal causally determined by the instrumental performers, spatial

performers cannot predict in advance what will happen next. More commonly,

the roles of instrumental performer and spatial performer in improvisation are

undertaken by the same individual, as shown in (A) of Figure 6. When a performer

is required to process two disparate gesture control signal inputs simultaneously,

the cognitive load can easily exceed the performer’s capacity, potentially leading

to mistakes in performance. At this juncture, it would appear that one can only

consciously control either the sound properties or the spatial properties.

This is not an insurmountable paradox. In addition to the performer exer-

cising their own multitasking abilities, the majority of the performance aid tools

mentioned in Section2.4.2 attempt to address this problem. Some tasks that re-

quire conscious control are converted to non-conscious control through the use


of pre-composed automations. This enables the performer to exert control over

one of the modules with minimal and discrete control gestures, thereby allowing

them to concentrate on the control of the other module. The performer’s focus

during improvisation will be determined by the speciﬁc part being performed.

The other part will have a reduced role and serve as a supporting function, as

illustrated in (B) and (C) of Figure 6. For example, the performer may

improvise on the musical instrument and then trigger a spatial effect,

such as a trajectory movement, with a simple action in the

intervals between playing. The process can also be reversed, whereby the performer

controls the spatial behaviour of a grain cloud in real time by ﬁnely controlling

various parameters in a spatial granulator. The sound input to the grain cloud is

triggered using simple gestures for playback or cessation. It is also possible to take

this non-conscious control to the extreme, namely to use complete automation

without human intervention on a particular module. In the case of autonomous

control of both parts, the intervention of real-time human subjective awareness is

lost and enters the realm of computer-generated music, which is not the subject

of the present discussion of performance.

In their discussion of conscious and non-conscious control, Marshall et al.

posit that non-conscious control is more akin to a compositional process than a

performance. In order to enhance the usability of an assistive tool, the control

gestures it provides are often more generic in nature, and on occasion, they do not

align with the speciﬁc gestures that the user desires to employ. Once the automa-

tion has been determined, there is a very limited range of interpretations that the

performers can make. This may give the performer the impression that they have

no real control over the performance.

It is necessary to identify a signal ﬂow that can provide conscious control ges-

tures to both the sound source and the spatializer, which are dense and consistent

with the performers’ intentions without overloading them. One possible approach

to this problem is to associate two modules and control the behavior of the other

module with a signal from one of them. Thereafter, the performer only needs to

focus on controlling one of the modules after completing a small number of basic

setups, while the other module will change its behaviour in response to the con-

trol signals generated based on the performer’s current conscious control gestures.

This enables high gesture density control of both modules simultaneously. Given

that this control signal lies between conscious and non-conscious control, as it is

indeed related to the performer’s gesture input, it is more appropriately called

semi-conscious control.

As illustrated in (D) of the Figure6, the optimal method for implementing

this type of control system is to collect the audio signal emanating from the sound

source as an input and utilize it to generate the control signals for controlling the

sound spatializer. In this instance, the performer’s sole responsibility is to regulate

the sound source, with the spatialization tasks being executed automatically. The

signal flow diagram represents the primary framework underlying the Zerr* approach.

Under this framework, any specific technical solution can be incorporated:

a controller can be designed in combination with a sound source, a spatialiser, or

a sound system. The Zerr* approach is but one of the possible concepts.

Figure6: Signal ﬂow in improvisation performance

3.3 Zerr* Concept

Figure7 shows the general signal ﬂow of Zerr*20 . This signal ﬂow diagram reﬁnes

the internal structure from (D) of the Figure6, while simultaneously simplify-

20Derived from German “Zerräumlichung” ≈ spatial disintegration.

ing the other components that are of lesser importance. All solid paths represent

audio-rate signals, with the bold paths representing the raw input and audible

output.

The Zerr* system takes a single-channel audio signal $x$ as input and then

generates $N$ distinct signals $x^{*}_{1 \dots N}$ for $N$ loudspeakers. The input signal is initially

processed by the Feature Tracker module. The Feature Tracker module employs

speciﬁc algorithms to extract audio features, which are then transmitted to the

Feature Processor. The Feature Processor is responsible for the execution of basic

post-processing operations on the input audio features, with the objective of gen-

erating standard control signals. The standard control signals are inputted into the

Envelope Generator module. The Envelope Generator employs a polling mecha-

nism based on the information contained in the control signals to communicate

with the Speaker Manager. The Speaker Manager is the module that stores all

the information about the loudspeaker setup in advance, and provides Envelope

Generator with the necessary information for the generation of envelopes. The

Envelope Generator is capable of generating multi-channel envelopes in real time,

utilising both the control signals from the feature processor and the speaker in-

formation from the speaker manager. It is possible to merge the envelopes with


another set of envelopes in order to produce more complex envelopes. The ﬁnal

combined envelopes will be multiplied with the original input signal in the Audio

Disperser in order to obtain audible multi-channel audio outputs. Each output

signal is transmitted directly to the corresponding loudspeaker, thereby complet-

ing the spatialization of the input audio. The functionality of each module will be

elucidated in detail in Section3.4.

Figure7: Signal ﬂow of Zerr* approach

The input audio signal is used to deﬁne the spatial distribution in accordance with

the aforementioned schema. This addresses the practical need to reduce cognitive

load and maintain a high density of gesture inputs in live improvisation scenarios.

Nevertheless, Zerr* is distinguished by its creative rather than by its functional

purpose.

The performance-oriented character of Zerr* has already been elucidated by

the methodology presented in the preceding section. With the brief description of

the Zerr* signal ﬂow diagram just given, it is also possible to demonstrate two

other characteristics. Zerr*’s input is an audio signal from the sound source, and

its output is a multi-channel audio signal assigned to the loudspeaker array, indi-

cating that Zerr*’s scope of realisation is limited to the spatialization of sound.

Furthermore, it is evident that Zerr* is a loudspeaker-centric system, as the audio

signal is fed directly to each corresponding loudspeaker. Additionally, the speaker

properties must be provided to the speaker manager for assisting decision-making

purposes. The only aspect of the system that requires further explanation and

cannot be directly recognized from the signal ﬂow is the system’s aesthetic incli-

nation.

As previously discussed in Section2.5.2.3, the ability of panning to transition

from a fundamental operation to an experimental eﬀect is contingent upon the

utilization of unconventional parameter settings, speciﬁcally those involving high

rates of control signals. This aligns with Zerr*’s design philosophy. As illustrated

in Figure7, all internal modules of Zerr* except speaker manager and envelope

generator utilize uni-directional audio-rate signals for information transfer. This

ensures that the system is inherently capable of generating high-speed control sig-

nals. The methodology of utilising audio-rate as a control signal has been outlined

in Section2.5.2.1 and Section2.5.2.3, which pertains to wave terrain spatialisation


and spatial modulation synthesis, respectively. In contrast to the aforementioned

control signals and the high-rate control signals typically employed in standard

modulation eﬀects, the control signals derived from audio features are not only of

a high rate but also irregular. The characteristics of the control signals extracted

from the features will be analysed in detail in Section 3.4.

3.4 System Design

This section will explicate the particulars of each module’s functionality, concomi-

tantly describing the design concepts. For the speciﬁc implementation details of

each module, please refer to Section4. This section encompasses more of the con-

ceptual descriptions.

3.4.1 Feature Tracker

The feature tracker functions as the initial processing stage, responsible for ex-

tracting desired audio features from the input signal. According to Lerch, audio

features are deﬁned as speciﬁc types of audio representations, constructed based on

expert knowledge, and tailored to meet the speciﬁc requirements of a task (Lerch,

2012). This process allows the audio’s meaningful properties to be emphasized,

informing subsequent control signal ﬂows. The eﬀectiveness of this system hinges

on the assumption that a performer’s gestures can signiﬁcantly inﬂuence the au-

dio, particularly features that can be ampliﬁed and rendered distinctly. Only when

this condition is met can the control system, which initiates with audio features,

enhance the real-time controllability for the performer. From a broader perspec-

tive, this method of coupling timbral variations of audio with spatial changes

aligns closely with the foundational goal of integrating spatial elements into mu-

sic. Changes in timbre inherently drive spatial modiﬁcations. Leveraging audio

features for spatialization accelerates the transition from manual analysis and re-

production to real-time analysis and decision-making as the audio evolves.

Audio features are divided into two categories: instantaneous features and

learned features. Instantaneous features, also known as audio descriptors, are more

low-level audio features. They typically take a small block of audio samples as

input and return a single value based on a ﬁxed calculation method. These fea-

tures lack an explicit musical or perceptual level meaning, but are simply descrip-

tions of the data characteristics of the sample block. Such features are generally

very simple to compute and are easily implemented in real time due to the small

amount of data required for a single computation. Furthermore, there is a clear

correspondence between inputs and outputs due to the ﬁxed algorithm. However,

the disadvantage of this type of feature is also evident, as using only a very small

amount of data for the computation can lead to unstable feature values in the

output.
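Two common instantaneous features, root mean square (RMS) and zero-crossing rate, illustrate the definition above: each takes a small block of samples and returns a single value from a fixed formula. The choice of these two descriptors and the function names are illustrative and not specific to Zerr*.

```python
import math

def rms(block):
    """Root mean square of a block of samples (a simple loudness descriptor)."""
    return math.sqrt(sum(s * s for s in block) / len(block))

def zero_crossing_rate(block):
    """Fraction of adjacent sample pairs whose signs differ (a rough noisiness descriptor)."""
    crossings = sum(1 for a, b in zip(block, block[1:]) if (a >= 0.0) != (b >= 0.0))
    return crossings / (len(block) - 1)

# Two tiny example blocks: a maximally alternating one and a constant one.
alternating = [1.0, -1.0, 1.0, -1.0]
constant = [0.5, 0.5, 0.5, 0.5]

features = {
    "alternating": (rms(alternating), zero_crossing_rate(alternating)),
    "constant": (rms(constant), zero_crossing_rate(constant)),
}
```

With blocks this short the values react to every small change in the input, which is the instability the text refers to; longer blocks give steadier values at the cost of slower response.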

The extraction of learned features necessitates the utilisation of supplemen-

tary data in addition to the input audio signal. The calculation of such features is


contingent upon the acquisition of information from other data sources, which is

then integrated with the current input audio signal to yield the ﬁnal feature. Both

traditional machine learning algorithms and modern AI models can be classiﬁed

as such. Such features can convey highly abstract information,

either as explicit categorical labels or as feature vectors with implicit

information. Such algorithms are more complex, and the majority of them require

longer segments of audio data as input.

In order to guarantee the real-time performance of the Zerr* system, it is

preferable to deploy instantaneous features in the feature tracker. The issue of

unstable output from instantaneous features is not a signiﬁcant concern under the

Zerr* system. In some cases, it can even be advantageous. The Zerr* system’s use of

audio features, which are not designed to preserve every audio detail but rather

to maximize real-time performance, allows for extreme ﬂexibility in the deﬁnition

of audio features. Even the most minuscule units of audio, such as a few samples

or a single sample point, can be employed as a feature.

3.4.2 Feature Processor

The feature signals generated by the Feature Tracker will be fed directly into

the Feature Processor. The task of the Feature Processor is to ﬁrstly normalize

diﬀerent kinds of feature signals, and then merge or map the normalized signals

according to the requirements, and ﬁnally process them into a standard high-

speed signal that can be comprehended by the subsequent modules. This does not

specify any exact processing method, but the structure of the resulting ﬂows is

clearly deﬁned.
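Since the text fixes the structure of the result but not the processing method, the normalize-then-merge step can only be sketched under assumptions. Here the feature ranges are taken as known in advance, values outside the range are clipped, and merging is a weighted average; the ranges, weights and function names are all hypothetical.

```python
def normalize(value, lo, hi):
    """Scale `value` from the range [lo, hi] to [0, 1], clipping anything outside."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))

def merge(normalized, weights):
    """Weighted average of normalized features; the result stays within [0, 1]."""
    total = sum(weights)
    return sum(v * w for v, w in zip(normalized, weights)) / total

# Hypothetical feature ranges: RMS in [0, 1], spectral centroid in [0, 24000] Hz.
rms_n = normalize(0.25, 0.0, 1.0)
centroid_n = normalize(6000.0, 0.0, 24000.0)

# One sample of the resulting control signal, both features weighted equally.
control = merge([rms_n, centroid_n], [1.0, 1.0])
```

Applied once per sample, this yields a control signal whose range is independent of the units of the features it was derived from, which is what allows the subsequent modules to treat all inputs alike.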

In their book Generating Sound & Organizing Time, Wakeﬁeld and Taylor

engage in a discussion of the concept of signals, which serves as an invaluable

source of inspiration for the design of this module (Wakeﬁeld & Taylor, 2022). A

signal can be characterized in four ways: rates of change, shapes

of change, ranges of values, and kinds of value. The first point, rates of change, refers

to how fast the signal changes and whether the signal changes have signiﬁcant

characteristics (periodic, sporadic, complex, or stochastic). Shapes of change refers

to the manner in which the signal is undergoing a transformation, whether it is

a sudden shift or a gradual evolution. Ranges of values is used to describe the

boundaries of a signal, including the presence of a maximum or minimum value

and the type of value (ﬂoating point, integer). Kinds of value is used to describe

whether a signal has other meanings or explicit functions. These include whether

the signal implies phase or periodic motion, or whether it has explicit units such

as decibels, Hz, etc.

The aforementioned four attributes permit the categorization of the control

signals generated by the feature processor. The feature processor generates two

types of signals, designated as trajectory and trigger.


Given that all inter-module communication employs the audio rate, the rate

of change has been explicitly stated. In the majority of cases, the trajectory and

trigger will be within the audible range. It is important to note that although it is

audible, it is still a control signal and is not intended to be used as an audio signal.

The signal is extracted from the audio feature, and thus, its pattern of change

is related to the original audio features. In terms of the value represented, both

signals are merely abstract control signals devoid of any speciﬁc units. The two

correspond to two distinct control modes. The trajectory represents a continuous

control mode, wherein each data point in the trajectory signal exerts an inﬂuence

on the behavior of the controlled module. The trigger is a discrete control mech-

anism that responds to a speciﬁc event occurring at a speciﬁc time, resulting in

a subsequent system response. Both signals are normalized to values between 0.0

and 1.0, which is consistent with the deﬁnition of unipolar signals. The trajectory

of the signal may ﬂuctuate freely between 0.0 and 1.0, whereas the trigger signal

is constrained to either 0.0 or 1.0. The shapes of change on the trajectory are, for

the most part, smooth changes. However, there is a possibility that the trajectory

may become step-like if the audio characteristics change too drastically. The

trigger signal jumps between 0.0 and 1.0 and holds a value of 0.0 most of the

time. The pattern is analogous to that of the single-sample impulse signals defined

in the book (Wakefield & Taylor, 2022), with the exception that a stable period

is absent.

The primary rationale for the adoption of these two signal types as standard

control signal streams is their versatility and capacity to convey the requisite

amount of information as control signals. They can be processed from any combi-

nation of audio features and can be sample-level real-time accepted by subsequent

modules. Another distinguishing feature of the control signals is that they lack a

discernible pattern of change. Their patterns align with ﬂuctuations in audio fea-

tures, in contrast to the limited number of common control signal models, such as

sinusoidal and random. Systems under conventional control signals exhibit greater

predictability and produce convergent eﬀects. In this context, the instability of

instantaneous features observed in Section 3.4.1 becomes advantageous. Unstable

control ﬂow can lead to unexpected outcomes.
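
The two control-signal formats described in this section can be illustrated with a minimal Python sketch. It is not taken from the Zerr* code base: the function names, the fixed normalization bounds and the rule of deriving a trigger from a rising threshold crossing are assumptions made only for this example, and Python is used purely for brevity.

```python
def normalize(values, lo, hi):
    """Scale raw feature values into the unipolar range [0.0, 1.0], clipping outliers."""
    span = hi - lo
    return [min(1.0, max(0.0, (v - lo) / span)) for v in values]


def to_trigger(trajectory, threshold=0.5):
    """Emit a single-sample impulse (1.0) whenever the trajectory rises through the threshold."""
    out = []
    prev = 0.0
    for v in trajectory:
        out.append(1.0 if prev < threshold <= v else 0.0)
        prev = v
    return out


# Hypothetical per-sample spectral centroid values in Hz.
centroid_hz = [200.0, 800.0, 3000.0, 6000.0, 1500.0, 5200.0]
trajectory = normalize(centroid_hz, lo=0.0, hi=8000.0)   # continuous control
trigger = to_trigger(trajectory, threshold=0.5)          # discrete control
```

The trajectory keeps every intermediate value, whereas the trigger is 0.0 except on the single samples at which an event is detected.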

3.4.3 Speaker Manager

The Speaker Manager is responsible for managing the properties of speakers within

a given loudspeaker setup configuration. It provides various functions for querying

loudspeaker properties and selecting specific speakers, all of which are essential for the

envelope generator stage. The Speaker Manager holds standard properties as well

as additional specific properties of loudspeaker setups. Standard properties are inherent

to the speakers and loudspeaker setups. Specific properties are task-related


and serve to fulﬁll speciﬁc creative purposes. Depending on the current creative

intent, a deﬁned loudspeaker setup can have multiple sets of speciﬁc properties.

The standard properties include a unique identiﬁer for each loudspeaker and

the geometric features of the loudspeaker array. These features include the position

of each loudspeaker in Cartesian and spherical coordinates and the orientation.

In addition, standard properties may include other intrinsic qualities of a loud-

speaker, such as frequency response, distortion, or ﬁxed pre-processing settings

for each loudspeaker. This aspect has not yet been speciﬁcally considered in the

current concept, and it represents one of the possible directions for subsequent

development.
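
A minimal Python sketch of how the standard properties of one loudspeaker might be stored is given here. The class name, the field names and the coordinate convention (azimuth counter-clockwise from the front, elevation above the horizontal plane) are assumptions for illustration and do not describe the actual Zerr* data layout.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Speaker:
    """Standard properties of one loudspeaker (field names are illustrative)."""
    index: int          # unique identifier
    azimuth: float      # degrees, counter-clockwise from the front (assumed convention)
    elevation: float    # degrees above the horizontal plane
    distance: float     # metres from the reference point

    def cartesian(self):
        """Convert the spherical position to Cartesian (x, y, z)."""
        az = math.radians(self.azimuth)
        el = math.radians(self.elevation)
        x = self.distance * math.cos(el) * math.cos(az)
        y = self.distance * math.cos(el) * math.sin(az)
        z = self.distance * math.sin(el)
        return (x, y, z)


front = Speaker(index=1, azimuth=0.0, elevation=0.0, distance=2.0)
left = Speaker(index=2, azimuth=90.0, elevation=0.0, distance=2.0)
```

Storing one representation and deriving the other avoids inconsistencies between the Cartesian and spherical descriptions of the same position.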

Speciﬁc properties can be understood as parameters that deﬁne the relation-

ship of each loudspeaker to the other loudspeakers. These variables can be deﬁned

manually or calculated algorithmically. The initial investigation has identiﬁed

three properties that appear to be more suitable for current use, namely speaker

masks, speaker trajectory and speaker topology.

The speaker masks property determines the visibility of each loudspeaker,

with only those that have been unmasked included in the system. In large-scale

loudspeaker array setups, it can be advantageous to utilize only a subset of loud-

speakers. From a functional standpoint, the selection of only a subset of loud-

speakers can alleviate the computational load of real-time multi-channel audio

processing, and enables parallel computation over multiple devices. In terms of

creative ﬂexibility, the use of only a few speakers with speciﬁc distributions from

a massively homogenized speaker array allows for more atypical spatialization ef-

fects.

The speaker trajectory can be understood as a modernized expansion of the

speaker trajectory concept of Poème Électronique, as discussed in Section 2.3.3.

The interconnection of loudspeakers in a sequential manner establishes a

trajectory for sound to traverse. The movement of the sound on this speaker path

is controlled in real time by the control signal in trajectory format. The point of

expansion of the speaker trajectory in this context is that it focuses solely on the

logical ordering of speakers, rather than following a trajectory in real spatial

coordinates. This trajectory can be generated algorithmically based on the standard

properties. It is also recommended that the trajectory be deﬁned manually so that

a customized design can be created that provides an unnatural sound movement.

The speaker topology is analogous to a matrix of logical connections between

speakers. It is deﬁned by assigning each loudspeaker a list of the speaker identi-

ﬁers representing other loudspeakers linked to it. Thus, the speakers connected to

the current speaker can be used as potential destinations, i.e. the sound on the

current speaker can jump to one of them. The jump of the sound is determined

by the control signal in trigger format, and the jump occurs at the moment of

the trigger sample. In accordance with this deﬁnition, the connections between

the loudspeakers can be either bi-directional or uni-directional, thereby resulting


in a topology that is exceedingly complex. The deﬁnition of speaker topology al-

lows for the implementation of more ﬂexible mapping strategies that transcend

the limitations of geometric constraints, thereby accommodating unconventional

speaker setups.
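
The three specific properties can be pictured as simple data structures. The following Python sketch is a hypothetical representation for a five-speaker setup; the variable names, the identifiers and the helper functions are assumptions and are not the Zerr* implementation.

```python
# Hypothetical specific properties for a five-speaker setup (IDs 1..5).

# Speaker masks: only unmasked (True) speakers take part in the system.
speaker_mask = {1: True, 2: True, 3: False, 4: True, 5: True}

# Speaker trajectory: a purely logical ordering, not a geometric path.
speaker_trajectory = [1, 4, 2, 5]

# Speaker topology: each speaker lists the speakers a sound may jump to.
# 1 -> 2 is uni-directional; 2 <-> 4 is bi-directional.
speaker_topology = {
    1: [2, 5],
    2: [4],
    4: [2, 5],
    5: [1],
}


def active_speakers(mask):
    """Return the identifiers of all unmasked speakers."""
    return sorted(i for i, visible in mask.items() if visible)


def candidates(topology, mask, current):
    """Return the jump destinations of `current` that are not masked out."""
    return [i for i in topology.get(current, []) if mask.get(i, False)]
```

Because the topology is a list of outgoing links per speaker, a directed connection is expressed simply by omitting the reverse entry.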

The spatial eﬀects constructed through these three speciﬁc properties resem-

ble sound movement and distribution in the conventional sense when low speed

control signals are employed. However, once the velocity of the control signal

surpasses the threshold of human perception, the aforementioned properties eﬀec-

tively delineate the underlying topology of the spatial texture. In contrast to an

impenetrable wall of sound, they collaborate to form a sound mass in accordance

with Hagan’s definition (Hagan, 2017).

3.4.4 Envelope Generator

The Envelope Generator generates $N$ individual modulation signals $m_{1 \dots N}$, which

are contingent upon the signals from the Feature Processor and the properties

from the Speaker Manager. The outputs of the envelope generator are in the style

of unipolar envelopes (and windows) as described in (Wakefield & Taylor, 2022).

Two key considerations are required for the generation of multi-channel envelopes:

ﬁrstly, the selection of loudspeakers and secondly, the additional distribution pro-

cessing stages.

3.4.4.1 Speaker Selection

Zerr* employs amplitude panning to spatialize the input sound source. When am-

plitude panning is extended from merely relocating sound between two speakers or

two ﬁxed spatial positions to a more complex process involving multiple destina-

tions, a fundamental challenge emerges: how to determine the optimal destination

for the sound to subsequently move to. The channel-based logic enables the deci-

sion space for this problem to be constrained from continuous spatial coordinates

to discrete speciﬁc speaker locations. In accordance with the formats of the control

signals, the envelope generator should also operate in trajectory mode or trigger

mode. Two distinct selection approaches are delineated herein as trajectory map-

ping and trigger shifting.

3.4.4.1.1 Trajectory Mapping

In the trajectory mapping approach, the Envelope Generator takes the trajectory

control signal and maps it onto the speaker trajectory property. This mapping

strategy disperses timbre in space, which is essentially the same

eﬀect as timbre spatialization. Due to the variety of available audio features, tra-

jectory mapping is able to achieve more ﬂexible spatialization eﬀects than timbre

spatialization, which is mainly spectrum-based.

The selection of an appropriate interpolation method can have a profound

impact on the perceived audio quality. It will aﬀect the perceived smoothness of


the sound as it transitions from one speaker to another. At high speeds, longer

interpolation produces a more seamless sound ﬁeld. Conversely, a small amount

of interpolation or a sharp edge will result in a more granular sound ﬁeld.
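
A minimal Python sketch of trajectory mapping is shown here, assuming a linear crossfade between neighbouring speakers on the path. The function name and the choice of linear interpolation are assumptions for illustration; other interpolation shapes or widths would change the perceived smoothness as discussed above.

```python
def trajectory_gains(position, path, num_speakers):
    """Map a unipolar position (0.0..1.0) onto a speaker path with a linear crossfade.

    `path` is a list of at least two zero-based channel indices; the result
    holds one gain per output channel.
    """
    gains = [0.0] * num_speakers
    scaled = min(1.0, max(0.0, position)) * (len(path) - 1)
    lower = min(int(scaled), len(path) - 2)   # index of the segment start
    frac = scaled - lower                      # progress within the segment
    gains[path[lower]] += 1.0 - frac
    gains[path[lower + 1]] += frac
    return gains


path = [0, 3, 1, 4]  # hypothetical logical ordering of five channels
start = trajectory_gains(0.0, path, 5)
middle = trajectory_gains(0.5, path, 5)
end = trajectory_gains(1.0, path, 5)
```

Calling this function once per sample with the trajectory control signal yields the per-channel envelopes; a sharper interpolation would approach a hard switch between speakers.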

3.4.4.1.2 Trigger Shifting

In the trigger shifting approach, whenever a trigger sample from the control sig-

nal is encountered, the envelope target instantaneously shifts to a newly selected

loudspeaker. The Speaker Manager determines the destinations according to the

speaker topology property.

The decision-making method may be either Random or Nearest. In the Ran-

dom method, the next jump destinations are randomly selected from the available

candidate speakers, introducing an element of unpredictability to the speaker se-

lection. In the Nearest method, the next destinations are chosen based on prox-

imity, selecting the candidate speakers that are closest to the currently active

speaker. Similarly, the attack and release of the envelope have a profound eﬀect

on the auditory perception.
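
The two decision-making methods can be sketched in Python as follows. This is an illustrative assumption rather than the Zerr* implementation: the function names are hypothetical, a single destination is chosen per trigger, and "nearest" is interpreted as the smallest Euclidean distance to the currently active speaker.

```python
import math
import random


def next_speaker(current, topology, positions, method="nearest", rng=random):
    """Pick the next destination among the speakers linked to `current`."""
    options = topology[current]
    if method == "random":
        return rng.choice(options)
    # "nearest": smallest Euclidean distance to the currently active speaker.
    return min(options, key=lambda s: math.dist(positions[current], positions[s]))


def trigger_shift(trigger, start, topology, positions, method="nearest"):
    """Return the active speaker for every sample of a 0.0/1.0 trigger signal."""
    current = start
    active = []
    for t in trigger:
        if t >= 1.0:  # a trigger sample causes an instantaneous jump
            current = next_speaker(current, topology, positions, method)
        active.append(current)
    return active


# Hypothetical 2D positions and topology for three speakers.
positions = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (0.0, 3.0)}
topology = {1: [2, 3], 2: [3, 1], 3: [1]}
active = trigger_shift([0.0, 1.0, 0.0, 1.0, 1.0], 1, topology, positions)
```

In this sketch the active speaker changes only on trigger samples; attack and release shaping of the resulting envelope would be applied in a subsequent step.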

The fundamental distinction between the two modes pertains to the manner

in which the underlying structure of the spatial texture is to be understood. In

the case of Trajectory Mapping, a static binding of timbre to spatial coordinates

is employed, whereas in Trigger Shifting, a dynamic binding of timbre changes to

spatial changes is utilized.

3.4.4.2 Distribution Processing

Distribution processing encompasses a range of additional manipulation methods,

in addition to panning, which collectively inﬂuence the overall character of the

sound. The current design comprises two fundamental processing stages.

• Spread: Spread is similar to the definition of the signal skirt by Hagan (Hagan,

2017). The signal can be distributed to all other loudspeakers, in addition to

the central one, with a lower gain, thus creating a more immersive sound ﬁeld.

• Overall Gain: The overall gain can also be modulated. When connected to

a slowly varying signal, it introduces additional details. Conversely, when the

control signal varies rapidly, it can result in a signiﬁcant alteration of the orig-

inal sound. Since the parameter controls all speaker behavior, it changes the

overall spatial listening experience.
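
The two processing stages can be sketched on a single frame of per-speaker gains. The following Python example is a simplified assumption: spread is modelled as a gain floor relative to the loudest channel, and the function and parameter names are hypothetical.

```python
def apply_spread(gains, spread):
    """Leak a fraction `spread` (0.0..1.0) of the peak gain to all other channels."""
    floor = max(gains) * spread
    return [max(g, floor) for g in gains]


def apply_overall_gain(gains, overall):
    """Scale every channel by a common unipolar factor."""
    return [g * overall for g in gains]


gains = [0.0, 1.0, 0.0, 0.0]                      # sound on the second speaker only
spread_gains = apply_spread(gains, spread=0.25)   # others receive a lower gain
scaled_gains = apply_overall_gain(spread_gains, overall=0.5)
```

Both stages keep the gains unipolar, so they can be applied after either speaker selection approach without changing the format of the envelopes.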

3.4.5 Envelope Combinator

The envelopes created by the envelope generator serve as modulation signals for

the original audio input. The Envelope Combinator provides functions that facil-

itate the straightforward combination of sets of envelopes from disparate envelope

generators. This module is useful when more complex envelope generation strate-

gies are desired. It should be noted that this module is optional, as a single En-


velope Generator already provides suﬃcient information for dispersing the input

audio. The Envelope Combinator employs the scalability of unipolar envelope sig-

nals. Such signals retain their fundamental characteristics following mathematical

transformations. It is theoretically possible to cascade envelopes multiple times in

order to produce more complex modulation signals.
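
To illustrate why unipolar envelopes combine well, the following Python sketch merges two sets of envelopes channel by channel. The three combination modes and the function name are assumptions chosen for this example; each of them maps values in the range 0.0 to 1.0 back into the same range, so the result is again a valid unipolar envelope.

```python
def combine(env_a, env_b, mode="multiply"):
    """Combine two sets of unipolar envelope values channel by channel."""
    ops = {
        "multiply": lambda a, b: a * b,        # one envelope gates the other
        "max": max,                            # union of both envelopes
        "mean": lambda a, b: 0.5 * (a + b),    # equal-weight blend
    }
    op = ops[mode]
    return [op(a, b) for a, b in zip(env_a, env_b)]


env_a = [1.0, 0.5, 0.0]   # output of a first hypothetical Envelope Generator
env_b = [0.5, 0.5, 1.0]   # output of a second one
product = combine(env_a, env_b, "multiply")
union = combine(env_a, env_b, "max")
blend = combine(env_a, env_b, "mean")
```

Since every result stays unipolar, the output of one combination can serve as the input of the next, which corresponds to the cascading mentioned above.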

3.4.6 Audio Disperser

The Audio Disperser represents the final stage in the overall process. It generates

the individual loudspeaker signals $x^*_n$ by applying the corresponding modulation

signal $m_n$ to $x$:

$$x^*_n = x \cdot m_n$$

Subsequently, the modulated signals are distributed to the loudspeakers. In this

ﬁnal stage, the timbre characteristics of the input signal are combined with the

spatial properties of the modulation signals. Zerr* thus represents an approach

of achieving complete uniﬁcation of sound and space. The interplay of the input

signal, algorithm and loudspeaker conﬁguration allows performers to shape the

sound in texture, timbre and spatial behaviour simultaneously.
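
The dispersion step amounts to a sample-wise multiplication of the mono input with each modulation signal. The Python sketch below is a minimal illustration under that reading; the function name and the example values are assumptions.

```python
def disperse(x, envelopes):
    """Return one output signal per loudspeaker: x_n*[k] = x[k] * m_n[k]."""
    return [[s * m for s, m in zip(x, m_n)] for m_n in envelopes]


x = [0.5, -0.5, 0.25, -0.25]       # mono input signal
envelopes = [
    [1.0, 0.5, 0.0, 0.0],          # m_1
    [0.0, 0.5, 1.0, 1.0],          # m_2
]
outputs = disperse(x, envelopes)   # one list of samples per loudspeaker
```

Because the envelopes are defined per sample, the input can move between loudspeakers from one sample to the next.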

3.5 Discussions

The introduction of the functionalities of the modules that comprise Zerr* reveals

that it has the following main innovations: real-time spatialization through audio

features, sample-level processing, and support for non-conventional loudspeaker

arrays. These characteristics distinguish this approach from the majority of exist-

ing tools and render it an experimental endeavor. It is easier to create new sonic

experiences with it, but this inevitably leads to limitations in other aspects. The

following discussions will address the inherent advantages and disadvantages.

3.5.1 Creative Use of Audio Features

In the ﬁelds of audio content analysis and music information retrieval, the audio

feature serves as the fundamental basis for the construction of a system. Experts in

this ﬁeld possess a profound understanding of both hand-crafted audio descriptors

and audio embeddings derived from machine learning. A multitude of projects oﬀer

high-quality audio feature extraction algorithms, including Librosa (McFee et al.,

2015), Essentia (Bogdanov et al., 2013) and TorchAudio (Yang et al., 2021). The

high-quality libraries and the state-of-the-art algorithms and models are based on

the Python environment. The deployment and use of these models necessitate

the possession of adequate software engineering skills.

Additionally, there are libraries that provide audio analysis algorithms in audio

programming environments, both real-time and non-real-time (Collins, 2011;

Schnell et al., 2009). However, only a select few creative coders and music technol-

ogy researchers with a strong background in engineering are able to utilize these


algorithm libraries. These tools remain inaccessible to the average musician, in

part due to the complexity of audio feature knowledge and the diﬃculty of learn-

ing. However, this may also be attributed to the lack of a user-friendly workﬂow

and a comprehensive and clear tutorial for users with non-technical backgrounds.

To address this issue, the FluCoMa project has made a signiﬁcant contribution.

The Fluid Corpus Manipulation project (FluCoMa) employs novel method-

ologies for the creative exploitation of sound collections in musical contexts

(Tremblay et al., 2019; 2021). It integrates advancements in digital signal process-

ing algorithms and machine learning models into the toolkit for “techno-ﬂuent”

musicians, creative coders, and digital artists. FluCoMa oﬀers a comprehensive

methodology that elucidates the utilization of these modules within the toolkit

for audio analysis and sound decomposition, accompanied by corresponding work-

ﬂows for transforming a corpus of sound into music. Moreover, FluCoMa oﬀers

interactive introductions to each module, thereby facilitating the commencement

of the learning process²¹. In terms of implementation, FluCoMa was identified

at an early stage of development as a tool that can be used across a range of

creative coding environments. This allows users to utilise FluCoMa within their

familiar environments and integrate it with their bespoke workflows. It is evident

that the conceptual design and implementation of Zerr* are profoundly influenced

by FluCoMa. Like the FluCoMa workflow, Zerr* offers a comprehensive processing

flow with a variety of selectable algorithms tailored to user needs, allowing for

expansion with new algorithms as necessary.

²¹https://learn.flucoma.org/

Another issue that requires discussion is the necessity to control the spatial-

ization through audio features. The mapping of synthesizer parameters directly to

a spatialization algorithm is a relatively straightforward process that oﬀers clear

beneﬁts in terms of usability, similar to the topographic synthesis approach. In

this approach, the synthesizer parameters are directly distributed spatially. One

rationale for the utilization of audio features is that it permits a comprehensive

segregation of the sound source from the distribution system. A well-designed

mapping can be employed in conjunction with any source, and the sound source

can be altered in real time. This is a highly beneﬁcial strategy to enhance the

playability of improvisations, along with the ﬂexibility of the creative process.

Furthermore, synthesizer parameters do not directly correlate with audio features;

multiple parameters can aﬀect the same feature, and a single parameter can inﬂu-

ence multiple features. The utilization of audio features is merely an understanding

of timbre from a distinct parameter space, and is not interchangeable with the

mapping of synthesizer parameter methods. Moreover, for acoustic instruments,

audio features are the sole means of comprehending their sound and of controlling

them parametrically.

It is important to consider that when building Zerr* using multiple audio

features, there are correlations between diﬀerent audio features. A change in the


timbre of a source will result in a corresponding change in most of the audio

features. It is challenging to achieve precise one-to-one control unless one method-

ically selects or designs unrelated audio features. Once more, this can be regarded

as a shortcoming as well as a characteristic. Correlations between audio features

may complicate the interpretation of the current mapping relationships, yet they

may also enhance the overall experience of the spatialization eﬀect. In essence,

aside from the creator’s pursuit of interpretability of the system, the listener does

not actively perceive the spatialization effect of the Zerr* system from this perspective.

This will be covered in detail in Section 5 on listening tests. If one’s

objective is to achieve precise one-to-one control, then the use of audio features is

not an appropriate solution.

3.5.2 Sample-level Processing

The concept of sample-level has been repeatedly highlighted in the introductory

sections of the modules. The section on the Feature Processor module states that

both trajectory and trigger control signals are in audio-rate, indicating that the

control information can be carried on any sample. It can be observed that while

the control signal is routed to sample level, the spatial textures that are gener-

ated also exhibit a corresponding sample level response. This is evidenced in both

modes of operation, with the envelope edge having a signiﬁcant impact on the

spatial eﬀect.

To illustrate, if the rate of occurrence of trigger samples is not constrained and

the length of attack and release is reduced to a single sample, the high density of

jumps between loudspeakers creates a unique aural sensation that is only possible

with sample-level control. The crack sound, which is caused by the sudden jump

of the audio signal, is something that is generally avoided by signal processing

algorithms. Even when discussed in the context of electroacoustic music, this falls

into the category of sounds that most musicians do not prefer. However, the sam-

ple-level processing capability oﬀers the possibility of using this extreme sound in

a musical way.

It must be acknowledged that the resulting sample-level spatial textures are

not always aesthetically pleasing. In some instances, the parameters of the map-

ping system require meticulous adjustment in order to achieve a harmonious bal-

ance between experimentation and musicality. This makes designing a concrete,

usable Zerr* system a challenging task. While it is possible to improvise on a

system without scruples, the process of designing the mapping itself is a rather

tedious and time-consuming compositional process. This is, to some extent, a lim-

itation of Zerr*. In order to achieve the desired ﬂexibility in live performance, it is

necessary to invest a signiﬁcant amount of time in preparation in order to create

the mappings.


3.5.3 Irregular Loudspeaker Setups

The Zerr* approach abstracts loudspeaker setups as logically connected trajecto-

ries or directed graphs. This allows it to be used with any irregular loudspeaker

setups. The use of specially arranged loudspeaker systems is not an uncommon

occurrence. In many real-world scenarios, it is challenging to achieve a speaker

distribution that fully aligns with the design speciﬁcations, while also considering

the impact of room acoustics. Consequently, even with the implementation of an

object-based spatialization system, there is no guarantee of a completely consistent

sound experience across the creation and playback processes. Fine-tuning a scene

while relocating it may take as much time as creating it directly in the playback

sound ﬁeld. In the case of sound installations with special artistic requirements, the

distribution of the loudspeakers will be more heterogeneous and the composition

will be more challenging to relocate. However, there is no such issue as relocation

under Zerr*’s logic. All that is required is a highly customizable creation for the

particular loudspeaker setup at hand. The more specialized the loudspeaker system is,

the more Zerr*’s strengths can be utilized to fully realize the creative intent.

Another aspect that makes it possible to fully utilize the characteristics of

the speaker system is the complete abandonment of portability. This implies that

the music created with Zerr* can only exist in a speciﬁc speaker setup. The use

of virtual speakers to playback under other conventional speaker setups or to ren-

der to stereo will result in a signiﬁcant deterioration of the listening experience,

which is highly detrimental to the promotion of the work. The decision of whether

to employ Zerr* for the purpose of authoring is contingent upon a cost-beneﬁt

analysis.

3.5.4 “Incorrect” Usage

The signal ﬂow diagram of the Zerr* approach is merely a suggestion. The ex-

pansion of this basis or the adoption of only some of the modules represents a

departure from the original design of Zerr*. However, this “incorrect” use can also

facilitate creativity and produce eﬀects that cannot be realized by the standard

process. Two possible uses that come to mind will be presented here. The funda-

mental prerequisites for the viability of such initiatives are the highly modular

design of the Zerr* approach and the adaptability of the formats employed for signal

transfer between modules.

The universality of trajectory and trigger allows the control signal to

circumvent the audio feature modules and to employ alternative inputs or algo-

rithms. A signiﬁcant proportion of traditional automation processes can be imple-

mented with self-generated control signals. Alternatively, in light of the discussion

in Section3.5.1, it is possible to utilise signals derived from synthesiser parameters


for the purpose of precise control. The utilization of this approach may obscure

numerous advantages inherent to the Zerr* approach; however, when employed in

an appropriate manner, it can enhance musicality.

An alternative approach would be to extend the scope of the Zerr* method-

ology. It is not always necessary to regulate the spatial properties of a sound using

the audio features derived from the sound itself. A signal generated by one source

can be utilized to regulate another source. This is analogous to a generalized

sidechaining eﬀect. This type of cross-modulation can occur at either the stage

of inputting control signals or at the stage of fusing the source and modulation

signals. In my personal authoring practice, this method has proven to be highly

eﬀective, particularly when the controlling source exhibits stable spatial properties

while the controlled source undergoes signiﬁcant global spatial property changes.


Chapter 4

Implementation

Zerr* is not just a conceptual framework but also an ongoing software project.

While still under development during this thesis preparation, it is expected to

undergo long-term updates and improvements. The functions, parameter names,

and other elements mentioned in this chapter are based on the current version.

All of the code for Zerr* is open source under the MIT license and available on

the GitHub repository of the project²².

²²https://github.com/ringbuffer-org/Zerr

4.1 Aims and Priorities

Zerr*’s implementation was heavily inspired by the FluCoMa project. In their

paper, Tremblay et al. (2021) propose seven aims and priorities²³ to be considered

when building a toolkit, which also align with the implementation goals of the Zerr*

system. Here we introduce the most important ones in the context of the actual

situation of Zerr*.

²³Native integration, Consistency, Learnability, Configurability, Scalability, Breadth and Completeness

• Native Integration denotes the necessity for the tool to adhere to the established

conventions of the system in which it is embedded, while simultaneously

exhibiting the requisite ﬂexibility to transfer data across disparate frameworks.

The implementation should utilize the host system’s native interface to the

greatest extent possible, as this will facilitate the eﬀective transfer of the user’s

experience within the host system to the process of using Zerr*. Concurrently,

the implementation cannot be wholly contingent on the data structure of a

host. The system parameters must be capable of migration to another environ-

ment without consequence and must maintain the performance of the system

before and after.

• Completeness denotes the capacity of the system to provide a complete toolchain.

In each host system, all the core functions of the approach should be

realized with the functions provided by the Zerr* system and a few built-in tools,

without the necessity of relying on any other third-party libraries.

• Configurability denotes the capacity of users to modify numerous facets of

the system to align them with their individual requirements, thereby incorporating

their creativity into the system. Conversely, an implementation without

conﬁgurability implies a restricted range of ﬁxed processing preset algorithms

that can be utilized. In order to accommodate the static workﬂow, users must

alter the settings of other tools they utilize.

For further details on the introduction of additional aims and priorities, please

refer to the FluCoMa paper (Tremblay et al., 2021).


4.2 Modular Design & Profiles

The Zerr* system is comprised of two primary components: a series of core mod-

ules and encapsulations for various host environments. The core modules are

implemented purely in C++ and do not make use of any plug-in development

framework. In addition to the C++ standard library, they depend only on common

underlying libraries to handle standardized processes. The rationale

behind the decision to utilise a low-level and time-consuming development plan is

to guarantee that the fundamental functionality of Zerr* is self-contained, thereby

enabling it to be freely extended to diverse environments. The core modules are

divided in a similar manner to the modules of the signal flow diagram, as illustrated

in Figure 7. Each module, with the exception of the Speaker Manager, is

implemented as an individual signal processing unit whose input/output signals

operate exclusively at an audio rate. The Speaker Manager is a submodule within

the Envelope Generator. One advantage of this approach is the freedom to choose

how diﬀerent modules are combined and wrapped. In accordance with the spe-

ciﬁc requirements of the application scenario, modules may be incorporated into

an integrated audio client in accordance with the standard signal ﬂow diagram.

Alternatively, each module may be encapsulated as a separate audio client, with

signals being transmitted between modules via the audio server of the embedded

system. Moreover, the modular design permits the utilization in unconventional

ways, as mentioned in Section3.5.4.

The conﬁguration ﬁle serves as the foundation for Zerr* to facilitate cross-en-

vironment data transfer. Two distinct conﬁguration ﬁles are available for use: the

speaker array conﬁguration and the module conﬁguration. The former retains the

standard properties that were previously outlined in the Speaker Manager section.

The latter encompasses conﬁgurations for the core modules, including the selected

features and the speaker selection mode, among other options. The conﬁguration

ﬁles are in the YAML format. In some host environments, conﬁguration ﬁles are

indispensable. In other environments, the system can be parameterized without

relying on conﬁguration ﬁles, which are simply a means for cross-environment de-

ployment.

4.3 Core Modules

Each core module and audio feature algorithm is commented in Doxygen style²⁴,

allowing developers to use the generated Doxygen documentation for a deeper

understanding of the core modules. This section introduces the key details of the

implementation of each core module.

²⁴https://www.doxygen.nl/


4.3.1 Feature Tracker

The design logic of the feature tracker module draws inspiration from the Essentia

(Bogdanov et al., 2013) library. Each audio feature extraction algorithm is imple-

mented using a consistent class template and is accessed through a uniform calling

interface within the feature tracker module. This uniformity simpliﬁes the process

of adding new audio feature algorithms to the system. Homogeneous processes,

such as audio buﬀering and Fast Fourier transforms, are conducted in the main

call module to prevent redundant calculations. The audio features that the Feature

Tracker is expected to output are loaded dynamically via a parameter list when

the Feature Tracker module is initialized. A Feature Tracker is capable of calcu-

lating and outputting an arbitrary number of audio features concurrently. The

number of output channels from the module is identical to the number of audio

features speciﬁed in the parameter list, and the order of output is also identical.
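
The idea of a uniform calling interface with features selected through a parameter list can be sketched as follows. This Python example is only an analogy to the C++ design described above: the class names, the registry and the frame-based `compute` method are hypothetical and do not reproduce the actual Zerr* class template.

```python
import math


class Feature:
    """Common interface: every feature maps one audio frame to one value."""
    name = "feature"

    def compute(self, frame):
        raise NotImplementedError


class RootMeanSquare(Feature):
    name = "rms"

    def compute(self, frame):
        return math.sqrt(sum(s * s for s in frame) / len(frame))


class CrestFactor(Feature):
    name = "crest_factor"

    def compute(self, frame):
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        return max(abs(s) for s in frame) / rms if rms > 0.0 else 0.0


# Adding a new algorithm only requires a new subclass and a registry entry.
REGISTRY = {cls.name: cls for cls in (RootMeanSquare, CrestFactor)}


class FeatureTracker:
    """Instantiate features from a parameter list; one output channel per name."""

    def __init__(self, names):
        self.features = [REGISTRY[n]() for n in names]

    def process(self, frame):
        return [f.compute(frame) for f in self.features]


tracker = FeatureTracker(["crest_factor", "rms"])
values = tracker.process([0.5, -0.5, 0.5, -0.5])
```

The order of the returned values follows the order of the names in the parameter list, mirroring the channel ordering described above.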

It is important to note that Feature Tracker only makes use of design ideas

derived from Essentia, without resorting to Essentia’s source code. The instantaneous

audio features were developed independently based on the descriptions provided

by Lerch (Lerch, 2012). Table 2 enumerates the time-domain and spectral-domain

features that have been implemented thus far. The zero cross in this case

represents an unconventional sample-level feature. Its output is a trigger sample

whenever the audio signal crosses zero and is 0 for the remainder of

the time. In terms of functionality, this should be implemented by the Feature

Processor module. However, due to the extensive range of potential applications

for this sample-level feature, it is more practical to incorporate it into the Feature

Tracker.

Time Domain           Spectral Domain
Root mean square      Spectral flux
Zero crossing rate    Spectral centroid
Crest factor          Spectral rolloff
Zero cross            Spectral flatness

Table 2: Instantaneous features

4.3.2 Feature Processor

The Feature Processor does not provide a standard processing template analogous to that of the Feature Tracker, because the functionality it needs to implement is more varied. Consequently, the fundamental module for the Feature Processor is designed to provide input/output interfaces without any specific implementation. The manner in which the Feature Processor is integrated into a system determines whether it is implemented as a hard-coded component or defined through a YAML file for numerical operations. Alternatively, it can be constructed directly from standard blocks within the audio programming environment. Any calculation is permitted as long as the output control signals conform to the format requirements.

4.3.3 Speaker Manager

The implementation of the Speaker Manager comprises two distinct steps. Initially,

the Speaker class, which describes the speciﬁc parameters associated with a given

speaker, must be deﬁned. Subsequently, the Speaker Manager class is constructed

using the objects of the Speaker class as its core members.

The Speaker class is employed to store and query all loudspeaker standard

properties in a static and structural manner. All standard properties of the speaker

are stored in a YAML conﬁguration ﬁle and loaded upon initialization. In the

current version, the standard properties include the unique identiﬁcation index,

coordinate information (Cartesian and spherical) and orientation information rel-

ative to a spatial origin point. A single format for coordinates is suﬃcient; the

other is calculated automatically following the loading of the conﬁguration. The

orientation system encompasses two degrees of freedom, namely yaw and pitch.

All speaker array standard properties can only be queried and not modiﬁed after

initialization.
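A read-only speaker record of this kind can be sketched as a frozen data class. The field names, the angle convention (azimuth measured in the x-y plane from the x-axis, elevation from that plane, both in degrees), and the constructor names are assumptions made for illustration; the thesis does not specify them here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: properties can be queried but not modified
class Speaker:
    index: int
    x: float
    y: float
    z: float
    azimuth: float    # degrees (assumed convention)
    elevation: float  # degrees (assumed convention)
    distance: float
    yaw: float = 0.0
    pitch: float = 0.0

    @classmethod
    def from_cartesian(cls, index, x, y, z, yaw=0.0, pitch=0.0):
        """Derive the spherical coordinates from Cartesian ones."""
        distance = math.sqrt(x * x + y * y + z * z)
        azimuth = math.degrees(math.atan2(y, x))
        if distance > 0.0:
            ratio = max(-1.0, min(1.0, z / distance))  # guard against rounding
            elevation = math.degrees(math.asin(ratio))
        else:
            elevation = 0.0
        return cls(index, x, y, z, azimuth, elevation, distance, yaw, pitch)

    @classmethod
    def from_spherical(cls, index, azimuth, elevation, distance, yaw=0.0, pitch=0.0):
        """Derive the Cartesian coordinates from spherical ones."""
        az, el = math.radians(azimuth), math.radians(elevation)
        x = distance * math.cos(el) * math.cos(az)
        y = distance * math.cos(el) * math.sin(az)
        z = distance * math.sin(el)
        return cls(index, x, y, z, azimuth, elevation, distance, yaw, pitch)
```

Either constructor fills in the missing coordinate format, which mirrors the statement that a single format in the configuration file is sufficient.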

In contrast, all speciﬁc properties are located within the Speaker Manager

object and can be reassigned at runtime. Furthermore, the Speaker Manager implements all the control logic required for selecting loudspeakers and supplies the information needed for distribution processing. It processes the control signals

passed to it by the Envelope Generator on a sample-by-sample basis and instan-

taneously provides the speaker selection decision. Upon processing the trajectory

control signal, the Speaker Manager furnishes the current pair of speakers engaged

in the panning operation and the corresponding panning ratio. Upon the process-

ing of trigger control signals, the Speaker Manager will return the identiﬁcation

index of the selected speaker.
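How a trajectory control value could be turned into a speaker pair and a panning ratio is sketched below. The mapping is an assumption: the value in [0, 1] is spread evenly over the speakers in their listed order, and the `closed` flag (hypothetical) decides whether the last speaker pans back to the first, as in a ring. The actual selection logic of the Speaker Manager may differ.

```python
def select_pair(trajectory, speaker_ids, closed=True):
    """Map a trajectory value in [0, 1] to (from_id, to_id, ratio).

    ratio = 0.0 means the signal sits entirely on from_id,
    ratio = 1.0 means it sits entirely on to_id.
    """
    n = len(speaker_ids)
    if n < 2:
        raise ValueError("at least two speakers are required for panning")
    segments = n if closed else n - 1
    position = min(max(trajectory, 0.0), 1.0) * segments
    i = min(int(position), segments - 1)  # keep trajectory == 1.0 in the last segment
    ratio = position - i
    return speaker_ids[i], speaker_ids[(i + 1) % n], ratio
```

With linear panning, the two gains would then be `1 - ratio` and `ratio` for the first and second speaker of the pair.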

4.3.4 Envelope Generator

The number of multi-channel modulation signals the Envelope Generator outputs

corresponds to the number of speakers, and the order of output is consistent with

the order deﬁned in the speaker array parameter ﬁle. The selection of either tra-

jectory or trigger mode is dependent on the parameterization during initialization.

The Envelope Generator calls the corresponding method of the Speaker Manager

in accordance with the diﬀerent operational modes. The trajectory mode enables

real-time bending of the linear panning ratio, thereby allowing for precise control

of the generated modulation signals. In a similar manner, the length and curvature

of both the attack and release phases of the generated envelopes can be controlled

in real time within the context of trigger mode. Furthermore, the trigger mode

incorporates functions designed to standardize the input trigger signal. These in-

clude functions that enable the capture of only the rising edge of the step signal as

a trigger and functions that allow the user to deﬁne the minimum trigger sample

interval. As discussed in Section 3.5.2, the minimum interval can be user-defined and is set to 0 by default.
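The two trigger standardization functions can be sketched together. The function name, the threshold of 0.5 for deciding whether the step signal is high, and the output trigger value of 1.0 are assumptions made for this illustration.

```python
def standardize_triggers(signal, threshold=0.5, min_interval=0):
    """Keep only rising edges that are at least min_interval samples apart."""
    out = []
    previous_high = False
    since_last = None  # samples elapsed since the last accepted trigger
    for x in signal:
        high = x > threshold
        rising = high and not previous_high
        if since_last is not None:
            since_last += 1
        if rising and (since_last is None or since_last >= min_interval):
            out.append(1.0)
            since_last = 0
        else:
            out.append(0.0)
        previous_high = high
    return out
```

With the default `min_interval=0`, every rising edge is passed on, which corresponds to the default behaviour stated above.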

The two proposed distribution processing methods are also subject to regu-

lation by the audio-rate control signals, which accept trajectory control signals by

default. The calculation of the spread is conducted subsequent to the selection

of the speaker. The algorithm needs to access a precalculated speaker distance

table, generated by Speaker Manager after loading the speaker coordinates, to

perform calculations based on the distances between speakers. The range of the

trajectory control signal is limited to values between 0 and 1, which precludes the

possibility of representing actual spatial distance information. In order to stabilize

the spread eﬀect on speaker arrays with diﬀerent distance scales, it is necessary to

introduce a standard distance parameter. The calculation logic of Spread is thus

conﬁgured such that at 0 no energy is distributed to the other speakers, and at 1

the speakers at standard distance positions are allocated a predeﬁned proportion

of energy. Subsequently, the signal resulting from the completion of the spread

step is adjusted in order to achieve an optimal overall volume.
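One possible reading of the spread calculation is sketched below. Only the two boundary conditions come from the text: at a spread of 0 no energy reaches the other speakers, and at a spread of 1 a speaker at the standard distance receives a predefined proportion of energy. The exponential falloff with distance, the default proportion of 0.5, the power normalization, and all names are assumptions chosen to satisfy those conditions, not the algorithm used in Zerr*.

```python
import math


def distance_table(positions):
    """Pairwise Euclidean distances between speaker positions (x, y, z)."""
    return [[math.dist(a, b) for b in positions] for a in positions]


def spread_gains(selected, spread, distances, standard_distance, proportion=0.5):
    """Amplitude gain per speaker for one selected speaker and spread in [0, 1]."""
    energies = []
    for j, d in enumerate(distances[selected]):
        if j == selected:
            energies.append(1.0)
        else:
            # Assumed falloff: equals `proportion` at the standard distance
            # when spread is 1, and decays further with increasing distance.
            energies.append(spread * proportion ** (d / standard_distance))
    total = sum(energies)
    # Normalize so that the summed energy stays constant for any spread value.
    return [math.sqrt(e / total) for e in energies]
```

Because distances are divided by the standard distance, the same spread value behaves comparably on arrays of different physical scale.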

4.3.5 Envelope Combinator

The number of input and output channels of the Envelope Combinator is also determined based on initialization parameters. It is capable of combining any group of modulation signals having the same number of channels, channel by channel, in the order in which the channels are arranged. The Envelope Combinator offers a suite of fundamental numerical operations for the combination of signals, including summation, averaging, and maximization. In practice, the most useful operation is the product root calculation. Referring to the symbols defined in Figure 7, the calculation formula is as follows:

$$ m^{*}_{n} = \sqrt[k]{\left| \prod_{i=1}^{k} m_{(n,i)} \right|} $$
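The combination operations can be sketched for a single sample instant as follows. It is assumed here that n indexes the channel and i indexes the k modulation signal groups being combined; the function name and the mode strings are hypothetical.

```python
import math


def combine(groups, mode="product_root"):
    """Combine k equally sized groups of channel values, channel by channel."""
    k = len(groups)
    combined = []
    for channel_values in zip(*groups):
        if mode == "sum":
            combined.append(sum(channel_values))
        elif mode == "mean":
            combined.append(sum(channel_values) / k)
        elif mode == "max":
            combined.append(max(channel_values))
        elif mode == "product_root":
            # k-th root of the absolute product, as in the formula above.
            combined.append(abs(math.prod(channel_values)) ** (1.0 / k))
        else:
            raise ValueError(f"unknown mode: {mode}")
    return combined
```

A property of the product root is that a channel is silenced as soon as any one of the combined modulation signals is zero on that channel, while the root keeps the result in the same range as the inputs.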

4.3.6 Audio Disperser

The input format of the Audio Disperser is the sound source on the first channel, followed by the generated modulation signals. The sound source is multiplied with each modulation signal in turn to obtain the final output signals.
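The multiplication step can be written in a few lines. The sketch assumes block-wise processing with one list of samples per channel; the function name is hypothetical.

```python
def disperse(source, modulations):
    """Multiply a mono source block with each modulation channel.

    source: list of samples; modulations: one list of samples per speaker.
    Returns one output block per speaker.
    """
    return [[s * m for s, m in zip(source, channel)] for channel in modulations]
```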

4.4 Encapsulations

Two encapsulations of the Zerr* system have been developed so far: as JACK clients25 and as a Pure Data package. Other host environments, such as SuperCollider and Max/MSP, are still in the research and preliminary development phase. As for operating systems, binary releases have been successfully compiled on macOS (Intel), macOS (M1) and Linux. Since no Windows computers are currently used and there is no immediate need for Zerr* on Windows, support for this platform has low priority and is not yet available. For details on the progress of the development, please refer to the development schedule in the README of the Zerr* repository. This section focuses on the details of the Pure Data-based and JACK Audio Connection Kit (JACK)-based encapsulations, and discusses other host environments that may be worth exploring, along with their advantages and disadvantages.

25 https://jackaudio.org/

4.4.1 Pure Data Package

The Pure Data version is the most fully developed encapsulation available and is the version used in the subsequent evaluations and creations. In the Pure Data version, each core module is compiled into a separate external. The signal I/O and configuration interfaces are first wrapped as C data structures, in keeping with the Pure Data design concepts. All interfaces are then encapsulated according to the inlet/outlet, argument, and message specifications of Pure Data objects, and compiled as independent externals using pd-lib-builder26. The externals can then be used directly in the Pure Data patching environment. The signaling between the modules is managed by Pure Data. The following externals have been developed:

• zerr_features~
• zerr_envelopes~
• zerr_disperser~
• zerr_combinator~

26 https://github.com/pure-data/pd-lib-builder

All external names start with zerr to avoid naming conflicts with other libraries. Function names are simplified from the original module names to keep the external names short. All externals run under the PD sound engine, so their names end with the tilde symbol (~), following Pure Data’s convention. All audio-rate inputs and outputs are encapsulated as inlets and outlets of the external. All parameters necessary for module initialization are encapsulated as inline arguments. The parameters that can be modified at runtime are encapsulated as messages that can be received by the first inlet. An individual Feature Processor external is not included because its functionality can easily be achieved using built-in Pure Data objects, eliminating the need for an additional module.

Figure8 shows an example patch that contains all the necessary Zerr* ex-

ternals. The audio sources input by adc~ are connected to zerr_features~ and

zerr_disperser~ respectively. The zerr_features~ analyzes the spectral centroid,

the spectral rolloﬀ, and the spectral ﬂux of the incoming signal in real time. The

analysis results are entered to zerr_envelopes~ after passing through the numeri-

cal calculation module of the PD. The zerr_envelopes~ is currently set to trajec-

tory mode and reads the speaker array parameter ﬁle called “circulation_8.yaml”

from a relative path. The 8-channel modulation signals are sent directly to the

zerr_disperser~ to be merged with the audio source and sent to the 8 output

channels for direct control of the 8 loudspeakers.

Each external comes with a corresponding help patch that can be looked

up in Pure Data. It describes in detail how to use each external with basic ex-

amples and all supported formats of arguments & messages. Screenshots of the

help patches can be found in Appendix A. The pd-lib-builder also provides system-specific installation features, so the Pure Data package of Zerr* can be installed with a single command using this command-line tool. The package includes the

aforementioned externals, help patches, speaker array conﬁguration examples, and

continuously updated presets. The speciﬁc compilation and installation methods

are as described in the GitHub repository. Pure Data is best suited for exploring

diﬀerent combinations and connections of the basic building blocks and allowing

additional manipulation with built-in processing units. This makes it very easy to

explore speciﬁc techniques for using Zerr*.

Figure8: Example PD patch for eight loudspeakers

4.4.2 JACK Client

The earliest stage of Zerr* was based on the JACK Audio Connection Kit (JACK). Its development currently lags behind the PD version. There are two main reasons

why the development was not started in Pure Data in the ﬁrst place. One is that

developing with the underlying audio server avoids limiting the design thinking

to PD mode too early, which is more helpful for migrating to other environments

later. Another beneﬁt is the ability to use newer compilation tools that provide

more eﬃcient compilation and clearer debugging information. This also eliminates

the hassle of re-entering the PD environment for testing after each update.

In the JACK implementation, all modules are internally buﬀered and con-

nected according to the standard signal ﬂow, forming a cohesive system signal pro-

cessing unit. The JACK audio client essentially encapsulates the system’s overall

inputs and outputs. The signal ﬂow between internal modules cannot be dynami-

cally modiﬁed. The processing methods in the Feature Processor must be written

according to the algorithm that is currently in use. The module parameters are

read exclusively from the conﬁguration ﬁle during system initialization. In sum-

mary, the JACK Client version is not a programmable system. Instead, it consists

of a series of command-line programs compiled with ﬁxed defaults. Each program,

once started, can execute only one constant processing algorithm.

The JACK version is much less ﬂexible than using Zerr* in Pure Data. How-

ever, there are speciﬁc situations where the JACK version can be advantageous.

When it comes to controlling very large arrays of speakers, graphical programming

can be extremely tedious, and the layers of encapsulation are not as computa-

tionally eﬃcient as communicating directly with the underlying audio server. In

addition, using JACK under Linux makes it easy to route audio signals between

diﬀerent software. The convenience it oﬀers is no less than that of Pure Data.

The JACK version is still highly recommended when a stable deployment of the

Zerr* system is desired. The recommended workﬂow is to validate the algorithm

in a programming environment such as Pure Data. The proven and reliable Zerr*

system can then be transferred to JACK for deployment. Although the internal

signal routing is not changeable, it is still possible to switch the Zerr* system to

use various loudspeaker systems via the conﬁguration ﬁle.

4.4.3 Encapsulations in Development

Max/MSP and SuperCollider are the environments that will be explored in the

future, and whether or not to make Zerr* a regular audio plug-in like VST3 is still

up for discussion.

Max/MSP comes from the same root as Pure Data and is very similar in its

use. Encapsulation on Max/MSP can directly apply all the design ideas of the

PD version. The advantages and disadvantages of Max/MSP over Pure Data are

obvious. First, it has better multichannel support, which eliminates the need for

tedious patching. Second, Max/MSP has a larger user base and a more robust

community operation that helps to promote the Zerr* system. However, being a

closed-source software, it is not friendly to some users who want to customize their

personal systems in depth. SuperCollider seems to be more ideal, as it guarantees

both open source features and stable multi-channel support. However, the logic of

SuperCollider’s UGen is slightly inconsistent with objects in graphical program-

ming environments. How to maintain the consistency of the Zerr* system while

adapting to the habits of SuperCollider users will be a challenge for future devel-

opment.

An audio plug-in is not a suitable carrier for the Zerr* system, either in terms of functionality or usage. While a digital audio workstation is centered on arrangement, Zerr* is essentially a performance system. One of the more attractive aspects of plug-ins is that development frameworks such as JUCE27 offer very strong support for developing graphical interfaces. This is an area in which other audio programming software is deficient or unconcerned. In certain instances, visual feedback is as crucial as audio feedback for digital instruments. This is because, in many cases, consistent, tangible feedback is not available. Section 2.4.1.1 mentions the effect of a visualized trajectory on a person’s perception of the spatial location of a sound source. For more abstract spatial effects such as those of Zerr*, corresponding graphical feedback would make the system more accessible to non-technical users by letting them visualize the results that can be produced under the current system settings.

27 https://juce.com/

4.5 Discussion

The introduction of the Zerr* implementation allows us to conclude that the current implementation is capable of fulfilling the aims and priorities outlined in Section 4.1. It constitutes a complete workflow in its own right and has demonstrated its capacity to integrate natively in a variety of environments. Furthermore, it offers a plethora of configurable parameters that permit users to fully exploit their creative potential. As part of a master’s thesis, Zerr* was developed independently, which made it challenging to maintain the project’s overall quality and pace of development in line with other community-driven open source software. The objective is to keep functionality, encapsulation, and documentation as consistent as possible. However, the current development focus is still on applications under Pure Data.

In addition to the development of the Zerr* system, a listening test was conducted to gather feedback from individuals with diverse backgrounds. The subsequent chapter provides a comprehensive account of the design, process and analysis of the listening test.

Chapter 5

Evaluation

In order to evaluate the concept and implementation of Zerr* in a variety of contexts, an on-site listening test was conducted with participants from a diverse range of backgrounds. This chapter describes the listening

test in four parts: the ﬁrst outlines the test objectives, the second discusses the

experimental design, the third details the arrangement and implementation, and

the last analyzes participant feedback.

5.1 Goals & Expectations

The objective of this listening test is to obtain feedback on Zerr* from individuals

representing diverse backgrounds, which will inform the subsequent development

of the project. The feedback comprised two interrelated aspects: an assessment of the conceptual framework underlying the Zerr* approach and an evaluation of the efficacy of the Zerr* implementation.

The conceptual evaluation assessed participants’ understanding of the Zerr*

method, examining whether it meets the design expectations. This included eval-

uating the feasibility and usefulness of spatial and timbral coupling for enhancing

creativity, the eﬀectiveness of controlling spatial properties through audio varia-

tions in reducing cognitive load during live performances, and importantly, Zerr*’s

ability to produce experimental and unique sounds with recognizable musicality.

The assessment of the Zerr* system’s efficacy includes two key elements: firstly, the participant’s ability to quickly understand the system’s operational logic, and secondly, their capacity to effectively use the system. Zerr* is clearly an experimental instrument, and its performance evaluation is likely to vary significantly among participants from diverse backgrounds. Another key objective of this listening test was therefore to differentiate and categorize the feedback based on participants’ experiences.

5.2 Study Design

5.2.1 Test Environment

The listening tests for Zerr* were conducted in the TU-Studio E-N 325, which was specifically designed for the purpose of sound field synthesis and multichannel music research and production28. As illustrated in Figure 9, the studio features three distinct speaker array systems: a standard eight-channel ring loudspeaker setup, a 21-speaker Ambisonics dome, and an extensive Sound Field Synthesis system. Additionally, it includes two subwoofers positioned at the front and rear. All speaker systems connect to a PC via a MADIface USB audio interface. The Zerr* listening test utilizes these three speaker systems but excludes the subwoofers. The E-N 325 system operates in two modes: the SeamLess Mode, developed by the studio team (Coler et al., 2021), and the Direct Mode, which controls the channels for each loudspeaker directly.

28 https://tu-studio.github.io/studio-docs/EN325/

Because Zerr* is a channel-based system, the studio must operate in Direct Mode. Currently, Direct Mode only supports direct control of the octa and Ambisonics setups, and does not allow independent control of each channel in the WFS system. The issue was resolved with the studio technician’s help by creating a new Dante configuration file for the listening test in SeamLess Mode. This allowed the 64 WFS-system loudspeakers to be mapped directly to the 64 channels of the audio interface, bypassing the SeamLess system for direct control. As a result, switching between the two modes is necessary during the listening test to fully utilize all three loudspeaker systems.

The 64 loudspeakers in the chosen WFS system are the mid-high frequency units in the 8 WFS panels on the right of Figure 9. Each panel has 8 loudspeakers arranged horizontally. The 64 loudspeakers thus form a horizontal linear loudspeaker array, an unconventional setup compared to the studio’s other two loudspeaker systems.

Figure9: TU-Studio E-N 325 © TU Studio Team

5.2.2 Test System

The Zerr* system was tested using the Pure Data version. In order to fully express the characteristics of Zerr* in the shortest possible time, four presets were created. The presets were designed with simplicity in mind, so that anyone unfamiliar with them can understand them simply by trying them out. Each preset employs a fundamental audio synthesis algorithm as the input sound source, with each algorithm offering only three parameters that can be manipulated. The Feature Tracker extracts a single feature of the current input and uses it to regulate one aspect of the Envelope Generator.

Figure10: Synthesis algorithm patch for Zerr* listening test

The ﬁrst preset input is a square wave signal that passes through a low-pass ﬁlter.

The ﬁrst parameter controls the fundamental frequency of the square wave, the

second parameter controls the cutoﬀ frequency of the low-pass ﬁlter, and the third

parameter controls the duty cycle of the square wave. This signal is analyzed to obtain its crest factor, which is normalized and used to control the speaker selection process. The Envelope Generator is configured to operate in trajectory mode with the configuration file of the linear array loaded. A source with a lower crest factor is assigned to the left side of the line, and a source with a higher crest factor to the right.

The second preset input is a standard frequency modulation (FM) synthesis

algorithm. The ﬁrst parameter is the fundamental frequency of the carrier signal,

the second parameter is the modulation frequency, and the third is the modula-

tion depth. The Feature Tracker module is responsible for extracting the spectral

centroid. The resulting spectral centroid frequency is wrapped and rescaled be-

tween 0 and 1 with a period of 400Hz. Based on the wrapped centroid frequency,

the envelope generator assigns the input source to the 8-channel ring system in a

clockwise direction from the lowest to the highest frequency.
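The wrapping and rescaling of the centroid frequency can be read as a modulo operation followed by a division. The sketch below assumes exactly that reading; the function name is hypothetical, and in the test this step was built from Pure Data objects rather than code.

```python
def wrap_centroid(centroid_hz, period_hz=400.0):
    """Wrap a centroid frequency to one period and rescale it to [0, 1)."""
    return (centroid_hz % period_hz) / period_hz
```

Under this reading, centroids of 100 Hz and 500 Hz both map to 0.25 and therefore to the same position on the ring, so a steadily rising centroid makes the source circle the ring once every 400 Hz.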

The third preset input is a combination of ring modulation and basic additive synthesis. The fundamental waveform is a 440 Hz sine wave. Increasing the first parameter raises the relative level of the harmonics at integer multiples of the fundamental. The signal is then ring modulated with a sinusoidal signal. The second parameter controls the modulation frequency, and the third parameter controls the modulation amplitude. The spectral flatness of the input signal is analyzed and rescaled. The Envelope Generator is set to trigger mode and configured with the linear array. The current speaker position is manually set to the center of the loudspeaker line. The control signal is employed solely for manipulating the dispersion of the source.

The fourth preset combines a low-frequency oscillator with superimposed noise. The superimposed noise is white noise passed through a resonant low-pass filter with a cutoff frequency of 20 kHz. The first parameter controls the gain of the noise. The second parameter controls the frequency of the low-frequency oscillator, which ranges from 0.1 Hz to 60 Hz; the upper end already lies within the frequency range audible to the human ear. The third parameter controls the resonance level of the low-pass filter, which acts as a band-pass filter to some extent.

All Zerr* presets are separate Pure Data patches that are not visible to participants. The participants interact solely with a heavily encapsulated patch of the synthesis algorithms, as illustrated in Figure 10. The four columns of this patch, from left to right, correspond to the four basic synthesis algorithms described above. Each of the four views displays the waveform produced by the synthesis algorithm, and the three sliders below it correspond, from top to bottom, to the three tunable parameters of that algorithm. The patch deliberately omits the names of the synthesis algorithms and of the parameters, so that only the waveform can serve as a hint. The control panel on the far right provides basic system controls and a specialized matrix audio routing system. The six vertical labels represent mute (m), all sources (a), and synthesis algorithms 1 to 4. The five horizontal labels correspond to stereo playback (s) and Zerr* presets a to d. The routing system allows any audio source used in the listening test to be sent instantly to any desired playback system. The complete set of Pure Data patches and configuration files used in the listening test can be found in the attached files. For the reader’s convenience, screenshots of the four Zerr* preset patches are provided in Appendix B.

Figure11: MIDI controller for synthesis algorithm patch

The synthesis algorithm patch is controlled by participants via a MIDI controller (AKAI MIDIMIX), as illustrated in Figure 11. The four columns on the left of this controller are directly aligned with the four columns of the patch. In each column, the three knobs control the three synthesis parameters from top to bottom, and the bottom slider controls the volume of that synthesis algorithm.

5.2.3 Experience Assessment

In order to obtain a more accurate proﬁle of the participants, the listening test

included a number of assessments of the participants’ relevant experiential back-

grounds. The speciﬁc types of experience involved include musical experience,

spatial audio experience, synthesis algorithm experience, and audio analysis ex-

perience. The musical experience of the participants was evaluated through self-

report scale questions. The spatial audio experience was gauged by requesting

that the participants listen to six audio clips and identify spatial sound patterns.

The participants’ knowledge about synthesis algorithms and audio analysis was

assessed through quizzes comprising 12 questions each.

The Goldsmiths Musical Sophistication Index (Gold-MSI) was used to assess the musical experience of the participants (Müllensiefen et al., 2014). The comprehensive Gold-MSI self-report inventory comprises 31 scale questions, 8 single-choice questions (mixed with text input), and supplementary personal data fields. A total of 15 scale questions and four single-choice questions were selected on the basis of a variety of considerations, including their appropriateness and the duration of the listening test. The fundamental premise of Gold-MSI is to evaluate differences in individuals’ musical literacy in the conventional sense. Consequently, some of the questions in the questionnaire are not pertinent to the current context. The majority of the questions that were removed pertained to self-assessment of the level of knowledge and training in tonal music. These included the ability to sing melodies accurately and to recognize rhythmic errors. The remaining questions pertain to more general aspects, such as active engagement and emotions. Furthermore, a multiple-choice question was added to identify the music production tools that participants use most frequently. The questions included in the musical experience evaluation form used in the listening test are shown in Table 3.

Six audio clips were prepared to assess participants’ experience of recognising

diﬀerent spatial properties of sound. The six audio clips were patched in Pure

Data, and placed on the right side of the main interface of the synthesis algorithm

patch. Participants can trigger the clips and listen to them on their own. The six

clips are organized into three groups of two, each testing a diﬀerent spatial prop-

erty. The ﬁrst set tests the ability to recognize the orientation of sounds. The ﬁrst

clip was a sine wave from directly in front, the second was a sine wave from directly

to the right. The second set tests the ability to detect the size of the sound source.

The third clip is a sawtooth wave played through only one speaker directly in

front, while the fourth clip is a sawtooth wave played simultaneously through nine

speakers in front, with amplitude normalized. The third group tests the ability to

recognize the movement of the sound source. The fifth clip is white noise with a 0.01 Hz sawtooth envelope moving clockwise around the speaker ring from a top-down perspective. The sixth clip is identical, but moves in a counterclockwise direction. Each audio clip has a corresponding question to answer.

Scales

• I spend a lot of my free time doing music-related activities

• I sometimes choose music that can trigger shivers down my spine.

• I can sing or play music from memory.

• I’m intrigued by musical styles I’m not familiar with and want to ﬁnd

out more.

• Pieces of music rarely evoke emotions for me.

• I can compare and discuss diﬀerences between two performances or ver-

sions of the same piece of music.

• I often read or search the internet for things related to music.

• I often pick certain music to motivate or excite me.

• I am able to identify what is special about a given musical piece.

• I don’t spend much of my disposable income on music.

• I can tell when people sing or play out of tune.

• When I hear a music I can usually identify its genre.

• I would consider myself a musician.

• I keep track of new music that I come across (e.g. new artists or recordings).

• Music can evoke my memories of past people and places.

Single-choice

• I listen attentively to music for 0-15 min / 15-30 min / 30-60 min / 60-90 min / 2 hrs / 2-3 hrs / 4 hrs or more per day.

• I have attended 0 / 1 / 2 / 3 / 4-6 / 7-10 / 11 or more live music events as an audience member in the past twelve months.

• I can play 0 / 1 / 2 / 3 / 4 / 5 / 6 or more musical instruments.

• The instrument I play best (including voice) is ____

Multiple-choice

What’s your mostly used music production tools?

• I don’t make music

• Song writing with main instruments (Guitar, Piano etc.)

• Music Notation Software (Musescore, Sibelius etc.)

• Digital Audio Workstation (Logic Pro, Ableton, Reaper etc.)

• DAWless (Modular Synthesizer, Sampler, Groove Boxes etc.)

• Audio Programming (Pure Data, Max/MSP, SuperCollider etc.)

• Other

Table3: Musical experience evaluation form

The questions to assess knowledge of synthesis algorithms and audio analysis were developed in collaboration with the ChatGPT artificial intelligence system29. ChatGPT was tasked with generating a substantial number of questions based on the knowledge points covered in the listening test; the generated questions were then thoroughly reviewed and refined to meet the specified requirements. To illustrate, ChatGPT was requested to generate in excess of 100 questions on a range of topics related to subtractive synthesis, FM synthesis, ring modulation, and LFO, among others. A total of 12 questions were selected from the pool to form the final quiz, with each knowledge point represented by two to three questions. The selection of questions was based on the following criteria: no factual errors, clear wording and moderate difficulty, and exclusive relevance to the synthesis algorithms. All questions that pertained to the use of synthesizers and the specific techniques employed in music production were excluded. Similarly, the questions in the audio analysis quiz are dominated by the audio features from the Zerr* presets used for the listening test, supplemented by a small number of other audio features available in the zerr_features~ external. The questions pertain solely to the algorithmic details and not to the use of the algorithms. From a personal standpoint, this approach to utilising large language models appears to be a reasonable one. The complete set of 2×12 quiz questions is provided in Appendix C, with the correct answers in bold.

29 https://chat.openai.com/

5.2.4 Test Process

Before the listening test itself, participants were introduced to the test system and the test environment, and their experience was assessed. Introductions and assessments were interleaved so that participants did not become overly preoccupied with the project, which could have impaired their concentration. Before each assessment, participants were asked to indicate their self-perceived level of experience. To keep the data unbiased, they were instructed to answer the questions even if they believed they lacked the relevant experience. The interleaved introduction and assessment process was as follows. The session began with the musical experience test, followed by an introduction to the TU Studio and the loudspeaker systems. Participants were then introduced to the Pure Data patch and the MIDI controller, leading into the quiz on sound synthesis algorithms. The session concluded with a brief overview of Zerr* and the audio analysis quiz.

The listening tests for the four Zerr* presets followed a standardized proce-

dure. Initially, the sound source was routed to a two-channel system, using the

front two speakers from the ring system. Participants had one and a half minutes

to manipulate the sound source and were then asked to provide a textual descrip-

tion of the sound produced by the synthesis algorithm and the role of the three

parameters, focusing on either technical or perceptual aspects. Next, the sound

source was routed to the corresponding Zerr* system for spatialization. Partici-

pants were given two minutes to explore changes in timbre and spatial attributes.

They were then asked to describe the diﬀerences between the original and spatial-

ized sound and analyze the connection between timbre and spatial properties. This

was followed by scale questions assessing the perceived strength of the coupling

between timbre and spatial attributes and their eﬀectiveness in controlling spatial

attributes through the synthesis parameters. After responding, participants were

informed about the speciﬁc synthesis algorithms used and the functions of the

three knobs, as well as the role of the Zerr* system. Finally, they rated how well

these technical backgrounds matched their auditory experience.

This was followed by a comprehensive feedback session, during which participants were asked to provide quantitative ratings of various aspects of the Zerr* system and to offer their subjective opinions. The ten questions in the final session and their corresponding types are presented in Table 4.

No.  Type    Question
1    Rank    From the tests conducted, which one demonstrated the most significant coupling between sound and spatial properties? (Test1, Test2, Test3, Test4)
2    Scales  How challenging was it for you to grasp the concept of Zerr*? (very hard → very easy)
3    Text    Were there specific aspects of Zerr* that you found particularly difficult to understand?
4    Text    Have you encountered any workflow discomfort or technical difficulties while playing with Zerr*?
5    Scales  In your view, does this sound-spatial coupling with Zerr* enhance or detract from the overall musical experience? (detract → enhance)
6    Scales  Do you believe that the ability to control spatial properties via sound properties simplifies the process of composing or improvising music? (totally agree → totally disagree)
7    Scales  Do you think Zerr* offers expanded possibilities for composing or improvising music? (not at all → yes for sure)
8    Text    In your opinion, what unique features does Zerr* possess that distinguish it from existing spatial audio systems?
9    Text    Do you envision any application of Zerr* in contexts like sound installations or live improvised music?
10   Text    Are there any additional comments, suggestions, or ideas you would like to share?

Table 4: Comprehensive feedback session

5.3 Procedure

5.3.1 Questionnaire

The complete process of the listening test was presented to the participants via

an online questionnaire on a laptop computer in the studio. Theoretically, partici-

pants could complete the entire listening test independently based on the instruc-

tional information provided on the questionnaire. In order to streamline the test,

I have endeavoured to minimise the verbal narrative component and have only

included explanations where necessary for sections that were not clearly expressed

in the text.

The questionnaire used for the listening test is hosted on Typeform30. Typeform, a business-oriented questionnaire service, was selected for two reasons. Firstly, the faculty needed considerable time to respond to the application for an account on the student questionnaire system. Secondly, Typeform offers a superior user experience for both editing and filling in questionnaires. It can be argued that using a commercial questionnaire service does not detract from the credibility of academic research. Part of the questionnaire is shown in Table 5, which illustrates four common questionnaire elements: description, scale question, choice, and text input.

30 https://de5qfywm15f.typeform.com/to/Nj0OeAzs

Table5: Questionnaire Screenshots

5.3.2 Recruitment

The participants were recruited through a number of channels, including the fac-

ulty’s mailing list, social media platforms, live coding and spatial audio commu-

nities, etc. The recruitment and testing process commenced on February 27, 2024,

and concluded three weeks later.

A total of 18 individuals participated in the listening test. The participants

were drawn from a diverse range of backgrounds, including those engaged in audio

technology studies, those working as sound masters, musicians in bands, electronic

music producers, and music enthusiasts with no relevant experience. The mean

duration of participation was approximately 60 minutes. The unprocessed results

of the questionnaires can be found in the attached documents. In consideration of

the protection of individual privacy, no personal data is included in the results.

Participants were asked to provide their personal information on a sheet of paper in the studio. Each participant was assigned a unique index that links them to the questionnaire they completed. The personal information on the sheet of paper will be destroyed after the completion of the thesis.

5.3.3 Test Scenario

As illustrated by Figure9, each participant was positioned in the center of the

room. A MIDI controller is positioned in front of the participant. The laptop

- 75 -

screen is employed to present the questionnaire. The principal external monitor is

employed for displaying the synthesis algorithm patch. I observe the progress from

a position behind the participant without disturbing them. In the meantime, I

implement adjustments to the various conﬁgurations in accordance with the pro-

gression of the test and respond to any inquiries that the participant may have.

Following the completion of the tests, further communication will be initiated with

those participants who have demonstrated a high level of interest. Further details

regarding the project will be presented, with a particular focus on the concept of

design

5.4 Analysis

The Typeform system offers straightforward data analysis and visualization tools. Moreover, the unprocessed results can be exported in Excel or CSV format. The data were then further processed and analyzed using the Python libraries Pandas and Matplotlib. The results were analyzed along two dimensions: first, the overall feedback from the participants; second, the differences between participants with different experience backgrounds. It is important to note that this listening test is not subjected to quantitative statistical analysis such as hypothesis testing. Instead, the analysis concentrates on a qualitative discussion of each participant’s feedback. The reason is the small sample of 18 participants with diverse backgrounds, which makes reliable statistical analysis difficult. Additionally, the test included many subjective questions requiring textual responses, which are best analyzed in the context of each participant’s unique background.
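The counting step that underlies bar charts of scale responses can be sketched in Pandas as follows. This is a minimal illustration under stated assumptions, not the script used for this thesis: the column names `participant`, `test` and `coupling`, the five-point scale, and all response values are hypothetical stand-ins for the Typeform export, which would in practice be loaded with `pd.read_csv`.

```python
import pandas as pd

# Hypothetical stand-in for the exported questionnaire results.
# With a real export one would start from pd.read_csv(<exported file>);
# the column names and the 1-5 scale are assumptions for illustration.
responses = pd.DataFrame(
    {
        "participant": [1, 1, 2, 2, 3, 3],
        "test": ["Test1", "Test4", "Test1", "Test4", "Test1", "Test4"],
        "coupling": [3, 5, 4, 5, 3, 4],
    }
)

# Count how often each scale value was chosen in each test.
counts = pd.crosstab(responses["coupling"], responses["test"])

# Include scale values nobody selected so every chart shares one x-axis.
counts = counts.reindex(range(1, 6), fill_value=0)

print(counts)
```

Each column of `counts` then holds the distribution of one test over the scale, and `counts.plot.bar()` would draw it as grouped bars through Matplotlib.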

5.4.1 General Feedback

5.4.1.1 Feedback for the Presets

The Figure12 represents the bar charts for the quantization questions in the four

Zerr* Prests tests. The graphs correspond to the three questions in order, from

top to bottom. The quantitative scale of the question is represented on the x-axis,

while the number of results for each value is represented on the y-axis. The four

colored bars represent the results of the four diﬀerent tests. The results of the text

input questions can be found in Appendix D.

As the distribution in the first question’s graph shows, participants were generally able to perceive a strong correlation between the timbral and spatial properties. The coupling was most apparent in the fourth test, followed by the first and second, and least apparent in the third. This outcome matches what was anticipated when the four tests were designed. The first and second spatialization patterns are the spatial movement of the sound source and the dynamic spatial distribution of the timbre, respectively. The third pattern is a change in sound shape while the spatial position of the sound source remains fixed. Humans recognize sound movement more readily than sound shape. The fourth test, however, produced a pronounced difference in auditory perception before and after spatialization, and the sound effects were highly responsive to the synthesizer parameters.

The responses to the second question were more widely spread than those to the first. The prevailing view was that controlling spatial properties through the synthesis parameters is relatively straightforward. It can reasonably be assumed that the underlying logic of controlling spatial properties via sound will not pose any significant challenge for the performer. In the first and second tests, a greater proportion of participants felt unable to control the system effectively. This parallels the discussion of the first question: dynamic features are clearly perceived, particularly when instabilities in the mapping become apparent. Both the first and second presets employ the trajectory model, which directly links timbre to space. Linear parameter adjustments result in non-linear spatial changes, which complicates control in these setups.

Figure12: Feedback for the Zerr* presets

Attitudes towards the relationship between technological context and perceived

experience varied widely among participants. The majority felt there was con-

sistency, but a signiﬁcant number perceived a low match. This could be due to

participants’ limited prior knowledge or the inherent diﬃculty of discerning tech-

nology through sound or conceptualizing sound based on techniques. Analysis of

the ﬁrst and third graphs revealed that participant experience remained robust,

even without any interpretability of the system. This insight is valuable for fu-

ture exploration of the application, suggesting that achieving a perceived unity

of sound and space does not necessarily require explicit, understandable mapping

relationships.

A qualitative understanding of Zerr*’s impact across four system types can

be derived by comparing participants’ attitudes towards the original and spatial-

ized sounds, as captured in the textual descriptions from the four tests. However,

not all feedback was informative; some participants misunderstood the textual

description task, and others showed limited engagement in the tests.

In Test 1, participants initially described the original sound using metaphors

like “hair shaver” and “alarm,” or adjectives such as “sharp” and “noisy.” Those

with synthesizer experience could identify the sound as a square wave signal and

accurately understand the purpose of the three parameters. After spatialization,

all participants noted the movement of sound in space, with most describing timbre changes as the sound moved from left to right. However, few participants could predict which specific timbre changes were associated with spatial movements before the explanation was given. The closest guess was that spatialization involves the sharpness of the sound rather than a change in frequency.

In Test 2, participants familiar with FM synthesis could identify its unique qualities and accurately detail the use of the three parameters. While adjusting these parameters, they noted an increase in the complexity and richness of the sounds. It is evident from the language of the descriptions that sounds spatialized in this manner afforded the participants novel and gratifying spatial and sonic experiences. Unable to ascertain the precise spatial location of the sound source, participants naturally began to describe patterns of distribution and variation of timbre in space. Speculations about the realization method ranged from pitch and frequency adjustments to filtering and phase shifting, demonstrating a variety of understandings. The diversity of the descriptions confirms the potential of the panning algorithm anticipated in the conceptual design: combining panning with timbre changes can produce a wide range of complex effects.

Participants perceived the original signal of Test 3 more consistently, and the majority recognized AM/ring modulation. Many mentioned the first parameter’s role in adding more harmonics. Although this was sometimes interpreted as distortion, their understanding of its function was generally correct. After spatialization, the majority of participants reported that the sound was perceived as wider. This perception was related to the number of harmonics and the degree of harmony. Nevertheless, the overall effect experienced by the participants was not strong, and none of them found it appealing. This suggests that variations of the sound shape alone are not an effective means of spatialization. The primary function of Zerr* remains irregular high-speed spatial panning, with the width of the sound playing an ancillary role.

Test 4 was evidently of considerable interest to the participants, owing to its pronounced effects and the striking contrast between the pre- and post-spatialization conditions. Because the original signal is simple, all participants appeared able to comprehend its synthesis principle with the aid of the waveforms. Regarding the signal after spatialization, the majority of participants indicated that it has a pronounced spatial effect. As a brief summary, the original comment from participant #7 is cited here: “It becomes very rich spatial material, with lots of options, from shivering noise creeches to very distinct static impulses, to very smooth drones.” The majority of participants understood that the observed effect was a consequence of rapid spatial shifts of the sound. While only one noted that the shifts occurred when the signal crosses zero, most participants evidently perceived the correlation between timbre and spatiality. Additionally, the participants’ observed reactions indicated that they were enjoying the experience. This feedback was somewhat unexpected but highly encouraging. It suggests that sample-level spatialization is not limited to the domain of technical experimentation but can be appreciated by broader audiences. The majority of participants reported a positive aesthetic experience, which supports further exploration of this type of effect and its potential use in real-world applications.

5.4.1.2 Comprehensive Feedback

Table 6 shows the results of the first question in the comprehensive feedback, in which participants ranked the tests by how significant the coupling between sound and spatial properties was. The ranking is consistent with the results derived directly from Figure 12. On average, the fourth test was placed first, followed by the second, the first, and then the third test. This order is also consistent with the participants’ preferences as expressed in their subjective feedback.

Rank Test Average

1 Test4 - Noise + LFO 1.72

2 Test2 - FM 2.61

3 Test1 - Square Wave 2.67

4 Test3 - AM/Ring 3.00

Table6: Rank based on degree of coupling

Figure13 presents the ﬁndings derived from the scale questions included in the

comprehensive feedback In accordance with the preceding graph, each subgraph

corresponds to a single scale question. The text descriptions can also be found in

Appendix D.

Responses to the second question indicate that participants found the concept of Zerr* relatively easy to grasp. However, the preponderance of responses in the middle of the scale suggests that a certain threshold of conceptual understanding is still required. This implies that some educational effort is needed if the concept is to gain widespread acceptance. Responses to question 3 suggest that the main comprehension difficulty stemmed from a lack of background knowledge, covering both synthesis algorithms and audio analysis algorithms. Stripped of these technical details, the fundamental concept of controlling spatial properties through timbral variations can be grasped by the majority of participants. Taken together with the responses to question 4, even participants who explicitly stated that their background knowledge was insufficient did not experience any discomfort in use. A lack of background knowledge can, however, impede the user’s ability to create their own preset system.

Figure13: Feedback scale bar chart for Zerr* approach

The fifth question asked participants whether they recognized the value of the Zerr* concept, that is, whether this type of coupling enhances or hinders musicality. The results were clear: the vast majority saw it as an enhancement of the overall musical experience.

The responses to question six were notably polarized. Overall, participants perceived creating with Zerr* as more difficult than with conventional methods. This result can be read in two ways. Firstly, as stated above, composing with Zerr* requires considerable technical background knowledge, and many people are deterred by the technical terms. This is likely the primary reason why a significant proportion of the participants selected disagree. Developers of all kinds should consider the accessibility of their tools for users with limited technical expertise. One straightforward approach is to provide a series of presets that do not require users to be aware of the underlying technology. Another strategy is to educate users as far as possible; in this respect, the author particularly values the FluCoMa project for providing accessible tutorials for those with no prior experience. Secondly, apart from the technical background, the Zerr* method does not in itself reduce the difficulty of creation. It is not an instrument for enhancing efficiency; rather, it is a device for fostering creativity. As previously discussed, while this approach facilitates improvisation in a live performance, writing the mapping system is challenging compositional work. The effort required will in some cases be comparable to that of composing a complete fixed track. Generally, in music composition and improvisation, energy expenditure and creative complexity increase when the reuse rate of the mapping system is low.

One indication of the effectiveness of Zerr* as a creativity tool is the remarkable consistency of the responses to question seven. Participants widely agreed that the tool opens up new possibilities. This is the most significant benefit of the tool and the primary reason for its existence. The remaining three questions were designed to elicit subjective, open-ended responses that are closely tied to the participants’ backgrounds. They are therefore analyzed in Section 5.4.2.

5.4.2 Experience-related Feedback

The Figure14 contains self-reported relevant background information to the 18

participants. The Instruments lists the number of instruments the participant

plays and the instrument they is best at. The Production Tools refers to the tools

or methods of music production that they most frequently used. The meanings of

the abbreviations are shown in the notes at the bottom of the table, where NM

indicates that the participant does not produce music. The Keywords describes

basically the participant’s proﬁle. Given the considerable diversity of the partici-

pants’ backgrounds, this section was not included in the questionnaire. Instead,

the authors have identiﬁed the most relevant backgrounds of the participants from

communication.

Participants’ familiarity with the background knowledge is detailed in Figure 15. Musical sophistication was self-rated using the condensed Gold-MSI scale. The next three items each combine a self-assessed familiarity level (explained in the notes below) with a mastery score from the corresponding quiz. The scores were standardized to a scale of ten. Data in gray highlight the highest scores for each topic.
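The quiz scores in Figure 15 are consistent with a simple linear rescaling of the number of correct answers. The thesis does not state the formula, so the following is an assumed reconstruction rather than the procedure actually used. The function name `standardize_score` is hypothetical; the totals are taken from the text (12 questions per quiz, six audio clips in the spatial audio test).

```python
def standardize_score(correct, total, scale=10.0):
    """Rescale a raw count of correct answers to a score out of `scale`.

    Assumed formula: correct / total * scale, rounded to two decimals.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")
    return round(correct / total * scale, 2)


# 12-question quizzes (sound synthesis, audio analysis)
print(standardize_score(5, 12))   # 5 of 12 correct
print(standardize_score(11, 12))  # 11 of 12 correct

# six-clip spatial audio test
print(standardize_score(2, 6))    # 2 of 6 correct
```

Under this assumption, quiz scores can only take multiples of 10/12 and spatial audio scores only multiples of 10/6, which matches the values that appear in Figure 15 (for example 4.17, 5.83 and 9.17, or 3.33 and 6.67).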

It should be noted that the self-rated question in the Spatial Audio category asked participants about their familiarity with spatial audio technologies, whereas the subsequent test with six audio clips measured their ability to recognize spatial sound patterns. The results of this category were not highlighted because of an experimental design error that the author must acknowledge. In these six audio clips, the actual listening experience differed from what the author had intended, owing to stimulus design and room acoustics, which makes the test results unreliable. In the first two clips, which tested direction, the direct sound from the front was obstructed by the computer screen. Moreover, the ability to identify patterns from a sinusoidal signal is limited to begin with, which resulted in a near-absence of audible differentiation between the front and right directions. In the third and fourth clips, which tested the width of the sound, there was no discernible difference because the sound spread almost uniformly through the space. Finally, the rotations in the last two stimuli were simple enough to be distinguished correctly by all participants. Consequently, although the results are presented here, they are not used further.

Index  Instruments       Production Tools     Keywords
1      3  Drum           SW, DAW              Band Musician, Drummer
2      4  Guitar         SW, NS               Band Musician, Guitarist
3      6+ Violin         SW                   Band Musician, Bassist
4      0  \              DL, AP               New Media Artist
5      3  Pipa           NM                   Music Enthusiast
6      5  SuperCollider  AP                   Computational Artist, Live Coder
7      1  Guitar         AP                   Spatial Music Composer & Developer
8      4  Guitar         SW, NS, DAW, OT³¹    Independent Musician
9      1  Voice          DAW, DL              Electronic Music Producer, DJ
10     1  Drum machine   DAW, DL              Electronic Music Producer, DJ
11     4  Guitar         DAW                  Band Musician, Guitarist
12     1  Guitar         NM                   Audio Engineer
13     1  Voice          SW, DAW              Independent Musician, Singer
14     4  Piano          NS, DAW              Recording Engineer, Mixing Engineer
15     3  Bass           DAW                  Audio Engineer
16     0  \              NM                   Audio Engineer
17     3  Guitar         SW, DAW              Audio Engineer
18     4  Guitar         DAW                  Audio Engineer, Music Producer

• NM: Non-Musician (“I don’t make music.”)

• SW: Song writing with main instruments (Guitar, Piano etc.)

• NS: Music Notation Software (Musescore, Sibelius etc.)

• DAW: Digital Audio Workstation (Logic Pro, Ableton Live, Reaper etc.)

• DL: DAWless (Modular synthesizer, Sampler, Groove Boxes etc.)

• AP: Audio Programming (Pure Data, Max/MSP, SuperCollider etc.)

• OT: Other

Figure14: Self-reported backgrounds

³¹Field recording

A salient correlation between the participants’ backgrounds and their evaluations of the Zerr* system was evident. Among the three categories of background knowledge, knowledge of sound synthesis algorithms had the most pronounced impact, as participants with experience in this area were better able to understand the specific changes in sound before and after spatialization. The influence of knowledge of audio analysis was less pronounced.

Index  Musical Sophistication  Spatial Audio  Sound Synthesis  Audio Analysis
1      8.57                    C  3.33        C  4.17          C  5.00
2      7.05                    C  5.00        D  4.17          C  5.00
3      7.24                    C  3.33        D  5.83          C  4.17
4      6.00                    D  6.67        D  8.33          C  4.17
5      7.90                    D  8.33        D  6.67          C  6.67
6      9.24                    C  6.67        A  9.17          B  5.83
7      7.90                    A  5.00        B  9.17          B  7.50
8      6.95                    C  5.00        C  5.00          B  5.00
9      8.00                    C  6.67        B  7.50          B  5.83
10     9.33                    B  10.0        C  7.50          B  6.67
11     8.57                    D  6.67        C  7.50          C  6.67
12     5.43                    C  5.00        D  5.00          B  8.33
13     7.62                    D  10.0        C  5.83          C  4.17
14     7.52                    C  5.00        B  7.50          B  5.00
15     8.67                    B  8.33        C  8.33          B  5.00
16     5.90                    C  10.0        D  4.17          C  3.33
17     7.71                    B  6.67        B  7.50          B  5.00
18     7.43                    C  8.33        B  9.17          B  7.50

Spatial Audio:

• A: I’m an expert in spatial audio algorithms.

• B: I have experience in making spatial audio pieces.

• C: I’ve listened to spatial audio pieces.

• D: I don’t have any experience about spatial audio.

Sound Synthesis:

• A: I’m an expert in sound synthesis algorithms.

• B: I skimmed over the algorithm basics.

• C: I’m familiar with the synthesizer sounds, but not the algorithms.

• D: I don’t have any experience about sound synthesis.

Audio Analysis:

• A: I’m an expert in audio analysis algorithms.

• B: I skimmed over the algorithm basics.

• C: I don’t have any experience about audio analysis.

Figure15: Results of the experience assesments

The relationship between musical background and the attitudes expressed in the feedback was not linear. Participants from audio and acoustic engineering with essentially no music-related activities showed little interest in the test, and some of their answers missed the point. By contrast, enthusiasts who enjoy music but have no experience in music production showed an innate curiosity despite the difficulty of understanding. The participants with some experience in music production fell into two camps. The band musicians gave Zerr* an overall negative review, or at least had reservations about the value of the project; see the feedback from participants #1, #2, and #3. This may indicate a significant divergence in aesthetic paradigm, particularly with respect to rock music. It was difficult for these musicians to find their preferred musical elements in the test, as they could not recognize any familiar element such as rhythm, melody, or harmony. In contrast, those engaged in electronic music and music engineering, as well as independent musicians who favour unconventional approaches, tended to offer more positive feedback. They were able to think about musicality in terms of timbre and specific sound details, which led to a number of intriguing suggestions and associations. First of all, they all appreciated the convenience of controlling the spatial attributes through the sound, and many of them concluded that Zerr* is suitable for live performance. For instance, participant #10 thought the system would suit live music similar to that of the electronic duo Autechre. Participant #13 thought it would work well in large venues. Participant #18 pointed out that the “combination of spatial and timbre characteristics opens up possibilities especially for improvising or intuitive composing.” In addition, they identified a number of issues with Zerr*, including the significant hardware support required and open questions regarding the system’s integration with studio music production workflows.

Those with prior experience in audio programming and spatial music composition were able to provide more detailed and instructive suggestions; see the comments of participants #4, #6, and #7. Their responses are analyzed in detail below.

Participant #4, a new media artist, has broad experience with creative tools and some familiarity with spatial audio artworks. In answering the question about the difficulty of understanding, they stated clearly that, although they found it hard to give accurate descriptions in technical terminology, they could still perceive a significant link between spatial and sonic attributes. Furthermore, their description of Zerr*’s position is notably precise: “Zerr* as an additional layers between players and audio systems create an automatic, changeable and intelligent effect on sound locality, which doesn’t fully controlled by players themselves, instead indirect gains influences.” In evaluating the potential of Zerr*, they proposed turning the system into hardware. After the listening test, they further asked the author whether Zerr* could be made into an add-on component for analog synthesizers or modular synthesis systems. This idea had not been considered before, but it appears feasible from an engineering standpoint.

Participant #7 is a software developer for spatial music with experience similar to the author’s, but with more experience in music production. They achieved relatively high scores in all assessments, which indicates a solid background. In conversation with them, the author sensed mutual agreement on technical concerns. The quote cited in the analysis of the fourth preset in Section 5.4.1.1 expresses their particular admiration for the spatial effects that Zerr* was able to produce. In the comprehensive feedback, they expressed an even more explicit endorsement: “It really shines with more spatial capabilities.” They also thought Zerr* would be a great tool for live performance. The mapping system is part of the artist’s personal style, just as important as unique instrumental techniques or a unique sound.

Participant #6 was the most experienced in the use of audio programming software among all participants. In the question regarding which instrument they consider themselves most proficient in, they were the only one to mention audio programming software, namely SuperCollider. During the listening tests, they were also able to describe accurately nearly all of the synthesis algorithms, the corresponding parameters, and the spatial effects produced by the Zerr* system. In the comprehensive feedback, they gave high praise for the program and expressed their desire to use Zerr* with different speaker setups in their live performances. Their original statement about the Zerr* system compared to other spatialization systems is as follows: “To couple synth (or any parameters) to the spatialization, and to use audio analysis to decide how the spatialization occurs sounds to me more interesting than trying to recreate real-world spatial perception (like in systems such as ambisonics, wfs, etc)”. This notion is entirely consistent with the one expressed in Section 2 of this thesis: reconstructing a real-world spatial sound experience is not the same as pursuing an intriguing musical experience. The application of spatial computing to music should not be limited in that way. What they suggest in the additional comment is also a direction the author would like to pursue subsequently: “I would love to experiment with this concept in different speaker arrangements that don’t necessarily follow rings/arrays but more asymmetrical figures, for instance.” Under the concept of Zerr*, this is a very intuitive direction of exploration. Regrettably, due to the limitations of the test site, the arrangement used in this listening test could only be made as asymmetrical as the standard loudspeaker setup allowed. Another point raised in the follow-up communication has also been emphasized so far in this thesis. They expressed a desire to explore more complex mapping systems based on audio analysis. They noted that the presets in the test were too simple and could become somewhat boring. Mapping systems do not necessarily need to be explicit, and a more chaotic system is more likely to yield unexpected ideas.

5.5 Discussion

In summary, the participants expressed generally positive opinions. The approach permits the coupling of spatial and timbral properties, thereby fostering creativity and facilitating the exploration of new possibilities for spatial music. Although the Zerr* approach, in terms of functionality, does not lower the threshold of spatial music production as a whole, its advantages for live performance were universally recognized. As an experimental tool, it was not as difficult to accept as initially expected. There is a certain technical threshold for creating music with this system; however, there is no apparent aesthetic or technical threshold for appreciating the sounds produced using Zerr*. In short, audiences do not need to understand the technology behind Zerr* to appreciate the aesthetic experience it offers, and musicians can realize their creative visions without fully grasping every detail of the system. Additionally, this gives the tool’s developer, the author, the freedom to explore and integrate more experimental features.


Chapter 6

Conclusions & Future Work

This thesis conducts an in-depth review of the development of spatial music cre-

ation tools and introduces a tailored coordinate system for categorizing them,

thereby establishing a robust foundation for the introduction of a novel approach,

Zerr*.

Zerr*, uniquely positioned within the ﬁeld of spatial music creation tools, ad-

dresses a gap in development and catalyzes new creative possibilities for artists. By

constructing an algorithmic framework that utilizes the intrinsic properties of au-

dio signals in conjunction with an innovative mapping system, Zerr* autonomously

distributes audio across arbitrary loudspeaker setups. This facilitates dynamic and

context-sensitive spatialization and spatial sound synthesis. This approach eﬀec-

tively couples timbre with spatial properties that extend beyond the limitations

of traditional spatialization techniques in the context of music.

The research not only expands theoretical knowledge but also demonstrates a practical implementation designed with extendability and accessibility in mind. The practical applications of this system were showcased through the implementation of its core modules and their encapsulation as a Pure Data package and as JACK clients, enabling flexible experimentation and integration into diverse workflows. The emphasis on real-time manipulation and sample-level processing indicates that this implementation is particularly advantageous for live performance and improvisation.

A comprehensive listening test was conducted, gathering feedback from par-

ticipants of diverse backgrounds and experience levels on the concept and imple-

mentation of the approach. The feedback aﬃrmed a high level of understanding

and acceptance of the approach, oﬀering valuable insights for further reﬁnement.

Future work will concentrate on three principal areas: engineering, applica-

tion, and conceptual development. Engineering eﬀorts will be directed toward

broadening the system’s integration into other host environments. This will be

accompanied by the provision of improved documentation and tutorials. In the

ﬁeld of application, the system’s potential will be explored through active music

production, the organization of improvisation performances, and other practical

uses. Finally, the potential for further theoretical research and conceptual expan-

sion is considerable. A promising direction involves integrating learned features

that are well-aligned with contemporary advancements in artiﬁcial intelligence.

The integration of AI in spatial music technologies represents an exciting research

frontier (Einbond et al., 2024). Initial studies have already demonstrated the po-

tential of this approach.


List of Figures

Figure1: Pierre Henry performing with the pupitre d’espace .................... - 17 -

Figure2: Karlheinz Stockhause manipulate the rotating loudspeaker ........ - 18 -

Figure3: Sonic Trajectory for Poème Électronique .................................... - 23 -

Figure4: Interaction schematic between instrument and musician ............ - 24 -

Figure5: Signal ﬂow in live performance .................................................... - 43 -

Figure6: Signal ﬂow in improvisation performance .................................... - 45 -

Figure7: Signal ﬂow of Zerr* approach ...................................................... - 46 -

Figure8: Example PD patch for eight loudspeakers ................................... - 64 -

Figure10: Synthesis algorithm patch for Zerr* listening test ..................... - 69 -

Figure11: MIDI controller for synthesis algorithm patch ........................... - 70 -

Figure12: Feedback for the Zerr* presets ................................................... - 77 -

Figure13: Feedback scale bar chart for Zerr* approach ............................. - 80 -

Figure14: Self-reported backgrounds ......................................................... - 82 -

Figure15: Results of the experience assesments ......................................... - 83 -


List of Tables

Table1: Acousmonium (left) and Gmebaphone-1 (right) ........................... - 21 -

Table2: Instantaneous features .................................................................. - 60 -

Table3: Musical experience evaluation form .............................................. - 72 -

Table4: Comprehensive feedback session ................................................... - 74 -

Table5: Questionnaire Screenshots ............................................................ - 75 -

Table6: Rank based on degree of coupling ................................................. - 79 -


Bibliography

Agger, S., Bresson, J., & Carpentier, T. (2017). Landschaften–Visualization, Control and Processing of Sounds in 3D Spaces. International Computer Music Conference (ICMC'17).

Alunno, M., & Yarce Botero, A. (2017). Directional landscapes: using parametric

loudspeakers for sound reproduction in art. Journal of New Music Research,

46(2), 201–211.

Austin, L., & Smalley, D. (2000). Sound diﬀusion in composition and performance:

an interview with Denis Smalley. Computer Music Journal, 24(2), 10–21.

Baalman, M. A. (2010). Spatial composition techniques and sound spatialisation

technologies. Organised Sound, 15(3), 209–218.

Barbosa, Á. (2003). Displaced soundscapes: A survey of network systems for music

and sonic art creation. Leonardo Music Journal, 13, 53–59.

Bascou, C. (2013). HoloPad: an original instrument for multi-touch control of

sound spatialisation based on a two-stage DBAP.

Bates, E. (2009). The composition and performance of spatial music.

Berkhout, A. J., Vries, D. de, & Vogel, P. (1993). Acoustic control by wave ﬁeld

synthesis. The Journal of the Acoustical Society of America, 93(5), 2764–2778.

Blackwell, T., & Young, M. (2004). Swarm granulator. Workshops on Applications

of Evolutionary Computation, 399–408.

Blumlein, A. (1933). Improvements in and relating to sound-transmission, sound-recording and sound-reproducing systems. UK Patent, 394325.

Bogdanov, D., Wack, N., Gómez Gutiérrez, E., Gulati, S., Boyer, H., Mayor, O., Roma Trepat, G., Salamon, J., Zapata González, J. R., Serra, X., & others. (2013). Essentia: An audio analysis library for music information retrieval. In A. Britto, F. Gouyon, & S. Dixon (Eds.), 14th Conference of the International Society for Music Information Retrieval (ISMIR), 4–8 November 2013, Curitiba, Brazil, 493–498.

Brech, M. (2015). Der hörbare Raum: Entdeckung, Erforschung und musikalische

Gestaltung mit analoger Technologie (Vol. 13). transcript Verlag.

Brech, M., Coler, H. von, & Paland, R. (2015). Aspects of space in Luigi Nono’s Prometeo and the use of the Halaphon. Compositions for Audible Space. The Early Electroacoustic Music and Its Contexts. Music and Sound Culture, 193–204.

Bresson, J. (2012). Spatial structures programming for music. Spatial Computing

Workshop (SCW).


Bresson, J., Agon, C., & Assayag, G. (2011). OpenMusic: visual programming

environment for music composition, analysis and research. Proceedings of the

19th ACM International Conference on Multimedia, 743–746.

Bresson, J., Bouche, D., Carpentier, T., Schwarz, D., & Garcia, J. (2017). Next-generation Computer-aided Composition Environment: A new implementation of OpenMusic. International Computer Music Conference (ICMC'17).

Carpentier, T. (2015). ToscA: an OSC communication plugin for object-ori-

ented spatialization authoring. 41st International Computer Music Conference

(ICMC), 368–371.

Carpentier, T. (2018). A new implementation of Spat in Max. 15th Sound and Music Computing Conference (SMC2018), 184–191.

Clarke, M. (1999). Composing with multi-channel spatialisation as an aspect of

synthesis. 25th International Computer Music Conference, 17–19.

Clozier, C., & Olsson, J. (2001). The gmebaphone concept and the cybernéphone

instrument. Computer Music Journal, 81–90.

Coduys, T., & Ferry, G. (2004). IanniX: aesthetical/symbolic visualisations for hypermedia composition. Journées d'Informatique Musicale.

Coler, H. von. (2019). A JACK-based application for spectro-spatial additive syn-

thesis. Proceedings of the 17th Linux Audio Conference (LAC-19), Stanford

University, USA.

Coler, H. von, Schuladen, P., & Tonnätt, N. (2021). SeamLess Integration of Spa-

tial Sound Reproduction Methods.

Coler, H. von, Tonnätt, N., Kather, V., & Chafe, C. (2020). Sprawl: A network

system for enhanced interaction in musical ensembles. Proceedings of the 18th

Linux Audio Conference, 33–37.

Collins, N. (2011). SCMIR: A SuperCollider music information retrieval library.

ICMC.

Cross, T. Reframing Sound Shapes in Spectromorphological Composition: Notat-

ing perspectival space through spherical, Euclidean and Cartesian-coordinate

systems. Organised Sound, 1–11.

Dack, J. (2001). Diffusion as Performance. IIASSRC Conference Proceedings, 81–88.

Daniel, J. (2003, May). Spatial Sound Encoding Including Near Field Eﬀect:

Introducing Distance Coding Filters and a Viable, New Ambisonic Format.

Audio Engineering Society Conference: 23rd International Conference: Signal

Processing in Audio Recording and Reproduction. https://www.aes.org/e-lib/

browse.cfm?elib=12321


Davis, T., & Rebelo, P. (2005). Hearing emergence: towards sound-based self-or-

ganisation.

Decroupet, P., Ungeheuer, E., & Kohl, J. (1998). Through the sensory looking-

glass: the aesthetic and serial foundations of Gesang der Jünglinge. Perspec-

tives of New Music, 97–142.

Desantos, S., Roads, C., & Bayle, F. (1997). Acousmatic morphology: an interview

with François Bayle. Computer Music Journal, 11–19.

Dilger, T. (2013). Graphical Spatialization Program with Real Time Interactions

(GASPR). Intelligent Technologies for Interactive Entertainment: 5th Inter-

national ICST Conference, INTETAIN 2013, Mons, Belgium, July 3-5, 2013,

Revised Selected Papers 5, 136–145.

Dolby. (2014). Dolby Atmos Next-Generation Audio for Cinema.

Einbond, A., & Schwarz, D. (2010). Spatializing timbre with corpus-based con-

catenative synthesis. ICMC, 72–75.

Einbond, A., Carpentier, T., Schwarz, D., & Bresson, J. (2024). Embodying Spa-

tial Sound Synthesis with AI in Two Compositions for Instruments and 3-D

Electronics. Computer Music Journal, 1–19.

Garavaglia, J. A. (2016). Creating Multiple Spatial Settings with “Granular Spa-

tialisation” in the High-Density Loudspeaker Array of the Cube Concert Hall.

Computer Music Journal, 40(4), 79–90.

Garcia, J., Bresson, J., & Carpentier, T. (2015). Towards interactive authoring tools for composing spatialization. 2015 IEEE Symposium on 3D User Interfaces (3DUI), 151–152.

Garcia, J., Bresson, J., Schumacher, M., et al. (2015). Tools and applications for interactive-algorithmic control of sound spatialization in OpenMusic. inSONIC2015, Aesthetics of Spatial Audio in Sound, Music and Sound Art.

Garcia, J., Carpentier, T., & Bresson, J. (2017). Interactive-compositional author-

ing of sound spatialization. Journal of New Music Research, 46(1), 74–86.

Garcia, J., Favory, X., & Bresson, J. (2016). Trajectoires: A mobile application

for controlling sound spatialization. Proceedings of the 2016 CHI Conference

Extended Abstracts on Human Factors in Computing Systems, 3671–3674.

Geier, M., Ahrens, J., & Spors, S. (2010). Object-based audio reproduction and

the audio scene description format. Organised Sound, 15(3), 219–227.

Geier, M., Hohn, T., & Spors, S. (2012). An open-source C++ framework for mul-

tithreaded realtime multichannel audio applications. Proc. Linux Audio Conf,

183–188.

Gerzon, M. A. (1973). Periphony: With-height sound reproduction. Journal of the

Audio Engineering Society, 21(1), 2–10.


Hagan, K. L. (2017). Textural composition: Aesthetics, techniques, and spatial-

ization for high-density loudspeaker arrays. Computer Music Journal, 41(1),

34–45.

Harada, T. (1992). Real Time Control of 3 D Sound Space by Gesture. Proc.

ICMC, 85–88.

Harrison, J. (1998). Sound, space, sculpture: some thoughts on the ‘what’, ‘how’ and ‘why’ of sound diffusion. Organised Sound, 3(2), 117–127.

James, S. (2012). From autonomous to performative control of timbral spatiali-

sation.

James, S. (2015). Spectromorphology and Spatiomorphology of Sound Shapes: au-

dio-rate AEP and DBAP panning of spectra.

James, S. (2016). A multi-point 2D interface: Audio-rate signals for controlling

complex multi-parametric sound synthesis.

James, S. G. (2005). Developing a ﬂexible and expressive realtime polyphonic wave

terrain synthesis instrument based on a visual and multidimensional method-

ology.

Jaroszewicz, M. (2015). Compositional strategies in spectral spatialization. Univer-

sity of California, Riverside.

Jot, J.-M., & Warusfel, O. (1995). A real-time spatial sound processor for music

and virtual reality applications. ICMC: International Computer Music Con-

ference, 294–295.

Kendall, G. S. (1995). The decorrelation of audio signals and its impact on spatial

imagery. Computer Music Journal, 19(4), 71–87.

Kim-Boyle, D. (2008). Spectral spatialization-an overview. ICMC.

Lengelé, C. (2018). Live 4 Life-A Spatial Performance Tool Focused On Rhythm

And Parameter Loops. ICMC.

Lerch, A. (2012). An introduction to audio content analysis: Applications in signal

processing and music informatics. Wiley-IEEE Press.

Leslie, G., Zamborlin, B., Jodlowski, P., & Schnell, N. (2010). Grainstick: A

collaborative, interactive sound installation. Proceedings of the International

Computer Music Conference (ICMC), 4.

Lidbetter, P. S. (1988, November). The Concepts and Implementation of the Mul-

tichannel Audio Digital Interface (MADI) Format. Audio Engineering Society

Convention 85. https://www.aes.org/e-lib/browse.cfm?elib=4707

Lombardo, V., Valle, A., Fitch, J., Tazelaar, K., Weinzierl, S., & Borczyk, W. (2009). A virtual-reality reconstruction of Poème électronique based on philological research. Computer Music Journal, 33(2), 24–47.


Lossius, T., Baltazar, P., & Hogue, T. de la. (2009). DBAP–distance-based am-

plitude panning. ICMC.

Lukes, R. D. (1996). The “Poème électronique” of Edgard Varèse. Harvard University.

Lynch, H., & Sazdov, R. (2011). An ecologically valid experiment for the com-

parison of established spatial techniques. International Computer Music Con-

ference.

Magnusson, T. (2019). Sonic Writing. 36–37.

Malham, D. G., & Myatt, A. (1995). 3-D sound spatialization using ambisonic

techniques. Computer Music Journal, 19(4), 58–70.

Marshall, M. T., Malloch, J., & Wanderley, M. M. (2009). Gesture control of sound

spatialization for live musical performance. Gesture-Based Human-Computer

Interaction and Simulation: 7th International Gesture Workshop, GW 2007,

Lisbon, Portugal, May 23-25, 2007, Revised Selected Papers 7, 227–238.

McCartney, J. (2002). Rethinking the computer music language: SuperCollider. Computer Music Journal, 26(4), 61–68.

McFee, B., Raﬀel, C., Liang, D., Ellis, D. P., McVicar, M., Battenberg, E., &

Nieto, O. (2015). librosa: Audio and music signal analysis in python. Scipy,

18–24.

McGee, R. (2015). Spatial modulation synthesis. ICMC.

McLeran, A., Roads, C., Sturm, B. L., & Shynk, J. J. (2008). Granular sound

spatialization using dictionary-based methods. Proceedings of the 5th Sound

and Music Computing Conference, Berlin, Germany, 1.

Miyama, C., & Dipper, G. (2016). Zirkonium 3.1-a toolkit for spatial composition

and performance. Proceedings of the International Computer Music Confer-

ence, 313, 312.

Mooney, J., & Moore, D. (2007). A concept-based model for the live diﬀusion of

sound via multiple loudspeakers. Proc. DMRN, 7.

Mooney, J., & Moore, D. (2008). Resound: open-source live sound spatialisation.

Proceedings of the International Computer Music Conference 2008.

Mooney, J., Moore, A., & Moore, D. (2004). M2 diﬀusion: The live diﬀusion of

sound in space. Proceedings of the International Computer Music Conference

2004.

Morgan, R. P. (1975). Stockhausen's Writings on Music. The Musical Quarterly,

61(1), 1–16.

Müllensiefen, D., Gingras, B., Musil, J., & Stewart, L. (2014). The musicality of non-musicians: An index for assessing musical sophistication in the general population. PLOS ONE, 9(2), e89642.


Negrao, M. C. (2014). ImmLib-A new library for immersive spatial composition.

ICMC.

Normandeau, R. (2009). Timbre Spatialisation: The medium is the space. Organ-

ised Sound, 14(3), 277–285.

Nystrom, E. (2018). Topographic Synthesis: Parameter distribution in spatial tex-

ture. Proceedings of the 2018 International Computer Music Conference, 117–

122.

Oomen, P., Holleman, P., & de Klerk, L. (2016). 4DSOUND: A New Approach to Spatial Sound Reproduction and Synthesis. White Papers, 238.

Peters, N., Lossius, T., & Schacher, J. C. (2013). The Spatial Sound Description

Interchange Format: Principles, Speciﬁcation, and Examples. Computer Music

Journal, 37(1), 11–22. http://www.jstor.org/stable/24265581

Pottier, L. (1998). Dynamical spatialization of sound. HOLOPHON: a graphic and algorithmic editor for Sigma1. DAFx98 Proceedings.

Puckette, M., & others. (1996). Pure Data: another integrated computer music

environment. Proceedings of the Second Intercollege Computer Music Concerts,

37–41.

Pulkki, V. (1997). Virtual sound source positioning using vector base amplitude

panning. Journal of the Audio Engineering Society, 45(6), 456–466.

Pulkki, V. (1998). Creating generic soundscapes in multichannel panning in

Csound synthesis software. Organised Sound, 3(2), 129–134.

Pulkki, V. (2001). Spatial sound generation and perception by amplitude panning

techniques. Helsinki University of Technology.

Pysiewicz, A., & Weinzierl, S. (2017). Instruments for spatial sound control in real

time music performances. a review. Springer.

Reynolds, C. W. (1987). Flocks, herds and schools: A distributed behavioral

model. Proceedings of the 14th Annual Conference on Computer Graphics and

Interactive Techniques, 25–34.

Roads, C. (1978). Automated Granular Synthesis of Sound. Computer Music Jour-

nal, 2(2), 61–62. http://www.jstor.org/stable/3680222

Ross, V. E. (2012). Too Much Change: How Fantasia's Cinematic Innovations

Overwhelmed the Audience of 1940. Kino: The Western Undergraduate Film

Studies Journal, 3(1).

Rothstein, J. (1995). MIDI: A comprehensive introduction (Vol. 7). AR Editions,

Inc.

Schaeﬀer, P. (2012). In search of a concrete music (Vol. 15). Univ of California

Press.


Schmele, T. (2011). Exploring 3D audio as a new musical language.

Schmele, T., & Lopez, J. J. (2022). Comparisons between VBAP and WFS using

Spatial Sound Synthesis. Audio Engineering Society Convention 153.

Schnell, N., Röbel, A., Schwarz, D., Peeters, G., Borghesi, R., & others. (2009).

MuBu and friends–assembling tools for content based real-time interactive au-

dio processing in Max/MSP. ICMC.

Schumacher, F., Espinoza, V., Mardones, F., Vergara, R., Aránguiz, A., & Aguil-

era, V. (2021). Perceptual recognition of sound trajectories in space. Computer

Music Journal, 45(1), 39–54.

Schumacher, M., & Bresson, J. (2010). Spatial sound synthesis in computer-aided

composition. Organised Sound, 15(3), 271–289.

Shi, C., & Gan, W.-S. (2010). Development of parametric loudspeaker. IEEE Po-

tentials, 29(6), 20–24.

Smalley, D. (1997). Spectromorphology: explaining sound-shapes. Organised

Sound, 2(2), 107–126.

Smalley, J. (2000). Gesang der Jünglinge: History and Analysis. Available at: http://sites.music.columbia.edu/masterpieces/notes/stockhausen/gesanghistoryandanalysis.pdf

Start, E. (2024, January). Loudspeaker Matrix Arrays: Challenging the way we

create and control sound. Audio Engineering Society Conference: AES 2024

International Conference on Acoustics & Sound Reinforcement. https://www.

aes.org/e-lib/browse.cfm?elib=22351

Stefani, E., & Mooney, J. (2009). Spatial composition in the multi-channel domain:

aesthetics and techniques. Proceedings of the International Computer Music

Conference 2009.

Sturman, D. J., & Zeltzer, D. (1994). A survey of glove-based input. IEEE Com-

puter Graphics and Applications, 14(1), 30–39.

Teruggi, D. (2007). Technology and musique concrète: the technical developments

of the Groupe de Recherches Musicales and their implication in musical com-

position. Organised Sound, 12(3), 213–231.

Theile, G., & Wittek, H. (2004). Wave ﬁeld synthesis: A promising spatial audio

rendering concept. Acoustical Science and Technology, 25(6), 393–399.

Thiébaut, J.-B. (2005). A graphical interface for trajectory design and musical purposes. Journées d'Informatique Musicale.

Thomson, P. (2004). Atoms and errors: towards a history and aesthetics of mi-

crosound. Organised Sound, 9(2), 207–218.

Todoroff, T. (1995). Real-Time Granular Morphing and Spatialisation of Sounds with Gestural Control within MAX/FTS. ICMC.


Todoroﬀ, T., Traube, C., & Ledent, J.-M. (1997). NeXTStep graphical interfaces

to control sound processing and spatialization instruments. ICMC.

Topper, D., Burtner, M., & Seraﬁn, S. (2003). Spatio-operational spectral (sos)

synthesis. ICMC.

Torchia, R. H., & Lippe, C. (2004). Techniques for multi-channel real-time spatial

distribution using frequency-domain processing. Proceedings of the 2004 Con-

ference on New Interfaces for Musical Expression, 116–119.

Tremblay, P. A., Green, O., Roma, G., & Harker, A. (2019). From collections to

corpora: Exploring sounds through ﬂuid decomposition. International Com-

puter Music Conference and New York City Electroacoustic Music Festival,

223–228.

Tremblay, P. A., Roma, G., & Green, O. (2021). Enabling programmatic data

mining as musicking: the ﬂuid corpus manipulation toolkit. Computer Music

Journal, 45(2), 9–23.

Truax, B. (1988). Real-time granular synthesis with a digital signal processor.

Computer Music Journal, 12(2), 14–26.

Truax, B. (1998). Composition and diﬀusion: space in sound in space. Organised

Sound, 3(2), 141–146.

Valiquet, P. (2012). The spatialisation of stereophony: Taking positions in post-

war electroacoustic music. International Review of the Aesthetics and Sociology

of Music, 403–421.

Wakeﬁeld, G., & Taylor, G. (2022). Generating Sound & Organizing Time: Think-

ing with Gen~ Book 1 (Issue bk.1, pp. 16–25). Cycling '74. https://books.

google.de/books?id=yvV4zwEACAAJ

Wanderley, M. M. (2001). Gestural control of music. International Workshop Hu-

man Supervision and Control in Engineering and Music, 632–644.

Wenzel, E. M., Begault, D. R., Godfroy-Cooper, M., Roginska, A., & Geluso, P.

(2017). Immersive Sound: The Art and Science of Binaural and Multi-Channel

Audio. Routledge.

Wilson, S. (2008). Spatial swarm granulation. ICMC.

Wilson, S. (2009). BEASTMulchLib: BEASTmulchLib is a SuperCollider class li-

brary designed for use in the creation, processing and presentation of complex

multichannel signal chains. Objects include sources, matrix routers and mixers,

and sound processors and spatialisers. The latter are based on a simple user-

extensible plugin architecture. Many classes have elegant GUI representations.

Wilson, S., & Harrison, J. (2010). Rethinking the BEAST: Recent developments

in multichannel composition at Birmingham ElectroAcoustic Sound Theatre.

Organised Sound, 15(3), 239–250.


Wright, M. (2005). Open Sound Control: an enabling technology for musical net-

working. Organised Sound, 10(3), 193–200.

Wright, M., Chaudhary, A., Freed, A., Khoury, S., & Wessel, D. (1999). Audio

applications of the sound description interchange format standard. Audio En-

gineering Society Convention 107.

Yang, Y.-Y., Hira, M., Ni, Z., Chourdia, A., Astafurov, A., Chen, C., Yeh, C.-F., Puhrsch, C., Pollack, D., Genzel, D., Greenberg, D., Yang, E. Z., Lian, J., Mahadeokar, J., Hwang, J., Chen, J., Goldsborough, P., Roy, P., Narenthiran, S., … Shi, Y. (2021). TorchAudio: Building Blocks for Audio and Speech Processing. arXiv preprint arXiv:2110.15018.

Zotter, F., Zaunschirm, M., Frank, M., & Kronlachner, M. (2017). A beamformer

to play with wall reﬂections: The icosahedral loudspeaker. Computer Music

Journal, 41(3), 50–68.


Appendix A: Zerr* External Help Patches

Please note that this is only a demonstration and that the latest, correct version can be found in the repository.

• zerr_features~
• reference

• zerr_envelopes~
• reference

• zerr_combinator~
• reference

• zerr_disperser~
• reference


Appendix B: Zerr* Preset Patches

• Zerr* Preset A
• Zerr* Preset B
• Zerr* Preset C
• Zerr* Preset D


Appendix C: Synthesis Algorithm & Audio Analysis Quizzes

Synthesis Algorithm Quiz

1. What is the primary function of an oscillator?
• To control the volume of the sound
• To generate the basic waveform or sound [correct]
• To modulate other components like filters
• To create rhythmic patterns

2. FM Synthesis is best described as:
• Frequency Modulation of one oscillator by another [correct]
• Amplitude Modulation of one oscillator by another
• A method of filtering frequencies in a sound
• A technique for creating rhythmic patterns

3. Which waveform is typically used for creating bass sounds in subtractive synthesis due to its harmonic richness?
• Sine wave
• Sawtooth wave [correct]
• Square wave
• Triangle wave

4. What is the primary function of ring modulation in sound synthesis?
• To mix two audio signals in a way that creates new harmonic content [correct]
• To modulate the frequency of an oscillator
• To synchronize the phase of two waveforms
• To split an audio signal into multiple frequency bands

5. In FM synthesis, what term is used to describe the modulating oscillator?
• Carrier
• Modulator [correct]
• Operator
• Transmitter

6. Which type of filter cuts off frequencies above a certain threshold and allows lower frequencies to pass?
• High-pass filter
• Low-pass filter [correct]
• Band-pass filter
• Notch filter

7. Which synthesis technique involves combining multiple simple waveforms to create complex sounds?
• Subtractive Synthesis
• Additive Synthesis [correct]
• Wavetable Synthesis
• Phase Distortion Synthesis

8. In subtractive synthesis, what is the result of increasing resonance on a filter?
• It decreases the volume of the sound.
• It emphasizes frequencies around the filter’s cutoff point. [correct]
• It broadens the range of frequencies that the filter affects.
• It changes the waveform shape passing through the filter.

9. An LFO (Low Frequency Oscillator) is typically used to create audible pitches in sound synthesis.
• True
• False [correct]

10. In subtractive synthesis, the primary method of shaping the sound is by removing certain frequencies from a rich harmonic sound source using a filter.
• True [correct]
• False

11. Ring modulation is a form of amplitude modulation that can produce inharmonic overtones, often resulting in bell-like or metallic sounds.
• True [correct]
• False

12. A noise generator can only produce white noise.
• True
• False [correct]


Synthesis Algorithm Quiz

1. The Crest Factor of an audio signal is the ratio of:

•Peak amplitude to RMS amplitude.

• RMS amplitude to mean amplitude.

• Peak amplitude to mean amplitude.

• Mean amplitude to peak amplitude.

2. A high Crest Factor in an audio signal typically indicates:

• A high level of distortion.

• A smooth, sustained sound.

•A dynamic range with signiﬁcant peaks.

• A consistent amplitude level.

3. Which of the following best describes Spectral Flatness?

• It measures the ‘peakiness’ of the spectrum.

•It indicates how noise-like a sound is, compared to being tone-like.

• It is the average frequency of the spectrum.

• It represents the highest frequency in the spectrum.

4. If a signal’s Spectral Flatness measure is close to 1, the signal can be described

as:

• Very tonal.

•Very noisy.

• Very dynamic.

• Very rhythmic.

5. The Spectral Centroid can be used as an indicator of the brightness of a sound.

•True

• False

6. What does the Spectral Centroid of an audio signal represent?

•The average frequency of the spectrum.

• The highest frequency in the spectrum.

• The loudness of the audio signal.

• The duration of the audio signal.

7. Spectral Flux measures:

•The rate of change in the spectral power.

• The highest power in the spectral domain.

• The ﬂatness of the spectrum.

• The balance of even and odd harmonics.

8. A higher Spectral Flux value typically indicates a more rapidly changing

spectrum.

•True

• False

9. The Zero Crossing Rate of an audio signal is:

•The rate at which the signal changes from positive to negative or back.

• The frequency at which the amplitude is highest.


• The number of times a signal reaches zero amplitude.

• The rate at which the spectral energy rolls oﬀ.

10. A high Zero Crossing Rate is often indicative of a high-pitched sound.

• True

•False

11. Spectral Rolloﬀ is a measure of the bandwidth of the signal.

•True

• False

12. The Spectral Rolloff point is the frequency below which a certain percentage of the total spectral energy is contained. What is the typical percentage used in most applications?

• 50%

•85%

• 95%

• 60%
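The features asked about in this quiz can be illustrated with a short sketch. It is not the analysis code used in the thesis: the sample rate, frame size, test signals, and all function names are assumptions made for illustration, a naive DFT stands in for an FFT, and Spectral Flux is computed as the sum of squared bin differences, which is only one of several common definitions. The 85% rolloff fraction follows question 12.

```python
import cmath
import math
import random

SAMPLE_RATE = 8000  # Hz; assumed for illustration only
FRAME_SIZE = 256    # samples per analysis frame; also an assumption


def bin_hz(k):
    """Centre frequency of DFT bin k in Hz."""
    return k * SAMPLE_RATE / FRAME_SIZE


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))) ** 2
        for k in range(n // 2 + 1)
    ]


def crest_factor(frame):
    """Peak amplitude divided by RMS amplitude."""
    peak = max(abs(x) for x in frame)
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return peak / rms


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def spectral_centroid(power):
    """Power-weighted mean frequency in Hz."""
    return sum(bin_hz(k) * p for k, p in enumerate(power)) / sum(power)


def spectral_flatness(power, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum (0 to 1)."""
    log_mean = sum(math.log(p + eps) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power) + eps)


def spectral_rolloff(power, fraction=0.85):
    """Lowest bin frequency below which `fraction` of the total power lies."""
    target = fraction * sum(power)
    running = 0.0
    for k, p in enumerate(power):
        running += p
        if running >= target:
            return bin_hz(k)
    return bin_hz(len(power) - 1)


def spectral_flux(prev_power, power):
    """Sum of squared differences between two consecutive power spectra."""
    return sum((b - a) ** 2 for a, b in zip(prev_power, power))


# Two test frames: a 1 kHz sine (tonal) and seeded uniform noise (noise-like).
tone = [math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE) for i in range(FRAME_SIZE)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(FRAME_SIZE)]
tone_power = power_spectrum(tone)
noise_power = power_spectrum(noise)

print("crest factor (sine):", crest_factor(tone))  # about sqrt(2) for a sine
print("centroid (sine, Hz):", spectral_centroid(tone_power))
print("rolloff (sine, Hz):", spectral_rolloff(tone_power))
print("flatness sine / noise:", spectral_flatness(tone_power), spectral_flatness(noise_power))
print("ZCR sine / noise:", zero_crossing_rate(tone), zero_crossing_rate(noise))
print("flux sine -> noise:", spectral_flux(tone_power, noise_power))
```

In this sketch the sine yields a flatness close to 0 and the noise frame a clearly higher value, matching the tonal versus noise-like distinction in questions 3 and 4.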


Appendix D: Textual Feedback

Test1

Index | For Original Sound | For Spatialized Sound
1 | The sound is like the hair shaver. The parameters effect changed the pitch | I can feel more detail after the spatialization. I can feel the sound sometimes very concentrated and sometimes separated.
2 | first parameter frequency; second parameter changes the of long the signal 1 lasts and at the same time also changes the waveform; the third parameter i forget. | The sound changes mainly with second parameter. Obviously the sound property changing is associated with spatial properties but due to my limited knowledge i cannot describe precisely. Frequency seems to have no relation.
3 | button noise in different tonal | Low frequency going left, high frequency going right
4 | metallic, the saw wave is filtered in to square wave | muddy and clear
5 | like a alarm’s sound | the first button will change the sound randomly, but the second and the third one can hear the direction of sound clearly.
6 | Continuous sound, parameter 1 changes pitch, parameter 3 changes pulse width, parameter 2 makes the sound less sharp | Sounds like the perceived amplitude is related somehow to where the sound is positioned in space. Also related to the sharpness of the sound, or when the pulse width is very small it sounded like the sound moved in space more drastically.
7 | sounds substrative synthesis. like a square wave which is folded. 1st knob controls pitch, 2nd some filter, 3rd brightness overtones. 2nd is interesting! | theres a perceptual adding up to it. i feel some tresholds that evoke fast moving. the sound feels more lively. fast movement in space
8 | 1. frequency low/high 2. waveform triangle/sine/square 3. hpf/lpf | 1. feels more spatial compared to the original 2. it moves around between my left side to right side
9 | Slider for Volume, Third knob for pulse width mod, second filter off higher freq, first knob changes pitch/freq. | spatial auto pan from, sound changes from front right top to front right bottom with different different pitches, pan positions switch between left to right based on different filter positions and pulse width
10 | 1: Frequency Speed of frequency change 2: Filter 3 Pulse waveform change., | Spatial pan
11 | the sound is relatively sharp, with some rhythmic feeling the parameter on the top changes the frequency feeling of the sound the second one changes the range the third one controls the richness of the sound | when some parameter changes, the spatial properties changes at the same time, getting harder to control the tone and spatial properties separately the sound with higher frequency or sharper tone gives feeling they are focusing on one point, left or right, very clearly
12 | Sound itself is tonal Knob1 changes pitch Knob 2 changes tonal qualitz Knob 3 makes it noisz | After spatialization the sounds feels like it has higher oscilation Certain spatial directions respond to certain pitches
13 | Top knob: frequency, changed pitch; middle: filter, tone changed; bottom: waveform. | The 4 speakers have different set-up that allow different type of sound to pass through.
14 | top: frequency; mid:sharpness and warmness; bot:different Anteil [proportion] of Amplitude | it’s more direct with spatialization, and left side has more diffusion
15 | raw, technical, square wave | it can be narrow, but also really wide. with frequency changes, you can also feel changes of the percepted room in the sound, can sound very big and strong and with no direction, but also very direct.
16 | much bass, noisy, not comfortable | sounds like a bee, annoying
17 | sharp, granulated | feels like there are more sound sources, can mostly locate the sound source
18 | square wave to triangle, frequency adjustable, shape width of the waveform adjustable, constant sustained clear tone | harsher timbre wise, but at the same time not as harsh, because it was spatially distributed and one could lay down in the sound field sound not so focussed, so less annoying


Test2

Index | For Original Sound | For Spatialized Sound
1 | The sounds are more wobbly after I change the second knob. | After I changed the third knob, the frequency and the location of the sound both change.
2 | first one still frequency, second seems to change the combination of the two waves(?), third still the wavewidth. Second is confusing. | when the second is set up to a certain value, if you change the third one the sound is shifting between front left and front right.
3 | Knob above controls the frequency (high low) Knob middle controls the cut off Knob below is like a oscillator | Turn the knob clockwise and the sound also goes clockwise. At the same time, the parameters also increase the frequency.
4 | flowting, rounded, using square wave to mudulate the frequency of sine wave | the location of sound changes with resonance
5 | the both sides are keeping more balances, can’t feel the dynamic process anymore | like the alien’s sound
6 | FM synthesis, param one is carrier freq, param two is modulation amount, param 3 is modulator freq | sounded like the perceived pitch is what is guiding the spatialization. when modulating the frequency and finding slow frequency oscillations I could hear the sound moving with the pitch
7 | def modulated sound. goes from sine to noisey harsh rich ovetone but also more tone sound charac. becomes rhzthmic in some values of knob 23 | filtering invokes the movement. clockwise. but the 1st param also creates fast rhythmic back and forth panning realted to the sound peaks. rather fast movements
8 | 1. fundamental frequency 2. cut-off frequency/ waveform 3. intensity of the sound | feels more dynamic phase shifting, frequency modulation,
9 | 2nd knob fm depth, 1nd knob modulator freq, 3rd knob, carrier freq. | first version stereo field, second version adding surround movements to sound elements based on different parameter changes
10 | Frequency modulator | Shaos
11 | the sound is sinus smooth sound the parameter on the top controls the frequency the second one controls the variation or how much the other wave interrupt the sound the third one also changes the frequency but in a different way, it also controls the interception of the other wave | feels like speeding up, increasing the frequency while running around clockwise with the sound getting noisy, or complicated, the spatial properties also change, more frequency, move faster
12 | Sine wave, with different frequency modulators Tonal Metallic sounds very bright can be achieved | Roomy sound With pitch change, sound moves left to right high oscilattions
13 | Top knob: frequency; middle: how often does the bottom knob effect happen; bottom: adding a second wave in. | Space journey. Different frequency pass and other set-up for receival sound for different loudspeaker.
14 | top:basic frequency; mid:the frequency of 2nd signal; bot:sampling rate | softer, different moving speed
15 | low Sine wave, change in frequency but also waveform | perceived space changes with frequency and shift in parameter 2. it gets more sounding like metal, and feels a bit wobby
16 | continous, wavy , smooth | higher, spacy
17 | sound is width, change is sensitive | sound barely changes, but is hard to locate
18 | from low and simple very fast to very complex sound with lots of harmonics, can be very harsh sounding my guess: fm synthesis 1. fundamental frequency 2. modulator frequency 3. modulator gain | it’s possible to generate interesting movements of soundshape and spatial positioning interesting points where sound switches rapidly, like stable sustained sound (sine like) breaks into modulated harsh wobbling sound


Test3

Index | For Original Sound | For Spatialized Sound
1 | The change of sound is very intuitive. | When the modulator frequency and depth are bigger, the sound is wider
2 | third parameter frequency also relates to spatial properties. first one changes the quality of sound dramatically, seems to become very metalic and inharmonice. second one seems to be modulator differences. | the first parameter changes, seems to have more other waves integrated, becoming more inharmonic.
3 | Knob above decides the tonal. Knob middle makes the sound narrow. Knob below makes volume wave | the modular amplitude makes the sound stretching in the space
4 | ring modulation? | pure sound is more dispersed
5 | the third button can create the break of the sound | the first button can change the sound from far to near, and another two button can make the sounds jumping
6 | amplitude modulation; param 1 is frequency of first osc, param 2 is frequency of second osc, param 3 is modulation amount | sounds like the more complex the sound (more harmonics) the wider is the spatialization
7 | am modulation i guess. param 3 controls how fast amplitude is changing. 2 give some distorstion,noise maybe realted to how much effect. sound feels very organic moving from big to small | tough to spatialise. at some in between setting sound was moving to the right but couldnt redo it again. the 2nd param didnt feel so strong effect as it was in stereo.
8 | resonance noise-like filter frequency modulation | feels like a match between frequency shift and spatial movement. diffusion from point to array dot to continuity
9 | Ring Mod, 1st knob making it more noisy, 2 changes center freq, 3rd amplitude mod. | 1st knob center to sides, 2nd no pulsing to pulsing, 3rd more upper harmonics to more dull
10 | Ambient. Slow changes. Flat | Harmonic volumes 2.Modulator Amplitude Modulator Frequency
11 | sound with high frequency and waving the first parameter controls proportion of first kind of sound the second controls the second kind the third one controls how much the sound waves | with higher proportion of first wave, the sound getting more centralized, with more parameter two the tone does not change much but sound getting wider, and the third parameter makes sound more waving, also more spacey
12 | Tonal base sound, sine wave First knob changes, how noisy the tone will be getting less tonal other knobs make the sound oscilate | Possible to widen the sound quite a bit, but one frequency stays in the middle
13 | Top knob: add wave to existing wave? Middle: modulator frequency; Bottom: modulator changing amplitude. | I have no idea. But I think when changing knob 3 the change is more obvious.
14 | top: overlap another high frequency signal; mid: overlap another low frequency signal; bot: amplitude osc | central sound very concentrated, width changes with different overtone frequency
15 | Sinus mit veraenderlichem Klirrfaktor und amplitudenmodulation [sine with variable distortion factor and amplitude modulation] | more spacious, a bit darker then the other sounds
16 | only high frequencies, smooth, consistent | conistent, wavy
17 | sound is smooth, change is clear to hear | sound is at right side, properties is clear
18 | sine wave with fixed frequency you can add harmonics to the sine with fixed frequency you can add second sine and adjust it’s frequency smooth sound to a little bit harsh very stable sustained sound | wider, more open very unstable in comparison not so smooth anymore, but also not harsh fragile


Test4

Index | For Original Sound | For Spatialized Sound
1 | It sounds like sea and the environment noise. | When the parameters of the second and third knobs are bigger, the jump feelings are less.
2 | related to noise. first parameter changes the wave integrated to the main sin-wave, second parameter changes the main sin-wave, third one more noisy or less. | first and second parameters make the jump of spatialization, third don’t change much of the jump.
3 | knob above change the volume. knob middle add a low frequency slowly that gets higher. knob below makes the sound more thin and penetrated | Turn the knob above move the sound to the top. turn the knob middle and down raise the pace of the sound jumping from different speaker.
4 | Low frequency oscillator and noise | the cut off of the carrier wave changes the location
5 | feeling of white noise and can control the coarseness of voice | the white voice shows strong space effects
6 | sounds like a noise source with amplitude being modulated by a sine wave, parameter 3 seems like a high pass filter, param 1 is noise amplitude and param 2 is the sine wave frequency | it sounds like when the sine wave oscillator crosses the zero the sound jumps to another speaker. when the sine wave frequency is very slow you can hear it choosing randomly the speakers very quick, so it feels like the is a small threshold near the zero crossing.
7 | feels like 2 sounds. a deep base sine tone and white noise on top of the waveform. 3rd feels like highpass filter. yet i did not perceive it as one sound rather two entities | it becomes very rich spatial material, with lots of options, from shivering noise creeches to very distinct static impulses, to very smooth drones. it really evokes learning and playing!!!
8 | 1. gain 2. fundamental frequency of a masked tone 3. noise strength | sound feels completely different clockwise-rotation noise strength relative location
9 | 3rd knob bandpass filtering thru, 2nd knob adjusting LFO freq, 1st knob noise level | 1 knob beating of noise oscillator 1, 2nd knob fundamental freq of oscillator two, 3rd knob forgotten
10 | Noise 1. Noise Amplitude 2. LFO Frequency 3. Resonant Level | Subtle
11 | the sum of some noise and bass sound with very low frequency the first parameter controls how strong the noises are the second controls the frequency of bass sound the third one controls the threshold of filter | it just jumps out of somewhere randomly, but with rhythm, but with interception of bass sound the noise get controlled
12 | White noise, frequency modulation possible | The random placement on the speakers is quite intense Speed of change can be adjusted, from perceivable change to chaos
13 | Tob knob: noise; middle: low freq osc; bottom: filter. | If waves in view4 has x y: then I would say each singal speaker is responsible for different ranges of x y.
14 | top:white noise; mid:low frequency; bot:high pass | spatial properties come with the Amplitude changement
15 | white noise, changes in frequency of the sine and the noise | Das Rauschen wird auf die Lautsprecher gegeben und die Verteilung erhoeht sich, mit steigender Frequenz. Sound fuehlt sich gross an und der Einfluss auf den Klang sehr gut [The noise is sent to the loudspeakers and the distribution increases with rising frequency. The sound feels big and the influence on the timbre is very good]
16 | rushy, low frequently, | more stereo, more 3d effekt,
17 | sound is wide and ambient, change is slight | chaotic, cant tell direction of the sound
18 | noise oscillator modulated by lfo 1. noise gain, 2. lfo frequency 3. filter frequency of soft bell filter? good sub coloured noise | moving fast nosiy harsh all over the place not so sustained anymore but almost percussive lfo rate and noise gain were assiociated with rate of movement, maybe some sort of dynamic property


Comprehensive Feedback

Index | Question 3 | Question 4 | Question 8 | Question 9 | Question 10
1 | It would be difficult to understand how the parameters affect the spatial sound without basic acknowledge of the sythsizer. | No. Very intuitive | Sorry I can’t tell | It must be amazing. | No
2 | no knowledge of the field therefore hard to grasp in general. | no, in general smooth. | enough amount of hardware i guess. | yes. | not really.
3 | The synthesizer jargon | no | The ambisonic sequence | Yes | no
4 | Although it’s hard for me to use technical terminologies to describe the accurate parameters, which change the spacial properties, I can still strongly feed the connection between the space and sound character. The operation on sound influences indirectly the sound field. | nop | Zerr* as an additional layers between players and audio systems create an automatic, changeable and intelligent effect on sound locality, which doesn’t fully controlled by players themselves, instead indirect gains influences. | I see so many potentials.:) looking forward to seeing it be realised in hardware (like daisy?) | GREAT
5 | to find the exact sources where the single sound come from | no | can self control and have more possibility | yah;) | can create and try more different sound’s resoure and maybe can let us to try the combinations of any single sound
6 | no | no | To couple synth (or any parameters) to the spatialization, and to use audio analysis to decide how the spatialization occurs sounds to me more interesting than trying to recreate real-world spatial perception (like in systems such as ambisonics, wfs, etc) | I would definitely like to play live with different speaker setups and Zerr* as my spatialization tool | I would love to experiment with this concept in different speaker arrangements that don’t necessarily follow rings/arrays but more asymmetrical figures, for instance.
7 | how mapping strategy was chosen | no | i think it really shines with more spatial capabilities. eg more speaker or more dimensions. it feels very intuitive cause you listen to sound and its spatial characters together. differetn from gui based spat | i think it a great tool for playing live - find sound and mappings that match your personal style!! |


8 | no | the connection of the ethernet cable | the possibilty to freely modify your presets | yes. | very impressive application made with graphical representation and knobs for adjusting parameters
9 | no | not much thank you | more concentrated possiblity of sound | Yes | Good work!
10 | Good | No | It’s simpler, more practical | Live for Autechre | Keep going
11 | the knowledge of synthesizer | nope | controls the spatial properties while changes the sound properties | yes, but need equipment support | the control of spatial properties could be more clear
12 | The translation of parameters from stereo to spatial represenatation | No | The ability to freely play with parameters, that automatically will be translated into spatial sound | For live music with needed speaker setup, special atmosspheres could be created |
13 | Not for the Zerr but just general concepts. | Nope | No previous experience with other spatial audio systems. | Yes. Makes it possible to pass over energy to engage audience from different area, if it is a big venue. | Idk how this would affect studio music production though.
14 | n/a | basic concept knowledgement | Freiheit und Moeglichkeit [freedom and possibility] | Neue Musik Komposition [New Music composition] | Very potiential
15 | not really | no | great and very easy movement of space and sound | yes, totally | no comments
16 | the graphics | no | wellenkopplung und 3d verteilung [wave coupling and 3D distribution] | yes | no idea
17 | definiton of spectral centroid etc. | not able to good control the knob | can freely assign parameters to the synthesize and listen to the effect | yes | tell the people before the test that which loudspeakers are used in which view
18 | spatial mapping for am, ring modulation | could not push square wave synth to the right | combination of spatial and timbre characteristics opens up possibilities especially for improvising or intuitive composing | see question 8 | no
