
Proc. of the EAA Joint Symposium on Auralization and Ambisonics, Berlin, Germany, 3-5 April 2014

SCALE MODEL AURALIZATION FOR ART, SCIENCE, AND MUSIC: THE STUPAPHONIC EXPERIMENT

Brian F.G. Katz

Audio & Acoustics Group

LIMSI-CNRS

Orsay, France

[email protected]

Markus Noisternig

Acoustic and Cognitive Spaces Group

UMR STMS, IRCAM-CNRS-UPMC

Paris, France

[email protected]

Olivier Delarozière

Woodstacker

Champ-au-Beau, France

[email protected]

ABSTRACT

The use of acoustical scale models has been replaced for the most part by computational models and numerical simulations for room acoustic studies as well as artificial reverberation units. There remain, however, a number of acoustical phenomena which are difficult to address with computer simulations, such as coupled volumes, diffraction, and complex scattering, due to the computational complexity and/or calculation time necessary for addressing such acoustical wave phenomena on the scale of room acoustical problems, even for small rooms. This paper presents a pilot study of a rather unique artistic architectural structure consisting of a self-supporting construction composed of small stacked linear elements. Acoustically, the structure combines modal behavior, concave forms, and very regular scattering patterns. An example scale model has been constructed and studied in order to separate different construction features and their associated acoustic effects. To explore the musical interest of this specific acoustic, a computational platform was created to utilize the scale model as a physical convolution reverberation unit for musical performance.

1. INTRODUCTION

With the advent of recording, and dry recording studios, there have been many efforts to reintroduce reverberation into studio-recorded music. Some of the first technologies developed were “echo chambers”, wherein the dry audio captured with the microphone was diffused in a reverberant environment over loudspeakers, and then recaptured with microphones. This physical-based artificial reverberation was quite popular, with examples existing in such famous institutions as Abbey Road Studios, where the echo chamber was constructed in 1931 (see footnotes a and b). Echo chambers are, however, space demanding, difficult to transport, and not very adjustable. With improvements in electronics, other physical-electronic reverberation systems were developed, such as plate reverberators and spring reverberators.

a http://en.wikipedia.org/wiki/Echo_chamber, last viewed 2013-11-30
b http://audiogeekzine.com/2011/02/the-history-of-echo-echo-chambers-chambers/, last viewed 2013-11-30

With improvements in computer processing power, purely electronic reverberation became possible, for example using feedback delay networks (FDNs) for reverberation processing [1, 2, 3]. These reverberators could be easily adjusted, for example using perceptual descriptors relying on a simplified model of the time-frequency energy distribution of parametric FDNs [4]. Such reverberators are however limited, lacking a certain realism and the ability to represent unique architectural elements. Additional increases in computational power allowed for the use of convolution reverberators, using complex impulse responses (IRs), either measured or calculated based on geometrical models such as ray tracing [5, 6], beam tracing [7, 8, 9], or radiosity [10, 11]. Convolution reverberators capture the fingerprint of a given space, but require preparation for the acquisition of such IRs and allow little flexibility with regard to modifying the room at the time of use, though perceptual control of convolution-based room simulators is a subject of current study [12].

Scale models, to date, have been used as off-line convolution reverberators to study architectural spaces [13, 14], but never in a performance setting. The current study envisages the possibility of using scale models in the same way as the old “echo chambers” of the early and mid-twentieth century, while allowing for the creation of complex and unique acoustic spaces rather than just simple reverberators. Real-time use of one, or several, models and the ability to dynamically alter source, receiver, and even room positions and configurations as desired offers a new form of reverberation and musical expression.

In parallel, the development and exploitation of real-time scale model convolution offers a number of interesting scientific aspects. To begin with, there is the basic signal processing challenge of achieving such a system. The application of real-time physical-based convolution in scale models, in contrast to off-line convolution with measured impulse responses of the scale model, offers the ability to study room excitation by dynamic sources, such as moving or rotating sources, with perceptual studies. Of specific interest are perceptual studies concerning musician/room interactions, which require real-time processing of generated music in coordination with source dynamics. The effect of dynamic architectures can also be examined, such as movable panels, or dynamic listener placement or movement during a performance.

2. ARTISTIC CONTEXT

What child has not dreamed of being able to experience, as a Lilliputian, a world in miniature: doll houses, electric trains, miniature circuses... In the “Stupaphonic” project (see footnote 1), that childhood dream will become a reality for musicians: they will be able to play to their audience in a space in miniature.

This particular space is based on a special type of structure, which is at the core of the architectural project Woodstacker [15]. This architectonic type of structure is a solution to the geometrical problem of how to cover a large area by stacking small pieces of wood without the use of glue or nails. The result is a bottle-shaped three-dimensional rose window (see Figures 1 and 4). This new building system, based on “stacked laminated” timber structures, can evoke references to pagodas, whose construction also consists of stacked wooden elements. This stacked architecture, like the chortens of Tibet, belongs to a family of stacked structures derived from a Buddhist mound-like structure called a stupa. Stupas originated as pre-Buddhist earthen burial mounds, like tumuli in Europe. Thus was born the idea of linking our new project to this ancient and universal architecture.

The stupa is used by Buddhists as a place of meditation. In the original pre-Buddhist burial mounds, ascetics were buried in a seated position. The American anthropologist J. Jaynes [16] proposes that they were buried in this position so that they could continue to speak to living people. Hearing voices from beyond the grave suggests some acoustic illusions which are also a part of our device. In the Woodstacker system, the special geometrical pattern of the lamellas not only focuses the sound [17] but also functions as a frequency filter, producing a particular, almost metallic, sound. This strange acoustic effect is a second reason to link our project to the stupa as a kind of container for “voices from beyond the grave”, a “voice granary” (“grenier à voix” in French), to cite the French writer Pascal Quignard [18].

The larger structures we have currently built can accommodate about 30 people for sound experiments (see photograph in Figure 1). This size limitation is a compromise between the funding for artistic experiments and the cost of such a construction. With the project’s evolution we desired a means to quickly experiment with different architectural structures in a flexible way. The use of physical scale models, which are powerful tools for architects, carpenters, and acousticians [19], offered a solution to overcome the current constraints. Thus, we found a way to invert the acoustical environment, like turning a glove inside-out, and give the musician and audience located outside of the building the same acoustical experience that they could have inside the structure. Using the acoustic scale effect, we are able to drastically reduce the size of our installation, increase the number of structures performers can play with, and turn “stackscapes” (see Figure 2) into interactive soundscapes.

3. SYSTEM OVERVIEW

To achieve the artistic, acoustic, and audio scheme imagined, a basic scenario and system architecture was envisioned. One can imagine a performance area, where the performer is equipped with one or several microphones. The space is open and large. Near the musician are one or several acoustic scale models, equipped with ultrasonic speakers and microphones. Around the musician is the audience. The sound produced by the musician is captured, transformed to the scale of the model where it is played and recaptured, then transformed back to the full scale of the musician’s performance, where it is played live to the audience over an electro-acoustic array of speakers either on-stage or around the audience.

Figure 1: Photo of live performance at StackCamp 2013 featuring Didier Petit (cello) and Emre Gultekin (saz). Champ-au-Beau, France.

Figure 2: Blue Stackscape, © 2006 O. Delarozière.

1 Stupaphonic: from stupa (from Sanskrit: m., स्तूप, stūpa, literally meaning “heap” (a)) and phonic (from Ancient Greek φωνή, phōnē, meaning “voice” or “sound” (b)).
a http://en.wikipedia.org/wiki/Stupa, last viewed 2013-11-30
b http://en.wikipedia.org/wiki/Phonetic, last viewed 2013-11-30

3.1. Signal Processing

The signal processing chain is depicted in Figure 3. First, the input audio signal is up-sampled to the ultrasound sampling rate, which is determined by the scaling factor of the model. Second, the up-sampled input signal is transposed by the scaling factor, preserving the harmonic structure of the signal. Two implementations have been tested: a) off-line transposition that allows for the time-stretching of the signal; b) a real-time implementation thereof, which compensates for the time-scaling effect using a phase vocoder and thus preserves the continuity of the audio stream.

Figure 3: Signal processing flow chart: (a) instrument signal capture, transformed audio signal playback; (b) up/down-sampling with anti-aliasing filters; (c) frequency transposition, and (d) the Stupaphonic scale model.

The off-line version study was carried out using MATLAB®. Audio samples were processed individually, with no audio streaming functionality. The basic approach consisted of taking an audio extract, resampling the audio, modifying the sample rate in order to apply the scale factor, playing and recording the sample in the model, and then retransforming the recorded physically-convolved signal to full scale for audio playback. A code sample for the described process is provided below:

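fs = 44100;                                        % sample rate of the dry recording
[y] = wavrecord(audiolength,fs);                   % audiolength: number of samples to record
fs_max = 192000;                                   % sample rate in scale model audio chain
scale_desired = 12;                                % scale factor of the model (1:12)
[y_resamp] = resample(y,fs_max/scale_desired,fs);  % resample (with anti-aliasing) to fs_max/scale_desired
fs_resamp = fs_max/scale_desired;                  % 16 kHz for the values above
[y_convolved]=wavplayrecord(y_resamp,fs_max);      % play and record in the model at fs_max
wavplay(y_convolved,fs_resamp);                    % play back at fs_resamp to return to full scale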

In this example, with a maximum sample rate of 192 kHz on the audio system and a scale factor of 12, the recorded audio track is resampled to 16 kHz (band-limited to 8 kHz). This resampled track is then played back and recorded in the scale model at an actual sample rate of 192 kHz. The recorded physically-convolved signal is then played back at an actual sample rate of 16 kHz, or resampled to the sample rate of the audio device. Simply redefining the sample rate of the audio buffer applies the scale factor, while the resampling ensures correct anti-aliasing filtering.
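
The rate arithmetic of this example can be checked with a short sketch. It is written in Python rather than MATLAB purely for illustration; the function names `scaled_rates` and `reinterpret` are hypothetical, and the 440 Hz, 6 s test values are assumptions that do not come from the study.

```python
def scaled_rates(fs_max, scale):
    """Return (resampled rate, full-scale bandwidth) for a chain rate and scale factor."""
    fs_resamp = fs_max / scale   # rate the dry signal is resampled to
    bandwidth = fs_resamp / 2    # Nyquist limit expressed at full scale
    return fs_resamp, bandwidth


def reinterpret(frequency_hz, duration_s, scale):
    """Effect of playing a buffer `scale` times faster: pitch rises, duration shrinks."""
    return frequency_hz * scale, duration_s / scale


# Values from the example in the text: 192 kHz chain, 1:12 model
print(scaled_rates(192000, 12))      # (16000.0, 8000.0)

# Hypothetical 440 Hz, 6 s tone as seen from inside the model
print(reinterpret(440.0, 6.0, 12))   # (5280.0, 0.5)
```

Playing the recorded signal back at the resampled rate applies the inverse mapping, which is why no explicit pitch shifting is needed in the off-line version.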

The approach is currently tested in single-buffer full convolution. Future studies will evaluate the possibility of applying overlap-add convolution [20] to this physical-based convolution in order to allow for real-time operation on audio streams.
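
As a reminder of the principle, the following is a minimal numerical sketch of overlap-add convolution, not the system under study. It assumes the scale model behaves as a linear time-invariant system and stands in for it with a short hypothetical impulse response `h`. In a physical implementation, the call marked in the comment would be replaced by playing one block in the model and recording its full reverberant tail.

```python
def convolve(x, h):
    """Direct (full) linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def overlap_add(x, h, block):
    """Convolve x with h block by block, summing the overlapping tails."""
    y = [0.0] * (len(x) + len(h) - 1)
    for start in range(0, len(x), block):
        segment = x[start:start + block]
        partial = convolve(segment, h)   # physical case: play segment, record response
        for k, value in enumerate(partial):
            y[start + k] += value        # the tail overlaps the next block's output
    return y


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]   # hypothetical input stream
h = [1.0, 0.5, 0.25]                      # hypothetical stand-in for the model response
direct = convolve(x, h)
streamed = overlap_add(x, h, block=3)
print(direct == streamed)                 # True
```

The block-wise result matches the single-buffer result because convolution is linear; the practical difficulty in the physical case is that the tails of successive blocks overlap in time inside the model itself.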

The real-time version study was conceived as an alternative to the above approach, employing resampling and transposition to apply the scaling factor through the use of a high-quality phase vocoder architecture. Phase vocoder techniques are typically based on a sinusoidal signal model. The digital audio sampling rate conversion employed band-limited interpolation (see e.g. [21, 22]), which can be efficiently implemented with sinc-function look-up tables. In [23] it was shown that parametrized phase vocoders can also be applied to non-sinusoidal signals. However, initial tests showed that the sinusoidal signal model limits the use of phase vocoders for real-time scale-model processing with large scale factors. Modified algorithms are the subject of continuing investigations.
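
Band-limited interpolation of the kind referred to above can be sketched as follows. This is an illustrative Python version under stated assumptions, not the implementation used in the study: the kernel is a Hann-windowed sinc evaluated directly rather than read from a look-up table, and the names `windowed_sinc` and `resample_sinc` and the `half_width` parameter are hypothetical.

```python
import math


def windowed_sinc(t, half_width):
    """Hann-windowed sinc kernel, zero for |t| >= half_width."""
    if abs(t) >= half_width:
        return 0.0
    sinc = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)
    window = 0.5 * (1.0 + math.cos(math.pi * t / half_width))
    return sinc * window


def resample_sinc(x, ratio, half_width=16):
    """Resample x by ratio = output rate / input rate using band-limited interpolation."""
    cutoff = min(1.0, ratio)               # lower the cutoff when down-sampling (anti-aliasing)
    reach = half_width / cutoff            # kernel support in input samples
    y = []
    for m in range(int(round(len(x) * ratio))):
        t = m / ratio                      # output instant in input-sample units
        lo = max(0, int(math.floor(t - reach)))
        hi = min(len(x) - 1, int(math.ceil(t + reach)))
        acc = 0.0
        for n in range(lo, hi + 1):
            acc += x[n] * cutoff * windowed_sinc(cutoff * (t - n), half_width)
        y.append(acc)
    return y


# 10 ms of a 1 kHz sine at 44.1 kHz, converted to 16 kHz
fs_in, fs_out = 44100, 16000
x = [math.sin(2 * math.pi * 1000 * n / fs_in) for n in range(441)]
y = resample_sinc(x, fs_out / fs_in)
print(len(y))   # 160
```

A table-based version would precompute the kernel on a fine grid and interpolate between entries, trading memory for the cost of the trigonometric evaluations.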

Figure 4: Woodstacker stacked lamella timber cupola. Champ-au-Beau, France, 2010. (upper) A winter outside view. (lower) Section and reflected ceiling plan.

3.2. Scale Model

This preliminary study has been carried out using a single structure as a test case. The 1:1 scale structure was built in 2008 for a Land Art exhibition in the highlands of Auvergne, France (see Figure 4). The original building comprised 366 pieces of Douglas pine wood. It was 5 m in diameter, 4.5 m high, and weighed 6 tons. This work, called “Vox Granarium” [24], was dedicated to famous ancient fiddlers from this area. The installation was dismantled and moved to Morvan, where it was rebuilt in 2010. The Stupaphonic model is a 1:12 scale model of “Vox Granarium”, 425 mm in diameter and 322 mm high. This scale was chosen due to material availability and because it is a traditional doll house scale. Serendipitously, 1:12 is also the Lilliputian people’s scale in Gulliver’s Travels (see footnote 2). The model was constructed from oak wood, whose lengths and widths were hand cut, with no automatic process used for assembly. Unlike the full-scale construction, glue was used to fix the lamellas, and the model was assembled in three parts for ease of transportation and manipulation (see Figure 5).

Due to the very long time needed to build this model by hand, subsequent models will probably be built using rapid prototyping techniques such as laser cutting. This will allow quick experimentation with a large variety of structural shapes and configurations. We are also planning to use other materials, such as metal or concrete, for special acoustic effects.

2 “... having taken the height of my Body by the help of a Quadrant, and finding it to exceed theirs in the Proportion of twelve to one ...” [25, p. 64]

Figure 5: Scale model (1:12) of “Vox Granarium”, highlighting the tetrahedral acoustic source and 3 modular elements.

Figure 6: Spectral analysis of impulse response (balloon burst excitation) of the full (1:1) scale “Vox Granarium”, recorded with a sample rate of 96 kHz. Temporal analysis windows corresponding to direct/initial (0–10 ms), early decay (10–30 ms), and late tail (30–∞ ms).

4. PRELIMINARY TEST

The diffraction acoustic effect discussed in Section 2 can be observed as a series of high-Q tonal resonances. These resonances can be observed by comparing the spectral response (magnitude of the FFT) at different moments in the impulse response. Figure 6 shows the spectral response at three moments in the impulse response (balloon burst excitation signal employed for room acoustics measurements [26]) of the full (1:1) scale “Vox Granarium” (see Figure 4). Resonant peaks are marked in the response for identification. There is clearly a region of high resonance density over the frequency range 100–600 Hz, continuing up to ≈800 Hz.
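
The windowed analysis described above can be illustrated on synthetic data. The sketch below is an assumption-laden Python example, not the analysis code of the study: the impulse response is a made-up click followed by a single decaying 300 Hz resonance at a hypothetical 8 kHz sample rate, and the spectrum is evaluated with a plain DFT at a few frequencies instead of a full FFT. Only the window edges (0, 10, and 30 ms) are taken from the text.

```python
import cmath
import math


def magnitude_spectrum(segment, fs, freqs):
    """Magnitude of the DFT of `segment`, evaluated at the given frequencies (Hz)."""
    out = []
    for f in freqs:
        w = -2j * math.pi * f / fs
        out.append(abs(sum(s * cmath.exp(w * n) for n, s in enumerate(segment))))
    return out


def windowed_spectra(ir, fs, edges_ms, freqs):
    """Split an impulse response at the given edges (ms) and analyse each part."""
    bounds = [int(round(e * fs / 1000.0)) for e in edges_ms] + [len(ir)]
    return [magnitude_spectrum(ir[a:b], fs, freqs)
            for a, b in zip(bounds[:-1], bounds[1:])]


# Synthetic impulse response: click plus one slowly decaying 300 Hz resonance
fs = 8000
ir = [math.exp(-n / (0.05 * fs)) * math.sin(2 * math.pi * 300 * n / fs)
      for n in range(1600)]
ir[0] += 1.0

freqs = list(range(100, 801, 100))
direct, early, late = windowed_spectra(ir, fs, [0, 10, 30], freqs)
print(freqs[late.index(max(late))])   # 300
```

A high-Q resonance persists into the late window while broadband energy from the excitation does not, which is the behaviour the comparison across windows is meant to reveal.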

The audio system employed for use in the scale model consisted of DPA 4060 microphones and a custom 3-speaker tetrahedron (see Figure 5) driven by a Samson Servo amplifier, connected to an RME Fireface 400 audio interface. While somewhat unconventional in traditional scale model research, this selection of pro-audio equipment has been used in previous studies [27, 28] and has been shown to provide improved signal-to-noise ratio when compared to more traditional laboratory scale model measurement architectures. The current hardware exhibits a frequency roll-off at ≈50 kHz. This of course imposes a low-pass frequency limitation on the physical-based convolution. With a scale factor of 12, the upper frequency limit due to this roll-off is on the order of 4.2 kHz, rather than the 8 kHz permissible according to sampling theory. While suitable for the majority of studies in room acoustics with scale models, the musical implications of this limit cannot be ignored. This limit can be raised by improving the upper frequency limit of the audio chain or selecting a lower scale factor.

Figure 7: Spectrogram of anechoic music extract (upper) and physical-based convolved music (lower), manually time-aligned.
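
The trade-off between hardware roll-off and scale factor discussed above reduces to a one-line calculation, sketched here in Python. The function names are hypothetical, and the 10 kHz target bandwidth in the second call is an assumed value chosen only to illustrate the inverse calculation.

```python
def full_scale_bandwidth(fs_max, rolloff_hz, scale):
    """Usable full-scale bandwidth: the tighter of Nyquist and roll-off, divided by scale."""
    return min(fs_max / 2.0, rolloff_hz) / scale


def max_scale_for(target_hz, fs_max, rolloff_hz):
    """Largest scale factor that still leaves target_hz of full-scale bandwidth."""
    return min(fs_max / 2.0, rolloff_hz) / target_hz


# Values from the text: 192 kHz chain, roll-off near 50 kHz, 1:12 model
print(round(full_scale_bandwidth(192000, 50000, 12)))   # 4167

# Assumed target of 10 kHz full-scale bandwidth with the same hardware
print(max_scale_for(10000, 192000, 50000))              # 5.0
```

With the stated roll-off, the transducers rather than the sample rate set the limit, so a higher sample rate alone would not widen the usable band.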

An example result of the processing chain can be seen in Figure 7, which shows the spectrogram of a dry music extract before and after the physical-based convolution processing. The test music excerpt was a dry multichannel recording of a Schubert trio (D.929, op.100) by [29], which is publicly available (see footnote a).

The acoustic timbre of the convolved signal using the scale model greatly resembled that of the musical experience heard within the full-scale installation. Even though the processing steps apply a low-pass filter effect, due predominantly to transducer and amplifier performance limitations above 50 kHz, the frequency range where the resonance characteristics of the structure are apparent is still well within the operating frequency range of the current signal processing chain for the 1:12 scale.

5. CONCLUSIONS

This paper has presented the foundations of the “Stupaphonic experiment”, an artistic and scientific project which aims to use scale models as physical-based convolution reverberators. The architectural structure at the center of the project offers specific timbral qualities which are maintained in the initial tests, despite the frequency limitations of the scale transformations and associated signal processing chain.

The current example operates in an off-line, or time-deferred, situation. While streaming is still being investigated, the current implementation could already be used in a performance setting in a live looping context, where the musician could send different audio samples to different architectures at a single or at different scale factors, effectively changing the size of the “echo chamber”.

a http://c4dm.eecs.qmul.ac.uk/rdr/handle/123456789/27

The development of the real-time processing stage, currently a subject of study, will allow for exploitation of the proposed physical-based convolution for studies in room acoustics, specifically those involving dynamic source, listener, or architectural elements, as well as dynamic performer/room interactions.

One artistic performance aspect specific to this system is the potential for cross-scale cross-talk. If the scale models are open to some degree, then the up-scaled audio will be heard by some of the audience. At the same time, full-scale sounds, such as other elements of the performance or noise from the audience, can also be captured in the scale model, and subsequently down-scaled and played over the reproduction array. With the Lilliputian scale factor of 1:12, one can imagine that the majority of these sounds will be shifted to the lower end of the audible range, or into the subsonic region. However, a high-pitched scream with a center frequency in the 5 kHz third-octave band, for example, would be clearly audible when transposed to the 400 Hz third-octave band, albeit also time-stretched to 12 times its original duration. Investigations of these effects, and their possible artistic use, remain the subject of further studies.
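
The scream example can be verified numerically. The following Python sketch is illustrative only: the helper names are hypothetical, the table lists the usual nominal third-octave centre frequencies, and the 0.5 s duration and the 200 Hz second source are assumed values.

```python
import math

# Nominal third-octave band centre frequencies (Hz) across the audible range
NOMINAL_BANDS = [20, 25, 31.5, 40, 50, 63, 80, 100, 125, 160, 200, 250, 315,
                 400, 500, 630, 800, 1000, 1250, 1600, 2000, 2500, 3150, 4000,
                 5000, 6300, 8000, 10000, 12500, 16000, 20000]


def nearest_band(frequency_hz):
    """Nominal band centre closest to frequency_hz on a logarithmic scale."""
    return min(NOMINAL_BANDS, key=lambda c: abs(math.log(frequency_hz / c)))


def downscale(frequency_hz, duration_s, scale):
    """Full-scale sound captured in the model and played back `scale` times slower."""
    return frequency_hz / scale, duration_s * scale


# 5 kHz scream of assumed 0.5 s duration, 1:12 model
f, d = downscale(5000.0, 0.5, 12)
print(round(f, 1), nearest_band(f), d)   # 416.7 400 6.0

# An assumed 200 Hz source falls below the audible range after down-scaling
f_low, _ = downscale(200.0, 1.0, 12)
print(f_low < 20.0)                      # True
```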

6. ACKNOWLEDGMENTS

This study was funded in part by an Action Initiative grant at the LIMSI-CNRS.

7. REFERENCES

[1] J-M. Jot and A. Chaigne, “Digital delay networks for designing artificial reverberators,” in Proc. 90th AES Convention, Paris, France, Feb. 1991.

[2] G. Garcia, “Optimal Filter Partition for Efficient Convolution with Short Input/Output Delay,” in Proc. 113th AES Convention, Oct. 2002.

[3] M. Noisternig, A. Sontacchi, T. Musil, and R. Höldrich, “A 3D Ambisonic based Binaural Sound Reproduction System,” in Proc. 24th AES Int. Conf., Banff, Canada, June 2003.

[4] J-M. Jot, “Real-time spatial processing of sounds for music, multimedia and interactive human-computer interfaces,” Multimedia Systems, vol. 7, no. 1, pp. 55–69, 1999.

[5] A. Krokstad, S. Strom, and S. Sorsdal, “Calculating the acoustical room response by the use of a ray tracing technique,” J. Sound Vib., vol. 8, no. 1, pp. 118–125, 1968.

[6] M. R. Schroeder, “Digital Simulation of Sound Transmission in Reverberant Spaces,” J. Acoust. Soc. Am., vol. 47, no. 2, pp. 424–431, 1970.

[7] T. A. Funkhouser, I. Carlbom, G. Elko, G. Pingali, M. Sondhi, and J. West, “A beam tracing approach to acoustic modeling for interactive virtual environments,” Proc. ACM Comp. Graphics (SIGGRAPH’98), pp. 21–32, July 1998.

[8] S. Laine, S. Siltanen, T. Lokki, and L. Savioja, “Accelerated beam tracing algorithm,” Applied Acoustics, vol. 70, no. 1, pp. 172–181, 2009.

[9] M. Noisternig, B. F.G. Katz, S. Siltanen, and L. Savioja, “Framework for real-time auralization in architectural acoustics,” Acta Acoust. united with Acust., vol. 94, pp. 1000–1015, 2008.

[10] C. Malcurt, Simulations informatiques pour prédire les critères de qualification acoustique des salles. Comparaison des valeurs mesurées et calculées dans une salle à acoustique variable, Ph.D. thesis, Laboratoire Acoustique Métrologie Instrumentation, Toulouse, France, July 1986.

[11] G. I. Koutsouris, J. Brunskog, C-H. Jeong, and F. Jacobsen, “Combination of acoustical radiosity and the image source method,” J. Acoust. Soc. Am., vol. 133, no. 6, pp. 3963–3974, 2013.

[12] T. Carpentier, T. Szpruch, M. Noisternig, and O. Warusfel, “Parametric control of convolution based room simulators,” in Proc. Int. Symp. on Room Acoust. (ISRA), Toronto, Canada, June 2013.

[13] J-D. Polack, X. Meynial, and V. Grillon, “Auralization in scale models: Processing of impulse response,” J. Audio Eng. Soc., vol. 41, no. 11, pp. 939–945, 1993.

[14] V. Grillon, X. Meynial, and J-D. Polack, “Auralization in small-scale models: Extending the frequency bandwidth,” in Proc. 98th AES Convention, Feb. 1995.

[15] O. Delarozière and U. Gleeson, “Woodstacker,” in Architectures autrement : Habiter le monde, M. Culot and A-M. Pirlot, Eds., pp. 46–51. AAM, Brussels, 2005.

[16] J. Jaynes, La naissance de la conscience dans l’effondrement de l’esprit, Presses Universitaires de France, 1994.

[17] B. Katz, O. Delarozière, and P. Luizard, “A ceiling case study inspired by an historical scale model,” in Proc. 8th Int. Conf. on Auditorium Acoust., Institute of Acoustics, Dublin, May 2011, vol. 33, pp. 314–321.

[18] P. Quignard and C. Lapeyre-Desmaison, Pascal Quignard le solitaire : Pascal Quignard, rencontre avec Chantal Lapeyre-Desmaison, Les Flohic éditions, 2006.

[19] O. Delarozière, “Camera tectonica : Hypothèses pour un facsimilé d’architecture,” in Utopia Instrumentalis : Facsimilés au musée - Musée de la Musique, Cité de la musique, Paris, Nov. 2010, pp. 46–56.

[20] A. V. Oppenheim and R. W. Schafer, Digital signal processing, Prentice-Hall, Englewood Cliffs, N.J., 1975, ISBN 0-13-214635-5.

[21] R. W. Schafer and L. R. Rabiner, “A digital signal processing approach to interpolation,” Proceedings of the IEEE, vol. 61, no. 6, pp. 692–702, 1973.

[22] J. O. Smith, III and P. Gossett, “A flexible sampling-rate conversion method,” in Proc. IEEE Int. Conf. Acoust., Speech and Sig. Proc. (ICASSP), 1984, pp. 112–115.

[23] W.-H. Liao, A. Roebel, and A. W. Y. Su, “On stretching gaussian noises with the phase vocoder,” in Proc. of the 15th Int. Conference on Digital Audio Effects (DAFx-12), Sept. 2012.

[24] O. Delarozière, “Vox Granarium,” in Horizons - Rencontres “Arts Nature”, pp. 12–13. Office de Tourisme du Sancy, July 2008.

[25] J. Swift, Part 1. A Voyage to Lilliput, vol. 1 of Travels Into Several Remote Nations of the World, chapter III, pp. 47–64, Printed for Benj. Motte, at the Middle Temple-Gate in Fleet-street, 1726.

[26] J. Pätynen, B. F.G. Katz, and T. Lokki, “Investigations on the balloon as an impulse source,” J. Acoust. Soc. Am., vol. 129, no. 1, pp. EL27–EL33, 2011.

[27] P. Luizard, Les volumes couplés : comportement, conception, et perception dans un contexte de salle de spectacle, Ph.D. thesis, Université Pierre et Marie Curie, Paris, France, 2013.

[28] P. Luizard, M. Otani, J. Botts, L. Savioja, and B. F. Katz, “Comparison of sound field measurements and predictions in coupled volumes between numerical methods and scale model measurements,” in Proc. Meetings on Acoustics, Montreal, June 2013, vol. 19, p. (9 pages).

[29] J. Fritsch, “High quality musical audio source separation,” M.S. thesis, UPMC / IRCAM / Telecom ParisTech, 2012.