Proc. of the EAA Joint Symposium on Auralization and Ambisonics, Berlin, Germany, 3-5 April 2014

PERCEPTUAL AND ROOM ACOUSTICAL EVALUATION OF A COMPUTATIONAL

EFFICIENT BINAURAL ROOM IMPULSE RESPONSE SIMULATION METHOD

Torben Wendt,

Acoustics Group, Medical Physics

and Cluster of Excellence “Hearing4all”,

University Oldenburg

Oldenburg, Germany

Steven van de Par,

Acoustics Group and Cluster of Excellence “Hearing4all”,

University Oldenburg

Oldenburg, Germany

Stephan D. Ewert,

Medical Physics and Cluster of Excellence “Hearing4all”,

University Oldenburg

Oldenburg, Germany

ABSTRACT

A fast and perceptually plausible method for synthesizing binaural room impulse responses (BRIRs) is presented. The method

is principally suited for application in dynamic and interactive

evaluation environments (e. g., for hearing aid development), psy-

chophysics with adaptively changing room reverberation, or simula-

tion and computer games. In order to achieve a low computational

cost, the proposed method is based on a hybrid approach. Using the

image source model (ISM; Allen and Berkley [J. Acoust. Soc. Am.

Vol. 66(4), 1979]), early reflections are computed in a geometrically

exact way, taking into account source and listener positions as well

as wall absorption and room geometry approximated by a “shoe-

box”. The ISM is restricted to a low order and the reverberant tail

is generated by a feedback delay network (FDN; Jot and Chaigne

[Proc. 90th AES Conv., 1991]), which offers the advantages of a

low computational complexity on the one hand and an explicit con-

trol of the frequency dependent decay characteristics on the other

hand. The FDN approach was extended, taking spatial room proper-

ties into account such as room dimensions and different absorption

characteristics of the walls. Moreover, the listener orientation and position in the room are considered to achieve a realistic spatial reverberant field.

Technical and subjective evaluations were performed by com-

paring measured and synthesized BRIRs for various rooms. Mostly,

a high accuracy both for some common room acoustical parameters

and subjective sound properties was found. In addition, an analysis

will be presented of several methods to include room geometry in

the FDN.

1. INTRODUCTION

Room acoustical simulations are desirable for many purposes, such as developing or testing signal processing algorithms or testing the effect of reverberation on speech intelligibility. Furthermore, they are of interest for audio-visual simulation environments (e. g., for training and rehabilitation) and in entertainment, e. g. in computer games, all requiring a real-time adaptation of the virtual environment, depending on the movement of the listener and/or the sound sources.

One traditional way to emulate the acoustics of a certain room

is to measure binaural room impulse responses (BRIRs) and to

convolve dry source signals with the BRIRs. However, such mea-

surements are time consuming and their usage is restricted to static

scenarios. Furthermore, one is restricted to actually existing rooms.

Alternatively, room acoustics can be simulated, enabling different degrees of realism, ranging from simple artificial reverb generation to complex room acoustical simulation (image source model [1], CATT [2], ODEON [3]), even for dynamic scenarios (e. g. [4], [5], [6]).

Depending on the application, either a physically correct rendering of the sound field is required, or a perceptually convincing auralization, implying plausibility and authenticity, is sufficient. For room simulations used in psychoacoustic research, rehabilitation or computer games, perceptual aspects are most important, implying accordance of room acoustical parameters, e. g. reverberation time and definition, and of measures like speech intelligibility. In this case simplifications can be made to reach a computational efficiency that allows for real-time rendering of dynamic acoustic scenes, in which the positions of sources and receivers can be changed interactively.

Several approaches exist to synthesize room impulse responses.

If the wavelength of a sound is small compared to the characteristic

dimensions of reflecting objects, concepts of geometric acoustics

(GA), such as the image source model [1] or the ray tracing method [7], can be applied. Both methods have been used and further

developed in various room acoustics simulation algorithms, mostly

as hybrids together with other algorithms (e. g. [

], [

]). However,

these methods still have high computational complexities.

If the exact room geometry is neglected, artificial reverberation can be synthesized very efficiently and with a predefined reverberation time. Here, a common approach is the feedback delay network (FDN), based on Schroeder’s pioneering work on parallel delay lines with feedback [9] and further developed (amongst others) by Stautner and Puckette [10] and Jot and Chaigne [11].

One way to achieve real-time performance while maintaining

the advantages of the more “accurate” GA-based BRIR synthesis and reverberation algorithms is their combination in a hybrid approach: The initial part of the impulse response is computed with a GA method, and the reverberant tail is generated by a more efficient reverberation algorithm. Perception motivates such an approach, as the early sound reflections create the impression of a certain spatial source width on the one hand and support speech intelligibility on the other hand. The subsequent reverberant tail contains diffuse reflections, and its frequency dependent decay characteristics convey information about the wall absorption and room size.

Here, a hybrid approach was evaluated which combines the

image source model (ISM) for a shoebox geometry to simulate early

reflections up to a low order, and an FDN for creating a diffuse

reverberation tail. The FDN was extended to be directly linked

to the room geometry used in the ISM, and to be able to spatially

render the reverberation tail in order to generate BRIRs. For low

ISM orders, BRIRs can be simulated very efficiently with this

approach. In technical and subjective evaluations, the ability of the

algorithm to create plausible and authentic simulations was assessed

for single and connected (coupled) shoebox rooms. Two different

approaches for spatial reverb distribution rendering were compared,

taking room dimensions and receiver position into account.

2. SIMULATION METHOD

A hybrid approach [12] was used to synthesize BRIRs. Early sound reflections are computed by an image source model up to a low order. The late reverberation is generated by a feedback delay network.

The auralization steps are described explicitly for the case of headphone presentation, reflected by the application of head-related impulse responses (HRIRs). The method can be adapted to arbitrary loudspeaker-based playback systems, such as higher order ambisonics or wave field synthesis, by replacing the HRIRs with the respective loudspeaker-controlling functions.

2.1. Image source model

The ISM regards a sound reflection as the direct sound of a mirrored

version of the original source. This so-called image source differs

from the original source by its time delay and its attenuation due to

the distance to the receiver, as well as the respective wall reflection

coefficient. The sound of an image source is reflected again at other

walls, creating higher order image sources. In this way, arbitrarily

complex reflection paths can be modeled.

The ISM implementation in the proposed simulation method

is restricted to empty shoebox-shaped rooms, where the six wall

surfaces are represented each by frequency dependent absorption

coefficients. These shoebox-shaped rooms enable a very efficient calculation of image source positions in comparison to arbitrary room geometries [13]. Nevertheless, for a shoebox room the number of image sources up to reflection order $N$ is of order $O(N^3)$, which considerably affects computational efficiency for higher reflection orders. Another limitation of the ISM is that it inherently assumes only specular reflections, although diffuse reflections are also important for describing room acoustics.

In the ISM implementation, the following signal processing steps are performed for each image source: a “1/distance” attenuation factor and a time delay due to the distance to the receiver; an “effective reflection filter”, being the (frequency domain) product of all wall reflection coefficients that are involved in “creating” the current image source; and an HRIR, according to the azimuth and elevation of the image source relative to the receiver’s head orientation. Finally, the binaural signals of all image sources are added up to one two-channel output.
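The per-source steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes a single broadband reflection coefficient `beta` shared by all six walls (the paper uses frequency dependent coefficients per wall), a sound speed of 343 m/s, and it omits the HRIR filtering. The function names `image_sources` and `ism_taps` are hypothetical.

```python
import itertools
import math


def image_sources(room, src, max_order):
    """List (position, order) of all image sources of a shoebox room.

    room: (Lx, Ly, Lz) in m, src: source position in m. The integer index
    per axis counts the reflections along that axis; order 0 is the source.
    """
    sources = []
    indices = range(-max_order, max_order + 1)
    for idx in itertools.product(indices, repeat=3):
        order = sum(abs(i) for i in idx)
        if order > max_order:
            continue
        # Even index: shifted copy of the source; odd index: mirrored copy.
        pos = tuple(
            i * length + (s if i % 2 == 0 else length - s)
            for i, length, s in zip(idx, room, src)
        )
        sources.append((pos, order))
    return sources


def ism_taps(room, src, rcv, max_order, beta, c=343.0):
    """Return (delay in s, gain) per image source, sorted by delay.

    Assumption: one broadband reflection coefficient beta for all walls,
    so the effective reflection filter reduces to beta ** order.
    """
    taps = []
    for pos, order in image_sources(room, src, max_order):
        dist = math.dist(pos, rcv)
        taps.append((dist / c, beta ** order / dist))  # 1/distance law
    return sorted(taps)
```

With this indexing, a maximum order of 1 gives 7 sources (the original plus six first-order images) and a maximum order of 3 gives 63, which illustrates the cubic growth mentioned above.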

2.2. Extended feedback delay network

The extended FDN used here is based on the general multichannel network as suggested by Jot and Chaigne [11] and consists basically of a set of parallel delay lines whose outputs are fed back via a feedback matrix $\mathbf{A}$.
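The basic recursion of such a network can be sketched as follows. This is only an illustration under simplifying assumptions: delays are integer numbers of samples, each delay line has a single broadband gain instead of a frequency dependent absorption filter, the input is fed equally into all lines, and the output is the plain sum of the line outputs. A Householder matrix is used here as a simple, deterministic example of an orthogonal feedback matrix; it is not the randomly created matrix used in the paper. All function names are hypothetical.

```python
def householder(n):
    """Orthogonal n x n matrix A = I - (2/n) * ones (energy preserving)."""
    return [[(1.0 if i == j else 0.0) - 2.0 / n for j in range(n)]
            for i in range(n)]


def fdn_impulse_response(delays, gains, matrix, length):
    """Impulse response of a basic FDN.

    delays: delay per line in samples, gains: broadband gain per line,
    matrix: feedback matrix (list of rows), length: output length in samples.
    """
    n = len(delays)
    buffers = [[0.0] * d for d in delays]  # circular delay lines
    ptr = [0] * n
    out = []
    for t in range(length):
        x = 1.0 if t == 0 else 0.0  # unit impulse as input
        # Read the delay line outputs and apply the per-line attenuation.
        s = [gains[j] * buffers[j][ptr[j]] for j in range(n)]
        out.append(sum(s))
        # Feed the outputs back through the matrix and add the input.
        for i in range(n):
            feedback = sum(matrix[i][j] * s[j] for j in range(n))
            buffers[i][ptr[i]] = x + feedback
            ptr[i] = (ptr[i] + 1) % delays[i]
    return out
```

For example, `fdn_impulse_response([3, 5], [0.9, 0.9], householder(2), 20)` produces its first pulses at samples 3 and 5, each attenuated by the line gain.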

The number of parallel channels (delay lines) was set to 12, with four channels associated with each (shoebox) room dimension (two channels per wall), which is reflected in several parameter choices. Firstly, the delays $\tau_j$, $j \in \mathbb{N}_{\leq 12}$, were directly related to the room dimensions via the sound propagation speed (plus a random jitter per channel). Secondly, the absorption filters with transfer functions $H^{\mathrm{abs}}_j$ simulate the frequency dependent sound attenuation due to wall reflections and air absorption. Following Jot and Chaigne [11], the frequency dependent reverberation time $T_{60}(f)$ conveyed by the resulting RIR is controlled explicitly by the following frequency responses, if all other processing steps are energy preserving:

$$20 \lg |H^{\mathrm{abs}}_j(f)| = -60\,\tau_j / T_{60}(f). \qquad (1)$$

In the simulation method, the reverberation time is predicted from the wall absorption coefficients via Sabine’s formula. Thirdly, the feedback matrix $\mathbf{A}$ redistributes the outputs back to the input channels. This process is energy preserving if $\mathbf{A}$ is an orthogonal matrix. Here, a randomly created unitary matrix was chosen, providing a high variety of pulse amplitudes.
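The link between wall absorption, reverberation time and the attenuation per delay line in Eq. (1) can be illustrated with a short sketch. It rests on several assumptions that are not stated in the text: the common metric form of Sabine's formula with the constant 0.161 s/m, one frequency band only, air absorption neglected, and delays set to the plain travel time over one room dimension (the paper only states that the delays are related to the room dimensions and jittered). The function names are hypothetical.

```python
def sabine_t60(dims, alphas):
    """Sabine reverberation time in s for a shoebox room (one band).

    dims: (Lx, Ly, Lz) in m; alphas: absorption coefficients of the six
    walls in the order (x0, x1, y0, y1, floor, ceiling).
    """
    lx, ly, lz = dims
    volume = lx * ly * lz
    areas = (ly * lz, ly * lz, lx * lz, lx * lz, lx * ly, lx * ly)
    absorption_area = sum(s * a for s, a in zip(areas, alphas))
    return 0.161 * volume / absorption_area


def delays_from_room(dims, c=343.0):
    """Assumed delay per room dimension in s: travel time over one length."""
    return [length / c for length in dims]


def delay_line_gain(tau, t60):
    """Linear gain |H_abs| from Eq. (1): 20 lg|H| = -60 * tau / T60."""
    return 10.0 ** (-3.0 * tau / t60)
```

A delay line whose delay equals the reverberation time would thus be attenuated by exactly 60 dB per pass, while shorter delay lines receive proportionally less attenuation, so that all lines decay at the same rate.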

Two final processing steps per channel, referred to as “binauralization steps”, extend the FDN to produce spatially distributed and externalized reverberation. (1) Via HRIR filtering, the FDN channels are mapped to 12 points (directions) around the head, with two points positioned on each wall. (2) Reflection filters, identical to those applied to the first order image sources in the ISM, simulate a direction dependent sound intensity of the reverberation, due to the different acoustical wall properties.

Two possible principles are suggested to map the 12 directions around the receiver’s head, which are sketched in Fig. 1 (microphone symbol: receiver, big “⊗”: direct sound source, small “⊗”: reverb source). The first one (lhs of Fig. 1) is called the “cube” condition. Here, the 12 directions are mapped to points on a cube around the receiver’s head. The cube always moves with the receiver (the receiver is always in its centre) and is axis aligned with the room.

Figure 1: Illustration of two possible techniques of spatial reverb

distribution: “cube” (lhs) and “box” (rhs) method. See text for

explanation. (Arrows will be explained in sec. 3.2.)

In this way all 12 incidence directions are more or less equally distributed around the head. In the second, “box” condition the 12 directions are mapped to points on the actual six wall surfaces, as depicted in the rhs part of Fig. 1. Here, the sound incidence directions are warped according to the room dimensions and the actual receiver position. Differences between both methods will clearly be audible for rooms with large differences in dimensions. Unless specified otherwise, the cube condition is used as the standard rendering method in the following.

2.3. Combination of ISM and FDN

For a smooth transition from the early-reflections part (ISM) to the late-reverberation part (FDN), i. e. a straight decay of the BRIR on a dB scale, the energy and initial delay of the FDN input signal have to be suitable. For this purpose the FDN input signal consists of the $N$th-order ISM pulses before HRIR filtering. In order to avoid comb-filter coloration effects, which occur if a fixed temporal pattern of pulses is fed into the FDN, the ISM output is distributed to the FDN channels. Because the number of image sources of order $N$ does in general not equal the number $M$ of FDN channels, the $i$th ISM pulse is fed into the FDN channel $[(i-1) \bmod M] + 1$.
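The cyclic channel assignment can be written down directly. The following sketch uses 1-based pulse and channel numbers as in the formula above; the function names and the representation of a pulse as an arbitrary list element are assumptions made for illustration.

```python
def fdn_channel(i, num_channels=12):
    """Map the i-th ISM pulse (1-based) to an FDN channel (1-based)."""
    return (i - 1) % num_channels + 1


def distribute(pulses, num_channels=12):
    """Group pulses per FDN channel using the cyclic assignment."""
    channels = [[] for _ in range(num_channels)]
    for i, pulse in enumerate(pulses, start=1):
        channels[fdn_channel(i, num_channels) - 1].append(pulse)
    return channels
```

With 12 channels, pulses 1, 13, 25, ... end up in channel 1, pulses 2, 14, 26, ... in channel 2, and so on, so that every channel receives a different temporal pattern.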

2.4. Simplified model for coupled rooms

In addition to the single shoebox-shaped room, a strongly simplified method is suggested to simulate the acoustics of two connected shoebox rooms that are acoustically coupled, e. g. by an open door.

It is assumed that a source S is located in room 1 and a receiver R in room 2, as depicted in Fig. 2. The sound transmission from room 1 to room 2 is then simulated by a single virtual source S′ located in the door, which excites room 2 as in the case of the single shoebox simulation described above. The virtual source radiates the monaural impulse response of room 1 for a source position specified by the coupled-room arrangement and a “monaural” receiver R′ inside the open door. Thus, the effective BRIR is obtained as the convolution of the monaural RIR of room 1 with the BRIR of room 2.

Depending on the source position in room 1, it is either visible

or invisible for a receiver in room 2. If it is not visible, no direct

sound will arrive at the receiver but only reflections and diffractions.

In this case, the direct sound pulse of the RIR of room 1 is discarded

in the current approach.
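The simplified coupling can be sketched as a plain convolution per ear. The code below is an illustration under stated assumptions: impulse responses are lists of samples at a common sampling rate, and the direct sound of the room-1 RIR is taken to be its first non-zero sample, which is a simplification introduced here. The function names are hypothetical.

```python
def convolve(a, b):
    """Direct-form linear convolution of two sample sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def coupled_brir(rir_room1, brir_room2, source_visible=True):
    """Effective BRIR (left, right) for two coupled rooms.

    rir_room1: monaural RIR from the source to the door position,
    brir_room2: (left, right) BRIR from the door to the receiver.
    """
    rir = list(rir_room1)
    if not source_visible:
        # Discard the direct sound pulse (assumed: first non-zero sample).
        first = next((k for k, v in enumerate(rir) if v != 0.0), None)
        if first is not None:
            rir[first] = 0.0
    left, right = brir_room2
    return convolve(rir, left), convolve(rir, right)
```

For responses of realistic length, an FFT-based convolution would be the more efficient choice; the direct form is used here only to keep the sketch self-contained.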

Figure 2: Sketch of two adjacent rooms that are acoustically coupled by an open door. See text for explanation. (The arrow will be explained in sec. 3.3.1.)

3. EVALUATIONS

Three main aspects of the proposed simulation method were evaluated. Firstly, for a set of existing rooms, subjectively rated sound properties of measured and correspondingly synthesized BRIRs were compared. Secondly, the two approaches to realize the binauralization steps of the extended FDN (see sec. 2.2) were evaluated with respect to binaural parameters. Thirdly, the approach to simulate the acoustics of two coupled rooms was evaluated.

To perform these evaluations, a test database containing measured and synthesized BRIRs was created. BRIRs were measured for various rooms of different size and reverberation time, as well as for a few source-receiver configurations in two connected rooms. Additionally, some measured BRIRs were taken from the AIR database [14].

The BRIR measurements were performed using an omnidirectional loudspeaker based on a ring-radiator and an artificial head MK2 by Cortex. Rooms were excited with a logarithmic sweep [15] (50 Hz to 18 kHz), which allows nonlinear harmonic distortions to be removed from the recorded and inverse filtered signal. Each BRIR was calculated as the mean of the BRIRs from 10 single recordings and equalized by the inverse loudspeaker transfer function.

For the BRIR synthesis a single mean wall absorption coeffi-

cient was used. It was determined for each room from its reverbera-

tion time via the inverse form of Sabine’s formula, which ensures

that reverberation times of measured and synthesized BRIRs are in

good accordance.
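As a worked illustration of this inversion, the lecture room of Tab. 1 can be used. The sketch assumes the common metric form of Sabine's formula with the constant 0.161 s/m, which the text does not state explicitly, and writes $V$ for the room volume, $S$ for the total wall surface and $\bar{\alpha}$ for the mean absorption coefficient (symbols introduced here for illustration only):

$$
\bar{\alpha} = \frac{0.161\,\mathrm{s/m} \cdot V}{S \, T_{60}}, \qquad
V = 10.90 \cdot 10.80 \cdot 3.15\,\mathrm{m^3} \approx 370.8\,\mathrm{m^3}, \qquad
S = 2\,(10.90 \cdot 10.80 + 10.90 \cdot 3.15 + 10.80 \cdot 3.15)\,\mathrm{m^2} \approx 372.2\,\mathrm{m^2}
$$

$$
\bar{\alpha} \approx \frac{0.161 \cdot 370.8}{372.2 \cdot 0.8} \approx 0.20
$$

Inserting this value back into Sabine's formula reproduces the measured broadband reverberation time of 0.8 s by construction.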

The HRIRs used in the simulation were from the same artificial head as used for the BRIR recordings. The database [16] offers HRIRs with azimuth angles in 2° steps at elevations near the equatorial level. Towards the poles, the azimuth sampling density decreases. Elevation angles are sampled in 2° steps.

A varying synthesis parameter was the maximum image source

order. The goal was to find a trade-off between accuracy and

computational efficiency.

In the following, the term “room condition” will be used for all different rooms and source-receiver configurations. In contrast, different types of BRIR synthesis, differing in the choice of simulation parameters, will be referred to as “synthesis conditions”. All room and synthesis conditions are introduced in the following.

3.1. Subjective sound properties

3.1.1. Room- and synthesis conditions

BRIRs were chosen from four rooms of different size and reverberation time, specified in Tab. 1. For the synthesized BRIRs, the maximum image source order $N$ was varied in $\{1, 3\}$. For one room the BRIR was synthesized only by the ISM with $N = 20$. In the extended FDN, the cube condition was chosen (see sec. 2.2). Two dry source signals, female speech and a guitar piece (steel strings), were convolved with the recorded and simulated BRIRs. Presentation sound pressure levels ranged from 60 to 65 dB SPL, depending on the source-receiver distance and the room reverberation.

3.1.2. Subjects and procedure

15 normal-hearing subjects (7 female, 8 male) aged 24 to 32 years participated in the experiment. Sounds were presented via headphones in a sound attenuating booth. Since the synthesis method was implemented as an offline simulation, neither head tracking nor an adaptively changing sound field was employed.

The sound properties which were to be rated on a seven-point scale were “naturalness” and “room size”. A test and a retest were performed in two sessions, each with a randomized order of presented sounds. Before the actual experiment, sound examples illustrating extreme differences in the sound properties were presented.

Table 1: Rooms whose BRIRs were used in the subjective evaluation. Reverberation times T60 were obtained from the measured BRIRs (broadband).

Room             Dimensions (m)           T60 (s)
Aula             (12.0, 30.0, 10.0)       4.8
Empty chamber    (1.88, 2.74, 2.82)       2.5
Lecture room     (10.90, 10.80, 3.15)     0.8
Laboratory       (4.97, 4.12, 3.00)       0.3

3.1.3. Results

Fig. 3 shows the results of the subjective sound property ratings as mean values over all subjects and source signals. Each panel shows results for one sound property. The average ratings are plotted for all synthesis conditions against rooms. Error bars indicate inter-subject standard errors.

For naturalness (left panel), ratings differ strongly between rooms. Whereas the BRIRs of the laboratory were rated to sound most natural, the lowest naturalness was perceived for the empty chamber. BRIRs of the aula and lecture room were rated to have a medium to high naturalness. Between synthesis conditions, almost no differences are visible, and for most rooms, differences between synthesized and measured BRIRs are very small. Moreover, for some conditions the synthesized BRIRs were even rated to sound slightly more natural than the measured ones. This shows that the proposed simulation method is able to synthesize BRIRs that sound as natural as measured ones. Remaining differences in perceived naturalness between rooms might be due to the subjects’ familiarity with these acoustic environments in daily life, since the laboratory sounds as dry as an ordinary living room, whereas the empty chamber sounds rather unusual, even in the measured BRIR. This might also be due to the unusual combination of its very small room size and its long reverberation time (see Tab. 1).

For room size (right panel), again clear differences between rooms are perceived. The order is well in accordance with the reverberation times and, except for the empty chamber, with the actual room sizes (see Tab. 1). Differences between synthesis conditions and between syntheses and measurements are practically nonexistent.

Figure 3: Subjective sound property ratings of measured and synthesized BRIRs for four rooms (aula, empty chamber, lecture room, laboratory), averaged over all subjects and source signals (speech and music). Left panel: naturalness (low to high); right panel: room size (small to large). Conditions: N = 1, N = 3, ISM, and measurement. Error bars indicate inter-subject standard errors.

This shows that the simulation method is able, firstly, to represent different room sizes and, secondly, to achieve this independently of the maximum image source order as far as tested.

As a consequence of the experimental design, no direct mapping of synthesized BRIRs to actual rooms was performed. Given that no head tracking was employed, a potential effect of head rotations on the subjective ratings could not be assessed so far. Future research will apply the system in a real-time environment and will address this issue.

3.2. Evaluation of spatial properties of the extended FDN

3.2.1. Room- and synthesis conditions

Two rooms, each with different configurations of source and receiver positions as well as wall properties, were used to evaluate the spatial reverb rendering. Fig. 1 depicts schematically the conditions for one room. Tab. 2 specifies the room dimensions and the absorption coefficients for (250, 500, 1k, 2k, 4k) Hz. While room 1 has an almost square base area, room 2 represents a long corridor.

Side wall absorption coefficients were specified in two different ways: In the “closed” condition, all side wall absorption coefficients were equal, as given in line 3 of Tab. 2. In this way, spatial sound properties in azimuth can be investigated as a function of room geometry in connection with the positions of the receiver and all virtual “reverb” sources. In the “open” condition, the left side was completely open, meaning that no wall existed. This was technically represented by a broadband absorption coefficient of 0.99, whereas the absorption of all other side walls did not differ from that of the closed condition. This was intended to evaluate the spatial rendering in a challenging condition for the model.

Also, in both rooms and for both the closed and the open condition, the distance $d$ of source and receiver to the left wall was chosen to be 0.3 m or 5 m (see Fig. 1). The source was always in the front direction of the receiver, yielding a direct sound with no interaural differences. All differences are thus due to reflections and reverberation.

Since no suitable real rooms were found for BRIR measurements, the ISM with reflection order 20 was used as reference. For all room conditions, the cube and box conditions were compared against each other and the reference. Interaural level differences (ILDs) and interaural cross correlation coefficients (IACCs) were determined from the BRIR up to the time $\min\{T_{60}(f)\}$, where $T_{60}(f)$ is the frequency dependent reverberation time. Positive ILDs indicate higher signal energy on the right.
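The paper does not spell out how ILD and IACC were computed, so the following sketch uses common broadband definitions as an assumption: the ILD as the energy ratio of the right to the left channel in dB (positive values mean more energy on the right, as stated above), and the IACC as the maximum of the normalized interaural cross-correlation within lags of ±1 ms. The function names and the truncation helper are hypothetical.

```python
import math


def truncate(signal, fs, t_end):
    """Keep the part of a signal up to t_end seconds (fs in Hz)."""
    return signal[: int(round(t_end * fs))]


def ild_db(left, right):
    """Broadband ILD in dB; positive means more energy on the right."""
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    return 10.0 * math.log10(e_right / e_left)


def iacc(left, right, fs, max_lag_ms=1.0):
    """Maximum normalized cross-correlation within +/- max_lag_ms.

    Assumes both channels have the same length.
    """
    n = len(left)
    max_lag = int(round(fs * max_lag_ms / 1000.0))
    norm = math.sqrt(sum(x * x for x in left) * sum(x * x for x in right))
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        start, stop = max(0, -lag), min(n, n - lag)
        corr = sum(left[t] * right[t + lag] for t in range(start, stop))
        best = max(best, abs(corr) / norm)
    return best
```

Both channels of a BRIR would first be shortened with `truncate` to the minimum reverberation time over frequency before the two measures are evaluated.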

Table 2: Specification of two virtual rooms used for evaluation of the binauralization steps in the extended FDN. See text for further explanation.

Dimensions room 1        (10.9, 10.8, 3.15) m
Dimensions room 2        (3.9, 30.0, 3.15) m
Absorption side walls    (0.05, 0.10, 0.13, 0.16, 0.22)
Absorption open side     0.99
Absorption floor         (0.03, 0.03, 0.03, 0.03, 0.02)
Absorption ceiling       (0.70, 0.60, 0.70, 0.70, 0.50)

Table 3: Results of the evaluation of the FDN binauralization steps: Comparison of cube and box method with purely ISM-created BRIRs in terms of ILDs and IACCs. See text for explanation of conditions.

ILD (dB)                    ISM     cube    box
R 1, closed   d = 0.3 m    −0.3    −0.7     1.3
              d = 5.0 m     0.3     0.7     0.8
R 1, open     d = 0.3 m     0.9     0.9     1.8
              d = 5.0 m     1.2     1.1     1.2
R 2, closed   d = 0.3 m    −0.6    −0.2     3.1
              d = 5.0 m    −0.6     0.7     3.7
R 2, open     d = 0.3 m     0.5     1.0     3.2
              d = 5.0 m     0.5     1.0     3.0

IACC                        ISM     cube    box
                            0.5     0.5     0.5
                            0.7     0.7     0.7
                            0.9     0.8     0.8
                            0.8     0.8     0.8
                            0.6     0.5     0.4
                            0.7     0.6     0.5
                            0.9     0.8     0.6

3.2.2. Results

The results for all room and synthesis conditions are shown in Tab. 3. Comparing results for the cube and box condition, values that are closer to those of the ISM reference, with a difference of at least 0.1 dB (ILD) or 0.1 (IACC) to the less matching condition, are printed in bold face for clarity. To interpret the results, just-noticeable differences (JNDs) of the ILD have to be considered, as determined e. g. in [17] for musical instruments in several reverberant conditions: 1.0–1.4 dB for $T_{60} = 1.3$ s; 0.8–1.2 dB for $T_{60} = 0.8$ s; 0.4–0.8 dB for the anechoic condition.

For room 1 (“R 1”, closed), overall small absolute ILD values in the range of the JNDs are observed. It has to be kept in mind that these ILDs originate from reflections only, given that the direct sound was always located in the front direction. For the box condition, ILDs are larger and differ considerably more from the reference. Clear mismatches to the reference ILD are obtained in the close-to-wall position ($d = 0.3$ m). Overall small or vanishing ILDs are plausible for the closed conditions since all side walls are equal in absorption coefficient. For the closed room 2 (“R 2”), both the ISM and the cube condition again show small absolute ILD values. However, ILDs for the box condition differ clearly from those of the cube and reference conditions. This is not surprising because the majority of virtual reverb sources lie clearly to the right hand side of the receiver (see also the scheme in Fig. 1). For the open versions of the rooms, ILDs obtained from ISM-created BRIRs and the cube condition have very similar values. The largest differences are again obtained for the box condition in room 2.

The IACC results reveal overall no distinct differences between

the synthesis conditions for room 1. For room 2, where maximum

differences are 0.3, the cube condition yields IACCs that are closer

to those created by the ISM.

In conclusion, the cube condition mostly creates spatially more realistic BRIRs in terms of ILD and IACC than the box condition, especially when the room geometry and receiver position are challenging. In addition, an informal subjective listening test yielded the highest perceptual similarity between the cube and the reference condition.

3.3. Evaluation of simulation of coupled rooms

3.3.1. Room- and synthesis conditions

The two adjacent rooms, an office and a corridor, acoustically cou-

pled by an open door, are specified in Tab. 4 in terms of dimensions

and absorption coefficients for (250, 500, 1k, 2k, 4k) Hz. The ar-

rangement of both rooms and positions of source S and receiver R

are depicted in Fig. 2. Two source positions were investigated. In

the “visible” condition the source is placed at the left end of the

double arrow, and in the “invisible” condition it is placed at the

right end. Measured BRIRs for the two real rooms from which the data in Tab. 4 were obtained served as reference. The ISM condition and the proposed hybrid method with $N = 3$ and $N = 1$ were evaluated. In both hybrid conditions, the “cube” synthesis was used.

Besides a comparison of the BRIRs in the time domain, ILDs

and IACCs were determined as described in sec. 3.2.1 and com-

pared with the reference.

Table 4: Specification of rooms used in the evaluation of the simulation of coupled rooms.

Room 1 (corridor)
  Dimensions           (30.0, 1.94, 2.50) m
  Absorption coeff.    (0.16, 0.16, 0.13, 0.15, 0.17)
Room 2 (office)
  Dimensions           (4.43, 4.50, 3.00) m
  Absorption coeff.    (0.25, 0.30, 0.35, 0.32, 0.28)

3.3.2. Results

Fig. 4 shows normalized BRIR time signals for the measured (upper panels) and synthesized ($N = 3$, lower panels) case on an arbitrarily scaled ordinate. As expected for coupled rooms, the measured BRIR in the invisible condition (rhs) shows a rising amplitude in the beginning. This effect can hardly be observed in the simulated BRIR. In the simple approach used here, only one convolution of two single RIRs was used, which cannot mimic real coupling of the rooms.

Tab. 5 shows ILDs and IACCs obtained from the BRIRs of all conditions. For all of them a clear dominance of sound energy on the left is obtained (negative ILDs), which is primarily due to the direction of the (virtual) direct sound (source S′ in Fig. 2). The ISM-created BRIRs, which can be assumed to simulate the real rooms best, indeed have ILDs that are closest to those of the measured BRIRs. The ILDs of the hybrid method BRIRs differ by up to 3.4 dB from the measurement condition, which is clearly above the JND in reverberant conditions, at least for frontal source positions [17].

For the IACC, all room and synthesis conditions yield very small values. A slightly higher accordance with the measurement is obtained for the ISM synthesis, but it is questionable whether these differences are audible.

In conclusion, the evaluation showed that this simple approach has limitations if the acoustics of coupled rooms are to be simulated in a convincing way. Improvements should consider removal of the direct path between the virtual source and the receiver in the invisible condition. In a second step, diffraction of the direct sound can be taken into account by including lowpass-filtered versions of the direct path, depending on the geometric relation of the door opening and the source and receiver positions.

Figure 4: BRIR time signals on arbitrary amplitude scale (abscissa: time, 0 to 50 ms). Lhs: visible condition, rhs: invisible condition. Upper panels: measured BRIRs, lower panels: synthesis with N = 3.

Table 5: Results of the evaluation of the coupled-rooms simulation. Comparison of ILDs and IACCs. See text for explanation of conditions.

                        meas.   ISM     N = 3   N = 1
ILD (dB)   visible      −6.0    −6.6    −9.4    −8.3
           invisible    −4.5    −6.3    −7.8    −7.8
IACC       visible       0.1     0.1     0.2     0.2
           invisible     0.1     0.1     0.2     0.2

4. SUMMARY AND CONCLUSIONS

A hybrid approach for synthesizing binaural room impulse responses of shoebox-shaped rooms was presented. It computes geometrically exact early reflections using the image source model up to a low reflection order, and approximates the reverberant tail by a highly efficient feedback delay network. The FDN was extended to enable spatial reverb rendering, taking into account room geometry as well as wall absorption and source and receiver positions.

The proposed simulation method was evaluated with respect to

different properties using subjective and technical measures. In a

subjective evaluation subjects rated the naturalness and room size

of measured and respectively synthesized BRIRs. The ratings show

that the simulation method is able to represent perceived naturalness

and room size very well and independently from maximum image

source order, whereas differences in these properties between rooms

are clearly conveyed.

For the extended FDN, two spatial reverb rendering techniques

(sec. 2.2) were compared in a technical evaluation assessing inter-

aural level differences and interaural cross correlation coefficients.

It was shown that synthesized spatial reverberation has better accor-

dance with purely ISM-created reference BRIRs if the reverberation

emitting virtual sound sources are equally distributed around the

listener’s head. In comparison, positioning these sources on the

actual room wall surfaces yielded worse results (sec. 3.2.2).

A first, simple approach to simulate the acoustics of two ad-

jacent coupled rooms was evaluated by comparing time signal

representations, ILDs and IACCs for measured and synthesized

BRIRs. While the results for this approach were not fully convincing, future improvements with refined approximations can be expected.

In conclusion, the evaluation showed that the suggested computationally efficient approach for synthesizing binaural room impulse responses is suited for applications where perceptual plausibility and authenticity are sufficient.

5. ACKNOWLEDGEMENTS

This work was supported by the DFG FOR 1732 and the Cluster of

Excellence EXC 1077/1 “Hearing4all”.

6. REFERENCES

[1] J. B. Allen and D. A. Berkley, “Image method for efficiently simulating small-room acoustics,” J. Acoust. Soc. Am., vol. 66, no. 4, pp. 943–950, 1979.

[2] B.-I. Dalenbäck, “Engineering principles and techniques in room acoustics prediction,” in BNAM, Bergen, Norway, May 2010.

[3] G. M. Naylor and J. H. Rindel, “Predicting Room Acoustical Behaviour with the ODEON Computer Model,” in 124th ASA meeting, New Orleans, November 1992.

[4] B.-I. Dalenbäck and M. Strömberg, “Real Time Walkthrough Auralization – The First Year,” Tech. Rep., CATT (Dalenbäck), Valeo Graphics (Strömberg), 2010.

[5] D. Schröder, F. Wefers, S. Pelzer, D. S. Rausch, M. Vorländer, and T. Kuhlen, “Virtual Reality System at RWTH Aachen University,” in Proceedings ICA 2010, 20th International Congress on Acoustics: 23–27 August 2010, Sydney, New South Wales, Australia, 2010.

[6] A. Silzle, P. Novo, and H. Strauss, “IKA-SIM: A system to generate auditory virtual environments,” in Audio Engineering Society Convention 116, 2004.

[7] A. Krokstad, S. Strøm, and S. Sørsdal, “Calculating the acoustical room impulse response by the use of a ray tracing technique,” J. Sound Vib., vol. 8, no. 1, pp. 118–125, 1968.

[8] S. M. Schimmel, M. F. Müller, and N. Dillier, “A fast and accurate »shoebox« room acoustics simulator,” Tech. Rep., 2009.

[9] M. R. Schroeder, “Natural Sounding Artificial Reverberation,” Journal of the Audio Engineering Society, vol. 10, no. 3, pp. 219–223, 1962.

[10] J. Stautner and M. Puckette, “Designing Multi-Channel Reverberators,” Computer Music Journal, vol. 6, no. 1, pp. 52–65, 1982.

[11] J.-M. Jot and A. Chaigne, “Digital delay networks for designing artificial reverberators,” in 90th AES Convention, 1991.

[12] T. Wendt, S. van de Par, and S. D. Ewert, “A computational efficient and perceptually plausible algorithm for binaural room impulse response simulation,” subm. to Journal of the Audio Engineering Society.

[13] J. Borish, “Extension of the image model to arbitrary polyhedra,” J. Acoust. Soc. Am., vol. 75, no. 6, pp. 1827–1836, 1984.

[14] M. Jeub, M. Schäfer, and P. Vary, “A binaural room impulse response database for the evaluation of dereverberation algorithms,” Tech. Rep., 2009.

[15] A. Farina, “Simultaneous measurement of impulse response and distortion with a swept-sine technique,” in Audio Engineering Society Convention 108, Feb. 2000.

[16] G. Geißler and S. van de Par, “Messung von HRTF am Kunstkopf MK 2 von Cortex,” AG Akustik, Carl-von-Ossietzky-Universität Oldenburg, 2012.

[17] S. Klockgether and S. van de Par, “Just Noticable Differences of Spatial Perception in Directly Manipulated Binaural Room Impulse Responses,” in AIA/DAGA 2013, Merano, Italy, 2013.