Document [original]

Proc. of the EAA Joint Symposium on Auralization and Ambisonics, Berlin, Germany, 3-5 April 2014

COMPARISON OF A 2D- AND 3D-BASED GRAPHICAL USER INTERFACE FOR

LOCALIZATION LISTENING TESTS

Michael Schoeffler, Susanne Westphal, Alexander Adami, Harald Bayerlein, Jürgen Herre

International Audio Laboratories Erlangen

A Joint Institution of Fraunhofer IIS and University of Erlangen-Nürnberg

Erlangen, Germany

[email protected]

ABSTRACT

Recently, there is a trend in developing new multi-channel for-

mats towards adding additional loudspeakers in elevated positions.

While the common 5.1 surround sound system only has loudspeak-

ers in the horizontal plane, more complex systems, such as 10.2 or

22.2, include two or more elevated loudspeakers.

When listening to music using a multi-channel playback system,

the audio material has often not been produced for the used system,

e.g. listening to 10.2 material while using a 5.1 surround system. In

such cases, the audio material has to be down- or upmixed. Com-

pared with listening to the original audio material, down- or up-

mixing affects the listening experience. The localization of sound

sources is one attribute that might be affected by down- or upmix-

ing the audio material.

In the past, some localization listening tests were conducted by us-

ing an user interface depicting a two-dimensional representation of

the scene. When it comes to elevated loudspeakers, a third dimen-

sion also has to be depicted by the user interface. In this work, an

experiment was conducted where participants had to locate sound

sources by using two different graphical user interfaces (GUIs).

The first GUI consisted of two static images of the scene: a top-

view and a front-view. The other GUI had a fully adjustable 3D

visualization of the scene. The main purpose of the experiment is

to investigate the differences between both GUIs. This includes

the time participants spend on each GUI and the difference in the

responses. This work is a contribution to the development of new

evaluation methods for new and existing multi-channel audio for-

mats and renderers.

1. INTRODUCTION

A number of localization experiments were conducted to find out

more about the human ability to localize sound sources. In ex-

periments, reporting the perceived location of sound sources by

pointing (with or without the extension of a body part) has been

found to be the most accurate method [1]. Due to high accuracy,

pointing methods were widely used in recent experiments (e.g.

[2][3][4]). However, one drawback of pointing methods is that

they can only be applied when localizing sound sources in the lis-

tener’s field of view. As a consequence, pointing methods can not

be used when the listener is not allowed to move his head. Such

a condition has to be kept when localizing sound sources in the

back. One application example of localizing sound sources in the

back is the evaluation of down-mixers. A down-mixer is needed

when one multi-channel format has to be converted into another

multi-channel format with fewer channels. Especially considering

down-mixes where the input multi-channel format contains ele-

vated loudspeakers and the output format does not, a method is

needed which supports reporting the localization of sound sources

in all three dimensions. Such a method becomes even more impor-

tant if the distance of a sound source has to be evaluated, too. Since

pointing methods do not match the afore-mentioned requirements,

two graphical user interfaces which enable the listener to indicate

sources at any position are compared in this paper.

Our main research questions are: how accurate are the two

types of GUIs, how much time is needed for reporting the localized

stimuli and which variables influence the accuracy?

2. RELATED WORK

Graphical user interfaces were used in many localization tests be-

fore. Wenzel investigated the effect of increasing system latency

on localization of virtual sounds by using a graphical response

method [5]. The listener’s head was displayed from a top view on

the left-hand side of the GUI, while a front view was displayed on

the right-hand side. Almost the same GUI was used in an experi-

ment conducted by Begault et al [6]. The GUI of these experiments

depicted only the listener’s head and did not support reporting the

distance.

Pernaux et al. tested three different reporting methods [7]. The

first one used a 2D visual feedback of the listener’s head. The sec-

ond GUI displayed the listener’s head in a three-dimensional view.

The third one was similar to the second one apart from using a 3D

finger pointing input instead of a computer mouse. They observed

significant differences between these three reporting methods. The

2D-based and 3D-based GUIs of their experiments did not support

reporting the distance of a sound source.

Martin et al. utilized a GUI displaying a top view of the scene

to investigate the localization using a five-channel surround sound

reproduction system [8]. The elevation and distance of a sound

source was not investigated in their experiment.

Choisel and Zimmer developed a new pointing method for lo-

calizing frontal sources and compared it with a graphical response

method [9]. In their experiment, only azimuth angles of sound

sources in front of the participants were tested.

Yoo et al. evaluated the localization of sound sources on the

horizontal plane for a wave field synthesis system [10]. Listeners

reported the sound source positions using an answer sheet which

contained a scheme of the scene. Two different listening positions

were examined including front and back sound sources. Listen-

ers were found out to have an average localization error of 6.1◦

azimuth and 9.18◦elevation, while the average distance error was

0.5 m and 0.6 m in the horizontal and vertical plane, respectively.

107

Proc. of the EAA Joint Symposium on Auralization and Ambisonics, Berlin, Germany, 3-5 April 2014

In contrast to many localization tests conducted before, our

GUI allows to report the location of a sound source in all three

dimensions. By comparing two types of GUIs, the effect size of

the GUIs can be measured. Except for the experiment conducted

by Pernaux et al., we found no studies which investigated the dif-

ference between GUIs for the same experiment setup. Our experi-

ment covers reporting of front, side and back sources with listeners

being allowed to move their heads only slightly.

3. METHOD

3.1. Stimuli

Three different signals were used for generating the stimuli. The

first signal was pink noise which was faded in and out over 500 ms.

The pink noise signal was only played back during the training

phase. The second signal was a sine wave with a frequency of

220 Hz and also faded in and out over 500 ms. The steady sine sig-

nal represented a sound source which is hard to localize according

to Hartmann [11]. The third signal was a castanet recording. In

contrast to the sine signal, the castanets recording represented a

narrow sound source which is easier to localize due to its transient

structure. All signals had a length of 7.8 s and were adjusted to

have equal loudness by two expert listeners. The loudness was

adjusted using the final experiment setup.

Six loudspeakers were used to reproduce the sine signal and

castanets recording. Furthermore, one additional loudspeaker was

exclusively used for the training. The loudspeakers, with positions

according to Table 1, were placed in 1.5 m distance relative to the

participant. We categorized the loudspeaker positions dependent

on their azimuth angles to front,side and back. The positions were

selected based on previous research about localization and taking

into account sound sources in front can be localized more accu-

rately than sources behind the listener[12]. The positions of an

established multi-channel system were not used since participants

familiar with surround sound might have been biased.

No. Azimuth Height Category

Training 30 ◦-1 cm -

110 ◦-1 cm front

2−55 ◦-1 cm side

3120 ◦-1 cm back

4−10 ◦33 cm front

555 ◦33 cm side

6−120 ◦33 cm back

Table 1: Loudspeaker positions are relative to the listener’s head

(height = 120 cm). The height of the loudspeakers was measured

from the loudspeakers’ center.

Summarized, two different signals were played back from six

different loudspeaker positions which results in a total number of

twelve stimuli. For the training, a dedicated signal and loudspeaker

position was used.

3.2. Participants

Thirty participants including twenty audio professionals took part

in the experiment. Most of the participants were employees or

students of the International Audio Laboratories Erlangen. Details

about the participants are given in Table 2.

Participants 30

Audio professionals 20

Familiar with surround sound 5

Familiar with listening tests 25

Age groups [years]: [0 −19] 1

[20 −29] 19

[30 −39] 5

[40 −59] 5

Table 2: Detailed information about the participants.

3.3. Materials and Apparatus

3.3.1. Setup

The experiment took place in a soundproof listening room with

room measurements (H x W x D) 256 x 452 x 455 cm. In the mid-

dle of the room, a chair and a table were placed for the partici-

pants. A 24” widescreen LCD monitor mounted on a small table

was placed in front of the chair and table.

The loudspeakers were of type Focal CMS40 with measure-

ments (H x W x D) 23.8 x 15.6 x 15.5 cm. A black-colored 360 ◦

masking curtain made of deco-molton was installed to veil the

loudspeakers. The curtain was fixed to an aluminum ring with

a diameter of 2 m which was attached to three truss stands at a

height of 212 cm. The lighting in the room was adjusted such that

participants could not spot the loudspeakers beyond the curtain.

The masking curtain attenuated frequencies above 300 Hz by con-

stantly 2-3 dB.

A face-tracking system was installed to prevent participants

from moving their head while locating the stimuli. When par-

ticipants nodded or turned their head more than 25 ◦, a warning

message popped up and the stimulus stopped playing.

A picture of the experiment setup is shown in Figure 1.

Figure 1: A picture from the setup. The masking curtain was

closed during the experiment.

3.3.2. 2D-based GUI

The 2D-based GUI had two orthographic views of the same vir-

tual scene which was a representation of the room the participants

were sitting in. On the left-hand side, a top view of the scene was

shown whereas a front view was presented on the right-hand side.

The virtual scene contained the participant’s head, a monitor, the

masking curtain and a red sphere. Participants could adjust the

position and size of the red sphere and thus indicate where they

108

Proc. of the EAA Joint Symposium on Auralization and Ambisonics, Berlin, Germany, 3-5 April 2014

localized the stimulus. The scene including all modeled objects

was true to scale. A screenshot of the 2D-based GUI is depicted in

Figure 2.

Figure 2: Screenshot of the 2D-based GUI.

3.3.3. 3D-based GUI

The 3D-based GUI was almost similar to the 2D-based GUI ex-

cept for solely a single perspective view was displayed instead of

two orthographic views. The virtual camera of this view was con-

trollable by the participants to select the preferred camera views.

Figure 3 shows a screenshot of the 3D-based GUI.

Figure 3: Screenshot of the 3D-based GUI.

3.3.4. Input Controller

For reporting the position of the red sphere, a custom-made input

controller was developed. This controller offered three different

types of inputs:

The camera could be moved along a sphere using an analog

joystick and was always directed towards the participant’s virtual

representation. Zooming in and out could be done by pressing the

according button next to the camera controlling joystick. Camera

controls were only active while using the 3D-based GUI.

The red sphere could be moved on the horizontal plane by

using a digital joystick. Each step corresponded to an accuracy of

5 cm. For moving the red sphere up- or downwards, two additional

buttons were located next to the digital joystick. By pressing a

button once, the red sphere moved 5 cm. The size of the sphere

could be adjusted by another two buttons. The sphere controls

were active while using the 2D-based as well as the 3D-based GUI.

Furthermore, two buttons for playing back the stimulus (play

button) and completing the response (next button) were located be-

low the camera and sphere controls. When a stimulus was already

playing, pressing the play button had no effect.

Developing an own custom input controller allowed us to de-

sign an individual arrangement of buttons which is easy to under-

stand. E.g. if a keyboard had been used, participants might have

spent more time on learning the relevant keys and their function.

In Figure 4, a picture of the input controller is shown.

Figure 4: The input controller.

3.4. Procedure

The experiment had a subject-within design. All participants had

to localize all twelve stimuli using both GUIs. In total, each par-

ticipant gave twenty-four responses.

All participants were blindfolded and guided by an experi-

menter to the chair in the middle of the room. Blindfolding the

participants assured that they can not spot loudspeakers while en-

tering the room. After removing the blindfold, participants were

instructed to always keep their heads straight towards the moni-

tor during the experiment. Furthermore, they were informed that

their faces would be tracked by a face-tracking system to verify

that they would be looking towards the monitor. The experimenter

then left the room and all subsequent instructions were displayed

by the experiment software.

At the beginning of the test, participants had to fill out a ques-

tionnaire. They were asked whether they regularly listen to sur-

round sound, whether they are an audio professional, if they are

familiar with listening tests and to which age group they belong.

The questionnaire was followed by some general instructions: The

109

Proc. of the EAA Joint Symposium on Auralization and Ambisonics, Berlin, Germany, 3-5 April 2014

participants were again reminded that they had to localize stimuli

and are not allowed to turn their head. The general instructions

announced that a 2D- and a 3D-based GUI would be used for re-

porting the location of the stimuli.

It was randomly chosen which GUI was initially presented to

the participants. Before the participants could report the stimuli lo-

cations, they had to read the detailed instructions. These contained

a brief description of the GUI, how the input controller worked and

that they are asked to localize the stimuli. For the 3D-based GUI,

additional information about moving the camera was included in

the detailed instructions. After reading the instructions, the par-

ticipants had to undertake training in which they were asked to

place the red sphere at the position where they localize the stimu-

lus. Additionally, they were asked to indicate the broadness of the

stimulus by changing the size of the red sphere. The tutorial could

only be finished if every control element was used at least once

(play button, position and size of the red sphere). In the tutorial

for the 3D-based GUI, participants also had to move the camera.

Afterwards, participants had to localize twelve stimuli using the

present GUI. When they finished reporting all stimuli locations,

the same procedure was applied for the second GUI.

At the end of the experiment, the participants had to fill out a

questionnaire about how they got along with each GUI.

4. RESULTS

Completing the experiment took 19 minutes (SD1= 6) on aver-

age. To analyze the accuracy of both GUIs, we define the localiza-

tion error as the euclidean distance between the reported position

rand the actual loudspeaker position l:

LocError =||r−l||2,(1)

where bold-faced letters represent vectors with v= [vx, vy, vz]T.

The mean of LocError was 82 cm (SD = 68 cm) for all stim-

uli. The 2D-based GUI had a LocError mean of 83 cm (SD =

68 cm) for all stimuli. The 3D-based GUI had a mean LocError

of 82 cm (SD = 68 cm). The effect of the GUI on LocError

was not significant at the p < .05 level [F(1,717) = 0.094,

p=.759]. Levene’s test indicated equal variances for LocError

(F= 0.20, p =.655). In Table 3, detailed information about

LocError is given.

2D 3D both

M SD M SD M SD

front 55 52 60 53 58 52

side 66 47 71 65 68 57

back 129 76 114 71 121 74

all 83 68 82 68 82 68

Table 3: Average Localization errors in cm for both stimuli. The

table is segmented by the GUI type and the loudspeaker category.

The distance of a stimulus is dependent, among other factors,

on its loudness [13]. As the loudness was only subjectively ad-

justed by two expert listeners, a normalized localization error is

calculated. The normalized localization error excludes the depth

distance of the distance between reported position and loudspeaker

1M= mean, SD = standard deviation.

position. The normalized reported position is defined as:

rNorm =r

|r||l|.(2)

The normalized reported position has the same direct distance to

the listener as the loudspeaker and is used for calculating the nor-

malized localization error:

LocErrorNorm =||rNorm −l||2.(3)

The mean of the normalized localization error was 70 cm (SD =

62). The 3D-based GUI (M= 67 cm, SD = 59) turned out to

be more accurate than the 2D-based GUI (M= 73 cm, SD =

65). According to a repeated measures ANOVA, the difference

between the two GUIs was not statistically significant at the p <

.05 level [F(1,717) = 2.295, p =.13]. Levene’s test indicated

equal variances for LocErrorNorm (F= 1.710, p =.192).

In Table 4, all values for the normalized localization error can be

found.

2D 3D both

M SD M SD M SD

front 46 55 40 49 43 52

side 57 42 57 42 57 42

back 119 69 104 66 111 68

all 74 65 67 59 70 62

Table 4: Normalized localization errors in cm for both stimuli. The

table is segmented by the GUI type and the loudspeaker category.

A linear regression model with mixed effects was calculated

to analyze the influences on the normalized localization error in

more detail. The participants were defined as random factor (par-

ticipants_id). The fixed factors were the type of GUI (GUI), the

loudspeaker category (category), the signal (signal), the total time

spent on the GUI type (time_GUI), the time spent on the train-

ing for the GUI type (time_training) and if the response was given

when the GUI was chosen last. (GUI_last). The results of the

fitted model are described in Table 5.

Coefficient Value Std. Error t-value p-value

Fixed Effects:

(Intercept) 34.25 7.9 4.34 .000

GUI = 3D -3.43 4.58 -0.75 .455

category = side 13.49 4.6 2.92 .004

category = back 68.05 4.6 14.71 .000

signal = sine 36.32 3.78 9.61 .000

time_GUI -1.61 1.00 -1.62 .105

time_training 4.34 2.57 1.69 .091

GUI_last = true -5.87 4.29 -1.37 .172

Random Effects:

participants_id

(Intercept) Residual

StdDev: 8.52 50.69

Table 5: Linear regression model with mixed effects for the nor-

malized localization error. The table is segmented by the GUI type

and the loudspeaker category.

The normalized localization error can also be expressed as the

110

Proc. of the EAA Joint Symposium on Auralization and Ambisonics, Berlin, Germany, 3-5 April 2014

elevation and azimuth error of the normalized reported position

(EleErrorNorm and AziErrorNorm). The mean of EleErrorNorm

was 10 ◦(SD = 34). As indicated by the linear regression model

of the normalized localization error, the signal type had an influ-

ence on the elevation and azimuth errors. The average normal-

ized elevation error of the castanets recording was 9◦(SD = 7).

When sine wave was played back the average normalized ele-

vation error increased (M= 11 ◦, SD = 8). If the normal-

ized elevation errors are analyzed for each loudspeaker category,

the castanets recording resulted in lower errors for each category

(front: M= 8 ◦, SD = 6; side: M= 9 ◦, SD = 6; back:

M= 10 ◦, SD = 8). The sine wave resulted in much higher

average normalized elevation errors (front: M= 11 ◦, SD = 9;

side: M= 11 ◦, SD = 7; back: M= 10 ◦, SD = 7). All nor-

malized elevation errors for both stimuli are described in Table 6.

2D 3D both

M SD M SD M SD

front 10 8 9 7 10 8

side 10 7 9 7 10 7

back 11 7 9 8 10 8

all 11 8 9 7 10 7

Table 6: Elevation errors of the normalized reported position in de-

grees. The table is segmented by the GUI type and the loudspeaker

category.

The mean of AziErrorNorm was 24 ◦(SD = 30). As ex-

pected, the average normalized azimuth errors widely differed for

each signal type. The average normalized azimuth error of the

castanets recording was 16 ◦(SD = 21). The average normal-

ized azimuth error increased when the sine wave was played back

(M= 32 ◦, SD = 34). The castanets recording resulted in

smaller errors for all categories (front: M= 5 ◦, SD = 4; side:

M= 13 ◦, SD = 13; back: M= 30 ◦, SD = 29). As expected,

the sine wave resulted in larger average normalized azimuth er-

rors (front: M= 21 ◦, SD = 38; side: M= 22 ◦, SD = 19;

back: M= 54 ◦, SD = 31). All normalized azimuth errors are

described in Table 7 for both stimuli.

2D 3D both

M SD M SD M SD

front 14 29 12 27 13 28

side 18 17 18 17 18 17

back 46 33 39 31 42 32

all 26 31 23 28 24 30

Table 7: Azimuth errors of the normalized reported position in de-

grees. The table is segmented by the GUI type and the loudspeaker

category.

The differences of localization errors between the sine wave

and the castanets recording are confirmed by the reported broad-

ness of the two stimuli. The average radius of the red sphere was

23 cm(SD = 7) when the castanets recording was played back.

For the sine wave stimulus, the average reported radius of the

sphere was M= 45 cm (SD = 29).

Reporting twelve stimuli for the 2D-based GUI took the par-

ticipants in average 5 minutes (SD = 2). For the 3D-based GUI

Very bad Bad Neutral Good Very good

Response

Frequency

2D-based GUI

3D-based GUI

Figure 5: Frequencies of participants’ responses about they got

along with each GUI.

8 minutes (SD = 3). According to a repeated measures ANOVA,

the difference of the time participants spent on each GUI was sta-

tistically significant at the p < .05 level [F(1,57) = 17.99,

p=.000]. Levene’s test indicated equal variances for LocError

(F = 4.01, p = .04997).

At the end of the experiment, the participants were asked how

they got along with each GUI. The possible answers were: "Very

bad" (=1), Bad (=2), "Neutral" (=3), "Good" (=4) and "Very good"

(=5). In Figure 5, the frequency of the answers are shown. The

mode of the 2D-based GUI was Very Good and for the 3D-based

it was Good. A cumulative link model (Table 8 supported the in-

formation provided by the listeners that they got along better with

the 2D-based GUI. The 3D-based GUI type (GUI) had a signifi-

cant negative effect on the participants’ answers (F=−1.34, p =

.008). If a participant used a type of GUI last (last), it had a non-

significant positive effect on the answer (F= 0.46, p =.344).

Coefficient Estimate Std. Error z-value p-value

GUI = 3D -1.34 0.51 -2.64 .008

last = true 0.46 0.49 0.95 0.344

Threshold coefficients:

Estimate Std. Error z-value

Bad|Neutral -3.28 0.67 -4.91

Neutral|Good -1.67 0.50 -3.37

Good|V ery good 0.25 0.44 0.57

Number of observations: 60

Cragg and Uhler’s pseudo R2: 0.14

Table 8: Logit cumulative link model of the response about how

participants got along with each GUI.

5. DISCUSSION

Participants were much faster in reporting the location when using

the 2D-based GUI. This was expected since the 2D-based GUI

showed two views at the same time. To have a similar perspective

111

Proc. of the EAA Joint Symposium on Auralization and Ambisonics, Berlin, Germany, 3-5 April 2014

with the 3D-based GUI, the camera had to be moved which took

some time. Some participants reported the two views of the 2D-

based GUI were perfectly good for them. When using the 3D-

based GUI, they consecutively moved the camera such that a front

and top view were shown. Another reason for the time differences

might be due to the 3D-based GUI was something new and fun to

use as some participants reported. These participants spent a bit

more time on the 3D-based GUI just for playing around with the

camera. Therefore, for listening tests with many items, a 2D-based

GUI with multiple views should be used instead of a 3D-based

GUI with a virtual camera.

Participants reported getting along better with the 2D-based

GUI. This was expected since the 3D-based GUI needs the camera

to be adjusted which increases the complexity. The 2D-based GUI

already showed two views which enabled the user to monitor all

three dimensions. The red sphere was controlled by a joystick

relative to the virtual participant, even if the camera was moved.

Some participants reported that they would have expected the red

sphere to move relative to the camera (e.g. pressing the joystick

forward moves the red sphere away from the camera).

Localization errors of the participants’ responses were slightly

smaller when the 3D-based GUI was used. However, the differ-

ences in the localization error and normalized localization error

were not significant. Nevertheless, the similar results are interest-

ing, considering that participants reported that they got along much

better with the 2D-based GUI. By the linear regression model, it

could be revealed that for predicting the localization error other

effects are much more relevant than the GUI. The effect size of the

GUI type was small in the model and also not significant. Loud-

speaker position and the signal type influenced the localization

most. This was expected since these effects are known from estab-

lished research. There are non-significant indications that training

is important for reporting the localization by graphical user inter-

faces. When the GUI was last used, the localization error was

reduced according to the model.

The reported azimuth angle of the normalized location error

turned out to be accurate when the castanets recording was played

back by the front loudspeakers. The average normalized azimuth

error was 5◦which is close to results achieved by other localiza-

tion methods. E.g. Haber et al. reported that the average localiza-

tion errors of nine different methods ranged from +3.5◦to −5.2◦

for front loudspeaker positions[1]. In the experiment of Yoo et al.

the average azimuth error was −5◦and 2.3◦for two tested front

loudspeaker positions (−30 ◦and +30 ◦)[10].

6. CONCLUSION

A method for reporting the location of sound sources in all three di-

mensions was presented. The method was evaluated by conducting

an experiment with two different types of GUIs: A 2D-based GUI

and a 3D-based GUI. The 2D-based GUI had an average localiza-

tion error of 83 cm and turned out to be the less time-consuming

and more convenient choice. The 3D-based GUI was slightly more

accurate and had an average localization error of 82 cm. The anal-

ysis of the experiment results revealed that the used GUI had only

a small effect on the localization error. The signal type and loud-

speaker position played a much more important role. For front

loudspeaker positions, both GUIs resulted in an average normal-

ized azimuth error of 5◦when a castanets recording was played

back.

7. REFERENCES

[1] L. Haber, R. N. Haber, S. Penningroth, K. Novak, and

H. Radgowski, “Comparison of nine methods of indicating

the direction to objects: data from blind adults.,” Perception,

vol. 22, no. 1, pp. 35–47, Jan. 1993.

[2] M. Frank, L. Mohr, A. Sontacchi, and F. Zotter, “Flexible and

Intuitive Pointing Method for 3-D Auditory Localization Ex-

periments,” in Audio Engineering Society Conference: 38th

International Conference: Sound Quality Evaluation, 2010.

[3] H. Wierstorf, A. Raake, and S. Spors, “Localization of a Vir-

tual Point Source within the Listening Area for Wave Field

Synthesis,” in Audio Engineering Society Convention 133,

2012.

[4] T. Ashby, R. Mason, and T. Brookes, “Head Movements

in Three-Dimensional Localization,” in Audio Engineering

Society Convention 134, 2013.

[5] E. M. Wenzel, “Effect of increasing system latency on lo-

calization of virtual sounds,” in Proceedings of the Audio

Engineering Society 16th International Conference on Spa-

tial Sound Reproduction, 1999, pp. 42–50.

[6] D. R. Begault, E. M. Wenzel, and M. R. Anderson, “Di-

rect comparison of the impact of head tracking, reverber-

ation, and individualized head-related transfer functions on

the spatial perception of a virtual speech source.,” Journal of

the Audio Engineering Society. Audio Engineering Society,

vol. 49, no. 10, pp. 904–916, Oct. 2001.

[7] J. Pernaux, M. Emerit, and R. Nicol, “Perceptual Evalua-

tion of Binaural Sound Synthesis: the Problem of Reporting

Localization Judgments,” in Audio Engineering Society Con-

vention 114, 2003.

[8] G. Martin, W. Woszczyk, J. Corey, and R. Quesnel, “Sound

Source Localization in a Five-Channel Surround Sound Re-

production System,” in Audio Engineering Society Conven-

tion 107, 1999.

[9] S. Choisel and K. Zimmer, “A pointing Technique with Vi-

sual Feedback for Sound Source Localization Experiments,”

in Audio Engineering Society Convention 115, 2003.

[10] J. Yoo, J. Seo, H. Shim, H. Chung, K. Sung, and K. Kang,

“Subjective Listening Experiments on a Front and Rear

Array-Based WFS System,” ETRI Journal, vol. 33, no. 6,

pp. 977–980, 2011.

[11] W. M. Hartmann, “Localization of sound in rooms,” The

Journal of the Acoustical Society of America, vol. 74, no.

November 1983, pp. 1380–1391, 1983.

[12] S. Carlile, P. Leong, and S. Hyams, “The nature and distri-

bution of errors in sound localization by human listeners.,”

Hearing research, vol. 114, no. 1-2, pp. 179–96, Dec. 1997.

[13] G. von Békésy, “The moon illusion and similar auditory phe-

nomena,” The American journal of psychology, vol. 62, no.

4, pp. 540–552, 1949.

112