This version is available at https://doi.org/10.14279/depositonce-8800

Brinkmann, Fabian; Lindau, Alexander; Müller-Trapet, Markus; Vorländer, Michael; Weinzierl, Stefan

(2015): Cross-validation of measured and modeled head-related transfer functions. In: Fortschritte der

Akustik - DAGA 2015: 41. Jahrestagung für Akustik, 16. - 19. März 2015 in Nürnberg. Berlin: Deutsche

Gesellschaft für Akustik e.V. pp. 1118–1121.

Cross-validation of measured and modeled head-related transfer functions

Fabian Brinkmann1, Alexander Lindau1, Markus Müller-Trapet2,

Michael Vorländer2, Stefan Weinzierl1

1Audio Communication Group TU Berlin, Einsteinufer 17c, D-10587 Berlin, Germany.

E-Mail: {fabian.brinkmann, alexander.lindau, stefan.weinzierl}@tu-berlin.de

2Institute of Technical Acoustics, RWTH Aachen University, Kopernikusstr. 5, D-52074 Aachen, Germany.

E-Mail: {mmt, mvo}@akustik.rwth-aachen.de

Abstract

In the current study, we present full-spherical, high-resolution head-related transfer function (HRTF) datasets of the head and torso simulator FABIAN, including data for multiple head-above-torso orientations within a typical horizontal rotation range of ±50 degrees. While the first dataset was measured acoustically using sequential swept sines, the second one was modeled numerically using the boundary-element method accelerated with the fast-multipole method (FMM-BEM). Comparison of magnitude spectra revealed good agreement at mid and high frequencies, but increasing differences at frequencies below 200 Hz, which could be attributed to the limited frequency range of the measurement loudspeaker. Additionally, an analysis of times of arrival (TOA) revealed differences which were presumably caused by mechanical inaccuracies of the physical measurement setup. Consequently, using the modeled data as a reference, we corrected the TOAs of the measured head-related impulse responses (HRIRs) by fractional delaying and extrapolated their low-frequency range.

Introduction

Virtual acoustical environments may be created through reproduction of measured or modeled binaural signals using headphones or cross-talk compensated loudspeaker setups – a method which is called binaural synthesis [1]. It was shown that a plausible acoustical simulation – i.e., “a simulation in agreement with the listener’s expectation towards a corresponding real event” [2, p. 804] – may be achieved already with non-individual dynamic binaural synthesis. In this case, head rotations of the listener are accounted for by real time exchange of binaural transfer functions which were measured using a dummy head. When using individual transfer functions and less critical audio contents such as speech, simulations can be authentic – i.e., indistinguishable from a given reference [3].

The potentially high degree of realism and the technical simplicity of binaural synthesis suggest its usage as a reference when benchmarking other approaches of spatial audio reproduction such as loudspeaker array based methods [4] or room acoustic simulation [5]. This in turn implies that in order to allow for a reliable comparison between measured and modeled binaural simulations, the acquisition of binaural transfer functions (binaural room impulse responses – BRIRs, HRTFs) needs to be thoroughly validated as otherwise measurement errors could bias the results.

Hence, we suggest a thorough cross-validation using HRTFs from both acoustical measurements and numerical simulations. This ’two-sided’ validation approach is motivated by inherent restrictions of both acquisition approaches: In the case of measurements, inaccuracies in microphone, subject, and loudspeaker positioning, as well as non-ideal characteristics of the electro-acoustic transducers (e.g., time variance; deviation from omni-directionality [6]) may introduce errors, whereas in the case of simulations, trade-offs in the discretization of the subject’s geometry and assumptions about the acoustical surface impedance may leave room for speculation [7].

Head-related transfer function acquisition

HRTFs for the head and torso simulator FABIAN [8] were acquired for 11950 source positions and 11 azimuthal head-above-torso orientations (head rotations to the left and right) covering the typical range of motion of ±50° [9]. The resolution of source positions (2° in elevation; 2° great circle distance in azimuth, cf. Fig. 1) was chosen to allow for perceptually transparent HRTF interpolation and aliasing-free high-order spherical harmonic representation. Head-above-torso orientations were measured in steps of 10° to ensure that artifacts due to their interpolation remain below the threshold of perception [10]. In the following, we briefly describe the methods used for measuring and modeling HRTFs.

Measurements

HRTFs were measured in the fully anechoic chamber of the Acoustics Group at Carl von Ossietzky University Oldenburg, which has a lower cut-off frequency of 50 Hz above which free-field conditions may be assumed. The Two Arc Source Positioning system (TASP) used for the measurements consisted of two semicircular arcs with a radius of 1.7 m which could be rotated about the vertical axis. Each arc was equipped with a Manger MSW bending-wave sound transducer that could automatically be moved to different elevations. Due to mechanical restrictions, HRTFs could not be measured for elevations below −64°, leaving 11345 measured HRTFs, i.e., approximately 5 % of the data missing. FABIAN’s interaural center was aligned to the center of the measurement system using a cross-line laser and a laser pointer attached to FABIAN’s neck joint. Measurements were conducted using sequential swept sines at a sampling rate of 44.1 kHz. HRTFs were obtained by spectral division of sweeps recorded at FABIAN’s blocked ear canals and sweeps recorded in the center of the measurement system in the absence of FABIAN. The measurement setup is depicted in Fig. 1; more information can be found in [11].

Figure 1: Left: Source positions used for HRTF acquisition. Gray dots denote missing data in measured HRTFs. Right: Setup for measuring HRTFs.
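The spectral division used to obtain HRTFs from the two sweep recordings can be sketched as follows. This is a minimal illustration assuming NumPy, not the processing chain actually used for the measurements; the function name, the `n_fft` and `eps` parameters, and the guard against near-zero reference bins are assumptions introduced here.

```python
import numpy as np

def hrtf_by_spectral_division(ear_sweep, ref_sweep, n_fft=None, eps=1e-12):
    """Single-sided complex HRTF as the ratio of two sweep spectra.

    ear_sweep: sweep recorded at the blocked ear canal
    ref_sweep: sweep recorded at the center position without the dummy head
    (Names and the eps guard are illustrative assumptions.)
    """
    ear_sweep = np.asarray(ear_sweep, dtype=float)
    ref_sweep = np.asarray(ref_sweep, dtype=float)
    if n_fft is None:
        n_fft = max(len(ear_sweep), len(ref_sweep))
    ear_spec = np.fft.rfft(ear_sweep, n_fft)
    ref_spec = np.fft.rfft(ref_sweep, n_fft)
    # Avoid division by (near-)zero bins, e.g. outside the loudspeaker's range
    ref_spec = np.where(np.abs(ref_spec) < eps, eps, ref_spec)
    return ear_spec / ref_spec
```

Bins in which the reference recording carries little energy yield unreliable ratios, which is consistent with the low-frequency deviations discussed in the spectral cross-validation.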

Numerical modeling

The numerical HRTF simulation by means of the BEM required a mesh representation of FABIAN which was obtained as follows: In a first step, a point cloud representation of FABIAN was obtained using a GOM ATOS I structured light scanner. A point spacing of approximately 1/100 mm for the head and pinnae, and 1/10 mm for the torso was achieved by aligning overlapping scans from multiple viewing directions. FABIAN’s neck and torso bottom plate were excluded from the scan because of their reflecting metallic surfaces. The alignment of point clouds from different scans was done using reference points that were marked on FABIAN beforehand and surface matching as implemented in the ATOS Professional software (precision of approx. 1/100 mm).

In the next step, a non-uniform rational basis spline (NURBS) representation was generated from the point cloud data using the Geomagic Studio 12 software. Subsequently, the CAD software Rhino was used (a) to design a cylindrical neck with seamless transitions between head and torso, (b) to close holes in the NURBS representation, (c) to extend the torso bottom to its original size, and (d) to connect the arms to the torso, as these had been scanned separately (cf. Fig. 2, left).

In a last step, Virtual.Lab Acoustics 13.1 was used for meshing the NURBS data and for calculating complex HRTF spectra at frequencies between 100 Hz and 22.2 kHz with a resolution of 100 Hz. To speed up the calculation, two triangular meshes with different resolutions were generated: a coarse mesh with edge lengths of 2 mm, 10 mm, and 10 mm for pinnae, head, and torso, respectively, was used for BEM calculations up to 6 kHz, and a fine mesh with edge lengths of 2 mm, 2 mm, and 5 mm was used for the FMM-BEM above 2 kHz. The overlapping region between 2 kHz and 6 kHz was used for verifying that both calculations yielded identical results. The edge lengths were chosen to fulfil the typical requirement of six elements per wavelength in the frequency range under investigation [12]. Simulations were then carried out by imposing a constant velocity boundary condition on the part of the mesh corresponding to the microphone at the ear canal entrance of FABIAN. Otherwise, the mesh was assumed to be sound-hard, i.e., with an admittance of zero. The HRTFs were calculated by dividing the result at the field points by the analytical solution of a point source with the same volume velocity placed in the center of the coordinate system. Finally, HRIRs with a sampling rate of 44.1 kHz were obtained by inverse Fourier transform after mirroring the single-sided spectrum considering the symmetry properties of the discrete Fourier transform [13]. The frequency bin at 0 Hz was set to 1 (0 dB) beforehand. HRIRs for three different models were calculated: (a) a sound-hard head and torso model, (b) a head and torso model with an impedance boundary condition on the torso bottom corresponding to a porous absorber with a thickness of 30 mm, and (c) a head, torso, and legs model with the legs modeled by an elliptical cylinder with the surface impedance of the porous absorber as in case (b) (cf. Fig. 2, right).

Figure 2: Left: NURBS representation of FABIAN. Light gray surfaces were manually inserted into the model. Right: Coarse mesh of FABIAN’s head. Red area denotes microphone position.
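The conversion from the simulated single-sided spectra to HRIRs can be illustrated with a short sketch. It assumes NumPy and an HRIR length of 441 samples, which gives exactly 100 Hz bin spacing at 44.1 kHz; it further assumes that only the 220 bins from 100 Hz to 22 kHz are used, since bins above half the sampling rate cannot be represented. The function name and these choices are assumptions and not taken from the original processing.

```python
import numpy as np

FS = 44100      # sampling rate in Hz (from the text)
DF = 100        # frequency resolution in Hz (from the text)
N = FS // DF    # 441 samples give exactly 100 Hz bin spacing (assumption)

def hrir_from_single_sided(spectrum):
    """HRIR from complex HRTF bins at 100 Hz, 200 Hz, ..., 22 kHz (no DC bin)."""
    spectrum = np.asarray(spectrum, dtype=complex)
    n_bins = N // 2  # 220 bins below half the sampling rate
    if len(spectrum) != n_bins:
        raise ValueError(f"expected {n_bins} bins, got {len(spectrum)}")
    # Set the 0 Hz bin to 1 (0 dB), as described in the text
    half = np.concatenate(([1.0 + 0.0j], spectrum))
    # irfft mirrors the single-sided spectrum with conjugate symmetry,
    # which guarantees a real-valued impulse response
    return np.fft.irfft(half, n=N)
```

Using an odd length avoids a separate Nyquist bin, so the mirrored spectrum is fully determined by the DC bin and the 220 simulated bins.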

Cross-validation of measured and modeled

head-related transfer functions

Samples of measured and modeled HRIRs/HRTFs for neutral head-above-torso orientation in the median plane showed good agreement on first visual inspection (cf. Fig. 3). A more detailed analysis of spectral and temporal differences and their implications is given in the following, restricted to HRTFs for neutral head-above-torso orientation due to page limitations.

Temporal cross-validation

Ideally, TOAs should be identical in measured and modeled HRIRs. However, differences in the range of 3 samples (equaling a displacement of approx. 2.3 cm) were observed between the two conditions. As the geometrical alignment of the sound sources and FABIAN’s interaural center was believed to be almost perfect for the BEM simulation, we used the modeled data as a reference for correcting the TOA of the measured HRIRs. This was done separately for the left and right ear HRIRs using fractional delaying [14]. The amount of delay τ was estimated by maximizing the cross-correlation between pairs of ten times up-sampled measured and modeled HRIRs, hence $\tau = \arg\max_{\tau} r_{xy}(\tau)$, where $r_{xy}$ denotes the cross-correlation between the measured and the modeled HRIR. For the fractional delay we used Kaiser-windowed sinc filters of order 70, exhibiting negligible magnitude and group delay distortions (< 0.1 dB; < 0.01 samples for all f < 20 kHz). As a result, the average (and minimum) cross-correlations in our data increased from 0.7 (−0.18) to 0.94 (0.56). Fig. 4 shows two HRIRs before and after TOA correction.

Figure 3: Measured (left) and modeled (right) HRIRs (top) and HRTF magnitude spectra (bottom) in the median plane. Elevations of 0° and 180° denote sources in the front and back. Color denotes magnitude in dB.
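The TOA correction described above can be sketched in a few lines. The following is an illustration assuming NumPy, not the implementation used in the study: the FFT-based up-sampling, the Kaiser parameter `beta = 8`, the splitting into an integer shift plus a fractional-delay filter, and all function names are assumptions made here.

```python
import numpy as np

def upsample_fft(x, factor):
    """Band-limited up-sampling by zero-padding the spectrum."""
    n = len(x)
    spec = np.fft.rfft(x)
    if n % 2 == 0:
        spec[-1] *= 0.5  # split the Nyquist bin between +/- frequencies
    return np.fft.irfft(spec, n=n * factor) * factor

def estimate_delay(measured, modeled, factor=10):
    """Delay in samples (original rate) that best aligns measured to modeled."""
    x = upsample_fft(np.asarray(measured, dtype=float), factor)
    y = upsample_fft(np.asarray(modeled, dtype=float), factor)
    xcorr = np.correlate(y, x, mode="full")
    lag = int(np.argmax(xcorr)) - (len(x) - 1)  # index 0 is lag -(len(x) - 1)
    return lag / factor

def fractional_delay_fir(frac, order=70, beta=8.0):
    """Kaiser-windowed sinc filter delaying by order/2 + frac samples."""
    n = np.arange(order + 1)
    h = np.sinc(n - order / 2 - frac) * np.kaiser(order + 1, beta)
    return h / np.sum(h)  # unity gain at 0 Hz

def apply_delay(x, delay, order=70, beta=8.0):
    """Delay x by a possibly fractional (or negative) number of samples."""
    x = np.asarray(x, dtype=float)
    int_part = int(np.floor(delay))
    h = fractional_delay_fir(delay - int_part, order, beta)
    y = np.convolve(x, h)  # contains an extra bulk delay of order/2 samples
    padded = np.concatenate((np.zeros(len(x)), y, np.zeros(len(x))))
    start = len(x) + order // 2 - int_part  # remove bulk delay, apply shift
    return padded[start:start + len(x)]
```

A positive result of `estimate_delay(measured, modeled)` means the measured HRIR arrives earlier than the modeled one; passing that value to `apply_delay` shifts the measured HRIR onto the reference. With ten-fold up-sampling the delay estimate is quantized to 0.1 samples.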

An analysis across source positions revealed slight TOA discontinuities in the measured data, stemming from the start and end points of the 360° rotation of the TASP and from the two differently mounted measurement loudspeakers, which were also removed by the fractional delays. The TOA treatment caused changes in the broadband interaural time difference (ITD) of about ±0.5 samples (≈ 11 μs) in the proximity of the horizontal plane. Differences of about −1.1 to 2.9 samples (up to ≈ 66 μs) occurred for lateral sources where, however, the auditory system is less sensitive to changes in the ITD [15]. Induced ITD changes were thus believed to be below the threshold of perception, and the applied fractional delays were considered to be perceptually non-critical.
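The conversions between samples, time, and path length quoted in this section follow directly from the sampling rate of 44.1 kHz. The short sketch below reproduces them; the speed of sound of 343 m/s is an assumption, as the text does not state the value used.

```python
FS = 44100.0   # sampling rate in Hz (from the text)
C = 343.0      # speed of sound in m/s (assumed value)

def samples_to_microseconds(n_samples, fs=FS):
    """Duration of n_samples at sampling rate fs, in microseconds."""
    return n_samples / fs * 1e6

def samples_to_centimeters(n_samples, fs=FS, c=C):
    """Acoustic path length travelled during n_samples, in centimeters."""
    return n_samples / fs * c * 100.0

itd_change_horizontal = samples_to_microseconds(0.5)  # about 11 microseconds
itd_change_lateral = samples_to_microseconds(2.9)     # about 66 microseconds
toa_displacement = samples_to_centimeters(3)          # about 2.3 cm
```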

Spectral cross-validation

Spectral differences between measured and modeled HRTFs were evaluated in 40 auditory filter bands between 50 Hz and 20 kHz [16]. The results for the three different FABIAN models (cf. Sec. Numerical modeling) were almost identical, and thus the non-rigid surface models were discarded for simplicity. Results for the rigid surface model are shown in Fig. 5 for the left ear HRIRs averaged across source positions. Results can be discussed with regard to three frequency ranges showing distinct error patterns: Deviations below 200 Hz are caused by the limited frequency response of the loudspeaker. Here, the levels of measured HRIRs are systematically below their modeled counterparts. Between 200 Hz and 6 kHz the median deviation is approx. 0 dB, while 90% of the errors are smaller than ±2 dB, and the maximum error remains below ±8 dB. Above 5-6 kHz, notches occur in the HRTF magnitude response that are caused by pinna (anti-)resonances [17] and are highly sensitive to small changes of sound incidence or microphone position [6]. Hence, the median difference between measured and modeled HRTFs increases to approx. ±3 dB. For 90% of the source positions the error remains well below ±10 dB, whereas the maximum error occasionally exceeds this range. Results for the right ear were of comparable magnitude.

Figure 4: Modeled (gray) and measured left ear HRIRs before (dashed) and after (solid) fractional delaying. For illustration, only the HRIR with the lowest cross-correlation before the alignment is shown. (Axes: t in samples, HRIR amplitude.)
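The error statistics reported above (median, 10%-90% range, and extremes across source positions) can be computed per filter band as sketched below. The sketch assumes NumPy and assumes that band levels in dB are already available as arrays of shape (source positions, bands); the auditory filter bank itself [16] is not reproduced here, and the function name and array layout are illustrative.

```python
import numpy as np

def deviation_statistics(measured_db, modeled_db):
    """Per-band percentiles of level differences across source positions.

    measured_db, modeled_db: arrays of shape (n_positions, n_bands) holding
    band levels in dB (assumed layout).
    """
    dev = np.asarray(measured_db, dtype=float) - np.asarray(modeled_db, dtype=float)
    p0, p10, p50, p90, p100 = np.percentile(dev, [0, 10, 50, 90, 100], axis=0)
    return {"min": p0, "p10": p10, "median": p50, "p90": p90, "max": p100}
```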

High-frequency differences in the HRTFs could be induced either by mechanical inaccuracies of the measurement setup or by simplifying assumptions underlying the BEM simulation, making them difficult to correct. The observed low-frequency differences, however, are clearly caused by non-ideal loudspeaker characteristics and could therefore be corrected by extrapolation: Xie [18] proposed a linear interpolation of magnitude and phase responses, Bernschütz [19] coupled a time- and level-aligned low-pass filter to the measured HRIRs, Algazi et al. [20] used analytically modeled transfer functions derived from fitted spherical/elliptical head and torso models, and Gumerov et al. [21] suggested using data obtained from BEM simulations. The latter approach was applied in our case, too, assuming the modeled data to be valid at low frequencies. To avoid discontinuities, measured and modeled HRTF magnitude and unwrapped phase spectra were combined separately using a linear fade between 200 and 500 Hz.
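The low-frequency extrapolation by cross-fading can be sketched as follows. This is an illustration assuming NumPy, with the fade taken to be linear over frequency and applied to the linear magnitude (the text does not specify whether magnitudes were faded linearly or in dB); the function and parameter names are assumptions. The sketch also presumes that both spectra are sampled on the same frequency grid and that the TOAs have already been aligned, so that the unwrapped phases are comparable.

```python
import numpy as np

def extrapolate_low_frequencies(measured, modeled, freqs, f_lo=200.0, f_hi=500.0):
    """Combine complex HRTF spectra: modeled below f_lo, measured above f_hi."""
    measured = np.asarray(measured, dtype=complex)
    modeled = np.asarray(modeled, dtype=complex)
    freqs = np.asarray(freqs, dtype=float)
    # Weight of the measured data: 0 up to f_lo, 1 from f_hi, linear in between
    w = np.clip((freqs - f_lo) / (f_hi - f_lo), 0.0, 1.0)
    # Fade magnitude and unwrapped phase separately
    mag = (1.0 - w) * np.abs(modeled) + w * np.abs(measured)
    phase = ((1.0 - w) * np.unwrap(np.angle(modeled))
             + w * np.unwrap(np.angle(measured)))
    return mag * np.exp(1j * phase)
```

Fading magnitude and phase separately, rather than the complex spectra, avoids cancellation in the transition band when the two phases differ slightly.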

Discussion and outlook

In the current study, we presented high-resolution HRTF datasets for various head-above-torso orientations which we measured acoustically and simulated numerically using the BEM. We presented results from temporal and spectral analyses of both types of data. We came to the conclusion that both measured and modeled data suffered from specific shortcomings, so that neither of them alone could provide a general ground truth. However, from our analyses we could identify frequency ranges in which one of the datasets appeared to be more reliable than the other, and in turn used the more reliable data for a correction (cross-validation) of the other. Thus, small TOA deviations which we observed in the measured data and which could be ascribed to asymmetries of the mechanical measurement setup were corrected by aligning them to the modeled data by means of fractional delay correction. Additionally, we investigated the differences between measured and modeled HRTF magnitude spectra in 40 auditory filter bands. Below 200 Hz, measured HRTFs were found to be invalid due to the non-ideal loudspeaker frequency response; hence, missing low-frequency information was recreated using the modeled data. Above 200 Hz, magnitude spectra were found to be reasonably comparable, and differences were of an order as reported, e.g., by Gumerov et al. [21]. Above 6 kHz, however, spectral differences increased, presumably due to slightly mismatched pinna notches. As the causes of the discrepancies could not be satisfactorily identified in this case, we refrained from correcting either dataset.

Figure 5: Deviations between measured and modeled HRTFs for 40 auditory filter bands averaged across source positions before (colored) and after (gray scale) low frequency extrapolation. Median errors are given by black and orange solid lines, the 10%-90% percentile range by solid gray/red lines, and the 0%-100% percentile range by dashed gray/red lines. (Axes: f in Hz, deviation in dB.)

For the future we plan to extrapolate the missing measured HRTFs at low elevations using an approach as suggested by Ahrens et al. [22]. Furthermore, we aim at psychoacoustically validating the localization performance obtainable with our HRTF sets using a model suggested by Baumgartner et al. [23].

Acknowledgement

The work is part of the Simulation and Evaluation of

Acoustical Environments (SEACEN) project funded by

the German Research Foundation (DFG WE 4057/3-2).

References

[1] Møller, H.: Fundamentals of binaural technology. Applied Acoustics 36 (1992), 171-218.

[2] Lindau, A., Weinzierl, S.: Assessing the plausibility of virtual acoustic environments. Acta Acustica united with Acustica 98 (2012), 804-810.

[3] Brinkmann, F., Lindau, A., Vrhovnik, M., Weinzierl, S.: Assessing the authenticity of individual dynamic binaural synthesis. Proc. of the EAA Joint Symposium on Auralization and Ambisonics (2014), 62-68.

[4] Spors, S., Wierstorf, H., Raake, A., Melchior, F., Frank, M., Zotter, F.: Spatial sound with loudspeakers and its perception: A review of the current state. Proc. of the IEEE 101 (2013), 1920-1938.

[5] Vorländer, M.: Auralization. Fundamentals of acoustics, modelling, simulation, algorithms and acoustic virtual reality. Springer, Berlin, Heidelberg, Germany (2008).

[6] Andreopoulou, A., Begault, D.R., Katz, B.F.G.: Inter-laboratory round robin HRTF measurement comparison. IEEE J. Sel. Topics Signal Process. (2015), in print.

[7] Kahana, Y.: Numerical modelling of the head-related transfer function. Ph.D. Thesis, University of Southampton, United Kingdom (2000).

[8] Lindau, A., Hohn, T., Weinzierl, S.: Binaural resynthesis for comparative studies of acoustical environments. 122nd AES Convention (2007), Vienna, Austria.

[9] Thurlow, W.R., Mangels, J.W., Runge, P.S.: Head movements during sound localization. J. Acoust. Soc. Am. 42 (1967), 489-493.

[10] Brinkmann, F., Roden, R., Lindau, A., Weinzierl, S.: Audibility and interpolation of head-above-torso orientation in binaural technology. IEEE J. Sel. Topics Signal Process. (2015), in print.

[11] Brinkmann, F., Lindau, A., Weinzierl, S., Geissler, G., van de Par, S.: A high resolution head-related transfer function database including different orientations of head above the torso. AIA-DAGA 2013, Int. Conf. Acoust., 596-599.

[12] Ciskowski, R., Brebbia, C.: Boundary Element Methods in Acoustics. Elsevier Applied Science (1991).

[13] Oppenheim, A.V., Schafer, R.W., Buck, J.R.: Discrete-time signal processing. Prentice Hall, Upper Saddle River, USA, 2nd ed. (1999).

[14] Laakso, T.I., Välimäki, V., Karjalainen, M., Laine, U.K.: Splitting the unit delay. IEEE Sig. Proc. Mag. 13 (1996), 30-60.

[15] Blauert, J.: Spatial hearing. The psychophysics of human sound localization. MIT Press, Revised ed. (1997).

[16] Slaney, M.: Auditory toolbox. Version 2. Tech. Rep. #1998-010, Interval Research Corporation (1998).

[17] Takemoto, H., Mokhtari, P., Kato, H., Nishimura, R., Iida, K.: Mechanism for generating peaks and notches of head-related transfer functions in the median plane. J. Acoust. Soc. Am. 132 (2012), 3832-3841.

[18] Xie, B.: On the low frequency characteristics of head-related transfer function. Chinese J. Acoust. 28 (2009), 1-13.

[19] Bernschütz, B.: A spherical far field HRIR/HRTF compilation of the Neumann KU 100. AIA-DAGA 2013, International Conference on Acoustics, Merano, Italy, 592-595.

[20] Algazi, V.R., Duda, R.O., Duraiswami, R., Gumerov, N.A., Tang, Z.: Approximating the head-related transfer function using simple geometric models of the head and torso. J. Acoust. Soc. Am. 112 (2002), 2053-2064.

[21] Gumerov, N.A., O’Donovan, A.E., Duraiswami, R., Zotkin, D.N.: Computation of the head-related transfer function via the fast multipole accelerated boundary element method and its spherical harmonic representation. J. Acoust. Soc. Am. 127 (2010), 370-386.

[22] Ahrens, J., Thomas, M.R.P., Tashev, I.J.: HRTF magnitude modeling using a non-regularized least-squares fit of spherical harmonics coefficients on incomplete data. APSIPA Annual Summit and Conference (2012).

[23] Baumgartner, R., Majdak, P., Laback, B.: Modeling sound-source localization in sagittal planes for human listeners. J. Acoust. Soc. Am. 136 (2014), 791-802.
