MIRACLE - Microphone Array Impulse Response
Dataset for Acoustic Learning
Adam Kujawski∗,1,2 Art J. R. Pelling†,1,2 Ennes Sarradj‡,1
∗Email: [email protected],ORCID: 0000-0003-4579-8813
†Email: [email protected],ORCID: 0000-0003-3228-6069
‡Email: [email protected] ORCID: 0000-0002-0274-8456
1Department of Engineering Acoustics, TU Berlin, Einsteinufer 25, 10587, Berlin, Germany
2These authors contributed equally to this work.
Abstract: This work introduces a large dataset comprising impulse responses of spatially distributed
sources within a plane parallel to a planar microphone array. The dataset, named MIRACLE, encompasses
856,128 single-channel impulse responses and includes four different measurement scenarios. Three mea-
surement scenarios were conducted under anechoic conditions. The fourth scenario includes an additional
specular reflection from a reflective panel. The source positions were obtained by uniformly discretizing a
rectangular source plane parallel to the microphone array for each scenario. The dataset contains three scenarios
with a spatial resolution of 23 mm at two different source-plane-to-array distances, as well as a scenario with
a resolution of 5 mm for the shorter distance. In contrast to existing room impulse response datasets, the
accuracy of the provided source location labels is assessed and additional metadata, such as the directivity
of the loudspeaker used for excitation, is provided. The MIRACLE dataset can be used as a benchmark for
data-driven modelling and interpolation methods as well as for various acoustic machine learning tasks,
such as source separation, localization, and characterization.
Keywords: impulse response, dataset, microphone array, acoustics
Novelty statement: We provide a large dataset of spatially distributed multichannel impulse response
measurements together with a thorough assessment of the source location accuracy.
1. Introduction
A Room Impulse Response (RIR) characterizes the linear
time-invariant acoustic propagation between a source and
a receiver within a specific acoustic environment. RIRs are
crucial for sound field auralization [36] as well as in the
realm of room acoustics, where they are used for estimat-
ing acoustic properties of a room such as the reverberation
time [45].
The emergence of data-driven methods in acoustics [4],
particularly deep learning methods, has sparked increas-
ing interest in the availability of rich, high-quality RIR
datasets. These datasets play a pivotal role in the training
of data-driven (interpolatory) sound field reconstruction
methods [13,17,21,26], deep generative models [29,41],
and augmentation methods [6]. In addition, RIR datasets
can be flexibly employed in order to synthesize acous-
tic training data for source localization and characteri-
zation [18], sound event detection, and speech separation
tasks by convolving arbitrary source signals with RIRs [19,
20,37]. The same synthesis procedure can be employed
for data-driven acoustic parameter estimation problems,
such as blind reverberation time estimation [15,28] and
others [7].
While data-driven methods often exhibit superior per-
formance compared to conventional model-based meth-
ods, they require large amounts of realistic training data
and are sensitive to variations of underlying probability
distributions describing the data, also known as dataset
shift [33]. Experimental data is oftentimes not avail-
able or too time-consuming to acquire. Many data-driven
methods across various application areas, such as speech
enhancement and recognition [16,25], localization [5,18],
sound field reconstruction [30], room acoustic parameter
estimation [10,49], and acoustical engineering [2,27,31],
are therefore trained with simulated data, whereby en-
Preprint. 2024-06-19
MIRACLE Dataset 2
hanced realism helps to improve generalization perfor-
mance [47,48]. However, without adaptation to or train-
ing with realistic data, the performance of data-driven
methods can be significantly impaired [2,14], which indi-
cates the need for experimentally measured RIR datasets.
Data availability
The dataset presented in this paper, authored by Adam Kujawski,
Art J. R. Pelling and Ennes Sarradj, can be obtained from
doi:10.14279/depositonce-20837
under the CC BY-NC-SA 4.0 license.
2. Materials and Methods
2.1. Experimental Setup
The experimental setup is illustrated in Fig. 1. Details on
the utilized hardware are given in Table 4 in Appendix A.
Microphone Array: The phased microphone setup features
a planar microphone array comprising $n_o = 64$
channels mounted in a 1.5 m × 1.5 m aluminium
plate. The microphone arrangement follows Vogel’s
spiral [44]. The maximum pairwise distance between
the array microphones is referred to as the
aperture size $d_a = 1.47$ m. The microphone array
data was acquired with a multichannel acquisition
system (sampling rate: 51.2 kHz).
Sound Source and Excitation Signal: A dynamic 2” cone
loudspeaker in a cylindrical 3D-printed enclosure
was employed as the sound source. An exponential
sine sweep was used as the excitation signal because
of its favourable properties with regard to crest fac-
tor and rejection of non-linearities [42]. It was de-
signed according to [11,34,35] in the frequency range
of the loudspeaker, namely from 100 Hz to 16 kHz.
Because the anechoic chamber is nearly free of re-
flections and has very low noise levels, it was pos-
sible to choose a relatively short sweep time of 3 s
for the measurement. In order to ensure that the
entire system response after excitation is captured,
a safety window of 250 ms was added to the recording
duration, resulting in $n_s = 166{,}400$ samples per
measurement. The loudspeaker excitation signal
was also fed back directly to the AD-converter and
was synchronously recorded with the microphone
signals as a reference for post-processing. It is re-
ferred to as the loopback excitation signal in the
following.
Positioning: A high-precision motor-driven 2D position-
ing system was employed for loudspeaker position-
ing. The positioning system and the microphone ar-
ray were manually aligned by using a laser distance
meter and a cross-line laser, achieving only minor
alignment errors of a few millimetres at worst. The
loudspeaker dust cap at the membrane centre was
used as reference in the manual alignment. Dur-
ing data post-processing, a spatial offset correction
was applied based on a statistical evaluation given
in Section 3.3. The corrected positions apply to the
acoustical centre of the loudspeaker rather than the
centre of the membrane.
Environment: All measurements were performed in the
anechoic chamber of TU Berlin (room volume
$V = 830\,\mathrm{m^3}$, lower cut-off frequency $f_c = 63$ Hz).
Neither heating nor air conditioning was active, and
the temperature was monitored at the microphone
array centre throughout the experiment. A ground
plate was placed between the loudspeaker and the
microphone array in one of the experimental scenar-
ios to enable a reflective environment. The support-
ing grid platform and the positioning system were
clad with absorptive foam to minimize reflections.
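The excitation parameters quoted above can be turned into a short sketch. The code below is not the authors' sweep design (which follows [11,34,35] and may include fades or other refinements); it only assumes the textbook exponential sine sweep formula and uses the sampling rate, band limits, sweep time and safety window stated in the text to show where the recording length of 166,400 samples comes from.

```python
import numpy as np

fs = 51_200                 # acquisition sampling rate in Hz (from the text)
f1, f2 = 100.0, 16_000.0    # sweep band in Hz (from the text)
T = 3.0                     # sweep duration in s (from the text)
tail = 0.25                 # safety window in s (from the text)

t = np.arange(int(round(T * fs))) / fs
rate = np.log(f2 / f1)

# Textbook exponential sine sweep (assumption): the instantaneous
# frequency f1 * exp(t / T * rate) rises from f1 at t = 0 to f2 at t = T.
sweep = np.sin(2.0 * np.pi * f1 * T / rate * (np.exp(t / T * rate) - 1.0))

# Append silence so that the decay of the system response is recorded.
excitation = np.concatenate([sweep, np.zeros(int(round(tail * fs)))])
n_s = excitation.size
print(n_s)
```

The printed length is the sweep length plus the safety window, both expressed in samples at 51.2 kHz.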
2.2. Experimental Procedure
A customized and fully automated data acquisition pro-
cedure was implemented. Before each experiment, the
loudspeaker was repeatedly excited with the excitation
signal for a duration of 20 minutes (the duration was
determined in a dedicated experiment). This warm-up
phase accounts for the weakly non-stationary dynamics
of the loudspeaker’s transfer function, e.g. changes of the
properties of the loudspeaker magnet related to internal
temperature fluctuations, see [3]. Subsequently, the ac-
tual measurement routine was started by positioning the
loudspeaker at the desired source location and measuring
the room temperature simultaneously. After positioning,
two repetitions of background noise measurement (1 s
each) and loudspeaker excitation measurements (3 s each)
were performed using all $n_o$ microphones at once. Next,
the cross-correlation between all $n_i$ recorded
channels was evaluated according to the rule of two [40].
Based on the measured sweep signals and the noise sig-
nal, the rule of two defines a cross-correlation threshold
at which a pair of measured sweeps can be regarded free
of corruption. In case of any violations, the measurement
was repeated automatically.
Following the main measurement campaign, an addi-
tional measurement was conducted in the anechoic cham-
ber to obtain the angle-dependent frequency response of
the loudspeaker at discrete azimuth angles at a resolution
of $\Delta\theta = 2.5^\circ$. A microphone was placed at a distance
of 0.5 m from the loudspeaker centre. The loudspeaker
was mounted on a motor-driven dispersion measurement
turntable. A photograph of the measurement setup can
be found in Fig. 2. The same excitation signal and pro-
cessing parameters as in the previous measurement cam-
paign were used to determine the loudspeaker impulse
response. Owing to the cylindrical loudspeaker enclosure,
rotational symmetry around the z-axis can
be assumed.
2.3. Post-processing
Several post-processing steps were performed to obtain a
good estimate of the system impulse response from the
measurements. Firstly, the loopback excitation and microphone
signals were averaged across the two measurement
repetitions to obtain a single averaged excitation signal
$\tilde{u}_{i,j} \in \mathbb{R}^{n_s}$ and averaged microphone signal
$\tilde{y}_{i,j} \in \mathbb{R}^{n_s}$ for the path from the $i$-th source to the
$j$-th receiver location. Subsequently, all signals were resampled to a
sampling rate of $f_d = 32$ kHz since the loudspeaker transmission
capability and excitation sweep have an upper frequency
limit of 16 kHz. We applied the polyphase method
for resampling (see [1] for details).
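As an illustration of the resampling step, the following sketch applies `scipy.signal.resample_poly` (the routine documented in [1]) to a placeholder signal of the recording length. The random input and the default anti-aliasing filter are assumptions made for this example; the filter settings used for the dataset are not stated in the text.

```python
from fractions import Fraction

import numpy as np
from scipy.signal import resample_poly

fs_in, fs_out = 51_200, 32_000          # sampling rates in Hz (from the text)
ratio = Fraction(fs_out, fs_in)         # reduces to 5/8
up, down = ratio.numerator, ratio.denominator

# Placeholder for one averaged recording of n_s = 166,400 samples.
x = np.random.default_rng(0).standard_normal(166_400)

# Polyphase resampling with SciPy's default low-pass filter (assumption).
y = resample_poly(x, up, down)
n_d = y.size
print(up, down, n_d)
```

The output length equals the input length scaled by 5/8, which matches the number of samples per signal after resampling.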
Deconvolution
In the following, let $n_d = 104{,}000$ denote the number of
samples after resampling. An estimate of the frequency
response was obtained by dividing the Discrete Fourier
Transform (DFT) of the averaged and downsampled measurement
signals $Y_{i,j} = \mathrm{DFT}(\tilde{y}_{i,j}) \in \mathbb{C}^{n_d}$ by the
corresponding DFT of the averaged and resampled loopback
excitation signals $U_{i,j} = \mathrm{DFT}(\tilde{u}_{i,j}) \in \mathbb{C}^{n_d}$, i.e.
$$H_{i,j}(e^{\imath\omega_k}) = Y_{i,j}(e^{\imath\omega_k})\, U_{i,j}^{-1}(e^{\imath\omega_k}) \in \mathbb{C},$$
for the angular frequency $\omega_k = 2\pi k / n_d$ with
$k \in [-n_d/2, n_d/2] \subset \mathbb{Z}$. The inverse spectra $U_{i,j}^{-1} \in \mathbb{C}^{n_d}$
were obtained by regularized inversion [12,22–24,35,38]
$$U_{i,j}^{-1}(e^{\imath\omega_k}) = \frac{U_{i,j}^{*}(e^{\imath\omega_k})}{U_{i,j}^{*}(e^{\imath\omega_k})\, U_{i,j}(e^{\imath\omega_k}) + M \lambda(e^{\imath\omega_k})},$$
where $M = \max_{k \in \{1,\dots,n_d\}} \{ |U_{i,j}(e^{\imath\omega_k})|^2 \} = 1$.
Regularization is necessary to avoid instabilities in the deconvolved
frequency response that arise from persistently exciting
only over a limited frequency range. Practical considerations
for choosing the regularization term in acoustic
applications can be found in [35]. The regularization
term $\lambda \in \mathbb{R}^{n_d}$ was chosen as
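$$\lambda(e^{\imath\omega_k}) = \begin{cases}
1 & \text{for } |\omega_k| \in [0, \omega_\mathrm{fade}], \\
\dfrac{1}{2}\left(1 + \cos\left(\pi\, \dfrac{\omega_\mathrm{fade} - |\omega_k|}{\omega_\mathrm{fade} - \omega_\mathrm{cut}}\right)\right) & \text{for } |\omega_k| \in [\omega_\mathrm{fade}, \omega_\mathrm{cut}], \\
0 & \text{for } |\omega_k| \in [\omega_\mathrm{cut}, \pi],
\end{cases}$$
such that the regularization term $\lambda(e^{\imath\omega_k})$ is equal to 0
above the cutoff frequency
$$\omega_\mathrm{cut} = 2\pi\, \frac{100\,\mathrm{Hz}}{f_d},$$
which is chosen according to the lower limit of the loudspeaker’s
frequency range of 100 Hz, and equal to 1 below
$\omega_\mathrm{fade} = \omega_\mathrm{cut}/\sqrt{2}$. A cross-fade based on a Hann window
(raised-cosine) is used to smoothly transition in between.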
The estimate of the frequency response $H_{i,j}$ was then
transformed back to the time domain to finally obtain
the impulse response
$$h_{i,j} = \mathrm{DFT}^{-1}(H_{i,j}).$$
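The deconvolution described in this subsection can be sketched in a few lines of NumPy. This is a minimal illustration rather than the processing code used for the dataset: the function names, the noise excitation and the toy impulse response are assumptions made for the example, and the fade is implemented as a raised cosine that falls from 1 at the fade frequency to 0 at the cutoff frequency.

```python
import numpy as np


def regularization(omega, omega_cut):
    """Raised-cosine regularization term: 1 at low frequencies, 0 above omega_cut."""
    omega_fade = omega_cut / np.sqrt(2.0)
    w = np.abs(omega)
    lam = np.zeros_like(w)
    lam[w <= omega_fade] = 1.0
    fade = (w > omega_fade) & (w < omega_cut)
    lam[fade] = 0.5 * (1.0 + np.cos(np.pi * (omega_fade - w[fade])
                                    / (omega_fade - omega_cut)))
    return lam


def deconvolve(y, u, fs, f_cut=100.0):
    """Estimate an impulse response from response y and excitation u."""
    n = len(u)
    omega = 2.0 * np.pi * np.fft.fftfreq(n)          # normalized, in [-pi, pi)
    lam = regularization(omega, 2.0 * np.pi * f_cut / fs)
    U, Y = np.fft.fft(u), np.fft.fft(y)
    M = np.max(np.abs(U) ** 2)
    U_inv = np.conj(U) / (np.abs(U) ** 2 + M * lam)  # regularized inverse
    return np.real(np.fft.ifft(Y * U_inv))


# Toy check (assumed data): broadband noise excitation and a delayed,
# attenuated impulse as the "true" system, combined by circular convolution.
rng = np.random.default_rng(0)
u = rng.standard_normal(4096)
h_true = np.zeros(4096)
h_true[10] = 0.5
y = np.real(np.fft.ifft(np.fft.fft(u) * np.fft.fft(h_true)))
h_est = deconvolve(y, u, fs=32_000)
print(int(np.argmax(np.abs(h_est))))
```

Because the regularization only acts below 100 Hz, the recovered peak sits at the true delay and is only slightly smaller than the true amplitude.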
Truncation
The calculated impulse responses were subsequently truncated
in order to limit the size of the final dataset. For
user convenience, the impulse responses of all measurement
scenarios were truncated identically. For this, the
minimum cumulative energy $e \in \mathbb{R}^{n_d}$ given by
$$e(t) = \min_{i \in \{1,\dots,n_i\},\; j \in \{1,\dots,n_o\}} \sum_{\tau=1}^{t} |h_{i,j}(\tau)|^2, \quad t \in \{1,\dots,n_d\},$$
was calculated for each scenario. The truncation index
$n_t$ was chosen to be the smallest power of two that is
larger than the time index for which 0.1 % of the energy
is truncated at worst, namely
$$n_t = 1{,}024 \ge \tilde{t} = \max \{\, t \in \{1,\dots,n_d\} : e(t) \le 0.999\, \|e\|_\infty \,\}.$$
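The truncation rule can be illustrated with a small sketch. It follows one reading of the formula above (take the last index at which the worst-case cumulative energy is still at or below 99.9 % of its final value, then round up to a power of two); the function name and the two toy impulse responses are assumptions for the example.

```python
import numpy as np


def truncation_length(h, keep=0.999):
    """Return (t_tilde, n_t) for impulse responses h of shape (n_ir, n_d)."""
    energy = np.cumsum(np.abs(h) ** 2, axis=-1)   # cumulative energy per response
    e = energy.min(axis=0)                        # worst case over all responses
    # Last (1-based) index whose worst-case cumulative energy is <= keep * max.
    t_tilde = int(np.flatnonzero(e <= keep * e.max()).max()) + 1
    n_t = 1 << t_tilde.bit_length()               # smallest power of two > t_tilde
    return t_tilde, n_t


# Toy data (assumed): two unit impulses arriving at different delays.
h = np.zeros((2, 4096))
h[0, 300] = 1.0
h[1, 700] = 1.0
t_tilde, n_t = truncation_length(h)
print(t_tilde, n_t)
```

In this toy case the later arrival determines the worst case, so the truncation length is governed by the slowest energy build-up among all responses.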
3. Results and Discussion
3.1. Impulse Responses
A total of four different experimental scenarios were real-
ized, which are summarized in Table 1. The acquisition
time for each of the large-scale scenarios A1, A2, and R2
was about 20 hours. The total number of single-channel
impulse responses across all scenarios is 856,128. The
scenarios differ regarding the environment as well as the
spatial dimension ($d_y = d_x$), sampling resolution ($\Delta d_y = \Delta d_x$),
and distance $d_z$ of the source plane. The two large
anechoic scenarios A1 and A2 each include 4,096 mea-
sured source positions on an equidistantly spaced 64 ×64
grid at different source-plane distances dz. In addition,
a densely-sampled scenario D1 was acquired on a smaller
33 ×33 grid with a spacing of only 5 mm. Scenario R2
is based on the same geometric setup as scenario A2, but
an aluminium plate on the floor introduces a specular
reflection. Fig. 3 and Fig. 4 show, as an example, the measured
impulse response and its magnitude spectrum for a
single source-receiver combination in each of the scenarios
A1, A2, and R2. It can be readily verified that the
doubling of the distance to the source is reflected in
a doubling of the delay and an attenuation of the
magnitude spectrum by approximately 6 dB. Furthermore,
the specular reflection for scenario R2 manifests in a
prominent second peak in the impulse response and comb
filtering in its magnitude spectrum. Additional reflections
manifesting as spurious peaks in the impulse response are
due to the structure of the positioning system and the
supporting grid platform.
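The distance-doubling observation can be checked with simple arithmetic. The sketch below assumes ideal spherical spreading (pressure proportional to 1/r) between a source and a receiver on a common axis and a speed of sound of 345 m/s, which is close to the measured values; it is an idealization, not an evaluation of the measured data.

```python
import math

c = 345.0                  # assumed speed of sound in m/s
d_a1, d_a2 = 0.734, 1.467  # source-plane distances of A1 and A2 in m

delay_a1_ms = d_a1 / c * 1e3
delay_a2_ms = d_a2 / c * 1e3
# Under 1/r spreading the level drops by 20*log10 of the distance ratio.
level_drop_db = 20.0 * math.log10(d_a2 / d_a1)

print(round(delay_a1_ms, 2), round(delay_a2_ms, 2), round(level_drop_db, 2))
```

The delay ratio is essentially 2 and the level difference is close to 6 dB, in line with the behaviour visible in the figures.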
Figure 1: Experimental setup for the main experiment (R2) with reflective ground plate. Labelled components: observation plane, microphone array, loudspeaker, temperature sensor, ground plate.
Figure 2: Experimental setup for the directivity measurement. Labelled components: microphone, turntable.
Table 1: MIRACLE experimental scenarios.
Scenario | Anechoic | $n_i$ | $n_o$ | $d_x = d_y$ | $\Delta d_x = \Delta d_y$ | $d_z$
A1 | ✓ | $64^2 = 4{,}096$ | 64 | 146.7 cm | 23.3 mm | 73.4 cm
D1 | ✓ | $33^2 = 1{,}089$ | 64 | 16.0 cm | 5.0 mm | 73.4 cm
A2 | ✓ | $64^2 = 4{,}096$ | 64 | 146.7 cm | 23.3 mm | 146.7 cm
R2 | ✗ | $64^2 = 4{,}096$ | 64 | 146.7 cm | 23.3 mm | 146.7 cm
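The total number of single-channel impulse responses follows directly from the scenario table. The short sketch below simply multiplies the number of source positions by the number of microphones for each scenario; the dictionary layout is an assumption for illustration and does not reflect the file structure of the dataset.

```python
# Source grid size per axis and number of microphones for each scenario.
scenarios = {
    "A1": (64, 64),
    "D1": (33, 64),
    "A2": (64, 64),
    "R2": (64, 64),
}

# n_i = grid_size**2 source positions, each recorded by n_o microphones.
per_scenario = {name: grid ** 2 * n_o for name, (grid, n_o) in scenarios.items()}
total = sum(per_scenario.values())
print(per_scenario, total)
```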
Figure 3: Measured impulse responses (amplitude in Pa over time t in ms)
for the scenarios A1, A2, and R2 and the centremost locations in
the source and receiver plane. The dash-dotted
vertical lines indicate the truncation index $\tilde{t}$.
The mean and standard deviation of temperature and
the speed of sound for each of the scenarios are given in
Table 2. The speed of sound has been calculated according
to [8,9] (see footnote 1). The table shows that the temperature
and the speed of sound are almost identical across all scenarios,
with absolute differences of the means of $\Delta\mu < 1\,^\circ\mathrm{C}$ and
$\Delta\mu \le 0.6\,\mathrm{m\,s^{-1}}$, respectively, which is expected
due to the fairly constant
environmental conditions inside the anechoic chamber.
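Table 2: Mean $\mu$ and standard deviation $\sigma$ of the temperature and speed of sound for each experiment.
Scenario | Temperature [°C] | Speed of Sound [m s$^{-1}$]
A1 | $\mu = 21.6$, $\sigma = 0.12$ | $\mu = 344.8$, $\sigma = 0.07$
D1 | $\mu = 21.8$, $\sigma = 0.01$ | $\mu = 345.0$, $\sigma = 0.01$
A2 | $\mu = 22.3$, $\sigma = 0.05$ | $\mu = 345.3$, $\sigma = 0.03$
R2 | $\mu = 22.5$, $\sigma = 0.02$ | $\mu = 345.4$, $\sigma = 0.01$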
Footnote 1: An atmospheric pressure of 101.325 kPa and a carbon dioxide
mole fraction of 0.0004 were used. A generic value of 38% was
used for the relative humidity, approximating the humidity
conditions throughout the experiments.
Figure 4: Magnitude of the frequency response $|H_{2081,64}|$ (in dB over
frequency f in Hz) of the measured transfer functions for the scenarios
A1, A2, and R2 at the centremost locations in
the source and receiver plane.
3.2. Loudspeaker Directivity
Fig. 5 shows the directivity D and the directivity index DI
of the loudspeaker measured with a dispersion measurement
turntable in the azimuthal plane. In this work, the
directivity is defined as the ratio between the measured
squared sound pressure $p_\mathrm{RMS}^2(\theta, f)$ at an angle $\theta$ and the
maximum among all angles, i.e.
$$D(\theta, f) = 10 \log_{10}\left( \frac{p_\mathrm{RMS}^2(\theta, f)}{\max_{\phi \in [0, 2\pi]} p_\mathrm{RMS}^2(\phi, f)} \right).$$
The directivity index under the assumption of rotational
symmetry is expressed as
$$\mathrm{DI}(f) = 10 \log_{10}\left( \frac{4\pi\, p_\mathrm{RMS}^2(0, f)}{2\pi \int_0^{\pi} p_\mathrm{RMS}^2(\phi, f) \sin(\phi)\, \mathrm{d}\phi} \right),$$
where $p_\mathrm{RMS}^2(0, f)$ represents the squared sound pressure
in front of the speaker.
It is seen that the loudspeaker exhibits a radiation pattern
similar to that of a monopole up to a frequency of
2 kHz. Above this frequency, the directivity index increases.
Still, the directivity observed by the microphone
array is close to that of a monopole at relevant radiation angles,
i.e. $\theta \le \theta_{\max} = 67.3^\circ$, as indicated by the dashed line in
Fig. 5.
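The directivity index integral can be evaluated numerically on the measured angle grid. The sketch below is an illustration under assumptions: it uses a simple trapezoidal rule on a 2.5° grid from 0° to 180° and two synthetic radiation patterns (an ideal monopole and an ideal cardioid) instead of the measured loudspeaker data; the function name is hypothetical.

```python
import numpy as np


def directivity_index(theta, p_sq):
    """Directivity index in dB for a rotationally symmetric pattern.

    theta : angles in rad from 0 to pi (ascending), theta[0] is on-axis
    p_sq  : squared RMS sound pressure at each angle
    """
    integrand = p_sq * np.sin(theta)
    # Trapezoidal rule for the integral over the polar angle.
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(theta))
    return 10.0 * np.log10(4.0 * np.pi * p_sq[0] / (2.0 * np.pi * integral))


theta = np.deg2rad(np.arange(0.0, 180.0 + 2.5, 2.5))   # 2.5 degree resolution

di_monopole = directivity_index(theta, np.ones_like(theta))
di_cardioid = directivity_index(theta, (0.5 * (1.0 + np.cos(theta))) ** 2)
print(round(di_monopole, 2), round(di_cardioid, 2))
```

An omnidirectional source yields a directivity index of about 0 dB, while the cardioid yields roughly 4.8 dB, which gives a feeling for the scale of the DI axis.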
Figure 5: Directivity D (in dB, over angle $\theta$ in degrees and frequency f in Hz) and directivity index DI (in dB over frequency) of the loudspeaker. The maximum opening angle across all experiments
is denoted by $\theta_{\max}$; the frequency limits $f_l$ and $f_u$ are marked.
3.3. Positional Validation
Several uncertainty factors affected the spatial alignment
precision regarding the microphone array centre and the
centre of the observation area. These factors include mea-
surement uncertainties with regard to the utilized cross-
line laser and distance meter as well as mechanical back-
lash, which occurred primarily with horizontal changes of
direction. Therefore, a systematic spatial offset within
the range of a few millimetres can be assumed.
Due to the anechoic environment and the use of a large-
scale microphone array enabling an excellent spatial reso-
lution, Conventional Frequency Domain Beamforming [32]
serves as an appropriate method to obtain an estimate of
the actual source location. The large number of acoustic
cases also permits a statistical approach to determine the
spatial offset for a measurement scenario and to quantify
the uncertainty regarding the source position information.
Beamforming
Let $\omega_k = 2\pi k / n_d$ with $k \in [-n_d/2, n_d/2] \subset \mathbb{Z}$ and let
$$H(e^{\imath\omega_k}) = \begin{bmatrix} H_{i,1}(e^{\imath\omega_k}) & \dots & H_{i,n_o}(e^{\imath\omega_k}) \end{bmatrix}^{\top} \in \mathbb{C}^{n_o}$$
denote the transfer function measurements from the $i$-th
source at location $x_s$ for $i \in \{1, \dots, n_i\}$ to each of the
$n_o$ microphones. The cross-spectral matrix induced by a
sound source with unit strength is then given by
$$C(\omega_k) = H(e^{\imath\omega_k})\, H(e^{\imath\omega_k})^{*} \in \mathbb{C}^{n_o \times n_o}.$$
The beamforming result for an assumed source location
$x_s \in \mathbb{R}^3$ is then given by the square of the $C$-weighted
norm of the steering vector $a(x_s, \omega_k) \in \mathbb{C}^{n_o}$, i.e.
$$b(x_s, \omega_k) = \| a(x_s, \omega_k) \|^2_{C(\omega_k)} = a(x_s, \omega_k)^{*}\, C(\omega_k)\, a(x_s, \omega_k).$$
Many formulations of the steering vector can be found
in the literature. The formulations I and IV in [43] result
in a coincidence of the beamformer’s steered response
power maximum and the actual source location for a single
monopole source radiating under free-field conditions.
In this work, formulation IV was used, which defines the
entries of $a$ via
$$\{a(x_s, \omega)\}_j = \frac{e^{\imath\omega (r_j - r_0)/c}}{r_j \sqrt{n_o \sum_{k=1}^{n_o} r_k^{-2}}},$$
where $r_j = \| x_s - x_j \|_2$ is the distance between the assumed
source location $x_s$ and the $j$-th microphone location $x_j$,
and $r_0 = \| x_s - x_0 \|_2$ is the distance between $x_s$ and the
reference position, in this case the origin of the coordinate
system.
Validation of each measured source position commenced
with the spatial discretization of a neighbourhood around
the assumed source position. A 201 × 201 equidistantly
spaced focus grid with a resolution of $\Delta x = 0.5$ mm was
employed. The beamforming map was computed on the
discretized region for every frequency in the range
$$\Omega = \left[ 2\pi \frac{f_l}{f_d},\; 2\pi \frac{f_u}{f_d} \right],$$
which was chosen such that the lower frequency limit
$f_l = 2$ kHz enables a sufficiently high spatial resolution in
the resulting beamforming map, and the upper frequency
limit $f_u = 4$ kHz ensures that the wavelength is larger
than the loudspeaker diameter. The latter is important
to ensure that the loudspeaker has a radiation pattern
close to that of a monopole at relevant radiation angles in order
to meet the monopole assumption needed for the steering
vector formulation. As indicated by the dashed line in
Fig. 5, the radiation angle from the loudspeaker to any
microphone in the array is bounded by $\theta_{\max} = 67.3^\circ$. The
global spatial maximum is then determined by
$$\hat{x}_i = \arg\max_{x_s} \sum_{\omega \in \Omega} \hat{b}(x_s, \omega),$$
where $\hat{b}(x_s, \omega)$ denotes the amplitude-normalized beamforming
result
$$\hat{b}(x_s, \omega) = \frac{b(x_s, \omega)}{b(\hat{x}_s, \omega)},$$
with $b(\hat{x}_s, \omega)$ being the beamformer’s maximum output
among all source locations $x_s$ at a given frequency $\omega$. The
evaluation was conducted for different distances within a
range of up to ±12 mm around the assumed source distance
with a sampling interval of $\Delta z = 1$ mm to account
for a potential mismatch of the source plane distance.
Finally, the positional offset between the beamformer’s
prediction and the assumed source position is determined
by $\Delta x_i = \hat{x}_i - x_i$.
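The localization step can be sketched with synthetic data. The example below is not the evaluation code used for the dataset: the random 16-microphone array, the coarse 9 × 9 focus grid, the three frequencies and the ideal monopole transfer functions are all assumptions. The phase sign of the steering vector is chosen here to match the synthetic transfer functions (which use a delay of the form exp(-iωr/c)); the appropriate sign in practice depends on the Fourier transform convention.

```python
import numpy as np

C_SOUND = 345.0  # assumed speed of sound in m/s


def steering_vector(x_s, mics, omega):
    """Distance-normalized steering vector (formulation IV style), reference at origin."""
    r = np.linalg.norm(mics - x_s, axis=1)
    r0 = np.linalg.norm(x_s)
    scale = r * np.sqrt(len(mics) * np.sum(r ** -2.0))
    return np.exp(-1j * omega * (r - r0) / C_SOUND) / scale


def beamforming_map(H, grid, mics, omega):
    """Evaluate b = a^* C a with C = H H^* for every focus point."""
    csm = np.outer(H, H.conj())
    b = np.empty(len(grid))
    for n, x_s in enumerate(grid):
        a = steering_vector(x_s, mics, omega)
        b[n] = np.real(a.conj() @ csm @ a)
    return b


rng = np.random.default_rng(1)
mics = np.column_stack([rng.uniform(-0.75, 0.75, (16, 2)), np.zeros(16)])
ticks = np.arange(-4, 5) * 0.005                       # 5 mm focus grid
grid = np.array([[x, y, 0.734] for x in ticks for y in ticks])
source = grid[30]                                      # true position (on the grid)

summed = np.zeros(len(grid))
for f in (2000.0, 3000.0, 4000.0):
    omega = 2.0 * np.pi * f                            # physical angular frequency
    r = np.linalg.norm(mics - source, axis=1)
    H = np.exp(-1j * omega * r / C_SOUND) / r          # ideal monopole transfer functions
    b = beamforming_map(H, grid, mics, omega)
    summed += b / b.max()                              # amplitude-normalized map

est = grid[np.argmax(summed)]
print(est, source)
```

Because the steering vectors have the same norm at every focus point, the map attains its maximum where the steering vector is parallel to the measured transfer functions, which for an ideal monopole is the true source position.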
Statistical Evaluation
The systematic positional offset between the centre of the
observation area and the microphone array in the horizon-
tal and vertical direction can be statistically determined
by using the estimates $\Delta x_i \in \mathbb{R}^2$ for each individual measured
source position. Thereby, each estimated positional
deviation $\Delta x_i$ can be seen as a realization of the jointly
distributed random variables $R_x, R_y$ with the joint Probability
Density Function (PDF) $f_{R_x,R_y}(\Delta x_i)$. It is assumed
that the individual positional offset estimations $\Delta x_i$ are
symmetrically distributed around the true positional offset
due to the approximate symmetry of the microphone
array and observation plane around the origin. Then, the
true positional offset corresponds to the deviation associated
with the greatest probability. A simple method
to determine the joint PDF of jointly distributed random
variables based on a finite set of samples is kernel
density estimation [39], denoted by
$$\hat{f}_{R_x,R_y}(\Delta x_i) = \frac{1}{N} \sum_{n=1}^{N} K_h\left( \Delta x_i - \Delta x_i^{(n)} \right),$$
where $N$ refers to the sample size and $K_h$ is the so-called
kernel. A bivariate Gaussian kernel with bandwidth $h$ was
used, where $h$ was chosen according to Silverman’s
rule of thumb [46].
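A kernel density estimate of this kind can be computed with `scipy.stats.gaussian_kde`, which supports Silverman's rule via `bw_method="silverman"`. The sketch below uses synthetic offsets drawn around an assumed "true" offset; the sample size, spread and evaluation grid are illustrative assumptions, and the mode is read from the joint density on a grid rather than from the marginal distributions.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
true_offset = np.array([-4.6, 1.4])                    # assumed offset in mm
offsets = true_offset + rng.normal(scale=1.5, size=(2000, 2))

# Bivariate Gaussian KDE; gaussian_kde expects data of shape (dims, samples).
kde = gaussian_kde(offsets.T, bw_method="silverman")

ticks = np.arange(-10.0, 10.0 + 0.25, 0.25)            # evaluation grid in mm
gx, gy = np.meshgrid(ticks, ticks, indexing="ij")
density = kde(np.vstack([gx.ravel(), gy.ravel()]))

best = int(np.argmax(density))
mode = np.array([gx.ravel()[best], gy.ravel()[best]])  # most probable offset

# Spread of the residuals after subtracting the estimated offset.
lo, hi = np.percentile(offsets - mode, [2.5, 97.5], axis=0)
print(mode, lo, hi)
```

The location of the density maximum serves as the offset estimate, and the 2.5 and 97.5 percentiles of the residuals give an interval for the remaining positional uncertainty.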
Offset Correction
The first step of the correction procedure was to determine the
distance correction $\Delta z$ between the loudspeaker and the microphone
array plane for the experiments {A1, D1} and {A2}. The
joint PDF was estimated individually for each evaluated
distance $\Delta z$. Note that source cases from experiment R2
were excluded from the statistical evaluation since the
ground plate reflections would introduce an additional
disruptive factor in the positional estimation. It is assumed
that the true distance minimizes the variance along
any direction associated with $\hat{f}_{R_x,R_y}(\Delta x_i)$, i.e. the spectral
norm of the covariance matrix $\Sigma_{\Delta x_i}(\Delta z)$ is minimized,
such that
$$\arg\min_{\Delta z} \| \Sigma_{\Delta x_i}(\Delta z) \|_2.$$
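The selection of the distance correction can be sketched as follows. The spectral norm of a covariance matrix is its largest eigenvalue, available as `np.linalg.norm(cov, 2)`. The synthetic offsets below are an assumption: their spread is constructed to grow with the mismatch from a chosen value of 4 mm, purely to demonstrate the selection rule.

```python
import numpy as np


def covariance_spectral_norm(offsets):
    """Largest eigenvalue of the 2x2 covariance of offsets with shape (N, 2)."""
    return np.linalg.norm(np.cov(offsets, rowvar=False), 2)


rng = np.random.default_rng(0)
base = rng.normal(size=(500, 2))          # unit-spread offsets in mm (synthetic)
candidates = np.arange(-12, 13)           # candidate corrections in mm, 1 mm steps

# Assumed behaviour: the scatter widens with the distance mismatch |dz - 4|.
norms = [covariance_spectral_norm(base * (1.0 + 0.2 * abs(dz - 4)))
         for dz in candidates]
best_dz = int(candidates[int(np.argmin(norms))])
print(best_dz)
```

The candidate with the smallest spectral norm is taken as the distance correction, since the scatter of the localization results is tightest when the focus plane matches the true source plane.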
Fig. 6 shows the joint PDF with the smallest spectral
norm for the experiments {A1, D1}and {A2}. Based on
the joint PDF corresponding to the optimal distance cor-
rection ∆z, the true positional offset in vertical and hor-
izontal direction is determined from the maximum of the
corresponding marginal distributions depicted in Fig. 7.
Table 3 shows the positional offset correction values for
each of the experiments.
Table 3: Positional correction values for each experiment.
Scenarios | $\Delta x$ [mm] | $\Delta y$ [mm] | $\Delta z$ [mm]
A1, D1 | −4.6 | 1.4 | 4.0
A2, R2 | −5.2 | −0.4 | 6.0
With the correction offset applied, one can conclude
that the positional uncertainties regarding the true source
positions are in the order of a few millimetres. Given the
2.5 and 97.5 percentiles of the marginal distributions, the
positional uncertainty is in the range of [−3.6 mm,3.4 mm]
in x-direction and [−2.1 mm,3.5 mm] in y-direction for
the experiments {A1, D1}. Regarding the experiments
{A2, R2}, the positional uncertainty is in the range of
[−4.9 mm,1.4 mm] in x-direction and [−2.6 mm,3.7 mm]
in y-direction.
Abbreviations
DFT Discrete Fourier Transform
PDF Probability Density Function
RIR Room Impulse Response
Figure 6: Estimated joint PDF $f_{R_s}(\Delta x, \Delta y)$ of the positional deviations ($\Delta x$ and $\Delta y$ in mm) between the beamforming results and the assumed source
positions. The inner black circle corresponds to the outer rim of the loudspeaker and the outer black circle
indicates the outer rim of the enclosure box (left: Experiments {A1, D1}, right: Experiment A2).
Figure 7: Marginal distribution functions $f_{R_s}(\Delta x)$ and $f_{R_s}(\Delta y)$
(offset $\Delta$ in mm) characterizing
the positional offset between the microphone
array and the observation plane (left: Experiments
{A1, D1}, right: Experiment A2). The
dashed line indicates the positional offset corresponding
to the maximum of the corresponding
PDF. The dotted lines indicate the 2.5% and
97.5% percentiles.
Acknowledgments
The authors thank Arya Prasetya, Serdar Gareayaghi,
Can Kurt Kayser and Roman Tschakert for their help
with the experimental measurements and Fabian Brinkmann
for valuable insights into sweep synthesis and experiment
design.
The authors gratefully acknowledge the support of this
research by Deutsche Forschungsgemeinschaft through
projects 439144410 and 504367810.
References
[1] SciPy v1.11.4 manual, https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.resample_poly.html
(accessed 2023-12-18).
[2] E. J. Arcondoulis, Q. Li, S. Wei, Y. Liu, and
P. Xu,Experimental validation and performance analysis
of deep learning acoustic source imaging methods, in 28th
AIAA/CEAS Aeroacoustics Conference, Southampton,
UK, 6 2022, https://doi.org/10.2514/6.2022-2852.
[3] L. L. Beranek and T. J. Mellow, Acoustics: Sound
Fields and Transducers, Academic Press, an imprint of
Elsevier, Amsterdam, 1st ed., 2012.
[4] M. J. Bianco, P. Gerstoft, J. Traer, E. Ozanich,
M. A. Roch, S. Gannot, and C.-A. Deledalle,Ma-
chine learning in acoustics: Theory and applications, The
Journal of the Acoustical Society of America, 146 (2019),
pp. 3590–3628, https://doi.org/10.1121/1.5133944.
[5] G. Bologni, R. Heusdens, and J. Martinez,Acoustic
reflectors localization from stereo recordings using neu-
ral networks, in ICASSP 2021 - 2021 IEEE International
Conference on Acoustics, Speech and Signal Processing
(ICASSP), 2021, pp. 1–5, https://doi.org/10.1109/
ICASSP39728.2021.9414473.
[6] N. J. Bryan,Impulse Response Data Augmentation and
Deep Neural Networks for Blind Room Acoustic Param-
eter Estimation, in ICASSP 2020 - 2020 IEEE Inter-
national Conference on Acoustics, Speech and Signal
Processing (ICASSP), Barcelona, Spain, 5 2020, IEEE,
pp. 1–5, https://doi.org/10.1109/ICASSP40776.2020.
9052970.
[7] M. Cobos, J. Ahrens, K. Kowalczyk, and A. Poli-
tis,An overview of machine learning and other data-
based methods for spatial audio capture, processing, and
reproduction, EURASIP Journal on Audio, Speech, and
Music Processing, 2022 (2022), p. 10, https://doi.org/
10.1186/s13636-022-00242-x.
[8] O. Cramer,The variation of the specific heat ratio and
the speed of sound in air with temperature, pressure,
humidity, and CO2 concentration, The Journal of the
Acoustical Society of America, 93 (1993), pp. 2510–2516,
https://doi.org/10.1121/1.405827.
[9] R. S. Davis, Equation for the determination of the
density of moist air (1981/91), Metrologia, 29 (1992),
p. 67, https://doi.org/10.1088/0026-1394/29/1/008.
[10] S. Dilungana, A. Deleforge, C. Foy, and S. Faisan,
Learning-based estimation of individual absorption pro-
files from a single room impulse response with known po-
sitions of source, sensor and surfaces, in INTER-NOISE
and NOISE-CON Congress and Conference Proceedings,
vol. 263, 2021, pp. 5623–5630, https://doi.org/10.
3397/IN-2021-3186.
[11] A. Farina,Simultaneous Measurement of Impulse Re-
sponse and Distortion with Swept-sine technique, in 108th
AES Convention, Paris, France, 2 2000.
[12] A. Farina,Advancements in impulse response measure-
ments by sine sweeps, in 122nd AES Convention, Vienna,
Austria, 2007, p. 21.
[13] E. Fernandez-Grande, X. Karakonstantis,
D. Caviedes-Nozal, and P. Gerstoft,Genera-
tive models for sound field reconstruction, The Journal
of the Acoustical Society of America, 153 (2023),
pp. 1179–1190, https://doi.org/10.1121/10.0016896.
[14] A. Francl and J. McDermott,Deep neural net-
work models of sound localization reveal how perception
is adapted to real-world environments, Nature Human
Behaviour, 6 (2022), pp. 111–133, https://doi.org/10.
1101/2020.07.21.214486.
[15] H. Gamper and I. J. Tashev,Blind reverberation time
estimation using a convolutional neural network, in 2018
16th International Workshop on Acoustic Signal En-
hancement (IWAENC), Tokyo, Japan, 9 2018, pp. 136–
140, https://doi.org/10.1109/IWAENC.2018.8521241.
[16] S. Gannot, E. Vincent, S. Markovich-Golan, and
A. Ozerov,A consolidated perspective on multimi-
crophone speech enhancement and source separation,
IEEE/ACM Transactions on Audio, Speech, and Lan-
guage Processing, 25 (2017), pp. 692–730, https://doi.
org/10.1109/TASLP.2016.2647702.
[17] A. Geldert, N. Meyer-Kahlen, and S. J. Schlecht,
Interpolation of Spatial Room Impulse Responses Us-
ing Partial Optimal Transport, in ICASSP 2023 -
2023 IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP), Rhodes Is-
land, Greece, 2023, IEEE, pp. 1–5, https://doi.org/
10.1109/ICASSP49357.2023.10095452.
[18] P.-A. Grumiaux, S. Kitić, L. Girin, and A. Guérin,
A Survey of Sound Source Localization with Deep Learn-
ing Methods, The Journal of the Acoustical Society of
America, 152 (2022), pp. 107–151, https://doi.org/10.
1121/10.0011809.
[19] E. Guizzo, R. F. Gramaccioni, S. Jamili, C. Mari-
noni, E. Massaro, C. Medaglia, G. Nachira,
L. Nucciarelli, L. Paglialunga, M. Pennese,
S. Pepe, E. Rocchi, A. Uncini, and D. Comminiello,
L3DAS21 Challenge: Machine Learning for 3D Audio
Signal Processing, in Proceedings of the International
Workshop on Machine Learning for Signal Processing
(MLSP), Gold Coast, Australia, 10 2021, IEEE, https:
//doi.org/10.1109/MLSP52302.2021.9596248.
[20] E. Guizzo, C. Marinoni, M. Pennese, X. Ren,
X. Zheng, C. Zhang, B. Masiero, A. Uncini,
and D. Comminiello,L3DAS22 Challenge: Learn-
ing 3D Audio Sources in a Real Office Environment,
in Proceedings of the ICASSP, Singapore, Singapore, 5
2022, IEEE, pp. 9186–9190, https://doi.org/10.1109/
ICASSP43922.2022.9746872.
[21] Y. Haneda, Y. Kaneda, and N. Kitawaki,Common-
acoustical-pole and residue model and its application to
spatial interpolation and extrapolation of a room trans-
fer function, IEEE Transactions on Speech and Audio
Processing, 7 (1999), pp. 709–717, https://doi.org/10.
1109/89.799696.
[22] P. C. Hansen,Rank-Deficient and Discrete Ill-Posed
Problems: Numerical Aspects of Linear Inversion, SIAM
Monographs on Mathematical Modeling and Computa-
tion, SIAM, Philadelphia, 1998, https://doi.org/10.
1137/1.9780898719697.
[23] T. Hironori, K. Ole, N. P. A, and H. Hareo,
Inverse filter of sound reproduction systems using
regularization, IEICE Trans. Fundamentals, A, 80
(1997), pp. 809–820, https://cir.nii.ac.jp/crid/
1572824502324497664 (accessed 2024-03-13).
[24] M. Holters, T. Corbach, and U. Zölzer, Impulse
response measurement techniques and their
applicability in the real world, in 12th Int. Con-
ference on Digital Audio Effects (DAFx-09), 2009,
https://www.dafx.de/paper-archive/details.php?
id=1u-OdqevtbweDYmNY2_kuA (accessed 2024-03-13).
[25] J. Huang and T. Bocklet,Intel Far-Field Speaker
Recognition System for VOiCES Challenge 2019, in Proc.
Interspeech 2019, 2019, pp. 2473–2477, https://doi.
org/10.21437/Interspeech.2019-2894.
[26] F. Katzberg, R. Mazur, M. Maass, M. Böhme,
and A. Mertins, Spatial interpolation of room impulse
responses using compressed sensing, in 2018 16th In-
ternational Workshop on Acoustic Signal Enhancement
(IWAENC), Tokyo, Japan, 9 2018, pp. 426–430, https:
//doi.org/10.1109/IWAENC.2018.8521390.
[27] A. Kujawski and E. Sarradj,Fast grid-free strength
mapping of multiple sound sources from microphone array
data using a Transformer architecture, The Journal of the
Acoustical Society of America, 152 (2022), pp. 2543–2556,
https://doi.org/10.1121/10.0015005.
[28] M. Lee and J.-H. Chang,Deep neural network based
blind estimation of reverberation time based on multi-
channel microphones, Acta Acustica united with Acus-
tica, 104 (2018), pp. 486–495, https://doi.org/10.
3813/AAA.919191.
Preprint. 2024-06-19
MIRACLE Dataset 10
[29] S. Lee, H.-S. Choi, and K. Lee,Yet another gen-
erative model for room impulse response estimation, in
2023 IEEE Workshop on Applications of Signal Pro-
cessing to Audio and Acoustics (WASPAA), New Paltz,
NY, USA, 10 2023, pp. 1–5, https://doi.org/10.1109/
WASPAA58266.2023.10248189.
[30] F. Llu´
ıs, P. Mart´
ınez-Nuevo, M. Bo Møller, and
S. Ewan Shepstone,Sound field reconstruction in
rooms: Inpainting meets super-resolution, The Journal of
the Acoustical Society of America, 148 (2020), pp. 649–
659, https://doi.org/10.1121/10.0001687.
[31] T. Lobato, R. Sottek, and M. Vorl¨
ander,Deconvo-
lution with neural grid compression: A method to accu-
rately and quickly process beamforming results, The Jour-
nal of the Acoustical Society of America, 153 (2023),
pp. 2073–2089, https://doi.org/10.1121/10.0017792.
[32] R. Merino-Mart´
ınez, P. Sijtsma, M. Snellen,
T. Ahlefeldt, J. Antoni, C. J. Bahr, D. Blacodon,
D. Ernst, A. Finez, S. Funke, T. F. Geyer, S. Hax-
ter, G. Herold, X. Huang, W. M. Humphreys,
Q. Lecl`
ere, A. Malgoezar, U. Michel, T. Padois,
A. Pereira, C. Picard, E. Sarradj, H. Siller, D. G.
Simons, and C. Spehr,A review of acoustic imaging
methods using phased microphone arrays, CEAS Aero-
nautical Journal, 10 (2019), pp. 197–230, https://doi.
org/10.1007/s13272-019-00383-4.
[33] J. G. Moreno-Torres, T. Raeder, R. Alaiz-
Rodr´
ıguez, N. V. Chawla, and F. Herrera,A unify-
ing view on dataset shift in classification, Pattern Recog-
nition, 45 (2012), pp. 521–530, https://doi.org/10.
1016/j.patcog.2011.06.019.
[34] S. M¨
uller and P. Massarani,Transfer-function mea-
surement with sweeps, Journal of the Audio Engineering
Society, 49 (2001), pp. 443–471, https://www.aes.org/
e-lib/browse.cfm?elib=10189 (accessed 2023-12-19).
[35] M. M¨
uller-Trapet,On the practical application of the
impulse response measurement method with swept-sine
signals in building acoustics, The Journal of the Acous-
tical Society of America, 148 (2020), pp. 1864–1878,
https://doi.org/10.1121/10.0001916.
[36] K. M¨
uller and F. Zotter,Auralization based on
multi-perspective ambisonic room impulse responses, Acta
Acustica, 4 (2020), 25, https://doi.org/10.1051/
aacus/2020024.
[37] K. Nagatomo, M. Yasuda, K. Yatabe, S. Saito, and
Y. Oikawa,Wearable Seld Dataset: Dataset For Sound
Event Localization And Detection Using Wearable De-
vices Around Head, in Proceedings of the ICASSP, Sin-
gapore, Singapore, 5 2022, IEEE, pp. 156–160, https:
//doi.org/10.1109/ICASSP43922.2022.9746544.
[38] S. G. Norcross, M. Bouchard, and G. A. Soulo-
dre,Inverse filtering design using a minimal-phase tar-
get function from regularization, in Audio Engineering
Society Convention 121, San Francisco, CA, USA, Oct.
2006, Audio Engineering Society, https://www.aes.org/
e-lib/browse.cfm?elib=13763 (accessed 2024-03-13).
[39] E. Parzen,On estimation of a probability density func-
tion and mode, The Annals of Mathematical Statistics,
33 (1962), pp. 1065–1076.
[40] K. Prawda, S. J. Schlecht, and V. V¨
alim¨
aki,Ro-
bust selection of clean swept-sine measurements in non-
stationary noise, The Journal of the Acoustical Society of
America, 151 (2022), pp. 2117–2126, https://doi.org/
10.1121/10.0009915.
[41] A. Ratnarajah, Z. Tang, R. Aralikatti, and
D. Manocha,Mesh2ir: Neural acoustic impulse response
generator for complex 3d scenes, in Proceedings of the
30th ACM International Conference on Multimedia, Lis-
boa Portugal, 10 2022, Association for Computing Ma-
chinery, New York, NY, United States, pp. 924–933,
https://doi.org/10.1145/3503161.3548253.
[42] M. R´
ebillat, R. Hennequin, ´
E. Corteel, and B. F.
Katz,Identification of cascade of hammerstein models
for the description of nonlinearities in vibrating devices,
Journal of Sound and Vibration, 330 (2011), pp. 1018–
1038, https://doi.org/10.1016/j.jsv.2010.09.012.
[43] E. Sarradj,Three-dimensional acoustic source mapping
with different beamforming steering vector formulations,
Advances in Acoustics and Vibration, (2012), 292695,
https://doi.org/10.1155/2012/292695.
[44] E. Sarradj,A Generic Approach To Synthesize Op-
timal Array Microphone Arrangements, in 6th Berlin
Beamforming Conference, Berlin, Germany, 2 2016,
Gesellschaft zur F¨orderung angewandter Informatik
(GFaI), pp. 1–12.
[45] M. R. Schroeder,New Method of Measuring Rever-
beration Time, The Journal of the Acoustical Society of
America, 37 (2005), pp. 409–412, https://doi.org/10.
1121/1.1909343.
[46] B. W. Silverman,Density estimation for statistics and
data analysis, Chapman & Hall/CRC monographs on
statistics and applied probability, Chapman and Hall,
London, 1986, https://cds.cern.ch/record/1070306.
[47] P. Srivastava,Realism in virtually supervised learning
for acoustic room characterization and sound source lo-
calization, theses, Universit´e de Lorraine, 2023, https:
//theses.hal.science/tel-04313405.
[48] P. Srivastava, A. Deleforge, A. Politis, and
E. Vincent,How to (Virtually) Train Your Speaker
Localizer, in Proc. INTERSPEECH 2023, Dublin, Ire-
land, 8 2023, ISCA, pp. 1204–1208, https://doi.org/
10.21437/Interspeech.2023-1065.
[49] W. Yu and W. B. Kleijn,Room acoustical parameter
estimation from room impulse responses using deep neu-
ral networks, IEEE/ACM Transactions on Audio, Speech,
and Language Processing, 29 (2021), pp. 436–447, https:
//doi.org/10.1109/TASLP.2020.3043115.
A. Experiment Equipment
Table 4 lists the hardware devices used in the experiments. The temperature sensor was calibrated after the
measurement campaign against a reference sensor with a temperature accuracy of ±0.1 °C.
Table 4: Utilized hardware devices.
Device                Manufacturer    Type              Usage
Microphones           GRAS            40PL-1 Short CCP  Sound pressure acquisition
Temperature Sensor    OMNI SENSORS    OT60-B (±0.8 °C)  Temperature acquisition
Acquisition System    SINUS           Typhoon           Data acquisition
Stepper Motor         Stepperonline   NEMA23            Axes positioning
Motor Control Unit    OpenBuilds      Blackbox X32      Control loudspeaker position
Amplifier             Klein & Hummel  Monoblock MB 80   Loudspeaker amplification
Turntable             Outline         ET2               Directivity measurement
Laser distance meter  PeakTech        2800A             Positional alignment
Cross line laser      Bosch           PCL20             Positional alignment
B. File Structure
The files A1.h5, A2.h5 and R2.h5 are each about 1.07 GB in size, and D1.h5 is about 302.3 MB. Their contents
are organized as follows:
<Dataset>
    data
        impulse_response    float32 array of shape (ni, no, nt) - measured impulse responses
        location
            receiver        float64 array of shape (no, 3) - microphone locations
            source          float64 array of shape (ni, 3) - corrected source locations
            source_raw      float64 array of shape (ni, 3) - uncorrected source locations
    metadata
        c0                  float32 array of shape (ni,) - speed of sound
        temperature         float32 array of shape (ni,) - ambient temperature
        sampling_rate       int64 - sampling rate
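As an illustration of how these entries fit together, the following sketch estimates the sample index at which the direct sound from one source position would be expected to arrive at one microphone. It combines a source location, a receiver location, the speed of sound and the sampling rate. The function name direct_path_delay and all numerical values are hypothetical and not taken from the dataset; free-field propagation, locations in metres and a speed of sound in m/s are assumed.

```python
import math


def direct_path_delay(source, receiver, c0, sampling_rate):
    """Expected direct-sound arrival, in samples, for one source/microphone pair.

    Assumes free-field propagation, locations in metres and c0 in m/s.
    """
    distance = math.dist(source, receiver)  # Euclidean distance
    return distance / c0 * sampling_rate


# Hypothetical values for illustration only (not taken from the dataset).
source = (0.0, 0.0, 0.0)
receiver = (0.3, 0.4, 1.2)
delay = direct_path_delay(source, receiver, c0=343.0, sampling_rate=48000)
print(f"expected direct-sound arrival near sample {delay:.1f}")
```

With the real files, one row of the source array, one row of the receiver array and the matching c0 entry would take the place of the example values.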
We also supply the file loudspeaker.h5, with a size of about 468 KB, which contains the directivity measurements
of the loudspeaker. Its contents are organized as follows:
<Dataset>
    data
        angle               float32 array of shape (73,) - measurement angles
        impulse_response    float32 array of shape (73, nt) - measured impulse responses
    metadata
        directivity         float32 array of shape (73, 513) - directivity D
        directivity_index   float64 array of shape (513,) - directivity index DI
        fftfreq             float64 array of shape (513,) - corresponding frequencies
        sampling_rate       int64 - sampling rate
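The frequency-dependent metadata entries appear to share one frequency axis: the 513 values of the directivity index, and presumably the 513 columns of the directivity, correspond to the 513 entries of fftfreq. A minimal sketch of looking up the directivity index at a frequency of interest follows. The helper nearest_bin and the example values are hypothetical, and the frequency axis is assumed to be sorted in ascending order.

```python
from bisect import bisect_left


def nearest_bin(fftfreq, frequency):
    """Index of the entry in the ascending sequence fftfreq closest to frequency."""
    k = bisect_left(fftfreq, frequency)
    if k == 0:
        return 0
    if k == len(fftfreq):
        return len(fftfreq) - 1
    # Choose the closer of the two neighbouring bins.
    return k if fftfreq[k] - frequency < frequency - fftfreq[k - 1] else k - 1


# Hypothetical frequency axis and directivity index values for illustration.
fftfreq = [0.0, 100.0, 200.0, 300.0]
directivity_index = [0.0, 0.5, 1.0, 1.5]
k = nearest_bin(fftfreq, 180.0)
print(fftfreq[k], directivity_index[k])
```

The same index k would select the matching column of the directivity array.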
C. Loading the Files
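Listing 1: Python code snippet for loading the data.
from pathlib import Path

from h5py import File

# Open the scenario file A1.h5 read-only and load the arrays into memory.
with File(Path('A1').with_suffix('.h5'), 'r') as f:
    ir = f['data']['impulse_response'][()]  # array of shape (ni, no, nt)
    fs = f['metadata']['sampling_rate'][()]  # sampling rate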
Listing 2: Matlab code snippet for loading the data.
ir = h5read('A1.h5', '/data/impulse_response');
fs = h5read('A1.h5', '/metadata/sampling_rate');
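Listing 1 reads the complete impulse response array into memory with [()]. Since the scenario files are up to about 1 GB in size, it can be preferable to read a single source position only. h5py datasets accept NumPy-style indexing and then read just the requested part from disk. The following is a minimal sketch of that idea. The helper read_source_position is hypothetical, and the axis order (source position first) is inferred from the shape (ni, no, nt) given in Appendix B.

```python
def read_source_position(f, index):
    """Return the impulse responses of all microphones for one source position.

    f is expected to be an open h5py.File with the layout of Appendix B.
    Indexing the dataset (instead of using [()]) reads only one slice,
    which should have shape (no, nt).
    """
    ir = f['data']['impulse_response'][index]
    fs = f['metadata']['sampling_rate'][()]
    return ir, fs
```

With h5py, this could be called as read_source_position(f, 0) inside a with File('A1.h5', 'r') as f: block, so that the file is closed again after reading.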