
TOWARDS ROBUST 3D FACE RECOGNITION FROM NOISY

RANGE IMAGES WITH LOW RESOLUTION

O. EBERS, T. EBERS, T. SPIRIDONIDOU, M. PLAUE, P. BECKMANN, G. BÄRWOLFF,
AND H. SCHWANDT

Abstract. For a number of different security and industrial applications,

there is the need for reliable person identification methods. Among these meth-

ods, face recognition has a number of advantages such as being non-invasive

and potentially covert. Since the device for data acquisition is a conventional

camera, other advantages of a 2D face recognition system are its low data cap-

ture duration and its low cost. However, the recent introduction of fast and

comparatively inexpensive time-of-flight (TOF) cameras for the recording of

2.5D range data calls for a closer look at 3D face recognition in this context.

One major disadvantage, however, is the low quality of the data acquired with

such cameras. In this paper, we introduce a robust 3D face recognition system

based on such noisy range images with low resolution.

1. Introduction

There are a number of applications that require the identification of humans. Examples
include authentication for a computer application or access control for

high-security areas like an airport control tower. Face recognition systems are well

suited for the task of human identification as they require less cooperation by the

user than an iris or fingerprint scan. It is natural, robust and unintrusive, and the

user is not required to remember any passwords or codes [2]. While the automatic

face recognition on 2D images has been a research issue for several years, the recent

development of 3D sensors has resulted in a considerable interest in methods for

face recognition on range images.

In this project, we explored the state of the art of 3D face recognition and ana-

lyzed the advantages and disadvantages of several methods in regard to our project

goals. Our work resulted in the development of a real-time system for the process-

ing of three-dimensional data that is specialized on pattern recognition tasks. The

algorithms we chose to implement were modified according to the project’s needs

and were reinvestigated and recombined.

The result of our work is a general development platform for 3D pattern recog-

nition, specially designed for 3D face recognition on noisy and low-resolution data.

In this context the platform can be extended for the recognition of any kind of

3D objects and it can be easily enhanced by the supplementary processing of two-

dimensional intensity data.

Date: October 27, 2008.

2000 Mathematics Subject Classification. 68T45, 68U10.

Key words and phrases. 3D face recognition, time-of-flight camera, range data denoising, pat-

tern recognition, pattern matching.

This project was funded by the European Regional Development Fund (ERDF).


In order to develop a face recognition system based on range images—for example

acquired with the new 3D sensor type of time-of-flight (TOF) cameras—one has

to pay particular attention to the quality of the data since such data is still very

noisy and biased [23, 55]. For this reason our main goal was the development of

algorithms that improve low-quality range data and process it efficiently and in

real-time. Furthermore, our 3D face recognition system is constructed modularly,

and can thus be easily adapted to data of higher quality obtained by other sensors.

To deal with low-quality range data, one has to (a) calibrate the imaging sys-

tem with this particular application in mind, and (b) employ a pre-processing step

that filters and smoothes the image data to achieve a quality suitable for feature

extraction. The pre-processing algorithms have to account for the particular char-

acteristics of the range data at hand since for example the noise model of a TOF

sensor differs from the usual Gaussian white noise model assumed for the majority

of standard denoising methods.

After acquiring and pre-processing the data, one wishes to extract discriminant

and robust features. Again, it is important to consider the special nature of the

data which for example forbids a robust calculation of the curvature. In particular,

we have considered three features: the surface normals (or Gaussian map), the local

binary pattern (LBP) and facial profiles (1D cross sections of the face).

The final face recognition task can then be accomplished by the usual classifica-

tion methods such as Principal Component Analysis (PCA [4]), the Linear Discrim-

inant Analysis (LDA [47]) or the Modified Linear Discriminant Analysis (MLDA

[36]).

2. Related Work

While there exists extensive work on 2D face recognition, 3D face recognition is

still a comparatively new research field. As has been shown in several experimental

surveys [1, 14, 15, 32], in particular multi-modal approaches combining 2D and 3D

features give results that surpass those of a simple 2D system. One main disadvan-

tage of a face recognition system using range images, however, is the high cost of an

industrial high-resolution 3D scanner that is often needed to acquire the data. Most
of the 3D face recognition work published to date uses such laser or structured-light
scanners [40, 63]. One cost-effective way to record range data is of course

stereographic imaging [18]. However, it is well-known that such systems require a

robust solution for the correspondence problem [26] and precise calibration. The

also comparatively inexpensive time-of-flight imaging systems on the other hand

have been used in a substantial number of application areas such as automated

production [39, 46] or automotive applications [49, 57, 58, 67], while there are few

studies that investigate the feasibility of TOF imaging for more complex recogni-

tion tasks like facial recognition. One major problem that arises with the use of

cost-efficient 3D imaging systems like TOF is the low quality and resolution of the

data. The main goal of our project was the implementation of a software pipeline

capable of processing such data in real time, which will be described in the following
sections. Although the system is tailored for 3D face recognition from TOF

range images, it can be easily modified for other object recognition tasks based on

low-quality data.

Denoising of 2.5D Data. The first processing step in our pipeline aims at the

removal of noise present in typical range sensor data. The denoising of 3D data


and range data is a wide research field, and the choice of the appropriate denoising

method depends on the noise and data characteristics. Typical denoising methods

include the median filter [20], the moving least-squares method [43] and anisotropic

diffusion [64], especially anisotropic smoothing of point sets [37] and surface meshes

[30].

The wavelet transform is widely used for the purpose of image denoising and has
been found to be a high-performance tool. For example, Cai et al. provide a useful
MATLAB® framework [12] that we used in our work (a detailed description can be
found in [61] and [35]). A good introduction to complex wavelet transforms
and their applications can be found in [62].

Point-to-Point Registration. Another crucial step is the face registration which

aims at detecting the exact face position and attempts to align the face with a

position suitable for recognition tasks, which is usually the frontal view.

For the coarse registration, one common practice is to identify the position of

three significant local features, for example the pupils and the nose tip. Afterwards,

the features are mapped onto the corresponding features of a reference face by an

affine map consisting of a rotation and a translation (cf. [66]). The parameters

of this map express the feature points’ relation to the corresponding points in the

reference face. Via the affine map determined in this way, all data points are

subsequently transformed to realize the coarse alignment along a position that is

common for all faces in the database.

Common algorithms for fine alignment, on the other hand, are the family of Iterative
Closest Point (ICP) algorithms, which try to minimize the Hausdorff distance (or
one of its various relatives) between surfaces, and the Thin Plate Spline (TPS)
algorithm [42]. Chen et al. [17] and Besl et al. [6] use ICP for scan registration during
3D model creation. In this context, ICP can be used for fine face alignment by
fitting the face data onto the reference face. An exhaustive overview of ICP
algorithms is provided by Rusinkiewicz et al. [59].

An interesting variant of the aforementioned (rigid) ICP-based registration was

proposed by Bronstein et al. [9, 10], who used the Gromov–Hausdorff distance to

compute inter-facial embeddings with minimal metric distortion, thereby enhancing

the registration toolbox with the ability to match faces with different expressions

against each other.

As a generalization of the Hausdorff distance, which is usually expressed as a

min–max problem of the maximal distance of two sets (using the metric of their

common embedding metric space), the Gromov–Hausdorff distance minimizes the

maximal inner-metric distortion among all common ambient metric spaces and

all possible embedding mappings, thus rendering the Gromov–Hausdorff distance

independent of isometries. Since the computation of the functional as described

here is intractable, the authors propose a discretization of the Gromov–Hausdorff

distance in terms of mutual inter-surface embeddings, thus minimizing the metric

distortion while embedding one surface onto the other and vice versa.
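For reference, the plain Hausdorff distance between two finite point sets in a common Euclidean space can be written down in a few lines. The following is only an illustrative sketch of the min–max formulation mentioned above; it is not the Gromov–Hausdorff discretization of [9, 10], and the function names are our own.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from a point of `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Two small point clouds; the second one has an extra, far-away point.
set_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
set_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
print(hausdorff(set_a, set_b))
```

The brute-force search costs one distance evaluation per pair of points, which is acceptable for a sketch but not for dense facial surfaces.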

Once this distance functional and its corresponding embeddings are computed,

the resulting distance value can be used directly for registration tasks by interpret-

ing it as a similarity measure between faces. Moreover, the resulting embeddings

carry an optimal inter-facial point-to-point correspondence regardless of the actual

facial expressions involved. Still, as in the case of ICP, the process relies heavily on

a previous rough initialization of a few feature points.


Face Recognition. For the final task of face recognition for 2.5D images or 3D

models, three main methodologies can be identified: shape matching, feature-based

and image-based techniques. A detailed overview on face recognition methods is

provided in [2].

The first group consists of algorithms that iteratively try to map a 3D point

cloud or a 3D mesh onto a reference point cloud or reference mesh, respectively

[3, 5, 9, 10, 11, 19, 65]. The shape matching methods can be seen as pattern

recognition methods without feature extraction: a test pattern is directly compared
with the reference pattern. The similarity

measure—which is often implemented via a correlation measure—can be optimized

by using a sufficiently large number of training samples. These approaches demand

an extensive computational effort and an accurate point-to-point data registration

and assume the existence of many correspondences between the reference model

and the test data.

The modus operandi of feature-based methods corresponds mostly to that of shape
matching. However, with feature-based methods, not the whole data set is processed but only

appropriate subsets. For example, particular regions (eye, forehead, cheek, nose)

or the nose profile of the face could be detected, extracted and processed [5, 16,

25, 42, 45]. Like shape matching methods, the feature-based methods demand a

robust image registration, since the features are selected during a pre-processing

step without the possibility to change their value later on.

Image-based methods attempt to extract the face data subset significant for face

recognition with the aid of statistical learning techniques and without any human

interaction. With image-based methods, there are no or at least fewer pre-processing
steps involved than is the case with feature-based methods: all of the image information
is used for statistical analysis. These methods have been very successful in the

context of 2D face recognition [4, 47]. Since the TOF sensor data is a 2D range

distribution and can thus technically be viewed as a conventional 2D image, it is
not surprising that these techniques are also applied in this context. Introductions to

state-of-the-art techniques of statistical learning and statistical pattern recognition

can be found in [7, 21, 33].

In our approach, we use the statistical learning techniques with Local Binary

Patterns (LBP [31, 50]) and surface normals, thereby proposing a combination of a

feature-based and an image-based method: There is less information lost with this

technique, since the whole image and not some preselected region is used for the

feature extraction and subsequent classification. As a statistical learning method,

we used the Principal Component Analysis (PCA [4]), the Linear Discriminant

Analysis (LDA [47]) and the Modified Linear Discriminant Analysis (MLDA [36])

for classification.

As an alternative feature-based approach, we used different profiles of the face

(see e.g. [51]) for classification via the Pearson coefficient.

3. The General Setup

Our project, funded by the European Regional Development Fund (ERDF), was

concerned with the processing of facial biometric data in the context of the descrip-

tion of pedestrian movement. To obtain data for crowd movement models that

account for the position of individuals [27] (in contrast to a crowd fluid [28]) it is

necessary to identify those individuals with unintrusive biometric techniques. A


more specific application would be the analysis of commuter behaviour in public

transportation: the usual systems available at the time of writing of this article only

count passengers without recognizing individuals changing means of transportation.

To achieve this task of processing and classifying individual biometrical information,
we developed a 3D face recognition system that is able to cope with low-quality
range data. This software platform was implemented as a toolbox for the
MATLAB® scripting language.

Multiple modules for pre-processing and the actual face recognition were im-

plemented and tested separately (cf. figure 1). For almost every module, we have

developed and implemented alternative approaches to adjust the system to different

application requirements. Depending on operating conditions and available capaci-

ties, the user can choose from a variety of individual modules and algorithms. The

software contains conventional methods for 3D face recognition as well as unique

and novel ideas.

Figure 1. Software pipeline

As a main result of this project we implemented a robust real-time face recogni-

tion system from an innovative multi-modal approach that accounts for the typical

characteristics of low-quality data obtained with a TOF sensor by combining 3D

and 2D techniques that can deal with low-resolution images and little preliminary

pre-processing capacities.

Since we would like to compare the performance of the system for data obtained

from various sources, we implemented a simulation pipeline to emulate different

noise characteristics and resolutions (see figure 2). The simulation pipeline features

additional modules and algorithms for gradually degrading the pre-processed and

comparatively noise-free laser scanner data from the Gavab database towards the

data quality of a realistic cost-effective real-time ranging system. This simulation

served as an important tool for assessing the sensor’s requirements like resolution

and signal-to-noise ratio.

Figure 2. Simulation of low-quality sensor data

In figure 1, the flow chart of the final system is illustrated.


The pipelines can be divided into four main steps which will be described in the

following: noise simulation, pre-processing, feature extraction and classification.

Step 1: Simulation of Low-Quality Range Data

In order to assess the effectiveness of the implemented denoising algorithms for

data acquired from different types of range sensors, we decided to simulate the

noise characteristic of a standard continuous-wave TOF sensor. Such a ranging

system provides a number of advantages:

• It is less expensive than most other 3D imaging systems,
• suitable for real-time applications,
• not intrusive and suitable for covert operation (since it works with infrared light),
• space saving,
• also delivers 2D intensity data,
• delivers spatiotemporal data.

Since the TOF sensor independently delivers the geometry of an object as well

as the intensity of the reflected infrared light, one could in principle also obtain

information about the surface reflectivity of that object (at that particular
wavelength). A fairly exhaustive survey of optical time-of-flight measurement is

given in [38].

Unfortunately, as can be seen from figure 3, data acquired with standard TOF

sensors available at the time of writing of this article is subject to significant sys-

tematic and random errors (noise). The random errors are mainly caused by

• shot noise,
• quantization noise,
• phase jitter and modulation errors.
Systematic errors, on the other hand, are caused by, among other things:
• nonlinear characteristics (saturation effects),
• dark current,
• edge and movement artifacts due to mixed phases,
• reflecting surfaces,
• scattered light.

It is also for this reason that the simulation of noisy range data is necessary to

account for future improvements in the image quality of such systems.

Noise Characteristic of a TOF Range Sensor. The precise technical method

by which the optical signal received by a TOF sensor is measured and processed

depends on the particular camera model, see e.g. [54, 60]. However, we suspect the

noise characteristic to be very similar for the various systems since most of them

use the phase-shifting technique, for which it can be shown (assuming fixed
environmental conditions and similar base characteristics for each pixel of the optical
chip) that the standard deviation $\sigma$ of the range measurement is inversely
proportional to the amplitude $A$ of the optical signal [23]: $\sigma \propto 1/A$.

In particular, the most popular 4-phase-shifting technique essentially leads to

estimating the angular component of a random variable with values in $\mathbb{R}^2$; the

mean length of this variable is given by the signal amplitude. This fact can be used

to simulate the noise pattern of a TOF sensor by generating a pseudo-random point

in the plane with normal distribution (by the Box–Muller method [8] for example)


and extracting the angular component. Further assuming a uniform reflectivity of

the facial surface (i.e. skin), the signal amplitude can be estimated by taking the

component of the surface normal projected onto the optical axis. Finally, this value

has to be multiplied by the luminosity of the infrared LEDs at this point which can

be estimated by considering the inverse-squared-distance law as well as a realistic

directional characteristic for the emitter [52]. This approach leads to a simple but

effective model for simulating the random noise characteristic of a generic TOF

sensor.
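The noise model described above can be sketched as follows. This is a minimal illustration under our own simplifying assumptions, not the simulation code of the project: Python's `random.gauss` stands in for the Box–Muller method, the emitter's directional characteristic is ignored, and the names `estimate_amplitude`, `simulate_tof_phase` and `phase_std` are hypothetical.

```python
import math
import random

def estimate_amplitude(normal_z, distance, emitter_power=1.0):
    """Amplitude ~ normal component along the optical axis times 1/d^2.

    Assumes uniform skin reflectivity and an isotropic emitter.
    """
    return emitter_power * abs(normal_z) / distance ** 2

def simulate_tof_phase(true_phase, amplitude, noise_std=1.0, rng=random):
    """Return a noisy phase estimate for one pixel."""
    # Noise-free signal: a point in the plane whose length is the amplitude.
    x = amplitude * math.cos(true_phase)
    y = amplitude * math.sin(true_phase)
    # Perturb the point with isotropic Gaussian noise ...
    x += rng.gauss(0.0, noise_std)
    y += rng.gauss(0.0, noise_std)
    # ... and extract the angular component.
    return math.atan2(y, x)

def phase_std(amplitude, n=20000, seed=0):
    """Empirical standard deviation of the simulated phase at phase 0."""
    rng = random.Random(seed)
    samples = [simulate_tof_phase(0.0, amplitude, 1.0, rng) for _ in range(n)]
    mean = sum(samples) / n
    return math.sqrt(sum((s - mean) ** 2 for s in samples) / n)
```

For amplitudes well above the noise level, doubling the amplitude roughly halves the empirical phase deviation returned by `phase_std`, in line with the reciprocal relation between the standard deviation and the amplitude stated above.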

It should be noted, however, that systematic errors are not accounted for. These

errors are either difficult to control or can be eliminated by a careful sensor calibra-

tion [34, 41]; denoising algorithms are not of much use to eliminate this problem.

Figure 3. Facial surface obtained from a SwissRanger 3000 range image

Step 2: Pre-processing

To improve the face recognition on range images, a pre-processing step is needed

which includes denoising, face detection, segmentation and registration.

Denoising of Range Data. The performance of the methods used to remove noise

from range images depends on the sensor features and environmental conditions like

the distance to the observed object, the reflectivity of the object’s surface as well

as the angle of incidence of the illumination. For the denoising step of our system,

we implemented wavelet denoising and normalized convolution.

The Discrete Wavelet Transform. For a 2D signal I(x, y), the discrete wavelet trans-

form (DWT) provides a multi-scale signal representation. The DWT is computed

by a high-/low-pass filter bank, iteratively applied on the low-pass signal output of

the previous stage. The multi-scale signal representation is then a collection of the

resulting sub-band output coefficients. The inverse discrete WT is calculated by an

iteratively applied synthesis filter bank.
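To make the filter-bank iteration concrete, the following sketch implements a 1D DWT with the Haar filter pair. It is an illustration only: the choice of the Haar wavelet, the restriction to 1D signals and the function names are our assumptions, and the signal length is assumed to be divisible by 2 to the power of the number of levels.

```python
import math

S = 1.0 / math.sqrt(2.0)  # normalization of the Haar filters

def haar_step(signal):
    """One analysis stage: low-pass and high-pass output, downsampled by 2."""
    low = [S * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    high = [S * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return low, high

def haar_dwt(signal, levels):
    """Iterate the filter bank on the low-pass output of the previous stage.

    Returns [low_L, high_L, ..., high_1] (coarsest scale first).
    """
    details = []
    low = list(signal)
    for _ in range(levels):
        low, high = haar_step(low)
        details.append(high)
    return [low] + details[::-1]

def haar_idwt(coeffs):
    """Synthesis filter bank: rebuild the signal from coarse to fine."""
    low = list(coeffs[0])
    for high in coeffs[1:]:
        rec = []
        for a, d in zip(low, high):
            rec.extend([S * (a + d), S * (a - d)])
        low = rec
    return low
```

A 2D transform can be obtained in the usual separable way by applying such a stage first along the rows and then along the columns of the image.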

The standard DWT is a powerful and non-redundant tool of signal processing,

with four major drawbacks (as discussed in [61]):

• oscillations in the neighborhood of singularities,


• lack of translational invariance,
• lack of directional invariance,
• missing phase information.

In order to remedy those drawbacks, a number of variants of the standard DWT have
been developed. One example of such a generalization of the DWT is the complex

wavelet transform (CWT).

The Complex Wavelet Transform. The CWT is computed in a way similar to the
DWT, but with a complex-valued scaling function $\phi_c(t)$ and a complex-valued base
wavelet $\psi_c(t)$:

$$\phi_c(t) = \phi_{re}(t) + i\,\phi_{im}(t), \tag{3.1a}$$

$$\psi_c(t) = \psi_{re}(t) + i\,\psi_{im}(t), \tag{3.1b}$$

where the indices re and im label the real and the imaginary part, respectively.
After projecting the signal onto the basis functions $2^{j/2}\,\psi(2^{j}t - n)$, one can calculate
the wavelet coefficients

$$d_c(j, n) = d_{re}(j, n) + i\,d_{im}(j, n) \tag{3.2}$$

with magnitude

$$|d_c(j, n)| = \sqrt{d_{re}^2(j, n) + d_{im}^2(j, n)} \tag{3.3}$$

and phase

$$\arg(d_c(j, n)) = \arctan\frac{d_{im}(j, n)}{d_{re}(j, n)}. \tag{3.4}$$

A redundant form of the CWT is the complex dual-tree wavelet transform (CDTWT),

which is discussed by Kingsbury and Selesnick in [35] and [61]. We will briefly de-

scribe the CDTWT in the next section.

The Complex Dual Tree Wavelet Transform. The complex dual-tree WT of a 2D
signal is obtained by the parallel computation of four conventional critically-sampled
separable 2D DWTs. The transform is therefore four times as expensive as
a DWT. On the other hand, given a suitable design of the upper and lower high-pass
and low-pass filters, the complex dual-tree WT can provide nearly ideal
shift invariance and directional selectivity in two or more dimensions, as opposed
to the critically-sampled discrete WT (see [35] and [61] for details). With this filter
design, the sub-band output of the upper discrete WT can be interpreted as the
real part and the sub-band output of the lower discrete WT as the imaginary part of
a complex dual-tree wavelet transform.

As the actual wavelet denoising algorithm, we used the implementation of soft
thresholding for 2D signal denoising as described in [12]. In this method, new values
$w_{\mathrm{new}}$ for the wavelet coefficients $w$ of all scales and sub-bands are computed via
equations (3.5). In the first step, the coefficients whose magnitude lies below a certain
threshold $T$ are set to zero. In the next step, the remaining coefficients are
scaled. In this way, the effect of small values in the high-frequency sub-bands of
the reconstructed 2D signal is decreased.


$$w_{\mathrm{new}} = \max(|w| - T,\ 0), \tag{3.5a}$$

$$w_{\mathrm{new}} = \frac{w_{\mathrm{new}}}{w_{\mathrm{new}} + T}\, w. \tag{3.5b}$$

The new coefficients $w_{\mathrm{new}}$ are used for an inverse wavelet transformation to
reconstruct the 2D signal.
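The two steps of (3.5) can be sketched for a single coefficient as follows. This is a minimal illustration rather than the framework of [12]; the function name `soft_threshold` is ours, and the explicit zero check only avoids a division by zero when both the shrunk value and the threshold vanish.

```python
def soft_threshold(w, T):
    """Apply (3.5a) and (3.5b) to one real or complex wavelet coefficient."""
    shrunk = max(abs(w) - T, 0.0)        # (3.5a): magnitudes below T become 0
    if shrunk == 0.0:
        return 0.0
    return shrunk / (shrunk + T) * w     # (3.5b): rescale, keeping sign/phase

def soft_threshold_all(coeffs, T):
    """Apply the rule to every coefficient of one sub-band (list of rows)."""
    return [[soft_threshold(w, T) for w in row] for row in coeffs]
```

For |w| > T the two steps combine to a coefficient of magnitude |w| − T with unchanged sign (or phase, for complex coefficients); for example, a coefficient of 5 with T = 2 is mapped to approximately 3.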

Normalized Convolution. Another approach for the adaptive denoising of an image
$I$ with given confidence values $C$ is the normalized convolution [22, 56]. One advantage
of this method is the fact that it can be easily generalized to data of arbitrary
dimension, since the denoised image $I'$ can be written invariantly in the form

$$I' = \frac{g * (C \cdot I)}{g * C}. \tag{3.6}$$

Here, $g$ is a suitable filter mask, for example a Gaussian kernel. The multiplication
is to be understood componentwise and $*$ denotes convolution. Using the inverse
amplitude of the optical signal as a confidence measure, the normalized convolution
can be used to filter TOF image sequences that are represented by 3D data.
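Equation (3.6) translates almost literally into code. The sketch below works on a single 2D image stored as a list of rows; the zero padding at the border, the symmetric Gaussian mask and all function names are our assumptions, and the confidence map is simply taken as an input.

```python
import math

def gaussian_kernel(radius, sigma):
    """Unnormalized, symmetric Gaussian filter mask g."""
    return [[math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
             for dx in range(-radius, radius + 1)]
            for dy in range(-radius, radius + 1)]

def convolve2d(img, kernel):
    """Same-size filtering with zero padding (kernel assumed symmetric)."""
    h, w = len(img), len(img[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[dy + r][dx + r] * img[yy][xx]
            out[y][x] = acc
    return out

def normalized_convolution(img, conf, kernel, eps=1e-12):
    """Compute g * (C . I) / (g * C) pixel by pixel, cf. (3.6)."""
    weighted = [[c * v for c, v in zip(crow, irow)]
                for crow, irow in zip(conf, img)]
    num = convolve2d(weighted, kernel)
    den = convolve2d(conf, kernel)
    # Pixels without any confident neighbor are set to 0 by convention.
    return [[n / d if d > eps else 0.0 for n, d in zip(nrow, drow)]
            for nrow, drow in zip(num, den)]
```

A pixel with confidence zero does not contribute to the result at all, so an isolated outlier is replaced by a weighted average of its confident neighbors.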

Segmentation and Registration. To eliminate the non-facial parts of an im-

age like the neck or shoulders, we invoked a multi-stage segmentation process, as

shown in figure 4. First, the images were sliced into several regions by using a

range threshold. Afterwards, the particular parts were analyzed morphologically

and the face-like regions were chosen for further processing. In the next step, the

nose tip was found for each of the detected faces. Around the nose tip, we set a

sphere of fixed radius r = 14 cm, which proved to deliver the best recognition rates

after some testing. We will call the partial surface cropped by the interior of this

sphere the sphere-cropped image. Afterwards, a rectangle-shaped region from the

corresponding 2.5D image was cut. This rectangle-cropped region was resized to

80 × 50 pixels in order to meet the classifier feature vector length. The individual

steps of this multi-stage segmentation are illustrated in figure 4.
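The sphere-cropping step amounts to a simple distance test once the nose tip is known. The following sketch assumes 3D points given in metres and a nose tip that has already been detected; the function name `sphere_crop` is hypothetical.

```python
import math

def sphere_crop(points, nose_tip, radius=0.14):
    """Keep the surface points inside a ball of the given radius (in metres)."""
    return [p for p in points if math.dist(p, nose_tip) <= radius]

# Toy example: the third point lies 20 cm from the nose tip and is dropped.
cloud = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0)]
face = sphere_crop(cloud, nose_tip=(0.0, 0.0, 0.0))
```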

Figure 4. Multi-stage segmentation. (a) Threshold-based segmentation, (b) segmentation by a
ball of radius r = 14 cm around the nose tip (sphere-cropped), (c) rectangle-cropped face region.

Before the segmentation process, an ICP algorithm was invoked to match the

facial surface with a reference template.


Step 3: Feature Extraction

In this step, the denoised, segmented and registered faces are passed to the feature

extraction part of the pipeline. In particular, we extracted three features: surface

normals, the local binary pattern and profiles.

Surface Normals. One of the earliest ideas in 3D face recognition is the use of

curvature as a discriminant feature [25]. Unfortunately, at least the Gaussian curva-

ture (which encodes valuable intrinsic geometric information) is highly susceptible

to noise, as shown in figure 5 (visualized with JavaView© [53]). The Gaussian

map (the distribution of surface normals) depends on the first derivatives of the

parametrization and is thus more robust than the curvature which is encoded in the

derivative of the Gaussian map [13]. Figure 6 shows the three cartesian components

of the surface normals. (However, since the normal vectors are normalized, they

actually represent 2-dimensional data on the unit sphere.) As one can easily see,

the components correspond to a standard gray-valued 2D image that one would

obtain from a human face if the reflectivity of the skin and the lighting were uni-

form. Thus, by using the Gaussian map as a feature one overcomes one of the most

serious problems in 2D face recognition, namely varying illumination.

A major concern, however, is a suitable representation of the sphere-valued

data. As a scalar function, the Gaussian curvature is independent of a specific

parametrization of the surface, and being an intrinsic feature it is even invariant

against isometric transformations of the facial surface. For the representation of

the distribution of surface normals on the other hand, one has to agree on a spe-

cific coordinate system like spherical or stereographic coordinates. It is not a trivial

problem to decide which coordinate system is optimal for a specific recognition task

at hand. In this study, we have tested polar stereographic coordinates. This means
that the surface normals $N = (N_1, N_2, N_3)$ were projected onto the complex plane
via the map

$$\pi(N) = \frac{1}{1 - N_3}\,(N_1 + i N_2) \tag{3.7}$$

and represented by polar coordinates $(N_3, \arg \pi(N))$.
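As a small illustration of (3.7), the projection and the resulting feature pair can be computed per pixel as sketched below. The function names are ours, and the normal is assumed to be a unit vector different from the projection pole (0, 0, 1), where the map is undefined.

```python
import cmath

def stereographic(normal):
    """Project a unit normal (N1, N2, N3) onto the complex plane, cf. (3.7)."""
    n1, n2, n3 = normal
    return complex(n1, n2) / (1.0 - n3)

def normal_feature(normal):
    """Return the pair (N3, arg pi(N)) used to represent the normal."""
    return normal[2], cmath.phase(stereographic(normal))
```

Since the factor 1/(1 − N3) is positive, the angle arg π(N) coincides with the azimuth of (N1, N2); the normal (0, 1, 0), for instance, yields the angle π/2.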

Local Binary Pattern (LBP). Originally, the LBP approach was developed for

the description and recognition of 2D textures [50]. The next step towards effi-

cient face recognition using LBP was done by the authors of [31] with 3D Local

Binary Patterns (3D LBP). They refined the feature value by adding three levels.

However, they still used the histogram comparison, which is not the optimal com-

parison method. We applied classifiers and compared the results of the different

conventional LBP and the different 3D LBP levels with the performance of other

features.

To compute the LBP value, we first calculated the differences between the gray-scale
values of a point and its P neighbors. For P, the values 8, 16 or 24 can

be chosen. The radius of the neighborhood defines an additional parameter R.

In a second step, the signs of the intensity differences are binarily coded as 0 for

a difference value of less than 0, and 1 for other difference values. Finally, the

resulting binary digits are collected in a clockwise fashion to represent a binary

number, which is then written as a decimal number. In this way, each point is

mapped to an LBP value.
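The steps above can be sketched for the simplest case P = 8 and R = 1. For brevity, the sketch uses the square 8-neighborhood instead of interpolated points on a circle; the starting neighbor (top left) and the bit order (first neighbor as most significant bit) are our assumptions.

```python
# Clockwise offsets (row, column) of the 8 neighbors, starting top left.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_value(img, r, c):
    """LBP code of the pixel (r, c) of a gray-scale or range image."""
    center = img[r][c]
    value = 0
    for dr, dc in OFFSETS:
        # Sign of the difference: 0 if negative, 1 otherwise.
        bit = 1 if img[r + dr][c + dc] - center >= 0 else 0
        value = (value << 1) | bit
    return value

def lbp_image(img):
    """LBP codes of all interior pixels (border pixels are skipped)."""
    h, w = len(img), len(img[0])
    return [[lbp_value(img, r, c) for c in range(1, w - 1)]
            for r in range(1, h - 1)]
```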


Figure 5. The computation of Gaussian curvature is susceptible to noise. (a) High-resolution
image obtained with a structured-light 3D scanner and color-coded Gaussian curvature;
(b) the same after adding low-power Gaussian noise.

Figure 6. Cartesian components of surface normals

The authors of [50] also describe an alternative technique that is rotation-invariant.

With this technique, each binary LBP number is shifted until the minimal value

is reached. To increase the efficiency of the histogram comparison, one can encode

all rare features with a single value. (Frequent features are edges, curvature lines
or homogeneous regions.) This variant of an LBP is called the uniform LBP. The
texture operator for the general case based on a circularly symmetric neighborhood
of P members on a circle of radius R will be denoted as $\mathrm{LBP}^{riu2}_{P,R}$ as in [50].
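Both variants operate on the P-bit code only and can be sketched as follows. The functions are illustrative and the names are ours; the mapping in `lbp_riu2` follows the definition in [50] as we understand it, where uniform patterns (at most two 0/1 transitions around the circle) are labeled by their number of set bits and all other patterns share the label P + 1.

```python
def rotate_min(code, P=8):
    """Rotation-invariant code: minimum over all circular bit shifts."""
    mask = (1 << P) - 1
    best = code
    for _ in range(P - 1):
        code = ((code >> 1) | ((code & 1) << (P - 1))) & mask
        best = min(best, code)
    return best

def transitions(code, P=8):
    """Number of 0/1 changes when walking once around the circular pattern."""
    rotated = (code >> 1) | ((code & 1) << (P - 1))
    return bin(code ^ rotated).count("1")

def lbp_riu2(code, P=8):
    """Uniform, rotation-invariant label of a P-bit LBP code."""
    if transitions(code, P) <= 2:
        return bin(code).count("1")
    return P + 1
```

With P = 8 this reduces the 256 possible codes to 10 labels, which keeps the histograms short.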

For the computation of 3D LBP values not only the signs of the gray-scale differ-

ences to the neighboring pixels are coded, but also the values themselves. Four bits

are used to keep track of a difference value: One bit for the sign and three bits for

the value. A difference less than 7 is coded directly, greater differences are coded

as 7. If P neighbors are considered, a 4 × P table is used to list the four 3D

LBP levels. The four P-bit binary numbers in the columns are decimally coded


and represent the four 3D LBP layers. A detailed description of this method can

be found in [31]. The 3D texture operator will be denoted as $\mathrm{3DLBP}_{\mathrm{layer}\ i}$ with
$i \in \{1, \dots, 4\}$. Some examples of different LBP feature images are shown in figure 7.
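The four-bit coding of the differences can be sketched as follows. This is our reading of the description above rather than the reference implementation of [31]: the differences are assumed to be integers, the sign bit is 1 for non-negative differences (as in the conventional LBP), and the three remaining bits hold the clipped absolute value from most to least significant bit.

```python
def encode_difference(diff):
    """Four bits per neighbor: sign bit plus 3-bit magnitude clipped to 7."""
    sign = 1 if diff >= 0 else 0
    mag = min(abs(int(diff)), 7)
    return [sign, (mag >> 2) & 1, (mag >> 1) & 1, mag & 1]

def lbp3d_layers(differences):
    """Return the four layer values for the P differences of one pixel."""
    table = [encode_difference(d) for d in differences]  # P entries of 4 bits
    layers = []
    for level in range(4):
        value = 0
        for bits in table:
            value = (value << 1) | bits[level]  # collect one bit per neighbor
        layers.append(value)
    return layers
```

The first layer reproduces the conventional LBP code, while the other three layers encode the size of the range differences.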

Reference Texture Features. The authors of [29] evaluated the recognition per-

formance of different gradient images and recommended the use of the Horizontal

Sobel Operator (HSO) and the Large Horizontal Gradient operator (LHG) as effec-

tive texture descriptors. We ranked the performance of LBP and surface normals

by using these descriptors and the original range images. The LHG describes the

relative range differences along the horizontal direction in a range of five pixels and
is computed via the filter mask $(\mathrm{LHG}_{ij}) = \begin{pmatrix} -1 & 0 & 0 & 0 & 1 \end{pmatrix}$. The HSO detects
vertical edges and is calculated via the filter mask

$$(\mathrm{HSO}_{ij}) = \begin{pmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{pmatrix}. \tag{3.8}$$

Figure 7 shows example images of the examined texture descriptors.
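Both reference descriptors are plain linear filters. The sketch below applies the two masks by correlation over the region where the mask fits entirely into the image; the border handling and the function name `apply_mask` are our assumptions.

```python
LHG = [[-1, 0, 0, 0, 1]]          # Large Horizontal Gradient mask
HSO = [[-1, 0, 1],
       [-2, 0, 2],
       [-1, 0, 1]]                # Horizontal Sobel Operator, cf. (3.8)

def apply_mask(img, mask):
    """Correlate the image with the mask wherever the mask fits entirely."""
    mh, mw = len(mask), len(mask[0])
    h, w = len(img), len(img[0])
    return [[sum(mask[i][j] * img[r + i][c + j]
                 for i in range(mh) for j in range(mw))
             for c in range(w - mw + 1)]
            for r in range(h - mh + 1)]

# A horizontal ramp: the range value grows by one unit per pixel.
ramp = [[0, 1, 2, 3, 4, 5] for _ in range(3)]
lhg_image = apply_mask(ramp, LHG)
hso_image = apply_mask(ramp, HSO)
```

On the ramp, the LHG response is constant (the range difference over five pixels), and the HSO response is constant as well, since the horizontal slope is the same everywhere.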

Figure 7. Feature images. a: range image, b: $\mathrm{3DLBP}_{\mathrm{layer}\ 1}$, c: $\mathrm{3DLBP}_{\mathrm{layer}\ 2}$,
d: $\mathrm{3DLBP}_{\mathrm{layer}\ 3}$, e: $\mathrm{3DLBP}_{\mathrm{layer}\ 4}$, f: Horizontal Sobel Operator,
g: Large Horizontal Gradient operator, h: $\mathrm{LBP}^{u2}_{8,1}$, j: $\mathrm{LBP}^{u2}_{16,1}$,
i: $\mathrm{LBP}^{riu2}_{16,2}$, k: $\mathrm{LBP}^{ri}_{8,1}$, l: $\mathrm{LBP}^{riu2}_{8,1}$

Facial Profiles. Extracting vertical profiles from range data is a natural approach, given that humans are quite capable of identifying a person by their facial silhouette alone. In addition, templates of profile data need very little memory, and their degree of correlation can be computed very quickly.


Here, four profiles (the vertical central profile, the horizontal nose-crossing profile, the horizontal root-of-the-nose-crossing profile and the horizontal forehead-crossing profile) have been investigated. The main difficulty consists in extracting the vertical central profile. Two methods can be used to achieve this: (a) finding the vertical symmetry plane of the face, or (b) aligning the face with a reference face to ensure a vertical position and subsequently detecting the nose tip.

Figure 8 illustrates the vertical symmetry plane, and figure 9 shows the extracted

profiles.

Figure 8. Symmetry plane of a face

Figure 9. Vertical profiles: by the symmetry plane detection method (yellow), by detecting the nose tip (red). Horizontal profiles: nose-crossing profile (green), root-of-the-nose-crossing profile (red), forehead-crossing profile (pink)

Step 4: Classification

In a pre-processing step, each 2D image of the $N$ training samples is converted into a 1D vector $x_i$ with $m$ components ($m$ = image height × image width in pixels) by successively appending the rows of the image. In this way, we obtain a data set $(x_1, x_2, \dots, x_N) \in (\mathbb{R}^m)^N$.


Principal Component Analysis. One of the classifiers most commonly used as a reference is the PCA technique. PCA is a method for dimensionality reduction that computes a linear projection operator $W_{PCA}$ which maximizes the determinant of the projected total scatter matrix of the $N$ image samples $x_1, x_2, \dots, x_N$:

$$W_{PCA} = \operatorname*{argmax}_{W} \left| W^T S_T W \right| \tag{3.9}$$

with

$$S_T = \sum_{k=1}^{N} (x_k - \mu)(x_k - \mu)^T, \tag{3.10}$$

where $\mu$ is the mean of all samples. The optimal projection matrix $W_{PCA}$ is composed of the eigenvectors belonging to the largest eigenvalues of the total scatter matrix $S_T$. The images corresponding to these eigenvectors are called eigenfaces [4].

The new feature vectors $y_k$ are given by

$$y_k = W_{PCA}^T x_k. \tag{3.11}$$
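Equations (3.9) to (3.11) can be illustrated on toy data. The sketch below is not the implementation used in our system: it uses plain Python and a simple power iteration to find only the leading eigenvector of $S_T$, whereas a practical system would rely on a numerical library and keep many eigenvectors. All function names are hypothetical.

```python
def mean_vector(samples):
    """Component-wise mean of a list of equally long vectors."""
    n = len(samples)
    return [sum(x[i] for x in samples) / n for i in range(len(samples[0]))]


def total_scatter(samples):
    """S_T = sum_k (x_k - mu)(x_k - mu)^T, returned as a list of rows."""
    mu = mean_vector(samples)
    m = len(mu)
    scatter = [[0.0] * m for _ in range(m)]
    for x in samples:
        d = [x[i] - mu[i] for i in range(m)]
        for i in range(m):
            for j in range(m):
                scatter[i][j] += d[i] * d[j]
    return scatter


def leading_eigenvector(matrix, iterations=200):
    """Power iteration for the eigenvector of the largest eigenvalue."""
    m = len(matrix)
    v = [1.0 / (i + 1) for i in range(m)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    return v


# Toy data: four "images" with m = 2 pixels, spread mainly along (1, 1).
samples = [[1.0, 1.2], [2.0, 1.8], [3.0, 3.1], [4.0, 3.9]]
w1 = leading_eigenvector(total_scatter(samples))
# One-dimensional features y_k = w1^T x_k, as in equation (3.11).
features = [sum(w1[i] * x[i] for i in range(2)) for x in samples]
```

For this data the leading eigenvector points roughly along the diagonal, so the one-dimensional features preserve the ordering of the samples along that direction.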

Linear Discriminant Analysis. With the PCA approach, the considered total scatter matrix $S_T$ includes not only the between-class scatter, which is useful for classification, but also the within-class scatter. In contrast to PCA, the LDA computes the projection matrix $W_{LDA}$ in such a way that the ratio of the between-class scatter to the within-class scatter is maximized:

$$W_{LDA} = \operatorname*{argmax}_{W} \frac{\left| W^T S_B W \right|}{\left| W^T S_W W \right|}, \tag{3.12}$$

where $S_W$ and $S_B$ describe the within-class scatter and the between-class scatter matrices, defined by

$$S_W = \sum_{j=1}^{C} \sum_{k=1}^{N_j} (x_{jk} - \mu_j)(x_{jk} - \mu_j)^T \tag{3.13}$$

and

$$S_B = \sum_{j=1}^{C} N_j (\mu_j - \mu)(\mu_j - \mu)^T. \tag{3.14}$$

Here $C$ is the number of classes, $\mu_j$ is the mean of the class $X_j$, $N_j$ is the number of training samples in class $X_j$, and $x_{jk}$ denotes the $k$-th sample of class $X_j$.
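The scatter matrices of equations (3.13) and (3.14) can be computed directly from their definitions. The following plain-Python sketch is for illustration only; the function names and the toy data are hypothetical.

```python
def mean_vector(vectors):
    """Component-wise mean of a list of equally long vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def add_outer(matrix, d, weight=1.0):
    """Add weight * d d^T to matrix in place."""
    for i in range(len(d)):
        for j in range(len(d)):
            matrix[i][j] += weight * d[i] * d[j]


def scatter_matrices(classes):
    """Return (S_W, S_B) for a list of classes, each a list of vectors."""
    m = len(classes[0][0])
    mu = mean_vector([x for cls in classes for x in cls])
    s_w = [[0.0] * m for _ in range(m)]
    s_b = [[0.0] * m for _ in range(m)]
    for cls in classes:
        mu_j = mean_vector(cls)
        for x in cls:
            # Within-class term (x_jk - mu_j)(x_jk - mu_j)^T
            add_outer(s_w, [x[i] - mu_j[i] for i in range(m)])
        # Between-class term N_j (mu_j - mu)(mu_j - mu)^T
        add_outer(s_b, [mu_j[i] - mu[i] for i in range(m)], weight=len(cls))
    return s_w, s_b


# Two persons (C = 2) with two training vectors each (N_j = 2, m = 2).
classes = [[[1.0, 2.0], [3.0, 2.0]],
           [[6.0, 5.0], [8.0, 7.0]]]
s_w, s_b = scatter_matrices(classes)
print(s_w)  # [[4.0, 2.0], [2.0, 2.0]]
print(s_b)  # [[25.0, 20.0], [20.0, 16.0]]
```

With these definitions the total scatter matrix of equation (3.10) decomposes as $S_T = S_W + S_B$, which is why PCA cannot separate the two contributions.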

Modified Linear Discriminant Analysis. With conventional LDA, the Fisher optimization criterion is in quotient form. As was shown in [4] and in [36], the quotient form can cause a numerical problem if the number of training images for each person in the database is insufficient. To avoid this problem, the fisherface approach was proposed in [4]. The dimensionality is reduced in two steps: first, without regarding the between-class differences, PCA is used to project the data set onto a subspace of dimension $N - C$; second, an LDA projection is applied. Another method was proposed by the authors of [36]: a modified Fisher optimization criterion in difference form,

$$W_{MLDA} = \operatorname*{argmax}_{W} \left| W^T (S_B - \alpha S_W) W \right|. \tag{3.15}$$

Here $\alpha$ is the adjusting parameter for the weighting of the within-class scatter matrix $S_W$ relative to the between-class scatter matrix $S_B$.
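To see why the difference form avoids the numerical problem of the quotient form, it helps to look at the simplest case. The short derivation below is an illustrative sketch under the assumption (ours, not taken from [36]) that $W$ consists of a single unit vector $w$:

```latex
% Assumption: W = w is a single column with w^T w = 1.
\[
  \max_{\|w\| = 1} \; w^T (S_B - \alpha S_W)\, w
  \;=\; \lambda_{\max}(S_B - \alpha S_W),
\]
% The maximum of this Rayleigh quotient is attained at an eigenvector w
% of the symmetric matrix S_B - \alpha S_W for its largest eigenvalue:
\[
  (S_B - \alpha S_W)\, w = \lambda_{\max}\, w .
\]
```

In this case only an eigen-decomposition of the symmetric matrix $S_B - \alpha S_W$ is needed and $S_W$ is never inverted, so a singular $S_W$ caused by few training images per person poses no difficulty. For $\alpha = 0$ the within-class scatter is ignored entirely; larger values of $\alpha$ penalize it more strongly.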

Pearson Coefficient. This method is ideally suited for the comparison of the

facial profiles mentioned above.

Correlation indicates the strength and direction of a linear relationship between two random variables (see e.g. [44] for details). The most commonly known correlation coefficient is the Pearson product-moment correlation coefficient $p(X, Y)$, which is obtained by dividing the covariance of the two variables by the product of their standard deviations. The correlation coefficient is thus defined as

$$p(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sqrt{\operatorname{cov}(X, X) \cdot \operatorname{cov}(Y, Y)}} = \frac{\operatorname{cov}(X, Y)}{\sqrt{\operatorname{var}(X)} \cdot \sqrt{\operatorname{var}(Y)}}, \tag{3.16}$$

where $\operatorname{cov}(X, Y)$ is the covariance of $X$ and $Y$:

$$\operatorname{cov}(X, Y) = E\big((X - E(X)) \cdot (Y - E(Y))\big). \tag{3.17}$$

The coefficient takes values between −1 and +1. A maximal positive linear

relationship is given if the coefficient is +1, a maximal negative linear relationship

is given by a coefficient of −1. A vanishing coefficient implies no linear relationship

of the features.
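For two sampled profiles of equal length, equations (3.16) and (3.17) can be evaluated with sample means in place of expectations. The sketch below is a minimal illustration; the function name `pearson` and the toy profiles are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The common factor 1/n of covariance and variances cancels in the ratio.
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov_xy / (var_x * var_y) ** 0.5


reference = [2.0, 4.0, 9.0, 5.0, 3.0]        # toy depth profile
probe = [2.0 * v + 10.0 for v in reference]  # same shape, scaled and shifted
mirrored = [-v for v in reference]           # inverted shape
r_same = pearson(reference, probe)
r_opposite = pearson(reference, mirrored)
```

Because the coefficient is invariant under positive scaling and offsets, a profile recorded at a different distance from the camera still correlates with its reference with a value close to +1, while the inverted profile yields a value close to −1.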

4. Experiments and First Results

We tested our face recognition algorithms with (a) the original laser scanner data, (b) scanner data with added Gaussian noise for different values of the peak signal-to-noise ratio (PSNR, figure 10), (c) scanner data with added Gaussian noise that were subsequently denoised with the discrete WT method, and (d) TOF data.
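The noise simulation can be sketched as follows. The code assumes the common definition $\mathrm{PSNR} = 10 \log_{10}(\mathrm{peak}^2 / \mathrm{MSE})$ and a peak value of 255 (8-bit range images); both are assumptions made for this illustration, as are all function names.

```python
import math
import random


def psnr(reference, noisy, peak):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    squared = [(a - b) ** 2
               for row_a, row_b in zip(reference, noisy)
               for a, b in zip(row_a, row_b)]
    mse = sum(squared) / len(squared)
    return 10.0 * math.log10(peak ** 2 / mse)


def sigma_for_psnr(target_db, peak):
    """Standard deviation of zero-mean noise whose expected MSE yields target_db."""
    return peak / 10.0 ** (target_db / 20.0)


def add_gaussian_noise(image, sigma, rng):
    """Add independent zero-mean Gaussian noise to every pixel."""
    return [[value + rng.gauss(0.0, sigma) for value in row] for row in image]


rng = random.Random(0)  # fixed seed for reproducibility
peak = 255.0
# Synthetic 80 x 50 test pattern standing in for a range image.
clean = [[float((r * 7 + c * 3) % 256) for c in range(50)] for r in range(80)]
sigma = sigma_for_psnr(42.08, peak)
noisy = add_gaussian_noise(clean, sigma, rng)
measured = psnr(clean, noisy, peak)
```

The measured PSNR fluctuates slightly around the target value because the mean squared error is estimated from a finite number of pixels.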

Database Description. The range images we used for our experiments were generated from 2.5D scans of human faces contained in GavabDB, a database provided by the Gavab group. A short description of this database can be found in [48]. The GavabDB range images have an average size of 180 × 120 pixels. We used frontal view images with neutral and non-neutral facial expressions. We also created a database containing 2.5D TOF data sequences of 24 human faces acquired with a SwissRanger SR-3000 camera. The size of the TOF images was 176 × 144 pixels; the distance between the test subject and the TOF camera was approximately 40 cm. For the experiments we used face segments of 50 × 80 pixels, as shown in figure 11.

Image Denoising. According to our test data, both the discrete WT and the CDTWT provided a robust performance for image denoising, but the CDTWT showed better numerical results. An example of CDTWT-based denoising of an image with added Gaussian noise is presented in figure 12. Even a visual comparison of the images shows the high performance of the CDTWT; this was observed repeatedly during the experiments. However, the choice of an appropriate noise-dependent threshold value is crucial for this approach, since an inaccurate threshold value can corrupt the data. Since it is possible to estimate the noise of a TOF sensor via the amplitude of the optical signal [23], the denoising performance of the WT and the CDTWT can be ensured by choosing the optimal threshold value for each noise level, as was done in our experiments.

The TOF data sequences were simply filtered by averaging ten consecutive frames.
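Temporal averaging of this kind can be sketched as follows. The text does not specify whether the averaging windows overlap; the code assumes non-overlapping blocks of ten frames, and the function names are hypothetical.

```python
def average_frames(frames):
    """Pixel-wise mean of a list of equally sized range frames."""
    count = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    return [[sum(frame[r][c] for frame in frames) / count
             for c in range(width)]
            for r in range(height)]


def block_average(sequence, window=10):
    """Average each block of `window` consecutive frames (assumed non-overlapping)."""
    return [average_frames(sequence[start:start + window])
            for start in range(0, len(sequence) - window + 1, window)]


# Toy sequence: 20 frames of 1 x 2 pixels; frame i holds the values i and 2 * i.
sequence = [[[float(i), float(2 * i)]] for i in range(20)]
filtered = block_average(sequence)
print(filtered)  # [[[4.5, 9.0]], [[14.5, 29.0]]]
```

If the noise in consecutive frames is independent, averaging ten frames reduces its standard deviation by a factor of $\sqrt{10}$, at the cost of temporal resolution.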


Figure 10. Face data after adding Gaussian noise for different PSNRs. (a) PSNR = 48.05 dB, (b) PSNR = 42.08 dB

Figure 11. TOF range images (panels a and b)

Figure 12. Complex dual tree wavelet transform denoising. (a) Original image, Gaussian noise added, (b) after wavelet denoising

Point-to-Point Registration. For face registration, we examined two different variants of the ICP algorithm. The first method aligns the face data with a reference face. In this way, 85% of the faces in the database could be rotated to a frontal and upright view.

The second method mirrors the 2.5D face data about the vertical axis and aligns the original face and the mirrored face with each other via ICP to achieve an upright position. This technique assumes a mirror symmetry of the face data and works well on 82% of the faces whose angle of inclination does not exceed 30°. To achieve an improved rotation invariance, we used additional images of different face views for the classifier training.

Classifier Training. The chosen feature vector length of 80 × 50 pixels corresponds to the resolution of a standard face image acquired by a TOF range camera (after segmentation). Restricting the resolution is also necessary in view of the limited computational resources of realistic applications.

To ensure rotation invariance (at least for small angles) and to address the problem of an insufficient number of training images for the LDA and the MLDA classifier, twelve additional images were generated for two unregistered training samples (one with a neutral and one with a smiling expression). To this end, the rectangle-cropped and sphere-cropped faces were rotated by −5, −2.5, +2.5 and +5 degrees about all three axes.
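The generation of the rotated training views can be sketched as follows. The code assumes that each additional view results from a rotation about a single coordinate axis (four angles times three axes gives the twelve views), that the rotation is about the origin, and that the face is given as a list of 3D points; resampling the rotated points into a range image is omitted. All names are hypothetical.

```python
import math

ANGLES_DEG = [-5.0, -2.5, 2.5, 5.0]
AXES = ["x", "y", "z"]


def rotation_matrix(axis, angle_deg):
    """Rotation matrix for a rotation about one coordinate axis."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    if axis == "x":
        return [[1, 0, 0], [0, c, -s], [0, s, c]]
    if axis == "y":
        return [[c, 0, s], [0, 1, 0], [-s, 0, c]]
    if axis == "z":
        return [[c, -s, 0], [s, c, 0], [0, 0, 1]]
    raise ValueError("axis must be 'x', 'y' or 'z'")


def rotate_points(points, matrix):
    """Apply a 3 x 3 matrix to every point of the list."""
    return [tuple(sum(matrix[i][j] * p[j] for j in range(3)) for i in range(3))
            for p in points]


def augmented_views(points):
    """Return one rotated copy of the point set per (axis, angle) pair."""
    return {(axis, angle): rotate_points(points, rotation_matrix(axis, angle))
            for axis in AXES for angle in ANGLES_DEG}


# Three toy surface points (x, y, depth) standing in for a face scan.
face_points = [(0.0, 0.0, 40.0), (3.0, -2.0, 38.5), (-3.0, -2.0, 38.5)]
views = augmented_views(face_points)
print(len(views))  # 12
```

In a real system the rotation would typically be applied about a point on the face, such as the nose tip, rather than about the camera origin.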

The classifier training on the TOF data used 20 randomly picked frames of the data sequence.

Recognition on Original Laser Scanner Data.

Comparison of Different Image Features and Classifiers. Table 1 shows the best recognition rates (recognition rates for two training samples) for the PCA, LDA and MLDA classifiers. Depth values (range images) and $\mathrm{3DLBP}_{\text{layer }1}$ images were used as classifier features. For the MLDA classifier, the adjusting parameter $\alpha$ was investigated, and for the PCA, the number of eigenvalues.

| feature | test image, PCA | test image, LDA | test image, MLDA | test sequence, PCA | test sequence, LDA | test sequence, MLDA |
|---|---|---|---|---|---|---|
| depth (range image) | 47 | 47 | 47 | 47 | 47 | 50 |
| $\mathrm{3DLBP}_{\text{layer }1}$ | 87 | 85 | 88 | 92 | 85 | 88 |

Table 1. Recognition rate for neutral face expression in % (PCA, LDA and MLDA with 180 eigenvectors, MLDA with $\alpha = 6$)


Table 2 and table 3 show the recognition rates for different texture features and linear classifiers applied to images with a neutral face expression. The different LBP variants perform significantly better than the range and gradient images. The best MLDA results for a single-view test image were obtained with $\mathrm{LBP}^{riu2}_{8,1}$ (88%), and for a test sequence with $\mathrm{3DLBP}_{\text{layer }1}$ (92%). PCA and LDA classifiers showed a recognition rate of 92% for a single test image using the $\mathrm{LBP}^{riu2}_{8,1}$ texture feature.

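| feature | test image, sphere-cropped | test image, rectangle-cropped | test sequence, sphere-cropped | test sequence, rectangle-cropped |
|---|---|---|---|---|
| depth (range image) | 45 | 47 | 60 | 47 |
| $\mathrm{3DLBP}_{\text{layer }1}$ | 78 | 87 | 92 | 88 |
| $\mathrm{3DLBP}_{\text{layer }2}$ | 33 | 32 | 45 | 38 |
| $\mathrm{3DLBP}_{\text{layer }3}$ | 58 | 77 | 72 | 90 |
| $\mathrm{3DLBP}_{\text{layer }4}$ | 15 | 43 | 13 | 53 |
| Horizontal Sobel Operator | 32 | 40 | 43 | 50 |
| Large Horizontal Gradient | 42 | 50 | 52 | 55 |
| $\mathrm{LBP}^{u2}_{8,1}$ | 77 | 83 | 87 | 87 |
| $\mathrm{LBP}^{u2}_{16,1}$ | 70 | 82 | 83 | 87 |
| $\mathrm{LBP}^{riu2}_{16,2}$ | 63 | 77 | 80 | 80 |
| $\mathrm{LBP}^{ri}_{8,1}$ | 48 | 77 | 65 | 85 |
| $\mathrm{LBP}^{riu2}_{8,1}$ | 75 | 88 | 88 | 88 |

Table 2. Recognition rate for neutral face expression for different features in % for a test image and for a test sequence (MLDA with 60 eigenvectors and $\alpha = 8$)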

Recognition from Surface Normals. Table 4 displays the recognition rates for the MLDA classifier using the polar stereographic representation of the Gaussian image. The angular component is evidently not robust against varying facial expressions (67% for smiling versus 90% for neutral faces). However, it should be noted that this data was not phase-unwrapped, which is not a trivial procedure for 2D data [24].

Comparison of Different Profile Features. With regard to the reference data for facial profile recognition, we proceed on the assumption that during a real enrolment phase of the face recognition system only data of high quality is chosen as a reference pattern. For this reason, we singled out each person's best profiles. Afterwards, we conducted 427 comparisons per person and per profile type. The results are displayed in table 5. Particularly good results with profiles are achieved when the person is looking down. In this case, horizontal root-of-the-nose-crossing profiles and horizontal nose-crossing profiles obtained recognition rates of 60% and 63%, respectively, whereas vertical central profiles allowed for a recognition rate of 58%. On average, vertical central profiles and horizontal root-of-the-nose-crossing profiles show the best results. The horizontal forehead-crossing profile is the weakest feature and thus does not qualify for a discriminant recognition of faces. Furthermore, the profile recognition rates seem to be stable against facial expressions, compared with upward-oriented neutral faces.


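| feature | test image, PCA | test image, LDA | test sequence, PCA | test sequence, LDA |
|---|---|---|---|---|
| depth (range image) | 45 | 47 | 45 | 47 |
| $\mathrm{3DLBP}_{\text{layer }1}$ | 87 | 85 | 90 | 85 |
| $\mathrm{3DLBP}_{\text{layer }2}$ | 25 | 28 | 40 | 48 |
| $\mathrm{3DLBP}_{\text{layer }3}$ | 72 | 77 | 80 | 87 |
| $\mathrm{3DLBP}_{\text{layer }4}$ | 27 | 35 | 18 | 47 |
| Horizontal Sobel Operator | 38 | 45 | 48 | 55 |
| Large Horizontal Gradient | 43 | 50 | 50 | 55 |
| $\mathrm{LBP}^{u2}_{8,1}$ | 88 | 83 | 88 | 88 |
| $\mathrm{LBP}^{u2}_{16,1}$ | 85 | 85 | 87 | 88 |
| $\mathrm{LBP}^{riu2}_{16,2}$ | 77 | 80 | 80 | 87 |
| $\mathrm{LBP}^{ri}_{8,1}$ | 72 | 80 | 80 | 87 |
| $\mathrm{LBP}^{riu2}_{8,1}$ | 92 | 92 | 90 | 92 |

Table 3. Recognition rate for neutral face expression for different features in % for a test image and for a test sequence for PCA and LDA with 180 eigenvectors and the rectangle-cropped face segmentation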

| expression | $N_3$ | $\arg \pi(N)$ |
|---|---|---|
| Neutral | 87 | 90 |
| Smiling | 90 | 67 |

Table 4. Recognition rate in % for neutral and smiling face expression for surface normals in polar stereographic representation for MLDA (rectangle-cropped face segmentation)

| Face direction | Hor. nose-root profile | Hor. forehead profile | Hor. nose profile | Vert. central profile |
|---|---|---|---|---|
| Down | 60% | 48% | 63% | 58% |
| Up | 37% | 26% | 37% | 32% |
| Frontal 1 | 41% | 13% | 21% | 51% |
| Frontal 2 | 37% | 13% | 21% | 49% |
| Frontal 1 and 2 | 39% | 13% | 21% | 50% |
| Facial expression | 38% | 0% | 26% | 53% |
| Laughter | 36% | 13% | 22% | 48% |
| Smile | 38% | 12% | 25% | 47% |
| on average | 41% | 20% | 31% | 48% |

Table 5. Profile recognition rates

Comparison of Different Features. Table 6 displays the best recognition results for

faces with neutral and smiling expression for different features.

| method | Neutral | Smiling |
|---|---|---|
| PCA from LBPs | 92 | 82 |
| MLDA for surface normals | 87 | 90 |
| Profiles | 51 | 48 |

Table 6. Best recognition rates in % of the introduced methods for faces with neutral and smiling expression

Recognition on Noisy and Denoised Laser Scanner Data. Table 7 illustrates our test results on simulated noisy data before and after applying the denoising algorithms. During the classifier training we used high-quality images and trained only on scan sequences with little noise. For the recognition process we assumed the use of a low-quality sensor for mobile applications. Our results show that some of the features, such as the surface normals and the LBPs, are highly dependent on noise and fail at low PSNR, whereas range images work properly and seem to be independent of the noise level. After the denoising process, the derived features achieve recognition rates similar to those obtained with the range images, independent of the noise level; we attribute this to the image corruption caused by denoising. To sum up, for data with little noise the derived features are more discriminant than range images. For noisy images, we need to investigate other denoising algorithms and/or redesign the computation of the derived features.

| PSNR (dB) | data | normals, PCA, a | normals, PCA, b | normals, MLDA, a | normals, MLDA, b | LBPs, PCA, a | LBPs, PCA, b | LBPs, MLDA, a | LBPs, MLDA, b | depth, PCA, a | depth, PCA, b | depth, MLDA, a | depth, MLDA, b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 34.09 | noisy | 2 | 3 | 3 | 3 | 15 | 23 | 7 | 10 | 48 | 47 | 45 | 57 |
| 34.09 | denoised | 48 | 35 | 58 | 33 | 52 | 47 | 45 | 43 | 38 | 40 | 47 | 53 |
| 38.52 | noisy | 3 | 7 | 5 | 5 | 25 | 48 | 12 | 27 | 48 | 47 | 45 | 57 |
| 38.52 | denoised | 50 | 38 | 58 | 38 | 48 | 50 | 48 | 42 | 40 | 38 | 47 | 58 |
| 42.08 | noisy | 27 | 28 | 37 | 48 | 57 | 73 | 37 | 60 | 48 | 45 | 47 | 57 |
| 42.08 | denoised | 53 | 38 | 58 | 38 | 48 | 52 | 50 | 45 | 38 | 38 | 47 | 53 |
| 48.05 | noisy | 75 | 83 | 78 | 85 | 72 | 87 | 72 | 87 | 48 | 45 | 45 | 55 |
| 48.05 | denoised | 52 | 38 | 57 | 38 | 47 | 52 | 50 | 47 | 40 | 38 | 47 | 53 |

Table 7. Recognition rate for smiling (test seq. a) and neutral (test seq. b) face expressions for different PSNRs in % for PCA and MLDA with 180 eigenvectors and $\alpha = 6$

Recognition on TOF images. Table 8 shows the best recognition rates (for one frame and for a sequence of 20 frames) obtained with the MLDA classifier on TOF images. Depth values (range images), $\mathrm{LBP}^{riu2}_{8,1}$ and surface normals were used as classifier features. The best recognition rates for single test images and test sequences are the same (83%).

| feature | test image | test sequence |
|---|---|---|
| depth (range image) | 79 | 79 |
| $\mathrm{LBP}^{riu2}_{8,1}$ | 71 | 79 |
| surface normals | 83 | 83 |

Table 8. Recognition rate for TOF data and neutral face expression in % (MLDA with 180 eigenvectors and $\alpha = 6$, test sequence of 20 randomly picked frames)

5. Discussion and Future Work

Within the scope of this research project, we investigated, implemented and tested current methods for 3D face recognition. The developed system consists of multiple optional and mandatory modules for denoising and interpolation, classifier training and the final face recognition. Since most parts of the system are interchangeable, it serves as an excellent test environment for the investigation of new methods.

For the classification and recognition tasks, a number of features were tested. The Gaussian map showed high recognition rates even though the representation (one stereographic coordinate) should be considered far from optimal and will probably be improved in the future.

The local binary pattern, which was originally introduced as a means of describing textures by encoding the gray-value distribution in the neighborhood of a pixel, also gave very good results and proved the usefulness of this type of feature for images consisting of range data.

The facial profiles appear to be a feature that is robust against facial expressions such as smiling. Although the recognition rates achieved with the current system are comparatively low, further investigation to improve the results is in order, since the processing of profiles would allow for a significant reduction in computing time and memory capacity, which is often needed for realistic applications. For example, it would be interesting to check whether a more suitable segmentation or different measures of profile correlation improve the recognition rates.

Although curvature data obtained from noisy range data by standard techniques is unreliable, a further investigation of more robust methods for the extraction of this feature is of interest.

We achieved robust and real-time recognition performance (up to 92% for single features) from unregistered and low-quality data, which is very good compared with the currently existing methods. We expect these results to be improved by our future work, which will include the combination of several features and the use of non-linear classifiers.


References

[1] A. F. Abate, M. Nappi, S. Ricciardi, G. Sabatino, Multi-modal Face Recognition by Means

of Augmented Normal Map and PCA, in: IEEE Int. Conf. on Image Proc. (2006).

[2] L. Akarun, B. Goekberk, A. Salah, 3D Face Recognition for Biometric Applications, in:

13th European Signal Processing Conference (EUSIPCO) (Antalya, Turkey, 2005), URL

http://www.cmpe.boun.edu.tr/~gokberk/eusipco2005.pdf.

[3] H. Alt, L. J. Guibas, Discrete Geometric Shapes: Matching, Interpolation, and Approxima-

tion: A Survey, Tech. Rep. B 96-11 (1996), URL citeseer.ist.psu.edu/alt96discrete.

html.

[4] P. N. Belhumeur, J. P. Hespanha, D. J. Kriegman, Eigenfaces vs. Fisherfaces: Recognition

Using Class Specific Linear Projection, IEEE Trans. Pattern Analysis and Machine Intelli-

gence, Vol. 19, No. 7 (1997), 711–720, URL http://cs.gmu.edu/~kosecka/cs803/pami97.

pdf.

[5] B. BenAmor, K. Ouji, M. Ardebilian, L. Chen, 3D Face Recognition by ICP-Based Shape

Matching, in: The second International Conference on Machine Intelligence (ACIDCA-

ICMI’2005) (2005), URL http://liris.cnrs.fr/publis/pdf/Chen-2005_liris1963.pdf?

id=19%63.

[6] P. J. Besl, N. D. McKay, A Method for Registration of 3-D Shapes, in: IEEE Transactions

on pattern analysis and machine intelligence, Vol. 14, No. 2 (1992), 239–256, URL http:

//www.informatik.uni-bonn.de/~schulz/amr-prak/beslmckay.%pd.

[7] C. M. Bishop, Pattern Recognition and Machine Learning (Springer Science+Business Media,

LLC, 2001).

[8] G. E. P. Box, M. E. Muller, A Note on the Generation of Random Normal Deviates, Ann.

Math. Stat., 29 (1958), 610–611.

[9] A. M. Bronstein, M. M. Bronstein, R. Kimmel, Efficient Computation of Isometry-Invariant

Distances Between Surfaces, SIAM J. Sc. Comp., 28 (2006), 1812–1836, URL http://www.

cs.technion.ac.il/~ron/PAPERS/SIAM06.pdf.

[10] A. M. Bronstein, M. M. Bronstein, R. Kimmel, Generalized Multidimensional Scaling: A

Framework for Isometry-Invariant Partial Surface Matching, Proc. National Academy of

Sciences (PNAS), 103 (2006), 1168–1172.

[11] A. M. Bronstein, M. M. Bronstein, R. Kimmel, Expression-Invariant Representations of

Faces, IEEE Trans. Image Proc., 16 (2007), 188–197.

[12] S. Cai, K. Li, I. Selesnick, Matlab Implementation of Wavelet Transforms: Introduction,

URL http://taco.poly.edu/WaveletSoftware/index.html.

[13] M. do Carmo, Differential Geometry of Curves and Surfaces (Prentice Hall, 1976).

[14] K. I. Chang, K. W. Bowyer, P. J. Flynn, Multi-modal 2D and 3D Biometrics for Face

Recognition, in: Proc. IEEE Int. Workshop on Analysis and Model. of Faces and Gestures

(2003).

[15] K. I. Chang, K. W. Bowyer, P. J. Flynn, An Evaluation of Multimodal 2D+3D Biometrics,

IEEE Trans. Pattern Analysis and Machine Intelligence, 27 (2005), 619–624.

[16] K. I. Chang, K. W. Bowyer, P. J. Flynn, Multiple Nose Region Matching for 3D Face Recog-

nition under Varying Facial Expression, IEEE Trans. Pattern Analysis and Machine Intelli-

gence, 28 (2006), 1695–1700.

[17] Y. Chen, G. Medioni, Object Modeling by Registration of Multiple Range Images, in:

Proceedings of the 1991 IEEE International Conference on Robotics and Automation,

vol. 3 (1991), 2724–2729, URL http://ieeexplore.ieee.org/iel2/347/3640/00132043.pdf?

arnumbe%r=132043.

[18] N. Chiba, H. Hanaizumi, Three-Dimensional Face Recognition System, in: SICE Annual

Conf. in Sapporo (2004).

[19] T. Dey, J. Giesen, S. Goswami, Shape Segmentation and Matching with Flow Discretization,

in: M. S. F. Dehne, J.-R. Sack (Ed.), Proc. Workshop Algorithms and Data Structures

(WADS 03), LNCS 2748, vol. Volume 2748/2003 of Lecture Notes in Computer Sci-

ence (Springer Berlin / Heidelberg, 2003), 25–36, URL citeseer.ist.psu.edu/article/

dey03shape.html.

[20] C. Dorai, J. Weng, A. K. Jain, C. Mercer, Registration and Integration of Multiple Object

Views for 3D Model Construction, IEEE Trans. Pattern Analysis and Machine Intelligence


(1998), 83–89, URL http://ieeexplore.ieee.org/iel3/34/14286/00655652.pdf?arnumbe%

r=655652.

[21] R. O. Duda, P. E. Hart, D. G. Stork, Pattern Classification (John Wiley & Sons, Inc., 2001).

[22] M. Frank, M. Plaue, U. Köthe, F. A. Hamprecht, Denoising of Continuous-Wave Time-of-Flight Depth Images using Confidence Measures (2007), submitted.

[23] M. Frank, M. Plaue, H. Rapp, U. Köthe, B. Jähne, F. A. Hamprecht, Theoretical and Experimental Error Analysis of Continuous-Wave Time-of-Flight Range Cameras (2007), submitted.

[24] D. C. Ghiglia, M. D. Pritt, Two-Dimensional Phase Unwrapping: Theory, Algorithms, and

Software (Wiley, 1998).

[25] G. G. Gordon, Face Recognition Based on Depth Maps and Surface Curvature, in: Proc.

SPIE, vol. 1570 (1991), 234–247, URL http://www.vincent-net.com/gaile/papers/SPIE_

sandiego/spie_sa%ndiego.pdf.

[26] W. E. L. Grimson, From Images to Surfaces (MIT Press, 1981).

[27] D. Helbing, I. Farkas, T. Vicsek, Simulating Dynamical Features of Escape Panic, Nature,

407 (2000), 487–490.

[28] L. F. Henderson, The Statistics of Crowd Fluids, Nature, 229 (1971), 381–383.

[29] T. Heseltine, N. Pears, J. Austin, Three-Dimensional Face Recognition Using Surface Space

Combinations, ICIP04, II (2004), 1421–1424, URL http://www-users.cs.york.ac.uk/~nep/

tomh/3DFaceRecUsingSurfac%eSpaceCombinations-BMVC.pdf.

[30] K. Hildebrandt, K. Polthier, Anisotropic Filtering of Non-linear Surface Features, Comp.

Graph. Forum, 23 (2004), 391–400.

[31] Y. Huang, Y. Wang, T. N. Tan, Combining Statistics of Geometrical and Correlative Features

for 3D Face Recognition, ICPR04, 3(2006), I: 330–333, URL http://www.visionbib.com/

bibliography/people890.html\#TT65507%.

[32] M. Hüsken, M. Brauckmann, S. Gehlen, C. von der Malsburg, Strategies and Benefits of

Fusion of 2D and 3D Face Recognition, in: Proc. IEEE CVPR (2005).

[33] A. K. Jain, R. P. W. Duin, J. Mao, Statistical Pattern Recognition: A Review, in: IEEE

Transactions on Pattern Analysis and Machine Intelligence, vol. 22 (2000).

[34] T. Kahlmann, F. Remondino, H. Ingensand, Calibration for Increased Accuracy of the

Range Imaging Camera SwissrangerTM , in: Proc. ISPRS (2006), URL http://www.

photogrammetry.ethz.ch/general/persons/fabio/kahlm%ann_etal_ISPRSV2006.pdf.

[35] N. G. Kingsbury, Complex Wavelets for Shift Invariant Analysis and Filtering of Sig-

nals, Applied and Computational Harmonic Analysis, 10 (2002), 234–253, URL http:

//www-sigproc.eng.cam.ac.uk/~ngk/publications/ngk_ACHApa%p.pdf.

[36] H. Kong, X. Li, J.-G. Wang, E. K. Teoh, C. Kambhamettu, Discriminant Low-dimensional

Subspace Analysis for Face Recognition with Small Number of Training Samples, IEEE Trans-

actions on system, man and cybernetics, Part B, URL http://www.bmva.ac.uk/bmvc/2005/

papers/165/HK_BMVC05.pdf.

[37] C. Lange, K. Polthier, Anisotropic Smoothing of Point Sets, Comp. Aid. Geom. Des.,

22 (2005), 680–692, URL http://page.mi.fu-berlin.de/polthier/articles/pointSet/

points%etFairing.pdf.

[38] R. Lange, 3D Time-of-Flight Distance Measurement with Custom Solid-State Image Sensors

in CMOS/CCD Technology, Ph.D. thesis, University of Siegen (2000), URL http://deposit.

ddb.de/cgi-bin/dokserv?idn=960293825.

[39] T. Ledermann, J. Pannekamp, 3-D-Wahrnehmung in der Robotik, interaktiv 2.2006, 2(2006),

24–25.

[40] C. Li, A. Barreto, An Integrated 3D Face Expression Recognition Approach, in: Proc.

ICASSP (2006).

[41] M. Lindner, A. Kolb, Lateral and Depth Calibration of PMD distance sensors, in:

Proc. ISCV (2006), 524–533, URL http://www.cg.informatik.uni-siegen.de/data/

Publications/2006%/isvc2006.pdf.

[42] X. Lu, 3D Face Recognition Across Pose and Expression, Ph.D. thesis, Michigan State Uni-

versity (2006), URL http://www.msu.edu/~lvxiaogu/thesis/thesis_3DFace_Lu.htm.

[43] B. Mederos, L. Velho, L. H. de Figueiredo, Robust Smoothing of Noisy Point Clouds, in:

N. Press (Ed.), Geom. Des. and Comp.: Seattle 2003 (2004), URL http://w3.impa.br/

~boris/seattle2003.pdf.

[44] B. Meffert, O. Hochmuth, Werkzeuge der Signalverarbeitung (Pearson Studium, 2004).


[45] A. S. Mian, D. Mathers, M. Bennamoun, R. Owens, G. Hingston, 3D Face Recognition

by Matching Shape Descriptors, in: Proc. IVCNZ ’04 (2004), 23–28, URL http://www.

postgraduate.uwa.edu.au/__data/page/110739/3Dfacer%ecogmatchingshape.pdf.

[46] K. U. Modrich, Industrielle Bildverarbeitung für automatisierte Produktionen, Wiley-VCH

Verlag GmbH & Co. KGaA, Weinheim (2007), 25–31.

[47] B. Moghaddam, T. Jebara, A. Pentland, Bayesian Face Recognition, in: Pattern Recognition, Vol. 33, No. 11, pp. 1771–1782 (2000), URL http://courses.csail.mit.edu/6.869/handouts/MerlTR2000-42%20B%ayesianFacReco.pdf.

[48] A. B. Moreno, A. Sanchez, GavabDB: a 3D Face Database, in: Proc. 2nd COST275 Workshop

on Biometrics on the Internet (Vigo (Spain), 2004), 75–80, URL http://gavab.escet.urjc.

es/recursosen.html.

[49] T. Oggier, B. Büttgen, F. Lustenberger, G. Becker, B. Rüegg, A. Hodac, SwissRanger SR3000

and First Experiences Based on Miniaturized 3D-TOF Cameras, Tech. rep., CSEM, IEE,

Fachhochschule Rapperswil Switzerland (2005).

[50] T. Ojala, M. Pietikaeinen, Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns, IEEE Trans. Pattern Analysis and Machine Intelligence, 24 (2002), 971–987, URL http://www.mediateam.oulu.fi/publications/pdf/6.pdf.

[51] G. Pan, Y. Wu, Z. Wu, Investigating Profile Extracted from Range Data for 3D Face Recog-

nition, in: Proc. IEEE Int. Conf. on Systems, Man and Cybernetics (2003).

[52] M. Plaue, Analysis of the PMD Imaging System, Tech. rep., IWR, Univ. of Heidelberg (2006).

[53] K. Polthier, JavaView - Interactive 3D Geometry and Visualization (1999-2006), URL http:

//www.javaview.de/.

[54] H. Rapp, Experimental and Theoretical Investigation of Correlating TOF Camera Systems,

Master’s thesis, IWR, University of Heidelberg (2007).

[55] H. Rapp, M. Frank, F. A. Hamprecht, B. Jähne, A Theoretical and Experimental Investigation

of the Systematic Errors and Statistical Uncertainties of Time-of-Flight Cameras, Int. J.

Accounting, Auditing and Performance Evaluation (accepted).

[56] J. Restle, M. Hissmann, F. Hamprecht, Nonparametric Smoothing of Interferometric Height

Maps Using Confidence Values, Opt. Eng., 43 (2004), 866–871.

[57] T. Ringbeck, B. Hagebeuker, A 3D Time Of Flight Camera For Object Detection.

[58] T. Ringbeck, T. Möller, B. Hagebeuker, Multidimensional Measurement by Using 3-D PMD

Sensors.

[59] S. Rusinkiewicz, M. Levoy, Efficient Variants of the ICP Algorithm, in: Proceedings of the

Third Intl. Conf. on 3D Digital Imaging and Modeling (2001), 145–152.

[60] R. Schwarte, Z. Xu, H. Heinol, J. Olk, B. Buxbaum, New Optical Four-Quadrant Phase-

Detector Integrated into a Photogate Array for Small and Precise 3D Cameras, in: Proc.

SPIE, vol. 3023 (1997), 119–128.

[61] I. W. Selesnick, R. G. Baraniuk, N. G. Kingsbury, The Dual-Tree Complex Wavelet Trans-

form, IEEE Signal Processing Magazine, 123 (2005), 123–151, URL http://ieeexplore.

ieee.org/iel5/79/33042/01550194.pdf.

[62] P. Shukla, Complex Wavelet Transforms and their Applications, Ph.D. thesis, University of

Strathclyde, Glasgow, UK (2003), URL http://www.commsp.ee.ic.ac.uk/~pancham/MPHIL_

THESIS.pdf.

[63] H. J. Song, K. H. Sohn, Face Recognition Using Two Different 3D Sensors, in: Proc. ISPACS

(2004).

[64] T. Tasdizen, R. Whitaker, P. Burchard, S. Osher, Geometric Surface Smoothing via

Anisotropic Diffusion of Normals, Proceedings of the Conference on Visualization, P4 (2002),

125 – 132, URL http://portal.acm.org/citation.cfm?id=602117.

[65] R. Veltkamp, M. Hagedoorn, State of the Art in Shape Matching, URL citeseer.ist.psu.

edu/veltkamp99stateart.html.

[66] D. M. Weinstein, The Analytic 3-D Transform for the Least-Squared Fit of Three Pairs

of Corresponding Point, Tech. rep., Department of Computer Science, University of Utah

(1998), URL http://www.sci.utah.edu/publications/dmw98/UUCS-98-005.pdf.

[67] J. Wikander, Automated Vehicle Occupancy Technologies Study: Synthesis Report, Tech.

rep., Texas Transportation Institute (2007).

Department of Mathematics, Technische Universität Berlin, Germany