Towards robust 3D face recognition from noisy range images with low resolution

TOWARDS ROBUST 3D FACE RECOGNITION FROM NOISY

RANGE IMAGES WITH LOW RESOLUTION

O. EBERS, T. EBERS, T. SPIRIDONIDOU, M. PLAUE, P. BECKMANN, G. BÄRWOLFF,

AND H. SCHWANDT

Abstract. For a number of different security and industrial applications,

there is the need for reliable person identification methods. Among these meth-

ods, face recognition has a number of advantages such as being non-invasive

and potentially covert. Since the device for data acquisition is a conventional

camera, other advantages of a 2D face recognition system are its low data cap-

ture duration and its low cost. However, the recent introduction of fast and

comparatively inexpensive time-of-flight (TOF) cameras for the recording of

2.5D range data calls for a closer look at 3D face recognition in this context.

One major disadvantage, however, is the low quality of the data acquired with

such cameras. In this paper, we introduce a robust 3D face recognition system

based on such noisy range images with low resolution.

1. Introduction

There are a number of applications that require the identification of humans.

Examples include authentication for a computer application or access control for

high-security areas like an airport control tower. Face recognition systems are well

suited for the task of human identification as they require less cooperation by the

user than an iris or fingerprint scan. Face recognition is natural, robust and unintrusive, and the

user is not required to remember any passwords or codes [2]. While the automatic

face recognition on 2D images has been a research issue for several years, the recent

development of 3D sensors has resulted in a considerable interest in methods for

face recognition on range images.

In this project, we explored the state of the art of 3D face recognition and ana-

lyzed the advantages and disadvantages of several methods in regard to our project

goals. Our work resulted in the development of a real-time system for the

processing of three-dimensional data that is specialized for pattern recognition tasks. The

algorithms we chose to implement were modified according to the project’s needs

and were reinvestigated and recombined.

The result of our work is a general development platform for 3D pattern recog-

nition, specially designed for 3D face recognition on noisy and low-resolution data.

In this context the platform can be extended for the recognition of any kind of

3D objects and it can be easily enhanced by the supplementary processing of two-

dimensional intensity data.

Date: October 27, 2008.

2000 Mathematics Subject Classification. 68T45, 68U10.

Key words and phrases. 3D face recognition, time-of-flight camera, range data denoising, pat-

tern recognition, pattern matching.

This project was funded by the European Regional Development Fund (ERDF).

In order to develop a face recognition system based on range images, acquired for

example with the new 3D sensor type of time-of-flight (TOF) cameras, one has

to pay particular attention to the quality of the data since such data is still very

noisy and biased [23, 55]. For this reason our main goal was the development of

algorithms that improve low-quality range data and process it efficiently and in

real-time. Furthermore, our 3D face recognition system is constructed modularly,

and can thus be easily adapted to data of higher quality obtained by other sensors.

To deal with low-quality range data, one has to (a) calibrate the imaging sys-

tem with this particular application in mind, and (b) employ a pre-processing step

that filters and smoothes the image data to achieve a quality suitable for feature

extraction. The pre-processing algorithms have to account for the particular char-

acteristics of the range data at hand since for example the noise model of a TOF

sensor differs from the usual Gaussian white noise model assumed for the majority

of standard denoising methods.

After acquiring and pre-processing the data, one wishes to extract discriminant

and robust features. Again, it is important to consider the special nature of the

data which for example forbids a robust calculation of the curvature. In particular,

we have considered three features: the surface normals (or Gaussian map), the local

binary pattern (LBP) and facial profiles (1D cross sections of the face).
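
To illustrate the first of these features, the following is a minimal sketch of how surface normals might be estimated from a range image by finite differences. It assumes that the range image is a regular grid of depth values z = depth[r][c] with uniform pixel spacing and an orthographic view; the function name surface_normals and the parameter spacing are hypothetical and are not taken from the toolbox described in this paper.

```python
import math


def surface_normals(depth, spacing=1.0):
    """Estimate unit surface normals of z = depth[r][c] by finite differences.

    Assumes a regular grid with at least 2 rows and 2 columns and a uniform
    pixel spacing (an illustrative assumption, not a sensor model).
    """
    rows, cols = len(depth), len(depth[0])
    normals = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Central differences inside the image, one-sided at the border.
            r0, r1 = max(r - 1, 0), min(r + 1, rows - 1)
            c0, c1 = max(c - 1, 0), min(c + 1, cols - 1)
            dz_dx = (depth[r][c1] - depth[r][c0]) / ((c1 - c0) * spacing)
            dz_dy = (depth[r1][c] - depth[r0][c]) / ((r1 - r0) * spacing)
            # The (unnormalized) normal of the graph z(x, y) is (-z_x, -z_y, 1).
            length = math.sqrt(dz_dx * dz_dx + dz_dy * dz_dy + 1.0)
            normals[r][c] = (-dz_dx / length, -dz_dy / length, 1.0 / length)
    return normals
```

Because the normals depend on first derivatives only, they are less sensitive to noise than curvature, which requires second derivatives; on very noisy data the range image would still need to be smoothed first.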

The final face recognition task can then be accomplished by the usual classifica-

tion methods such as Principal Component Analysis (PCA [4]), the Linear Discrim-

inant Analysis (LDA [47]) or the Modified Linear Discriminant Analysis (MLDA

[36]).

2. Related Work

While there exists extensive work on 2D face recognition, 3D face recognition is

still a comparatively new research field. As has been shown in several experimental

surveys [1, 14, 15, 32], in particular multi-modal approaches combining 2D and 3D

features give results that surpass those of a simple 2D system. One main disadvan-

tage of a face recognition system using range images, however, is the high cost of an

industrial high-resolution 3D scanner that is often needed to acquire the data. Most

of the 3D face recognition work published to date uses such laser or

structured-light scanners [40, 63]. One cost-effective way to record range data is of course

stereographic imaging [18]. However, it is well-known that such systems require a

robust solution for the correspondence problem [26] and precise calibration.

Time-of-flight imaging systems, which are also comparatively inexpensive, have on the

other hand been used in a substantial number of application areas such as automated

production [39, 46] or automotive applications [49, 58, 57, 67], while there are few

studies that investigate the feasibility of TOF imaging for more complex recognition

tasks like facial recognition. One major problem that arises with the use of

cost-efficient 3D imaging systems like TOF is the low quality and resolution of the

data. The main goal of our project was the implementation of a software pipeline

capable of processing such data in real time, which is described in the following

sections. Although the system is tailored for 3D face recognition from TOF

range images, it can be easily modified for other object recognition tasks based on

low-quality data.

Denoising of 2.5D Data. The first processing step in our pipeline aims at the

removal of noise present in typical range sensor data. The denoising of 3D data

and range data is a wide research field, and the choice of the appropriate denoising

method depends on the noise and data characteristics. Typical denoising methods

include the median filter [20], the moving least-squares method [43] and anisotropic

diffusion [64], especially anisotropic smoothing of point sets [37] and surface meshes

[30].
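
As a small illustration of the simplest of these methods, the sketch below applies a median filter to a range image stored as a list of rows. The function name median_filter and the window parameter radius are hypothetical; the windows are simply truncated at the image border, which is one of several possible border conventions.

```python
from statistics import median


def median_filter(image, radius=1):
    """Replace every pixel by the median of its (2*radius+1)^2 neighbourhood.

    The window is truncated at the image border (an illustrative choice).
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(r - radius, 0), min(r + radius + 1, rows))
                for cc in range(max(c - radius, 0), min(c + radius + 1, cols))
            ]
            out[r][c] = median(window)
    return out
```

The median is robust against isolated outliers such as single invalid depth readings, which makes it a reasonable first step when the noise is not Gaussian; it does not by itself address systematic bias in the measured range.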

The wavelet transform is widely used for the purpose of image denoising and has

been found to be a high-performance tool. For example, Cai et al. provide a useful

MATLAB® framework [12] we used in our work (a detailed description can be

found in [61] and [35]). In [62], a good introduction to complex wavelet transforms

and their applications can be found.

Point-to-Point Registration. Another crucial step is the face registration which

aims at detecting the exact face position and attempts to align the face with a

position suitable for recognition tasks, which is usually the frontal view.

For the coarse registration, one common practice is to identify the position of

three significant local features, for example the pupils and the nose tip. Afterwards,

the features are mapped onto the corresponding features of a reference face by an

affine map consisting of a rotation and a translation (cf. [66]). The parameters

of this map express the feature points’ relation to the corresponding points in the

reference face. Via the affine map determined in this way, all data points are

subsequently transformed to realize the coarse alignment along a position that is

common for all faces in the database.
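
The coarse alignment described above can be sketched as follows. This is a simplified construction, not the method of [66]: it builds an orthonormal frame from each landmark triple and composes the two frames into a rotation R and a translation t. It maps the first landmark exactly and aligns the direction to the second landmark and the landmark plane; it is exact for all three landmarks only if the two triangles are congruent, and it is not a least-squares fit. All function names are hypothetical.

```python
import math


def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def _unit(a):
    n = math.sqrt(sum(x * x for x in a))
    return [x / n for x in a]


def _frame(p0, p1, p2):
    """Orthonormal axes (as rows) spanned by three non-collinear points."""
    e1 = _unit(_sub(p1, p0))
    e3 = _unit(_cross(e1, _sub(p2, p0)))
    e2 = _cross(e3, e1)
    return [e1, e2, e3]


def rigid_from_landmarks(src, ref):
    """Rotation R and translation t taking the src landmark frame to ref."""
    a_s, a_r = _frame(*src), _frame(*ref)
    # R = A_r^T * A_s maps each source axis onto the matching reference axis.
    R = [[sum(a_r[k][i] * a_s[k][j] for k in range(3)) for j in range(3)]
         for i in range(3)]
    rotated_origin = [sum(R[i][j] * src[0][j] for j in range(3)) for i in range(3)]
    t = [ref[0][i] - rotated_origin[i] for i in range(3)]
    return R, t


def apply_rigid(R, t, p):
    """Apply x -> R x + t to a single 3D point."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
```

In use, src would hold the three landmarks detected in the probe face (for example the two pupils and the nose tip) and ref the corresponding landmarks of the reference face; apply_rigid is then applied to every data point of the probe.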

Common algorithms for fine alignment, on the other hand, are the family of Iterative

Closest Point algorithms (ICP), which try to minimize the Hausdorff distance (or

one of its various relatives) between surfaces, and the Thin Plate Spline algorithm

(TPS) [42]. Chen et al. [17] and Besl et al. [6] use ICP for scan registration during

3D model creation. In this context, ICP can be used for fine face alignment by

fitting the face data onto the reference face. An exhaustive overview of ICP

algorithms is provided by Rusinkiewicz et al. [59].

An interesting variant of the aforementioned (rigid) ICP-based registration was

proposed by Bronstein et al. [9, 10], who used the Gromov–Hausdorff distance to

compute inter-facial embeddings with minimal metric distortion, thereby enhancing

the registration toolbox with the ability to match faces with different expressions

against each other.

As a generalization of the Hausdorff distance, which is usually expressed as a

min–max problem of the maximal distance of two sets (using the metric of their

common embedding metric space), the Gromov–Hausdorff distance minimizes the

maximal inner-metric distortion among all common ambient metric spaces and

all possible embedding mappings, thus rendering the Gromov–Hausdorff distance

independent of isometries. Since the computation of the functional as described

here is intractable, the authors propose a discretization of the Gromov–Hausdorff

distance in terms of mutual inter-surface embeddings, thus minimizing the metric

distortion while embedding one surface onto the other and vice versa.
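
For reference, the ordinary Hausdorff distance between two finite point sets can be computed directly from its min–max definition, as in the following sketch. This is the plain Hausdorff distance in a common Euclidean embedding space, not the Gromov–Hausdorff distance or its discretization; the function names are hypothetical, and the brute-force search is quadratic in the number of points.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point of a to its nearest neighbour in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))
```

The symmetric maximum matters: a set that is fully contained in another has a directed distance of zero in one direction, while the distance in the other direction measures how far the larger set extends beyond it.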

Once this distance functional and its corresponding embeddings are computed,

the resulting distance value can be used directly for registration tasks by interpret-

ing it as a similarity measure between faces. Moreover, the resulting embeddings

carry an optimal inter-facial point-to-point correspondence regardless of the actual

facial expressions involved. Still, as in the case of ICP, the process relies heavily on

a previous rough initialization of a few feature points.

Face Recognition. For the final task of face recognition for 2.5D images or 3D

models, three main methodologies can be identified: shape matching, feature-based

and image-based techniques. A detailed overview on face recognition methods is

provided in [2].

The first group consists of algorithms that iteratively try to map a 3D point

cloud or a 3D mesh onto a reference point cloud or reference mesh, respectively

[3, 5, 9, 10, 11, 19, 65]. The shape matching methods can be seen as pattern

recognition methods without feature extraction: a test pattern is directly compared

with the reference pattern. The similarity

measure—which is often implemented via a correlation measure—can be optimized

by using a sufficiently large number of training samples. These approaches demand

an extensive computational effort and an accurate point-to-point data registration

and assume the existence of many correspondences between the reference model

and the test data.

The modus operandi of feature-based methods corresponds mostly to that of shape

matching. However, not the whole data set is processed but only

appropriate subsets. For example, particular regions (eye, forehead, cheek, nose)

or the nose profile of the face could be detected, extracted and processed [5, 16,

25, 42, 45]. Like shape matching methods, the feature-based methods demand a

robust image registration, since the features are selected during a pre-processing

step without the possibility to change their value later on.

Image-based methods attempt to extract the face data subset significant for face

recognition with the aid of statistical learning techniques and without any human

interaction. With image-based methods, there are no or at least fewer pre-processing

steps involved than is the case with feature-based methods: all of the image information

is used for statistical analysis. These methods have been very successful in the

context of 2D face recognition [4, 47]. Since the TOF sensor data is a 2D range

distribution and can thus technically be viewed as a conventional 2D image, it is

not surprising that these techniques are also applied in this context. Introductions to

state-of-the-art techniques of statistical learning and statistical pattern recognition

can be found in [7, 21, 33].

In our approach, we use the statistical learning techniques with Local Binary

Patterns (LBP [31, 50]) and surface normals, thereby proposing a combination of a

feature-based and an image-based method: There is less information lost with this

technique, since the whole image and not some preselected region is used for the

feature extraction and subsequent classification. As a statistical learning method,

we used the Principal Component Analysis (PCA [4]), the Linear Discriminant

Analysis (LDA [47]) and the Modified Linear Discriminant Analysis (MLDA [36])

for classification.
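
The basic LBP operator can be sketched as follows for a range image stored as a list of rows. This is the standard 3x3 variant with eight neighbours, given here only as an illustration; the neighbour ordering, the use of >= for the comparison and the function names are assumptions and need not match the variant used in our system.

```python
# Neighbour offsets (row, column), clockwise, starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(image):
    """8-bit LBP code for every interior pixel of the image."""
    rows, cols = len(image), len(image[0])
    codes = []
    for r in range(1, rows - 1):
        row = []
        for c in range(1, cols - 1):
            centre = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(OFFSETS):
                # Set the bit if the neighbour is at least as large as the centre.
                if image[r + dr][c + dc] >= centre:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes


def lbp_histogram(codes):
    """256-bin histogram of LBP codes, usable as a feature vector."""
    hist = [0] * 256
    for row in codes:
        for code in row:
            hist[code] += 1
    return hist
```

Since each code depends only on the sign of local differences, it is unchanged when a constant offset is added to all depth values, which is one reason such descriptors are attractive for biased range data.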

As an alternative feature-based approach, we used different profiles of the face

(see e.g. [51]) for classification via the Pearson coefficient.
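
A profile-based classification of this kind can be sketched as a nearest-neighbour search under the Pearson correlation coefficient. The sketch assumes that all profiles have been resampled to the same length and are not constant; the function names and the dictionary-based gallery are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # A constant profile has zero variance and would divide by zero here.
    return cov / (sx * sy)


def classify_profile(probe, gallery):
    """Return the gallery identity whose profile correlates best with probe."""
    return max(gallery, key=lambda name: pearson(probe, gallery[name]))
```

The Pearson coefficient is invariant under an offset and a positive scaling of the depth values, so a profile recorded at a different distance from the sensor still correlates strongly with its reference.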

3. The General Setup

Our project, funded by the European Regional Development Fund (ERDF), was

concerned with the processing of facial biometric data in the context of the descrip-

tion of pedestrian movement. To obtain data for crowd movement models that

account for the position of individuals [27] (in contrast to a crowd fluid [28]) it is

necessary to identify those individuals with unintrusive biometric techniques. A

more specific application would be the analysis of commuter behaviour in public

transportation: the usual systems available at the time of writing of this article only

count passengers without recognizing individuals changing means of transportation.

To achieve this task of processing and classifying individual biometrical infor-

mation we developed a 3D face recognition system that is able to cope with low-

quality range data. This software platform was implemented as a toolbox for the

MATLAB® scripting language.

Multiple modules for pre-processing and the actual face recognition were im-

plemented and tested separately (cf. figure 1). For almost every module, we have

developed and implemented alternative approaches to adjust the system to different

application requirements. Depending on operating conditions and available capaci-

ties, the user can choose from a variety of individual modules and algorithms. The

software contains conventional methods for 3D face recognition as well as unique

and novel ideas.

Figure 1. Software pipeline

As a main result of this project, we implemented a robust real-time face recognition

system based on an innovative multi-modal approach. It accounts for the typical

characteristics of low-quality data obtained with a TOF sensor by combining 3D

and 2D techniques that can deal with low-resolution images and limited

pre-processing capacity.

Since we would like to compare the performance of the system for data obtained

from various sources, we implemented a simulation pipeline to emulate different

noise characteristics and resolutions (see figure 2). The simulation pipeline features

additional modules and algorithms for gradually degrading the pre-processed and

comparatively noise-free laser scanner data from the Gavab database towards the

data quality of a realistic cost-effective real-time ranging system. This simulation

served as an important tool for assessing the sensor’s requirements like resolution

and signal-to-noise ratio.
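
The idea of such a degradation step can be sketched as follows: a high-quality range image is first reduced in resolution by block averaging and then perturbed by additive noise. Gaussian noise is used here only as a placeholder, since the noise of a real TOF sensor does not follow this model; the function names and the parameters factor, sigma and seed are hypothetical and do not describe the actual modules of the simulation pipeline.

```python
import random


def downsample(image, factor):
    """Reduce the resolution by averaging factor x factor pixel blocks."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    rows, cols = len(image) // factor, len(image[0]) // factor
    return [
        [
            sum(image[r * factor + i][c * factor + j]
                for i in range(factor) for j in range(factor)) / factor ** 2
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def add_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise (a placeholder noise model) to each pixel."""
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in image]


def degrade(image, factor, sigma, seed=0):
    """Downsample a range image and add reproducible noise."""
    rng = random.Random(seed)
    return add_noise(downsample(image, factor), sigma, rng)
```

Varying factor and sigma independently makes it possible to study how recognition performance depends on resolution and on signal-to-noise ratio separately.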

Figure 2. Simulation of low-quality sensor data

In figure 1, the flow chart of the final system is illustrated.
