TOWARDS ROBUST 3D FACE RECOGNITION FROM NOISY
RANGE IMAGES WITH LOW RESOLUTION
O. EBERS, T. EBERS, T. SPIRIDONIDOU, M. PLAUE, P. BECKMANN, G. BÄRWOLFF,
AND H. SCHWANDT
Abstract. For a number of different security and industrial applications,
there is the need for reliable person identification methods. Among these meth-
ods, face recognition has a number of advantages such as being non-invasive
and potentially covert. Since the device for data acquisition is a conventional
camera, other advantages of a 2D face recognition system are its low data cap-
ture duration and its low cost. However, the recent introduction of fast and
comparatively inexpensive time-of-flight (TOF) cameras for the recording of
2.5D range data calls for a closer look at 3D face recognition in this context.
One major disadvantage, however, is the low quality of the data acquired with
such cameras. In this paper, we introduce a robust 3D face recognition system
based on such noisy range images with low resolution.
1. Introduction
A number of applications require the identification of humans. Examples
include authentication for a computer application or access control for
high-security areas like an airport control tower. Face recognition systems are well
suited for the task of human identification as they require less cooperation by the
user than an iris or fingerprint scan. Face recognition is natural, robust and unintrusive, and the
user is not required to remember any passwords or codes [2]. While the automatic
face recognition on 2D images has been a research issue for several years, the recent
development of 3D sensors has resulted in a considerable interest in methods for
face recognition on range images.
In this project, we explored the state of the art of 3D face recognition and ana-
lyzed the advantages and disadvantages of several methods in regard to our project
goals. Our work resulted in the development of a real-time system for the processing
of three-dimensional data that is specialized for pattern recognition tasks. The
algorithms we chose to implement were modified according to the project’s needs
and were reinvestigated and recombined.
The result of our work is a general development platform for 3D pattern recog-
nition, specially designed for 3D face recognition on noisy and low-resolution data.
In this context the platform can be extended for the recognition of any kind of
3D objects and it can be easily enhanced by the supplementary processing of two-
dimensional intensity data.
Date: October 27, 2008.
2000 Mathematics Subject Classification. 68T45, 68U10.
Key words and phrases. 3D face recognition, time-of-flight camera, range data denoising, pat-
tern recognition, pattern matching.
This project was funded by the European Regional Development Fund (ERDF).
In order to develop a face recognition system based on range images—for example
acquired with the new 3D sensor type of time-of-flight (TOF) cameras—one has
to pay particular attention to the quality of the data since such data is still very
noisy and biased [23, 55]. For this reason our main goal was the development of
algorithms that improve low-quality range data and process it efficiently and in
real-time. Furthermore, our 3D face recognition system is constructed modularly,
and can thus be easily adapted to data of higher quality obtained by other sensors.
To deal with low-quality range data, one has to (a) calibrate the imaging sys-
tem with this particular application in mind, and (b) employ a pre-processing step
that filters and smooths the image data to achieve a quality suitable for feature
extraction. The pre-processing algorithms have to account for the particular char-
acteristics of the range data at hand since for example the noise model of a TOF
sensor differs from the usual Gaussian white noise model assumed for the majority
of standard denoising methods.
After acquiring and pre-processing the data, one wishes to extract discriminant
and robust features. Again, it is important to consider the special nature of the
data which for example forbids a robust calculation of the curvature. In particular,
we have considered three features: the surface normals (or Gaussian map), the local
binary pattern (LBP) and facial profiles (1D cross sections of the face).
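
As an illustration of the first feature, the following is a minimal sketch of how surface normals might be estimated from a range image by finite differences. It is not the implementation used in this project; the function name `surface_normals`, the pixel spacings `dx` and `dy`, and the list-of-lists image layout are assumptions made for the example.

```python
import math

def surface_normals(depth, dx=1.0, dy=1.0):
    """Estimate unit surface normals of a range image z = depth[row][col].

    Central differences are used in the interior and one-sided differences
    at the border. A surface z(x, y) has the unnormalized normal
    (-dz/dx, -dz/dy, 1). The image must be at least 2 x 2 pixels.
    """
    rows, cols = len(depth), len(depth[0])
    normals = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Clamp the neighbour indices at the image border.
            c0, c1 = max(c - 1, 0), min(c + 1, cols - 1)
            r0, r1 = max(r - 1, 0), min(r + 1, rows - 1)
            dzdx = (depth[r][c1] - depth[r][c0]) / ((c1 - c0) * dx)
            dzdy = (depth[r1][c] - depth[r0][c]) / ((r1 - r0) * dy)
            norm = math.sqrt(dzdx * dzdx + dzdy * dzdy + 1.0)
            normals[r][c] = (-dzdx / norm, -dzdy / norm, 1.0 / norm)
    return normals
```

Because the normals depend only on first derivatives, they are less sensitive to noise than curvature, which requires second derivatives.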
The final face recognition task can then be accomplished by the usual classifica-
tion methods such as Principal Component Analysis (PCA [4]), the Linear Discrim-
inant Analysis (LDA [47]) or the Modified Linear Discriminant Analysis (MLDA
[36]).
2. Related Work
While there exists extensive work on 2D face recognition, 3D face recognition is
still a comparatively new research field. As has been shown in several experimental
surveys [1, 14, 15, 32], in particular multi-modal approaches combining 2D and 3D
features give results that surpass those of a simple 2D system. One main disadvan-
tage of a face recognition system using range images, however, is the high cost of an
industrial high-resolution 3D scanner that is often needed to acquire the data. Most
of the 3D face recognition work published to date uses such laser or structured-light
scanners [40, 63]. One cost-effective way to record range data is of course
stereographic imaging [18]. However, it is well-known that such systems require a
robust solution for the correspondence problem [26] and precise calibration. Time-of-flight
imaging systems, which are also comparatively inexpensive, have on the other hand
been used in a substantial number of application areas such as automated
production [39, 46] or automotive applications [49, 57, 58, 67], while there are few
studies that investigate the feasibility of TOF imaging for more complex recognition
tasks like facial recognition. One major problem that arises with the use of
cost-efficient 3D imaging systems like TOF is the low quality and resolution of the
data. The main goal of our project was the implementation of a software pipeline
capable of processing such data in real time, which is described in the following
sections. Although the system is tailored for 3D face recognition from TOF
range images, it can be easily modified for other object recognition tasks based on
low-quality data.
Denoising of 2.5D Data. The first processing step in our pipeline aims at the
removal of noise present in typical range sensor data. The denoising of 3D data
and range data is a wide research field, and the choice of the appropriate denoising
method depends on the noise and data characteristics. Typical denoising methods
include the median filter [20], the moving least-squares method [43] and anisotropic
diffusion [64], especially anisotropic smoothing of point sets [37] and surface meshes
[30].
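
To make the simplest of these methods concrete, the following is a minimal sketch of a median filter applied to a range image. It is an illustration only, not the denoising method used in this project; the function name `median_filter` and the `radius` parameter are assumptions made for the example.

```python
from statistics import median

def median_filter(depth, radius=1):
    """Replace every pixel by the median of its (2*radius+1)^2 neighbourhood.

    At the image border the window is cropped to the valid pixels.
    """
    rows, cols = len(depth), len(depth[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                depth[rr][cc]
                for rr in range(max(r - radius, 0), min(r + radius + 1, rows))
                for cc in range(max(c - radius, 0), min(c + radius + 1, cols))
            ]
            out[r][c] = median(window)
    return out
```

A median filter removes isolated outliers (for example single "flying" pixels) without averaging them into their neighbours, which is why it is often a first choice for impulsive noise.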
The wavelet transform is widely used for the purpose of image denoising and has
been found to be a high-performance tool. For example, Cai et al. provide a useful
MATLAB® framework [12] we used in our work (a detailed description can be
found in [61] and [35]). A good introduction to complex wavelet transforms
and their applications can be found in [62].
Point-to-Point Registration. Another crucial step is the face registration which
aims at detecting the exact face position and attempts to align the face with a
position suitable for recognition tasks, which is usually the frontal view.
For the coarse registration, one common practice is to identify the position of
three significant local features, for example the pupils and the nose tip. Afterwards,
the features are mapped onto the corresponding features of a reference face by an
affine map consisting of a rotation and a translation (cf. [66]). The parameters
of this map express the feature points’ relation to the corresponding points in the
reference face. Via the affine map determined in this way, all data points are
subsequently transformed to realize the coarse alignment along a position that is
common for all faces in the database.
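
The coarse alignment described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: it attaches an orthonormal frame to each landmark triangle and maps one frame onto the other, which is exact only if the two triangles are congruent. The function names and the landmark order (for example left pupil, right pupil, nose tip) are hypothetical.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)

def frame(p0, p1, p2):
    """Orthonormal frame attached to a (non-degenerate) landmark triangle."""
    e1 = _unit(_sub(p1, p0))
    e3 = _unit(_cross(e1, _sub(p2, p0)))  # normal of the triangle plane
    e2 = _cross(e3, e1)
    return (e1, e2, e3)

def rigid_from_landmarks(src, ref):
    """Rotation R and translation t such that R * src[i] + t is close to ref[i]."""
    fs, fr = frame(*src), frame(*ref)
    # R maps each source axis onto the matching reference axis.
    R = [[sum(fr[k][i] * fs[k][j] for k in range(3)) for j in range(3)]
         for i in range(3)]
    rotated_p0 = tuple(sum(R[i][j] * src[0][j] for j in range(3)) for i in range(3))
    t = _sub(ref[0], rotated_p0)
    return R, t

def transform(R, t, p):
    """Apply the rigid map to a single 3D point."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
```

Once `R` and `t` are known, `transform` is applied to every data point of the face to bring it into the pose of the reference face.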
Common algorithms for fine alignment, on the other hand, are the family of Iterative
Closest Point (ICP) algorithms, which try to minimize the Hausdorff distance (or
one of its various relatives) between surfaces, and the Thin Plate Spline (TPS)
algorithm [42]. Chen et al. [17] and Besl et al. [6] use ICP for scan registration during
3D model creation. In this context, ICP can be used for fine face alignment by
fitting the face data onto the reference face. An exhaustive overview of ICP
algorithms is provided by Rusinkiewicz et al. [59].
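
The basic ICP loop (match each point to its closest counterpart, solve for the best rigid motion, apply it, repeat) can be sketched as follows. To stay free of linear algebra libraries, this toy version works on 2D point sets, where the least-squares rotation has a closed form; a 3D version would typically replace `best_rigid_2d` by an SVD-based step. It minimizes the mean squared closest-point distance, one of the common ICP error measures, and all names are assumptions made for the example.

```python
import math

def nearest(p, points):
    """Closest point to p in a list of 2D points (brute force)."""
    return min(points, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)

def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src[i] onto dst[i]."""
    n = len(src)
    cs = (sum(p[0] for p in src) / n, sum(p[1] for p in src) / n)
    cd = (sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n)
    s_dot = s_cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - cs[0], py - cs[1]
        bx, by = qx - cd[0], qy - cd[1]
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    t = (cd[0] - (c * cs[0] - s * cs[1]), cd[1] - (s * cs[0] + c * cs[1]))
    return theta, t

def apply_rigid_2d(points, theta, t):
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + t[0], s * x + c * y + t[1]) for x, y in points]

def icp_2d(source, target, iterations=20, tol=1e-12):
    """Align source to target; returns the moved points and the final mean squared error."""
    current = list(source)
    err = prev_err = float("inf")
    for _ in range(iterations):
        matches = [nearest(p, target) for p in current]
        theta, t = best_rigid_2d(current, matches)
        current = apply_rigid_2d(current, theta, t)
        err = sum((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
                  for p, q in zip(current, matches)) / len(current)
        if abs(prev_err - err) < tol:  # stop when the error no longer changes
            break
        prev_err = err
    return current, err
```

Like any ICP variant, this sketch converges only to a local optimum, which is why a coarse alignment is needed beforehand.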
An interesting variant of the aforementioned (rigid) ICP-based registration was
proposed by Bronstein et al. [9, 10], who used the Gromov–Hausdorff distance to
compute inter-facial embeddings with minimal metric distortion, thereby enhancing
the registration toolbox with the ability to match faces with different expressions
against each other.
As a generalization of the Hausdorff distance, which is usually expressed as a
min–max problem of the maximal distance of two sets (using the metric of their
common embedding metric space), the Gromov–Hausdorff distance minimizes the
maximal inner-metric distortion among all common ambient metric spaces and
all possible embedding mappings, thus rendering the Gromov–Hausdorff distance
independent of isometries. Since the computation of the functional as described
here is intractable, the authors propose a discretization of the Gromov–Hausdorff
distance in terms of mutual inter-surface embeddings, thus minimizing the metric
distortion while embedding one surface onto the other and vice versa.
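
For reference, the classical Hausdorff distance between two finite point sets can be sketched as below. This covers only the ordinary Hausdorff distance in a common Euclidean space, not the Gromov–Hausdorff distance or its discretization by Bronstein et al.; the function names are assumptions made for the example.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from a point of a to its closest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))
```

The max of min structure is the min–max formulation mentioned above: a small value means that every point of each set has a close counterpart in the other set.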
Once this distance functional and its corresponding embeddings are computed,
the resulting distance value can be used directly for registration tasks by interpret-
ing it as a similarity measure between faces. Moreover, the resulting embeddings
carry an optimal inter-facial point-to-point correspondence regardless of the actual
facial expressions involved. Still, as in the case of ICP, the process relies heavily on
a previous rough initialization of a few feature points.
Face Recognition. For the final task of face recognition for 2.5D images or 3D
models, three main methodologies can be identified: shape matching, feature-based
and image-based techniques. A detailed overview of face recognition methods is
provided in [2].
The first group consists of algorithms that iteratively try to map a 3D point
cloud or a 3D mesh onto a reference point cloud or reference mesh, respectively
[3, 5, 9, 10, 11, 19, 65]. The shape matching methods can be seen as pattern
recognition methods without feature extraction: a test pattern is compared directly
with the reference pattern. The similarity
measure—which is often implemented via a correlation measure—can be optimized
by using a sufficiently large number of training samples. These approaches demand
an extensive computational effort and an accurate point-to-point data registration
and assume the existence of many correspondences between the reference model
and the test data.
The modus operandi of feature-based methods corresponds mostly to that of shape
matching. However, not the whole data set is processed but only
appropriate subsets. For example, particular regions (eye, forehead, cheek, nose)
or the nose profile of the face could be detected, extracted and processed [5, 16,
25, 42, 45]. Like shape matching methods, the feature-based methods demand a
robust image registration, since the features are selected during a pre-processing
step without the possibility to change their value later on.
Image-based methods attempt to extract the face data subset significant for face
recognition with the aid of statistical learning techniques and without any human
interaction. With image-based methods, there are no or at least fewer pre-processing
steps involved than is the case with feature-based methods: all of the image information
is used for statistical analysis. These methods have been very successful in the
context of 2D face recognition [4, 47]. Since the TOF sensor data is a 2D range
distribution and can thus technically be viewed as a conventional 2D image, it is
not surprising that these techniques are also applied in this context. Introductions to
state-of-the-art techniques of statistical learning and statistical pattern recognition
can be found in [7, 21, 33].
In our approach, we use the statistical learning techniques with Local Binary
Patterns (LBP [31, 50]) and surface normals, thereby proposing a combination of a
feature-based and an image-based method: There is less information lost with this
technique, since the whole image and not some preselected region is used for the
feature extraction and subsequent classification. As statistical learning methods,
we used the Principal Component Analysis (PCA [4]), the Linear Discriminant
Analysis (LDA [47]) and the Modified Linear Discriminant Analysis (MLDA [36])
for classification.
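
As an illustration of the LBP feature, the following is a minimal sketch of the basic 3×3 operator and the resulting histogram. The variants in [31, 50] may differ in neighbourhood, thresholding and bit order; the conventions below (clockwise order starting at the top-left pixel, a bit is set if the neighbour is greater than or equal to the centre) and the function names are assumptions made for the example.

```python
def lbp_code(img, r, c):
    """8-bit local binary pattern of the interior pixel (r, c)."""
    center = img[r][c]
    # Eight neighbours, clockwise, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(img):
    """Histogram of LBP codes over all interior pixels (256 bins)."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist
```

Such histograms (of the whole image or of image blocks) can then serve as feature vectors for a statistical classifier such as PCA or LDA.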
As an alternative feature-based approach, we used different profiles of the face
(see e.g. [51]) for classification via the Pearson coefficient.
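
The profile-based classification can be sketched as follows: a probe profile is assigned to the gallery identity whose stored profile has the highest Pearson correlation coefficient with it. This is a minimal illustration; the names `pearson`, `classify_profile` and the dictionary-based gallery are assumptions, and the profiles are assumed to be sampled at the same positions and to be non-constant.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def classify_profile(probe, gallery):
    """Return the gallery label whose profile correlates best with the probe."""
    return max(gallery, key=lambda label: pearson(probe, gallery[label]))
```

Since the coefficient is invariant under offset and positive scaling, the comparison does not depend on the absolute distance of the face from the sensor or on a global scale factor of the depth values.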
3. The General Setup
Our project, funded by the European Regional Development Fund (ERDF), was
concerned with the processing of facial biometric data in the context of the descrip-
tion of pedestrian movement. To obtain data for crowd movement models that
account for the position of individuals [27] (in contrast to a crowd fluid [28]) it is
necessary to identify those individuals with unintrusive biometric techniques. A
more specific application would be the analysis of commuter behaviour in public
transportation: the usual systems available at the time of writing of this article only
count passengers without recognizing individuals changing means of transportation.
To achieve this task of processing and classifying individual biometric information,
we developed a 3D face recognition system that is able to cope with low-quality
range data. This software platform was implemented as a toolbox for the
MATLAB® scripting language.
Multiple modules for pre-processing and the actual face recognition were im-
plemented and tested separately (cf. figure 1). For almost every module, we have
developed and implemented alternative approaches to adjust the system to different
application requirements. Depending on operating conditions and available capaci-
ties, the user can choose from a variety of individual modules and algorithms. The
software contains conventional methods for 3D face recognition as well as unique
and novel ideas.
Figure 1. Software pipeline
As a main result of this project, we implemented a robust real-time face recognition
system based on an innovative multi-modal approach. It accounts for the typical
characteristics of low-quality data obtained with a TOF sensor by combining 3D
and 2D techniques that can deal with low-resolution images and with limited
pre-processing capacity.
Since we would like to compare the performance of the system for data obtained
from various sources, we implemented a simulation pipeline to emulate different
noise characteristics and resolutions (see figure 2). The simulation pipeline features
additional modules and algorithms for gradually degrading the pre-processed and
comparatively noise-free laser scanner data from the Gavab database towards the
data quality of a realistic cost-effective real-time ranging system. This simulation
served as an important tool for assessing the sensor’s requirements like resolution
and signal-to-noise ratio.
Figure 2. Simulation of low-quality sensor data
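
The idea of the simulation pipeline can be sketched in a few lines: a clean range image is reduced in resolution and then perturbed by noise. This is a minimal illustration, not the project's simulation modules. In particular, additive Gaussian noise is used here purely as a stand-in; as noted in the introduction, the noise of a real TOF sensor differs from this model. The function names and parameters are assumptions made for the example.

```python
import random

def downsample(depth, factor):
    """Reduce the resolution by averaging non-overlapping factor x factor blocks."""
    rows, cols = len(depth) // factor, len(depth[0]) // factor
    return [
        [
            sum(depth[r * factor + i][c * factor + j]
                for i in range(factor) for j in range(factor)) / factor ** 2
            for c in range(cols)
        ]
        for r in range(rows)
    ]

def add_noise(depth, sigma, seed=0):
    """Add zero-mean Gaussian noise (placeholder noise model) to every pixel."""
    rng = random.Random(seed)
    return [[z + rng.gauss(0.0, sigma) for z in row] for row in depth]

def degrade(depth, factor, sigma, seed=0):
    """Simulate a low-quality sensor: lower the resolution first, then add noise."""
    return add_noise(downsample(depth, factor), sigma, seed)
```

Sweeping `factor` and `sigma` and recording the recognition rate at each setting gives an estimate of the resolution and signal-to-noise ratio a sensor must provide.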
In figure 1, the flow chart of the final system is illustrated.