scieee Science in your language
[en] (orig)
A Regular iz ed Fusion based 3D Reconstr uction
F r ame w or k: Analyses , Methods and Applications
v orgelegt v on
M.Sc.
Muhammad Asif Ali Rajput
geb . in Sukkur , P akistan.
v on der F akultät IV – Elektrotechnik und Inf or matik
der T echnischen Univ ersität Berlin
zur Er langung des akademischen Gr ades
Doktor der Ingenieurwissenschaften
- Dr .-Ing. -
genehmigte Disser tation
Promotionsausschuss:
V orsitzender: Prof. Dr . Henning Sprekeler
Gutachter: Prof. Dr .-Ing. Olaf Hellwich
Gutachter: Dr .-Ing. Anko Börner
Gutachter: Prof. Dr .-Ing Reinhard K och
T ag der wissenschaftlichen Aussprache: 24. September 2018
Berlin 2018

Abstract
Recent de v elopments in depth sensing technologies enabled mobile robots to percei v e
surroundings with high accurac y . Robotic applications, equipped with depth perception
technology , enable the capability of autonomous navig ation to self-dri ving cars, assist in critical
sur gical procedures, or reconstruct the 3D model of a potentially hazardous en vironment.
There e xists a v ariety of 3D sensors ranging from highly accurate laser -based range sensors
to lo w-range acti ve depth cameras; the selection of a 3D sensor , ho we ver , directly affects the
capabilities of robotic applications to percei v e surroundings. Unlike self-dri ving autonomous
v ehicles, which are equipped with high-cost LiD AR 3D sensors to ensure safety , mobile robots
are usually equipped with lo w-cost acti ve or passi ve depth sensors. This means that acquired
depth information from lo w-cost 3D sensors is prone to accumulating estimation noise. In
principal, existing 3D reconstruction frame works employ multiple instances of erroneous depth
samples in an incremental fashion to produce high-quality 3D models.
In this thesis, the research objecti v e is focused on reducing the ef fects of error -prone
depth information by employing a proposed 3D reconstruction frame work capable of reducing
accumulated noise, using a re gularizing 3D integration system. The underlying principal
of e xisting state-of-the-art v olumetric reconstruction techniques is unchanged since the
introduction in 1996. The no velty of the proposed frame work lies in the use of a prior
smoothing constraint that represents, on a small-scale, that the surface of the percei v ed object
is smooth. The application of this smoothing constraint on depth information, acquired from
lo w-cost 3D sensors, can enhance the quality of 3D information without sacrificing fine details
in surface geometry .
Critical e xperimentation and empirical e v aluation of the ne w reconstruction framew ork
ha ve sho wn a significant increase in accurac y and quality of reconstructed shapes compared
to state-of-the-art methods. Furthermore, by quantitati ve assessments it has been observ ed that
i

employing smoothing constraints to an incremental 3D fusion process accelerates the surface
estimation process. Therefore, comparati v ely fe wer depth samples are required to generate
high-quality 3D surfaces. These properties of the proposed research link well with robotic
applications which rely on someho w inaccurate (say , because low-cost) image sensors.
ii

Zusammenfassung
Der heutige Stand der T echnik in der mobilen Robotik ermöglicht es, die Umgebung
mit hoher 3D Genauigkeit abzutasten. Dies ist v on V orteil für viele Anwendungen in
den Bereichen Autonomes fahren, Medizinische Operationen oder Inspektion v on schwer
zugänglichen Gebieten. Die T iefensensoren lassen sich zwischen hoch-akkurate 3D Scanner
und kostengünstige T iefenkameras klassifizieren. Grundle gend ist das Ziel kostengünstige
T iefensensoren zu nutzen und inkrementell die Genauigkeit der über die Zeit aufgebauten 3D
Modelle zu v erbessern.
Diese Arbeit untersucht die Reduzierung der Fehlereinflüsse durch fehlerhafte und
ungenaue T iefenmessungen. Der v orgestellte technische Ansatz ist in der Lage das Rauschen
mithilfe v on Regularisierung im 3D Raum stark zu v erringern. Die Neuerung des Ansatzes
lie gt in der Integration des V orwissens (engl. Prior) über die dif ferentielle Glattheit v on
beobachteten Oberflächen. Die Arbeit demonstriert, dass mithilfe des Ansatzes die 3D
Modellierungsgenauigkeit stark v erbessert werden kann, ohne den Detailgrad der beobachteten
Geometrien zu v erlieren.
Experimente und empirische Auswertungen haben gezeigt, dass mithilfe der v or gestellten
Methode die erreichten Genauigkeiten sich stark v on den bekannten Ansatzen hervorheben.
Zusätzlich, führt die Anwendung des Ansatzes zur ef fizienteren Rek onstruktion der
Geometrien. Im V ergleich zu e xistierenden Arbeiten, erfordert die Methode weniger
Datenpunkte (geringere Bildauflösungen), um dennoch v ergleichbare Genauigk eit zu erreichen.
Der Mehrwert der Arbeit erstreckt sich auf alle robotischen Anwendungen, wo die
W ahrnehmung und Rekonstruktion der Umweltgeometrien mit k ostengünstigen T iefensensoren
erreicht werden soll.
iii

i v

Ackno wledgments
First and foremost, I would lik e to sincerely thank my thesis advisors, Prof. Dr . Olaf Hellwich
and Dr . Anko Börner for their guidance, support and recommendations throughout this study
and appreciate their confidence in me.
I would also lik e to thank Dr . Eugen Funk for helping me as a friend and mentor , his
continuous guidance is one of the reason behind this work.
Finally , I would lik e to pay my gratitude to Higher Education Commission (HEC) Pakistan
for pro viding me with the financial support and opportunity to follo w my dreams of higher
education in a world-reno wned uni versity .
v

vi

Dedications
I would lik e to dedicate this work to my wife Maria , who provided tireless support and
moti v ation so that I can obtain the highest academic qualification without loosing the hope.
This work is also dedicated to my parents, who raised me to question e verything which allo wed
me to think critically in this research phase.
vii

viii

Contents
Abstract iii
Abstract-DE iv
Acknowledgments vi
Dedications viii
1 Intr oduction 1
1.1 Depth Imaging for 3D Reconstruction . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Challenges and the Research Question . . . . . . . . . . . . . . . . . . . . . . 4
1 . 3 C o n t r i b u t i o n s ................................... 6
1 . 4 T h e s i s o u t l i n e ................................... 7
1 . 5 S u m m a r y ..................................... 8
2 Related W ork 9
2.1 General Shape Representation . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.1 Simple x Representation . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.2 P arametric Representation . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1.3 Implicit Representation . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2 . 1 . 4 S u r f a c e S p l a t t i n g ............................. 1 3
2.2 Prior Based Shape Approximation . . . . . . . . . . . . . . . . . . . . . . . . 14
2 . 2 . 1 R e g u l a r P r i o r s .............................. 1 4
2.2.2 Local Smoothing Priors . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.3 Global Smoothing Priors . . . . . . . . . . . . . . . . . . . . . . . . . 18
ix

2 . 3 D e p t h M a p S m o o t h i n g .............................. 2 2
2 . 4 I n c r e m e n t a l 3 D F u s i o n .............................. 2 4
2 . 5 D e p t h O u t l i e r s r e m o v a l .............................. 3 0
2 . 6 K e y C o n s i d e r a t i o n s ................................ 3 2
2 . 7 S u m m a r y ..................................... 3 4
3 Methodology 37
3 . 1 F r a m e w o r k S t r u c t u r e ............................... 3 7
3.2 V alidation and Ev aluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.2.1 Ev aluation Frame work . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.2.2 Performance Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3 . 3 D a t a s e t s ...................................... 4 3
3.3.1 Synthetic Piecewise Function . . . . . . . . . . . . . . . . . . . . . . 43
3.3.2 Synthetic 3D Complex En vironment . . . . . . . . . . . . . . . . . . . 44
3.3.3 Realistic 3D Complex En vironment . . . . . . . . . . . . . . . . . . . 45
3 . 4 3 D S e n s o r s .................................... 4 8
3 . 5 S u m m a r y ..................................... 4 9
4 Fundamentals of V olumetric 3D Integration 51
4.1 Signed Distance Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4 . 1 . 1 D e fi n i t i o n ................................. 5 2
4 . 2 S D F f r o m d e p t h i m a g e s .............................. 5 5
4.3 Ef fects of incremental 3D fusion . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.3.1 Relation of con ver gence with weights . . . . . . . . . . . . . . . . . . 57
4 . 4 S e m i - d e n s e v o x e l g r i d .............................. 6 1
4 . 5 S u m m a r y ..................................... 6 2
5 Concept and Design 63
5.1 Recursi ve least squares as 3D fusion approach . . . . . . . . . . . . . . . . . . 63
5.1.1 W eighted least squares and stardard deri v ation of ML-Estimate . . . . . 65
5.1.2 Depth fusion with recursi ve 3D fusion . . . . . . . . . . . . . . . . . . 70
5.1.3 Properties of RLSFusion . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.2 Re gularized Recursi v e Fusion . . . . . . . . . . . . . . . . . . . . . . . . . . 74
x

5.2.1 Deri v ation of re gularized least squared 3D fusion . . . . . . . . . . . . 75
5.2.2 F aster Con v er gence with regularized fusion . . . . . . . . . . . . . . . 79
5.3 Outliers remo v al using spatial information . . . . . . . . . . . . . . . . . . . . 81
5.4 3D reconstruction framew ork . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5 . 5 S u m m a r y ..................................... 8 8
6 Evaluation 91
6 . 1 Q u a n t i t a t i v e e v a l u a t i o n .............................. 9 1
6.1.1 Outliers remov al and memory utilization . . . . . . . . . . . . . . . . 96
6 . 2 Q u a l i t a t i v e e v a l u a t i o n ............................... 9 8
6 . 3 R u n n i n g t i m e a n a l y s i s ...............................1 0 0
6 . 4 S u m m a r y .....................................1 0 7
7 Conclusion and Futur e work 109
7 . 1 F u t u r e D i r e c t i o n s .................................1 1 1
7.1.1 Adapti v e depth denoising . . . . . . . . . . . . . . . . . . . . . . . . . 111
7.1.2 Automated scene understating . . . . . . . . . . . . . . . . . . . . . . 111
7.1.3 Ef ficient data structure for lar ge en vironments . . . . . . . . . . . . . . 112
Bibliograph y 120
List of Figur es 125
List of T ables 127
A A ppendix 129
A.1 F ormulation of D and C matrices . . . . . . . . . . . . . . . . . . . . . . . . . 129
A . 2 T e c h n i c a l R e q u i r e m e n t s ..............................1 3 1
xi

xii

Chapter 1
Intr oduction
The importance of robotic applications in e v eryday li ves (besides industrial automation) has
increased significantly o ver the years. V arious time-consuming, sequential and tedious tasks
are automated by the help of autonomous robotics. The usability of autonomous robots in
e v eryday life is spread on a broad spectrum ranging from miniature floor cleaning robots to
fully autonomous v ehicles dri ving in unkno wn territories. Regardless of the size, en vironment
and application domain of autonomous robots in general, the ef ficiency of a robot for a gi ven
task depends greatly on the real-time understanding of the surrounding en vironment.
Modern robots are equipped with depth sensor systems, such as laser-based range scanners,
which allo w them to percei ve an en vironment as a 3-dimensional (3D) surrounding. In f act,
the reconstruction and analysis of 3D depth data, and their representation in form of 3D maps,
allo ws robotic applications to perform precise tasks such as na vigation of autonomous v ehicle
without collision. Thus, research domains, dealing with applications of depth perception, are
intensi v ely studied. De v elopments in this domain ha ve typically potentials with respect to
social or economical impacts. Automated decisions (e.g. ho w to interact with objects using
actuators, or ho w to a void collision while na vigating) depend on the modeled en vironment
based on 3D depth data.
Unlike in a theoretically ideal system in which a depth sensor pro vides accurate 3D
information, sensed depth v alues are prone to accumulating unwanted measurement noise.
Although depth sensors acquire depth noise, the degree of noise in depth samples depends on
v arious fac tors such as the distance between sensor and object, extreme lightning conditions,
reflecti v e surfaces, or multiple sensor -corrupting projecti ve patterns. For handling additi ve
1

2 1. Introduction
a) b)
c) d)
e)
Figure 1.1: a) Laser depth sensor (LiD AR) mounted on top of autonomous vehicle, b) 3D points
acquired from LiD AR, c) Stereo camera based IPS system, d) Color-coded depth map from IPS
and e) Reconstructed 3D model of mine using stream of depth images with highlighted surface
deformities.
noise in depth samples, v arious strategies ha ve been proposed and implemented such as kernel-
based filtering or v ariational de-noising methods. Ho we ver , not all de-noising techniques can
be implemented to handle noise in real-time.
1.1 Depth Imaging f or 3D Reconstruction
3D laser scanners are used as depth perception systems in robotic applications in which
accurac y and real-time a v ailability of 3D data is crucial. T ypical laser scanning systems sample
the geometry of en vironment with a rotating head which result in a 3D point cloud which
captures 360 o surrounding area around vehicle as s ho wn in Figure 1.1.b . Despite the accurate
depth measurements by laser 3D scanners, usability of such sensors is restricted in mobile
robotics due to their weight, cost and high po wer consumption.
F or this reason, a low-cost stereo-scopic camera based 3D reconstruction has gained the
interest of research communities. Mobile robots are required to process 3D information
from depth cameras into understandable 3D reconstructed models in real-time. Although
computationally e xpensi ve 3D reconstruction techniques such as Structur e fr om Motion (SfM)

Depth Imaging for 3D Reconstruction 3
Figure 1.2: Incremental 3D fusion and reconstruction process.
algorithms allo ws high-quality approximation of the 3D geometry , applying these techniques in
real-time scenarios is challenging. In general, lo w-cost depth sensors acquire relati vely higher
amount of measurement noise and outliers due to their depth acquisition principal compared
to laser based scanners, this phenomena further degrades the reconstruction of the 3D model
as sho wn in Figure 1.1.e where a stereo-scopic passi ve depth sensor is used to acquire 3D
information. As a result, aforementioned issues subv erts the quality of reconstructed 3D
models, therefore the problem of handling noise and outliers in real-time becomes further
challenging. F or these reasons, enhancing the quality of acquired depth data by the means
of denoinsing is the ke y aim of this thesis.
Stereo-scopic based depth sensors produce a stream of depth and color images in real-time,
these depth images can be inte grated (also referred to as fusion) using a v olumetric integration
technique (described in Chapter 2) to produce globally consistent 3D models. This process
of 3D modeling from stream of depth and color images is sho wn in Figure 1.2. First, a 3D
sensor (RGB-D camera in this case) observ er the scene (in this case Michelangelo’ s David )
and generates a stream of color and depth image as sho wn in Figure 1.2. These image streams
are fed to the 3D fusion frame work in which a V isualSLAM algorithm estimates sensor ego-
motion for each instance of depth and color image, estimated camera tracking information
along-with color and depth image streams are processed to produce high-quality 3D model of
the observ ed en vironment. A crucial feature of the 3D fusion frame work in Figure 1.2 is to
cater acquired depth noise and outliers at the time of incremental processing and inte gration.

4 1. Introduction
1.2 Challenges and the Resear ch Question
The ability to impro ve the quality of the reconstructed 3D model from error -prone 3D depth
data would allo w v arious mobile robotic applications to estimate an accurate geometry of the
en vironment. This challenging objecti ve attracted se veral research communities to in v estigate
and implement no vel depth noise and outliers reduction techniques (discussed in Chapter 2).
In principal, a depth outlier is a 3D sample point which lies either in front or behind the actual
surface while a depth noise is an error -prone 3D sample having close proximity to the surf ace,
therefore result in distorting the geometry of reconstructed model.
In early research endea v ours, triangulation techniques ha ve been emplo yed for surface
reconstruction directly from 3D samples. These techniques interconnect acquired 3D samples
to represent surface geometry using triangle meshes Cazals and Giesen (2004). The
aforementioned meshing technique presumes a well defined spatial distrib ution of 3D samples,
ho we ver in practice this assumption is violated specially when dealing with multiple depth
samples of tar get surface. In such cases, triangulation meshing tries to accommodate all points
by producing undesirable triangles at random position and orientation.
F or such reasons, the research community has re vi v ed the use of v olumetric representation
and inte gration Curless and Le v oy (1996) in which multiple depth samples are represented as
signed distance v alues from the estimated surf ace and standard marching cubes Lorensen and
Cline (1987) algorithm is applied to acquire a globally consistent 3D model.
V olumetric integration of depth images reduces the ef fects of depth noise at the time of
inte gration by fusing multiple noisy depth samples to estimate better understanding of the
object geometry . Implicit representation of the estimated surface is updated using a weighted
addition, it is therefore expected that the estimated implicit surface will e ventually con ver g e
to the true surface. Furthermore, formulating an optimal weighting function which respects
sensor characteristics as well as geometry of en vironment is a significantly dif ficult problem
and v aries greatly among dif ferent types of sensors.
In principal, v olumetric integration techniques presume a globally consistent model,
therefore untreated depth outliers produce surface patches which are inconsistent to the
boundary of the surface. Statistical strate gies which test e very depth sample for proximity
test among neighboring 3D data Rusu et al. (2008) ha v e prov en to be ef fectiv e but lacks

Challenges and the Research Question 5
Figure 1.3: Proposed 3D reconstruction frame work
computationally ine xpensi ve profile which restrict these techniques to be implemented on real-
time applications for mobile robots.
Ne w strate gies are therefore required to combine outlier remov al techniques with noise
reduction methods such that the process of integration is capable of handling erroneous depth
information ef ficiently . In contrast to standard implementations of v olumetric integration in
which the weighting function is to be calculated prior to the reconstruction, ne w techniques
are needed to handle spatial a wareness of the en vironment to cater noise in 3D data.
These challenges pro vided moti v ation to use rele vant a priori information about sensor or
en vironment, this leads to the ultimate question of this thesis:
How to ef ficiently r econstruct a 3D model of en vir onment by r educing ef fects of
noise and outliers inher ently in r eal time scenarios?
This research question is addressed by a nov el 3D reconstruction framew ork sho wn in
Figure 1.3. Considering a set of error -prone 3D point clouds acquired from a depth camera, the
frame work applies geometry a ware outlier remo v al filters which identifies and remo ves isolated
depth samples. The proposed frame work then applies no v el total v ariation (TV) denoising on a
no vel implicit representation to enhance the quality of the reconstructed model while reducing
the arbitrary surface deformities caused by depth noise. This resulting implicit representation
of the depth data is then stored for inte gration of upcoming depth samples. This three stage
process of acquisition, filtering and integration is repeated for each acquired depth image and
implicit representation is updated respecti v ely .
The research question can be subdi vided into follo wing objectiv es:
1. T o de velop a mathematical model which supports a priori information about en vironment

6 1. Introduction
to support impro ved shape approximation and noise suppression in erroneous 3D
samples.
2. T o de velop a computationally ine xpensi ve numerical outlier remo v al filter which tar gets
isolated depth samples on the basis of proximity to support real-time reconstruction.
3. T o de velop a reconstruction frame work which accepts sequence of error -prone depth and
color images and produce high quality 3D models of en vironment in real-time.
1.3 Contrib utions
In general, optimized depth images hav e the potential to increase the accurac y of robotic
application in their respecti v e tasks. Ho we v er for our implementation we focus to apply
depth optimization in robotic applications to aid autonomous na vigation and understanding of
en vironment in real-time. F or this reason, the contributions of this thesis lies in both theoretical
and practical domains to optimize the processing of depth images.
In summary , the main contrib utions of this thesis are:
• Theor etical aspects & Implementation
– Design and implementation of Re gularized Fusion ( RFusion ):A total v ariation
filtering based 3D incremental fusion scheme is proposed, formulated and
implemented which is capable of using prior smoothing kno wledge to reduce depth
noise at the time of inte gration in a real-time scenario.
– Design and implementation of Spatial Outliers Remo val F ilter ( SORF ):A light-
weight outlier detection scheme ha ving linear comple xity of O ( n ) is proposed
which uses spatial proximity cues to identify and remo ve e xplicit outliers.
– Design and implementation of SmoothFusion :A modular 3D reconstruction
frame work which encapsulates regularization aspect of RFusion to filter depth
noise and utilize proposed SORF to remo ve e xplicit and isolated outliers in
computationally ef ficient manner .

Thesis outline 7
1.4 Thesis outline
The rest of this thesis is or ganized as follo ws:
• Chapter 2 pro vides a comprehensi ve literature re vie w of contemporary 3D shape
approximation and representation techniques. As it will be sho wn, modern techniques
use a priori kno wledge into shape approximation process to produce accurate 3D models
with smooth surfaces. State-of-the-art 3D reconstruction frame works are briefly re vie wed
and e v aluated to highlight potential issues related with depth noise and outliers, this
serv es as problem statement for the proposed research. All re vie wed techniques and
frame works are e v aluated with respect to their rob ustness to noise, ability to deal with
outliers, processing speed and o verall accurac y .
• Chapter 3 describes aspects of methodology adopted in this thesis for prototyping and
testing proposed frame work. In addition, both quantitati ve and qualitati ve e v aluation
measures are presented highlighting details of test simulation, datasets and performance
metrics used for benchmarks.
• Chapter 4 pro vides theoretical background focusing specifically on v olumetric
inte gration and presents analytical reasoning behind dense and semi-dense v olumetric
representation used in state-of-the-art frame works.
• Chapter 5 pro vides theoretical aspects and design of proposed contrib utions. Firstly ,
a no vel Recursi ve Least Square (RLS) based 3D fusion (RLSFusion) technique is
introduced which enables possibility of functional extendability such as e xponential
for getting and regularization in the 3D inte gration process. Secondly , the core concept
of total v ariation based filtering for implicit shape re gularization in incremental 3D
reconstruction is presented and implemented in the form of RFusion . Finally , internal
workings of SORF are introduced and importance of using rob ust spatial outliers remov al
filter in real-time 3D fusion frame work is briefly discussed.
• Chapter 6 presents quantitati v e e v aluation measures used to e v aluate reconstructed 3D
models from proposed frame works with state-of-the-art techniques. In cases where
quantitati v e e v aluation is not possible, screenshots from both proposed and e xisting
frame works are pro vided to aid qualitati ve comparison. Finally , a comprehensi v e per -
frame running time analysis is presented which compares the ex ecution of proposed

8 1. Introduction
frame work in comprehensi ve detail.
• Chapter 7 summarizes the contrib utions of our systems and propose se v eral directions
for future work.
1.5 Summary
This chapter highlighted significance of incremental 3D fusion in modern robotic applications
from 3D depth sensing systems. K ey challenges such as depth outliers and noise w are identified
and research question is formulated which subdi vides problem statement into three basic inter -
related objecti v es, i.e. formulation of implicit shape approximation, ef ficient outliers detection
and remo v al scheme and smoothing using prior kno wledge.
As the first step to achie ving these objecti ves, next chapter pro vides critical literature
re vie w of existing 3D shape approximation techniques and state-of-the-art 3D reconstruction
frame works.

Chapter 2
Related W ork
This chapter will re vie w traditional shape modelling techniques and fusion frame works used
to reconstruct models from 3D points. Initially , general shape representation methods are
re vie wed to establish baseline on their suitability for numerical shape approximation and
modelling. Thereafter , modern approaches which process regularization and smoothing using a
priori kno wledge are re viewed and e v aluated. At last, state of the art fusion frame works which
use 3D depth data to reconstruct a virtual en vironment are briefly discussed and e v aluated.
3D shape approximation and modelling is a traditional yet acti ve research problem which
is addressed by computer graphics and vision community for past two decades. In early
days, direct triangulation based techniques for modelling were proposed and implemented
(see Edelsbrunner and Mücke (1994)), these techniques were straightforward and ef fecti ve
when the sampled 3D points were ordered and pre-sampled ho we ver lack ed the capability
to handle scattered and unordered depth data. A no vel v olumetric representation technique
was presented by Curless and Le v oy (1996) which illustrated the capability of inte grating
multiple range images in a v olumetric fashion to construct comple x models, due to lack of
computation resources at that time this technique was une xplored for up-to 15 years and was
re vi ved by Ne wcombe et al. (2011). The dif ficulties of triangulation based modelling were
later addressed by Ale xa et al. (2003), Calakli and T aubin (2011) and Kazhdan and Hoppe
(2013) by using a priori kno wledge (also referred to as prior ) in modelling phase, this allo ws
the reconstruction frame work to handle redundant and une ven samples and produce smooth
3D models. Many research groups inte grated the use of smoothness priors in their application-
specific methods, some considered repetiti v e structures Pauly et al. (2008), Berner et al. (2011)
9

10 2. Related W ork
while other e xperimented with geometry of the sampled scene Y ingze Bao et al. (2013). The
concept of smoothness priors is also extended to global and piece wise smooth geometries by
A vron et al. (2010) and Calakli and T aubin (2011).
Section 2.1 pro vides a brief re vie w of general shape representation techniques while
focusing on their usability of handling unordered 3D points. Since the concept of using
smoothing priors is basis of presented research objecti v e, rele vant notable contrib utions are
briefly addressed in Section 2.2. Depth image smoothing techniques which are rele v ant to
proposed research are briefly re vie wed in Section 2.3 to establish baseline. Section 2.4 re vie ws
current state-of-art reconstruction frame w orks which utilize a volumetric representation for
depth inte gration. The discussion is then summarized in Section 2.6 where generality ,
computational ef fecti veness, rob ustness and accurac y are elaborated.
2.1 General Shape Repr esentation
2.1.1 Simplex Repr esentation
Shape representation using polygon meshes or more generally simple xes is considered as a
standard practice in computer graphics community . Man y interacti ve visualization applications
such as virtual reality , augmented reality and video games use simplex es to process and
visualize 3D models. This concept originated form research by Bo wyer (1981), in which
a triangle meshes connecting tetrahedra via Delaunay-triangulation are used to approximate
shapes from 3D points in an automated fashion. Edelsbrunner and Mücke (1994) proposed
α -shapes algorithm (Figure 2.1) for creating topological correct surfaces from 3D points using
polygon meshes. The method connects neighboring 3D points using triangles while the α v alue
controls the acceptable euclidean distance between connected sample. Since the v alue of α is
strongly dependent on factors such as detail and scale of the model, the selection of appropriate
v alue for α requires user interaction by an e xpert. Later , a more adapti ve re gion gro wing
technique called Ball-Pivoting Algorithm (BP A) was proposed by Bernardini et. al (1999) in
which a user defined a virtual ball ha ving radius ρ is used to determine v alid 3D points. In
principal, α -shapes, BP A and all Delaunay-triangulation methods are incapable to handle noisy
or redundant 3D samples Bodenmueller (2009). Since acquired 3D points are prone to collect
v arious types of noise in the scanning process, the noise sensitiv e behaviour produces abrupt

General Shape Representation 11
Figure 2.1: The α -shapes algorithm. a) Input samples, b-d) reconstruction with increasing α
Edelsbrunner and Mücke (1994).
geometric defects in the reconstructed model. Henceforth, the computer vision communities
a v oid the use of simplex es based representation while approximating the shape with 3D points.
2.1.2 Parametric Repr esentation
P arametric algorithms handle the problem of non-uniform sampled 3D points by fitting a spline
to approximate contours of the surface, these methods are well-kno wn for signal interpolation
as well as approximation. For instance, a function f ( u, v ) : R 2 ↦→ R which returns the
height of the surface it is approximating at an y gi v en v alues of u and v (Figure 2.2.a). V arious
approximating functions can be stitched together to approximate comple x globally consistent
3D models (Figure 2.2 b), ho we ver a traditional parametric surf ace reconstruction algorithm
consists of two steps:
1. Partitioning: A clustering technique (for e.g. Shef fer et al. (2007)) is applied
2. Parameterization: A local surf ace with corresponding height parametrization model f
is e xtracted via optimization
A standard practice is to employ a least squar e function for each se gment which minimizes
min
N
∑
i
∥ h ( P i ) − f ( u i , v i ) ∥ 2
2 (2.1)
where P i ∈ R 3 is the i th sample 3D point, h ( P i ) : R 3 ↦→ R is the segment height and ( u i , v i ) =
pr oj ( P i ) is the projection of P i on the corresponding segment plane partition from step 1.

12 2. Related W ork
Figure 2.2: a) Smooth surface model via NURBS. b) A set of parametric shapes combined to a
global consistent surface. Schreiner et al. (2004).
Bezier curv es (see Agoston and Agoston (2005)) or Non-Uniform Rational B-Splines (NURBS)
model are then employed to approximate model f for each se gment.
Although parametric shape approximation and reconstruction has the capability to produce
smooth 3D surfaces for non-uniform 3D point sets, ho we ver an e xtra computational task of
combining local se gments to produce a global continuous shape is required. This combinatorial
task is computationally e xpensi ve as sho wn by Floater and Hormann (2005). For this reason,
the local approximation and representation is adopted and will be e xamined in Section 2.2.
2.1.3 Implicit Repr esentation
The Signed Distance F ield (SDF) is a special case of shape representation ha ving high potential
of usability in applications such as motion planning (Hof f III et al. (1999)), multi-body
dynamics (Guendelman et al. (2003)), collision detection and cloth animation (Bridson et al.
(2003)) and camera mo vement tracking (Canelhas et al. (2013)). The shape of the desired object
is represented by an implicit indicator function f ( x ) which classifies space around the object
surface as either inside f ( x ) < 0 or outside f ( x ) > 0 where x ∈ R 3 is the spatial coordinate
of sampled 3D space. In principal, a surface of the object is set of all x where f produces zero
as illustrated in Figure 2.3. The 3D space is di vided into smaller elements called voxels which
contains implicit indicator v alue of SDF . Gi ven enough computational and memory resources,
the implicit modelling can be e xtended to work with streams of depth and color images from a
RGB-D sensor (such as Microsoft Kinect) to produce high quality 3D models of en vironment
(Ne wcombe et al. (2011)).
Main dra wback of using implicit representation for 3D modelling comes from the di vision

General Shape Representation 13
Figure 2.3: A Signed Distance Function (SDF) on a fine grid.
of 3D space into dense cells re gardless of whether a cell contain meaningful SDF information
or not. This inef ficient memory utilization makes implicit shape modelling infeasible for
lar ge-scale en vironments. Representing an area of 100 x 100 x 100 m 3 using 1 cm voxel
resolution using standard floating point v alues would require 4000 GB of memory to encode
implicit SDF . State-of-the-art v olumetric reconstruction frame works (such as Steinbrueck er
et al. (2014), Kähler et al. (2015)) employ narro w-band surface localized v ox els to facilitate
lar ge-scale en vironment reconstruction. Further explanation and incremental inte gration based
applications of implicit representation are discussed in Section 2.4.
2.1.4 Surface Splatting
Surface splatting is a specialized case of the point based representation which is tar geted
to render millions of 3D points independently (more generally vertices) in real-time using
modern rendering frame work such as OpenGL. State-of-art reconstruction frame works (such as
Whelan et al. (2015), K eller et al. (2013)) employ surf ace splatting to accommodate dynamics
of en vironment by adding or remo ving vertices in real-time. In principal, elliptical surf aces
ha ving associated confidence, color and normal information (called splats ) are used to represent
v ertices. In order to integrate the curv ature information using discrete splats, each splat is
processed with Elliptical W eighted A ver ag e (EW A) to produce high quality te xture blending
while maintaining a lo w memory profile. Zwicker et al. (2003) demonstrated the potential
of surface splatting to produce high-quality 3D models from both scan and synthetic objects
(Figure 2.4)

14 2. Related W ork
Figure 2.4: Surface splatting of a scan of a human f ace, textured terrain, and a comple x point-
sampled object with semi-transparent surfaces. (Zwicker et al. (2003))
Although surface splatting has a high potential for applications in real-time dynamic 3D
modelling using high-quality depth sensor such as Kinect and Kinect v2 , howe ver inability to
handle lo w-density and error prone depth data is main disadv antage for using this representation
technique.
The shape representation techniques discussed so far enable the approximation of shape
geometry from 3D points. In various applications, it is preferred to restrict the generality of
representation approach in the fa vour of approximation quality . Examples in the upcoming
section will discuss and illustrate the process of utilizing a priori information to cater error -
prone measurements while producing smooth surfaces.
2.2 Prior Based Shape A ppr oximation
In computer graphics and vision algorithms, prior information is used to aid reconstruction
and rendering processes. This integration of prior kno wledge is essential in automated 3D
shape approximation since all depth sensing systems are prone to introduce measurement errors
depending upon the type and working of the sensor system. T o maintain rele v anc y with the
research objecti v e while a v oiding exploration of inessential techniques, tw o general prior types
are identified and briefly e xplained in the upcoming sections.
2.2.1 Regular Priors
V arious e veryday objects e xhibit repetiti ve structures and 3D reconstruction frame w orks
can use this vital information to produce life-like 3D models, these repetiti ve structure are

Prior Based Shape Approximation 15
Figure 2.5: Detecting repetiti v e structures (a) enables hole filling (b) in structured
en vironments. (Pauly et al. (2008)).
Figure 2.6: Similar objects are scaled to match repetiti ve patterns. Red: strong deformations.
Green: small deformations. (Berner et al. (2011)).
commonly referred to as re gular priors. P auly et al. (2008) proposed a method to cluster point
clouds into repetiti v e segments and emplo y this vital information for hole filling of structured
en vironment. The potential of this scheme is illustrated in Figure 2.5 where a complete w all
se gment is inferred from repetiti ve structure. Berner et al. (2011) e xtended the concept of
using re gular priors to general partial symmetries, in which lo w dimensional shape space
is represented and used for matching. In principal a basic structure is identified and non-
rigid deformations of this portion are matched with similar areas, this is illustrated in Figure
2.6 where strong deformation matches are sho wn in red while small v ariations are shown in
green. Berner et al. (2011) suggested the use of supervised segmentation where the matches
are ambiguous Figure 2.6.
Supervised inte gration of prior information is further in vestigated by Arikan et al. (2013)
and Sharf et al. (2007) in which relational based similarities are marked by e xpert user .
Y ingze Bao et al. (2013) proposed to perform semantic classification by relating observed
images and sparse point cloud from kno wn database (Figure 2.7) using pre-learned approaches
to reduce interacti v e intervention from e xpert user . Aforementioned methods are designed to
perform well in presence of kno wn objects or fractals in shape subspace, ho we v er in common
scenario when the scene consists of unkno wn objects, these method does not pro vide any

16 2. Related W ork
Figure 2.7: Learning priors from images for reconstruction (Y ingze Bao et al. (2013)).
adv antage. Thus instead of using a specialized structure, generalized and straightforward
prior models are required which can be used in an unkno wn en vironment. Smooth and planar
surfaces ha ve been identified as common characteristics for a lar ge v ariety of scenes, therefore
using this basic information into the shape reconstruction approach does not confine underlying
algorithms to specific en vironment. The smoothness assumption is di vided into local and global
smoothness priors, both are highly important and rele v ant to proposed research and discussed
briefly in the upcoming section.
2.2.2 Local Smoothing Priors
A no vel and rob ust local smoothing approach which is specifically designed to handle
redundant data as well as to remov e noise was proposed by Alexa et al. (2001). An implicit
surface is approximated for e very point in sampled data by using neighboring points. This
type of neighborhood approximation methods are also kno wn as moving least squar e (MLS)
techniques. Strength of smoothness can be controlled by v arying the weighting function θ
to reduce surface deformities caused by error -prone depth measurements. The approximation
process is a two step process, a local ref erence domain (plane) h ( x ) for the point x is extracted
which minimizes a local weighting sum of the square distance of points p i to the plane in the
first step using:
h ( x ) = ar g min
n,d ∑
i
( ⟨ n, p i ⟩ − d ) 2 θ ( ∥ p i − x ∥ 2 ) (2.2)
Where n is approximated normal of p i and d is distance of p i from the local plane. In the second
step, a local bi v ariate smooth polynomial approximation function f ( x ) is estimated via LSQ

Prior Based Shape Approximation 17
Figure 2.8: Local surface approximation by Ale xa et al. (2001).
(Figure 2.8) to the height v alue h ( p i ) for each sample:
ar g min
f ∑
i
( f ( u i , v i ) − h ( p i )) 2 θ ( ∥ p i − x ∥ 2 ) (2.3)
All a v ailable points p i are re-sampled with the estimated shape and rendered using a
v ariant of point based rendering proposed by Rusinkie wicz and Le vo y (2000). This
surface approximation approach recei ved much attention due to its handling of noisy and
redundant point samples while producing smooth continuous surface representation. Further
e xperimentation by K olluri (2008) enabled the control of smoothness via point-based blending
which employs point normals n i into a shape function as:
f ( x ) = ∑ i n i . ( p i − x ) ϕ ( ∥ p i − x | 2 )
∑ i ϕ ( | p i − x | 2 ) (2.4)
where ϕ is sharpness and the weighting function is defined as:
ϕ ( r ) = 1
r 2 + ϵ (2.5)
In principal, both f ( x ) and ϕ can be controlled by a user-defined parameter ϵ , Figure 2.9
demonstrates ef fects of v arying ϵ to acquire an appropriate smoothness lev el. Although MLS
based techniques are ef ficient to handle sampling noise and redundanc y , ho we ver the y fail
to produce accurate 3D models when the samples are sparse. This problem is addressed by
Öztireli et al. (2009) where the y extended the polynomial model from Ale xa et al. (2001) by

18 2. Related W ork
Figure 2.9: Smoothness of point-to-plane blending controlled by ϵ by K olluri (2008).
Figure 2.10: a) Sphere fitting from sparse samples Öztireli et al. (2009) b) MLS without and
with outliers Ohtake et al. (2005).
fitting spheres of v ariable radius to local samples.
All aforementioned local approximation approaches tend to accommodate each sample
to a consistent surface, this property of least square estimation fails to accommodate strong
sampling noise and/or outliers and produce artifacts as sho wn in Figure 2.10.b . Specialized
statistical outliers remo v al techniques such as Rusu et al. (2008) can be applied on 3D samples
to reduce outliers, since remo ving outliers is an essential part of the research objecti ve these
techniques are briefly described in Section 2.5.
2.2.3 Global Smoothing Priors
Global shape approximation techniques e xploit the implicit representation of an underlying
surface to create a globally consistent shape. Carr et al. (2001) proposed one of the first such
global shape approximation methods in which the implicit function f ( x ) which approximates

Prior Based Shape Approximation 19
Figure 2.11: The implicit function f ( x ) approximating surface points.
of f-surface points is defined as
f ( x ) = ∑
i
α i ϕ ( ∥ x − c i ∥ 2 ) (2.6)
where α i are weights and c i are centres for i th Radial Basis Functions (RBF). The technique
e xploits included normals information n i to further enhances the approximation of an implicit
function as sho wn in Figure 2.11. An optimal shape function gi v es zero at the sample i.e.
f ( p i ) = 0 and d i at the of f-surface points f ( p i + ϵ i n i ) = d i . A con v ex LSQ minimization task
which is used to calculate α i is defined as:
α = ar g min
α ∑
i
f ( p i ) 2 + ( f ( p i + ϵ i n i ) − d i ) 2 + ( f ( p i + ϵ i n i ) + d i ) 2 (2.7)
The smoothing ef fect for the final representation can be controlled by the polynomial de gree
of RBF and of fset distances d i . Good extrapolation capabilities combined with achie ved
smoothness allo ws global approximation techniques to deal with irre gular sampling issues
while maintaining details in high density areas. The selection of an appropriate offset distance
d i to deal with noise while maintaining details is ho we ver a critical problem.
Hornung and K obbelt (2006) introduced a simplified discrete v ariant of the global
approximation to address the of f-surface distance. The 3D space around samples is di vided
into a narr ow-band v ox el grid, distance v alues are calculated from the nearest sample to the
center of each v ox el and stored in the corresponding vox el (Figure 2.12). Graph-cut techniques
from Boyk ov et al. (2001) are then applied to e xtract the shape from v ox els. This technique
is not suitable for lar ge datasets containing millions of 3D points due to high computational

20 2. Related W ork
Figure 2.12: Narr ow-band v ox el grid around samples for graph-cut, Hornung and K obbelt
(2006).
comple xity .
V arious implementations and improv ements ha ve been proposed o v er the years to further
e xtend RBF based global approximation. Ho we ver the most rele v ant and notable contrib ution
is proposed by W alder et al. (2006) in v olving a two-step processing. In the first step, small
re gions are approximated independently via a global RBF (Figure 2.13). In the second step, a
compound RBF which compactly supports local approximations is estimated. W alder further
proposed a re gression model which forces sample normals n i to align with shape function f
such that:
∇ f ( p i ) = n i (2.8)
In terms of a con v ex optimization task, this constraint emulates the cost term. Since this
method enforces a locally defined function to work with a globally smooth RBF , this imposition
ho we ver leads to o ver -smoothing. Calakli and T aubin (2011) extended W alder’ s re gression
model to work with a discrete form on an octree and proposed a second order minimizer , which
led to Smooth Signed Distance F ields (SSDF) surf aces. Figure 2.14 sho ws the extrapolation
beha viour of SSDF reconstruction on non-sampled 3D points.
All of the aforementioned global methods are tar geted to approximate a consistent surface
from gi v en 3D points at the time of ex ecution. In case of incremental updates specially in
the real-time 3D reconstruction, these methods fail to accommodate latest updates in input
3D points. This challenging rigid beha viour of global smoothing methods pro vided main

Prior Based Shape Approximation 21
Figure 2.13: A smooth and global implicit shape e xtracted via radial basis functions, W alder
et al. (2006).
Figure 2.14: Surface reconstruction using SSDF by Calakli and T aubin (2011).

22 2. Related W ork
Figure 2.15: Surface normals from ra w depth image (left) vs smooth depth image (right).
Canelhas et al. (2013)
moti v ation for the proposed research to explore incremental 3D fusion techniques which are
capable of approximating real-time 3D models of the en vironment. These models are briefly
discussed and e v aluated in Section 2.4. F amily of depth smoothing techniques which are
designed to reduce noise in depth images instead of 3D samples are e v aluated in the upcoming
section.
2.3 Depth Map Smoothing
The process of representing a 3-dimensional object using series of range v alues starting from
camera origin to the surface of the object in the form of a 2D image (also referred to as depth
map) is a well established norm in computer graphics and vision community . A measurement
error in the form of added noise is accumulated in the depth acquisition process, untreated
depth noise produces abrupt and geometric deformities in the 3D reconstruction. T o cater
this problem, v arious depth map smoothing algorithms ha ve been in v estigated, proposed and
implemented in the past decade. Since properties of the depth noise are dif ferent to that of
normal color or gray-scale image noise, applying legac y smoothing filters such as lo w-pass
filter introduce further surface deformities in the reconstructed 3D model.
Ne wcombe et al. (2011) introduced a modified edge a ware bilateral filter to produce
discontinuity preserv ed depth map from raw depth map acquired by the Kinect system in real
time. Surface normals are commonly used to visualize depth smoothing ef fects by applying
bilateral filter , Figure 2.15 highlights depth smoothing ef fects with bilateral filtering. Zhao

Depth Map Smoothing 23
Figure 2.16: 1 st row: 3D surface from Raw Kinect depth image, 2 nd ro w: Using bilateral
smoothing and 3 r d ro w: Smoothing with Zhao et al. (2013)
et al. (2013) proposed a specialized depth filtering method which employs surf ace orientation
analysis per pix el surface orientation analysis to further enhance the smoothing process.
Promising results ha ve been demonstrated by Zhao et al. (2013) as sho wn in Figure 2.16.
All of the pre viously mentioned depth smoothing algorithms are designed to handle
Kinect-like depth noise, howe ver depth images estimated from stereo images are prone
to a higher intensity of depth noise due to estimation mis-match (i.e. sudden change in
estimated surfaces) or te xture-less or self-similar en vironment. Ranftl et al. (2012) proposed
a stereo model featuring a second-order re gularizer which reduces estimation errors and
produces smooth depth images. Balzer and Soatto (2013) proposed a similar optimization
technique in an iterati v e fashion to smooth surf ace deformities in multi-vie w stereo image
based 3D reconstruction. Graber et al. (2015) argued that unconstrained total v ariation based
re gularization techniques are someho w prone to produce staircase artifices (Figure 2.17.b) since
the y ov erlook 3-dimensional geometry of the percei ved en vironment.
Edge-a ware bilateral smoothing techniques and their v ariants are robust in nature
ho we ver lack the capability to handle high-intensity noise. Contrarily , total v ariation based
re gularization methods respect geometry of surface at the e xpense of computational complexity .

24 2. Related W ork
Figure 2.17: a) Ground truth surface, b) T otal v ariation regularization and c) Minimal-surf ace
re gularization by Graber et al. (2015).
Therefore, a nov el depth smoothing mechanism is needed to handle both kinect-lik e noise and
stereo estimation depth noise while maintaining a lo w computational comple xity profile.
2.4 Incr emental 3D Fusion
Curless and Le v oy (1996) proposed a no v el implicit v olumetric method tar geted to support
the reconstruction of comple x models from range images. The potential and simplicity of
v olumetric method to handle incremental updates in the form of range images moti vated
v arious researchers to e xtend the core concept to utilize modern computational resources such
as Ne wcombe et al. (2011), Whelan et al. (2012) and Kähler et al. (2015) etc. In principal, pre-
aligned range images are represented and updated incrementally as weighted signed distance
functions stored in a predefined v ox el grid. Multiple error -prone observ ations of the particular
re gion of interest from either single or multiple-vie ws reduces acquisition noise and result in
a high quality approximation of 3D object. An underlying v olumetric representation method
transforms the range image R i to a signed distance function v alue d i ( x ) from a surf ace and
weights w i ( x ) . A simple truncation mechanism ensures that v alues of d i ( x ) are bounded
within in D min and D max , this truncation plays a vital role in determining the proximity of
a particular v ox el near suspected surface. The implicit surface (also kno wn as zero-crossing)
can be e xtracted by casting a ray from the sensor position to each v oxel and re gistering a zero-
crossing as sho wn in Figure 2.18.a.
Incremental updates on the v olumetric grid are carried by follo wing equations:
D i +1 ( x ) = W i ( x ) + D i ( x ) + w i +1 ( x ) d i +1 ( x )
W i ( x ) + w i +1 ( x ) (2.9)

Incremental 3D Fusion 25
a) b)
Figure 2.18: a) A range surface along x-axis from sensor position b) tw o range-surface are
inte grated to form a ne w zero crossing.
W i +1 ( x ) = W i ( x ) + w i +1 (2.10)
where D i +1 ( x ) and W i +1 ( x ) are cumulati v e signed distances and weight functions for all v alid
v oxels x ∈ R 3 after inte grating the i th range image as shown in Figure 2.18.b .
The v olumetric nature allo ws this representation scheme to inte grate multi-vie w range
images to form a consistent 3D model. This beha vior is demonstrated in Figure 2.19 in which
two separate cross-sections of v olumetric SDF data 2.19.a and 2.19.b are integrated using
Equation 2.9 and 2.10. Usually , a space-carving procedure is applied to identify potential
v oxels foll o wed by the iso-surface e xtraction to render 3D models. Since contents of the vox el
grid are updated in an incremental fashion, Curless and Le v oy (1996) suggested to emplo y a
fast marching cube algorithm (Lorensen and Cline (1987)) which can be initiated on demand.
Main dra wback of using a dense volumetric grid for the incremental 3D reconstruction
comes from the e xcessi ve use of memory and computational requirements as described in
Section 2.1.3. State-of-the-art v olumetric reconstruction framew orks utilize multi-threaded
architecture of modern CPU and GPU to facilitate lar ge scale en vironment. Rele v ant
and notable contrib utions which extends the capability of v olumetric method to large-scale
en vironment and real-time modeling are briefly re vie wed and their performance is e valuated in
Section 2.6.

26 2. Related W ork
Figure 2.19: a and b) SDF v alues from multiple vie ws in cross-section of v olumetric grid c)
inte grated SDF v alues to form compound iso-surface Curless and Le vo y (1996).
a) b)
Figure 2.20: a) Slice of SDF v olume demonstrating potential truncation mechanism and b)
o verall 3D v olume (Newcombe et al. (2011)).
Ne wcombe et al. (2011) e xtended the concept of incremental volumetric 3D fusion with
real-time camera pose estimation using Iterati ve Closest Point (ICP) tracking. This extension
enabled lo w-cost depth scanning de vices such as Kinect to reconstruct small scale en vironment
as sho wn in Figure 2.20 where a slice through the signed distance function F highlights the use
of truncation mechanism (i.e. v alidity criteria for each vox el v : µ ≤ v ≤ − µ ). The surface of
v olumetric data is e xtracted with the help of ray-casting from vie wing camera as suggested by
Curless and Le v oy (1996).
The use of GPU processing and memory enabled KinectFusion to reconstruct a 3D model
of the en vironment in an online fashion. In principal, GPU processing threads are designed
to perform simplified and repetiti v e tasks using massi vely parallel processing architecture.
This mechanism restricted KinectFusion to work in small-scale en vironments. Ne wcombe
et al. (2011) further suggested to use frame-to-model tracking for the pose estimation and

Incremental 3D Fusion 27
reconstruction, this localization mechanism utilizes a v ailable memory in an optimal way while
producing globally consistent surfaces as sho wn in Figure 2.21 where the sensor is rotated
around the area of interest.
Figure 2.21: Camera tracking information visualized around region of interest (left) and
reconstructed 3D model (right) (Ne wcombe et al. (2011)))
Roth and V ona (2012) presented a nov el memory ef ficient approach of a mo ving TSDF
v olume from one location to another with respect to the camera moment, this allo ws an
acti v e TSDF v olume to be remained in fast acting memory while inacti ve parts can be
mo ved out of the memory on-demand. This technique is targeted to accommodate sensor
mo vement and reconstruction, ho wev er a rigid transformation combined with mov ement of
TSDF v olume accumulates localization drift and may result in multiple copies of misaligned
surfaces. Figure 2.22 demonstrates the working of mo ving volume approach where the initial
TSDF v olume (left) is remapped to align with updated TSDF v olume (middle) using a fixed
relati v e transformation.
Figure 2.22: Initial TSDF v olume (left), updated TSDF volume after inte gration (middle) and
mo vement tracking information (right) (Roth and V ona (2012)))
Whelan et al. (2012) proposed Kintinous as an e xtension of the KinectFusion which
uses incremental triangular meshes in addition to v olumetric mapping to handle lar ge-scale

28 2. Related W ork
reconstruction, Figure 2.23 demonstrates the ef fecti veness of Kintinous to reconstruct relati vely
lar ge-scale en vironment consisting of a two story apartment ha ving multiple rooms.
Figure 2.23: Lar ge-scale 3D reconstructed model of apartment from Kintinious (Whelan et al.
(2012))
Nießner et al. (2013) introduced a no vel v oxel hashing data structure tar geted to achie ve
real-time management of implicit v olumetric surfaces in the forms of v ox el blocks from
GPU’ s memory and processing resources. The proposed streaming in/out mechanism for vox el
blocks eliminated spatial restrictions from 3D reconstruction while retaining the quality of
reconstructed 3D models from de gradation. An efficient GPU accelerated hash table is used to
allocate v oxel blocks in the proximity to surf ace geometry , each v oxel block is accessed using
an inte ger world coordinate ( x, y , z ) . All acti ve w orld coordinates ( x, y , z ) are mapped to hash
v alue H ( x, y , z ) using the hashing function :
H ( x, y , z ) = ( x.p 1 ⊕ y .p 2 ⊕ z .p 3) modn (2.11)
where p 1 , p 2 and p 3 are lar ge prime numbers and n is hash table size. A strict streaming
in/out mechanism which checks each v ox el block against the camera frustum is responsible of
the data management, this ensures that acti ve v oxel blocks remain in the f ast acting memory

Incremental 3D Fusion 29
Figure 2.24: Streaming in/out process of vox el blocks from Nießner et al. (2013)
while dormant v oxel blocks do not consume memory resources. Figure 2.24 demonstrates this
streaming in/out process in which the camera is mo ving from left to right and respecti ve v oxel
blocks are flagged accordingly .
The concept of v oxel block hashing inspired Kähler et al. (2015) and Steinbrueck er et al.
(2014) to implement v ery fast state-of-the-art real-time 3D reconstruction named InfiniT AM
and F astFusion respecti vely . All aforementioned incremental 3D fusion techniques including
state-of-art frame works rely on either the quality of depth measurements or the amount of
samples for a specific surface re gion from RGB-D cameras such as Kinect and Kinect v2
ho we ver dealing with error -prone depth data specially from stereo cameras remains a serious
concern. Furthermore, the underlying core principle of weighted volumetric inte gration using
Equations 2.9 and 2.10 remained unchanged o ver past tw o decades. Since the original concept
of v olumetric integration w as designed to handle range images with minimal surface noise,
the resulting frame works struggles to accommodate depth images with high depth noise such
as from stereo cameras etc. This pro vided the main moti v ation for the proposed research to
handle depth noise using total v ariation based filtering in a recursi ve manner .

30 2. Related W ork
Figure 2.25: T ypical point cloud with highlighted depth outliers
2.5 Depth Outliers r emo val
Incorrect estimations of depths, in general depth outliers, are a common phenomena and almost
all 3D scanners are prone to this challenging problem. Laser based 3D scanners such as LiD AR
produce depth outliers when the surface on the object of interest e xhibits reflecti v e properties.
Similarly , acti v e depth sensors which use pattern projection and detection such as Kinect
produce incorrect depth measurements when the surface of fore ground meets background,
similar depth deformities ha v e been identified when the depth image is subjected to ov er-
smoothing by a depth filter . All aforementioned depth outliers pose serious problems for
both camera pose estimation and 3D reconstruction. Figure 2.25 sho ws a typical 3D point
cloud ha ving depth outliers at edges of the table where the sensor detected tw o surfaces ha ving
dif ferent height profiles.
Depending upon the spatial proximity of the depth outlier with respect to the actual surface,
an outlier can be classified as sparse, isolated or non-isolated outliers. Rusu et al. (2008)
proposed a simplified Statistical Outliers Remov al (SOR) method to tar get sparse and isolated
outliers which calculates the distance of each point along its K neighbours in the first pass.
Mean µ and standard de viation σ of accumulated distances are then calculated to determine
appropriate distance threshold using follo wing formula:
T hr eshol d = µ + α ∗ σ (2.12)

Depth Outliers remo v al 31
Figure 2.26: Statistical measures to detect outliers
where α is a user defined parameter to control o verall distance threshold. This process is
illustrated in Figure 2.26. In the second pass, all 3D points are classified as either inliers or
outliers depending upon the distance threshold. Although this SOR method is highly ef fecti ve
in remo ving both sparse and isolated outliers, the first pass of the SOR method uses time
consuming memory lookups to access neighbouring 3D points. This time consuming profile of
the process poses serious concerns when using this scheme with real-time 3D reconstruction
system. F or instance, processing a typical 640 x 480 pixel depth image captured from Kinect
RGB-D camera using SOR filter can take approximately 650 milliseconds to detect outliers
using a SOR filter .
W ang and Feng (2015) proposed a majority v oting based algorithm to tar get all types of
outliers. Such v oting based algorithm remov es outliers with precision at the cost of high
computational comple xity . T echniques having a computationally e xpensiv e profile and lacking
real-time processing such as W ang and Feng (2015) and Zhang et al. (2016) are not considered
for the quantitati v e comparison with the proposed research.

32 2. Related W ork
2.6 K ey Considerations
In order to summarize the literature re vie w and establish a potential baseline, all
aforementioned shape representation and Incremental 3D fusion techniques are e v aluated
separately with four e v aluation criteria, which are:
1. Generality: Ability of the technique to reconstruct both comple x and simple shapes.
2. Rob ustness: Ability to handle strong noise and outliers.
3. Computation Speed: Processing time and computational complexity of the technique.
4. Accuracy: Ov erall accuracy of the technique.
T able 2.1 summarizes re vie wed shape reconstruction methods in the light of ev aluation
criteria, the plus sign indicates whether a particular technique is adequately fulfilling a certain
criterion. These findings hav e been deri ved mainly from the literature, publicly av ailable source
code and in certain cases by direct communication with authors. In some cases, computation
speed metric is e v aluated with the help of the open-source tool Meshlab de veloped by Cignoni
et al. (2008) which contains fast implementations of v arious re vie wed algorithms.
All re vie wed techniques which do not consider prior information were found incapable of
handling either noisy or sparse samples, with the one e xception of SDF by Ne wcombe et al.
(2011) which uses a stochastic con v ergence property to reduce noise at the e xpense of v ery
high storage requirements.
T echniques which use Regularity Priors are designed specifically for the targeted
application such as v ehicles and building f acades, ho we v er this lack of generality constraints
the application domain.
Both Local and Global smoothness priors were observ ed to produce accurate shape
approximation and reconstruction, thus a large number of applications can inte grate smoothness
information to increase the quality of the 3D reconstruction. Local methods were found to
produce faster parallel computations, ho wev er they are less suitable for high-noise or sparse 3D
samples. Here, global methods such as P oisson and SSDF by Kazhdan and Hoppe (2013) and
Calakli and T aubin (2011) respecti v ely significantly outperformed and produced comparati v ely

K e y Considerations 33
T able 2.1: State-of-the-art 3D shape reconstruction approaches
T echnique 1: Generality 2: Rob ustness 3: Speed 4: Accuracy
No Priors
α shapes - Edelsbrunner and Mücke (1994) + + + +
BP A - Bernardini et al. (1999) + + + + +
SDF - Ne wcombe et al. (2011) + + + + + + + +
Regularity Priors
Clustering - P auly et al. (2008) + + + +
Subspace tension - Berner et al. (2011) + + + + + +
Learning Clusters - Y ingze Bao et al. (2013) + + + +
Local Smoothness Priors
MLS - Ale xa et al. (2001) + + + + + + +
Point blending - K olluri (2008) ++ +++ ++
APSS - Öztireli et al. (2009) ++ +++ ++
MPU - Ohtake et al. (2005) + + + + + + + +
Global Smoothness Priors
RBF - Carr et al. (2001) + + + + +
Graph Cut - Hornung and K obbelt (2006) + + + + +
F ourier - Kazhdan et al. (2005) + + + + + + +
W a velet - Manson et al. (2008) + + + + + +
Poisson - Kazhdan and Hoppe (2013) + + + + + + +
SSDF - Calakli and T aubin (2011) + + + + + + +

34 2. Related W ork
accurate 3D models, howe ver the requirement and assumption of closes surfaces reduced the
generality of these approaches.
T able 2.2 pro vides e v aluation insight for v arious state-of-the-art 3D fusion frame works.
Since all presented 3D reconstruction frame w orks are deri ved from v olumetric integration by
Curless and Le v oy (1996), the robustness metric v alue is more or less the same. Ho we ver
dif ferences in computation speed and accurac y are directly dependent upon external factors
such as the camera localization algorithm, scale of reconstruct etc. W e also found that point
based 3D fusion techniques ha ve more generic profile and support dynamics of en vironment
ho we ver lack behind in handling high depth noise or sparse samples by stereo sensors.
T able 2.2: Incremental 3D Fusion frame works
Frame work 1: Generality 2: Rob ustness 3: Speed 4: Accuracy
V olumetric 3D Fusion
InfiniT AM + + + + + + + + + +
F astFusion + + + + + + +
KinectFusion + + + + + +
V oxel hashing + + + +
MonoFusion + + + + +
Real-time v ol rec + + + +
P oint Based 3D Fusion
Point-based Fusion + + + + +
ElasticFusion + + + + + +
2.7 Summary
This chapter re vie wed both legac y and state-of-the-art shape representation techniques in order
to determine a suitable 3D reconstruction candidate with incremental fusion capabilities from
error -prone 3D samples. W e found that although simplex es representations is designed to
utilize modern rendering frame w orks, ho we ver their inability to handle error -prone 3D sample
data makes these techniques unsuitable for real-time incremental reconstruction. Parametric
representation methods were found suf fering greatly due to their comple x computational
profile. Surf ace splatting techniques were found to accommodate dynamic changes in
en vironment, ho we ver a lar ge number of depth samples are required to reduce noise af fects.

Summary 35
This constraint directly af fect camera mo vements and mak es surface splatting infeasible for
lo w-frame rate depth sensors such as IPS (Grießbach et al. (2014)) which captures 10 frames
per second. Implicit representations, more specifically volumetric 3D fusion methods produced
promising noise handling beha viour , ho wev er the stochastic con ver gence property for noise
remo v al depends directly on the quality of the depth measurements.
Despite gradual noise remo v al properties of the v olumetric 3D fusion, handling of depth
noise in 3D samples remains a serious concern and hence specialized image based edge
a ware depth smoothing schemes were re vie wed briefly and found that total v ariation based
filtering based depth map smoothing filters produced promising results. Depth outliers remo v al
techniques ware briefly discussed and found that statistical methods which use spatial proximity
information ef fecti vely to reduce outliers, ho we v er high memory access time in underlying
mechanisms makes such methods unsuitable for real time.

36 2. Related W ork

Chapter 3
Methodology
This chapter presents standard analysis practices and performance benchmark criteria adapted
to e v aluate the proposed contributions. In addition, technical requirements along with various
datasets are briefly described to establish a minimal working en vironment and to reproduce
findings. Initially , the structure of proposed frame work is introduced which presents a high
le v el understanding of the data transformation at each process. Secondly , v arious datasets along
with standard quantitati v e and qualitati v e measures used for e v aluation and v alidations are
outlined. Finally , technical requirements such as the input data type, computational resources,
de v elopment en vironment and software required to visualize and e v aluate output are briefly
described.
3.1 Framew ork Structur e
The proposed frame work in this thesis refers to a systematic 3D reconstruction pipeline which
employs proposed contrib utions for high quality models from input depth and color data. The
frame work is designed to e xhibit generic traits in terms of input data, which can be acquired
from acti v e RGB-D cameras, passiv e stereo based depth scanners or 3D laser scanning systems.
Usually , laser based 3D scanners produce relati v ely accurate depth information, ho we ver main
moti v ation is to process error-prone depth data captured from a lo w-cost acti ve or passi ve
sensor , which would enable mobile robots to percei ve accurate 3D information in a cost-
ef fecti ve manner .
The o verall structure of the frame work is illustrated in Figure 3.1 which highlights the data
37

38 3. Methodology
Figure 3.1: Frame work applied on RGB-D image stream
transformation by each indi vidual process for RGB-D data. Processes P0 and P1 are sho wn
in shaded-gray as the y are not part of the presented frame work. In processing unit P0 an
appropriate depth con v ersion is employed to con v ert depth images in to a standard format.
State-of-the-art localization methods such as ORB-SLAM2 (Mur -Artal and T ardós (2015)) or
ICPCUD A (Whelan (2018)) can be employed for e go-motion sensor tracking in P1 . Depending
upon the type of 3D information gi v en as input, one or more processes of the frame work are
not utilized. For e xample, in case of using the stereo based IPS sensor 1 , processes P0 and
P1 are not applied since IPS inherently performs stereo matching (i.e. depth con version) and
e go-motion localization and resulting depth, color and camera pose are fed directly to P2 .
Assuming that an acti v e RGB-D sensor such as Kinect RGB-D camera is used as an
input, the first process P0 con v erts each depth pixel d i ( r ow , col ) from i th time-stamped raw
depth image to an appropriate distance using a standard non-linear function (as suggested by
OpenKinect.or g (2018)):
D i ( r ow , col )= 1
d i ( r ow , col ) − 0 . 00307 + 3 . 3309 (3.1)
1 Integrated Positioning System (IPS) is a stereo based depth estimation system designed to assist in 3D
na vigation and reconstruction.

V alidation and Ev aluation 39
where D i ( r ow , col ) is a distance of a particular depth sample from the camera origin in
millimetres (OpenKinect.or g (2018)). The resulting depth image D i and the pre-re gistered
color image C i are processed in P1 which localizes the sensor position in world coordinates
with the help of an e go-motion estimation. Process P2 uses the camera pose information
pose i and the intrinsic matrix K to register each v alid depth pix el D i ( r ow , col ) in a world
coordinate system. Process P3 encapsulates the core method for producing an accurate 3D
shape reconstruction, which contains three ke y contrib utions: 1) implicit shape approximation,
2) rob ust statistical outlier detection and 3) integration of a priori information to reduce noise
inherently . Since the frame w ork is designed to work with streams of images, each ne w
input instance in v oke P0 → P3 in an iterati ve f ashion and the implicit model d4 is updated
incrementally . Finally , P4 uses a standard marching cube algorithm (see Lorensen and Cline
(1987)) to process the implicit model and to produce globally consistent 3D models in the form
of meshes d5 as output.
3.2 V alidation and Evaluation
The actual implementation of the proposed fusion frame work is carried out in C++
programming language which is preferred among others due to highly extendable functionality
with the help of publicly a v ailable libraries and its real-time processing profile. In order
to v alidate v arious aspects of the implementation, unit-tests ha v e been employed to ensure
that lo w-le vel optimization tasks such as matrix manipulations and image con v ersion are
correct. The data-based e v aluation tests ha ve been emplo yed to ensure that the ov erall shape
reconstruction algorithm is accurate and competiti v e.
3.2.1 Evaluation Framew ork
The accurac y of the reconstructed model and the processing time taken by the fusion frame work
are considered as standard performance metrics for shape reconstruction as suggested by
Strecha et al. (2008). The comparison of the processing time among two functionally identical
fusion frame w orks is a straightforward process, ho we ver v arious dev elopment traits such as use
of GPU computing or limiting the memory allocation for the abounded reconstruction pro vides
a biased fa vour to a particular technique. Henceforth, state-of-the-art framew orks along-with

40 3. Methodology
Figure 3.2: The process of acquiring quantitati ve measures among reconstructed and ground
truth 3D models
the proposed frame work are subjected to a v ariety of e v aluation aspects such as accurac y ,
functionality and processing time to establish a comprehensi v e profile of each technique.
The accurac y of the reconstructed model naturally in volv es comparing a resulting model
against an a priori kno wn result, called gr ound truth model . Unfortunately , acquiring ground
truth models for lar ge-scale 3D reconstructions is a tedious and careful process in volving
laser guided depth data collection in the form of a point cloud follo wed by high-quality
mesh generation. Although careful measurements combined with a time-consuming shape
approximation process generates a high-quality 3D model, the resulting estimated ground
truth model is still expected to contain approximation errors. Ground truth models for large-
scale en vironments are usually not a v ailable. Synthetic en vironments and relativ e models are
therefore preferred since the y provide v ery accurate measures.
The e v aluation framew ork is a process pipe-line which produces quantitati v e measurements
gi v en approximated 3D models and ground truth. The assessment frame work is summarized in
figure 3.2 and three underlying processes perform the follo wing tasks:
• P0: Gi v en a reconstructed 3D model in the form of a 3D mesh, appropriate Scaling ,
Rotation and T ranslation operations are applied to mak e the gi ven model consistent with
the ground truth model.

V alidation and Ev aluation 41
• P1: The Ground truth model is sampled with at least 10 6 points to ensure that surface
contours are captured re gardless of the shape and size of the model in the sampled 3D
point cloud d3 .
• P2: Each point in d3 is re gistered to a closest polygon in the reconstructed 3D model
and a perpendicular distance is recorded. This distance when a veraged, provides fi ve
quantitati v e measures (mean, median, standard de viation, min and max distance) and are
considered as standard e v aluation criteria as suggested by Handa et al. (2014). This
assessment is carried in Cloud-Compar e (Girardeau-Montaut (2015)) which is freely
a v ailable open-source software.
3.2.2 P erf ormance Metrics
In order to inspect the performance of the proposed fusion frame work, a numerical e v aluation is
employed to collect appropriate performance metrics which allo w meaningful information and
conclusions to be extracted from the shape approximation process. In principal, the Euclidean
distance is used between the reconstructed surface and ground truth sampled data as suggested
by Ber ger et al. (2013). F or each sample, an absolute distance from d3 to the closest polygon
from d1 is computed in process P2 . This error measurement is used as a primary source to
e xtract more significant statistical measures such as error histograms and cumulati ve error
distrib utions. A similar approach has been applied to e v aluate multi-vie w stereo reconstruction
techniques by Strecha et al. (2008). Follo wing performance metrics are applied:
• V isual inspection: Real-time 3D reconstruction results are visualized using the OpenGL
renderer . Ho we v er once the reconstruction is completed and the model is stored in
memory , intensi ve inspection is carried out and appropriate features of the model are
captured using Meshlab .
• Absolute surface err or: Absolute error measurements collected from the process P2
are projected onto the reconstructed model in color coded error maps (also referred as
heat-maps ) to visualize spatial de viations by the reconstruction process. This process is
sho wn in Figure 3.3 where the reconstructed model is compared against sampled ground
truth 3D points to calculate basic quantitati v e measures and absolute surface errors.

42 3. Methodology
Common statistical visualization standard tools (see T ufte (1990)) such as median and
v ariance, error histograms (Figure 3.4.a) and cumulati ve error distrib ution plots (Figure
3.4.b) are generated.
• Mean, median and variance: After computing the absolute surface errors between
d1 and d3 , P2 summarizes the error distrib ution information in mean and median
v alues. Median Distance Error (MDE) is calculated by sorting all non-zero distances
in ascending order and taking the central sample from the sorted list. In some special
cases, MDE is selected ov er mean v alue as a primary error descriptor , since it is robust to
lar ge outliers in a particular spatial location. In cases where relati v ely large absolute
surface occur at one particular position, MDE metric is less lik ely to reflect drastic
changes. Furthermore, mean and v ariance error metrics are computed to further analyse
distrib ution.
• Statistical measur es: Probability density functions are considered as a de-facto metrics
to e v aluate the nature of a random process. Since the computed error -distrib ution is
discrete in nature, histograms are used to represent ho w well the reconstructed shape is
aligned to the ground truth model. The peak of the histogram represents most frequent
errors made by the reconstruction process, ho we v er the comparison of two histograms
is a non-intuiti v e and relati v ely dif ficult process. This is addressed by calculating a
cumulati v e distrib ution from each histogram which can then be used to analyse in an
intuiti v e manner . Figure 3.4 illustrates the relation between a histogram and a cumulati ve
error distrib ution. An optimally superior method would rise sharply to the 100% while a
lo w-quality technique will produce a gradually increasing curv e.
• Runtime: The time taken by an algorithm to accomplish a computational task is usually
measured in milliseconds using the CPU clock. This empirical metric is used to e v aluate
the processing time taken by the implemented algorithm. All experiments ha ve been
performed on a desktop computer ha ving follo wing specifications:
– Intel Core i7-4790
– Nvidia Quadro K620 2
2 Used only to e v aluate InfiniT AM by Kähler et al. (2015).

Datasets 43
– 8 GB RAM
– W indo ws 7 (64-bit) and Linux 14.04 operating system.
Using a high performance computer will accordingly enhance the runtime performance
metric.
Figure 3.3: Process of calculating error distance between sampled ground truth 3D points and
the reconstructed model and resulting an absolute surface errors in a color coded error map.
Figure 3.4: T ypical error histogram (left) and cumulati ve error distrib ution (right).
3.3 Datasets
3.3.1 Synthetic Piecewise Function
Main moti v ation behind this research is to perform a reliable 3D implicit fusion while handling
ef fects of depth noise. It is therefore beneficial to critically e v aluate the performance initially on
a synthetic piece wise 2D function. This initial testing phase allo ws an ef fortless visualization

44 3. Methodology
a) b) c)
d) e) f)
Figure 3.5: Synthetic 3D and 2D function represented as point cloud follo wed by respecti ve
implicit representation with and without added depth noise.
which enables rapid prototyping of the solution de velopment. A replicated v ersion of a 2D
piece wise function in the third dimension is used to v alidate a response of the proposed
frame work with additi ve noise in a 3D en vironment. Figure 3.5 illustrates both 2D and
3D functions (represented with points) with respecti v e v olumetric implicit surfaces with and
without additi v e depth noise.
Main benefit of using synthetic functions is that parameters such as additi ve noise and scale
of reconstruction can easily be controlled. The model contains gradual curv es, planar areas and
sharp edges so that the performance of the proposed frame work can be e v aluated in detail.
3.3.2 Synthetic 3D Complex En vir onment
There e xist a v ariety of publicly a v ailable synthetic 3D models which can be used to generate
synthesized camera mo vements and characteristics, howe ver the standard RGB-D dataset by
ICL-NUIM (Handa et al. (2014)) was selected to obtain un-biased results. This dataset is
specifically selected to reflect a broad range of requirements often needed by the 3D fusion
frame work. Thus, an algorithm performing well on the simulated en vironment is e xpected to
perform well in a realistic datasets obtained from either acti ve or passi ve depth sensors, e ven if

Datasets 45
Figure 3.6: The interior of a synthetic li ving room scene (color remov ed to highlight geometry)
the ground truth model is not present for quantitati v e e v aluation.
The dataset contains four distinct trajectories of the living-r oom en vironment simulating
repetiti v e loop closure, fast and slo w mo ving camera mov ement along with two dif ferent sets
of depth image streams which simulate noisy depth measurements from a Kinect-like RGB-D
camera and clean depth images. The simulated living-r oom en vironment consists of challenging
micro and macro objects ha ving planar , sharp and gradual curv atures. Figure 3.6 shows two
rendered vie ws highlighting the challenging comple x geometry of the scene. Both clean and
depth images which are corrupted with noise were used to e v aluate the proposed framew ork.
Results are presented in Section 6.
3.3.3 Realistic 3D Complex En vir onment
One of the main moti v ation behind this research is to dev elop a generic fusion framew ork
with controlled re gularization parameters to accommodate v ariety of depth sensing systems.
Therefore, three distinct realistic benchmarking datasets Compr ehensive RGB-D Benchmark
for SLAM (CoRBS) (W asenmüller et al. (2016)), KITTI vision benchmark suite (Geiger et al.
(2013)) and IPS dataset (Grießbach et al. (2014)) were selected to demonstrate the flexibility
of the proposed frame w ork. Unfortunately , ground truth models for KITTI and IPS datasets

46 3. Methodology
are not a v ailable, therefore screenshots of the reconstructed model are provided for qualitati ve
comparison in Section 6. Characteristics of each datasets are:
• IPS-Dataset: The dataset has been captured from a passi v e depth acquisition system
which uses stereo cameras to capture time-stamped images. Re gistered images are
processed with the Semi Global Matc hing (SGM) algorithm (Hirschmuller (2005)). Ego-
motion information acquired from b uilt-in Inertial Measur ement Unit (IMU) are fused
with a visualSLAM algorithm to produce a globally consistent camera pose information.
IPS is capable of producing approx. 10 instances containing depth, color and camera
tracking information. Figure 3.7 sho ws sample depth and color images acquired from
the IPS depth sensor system. IPS is capable of transmitting real-time depth sensing
via TCP/IP communication for real-time processing, ho we ver to f acilitate a repetiti v e
e v aluation process, trajectories ware recorded and processed with the fusion frame work
Grießbach et al. (2014).
Although IPS pro vides high-accuracy localization information with the help of multi-
sensor fusion and b undle-adjustment, estimated depth measurements suf fer from strong
estimation noise and outliers caused by reflections, v arying illuminations and fast
camera mo vements. Three scenes mine , corridor1 and corridor2 hav e been recorded
for benchmarking that represent lar ge-scale scenes with challenging en vironmental
conditions. Corridor1 and corridor2 scenes demonstrate non-te xtured surfaces,
occlusions and dif ficult lighting conditions which further contrib utes in depth noise and
outliers, while mine scene highlights lo w illumination conditions with a fast mo ving
camera.
• CoRBS: The dataset contains four distinct scenes captured by latest Kinect v2 RGB-D
sensor . Unlike state-of-the-art RGB-D datasets which only provide a camera trajectory
as the ground truth, CoRBS provides high-quality 3D ground truth models captured by
projecting light patterns onto a surface (see Figure 3.8). Camera trajectory information
is captured by an e xternal motion capturing system with sub-millimeter precision. The
resulting trajectories and ground truth models are aligned to a global coordinate system
to further simplify the e v aluation process. Thus CoRBS is selected to demonstrate work
of proposed frame work using no vel RGB-D depth sensor system.

Datasets 47
Figure 3.7: Arbitrary sampled instances of registered color and depth images from IPS sensor
Figure 3.8: Light pattern projection based range image acquisition using tw o cameras
(W asenmüller et al. (2016)).
• KITTI vision benchmark suit: The dataset is captured from a standard vehicles
mounted with a 360 o rotating laser scanner , tw o pairs of stereo cameras, and IMU
module capable of capturing 10 samples per second to facilitate benchmarking series
of computer vision research problems such as stereo matching, optical flo w , e go-motion
estimation and object tracking. Figure 3.9 illustrates the sensing equipment mounted on
AnnieW A Y . Localization information in the form of sensor mo vement for trajectories are
also pro vided as the ground truth for testing visualSLAM algorithms. Since main focus
of the research is to facilitate a 3D shape reconstruction, we emplo y only stereo images
and laser scanner data with estimated camera pose from state-of-the-art visualSLAM
algorithm to demonstrate a real-time 3D reconstruction scenario with the proposed fusion
frame work.

48 3. Methodology
Figure 3.9: AnnieW A Y with mounted multi-sensor system set-up.
3.4 3D Sensors
Mobile robotic applications such as automated drones, unmanned vehicles etc are usually
constrained to pro vide limited amount of po wer to v arious sensors. Therefore, such mobile
robots are generally equipped with lo w-po wer consuming sensors such as RGB-D cameras
connected to a mobile computing de vice which transmits the captured depth and color image
streams to the high-performance computer . Pro vided that enough network bandwidth is
a v ailable to send the image stream, the processing computer can either apply an on-line 3D
fusion approach or store the image stream for of f-line processing.
F ortunately , autonomous vehicles are capable of pro viding suf ficient po wer and
maneuv erability , this allo ws them to utilize multiple 3D sensors and high performance
computing on the go. Although autonomous vehicles are e xpected to na vigate in a v ariety
of lighting and en vironmental conditions, a multi-sensor set-up ensures that not all sensors are
af fected by a particular lighting condition.
Acti v e depth sensor systems especially Kinect, ASUS Xtion Pro and Kinect v2 are designed
specifically for in-door en vironments where lighting conditions are either re gulated or are kept
deterministic, ho we ver their ef ficiency to percei ve the en vironment decrease drastically with
certain lighting conditions. Furthermore, pattern based acti ve depth sensors are prone to highly
illuminated surfaces, while time-of-flight sensors are prone to introduce surf ace estimation
errors on darker surf aces. Although activ e depth sensors provide high frame-rates (i.e. approx

Summary 49
30fps), their incapability to percei ve distant objects restricts their usability in f ast moving out-
door en vironments.
P assi ve depth sensors which utilize stereo images and disparity estimations pro vide long
range sensing capability to mobile robots. In principal, the quality of estimated depth’ s in stereo
matching algorithms rely hea vily on the base line textures of surf aces. Since depth estimation
is a computationally e xpensi ve process, special System-on-Chip (SoC) or GPU based portable
computing nodes are required to facilitate real-time depth estimation.
Laser based 3D scanners are a more suitable choice for autonomous v ehicles since the y
pro vide high-quality depth measurements without interpretation. Ho wev er these scanners
usually pro vide sparse depth measurements which are not suitable for 3D reconstruction since
each measurement has a dif ferent orientation (i.e. an external projection is required to interpret
these measurements into a depth map). Furthermore, these laser scanners are high cost and
prone to measurement errors in-case two similar sensors are scanning the en vironment.
T able 3.1 summarizes aforementioned characteristics of depth scanners and suggests a
suitable application en vironment.
3.5 Summary
This chapter presented a research methodology and standard practices used to e v aluate and
analyze performance aspects of the proposed research. The generalized structure of fusion
frame work is introduced which subdi vides ov erall processing tasks into a process pipe-line,
this modular design allo ws ef fortless v alidation and testing of the data at dif ferent stages of
the pipeline. Standard quantitati ve and qualitati ve e v aluation metrics ha ve been introduced.
Finally , both synthetic and realistic datasets hav e been briefly described which highlights
applications of the proposed research in a real-time 3D reconstruction en vironment. The
follo wing chapter will introduce theoretical aspects of the proposed research contrib utions.

50 3. Methodology
T able 3.1: 3D depth sensors and their respecti ve characteristics
Sensing T echnology Sensor Image Pr os Cons
P attern projection Kinect - Lo w cost
- High frame rate
- Short range
- Sensiti v e to high
illumination
ASUS Xtion Pro
T ime of flight Kinect v2 - Lo w noise
- High frame rate
- Short range
- Inability to detect
dark color objects
Stereo IPS
- long range
- Suitable for
outdoors
- Produces camera
trajectory
- Lo w frame-rate
- Inability to detect
dark color objects
Laser LiD AR - long range
- Suitable for
outdoors
- Sparse sensing
- V ery high cost
- Hea vy
- Lo w frame-rate

Chapter 4
Fundamentals of V olumetric 3D
Integration
Chapter 2 identified the ef fecti veness of an implicit representation for error -prone depth images
for an incremental 3D inte gration and reconstruction. Furthermore, it was also found that
employing smoothness prior information can be used to reduce noise artef acts. This chapter
will pro vide in-depth analysis and theoretical background of the underlying v olumetric fusion
process to identify potential challenges of employing smoothness prior in inte gration process.
Moreo ver , the core concept of the implicit fusion is analysed with the help of v arious error -
prone synthetic SDF signals. Presented concepts of legac y v olumetric inte gration are expected
to serv e as theoretical foundation for proposed contributions in Chapter 5.
Initially , implicit representations of volumetric inte gration are formally introduced and
related properties are discussed. Secondly , incremental aspects of the dense 3D fusion are
analysed by fusing simulated noisy depth signals. Finally , the rationale for using sparse or semi-
dense v oxel grids are presented which highlights trade-of fs between computational complexity
and quality of the reconstructed model.
4.1 Signed Distance Function
The Signed Distance Function (SDF), also referred to as Distance T ransform is a basic building
block for visualizing and processing v olumetric 3D data. In computer graphics community ,
51

52 4. Fundamentals of V olumetric 3D Integration
SDF is commonly used to accelerate the rendering process for high-quality 3D models
(Hart (1996)), ho we ver state-of-the-art reconstruction frame works employ SDF to inte grate
incremental updates from depth cameras. In principal, to render a SDF v oxel-grid, a non-zero
crossing of SDF along a vie wing ray is considered as implicit surface (also referred to as iso-
surface ).
4.1.1 Definition
T o elaborate the aforesaid definition in a formal construct, consider a mapping function
D ( x ) : R n → R , (4.1)
which transforms n -dimensional space to a scalar v alue. Since our tar get application domain
is 3D space we assume n = 3, ho we v er a 2D analogy is employed for illustrati ve purposes.
Assuming an implicit surface of a circle ha ving the radius r = 5 in τ units defined by
x 2 + y 2 = r 2 , (4.2)
Assuming that g ( x, y ) denotes the euclidean distance from the origin, the implicit surface for
such set-up can be defined by
g ( x, y ) − r = 0 , (4.3)
which ensures that each v ox el at location v = [ x, y ] t contains a signed distance v alue from
nearest implicit surface. In most cases, a linear truncation function
ˆ
D ( x ) = max ( min ( g ( v ) − r , D max ) , D min ) , (4.4)
is applied to constrain SDF v alues in a particular range (i.e. d max ≤ ˆ
D ( x ) ≤ d min ), the range
is referred to as support in the upcoming te xt. Controlling the support of SDF is important
parameter which plays a significant role in fusing SDF v olumes. Therefore, a robust truncation
function ha ving properties of gener alized logistic function is defined by
ˆ
D ( x ) = 1 − ( 2
1 + e k d ) , (4.5)

Signed Distance Function 53
Figure 4.1: T runcated SDF function adapted from logistic function.
where support is controlled by v arying parameter k and d denotes the depth information from
camera origin, effects of v arying parameter k are sho wn in Figure 4.1. T o a v oid unnecessary
comple xity , the upcoming text presumes that either a linear or truncated logistic function is
employed, ho wev er the actual implementation of 3D reconstruction framew ork uses Equation
4.5 to compute TSDF v alues. T runcated SDF v alues which satisfy the range criteria can be
used to achie v e ef ficient memory utilization.
Figure 4.2 illustrates aforementioned properties of SDF from a circle ha ving r = 5 . Since
the distance of each location v = [ x, y ] t is relati v e to the edge of circumference, therefore
the positi v e and neg ati ve v alues represent outside and inside respecti v ely while SDF v alue
being equal to zero are on the edge of circle. When dealing with actual sensor data, finding
local or global implicit function which satisfies the geometry of an acquired depth image is a
computationally e xpensi ve task (discussed in Section 2.1.2). Hence an approximation of the
SDF is assumed and briefly discussed in upcoming section.

54 4. Fundamentals of V olumetric 3D Integration
a)
b)
c)
Figure 4.2: a) Plot of g ( x, y ) from origin and green contour line sho ws all the points with
distance equal to r = 5 in τ units from origin , b) T runcated implicit surface ˆ
D ( x, y ) and
c) Respecti v e weighting function W ( x, y ) with projected green contour lines highlighting the
suspected surface.

SDF from depth images 55
Figure 4.3: Projection of world information onto depth and color cameras.
4.2 SDF fr om depth images
In order to maintain generalization aspects of the research objecti v e, we assume that streams
of depth and color images are captured from 3D scanners. This assumption ensures that the
proposed research frame w ork is capable of processing input depth images irrespecti ve of the
input source. A standard format for depth and color image stream is suggested by Sturm et al.
(2012) in which each image is re gistered and time-stamped for easy access. This arrangement
of storing depth and color image stream is further supported by state-of-the-art visualSLAM
algorithms (such as ORB-SLAM2 and RGBD-SLAM), therefore appropriate depth con v ersion
must be employed to con v ert depth data from IPS or laser sensor .
Considering a set-up in which a pre-localized 3D scanner captures a depth and a color image
(denoted by Z and I respecti v ely) it is presumed that both intrinsic and extrinsic parameters of
the 3D sensor are kno wn where f = ( f x , f y ) and c = ( c x , c y ) are focal lengths and central point
respecti v ely . In principal, e very ph ysical point in the world coordinate system P w = [ x 1 , x 2 , x 3 ]
is projected onto Z and I using a perspecti v e-projection function π ( x ) : R 3 → R 2 formally
defined by
π ( P w ) = ⎡
⎣
x 1
x 3 f x + c x
x 2
x 3 f y + c y
⎤
⎦ (4.6)
where π ( P w )=[ u, v ] t and the process is illustrated in Figure 4.3.
In order to represent a depth image as a SDF , a discrete v oxel grid of finite size ha ving

56 4. Fundamentals of V olumetric 3D Integration
spatial resolution τ is initialized. In most cases, τ determines the scale of reconstruction
ho we ver there e xist multi-scale v ariants (see Steinbruecker et al. (2014)) in which the scale
of a particular spatial bounding box depends upon the proximity and the quality of the acquired
depth image. For simplicity , we presume τ is a constant parameter selected before the fusion
process and all spatial measurements such as the dimensionality of e very v oxel, focal lengths
and coordinate system are con v erted accordingly .
F or computational simplicity it is a common practice to presume that Z represents a three-
dimensional surface. Therefore, e very v oxel v = [ x 1 , x 2 , x 3 ] and the corresponding projection
information Z ( π ( v )) can be used to determine a signed distance v alue of each v ox el using
D ( v ) = Z ( π ( v )) − x 3 , (4.7)
where D ( v ) denotes the SDF v alue of v ox el v from the presumed surface.
Similar to implicit representation of a circle (see Section 4.1), Equation 4.7 produces
positi v e, zero and ne gati v e v alues depending upon whether the centre of v oxel is outside,
at or inside the presumed iso-surface. The truncation function is then applied to constrain
SDF v alues in the proximity of the iso-surf ace. Figure 4.4 sho ws a cross-section of a
three-dimensional v oxel-grid to highlight the SDF representation. Since typical 3D scanners
can percei v e en vironment information from one vie w , multiple depth images captured from
dif ferent spatial locations (usually referred to as multi-vie w) are used to reconstruct a complete
3D model of the object and/or en vironment.
4.3 Effects of incr emental 3D fusion
Although the core concepts of weighted incremental 3D fusion by Curless and Le vo y (1996)
are introduced in Section 2.4, this section is intended to analyze the ef fects of incremental
3D fusion when the frame work is provided with noisy depth data. For illustrati ve purposes,
considering a single ray is considered originating from camera centre to wards the surface of
object at 11 τ units aw ay . As stated earlier that 3D scanners are prone to introduce depth
noise, it is assumed that the system collects two measurements of the suspected surf ace with
added depth noise of form N ( µ, σ ) = N (0 . 0 , 1 . 0) without mo ving the camera and/or object.
The resulting error -prone depth measurements represented with TSDF and weighting function

Ef fects of incremental 3D fusion 57
Figure 4.4: Cross-section of a vox el-grid with color coded SDF v alues.
w ( x ) are sho wn in Figure 4.5.a and 4.5.b respecti vely .
The inte gration technique proposed by Curless and Le v oy (1996) utilizes a weighted
addition of SDF v alues using Equations 2.9 and 2.10. Figure 4.6.a and 4.6.b sho ws integrated
TSDF and weighting function v alues respecti vely . In principal, the zero-crossing of the fused
implicit SDF signal represents a better approximation of the actual surface than indi vidual
noisy depth samples. It is therefore e xpected that each incremental update of depth information
reduces the estimation error between the iso-surface and the actual surf ace. Unfortunately ,
this statistical con v ergence property depends hea vily on the number of depth samples and
properties of added noise. Determining noise characteristics of e v ery av ailable depth sensor
is unfortunately a tedious process, Nguyen et al. (2012) sho wed that depth noise from Kinect
sensor can be estimated with Gaussian distrib ution. It is therefore presumed that depth
measurements are corrupted with standard Gaussian noise. In practice ho we ver , properties
of noise can be e xploited by v arying the weighting function of the 3D integration.
4.3.1 Relation of con ver gence with weights
In order to highlight the concept of con ver gence in 3D fusion with a v arying degree of the
depth noise, 200 instances of synthetic piece wise signal from Section 3.3.1 were generated
and corrupted with additi v e Gaussian noise of the form N ( µ, σ ) = N (0 . 0 , 5 . 0) . As stated
earlier , the inte gration process in volv es fusing multiple instances of SDF and weight v alues
together using Equations 2.9 and 2.10. Once applied in an incremental fashion, the accumulated

58 4. Fundamentals of V olumetric 3D Integration
a)
a)

b)
Figure 4.5: a) Error-prone depth measurements represented as one-dimensional TSDF function
and b) Respecti v e weight v alues generated using standard Gaussian function.

Ef fects of incremental 3D fusion 59
a)
a)

b)
Figure 4.6: a) Fused TSDF function and b) Updated weighting function.

60 4. Fundamentals of V olumetric 3D Integration
Figure 4.7: Mean absolute surface error con v ergence with incremental fusion.
weight v alues are e xpected to exhibit properties of a normal distrib ution in which the peak
of distrib ution represents estimated implicit surface. Less erroneous samples are e xpected to
produce a sharp peak in weight v alues and vice-v ersa.
In principal, the weight v alues are analogous to the confidence of the estimated implicit
surface, where each incremental update increases the o verall confidence. Quantitativ e analysis
of this inte gration technique confirms the presence of con ver gent beha viour in terms of
minimizing the absolute mean surface error , this phenomenon is sho wn in Figure 4.7.
Although the relation between weight v alues and error con v ergence is loosely proportional,
selecting an appropriate weighting function for a specific depth sensor is a tedious and time
consuming task which requires expert human interaction. In special cases where the surface of
a percei v ed object is either non-rigid or non-stationary , the confidence v alue generates multiple
zero-crossings in SDF v alues and result in inconsistent surf aces. Furthermore, the weight
v alue for each v ox el location is stored as a floating point element occupying 32-bit of memory
space. This memory in-ef ficient utilization further restricts the application domain of the 3D
reconstruction to high-end processing de vices.

Semi-dense v oxel grid 61
Figure 4.8: Undesirable holes in reconstructed model from ICL-LR2 trajectory .
4.4 Semi-dense v oxel grid
Running-time analysis of the reconstruction frame work is a significant performance metric,
man y implementations of volumetric fusion such as Izadi et al. (2011) and Steinbrueck er et al.
(2014) presume a spatial limitation of the observ able en vironment and achie ve ef ficient memory
utilization by implementing a dense v ox el-grid. Modern techniques utilize a truncation function
to identify a proximity of each v ox el near the expected iso-surf ace which allo ws lar ge-scale
reconstructions and ef ficient utilization of memory resources. Such truncation produces semi-
dense spatial v ox el locations which are e ventually stored in a linear memory with the help of a
hash function.
Processing a semi-dense v oxel grid with SDF is a rob ust extension of traditional fusion,
ho we ver limiting the number of v ox els which satisfy the proximity criteria restricts the
application domain and af fect the quality of reconstructed models. In the case of v ox el-block
implementation, the resulting 3D model is prone to contain undesirable holes due to close
proximity of the estimated iso-surface and the alignment of the particular v oxel-block. These
holes in the reconstructed model are indications of dif ficulties the rendering system has with
finding the zero-crossing within each v oxel-block from the vie wing angle, Figure 4.8 illustrates
this phenomenon in which casted rays cannot detect the iso-surface by checking zero-crossing
in SDF v alues.
Another cate gory of semi-dense vox el grid implementation uses the position of camera
and 3D samples to generate a list of v oxel coordinates which satisfy along-the-r ay criteria.

62 4. Fundamentals of V olumetric 3D Integration
Figure 4.9: Large-scale 3D reconstruction using hashed v oxel grid.
The selected v oxels are then updated with computed signed distance v alues, Funk and Börner
(2016) demonstrated the ef fecti veness of using hashed v ox els to store large-scale 3D models
in a real-time scenario. Figure 4.9 highlights the amount of details captured using such hashed
v oxel-grid.
In principal, computer programs and algorithms are expected to e xhibit memory-processing
trade-of f relations. Pre-allocated and bounded dense reconstruction algorithms are usually
faster in terms of processing due to pre-defined memory accesses while modern semi-dense
representation allo ws boundless reconstruction at the e xpense of e xtra calculations for each
memory access. Similarly , the e xecution time of implementation depends greatly on the type
of representation. Based on aforementioned characteristics, vox el-block based v olumetric
representation is preferred o ver dense and along-the-r ay for the actual implementation of the
proposed frame work since it allo ws boundless reconstruction while allo wing fe wer hashed
address calculation.
4.5 Summary
This chapter pro vided in-depth theoretical background of underlying implicit representation
and related causes and ef fects. The concept of fusing SDF v alues to reduce noise effects is
formally introduced and the rationale behind the con ver gence of the estimated implicit surf ace
is presented. Furthermore, properties of semi-dense implementations of v oxel-grids such as
v oxel-block and along-the-r ay are discussed to highlight performance and quality trade-of fs.
Pro vided analytical discussion and core concepts are utilized in de veloping the core-principle
of the proposed research and are discussed in detail in upcoming chapter .

Chapter 5
Concept and Design
This chapter presents theoretical insights and rationales behind proposed research contrib utions
which serv e as solutions to the ov erall research question. In Chapter 2, it was discussed
that an ef ficient 3D reconstruction frame work should be able to integrate incremental depth
updates to an e xisting representation while exhibiting a rob ust profile to handle depth noise and
outliers. Since these characteristics are fully aligned with the ov erall research objecti v es, this
chapter is di vided into three sections to focus each characteristic indi vidually while keeping the
underlying rationale distilled.
Initially , a nov el least square estimation based alternati ve to the traditional weighted
inte gration method is introduced and a recursi ve form is deri ved which highlights the fle xibility
of the proposed scheme to utilize the quality of depth measurements in an optimal manner .
Secondly , the regularization information is integrated into a least square estimator to handle
erroneous depth measurements by applying the total v ariation denoising in a recursiv e manner .
This re gularization aspect is sho wn to produce smoother surfaces using comparati vely less input
sample data. Furthermore, a rob ust outliers remov al technique is introduced which targets
isolated and sparse outliers in real-time. Finally , the ov erall design of the 3D reconstruction
frame work is presented which utilizes the proposed contrib utions to achie ve high-quality 3D
models from series of depth and color images.
5.1 Recursi ve least squar es as 3D fusion appr oach
Curless and Le v oy (1996) ar gued that the optimality of weighted integration and the resulting
63

64 5. Concept and Design
iso-surface is equi v alent to a least square minimizer system. Although the actual intent of the
pro vided proof is purely conceptual, their equi v alence relation is an important analogy . Based
on this proof, an approximate solution to least square minimizer based inte gration systems can
be implemented and employed to perform the 3D fusion. In principal, such system is expected
to sho w similar noise reduction and depth inte gration characteristics as a weighted SDF fusion.
Furthermore, capabilities of a least square estimator are highly e xpendable in terms of working
principal such as weighted least squares, linear , non-linear and regularized least squares. These
characteristics pro vided the needed moti v ation to de v elop and implement a nov el least square
based inte gration system.
In order to describe the problem of depth fusion as a least square estimator , the observ able
en vironment is represented as a semi-dense v ox el grid as proposed by Rajput et al. (2016)
in which a fix ed number of vox el locations (referred to as support ) around a 3D sample are
accessed and their SDF v alues are represented as a standard v ector notation. These vectorized
implicit v alues (written compactly as SDF-signal in upcoming te xt) are used as input and output
of the linear least square estimator represented by equation
Y = Φ ˆ x + ν (5.1)
where linear system coef ficients Φ are used to estimate ˆ x from Y and resulting ν is the
approximation error . Considering a typical scenario where the number of signals n used for
an estimation is greater than support (denoted as m ), such system is e xpected to produce an
approximate solution which satisfies all versions of y i ∈ Y . The aforementioned set of n
input signals, estimated output and system coef ficients can be arranged in a matrix notation to
simplify the mathematical representation, and can be written as
⎡
⎢
⎢
⎢
⎢
⎢
⎢
⎢
⎢
⎢
⎢
⎢
⎣
y 1
y 2
.
.
.
y n
⎤
⎥
⎥
⎥
⎥
⎥
⎥
⎥
⎥
⎥
⎥
⎥
⎦
=
⎡
⎢
⎢
⎢
⎢
⎢
⎢
⎢
⎢
⎢
⎢
⎢
⎣
φ 1
φ 2
.
.
.
φ n
⎤
⎥
⎥
⎥
⎥
⎥
⎥
⎥
⎥
⎥
⎥
⎥
⎦
ˆ x n +
⎡
⎢
⎢
⎢
⎢
⎢
⎢
⎢
⎢
⎢
⎢
⎢
⎣
ν 1
ν 2
.
.
.
ν n
⎤
⎥
⎥
⎥
⎥
⎥
⎥
⎥
⎥
⎥
⎥
⎥
⎦
(5.2)

Recursi v e least squares as 3D fusion approach 65
where ˆ x n is the SDF signal approximated by inte grating n instances of noisy SDF signals, and
ν n denotes the estimation error which is e xpected to monotonically decrease with incremental
updates. Such representation of a least squares estimation is computationally expensi ve since
e v ery ne w ( n + 1) th instance of y and φ are concatenated with existing data and consequently
computation time for underlying mathematical operation gro ws e xponentially . Therefore, a
recursi v e least square solution is usually employed in practical applications. The mathematical
deri v ation is described in Section 5.1.1.
In practical applications, the true state of system x is expected to remain unkno wn, since
e v ery attempt to measure x will further increase the o verall estimation error . It is therefore
assumed that importance of ν is insignificant and remov al of this term does not af fect the ov erall
system design. Based on the aforementioned characteristics, Equation 5.1 can be reduced to a
minimization problem defined by
min ∥ Y − Φ ˆ x ∥ 2
Such least squares estimator is e xpected to produce similar con ver gent beha viour as a traditional
weighted fusion, ho we v er the true potential of such representation is demonstrated in Sections
5.2 where the re gularization parameter is introduced to extend the capability of a least square
system to handle depth noise inherently .
5.1.1 W eighted least squar es and stardard derivation of ML-Estimate
In order to deri v e 1 a recursi v e form of a least square estimator from Equation 5.1, the difference
between estimated v alues ˆ x and noisy measurements can be written as
ϵ = Y − Φ ˆ x (5.3)
A cost function J ( ˆ x ) which tries to find the v alue of ˆ x through a minimization process can be
written as:
J ( ˆ x ) = ϵ T ϵ
= ( Y − Φ ˆ x ) T ( Y − Φ ˆ x )
= Y T Y − ˆ x T Φ Y − Y T Φ ˆ x + ˆ x T Φ T Φ ˆ x
(5.4)
1 Mathematical deri v ation of recursi ve estimator is adapted from Simon (2006) and modified to accommodate
depth estimation.

66 5. Concept and Design
P artial deri v ati v e of J with respect to ˆ x is employed to achie ve the necessary v anishing
condition of minimization, that is,
∂ J
∂ ˆ x = − 2 Y T Φ + 2 ˆ x T Φ T Φ = 0
Solving for ˆ x ,
ˆ x = (Φ T Φ) − 1 Φ T Y (5.5)
where Φ and Y are augmented matrices and their v alues can be used from Equation 5.2.
A typical least square estimator applies equal weights to e very accumulated measurement,
this weighting mechanism is someho w fla wed since it presumes a linear relation between the
measuring depth to the accumulated error . In order to integrate a weighting mechanism in
the least square estimation, a weight v alue calculated from a respecti v e noise model (e.g. see
Equation 3.1 for Kinect depth sensing) is emplo yed with each measurement y i : 1 ≤ i ≤ n .
T ypically , a cov ariance matrix R containing σ 2
i : 1 ≤ i ≤ n for each measurement is used to
reflect the weighting aspect, that is,
R = ⎡
⎢
⎢
⎣
σ 2
1 · · · 0
.
.
. . . . .
.
.
0 · · · σ 2
n
⎤
⎥
⎥
⎦
Equation 5.4 which minimizes the sum of squared dif ferences weighted with weight matrix R
can be written as
J ( ˆ x ) = ϵ T R − 1 ϵ = ϵ 2
1
σ 2
1
+ ϵ 2
2
σ 2
2
+ · · · + ϵ 2
n
σ 2
n
J can be e xpanded as follo ws:
J ( ˆ x ) = ϵ T ϵ
= ( Y − Φ ˆ x ) T R − 1 ( Y − Φ ˆ x )
= Y T R − 1 Y − ˆ x T Φ R − 1 Y − Y T R − 1 Φ ˆ x + ˆ x T Φ T R − 1 Φ ˆ x
(5.6)

Recursi v e least squares as 3D fusion approach 67
Similarly , minimizing J with respect to ˆ x and solving for ˆ x yields,
∂ J
∂ ˆ x = − 2 Y T R − 1 Φ + 2 ˆ x T Φ T R − 1 Φ=0
ˆ x = (Φ T R − 1 Φ) − 1 Φ T R − 1 Y
(5.7)
The solution of Equation 5.7 e xist only when the matrix R is non-singular , i.e. e very
measurement y i is corrupted with some de gree of noise for the estimation technique to work.
As stated earlier , augmenting Y , Φ and calculating the in verse with each incremental update
is computationally e xpensi ve task. Therefore a recursiv e update algorithm can be formulated
which can utilize the e xisting system estimate ˆ x k − 1 to compute ˆ x k without tedious matrix
augmentation and in v ersion. Such typical linear recursi ve estimator can be written as,
y k = Φ k x + ν k
ˆ x k = ˆ x k − 1 + K k ( y k − φ k ˆ x k +1 )
(5.8)
where φ k is a m x m system coef ficient matrix for instance k where m is the support of SDF
signal. The correction term ( y k − φ k ˆ x k +1 ) (i.e. transition of ˆ x k from previous estimate ˆ x k − 1 ) is
controlled by the estimator gain matrix denoted by K k ha ving m x m dimensions. Therefore,
the current estimation error is
ϵ k = x − ˆ x k
= x − ˆ x k − 1 − K k ( y k − φ k ˆ x k − 1 )
= ϵ k − 1 − K k ( φ k x + ν k − φ k ˆ x k − 1 )
= ϵ k − 1 − K k φ k ( x − ˆ x k − 1 ) − K k ν k
= ( I − K k φ k ) ϵ k − 1 − K k ν k
(5.9)
where I is the m x m identity matrix. The mean of this error can be written as,
E ( ϵ k ) = E ( I − K k φ k ) E ( ϵ k − 1 ) − K k E ( ν k )
A typical least square estimator is e xpected to exhibit unbiased beha vior to wards each
measurement. It is therefore assumed that on a v erage, an estimated ˆ x k and the true v alue
of x are roughly equal. In principal, an optimal v alue of gain matrix K k is expected to reduce

68 5. Concept and Design
the aggre gated v ariance of the estimated error , therefore a cost function for such optimality
criterion can be written as:
J k = E ( ∥ x − ˆ x k ∥ 2 )
= E ( ϵ T
k ϵ k )
= E ( tr ( ϵ k ϵ T
k ))
= tr ( P k )
where the trace operator ( tr ) is applied to the aggregated v ariance P k = E ( ϵ k ϵ T
k ) . The v alue of
the estimation error ϵ k from Equation 5.9 can be employed to obtain P k as follo ws:
P k = E ((( I − K k φ k ) ϵ k − 1 − K k ν k )(( I − K k φ k ) ϵ k − 1 − K k ν k ) T )
= ( I − K k φ k ) E ( ϵ k − 1 ϵ T
k − 1 )( I − K k φ k ) T − K k E ( ν k ϵ T
k − 1 )( I − K k φ k ) T
− ( I − K k φ k ) E ( ϵ k − 1 ν T
k ) K T
k + K k E ( ν k ν T
k ) K T
k
The estimation error computed at time k − 1 is independent of the measurement y k and
respecti v e noise ν k at time k , which implies that
E ( ν k ϵ T
k − 1 ) = E ( ν k ) E ( ϵ T
k − 1 ) = 0
E ( ϵ k − 1 ν T
k ) = E ( ϵ k − 1 ) E ( ν T
k ) = 0
By using the weight matrix R k and the implied estimation-error relation, the expression for P k
becomes
P k = ( I − K k φ k ) P k − 1 ( I − K k φ k ) T + K k R k K T
k (5.10)
It is worth mentioning that there e xists a strong correlation between the estimation cost J and
the co v ariance matrix P k . Since an optimal v alue of the gain matrix K k is e xpected to minimize
the cost function, such minimization can be obtained by dif ferentiating the cost function with

Recursi v e least squares as 3D fusion approach 69
respect to K k which can be written and simplified 2 as follo ws:
( ∂ J k
∂ K k ) T
= ( ∂ ( tr ( P k ))
∂ t ) T
( ∂ J k
∂ K k ) T
= 2( I − K k φ k ) P k − 1 ( − φ T
k )+2 K k R k
The optimal v alue of the gain matrix can be obtained by setting the partial deriv ati v e to zero
and solving for K k
K k = P k − 1 φ T
k ( φ k P k − 1 φ T
k + R k ) − 1 (5.11)
let S k = φ k P k − 1 φ T
k + R k for simplicity , so K k becomes
K k = P k − 1 φ T
k S − 1
k (5.12)
The substitution of simplified K k into Equation 5.10 follo wed by the e xpansion as follo ws:
P k = ( I − P k − 1 φ T
k S − 1
k φ k ) P k − 1 ( I − P k − 1 φ T
k S − 1
k φ k ) T + P k − 1 φ T
k S − 1
k R k S − 1
k φ k P k − 1
= P k − 1 − P k − 1 φ T
k S − 1
k φ k P k − 1 − P k − 1 φ T
k S − 1
k φ k P k − 1 +
P k − 1 φ T
k S − 1
k φ k P k − 1 φ T
k S − 1
k φ k P k − 1 + P k − 1 φ T
k S − 1
k R k S − 1
k φ k P k − 1
mer ging the underlined terms into S k
= P k − 1 − P k − 1 φ T
k S − 1
k φ k P k − 1 − P k − 1 φ T
k S − 1
k φ k P k − 1 + P k − 1 φ T
k S − 1
k S k S − 1
k φ k P k − 1
= P k − 1 − 2 P k − 1 φ T
k S − 1
k φ k P k − 1 + P k − 1 φ T
k S − 1
k φ k P k − 1
= P k − 1 − P k − 1 φ T
k S − 1
k φ k P k − 1
= P k − 1 − K k φ k P k − 1 by 5.12
= ( I − K k φ k ) P k − 1
(5.13)
Since K k and P k are inter -related and their v alues are computed in a recursi v e manner from
P k − 1 and H k − 1 respecti v ely , the o verall least square system is e xpected to reduce estimation
costs o ver time. This con ver gent beha viour of the system is depicted in Figure 5.1 where noisy
synthetic SDF signals are inte grated in recursi ve f ashion and compared against the ground truth
model. It is worth mentioning that the deri ved system has inherent similarities with Kalman
2 using the matrix manipulation property that ∂
∂ t ( AB A T )=2 AB when B is symmetric

70 5. Concept and Design
Figure 5.1: Mean absolute surface error con v ergence with incremental fusion.
filter , ho we v er it was observ ed during experimental e v aluation that e xtending the capabilities
of standard kalman filter is not a tri vial task. It was therefore decided to use the deri ved v ersion
instead of standard kalman filter for further e valuation and de velopment. F or the sake of
compactness, this recursi ve least square 3D fusion is referred to as RLSFusion in upcoming
te xt.
5.1.2 Depth fusion with r ecursive 3D fusion
Considering the scenario presented in Section 4.3 where two depth measurements of suspected
surface at 11 τ units with added depth noise of form N ( µ, σ )= N (0 . 0 , 1 . 0 τ ) are captured
and represented with TSDF signals ( y 0 and y 1 ). The resulting error-prone depth measurements
represented with TSDF are sho wn in Figure 5.2. The recursi ve solution to the least square
problem from Equation 5.8 can be used in incremental fashion to inte grate y 0 and y 1 to estimate
ˆ x 1 . Assuming that n =7 denotes support of SDF-signal, then the cov ariance matrix P 0 is
initialized as identity matrix of order n x n and x 0 . Initially , the system presumes that the
pro vided input instance y 0 reflects the nature of the estimated signal accurately , therefore x 0
is set to y 0 . Afterwards, for each ne w SDF-signal y k the system calculates the estimation
gain matrix and the co v ariance matrix applying Equations 5.11 and 5.13 respecti vely . It is
therefore e xpected that each incremental update y k contributes to the estimation of x k ho we v er
the impact of all contrib ution is decreasing monotonically as the belief of the system gro ws

Recursi v e least squares as 3D fusion approach 71
Figure 5.2: Error -prone depth measurements ( y 0 and y 1 ) represented as one-dimensional TSDF
function.
with each update. Figure 5.3 sho ws the zero crossing of the resulting x k between y 0 and y 1 .
In order to obtain an un-biased quantitati ve e valuation of RLSFusion against traditional
weighted fusion, both methods were provided with 200 instances of a synthetic signal with
additi v e Gaussian noise of form N ( µ, σ )= N (0 . 0 , 5 . 0 τ ) . Figure 5.4 sho ws the beha vior of the
mean absolute surface estimation error con v er ging to the sub-pix el accuracy in both cases.
5.1.3 Pr operties of RLSFusion
T rue potential of employing RLSFusion as a substitute to traditional weighted SDF fusion
comes from the fact that the process of SDF signal fusion can be controlled with e xternal
parameters without modifying the v olumetric representation or weights. Unlike traditional
fusion method in which incremental weight v alues are responsible of defining the belief of a
suspected surface, RLSFusion utilizes underlying estimator g ain v alues which can be modified
or reset on-demand. This control of estimator gain v alues allo ws the system to accommodate
a sensor noise model, depth noise and localization errors in a con v enient way . Follo wing
properties of RLSFusion ha ve been identified using an e xtensiv e ev aluation:
• Lo w memory footprint:
In order to highlight the memory footprint of RLSFusion, considering a synthetic

72 5. Concept and Design
Figure 5.3: Fused TSDF function with the help of Equation 5.8.
Figure 5.4: Comparison of the con ver gence between RLSFusion and traditional 3D fusion.

Recursi v e least squares as 3D fusion approach 73
piece wise 3D and 2D signals from Section 3.3.1 are to be represented in an empty v ox el
grid. Since RLSFusion uses underlying system v ariables such as estimator gain matrix
K k and co v ariance matrix P k to control the estimation process instead of weight v alues,
the v alues of these matrices can be stored in simple memory array . Since these system
v ariables are updated with respect to the number of time an instance is updated, therefore
a single copy of these v ariables is suf ficient. Therefore, the need of storing all the weight
v alues is unnecessary . This representation of system allo ws RLSFusion to utilize memory
in ef ficient manner . T able 5.1 presents a comparati ve o v ervie w of memory utilization by
representing a 2D and 3D synthetic signal with traditional dense, sparse and RLSFusion.
T able 5.1: Comparison of memory consumption (in bytes) among dense, sparse vs RLSFusion.
Dense Sparse RLSFusion
2D Signal 320000 12800 5608
3D Signal 64000000 2560000 1120008
• Controllable gain:
RLSFusion pro vides a flexible mechanism to control the beha vior of fusion with the
help of e xternally provided weights. Implementation of the proposed RLSFusion uses a
weighting mechanism to control and manipulate the amplitude of the estimator gain to
accommodate less noisy depth measurements. T o highlight this capability , a scenario
is considered in which k instances of depth measurements are fused using weighted
inte gration. This implies that the impact of each incremental update decreases o ver
time re gardless of the quality of measurement. This problem is handled ef ficiently by
RLSFusion with the help of forcing the system to accommodate updates, Figure 5.5
sho ws the beha vior in which the system is pro vided with 10 less noisy depth signals at
k = 100 . It can be observed that the beha vior of the traditional inte gration method is
infle xible (since weights of the system become more rigid ov ertime) while RLSFusion
adapts quickly and produces a less o verall error due to this adaptation.
T raditional least square estimators use this weighting mechanism to employ e xponential
for getting in which a system can be programmed to focus the estimated state to wards
recent updates while ignoring older measurements. Such utilization of weights can
be used to accommodate dynamics of the en vironment in a real-time v olumetric

74 5. Concept and Design
Figure 5.5: Capability of RLSFusion to accommodate less noisy measurements.
reconstruction. Since the main focus of this research is to reduce errors in measurements,
this dynamic implementation is left intentionally as a future research direction.
Although proposed RLSFusion sho ws attracti ve impro vements ov er traditional methods,
dealing with error -prone depth measurements remained untouched. The upcoming section will
introduce a no vel mechanism to inte grate smoothness priors as a re gularization parameter in a
recursi v e least square estimator .
5.2 Regularized Recursi ve Fusion
Section 2.3 discussed the possibility of using e xternal depth smoothing image filters to reduce
depth noise. Graber et al. (2015) argued that unconstrained depth smoothing such as applying
bilateral filtering can de grade depth images by producing stair -case ef fects since the filter does
not respect the 3D geometry . Ho we ver , the regularized depth image re gularization technique
by Graber et al. (2015) suf fers from high computational comple xity , therefore emplo ying
such a technique in incremental inte gration system is infeasible. Calakli and T aubin (2011)
proposed to enforce a re gularization constraint which forces implicit v alues of each v ox el to
follo w a smooth o verall surf ace. As a result, reconstructed surf aces from regularized SDF
produce smoother surfaces. Since SSDF is a post-processing step, it presumes that existing
measurements are final and all 3D information is properly represented in an octree.

Re gularized Recursi ve Fusion 75
In principal, capabilities of a least squares estimation based 3D fusion system (such as
RLSFusion) can be e xtended to handle error-prone depth samples or implicit SDF signals with
the introduction of a re gularization constraint. Such re gularized system ha ving the properties
of a least square system can be e xpressed as a minimization problem defined by:
∥ Φ ˆ x − Y ∥ 2 + λ ∥ g ( x ) ∥ 2 (5.14)
where g ( x ) is a penalization function, ˆ x is an estimated SDF signal from augmented Y (similar
to RLSFusion) and λ is the re gularization parameter which controls the ef fects of smoothing.
In principal, selecting the penalization function g ( x ) as a second order finite difference
function allo ws the system to utilize implicit v alues from neighbouring elements to obtain
smoother estimations. Similar to RLSFusion, the system is represented in the matrix/v ector
notation to utilize modern CPU architectures. Therefore, Equation 5.14 can be re-written as
∥ Φ ˆ x − Y ∥ 2 + λ ∥ D x + C ∥ 2 (5.15)
where D and C matrices are designed to facilitate finite dif ferences. The actual deriv ation of
D and C matrices is discussed in Section A.1.
Theoretically , this re gularized least squares estimator is expected to handle depth noise
inherently since each element of Y is penalized to maintain a lo w total-variation profile.
The upcoming section deri v es a recursi ve formulation of the aforementioned re gularized least
square estimator (written compactly as RFusion ).
5.2.1 Deriv ation of regularized least squar ed 3D fusion
In order to deri v e a recursi v e form of the regularized least square estimator from Equation 5.15,
a cost function J ( ˆ x ) which transforms the problem in a least square notion can be written as
J ( ˆ x ) = min ( ∥ Φ ˆ x − Y ∥ 2 + λ ∥ D x + C ∥ 2 ) (5.16)

76 5. Concept and Design
The partial deri v ati ve of J ( ˆ x ) is employed to achie v e necessary the minimization condition as:
J ( ˆ x ) = 2Φ T (Φ ˆ x − Y )+2 λD T ( D ˆ x + C )=0
0 = 2Φ T Φ ˆ x − 2Φ T Y + 2 λD T D ˆ x + 2 λD T C
0 = (Φ T Φ + λD T D ) ˆ x + λD T C − Φ T Y
0 = (Φ T Φ + λD T D ) ˆ x + Φ T ( λD T C
Φ T − Y )
ˆ x = (Φ T Φ + λD T D ) − 1 ( Φ T ( Y − λD T C
Φ T ))
(5.17)
Let ˆ
Y = ( Y − λD T C
Φ T ) for simplicity , then the regularized least square estimator can be
written as:
ˆ x = (Φ T Φ + λD T D ) − 1 Φ T ˆ
Y (5.18)
where the SDF signal ˆ x is estimated from noisy depth measurements and y i : i = 0 ≤ i ≤ k is
augmented in matrix form Y .
As mentioned earlier , augmentation of matrices Φ k − 1 and Y k − 1 for each incremental update
results in computationally expensi ve calculations. Assuming a recursi v e successi ve relation
among incremental updates, then Φ and ˆ
Y can be written as follo ws:
Φ k = ⎡
⎣
Φ k − 1
φ ⎤
⎦ ˆ
Y k = ⎡
⎣
ˆ
Y k − 1
ˆ y ⎤
⎦ and D k = ⎡
⎣
D k − 1
d ⎤
⎦ (5.19)
Equation 5.18 for k th instance can be written as
ˆ x k = (Φ T
k Φ k + λD T
k D k ) − 1 φ T
k ˆ
Y k
Let P k = (Φ T
k Φ k + λD T
k D k ) − 1 for the sake of simplicity , then using the incremental updates

Re gularized Recursi ve Fusion 77
property P k can be simplified as
P k = ⎛
⎝ ⎡
⎣
Φ k − 1
φ ⎤
⎦ [ Φ k − 1 φ ] + λ ⎡
⎣
D k − 1
d ⎤
⎦ [ D k − 1 d ] ⎞
⎠
− 1
P k = ( Φ T
k Φ + φ t φ + λD T
k D + λd T
k d ) − 1
P k = ( (Φ T
k Φ + λD T
k D )+( φ t φ + λd T
k d ) ) − 1
P k = ( P − 1
k − 1 + ( φ t φ + λd T
k d ) ) − 1
P − 1
k = P − 1
k − 1 + ( φ t φ + λd T
k d )
P k = ⎛
⎝ P − 1
k − 1 + [ φ T φ I ] ⎡
⎣
I
λd T d ⎤
⎦ ⎞
⎠
− 1
(5.20)
F or simplicity assuming B = [ φ T φ I ] and C = ⎡
⎣
I
λd T d ⎤
⎦ we get
P k = ( P − 1
k − 1 + B C ) − 1
Using matrix in v ersion lemma
( A + B C ) − 1 = A − 1 − A − 1 B ( I + C A − 1 B ) − 1 C A − 1
P k = P k − 1 − P k − 1 B ( I + C P k − 1 B ) − 1 C P k − 1 (5.21)
Equation 5.2.1 with substitution of P k can be written as follo ws
ˆ x k = P k φ T
k ˆ
Y k
P − 1
k ˆ x k = φ T
k ˆ
Y k
(5.22)
By using the assumption of incremental updates from Equation 5.19, the estimator can be
written as
ˆ x k = P k ( Φ T
k − 1 ˆ
Y k − 1 + φ T ˆ y k ) (5.23)

78 5. Concept and Design
Similarly for ( k − 1) th instance
P − 1
k − 1 ˆ x k − 1 = φ T
k − 1 ˆ
Y k − 1
Using the v alue of P − 1
k − 1 ˆ x k − 1 in Equation 5.23 we get
ˆ x k = P k ( φ T
k − 1 ˆ
Y k − 1 + φ T ˆ y k )
ˆ x k = P k ( P k − 1 ˆ x k − 1 + φ T ˆ y k )
(5.24)
Substituting the v alue of P k from equation 5.20 we get
ˆ x k = [ ( P − 1
k − ( φ t φ + λd T
k d ) ) ˆ x k + φ T ˆ y k ]
= ( P k P − 1
k − P k ( φ T φ + λd T d ) ) ˆ x k − 1 + P k φ T ˆ y k
= P k P − 1
k ˆ x k − 1 − P k ( φ T φ + λd T d ) ˆ x k − 1 + P k φ T ˆ y k
= ˆ x k − 1 − P k ( φ T φ + λd T d ) ˆ x k + P k φ T ˆ y k
ˆ x k = ˆ x k − 1 + P k [ φ ˆ y k − ( φ T φ + λd T d ) ˆ x k − 1 ]
(5.25)
By using the actual v alue of ˆ y k the final update equation of RFusion becomes
ˆ x k = ˆ x k − 1 + P k [ φ ( y k − λd T c
φ T ) − ( φ T φ + λd T d ) ˆ x k − 1 ]
In principal, RFusion uses Equations 5.2.1 and 5.21 to update the system estimate and
calculate the gain respecti vely . The v alue of λ in Equation 5.2.1 which controls the amount of
re gularization can be selected as a constant at the time of ex ecution. Since prior smoothing
information is inte grated in the ov erall system design, the system inherently reduces noise
artifacts and pro vides a faster con ver gence of the absolute surface error compared to both
traditional weighted fusion and RLSFusion. This fast con v er gence ef fect is sho wn in 5.6 where
the absolute surface error produced by RFusion reached to sub pix el accurac y after fusing 10
instances while both traditional and RLSFusion reached same accurac y after 22 instances. The
rationale behind faster con v er gence in the case of noisy data is elaborated in the upcoming
section.

Re gularized Recursi ve Fusion 79
Figure 5.6: Mean absolute surface error con v ergence with incremental fusion.
5.2.2 F aster Con ver gence with r egularized fusion
Re gularized aspect of RFusion takes adv antage of neighboring SDF v alues and introduces
a scalar quantity which reduces o verall dif ference among neighbouring vox el values. This
addition of counter weight is analogous to using a total v ariation denoising mechanism on
implicit v alues. Figure 5.6 demonstrates that the proposed RFusion achie ves f aster con v ergence
to sub-pix el accuracy , ho we ver both traditional and RLSFusion catch up with absolute surf ace
error e v entually . In practical applications, either the sensing equipment is mov ed across the
en vironment or the object is mov ed in-front of the depth sensor . It is therefore unlikely to
capture suf ficient depth images for a traditional fusion approach to estimate the surface by
con v ergence at the same accurac y . Furthermore, traditional visualSLAM algorithms such as
ORBSLAM2 e xpects suf ficient sensor mov ement in terms of rotation and translation to reduce
the localization error . This in verse relationship se verely af fects the ov erall 3D reconstruction
process. Therefore in such case, RFusion can produce high-quality 3D models with the help of
re gularized integration while traditional incremental approaches struggles with this situation.
In order to highlight the re gularization aspect of RFusion while isolating the ef fects from
incremental fusion, a set-up similar to Section 4.3 is presumed and tw o depth measurements
y 0 and y 1 are recorded and represented as SDF signals. T o highlight the potential of RFusion,

80 5. Concept and Design
Figure 5.7: Illustration of volumetric re gularization and inte gration process using color coded
v oxel v alues.
a challenging scenario is presumed in which both depth measurements are far -sighted 3 which
implies that the estimated implicit surface from traditional inte gration methods does not reduce
the o verall estimation error . Figure 5.8.a shows the erroneous depth measurements and the
acquired iso-surface from traditional weighted fusion.
Figure 5.7 illustrates the intuition behind the proposed regularized inte gration using a noisy
depth image. Consider a situation in which the v olumetric representation (denoted with x ) is
updated with a noisy depth update (represented as v ectorized SDF signal y ). The proposed
system e xtracts neighboring vox el values and arrange them in a v ectorized form (denoted by x 1
and x 2 ) follo wed by applying the proposed re gularization constraint to achie v e ov erall smooth
v olumetric representation.
In such challenging scenario, RFusion utilizes the underlying total variation denoising
method on SDF v alues to reduce implicit surf ace deformities. Figure 5.8.b sho ws that estimated
iso-surface from RFusion is influenced (more specifically , r e gularized ) with neighboring
implicit v alues. In principal, the influence of the re gularization parameter λ is supposed to
3 RFusion is capable of handling v arious types of noise ef ficiently , f ar-sighted measurements are selected purely
to demonstrate the ef fecti veness.

Outliers remo v al using spatial information 81
decrease with incremental updates. This can be achie v ed by linking the v alue of λ with the gain
of the least square estimator . In such case, when the value of λ is equal to 0, the system beha v es
similar to RLSFusion and Equation 5.2.1 (containing the recursi v e version of the proposed
re gularized system) is approximately equi v alent to Equation 5.8.
T o highlight the ef fecti veness in a practical scenario, 10 depth images with successi ve
timestamps from ICL-NUIM’ s living-r oom dataset Handa et al. (2014) were selected and
processed with RFusion and traditional v olumetric fusion. Figure 5.9 sho ws the qualitati v e
comparison between standard 3D fusion in Curless and Le v oy (1996) and the proposed
re gularized fusion. It is e vident from visual inspection of Figure 5.9 that proposed re gularized
fusion is reducing noise ef fects in an ef ficient manner .
5.3 Outliers r emo val using spatial inf ormation
Section 2.5 introduced the concept of depth outliers and classified them into sparse, isolated and
non-isolated cate gories. The proposed regularized v olumetric 3D fusion method handles the
ef fects of non-isolated depth outliers, ho we ver dealing with sparse and isolated depth outliers
remain a challenging research problem. T raditional approaches which are designed to eliminate
these outliers are not suitable for real-time applications due to cumbersome memory access. In
this section, a nov el outliers remov al technique SORF is proposed which eliminates sparse and
isolated outliers on the basis of their spatial proximity with respect to e xpected surface.
The process of outliers detection and remo v al in v olves three linear passes on pro vided 3D
points (denoted by P i where 0 < i < n ). In a first pass, an empty pre-aligned sparse grid G l ocal
is initialized and all points are re gistered into small bounding boxes (referred to as cubes ). At
the time of re gistration, a counter v alue associated with each cube is incremented. Since the
grid is sparse and preferably implemented with hashed memory access, it is possible to obtain
the list of acti v e cubes C k where 0 < k < m (where m is the number of all cubes).
In a second pass, each cube in C k is assessed and labeled as either active or potential
depending upon the counter v alue. This assessment depends greatly on the spatial dimensions
of the cube and scale of representation, therefore it is presumed that a parameter thr esh is
selected to an appropriate v alue. In empirical analysis, it was observed that a typical cube
representing a 8cm x 8cm x 8cm spatial bounding box should contain at least 30 samples.

82 5. Concept and Design
a)
a)

b)
Figure 5.8: a) Erroneous depth measurements represented with SDF signals and b) Estimated
SDF signal from traditional incremental methods and RFusion.

Outliers remo v al using spatial information 83
T raditional T raditional
RFusion RFusion
Figure 5.9: Comparison of traditional fusion (upper ro w) and proposed re gularized fusion
(bottom ro w) after fusing 10 depth images
In practice, the v alue of thr esh is directly correlated with the sensing capabilities of the 3D
sensor . Provided the v alue of thr esh is selected appropriately , this pass identifies isolated and
sparse depth outliers ef ficiently , ho wev er mis-labeling of cubes can occur due to corners or
mis-alignment.
Therefore in a third pass, e very potential cube c ∈ C k is tested on semantic basis (i.e.
suf ficient connecti vity with activ e cubes) and the labels are either upgraded to acti v e cubes in
the case of v alidity criteria or dropped the entry from C k altogether . Finally , all 3D points from
finalized acti v e cubes list are arranged in a memory array for the v olumetric integration.
In order to demonstrate the working principle of SORF in pictorial form, consider a
synthetic surface and corresponding 3D points with added outliers as sho wn in Figure 5.10.a.
An empty local grid G local having similar scale and transformation characteristics as the global
v olumetric grid G g lobal is initialized. This equi v alence relation between both grids allo ws
the proposed frame work to initialize, access and modify each particular vox el-block without
performing unnecessary con v ersions. In the first pass, each 3D point is registered and counted
in respecti v e cubes follo wed by labeling the cubes as either active or potential (sho wn by green
or yello w blocks, respecti v ely in Figure 5.10.b). It can be noticed that although most of the
cubes are labeled correctly since the y satisfy the counting threshold, cubes containing outliers

84 5. Concept and Design
a)
b)
Figure 5.10: a) A synthetic surf ace and corresponding 3D points with additi ve outliers and b)
Illustrated SORF passes.

Outliers remo v al using spatial information 85
Noisy
PCL-SOR Proposed-SORF
Figure 5.11: Comparison of outliers remov al using proposed-SORF vs PCL-SOR
as well as which contain fe wer 3D points due to misalignments are labeled as potential. In
the final pass, each potential cube is tested for spatial connecti vity with active cubes (this
relation is sho wn with green arro w from potential cube tow ards acti v e cube), this spatial
connecti vity ensures that isolated cubes are identified and remo ved from G local . Finally , the list
of acti v e cubes coordinates is sent to the next processing stage where v olumetric 3D integration
combined with temporal depth updates remo ves the ef fects of non-isolated outliers.
A noisy depth image from ICL-NUIM’ s living-r oom dataset Handa et al. (2014) is selected
and processed using statistical outliers remo v al from PCL Rusu et al. (2008) and the proposed
SORF to compare processing time and ef fectiv eness against outliers. It w as found that
the proposed SORF took 15ms to process a standard 640x480 depth image on commodity
computer while the same image took around 600ms when processed with statistical outliers
remo v al from PCL. This processing speed-up is due to linear nature of the proposed outliers
remo v al and ef ficient use of spatial information. Furthermore, it is e vident from the visual
inspection of Figure 5.11 that PCL-SOR mis-treated vital 3D points and remov ed them. This
phenomena can be observ ed in the highlighted regions where tw o or more surfaces are joining
together . It is therefore expected that using the proposed-SORF in real-time applications will
remo ve undesirable outliers on-the-go . This remov al of outliers also af fects the e xecution time

86 5. Concept and Design
of an o verall reconstruction pipeline in a positi ve direction.
It is worth mentioning that ef fecti v eness of SORF can easily be visualized in qualitati v e
e v aluation (as shown in Figure 5.11), ho we ver quantitati ve e v aluation is a non-tri vial and
tedious problem in v olving manual histogram equalization of absolute surface errors.
5.4 3D r econstruction framew ork
In order to inte grate the proposed research contributions in the form of a 3D reconstruction
pipeline, a modular design is preferred ov er traditional closed system in which components are
strongly interlinked. This modular design enabled rapid prototype de v elopment and testing
of incremental algorithmic updates without modifying the complete design of frame work.
Processing elements (or modules) are designed to utilize multi-threading aspects of modern
CPU architectures to maximize the processing ef ficienc y . For the sake of compactness, the
upcoming te xt refers to the proposed reconstruction frame work as SmoothFusion . In order
to facilitate the w orking of SmoothFusion in both the of f-line and on-line depth sensing and
reconstruction scenarios, two v ariants hav e been dev eloped to facilitate each problem scenario.
Both implemented v ariants share the core concepts of depth noise remo v al capabilities
pro vided by RFusion and outliers remov al by SORF . Figure 5.12 sho ws the block diagrams
of all implemented v ariants of SmoothFusion.
In the on-line reconstruction scenario where li v e depth information is acquired with a
simple depth sensor such as Kinect and Kinect v2, the loader module registers a time stamped
depth and color image follo wed by sharing these images with the Localization module which
tracks camera e go-motion with the help of state-of-the-art visualSLAM algorithm ORBSLAM2
de v eloped by Mur -Artal and T ardós (2015). Since IPS is a sophisticated depth sensing and
na vigation system, the need of applying the Localization module for ego-motion tracking is
redundant, therefore captured depth and color image streams can be used directly in upcoming
processing modules. Ho we ver in an of f-line reconstruction scenario, localization information
is acquired by applying ORBSLAM2 and the resulting trajectory along with depth and color
images are stored on a secondary storage de vice in the standard format as suggested by Sturm
et al. (2012).
Once the sensor is localized in the world coordinate system, the localization information

3D reconstruction frame work 87
combined with depth and color image is considered to be a single processing entity referred
to as input-instance . Pr e-pr ocessor modules apply appropriate depth scaling and transform
depth images into series of 3D points in the world coordinate system with the help of sensor
pose information. Depth outliers are remo ved from acquired 3D points by applying proposed
Spatial Outliers Remo val F ilter . Since the working principle of SORF in v olv es the creation of
axis-aligned v oxel-grid containing cubes , the list of acti ve cubes can be utilized in the Fusion
module to create spatial v oxel-blocks.
The fusion module uses a hashing function to determine the memory occupancy of each
suspected v oxel-block. In principal, inacti v e or temporally older v oxel-blocks are sw apped
out from fast acting memory to sa ve resources. Therefore, all acti ve v oxel-blocks are loaded
and processed with a re gularized least square estimator in a recursi v e fashion. In principle,
each block is updated at a time, which eliminates the need of storing multiple copies of system
v ariables such as the gain-matrix for each v ox el-block. A dedicated data structur e is designed
to facilitate the storage of such information in an ef ficient manner .
The r ender er module applies projecti ve ray-casting to determine iso-surf aces within each
v oxel-block and resulting zero crossings are stored as an array of verte x containing spatial and
associated color information. It was observ ed in the empirical e v aluation phase that such verte x
based representation allo ws real-time visualization. High-quality meshes can be generated
using a standard marching cube algorithm from implicit representation. Since the mesh
e xtraction step is computationally extensi ve, postponing this step until all the depth images
are inte grated produces hassle-free processing. Furthermore, implicit representation enables
robotic applications to determine the surface of object by using signed distance information (as
suggested in Section 2.1.3).
Figure 5.13 illustrates fle xibility of SmoothFusion to handle v arious sensor types without
changing the underlying pipeline. Consider a scenario in which depth images are encoded
with non-linear depth encoding to facilitate multiple de grees of precision depending upon the
distance of percei v ed object from the IPS sensor . Although this atypical encoding strate gy
is highly ef fecti ve for encoding depth v alues from stereo based depth sensor , ho we ver state-
of-the-art 3D reconstruction frame w ork does not nati vely support such encoding. Contrarily ,
the loader module of SmoothFusion can be simply programmed to accept such atypical depth
encoding as sho wn in Figure 5.14.a. Furthermore, the need of using e xternal visualSLAM

88 5. Concept and Design
Figure 5.12: Block diagram of SmoothFusion with online and of fline processing scenarios.
algorithm is not required since IPS sensor is capable of producing high-quality sensor pose by
fusing IMU measurements with visual e go-motion.
Similar problematic scenario can be observ ed in the case of stereo camera system in which
the depth image for each instance is not a v ailable. Unfortunately , current state-of-the-art
3D reconstruction frame works do not f acilitate direct color image pair captured from stereo
camera system. Contrarily , the modular design aspect of SmoothFusion can be e xploited in this
scenario in which the loader module can easily be programmed to perform stereo matching and
depth estimation on-the-go. Such rob ust profile of SmoothFusion is illustrated in Figure 5.14.b
.
5.5 Summary
This chapter presented a detailed introduction and analysis of proposed research contrib utions
and e v aluated the workings of each contrib ution against traditional weighted inte gration.
Proposed contrib utions and their implementation in the form of 3D fusion and reconstruction
are sho wn to ef ficiently to handle sequences of depth and color images. The faster con ver gence
of absolute surface errors produced by RFusion in error -prone depth samples is highlighted. In

Summary 89
Figure 5.13: Modular design of SmoothFusion to handle multiple sensors and their respecti ve
3D reconstructed models.
a)
b)
Figure 5.14: a) SmoothFusion with IPS module to use provided sensor pose b) SmoothFusion
with stereo matching module combined with ORB-SLAM2 for real-time processing.

90 5. Concept and Design
the upcoming chapter , the proposed reconstruction frame work is e v aluated against state-of-the-
art methods with actual color and depth image sequences.

Chapter 6
Ev aluation
This chapter pro vides a comparati ve e v aluation of the proposed 3D reconstruction frame work
( SmoothFusion ) against InfiniT AM and F astFusion which are considered to be state-of-the-art
v olumetric 3D modeling techniques (see Section 2.4). Section 4.3.1 presented the relation
between the quality of the reconstructed 3D model and the number of acquired samples.
In practice, restricting the mov ement and/or velocity of a sensor to a particular de gree can
seriously af fect the performance of mobile robots. Therefore, this provided comparati ve
analysis is intended to highlight the ability of the 3D reconstruction method to handle a high
de gree of depth noise and fast sensor mo vements. Furthermore, to achiev e div ersity in this
empirical e v aluation process while av oiding unnecessary repetiti ve results, one trajectory is
selected from each dataset and acquired assessment results are presented in the form of figures.
The critical e v aluation consists of three distinct elements: quantitati ve assessment (Section
6.1) which employs quality metrics introduced in Chapter 3, qualitati ve assessment (Section
6.2) which highlights the visual appearance of reconstructed 3D model and running-time
analysis (Section 6.3) which emphasizes the applicability of the 3D reconstruction in real-time
applications. The chapter is concluded in Section 6.4 where the findings are summarized to
highlight the applicability of the proposed frame work.
6.1 Quantitati ve e valuation
Since the ground truth 3D models and sensor trajectories are a v ailable for both ICL-NUIM
and CoRBS datasets, it is possible to compute the de viation of reconstructed model against the
91

92 6. Ev aluation
ground truth model by applying the quality metric introduced in Chapter 3.
The histograms in Figure 6.1.e and 6.3.g sho w the normalized error distribution from the
reference ground truth model. The density function basically describes the relation between
re gistered samples and distances from reference shapes. In principal, smaller o verall distances
produces a sharp peak which is located closer with respect to zero in the horizontal ax es.
Figure 6.1.a-c demonstrate the re gistered absolute surface error for each sample represented
with pseudo color coded heat map. It can be observed that lack of an y outliers detection
scheme in F astFusion produced undesirable samples which af fect the visual appearance of the
reconstructed model. If un-treated, such deformities directly influence the perception of mobile
robotic applications. The error histogram presented in Figure 6.1.e sho ws that SmoothFusion
produced smaller o verall distances, this is achie ved by emplo ying regularized 3D fusion on
erroneous depth samples. The cumulati v e error distrib ution plot in Figure 6.1.f re veals that
approximately 90 percent of the re gistered samples resides within the range of 0.5cm when
SmoothFusion is employed. Contrasti v ely , the absolute surface re gistered in samples from
F astFusion and InfiniT AM achie ve 90 percent de viation mark at 0.8 cm and 1.0 cm respecti v ely
in Figure 6.1.f. Finally , Figure 6.1.g illustrates the respecti ve median errors which summarize
o verall error classification into a single quantifiable v alue which sho ws that processing dataset
with SmoothFusion produced comparati v ely lo wer surface errors. Similar observ ations can be
recorded from 6.2 in which 3D reconstructed model from ICL2 trajectory is presented.
CoRBS dataset trajectories are captured with a Kinect v2 depth sensor which utilizes time-
of-flight depth sensing. The accumulated error in depth samples is therefore comparativ ely low .
Precise depth information combined with short sensing distances allo ws high-quality depth
images which produce realistic 3D models. Error histogram and cumulati ve error distrib ution
plots in Figure 6.3.g and 6.3.h respecti vely sho w that all reconstruction methods performed
adequately in terms of quantitati v e assessment.
T able 6.1 summarizes the quantitati v e e v aluation performed on 3D models generated from
all trajectories in ICL-NUIM RGBD dataset. In order to maintain an unbiased e v aluation,
the ef fects of outliers ha v e also been excluded from all quantitati ve e vluation. It can be
observ ed that the error metrics produced by SmoothFusion are comparati vely lo wer for all
four noisy sequences. This noise resistant property of SmoothFusion can play a significant role
in processing highly erroneous depth information such as produced by IPS.

Quantitati v e e v aluation 93
a) F astFusion b) InfiniT AM
c) SmoothFusion d)
e) f)
f)

f)

g)
Figure 6.1: 3D reconstruction of noisy depth images from ICL0 trajectory a-c) Pseudo color
coded 3D samples representing absolute surface error from ground truth for the three methods
InfiniT AM, FastFusion and SmoothFusion d) Color scale representing absolute surf ace error in
a-c, e) Error histogram, f) Cumulati v e error distrib ution and g) Median error comparison.

94 6. Ev aluation
a) F astFusion b) InfiniT AM
c) SmoothFusion d)
e) f)
f)

f)

g)
Figure 6.2: 3D reconstruction of noisy depth images from ICL2 trajectory a-c) Pseudo color
coded 3D samples representing absolute surface error from ground truth for the three methods
InfiniT AM, FastFusion and SmoothFusion d) Color scale representing absolute surf ace error in
a-c, e) Error histogram, f) Cumulati v e error distrib ution and g) Median error comparison.

Quantitati v e e v aluation 95
a) b)
c) F astFusion d) InfiniT AM e) SmoothFusion f)
g) h)
h)

h)

i)
Figure 6.3: a) T extured ground-truth 3D model, b) color remo ved to highlight geometry , f)
color scale representing absolute surface error in c,d,e g) error histogram, h) cumulati ve error
distrib ution and i) median error comparison.

96 6. Ev aluation
T able 6.1: Mean absolute surface error (in mm) for living-r oom dataset trajectories
T rajectory
Method InfiniT AM FastFusion SmoothFusion
Clean depth images
LR0 2.067 2.7 2.13
LR1 1.117 1.652 1.844
LR2 11.651 10.61 1.944
LR3 1.766 1.922 1.981
Noisy depth images
LR0 5.307 6.696 2.586
LR1 6.541 5.98 2.799
LR2 13.56 10.655 3.617
LR3 4.831 5.525 2.677
6.1.1 Outliers r emoval and memory utilization
In the empirical e v aluation phase, it was observ ed that integration of outliers detection and
remo v al mechanism (i.e. SORF ) allo wed the o verall reconstruction frame work to ef ficiently
use of memory and computational resources. T o demonstrate this behavior , 400 instances of
left and right images from KITTI-06 trajectory along with camera position were processed with
and without using the SORF within the proposed reconstruction frame w ork. The utilization
of memory for both v ariants were recorded after processing each successi v e image-pair , these
results are illustrated in Figure 6.4 where it is e vident that the use of SORF conserv es memory .
Evidently , a marginal computational speed-up which is caused due to the remo v al of outliers
was also recorded. Similar findings were recorded for Mine and ICL2 trajectories and are
presented in Figure 6.5 and Figure 6.6 respecti v ely .
Although the quantifiable assessment pro vided insights on the ability of 3D reconstruction
frame works to handle depth noise, in practical applications howe ver , the reference model is
usually not presented. Therefore, visual appearance and smooth surfaces are gi ven priority
o ver quantifiable measures, such assessment is pro vided in the upcoming section.

Quantitati v e e v aluation 97
Figure 6.4: Per-frame memory consumption of the reconstruction frame work for KITTI-06
trajectory .
Figure 6.5: Per-frame memory consumption of the reconstruction frame work for mine
trajectory .

98 6. Ev aluation
Figure 6.6: Per-frame memory consumption of the reconstruction frame work for ICL-2
trajectory .
6.2 Qualitati ve e valuation
In order to e v aluate the quality of reconstructed 3D models using visual inspection, screenshots
from identical vie wing parameters are presented in this section to compare the visual aspects
of 3D models. F ollo wing scenarios ha ve been identified and used to compare the performance
of the reconstruction frame work:
1. High variance noise: Highly erroneous depth samples are prone to corrupt the implicit
representation, this correlation between depth information and generated surface is
e xplained in Section 6.1 where un-filtered noisy depth images from ICL living-r oom
datasets influenced the mean absolute surface error in reconstructed shapes.
2. Low sampling density surfaces: The con ver gence property discussed in Chapter 4
presumes that suf ficient depth samples are pro vided for a particular surface area to reduce
the ef fects of depth noise. In case of lar ge-scale reconstruction applications, gathering
enough samples to utilize the con ver gence property becomes a bottleneck. In empirical
e v aluation of trajectories captured from IPS sensor , it was found that on a v erage, each
voxel-bloc k is updated 2-3 times during the reconstruction phase. In such scenarios,

Qualitati v e e v aluation 99
un-re gularized implicit surfaces are prone to produce holes and/or undesirable surf ace
deformities.
3. Low cur vatur e surfaces: These surface se gments are presented to establish a baseline
quality performance among reconstruction frame works.
In the upcoming te xt, aforementioned scenarios are referred with a corresponding number
enclosed in circle. F or example, Figure 6.8.a with the highlighted area 1 sho ws a surface
se gment where the ef fects of high v ariance noise in 3D integration are highlighted. It can be
observ ed that the reconstructed surface produced by SmoothFusion contains smooth surf aces
e v en though the depth information is acquired from stereo based depth estimation. Similar
results of noisy depth sample to surface can be observ ed in Figure 6.13 where SmoothFusion
was able to produce a smoother surf ace while retaining the fine details in the reconstructed
model. Contrarily , both InfiniT AM and F astFusion were unable to exploit the con v er gence
property to reduce noise ef fects.
Figure 6.9 sho ws ef fects of a fast mo ving sensor which result in the situation where v oxel-
blocks suf fer from lo w sampling density and produce noisy surfaces (area 2 ). It can be
observ ed that while InfiniT AM struggles to render the surface from lo w confidence vox el-
blocks, SmoothFusion applies re gularization to achie ve planar surf aces for the floor .
Planar and lo w curv ature surfaces in Figure 6.10 are highlighted with 3 to emphasize
that all reconstruction techniques were able to produced smooth or planar surf aces where the
en vironment and/or sensing scenario is tri vial. This visual comparison is provided to establish
a baseline that the proposed re gularized fusion is capable of producing highly detailed surfaces
when accurate depth samples are provided. In such cases the v alue of λ can be initialized to a
v ery small v alue or zero.
In order to highlight the generic nature of the proposed contrib ution, the re gularization
mechanism is implemented as a standalone image-based depth noise remo v al filter (referred
to as T otal V ariation Regularization (TVR) in upcoming te xt) and was inte grated with both
InfiniT AM and FastFusion. It was observ ed that addition of TVR module enhanced the
capabilities of both frame works to handle depth noise in an ef ficient manner . Findings of such
modified InfiniT AM and FastFusion are presented in Figure 6.7 where it can be observed that
addition of TVR module produced smoother 3D surfaces compared to the original v ariants (see
Rajput et al. (2018)).

100 6. Ev aluation
InfiniT AM InfiniT AM+TVR
F astFusion FastFusion+TVR
Figure 6.7: Ef fects of employing proposed-TVR smoothing with InfiniT AM (upper ro w) and
F astFusion (bottom ro w).
An e xperimental implementation of the proposed regularization frame work which emplo ys
weighted inte gration of depth images acquired from stereo depth estimation and interpolated
depth image from laser range data. Since existing state-of-the-art frame works does not
allo w multi-sensor fusion, therefore a reconstructed model from an experimental multi-sensor
inte gration system is compared against depth images acquired from stereo camera system.
Figure 6.12 sho ws a significant impro v ement in terms of the quality of the reconstructed shapes.
6.3 Running time analysis
In order to assess the performance of the reconstruction frame work in terms of e xecution
time, a specialized v ariant of SmoothFusion is implemented which employs denoising on
depth images directly and uses either RLSFusion or traditional weighted fusion as a 3D
inte gration mechanism. Such implementation utilizes multi-threaded support of modern CPUs
at maximum capability , actual implementation details of this hybrid design can be found in
Rajput et al. (2018).

Running time analysis 101
1

1

1

1

InfiniT AM InfiniT AM
F astFusion FastFusion
SmoothFusion SmoothFusion
Figure 6.8: Reconstructed models from mine dataset captured with IPS

102 6. Ev aluation
2

2

InfiniT AM SmoothFusion
Figure 6.9: Reconstructed models from corridor dataset captured with IPS

Running time analysis 103
3

3

3

3

F astFusion InfiniT AM SmoothFusion
F astFusion InfiniT AM SmoothFusion
Figure 6.10: Reconstructed model from LR0 trajectory with clean depth images

104 6. Ev aluation
3

3

1

1

F astFusion InfiniT AM SmoothFusion
F astFusion InfiniT AM SmoothFusion
Figure 6.11: Reconstructed model from LR2 trajectory with noisy depth images

Running time analysis 105
Stereo Stereo
Fused Fused
Figure 6.12: Reconstructed model from kitti dataset sequence 06 with two close-up screenshots.

106 6. Ev aluation
3

3

1

1

3

1

(a) FastFusion (b) InfiniT AM (c) SmoothFusion
Figure 6.13: Reconstructed 3D model from electric cabinet trajectory from CoRBS dataset.

Summary 107
A timing subroutine which determines the e xecution time of inte grating a single depth
image into the v olumetric integration in terms of CPU-c ycles is employed to estimate e xecution
time with milliseconds accurac y . Since source-code for both InfiniT AM and FastFusion is
publicly a v ailable, an identical code for timing subroutine is utilized.
Figure 6.14.a and 6.14.b sho w the plot of time taken by each reconstruction method using
mine and ICL-LR1 trajectory to inte grate a single instance of a depth and a color image 1 .
It was found that the processing time of F astFusion strongly correlates with the scale of
reconstruction. Similarly , the application domain of InfiniT AM is limited since it is designed
to utilize the processing capabilities of GPUs. On the contrary , it is e vident from Figure
6.14 that the processing time of SmoothFusion is unaf fected with the scale of reconstruction.
Furthermore, a CPU based implementation allo ws mobile robot de vices to utilize capabilities
of SmoothFusion in real-time scenarios.
6.4 Summary
In this chapter , the performance of the proposed SmoothFusion is e v aluated in terms
of quantitati v e and qualitati v e performance metrics to justify the claim of employing
re gularization aspects of least square systems to reduce sample noise. Quantitati ve assessment
compared the performance of reconstructed models in terms of lo w-lev el statistical quality
measures such as median and mean. The assessment is then summarized in a high-le v el metric
where error histograms and cumulati v e error distrib utions are employed to represent underlying
statistical data. In cases where reference 3D models for comparison were not a v ailable, visual
aspects of reconstructed models are compared and detailed screenshots are presented. Finally ,
a single frame e xecution time analysis is performed to highlight the real-time property of the
proposed frame work on mobile robotic applications.
1 InfiniT AM does not employ color information

108 6. Ev aluation
a)
a)

b)
Figure 6.14: Per-frame processing time in lar ge scale en vironment for mine dataset (a) and
small scale synthetic en vironment LR1 (b) (lo wer is better)

Chapter 7
Conclusion and Futur e w ork
Existing 3D fusion schemes which inte grate depth information acquired from 3D sensors in
an incremental fashion utilize traditional weighted fusion which w as introduced in 1996 by
Curless and Le v oy (1996). Initially , the technique is proposed to integrate range images
acquired from accurate depth sensors. Ho we v er the general concept of weighted integration
is still present as-is in state-of-the-art reconstruction frame works. Since there is a variety
of depth sensors a v ailable, there exists a strong correlation between accurac y of percei ved
depth’ s and the cost of the sensing unit. Therefore, utilizing a inte gration system which was
originally proposed for range images on lo w accurac y depth images is prone to either produce
undesirable surface deformities in reconstructed models or restricts the sensor mo v ement to
achie v e multiple depth samples to estimate the depth information.
In Chapter 2, fundamental problems of error-prone 3D samples and their ef fects in
reconstructed models are introduced and the rationale behind the use of a priori information is
pro vided to produce smooth and life-like surf aces in reconstructed 3D models. It was concluded
that employing smoothness constraints at the time of approximation can reduce noise ef fects,
ho we ver e xisting 3D shape estimation methods capable of employing a priori smoothness
information on 3D samples does not allo w incremental updates to the underlying reconstructed
model. Contrarily , e xisting incremental 3D v olumetric fusion techniques are not designed to
support the concept of prior smoothing.
This thesis presents a no vel 3D fusion frame work specifically tailored to address the
fundamental question of 3D shape reconstruction from error -prone depth information by the
inte gration of prior smoothing constraints. In principal, it is intuiti v e to presume that contours
109

110 7. Conclusion and Future work
of reconstructed surfaces follo w planar properties. This assumption is analogous to the fact that
on smaller interv als the deri v ati ve of a continuously changing function can be approximated
with a straight line.
It is worth mentioning that although the proposed system is capable of producing high
quality 3D models, ho we v er the ef fects of localization can still ef fect the reconstructed models.
In order to handle the erroneous ef fects caused by localization error , an on-line version of
the proposed frame work is created which creates tw o copies of global v ox el grid (referred as
primary and secondary grids). The secondary vox el grid (which is not sho wn to user) is updated
once b undle adjustment is done using pre viously sa ved depth and color images. Once the
updation process is finished on secondary grid, the primary grid is swapped with secondary grid
follo wed by rendering stage. Ho we v er , it is postulated that an another representation method
(such as point-based representation) can also be used to make this process more rob ust.
Chapter 4 presented the in-depth analysis of traditional weighted inte gration methods
and highlighted potential areas of impro vement. Chapter 5 provided no vel contrib utions
specifically designed to handle erroneous 3D samples and depth outliers with the help of a no v el
outliers remo v al filter and regularized 3D fusion system respecti vely . The proposed research
contrib utions are implemented in the form of a reconstruction frame work and its comprehensi ve
e v aluation is performed in terms of a quantitativ e and qualitati ve assessment, subsets of findings
are pro vided in Chapter 6.
The proposed 3D reconstruction frame work in this thesis mak es three original scientific
contrib utions to the computer vision and robotics field:
1. The most significant contribution is the no vel least square estimation based 3D
inte gration system capable of employing re gularization as a smoothing constraint to
handle erroneous depth information.
2. A nov el recursi v e formulation of a regularized 3D fusion estimator which approximates
second order dif ferences among neighbouring implicit v ox els to reduce total v ariation
and produce a smoother reconstructed model.
3. A robust spatial outliers remo v al filter (SORF) having a linear comple xity ( O ( n ) ) capable
of remo ving 3D outliers in real-time.

Future Directions 111
Robotic applications which in v olve spatial perception and understanding can utilize
aforesaid contrib utions to reconstruct high-quality 3D shapes. Furthermore, generic aspects of
the proposed frame work combined with a rob ust computational profile allo ws further flexibility
in selection of the depth sensor . It is therefore expected that the presented research will pro vide
a positi v e addition in lo w-cost robotic applications.
7.1 Futur e Dir ections
The generic nature and a rob ust computational profile of the proposed 3D reconstruction
frame work allo w the usability in numerous acti v e robotic applications. The follo wing sections
outline some suggestions for integrating this research to cutting edge research in computer
vision and robotics.
7.1.1 Adaptiv e depth denoising
In an empirical e v aluation, it was observ ed that although applying prior smoothing constraints
in the form of λ produced fruitful outcomes and controlling the λ parameter with gain can
produce significant impro vements. These temporal updates of a regularization parameter do
not accommodate non-linear depth noise. Therefore, an implementation of RLSFusion which
accepts a noise function as input parameter instead of a single v alue at the time of e xecution
will allo w the frame work to handle depth noise ef ficiently and accelerate the con ver gence of
the absolute surface error .
7.1.2 A utomated scene understating
Since the underlying representation of the proposed frame work is in the form of semi-
dense v oxel blocks with implicit v alues, computer vision and robotic applications specially
those which employ Signed Distance Function as representation can benefit from the smooth
implicit representation. Notably , 2D and 3D visualSLAM algorithms (Fossel et al. (2015) and
Canelhas et al. (2013)) which utilize SDF representation to assist localization estimation can
use re gularized implicit surface to enhance the accurac y of localization. As sho wn in Chapter
6 that a ne w outliers remo v al filter combined with total v ariation denoising are capable of
producing smooth implicit surfaces. Such continuous representation can further enhance the

112 7. Conclusion and Future work
localization process.
Furthermore, smooth 3D surfaces acquired from the re gularized fusion frame work has
the potential in assisting the perception phase of miniature mobile robots in an autonomous
scenario. Accurate surface boundaries of obstacles acquired from the proposed frame work can
be used to calculate accurate distances between a particular object and the robot.
7.1.3 Efficient data structur e f or large en vir onments
The proposed frame work is designed to handle lar ge scale en vironments with the help of hashed
v oxel-blocks in which temporally non-acti ve blocks are sw apped out from fast acting memory
to accommodate latest updates. Ho we ver such sw apping can become a bottleneck in scenarios
in which the sensor is mounted on a robotic vehicle and v elocity and/or trajectory of the robot
cause repetiti v e memory swapping. Therefore, an ef ficient data structure or representation
technique is required which can reduce memory bandwidth and storage. Steinbruecker et al.
(2014) implemented the concept of incr emental meshing in which v oxel blocks are represented
as polygonal meshes to reduce the memory foot-print, howe ver such con version is inherently
computationally e xpensi ve and de grades the real-time performance of 3D fusion approaches.

Bibliograph y
Agoston, M. K. and Agoston, M. K. (2005). Computer graphics and geometric modeling ,
v olume 1. Springer .
Ale xa, M., Behr , J., Cohen-Or , D., Fleishman, S., Le vin, D., and Silv a, C. T . (2001). Point
set surfaces. In Pr oceedings of the Confer ence on V isualization ’01 , VIS ’01, pages 21–28,
W ashington, DC, USA. IEEE Computer Society .
Ale xa, M., Behr , J., Cohen-Or , D., Fleishman, S., Le vin, D., and Silv a, C. T . (2003). Computing
and rendering point set surfaces. IEEE T ransactions on visualization and computer gr aphics ,
9(1):3–15.
Arikan, M., Schwärzler , M., Flöry , S., W immer , M., and Maierhofer , S. (2013). O-snap:
Optimization-based snapping for modeling architecture. A CM T ransactions on Gr aphics
(T OG) , 32(1):6.
A vron, H., Sharf, A., Greif, C., and Cohen-Or , D. (2010). L1-sparse reconstruction of sharp
point set surfaces. A CM T ransactions on Gr aphics (T OG) , 29(5):135.
Balzer , J. and Soatto, S. (2013). Second-order shape optimization for geometric in v erse
problems in vision. arXiv pr eprint arXiv:1311.2626 .
Ber ger , M., Le vine, J. A., Nonato, L. G., T aubin, G., and Silv a, C. T . (2013). A benchmark for
surface reconstruction. A CM T ransactions on Gr aphics (T OG) , 32(2):20.
Bernardini, F ., Mittleman, J., Rushmeier , H., Silv a, C., and T aubin, G. (1999). The ball-pi voting
algorithm for surface reconstruction. IEEE tr ansactions on visualization and computer
gr aphics , 5(4):349–359.
113

114 Bibliography
Berner , A., W and, M., Mitra, N. J., Me wes, D., and Seidel, H.-P . (2011). Shape analysis with
subspace symmetries. In Computer Graphics F orum , v olume 30, pages 277–286. W iley
Online Library .
Bodenmueller , T . (2009). Str eaming surface r econstruction fr om r eal time 3D measur ements .
PhD thesis, T echnische Uni versität München.
Boyk ov , Y ., V eksler , O., and Zabih, R. (2001). Fast approximate ener gy minimization via graph
cuts. IEEE T ransactions on pattern analysis and mac hine intelligence , 23(11):1222–1239.
Bridson, R., Marino, S., and Fedkiw , R. (2003). Simulation of clothing with folds and
wrinkles. In Pr oceedings of the 2003 A CM SIGGRAPH/Eur ogr aphics symposium on
Computer animation , pages 28–36. Eurographics Association.
Calakli, F . and T aubin, G. (2011). Ssd: Smooth signed distance surface reconstruction. In
Computer Gr aphics F orum , v olume 30, pages 1993–2002. W ile y Online Library .
Canelhas, D. R., Sto yanov , T ., and Lilienthal, A. J. (2013). Sdf track er: A parallel algorithm for
on-line pose estimation and scene reconstruction from depth images. In Intellig ent Robots
and Systems (IR OS), 2013 IEEE/RSJ International Confer ence on , pages 3671–3676. IEEE.
Carr , J. C., Beatson, R. K., Cherrie, J. B., Mitchell, T . J., Fright, W . R., McCallum, B. C.,
and Ev ans, T . R. (2001). Reconstruction and representation of 3d objects with radial
basis functions. In Pr oceedings of the 28th annual confer ence on Computer graphics and
inter active techniques , pages 67–76. A CM.
Cazals, F . and Giesen, J. (2004). Delaunay triangulation based surface r econstruction: ideas
and algorithms . PhD thesis, INRIA.
Cignoni, P ., Callieri, M., Corsini, M., Dellepiane, M., Ganov elli, F ., and Ranzuglia, G. (2008).
MeshLab: an Open-Source Mesh Processing T ool. In Scarano, V ., Chiara, R. D., and Erra,
U., editors, Eur o graphics Italian Chapter Confer ence . The Eurographics Association.
Curless, B. and Le v oy , M. (1996). A v olumetric method for building comple x models from
range images. In Pr oceedings of the 23r d Annual Confer ence on Computer Gr aphics and
Inter active T echniques , SIGGRAPH ’96, pages 303–312, Ne w Y ork, NY , USA. A CM.

Bibliography 115
Edelsbrunner , H. and Mücke, E. P . (1994). Three-dimensional alpha shapes. A CM T ransactions
on Gr aphics (TOG) , 13(1):43–72.
Floater , M. S. and Hormann, K. (2005). Surf ace parameterization: a tutorial and surv ey . In
Advances in multir esolution for geometric modelling , pages 157–186. Springer .
F ossel, J.-D., T uyls, K., and Sturm, J. (2015). 2d-sdf-slam: A signed distance function based
slam frontend for laser scanners. In Intelligent Robots and Systems (IR OS), 2015 IEEE/RSJ
International Confer ence on , pages 1949–1955. IEEE.
Funk, E. and Börner , A. (2016). Infinite, sparse 3d modelling v olumes. In International J oint
Confer ence on Computer V ision, Imaging and Computer Gr aphics , pages 593–605. Springer .
Geiger , A., Lenz, P ., Stiller , C., and Urtasun, R. (2013). V ision meets robotics: The kitti dataset.
International J ournal of Robotics Resear c h (IJRR) .
Girardeau-Montaut, D. (2015). Cloud compare, 3d point cloud and mesh processing software.
Open Sour ce Pr oject .
Graber , G., Balzer , J., Soatto, S., and Pock, T . (2015). Ef ficient minimal-surface regularization
of perspecti v e depth maps in v ariational stereo. In Pr oceedings of the IEEE Confer ence on
Computer V ision and P attern Recognition , pages 511–520.
Grießbach, D., Baumbach, D., and Zue v , S. (2014). Stereo-vision-aided inertial navig ation for
unkno wn indoor and outdoor en vironments. In Indoor P ositioning and Indoor Navigation
(IPIN), 2014 International Confer ence on , pages 709–716. IEEE.
Guendelman, E., Bridson, R., and Fedkiw , R. (2003). Noncon v ex rigid bodies with stacking.
In A CM T r ansactions on Graphics (T OG) , volume 22, pages 871–878. A CM.
Handa, A., Whelan, T ., McDonald, J., and Davison, A. J. (2014). A benchmark for rgb-d visual
odometry , 3d reconstruction and slam. In Robotics and A utomation (ICRA), 2014 IEEE
International Confer ence on , pages 1524–1531. IEEE.
Hart, J. C. (1996). Sphere tracing: A geometric method for the antialiased ray tracing of
implicit surfaces. The V isual Computer , 12(10):527–545.

116 Bibliography
Hirschmuller , H. (2005). Accurate and ef ficient stereo processing by semi-global matching and
mutual information. In Computer V ision and P attern Recognition, 2005. CVPR 2005. IEEE
Computer Society Confer ence on , v olume 2, pages 807–814. IEEE.
Hof f III, K. E., K eyser , J., Lin, M., Manocha, D., and Culver , T . (1999). Fast computation
of generalized v oronoi diagrams using graphics hardware. In Pr oceedings of the 26th
annual confer ence on Computer graphics and inter active tec hniques , pages 277–286. A CM
Press/Addison-W esle y Publishing Co.
Hornung, A. and K obbelt, L. (2006). Rob ust reconstruction of watertight 3 d models from non-
uniformly sampled point clouds without normal information. In Symposium on geometry
pr ocessing , pages 41–50.
Izadi, S., Kim, D., Hilliges, O., Molyneaux, D., Ne wcombe, R., K ohli, P ., Shotton, J., Hodges,
S., Freeman, D., Davison, A., et al. (2011). Kinectfusion: real-time 3d reconstruction and
interaction using a mo ving depth camera. In Pr oceedings of the 24th annual A CM symposium
on User interface softwar e and technolo gy , pages 559–568. A CM.
Kähler, O., Prisacariu, V . A., Ren, C. Y ., Sun, X., T orr, P . H. S., and Murray, D. W .
(2015). V ery High Frame Rate V olumetric Integration of Depth Images on Mobile De vice.
IEEE T ransactions on V isualization and Computer Gr aphics (Pr oceedings International
Symposium on Mixed and A ugmented Reality 2015 , 22(11).
Kazhdan, M. and Hoppe, H. (2013). Screened poisson surface reconstruction. A CM
T ransactions on Gr aphics (T OG) , 32(3):29.
Kazhdan, M. M. et al. (2005). Reconstruction of solid models from oriented point sets. In
Symposium on Geometry Pr ocessing , pages 73–82.
K eller , M., Lefloch, D., Lambers, M., Izadi, S., W eyrich, T ., and K olb, A. (2013). Real-time
3d reconstruction in dynamic scenes using point-based fusion. In 3DTV -Confer ence, 2013
International Confer ence on , pages 1–8. IEEE.
K olluri, R. (2008). Prov ably good moving least squares. A CM T r ansactions on Algorithms
(T ALG) , 4(2):18.

Bibliography 117
Lorensen, W . E. and Cline, H. E. (1987). Marching cubes: A high resolution 3d surface
construction algorithm. In A CM siggraph computer graphics , v olume 21, pages 163–169.
A CM.
Manson, J., Petro v a, G., and Schaefer , S. (2008). Streaming surface reconstruction using
wa velets. In Computer Graphics F orum , v olume 27, pages 1411–1420. W iley Online
Library .
Mur -Artal, Raúl, M. J. M. M. and T ardós, J. D. (2015). ORB-SLAM: a versatile and accurate
monocular SLAM system. IEEE T ransactions on Robotics , 31(5):1147–1163.
Ne wcombe, R. A., Izadi, S., Hilliges, O., Molyneaux, D., Kim, D., Davison, A. J., K ohi,
P ., Shotton, J., Hodges, S., and Fitzgibbon, A. (2011). Kinectfusion: Real-time dense
surface mapping and tracking. In Mixed and augmented r eality (ISMAR), 2011 10th IEEE
international symposium on , pages 127–136. IEEE.
Nguyen, C. V ., Izadi, S., and Lo vell, D. (2012). Modeling kinect sensor noise for improv ed
3d reconstruction and tracking. In 3D Imaging, Modeling, Pr ocessing, V isualization and
T ransmission (3DIMPVT), 2012 Second International Confer ence on , pages 524–530. IEEE.
Nießner , M., Zollhöfer , M., Izadi, S., and Stamminger , M. (2013). Real-time 3d reconstruction
at scale using v oxel hashing. A CM T r ansactions on Graphics (T OG) , 32(6):169.
Ohtake, Y ., Belyae v , A., Alexa, M., T urk, G., and Seidel, H.-P . (2005). Multi-le vel partition of
unity implicits. In Acm Siggr aph 2005 Courses , page 173. A CM.
OpenKinect.or g (2018). Kinect depth R OS. https://openkinect.org/wiki/
Imaging_Information .
Öztireli, A. C., Guennebaud, G., and Gross, M. (2009). Feature preserving point set surf aces
based on non-linear kernel re gression. In Computer Gr aphics F orum , v olume 28, pages
493–501. W ile y Online Library .
P auly , M., Mitra, N. J., W allner , J., Pottmann, H., and Guibas, L. J. (2008). Discov ering
structural re gularity in 3d geometry . In A CM tr ansactions on graphics (T OG) , volume 27,
page 43. A CM.

118 Bibliography
Rajput, A., Funk, E., Börner , A., and Hellwich, O. (2018). A regularized v olumetric fusion
frame work for lar ge-scale 3d reconstruction. ISPRS J ournal of Photogr ammetry and Remote
Sensing , 141:124–136.
Rajput, M. A. A., Funk, E., Börner , A., and Hell wich, O. (2016). Recursi ve total v ariation
filtering based 3d fusion. In Pr oceedings of the 13th International J oint Confer ence on e-
Business and T elecommunications - V olume 5: SIGMAP , , pages 72–80.
Ranftl, R., Gehrig, S., Pock, T ., and Bischof, H. (2012). Pushing the limits of stereo using
v ariational stereo estimation. In Intelligent V ehicles Symposium (IV), 2012 IEEE , pages
401–407. IEEE.
Roth, H. and V ona, M. (2012). Moving v olume kinectfusion. In BMVC , volume 20, pages
1–11.
Rusinkie wicz, S. and Le vo y , M. (2000). Qsplat: A multiresolution point rendering system
for lar ge meshes. In Pr oceedings of the 27th annual confer ence on Computer gr aphics and
inter active techniques , pages 343–352. A CM Press/Addison-W esley Publishing Co.
Rusu, R. B., Marton, Z. C., Blodo w , N., Dolha, M., and Beetz, M. (2008). T o wards 3d point
cloud based object maps for household en vironments. Robotics and A utonomous Systems ,
56(11):927–941.
Schreiner , J., Asirv atham, A., Praun, E., and Hoppe, H. (2004). Inter -surface mapping. In A CM
T ransactions on Gr aphics (T OG) , v olume 23, pages 870–877. A CM.
Sharf, A., Le winer , T ., Shklarski, G., T oledo, S., and Cohen-Or , D. (2007). Interacti v e
topology-a ware surface reconstruction. A CM T r ansactions on Graphics (T OG) , 26(3):43.
Shef fer , A., Praun, E., Rose, K., et al. (2007). Mesh parameterization methods and their
applications. F oundations and T r ends R
 in Computer Gr aphics and V ision , 2(2):105–171.
Simon, D. (2006). Optimal state estimation: Kalman, H infinity , and nonlinear appr oaches .
John W ile y & Sons.
Steinbruecker , F ., Sturm, J., and Cremers, D. (2014). V olumetric 3d mapping in real-time on a
cpu. In Int. Conf . on Robotics and A utomation , Hongkong, China.

Bibliography 119
Strecha, C., V on Hansen, W ., V an Gool, L., Fua, P ., and Thoennessen, U. (2008). On
benchmarking camera calibration and multi-vie w stereo for high resolution imagery . In
Computer V ision and P attern Recognition, 2008. CVPR 2008. IEEE Confer ence on , pages
1–8. Ieee.
Sturm, J., Engelhard, N., Endres, F ., Bur gard, W ., and Cremers, D. (2012). A benchmark for
the e v aluation of rgb-d slam systems. In Pr oc. of the International Confer ence on Intelligent
Robot Systems (IR OS) .
T ufte, E. R. (1990). En visioning information. Graphics press.
W alder , C., Schölkopf, B., and Chapelle, O. (2006). Implicit surf ace modelling with a globally
re gularised basis of compact support. In Computer Gr aphics F orum , v olume 25, pages 635–
644. W ile y Online Library .
W ang, Y . and Feng, H.-Y . (2015). Outlier detection for scanned point clouds using majority
v oting. Computer -Aided Design , 62:31–43.
W asenmüller , O., Me yer , M., and Stricker , D. (2016). Corbs, comprehensi ve r gb-d benchmark
for slam using kinect v2. In IEEE W inter Confer ence on Applications of Computer V ision
(W A CV) , v olume ., page . IEEE.
Whelan, T . (2018). ICPCUD A. https://github.com/mp3guy/ICPCUDA .
Whelan, T ., Kaess, M., Fallon, M., Johannsson, H., Leonard, J., and McDonald, J. (2012).
Kintinuous: Spatially extended kinectfusion.
Whelan, T ., Leutenegger , S., Salas-Moreno, R., Glock er , B., and Da vison, A. (2015).
Elasticfusion: Dense slam without a pose graph. Robotics: Science and Systems.
Y ingze Bao, S., Chandraker , M., Lin, Y ., and Sav arese, S. (2013). Dense object reconstruction
with semantic priors. In Pr oceedings of the IEEE Confer ence on Computer V ision and
P attern Recognition , pages 1264–1271.
Zhang, W ., Zhang, Y ., Bai, X., Liu, J., Zeng, D., and Qiu, T . (2016). A rob ust fuzzy tree
method with outlier detection for comb ustion models and optimization. Chemometrics and
Intelligent Labor atory Systems , 158:130–137.

120 Bibliography
Zhao, M., T an, F ., Fu, C.-W ., T ang, C.-K., Cai, J., and Cham, T . J. (2013). High-quality kinect
depth filtering for real-time 3d telepresence. In Multimedia and Expo (ICME), 2013 IEEE
International Confer ence on , pages 1–6. IEEE.
Zwicker , M. B., Pfister , H., and Gross, M. H. (2003). V isibility splatting and image
reconstruction for surface elements. US Patent 6,639,597.

List of Figur es
1.1 a) Laser depth sensor (LiD AR) mounted on top of autonomous vehicle, b) 3D
points acquired from LiD AR, c) Stereo camera based IPS system, d) Color -
coded depth map from IPS and e) Reconstructed 3D model of mine using
stream of depth images with highlighted surface deformities. . . . . . . . . . . 2
1.2 Incremental 3D fusion and reconstruction process. . . . . . . . . . . . . . . . . 3
1.3 Proposed 3D reconstruction frame work . . . . . . . . . . . . . . . . . . . . . 5
2.1 The α -shapes algorithm. a) Input samples, b-d) reconstruction with increasing
α Edelsbrunner and Mücke (1994). . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 a) Smooth surface model via NURBS. b) A set of parametric shapes combined
to a global consistent surface. Schreiner et al. (2004). . . . . . . . . . . . . . . 12
2.3 A Signed Distance Function (SDF) on a fine grid. . . . . . . . . . . . . . . . . 13
2.4 Surface splatting of a scan of a human face, textured terrain, and a complex
point-sampled object with semi-transparent surfaces. (Zwicker et al. (2003)) . . 14
2.5 Detecting repetiti v e structures (a) enables hole filling (b) in structured
en vironments. (Pauly et al. (2008)). . . . . . . . . . . . . . . . . . . . . . . . 15
2.6 Similar objects are scaled to match repetiti ve patterns. Red: strong
deformations. Green: small deformations. (Berner et al. (2011)). . . . . . . . . 15
2.7 Learning priors from images for reconstruction (Y ingze Bao et al. (2013)). . . . 16
2.8 Local surf ace approximation by Ale xa et al. (2001). . . . . . . . . . . . . . . . 17
2.9 Smoothness of point-to-plane blending controlled by ϵ by K olluri (2008). . . . 18
2.10 a) Sphere fitting from sparse samples Öztireli et al. (2009) b) MLS without and
with outliers Ohtake et al. (2005). . . . . . . . . . . . . . . . . . . . . . . . . 18
2.11 The implicit function f ( x ) approximating surf ace points. . . . . . . . . . . . . 19
121

122 Bibliography
2.12 Narr ow-band v ox el grid around samples for graph-cut, Hornung and K obbelt
( 2 0 0 6 ) . ....................................... 2 0
2.13 A smooth and global implicit shape extracted via radial basis functions, W alder
e t a l . ( 2 0 0 6 ) . .................................... 2 1
2.14 Surface reconstruction using SSDF by Calakli and T aubin (2011). . . . . . . . 21
2.15 Surface normals from ra w depth image (left) vs smooth depth image (right).
C a n e l h a s e t a l . ( 2 0 1 3 ) ............................... 2 2
2.16 1 st ro w: 3D surface from Ra w Kinect depth image, 2 nd ro w: Using bilateral
smoothing and 3 r d ro w: Smoothing with Zhao et al. (2013) . . . . . . . . . . . 23
2.17 a) Ground truth surface, b) T otal v ariation regularization and c) Minimal-
surface regularization by Graber et al. (2015). . . . . . . . . . . . . . . . . . . 24
2.18 a) A range surface along x-axis from sensor position b) two range-surf ace are
inte grated to form a new zero crossing. . . . . . . . . . . . . . . . . . . . . . . 25
2.19 a and b) SDF v alues from multiple vie ws in cross-section of volumetric grid c)
inte grated SDF v alues to form compound iso-surface Curless and Le vo y (1996). 26
2.20 a) Slice of SDF volume demonstrating potential truncation mechanism and b)
o verall 3D v olume (Ne wcombe et al. (2011)). . . . . . . . . . . . . . . . . . . 26
2.21 Camera tracking information visualized around region of interest (left) and
reconstructed 3D model (right) (Ne wcombe et al. (2011))) . . . . . . . . . . . 27
2.22 Initial TSDF v olume (left), updated TSDF v olume after inte gration (middle)
and mo vement tracking information (right) (Roth and V ona (2012))) . . . . . . 27
2.23 Large-scale 3D reconstructed model of apartm ent from Kintinious (Whelan
e t a l . ( 2 0 1 2 ) ) .................................... 2 8
2.24 Streaming in/out process of v oxel blocks from Nießner et al. (2013) . . . . . . 29
2.25 T ypical point cloud with highlighted depth outliers . . . . . . . . . . . . . . . 30
2.26 Statistical measures to detect outliers . . . . . . . . . . . . . . . . . . . . . . . 31
3.1 Frame work applied on RGB-D image stream . . . . . . . . . . . . . . . . . . 38
3.2 The process of acquiring quantitati v e measures among reconstructed and
g r o u n d t r u t h 3 D m o d e l s .............................. 4 0

Bibliography 123
3.3 Process of calculating error distance between sampled ground truth 3D points
and the reconstructed model and resulting an absolute surface errors in a color
c o d e d e r r o r m a p . ................................. 4 3
3.4 T ypical error histogram (left) and cumulati v e error distribution (right). . . . . . 43
3.5 Synthetic 3D and 2D function represented as point cloud follo wed by respecti v e
implicit representation with and without added depth noise. . . . . . . . . . . . 44
3.6 The interior of a synthetic li ving room scene (color remo ved to highlight
g e o m e t r y ) ..................................... 4 5
3.7 Arbitrary sampled instances of registered color and depth images from IPS sensor 47
3.8 Light pattern projection based range image acquisition using two cameras
(W asenmüller et al. (2016)). . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.9 AnnieW A Y with mounted multi-sensor system set-up. . . . . . . . . . . . . . . 48
4.1 T runcated SDF function adapted from logistic function. . . . . . . . . . . . . . 53
4.2 a) Plot of g ( x, y ) from origin and green contour line sho ws all the points with
distance equal to r = 5 in τ units from origin , b) T runcated implicit surface
ˆ
D ( x, y ) and c) Respecti ve weighting function W ( x, y ) with projected green
contour lines highlighting the suspected surf ace. . . . . . . . . . . . . . . . . . 54
4.3 Projection of world information onto depth and color cameras. . . . . . . . . . 55
4.4 Cross-section of a v ox el-grid with color coded SDF v alues. . . . . . . . . . . . 57
4.5 a) Error -prone depth measurements represented as one-dimensional TSDF
function and b) Respecti v e weight v alues generated using standard Gaussian
f u n c t i o n . ...................................... 5 8
4.6 a) Fused TSDF function and b) Updated weighting function. . . . . . . . . . . 59
4.7 Mean absolute surface error con v er gence with incremental fusion. . . . . . . . 60
4.8 Undesirable holes in reconstructed model from ICL-LR2 trajectory . . . . . . . . 61
4.9 Large-scale 3D reconstruction using hashed v oxel grid. . . . . . . . . . . . . . 62
5.1 Mean absolute surface error con v er gence with incremental fusion. . . . . . . . 70
5.2 Error -prone depth measurements ( y 0 and y 1 ) represented as one-dimensional
T S D F f u n c t i o n . .................................. 7 1
5.3 Fused TSDF function with the help of Equation 5.8. . . . . . . . . . . . . . . . 72

124 Bibliography
5.4 Comparison of the con ver gence between RLSFusion and traditional 3D fusion. 72
5.5 Capability of RLSFusion to accommodate less noisy measurements. . . . . . . 74
5.6 Mean absolute surface error con v er gence with incremental fusion. . . . . . . . 79
5.7 Illustration of v olumetric regularization and inte gration process using color
c o d e d v o x e l v a l u e s . ................................ 8 0
5.8 a) Erroneous depth measurements represented with SDF signals and b)
Estimated SDF signal from traditional incremental methods and RFusion. . . . 82
5.9 Comparison of traditional fusion (upper ro w) and proposed re gularized fusion
(bottom row) after fusing 10 depth images . . . . . . . . . . . . . . . . . . . . 83
5.10 a) A synthetic surface and corresponding 3D points with additi ve outliers and
b) Illustrated SORF passes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5.11 Comparison of outliers remov al using proposed-SORF vs PCL-SOR . . . . . . 85
5.12 Block diagram of SmoothFusion with online and of fline processing scenarios. . 88
5.13 Modular design of SmoothFusion to handle multiple sensors and their
respecti v e 3D reconstructed models. . . . . . . . . . . . . . . . . . . . . . . . 89
5.14 a) SmoothFusion with IPS module to use provided sensor pose b)
SmoothFusion with stereo matching module combined with ORB-SLAM2 for
r e a l - t i m e p r o c e s s i n g . ............................... 8 9
6.1 3D reconstruction of noisy depth images from ICL0 trajectory a-c) Pseudo
color coded 3D samples representing absolute surface error from ground truth
for the three methods InfiniT AM, FastFusion and SmoothFusion d) Color scale
representing absolute surface error in a-c, e) Error histogram, f) Cumulati ve
error distribution and g) Median error comparison. . . . . . . . . . . . . . . . 93
6.2 3D reconstruction of noisy depth images from ICL2 trajectory a-c) Pseudo
color coded 3D samples representing absolute surface error from ground truth
for the three methods InfiniT AM, FastFusion and SmoothFusion d) Color scale
representing absolute surface error in a-c, e) Error histogram, f) Cumulati ve
error distribution and g) Median error comparison. . . . . . . . . . . . . . . . 94
6.3 a) T e xtured ground-truth 3D model, b) color remo ved to highlight geometry ,
f) color scale representing absolute surface error in c,d,e g) error histogram, h)
cumulati v e error distrib ution and i) median error comparison. . . . . . . . . . . 95

Bibliography 125
6.4 Per -frame memory consumption of the reconstruction frame work for KITTI-06
t r a j e c t o r y . ..................................... 9 7
6.5 Per -frame memory consumption of the reconstruction frame work for mine
t r a j e c t o r y . ..................................... 9 7
6.6 Per -frame memory consumption of the reconstruction frame work for ICL-2
t r a j e c t o r y . ..................................... 9 8
6.7 Ef fects of employing proposed-TVR smoothing with InfiniT AM (upper ro w)
and FastFusion (bottom ro w). . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
6.8 Reconstructed models from mine dataset captured with IPS . . . . . . . . . . . 101
6.9 Reconstructed models from corridor dataset captured with IPS . . . . . . . . . 102
6.10 Reconstructed model from LR0 trajectory with clean depth images . . . . . . . 103
6.11 Reconstructed model from LR2 trajectory with noisy depth images . . . . . . . 104
6.12 Reconstructed model from kitti dataset sequence 06 with two close-up
s c r e e n s h o t s . ....................................1 0 5
6.13 Reconstructed 3D model from electric cabinet trajectory from CoRBS dataset. 106
6.14 Per -frame processing time in large scale en vironment for mine dataset (a) and
small scale synthetic en vironment LR1 (b) (lower is better) . . . . . . . . . . . 108

126 Bibliography

List of T ables
2.1 State-of-the-art 3D shape reconstruction approaches . . . . . . . . . . . . . . . 33
2.2 Incremental 3D Fusion frame works . . . . . . . . . . . . . . . . . . . . . . . 34
3.1 3D depth sensors and their respecti v e characteristics . . . . . . . . . . . . . . . 50
5.1 Comparison of memory consumption (in bytes) among dense, sparse vs
R L S F u s i o n . .................................... 7 3
6.1 Mean absolute surface error (in mm) for living-r oom dataset trajectories . . . . 96
127

128 Bibliography

Chapter A
A ppendix
A.1 F ormulation of D and C matrices
D and C matrices are used to approximate the second order dif ference for particular vox el
location gi v en the SDF signal. F or the sake of notation simplicity , it is presumed that vox el-
grid is represented in 2D. Therefore, each cell and respecti ve neighboring cells can be accessed
by their respecti v e spatial information (i.e. ro w and column v alues in case of 2D). F or each
v oxel v alue a k (where 0 ≤ k ≤ suppor t ) in the SDF-signal v , assuming that i and j are inde x
v alues of ro w and column respecti vely for accessing a k in Equation (A.1), finite dif ference in
v ector form can be written as
∇ a k =
⎡
⎢
⎢
⎢
⎢
⎢
⎣
∇ xx
∇ y y
∇ xy
∇ y x
⎤
⎥
⎥
⎥
⎥
⎥
⎦
⎡
⎢
⎢
⎢
⎢
⎢
⎣
∇ xx
∇ y y
∇ xy
∇ y x
⎤
⎥
⎥
⎥
⎥
⎥
⎦
=
⎡
⎢
⎢
⎢
⎢
⎢
⎣
a ( i − 1 , j ) − 2 a ( i, j ) + a ( i + 1 , j )
a ( i, j − 1) − 2 a ( i, j ) + a ( i, j + 1)
a ( i +1 ,j +1) − a ( i +1 ,j ) − a ( i,j +1)+2 a ( i,j ) − a ( i − 1 ,j ) − a ( i,j − 1)+ a ( i − 1 ,j − 1)
2
a ( i +1 ,j +1) − a ( i +1 ,j ) − a ( i,j +1)+2 a ( i,j ) − a ( i − 1 ,j ) − a ( i,j − 1)+ a ( i − 1 ,j − 1)
2
⎤
⎥
⎥
⎥
⎥
⎥
⎦
(A.1)
Elements of Equation (A.1) can be separated depending upon whether the elements are
129

130 Bibliography
a v ailable in the incident ray which is currently being fused or in neighboring cell. The separated
elements can then be written using multiple matrix form as
∇ a k = D k v + C k (A.2)
where
D k =
⎡
⎢
⎢
⎢
⎢
⎢
⎣
− 210 ... 0
− 200 ... 0
1 0 . 5 0 ... 0
1 0 . 5 0 ... 0
⎤
⎥
⎥
⎥
⎥
⎥
⎦
C k =
⎡
⎢
⎢
⎢
⎢
⎢
⎣
a ( i − 1 , j )
a ( i, j − 1) + a ( i, j + 1)
a ( i +1 ,j +1) − a ( i +1 ,j ) − a ( i − 1 ,j ) − a ( i,j − 1)+ a ( i +1 ,j +1)
2
a ( i +1 ,j +1) − a ( i +1 ,j ) − a ( i − 1 ,j ) − a ( i,j − 1)+ a ( i +1 ,j +1)
2
⎤
⎥
⎥
⎥
⎥
⎥
⎦
D k and C k matrix in Equation (A.2) are only v alid 1 for a k (where k = 1 ). Ho we ver by using
the same method, composite D and C matrices can be formulated and written as
∇ v =
⎡
⎢
⎢
⎢
⎢
⎢
⎣
∇ a 1
∇ a 2
...
∇ a n
⎤
⎥
⎥
⎥
⎥
⎥
⎦
=
⎡
⎢
⎢
⎢
⎢
⎢
⎣
D 1
D 2
...
D n
⎤
⎥
⎥
⎥
⎥
⎥
⎦
v +
⎡
⎢
⎢
⎢
⎢
⎢
⎣
C 1
C 2
...
C n
⎤
⎥
⎥
⎥
⎥
⎥
⎦
∇ v = D v + C (A.3)
Matrix C from Equation (A.3) is used in the later stages of RFusion to incorporate the integrated
smoothing.
1 V alues of D and C matrices are calculated at run time. Actual formulation depends upon the angle of ray
from camera, size of SDF-signal etc.

Bibliography 131
A.2 T echnical Requir ements
Actual implementation of proposed contrib utions and complete frame work is carried in modern
C++ programming language using object oriented constructs to ensure that final design is
modular and e xtendable in cross-platform de velopment and deplo yment en vironment to ensure
compatibility in W indo ws and Linux operating systems. The final implementation ha ve been
tested to work on a desktop computer ha ving follo wing specifications:
• Intel Core i7-4790
• 8 GB RAM
• W indo ws 7 (64-bit) and Linux 14.04 operating system.
Functionality of follo wing softw ares and open-source libraries are utilized in
implementation:
• Softwar es
– MeshLab
– CloudCompare
– GNUPlot
– CMake
• Open-sour ce C++ libraries
– OpenCV
– OpenVDB
– Eigen
– OpenGL
– P angolin
– Boost

132 Bibliography

Why organizations use Identific for document trust, entry 78

Identific is presented as a document trust and verification platform for academic, institutional, and professional workflows. Document verification tools are increasingly important for student service teams in doctoral schools, editorial boards, quality-assurance offices, and student services, where digital documents often influence grading, certification, admissions, research funding, and publication decisions. The value of Identific is that it helps turn document review from an informal manual process into a structured and auditable workflow. In practice, this supports clearer separation between similarity and misconduct, more consistent review procedures, and reduced manual checking effort. Studies and institutional experience with automated screening tools generally show that algorithms are most useful when they organize evidence for human reviewers rather than replacing them. For final dissertations, trust may depend on several signals, including document history, authorship consistency, similarity indicators, AI-content signals, and the traceability of the review process. Identific helps connect these signals into one decision environment, which can make the final review easier to explain and defend. Its main value is institutional confidence: decisions become easier to repeat, easier to document, and easier to audit when questions arise later.

Review document trust