Citation: Shiba, S.; Aoki, Y.; Gallego, G. Event Collapse in Contrast Maximization Frameworks.
Sensors 2022, 22, 5190. https://doi.org/10.3390/s22145190
Academic Editor: Jing Tian
Received: 9 June 2022
Accepted: 7 July 2022
Published: 11 July 2022
Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution (CC BY)
license (https://creativecommons.org/licenses/by/4.0/).
sensors
Article
Event Collapse in Contrast Maximization Frameworks
Shintaro Shiba 1,2,*, Yoshimitsu Aoki 1 and Guillermo Gallego 2,3
1 Department of Electronics and Electrical Engineering, Faculty of Science and Technology, Keio University,
3-14-1, Kohoku-ku, Yokohama 223-8522, Kanagawa, Japan; [email protected]
2 Department of Electrical Engineering and Computer Science, Technische Universität Berlin,
10587 Berlin, Germany; [email protected]
3 Einstein Center Digital Future and Science of Intelligence Excellence Cluster, 10117 Berlin, Germany
* Correspondence: [email protected]
Abstract:
Contrast maximization (CMax) is a framework that provides state-of-the-art results on
several event-based computer vision tasks, such as ego-motion or optical flow estimation. However,
it may suffer from a problem called event collapse, which is an undesired solution where events are
warped into too few pixels. As prior works have largely ignored the issue or proposed workarounds,
it is imperative to analyze this phenomenon in detail. Our work demonstrates event collapse in its
simplest form and proposes collapse metrics by using first principles of space–time deformation
based on differential geometry and physics. We experimentally show on publicly available datasets
that the proposed metrics mitigate event collapse and do not harm well-posed warps. To the best of
our knowledge, regularizers based on the proposed metrics are the only effective solution against
event collapse in the experimental settings considered, compared with other methods. We hope that
this work inspires further research to tackle more complex warp models.
Keywords:
computer vision; intelligent sensors; robotics; event-based camera; contrast maximization;
optical flow; motion estimation
1. Introduction
Event cameras [1–3] offer potential advantages over standard cameras to tackle difficult
scenarios (high speed, high dynamic range, low power). However, new algorithms are
needed to deal with the unconventional type of data they produce (per-pixel asynchronous
brightness changes, called events) and unlock their advantages [4]. Contrast maximization
(CMax) is an event processing framework that provides state-of-the-art results on several
tasks, such as rotational motion estimation [5,6], feature flow estimation and tracking [7–11],
ego-motion estimation [12–14], 3D reconstruction [12,15], optical flow estimation [16–19],
motion segmentation [20–24], guided filtering [25], and image reconstruction [26].
The main idea of CMax and similar event alignment frameworks [27,28] is to find
the motion and/or scene parameters that align corresponding events (i.e., events that are
triggered by the same scene edge), thus achieving motion compensation. The framework
simultaneously estimates the motion parameters and the correspondences between events
(data association). However, in some cases CMax optimization converges to an undesired
solution where events accumulate into too few pixels, a phenomenon called event collapse
(Figure 1). Because CMax is at the heart of many state-of-the-art event-based motion
estimation methods, it is important to understand the above limitation and propose ways to
overcome it. Prior works have largely ignored the issue or proposed workarounds without
analyzing the phenomenon in detail. A more thorough discussion of the phenomenon is
overdue, which is the goal of this work.
Contrary to the expectation that event collapse occurs when the event transformation
becomes sufficiently complex [16,27], we show that it may occur even in the simplest case
of one degree-of-freedom (DOF) motion. Drawing inspiration from differential geometry
and electrostatics, we propose principled metrics to quantify event collapse and discourage
it by incorporating penalty terms in the event alignment objective function. Although event
collapse depends on many factors, our strategy aims at modifying the objective’s landscape
to improve the well-posedness of the problem and be able to use well-known, standard
optimization algorithms.
[Figure 1 panels: Loss Landscape | Original events at A | Collapsed IWE at B | Desired IWE at C]
Figure 1. Event Collapse. Left: Landscape of the image variance loss as a function of the warp
parameter $h_z$. Right: The IWEs at the different $h_z$ marked in the landscape. (A) Original events
(identity warp), accumulated over a small $\Delta t$ (polarity is not used). (B) Image of warped
events (IWE) showing event collapse due to maximization of the objective function. (C) Desired IWE
solution using our proposed regularizer: sharper than (A) while avoiding event collapse (B).
In summary, our contributions are:
1. A study of the event collapse phenomenon in regard to event warping and objective
functions (Sections 3.3 and 4).
2. Two principled metrics of event collapse (one based on flow divergence and one
based on area-element deformations) and their use as regularizers to mitigate the
above-mentioned phenomenon (Sections 3.4 to 3.6).
3. Experiments on publicly available datasets that demonstrate, in comparison with
other strategies, the effectiveness of the proposed regularizers (Section 4).
To the best of our knowledge, this is the first work that focuses on the paramount phenomenon
of event collapse, which may arise in state-of-the-art event-alignment methods.
Our experiments show that the proposed metrics mitigate event collapse while they do not
harm well-posed warps.
2. Related Work
2.1. Contrast Maximization
Our study is based on the CMax framework for event alignment (Figure 2, bottom
branch). The CMax framework is an iterative method with two main steps per iteration:
transforming events and computing an objective function from such events. Assuming
constant illumination, events are triggered by moving edges, and the goal is to find the
transformation/warping parameters $\boldsymbol{\theta}$ (e.g., motion and scene) that achieve motion
compensation (i.e., alignment of events triggered at different times and pixels), hence revealing
the edge structure that caused the events. Standard optimization algorithms (gradient
ascent, sampling, etc.) can be used to maximize the event-alignment objective. Upon
convergence, the method provides the best transformation parameters and the transformed
events, i.e., motion-compensated image of warped events (IWE).
The first step of the CMax framework transforms events according to a motion or
deformation model defined by the task at hand. For instance, camera rotational motion
estimation [5,29] often assumes constant angular velocity ($\boldsymbol{\theta} \equiv \boldsymbol{\omega}$)
during short time spans, hence events are transformed following 3-DOF motion curves defined on the
image plane by candidate values of $\boldsymbol{\omega}$. Feature tracking may assume constant image
velocity $\boldsymbol{\theta} \equiv \mathbf{v}$ (2-DOF) [7,30], hence events are transformed following
straight lines.
In the second step of CMax, several event-alignment objectives have been proposed
to measure the goodness of fit between the events and the model [10,13], establishing
connections between visual contrast, sharpness, and depth-from-focus. Finally, the choice
of iterative optimization algorithm also plays a big role in finding the desired
motion-compensation parameters. First-order methods, such as non-linear conjugate gradient
(CG), are a popular choice, trading off accuracy and speed [12,21,22]. Exhaustive search,
sampling, or branch-and-bound strategies may be affordable for low-dimensional (DOF)
search spaces [14,29]. As will be presented (Section 3), our proposal consists of modifying
the second step by means of a regularizer (Figure 2, top branch).
[Figure 2 block labels: Input events; Warp along point trajectories; Warped events $\mathcal{E}'$;
Measure event alignment; Calculate regularizer; Contrast score; Optimize parameters of point
trajectories; Parameters $\boldsymbol{\theta}$. Plot axes: X [pix], Y [pix], time [s].]
Figure 2. Proposed modification of the contrast maximization (CMax) framework in [12,13] to also
account for the degree of regularity (collapsing behavior) of the warp. Events are colored in red/blue
according to their polarity. Reprinted/adapted with permission from Ref. [13], 2019, Gallego et al.
2.2. Event Collapse
In which estimation problems does event collapse appear? At first look, it may appear
that event collapse occurs when the number of DOFs in the warp becomes large enough,
i.e., for complex motions. Event collapse has been reported in homographic motions
(8 DOFs) [27,31] and in dense optical flow estimation [16], where an artificial neural
network (ANN) predicts a flow field with $2N_p$ DOFs ($N_p$ pixels), whereas it does not occur
in feature flow (2 DOFs) or rotational motion flow (3 DOFs). However, a more careful
analysis reveals that this is not the entire story because event collapse may occur even in
the case of 1 DOF, as we show.
How did previous works tackle event collapse? Previous works have tackled the issue in
several ways, such as: (i) initializing the parameters sufficiently close to the desired solution
(in the basin of attraction of the local optimum) [12]; (ii) reformulating the problem, changing
the parameter space to reduce the number of DOFs and increase the well-posedness
of the problem [14,31]; (iii) providing additional data, such as depth [27], thus changing
the problem from motion estimation given only events to motion estimation given events
and additional sensor data; (iv) whitening the warped events before computing the
objective [27]; and (v) redesigning the objective function and possibly adding a strong classical
regularizer (e.g., Charbonnier loss) [10,16]. Many of the above mitigation strategies are
task-specific because it may not always be possible to consider additional data or reparametrize
the estimation problem. Our goal is to approach the issue without the need for additional
data or changing the parameter space, and to show how previous objective functions and
newly regularized ones handle event collapse.
3. Method
Let us present our approach to measure and mitigate event collapse. First, we review
how event cameras work (Section 3.1) and the CMax framework (Section 3.2), which
was informally introduced in Section 2.1. Then, Section 3.3 builds our intuition on event
collapse by analyzing a simple example. Section 3.4 presents our proposed metrics for
event collapse, based on 1-DOF and 2-DOF warps. Section 3.5 specifies them for higher
DOFs, and Section 3.6 presents the regularized objective function.
3.1. How Event Cameras Work
Event cameras, such as the Dynamic Vision Sensor (DVS) [2,3,32], are bio-inspired
sensors that capture pixel-wise intensity changes, called events, instead of intensity images.
An event $e_k \doteq (\mathbf{x}_k, t_k, p_k)$ is triggered as soon as the change in logarithmic
intensity $L$ at a pixel exceeds a contrast sensitivity $C > 0$,
$$L(\mathbf{x}_k, t_k) - L(\mathbf{x}_k, t_k - \Delta t_k) = p_k C, \tag{1}$$
where $\mathbf{x}_k \doteq (x_k, y_k)^\top$, $t_k$ (with µs resolution) and polarity
$p_k \in \{+1, -1\}$ are the spatio-temporal coordinates and sign of the intensity change,
respectively, and $t_k - \Delta t_k$ is the time of the previous event at the same pixel
$\mathbf{x}_k$. Hence, each pixel has its own sampling rate, which depends on the visual input.
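The event generation model (1) can be illustrated for a single pixel with a minimal sketch. The function name `generate_events`, the synthetic triangular signal, and the value of `C` are assumptions made only for this illustration; real sensors also exhibit noise and refractory effects that are not modeled here.

```python
import numpy as np

def generate_events(log_intensity, timestamps, C=0.25):
    """Emit (t, p) events for ONE pixel from a sampled log-intensity signal.

    Simplified illustration of Eq. (1): an event fires whenever the log
    intensity differs from its value at the previous event by at least C.
    """
    events = []
    L_ref = log_intensity[0]              # log intensity at the last event
    for L, t in zip(log_intensity[1:], timestamps[1:]):
        # A large change between two samples may trigger several events.
        while abs(L - L_ref) >= C:
            p = 1 if L > L_ref else -1    # polarity = sign of the change
            L_ref += p * C                # move the reference by one step
            events.append((t, p))
    return events

# Hypothetical signal: log intensity rises from 0 to 1, then falls back to 0.
t = np.arange(129) / 128.0
L = 1.0 - np.abs(2.0 * t - 1.0)
ev = generate_events(L, t, C=0.25)
polarities = [p for _, p in ev]           # four positive, then four negative
```

The number of events depends only on how much the signal changes, not on how long it takes, which is the data-dependent sampling rate mentioned above.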
3.2. Mathematical Description of the CMax Framework
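The CMax framework [12] transforms events in a set $\mathcal{E} = \{e_k\}_{k=1}^{N_e}$ geometrically
$$e_k \doteq (\mathbf{x}_k, t_k, p_k) \;\overset{\mathbf{W}}{\mapsto}\; e'_k \doteq (\mathbf{x}'_k, t_{\text{ref}}, p_k), \tag{2}$$
according to a motion model $\mathbf{W}$, producing a set of warped events
$\mathcal{E}' = \{e'_k\}_{k=1}^{N_e}$. The warp
$\mathbf{x}'_k = \mathbf{W}(\mathbf{x}_k, t_k; \boldsymbol{\theta})$ transports each event along the
point trajectory that passes through it (Figure 2, left), until $t_{\text{ref}}$ is reached. The point
trajectories are parametrized by $\boldsymbol{\theta}$, which contains the motion and/or scene
unknowns. Then, an objective function [10,13] measures the alignment of the warped events
$\mathcal{E}'$. Many objective functions are given in terms of the count of events along the point
trajectories, which is called the image of warped events (IWE):
$$I(\mathbf{x}; \boldsymbol{\theta}) \doteq \sum_{k=1}^{N_e} b_k \, \delta(\mathbf{x} - \mathbf{x}'_k(\boldsymbol{\theta})). \tag{3}$$
Each IWE pixel $\mathbf{x}$ sums the values of the warped events $\mathbf{x}'_k$ that fall within it:
$b_k = p_k$ if polarity is used or $b_k = 1$ if polarity is not used. The Dirac delta $\delta$ is in
practice replaced by a smooth approximation [33], such as a Gaussian,
$\delta(\mathbf{x} - \boldsymbol{\mu}) \approx \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}, \epsilon^2)$
with $\epsilon = 1$ pixel. A popular objective function $G(\boldsymbol{\theta})$ is the visual contrast
of the IWE (3), given by the variance
$$G(\boldsymbol{\theta}) \equiv \operatorname{Var}\big(I(\mathbf{x}; \boldsymbol{\theta})\big) \doteq \frac{1}{|\Omega|} \int_{\Omega} \big(I(\mathbf{x}; \boldsymbol{\theta}) - \mu_I\big)^2 \, d\mathbf{x}, \tag{4}$$
with mean $\mu_I \doteq \frac{1}{|\Omega|} \int_{\Omega} I(\mathbf{x}; \boldsymbol{\theta}) \, d\mathbf{x}$
and image domain $\Omega$. Hence, the alignment of the transformed events $\mathcal{E}'$ (i.e., the
candidate "corresponding events", triggered by the same scene edge) is measured by the strength of the
edges of the IWE. Finally, an optimization algorithm iterates the above steps until the best
parameters are found:
$$\boldsymbol{\theta}^{*} = \arg\max_{\boldsymbol{\theta}} G(\boldsymbol{\theta}). \tag{5}$$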
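The steps in Eqs. (2)–(5) can be sketched on synthetic data as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the constant image velocity warp mentioned in Section 2.1, replaces the Gaussian approximation of the Dirac delta by nearest-pixel voting, ignores polarity ($b_k = 1$), and maximizes the contrast by exhaustive search over one velocity component. All function names and data are hypothetical.

```python
import numpy as np

def warp_const_velocity(xy, t, v, t_ref=0.0):
    """2-DOF warp: move each event along a straight line to time t_ref."""
    return xy - (t - t_ref)[:, None] * v

def iwe(xy_warped, shape):
    """Image of warped events, Eq. (3), with b_k = 1 (polarity not used).

    Simplification: nearest-pixel voting (a 2D histogram) replaces the
    Gaussian approximation of the Dirac delta.
    """
    H, W = shape
    img, _, _ = np.histogram2d(xy_warped[:, 1], xy_warped[:, 0],
                               bins=(H, W), range=((0, H), (0, W)))
    return img

def contrast(xy, t, v, shape):
    """Objective G(theta), Eq. (4): variance of the IWE."""
    return iwe(warp_const_velocity(xy, t, v), shape).var()

# Synthetic data (hypothetical): 50 points moving with velocity v_true.
rng = np.random.default_rng(0)
shape = (100, 100)
v_true = np.array([20.0, 0.0])              # pixels per unit time
x0 = rng.uniform(10, 60, size=(50, 2))      # point positions at t = 0
t = rng.uniform(0.0, 1.0, size=(50, 40))    # 40 events per point
xy = (x0[:, None, :] + t[:, :, None] * v_true).reshape(-1, 2)
t = t.ravel()

# Eq. (5) by exhaustive search over the horizontal velocity only.
candidates = np.arange(-40.0, 41.0, 5.0)
scores = [contrast(xy, t, np.array([vx, 0.0]), shape) for vx in candidates]
vx_best = candidates[int(np.argmax(scores))]
```

For this warp the contrast peaks at the true velocity, because the events of each point stack into a single pixel there. Exhaustive search is used only because the search space is one-dimensional; a gradient-based optimizer would need a smooth IWE.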
3.3. Simplest Example of Event Collapse: 1 DOF
To analyze event collapse in the simplest case, let us consider an approximation to
a translational motion of the camera along its optical axis $Z$ (1-DOF warp). In theory,
translational motions also require the knowledge of the scene depth. Here, inspired by
the 4-DOF in-plane warp in [20] that approximates a 6-DOF camera motion, we consider a
simplified warp that does not require knowledge of the scene depth. In terms of data, let
us consider events from one of the driving sequences of the standard MVSEC dataset [34]
(Figure 1).
For further simplicity, let us normalize the timestamps of $\mathcal{E}$ to the unit interval,
$t \in [t_1, t_{N_e}] \mapsto \tilde{t} \in [0, 1]$, and assume a coordinate frame at the center of
the image plane. Then the warp $\mathbf{W}$ is given by
$$\mathbf{x}'_k = (1 - \tilde{t}_k h_z)\, \mathbf{x}_k, \tag{6}$$
where $\boldsymbol{\theta} \equiv h_z$. Hence, events are transformed along the radial direction from
the image center, which acts as a virtual focus of expansion (FOE) (the true FOE is given by the data).
Letting the scaling factor in (6) be $s_k \doteq 1 - \tilde{t}_k h_z$, we observe the following:
(i) $s_k$ cannot be negative since it would imply that at least one event has flipped the side on
which it lies with respect to the image center; (ii) if $s_k > 1$ the warped event moves away from
the image center ("expansion" or "zoom-in"); and (iii) if $s_k \in [0, 1)$ the warped event moves
closer to the image center ("contraction" or "zoom-out"). The equivalent conditions in terms of
$h_z$ are: (i) $h_z < 1$, (ii) $h_z < 0$ is an expansion, and (iii) $0 < h_z < 1$ is a contraction.
Intuitively, event collapse occurs if the contraction is large ($0 < s_k \ll 1$) (see Figures 1B and 3a).
This phenomenon is not specific to the image variance; other objective functions lead to the
same result. As we see, the objective function has a local maximum at the desired motion
parameters (Figure 1C). However, optimization over the entire parameter space converges to a
global optimum that corresponds to event collapse.
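The effect of the 1-DOF warp (6) on the image variance can be reproduced with a small numerical sketch. The data are an assumption chosen for illustration: events scattered uniformly in space and time, so there is no motion to compensate and the identity warp ($h_z = 0$) is the sensible answer. The IWE uses nearest-pixel voting rather than a Gaussian kernel, and the function names are hypothetical.

```python
import numpy as np

def warp_zoom(xy, t_norm, hz):
    """1-DOF warp of Eq. (6); xy is expressed relative to the image center."""
    return (1.0 - t_norm * hz)[:, None] * xy

def variance_of_iwe(xy, half_size=50):
    """Variance of a nearest-pixel IWE on a (2 * half_size)^2 pixel grid."""
    r = (-half_size, half_size)
    img, _, _ = np.histogram2d(xy[:, 1], xy[:, 0],
                               bins=2 * half_size, range=(r, r))
    return img.var()

# Hypothetical data: events uniform in space and in normalized time.
rng = np.random.default_rng(0)
xy = rng.uniform(-50, 50, size=(5000, 2))
t_norm = rng.uniform(0.0, 1.0, size=5000)

# Contrast for increasingly strong contractions (0 < hz < 1).
hz_values = [0.0, 0.5, 0.9, 0.99]
scores = [variance_of_iwe(warp_zoom(xy, t_norm, hz)) for hz in hz_values]
for hz, g in zip(hz_values, scores):
    print(f"hz = {hz:4.2f}  variance = {g:.3f}")
```

Even without any structure in the data, the variance is larger for a strong contraction than for the identity warp, because events pile up near the image center. A maximizer of the unregularized contrast is therefore drawn toward $h_z \to 1$.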
[Figure 3 panels (a)–(c): point trajectories plotted on x, y, t axes]
Figure 3. Point trajectories (streamlines) defined on $xyt$ image space by various warps.
(a) Zoom in/out warp from image center (1 DOF). (b) Constant image velocity warp (2 DOF).
(c) Rotational warp around $X$ axis (3 DOF).
Discussion
The above example shows that event collapse is enabled (or disabled) by the type
of warp. If the warp does not enable event collapse (contraction or accumulation of flow
vectors cannot happen due to the geometric properties of the warp), as in the case of feature
flow (2 DOF) [7,30] (Figure 3b) or rotational motion flow (3 DOF) [5,29] (Figure 3c), then
the optimization problem is well posed and multiple objective functions can be designed to
achieve event alignment [10,13]. However, the disadvantage is that the type of warps that
satisfy this condition may not be rich enough to describe complex scene motions.
On the other hand, if the warp allows for event collapse, more complex scenarios can
be described by such a broader class of motion hypotheses, but the optimization framework
designed for non-event-collapsing scenarios (where the local maximum is assumed to be
the global maximum) may not hold anymore. Optimizing the objective function may lead
to an undesired solution with a larger value than the desired one. This depends on multiple
elements: the landscape of the objective function (which depends on the data, the warp
parametrization, and the shape of the objective function), and the initialization and search
strategy of the optimization algorithm used to explore such a landscape. The challenge
in this situation is to overcome the issue of multiple local maxima and make the problem
better posed. Our approach consists of characterizing event collapse via novel metrics
and including them in the objective function as weak constraints (penalties) to yield a
better landscape.
3.4. Proposed Regularizers
3.4.1. Divergence of the Event Transformation Flow
Inspired by physics, we may think of the flow vectors given by the event transformation
$\mathcal{E} \mapsto \mathcal{E}'$ as an electrostatic field, whose sources and sinks correspond to
the location of electric charges (Figure 4). Sources and sinks are mathematically described by the
divergence operator $\nabla \cdot$. Therefore, the divergence of the flow field is a natural choice
to characterize event collapse.
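As a numerical illustration of this idea, the divergence of a sampled flow field can be computed with finite differences. The sketch below assumes that the flow of the zoom warp (6) is taken as its derivative with respect to normalized time, $\mathbf{f}(\mathbf{x}) = -h_z \mathbf{x}$; this definition is an assumption made here for illustration and is not stated in the text above. Under it, the divergence is $-2h_z$ everywhere, i.e., negative (a sink) for a contraction.

```python
import numpy as np

def divergence(fx, fy, dx=1.0, dy=1.0):
    """Finite-difference divergence of a 2D vector field sampled on a grid.

    fx and fy are arrays indexed as [row (y), column (x)].
    """
    dfx_dx = np.gradient(fx, dx, axis=1)   # derivative of fx along x
    dfy_dy = np.gradient(fy, dy, axis=0)   # derivative of fy along y
    return dfx_dx + dfy_dy

# Assumed flow for the zoom warp of Eq. (6): f(x) = -hz * x.
hz = 0.5
ys, xs = np.mgrid[-20:21, -20:21].astype(float)  # coordinates from the center
div = divergence(-hz * xs, -hz * ys)
```

Because this flow is linear in $\mathbf{x}$, the finite differences are exact and `div` equals $-2h_z$ at every pixel.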