Event Collapse in Contrast Maximization Frameworks [original]

Citation: Shiba, S.; Aoki, Y.; Gallego, G. Event Collapse in Contrast Maximization Frameworks. Sensors 2022, 22, 5190. https://doi.org/10.3390/s22145190

Academic Editor: Jing Tian

Received: 9 June 2022

Accepted: 7 July 2022

Published: 11 July 2022

Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Article

Event Collapse in Contrast Maximization Frameworks

Shintaro Shiba 1,2,*, Yoshimitsu Aoki 1 and Guillermo Gallego 2,3

1 Department of Electronics and Electrical Engineering, Faculty of Science and Technology, Keio University, 3-14-1, Kohoku-ku, Yokohama 223-8522, Kanagawa, Japan; [email protected]

2 Department of Electrical Engineering and Computer Science, Technische Universität Berlin, 10587 Berlin, Germany; [email protected]

3 Einstein Center Digital Future and Science of Intelligence Excellence Cluster, 10117 Berlin, Germany

* Correspondence: [email protected]

Abstract:

Contrast maximization (CMax) is a framework that provides state-of-the-art results on

several event-based computer vision tasks, such as ego-motion or optical flow estimation. However,

it may suffer from a problem called event collapse, which is an undesired solution where events are

warped into too few pixels. As prior works have largely ignored the issue or proposed workarounds,

it is imperative to analyze this phenomenon in detail. Our work demonstrates event collapse in its

simplest form and proposes collapse metrics by using first principles of space–time deformation

based on differential geometry and physics. We experimentally show on publicly available datasets

that the proposed metrics mitigate event collapse and do not harm well-posed warps. To the best of

our knowledge, regularizers based on the proposed metrics are the only effective solution against

event collapse in the experimental settings considered, compared with other methods. We hope that

this work inspires further research to tackle more complex warp models.

Keywords:

computer vision; intelligent sensors; robotics; event-based camera; contrast maximization;

optical flow; motion estimation

1. Introduction

Event cameras [ – ] offer potential advantages over standard cameras to tackle difficult scenarios (high speed, high dynamic range, low power). However, new algorithms are needed to deal with the unconventional type of data they produce (per-pixel asynchronous brightness changes, called events) and unlock their advantages [ ]. Contrast maximization (CMax) is an event processing framework that provides state-of-the-art results on several tasks, such as rotational motion estimation [ ], feature flow estimation and tracking [ – ], ego-motion estimation [ – ], 3D reconstruction [ ], optical flow estimation [ – ], motion segmentation [20–24], guided filtering [25], and image reconstruction [26].

The main idea of CMax and similar event alignment frameworks [

] is to find

the motion and/or scene parameters that align corresponding events (i.e., events that are

triggered by the same scene edge), thus achieving motion compensation. The framework

simultaneously estimates the motion parameters and the correspondences between events

(data association). However, in some cases CMax optimization converges to an undesired

solution where events accumulate into too few pixels, a phenomenon called event collapse

(Figure 1). Because CMax is at the heart of many state-of-the-art event-based motion

estimation methods, it is important to understand the above limitation and propose ways to

overcome it. Prior works have largely ignored the issue or proposed workarounds without

analyzing the phenomenon in detail. A more thorough discussion of the phenomenon is

overdue, which is the goal of this work.

Contrary to the expectation that event collapse occurs when the event transformation

becomes sufficiently complex [

], we show that it may occur even in the simplest case

of one degree-of-freedom (DOF) motion. Drawing inspiration from differential geometry

and electrostatics, we propose principled metrics to quantify event collapse and discourage

it by incorporating penalty terms in the event alignment objective function. Although event

collapse depends on many factors, our strategy aims at modifying the objective’s landscape

to improve the well-posedness of the problem and be able to use well-known, standard

optimization algorithms.

[Figure 1 panels: Loss Landscape; Original events at A; Collapsed IWE at B; Desired IWE at C.]

Figure 1. Event Collapse. Left: Landscape of the image variance loss as a function of the warp parameter. Right: The IWEs at the different parameter values marked in the landscape. (A) Original events (identity warp), accumulated over a small ∆t (polarity is not used). (B) Image of warped events (IWE) showing event collapse due to maximization of the objective function. (C) Desired IWE solution using our proposed regularizer: sharper than (A) while avoiding event collapse (C).

In summary, our contributions are:

1. A study of the event collapse phenomenon in regard to event warping and objective functions (Sections 3.3 and 4).

2. Two principled metrics of event collapse (one based on flow divergence and one based on area-element deformations) and their use as regularizers to mitigate the above-mentioned phenomenon (Sections 3.4 to 3.6).

3. Experiments on publicly available datasets that demonstrate, in comparison with other strategies, the effectiveness of the proposed regularizers (Section 4).

To the best of our knowledge, this is the first work that focuses on the paramount phenomenon of event collapse, which may arise in state-of-the-art event-alignment methods.

Our experiments show that the proposed metrics mitigate event collapse while they do not

harm well-posed warps.

2. Related Work

2.1. Contrast Maximization

Our study is based on the CMax framework for event alignment (Figure 2, bottom branch). The CMax framework is an iterative method with two main steps per iteration: transforming events and computing an objective function from such events. Assuming constant illumination, events are triggered by moving edges, and the goal is to find the transformation/warping parameters θ (e.g., motion and scene) that achieve motion compensation (i.e., alignment of events triggered at different times and pixels), hence revealing the edge structure that caused the events. Standard optimization algorithms (gradient ascent, sampling, etc.) can be used to maximize the event-alignment objective. Upon convergence, the method provides the best transformation parameters and the transformed events, i.e., motion-compensated image of warped events (IWE).

The first step of the CMax framework transforms events according to a motion or deformation model defined by the task at hand. For instance, camera rotational motion estimation [ ] often assumes constant angular velocity (θ≡ω) during short time spans, hence events are transformed following 3-DOF motion curves defined on the image plane by candidate values of ω. Feature tracking may assume constant image velocity θ≡v (2-DOF) [7,30], hence events are transformed following straight lines.

In the second step of CMax, several event-alignment objectives have been proposed

to measure the goodness of fit between the events and the model [

], establishing

connections between visual contrast, sharpness, and depth-from-focus. Finally, the choice

of iterative optimization algorithm also plays a big role in finding the desired motion-

compensation parameters. First-order methods, such as non-linear conjugate gradient

(CG), are a popular choice, trading off accuracy and speed [

]. Exhaustive search,

sampling, or branch-and-bound strategies may be affordable for low-dimensional (DOF)

search spaces [

]. As will be presented (Section 3), our proposal consists of modifying

the second step by means of a regularizer (Figure 2, top branch).

[Figure 2 diagram. Blocks: Input events; Warp along point trajectories; Warped events ℰ′; Measure event alignment; Calculate regularizer; Contrast score; Optimize parameters of point trajectories; Parameters. Plot axes: X [pix], Y [pix], time [s].]

Figure 2. Proposed modification of the contrast maximization (CMax) framework in [ ] to also account for the degree of regularity (collapsing behavior) of the warp. Events are colored in red/blue according to their polarity. Reprinted/adapted with permission from Ref. [13], 2019, Gallego et al.

2.2. Event Collapse

In which estimation problems does event collapse appear? At first look, it may appear that event collapse occurs when the number of DOFs in the warp becomes large enough, i.e., for complex motions. Event collapse has been reported in homographic motions (8 DOFs) [ ] and in dense optical flow estimation [ ], where an artificial neural network (ANN) predicts a flow field with two DOFs per pixel, whereas it does not occur in feature flow (2 DOFs) or rotational motion flow (3 DOFs). However, a more careful analysis reveals that this is not the entire story because event collapse may occur even in the case of 1 DOF, as we show.

How did previous works tackle event collapse? Previous works have tackled the issue in several ways, such as: (i) initializing the parameters sufficiently close to the desired solution (in the basin of attraction of the local optimum) [ ]; (ii) reformulating the problem, changing the parameter space to reduce the number of DOFs and increase the well-posedness of the problem [ ]; (iii) providing additional data, such as depth [ ], thus changing the problem from motion estimation given only events to motion estimation given events and additional sensor data; (iv) whitening the warped events before computing the objective [27]; and (v) redesigning the objective function and possibly adding a strong classical regularizer (e.g., Charbonnier loss) [ ]. Many of the above mitigation strategies are task-specific because it may not always be possible to consider additional data or reparametrize the estimation problem. Our goal is to approach the issue without the need for additional data or changing the parameter space, and to show how previous objective functions and newly regularized ones handle event collapse.

3. Method

Let us present our approach to measure and mitigate event collapse. First, we review

how event cameras work (Section 3.1) and the CMax framework (Section 3.2), which

was informally introduced in Section 2.1. Then, Section 3.3 builds our intuition on event

collapse by analyzing a simple example. Section 3.4 presents our proposed metrics for

event collapse, based on 1-DOF and 2-DOF warps. Section 3.5 specifies them for higher

DOFs, and Section 3.6 presents the regularized objective function.

3.1. How Event Cameras Work

Event cameras, such as the Dynamic Vision Sensor (DVS) [

], are bio-inspired

sensors that capture pixel-wise intensity changes, called events, instead of intensity images.

An event $e_k \doteq (\mathbf{x}_k, t_k, p_k)$ is triggered as soon as the change of the logarithmic intensity $L$ at a pixel reaches a contrast sensitivity $C > 0$,

$$L(\mathbf{x}_k, t_k) - L(\mathbf{x}_k, t_k - \Delta t_k) = p_k C, \tag{1}$$

where $\mathbf{x}_k \doteq (x_k, y_k)^\top$, $t_k$ (with µs resolution) and polarity $p_k \in \{+1, -1\}$ are the spatio-temporal coordinates and sign of the intensity change, respectively, and $t_k - \Delta t_k$ is the time of the previous event at the same pixel $\mathbf{x}_k$. Hence, each pixel has its own sampling rate, which depends on the visual input.
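The triggering rule in Equation (1) can be illustrated for a single pixel with a short sketch. This is only a toy model under stated assumptions: the function name `generate_events`, the sampled log-intensity values, and the choice to reset the reference level to the current sample after each event are all hypothetical and not taken from the paper or from any sensor specification.

```python
import numpy as np

def generate_events(log_intensity, timestamps, contrast_threshold):
    """Return (t_k, p_k) pairs for one pixel, following the rule in Equation (1).

    Assumption: the signal is sampled, and after an event the reference
    level is reset to the current sample (a simplification).
    """
    events = []
    reference = log_intensity[0]  # log intensity at the previous event
    for level, t in zip(log_intensity[1:], timestamps[1:]):
        change = level - reference
        if abs(change) >= contrast_threshold:
            polarity = 1 if change > 0 else -1  # sign of the intensity change
            events.append((float(t), polarity))
            reference = level
    return events

# Hypothetical log-intensity samples at one pixel (timestamps in arbitrary units).
log_intensity = np.array([0.0, 0.1, 0.3, 0.35, 0.1, -0.2])
timestamps = np.arange(len(log_intensity))
events = generate_events(log_intensity, timestamps, contrast_threshold=0.25)
print(events)
```

With these samples the pixel fires one positive event (when the level first rises by at least 0.25) and one negative event (when it later drops by at least 0.25 from the new reference), which shows why the event rate of each pixel depends on the visual input.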

3.2. Mathematical Description of the CMax Framework

The CMax framework [12] transforms events in a set $\mathcal{E} = \{e_k\}_{k=1}^{N_e}$ geometrically

$$e_k \doteq (\mathbf{x}_k, t_k, p_k) \;\overset{\mathbf{W}}{\mapsto}\; e'_k \doteq (\mathbf{x}'_k, t_{\text{ref}}, p_k), \tag{2}$$

according to a motion model $\mathbf{W}$, producing a set of warped events $\mathcal{E}' = \{e'_k\}_{k=1}^{N_e}$. The warp $\mathbf{x}'_k = \mathbf{W}(\mathbf{x}_k, t_k; \theta)$ transports each event along the point trajectory that passes through it (Figure 2, left), until $t_{\text{ref}}$ is reached. The point trajectories are parametrized by $\theta$, which contains the motion and/or scene unknowns. Then, an objective function [ ] measures the alignment of the warped events $\mathcal{E}'$. Many objective functions are given in terms of the count of events along the point trajectories, which is called the image of warped events (IWE):

$$I(\mathbf{x}; \theta) \doteq \sum_{k=1}^{N_e} b_k \, \delta(\mathbf{x} - \mathbf{x}'_k(\theta)). \tag{3}$$

Each IWE pixel $\mathbf{x}$ sums the values of the warped events $\mathbf{x}'_k$ that fall within it: $b_k = p_k$ if polarity is used or $b_k = 1$ if polarity is not used. The Dirac delta $\delta$ is in practice replaced by a smooth approximation [ ], such as a Gaussian, $\delta(\mathbf{x} - \boldsymbol{\mu}) \approx \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}, \epsilon^2)$ with $\epsilon = 1$ pixel. A popular objective function $G(\theta)$ is the visual contrast of the IWE (3), given by the variance

$$G(\theta) \equiv \operatorname{Var}\bigl(I(\mathbf{x}; \theta)\bigr) \doteq \frac{1}{|\Omega|} \int_{\Omega} \bigl(I(\mathbf{x}; \theta) - \mu_I\bigr)^2 \, d\mathbf{x}, \tag{4}$$

with mean $\mu_I \doteq \frac{1}{|\Omega|} \int_{\Omega} I(\mathbf{x}; \theta) \, d\mathbf{x}$ and image domain $\Omega$. Hence, the alignment of the transformed events $\mathcal{E}'$ (i.e., the candidate "corresponding events", triggered by the same scene edge) is measured by the strength of the edges of the IWE. Finally, an optimization algorithm iterates the above steps until the best parameters are found:

$$\theta^{*} = \arg\max_{\theta} G(\theta). \tag{5}$$
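The steps in Equations (2) to (5) can be sketched end to end on synthetic data. The sketch below makes several assumptions that are not from the paper: it uses the constant image velocity warp mentioned in Section 2.1, accumulates events by nearest-pixel voting instead of the Gaussian approximation of the Dirac delta, ignores polarity ($b_k = 1$), and replaces the optimizer with a coarse grid search. All function names and the synthetic moving edge are hypothetical.

```python
import numpy as np

def warp_constant_velocity(xy, t, velocity, t_ref=0.0):
    """Equation (2) for a 2-DOF warp: x' = x - (t - t_ref) * v (straight lines)."""
    return xy - (t - t_ref)[:, None] * np.asarray(velocity, dtype=float)

def image_of_warped_events(xy_warped, height, width, b=None):
    """Equation (3) with nearest-pixel voting (a simplification of the smooth delta)."""
    cols = np.rint(xy_warped[:, 0]).astype(int)
    rows = np.rint(xy_warped[:, 1]).astype(int)
    inside = (cols >= 0) & (cols < width) & (rows >= 0) & (rows < height)
    weights = np.ones(len(cols)) if b is None else np.asarray(b, dtype=float)
    iwe = np.zeros((height, width))
    np.add.at(iwe, (rows[inside], cols[inside]), weights[inside])
    return iwe

def contrast(iwe):
    """Equation (4): variance of the IWE over the image domain."""
    return float(np.var(iwe))

def objective(velocity, xy, t, height, width):
    warped = warp_constant_velocity(xy, t, velocity)
    return contrast(image_of_warped_events(warped, height, width))

# Synthetic events from a vertical edge that starts at x = 10 and moves at 20 pix/s.
rng = np.random.default_rng(0)
num_events, height, width = 2000, 64, 64
t = rng.uniform(0.0, 1.0, num_events)
true_velocity = np.array([20.0, 0.0])
xy = np.stack([10.0 + true_velocity[0] * t,
               rng.uniform(0.0, height - 1.0, num_events)], axis=1)

# Equation (5) by exhaustive search over the horizontal velocity only.
candidates = np.linspace(0.0, 40.0, 41)
scores = [objective((vx, 0.0), xy, t, height, width) for vx in candidates]
best_vx = float(candidates[int(np.argmax(scores))])
print(best_vx)
```

For this well-posed warp the contrast peaks at the velocity that generated the data, because only that candidate maps all events of the edge back onto a single image column. Exhaustive search is used here only because the search space has one dimension.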

3.3. Simplest Example of Event Collapse: 1 DOF

To analyze event collapse in the simplest case, let us consider an approximation to

a translational motion of the camera along its optical axis

(1-DOF warp). In theory,

translational motions also require the knowledge of the scene depth. Here, inspired by

the 4-DOF in-plane warp in [

] that approximates a 6-DOF camera motion, we consider a

simplified warp that does not require knowledge of the scene depth. In terms of data, let

us consider events from one of the driving sequences of the standard MVSEC dataset [

]

(Figure 1).

For further simplicity, let us normalize the timestamps of $\mathcal{E}$ to the unit interval, $t \in [t_1, t_{N_e}] \mapsto \tilde{t} \in [0, 1]$, and assume a coordinate frame at the center of the image plane. Then the warp $\mathbf{W}$ is given by

$$\mathbf{x}'_k = (1 - \tilde{t}_k h_z) \, \mathbf{x}_k, \tag{6}$$

where $\theta \equiv h_z$. Hence, events are transformed along the radial direction from the image center, acting as a virtual focus of expansion (FOE) (cf. the true FOE is given by the data).

Letting the scaling factor in (6) be $s_k \doteq 1 - \tilde{t}_k h_z$, we observe the following: (i) $s_k$ cannot be negative since it would imply that at least one event has flipped the side on which it lies with respect to the image center; (ii) if $s_k > 1$ the warped event gets away from the image center ("expansion" or "zoom-in"); and (iii) if $s_k \in [0, 1)$ the warped event gets closer to the image center ("contraction" or "zoom-out"). The equivalent conditions in terms of $h_z$ are: (i) $h_z < 1$, (ii) $h_z < 0$ is an expansion, and (iii) $0 < h_z < 1$ is a contraction.
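As a brief check of how the conditions on the scaling factor translate into conditions on $h_z$, the following worked step may help. It is a sketch that only assumes the normalized timestamps $\tilde{t}_k \in [0, 1]$ introduced above and considers events with $\tilde{t}_k > 0$:

$$s_k = 1 - \tilde{t}_k h_z, \qquad s_k > 1 \iff \tilde{t}_k h_z < 0 \iff h_z < 0, \qquad s_k < 1 \iff \tilde{t}_k h_z > 0 \iff h_z > 0.$$

The most demanding case for non-negativity is the latest event, $\tilde{t}_k = 1$, for which $s_k = 1 - h_z$; keeping this value positive gives $h_z < 1$, and combining it with $h_z > 0$ yields the contraction range $0 < h_z < 1$.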

Intuitively, event collapse occurs if the contraction is large ($0 < s_k \ll 1$) (see Figures 1C and 3a). This phenomenon is not specific to the image variance; other objective functions lead to the same result. As we see, the objective function has a local maximum at the desired motion parameters (Figure 1B). However, the optimization over the entire parameter space converges to a global optimum that corresponds to event collapse.
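The effect can be reproduced numerically with the warp of Equation (6). The sketch below is an illustration under assumptions that are not from the paper: the events are synthetic, placed at uniformly random pixels with random normalized timestamps (so there is no true motion to recover), the IWE uses nearest-pixel voting without polarity, and the function names are hypothetical. Any gain in contrast therefore comes from the contraction of the warp alone.

```python
import numpy as np

def zoom_warp(xy_centered, t_norm, hz):
    """Equation (6): x' = (1 - t~ * hz) * x, with x measured from the image center."""
    scale = 1.0 - t_norm * hz  # scaling factor s_k of each event
    return scale[:, None] * xy_centered

def iwe_variance(xy_centered, height, width):
    """Variance of the IWE built by nearest-pixel voting (polarity not used)."""
    cols = np.floor(xy_centered[:, 0] + width / 2).astype(int)
    rows = np.floor(xy_centered[:, 1] + height / 2).astype(int)
    inside = (cols >= 0) & (cols < width) & (rows >= 0) & (rows < height)
    iwe = np.zeros((height, width))
    np.add.at(iwe, (rows[inside], cols[inside]), 1.0)
    return float(iwe.var())

# Synthetic events with no underlying motion (an assumption for illustration).
rng = np.random.default_rng(0)
num_events, height, width = 5000, 64, 64
xy = rng.uniform(-32.0, 32.0, size=(num_events, 2))
t_norm = rng.uniform(0.0, 1.0, num_events)

variances = {hz: iwe_variance(zoom_warp(xy, t_norm, hz), height, width)
             for hz in (0.0, 0.3, 0.6, 0.9)}
for hz, value in variances.items():
    print(f"hz = {hz:.1f}  variance = {value:.2f}")
```

The variance at a strong contraction ($h_z = 0.9$) exceeds the variance of the identity warp ($h_z = 0$), even though the data contain no motion: the warp simply piles events up near the image center. An unregularized maximizer of the contrast is thus rewarded for collapsing the events.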

Figure 3. Point trajectories (streamlines) defined on x−y−t image space by various warps. (a) Zoom in/out warp from image center (1 DOF). (b) Constant image velocity warp (2 DOF). (c) Rotational warp around X axis (3 DOF).

Discussion

The above example shows that event collapse is enabled (or disabled) by the type

of warp. If the warp does not enable event collapse (contraction or accumulation of flow

vectors cannot happen due to the geometric properties of the warp), as in the case of feature

flow (2 DOF) [

] (Figure 3b) or rotational motion flow (3 DOF) [

] (Figure 3c), then

the optimization problem is well posed and multiple objective functions can be designed to

achieve event alignment [

]. However, the disadvantage is that the type of warps that

satisfy this condition may not be rich enough to describe complex scene motions.

On the other hand, if the warp allows for event collapse, more complex scenarios can

be described by such a broader class of motion hypotheses, but the optimization framework

designed for non-event-collapsing scenarios (where the local maximum is assumed to be

the global maximum) may not hold anymore. Optimizing the objective function may lead

to an undesired solution with a larger value than the desired one. This depends on multiple

elements: the landscape of the objective function (which depends on the data, the warp

parametrization, and the shape of the objective function), and the initialization and search

strategy of the optimization algorithm used to explore such a landscape. The challenge

in this situation is to overcome the issue of multiple local maxima and make the problem

better posed. Our approach consists of characterizing event collapse via novel metrics

and including them in the objective function as weak constraints (penalties) to yield a

better landscape.

3.4. Proposed Regularizers

3.4.1. Divergence of the Event Transformation Flow

Inspired by physics, we may think of the flow vectors given by the event transformation $\mathcal{E} \mapsto \mathcal{E}'$ as an electrostatic field, whose sources and sinks correspond to the location of electric charges (Figure 4). Sources and sinks are mathematically described by the divergence operator $\nabla\cdot$. Therefore, the divergence of the flow field is a natural choice to characterize event collapse.
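To make the intuition about sources and sinks concrete, the divergence of a sampled 2D flow field can be approximated with finite differences. This is a generic numerical sketch, not the collapse metric defined by the authors: the helper `divergence` and the two example fields (a contraction toward the origin and a uniform translation) are hypothetical choices for illustration.

```python
import numpy as np

def divergence(flow_x, flow_y, spacing=1.0):
    """Finite-difference divergence of a 2D flow sampled on a regular pixel grid."""
    dfx_dx = np.gradient(flow_x, spacing, axis=1)  # derivative along x (columns)
    dfy_dy = np.gradient(flow_y, spacing, axis=0)  # derivative along y (rows)
    return dfx_dx + dfy_dy

# Pixel grid centered at the origin.
ys, xs = np.mgrid[-3:4, -3:4].astype(float)

# Flow pointing toward the origin (a sink) versus a uniform translation.
contraction = divergence(-0.5 * xs, -0.5 * ys)
translation = divergence(np.full_like(xs, 2.0), np.zeros_like(ys))
print(contraction[3, 3], translation[3, 3])
```

The contracting field has negative divergence everywhere, which signals a sink where vectors accumulate, whereas the uniform translation has zero divergence. This matches the observation that constant image velocity warps cannot concentrate events, while zoom-out warps can.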
