This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS 1
Teaching Vehicles to Anticipate: A Systematic
Study on Probabilistic Behavior Prediction
Using Large Data Sets
Florian Wirthmüller, Julian Schlechtriemen, Jochen Hipp, and Manfred Reichert
Abstract: By observing their environment as well as other traffic participants, humans are enabled to drive road vehicles safely. Vehicle passengers, however, perceive a notable difference between inexperienced and experienced drivers. In particular, they may get the impression that the latter anticipate what will happen in the next few moments and consider these foresights in their driving behavior. To make the driving style of automated vehicles comparable to that of human drivers with respect to comfort and perceived safety, these anticipation skills need to become a built-in feature of self-driving vehicles. This article provides a systematic comparison of methods and strategies to generate such anticipation for self-driving cars using machine learning techniques. To implement and test these algorithms, we use a large data set collected over more than 30 000 km of highway driving and containing approximately 40 000 real-world driving situations. We further show that it is possible to classify driving maneuvers upcoming within the next 5 s with an Area Under the ROC Curve (AUC) above 0.92 for all defined maneuver classes. This enables us to predict the lateral position with a prediction horizon of 5 s with a median lateral error of less than 0.21 m.
Index Terms: Automated driving, advanced driver assistance systems, maneuver classification, trajectory prediction, vehicle position prediction, Gaussian mixture regression, mixture of experts.
I. INTRODUCTION
AUTOMATED driving has the potential to radically
change our mobility habits as well as the way goods are
transported. To enable driving automation, several processing steps have to be executed, as illustrated in Fig. 1. In the first step, the current traffic scene has to be sensed and a proper representation of the environment needs to be generated. Using this information, the given traffic situation needs to be interpreted and the behavior of others has to be anticipated. Subsequently, a plan, i.e., a trajectory, is derived based on this knowledge. Finally, this plan is executed. How long the trajectory stays viable before it has to be re-planned is strongly influenced by the capability of the prediction component.
Fig. 1. Long-term driving behavior predictions in the context of trajectory planning for automated driving (equal symbols denote simultaneity).
Manuscript received October 2, 2019; revised April 19, 2020; accepted May 21, 2020. The Associate Editor for this article was M. Mesbah. (Corresponding author: Florian Wirthmüller.)
Florian Wirthmüller is with Mercedes-Benz AG Research and Development, 71034 Böblingen, Germany, and also with the Institute of Databases and Information Systems (DBIS), Ulm University, 89081 Ulm, Germany (e-mail: florian.wirthmueller@daimler.com).
Julian Schlechtriemen is with Mercedes-Benz AG Research and Development, 71034 Böblingen, Germany, and also with the Institute of Realtime Learning Systems, University of Siegen, 57076 Siegen, Germany.
Jochen Hipp is with Mercedes-Benz AG Research and Development, 71034 Böblingen, Germany.
Manfred Reichert is with the Institute of Databases and Information Systems (DBIS), Ulm University, 89081 Ulm, Germany.
This article has supplementary downloadable material available at http://ieeexplore.ieee.org, provided by the authors.
Digital Object Identifier 10.1109/TITS.2020.3002070
As opposed to other research dealing with techniques that interconnect vehicles through so-called car-to-car communication, we aim to solve this anticipation task locally. On the one hand, it is not foreseeable when an adequate market penetration of vehicles with such techniques will be reached. On the other hand, a local prediction component will always be necessary, as several traffic participants, such as bicyclists, lack communication abilities. In addition, local predictions might become necessary to bypass transmission times in certain cases, as emphasized by [1]. Moreover, it is reasonable to approach the topic from the perspective of highway driving, as this use case is easier to realize than others due to its clear constraints (e.g., structured setting, absence of pedestrians). For the prediction task, however, this implies the challenge of creating precise long-term predictions (2 s to 5 s) rather than short-term forecasts (up to 2 s), as higher velocities can be expected in highway scenarios than in urban or rural areas.
A. Problem Statement
We tackle the challenge of anticipating the behavior of other
traffic participants in highway scenarios. In particular, we aim
to generate information that can be processed by trajectory
planning algorithms to implement an anticipatory driving style.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In this context, our objective is to model the future vehicle positions at a prediction time $t$ in longitudinal ($x_t$) and lateral ($y_t$) direction as spatial distributions $x_t \sim p_x$ and $y_t \sim p_y$, rather than estimating single-shot predictions $\hat{x}_t$ and $\hat{y}_t$, respectively. Such distributions are more useful for downstream criticality assessments, as they can represent several alternative hypotheses simultaneously, each with its particular frequency. Despite the focus on highway driving, the presented methods are intended to be general enough to be applicable in other environments as well.
B. Problem Resolution Strategy
This article presents a systematic workflow for the design and evaluation of a lightweight maneuver-based model [2], which uses standard sensor inputs to perform long-term driving behavior predictions. Methodically, we build on [3] and use a two-step Mixture of Experts (MOE) approach. This approach comprises a maneuver classification and a downstream behavior prediction. The maneuver probabilities $\{P_m\}_{m \in M}$ determined by the classifier serve as gating nodes in the Mixture of Experts. Specifically, the probabilities control the weights $w_m$ of the respective expert distributions $p_{y,m}$ when calculating the overall distribution $p_y$ of future vehicle positions. Eq. 1 summarizes this procedure for the lateral direction (the longitudinal direction $x$ is handled equivalently):
ytpy(y,I,t)
=
mM
py,my,m|I,t)·wm(I)(1)
The set of maneuvers $M$ is defined as follows:
$$M = \{\mathrm{LCL}, \mathrm{FLW}, \mathrm{LCR}\} \tag{2}$$
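As an illustration of Eq. 1, the following minimal Python sketch evaluates the Mixture of Experts density for the lateral direction. It assumes that each expert has already been conditioned on $I$ and $t$, so that it reduces to a one-dimensional Gaussian mixture, and it uses the classifier probabilities directly as weights, $w_m(I) = P_m$, which is only one conceivable weighting. All expert parameters and probabilities are hypothetical values chosen for illustration.

```python
import math

def normal_pdf(y: float, mu: float, sigma: float) -> float:
    """Density of the univariate normal distribution N(mu, sigma^2) at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def gmm_pdf(y: float, components: list) -> float:
    """1-D Gaussian mixture; components is a list of (phi, mu, sigma)."""
    return sum(phi * normal_pdf(y, mu, sigma) for phi, mu, sigma in components)

def moe_pdf(y: float, experts: dict, weights: dict) -> float:
    """Eq. 1: sum over maneuvers m of p_{y,m}(y) * w_m."""
    return sum(weights[m] * gmm_pdf(y, experts[m]) for m in experts)

# Hypothetical experts p_{y,m}, already conditioned on one (I, t):
# lateral offset in metres relative to the current lane centre.
EXPERTS = {
    "LCL": [(0.7, 3.5, 0.4), (0.3, 3.0, 0.6)],
    "FLW": [(1.0, 0.0, 0.2)],
    "LCR": [(0.6, -3.5, 0.4), (0.4, -3.0, 0.6)],
}
# Hypothetical classifier output {P_m}, used directly as w_m(I).
P = {"LCL": 0.15, "FLW": 0.80, "LCR": 0.05}

if __name__ == "__main__":
    for y in (-3.5, 0.0, 3.5):
        print(f"p_y({y:+.1f} m) = {moe_pdf(y, EXPERTS, P):.4f}")
```

Because the weights sum to one and each expert is a normalized mixture, the combined density integrates to one; other weighting schemes only change the dictionary passed as `weights`.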
Different weighting approaches based on the maneuver probabilities are presented in Sec. VII. The expert distributions $p_{y,m}$ are modeled as Gaussian Mixture Models (GMMs) with $K$ components in the combined input and output space according to Eq. 3, and are used for Gaussian Mixture Regression. Hence, they are conditioned on the input features $I$ and the prediction time $t$ (cf. Eq. 1).
$$p_{y,m}(y \mid \theta_{y,m}) = \sum_{i=1}^{K} \phi_{y,m,i} \cdot \mathcal{N}(\mu_{y,m,i}, \Sigma_{y,m,i}) \tag{3}$$
The parameters of the GMMs are subsumed in $\Theta_y$:
$$\Theta_y = \{\theta_{y,m}\}_{m \in M} = \{\phi_{y,m}, \mu_{y,m}, \Sigma_{y,m}\}_{m \in M} \tag{4}$$
In addition, we introduce an alternative to the Mixture of Experts approach, which integrates the outputs of the gating nodes into one single model. This simplifies Eq. 1 as follows:
$$y_t \sim p_y\big(y \mid \Theta_{y,\mathrm{IGMM}}, I, t, P_{\mathrm{LCL}}(I), P_{\mathrm{LCR}}(I)\big) \tag{5}$$
For implementing the models, we use out-of-the-box mod-
ules from the widely used frameworks Apache Spark MLlib
[4] (classifiers) and Scikit-learn [5] (GMMs).
Altogether, we contribute a systematic workflow for designing and evaluating the prediction models as well as methodological extensions to known approaches. Moreover, we assess the
performance of the developed modules for the two tasks of
predicting (1) driving maneuvers and (2) probability distribu-
tions of future positions both separately and in combination.
To evaluate the modules, we utilize a large data set comprising
real-world measurements. As will be shown, our prediction
models outperform established state-of-the-art approaches.
The remainder of this article is organized as follows: Sec. II
discusses related work on object motion prediction, empha-
sizing the value added by our approach. Sec. III introduces
the data set and describes the preprocessing steps applied to
it. Sec. IV outlines the training of the considered maneuver
classifiers, whereas Sec. V deals with the experimental eval-
uation and the performance of the classifiers. Based on these
findings, Sec. VI develops different approaches for estimating
probability distributions of future vehicle positions, which are
then assessed in Sec. VII. Finally, Sec. VIII summarizes the
article and gives an outlook on future work.
II. RELATED WORK
Regarding the understanding and prediction of the behavior of other traffic participants in highway scenarios, various aspects have been investigated in the literature. Accordingly, this section is subdivided as follows: Sec. II-A presents approaches inferring the kind of maneuver that will be executed by a vehicle. Note that applications like collision checkers or trajectory planning algorithms cannot directly process this kind of information. Instead, probabilities of future vehicle positions or trajectories need to be predicted. Related research on this topic is presented in Sec. II-B. Bringing together the aspects of maneuver classification and position prediction, Sec. II-C gives an overview of hybrid prediction approaches. Finally, Sec. II-D closes the section with a brief literature discussion, leading to the contributions of this article in Sec. II-E.
A. Classification Approaches
Classification approaches for maneuver recognition are
described in [1], [6]–[8]. In [1], a system is introduced that is capable of detecting lane changes with high accuracy (>99%) approximately 1 s before their occurrence. For this purpose, dynamic Bayesian networks are used. Another approach, which is capable of detecting lane changes approximately 1.5 s before their occurrence, is presented in [6]. To achieve this, the lane change probability is decomposed into a situation-based and a movement-based component, resulting in an F1-score better than 98%. The approach presented in [7], in turn, shows that it is possible to detect lane changes up to time horizons of 2 s with an Area Under the Curve (AUC) better than 0.96 when using feature selection for scene understanding. Moreover, [8] combines interaction-aware heuristic models with an interaction-unaware learned model. The interaction-aware component relies on a multi-agent simulation based on game theory, in which each agent simultaneously tries to minimize different cost functions. These cost functions are designed using expert knowledge and consider traffic rules.
In a second step, the output of the interaction model is used to
condition an interaction-unaware classifier based on Bayesian
networks. The approach is able to detect lane changes on
average 1.8s in advance, with an AUC better than 0.93.
B. Trajectory and Position Prediction Approaches
Approaches dealing with the prediction of trajectories and positions are presented in [9]–[13]: [9] uses a fully connected Deep Neural Network to learn the parameters of a two-dimensional GMM. For each situation, an adapted Gaussian Mixture distribution models the probability density in the output dimensions $a_x$ and $v_y$ (cf. Tab. XII). This distribution is then sampled to estimate trajectories. The authors evaluate their approach with the widely used NGSIM data set [14] and show that a root weighted square error (comparable to the RMSE) of approximately 0.5 m in lateral direction can be achieved at a prediction horizon of 5 s.
Another approach, also evaluated with the NGSIM data
set, is presented in [10]. The authors propose the use of a Long Short-Term Memory network for predicting trajectories. In particular, the approach is able to compute single-shot predictions with an RMSE of approximately 0.42 m at a prediction horizon of 5 s. Reference [11] deals with the prediction of spatial probability density functions, especially at road intersections. More precisely, a conditional probability density function, which models the relationship between past and future motions, is inferred from training data. Finally, standard GMMs and variational approaches are compared. In [12], this approach is extended by a hierarchical Mixture of Experts that allows incorporating categorical information, such as the topology of a road intersection.
In [13], a Gaussian Mixture Regression approach for pre-
dicting future longitudinal positions as well as a procedure for
estimating the prediction confidence are introduced.
C. Hybrid Approaches
Approaches that combine strategies for both maneuver
detection and trajectory or position prediction, similar to the
approach presented in this article, are described in [15]–[20].
In the following, we denote such approaches as hybrid.
Reference [15] presents a two-stage approach: In the first
step, a Multilayer Perceptron (MLP) is used to estimate the
future lane of a vehicle. In a second step, a concrete trajectory
realization is estimated with an additional MLP. As a result,
the lane estimation module is able to detect lane changes 2s
in advance with an AUC better than 0.90. The evaluation of
the trajectory prediction module shows a median lateral error
of approximately 0.23m at a prediction horizon of 5s.
Reference [16] proposes another hybrid approach that uses
the prediction of future trajectories to forecast lane change
maneuvers. Moreover, the intention of drivers is modeled using
a Support Vector Machine. Subsequently, the resulting action
is checked for collisions. This enables the approach to model
interrupted lane changes. During the evaluation, an F1-score
of 98.1% with a detection time up to 1.74s is achieved.
In turn, [17] does not follow such a hybrid approach, but
contains an intermediate step before predicting trajectories.
Instead of learning maneuver probabilities, the authors present a regression technique based on Random Forests for estimating the time span until the next lane change. In [18], this approach is extended and combined with findings from [6]. The estimated times until the next lane changes to the left and to the right are used as input for a cubic polynomial, which is intended to predict future trajectories. Finally, the approach
is evaluated with the mentioned NGSIM data set, showing a
median lateral error of approximately 0.5m at a prediction
horizon of 3s for lane changing scenarios, assuming a perfect
maneuver classification.
Reference [19] proposes a maneuver recognition based on a Hidden Markov Model that distinguishes between ten maneuver classes. Based on this model, a position prediction module is implemented, which combines several maneuver-specific variational GMMs (according to [11]) with an Interacting Multiple Model that weights different physical models against each other. As the approach uses ten maneuver classes and as the errors are only measured in terms of Euclidean distance, the results are difficult to compare with those of other approaches. Additionally, the approach is evaluated on a rather small data set. Finally, in [20] these findings are pursued further using a Long Short-Term Memory network. The authors demonstrate certain improvements compared to their previous work, using the NGSIM data set for evaluation purposes.
Reference [3] presents an approach predicting future lateral
vehicle positions based on Gaussian Mixture Regression and
a Mixture of Experts with a Random Forest as gating net-
work. The approach is evaluated based on a small data set,
leading to noisy results, especially in case of lane changes.
The evaluation shows that the approach is able to perform
maneuver classifications with an AUC better than 0.84 and
lateral position predictions with a median error of less than
0.2m at a prediction horizon of 5s.
D. Discussion
The findings of our literature survey can be summarized
as follows: Many works provide meaningful algorithmic contributions. However, in numerous cases a structured problem resolution strategy is missing. Often, it remains unclear how the approaches compare to a baseline (e.g., [19]). Moreover, parameters (e.g., [16]) and feature sets (e.g., [10]) are selected manually and are thus difficult to retrace. In addition, most approaches focus on short or medium prediction horizons (e.g., [1]) or lack good prediction performance for longer time horizons (e.g., [18]). Analyzing the approaches that aim to resolve the long-term prediction problem shows that this problem is challenging, as the prediction models become significantly more complex, as pointed out, e.g., by [7], [8], and [21].
Moreover, many approaches (e.g., [10]) aim to predict single trajectories or single-shot positions rather than probability distributions of future vehicle positions. Therefore, the objective to be optimized is mostly the root-mean-square error (RMSE). As opposed to these works, we formulate the objective of the learning problem as generating an estimator that models a probability distribution of positions reflecting the frequencies of all observed positions, e.g., for different drivers
in the same situation. Thus, we aim to maximize the likelihood of the truly occupied positions given the model. The reasoning behind this design choice is that such distributions contain significantly more information than single-shot predictions. Hence, they are more useful for applications that need to consider risks, for example, maneuver planning approaches as presented in [11], [22], [23].
Fig. 2. Preprocessing steps used in the proposed workflow (the corresponding sections are referenced in the boxes).
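A small numerical example illustrates why a likelihood objective can be preferable to the RMSE when the future is multimodal. In the hypothetical data below, half of the drivers keep their lane and half change to the left; the RMSE-optimal single-shot prediction is the mean, which lies between the lanes where no vehicle ends up, whereas a bimodal mixture assigns a much higher likelihood to the truly occupied positions. The observations and mixture parameters are invented for illustration.

```python
import math
import statistics

def normal_pdf(y: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mean_nll(observations, pdf) -> float:
    """Average negative log-likelihood of the observations under pdf."""
    return -sum(math.log(pdf(y)) for y in observations) / len(observations)

# Hypothetical lateral positions (m) reached 5 s later in similar situations.
OBS = [0.0, 0.1, -0.1, 0.05, 3.5, 3.4, 3.6, 3.45]

POINT = statistics.fmean(OBS)  # single-shot prediction minimizing the RMSE
RMSE = math.sqrt(statistics.fmean([(y - POINT) ** 2 for y in OBS]))
SIGMA = statistics.pstdev(OBS)

def unimodal(y: float) -> float:
    """Single Gaussian around the point prediction."""
    return normal_pdf(y, POINT, SIGMA)

def bimodal(y: float) -> float:
    """Two-component mixture: stay in lane or change to the left lane."""
    return 0.5 * normal_pdf(y, 0.0, 0.2) + 0.5 * normal_pdf(y, 3.5, 0.2)

if __name__ == "__main__":
    print(f"point prediction {POINT:.2f} m, RMSE {RMSE:.2f} m")
    print(f"NLL unimodal {mean_nll(OBS, unimodal):.3f}, "
          f"bimodal {mean_nll(OBS, bimodal):.3f}")
```

The point prediction of 1.75 m minimizes the squared error, yet the bimodal distribution assigns it almost no probability, which is exactly the information a downstream risk assessment needs.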
E. Contributions
The contribution of this article is threefold:
1) We apply a heuristic-free machine learning workflow to generate a model capable of predicting maneuvers and precise distributions of future vehicle positions for time horizons of up to 5 s (a horizon chosen to ensure comparability with other works). This is achieved with a machine learning workflow that omits any human-tuned (hyper)parameters when constructing the classifiers. Note that this includes all aspects involving feature engineering, labeling, feature selection, and hyperparameter optimization for different classification algorithms. Regarding feature engineering and selection, this means that we first construct a data set with a large superset of all features that are potentially relevant for solving the problem. Afterwards, we select a comparatively small feature set that still ensures maximum predictive power through an automated feature selection process.
2) We evaluate the modules for maneuver classification and position prediction not only separately, as in other works (e.g., [18]), but also as a combined prediction system. This concerns the lateral as well as the longitudinal behavior. In this context, we show that directly feeding the results of the classifier into the regression problem produces results comparable to a Mixture of Experts approach. Additionally, we show that relying on the Markov assumption, without explicitly modeling the interactions between the traffic participants, allows producing results superior to those of existing approaches. As opposed to these works, we integrate the different aspects of behavior prediction, which comprise the prediction of driving maneuvers and positions in both lateral and longitudinal direction. In addition, we introduce new methodologies and conduct a large-scale evaluation.
3) We demonstrate that the presented methods have the potential to outperform state-of-the-art approaches when fed with a sufficient amount of data. Additionally, we show that our approach is able to provide the consumer of the information with a meaningful estimate of the prediction uncertainty, which is beneficial for collision risk calculation and trajectory planning (e.g., [22]).
III. DATA PREPARATION & EXPERIMENTAL SETUP
Sec. III-A introduces the considered data set and the exper-
imental setup. Sec. III-B then gives a detailed overview of
the features used to train our models. Afterwards, Sec. III-C
introduces the labeling process. Finally, Sec. III-D deals with
the data set split for training, validating and testing the
constructed models as well as further preprocessing steps.
Fig. 2 summarizes the overall preprocessing workflow.
A. Data Collection
For modeling and evaluating our modules, we use measurement data from a fleet of test vehicles [24] equipped with common series-production sensors. The sensor setup includes a front-facing camera detecting lane markings as well as two radars observing the traffic to the rear. In addition, the vehicles have a front-facing automotive radar to sense the distances and velocities of surrounding vehicles. The data has been collected with different vehicles and drivers at varying times of the day during all seasons. The data collection campaign spanned more than a year and was mainly restricted to the area around Stuttgart in Germany. Owing to this wide variance, we expect our models to achieve good generalization characteristics.
Unlike other contributions (e.g., [3]), we do not use the actual object-vehicles as prediction target $o$ in this work, but rather the ego- (or measurement-) vehicle itself. However, as our work ultimately focuses on the prediction of surrounding vehicles, we solely use features that are observable from an external point of view, as postulated in other works (e.g., [1] or [16]). Note that this constraint excludes features like driver status or steering wheel angle. Thus, the models remain applicable to actual object-vehicles, assuming a good sensing of their surroundings. Working with the ego-vehicle data offers several advantages concerning the modeling of situations: First, each situation can be described in a similar way, as situations in which relevant vehicles neighboring the target-vehicle are hidden from the measurement-vehicle cannot occur. Second, all measurements span longer time periods, as the target-vehicle can never disappear from the field of view. This way of handling data is widespread in the literature (e.g., [6]). In addition, one can expect that future sensor setups will minimize the measurement uncertainty for perceived objects and will get closer to the data quality that is available for the ego-vehicle today.
Fig. 3. Environment model used for our investigations.
Basically, our investigations rely on an environment model similar to the one presented in [7], modeling the surroundings with a fixed grid of eight relation partners. In contrast to [7], however, we use the ego-vehicle as prediction target. For this purpose, we slightly adapt the environment model: As the sensors facing the rear traffic in the test vehicles are less capable than the ones facing the front, our environment model (cf. Fig. 3) distinguishes between relation partners behind (index $rb$) and in front of (index $rf$) the prediction target $o$. Thus, the relation vectors of the rear objects $R_{rb}$ are shortened compared to those of the front objects $R_{rf}$. The relation vectors describe the relation between the respective object and the prediction target. Object-vehicles on the same lane as $o$ and driving behind $o$ are left out, as the current sensor setup is not able to sense them. Consequently, a traffic situation can be described by the feature vector $F_{sit}$, which contains the relations between $o$ and its seven relation partners, its own status $F_o$, and the infrastructure description $F_{infra}$ (cf. Eq. 6):
$$F_{sit} = \big[R_{rf}(r{=}fl),\, R_{rf}(r{=}f),\, R_{rf}(r{=}fr),\, R_{rf}(r{=}l),\, R_{rf}(r{=}r),\, R_{rb}(r{=}rl),\, R_{rb}(r{=}rr),\, F_o,\, F_{infra}\big]^T \tag{6}$$
A detailed listing of the particular elements of the relation vectors $R_{rf}$ and $R_{rb}$ as well as of $F_o$ and $F_{infra}$ can be found in Tab. XII.
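The ordering of Eq. 6 can be expressed as a simple flattening routine. The following Python sketch assumes that each relation vector is already available as a list of numbers; since the actual elements are only listed in Tab. XII, the vector lengths used in any example are hypothetical, and the sketch merely checks that all front vectors and all rear vectors have consistent lengths.

```python
from typing import Dict, List, Sequence

# Relation partners in the order of Eq. 6.
FRONT_PARTNERS = ("fl", "f", "fr", "l", "r")  # vectors of type R_rf
REAR_PARTNERS = ("rl", "rr")                  # shortened vectors of type R_rb

def build_f_sit(r_rf: Dict[str, Sequence[float]],
                r_rb: Dict[str, Sequence[float]],
                f_o: Sequence[float],
                f_infra: Sequence[float]) -> List[float]:
    """Concatenate relation vectors, ego status and infrastructure into F_sit."""
    if len({len(r_rf[p]) for p in FRONT_PARTNERS}) != 1:
        raise ValueError("all R_rf vectors must have the same length")
    if len({len(r_rb[p]) for p in REAR_PARTNERS}) != 1:
        raise ValueError("all R_rb vectors must have the same length")
    f_sit: List[float] = []
    for p in FRONT_PARTNERS:
        f_sit.extend(r_rf[p])
    for p in REAR_PARTNERS:
        f_sit.extend(r_rb[p])
    f_sit.extend(f_o)
    f_sit.extend(f_infra)
    return f_sit

if __name__ == "__main__":
    front = {p: [1.0, 2.0, 3.0] for p in FRONT_PARTNERS}  # hypothetical: 3 elements
    rear = {p: [4.0, 5.0] for p in REAR_PARTNERS}         # hypothetical: 2 elements
    print(len(build_f_sit(front, rear, [30.0, 0.1], [1.0])))
```

A fixed ordering of this kind lets the same column index always refer to the same relation partner, which downstream feature selection relies on.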
B. Feature Engineering
To test and develop our system and to populate the described environment model, we use fused data originating from three different sources:
1) The basis for our investigations is the measurement data produced by the test fleet (cf. Sec. III-A).
2) As we identified additional features of interest as inputs beforehand, we fuse the data with information from a navigation map (e.g., bridges, tunnels, and distances to highway approaches).
3) In addition, we calculate some higher-order features from the measurements, e.g., by converting them to a curvilinear coordinate system along the road [25].
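A conversion into a curvilinear coordinate system can be sketched as the projection of a point onto a polyline representing the road (for instance, a lane center line). The following minimal Python sketch returns the arc length $s$ along the polyline and the signed lateral offset $d$ (positive to the left). It is a generic nearest-segment projection and not necessarily the method of [25]; in particular, it does not smooth the jumps that can occur near polyline vertices.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def to_curvilinear(p: Point, ref: List[Point]) -> Tuple[float, float]:
    """Project p onto the polyline ref and return (s, d).

    Assumes ref has at least two points and no zero-length segments.
    """
    best = None  # (distance, s, d) of the closest projection found so far
    s_start = 0.0
    for (x0, y0), (x1, y1) in zip(ref, ref[1:]):
        dx, dy = x1 - x0, y1 - y0
        seg_len = math.hypot(dx, dy)
        # Parameter of the orthogonal projection, clamped to the segment.
        u = ((p[0] - x0) * dx + (p[1] - y0) * dy) / seg_len ** 2
        u = min(1.0, max(0.0, u))
        dist = math.hypot(p[0] - (x0 + u * dx), p[1] - (y0 + u * dy))
        if best is None or dist < best[0]:
            # 2-D cross product: positive if p lies left of the segment direction.
            cross = dx * (p[1] - y0) - dy * (p[0] - x0)
            best = (dist, s_start + u * seg_len, math.copysign(dist, cross))
        s_start += seg_len
    if best is None:
        raise ValueError("reference polyline needs at least two points")
    return best[1], best[2]

if __name__ == "__main__":
    road = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
    print(to_curvilinear((5.0, 2.0), road))
```

In road-aligned coordinates, lateral features such as distances to lane markings become directly comparable between straight and curved road sections.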
C. Labeling
As in previous work [3], we divide all samples into the three maneuver classes LCL (lane change left), FLW (lane following), and LCR (lane change right) and apply a labeling process that works as follows: First, for each measurement, the times until the next lane change to the left neighboring lane (TTLCL) and to the right one (TTLCR) are calculated. This is accomplished by a look-ahead in time using the distances to the lane markings. We define the moment of the lane change as the point in time when the vehicle center has just crossed the lane marking. Subsequently, we determine the maneuver label of each sample based on a defined prediction horizon $T_h$ according to Eq. 7:
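$$L = \begin{cases} \mathrm{LCL}, & \text{if } (\mathrm{TTLCL} \le T_h) \wedge (\mathrm{TTLCL} < \mathrm{TTLCR}) \\ \mathrm{LCR}, & \text{if } (\mathrm{TTLCR} \le T_h) \wedge (\mathrm{TTLCR} < \mathrm{TTLCL}) \\ \mathrm{FLW}, & \text{otherwise} \end{cases} \tag{7}$$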
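The labeling procedure can be sketched in Python as follows. The sketch assumes that, for each lane marking, the recorded signed lateral distance between the vehicle center and the marking is available at a fixed sampling interval and stays positive until the center has crossed the marking; this sign convention and the function names are hypothetical. The `label` function implements Eq. 7, assuming a non-strict comparison $\mathrm{TTLC} \le T_h$.

```python
import math
from typing import Sequence

def time_to_crossing(dist_to_marking: Sequence[float], dt: float) -> float:
    """Look ahead in a recorded distance series and return the time until the
    vehicle centre has crossed the marking (distance <= 0); inf if never."""
    for k, d in enumerate(dist_to_marking):
        if d <= 0.0:
            return k * dt
    return math.inf

def label(ttlcl: float, ttlcr: float, t_h: float = 5.0) -> str:
    """Maneuver label according to Eq. 7 (horizon t_h in seconds)."""
    if ttlcl <= t_h and ttlcl < ttlcr:
        return "LCL"
    if ttlcr <= t_h and ttlcr < ttlcl:
        return "LCR"
    return "FLW"

if __name__ == "__main__":
    dt = 0.1  # hypothetical sampling interval in seconds
    left = [1.2 - 0.05 * k for k in range(80)]   # drifting towards the left marking
    right = [1.8 + 0.05 * k for k in range(80)]  # moving away from the right marking
    ttlcl, ttlcr = time_to_crossing(left, dt), time_to_crossing(right, dt)
    print(ttlcl, ttlcr, label(ttlcl, ttlcr))
```

Because the look-ahead uses recorded future samples, this labeling is only possible offline; it produces the ground truth, not a prediction.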
We decided to use a horizon of 5 s, as the duration of lane change maneuvers usually ranges from 3 s to 5 s (see [16]). Consequently, it is reasonable to label samples as potential lane change samples only up to 5 s before the lane change. Additionally, this value is widely used in the literature as the longest prediction time (e.g., [8], [15], or [16]) and therefore allows for comparability. Note, however, that this style of labeling might result in decreased performance values, as detections made slightly more than 5 s ahead of a lane change count as false positives in the evaluation.
D. Data Set Split
As shown in Fig. 2, we split our data into several parts after executing the mentioned preprocessing steps. The first split divides our data into one part for the maneuver classification, $D^{Ma}$, and another one for the position prediction, $D^{Po}$. This allows us to produce models based on independent data sets. An overview of the splits as well as the respective data set sizes and identifiers is given in Tab. I.
The first part $D^{Ma}$ is then used as follows: To prepare the training, parametrization, and evaluation of the developed classifiers in a methodologically sound manner, we split data set $D^{Ma}$ once more into six folds.¹ Of these, we use five folds, $D^{Ma}_{TV}$, in Sec. IV for the design and parametrization. The remaining fold $D^{Ma}_6 = D^{Ma}_{Te}$ is only used for the performance examinations presented in Sec. V. The split is performed based on entire situations, as described in [3]. This means that the measurements of each situation occur in only one of the folds. Note that this prevents unrealistic results, which might otherwise occur because similar samples from the same time series would appear in both the evaluation and the training data.
To achieve an even proportion of the three maneuver classes, we balance the number of samples within each fold by random undersampling. As the prediction problem is extremely unbalanced, as outlined in [10], classifiers would otherwise focus on the most frequent maneuver class FLW. In our case, approximately 94% of the data points belong to that class.
In addition, we only take situations into account that were collected continuously up to the prediction horizon of 5 s. This ensures that the folds are also balanced over time, which is a prerequisite for fair evaluations. This is necessary because predicting a lane change 4 s in advance is much more demanding than predicting it 1 s in advance. Due to this strategy, the numbers of samples in the six folds differ slightly, which we consider uncritical. Overall, $D^{Ma}$ contains approximately 8 hours of highway driving, of which two thirds were collected during lane changes.
TABLE I
DATA SET IDENTIFIERS AND SIZES
¹As shown in the following sections, the number of folds is a trade-off between computational cost and correctness.
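The situation-based split and the per-fold balancing can be sketched as follows. The sketch assumes that each sample carries an identifier of the situation it belongs to and its maneuver label; situations are shuffled and assigned to folds in a round-robin fashion, which is one simple way of keeping every situation within a single fold and not necessarily the assignment used in [3]. Each fold is then balanced by randomly undersampling all classes to the size of the rarest one.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[str, str]  # hypothetical layout: (situation_id, maneuver_label)

def split_by_situation(samples: List[Sample], n_folds: int = 6,
                       seed: int = 0) -> List[List[Sample]]:
    """Assign whole situations to folds so that no situation spans two folds."""
    situations = sorted({sid for sid, _ in samples})
    random.Random(seed).shuffle(situations)
    fold_of = {sid: i % n_folds for i, sid in enumerate(situations)}
    folds: List[List[Sample]] = [[] for _ in range(n_folds)]
    for sample in samples:
        folds[fold_of[sample[0]]].append(sample)
    return folds

def undersample(fold: List[Sample], seed: int = 0) -> List[Sample]:
    """Randomly reduce every maneuver class to the size of the rarest class."""
    by_label: Dict[str, List[Sample]] = defaultdict(list)
    for sample in fold:
        by_label[sample[1]].append(sample)
    n_min = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced: List[Sample] = []
    for lab in sorted(by_label):
        balanced.extend(rng.sample(by_label[lab], n_min))
    return balanced
```

Because situations differ in length, folds produced this way naturally differ slightly in size, matching the observation above.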
The second data set $D^{Po}$, which serves for the training and evaluation of the position prediction, is processed as follows: Initially, we add the lane change probabilities estimated by the different classifiers to each sample. Furthermore, we only consider measurements that were collected while the vehicle was driven manually. Note that this restriction is essential, as all vehicles of our test fleet are equipped with an Adaptive Cruise Control (ACC) system. Thus, driving in a semi-automated mode is over-represented in our data set compared to reality.²
We further split data set $D^{Po}$ into the subsets $D^{Po}_T$ for training and $D^{Po}_{Te}$ for evaluating the position predictions (cf. Sec. VI and Sec. VII). Afterwards, we expand each data point in $D^{Po}_T$ with the desired prediction outputs, i.e., the true positions in $x$ and $y$ direction for all times $t \in T_T = \{-1.0\,\mathrm{s}, -0.9\,\mathrm{s}, \dots, 6.0\,\mathrm{s}\}$. Note that the samples with negative times and those with times $> 5$ s are needed to train the distributions correctly. Strictly limiting the times to a certain range would create regions in the data space that are difficult to represent with GMMs due to discontinuities, similar to the ones in the probability dimension (cf. Sec. VI-B). To overcome these problems, we integrated a mechanism that performs a subsampling between −1 s and 0 s as well as between 5 s and 6 s according to a Gaussian distribution (percentiles: $P_{50} = 0.0$ s; $P_{3\sigma} = -1.0$ s; equivalently between 5 s and 6 s).
²We do not explicitly filter out ACC driving in the data set for maneuver classification, as we can assume that ACC is always deactivated during lane changes.
Fig. 4. Process of training and evaluating maneuver classifiers.
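One possible reading of the subsampling specification is sketched below: sample times for the margin between −1 s and 0 s are drawn from a normal distribution centered at 0 s whose 3σ point lies at −1 s (σ = 1/3 s), keeping only the part of the distribution inside the margin; the margin between 5 s and 6 s is handled symmetrically around 5 s. The exact mechanism is not specified in the text, so both the rejection step and the function name `sample_margin_times` are assumptions.

```python
import random
from typing import List

SIGMA = 1.0 / 3.0  # assumption: the 3-sigma point lies 1 s away from the centre

def sample_margin_times(n: int, centre: float, direction: int,
                        rng: random.Random) -> List[float]:
    """Draw n prediction times within 1 s of `centre` on one side only.

    direction = -1 yields times in [centre - 1, centre] (e.g. [-1 s, 0 s]),
    direction = +1 yields times in [centre, centre + 1] (e.g. [5 s, 6 s]).
    Offsets follow a half-normal distribution; values beyond 1 s are rejected.
    """
    times: List[float] = []
    while len(times) < n:
        offset = abs(rng.gauss(0.0, SIGMA))
        if offset <= 1.0:
            times.append(centre + direction * offset)
    return times
```

Under this reading, the sample density is highest next to the 0 s to 5 s evaluation window and decays towards the outer ends of the margins, which smooths the boundary the GMMs would otherwise have to represent.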
Another mechanism performing a time interpolation ensures that the training data points are distributed continuously along the time dimension. Accordingly, we also have access to prediction times in between our sampling times during the training process. Moreover, the data points in the position test data set $D^{Po}_{Te}$ are expanded with the $x$ and $y$ positions as well as the corresponding times $t \in T_{Te} = \{0.0\,\mathrm{s}, 0.1\,\mathrm{s}, \dots, 5.0\,\mathrm{s}\}$.
Finally, we 'coil' the two data sets $D^{Po}_T$ and $D^{Po}_{Te}$ such that each of the newly constructed data points contains the features at the start point of the prediction, one corresponding prediction time, and the actual $x$ and $y$ positions at that point in time (in Fig. 2, this step is called 'Explode Data'). Hence, our data sets are multiplied by factors of $|T_T| = 71$ and $|T_{Te}| = 51$, respectively, and are structured as described in Sec. VII-A. Note that $D^{Po}_T$ is re-split along the maneuver labels and undersampled in Sec. VI-A to train maneuver-specific position prediction experts.
IV. MANEUVER CLASSIFIER TRAINING
This section gives an overview of the techniques used for feature selection (cf. Sec. IV-A), the classification algorithms (cf. Sec. IV-B), and the techniques used to tune the respective hyperparameters (cf. Sec. IV-C) for maneuver classification. The corresponding activities are illustrated in Fig. 4.
A. Feature Selection
This section deals with the task of selecting a meaningful subset of features from the available superset. Such a selection makes sense for two reasons: First, it can improve the prediction performance of the maneuver classifiers. Second, it can help to reduce the computational effort, thereby also enabling predictions on devices with limited computational power. Our main goal here is to improve the overall prediction performance. Note that this differs slightly from an overall ranking of the available features, as some of them are highly redundant. Consequently, the most predictive variables shall be selected,
³For details, see Sec. VI-A.
TABLE II
SUMMARY OF EXAMINED FEATURE SELECTION TECHNIQUES
while excluding redundant ones. In the literature, one can find numerous works dealing with feature selection in machine learning applications. In our implementation, we rely on the findings from [26]. As we aim to solve the underlying classification problem through a systematic machine learning workflow, we start with simple techniques and move towards more sophisticated and computationally expensive ones. To demonstrate the performance of the used techniques, we additionally test the classification with the entire superset as a baseline. The superset that contains all features is denoted as A in the following.
The first investigated technique is a simple correlation-based feature selection, which evaluates the correlation of all features and then applies a threshold (set to 0.15) to remove features showing a very low correlation with the maneuver class from the superset. More precisely, we compute Spearman's correlation (see [27, p. 133 ff.]) between each feature and the time to the next lane change (TTLC). We selected this quantity instead of the maneuver label, as it enables a smooth fade-out. The resulting feature set is denoted as B in the following. Tab. II summarizes the examined variants and their abbreviations. Finally, the elements of the resulting feature sets can be found in Tab. XII.
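A minimal sketch of variant B follows. It assumes that the absolute Spearman correlation is compared against the 0.15 threshold and uses synthetic data in place of the real features; the helper names are hypothetical. Spearman's correlation is computed as the Pearson correlation of average ranks:

```python
import numpy as np


def average_ranks(a):
    """Ranks 1..n; tied values receive the average of their ranks."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(a, kind="mergesort")
    ranks = np.empty(len(a))
    ranks[order] = np.arange(1, len(a) + 1)
    _, inverse = np.unique(a, return_inverse=True)
    inverse = inverse.ravel()
    return (np.bincount(inverse, weights=ranks) / np.bincount(inverse))[inverse]


def spearman(a, b):
    """Spearman's correlation: Pearson correlation of the average ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])


def correlation_filter(X, ttlc, threshold=0.15):
    """Variant B: keep columns of X whose |Spearman rho| with TTLC >= threshold."""
    rhos = np.array([spearman(X[:, j], ttlc) for j in range(X.shape[1])])
    return np.flatnonzero(np.abs(rhos) >= threshold), rhos


rng = np.random.default_rng(0)
ttlc = rng.uniform(0.0, 5.0, 2000)                        # synthetic TTLC values
X = np.column_stack([-ttlc + rng.normal(0.0, 0.5, 2000),  # informative toy feature
                     rng.normal(size=2000)])              # pure noise
selected, rhos = correlation_filter(X, ttlc)              # expected: only column 0
```

Using ranks rather than raw values makes the filter insensitive to monotone but nonlinear relations between a feature and TTLC.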
The second technique uses Correlation-based Feature Selection (CFS; cf. [28]) and is referred to as C in the following. For this technique, the correlation of entire feature sets instead of single features is calculated. More precisely, for all feature sets $S$, the 'merit' $M_S$, as a measure of the predictive performance, is computed according to Eq. 8:

$$M_S = \frac{n\,\overline{\rho}_{cf}}{\sqrt{n + n(n-1)\,\overline{\rho}_{ff}}} \qquad (8)$$
$n$ describes the number of features and $\overline{\rho}_{cf}$ corresponds to the mean correlation of all features with the class label or, in our case, TTLC. The variable $\overline{\rho}_{ff}$, in turn, describes the mean feature-feature inter-correlation of all features within $S$. As can be seen from Eq. 8, strongly inter-correlated features in a feature set $S$ decrease $M_S$, whereas a stronger correlation $\overline{\rho}_{cf}$ with the class label increases it. All these computations rely on the assumption that no strong feature inter-correlations are present in the data set, but that instead every relevant feature itself is at least weakly correlated with the class label (see also [28]). To meet the conditions of our data set and to be consistent with variant B, we use Spearman's correlation coefficient. As computing $M_S$ is not feasible for all possible feature combinations, we use a backward selection strategy that, according to Guyon and Elisseeff [26], typically provides superior results compared to forward selection. When applying it, we try to minimize the possible shortcomings of the CFS by applying cross-validation with the five data folds for training and validation ($D^{Ma}_{TV}$), as described in Sec. III-D.
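Eq. 8 and the backward search can be sketched as follows. The sketch assumes precomputed absolute Spearman correlations: feature-to-TTLC correlations as `rho_cf` and feature-to-feature correlations as `rho_ff`. It stops as soon as no single removal increases the merit. This stopping rule and the omission of the cross-validation loop are simplifications on our part.

```python
import numpy as np


def cfs_merit(subset, rho_cf, rho_ff):
    """Merit M_S of Eq. 8 for a list of feature indices.

    rho_cf: (p,) correlations feature -> TTLC; rho_ff: (p, p) feature-feature."""
    idx = list(subset)
    n = len(idx)
    mean_cf = np.mean(np.abs(rho_cf[idx]))
    if n > 1:
        block = np.abs(rho_ff[np.ix_(idx, idx)])
        mean_ff = block[np.triu_indices(n, k=1)].mean()  # mean over distinct pairs
    else:
        mean_ff = 0.0  # irrelevant: n(n-1) = 0
    return n * mean_cf / np.sqrt(n + n * (n - 1) * mean_ff)


def backward_cfs(rho_cf, rho_ff):
    """Greedy backward elimination that maximizes the CFS merit."""
    selected = list(range(len(rho_cf)))
    best = cfs_merit(selected, rho_cf, rho_ff)
    while len(selected) > 1:
        drop, drop_merit = None, best
        for f in selected:
            merit = cfs_merit([g for g in selected if g != f], rho_cf, rho_ff)
            if merit > drop_merit:
                drop, drop_merit = f, merit
        if drop is None:  # no single removal increases the merit
            break
        selected.remove(drop)
        best = drop_merit
    return selected, best


# Toy example: features 0 and 1 are nearly duplicates, feature 2 is independent.
rho_cf = np.array([0.5, 0.5, 0.4])
rho_ff = np.array([[1.0, 0.95, 0.0],
                   [0.95, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
selected, merit = backward_cfs(rho_cf, rho_ff)  # keeps one of {0, 1} plus feature 2
```

The toy example shows the intended effect of the denominator in Eq. 8: one of the two redundant features is discarded, even though each of them is individually more correlated with TTLC than feature 2.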
The feature selection techniques described so far are limited in two aspects: First, a proper incorporation of the properties of the used classification algorithm is missing. Second, features that are only meaningful in combination with others are not considered in feature sets B and C. Therefore, when generating feature set D, we apply a wrapper feature selection technique as described in [29]. As the training of Random Forests already includes an implicit feature selection, we solely focus on wrapper techniques for the other classifiers presented in Sec. IV-B. The main idea of wrapper techniques is to incorporate the classifier itself as a black box into the feature selection process. Within this process, the prediction performance on a validation data set is used to determine the best feature set for the respective classifier. We base our investigations on a hyperparameter set that was optimized as described in Sec. IV-C, using the feature set of variant C. Analogously to the process for deriving C, we search for the most descriptive feature set with backward elimination. As a classifier needs to be trained and evaluated for each of the approximately 5000 candidate subsets, the wrapper technique becomes computationally expensive. To accelerate the computation, we do not perform the validation using cross-validation. Instead, we use one of the data folds constructed in Sec. III-D for training ($D^{Ma}_1$) and one for validation ($D^{Ma}_2$).
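The wrapper search can be written around a black-box scoring function. In this sketch, `score` is a hypothetical callable that would train the classifier on the selected columns of $D^{Ma}_1$ and return, e.g., its balanced accuracy on $D^{Ma}_2$; here a toy score stands in for it. Elimination continues down to a single feature, and the best subset seen is returned, which is only one of several possible stopping rules. With p features, this variant evaluates p(p+1)/2 subsets.

```python
from typing import Callable, Sequence, Tuple


def wrapper_backward_elimination(features: Sequence[int],
                                 score: Callable[[Tuple[int, ...]], float]):
    """Backward elimination driven by a black-box classifier score.

    In each round, every feature of the current subset is tentatively removed,
    the best-scoring reduced subset is kept, and the best subset seen overall
    is returned together with its score and the number of evaluations."""
    current = tuple(features)
    best_subset, best_score = current, score(current)
    n_evals = 1
    while len(current) > 1:
        trials = [tuple(f for f in current if f != drop) for drop in current]
        scores = [score(t) for t in trials]
        n_evals += len(trials)
        i = max(range(len(trials)), key=scores.__getitem__)  # first best on ties
        current = trials[i]
        if scores[i] > best_score:
            best_subset, best_score = current, scores[i]
    return best_subset, best_score, n_evals


def toy_score(subset):
    """Hypothetical stand-in for 'train on D_1, balanced accuracy on D_2'."""
    useful = {0, 2}
    return len(useful & set(subset)) - 0.1 * len(subset)


best, best_score, n_evals = wrapper_backward_elimination(range(4), toy_score)
# best == (0, 2); n_evals == 4 * 5 // 2 == 10
```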
B. Examined Classification Algorithms
For the task of maneuver classification, we consider three
different algorithms for evaluation purposes, which have been
successfully applied in reference works:
1) The first algorithm is based on a Gaussian Naïve Bayes (GNB) approach that uses GMMs instead of a single Gaussian kernel per class and was presented in [7].
2) The second algorithm is based on a Random Forest (RF) and was presented in [3].
3) The third algorithm is based on a Multilayer Perceptron (MLP) and was presented similarly in [15]. As opposed to GNB and RF, this approach uses scaled features, as suggested by [30, p. 398 ff.]. In contrast to [15], we use a modified labeling and a partly automated strategy to identify an optimal model structure, where we restrict the model to one hidden layer in order to keep the parameter optimization solvable in finite time.
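To illustrate how a Naïve Bayes classifier with one GMM per feature and class turns features into maneuver probabilities, the following sketch evaluates the posterior for given, already fitted one-dimensional mixtures. The fitting step itself (e.g., with a VBGMM as in Sec. IV-C) is omitted, and the nested data structure `class_models` is our own assumption:

```python
import numpy as np


def gmm_logpdf_1d(x, weights, means, stds):
    """Log-density of a 1-D Gaussian mixture at a scalar x (log-sum-exp)."""
    weights, means, stds = (np.asarray(a, dtype=float) for a in (weights, means, stds))
    log_comp = (np.log(weights) - np.log(stds) - 0.5 * np.log(2.0 * np.pi)
                - 0.5 * ((x - means) / stds) ** 2)
    peak = log_comp.max()
    return peak + np.log(np.exp(log_comp - peak).sum())


def gnb_gmm_posterior(x, class_priors, class_models):
    """Naive Bayes posterior P(m | x) with one 1-D GMM per feature and class.

    class_models[m][j] = (weights, means, stds) of feature j under class m."""
    log_post = np.array([
        np.log(prior) + sum(gmm_logpdf_1d(x[j], *model) for j, model in enumerate(models))
        for prior, models in zip(class_priors, class_models)
    ])
    post = np.exp(log_post - log_post.max())  # normalize in a numerically stable way
    return post / post.sum()


# Toy example with one feature: class 0 ~ N(0, 1), class 1 ~ N(3, 1).
models = [[([1.0], [0.0], [1.0])],
          [([1.0], [3.0], [1.0])]]
post = gnb_gmm_posterior([0.0], [0.5, 0.5], models)
```

The "naive" independence assumption appears as the sum of per-feature log-densities, while each feature's class-conditional density may still be multimodal.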
C. Hyperparameter Optimization
To achieve the best possible performance and to enable a fair comparison of the examined classifiers, we optimize their respective hyperparameters. For the GNB, this means finding the optimal number of Gaussian kernels $K$ used for each feature and class. A Variational Bayesian Gaussian Mixture Model (VBGMM; see [31]) is used in this context. This technique was already successfully applied in [11]. The principle behind VBGMMs is to fit a distribution over the possible Gaussian mixture distributions using a Dirichlet process.
TABLE III
OPTIMIZED HYPERPARAMETERS PER CLASSIFIER
Hence, this technique determines the optimal value for $K$ automatically.
Regarding the RF and MLP approaches, the parameter optimization is executed for each feature set using a grid search. This means that we vary the parameters and calculate a performance value for each parameter set. For the latter, we calculate the average balanced accuracy (see Sec. V-A) in a leave-one-out cross-validation manner, i.e., each of the five data folds for training and validation ($D^{Ma}_{TV}$) is held out once for validation.
The parameters to be optimized are summarized in Tab. III.
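A sketch of the grid search with leave-one-fold-out averaging follows. `train_and_score` is a hypothetical callable standing in for training the RF or MLP on four folds and returning the balanced accuracy on the held-out fold. The toy parameter names `a` and `b` are placeholders, not the hyperparameters of Tab. III.

```python
import itertools
from statistics import mean


def grid_search(param_grid, folds, train_and_score):
    """Exhaustive grid search scored by leave-one-fold-out cross-validation.

    param_grid:      dict name -> list of candidate values
    folds:           list of data folds (e.g. the five folds of D^Ma_TV)
    train_and_score: callable(params, train_folds, val_fold) -> balanced accuracy
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [train_and_score(params, folds[:i] + folds[i + 1:], folds[i])
                  for i in range(len(folds))]
        score = mean(scores)  # average BACC over the held-out folds
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_train_and_score(params, train_folds, val_fold):
    """Hypothetical stand-in: peaks at a = 2, b = 1 and ignores the data."""
    assert len(train_folds) == 4  # four folds train, one validates
    return -(params["a"] - 2) ** 2 - (params["b"] - 1) ** 2


folds = [[1], [2], [3], [4], [5]]
best_params, best_score = grid_search({"a": [1, 2, 3], "b": [0, 1]}, folds,
                                      toy_train_and_score)
# best_params == {"a": 2, "b": 1}, best_score == 0
```

The cost grows with the product of the grid sizes times the number of folds, which is why the grid has to stay small for expensive models such as the MLP.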
So far, we have constructed different feature sets (cf. Sec. IV-A) and optimized the hyperparameters for the different classification algorithms (cf. Sec. IV-B & Sec. IV-C). Subsequently, we execute a second training step with a larger amount of data for all algorithms, using the optimized feature sets and hyperparameters. The data set is enlarged by using all five folds that we previously used in the cross-validation ($D^{Ma}_{TV}$). Note that this step yields the final models for the classifier evaluation (cf. Sec. V).
V. MANEUVER CLASSIFIER EVALUATION
This section presents the experimental results obtained with
the trained classification models (cf. Sec. IV). Sec. V-A
introduces the used performance measures, whereas Sec. V-B
presents and discusses the results measured with the con-
structed test data set (cf. Sec. III-A).
A. Performance Measures
To assess the performance of the developed classifiers, several metrics are needed, as we simultaneously focus on different objectives. In particular, we are interested in predicting lane changes not only with high accuracy, but also as early as possible before their execution.
To reflect that, we use the balanced accuracy (BACC), which weights the classification performance for the three maneuver classes evenly. Basically, we use the definition presented in [32], but in a generalized form for multiclass problems (cf. Eq. 9):

$$BACC = \frac{1}{|M|} \sum_{m \in M} \frac{TP_m}{P_m} \qquad (9)$$
TABLE IV
DEFINITION OF THE DETECTION TIME METRICS
$M$ is defined according to Eq. 2. Moreover, $TP_m$ corresponds to the number of true positives for class $m$, and $P_m$ to the number of samples truly belonging to class $m$ (positives). Thereby, the classifiers assign each sample to the class with the highest probability value.
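Eq. 9, with the highest-probability class assignment, can be computed as follows. Class names and array layout are assumptions for illustration, and every class is assumed to have at least one positive sample ($P_m > 0$):

```python
import numpy as np

CLASSES = ("LCL", "FLW", "LCR")


def balanced_accuracy(y_true, proba, classes=CLASSES):
    """BACC of Eq. 9: mean over classes m of TP_m / P_m.

    y_true: (N,) true labels; proba: (N, |M|) probabilities, columns ordered
    like `classes`. Each sample is assigned to its highest-probability class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(classes)[np.argmax(proba, axis=1)]
    recalls = [np.mean(y_pred[y_true == m] == m) for m in classes]  # TP_m / P_m
    return float(np.mean(recalls))


y_true = ["FLW"] * 8 + ["LCL", "LCR"]
y_hat = ["FLW"] * 7 + ["LCL"] + ["LCL", "FLW"]
proba = np.array([[1.0 if c == p else 0.0 for c in CLASSES] for p in y_hat])
bacc = balanced_accuracy(y_true, proba)  # (7/8 + 1 + 0) / 3 = 0.625
```

In this toy example, the plain accuracy would be 0.8, whereas the BACC of 0.625 exposes the completely missed LCR sample, which is the reason for weighting the classes evenly.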
Additionally, we use the Receiver Operating Characteristic (ROC) and the Area Under the ROC Curve (AUC), which are both widely used metrics in this domain (e.g. [33, p. 180 ff.]). As opposed to the BACC, the ROC curve is originally intended to assess binary classifiers. Accordingly, we transform our three-class problem into three binary classification problems. In contrast to the BACC, the ROC curves constructed this way enable us to show the classification performance at different working points (WP). For example, this property allows us to assess the performance for the maneuver classes LCL and LCR with more conservative classifier parametrizations and, thus, fewer false positives. Additionally, the AUC summarizes the performance over all possible working points at once.
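For reference, the AUC of each binary problem can be computed without constructing the curve. It equals the probability that a random positive sample receives a higher score than a random negative one (Mann-Whitney statistic). The sketch assumes a one-vs-rest decomposition, in which each class probability column is scored against all other classes; this decomposition is our assumption:

```python
import numpy as np


def binary_auc(scores, is_positive):
    """AUC as P(score of random positive > score of random negative), ties = 1/2,
    computed from the Mann-Whitney U statistic with average ranks."""
    scores = np.asarray(scores, dtype=float)
    pos = np.asarray(is_positive, dtype=bool)
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    _, inv = np.unique(scores, return_inverse=True)
    inv = inv.ravel()
    ranks = (np.bincount(inv, weights=ranks) / np.bincount(inv))[inv]  # average ties
    n_pos, n_neg = pos.sum(), (~pos).sum()
    u = ranks[pos].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u / (n_pos * n_neg))


def one_vs_rest_auc(y_true, proba, classes=("LCL", "FLW", "LCR")):
    """One binary problem per class: its probability column vs. all other classes."""
    y_true = np.asarray(y_true)
    return {m: binary_auc(proba[:, k], y_true == m) for k, m in enumerate(classes)}


auc = binary_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])  # 0.75
```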
In addition, metrics that enable us to analyze the technically possible prediction horizon are needed. As the reference point in time is essential in this context and most sources (e.g. [1], [15], and [16]) are not very precise in this respect, we introduce the two metrics $\tau_f$ and $\tau_c$ (cf. Tab. IV).
As opposed to the BACC evaluation, for which an unambiguous class assignment is necessary, the class assignment here is conducted in a way that matches the binary evaluation of the ROC curves: For the classes LCL and LCR, respectively, we select a binary decision threshold that keeps the false positive rate below 1%. The resulting working points are presented in Fig. 5 along with the ROC curves. The detection times calculated this way reflect an evaluation with a limited false positive rate and, hence, at a similar working point for the different classifiers, which ensures a fair evaluation. We opt for a very low false positive rate here, as the system should not produce too many lane change detections. Remember that, in practice, lane changes occur very rarely compared to lane following.
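Selecting such a working point can be sketched as a scan over candidate thresholds from high to low. This illustrative sketch assumes that a sample is flagged as a lane change when its class probability is at least the threshold; the helper name is hypothetical:

```python
import numpy as np


def threshold_for_max_fpr(scores, is_positive, max_fpr=0.01):
    """Lowest threshold whose false positive rate stays below max_fpr.

    A sample counts as a detection if score >= threshold. Lowering the threshold
    can only raise TPR and FPR, so the lowest admissible threshold maximizes TPR."""
    scores = np.asarray(scores, dtype=float)
    pos = np.asarray(is_positive, dtype=bool)
    neg_scores = scores[~pos]
    best = np.inf                                 # no detections at all
    for thr in np.unique(scores)[::-1]:           # candidate thresholds, high to low
        if np.mean(neg_scores >= thr) >= max_fpr:
            break                                 # FPR can only grow from here on
        best = thr
    tpr = float(np.mean(scores[pos] >= best))
    return best, tpr


neg = np.arange(1000) / 2000.0                    # 1000 negatives in [0, 0.5)
scores = np.concatenate([neg, [0.9, 0.8, 0.45]])
labels = np.concatenate([np.zeros(1000, bool), np.ones(3, bool)])
thr, tpr = threshold_for_max_fpr(scores, labels)  # tpr == 2/3
```

Because the negatives vastly outnumber the positives, as lane following does in practice, even a 1% false positive rate corresponds to many false alarms in absolute terms, which motivates the conservative setting.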
B. Results & Discussion
Tab. V shows the results (BACC, AUC, $\tau$) for the different classifiers and feature sets, measured on the maneuver test data set $D^{Ma}_{Te}$. Probably due to the large number of samples, a favorable classifier parametrization and selection seem to have a significantly higher impact on the classification performance than a clever feature selection. Note that
Fig. 5. ROC curves for the developed maneuver classifiers with their respective best parameter sets and hyperparameters.
TABLE V
SUMMARY OF EXAMINED CLASSIFIERS WITH
PREFERRED HYPERPARAMETERS
this can be concluded, as the classifiers working with feature sets B and C perform only slightly worse regarding BACC and AUC than the other classifiers. However, applying a feature selection still remains reasonable, as it ensures shorter computation times. In addition, the results indicate that the feature selection contributes to an increase of the prediction times in most cases. Note that this does not apply to the RF, as this classifier performs an implicit feature selection.
Fig. 5 additionally shows the ROC curves for the respec-
tive best combination of classifier and feature set regarding
TABLE VI
CONTEXTUAL FEATURES SOLELY IMPACTING SPECIAL SITUATIONS
BACC and AUC for each of the three classifiers. As another result of our investigations, the classification performance for the lane following maneuver (FLW), which is neglected by most researchers in the literature, is notably worse than for the lane changing maneuvers for all considered algorithms. This can be explained by the fact that nearly every sample that cannot be assigned with certainty to one of the lane change maneuvers is classified as lane following, because confusions between a lane change to the right and one to the left are very rare. Thus, a significantly larger number of false positives arises for maneuver class FLW.
In addition, we could reproduce the findings of [8], which showed that lane changes to the left are easier to predict than those to the right. One may explain this phenomenon with the observation that lane changes to the right are often motivated by the intention to leave the highway. Such lane changes can hardly be predicted compared to lane changes to the left, which are often performed to overtake slower leading vehicles. Besides, it can be observed that the classification problem remains solvable even with a significantly decreased number of features, as shown by the MLP classifier with feature set $D_{MLP}$, which only includes 24 features. This illustrates that a decreased number of features sometimes even leads to an improved performance due to a lower dimension of the input space. This can be explained by the fact that numerous features, which we expected to provide insights into specific lane changing situations, seem to have nearly no effect on the general behavior in highway situations. Exemplary features showing this behavior are summarized in Tab. VI.
An explanation of this behavior is that situations affected by these features occur even more rarely than lane changes. However, as automated driving is extremely demanding
Fig. 6. Histogram of detection times τf(a) and τc(b) for RF for maneuver
class LCL with feature set A.
TABLE VII
AUC VALUES IN COMPARISON TO REFERENCE WORKS
Fig. 7. Steps to train and evaluate the position predictors.
exactly in these situations, additional investigations are needed (cf. Sec. VIII).
It is noteworthy that the detection times $\tau_f$ and $\tau_c$ are limited to a maximum of 5 s due to our evaluation methodology. Therefore, the average values $\overline{\tau}_f$ and $\overline{\tau}_c$ presented in Tab. V are likely to be exceeded in practice. To substantiate this assumption, Fig. 6 shows a histogram of the detection times for the RF. The distribution shows numerous situations that are detected 5 s or more in advance.
Altogether, our investigations show that a systematic machine learning workflow, combined with a large amount of data, is able to significantly outperform current state-of-the-art approaches. This becomes obvious when comparing our AUC with that of other approaches. Tab. VII shows that our approach outperforms the others, although we are working with a significantly larger prediction horizon, which makes the classification problem more demanding, as mentioned before. Finally, note that the mentioned state-of-the-art approaches were designed and evaluated on considerably smaller data sets.
Our investigations show that the GNB classifier performs significantly worse than the two other approaches (i.e., MLP and RF). Thus, we only use these two classifiers in our further studies. Additionally, we restrict ourselves to those feature sets and hyperparameter sets showing the best performance (cf. Tab. VIII).
VI. POSITION PREDICTOR TRAINING
This section deals with the training of the models for
position prediction. In particular, we show how to determine
TABLE VIII
SELECTED FEATURE SETS AND HYPERPARAMETERS PER CLASSIFIER
Fig. 8. Illustration of the mixture of experts (MOE) approach.
the GMM parameters. Sec. VI-A relies on the Mixture of Experts (MOE) approach, which was introduced in [3] for lateral predictions and uses Gaussian Mixture Regression (cf. Eq. 1). An alternative approach is presented in Sec. VI-B. As opposed to the MOE approach, it solves the problem in one processing step (cf. Eq. 5). The entire procedure, including the evaluation process (cf. Sec. VII), is depicted in Fig. 7.
A. Mixture of Experts Approach
To train the experts for the three maneuver classes, we divide the data set (cf. Sec. III-D) along the maneuver labels (cf. Fig. 7). Subsequently, we perform a random undersampling of the data points of the FLW maneuver class to obtain approximately the same number of samples as for the other two classes. The basic idea behind this step is that the regression problem for the FLW class is less complex than for the two other classes. Thus, it should be solvable with the same amount of data. Among other things, this data reduction helps to speed up training. As a consequence, the number of FLW samples is decreased by approximately 95%, and the data sets $D^{Po}_{T,LCL}$, $D^{Po}_{T,FLW}$, and $D^{Po}_{T,LCR}$ are constructed (cf. Tab. I).
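A minimal sketch of the undersampling step follows. It assumes that the FLW class is reduced to the mean size of the two lane change classes, since the exact target size is not specified above; the class counts in the example are toy values:

```python
import numpy as np


def undersample_majority(labels, majority="FLW", rng=None):
    """Indices that keep all minority samples and a random subset of the
    majority class whose size equals the mean size of the minority classes."""
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels)
    maj_idx = np.flatnonzero(labels == majority)
    min_idx = np.flatnonzero(labels != majority)
    _, min_counts = np.unique(labels[min_idx], return_counts=True)
    target = min(int(round(min_counts.mean())), len(maj_idx))
    kept_maj = rng.choice(maj_idx, size=target, replace=False)  # without replacement
    return np.sort(np.concatenate([min_idx, kept_maj]))


labels = np.array(["LCL"] * 30 + ["FLW"] * 940 + ["LCR"] * 30)
idx = undersample_majority(labels, rng=np.random.default_rng(0))
# in this toy example, 30 FLW samples remain, i.e. the FLW class shrinks by about 97 %
```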
Afterwards, we train an expert GMM with each of these data sets. These experts are later used in the MOE approach (cf. Fig. 8). We choose a maximum number of $K = 50$ mixture components as well as full covariance matrices,⁴ and fit the GMMs in a variational manner again. Besides, we use the following input feature set $F^I_y$ and the true position $y$ at a defined prediction time $t$ to train the experts in lateral direction (cf. Eq. 10):

$$F^I_y = \{v_y,\ d^{cl}_y\} \qquad (10)$$
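For illustration, the following sketch shows the conditioning step of Gaussian Mixture Regression for a joint GMM with full covariance matrices. Each component yields a conditional mean and variance of the output, and the input density determines the new component weights. The array layout (conditioning dimensions first, one output dimension last) is our assumption:

```python
import numpy as np


def mvn_logpdf(x, mean, cov):
    """log N(x; mean, cov) for one vector x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (len(x) * np.log(2.0 * np.pi) + logdet + maha)


def gmr_condition(x, weights, means, covs, n_in):
    """Condition a joint GMM over [inputs..., output] on the inputs x.

    weights: (K,), means: (K, D), covs: (K, D, D); dimensions 0..n_in-1 are
    conditioning inputs, dimension n_in is the (single) output.
    Returns weights, means and variances of the 1-D mixture p(output | x)."""
    x = np.asarray(x, dtype=float)
    log_h, cond_means, cond_vars = [], [], []
    for w, mu, cov in zip(weights, means, covs):
        mu_x, mu_y = mu[:n_in], mu[n_in:]
        s_xx, s_xy = cov[:n_in, :n_in], cov[:n_in, n_in:]
        s_yx, s_yy = cov[n_in:, :n_in], cov[n_in:, n_in:]
        gain = s_yx @ np.linalg.inv(s_xx)                    # (1, n_in)
        cond_means.append((mu_y + gain @ (x - mu_x))[0])      # conditional mean
        cond_vars.append((s_yy - gain @ s_xy)[0, 0])          # conditional variance
        log_h.append(np.log(w) + mvn_logpdf(x, mu_x, s_xx))  # input responsibility
    log_h = np.array(log_h)
    h = np.exp(log_h - log_h.max())
    return h / h.sum(), np.array(cond_means), np.array(cond_vars)


# One component with unit variances and correlation 0.5, conditioned on x = 2:
h, m, v = gmr_condition([2.0], np.array([1.0]), np.array([[0.0, 0.0]]),
                        np.array([[[1.0, 0.5], [0.5, 1.0]]]), n_in=1)
# m[0] == 1.0 and v[0] == 0.75; a point estimate is np.dot(h, m)
```

Full covariance matrices matter here: with diagonal matrices the cross-covariance `s_yx` would be zero and the conditional means would no longer depend on the inputs within a component.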
Regarding the prediction in longitudinal direction, we need
to distinguish whether or not a preceding vehicle is present.
⁴Preliminary investigations showed that GMMs with diagonal covariance matrices are faster to fit, but by far less accurate.
Fig. 9. Illustration of the integrated approach.
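If no vehicle is in sensor range, both the relative speed of and the distance to that vehicle are set to default values. As involving these default values in the training of the models would lead to bad fits, the input feature sets $F^I_{x,\mathrm{Obj}}$ (preceding vehicle present) and $F^I_{x,\overline{\mathrm{Obj}}}$ (no preceding vehicle) are defined as follows (cf. Eq. 11 & Eq. 12):

$$F^I_{x,\mathrm{Obj}} = \{v_x,\ a_x,\ d^{rel,f}_{v},\ v^{rel,f}_{v}\} \qquad (11)$$

$$F^I_{x,\overline{\mathrm{Obj}}} = \{v_x,\ a_x\} \qquad (12)$$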
As shown in [13], the prediction performance for the longitudinal direction can be significantly increased by learning the deviation from the constant velocity prediction $\hat{x}_{CV}$ instead of the true target position $x$. Consequently, we use the output dimensions $F^O_x$ (cf. Eq. 13):

$$F^O_x = \{x - \hat{x}_{CV},\ t\} \qquad (13)$$
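The residual target of Eq. 13 can be sketched as follows. The sketch assumes that longitudinal positions are measured relative to the observed vehicle's position at the start of the prediction, so that the constant velocity prediction reduces to $\hat{x}_{CV} = v_x t$:

```python
import numpy as np


def cv_displacement(v_x, t):
    """Constant velocity prediction x_CV = v_x * t, relative to the start position."""
    return np.asarray(v_x, dtype=float) * np.asarray(t, dtype=float)


def to_residual(x_true, v_x, t):
    """Output dimension of Eq. 13: deviation x - x_CV that the GMM learns."""
    return np.asarray(x_true, dtype=float) - cv_displacement(v_x, t)


def from_residual(residual, v_x, t):
    """Turn a predicted deviation back into a longitudinal position."""
    return np.asarray(residual, dtype=float) + cv_displacement(v_x, t)


t = np.array([1.0, 2.0, 5.0])
x_true = np.array([30.5, 62.0, 160.0])    # toy ground truth in metres
res = to_residual(x_true, v_x=30.0, t=t)  # [0.5, 2.0, 10.0]
```

The residuals stay small compared to the absolute positions, so the GMM only has to model the deviation from uniform motion rather than the dominant, almost linear growth of $x$ with $t$.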
B. Integrated Approach
As an alternative to the MOE approach, this section presents an integrated approach, which uses the unsplit data set $D^{Po}_{T}$ (cf. Tab. I) and expands the feature sets ($F^I_{x,\mathrm{Obj}}$, $F^I_{x,\overline{\mathrm{Obj}}}$, $F^I_y$) with the maneuver probabilities $P_{LCL}$ and $P_{LCR}$ (cf. Fig. 9). $P_{FLW}$ is left out here, as this information would be redundant to the one provided by $P_{LCL}$ and $P_{LCR}$ (the three probabilities sum to one), and we want to keep the model's dimension as low as possible. Consequently, the task of considering the maneuver probabilities is directly integrated into the model. The resulting one-block solution is both easier to implement and easier to use. In this context, we discovered that GMMs are not well suited to fit probabilities bounded to values between 0 and 1. This is especially the case if most of the probabilities tend towards the extreme values (cf. Fig. 10 (a)). Hence, we expand our data set with a duplicate of each data point, in which the probability values are mirrored at 0 for original probabilities lower than 0.5 and at 1 for all other original probabilities. This way, we are able to generate the density shown in Fig. 10 (b), which we found easier to fit with GMMs. Note that before our adjustment, the density contained an abrupt jump, especially at $P_{LCL} = 0$. As such discontinuities are only representable by numerous Gaussian components, which are symmetrical and smooth by definition, many components needed in other areas of the data space would be wasted for this purpose.
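The mirroring step can be sketched as follows. We assume here that all probability columns of a duplicate are mirrored at once; the column indices and helper names are hypothetical:

```python
import numpy as np


def mirror(p):
    """Reflect at 0 for p < 0.5 (p -> -p) and at 1 otherwise (p -> 2 - p)."""
    p = np.asarray(p, dtype=float)
    return np.where(p < 0.5, -p, 2.0 - p)


def augment_with_mirrored(X, prob_cols):
    """Append one duplicate per row whose probability columns are mirrored."""
    X = np.asarray(X, dtype=float)
    dup = X.copy()
    dup[:, prob_cols] = mirror(X[:, prob_cols])
    return np.vstack([X, dup])


# Toy rows [P_LCL, P_LCR, v_y]; only the first two columns are probabilities.
X = np.array([[0.10, 0.90, 0.3],
              [0.50, 0.00, -0.1]])
X_aug = augment_with_mirrored(X, prob_cols=[0, 1])
```

After the reflection, a pile-up of samples at 0 becomes a symmetric peak centered at 0 instead of a one-sided edge, which a few smooth Gaussian components can represent.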
The actual training of the integrated GMM is performed in a variational fashion, similarly to the training of the experts, with $K = 50$ components and full covariance matrices, but with the entire training data set. Thus, no undersampling is applied, and the unbalanced nature of the maneuver classes and their actual frequencies are preserved.
Fig. 10. Density of PLCL before (a) and after (b) adjustment.
VII. POSITION ESTIMATION EVALUATION
In order to evaluate the position predictions, one first has to decide which of the considered classifiers fits best as gating network in the Mixture of Experts (MOE) and in the integrated approach, respectively. Hence, we calculate the average log-likelihoods $L$ on the entire position test data set $D^{Po}_{Te}$ (cf. Sec. III-D). Note that, as also suggested in [20], this data set is not balanced according to the maneuver labels. In particular, the unbalanced nature of the data allows us to draw general conclusions about the performance, independent of the respective driving maneuver. In this context, using the average log-likelihood as quality criterion for comparing different approaches is beneficial, as it rates the quality of the predicted probability density distribution instead of assessing only the ability to predict one single position with maximized accuracy. Moreover, the log-likelihood is exactly the value to be maximized when fitting a GMM. However, as $L$ cannot be interpreted as a physical quantity, it is solely useful for comparison purposes. As we are also interested in assessing the spatial error and in achieving comparability, we additionally investigate this quantity for the best-performing approaches in the following subsections.
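The per-sample average log-likelihood of one-dimensional GMM predictions can be computed with a numerically stable log-sum-exp, as sketched below. The array shapes are our assumptions, and all mixture weights are assumed to be strictly positive:

```python
import numpy as np


def average_log_likelihood(y_true, weights, means, variances):
    """Mean over samples of log p(y_i) under per-sample 1-D GMM predictions.

    y_true: (N,); weights, means, variances: (N, K), one mixture per sample."""
    y = np.asarray(y_true, dtype=float)[:, None]
    w, mu, var = (np.asarray(a, dtype=float) for a in (weights, means, variances))
    log_comp = np.log(w) - 0.5 * np.log(2.0 * np.pi * var) - 0.5 * (y - mu) ** 2 / var
    peak = log_comp.max(axis=1, keepdims=True)  # log-sum-exp trick
    log_lik = peak[:, 0] + np.log(np.exp(log_comp - peak).sum(axis=1))
    return float(log_lik.mean())


L_y = average_log_likelihood([0.0, 1.0], [[0.5, 0.5], [0.5, 0.5]],
                             [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]])
# both samples equal log N(0; 0, 1) = -0.5 * log(2 pi), i.e. about -0.9189
```

Since $L$ depends on the units and on the spread of the data, such values are only meaningful when compared across methods on the same test set, as stated above.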
Tab. IX shows the per-sample log-likelihood of different approaches for the longitudinal ($L_x$) as well as the lateral ($L_y$) direction. In this context, we use the already introduced classifiers RF and MLP in combination with four different strategies to combine the experts' position estimates, which serve as the weighting function $w_m(I)$ introduced in Eq. 1:
1) Raw probabilities (Raw): This strategy directly uses the raw probabilities $P^{clf}_m(I)$ issued by the classifiers as gating probabilities. This means that we concatenate the three GMMs and multiply the mixture weights with the probabilities issued by the respective classifier: $w^{Raw}_m(I) = P^{clf}_m(I)$.
2) Winner Takes it All (WTA): This strategy uses the outputs of the GMM for the maneuver class with the largest probability according to the respective classifier (cf. Eq. 14).
$$w^{WTA}_m(I) = \begin{cases} 1, & \text{if } P^{clf}_m(I) = \max_{q \in M} P^{clf}_q(I) \\ 0, & \text{else} \end{cases} \qquad (14)$$
3) Prior Weighted Raw probabilities (PW-Raw): This strategy accounts for the fact that the classifiers were trained on a balanced data set. Thus, it multiplies the raw probabilities with the prior probabilities of each maneuver class: $w^{PW\text{-}Raw}_m(I) = \mathrm{norm}\big(P^{clf}_m(I) \cdot \pi_m\big)$.
TABLE IX
PER SAMPLE LOG-LIKELIHOODS WITH DIFFERENT
CLASSIFIERS AND MOE STRATEGIES
4) Integrated GMM (I-GMM): This strategy directly uses
the integrated approach presented in Sec. VI-B to predict
the probability distributions and follows Eq. 5.
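The three classifier-based weighting functions and the concatenation of the experts' mixtures can be sketched as follows. We interpret norm(·) as a normalization to a sum of one and resolve WTA ties in favor of the first class. Both choices are assumptions, as is the representation of each expert's prediction as a one-dimensional mixture (weights, means, variances). The prior values are those of Sec. VII ($\pi_{LCL} = \pi_{LCR} = 0.03$, $\pi_{FLW} = 0.94$).

```python
import numpy as np

CLASSES = ("LCL", "FLW", "LCR")
PRIORS = np.array([0.03, 0.94, 0.03])  # pi_LCL, pi_FLW, pi_LCR


def w_raw(p):
    """Raw: use the classifier probabilities P^clf_m(I) directly."""
    return np.asarray(p, dtype=float)


def w_wta(p):
    """WTA (Eq. 14): weight 1 for the most probable class (first one on ties)."""
    p = np.asarray(p, dtype=float)
    w = np.zeros_like(p)
    w[np.argmax(p)] = 1.0
    return w


def w_pw_raw(p, priors=PRIORS):
    """PW-Raw: multiply by the class priors and renormalize to sum one."""
    w = np.asarray(p, dtype=float) * priors
    return w / w.sum()


def combine_experts(w, experts):
    """MOE mixture: concatenate the experts' components and scale each expert's
    component weights by its gating weight w_m. experts[m] = (weights, means, vars)."""
    weights = np.concatenate([w_m * np.asarray(e[0], float) for w_m, e in zip(w, experts)])
    means = np.concatenate([np.asarray(e[1], float) for e in experts])
    variances = np.concatenate([np.asarray(e[2], float) for e in experts])
    return weights, means, variances


p = [0.2, 0.5, 0.3]                                # balanced-trained classifier output
w = w_pw_raw(p)                                    # [0.006, 0.47, 0.009] / 0.485
experts = [([0.6, 0.4], [3.0, 3.5], [0.2, 0.3]),   # toy LCL expert (lateral, m)
           ([1.0], [0.0], [0.05]),                 # toy FLW expert
           ([1.0], [-3.5], [0.3])]                 # toy LCR expert
mix_w, mix_mu, mix_var = combine_experts(w, experts)
```

In the example, the prior weighting shifts most of the mass to FLW, which undoes the artificial 1:1:1 class balance the classifier saw during training.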
To demonstrate the benefits of our approach, which combines maneuver classification and position prediction, we additionally compare its performance with reference strategies. First, we use the labels as a perfect classifier according to Eq. 15:

$$w^{Labels}_m = \begin{cases} 1, & \text{if } m = L \\ 0, & \text{else} \end{cases} \qquad (15)$$

Moreover, we use the pure prior probabilities ($\pi_{LCL} = \pi_{LCR} = 0.03$; $\pi_{FLW} = 0.94$) as the most naive classifier ($w^{Priors}_m = \pi_m$) and a strategy without a classifier, referred to as NOCLF in the following.
For the longitudinal direction, Tab. IX shows that the reference solution without any previous maneuver classification (NOCLF) produces slightly better results than the other combinations. Although it may seem trivial that lane changes need not be taken into account when predicting the longitudinal behavior, this is noteworthy, as we had expected beforehand that lane changes to the left mostly go along with an acceleration, whereas braking actions are extremely rare.
By contrast, the benefits of the Mixture of Experts (MOE) approach come into effect for the lateral direction. As shown in Tab. IX, the combination of prior weighting and MLP probabilities performs best. Furthermore, all combinations involving the integrated approach perform only slightly worse than, or even better (RF) than, the combinations using prior-weighted probabilities. As a benefit, these models are easier to use and more robust against poor or uncalibrated maneuver probabilities, without needing an additional calibration step. This can be explained by the fact that these models perform an implicit probability calibration during the training of the GMM.
Moreover, we learned that the WTA strategy has no practical relevance, as it does not necessarily produce continuous position predictions over consecutive time steps, which the other strategies achieve by definition. Besides, in case of a misclassification, the WTA strategy solely queries one specific expert model, which might not be applicable in that area of the data space; this clearly decreases the overall performance.
In the following, we investigate the spatial errors of the best
combinations (lateral: MLP classifier with PW-Raw strategy;
longitudinal: NOCLF), as previously introduced. For this
purpose, we present the applied performance measures in
Sec. VII-A and then show the obtained results in Sec. VII-B.
A. Performance Measures
To measure the spatial performance of our predictions, we rely on the unbalanced position evaluation data set $D^{Po}_{Te}$. The latter contains the inputs needed by the maneuver classifiers and position predictors ($I$) as well as the true trajectories $TR$ according to Eq. 16.

$$D^{Po}_{Te} = \begin{bmatrix} I & TR \end{bmatrix} \qquad (16)$$

$TR$ contains $N = 20000$ trajectories of 5 s length, sampled at 10 Hz (hence roughly one million samples), according to Eq. 17:

$$TR = \begin{bmatrix} tr_0 & tr_1 & \dots & tr_N \end{bmatrix} \qquad (17)$$

Each trajectory $tr_i$ consists of 51 corresponding $x$ and $y$ positions, according to Eq. 18:

$$tr_i = \begin{bmatrix} x^i_{0.0} & y^i_{0.0} \\ x^i_{0.1} & y^i_{0.1} \\ \vdots & \vdots \\ x^i_{5.0} & y^i_{5.0} \end{bmatrix} \qquad (18)$$
The predicted trajectories $\hat{TR}$ are then calculated with the described classifiers and position predictors in the same format as $TR$. However, as Gaussian Mixture Regression produces probability densities instead of point estimates, the latter have to be calculated first. This is accomplished by calculating the center of gravity of the density, as described in [3].
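Accordingly, the prediction error $e^i_t$ at a specific prediction time $t$ for the $i$-th trajectory is calculated separately for the two dimensions $x$ and $y$ as follows (Eq. 19):

$$e^i_t = \begin{bmatrix} e^i_{x,t} & e^i_{y,t} \end{bmatrix} = \begin{bmatrix} |x^i_t - \hat{x}^i_t| & |y^i_t - \hat{y}^i_t| \end{bmatrix} \qquad (19)$$

The variables $\hat{x}$ and $\hat{y}$ describe the estimated positions, whereas $x$ and $y$ correspond to the actual ones. The individual errors $e^i_t$ of all trajectories $i$ are concatenated to $E_t$ (cf. Eq. 20):

$$E_t = \begin{bmatrix} E_{x,t} & E_{y,t} \end{bmatrix} = \begin{bmatrix} e^i_{x,t} & e^i_{y,t} \end{bmatrix}_{i} \qquad (20)$$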
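Eqs. 19 and 20 and the point estimate can be sketched as follows. For a Gaussian mixture, the center of gravity equals the weighted mean of the component means. The (N, 51, 2) trajectory layout is our assumption:

```python
import numpy as np


def center_of_gravity(weights, means):
    """Point estimate of a Gaussian mixture: its mean, sum_k w_k * mu_k."""
    w = np.asarray(weights, dtype=float)
    return float(np.dot(w / w.sum(), np.asarray(means, dtype=float)))


def errors(TR, TR_hat):
    """Eqs. 19/20: absolute x/y errors for all trajectories and times.

    TR, TR_hat: (N, 51, 2) arrays of [x, y] at t = 0.0, 0.1, ..., 5.0 s.
    E[:, k, 0] is E_x,t and E[:, k, 1] is E_y,t for the k-th prediction time."""
    return np.abs(np.asarray(TR, dtype=float) - np.asarray(TR_hat, dtype=float))


def median_lateral_error(E):
    """Median lateral error over all trajectories, one value per prediction time."""
    return np.median(E[:, :, 1], axis=0)


TR = np.zeros((3, 51, 2))
TR_hat = np.zeros((3, 51, 2))
TR_hat[:, :, 1] = np.array([0.1, 0.2, 0.3])[:, None]  # constant toy lateral offsets
E = errors(TR, TR_hat)
med = median_lateral_error(E)                         # 0.2 at every time step
```

Note that for a bimodal prediction, e.g. one mode on each lane, the center of gravity can lie between the modes. This is one reason why such point-error metrics understate the information contained in the densities.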
At this point, we want to re-emphasize that, although this way of evaluating the performance produces easy-to-interpret results, it disregards that our original outputs (i.e., spatial probability densities) contain much more information than a single point estimate.
B. Results & Discussion
Fig. 11 shows, on the left side, the performance of the selected combinations of classifiers and mixing strategies (highlighted in Tab. IX) at a prediction horizon of 5 s for the longitudinal ($E_{x,5}$) and the lateral ($E_{y,5}$) direction. For comparison, a constant velocity (CV) prediction and a Mixture of Experts (MOE) with labels⁵ are shown. The right-hand side of Fig. 11
⁵Using the MOE with the labels as input corresponds to the assumption of a perfect classifier.
Fig. 11. Visualization of the error distribution (left) in longitudinal and lateral direction and the median lateral error as function of the prediction time (right).
Fig. 12. Predicted probability distribution of future vehicle positions for an illustrative situation.
TABLE X
COMPARING LATERAL PREDICTION PERFORMANCE
WITH RELATED WORKS
shows the development of the median lateral error $\tilde{E}_{y,t}$ as a function of the prediction time $t$.
As the plots indicate, our position prediction system is able to produce results comparable to those obtained with a perfect maneuver classification, in both lateral and longitudinal direction. Additionally, the plots show that we clearly outperform simple models such as CV and reach a very small median lateral prediction error of less than 0.21 m at a prediction horizon of 5 s. As shown in Tab. X, this is remarkable compared to other approaches. Note that we did not include studies in this compilation that report the root-mean-square error (RMSE), which amounts to 0.64 m for our approach. On the one hand, we follow [34], which points out that RMSE measures do not allow for a comparison across different data sets, as the values depend on the size of the data set. On the other hand, the challenge tackled by us (cf. Sec. I-A) is to predict the probability distribution of future vehicle positions
TABLE XI
PREDICTION ERRORS PER CLASS AND DIRECTION
Fig. 13. Prediction confidence against lateral prediction errors.
rather than single-shot estimates. Consequently, we did not optimize the predictions to minimize the RMSE. Therefore, it is not surprising that other works, which explicitly minimize this value but ignore distribution estimation, perform better with respect to RMSE.
As shown in [3], these results are dominated by the most
frequent maneuver class (FLW). Hence, Tab. XI complemen-
tarily shows the errors for 20000 maneuvers of each type.
TABLE XII
DESCRIPTION OF THE EVALUATED FEATURES f OF AN OBSERVED VEHICLE o AND USAGE
OF THE FEATURES IN THE CONSTRUCTED FEATURE SETS (A–D)
As can be seen, the errors for the lane change maneuvers are considerably larger than those for lane following. On the one hand, this can be explained by the more complex regression task. On the other hand, the predictions are subject to higher uncertainties in case of a lane change, as shown by the predicted distributions (cf. Fig. 12), whereas single point estimates ignore this uncertainty. Note that the increased uncertainties are caused by the lack of knowledge about the exact point in time at which the maneuver will be completed. This holds true even if the classifier has already informed the position prediction about an upcoming lane change.
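Why a single point estimate hides the lane change uncertainty can be made concrete by computing the first two moments of a predicted distribution. The sketch assumes that the lateral prediction at a fixed horizon is a one-dimensional Gaussian mixture; the component values are hypothetical and merely mimic two possible progress states of a lane change.

```python
import numpy as np


def mixture_moments(weights, means, stds):
    """Mean and standard deviation of a 1-D Gaussian mixture.

    By the law of total variance, the mixture variance is the weighted
    within-component variance plus the spread of the component means:
    Var = sum_k w_k * (sigma_k**2 + mu_k**2) - mean**2.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize in case the weights do not sum to 1
    mu = np.asarray(means, dtype=float)
    sigma = np.asarray(stds, dtype=float)
    mean = float(np.sum(w * mu))
    var = float(np.sum(w * (sigma**2 + mu**2)) - mean**2)
    return mean, float(np.sqrt(max(var, 0.0)))


# Hypothetical lateral prediction [m] at a fixed horizon during a lane change:
# one component for "maneuver still in progress", one for "maneuver completed".
mean, std = mixture_moments(weights=[0.5, 0.5], means=[2.0, 3.5], stds=[0.5, 0.5])
print(mean, round(std, 3))  # 2.75 0.901
```

The point estimate of 2.75 m lies between both modes and carries no information about the spread; the mixture standard deviation of about 0.90 m exceeds that of either component (0.5 m) because it also captures the spread between the two progress states.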
Complementary to these quantitative evaluations, we performed qualitative tests and visualized individual situations along with our predictions. To illustrate this, we attached a short video and present a single frame in Fig. 12. More precisely, Fig. 12 shows the predictions during an upcoming lane change, along with the described uncertainties. In addition, we show the confidence of our predictions (Conf_x, Conf_y), which gives the consumers of this information an important indication of the predictions' reliability. This value is calculated similarly to [13] through additional GMMs fitted in the input dimensions. To demonstrate its general usability, we visualized the confidence value divided by the standard
deviation against the lateral prediction errors at T_h = 5 s in Fig. 13. As can be seen, and as expected, the prediction errors decrease with increasing confidence values.
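The confidence value obtained from additional GMMs fitted in the input dimensions can be sketched with scikit-learn's GaussianMixture. The exact formulation of [13] is not reproduced here. As an assumption, the sketch maps the input log-density of a query to its percentile among the log-densities of the training inputs, so that unusual inputs receive low confidence. The function names and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def fit_input_density(x_train: np.ndarray, n_components: int = 3, seed: int = 0) -> GaussianMixture:
    """Fit a GMM to the model inputs (features), not to the predicted outputs."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="full", random_state=seed)
    return gmm.fit(x_train)


def input_confidence(gmm: GaussianMixture, x_query: np.ndarray, x_ref: np.ndarray) -> np.ndarray:
    """Hypothetical confidence in [0, 1]: share of reference inputs whose
    log-density under the GMM does not exceed that of each query input."""
    ref_logpdf = np.sort(gmm.score_samples(x_ref))
    query_logpdf = gmm.score_samples(x_query)
    return np.searchsorted(ref_logpdf, query_logpdf, side="right") / len(ref_logpdf)


# Synthetic 2-D feature vectors standing in for the real input features.
rng = np.random.default_rng(0)
x_train = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
gmm = fit_input_density(x_train)

# One typical input near the data center and one far outside the training data.
x_query = np.array([[0.0, 0.0], [6.0, 6.0]])
conf = input_confidence(gmm, x_query, x_train)
print(conf)  # high value for the first query, close to 0 for the second
```

Using a percentile rather than the raw density keeps the score bounded and comparable across feature spaces of different dimensionality; other normalizations are equally plausible.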
VIII. SUMMARY AND OUTLOOK
This work introduces a machine learning workflow that enables the computation of long-term behavior predictions for surrounding vehicles in highway scenarios. For the first time, prediction techniques for both driving maneuvers and positions, covering lateral as well as longitudinal behavior, are presented in one combined compilation. The developed modules are evaluated in detail on a large amount of real-world data and challenge established state-of-the-art approaches.
To further improve the quality of the presented behavior predictions, especially in complex situations, we are working on various enhancements and conducting additional studies. We are currently migrating the prediction strategies to an experimental vehicle to enable detailed investigations regarding run time and resource usage. Meanwhile, we are about to apply our models to predict the movements of surrounding vehicles rather than those of the ego vehicle. Besides, we plan to apply our predictor to a publicly available data set such as highD [35] or NGSIM to improve comparability. In addition, we want to investigate up to which maximum prediction horizon (beyond 5 s) the maneuver detection still produces useful insights.
Moreover, we see high potential in identifying demanding scenarios and explicitly integrating contextual knowledge (e.g., weather, traffic, time of day, or local specialties) into our models. First experiments in this direction have shown that contextual properties can have a considerable impact on driving behavior.
ACKNOWLEDGMENT
The authors would like to thank Mercedes-Benz AG
Research and Development for providing real-world measure-
ment data, which enabled us to perform our experiments.
Furthermore, they would like to thank the Institute of Data-
bases and Information Systems at Ulm University as well as
Prof. Dr. Klaus-Dieter Kuhnert from the Institute of Realtime
Learning Systems at the University of Siegen for supporting
our studies.
REFERENCES
[1] G. Weidl, A. L. Madsen, S. Wang, D. Kasper, and M. Karlsen, “Early and accurate recognition of highway traffic maneuvers considering real world application: A novel framework using Bayesian networks,” IEEE Intell. Transp. Syst. Mag., vol. 10, no. 3, pp. 146–158, Jun. 2018.
[2] S. Lefèvre, D. Vasquez, and C. Laugier, “A survey on motion prediction and risk assessment for intelligent vehicles,” ROBOMECH J., vol. 1, no. 1, p. 1, 2014.
[3] J. Schlechtriemen, F. Wirthmueller, A. Wedel, G. Breuel, and K.-D. Kuhnert, “When will it change the lane? A probabilistic regression approach for rarely occurring events,” in Proc. IEEE Intell. Vehicles Symp. (IV), Jun. 2015, pp. 1373–1379.
[4] X. Meng et al., “MLlib: Machine learning in Apache Spark,” J. Mach. Learn. Res., vol. 17, no. 1, pp. 1235–1241, 2016.
[5] F. Pedregosa et al., “Scikit-learn: Machine learning in Python,” J. Mach. Learn. Res., vol. 12, pp. 2825–2830, Oct. 2011.
[6] C. Wissing, T. Nattermann, K.-H. Glander, C. Hass, and T. Bertram, “Lane change prediction by combining movement and situation based probabilities,” IFAC-PapersOnLine, vol. 50, no. 1, pp. 3554–3559, Jul. 2017.
[7] J. Schlechtriemen, A. Wedel, J. Hillenbrand, G. Breuel, and K.-D. Kuhnert, “A lane change detection approach using feature ranking with maximized predictive power,” in Proc. IEEE Intell. Vehicles Symp., Jun. 2014, pp. 108–114.
[8] M. Bahram, C. Hubmann, A. Lawitzky, M. Aeberhard, and D. Wollherr, “A combined model- and learning-based framework for interaction-aware maneuver prediction,” IEEE Trans. Intell. Transp. Syst., vol. 17, no. 6, pp. 1538–1550, Jun. 2016.
[9] D. Lenz, F. Diehl, M. T. Le, and A. Knoll, “Deep neural networks for Markovian interactive scene prediction in highway scenarios,” in Proc. IEEE Intell. Vehicles Symp. (IV), Jun. 2017, pp. 685–692.
[10] F. Altché and A. de La Fortelle, “An LSTM network for highway trajectory prediction,” in Proc. IEEE 20th Int. Conf. Intell. Transp. Syst. (ITSC), Oct. 2017, pp. 353–359.
[11] J. Wiest, M. Höffken, U. Kreßel, and K. Dietmayer, “Probabilistic trajectory prediction with Gaussian mixture models,” in Proc. IEEE Intell. Vehicles Symp., Jun. 2012, pp. 141–146.
[12] J. Wiest, F. Kunz, U. Kreßel, and K. Dietmayer, “Incorporating categorical information for enhanced probabilistic trajectory prediction,” in Proc. 12th Int. Conf. Mach. Learn. Appl., vol. 1, Dec. 2013, pp. 402–407.
[13] J. Schlechtriemen, A. Wedel, G. Breuel, and K.-D. Kuhnert, “A probabilistic long term prediction approach for highway scenarios,” in Proc. 17th Int. IEEE Conf. Intell. Transp. Syst. (ITSC), Oct. 2014, pp. 732–738.
[14] J. Colyar and J. Halkias, “US highway 101 dataset,” Federal Highway Admin., Washington, DC, USA, Tech. Rep. FHWA-HRT-07-030, 2007.
[15] S. Yoon and D. Kum, “The multilayer perceptron approach to lateral motion prediction of surrounding vehicles for autonomous vehicles,” in Proc. IEEE Intell. Vehicles Symp. (IV), Jun. 2016, pp. 1307–1312.
[16] H. Woo et al., “Lane-change detection based on vehicle-trajectory prediction,” IEEE Robot. Autom. Lett., vol. 2, no. 2, pp. 1109–1116, Apr. 2017.
[17] C. Wissing, T. Nattermann, K.-H. Glander, and T. Bertram, “Probabilistic time-to-lane-change prediction on highways,” in Proc. IEEE Intell. Vehicles Symp. (IV), Jun. 2017, pp. 1452–1457.
[18] C. Wissing, T. Nattermann, K.-H. Glander, and T. Bertram, “Trajectory prediction for safety critical maneuvers in automated highway driving,” in Proc. 21st Int. Conf. Intell. Transp. Syst. (ITSC), Nov. 2018, pp. 131–136.
[19] N. Deo, A. Rangesh, and M. M. Trivedi, “How would surround vehicles move? A unified framework for maneuver classification and motion prediction,” IEEE Trans. Intell. Vehicles, vol. 3, no. 2, pp. 129–140, Jun. 2018.
[20] N. Deo and M. M. Trivedi, “Convolutional social pooling for vehicle trajectory prediction,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2018, pp. 1468–1476.
[21] S. Klingelschmitt, M. Platho, H.-M. Groß, V. Willert, and J. Eggert, “Combining behavior and situation information for reliably estimating multiple intentions,” in Proc. IEEE Intell. Vehicles Symp., Jun. 2014, pp. 388–393.
[22] J. Schlechtriemen, K. P. Wabersich, and K.-D. Kuhnert, “Wiggling through complex traffic: Planning trajectories constrained by predictions,” in Proc. IEEE Intell. Vehicles Symp. (IV), Jun. 2016, pp. 1293–1300.
[23] D. Sadigh, S. Sastry, S. A. Seshia, and A. D. Dragan, “Planning for autonomous cars that leverage effects on human actions,” in Proc. Robot., Sci. Syst., vol. 2. Ann Arbor, MI, USA: RSS Foundation, 2016.
[24] S. Tattersall, U. Petersen, and J. Breuer, “Ein Messdatenmanagementsystem für die Feldabsicherung von neuen Fahrerassistenzsystemen,” in Proc. VDI-Berichte, no. 2166, 2012, pp. 203–214.
[25] A. Thorvaldsson and V. Bandi, “Reference path estimation for lateral vehicle control,” M.S. thesis, Dept. Signals Syst., Chalmers Univ. Technol., Gothenburg, Sweden, 2015.
[26] I. Guyon and A. Elisseeff, “An introduction to variable and feature selection,” J. Mach. Learn. Res., vol. 3, pp. 1157–1182, Mar. 2003.
[27] L. Fahrmeir, C. Heumann, R. Künstler, I. Pigeot, and G. Tutz, Statistik: Der Weg zur Datenanalyse. Berlin, Germany: Springer, 2016.
[28] M. A. Hall, “Correlation-based feature selection for discrete and numeric class machine learning,” in Proc. Int. Conf. Mach. Learn. (ICML). San Mateo, CA, USA: Morgan Kaufmann, 2000, pp. 359–366.
[29] R. Kohavi and G. H. John, “Wrappers for feature subset selection,” Artif. Intell., vol. 97, nos. 1–2, pp. 273–324, Dec. 1997.
[30] T. Hastie, J. Friedman, and R. Tibshirani, The Elements of Statistical Learning, vol. 1. New York, NY, USA: Springer, 2001.
[31] A. Corduneanu and C. M. Bishop, “Variational Bayesian model selection for mixture distributions,” in Artificial Intelligence and Statistics. Waltham, MA, USA: Morgan Kaufmann, 2001, pp. 27–34.
[32] K. H. Brodersen, C. S. Ong, K. E. Stephan, and J. M. Buhmann, “The balanced accuracy and its posterior distribution,” in Proc. 20th Int. Conf. Pattern Recognit., Aug. 2010, pp. 3121–3124.
[33] K. P. Murphy, Machine Learning: A Probabilistic Perspective. Cambridge, MA, USA: MIT Press, 2012.
[34] C. Willmott and K. Matsuura, “Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance,” Climate Res., vol. 30, pp. 79–82, 2005.
[35] R. Krajewski, J. Bock, L. Kloeker, and L. Eckstein, “The highD dataset: A drone dataset of naturalistic vehicle trajectories on German highways for validation of highly automated driving systems,” in Proc. 21st Int. Conf. Intell. Transp. Syst. (ITSC), Nov. 2018, pp. 2118–2125.
Florian Wirthmüller received the B.Sc. and M.Sc.
degrees in computer engineering with a study focus
on cognitive technical systems from the Ilmenau
University of Technology in 2015 and 2017, respec-
tively. He is currently pursuing the Ph.D. degree
with the Institute of Databases and Information Sys-
tems (DBIS), Ulm University, in cooperation with
Mercedes-Benz AG Research and Development.
His research interests include automated driving,
big data analytics, machine learning, and backend
architectures supporting manually driven as well as
automated vehicles.
Julian Schlechtriemen received the Diploma degree
in applied computer science with electrical engi-
neering as the main subject from the University of
Siegen in 2012, where he is currently pursuing the
Ph.D. degree with the Institute of Realtime Learning
Systems, in cooperation with Mercedes-Benz AG
Research and Development. His research interests
include vehicle and driver prediction using machine
learning techniques and the incorporation of this
information in behavior and trajectory planning.
Jochen Hipp received the Diploma degree in com-
puter science and economics and the Ph.D. degree
from the University of Tübingen. Since then, deriving knowledge from massive data sets has been part of his daily work at Mercedes-Benz AG Research and Development. Over the years, he has been active in
different fields such as root cause analysis and early
warning based on aftersales data, target-oriented
endurance testing, customer profiles, advanced driver
assistance systems, autonomous driving with a focus
on high definition maps, vehicle localization, and
backend support. He is currently working on the analysis of field data to
improve current and future driver assistance system generations.
Manfred Reichert is a Full Professor with Ulm
University, where he is the Director of the Institute
of Databases and Information Systems (DBIS). His
research interests include business process man-
agement, intelligent information systems, process
and data mining, and mobile services. Furthermore,
he served as the General Chair of the BPM
2009 and EDOC 2014 conferences as well as the
BPM 2015 workshops. He was the PC Co-Chair of
the BPM 2008, CoopIS 2011, and EDOC 2013 con-
ferences. He has coauthored a Springer book on
process flexibility and obtained the BPM Test of Time Award from the BPM
Conference in 2013.