Predicting the Time Until a Vehicle Changes the Lane Using LSTM-Based Recurrent Neural Networks [original]

IEEE ROBOTICS AND AUTOMATION LETTERS, VOL. 6, NO. 2, APRIL 2021 2357

Predicting the Time Until a Vehicle Changes the Lane

Using LSTM-Based Recurrent Neural Networks

Florian Wirthmüller , Marvin Klimke , Julian Schlechtriemen , Jochen Hipp , and Manfred Reichert

Abstract—To plan safe and comfortable trajectories for auto-

mated vehicles on highways, accurate predictions of traffic situ-

ations are needed. So far, a lot of research effort has been spent

on detecting lane change maneuvers rather than on estimating the

point in time a lane change actually happens. In practice, however,

this temporal information might be even more useful. This letter

deals with the development of a system that accurately predicts the

time to the next lane change of surrounding vehicles on highways

using long short-term memory-based recurrent neural networks.

An extensive evaluation based on a large real-world data set shows

that our approach is able to make reliable predictions, even in

the most challenging situations, with a root mean squared error

around 0.7 seconds. Already 3.5 seconds prior to lane changes the

predictions become highly accurate, showing a median error of less

than 0.25 seconds. In summary, this article forms a fundamental

step towards downstreamed highly accurate position predictions.

Index Terms—Intelligent Transportation Systems, AI-Based

Methods, Automated Driving, Vehicle Motion Prediction.

I. INTRODUCTION

AUTOMATEDDRIVINGisontherise,makingtrafficsafer

and more comfortable already today. However, handing

over full control to a system still constitutes a particular chal-

lenge. To reach the goal of fully automated driving, precise

information about the positions as well as the behavior of

surrounding traffic participants needs to be gathered. Moreover,

an estimation about the development of the traffic situation,

i. e. the future motion of surrounding vehicles, is at least as

important.Only if the system is taught to perform an anticipatory

style of driving similar to a human driver, acceptable levels

of comfort and safety can be achieved. Therefore, every step

towards improved predictions of surrounding vehicles’ behavior

in terms of precision as well as wealth of information is valuable.

Manuscript received October 14, 2020; accepted January 24, 2021. Date of

publication February 11, 2021; date of current version March 15, 2021. This

letter was recommended for publication by Associate Editor J. Kim and Editor

Y. Choi upon evaluation of the reviewers’ comments. (Corresponding author:

Florian Wirthmüller.)

Florian Wirthmüller is with the Mercedes-Benz AG, 71034 Böblingen,

Germany, and also with the Institute of Databases and Information Systems

(DBIS), Ulm University, 89081 Ulm, Germany (e-mail: florian.wirthmueller@

daimler.com).

Marvin Klimke is with the RWTH Aachen University, 52062 Aachen, Ger-

many (e-mail: marvin.klimk[email protected]).

Julian Schlechtriemen is with the Mercedes-Benz AG, 71034 Böblingen,

Germany,andalsowith theInstituteof RealtimeLearningSystems, Universityof

Siegen, 57076 Siegen, Germany (e-mail: julian.schlechtriemen@daimler.com).

Jochen Hipp is with the Mercedes-Benz AG, 71034 Böblingen, Germany

(e-mail: jochen.hipp@daimler.com).

Manfred Reichert is with the Institute of Databases and Information Systems

(DBIS), Ulm University, 89081 Ulm, Germany (e-mail: manfred.reichert@uni-

ulm.de).

Digital Object Identifier 10.1109/LRA.2021.3058930

Fig. 1. A lot of previous works investigated systems that classify whether or

not a lane change is going to take place. Instead, the proposed approach estimates

the time to the next lane change directly. This information is more useful and

covers the classification information implicitly.

Although many works in the field of motion prediction focus

on predicting whether or not a lane change maneuver will take

place, predictions on the exact point in time the lane changes

will occur have not been well investigated. This temporal in-

formation, however, is extremely important, as emphasized

by Fig. 1. Hence, this letter deals with the development of

a system that predicts the time to upcoming lane changes of

surrounding vehicles precisely. The system is developed and

thoroughly evaluated based on a large real-world data set, which

is representative for highway driving in Germany. As methodical

basis, the state-of-the-art technique of long short-term memory

(LSTM)-based recurrent neural networks (RNNs) is applied.

Therefore, we form the basis for downstreamed highly accurate

position predictions. The novelty and main contribution of our

article results from using and thoroughly investigating known

techniques with the special perspective of (vehicle) motion

prediction rather than from developing completely new learning

methods. Therefore, we changed the learning paradigm from

classification to regression and obtained a significant gain in

knowledge. In addition, to the best of our knowledge, there is

no other article comparing an approach for time to lane change

regression with a maneuver classification approach.

The remainder of this paper is structured as follows: Section II

discusses related work. Section III then describes the proposed

approach, followed by its evaluation based on real-world mea-

surements in Section IV. Finally, Section V concludes the article

with a short summary and an outlook on future work.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

2358 IEEE ROBOTICS AND AUTOMATION LETTERS, VOL. 6, NO. 2, APRIL 2021

II. RELATED WORK

An overview of motion prediction approaches is presented

in [1], which distinguishes three categories: physics-based,

maneuver-based, and interaction-aware approaches. Maneuver-

based approaches, which are most relevant in the context of our

work, typically define three fundamental maneuver classes: lane

change to the left LCL, lane change to the right LCR, and lane

following FLW [2]–[4]. These maneuver classes are used to

simplify modeling the entirety of highway driving and its multi-

modality. Based on this categorization, the prediction problem is

interpreted as a classification task with the objective to estimate

the upcoming maneuver or the maneuver probabilities based on

the current sensor data.

An approach that decomposes the lane change probability

into a situation- and a movement-based component is presented

in [2]. As a result, an F1-score better than 98 %, with the

maneuversbeingdetected approximately1.5s inadvance,canbe

obtained. The probabilities are modeled with sigmoid functions

as well as a support vector machine.

In [3], the problem of predicting the future positions of sur-

rounding vehicles is systematically investigated from a machine

learning point of view using a non-public data set. Among

the considered approaches and techniques, the combination of

a multilayer perceptron (MLP) as lane change classifier and

three Gaussian mixture regressors as position estimators in a

mixture of experts shows the best performance. The mixture of

experts approach can be seen as a divide and conquer manner

enablingtomastermodelingthecomplexmultimodalities during

highway driving. In order to achieve this, the probabilities of

all possible maneuvers are estimated. The latter are used to

aggregate different position estimates being characteristic for

the respective maneuvers. In [4], the approach of [3] has been

adopted to the publicly available highD data set [5], showing

an improved maneuver classification performance with an area

under the receiver operating characteristic curve of over 97 % at

a prediction horizon of 5s. Additionally, [4] studies the impact of

external conditions (e.g. traffic density) on the driving behavior

as well as on the system’s prediction performance.

The highD data set [5] has evolved into a defacto standard data

set for developing and evaluating such prediction approaches

since its release in 2018. The data set comprises more than

16 hours of highway scenarios in Germany that were collected

from an aerial perspective with a statically positioned drone. The

recordings cover road segments ranging 420m each. Compared

to the previously used NGSIM data set [6], the highD data

set contains less noise and covers a higher variety of traffic

situations.

In opposition to the so far mentioned machine-learning

based approaches, [1] introduced the notion ‘physics-based’ ap-

proaches. Such approaches mostly depend on the laws of physics

and can be described with simple models such as constant

velocity or constant acceleration [7]. Two well-known and more

advanced model-based approaches are the ‘Intelligent Driver

Model’ (IDM) [8] and ‘Minimizing Overall Braking Induced

by Lane Changes’ (MOBIL) approach [9]. Such approaches are

known to be more reliable even in rarely occurring scenarios.

Therefore, it is advisable to use them in practice in combination

with machine learning models, which are known to be more pre-

cise during normal operation, to safeguard the latter’s estimates.

Approaches understanding the lane change prediction prob-

lem as a regression task instead of a classification task and that

are more interested in the time to the next lane change are very

rare though. Two such approaches can be found in [10], [11].

In [10], an approach predicting the time to lane change based

on a neural network that consists of an LSTM and two dense

layers is proposed. Besides information about the traffic situa-

tion which can be measured from each point in the scene, the

network utilizes information about the driver state. Therefore,

the approach is solely applicable to predict the ego-vehicle’s

behavior, but not to predict the one of surrounding vehicles.

Nevertheless, the approach performs well showing an average

prediction error of only 0.3s at a prediction horizon of 3s when

feeding the LSTM with a history of 3s. To train and evaluate

the network, a simulator-based data set covering approximately

1000 lane changes to each side is used.

An approach based on quantile regression forests, which

constitute an extension of random decision forests, is presented

in [11]. It uses features that describe the relations to the sur-

rounding traffic participants over a history of 0.5s and produces

probabilistic outputs. The approach is evaluated with a small

simulation-based as well as a real-world data set with 150 and 50

situations per lane change direction, respectively. The evaluation

shows that the root mean squared error (RMSE) falls below

1.0s only 1.5s before a lane change takes place. In [12], this

work is extended utilizing the time to lane change estimates

to perform trajectory predictions using cubic polynomials. An

approach based on quantile regression forests, which constitute

an extension of random decision forests, is presented in [11]. It

uses features that describe the relations to the surrounding traffic

participants over a history of 0.5s and produces probabilistic

outputs. The approach is evaluated with a small simulation based

as well as a real-world data set with 150 and 50 situations per

lane change direction, respectively. The evaluation shows that

the root mean squared error (RMSE) falls below 1.0s only 1.5s

before a lane change takes place. In [12], this work is extended

utilizing the time to lane change estimates to perform trajectory

predictions using cubic polynomials.

Other approaches try to infer the future position or a spatial

probability distribution [3], [4], [13]–[16]. As [13] shows, it

is promising to perform the position prediction in a divide and

conquermanner.Therefore,asystem exclusivelyproducing time

to lane change estimates remains reasonable even though ap-

proaches directly estimating the future positions also determine

that information as by-product.

The approach presented in [13] uses a random forest to

estimate lane change probabilities. These probabilities serve as

mixture weights in a mixture of experts predicting future posi-

tions. This approach has been extended by the above-mentioned

works [3], [4], which have replaced the random forest by an

MLP. The evaluations presented in [4] show a median lateral

prediction error of 0.18m on the highD data set at a prediction

horizon of 5s.

WIRTHMÜLLER et al.: PREDICTING THE TIME UNTIL A VEHICLE CHANGES THE LANE USING LSTM-BASED RECURRENT NEURAL NETWORKS 2359

Fig. 2. Architecture of the used LSTM-based RNN together with its inputs and outputs. As the illustration indicates, it is necessary to feed the network with

several consecutive measurements in order to take advantage of the recursive nature of the LSTM units. The relevant output is the most recently produced one,

highlighted in red, as it is influenced by all previous measurements.

A similar strategy is applied by [14]. In this work, an MLP

for maneuver classification as well as an LSTM network for

trajectory prediction are trained using the NGSIM data set. In

turn, the outputs of the MLP are used as one of the inputs of the

LSTM network. The evaluation yields an RMSE of only 0.09m

at a prediction horizon of 5s for the lateral direction when using

a history of 6s.

The approach presented in [15] uses an LSTM-based RNN,

which predicts single shot trajectories rather than probabilistic

estimates. The network is trained using the NGSIM data set. [15]

investigates different network architectures. Among these archi-

tectures,asingle LSTMlayerfollowedbytwodenselayersusing

tanh-activation functions shows the best performance, i. e., an

RMSE of approximately 0.42m at a prediction horizon of 5s.

[16] uses an LSTM-based encoder-decoder architecture to

predict spatial probability distributions of surrounding vehicles.

The used architecture is enabled to explicitly model interactions

between vehicles. Thereby, the LSTM-based network is used

to estimate the parameters of bivariate Gaussian distributions,

which model the desired spatial distributions. Evaluations based

on the NGSIM and highD data sets show RMSE values of 4.30m

and 2.91 m, respectively, at a prediction horizon of 5s.

As our literature review shows, many approaches, and espe-

ciallythe most recentones,use longshort-termmemory(LSTM)

units. An LSTM unit is an artificial neuron architecture, which

is used for building recurrent neural networks (RNNs). LSTMs

have been firstly introduced by Hochreiter and Schmidhuber in

1997 [17].

The key difference between RNNs and common feedforward

architectures (e.g. convolutional neural networks) results from

feedback connections that allow for virtually unlimited value

and gradient propagation, making RNNs well suited for time

series prediction. To efficiently learn long-term dependencies

from the data, the LSTM maintains a cell and a hidden state

that are selectively updated in each time step. The information

flow is guided by three gates, which allow propagating the

cell memory without change. The latter contributes to keep the

problem of vanishing and exploding gradients, classic RNNs

suffer from [18, Ch. 10], under control.

III. PROPOSED APPROACH

The present work builds upon the general approach we de-

scribed in [3], [4] but follows a fundamentally different idea.

We replaced the previously used multilayer perceptron (MLP)

for lane change classification by a long short-term memory

(LSTM)-based recurrent neural network (RNN) predicting the

time to an upcoming lane change. Consequently, the classifi-

cation task becomes a regression task. For the moment of the

lane change, we are using the point in time when the vehicle

center has just crossed the lane marking [3]. Transforming the

classificationproblemtoaregressionproblem has in fact also the

benefit, that the labeling is simplified, as it is no longer necessary

to define the start and the end of the lane change maneuver. The

latter is a really challenging task. Fig. 2 illustrates the proposed

model architecture together with the inputs and outputs. The

architecture consists of one LSTM layer followed by one hidden

dense layer and an output layer. The dimensionality of the output

layer is two, with the two dimensions representing the predicted

time to a lane change to the left



TTLCL

1and to the right



TTLCR, respectively. In accordance with [17], the LSTM layer

uses sigmoid functions for the gates and tanh for the cell state

and outputs. By contrast, in the following dense layers rectified

linear units (ReLU) are used. ReLUs map negative activations to

avalueof zero. Forpositivevalues,inturn,theoriginal activation

is returned. ReLUs have to be favored against classical neurons,

e.g., using sigmoidal activation functions as they help to prevent

the vanishing gradient problem. The use of ReLUs instead of

linear output activations for a regression problem can be justified

with the fact that negative TTLC2values cannot occur in the

1Symbols which are overlined with a tilde denote estimated values in contrast

to the actual ones.

2TTLC stands for time to lane change values in general no matter to which

direction the lane change is performed.

2360 IEEE ROBOTICS AND AUTOMATION LETTERS, VOL. 6, NO. 2, APRIL 2021

Fig. 3. Visualization of the relative feature importance values. As the relative

importance values are derived from the model weights, they are related to a

fitted model. In this case we chose the one with the optimal hyperparameters

(see Table I). The feature identifiers are in accordance with the ones from [3],

where the feature selection was carried out. A short overview can be found in

the Appendix (Table V).

givencontext.Whiledesigningourapproach,wealsoconsidered

model architectures featuring two LSTMs stacked on top or

using a second dense layer. Both variants provided no significant

performance improvement. This observation is in line with the

findings described in [15].

The used feature set is the same as in [4] and is based

on the highD data set. The selection of the features is taken

from [3], where data produced by a testing vehicle fleet is used to

thoroughly investigate different feature sets. As opposed to [3],

however, our approach omits the yaw angle as it is not available

in the highD data set. Moreover, the transformation to lane

coordinates is not needed as the highD data set solely contains

straight road segments. The relative feature importance values

are depicted in Fig. 3.

For each feature f, the importance value ι(f)is calculated

according to (1) as the sum of all weights wconnecting that

feature to the nLST M neurons of the LSTM layer:

ι(f)=

nLST M



n=1

w(f,n)(1)

The relative importance is calculated by normalization. As

Fig. 3 indicates, the distance to the left lane marking dml

and the lateral acceleration ayplay superior roles, whereas the

importance of the other features is lower and quite similar.

In order to use the recursive nature of the LSTM units, one

has to feed not only the current measurement values, but also a

certain number of previous measurement values to the network.

Although the network is able to output estimates each time an

input vector has been processed, we are only interested in the

last output. This is due to the fact that only in the last iteration, all

measurements and especially the most recent ones are utilized

for generating the prediction. This input/output interface of the

network is illustrated in Fig. 2. The gray box on the left depicts

a set of past measurements that are fed to the RNN as the

input time series for a prediction at point t. The LSTM layer

TABLE I

HYPERPARAMETERS IN THE GRID SEARCH

continuously updates its cell state, which can be used to derive a

model output at any time. This is indicated by the time series of

TTLC estimates in the gray box on the right. The relevant final

estimate is framed in red. In case a prediction is required for

every time step, the LSTM is executed with largely overlapping

input time series and reset in between.

The remaining hyperparameters, namely the dimensionality

of the LSTM cell and the hidden dense layer, as well as the num-

ber of time steps provided and the learning rate are tuned using

a grid search scheme [19, p. 7f]. Table I lists the hyperparameter

values to be evaluated, yielding 54 possible combinations. This

hyperparameter tuning scheme is encapsulated in a 5-fold cross

validation to ensure a robust evaluation of the model’s general-

ization abilities [3].

Moreprecisely,foreachpossiblecombination ofhyperparam-

etersa modelistrainedbasedon4folds. Subsequently,themodel

is evaluated using the remaining fifth fold. This procedure is

iterated so that each fold is used once for evaluation. Afterwards,

the results are averaged and used to indicate the fitness of this

hyperparameter set. As evaluation metric the loss function of the

regression problem is used.

Given the aforementioned grid definition (see Table I), the

following hyperparameter setup has proven to be optimal in the

context of the present study: The output dimensionality of the

LSTM nLST M results to 256 and the dense layer to a size of

ndense =32units. Moreover, 3s of feature history at 25 Hz,

resulting in 75 time steps, is sufficient for the best performing

model. As optimization algorithm we chose Adam [20], with

α=0.0003 as optimal learning rate.

When labeling the samples, the time to lane change values are

clipped to a maximum of seven seconds, which is also applied to

trajectory samples with no lane change ahead. The loss function

of the regression problem is defined as mean squared error

(MSE).Asthe TTLCvaluesarecontained in theinterval[0,7]s,

there are virtually no outliers that MSE could suffer from.

In order not to over-represent lane following samples during

the training process, the data set used to train the model is

randomly undersampled. Accordingly, only one third of the

lane following samples are used. A similar strategy is described

in [10]. Moreover, the features are scaled to zero mean and

unit-variance.

Keras [21], a Python-based deep learning API built on top

of Google’s TensorFlow [22], is used to assemble, train, and

WIRTHMÜLLER et al.: PREDICTING THE TIME UNTIL A VEHICLE CHANGES THE LANE USING LSTM-BASED RECURRENT NEURAL NETWORKS 2361

Fig. 4. Distribution of the unclipped time to lane change values. The upper

part of the figure only contains samples with an upcoming lane change to the

left. Hence, it solely depicts the time to the next lane change to the left TTLCL.

The lower part, in turn, shows an equivalent representation for lane changes to

the right. The used data set is not balanced over maneuver classes.

validate the RNN models. The grid search is performed on

a high-performance computer equipped with a graphics pro-

cessing unit, which is exploited by TensorFlow to reach peak

efficiency.

IV. EVALUATION

Toevaluatetheresulting timetolanechangepredictionmodel,

one fold of the highD data set is used. This fold was left out

during model training and hyperparameter optimization. It is

noteworthy that the used data sets are not balanced over TTLC.

This means, for example, that there are more samples with a

TTLCLof 3s than samples with a TTLCLof 5s. This fact is

illustrated by the histogram depicted in Fig. 4. The reason is that

in the highD data set observations for individual vehicles rarely

span over the full time of 7s or more. However, this does not

affectthefollowingevaluationssignificantly.Forallexperiments

we relied on the model, which showed the best performance

during the grid search.

In the following, we evaluate two different characteristics of

the proposed approach. First, we investigate how well the system

solves the actual task, that is to estimate the time to the next

lane change (cf. Section IV-A). Subsequently (Section IV-B),

we deduce a maneuver classification estimate from the TTLC

estimates and perform a performance evaluation in comparison

to existing works.

A. Time to Lane Change Prediction Performance

To investigate the system’s ability to estimate the time to

the next lane change, we consider the root mean squared error

(RMSE). This stands in contrast to the loss function that uses

the pure mean squared error (MSE) (see Section III). However,

as evaluation metric the RMSE is beneficial due to its better

interpretability. The latter is caused by the fact that the RMSE has

thesamephysicalunit as the desired output quantity,i.e.seconds

in our case. Further note that the overall RMSE is not always the

TABLE II

TIME TO LANE CHANGE PREDICTION PERFORMANCE (RMSE [s]) ON A

BALANCED DATA SET

most suitable measure. This fact shall be illustrated by a simple

example: For a sample where the driver follows the current lane

(FLW) or performs a lane change to the right (LCR), it is

relatively straight forward to predict the TTLCL. By contrast,

it is considerably more challenging to estimate the same quantity

for a sample where a lane change to the left (LCL) is executed.

However, the latter constitutes the more relevant information.

Therefore, we decided to calculate the RMSE values for the two

individual outputs



TTLCLand



TTLCR. A look at the results

presented in Table II makes this thought clearer.

To produce the results shown in Table II, we use a data set

that is balanced according to the maneuver labels. The latter are

defined according to [4].3The evaluation considers all samples

with an actual TTLCL value below 7s as LCL samples.

Regarding LCR samples, an equivalent logic is applied. All

remaining samples belong to the FLW class. In some very rare

cases,twolane changesare performedinquicksuccession.Thus,

a few samples appear in both LCL and LCR. This explains the

slightly different number of samples, shown in Table II.

The first row of Table II depicts the overall RMSE.TheRMSE

can be monotonically mapped from the MSE, which is used

as loss function during the training of the network. The two

rows below depict the RMSE values separated by the outputs.

The values we consider as the most relevant ones (TTLCL

estimation error for LCL samples and vice versa) are high-

lighted (bold font). Thus, the most interesting error values are

close to 0.7s. The other error values are significantly smaller

but this is in fact not very surprising. This can be explained, as

the system only has to detect that no lane change is about to

happen in the near future in these cases. If this is successfully

detected, the respective



TTLC can simply be set to a value

close to 7s. Note that these values can be hardly compared with

existing works (e.g. [10]) as the overall results strongly depend

on the distribution of the underlying data set as well as the RMSE

values considered. In addition, our investigations are based on

real-world measurements rather than on simulated data.

In addition to the overall prediction performance, we are

interested in the system’s prediction performance over time.

Obviously, the prediction task is, for example, significantly more

difficult 4s prior to the actual lane change than it is 1s before it.

To investigate this, we evaluate the RMSE and the distribution of

the errors using boxplots as functions of the TTLC,asshown

in Fig. 5. Attention should be paid to the fact that the illustrated

3The definition of the labels essentially complies with the one presented in

(2). As opposed to the shown equation, the actual times to the next lane change

are used instead of the estimated ones.

2362 IEEE ROBOTICS AND AUTOMATION LETTERS, VOL. 6, NO. 2, APRIL 2021

Fig. 5. RMSE (red) and error distribution (boxplot) as functions of the remain-

ing time to the next lane change. The underlying data set is not balanced over the

actual maneuver classes. The depicted values refer to the error values separated

by the output channels.

values correspond to the errors separated by output channels

as in Table II. For this investigation we rely on the unbalanced

data set, meaning that considerably more FLW samples are

included. An exact depiction of the label distribution can be

found later on in Table IV. By using the unbalanced data set,

more samples with TTLC values between 5s and 7s remain

in the data. Thus, the error values aggregated over TTLC are

assumed to be less noisy, especially between 5 and 7s.

AsshowninFig.5,theRMSE and the median values in the

boxplots are mostly very close to each other, but the medians

are more optimistic in general. Especially, this is the case in

the upper part of Fig. 5 (arising lane change to the left) in

the region between 7s and 6s. This can be explained with the

fact that in this range the data density is relatively low. Thus, a

single large error can significantly affect the RMSE, whereas this

sample is considered as outlier in the boxplot. The illustrations

show that our approach reaches very small prediction errors

below 0.25s already 3.5s before the actual lane change moment.

Even though a direct comparison to other approaches is also

difficult for this quantity, it is noteworthy that [11] reports RMSE

values below 0.25s only 1s before the lane changes. Conversely,

our evaluations show comparable RMSE values already 2.5s in

advance of the lane change.

The fact that errors for large TTLC values (>4.5s) are also

very low can be explained as the system may not recognize such

examples as lane changes. In that case, the system will solely

output a TTLC value of around 7s. If, for example, the actual

value corresponds to 6s, the error is of course around 1s. Thus,

one can conclude that outputs, which are larger than the break

even point of approximately 4.5s are not very reliable. Note

that this is in fact not surprising as predictions with such time

horizons are extremely challenging.

Besides, it is known that lane changes to the left are easier to

predict than the ones to the right [3], [23]. This is the reason that

the RMSE values for lane changes to the right decrease slower

over time than the values for lane changes to the left.

TABLE III

MANEUVER CLASSIFICATION PERFORMANCE ON A DATA SET BALANCED OVER

ACTUAL MANEUVER CLASSES COMPAREDTOTHESTUDY IN [4]

THE PROCEDURE TO CONSTRUCT THE DATA SET CAN BE EXTRACTED FROM

THE CONTINUOUS TEXT

B. Classification Performance

In addition to the preceding evaluations, we want to know

how well our approach performs compared to a pure maneuver

classification approach. This can be easily investigated by deriv-

ing the classification information from the time to lane change

estimates. For this purpose, the logic depicted in (2) is applied:

L=⎧

⎪

⎨

⎪

⎩

LCL, if (



TTLCL ≤5s)∧(



TTLCL ≤



TTLCR)

LCR, if (



TTLCR ≤5s)∧(



TTLCR <



TTLCL)

FLW, otherwise

(2)



TTLCLand



TTLCRdenote the estimated time to the next

lane change to the left and to the right, respectively. The defined

labels LCL,LCR and FLW are used to specify samples

belonging to the three already introduced maneuver classes: lane

change to the left, lane change to the right, and lane following.

This definition matches the one used in [4] for the labeling. Also

the prediction horizon of 5s was adopted from [4] in order to

ensure comparability. As lane change maneuvers usually range

from 3s to 5s (see [24]), this is also a reasonable choice. The

following investigations are, therefore, conducted in comparison

to the approach outlined in [4], where an MLP for maneuver

classification is trained using the highD data set (see Section II).

We use the well-known metrics precision, recall and F1-score,

whose definitions can be found in [25, p. 182 f]. The results on

a balanced data set are given in Table III.

This investigation shows that our newly developed LSTM

network is able to perform the classification task – for which

it was not intended – with a comparable or even slightly bet-

ter performance than existing approaches. In particular, it is

remarkable that not only the overall performance (measured

with the F1-score) is significantly increased with respect to the

FLW samples, but also with respect to the LCL samples. The

improved performance on the FLW class can be explained by

the adapted training data set. While [4] uses a balanced data

set, in this study we use a third of all FLW samples and thus

significantly more than from the two other classes.

Theoverallslightlyimprovedperformancecanpresumably be

attributed to the recurrent LSTM structure enabling the network

to memorize past cell states. As opposed to this approach, [4]

WIRTHMÜLLER et al.: PREDICTING THE TIME UNTIL A VEHICLE CHANGES THE LANE USING LSTM-BASED RECURRENT NEURAL NETWORKS 2363

TABLE IV

MANEUVER CLASSIFICATION PERFORMANCE ON AN UNDERSAMPLED BUT NOT

BALANCED DATA SET COMPARED TO THE STUDY IN [4]

THE PROCEDURE TO CONSTRUCT THE DATA SET CAN BE EXTRACTED FROM

THE CONTINUOUS TEXT

relies on the Markov assumption and, thus, does not model past

system states. Although recurrent approaches can improve the

prediction performance, Markov approaches have to be also

taken into account when it comes to embedded implementations,

as the latter ones are more resource-friendly.

Another interesting characteristic of our approach can be

observed in Table IV, where its performance is measured on

a data set which is undersampled in the same way as during the

training.

As shown by Table IV, the new LSTM approach copes signif-

icantly better with the changed conditions (using an unbalanced

instead of a balanced data set) compared to the MLP approach

presented in [4]. On one hand, this is not surprising, as our

network is exactly trained on a data set that is distributed in

the same way. On the other, together with the results displayed

in Table III, where the LSTM also performs quite well, it

demonstrates that the LSTM approach is significantly more

robust than the MLP. Nevertheless, note that in practice the

MLP is applied together with a prior multiplication step. The

probabilities estimated this way are then used as weights in a

mixture of experts.

V. SUMMARY AND OUTLOOK

This work presented a novel approach for predicting the time

to the next lane change of surrounding vehicles on highways

with high accuracy. The approach was developed and evaluated

withregardtoits predictionperformanceusingalargereal-world

data set. Subsequently, we demonstrated that the presented ap-

proach is able to perform the predictions even during the most

challenging situations with an RMSE around 0.7s. Additional

investigations showed that the predictions become highly accu-

rate already 3.5s before a lane change takes place. Besides, the

performancewascompared toaselectedmaneuver classification

approach. Similar approaches are often used in recent works.

Thus, it was shown that our approach is also able to deliver this

information with a comparably high and in some situations even

better quality. On top of this, our approach delivers the time to

the next lane change as additional information.

The described work builds the basis for improving position

prediction approaches by integrating the highly accurate time to

lane change estimates into a downstreamed position prediction.

Our future research will especially focus on how to use these

estimates in an integrated mixture of experts approach instead

of maneuver probabilities as sketched in [3].

APPENDIX

TABLE V

FEATURE DESCRIPTION

TABLE VI

ACRONYMS

REFERENCES

[1] S. Lefèvre, D. Vasquez, and C. Laugier, “A survey on motion predic-

tion and risk assessment for intelligent vehicles,” ROBOMECH J., 2014,

pp. 1–14.

[2] C. Wissing, T. Nattermann, K.-H. Glander, C. Hass, and T. Bertram,

“Lane change prediction by combining movement and situation based

probabilities,” IFAC-PapersOnLine, vol. 50, no. 1, pp. 3554–3559, 2017.

[3] F. Wirthmüller, J. Schlechtriemen, J. Hipp, and M. Reichert, “Teaching ve-

hiclestoanticipate:Asystematicstudy onprobabilisticbehaviorprediction

using large data sets,” IEEE Trans. Intell. Transp. Syst., to be published,

doi:10.1109/TITS.2020.3002070.

2364 IEEE ROBOTICS AND AUTOMATION LETTERS, VOL. 6, NO. 2, APRIL 2021

[4] F. Wirthmüller, J. Schlechtriemen, J. Hipp, and M. Reichert, “Towards in-

corporating contextual knowledge into the prediction of driving behavior,”

in Proc. IEEE 23th Int. Conf. Intell. Transp. Syst., 2020, pp. 1–7.

[5] R. Krajewski, J. Bock, L. Kloeker, and L. Eckstein, “The highD dataset: A

drone dataset of naturalistic vehicle trajectories on german highways for

validation of highly automated driving systems,” in Proc. IEEE 21st Int.

Conf. Intell. Transp. Syst., 2018, pp. 2118–2125.

[6] J. Colyar and J. Halkias, “US highway 101 dataset,” Federal

Highway Administration (FHWA), Tech. Rep. FHWA-HRT-07-030,

2007.

[7] F. Wirthmüller, M. Klimke, J. Schlechtriemen, J. Hipp, and M. Reichert,

“A fleet learning architecture for enhanced behavior predictions during

challenging external conditions,” in Proc. 2020 IEEE Symp. Ser. Comput.

Intell. (SSCI), 2020, pp. 2739–2745.

[8] M. Treiber, A. Hennecke, and D. Helbing, “Congested traffic states in

empiricalobservationsandmicroscopic simulations,”Phys.Rev.E, vol.62,

no. 2, 2000, Art. no. 1805.

[9] A. Kesting, M. Treiber, and D. Helbing, “General lane-changing model

mobil for car-following models,” Transp. Res. Rec., vol. 1999, no. 1,

pp. 86–94, 2007.

[10] H. Q. Dang, J. Fürnkranz, A. Biedermann, and M. Hoepfl, “Time-to-lane-

change prediction with deep learning,” in Proc. IEEE 20th Int. Conf. Intell.

Transp. Syst., 2017, pp. 1–7.

[11] C. Wissing, T. Nattermann, K.-H. Glander, and T. Bertram, “Probabilistic

time-to-lane-change prediction on highways,” in Proc. IEEE 28th Intell.

Veh. Symp., 2017, pp. 1452–1457.

[12] C. Wissing, T. Nattermann, K.-H. Glander, and T. Bertram, “Trajectory

prediction for safety critical maneuvers in automated highway driving,” in

Proc. IEEE 21st Int. Conf. Intell. Transp. Syst., 2018, pp. 131–136.

[13] J. Schlechtriemen, F. Wirthmueller, A. Wedel, G. Breuel, and

K.-D. Kuhnert, “When will it change the lane? A probabilistic regression

approach for rarely occurring events,” in Proc. IEEE 26th Intell. Veh.

Symp., 2015, pp. 1373–1379.

[14] A. Benterki, M. Boukhnifer, V. Judalet, and C. Maaoui, “Artificial in-

telligence for vehicle behavior anticipation: Hybrid approach based on

maneuver classification and trajectory prediction,” IEEE Access,vol.8,

pp. 56992–57002, 2020.

[15] F. Altché and A. de La Fortelle, “An LSTM network for highway trajectory

prediction,” in Proc. IEEE 20th Int. Conf. Intell. Transp. Syst., 2017,

pp. 353–359.

[16] K. Messaoud, I. Yahiaoui, A. Verroust-Blondet, and F. Nashashibi, “Non-

local social pooling for vehicle trajectory prediction,” in Proc. IEEE 30th

Intell. Veh. Symp., 2019, pp. 975–980.

[17] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural

Comput., vol. 9, no. 8, pp. 1735–1780, 1997.

[18] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cam-

bridge, MA, USA: MIT Press, 2016. [Online]. Available: http://www.

deeplearningbook.org

[19] F. Hutter, L. Kotthoff, and J. Vanschoren, Automated Machine Learning:

Methods, Systems, Challenges. Berlin, Germany: Springer Nature, 2019.

[20] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,”

2014. [Online]. Available: https://arxiv.org/abs/1412.6980

[21] F. Chollet et al., “Keras,” 2015. [Online]. Available: https://keras.io

[22] M. Abadi et al., “Tensorflow: A system for large-scale machine learning,”

12th {USENIX} Symposium Operating Systems Design Implementation

({OSDI} 16), pp. 265–283, 2015. [Online]. Available: www.tensorflow.org

[23] M. Bahram, C. Hubmann, A. Lawitzky, M. Aeberhard, and D. Wollherr,

“A combined model-and learning-based framework for interaction-aware

maneuver prediction,” IEEE Trans. Intell. Transp. Syst., vol. 17, no. 6,

pp. 1538–1550, Jun. 2016.

[24] H. Woo et al., “Lane-change detection based on vehicle-trajectory predic-

tion,” IEEE Robot. Automat. Lett., vol. 2, no. 2, pp. 1109–1116, Apr. 2017.

[25] K. P. Murphy, Machine Learning: A Probabilistic Perspective. Cambridge,

Massachusetts & London, England: The MIT Press, 2012.