Received 16 January 2023, accepted 13 February 2023, date of publication 22 February 2023, date of current version 28 February 2023.
Digital Object Identifier 10.1109/ACCESS.2023.3247564
Autoencoder-Based Iterative Modeling and
Multivariate Time-Series Subsequence
Clustering Algorithm
JONAS KÖHNE 1,2, LARS HENNING 2, AND CLEMENS GÜHMANN 1
1Chair of Electronic Measurement and Diagnostic Technology, Technische Universität Berlin, 10623 Berlin, Germany
2IAV GmbH, 10587 Berlin, Germany
ABSTRACT This paper introduces an algorithm for the detection of change-points and the identification
of the corresponding subsequences in transient multivariate time-series data (MTSD). The analysis of such
data has become increasingly important due to growing availability in many industrial fields. Labeling,
sorting or filtering highly transient measurement data for training Condition-based Maintenance (CbM)
models is cumbersome and error-prone. For some applications it can be sufficient to filter measurements
by simple thresholds or finding change-points based on changes in mean value and variation. But a robust
diagnosis of a component within a component group for example, which has a complex non-linear correlation
between multiple sensor values, a simple approach would not be feasible. No meaningful and coherent
measurement data, which could be used for training a CbM model, would emerge. Therefore, we introduce an
algorithm that uses a recurrent neural network (RNN) based Autoencoder (AE) which is iteratively trained on
incoming data. The scoring function uses the reconstruction error and latent space information. A model of
the identified subsequence is saved and used for recognition of repeating subsequences as well as fast offline
clustering. For evaluation, we propose a new similarity measure based on the curvature for a more intuitive
time-series subsequence clustering metric. A comparison with seven other state-of-the-art algorithms and
eight datasets shows the capability and the increased performance of our algorithm to cluster MTSD online
and offline in conjunction with mechatronic systems.
INDEX TERMS Condition-based maintenance, multivariate time-series data, change point detection,
unsupervised clustering, autoencoder, segmentation, subsequence, clustering.
I. INTRODUCTION
In the applications of machine diagnosis of mechatronic sys-
tems and the subfield CbM, all supervised machine learning
methods rely on high-quality labeled data [1], [2]. An option
for a mechatronic system with different operating points is
to measure the main operating points separately and create a
diagnosis method for each of those individually. This requires
a well-structured design and execution of experiments with a
measurement labeling process. In a real world development
environment for mechatronic system, where measurements
are taken either automatically and/or manually and by many
The associate editor coordinating the review of this manuscript and
approving it for publication was Wentao Fan .
individuals and in different hardware and software develop-
ment stages, providing consistently labeled and categorized
data is a challenge.
Automatically labeling and categorizing multivariate time-
series (MTS) data is therefore not only an alleviation but
might be crucial for a successful CbM approach. As described
above, labeled and categorized data is essential for training
a model to represent a mechatronic system in a data driven
approach. In the automotive sector where a lot of measure-
ments occur at different operating points this is especially
important. Some of these measurements are being recorded
on a test bench in standard environment conditions with pre-
defined operating points and for a given time (e.g., Worldwide
harmonized Light Duty Test Cycle (WLTC)). Others can be
18868 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ VOLUME 11, 2023
J. Köhne et al.: AE-Based Iterative Modeling and MTS Subsequence Clustering Algorithm
in idle mode during waiting or preparation time for a longer
trip or other measurements. Also, very transient episodes are
existent (e.g., Real Driving Emissions (RDE)). All of these
measurements do not necessarily have the same calibration of
the underlying mechatronic system. To train a robust model of
the mechatronic system, component group or a single compo-
nent, a big effort has to be put in the design of the experiments
alone, not to mention the experiments themselves. Therefore,
a method of automatically labeling existing measurements is
of advantage. Afterwards an automatic sorting of the labeled
time sequences by statistical methods is possible, to enable a
data driven mechatronic diagnosis approach.
Using advanced unsupervised approaches for CbM allows
the data to be unlabeled (otherwise supervised methods could
be used). In this case the labeling refers to the label of the
condition (mechanical degradation) of the monitored system.
When trying to diagnose mechatronic systems that have many
operating points and are free to transfer in between those or
are capable of totally transient operation modes, then a robust
diagnosis of the actual condition of the mechatronic system
is extremely challenging. An early and reliable (robust) diag-
nosis of a mechatronic system prevents accidents, enables
optimal maintenance and increases uptime of machinery.
Without the knowledge of the current condition of the system,
fault prevention can only be done by predetermined mainte-
nance intervals. Motivation is therefore to monitor the health
condition of the mechatronic system as close as possible,
resulting in the task of separating discrete sensory data into
uniquely identifiable and recognizable segments or subse-
quences. This is beneficial to the performance of anomaly
detection, because if all normally occurring subsequences are
identified, the detection of abnormal or faulty subsequences
is straightforward.
When monitoring the health condition of a mechatronic
system it is state of the art, to manually calibrate specific
release conditions, during which the condition monitoring is
enabled. This is done to exclude operating points which are
very rare, too transient or are just not feasible for drawing
conclusions about the condition of the mechatronic system.
But even restricting the conditions on where to diagnose the
machine (which already is reducing the probability of diag-
nosing the machine at all due to an operating state which is by
chance outside the release conditions) cannot always help to
improve the fault detection, identification and quantification
of its magnitude. For example, in a mechatronic system with a
complex nonlinear dependency of its subcomponents and its
time dependency, ‘‘going in’’ or ‘‘going out’’ of the release
conditions can result in very different system behavior. Com-
paring these two states does not lead to reliable conclusions
for the mechatronic systems health condition.
Therefore, the kind of data sequences used for train-
ing/calibration and validation is crucial for any monitoring
strategy. Manually screening, labeling and sorting data into
comparable sequences is time-consuming, error-prone and
cumbersome. Additionally, this is a decision process which
requires expert and domain knowledge.
The new algorithm which we introduce in this work
is capable of generating subsequence models from online
streaming data which is processed sequentially. Any coherent
subsequence that is identified can be recognized (clustered)
if occurring again. Depending on multiple sensitivity cali-
bration parameters, time-varying data points are associated
and identified as a subsequence. The parameters determine
the volatility or the strength of the affiliation required to be
recognized as one time varying subsequence. These subse-
quence models can also be applied efficiently offline onto
large existing datasets. During this prediction phase, the algo-
rithm provides a vector of subsequence labels which were
recognized as one from the training data. Depending on the
calibration, it can also provide a label for unknown data which
represents a phase where no pattern could be recognized.
Otherwise, it finds the best fitting subsequence and labels it
as that. The approach published in this work is currently only
based on MTS input but could be adapted for a univariate
input. It is a multivariate time-series sub-sequence discovery
and identification method.
Our contribution is a new algorithm for online sub-
sequence clustering of MTSD called ‘‘Autoencoder-based
Iterative Modeling and Subsequence Clustering Algorithm
(ABIMCA)’’ and a new metric to evaluate cluster algo-
rithms focused on this task ‘‘Multivariate Time-Series Sub-
Sequence Clustering Metric (MT3SCM)’’. We compare our
algorithm with
•seven other state-of-the-art algorithms
•eight datasets, from which six are publicly available and
two are provided with our codebase
•three widely used unsupervised clustering metrics
•our own metric (MT3SCM) and its four components
while varying the use of default algorithm parameters with
optimized parameters on each algorithm and dataset via ran-
dom grid search.
II. RELATED WORK
In this section we define the terminology used and its seman-
tics to categorize our work within the large bibliography
existing in this field and provide a selected list of related
works and their ascendancy to this paper.
A. TERMINOLOGY AND SEMANTICS
Numerous possibilities have been described for achieving
our main goal of segmenting discrete time-series sensory
data. Most approaches can be sorted into the following par-
tially overlapping categories: time-series analysis [3], pattern
recognition [4], temporal knowledge discovery [5], motif dis-
covery [6], change-point detection [7], data clustering [8]
or anomaly detection [9]. All those terms refer to methods
or algorithms which could be used directly or indirectly to
achieve our goal. Explicit description of each term or category
can be found in the stated references.
To limit the scope of this work, we focus on data clus-
tering which can be separated into six subcategories by the
VOLUME 11, 2023 18869
J. Köhne et al.: AE-Based Iterative Modeling and MTS Subsequence Clustering Algorithm
FIGURE 1. Combination possibilities of time-series clustering categories.
following groups of two: univariate and multivariate data,
online and offline algorithms, variant and invariant data.
The term ‘‘clustering’’ implies an unsupervised method. The
equivalent supervised method would be called ‘‘classifica-
tion’’. In the relevant literature more and other distinctions
are made, depending on the specific field and context. First,
we describe time-series data (TSD) since this is the data
format we focus on in this paper. Afterwards we explain the
differences between the subcategories.
‘‘A time-series is a sequence of observations taken sequen-
tially in time.’’ [3] We denote a time-series data point as
an observation with exactly one connected timestamp. The
timestamp is not a variable or feature.
1) UNIVARIATE – MULTIVARIATE
If a single value or a scalar is the only variable of the data,
then the data is univariate. It is the most basic format data
can have. Considering TSD, a single temperature sensor with
a timestamp would be univariate. Univariate TSD could also
be interpreted as multivariate data of two dimensions, when
taking the timestamp as another variable or feature.
2) ONLINE – OFFLINE
The differentiation between online and offline algorithms or
analysis of data is crucial. Offline refers to data analysis
that is applied to all the data at once. Measurement data,
for example, is available in one or multiple files or can be
accessed via a previously filled database. Offline algorithms
can therefore iterate and optimize their result based on a
criterion applied to known data. Online analysis on the other
hand, is applied sequentially. The algorithm needs to be
able to function with a criterion that generalizes well with
unknown data. One selection or piece of data can be applied
on the online algorithm without knowing the rest of the data.
This approach cannot be as robust and accurate as an offline
analysis, which is why most methods found in the literature
are offline algorithms. Ideally the online algorithm learns as
new data is provided. For the sake of completeness, however,
it should be noted that some online algorithms have to be
pre-trained offline and some algorithms referred to as offline
can be used sequentially. This depends on the underlying
methods used. An offline algorithms’ purpose is not to be
TABLE 1. Algorithms used for time-series clustering comparison.
used online. Nevertheless, online algorithms can be used
offline.
3) DEPENDENT – INDEPENDENT TSD (TIME VARIANT –
TIME INVARIANT)
Depending on the field of study the specific terminology of
time dependency can differ. We want to emphasize on the
common accepted assumption “An intrinsic feature of a time-
series is that, typically, adjacent observations are dependent”
([3, S. 1]). Time dependency characterizes TSD, where a
consecutive observation has some connection with its prede-
cessor. In some fields a connection is not necessarily given
for two data points in a database that are the closest to each
other regarding their timestamp. The dependency of adjacent
observations is self-evident, when collecting sensor values of
a mechatronic system from an experimental rig, for example.
We therefore use the term ‘‘time dependency’’ in the context
of a dynamical system and extend it to a time-variant system
in the terminology of control systems engineering. Indepen-
dent TSD would be where the variance and the average are
invariant along the time (stationary) and ergodic.
We position our work in the subcategory of online clus-
tering of dependent multivariate time-series data. As of
now we refer to a time-series as a sequence of dependent
observations with a constant sample-rate. A ‘‘discrete series
of consecutive data points’’ as a subset of this time-series is
synonymously referred to as pattern, motif, sequence, operat-
ing state, state change, between change points, subsequence,
episode or segment, among others. In this paper we will use
the term subsequence (using terminology of [10]).
B. ALGORITHMS AND DATASETS
In the relevant literature a diverse number of clustering
algorithms can be found [11]. Due to this fact, most of
the existing literature for reviewing or surveying existing
approaches, attend to a higher-level scope [12], [13], [14].
Fewer are concentrating on time-series clustering [23],
[24], [25], online [26], temporal knowledge discovery [5],
sequential pattern recognition [10], high dimensional
data [27] or change point detection [14], [28]. Current
approaches use deep learning architectures like multilayer
1https://scikit-learn.org/stable/modules/classes.html
2https://facebookresearch.github.io/Kats/
3https://riverml.xyz
18870 VOLUME 11, 2023
J. Köhne et al.: AE-Based Iterative Modeling and MTS Subsequence Clustering Algorithm
perceptron (MLP), convolutional neural network (CNN),
deep belief network (DBN), generative adversarial net-
work (GAN) and variational autoencoder (VAE) among
others [29].
The algorithms we use for comparison are listed in Table 1.
All of these are online clustering algorithms that can be
used for time-series clustering. Implementations are publicly
available in the Python programming language (see library
column in Table 1) and are well established and tested.
The Balanced Iterative Reducing and Clustering using
Hierarchies (BIRCH) [16] algorithm is based on a clustering
features (CF) tree with the CF as a triple of the number of
data points, linear sum and the squared sum. This CF tree
is built dynamically. It was also one of the earliest algo-
rithms capable of online clustering. The Bayesian Online
Change-point Detection (BOCPD) [17] algorithm, as the
name suggests, uses Bayesian methods to detect change-
points (CPs) online. Since this algorithm only detects CPs,
we manipulated the result to be able to interpret every CP
as the beginning of a new cluster. This algorithm starts
in our comparison with the limitation of not being able
to recognize a previously seen cluster. The Stream Clus-
tering Framework (CluStream) [19] algorithm is based on
extended CF from BIRCH, following a k-means algorithm.
The Density-based Stream Clustering (DBSTREAM) [20]
algorithm is based on the Self Organizing density-based clus-
tering over data Stream (SOStream) [30] and uses a shared
density graph to capture the density between micro-clusters.
The Density-Based Clustering over an Evolving Data Stream
with Noise (DenStream) [21] algorithm is an extension of
Density-Based Spatial Clustering of Applications with Noise
(DBSCAN) [31] which uses a damped window model of CF
to create core-micro-clusters and outlier-micro-clusters. The
Mini-Batch K-Means (MiniBatchKMeans) [22] algorithm
proposes “the use of mini-batch optimization for k-means
clustering” ([22]) to improve the k-means optimization prob-
lem. The STREAMKMeans [18] algorithm uses an adaption
of the original STREAM algorithm from [32]. Replacing the
k-median subroutine LSEARCH by an incremental k-means
algorithm. More information and comparison of most of the
used algorithms can be found in [26] and [33]
As described in section I, the focus of this publication
is on the online multivariate time dependent subsequence
clustering using RNN based AE. The number of algorithms
within this scope is limited compared to the number of clus-
tering algorithms in general. The following approaches use
at least some of those prerequisites. Reference [34] empha-
sizes the term segmentation for an offline sliding window
and bottom-up algorithm. Others are converting the time-
series into a Markov chain (MC) and then using a Bayesian
method to cluster the MCs [35], referring to them as episodes.
Here the data needs to be discretized into bins of equal
length. Reference [36] uses manually selected characteristics
(e.g., kurtosis, skewness and frequency) for clustering uni-
variate TSD. Others are using the Augmented Dickey-Fuller
test to evaluate time-series stationarity and perform a segmen-
tation based on this [37]. In [38] dynamic latent variables
from a vector autoregression (VAR) model in combination
with a principal component analysis (PCA) is used for seg-
menting industrial TSD. Reference [39] shows the advantage
of an embedding approach as well, by introducing a PCA
and a Vanilla-AE CP detection method with the restriction
of focusing on multivariate power grid data.
Focused on transfer learning, [40] introduces an adversar-
ial approach for domain adaption using a stacked AE. Offline
convolutional sparse AE used for supervised sequence clas-
sification was done by [41] and adapted by [42] for
unsupervised motif mining. Other AE based papers are, for
example, by [43] who use a mixture of AEs for image and text
clustering. Stacked AE and k-means for offline clustering is
done by [44] without considering time dependency. Showing
the combination of GRU-based AE and MTS for anomaly
detection is done by [45]. Reference [46] applies the sliding
windows approach on CNN-based AE for Anomaly Detec-
tion of Industrial Robots. A similar approach using AE for
MTS segmentation is published in [47]. The focus there is on
change point detection using latent space variables only and
no clustering or identification of the subsequences is done.
Clustering is strongly depending on the data and task pro-
vided: “...; each new clustering algorithm performs slightly
better than the existing ones on a specific distribution of pat-
terns.” ([8, S. 268]). Therefore, we try to apply the algorithms
on multiple different MTS datasets and compute different
metrics for comparison. Large efforts are made for making
datasets available to the scientific community and the public
to improve comparability and reproducibility by universities
or governmental institutions [48], [49]. For this paper we
focus on data with multivariate quantitative features with
continuous values. For a list of the datasets see Table 3
and a brief description is given in section IV. Evaluating
the performance of a clustering algorithm can be done with
two different approaches. If external knowledge about the
ground truth of each data point and its cluster is known,
then so-called external measures can be applied. If no ground
truth is available, internal measures need to suffice. Many
external measures exist, like the well-known F1-score (based
on the effectiveness measure by [50]). With the large number
of data available and working in the context of transient
machine behavior with the focus on finding internal states
of the system, acquiring or providing the ground truth is
time-consuming, error-prone and cumbersome (as described
in section I). “The definition of clusters depends on the user,
the domain, and it is subjective.” ([25, S. 30]). We therefore
use internal measures for comparing our approach. Those
internal measures commonly rely on a similarity measure of
the actual data which is being clustered. Thorough work on
metric comparison and similarity measures has been done
[25], [51], [52]. Most of those measures are based on simple
distances and densities computed for each data point but
do not take time dependency into consideration. Because of
VOLUME 11, 2023 18871
J. Köhne et al.: AE-Based Iterative Modeling and MTS Subsequence Clustering Algorithm
this, we found that for the use case described in this paper,
the commonly used clustering evaluation measures are not
well suited for ‘‘time-series clustering evaluation measures’’.
In section Vwe introduce an approach for similarity measures
which considers time dependency in combination with well-
established clustering metrics (see Table 2).
TABLE 2. Metrics used for time-series clustering comparison.
Implementations used from [15].
TABLE 3. Datasets used for time-series clustering comparison.
III. DEFINITIONS AND RESTRICTIONS
In the following section we define in more detail our data,
together with the restrictions of our environment. Considering
online clustering, we can refer to our TSD as continuously
incoming data or streaming data. This data is considered mul-
tivariate when the dimension (number of sensors or features)
of the data stream d>1. When we denote one value of
one feature as x, we have at time step tthe following feature
vector:
xt=(x0,x1,...,xd)Twith x∈Rand d∈N(1)
whereas the natural numbers include zero {0,1,2, . . .}=N.
A complete measurement sequence with nnumber of time
4https://sites.cc.gatech.edu/~borg/ijcv_psslds/
5https://data.nasa.gov/dataset/C-MAPSS-Aircraft-Engine-Simulator-
Data/xaut-bemq
6http://www.timeseriesclassification.com/description.php?Dataset=
EigenWorms
7https://archive.ics.uci.edu/ml/datasets/Condition%20monitoring%20of
%20hydraulic%20systems
8http://mocap.cs.cmu.edu/
9https://github.com/LuisM78/Occupancy-detection-data
steps using xfrom Equation (1) as
X=(x0,x1,...,xn)with n∈N(2)
so, X∈Rd×n. With time dependency consideration, it is
reasonable to denote a sliding window of the streaming data,
considering Equation (1) and nthe number of samples already
collected as:
Wt=xt+0,xt+1,...,xt+ζ⇒(ζ∈N)∧(ζ < n) (3)
Let’s also assume, that within the measurement data X
there exist subsequences Sjwhich satisfy our requirements of
non-overlapping and variable length. For the indexes of our
subsequences, we denote
J={j∈N:j≤u}(4)
where uis the number of identified subsequences in X. The
subsequence then is a continuous sampling from Xfor a small
period of time steps with consecutive data points and the
length of m. The subsequence length is usually much smaller
than the length of the full measurement data m≪n.
Sj=xqj,...,xqj+mj(0 ≤q≤n−m)∧j∈J(5)
For each subsequence with the index jwe have a first time
step index qj=qj,start and a last qj+m=qj,end for which
we do not allow overlapping
∀j∈J∃(qj,start ,qj,end )
⇒(qj,start <qj,end )∧((qj−1,end <qj,start )∧j>0) (6)
This results in our uniquely identified non-overlapping set of
subsequences
S=S0,...,Sjj∈J(7)
Clustering these uniquely identified subsequences results in
recognizing reoccurring subsequences and combining them
into a subset of all subsequences
Ci⊆S(8)
which results in the following cluster set C
C={C0,...,Ci}i∈I(9)
with the cluster index or unique cluster label
I={i∈N:i<n}(10)
For the output of a clustering algorithm at time twe denote
the scalar value ytas our label or designated subsequence
identification. For evaluation purposes a clustering for a time-
series produces a label array yfor all time steps:
y=(y0,y1,...,yn)with y∈Jand n∈N(11)
Furthermore, it is a requirement, that the streaming data pro-
vided can be applied to a numerical differentiation algorithm.
Therefore, a constant sample rate is necessary and in case
of strong noise, filtering or smoothing of the data should
be applied by a preprocessing step. Also, there mustn’t be
missing values and extreme outliers need to be removed.
18872 VOLUME 11, 2023
J. Köhne et al.: AE-Based Iterative Modeling and MTS Subsequence Clustering Algorithm
In our use case we assume that some knowledge about the
incoming data exists, so that an estimate of the variance and
the mean of the variable can be performed for standardization.
IV. DATASETS
All datasets used for comparison in this work are described
briefly in this section and listed in Table 3. They all contain
quantitative features with continuous values. For further use
of the datasets, no missing values exist, the data is continuous
and was standardized for the algorithms but not for the metric
computations. No other preprocessing like smoothening or
filtering was performed.
The bee-waggle dataset [56] contains movement of bees
in a hive captured with a vision-based tracker. The first two
features are the x and y coordinates of the bee added with the
sine and the cosine function applied to the heading angle.
The cmapss dataset is a ‘‘dataset of run-to-failure trajecto-
ries for a small fleet of aircraft engines under realistic flight
conditions’’ [57] with 18 features.
The eigen-worms dataset [58] contains measurements of
worm motion. Preprocessing extracted six features, which
represent the amplitudes along six previously identified base
shapes of the worms
The hydraulic dataset [59] is obtained from a hydraulic
test rig with measuring 17 process values such as pressures,
volume flows and temperatures.
Lorenz-attractor refers to a synthetic dataset which is
calculated using a system of the three coupled ordinary dif-
ferential equations which represent a hydrodynamic system:
˙
X=s(Y−X);˙
Y=rX−Y−XZ;˙
Z=XY −bZ with
parameters used s=10, r=28 and b=2.667 (see Figure 2).
‘‘In these equations Xis proportional to the intensity of the
convective motion, while Y is proportional to the temperature
difference between the ascending and descending currents,
similar signs of Xand Ydenoting that warm fluid is rising,
and cold fluid is descending.’’ [60]
The mocap or The Motion Capture Database (MOCAP)
dataset [61] contains 93 features from human motion cap-
tured with markers.
The occupancy dataset [62] is a measurement of sensory
data in an office with the following sensors: temperature,
humidity, the derived humidity ratio, light and CO2.
The thomas-attractor dataset is as the lorenz-attractor
dataset, a synthetic dataset, computed with the three coupled
differential equations: ˙
X=sin(Y)−bX;˙
Y=sin(Z)−
bY ;˙
Z=sin(X)−bZ originally proposed by [63] with the
parameter used b=0.1615.
V. MULTIVARIATE TIME-SERIES SUB-SEQUENCE
CLUSTERING METRIC (MT3SCM)
As emphasized in section Iand section II, to our knowledge,
none of the existing clustering metrics take into consider-
ation the time space variations like curvature, acceleration
or torsion in a multidimensional space. We believe using
these curve parameters, is an intuitive method to measure
similarities between mechatronic system state changes or
FIGURE 2. Lorenz-attractor dataset. Computed with ˙
X=s(Y−X);
˙
Y=rX −Y−XZ ;˙
Z=XY −bZ and parameters used s=10, r=28 and
b=2.667. Color and marker size indicate amount of curvature on a
logarithmic scale for better visibility.
subsequences in MTSD in general (in regard to the restric-
tions in section III).
Our MT3SCM score consists of three main components.
mt3scm =(ccw+sL+sP)/3 (12)
The weighted curvature consistency (ccw), the silhouette
location based (sL) and the silhouette curve-parameter based
(sP). When making the attempt of clustering TSD, it is sub-
jective and domain specific. Nevertheless, we try to take
the intuitive approach of treating MTSD as space curves
and use the parameterization as a similarity measure. This
is done in two different ways. First, we create new fea-
tures by computing the curve parameters sample by sample
(e.g., curvature, torsion, acceleration) and determine their
standard deviation for each cluster. Our hypothesis is, that
with a low standard deviation of the curve parameters inside
a cluster, the actions of a mechatronic system in this clus-
ter are similar. We call this the curvature consistency (cc)
(see Equation (24) used in 14 in algorithm 1). The sec-
ond procedure is to apply these newly computed features,
which are computed to scalar values per subsequence, onto
a well-established internal clustering metric, the silhouette
score [53] (see Table 2).
The computation of the cc comprises the calculation of the
curvature κand the torsion τat every time step twith xt.
κ(t)=⟨˙
e1(t),e2(t)⟩
∥˙
xt∥(13)
τ(t)=⟨˙
e2(t),e3(t)⟩
∥˙
xt∥(14)
whereas e1is the unit tangent vector (or first Frenet vector),
e2is the unit normal vector (or second Frenet vector) and e3is
the unit binormal vector (or third Frenet vector) which are
VOLUME 11, 2023 18873
J. Köhne et al.: AE-Based Iterative Modeling and MTS Subsequence Clustering Algorithm
defined as:
e1(t)=˙
xt
∥˙
xt∥(15)
e2(t)=¨
xt− ⟨¨
xt,e1(t)⟩ × e1(t) (16)
e2(t)=e2(t)
∥e2(t)∥(17)
e3(t)=...
xt− ⟨...
xt,e1(t)⟩ × e1(t)− ⟨...
xt,e2(t)⟩ × e2(t)
(18)
e3(t)=e3(t)
∥e3(t)∥(19)
From which we can also derive the speed v= ∥˙
xt∥and
the acceleration a= ∥¨
xt∥. Figure 3shows exemplarily the
curvature κ, torsion τ, speed vand acceleration afor the first
part of the thomas-attractor dataset.
FIGURE 3. Qualitative visualization of the (a) curvature κ, (b) torsion τ,
(c) speed vand (d) acceleration acomputed on part of the
thomas-attractor dataset. Color and marker size indicate amount of curve
parameter on a logarithmic scale for better visibility (dark and thin means
low value, bright and thick means high value). Axis labels and colorbar
labels are along the lines of Figure 2.
Afterwards the cc is calculated per cluster i∈I, by taking
the empirical standard deviation for each curve parameter
(exemplarily for κin Equation (20) with the set of subse-
quence indexes Jiwithin our cluster i). The arithmetic mean
(Equation (21)) of the standard deviations for the curvature
κ, torsion τand the acceleration aresults in the final cc per
cluster (see Equation (22)).
σκi=v
u
u
u
t1
Ni−1X
j∈Ji
qj+mj
X
n=qj
(κn−κi)2(20)
σi=σκi+στi+σai
3(21)
cci=1−σiwith cci∈R:cci≤1 (22)
cci=(cci,if cci>−1
−1,if cci≤ −1(23)
The ccw, is directly derived from the cc per cluster, by weight-
ing it with the number of data points per cluster i∈I
Equation (24).
ccw=
n
P
i=1
cci×Ni
n
P
i=1
Ni
(24)
The calculation of the scores sPand sLis different to the
standard estimation of the silhouette score, which is shown
in Equation (25) and originally based on every data point of
the time-series Xand the assigned cluster label array y:
s=f(X,y) (25)
Our sPis the silhouette score derived from our previously
computed curve parameters per subsequence per cluster as
well as the standard deviation of those and the number of data
points per subsequence.
sP=f(e
XsP,yj) (26)
with
e
XsP=
κ11 τ11 a11 σ11 N11
κ12 τ12 a12 σ12 N12
.
.
..
.
..
.
..
.
..
.
.
κ21 τ21 a21 σ21 N21
κ22 τ22 a22 σ22 N22
.
.
..
.
..
.
..
.
..
.
.
κij τij aij σij Nij
,yj=
y11
y12
.
.
.
y21
y22
.
.
.
yij
(27)
The sLuses the silhouette score based on the median value
ˆxdij of a subsequences original feature space per feature d.
sL=f(e
XsL,yj) (28)
with
e
XsL=
ˆx111 ˆx211 . . . ˆxd11 σ11 N11
ˆx112 ˆx212 . . . ˆxd12 σ12 N12
.
.
..
.
..
.
.
ˆx121 ˆx221 . . . ˆxd21 σ21 N21
ˆx122 ˆx222 . . . ˆxd22 σ22 N22
.
.
..
.
..
.
.
ˆx1ij ˆx2ij . . . ˆxdij σij Nij
(29)
The main idea of this approach is to combine three main parts
inside one metric. First incentive is to reward a low standard
deviation of the curve parameters in between a cluster
(accomplished by cc). Second, to benchmark the clusters
spatial separation based on the new feature space (curve
parameters, accomplished by sP). And third, to benchmark
the clusters spatial separation based on the median of the
18874 VOLUME 11, 2023
J. Köhne et al.: AE-Based Iterative Modeling and MTS Subsequence Clustering Algorithm
subsequence in the original feature space (accomplished
by sL). The proposed algorithm for this new metrics compu-
tation is described in algorithm 1.
Algorithm 1 MT3SCM
1: procedure MT3SCM(X,y)▷Data X∈Rd×nand labels
y∈Nn
2: L←empty() ▷Array initialization for all
subsequence median coordinates or Location
3: P←empty() ▷Array initialization for all
subsequence curve Parameters mean values
4: K←GetCurveParametersForAllData(X)
5: yunique ←FindUniqueClusterIDs(y)
6: for iin yunique do
7: Xi←GetClusterData(X,i)
8: s←FindSubsequences(y,i)
9: for jin sdo
10: Xi,j←GetSubsequenceData(Xi,j)
11: L[i,j]←GetMedianLocations(Xi,j)
12: P[i,j]←GetCurveParameterValues(K,i,j)
13: end for
14: cci←ClusterCurvatureConsistency(P)▷
Compute the cluster curvature consistency (cci) with the
empirical standard deviation of each curve parameter
over time. If the cluster consists only of one time step,
set the ccito zero.
15: C[i]←cci▷Collect ccidata for all clusters
16: end for
17: ccw←WeightedAverage(C,npc)▷
Compute weighted average curvature consistency (ccw)
from cciwith number of points per cluster
18: sL←SilhouetteComputation(L,yunique)▷Compute
the silhouette coefficient using the center positions of
each identified subsequence
19: sP←SilhouetteComputation(P,yunique)▷Compute
the silhouette coefficient with the curve parameters
20: score ←(ccw+sL+sP)/3
21: return score ▷The final score
22: end procedure
A. EVALUATION
For computational tests, we manually created a ‘‘perfect’’
synthetic dataset with respect to our metric (see Figure 4).
Figure 4 (a) shows the original synthetic dataset, where the
subsequences in cluster 1 are a helix along the increasing
xaxis. For cluster 2 the subsequences are a straight move-
ment, with quadratic decreasing distances along the yaxis.
Cluster 3 is representing a helix along the decreasing xaxis
but with a different resolution than cluster 1. Cluster 4 is,
along with cluster 2, a straight movement with quadratic
increasing distances along the yaxis. This cycle is repeated
six times. Figure 4 (b) shows the new feature space for the sL
component. The feature space for the sPcomponent is shown
in Figure 4 (c). Applying the new features per subsequence on
the standard metrics, results in best scores for all metrics. This
shows that the new feature space allows a good separation in
contrast to the original space, as proven by the metrics scores
for silhouette, calinski-harabasz and davies-bouldin on the
original and the two new feature spaces. To show the benefit
of the new feature space, we applied the agglomerative clus-
tering10 not on the original lorenz-attractor dataset but on the
newly computed feature space based on curvature, torsion and
acceleration (see Figure 5) The metric values for Figure 5 (b)
show a high ccwand a decent sPvalue for the low number of
10 clusters specified.
To further evaluate our metric, we used the lorenz-attractor
and the thomas-attractor dataset (see Table 3) and applied
an agglomerative clustering, a time-series k-means clustering
as well as a random subsequence clustering. Varying the
number of clusters and some algorithm specific parameters.
Afterwards the metrics calinski-harabasz, davies-bouldin and
silhouette scores were computed and compared to our new
metric MT3SCM. From these results we derived a correlation
matrix (see Figure 6). The cc and the ccware clearly related
due to their direct combination. The positive correlation
between the internal components to the overall MT3SCM
score is obvious. We see a clear positive correlation to the
silhouette score which is evident due to the internal use of
this metric. Interestingly, the correlation between the ccwand
the sPis negative. This is due to the types of datasets and
algorithms we used. Because with higher number of clusters
we theoretically expect a better cc because of the lower stan-
dard deviation by chance. On the other hand, the more clusters
exist, the more likely a similar curve parameter between the
clusters exists and therefore creates a new feature space with
overlapping clusters, which results in a low sPscore. This
can be retraced within the subfigures of Figure 7. The low
correlation between the calinski-harabasz and the davies-
bouldin scores supports our point that the available clustering
metrics are not well suited to be used for time-series cluster-
ing evaluation measures. Figure 7shows examples where the
agglomerative clustering was applied on the lorenz-attractor
dataset (part of the data used for the correlation matrix
Figure 6). It can be seen that the agglomerative clustering on
the original dataset is not an optimal cluster algorithm, when
comparing the metrics to Figure 5 (b). Comparing Figure 5 (b)
and Figure 7 (d) we can see a similar MT3SCM score but very
different standard metrics scores. The similar MT3SCM score
is based on the much higher number of clusters and equally
distributed subsequence length in Figure 7 (d), which results
in a high ccwvalue as well as a good spatial separation (sL),
which is compensating the low sPvalue due to the similar
curve parameters of the clusters. Figure 5 (b) however, also
has a very high ccwvalue with a good sPvalue reaching
a similar MT3SCM score but with a fifth of the number of
clusters. How our metric handles random clustering with
10https://scikit-learn.org/stable/modules/generated/sklearn.cluster.
AgglomerativeClustering.html
VOLUME 11, 2023 18875
J. Köhne et al.: AE-Based Iterative Modeling and MTS Subsequence Clustering Algorithm
FIGURE 4. Synthetic dataset with four clusters with a perfect own metric score of mt3scm =1 due to each cluster’s unique and constant curve
parameters. (a) Synthetic dataset with best own result of mt3scm =1. Standard metrics scores computed with original data; davies-bouldin: 1.4,
calinski-harabasz: 6.9e+02, silhouette: 0.087. (b) New feature space from the centers (median value) of each subsequence. Standard metrics scores
computed with new feature space; davies-bouldin: 6.3e−07, calinski-harabasz: 3.8e+14, silhouette: 1. (c) New feature space from the curve
parameters extracted from each subsequence. Standard metrics scores computed with new feature space; davies-bouldin: 6.8e−07,
calinski-harabasz: 1.2e+13, silhouette: 1.
TABLE 4. Metric values for Figure 7 8 and 5.
critical scenarios, is shown in Figure 8. The Python code and
a more detailed evaluation are publicly available at [64]
B. CONCLUSION
We have described a more suitable similarity measure for
dependent TSD. After showing how to compute our metric
and evaluated on different datasets its use case and effec-
tiveness. Further we will use this metric in addition to the
standard metrics to evaluate our proposed online time-series
clustering algorithm which is described in section VI
VI. CLUSTERING ALGORITHM (ABIMCA)
In this section we describe the concept of our time-series
clustering approach in detail. Afterwards, we apply our algo-
rithm onto the datasets described in section IV and present
the results.
A. METHOD
As described in [23] a key component in a time-series clus-
tering algorithm is the similarity function to quantify the
clustering criteria. Common similarity functions used are
distance measures like euclidean distance or some kind of
correlation coefficients like Pearson’s correlation coefficient.
Those are also used for static data clustering algorithms. More
suitable for time-series clustering are similarity functions
like Dynamic Time Warping (DTW) distance, short time-
series (STS) distance [65] or considering space curves like
we introduced in section V.
In this work we analyzed an approach which is data driven,
based on unsupervised machine learning algorithms and has
online capabilities (see Figure 11). Our approach uses a RNN
based AE to generate scores which are used as similarity
measures. Specifically, the experiments in this work were
performed using a pytorch [66] implementation of a bidi-
rectional one-layer gated recurrent unit (GRU) RNN with a
hidden size of the input dimensions minus one h=d−1.
Other prerequisites regarding the dataset and preprocessing
are described in section IV and section III.
The main procedure of the approach is as follows: The
incoming data is taken as a sliding window Wtat the current
18876 VOLUME 11, 2023
J. Köhne et al.: AE-Based Iterative Modeling and MTS Subsequence Clustering Algorithm
FIGURE 5. Lorenz-attractor dataset with 10 clusters from agglomerative
clustering on the new curve parameters feature space. See Table 4for
metric comparison of the following subplots. (a) New curve parameters
feature space computed from the Lorenz-attractor dataset with labels
from agglomerative clustering (b) Lorenz-attractor dataset with labels
from agglomerative clustering on the new curve parameters feature space
(c) New feature space from the curve parameters extracted from each
subsequence. Standard metrics scores computed with new feature space
(d) New feature space from the centers (median value) of each
subsequence. Standard metrics scores computed with new feature space.
FIGURE 6. Own metric (MT3SCM) correlation analysis. Own metric and its
four subcomponents (curvature consistency (cc), weighted curvature
consistency (ccw), silhouette location based (sL), silhouette
curve-parameter based (sP)) correlation to calinski-harabasz,
davies-bouldin and silhouette score for random, agglomerative and
k-means clustering on lorenz and thomas-attractor dataset.
time twith length ζof past time steps and number of fea-
tures d. This matrix Wt∈Rd×ζis used for the input of, what
we call, the Base Autoencoder (BAE). The key element of our
algorithm is, that this BAE’s parameters are not constant but
being adapted iteratively with a stochastic gradient descent
(SGD) optimization method for each new incoming sliding
window. For this training of the BAE, we use a slight adaption
of the sparse AE loss function Lfrom [67] with a basic
FIGURE 7. Agglomerative clustering from [15] applied on the
lorenz-attractor dataset exemplifies the unique components of our metric
compared to the silhouette calinski-harabasz and davies-bouldin scores.
See Table 4for metric comparison of the following subplots. Subfigures
(a) (b) and (c) all have a similarly low MT3SCM score compared to
Figure 5 (b) but considerably good standard metric scores. Subfigure
(d) can achieve a relatively high MT3SCM score due to the high number of
clusters and the resulting good ccwand sLvalue which compensates the
low sPvalue.
FIGURE 8. Own metric evaluation using random clusterer on
thomas-attractor dataset and lorenz-attractor dataset. See Table 4for
metric comparison of the following subplots (a) Own metric and all of its
subcomponents are around zero, as desired. Calinski-harabasz value is
low and davies-bouldin is high, which also indicate a ‘‘bad’’ clustering
(b) Longer random subsequences also generate a MT3SCM result around
zero. Calinski-harabasz and davies-bouldin scores are stronger influenced
by the subsequence length (c) Own metric and all of its subcomponents
are around zero, as desired. Calinski-harabasz value is low and
davies-bouldin is high, which also indicate a ‘‘bad’’ clustering (d) As seen
for the lorenz-attractor data in (b), longer subsequences have a high
impact on calinski-harabasz and davies-bouldin scores.
regularization term or sparsity penalty
loss =l=L(Wt,e
Wt,h)=MSE(Wt,e
Wt)+(h) (30)
VOLUME 11, 2023 18877
J. Köhne et al.: AE-Based Iterative Modeling and MTS Subsequence Clustering Algorithm
FIGURE 9. Example of online clustering with a simple three-dimensional
synthetic dataset. First row shows the original input data X(or the last
values of each sliding window Wt) with the online cluster IDs as the
background color (blue is unknown or SID =0). Second row shows the
output of the AE or the reconstruction f
W. In the third row the blue line
represents the value of the latent space hb(left axis) and the identified
subsequence ID SID (right axis). The last row indicates the Base
Autoencoder’s (BAE) (sb) as well as the SAEs’ (s1-s4) score values. The
black horizontal line is the subsequence detection score threshold (η)
and the gray line is the subsequence recognition score threshold (ρ).
TABLE 5. Summation of the number of outperformances of each
algorithm for all datasets and all metrics compared to the
mini-batch-kmeans with parameters from hyperparameter search.
where h=f(x) is the encoders output or latent space. The
sparsity penalty we denote as:
(h)=λ·Pd−1
i=1|hi−clc|(31)
with the penalty factor λ=1e−10 and the latent center
constant clc =0.5. The mean squared error (MSE) is
MSE(Wt,e
Wt)=1
d·ζPd
i=1Pζ
j=1(wij −ewij)2(32)
FIGURE 10. Example of batch-wise offline clustering with a simple
three-dimensional synthetic dataset.
TABLE 6. Summation of the number of outperformances of each
algorithm for all datasets and all metrics compared to the
mini-batch-kmeans with default parameters.
which results in the final loss computation
loss =l=1
d·ζPd
i=1Pζ
j=1(wij −ewij)2+λ·Pd−1
i|hi−clc|
(33)
where the first part is the MSE between the input matrix Wt
and the reconstruction e
Wtand the second part is the penalty
of the latent space deviation.
To determine if a subsequence is recognized at the current
time step, we denote the scoring function SF as follows
s=score =SF(l,h)=cfw ·(|clc −h|)+l
cfw (34)
whereas the weighting factor in current implementation is
cfw =1 and latent center constant is clc =0.5. It utilizes the
reconstruction error as well as the deviation of the latent space
with a doubled emphasis on the latent space deviation due
to its dependence in the loss function as well as the scoring
function which includes the loss again (see Equation (34).
In combination with a threshold, the score is used to deter-
mine when a recognizable subsequence is present.
18878 VOLUME 11, 2023
J. Köhne et al.: AE-Based Iterative Modeling and MTS Subsequence Clustering Algorithm
FIGURE 11. Concept of the ABIMCA approach. Sliding window of the MTS Wtis iteratively trained in the base AE. If score of base AE (gray dotted
line) is below threshold (dashed red line), a new subsequence AE is created from the base AE. Incoming data is also compared to existing
subsequence AEs if subsequence can be recognized.
TABLE 7. Best metric ‘metrics.mt3scm’ value for each dataset and algorithm from hyperparameter search results.
TABLE 8. Best metric ‘metrics.calinski-harabasz’ value for each dataset and algorithm from hyperparameter search results.
The bottom row of Figure 9shows, that a subsequence
is present, when the BAE’s score (blue line) is below the
horizontal black line (sb<=η). If a subsequence is present,
a copy of the BAE is made and its parameters are frozen
and associated with this specific pattern of a subsequence.
These copies of the BAE, which we call Subsequence
Autoencoder (SAE), are used to recognize previously seen
subsequences using the same scoring function. A concept
drawing of the approach is shown in Figure 11. The algorithm
is described in pseudocode in algorithm 2.
The functionality can be retraced considering
Figure 9and 10. This example shows the algorithm applied
VOLUME 11, 2023 18879
J. Köhne et al.: AE-Based Iterative Modeling and MTS Subsequence Clustering Algorithm
TABLE 9. Best metric ‘metrics.davies-bouldin’ value for each dataset and algorithm from hyperparameter search results.
TABLE 10. Best metric ‘metrics.silhouette’ value for each dataset and algorithm from hyperparameter search results.
TABLE 11. Metric ‘metrics.mt3scm’ value for each dataset and algorithm from default calibration results.
TABLE 12. Metric ‘metrics.calinski-harabasz’ value for each dataset and algorithm from default calibration results.
on a three-dimensional synthetic data set. The input data
consists of four different operation points with small white
noise. The sequence of the four subsequences is repeated
once. The other rows are described in the caption of Figure 9.
It is evident that the algorithm needs a few time steps to adapt
to the current subsequence until it is recognized as such.
Recognizing a previously identified subsequence, however,
is almost instantaneous. The calibration of the thresholds
η(horizontal black line) and ζ(horizontal gray line) are
apparently crucial. The necessary time steps to adapt to a
current subsequence can be altered by the calibration of the
learning rate αand the number of BAE’s training cycles
18880 VOLUME 11, 2023
J. Köhne et al.: AE-Based Iterative Modeling and MTS Subsequence Clustering Algorithm
TABLE 13. Metric ‘metrics.davies-bouldin’ value for each dataset and algorithm from default calibration results.
TABLE 14. Metric ‘metrics.silhouette’ value for each dataset and algorithm from default calibration results.
FIGURE 12. Empricial complexity estimation with variation of total
number of datapoints and subsequences identified by our algorithm over
the duration. 11.
per time step ω. A faster recognition of a subsequence has
the drawback of the algorithm being very sensitive and
therefore identifying even small changes of the input as a new
subsequence. A strategy could be to calibrate the algorithm
first to be rather insensitive and cluster the time-series in
major subsequences. These can then be further clustered
with a more sensitive calibration. This procedure can be
repeated until the required degree of granularity is achieved.
For a streaming application multiple runs of this procedure
could also be applied in parallel and combined into a
cluster tree.
B. EVALUATION
For the evaluation study of our algorithm, we chose eight
different MTS datasets (see Table 3) from which six are pub-
licly available and two are provided with our codebase [68],
seven other state-of-the-art algorithms (see Table 1) and three
widely used unsupervised clustering metrics (see Table 2).
Each algorithm has been applied to each dataset with default
parameters. Additionally, we performed a hyperparameter
search for each algorithm based on a random grid search of
300 samples. The parameter boundaries for this hyperparam-
eter search are listed in Table 15. Overall, 19 264 experiments
were run.
For a better overview of the results, we chose to compare
every algorithm to the ‘‘MiniBatchKMeans’’ algorithm and
counted the number of times they performed better. Table 5
shows the results for the hyperparameter search and the
number of outperformances of each algorithm compared to
the ‘‘MiniBatchKMeans’’ algorithm. Table 6shows the same
results with default parameter settings for each algorithm.
We can see, that in sum and in two of the metrics our algo-
rithm beats state-of-the-art algorithms. The full list of results
is attached in Table 7and 14. Additionally, Table 16 shows
the best results from the hyperparameter search when sorted
by MT3SCM with the total time spent.11
11Experiments were performed on a Linux machine with a AMD Ryzen
Threadripper 2950X 16-Core Processor using a GeForce RTX 2080 GPU.
VOLUME 11, 2023 18881
J. Köhne et al.: AE-Based Iterative Modeling and MTS Subsequence Clustering Algorithm
Algorithm 2 ABIMCA. Using the Following Parameters:
α: Learning Rate, ω: Number of Base Model Training Cycle
per Step, η: Subsequence Detection Score Threshold, ρ: Sub-
sequence Recognition Score Threshold, Window Length: ζ.
Also, We Denote θSas a List of Subsequence Model Param-
eters, sas an Array of Scores for All Existing Subsequence
Models, e
Wtthe Reconstruction of the Sliding Window Input,
1: procedure Abimca(Wt)▷Sliding window data
Wt∈Rd×ζ
2: Verify calibration t≥ζ∨α > 0∨ω≥1∨η >
0∨ρ > η
3: cS←0▷Initialize subsequence counter
4: θb←Sparse(sparsity =0.1) ▷Initialize base model
parameters
5: while Wtdo ▷New input available
6: for j←1, ω do ▷Iterative base model train
loop
7: e
Wt,hb←Predict(Wt, θb)
8: lb←L(Wt,e
Wt,h)
9: 1θb←Backpropagate(lb)
10: θb←θb+1θb▷Update base model
parameters
11: end for
12: e
Wt,hb←Predict(Wt, θb)
13: sb←SF(lb,hb)
14: s←GetSubsequenceScores(Wt, θS)
15: if min(s)< ρ then ▷Subsequence recognized
16: SID ←arg min(s)▷Set ID to recognized
subsequence index
17: else ▷No subsequence recognized
18: if sb<=ηthen ▷Score below new
subsequence threshold
19: θS.append(θb)▷Append current
base model parameters to list of subsequence models
parameters
20: cs←cs+1▷increase subsequence
counter
21: SID ←cs▷Set ID to new subsequence
index
22: else ▷In transition
23: SID ←0▷Set ID to unknown
24: end if
25: end if
26: end while
27: end procedure
To estimate the complexity of the algorithm additional
experiments were run to substantiate our theoretical consid-
eration for the Big-O notation (see Figure 12). Since it is
an online algorithm that uses a sliding window, the duration
of the algorithm is linearly correlated to the number of data
points to be analyzed. Additionally, with every subsequence
TABLE 15. Hyperparameter random grid search upper and lower bound
for each algorithm and their specific parameter options.
18882 VOLUME 11, 2023
J. Köhne et al.: AE-Based Iterative Modeling and MTS Subsequence Clustering Algorithm
TABLE 16. Metrics from hyperparameter search when sorted for best value of mt3scm metric.
VOLUME 11, 2023 18883
J. Köhne et al.: AE-Based Iterative Modeling and MTS Subsequence Clustering Algorithm
identified, the algorithm checks if the new incoming data
is already known by comparing it to the previously identi-
fied subsequences. A linear correlation of the duration for
every datapoint with the number of subsequences identified
is therefore present. The complexity of our algorithm is
therefore O(NS), where Nis the number of datapoints and
Sis the number of subsequences identified. Considering the
worst case scenario of identifying every new datapoint as
a new subsequence the duration for the algorithm increases
quadratically with the number of datapoints, which results in
a complexity of O(n2).
As cited before, every algorithm performs differently on
the specific distribution of patterns and the hyperparameter
search was a simple random grid search of ‘‘only’’ 300 sam-
ples, so these results unlikely represent the optimal solution
for each algorithm on each dataset. Nevertheless, we demon-
strate, that the algorithm we present in this work, is highly
effective of detecting subsequences online in a MTS.
VII. LIMITATIONS AND DISCUSSION
For the evaluation of the segmentation of MTSD, we intro-
duced a new metric which is based on space-curve parame-
ters in the feature space. Due to the wide variety of fields,
use cases and applications, this falls into place for some
applications and uses cases but not for all. The calculation
of these space-curve parameters is sensitive to outliers and
smoothness and questionable for steady-state conditions or
non-moving point clouds in the feature space. We have imple-
mented specific numerical boundary limits for computing
the derivatives of the data in these states, but it needs to
be considered and evaluated if this is compatible with the
application. Because of the outlier sensitivity we use the mean
value of these parameters as well as their standard deviation.
A low-pass filter for very noisy data should be considered
before applying it to the metric. Attention is called for, when
the data is scaled or standardized. This effects the actual space
curve parameters, since a constant curvature is likely not con-
stant anymore after scaling. The metric also tends to reward
short subsequences who only occur once. Due to the mean
value of the curve parameters the subsequence separation
appears to be good, but the variance of one large subsequence
is high. This needs to be compensated or prevented more
and will be part of future analysis and improvement of the
metric. It might also be reasonable to introduce weighting
factors for the three parts of our metric in Equation (12) to
consider domain specific emphasis on a rather spatial or curve
parameter separation requirement.
Regarding our clustering method, calibration of the main
thresholds (η,ρ) needs special attention. In combination with
the learning rate, they mainly influence the ‘‘sensitivity’’
of the segmentation process. Using only the reconstruction
error or only the representation deviation can be beneficial
for different use cases and complexities (e.g., changing the
weighting factor in Equation (34)). Advantages are that no
additional information about the data for offline clustering
needs to be saved. All necessary information for clustering
data after training is the parameters of the subsequence spe-
cific AEs. It is a completely unsupervised method which can
cluster online data. In the context of CbM the once identified
subsequence AEs can be used for deviation quantification
of the underlying system. This can be used for deterioration
analysis and maintenance strategies. Further investigations
for improving the ABIMCA method would be to explore
different kind of AEs like feedforward neural network (FNN),
CNN or a combination of such. Also, a VAE could be rea-
sonable depending on the underlying process. Future work
should analyze the effect of reducing the latent space dimen-
sion by multiple factors of the input dimension (when input
dimension is very high). This could reduce computation costs
and improve representation learning without performance
loss. A detailed analysis of the optimal default parameters or a
generic automatic calibration depending on some statistics of
the expected input could increase performance and decrease
calibration efforts.
VIII. CONCLUSION
In this paper we have introduced the Autoencoder-based
Iterative Modeling and Subsequence Clustering Algorithm
(ABIMCA) which is a deep learning method to separate
multivariate time-series data (MTSD) into subsequences. It is
beneficial in a variety of fields, to cluster MTSD into smaller
segments or subsequences in an unsupervised manner. The
ability to filter measurement data based on specific subse-
quences can improve downstream development products such
as anomaly detection or machine diagnosis in Condition-
based Maintenance (CbM) strategies. Our algorithm is specif-
ically useful for MTSD generated by a mechatronic system
in a transient environment. It can be used offline as well as
online for streaming data. It utilizes recurrent neural network
(RNN) based Autoencoders (AE) by iteratively training a
Base Autoencoder (BAE), generating a segmentation score
and saving the intermediate parameters of the BAE to rec-
ognize previously identified subsequences. By comparing
our algorithm with seven other algorithms on eight different
publicly available datasets using four different unsupervised
metrics (from which we introduced one ourselves), we have
shown that our algorithm outperforms state-of-the-art algo-
rithms. Our unsupervised metric introduced (Multivariate
Time-Series Sub-Sequence Clustering Metric (MT3SCM)),
is an attempt to use a more intuitive similarity measure based
on the curvature and other space-curve parameters of the
spanned feature space. Additionally, all our code is open
source and publicly available for benchmarking.
REFERENCES
[1] E. Quatrini, F. Costantino, G. Di Gravio, and R. Patriarca, ‘‘Condition-
based maintenance—An extensive literature review,’’ Machines, vol. 8,
no. 2, p. 31, Jun. 2020.
[2] R. Isermann, Fault-Diagnosis Systems: An Introduction From Fault Detec-
tion to Fault Tolerance. Berlin, Germany: Springer, 2006.
[3] G. E. P. Box, G. M. Jenkins, G. C. Reinsel, and G. M. Ljung, Time
Series Analysis: Forecasting and Control (Wiley series in Probability
and Statistics), 5th ed. Hoboken, NJ, USA: Wiley, 2016. [Online].
Available: https://search.ebscohost.com/login.aspx?direct=true&scope=
site&db=nlebk&db=nlabk&AN=1061322
18884 VOLUME 11, 2023
J. Köhne et al.: AE-Based Iterative Modeling and MTS Subsequence Clustering Algorithm
[4] C. M. Bishop, Pattern Recognition and Machine Learning (Information
science and statistics), 1st ed. New York, NY, USA: Springer, 2016.
[5] J. F. Roddick and M. Spiliopoulou, ‘‘A survey of temporal knowledge dis-
covery paradigms and methods,’’ IEEE Trans. Knowl. Data Eng., vol. 14,
no. 4, pp. 750–767, Jul. 2002.
[6] J. Lin, E. Keogh, S. Lonardi, and P. Patel, ‘‘Finding motifs in time series,’’
Proc. 2nd Workshop Temporal Data Mining, 2002, pp. 1–11.
[7] G. J. J. van den Burg and C. K. I. Williams, ‘‘An evaluation of change point
detection algorithms,’’ 2020, arXiv:2003.06222.
[8] A. K. Jain, M. N. Murty, and P. J. Flynn, ‘‘Data clustering: A review,’’ ACM
Comput. Surv., vol. 31, no. 3, pp. 264–323, Sep. 1999.
[9] V. Chandola, A. Banerjee, and V. Kumar, ‘‘Anomaly detection: A survey,’’
ACM Comput. Surv., vol. 41, no. 3, pp. 1–58, Jul. 2009.
[10] R. Agrawal and R. Srikant, ‘‘Mining sequential patterns,’’ in Proc. 11th
Int. Conf. Data Eng., Mar. 1995, pp. 3–14.
[11] D. Xu and Y. Tian, ‘‘A comprehensive survey of clustering algorithms,’’
Ann. Data Sci., vol. 2, no. 2, pp. 165–193, 2015.
[12] M. Lovrić, M. Milanović, and M. Stamenković, ‘‘Algoritmic meth-
ods for segmentation of time series: An overview,’’ J. Contemp.
Econ. Bus. Issues, vol. 1, no. 1, pp. 31–53, 2014. [Online]. Available:
http://hdl.handle.net/10419/147468
[13] S. Torkamani and V. Lohweg, ‘‘Survey on time series motif discovery,’’
Wiley Interdiscipl. Rev., Data Mining Knowl. Discovery, vol. 7, no. 2,
p. e1199, Mar. 2017.
[14] S. Aminikhanghahi and D. J. Cook, ‘‘A survey of methods for time series
change point detection,’’ Knowl. Inf. Syst., vol. 51, no. 2, pp. 339–367,
2017.
[15] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel,
M. Blondel, A. Müller, J. Nothman, G. Louppe, P. Prettenhofer, R. Weiss,
V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher,
M. Perrot, and E. Duchesnay, ‘‘Scikit-learn: Machine learning in Python,’’
J. Mach. Learn. Res., vol. 12, pp. 2825–2830, Oct. 2011.
[16] T. Zhang, R. Ramakrishnan, and M. Livny, ‘‘BIRCH: An efficient data
clustering method for very large databases,’’ in Proc. ACM SIGMOD Int.
Conf. Manage. Data, vol. 25, no. 2, 1996, pp. 103–114.
[17] R. Prescott Adams and D. J. C. MacKay, ‘‘Bayesian online changepoint
detection,’’ 2007, arXiv:0710.3742.
[18] J. Montiel, M. Halford, S. Martiello Mastelini, G. Bolmier, R. Sourty,
R. Vaysse, A. Zouitine, H. Murilo Gomes, J. Read, T. Abdessalem, and
A. Bifet, ‘‘River: Machine learning for streaming data in Python,’’ 2020,
arXiv:2012.04740.
[19] C. C. Aggarwal, P. S. Yu, J. Han, and J. Wang, ‘‘A framework for clustering
evolving data streams,’’ in Proc. VLDB Conf. Elsevier, 2003, pp. 81–92.
[20] M. Hahsler and M. Bolaños, ‘‘Clustering data streams based on shared
density between micro-clusters,’’ IEEE Trans. Knowl. Data Eng., vol. 28,
no. 6, pp. 1449–1461, Jun. 2016.
[21] F. Cao, M. Estert, W. Qian, and A. Zhou, ‘‘Density-based clustering over
an evolving data stream with noise,’’ in Proc. SIAM Int. Conf. Data Mining,
J. Ghosh, D. Lambert, D. Skillicorn, and J. Srivastava, Eds. Philadel-
phia, PA, USA: Society for Industrial and Applied Mathematics, 2006,
pp. 328–339.
[22] D. Sculley, ‘‘Web-scale K-means clustering,’’ in Proc. 19th Int. Conf.
World Wide Web (WWW), New York, NY, USA, M. Rappa, P. Jones,
J. Freire, and S. Chakrabarti, Eds. 2010, p. 1177.
[23] T. W. Liao, ‘‘Clustering of time series data—A survey,’’ Pattern Recognit.,
vol. 38, no. 11, pp. 1857–1874, 2005.
[24] S. Zolhavarieh, S. Aghabozorgi, and Y. W. Teh, ‘‘A review of subsequence
time series clustering,’’ Sci. World J., vol. 2014, Jul. 2014, Art. no. 312521.
[25] S. Aghabozorgi, A. Seyed Shirkhorshidi, and T. Ying Wah, ‘‘Time-series
clustering—A decade review,’’ Inf. Syst., vol. 53, pp. 16–38, Oct. 2015.
[26] M. Carnein and H. Trautmann, ‘‘Optimizing data stream representation:
An extensive survey on stream clustering algorithms,’’ Bus. Inf. Syst. Eng.,
vol. 61, no. 3, pp. 277–297, Jun. 2019.
[27] H.-P. Kriegel, P. Kröger, and A. Zimek, ‘‘Clustering high-dimensional
data,’’ ACM Trans. Knowl. Discovery Data, vol. 3, no. 1, pp. 1–58,
Mar. 2009.
[28] C. Truong, L. Oudre, and N. Vayatis, ‘‘Selective review of offline
change point detection methods,’’ Signal Process., vol. 167, Feb. 2020,
Art. no. 107299.
[29] E. Aljalbout, V. Golkov, Y. Siddiqui, M. Strobel, and D. Cremers,
‘‘Clustering with deep learning: Taxonomy and new methods,’’ 2018,
arXiv:1801.07648.
[30] C. Isaksson, M. H. Dunham, and M. Hahsler, ‘‘SOStream: Self organizing
density-based clustering over data stream,’’ in Machine Learning and
Data Mining in Pattern Recognition (Lecture Notes in Computer Science),
vol. 7376, D. Hutchison, T. Kanade, J. Kittler, J. M. Kleinberg, F. Mattern,
J. C. Mitchell, M. Naor, O. Nierstrasz, C. P. Rangan, B. Steffen, M. Sudan,
D. Terzopoulos, D. Tygar, M. Y. Vardi, G. Weikum, and P. Perner, Eds.
Berlin, Germany: Springer, 2012, pp. 264–278.
[31] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, ‘‘A density-based algo-
rithm for discovering clusters a density-based algorithm for discovering
clusters in large spatial databases with noise,’’ in Proc. 2nd Int. Conf.
Knowl. Discovery Data Mining, 1996, pp. 226–231. [Online]. Available:
http://dl.acm.org/citation.cfm?id=3001460.3001507
[32] L. O’Callaghan, N. Mishra, A. Meyerson, S. Guha, and R. Motwani,
‘‘Streaming-data algorithms for high-quality clustering,’’ in Proc. 18th Int.
Conf. Data Eng., 2002, pp. 685–694.
[33] M. Ghesmoune, M. Lebbah, and H. Azzag, ‘‘State-of-the-art on clustering
data streams,’’ Big Data Analytics, vol. 1, no. 1, Dec. 2016.
[34] E. Keogh, S. Chu, D. Hart, and M. Pazzani, ‘‘Segmenting time series:
A survey and novel approach,’’ in Data Mining in Time Series Databases,
vol. 57, M. Last, A. Kandel, and H. Bunke, Eds. Singapore: World Scien-
tific, 2003, pp. 1–21.
[35] M. Ramoni, P. Sebastiani, and P. Cohen, ‘‘Bayesian clustering by dynam-
ics,’’ Mach. Learn., vol. 47, no. 1, pp. 91–121, 2002.
[36] X. Wang, K. Smith, and R. Hyndman, ‘‘Characteristic-based clustering
for time series data,’’ Data Mining Knowl. Discovery, vol. 13, no. 3,
pp. 335–364, 2006.
[37] R. P. Silva, B. B. Zarpelão, A. Cano, and S. B. Junior, ‘‘Time series segmen-
tation based on stationarity analysis to improve new samples prediction,’’
Sensors, vol. 21, no. 21, p. 7333, Nov. 2021.
[38] S. Lu and S. Huang, ‘‘Segmentation of multivariate industrial time series
data based on dynamic latent variable predictability,’’ IEEE Access, vol. 8,
pp. 112092–112103, 2020.
[39] M. Ceci, R. Corizzo, N. Japkowicz, P. Mignone, and G. Pio, ‘‘ECHAD:
Embedding-based change detection from multivariate time series in smart
grids,’’ IEEE Access, vol. 8, pp. 156053–156066, 2020.
[40] R. Li, S. Li, K. Xu, X. Li, J. Lu, and M. Zeng, ‘‘A novel symmetric stacked
autoencoder for adversarial domain adaptation under variable speed,’’
IEEE Access, vol. 10, pp. 24678–24689, 2022.
[41] M. Baccouche, F. Mamalet, C. Wolf, C. Garcia, and A. Baskurt, ‘‘Spatio-
temporal convolutional sparse auto-encoder for sequence classification,’’
in Proc. Brit. Mach. Vis. Conf., R. Bowden, J. Collomosse, and K. Miko-
lajczyk, Eds. 2012, p. 124.
[42] K. Bascol, R. Emonet, E. Fromont, and J.-M. Odobez, ‘‘Unsupervised
interpretable pattern discovery in time series using autoencoders,’’ in
Structural, Syntactic, and Statistical Pattern Recognition (Lecture Notes
in Computer Science), vol. 10029, A. Robles-Kelly, M. Loog, B. Biggio,
F. Escolano, and R. Wilson, Eds. Cham, Switzerland: Springer, 2016,
pp. 427–438.
[43] S. E. Chazan, S. Gannot, and J. Goldberger, ‘‘Deep clustering based on a
mixture of autoencoders,’’ in Proc. IEEE 29th Int. Workshop Mach. Learn.
Signal Process. (MLSP), Oct. 2019, pp. 1–6.
[44] B. Yang, X. Fu, D. N. Sidiropoulos, and M. Hong, ‘‘Towards
K-means-friendly spaces: Simultaneous deep learning and
clustering,’’ in Proc. 34th Int. Conf. Mach. Learn., D. Precup and
Y. W. Teh, Eds. vol. 70, 2017, pp. 3861–3870. [Online]. Available:
https://proceedings.mlr.press/v70/yang17b.html
[45] Y. Guo, W. Liao, Q. Wang, L. Yu, T. Ji, and P. Li, ‘‘Multidimensional
time series anomaly detection: A gru-based Gaussian mixture variational
autoencoder approach,’’ in Proc. 10th Asian Conf. Mach. Learn., J. Zhu
and I. Takeuchi, Eds. vol. 95, 2018, pp. 97–112. [Online]. Available:
http://proceedings.mlr.press/v95/guo18a.html
[46] T. Chen, X. Liu, B. Xia, W. Wang, and Y. Lai, ‘‘Unsupervised
anomaly detection of industrial robots using sliding-window convolu-
tional variational autoencoder,’’ IEEE Access, vol. 8, pp. 47072–47081,
2020.
[47] W.-H. Lee, J. Ortiz, B. Ko, and R. Lee, ‘‘Time series segmentation through
automatic feature learning,’’ 2018, arXiv:1801.05394.
[48] D. Dua and C. Graff. (2017). UCI Machine Learning Repository. [Online].
Available: http://archive.ics.uci.edu/ml
[49] (Sep. 14, 2020). Data.gov. [Online]. Available: https://data.gov/
[50] C. J. van Rijsbergen, Information Retrieval. London, U.K.: Butterworths,
1979.
VOLUME 11, 2023 18885
J. Köhne et al.: AE-Based Iterative Modeling and MTS Subsequence Clustering Algorithm
[51] H. Kremer, P. Kranen, T. Jansen, T. Seidl, A. Bifet, G. Holmes, and
B. Pfahringer, ‘‘An effective evaluation measure for clustering on evolving
data streams,’’ in Proc. 17th ACM SIGKDD Int. Conf. Knowl. Discov-
ery Data Mining (KDD), New York, NY, USA, C. Apte, J. Ghosh, and
P. Smyth, Eds. 2011, p. 868.
[52] J. Serrá and J. L. Arcos, ‘‘An empirical evaluation of similarity measures
for time series classification,’’ Knowl.-Based Syst., vol. 67, pp. 305–314,
Sep. 2014.
[53] P. J. Rousseeuw, ‘‘Silhouettes: A graphical aid to the interpretation and
validation of cluster analysis,’’ J. Comput. Appl. Math., vol. 20, no. 1,
pp. 53–65, 1987.
[54] T. Caliński and J. Harabasz, ‘‘A dendrite method for cluster analysis,’’
Commun. Stat., Theory Methods, vol. 3, no. 1, pp. 1–27, 1974.
[55] D. L. Davies and D. W. Bouldin, ‘‘A cluster separation measure,’’ IEEE
Trans. Pattern Anal. Mach. Intell., vol. PAMI-1, no. 2, pp. 224–227,
Apr. 1979.
[56] S. M. Oh, J. M. Rehg, T. Balch, and F. Dellaert, ‘‘Learning and inferring
motion patterns using parametric segmental switching linear dynamic
systems,’’ Int. J. Comput. Vis., vol. 77, nos. 1–3, pp. 103–124, May 2008.
[57] M. Arias Chao, C. Kulkarni, K. Goebel, and O. Fink, ‘‘Aircraft engine
Run-to-Failure dataset under real flight conditions for prognostics and
diagnostics,’’ Data, vol. 6, no. 1, p. 5, Jan. 2021.
[58] E. Yemini, T. Jucikas, L. J. Grundy, A. E. X. Brown, and W. R. Schafer,
‘‘A database of caenorhabditis elegans behavioral phenotypes,’’ Nature
Methods, vol. 10, no. 9, pp. 877–879, Sep. 2013.
[59] T. Schneider, N. Helwig, and A. Schütze, ‘‘Automatic feature extraction
and selection for classification of cyclical time series data,’’ Technisches
Messen, vol. 84, no. 3, pp. 198–206, Mar. 2017.
[60] E. N. Lorenz, ‘‘Deterministic nonperiodic flow,’’ J. Atmos. Sci., vol. 20,
no. 2, pp. 130–141, 1963.
[61] (Apr. 21, 2022). Carnegie Mellon University—CMU Graphics Lab—
Motion Capture Library. [Online]. Available: http://mocap.cs.cmu.edu/
[62] L. M. I. Candanedo and V. Feldheim, ‘‘Accurate occupancy detection of
an office room from light, temperature, humidity and Co2measurements
using statistical learning models,’’ Energy Buildings, vol. 112, pp. 28–39,
Jan. 2016.
[63] R. Thomas, ‘‘Deterministic chaos seen in terms of feedback circuits:
Analysis, synthesis, ‘labyrinth chaos,’’’ Int. J. Bifurcation Chaos, vol. 9,
no. 10, pp. 1889–1905, 1999.
[64] J. Köhne. (2022). MT3SCM: Multivariate Time Series Sub-
Sequence Clustering Metric. Python. [Online]. Available:
https://github.com/Jokonu/mt3scm
[65] C. S. Möller-Levet, F. Klawonn, K.-H. Cho, and O. Wolkenhauer, ‘‘Fuzzy
clustering of short time-series and unevenly distributed sampling points,’’
in Advances in Intelligent Data Analysis V (Lecture Notes in Computer Sci-
ence), vol. 2810, G. Goos, J. Hartmanis, J. van Leeuwen, M. R. Berthold,
H.-J. Lenz, E. Bradley, R. Kruse, and C. Borgelt, Eds. Berlin, Germany:
Springer, 2003, pp. 330–340.
[66] A. Paszke et al., ‘‘PyTorch: An imperative style, high-performance deep
learning library,’’ in Proc. Adv. Neural Inf. Process. Syst., vol. 32,
H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché Buc, E. Fox,
and R. Garnett, Eds. Red Hook, NY, USA: Curran Associates, 2019,
pp. 8024–8035. [Online]. Available: http://papers.neurips.cc/paper/9015-
pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
[67] M. Ranzato, C. S. Poultney, S. Chopra, and Y. LeCun, ‘‘Efficient learning
of sparse representations with an energy-based model,’’ in Proc. NIPS,
2006, pp. 1–8.
[68] J. Köhne. (2022). ABIMCA: Autoencoder Based Iterative Modeling
and Subsequence Clustering Algorithm. Python. [Online]. Available:
https://github.com/Jokonu/abimca
JONAS KÖHNE was born in Berlin, Germany,
in 1983. He received the Dipl.-Ing. degree in
energy and process engineering from the Technis-
che Universität Berlin, Germany, in 2014, where
he is currently pursuing the Ph.D. degree with the
Department of Energy and Automation Technol-
ogy. From 2014 to 2015, he worked as a Func-
tion Development Engineer and an Embedded
Software Test Engineer in the automotive field
of electrical power steering with Bertrandt AG.
From 2015 to 2019, he worked as a Function Developer of powertrain and
power engineering with the Department of Commercial Vehicle Electronics,
IAV GmbH. Since 2019, he has been a Research Assistant with the Depart-
ment of Energy and Automation Technology, Technische Universität Berlin.
He is also working as a Data Scientist with the Department Commercial
Vehicle Electronics, IAV GmbH. His research interests include anomaly
and novelty detection, condition/predictive maintenance strategies of mecha-
tronic systems, analysis of multivariate time-series data using autoencoder,
and fault discovery and identification.
LARS HENNING was born in Nauen, Germany,
in 1975. He received the Dipl.-Ing. degree in
energy and process engineering from the Technis-
che Universität Berlin, Germany, in 2002, and the
Ph.D. degree in control engineering, in 2008. Since
2008, he has been working with the Department
of Commercial Vehicle Electronics, IAV GmbH,
in the area of powertrain and power engineering.
Currently, he is a Team Manager of the Software
Development Team, with the focus on condi-
tion/predictive maintenance strategies of mechatronic systems and machine-
learned algorithms for powertrain control.
CLEMENS GÜHMANN was born in Berlin,
Germany, in 1962. He received the Dipl.-Ing.
degree in electrical engineering from the Technis-
che Universität Berlin, Germany, in 1989, and the
Ph.D. degree in pattern recognition and technical
diagnosis, in 1995. From 1989 to 1994, he was
a Research Assistant with the Institute for Gen-
eral Electrical Engineering. From 1994 to 1995,
he worked as a Function Development Engi-
neer with Whirlpool Corporation (Bauknecht).
From 1996 to 2003, he was employed as a Function Development Engi-
neer with IAV GmbH, where he was the last Head of the Department of
Transmission Systems. In 2001, he received a Teaching Assignment with
the Technische Universität Berlin, where he was appointed to a professor-
ship at the Chair of Electronic Measurement and Diagnostic Technology,
in 2003. His research interests include the measurement technology and data
processing and the diagnosis, predictive maintenance, modeling, simulation,
and automatic control of mechatronic systems.
18886 VOLUME 11, 2023