Document [original]

Received 16 January 2023, accepted 13 February 2023, date of publication 22 February 2023, date of current version 28 February 2023.

Digital Object Identifier 10.1109/ACCESS.2023.3247564

Autoencoder-Based Iterative Modeling and

Multivariate Time-Series Subsequence

Clustering Algorithm

JONAS KÖHNE 1,2, LARS HENNING 2, AND CLEMENS GÜHMANN 1

1Chair of Electronic Measurement and Diagnostic Technology, Technische Universität Berlin, 10623 Berlin, Germany

2IAV GmbH, 10587 Berlin, Germany

Corresponding author: Jonas Köhne ([email protected])

ABSTRACT This paper introduces an algorithm for the detection of change-points and the identification

of the corresponding subsequences in transient multivariate time-series data (MTSD). The analysis of such

data has become increasingly important due to growing availability in many industrial fields. Labeling,

sorting or filtering highly transient measurement data for training Condition-based Maintenance (CbM)

models is cumbersome and error-prone. For some applications it can be sufficient to filter measurements

by simple thresholds or finding change-points based on changes in mean value and variation. But a robust

diagnosis of a component within a component group for example, which has a complex non-linear correlation

between multiple sensor values, a simple approach would not be feasible. No meaningful and coherent

measurement data, which could be used for training a CbM model, would emerge. Therefore, we introduce an

algorithm that uses a recurrent neural network (RNN) based Autoencoder (AE) which is iteratively trained on

incoming data. The scoring function uses the reconstruction error and latent space information. A model of

the identified subsequence is saved and used for recognition of repeating subsequences as well as fast offline

clustering. For evaluation, we propose a new similarity measure based on the curvature for a more intuitive

time-series subsequence clustering metric. A comparison with seven other state-of-the-art algorithms and

eight datasets shows the capability and the increased performance of our algorithm to cluster MTSD online

and offline in conjunction with mechatronic systems.

INDEX TERMS Condition-based maintenance, multivariate time-series data, change point detection,

unsupervised clustering, autoencoder, segmentation, subsequence, clustering.

I. INTRODUCTION

In the applications of machine diagnosis of mechatronic sys-

tems and the subfield CbM, all supervised machine learning

methods rely on high-quality labeled data [1], [2]. An option

for a mechatronic system with different operating points is

to measure the main operating points separately and create a

diagnosis method for each of those individually. This requires

a well-structured design and execution of experiments with a

measurement labeling process. In a real world development

environment for mechatronic system, where measurements

are taken either automatically and/or manually and by many

The associate editor coordinating the review of this manuscript and

approving it for publication was Wentao Fan .

individuals and in different hardware and software develop-

ment stages, providing consistently labeled and categorized

data is a challenge.

Automatically labeling and categorizing multivariate time-

series (MTS) data is therefore not only an alleviation but

might be crucial for a successful CbM approach. As described

above, labeled and categorized data is essential for training

a model to represent a mechatronic system in a data driven

approach. In the automotive sector where a lot of measure-

ments occur at different operating points this is especially

important. Some of these measurements are being recorded

on a test bench in standard environment conditions with pre-

defined operating points and for a given time (e.g., Worldwide

harmonized Light Duty Test Cycle (WLTC)). Others can be

18868 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ VOLUME 11, 2023

J. Köhne et al.: AE-Based Iterative Modeling and MTS Subsequence Clustering Algorithm

in idle mode during waiting or preparation time for a longer

trip or other measurements. Also, very transient episodes are

existent (e.g., Real Driving Emissions (RDE)). All of these

measurements do not necessarily have the same calibration of

the underlying mechatronic system. To train a robust model of

the mechatronic system, component group or a single compo-

nent, a big effort has to be put in the design of the experiments

alone, not to mention the experiments themselves. Therefore,

a method of automatically labeling existing measurements is

of advantage. Afterwards an automatic sorting of the labeled

time sequences by statistical methods is possible, to enable a

data driven mechatronic diagnosis approach.

Using advanced unsupervised approaches for CbM allows

the data to be unlabeled (otherwise supervised methods could

be used). In this case the labeling refers to the label of the

condition (mechanical degradation) of the monitored system.

When trying to diagnose mechatronic systems that have many

operating points and are free to transfer in between those or

are capable of totally transient operation modes, then a robust

diagnosis of the actual condition of the mechatronic system

is extremely challenging. An early and reliable (robust) diag-

nosis of a mechatronic system prevents accidents, enables

optimal maintenance and increases uptime of machinery.

Without the knowledge of the current condition of the system,

fault prevention can only be done by predetermined mainte-

nance intervals. Motivation is therefore to monitor the health

condition of the mechatronic system as close as possible,

resulting in the task of separating discrete sensory data into

uniquely identifiable and recognizable segments or subse-

quences. This is beneficial to the performance of anomaly

detection, because if all normally occurring subsequences are

identified, the detection of abnormal or faulty subsequences

is straightforward.

When monitoring the health condition of a mechatronic

system it is state of the art, to manually calibrate specific

release conditions, during which the condition monitoring is

enabled. This is done to exclude operating points which are

very rare, too transient or are just not feasible for drawing

conclusions about the condition of the mechatronic system.

But even restricting the conditions on where to diagnose the

machine (which already is reducing the probability of diag-

nosing the machine at all due to an operating state which is by

chance outside the release conditions) cannot always help to

improve the fault detection, identification and quantification

of its magnitude. For example, in a mechatronic system with a

complex nonlinear dependency of its subcomponents and its

time dependency, ‘‘going in’’ or ‘‘going out’’ of the release

conditions can result in very different system behavior. Com-

paring these two states does not lead to reliable conclusions

for the mechatronic systems health condition.

Therefore, the kind of data sequences used for train-

ing/calibration and validation is crucial for any monitoring

strategy. Manually screening, labeling and sorting data into

comparable sequences is time-consuming, error-prone and

cumbersome. Additionally, this is a decision process which

requires expert and domain knowledge.

The new algorithm which we introduce in this work

is capable of generating subsequence models from online

streaming data which is processed sequentially. Any coherent

subsequence that is identified can be recognized (clustered)

if occurring again. Depending on multiple sensitivity cali-

bration parameters, time-varying data points are associated

and identified as a subsequence. The parameters determine

the volatility or the strength of the affiliation required to be

recognized as one time varying subsequence. These subse-

quence models can also be applied efficiently offline onto

large existing datasets. During this prediction phase, the algo-

rithm provides a vector of subsequence labels which were

recognized as one from the training data. Depending on the

calibration, it can also provide a label for unknown data which

represents a phase where no pattern could be recognized.

Otherwise, it finds the best fitting subsequence and labels it

as that. The approach published in this work is currently only

based on MTS input but could be adapted for a univariate

input. It is a multivariate time-series sub-sequence discovery

and identification method.

Our contribution is a new algorithm for online sub-

sequence clustering of MTSD called ‘‘Autoencoder-based

Iterative Modeling and Subsequence Clustering Algorithm

(ABIMCA)’’ and a new metric to evaluate cluster algo-

rithms focused on this task ‘‘Multivariate Time-Series Sub-

Sequence Clustering Metric (MT3SCM)’’. We compare our

algorithm with

•seven other state-of-the-art algorithms

•eight datasets, from which six are publicly available and

two are provided with our codebase

•three widely used unsupervised clustering metrics

•our own metric (MT3SCM) and its four components

while varying the use of default algorithm parameters with

optimized parameters on each algorithm and dataset via ran-

dom grid search.

II. RELATED WORK

In this section we define the terminology used and its seman-

tics to categorize our work within the large bibliography

existing in this field and provide a selected list of related

works and their ascendancy to this paper.

A. TERMINOLOGY AND SEMANTICS

Numerous possibilities have been described for achieving

our main goal of segmenting discrete time-series sensory

data. Most approaches can be sorted into the following par-

tially overlapping categories: time-series analysis [3], pattern

recognition [4], temporal knowledge discovery [5], motif dis-

covery [6], change-point detection [7], data clustering [8]

or anomaly detection [9]. All those terms refer to methods

or algorithms which could be used directly or indirectly to

achieve our goal. Explicit description of each term or category

can be found in the stated references.

To limit the scope of this work, we focus on data clus-

tering which can be separated into six subcategories by the

VOLUME 11, 2023 18869

J. Köhne et al.: AE-Based Iterative Modeling and MTS Subsequence Clustering Algorithm

FIGURE 1. Combination possibilities of time-series clustering categories.

following groups of two: univariate and multivariate data,

online and offline algorithms, variant and invariant data.

The term ‘‘clustering’’ implies an unsupervised method. The

equivalent supervised method would be called ‘‘classifica-

tion’’. In the relevant literature more and other distinctions

are made, depending on the specific field and context. First,

we describe time-series data (TSD) since this is the data

format we focus on in this paper. Afterwards we explain the

differences between the subcategories.

‘‘A time-series is a sequence of observations taken sequen-

tially in time.’’ [3] We denote a time-series data point as

an observation with exactly one connected timestamp. The

timestamp is not a variable or feature.

1) UNIVARIATE – MULTIVARIATE

If a single value or a scalar is the only variable of the data,

then the data is univariate. It is the most basic format data

can have. Considering TSD, a single temperature sensor with

a timestamp would be univariate. Univariate TSD could also

be interpreted as multivariate data of two dimensions, when

taking the timestamp as another variable or feature.

2) ONLINE – OFFLINE

The differentiation between online and offline algorithms or

analysis of data is crucial. Offline refers to data analysis

that is applied to all the data at once. Measurement data,

for example, is available in one or multiple files or can be

accessed via a previously filled database. Offline algorithms

can therefore iterate and optimize their result based on a

criterion applied to known data. Online analysis on the other

hand, is applied sequentially. The algorithm needs to be

able to function with a criterion that generalizes well with

unknown data. One selection or piece of data can be applied

on the online algorithm without knowing the rest of the data.

This approach cannot be as robust and accurate as an offline

analysis, which is why most methods found in the literature

are offline algorithms. Ideally the online algorithm learns as

new data is provided. For the sake of completeness, however,

it should be noted that some online algorithms have to be

pre-trained offline and some algorithms referred to as offline

can be used sequentially. This depends on the underlying

methods used. An offline algorithms’ purpose is not to be

TABLE 1. Algorithms used for time-series clustering comparison.

used online. Nevertheless, online algorithms can be used

offline.

3) DEPENDENT – INDEPENDENT TSD (TIME VARIANT –

TIME INVARIANT)

Depending on the field of study the specific terminology of

time dependency can differ. We want to emphasize on the

common accepted assumption “An intrinsic feature of a time-

series is that, typically, adjacent observations are dependent”

([3, S. 1]). Time dependency characterizes TSD, where a

consecutive observation has some connection with its prede-

cessor. In some fields a connection is not necessarily given

for two data points in a database that are the closest to each

other regarding their timestamp. The dependency of adjacent

observations is self-evident, when collecting sensor values of

a mechatronic system from an experimental rig, for example.

We therefore use the term ‘‘time dependency’’ in the context

of a dynamical system and extend it to a time-variant system

in the terminology of control systems engineering. Indepen-

dent TSD would be where the variance and the average are

invariant along the time (stationary) and ergodic.

We position our work in the subcategory of online clus-

tering of dependent multivariate time-series data. As of

now we refer to a time-series as a sequence of dependent

observations with a constant sample-rate. A ‘‘discrete series

of consecutive data points’’ as a subset of this time-series is

synonymously referred to as pattern, motif, sequence, operat-

ing state, state change, between change points, subsequence,

episode or segment, among others. In this paper we will use

the term subsequence (using terminology of [10]).

B. ALGORITHMS AND DATASETS

In the relevant literature a diverse number of clustering

algorithms can be found [11]. Due to this fact, most of

the existing literature for reviewing or surveying existing

approaches, attend to a higher-level scope [12], [13], [14].

Fewer are concentrating on time-series clustering [23],

[24], [25], online [26], temporal knowledge discovery [5],

sequential pattern recognition [10], high dimensional

data [27] or change point detection [14], [28]. Current

approaches use deep learning architectures like multilayer

1https://scikit-learn.org/stable/modules/classes.html

2https://facebookresearch.github.io/Kats/

3https://riverml.xyz

18870 VOLUME 11, 2023

J. Köhne et al.: AE-Based Iterative Modeling and MTS Subsequence Clustering Algorithm

perceptron (MLP), convolutional neural network (CNN),

deep belief network (DBN), generative adversarial net-

work (GAN) and variational autoencoder (VAE) among

others [29].

The algorithms we use for comparison are listed in Table 1.

All of these are online clustering algorithms that can be

used for time-series clustering. Implementations are publicly

available in the Python programming language (see library

column in Table 1) and are well established and tested.

The Balanced Iterative Reducing and Clustering using

Hierarchies (BIRCH) [16] algorithm is based on a clustering

features (CF) tree with the CF as a triple of the number of

data points, linear sum and the squared sum. This CF tree

is built dynamically. It was also one of the earliest algo-

rithms capable of online clustering. The Bayesian Online

Change-point Detection (BOCPD) [17] algorithm, as the

name suggests, uses Bayesian methods to detect change-

points (CPs) online. Since this algorithm only detects CPs,

we manipulated the result to be able to interpret every CP

as the beginning of a new cluster. This algorithm starts

in our comparison with the limitation of not being able

to recognize a previously seen cluster. The Stream Clus-

tering Framework (CluStream) [19] algorithm is based on

extended CF from BIRCH, following a k-means algorithm.

The Density-based Stream Clustering (DBSTREAM) [20]

algorithm is based on the Self Organizing density-based clus-

tering over data Stream (SOStream) [30] and uses a shared

density graph to capture the density between micro-clusters.

The Density-Based Clustering over an Evolving Data Stream

with Noise (DenStream) [21] algorithm is an extension of

Density-Based Spatial Clustering of Applications with Noise

(DBSCAN) [31] which uses a damped window model of CF

to create core-micro-clusters and outlier-micro-clusters. The

Mini-Batch K-Means (MiniBatchKMeans) [22] algorithm

proposes “the use of mini-batch optimization for k-means

clustering” ([22]) to improve the k-means optimization prob-

lem. The STREAMKMeans [18] algorithm uses an adaption

of the original STREAM algorithm from [32]. Replacing the

k-median subroutine LSEARCH by an incremental k-means

algorithm. More information and comparison of most of the

used algorithms can be found in [26] and [33]

As described in section I, the focus of this publication

is on the online multivariate time dependent subsequence

clustering using RNN based AE. The number of algorithms

within this scope is limited compared to the number of clus-

tering algorithms in general. The following approaches use

at least some of those prerequisites. Reference [34] empha-

sizes the term segmentation for an offline sliding window

and bottom-up algorithm. Others are converting the time-

series into a Markov chain (MC) and then using a Bayesian

method to cluster the MCs [35], referring to them as episodes.

Here the data needs to be discretized into bins of equal

length. Reference [36] uses manually selected characteristics

(e.g., kurtosis, skewness and frequency) for clustering uni-

variate TSD. Others are using the Augmented Dickey-Fuller

test to evaluate time-series stationarity and perform a segmen-

tation based on this [37]. In [38] dynamic latent variables

from a vector autoregression (VAR) model in combination

with a principal component analysis (PCA) is used for seg-

menting industrial TSD. Reference [39] shows the advantage

of an embedding approach as well, by introducing a PCA

and a Vanilla-AE CP detection method with the restriction

of focusing on multivariate power grid data.

Focused on transfer learning, [40] introduces an adversar-

ial approach for domain adaption using a stacked AE. Offline

convolutional sparse AE used for supervised sequence clas-

sification was done by [41] and adapted by [42] for

unsupervised motif mining. Other AE based papers are, for

example, by [43] who use a mixture of AEs for image and text

clustering. Stacked AE and k-means for offline clustering is

done by [44] without considering time dependency. Showing

the combination of GRU-based AE and MTS for anomaly

detection is done by [45]. Reference [46] applies the sliding

windows approach on CNN-based AE for Anomaly Detec-

tion of Industrial Robots. A similar approach using AE for

MTS segmentation is published in [47]. The focus there is on

change point detection using latent space variables only and

no clustering or identification of the subsequences is done.

Clustering is strongly depending on the data and task pro-

vided: “...; each new clustering algorithm performs slightly

better than the existing ones on a specific distribution of pat-

terns.” ([8, S. 268]). Therefore, we try to apply the algorithms

on multiple different MTS datasets and compute different

metrics for comparison. Large efforts are made for making

datasets available to the scientific community and the public

to improve comparability and reproducibility by universities

or governmental institutions [48], [49]. For this paper we

focus on data with multivariate quantitative features with

continuous values. For a list of the datasets see Table 3

and a brief description is given in section IV. Evaluating

the performance of a clustering algorithm can be done with

two different approaches. If external knowledge about the

ground truth of each data point and its cluster is known,

then so-called external measures can be applied. If no ground

truth is available, internal measures need to suffice. Many

external measures exist, like the well-known F1-score (based

on the effectiveness measure by [50]). With the large number

of data available and working in the context of transient

machine behavior with the focus on finding internal states

of the system, acquiring or providing the ground truth is

time-consuming, error-prone and cumbersome (as described

in section I). “The definition of clusters depends on the user,

the domain, and it is subjective.” ([25, S. 30]). We therefore

use internal measures for comparing our approach. Those

internal measures commonly rely on a similarity measure of

the actual data which is being clustered. Thorough work on

metric comparison and similarity measures has been done

[25], [51], [52]. Most of those measures are based on simple

distances and densities computed for each data point but

do not take time dependency into consideration. Because of

VOLUME 11, 2023 18871

J. Köhne et al.: AE-Based Iterative Modeling and MTS Subsequence Clustering Algorithm

this, we found that for the use case described in this paper,

the commonly used clustering evaluation measures are not

well suited for ‘‘time-series clustering evaluation measures’’.

In section Vwe introduce an approach for similarity measures

which considers time dependency in combination with well-

established clustering metrics (see Table 2).

TABLE 2. Metrics used for time-series clustering comparison.

Implementations used from [15].

TABLE 3. Datasets used for time-series clustering comparison.

III. DEFINITIONS AND RESTRICTIONS

In the following section we define in more detail our data,

together with the restrictions of our environment. Considering

online clustering, we can refer to our TSD as continuously

incoming data or streaming data. This data is considered mul-

tivariate when the dimension (number of sensors or features)

of the data stream d>1. When we denote one value of

one feature as x, we have at time step tthe following feature

vector:

xt=(x0,x1,...,xd)Twith x∈Rand d∈N(1)

whereas the natural numbers include zero {0,1,2, . . .}=N.

A complete measurement sequence with nnumber of time

4https://sites.cc.gatech.edu/~borg/ijcv_psslds/

5https://data.nasa.gov/dataset/C-MAPSS-Aircraft-Engine-Simulator-

Data/xaut-bemq

6http://www.timeseriesclassification.com/description.php?Dataset=

EigenWorms

7https://archive.ics.uci.edu/ml/datasets/Condition%20monitoring%20of

%20hydraulic%20systems

8http://mocap.cs.cmu.edu/

9https://github.com/LuisM78/Occupancy-detection-data

steps using xfrom Equation (1) as

X=(x0,x1,...,xn)with n∈N(2)

so, X∈Rd×n. With time dependency consideration, it is

reasonable to denote a sliding window of the streaming data,

considering Equation (1) and nthe number of samples already

collected as:

Wt=xt+0,xt+1,...,xt+ζ⇒(ζ∈N)∧(ζ < n) (3)

Let’s also assume, that within the measurement data X

there exist subsequences Sjwhich satisfy our requirements of

non-overlapping and variable length. For the indexes of our

subsequences, we denote

J={j∈N:j≤u}(4)

where uis the number of identified subsequences in X. The

subsequence then is a continuous sampling from Xfor a small

period of time steps with consecutive data points and the

length of m. The subsequence length is usually much smaller

than the length of the full measurement data m≪n.

Sj=xqj,...,xqj+mj(0 ≤q≤n−m)∧j∈J(5)

For each subsequence with the index jwe have a first time

step index qj=qj,start and a last qj+m=qj,end for which

we do not allow overlapping

∀j∈J∃(qj,start ,qj,end )

⇒(qj,start <qj,end )∧((qj−1,end <qj,start )∧j>0) (6)

This results in our uniquely identified non-overlapping set of

subsequences

S=S0,...,Sjj∈J(7)

Clustering these uniquely identified subsequences results in

recognizing reoccurring subsequences and combining them

into a subset of all subsequences

Ci⊆S(8)

which results in the following cluster set C

C={C0,...,Ci}i∈I(9)

with the cluster index or unique cluster label

I={i∈N:i<n}(10)

For the output of a clustering algorithm at time twe denote

the scalar value ytas our label or designated subsequence

identification. For evaluation purposes a clustering for a time-

series produces a label array yfor all time steps:

y=(y0,y1,...,yn)with y∈Jand n∈N(11)

Furthermore, it is a requirement, that the streaming data pro-

vided can be applied to a numerical differentiation algorithm.

Therefore, a constant sample rate is necessary and in case

of strong noise, filtering or smoothing of the data should

be applied by a preprocessing step. Also, there mustn’t be

missing values and extreme outliers need to be removed.

18872 VOLUME 11, 2023

J. Köhne et al.: AE-Based Iterative Modeling and MTS Subsequence Clustering Algorithm

In our use case we assume that some knowledge about the

incoming data exists, so that an estimate of the variance and

the mean of the variable can be performed for standardization.

IV. DATASETS

All datasets used for comparison in this work are described

briefly in this section and listed in Table 3. They all contain

quantitative features with continuous values. For further use

of the datasets, no missing values exist, the data is continuous

and was standardized for the algorithms but not for the metric

computations. No other preprocessing like smoothening or

filtering was performed.

The bee-waggle dataset [56] contains movement of bees

in a hive captured with a vision-based tracker. The first two

features are the x and y coordinates of the bee added with the

sine and the cosine function applied to the heading angle.

The cmapss dataset is a ‘‘dataset of run-to-failure trajecto-

ries for a small fleet of aircraft engines under realistic flight

conditions’’ [57] with 18 features.

The eigen-worms dataset [58] contains measurements of

worm motion. Preprocessing extracted six features, which

represent the amplitudes along six previously identified base

shapes of the worms

The hydraulic dataset [59] is obtained from a hydraulic

test rig with measuring 17 process values such as pressures,

volume flows and temperatures.

Lorenz-attractor refers to a synthetic dataset which is

calculated using a system of the three coupled ordinary dif-

ferential equations which represent a hydrodynamic system:

X=s(Y−X);˙

Y=rX−Y−XZ;˙

Z=XY −bZ with

parameters used s=10, r=28 and b=2.667 (see Figure 2).

‘‘In these equations Xis proportional to the intensity of the

convective motion, while Y is proportional to the temperature

difference between the ascending and descending currents,

similar signs of Xand Ydenoting that warm fluid is rising,

and cold fluid is descending.’’ [60]

The mocap or The Motion Capture Database (MOCAP)

dataset [61] contains 93 features from human motion cap-

tured with markers.

The occupancy dataset [62] is a measurement of sensory

data in an office with the following sensors: temperature,

humidity, the derived humidity ratio, light and CO2.

The thomas-attractor dataset is as the lorenz-attractor

dataset, a synthetic dataset, computed with the three coupled

differential equations: ˙

X=sin(Y)−bX;˙

Y=sin(Z)−

bY ;˙

Z=sin(X)−bZ originally proposed by [63] with the

parameter used b=0.1615.

V. MULTIVARIATE TIME-SERIES SUB-SEQUENCE

CLUSTERING METRIC (MT3SCM)

As emphasized in section Iand section II, to our knowledge,

none of the existing clustering metrics take into consider-

ation the time space variations like curvature, acceleration

or torsion in a multidimensional space. We believe using

these curve parameters, is an intuitive method to measure

similarities between mechatronic system state changes or

FIGURE 2. Lorenz-attractor dataset. Computed with ˙

X=s(Y−X);

Y=rX −Y−XZ ;˙

Z=XY −bZ and parameters used s=10, r=28 and

b=2.667. Color and marker size indicate amount of curvature on a

logarithmic scale for better visibility.

subsequences in MTSD in general (in regard to the restric-

tions in section III).

Our MT3SCM score consists of three main components.

mt3scm =(ccw+sL+sP)/3 (12)

The weighted curvature consistency (ccw), the silhouette

location based (sL) and the silhouette curve-parameter based

(sP). When making the attempt of clustering TSD, it is sub-

jective and domain specific. Nevertheless, we try to take

the intuitive approach of treating MTSD as space curves

and use the parameterization as a similarity measure. This

is done in two different ways. First, we create new fea-

tures by computing the curve parameters sample by sample

(e.g., curvature, torsion, acceleration) and determine their

standard deviation for each cluster. Our hypothesis is, that

with a low standard deviation of the curve parameters inside

a cluster, the actions of a mechatronic system in this clus-

ter are similar. We call this the curvature consistency (cc)

(see Equation (24) used in 14 in algorithm 1). The sec-

ond procedure is to apply these newly computed features,

which are computed to scalar values per subsequence, onto

a well-established internal clustering metric, the silhouette

score [53] (see Table 2).

The computation of the cc comprises the calculation of the

curvature κand the torsion τat every time step twith xt.

κ(t)=⟨˙

e1(t),e2(t)⟩

∥˙

xt∥(13)

τ(t)=⟨˙

e2(t),e3(t)⟩

∥˙

xt∥(14)

whereas e1is the unit tangent vector (or first Frenet vector),

e2is the unit normal vector (or second Frenet vector) and e3is

the unit binormal vector (or third Frenet vector) which are

VOLUME 11, 2023 18873

J. Köhne et al.: AE-Based Iterative Modeling and MTS Subsequence Clustering Algorithm

defined as:

e1(t)=˙

∥˙

xt∥(15)

e2(t)=¨

xt− ⟨¨

xt,e1(t)⟩ × e1(t) (16)

e2(t)=e2(t)

∥e2(t)∥(17)

e3(t)=...

xt− ⟨...

xt,e1(t)⟩ × e1(t)− ⟨...

xt,e2(t)⟩ × e2(t)

(18)

e3(t)=e3(t)

∥e3(t)∥(19)

From which we can also derive the speed v= ∥˙

xt∥and

the acceleration a= ∥¨

xt∥. Figure 3shows exemplarily the

curvature κ, torsion τ, speed vand acceleration afor the first

part of the thomas-attractor dataset.

FIGURE 3. Qualitative visualization of the (a) curvature κ, (b) torsion τ,

thomas-attractor dataset. Color and marker size indicate amount of curve

parameter on a logarithmic scale for better visibility (dark and thin means

low value, bright and thick means high value). Axis labels and colorbar

labels are along the lines of Figure 2.

Afterwards the cc is calculated per cluster i∈I, by taking

the empirical standard deviation for each curve parameter

(exemplarily for κin Equation (20) with the set of subse-

quence indexes Jiwithin our cluster i). The arithmetic mean

(Equation (21)) of the standard deviations for the curvature

κ, torsion τand the acceleration aresults in the final cc per

cluster (see Equation (22)).

σκi=v

Ni−1X

j∈Ji

qj+mj

n=qj

(κn−κi)2(20)

σi=σκi+στi+σai

3(21)

cci=1−σiwith cci∈R:cci≤1 (22)

cci=(cci,if cci>−1

−1,if cci≤ −1(23)

The ccw, is directly derived from the cc per cluster, by weight-

ing it with the number of data points per cluster i∈I

Equation (24).

ccw=

i=1

cci×Ni

i=1

(24)

The calculation of the scores sPand sLis different to the

standard estimation of the silhouette score, which is shown

in Equation (25) and originally based on every data point of

the time-series Xand the assigned cluster label array y:

s=f(X,y) (25)

Our sPis the silhouette score derived from our previously

computed curve parameters per subsequence per cluster as

well as the standard deviation of those and the number of data

points per subsequence.

sP=f(e

XsP,yj) (26)

with

XsP=







κ11 τ11 a11 σ11 N11

κ12 τ12 a12 σ12 N12

κ21 τ21 a21 σ21 N21

κ22 τ22 a22 σ22 N22

κij τij aij σij Nij







,yj=







y11

y12

y21

y22

yij







(27)

The sLuses the silhouette score based on the median value

ˆxdij of a subsequences original feature space per feature d.

sL=f(e

XsL,yj) (28)

with

XsL=







ˆx111 ˆx211 . . . ˆxd11 σ11 N11

ˆx112 ˆx212 . . . ˆxd12 σ12 N12

ˆx121 ˆx221 . . . ˆxd21 σ21 N21

ˆx122 ˆx222 . . . ˆxd22 σ22 N22

ˆx1ij ˆx2ij . . . ˆxdij σij Nij







(29)

The main idea of this approach is to combine three main parts

inside one metric. First incentive is to reward a low standard

deviation of the curve parameters in between a cluster

(accomplished by cc). Second, to benchmark the clusters

spatial separation based on the new feature space (curve

parameters, accomplished by sP). And third, to benchmark

the clusters spatial separation based on the median of the

18874 VOLUME 11, 2023

J. Köhne et al.: AE-Based Iterative Modeling and MTS Subsequence Clustering Algorithm

subsequence in the original feature space (accomplished

by sL). The proposed algorithm for this new metrics compu-

tation is described in algorithm 1.

Algorithm 1 MT3SCM

1: procedure MT3SCM(X,y)▷Data X∈Rd×nand labels

y∈Nn

2: L←empty() ▷Array initialization for all

subsequence median coordinates or Location

3: P←empty() ▷Array initialization for all

subsequence curve Parameters mean values

4: K←GetCurveParametersForAllData(X)

5: yunique ←FindUniqueClusterIDs(y)

6: for iin yunique do

7: Xi←GetClusterData(X,i)

8: s←FindSubsequences(y,i)

9: for jin sdo

10: Xi,j←GetSubsequenceData(Xi,j)

11: L[i,j]←GetMedianLocations(Xi,j)

12: P[i,j]←GetCurveParameterValues(K,i,j)

13: end for

14: cci←ClusterCurvatureConsistency(P)▷

Compute the cluster curvature consistency (cci) with the

empirical standard deviation of each curve parameter

over time. If the cluster consists only of one time step,

set the ccito zero.

15: C[i]←cci▷Collect ccidata for all clusters

16: end for

17: ccw←WeightedAverage(C,npc)▷

Compute weighted average curvature consistency (ccw)

from cciwith number of points per cluster

18: sL←SilhouetteComputation(L,yunique)▷Compute

the silhouette coefficient using the center positions of

each identified subsequence

19: sP←SilhouetteComputation(P,yunique)▷Compute

the silhouette coefficient with the curve parameters

20: score ←(ccw+sL+sP)/3

21: return score ▷The final score

22: end procedure

A. EVALUATION

For computational tests, we manually created a ‘‘perfect’’

synthetic dataset with respect to our metric (see Figure 4).

Figure 4 (a) shows the original synthetic dataset, where the

subsequences in cluster 1 are a helix along the increasing

xaxis. For cluster 2 the subsequences are a straight move-

ment, with quadratic decreasing distances along the yaxis.

Cluster 3 is representing a helix along the decreasing xaxis

but with a different resolution than cluster 1. Cluster 4 is,

along with cluster 2, a straight movement with quadratic

increasing distances along the yaxis. This cycle is repeated

six times. Figure 4 (b) shows the new feature space for the sL

component. The feature space for the sPcomponent is shown

in Figure 4 (c). Applying the new features per subsequence on

the standard metrics, results in best scores for all metrics. This

shows that the new feature space allows a good separation in

contrast to the original space, as proven by the metrics scores

for silhouette, calinski-harabasz and davies-bouldin on the

original and the two new feature spaces. To show the benefit

of the new feature space, we applied the agglomerative clus-

tering10 not on the original lorenz-attractor dataset but on the

newly computed feature space based on curvature, torsion and

acceleration (see Figure 5) The metric values for Figure 5 (b)

show a high ccwand a decent sPvalue for the low number of

10 clusters specified.

To further evaluate our metric, we used the lorenz-attractor

and the thomas-attractor dataset (see Table 3) and applied

an agglomerative clustering, a time-series k-means clustering

as well as a random subsequence clustering. Varying the

number of clusters and some algorithm specific parameters.

Afterwards the metrics calinski-harabasz, davies-bouldin and

silhouette scores were computed and compared to our new

metric MT3SCM. From these results we derived a correlation

matrix (see Figure 6). The cc and the ccware clearly related

due to their direct combination. The positive correlation

between the internal components to the overall MT3SCM

score is obvious. We see a clear positive correlation to the

silhouette score which is evident due to the internal use of

this metric. Interestingly, the correlation between the ccwand

the sPis negative. This is due to the types of datasets and

algorithms we used. Because with higher number of clusters

we theoretically expect a better cc because of the lower stan-

dard deviation by chance. On the other hand, the more clusters

exist, the more likely a similar curve parameter between the

clusters exists and therefore creates a new feature space with

overlapping clusters, which results in a low sPscore. This

can be retraced within the subfigures of Figure 7. The low

correlation between the calinski-harabasz and the davies-

bouldin scores supports our point that the available clustering

metrics are not well suited to be used for time-series cluster-

ing evaluation measures. Figure 7shows examples where the

agglomerative clustering was applied on the lorenz-attractor

dataset (part of the data used for the correlation matrix

Figure 6). It can be seen that the agglomerative clustering on

the original dataset is not an optimal cluster algorithm, when

comparing the metrics to Figure 5 (b). Comparing Figure 5 (b)

and Figure 7 (d) we can see a similar MT3SCM score but very

different standard metrics scores. The similar MT3SCM score

is based on the much higher number of clusters and equally

distributed subsequence length in Figure 7 (d), which results

in a high ccwvalue as well as a good spatial separation (sL),

which is compensating the low sPvalue due to the similar

curve parameters of the clusters. Figure 5 (b) however, also

has a very high ccwvalue with a good sPvalue reaching

a similar MT3SCM score but with a fifth of the number of

clusters. How our metric handles random clustering with

10https://scikit-learn.org/stable/modules/generated/sklearn.cluster.

AgglomerativeClustering.html

VOLUME 11, 2023 18875

J. Köhne et al.: AE-Based Iterative Modeling and MTS Subsequence Clustering Algorithm

FIGURE 4. Synthetic dataset with four clusters with a perfect own metric score of mt3scm =1 due to each cluster’s unique and constant curve

parameters. (a) Synthetic dataset with best own result of mt3scm =1. Standard metrics scores computed with original data; davies-bouldin: 1.4,

calinski-harabasz: 6.9e+02, silhouette: 0.087. (b) New feature space from the centers (median value) of each subsequence. Standard metrics scores

computed with new feature space; davies-bouldin: 6.3e−07, calinski-harabasz: 3.8e+14, silhouette: 1. (c) New feature space from the curve

parameters extracted from each subsequence. Standard metrics scores computed with new feature space; davies-bouldin: 6.8e−07,

calinski-harabasz: 1.2e+13, silhouette: 1.

TABLE 4. Metric values for Figure 7 8 and 5.

critical scenarios, is shown in Figure 8. The Python code and

a more detailed evaluation are publicly available at [64]

B. CONCLUSION

We have described a more suitable similarity measure for

dependent TSD. After showing how to compute our metric

and evaluated on different datasets its use case and effec-

tiveness. Further we will use this metric in addition to the

standard metrics to evaluate our proposed online time-series

clustering algorithm which is described in section VI

VI. CLUSTERING ALGORITHM (ABIMCA)

In this section we describe the concept of our time-series

clustering approach in detail. Afterwards, we apply our algo-

rithm onto the datasets described in section IV and present

the results.

A. METHOD

As described in [23] a key component in a time-series clus-

tering algorithm is the similarity function to quantify the

clustering criteria. Common similarity functions used are

distance measures like euclidean distance or some kind of

correlation coefficients like Pearson’s correlation coefficient.

Those are also used for static data clustering algorithms. More

suitable for time-series clustering are similarity functions

like Dynamic Time Warping (DTW) distance, short time-

series (STS) distance [65] or considering space curves like

we introduced in section V.

In this work we analyzed an approach which is data driven,

based on unsupervised machine learning algorithms and has

online capabilities (see Figure 11). Our approach uses a RNN

based AE to generate scores which are used as similarity

measures. Specifically, the experiments in this work were

performed using a pytorch [66] implementation of a bidi-

rectional one-layer gated recurrent unit (GRU) RNN with a

hidden size of the input dimensions minus one h=d−1.

Other prerequisites regarding the dataset and preprocessing

are described in section IV and section III.

The main procedure of the approach is as follows: The

incoming data is taken as a sliding window Wtat the current

18876 VOLUME 11, 2023

J. Köhne et al.: AE-Based Iterative Modeling and MTS Subsequence Clustering Algorithm

FIGURE 5. Lorenz-attractor dataset with 10 clusters from agglomerative

clustering on the new curve parameters feature space. See Table 4for

metric comparison of the following subplots. (a) New curve parameters

feature space computed from the Lorenz-attractor dataset with labels

from agglomerative clustering (b) Lorenz-attractor dataset with labels

from agglomerative clustering on the new curve parameters feature space

subsequence. Standard metrics scores computed with new feature space

(d) New feature space from the centers (median value) of each

subsequence. Standard metrics scores computed with new feature space.

FIGURE 6. Own metric (MT3SCM) correlation analysis. Own metric and its

four subcomponents (curvature consistency (cc), weighted curvature

consistency (ccw), silhouette location based (sL), silhouette

curve-parameter based (sP)) correlation to calinski-harabasz,

davies-bouldin and silhouette score for random, agglomerative and

k-means clustering on lorenz and thomas-attractor dataset.

time twith length ζof past time steps and number of fea-

tures d. This matrix Wt∈Rd×ζis used for the input of, what

we call, the Base Autoencoder (BAE). The key element of our

algorithm is, that this BAE’s parameters are not constant but

being adapted iteratively with a stochastic gradient descent

(SGD) optimization method for each new incoming sliding

window. For this training of the BAE, we use a slight adaption

of the sparse AE loss function Lfrom [67] with a basic

FIGURE 7. Agglomerative clustering from [15] applied on the

lorenz-attractor dataset exemplifies the unique components of our metric

compared to the silhouette calinski-harabasz and davies-bouldin scores.

See Table 4for metric comparison of the following subplots. Subfigures

(a) (b) and (c) all have a similarly low MT3SCM score compared to

Figure 5 (b) but considerably good standard metric scores. Subfigure

(d) can achieve a relatively high MT3SCM score due to the high number of

clusters and the resulting good ccwand sLvalue which compensates the

low sPvalue.

FIGURE 8. Own metric evaluation using random clusterer on

thomas-attractor dataset and lorenz-attractor dataset. See Table 4for

metric comparison of the following subplots (a) Own metric and all of its

subcomponents are around zero, as desired. Calinski-harabasz value is

low and davies-bouldin is high, which also indicate a ‘‘bad’’ clustering

(b) Longer random subsequences also generate a MT3SCM result around

zero. Calinski-harabasz and davies-bouldin scores are stronger influenced

by the subsequence length (c) Own metric and all of its subcomponents

are around zero, as desired. Calinski-harabasz value is low and

davies-bouldin is high, which also indicate a ‘‘bad’’ clustering (d) As seen

for the lorenz-attractor data in (b), longer subsequences have a high

impact on calinski-harabasz and davies-bouldin scores.

regularization term or sparsity penalty 

loss =l=L(Wt,e

Wt,h)=MSE(Wt,e

Wt)+(h) (30)

VOLUME 11, 2023 18877

J. Köhne et al.: AE-Based Iterative Modeling and MTS Subsequence Clustering Algorithm

FIGURE 9. Example of online clustering with a simple three-dimensional

synthetic dataset. First row shows the original input data X(or the last

values of each sliding window Wt) with the online cluster IDs as the

background color (blue is unknown or SID =0). Second row shows the

output of the AE or the reconstruction f

W. In the third row the blue line

represents the value of the latent space hb(left axis) and the identified

subsequence ID SID (right axis). The last row indicates the Base

Autoencoder’s (BAE) (sb) as well as the SAEs’ (s1-s4) score values. The

black horizontal line is the subsequence detection score threshold (η)

and the gray line is the subsequence recognition score threshold (ρ).

TABLE 5. Summation of the number of outperformances of each

algorithm for all datasets and all metrics compared to the

mini-batch-kmeans with parameters from hyperparameter search.

where h=f(x) is the encoders output or latent space. The

sparsity penalty we denote as:

(h)=λ·Pd−1

i=1|hi−clc|(31)

with the penalty factor λ=1e−10 and the latent center

constant clc =0.5. The mean squared error (MSE) is

MSE(Wt,e

Wt)=1

d·ζPd

i=1Pζ

j=1(wij −ewij)2(32)

FIGURE 10. Example of batch-wise offline clustering with a simple

three-dimensional synthetic dataset.

TABLE 6. Summation of the number of outperformances of each

algorithm for all datasets and all metrics compared to the

mini-batch-kmeans with default parameters.

which results in the final loss computation

loss =l=1

d·ζPd

i=1Pζ

j=1(wij −ewij)2+λ·Pd−1

i|hi−clc|

(33)

where the first part is the MSE between the input matrix Wt

and the reconstruction e

Wtand the second part is the penalty

of the latent space deviation.

To determine if a subsequence is recognized at the current

time step, we denote the scoring function SF as follows

s=score =SF(l,h)=cfw ·(|clc −h|)+l

cfw (34)

whereas the weighting factor in current implementation is

cfw =1 and latent center constant is clc =0.5. It utilizes the

reconstruction error as well as the deviation of the latent space

with a doubled emphasis on the latent space deviation due

to its dependence in the loss function as well as the scoring

function which includes the loss again (see Equation (34).

In combination with a threshold, the score is used to deter-

mine when a recognizable subsequence is present.

18878 VOLUME 11, 2023

J. Köhne et al.: AE-Based Iterative Modeling and MTS Subsequence Clustering Algorithm

FIGURE 11. Concept of the ABIMCA approach. Sliding window of the MTS Wtis iteratively trained in the base AE. If score of base AE (gray dotted

line) is below threshold (dashed red line), a new subsequence AE is created from the base AE. Incoming data is also compared to existing

subsequence AEs if subsequence can be recognized.

TABLE 7. Best metric ‘metrics.mt3scm’ value for each dataset and algorithm from hyperparameter search results.

TABLE 8. Best metric ‘metrics.calinski-harabasz’ value for each dataset and algorithm from hyperparameter search results.

The bottom row of Figure 9shows, that a subsequence

is present, when the BAE’s score (blue line) is below the

horizontal black line (sb<=η). If a subsequence is present,

a copy of the BAE is made and its parameters are frozen

and associated with this specific pattern of a subsequence.

These copies of the BAE, which we call Subsequence

Autoencoder (SAE), are used to recognize previously seen

subsequences using the same scoring function. A concept

drawing of the approach is shown in Figure 11. The algorithm

is described in pseudocode in algorithm 2.

The functionality can be retraced considering

Figure 9and 10. This example shows the algorithm applied

VOLUME 11, 2023 18879

J. Köhne et al.: AE-Based Iterative Modeling and MTS Subsequence Clustering Algorithm

TABLE 9. Best metric ‘metrics.davies-bouldin’ value for each dataset and algorithm from hyperparameter search results.

TABLE 10. Best metric ‘metrics.silhouette’ value for each dataset and algorithm from hyperparameter search results.

TABLE 11. Metric ‘metrics.mt3scm’ value for each dataset and algorithm from default calibration results.

TABLE 12. Metric ‘metrics.calinski-harabasz’ value for each dataset and algorithm from default calibration results.

on a three-dimensional synthetic data set. The input data

consists of four different operation points with small white

noise. The sequence of the four subsequences is repeated

once. The other rows are described in the caption of Figure 9.

It is evident that the algorithm needs a few time steps to adapt

to the current subsequence until it is recognized as such.

Recognizing a previously identified subsequence, however,

is almost instantaneous. The calibration of the thresholds

η(horizontal black line) and ζ(horizontal gray line) are

apparently crucial. The necessary time steps to adapt to a

current subsequence can be altered by the calibration of the

learning rate αand the number of BAE’s training cycles

18880 VOLUME 11, 2023

J. Köhne et al.: AE-Based Iterative Modeling and MTS Subsequence Clustering Algorithm

TABLE 13. Metric ‘metrics.davies-bouldin’ value for each dataset and algorithm from default calibration results.

TABLE 14. Metric ‘metrics.silhouette’ value for each dataset and algorithm from default calibration results.

FIGURE 12. Empricial complexity estimation with variation of total

number of datapoints and subsequences identified by our algorithm over

the duration. 11.

per time step ω. A faster recognition of a subsequence has

the drawback of the algorithm being very sensitive and

therefore identifying even small changes of the input as a new

subsequence. A strategy could be to calibrate the algorithm

first to be rather insensitive and cluster the time-series in

major subsequences. These can then be further clustered

with a more sensitive calibration. This procedure can be

repeated until the required degree of granularity is achieved.

For a streaming application multiple runs of this procedure

could also be applied in parallel and combined into a

cluster tree.

B. EVALUATION

For the evaluation study of our algorithm, we chose eight

different MTS datasets (see Table 3) from which six are pub-

licly available and two are provided with our codebase [68],

seven other state-of-the-art algorithms (see Table 1) and three

widely used unsupervised clustering metrics (see Table 2).

Each algorithm has been applied to each dataset with default

parameters. Additionally, we performed a hyperparameter

search for each algorithm based on a random grid search of

300 samples. The parameter boundaries for this hyperparam-

eter search are listed in Table 15. Overall, 19 264 experiments

were run.

For a better overview of the results, we chose to compare

every algorithm to the ‘‘MiniBatchKMeans’’ algorithm and

counted the number of times they performed better. Table 5

shows the results for the hyperparameter search and the

number of outperformances of each algorithm compared to

the ‘‘MiniBatchKMeans’’ algorithm. Table 6shows the same

results with default parameter settings for each algorithm.

We can see, that in sum and in two of the metrics our algo-

rithm beats state-of-the-art algorithms. The full list of results

is attached in Table 7and 14. Additionally, Table 16 shows

the best results from the hyperparameter search when sorted

by MT3SCM with the total time spent.11

11Experiments were performed on a Linux machine with a AMD Ryzen

Threadripper 2950X 16-Core Processor using a GeForce RTX 2080 GPU.

VOLUME 11, 2023 18881

J. Köhne et al.: AE-Based Iterative Modeling and MTS Subsequence Clustering Algorithm

Algorithm 2 ABIMCA. Using the Following Parameters:

α: Learning Rate, ω: Number of Base Model Training Cycle

per Step, η: Subsequence Detection Score Threshold, ρ: Sub-

sequence Recognition Score Threshold, Window Length: ζ.

Also, We Denote θSas a List of Subsequence Model Param-

eters, sas an Array of Scores for All Existing Subsequence

Models, e

Wtthe Reconstruction of the Sliding Window Input,

1: procedure Abimca(Wt)▷Sliding window data

Wt∈Rd×ζ

2: Verify calibration t≥ζ∨α > 0∨ω≥1∨η >

0∨ρ > η

3: cS←0▷Initialize subsequence counter

4: θb←Sparse(sparsity =0.1) ▷Initialize base model

parameters

5: while Wtdo ▷New input available

6: for j←1, ω do ▷Iterative base model train

loop

7: e

Wt,hb←Predict(Wt, θb)

8: lb←L(Wt,e

Wt,h)

9: 1θb←Backpropagate(lb)

10: θb←θb+1θb▷Update base model

parameters

11: end for

12: e

Wt,hb←Predict(Wt, θb)

13: sb←SF(lb,hb)

14: s←GetSubsequenceScores(Wt, θS)

15: if min(s)< ρ then ▷Subsequence recognized

16: SID ←arg min(s)▷Set ID to recognized

subsequence index

17: else ▷No subsequence recognized

18: if sb<=ηthen ▷Score below new

subsequence threshold

19: θS.append(θb)▷Append current

base model parameters to list of subsequence models

parameters

20: cs←cs+1▷increase subsequence

counter

21: SID ←cs▷Set ID to new subsequence

index

22: else ▷In transition

23: SID ←0▷Set ID to unknown

24: end if

25: end if

26: end while

27: end procedure

To estimate the complexity of the algorithm additional

experiments were run to substantiate our theoretical consid-

eration for the Big-O notation (see Figure 12). Since it is

an online algorithm that uses a sliding window, the duration

of the algorithm is linearly correlated to the number of data

points to be analyzed. Additionally, with every subsequence

TABLE 15. Hyperparameter random grid search upper and lower bound

for each algorithm and their specific parameter options.

18882 VOLUME 11, 2023

J. Köhne et al.: AE-Based Iterative Modeling and MTS Subsequence Clustering Algorithm

TABLE 16. Metrics from hyperparameter search when sorted for best value of mt3scm metric.

VOLUME 11, 2023 18883

J. Köhne et al.: AE-Based Iterative Modeling and MTS Subsequence Clustering Algorithm

identified, the algorithm checks if the new incoming data

is already known by comparing it to the previously identi-

fied subsequences. A linear correlation of the duration for

every datapoint with the number of subsequences identified

is therefore present. The complexity of our algorithm is

therefore O(NS), where Nis the number of datapoints and

Sis the number of subsequences identified. Considering the

worst case scenario of identifying every new datapoint as

a new subsequence the duration for the algorithm increases

quadratically with the number of datapoints, which results in

a complexity of O(n2).

As cited before, every algorithm performs differently on

the specific distribution of patterns and the hyperparameter

search was a simple random grid search of ‘‘only’’ 300 sam-

ples, so these results unlikely represent the optimal solution

for each algorithm on each dataset. Nevertheless, we demon-

strate, that the algorithm we present in this work, is highly

effective of detecting subsequences online in a MTS.

VII. LIMITATIONS AND DISCUSSION

For the evaluation of the segmentation of MTSD, we intro-

duced a new metric which is based on space-curve parame-

ters in the feature space. Due to the wide variety of fields,

use cases and applications, this falls into place for some

applications and uses cases but not for all. The calculation

of these space-curve parameters is sensitive to outliers and

smoothness and questionable for steady-state conditions or

non-moving point clouds in the feature space. We have imple-

mented specific numerical boundary limits for computing

the derivatives of the data in these states, but it needs to

be considered and evaluated if this is compatible with the

application. Because of the outlier sensitivity we use the mean

value of these parameters as well as their standard deviation.

A low-pass filter for very noisy data should be considered

before applying it to the metric. Attention is called for, when

the data is scaled or standardized. This effects the actual space

curve parameters, since a constant curvature is likely not con-

stant anymore after scaling. The metric also tends to reward

short subsequences who only occur once. Due to the mean

value of the curve parameters the subsequence separation

appears to be good, but the variance of one large subsequence

is high. This needs to be compensated or prevented more

and will be part of future analysis and improvement of the

metric. It might also be reasonable to introduce weighting

factors for the three parts of our metric in Equation (12) to

consider domain specific emphasis on a rather spatial or curve

parameter separation requirement.

Regarding our clustering method, calibration of the main

thresholds (η,ρ) needs special attention. In combination with

the learning rate, they mainly influence the ‘‘sensitivity’’

of the segmentation process. Using only the reconstruction

error or only the representation deviation can be beneficial

for different use cases and complexities (e.g., changing the

weighting factor in Equation (34)). Advantages are that no

additional information about the data for offline clustering

needs to be saved. All necessary information for clustering

data after training is the parameters of the subsequence spe-

cific AEs. It is a completely unsupervised method which can

cluster online data. In the context of CbM the once identified

subsequence AEs can be used for deviation quantification

of the underlying system. This can be used for deterioration

analysis and maintenance strategies. Further investigations

for improving the ABIMCA method would be to explore

different kind of AEs like feedforward neural network (FNN),

CNN or a combination of such. Also, a VAE could be rea-

sonable depending on the underlying process. Future work

should analyze the effect of reducing the latent space dimen-

sion by multiple factors of the input dimension (when input

dimension is very high). This could reduce computation costs

and improve representation learning without performance

loss. A detailed analysis of the optimal default parameters or a

generic automatic calibration depending on some statistics of

the expected input could increase performance and decrease

calibration efforts.

VIII. CONCLUSION

In this paper we have introduced the Autoencoder-based

Iterative Modeling and Subsequence Clustering Algorithm

(ABIMCA) which is a deep learning method to separate

multivariate time-series data (MTSD) into subsequences. It is

beneficial in a variety of fields, to cluster MTSD into smaller

segments or subsequences in an unsupervised manner. The

ability to filter measurement data based on specific subse-

quences can improve downstream development products such

as anomaly detection or machine diagnosis in Condition-

based Maintenance (CbM) strategies. Our algorithm is specif-

ically useful for MTSD generated by a mechatronic system

in a transient environment. It can be used offline as well as

online for streaming data. It utilizes recurrent neural network

(RNN) based Autoencoders (AE) by iteratively training a

Base Autoencoder (BAE), generating a segmentation score

and saving the intermediate parameters of the BAE to rec-

ognize previously identified subsequences. By comparing

our algorithm with seven other algorithms on eight different

publicly available datasets using four different unsupervised

metrics (from which we introduced one ourselves), we have

shown that our algorithm outperforms state-of-the-art algo-

rithms. Our unsupervised metric introduced (Multivariate

Time-Series Sub-Sequence Clustering Metric (MT3SCM)),

is an attempt to use a more intuitive similarity measure based

on the curvature and other space-curve parameters of the

spanned feature space. Additionally, all our code is open

source and publicly available for benchmarking.

REFERENCES

[1] E. Quatrini, F. Costantino, G. Di Gravio, and R. Patriarca, ‘‘Condition-

based maintenance—An extensive literature review,’’ Machines, vol. 8,

no. 2, p. 31, Jun. 2020.

[2] R. Isermann, Fault-Diagnosis Systems: An Introduction From Fault Detec-

tion to Fault Tolerance. Berlin, Germany: Springer, 2006.

[3] G. E. P. Box, G. M. Jenkins, G. C. Reinsel, and G. M. Ljung, Time

Series Analysis: Forecasting and Control (Wiley series in Probability

and Statistics), 5th ed. Hoboken, NJ, USA: Wiley, 2016. [Online].

Available: https://search.ebscohost.com/login.aspx?direct=true&scope=

site&db=nlebk&db=nlabk&AN=1061322

18884 VOLUME 11, 2023

J. Köhne et al.: AE-Based Iterative Modeling and MTS Subsequence Clustering Algorithm

[4] C. M. Bishop, Pattern Recognition and Machine Learning (Information

science and statistics), 1st ed. New York, NY, USA: Springer, 2016.

[5] J. F. Roddick and M. Spiliopoulou, ‘‘A survey of temporal knowledge dis-

covery paradigms and methods,’’ IEEE Trans. Knowl. Data Eng., vol. 14,

no. 4, pp. 750–767, Jul. 2002.

[6] J. Lin, E. Keogh, S. Lonardi, and P. Patel, ‘‘Finding motifs in time series,’’

Proc. 2nd Workshop Temporal Data Mining, 2002, pp. 1–11.

[7] G. J. J. van den Burg and C. K. I. Williams, ‘‘An evaluation of change point

detection algorithms,’’ 2020, arXiv:2003.06222.

[8] A. K. Jain, M. N. Murty, and P. J. Flynn, ‘‘Data clustering: A review,’’ ACM

Comput. Surv., vol. 31, no. 3, pp. 264–323, Sep. 1999.

[9] V. Chandola, A. Banerjee, and V. Kumar, ‘‘Anomaly detection: A survey,’’

ACM Comput. Surv., vol. 41, no. 3, pp. 1–58, Jul. 2009.

[10] R. Agrawal and R. Srikant, ‘‘Mining sequential patterns,’’ in Proc. 11th

Int. Conf. Data Eng., Mar. 1995, pp. 3–14.

[11] D. Xu and Y. Tian, ‘‘A comprehensive survey of clustering algorithms,’’

Ann. Data Sci., vol. 2, no. 2, pp. 165–193, 2015.

[12] M. Lovrić, M. Milanović, and M. Stamenković, ‘‘Algoritmic meth-

ods for segmentation of time series: An overview,’’ J. Contemp.

Econ. Bus. Issues, vol. 1, no. 1, pp. 31–53, 2014. [Online]. Available:

http://hdl.handle.net/10419/147468

[13] S. Torkamani and V. Lohweg, ‘‘Survey on time series motif discovery,’’

Wiley Interdiscipl. Rev., Data Mining Knowl. Discovery, vol. 7, no. 2,

p. e1199, Mar. 2017.

[14] S. Aminikhanghahi and D. J. Cook, ‘‘A survey of methods for time series

change point detection,’’ Knowl. Inf. Syst., vol. 51, no. 2, pp. 339–367,

2017.

[15] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel,

M. Blondel, A. Müller, J. Nothman, G. Louppe, P. Prettenhofer, R. Weiss,

V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher,

M. Perrot, and E. Duchesnay, ‘‘Scikit-learn: Machine learning in Python,’’

J. Mach. Learn. Res., vol. 12, pp. 2825–2830, Oct. 2011.

[16] T. Zhang, R. Ramakrishnan, and M. Livny, ‘‘BIRCH: An efficient data

clustering method for very large databases,’’ in Proc. ACM SIGMOD Int.

Conf. Manage. Data, vol. 25, no. 2, 1996, pp. 103–114.

[17] R. Prescott Adams and D. J. C. MacKay, ‘‘Bayesian online changepoint

detection,’’ 2007, arXiv:0710.3742.

[18] J. Montiel, M. Halford, S. Martiello Mastelini, G. Bolmier, R. Sourty,

R. Vaysse, A. Zouitine, H. Murilo Gomes, J. Read, T. Abdessalem, and

A. Bifet, ‘‘River: Machine learning for streaming data in Python,’’ 2020,

arXiv:2012.04740.

[19] C. C. Aggarwal, P. S. Yu, J. Han, and J. Wang, ‘‘A framework for clustering

evolving data streams,’’ in Proc. VLDB Conf. Elsevier, 2003, pp. 81–92.

[20] M. Hahsler and M. Bolaños, ‘‘Clustering data streams based on shared

density between micro-clusters,’’ IEEE Trans. Knowl. Data Eng., vol. 28,

no. 6, pp. 1449–1461, Jun. 2016.

[21] F. Cao, M. Estert, W. Qian, and A. Zhou, ‘‘Density-based clustering over

an evolving data stream with noise,’’ in Proc. SIAM Int. Conf. Data Mining,

J. Ghosh, D. Lambert, D. Skillicorn, and J. Srivastava, Eds. Philadel-

phia, PA, USA: Society for Industrial and Applied Mathematics, 2006,

pp. 328–339.

[22] D. Sculley, ‘‘Web-scale K-means clustering,’’ in Proc. 19th Int. Conf.

World Wide Web (WWW), New York, NY, USA, M. Rappa, P. Jones,

J. Freire, and S. Chakrabarti, Eds. 2010, p. 1177.

[23] T. W. Liao, ‘‘Clustering of time series data—A survey,’’ Pattern Recognit.,

vol. 38, no. 11, pp. 1857–1874, 2005.

[24] S. Zolhavarieh, S. Aghabozorgi, and Y. W. Teh, ‘‘A review of subsequence

time series clustering,’’ Sci. World J., vol. 2014, Jul. 2014, Art. no. 312521.

[25] S. Aghabozorgi, A. Seyed Shirkhorshidi, and T. Ying Wah, ‘‘Time-series

clustering—A decade review,’’ Inf. Syst., vol. 53, pp. 16–38, Oct. 2015.

[26] M. Carnein and H. Trautmann, ‘‘Optimizing data stream representation:

An extensive survey on stream clustering algorithms,’’ Bus. Inf. Syst. Eng.,

vol. 61, no. 3, pp. 277–297, Jun. 2019.

[27] H.-P. Kriegel, P. Kröger, and A. Zimek, ‘‘Clustering high-dimensional

data,’’ ACM Trans. Knowl. Discovery Data, vol. 3, no. 1, pp. 1–58,

Mar. 2009.

[28] C. Truong, L. Oudre, and N. Vayatis, ‘‘Selective review of offline

change point detection methods,’’ Signal Process., vol. 167, Feb. 2020,

Art. no. 107299.

[29] E. Aljalbout, V. Golkov, Y. Siddiqui, M. Strobel, and D. Cremers,

‘‘Clustering with deep learning: Taxonomy and new methods,’’ 2018,

arXiv:1801.07648.

[30] C. Isaksson, M. H. Dunham, and M. Hahsler, ‘‘SOStream: Self organizing

density-based clustering over data stream,’’ in Machine Learning and

Data Mining in Pattern Recognition (Lecture Notes in Computer Science),

vol. 7376, D. Hutchison, T. Kanade, J. Kittler, J. M. Kleinberg, F. Mattern,

J. C. Mitchell, M. Naor, O. Nierstrasz, C. P. Rangan, B. Steffen, M. Sudan,

D. Terzopoulos, D. Tygar, M. Y. Vardi, G. Weikum, and P. Perner, Eds.

Berlin, Germany: Springer, 2012, pp. 264–278.

[31] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, ‘‘A density-based algo-

rithm for discovering clusters a density-based algorithm for discovering

clusters in large spatial databases with noise,’’ in Proc. 2nd Int. Conf.

Knowl. Discovery Data Mining, 1996, pp. 226–231. [Online]. Available:

http://dl.acm.org/citation.cfm?id=3001460.3001507

[32] L. O’Callaghan, N. Mishra, A. Meyerson, S. Guha, and R. Motwani,

‘‘Streaming-data algorithms for high-quality clustering,’’ in Proc. 18th Int.

Conf. Data Eng., 2002, pp. 685–694.

[33] M. Ghesmoune, M. Lebbah, and H. Azzag, ‘‘State-of-the-art on clustering

data streams,’’ Big Data Analytics, vol. 1, no. 1, Dec. 2016.

[34] E. Keogh, S. Chu, D. Hart, and M. Pazzani, ‘‘Segmenting time series:

A survey and novel approach,’’ in Data Mining in Time Series Databases,

vol. 57, M. Last, A. Kandel, and H. Bunke, Eds. Singapore: World Scien-

tific, 2003, pp. 1–21.

[35] M. Ramoni, P. Sebastiani, and P. Cohen, ‘‘Bayesian clustering by dynam-

ics,’’ Mach. Learn., vol. 47, no. 1, pp. 91–121, 2002.

[36] X. Wang, K. Smith, and R. Hyndman, ‘‘Characteristic-based clustering

for time series data,’’ Data Mining Knowl. Discovery, vol. 13, no. 3,

pp. 335–364, 2006.

[37] R. P. Silva, B. B. Zarpelão, A. Cano, and S. B. Junior, ‘‘Time series segmen-

tation based on stationarity analysis to improve new samples prediction,’’

Sensors, vol. 21, no. 21, p. 7333, Nov. 2021.

[38] S. Lu and S. Huang, ‘‘Segmentation of multivariate industrial time series

data based on dynamic latent variable predictability,’’ IEEE Access, vol. 8,

pp. 112092–112103, 2020.

[39] M. Ceci, R. Corizzo, N. Japkowicz, P. Mignone, and G. Pio, ‘‘ECHAD:

Embedding-based change detection from multivariate time series in smart

grids,’’ IEEE Access, vol. 8, pp. 156053–156066, 2020.

[40] R. Li, S. Li, K. Xu, X. Li, J. Lu, and M. Zeng, ‘‘A novel symmetric stacked

autoencoder for adversarial domain adaptation under variable speed,’’

IEEE Access, vol. 10, pp. 24678–24689, 2022.

[41] M. Baccouche, F. Mamalet, C. Wolf, C. Garcia, and A. Baskurt, ‘‘Spatio-

temporal convolutional sparse auto-encoder for sequence classification,’’

in Proc. Brit. Mach. Vis. Conf., R. Bowden, J. Collomosse, and K. Miko-

lajczyk, Eds. 2012, p. 124.

[42] K. Bascol, R. Emonet, E. Fromont, and J.-M. Odobez, ‘‘Unsupervised

interpretable pattern discovery in time series using autoencoders,’’ in

Structural, Syntactic, and Statistical Pattern Recognition (Lecture Notes

in Computer Science), vol. 10029, A. Robles-Kelly, M. Loog, B. Biggio,

F. Escolano, and R. Wilson, Eds. Cham, Switzerland: Springer, 2016,

pp. 427–438.

[43] S. E. Chazan, S. Gannot, and J. Goldberger, ‘‘Deep clustering based on a

mixture of autoencoders,’’ in Proc. IEEE 29th Int. Workshop Mach. Learn.

Signal Process. (MLSP), Oct. 2019, pp. 1–6.

[44] B. Yang, X. Fu, D. N. Sidiropoulos, and M. Hong, ‘‘Towards

K-means-friendly spaces: Simultaneous deep learning and

clustering,’’ in Proc. 34th Int. Conf. Mach. Learn., D. Precup and

Y. W. Teh, Eds. vol. 70, 2017, pp. 3861–3870. [Online]. Available:

https://proceedings.mlr.press/v70/yang17b.html

[45] Y. Guo, W. Liao, Q. Wang, L. Yu, T. Ji, and P. Li, ‘‘Multidimensional

time series anomaly detection: A gru-based Gaussian mixture variational

autoencoder approach,’’ in Proc. 10th Asian Conf. Mach. Learn., J. Zhu

and I. Takeuchi, Eds. vol. 95, 2018, pp. 97–112. [Online]. Available:

http://proceedings.mlr.press/v95/guo18a.html

[46] T. Chen, X. Liu, B. Xia, W. Wang, and Y. Lai, ‘‘Unsupervised

anomaly detection of industrial robots using sliding-window convolu-

tional variational autoencoder,’’ IEEE Access, vol. 8, pp. 47072–47081,

2020.

[47] W.-H. Lee, J. Ortiz, B. Ko, and R. Lee, ‘‘Time series segmentation through

automatic feature learning,’’ 2018, arXiv:1801.05394.

[48] D. Dua and C. Graff. (2017). UCI Machine Learning Repository. [Online].

Available: http://archive.ics.uci.edu/ml

[49] (Sep. 14, 2020). Data.gov. [Online]. Available: https://data.gov/

[50] C. J. van Rijsbergen, Information Retrieval. London, U.K.: Butterworths,

1979.

VOLUME 11, 2023 18885

J. Köhne et al.: AE-Based Iterative Modeling and MTS Subsequence Clustering Algorithm

[51] H. Kremer, P. Kranen, T. Jansen, T. Seidl, A. Bifet, G. Holmes, and

B. Pfahringer, ‘‘An effective evaluation measure for clustering on evolving

data streams,’’ in Proc. 17th ACM SIGKDD Int. Conf. Knowl. Discov-

ery Data Mining (KDD), New York, NY, USA, C. Apte, J. Ghosh, and

P. Smyth, Eds. 2011, p. 868.

[52] J. Serrá and J. L. Arcos, ‘‘An empirical evaluation of similarity measures

for time series classification,’’ Knowl.-Based Syst., vol. 67, pp. 305–314,

Sep. 2014.

[53] P. J. Rousseeuw, ‘‘Silhouettes: A graphical aid to the interpretation and

validation of cluster analysis,’’ J. Comput. Appl. Math., vol. 20, no. 1,

pp. 53–65, 1987.

[54] T. Caliński and J. Harabasz, ‘‘A dendrite method for cluster analysis,’’

Commun. Stat., Theory Methods, vol. 3, no. 1, pp. 1–27, 1974.

[55] D. L. Davies and D. W. Bouldin, ‘‘A cluster separation measure,’’ IEEE

Trans. Pattern Anal. Mach. Intell., vol. PAMI-1, no. 2, pp. 224–227,

Apr. 1979.

[56] S. M. Oh, J. M. Rehg, T. Balch, and F. Dellaert, ‘‘Learning and inferring

motion patterns using parametric segmental switching linear dynamic

systems,’’ Int. J. Comput. Vis., vol. 77, nos. 1–3, pp. 103–124, May 2008.

[57] M. Arias Chao, C. Kulkarni, K. Goebel, and O. Fink, ‘‘Aircraft engine

Run-to-Failure dataset under real flight conditions for prognostics and

diagnostics,’’ Data, vol. 6, no. 1, p. 5, Jan. 2021.

[58] E. Yemini, T. Jucikas, L. J. Grundy, A. E. X. Brown, and W. R. Schafer,

‘‘A database of caenorhabditis elegans behavioral phenotypes,’’ Nature

Methods, vol. 10, no. 9, pp. 877–879, Sep. 2013.

[59] T. Schneider, N. Helwig, and A. Schütze, ‘‘Automatic feature extraction

and selection for classification of cyclical time series data,’’ Technisches

Messen, vol. 84, no. 3, pp. 198–206, Mar. 2017.

[60] E. N. Lorenz, ‘‘Deterministic nonperiodic flow,’’ J. Atmos. Sci., vol. 20,

no. 2, pp. 130–141, 1963.

[61] (Apr. 21, 2022). Carnegie Mellon University—CMU Graphics Lab—

Motion Capture Library. [Online]. Available: http://mocap.cs.cmu.edu/

[62] L. M. I. Candanedo and V. Feldheim, ‘‘Accurate occupancy detection of

an office room from light, temperature, humidity and Co2measurements

using statistical learning models,’’ Energy Buildings, vol. 112, pp. 28–39,

Jan. 2016.

[63] R. Thomas, ‘‘Deterministic chaos seen in terms of feedback circuits:

Analysis, synthesis, ‘labyrinth chaos,’’’ Int. J. Bifurcation Chaos, vol. 9,

no. 10, pp. 1889–1905, 1999.

[64] J. Köhne. (2022). MT3SCM: Multivariate Time Series Sub-

Sequence Clustering Metric. Python. [Online]. Available:

https://github.com/Jokonu/mt3scm

[65] C. S. Möller-Levet, F. Klawonn, K.-H. Cho, and O. Wolkenhauer, ‘‘Fuzzy

clustering of short time-series and unevenly distributed sampling points,’’

in Advances in Intelligent Data Analysis V (Lecture Notes in Computer Sci-

ence), vol. 2810, G. Goos, J. Hartmanis, J. van Leeuwen, M. R. Berthold,

H.-J. Lenz, E. Bradley, R. Kruse, and C. Borgelt, Eds. Berlin, Germany:

Springer, 2003, pp. 330–340.

[66] A. Paszke et al., ‘‘PyTorch: An imperative style, high-performance deep

learning library,’’ in Proc. Adv. Neural Inf. Process. Syst., vol. 32,

H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché Buc, E. Fox,

and R. Garnett, Eds. Red Hook, NY, USA: Curran Associates, 2019,

pp. 8024–8035. [Online]. Available: http://papers.neurips.cc/paper/9015-

pytorch-an-imperative-style-high-performance-deep-learning-library.pdf

[67] M. Ranzato, C. S. Poultney, S. Chopra, and Y. LeCun, ‘‘Efficient learning

of sparse representations with an energy-based model,’’ in Proc. NIPS,

2006, pp. 1–8.

[68] J. Köhne. (2022). ABIMCA: Autoencoder Based Iterative Modeling

and Subsequence Clustering Algorithm. Python. [Online]. Available:

https://github.com/Jokonu/abimca

JONAS KÖHNE was born in Berlin, Germany,

in 1983. He received the Dipl.-Ing. degree in

energy and process engineering from the Technis-

che Universität Berlin, Germany, in 2014, where

he is currently pursuing the Ph.D. degree with the

Department of Energy and Automation Technol-

ogy. From 2014 to 2015, he worked as a Func-

tion Development Engineer and an Embedded

Software Test Engineer in the automotive field

of electrical power steering with Bertrandt AG.

From 2015 to 2019, he worked as a Function Developer of powertrain and

power engineering with the Department of Commercial Vehicle Electronics,

IAV GmbH. Since 2019, he has been a Research Assistant with the Depart-

ment of Energy and Automation Technology, Technische Universität Berlin.

He is also working as a Data Scientist with the Department Commercial

Vehicle Electronics, IAV GmbH. His research interests include anomaly

and novelty detection, condition/predictive maintenance strategies of mecha-

tronic systems, analysis of multivariate time-series data using autoencoder,

and fault discovery and identification.

LARS HENNING was born in Nauen, Germany,

in 1975. He received the Dipl.-Ing. degree in

energy and process engineering from the Technis-

che Universität Berlin, Germany, in 2002, and the

Ph.D. degree in control engineering, in 2008. Since

2008, he has been working with the Department

of Commercial Vehicle Electronics, IAV GmbH,

in the area of powertrain and power engineering.

Currently, he is a Team Manager of the Software

Development Team, with the focus on condi-

tion/predictive maintenance strategies of mechatronic systems and machine-

learned algorithms for powertrain control.

CLEMENS GÜHMANN was born in Berlin,

Germany, in 1962. He received the Dipl.-Ing.

degree in electrical engineering from the Technis-

che Universität Berlin, Germany, in 1989, and the

Ph.D. degree in pattern recognition and technical

diagnosis, in 1995. From 1989 to 1994, he was

a Research Assistant with the Institute for Gen-

eral Electrical Engineering. From 1994 to 1995,

he worked as a Function Development Engi-

neer with Whirlpool Corporation (Bauknecht).

From 1996 to 2003, he was employed as a Function Development Engi-

neer with IAV GmbH, where he was the last Head of the Department of

Transmission Systems. In 2001, he received a Teaching Assignment with

the Technische Universität Berlin, where he was appointed to a professor-

ship at the Chair of Electronic Measurement and Diagnostic Technology,

in 2003. His research interests include the measurement technology and data

processing and the diagnosis, predictive maintenance, modeling, simulation,

and automatic control of mechatronic systems.

18886 VOLUME 11, 2023