
Vol.:(0123456789)
Algorithmica (2021) 83:1–44
https://doi.org/10.1007/s00453-020-00746-y
1 3
Parameterized Dynamic Cluster Editing
JunjieLuo1,2· HendrikMolter3· AndréNichterlein3· RolfNiedermeier3
Received: 12 December 2018 / Accepted: 3 July 2020 / Published online: 25 July 2020
© The Author(s) 2020
Abstract
We introduce a dynamic version of the NP-hard graph modification problem
Clus
-
ter editing
. The essential point here is to take into account dynamically evolving
input graphs: having a cluster graph (that is, a disjoint union of cliques) constituting
a solution for a first input graph, can we cost-efficiently transform it into a “simi-
lar” cluster graph that is a solution for a second (“subsequent”) input graph? This
model is motivated by several application scenarios, including incremental cluster-
ing, the search for compromise clusterings, or also local search in graph-based data
clustering. We thoroughly study six problem variants (three modification scenarios
edge editing, edge deletion, edge insertion; each combined with two distance meas-
ures between cluster graphs). We obtain both fixed-parameter tractability as well
as (parameterized) hardness results, thus (except for three open questions) provid-
ing a fairly complete picture of the parameterized computational complexity land-
scape under the two perhaps most natural parameterizations: the distances of the
new “similar” cluster graph to (1)the second input graph and to (2)the input cluster
graph.
Keywords Graph-based data clustering· Incremental clustering· Compromise
clustering· Correlation clustering· Local search· Goal-oriented clustering·
NP-hard problems· Fixed-parameter tractability· Parameterized complexity·
Kernelization· Multi-choice knapsack
An extended abstract of this work appears in the proceedings of the 38th IARCS Annual Conference
on Foundations of Software Technology and Theoretical Computer Science (FSTTCS’18)[35].
Unfortunately, the conference version contains a claim whose unpublished proof contained an error
(seeTable1). This full version contains all proof details.
* Hendrik Molter
Extended author information available on the last page of the article

2
Algorithmica (2021) 83:1–44
1 3
1 Introduction
The NP-hard
Cluster editing
problem[6, 41], also known as
Correlation Cluster
-
ing
[5], is one of the most popular graph-based data clustering problems in algorithmics.
Given an undirected graph, the task is to transform it into a disjoint union of cliques
(also known as a cluster graph) by performing a minimum number of edge modifica-
tions (deletions or insertions). Being NP-hard,
Cluster editing
gained high popular-
ity in studies concerning parameterized algorithmics, e.g.[1, 4, 9, 11, 14, 23, 26, 29,
33]. To the best of our knowledge, to date these parameterized studies mostly focus on
a “static scenario”. Chen etal.[14] are an exception by also looking at temporal and
multilayer graphs. In their work, the input is a set of graphs (multilayer) or an ordered
list of graphs (temporal), in both cases defined over the same vertex set. The goal is to
transform each input graph into a cluster graph such that, in the multilayer case, the
number of vertices in which any two cluster graphs may differ is upper-bounded, and in
the temporal case, the number of vertices in which any two consecutive (with respect to
their positions in the list) cluster graphs may differ is upper-bounded.
In contrast to the work of[14], we do not assume that all future changes are
known. We consider the scenario where, given an input graph, we only know
changes that lie immediately ahead, that is, we know the “new” graph that the input
graph changes to. Thus we seek to efficiently and effectively adapt an existing solu-
tion, namely a cluster graph. Motivated by the assumption that the “new” cluster
graph should only change moderately but still be a valid representation of the data,
we parameterize both on the number of edits necessary to obtain the “new” clus-
ter graph and on the difference between the “old” and the “new” cluster graph. We
finally remark that there have been previous parameterized studies of dynamic (or
incremental) graph problems, dealing e.g. with coloring[30], domination[3, 19], and
vertex deletion[2, 34] problems.
1.1 Mathematical Model
In principle, the input for a dynamic version of a static problemX are two instancesI
and
I′
ofX, a solutionS forI, and an integerd. The task is to find a solution
S′
for
I′
such
that the distance betweenS and
S′
is upper-bounded byd. Often, there is an additional
constraint on the size of
S′
. Moreover, some distance measure betweenI and
I′
has often
been considered as a parameter for the problem[2, 3, 19, 34]. We arrive at our following
“original dynamic version” of
Cluster editing
(phrased as decision version).

3
1 3
Algorithmica (2021) 83:1–44
Herein,
⊕
denotes the symmetric difference between two sets and
dist (⋅,⋅)
is a
generic distance function for cluster graphs, which we discuss later. Moreover,
Gc
is supposed to be the “solution” given for the input graph
G1
. However, since the
question in this problem formulation is independent from
G1
, we can remove
G1
from the input and arrive at the following simplified version of the problem. For the
remainder of this paper we focus on this simplified formulation of
dynamiC Cluster
editing
.
There are many different distance measures for cluster graphs[37, 38]. Indeed, we
will study two standard ways of measuring the distance between two cluster graphs.
One is called classification error distance, which measures the number of vertices
one needs to move between cliques to make two cluster graphs the same—we sub-
sequently refer to it as matching-based distance. The other is called disagreement
distance, which is the symmetric distance between two edge sets—we subsequently
refer to it as edge-based distance. Notably, the edge-based distance upper-bounds
the matching-based distance. We give formal definitions in Sect.2.
1.2 Motivation andRelated Work
Beyond parameterized algorithmics and static
Cluster editing
, dynamic clustering
in general has been subject to many studies, mostly in applied computer science[12,
16, 17, 42–44]. We mention in passing that there are also close ties to reoptimization
(e.g.,[7, 8, 40]) and parameterized local search (e.g.,[21, 25, 28, 30, 36]).
There are several natural application scenarios that motivate the study of
dynamiC
Cluster editing
. Next, we list four of them.
Dynamically updating an existing cluster graph
dynamiC Cluster editing
can
be interpreted to model a smooth transition between cluster graphs, reflecting that
“customers” working with clustered data in a dynamic setting may only tolerate a
moderate change of the clustering from “one day to another” since “revolutionary”
transformations would require too dramatic changes in their work. In this spirit,
when employing small parameter values,
dynamiC Cluster editing
has kind of
an evolutionary flavor with respect to the history of the various cluster graphs in a
dynamic setting.
Editing a graph into a target cluster graph For a given graphG, there may be
many cluster graphs which are at mostk edge modifications away. The goal then
is to find one of these which is close to the given target cluster graph
Gc
since in a

4
Algorithmica (2021) 83:1–44
1 3
corresponding application one is already “used to” work with
Gc
. Adapting a differ-
ent point of view, the editing into the target cluster graph
Gc
might be too expensive
(that is,
|E(G)⊕E(Gc)|
is too big), and one has to find a solution cluster graph with
small enough modification costs but being still close to the target
Gc
.
Local search for an improved cluster graph Here the scenario is that one may
have found an initial clustering expressed by
Gc
, and one searches for another solu-
tion
G′
forG within a certain local region around
Gc
(captured by our parameterd).
Editing into a compromise clustering When focusing on the edge-based distance,
one may generalize the definition of
dynamiC Cluster editing
by allowing
Gc
to
be any graph (not necessarily a cluster graph). This may be used as a model for
“compromise cluster editing” in the sense that the goal cluster graph then is a com-
promise for a cluster graph suitable for both input graphs since it is close to both of
them.
1.3 Our Results
We investigate the (parameterized) computational complexity of
dynamiC Clus
-
ter editing
. We study
dynamiC Cluster editing
as well as two restricted ver-
sions where only edge deletions (“Deletion”) or edge insertions (“Completion”) are
allowed. We show that all problem variants (notably also the completion variants,
whose static counterpart is trivially polynomial-time solvable) are NP-complete
even if the input graphG is already a cluster graph. Table1 surveys our main com-
plexity results.
The general versions of
dynamiC Cluster editing
all turn out to be param-
eterized intractable (W[1]-hard) by the single natural parameter “budget k” or
Table 1 Result overview for
dynamiC Cluster editing
. We primarily categorize the problem variants by
the distance measure (Matching, Edge) they use and secondarily by the allowed modification operation
NP-completeness for all problem variants (last column) even holds if the input graph G is a cluster
graph. PK stands for polynomial-size problem kernel
a
In the conference version[35] of this paper we claimed that
dCCompletion (edge dist)
is in FPT when
parameterized by k. Unfortunately, the unpublished “proof” for this claim contained an error that we
could not fix
Parameter
Problem variant
k+d
k d
Match. Editing FPT(PK) Theorem3W[1]-h Theorem2W[1]-h
}
Theorem2NP-c Theorem1
Deletion FPT(PK) open W[1]-h NP-c
Completion FPT(PK) open FPT Theorem4NP-c
Edge Editing FPT(PK) Theorem3W[1]-h Theorem2W[1]-h
}
Theorem2NP-c Theorem1
Deletion FPT(PK) FPTTheorem4W[1]-h NP-c
Completion FPT(PK) open
a
FPT Theorem4NP-c

5
1 3
Algorithmica (2021) 83:1–44
“distanced”; however, when both parameters are combined, we achieve a polyno-
mial-size problem kernel, implying fixed-parameter tractability. We also derive a
generic approach, based on a reduction to
multi-ChoiCe KnapsaCK
, to derive fixed-
parameter tractability for several deletion and completion variants with respect to
the parameters budgetk as well as the distanced.
1.4 Organization ofthePaper
Our work, after introducing basic notations (Sect.2), consists of two main parts. In
Sect.3, we provide all our (parameterized) hardness results. In Sect.4, we develop
several positive algorithmic results, namely polynomial-size problem kernels
through polynomial-time data reduction, and fixed-parameter solving algorithms.
We conclude with a summary and directions for future work (Sect.5).
2 Preliminaries andProblems Variants
In this section we give a brief overview on concepts and notations of graph theory
and parameterized complexity theory that are used in this paper. We also give for-
mal definitions of the distance measures for cluster graphs we use and of our prob-
lem variants. We use
⊕
to denote the symmetric difference of two sets, that is, for
two setsA,B we have
A
⊕
B∶= (A⧵B)∪(B⧵A)
.
2.1 Graph‑Theoretic Concepts andNotations
Given an undirected graph
G=(V,E)
, we say that a vertex set
C⊆V
is a clique
inG if G[C] is a complete graph. We say that a vertex set
C⊆V
is isolated in G
if there is no edge
{u,v}∈E
with
u∈C
and
v∈V⧵C
. A
P3
is a path with three
vertices. We say that vertices
u,v,w∈V
form an induced
P3
in G if
G[{u,v,w}]
is
a
P3
. We say that an edge
{u,v}∈E
is part of a
P3
in G if there is a vertex
w∈V
such that
G[{u,v,w}]
is a
P3
. Analogously, we say that a non-edge
{u,v}∉E
is part
of a
P3
in G if there is a vertex
w∈V
such that
G[{u,v,w}]
is a
P3
. A cluster graph
is simply a disjoint union of cliques. Equivalently, a graph
G=(V,E)
is a cluster
graph if for all
u,v,w∈V
we have that
G[{u,v,w}]
is not a
P3
, or in other words,
P3
is a forbidden induced subgraph for cluster graphs.
2.2 Distance Measures forCluster Graphs
A cluster graph is simply a disjoint union of cliques. We use two basic distance
measures for cluster graphs[37, 38]. The first one is called “matching-based
distance” and counts how many vertices have to be moved from one clique to
another to make two cluster graphs the same (see Fig.1 for an illustrating exam-
ple). It is formally defined as follows.
Loading more pages...