scieee Science in your language
[en] (orig)
On Measuring Process Model Similarity based
on High-level Change Operations
Chen Li1, Manfred Reichert2, and Andreas Wombacher3
1Information System group, University of Twente, The Netherlands
2Institute of Databases and Information System, Ulm University, Germany
3Database group, University of Twente, The Netherlands
Abstract. For various applications there is the need to compare the
similarity between two process models. For example, given the as-is and
to-be models of a particular business process, we would like to know how
much they differ from each other and how we can efficiently transform
the as-is to the to-be model; or given a running process instance and
its original process schema, we might be interested in the deviations
between them (e.g. due to ad-hoc changes at instance level). Respective
considerations can be useful, for example, to minimize the efforts for
propagating the schema changes to other process instances as well. All
these scenarios require a method to measure the similarity or distance
between two process models based on the efforts for transforming the one
into the other. In this paper, we provide an approach using digital logic
to evaluate the distance and similarity between two process models based
on high-level change operations (e.g. to add, delete or move activities). In
this way, we can not only guarantee that model transformation results in
a sound process model, but also ensure that related efforts are minimized.
1 Introduction
Business world is getting increasingly dynamic, requiring from companies to con-
tinuously adapt business processes as well as supporting Process-Aware Informa-
tion Systems (PAISs) [3] in order to cope with the frequent and unprecedented
changes in their business environment [19]. Organizations and enterprises need
to continuously Re-engineer their Business Processes (BPR), i.e. they need to be
able to flexibly upgrade and optimize their business processes in order to stay
competitive in their market. Furthermore, PAISs should allow for process flexi-
bility, i.e., it must be possible for users to deviate from the pre-defined process
model at the instance level if required.
The pivotal research on process flexibility over the last years [1, 11] has pro-
vided the foundation for dynamic process change to reduce the cost of change
1Supported by the Netherlands Organization for Scientific Research (NWO) under
contract number 612.066.512
in PAISs. Process flexibility denotes the capability to reflect externally triggered
change by modifying only those aspects of a process that need to be changed,
while keeping the other parts stable, i.e., the ability to change or evolve the
process without completely replacing it [11]. To compare two process models is
a fundamental task in this context. In particular, it becomes necessary to cal-
culate the minimal difference between two process models based on high level
changes. If we need to transform one model into another, for example, efforts can
then be reduced and the transformation can go smoothly; i.e. we do not need
to re-define the new process model from scratch, but only apply these high-level
changes either at process type or process instance level. Several approaches like
ADEPT [11], WASA [20] or TRAM [6], have emerged to enable process change
support in PAIS (see [13] for an overview).
Based on the two assumptions that (1) process models are block-structured
and (2) all activities in a process model have unique labels, this paper deals with
the following fundamental research question:
Given two process models Sand S0, how much do they differ from each other
in terms of high-level change operations? And what is the minimal effort, i.e.
the minimal number of change operations needed to transform Sinto S0?
Clearly, our focus is on minimizing the number of high-level change operations
needed to transform process model Sinto process model S0. Soundness of the
resulting process model should be also not sacrificed. We apply the high-level
change operations as described in [11, 19] in the given context. By considering
high-level changes, we can distinguish our approach from traditional similarity
measures like graph or sub-graph isomorphism [15]. Both only consider basic
change primitives like insertion or deletion of single nodes and edges.
Answering the above research question will lead to better cost efficiency when
performing BPR, since the efforts to implement the corresponding changes in
the supporting PAIS are minimized. At process instance level, we can reduce the
efforts to propagate process type changes to the running instances [13]. Finally
the derived differences between original process model and its process instances
can be used as a set of pure and concise logs for process mining [4].
In previous work, we have provided the technical foundation for users to
flexibly change process models at both the process type and the process instance
level. For example, users may dynamically insert, delete or move an activity at
these two levels [11]. In addition, snapshot differential algorithms [7], known
from database technology, can be used as a fast and secure method to detect the
change primitives (e.g. to add or delete nodes and edges) needed to transform
one process model into another.
Using this framework and snapshot differential algorithm, this paper applies
Digital Logic in Boolean Algebra [14] to provide a new method to transform
a process model into another one based on high-level change operations. This
method does not only minimize the number of changes needed in this context,
but also guarantees soundness of the changed process model, i.e. the process
model remains correct when applying high-level change operations. We further
provide two measures process distance and process similarity –based on high-
2
level change operations, which indicate how costly it is to transform process
model Sinto model S0, and how different Sand S0are.
The remainder of this paper is organized as follows: Sec. 2 introduces back-
grounds needed for the understanding of this paper. In Sec. 3 we discuss reasons
and difficulties for deriving high-level change operations. Sec. 4 describes an ap-
proach to detect the difference between two process models. Sec. 5 discuss related
work. The paper concludes with a summary and outlook in Sec. 6.
2 Backgrounds
Let Pdenote the set of all correct process models. A particular process model
S= (N, E, . . .) P is defined as a well-structured Activity Net [11]. Ncon-
stitutes a set of activities aiand Eis a set of precedence relations (i.e. control
edges) between them. To limit the scope, we assume Activity Nets to be block
structured. Examples are depicted in Fig 1.
We assume that a process change (i.e. Activity Net Change) is accomplished
by applying a sequence of high-level change operations to a given process model
Sover time [11]. Such change operations modify the initial process model by
altering the set of activities and/or their order relations. Thus, each application
of a change operation results in a new process model. We define process change
as follows:
Definition 1 (Process Change). Let Pdenote the set of possible process mod-
els and Cthe set of possible process changes. Let S, S0 P be two process models,
let C be a process change, and let σ=h1, 2, . . . ni Cbe a sequence
of process changes performed on initial model S. Then:
S[iS0iff is applicable to Sand S0is the process model resulting from
the application of to S.
S[σiS0iff S1, S2,...Sn+1 P with S=S1,S0=Sn+1, and Si[iSi+1 for
i {1, . . . n}.
Examples of high-level change operations and their effects on a process model
are depicted in Table 1. Issues concerning the correct use of these operations and
related pre-/post- conditions are described in [11]. If some additional constraints
are met, the high-level change operations depicted in Table 1 will be also ap-
plicable at process instance level. Although the depicted change operations are
discussed in relation to our ADEPT framework [11], they are generic in the sense
that they can be easily transferred to other process meta models as well. For
example, the change operations in Table 1 can be also expressed by the life-cycle
inheritance rule as used in the context of Petri Nets [16]. We are referring to
ADEPT in this paper since it covers by far most high-level change operations
and change patterns respectively when compared to other approaches [19]. It
further has served as basis for representing our method.
A trace ton process model Sdenotes a valid execution sequence t<
a1, a2, . . . , ak>of activities aiNon Saccording to the control flow de-
fined by S. All traces process model Scan produce are summarized in trace
3
Table 1. Examples of High-Level Change Operations
Change Operation on S opType subject paramList
insert(S, X, A,B, [sc]) insert X S, A,B, [sc]
Effects on S: inserts activity X between activity sets Aand B. It is a conditional insert
if [sc] is specified (i.e. [sc] = XOR)
delete(S, X, [sc]) delete X S, [sc]
Effects on S: deletes activity X from S, i.e. X turns into a silent one. [sc] is specified ([sc] =
XOR) when we block the branch with X, i.e. the branch which contains X will not be activated
move(S, X, A,B, [sc]) move X S, A,B, [sc]
Effects on S: moves activity X from its original position in Sto another position between
activity sets A and B. (it is a conditional insert if [sc] is specified)
replace(S, X, Y) replace X Y
Effects on S: replaces activity X by activity Y
set TS.t(ab) is denoted as precedence relation between activities aand bin
trace t< a1, a2, . . . , ak>iff i < j :ai=aaj=b. Here, we only consider
traces composing ’real’ activities, but no events related to silent activities i.e.,
activity nodes which contain no operation and exist only for control flow pur-
pose, see Section 4.4. Finally, we will consider two process models as being the
same if they are trace equivalent, i.e. SS0iff TS TS0. The stronger notion
of bi-similarity [5] is not required in our context.
3 High-level Change Operations
3.1 Complementary Nature of Change and Execution Logs
Most PAISs support ad-hoc deviations at instance level and record them in
change logs. Thus, they provide additional information when compared to tra-
ditional PAISs which only record execution logs (which typically document the
start and/or end time of activities). Change logs and execution logs document
different run-time information on adaptive process instances and are not inter-
changeable. Even if the original process model is given, it will be not possible to
convert the change log of a process instance to its execution log or vice-verse. As
example, take the original and simplified patient treatment process as depicted in
Fig. 1a: a patient is admitted to a hospital, where he first registers, then receives
treatment, and finally pays. Assume that, due to an emergency situation, for one
particular patient, we want to first start the treatment of this patient and allow
him to register later during treatment. To represent this exceptional situation in
the process model of the respective instance, the needed change would be to move
activity receive treatment from its current position to a position parallel to activ-
ity register. This change leads to a new model S0, i.e., S[σiS0with σ=< move(S,
receive
treatment
Admitted
a) S: original process model b) S’: final execution & change
register pay
register
receive
treatment
pay
AND-Split
AND-Join
admitted
∆=Move (S, register, admitted, pay)
S[∆>S’
e=<admitted, receive treatment, register, pay>
Fig. 1. Change Log and Execution Log are not Interchangeable
4
delEdge(StartFlow,A); delEdge(A,B);
delEdge(B,C); addEdge(B,A);
addEdge(A,C); addEdge(StartFlow,B)
delEdge(A,B); delEdge(B,C); delEdge(B,D);
delEdge(C, τ); delEdge(D,τ); delEdge(t,E);
delEdge(τ, F}; delNode(τ); addEdge(A,C);
addEdge(C,B); addEdge(B,D); addEdge(D,E);
addEdge(D,F); updateNodeType(D,
XorSplit); updateNodeType(B, empty);
G
BA
C
D
E
F
D
AC
E
F
BG
B
A C
D
E
F
G
Change Primitives
Change Primitives
1
=Move (S, C, A, B)
S
1
:
model after change
1
2
=Move (S, A, B, C)
S[∆
1
>S
1
S[∆
2
>S
2
S:
original process model
S
2
:
model after change
2
AND-Split
AND-Join
XOR-Split
XOR-Join
Snapshot difference
Fig. 2. High-level Change Operation and Corresponding change primitive
reveive treatment,admitted,pay)>. Meanwhile, the execution log efor this par-
ticular instance can be e=<admitted,receive treatment,register, pay >(cf. Fig.
1b). If we only have process model Sand its execution log, it will be not possi-
ble to determine this change because the process model which can produce such
execution log is not unique. For example, a process model with the four activi-
ties contained in four parallel branches could produce this execution log as well.
On the contrary, it is generally not possible to derive the execution log from a
change log, because execution behavior of S0is also not unique. For example, a
trace <admitted,register,receive treatment,pay >is also producible on S0as
well. Consequently, change logs provide additional information when compared
to pure execution logs.
3.2 Why Do We Need High-level Change Operations?
After showing the importance of change logs, we now discuss why we need high-
level change operations rather than change primitives (i.e., low-level changes at
edge and node level). Left side of Fig. 2 shows original process model Swhich
consists of a parallel branching, a conditional branching, and a silent activity τ
(depicted as empty node) connecting these two blocks. Assume that two different
high-level change operations are applied to Sresulting in models S1and S2:1
moves activity Cfrom its current location to the position between activities A
and B, which leads to S1i.e., S[1iS1with 1=move(S, C,A,B). 2moves A
to the position between Band C, i.e. S[2iS2with 2=move(S, A,B,C). Fig. 2
additionally depicts the change primitives representing snapshot differences be-
tween Sand models S1and S2, respectively. Using high-level change operations
offers the following advantages:
1. High-level change operations guarantee soundness: i.e., application of a high-
level change operation to a sound model Sresults in another sound model
S0[11]. This also applies to our example from Fig. 2. By contrast, when
applying one single change primitive (e.g., deleting an edge in S) soundness
cannot be guaranteed anymore. Generally, if we delete any of the edges in
S, the resulting process model will not be necessarily sound.
2. High-level change operations provide richer syntactical meanings than change
primitives. Generally, a high-level change operation is built upon a set of
change primitives which collectively represent a complex modification of a
5
process model. As example take 1from Fig. 2. This high-level change oper-
ation requires 15 change primitives for its realization (deleting edges, adding
edges, deleting the silent activity, and updating the node types).
3. An important aspect, not discussed so far, concerns the number of change
operations needed to transform model Sinto target model S0. For example,
we need only one move operation to transform Sto either S1or S2. However,
when using change primitives, migrating Sto S1necessitates 15 change
primitives, while the second change 2can be realized based on 6change
primitives. This example also shows that change primitives do not provide
an adequate means to determine the difference between two process models.
Thus the required number of change primitives cannot represent the efforts
for process model transformations.
3.3 The Challenge to Derive High-level Change Operations
After sketching the benefits coming with high-level change operations, this sec-
tion discusses challenges of deriving them. When comparing two process models,
the change primitives needed for transforming one model into another can be
easily determined by performing two snapshots and a delta analysis on them [7].
An algorithm to minimize the number of change primitives is given in [12]. How-
ever, when trying to derive the high-level change operations needed for model
transformation, several challenges occur. As example consider Fig. 3:
1. When performing two delete operations on S(i.e., 1=delete(S, B) and
2=delete(S, C)), we obtain a new model S00 (i.e., S[σiS00 with σ=<
1, 2>), as well as an undetectable intermediate model S0with S[1iS0
and S0[2iS00. When examining the change primitives corresponding to each
high-level change operation, we first need to add edge (A,C) after the first
delete operation 1, and remove this edge (A,C) when applying the sec-
ond delete operation 2. However, when performing a delta analysis for
the original process model Sand the resulting process model S00, the two
change primitives (addEdge(A,C) introduced by the first delete operation and
delEdge (A,C) introduced by the second one) jointly have no effect on the
resulting process model S00, i.e., they cannot be detected by snapshot anal-
ysis. Consequently, deriving high-level change operations based on change
primitives would be challenging because the change primitives required for
every high-level change do not always appear in the snapshot differences be-
tween the original and resulting models. In Fig. 3, none of the two change
primitive sets associated with 1or 2constitute a sub-set of the change
primitive set associated with σ.
2. Even if there is just one high-level change operation, it will remain difficult
to derive it with delta algorithm. For example, in Fig. 3 the delta algorithm
shows that 15 change primitives are needed to transform Sinto S1. However,
the depicted changes can be also realized by just applying one high level move
operation to S.
6
S(
1
>S
S(
1
>S
1
= Delete (S, B)
2
= Delete (S, C)
delEdge(A,B), delEdge(B,C),
addEdge(A,C)
, delNode(B)
delEdge(C,D), addEdge(A,D),
delEdge(A,C)
,
delNode(C)
S(σ>S
σ =< Delete (S, B),
Delete (S, C) >
delEdge(A,B),
delEdge(B,C),
delEdge(C,D),
addEdge(A,D),
delNode(B)
delNode(C)
S’’ A D
S’ AC D
SAB C D
Fig. 3. Non-detectable Change Primitives
4 Detecting the Minimal Number of High-level Changes
In this section, we introduce our method to detect the minimal number of change
operations needed to transform a given process model Sinto another model S0.
As example, consider the process models Sand S0in Fig. 4.
4.1 General Description of our Method
As mentioned in Section 1, the key issue of our work is to minimize the number
of change operations needed to transform a process model S= (N, E, . . .) P
into another model S0= (N0, E0, . . .) P. Generally, three steps are needed (cf.
Fig. 4) to realize this minimal transformation:
1. aiN\N0:delete all activities being present in S, but not in S0. This first
step transforms Sto Ssame (cf. Fig. 4b).
2. aiNTN0:move all activities being present in both models to the loca-
tions as reflected by S0. Regarding our example, this second step transforms
Ssame to S0
same (cf. Fig. 4c).
3. aiN0\N:insert those activities being present in S0, but not in S. As
depicted in Fig. 4, the third step transforms S0
same to S0(cf. Fig. 4d).
Insertions and deletions deal with changes of the set of activities. Here, we can
hardly do anything to reduce efforts (i.e., the number of required insert/delete
operations): New activities (aiN0\N) must be added and obsolete activities
(ajN\N0) must be deleted.
The focus of minimality can therefore be shifted to the use of the move opera-
tion, which changes the structure of a process model, but not its set of activities.
Since a move operation logically corresponds to a delete followed by an insert op-
eration, we can transform Ssame to S0
same by maximally applying n=|NTN0|
move operations. Reason is that nmove operations correspond to deleting all
activities and then re-inserting them at their new positions. Correspondingly, n
is the maximal number of change operations needed to transform one process
model into another, both with same set of activities (Ssame and S0
same in our
example from Fig. 4). To measure the complete transformation from Sto S0, we
formally define process distance and process similarity as follows:
Definition 2 (Process Distance and Process Similarity). Let S= (N, E, . . .),
S0= (N0, E0, . . .) P be two process models. Let further σ=h1, 2, . . . ni
Cbe a sequence of change operations transforming Sinto S0(i.e. S[σiS0). Then
7
S
: original process model
S’
: destination process model
E
D
BC
AZ
F
G
Y
C
E
X
D
F
G
B
A
C
E
A
D
F
GB
S
same
: original model with shared activities
S’
same
: destination model with shared activities
A
C
B
E
F
GD
S
t
e
p
1
:
d
e
l
e
t
e
Transform
S to S’
Step2: move
S
t
e
p
3
:
i
n
s
e
r
t
a)
b) c)
d)
Fig. 4. Three Steps to Transform Sinto S0
the distance between Sand S0is given by d(S,S0)=min{|σ| |σ CS[σiS0}.
Furthermore, process similarity between Sand S0equals to 1d(S,S0)
|N|+|N0|−|NTN0|,
i.e., similarity equals to ((maximal number of changes - minimal number of
changes) / maximal number of changes).
4.2 Determining Required Activity Deletions and Insertions
To accomplish Step 1 and Step 3 of our method, we have to deal with the
change of the activity set when transforming Sinto S0. It can be easily detected
by applying existing snapshot algorithms [7] to both Sand S0. As described in
Section 4.1, as first step we need to delete all activities aiN\N0contained in
S, but not in S0. Regarding our example from Fig. 4, we can derive as our first
high-level change operation 1=delete(S, X). Similarly, activities contained in
S0, but not in S, are inserted in Step 3, after having moved the shared activities to
their respective position in S0(S0
same respectively). The parameters of the insert
operation, i.e. the predecessors and successors of the inserted activity, are just
like how they appear in S0. In this way, we obtain the last two change operations
for our example: Insert(S, Y,StartFlow,{A,B}) and Insert(S, Z,D,E).
4.3 Determining Required Move Operations
We now focus on Step 2 of our method; i.e., to transform two process models
with same activity set using move operations. Here, we can ignore the activities
not contained in both Sand S0(cf. 4.2). Instead, we consider the two process
models Ssame and S0
same respectively, as depicted in Fig. 4.
Determine the Order Matrix of a Process Model One key feature of our
ADEPT change framework is to maintain the structure of the unchanged parts
of a process model [11]. For example, if we delete an activity, this will neither
influence the successors nor the predecessors of this activity, and also not their
control relation. To incorporate this feature in our approach, rather than only
looking at direct predecessor-successor relationships between two activities (i.e.
control flow edges), we consider the transitive control dependencies between all
pairs of activities; i.e., for every pair of activities ai, ajNTN0,ai6=aj,
their execution order compared to each other is examined. Logically, we check
8
execution orders by considering all traces a process model can produce (cf. Sec.
2). Results can be formally described in a matrix An×nwith n=|NTN0|. Four
types of control relations can be identified (cf. Def. 3):
Definition 3 (Order matrix). Let S= (N, E, . . .) P be a process model
with N={a1, a2, . . . , an}. Let further TSdenote the set of all traces producible
on S. Then: Matrix An×nis called order matrix of Swith Aij representing
the relation between different activities ai,ajNiff:
Aij = ’1’ iff (t TSwith ai, ajtt(aiaj))
If for all traces containing activities aiand aj,aialways appears BEFORE
aj, we denote Aij as 1’, i.e., aiis predecessor of ajin the flow of control.
Aij = ’0’ iff (t TSwith ai, ajtt(ajai))
If for all traces containing activity aiand aj,aialways appears AFTER aj,
then we denote Aij as a 0’, i.e. aiis successor of ajin the flow of control.
Aij = ’*’ iff (t1 TS, with ai, ajt1t1(aiaj))(t2 TS, with
ai, ajt2t2(ajai))
If there exists at least one trace in which aiappears before ajand at least
one other trace in which aiappears after aj, we denote Aij as *’, i.e. ai
and ajare contained in different parallel branches.
Aij = ’-’ iff ( ¬∃t TS:aitajt)
If there is no trace containing both activity aiand aj, we denote Aij as -’,
i.e. aiand ajare contained in different branches of a conditional branching.
We revisit our example from Fig. 4. The order matrices of Ssame and S0
same
are shown in Fig. 5. The main diagonal is empty since we do not compare an
activity with itself. As one can see, elements Aij and Aji can be derived from
each other. If activity aiis a predecessor of activity aj(i.e. Aij = 1), we can
always conclude that Aji = 0 holds. Similarly, if Aij {’*’,’-’}, we will obtain
Aji =Aij . As a consequence, we can simplify our problem by only considering
the upper triangular matrix A= (Aij)j>i.
Execution order Matrix S
same
Execution order Matrix S’
same
A B
A
B
The first group of
activities and conflicts
C D E F
C
D
E
F
The second group of
activities and conflicts
Fig. 5. Order Matrices of Ssame and S0
same from Fig. 4
9
Under certain constraints, an order matrix Acan uniquely represent the
process model, based on which it was built on. This is stated by Theorem 1.
Before giving this theorem, we need to define the notion of substring of trace:
Definition 4 (Substring of trace). Let tand t0be two traces. We define tis
a sub-string of t0iff [ai,ajt,t(aiaj)ai,ajt0t0(aiaj)] and
[akN:ak/takt0].
Theorem 1. Let S, S0 P be two process models, with same set of activities
N={a1, a2, . . . , an}. Let further TS,TS0be the related trace sets and An×n,
A0
n×nbe the order matrices of Sand S0. Then S6=S0A6=A0, if (¬∃t1, t0
1
TS:t1is a substring of t0
1) and (¬∃t2, t0
2 TS0:t2is a substring of t0
2).
According to Theorem 1, there will be a one-to-one mapping between a pro-
cess model Sand its order matrix A, if the substring constraint is met. A proof of
Theorem 1 can be found in [8]. A detailed discussion of the sub-string restriction
is given in Section 4.4. Thus, when comparing two process models, it is sufficient
to compare their order matrices (cf. Def. 3), since a order matrix can uniquely
represent the process model. This also means that the differences of two process
models can be related to the differences of their order matrices. If two activities
have different execution order in two process models, we will define the notion
of conflict as follows:
Definition 5 (Conflict). Let S, S0 P be two process models with same set of
activities N. Let further Aand A0be the order matrices for Sand S0respectively.
Then: Activities aiand ajare conflicting iff Aij 6=A0
ij. We formally denote this
as C(ai,aj).CF := {C(ai,aj)|Aij 6=A0
ij}then corresponds to the set of all
existing conflicts.
Fig. 5 marks up differences between the two order matrices in grey. The set of
conflicts is as follows: CF ={C(A,B),C(C,D),C(C,F),C(D,E),C(D,F),C(E,F)}.
Optimizing the Conflicts To come from Ssame to S0
same (c.f. Fig. 4), we have
to eliminate conflicts between these two models by applying move operations.
Obviously, if there is no conflict for the two models, they will be identical. Every
time we move an activity from its current position in Ssame to the position it has
in S0
same, we can eliminate the conflicts this activity has with other activities.
For example, consider activity Ain Fig. 4. If we move Afrom its position in Ssame
(preceding B) to its new position in S0
same (Aand Bare contained in two different
branches of a conditional branching block), we can eliminate conflict C(A,B). As
shown in the order matrices, moving Arequires two steps. First, set the elements
in the first row and first column of An×n(which corresponds to activity A) to
empty, since Ais moved away. Second, reset these elements according to the new
order relation of A, when compared to the other activities from S0
same. So every
time we move an activity, we are able to change the value of its corresponding
row and column in the order matrices, i.e., we change these values corresponding
to the original model to the values compliant with the target model. By doing
10
this iteratively, we can change all the values and eliminate all the conflicts so
that we finally achieve the transformation from Ssame to S0
same.
A non-optimal solution would be to move all the activities involved in the
conflicts as set out by CF, from their positions in Ssame to the positions they
have in S0
same. Regarding our example from Fig. 5, to apply this straightforward
method, we would need to move activities A,B,C,D,Eand Ffrom their positions
in Ssame to the ones in S0
same. However, this naive method is not in line with our
goal to minimize the number of applied change operations. For example, after
moving activity Afrom its current position in Ssame to the position it has in
S0
same, we do not need to move activity Banymore, because after applying this
change operation, there are no activities with which activity Bstill has conflicts.
Digital logic in Boolean algebra [14] helps to solve this minimization prob-
lem. Digital logic constitutes the basis for digital electronic circuit design and
optimization. In this field, engineers face the challenge to optimize the internal
circuit design given the required input and output signals. To apply such tech-
nique in our context, we consider each process activity as an independent input
signal and we want to design a circuit which can cover all conflicts defined by CF
(cf. Def 5). If activity aiconflicts to activity aj, we can either move one of them
or both of them from the positions they have in Ssame to the ones they have in
S0
same. Doing so, the conflict will not exist any more. Reason is that every time
we move an activity from the position it has in Ssame to the position it has in
S0
same, we reset the corresponding row and column of this activity in the order
matrix. A conflict can be interpreted as a digital signal: When the two input
signals aiand ajare both ”true”(this means we do not move activity aiand
aj), we cannot solve the conflict and the ’circuit’ shall give an output signal of
”false”. If we apply this to all conflicts in CF, we will obtain all ”false” signals.
Meanwhile, the ”circuit” should be able to tell us what will result in a ”true”
output (i.e., the negative of all ”false” signals). This ”true” output represents
which activities we need to move. Regarding our example from Fig. 5, given the
set of conflicts CF, our logic expression then is: AB +CD +CF +DE +DF +EF.
The complexity for optimizing the logic expression is NP-Hard [14]. Therefore
it is advantageous to reduce the size of the problem. Concerning our example,
we can cut down the optimization problem into two groups: one with activities
Aand B, and conflict C(A,B); another one with activities C,D,Eand F, and
the following set of conflicts {C(C,D),C(C,F),C(D,E),C(D,F),C(E,F)}. Such a
division can be achieved in O(n) time in the following three steps. Step 1: List
all conflicting activities, and set every activity as a group. Step 2: If conflicting
activities aiand aj(i.e., C(ai,aj)) are contained in two different groups, merge
the two groups. Step 3: Repeat Step 2 for all conflicts in CF. After these three
steps, we can divide the activities as well as the associated conflicts into several
groups. Regarding our example, the optimization problem can be divided into
two sub-optimization problems: AB and CD +CF +DE +DF +EF. We depict this
by the two small matrices in Fig. 5.
Optimizing logic expressions has been intensively discussed in Discrete Math-
ematics. Therefore we omit details here and refer to Karnaugh map [14] and
11
C
E
A
D
F
GB
S
same
S’
same
M
o
v
e
(
S
,
D
,
{
A
,
B
}
,
{
C
,
E
}
)
move(S,B, Startflow, D, XOR)
M
o
v
e
(
S
,
F
,
C
,
G
)
a)
b) c)
d)
A
C
B
E
F
GD
A
E
B
C
F
GD
C
E
A B
F
GD
Fig. 6. Process Models After Every Move Operation
Quine-McCluskey algorithm [14]. We have implemented the latter in our proof-
of-concept prototype. Regarding our example in Fig. 4, the two optimization
results are AB =¯
A+¯
Bfor the first group and CD +CF +DE +DF +EF =¯
D¯
F+
¯
C¯
E¯
F+¯
C¯
D¯
Efor the second group. We can interpret this result as follows. For the
second group, either we move activities Dand F, or we move activities C,Eand
F, or we move activities C,Dand Efrom their position in Ssame to the positions
they have in S0
same. Based on this we can transform Ssame into S0
same since all
conflicts are eliminated. As can be seen from the order matrices, if we change
the value of the corresponding rows and columns of these activities in Ssame, we
can turn Ssame into S0
same. Since we want to minimize the number of change
operations, we can draw the conclusion that activities Dand Fmust be moved.
Same rule applies to the result of the first group. However, there is no differ-
ence whether to move either Aor Bsince both operations count as one change
operation. Here, we arbitrarily decide to move activity B.
So far we have determined the set of activities to be moved. The next step is to
determine the positions where these activities need to be moved to. Operation
move(S, X,A,B,[sc]) will be independent from other move operations (i.e., it
does not matter in which order to move the respective activity) if its direct
predecessors Aand direct successors Bdo not belong to the set of activities to
be moved. Regarding our example from Fig. 4, activity Fsatisfies this condition
since its predecessor Cand successor Gare not moved. If this had not been the
case, we would have to introduce silent activities to put the moved activity to
its corresponding place in S0
same. For example, if we want to first move Bto its
position in S0
same, we will have to introduce a silent activity after Band before
Cand E. Only in this way, we can change the execution order of Bto what it
appears in S0
same. However, such silent activity will be not required if we first
move activity Dto the position it has in S0
same. A detailed discussion can be
found in [11].
According to the position the moved activities have in S0
same, we can deter-
mine the parameters (i.e., the predecessors, successors and conditions) for every
move operation. In S0
same, activity Dhas predecessors Aand B, and successors E
and C. So one move operations therefore is move(S, D,{B,A},{C,E}). Similarly,
we obtain the other two move operations: move(S, B,StartFlow,D,XOR) and
move(S, F,C,G). The intermediate process models resulting after every move op-
eration are shown in Fig. 6. When comparing order matrices for each model in
12
T
S2
= {ABC, AC}
T
S1
= {ABC}
S
2
S
1
A B C A C
B
A C
B
D
T
S3
= {AC}
S
3
S
4
A C
T
S2
= {ABC, ADC}
Insert(S
1
, τ, A, C,
XOR
)
d
e
l
e
t
e
(
S
4
,
D
,
X
O
R
)
delete(S
4
, D)
replace(S
2
, τ, D)
d
e
l
e
t
e
(
S
2
,
B
)
i
n
s
e
r
t
(
S
3
,
B
,
A
,
C
,
X
O
R
)
Insert(S
3
,B,A,C)
Fig. 7. The Influence of Silent Activity
Fig. 6, it becomes clear that every move operation changes the values of the row
and the column corresponding to the moved activity.
4.4 Coping with Silent Activities
A silent activity is an activity which does not contain any operation or action,
and which only exists for control flow purpose. There are two reasons why we
do not consider silent activities in our similarity measure:
1. The appearance of a silent activity can be random. We can add or remove
silent activities without changing the behavior of a process model, e.g., we
can replace a control flow edge in a process model by one silent activity or
even a block of silent activities without influencing process model behavior.
2. The existence of a silent activity also depends on other activities and is
subject to change as other activities change. As example consider Fig. 2.
When applying change 1to S, the silent activity τis automatically removed
after activity Cis moved away.
There is one exception for which we need to consider silent activities. Consider
the two process models S1and S2in Fig. 7. If we ignore the silent activity τ
(depicted as an empty node) in S2, and derive the order matrix of S2, it will be
the same as the one of S1. Obviously, the two process models are not equivalent
since the trace sets producible by them are not identical. More precisely, TS2
contains one additional trace when compared to TS1. In general, if one process
model can produce additional traces, which are the sub-string of other traces (cf.
Def.4), there must be some silent activities we cannot ignore. Or if the direct
predecessor and direct successor of one silent activity constitute an XORsplit
and XORjoin, we can also not ignore this silent activity (cf. S2in Fig. 7).
Fig. 7 shows several process model transformations based on high-level change
operations. Here we can identify the difference between the two types of dele-
tion: delete(S4,D) and delete(S4,D,XOR) (cf. Fig. 7). The former one turns an
activity into a silent one (transforming S4into S2), while the latter one blocks
the branch which contains activity D(transforming S4to S1). When a branch
is blocked, we do not allow the activities of the branch to become activated [16,
11]. Since process models S1and S2have same order matrix, purely compar-
ing order matrices (cf. Sect. 4) would not be sufficient in the given situation.
Reason is that here the order matrix does not uniquely represent the process
model, since the sub-string constraint (cf. Def.4) of Theorem 1 is violated. To
13
Figure 2 Figure 4
SS1S2SSsame S'same S'
S 0 / 100% 1 / 86% 1 / 86% 4 / 50% 3 / 57% 3 / 57% 5 / 44%
Figure 2 S10 / 100% 2 / 71 % 4 / 50% 3 / 57% 3 / 57% 5 / 44%
S20 / 100% 5 / 38% 4 / 42% 3 / 57% 5 / 44%
S 0 / 100% 1 / 88% 4 / 50 % 6 / 40%
Figure 4 Ssame 0 / 100% 3 / 57% 5 / 44%
S'same 0 / 100% 2 / 78 %
S' 0 / 100%
Fig. 8. Distances and Similarities of Different Process Models
extend our method such that it can uniquely represent a process model without
the sub-string constraint, we must consider these special silent activities (i.e., a
silent activity which is direct predecessor of an XORsplit and direct successor of
an XORjoin) as well. They will appear in the order matrix and their execution
orders compared with other activities will be documented.
However, the existence of a silent activity is still very much dependent on
other activities, including the scenario described above. For example, if we delete
Bin S2as depicted in Fig. 7, we will transform S2into S3, i.e., the silent activity
will be simultaneously deleted when Bis deleted. We can identify this situation
by either examining the process model or the order matrix. In the process model,
a silent activity τcan be automatically deleted if there is another silent activity
τ0contained in the same block, but in another conditional branch (e.g., trans-
forming S2to S3). In the order matrix, we can automatically remove a silent
activity τif there is another silent activity τ0with same order relations to the
rest of the activities as τhas.
In general, if a silent activity has an XORsplit as direct predecessor and an
XORjoin as direct successor, we need to consider it when computing the order
matrix of a process model. However, these silent activities can automatically be
deleted when changing the process model. This requires us to perform additional
checks on the process model or order matrix (as described above) after every
change operation.
4.5 Summary
Taking our example from Fig. 4 (i.e., to transform Sinto S0), the following
six change operations are required: σ={delete(S, X), move(S, F,C,G), move(S,
D,{A,B},{C;E}), move(S, B,StartFlow,D,XOR), insert(S, Y,StartFlow,{A,B}),
and insert(S, Z,D,E)}. Distance between the two models is six and similarity
is 0.4 (cf. Def.2). To illustrate our method and these numbers in more detail,
we compare the distances and similarities between the seven process models dis-
cussed so far: S,S1and S2from Fig. 2 and S,Ssame,S0
same and S0from Fig.
4. Distance and similarity of two models are specified as distance/similarity in
each corresponding cell in Fig. 8. As the transformation is commutable, we only
fill in the upper triangle matrix. Taking Fig. 8, we can conclude:
1. Changing the activity set always leads to a modified distance. For example,
d(Sn,S0
same)always equals d(Sn,S0)+ 2, where Snstands for a process model
other than S0or S0
same in Fig. 8. Reason is that S0contains two unique
activities Yand Zwhen compared to S0
same, while the rest are identical.
14
2. If three process models S,S0, and S00 have same activity sets, we will ob-
tain d(S,S00 )d(S,S0)+d(S0,S00 ). It is easy to understand this because some
activities could be moved twice when transforming Sinto S0and S0into S00.
5 Related Work
Various papers have studied the process similarity problem and provided use-
ful results [17, 16, 21, 2]. In graph theory, graph isomorphism[15] and sub-graph
isomorphism [15] are used to measure similarity between two graphs. Unfortu-
nately, these measures usually only examine edges and nodes and cannot catch
the syntactical issues of a PAIS (e.g., guarantee soundness of a process model,
differentiate AND-Split and XOR-Split, and handle silent activities). Algorithms
for measuring tree edit distances [2] shows similar disadvantages, i.e., syntacti-
cal issues of a PAIS are missing. In the database field, the delta-algorithm [7]
is used to measure the difference between the two models. It extends the above
mentioned approaches by assigning attributes to edges and nodes [12]. Still, it
can only catch change primitives, and will further run into problems when con-
sidering high-level change operations. Regarding Petri-nets and state automata,
similarity based on change is difficult to measure since these formalisms are not
very tolerant for changes. Inheritance rules [16] are one of the very few tech-
niques showing the transformation of a process model described as Petri-net.
Trace equivalence is commonly used to compare whether two process models are
similar or identical [5]. In addition, bisimulation [16, 18] extends trace equiva-
lence by considering stronger notions. Also based on traces, [17] assign weights
to each trace based on execution logs which reflect the importance of a cer-
tain trace. The edit distance [21] is also used to measure the difference between
traces; the sum of them represents the differences of two models. Some similar-
ity measures use two numbers (precision and recall) to evaluate the difference
between process models S1and S2[17, 10]. None of these approaches measures
similarity by a unique and commutative number, based on the effort for process
transformation.
6 Summary and Outlook
We have provided a method to quantitatively measure the distance and similarity
between two process models based on the efforts for model transformation. High-
level change operations are used to evaluate the similarity since they guarantee
soundness and also provide more meaningful results. We further applied digital
logic in boolean algebra so that the number of change operations required to
transform process model Sinto process model S0becomes minimal. Respective
distance and similarity measures have already been applied in the filed of process
mining [9].
Additional work is needed to enrich our knowledge on process similarity. As a
first step, we will extend our method so that it is able to measure the similarity
between process models with additional constructs (e.g., loopbacks [11]) and
data flows. The next step will be to enrich the model with semantic relations
between activities and to give weight for each change operation, so that the
similarity measure can be further applied to practice.
15
References
1. P. Balabko, A. Wegmann, A. Ruppen, and N. Cl´ement. Capturing design rationale
with functional decomposition of roles in business processes modeling. Software
Process: Improvement and Practice, 10(4):379–392, 2005.
2. P. Bille. A survey on tree edit distance and related problems. Theor. Comput.
Sci., 337(1-3):217–239, 2005.
3. M. Dumas, W.M.P. van der Aalst, and A.H.M. ter Hofstede. Process-Aware Infor-
mation Systems. Wiley & Sons, 2005.
4. C.W. G¨unther, S. Rinderle, M. Reichert, and W. M. P. van der Aalst. Change
mining in adaptive process management systems. In CoopIS’06, LNCS4275, pages
309–326, 2006.
5. J. Hidders, M. Dumas, W.M.P. van der Aalst, A.H.M. ter Hofstede, and J. Verelst.
When are two workflows the same? In CATS ’05:, pages 3–11, Darlinghurst, Aus-
tralia, Australia, 2005. Australian Computer Society, Inc.
6. M. Kradolfer and A. Geppert. Dynamic workflow schema evolution based on
workflow type versioning and workflow migration. In COOPIS ’99, page 104,
Washington, DC, USA, 1999. IEEE Computer Society.
7. W. Labio and H. Garcia-Molina. Efficient snapshot differential algorithms for data
warehousing. In VLDB ’96, pages 63–74, San Francisco, CA, USA, 1996.
8. C. Li, M. Reichert, and A. Wombacher. On measuring process model similarity
based on high-level change operations. Technical Report TR-CTIT-07-89, Univer-
sity of Twente, 2007.
9. C. Li, M. Reichert, and A. Wombacher. Discovering reference process models by
mining process variants. In ICWS’08, to appear, 2008.
10. S.S. Pinter and M. Golani. Discovering workflow models from activities’ lifespans.
Comput. Ind., 53(3):283–296, 2004.
11. M. Reichert and P. Dadam. ADEPTflex -supporting dynamic changes of workflows
without losing control. Journal of Intelligent Info. Sys., 10(2):93–129, 1998.
12. S. Rinderle, M. Jurisch, and M. Reichert. On deriving net change information from
change logs - the deltalayer-algorithm. In BTW, pages 364–381, 2007.
13. S. Rinderle, M. Reichert, and P. Dadam. Correctness criteria for dynamic changes
in workflow systems: a survey. Data Knowl. Eng., 50(1):9–34, 2004.
14. S.Brown and Z.Vranesic. Fundamentals of Digital Logic with Verilog Design.
McGraw-Hill, 2003.
15. P.N. Tan, M. Steinbach, and V. Kumar. Introduction to Data Mining. Addison-
Wesley, 2005.
16. W.M.P. van der Aalst and T. Basten. Inheritance of workflows: an approach to
tackling problems related to change. Theor. Comput. Sci., 270(1-2):125–203, uary.
17. W.M.P. van der Aalst, A.K.A. de Medeiros, and A.J.M.M. Weijters. Process equiv-
alence: Comparing two process models based on observed behavior. In Business
Process Management (BPM’06), pages 129–144, 2006.
18. R.J. van Glabbeek and W.P. Weijland. Branching time and abstraction in bisim-
ulation semantics. J. ACM, 43(3):555–600, 1996.
19. B. Weber, S. Rinderle, and M. Reichert. Change patterns and change support
features in process-aware information systems. In CAiSE, pages 574–588, 2007.
20. M. Weske. Formal foundation and conceptual design of dynamic adaptations in a
workflow management system. In HICSS ’01, page 7051, Washington, DC, 2001.
21. A. Wombacher and M. Rozie. Evaluation of workflow similarity measures in service
discovery. In Service Oriented Electronic Commerce, pages 51–71, 2006.
16