scieee Science in your language
[en] (orig)
On Measuring Process Model Similarity based
on High-level Change Operations
Chen Li1, Manfred Reichert1, and Andreas Wombacher2
1Information System group, University of Twente
P.O.Box 217, 7500 AE Enschede, The Netherlands
{lic,m.u.reichert}@cs.utwente.nl
2Database group, University of Twente
P.O.Box 217, 7500 AE Enschede, The Netherlands
Abstract. For various applications there is the need to compare the
similarity between two process models. For example, given the as-is and
to-be models of a particular business process, we would like to know how
much they differ from each other and how we can efficiently transform
the as-is to the to-be model; or given a running process instance and
its original process schema, we might be interested in the deviations
between them (e.g. due to ad-hoc changes at instance level). Respective
considerations can be useful, for example, to minimize the efforts for
propagating the schema changes to other process instances as well. All
these scenarios require a method to measure the similarity or distance
between two process models based on the efforts for transformation. In
this paper, we provide an approach using digital logic to evaluate the
distance and similarity between two process models based on high-level
change operations (e.g. to add, delete or move activities). In this way,
we can not only guarantee that the transformation results in a sound
process model, but also ensure that the efforts are minimized.
1 Introduction
Business world is getting increasingly dynamic, requiring from companies to con-
tinuously adapt their Process-Aware Information Systems (PAIS) [6] in order to
cope with the frequent and unprecedented changes in their business environ-
ment [23, 25]. Organizations and enterprises need to continuously Re-engineer
their Business Processes (BPR), i.e. they need to be able to flexibly upgrade
and optimize their business processes in order to stay competitive in their mar-
ket. Furthermore, PAIS should allow for process flexibility, i.e., they should allow
users to deviate from the standard process model at the instance level if required.
Finally, organizational learning should be supported by analyzing these process
instance deviations. The latter provides useful information about the past which
can be utilized by a company to evolve and optimize its business processes and
supporting PAIS [17].
The pivotal research on process flexibility over the last years [3–5] has pro-
vided the foundation for dynamic process change to reduce the cost of change
II
in PAIS. In this context, business process flexibility denotes the capability to
reflect externally triggered change by modifying only those aspects of a process
that need to be changed, while keeping the other parts stable, i.e., the ability
to change or evolve the process without completely replacing it [4]. When com-
paring two process models, we would like to be able to calculate their minimal
difference based on high level changes so that if we need to transform one model
to another, efforts can be reduced and the transformation can go smoothly (i.e.
we do not need to re-implement the new process model from scratch, but only
apply these high level changes either at process type or process instance level).
Several approaches like ADEPT [5], WASA [10] or TRAM [9], have emerged
to enable full lifecycle support in PAIS (for an overview see [7]). All these systems
support ad-hoc deviations at the instance level and record them in change logs.
Thus, they provide additional information when compared to traditional PAISs
which only record execution logs.
1.1 Problem Statement
Based on the two assumptions that: (1) process models are block-structured and
(2) all activities in a process model have unique labels, this paper deals with the
following research question:
Given two process models Sand S0, how much do they differ from each other
in terms of high-level change operations? And what is the minimal effort, i.e.
the minimal number of change operations needed to transform Sinto S0?
Clearly, our focus is on minimizing the number of changes necessary to trans-
form process model Sinto process model S0. However, soundness of the resulting
process model should be also not sacrificed. When modifying a process model, we
apply the high-level change operations as introduced by the ADEPT method [5]
to guarantee soundness. By considering high-level change operations, we can dis-
tinguish ourselves from traditional similarity measures like graph isomorphism
[2] or sub-graph isomorphism [24], which only consider basic change primitives
like the insertion or deletion of single nodes and edges. At process type level,
answering this question will lead to better cost efficiency when performing BPR,
since the efforts to implement the corresponding changes in the supporting PAIS
is minimized. At process instance level, answering this question can reduce the
efforts to propagate the process type changes to the running instances [7, 18];
or when there are thousands of instances, the derived differences between the
original process model and each process instance can be used as a set of pure
and concise logs for process mining [17].
1.2 Contribution
Previous work on ADEPT [5] has provided the technical foundation for users to
flexibly change process models at both the process type and the process instance
level. For example, users may dynamically insert, delete or move an activity [5]
at these two levels. In addition, snapshot differential algorithms [1], known from
database technology, can be used as a fast and secure method to detect the
III
change primitives (e.g. concerning differences of the activities or control flow
edges) needed to transform one process model into another.
Based on the ADEPT framework and snapshot differential algorithm, this
paper applies Digital Logic in Boolean Algebra [20] to provide a new method to
transform a process model to another based on high-level change operations. This
method does not only minimize the number of changes needed in this context,
but also guarantees soundness of the changed process model, i.e. the process
model remains correct when applying a high-level change operation. Our paper
also provides the two measures of process distance and process similarity based
on high-level change operations. Such measures can be used to determine how
difficult it is to transform a process model into another, and how different two
process models are.
The remainder of this paper is organized as follows: Section 2 introduces
backgrounds needed for the understanding of this paper. In Section 3, we dis-
cuss reasons and difficulties for deriving high-level change operations. Section 4
describes an approach to detect the changes between two block-structured pro-
cess models, and Section 5 discuss related work. The paper concludes with a
summary and outlook in Section 6.
2 Backgrounds
Aprocess model S= (N, E, . . .) P is defined as a Well-Structured Marking
(WSM) Net [5]. Pstands for all the possible process models. Nconstitutes a set
of activities aiassociated to Sand Eis a set of control edges linking all these
activities. To limit the scope, we assume process models to be block structured.
A detailed description and correctness issues of WSM Nets are out of the scope
of this paper, (see [22]).
We assume that a process change is accomplished by applying a sequence
of change operations to a given process model Sover time [5]. Such change
operations modify the initial process model by altering the set of activities and/or
by changing their order relations. Thus, each application of a change operation
to a process model results in another process model. In this context, we define
high-level change operations on a process model as follows:
Definition 1 (Change in Process Model). Let Pbe the set of possible pro-
cess models and Cthe set of possible process changes. Let S, S0 P be two process
models, let C be a process change, and let σ=h1, 2, . . . ni∈Cbe a
sequence of process changes performed on initial process model S. Then we can
define:
S[iS0iff is applicable to Sand S0is the process schema resulting from
the application of to S.
S[σiS0iff S1, S2, . . . Sn+1 P with S=S1,S0=Sn+1, and Si[iSi+1
with i={1,...n}.
Examples of high-level change operations and their effects on a process model
are depicted in Table 1. Issues concerning the correct use of these operations and
IV
Table 1. Examples of High-Level Change Operations on Process Schemas
Change Operation opType subject paramList
Applied to S
insert(S, X, A,B, [sc]) insert X S, A,B, [sc]
Effects on S: inserts activity X between activity sets Aand B.
It is a conditional insert if [sc] is specified (i.e. [sc] = XOR)
delete(S, X, [sc]) delete X S, [sc]
Effects on S: deletes activity X from S, i.e. X turns into a silent
activity. [sc] is specified ([sc] = XOR) when we block the branch
with X, i.e. the branch which contains X will not be activated
move(S, X, A,B, [sc]) move X S, A,B, [sc]
Effects on S: moves activity X from its original position to
another position between activity sets A and B. (it is a conditional
insert if [sc] is specified)
replace(S, X, Y) replace X Y
Effects on S: replaces activity X by activity Y
related pre/post conditions are described in [5]. If some additional constraints are
met, the high-level change operations depicted in Table 1 are also applicable at
the process instance level (e.g. to deal with exceptional situations [18]). Although
the depicted change operations are discussed in relation to the ADEPT meta
model (see [5] for details), they are generic in the sense that they can be easily
transferred to other meta models as well, (e.g. Petri Nets) [18]. We are referring
to ADEPT in this paper since it covers by far most high-level change operations
and change pattens respectively when compared to other approaches [25].
In our context, a trace ton process model Sdenotes a valid execution se-
quence t< a1, a2, . . . , ak>of activities aiNon Saccording to the control
flow defined by S. All the traces process model Scan produce are summarized in
set TS. Finally, t(ab) is denoted as precedence relationship between activities
aand bin trace t< a1, a2, . . . , ak>iff i < j :ai=aaj=b. Here, we only
consider traces composing ’real’ activities, but no events related to silent activ-
ities (i.e. activity nodes which contain no operation and exist only for control
flow purpose). The reason of why not considering silent activities is given later
(c.f. Section 4.4). At this stage, we consider two process models as being the
same if they are trace equivalent, i.e. SS0iff TS TS0. The stronger notion
of bi-similarity [26] is not considered at this stage.
3 high-level change operations
3.1 Complementary Nature of Change and Execution Logs
Change logs and execution logs document different run time information on
process instances and are not interchangeable. Even when the original process
model is given, it is not possible to convert the change log of a process instance
to its execution log or vice-verse. As an example, take the original model of a
patient treatment process as depicted in Figure 1a): a patient is admitted to a
V
hospital, where he first registers, then receives treatment, and finally pays.
Assume that, due to an occurring emergency situation, for one particular patient,
we want to first start the treatment of this patient and allow him to register
during treatment. To represent this exceptional situation in the process model
of the respective instance, the needed change would be to move activity receive
treatment from its current position to a position parallel to activity register.
This leads to a new model S0(S[σiS0) and the corresponding change log is σ=<
move(S,reveive treatment,admitted,pay)>. Meanwhile, the execution log
for this particular instance is then {admitted,receive treatment,register,
pay}(as numbers indicate in the Figure 1b). If we only have process model Sand
its execution log, it is not possible to determine this change because the process
model which can produce such execution log is not unique (for example, a process
model with the four activities contained in four parallel branches could produce
this execution log as well). On the contrary, it is generally not possible to derive
the execution log from a change log, because the execution behavior of S0is also
not unique (for example, a trace <admitted,register,receive treatment,
pay >is also producible on S0). Consequently, change logs provide additional
information compared to pure execution logs.
3.2 Why High-level Change Operations?
After showing the importance of changes, in this section we discuss why we
need high level change operations rather than change primitives (i.e. low-level
changes at edge and node level). Left side of Figure 2 shows original process
model Swhich consists of a parallel branching, a conditional branching and a
silent activity τ(depicted as empty node) connecting these two blocks. Assume
that two high-level change operations are applied to Sresulting in two models
S1,S2:1moves activity Cfrom its current location to the position between
activities Aand B, which leads to S1(i.e. S[1iS1with 1=move(S, C,A,B));
2moves activity Ato the position between activities Band C, i.e. S[2iS2
with 2=move(S, A,B,C). Figure 2 additionally depicts the change primitives,
representing snapshot differences between the model Sand the models S1and
S2, respectively.
receive
treatment
Admitted
a) S: original process model b) S’: final execution & change
1
3
2
4
register pay
register
receive
treatment
pay
AND-Split
AND-Join
1
A possible
execution order
Admitted
Fig. 1. change log and execution log is not interchangeable
VI
delEdge(StartFlow,A); delEdge(A,B);
delEdge(B,C); addEdge(B,A);
addEdge(A,C); addEdge(StartFlow,B)
delEdge(A,B); delEdge(B,C); delEdge(B,D);
delEdge(C, τ); delEdge(D,τ); delEdge(t,E);
delEdge(τ, F}; delNode(τ); addEdge(A,C);
addEdge(C,B); addEdge(B,D); addEdge(D,E);
addEdge(D,F); updateNodeType(D,
XorSplit); updateNodeType(B, empty);
G
BA
C
D
E
F
D
AC
E
F
BG
B
A C
D
E
F
G
Change Primitives
Change Primitives
1
=Move (S, C, A, B)
S
1
:
model after change
1
2
=Move (S, A, B, C)
S[∆
1
>S
1
S[∆
2
>S
2
S:
original process model
S
2
:
model after change
2
AND-Split
AND-Join
XOR-Split
XOR-Join
Fig. 2. high-level change operation compared with change primitive
When compared to change primitives, using high-level change operations like
’move’ or ’insert’ offers the following advantages:
1. High-level change operations, as supported by ADEPT, guarantee soundness:
i.e., the application of a high-level change operation on a sound model S
results in another sound model S0[5]. This also applies to our example
from Figure 2. By contrast, when applying one single change primitive (e.g.
deleting an edge in S), soundness cannot be guaranteed anymore. Generally,
if we delete any of the edges in S, the resulting process model will not be
necessarily sound.
2. High-level change operations provide richer syntactical meanings than change
primitives. Generally, a high-level change operation is built upon a set of
change primitives which collectively represent a semantic modification of a
process model. As example, take 1from Figure 2. This high-level change
operation requires 15 change primitives for its realization (deleting edges,
adding edges, deleting the silent activity, and updating the node types).
3. An important aspect, not discussed so far, concerns the number of change
operations needed to transform a process model Sinto a target model S0.
For example, we need only one move operation to transform Sto either S1or
S2. However, when using change primitives, migrating Sto S1necessitates
15 changes, while the second change 2can be realized based on 6change
primitives. This simple example shows that change primitives do not provide
an adequate means to determine the difference between two process models.
Thus the number of the change primitives cannot represent the efforts for
process model transformations.
3.3 The Challenge to Derive High-level Change Operations
After sketching the benefits coming with high-level change operations, this sec-
tion discusses challenges of Deriving them. When comparing two process models,
VII
S(
1
>S
S(
1
>S
1
= Delete (S, B)
2
= Delete (S, C)
delEdge(A,B), delEdge(B,C),
addEdge(A,C)
, delNode(B)
delEdge(C,D), addEdge(A,D),
delEdge(A,C)
,
delNode(C)
S(σ>S
σ =< Delete (S, B),
Delete (S, C) >
delEdge(A,B),
delEdge(B,C),
delEdge(C,D),
addEdge(A,D),
delNode(B)
delNode(C)
S’’ A D
S’ AC D
SAB C D
Fig. 3. undetectable change primitives
the change primitives needed for transforming one model into another can be
easily determined by performing two snapshots and a delta analysis on them
[1]. An algorithm to minimize the number change primitives has been given in
[11]. However, when trying to derive the high-level change operations needed for
model transformation, several challenges occur. Consider Figure 3 as example:
1. When performing two delete operations on S, i.e. 1=delete(S, B) and
2=delete(S, C), we obtain a new model S00 with S[σiS00 with σ=<
1, 2>, as well as an undetectable intermediate model S0with S[1iS0
and S0[2iS00. When examining the change primitives corresponding to each
high-level change operation, we need to first add edge (A,C) after the first
delete operation 1, and remove this edge (A,C) when applying the sec-
ond delete operation 2. However, when performing a delta analysis for
the original process model Sand the resulting process model S00, the two
change primitives (addEdge(A,C) introduced by the first delete operation
and delEdge (A,C) introduced by the second delete operation) jointly have
no effect on the resulting process model S00 so that they cannot be detected
by snapshot analysis. Consequently, deriving high-level change operations
from change primitive would be challenging because the change primitives
required for every high-level change will not always appear in the snapshot
differences between the original and resulting models (like in Figure 3, none
of the two sets of change primitives associated to 1or 2are a sub-set of
the set of change primitives associated with σ).
2. Even when there is just one high-level change operation, it remains difficult
to derive it with a delta algorithm. For example, in Figure 3, the delta
algorithm shows that when transforming Sto S1,15 change primitives need
to be applied to the original model S. However, the depicted change can be
also realized by just applying one high level move operation to S.
VIII
4 Detecting The Minimal Number Of High-level Change
Operations
In this section, we introduce our method to detect the minimal number of change
operations needed to transform a given process model Sinto another model S0.
As example, consider the process models Sand S0in Figure 4.
4.1 General Description of our Method
As mentioned in Section 1, the key issue of this work is to minimize the number
of change operations needed to transform a process model Sto another model
S0. In this context, let Nand N0be the two sets of activities based on which
Sand S0are defined. Generally, three steps are needed (cf. Figure 4) to realize
this minimal transformation:
1. aiN\N0:delete all activities being present in S, but not in S0. As
depicted in our example from Figure 4, this first step transforms Sto Ssame
(cf. Figure 4b).
2. aiNTN0:move all activities being present in both models to the lo-
cations as reflected by S0. As depicted in our example from Figure 4, this
second step transforms Ssame to S0
same (cf. Figure 4c).
3. aiN0\N:insert those activities being present in S0, but not S. As
depicted in Figure 4, the third step transforms S0
same to S0(cf. Figure 4d).
Insertions and deletions deal with changes of the set of activities and we can
hardly do anything here to reduce efforts (i.e. the number of required high-level
insert/delete operations): New activities (aiN0\N) must be added and ob-
solete activities (ajN\N0) must be deleted. The focus of minimality can
therefore be shifted to the use of the move operation, which changes the struc-
ture of a process model, but not its set of activities. Since a move operation
logically corresponds to a delete followed by an insert operation, we can trans-
form Ssame to S0
same by maximal n=|NTN0|move operations. Reason is that
nmove operations correspond to deleting all activities and then inserting them
back to their new positions. So nwould be the maximal number of change oper-
ations needed to transform one process model into another, with the same set of
activities (Ssame and S0
same in our example from Figure 4). However, this would
be not in line with our goal of minimality. To measure the complete transforma-
tion from Sto S0, we formally define process distance and process similarity as
follow:
Definition 2 (Process distance & process similarity.).
Let S, S0 P be two process models. Let Nand N0be two sets of activities
based on which Sand S0are defined. Let further σ=h1, 2, . . . ni Cbe a
sequence of change operations transforming Sinto S0(i.e. S[σiS0). Then the dis-
tance between Sand S0is given by d(S,S0)=min{|σ| |σ CS[σiS0}. Further-
more, the process similarity between Sand S0then equals to 1d(S,S0)
|N|+|N0|−|NTN0|.
i.e. the similarity equals ((maximal number of changes - minimal number of
changes) / maximal number of changes).
IX
4.2 Determining Required Activity Deletions and Insertions
To accomplish Step 1 and Step 3 of our method, we have to deal with the
change of the activity set when transforming Sto S0. It can be easily detected
by applying existing snapshot algorithms [1] to both Sand S0. As described in
Section 4.1, as first step we need to delete all activities aiN\N0contained in
S, but not in S0. Regarding our example from Figure 4, we therefore can derive
as our first high-level change operation 1=delete(S, X). Similarly, activities
contained in S0, but not in S, are to be inserted in Step 3 of our method, after we
moved the shared activities to their respective position in S0(S0
same respectively).
The parameters of the insert operation, i.e. the predecessors and successors of
the inserted activity, are just like how they appear in S0. In this way, we obtain
the last two change operations for our example: Insert(S, Y,StartFlow,{A,B})
and Insert(S, Z,D,E).
4.3 Determining Required Move Operations to Deal With
Structure Changes
In this section, we focus on Step 2 of our method, i.e. to transform two process
models with same activity set using move operations. Here, we can ignore the
activities not contained in both Sand S0(they have already been handled in
Section 4.2). We therefore consider the two process models Ssame and S0
same
respectively, as depicted in Figure 4.
Determine the Order Matrix of a Process Model One key feature of the
ADEPT change framework is to maintain the structure of the unchanged parts
of a process model [5]. For example, if we delete an activity, this will neither
influence the successors nor the predecessors of this activity, and also not their
control relationships. To incorporate this feature in our approach, rather than
S
: original process model
S’
: destination process model
E
D
BC
AZ
F
G
Y
C
E
X
D
F
G
B
A
C
E
A
D
F
GB
S
same
: original model with shared activities
S’
same
: destination model with shared activities
A
C
B
E
F
GD
S
t
e
p
1
:
d
e
l
e
t
e
Transform
S to S’
Step2: move
S
t
e
p
3
:
i
n
s
e
r
t
a)
b) c)
d)
Fig. 4. Three steps to transform Sinto S0
X
only looking at direct predecessor successor relationships between two activities
(i.e. control flow edges), we consider the transitive control dependencies between
all pairs of activities; i.e. for every pair of activities ai, ajNTN0,ai6=aj,
their execution order compared to each other is examined. Logically, we check
the execution orders by considering all traces a process model can produce. The
results can be formally described in a matrix An×nwith n=|NTN0|. Four
types of control relations can be identified (cf. Def. 3):
Definition 3 (Order Matrix). Let S P be a process model with N=
{a1, a2, . . . , an}. Let further TSdenote the set of all traces producible on S.
Then, matrix An×nis called order matrix of Swith Aij representing the rela-
tion between different activities ai,ajNiff:
Aij = ’1’ iff (t TSwith ai, ajtt(aiaj))
If for all traces containing activities aiand aj, activity aialways appears
BEFORE aj, we will denote Aij as ’1’, i.e., aiis a predecessor of ajin the
flow of control.
Aij = ’0’ iff (t TSwith ai, ajtt(ajai))
If for all traces containing activity aiand activity aj, activity aialways
appears AFTER aj, then we will denote Aij as a ’0’, i.e. aiis a successor
of ajin the flow of control.
Aij = ’*’ iff (t1 TS, with ai, ajt1t1(aiaj))(t2 T , with
ai, ajt2t2(ajai))
If there exists at least one trace in which aiappears before ajand at least
one other trace in which aiappears after aj, we will denote matrix element
Aij as ’*’, i.e. aiand ajare contained in different parallel branches.
Aij = ’-’ iff ( ¬∃t TS:aitajt)
If there exists no trace containing both activity aiand aj, we will denote
Aij as ’-’, i.e. aiand ajare contained in different branches of a conditional
branching.
We re-visit our example from Figure 4. The order matrices of Ssame and S0
same
are shown in Figure 5. The main diagonal is empty since we do not compare
an activity with itself. As one can see, elements Aij and Aji can be derived
from each other. If activity aiis a predecessor of activity aj, (i.e. Aij = 1), we
can always conclude that Aji = 0 holds. Similarly, if Aij {’*’,’-’}, then we
will obtain Aji =Aij . As a consequence, we can simplify our problem by only
considering the upper triangular matrix A= (Aij)j>i.
Under certain constraints, an order matrix Acan uniquely represent the
process model, based on which it was built on. This is stated by Theorem 1.
Before giving this theorem, we need to define substring of trace:
Definition 4 (Substring of trace).
Let tand t0be two traces. We define ta sub-string of t0iff [ai,ajt,
t(aiaj)ai,ajt0t0(aiaj)] and [akN:ak/takt0].
Theorem 1. Let S, S0 P be two process models, with same set of activities
N={a1, a2, . . . , an}. Let further TS,TS0be the related trace sets and An×n,
XI
A0
n×nbe the order matrices of Sand S0. Then S6=S0A6=A0, if (¬∃t1, t0
1
TS:t1is a substring of t0
1) and (¬∃t2, t0
2 TS0:t2is a substring of t0
2).
According to Theorem 1, a process model Sand order matrix Ais a one-
to-one mapping if the substring constraint is satisfied. A detailed discussion of
the sub-string restriction is given further in Section 4.4. A proof of Theorem 1
can be found in [27]. Thus, when comparing two process models, it is sufficient
to compare their order matrices (cf. Def. 3), since a order matrix can uniquely
represent the process model. This also means that the differences of two process
models can be related to the differences of their order matrices. If two activities
have different execution order in two process models, we define the notion conflict
as follows:
Definition 5 (Conflict). . Let S, S0 P be two process models with same set of
activities N. Let further Aand A0be the order matrices for Sand S0respectively.
Then: Activities aiand ajare conflicting iff Aij 6=A0
ij. We formally denote this
as C(ai,aj).CF := {C(ai,aj)|Aij 6=A0
ij}, then corresponds to the set of all
existing conflicts.
Figure 5 marks up differences between our two order matrices in grey. The
set of conflicts is as follows: CF ={C(A,B),C(C,D),C(C,F),C(D,E),C(D,F),
C(E,F)}.
Optimizing the Conflicts To come from Ssame to S0
same, we have to eliminate
conflicts between them by applying move operations. Obviously, if there is no
Execution order Matrix S
same
Execution order Matrix S’
same
A B
A
B
The first group of
activities and conflicts
C D E F
C
D
E
F
The second group of
activities and conflicts
Fig. 5. the execution order matrices of Ssame and S0
same in Figure 4
XII
conflict for the two process models Ssame and S0
same, they are identical. Every
time we move an activity from its current position in Ssame to the position it has
in S0
same, we can eliminate the conflicts this activity has with other activities. For
example, consider activity Ain Figure 4. If we move activity Afrom its position
in Ssame (preceding B) to its new position in S0
same (Aand Bare contained in
two different branches of a conditional branching block), then we can eliminate
conflict C(A,B). Shown in the order matrices, moving activity Arequires two
steps. First, set the elements in the first row and first column of An×n(which
corresponds to activity A) to empty, since Ais moved away. Second, reset these
elements according to the new order relation of A, when compared to the other
activities from S0
same. So every time we move an activity, we are able to change
the value of its corresponding row and column in the order matrices, i.e., we
change these values corresponding to the original model to the values compliant
with the target model. By doing this iteratively, we can change all the values
and eliminate all the conflicts so that we could achieve the transformation from
Ssame to S0
same.
A none-optimal solution to transform the processes would be to move all the
activities contained in the conflicts in CF, from their positions in Ssame to the
positions they have in S0
same. Regarding our example from Figure 5, to apply
this straightforward method, we would need to move activities A,B,C,D,Eand F
from their positions in Ssame to the ones in S0
same. However, this naive method
is not in line with our goal to minimize the number of applied change operations.
For example, after moving activity Afrom its current position in Ssame to the
position it has in S0
same, we do not need to move activity Banymore, because
after this change operation, there are no activities with which activity Bstill has
conflicts.
Digital logic in Boolean algebra [20] helps to solve this minimization prob-
lem. Digital logic constitutes the basis for digital electronic circuit design and
optimization. In this field, engineers face the challenge to optimize the internal
circuit design given the required input and output signals. To apply such a tech-
nique in our context, we consider each process activity as an independent input
signal and we want to design a circuit which can cover all conflicts defined by CF
(cf. Def 5). If activity aiconflicts to activity aj, we can either move one of them
or both of them from the positions they have in Ssame to the ones they have
in S0
same. Doing so, the conflict will not exist any more. Reason is that every
time we move an activity from the position it has in Ssame to the position it
has in S0
same, we reset the corresponding row and column of this activity in the
order matrix. A conflict can be interpreted as a digital signal as follows: when
the two input signals aiand ajare both ”true”(this means we do not move
activity aiand aj), we cannot solve the conflict and the ’circuit’ shall give an
output signal of ”false”, (i.e. aiaj= 0). If we apply this to all conflicts in CF, we
will obtain all the ”false” signals. Meanwhile, the ”circuit” should be able to tell
us what will result in a ”true” output, (i.e. the negative of all ”false” signals).
This ”true” output represents which activities we need to move. Regarding our
XIII
example from Figure 5, given the set of conflicts CF, our logic expression then
is: AB +CD +CF +DE +DF +EF.
The complexity for optimizing the logic expression is NP-Hard [20]. Therefore
it is advantageous to reduce the size of the problem. Concerning our example,
we can cut down the optimization problem into two groups: one with activities A
and B, and conflict C(A,B); another one with activities C,D,Eand Fand conflicts
{C(C,D),C(C,F),C(D,E),C(D,F),C(E,F)}.
Such a division can be achieved in O(n) time in the following three steps. Step
1: List all the conflicting activities, and set every activity as a group. Step 2: If
conflicting activities aiand aj(i.e. C(ai,aj)) are contained in two different groups,
merge these two groups. Step 3: Repeat Step 2 for all conflicts in CF. After these
three steps, we can divide the activities as well as the associated conflicts into
several groups. Regarding our example, the optimization problem can be divided
into two sub-optimization problems: AB and CD +CF +DE +DF +EF. We depict
this in the two small matrices in Figure 5.
Optimizing logic expression has been discussed intensively in Discrete Math-
ematics. Therefore we omit details here and refer to Karnaugh map [20] and
Quine-McCluskey algorithm [20]. We have implemented the latter in our proof-
of-concept prototype. Regarding our example in Figure 4, the two optimiza-
tion results are: AB =¯
A+¯
Bfor the first group, and CD +CF +DE +DF +EF =
¯
D¯
F+¯
C¯
E¯
F+¯
C¯
D¯
Efor the second group. We can interpret this result as follows.
For the second group, either we move activities Dand F, or we move activities
C,Eand F, or we move activities C,Dand Efrom their position in Ssame to the
positions they have in S0
same. Based on this, we can transform Ssame to S0
same,
since all conflicts are eliminated. As can be seen from the order matrices, if
we change the value of the corresponding rows and columns of these activities
in Ssame, we can turn Ssame into S0
same. As we want to minimize the number
of change operations, we can draw the conclusion that activities Dand Fmust
be moved. Same rule applies to the result of the first group. However, there is
no difference to move either Aor B, since both operations count as one change
operation. Here, we arbitrarily decide to move activity B.
So far we have determined the set of activities to be moved. The next step is
to determine the positions where the activities need to be moved to. Operation
move(S, X,A,B,[sc]) will be independent from other move operations (i.e. it
does not matter in which order to move the respective activity) if its direct
predecessors Aand direct successors Bdo not belong to the set of activities
to be moved. Regarding our example from Figure 4, activity Fsatisfies this
condition since its predecessor Cand successor Gare not moved. If this had not
been the case, we would have to introduce silent activities to put the moved
activity to its corresponding place in S0
same. For example, if we want to first
move Bto its position in S0
same, we will have to introduce a silent activity after
Band before Cand E. Only in this way, we can change the execution order of B
to what it appears in S0
same. However, such silent activity will be not required
if we first move activity Dto the position it has in S0
same. A detailed discussion
XIV
C
E
A
D
F
GB
S
same
S’
same
M
o
v
e
(
S
,
D
,
{
A
,
B
}
,
{
C
,
E
}
)
move(S,B, Startflow, D, XOR)
M
o
v
e
(
S
,
F
,
C
,
G
)
a)
b) c)
d)
A
C
B
E
F
GD
A
E
B
C
F
GD
C
E
A B
F
GD
Fig. 6. Process models after every move operation
is out of the scope here; we refer readers to the reduction rules introduced with
the ADEPT method [5].
According to the position the moved activities have in S0
same, we can deter-
mine the parameters (i.e. the predecessors, successors and conditions) for every
move operation. In S0
same, activity Dhas predecessors Aand B, and successors E
and C. So one move operations therefore is: move(S, D,{B,A},{C,E}). Similarly,
we obtain the other two move operations: move(S, B,StartFlow,D,XOR) and
move(S, F,C,G). The intermediate process models resulting after every move op-
eration is shown in Figure 6. When comparing order matrices for each model in
Figure 6, it becomes clear that every move operation changes the values of the
row and the column corresponding to the moved activity.
4.4 Silent Activities
A silent activity is an activity which does not contain any operation or action,
and which only exists for control flow purpose. There are two reasons why we
do not consider silent activities in our similarity measure:
1. The appearance of a silent activity can be random. We can add or remove
silent activities without changing the behavior of the process model. For
example, we can replace a control flow edge in a process model by one silent
activity or even a block of silent activities without influencing the behavior
of the process model.
2. The existence of a silent activity also depends on other activities and is
subject to change as other activities change. As example consider Figure 2.
When applying change operation 1to S, the silent activity τis automati-
cally removed after activity Cis moved away.
However, there is one exception for which we need to consider silent activities.
Consider the two process models S1and S2as depicted in Figure 7. If we ignore
XV
T
S2
= {ABC, AC}
T
S1
= {ABC}
S
2
S
1
A B C A C
B
A C
B
D
T
S3
= {AC}
S
3
S
4
A C
T
S2
= {ABC, ADC}
Insert(S
1
, τ, A, C,
XOR
)
d
e
l
e
t
e
(
S
4
,
D
,
X
O
R
)
delete(S
4
, D)
replace(S
2
, τ, D)
d
e
l
e
t
e
(
S
2
,
B
)
i
n
s
e
r
t
(
S
3
,
B
,
A
,
C
,
X
O
R
)
Insert(S
3
,B,A,C)
Fig. 7. The influence of silent activity
the silent activity τ(depicted as an empty node) in S2, and derive the order
matrix of process model S2, it will be the same as the one of S1. Obviously, the
two process model are not equivalent since the trace sets producible by them are
not identical. More precisely, TS2contains one additional trace when compared
to TS1. In general, if one process model can produce additional traces, which
are the sub-string of other traces (cf. Def.4), there must be some silent activities
we cannot ignore. Or if the direct predecessor and direct successor of one silent
activity constitute XORsplit and XORjoin, we can also not ignore this silent
activity (cf. S2in Figure 7).
Figure 7 shows several process model transformations based on high level
change operations. Here we can identify the difference between the two types
of deletion: delete(S4,D) and delete(S4,D,XOR) (cf. Figure 7). The former one
turns an activity into a silent one (transforming S4into S2), while the latter
one blocks the branch which contains activity D(transforming S4to S1). When
a branch is blocked, we do not allow the activities of the branch to become
activated [14, 5]. Since process models S1and S2have same order matrix, purely
comparing order matrices (cf. Section 4) would not be sufficient in the given
situation. Reason is that an order matrix does not uniquely represent a process
model, since the sub-string constraint (cf. Def.4) in Theorem 1 is violated. To
extend our method such that it can uniquely represent a process model without
the sub-string constraint, we must consider these special silent activities (i.e.
a silent activity which has direct predecessor of XORsplit and direct successor
of XORjoin) as well. They will appear in the order matrix and their execution
orders compared with other activities will be documented.
However, the existence of a silent activity is still very much dependent on
other activities, including the scenario described above. For example, if we delete
activity Bin S2as depicted in Figure 7, we will transform S2into S3, i.e. the
silent activity will be simultaneously deleted when activity Bis deleted. We can
identify this situation by either examining the process model or the order matrix.
In the process model, a silent activity τcan be automatically deleted if there is
another silent activity τ0which is contained in the same block but in another
XVI
Figure 2 Figure 4
SS1S2SSsame S'same S'
S 0 / 100% 1 / 86% 1 / 86% 4 / 50% 3 / 57% 3 / 57% 5 / 44%
Figure 2 S10 / 100% 2 / 71 % 4 / 50% 3 / 57% 3 / 57% 5 / 44%
S20 / 100% 5 / 38% 4 / 42% 3 / 57% 5 / 44%
S 0 / 100% 1 / 88% 4 / 50 % 6 / 40%
Figure 4 Ssame 0 / 100% 3 / 57% 5 / 44%
S'same 0 / 100% 2 / 78 %
S' 0 / 100%
Fig. 8. The distances and similarities of different process models
conditional branch (e.g. transforming S2to S3Figure 7). In the order matrix,
we can automatically remove a silent activity τif there is another silent activity
τ0which has the same order relations to the rest of the activities as τhas.
In general, when a silent activity has an XORsplit as direct predecessor and
an XORjoin as direct successor, we need to consider this silent activity when
computing the order matrix of a process model. However, these silent activities
can automatically been deleted when changing the process model. This requires
us to perform additional checks on the process model or order matrix (as de-
scribed above) after every change operation.
4.5 Summary
Taking our example from Figure 4, to transform Sinto S0(S[σiS0), the fol-
lowing six change operations are required: σ={delete(S, X), move(S, F,C,G),
move(S, D,{A,B},{C;E}), move(S, B,StartFlow,D,XOR), insert(S, Y,StartFlow,{A,B}),
insert(S, Z,D,E)}. The distance between the two process models is six, and the
similarity is 0.4 (cf. Def.2).
To illustrate our method and these numbers in more detail, we compare the
distances and similarities between seven process models discussed so far: S,S1
and S2in Figure 2 as well as S,Ssame,S0
same and S0in Figure 4. The distance
and similarity of two process models are specified as distance/similarity in each
corresponding cell in Figure 8. As the transformation is commutable, we only fill
in the upper triangle matrix. Taking Figure 8, we can conclude the following:
1. Changing the activity set always leads to changing distance. For example,
d(Sn,S0
same)always equals d(Sn,S0)+ 2, where Snstands for a process model
other than S0or S0
same in Figure 8. Reason is that S0contains two unique
activities Yand Zwhen compared to S0
same, while the rest are identical.
2. If three process models S,S0, and S00 have the same set of activities, d(S,S00 )
d(S,S0)+d(S0,S00 )will hold. It is also easy to understand this because some
activities could be moved twice when transforming Sinto S0and S0into S00.
XVII
5 related work
Various papers have studied the process similarity problem and provided some
useful results [12–14, 21]. In graph theory, graph isomorphism[2] and sub-graph
isomorphism [24] are often used to measure the similarity between two graphs.
Unfortunately, these measures usually only examine edges and nodes and cannot
catch the syntactical issues of a PAIS. In the database field, the delta-algorithm
[1] is used to measure the difference between the two models. It extends the
above mentioned approaches by assigning attributes to edges and nodes [11].
Still, it can only catch change primitives, and will further run into problems
when catching the high-level change operations (as described in Section 3). In
the fields of Petri-nets and state automata, similarity based on change is difficult
to measure since these formalisms are not very tolerant for changes. Inheritance
rules [14] are one of the very few techniques given to show the transformation of
a process model described as Petri-net. Trace equivalence is commonly used to
compare whether two process models are similar or identical [26]. In addition,
bisimulation [14, 15] extends trace equivalence by considering stronger notions.
Also based on traces, [12] assign weights to each trace based on execution logs
which reflect the importance of a certain trace. Respective techniques are applied,
for example in genetic process mining [13]. The edit distance [21] is also used to
measure the difference between traces; the sum of them represents the differences
of two models. Some similarity measures use two figures (precision and recall)
to evaluate the difference between process models S1to S2[12, 16]. However,
this asymmetric way might lead confusion since it is therefore not commutable.
None of these approaches measures the similarity by a unique and commutative
figure, based on the effort for change.
6 Summary and OutLook
We have provided a method to quantitatively measure the distance and similarity
between two process models based on the efforts for model transformation. High-
level change operations are used to evaluate the similarity since they guarantee
soundness and also provide more meaningful results. We further applied digital
logic in boolean algebra so that the number of change operations required to
transform process model Sinto process model S0is minimized.
Some additional work is needed to enrich our knowledge on process similar-
ity. As a first step, we will extend our method so that it will be able to measure
the similarity between process models using additional constitutes (e.g., syn-
chronization and loopback edges [5]). The next step will be to incorporate data
flow, temporal constraints, and resources, so that the similarity measure can be
further applied to practice.
References
1. W.Labio , H.Garcia-Molina: Efficient Snapshot Differential Algorithms for Data
Warehousing. 22th Int. Conf. on Very Large Data Bases. pp 63-74, 1996
XVIII
2. L.Babai ,P.Erd¨os, S.M.Selkow: Random Graph Isomorphism. SIAM Journal of
Computation 9, pp 628-634, 1980
3. P.Balabko, A.Wegmann, A.Ruppen and N.Clement: Capturing design rationale
with functional decomposition of roles in business processes modeling. Software
Process: Improvement and Practice. 10(4):379-392, 2005.
4. I Bider: Masking flexibility behind rigidity: Notes on how much flexibility people
are willing to cope with. CAiSE’05 Workshop, pp 7-18, 2005.
5. M. Reichert and P. Dadam. ADEPTflex - Supporting Dynamic Changes of Work-
flows Without Losing Control. Journal of Intelligent Information Systems, 10(2):93–
129, 1998.
6. M.Dumas, W.M.P. van der Aalst, and A.H.M. ter Hofstede: Process-Aware Infor-
mation Systems: Bridging People and Software through Process Technology.Wiley
& Sons, 2005.
7. S.Rinderle, M.Reichert and P.Dadam. Correctness Criteria for Dynamic Changes
in Workflow Systems - A Survey. Data and Knowledge Engineering, 50(1):9-34,
2004.
8. S.Rinderle, B.Weber, M.Reichert and W.Wild: Integrating Process Learning and
Process Evolution-A Semantics Based Approach. BPM’05, pp 252-267,2005.
9. M.Kradolfer and A.Geppert: Dynamic workflow schema evolution based on work-
flow type versioning and workflow migration. CoopIS’99, pp 104-114, 1999
10. M.Weske: Formal foundation and conceptual design of dynamic adaptations in a
workflow management system. HICSS-34, 2001.
11. S.Rinderle, M.Jurisch and M.Reichert: On Deriving Net Change Information From
Change Logs CThe DELTA-LAYER Algorithm. BTW’07, pp364-381,2007
12. W.M.P. van der Aalst, A.K. Alves de Medeiros, and A.J.M.M. Weijters: Pro-
cess Equivalence: Comparing Two Process Models Based on Observed Behavior.
BPM’06, LNCS 4102, pp 129-144. 2006.
13. A.K.A. de Medeiros, A.J.M.M. Weijters and W.M.P. van der Aalst. Genetic Process
Mining: A Basic Approach and its Challenges. BPM’05 Workshops , LNCS 3812
pp 203-215. 2006.
14. W.M.P. van der Aalst and T. Basten:Inheritance of Workflows: An Approach to
Tackling Problems Related to Change. Theoretical Computer Science, 270(1-2):125-
203, 2002
15. R.J. van Glabbeek and W.P. Weijland: Branching Time and Abstraction in Bisim-
ulation Semantics. Journal of the ACM, 43(3):555-600, 1996.
16. S.S. Pinter and M. Golani: Discovering Workflow Models from Activities’ Lifespans.
Computers in Industry, 53(3):283-296, 2004.
17. C.W. G¨unther, S. Rinderle, M. Reichert, and W.M.P. van der Aalst: Change
Mining in Adaptive Process Management Systems. CoopIS’06, LNCS 4275, pp
309-326. 2006.
18. M.Reichert, S.Rinderle and P.Dadam: On the common support of workflow type
and instance changes under correctness constraints. CoopIS’03, LNCS 2888, pp
407-425, 2003.
19. T.H.Cormen, C.E.Leiserson, R.L.Rivest and C.Stein: Introduction to Algorithms,
Second Edition. MIT press, 2001, pp 549-552
20. S.Brown and Z.Vranesic: Fundamentals of Digital Logic with Verilog Design.
McGraw-Hill, 2003
21. A.Wombacher and M.Rozie: Evaluation of Workflow Similarity Measures in Service
Discovery. Service Oriented Electronic Commerce. pp 51-71, 2006
22. S.Rinderle: Schema Evolution in Process Management Systems. Ph.D thesis, Univ.
of Ulm, 2004
XIX
23. D.M. Strong and S.M. Miller: Exceptions and exception handling in computerized
information processes. ACM Trans on Information Systems, 13(2):206-233, 1995
24. E. B.Krissinel and K.Henrick: Common subgraph isomorphism detection by back-
tracking search. Softw. Pract. Exper. 34(6):591-607, 2004
25. B.Weber and S.B.Rinderle and M.Reichert: Change Patterns and Change Support
Features in Process-Aware Information Systems. CAiSE 2007, LNCS 4495. pp 574-
588. 2007
26. J.Hidders, M.Dumas, W. M.van der Aalst, A. H. ter Hofstede, and J.Verelst. When
are two workflows the same?. 2005 Australasian Symposium on theory of Comput-
ing - Vol 41. ACM vol. 105. 3-11. 2005
27. C.Li, M.Reichert, A.Wombacher. Process Similarity Based on High Level Change
Operations. CTIT Technical Report, University of Twente, The Netherlands.