On Measuring Process Model Similarity based on High-level Change Operations [original]

On Measuring Process Model Similarity based

on High-level Change Operations

Chen Li1, Manfred Reichert2, and Andreas Wombacher3

1Information System group, University of Twente, The Netherlands

[email protected]

2Institute of Databases and Information System, Ulm University, Germany

[email protected]

3Database group, University of Twente, The Netherlands

[email protected]

Abstract. For various applications there is the need to compare the

similarity between two process models. For example, given the as-is and

to-be models of a particular business process, we would like to know how

much they differ from each other and how we can efficiently transform

the as-is to the to-be model; or given a running process instance and

its original process schema, we might be interested in the deviations

between them (e.g. due to ad-hoc changes at instance level). Respective

considerations can be useful, for example, to minimize the efforts for

propagating the schema changes to other process instances as well. All

these scenarios require a method to measure the similarity or distance

between two process models based on the efforts for transforming the one

into the other. In this paper, we provide an approach using digital logic

to evaluate the distance and similarity between two process models based

on high-level change operations (e.g. to add, delete or move activities). In

this way, we can not only guarantee that model transformation results in

a sound process model, but also ensure that related efforts are minimized.

1 Introduction

Business world is getting increasingly dynamic, requiring from companies to con-

tinuously adapt business processes as well as supporting Process-Aware Informa-

tion Systems (PAISs) [3] in order to cope with the frequent and unprecedented

changes in their business environment [19]. Organizations and enterprises need

to continuously Re-engineer their Business Processes (BPR), i.e. they need to be

able to flexibly upgrade and optimize their business processes in order to stay

competitive in their market. Furthermore, PAISs should allow for process flexi-

bility, i.e., it must be possible for users to deviate from the pre-defined process

model at the instance level if required.

The pivotal research on process flexibility over the last years [1, 11] has pro-

vided the foundation for dynamic process change to reduce the cost of change

1Supported by the Netherlands Organization for Scientific Research (NWO) under

contract number 612.066.512

in PAISs. Process flexibility denotes the capability to reflect externally triggered

change by modifying only those aspects of a process that need to be changed,

while keeping the other parts stable, i.e., the ability to change or evolve the

process without completely replacing it [11]. To compare two process models is

a fundamental task in this context. In particular, it becomes necessary to cal-

culate the minimal difference between two process models based on high level

changes. If we need to transform one model into another, for example, efforts can

then be reduced and the transformation can go smoothly; i.e. we do not need

to re-define the new process model from scratch, but only apply these high-level

changes either at process type or process instance level. Several approaches like

ADEPT [11], WASA [20] or TRAM [6], have emerged to enable process change

support in PAIS (see [13] for an overview).

Based on the two assumptions that (1) process models are block-structured

and (2) all activities in a process model have unique labels, this paper deals with

the following fundamental research question:

Given two process models Sand S0, how much do they differ from each other

in terms of high-level change operations? And what is the minimal effort, i.e.

the minimal number of change operations needed to transform Sinto S0?

Clearly, our focus is on minimizing the number of high-level change operations

needed to transform process model Sinto process model S0. Soundness of the

resulting process model should be also not sacrificed. We apply the high-level

change operations as described in [11, 19] in the given context. By considering

high-level changes, we can distinguish our approach from traditional similarity

measures like graph or sub-graph isomorphism [15]. Both only consider basic

change primitives like insertion or deletion of single nodes and edges.

Answering the above research question will lead to better cost efficiency when

performing BPR, since the efforts to implement the corresponding changes in

the supporting PAIS are minimized. At process instance level, we can reduce the

efforts to propagate process type changes to the running instances [13]. Finally

the derived differences between original process model and its process instances

can be used as a set of pure and concise logs for process mining [4].

In previous work, we have provided the technical foundation for users to

flexibly change process models at both the process type and the process instance

level. For example, users may dynamically insert, delete or move an activity at

these two levels [11]. In addition, snapshot differential algorithms [7], known

from database technology, can be used as a fast and secure method to detect the

change primitives (e.g. to add or delete nodes and edges) needed to transform

one process model into another.

Using this framework and snapshot differential algorithm, this paper applies

Digital Logic in Boolean Algebra [14] to provide a new method to transform

a process model into another one based on high-level change operations. This

method does not only minimize the number of changes needed in this context,

but also guarantees soundness of the changed process model, i.e. the process

model remains correct when applying high-level change operations. We further

provide two measures –process distance and process similarity –based on high-

level change operations, which indicate how costly it is to transform process

model Sinto model S0, and how different Sand S0are.

The remainder of this paper is organized as follows: Sec. 2 introduces back-

grounds needed for the understanding of this paper. In Sec. 3 we discuss reasons

and difficulties for deriving high-level change operations. Sec. 4 describes an ap-

proach to detect the difference between two process models. Sec. 5 discuss related

work. The paper concludes with a summary and outlook in Sec. 6.

2 Backgrounds

Let Pdenote the set of all correct process models. A particular process model

S= (N, E, . . .)∈ P is defined as a well-structured Activity Net [11]. Ncon-

stitutes a set of activities aiand Eis a set of precedence relations (i.e. control

edges) between them. To limit the scope, we assume Activity Nets to be block

structured. Examples are depicted in Fig 1.

We assume that a process change (i.e. Activity Net Change) is accomplished

by applying a sequence of high-level change operations to a given process model

Sover time [11]. Such change operations modify the initial process model by

altering the set of activities and/or their order relations. Thus, each application

of a change operation results in a new process model. We define process change

as follows:

Definition 1 (Process Change). Let Pdenote the set of possible process mod-

els and Cthe set of possible process changes. Let S, S0∈ P be two process models,

let ∆∈ C be a process change, and let σ=h∆1, ∆2, . . . ∆ni ∈ C∗be a sequence

of process changes performed on initial model S. Then:

–S[∆iS0iff ∆is applicable to Sand S0is the process model resulting from

the application of ∆to S.

–S[σiS0iff ∃S1, S2,...Sn+1 ∈ P with S=S1,S0=Sn+1, and Si[∆iSi+1 for

i∈ {1, . . . n}.

Examples of high-level change operations and their effects on a process model

are depicted in Table 1. Issues concerning the correct use of these operations and

related pre-/post- conditions are described in [11]. If some additional constraints

are met, the high-level change operations depicted in Table 1 will be also ap-

plicable at process instance level. Although the depicted change operations are

discussed in relation to our ADEPT framework [11], they are generic in the sense

that they can be easily transferred to other process meta models as well. For

example, the change operations in Table 1 can be also expressed by the life-cycle

inheritance rule as used in the context of Petri Nets [16]. We are referring to

ADEPT in this paper since it covers by far most high-level change operations

and change patterns respectively when compared to other approaches [19]. It

further has served as basis for representing our method.

A trace ton process model Sdenotes a valid execution sequence t≡<

a1, a2, . . . , ak>of activities ai∈Non Saccording to the control flow de-

fined by S. All traces process model Scan produce are summarized in trace

Table 1. Examples of High-Level Change Operations

Change Operation ∆on S opType subject paramList

insert(S, X, A,B, [sc]) insert X S, A,B, [sc]

Effects on S: inserts activity X between activity sets Aand B. It is a conditional insert

if [sc] is specified (i.e. [sc] = XOR)

delete(S, X, [sc]) delete X S, [sc]

Effects on S: deletes activity X from S, i.e. X turns into a silent one. [sc] is specified ([sc] =

XOR) when we block the branch with X, i.e. the branch which contains X will not be activated

move(S, X, A,B, [sc]) move X S, A,B, [sc]

Effects on S: moves activity X from its original position in Sto another position between

activity sets A and B. (it is a conditional insert if [sc] is specified)

replace(S, X, Y) replace X Y

Effects on S: replaces activity X by activity Y

set TS.t(a≺b) is denoted as precedence relation between activities aand bin

trace t≡< a1, a2, . . . , ak>iff ∃i < j :ai=a∧aj=b. Here, we only consider

traces composing ’real’ activities, but no events related to silent activities i.e.,

activity nodes which contain no operation and exist only for control flow pur-

pose, see Section 4.4. Finally, we will consider two process models as being the

same if they are trace equivalent, i.e. S≡S0iff TS≡ TS0. The stronger notion

of bi-similarity [5] is not required in our context.

3 High-level Change Operations

3.1 Complementary Nature of Change and Execution Logs

Most PAISs support ad-hoc deviations at instance level and record them in

change logs. Thus, they provide additional information when compared to tra-

ditional PAISs which only record execution logs (which typically document the

start and/or end time of activities). Change logs and execution logs document

different run-time information on adaptive process instances and are not inter-

changeable. Even if the original process model is given, it will be not possible to

convert the change log of a process instance to its execution log or vice-verse. As

example, take the original and simplified patient treatment process as depicted in

Fig. 1a: a patient is admitted to a hospital, where he first registers, then receives

treatment, and finally pays. Assume that, due to an emergency situation, for one

particular patient, we want to first start the treatment of this patient and allow

him to register later during treatment. To represent this exceptional situation in

the process model of the respective instance, the needed change would be to move

activity receive treatment from its current position to a position parallel to activ-

ity register. This change leads to a new model S0, i.e., S[σiS0with σ=< move(S,

receive

treatment

Admitted

a) S: original process model b) S’: final execution & change

receive

treatment

pay

AND-Split

AND-Join

admitted

∆=Move (S, register, admitted, pay)

S[∆>S’

e=<admitted, receive treatment, register, pay>

Fig. 1. Change Log and Execution Log are not Interchangeable

delEdge(StartFlow,A); delEdge(A,B);

delEdge(B,C); addEdge(B,A);

addEdge(A,C); addEdge(StartFlow,B)

delEdge(A,B); delEdge(B,C); delEdge(B,D);

delEdge(C, τ); delEdge(D,τ); delEdge(t,E);

delEdge(τ, F}; delNode(τ); addEdge(A,C);

addEdge(C,B); addEdge(B,D); addEdge(D,E);

addEdge(D,F); updateNodeType(D,

XorSplit); updateNodeType(B, empty);

A C

Change Primitives

∆

=Move (S, C, A, B)

model after change ∆

∆

=Move (S, A, B, C)

S[∆

original process model

model after change ∆

AND-Split

AND-Join

XOR-Split

XOR-Join

Snapshot difference

Fig. 2. High-level Change Operation and Corresponding change primitive

reveive treatment,admitted,pay)>. Meanwhile, the execution log efor this par-

ticular instance can be e=<admitted,receive treatment,register, pay >(cf. Fig.

1b). If we only have process model Sand its execution log, it will be not possi-

ble to determine this change because the process model which can produce such

execution log is not unique. For example, a process model with the four activi-

ties contained in four parallel branches could produce this execution log as well.

On the contrary, it is generally not possible to derive the execution log from a

change log, because execution behavior of S0is also not unique. For example, a

trace <admitted,register,receive treatment,pay >is also producible on S0as

well. Consequently, change logs provide additional information when compared

to pure execution logs.

3.2 Why Do We Need High-level Change Operations?

After showing the importance of change logs, we now discuss why we need high-

level change operations rather than change primitives (i.e., low-level changes at

edge and node level). Left side of Fig. 2 shows original process model Swhich

consists of a parallel branching, a conditional branching, and a silent activity τ

(depicted as empty node) connecting these two blocks. Assume that two different

high-level change operations are applied to Sresulting in models S1and S2:∆1

moves activity Cfrom its current location to the position between activities A

and B, which leads to S1i.e., S[∆1iS1with ∆1=move(S, C,A,B). ∆2moves A

to the position between Band C, i.e. S[∆2iS2with ∆2=move(S, A,B,C). Fig. 2

additionally depicts the change primitives representing snapshot differences be-

tween Sand models S1and S2, respectively. Using high-level change operations

offers the following advantages:

1. High-level change operations guarantee soundness: i.e., application of a high-

level change operation to a sound model Sresults in another sound model

S0[11]. This also applies to our example from Fig. 2. By contrast, when

applying one single change primitive (e.g., deleting an edge in S) soundness

cannot be guaranteed anymore. Generally, if we delete any of the edges in

S, the resulting process model will not be necessarily sound.

2. High-level change operations provide richer syntactical meanings than change

primitives. Generally, a high-level change operation is built upon a set of

change primitives which collectively represent a complex modification of a

process model. As example take ∆1from Fig. 2. This high-level change oper-

ation requires 15 change primitives for its realization (deleting edges, adding

edges, deleting the silent activity, and updating the node types).

3. An important aspect, not discussed so far, concerns the number of change

operations needed to transform model Sinto target model S0. For example,

we need only one move operation to transform Sto either S1or S2. However,

when using change primitives, migrating Sto S1necessitates 15 change

primitives, while the second change ∆2can be realized based on 6change

primitives. This example also shows that change primitives do not provide

an adequate means to determine the difference between two process models.

Thus the required number of change primitives cannot represent the efforts

for process model transformations.

3.3 The Challenge to Derive High-level Change Operations

After sketching the benefits coming with high-level change operations, this sec-

tion discusses challenges of deriving them. When comparing two process models,

the change primitives needed for transforming one model into another can be

easily determined by performing two snapshots and a delta analysis on them [7].

An algorithm to minimize the number of change primitives is given in [12]. How-

ever, when trying to derive the high-level change operations needed for model

transformation, several challenges occur. As example consider Fig. 3:

1. When performing two delete operations on S(i.e., ∆1=delete(S, B) and

∆2=delete(S, C)), we obtain a new model S00 (i.e., S[σiS00 with σ=<

∆1, ∆2>), as well as an undetectable intermediate model S0with S[∆1iS0

and S0[∆2iS00. When examining the change primitives corresponding to each

high-level change operation, we first need to add edge (A,C) after the first

delete operation ∆1, and remove this edge (A,C) when applying the sec-

ond delete operation ∆2. However, when performing a delta analysis for

the original process model Sand the resulting process model S00, the two

change primitives (addEdge(A,C) introduced by the first delete operation and

delEdge (A,C) introduced by the second one) jointly have no effect on the

resulting process model S00, i.e., they cannot be detected by snapshot anal-

ysis. Consequently, deriving high-level change operations based on change

primitives would be challenging because the change primitives required for

every high-level change do not always appear in the snapshot differences be-

tween the original and resulting models. In Fig. 3, none of the two change

primitive sets associated with ∆1or ∆2constitute a sub-set of the change

primitive set associated with σ.

2. Even if there is just one high-level change operation, it will remain difficult

to derive it with delta algorithm. For example, in Fig. 3 the delta algorithm

shows that 15 change primitives are needed to transform Sinto S1. However,

the depicted changes can be also realized by just applying one high level move

operation to S.

S’(∆

>S’’

S(∆

>S’

∆

= Delete (S, B)

∆

= Delete (S’, C)

delEdge(A,B), delEdge(B,C),

addEdge(A,C)

, delNode(B)

delEdge(C,D), addEdge(A,D),

delEdge(A,C)

delNode(C)

S(σ>S’’

σ =< Delete (S, B),

Delete (S, C) >

delEdge(A,B),

delEdge(B,C),

delEdge(C,D),

addEdge(A,D),

delNode(B)

delNode(C)

S’’ A D

S’ AC D

SAB C D

Fig. 3. Non-detectable Change Primitives

4 Detecting the Minimal Number of High-level Changes

In this section, we introduce our method to detect the minimal number of change

operations needed to transform a given process model Sinto another model S0.

As example, consider the process models Sand S0in Fig. 4.

4.1 General Description of our Method

As mentioned in Section 1, the key issue of our work is to minimize the number

of change operations needed to transform a process model S= (N, E, . . .)∈ P

into another model S0= (N0, E0, . . .)∈ P. Generally, three steps are needed (cf.

Fig. 4) to realize this minimal transformation:

1. ∀ai∈N\N0:delete all activities being present in S, but not in S0. This first

step transforms Sto Ssame (cf. Fig. 4b).

2. ∀ai∈NTN0:move all activities being present in both models to the loca-

tions as reflected by S0. Regarding our example, this second step transforms

Ssame to S0

same (cf. Fig. 4c).

3. ∀ai∈N0\N:insert those activities being present in S0, but not in S. As

depicted in Fig. 4, the third step transforms S0

same to S0(cf. Fig. 4d).

Insertions and deletions deal with changes of the set of activities. Here, we can

hardly do anything to reduce efforts (i.e., the number of required insert/delete

operations): New activities (ai∈N0\N) must be added and obsolete activities

(aj∈N\N0) must be deleted.

The focus of minimality can therefore be shifted to the use of the move opera-

tion, which changes the structure of a process model, but not its set of activities.

Since a move operation logically corresponds to a delete followed by an insert op-

eration, we can transform Ssame to S0

same by maximally applying n=|NTN0|

move operations. Reason is that nmove operations correspond to deleting all

activities and then re-inserting them at their new positions. Correspondingly, n

is the maximal number of change operations needed to transform one process

model into another, both with same set of activities (Ssame and S0

same in our

example from Fig. 4). To measure the complete transformation from Sto S0, we

formally define process distance and process similarity as follows:

Definition 2 (Process Distance and Process Similarity). Let S= (N, E, . . .),

S0= (N0, E0, . . .)∈ P be two process models. Let further σ=h∆1, ∆2, . . . ∆ni ∈

C∗be a sequence of change operations transforming Sinto S0(i.e. S[σiS0). Then

: original process model

S’

: destination process model

same

: original model with shared activities

S’

same

: destination model with shared activities

Transform

S to S’

Step2: move

b) c)

Fig. 4. Three Steps to Transform Sinto S0

the distance between Sand S0is given by d(S,S0)=min{|σ| |σ∈ C∗∧S[σiS0}.

Furthermore, process similarity between Sand S0equals to 1−d(S,S0)

|N|+|N0|−|NTN0|,

i.e., similarity equals to ((maximal number of changes - minimal number of

changes) / maximal number of changes).

4.2 Determining Required Activity Deletions and Insertions

To accomplish Step 1 and Step 3 of our method, we have to deal with the

change of the activity set when transforming Sinto S0. It can be easily detected

by applying existing snapshot algorithms [7] to both Sand S0. As described in

Section 4.1, as first step we need to delete all activities ai∈N\N0contained in

S, but not in S0. Regarding our example from Fig. 4, we can derive as our first

high-level change operation ∆1=delete(S, X). Similarly, activities contained in

S0, but not in S, are inserted in Step 3, after having moved the shared activities to

their respective position in S0(S0

same respectively). The parameters of the insert

operation, i.e. the predecessors and successors of the inserted activity, are just

like how they appear in S0. In this way, we obtain the last two change operations

for our example: Insert(S, Y,StartFlow,{A,B}) and Insert(S, Z,D,E).

4.3 Determining Required Move Operations

We now focus on Step 2 of our method; i.e., to transform two process models

with same activity set using move operations. Here, we can ignore the activities

not contained in both Sand S0(cf. 4.2). Instead, we consider the two process

models Ssame and S0

same respectively, as depicted in Fig. 4.

Determine the Order Matrix of a Process Model One key feature of our

ADEPT change framework is to maintain the structure of the unchanged parts

of a process model [11]. For example, if we delete an activity, this will neither

influence the successors nor the predecessors of this activity, and also not their

control relation. To incorporate this feature in our approach, rather than only

looking at direct predecessor-successor relationships between two activities (i.e.

control flow edges), we consider the transitive control dependencies between all

pairs of activities; i.e., for every pair of activities ai, aj∈NTN0,ai6=aj,

their execution order compared to each other is examined. Logically, we check

execution orders by considering all traces a process model can produce (cf. Sec.

2). Results can be formally described in a matrix An×nwith n=|NTN0|. Four

types of control relations can be identified (cf. Def. 3):

Definition 3 (Order matrix). Let S= (N, E, . . .)∈ P be a process model

with N={a1, a2, . . . , an}. Let further TSdenote the set of all traces producible

on S. Then: Matrix An×nis called order matrix of Swith Aij representing

the relation between different activities ai,aj∈Niff:

–Aij = ’1’ iff (∀t∈ TSwith ai, aj∈t⇒t(ai≺aj))

If for all traces containing activities aiand aj,aialways appears BEFORE

aj, we denote Aij as ’1’, i.e., aiis predecessor of ajin the flow of control.

–Aij = ’0’ iff (∀t∈ TSwith ai, aj∈t⇒t(aj≺ai))

If for all traces containing activity aiand aj,aialways appears AFTER aj,

then we denote Aij as a ’0’, i.e. aiis successor of ajin the flow of control.

–Aij = ’*’ iff (∃t1∈ TS, with ai, aj∈t1∧t1(ai≺aj))∧(∃t2∈ TS, with

ai, aj∈t2∧t2(aj≺ai))

If there exists at least one trace in which aiappears before ajand at least

one other trace in which aiappears after aj, we denote Aij as ’*’, i.e. ai

and ajare contained in different parallel branches.

–Aij = ’-’ iff ( ¬∃t∈ TS:ai∈t∧aj∈t)

If there is no trace containing both activity aiand aj, we denote Aij as ’-’,

i.e. aiand ajare contained in different branches of a conditional branching.

We revisit our example from Fig. 4. The order matrices of Ssame and S0

same

are shown in Fig. 5. The main diagonal is empty since we do not compare an

activity with itself. As one can see, elements Aij and Aji can be derived from

each other. If activity aiis a predecessor of activity aj(i.e. Aij = 1), we can

always conclude that Aji = 0 holds. Similarly, if Aij ∈ {’*’,’-’}, we will obtain

Aji =Aij . As a consequence, we can simplify our problem by only considering

the upper triangular matrix A= (Aij)j>i.

Execution order Matrix S

same

Execution order Matrix S’

same

A B

The first group of

activities and conflicts

C D E F

The second group of

activities and conflicts

Fig. 5. Order Matrices of Ssame and S0

same from Fig. 4

Under certain constraints, an order matrix Acan uniquely represent the

process model, based on which it was built on. This is stated by Theorem 1.

Before giving this theorem, we need to define the notion of substring of trace:

Definition 4 (Substring of trace). Let tand t0be two traces. We define tis

a sub-string of t0iff [∀ai,aj∈t,t(ai≺aj)⇒ai,aj∈t0∧t0(ai≺aj)] and

[∃ak∈N:ak/∈t∧ak∈t0].

Theorem 1. Let S, S0∈ P be two process models, with same set of activities

N={a1, a2, . . . , an}. Let further TS,TS0be the related trace sets and An×n,

n×nbe the order matrices of Sand S0. Then S6=S0⇔A6=A0, if (¬∃t1, t0

1∈

TS:t1is a substring of t0

1) and (¬∃t2, t0

2∈ TS0:t2is a substring of t0

2).

According to Theorem 1, there will be a one-to-one mapping between a pro-

cess model Sand its order matrix A, if the substring constraint is met. A proof of

Theorem 1 can be found in [8]. A detailed discussion of the sub-string restriction

is given in Section 4.4. Thus, when comparing two process models, it is sufficient

to compare their order matrices (cf. Def. 3), since a order matrix can uniquely

represent the process model. This also means that the differences of two process

models can be related to the differences of their order matrices. If two activities

have different execution order in two process models, we will define the notion

of conflict as follows:

Definition 5 (Conflict). Let S, S0∈ P be two process models with same set of

activities N. Let further Aand A0be the order matrices for Sand S0respectively.

Then: Activities aiand ajare conflicting iff Aij 6=A0

ij. We formally denote this

as C(ai,aj).CF := {C(ai,aj)|Aij 6=A0

ij}then corresponds to the set of all

existing conflicts.

Fig. 5 marks up differences between the two order matrices in grey. The set of

conflicts is as follows: CF ={C(A,B),C(C,D),C(C,F),C(D,E),C(D,F),C(E,F)}.

Optimizing the Conflicts To come from Ssame to S0

same (c.f. Fig. 4), we have

to eliminate conflicts between these two models by applying move operations.

Obviously, if there is no conflict for the two models, they will be identical. Every

time we move an activity from its current position in Ssame to the position it has

in S0

same, we can eliminate the conflicts this activity has with other activities.

For example, consider activity Ain Fig. 4. If we move Afrom its position in Ssame

(preceding B) to its new position in S0

same (Aand Bare contained in two different

branches of a conditional branching block), we can eliminate conflict C(A,B). As

shown in the order matrices, moving Arequires two steps. First, set the elements

in the first row and first column of An×n(which corresponds to activity A) to

empty, since Ais moved away. Second, reset these elements according to the new

order relation of A, when compared to the other activities from S0

same. So every

time we move an activity, we are able to change the value of its corresponding

row and column in the order matrices, i.e., we change these values corresponding

to the original model to the values compliant with the target model. By doing

this iteratively, we can change all the values and eliminate all the conflicts so

that we finally achieve the transformation from Ssame to S0

same.

A non-optimal solution would be to move all the activities involved in the

conflicts as set out by CF, from their positions in Ssame to the positions they

have in S0

same. Regarding our example from Fig. 5, to apply this straightforward

method, we would need to move activities A,B,C,D,Eand Ffrom their positions

in Ssame to the ones in S0

same. However, this naive method is not in line with our

goal to minimize the number of applied change operations. For example, after

moving activity Afrom its current position in Ssame to the position it has in

same, we do not need to move activity Banymore, because after applying this

change operation, there are no activities with which activity Bstill has conflicts.

Digital logic in Boolean algebra [14] helps to solve this minimization prob-

lem. Digital logic constitutes the basis for digital electronic circuit design and

optimization. In this field, engineers face the challenge to optimize the internal

circuit design given the required input and output signals. To apply such tech-

nique in our context, we consider each process activity as an independent input

signal and we want to design a circuit which can cover all conflicts defined by CF

(cf. Def 5). If activity aiconflicts to activity aj, we can either move one of them

or both of them from the positions they have in Ssame to the ones they have in

same. Doing so, the conflict will not exist any more. Reason is that every time

we move an activity from the position it has in Ssame to the position it has in

same, we reset the corresponding row and column of this activity in the order

matrix. A conflict can be interpreted as a digital signal: When the two input

signals aiand ajare both ”true”(this means we do not move activity aiand

aj), we cannot solve the conflict and the ’circuit’ shall give an output signal of

”false”. If we apply this to all conflicts in CF, we will obtain all ”false” signals.

Meanwhile, the ”circuit” should be able to tell us what will result in a ”true”

output (i.e., the negative of all ”false” signals). This ”true” output represents

which activities we need to move. Regarding our example from Fig. 5, given the

set of conflicts CF, our logic expression then is: AB +CD +CF +DE +DF +EF.

The complexity for optimizing the logic expression is NP-Hard [14]. Therefore

it is advantageous to reduce the size of the problem. Concerning our example,

we can cut down the optimization problem into two groups: one with activities

Aand B, and conflict C(A,B); another one with activities C,D,Eand F, and

the following set of conflicts {C(C,D),C(C,F),C(D,E),C(D,F),C(E,F)}. Such a

division can be achieved in O(n) time in the following three steps. Step 1: List

all conflicting activities, and set every activity as a group. Step 2: If conflicting

activities aiand aj(i.e., C(ai,aj)) are contained in two different groups, merge

the two groups. Step 3: Repeat Step 2 for all conflicts in CF. After these three

steps, we can divide the activities as well as the associated conflicts into several

groups. Regarding our example, the optimization problem can be divided into

two sub-optimization problems: AB and CD +CF +DE +DF +EF. We depict this

by the two small matrices in Fig. 5.

Optimizing logic expressions has been intensively discussed in Discrete Math-

ematics. Therefore we omit details here and refer to Karnaugh map [14] and

same

S’

same

(

{

}

{

}

)

move(S,B, Startflow, D, XOR)

(

)

b) c)

A B

Fig. 6. Process Models After Every Move Operation

Quine-McCluskey algorithm [14]. We have implemented the latter in our proof-

of-concept prototype. Regarding our example in Fig. 4, the two optimization

results are AB =¯

A+¯

Bfor the first group and CD +CF +DE +DF +EF =¯

D¯

F+

C¯

E¯

F+¯

C¯

D¯

Efor the second group. We can interpret this result as follows. For the

second group, either we move activities Dand F, or we move activities C,Eand

F, or we move activities C,Dand Efrom their position in Ssame to the positions

they have in S0

same. Based on this we can transform Ssame into S0

same since all

conflicts are eliminated. As can be seen from the order matrices, if we change

the value of the corresponding rows and columns of these activities in Ssame, we

can turn Ssame into S0

same. Since we want to minimize the number of change

operations, we can draw the conclusion that activities Dand Fmust be moved.

Same rule applies to the result of the first group. However, there is no differ-

ence whether to move either Aor Bsince both operations count as one change

operation. Here, we arbitrarily decide to move activity B.

So far we have determined the set of activities to be moved. The next step is to

determine the positions where these activities need to be moved to. Operation

move(S, X,A,B,[sc]) will be independent from other move operations (i.e., it

does not matter in which order to move the respective activity) if its direct

predecessors Aand direct successors Bdo not belong to the set of activities to

be moved. Regarding our example from Fig. 4, activity Fsatisfies this condition

since its predecessor Cand successor Gare not moved. If this had not been the

case, we would have to introduce silent activities to put the moved activity to

its corresponding place in S0

same. For example, if we want to first move Bto its

position in S0

same, we will have to introduce a silent activity after Band before

Cand E. Only in this way, we can change the execution order of Bto what it

appears in S0

same. However, such silent activity will be not required if we first

move activity Dto the position it has in S0

same. A detailed discussion can be

found in [11].

According to the position the moved activities have in S0

same, we can deter-

mine the parameters (i.e., the predecessors, successors and conditions) for every

move operation. In S0

same, activity Dhas predecessors Aand B, and successors E

and C. So one move operations therefore is move(S, D,{B,A},{C,E}). Similarly,

we obtain the other two move operations: move(S, B,StartFlow,D,XOR) and

move(S, F,C,G). The intermediate process models resulting after every move op-

eration are shown in Fig. 6. When comparing order matrices for each model in

= {ABC, AC}

= {ABC}

A B C A C

A C

= {AC}

A C

= {ABC, ADC}

Insert(S

, τ, A, C,

XOR

)

(

)

delete(S

, D)

replace(S

, τ, D)

(

)

(

)

Insert(S

,B,A,C)

Fig. 7. The Influence of Silent Activity

Fig. 6, it becomes clear that every move operation changes the values of the row

and the column corresponding to the moved activity.

4.4 Coping with Silent Activities

A silent activity is an activity which does not contain any operation or action,

and which only exists for control flow purpose. There are two reasons why we

do not consider silent activities in our similarity measure:

1. The appearance of a silent activity can be random. We can add or remove

silent activities without changing the behavior of a process model, e.g., we

can replace a control flow edge in a process model by one silent activity or

even a block of silent activities without influencing process model behavior.

2. The existence of a silent activity also depends on other activities and is

subject to change as other activities change. As example consider Fig. 2.

When applying change ∆1to S, the silent activity τis automatically removed

after activity Cis moved away.

There is one exception for which we need to consider silent activities. Consider

the two process models S1and S2in Fig. 7. If we ignore the silent activity τ

(depicted as an empty node) in S2, and derive the order matrix of S2, it will be

the same as the one of S1. Obviously, the two process models are not equivalent

since the trace sets producible by them are not identical. More precisely, TS2

contains one additional trace when compared to TS1. In general, if one process

model can produce additional traces, which are the sub-string of other traces (cf.

Def.4), there must be some silent activities we cannot ignore. Or if the direct

predecessor and direct successor of one silent activity constitute an XORsplit

and XORjoin, we can also not ignore this silent activity (cf. S2in Fig. 7).

Fig. 7 shows several process model transformations based on high-level change

operations. Here we can identify the difference between the two types of dele-

tion: delete(S4,D) and delete(S4,D,XOR) (cf. Fig. 7). The former one turns an

activity into a silent one (transforming S4into S2), while the latter one blocks

the branch which contains activity D(transforming S4to S1). When a branch

is blocked, we do not allow the activities of the branch to become activated [16,

11]. Since process models S1and S2have same order matrix, purely compar-

ing order matrices (cf. Sect. 4) would not be sufficient in the given situation.

Reason is that here the order matrix does not uniquely represent the process

model, since the sub-string constraint (cf. Def.4) of Theorem 1 is violated. To

Figure 2 Figure 4

SS1S2SSsame S'same S'

S 0 / 100% 1 / 86% 1 / 86% 4 / 50% 3 / 57% 3 / 57% 5 / 44%

Figure 2 S10 / 100% 2 / 71 % 4 / 50% 3 / 57% 3 / 57% 5 / 44%

S20 / 100% 5 / 38% 4 / 42% 3 / 57% 5 / 44%

S 0 / 100% 1 / 88% 4 / 50 % 6 / 40%

Figure 4 Ssame 0 / 100% 3 / 57% 5 / 44%

S'same 0 / 100% 2 / 78 %

S' 0 / 100%

Fig. 8. Distances and Similarities of Different Process Models

extend our method such that it can uniquely represent a process model without

the sub-string constraint, we must consider these special silent activities (i.e., a

silent activity which is direct predecessor of an XORsplit and direct successor of

an XORjoin) as well. They will appear in the order matrix and their execution

orders compared with other activities will be documented.

However, the existence of a silent activity is still very much dependent on

other activities, including the scenario described above. For example, if we delete

Bin S2as depicted in Fig. 7, we will transform S2into S3, i.e., the silent activity

will be simultaneously deleted when Bis deleted. We can identify this situation

by either examining the process model or the order matrix. In the process model,

a silent activity τcan be automatically deleted if there is another silent activity

τ0contained in the same block, but in another conditional branch (e.g., trans-

forming S2to S3). In the order matrix, we can automatically remove a silent

activity τif there is another silent activity τ0with same order relations to the

rest of the activities as τhas.

In general, if a silent activity has an XORsplit as direct predecessor and an

XORjoin as direct successor, we need to consider it when computing the order

matrix of a process model. However, these silent activities can automatically be

deleted when changing the process model. This requires us to perform additional

checks on the process model or order matrix (as described above) after every

change operation.

4.5 Summary

Taking our example from Fig. 4 (i.e., to transform Sinto S0), the following

six change operations are required: σ={delete(S, X), move(S, F,C,G), move(S,

D,{A,B},{C;E}), move(S, B,StartFlow,D,XOR), insert(S, Y,StartFlow,{A,B}),

and insert(S, Z,D,E)}. Distance between the two models is six and similarity

is 0.4 (cf. Def.2). To illustrate our method and these numbers in more detail,

we compare the distances and similarities between the seven process models dis-

cussed so far: S,S1and S2from Fig. 2 and S,Ssame,S0

same and S0from Fig.

4. Distance and similarity of two models are specified as distance/similarity in

each corresponding cell in Fig. 8. As the transformation is commutable, we only

fill in the upper triangle matrix. Taking Fig. 8, we can conclude:

1. Changing the activity set always leads to a modified distance. For example,

d(Sn,S0

same)always equals d(Sn,S0)+ 2, where Snstands for a process model

other than S0or S0

same in Fig. 8. Reason is that S0contains two unique

activities Yand Zwhen compared to S0

same, while the rest are identical.

2. If three process models S,S0, and S00 have same activity sets, we will ob-

tain d(S,S00 )≤d(S,S0)+d(S0,S00 ). It is easy to understand this because some

activities could be moved twice when transforming Sinto S0and S0into S00.

5 Related Work

Various papers have studied the process similarity problem and provided use-

ful results [17, 16, 21, 2]. In graph theory, graph isomorphism[15] and sub-graph

isomorphism [15] are used to measure similarity between two graphs. Unfortu-

nately, these measures usually only examine edges and nodes and cannot catch

the syntactical issues of a PAIS (e.g., guarantee soundness of a process model,

differentiate AND-Split and XOR-Split, and handle silent activities). Algorithms

for measuring tree edit distances [2] shows similar disadvantages, i.e., syntacti-

cal issues of a PAIS are missing. In the database field, the delta-algorithm [7]

is used to measure the difference between the two models. It extends the above

mentioned approaches by assigning attributes to edges and nodes [12]. Still, it

can only catch change primitives, and will further run into problems when con-

sidering high-level change operations. Regarding Petri-nets and state automata,

similarity based on change is difficult to measure since these formalisms are not

very tolerant for changes. Inheritance rules [16] are one of the very few tech-

niques showing the transformation of a process model described as Petri-net.

Trace equivalence is commonly used to compare whether two process models are

similar or identical [5]. In addition, bisimulation [16, 18] extends trace equiva-

lence by considering stronger notions. Also based on traces, [17] assign weights

to each trace based on execution logs which reflect the importance of a cer-

tain trace. The edit distance [21] is also used to measure the difference between

traces; the sum of them represents the differences of two models. Some similar-

ity measures use two numbers (precision and recall) to evaluate the difference

between process models S1and S2[17, 10]. None of these approaches measures

similarity by a unique and commutative number, based on the effort for process

transformation.

6 Summary and Outlook

We have provided a method to quantitatively measure the distance and similarity

between two process models based on the efforts for model transformation. High-

level change operations are used to evaluate the similarity since they guarantee

soundness and also provide more meaningful results. We further applied digital

logic in boolean algebra so that the number of change operations required to

transform process model Sinto process model S0becomes minimal. Respective

distance and similarity measures have already been applied in the filed of process

mining [9].

Additional work is needed to enrich our knowledge on process similarity. As a

first step, we will extend our method so that it is able to measure the similarity

between process models with additional constructs (e.g., loopbacks [11]) and

data flows. The next step will be to enrich the model with semantic relations

between activities and to give weight for each change operation, so that the

similarity measure can be further applied to practice.

References

1. P. Balabko, A. Wegmann, A. Ruppen, and N. Cl´ement. Capturing design rationale

with functional decomposition of roles in business processes modeling. Software

Process: Improvement and Practice, 10(4):379–392, 2005.

2. P. Bille. A survey on tree edit distance and related problems. Theor. Comput.

Sci., 337(1-3):217–239, 2005.

3. M. Dumas, W.M.P. van der Aalst, and A.H.M. ter Hofstede. Process-Aware Infor-

mation Systems. Wiley & Sons, 2005.

4. C.W. G¨unther, S. Rinderle, M. Reichert, and W. M. P. van der Aalst. Change

mining in adaptive process management systems. In CoopIS’06, LNCS4275, pages

309–326, 2006.

5. J. Hidders, M. Dumas, W.M.P. van der Aalst, A.H.M. ter Hofstede, and J. Verelst.

When are two workflows the same? In CATS ’05:, pages 3–11, Darlinghurst, Aus-

tralia, Australia, 2005. Australian Computer Society, Inc.

6. M. Kradolfer and A. Geppert. Dynamic workflow schema evolution based on

workflow type versioning and workflow migration. In COOPIS ’99, page 104,

Washington, DC, USA, 1999. IEEE Computer Society.

7. W. Labio and H. Garcia-Molina. Efficient snapshot differential algorithms for data

warehousing. In VLDB ’96, pages 63–74, San Francisco, CA, USA, 1996.

8. C. Li, M. Reichert, and A. Wombacher. On measuring process model similarity

based on high-level change operations. Technical Report TR-CTIT-07-89, Univer-

sity of Twente, 2007.

9. C. Li, M. Reichert, and A. Wombacher. Discovering reference process models by

mining process variants. In ICWS’08, to appear, 2008.

10. S.S. Pinter and M. Golani. Discovering workflow models from activities’ lifespans.

Comput. Ind., 53(3):283–296, 2004.

11. M. Reichert and P. Dadam. ADEPTflex -supporting dynamic changes of workflows

without losing control. Journal of Intelligent Info. Sys., 10(2):93–129, 1998.

12. S. Rinderle, M. Jurisch, and M. Reichert. On deriving net change information from

change logs - the deltalayer-algorithm. In BTW, pages 364–381, 2007.

13. S. Rinderle, M. Reichert, and P. Dadam. Correctness criteria for dynamic changes

in workflow systems: a survey. Data Knowl. Eng., 50(1):9–34, 2004.

14. S.Brown and Z.Vranesic. Fundamentals of Digital Logic with Verilog Design.

McGraw-Hill, 2003.

15. P.N. Tan, M. Steinbach, and V. Kumar. Introduction to Data Mining. Addison-

Wesley, 2005.

16. W.M.P. van der Aalst and T. Basten. Inheritance of workflows: an approach to

tackling problems related to change. Theor. Comput. Sci., 270(1-2):125–203, uary.

17. W.M.P. van der Aalst, A.K.A. de Medeiros, and A.J.M.M. Weijters. Process equiv-

alence: Comparing two process models based on observed behavior. In Business

Process Management (BPM’06), pages 129–144, 2006.

18. R.J. van Glabbeek and W.P. Weijland. Branching time and abstraction in bisim-

ulation semantics. J. ACM, 43(3):555–600, 1996.

19. B. Weber, S. Rinderle, and M. Reichert. Change patterns and change support

features in process-aware information systems. In CAiSE, pages 574–588, 2007.

20. M. Weske. Formal foundation and conceptual design of dynamic adaptations in a

workflow management system. In HICSS ’01, page 7051, Washington, DC, 2001.

21. A. Wombacher and M. Rozie. Evaluation of workflow similarity measures in service

discovery. In Service Oriented Electronic Commerce, pages 51–71, 2006.