Workflow Management versus Case Handling: Results from a Controlled Software Experiment

Bela Mutschler

Daimler AG

Group Research

Ulm, Germany

bela.mutschler@daimler.com

Barbara Weber

Dept. of Computer Science

University of Innsbruck

Austria

barbara.weber@uibk.ac.at

Manfred Reichert

Information Systems Group

University of Twente

The Netherlands

m.u.reichert@utwente.nl

ABSTRACT

Business Process Management (BPM) technology has become an important instrument for improving process performance. When considering its use, however, enterprises typically have to rely on vendor promises or qualitative reports. What is still missing, and what IT decision makers also demand, are quantitative evaluations based on empirical and experimental research. This paper picks up this demand and illustrates how experimental research can be applied in the BPM field. The conducted experiment compares the efforts for implementing a sample business process based either on standard workflow technology or on a case handling system. We motivate and describe the experiment design, discuss threats to the validity of the experiment results (as well as risk mitigations), and present the experiment results. In general, more experimental research is needed in order to obtain more valid data on the various aspects and effects of BPM technology and tools.

Categories and Subject Descriptors

H.4 [Information Systems Applications]: Office Au-

tomation—Workflow Management, Case Handling

1. INTRODUCTION

Providing effective IT support for business processes has

become crucial for enterprises to stay competitive in their

market [1]. In response to this need numerous process sup-

port paradigms (e.g., workflow management, service flow

management, case handling), process specification standards

(e.g., WS-BPEL, BPML), and BPM tools (e.g., ARIS Toolset,

Tibco Staffware, FLOWer) have emerged [4].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

SAC'08 March 16-20, 2008, Fortaleza, Ceará, Brazil

When evaluating suitability of existing BPM technology for a particular project or when arguing about its strengths and weaknesses, typically, it becomes necessary to rely on qualitative criteria. As one example consider workflow patterns [19], which can be used to evaluate the expressiveness

of the workflow modeling language provided by a particu-

lar BPM tool. As another example consider process change

patterns [22]. What has been neglected so far are more pro-

found evaluations of BPM technology based on empirical

or experimental research. This is surprising as the benefits

of these research methods have been demonstrated in ar-

eas like software engineering (e.g., in the context of software

development processes or code reviews [10, 7]) for a long

time [17]. From the introduction of experimental research

to BPM as well as to the development of process-aware in-

formation systems, we expect more valid, quantitative data

on costs and benefits of BPM technology. This, in turn, be-

comes increasingly important for IT managers and project

leaders [8].

Picking up this demand, this paper illustrates how experi-

mental research can be applied in the BPM context. For this

purpose we have conducted a controlled software experiment

with 48 participants. As an example, this experiment investigates the efforts related to the implementation and change of business processes using either a conventional workflow system [20] or case handling technology [21]. More precisely, we have used Tibco Staffware [18] as a representative of workflow technology and FLOWer [2] as a representative of case handling systems. We describe our experiment design, give a mathematical model of the experiment, and discuss potential threats to the validity of the experiment results. The results of our experiment help to better understand the complex efforts caused by using BPM technology.

Section 2 motivates the need for experimentation in BPM

and provides background information needed for understand-

ing our experiment. Section 3 describes our experimental

framework. Section 4 deals with the performance and re-

sults of our experiment. Finally, Section 5 discusses related

work and Section 6 concludes with a summary.

2. BACKGROUND

Assume that a business process for refunding travel expenses is to be supported by a Process-Aware Information System (PAIS) realized on top of BPM technology. This eTravel business process distinguishes between four roles (cf. Fig. 1). The traveler initiates the refunding of his expenses. For this purpose, he has to summarize the travel data in a travel expense report. This report is then forwarded either to a travel expense responsible (in case of a national business trip) or to a verification center (in case of an international business trip). Both the travel expense responsible and the verification center fulfill the same task, i.e., they verify a received travel expense report. "Verification" means that the declared travel data is checked for correctness and plausibility (e.g., regarding accordance with receipts). An incorrect travel expense report will be sent back to the traveler (for correction). If it is correct, it will be forwarded to the travel supervisor for final approval. The supervisor role may be filled, for example, by the line manager of the traveler. If a travel expense report is approved by the supervisor, the refunding will be initiated. Otherwise, it will be sent back to either the travel expense responsible (national trip) or the verification center (international trip). Note that this is a characteristic (yet simplified) process as it can be found in many organizations.
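The routing rules of the eTravel process described above can be sketched in a few lines of Python. This is only an illustration of the process logic, not the Staffware or FLOWer implementation used in the experiment; the names `Role`, `verifier_for`, and `next_step` as well as the string `"refund"` are our own hypothetical choices.

```python
from enum import Enum


class Role(Enum):
    TRAVELER = "traveler"
    EXPENSE_RESPONSIBLE = "travel expense responsible"
    VERIFICATION_CENTER = "verification center"
    SUPERVISOR = "travel supervisor"


def verifier_for(international: bool) -> Role:
    """National trips are verified by the travel expense responsible,
    international trips by the verification center."""
    return Role.VERIFICATION_CENTER if international else Role.EXPENSE_RESPONSIBLE


def next_step(current: Role, international: bool, accepted: bool = True):
    """Return the role that handles the report next, or "refund"
    once the supervisor has approved the report."""
    verifier = verifier_for(international)
    if current is Role.TRAVELER:
        return verifier  # report created, forward it for verification
    if current is verifier:
        # correct reports go to the supervisor, incorrect ones back to the traveler
        return Role.SUPERVISOR if accepted else Role.TRAVELER
    if current is Role.SUPERVISOR:
        # approval initiates the refunding, rejection returns to the verifier
        return "refund" if accepted else verifier
    raise ValueError(f"{current} does not handle this kind of trip")
```

For example, `next_step(Role.SUPERVISOR, international=True, accepted=False)` sends the report back to the verification center.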

[Figure 1 shows the process as a flow across four roles (Traveler, Travel Expense Responsible, Verification Center, Travel Supervisor) with the activities Create Travel Expense Report, Verify Travel Expense Report, and Initiate Refunding, and the decision outcomes national/international and approved/rejected.]

Figure 1: The eTravel Business Process.

When realizing a PAIS supporting this process, one chal-

lenge is to select the most adequate BPM technology for this.

Currently, there exist different BPM paradigms, which can

be applied in the given context. Among them are workflow

management and case handling.

Workflow Management. Contemporary workflow management systems (WfMSs) enable the modeling, execution, and monitoring of business processes. When working on a particular process step (i.e., activity) in a WfMS-based application, typically only the data needed for executing this activity is visible to the respective actors, but no other workflow data. This is also known as "context tunneling". WfMSs coordinate activity execution based on routing rules, which are described by process definitions and which are strictly separated from the processed data. If an activity is completed, subsequent activities will become active, and the worklists of potential actors will be updated accordingly. Finally, electronic forms are typically used to implement activities and to present the data being processed.

Case Handling. An alternative BPM paradigm is provided by case handling [21]. A case handling system (CHS) aims at more flexible process execution by avoiding restrictions known from (conventional) workflow technology (cf. Fig. 2). Examples of such restrictions include rigid control flow and the aforementioned context tunneling. The central concepts behind a CHS are the case and its data, as opposed to the activities and routing rules that are characteristic of WfMSs. Usually, CHSs present all data about a case to the user at any time (assuming proper authorization), i.e., context tunneling as known from WfMSs is avoided. Furthermore, CHSs orchestrate the execution of activities based on the data assigned to a case. Thereby, different kinds of data objects are distinguished (cf. Fig. 2). Free data objects are not explicitly associated with a particular activity and can be changed at any time during a case execution (e.g., Data Object 3 in Fig. 2). Mandatory and restricted data objects, in turn, are explicitly linked to one or more activities. If a data object is mandatory for an activity, a value will have to be assigned to it before the activity can be completed (e.g., Data Object 5 in Fig. 2). If a data object is restricted for an activity, this activity needs to be active in order to assign a value to the data object (e.g., Data Object 6 in Fig. 2). Like in WfMSs, forms linked to activities are used to provide context-specific views on case data. A CHS not only allows assigning an execution role to activities, but also a redo role (to undo an executed activity) and a skip role (to omit the execution of activities). User 2 in Fig. 2, for example, may execute Activities 3, 4, 5 and 6, and redo Activities 2 and 3.
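The distinction between free, mandatory, and restricted data objects can be illustrated with a minimal Python sketch. It is not FLOWer's data model; the classes `Activity` and `Case` and all method names are hypothetical, and roles (execute, redo, skip) are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    name: str
    mandatory: set = field(default_factory=set)   # must have values before completion
    restricted: set = field(default_factory=set)  # writable only while this activity is active


@dataclass
class Case:
    activities: dict                    # activity name -> Activity
    active: str                         # name of the currently active activity
    data: dict = field(default_factory=dict)

    def can_assign(self, obj: str) -> bool:
        """Free objects are always writable; a restricted object requires
        one of the activities it is restricted to to be active."""
        owners = [a.name for a in self.activities.values() if obj in a.restricted]
        return not owners or self.active in owners

    def assign(self, obj: str, value) -> None:
        if not self.can_assign(obj):
            raise PermissionError(f"{obj} is restricted to another activity")
        self.data[obj] = value

    def can_complete(self, activity_name: str) -> bool:
        """An activity can be completed once all its mandatory objects have values."""
        return self.activities[activity_name].mandatory.issubset(self.data)
```

In this sketch the state of the case data, not an explicit routing rule, decides whether an activity can be completed, which is the data-driven idea described above.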

[Figure 2 shows Activities 1 to 6 with their forms, Data Objects 1 to 7 (marked as mandatory or restricted, and as available or not available), and the execute and redo roles of User 1 and User 2.
Current situation: Activity 4 is active; Data Objects 1, 2, 3 and 4 are available.
Possible actions of User 2: Activity 4 can be executed; Activity 2 can be redone at any time; Activities 3, 5 and 6 cannot be executed (constraint).
Constraints: Activity 3 can be completed if Data Object 5 is available; Activity 5 can be executed if Activities 3 and 4 are completed; Activity 6 can be executed if Activity 5 is completed.]

Figure 2: Data-driven Case Handling.

Despite conceptual differences, both paradigms can be used for implementing processes in general and our eTravel process in particular. Usually, the selection of "the most adequate" BPM technology depends on project-specific requirements. While some IT managers will consider BPM technology as adequate if best practices are available, others will take into account more specific selection criteria like the support of a sufficient degree of process flexibility. Likewise, IT managers can be interested in value-based considerations. In practice, for example, a frequently asked question is whether the efforts for implementing a business process differ between BPM technology A and BPM technology B and, if so, how large this difference is. To answer such questions, IT managers typically have to rely on vendor data (e.g., about the return-on-investment of their products) and qualitative experience reports. What has not been available so far is precise quantitative data about the use of BPM technology and PAISs (e.g., concerning efforts for implementing processes).

To generate quantitative data, controlled software exper-

iments offer promising perspectives. In the following, we

pick up this idea and describe the results of an experiment

in which we investigate efforts related to the implementation

of business processes using either a WfMS or a CHS.

We use Tibco Staffware [18] (Version 10.1) as a typical representative of workflow technology. Its build-time tools include, among other components, a visual process modeling tool and a graphical form editor. The CHS used, in turn, is FLOWer [2] (Version 3.1), the most widely used commercial CHS. Like Staffware, FLOWer provides a visual process modeling tool and a form editor.

3. EXPERIMENTAL FRAMEWORK

This section describes the experimental framework under-

lying our experiment. Section 3.1 discusses general issues

to be considered when designing an experiment. Section

3.2 describes the specific design underlying our experiment.

Section 3.3 discusses factors threatening the validity of ex-

periment results as well as potential mitigations.

3.1 Basic Issues

Literature about software experiments [3, 5, 6, 17, 23] provides various design guidelines for setting up an experiment. First, an experiment design should allow collecting as much data as possible with respect to the major goals of the experiment. Second, collected data should be unambiguous. Third, the experiment must be feasible within the given setting (e.g., within the planned time period). Meeting these design criteria is not trivial. Often, an experiment cannot be accomplished as planned due to its complex design or an insufficient number of participants [17].

Considering these major criteria, we conducted our experiment as a balanced single factor experiment with repeated measurement (cf. Fig. 3). This design is particularly suitable for comparing software development technologies [6]. Specifically, single factor experiments investigate the effects of one factor¹ (e.g., a particular software development technology) on a common response variable (e.g., implementation efforts). This design also allows analyzing variations of a factor (e.g., two alternative tools for software development). Generally, these variations are called factor levels. The response variable is determined when the participants of the experiment (who are also called subjects) apply the factor or factor levels to an object (e.g., a specification to be implemented, based on a set of requirements).

[Figure 3 shows the overall experiment with n subjects, 1 factor, and 1 object (a specification, i.e., a set of requirements). First run: Participants 1 to n/2 apply Factor Level 1 (Software Development Technology 1), Participants n/2+1 to n apply Factor Level 2 (Software Development Technology 2). Second run, after completion of the first applied factor level: the two halves switch factor levels.]

Figure 3: Single Factor Experiment.

We denote a single factor experiment as balanced if all factor levels are used by all participants of the experiment. This enables repeated measurements and the collection of more precise data as every subject generates data for every treated factor level. Generally, repeated measurements can be realized in different ways. Fig. 3 shows a frequently applied variant which is based on two subsequent runs. During the first run half of the subjects apply "Software Development Technology 1" to the treated object, while the other half uses "Software Development Technology 2". After the first run has been completed, the second run begins. During this run each subject applies to the object the factor level he or she has not used so far.

¹ Multi factor experiments, by contrast, investigate the effects of factor combinations on a common response variable, e.g., of a software development technology and a software development process on implementation efforts. Although such experiments can improve the validity of experiment results, they are rarely applied in practice [12].

3.2 Experiment Design

Considering the generic experiment design from Section

3.1, our specific design comprises the following elements:

•Subjects: Subjects are 48 students of a combined

Bachelor/Master Computer Science course at the Uni-

versity of Innsbruck. These 48 students are divided

into 4 main groups each consisting of 4 teams with 3

students (cf. Fig. 4). This results in an overall num-

ber of 16 teams. The students are randomly assigned

to the teams prior to the start of the experiment.

[Figure 4 shows Main Groups 1 to 4, each consisting of four of the teams Team 01 to Team 16. Two main groups use the WfMS in the 1st run and the CHS in the 2nd run; the other two main groups use the CHS in the 1st run and the WfMS in the 2nd run. WfMS = Workflow Management System, CHS = Case Handling System.]

Figure 4: Main Groups and Teams.

•Object: The object to be implemented is the eTravel business process (cf. Section 2). Its specification comprises two parts: an initial "Base Implementation" (Part I) and an additional "Change Implementation" (Part II). While the first part deals with the realization of process support for the refunding of national business trips, the second part specifies a process change, namely, additional support for refunding international business trips. Both parts describe the elements to be implemented: the process logic, user roles, and the data to be presented using simple electronic forms. Note that this specification not only enables us to investigate efforts for (initially) implementing a business process, but also to examine efforts for subsequent process changes. In our experiment, "process change" means the adaptation of the implemented business process. After such a process change has been realized, new process instances are based on the new process model. We do not investigate the migration of existing process instances to the new process schema in this context.

•Factor & Factor Levels: In our experiment, BPM

technology is the considered factor with factor levels

”WfMS” (Staffware) and ”CHS” (FLOWer).

•Response Variable: In our experiment the response

variable is the time the subjects (i.e., the students)

need for implementing the given object (i.e., the eTravel

specification) with each of the factor levels (WfMS and

CHS). All effort values related to the Staffware imple-

mentation are denoted as ”WfMS Sample”. All effort

values related to the FLOWer implementation are de-

noted as ”CHS Sample”.

Besides, the following issues are important:

•Instrumentation: To precisely measure the response variable, we have developed an application called TimeCatcher. This "stop watch" allows logging time in six typical "effort categories" related to the development of a process-oriented application: (1) process modeling, (2) user/role management, (3) form design, (4) data modeling, (5) test, and (6) miscellaneous efforts. To collect qualitative feedback as well (e.g., concerning the maturity or usability of the applied WfMS and CHS), we use a structured questionnaire.

•Data Collection Procedure: The TimeCatcher tool

is used by the students during the experiment. The

aforementioned questionnaire is filled out by the stu-

dents after completing the experiment.

•Data Analysis Procedure: For data analysis well-

established statistical methods and standard metrics

are applied (cf. Section 4.3 for details).
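To make the instrumentation more concrete, the following Python sketch shows how a stopwatch that logs time per effort category could work. It is not the TimeCatcher tool itself, whose implementation is not described here; the class `EffortLog`, its methods, and the injectable `clock` parameter are assumptions made for illustration. Only the six category names are taken from the text.

```python
import time
from collections import defaultdict

# The six effort categories named in the instrumentation description.
CATEGORIES = (
    "process modeling", "user/role management", "form design",
    "data modeling", "test", "miscellaneous",
)


class EffortLog:
    """Accumulates elapsed seconds per effort category, one category at a time."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock              # injectable so the log can be tested
        self._totals = defaultdict(float)
        self._current = None
        self._started = None

    def start(self, category: str) -> None:
        if category not in CATEGORIES:
            raise ValueError(f"unknown effort category: {category}")
        self.stop()                      # switching categories closes the running interval
        self._current, self._started = category, self._clock()

    def stop(self) -> None:
        if self._current is not None:
            self._totals[self._current] += self._clock() - self._started
            self._current = None

    def totals(self) -> dict:
        return dict(self._totals)
```

Allowing only one running category at a time mirrors the "stop watch" idea: a team member always attributes the current working time to exactly one effort category.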

The mathematical model of our experiment can be summarized as follows: $n$ subjects $S_1, \ldots, S_n$ ($n \in \mathbb{N}$) divided into $m$ teams $T_1, \ldots, T_m$ ($m \in \mathbb{N}$, $m \geq 2$, $m$ even) have to implement the eTravel business process specification. This specification describes a "Base Implementation" $O_1$ (corresponding to the "national case" of the eTravel process) and a "Change Implementation" $O_2$ (additionally introducing the "international case"). During the experiment one half of the teams ($T_1, \ldots, T_{m/2}$) implements the complete specification (i.e., base and change implementation) using a WfMS ($PMS_1$, Staffware), while the other half ($T_{m/2+1}, \ldots, T_m$) does this using a CHS ($PMS_2$, FLOWer). After finishing the implementation with the first factor level (i.e., the first run), each team has to implement the eTravel process using the second factor level in a second run (i.e., the development technologies are switched). The response variable "Effort[Time] of $T_m$ implementing $O$ using $PMS_j$" is logged with the TimeCatcher tool.
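The two-run scheme of this model can be sketched as a small Python function that splits an even number of teams into two halves with opposite run orders. The function name `crossover_schedule` is hypothetical, and shuffling the teams before splitting them is our assumption; the text only states that students are randomly assigned to teams.

```python
import random


def crossover_schedule(teams, seed=None):
    """Split an even number of teams into two halves with opposite run orders.

    Returns a dict mapping each team to (first run, second run).
    """
    if len(teams) < 2 or len(teams) % 2:
        raise ValueError("need an even number of teams (m >= 2)")
    shuffled = list(teams)
    random.Random(seed).shuffle(shuffled)   # assumption: random split into halves
    half = len(shuffled) // 2
    schedule = {}
    for team in shuffled[:half]:
        schedule[team] = ("WfMS", "CHS")    # WfMS first, then CHS
    for team in shuffled[half:]:
        schedule[team] = ("CHS", "WfMS")    # CHS first, then WfMS
    return schedule
```

Every team appears exactly once and uses both factor levels, so each team contributes one effort value to the WfMS sample and one to the CHS sample.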

3.3 Risk Analysis and Mitigations

When conducting experimental research and generating results, related risks have to be taken into account. Generally, there exist factors that threaten both the internal validity ("Are the claims we made about our measurements correct?") and the external validity ("Can we generalize the claims we made?") of an experiment. In our context, threats to internal validity are as follows:

•People: Participating students differ in their skills and their productivity for two reasons: (i) general experience with software development and (ii) experience with BPM technology. The first issue can only be balanced by conducting the experiment with a sufficiently large and representative set of students. The number of 48 students promises to achieve such a balance. The second issue can be mitigated by using BPM tools unknown to every student. Only three of the participating students have rudimentary (and thus negligible) workflow knowledge. As we cannot exclude that this knowledge influences our experiment results, we have assigned those three students to different teams in order to minimize potential effects as far as possible.

•Data collection process: Data collection is one of

the most critical threats. To mitigate it, we need to

continuously control data collection in the context of

the experiment. We further have to ensure that stu-

dents understand which TimeCatcher categories have

to be selected during the experiment.

•Time for optimizing an implementation: The specification to be implemented does not include any guideline concerning the number of electronic forms or their layout. This implies the danger that some teams spend more time on implementing a "nice" user interface than others do. To minimize such effects, we explicitly indicate to the students that the development of a "nice" user interface is not a goal of our experiment. To ensure that the implemented solutions are similar across different teams, we conduct acceptance tests.

Besides, there are threats to the external validity:

•Students instead of professionals: Involving stu-

dents instead of IT professionals may be critical. How-

ever, it has been shown before that the results of stu-

dent experiments are transferable and can provide valu-

able insights into an analyzed problem domain [15].

Also note that the use of professional software devel-

opers is hardly possible in practice as no profit-oriented

organization will simultaneously implement a business

process twice using two different BPM technologies.

•Investigation of tools instead of concepts: In our experiment, BPM tools are used as representatives of the analyzed concepts (i.e., workflow management and case handling). Investigating the concepts therefore always depends on the quality of the used tools. To mitigate this risk, the used BPM technologies should be representative of state-of-the-art technologies in practice (which is the case as both selected BPM tools are market leaders in their domain).

•Choice of object: To prevent the chosen business process setting from unduly supporting the goals of our experiment, we have picked a business process that can be found in many organizations, i.e., the eTravel process (cf. Section 2).

4. PERFORMING THE EXPERIMENT

This section deals with the preparation, execution and

analysis of our experiment. This also includes the presenta-

tion of experiment results.

4.1 Experiment Preparation

In the run-up to the experiment, we prepare a technical specification of the eTravel process. This specification comprises UML activity diagrams², an entity relationship diagram describing the generic data structure of a travel expense report, and system-specific data models for the considered tools (Staffware, FLOWer). In order to ensure that the specification is self-explanatory and correct, two student assistants are involved in its development.

Before the experiment, the same two students implement the specification with each of the utilized BPM technologies. This allows us to ensure the feasibility of our general experiment setup and to identify critical issues with respect to the performance of the experiment. This pre-test also provides us with feedback that helps to further improve the comprehensibility of our specification. Finally, we compile a "starter kit" for each participating team. It includes the original tool documentation, additional documentation created by us when preparing the experiment (which can be considered a condensed summary of the original documentation), and the technical process specification.

² One may argue that the use of UML activity diagrams can undermine the validity of the experiment as these diagrams are very similar to the explicit, flow-driven notation of Staffware process models, but different from the implicit, more data-driven FLOWer process models. However, in practice, UML activity diagrams are widely used to describe standard business processes. Thus, the use of UML activity diagrams can even improve internal validity as a typical practical scenario is investigated.

4.2 Experiment Execution

Due to infrastructure limitations, we split up the experiment into two events. While the first one took place in October 2006, the second one was conducted in January 2007. Each event lasted 5 days, involved 24 participants (i.e., students), and was based on the following procedure: Prior to the start of the experiment, all students have to attend an introductory lecture, in which we introduce basic notions of workflow management and case handling and inform them about the goals and rules of the experiment. Afterwards, each team receives its "starter kit". Then, the students have to implement the given eTravel business process specification (with both considered factor levels). After a team has implemented the eTravel specification with a factor level, we conduct an acceptance test in order to ensure that the developed solution corresponds to the specification. After finishing their work on the experiment, students have to fill out the aforementioned questionnaire.

We further optimize experiment results by applying Ac-

tion Research [13]. Action Research is characterized by an

intensive communication between researchers and subjects.

At an early stage, we optimize the data collection process by

assisting and guiding the students in using the TimeCatcher

data collection tool properly (which is critical with respect

to the quality of the gathered data). Besides, we document

emotional reactions of the students regarding their way of

working. This helps us to design the questionnaire. Note

that Action Research does not imply any advice for the stu-

dents on how to implement the eTravel process.

4.3 Data Analysis Procedure

Data analysis comprises three steps: an initial validation

of the collected data (Step 1), data analysis itself (Step 2),

and analysis of the questionnaire results (Step 3).

Step 1: Data Validation. We validate the collected data

regarding its consistency (”Is all expected data available?”)

and plausibility (”Is all available data meaningful?”):

•Data Consistency: We discard the data of two teams

as their data is flawed. Both have made mistakes using

the TimeCatcher tool. Hence, the data provided by 14

teams is finally included in data analysis.

•Data Plausibility: We analyze data plausibility based on box-whisker-plot diagrams. Such diagrams visualize the distribution of a sample and particularly show outliers. A low number of outliers indicates plausible data [12]. Fig. 5A, for example, shows a box-whisker-plot diagram which illustrates the distributions of the base implementation efforts in our experiment. The diagram takes the form of a box that spans the distance between the 25% quantile and the 75% quantile (the so-called interquartile range) surrounding the median, which splits the box into two parts. The "whiskers" are straight lines extending from the ends of the box to the maximum and minimum values. Outliers are defined as data points beyond the interquartile range, i.e., beyond the edge of the box. As can be seen in Fig. 5A, there are no outliers, i.e., all data from these samples lie within the boxed area. Moreover, there exists only one (negligible) outlier in the distribution of the change implementation efforts (cf. Fig. 5B), and no outliers regarding the distribution of the overall implementation efforts (cf. Fig. 5C).
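The quantities behind a box-whisker-plot diagram can be computed with the Python standard library, as the following sketch shows. Note that it flags outliers with the common 1.5 x IQR fence rule, which is an assumption on our side and differs from the box-edge definition given above; the function name `box_plot_summary` and the `whisker` parameter are hypothetical.

```python
from statistics import median, quantiles


def box_plot_summary(sample, whisker=1.5):
    """Return median, quartiles, interquartile range, and flagged outliers.

    Outliers are values outside [q1 - whisker*IQR, q3 + whisker*IQR];
    this fence rule is an assumed convention, not taken from the text.
    """
    data = sorted(sample)
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return {
        "median": median(data),
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "outliers": [v for v in data if v < low or v > high],
    }
```

A sample with few or no flagged values would, following the argument above, be regarded as plausible.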

Step 2: Data Analysis. The main goal of the experiment is to investigate whether there is a significant difference between the efforts of implementing a business process with a WfMS and the efforts of an implementation using case handling technology. Hence, the null hypothesis to be analyzed is as follows: "Using workflow technology yields no significant difference in implementation efforts when compared to case handling technology". We analyze this null hypothesis based on a two-sided t-test [12] (or, if the t-test fails, an additional sign test). Doing so, we are able to assess whether the means of the WfMS sample and the CHS sample are statistically different from each other. A successful t-test (with $|T| > t_0$) rejects our null hypothesis. Specifically, the following steps have to be executed in order to accomplish a t-test (with $\alpha = 0.05$ as the level of significance):

1. Paired Comparison: The t-test is combined with a

paired comparison [12], i.e., we analyze ”pairs of effort

values”. Each pair comprises one effort value from the

WfMS sample and one from the CHS sample. Note

that we compose pairs according to the performance

of the teams, i.e., effort values of ”good” teams are not

combined with effort values of ”bad” teams (cf. [6]).

2. Standardized Comparison Variable: For each pair, a standardized comparison variable $X_j$ is derived. It is calculated by dividing the difference of the two compared effort values by the first one:

$$X_j := \frac{EFFORT_{j+m/2} - EFFORT_j}{EFFORT_{j+m/2}} \cdot 100\%$$

In other words, $X_j$ denotes how much effort team $T_j$ saves using workflow technology when compared to team $T_{j+m/2}$ which uses case handling technology. Together, all $X_j$ constitute a standardized comparison sample $x = (X_1, \ldots, X_{m/2})$ used as basis when performing the t-test.

3. Statistical Measures: For the standardized comparison sample $x$ we calculate the median ($m$), the interquartile range (IQR), the expected value ($\mu$), the standard deviation ($\sigma$), and the skewness ($sk$).

4. Two-sided t-Test: Finally, we apply the t-test to x. Note that the t-test is only applicable if x follows a normal distribution and if the WfMS and CHS samples have the same variance. The first condition can be tested using the Kolmogorov-Smirnov test [16]. In particular, the result of the Kolmogorov-Smirnov test has to be smaller than K0 (with K0 a predefined value depending on the size of x and α). The second condition can be tested based on the test for identical variance [16]. The variances of the WfMS and CHS samples are considered identical if the result of this test is smaller than F0 (with F0 a predefined value depending on the size of the samples and α). A single violated precondition is sufficient to rule out the t-test.

[Figure 5: Data Distribution (Box-Whisker-Plot Diagrams). Panels: A) Base Implementation, B) Change Implementation, C) Overall Implementation. Each panel plots effort in hours for FLOWer and Staffware; outliers are marked.]
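The steps of this procedure can be sketched in Python as follows. This is a minimal illustration, not the analysis script used in the experiment: all function names are hypothetical, ranking teams by effort is only an assumed stand-in for "performance", the sample standard deviation and the moment-based skewness are assumed definitions, and the effort values in the usage example are made up. The results of the Kolmogorov-Smirnov test, the variance test, and the t-test, as well as the thresholds K0, F0, and t0, are expected to come from a statistics package or table.

```python
import statistics


def pair_by_rank(wfms_efforts, chs_efforts):
    """Step 1: pair effort values of comparably performing teams.

    Assumption: performance is approximated by effort rank within each sample.
    """
    if len(wfms_efforts) != len(chs_efforts):
        raise ValueError("both samples need the same number of teams")
    return list(zip(sorted(wfms_efforts), sorted(chs_efforts)))


def comparison_sample(pairs):
    """Step 2: X_j = (EFFORT_CHS - EFFORT_WfMS) / EFFORT_CHS * 100 per pair."""
    return [(chs - wfms) / chs * 100.0 for wfms, chs in pairs]


def describe(x):
    """Step 3: median, interquartile range, mean, standard deviation, skewness."""
    q1, _, q3 = statistics.quantiles(x, n=4)
    mean = statistics.fmean(x)
    m2 = sum((v - mean) ** 2 for v in x) / len(x)
    m3 = sum((v - mean) ** 3 for v in x) / len(x)
    return {
        "median": statistics.median(x),
        "iqr": (q1, q3),
        "mean": mean,
        "sd": statistics.stdev(x),  # sample standard deviation (assumption)
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,  # moment-based (assumption)
    }


def t_test_applicable(ks_result, k0, f_result, f0):
    """Step 4, preconditions: both test results must stay below their thresholds."""
    return ks_result < k0 and f_result < f0


def rejects_null_hypothesis(t, t0):
    """Step 4, decision rule: reject the 0-hypothesis if |T| > t0."""
    return abs(t) > t0


if __name__ == "__main__":
    # Made-up effort values in hours, for illustration only.
    pairs = pair_by_rank([30.0, 20.0, 25.0, 40.0], [50.0, 40.0, 80.0, 50.0])
    x = comparison_sample(pairs)
    print(x, describe(x))
```

Keeping the precondition check separate from the decision rule mirrors the text: the t-test result is only interpreted once both preconditions hold.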

Step 3: Questionnaire Analysis. We analyze the data collected with the questionnaire each student has to fill out. As one of the students became ill on the last day of the January 2007 event, only 47 students participated.

4.4 Experiment Results

Fig. 6A shows the results for the overall implementa-

tion efforts. When studying the efforts for the workflow

implementation, we can see that they are lower than the ef-

forts for the case handling implementation. This difference

is confirmed by the results of the (successful) t-tests for both

the first and the second run, i.e., our 0-hypothesis is to be

rejected. In the first run, the use of workflow technology

has resulted in effort savings of 43.04% (fluctuating between

27.51% and 50.81%) when compared to the efforts for us-

ing case handling technology. In the second run, the use of

workflow technology has still resulted in savings of 28.29%

(fluctuating between 11.48% and 53.16%).
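To make these percentages concrete, the definition of the standardized comparison variable can be solved for the workflow effort. The rearrangement below is only an illustration for a single pair of teams; applying the reported savings to one pair is an assumption made for the example, since the reported figures summarize all pairs of a run.

```latex
X_j = \frac{EFFORT_{j+m/2} - EFFORT_j}{EFFORT_{j+m/2}} \cdot 100\%
\quad\Longleftrightarrow\quad
EFFORT_j = \left(1 - \frac{X_j}{100\%}\right) \cdot EFFORT_{j+m/2}

X_j = 43.04\% \;\Rightarrow\; EFFORT_j \approx 0.570 \cdot EFFORT_{j+m/2}
X_j = 28.29\% \;\Rightarrow\; EFFORT_j \approx 0.717 \cdot EFFORT_{j+m/2}
```

That is, a pair with savings of 43.04% corresponds to a workflow implementation that requires roughly 57% of the case handling effort.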

Fig. 6A also shows that efforts for the first run are generally higher than those for the second run. Regardless of which technology is used first, all teams reduce their efforts in the second run. This can be explained either by learning effects regarding the used BPM technologies or by increasing process knowledge gathered during the experiment. Based on questionnaire results (see below), we assume that this effect is not necessarily related to learning effects concerning the used BPM technologies (i.e., tool knowledge), but to increasing process knowledge (which, in turn, reduces the risk of comparing tools instead of concepts).

Fig. 6B and Fig. 6C show results for the base implementation and the change implementation. Again, our results allow us to reject the 0-hypothesis (the failed t-test can be compensated with a successful sign test). Using workflow technology results in effort savings of 44.11% for the change implementation in the first run (fluctuating between 16.29% and 56.45%). In the second run, the use of workflow technology results in effort savings of 40.46% when compared to case handling efforts.
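The sign test used as a fallback is not spelled out in the text. A minimal sketch of a standard two-sided sign test on the standardized comparison sample could look as follows; the function name and the sample values are hypothetical, and the exact variant applied in the experiment may differ.

```python
from math import comb


def sign_test_p_value(x):
    """Two-sided sign test for H0: positive and negative X_j are equally likely.

    Values equal to zero carry no sign information and are dropped.
    """
    positives = sum(1 for v in x if v > 0)
    negatives = sum(1 for v in x if v < 0)
    n = positives + negatives
    if n == 0:
        return 1.0
    k = min(positives, negatives)
    # Probability of observing at most k of the rarer sign under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical sample: all seven pairs show effort savings for WfMS.
    p = sign_test_p_value([12.0, 35.5, 48.1, 52.3, 9.7, 44.0, 56.4])
    print(p, p < 0.05)
```

Unlike the t-test, the sign test only uses the direction of each difference, so it does not depend on the normality and equal-variance preconditions.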

4.5 Questionnaire Results

Fig. 6D shows that the methodical soundness of using

process management technology is easier to understand in

the case of workflow technology, i.e., using case handling

technology is considered as being more difficult. Fig. 6E

illustrates what we have already mentioned above, i.e., process knowledge gained during the first run significantly simplifies the second run. By contrast, Fig. 6F shows that the increased efficiency during the second run cannot be attributed to gained tool knowledge. Finally, Fig. 6G deals with the usability of the applied process management systems. As can be seen, there remains a lot of room for improvement from the students' viewpoint.

4.6 Discussion

Our results indicate that process implementations based

on workflow technology generate lower efforts when com-

pared to implementations based on case handling technol-

ogy. Moreover, our results show that initial implementations

of processes generate significantly higher efforts than subse-

quent process changes (cf. Fig. 7). This is particularly im-

portant for policy makers, who often focus on short-term

costs (e.g., for purchasing BPM technology and initially

implementing business processes) rather than on long-term

benefits (e.g., low costs for realizing process changes).

Finally, our data indicates that increasing knowledge about the processes to be implemented results in increased productivity of software developers. Regardless of which BPM technology is used first, all teams reduce their efforts in the second run. Questionnaire results further indicate that this effect is not necessarily related to increasing knowledge about the used BPM technologies. This also emphasizes the need to involve domain experts with high process knowledge when applying BPM technology.

Considering our experiment design, we have to acknowledge that our experiment results are influenced by the quality of the used BPM tools. However, by selecting leading commercial BPM tools as representatives for the analyzed concepts (i.e., workflow management and case handling), we can reduce the impact of the tool quality. Yet, based on this

single experiment, results cannot be generalized, i.e., sub-

stantial conclusions regarding the strengths and weaknesses

of workflow management and case handling cannot be de-

rived. For this purpose, additional experiments with differ-

ent experiment designs and more specific research questions

will be necessary. As one example consider the comparison

of conventional WfMS, adaptive WfMS, and CHS regarding

the effectiveness of realizing process changes.

[Figure 6: Results of our Experiment.

Panels A to C plot normalized effort values for pairs of effort values (WfMS versus CHS), separately for the first and the second run. Team pairs on the x-axis: 11|15, 6|8, 1|12, 16|4, 14|7, 2|10, 13|3, 15|11, 8|6, 12|1, 4|16, 7|14, 10|2, 3|13. Statistical data per panel (m: median, IQR: interquartile range, µ: expected value, σ: standard deviation, sk: skewness; KS: result of the Kolmogorov-Smirnov test with threshold K0; F: result of the test for identical variance with threshold F0; T: result of the t-test with threshold t0):

A) Experiment Results - Paired Comparison (Overall Efforts)
   1st run: m = 43.048, IQR = [27.51; 50.81], µ = 38.6896, σ = 12.7703, sk = -0.6309; KS = 0.135 (K0 = 0.349), F = 0.661 (F0 = 4.284), T = -5.059 (t0 = 2.179)
   2nd run: m = 28.2913, IQR = [11.48; 53.16], µ = 31.2358, σ = 20.6792, sk = 0.4490; KS = 0.129 (K0 = 0.349), F = 0.76 (F0 = 4.284), T = -3.294 (t0 = 2.179)

B) Experiment Results - Paired Comparison (Base Implementation)
   1st run: m = 43.0116, IQR = [32.03; 50.06], µ = 39.2788, σ = 11.9261, sk = -0.9141; KS = 0.138 (K0 = 0.349), F = 0.972 (F0 = 4.284), T = -4.816 (t0 = 2.179)
   2nd run: m = 28.5209, IQR = [8.11; 54.90], µ = 29.3332, σ = 23.5067, sk = 0.4401; KS = 0.198 (K0 = 0.349), F = 0.895 (F0 = 4.284), T = -3.024 (t0 = 2.179)

C) Experiment Results - Paired Comparison (Change Implementation)
   1st run: m = 44.1152, IQR = [16.29; 56.45], µ = 29.9807, σ = 32.4034, sk = -1.3034; KS = 0.172 (K0 = 0.349), F = 0.752 (F0 = 4.284), T = -1.724 (t0 = 2.179; t-test failed)
   2nd run: m = 40.4666, IQR = [26.63; 52.20], µ = 41.4368, σ = 14.4722, sk = 0.8501; KS = 0.198 (K0 = 0.349), F = 1.784 (F0 = 4.284), T = -2.884 (t0 = 2.179)

Panels D to G show absolute nominations (with percentages) of the 47 questionnaire respondents:

D) Questionnaire - Methodical Soundness of Implementation. Question: The methodical steps of the business process implementation were clear during the experiment? (scale from A = yes, B = rather yes, C = indifferent, D = rather no, to E)
   WfMS: A 13 (27.66%), B 29 (61.70%), C 5 (10.64%), D 0 (0.00%), E 0 (0.00%)
   CHS: A 0 (0.00%), B 9 (19.15%), C 10 (21.28%), D 21 (44.68%), E 7 (14.89%)

E) Questionnaire - Impact of Process Knowledge. Question: How strong has the process knowledge, which was gained during the first implementation, simplified the second implementation? (A = very strong, B = strong, C = rather strong, D = indifferent, E = rather weak, F = weak, G = very weak)
   A 4 (8.51%), B 14 (29.79%), C 21 (44.68%), D 5 (10.64%), E 2 (4.26%), F 1 (2.13%), G 0 (0.00%)

F) Questionnaire - Impact of Tool Knowledge. Question: How strong has the tool knowledge, which was gained during the first implementation, simplified the second implementation? (A = very strong, B = strong, C = rather strong, D = indifferent, E = rather weak, F = weak, G = very weak)
   A 2 (4.26%), B 2 (4.26%), C 11 (23.40%), D 15 (31.91%), E 6 (12.77%), F 4 (8.51%), G 7 (14.89%)

G) Questionnaire - Usability. Question: How would you rate the usability of the process management systems, which have been used during the experiment? (A = very good, B = good, C = rather good, D = indifferent, E = rather weak, F = weak, G = very weak)
   WfMS: A 5 (10.64%), B 21 (44.68%), C 9 (19.15%), D 8 (17.02%), E 4 (8.51%), F 0 (0.00%), G 0 (0.00%)
   CHS: A 1 (2.13%), B 1 (2.13%), C 5 (10.64%), D 5 (10.64%), E 14 (29.79%), F 14 (29.79%), G 7 (14.89%)]

We apply these experiment results in the EcoPOST project

[9]. This project aims at the development of an approach to

investigate complex causal dependencies and related cost ef-

fects in PAIS engineering projects. In particular, our results

enable us to quantify causal dependencies in PAIS engineer-

ing projects. As an example consider the impact of process

knowledge on the productivity of process implementation.

[Figure 7: Base versus Change Implementation. Categories shown: Process Modeling, Data Modeling, Form Design, User/Role Management, Test, Miscellaneous. Legend: Base Implementation, Change Implementation.]

5. RELATED WORK

The experiment design most similar to our own is provided by [6], which investigates the impact of workflow technology on software development and software maintenance. Generally, only little data is available on the effects of workflow technology (regarding case handling, no data is available at all). Oba et al. [11], for example, analyze the introduction of WfMS and particularly focus on the identification of factors influencing work efficiency, processing time, and business process standardization. A mathematical model is provided for predicting the reduction rate of processing times. An extension of this work is [14], where simulation is used to compare pre- and post-implementations of information systems relying on workflow technology. The focus of this work is on analyzing process performance based on criteria such as lead time, waiting time, service time, and utilization of resources.

6. SUMMARY

This paper presents the results of a controlled BPM soft-

ware experiment with 48 students. Our results indicate that

business process implementation based on workflow technology generates lower efforts than using case handling technology. Moreover, initial process implementations result in higher efforts than subsequent process changes. Our data can help enterprises, which crave quantitative data to complement their qualitative decision criteria, to better understand the efforts of using BPM technology.

7. REFERENCES

[1] Y. L. Antonucci. Using Workflow Technologies to

improve Organizational Competitiveness. Int’l. J. of

Management, 14(1), pp.117-126, 1997.

[2] P. Athena. Case Handling with FLOWer: Beyond

Workflow. 2002.

[3] V. R. Basili, R. W. Selby, and D. H. Hutchens.

Experimentation in Software Engineering. IEEE

Trans. in SW Engin., 12(7), pp.733-743, 1986.

[4] M. Dumas, W. M. P. van der Aalst, and A. ter

Hofstede. Process-aware IS. Wiley, 2005.

[5] N. Juristo and A. M. Moreno. Basics of Software

Engineering Experimentation. 2001.

[6] N. Kleiner. Can Business Process Changes Be Cheaper

Implemented with Workflow-Management-Systems?

IRMA 2004, pp.529-532.

[7] C. M. Lott and H. D. Rombach. Repeatable Software

Engineering Experiments for Comparing

Defect-Detection Techniques. Empirical Software

Engineering, 1(3), pp. 241-277, 1996.

[8] B. Mutschler, M. Reichert, and J. Bumiller.

Unleashing the Effectiveness of Process-oriented

Information Systems: Problem Analysis, Critical

Success Factors, Implications. IEEE Transactions on

Systems, Man, and Cybernetics - Part C: Application

and Reviews, 2008 (accepted for publication).

[9] B. Mutschler, M. Reichert, and S. Rinderle. Analyzing

the Dynamic Cost Factors of Process-aware IS: A

Model-based Approach. CAiSE 2007.

[10] G. J. Myers. A controlled Experiment in Program

Testing and Code Walkthroughs/Inspections. Comm.

of the ACM, 21(9), pp. 760-768, 1978.

[11] M. Oba, S. Onoda, and N. Komoda. Evaluating the

Quantitative Effects of Workflow Systems based on

Real Cases. HICSS 2000.

[12] L. Prechelt. Controlled Experiments in Software

Engineering (in German). Springer, 2001.

[13] P. Reason and H. Bradbury. Handbook of Action

Research. 2001.

[14] H. A. Reijers and W. M. P. van der Aalst. The

Effectiveness of Workflow Management Systems -

Predictions and Lessons Learned. Int’l. J. of Inf.

Manag., 25(5), pp.457-471, 2005.

[15] P. Runeson. Using Students as Experiment Subjects.

EASE 2003.

[16] D. J. Sheskin. Handbook of Parametric and

Nonparametric Statistical Procedures. 2000.

[17] D. I. K. Sjoberg, J. E. Hannay, O. Hansen, V. B.

Kampenes, A. Karahasanovic, N.-K. Liborg, and

A. C. Rekdal. A Survey of Controlled Experiments in

Software Engineering. IEEE Trans. in SW Engin.,

31(9), pp.733-753, 2005.

[18] Tibco. Staffware Process Suite. User Manual, 2005.

[19] W. van der Aalst, A. ter Hofstede, B. Kiepuszewski,

and A. Barros. Workflow Patterns. Distributed and

Parallel Databases, 14(3), pp.5-51, 2003.

[20] W. M. P. van der Aalst and K. van Hee. Workflow

Management. MIT Press, 2004.

[21] W. M. P. van der Aalst, M. Weske, and D. Grunbauer.

Case Handling: A New Paradigm for Business Process

Support. DKE, 53(2), pp.129-162, 2005.

[22] B. Weber, S. Rinderle, and M. Reichert. Change Patterns and Change Support Features in Process-aware IS. CAiSE 2007.

[23] M. V. Zelkowitz and D. R. Wallace. Experimental

Models for Validating Technology. IEEE Computer,

31(5), pp.23-31, 1998.