Investigating the Effort of Using Business Process Management Technology: Results from a Controlled Experiment [original]


Barbara Weber a,∗, Bela Mutschler b, Manfred Reichert c

a Department of Computer Science, University of Innsbruck,

Technikerstraße 21a, 6020 Innsbruck, Austria

b Hochschule Ravensburg-Weingarten

Postfach 1261, 88241 Weingarten, Germany

c Institute of Databases and Information Systems, Ulm University

James-Franck-Ring, 89069 Ulm, Germany

Abstract

Business Process Management (BPM) technology has become an important instrument for sup-

porting complex coordination scenarios and for improving business process performance. When

considering its use, however, enterprises typically have to rely on vendor promises or qualitative

reports. What is still missing and what is demanded by IT decision makers are quantitative

evaluations based on empirical and experimental research. This paper picks up this demand and

illustrates how experimental research can be applied to technologies enabling enterprises to co-

ordinate their business processes and to associate them with related artifacts and resources. The

conducted experiment compares the effort for implementing and maintaining a sample business

process either based on standard workflow technology or on a case handling system. We motivate

and describe the experimental design, discuss threats to the validity of our experimental

results (as well as risk mitigations), and present the results of our experiment. In general, more

experimental research is needed in order to obtain valid data on the various aspects and effects

of BPM technology and BPM tools.

Key words: Process-aware Information System, Workflow Management, Case Handling, Controlled

Experiment, Information Systems Engineering

∗Corresponding author.

Email addresses: [email protected] (Barbara Weber), [email protected]

(Bela Mutschler), [email protected] (Manfred Reichert).

Preprint submitted to Elsevier 29 December 2008

1. Introduction

Providing effective IT support for business processes has become crucial for enterprises

to stay competitive in their market [1,2]. In response to this need, a variety of process

support paradigms (e.g., workflow management, case handling, service orchestration),

process specification standards (e.g., WS-BPEL, BPMN), and business process manage-

ment (BPM) tools (e.g., Tibco Staffware, FLOWer, IBM Websphere Process Server) have

emerged supporting the realization of Process-Aware Information Systems (PAISs) [3].

Specifically, PAISs enable enterprises to implement and execute complex coordination

scenarios either within an enterprise or in a cross-organizational setting [4].

Coordination scenarios are typically described by coordination models. Such models

integrate the interactions of a number of (heterogeneous) components (processes, ob-

jects, agents) into a meaningful description. Relevant research areas are, for example,

service-oriented architectures (i.e., service coordination, service orchestration, and ser-

vice choreography), cooperative information systems (e.g., workflow management tech-

nology or case handling technology), component-based systems, multi-agent technology,

and related middleware platforms.

When evaluating the suitability of existing BPM technology for a particular coordina-

tion scenario or when arguing about its strengths and weaknesses, typically, it becomes

necessary to rely on qualitative criteria. As one example consider workflow patterns [5],

which can be used to evaluate the expressiveness of the workflow modeling language

provided by a particular BPM tool. As another example consider process change pat-

terns [7], which facilitate the evaluation of BPM tools regarding their ability to deal with

process changes. What has been neglected so far are more profound evaluations of BPM

technology based on empirical or experimental research. This is surprising as the benefits

of these research methods have been demonstrated in the software engineering area for

a long time [8] (e.g., in the context of software development processes or code reviews

[9,10]). In addition, a recently conducted survey among IT managers and project leaders

has clearly shown that quantitative data on costs, benefits and effects of BPM technology

becomes increasingly important [11].

Picking up this demand, this paper illustrates how experimental research can be ap-

plied in the BPM context. For this purpose we have conducted a controlled software

experiment with 48 participants to investigate the effort related to the implementation

and change of business processes either using conventional workflow technology [12] or

a case handling system [13]. More precisely, we have used Tibco Staffware [14] as rep-

resentative of workflow technology and FLOWer [15] as representative of case handling

systems. We describe our experimental design, give a mathematical model of the experiment,

and discuss potential threats to the validity of experimental results. Following this,

we describe the major results of our experiment, which contribute to a better understanding

of the complex effort caused by using BPM technology, and discuss them in detail.

This paper is a significant extension of the work we presented in [16]. It includes ex-

tended analyses of the data we gathered during our experiment and a more in-depth

interpretation of the presented results. In particular, learning effects, which have to be

considered when investigating the effort related to the implementation of business pro-

cesses, constitute an additional aspect being addressed. Moreover, the comparison of

workflow technology and case handling has been extended.

The remainder of this paper is organized as follows. Section 2 motivates the need for

experimentation in BPM and provides background information needed for understanding

our experiment. Section 3 describes our experimental framework. Section 4 deals with

the performance and results of our experiment. Finally, Section 5 discusses related work

and Section 6 concludes with a summary.

2. Background

This section presents the background needed for the understanding of this paper.

Section 2.1 introduces Process-Aware Information Systems (PAISs). Section 2.2 deals

with different paradigms for realizing PAISs, which we compare in our experiment.

2.1. Need for Process-Aware Information Systems

Empirical studies have indicated that providing effective business process support by

information systems is a difficult task to accomplish [11,17]. In particular, these studies

show that current information systems fail to provide business process support as needed

in practice. Among the major reasons for this drawback are the hard-wiring of process

logic in contemporary information systems and the missing support for coping with

evolving business processes. Enterprises seek approaches that enable them to control,

monitor and continuously improve business process performance [18]. What is needed are

PAISs, i.e., information systems that support the modeling, enactment and monitoring

of business processes in an integrated and efficient way.

In general, PAISs orchestrate processes of a particular type (e.g., handling of a cus-

tomer order) based on a predefined process model. Such a model defines the tasks to be

executed (i.e., activities), their dependencies (e.g., control and data flow), the organiza-

tional entities performing these tasks (i.e., process users), and the business objects which

provide or store activity data. Unlike conventional information systems, PAISs strictly

separate process logic from application code [19]; i.e., PAISs are driven by process models

rather than program code (cf. Fig. 3). Consequently, PAISs are realized based on process

engines which orchestrate processes and their activities during run-time [20]. Typically,

a process engine also provides generic functionality for the modeling and monitoring of

processes, e.g., for accomplishing process analysis. Earlier empirical work confirms that

PAISs enable a fast and cost-effective implementation as well as customization of business

processes [21].

Realizing PAISs also implies a significant shift in the field of information systems

engineering. Traditional engineering methods and paradigms (e.g., object-oriented design

and programming) have to be supplemented with engineering principles and software

technologies particularly enhancing the operational support of business processes (e.g.,

workflow management, case handling, and service orchestration). This is crucial to address

those requirements neglected by current information systems so far.

2.2. Paradigms for Orchestrating Business Processes and their Activities

Assume that a business process for refunding traveling expenses - in the following denoted

as eTravel business process - is to be supported by a PAIS which is realized using BPM

technology. The eTravel business process is used throughout the paper and is part of the

material used for our experiment. It distinguishes between four organizational roles (cf.

Fig. 1). The traveler initiates the refunding of his expenses. For this purpose, he has to

summarize the travel data in a travel expense report. This report is then forwarded either

to a travel expense responsible (in case of a national business trip) or to a verification

center (in case of an international business trip).

Both the travel expense responsible and the verification center fulfill the same task, i.e.,

they verify a received travel expense report. “Verification” means that the declared travel

data is checked for correctness and plausibility (e.g., regarding accordance with receipts).

An incorrect travel expense report is sent back to the traveler (for correction). If it is

correct, it will be forwarded to the travel supervisor for final approval. The supervisor

role may be filled, for example, by the line manager of the traveler. If a travel expense

report is approved by the supervisor, the refunding will be initiated. Otherwise, it will

be sent back to either the travel expense responsible (national trip) or the verification

center (international trip). Note that this is a characteristic (yet simplified) process as it

can be found in many organizations.

[Figure: UML activity diagram with the actors Traveler, Travel Expense Responsible, Verification Center, and Travel Supervisor. The activity "Create Travel Expense Report" is followed by an XOR decision (national / international) and by "Verify Travel Expense Report" activities, each ending in an XOR decision (approved / rejected); the final activity is "Initiate Refunding". Legend: actor, activity, decision (XOR), start node, end node, control edge.]

Fig. 1. The eTravel Business Process (modeled as UML Activity Diagram).
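The routing logic described above can be sketched as a small transition function. This is only an illustration of the textual process description, not code from the experiment: the names `Step` and `next_step` are hypothetical, and the sketch assumes that a rejected report returns to the traveler's "create" step for correction.

```python
from enum import Enum


class Step(Enum):
    # Step names are illustrative labels, not identifiers from the paper.
    CREATE_REPORT = "Create Travel Expense Report"
    VERIFY_NATIONAL = "Verify (Travel Expense Responsible)"
    VERIFY_INTERNATIONAL = "Verify (Verification Center)"
    APPROVE = "Final approval (Travel Supervisor)"
    REFUND = "Initiate Refunding"


def next_step(current: Step, international: bool, accepted: bool = True) -> Step:
    """Return the step that follows `current` in the eTravel process."""
    # National trips are verified by the travel expense responsible,
    # international trips by the verification center.
    verify = Step.VERIFY_INTERNATIONAL if international else Step.VERIFY_NATIONAL
    if current is Step.CREATE_REPORT:
        return verify
    if current in (Step.VERIFY_NATIONAL, Step.VERIFY_INTERNATIONAL):
        # An incorrect report goes back to the traveler (assumed: same step).
        return Step.APPROVE if accepted else Step.CREATE_REPORT
    if current is Step.APPROVE:
        # A report rejected by the supervisor returns to verification.
        return Step.REFUND if accepted else verify
    raise ValueError("the process ends after refunding")
```

In a WfMS, rules of this kind live in the process model rather than in application code; the function merely makes the decision points of Fig. 1 explicit.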

When realizing a PAIS which supports this process, one challenge is to select the most

suitable BPM technology for this purpose. Currently, there exist various BPM approaches,

which can be categorized as shown in Fig. 2. Basically, one distinguishes between groupware

systems, workflow management systems, and case handling systems. Groupware

systems aim at the support of unstructured processes including a high degree of personal

communication. As groupware systems are not suitable for realizing PAISs, they are not

further discussed in this paper. Workflow management systems (WfMSs), in turn, are

best suited to support business processes which are well structured and have a high de-

gree of repetition (e.g., procurement processes or clearance processes). Often, they are

combined with an additional solution to integrate business processes within and across

enterprises. Case handling systems (CHSs), in turn, are more flexible than traditional

WfMSs [22], but are not suited for integrating heterogeneous application systems. In

addition, for fully automated business processes (i.e., processes without the need for human

interaction), CHSs are not the proper choice. Both workflow management and case

handling are well suited for realizing administrative processes like our eTravel business

process.

[Figure: spectrum of process management paradigms, ranging from data-driven to process-driven: groupware systems (not structured), case handling systems (implicitly structured), workflow management systems (explicitly structured).]

Fig. 2. Process Management Paradigms.

In the following, workflow management and case handling are briefly introduced

(for a detailed qualitative comparison of both paradigms we refer to [22]).

Workflow Management. Contemporary workflow management systems (WfMSs)

enable the modeling, execution, and monitoring of business processes [4]. When work-

ing on a particular activity (i.e., process step), typically, in a WfMS-based PAIS only

data needed for executing this activity is visible to respective actors, but no other work-

flow data. This is also known as “context tunneling” [13]. WfMSs coordinate activity

execution based on routing rules, which are described by process models and which are

strictly separated from processed data (cf. Fig. 3). If an activity is completed, subsequent

activities will become active according to the logic defined by the used process model.

Accompanying this, the worklists of potential actors are updated accordingly. Finally,

for (administrative) processes electronic forms are typically used to implement activities

and to present data being processed.

Case Handling. An alternative BPM paradigm is provided by case handling [13].

A case handling system (CHS) aims at more flexible process execution by avoiding restrictions

known from (conventional) workflow technology (cf. Fig. 4). Examples of such

restrictions include rigid control flow and the aforementioned context tunneling. The

central concepts behind a CHS are the case and its data as opposed to the activities and

routing rules being characteristic for WfMSs. One should think of a “case” as being the

product which is “manufactured” by executing the workflow process. The characteristics

of the product should drive the workflow. Typically, the product is information, e.g.,

a decision based on various data. By focusing on the product characteristics, one can

replace push-oriented routing from one worktray to another by pull-oriented mechanisms

centered around the data objects relevant for a case.

[Figure: a process engineer creates and changes the process model (activities A, B, C); instantiation yields process instances whose activities carry a status (completed, enabled); end users execute tasks through a front-end on top of the process execution engine; process execution data and business objects are kept separately.]

Fig. 3. Architecture of a Workflow Management System.

Usually, CHSs present all data about a case at any time to the user (assuming proper authorization), i.e., context tunneling as known from WfMSs is avoided. Furthermore, CHSs orchestrate the execution of activities based on the data assigned to a case. Thereby, different kinds of data objects are distinguished (cf. Fig. 4). Free data objects are not explicitly associated with a particular activity and can be changed at any point in time during a case execution (e.g., Data Object 3 in Fig. 4). Mandatory and restricted data objects, in turn, are explicitly linked to one or more activities. If a data object is mandatory for an activity, a value will have to be assigned to it before the activity can be completed (e.g., Data Object 5 in Fig. 4). If a data object is restricted for an activity, this activity needs to be active in order to assign a value to the data object (e.g., Data Object 6 in Fig. 4). As in WfMSs, forms linked to activities are used to provide context-specific views on case data. Thereby, a CHS does not only allow assigning an execution role to activities, but also a redo role (to undo an executed activity) and a skip role (to omit the execution of activities). User 2 in Fig. 4, for example, may execute Activities 3, 4, 5 and 6, and redo Activities 2 and 3.

[Figure: a case with Activities 1 to 6 and Data Objects 1 to 7, which are linked to activities as mandatory or restricted data objects and presented through a form; User 1 and User 2 hold execute and redo roles.

Current situation: Activity 4 is active; Data Objects 1, 2, 3 and 4 are available, i.e., values for these data objects have been entered.

Possible actions of User 2: Activity 4 can be executed; Activity 2 can be redone at any time; Activities 3, 5 and 6 cannot be executed (constraint).

Constraints: Activity 3 can be completed if Data Object 5 is available; Activity 5 can be executed if Activities 3 and 4 are completed; Activity 6 can be executed if Activity 5 is completed.]

Fig. 4. Data-driven Case Handling.
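The distinction between free, mandatory, and restricted data objects can be illustrated with a minimal sketch. It is not FLOWer's API; the classes `Activity` and `Case` and their methods are hypothetical names introduced here, and the sketch assumes that a data object without any restricting activity counts as writable at any time.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    name: str
    mandatory: set = field(default_factory=set)   # must have values before completion
    restricted: set = field(default_factory=set)  # writable only while activity is active
    active: bool = False


@dataclass
class Case:
    activities: dict
    data: dict = field(default_factory=dict)

    def can_assign(self, obj: str) -> bool:
        """Unrestricted objects are always writable; restricted ones need an active owner."""
        owners = [a for a in self.activities.values() if obj in a.restricted]
        return not owners or any(a.active for a in owners)

    def assign(self, obj: str, value: object) -> None:
        if not self.can_assign(obj):
            raise PermissionError(f"{obj} is restricted to an inactive activity")
        self.data[obj] = value

    def can_complete(self, activity: str) -> bool:
        """An activity can be completed once all its mandatory objects have values."""
        return self.activities[activity].mandatory.issubset(self.data)
```

With "Data Object 5" declared mandatory for "Activity 3", `can_complete("Activity 3")` stays false until a value has been assigned, which mirrors the first constraint listed for Fig. 4.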

Fig. 5 summarizes the major conceptual differences between workflow management and

case handling, and additionally depicts characteristic representatives of each paradigm.

Despite conceptual differences, both paradigms are suited for implementing administrative

business processes such as our eTravel business process.

Criteria for Comparison | WfMSs | CHSs
C1: Basic focus | activity | case
C2: Primary driver for execution of activities | routing rules | case data
C3: Separation of process control & data | yes | no
C4: Types of roles associated with tasks | execute | execute, skip, redo

Commercial WfMSs: Tibco Staffware, Microsoft BizTalk Server, IBM Websphere MQ Workflow, jBoss jBPM, etc.

Commercial CHSs: Pallas Athena FLOWer, Staffware Case Manager, con:cern (Open Source), etc.

Fig. 5. Selected Criteria for Comparing Workflow Management and Case Handling.

3. Experimental Definition and Planning

This section deals with the definition and planning of our experiment. Section 3.1 explains

its context and Section 3.2 describes its setup. Section 3.3 presents considered hypotheses.

Section 3.4 explains the specific design of our experiment. Factors threatening the validity

of experimental results as well as potential mitigations are discussed in Section 3.5. For

setting up and describing our experiment we follow the recommendations given in [23,24].

We strongly believe that the design of our experiment can be applied in similar form to

many other BPM related scenarios.

3.1. Context Selection

With workflow management and case handling we have introduced two paradigms for

realizing PAISs in Section 2.2. Usually, the selection of “the most suitable” BPM tech-

nology for implementing a PAIS depends on project-specific requirements. While some

IT managers will consider BPM technology as sufficient if best practices are available,

others will take into account more specific selection criteria like the support of a sufficient

degree of process flexibility. Likewise, IT managers are interested in value-based consid-

erations as well [17]. In practice, for example, a frequently asked question is as follows:

Is there a difference in the effort needed for implementing a business process either with

BPM technology A or BPM technology B and - if “yes” - how strong is this difference?

Currently, IT managers typically have to rely on vendor data (e.g., about the return-on-

investment of their products), experience reports, and criteria for qualitative comparisons

as provided by workflow patterns [5] or process change patterns [7]. What has not been

available so far are precise quantitative data regarding the use of workflow management

technology and case handling systems respectively (e.g., concerning the effort for im-

plementing processes) [16,25,17]. To generate quantitative data, and thus to complement

existing qualitative criteria, controlled software experiments offer promising perspectives.

In the following, we pick up this idea and describe an experiment in which we investigate

the effort related to the implementation and adaptation of business processes using either

a WfMS or a CHS.

The main goal of our experiment is to compare the implementation effort of WfMSs

and CHSs. Using the Goal Question Metric (GQM) template for goal definition [26], the

goal of our experiment is defined as follows:

Compare workflow management and case handling technology

for the purpose of evaluating

with respect to their implementation effort

from the point of view of the researchers

in the context of Bachelor and Master of Computer Science students at the

University of Innsbruck

Fig. 6. Goal of our Experiment

3.2. Experimental Setup

This section describes the subjects, objects and selected variables of our experiment, and

presents the instrumentation and data collection procedure.

Subjects: Subjects are 48 students of a combined Bachelor/Master Computer Science

course at the University of Innsbruck. All subjects have a similar level of experience.

They are taught about workflow management and case handling in an introductory

session preceding the execution of our experiment.

Object: The object to be implemented is the eTravel business process (cf. Section 2).

Its specification comprises two parts: an initial “Base Implementation” (Part I ) and

an additional “Change Implementation” (Part II ). While the first part deals with the

realization of the process support for refunding national business trips, the second one

specifies a process change, namely, additional support for refunding international busi-

ness trips. Both parts describe the elements to be implemented; i.e., the process logic,

user roles, and the data to be presented to actors using simple electronic forms. Note that

this experimental design does not only enable us to investigate the effort for (initially)

implementing a business process, but also to examine the effort for subsequent process

changes. In our experiment, by “process change” we mean the adaptation of the

implemented business process. After having realized such a process change, new process

instances can be based on the new process model. We do not investigate the migration

of running process instances to the new process schema in this context [27].

Factor & Factor Levels: In our experiment, BPM technology is the considered factor

with factor levels “WfMS” and “CHS”. Thereby, we use Tibco Staffware [14] (Version

10.1) as typical and widely applied representative of workflow technology. Its build-time

tools include, among other components, a visual process modeling tool and a graphical

form editor. The used CHS, in turn, is FLOWer [15] (Version 3.1), the most widely used

commercial CHS. Like Staffware, FLOWer provides a visual process modeling tool and

a form editor.

Response Variable: In our experiment the response variable is the implementation

effort the subjects (i.e., the students) need for implementing the given object (i.e., the

eTravel specification) with each of the factor levels (WfMS and CHS). All effort values

related to the Staffware implementation are denoted as “WfMS Sample”, while all effort

values related to the FLOWer implementation are called “CHS Sample”.

Instrumentation: To precisely measure the response variable, we have developed a

tool called TimeCatcher (cf. Fig. 7). This “stop watch” allows logging time in six typical

“effort categories” related to the development of a process-oriented application: (1) pro-

cess modeling, (2) data modeling, (3) form design, (4) user/role management, (5) testing,

and (6) miscellaneous effort. To collect qualitative feedback as well (e.g., concerning the

maturity or usability of the applied WfMS and CHS), we use a structured questionnaire.

[Figure: screenshot of the TimeCatcher tool showing Effort Categories 1 to 6, the currently implemented object, the currently used factor level, the ID of the team whose effort is logged, and the logged effort (= response variable).]

Fig. 7. TimeCatcher Tool.
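How a stop-watch style log of this kind can be aggregated is sketched below. The sketch does not reproduce the actual TimeCatcher implementation, which is not described in detail here; the class `EffortLog`, its methods, and the use of seconds as the unit are assumptions made for illustration. Only the six category names are taken from the text.

```python
from collections import defaultdict
from enum import Enum


class EffortCategory(Enum):
    PROCESS_MODELING = 1
    DATA_MODELING = 2
    FORM_DESIGN = 3
    USER_ROLE_MANAGEMENT = 4
    TESTING = 5
    MISCELLANEOUS = 6


class EffortLog:
    """Accumulates logged seconds per (team, factor level, object, category)."""

    def __init__(self) -> None:
        self._seconds = defaultdict(float)

    def log(self, team, factor_level, obj, category, start, stop) -> None:
        """Add one measured interval (start and stop given in seconds)."""
        if stop < start:
            raise ValueError("stop must not precede start")
        self._seconds[(team, factor_level, obj, category)] += stop - start

    def total(self, team, factor_level, obj) -> float:
        """Sum the effort over all categories for one team, factor level and object."""
        return sum(sec for (t, f, o, _), sec in self._seconds.items()
                   if (t, f, o) == (team, factor_level, obj))
```

Keeping the category in the key preserves the per-category breakdown, while `total` yields one effort value per team, factor level, and object.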

Data Collection Procedure: The TimeCatcher tool is used by the students during the

experiment. The aforementioned questionnaire is filled out by them after completing the

experiment.

Data Analysis Procedure: For data analysis well-established statistical methods and

standard metrics are applied (cf. Section 4.3 for details).

3.3. Hypothesis Formulation

Based on the goal of our experiment the following hypotheses are derived:

Differences in Implementation Effort: In our experiment we investigate whether

the used BPM technology has an influence on the response variable implementation ef-

fort.

Does the used BPM technology have an influence on the response variable

“implementation effort”?

Null hypothesis H0,1: There is no significant difference in the effort values

when using workflow technology compared to case handling technology.

Alternative hypothesis H1,1: There is a significant difference in the effort

values when using workflow technology compared to case handling technology.

Learning Effects: Our experiment further investigates learning effects that might occur

when implementing the same business process twice with two different BPM technolo-

gies. In particular, we aim at determining the influence of domain knowledge on imple-

mentation effort. When implementing the eTravel business process with the first BPM

technology the process specification is unknown to all subjects. When implementing the

respective process with the second BPM technology, however, its specification is already

known.

Does knowledge of the process specification have an influence on the response

variable “implementation effort”?

Null hypothesis H0,2: Domain knowledge does not have a statistically significant

impact on the mean effort values for implementing a business process.

Alternative hypothesis H1,2: Domain knowledge has a statistically significant

impact on the mean effort values for implementing a business process.
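Because every team produces one effort value per factor level, a hypothesis such as H0,1 lends itself to a paired comparison. The following sketch computes a paired t statistic with the Python standard library only. It is an assumption that a paired t-test is appropriate; the statistical methods actually applied are those referred to in Section 4.3, and the function name `paired_t_statistic` is hypothetical.

```python
import math
import statistics


def paired_t_statistic(sample_a, sample_b):
    """Return (t, degrees of freedom) for the paired differences a_i - b_i."""
    if len(sample_a) != len(sample_b) or len(sample_a) < 2:
        raise ValueError("need two equally long samples with at least two pairs")
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```

The resulting t value would be compared against the critical value of the t distribution for the returned degrees of freedom at the chosen significance level; H0,1 would be rejected if the absolute value of t exceeds it.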

3.4. Experimental Design

Literature about software experiments provides various design guidelines for setting up an

experiment [28,23,21,8,29]. First, the design of an experiment should allow the collection

of as much data as possible with respect to the major goals of the experiment. Second,

collected data should be unambiguous. Third, the experiment must be feasible within

the given setting (e.g., within the planned time period). Note that meeting these design

criteria is not trivial. Often, an experiment cannot be accomplished as planned due to

its complex design or due to an insufficient number of participants [8].

Considering these design criteria, we accomplish our experiment as a balanced single factor experiment with repeated measurement (cf. Fig. 8). This design is particularly suitable for comparing software development technologies [21]. Our experiment is denoted a single factor experiment since it investigates the effects of one factor¹ (i.e., a particular BPM technology) on a common response variable (e.g., implementation effort). Our experiment design also allows us to analyze variations of a factor called factor levels (i.e., the two BPM tools Staffware and FLOWer). The response variable is determined when the participants of the experiment (i.e., subjects) apply the factor or factor levels to an object (i.e., the base and the change specification of the eTravel business process).

¹ Multi-factor experiments, by contrast, investigate the effects of factor combinations on a common response variable, e.g., effects of a software development technology and a software development process on implementation effort. Even though such experiments can improve the validity of experimental results, they are rarely applied in practice due to their complexity [30].

We denote our experiment as balanced as all factor levels are used by all participants

of the experiment. This enables repeated measurements and thus the collection of more

precise data since every subject generates data for every treated factor level. Generally,

repeated measurements can be realized in different ways. We use a frequently applied

variant which is based on two subsequent runs (cf. Fig. 8). During the first run half of the

subjects apply “Staffware” to the treated object, while the other half uses “FLOWer”.

After having completed the first run, the second run begins. During this second run each

subject applies that factor level to the object not treated so far.

[Figure: in the first run, Subjects 1 to n/2 apply the WfMS (Staffware) and Subjects n/2+1 to n apply the CHS (FLOWer) to the object, i.e., the eTravel process (base and change specification). In the second run, Subjects 1 to n/2 apply the CHS (FLOWer) and Subjects n/2+1 to n apply the WfMS (Staffware) to the same object.]

Fig. 8. Design of our Single Factor Experiment.

In our experiment subjects are not working on their own, but are divided into 4 main

groups each consisting of 4 teams with 3 students (cf. Fig. 9). This results in an overall

number of 16 teams. The students are randomly assigned to teams prior to the start of

the experiment.

[Figure: the 16 teams (Team 01 to Team 16) are split into four main groups of four teams each; the team sets shown are {01, 02, 05, 06}, {03, 04, 07, 08}, {09, 10, 12, 15}, and {11, 13, 14, 16}. Main Groups 1 and 2 use the WfMS in the first run and the CHS in the second run; Main Groups 3 and 4 use the CHS in the first run and the WfMS in the second run. WfMS = Workflow Management System, CHS = Case Handling System.]

Fig. 9. Main Groups and Teams.
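The grouping and run order can be expressed as a short planning routine. This is a sketch under assumptions: the function `plan_experiment` and its parameters are hypothetical, assignment is purely random, and the manual placement of the three students with prior workflow knowledge into different teams (cf. Section 3.5) is not modeled.

```python
import random


def plan_experiment(students, team_size=3, teams_per_group=4, seed=None):
    """Randomly form teams and main groups; return {group: (teams, run order)}."""
    group_size = team_size * teams_per_group
    # A balanced design needs an even number of equally sized main groups.
    if len(students) % (2 * group_size) != 0:
        raise ValueError("need an even number of equally sized main groups")
    shuffled = list(students)
    random.Random(seed).shuffle(shuffled)
    teams = [shuffled[i:i + team_size]
             for i in range(0, len(shuffled), team_size)]
    groups = [teams[i:i + teams_per_group]
              for i in range(0, len(teams), teams_per_group)]
    half = len(groups) // 2
    plan = {}
    for g, group_teams in enumerate(groups, start=1):
        # First half starts with the WfMS, second half with the CHS;
        # the factor levels are switched in the second run.
        order = ("WfMS", "CHS") if g <= half else ("CHS", "WfMS")
        plan[f"Main Group {g}"] = (group_teams, order)
    return plan
```

Called with 48 student identifiers, the routine yields 4 main groups of 4 teams with 3 students each, i.e., the 16 teams described above.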

The mathematical model of our experiment can be summarized as follows: $n$ subjects $S_1, \ldots, S_n$ ($n \in \mathbb{N}$), divided into $m$ teams $T_1, \ldots, T_m$ ($m \in \mathbb{N}$, $m \geq 2$, $m$ even), have to implement the eTravel business process. The respective specification describes a “Base Implementation” $O_1$ (corresponding to the “national case” of the eTravel business process) and a “Change Implementation” $O_2$ (additionally introducing the “international case”). During the experiment one half of the teams ($T_1, \ldots, T_{m/2}$) implements the complete specification (i.e., base and change implementation) using a WfMS ($PMS_1$, Staffware), while the other half ($T_{m/2+1}, \ldots, T_m$) accomplishes this implementation using a CHS ($PMS_2$, FLOWer). After finishing the implementation with the first factor level (i.e., the first run), each team has to implement the eTravel process using the second factor level in a second run (i.e., the development technologies are switched). The response variable “Effort[Time] of $T_m$ implementing $O_i$ using $PMS_j$” is logged with the TimeCatcher tool.
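The response variable can be written more compactly as follows. The notation is ours and not part of the original model: we write $T_k$ for an individual team, $t_c$ for the time logged in effort category $c$, and we assume that the six TimeCatcher categories are summed into a single effort value.

```latex
E(T_k, O_i, PMS_j) \;=\; \sum_{c=1}^{6} t_c(T_k, O_i, PMS_j),
\qquad k \in \{1, \ldots, m\},\quad i \in \{1, 2\},\quad j \in \{1, 2\}
```

Under this reading every team contributes four effort values (two objects, each implemented with both factor levels), so that $m$ teams yield $4m$ observations; for the $m = 16$ teams of this experiment that would be 64 values.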

3.5. Risk Analysis and Mitigations

When conducting experimental research, related risks have to be taken into account

as well. Generally, there exist factors that threaten both the internal validity (“Are the

claims we made about our measurements correct?”) and the external validity (“Can we

justify the claims we made?”) of an experiment.

In our context, threats to internal validity are as follows:

- People: The students participating in our experiment differ in their skills and productivity

for two reasons: (i) general experience with software development might differ

and (ii) experience with BPM technology might not be the same. The first issue can

only be balanced by conducting the experiment with a sufficiently large and represen-

tative set of students. The number of 48 students promises to achieve such balance.

The second issue can be mitigated by using BPM tools unknown to every student. Only

three of the participating students had rudimentary workflow knowledge beforehand.

As this knowledge might influence experimental results, we have assigned those three

students to different teams to minimize potential effects as far as possible. All other

students have been randomly assigned to groups.

-Data collection process: Data collection is one of the most critical threats. Therefore

we have to continuously control data collection during the experiment through close

supervision of the students. We further have to ensure that students understand which

TimeCatcher categories have to be selected during the experiment.

-Time for optimizing an implementation: The specification to be implemented

does not include any guideline concerning the number of electronic forms or their lay-

out. This implies the danger that some teams spend more time for implementing a

“nice” user interface than others do. To minimize such effects, we explicitly indicate

to the students that the development of a “nice” user interface is not a goal of our ex-

periment. To ensure that the implemented solutions are similar across different teams,

we accomplish acceptance tests.

Besides, there are threats to the external validity of experimental results:

-Students instead of professionals: Involving students instead of IT professionals

constitutes a potential threat to the external validity of our experiment. However, the

experiment by [32] evaluating the differences of students and IT professionals suggests

that results of student experiments are (under certain conditions) transferable and can

provide valuable insights into an analyzed problem domain. Furthermore, Runeson [31]

identifies a similar improvement trend when comparing freshman, graduate and profes-

sional developers. Also note that the use of professional developers is hardly possible in

practice as profit-oriented organizations will not simultaneously implement a business

process twice using two different BPM technologies. In addition, using professionals

instead of students would not be entirely free of bias either. In particular, it would be

very difficult to find professionals who are equally experienced with both systems

under investigation.

-Investigation of tools instead of concepts: In our experiment, BPM tools are

used as representatives for the analyzed concepts (i.e., workflow management and case

handling). Investigating the concepts therefore always depends on the quality of the

used tools. To mitigate this risk, the used BPM technologies should be representative

for state-of-the-art technologies in practice (which is the case as both selected BPM

tools are widely used representatives of workflow technology and case handling systems

respectively).

-Choice of object: To mitigate the risk that the chosen business process is favouring

one of the two BPM paradigms (i.e., case handling or workflow management), we

have picked a business process that can be found in many organizations; i.e., the

eTravel business process (cf. Section 2). However, additional experiments are needed

to assess how far our results can be generalized to different types of business processes.

Furthermore, one may argue that the use of UML activity diagrams can threaten the

validity of the experiment as these diagrams are similar to the more explicit, flow-

driven notation of Staffware process models, but different from the more implicit,

data-driven FLOWer process models. However, in practice, UML activity diagrams

(or other activity-centered diagramming techniques like Event-Driven Process Chains

or BPMN) are widely used to describe standard business processes [33]. Thus, the use

of UML activity diagrams can even improve internal validity as a typical practical

scenario is investigated.

4. Performing the Experiment

This section deals with the preparation and execution of the experiment (cf. Section 4.1).

It further covers the analysis and interpretation of the experimental data (cf. Section 4.2).

Finally, it includes a discussion of experimental results (cf. Section 4.3).

4.1. Experimental Operation

Preparation of the Experiment: In the run-up to the experiment, we prepared a

technical specification of the eTravel business process. This specification comprised UML

activity diagrams, an entity relationship diagram describing the generic data structure

of a travel expense report, and tool-specific data models for the considered systems

(Staffware, FLOWer). To ensure that the specification is self-explanatory and correct we

involved two student assistants in its development.

Before the experiment took place, the same two students implemented the specifica-

tion with each of the utilized BPM technologies. This allowed us to ensure feasibility of

the general setup of our experiment and to identify critical issues with respect to the

performance of the experiment. This pre-test also provided us with feedback that helped

to further improve comprehensibility of our specification. Finally, we compiled a “starter

kit” for each participating team. It included original tool documentation, additional doc-

umentation created by us when preparing the experiment (and which can be considered

as a compressed summary from the original documentation), and the technical process

specification.

Experimental Execution: Due to infrastructure limitations, the experiment was split up into two events. While the first one took place in October 2006, the second one was conducted in January 2007. Each event lasted 5 days, involved 24 participants (i.e., students), and was based on the following procedure: Prior to the start of the experiment, all students had to attend an introductory lecture, in which we introduced basic notions of workflow management and case handling and informed them about the goals and rules of the experiment. Afterwards, each team received its “starter kit”. Then, the students had to implement the given eTravel business process specification with both considered factor levels. An implementation was only considered completed once the students had successfully passed the acceptance test. This ensured that all developed solutions corresponded to the specification and were implemented correctly. After finishing their work on the experiment, students filled out the aforementioned questionnaire.

We further optimized the results of our experiment by applying Action Research [35].

Action Research is characterized by an intensive communication between researchers and

subjects. At an early stage we optimized the data collection process by assisting and guid-

ing the students in using the TimeCatcher tool properly (which is critical with respect to

the quality of the gathered data). In addition, we documented emotional reactions of the

students regarding their way of working. This helped us to design the questionnaire. Note

that Action Research did not imply any advice for the students on how to implement

the eTravel business process.

Data Validation: After conducting the experiment the data gathered by the teams

using the TimeCatcher tool was checked. We discarded the data of two teams as it was

flawed. Both teams had made mistakes using the TimeCatcher tool. Finally, the data

provided by 14 teams was considered in our data analysis.

4.2. Data Analysis and Interpretation

We now deal with the analysis of gathered data and the interpretation of results.

4.2.1. Raw Data and Descriptive Analysis

Fig. 10 presents the raw data obtained from our experiment. For both test runs it shows the effort values for the base and the change implementation as well as the overall implementation effort¹ (measured in seconds). In the raw data table the values for the different effort categories (i.e., process modeling, data modeling, form design, user/role management, test, and miscellaneous) are accumulated.

¹ The overall implementation effort is calculated as the sum of the base implementation effort and the change implementation effort.

Experiment Data (effort in seconds; "." is the thousands separator)

         First Test Run                            Second Test Run
Group    System      Base     Change   Overall     System      Base     Change   Overall
 1       Staffware   15.303      816   16.119      Flower      23.261   1.521    24.782
 2       Staffware   20.297      679   20.976      Flower      21.195   2.367    23.562
 3       Flower      33.013    2.497   35.510      Staffware   18.420   1.630    20.050
 4       Flower      29.024    2.608   31.632      Staffware   13.627     586    14.213
 6       Staffware   13.995    1.609   15.604      Flower      19.487   1.942    21.429
 7       Flower      38.814    2.461   41.275      Staffware   11.551     841    12.392
 8       Flower      28.026    3.695   31.721      Staffware   12.650   1.126    13.776
10       Flower      35.616    1.215   36.831      Staffware   15.151   1.745    16.896
11       Staffware   13.747    1.617   15.364      Flower      18.407   1.243    19.650
12       Flower      32.049    2.072   34.121      Staffware   10.491   1.116    11.607
13       Staffware   22.440    3.301   25.741      Flower      18.981   2.525    21.506
14       Staffware   25.132    2.060   27.192      Flower      13.132   2.550    15.682
15       Flower      24.765    3.015   27.780      Staffware    6.677     740     7.417
16       Staffware   23.841    2.133   25.974      Flower      14.830   1.226    16.056

Fig. 10. Raw Data Obtained from the Experiment.

Based on this raw data we calculate descriptive statistics for our response variable implementation effort (cf. Fig. 11). When analyzing Fig. 11 one can observe the following:

– The mean effort values for FLOWer are higher than those for Staffware. This observation holds for the overall implementation effort as well as for the base and change implementations. In other words, implementation effort is smaller for the WfMS Staffware than for the CHS FLOWer.

– The mean effort values for the base implementation are much higher than those for

the change implementation for both Staffware and FLOWer.

– The effort values for implementing the eTravel business process using FLOWer in the

first test run are higher than those for using FLOWer in the second test run.

– The effort values for implementing the eTravel business process using Staffware in the

first test run are higher than those for using Staffware in the second test run.
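As a minimal sketch of how such descriptive statistics can be derived from the raw data, the following Python snippet recomputes the entries for one sample (FLOWer, base implementation, first test run) from the values listed in Fig. 10. The helper name `describe` is hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption.

```python
from statistics import mean, stdev

# Base implementation effort in seconds for FLOWer in the first test run
# (groups 3, 4, 7, 8, 10, 12 and 15 in Fig. 10).
FLOWER_BASE_FIRST_RUN = [33013, 29024, 38814, 28026, 35616, 32049, 24765]


def describe(values):
    """Return N, minimum, maximum, mean and sample standard deviation."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "stdev": stdev(values),  # sample standard deviation (n - 1); an assumption
    }


stats = describe(FLOWER_BASE_FIRST_RUN)
for name, value in stats.items():
    print(f"{name:>5}: {value:.3f}")
```

The same helper can be applied to any other sample of Fig. 10 (e.g., the Staffware values of the second test run) to obtain the remaining rows.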

4.2.2. Data Plausibility

We analyze data plausibility based on box-whisker-plot diagrams. Such diagrams visualize the distribution of a sample and particularly show outliers. A low number of outliers indicates plausible data distributions of the base implementation effort in our experiment. The diagram takes the form of a box that spans the distance between the 25% quantile and the 75% quantile (the so-called interquartile range) surrounding the median, which splits the box into two parts. The “whiskers” are straight lines extending from the ends of the box; the length of a whisker is at most 1.5 times the interquartile range. All results outside the whiskers can be considered as outliers. As can be seen in Fig. 12A, there are no outliers regarding the base implementation effort for the first test run; i.e., all data from these samples lie within the range spanned by the whiskers. However, there exist two outliers (i.e., o4 and o5) in the distribution of the change implementation effort for the first test run (cf. Fig. 12B), and no outliers regarding the distribution of the overall implementation effort for the first test run (cf. Fig. 12C). For the second test run no outliers can be identified at all (cf. Fig. 12D-F).
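The outlier rule described above (values lying more than 1.5 times the interquartile range beyond the box) can be sketched as follows. The function name `whisker_outliers` and the sample values are hypothetical. The quartile convention is that of Python's `statistics.quantiles`, which may differ from the one used by the statistics package behind Fig. 12, so the flagged points need not coincide with those in the figure.

```python
from statistics import quantiles


def whisker_outliers(values, k=1.5):
    """Return all values lying more than k * IQR outside the box."""
    q1, _, q3 = quantiles(values, n=4)  # 25%, 50% and 75% quantiles
    iqr = q3 - q1                       # interquartile range (height of the box)
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


# Hypothetical sample with one value far away from the rest.
sample = [10, 12, 11, 13, 12, 11, 40]
print(whisker_outliers(sample))
```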

Descriptive Statistics (Time [s]; "." is the thousands separator, "," the decimal separator)

System      Object    Test Run   N    Minimum   Maximum   Mean        Standard Deviation
Flower      Base      1st         7   24.765    38.814    31.615,29   4.769,575
Flower      Base      2nd         7   13.132    23.261    18.470,43   3.498,155
Flower      Base      Total      14   13.132    38.814    25.042,86   7.916,249
Flower      Change    1st         7    1.215     3.695     2.509,00     768,146
Flower      Change    2nd         7    1.226     2.550     1.910,57     586,197
Flower      Change    Total      14    1.215     3.695     2.209,79     726,184
Flower      Overall   1st         7   27.780    41.275    34.124,29   4.332,368
Flower      Overall   2nd         7   15.682    24.782    20.381,00   3.492,184
Flower      Overall   Total      14   15.682    41.275    27.253      8.071
Staffware   Base      1st         7   13.747    25.132    19.250,71   4.837,774
Staffware   Base      2nd         7    6.677    18.420    12.652,43   3.697,932
Staffware   Base      Total      14    6.677    25.132    15.951,57   5.369,811
Staffware   Change    1st         7      679     3.301     1.745,00     885,549
Staffware   Change    2nd         7      586     1.745     1.112,00     439,266
Staffware   Change    Total      14      586     3.301     1.428,50     747,577
Staffware   Overall   1st         7   15.364    27.192    20.995,71   5.327,046
Staffware   Overall   2nd         7    7.417    20.050    13.764,43   4.007,169
Staffware   Overall   Total      14    7.417    27.192    17.380,07   5.881,059

Fig. 11. Descriptive Statistics for Response Variable.

[Figure: six box-whisker plots. A) Base Implementation (First Run); B) Change Implementation (First Run); C) Overall Implementation (First Run); D) Base Implementation (Second Run); E) Change Implementation (Second Run); F) Overall Implementation (Second Run).]

Fig. 12. Data Distribution (Box-Whisker-Plot Diagrams).

4.2.3. Testing for Differences in Implementation Effort

In this section we describe the data analysis for our hypothesis $H_{0,1}$. We analyze this 0-hypothesis based on a two-sided t-test [30] (and an additional sign test if the t-test fails). Doing so, we are able to assess whether the means of the WfMS sample and the CHS sample are statistically different from each other. A successful t-test (with $|T| > t_0$, where $T$ is the observed t-statistic and $t_0$ is a predefined value depending on the size of sample $x$ and significance level $\alpha$) rejects our 0-hypothesis. Specifically, the following steps (1) - (4) have to be executed in order to accomplish a t-test (with $\alpha = 0.05$ as the level of significance):

(1) Paired Comparison: The t-test is combined with a paired comparison [30], i.e.,

we analyze “pairs of effort values”. Each pair comprises one effort value from the

WfMS sample and one from the CHS sample. Note that we compose pairs according

to the performance of the teams, i.e., effort values of “good” teams are not combined

with effort values of “bad” teams (cf. [21]). Paired Comparison 1 in Fig. 13, for

example, combines effort values from Main Groups 1 and 3 with effort values from

Main Groups 2 and 4 (precise pairs are shown in Fig. 14).

[Figure: schematic of the six paired comparisons. In each comparison, effort values of Main Groups 1 to 4 are paired between the WfMS sample and the CHS sample.
Paired Comparison 1: Overall Effort, 1st Run
Paired Comparison 2: Overall Effort, 2nd Run
Paired Comparison 3: Base Implementation, 1st Run
Paired Comparison 4: Base Implementation, 2nd Run
Paired Comparison 5: Change Implementation, 1st Run
Paired Comparison 6: Change Implementation, 2nd Run]

Fig. 13. Paired Comparison.

(2) Standardized Comparison Variable: For each pair, a standardized comparison variable $X_j$ is derived. It is calculated by dividing the difference of the two compared effort values by the first effort value:

$$X_j := \frac{EFFORT_{j+m/2} - EFFORT_j}{EFFORT_{j+m/2}} \cdot 100\%$$

In other words, $X_j$ describes how much effort team $T_j$ saves using workflow technology when compared to team $T_{j+m/2}$, which uses case handling technology. Together, all $X_j$ constitute a standardized comparison sample $x = (X_1, \dots, X_{m/2})$, which we use as basis when performing the t-test.

(3) Statistical Measures: For the standardized comparison sample $x$ we calculate its median (m), interquartile range (IQR), expected value ($\mu$), standard deviation ($\sigma$), and skewness (sk).

(4) Two-sided t-Test: Finally, we apply the t-test to $x$. Note that the t-test will only be applicable if $x$ is taken from a normal distribution and the WfMS and CHS samples have the same variance. The first condition can be tested using the Kolmogorov/Smirnov test [34]. In particular, the result of this test has to be smaller than $K_0$ (with $K_0$ being a predefined value depending on the size of $x$ and the chosen significance level $\alpha$). The second condition can be tested using the test for identical variance [34]. The variances of the WfMS and CHS samples will be considered identical if the result of this test is smaller than $F_0$ (with $F_0$ being a predefined value depending on the size of the samples and the chosen significance level $\alpha$).
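Steps (2) and (3) can be illustrated with a short Python sketch. It uses the overall effort values of the first test run from Fig. 10, paired according to the pair labels printed in Fig. 14A (WfMS group | CHS group). The function name `standardized_comparison` is hypothetical, and the quartile convention (the default of `statistics.quantiles`) is an assumption; skewness is omitted because its exact definition is not given in the text.

```python
from statistics import mean, median, quantiles, stdev

# Overall effort in seconds for the first test run (Fig. 10), paired as
# labelled in Fig. 14A: (WfMS group, WfMS effort, CHS group, CHS effort).
PAIRS_FIRST_RUN = [
    (11, 15364, 15, 27780),
    (6, 15604, 8, 31721),
    (1, 16119, 12, 34121),
    (16, 25974, 4, 31632),
    (14, 27192, 7, 41275),
    (2, 20976, 10, 36831),
    (13, 25741, 3, 35510),
]


def standardized_comparison(effort_wfms: float, effort_chs: float) -> float:
    """X_j in percent: relative effort saved with the WfMS compared to the CHS."""
    return (effort_chs - effort_wfms) / effort_chs * 100.0


# Standardized comparison sample x = (X_1, ..., X_{m/2}).
x = [standardized_comparison(w, c) for _, w, _, c in PAIRS_FIRST_RUN]

q1, _, q3 = quantiles(x, n=4)  # default "exclusive" method; an assumption

print(f"median = {median(x):.2f} %")
print(f"IQR    = [{q1:.2f} ; {q3:.2f}]")
print(f"mean   = {mean(x):.2f} %")
print(f"stdev  = {stdev(x):.2f}")
```

Under these assumptions the sketch yields a median of about 43.05% and a mean of about 38.69%, in line with the statistical data printed for Paired Comparison 1 in Fig. 14.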

Fig. 14A shows the results of our analysis regarding overall implementation effort. The effort values for the workflow implementation are lower than those for the case handling implementation. This difference is confirmed by the results of the (successful) t-tests for both the first and the second run, i.e., our 0-hypothesis $H_{0,1}$ is to be rejected. In the first run, the use of workflow technology results in median effort savings of 43.04% (interquartile range from 27.51% to 50.81%) when compared to the effort for using CHS-based technology. In the second run, the use of workflow technology still results in median savings of 28.29% (interquartile range from 11.48% to 53.16%).
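The test statistics behind this result can be recomputed from the raw data. The sketch below assumes a two-sample t statistic with pooled variance and a simple variance ratio as the test for identical variance. These assumptions, including the direction of the ratio (CHS over WfMS), are chosen because they match the values printed in Fig. 14 for the first run; the text itself does not spell out the formulas. The critical values are copied from Fig. 14.

```python
from math import sqrt
from statistics import mean, variance

# Overall effort in seconds for the first test run (Fig. 10).
WFMS_FIRST_RUN = [16119, 20976, 15604, 15364, 25741, 27192, 25974]  # Staffware
CHS_FIRST_RUN = [35510, 31632, 41275, 31721, 36831, 34121, 27780]   # FLOWer

T_CRITICAL = 2.179  # critical value for T as printed in Fig. 14 (alpha = 0.05)
F_CRITICAL = 4.284  # critical value for F as printed in Fig. 14


def pooled_t_statistic(a, b):
    """Two-sample t statistic assuming equal variances (pooled estimate)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    standard_error = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / standard_error


# Variance ratio; the ordering CHS / WfMS is an assumption (see lead-in).
f_observed = variance(CHS_FIRST_RUN) / variance(WFMS_FIRST_RUN)
t_observed = pooled_t_statistic(WFMS_FIRST_RUN, CHS_FIRST_RUN)

print(f"F = {f_observed:.3f}, below critical value: {f_observed < F_CRITICAL}")
print(f"t = {t_observed:.3f}, reject H0,1: {abs(t_observed) > T_CRITICAL}")
```

A negative t value here simply reflects that the mean WfMS effort is lower than the mean CHS effort.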

Fig. 14B and Fig. 14C show results for the base implementation as well as the change implementation. Again, our results allow the rejection of the 0-hypothesis $H_{0,1}$ (the failed t-test for the change implementation in the first run can be compensated with a successful sign test). Using workflow technology results in median effort savings of 43.01% for the base implementation in the first run (interquartile range from 32.03% to 50.06%). In the second run, the use of workflow technology results in effort savings of 28.52% when compared to case handling effort. Regarding the change implementation, the use of workflow technology results in effort savings of 44.11% in the first run (interquartile range from 16.29% to 56.45%) and 40.46% in the second run.

Fig. 15 illustrates that the effort values of the WfMS sample are smaller for all six effort categories when compared to the CHS sample.

The additional analysis of our questionnaire provides possible explanations for these differences. Fig. 16A shows that the concepts underlying workflow technology seem to be easier to understand, i.e., the case handling paradigm is considered as being more difficult. Finally, Fig. 16B deals with the usability of the applied process management systems. The subjective results obtained from the questionnaire show that the students perceive Staffware as being more user-friendly when compared to FLOWer.

Based on the results of the questionnaire we assume that the observed differences in implementation effort between case handling and workflow technology are (1) due to conceptual differences (i.e., the case handling concept was perceived as being more complicated) and (2) due to differences in the tools (i.e., the used case handling system FLOWer is perceived as being less user-friendly). Further experiments with different designs are needed to confirm these assumptions.

4.2.4. Testing for Learning Effects

To investigate learning effects we compare the effort values of the groups using Staffware

in the first run with those groups using Staffware in the second test run. In addition, we

repeat this procedure for FLOWer.

As all preconditions for the t-test are met (samples are normally distributed and have the same variance), we test 0-hypothesis $H_{0,2}$ regarding the learning effects using the t-test (with $\alpha = 0.05$ as the level of significance). The t-test reveals that there is a statistically significant difference between effort values for the first test run and for the second one.

[Figure: three panels plotting normalized effort values per pair of effort values for the WfMS and CHS samples, each with statistical data for the standardized comparison sample. A) Paired Comparison (Overall Efforts); B) Paired Comparison (Base Implementation); C) Paired Comparison (Change Implementation).]

Pairs of effort values (WfMS group | CHS group):
  1st run: 11|15, 6|8, 1|12, 16|4, 14|7, 2|10, 13|3
  2nd run: 15|11, 8|6, 12|1, 4|16, 7|14, 10|2, 3|13

Statistical Data (critical values: K = 0.349, F = 4.284, T = 2.179)

                         A) Overall Efforts                B) Base Implementation            C) Change Implementation
                         1st run          2nd run          1st run          2nd run          1st run          2nd run
Paired Comparison        1                2                3                4                5                6
Median                   43.048           28.2913          43.0116          28.5209          44.1152          40.4666
Interquartile range      [27.51 ; 50.81]  [11.48 ; 53.16]  [32.03 ; 50.06]  [8.11 ; 54.90]   [16.29 ; 56.45]  [26.63 ; 52.20]
Expected value           38.6896          31.2358          39.2788          29.3332          29.9807          41.4368
Standard deviation       12.7703          20.6792          11.9261          23.5067          32.4034          14.4722
Skewness                 -0.6309          0.4490           -0.9141          0.4401           -1.3034          0.8501
Kolmogorov/Smirnov Z     0.135            0.129            0.138            0.198            0.172            0.198
Observed f-statistic     0.661            0.76             0.972            0.895            0.752            1.784
Observed t-statistic     -5.059           -3.294           -4.816           -3.024           -1.724 (t-test failed)  -2.884

Fig. 14. Experimental Results

Fig. 17A and Fig. 17B show the results for Staffware and FLOWer with respect to overall implementation effort. When comparing the effort values for the two test runs, it can be observed that for both systems the effort values for the first run are generally higher than those for the second run. Regarding FLOWer, in the second run even the slowest group is faster than the fastest one in the first run. These differences are confirmed by the results of the t-test for both FLOWer and Staffware, i.e., our 0-hypothesis $H_{0,2}$ is rejected at a significance level of 0.05. For FLOWer the mean difference between first and second test run is 13743 seconds (95% confidence interval from 9161 to 18326) and for Staffware it is 7231 seconds (95% confidence interval from 1742 to 12721).
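The Staffware figures quoted above (a mean difference of about 7231 seconds with bounds of roughly 1742 and 12721) can be recomputed from the raw data in Fig. 10. The following sketch assumes an independent two-sample t-test with pooled variance and uses 2.179 as the two-sided critical value for 12 degrees of freedom. The function name `compare_runs` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance

# Overall effort in seconds for the teams using Staffware (Fig. 10).
STAFFWARE_FIRST_RUN = [16119, 20976, 15604, 15364, 25741, 27192, 25974]
STAFFWARE_SECOND_RUN = [20050, 14213, 12392, 13776, 16896, 11607, 7417]

T_CRITICAL = 2.179  # two-sided critical value, 12 degrees of freedom, alpha = 0.05


def compare_runs(first, second, t_critical=T_CRITICAL):
    """Return mean difference, t statistic and confidence interval bounds."""
    n1, n2 = len(first), len(second)
    diff = mean(first) - mean(second)
    pooled_var = ((n1 - 1) * variance(first) + (n2 - 1) * variance(second)) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_statistic = diff / standard_error
    interval = (diff - t_critical * standard_error, diff + t_critical * standard_error)
    return diff, t_statistic, interval


diff, t, (lower, upper) = compare_runs(STAFFWARE_FIRST_RUN, STAFFWARE_SECOND_RUN)
print(f"mean difference = {diff:.1f} s, t = {t:.3f}")
print(f"95% interval    = [{lower:.0f} ; {upper:.0f}]")
print("H0,2 rejected:", abs(t) > T_CRITICAL)
```

An interval that does not contain zero corresponds to a statistically significant difference at the chosen level.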

Fig. 17C and 17D show the results for Staffware and FLOWer with respect to base implementation effort. As for the overall implementation effort, the differences in implementation effort between the first and the second test run are statistically significant, which leads to a rejection of 0-hypothesis $H_{0,2}$. For FLOWer the mean difference between first and second test run is 13145 seconds and for Staffware it is 6598 seconds.

[Figure: effort distribution across the six effort categories (Process Modeling, Data Modeling, Form Design, User/Role Management, Test, Miscellaneous) for the CHS and WfMS samples. A) Base Implementation (Staffware vs. FLOWer); B) Change Implementation (Staffware vs. FLOWer); C) Staffware vs. FLOWer.]

Fig. 15. Understanding Effort Distribution.

A) Methodical Soundness of Implementation
Question: The methodical steps of the business process implementation was clear during the experiment?

Answer             WfMS            CHS
A (yes)            13 (27.66%)      0 (00.00%)
B (rather yes)     25 (61.70%)      9 (19.15%)
C (indifferent)     5 (10.64%)     10 (21.28%)
D (rather no)       0 (00.00%)     21 (44.68%)
E                   0 (00.00%)      7 (14.89%)

B) Usability
Question: How would you rate the usability of the process management systems, which have been used during the experiment?

Answer             WfMS            CHS
A (very good)       5 (10.64%)      1 (02.13%)
B (good)           21 (44.68%)      1 (02.13%)
C (rather good)     9 (19.15%)      5 (10.64%)
D (indifferent)     8 (17.02%)      5 (10.64%)
E (rather weak)     4 (08.51%)     14 (29.79%)
F (weak)            0 (00.00%)     14 (29.79%)
G (very weak)       0 (00.00%)      7 (14.89%)

(Entries are absolute nominations with their percentage share.)

Fig. 16. Selected Questionnaire Results (Part 1).

Fig. 17E and 17F show the results with respect to change implementation effort. Only the effort savings for FLOWer between the first and second test run with a mean difference of 598 seconds are statistically significant. For Staffware, the t-test fails, but can be compensated by a successful Mann-Whitney U-test.

The observed differences in implementation effort between the two test runs can be explained either through an increasing process knowledge gathered during the experiment or an increasing tool knowledge, which is partially transferable when working with other PAISs.

[Figure: six box plots of implementation effort [s] for the 1st and the 2nd run, each accompanied by the results of Levene’s test for equality of variances and of the t-test for equality of means. A) Differences in Overall Implementation Efforts (Staffware); B) Differences in Overall Implementation Efforts (FLOWer); C) Differences in Base Implementation Efforts (Staffware); D) Differences in Base Implementation Efforts (FLOWer); E) Differences in Change Implementation Efforts (Staffware); F) Differences in Change Implementation Efforts (FLOWer).]

           Levene’s Test      T-test for Equality of Means
Panel      (1) F    (2) Sig.  (3) t   (4) df  (5) Sig. (2-tailed)  (6) Mean Difference  (7) Lower    (8) Upper
A)         1,896    0,194     2,870   12      0,014                 7.231,286            1.741,788   12.720,784
B)         0,128    0,726     6,535   12      0,000                13.743,286            9.160,762   18.325,810
C)         2,046    0,178     2,867   12      0,014                 6.598,2857           1.583,718   11.612,85
D)         0,824    0,382     5,880   12      0,000                13.144,857            8.273,863   18.015,85
E)         1,881    0,195     1,694   12      0,116                   633,00000           -181,0514   1.447,051
F)         0,004    0,950     1,639   12      0,127                   598,42857           -197,3068   1.394,164

(1) observed F statistic
(2) significance value for the Levene test. A value above the significance level of 0.05 indicates that the two samples have equal variances.
(3) observed t statistic for each sample, calculated as the ratio of the difference between sample means divided by the standard error of the difference.
(4) degrees of freedom, calculated as the total number of cases in both samples minus 2.
(5) significance value for the t-test, provides the probability of obtaining an absolute value greater than or equal to the observed t statistic, if the difference between the sample means is purely random. A value below the significance level of 0.05 indicates the differences between the two sample means are statistically significant.
(6) mean difference, calculated by subtracting the sample mean for group 2 from the sample mean for group 1
(7+8) provide an estimate of the boundaries between which the true mean difference lies in 95% of all possible random samples of 14 teams. If the confidence interval does not contain zero, this also indicates that there are statistically significant differences.

Fig. 17. Implementation Effort for 1st and 2nd Test Run

The results of the questionnaire provide possible explanations for the observed learning

effects. Fig. 18A illustrates that according to questionnaire results process knowledge

gained during the first run significantly simplifies the second run. By contrast, Fig. 18B

shows that increased efficiency during the second run can be related only to a much

smaller degree to a gained tool knowledge. Consequently, we assume that the observed

differences are primarily related to increased process knowledge, and not necessarily to

learning effects concerning the used BPM technologies (i.e., tool knowledge).

A) Impact of Process Knowledge
Question: How strong has the process knowledge, which was gained during the first implementation, simplified the second implementation?

Answer              Absolute nominations
A (very strong)      4 (08.51%)
B (strong)          14 (29.79%)
C (rather strong)   21 (44.68%)
D (indifferent)      5 (10.64%)
E (rather weak)      2 (04.26%)
F (weak)             1 (02.13%)
G (very weak)        0 (00.00%)

B) Impact of Tool Knowledge
Question: How strong has the tool knowledge, which was gained during the first implementation, simplified the second implementation?

Answer              Absolute nominations
A (very strong)      2 (04.26%)
B (strong)           2 (04.26%)
C (rather strong)   11 (23.40%)
D (indifferent)     15 (31.91%)
E (rather weak)      6 (12.77%)
F (weak)             4 (08.51%)
G (very weak)        7 (14.89%)

Fig. 18. Selected Questionnaire Results (Part 2).

4.2.5. Additional Observations

Fig. 19 shows that the initial implementation of the eTravel business process (i.e., its

base implementation) takes significantly more time than subsequent changes. Partially

these differences are due to the smaller scope of the change implementation (cf. Fig.

1). While the base implementation requires the implementation of four activities, two

decision-points and four forms, the change implementation comprises one additional ac-

tivity and two additional decision-points. Furthermore, two existing forms need to be

slightly adapted.

Besides differences in the scope of base and change implementation it can be assumed

that tool-specific learning effects are contributing to these differences. In our experiment

design the change implementation immediately follows the base implementation. Conse-

quently, increasing tool knowledge gathered while working on the base implementation

might influence change implementation effort.

To investigate whether these differences apply to the same degree to both FLOWer and Staffware, we calculate a comparison variable changeRatio, which is defined as the ratio of change and base implementation effort. Fig. 20 shows descriptive statistics for variable changeRatio. It can be observed that for FLOWer the mean change ratio in the first test run is 8.3% (ranging from 3.41% to 13.18%). For the second test run it is 10.77% (ranging from 6.54% to 19.42%). Similarly, for Staffware the mean change ratio for the first test run is 9.11% (ranging from 3.35% to 14.71%) and 8.94% for the second run (ranging from 4.3% to 11.52%). However, when comparing the effort savings between base and change implementation, there are no statistically significant differences between Staffware and FLOWer. Interestingly, change ratios are very similar even though Staffware and FLOWer represent different process support paradigms.

[Figure: Base Implementation vs. Process Change (Overall): effort per category (Process Modeling, Data Modeling, Form Design, User/Role Management, Test, Miscellaneous) for Base Implementation and Change Implementation.]

Fig. 19. Base Implementation vs. Change Implementation

Base Implementation versus Change Implementation

System     Test Run  N  Minimum  Maximum  Mean    Standard Deviation
FLOWer     1st       7  3.41%    13.18%   8.30%   3.44%
FLOWer     2nd       7  6.54%    19.42%   10.77%  4.52%
Staffware  1st       7  3.35%    14.71%   9.11%   3.93%
Staffware  2nd       7  4.30%    11.52%   8.94%   2.53%

Fig. 20. Ratio Between Change Implementation and Base Implementation
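
The following minimal sketch illustrates how a variable such as changeRatio and its descriptive statistics could be computed. The logged efforts are hypothetical and the function names are our own. The last step applies Welch's t statistic to the first-run summary statistics of Fig. 20, assuming that the reported standard deviations are sample standard deviations and that the two groups are independent; this is an illustration and not necessarily the test used in the experiment.

```python
import math
import statistics


def change_ratio(change_effort, base_effort):
    """Ratio of change implementation effort to base implementation effort."""
    if base_effort <= 0:
        raise ValueError("base effort must be positive")
    return change_effort / base_effort


def describe(ratios):
    """Descriptive statistics of the kind reported per system and test run."""
    return {
        "n": len(ratios),
        "min": min(ratios),
        "max": max(ratios),
        "mean": statistics.mean(ratios),
        "sd": statistics.stdev(ratios),  # sample standard deviation
    }


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic for two independent groups, from summary statistics."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


# Hypothetical logged efforts in minutes, as (base, change) pairs for three teams.
logged = [(600, 45), (720, 60), (540, 54)]
ratios = [change_ratio(change, base) for base, change in logged]
summary = describe(ratios)

# First test run, values in percent as listed in Fig. 20:
# FLOWer (mean 8.30, SD 3.44, N 7) vs. Staffware (mean 9.11, SD 3.93, N 7).
t_first_run = welch_t(8.30, 3.44, 7, 9.11, 3.93, 7)

print(summary)
print(round(t_first_run, 2))
```

The small absolute value of the t statistic is in line with the observation that the change ratios of the two systems do not differ significantly.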

4.3. Discussion

Our results indicate that process implementations based on workflow technology require less effort than implementations based on case handling technology (cf. Fig. 14). As illustrated in Fig. 15, these effort savings apply to all six effort categories (i.e., process modeling, user/role management, form design, data modeling, test, and miscellaneous). Moreover, our results show that initial process implementations require significantly more effort than subsequent process changes (cf. Fig. 19). Interestingly, the ratio between the effort for subsequent process changes and the effort for initial process implementations seems to be tool-independent (cf. Fig. 20). This is particularly important for policy makers, who often focus on short-term costs (e.g., for purchasing BPM technology and initially implementing business processes) rather than on long-term benefits (e.g., lower costs for realizing process changes).

Finally, our data indicate that growing knowledge about the processes to be implemented increases the productivity of software developers (cf. Fig. 17). Regardless of which BPM technology is used first, all teams significantly reduce their effort in the second run. Questionnaire results further indicate that this effect is not necessarily related to increasing knowledge about the BPM technology used, but is fostered by increasing process knowledge. This also emphasizes the need to involve domain experts with high process knowledge when applying BPM technology.

Regarding our experimental design, we must acknowledge that the experiment results are influenced by the quality of the BPM tools used. However, by selecting leading commercial BPM tools as representatives of the analyzed concepts (i.e., workflow management and case handling), we reduce the impact that tool quality has on the results of our experiment. Still, based on this single experiment, we cannot conclude that the effort related to workflow management is generally smaller than the effort related to case handling. For this purpose, additional experiments with different experimental designs and more specific research questions are needed, e.g., experiments comparing conventional WfMSs, adaptive WfMSs (see footnote 2), and CHSs regarding their effectiveness when realizing particular process changes.

We have applied the described experimental results in the EcoPOST project [36]. This project aims to develop an approach for investigating complex causal dependencies and related cost effects in PAIS engineering projects. In particular, our results enable us to quantify causal dependencies in such projects. As an example, consider the impact of process knowledge on the productivity of process implementation [36,25]. In addition, our experimental results enable us to specify the effort distribution across the six analyzed effort categories (cf. Fig. 21).

5. Related Work

There exist a number of approaches dealing with the evaluation of (economic) effects related to PAISs. So far, the focus has been on analyzing the impact of WfMSs on business process performance. The experimental design most similar to ours is provided by [21]. This work investigates the impact of workflow technology on software development and software maintenance. Results indicate that the effort for realizing process-oriented information systems can be significantly reduced when using workflow technology (instead of conventional programming techniques). Oba et al. [39], in turn, investigate the introduction of WfMSs to an organization and particularly focus on the identification of factors that influence work efficiency, processing time, and business process standardization. A mathematical model is provided for predicting the rate of reduction of processing times. An extension is the work of Reijers and van der Aalst [40,41]. They use process simulation to compare pre- and post-implementations of information systems that rely on WfMSs. The focus is on analyzing business process performance based on criteria such as lead time, waiting time, service time, and utilization of resources. In most cases, the use of workflow technology has resulted in a significant decrease of lead and service time.
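
To make the performance criteria named above concrete, the following minimal sketch derives lead time, service time, waiting time, and resource utilization for a single case. It relies on common textbook definitions rather than on the models of [40,41]; all names and numbers are hypothetical, and work items are assumed not to overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkItem:
    start: float  # hours since the case arrived
    end: float    # hours since the case arrived


def case_metrics(arrival, completion, work_items):
    """Lead, service, and waiting time of one case (non-overlapping work items)."""
    lead_time = completion - arrival
    service_time = sum(item.end - item.start for item in work_items)
    return {
        "lead_time": lead_time,
        "service_time": service_time,
        "waiting_time": lead_time - service_time,
    }


def utilization(busy_hours, available_hours):
    """Share of the available time in which a resource is actually working."""
    if available_hours <= 0:
        raise ValueError("available hours must be positive")
    return busy_hours / available_hours


# Hypothetical case: arrives at hour 0, completes at hour 10, two work items.
metrics = case_metrics(0.0, 10.0, [WorkItem(1.0, 3.0), WorkItem(6.0, 7.5)])
resource_utilization = utilization(metrics["service_time"], 8.0)

print(metrics)
print(resource_utilization)
```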

Choenni et al. [42] present a model to measure the added value of WfMSs to business

processes. This model builds upon different performance criteria, i.e., parameters of a

business process that are affected by the introduction of a WfMS (such as speed, quality,

flexibility, and reliability).

Footnote 2: Adaptive WfMSs extend traditional WfMSs with the ability to flexibly deal with process changes during run-time (e.g., to dynamically add, delete, or move activities in the flow of control) [37,38].

Fig. 21. Distribution of Logged Effort. [Figure: six panels showing the share of logged effort per category (User/Role Management, Test, Miscellaneous, Process Modeling, Data Modeling, Form Design): A) Base Implementation (Staffware), B) Base Implementation (FLOWer), C) Base Implementation (Overall), D) Change Implementation (Staffware), E) Change Implementation (FLOWer), F) Change Implementation (Overall). Extracted percentage labels, grouped as they appear (the assignment to panels and categories is not recoverable): 1, 20, 14, 17, 21, 27 | 3, 14, 24, 17, 18, 24 | 2, 16, 21, 17, 19, 25 | 8, 28, 10, 27, 10, 17 | 8, 24, 11, 25, 13, 19 | 6, 33, 6, 36, 6, 13.]

Aiello [43] introduces a measurement framework for evaluating workflow performance.

The framework is defined in an abstract setting to enable generality and to ensure inde-

pendence from existing WfMSs.

Becker et al. [44] introduce a framework to identify those processes that can be supported by a WfMS in a "profitable" way. Their framework can serve as a guideline for evaluating processes during the selection and introduction of a WfMS. It contains three groups of criteria: technical, organizational, and economic criteria. Designed as a scoring model, their approach enables users to systematically determine those business processes that can be automated using a WfMS.
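
As an illustration of the general idea behind such a scoring model, the following minimal sketch ranks candidate processes by a weighted average of ratings. The criteria, weights, ratings, and process names are hypothetical and are not taken from [44]; only the three criteria groups follow the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    group: str     # "technical", "organizational", or "economic"
    weight: float  # hypothetical relative importance


def score_process(criteria, ratings):
    """Weighted average of the ratings (0 to 10) over all criteria."""
    total_weight = sum(criterion.weight for criterion in criteria)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    weighted_sum = sum(c.weight * ratings[c.name] for c in criteria)
    return weighted_sum / total_weight


# Hypothetical criteria, one per group.
criteria = [
    Criterion("system integration needs", "technical", 2.0),
    Criterion("number of involved roles", "organizational", 1.0),
    Criterion("case volume per year", "economic", 3.0),
]

# Hypothetical ratings for two candidate processes.
candidates = {
    "travel request": {
        "system integration needs": 8,
        "number of involved roles": 6,
        "case volume per year": 9,
    },
    "ad-hoc inquiry": {
        "system integration needs": 3,
        "number of involved roles": 4,
        "case volume per year": 2,
    },
}

# Rank candidates from highest to lowest score.
ranking = sorted(
    ((name, score_process(criteria, ratings)) for name, ratings in candidates.items()),
    key=lambda entry: entry[1],
    reverse=True,
)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```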

A different approach is proposed by Abate et al. [45]. This work introduces a mea-

surement language to evaluate the performance of automated business processes: the

“workflow performance query language” (WPQL). This language allows the definition of

metrics independently of a specific workflow implementation.

While the approaches described above investigate (economic) effects related to PAISs from a quantitative perspective, existing work on workflow patterns provides qualitative evaluation criteria. Patterns have been introduced for analyzing the expressiveness of process meta models [5,46,47]. In this context, control flow patterns describe different constructs to specify activities and their ordering. In addition, workflow data patterns [48] provide ways for modeling the data aspect in PAISs, and workflow resource patterns [49] describe how resources can be represented in workflows. Moreover, a set of change patterns and change support features has been proposed by Weber et al. [7,50] to compare PAISs regarding their ability to deal with process change. Furthermore, patterns for describing service interactions and process choreographies [51] as well as exception handling patterns [52] have been proposed. The introduction of workflow patterns has had significant impact on PAIS design as well as on the evaluation of PAISs and process languages like BPEL [46], BPMN [53], EPC [46], and UML [54]. In the context of this paper, the patterns-based evaluations of both Staffware and FLOWer are particularly noteworthy [5,7].

To evaluate the suitability of a BPM technology for a particular scenario, patterns are important but not sufficient. In addition to qualitative criteria, quantitative data is needed to support IT decision makers in the selection of suitable technologies. With this paper, we want to stimulate more experimental research in the BPM field to provide such data.

6. Summary and Outlook

This paper presents the results of a controlled BPM software experiment with 48 students. Our results indicate that business process implementations based on traditional workflow technology require less effort than implementations based on case handling technology. Moreover, initial process implementations require more effort than subsequent process changes (independently of whether workflow technology or case handling technology is used). In addition, our results show that the impact of domain knowledge on implementation effort is not negligible. Importantly, our results complement existing research on workflow patterns [5,7]. While patterns facilitate the selection of appropriate BPM technologies by providing qualitative selection criteria, our experiment contributes to satisfying the increasing demand of enterprises for quantitative data regarding the effort related to BPM technologies [17].

Future work will include additional experiments to investigate the role of domain knowledge and tool knowledge in more detail and to confirm our observation that the observed learning effects are not tool-specific. In addition, we plan more specific experiments to investigate the effort related to process modeling and process change. In particular, we aim at conducting similar experiments to assess whether, and if so to what extent, our results can be transferred to different business processes and different types of subjects. We also plan to investigate whether the usage of change patterns [7] leads to a lower effort for process modeling and process change. Moreover, we want to analyze the impact of using business process refactorings [55] on process maintenance effort.

References

[1] Y. L. Antonucci, Using Workflow Technologies to improve Organizational Competitiveness, Int’l. J.

of Management, 14(1), pp. 117-126, 1997.

[2] R. Lenz and M. Reichert, IT Support for Healthcare Processes - Premises, Challenges, Perspectives, Data and Knowledge Engineering, 61(1), pp. 39-58, 2007.

[3] M. Dumas and W.M.P. van der Aalst and A.H.M. ter Hofstede, Process-aware Information Systems,

Wiley, 2005.

[4] M. Weske, Business Process Management: Concepts, Methods, Technology, Springer, 2007.

[5] W.M.P. van der Aalst and A.H.M. ter Hofstede and B. Kiepuszewski and A.P. Barros, Workflow Patterns, Distributed and Parallel Databases, 14(1), pp. 5-51, 2003.

[6] B. Weber and S. Rinderle-Ma and M. Reichert, Change Patterns and Change Support Features

- Enhancing Flexibility in Process-aware Information Systems, Data and Knowledge Engineering

66(3), pp. 438-466, 2008.

[7] B. Weber and S. Rinderle and M. Reichert, Change Patterns and Change Support Features in Process-Aware Information Systems, Proc. CAiSE '07, LNCS 4495, pp. 574-588, 2007.

[8] D. I. K. Sjoberg and J. E. Hannay and O. Hansen and V. B. Kampenes and A. Karahasanovic and N.-K. Liborg and A. C. Rekdal, A Survey of Controlled Experiments in Software Engineering, IEEE Transactions on Software Engineering, 31(9), pp. 733-753, 2005.

[9] G. J. Myers, A controlled Experiment in Program Testing and Code Walkthroughs/Inspections,

Communications of the ACM, 21(9), pp. 760-768, 1978.

[10] C. M. Lott and H. D. Rombach, Repeatable Software Engineering Experiments for Comparing

Defect-Detection Techniques, Empirical Software Engineering, 1(3), pp. 241-277, 1996.

[11] B. Mutschler and M. Reichert and J. Bumiller, Unleashing the Effectiveness of Process-oriented

Information Systems: Problem Analysis, Critical Success Factors, Implications, IEEE Transactions

on Systems, Man, and Cybernetics (Part C), 38(3), pp. 280-291, 2008.

[12] W.M.P. van der Aalst and K. van Hee, Workflow Management, MIT Press, 2004.

[13] W.M.P. van der Aalst and M. Weske and D. Grünbauer, Case Handling: A New Paradigm for Business Process Support, Data and Knowledge Engineering, 53(2), pp. 129-162, 2005.

[14] Tibco, Staffware Process Suite, User Manual, 2005.

[15] Pallas Athena, Case Handling with FLOWer: Beyond Workflow, 2002.

[16] B. Mutschler and B. Weber and M. Reichert, Workflow Management versus Case Handling - Results

from a Controlled Software Experiment, Proc. 23rd Annual ACM Symposium on Applied Computing

(SAC ’08), Special Track on Coordination Models, Languages, Architectures, pp. 82-89, 2008.

[17] B. Mutschler, Modeling and Simulating Causal Dependencies on Process-aware Information Systems

from a Cost Perspective, PhD Thesis, University of Twente, 2008.

[18] J. Chang, Envisioning the Process-Centric Enterprise, EAI Journal, August 2002, pp. 30-33, 2002.

[19] J. Dehnert and W.M.P. van der Aalst, Bridging the Gap between Business Models and Workflow

Specification, Int’l. Journal of Cooperative Information Systems, 13(3), pp. 289-332, 2004.

[20] M. Reichert and S. Rinderle and U. Kreher and P. Dadam, Adaptive Process Management with

ADEPT2, Proc. ICDE ’05, pp. 1113-1114, 2005.

[21] N. Kleiner, Can Business Process Changes Be Cheaper Implemented with Workflow-Management-

Systems?, Proc. IRMA 2004, pp. 529-532, 2004.

[22] C. W. Günther, M. Reichert and W. M. P. van der Aalst, Supporting Flexible Processes with Adaptive Workflow and Case Handling, Proc. WETICE '08, 2008.

[23] N. Juristo and A. M. Moreno, Basics of Software Engineering Experimentation, 2001.

[24] C. Wohlin, P. Runeson, M. Höst, M.C. Ohlsson, B. Regnell and A. Wesslen, Experimentation in Software Engineering: an Introduction, Kluwer Academic Publishers, 2000.

[25] B. Mutschler and M. Reichert, On Modeling and Analyzing Cost Factors in Information Systems

Engineering, Proc. CAiSE 2008, LNCS 5074, pp. 510-524, 2008.

[26] V. R. Basili and H. D. Rombach, The TAME Project: Towards Improvement-oriented Software Environments, IEEE Transactions on Software Engineering, 14(6), pp. 758-773, 1988.

[27] S. Rinderle and M. Reichert and P. Dadam, Flexible Support of Team Processes by Adaptive Workflow Systems, Distributed and Parallel Databases, 16(1), pp. 91-116, 2004.

[28] V. R. Basili and R. W. Selby and D. H. Hutchens, Experimentation in Software Engineering, IEEE Transactions on Software Engineering, 12(7), pp. 733-743, 1986.

[29] M. V. Zelkowitz and D. R. Wallace, Experimental Models for Validating Technology, IEEE

Computer, 31(5), pp. 23-31, 1998.

[30] L. Prechelt, Controlled Experiments in Software Engineering (in German), Springer, 2001.

[31] P. Runeson, Using Students as Experiment Subjects - An Analysis on Graduate and Freshmen Student Data, Proc. 7th Int'l. Conf. on Empirical Assessment & Evaluation in Software Engineering (EASE '03), pp. 95-102, 2003.

[32] M. Höst and B. Regnell and C. Wohlin, Using Students as Subjects - A Comparative Study of Students and Professionals in Lead-Time Impact Assessment, Empirical Software Engineering, 5(3), pp. 201-214, 2000.

[33] A. W. Scheer, Aris-Business Process Frameworks, Springer, 1999.

[34] D. J. Sheskin, Handbook of Parametric and Nonparametric Statistical Procedures, 2000.

[35] P.W. Reason and H. Bradbury, Handbook of Action Research, 2001.

[36] B. Mutschler and M. Reichert and S. Rinderle, Analyzing the Dynamic Cost Factors of Process-

aware Information Systems: A Model-based Approach, Proc. CAiSE 2007, LNCS 4495, pp. 589-603,

2007.

[37] M. Reichert and P. Dadam, ADEPTflex – Supporting Dynamic Changes of Workflows Without Losing Control, Journal of Intelligent Information Systems, 10(2), pp. 93-129, 1998.

[38] S. Rinderle and M. Reichert and P. Dadam, Correctness Criteria for Dynamic Changes in Workflow

Systems – A Survey, Data and Knowledge Engineering, 50(1), pp. 9-24, 2004.

[39] M. Oba and S. Onoda and N. Komoda, Evaluating the Quantitative Effects of Workflow Systems

based on Real Cases, Proc. HICSS 2000.

[40] H. A. Reijers, Performance Improvement by Workflow Management Systems: Preliminary Results from an Empirical Study, Proc. ICEIS '04, pp. 359-366, 2004.

[41] H. A. Reijers and W.M.P. van der Aalst, The Effectiveness of Workflow Management Systems -

Predictions and Lessons Learned, Int’l. J. of Inf. Manag., 25(5), pp. 457-471, 2005.

[42] S. Choenni and R. Bakker and W. Baets, On the Evaluation of Workflow Systems in Business

Processes, Electronic Journal of Information Systems Evaluation (EJISE), 6(2), 2003.

[43] R. Aiello, Workflow Performance Evaluation, PhD Thesis, University of Salerno, Italy, 2004.

[44] J. Becker and C. von Uthmann and M. zur Muehlen and M. Rosemann, Identifying the Workflow

Potential of Business Processes, Proc. HICSS ’99, 1999.

[45] A. F. Abate and A. Esposito and N. Grieco and G. Nota, Workflow Performance Evaluation through WPQL, Proc. SEKE '02, pp. 489-495, 2002.

[46] N. Russell and A.H.M. ter Hofstede and W.M.P. van der Aalst and N. Mulyar, Workflow Control-Flow Patterns: A Revised View, Technical Report BPM-06-22, BPMcenter.org, 2006.

[47] F. Puhlmann, M. Weske, Using the Pi-Calculus for Formalizing Workflow Patterns, Proc. BPM '05, LNCS 3649, pp. 153-168, 2005.

[48] N. Russell and A.H.M ter Hofstede and D. Edmond and W.M.P van der Aalst, Workflow Data

Patterns, Technical Report FIT-TR-2004-01, Queensland Univ. of Techn., 2004.

[49] N. Russell and A.H.M ter Hofstede and D. Edmond and W.M.P van der Aalst, Workflow Resource

Patterns, Technical Report WP127, Eindhoven Univ. of Technology, 2004.

[50] S. Rinderle-Ma and M. Reichert and B. Weber, On the Formal Semantics of Change Patterns in Process-aware Information Systems, Proc. ER '08, LNCS 5231, pp. 279-293, 2008.

[51] A. Barros, M. Dumas, A. ter Hofstede, Service Interaction Patterns, Proc. BPM '05, LNCS 3649, pp. 302-318, 2005.

[52] N. Russell and W.M.P van der Aalst and A.H.M ter Hofstede, Exception Handling Patterns in

Process-Aware Information Systems, Proc. CAiSE’06, LNCS 4001, pp. 288-302, 2006.

[53] P. Wohed, W.M.P van der Aalst, M. Dumas, A.H.M ter Hofstede, N. Russell, On the Suitability of

BPMN for Business Process Modelling, Proc. BPM’06, pp. 161-176, 2006.

[54] N. Russell, W.M.P. van der Aalst, A.H.M. ter Hofstede, P. Wohed, On the Suitability of UML 2.0

Activity Diagrams for BP Modelling, Proc. APCCM ’06, pp. 95-104, 2006.

[55] B. Weber and M. Reichert, Refactoring Process Models in Large Process Repositories, Proc.

CAiSE’08, LNCS 5074, pp. 124-139, 2008.