Workflow Management versus Case Handling:
Results from a Controlled Software Experiment
Bela Mutschler
Daimler AG
Group Research
Ulm, Germany
bela.mutschler@daimler.com
Barbara Weber
Dept. of Computer Science
University of Innsbruck
Austria
barbara.weber@uibk.ac.at
Manfred Reichert
Information Systems Group
University of Twente
The Netherlands
m.u.reichert@utwente.nl
ABSTRACT
Business Process Management (BPM) technology has be-
come an important instrument for improving process per-
formance. When considering its use, however, enterprises
typically have to rely on vendor promises or qualitative re-
ports. What is still missing and what is also demanded by
IT decision makers are quantitative evaluations based on
empirical and experimental research. This paper picks up
this demand and illustrates how experimental research can
be applied in the BPM field. The conducted experiment
compares efforts for implementing a sample business pro-
cess either based on standard workflow technology or on a
case handling system. We motivate and describe the experiment
design, discuss threats to the validity of the experiment
results (as well as risk mitigations), and present the
results. In general, more experimental research is needed in
order to obtain more valid data on the various aspects and
effects of BPM technology and tools.
Categories and Subject Descriptors
H.4 [Information Systems Applications]: Office Au-
tomation—Workflow Management, Case Handling
1. INTRODUCTION
Providing effective IT support for business processes has
become crucial for enterprises to stay competitive in their
market [1]. In response to this need numerous process sup-
port paradigms (e.g., workflow management, service flow
management, case handling), process specification standards
(e.g., WS-BPEL, BPML), and BPM tools (e.g., ARIS Toolset,
Tibco Staffware, FLOWer) have emerged [4].
When evaluating the suitability of existing BPM technology
for a particular project or when arguing about its strengths
and weaknesses, it typically becomes necessary to rely on
qualitative criteria. As one example consider workflow pat-
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
SAC’08 March 16-20, 2008, Fortaleza, Ceará, Brazil
Copyright 2008 ACM 978-1-59593-753-7/08/0003 ...$5.00.
terns [19], which can be used to evaluate the expressiveness
of the workflow modeling language provided by a particu-
lar BPM tool. As another example consider process change
patterns [22]. What has been neglected so far are more pro-
found evaluations of BPM technology based on empirical
or experimental research. This is surprising as the benefits
of these research methods have been demonstrated in ar-
eas like software engineering (e.g., in the context of software
development processes or code reviews [10, 7]) for a long
time [17]. From the introduction of experimental research
to BPM as well as to the development of process-aware in-
formation systems, we expect more valid, quantitative data
on costs and benefits of BPM technology. This, in turn, be-
comes increasingly important for IT managers and project
leaders [8].
Picking up this demand, this paper illustrates how experi-
mental research can be applied in the BPM context. For this
purpose we have conducted a controlled software experiment
with 48 participants. Exemplarily, this experiment investi-
gates efforts related to the implementation and change of
business processes either using a conventional workflow sys-
tem [20] or case handling technology [21]. More precisely,
we have used Tibco Staffware [18] as representative of work-
flow technology and FLOWer [2] as representative of case
handling systems. We describe our experiment design, give
a mathematical model of the experiment, and discuss poten-
tial threats for the validity of experiment results. The results
of our experiment help to better understand the complex ef-
forts caused by using BPM technology.
Section 2 motivates the need for experimentation in BPM
and provides background information needed for understand-
ing our experiment. Section 3 describes our experimental
framework. Section 4 deals with the performance and re-
sults of our experiment. Finally, Section 5 discusses related
work and Section 6 concludes with a summary.
2. BACKGROUND
Assume that a business process for refunding traveling ex-
penses shall be supported by a Process-Aware Information
System (PAIS) realized on top of BPM technology. This
eTravel business process distinguishes between four roles (cf.
Fig. 1). The traveler initiates the refunding of his expenses.
For this purpose, he has to summarize the travel data in a
travel expense report. This report is then forwarded either
to a travel expense responsible (in case of a national business
trip) or to a verification center (in case of an international
business trip). Both the travel expense responsible and the
verification center fulfill the same task, i.e., they verify a re-
ceived travel expense report. ”Verification” means that the
declared travel data is checked for correctness and plausibility
(e.g., regarding accordance with receipts). An incorrect
travel expense report will be sent back to the traveler (for
correction). If it is correct, it will be forwarded to the travel
supervisor for final approval. The supervisor role may be
filled, for example, by the line manager of the traveler. If
a travel expense report is approved by the supervisor, the
refunding will be initiated. Otherwise, it will be sent back
to either the travel expense responsible (national trip) or
the verification center (international trip). Note that this is
a characteristic (yet simplified) process as it can be found
in many organizations.
[Figure 1 shows the four roles (Traveler, Travel Expense Responsible,
Verification Center, Travel Supervisor) and the activities Create Travel
Expense Report, Verify Travel Expense Report, and Initiate Refunding,
connected by control flow with the decisions national/international and
approved/rejected.]
Figure 1: The eTravel Business Process.
When realizing a PAIS supporting this process, one challenge
is to select the most adequate BPM technology.
Currently, different BPM paradigms exist that can
be applied in the given context, among them workflow
management and case handling.
Workflow Management. Contemporary workflow management
systems (WfMSs) enable the modeling, execution,
and monitoring of business processes. When working on a
particular process step (i.e., activity) in a WfMS-based
application, typically only the data needed for executing this
activity is visible to the respective actors, but no other workflow
data. This is also known as ”context tunneling”. WfMSs
coordinate activity execution based on routing rules, which
are described by process definitions and which are strictly
separated from processed data. If an activity is completed,
subsequent activities will become active, and the worklists
of potential actors will be updated accordingly. Finally,
electronic forms are typically used to
implement activities and to present data being processed.
Case Handling. An alternative BPM paradigm is pro-
vided by case handling [21]. A case handling system (CHS)
aims at more flexible process execution by avoiding restrictions
known from (conventional) workflow technology (cf.
Fig. 2). Examples of such restrictions include rigid control
flow and the aforementioned context tunneling. The central
concepts behind a CHS are the case and its data as opposed
to the activities and routing rules being characteristic for
WfMSs. Usually, CHSs present all data about a case at any
time to the user (assuming proper authorization), i.e., con-
text tunneling as known from WfMSs is avoided. Further-
more, CHSs orchestrate the execution of activities based on
the data assigned to a case. Thereby, different kinds of data
objects are distinguished (cf. Fig. 2). Free data objects are
not explicitly associated with a particular activity and can
be changed at any time during a case execution (e.g., Data
Object 3 in Fig. 2). Mandatory and restricted data objects,
in turn, are explicitly linked to one or more activities. If a
data object is mandatory for an activity, a value will have to
be assigned to it before the activity can be completed (e.g.,
Data Object 5 in Fig. 2). If a data object is restricted for
an activity, this activity needs to be active in order to assign
a value to the data object (e.g., Data Object 6 in Fig. 2).
As in WfMSs, forms linked to activities are used to provide
context-specific views on case data. Moreover, a CHS allows
assigning not only an execution role to activities, but also
a redo role (to undo an executed activity) and a skip role
(to omit the execution of activities). User 2 in Fig. 2,
for example, may execute Activities 3, 4, 5 and 6, and redo
Activities 2 and 3.
[Figure 2 depicts Activities 1 to 6, Data Objects 1 to 7 (mandatory and
restricted, available and not available), forms, and the execute and redo
roles of User 1 and User 2. Current situation: Activity 4 is active and
Data Objects 1, 2, 3 and 4 are available. Possible actions of User 2:
Activity 4 can be executed and Activity 2 can be redone at any time, while
Activities 3, 5 and 6 cannot be executed due to constraints. Constraints:
Activity 3 can be completed if Data Object 5 is available; Activity 5 can
be executed if Activities 3 and 4 are completed; Activity 6 can be executed
if Activity 5 is completed.]
Figure 2: Data-driven Case Handling.
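The distinction between free, mandatory, and restricted data objects can be illustrated with a minimal Python sketch. It is not FLOWer's API; the class names `Activity` and `Case` and all method names are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    name: str
    # Data objects that need a value before the activity can be completed.
    mandatory: set = field(default_factory=set)
    # Data objects that can only be written while the activity is active.
    restricted: set = field(default_factory=set)


@dataclass
class Case:
    activities: dict  # activity name -> Activity
    active: set       # names of the currently active activities
    data: dict = field(default_factory=dict)  # data object -> value

    def can_assign(self, obj):
        """Free objects are always writable; restricted ones need an active owner."""
        owners = [a for a in self.activities.values() if obj in a.restricted]
        return not owners or any(a.name in self.active for a in owners)

    def assign(self, obj, value):
        if not self.can_assign(obj):
            raise PermissionError(f"{obj} is restricted to an inactive activity")
        self.data[obj] = value

    def can_complete(self, name):
        """An active activity can be completed once all mandatory objects have values."""
        activity = self.activities[name]
        return name in self.active and activity.mandatory.issubset(self.data)
```

In this sketch the state of the case data, not a routing rule, decides what a user may do next, which is the data-driven behavior described above.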
Despite conceptual differences, both paradigms can be
used for implementing processes in general and our eTravel
process in particular. Usually, the selection of ”the most
adequate” BPM technology depends on project-specific re-
quirements. While some IT managers will consider BPM
technology as adequate if best practices are available, others
will take into account more specific selection criteria like the
support of a sufficient degree of process flexibility. Likewise,
IT managers can be interested in value-based considerations
as well. In practice, for example, a frequently asked question
is whether there is a difference in the efforts for implementing
a business process with either BPM technology A or BPM
technology B and, if so, how large this difference is. To answer
such questions, IT managers typically have
to rely on vendor data (e.g., about the return-on-investment
of their products) and qualitative experience reports. What
has not been available so far is precise quantitative data
about the use of BPM technology and PAIS (e.g., concerning
efforts for implementing processes).
To generate quantitative data, controlled software exper-
iments offer promising perspectives. In the following, we
pick up this idea and describe the results of an experiment
in which we investigate efforts related to the implementation
of business processes using either a WfMS or a CHS.
We use Tibco Staffware [18] (Version 10.1) as a typical
representative of workflow technology. Its build-time tools
include, among other components, a visual process modeling
tool and a graphical form editor. The used CHS, in turn, is
FLOWer [2] (Version 3.1), the most widely used commer-
cial CHS. Like Staffware, FLOWer provides a visual process
modeling tool and a form editor.
3. EXPERIMENTAL FRAMEWORK
This section describes the experimental framework under-
lying our experiment. Section 3.1 discusses general issues
to be considered when designing an experiment. Section
3.2 describes the specific design underlying our experiment.
Section 3.3 discusses factors threatening the validity of ex-
periment results as well as potential mitigations.
3.1 Basic Issues
Literature about software experiments [3, 5, 6, 17, 23]
provides various design guidelines for setting up an experiment.
First, an experiment design should allow collecting as
much data as possible with respect to the major goals of the
experiment. Second, collected data should be unambiguous.
Third, the experiment must be feasible within the given set-
ting (e.g., within the planned time period). Meeting these
design criteria is not trivial. Often, an experiment cannot
be accomplished as planned due to its complex design or an
insufficient number of participants [17].
Considering these major criteria, we conducted our
experiment as a balanced single factor experiment with repeated
measurement (cf. Fig. 3). This design is particularly
suitable for comparing software development technologies
[6]. Specifically, single factor experiments investigate the
effects of one factor¹ (e.g., a particular software development
technology) on a common response variable (e.g., implementation
efforts). This design also allows analyzing variations
of a factor (e.g., two alternative tools for software development).
Generally, these variations are called factor levels.
The response variable is determined when the participants
of the experiment (who are also called subjects) apply the
factor or factor levels to an object (e.g., a specification to be
implemented, based on a set of requirements).
[Figure 3 shows n participants, 1 factor, and 1 object. In the first run,
participants 1 to n/2 apply factor level 1 (Software Development
Technology 1) and participants n/2+1 to n apply factor level 2 (Software
Development Technology 2) to the specification (set of requirements).
After completion of the first applied factor level, the factor levels are
swapped in the second run.]
Figure 3: Single Factor Experiment.
¹ Multi-factor experiments, by contrast, investigate the effects of
factor combinations on a common response variable, e.g., of a software
development technology and a software development process on
implementation efforts. Although such experiments can improve the
validity of experiment results, they are rarely applied in practice [12].

We denote a single factor experiment as balanced if all factor
levels are used by all participants of the experiment.
This enables repeated measurements and the collection of
more precise data as every subject generates data for every
treated factor level. Generally, repeated measurements can
be realized in different ways. Fig. 3 shows a frequently applied
variant which is based on two subsequent runs. During
the first run half of the subjects apply ”Software Development
Technology 1” to the treated object, while the other
half uses ”Software Development Technology 2”. After the
first run has been completed, the second run begins. During
this run each subject applies the factor level that he or she
has not used so far to the object.
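The two-run variant can be written down as a small Python sketch. The function name `crossover_schedule` and the level labels are illustrative assumptions, not part of the cited design guidelines.

```python
def crossover_schedule(subjects, levels=("Technology 1", "Technology 2")):
    """Map each subject to (first-run level, second-run level).

    The first half of the subjects starts with levels[0], the second half
    with levels[1]; in the second run every subject switches to the level
    not used so far, so each subject produces data for both factor levels.
    """
    if len(subjects) % 2 != 0:
        raise ValueError("a balanced design needs an even number of subjects")
    half = len(subjects) // 2
    schedule = {}
    for index, subject in enumerate(subjects):
        first, second = levels if index < half else levels[::-1]
        schedule[subject] = (first, second)
    return schedule
```

Because both orderings occur equally often, an order effect (for example, learning between the runs) is spread evenly over both factor levels instead of favoring one of them.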
3.2 Experiment Design
Considering the generic experiment design from Section
3.1, our specific design comprises the following elements:
Subjects: Subjects are 48 students of a combined
Bachelor/Master Computer Science course at the Uni-
versity of Innsbruck. These 48 students are divided
into 4 main groups each consisting of 4 teams with 3
students (cf. Fig. 4). This results in an overall num-
ber of 16 teams. The students are randomly assigned
to the teams prior to the start of the experiment.
[Figure 4 shows the 16 teams (Team 01 to Team 16) distributed over Main
Groups 1 to 4. Two main groups use the WfMS in the first run and the CHS
in the second run; the other two main groups use the CHS first and the
WfMS second. WfMS = Workflow Management System, CHS = Case Handling System.]
Figure 4: Main Groups and Teams.
Object: The object to be implemented is the eTravel
business process (cf. Section 2). Its specification comprises
two parts: an initial ”Base Implementation”
(Part I) and an additional ”Change Implementation”
(Part II). While the first part deals with the realization
of process support for the refunding of national
business trips, the second part specifies a process
change, namely, additional support for refunding
international business trips. Both parts describe the elements
to be implemented: the process logic, user roles,
and the data to be presented using simple electronic
forms. Note that this specification enables us to investigate
not only efforts for (initially) implementing
a business process, but also efforts for
subsequent process changes. In our experiment,
”process change” means the adaptation of the implemented
business process. After such a process change has been
realized, new process instances are based on the
new process model. We do not investigate the migration
of process instances to the new process
schema in this context.
Factor & Factor Levels: In our experiment, BPM
technology is the considered factor with factor levels
”WfMS” (Staffware) and ”CHS” (FLOWer).
Response Variable: In our experiment the response
variable is the time the subjects (i.e., the students)
need for implementing the given object (i.e., the eTravel
specification) with each of the factor levels (WfMS and
CHS). All effort values related to the Staffware imple-
mentation are denoted as ”WfMS Sample”. All effort
values related to the FLOWer implementation are de-
noted as ”CHS Sample”.
Besides, the following issues are important:
Instrumentation: To precisely measure the response
variable, we have developed an application called TimeCatcher.
This ”stop watch” allows logging time in six
typical ”effort categories” related to the development
of a process-oriented application: (1) process modeling,
(2) user/role management, (3) form design, (4)
data modeling, (5) test, and (6) miscellaneous efforts.
To collect qualitative feedback as well (e.g., concerning
the maturity or usability of the applied WfMS and
CHS), we use a structured questionnaire.
Data Collection Procedure: The TimeCatcher tool
is used by the students during the experiment. The
aforementioned questionnaire is filled out by the stu-
dents after completing the experiment.
Data Analysis Procedure: For data analysis well-
established statistical methods and standard metrics
are applied (cf. Section 4.3 for details).
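To make the idea of category-based time logging concrete, the following Python sketch shows one way such a ”stop watch” could work. It is not the TimeCatcher tool itself, whose implementation is not described here; the class `EffortLog` and its methods are hypothetical, and only the six category names are taken from the text.

```python
import time

CATEGORIES = (
    "process modeling", "user/role management", "form design",
    "data modeling", "test", "miscellaneous",
)


class EffortLog:
    """Accumulates time per effort category; one category runs at a time."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so that the log can be tested
        self._current = None
        self._started = None
        self.totals = {category: 0.0 for category in CATEGORIES}

    def start(self, category):
        if category not in self.totals:
            raise ValueError(f"unknown effort category: {category}")
        self.stop()  # switching categories closes the running one
        self._current, self._started = category, self._clock()

    def stop(self):
        if self._current is not None:
            self.totals[self._current] += self._clock() - self._started
            self._current = None
```

Rejecting unknown categories addresses one part of the data collection risk discussed in Section 3.3: a team cannot log time under a label that the later analysis does not know.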
The mathematical model of our experiment can be summarized
as follows: $n$ subjects $S_1, \dots, S_n$ ($n \in \mathbb{N}$) divided
into $m$ teams $T_1, \dots, T_m$ ($m \in \mathbb{N}$, $m \geq 2$, $m$ even) have to
implement the eTravel business process specification. This
specification describes a ”Base Implementation” $O_1$ (corresponding
to the ”national case” of the eTravel process) and a
”Change Implementation” $O_2$ (additionally introducing the
”international case”). During the experiment one half of
the teams ($T_1, \dots, T_{m/2}$) implements the complete specification
(i.e., base and change implementation) using a WfMS
($PMS_1$, Staffware), while the other half ($T_{m/2+1}, \dots, T_m$)
does this using a CHS ($PMS_2$, FLOWer). After finishing
the implementation with the first factor level (i.e., the first
run), each team has to implement the eTravel process using
the second factor level in a second run (i.e., the development
technologies are switched). The response variable
”Effort[Time] of $T_m$ implementing $O$ using $PMS_j$” is logged
with the TimeCatcher tool.
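The random assignment of subjects to teams and the split into two halves can be sketched in Python as follows. The function `assign_teams` and its parameters are assumptions made for illustration; the text does not state how the randomization was carried out.

```python
import random


def assign_teams(subjects, team_count, seed=None):
    """Randomly split the subjects into equally sized teams T_1..T_m.

    Returns (wfms_first, chs_first): the first m/2 teams start with the
    WfMS, the remaining m/2 teams start with the CHS.
    """
    if team_count < 2 or team_count % 2 != 0:
        raise ValueError("m must be even and at least 2")
    if len(subjects) % team_count != 0:
        raise ValueError("subjects cannot be split into equally sized teams")
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    size = len(shuffled) // team_count
    teams = [shuffled[i * size:(i + 1) * size] for i in range(team_count)]
    half = team_count // 2
    return teams[:half], teams[half:]
```

With 48 subjects and `team_count=16`, this yields 16 teams of 3 students, 8 of which start with each technology.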
3.3 Risk Analysis and Mitigations
When accomplishing experimental research and generat-
ing results, related risks have to be taken into account. Gen-
erally, there exist factors that threaten both the internal
validity (”Are the claims we made about our measurements
correct?”) and the external validity (”Can we generalize the
claims we made?”) of an experiment. In our context, threats
to internal validity are as follows:
People: Participating students differ in their skills
and their productivity for two reasons: (i) general ex-
perience with software development and (ii) experience
with BPM technology. The first issue can only be bal-
anced by conducting the experiment with a sufficiently
large and representative set of students. The number
of 48 students promises to achieve such a balance. The
second issue can be mitigated by using BPM tools unknown
to every student. Only three of the participating
students have rudimentary (and thus negligible)
workflow knowledge. As we cannot exclude that this
knowledge influences our experiment results, we have
assigned those three students to different teams in or-
der to minimize potential effects as far as possible.
Data collection process: Data collection is one of
the most critical threats. To mitigate it, we need to
continuously control data collection in the context of
the experiment. We further have to ensure that stu-
dents understand which TimeCatcher categories have
to be selected during the experiment.
Time for optimizing an implementation: The spec-
ification to be implemented does not include any guide-
line concerning the number of electronic forms or their
layout. This implies the danger that some teams spend
more time for implementing a ”nice” user interface
than others do. To minimize such effects, we explic-
itly indicate to the students that the development of a
”nice” user interface is not a goal of our experiment.
To ensure that the implemented solutions are similar
across different teams, we conduct acceptance tests.
Besides, there are threats to the external validity:
Students instead of professionals: Involving stu-
dents instead of IT professionals may be critical. How-
ever, it has been shown before that the results of stu-
dent experiments are transferable and can provide valu-
able insights into an analyzed problem domain [15].
Also note that the use of professional software devel-
opers is hardly possible in practice as no profit-oriented
organization will simultaneously implement a business
process twice using two different BPM technologies.
Investigation of tools instead of concepts: In our
experiment, BPM tools are used as representatives for
the analyzed concepts (i.e., workflow management and
case handling). Investigating the concepts therefore always
depends on the quality of the used tools. To mitigate
this risk, the used BPM technologies should be
representative for state-of-the-art technologies in practice
(which is the case as both selected BPM tools are
market leaders in their domain).
Choice of object: To avoid that the chosen business
process setting strongly supports the goals of our ex-
periment, we have picked a business process that can
be found in many organizations, i.e., the eTravel pro-
cess (cf. Section 2).
4. PERFORMING THE EXPERIMENT
This section deals with the preparation, execution and
analysis of our experiment. This also includes the presenta-
tion of experiment results.
4.1 Experiment Preparation
In the run-up to the experiment, we prepare a technical
specification of the eTravel process. This specification
comprises UML activity diagrams², an entity relationship
diagram describing the generic data structure of a travel expense
report, and system-specific data models for the considered
tools (Staffware, FLOWer). In order to ensure that
the specification is self-explanatory and correct, two student
assistants are involved in its development.

² One may argue that the use of UML activity diagrams can undermine
the validity of the experiment as these diagrams are very similar
to the explicit, flow-driven notation of Staffware process models, but
different from the implicit, more data-driven FLOWer process models.
However, in practice, UML activity diagrams are widely used to
describe standard business processes. Thus, the use of UML activity
diagrams can even improve internal validity as a typical practical
scenario is investigated.
Before the experiment, the same two students implement
the specification with each of the utilized BPM technolo-
gies. This allows us to ensure the feasibility of our general
experiment setup and to identify critical issues with respect
to the performance of the experiment. This pre-test also
provides us with feedback that helps to further improve the
comprehensibility of our specification. Finally, we compile
a ”starter kit” for each participating team. It includes orig-
inal tool documentation, additional documentation created
by us when preparing the experiment (and which can be
considered as a compressed summary from the original doc-
umentation), and the technical process specification.
4.2 Experiment Execution
Due to infrastructure limitations, we split up the exper-
iment in two events. While the first one took place in Oc-
tober 2006, the second one was conducted in January 2007.
Each event lasted 5 days, involved 24 participants (i.e., stu-
dents), and was based on the following procedure: Prior to
the start of the experiment, all students have to attend an
introductory lecture. We introduce to them basic notions of
workflow management and case handling. We further inform
them about the goals and rules of the experiment. After-
wards, each team receives its ”starter kit”. Then, the stu-
dents have to implement the given eTravel business process
specification (with both considered factor levels). After hav-
ing implemented the eTravel specification with a factor level,
an acceptance test is accomplished by us in order to ensure
that the developed solution corresponds to the specification.
After finishing their work on the experiment, students have
to fill out the aforementioned questionnaire.
We further optimize experiment results by applying Ac-
tion Research [13]. Action Research is characterized by an
intensive communication between researchers and subjects.
At an early stage, we optimize the data collection process by
assisting and guiding the students in using the TimeCatcher
data collection tool properly (which is critical with respect
to the quality of the gathered data). Besides, we document
emotional reactions of the students regarding their way of
working. This helps us to design the questionnaire. Note
that Action Research does not imply any advice for the stu-
dents on how to implement the eTravel process.
4.3 Data Analysis Procedure
Data analysis comprises three steps: an initial validation
of the collected data (Step 1), data analysis itself (Step 2),
and analysis of the questionnaire results (Step 3).
Step 1: Data Validation. We validate the collected data
regarding its consistency (”Is all expected data available?”)
and plausibility (”Is all available data meaningful?”):
Data Consistency: We discard the data of two teams
as their data is flawed. Both have made mistakes using
the TimeCatcher tool. Hence, the data provided by 14
teams is finally included in data analysis.
Data Plausibility: We analyze data plausibility based
on box-whisker-plot diagrams. Such diagrams visualize
the distribution of a sample and particularly show out-
liers. A low number of outliers indicates plausible data
[12]. Fig. 5A, for example, shows a box-whisker-plot
diagram which illustrates the distributions of the base
implementation efforts in our experiment. The diagram
takes the form of a box that spans the distance
between the 25% quantile and the 75% quantile (the
so-called interquartile range) surrounding the median,
which splits the box into two parts. The ”whiskers”
are straight lines extending from the ends of the box
to the maximum and minimum values. Outliers are
defined as data points beyond the interquartile range,
i.e., beyond the edge of the box. As can be seen in Fig.
5A, there are no outliers, i.e., all data from these samples
lie within the boxed area. Moreover, there exists
only one (negligible) outlier in the distribution of the
change implementation efforts (cf. Fig. 5B), and no
outliers regarding the distribution of the overall imple-
mentation efforts (cf. Fig. 5C).
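A plausibility check of this kind can be sketched in Python. Note the assumption: the sketch flags outliers with the common convention of fences at 1.5 times the box length beyond the quartiles, which is not necessarily the rule used for Fig. 5; the function name `box_plot_summary` is hypothetical.

```python
import statistics


def box_plot_summary(values, fence=1.5):
    """Quartiles, box length, and outliers of a sample of effort values."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    box = q3 - q1
    low, high = q1 - fence * box, q3 + fence * box
    # Assumed convention: points outside the fences count as outliers.
    outliers = [v for v in values if v < low or v > high]
    return {"q1": q1, "median": median, "q3": q3,
            "box": box, "outliers": outliers}
```

A sample that yields an empty `outliers` list would count as plausible in the sense used above.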
Step 2: Data Analysis. The main goal of the experiment
is to investigate whether there is a significant difference be-
tween the efforts of implementing a business process with a
WfMS and the efforts of an implementation using case handling
technology. Hence, the 0-hypothesis to be analyzed is
as follows: ”Using workflow technology yields no significant
difference in implementation efforts when compared to case
handling technology”. We analyze this 0-hypothesis based
on a two-sided t-test [12] (or an additional sign test
if the t-test fails). Doing so, we are able to assess whether
the means of the WfMS sample and the CHS sample are
statistically different from each other. A successful t-test
(with $|T| > t_0$) rejects our 0-hypothesis. Specifically, the
following steps have to be executed in order to accomplish
a t-test (with α = 0.05 as the level of significance):
1. Paired Comparison: The t-test is combined with a
paired comparison [12], i.e., we analyze ”pairs of effort
values”. Each pair comprises one effort value from the
WfMS sample and one from the CHS sample. Note
that we compose pairs according to the performance
of the teams, i.e., effort values of ”good” teams are not
combined with effort values of ”bad” teams (cf. [6]).
2. Standardized Comparison Variable: For each pair,
a standardized comparison variable $X_j$ is derived. It is
calculated by dividing the difference of the two compared
effort values by the first one:
$$X_j := \frac{EFFORT_{j+m/2} - EFFORT_j}{EFFORT_{j+m/2}} \cdot 100\%$$
In other words, $X_j$ denotes how much effort team $T_j$
saves using workflow technology when compared to
team $T_{j+m/2}$, which uses case handling technology. Together,
all $X_j$ constitute a standardized comparison
sample $x = (X_1, \dots, X_{m/2})$ used as the basis when performing
the t-test.
3. Statistical Measures: For the standardized comparison
sample x we calculate the median (m), the interquartile
range (IQR), the expected value (µ), the
standard deviation (σ), and the skewness (sk).
4. Two-sided t-Test: Finally, we apply the t-test to x.
Note that the t-test is only possible if x follows
a normal distribution and if the WfMS and CHS samples
have the same variance. The first condition can be
tested using the Kolmogorov-Smirnov test [16]. In
particular, the result of the Kolmogorov-Smirnov test
has to be smaller than $K_0$ (with $K_0$ a predefined value
depending on the size of x and α). The second condition
can be tested based on the test for identical variance
[16]. The variances of the WfMS and CHS samples
are considered identical if the result of this test is smaller than
$F_0$ (with $F_0$ a predefined value depending on the size
of the samples and α). A single violated precondition
is sufficient to rule out the t-test.

[Figure 5 comprises three box plots of effort in hours for FLOWer and
Staffware: A) Base Implementation, B) Change Implementation, C) Overall
Implementation.]
Figure 5: Data Distribution (Box-Whisker-Plot Diagrams).
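The core of steps 2 and 4 can be sketched in Python. Two assumptions apply: the t statistic is computed as a one-sample statistic of the comparison sample against a mean of zero, which is one reading of ”apply the t-test to x”, and the critical value `t0` has to be supplied by the caller from a t-distribution table. The function names are hypothetical, and the precondition tests are not included.

```python
import math
import statistics


def comparison_sample(wfms_efforts, chs_efforts):
    """X_j in percent for paired effort values (WfMS effort, CHS effort)."""
    return [(chs - wfms) / chs * 100.0
            for wfms, chs in zip(wfms_efforts, chs_efforts)]


def t_statistic(sample):
    """One-sample t statistic of the sample against a mean of zero."""
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return statistics.mean(sample) / standard_error


def rejects_null(sample, t0):
    """True if |T| > t0, i.e., the 0-hypothesis is rejected."""
    return abs(t_statistic(sample)) > t0
```

For example, a WfMS effort of 20 hours paired with a CHS effort of 40 hours gives a comparison value of 50%, i.e., the WfMS team needed half the effort of the CHS team.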
Step 3: Questionnaire Analysis. We analyze the data
collected based on the questionnaire each student has to fill
out. As one of the students became ill on the last day of the
January 2007 event, only 47 students participated in the questionnaire.
4.4 Experiment Results
Fig. 6A shows the results for the overall implementation
efforts. The efforts for the workflow implementation
are lower than the efforts for the case handling
implementation. This difference is confirmed by the
results of the (successful) t-tests for both the first
and the second run, i.e., our null hypothesis is
rejected. In the first run, the use of workflow technology
results in median effort savings of 43.04% (interquartile
range: 27.51% to 50.81%) when compared to the efforts for
using case handling technology. In the second run, the use
of workflow technology still results in median savings of
28.29% (interquartile range: 11.48% to 53.16%).
Fig. 6A also shows that efforts for the first run are generally
higher than those for the second run. Regardless of which
technology is used first, all teams reduce their efforts in the
second run. This can be explained either by learning
effects regarding the used BPM technologies or by increasing
process knowledge gathered during the experiment. Based on
questionnaire results (see below), we assume that this effect
is not necessarily related to learning effects concerning
the used BPM technologies (i.e., tool knowledge), but to
increasing process knowledge (which, in turn, reduces the risk
of comparing tools instead of concepts).
Fig. 6B and Fig. 6C show results for the base implementation
and the change implementation. Again, our results
allow us to reject the null hypothesis (the failed t-test can
be compensated for by a successful sign test). Using workflow
technology results in median effort savings of 44.11% for the
change implementation in the first run (interquartile range:
16.29% to 56.45%). In the second run, the use of workflow
technology results in effort savings of 40.46% when compared
to case handling efforts.
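The decision rule behind these statements can be made explicit with a short Python sketch. The K, F, and T values and the critical values K0, F0, and t0 are transcribed from Fig. 6, assuming that the left-hand statistics column belongs to the first run; the function names are hypothetical, and the exact two-sided sign test shown here is one common formulation, not necessarily the variant used in the experiment.

```python
from math import comb

# Critical values as printed in Fig. 6.
K0, F0, T0 = 0.349, 4.284, 2.179

# (K, F, T) per implementation type and run, transcribed from Fig. 6.
reported = {
    ("overall", 1): (0.135, 0.661, -5.059),
    ("overall", 2): (0.129, 0.760, -3.294),
    ("base", 1): (0.138, 0.972, -4.816),
    ("base", 2): (0.198, 0.895, -3.024),
    ("change", 1): (0.172, 0.752, -1.724),
    ("change", 2): (0.198, 1.784, -2.884),
}

def t_test_outcome(k, f, t):
    """Apply the preconditions first, then the two-sided t-test decision."""
    if k >= K0 or f >= F0:
        return "preconditions violated"
    return "reject H0" if abs(t) > T0 else "cannot reject H0"

def sign_test_p(positive, n):
    """Exact two-sided sign test p-value under H0: P(positive) = 0.5."""
    extreme = max(positive, n - positive)
    tail = sum(comb(n, i) for i in range(extreme, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

for (kind, run), stats in reported.items():
    print(kind, run, t_test_outcome(*stats))
```

With these inputs, only the change implementation in the first run fails the t-test, which matches the "t-test failed" annotation in Fig. 6C. A sign test on the paired differences, e.g. `sign_test_p(positive, n)` with the observed number of pairs favoring workflow technology, is the fallback in that case.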
4.5 Questionnaire Results
Fig. 6D shows that the methodical soundness of using
process management technology is easier to understand in
the case of workflow technology, i.e., using case handling
technology is considered more difficult. Fig. 6E
illustrates what we have already mentioned above, i.e.,
process knowledge gained during the first run significantly
simplifies the second run. By contrast, Fig. 6F shows that the
increased efficiency during the second run cannot be attributed
to gained tool knowledge. Finally, Fig. 6G deals with the
usability of the applied process management systems. As
can be seen, there remains a lot of room for improvement
from the students' viewpoint.
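The contrast between Fig. 6E and Fig. 6F can be checked with simple arithmetic on the response counts. The following Python sketch uses the counts printed in Fig. 6E and Fig. 6F (47 respondents each); the helper names are hypothetical.

```python
# Answer scale for Fig. 6E and Fig. 6F, from strongest to weakest.
SCALE = ["very strong", "strong", "rather strong", "indifferent",
         "rather weak", "weak", "very weak"]

# Response counts as printed in Fig. 6 (47 respondents each).
process_knowledge = [4, 14, 21, 5, 2, 1, 0]   # Fig. 6E
tool_knowledge = [2, 2, 11, 15, 6, 4, 7]      # Fig. 6F

def shares(counts):
    """Percentage share of each answer category, rounded to two decimals."""
    total = sum(counts)
    return [round(100.0 * c / total, 2) for c in counts]

def share_of_first(counts, k):
    """Percentage of answers falling into the first k categories."""
    return round(100.0 * sum(counts[:k]) / sum(counts), 2)

print(dict(zip(SCALE, shares(process_knowledge))))
print("process knowledge, rather strong or stronger:",
      share_of_first(process_knowledge, 3))   # 82.98
print("tool knowledge, rather strong or stronger:",
      share_of_first(tool_knowledge, 3))      # 31.91
```

Roughly 83% of the students rate the impact of process knowledge as at least "rather strong", compared with roughly 32% for tool knowledge.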
4.6 Discussion
Our results indicate that process implementations based
on workflow technology require less effort than
implementations based on case handling technology.
Moreover, our results show that initial implementations
of processes require significantly more effort than subsequent
process changes (cf. Fig. 7). This is particularly important
for policy makers, who often focus on short-term
costs (e.g., for purchasing BPM technology and initially
implementing business processes) rather than on long-term
benefits (e.g., low costs for realizing process changes).
Finally, our data indicate that increasing knowledge about
the processes to be implemented results in increased
productivity of software developers. Regardless of which BPM
technology is used first, all teams reduce their efforts in the
second run. Questionnaire results further indicate that this
effect is not necessarily related to increasing knowledge
about the used BPM technologies. This also emphasizes the
need to involve domain experts with high process knowledge
when applying BPM technology.
Considering our experiment design, we have to acknowledge
that our experiment results are influenced by the
quality of the used BPM tools. However, by selecting leading
commercial BPM tools as representatives for the analyzed
concepts (i.e., workflow management and case handling), we
can reduce the impact of the tool quality. Yet, based on this
single experiment, results cannot be generalized, i.e.,
substantial conclusions regarding the strengths and weaknesses
of workflow management and case handling cannot be derived.
For this purpose, additional experiments with different
experiment designs and more specific research questions
will be necessary. One example is the comparison
of conventional WfMS, adaptive WfMS, and CHS regarding
the effectiveness of realizing process changes.
Figure 6: Results of our Experiment.

Panels A to C: paired comparison of WfMS and CHS.
x-axis: pairs of effort values (team pairs 11|15, 6|8, 1|12,
16|4, 14|7, 2|10, 13|3, 15|11, 8|6, 12|1, 4|16, 7|14, 10|2, 3|13);
y-axis: normalized effort values (0 to 120).

A) Experiment Results: Paired Comparison (Overall Efforts)
                    first run          second run
  m                 43.048             28.2913
  IQR               [27.51; 50.81]     [11.48; 53.16]
  µ                 38.6896            31.2358
  σ                 12.7703            20.6792
  sk                -0.6309            0.4490
  K (K0 = 0.349)    0.135              0.129
  F (F0 = 4.284)    0.661              0.76
  T (t0 = 2.179)    -5.059             -3.294

B) Experiment Results: Paired Comparison (Base Implementation)
                    first run          second run
  m                 43.0116            28.5209
  IQR               [32.03; 50.06]     [8.11; 54.90]
  µ                 39.2788            29.3332
  σ                 11.9261            23.5067
  sk                -0.9141            0.4401
  K (K0 = 0.349)    0.138              0.198
  F (F0 = 4.284)    0.972              0.895
  T (t0 = 2.179)    -4.816             -3.024

C) Experiment Results: Paired Comparison (Change Implementation)
                    first run          second run
  m                 44.1152            40.4666
  IQR               [16.29; 56.45]     [26.63; 52.20]
  µ                 29.9807            41.4368
  σ                 32.4034            14.4722
  sk                -1.3034            0.8501
  K (K0 = 0.349)    0.172              0.198
  F (F0 = 4.284)    0.752              1.784
  T (t0 = 2.179)    -1.724             -2.884
  (first run: t-test failed)

Panels D to G: questionnaire results (y-axis: absolute nominations).

D) Questionnaire: Methodical Soundness of Implementation
Question: The methodical steps of the business process
implementation were clear during the experiment?
Scale: A = yes, B = rather yes, C = indifferent, D = rather no,
E = no. Two response series (legend: WfMS, CHS):
  Series 1: A:13 (27.66%), B:25 (61.70%), C:05 (10.64%),
            D:00 (00.00%), E:00 (00.00%)
  Series 2: A:00 (00.00%), B:09 (19.15%), C:10 (21.28%),
            D:21 (44.68%), E:07 (14.89%)

E) Questionnaire: Impact of Process Knowledge
Question: How strong has the process knowledge, which was gained
during the first implementation, simplified the second
implementation?
Scale: A = very strong, B = strong, C = rather strong,
D = indifferent, E = rather weak, F = weak, G = very weak.
  A:04 (08.51%), B:14 (29.79%), C:21 (44.68%), D:05 (10.64%),
  E:02 (04.26%), F:01 (02.13%), G:00 (00.00%)

F) Questionnaire: Impact of Tool Knowledge
Question: How strong has the tool knowledge, which was gained
during the first implementation, simplified the second
implementation?
Scale: A = very strong, B = strong, C = rather strong,
D = indifferent, E = rather weak, F = weak, G = very weak.
  A:02 (04.26%), B:02 (04.26%), C:11 (23.40%), D:15 (31.91%),
  E:06 (12.77%), F:04 (08.51%), G:07 (14.89%)

G) Questionnaire: Usability
Question: How would you rate the usability of the process
management systems, which have been used during the experiment?
Scale: A = very good, B = good, C = rather good,
D = indifferent, E = rather weak, F = weak, G = very weak.
Two response series (legend: WfMS, CHS):
  Series 1: A:05 (10.64%), B:21 (44.68%), C:09 (19.15%),
            D:08 (17.02%), E:04 (08.51%), F:00 (00.00%),
            G:00 (00.00%)
  Series 2: A:01 (02.13%), B:01 (02.13%), C:05 (10.64%),
            D:05 (10.64%), E:14 (29.79%), F:14 (29.79%),
            G:07 (14.89%)
We apply these experiment results in the EcoPOST project
[9]. This project aims at the development of an approach to
investigate complex causal dependencies and related cost
effects in PAIS engineering projects. In particular, our results
enable us to quantify causal dependencies in PAIS engineering
projects. One example is the impact of process
knowledge on the productivity of process implementation.
Figure 7: Base versus Change Implementation.
Effort categories: Process Modeling, Data Modeling, Form Design,
User/Role Management, Test, Miscellaneous.
Series: Base Implementation, Change Implementation.
5. RELATED WORK
The experiment design most similar to our own is
provided by [6], which investigates the impact
of workflow technology on software development and software
maintenance. Generally, only little data is available on
the effects of workflow technology (regarding case handling,
no data is available at all). Oba et al. [11], for example,
analyze the introduction of WfMS and particularly focus
on the identification of factors influencing work efficiency,
processing time, and business process standardization. A
mathematical model is provided for predicting the reduction
rate of processing times. An extension of this work
is [14], where simulation is used to compare pre- and
post-implementations of information systems relying on workflow
technology. The focus of this work is on analyzing process
performance based on criteria such as lead time, waiting time,
service time, and utilization of resources.
6. SUMMARY
This paper presents the results of a controlled BPM software
experiment with 48 students. Our results indicate that
business process implementation based on workflow technology
requires less effort than using case handling technology.
Moreover, initial process implementations result in
higher efforts than subsequent process changes. Our data
can help enterprises that seek quantitative data to
complement their qualitative decision criteria to better
understand the efforts of using BPM technology.
7. REFERENCES
[1] Y. L. Antonucci. Using Workflow Technologies to
improve Organizational Competitiveness. Int’l. J. of
Management, 14(1), pp.117-126, 1997.
[2] P. Athena. Case Handling with FLOWer: Beyond
Workflow. 2002.
[3] V. R. Basili, R. W. Selby, and D. H. Hutchens.
Experimentation in Software Engineering. IEEE
Trans. in SW Engin., 12(7), pp.733-743, 1986.
[4] M. Dumas, W. M. P. van der Aalst, and A. ter
Hofstede. Process-aware IS. Wiley, 2005.
[5] N. Juristo and A. M. Moreno. Basics of Software
Engineering Experimentation. 2001.
[6] N. Kleiner. Can Business Process Changes Be Cheaper
Implemented with Workflow-Management-Systems?
IRMA 2004, pp.529-532.
[7] C. M. Lott and H. D. Rombach. Repeatable Software
Engineering Experiments for Comparing
Defect-Detection Techniques. Empirical Software
Engineering, 1(3), pp. 241-277, 1996.
[8] B. Mutschler, M. Reichert, and J. Bumiller.
Unleashing the Effectiveness of Process-oriented
Information Systems: Problem Analysis, Critical
Success Factors, Implications. IEEE Transactions on
Systems, Man, and Cybernetics - Part C: Application
and Reviews, 2008 (accepted for publication).
[9] B. Mutschler, M. Reichert, and S. Rinderle. Analyzing
the Dynamic Cost Factors of Process-aware IS: A
Model-based Approach. CAiSE 2007.
[10] G. J. Myers. A controlled Experiment in Program
Testing and Code Walkthroughs/Inspections. Comm.
of the ACM, 21(9), pp. 760-768, 1978.
[11] M. Oba, S. Onoda, and N. Komoda. Evaluating the
Quantitative Effects of Workflow Systems based on
Real Cases. HICSS 2000.
[12] L. Prechelt. Controlled Experiments in Software
Engineering (in German). Springer, 2001.
[13] P. Reason and H. Bradbury. Handbook of Action
Research. 2001.
[14] H. A. Reijers and W. M. P. van der Aalst. The
Effectiveness of Workflow Management Systems -
Predictions and Lessons Learned. Int’l. J. of Inf.
Manag., 25(5), pp.457-471, 2005.
[15] P. Runeson. Using Students as Experiment Subjects.
EASE 2003.
[16] D. J. Sheskin. Handbook of Parametric and
Nonparametric Statistical Procedures. 2000.
[17] D. I. K. Sjoberg, J. E. Hannay, O. Hansen, V. B.
Kampenes, A. Karahasanovic, N.-K. Liborg, and
A. C. Rekdal. A Survey of Controlled Experiments in
Software Engineering. IEEE Trans. in SW Engin.,
31(9), pp.733-753, 2005.
[18] Tibco. Staffware Process Suite. User Manual, 2005.
[19] W. van der Aalst, A. ter Hofstede, B. Kiepuszewski,
and A. Barros. Workflow Patterns. Distributed and
Parallel Databases, 14(3), pp.5-51, 2003.
[20] W. M. P. van der Aalst and K. van Hee. Workflow
Management. MIT Press, 2004.
[21] W. M. P. van der Aalst, M. Weske, and D. Grunbauer.
Case Handling: A New Paradigm for Business Process
Support. DKE, 53(2), pp.129-162, 2005.
[22] B. Weber, S. Rinderle, and M. Reichert. Change
Patterns and Change Support Features in Process-aware
IS. CAiSE 2007.
[23] M. V. Zelkowitz and D. R. Wallace. Experimental
Models for Validating Technology. IEEE Computer,
31(5), pp.23-31, 1998.