
A Workflow Mining Approach
for Deriving Software Process Models
DISSERTATION
in Computer Science
submitted to the
Faculty of Computer Science,
Electrical Engineering, and Mathematics
University of Paderborn
by Vladimir Rubin
in partial fulfilment of the requirements for the degree of
doctor rerum naturalium (Dr. rer. nat.)
Paderborn 2007

Abstract
Enterprises today invest considerable effort in obtaining precise models of their
software and systems engineering processes in order to improve the process capability
of their organization. However, process engineers, business analysts, and managers
still design process models manually, which is complicated, time-consuming, and
error-prone. Moreover, the resulting models rapidly become obsolete, and the ability
of human beings to detect discrepancies between actual processes and process models
is rather limited. Therefore, automatic techniques for deriving and updating process
models are becoming ever more important, since they can solve some of the problems
described above. From a practical point of view, these techniques should be available
as tools that support process engineers and analysts, increasing the quality and
reducing the complexity of their work.
In order to keep track of the documents and files involved, engineers use Document
Management Systems (DMS) and data repositories. In software engineering practice,
typical DMS are Software Configuration Management (SCM) systems, and typical
software repositories are defect tracking systems, e-mail archives, and discussion
forums. Using such systems is not only recommended by software process improvement
frameworks but is practically unavoidable given the increasing complexity and size
of the developed software and the distributed way in which developers work. Along
the way, these systems collect and store detailed audit information on software
projects and software development processes in the form of logs. These logs can
therefore be used for constructing explicit process models; we call this software
process mining.
In this thesis, we develop an approach that exploits the audit information and user
interaction with software repositories for the automatic derivation of process models
that accurately reflect the real processes. We call our approach incremental workflow
mining [RGvdA+07a, RGvdA+07b, KRS06a, KRS05b]; it supports discovering process
models in both incremental and batch mode and can be used for gradually introducing
process management systems into companies.
In the area of process mining, modern techniques attempt to extract non-trivial
and useful information from event logs. A principal element of process mining is
control-flow discovery, i.e., automatically constructing a process model (e.g., a
Petri net) that describes the causal dependencies between process activities. Today,
many process mining techniques reveal shortcomings when it comes to discovering
processes with complicated dependencies and to deriving process models on different
levels of abstraction. Moreover, existing approaches typically provide a single
process mining algorithm, which can hardly be adapted to different application domains.
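As an illustration of what "causal dependencies between process activities" can mean
in practice, the following Python sketch derives a direct-succession relation from a
small event log and keeps only the one-directional pairs as candidate causal
dependencies. This is a simplified heuristic known from basic control-flow discovery
algorithms, not the technique developed in this thesis; the example log, the activity
names, and the function names are hypothetical.

```python
from typing import List, Set, Tuple

Trace = List[str]
Pair = Tuple[str, str]


def directly_follows(log: List[Trace]) -> Set[Pair]:
    """Collect every pair (a, b) where b occurs immediately after a in some trace."""
    pairs: Set[Pair] = set()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            pairs.add((a, b))
    return pairs


def causal_candidates(pairs: Set[Pair]) -> Set[Pair]:
    """Keep (a, b) only if b follows a but a never directly follows b."""
    return {(a, b) for (a, b) in pairs if (b, a) not in pairs}


# Hypothetical log: two recorded cases in which b and c occur in either order.
example_log = [["a", "b", "c", "d"], ["a", "c", "b", "d"]]
follows = directly_follows(example_log)
causal = causal_candidates(follows)
```

In this log, b and c are observed in both orders, so neither is treated as a cause
of the other, whereas a precedes both of them and d follows both.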
In this thesis, within our incremental workflow mining approach, we develop
a new process mining technique: a two-step generation and synthesis technique
[vdARvD+06, KRS06c]. In the first step, a transition system is generated from the
log; in the second step, a Petri net model is synthesized from the transition system
using the "theory of regions". The main advantage of our technique is that it allows
for different modification strategies, i.e., derived models can be altered in order
to achieve the desired degree of generalization and to fit the desired application
domain. The theory behind our technique guarantees that we obtain consistent results,
i.e., our models always reflect the behaviour recorded in the log. Our two-step
approach is implemented in the form of plug-ins for the process mining framework
ProM [vdAvDG+07].
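The first of the two steps can be sketched in a few lines of Python. The sketch
assumes one particular state abstraction, namely that a state is the set of
activities that have occurred so far in a case; this choice, the example log, and
all names are assumptions made for illustration and are not taken from the ProM
plug-ins. The second step, the region-based synthesis of a Petri net, is not shown.

```python
from typing import FrozenSet, List, Set, Tuple

State = FrozenSet[str]
Transition = Tuple[State, str, State]


def build_transition_system(log: List[List[str]]) -> Tuple[Set[State], Set[Transition]]:
    """Generate a transition system from a log of traces.

    Assumed abstraction: the state reached after a prefix of a trace is the
    set of activities contained in that prefix.
    """
    initial: State = frozenset()
    states: Set[State] = {initial}
    transitions: Set[Transition] = set()
    for trace in log:
        current = initial
        for activity in trace:
            successor = current | {activity}  # still a frozenset
            states.add(successor)
            transitions.add((current, activity, successor))
            current = successor
    return states, transitions


# Hypothetical log: b and c are executed in either order between a and d.
example_log = [["a", "b", "c", "d"], ["a", "c", "b", "d"]]
states, transitions = build_transition_system(example_log)
```

Because both traces reach the same state {a, b, c} after b and c, the two orderings
merge again before d. Choosing a coarser or finer abstraction changes how much the
resulting model generalizes from the log, which is one way to read the modification
strategies mentioned above.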
We evaluate our approach on several real software projects from the area of
open-source software and from university practice. In our case studies, we use two
types of audit information: document logs of SCM systems and bug logs obtained
from defect repositories. For all the case studies, we derive plausible process models
in the control-flow perspective using our generation and synthesis technique; further,
we extend the models with organizational and performance data, and verify and
analyse them with existing algorithms from the process mining area.
Thus, in this thesis we show that (1) process mining can be used for obtaining
software process models as well as for analysing and optimising them; (2) the
algorithmic approach that resulted from our research on software processes is a
valuable contribution to the process mining area; and (3) an adequate tool now exists
to support software process mining, and this tool can be used for real projects.
Moreover, in this work we show that the issues and solutions discussed in the
context of software engineering processes are also relevant for other research
domains in which business audit data is recorded and maintained, such as business
process management, product data management, and enterprise resource planning.

Acknowledgements
Recalling the long period of exciting and challenging work on my PhD, I realize that
a PhD thesis is much more than just a book: it is a product of communication
and cooperation among many creative and outstanding people. Fortunately, I have been
working with such people for many years. I am lucky to have been educated by
the scientific community in both a professional and a personal sense.
Firstly, I would like to express my gratitude to my doctoral advisor, Prof. Dr.
Wilhelm Schäfer, for giving me the chance to do my PhD in Germany in a multi-cultural
international scientific environment. I am thankful to Wilhelm for his valuable advice,
which helped me to look for practical applications of science and will also help me to
apply science in practice in the future. It is simply impossible to imagine a better
"Doktorvater".
I am very grateful to Assoc. Prof. Dr. Ekkart Kindler (now at DTU in Copenhagen)
for contributing greatly to my education, and for his invaluable assistance and
patience in showing me how to achieve quality in work and how to carry out
research. It was very exciting to work with Ekkart. For me, Ekkart was and will always
be an "ideal" scientist and university lecturer.
Besides scientific supervision, Wilhelm and Ekkart gave me moral support during
all the years of my research at the University of Paderborn. They helped me to
decide on the best way to continue my career after the PhD. I highly appreciate it.
I am proud that I was guided by and worked together with such talented scientists and
creative personalities.
I would like to thank Prof. Dr. Wil van der Aalst for his interesting scientific
ideas and for his style of work and communication with people. I always admired how
Wil could make 25 hours out of a day, and I tried to learn from him. The time
I spent in Eindhoven was very helpful for my research.
I am also thankful to Prof. Dr. Gregor Engels and Prof. Dr. Jürgen Gausemeier.
They gave me important comments on the initial ideas of the thesis which were