Trace-based Debugging and Visualisation of
Concurrent Java Programs with UML
Katharina Mehner
Paderborn, 10 February 2005
Doctoral thesis submitted to the Faculty of Electrical Engineering,
Computer Science and Mathematics in partial fulfilment of the
requirements for the degree of Dr. rer. nat.
Supervisors:
Prof. Dr. Gregor Engels, Universität Paderborn
Prof. Dr. Stefan Jähnichen, Technische Universität Berlin
Abstract
This thesis describes an approach for automated detection of concurrent live-
ness failures in the execution of Java programs.
Concurrent programs are highly prone to failure because of the inher-
ent nondeterminism. Developers of concurrent Java programs are not well
supported in detecting concurrency failures, i.e. failures that are due to inter-
actions between multiple threads. These failures are neither well documented
nor do tools like debuggers allow developers to identify them at runtime.
This thesis analyses and classifies liveness failures, a special kind of con-
currency failures, and the associated potentials in Java. A UML statechart
is developed that models the interaction of Java threads. Liveness failures
and potentials are specified formally in terms of the states controlling the
interaction of threads and in terms of the events exchanged by interacting
threads.
Detection algorithms are developed to identify the specified failures in a
program execution. A UML profile extending UML interaction diagrams is
developed to visualise the execution of concurrent Java programs and de-
tected liveness failures and potentials.
In order to deploy the algorithms and the UML profile, tool support
concepts are provided. This involves the specification of a trace format and
a tracing method to collect execution data from a running Java program, and
the specification of methods to analyse the trace and to visualise the trace
and the analysis results.
The concepts are implemented in the JAVIS prototype, which consists of
a Java tracer with an analysis facility for monitoring liveness in concurrent
Java programs, and a plug-in extension to the UML CASE tool Together
for importing and displaying concurrent Java traces including failures and
potentials.
Acknowledgments
I would like to thank Gregor Engels for giving me the opportunity to write
this thesis and for his enduring support. His advice and critical comments
were a great help in shaping this thesis.
My thanks also go to Stephan Herrmann and Stefan Jähnichen for supporting
me in the final stage of the thesis. Stefan Jähnichen kindly agreed
to be my second supervisor.
I wish to thank all my former colleagues at the University of Paderborn
for providing a good working atmosphere and engaging in fruitful discussions
and a number of successful collaborations.
I am grateful to all my colleagues at the Technical University of Berlin for
their support while I was completing the thesis, especially for their critical
proofreading and helpful suggestions.
My work on the thesis also benefited from the warm welcome and the
cordial atmosphere I experienced during my stay at Lancaster University.
Finally, I would like to thank my family and friends for their love, en-
couragement and understanding.
To my parents
Contents
List of Figures
List of Tables
1 Introduction
2 Motivation
2.1 Background
2.2 Problem Description
2.3 Goals
2.4 Synopsis
3 Requirements for Automated Failure Detection
3.1 Failure Description and Classification
3.1.1 Purpose
3.1.2 Scope
3.2 Automated Detection
3.2.1 Contextual Requirements
3.2.2 Data Collection Requirements
3.2.3 Data Analysis Requirements
3.2.4 Failure Visualisation Requirements
3.3 Summary
4 Concurrent Programming in Java
4.1 Thread Principles
4.2 Thread Lifecycle
4.3 Unsynchronised Interaction
4.4 Synchronised Interaction
4.5 Summary
5 Liveness Failures and Potentials in Java
5.1 Terminology
5.1.1 Error, Mistake, Fault, and Failure
5.1.2 Potential for Failure
5.1.3 Failure and Symptom
5.2 Liveness Failures
5.2.1 Liveness and Safety
5.2.2 Concurrent Java Liveness Failures
5.2.3 Potentials for Java Liveness Failures
5.2.4 Classification of Java Liveness Failures and Potentials
5.2.5 Failures in Concurrent Application Logic
5.3 Summary
6 A Model of Thread Synchronisation
6.1 Model Requirements
6.1.1 Concurrent Liveness Failure Characteristics
6.1.2 Dynamics of Thread Synchronisation
6.1.3 Control Flow States
6.1.4 Summary of Model Requirements
6.2 Related Work
6.2.1 The Java Language Specification
6.2.2 Formal Java Semantics
6.3 A Statechart Model for Thread Synchronisation
6.3.1 The UML Statechart Approach
6.3.2 Classification of Thread Lifecycle Methods
6.3.3 Principles for Designing States and Transitions
6.3.4 The Resulting Statechart
6.3.5 System State and History
6.4 Formalising Liveness Failures and Potentials
6.4.1 Formal Description of Failures
6.4.2 Formal Description of Potentials
6.5 Summary
7 Trace-based Data Collection
7.1 Tracing Requirements
7.1.1 Format, Schema, and Encoding
7.1.2 Trace Generation
7.2 Related Work
7.2.1 Basic Techniques
7.2.2 Trace Formats
7.2.3 Code Instrumentation
7.2.4 Runtime APIs
7.2.5 Debuggers
7.2.6 Trace-based Tools
7.2.7 Comparison
7.3 JAVIS-Tracer for Concurrent Java Programs
7.3.1 Trace Format
7.3.2 Trace File Generation
7.3.3 Tracer Architecture
7.4 Summary
8 Trace-based Failure and Potential Analysis
8.1 Analysis Requirements
8.1.1 Functional Requirements
8.1.2 Time and Space Complexity
8.2 Related Work
8.2.1 Deadlock Detection
8.2.2 Deadlock Potential Detection
8.2.3 Failures and Potentials Involving wait() or join()
8.3 JAVIS-Algorithms
8.3.1 Cycle Detection
8.3.2 Missed Notification
8.3.3 Potentials for Cyclic Dependencies
8.4 Summary
9 Trace-based Failure and Potential Visualisation
9.1 Visualisation Requirements
9.1.1 Trace Visualisation
9.1.2 Failure and Potential Visualisation
9.1.3 Visualisation Environment
9.2 Related Work
9.2.1 Visualisation of Concurrent Programs
9.2.2 Object-Oriented Visualisation
9.2.3 UML-based Visualisation
9.2.4 Comparison
9.3 JAVIS-Visualisation
9.3.1 UML Profile for Java Traces, Failures, and Potentials
9.3.2 Visualisation Architecture
9.4 Summary
10 Using the JAVIS Prototypes
10.1 Example for Automated Failure Detection
10.1.1 The Banking Example Revisited
10.1.2 Tracing
10.1.3 Automated Deadlock Detection
10.1.4 Trace and Deadlock Visualisation
10.2 Example for General Purpose Tracing
10.2.1 A Simulation Software Example
10.2.2 Tracing and Visualising JEVOX
10.3 Summary
11 Conclusion
11.1 Contributions
11.2 Evaluation
12 Outlook
12.1 Remaining Issues
12.2 Evolution
12.3 Related Domains
Bibliography
Index
List of Figures
2.1 Banking Example
2.2 Banking Example with Deadlock
2.3 Cyclic Resource Dependency and Wait-For Graph
2.4 Source Code View
2.5 Call Stack View
2.6 Monitor View
2.7 Monitor View with Callstacks
2.8 Trace Example
2.9 Thesis Goals
3.1 Requirements for Automated Failure Detection
3.2 First Refinement of Goals
4.1 Banking Simulation Classes
4.2 Unsynchronised Access of Accounts
4.3 Lost Update
4.4 Deriving New Threads
5.1 Error Terminology
5.2 Extending the Basic Terminology with Potential
5.3 Potential for Deadlock
5.4 Deriving Potential Conditions from Failure Conditions
5.5 Safety and Liveness Terminology
5.6 Deadlock
5.7 Missed Notification
5.8 Balancing wait() and notify()
5.9 Nested Monitor Lockout
5.10 Circular Join
5.11 Self Join
5.12 Join-induced Deadlock
5.13 Livelock
6.1 Failures and Potentials as Legal System Behaviour
6.2 Thread Lifecycle Statechart
6.3 Thread Lifecycle Statechart with Guards and Actions
6.4 Object Synchronisation Behaviour Statechart
7.1 Requirements for Data Collection
7.2 Levels of Instrumentation
7.3 The Java Platform Debugger Architecture (JPDA)
7.4 Trace Format Example
7.5 Class Diagram of the Tracer
8.1 Requirements for Data Analysis
8.2 Cycle in Graph Structure
9.1 Requirements for Data Visualisation
9.2 GThread History View
9.3 GThread Mutex View
9.4 Jinsight Execution View
9.5 Jinsight Execution View - Zoom
9.6 Jinsight Reference Pattern
9.7 Tagged Values for Messages in Interaction Diagrams
9.8 Graphical Notation for Stereotypes
9.9 Cyclic Failures using Stereotypes
9.10 Class Diagram of the Together Visualisation
10.1 Banking Application Class Diagram
10.2 Sequence Diagram Generated by Together
10.3 Trace Generation Dialogue
10.4 Part of the Trace
10.5 Importing a Trace in Together
10.6 Sequence Diagram of a Trace
10.7 Collaboration Diagram with Involved Methods
10.8 Collaboration Diagram with Deadlock
10.9 Complete Trace Visualisation in Together
10.10 Part of the Remote Trace
List of Tables
4.1 Signature of sleep()
4.2 Signature of Interrupt Methods
4.3 Signature of join()
4.4 Keyword synchronized
4.5 Signature of wait() and notify/All()
4.6 Java Concurrency Concepts
5.1 Liveness Failures
5.2 Potential Liveness Failures
6.1 Mapping Java Behaviour to Statechart Events
7.1 Trace Collection Approaches
9.1 UML Profile for Concurrent Java Traces: Tags
9.2 UML Profile for Concurrent Java Traces: Stereotypes
Chapter 1
Introduction
Concurrency is an important concept for developing modern software sys-
tems. While it is indispensable for most applications in distributed and
embedded programming, in general it is a useful concept that allows systems
to be structured efficiently, intuitively and flexibly.
In the past, it was mainly operating systems that provided support for
concurrent programming. Since then, the level of abstraction has risen. Today,
thread libraries [But97] are used with widespread programming languages such as
C/C++. The most recent approach is to integrate concurrency tightly at the
language level, as in Java [GJSB00]. Note that concurrent
programming is not restricted to a particular programming paradigm, al-
though we focus on imperative object-oriented programming languages.
Despite advances in the field of concurrent programming, concurrency
still causes programmers problems. From time to time, concurrency problems
cause severe incidents. A famous example is the software failure of the
Mars Pathfinder, which experienced a priority-inversion problem [Ree98]. In
such a situation, a high-priority computation is waiting for a resource held by
a low-priority computation. The low-priority computation cannot finish as
long as medium-priority computations are executing [SRL90]. The Pathfinder
system tried to handle the problem by resetting again and again, which
resulted in loss of data [Jon97]. Fortunately, the problem was quickly analysed
and could be corrected in situ by transmitting a software patch. It has
also been reported that the problem was present in test runs but was not
detected because the test-analysis tools failed to address it. While concurrency
problems in embedded systems tend to have disastrous effects, non-safety-critical
and non-real-time systems can also experience concurrency problems.
In 1999, COMPAQ, a provider of a thread library, found that their users
often blamed the library for containing bugs, not realising that they had
constructed deadlock-prone programs [Har00].
Concurrency problems are difficult to prevent because concurrent pro-
grams are inherently nondeterministic. Their behaviour at runtime, includ-
ing undesirable behaviour, is difficult to predict. In 1977, Lamport provided
a general classification of concurrency problems into liveness and safety prob-
lems [Lam77, OL82], i.e. problems which result in lack of progress of a pro-
gram and problems which violate the consistency of program data. Other
authors have provided necessary and sufficient conditions for individual
liveness problems such as deadlock [Hol71]. Nevertheless, the above examples
demonstrate that programmers did not always deal with these problems
adequately. Two main reasons for this are conceivable: insufficient training,
and insufficient tool support, since even experts cannot cope manually with
the complexity of concurrent programs. Programmers need more than editors
and compilers: the success of a complex paradigm such as concurrency
also depends to a large extent on enhanced tool support.
An area where tool support has not been evolving at the same pace as
concepts is debugging for concurrent object-oriented programming languages.
It is not a new observation that debuggers in particular fail to keep pace with
the needs of software development, especially as software becomes increasingly
complex. From time to time, state-of-the-art surveys document an almost
complete lack of progress in that area. In 1997, ACM published a Communications
issue titled “The Debugging Scandal and What to Do About It” [ACM97],
which reported that too much buggy software still reaches its users and that
debugging support had not changed much in 30 years of programming, with
inserting print statements into code still being the tool of choice. Better tools
were expected to save significant effort, even for experienced programmers.
At that time, new approaches such as debuggers which keep a history of a
program or provide improved visualisations were only present in academia.
To date, such tools still exist mainly in research. In 2003, ACM Queue
[ACM03], although dedicated to future trends in computing, still saw a need
to address the basic topic of debugging concurrent and distributed software
in depth.
In the case of concurrency, today’s debuggers lack concepts for dealing
with multiple threads and for automatically detecting many well-known
concurrency problems. This is particularly problematic because testing and
debugging are crucial and time-consuming activities, and a lack of support
hampers the successful use of concurrent object-oriented languages such as
Java. This thesis examines how the situation can be improved for Java.
Safety problems have been analysed in depth in research, and many
algorithms and tools have been proposed for them, also for Java, such
as in [CLL+02, vPG01]. In contrast, Java liveness problems have not yet
been dealt with in depth. Java liveness problems are neither completely
documented nor extensively supported by debuggers of integrated development
environments (IDE) or by dynamic analysis tools. Also, the visual capabilities
of the tools supporting the development of concurrent Java programs
are not adequate when dealing with multiple threads.
The goal of this thesis is therefore to provide a detailed analysis of the
background of Java liveness failures and to develop concepts for automati-
cally detecting liveness problems in running programs and for improving the
visualisation of tools for debugging Java programs.
Chapter 2
Motivation
In this chapter, we motivate the need for improving concepts for debug-
ging concurrency problems. In section 2.1 we give an overview of the main
concepts of concurrent programming and typical concurrency problems. In
section 2.2 we demonstrate that there is a lack of concepts for debugging
concurrency errors in Java. Neither are the errors well documented nor do
tools such as debuggers account for their special characteristics. We conclude
the motivation by stating the goals of this thesis in section 2.3. Section 2.4
gives an overview of this thesis.
2.1 Background
In this section, we will introduce the main concepts of concurrent program-
ming. Then we will shed light on the problems in concurrent programs.
Concurrent programs suffer from two kinds of errors, liveness and safety er-
rors. As the focus of this thesis is on liveness errors, in this section, we will
illustrate the characteristics of liveness errors with an example.
Concurrent Programming
Concurrency in an imperative program is the seeming or actual simultaneity
of control flows. The multiple flows of control in a concurrent program are
also called threads of control or simply threads. In general, the number of
threads in a program is not limited and not determined at compile time.
Threads do not merely work in isolation from each other; they can interact
in a number of ways, either by synchronisation or via shared data.
The behaviour of a concurrent program is nondeterministic in the sense
that its execution order is only partially defined. More specifically, the timing
of the progress of threads, both individually and relative to each other, is
undetermined. No assumptions can be made about a specific timing. As a result,
the order in which threads interact is not predictable. This nondeterminism is
the reason why concurrent program behaviour is, in general, not reproducible:
a concurrent program may, but need not, deliver different results when run
repeatedly on the same input data.
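The effect of this partial ordering can be observed even in a tiny program. The following is a minimal sketch, assuming nothing beyond the standard library; the class name InterleavingDemo is hypothetical. Two threads append entries to a shared, synchronised list: each thread's own entries always appear in program order, but how the two sequences interleave is decided by the scheduler and may differ from run to run.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class: two threads log their steps into a shared list.
// The interleaving of "A:i" and "B:i" entries is not determined by the program.
class InterleavingDemo {

    static List<String> run(int stepsPerThread) {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        Runnable worker = () -> {
            String name = Thread.currentThread().getName();
            for (int i = 0; i < stepsPerThread; i++) {
                log.add(name + ":" + i); // each add is atomic, the order between threads is not fixed
            }
        };
        Thread a = new Thread(worker, "A");
        Thread b = new Thread(worker, "B");
        a.start();
        b.start();
        try {
            a.join(); // wait for both threads before reading the log
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        // Two runs on the same input may print differently interleaved lists.
        System.out.println(run(5));
        System.out.println(run(5));
    }
}
```

Only the number of entries and the per-thread order are fixed; any assertion about the global order would make a test itself nondeterministic.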
Today, most commercially available object-oriented programming envi-
ronments support concurrent programming. For example, C/C++ provides
libraries for concurrent programming [But97]. Relatively new is the inte-
gration of concurrency into a commercial language itself, such as in Ada95
[Ada95, Nag03], Modula-3 [Mod] or Java [GJSB00]. Non-commercial lan-
guages integrating concurrency already have a longer tradition. Actor lan-
guages [AWY93] were among the first to integrate concurrency with objects,
providing active objects or actors. These languages hide many low-level issues
of thread control, as threads are automatically instantiated and deleted. The
commercial languages above followed these ideas by merging threads of control
with the object-oriented paradigm, while keeping a more explicit, low-level
style of control.
Among the commercial languages, Java plays the most important role. It is
increasingly gaining importance in academia and industry, mainly owing to
its platform independence and its support for distributed and internet
computing. The tight integration of concurrency features into Java
emphasises the importance of concurrent programming.
Concurrency is required for domains such as graphical user interfaces
(GUI), application frameworks, distributed systems, client/server-computing,
embedded systems, and heterogeneous component-based systems. Owing to the
integration of concurrency in Java, more and more complex concurrent
applications will be programmed in Java, in addition to small-scale Java
programs such as web applets. Concurrency is not only relevant to complex
applications; it is also used in simple Java web applets. For instance, a
time-consuming download is executed in a separate thread because it is not
desirable that the applet’s main thread is blocked until the download is either
completed or aborted. A large number of Java programmers are therefore
confronted with concurrent programming. Not only do they have to be trained
for this, but software development tools should also support their activities.
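The download scenario can be illustrated with a small sketch. It assumes a simulated download (a fixed sleep stands in for network I/O), and the names BackgroundDownload, simulatedDownload() and fetch() are hypothetical. The calling thread starts the download in a separate thread and, instead of blocking until it finishes, keeps doing its own work between short bounded waits via join(long).

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: a slow download runs in its own thread so that the
// calling thread (e.g. an applet's main thread) stays responsive.
class BackgroundDownload {

    // Stand-in for real network I/O; the 200 ms delay is an arbitrary assumption.
    static String simulatedDownload() {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "aborted";
        }
        return "payload";
    }

    static String fetch() {
        AtomicReference<String> result = new AtomicReference<>();
        Thread downloader = new Thread(() -> result.set(simulatedDownload()), "downloader");
        downloader.start();
        int ticks = 0;
        while (downloader.isAlive()) {
            ticks++; // the calling thread is free, e.g. to repaint a progress indicator
            try {
                downloader.join(20); // wait at most 20 ms, then continue working
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                downloader.interrupt(); // abort the download as well
                break;
            }
        }
        System.out.println("calling thread stayed active for " + ticks + " ticks");
        return result.get();
    }

    public static void main(String[] args) {
        System.out.println(fetch());
    }
}
```

A real applet or GUI would react to events rather than poll, but the structure is the same: the long-running activity gets its own thread of control.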
The concurrency features of Java are often regarded as insufficiently de-
signed [Hol00, Lea03]. The recurring complaints by expert users have had the
effect that the features are still evolving, although at a slow pace. While pre-
vious versions only made small amendments such as deprecating features, the
upcoming version of Java 1.5 adds a few new features [Jav04] in the form of
dedicated classes for higher-level synchronisation concepts, for instance,
a lock class. Independent providers have also started to develop more
sophisticated libraries [Lea03]. Neither the additional concepts nor the
libraries can be expected to make concurrent programming error-free or to
completely replace the use of the basic concurrency features of Java.
Therefore, support for programming with the basic Java concurrency
features will always be needed.
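As an illustration of such a lock class, the following sketch uses java.util.concurrent.locks.ReentrantLock, which is part of the Java platform from version 1.5 on. The class LockedAccount and its methods are hypothetical and only meant to show the usage pattern, not a recommended design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical account guarded by an explicit lock instead of synchronized methods.
class LockedAccount {
    private final Lock lock = new ReentrantLock();
    private long balance;

    void deposit(long amount) {
        lock.lock();
        try {
            balance += amount;
        } finally {
            lock.unlock(); // must be released explicitly, also when an exception occurs
        }
    }

    long balance() {
        lock.lock();
        try {
            return balance;
        } finally {
            lock.unlock();
        }
    }

    // Lets several threads deposit concurrently and returns the final balance.
    static long depositConcurrently(int threads, int depositsPerThread) {
        LockedAccount account = new LockedAccount();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < depositsPerThread; i++) {
                    account.deposit(1);
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return account.balance();
    }

    public static void main(String[] args) {
        System.out.println(depositConcurrently(4, 10_000)); // prints 40000
    }
}
```

Unlike a synchronized method, nothing forces the call to unlock(): omitting the finally block leaves the lock held forever, which is itself a liveness failure. This is one reason why the new classes do not by themselves make concurrent programming error-free.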
Concurrency Errors
Concurrent programs have a high potential for errors specifically related to
concurrency. These errors are not application-specific; they can appear in any
concurrent program and are caused by the programmer. The erroneous code
compiles successfully. Only when the program is executed does the error
cause the program to behave other than desired. Typically, the program does
not crash as a result of the error but merely exhibits unwanted behaviour.
Often, the only remedy is then to abort the program.
Concurrency errors can be classified according to their effects on the run-
ning program. Safety errors have the effect that data becomes inconsistent
as a consequence of simultaneous access of data from concurrent threads.
Liveness errors have the effect that one or more threads stop executing while
the rest of the program continues. Those threads stop as a consequence of
synchronised interaction with other threads. The unwanted effect in a run-
ning program which is caused by an error in the code is also called failure.
It is often the case that safety and liveness failures have no immediate effect
on the system. Only after a while may the system detect, or the user
experience, the effects of inconsistencies or blocked computations.
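A safety failure of the kind described above can be sketched with a shared counter (a "lost update"). This is a minimal sketch with hypothetical names: the unprotected increment is a non-atomic read-modify-write, so concurrent increments can overwrite each other, whereas the increment inside a synchronized method is mutually exclusive and never loses an update.

```java
// Hypothetical sketch of a lost update: two threads increment shared counters,
// one without and one with synchronisation.
class LostUpdateDemo {
    private int unsafeCount; // accessed without synchronisation
    private int safeCount;   // accessed only inside synchronized methods

    private synchronized void safeIncrement() {
        safeCount++;
    }

    private synchronized int safeCount() {
        return safeCount;
    }

    static int[] run(int incrementsPerThread) {
        LostUpdateDemo counter = new LostUpdateDemo();
        Runnable worker = () -> {
            for (int i = 0; i < incrementsPerThread; i++) {
                counter.unsafeCount++;   // read, add, write: concurrent updates may be lost
                counter.safeIncrement(); // mutually exclusive: no update is lost
            }
        };
        Thread t1 = new Thread(worker);
        Thread t2 = new Thread(worker);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // join() guarantees that the writes of both threads are visible here.
        return new int[] { counter.unsafeCount, counter.safeCount() };
    }

    public static void main(String[] args) {
        int[] counts = run(1_000_000);
        System.out.println("unsynchronised: " + counts[0] + ", synchronised: " + counts[1]);
    }
}
```

The synchronised count is always twice the number of increments per thread; the unsynchronised count can be smaller and may change from run to run. Note that the program neither crashes nor reports anything: the inconsistency only becomes visible when the data is inspected.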
Safety and liveness failures can be very severe. A so-called safety-critical
system that acts in an incorrect and unexpected way due to an inconsistency
or that does not react any longer due to a liveness error can be a danger to
human beings. An erroneous production control system can damage goods
or delay delivery. Even non-safety-critical software can cause severe
problems when it corrupts data or stops responding to requests.
Safety and liveness failures are a serious problem in any concurrent
programming language because of the inherent nondeterminism. A safety or
liveness problem does not necessarily occur in every execution of the program,
which makes these failures extremely difficult to find.
In this thesis we focus on concurrency liveness failures in Java programs.
As they are a consequence of using synchronisation, they require different
concepts than safety failures. Well-known liveness failures are deadlocks,
nested monitor lockouts and other monitor problems. Other liveness failures
have rarely been documented, or not at all. A comprehensive and precise
description of liveness failures is still missing for Java.
Figure 2.1: Banking Example (sequence diagram with lifelines Thread-2, Thread-3, Account1 and Account2; synchronized calls transfer(Account2, 76), withdraw(76), transfer(Account1, 35) and withdraw(35); steps numbered 1 to 6)
We distinguish liveness failures due to synchronisation from failures where
threads do not make progress because of the scheduler. Each Java Virtual
Machine has a scheduler component which assigns CPU time to threads, e.g.
by time-slicing. The scheduler follows a strategy to determine when to assign
which thread to the CPU. Such a strategy can be unfair and may therefore
have the effect that a thread is never assigned the CPU. This has the same
effect on a running program as one of the liveness failures described above.
Failures stemming from scheduling are not our focus; they need to be dealt
with differently from those stemming from synchronisation concepts.
Example
A typical liveness failure is a deadlock. A deadlock is an execution state in
which threads are blocked, each waiting for a resource held by another of these
threads. These resources are never released. This state is the result of a
specific sequence of interactions between threads. We illustrate this situation with an example.
To describe a multithreaded sequence of actions in a program execution we
can use a UML sequence diagram [UMLa].
We start by looking at an example of a successful bank transfer, a typical
scenario in a banking application (see Fig. 2.1). Two threads of control con-
currently transfer money between two bank accounts in opposite directions.
The sequence of actions in this specific scenario is as follows.
1. The thread object Thread-2 calls transfer(Account2, 76) on Account1 to
transfer an amount of 76 units of currency from Account2 to Account1.
2. Account1 calls withdraw(76) on Account2 to withdraw the amount from
Account2.
3. When withdraw(76) returns, the amount withdrawn is added to
Account1 and transfer() returns.
4. Another thread object, named Thread-3, calls transfer(Account1, 35) on
the object Account2 to transfer an amount of 35 units of currency from
Account1 to Account2.
5. This triggers a withdrawal of 35 units from Account1.
6. Again, withdraw() and transfer() return.
Figure 2.2: Banking Example with Deadlock (sequence diagram with the same lifelines and synchronized calls as Fig. 2.1; steps numbered 1 to 4)
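The successful scenario of Fig. 2.1 can be sketched in Java as follows. The method names transfer() and withdraw() and the thread names follow the figure; the initial balance of 100 per account, the missing overdraft check and the driver method are assumptions made only for this illustration.

```java
// Sketch of the banking example of Fig. 2.1 (balances and driver are assumptions).
class BankingExample {

    static final class Account {
        private long balance;

        Account(long balance) {
            this.balance = balance;
        }

        // Locks this account; withdraw() then additionally locks the source account.
        synchronized void transfer(Account from, long amount) {
            balance += from.withdraw(amount);
        }

        synchronized long withdraw(long amount) {
            balance -= amount;
            return amount;
        }

        synchronized long balance() {
            return balance;
        }
    }

    // Replays the order of Fig. 2.1: Thread-3 starts only after Thread-2 has finished.
    static long[] runSequentialScenario() {
        Account account1 = new Account(100);
        Account account2 = new Account(100);
        Thread thread2 = new Thread(() -> account1.transfer(account2, 76), "Thread-2");
        Thread thread3 = new Thread(() -> account2.transfer(account1, 35), "Thread-3");
        try {
            thread2.start();
            thread2.join();
            thread3.start();
            thread3.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] { account1.balance(), account2.balance() };
    }

    public static void main(String[] args) {
        long[] b = runSequentialScenario();
        System.out.println("Account1 = " + b[0] + ", Account2 = " + b[1]); // Account1 = 141, Account2 = 59
    }
}
```

Because the driver starts Thread-3 only after Thread-2 has terminated, it always produces the successful run. If both threads are started at once, other execution orders, including the one of Fig. 2.2, become possible.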
The methods transfer() and withdraw() both change an object’s state. In
order to avoid inconsistencies, simultaneous calls to transfer() and withdraw()
on the same object have to be mutually exclusive, i.e. only one call can be
executed at a time on the same object. The prefix synchronized (see Fig.
2.1) denotes that a method has been declared mutually exclusive: on a given
object, it can only be executed mutually exclusively with all other methods
declared as synchronized. Here, not only a single method but an entire object
is a mutually exclusive resource, which can be assigned to at most one calling
thread at a time. When a mutually exclusive method call is granted access to
an object, we also say that it locks the object. When this call returns, the
lock is released. When a thread calls a mutually exclusive method on an object
which is already locked, it is blocked and cannot progress until it has acquired
the lock. A thread waiting for a lock is hence waiting for the thread holding
this lock to make progress and release it.
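The blocking of a thread on an already locked object can be observed directly. The following sketch assumes Java 1.5 or later, where Thread.getState() reports a thread waiting to enter a synchronized method or block as BLOCKED; the class LockContentionDemo and its helper are hypothetical. One thread keeps an object locked while a second thread tries to lock the same object.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch: observe a thread waiting for a monitor lock held by another thread.
class LockContentionDemo {

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    static Thread.State observeWaitingThread() {
        Object account = new Object(); // stands in for a locked Account object
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread holder = new Thread(() -> {
            synchronized (account) {
                lockHeld.countDown();  // signal: the lock is now held
                awaitQuietly(release); // keep the lock until told to release it
            }                          // leaving the block releases the lock
        }, "holder");
        Thread waiter = new Thread(() -> {
            synchronized (account) {
                // would work on the account once the lock has been acquired
            }
        }, "waiter");

        holder.start();
        awaitQuietly(lockHeld);
        waiter.start();
        Thread.State observed;
        while ((observed = waiter.getState()) != Thread.State.BLOCKED) {
            Thread.yield(); // wait until the waiter has reached the locked object
        }
        release.countDown(); // the holder leaves its synchronized block
        try {
            holder.join();
            waiter.join();   // the waiter can now acquire the lock and finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return observed;
    }

    public static void main(String[] args) {
        System.out.println(observeWaitingThread()); // prints BLOCKED
    }
}
```

The waiter only terminates because the holder eventually releases the lock; if the holder itself waited for a lock held by the waiter, neither thread could ever continue.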
The result of the activities of the threads in the above example depends
on the relative timing of the threads, i.e. on the order in which method
calls for each shared object occur. In Fig. 2.1, both calls to transfer() return
successfully because the first thread has completed the call to withdraw() on
Account2 before the second thread calls transfer() on Account2. Figure 2.2
gives an example with a different execution order.
1. Thread-2 calls transfer(Account2, 76) on Account1, thereby locking Account1.
2. Thread-3 calls transfer(Account1, 35) on Account2, thereby locking Account2.
3. Within its transfer, Thread-2 calls withdraw(76) on Account2, tries to lock Account2 and is blocked.
4. Within its transfer, Thread-3 calls withdraw(35) on Account1, tries to lock Account1 and is blocked.
Because each account is already locked by the other thread, both threads are blocked. They cannot proceed without the locks and thus wait for the locks to be released. The blocking of threads on synchronized calls is indicated by the two additional empty activation bars in Fig. 2.2. Without the locks, the threads cannot complete withdraw(). Consequently, they will never release the locks which they were granted when calling transfer().
This situation is called a deadlock because of the cyclic dependency between the threads: Thread-2 holds the lock of Account1 and is blocked waiting for the lock of Account2, which is held by Thread-3; Thread-3, in turn, is blocked waiting for the lock of Account1, which is held by Thread-2.
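Because the interleaving of Fig. 2.2 depends on timing, a plain run rarely produces it. The following sketch, written under our own assumptions, forces it with a java.util.concurrent.CountDownLatch inside transfer() (a test-only device that is not part of the example) and then asks the JVM for monitor deadlocks through the standard ThreadMXBean.findMonitorDeadlockedThreads() method, which returns null when no deadlock exists. The class ForcedDeadlock and the method runAndDetect() are hypothetical names. The blocked threads are daemon threads, so the JVM can still exit.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical test harness; the latch is an assumption used only to force Fig. 2.2.
class ForcedDeadlock {
    static class Account {
        private final CountDownLatch bothLocked; // test-only: forces the unlucky timing
        private long balance = 100;

        Account(CountDownLatch bothLocked) {
            this.bothLocked = bothLocked;
        }

        synchronized void transfer(Account other, long amount) throws InterruptedException {
            bothLocked.countDown();
            bothLocked.await(); // wait until the other thread also holds its first lock
            balance += other.withdraw(amount); // blocks: the other lock is never released
        }

        synchronized long withdraw(long amount) {
            balance -= amount;
            return amount;
        }
    }

    static void startTransfer(String name, Account target, Account from, long amount) {
        Thread t = new Thread(() -> {
            try {
                target.transfer(from, amount);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, name);
        t.setDaemon(true); // the JVM may exit although these threads never finish
        t.start();
    }

    // Runs the scenario of Fig. 2.2 and returns the number of threads the JVM
    // reports as deadlocked on monitors (0 if none is reported before the timeout).
    static int runAndDetect(long timeoutMillis) {
        CountDownLatch latch = new CountDownLatch(2);
        Account account1 = new Account(latch);
        Account account2 = new Account(latch);
        startTransfer("Thread-2", account1, account2, 76);
        startTransfer("Thread-3", account2, account1, 35);
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = mx.findMonitorDeadlockedThreads(); // null if no deadlock
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + runAndDetect(5000));
    }
}
```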
Deadlocks are typically depicted by the two graphs in Fig. 2.3, which have been proposed by [Tan97] to illustrate deadlocks in operating systems. The graph on the left-hand side is a so-called resource dependency graph; it depicts which threads hold which resources and which resources they wait for. The graph on the right-hand side abstracts from the resources involved and is called a wait-for graph. It describes which thread is waiting for which other thread to release a resource.
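The step from the resource dependency graph to the wait-for graph can be sketched in a few lines of Java. The representation is our own assumption: the resource dependency graph is given as two maps, one stating which thread holds which resource and one stating which resource each blocked thread waits for; WaitForGraph and its methods are hypothetical names. Because a thread blocked on a lock waits for exactly one resource, every node of the wait-for graph has at most one outgoing edge, so a cycle can be found by simply following edges.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the map-based graph representation is an assumption.
class WaitForGraph {
    // holder: resource -> thread holding it; waitsFor: thread -> resource it is blocked on.
    // Result: thread -> thread it waits for (the holder of the awaited resource).
    static Map<String, String> deriveWaitFor(Map<String, String> holder, Map<String, String> waitsFor) {
        Map<String, String> edges = new HashMap<>();
        for (Map.Entry<String, String> e : waitsFor.entrySet()) {
            String owner = holder.get(e.getValue());
            if (owner != null && !owner.equals(e.getKey())) {
                edges.put(e.getKey(), owner);
            }
        }
        return edges;
    }

    // Follows the single outgoing edge of each thread; returns the threads on a cycle, or an empty list.
    static List<String> findCycle(Map<String, String> waitFor) {
        for (String start : waitFor.keySet()) {
            List<String> path = new ArrayList<>();
            String current = start;
            while (current != null && !path.contains(current)) {
                path.add(current);
                current = waitFor.get(current);
            }
            if (current != null) {
                return new ArrayList<>(path.subList(path.indexOf(current), path.size()));
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        // Resource dependency graph of Fig. 2.3.
        Map<String, String> holder = Map.of("Account1", "Thread-2", "Account2", "Thread-3");
        Map<String, String> waitsFor = Map.of("Thread-2", "Account2", "Thread-3", "Account1");
        Map<String, String> waitFor = deriveWaitFor(holder, waitsFor);
        System.out.println("wait-for edges: " + waitFor);
        System.out.println("cycle: " + findCycle(waitFor));
    }
}
```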
Note that a deadlock in a Java program cannot be resolved except by externally aborting the entire program. The deadlock is a failure which the programmer has built into the code, and hence it can be avoided at the programming level. In Java, a deadlock can happen exactly as described here.
Note that the two different execution orders in Fig. 2.1 and Fig. 2.2 result from the very same code. The code has a deadlock potential, and this potential can already be seen in a successful execution such as the one in Fig. 2.1: Thread-2 locks Account1 before Account2, whereas Thread-3 locks Account2 before Account1. In general, locking objects in an arbitrary order always bears the potential of cyclic dependencies.
Figure 2.3: Cyclic Resource Dependency and Wait-For Graph (left: resource dependency graph with Thread-2, Thread-3, Account1 and Account2 and edges labelled "acquires locks" and "wait-for"; right: wait-for graph of Thread-2 and Thread-3)
Characteristics of Liveness Failures
The characteristics of the deadlock are also typical of other liveness failures, such as nested monitor lockout or notification problems, which will both be discussed later. The deadlock characteristics also illustrate the inherent complexity of this kind of failure. As an example, consider the deadlock from Fig. 2.2. A deadlock can be categorised along the following dimensions:
•Dynamicity: The deadlock is a state in a running program.
In the example, the deadlock only becomes manifest when withdraw() is called for the second time and blocks.
•Threads: The deadlock involves more than one thread.
In the example, the two threads involved are Thread-2 and Thread-3.
•Interactions: The deadlock involves more than one interaction per thread involved.
The interactions of Thread-2 involved are synchronized transfer() and synchronized withdraw(), and analogously for Thread-3.
•Timing: The deadlock develops over a certain time span of the program execution and depends on the specific timing of the execution order during that time. More precisely, it depends on the relative timing of interactions on objects shared between threads.
In the example, an arbitrarily long period of time can elapse between the first and the second interaction of each thread. For the deadlock to occur, it is only necessary that the second call to synchronized transfer(), on Account2, takes place before the call to synchronized withdraw() on Account2.
•Dependencies: The interactions create dependencies such that the progress of a thread depends on a resource which is held, and not released, by another thread.
In the example, each thread needs a resource held by the other thread.
The dependencies introduced are depicted by the graphs in Fig. 2.3. They can be expressed either directly as dependencies between threads (wait-for graph) or as dependencies between threads and shared objects (resource dependency graph).
Note that such a deadlock state does not completely describe the order of execution leading to the failure. From the given deadlock state, we can only derive a partial order; the complete order is missing. Knowledge about the complete order can help to develop an understanding of how the deadlock developed over time.
2.2 Problem Description
Dealing with the deadlock and other concurrent liveness failures requires
knowledge about all the possible failures and also about their potential in-
dicators in successful executions. The literature does not provide a com-
prehensive list of concurrency failures in Java. A good starting point for a
description of concurrency failures in Java is [Lea00], which informally de-
scribes the most important liveness failures.
Knowledge about possible failures, failure-prone executions and the reasons common to them is required during testing and debugging. Therefore, the tools used for these tasks need to be able to deal with concurrency failures. Existing tools, such as the debuggers found in integrated development environments (IDEs), do not provide adequate support for the programmer to detect concurrency failures.
The achievements and deficits of debuggers will be illustrated in depth in the following. We will also illustrate the widespread technique of printline statements, used in addition to or instead of a debugger. We briefly discuss other approaches covering dynamic analysis and static analysis. To improve tool support, a more formal documentation of liveness failures is also needed, which is still missing.
Detecting Concurrency Errors
In concurrent programming, too, it is best practice to test a program extensively. The purpose of testing is to run test cases and to discover deviations from their expected results in order to make failures visible. Because of the nondeterminism, testing aims at covering as many different executions of the program as possible, either by nondeterministic approaches based on inserting arbitrary delays or by deterministic approaches which can execute a concurrent program based on a predefined timing of interactions [CT91, BT98].
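The idea of executing a program with a predefined order of interactions can be illustrated with a small Java sketch. It is a simplified illustration, not the method of [CT91, BT98]: the Schedule class, its awaitTurn()/done() protocol and the thread names are hypothetical, and each interaction must be wrapped by hand in awaitTurn() and done().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper that lets threads perform their interactions only in a
// predefined order, given as a list of thread names (one entry per interaction).
class Schedule {
    private final List<String> order;
    private int next = 0; // index of the interaction that may happen next

    Schedule(List<String> order) {
        this.order = order;
    }

    // Blocks the calling thread until it is this thread's turn.
    synchronized void awaitTurn(String thread) throws InterruptedException {
        while (next < order.size() && !order.get(next).equals(thread)) {
            wait();
        }
    }

    // Marks the current interaction as done and wakes up the other threads.
    synchronized void done() {
        next++;
        notifyAll();
    }

    // Starts one thread per distinct name; each performs as many interactions as
    // its name occurs in 'order'. Returns the interactions in the order they happened.
    static List<String> run(List<String> order) {
        Schedule schedule = new Schedule(order);
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();
        for (String name : new LinkedHashSet<>(order)) {
            int steps = Collections.frequency(order, name);
            Thread t = new Thread(() -> {
                try {
                    for (int i = 0; i < steps; i++) {
                        schedule.awaitTurn(name);
                        log.add(name + ":" + i); // stands for an interaction on a shared object
                        schedule.done();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, name);
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        // Thread-3 interacts first, then Thread-2 twice, then Thread-3 again.
        System.out.println(run(List.of("Thread-3", "Thread-2", "Thread-2", "Thread-3")));
        // prints [Thread-3:0, Thread-2:0, Thread-2:1, Thread-3:1]
    }
}
```

Re-running with the same order list always reproduces the same interleaving, which is what makes a failing schedule repeatable.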
Besides testing, programmers often check their code immediately after
writing it, using ad hoc test cases. This is called a code-debug-fix cycle.
Both approaches require the ability to detect and classify failures. Note that by detection we mean identifying that there is a failure in a running program, or in information gathered from a running program, and identifying the kind of failure it is. The term debugging implies detection but goes further. The goal of debugging is to localise the fault area of the code and to develop an understanding of the program so that an adequate correction can be made [FR01]. It therefore includes the steps taken after it has been detected that the program does not behave as specified or as expected, and it describes the activity of locating the source code causing this failure.
The generation of test cases is not part of debugging and is not our focus here either. For testing Java, there are tools such as JTest [JTe] and JUnit [JUn], or even extensions which also try to cover concurrency, such as JMThreadUnit [JMT].
Debuggers
For support in testing, detecting, debugging and understanding failures, integrated software development environments (IDEs) provide debuggers with the following features.
•The concept of a debugger is to execute a program stepwise. After each
step the program state is presented. The state consists of heap and call
stack.
•The granularity of steps ranges from single statements to user-defined
steps between breakpoints and watchpoints.
•Debuggers automatically support detection of address violations or di-
vision by zero.
Existing debuggers differ little in these concepts; they differ mainly in the comfort they provide for stepping through the program and for navigating the state [ZK00]. Traditionally, debuggers do not cover concurrency-related problems because they were invented for sequential programs.
Using a debugger, a programmer still has to search manually for most errors, especially concurrency errors. The programmer invents and checks hypotheses about errors, both usually based on experience and intuition. The programmer looks for certain patterns in the execution history and in the code. Checking a hypothesis is time-consuming because typically it is not even known when exactly the error has happened. Breakpoints have to be set well before the point where the error is assumed to happen, and then the program has to be executed stepwise. Each state has to be carefully examined by the programmer to identify known patterns and new situations.
Figure 2.4: Source Code View
For many decades, debuggers focused solely on sequential programs. Since the advent of commercial concurrent programming languages like Ada or Java, debuggers have started to make a gradual transition towards concurrent programs. They have been extended to cover multiple threads as well. The presentation of the program state covers heap, stack, and the call stacks of individual threads. The stepping mechanism, including breakpoints and watchpoints, works on thread level. Individual threads can be started and stopped while others keep running. The realisation of these concepts does not vary much across different languages and across tools for the same language. For Java, these concepts have been integrated into IDEs such as JBuilder [JBd01], VisualAge [Vis00], JDeveloper [JDv01] or Eclipse [Ecl]. In more recent versions of these IDEs, deadlock detection has become available, for instance in JBuilder [JBd01]. In this respect, Java IDEs are not behind those for other languages. In response to the findings presented in the introduction, COMPAQ started a project to provide deadlock detection for the Pthread library in its C/C++ IDE only in the year 2000 [Har00]. Recently, a few standalone dynamic analysis tools for concurrency errors have also come into existence which also cover deadlocks, such as JProbe Threadalyzer [JPr00].
Figure 2.5: Call Stack View
Example
To illustrate the existing capabilities and concepts for analysing concurrency errors, we have implemented the deadlock-prone bank transfer example from the previous section in a simple Java program, which we execute with the debugger of the JBuilder IDE for Java [JBd01]. JBuilder only serves as an example of a typical debugger in a Java IDE. Other IDEs like VisualAge [Vis00], JDeveloper [JDv01] and Eclipse [Ecl], or even the round-trip UML CASE tool Together [Tog], provide similar debuggers. The following three JBuilder views are relevant for describing an execution state and, in particular, a deadlock.
•A view of the source code with a cursor indicating the position of each of the concurrent control flows (see Fig. 2.4).
•A view of the call stacks of all threads. The call stack of each thread is an ordered list of method calls. The topmost method call in this list is the last one put on the call stack and thereby the active one. A synchronized-method call is only put on the call stack after the method has been successfully entered, i.e. after the corresponding lock has been granted. For each method call, the view gives the line of code where the method is defined, the class to which the method belongs, and the package of that class (see Fig. 2.5).
•A view of the mutually exclusive objects, which are also called monitors. For each such object, the thread which has access and a list of threads which are waiting for access are given. In the example (see Fig. 2.6), Thread-2 has access to the bank account c2 and Thread-3 is waiting for access, indicated by being second in the list of thread icons. This view indicates a deadlock by highlighting the accessed objects involved in the deadlock. Each thread entry can be expanded to show its call stack, similar to the second view (see Fig. 2.7).
Figure 2.6: Monitor View
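Part of the information shown in a monitor view is also available programmatically through the standard java.lang.management API; we make no claim about how JBuilder itself obtains its view. In the sketch below, a plain Object stands in for an account and describeBlocked() is a hypothetical helper: the calling thread holds the lock, a second thread blocks on it, and the blocked thread's state and the name of the lock owner are read from its ThreadInfo.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

// Hypothetical sketch of monitor-view data obtained via ThreadMXBean/ThreadInfo.
class MonitorView {
    // Returns e.g. "Thread-3 BLOCKED, lock owner: main" for a thread blocked on
    // a lock held by the calling thread.
    static String describeBlocked(long timeoutMillis) {
        Object account = new Object(); // stands in for a synchronized Account object
        Thread waiter = new Thread(() -> {
            synchronized (account) {
                // the account would be accessed here
            }
        }, "Thread-3");
        String description;
        synchronized (account) { // the calling thread holds the lock
            waiter.start();
            long deadline = System.currentTimeMillis() + timeoutMillis;
            while (waiter.getState() != Thread.State.BLOCKED && System.currentTimeMillis() < deadline) {
                Thread.yield();
            }
            ThreadInfo info = ManagementFactory.getThreadMXBean().getThreadInfo(waiter.getId());
            description = info.getThreadName() + " " + info.getThreadState()
                    + ", lock owner: " + info.getLockOwnerName();
        }
        try {
            waiter.join(); // the waiter gets the lock and terminates once it is released
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return description;
    }

    public static void main(String[] args) {
        System.out.println(describeBlocked(5000));
    }
}
```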
In the following, we examine how our example is presented in these views. We examine a run of the example code which leads to a deadlock. In our example, the source code view (see Fig. 2.4) shows the control flow position of a thread by highlighting the line other.withdraw(amount) in the code of class Account.
In the second view (see Fig. 2.5), we see the call stacks of the two threads which transfer money between accounts. Each thread was started by a call to run(); therefore, the bottommost method on each call stack is run(), defined in line 19. The next method on the call stack of each thread is transfer(), defined in line 47. In one of the call stacks, the last method entered, which is transfer(), has been highlighted. This highlighting denotes a correspondence with the first view: the thread whose call stack contains the highlighted method, here Thread-2, is the one which owns the control flow depicted in the first view.
The third view (see Fig. 2.6) presents the synchronized objects and the actual deadlock. The synchronized objects in this program are Account c1 and Account c2. For each of these account objects, the locking dependencies are provided. Each object is depicted with a lock icon and a description of its class. The thread holding the lock is shown in brackets, and the internal ID of the object is given in the next line. We see that Account c2 is locked by Thread-2. Next, the thread holding the lock is shown again, followed by all threads waiting for the lock. Here, Thread-3 is blocked waiting for the lock. Similarly, Account c1 is locked by Thread-3 and Thread-2 is waiting for the lock. Each thread shown can be expanded to its call stack, similar to the second view (see Fig. 2.7). This third view shows the threads and objects involved in the deadlock and their mutual dependencies. The same holds for the extended monitor view. This information is not covered by the other views.
Figure 2.7: Monitor View with Callstacks
The information about the interactions involved is scattered over two views. The call stacks only show the last successfully entered method, which is transfer() in each case. They do not show which method call is blocked. This information has to be deduced from the control flow view(s), which indicate the control flow position other.withdraw() for each thread (see Fig. 2.4).
The execution order is not covered by these views. Although each call stack shows at least the execution order of nested interactions, the call stacks do not show the relative order of calls on different call stacks, because the threads are only presented in isolation. This information can again be deduced from the locking dependencies each call has produced. These locking dependencies restrict the possible execution orders, and from them one can derive the order in which shared objects were accessed.
So far, we have described only a snapshot of the JBuilder IDE views; we have not explained how these views were generated. To generate them, breakpoints had to be inserted in each thread before the code involved in the deadlock. Then the program had to be executed stepwise until the deadlock was reached. While this sounds simple, it is only simple for such a contrived example. We knew which threads would get involved in the deadlock, and we could select these threads for stepwise execution, though not all at the same time. JBuilder only allows one thread at a time to be executed stepwise, which does not provide enough flexibility if one does not know what to look for.
Debugger Deficits
Historically, debuggers did not integrate the execution states of different control flows. Their very powerful means for exploring the execution state of a program in detail, including heap, stack and call stack, were sufficient for sequential programs. When making the transition to concurrent programs, debuggers did not integrate concepts for a coherent presentation. It is obviously not sufficient to introduce a separate single-state presentation for each new thread, as most debuggers do. Another effect of not dealing with concurrency features is that call stacks do not contain method calls which have been issued but are blocked on entering a synchronized method. Therefore, debuggers are not able to show that a synchronized method was called but that the thread was blocked. Another problem is the transient memory of a debugger: neither a single state nor the history of an execution can be stored. Once the observation of an execution is completed, the information is gone as well, and therefore the execution order can no longer be accessed.
To summarise, two main problems have arisen when examining the concepts of debuggers for the detection and, particularly, for understanding how a deadlock develops:
•Inadequate representation of the interactions involved in a deadlock.
•Missing representation of relative execution order, i.e. the timing on
shared objects.
Manually Generating an Execution History
To overcome the lack of an execution history and to avoid an incomplete
reconstruction of it after the execution has taken place, it has always been a
widespread debugging practice to manually insert printline statements into
the source code, for sequential programs as well as for concurrent programs.
Although this observation is commonly acknowledged, empirical evidence is
rare [Lew03a, ACM97]. This practice shows that programmers have always
felt the need for generating and viewing information about the execution
history. The output generated provides a time-ordered sequence of program
execution steps and is called a trace. It should not be confused with the
trace-command in a classical debugger, which selects a function or method
at which the debugger stops each time it is called.
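A hand-written printline tracer of this kind can be as simple as the following Java sketch, which emits one line per event in the colon-separated style of Fig. 2.8 described below. The Trace class, its event() method and the use of System.identityHashCode() as object ID are our own assumptions; the figure does not state how IDs were obtained. Note that an Acquire event has to be written at the call site, before the synchronized code is entered, because inside it the lock has already been granted.

```java
// Hypothetical manual tracer; names and the ID scheme are assumptions.
final class Trace {
    private static int counter = 0; // guarded by the class lock (static synchronized methods)

    // Class name plus an object ID; identityHashCode is not guaranteed to be unique.
    private static String id(Object o, boolean identified) {
        return o.getClass().getSimpleName() + "@ID#" + (identified ? System.identityHashCode(o) : -1);
    }

    // Prints and returns one trace line: number:thread:callee:method:synchronized:kind.
    // Synchronized so that numbering and output order agree across threads.
    static synchronized String event(Object callee, String method, boolean synchronizedMethod, String kind) {
        String line = (++counter) + ":" + id(Thread.currentThread(), true) + ":"
                + id(callee, !"Exit".equals(kind)) + ":" + method + ":" + synchronizedMethod + ":" + kind;
        System.out.println(line);
        return line;
    }

    public static void main(String[] args) {
        Object account = new Object(); // stands in for an Account
        Trace.event(account, "withdraw", true, "Acquire"); // at the call site: attempt to lock
        synchronized (account) {
            Trace.event(account, "withdraw", true, "Enter"); // lock granted
        }
        Trace.event(account, "withdraw", true, "Exit"); // callee not identified on exit
    }
}
```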
Using printlines not only saves time compared to stepping through a program with a debugger but also has the advantage that the output is more compact than the information presented on the debugger screen. Also, this approach is completely tool-independent. Moreover, the information is highly customisable with regard to the kind and amount of information collected. For instance, one could either choose more printlines with less information or fewer printlines with very detailed information about the program state.
1:Employee@ID#90:Terminal@ID#-1:handOverCheque:false:Exit
2:AccountingTransaction@ID#97:AccountingTransaction@ID#97:run():false:Enter
3:AccountingTransaction@ID#97:Account@ID#79:transfer:true:Acquire
4:AccountingTransaction@ID#108:Account@ID#80:transfer:true:Acquire
5:AccountingTransaction@ID#108:Account@ID#80:transfer(Account(id=79), long 34):true:Enter
6:AccountingTransaction@ID#97:Account@ID#80:withdraw:true:Acquire
7:AccountingTransaction@ID#111:AccountingTransaction@ID#111:run():false:Enter
8:AccountingTransaction@ID#111:Account@ID#80:transfer:true:Acquire
9:AccountingTransaction@ID#108:Account@ID#79:withdraw:true:Acquire
Figure 2.8: Trace Example
Fig. 2.8 is an example of a Java trace observed during a program run. This kind of output could have been created manually by using printline statements, although typically one would not attempt such detailed output manually. One line in the trace corresponds to one execution step, also called an event. The entries of each event are separated by ":". The first entry is the line number within the trace itself. The second entry is the thread, identified by its class and its object ID; the third entry gives the class and the ID of the object on which the method is called. In case of a method exit, the callee object is not identified, which is denoted by the ID "-1". The next entry is the name of the method called. The following entry is "true" for a synchronized method and "false" otherwise. The last entry distinguishes between method entry (Enter), method exit (Exit) and an attempt to acquire a lock (Acquire). Fig. 2.8 is only an excerpt from a longer trace. Typically, the classifier names are preceded by host names and package paths, but these elements are omitted here for reasons of presentation.
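Reading such a trace back in is straightforward. The following Java sketch parses one line into its six entries; the TraceEvent class and its field names are our own, and it assumes, as in the excerpt of Fig. 2.8, that no entry itself contains a colon.

```java
// Hypothetical parser for the trace format of Fig. 2.8.
final class TraceEvent {
    final int line;                   // line number within the trace
    final String thread;              // e.g. "AccountingTransaction@ID#108"
    final String callee;              // e.g. "Account@ID#80", or "...@ID#-1" on exit
    final String method;              // method name, possibly with arguments
    final boolean synchronizedMethod; // "true" for a synchronized method
    final String kind;                // "Enter", "Exit" or "Acquire"

    private TraceEvent(int line, String thread, String callee, String method,
                       boolean synchronizedMethod, String kind) {
        this.line = line;
        this.thread = thread;
        this.callee = callee;
        this.method = method;
        this.synchronizedMethod = synchronizedMethod;
        this.kind = kind;
    }

    static TraceEvent parse(String text) {
        String[] f = text.split(":");
        if (f.length != 6) {
            throw new IllegalArgumentException("expected 6 entries: " + text);
        }
        return new TraceEvent(Integer.parseInt(f[0]), f[1], f[2], f[3], Boolean.parseBoolean(f[4]), f[5]);
    }

    // Extracts the object ID after "@ID#"; -1 means the object was not identified.
    static int objectId(String ref) {
        int pos = ref.indexOf("@ID#");
        if (pos < 0) {
            throw new IllegalArgumentException("no object ID in: " + ref);
        }
        return Integer.parseInt(ref.substring(pos + 4));
    }

    public static void main(String[] args) {
        TraceEvent e = parse("5:AccountingTransaction@ID#108:Account@ID#80:transfer(Account(id=79), long 34):true:Enter");
        System.out.println(e.kind + " " + e.method + " on object " + objectId(e.callee)
                + " by thread " + objectId(e.thread));
        // prints "Enter transfer(Account(id=79), long 34) on object 80 by thread 108"
    }
}
```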
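The deadlock is related to lines 5, 6, 8 and 9 in Fig. 2.8. The thread with ID 97 of class AccountingTransaction attempts to acquire the lock of the Account with ID 80 (line 6), which is held by the thread with ID 108 (line 5). However, the thread with ID 108 attempts to acquire the lock of the Account with ID 79 (line 9), which is held by the thread with ID 97. Line 8 shows a third thread, with ID 111, which also attempts to acquire the lock of the Account with ID 80.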
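This reasoning can be automated by replaying the events. The sketch below is our own simplified illustration, not the analysis developed later in this thesis. It keeps a call stack per thread (needed because an Exit event does not identify the callee), derives which thread holds which object lock and which thread is attempting to acquire which lock, and then searches the resulting wait-for relation for a cycle. The class TraceDeadlockCheck is a hypothetical name, and the trace in main() is a small hand-made example in the format of Fig. 2.8, not the excerpt itself.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical trace replay; assumes well-formed lines in the format of Fig. 2.8.
class TraceDeadlockCheck {
    // Replays trace lines and returns the threads on a wait-for cycle (empty if none).
    static List<String> findDeadlock(List<String> trace) {
        Map<String, Deque<String[]>> stacks = new HashMap<>(); // thread -> entered calls {callee, sync}
        Map<String, String> owner = new HashMap<>();           // object -> thread holding its lock
        Map<String, Integer> depth = new HashMap<>();          // object -> nesting depth of its lock
        Map<String, String> waiting = new HashMap<>();         // thread -> object it tries to lock
        for (String text : trace) {
            String[] f = text.split(":");
            String thread = f[1];
            String callee = f[2];
            boolean sync = Boolean.parseBoolean(f[4]);
            Deque<String[]> stack = stacks.computeIfAbsent(thread, t -> new ArrayDeque<>());
            switch (f[5]) {
                case "Acquire":
                    waiting.put(thread, callee);
                    break;
                case "Enter":
                    waiting.remove(thread);
                    stack.push(new String[] {callee, String.valueOf(sync)});
                    if (sync) {
                        owner.put(callee, thread);
                        depth.merge(callee, 1, Integer::sum);
                    }
                    break;
                case "Exit": {
                    String[] call = stack.poll(); // null if the entry lies before the excerpt
                    if (call != null && Boolean.parseBoolean(call[1])
                            && depth.merge(call[0], -1, Integer::sum) == 0) {
                        depth.remove(call[0]);
                        owner.remove(call[0]);
                    }
                    break;
                }
                default:
                    throw new IllegalArgumentException("unknown event kind: " + f[5]);
            }
        }
        Map<String, String> waitFor = new HashMap<>(); // thread -> thread holding the awaited lock
        for (Map.Entry<String, String> w : waiting.entrySet()) {
            String holder = owner.get(w.getValue());
            if (holder != null && !holder.equals(w.getKey())) {
                waitFor.put(w.getKey(), holder);
            }
        }
        for (String start : waitFor.keySet()) { // at most one outgoing edge per thread
            List<String> path = new ArrayList<>();
            String t = start;
            while (t != null && !path.contains(t)) {
                path.add(t);
                t = waitFor.get(t);
            }
            if (t != null) {
                return new ArrayList<>(path.subList(path.indexOf(t), path.size()));
            }
        }
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        List<String> trace = List.of(
                "1:AccountingTransaction@ID#97:Account@ID#79:transfer:true:Acquire",
                "2:AccountingTransaction@ID#97:Account@ID#79:transfer:true:Enter",
                "3:AccountingTransaction@ID#108:Account@ID#80:transfer:true:Acquire",
                "4:AccountingTransaction@ID#108:Account@ID#80:transfer:true:Enter",
                "5:AccountingTransaction@ID#97:Account@ID#80:withdraw:true:Acquire",
                "6:AccountingTransaction@ID#108:Account@ID#79:withdraw:true:Acquire");
        System.out.println("threads on cycle: " + findDeadlock(trace));
    }
}
```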
Manually inserting printline statements into the code also has disadvantages. It is not only cumbersome but also error-prone in two respects.
•During the insertion of the required printlines, important branches of the code can be accidentally omitted.
•The information gathered with printlines can be insufficient for a foreseen or unforeseen purpose.
There is not only a risk that important information is missed but also that incomplete output is misinterpreted. In the presence of threads, it is particularly easy to form a misleading hypothesis. Besides that, printline statements generate textual output that is difficult to read and understand.
Having demonstrated the advantages and drawbacks of manually generating execution histories, it is obvious that automated generation is desirable to overcome the drawbacks. However, without thoughtful selection, one could easily generate a large amount of trace data. This makes it necessary to analyse traces automatically as well because, often, only a small part of a trace contains the failure.
Other Approaches based on Dynamic Analysis
So far we have discussed debuggers which are integral parts of IDEs. There are also some standalone tools which do not provide the complete functionality of debuggers but instead provide thread-specific dynamic analysis. They focus on profiling, i.e. resource and performance analysis, as well as on the detection of a few well-known failures. For instance, the JProbe Threadalyzer [JPr00] provides deadlock detection besides profiling. The Assure Thread Analyzer [Ass] also provides profiling and deadlock detection. These tools do not put a special emphasis on the presentation of detected failures. Both tools also detect safety failures. To the best of our knowledge, none of these tools is dedicated to an exhaustive analysis of liveness problems.
Test tools like the open-source tool JUnit [JUn] or the commercial product JTest [JTe] target object-oriented testing; JTest, for instance, analyses coverage and automatically generates test cases. Neither of them, however, addresses concurrency. Therefore, they do not cover liveness failures or safety failures.
Formal Methods
Besides the best practice to perform extensive testing, there are also static
analysis approaches, which aim at proving properties of concurrent programs.
These approaches are either based on source code or on a model of a program.
One of the most sophisticated approaches is the Bandera Tool Set [Ban04,
CDH+00, HD01], which is an integrated collection of program analysis, trans-
formation, and visualisation components designed to facilitate experimenta-
tion with model-checking Java source code. Bandera takes as input Java
source code and a software requirement formalised in Bandera’s temporal
specification language. It generates a program model and specification in
the input language of one of several existing model-checking tools. To derive models from source code, Bandera offers a variety of techniques such as slicing and abstract interpretation. In this way, it aims to promote the use of model checking in ordinary software development, beyond its previous use only in dedicated settings, e.g. for safety-critical systems.
Finite-state verification techniques, such as model checking, are attractive because they are capable of exposing very subtle defects in the logic of sequential and concurrent systems. However, it is not feasible to build finite-state models for entire real-size applications, only for selected parts. Also, bridging the semantic gap between a non-finite-state software system expressed as source code and the input languages of these tools requires the application of sophisticated program analysis, abstraction and transformation techniques. Choosing these techniques successfully requires training and technical background. Therefore, model size and missing expertise are the two main impediments to applying model checking in everyday software development.
Formal methods are also applied to check models from which software can be derived following a methodology which preserves the desired properties. An approach based on model checking Labelled Transition Systems has been proposed in [MK99]. This approach also suffers from the problems discussed above.
Because of model explosion and training overhead, approaches for improving testing are still needed; they can be used complementarily to static analysis.
Prerequisites for Improving Tool Support
In order to implement failure detection, precise descriptions of failures are required. They are needed to determine what kind of data has to be accessed for failure detection, to describe suitable detection algorithms, and to define how results are to be presented.
Classifications of precise descriptions can help to exploit the potential commonalities of failures. During the implementation of failure detection tools, this supports the reuse of common detection principles for a set of failures.
Precise descriptions are still missing for most Java liveness failures. The literature only provides incomplete, informal or example-based descriptions, such as those in [Lea97, Lea00, Hol98].
2.3 Goals
In the problem description, two areas have been identified in which the state of the art is not satisfying: the description of concurrent liveness failures and their practical support by tools.
First of all, we want to contribute to a better understanding of concurrency failures by
•systematically and precisely describing the domain of Java liveness failures.
Starting from a precise description, we want to provide
•concepts for collecting data from running Java programs,
•concepts for automated detection of Java liveness failures, and
•concepts for visualisation of Java liveness failures.
Figure 2.9: Thesis Goals (vertical axis: Failure Domain, Data Analysis, Data Visualisation, Data Collection; horizontal axis: Concepts, Design, Implementation, Evaluation)
These four goals are depicted on the vertical axis of Fig. 2.9. On the horizontal axis, we have depicted the steps which have to be taken to put the concepts to work by designing and implementing corresponding prototypes, which will finally be assessed for their usefulness.
The intended prototype shall support the programmer in the detection of liveness failures. Therefore, the detection of liveness failures is intended to be fully automatic and should not require particular background knowledge. The analysis tool shall record test cases or debug sessions and will only give feedback if a failure is found. When a failure is found, the programmer is notified. If nothing is found, the programmer does not necessarily have to be aware that the automated detection is carried out in the background.
In the next chapters, we will derive detailed requirements from the above
listed goals.
2.4 Synopsis
In chapter 3 we derive our requirements and outline our approach. In chapter
4 we discuss the concurrency approach taken by Java in greater detail.
In chapter 5 we provide an informal description and classification of live-
ness failures. In chapter 6 we provide a model of thread synchronisation,
which we use to formalise liveness failures.
In chapter 7 we discuss the state-of-the-art in tracing and develop a trac-
ing prototype. In chapter 8 we present selected algorithms for dynamic anal-
ysis of failures. In chapter 9 we discuss existing software visualisation ap-
proaches and develop a visualisation for concurrent Java and its failures.
In chapter 10 we present the developed prototypes with examples. In
chapter 11 we conclude and in chapter 12 we give an outlook.
Chapter 3
Requirements for Automated Failure Detection
In this chapter, we will determine the requirements for automated detection
of concurrent liveness failures and we will outline our approach. For this
purpose we will look again at the main goals for our approach, which we
have identified in the last chapter:
•The goal to precisely describe and classify concurrent liveness failures.
•The goal to provide concepts for automatically detecting these failures
in a running program.
The first goal is not only important in order to improve the general un-
derstanding of concurrent liveness failures but it is also a prerequisite for the
second goal. The reason is that the characteristics of the failures determine
what information is needed for detecting them and this in turn determines
how the detection results should be presented. Because of this dependency
we cannot derive detailed requirements for the automated detection until
we have described the failures in depth, which will take place in chapter 5.
Here, we outline the requirements based on the insights into failures we have
gained so far from the motivation. This level of abstraction will suffice for
this chapter to describe our requirements and the overall approach.
In section 3.1, we will refine the requirements for a precise description of
the failures. Here we will address the issue of classification and formalisation.
In section 3.2, we will describe the requirements for automated detection,
which consist of determining how the information from a running program
is accessed, how it is analysed, and how the analysis is presented. The
entire structure of the detection process is determined by the intended usage
scenario(s) which will also be discussed here. Section 3.3 summarises our
requirements.
3.1 Failure Description and Classification
Here we give an overview of the requirements for the precise description of the failure domain. These requirements are discussed from two perspectives. First, we consider the purpose of the precise description, which influences the choice of the means for the precise description. Then we consider the scope, which has to take into account the characteristics of failures.
3.1.1 Purpose
For the first goal we require a precise and systematic description of the con-
current liveness failures in Java. This description must support
•the development of a better understanding of the failure domain, and
•the development of definitions which can be used in detecting these fail-
ures, i.e. definitions from which patterns or algorithms can be derived
to detect failures.
For a better understanding of the failure systematics, it is advisable to identify commonalities and classify failures accordingly. It is desirable that the results from analysing the domain of failures are captured in a clear and unambiguous way. Here, an intuitive formalism can support our goals, but it should also be accompanied by an informal description in natural language. Also, it is useful to base the description of errors on well-known classifications and terminology for errors and failures.
For the development of detection mechanisms, which shall be imple-
mented in a tool, it is even more important to define failures in a formal
way. There are different levels of formality. For the goals of this approach,
namely automated failure detection and visualisation, we do not require the
formalism to be executable or the detection to be accomplished by a formal
tool. Instead, we strive for a straightforward, independent implementation
for validating our ideas.
3.1.2 Scope
As mentioned in the beginning of this chapter, the failure characteristics
determine the requirements for automated support. At this point, we can
only give an overview of these characteristics, which will be detailed later:
•Failures are situations which are observed while a program is running,
a characteristic which is already captured by the term failure.
•Typically, concurrency failures involve more than one thread and de-
velop over time, involving one or more interactions from each thread
involved. These interactions create dependencies among the threads.
Note that all these characteristics were present in the deadlock example in
the previous chapter.
Figure 3.1: Requirements for Automated Failure Detection (centre: Data Collection, Data Analysis, Data Visualisation; outer arcs: Failure Domain, Usage Context, Standards)
The characteristics of the failure domain determine essential features of
the desired tool support. From the characteristics it follows that the descrip-
tion of the failures has to cover program dynamics, thread interactions, and
thread dependencies. More details on the failure characteristics will be given
in chapters 5 and 6.
3.2 Automated Detection
When speaking about software, the general idea behind automation is that
a software system performs a computation or analysis on a kind of data and
informs about the result. This very coarse definition already identifies the
three key tasks involved in automation, along which we have to organise our
requirements:
•Collecting the input for the analysis (Data Collection).
•Performing the analysis task (Data Analysis).
•Transforming output results into a suitable visualisation (Data Visual-
isation).
In our case, the analysis task deals with the detection of the concurrency
liveness failures. Conceptually, we consider the collection of the input as part
of the overall approach, although it might not always have to be covered by
a tool, e.g. if collected data already exists. The three tasks are depicted in
the centre of Fig. 3.1. Even though the choice of input data depends on the
intended analysis, we will stick to the above order, as it is the order in which
these tasks depend on each other from the data flow perspective.
Note that the above order does not imply that each of these tasks can only
be performed once. Indeed, it is possible to perform them incrementally.
The failure characteristics briefly sketched in the previous section deter-
mine the analysis task, which in turn determines the necessary input and the
resulting output. The failure characteristics are depicted by the arc labelled
Failure Domain on the outer circle in Fig. 3.1. The outer circle denotes the
influence of its arcs on the three key tasks depicted in the centre.
Also, for automated support it is important to reflect the context in which
this support is going to be used. This has an impact on the model of how
the software interacts with its surroundings, e.g. a user. This is depicted by
the arc labelled Usage Context in Fig. 3.1.
Orthogonal to these tasks, it is often desirable to use standards, de facto
standards, and conventions. While this can affect
anything from data encoding up to the colours on a screen, here we take a
software engineering perspective and address exchange formats and APIs in
order to increase interoperability and reuse, and we address notations for
visualisation as this can improve usability. Standards are depicted as an
individual arc in Fig. 3.1.
The ideas outlined above will be examined in more depth below. Sec-
tion 3.2.1 discusses the contextual requirements. Section 3.2.2 discusses the
collection of input data. Section 3.2.3 discusses the analysis and Section 3.2.4
the visualisation. Standards will be discussed wherever they are relevant.
3.2.1 Contextual Requirements
In this section, we discuss the requirements from the usage context, for which
we are planning the automated support. In general, the context is charac-
terised by a process and that process determines where automated support
can be used to replace human activities. The automated support is primar-
ily intended to replace part of the process, but the process itself might also
change due to better tools. The process might already employ tools, which
must be taken into account when introducing new tools.
The context in which the desired automated detection is to be used does
not only determine the core functionality but also how this functionality
needs to be controlled, how it needs to interact with the context, and how
data is stored and displayed. The requirements therefore derive from the
intended "use cases" of the automated detection support.
The automated failure detection can be applied in two different cases.
•Automated failure detection can be used to detect failures in test runs,
either during or after the test run.
•Automated failure detection can be used to detect failures in code-
debug-fix cycles.
Both cases have to be reflected in the requirements. Both require that data
from the test runs or the debug cycles can be accessed. For both cases,
concepts and techniques have to be determined, and the two cases share a
common core.
User interaction plays a role in each of the two use cases and therefore
determines the user control over the three tasks data collection, data analysis,
and data visualisation. The user has to choose the part of the program and
the execution phase of the program from which information is collected.
The user has to decide what to do if a failure is detected. The user needs
full control over the visualisation.
Not only the use cases have to be reflected but also the tools already used
in them, such as testing tools in the first case and debuggers or other
dedicated dynamic analysis tools in the second case. One might either
require that it is possible to exchange data between the existing tools and
the new tools, or one might go one step further and require that the detection
automation be more tightly integrated with existing tools.
Standards will become an issue when we look at the individual require-
ments for each task in the following.
3.2.2 Data Collection Requirements
This section describes the requirements for collecting input data for the in-
tended analysis. In principle, data collection has to provide support for both
of the above (use) cases. For this approach, we reduce the requirements to
the lowest common denominator. In both of the above cases we need to
collect information from a running Java program. It is not enough to
describe the format of the data; one also has to actively collect the input
from a suitable data source. Therefore, regarding the data collection, two
requirements arise:
•The data format needs to be determined.
•The method by which the corresponding data is collected has to be
determined.
Before a method of data collection can be defined, the data format has
to be determined and therefore will be discussed first. The granularity and
the level of abstraction of a format have to be such that the desired failure
analysis can be performed. Also, it is desirable to design a more general
format for the data collection in order to foster reuse. In the same vein, it
is desirable to use formats from testing or at least to be compatible with them.
This could ease the first use case, i.e. applying our approach to test
runs.
For the data collection method, a way has to be found to access information
about the execution steps taking place in the running program. One
needs precise information about each statement. One way is to hook into
the runtime environment to get this information, i.e. the runtime environment
outputs data about each statement it executes from a program. The
hooks can be part of the API of the runtime environment, or one needs to
instrument the runtime environment, i.e. augment its code. Another way is
to instrument the program directly by means of an automatic transformation.
However, it is desirable that the source code need not be changed, so that a
program which is already running, e.g. a web server, can be tested or debugged.
It is also required that the observation disturbs the program as little as
possible: all software solutions which gather precise data will have an impact
on the timing of the observed program, e.g. slow it down.
We have already pointed out that we consider a debugger too limited
for our purpose, as it is only able to deal with one state at a time. We
require that a history of states, or rather a history of events and the
resulting state changes, is available. We next consider a technique which seems apt for our
requirements.
Tracing
We have seen in the motivation that programmers have already attempted
to create histories of events by manually inserting print statements into
the code. The problems of manually inserted print statements can be overcome
by automating the generation of the output they produce.
This is also called tracing. Tracing generates a log of interesting events and
states of a running program. Such a log, which is also called a trace file or
simply a trace, provides an execution history with the global and relative
timing of threads. A trace is therefore typically defined as a time-ordered
sequence of event-records [KRR98].
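A minimal sketch of such automated tracing might look as follows. It assumes a hypothetical `Tracer` with a `log(kind, detail)` method, which instrumented code would call wherever a programmer would otherwise have inserted a print statement. Each call appends a record stamped with a global sequence number and the name of the current thread, so the recorded list is a time-ordered sequence of event records. `TraceEvent`, `Tracer` and `TracerDemo` are illustrative names only and not the trace format developed later in this thesis.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event record (illustrative only, not the JAVIS trace format).
final class TraceEvent {
    final long seq;        // position in the global, time-ordered trace
    final String thread;   // name of the thread that produced the event
    final String kind;     // e.g. "call" or "return"
    final String detail;   // e.g. "Account1.withdraw(35)"

    TraceEvent(long seq, String thread, String kind, String detail) {
        this.seq = seq;
        this.thread = thread;
        this.kind = kind;
        this.detail = detail;
    }
}

// Hypothetical tracer: instrumented code calls log() where a programmer
// would otherwise have inserted a println statement by hand.
final class Tracer {
    private final List<TraceEvent> events = new ArrayList<>();
    private long next = 0;

    // synchronized, so that list order and sequence numbers agree
    synchronized void log(String kind, String detail) {
        events.add(new TraceEvent(next++, Thread.currentThread().getName(), kind, detail));
    }

    // copy of the trace recorded so far, in time order
    synchronized List<TraceEvent> snapshot() {
        return new ArrayList<>(events);
    }
}

// Two threads produce events; the tracer merges them into one trace.
final class TracerDemo {
    static List<TraceEvent> run() {
        Tracer tracer = new Tracer();
        Runnable work = () -> {
            for (int i = 0; i < 3; i++) {
                tracer.log("call", "step" + i);
            }
        };
        Thread t2 = new Thread(work, "Thread-2");
        Thread t3 = new Thread(work, "Thread-3");
        t2.start();
        t3.start();
        try {
            t2.join();
            t3.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return tracer.snapshot();
    }
}
```

Making `log` synchronized keeps the list order and the sequence numbers consistent. It also serialises the traced threads on the tracer, which is an example of the timing disturbance by observation discussed above.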
The trace format defines the trace information, which has to be gathered,
i.e. events and states of a program, and how they are encoded. We have
already shown an example trace in the motivation in the previous chapter.
It is also required to provide concepts for controlling the tracing process
and for interacting with it. Here, the challenge is to combine ideas from de-
bugging with tracing, such as using breakpoints to determine the part of the
program to be traced.
A general problem is that tracing generates large amounts of data.
Therefore, means for selective tracing are required.
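Selective tracing could be sketched as follows, under the assumption of a hypothetical `SelectiveTracer` with two controls. The first is a switch that is turned on and off, much as a breakpoint marks the beginning of the interesting part of an execution. The second is a filter that decides which threads are traced at all. Class names, thread names and the string record format are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical selective tracer: records events only while enabled and only
// for threads accepted by the filter.
final class SelectiveTracer {
    private final Predicate<Thread> threadFilter;
    private final List<String> records = new ArrayList<>();
    private volatile boolean enabled = false;

    SelectiveTracer(Predicate<Thread> threadFilter) {
        this.threadFilter = threadFilter;
    }

    void enable()  { enabled = true; }   // e.g. when a breakpoint is reached
    void disable() { enabled = false; }

    void event(String detail) {
        Thread current = Thread.currentThread();
        if (!enabled || !threadFilter.test(current)) {
            return; // outside the selected part of the execution
        }
        synchronized (records) {
            records.add(current.getName() + ": " + detail);
        }
    }

    List<String> records() {
        synchronized (records) {
            return new ArrayList<>(records);
        }
    }
}

final class SelectiveTracingDemo {
    static List<String> run() {
        SelectiveTracer tracer =
            new SelectiveTracer(t -> t.getName().startsWith("Employee"));
        Runnable body = () -> tracer.event("enter transfer data");
        runAndWait(new Thread(body, "Employee-0")); // tracing still disabled
        tracer.enable();
        runAndWait(new Thread(body, "Employee-1")); // recorded
        runAndWait(new Thread(body, "Logger"));     // rejected by the filter
        tracer.disable();
        return tracer.records();
    }

    private static void runAndWait(Thread t) {
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

In this run only the event of `Employee-1` ends up in the trace. Both criteria reduce the amount of data before it is ever written.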
Standards are an issue for both the tracing format and the tracing
method. For the format, we require either an existing format or a
format that can be transformed into other formats. One should consider not only
formats from testing but also "formats" for describing software dynamics
such as UML interaction diagrams [UMLa], which are not far from what we
require.
For the tracing method, one should look at APIs in the Java context.
The Java Platform Debugger Architecture is a de facto standard for building
Java debuggers and it is used by many suppliers of Java IDEs. It allows
debuggers to collect information about program execution and to set break-
and watchpoints amongst many other typical debugger functions. We require
that such an interface is taken into consideration.
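The Java Debug Interface (JDI) is the high-level Java API of the JPDA, found in the `com.sun.jdi` packages. The following is only a sketch of how a JDI-based tracer might list the available connectors and then ask a connected target VM to report concurrency-related events without suspending it. Two assumptions apply. First, the monitor event requests used here were added to JDI in Java SE 6, later than the JDK versions this thesis builds on. Second, connecting to a target VM and consuming events from its event queue are omitted. `JdiSketch` is a hypothetical name.

```java
import com.sun.jdi.Bootstrap;
import com.sun.jdi.VirtualMachine;
import com.sun.jdi.connect.Connector;
import com.sun.jdi.request.EventRequest;
import com.sun.jdi.request.EventRequestManager;
import com.sun.jdi.request.MonitorContendedEnterRequest;
import com.sun.jdi.request.ThreadStartRequest;
import java.util.ArrayList;
import java.util.List;

final class JdiSketch {
    // Names of the connectors (launching, attaching, listening) through which
    // a tracer could reach the program to be observed.
    static List<String> connectorNames() {
        List<String> names = new ArrayList<>();
        for (Connector c : Bootstrap.virtualMachineManager().allConnectors()) {
            names.add(c.name());
        }
        return names;
    }

    // Once connected, request thread starts and contended monitor entries
    // without suspending the observed program.
    static void requestConcurrencyEvents(VirtualMachine vm) {
        EventRequestManager erm = vm.eventRequestManager();
        ThreadStartRequest starts = erm.createThreadStartRequest();
        starts.setSuspendPolicy(EventRequest.SUSPEND_NONE);
        starts.enable();
        if (vm.canRequestMonitorEvents()) {
            MonitorContendedEnterRequest entries = erm.createMonitorContendedEnterRequest();
            entries.setSuspendPolicy(EventRequest.SUSPEND_NONE);
            entries.enable();
        }
    }
}
```

The design choice illustrated here is that no source code change is needed: the target program is observed through the debugger interface of its virtual machine.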
3.2.3 Data Analysis Requirements
The goal is to use the failure characteristics to derive concepts for
implementing failure detection. Therefore, we require not only that the
data collection be automatic but also the detection. Automated detection
of all typical errors in concurrent traces is desirable.
Regarding the analysis, the most important requirement is that the in-
tended analysis can be computed efficiently. It must be feasible to determine
algorithms which detect concurrency liveness failures in a running program.
For a trace-based approach as outlined in the previous section, it is required
that large amounts of data are examined automatically.
The requirements for the algorithms will be refined after we have de-
scribed the failures in detail.
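A single pass over a trace can be enough to detect a failure. As a generic illustration, here is a wait-for check for lock cycles over a hypothetical trace of REQUEST, ACQUIRE and RELEASE events. This is a sketch only, not the detection algorithm developed later in this thesis. It ignores reentrant locking and wait/notify, and `LockEvent` and `DeadlockScan` are illustrative names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical trace event: a thread requests, acquires or releases a lock.
final class LockEvent {
    enum Kind { REQUEST, ACQUIRE, RELEASE }

    final String thread;
    final Kind kind;
    final String lock;

    LockEvent(String thread, Kind kind, String lock) {
        this.thread = thread;
        this.kind = kind;
        this.lock = lock;
    }
}

final class DeadlockScan {
    // Index of the first event that closes a cycle in the wait-for relation,
    // or -1 if the trace contains no such cycle.
    static int firstDeadlock(List<LockEvent> trace) {
        Map<String, String> owner = new HashMap<>();    // lock -> owning thread
        Map<String, String> waitsFor = new HashMap<>(); // thread -> requested lock
        for (int i = 0; i < trace.size(); i++) {
            LockEvent e = trace.get(i);
            switch (e.kind) {
                case REQUEST:
                    String holder = owner.get(e.lock);
                    if (holder != null && !holder.equals(e.thread)) {
                        waitsFor.put(e.thread, e.lock);
                        if (closesCycle(e.thread, owner, waitsFor)) {
                            return i;
                        }
                    }
                    break;
                case ACQUIRE:
                    waitsFor.remove(e.thread);
                    owner.put(e.lock, e.thread);
                    break;
                case RELEASE:
                    owner.remove(e.lock);
                    break;
            }
        }
        return -1;
    }

    // Follow thread -> requested lock -> owner -> ... and report whether the
    // chain leads back to the starting thread.
    private static boolean closesCycle(String start, Map<String, String> owner,
                                       Map<String, String> waitsFor) {
        String current = start;
        for (int steps = 0; steps <= waitsFor.size(); steps++) {
            String lock = waitsFor.get(current);
            if (lock == null) return false;
            String holder = owner.get(lock);
            if (holder == null) return false;
            if (holder.equals(start)) return true;
            current = holder;
        }
        return false;
    }
}
```

Each event is processed once. Only a request that has to wait triggers a walk along the chain of waiting threads, so large traces remain tractable.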
3.2.4 Failure Visualisation Requirements
It was demonstrated in the motivation with the example of the deadlock that
failures span time, involve different interactions from different threads,
and involve dependencies among threads. The failure characteristics deter-
mine how the results from the failure detection should be presented to the
user. Ideally, all of these dimensions should be accommodated simultaneously.
This can best be achieved by a graphical representation. While text can
only present one dimension explicitly, namely the sequence, a graphical rep-
resentation can display at least two dimensions and can encode others using
colour, shape, size or even text. Therefore, it can display failures with their
several dimensions better than a textual trace. In the motivation we have
already extensively used different kinds of graphical visualisations to explain
the deadlock.
We cannot learn from debuggers how to present several dimensions, es-
pecially not how to display time. Tracing differs substantially from state-
oriented debugging and therefore needs the dimension of time or order, which
debuggers do not need.
Apart from the missing information about execution history, the main
problem of the debugger is the inadequate presentation of the execution state
of a concurrent program. Two views had to be consulted to determine the
current focus of control in the case of a blocked synchronized method call.
Moreover, debuggers present each thread in isolation, but threads do not act
in isolation. Concurrent programs live on interactions between threads, from
which manifold dependencies arise.
Therefore, new presentations different from the ones found in debuggers
have to be devised for tracing and for trace-based error detection.
The Unified Modeling Language
Describing behaviour and sequences as required when dealing with the ex-
ecution history of a program is not a completely new issue in the software
development lifecycle. The Unified Modeling Language (UML) [UMLa] has
become a standard for visually describing software requirements, analysis,
and design documents. Amongst others, it provides diagrams for describing
behaviour on object level, in particular, complex object interaction and re-
lationships. This is what is found in traces and what is particularly needed
to describe concurrency errors. A visualisation is desirable which combines
benefits from the history-oriented UML interaction diagrams with the exact
deadlock description from the debugger.
The advantage of using UML is that it is known to developers from pre-
vious phases in the software development lifecycle. Programmers need to
be able to read UML design documents. Using UML also during debugging
is a further step towards the continuous use of one notation. Scalability is
not an issue when we aim at a focused visualisation of failure scenarios using
UML: a failure only involves a limited number of threads, a limited number
of shared objects, and a limited number of interactions. Because UML is not
primarily designed for visualising program traces or failure detection results,
it has to be adapted to capture programming language-specific features. To
capture failures, it also has to be adapted to represent the thread dependencies
which play a role in them. This is made possible through the UML profiling
mechanism.
Figure 3.2: First Refinement of Goals (phases: Concepts, Design, Implementation, Evaluation; Failure Domain: intuitive, visual formalism; Data Collection: trace format and method; Data Analysis: detection algorithms; Data Visualisation: UML-based visualisation)
3.3 Summary
We summarise the requirements by assigning them to the two goals with which
we started this chapter (see also the refinement of goals in Fig. 3.2).
For the goal of describing liveness failures we require
•a natural language description and classification, and
•an intuitive, visual formalism, not necessarily executable, from which
detection specifications can be derived.
For the goal of automated detection we require
•a data collection using a flexible tracing method and a trace format
which supports the detection of liveness failures,
•concepts for implementing detection of the above failures, and
•a UML-based visualisation of traces and failures.
To show the feasibility of our concepts, we require that these ideas be
validated with a prototype. For each of the three areas of tool support, a
suitable prototype has to be designed and implemented following the
decisions which have already been made in this chapter.
Chapter 4
Concurrent Programming in Java
In order to improve debugging support and to derive detection character-
istics for concurrency failures in Java, we need to determine the aspects of
concurrency relevant for liveness failures. In this chapter, we introduce the
concurrency mechanisms provided by Java. We will motivate the purpose of
the concurrency mechanisms with a running example which is an extension
of the banking scenario used in the motivation.
The presentation of the Java concurrency mechanisms is based on the
official Java language specification (JLS) [GJS97]. This document provides
a precise informal description of the syntax and semantics of Java. Not all
concurrency mechanisms are covered by the JLS because some belong to
standard Java libraries. In these cases, our presentation refers to the official
documentation of these libraries.
The Java language is still evolving. Many language features which had
been devised have later been deprecated. That means that they are still part
of the language specification for reasons of backward compatibility, but their
usage is discouraged. Here, we only describe features which have not been
deprecated.
We will present the concurrency mechanisms of Java in great detail be-
cause we will formalise many parts of them later.
The Running Example
The example is a simplified banking application simulating the main compo-
nents of a bank:
A bank maintains accounts and engages employees who carry out
transfers between these accounts. A transfer withdraws money from
one account and credits it to another account. All employees work
simultaneously. Each employee uses a terminal to enter data for a
transfer. Once all data for a transfer have been entered the employee
uses the terminal to trigger the execution of the transfer. The em-
ployee does not have to wait for the transfer to be completed but
can enter new data and trigger new transfers immediately.
In the remainder of this chapter we examine how the example is mapped
to Java. In doing so, we introduce Java's multithreading concepts step by
step. We will omit intermediate steps of a well-defined software development
process such as the design. This chapter is organised according to the step-
wise introduction of concepts. In the next section we aim at identifying basic
principles of threads and the means of interaction between them provided by
Java. The subsequent sections present more details of the
implementation of the example in Java.
4.1 Thread Principles
In this section, we introduce the main principles of Java threads and of inter-
action among them. We also address nondeterminism, which is inseparable
from the notion of concurrency in Java.
Threads
The textual description of the running example contains hints at inherently
concurrent activities, i.e. activities which can take place simultaneously.
These are, for instance, the concurrently working employees. Also, entering
transfer data, triggering transfers, and transfers themselves are concurrent.
Moreover, one can observe that some entities are only active concurrently
because they are used by concurrent entities, i.e. a concurrent entity can
delegate its activity to another, otherwise passive, entity. For instance, ter-
minals only trigger transfers when they are used by employees.
Conceptually, we can distinguish between
•entities which are the root and thereby the owner of a concurrent ac-
tivity, and
•entities which can be activated or used concurrently but are not the
owner of a concurrent activity.
In the example, we identify employees as being the root of a concur-
rent activity and also transfers because employees can trigger a new transfer
through a terminal before the previous one has finished. All other entities
are not owners of a concurrent activity; they are activated by owners of
a concurrent activity.
Figure 4.1: Banking Simulation Classes (class diagram: Employee and AccountingTransaction inherit from Thread; Bank, Terminal, and Account are ordinary classes)
Using Java, the conceptual concurrency can be mapped to an implemen-
tation level concurrency by means of threads [GJS97, Lea00]. A Java thread
is a single flow of control which executes its code like a sequential pro-
gram. Concurrent threads execute simultaneously, at least conceptually. Each Java thread is also
a Java object, in the following referred to as thread object. Thread objects
are instances of the Java class Thread in the package java.lang.
The conceptual distinction between entities which are the root of a control
flow and entities which are only activated by other control flows exists in Java,
too. Java thread objects are the root of a control flow and such a control
flow can use any other Java objects during its execution.
For our example, we map employees and transfers to Java thread objects.
On the type level we map them to Java classes which inherit from class
Thread. For employees we specify the class Employee and for transfers we
specify the class AccountingTransaction. Those objects not owning a control
flow are mapped to ordinary classes such as Bank, Terminal, and Account.
The resulting classes are depicted in the class diagram of Fig. 4.1, including
the inheritance from class Thread.
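A skeleton of this mapping could look as follows. It is a hypothetical sketch of the classes named in Fig. 4.1 with fields and behaviour omitted, and it only shows which classes inherit from `java.lang.Thread`.

```java
// Hypothetical skeleton of the classes in Fig. 4.1: only the mapping onto
// threads is shown, fields and behaviour are omitted.

// Owners of a concurrent activity: subclasses of java.lang.Thread.
class Employee extends Thread {
    @Override
    public void run() {
        // enter transfer data at a terminal and trigger transfers
    }
}

class AccountingTransaction extends Thread {
    @Override
    public void run() {
        // carry out one transfer between two accounts
    }
}

// Entities that are only used by threads: ordinary classes.
class Bank { }
class Terminal { }
class Account { }
```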
By providing the concept of concurrent threads, Java frees the program-
mer from the burden to map the conceptual concurrency to a sequential
implementation model. Also, the programmer can potentially take advan-
tage of platforms with multiple CPUs. The Java Virtual Machine (JVM),
interpreting the Java byte code, is responsible for mapping the concurrency
to multitasking on one or more CPUs. Concurrent Java programs can run
on any JVM and are intended to be as portable as sequential Java programs.
The technical details of how to create thread objects and how to control
their execution will be explained in section 4.2 on the thread lifecycle. In
the remainder of this section we will address basic principles of interaction
between threads and nondeterminism.
Interaction
The concurrent activities of the example are not completely independent of
each other. For instance, all transfers operate on the same set of accounts, i.e.
all accounts of the bank. Another example is that transfers are triggered by
employees working with terminals. Conceptually, we can identify two kinds
of interactions in this example.
•Activities share data.
•An activity triggers another activity.
Other conceivable interactions are the following.
•An activity delivers data to another activity.
•An activity is waiting for another activity to deliver data, to complete
a task, or to finish completely.
Java accounts for the above interactions between threads. More
complex interaction protocols can be constructed from these kinds of inter-
actions. All threads of a Java program run inside the same process and
share the same address space. A Java thread is also called a light-weight
flow of control, as opposed to processes running in different address spaces,
which are called heavy-weight flows of control. The light-weight approach allows threads
to share all objects of a program as well as system resources such as files.
Threads exchange data via shared objects, and triggering and waiting also
take place via shared objects.
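The following sketch shows data being exchanged via a shared object, together with one form of waiting. One thread fills in a hypothetical `TransferForm`, and another thread waits for it to finish with `join()` before reading the data. All class and thread names are illustrative.

```java
// Hypothetical shared object through which two threads exchange data.
final class TransferForm {
    private String target;
    private int amount;

    void fill(String target, int amount) {
        this.target = target;
        this.amount = amount;
    }

    String describe() {
        return target + ":" + amount;
    }
}

final class SharingDemo {
    static String run() {
        TransferForm form = new TransferForm(); // shared by both threads
        Thread employee = new Thread(() -> form.fill("Account2", 76), "Employee-1");
        employee.start();
        try {
            employee.join(); // wait until the other activity has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // join() guarantees that the writes of the employee thread are visible here
        return form.describe();
    }
}
```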
An interaction which implies a timely coordination between threads is
also called synchronisation. Synchronisation takes place when threads are
triggering each other or when they are waiting for each other for various
reasons. Sharing or exchanging data does not imply synchronisation, although
synchronisation is often needed to prevent data inconsistencies. Synchroni-
sation will be explained in detail in section 4.4 on synchronised interaction.
Both unsynchronised and synchronised interactions bear many problems
which will be the subject of subsequent chapters. The reason why any kind
of interaction is inherently error prone is the nondeterminism inherent in
concurrency.
Nondeterminism
In the example, it is described that employees work simultaneously. Each
action of one employee can take place before, after, or simultaneously with each
action from another employee. That is, entering data by using a terminal
and triggering the transfers happen without timing constraints with regard
to the work of other employees.
When mapping this to Java, it is desirable to keep this deliberate inde-
terminacy. This is possible because in Java concurrent activities and their
parts can, in general, happen in any order with respect to other concurrent
activities, including overlap.
The paradigm of concurrent threads breaks with the sequential paradigm.
•A sequential program specifies a total ordering of statements. The order
of statements specifies the order of execution. A sequential program has
a deterministic execution order at runtime and it is deterministic with
respect to results. Successive statements are executed one by one
without any overlap in time. Given the same input data, a sequential
program executes the same sequence of instructions and produces the
same result. Considered on its own, a thread has the characteristics of
a sequential program.
•A thread is concurrent and nondeterministic only with respect to other
threads. Two execution steps from two concurrent threads may either
overlap or execute in any order (either one may precede the other).
Therefore, concurrent programs are partially ordered. Hence, many le-
gal executions are possible. A concurrent program has a nondetermin-
istic execution order at runtime. This is also called nondeterministic
interleaving [Lea00, MK99]. A scheduler determines a valid execution
order when mapping the threads to one or more CPUs. This schedul-
ing is not predictable for a given program. Therefore a programmer
cannot make any assumptions about the absolute or the relative speed
of progress of threads. From the point of view of the programmer, a
single thread executes in a series of activities as specified by the pro-
grammer interspersed with periods of dormancy not specified by the
programmer, which occur when the thread is deprived of the CPU.
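The partial order can be observed in a small sketch. Two threads each append three numbered steps to a shared, synchronized log. The global order of the six entries is chosen by the scheduler and may vary between runs, but each thread's own entries always appear in program order. `InterleavingDemo` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class InterleavingDemo {
    // Two threads each append three numbered steps to a shared log.
    static List<String> run() {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        Runnable steps = () -> {
            String me = Thread.currentThread().getName();
            for (int i = 0; i < 3; i++) {
                log.add(me + "-" + i);
            }
        };
        Thread t2 = new Thread(steps, "Thread-2");
        Thread t3 = new Thread(steps, "Thread-3");
        t2.start();
        t3.start();
        try {
            t2.join();
            t3.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(log); // interleaving chosen by the scheduler
    }

    // Checks that the entries of one thread appear in its program order.
    static boolean preservesProgramOrder(List<String> log, String thread) {
        int expected = 0;
        for (String entry : log) {
            if (entry.startsWith(thread + "-")) {
                if (!entry.equals(thread + "-" + expected)) {
                    return false;
                }
                expected++;
            }
        }
        return expected == 3;
    }
}
```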
Note that nondeterministic execution order has to be distinguished from
nondeterministic results. Different program executions can but do not nec-
essarily produce differing output for the same input. The nondeterminism
increases the complexity of a concurrent program and therefore makes it
difficult to understand and to find errors in it.
Figure 4.2: Unsynchronised Access of Accounts (sequence diagram with Thread-2, Thread-3, Account1, Account2; messages transfer(Account2, 76), withdraw(76), x:=getValue(), x:=x+76, setValue(x), transfer(Account1, 35), withdraw(35))
We illustrate the nondeterministic execution order with the examples
given in Fig. 4.2 and Fig. 4.3. In the examples, two threads transfer money
between the same two accounts. It is not predictable in which order the
threads access the accounts. In Fig. 4.2, the call transfer(Account2, 76) trig-
gers withdraw(76) on Account2. The current amount of Account1 is queried
by getValue() and stored in the local variable x. Then x is set to x + 76 and
the new value is stored by calling setValue(x) on Account1. Then the call
transfer(Account1, 35) by the other thread triggers a similar activity, which
withdraws 35 from Account1 and adds it to Account2.
In Fig. 4.3, the activities on Account1 interfere. After withdraw(76) on
Account2, Account1 is queried by getValue() for the current amount. This
value is stored in x. Then, in the control flow of the other thread, with-
draw(35) is called on Account1. Then x is set to x + 76. The result of the
second withdraw is lost when the new value of x is stored by calling
setValue(x) on Account1.
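To make the interleaving of Fig. 4.3 reproducible, the following sketch executes its steps in a single thread, in the order described above. The `Account` class and the initial balances are hypothetical. In a real concurrent run this ordering occurs only by chance.

```java
// Hypothetical account offering the operations named in Fig. 4.2 and 4.3.
final class Account {
    private int value;

    Account(int value) { this.value = value; }

    int getValue() { return value; }

    void setValue(int value) { this.value = value; }

    void withdraw(int amount) { value = value - amount; }
}

final class LostUpdateReplay {
    // Executes the steps of Fig. 4.3 one after another and returns the
    // final balance of Account1.
    static int replay(int initial1, int initial2) {
        Account account1 = new Account(initial1);
        Account account2 = new Account(initial2);
        account2.withdraw(76);       // Thread-2
        int x = account1.getValue(); // Thread-2 reads the old balance
        account1.withdraw(35);       // Thread-3 interferes
        x = x + 76;                  // Thread-2
        account1.setValue(x);        // Thread-2 overwrites the effect of withdraw(35)
        return account1.getValue();
    }
}
```

With initial balances of 100, `replay(100, 100)` returns 176 for Account1. Applying both updates without interference would give 100 - 35 + 76 = 141.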
While the two program runs are an example of nondeterminism, they are
at the same time an example of its danger. The problem occurring here is
also known as a lost update. The freedom of the nondeterministic execution
order is not desirable for the concurrent transfers in our example. Synchroni-
sation, i.e. timely coordination, has the very purpose of introducing temporal
constraints between otherwise unordered executions of threads and is a means of cutting
out unwanted orderings. For example, the concurrent access of accounts can
be made mutually exclusive in order to avoid inconsistency of data. While
synchronisation limits the potential orders, it is itself subject to nondeterminism, too.
Synchronisation will be explained in detail in section 4.4 on synchronised
interaction.
Figure 4.3: Lost Update (sequence diagram with the same participants and messages as Fig. 4.2; withdraw(35) on Account1 occurs between x:=getValue() and setValue(x))
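The following sketch anticipates section 4.4 by making the account operations mutually exclusive. Declaring the methods of a hypothetical `SyncAccount` as `synchronized` means that the read-modify-write of one thread cannot interleave with an update by another thread on the same account. No update is lost, and the final balance is the same in every run, even though the order of the individual updates is still nondeterministic.

```java
final class SyncAccount {
    private int value;

    SyncAccount(int value) { this.value = value; }

    // read-modify-write under the account's monitor: no other synchronized
    // method of the same account can run in between
    synchronized void deposit(int amount) {
        int x = value;
        x = x + amount;
        value = x;
    }

    synchronized void withdraw(int amount) { value = value - amount; }

    synchronized int getValue() { return value; }
}

final class MutualExclusionDemo {
    static int run() {
        SyncAccount account1 = new SyncAccount(100);
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < 10000; i++) account1.deposit(76);
        }, "Thread-2");
        Thread t3 = new Thread(() -> {
            for (int i = 0; i < 10000; i++) account1.withdraw(35);
        }, "Thread-3");
        t2.start();
        t3.start();
        try {
            t2.join();
            t3.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return account1.getValue(); // 100 + 10000 * (76 - 35)
    }
}
```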
The Java Memory Model
Chapter 17 of the Java language specification (JLS) [GJSB00] specifies the se-
mantics of Java threads by specifying a set of operations on a memory model
and a set of constraints on their occurrence, also called rules. The Java thread
semantics are fully described in terms of this model. The model is given
in natural language only. We will explicitly discuss the JLS and the memory
model in chapter 6. Here we explain how it relates to thread synchronisation
from the programmer's perspective.
The JLS does not enforce a total ordering for accessing unsynchronised
variables but instead allows partial re-ordering in order to allow compiler
optimisations. These issues do not affect the use of the synchro-
nisation concepts on which this approach is based, and therefore they are
not related to the concurrency failures and potentials either. In the presence of
proper synchronisation, the Java memory model guarantees sequential consistency
and thus the semantics intuitively expected by the programmer.
4.2 Thread Lifecycle
In the running example, we can observe that concurrent activities differ in
their lifetime. Some concurrent activities are there from the beginning and
last as long as the simulation such as the employees. Others come to life
triggered by other concurrent activities and have a determined lifetime, such
as the transfers, triggered by employees and finished when the transfer is
finished.
Java allows the programmer to control the lifetime of a thread accordingly.
In this section we will explain all issues of the lifecycle of a Java thread, i.e.
its construction, start, pausing, termination, and the role of priorities. The
methods to create threads and to control their lifetime are provided by the
Java class Thread and the interface Runnable in the package java.lang. The
class Thread provides non-static methods, to be invoked on objects of class
Thread or of its subclasses, and static or class methods which are invoked on
class Thread.
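The difference between the two kinds of methods can be seen in a small sketch with hypothetical class and thread names. `start()` and `join()` are invoked on a particular thread object. The static method `Thread.currentThread()` is invoked on the class and returns whichever thread is executing the call, here the worker itself.

```java
final class ThreadMethodsDemo {
    static String run() {
        String[] seen = new String[1];
        // Thread.currentThread() is static: it returns the thread executing the call.
        Thread worker = new Thread(() -> seen[0] = Thread.currentThread().getName(), "Employee-1");
        worker.start(); // instance methods are invoked on a particular thread object
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```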
Creating a Thread
In the previous section, we have already mentioned that thread objects are
created by deriving a new class from class Thread and by creating an instance
of the new class. Because Java supports only single inheritance it is not
always feasible to inherit from Thread. Another possibility is to specify a
new class which implements the interface Runnable. Then an instance of
that class is created and it is passed as an argument to a constructor of class
Thread.
In either case, the new class has to provide the method run(). The body of
run() contains the code which will be executed by the thread. In the case of
inheritance, the new class overrides the method run() of class Thread. In the
case of the interface, the new class has to implement run(), which is the
only method of the interface Runnable; the run() method of class Thread then
merely calls the run() method of the Runnable passed to its constructor. The
two possibilities are depicted in Fig. 4.4. For the class Thread we only show
the method run() because it is the focus here. The other methods of Thread
are not meant to be overridden.
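The two variants of Fig. 4.4 can be sketched as follows (Java 8 or later). The class names InheritingThread and ImplementingRunnable are taken from the figure; the shared log list and the helper class CreationDemo are hypothetical additions that only make the effect of each thread visible.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Variant 1 (Fig. 4.4): derive a new class from Thread and override run().
class InheritingThread extends Thread {
    private final List<String> log;

    InheritingThread(List<String> log) {
        this.log = log;
    }

    @Override
    public void run() {
        log.add("inheriting");
    }
}

// Variant 2 (Fig. 4.4): implement Runnable; an instance is passed to a Thread constructor.
class ImplementingRunnable implements Runnable {
    private final List<String> log;

    ImplementingRunnable(List<String> log) {
        this.log = log;
    }

    @Override
    public void run() {
        log.add("implementing");
    }
}

class CreationDemo {
    // Starts one thread of each variant, waits for both and returns what they recorded.
    static List<String> runBoth() {
        List<String> log = Collections.synchronizedList(new ArrayList<String>());
        Thread t1 = new InheritingThread(log);
        Thread t2 = new Thread(new ImplementingRunnable(log)); // Runnable as constructor argument
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return log;
    }

    public static void main(String[] args) {
        // The order of the two entries depends on the scheduler.
        System.out.println(runBoth());
    }
}
```

In both cases it is start() that creates the new control flow; calling run() directly would simply execute the method in the calling thread.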
Figure 4.4: Deriving New Threads (class diagram: creating threads by inheritance, where InheritingThread extends Thread and overrides +run:void; creating threads by implementing Runnable, where ImplementingRunnable implements the interface Runnable with +run:void and is passed to Thread as constructor_argument)
In the following, we give an example of a new class derived from Thread.
The purpose of the class AccountingTransaction is to transfer an amount of
money between a source and a target account of class Account. This behaviour
is implemented in the run()-method. This method has no parameters, and its
signature cannot be changed because it is called implicitly, as will be
explained later. Apart from specifying run(), we therefore have to make sure
that the thread is given the data it needs for carrying out a transfer. The
accounts involved and the amount to be transferred are passed to the
constructor and are assigned to fields declared in class
AccountingTransaction.
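class AccountingTransaction extends Thread {
    Account a1;
    Account a2;
    int amount;

    // the data needed for the transfer is passed with the constructor
    AccountingTransaction(Account a, Account b, int am) {
        a1 = a;
        a2 = b;
        amount = am;
    }

    // executed by the new thread after start() has been called
    public void run() {
        // delegates to the synchronized method Account.transfer(),
        // which withdraws amount from a2 and adds it to a1
        a1.transfer(a2, amount);
    }
}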
Apart from the fields defined in the derived class, each thread also has a
number of predefined attributes such as its name, its thread group, its
priority, and the daemon property. Insofar as they are relevant for
concurrency failures, the purpose of these attributes will be explained in
the sequel. The attributes can only be inspected and manipulated through
methods declared in class Thread, e.g. getName(), getPriority(),
setPriority(), and setDaemon(). The thread group is used to control access
to threads, e.g. which threads may modify which other threads. A thread's
lifetime, however, is not bound to the lifetime of the thread which created
it.
Starting a Thread
After the creation of a thread object, the corresponding thread is started by
invoking start() on the thread object. This causes the Java Virtual Machine
to fork a new control flow which executes the thread object's run()-method.
Because run() is called implicitly, its signature cannot be changed and it
has no parameters. A thread is always created and started by another thread,
which can also be the main()-thread.
Above, we have specified the class AccountingTransaction for implementing
transfers. Now, we want to create thread instances and fork new control flows.
We have to identify which thread is responsible for creating and starting
them. Transfers are triggered by employees using the terminal. So the code
responsible for creating an object of AccountingTransaction is part of the
class Terminal. This is an object without its own control flow but it will be
used by an employee thread and hence it will be an employee thread which
creates an instance of AccountingTransaction. The class Terminal is given in
the following. An object of class AccountingTransaction is created in method
transferAmount(). Then its priority is set to the maximum. Then start() is
called on it. The purpose of priorities will be explained later.
class Terminal {
    private Bank bank;
    AccountingTransaction at;

    Terminal(Bank b) {
        bank = b;
    }

    // see Terminal.handOverCheque()
    // executed by the employee thread which uses this terminal
    public void transferAmount(int a1, int a2, int amount) {
        Account A1 = bank.getAccount(a1);
        Account A2 = bank.getAccount(a2);
        at = new AccountingTransaction(A1, A2, amount);
        at.setPriority(Thread.MAX_PRIORITY);
        at.start(); // forks a new control flow executing at.run()
    }
}
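There is a short period of time between the call of start() and the moment
at which the new thread actually begins to execute run(), and likewise
between the completion of run() and the moment at which the thread has been
cleaned up. The length of these periods is not defined, so a program cannot
rely on the new thread already running when start() returns. However,
isAlive() returns true as soon as start() has returned, even if run() has not
yet begun to execute (unless the thread has already terminated again).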
Each thread object can be started only once. Calling start() again on a
thread object whose thread is still running, or whose thread has already
finished, is illegal; in current Java versions an IllegalThreadStateException
is thrown in both cases. In a concurrent Java program the number of threads
is not fixed at compile time and is limited only by the available resources.
Every Java program has at least one thread, the one executing the method
main(), which does not have to be created explicitly.
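The rules for starting a thread and the behaviour of isAlive() can be checked with a small sketch (Java 8 or later). The class StartOnceDemo is hypothetical, and a CountDownLatch from java.util.concurrent is used only to keep the started thread alive long enough for the observation to be deterministic.

```java
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;

class StartOnceDemo {
    // Returns true if calling start() a second time on a finished thread is rejected.
    static boolean secondStartRejected() {
        Thread t = new Thread(() -> { /* nothing to do */ });
        t.start();
        try {
            t.join();                        // t has terminated after this call
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        try {
            t.start();                       // attempt to restart the finished thread
            return false;
        } catch (IllegalThreadStateException e) {
            return true;
        }
    }

    // Returns isAlive() before start(), right after start(), and after join().
    static boolean[] aliveStates() {
        CountDownLatch release = new CountDownLatch(1);
        Thread t = new Thread(() -> {
            try {
                release.await();             // keep run() busy until released
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        boolean beforeStart = t.isAlive();
        t.start();
        boolean afterStart = t.isAlive();    // run() cannot have finished yet
        release.countDown();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new boolean[] { beforeStart, afterStart, t.isAlive() };
    }

    public static void main(String[] args) {
        System.out.println(secondStartRejected());
        System.out.println(Arrays.toString(aliveStates()));
    }
}
```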
While a thread is running, its purpose is to do calculations, interact with
its environment, and interact with other threads, all of which is accomplished
by creating and accessing other Java objects. A thread navigating through
the net of objects might need to access its thread object, which is typically
not known in the object in which the thread is currently executing. By
calling the static method currentThread(), a thread can obtain a reference to
its thread object. If the thread object is an instance of a subclass of class
Thread, the fields of the subclass can only be accessed after casting the
reference returned by currentThread() to that subclass.
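A minimal sketch of this cast, assuming a hypothetical subclass TransferThread with an extra field and a hypothetical passive class Ledger, which has no reference to the thread executing in it.

```java
// Hypothetical subclass of Thread carrying thread-specific data in a field.
class TransferThread extends Thread {
    final int transferId;
    private final Ledger ledger;
    volatile int observedId = Integer.MIN_VALUE;

    TransferThread(int transferId, Ledger ledger) {
        this.transferId = transferId;
        this.ledger = ledger;
    }

    @Override
    public void run() {
        observedId = ledger.currentTransferId();
    }
}

// Hypothetical passive object: it does not know which thread executes in it.
class Ledger {
    int currentTransferId() {
        Thread current = Thread.currentThread();          // the thread executing this method
        if (current instanceof TransferThread) {
            return ((TransferThread) current).transferId; // cast needed to reach the subclass field
        }
        return -1;                                        // called by some other kind of thread
    }
}

class CurrentThreadDemo {
    // Returns the id seen inside the TransferThread and the id seen by the calling thread.
    static int[] run() {
        Ledger ledger = new Ledger();
        TransferThread t = new TransferThread(42, ledger);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] { t.observedId, ledger.currentTransferId() };
    }

    public static void main(String[] args) {
        int[] ids = run();
        System.out.println(ids[0] + " " + ids[1]);
    }
}
```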
Instead of providing thread-specific data via subclassing, the class
ThreadLocal can be used. A ThreadLocal object provides the methods set() and
get(), by which a thread can store and retrieve data independently of where
it is currently executing. Although the ThreadLocal object itself can be
shared, each thread accessing it has its own, independently initialised copy
of the value and is not aware of the copies of other threads.
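The following sketch (Java 8 or later) illustrates this behaviour with a hypothetical ThreadLocal holding a customer name: one ThreadLocal object is shared by two threads, yet each thread retrieves only the value it stored itself, and a thread that never called set() obtains null.

```java
class ThreadLocalDemo {
    // One ThreadLocal object shared by all threads; each thread sees only its own value.
    private static final ThreadLocal<String> CUSTOMER = new ThreadLocal<String>();

    // Can be called anywhere in a thread's call chain without passing the value along.
    static String currentCustomer() {
        return CUSTOMER.get();               // null if this thread never called set()
    }

    static String[] run() {
        String[] seen = new String[2];
        Thread a = new Thread(() -> { CUSTOMER.set("Alice"); seen[0] = currentCustomer(); });
        Thread b = new Thread(() -> { CUSTOMER.set("Bob"); seen[1] = currentCustomer(); });
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }

    public static void main(String[] args) {
        String[] seen = run();
        System.out.println(seen[0] + " " + seen[1] + " " + currentCustomer());
    }
}
```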
Pausing a Thread
In a concurrent environment, threads will not always have to calculate some-
thing but can have explicit idle times before synchronising with other threads
or before exchanging data. In the bank example, we want to simulate that
employees are not working all the time but can be idle.
Idle time is mapped to Java as follows. With the static method sleep()
a thread can pause itself for a specified time. The thread will automatically
resume when the time span has elapsed.
public static void sleep(long ms) throws InterruptedException
Table 4.1: Signature of sleep()
Invoking this method can raise an InterruptedException, for which an
exception handler has to be supplied. The exception is thrown if interrupt()
has been called on the thread, i.e. its interrupt flag has been set, before or
while it is sleeping, and the thread has not yet cleared the flag. The
interrupt mechanism will be explained in detail in Section 4.3.
In our example we can use sleep() in the run()-method of employees (see
the next code example) so that they are not active all the time. After they
have been started, they generate data for transfers randomly. Then they
pause before they start the transfers.
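The methods suspend() and resume(), which allow a thread to pause and
resume another thread, are deprecated because they are deadlock-prone: a
suspended thread keeps all locks it holds, so no other thread can acquire
them until it is resumed.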
Prioritising a Thread
While in general the developer should not have to think about the order in
which multiple threads are running, there may be cases where a thread
performs an urgent calculation which should be favoured over other threads.
Thread priorities are a way to express this.
In Section 4.1 we explained that the scheduler is completely responsible for
determining a valid execution order. In some cases this is not satisfactory,
e.g. if some calculations are more urgent than others. For instance, the
transfers from the banking example should be carried out immediately,
because they need mutually exclusive access to accounts and should not
block them for too long.
The Java Virtual Machine supports simple fixed-priority scheduling. Each
Java thread is given a numeric priority in the range from MIN_PRIORITY,
defined as 1, to MAX_PRIORITY, defined as 10. By default, the main() thread
is given the priority NORM_PRIORITY, defined as 5. Unless set otherwise, each
thread has the same priority as the thread which created it. The priority can
be changed dynamically with setPriority().
The thread with the highest priority is chosen for execution. When that
thread stops, or is suspended for some reason, a thread of the same or of
lower priority starts executing. If a thread with a higher priority than the
currently executing thread becomes ready to execute, it is scheduled
immediately. This is also called preemption. The Java runtime will not
preempt the currently running thread for another thread of the same
priority.¹ Depending on the underlying system, some Java Virtual Machines map
the range of priorities to a smaller range, with the effect that some threads
will not be able to preempt others. In general, the Java Language
Specification treats priorities only as a preference, not as a guarantee;
they cannot be used to reliably implement mutual exclusion.
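The priority constants and the inheritance of priorities from the creating thread can be observed directly. This sketch (Java 8 or later; the class PriorityDemo is hypothetical) assumes that the thread group of the calling thread permits priority 7, which holds for the default thread group. It only demonstrates attribute values, not any particular scheduling behaviour.

```java
class PriorityDemo {
    // Returns the priority that a thread created by a priority-7 thread receives.
    static int inheritedPriority() {
        int[] childPriority = new int[1];
        Thread parent = new Thread(() -> {
            Thread child = new Thread(() -> { });   // created (not started) by parent
            childPriority[0] = child.getPriority(); // copied from the creating thread
        });
        parent.setPriority(7);                      // set before parent creates its child
        parent.start();
        try {
            parent.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return childPriority[0];
    }

    // Priorities outside MIN_PRIORITY..MAX_PRIORITY are rejected.
    static boolean outOfRangeRejected() {
        try {
            new Thread(() -> { }).setPriority(Thread.MAX_PRIORITY + 1);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(Thread.MIN_PRIORITY + " " + Thread.NORM_PRIORITY + " " + Thread.MAX_PRIORITY);
        System.out.println(inheritedPriority() + " " + outOfRangeRejected());
    }
}
```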
Going back to the example, we use priorities to ensure that transfers are
executed promptly. Transfers are created in the code of class Terminal (see
the code example in the subsection on starting a thread). After the creation
of a thread object of class AccountingTransaction,
setPriority(Thread.MAX_PRIORITY) is called on it.
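A thread may, at any time, offer to give up its right to execute by calling
the static method yield(). Under the scheduling described above, a yielding
thread can only hand over the CPU to other ready threads of the same
priority, since threads of higher priority would already have preempted it.
The call is only a hint to the scheduler, which may also ignore it. In the
example we call Thread.yield() after an employee has started a transfer via
the terminal, so that other employees also get the chance to issue transfers.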
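Note the difference between sleep() and yield(). A sleeping thread pauses for
the specified time (subject to the precision of the system timers), whereas a
yielding thread may resume immediately. A yielding thread may appear to sleep
for a while, but this effect is not guaranteed because it depends on all other
threads of the program.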
class Employee extends Thread {
    private int amount;
    private static int numAccounts;
    private Bank bank;
    private Terminal term;

    Employee(Bank b, Terminal t) {
        bank = b;
        term = t;
        numAccounts = b.getNumAccounts();
    }

    public void run() {
        int i = 0;
        while (i < 100) {
            // helper methods (not shown) determine random parameters for the simulation
            amount = randomizedAmount();
            int a1 = randomizedAccount();
            int a2 = randomizedAccount();
            // simulate idle time
            try {
                Thread.sleep(500);
            } catch (InterruptedException ie) {
                // interrupts are ignored in this simulation
            }
            // transfer only between different accounts
            if (a1 != a2) {
                term.transferAmount(a1, a2, amount);
                // give other employee threads of the same priority a chance to run
                Thread.yield();
                i++;
            }
        }
    }
}

¹ The system implementation of threads underlying the Java Thread class may
support time-slicing, which has the effect that threads of the same priority
effectively preempt each other.
Terminating a Thread
In the example we decide that a thread for an accounting transaction only
carries out one transfer and is then finished. For a new transfer, a new
thread will be started.² Usually, the control flow of a thread terminates when
run() returns. The return is reached if neither an endless loop nor an
unresolvable liveness problem occurs. The method isAlive(), called on a
thread object, allows another thread to test whether this thread has been
started and has not yet terminated. A thread does not terminate when the
thread which created it terminates. A thread can, however, be given the
daemon³ property by calling setDaemon(true) before it is started. The Java
Virtual Machine keeps running as long as at least one non-daemon thread is
alive; once only daemon threads remain, the Java Virtual Machine exits and the
daemon threads are terminated with it. A daemon thread therefore runs at most
as long as the Java Virtual Machine.
² Note that this is a very naive approach. As starting and terminating threads
is time-consuming, a real program would rather reuse a thread to perform
accounting transactions one after another.
³ According to the more recent explanation, the term daemon for background
processes goes back to a thought experiment by the physicist Maxwell, in which
a hidden daemon sorts molecules. The hypothesis it superseded holds that the
term is an acronym for disk and execution monitor.
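Whether a thread outlives its creator, and when the daemon property may be set, can be demonstrated as follows (Java 8 or later). The class DaemonDemo and its helper methods are hypothetical; a CountDownLatch keeps the threads alive while they are observed. That the Java Virtual Machine exits once only daemon threads remain cannot be asserted from within a running program and is therefore not shown.

```java
import java.util.concurrent.CountDownLatch;

class DaemonDemo {
    // A thread keeps running after the thread that created it has terminated.
    static boolean childOutlivesCreator() {
        CountDownLatch release = new CountDownLatch(1);
        Thread[] child = new Thread[1];
        Thread creator = new Thread(() -> {
            child[0] = new Thread(() -> awaitQuietly(release));
            child[0].start();
        });                                   // creator terminates when its run() returns
        creator.start();
        joinQuietly(creator);
        boolean outlives = !creator.isAlive() && child[0].isAlive();
        release.countDown();                  // now let the child finish as well
        joinQuietly(child[0]);
        return outlives;
    }

    // The daemon property can be set before start(), but not while the thread is alive.
    static boolean[] daemonProperty() {
        CountDownLatch release = new CountDownLatch(1);
        Thread t = new Thread(() -> awaitQuietly(release));
        t.setDaemon(true);                    // allowed: t has not been started yet
        boolean isDaemon = t.isDaemon();
        t.start();
        boolean rejected;
        try {
            t.setDaemon(false);               // not allowed: t is alive
            rejected = false;
        } catch (IllegalThreadStateException e) {
            rejected = true;
        }
        release.countDown();
        joinQuietly(t);
        return new boolean[] { isDaemon, rejected };
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```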
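Methods to stop a thread, such as stop(), were initially provided by class
Thread but have been deprecated because of potentially dangerous side effects
on other running threads: stopping a thread releases all locks it holds, so
objects it was modifying can be left in inconsistent and undefined states
which then become visible to other threads.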
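A thread terminates abnormally when an exception propagates out of run()
without being caught. Since run() does not declare any checked exceptions,
this can only be an unchecked exception or an error; a checked exception has
to be caught within run() or re-thrown wrapped in an unchecked exception. This
is discussed in the following.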
Exceptional Termination
Although the exception mechanism is the same for concurrent Java programs
as for single-threaded Java programs, its effects are slightly different.
While a sequential program is aborted if an exception is not handled, an
unhandled exception in a multithreaded program only terminates the thread in
which it occurs. The program may continue as long as other non-daemon threads
are alive, even if the terminated thread is the main()-thread.
Exceptions are objects that store information about the occurrence of an
unusual or failure condition. They are thrown when that error or unusual
condition occurs. Java provides a mechanism for handling exceptions, also
known as catching an exception. There are two kinds of exceptions in Java,
unchecked exceptions and checked exceptions. Checked exceptions are part of
method signatures.
• Unchecked exceptions inherit from class RuntimeException, e.g. the
predefined ArithmeticException, NullPointerException, or
ArrayIndexOutOfBoundsException (subclasses of Error are unchecked as well).
Java does not require that you declare or catch unchecked exceptions in your
program code. Unchecked exceptions may, however, be handled as explained for
checked exceptions in the following paragraph.
• Checked exceptions are subclasses of Exception which do not extend the
class RuntimeException, e.g. IOException or InterruptedException. Checked
exceptions must be handled by the programmer; otherwise a compile-time error
occurs.
There are two ways to handle checked exceptions: by declaring the exception
using a throws-clause or by catching the exception. To declare an exception,
the keyword throws is added to the method header, followed by the class name
of the exception. Any method that calls a method which throws a checked
exception must in turn handle the checked exception in one of these two ways.
If an exception is not caught before it propagates out of the method main()
of the application, the main thread terminates; in a single-threaded program
this means that the program crashes.
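Both ways of handling a checked exception can be sketched with InterruptedException, the checked exception which accompanies sleep(). The class CheckedExceptionDemo and its method names are hypothetical.

```java
class CheckedExceptionDemo {
    // Way 1: declare the checked exception with throws; callers must handle it in turn.
    static void pause(long ms) throws InterruptedException {
        Thread.sleep(ms);
    }

    // Way 2: catch the checked exception; callers of this method need not handle it.
    static boolean pauseUninterrupted(long ms) {
        try {
            pause(ms);
            return true;                     // slept without being interrupted
        } catch (InterruptedException e) {
            return false;                    // the exception is handled here
        }
    }

    public static void main(String[] args) {
        System.out.println(pauseUninterrupted(10));
        Thread.currentThread().interrupt();  // set the interrupt flag of the main thread
        System.out.println(pauseUninterrupted(10));
    }
}
```

Since the interrupt flag is already set in the second call, sleep() throws immediately, and the flag is cleared when the exception is thrown.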
After a method throws an exception, the runtime system searches the call
stack for a method which handles the exception. It searches backwards through
the call stack, beginning with the method in which the error occurred, until
it finds a method that contains an appropriate exception handler. An
exception handler is considered appropriate if the type of the exception
thrown is the same as, or a subclass of, the type of exception handled by the
handler. The exception handler chosen is said to catch the exception. If the
runtime system exhaustively searches all methods on the call stack of the
thread without finding an appropriate exception handler, the thread
terminates; in a single-threaded program this terminates the runtime system
and consequently the Java program.
When the main thread in a single-threaded application throws an uncaught
exception, the stack trace is printed on the standard error stream, e.g. the
console, and the program exits. This is also referred to as a program crash.
In a multithreaded environment not every thread is attached to a console.
Therefore, a thread can silently disappear from an application, in particular
without leaving a (stack) trace, when an exception is thrown and not caught.
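The following sketch (Java 8 or later) shows that an uncaught exception only terminates the thread in which it occurs. It uses Thread.setUncaughtExceptionHandler(), available since Java 5, to make the otherwise unnoticed failure visible; the class UncaughtDemo and the message text are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

class UncaughtDemo {
    // Runs a thread whose run() throws an unchecked exception and reports what happened.
    static String runFailingThread() {
        AtomicReference<String> seen = new AtomicReference<String>("nothing");
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("transfer failed"); // not caught inside run()
        });
        // Without this handler, the default behaviour prints the stack trace to System.err.
        worker.setUncaughtExceptionHandler((t, e) -> seen.set(e.getMessage()));
        worker.start();
        try {
            worker.join();    // returns normally although the worker terminated abnormally
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen.get();    // the calling thread is still running
    }

    public static void main(String[] args) {
        System.out.println("handler saw: " + runFailingThread());
    }
}
```

Registering such a handler, or a default one via Thread.setDefaultUncaughtExceptionHandler(), is one way to keep threads from disappearing without a trace.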
4.3 Unsynchronised Interaction
Apart from the fact that a thread is always created by another thread, we
have not yet addressed interaction between threads. Interaction takes place
between two threads, either directly or via shared objects. It is either
unsynchronised, meaning that no temporal coordination controls the
interaction, or synchronised, meaning that some timing constraints apply,
e.g. mutual exclusion. In this section we address unsynchronised interaction
between threads, via shared objects and directly.
Sharing Objects
Threads can use ordinary Java objects to communicate, either by explicitly
exchanging data or by using data exchange to encode signals. Shared use of
data is a source of inconsistencies in the case of simultaneous read and
write accesses. In general, synchronisation has to be applied to avoid such
inconsistencies. Unnecessary synchronisation should be avoided, however,
because it can become a bottleneck for a program. Synchronisation is not
needed if several threads access the same object only through read-only
methods, as the state of the object never changes and hence the read-only
methods may take place in any order, including overlapping.
The class Bank creates account objects and can be asked for references to
them. For instance, an employee can query the bank for a certain account.
Many such queries can happen in any order, including overlapping ones.
Because they do not change the state of the bank, there is no constraint on
their execution order, i.e. they do not have to be synchronised. In this
somewhat contrived example, the bank interface does not provide methods for
changing the set of accounts. The constructor is the only code which could
interfere with the other, read-only methods. A constructor cannot be
synchronised, and therefore other mechanisms are needed if threads may access
objects before they are fully constructed.
In the following the code of Bank is given. After creation, the bank
can be queried by the two methods for those accounts. With the method
getNumAccounts() the total number of accounts can be queried and with the
method getAccount(int nr) an individual account can be retrieved to work
with it.
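Java guarantees that reads and writes of fields of any type except long and
double are atomic. This also holds for fields holding references to other
objects. The exception for long and double means that a write to such a
64-bit field may be carried out as two separate 32-bit writes, so concurrent
accesses can overlap and a thread may read a value which none of the threads
has written. To avoid these inconsistencies, reads and writes of long and
double fields can be made atomic by declaring the fields volatile. Note that
the atomicity of single reads and writes does not extend to compound
operations such as value++, which still require synchronisation.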
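import java.util.Hashtable;
import java.util.Vector;

public class Bank {
    private Hashtable accounts;   // account number (Integer) -> Account
    private Vector employees;     // the employee threads
    private int numAccounts;
    private int numEmployees;

    Bank(int numEmpl, int numAcc) {
        numEmployees = numEmpl;
        numAccounts = numAcc;
        accounts = new Hashtable();
        employees = new Vector();
        // create the accounts, each with an initial value of 100
        for (int i = 0; i < numAccounts; i++) {
            accounts.put(Integer.valueOf(i), new Account(i, 100));
        }
        // create the employees, each with its own terminal
        for (int j = 0; j < numEmployees; j++) {
            Terminal t = new Terminal(this);
            Employee e = new Employee(this, t);
            employees.addElement(e);
        }
        // start all employee threads ...
        for (int j = 0; j < numEmployees; j++) {
            Employee e = (Employee) employees.elementAt(j);
            e.start();
        }
        // ... and wait until all of them have terminated
        for (int j = 0; j < numEmployees; j++) {
            Employee e = (Employee) employees.elementAt(j);
            try {
                e.join();
            } catch (InterruptedException ie) {
                // interrupts are ignored in this simulation
            }
        }
    }

    public int getNumAccounts() {
        return numAccounts;
    }

    public Account getAccount(int nr) {
        return (Account) accounts.get(Integer.valueOf(nr));
    }
}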
Interrupting a Thread
A thread has a built-in interrupt mechanism. Other threads can interrupt a
thread by invoking the method interrupt() on its thread object; internally,
the interrupt status of the thread is then set to interrupted. The thread can
poll its interrupt status using the static method interrupted(), which
returns true or false. This method also clears the interrupt status, i.e.
sets it to not interrupted. Other threads can test whether a thread has been
interrupted by invoking the method isInterrupted() on its thread object; this
method leaves the interrupt status unaffected. If interrupt() is called on a
thread which is in a state where it cannot poll its flag, e.g. sleeping,
waiting, or joining, an InterruptedException is raised in that thread. While
interrupting serves as a means of communication, it cannot be considered a
form of synchronisation. The interrupting thread has no control over when the
interrupted thread will handle the interrupt, and it is not informed when the
interrupt has been handled. Such an interrupt mechanism could easily be
simulated by two programmer-defined methods, one to send a signal to the
thread object and one to read out the signal. The advantage of the built-in
interrupt is that a thread which is in a state where it cannot poll the
signal is interrupted by an exception.
public void interrupt()
public static boolean interrupted()
public boolean isInterrupted()
Table 4.2: Signature of Interrupt Methods
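Note that every call to wait(), sleep(), or join() must either be enclosed in
a try-catch-block or be covered by a throws-clause for InterruptedException.
Each such call returns with an InterruptedException when an interrupt
happens, and the interrupt status is cleared when the exception is thrown,
i.e. each such call can consume an interrupt.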
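The behaviour of the interrupt methods can be sketched as follows (Java 8 or later; the class InterruptDemo is hypothetical). The first method interrupts a sleeping thread; the second polls the interrupt status of the calling thread with both polling methods.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class InterruptDemo {
    // Interrupting a sleeping thread makes sleep() throw an InterruptedException,
    // and the interrupt status is cleared when the exception is thrown.
    static boolean[] interruptSleeper() {
        AtomicBoolean caught = new AtomicBoolean(false);
        AtomicBoolean statusInHandler = new AtomicBoolean(true);
        Thread sleeper = new Thread(() -> {
            try {
                Thread.sleep(60_000);                 // would sleep for a minute
            } catch (InterruptedException ie) {
                caught.set(true);
                statusInHandler.set(Thread.currentThread().isInterrupted());
            }
        });
        sleeper.start();
        sleeper.interrupt();   // works whether or not sleep() has already been entered
        try {
            sleeper.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        return new boolean[] { caught.get(), statusInHandler.get() };
    }

    // interrupted() clears the status of the calling thread, isInterrupted() does not.
    static boolean[] pollStatus() {
        Thread.currentThread().interrupt();
        boolean viaIsInterrupted = Thread.currentThread().isInterrupted(); // status kept
        boolean firstPoll = Thread.interrupted();                          // status cleared
        boolean secondPoll = Thread.interrupted();                         // nothing left to report
        return new boolean[] { viaIsInterrupted, firstPoll, secondPoll };
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(interruptSleeper()));
        System.out.println(java.util.Arrays.toString(pollStatus()));
    }
}
```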
4.4 Synchronised Interaction
In this section we look at the synchronisation mechanisms provided by Java.
As motivated earlier, concurrent threads need to coordinate their activities
with respect to time. This may or may not include the exchange of data.
Note that start() is also a means of temporal coordination between the thread
starting a new thread and the new thread being started. In the following we
look at further language means.
Joining
Forking a new thread by calling start() is the simplest means of synchro-
nisation between threads. Similarly, a thread can wait for another thread
to finish before it does something. In the banking simulation, the bank is
responsible for forking the new employee threads. The bank can finish when
all employee threads are finished.
This can be expressed by calling e.join() on each employee object (see
the code above for class Bank). The non-static method join() suspends the
calling thread until the target thread, on which the method is called,
terminates. Method join() throws an InterruptedException if the joining thread
is interrupted by another thread. There is also a variant with a timeout,
given in milliseconds as a parameter; it returns at the latest when the
timeout has elapsed, even if the target thread is still alive.
public void join() throws InterruptedException
public void join(long ms) throws InterruptedException
Table 4.3: Signature of join()
A thread can only join one other thread at a time. The joining has no
effect on the joined thread. It can join any thread provided it has a reference
to its thread object. If it needs to join more than one thread it has to do
it one after another. In the case that the target thread terminates with an
exception, the joining thread returns without an exception.
A thread can join a thread which is already terminated. A thread can
join a thread which is not yet started. In both cases the joining thread will
return immediately.
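The cases listed above can be checked with a small sketch (Java 8 or later; the class JoinDemo is hypothetical): joining a thread which was never started, joining a thread whose run() terminated with an exception, and joining with a timeout a thread which is still blocked.

```java
import java.util.concurrent.CountDownLatch;

class JoinDemo {
    static boolean[] run() {
        boolean[] result = new boolean[3];
        Thread notStarted = new Thread(() -> { });
        Thread failing = new Thread(() -> { throw new RuntimeException("boom"); });
        failing.setUncaughtExceptionHandler((t, e) -> { }); // suppress the default stack trace
        CountDownLatch release = new CountDownLatch(1);
        Thread blocked = new Thread(() -> {
            try {
                release.await();
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        });
        failing.start();
        blocked.start();
        try {
            notStarted.join();        // returns immediately: never started, hence not alive
            result[0] = true;
            failing.join();           // returns normally although run() threw an exception
            result[1] = !failing.isAlive();
            blocked.join(50);         // gives up after about 50 ms, target still alive
            result[2] = blocked.isAlive();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        } finally {
            release.countDown();      // let the blocked thread terminate
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(run()));
    }
}
```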
Mutual Exclusion
In a multithreaded environment threads can access objects simultaneously.
This can put objects in an inconsistent state, also referred to as data
inconsistency. To avoid inconsistencies, access to these objects must be
mutually exclusive. Mutual exclusion is only unnecessary if it is ensured
that an object is accessed by at most one thread, or that all threads access
it read-only.
In our example the accounts are shared by the threads carrying out the
transfers, so access to them needs to be mutually exclusive. We have
illustrated in Fig. 4.3 that concurrent transfers can cause inconsistencies in
account data. Here we explain how this kind of problem can be avoided.
Java supports mutual exclusion through monitors. A monitor encapsulates
data and provides mutually exclusive access to observe or modify the data.
In addition, a monitor supports conditional synchronisation. This permits a
monitor to block threads until a particular condition holds, e.g. a condition
on its data. In this subsection we deal with mutual exclusion. In the next
subsection we deal with conditional synchronisation.
Any Java object can serve as a monitor. Concurrent activations of the
same or of different methods on the same object are made mutually exclusive
by prefixing the methods with the keyword synchronized. A thread executing a
synchronized-method excludes other threads from executing any
synchronized-method on the same object. Note that synchronized does not lock
the entire object against all access: non-synchronized methods can always be
executed, even while another thread holds the lock.
synchronized method name (...)
static synchronized method name (...)
synchronized(o) {...}
synchronized(this.getClass()) {...}
Table 4.4: Keyword synchronized
A class can also serve as a monitor if the keyword synchronized is attached
to static methods. The class lock has no relationship to the object locks,
i.e. holding the lock of a class does not exclude other threads from the
synchronized-methods of its instances, and vice versa. More fine-grained
exclusion is achieved by declaring a block of statements within a method as
synchronized. Because synchronisation in Java is always associated with a
monitor, a monitor object has to be supplied explicitly when specifying a
synchronized-block, whereas for a synchronized-method it is given implicitly.
This can be the object on which the method containing the synchronized-block
has been called; the object specified for the synchronized-block is then
this. It can also be any other object known in the context of that method.
The exclusion rules for methods extend to blocks. We use the term
synchronized-region to refer to both synchronized-blocks and
synchronized-methods.
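The forms of synchronized listed in Table 4.4 can be combined in one hypothetical class Counter (Java 8 or later). Because increment() and the block in incrementInBlock() use the same lock, the updates of two concurrently running threads are never lost; without synchronized, the final count would typically, but not reliably, be smaller.

```java
// Hypothetical counter illustrating the forms of synchronized from Table 4.4.
class Counter {
    private int count;            // guarded by the lock of this Counter object
    private static int created;   // guarded by the lock of the class object Counter.class

    Counter() {
        synchronized (Counter.class) {   // same lock as the static synchronized methods of Counter
            created++;
        }
    }

    synchronized void increment() {      // synchronized-method: lock of this
        count++;
    }

    void incrementInBlock() {
        synchronized (this) {            // synchronized-block on the same lock as increment()
            count++;
        }
    }

    synchronized int get() {
        return count;
    }

    static synchronized int getCreated() {
        return created;
    }
}

class CounterDemo {
    // Two threads update the same counter via the method and via the block, respectively.
    static int run(int perThread) {
        Counter c = new Counter();
        Thread t1 = new Thread(() -> { for (int i = 0; i < perThread; i++) c.increment(); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < perThread; i++) c.incrementInBlock(); });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return c.get();
    }

    public static void main(String[] args) {
        System.out.println(run(100_000));
    }
}
```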
In the following example of class Account from our banking example, all
methods are made mutually exclusive using synchronized. This guarantees
mutually exclusive access to accounts in order to keep the account value
consistent in a multithreaded context.
public class Account {
    private long value;   // guarded by the lock of this account
    private int nr;

    Account(int n, long v) {
        nr = n;
        value = v;
    }

    public synchronized int getNumber() {
        return nr;
    }

    public synchronized long getValue() {
        return value;
    }

    public synchronized void setValue(long v) {
        value = v;
    }

    public synchronized void withdraw(long amount) {
        value -= amount;
    }

    // Holds the lock of this account while acquiring the lock of other
    // (nested locking): amount is withdrawn from other and added to this account.
    public synchronized void transfer(Account other, long amount) {
        other.withdraw(amount);
        value += amount;
    }
}
To describe the operational meaning of synchronized, we draw on the
metaphor of a lock for each object. To be able to enter a synchronized-block
or -method a thread needs to obtain a lock on the object. If multiple threads
are trying to obtain such a lock, only one thread is assigned the lock. Other
threads are locked out and they are blocked, i.e. they cannot proceed. If a
thread leaves a synchronized-region, it releases the lock. By nondeterministic
choice one of the blocked threads will receive the lock and can start to execute.
All synchronized non-static methods inherited and defined for a class use the
same lock.
The following is important to know when using synchronized:
• Java locking is nested and re-entrant: a thread can hold more than one
lock at a time, and it can re-enter a lock it already holds. Re-entrance
allows a synchronized method to make a self-call to another synchronized
method on the same object without blocking itself.
• The keyword synchronized is not automatically inherited when subclasses
override superclass methods. A synchronized-method can be overridden by a
non-synchronized-method.
• Methods in interfaces cannot be declared as synchronized.
• The object lock does not imply mutually exclusive access to static fields
of the corresponding class. Mutually exclusive access to the static fields of
a class is achieved by static synchronized methods, but only for that class
and not for its superclasses.
• A constructor cannot be synchronized.
• Method run() should never be synchronized.
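Re-entrance can be observed with Thread.holdsLock(), which reports whether the calling thread holds the lock of a given object. In the hypothetical class ReentrantAccount, depositTwice() calls the synchronized method deposit() while already holding the lock of the same object.

```java
class ReentrantAccount {
    private long value;

    synchronized void deposit(long amount) {
        value += amount;
    }

    // Self-call from one synchronized method to another on the same object:
    // the lock already held is simply re-entered, so the thread does not block itself.
    synchronized boolean depositTwice(long amount) {
        boolean ownsLock = Thread.holdsLock(this);   // true inside a synchronized method
        deposit(amount);
        deposit(amount);
        return ownsLock;
    }

    synchronized long getValue() {
        return value;
    }

    public static void main(String[] args) {
        ReentrantAccount a = new ReentrantAccount();
        System.out.println(a.depositTwice(5) + " " + a.getValue());
    }
}
```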
The monitor approach to mutual exclusion is characterised by the fact that
locking mutually exclusive objects is not an independent operation. It is
always linked to another activity, such as executing a method or executing a
block of statements. That holds for the beginning of the mutual exclusion as
well as for its end: when a thread leaves a synchronized-block or -method, the
mutual exclusion implicitly ends. This avoids errors which can occur in a
style where the beginning and the end of the mutual exclusion have to be
specified separately, without any possibility of syntactic checking. The fact
that synchronized is always used with a block or method is important for
analysing concurrency failures involving mutual exclusion, because it makes it
possible to determine the block or method which established the mutual
exclusion.
The exception mechanism of the Java platform is integrated with its
synchronisation model, so that locks are released when synchronized
statements and invocations of synchronized methods complete abruptly. If a
thread holding a synchronized-lock encounters a runtime exception and
terminates, it releases its locks. This ensures that other threads can use
these locks, but when the locks are released, the consistency of the monitor
data is not guaranteed.
Nondeterminism also plays a role in the presence of mutual exclusion.
With mutual exclusion, only one of several threads is granted access.
However, the order in which contending threads are granted mutually exclusive
access is nondeterministic. The programmer must assume that any order can
happen.
Conditional Synchronisation
Sometimes synchronisation more complex than mutual exclusion is required.
This is the case when a concurrent activity needs a certain condition of its
environment to be fulfilled in order to continue its activities. For the
banking simulation we might think of an extension in which transfers are only
carried out if there is enough money to withdraw the amount. Otherwise, they
should not be aborted but delayed. This results in a number of activities
waiting for their completion, and we have to find a way to inform them when
the condition they are waiting for is fulfilled. Suspending and resuming
threads based on the evaluation of a condition is called conditional
synchronisation. Together with mutual exclusion, it makes up the monitor
concept.
Ideally, when mapping this concept to a programming language, as many
features of conditional synchronisation as possible should be automated, i.e.
keeping a list of waiting threads, watching for changes of a condition, and
finding out which threads need to be notified about these changes. Java does
provide some support for conditional synchronisation as part of its monitor
concept, but the programmer has to supply some information, such as when a
thread needs to wait and when it needs to be woken up.
The implementation of conditional synchronisation is not as straightforward
as that of mutual exclusion. The blocking on conditions and the signalling of
changed conditions have to be implemented by the programmer. For this
purpose, a simple mechanism is supplied with each Java object: each Java
object has a wait-queue for queuing threads which have to be blocked on a
condition. With the methods wait(), notify(), and notifyAll() of class
java.lang.Object, the wait-queue of the object can be manipulated. These
methods can only be called by a thread holding the lock of the object, i.e.
within a synchronized-region on that object; otherwise an
IllegalMonitorStateException is thrown. Note that these methods are
non-static. They are sometimes referred to as monitor-methods.
public final void wait() throws InterruptedException
public final void wait(long ms) throws InterruptedException
public final void notify()
public final void notifyAll()
Table 4.5: Signature of wait() and notify/All()
•When a thread calls wait() it changes its state from running to waiting.
The thread is inserted into the wait-queue and releases its lock. The
queue is in fact only a set, because Java does not guarantee anything
about the order in which the threads are enqueued or dequeued.
• Apart from interrupts (discussed below) and the expiry of the timeout of
wait(long ms), threads can only be removed from this queue by notifications
on the same object, which are issued by another thread calling notify() or
notifyAll(). Using notify() removes only one thread from the queue; no
assumptions can be made about which thread will be removed. A call to
notifyAll() removes all threads from the queue. If notifications are issued
while no thread is waiting, they are not kept for threads which call wait()
later on. After being removed from the queue, the thread returns from wait(),
but it has to re-obtain the synchronized-lock before it can proceed. Both
releasing the lock during wait() and re-obtaining it afterwards take
re-entrance into account, i.e. the lock is fully released and re-obtained as
many times as it had been acquired before.
As in the states sleeping and joining, a call of interrupt() in the state waiting causes an exception. Calls to wait() therefore have to handle the InterruptedException declared in the signature of wait(), typically in a try-catch block. As explained earlier, the interrupt mechanism raises an exception if a thread has been interrupted, its interrupt flag has not yet been cleared, and it is in an inactive state such as waiting or sleeping. After having called wait(), the thread is in such a state. An InterruptedException is then thrown in two cases: either the interrupt was issued before the thread called wait(), in which case the exception is thrown immediately, or the interrupt is issued while the thread is waiting, in which case the thread leaves the waiting state by throwing the exception. It is common practice to provide an empty exception handler for the InterruptedException. When the thread returns from wait() because it was interrupted, it is removed from the wait-queue and, before the exception is thrown, it has to re-obtain the synchronized-lock. If an interrupt and a notification happen almost simultaneously, the thread may leave wait() either way.
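A minimal sketch of the two cases in which wait() throws an InterruptedException, using only java.lang.Thread and java.lang.Object; the class name InterruptedWait is hypothetical. The first method interrupts the current thread before calling wait(); the second interrupts a thread that is already waiting and records, inside the handler, that the lock is held again.

```java
public class InterruptedWait {
    private static final Object LOCK = new Object();

    /** Case 1: returns true if a pending interrupt makes wait() throw at once and clears the flag. */
    public static boolean interruptBeforeWait() {
        Thread.currentThread().interrupt();             // interrupt issued before wait()
        synchronized (LOCK) {
            try {
                LOCK.wait();                            // would block forever without the interrupt
                return false;
            } catch (InterruptedException e) {
                return !Thread.currentThread().isInterrupted();  // flag is cleared on throw
            }
        }
    }

    /** Case 2: returns true if a thread interrupted while waiting holds the lock in its handler. */
    public static boolean interruptWhileWaiting() {
        final boolean[] holdsLockInHandler = {false};
        Thread waiter = new Thread(() -> {
            synchronized (LOCK) {
                try {
                    while (true) {
                        LOCK.wait();                    // only an interrupt ends this loop
                    }
                } catch (InterruptedException e) {
                    holdsLockInHandler[0] = Thread.holdsLock(LOCK);
                }
            }
        });
        waiter.start();
        try {
            while (waiter.getState() != Thread.State.WAITING) {
                Thread.sleep(1);
            }
            waiter.interrupt();                         // interrupt issued while waiting
            waiter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return holdsLockInHandler[0];
    }

    public static void main(String[] args) {
        System.out.println(interruptBeforeWait());      // true
        System.out.println(interruptWhileWaiting());    // true
    }
}
```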
The purpose of wait() and notify(), notifyAll() is to be used for conditional
synchronisation in the following way.
1. A thread which has obtained the synchronized-lock tests whether the condition necessary for its activities holds.
2. If the condition does not hold, the thread calls wait(), which puts it in the wait-queue.
3. When it is woken up by another thread through notify() or notifyAll(), it first has to re-obtain the synchronized-lock.
4. Then it re-tests the condition before anything else happens. This is implemented by a while-loop.
5. If the condition holds, the thread performs its activities.
6. After this, other threads can be notified.
7. It is important that calls to wait() be balanced by notification calls.
The account example is mapped to Java as follows. We need to program an account object which blocks threads trying to withdraw money if the resulting balance would not stay above a limit. Withdrawing threads are conditionally blocked. Transferring threads always increase the balance and therefore signal to waiting threads that the condition has changed.
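public class LimitedAccount extends Account {
    // A withdrawal must leave more than LIMIT on the account.
    // Assumes Account declares a constructor Account(int, long), a field
    // value accessible to subclasses, and synchronized withdraw()/transfer().
    private static final long LIMIT = 10000;

    public LimitedAccount(int n, long v) {
        super(n, v);
    }

    public synchronized void withdraw(long amount) {
        // Re-test the condition after every wake-up (while, not if).
        while (value - amount <= LIMIT) {
            try {
                wait();
            } catch (InterruptedException ie) {
                // empty handler, as is common practice (see above)
            }
        }
        super.withdraw(amount);
    }

    public synchronized void transfer(Account other, long amount) {
        super.transfer(other, amount);
        notifyAll(); // the balance has increased: waiting withdrawers re-test
    }
}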
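Because Account is not shown in full here, the following self-contained variant can be compiled and run on its own; the class GuardedAccount and its deposit() method are hypothetical stand-ins for LimitedAccount and transfer(). It follows steps 1 to 6 of the protocol above: withdraw() tests and re-tests the condition in a while-loop around wait(), and deposit() notifies all waiting threads after changing the balance.

```java
public class GuardedAccount {
    private static final long LIMIT = 10000;     // balance must stay above LIMIT
    private long value;

    public GuardedAccount(long value) {
        this.value = value;
    }

    public synchronized void withdraw(long amount) throws InterruptedException {
        while (value - amount <= LIMIT) {        // test and re-test the condition
            wait();                              // releases the lock while waiting
        }
        value -= amount;                         // condition holds: perform the activity
    }

    public synchronized void deposit(long amount) {
        value += amount;
        notifyAll();                             // the condition may have changed for waiters
    }

    public synchronized long getValue() {
        return value;
    }

    /** Starts a withdrawal that must block, then deposits enough money; returns the final balance. */
    public static long demo() throws InterruptedException {
        GuardedAccount account = new GuardedAccount(10200);
        Thread withdrawer = new Thread(() -> {
            try {
                account.withdraw(500);           // 10200 - 500 <= LIMIT: blocks
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        withdrawer.start();
        while (withdrawer.getState() != Thread.State.WAITING) {
            Thread.sleep(1);                     // until the withdrawer is in the wait-queue
        }
        account.deposit(1000);                   // 11200 - 500 > LIMIT: withdrawal can proceed
        withdrawer.join();
        return account.getValue();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo());              // 10700
    }
}
```

Calling demo() starts a withdrawal of 500 from a balance of 10200, which blocks because 9700 is not above the limit; after a deposit of 1000 the withdrawal completes and the balance is 10700.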
Another solution for the notification would have been to call notify() in transfer() to wake up only one thread. This thread would then have to invoke notify() to wake up the next thread, because it could be the case that the limit is still not reached. This solution wakes up only one thread at a time to avoid unnecessary wake-ups. It is, however, still unfair with respect to newly arriving threads contending for the lock.
A call to notify() or notifyAll() can only wake up threads which are currently in the wait-queue. If there is no thread in the queue, the notify has no effect. Also, if a thread enters the wait-queue after an unused notify, it is not woken up. That is, a notification cannot be saved for later use.
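Since a notification is not saved, a common remedy is to record the signalled event in a field guarded by the same lock and to wait only while that field says nothing has happened yet. The sketch below uses the hypothetical name SavedSignal (java.util.concurrent.CountDownLatch with a count of one offers comparable behaviour). It shows that a thread calling await() after signal() proceeds immediately, whereas a plain wait() at that point would block until the next notification, possibly forever.

```java
public class SavedSignal {
    private boolean signalled = false;           // guarded by this; survives the notification

    public synchronized void signal() {
        signalled = true;
        notifyAll();                             // only reaches threads already waiting
    }

    public synchronized void await() throws InterruptedException {
        while (!signalled) {                     // a bare wait() here would miss an earlier signal()
            wait();
        }
    }

    public synchronized boolean isSignalled() {
        return signalled;
    }

    public static void main(String[] args) throws InterruptedException {
        SavedSignal s = new SavedSignal();
        s.signal();                              // no thread waiting: notifyAll() has no effect
        s.await();                               // returns at once because the flag is set
        System.out.println("late waiter proceeded");
    }
}
```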
Note that, compared to the blocking involved in mutual exclusion, here the waiting thread is not automatically woken up. Instead, the waiting thread depends on the collaboration of other threads to be woken up. A compiler cannot check whether a corresponding notify() is properly encoded, because the notification does not have to be in the same method. It can be located in any code that synchronises on the same object, e.g. in other methods of the same class, in inherited code, or in other classes holding a reference to the object. Any notification could become effective at runtime.
Note that the semantics of notify()/notifyAll() in Java differ from the semantics of the analogous operations in a classical monitor by Brinch Hansen et al. [Lea97]. While the classical definition provides explicitly programmed wait-queues, possibly several of them (one for each condition), a Java monitor is an entity possessing one lock and a single wait-queue. Also, Java has adopted a so-called signal-and-continue semantics, which says that the notifying thread continues after the notify while the notified thread has to re-test the condition. The classical signal-and-urgent-wait semantics implies that the notifying thread is suspended and that the notified thread can continue without checking the condition. When the notified thread has left the monitor, the notifying thread regains the monitor before any other thread. The semantics adopted by Java fit better with the approach of using only one wait-queue: since a single wait-queue may hold threads waiting for different conditions, a notified thread cannot assume that its own condition holds and has to re-test it anyway.

Concept | Java Syntax | Section
Lifecycle control - Creation | Thread, Runnable | 4.2
Lifecycle control - Starting | start(), run() | 4.2
Lifecycle control - Termination | return from run(), exceptional termination | 4.2
Pausing | sleep() | 4.2
Priorities | setPriority() | 4.2
Unsynchronised Interaction - Sharing Data | shared objects | 4.3
Unsynchronised Interaction - Direct Interaction | interrupt() | 4.3
Synchronised Interaction - Joining | join() | 4.3
Synchronised Interaction - Mutual Exclusion | synchronized | 4.4
Synchronised Interaction - Conditional Synchronisation | wait(), notify() | 4.4
Table 4.6: Java Concurrency Concepts
Nondeterminism extends to conditional synchronisation of monitors. The order in which threads are put into the wait-queue and woken up from it is not predictable. Newly arriving withdraw()-calls can contend with withdraw()-calls from threads woken up from the queue; both kinds of threads contend for the synchronized-lock. The programmer cannot assume that threads are woken up in the same order in which they entered the wait-queue. If this is not desirable, the program has to implement its own synchronisation protocol accordingly.
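One way to impose FIFO order is sketched below, assuming a ticket discipline similar to a ticket lock; the name TicketMonitor is hypothetical. Each thread draws a ticket and waits until its number is served. Because notifyAll() wakes all waiters and each re-tests its own ticket, the order of entry no longer depends on the order of the wait-queue. For comparison, java.util.concurrent.locks.ReentrantLock offers a fairness option, through its ReentrantLock(boolean fair) constructor, that favours the longest-waiting thread.

```java
import java.util.ArrayList;
import java.util.List;

public class TicketMonitor {
    private long nextTicket = 0;    // next ticket to hand out
    private long nowServing = 0;    // ticket allowed to enter

    /** Draws a ticket and blocks until it is this thread's turn. */
    public synchronized long enter() throws InterruptedException {
        long myTicket = nextTicket++;
        while (myTicket != nowServing) {
            wait();                   // woken by notifyAll(), then re-test own ticket
        }
        return myTicket;
    }

    /** Leaves the critical section and admits the holder of the next ticket. */
    public synchronized void exit() {
        nowServing++;
        notifyAll();                  // all waiters wake up; only the next ticket proceeds
    }

    /** Runs n threads through the monitor and returns the tickets in order of entry. */
    public static List<Long> runThreads(int n) throws InterruptedException {
        TicketMonitor monitor = new TicketMonitor();
        List<Long> entryOrder = new ArrayList<>();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(() -> {
                try {
                    long ticket = monitor.enter();
                    synchronized (entryOrder) {
                        entryOrder.add(ticket);
                    }
                    monitor.exit();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return entryOrder;
    }
}
```

The sketch does not handle the interruption of a waiting thread: its ticket would never be served, and all threads holding later tickets would block.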
Advances in Java Concurrent Programming
As already noted, the Java concurrency features are sometimes regarded as insufficiently designed [Hol00, Lea03]. On the one hand, this has had the effect that libraries have been proposed which provide more advanced synchronisation elements such as locks [Lea03]. On the other hand, this has led to proposals for new language features, some of which follow the ideas of aspect-oriented programming [KLM+97] in order to separate synchronisation code from the remaining Java code, such as in [MW99].
4.5 Summary
In this chapter we have described multithreading in Java. Our presentation was organised according to the concepts provided by Java. Table 4.6 summarises the concepts we have introduced and the related Java language elements.
The next chapter will build on these concepts to describe which concurrent liveness failures are possible in Java. There, Table 4.6 will be extended with the failures associated with each concept.
Chapter 5
Liveness Failures and Potentials
in Java
The previous chapter laid the ground for the description of concurrent Java
liveness failures by introducing synchronised thread interaction in Java. It
also pointed out that concurrency failures are inherent in thread interaction.
In this chapter, we will describe this relationship in depth.
The term failure is related to the terminology for error. The term error is widely used and has different connotations. In order to define our notion of error precisely, we introduce a standard terminology for errors which relates an error to different activities and artifacts in software development, namely coding, program, and program execution.
Using this terminology, we will informally introduce liveness failures in
concurrent Java programs. In the next chapter, these failures will be defined
more formally.
In the presence of nondeterminism not only failures but also potentials
for failures are of interest because they can equally point to problems in the
program. We will therefore introduce the notion of a failure potential.
We will introduce the notion of liveness failure using the existing distinc-
tion between liveness and safety properties in concurrent programs. We will
elaborate a comprehensive list of conceivable liveness failures. We will focus on concurrency-related failures rather than on failures which also occur in sequential programming. For each failure we will discuss the symptoms, the failure state, and the fault in the code.
5.1 Terminology
In this section, we will present our terminology for describing concurrent
liveness failures in Java. It is based on standard terminology for errors and
on existing classifications for errors in concurrent programs.
5.1.1 Error, Mistake, Fault, and Failure
The IEEE Standard Glossary of Software Engineering Terminology [I3E90]
is a canonical source for software engineering terms and defines the notion
of error and related terms, given in Definition 5.1. This definition has not changed since the 1990 version, which is a revision of the 1983 version. Although not stated explicitly in the definition or its context, the term error and its refinements are related to software.
Definition 5.1 (Error) (1) The difference between a computed, observed,
or measured value or condition and the true, specified, or theoretically cor-
rect value or condition. For example, a difference of 30 meters between a
computed result and the correct result.
(2) An incorrect step, process, or data definition. For example, an incor-
rect instruction in a computer program.
(3) An incorrect result. For example, a computed result of 12 when the
correct result is 10.
(4) A human action that produces an incorrect result. For example, an
incorrect action on the part of the programmer or operator.
Note: While all four definitions are commonly used, one distinction as-
signs definition 1 to the word “error”, definition 2 to the word “fault”, defi-
nition 3 to the word “failure”, and definition 4 to the word “mistake”.
The essence of an error is that it depends on an assumption of what is expected to be correct. This is stated explicitly in definitions (1) and (3), but it also holds for (2) and (4).
can be a specification but also non-externalised knowledge of a human being.
For instance, in the case of testing, a so-called test oracle can provide the
expected results.
Definition (1) focuses on error as an error measure, i.e. the difference or
the deviation between expected and computed result. It is closely related to
definition (3), also termed failure, which applies to an incorrect value, result,
or state reached.
Definition (2), also termed fault, is used for incorrect instructions in a program. This is also the interpretation of fault used by [FR01], and we will adopt it in the following. In this we differ, e.g., from [GJM91], who use the term fault for all incorrect states during the execution of a program.
Figure 5.1: Error Terminology (Error refined into Mistake (4), Fault (2), and Failure (3) / Error (1))
Definition (3), also termed failure, focuses on incorrect results. We think
that the notion of result needs to be broadened. One reason is that programs
are not always intended to come to a defined end or to compute output
results. Therefore, the term failure should apply to any incorrect execution
states of a program (including output). An incorrect state is a state which
violates a specification. The error terminology is visualised in Fig. 5.1. The numbers in brackets refer to the numbers used in Definition 5.1.
The definitions (1) to (4) are related as follows. The product of a human
mistake can materialise in a fault, e.g. if the programmer’s wrong action is
inserting the wrong statement in a specific line of code. Evidence of a fault
in the code can come through failures at runtime, typically incorrect output
values, unexpected program termination, non-terminating or non-progressing
execution. The error can possibly be determined as a difference between the
failure result and the expected result. The above cause-effect relations are
depicted by arrows in Fig. 5.1.
It is often the case that the root cause of a failure can be traced to a
small but not necessarily coherent area of the program which contains the
fault. Sometimes program failures are indications of global problems such
as mistaken assumptions or inappropriate architectural decisions. In such
cases, it is misleading to assume that editing a small area of the program will
prove sufficient to correct a fault [FR01].
Note that this error terminology is independent of a particular program-
ming paradigm but it will be helpful in the following to describe concurrency
problems.
Example
We illustrate the use of the introduced terminology with the example of a deadlock. A deadlock is a program failure because it violates the desired specification that threads should be able to finish their computations.
Figure 5.2: Extending the Basic Terminology with Potential (Mistake (4), Fault (2), Failure (3) / Error (1), and Failure Potential)
An example of a deadlock has been given in Fig. 2.2. The reason for this
deadlock is a program fault, i.e. pieces of code which do not work together as
intended. Interestingly, there is no corresponding name for the fault related
to the deadlock. The code responsible for the example deadlock is the method
synchronized transfer(Account other, long amount).
public class Account {
//other members ...
public synchronized void transfer(Account other, long amount) {
other.withdraw(amount);
value+=amount;
}
}
The above method will always be called on the target account of a transfer
and will take the source account as an argument. Thus the order in which
accounts are locked is determined by the actual transfer only. This allows
for cyclic dependencies during locking of the involved account objects. This
fault in the code is a product of a human thought process. The programmer made a mistake by not constraining the order of synchronized access to account objects.
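A common way to remove this fault is to constrain the locking order globally, e.g. by always locking the account with the smaller account number first. The following sketch assumes a hypothetical OrderedAccount class with a unique number field and a static transfer(from, to, amount) method; it is not the Account class used above. Since both threads request the locks in the same order, no cyclic dependency can arise, so the two opposite transfer loops always terminate.

```java
public class OrderedAccount {
    private final int number;     // assumed unique; defines the global lock order
    private long value;           // guarded by this

    public OrderedAccount(int number, long value) {
        this.number = number;
        this.value = value;
    }

    /** Moves amount between the accounts, always locking the smaller number first. */
    public static void transfer(OrderedAccount from, OrderedAccount to, long amount) {
        OrderedAccount first = from.number <= to.number ? from : to;
        OrderedAccount second = (first == from) ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.value -= amount;
                to.value += amount;
            }
        }
    }

    public synchronized long getValue() {
        return value;
    }

    /** Runs opposite transfers in two threads; returns the final balances of accounts 1 and 2. */
    public static long[] runOppositeTransfers(int rounds) throws InterruptedException {
        OrderedAccount a1 = new OrderedAccount(1, 1000);
        OrderedAccount a2 = new OrderedAccount(2, 1000);
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < rounds; i++) transfer(a1, a2, 35);
        });
        Thread t3 = new Thread(() -> {
            for (int i = 0; i < rounds; i++) transfer(a2, a1, 76);
        });
        t2.start();
        t3.start();
        t2.join();                // would hang forever if the threads could deadlock
        t3.join();
        return new long[] {a1.getValue(), a2.getValue()};
    }
}
```

With 1000 rounds, account 1 ends with 1000 + 41 * 1000 = 42000 and account 2 with 1000 - 41 * 1000 = -40000, independent of the interleaving; the sketch does not check for overdrafts.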
5.1.2 Potential for Failure
We want to go beyond the IEEE classification by introducing the notion of
failure potential. Because of the nondeterminism in concurrent programs a
failure might not occur in every program execution, although the program
contains a fault. Nevertheless, in some cases, a failure-free execution can con-
tain information about the potential for failure. The detection of a potential
for a failure is nearly as valuable as a detected failure because it points to
the same fault.
Figure 5.3: Potential for Deadlock (sequence diagram with lifelines Thread-2, Thread-3, Account1, and Account2 and the messages synchronized transfer(Account1, 35), synchronized transfer(Account2, 76), synchronized withdraw(35), and synchronized withdraw(76))
The relation of the potential for a failure to the other terms introduced
is depicted in Fig. 5.2. In analogy to a failure, a potential failure has its origin in
a fault but it is not considered a failure, i.e. failures and failure potentials
represent different program executions. This will become clearer with the
following example.
Example
In the motivation, we have contrasted an execution leading into a deadlock
(see Fig. 2.2) with a correct execution alternative (see Fig. 2.1). We repeat
the latter figure for convenience in Fig. 5.3. The two concurrent threads lock
the same two resources, Account1 and Account2, without interfering. We can
observe that in this execution the two shared objects are locked in inverted
order, i.e. in opposite order. Should the accesses to the two accounts overlap in time, a deadlock may occur. Thus the inverted order serves as a warning for a deadlock and is considered a potential for a deadlock (see Fig. 5.3).
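The inverted locking order can be recognised from a trace of lock events alone, even if the execution does not deadlock. The following sketch is a deliberately simplified, generic lock-order check; the names LockOrderChecker and LockEvent are hypothetical, and this is not the detection algorithm developed later in this thesis. For every acquisition it records which locks the thread already holds, and it reports a pair of locks if two different threads have taken them in opposite orders.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LockOrderChecker {

    /** One trace entry: a thread acquires (acquire == true) or releases a lock. */
    public static final class LockEvent {
        final String thread;
        final String lock;
        final boolean acquire;

        public LockEvent(String thread, String lock, boolean acquire) {
            this.thread = thread;
            this.lock = lock;
            this.acquire = acquire;
        }
    }

    /** Returns pairs of locks taken in opposite orders by different threads. */
    public static List<String> invertedPairs(List<LockEvent> trace) {
        Map<String, Deque<String>> held = new HashMap<>();        // locks held per thread
        Map<List<String>, Set<String>> order = new HashMap<>();   // [outer, inner] -> threads
        for (LockEvent e : trace) {
            Deque<String> stack = held.computeIfAbsent(e.thread, t -> new ArrayDeque<>());
            if (e.acquire) {
                for (String outer : stack) {
                    if (!outer.equals(e.lock)) {                  // ignore re-entrant acquisition
                        order.computeIfAbsent(List.of(outer, e.lock), k -> new HashSet<>()).add(e.thread);
                    }
                }
                stack.push(e.lock);
            } else {
                stack.remove(e.lock);                             // drops the innermost entry of this lock
            }
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<List<String>, Set<String>> entry : order.entrySet()) {
            String a = entry.getKey().get(0);
            String b = entry.getKey().get(1);
            Set<String> reverse = order.get(List.of(b, a));
            if (a.compareTo(b) < 0 && reverse != null && byDifferentThreads(entry.getValue(), reverse)) {
                result.add(a + " <-> " + b);                      // report each pair once
            }
        }
        return result;
    }

    private static boolean byDifferentThreads(Set<String> x, Set<String> y) {
        for (String s : x) {
            for (String t : y) {
                if (!s.equals(t)) return true;
            }
        }
        return false;
    }
}
```

For a hypothetical trace modelled on Fig. 5.3, in which Thread-2 locks Account2 and then Account1 and Thread-3 later locks Account1 and then Account2, the check reports the pair Account1 <-> Account2, although the two transfers never overlap.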
Deriving Potentials from Failures
In the remainder we discuss how potentials for failures can be characterised.
Characteristics for the potential of a failure can be derived from the characteristics of a failure. A failure is detected when all of its characteristic conditions hold. There can be failure-free executions in which some of the conditions hold. Depending on which conditions hold, one can take this as an indication that the concrete program bears the potential for a certain failure. Not all conditions are suitable to serve in the definition of a potential. Failure conditions can be classified into those which are fulfilled a priori by the language or the system, those which are fulfilled by most programs, or most program states or executions, and those which hold only for specific program states or executions. If there is a set of conditions which hold only in specific states or executions, a subset or a set of weaker conditions may serve as candidate conditions to describe a potential. Also, this subset or weaker set should allow one to point to a fault in the program. The idea of candidate conditions for potential failure among failure conditions is depicted in Fig. 5.4.
Figure 5.4: Deriving Potential Conditions from Failure Conditions (the failure conditions c1, ..., cn, grouped into conditions fulfilled a priori, conditions fulfilled by most executions, and conditions fulfilled by few executions, with the candidates for potential conditions marked)
Technically, candidates are defined as follows.
Definition 5.2 (Failure Potential Candidate) Given the conditions of a
specific type of a failure we can derive candidate conditions for the potential
of this failure. From the conditions of the failure, a subset of conditions has
to be chosen such that the conditions in the subset are not fulfilled a priori
by the language or the system, i.e. they are not fulfilled by each and every
program execution.
From this set of conditions, either subsets, or sets of conditions weaker than the original conditions, or a mix of both can serve as potential conditions. The set must not be empty, and weaker conditions must not evaluate to true in all cases.
While this is a very technical view, in reality not all possible candidates are useful. Some candidates may identify too many executions as potential failures. The choice of such conditions depends on the concrete failure and will be illustrated by an example in the following. Of course, it may be impossible to find such conditions.
Definition 5.2 explains why failures and potentials are disjoint, i.e. why a potential is not a failure: a potential fulfils fewer conditions than a failure. This definition will be used in later sections to establish the potentials corresponding to Java liveness failures.
5.1.3 Failure and Symptom
A failure will only surface and get noticed through its symptoms, for instance
incorrect or missing output. The term symptom depends on a human per-
spective, i.e. how much the programmer knows about what is supposed to
happen and how much the programmer is aware of what is actually happen-
ing.
In the easiest case, the symptoms directly refer to the failure state. In
other, more common cases, the programmer will make an observation, e.g.
performance loss, but does not know what the failure is. The mapping of
symptoms to the failures causing them is not one-to-one in concurrent programming, and success in identifying failures correctly depends largely on experience. Surprisingly, this experience is rarely documented. It is especially
rare for concurrent programming, where failures are particularly difficult to
track down. Often, authors do not discuss symptoms but prefer to provide
lists of typical faults, e.g. for Java [Eng99]. Even books on debugging do
not teach the heuristics but present tools [ZK00]. A classification of symp-
toms is rare as is a classification of symptom-failure relations. An example is
[Lew03a], who classifies symptom-failure relations according to their trace-
ability, i.e. whether the symptoms can be traced back to failures or not by a
direct cause-effect chain. This classification is independent of concurrency errors, i.e. it is applicable to all kinds of errors.
There are symptoms which are characterised by the presence of a wrong
result or state. This gives a concrete handle to a problem, from where one
can try to trace back the cause of the problem. Whether the handle can be
successfully used or not depends on the right tools. A classical debugger offers no means of going back arbitrarily in the execution.
There are symptoms which are characterised by the absence of a state or
result, instead, the missing of the result or the state is the problem. Then,
there is no handle from which to start. Also, the first variant might be
reduced to the second variant if the wrong result is eventually caused by the
lack of some behaviour.
Liveness failures tend to be part of the second category, absence, and
are therefore difficult to track in the way described above. Although we take a different approach to classifying liveness failures in the following, our ideas were strongly influenced by [Lew03a] and a memorable presentation of [Lew03b].
5.2 Liveness Failures
In the remainder of this chapter we will informally introduce liveness failures
in Java. In order to state precisely what we understand by liveness failures
we will introduce the classical distinction between liveness and safety failures.
We will briefly introduce race conditions, a notion related to concurrency
failures.
5.2.1 Liveness and Safety
In this section, we will introduce the main classification for failures in con-
current programs. As we noted in the definition of a failure, one can only
speak of failures if one can compare against a specification and if that spec-
ification is not met. To be able to classify concurrency failures we introduce
two desirable properties of a program: liveness and safety [Lam77, OL82]. A
property is an attribute of a program that is true for every possible execution
of that program.
Definition 5.3 (Liveness) The liveness property means that a program even-
tually reaches a good state. In sequential programming, reaching a good state
refers to termination which is not always applicable as modern sequential
programs can run for a long time or forever, e.g. a server process or an
embedded program. In a concurrent program, it means that something eventually happens in an activity, i.e. each activity eventually makes progress until it comes to a defined end, if any; e.g. activities are eventually granted the resources they are waiting for.
Definition 5.4 (Safety) The safety property states that a program never reaches a bad state. More formally, the property says that all objects are in a consistent state as specified, e.g. by invariants. Concurrent safety means that data is never corrupted by contending threads.
Both properties can be violated by sequential programs (inconsistent
data, non-terminating execution) as well as by concurrent programs. In
both cases, the violation can be avoided by the programmer. The violation
will be observable only while the program is executing. Therefore, we speak
of safety and liveness failures when these properties are violated. Following
our general error terminology, it is possible to speak of potentials for liveness
failures and of potentials for safety failures, but the potentials of a failure themselves never violate these properties. We depict the integration of
safety and liveness failures into the basic terminology of errors in Fig. 5.5.
Figure 5.5: Safety and Liveness Terminology (Error refined into Mistake, Fault, and Failure; Failure refined into Safety and Liveness failures)
An example of the violation of a liveness property is the deadlock. Note that a deadlock is sometimes classified as a safety violation because it is considered a bad state [MK99]. We prefer to view it as a liveness failure because this view stresses the effect of the failure, namely that the threads involved in the deadlock fail to make progress, as well as where the reasons for this failure lie, and because the deadlock has much in common with other liveness failures.
The example of the deadlock also illustrates that liveness failures occur in legal Java programs, i.e. programs that can be compiled and executed. Either some or all executions contain the deadlock, although the programmer did not intend this, but the way in which the programmer used Java for writing a specific program allowed this to happen. Therefore, it is also said that liveness failures are inherent in Java, i.e. they have their origin in the programming language itself. The same holds for safety failures.
Race Condition
While liveness problems are due to the various mechanisms for thread syn-
chronisation, at the core of safety problems is the unsynchronised access of
data across different threads. As this can happen at any time and anywhere in a concurrent program, one tries to tackle this problem using formal methods. The potential for safety problems in concurrent programs is identified by the notion of a race condition.
An unexpected or unintended execution order leading to wrong results
based on unsynchronised access to variables is also termed race condition. A
race condition, also termed a data race, is given, if two threads are accessing
the same shared variable or memory location in an unsynchronised way and
at least one thread modifies the variable. Thereby, a data race is the source
for a safety violation. A race condition is anomalous behaviour caused by
the unexpected dependency on the relative timing of events. In other words,
a programmer incorrectly assumed that a particular action would always
happen before another. The following definition for a data race in Java is
taken from [CLL+02].
Definition 5.5 (Data race) A data race is defined as two memory accesses
which satisfy the following four conditions [CLL+02]:
(1) the two accesses are to the same memory location (i.e. the same field
in the same object) and at least one of the accesses is a write operation;
(2) the two accesses are executed by different threads;
(3) the two accesses are not guarded by a common synchronisation object
(lock); and
(4) there is no execution ordering enforced between the two accesses, for
example by thread start or join operations.
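The following sketch, with hypothetical class names, shows a data race in the sense of Definition 5.5. In RacyCounter two threads write the same field without a common lock and without an ordering between the accesses, so increments may be lost. In SafeCounter both threads synchronise on the same object, so condition (3) no longer holds and the race disappears. The exact racy result may vary from run to run, which is precisely the nondeterminism discussed here.

```java
public class DataRaceDemo {

    /** Unguarded shared field: concurrent increments form a data race. */
    static final class RacyCounter {
        int count;
        void increment() { count++; }            // read, add, write: not atomic
    }

    /** Both threads synchronise on the same counter object. */
    static final class SafeCounter {
        private int count;
        synchronized void increment() { count++; }
        synchronized int get() { return count; }
    }

    private static void runInTwoThreads(Runnable body) {
        Thread a = new Thread(body);
        Thread b = new Thread(body);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** May return less than 2 * perThread because updates can be lost. */
    public static int racyTotal(int perThread) {
        RacyCounter c = new RacyCounter();
        runInTwoThreads(() -> { for (int i = 0; i < perThread; i++) c.increment(); });
        return c.count;                          // read after join(), so the final value is visible
    }

    /** Always returns 2 * perThread. */
    public static int safeTotal(int perThread) {
        SafeCounter c = new SafeCounter();
        runInTwoThreads(() -> { for (int i = 0; i < perThread; i++) c.increment(); });
        return c.get();
    }

    public static void main(String[] args) {
        System.out.println("racy: " + racyTotal(1_000_000));   // varies between runs
        System.out.println("safe: " + safeTotal(1_000_000));   // 2000000
    }
}
```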
Data races are not the only source of nondeterminism. Races are also
involved in competing for access to a synchronised, i.e. mutually exclusive,
object. Depending on the order in which threads are granted access, the
subsequent behaviour of a program might change.
Safety problems can be analysed by analysing race conditions. Analysis of race conditions is a hard problem. Therefore, the main goal of research in the area of concurrent safety is to improve algorithms for detecting race conditions. Hence, safety failure detection is quite well supported.
The situation for liveness failures is different, because the general condi-
tions for these failures have not yet been established for Java. Typically, only
the deadlock is well supported. Therefore, we will focus on liveness failures
in the following.
5.2.2 Concurrent Java Liveness Failures
In the following, we will informally present a list of concurrent liveness failures
in Java. Neither literature nor tools cover all concurrent Java liveness failures.
The original source for Java provides such material neither in its books [GJS97] nor on the web. We have consulted a range of books on Java concurrent programming [Lea97, Lea00, Mit00, DD00], online material [Hol98], and also the state-of-the-art tools JBuilder, JDeveloper, JProbe, and Jinsight [JBd01, JDv01, Jin, JPr00]. We found that the literature about Java in general does not address concurrency failures in great detail, nor do the tools provide support for failures other than the deadlock and the deadlock potential.
Here, we will draw on these references to give a comprehensive presentation of concurrent liveness failures. We complement the failures found in the literature with those we found missing. Our approach to determining additional failures is based on the idea that the number of synchronisation primitives in Java is limited, so one can informally check which combinations of primitives are feasible or produce liveness failures. While this is carried out in this chapter on an intuitive level, we will later show how this can be done more formally by testing combinations of thread states in our formalisation.
We will use UML sequence diagrams to illustrate these failures. For each
failure we will describe the related fault. The symptoms of liveness failures are always the same: the threads involved fail to perform intended computations.
F1 Deadlock
Figure 5.6: Deadlock (sequence diagram with lifelines Thread-2, Thread-3, Account1, and Account2 and the messages synchronized transfer(Account1, 35), synchronized transfer(Account2, 76), synchronized withdraw(35), and synchronized withdraw(76))
The deadlock has already been discussed extensively in Chapter 2. Here we give its precise definition.
Definition 5.6 (Deadlock) A deadlock is a state in a program where two
or more threads or processes are mutually blocking each other and can never
make progress. A deadlock is described with three necessary conditions (1.-3.)
and one sufficient condition (4.) [CES71].
1. Serially reusable resources, i.e. resources are mutually exclusive.
2. Incremental acquisition of resources.
3. No preemption of resources.
4. Circular chain of processes such that each process holds a resource which
its successor in the cycle is waiting to acquire.
This situation cannot be resolved by the threads involved. A deadlock with
two threads has also been called a deadly embrace.
Definition 5.6 is language and system independent. It is also applicable to Java. Java is deadlock-prone because it fulfils the three necessary conditions. Any resource is in general reusable. A resource is made serially usable through the keyword synchronized. Synchronised resources can be acquired incrementally in Java. Locks are not preempted, i.e. a thread cannot be forced to release a lock it holds. The fourth condition can be fulfilled by a certain execution. For instance, the sequence diagram in Fig. 5.6 describes an execution that fulfils all four conditions and hence describes a deadlock.
The fault leading into the deadlock has also been addressed already. It
is source code which does not impose order on locking when more than one
lock is shared between different threads.
F2 Wait-induced Deadlock
Even when ordering on locks is imposed, the use of wait() may
change the ordering as in the following example. This may lead
into a wait-induced deadlock [Hol98]. The deadlock itself fulfills
definition 5.6.
As an example look at the following code fragments:
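// Assumed shared lock objects: final Object A = new Object(), B = new Object();

//Thread 1:
synchronized (A) {
    synchronized (B) {
        //...
        try {
            A.wait();   // releases only A; B remains locked
        } catch (InterruptedException e) {
            // handling omitted
        }
        //...
    }
}

//Thread 2:
synchronized (A) {
    synchronized (B) {
        //...
    }
}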
Based on the above code, a wait-induced deadlock can occur in the fol-
lowing way:
•Thread 1 acquires both locks, enters wait() and releases A as a side effect of waiting.
•Thread 2 activates, acquires A, but cannot get B because Thread 1 has already locked it.
•Thread 1 is notified and, inside wait(), tries to re-acquire A but cannot get it because Thread 2 has already locked it.
The execution is deadlocked. The main issue is that wait() hides the fact that the order of acquisition is not the same in both threads. Thread 1 holds the lock on B, but releases the lock on A (when it starts to wait). It re-acquires the lock on A when wait() returns, but the order of acquisition is effectively B, A. Thread 2, however, acquires the two locks in the order A, B. Note that this is not a new kind of failure but a different kind of fault producing it.
F3 Indefinite Blocking
A thread is blocked having called synchronized but is never granted
the lock.
This can even be the case for several threads blocked on the same lock. Here, however, the threads are not involved in circular dependencies as in the deadlock. Rather, another thread holds the lock but does not intend to release it. Note that it cannot be decided in general whether a lock will ever be released by a thread. This situation is also called lock starvation.
By indefinite blocking we do not refer to a similar situation where a thread
having called wait is also blocked waiting for a notification signal which will
never be sent.
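At runtime, a thread blocked in this way can be observed with the standard monitoring API in java.lang.management. The sketch below is a hypothetical probe, not part of the tool described in this thesis: it polls Thread.getState() until the thread is BLOCKED and then asks the ThreadMXBean for the owner of the contended lock. A thread inside wait() would report WAITING or TIMED_WAITING instead, which matches the distinction drawn above. Whether the owner will ever release the lock remains undecidable; the probe only reports the current dependency.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class BlockedThreadProbe {
    // Polls until t is BLOCKED on a monitor and returns the name of the lock
    // owner, or null if t did not block within timeoutMillis.
    static String lockOwnerOf(Thread t, long timeoutMillis) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (System.nanoTime() - deadline < 0) {
            if (t.getState() == Thread.State.BLOCKED) {
                ThreadInfo info = mx.getThreadInfo(t.getId());
                return info == null ? null : info.getLockOwnerName();
            }
            Thread.sleep(10);
        }
        return null;
    }

    // The current thread holds a lock while a second thread tries to enter it.
    static String demo() {
        Object lock = new Object();
        Thread contender = new Thread(() -> {
            synchronized (lock) {
                // nothing to do; entering the monitor is the point
            }
        }, "contender");
        String owner = null;
        try {
            synchronized (lock) {
                contender.start();                    // contender blocks on entry
                owner = lockOwnerOf(contender, 2000);
            }                                         // lock released here
            contender.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return owner;
    }
}
```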
F4 Missed Notification
Figure 5.7: Missed Notification (sequence diagram with lifelines :Thread, :Object and :Thread; messages synchronized do(), wait() and notify())
A thread cannot be removed from the wait-queue, because it
entered the wait-queue after the corresponding notify()/notifyAll()-
call was issued by another thread [Lea00].
While the deadlock only involves locking, i.e. the use of synchronized, there is a range of failures based on conditional synchronisation, which relies on the methods wait(), notify() and notifyAll(). A thread tests a condition and, if it does not hold, issues a wait(). The thread assumes that there are other threads which issue a notify() or notifyAll() when the condition has changed. Moreover, one can assume a programming style which uses notifications efficiently, i.e. notifyAll() is not used if only one thread should be removed from the queue, and each removed thread triggers the removal of the next waiting thread.
A sensible use of these mechanisms balances wait()- and notify()-calls at runtime: a wait() in one thread is at some time in the execution followed by a notify()/notifyAll() in another thread. Several wait()-calls from different threads are followed by several notify()-calls from one or more other threads or by a notifyAll() from another thread. We depict this principle in Fig. 5.8. If this equilibrium is disturbed, a liveness problem threatens because threads are not properly woken up.
Figure 5.8: Balancing wait() and notify() (sequence diagram with lifelines :Thread, :Object and :Thread; messages synchronized do(), wait() and notify())
Note that this failure describes a state of a program, namely a waiting thread. But to identify this state as a failure, one has to look at the execution history. An example is given in Fig. 5.7. In the upper half of the sequence diagram, the wait()-call within the left thread is balanced by a notify() from the right thread. In the lower half, the notify() takes place before the wait() and is therefore missed by the left thread. Also, no timed wait() is used, i.e. the threads are not removed from the queue simply by a timeout. The situation could also be extended to the use of notifyAll().
The fault behind this failure is that the programmer has made the (wrong) assumption that the notify() always takes place after the wait(), and has not included any synchronisation to make sure that each notify() takes place after a wait().
Instead of a notification, the thread could also have missed an interrupt(). This points to the fact that the above heuristic deals with only one way of waking up the thread, although the other one is possible as well. Also, if a timed wait() is used, we must consider that threads are woken up simply by a timeout.
The balancing is only a heuristic based on the program’s history. How-
ever, one could also assume that such a notification is still to come no matter
what the history looks like. It is not obvious how long a thread is supposed
to be waiting. A timer could be used to trigger the detection.
Note that the situation described here differs from the idea of a potential. While a potential is never a failure itself, the above situation is classified as a failure by a heuristic, although it might be a false positive.
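A common way to avoid the fault described above, within the programming style assumed here, is to guard the wait() with a condition flag that records that the notification has happened, so that a notify() issued before the wait() is not lost. The class and method names below are hypothetical; the sketch shows the guarded-wait idiom, not a detection technique.

```java
class GuardedSignal {
    private boolean signalled = false;   // the condition the waiter tests

    synchronized void await() throws InterruptedException {
        // The condition is tested before waiting: a notification issued
        // earlier has already set the flag, so it cannot be missed.
        while (!signalled) {
            wait();
        }
    }

    synchronized void signal() {
        signalled = true;
        notifyAll();
    }

    // Notifies first, then starts the waiter; returns true if the waiter
    // finished within two seconds instead of waiting forever.
    static boolean notifyBeforeWait() {
        GuardedSignal s = new GuardedSignal();
        s.signal();                                   // "early" notification
        Thread waiter = new Thread(() -> {
            try {
                s.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        waiter.start();
        try {
            waiter.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !waiter.isAlive();
    }
}
```

With a bare wait() instead of the while loop, a waiter started after signal() would wait forever, which corresponds to the lower half of Fig. 5.7.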
F5 Missing Notification
The execution history of the program is such that there has never
been a notification for the object where the thread is waiting, even
though other threads might have been accessing the object [Lea00].
This situation assumes the same programming style as the previous one, and in particular that no timeouts are used in a wait(). It identifies a situation where a thread keeps waiting without being notified. Only an interrupt() could change the situation, but we do not know if this will happen.
The fault behind this failure is that no thread can issue a notification, either because there is no code containing notify()/notifyAll(), or because such code is never executed, or is not executed in this program run because certain conditions are not met. Here, too, an interrupt or a timeout could wake up the thread.
This situation could involve more than one waiting thread, e.g. a second thread could call wait() on the same object. Note that in the absence of the programming convention it cannot be decided in general whether a notification will be sent by another thread.
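On a recorded trace, the condition of F5 can be checked directly. The sketch below assumes a hypothetical event format (WAIT, WAKE, NOTIFY and NOTIFY_ALL, each with a thread and an object name) and a recent JDK; it reports threads that are still waiting at the end of the trace on an object for which no notification was ever recorded. Like the definition, it cannot account for interrupts or notifications that may still come.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class MissingNotificationCheck {
    // Hypothetical trace event; kind is WAIT, WAKE, NOTIFY or NOTIFY_ALL.
    record Event(String kind, String thread, String object) {}

    // Threads still waiting at the end of the trace on never-notified objects.
    static Set<String> missing(List<Event> trace) {
        Map<String, String> waitingOn = new HashMap<>();   // thread -> object
        Set<String> notified = new HashSet<>();            // objects ever notified
        for (Event e : trace) {
            switch (e.kind()) {
                case "WAIT" -> waitingOn.put(e.thread(), e.object());
                case "WAKE" -> waitingOn.remove(e.thread());
                case "NOTIFY", "NOTIFY_ALL" -> notified.add(e.object());
                default -> throw new IllegalArgumentException("unknown event " + e.kind());
            }
        }
        Set<String> result = new TreeSet<>();
        waitingOn.forEach((thread, object) -> {
            if (!notified.contains(object)) {
                result.add(thread);
            }
        });
        return result;
    }
}
```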
F6 Nested Monitor Lockout
A thread waiting in the queue of an object no longer holds the lock for that object, but it can hold locks on other objects. We assume that the wait is not a timed wait but that the thread depends on other threads to wake it up. This can lead to a situation where any other thread which is supposed to wake up this thread can only access the object at which the thread is waiting via at least one of the locked objects. The locked object is also called the outer object. The object where the thread is waiting is also called the inner object. Because the outer object is locked, no other thread can access the inner object [Lea00].
Figure 5.9: Nested Monitor Lockout (sequence diagram with two :Thread and two :Object lifelines; messages synchronized m2(), synchronized m1() and wait())
This situation is depicted in Figure 5.9. The thread on the left obtained a lock on the left object before accessing the right object. Calling wait() on the right object releases the lock on that object but not the lock on the left object. The second thread becomes blocked when accessing the left object.
Only interrupting the first thread could change the situation but we do not
know if this will happen. Interrupting is feasible, because the first thread
can re-obtain the lock for the inner object.
The fault in the code is that the thread keeps the outer lock while waiting. The question to be answered is how we can be sure that there is no other way to access the inner object than through the outer lock. Generally, this question can only be answered by looking at the source code. Access relations cannot be computed efficiently because of aliasing problems. A further hint at a problematic situation could be that other threads start being blocked at the object which is locked by the waiting thread, but this is no guarantee that the thread is waiting forever.
This situation bears similarities with the deadlock because of the circular
waiting involved. The first thread is waiting for a signal of the second thread
while the second thread is waiting for a resource from the first thread. Both
threads will not send the signal or release the resource before the desired
state change takes place.
Note that in a similar situation where join() is called instead of wait(), the thread is not blocked forever, provided the joined thread does not need the lock (see F9). Here, a thread obtains a lock and then calls join() on a different thread. Thus it keeps the first lock but will return when the joined thread has terminated.
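The hint mentioned above can be checked on a snapshot of thread states. The following sketch assumes a hypothetical snapshot record per thread (locks held, object waited on, object blocked on) and a recent JDK; it flags a waiting thread that holds a lock other than the one it waits on while another thread is blocked on that lock. As discussed, such a finding is only a hint: it cannot prove that no other access path to the inner object exists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class NestedLockoutCheck {
    // Hypothetical snapshot of one thread: held locks, waited-on object,
    // blocked-on object (the latter two may be null).
    record ThreadSnapshot(String name, Set<String> held, String waitingOn, String blockedOn) {}

    static List<String> suspects(List<ThreadSnapshot> snapshot) {
        List<String> findings = new ArrayList<>();
        for (ThreadSnapshot w : snapshot) {
            if (w.waitingOn() == null) {
                continue;                                   // not waiting at all
            }
            for (String outer : w.held()) {
                if (outer.equals(w.waitingOn())) {
                    continue;                               // released by wait()
                }
                for (ThreadSnapshot t : snapshot) {
                    if (t != w && outer.equals(t.blockedOn())) {
                        findings.add(w.name() + " waits on " + w.waitingOn()
                                + " holding " + outer + "; " + t.name() + " blocked on " + outer);
                    }
                }
            }
        }
        return findings;
    }
}
```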
F7 Circular Join
Figure 5.10: Circular Join (sequence diagram: two :Thread lifelines, each calling join() on the other)
In a circular join, a first thread is waiting via a call to join() for
a second thread to terminate. The second thread is waiting via
join() for the first thread to terminate.
This situation is depicted in Figure 5.10. However, a thread not involved
in the circular join could interrupt() one of the two threads which would
cause the interrupted thread to return from joining immediately. The second
thread, still joining, would return once the first thread has terminated. Nevertheless, we consider the circular join a dangerous and unwanted situation because it does not make sense to have two threads joining each other and waiting for a third thread to interrupt() one of them. Note that an arbitrary number of threads could create a circular chain of joins.
On an abstract level, this situation is similar to a deadlock because it involves circular waiting. The fault in the code is that the targets of the join() calls allow mutual joining. Also, no timeout is used with join(), which is always a risk. A join() can be aborted through interrupting, but again, one does not know whether such an interrupt() will be called in another thread.
If a thread joins another thread which is calling wait() this could be a
failure if the other thread depends on the joining thread to be woken up.
Note that joining a thread which has not yet started or which is already
terminated could be a situation not intended by the developer but it is neither
a liveness nor a safety failure.
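Since a thread can join at most one other thread at a time, the join dependencies at a given moment form a graph with at most one outgoing edge per thread, and a circular join can be found by following these edges. The sketch below assumes a hypothetical map from each joining thread to its current join target; a self join (F8) shows up as a cycle of length one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class JoinCycleCheck {
    // joins maps each joining thread to the thread it is currently joining.
    // Returns the cycle reachable from start (e.g. [T1, T2]) or an empty list.
    static List<String> cycleFrom(String start, Map<String, String> joins) {
        List<String> path = new ArrayList<>();
        String current = start;
        while (current != null && !path.contains(current)) {
            path.add(current);
            current = joins.get(current);    // follow the single outgoing edge
        }
        if (current == null) {
            return List.of();                // chain ends in a running thread
        }
        return path.subList(path.indexOf(current), path.size());
    }
}
```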
F8 Self Join
Figure 5.11: Self Join (sequence diagram: a single :Thread calling join() on itself)
A thread joining itself [OW99] cannot return from joining. The only exception is that it is interrupted by another thread.
For an example, see Fig. 5.11, which depicts one thread joining itself and
thus not making progress.
The fault in the code is simply a wrong target of join(), or rather the code which computed the thread object on which join() is called.
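A timed join() illustrates why a self join cannot return normally: join(millis) only returns early when the target terminates, which the calling thread cannot do while it is waiting. The SafeJoin helper below is hypothetical and merely shows a guard against the wrong target described above; it is not part of the Java API.

```java
class SafeJoin {
    // Hypothetical helper: refuses to join the calling thread itself.
    static void join(Thread target, long millis) throws InterruptedException {
        if (target == Thread.currentThread()) {
            throw new IllegalArgumentException("self join: " + target.getName());
        }
        target.join(millis);
    }

    // A timed self join simply times out, because the thread is still alive.
    static long timedSelfJoinMillis(long millis) {
        long start = System.nanoTime();
        try {
            Thread.currentThread().join(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```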
F9 Join-induced Deadlock
Figure 5.12: Join-induced Deadlock (sequence diagram: the left :Thread calls synchronized m1() on an :Object and then join(); a second :Thread calls synchronized m1())
A first thread holding a synchronized-lock is joining a second thread and thus will be blocked until this thread has terminated. After the call to join(), the second thread tries to acquire the same synchronized-lock. This produces circular waiting: the first thread is waiting for the termination of the second thread before it can release the lock, and the second thread is waiting for the lock before it can terminate.
An example is depicted in Figure 5.12. The thread on the left obtains a lock and then joins the second thread. The second thread tries to obtain the lock, which is never released. Only interrupts and timeouts can remove threads from this situation.
The fault is that the joining thread holds at least one lock and that its target was not chosen carefully. The situation would not be different if the second thread tried to acquire the lock even before the first thread calls join(). The same situation also arises if the second thread is still in the wait queue of the object which is used as the lock: when it is notified, it has to re-acquire the lock, which is impossible. Unlike the wait-induced deadlock, this situation cannot be reduced to a normal deadlock.
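The following sketch reproduces the scenario of Fig. 5.12 in a terminating form, assuming that a timed join() is acceptable for illustration: the current thread joins a worker while holding the lock the worker needs. After the timeout, the worker is still alive because it is blocked on the lock; an untimed join() at that point would never return. The class name is hypothetical.

```java
class JoinWhileLocked {
    // Returns true if the worker was still alive (i.e. stuck on the lock)
    // when the timed join gave up.
    static boolean workerStuckWhileJoining(long millis) {
        Object lock = new Object();
        Thread worker = new Thread(() -> {
            synchronized (lock) {
                // the worker's actual work would be done here
            }
        });
        boolean stuck = false;
        try {
            synchronized (lock) {
                worker.start();
                worker.join(millis);          // untimed join() would never return
                stuck = worker.isAlive();     // worker cannot finish while we hold lock
            }
            worker.join();                    // lock released: worker can now finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return stuck;
    }
}
```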
F10 Livelock
A thread can be actively waiting, usually by looping, for a condition
to come true.
This is also called busy waiting. If, however, the condition never becomes true and the thread cannot make progress, the situation is called a livelock [Lea00]. An example is given in Fig. 5.13. In the loop, the thread chooses to sleep for some time before it tests the condition again.
Figure 5.13: Livelock (sequence diagram: a :Thread calls check() on an :Object within the loop while (condition == false) try {sleep()} catch (InterruptedException ie) {};)
This failure is well known from sequential programming. There, not only one thread but the complete program fails to make progress because of a livelock. In concurrent programming, though, this failure has the same symptoms as other liveness failures and is therefore not easy to identify. The fault in the code is the looping over a wrongly chosen condition.
To detect the failure, the program history has to be examined for repeated loop iterations in which the loop condition does not change. However, whether the loop is a livelock cannot be decided in general. It can only be decided in trivial cases where the loop condition is the constant true and the statements in the loop will never cause exceptions or breaks.
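One pragmatic reaction to this undecidability is to bound the busy waiting of Fig. 5.13 by a deadline and to report loops whose condition never became true in time. The helper below is a hypothetical sketch, not a livelock detector: a false result only flags a suspicious loop. Unlike the loop in Fig. 5.13, it does not swallow InterruptedException but restores the interrupt flag and stops waiting.

```java
import java.util.function.BooleanSupplier;

class BoundedBusyWait {
    // Polls condition every pollMillis; gives up after timeoutMillis.
    static boolean awaitCondition(BooleanSupplier condition, long pollMillis, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false;                        // suspicious: report, do not loop forever
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // preserve the interrupt, stop waiting
                return false;
            }
        }
        return true;
    }
}
```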
F11 Blocking on I/O
A thread could be indefinitely blocked on an I/O method call.
This could be the case when a device for input or output is not reacting because it is in an inconsistent state or because the device is constantly used by other threads. This failure is also well known from sequential programming.
Usually, the code of the program using the device cannot be made respon-
sible for device problems except when it violates preconditions when calling
it. Heuristics are needed to decide when a blocking on I/O takes too long.
F12 Termination by Exception
In a multithreaded Java program a thread can terminate due to
an uncaught exception with the effect that the stack trace of the
thread is printed to the error console.
If that thread is the main() thread and no other non-daemon threads are running, the complete program exits. In the case of other threads, only that thread terminates while all other threads keep running. Only in the case of a ThreadDeath error is no such trace printed. The termination might even go unnoticed if the thread does not print a stack trace on the console. It might be noticed only some time after the thread has vanished, and then it is not known when and where which exception was thrown. The unexpected termination of a thread might leave the objects used by that thread and shared with other threads in an inconsistent state.
Note that symptoms are not different when threads are blocked or vanish.
Both failures may cause inconsistencies, wrong or missing results, or lack of
performance.
The fault is the lack of a corresponding exception handler. Exceptional termination itself is not specific to concurrency, although some exceptions such as InterruptedException or IllegalMonitorStateException are caused by concurrency mechanisms. The latter exception points to a fault in the code which is only detected at runtime; the fact that the exception is not handled, however, is the fault addressed here. The InterruptedException is used to signal to a thread that it was interrupted. The exception itself therefore does not point to a fault, but the fact that it is re-thrown and not handled is the fault addressed here.
User-defined exceptions typically signal an inconsistent state, which can
be a safety failure. Thereby, a safety failure could turn into an exceptional
termination if not caught and hence into a liveness failure.
Because of the various kinds of exceptions and the various ways to use them, one cannot interpret all of them as failures. Still, many of them bear the risk of failures. It is essential to provide runtime detection of otherwise unnoticed exceptions and to provide as much information about their occurrence as possible.
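Since Java 5, an exception that terminates a thread can be captured with Thread.setUncaughtExceptionHandler instead of relying on the stack trace printed to the console. The sketch below (the class name is hypothetical) records the exception that ended a worker thread; a tracer could log the time and the throwing thread at the same point.

```java
import java.util.concurrent.atomic.AtomicReference;

class UncaughtExceptionMonitor {
    // Runs task in a new thread and returns the exception that terminated it, or null.
    static Throwable terminationCause(Runnable task) {
        AtomicReference<Throwable> cause = new AtomicReference<>();
        Thread t = new Thread(task, "worker");
        // Record which exception ended the thread instead of relying on the
        // default handler, which prints the stack trace to System.err.
        t.setUncaughtExceptionHandler((thread, ex) -> cause.set(ex));
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return cause.get();
    }
}
```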
Thread Restarting
Restarting a thread in Java can either throw an exception or may go unno-
ticed if the JVM has already cleaned up behind a terminated thread.
This is not a failure in the above sense, as the restarted thread never becomes alive and thus cannot fail to make progress. Yet this failure might have the same symptoms as other liveness failures, as the thread cannot perform the intended computations. Other threads, however, cannot become blocked when trying to interact with this thread.
The fault in the code is that the same thread object is started more than
once. It could be the case that the creation of a corresponding new thread
object before calling start() is missing.
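The Javadoc of Thread.start() states that a thread may not be restarted once it has completed execution and that start() throws IllegalThreadStateException if the thread was already started. The sketch below, assuming a current JVM, shows this behaviour for a terminated thread; creating a new Thread object for each run avoids the fault. The class name is hypothetical.

```java
class RestartCheck {
    // Starts the same Thread object twice; returns true if the second start() was rejected.
    static boolean secondStartRejected() {
        Thread t = new Thread(() -> { });
        t.start();
        try {
            t.join();                       // make sure it has terminated
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        try {
            t.start();                      // same Thread object started again
            return false;
        } catch (IllegalThreadStateException e) {
            return true;
        }
    }
}
```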
Ripple Effects
All of the above failures have the same symptoms. Symptoms can be missing or wrong results or computation states, or a lack of performance. It is also a specific characteristic that these failures can remain unnoticed. When a liveness failure happens, the threads involved stop running and can never continue. Other threads of the program are not affected by this liveness failure unless they are engaged in a particular interaction with any of the threads involved in the failure.
This can be the case if another thread calls join() on a thread involved in the failure or is waiting for a notification by a thread involved in the failure. If the threads involved in the failure are holding locks, other threads can be blocked when using synchronized. Therefore, each of the above liveness failures can have a ripple effect on the entire system.
A Note on Stall, Dormancy, Starvation and Bottle Neck
Finally, we briefly mention a few terms often appearing in the context of concurrency problems which we have neglected so far and which do not play a role in this thesis, because they either fall outside the area of liveness failures or are less appropriate than the terms we prefer to use.
A stall refers to a situation where a thread cannot proceed for a certain time span. The thread will, however, be able to continue again. Thus a stall is a delay. Stalls can have many different reasons; the drawback of this term is therefore that it does not describe the situation exactly. A stall is not a liveness failure and thus will not be considered in the following.
Dormancy refers to threads waiting in queues that are not being notified.
It is not defined whether they will be notified later or never. Because of
this and because dormancy can have many different reasons it is not a pre-
cise term. Instead, we have classified reasons of dormancy by distinguishing
between a missing and a missed notification. This provides a better charac-
terisation than the more general term and this is the reason why we decided
not to use dormancy throughout this thesis.
Starvation typically refers to a thread not making progress as a consequence of the scheduling policy. Java does not guarantee fair scheduling, and this problem does occur in practice. Scheduling is, however, an issue of its own and is not discussed in the context of liveness failures because it cannot be influenced by the programmer in the same way as liveness failures. Starvation might also refer to lock starvation, which we have already discussed in the context of indefinite blocking.
Priority inversion is a special case of starvation. It refers to a problematic situation where a medium-priority thread prevents a low-priority thread from running and thus from releasing a lock which a high-priority thread needs for making progress. This situation is more accurately termed unbounded priority inversion, meaning that the situation cannot be resolved. Priority inversion can be a problem in Java depending on the scheduling policy of a specific JVM, since Java does not require scheduling protocols that avoid priority inversion. The reason for this omission is presumably Java's goal of platform independence, since it has to run on a variety of operating systems. Consequently, multithreaded Java programs can run well on one operating system and not at all on another.
As said above, issues related to scheduling do not fall within the area of liveness failures and will not be considered in this thesis.
Bottle Neck refers to states in a program where the performance is decreased by an increasing number of threads or by issues related to thread interaction. Performance issues are not related to liveness failures. Not only can the sheer number of threads running concurrently affect performance, through context switches or through the cost of creating and deleting thread objects, but so can the overhead of synchronisation mechanisms. Locks can become bottle necks for threads. The locking itself takes time, which is referred to as Lock Overhead. Sometimes programs contain too much locking, and some of the locking, or even some of the locks, can be avoided. Often, there is a trade-off between coarsening lock granularity to avoid locking overhead and using fine-grained locks to reduce the number of threads competing for a lock. None of these problems is considered a liveness failure, and therefore they are not an issue here.
5.2.3 Potentials for Java Liveness Failures
Some potentials are identified in a system state and others in an execution history. It is possible to determine potentials for some of the failures introduced above; for other failures there are no meaningful potentials. In general, it is difficult to establish conditions for a potential if only one method per thread is involved.
For example, in the case of a circular join it is impossible to establish a
potential. The dynamic conditions of the circular join are the two mutual
method calls. The only condition weaker than that would be one method
call but this condition does not make sense because then every call to join()
would be treated as a potential.
P1 Deadlock Potential
In the case of the deadlock, its conditions (see definition 5.6) can be classified into necessary conditions, which are always fulfilled by a language or system, and a sufficient condition, whose fulfilment depends on a concrete program or a concrete state or execution of a program.
• In the case of Java, the condition for non-preemption holds a priori for any program written in Java. Therefore the fulfilment of this condition cannot serve as a warning for a concrete program execution.
• Mutual exclusion holds as soon as threads need to share resources with mutually exclusive access using synchronized. Programs can only avoid it by avoiding overlapping concurrency, i.e. by completely serialising the program, which is not very efficient. In practice, systems will use mutual exclusion. Therefore, the use of mutual exclusion is not a good candidate for a warning either.
• Incremental acquisition, i.e. incremental locking via synchronized, is also fulfilled in any Java program which uses synchronized without further restrictions. In a dynamic system where objects are generated at runtime, incremental acquisition might be unavoidable. Therefore, it is not a useful candidate.
• The condition for cyclic dependencies is fulfilled if synchronized-objects are accessed in inverse orders and if these accesses overlap in time. Typically, programs with dynamic creation of objects cannot easily enforce an order, and therefore inverse order is a widely used candidate for detecting a deadlock potential. The condition of inverse order is weaker than the condition of cyclic dependencies because it omits the overlap in time. Therefore it can be used as a condition for a potential of a deadlock. Also note that detecting inverse order implies that the conditions of incremental acquisition and mutual exclusion hold.
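The inverse-order condition can be evaluated on lock-order pairs collected per thread from a trace. The sketch below assumes that each thread's pairs are given as hypothetical strings of the form X->Y (lock Y acquired while X was held); it reports each pair of locks acquired in inverse order by two different threads. It deliberately ignores whether the accesses overlapped in time, which is what distinguishes the potential from an actual deadlock.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class InverseLockOrder {
    // orders: thread -> set of observed pairs "X->Y". Returns pairs "A<->B".
    static Set<String> inversions(Map<String, Set<String>> orders) {
        Set<String> found = new TreeSet<>();
        for (Map.Entry<String, Set<String>> a : orders.entrySet()) {
            for (Map.Entry<String, Set<String>> b : orders.entrySet()) {
                if (a.getKey().equals(b.getKey())) {
                    continue;                 // only compare different threads
                }
                for (String pair : a.getValue()) {
                    String[] xy = pair.split("->");
                    if (b.getValue().contains(xy[1] + "->" + xy[0])) {
                        // Normalise so that each lock pair is reported once.
                        String first = xy[0].compareTo(xy[1]) < 0 ? xy[0] : xy[1];
                        String second = first.equals(xy[0]) ? xy[1] : xy[0];
                        found.add(first + "<->" + second);
                    }
                }
            }
        }
        return found;
    }
}
```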
P2 Unused Notification
Another potential can be established based on the problem of missed notification. Setting aside that this problem relies on a programming convention, it is possible to consider isolated notifications as troublesome. By isolated notification we refer to notifications which have not been used to wake up a thread because none was waiting. Such a notification can be seen as a potential for a missed notification because it was issued at an inappropriate time.
Following our method for establishing conditions for potentials, the conditions for the potential of missed notification are weaker than those for the missed notification itself because they do not require that the notification is followed by a wait.
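Unused notifications can be found by replaying the wait sets recorded in a trace. The sketch assumes a hypothetical event format (WAIT, WAKE for wake-ups by timeout or interrupt, NOTIFY, NOTIFY_ALL) and a recent JDK. Because the JVM chooses which waiting thread notify() removes, the sketch removes an arbitrary one; every notification that meets an empty wait set is reported.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

class UnusedNotificationCheck {
    // Hypothetical trace event; kind is WAIT, WAKE, NOTIFY or NOTIFY_ALL.
    record Event(String kind, String thread, String object) {}

    // Returns trace positions of notify()/notifyAll() calls that found an empty wait set.
    static List<Integer> unused(List<Event> trace) {
        Map<String, Set<String>> waitSets = new HashMap<>();   // object -> waiting threads
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < trace.size(); i++) {
            Event e = trace.get(i);
            Set<String> waiting = waitSets.computeIfAbsent(e.object(), k -> new HashSet<>());
            switch (e.kind()) {
                case "WAIT" -> waiting.add(e.thread());
                case "WAKE" -> waiting.remove(e.thread());     // e.g. timeout or interrupt
                case "NOTIFY" -> {
                    if (waiting.isEmpty()) {
                        result.add(i);
                    } else {                                   // one arbitrary waiter leaves
                        Iterator<String> it = waiting.iterator();
                        it.next();
                        it.remove();
                    }
                }
                case "NOTIFY_ALL" -> {
                    if (waiting.isEmpty()) {
                        result.add(i);
                    } else {
                        waiting.clear();
                    }
                }
                default -> throw new IllegalArgumentException("unknown event " + e.kind());
            }
        }
        return result;
    }
}
```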
P3 Unused Interrupt
The interrupt mechanism bears the potential for an InterruptedException. If a thread was interrupted but has never polled or consumed its interrupt state, it will throw an exception when it enters a wait(), join() or sleep(). We said earlier that this exception is intended to signal the interrupt to a thread which is not in a state where it can actively poll the interrupt flag. If an interrupt is not polled by the interrupted thread within a certain time, this can be seen as a potential liveness failure because it can lead to an exception when the thread calls wait(), join() or sleep(). As it is common practice to suppress this exception by providing an empty handler in the code, an unused interrupt can also be seen as a failure to participate in the interrupt mechanism.
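The effect of an interrupt that is never polled can be demonstrated with the standard Thread API. In the sketch below (the class name is hypothetical), the current thread interrupts itself and then calls sleep(), which throws InterruptedException at once; Thread.interrupted() is the polling alternative that consumes the flag.

```java
class PendingInterruptDemo {
    // Interrupts the current thread without polling the flag, then sleeps.
    // Returns the milliseconds spent before sleep() gave up, or -1 if it was not interrupted.
    static long sleepWithPendingInterrupt(long sleepMillis) {
        Thread.currentThread().interrupt();          // interrupt that nobody polls
        long start = System.nanoTime();
        try {
            Thread.sleep(sleepMillis);
            return -1;
        } catch (InterruptedException e) {
            // sleep() throws at once and clears the interrupt flag
            return (System.nanoTime() - start) / 1_000_000;
        }
    }

    // Polling alternative: Thread.interrupted() returns the flag and clears it.
    static boolean pollAndClear() {
        return Thread.interrupted();
    }
}
```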
P4 Potential for Join-induced Deadlock
There are two conditions for a join-induced deadlock. One thread joins an-
other while holding at least one lock. The target of the join attempts to
acquire a lock held by the first thread. Both together are sufficient if they
appear in the above order.
There could be an execution where a thread acquires a lock and subsequently releases it. Later, this thread becomes the target of join(), and the joining thread holds the same lock while joining. This situation has the same conditions but a different timing and is therefore a potential for the join-induced deadlock: if the target thread tried to acquire the lock again after join() has taken place, both threads would be blocked.
Joining while holding locks can generally be considered a potential for failure, but it happens very often in arbitrary Java programs.
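The potential can be checked on facts extracted from an execution history. The sketch assumes hypothetical inputs and a recent JDK: for each join, the joining thread, its target and the locks held while joining, and for each thread the set of locks it acquired earlier in the history. A join is reported if its target has used a lock that the joining thread holds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

class JoinWithLocksCheck {
    // Hypothetical fact: joiner called target.join() while holding heldLocks.
    record Join(String joiner, String target, Set<String> heldLocks) {}

    static List<String> potentials(List<Join> joins, Map<String, Set<String>> acquiredBefore) {
        List<String> result = new ArrayList<>();
        for (Join j : joins) {
            for (String lock : acquiredBefore.getOrDefault(j.target(), Set.of())) {
                if (j.heldLocks().contains(lock)) {
                    result.add(j.joiner() + " joins " + j.target() + " holding " + lock);
                }
            }
        }
        return result;
    }
}
```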
5.2.4 Classification of Java Liveness Failures and Potentials
We have motivated not only the need for a comprehensive list of concurrent
Java liveness failures, but also that it is desirable to analyse their common
reasons.
We consider as concurrent only those liveness failures and potentials which involve synchronisation mechanisms. Therefore, we do not consider livelock (see F10) and indefinite blocking on I/O (see F11) as concurrent failures (see also Table 5.1).
It is obvious that concurrent liveness failures and potentials can be categorised based on the language means involved, e.g. synchronized, join() or wait(), based on the number of threads involved, and based on whether they can be detected in a state or in a history (see also Table 5.1 and Table tab:formalpotclassif).
General Dimensions
Concurrent liveness failures and their potentials
• manifest themselves while a program is executing,
• involve more than one thread (except for self join and termination by exception), and
• have their origin in the synchronised interaction of threads.
Failures manifest themselves either in a state or in an execution history. Failure states are legal system states. System states are composed of the states of each thread, disregarding for the moment the states of passive data objects. Failure states differ from other system states in the combination of thread states. In these combinations, threads need resources or events from each other in order to make progress. More precisely, at least one thread involved in a failure depends on other threads.
Failure execution histories are also legal execution histories. Some failures cannot be described as a state but only as a history because either the effect of a state change has been overridden by new state changes, or there has not been a related state change at all. For instance, if behaviour is missing, e.g. in a missed notification, we cannot observe that in any state. In another case, behaviour has taken place but has not persisted in a state change, such as a call to notify() when no thread is waiting.
Thread Dependencies
We have already pointed out how failure states and histories differ from other system states because they involve special dependencies. It is common to several kinds of liveness failures that these dependencies are cyclic. We can identify three different classes of waiting-dependencies between two threads:
• waiting for a lock, which we also call blocking
Concept Java Syntax Liveness Failures State History
Lifecycle Control -
Termination
run() Uncaught Exception
(F12)
x
Unsynchronised Inter-
action - Direct Inter-
action via Interrupt
interrupt() Uncaught Exception
(F12)
x
Synchronised Interac-
tion - Joining
join() Circular Join (F7),
Self Join (F8)
x
Synchronised Interac-
tion - Joining
synchronized,
join()
Join-induced Dead-
lock (F9)
x
Synchronised Interac-
tion - Mutual Exclu-
sion
synchronized Deadlock (F1, F2) x
Synchronised Inter-
action - Conditional
Synchronisation
synchronized,
wait()
Nested Lockout (F6) x
Synchronised Inter-
action - Conditional
Synchronisation
notify/All(),
wait(),
Missed Notifica-
tion (F4), Missing
Notification (F5)
x
Synchronised Interac-
tion - Mutual Exclu-
sion
synchronized Indefinite Blocking on
synchronized (F3)
x
General Concepts -
I/O
(not spe-
cific)
Indefinite Blocking on
I/O (F11)
x
General Concepts -
Loop
while Livelock (F10) x
Table 5.1: Liveness Failures
• waiting for the termination of another thread during joining
• waiting for a notification from an arbitrary other thread
These three dependencies can create several different failures involving cyclic dependencies, namely deadlock, wait-induced deadlock, nested monitor lockout, join-induced deadlock, self join, and circular join. The basic pattern is always the same, although sometimes threads wait for a resource and at other times for a signalling event.
Hence we can identify blocking, locking, joining, and waiting as the sources of failures and of the corresponding potentials. This once more shows that in concurrent programming, failures are unintended combinations of legal system behaviour.
Concept Java Syntax Failure Potential State History
Unsynchronised Inter-
action - Direct Inter-
action via Interrupt
interrupt Unused Interrupt
(P3)
x
Synchronised Interac-
tion - Mutual Exclu-
sion
synchronized Deadlock Potential
(P1)
x
Synchronised Inter-
action - Conditional
Synchronisation
notify(),
wait()
Unused Notifica-
tion (P2)
x
Synchronised Interac-
tion - Joining
synchronized,
join()
Joining with Locks
(P4)
x
Table 5.2: Potential Liveness Failures
Synchronisation Mechanisms Involved
We have presented a few failures that occur in both sequential and concurrent programs, and many failures specific to concurrent programs only. The latter are based on a set of synchronisation mechanisms of Java.
There exists a group of failures directly related to the concurrency concepts of synchronised interaction (see Table 5.1). These failures are circular join, self join, deadlock, nested monitor lockout, missing notification, missed notification, and blocking on synchronized or wait. When a failure occurs, this is due to the effect of a certain method call, and in some cases due to the absence or wrong timing of a method call.
Two tables extend the table presented at the end of the previous chapter by adding the failures (see Table 5.1) and potentials (see Table 5.2) which can be caused by the language concepts.
5.2.5 Failures in Concurrent Application Logic
The Java synchronisation features can be used to construct high-level syn-
chronisation protocols such as protocols which allow concurrent reading threads
but only mutually exclusive writing threads (also known as reader/writer
protocols [Lea00]).
In these protocols it can also happen at runtime that a thread does not
make progress, e.g. if the protocol does not guarantee fairness. Technically,
such a thread is either blocked at a lock or waiting in a queue or joining,
as there are no other possibilities. Usually, these protocols use waiting to
coordinate threads.
In such a case, we would identify a thread failing to make progress under the protocol as a missing or missed notification. However, if we took the concrete protocol into account, we could give a more precise definition. That is, we are able to detect that something is wrong with a protocol, but we cannot provide a protocol-specific analysis based on the failures defined here. We can, however, detect whether a protocol creates unintended liveness failures such as deadlocks.
5.3 Summary
In this chapter we have described a range of failures and potentials. We
have described why the threads involved fail to make progress and have
explained the faults, i.e. the causes behind these failures. Our focus
was on liveness failures inherent in the Java language.
We have described new failures and potentials which we could not find
in the literature. The join-induced deadlock and its potential have a structure
similar to the deadlock and the deadlock potential. We were able to derive
them using the model of thread synchronisation which we describe in the
next chapter. Previously undocumented failures such as circular join and self
join are more obvious and easier to conceive; they were also discovered using
the model of thread synchronisation.
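The circular join mentioned above can be reproduced with two threads that each join the other. In the sketch below (class and method names are hypothetical), both threads are started before either calls join(), so that neither join sees an unstarted thread and returns early. Both threads then remain alive in state WAITING. They are daemon threads so that the JVM can still exit. A self join, i.e. Thread.currentThread().join(), blocks in the same way with a cycle of length one, which is why it is not executed here.

```java
import java.util.concurrent.CountDownLatch;

class CircularJoinDemo {
    // Returns true if two threads that join each other are both still
    // alive and WAITING after the given observation time.
    static boolean circularJoinBlocks(long observeMillis) {
        Thread[] threads = new Thread[2];
        CountDownLatch bothStarted = new CountDownLatch(1);
        for (int i = 0; i < 2; i++) {
            final int other = 1 - i;
            threads[i] = new Thread(() -> {
                try {
                    bothStarted.await();       // make sure the other thread is alive
                    threads[other].join();     // waits for the other thread to terminate
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].setDaemon(true);        // do not keep the JVM alive
        }
        threads[0].start();
        threads[1].start();
        bothStarted.countDown();
        try {
            Thread.sleep(observeMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return threads[0].isAlive() && threads[1].isAlive()
                && threads[0].getState() == Thread.State.WAITING
                && threads[1].getState() == Thread.State.WAITING;
    }

    public static void main(String[] args) {
        System.out.println(circularJoinBlocks(200));
    }
}
```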
Chapter 6
A Model of Thread Synchronisation
The previous two chapters have introduced the domain of this thesis by
presenting concurrent programming in Java and by informally presenting
concurrent liveness failures in Java and their potentials. The reasons for
these failures and potentials were analysed and classified. As a result, thread
states and dependencies which potentially lead to waiting and blocking were
identified. They have their origin in Java methods and statements for thread
synchronisation.
The previous chapter used UML sequence diagrams to describe the different
failures. However, sequence diagrams cannot depict the states and dependencies,
and they describe failures only by means of scenarios. Therefore, this chapter
aims at precisely describing the behaviour of threads at a level which makes
states and dependencies explicit. To this end, we describe a domain model for
the synchronisation of Java threads. Based on the model, failures and
potentials can be specified more generally.
This chapter starts by deriving requirements for the model from the do-
main of concurrent liveness failures in Sect. 6.1. It is followed by a discussion
of related work in the area of formal Java semantics in Sect. 6.2. Then we
will present our statechart-based formalisation of the dynamics of Java thread
synchronisation in Sect. 6.3. In Sect. 6.4 we will use the developed model
to specify concurrent liveness failures and potentials.
6.1 Model Requirements
In this section we will determine the requirements for a model of Java threads
which will enable us to capture Java concurrent liveness failures and their
potentials.
The model has to cover the characteristics of the concurrent liveness fail-
ures and potentials. These characteristics have been informally identified in
the last chapter and are summarised here for convenience in the next sub-
section. Thereafter, we will analyse these characteristics in order to derive
requirements for the model.
6.1.1 Concurrent Liveness Failure Characteristics
In this subsection we summarise the characteristics of the concurrent liveness
failures which have been presented in the previous chapter. There, failures
were characterised along two dimensions: the Java concepts involved
and the way they manifest themselves during a program execution.
In the following we will not consider liveness failures which can happen in
the same way in a sequential program such as the livelock and the blocking
on I/O. We will only consider liveness failures specific to concurrent Java
programs and their potentials.
Thread Synchronisation
The last chapter concluded with two tables (see Table 5.1 and Table 5.2) of
Java concepts involved in concurrent liveness failures and potentials. The
involved concepts are
•joining,
•mutual exclusion, and
•conditional synchronisation.
These concepts have been introduced in Chapter 4 together with the corresponding
Java language elements: the methods wait(), notify(), notifyAll(),
join(), and the keyword synchronized. Together, they form the part of Java
dealing with thread synchronisation.
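As a reminder of how the three concepts work together, here is a small sketch with hypothetical names: a one-slot buffer uses synchronized methods for mutual exclusion and wait()/notifyAll() for conditional synchronisation, and the producing thread uses join() to wait until the consumer has terminated.

```java
// Hypothetical one-slot buffer: mutual exclusion via synchronized,
// conditional synchronisation via wait()/notifyAll().
class OneSlotBuffer {
    private Integer slot = null;   // null means "empty"

    synchronized void put(int value) throws InterruptedException {
        while (slot != null) {
            wait();                // condition: buffer must be empty
        }
        slot = value;
        notifyAll();               // a consumer may be waiting for data
    }

    synchronized int take() throws InterruptedException {
        while (slot == null) {
            wait();                // condition: buffer must be full
        }
        int value = slot;
        slot = null;
        notifyAll();               // a producer may be waiting for space
        return value;
    }
}

class SyncConceptsDemo {
    // The calling thread produces 1..n; a consumer thread sums them up.
    static int sumThroughBuffer(int n) {
        OneSlotBuffer buffer = new OneSlotBuffer();
        int[] sum = {0};
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) {
                    sum[0] += buffer.take();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        try {
            for (int i = 1; i <= n; i++) {
                buffer.put(i);
            }
            consumer.join();       // joining: wait until the consumer has terminated
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sum[0];             // join() makes the consumer's writes visible
    }

    public static void main(String[] args) {
        System.out.println(sumThroughBuffer(10)); // prints 55
    }
}
```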
State and History
Liveness failures and their potentials manifest themselves while a program is
executing, either in a specific program execution state or in a specific history
of program states. All these states are legal system states, consisting of the
state of each thread and of its potentially shared objects. Therefore we need
to characterise how failure and potential states differ from other states and
execution histories.
We can distinguish two kinds of failure states. A deadlock-like failure is a
legal system state where
•The thread(s) involved cannot make progress and are blocked forever
(either blocking on a lock, waiting, or joining). Their interactions have
created cyclic dependencies such that progress of each thread depends
on a resource not yet released or signalled by another thread.
•One or more interactions of each thread are involved.
The other kind of failure state does not involve cycles.
•Some directly involved threads do not make progress because they block
after an interaction.
•The cause is a missing interaction of other threads. These
threads are said to be involved, but they are not affected by the problem
and can still make progress.
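A deadlock-like failure state of the first kind can be produced deliberately. In the sketch below (names are hypothetical), each of two threads holds one lock and requests the other. A latch ensures that both first locks are taken before either thread requests its second lock, so the cycle always forms. The standard java.lang.management method ThreadMXBean.findMonitorDeadlockedThreads() then reports both threads. The threads are daemon threads so that the JVM can exit despite the deadlock.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

class LockCycleDemo {
    // Each thread locks 'first', waits until the other thread holds its first
    // lock, then requests 'second', which the other thread holds: a cycle.
    private static void lockBoth(Object first, Object second, CountDownLatch bothHold) {
        synchronized (first) {
            bothHold.countDown();
            try {
                bothHold.await();
            } catch (InterruptedException e) {
                return;
            }
            synchronized (second) {
                // never reached: the other thread holds 'second' forever
            }
        }
    }

    private static boolean contains(long[] ids, long id) {
        for (long x : ids) {
            if (x == id) return true;
        }
        return false;
    }

    // Returns true once the JVM reports both threads as monitor-deadlocked.
    static boolean deadlockDetected() {
        Object a = new Object();
        Object b = new Object();
        CountDownLatch bothHold = new CountDownLatch(2);
        Thread t1 = new Thread(() -> lockBoth(a, b, bothHold));
        Thread t2 = new Thread(() -> lockBoth(b, a, bothHold));
        t1.setDaemon(true);
        t2.setDaemon(true);
        t1.start();
        t2.start();
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + 5000;
        try {
            while (System.currentTimeMillis() < deadline) {
                long[] ids = mx.findMonitorDeadlockedThreads(); // null if none
                if (ids != null && contains(ids, t1.getId()) && contains(ids, t2.getId())) {
                    return true;
                }
                Thread.sleep(10);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(deadlockDetected());
    }
}
```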
There are also two kinds of potentials.
•One kind manifests itself in a state, i.e. certain conditions hold when a
thread performs an interaction.
•The other kind manifests itself in an execution history, where conditions
over the history are identified as a potential for failure.
In the following we will analyse the above characteristics in more depth
which will allow us to formulate our requirements.
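For the second kind of potential, a commonly cited example is an inconsistent lock order: two threads have each completed a nested acquisition of the same two locks, but in opposite orders. The history itself contains no failure, yet a different interleaving could deadlock. The sketch below checks a recorded history for this pattern. The event class, its fields and the method name are hypothetical and serve only as an illustration; they are not the trace format or the potential definitions developed later in this thesis.

```java
import java.util.*;

class LockOrderCheck {
    // Hypothetical trace event: 'thread' acquired 'lock' while already holding 'held'.
    static final class NestedAcquire {
        final String thread, held, lock;

        NestedAcquire(String thread, String held, String lock) {
            this.thread = thread;
            this.held = held;
            this.lock = lock;
        }
    }

    // Returns lock pairs "x<->y" that different threads acquired in opposite nesting order.
    static Set<String> opposingOrders(List<NestedAcquire> history) {
        Map<String, Set<String>> threadsPerOrder = new HashMap<>();
        Set<String> potentials = new TreeSet<>();
        for (NestedAcquire e : history) {
            if (e.held.equals(e.lock)) continue;          // ignore reentrant acquisitions
            threadsPerOrder.computeIfAbsent(e.held + "->" + e.lock, k -> new HashSet<>())
                           .add(e.thread);
            Set<String> reverse = threadsPerOrder.get(e.lock + "->" + e.held);
            if (reverse == null) continue;
            for (String other : reverse) {
                if (!other.equals(e.thread)) {            // opposite order by another thread
                    boolean ordered = e.held.compareTo(e.lock) < 0;
                    potentials.add(ordered ? e.held + "<->" + e.lock : e.lock + "<->" + e.held);
                    break;
                }
            }
        }
        return potentials;
    }

    public static void main(String[] args) {
        List<NestedAcquire> history = Arrays.asList(
            new NestedAcquire("T1", "A", "B"),   // T1: synchronized(A) { synchronized(B) {...} }
            new NestedAcquire("T2", "B", "A"));  // T2: synchronized(B) { synchronized(A) {...} }
        System.out.println(opposingOrders(history)); // prints [A<->B]
    }
}
```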
6.1.2 Dynamics of Thread Synchronisation
In this section we determine the scope of the model. At first glance, it seems
obvious that the scope is restricted to failures. However, on closer
examination, each failure state turns out to be a legal system state. The
Java concepts involved in failures can also be involved in many other system
states not identified as failures. This inclusion relationship is depicted in
Fig. 6.1. Potentials, too, are legal system states or execution histories.
Figure 6.1: Failures and Potentials as Legal System Behaviour (diagram:
Liveness Failures and Liveness Failure Potentials contained in the Dynamics of
Synchronised Thread Behaviour)
Therefore, capturing failures requires covering the legal behaviour to
which failures belong. Failures can then be specified as specific cases of
this behaviour. This two-step approach will be followed in the sequel.
Liveness failures and their potentials have their origin in the thread syn-
chronisation. Therefore, the formalisation has to cover Java thread synchro-
nisation. As the failures and potentials manifest themselves at runtime we
need to model the dynamics, i.e. the runtime behaviour of Java thread
synchronisation. Note that for the model we are only concerned with the dy-
namic semantics of Java thread synchronisation, but not with the so-called
static semantics or grammar.
The model has to provide semantics for the methods wait(), notify(),
notifyAll(), join(), and the keyword synchronized. As the Java methods
wait() and join() can throw an InterruptedException, the interrupt
mechanism is clearly intertwined with the concepts of joining and conditional
synchronisation. Therefore, we also cover interrupts in the
formalisation, implemented by the methods interrupt(), interrupted(),
isInterrupted(), and the InterruptedException.
The resulting set of methods, keyword, and exception covers not only
thread synchronisation but almost the entire thread lifecycle, i.e. the
behaviour common to all thread objects, except for the creation, starting, and
sleeping of threads (see also Chapter 4). For the sake of completeness we decide
to cover the entire thread lifecycle with our model. That means we are going
to integrate the constructor new(), the methods start() and run(), and
sleep().
Note that these methods and synchronized calls can change not only the
lifecycle of the thread that called them but also the lifecycle of the called
thread. They can be called concurrently from other threads. Here we are
only concerned with their effects on the thread whose lifecycle we describe,
but the model has to respect these multithreaded calls because the lifecycle
also depends on the behaviour of the other threads.
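The intertwining of interrupts with joining can be observed directly. The sketch below (names are hypothetical) shows that isInterrupted() only reads a thread's interrupt flag, whereas the static Thread.interrupted() reads and clears the flag of the current thread. It also shows that join() on a live thread throws an InterruptedException when the caller's flag is already set; as documented for join(), the flag is cleared when the exception is thrown.

```java
import java.util.concurrent.CountDownLatch;

class InterruptFlagDemo {
    // Returns {isInterrupted(), first interrupted(), second interrupted()}.
    static boolean[] flagQueries() {
        Thread self = Thread.currentThread();
        self.interrupt();                            // set own interrupt flag
        boolean read = self.isInterrupted();         // true; flag unchanged
        boolean firstTest = Thread.interrupted();    // true; flag now cleared
        boolean secondTest = Thread.interrupted();   // false
        return new boolean[] {read, firstTest, secondTest};
    }

    // join() on a live thread fails at once if the caller's flag is set.
    static boolean joinThrowsWhenInterrupted() {
        CountDownLatch release = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                release.await();                     // stays alive until released
            } catch (InterruptedException ignored) {
                // not expected in this demo
            }
        });
        worker.start();
        Thread.currentThread().interrupt();          // flag set before joining
        boolean thrown = false;
        try {
            worker.join();
        } catch (InterruptedException e) {
            thrown = true;
        }
        boolean flagCleared = !Thread.currentThread().isInterrupted();
        release.countDown();                         // let the worker terminate
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return thrown && flagCleared;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(flagQueries())); // [true, true, false]
        System.out.println(joinThrowsWhenInterrupted());              // true
    }
}
```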
6.1.3 Control Flow States
In this section we determine the level of abstraction for modelling the thread
lifecycle. It is important to identify in which states of the lifecycle a thread is
unable to proceed. Therefore, one has to analyse how a thread changes state.
In order to better understand what characterises the lifecycle states
involved in failures, we discuss the characteristics of concurrent
program states.
In general, a program state manifests itself in the state of its data and
its program counter, i.e. the pointer to the current point of execution in the
control flow. In the case of a concurrent object-oriented
program the data state refers to all object states including thread object
states. The program counter consists of one counter for the main thread and
one for each other thread. In a concurrent program, the continuation, i.e. the next
execution step of each thread, is not only determined by the counters but
also by additional information about the state of each thread with respect to
concurrent activities such as synchronisation or other interactions between
threads. Depending on this information, a thread can react differently to a
method call. Moreover, synchronisation and other interactions can prevent
the continuation of a thread and put a thread in a state where it cannot
proceed.
We therefore make the following important distinction between two kinds
of thread states:
•the state of the thread object itself, determined only by attributes either
defined in its class or inherited from superclasses, including Thread.
•the state of the control flow associated with the thread object, i.e. the
program counter and the thread state with respect to synchronisation
and any other information about a thread which determines the contin-
uation. This state can be changed by synchronisation and interaction
mechanisms used by the thread itself but also by other threads.
Note that the thread object state and the control flow state are orthogonal,
i.e. independent of each other. The object attribute values are not used for
encoding control flow states. The control flow states of Java threads are part
of the language and cannot be extended. Each thread, whether an instance
of Thread or of a subclass, is always in one of these states.
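Since Java 5, Thread.getState() exposes a coarse view of the control flow state (NEW, RUNNABLE, BLOCKED, WAITING, TIMED_WAITING, TERMINATED). The sketch below (class names are hypothetical) illustrates the orthogonality: changing an attribute of a Thread subclass leaves the control flow state untouched, and the control flow state changes without any attribute being involved. Note that getState() is coarser than the distinctions needed here; for instance, a joining thread and a thread in wait() both report WAITING.

```java
// Hypothetical Thread subclass with one attribute (thread object state).
class CountingThread extends Thread {
    int counter = 0;

    @Override
    public void run() {
        counter++;
    }
}

class StateOrthogonalityDemo {
    static String trace() {
        CountingThread t = new CountingThread();
        StringBuilder out = new StringBuilder();
        out.append(t.getState()).append(' ').append(t.counter);   // NEW 0
        t.counter = 41;                        // object state changes ...
        out.append(' ').append(t.getState());  // ... control flow state is still NEW
        t.start();
        try {
            t.join();                          // returns once t has terminated
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        out.append(' ').append(t.getState()).append(' ').append(t.counter); // TERMINATED 42
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(trace()); // prints NEW 0 NEW TERMINATED 42
    }
}
```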
The thread lifecycle can be fully characterised in terms of control flow
states. The methods, keyword, and exception which implement the thread
lifecycle behave differently depending on the control flow states, and they can
change these states. Therefore, a model has to describe the
semantics of the methods implementing the Java thread lifecycle in terms of
the control flow states.
Many control flow states serve to identify pre- and postconditions of
method calls. However, threads engaged in a method call, i.e. threads which
have not yet returned from a call, can also react in different ways to incoming
concurrent calls or events because they can leave that method in different
ways. This means that intermediate states are also control flow states, because
they influence the behaviour of a thread by having an effect on its state
changes. However, these states never occur as pre- or postconditions;
they are reached only during a method call.
Example
An example of an intermediate control flow state is the state blocking, in
which a thread waits for the lock on an object after having called
a synchronized-method or when trying to enter a synchronized-block. When
the lock is eventually granted, the thread changes its state to running while
holding the lock. A thread in state blocking reacts to an interrupt()-call from
another thread differently than, e.g., a thread in state waiting. In state
blocking, only the interrupt flag is set and the thread remains blocked. In state
waiting, the thread leaves the wait set and, after reacquiring the lock, returns
from its call to wait() by throwing an InterruptedException; the flag is not set
(it is cleared when the exception is thrown).
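The different reactions of a blocking and a waiting thread to interrupt() can be observed with the following sketch (names are hypothetical). A thread blocked at a lock held by the caller stays BLOCKED with its interrupt flag set, while a thread in wait() leaves the call with an InterruptedException, after which its flag is clear.

```java
class InterruptReactionDemo {
    private static void awaitState(Thread t, Thread.State s) throws InterruptedException {
        while (t.getState() != s) {
            Thread.sleep(1);
        }
    }

    // A thread blocked on a lock is not released by interrupt(); only its flag is set.
    static boolean blockedThreadKeepsBlocking() {
        Object lock = new Object();
        Thread t = new Thread(() -> {
            synchronized (lock) { }            // blocks while the caller holds the lock
        });
        try {
            boolean stillBlocked, flagSet;
            synchronized (lock) {
                t.start();
                awaitState(t, Thread.State.BLOCKED);
                t.interrupt();
                Thread.sleep(50);              // give it time to (not) react
                stillBlocked = t.getState() == Thread.State.BLOCKED;
                flagSet = t.isInterrupted();
            }
            t.join();                          // lock released: t acquires it and ends
            return stillBlocked && flagSet;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // A thread in wait() leaves the call with an InterruptedException; the flag is cleared.
    static boolean waitingThreadLeavesWait() {
        Object lock = new Object();
        boolean[] result = new boolean[2];     // {exception caught, flag after catch}
        Thread t = new Thread(() -> {
            synchronized (lock) {
                try {
                    while (true) {
                        lock.wait();           // loop guards against spurious wake-ups
                    }
                } catch (InterruptedException e) {
                    result[0] = true;
                    result[1] = Thread.currentThread().isInterrupted();
                }
            }
        });
        try {
            t.start();
            awaitState(t, Thread.State.WAITING);
            t.interrupt();
            t.join();                          // join makes 'result' visible
            return result[0] && !result[1];
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(blockedThreadKeepsBlocking());
        System.out.println(waitingThreadLeavesWait());
    }
}
```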
Control Flow States and Failures
Only control flow states which describe different kinds of blocking are in-
volved in failures. In order to describe failures, these states have to be iden-
tified by our formalisation. Therefore, the required granularity of the model
has to be that of control flow states.
From the list of failures we can identify which states need to be distin-
guished. Failures involve blocking on a lock, waiting and joining. Therefore,
we require that these states are made explicit by the formalisation. It can be
observed that the blocking states are all intermediate states of some method
execution or synchronized-block execution.
A control flow state of a concrete thread can carry additional information
which is specific to this thread. For instance, a thread in state locking has
locked a certain object and a thread in state blocking is waiting for a lock
on a certain object. This is the kind of information needed to describe
failures involving cyclic dependencies. Therefore, the model has to cover
thread instance-specific information associated with a control flow state.
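Instance-specific information of this kind suffices to expose cyclic dependencies. The sketch below uses a hypothetical snapshot representation, not the data structure developed later in this thesis. It records which thread holds which lock and which lock each blocked thread requests, and follows the chain from a thread to the lock it requests and from there to that lock's owner. Because a blocked thread requests exactly one lock, following this single chain is enough to find a cycle.

```java
import java.util.*;

class WaitForSnapshot {
    private final Map<String, String> ownerOfLock = new HashMap<>();   // lock -> owning thread
    private final Map<String, String> requestedLock = new HashMap<>(); // blocked thread -> lock

    void holds(String thread, String lock) { ownerOfLock.put(lock, thread); }
    void blockedOn(String thread, String lock) { requestedLock.put(thread, lock); }

    // Follows thread -> requested lock -> owner from 'start'. Returns the threads
    // on the cycle reached this way, or an empty list if the chain ends.
    List<String> cycleReachableFrom(String start) {
        List<String> path = new ArrayList<>();
        String current = start;
        while (current != null && !path.contains(current)) {
            path.add(current);
            String lock = requestedLock.get(current);
            current = (lock == null) ? null : ownerOfLock.get(lock);
        }
        if (current == null) {
            return Collections.emptyList();   // chain ends at a thread that is not blocked
        }
        return new ArrayList<>(path.subList(path.indexOf(current), path.size()));
    }

    public static void main(String[] args) {
        WaitForSnapshot s = new WaitForSnapshot();
        s.holds("T1", "A"); s.blockedOn("T1", "B");
        s.holds("T2", "B"); s.blockedOn("T2", "A");
        s.blockedOn("T3", "A");                         // affected, but not on the cycle
        System.out.println(s.cycleReachableFrom("T3")); // prints [T1, T2]
    }
}
```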
6.1.4 Summary of Model Requirements
The primary goal is to describe the states in the lifecycle where a thread
cannot make progress and is potentially blocked forever. Control flow states
have been identified as a useful abstraction for capturing the characteristics
of such states. Therefore, the model has to describe how the Java methods
and calls to synchronized involved in the thread lifecycle are dependent on
control flow states and how they change control flow states.
We can conclude the following requirements:
•The scope of the model is the thread lifecycle.
•The model has to capture the behaviour of methods and statements
having an effect on the thread lifecycle in terms of control flow states.
•Control flow states are chosen at a level of granularity which allows
us to distinguish all states involved in failures, especially intermediate
states during method executions.
•The model has to capture instance-specific information of control flow
states.
•Based on this, failures and potentials have to be formalised as specific
states and sequences of states.
Thus, we are only going to model a dedicated part of Java.
Note that our model is not required to be executable or formally
analysable. We provide the model in order to precisely and
systematically describe concurrent liveness failures in Java and their poten-
tials and to develop support for their automated detection and visualisation.
6.2 Related Work
Our aim of modelling the behaviour of Java thread synchronisation is related
to efforts in the area of formal Java semantics. It seems obvious to
consult the official Java Language Specification (JLS) [GJSB00]. However,
this specification provides only an informal yet precise description of Java
syntax and semantics. Nevertheless, this is the only source available from
which formalisations can be derived, and equally important, the only source
against which any formalisation can be proven or validated. Existing Java
implementations, i.e. compilers and runtime environments such as the ones by
Sun Microsystems [Jav01], only serve as reference implementations but not as a
specification.
As we plan to provide our own model, we discuss the JLS in this section
from the perspective of deriving a model in general and also with respect
to our requirements.
Not surprisingly, many thorough attempts have been made in research
to overcome the lack of formal semantics for Java. Because they cover most
parts of Java it seems practical to reuse one of these. In this section, we
will discuss briefly some state-of-the-art approaches and argue why we have
chosen to provide our own model of thread synchronisation in Java.
6.2.1 The Java Language Specification
We start with a few general observations related to the organisation of the
presentation of Java threads within the official Java language specification
(JLS) [GJSB00]. The semantics of threads is scattered over several chapters
of the JLS. The reason for the scattering is that the JLS is driven by the
syntax. This requires dealing with the method modifier synchronized in a
different place (JLS chapter 8) than with the synchronized statement used to
specify a block (JLS chapter 14).
A more unifying presentation is given in JLS chapter 17, titled "Threads
and Locks", which discusses the semantics of unsynchronised and synchronised
shared data using synchronized and also wait() and notify/All(). This
chapter provides a complete treatment of locking, i.e. with respect to the
use of the synchronized keyword. It provides a basic description of wait() and
notify/All(), but this description is incomplete with respect to the interrupt
mechanism involved. The built-in exception InterruptedException, thrown by
wait(), is not specified.
Not only are wait() and notify/All() of class java.lang.Object incompletely
covered by the JLS, but so are all other methods, classes, and interfaces
dealing with threads which are part of java.lang, most importantly the
classes Thread and ThreadGroup and the interface Runnable. Their semantics
are described in the documentation of the package java.lang. Although these
language elements are closely related to the Java language itself, this
package is not documented in the current version of the JLS [GJSB00]. This
is clearly a disadvantage for understanding one of the most complex parts
of Java. An integrating presentation is therefore desirable but needs a
thorough analysis of the relevant chapters of the JLS and of the documentation
of java.lang.Thread.
The specification style of the JLS and of the related package java.lang
does not exploit commonalities among the preconditions and postconditions
of method behaviour. The specifications of preconditions and postconditions
are given per method (or per statement). Thereby, the JLS and also the
Thread-related libraries do not make control flow states explicit. Analysing
commonalities is difficult not only because of the scattering but also because
the JLS is pure prose.
There is one part of the JLS which is not pure prose but contains well
structured textual rules. This is the so-called Java Memory Model (JMM),
which is the major part of JLS chapter 17 [GJSB00]. The rules specify
how unsynchronised and synchronised data (by means of synchronized blocks
and methods) is shared among threads between thread-local memories and
one global memory, using a set of primitive operations for data transfers and
for locking and unlocking an object. The keyword synchronized is mapped to
the lock operation primitive, followed by the unlock operation primitive; both
operations are carried out jointly by a thread and by the global memory,
i.e. "synchronised". Formally analysing the JMM reveals some problematic
cases leading to hard-to-detect safety failures, which have been the focus of
several formal approaches [RM02]. However, safety is not our focus. For our
approach we are interested in the behaviour of locking. The semantics of
synchronized is covered by rules which guarantee that all lock and unlock
requests in a program are totally ordered. Also, lock requests are serialised,
i.e. simultaneous requests for the same lock cannot interfere. Only one lock
request can return successfully while the others are blocked.
The way the JLS specifies locking is not suitable for our requirements.
Analogously to the Java language, the primitive operations lock and unlock of
the JMM, which lay claim to the lock of an object and release exactly this
claim, do not allow us to distinguish the intermediate state in which a thread
that has called lock is blocked because the lock is not available. Essentially,
the call to lock returns only after successful locking; what happens internally
is hidden. We want to be more explicit and go beyond this level of abstraction,
i.e. we want to identify state changes inside the call to the operation lock.
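Splitting lock into a request and a later grant makes this intermediate state explicit. The following sketch of such a model is illustrative only: the class, the phase names and the FIFO grant order are assumptions (Java itself does not guarantee which blocked thread obtains a released lock), and reentrant locking is left out. For each thread, it tracks whether the thread is running, blocking on the lock, or holding it.

```java
import java.util.*;

// Hypothetical model of one object's lock in which 'lock' is split into
// a request and a grant, so that the blocking phase in between is visible.
// Simplifications: no reentrancy, FIFO grant order (Java promises no order).
class ModelLock {
    enum Phase { RUNNING, BLOCKING, LOCKING }

    private String owner = null;                       // null: lock is free
    private final Deque<String> blocked = new ArrayDeque<>();
    private final Map<String, Phase> phases = new HashMap<>();

    Phase phaseOf(String thread) {
        return phases.getOrDefault(thread, Phase.RUNNING);
    }

    // First half of 'lock': either granted at once or the thread starts blocking.
    void request(String thread) {
        if (owner == null) {
            owner = thread;
            phases.put(thread, Phase.LOCKING);
        } else {
            blocked.addLast(thread);
            phases.put(thread, Phase.BLOCKING);
        }
    }

    // 'unlock' by the owner; granting the lock to a blocked thread completes its 'lock'.
    void release(String thread) {
        if (!thread.equals(owner)) {
            throw new IllegalStateException(thread + " does not hold the lock");
        }
        phases.put(thread, Phase.RUNNING);
        owner = blocked.pollFirst();                   // null if nobody is blocked
        if (owner != null) {
            phases.put(owner, Phase.LOCKING);
        }
    }

    public static void main(String[] args) {
        ModelLock m = new ModelLock();
        m.request("T1");
        m.request("T2");
        System.out.println(m.phaseOf("T1") + " " + m.phaseOf("T2")); // LOCKING BLOCKING
        m.release("T1");
        System.out.println(m.phaseOf("T1") + " " + m.phaseOf("T2")); // RUNNING LOCKING
    }
}
```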
6.2.2 Formal Java Semantics
We briefly present some well-known rigorous approaches and discuss why
they do not completely meet our requirements. The chosen approaches cover
multithreading in Java, however to different extents. As already mentioned,
many approaches dealing with multithreading aim at formalising the Java
Memory Model, which is not our focus.
Existing operational semantics [CKRW99, BS99], based on Structural Operational
Semantics (SOS) and Abstract State Machines (ASM) respectively, fall short
of completely covering thread synchronisation. The concepts of joining,
sleeping, and interrupting, which are tightly coupled with conditional
synchronisation, are not covered. Another missing feature is the discrimination
of successful from unsuccessful locking. The blocking on locks, i.e. the
phase from when a thread requests a lock until it might be granted, is not
made explicit as a separate state. Instead, this behaviour is encapsulated
in a function which returns only when the lock is granted, similar to Java
synchronized and similar to the JMM lock operation. The executable operational
semantics by [RM02] only address the JMM primitive operations, using
Guarded Commands. They also do not distinguish intermediate phases of
locking. Moreover, they reduce complexity by working with only one
lock for the entire Java program, which locks a set of objects. Even where
these approaches are semantically correct, they are not at the desired level of
abstraction. We need to discriminate the different phases involved in locking
and other blocking activities in order to allow a precise description of failures.
The denotational approach by [WM00] using CSP is also not yet complete
with respect to thread synchronisation. The present version does not yet deal
with waiting and notification. Another drawback is that translating Java
into CSP changes not only the level of abstraction but also the concepts,
e.g. with regard to object-orientation. Hence, the failures of the original
program take a different shape in the CSP domain. This prevents an intuitive
understanding and requires mapping back and forth. This is not necessarily
the case with any denotational semantics but it is a typical effect. Because
of these disadvantages a more intuitive formalisation is desirable.
In any case, working from an existing formalisation would make it nec-
essary to extend the scope and, in many cases, extend the handling of the
locking state. We do not consider integrating the missing features into an
existing approach infeasible. However, because we aim at formalising
only a dedicated part of Java, namely thread synchronisation, implemented
by a set of methods and statements, we do not want to take on the burden of
extending an entire Java formalisation. Therefore we decide to provide our
own model dedicated to thread synchronisation.
6.3 A Statechart Model for Thread Synchronisation
Given our requirement that the model of the dynamics of the thread lifecycle
should make explicit both the states serving as pre- and postconditions and the
intermediate states, we decide to use statecharts. Thereby we can achieve the following:
•Statecharts make control flow states explicit at the desired level of
granularity.
•They capture how the execution of methods depends on control flow states
and how it changes these states.
•Statecharts thereby capture commonalities in the pre- and postcondi-
tions of method call events.
•Statecharts are a visual and intuitive formalism for capturing state-
driven behaviour.
In comparison to other approaches based on states and transitions, the
hierarchical mechanisms supplied by statecharts support factorisation of the
common behaviour of a group of states. Most statechart formalisms allow the
specification of side effects on variables, which can be used to model the
instance-specific state information. Because we do not intend to use the
statechart for formal reasoning, the chosen statechart approach does not have
to be executable.
Java threads are objects. Therefore, we decide to use an object-oriented
variant of statecharts. This allows us to assign statecharts to instances of
Java classes. Also, directed communication between instances is supported in
object-oriented statecharts. Object-oriented statecharts support communication
not only by asynchronous events as in classical statecharts but also by method
calls.
In the following we first present the chosen object-oriented statechart
approach. We then further classify the Java concepts involved in the lifecycle,
so that we can identify which concepts can be mapped in the same way to
statecharts. Then we will map the thread lifecycle to the chosen statechart
approach. Based on this we will formalise system state and history.
6.3.1 The UML Statechart Approach
The Unified Modeling Language (UML) [UMLa] provides an object-oriented
approach to statecharts. UML statecharts can be used for modelling be-
haviour of classes but also for other behavioural elements such as operations.
Here, we intend to use them for modelling the behaviour of a class. We
need to model in which states a method call can be processed, and we need
to model the effects of calls. These effects may include actions in order to
encode instance-specific behaviour. Therefore, a protocol statechart, which
merely defines the order of calls, is not sufficient; we need to use a behaviour
statechart, following the terminology of the UML.
In the following we describe the semantics provided for UML statecharts
[UMLa] on which we rely for our formalisation. Where the semantics is left
open or where it poses problems, we will refer to other standard semantics
such as the semantics for Rhapsody statecharts [HK04, HG97] in order to
clarify the situation. The syntax will be given while presenting our statechart
solution.
Event Processing
A queue is associated with each statechart. It enqueues all events arriving
at a statechart. The UML does not specify exactly in which order events are
removed, in order to leave room for, e.g. priority-based strategies. For our
formalisation we assume that events are enqueued immediately after they
are sent, in the order in which each statechart sent them. They
are dequeued in the same order as they were enqueued. In addition, events
can be prioritised, meaning that they will be enqueued at the head of the
queue. If several prioritised events are enqueued we do not make assumptions
about their order.
Events are processed one at a time, i.e. an event can only be processed if
the processing of the previous one has completed entirely (Run-to-Completion
Semantics). This includes all actions generated during the transition. If an
action is synchronous, then the transition is not completed until invoked
objects complete their own run-to-completion transitions.
An event can only trigger one transition. If several transitions are pos-
sible, one is arbitrarily selected except for hierarchical states where a prece-
dence mechanism is defined. If there is no corresponding transition for an
event it is ignored but not stored. In order to store an event, a state can
declare an event as deferred. Whenever the event occurs in that state, it is
put back in the queue.
An ignored event can be interpreted as normal behaviour or as illegal be-
haviour. Often, the latter semantics is associated with protocol statecharts.
Here, a synchronous method call event which cannot be dispatched is con-
sidered illegal (unless it is deferred). We will adopt this interpretation here.
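The event-processing rules assumed above (one queue per statechart, FIFO dequeuing, prioritised events at the head, run-to-completion, discarding events without a transition, and deferral) can be summarised in a small sketch. It is a flat statechart without guards, actions or hierarchy, and all names are hypothetical. Deferred events are kept aside and re-inserted at the head of the queue after the next state change; this is a simplification of the UML rule that deferred events are released once a state is entered that does not defer them. A strict variant following the interpretation adopted here would report an undispatchable synchronous call event as an error instead of merely logging it.

```java
import java.util.*;

// Hypothetical flat statechart illustrating the event-processing rules:
// one FIFO queue, prioritised events at the head, run-to-completion,
// unhandled events discarded, deferred events kept until the state changes.
class MiniStatechart {
    private final Map<String, Map<String, String>> transitions = new HashMap<>();
    private final Map<String, Set<String>> deferrable = new HashMap<>();
    private final Deque<String> queue = new ArrayDeque<>();
    private final List<String> deferred = new ArrayList<>();
    private String state;
    final List<String> log = new ArrayList<>();

    MiniStatechart(String initialState) { state = initialState; }

    void addTransition(String from, String event, String to) {
        transitions.computeIfAbsent(from, k -> new HashMap<>()).put(event, to);
    }

    void addDeferral(String inState, String event) {
        deferrable.computeIfAbsent(inState, k -> new HashSet<>()).add(event);
    }

    void send(String event) { queue.addLast(event); }
    void sendPrioritised(String event) { queue.addFirst(event); }
    String state() { return state; }

    // Dispatches one event at a time; each step completes before the next begins.
    void run() {
        while (!queue.isEmpty()) {
            String event = queue.pollFirst();
            String target = transitions.getOrDefault(state, Collections.emptyMap()).get(event);
            if (target != null) {
                log.add(state + " -" + event + "-> " + target);
                state = target;
                for (int i = deferred.size() - 1; i >= 0; i--) {
                    queue.addFirst(deferred.remove(i));   // retry in original order
                }
            } else if (deferrable.getOrDefault(state, Collections.emptySet()).contains(event)) {
                deferred.add(event);
                log.add("defer " + event);
            } else {
                log.add("discard " + event);
            }
        }
    }

    public static void main(String[] args) {
        MiniStatechart sc = new MiniStatechart("Idle");
        sc.addTransition("Idle", "start", "Busy");
        sc.addTransition("Busy", "done", "Idle");
        sc.addDeferral("Busy", "start");
        sc.send("start"); sc.send("start"); sc.send("done"); sc.send("bogus");
        sc.run();
        sc.log.forEach(System.out::println);
    }
}
```

In the usage example, the second start event arrives in state Busy, is deferred, and is dispatched after done has returned the statechart to Idle; bogus has no transition and is discarded.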
If the statechart is attached to an active object, e.g. a thread, the events
for that object can be processed by its thread, potentially concurrently with
event processing in other threads. Events for passive objects are processed
by a system thread. This is a semantic variation point of the UML. Having a
queue per active object simplifies avoidance of concurrency conflicts during
event processing.
Although we will use hierarchical statecharts and a history mechanism,
we do not rely on a specific transition selection mechanism because we use
only one level of nesting and no conflicting transitions.
Event Types
How event processing affects the sender of an event depends on the kind of
event. There are four kinds of events.
•An asynchronous signal event denotes the reception of an asynchronous
signal. The sender does not have to wait for the event to be processed.
This event type can have parameters.
•A synchronous call event denotes the reception of a request to
synchronously invoke an operation. The sender has to wait until the call
event is dequeued and the transition is completed. This event type
can have parameters.
•A change event is an event that occurs when a boolean expression
becomes true.
•A time event occurs after a specified elapse of time.
In the presence of event queues per active object, synchronous events can
cause a deadlock in the following way: A first thread sends a call event to a
second thread. The event is queued in the queue of the second thread. The
first thread becomes blocked waiting for the call to be handled. When the
call event arrives, the second thread is still processing an earlier event, during
which it sends a call event to the first thread. This event is queued but cannot
be handled because the first thread is already blocked. The second thread also
becomes blocked.
At present, there is no convincing solution for this problem. The seman-
tics of Rhapsody statecharts [HK04] specifies that call events are sent to the
receiver directly instead of queuing them. It is also specified that the receiver
serialises incoming concurrent calls and that they can only be handled when
the thread does not handle other events. This blocks the callers in a similar
way as in the UML semantics. Deadlocks generated by these semantics are
considered design errors. Both approaches, UML and Rhapsody, enforce
a kind of rendezvous concept on thread communication in statecharts, i.e.
both threads have to be available to handle a call event from one thread.
Both proposed statechart semantics for synchronous events differ from
what is actually happening during a method call in a programming language
like Java. Therefore, they are not suitable for documenting the exact be-
haviour of programming languages like Java.
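The queue-induced deadlock of synchronous call events described above can be mimicked in Java by giving each active object a single-threaded executor as its event queue and treating submit(...).get() as sending a synchronous call event. In the sketch below (names are hypothetical), both objects are busy processing an event when each sends a synchronous call to the other, so neither call can be dequeued. A timeout on the first call (200 ms, an arbitrary choice) is the only way out; once it expires, the first object completes its event and then serves the second object's call.

```java
import java.util.concurrent.*;

class SyncCallEventDeadlock {
    // Returns true if the first object's synchronous call could not be served
    // (timed out) because the callee was itself blocked in a call to the caller.
    static boolean firstCallTimesOut() {
        ExecutorService first = Executors.newSingleThreadExecutor();   // queue + thread of object 1
        ExecutorService second = Executors.newSingleThreadExecutor();  // queue + thread of object 2
        CountDownLatch bothBusy = new CountDownLatch(2);
        try {
            Future<Boolean> event1 = first.submit(() -> {
                bothBusy.countDown();
                bothBusy.await();                       // both objects are processing an event
                try {
                    second.submit(() -> "served").get(200, TimeUnit.MILLISECONDS);
                    return false;
                } catch (TimeoutException e) {
                    return true;                        // call event stuck in the callee's queue
                }
            });
            Future<Boolean> event2 = second.submit(() -> {
                bothBusy.countDown();
                bothBusy.await();
                try {
                    first.submit(() -> "served").get(5, TimeUnit.SECONDS);
                    return false;
                } catch (TimeoutException e) {
                    return true;
                }
            });
            boolean timedOut = event1.get();
            event2.get();                               // served after event1 has completed
            return timedOut;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            first.shutdownNow();
            second.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(firstCallTimesOut()); // prints true
    }
}
```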
Guards and Actions
A guard is a boolean expression which controls whether a transition for an
event will fire or not. A transition can have an associated action which
executes when the transition fires. It must execute completely before the
transition is completed. This includes synchronous method calls and synchronous
call events. Both guard and action can access parameters of the
event and attributes, links, and operations of the object which is associated
with the statechart. It is not specified whether a transition is atomic with
respect to guard and action, i.e. whether guard and action are carried out as
an atomic transaction on the respective data structures. Therefore, we will
not assume atomicity.
Actions can send call events and signal events to other objects. The
target has to be specified with the event itself. An event generated during
a transition is put in the queue of the target. If no target is specified, the
event is sent to the object which has issued the event, i.e. the implicit target
is "this". UML proposes a framework for action semantics and supports the
design of a suitable action language.
6.3.2 Classification of Thread Lifecycle Methods
Before we can design the states and transitions of the Java thread lifecycle, we
discuss the specific characteristics of those thread lifecycle methods whose
pre- and postconditions and intermediate execution states we want to model
as states.
Method calls are synchronous by definition. They can be further
characterised according to whether they are static or non-static.
There are also some system events which cause a method call to return
in place of an ordinary return. Both calls and system events are potential
events on transitions.
The potential candidates for events can be further characterised as fol-
lows.
1. A thread can invoke non-static methods of a thread object (including
itself). These calls are executed in the control flow of the calling thread.
When a method returns, the state of the calling thread, and possibly also
that of the called thread, may have changed.
Methods new(), start(), and interrupt() only change the state of the called
thread object. Method new() creates a thread object. Method start()
creates the corresponding control flow, and the thread object changes its
state to runnable. Methods new() and start() can only be invoked once
6.3. A STATECHART MODEL FOR THREAD SYNCHRONISATION103
for one thread object. Method interrupt() can be invoked concurrently
by several threads. It sets the thread's interrupt flag to interrupted.
Methods join() and wait() only change the state of the calling thread.
Likewise, calling a synchronized method or entering a synchronized block
only changes the state of the calling thread.
Methods notify/All() change the state of neither the calling nor the called
thread. The called thread object only serves to signal other waiting threads,
whose state may change as a result.
Methods join(), wait(), notify/All() and a synchronized method or block
can be invoked concurrently. Note, however, that wait() and notify/All()
also require the caller to hold the object's lock, i.e. to be inside a
synchronized method or block. Therefore, all methods except join() are in
fact mutually exclusive when invoked on the same thread object.
From the point of view of the thread object on which these methods
are called, these calls are also referred to as external calls.
With respect to wait(), notify/All() and synchronized methods or blocks, a
thread object on which they are called behaves in the same way as a passive
Java object. These methods are explained in more depth in item 3. In the
following we will not deal with them specifically for thread objects,
because the behaviour of thread objects regarding these methods can be
derived from the behaviour of ordinary objects. This is admissible because
these methods do not interfere with the thread-specific behaviour of thread
objects.
All other methods which can be invoked concurrently on a thread object
affect neither the lifecycle nor synchronisation. In order not to
make the statechart more complex than necessary, they are not
considered here.
2. A thread can invoke static methods of class Thread. These methods
affect not the state of the class but the calling thread itself.
A thread can invoke one of these methods while other threads concurrently
invoke non-static methods on its thread object.
Either the state of a thread changes upon return of the method, e.g.
for interrupted(), or the state of a thread changes during the execution
of the method and is reset upon return. For example, sleep()
causes a thread to change to sleeping until it returns from the call.
While a thread is executing method sleep(), it can react differently to
concurrent invocations from other threads. If method interrupt() is
called on a sleeping thread, the call to sleep() returns immediately by
throwing an InterruptedException.
3. The thread under consideration can invoke any non-static method on
any other Java object.
Calling the method wait() or entering a synchronized method or block has an
effect on the calling thread but not on the Java object. Methods notify/All()
have no effect on caller or callee but on other threads waiting in the queue
of the object.
In the case of wait() or synchronized, the thread changes its
state during the call and changes it again upon return. Again,
these intermediate states have to be modelled.
It is important to distinguish the state waiting because a thread can
leave this state in different ways: via timeout, via notification or via
interrupt(). It is also important to distinguish blocking on a synchronized
lock because it is a source of failures.
4. A thread changes state due to system events such as notifications,
unblocking or timeouts.
In a state in which a system event can occur, the thread is engaged in a
specific method call but cannot proceed from it unless other
threads make certain calls or a timeout expires.
Hence, these events are only concurrent with respect to calls from other
threads. They can never be concurrent with a method call from the
thread in which they occur because that thread is in a state where it
cannot make any more calls.
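The following small Java program illustrates several of the behaviours listed above using only the standard java.lang API: a second start() on the same thread object is rejected with an IllegalThreadStateException, interrupted() returns the interrupt flag and clears it, sleep() returns at once with an InterruptedException if the flag is already set (and the flag is cleared again), and wait() may only be called by a thread that holds the object's lock. The class and method names are ours, chosen only for illustration.

```java
/** Illustrates lifecycle rules from items 1 to 3 with java.lang.Thread (names are ours). */
class LifecycleRules {

    /** Item 1: start() may be invoked only once per thread object. */
    static boolean secondStartRejected() {
        Thread t = new Thread(() -> { });
        t.start();
        try {
            t.start();                          // second start() on the same object
            return false;
        } catch (IllegalThreadStateException expected) {
            return true;
        }
    }

    /** Item 2: interrupted() returns the flag and clears it. Returns {first, second}. */
    static boolean[] interruptedClearsFlag() {
        Thread.currentThread().interrupt();     // set own interrupt flag
        boolean first = Thread.interrupted();   // true, and the flag is now cleared
        boolean second = Thread.interrupted();  // false
        return new boolean[] { first, second };
    }

    /** Item 2: sleep() with the flag already set returns at once. Returns {threw, flagAfter}. */
    static boolean[] sleepWhileInterrupted() {
        Thread.currentThread().interrupt();
        boolean threw = false;
        try {
            Thread.sleep(10_000);               // does not actually sleep
        } catch (InterruptedException e) {
            threw = true;                       // the flag is cleared when this is thrown
        }
        return new boolean[] { threw, Thread.currentThread().isInterrupted() };
    }

    /** Item 3: wait() requires the lock of the object it is called on. */
    static boolean waitWithoutLockRejected() {
        Object monitor = new Object();
        try {
            monitor.wait(1);                    // caller does not own monitor's lock
            return false;
        } catch (IllegalMonitorStateException expected) {
            return true;
        } catch (InterruptedException unexpected) {
            return false;
        }
    }
}
```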
A statechart modelling the behaviour of one thread has to model how
it reacts to calls issued by its own control flow, how it reacts to calls from
concurrent threads on the corresponding thread object, and how it reacts to
system events for its own control flow.
6.3.3 Principles for Designing States and Transitions
In this section we derive a set of principles for designing states and
transitions from the above characteristics of methods for thread
synchronisation. These principles will be applied when we identify how the
behaviour of a method is described in terms of states and transitions.
The principles are chosen such that they support our main goal, which
is to use the statechart model to specify failures and potentials. Therefore,
the statechart must model not only the behaviour of the class Thread with
respect to synchronisation but also instance-specific behaviour
such as a dependency between two threads.
The goal of the statechart is to describe how a thread reacts to method
calls which it issues itself and which it receives from other threads. We do
not model how the method calls issued by a thread are generated. We make
the assumption that, when a thread of control works through the code, the
corresponding events are generated.
Designing Events
We have already pointed out that there are some key differences between
Java semantics on the one hand and UML statechart semantics on the other
hand. When a statechart is attached to a thread, the events sent to the thread
are queued. Because the thread is considered an active object, it can process
the queue using its own thread of control. If another thread synchronously
calls the thread and that call maps to a call event, the call (event) is
queued. By contrast, in Java the call would be executed immediately in
the thread of control of the calling thread. Semantics which do not queue
synchronous calls are not equivalent either, because they make multiple
concurrent calls mutually exclusive. It is not the goal of this thesis to
provide a general solution to this problem, either as an extension or as a
mapping algorithm. Instead, we will try to determine the most appropriate
solution for our Java synchronisation model.
Because of this mismatch, we will not rely on synchronous events.
Instead, we consider asynchronous events as an alternative. There are several
reasons why this is feasible.
First of all, we do not want to model the behaviour of arbitrary method
calls. Next, for some method calls we want to model intermediate execution
states, which is not possible with synchronous events but requires
splitting method calls into asynchronous enter and exit events. Also, we
have noted that there is more than one way to return from these states. This
can easily be modelled by several exit events.
For those method calls which we do not split in this way, we consider a
single asynchronous event appropriate. Such method calls, for example
interrupt(), always return immediately and have no return value. Therefore,
they are very similar to asynchronous events, except that Java does not
support asynchronous calls.
The justification will be made clearer for the different cases in the
following. When we model calls with asynchronous events, we also have to
consider that these events are queued, and we have to ensure that the queuing
does not yield semantics which are not feasible in Java.
By using asynchronous events we of course lose synchrony. We cannot
specify in the statechart that a certain exit event has to follow a method
enter event. However, we gain expressiveness with respect to those states we
are interested in for failure detection. There is no way to map a synchronous
call onto two asynchronous events by means of a statechart with the above
semantics. We therefore have to assume that the corresponding entry and exit events are
are generated.
Designing the Event-Queue
Following UML, we assume that there is a queue for each statechart, i.e. for
each object for which a statechart exists. We do not use priorities and we do
not use deferred events.
Each thread receives events caused by its own calls as well as events caused
by the calls of other threads. Both are put into the same queue. Therefore, we
use an interleaving semantics to model the concurrency semantics of Java,
which allows both interleaving and real parallelism. We assume that the system
generates the events for the calls which a thread executes in its own control
flow, that at most one such event is generated at a time, and that at most one
such event is in the queue at a time. This is correct, as the thread executes
its code sequentially, one piece after the other. We do not model how
these events are generated.
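The queue assumptions can be sketched as follows, assuming a simple run-to-completion dispatcher: one FIFO queue per statechart instance, no priorities, no deferred events, events from the thread's own control flow and from other threads share the same queue, and at most one event from the own control flow is queued at a time. The class and method names (EventQueue, sendExternal, sendOwn, dispatchNext) are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

/** One FIFO event queue per statechart instance (hypothetical sketch). */
class EventQueue {
    private static final class Entry {
        final String event;
        final boolean own;                 // generated by the thread's own control flow?
        Entry(String event, boolean own) { this.event = event; this.own = own; }
    }

    private final Deque<Entry> queue = new ArrayDeque<>();
    private int ownPending = 0;            // number of queued own-control-flow events

    /** Event caused by a call of another thread, e.g. IRPT. */
    synchronized void sendExternal(String event) {
        queue.addLast(new Entry(event, false));
    }

    /** Event caused by the thread's own call; at most one may be pending. */
    synchronized void sendOwn(String event) {
        if (ownPending > 0) {
            throw new IllegalStateException("own control flow already has a pending event");
        }
        ownPending++;
        queue.addLast(new Entry(event, true));
    }

    /** Removes the oldest event and handles it completely (no priorities, no deferral). */
    synchronized boolean dispatchNext(Consumer<String> statechart) {
        Entry e = queue.pollFirst();
        if (e == null) return false;
        if (e.own) ownPending--;
        statechart.accept(e.event);        // run to completion before the next event
        return true;
    }
}
```

Because both kinds of events end up in a single queue, a concurrent IRPT and the thread's own next call are processed in some sequential order, which is the interleaving semantics described above.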
Designing States
Designing states involves three key issues:
•identifying intermediate states, i.e. states describing the execution of
a method call,
•factoring out commonalities, i.e. hierarchical states and common tran-
sitions, and
•designing how instance-specific information per state has to be kept.
The last item takes us to the issue of the design of an action language.
Action Language
We have already noted several times that, in addition to the state of a
thread, there is also state-specific information which cannot be captured by
a statechart. This additional information is needed for two purposes:
•It constrains the executions of method calls.
•It can be used in expressing state-based dependencies between threads
which is needed for failures.
An action language provides the means to deal with such additional
information. For the first case, we need to be able to use that information
in guards, and therefore actions have to update it through suitable side
effects. For the second case, we need to be able to query that information
for the entire system state and to navigate it arbitrarily.
We assume that for each statechart (instance) a list of variables can be
specified. We use variables of type N (integers or natural numbers including
0) and of type set of N, also denoted as P(N). Each statechart (instance)
has its own data space and cannot access the data of another statechart.
We assume that the action language can perform assignments on integers
using simple arithmetic, and assignments on set types using standard set
operations.
We use the set N to denote unique identifiers of threads and objects.
Similarly, P(N) is used to denote sets of unique identifiers. This is based
on the assumption that the unique identifiers are either integers or can be
mapped uniquely to integers.
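A per-instance data space with variables of type N and P(N) could be represented in Java as shown below. This is only a sketch under the assumption that identifiers fit into a long and that 0 means "no identifier" (as in the assignment join:=0 used later); the class DataSpace and its methods are hypothetical. Each statechart instance owns its own object, so no statechart can read another's variables, and the methods correspond to assignments and guards of the action language.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical data space of one statechart instance: variables of type N and P(N). */
class DataSpace {
    long join = 0;                              // N: 0 means "no identifier"
    final Set<Long> locks = new HashSet<>();    // P(N): set of identifiers

    // join := t
    void assignJoin(long t) { join = t; }

    // locks := locks ∪ {ob}
    void addLock(long ob) { locks.add(ob); }

    // locks := locks - {ob}
    void removeLock(long ob) { locks.remove(ob); }

    // guard [ob ∈ locks]
    boolean holds(long ob) { return locks.contains(ob); }
}
```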
Direct Mapping of Method Calls to Asynchronous Events
Here we describe the one-to-one mapping of a synchronous call to an asyn-
chronous event.
This applies to the non-static methods new(), start(), interrupt() (see item
1.) and notify/All() (see item 3.). It also applies to the static method
interrupted() (see item 2.). Here we describe the effect of using an
asynchronous event and conclude by defining the corresponding event for the
resulting statechart.
•The external calls new() and start() both return immediately. Concerning
the thread on which start() is called, the queuing should not have an
effect, as this call by its nature does not compete with other calls. The
queuing even has the beneficial effect that concurrent calls which assume
that the thread has been started cannot take effect too early.
The two events introduced are NEW and START. They have no parameters.
•Methods notify/All() are mapped to single asynchronous events. They
have no influence on the caller and no return value. Calls to notify/All()
are mutually exclusive because they require synchronized. Therefore,
the queue is not a problem.
The two events defined are NOTIFY(ob) and NOTIFYALL(ob). They
have a parameter for the target object.
•We have chosen to handle interrupt() as an asynchronous event
to avoid deadlocks. This is admissible, as the method interrupt()
cannot block, because it is not synchronized, and it does not return
a value. Therefore, the interrupting thread usually does not depend on
synchronous handling. As a result of using an asynchronous event,
the calling thread does not wait longer than necessary; in fact, it does
not wait at all.
Regarding the queuing itself, we argue that each interrupt changes the
control flow state of the called thread and therefore has to be carried
out mutually exclusively with the other events handled.
The asynchronous event defined is IRPT.
•Method interrupted() is also mapped to an asynchronous event. This is
problematic, as it has a return value. In addition, the method changes
the state by clearing the interrupt flag. We can accept replacing the
method with an event if we argue that the return value has no effect on
synchronisation and we ignore it.
The asynchronous event defined is INTERRUPTED.
Table 6.1 gives an overview of the mapping of method calls to asynchronous
events, which are written in capital letters. Parameters are omitted in the
table.
Splitting Method Calls into Asynchronous Events
Here we describe the mapping of a synchronous call to a combination of an
asynchronous entry event, modelling the entering of the method call, and at
least one asynchronous exit event.
This applies to the non-static methods join() (see item 1.) and wait() (see
item 3.), to non-static methods declared with the keyword synchronized and to
blocks with the keyword synchronized (both item 3.), and to the static method
sleep() (see item 2.).
•The call to join() is split because it can be involved in failures and it
can return via interrupt, when the joined thread has finished, or via
timeout.
The entry event is JOIN(t) or the timed variant JOIN(t,d). The parameter
t is the target of the join, d is the duration of the timeout. The events
defined for return are EXIT, IRPT or a time event after d.
Mapping                             | Java methods/statements                         | Event name
------------------------------------+-------------------------------------------------+-------------------
Single asynchronous events          | new(), start()                                  | NEW, START
                                    | interrupt(), interrupted()                      | IRPT, INTERRUPTED
                                    | notify/All()                                    | NOTIFY, NOTIFYALL
Asynchronous entry events           | join(), join(d)                                 | JOIN
                                    | sleep()                                         | SLEEP
                                    | wait(), wait(d)                                 | WAIT
Asynchronous entry and exit events  | synchronized                                    | LOCK, UNLOCK
Asynchronous system events          | timeout after timed wait(d), join(d), sleep(d)  | after d
                                    | notification of a thread                        | NOTIFICATION
                                    | unblocking of a thread after lock release       | FREE
                                    | return from run()                               | RUN_EXIT
Table 6.1: Mapping Java Behaviour to Statechart Events
•The call to sleep() is split analogously, although it is never involved in
failures. It can return via timeout or via interrupt.
The defined entry event is SLEEP(d) with a parameter for the duration
of the sleep. Returns are via a time event after d or IRPT.
•The call to wait() is split. Waiting can be involved in failures and has
several possible ways of returning: via interrupt, timeout or notification.
As with notify/All(), the queue is not a problem.
The entry event is WAIT(ob) or WAIT(ob, d). The parameter ob is
the object on which the wait is called, d is the duration of the timed
variant. Return is via the events IRPT, after d, or NOTIFICATION.
•Intermediate states of calls to synchronized are involved in failures.
However, synchronized deserves a more complex treatment than simple
splitting. Firstly, the call itself is split into an explicit locking and
unlocking, because the release of a lock has an effect on other threads
which has to be modelled. Secondly, locking has to be differentiated
because it can involve an intermediate state, blocking, before the thread
obtains a lock and continues to run.
Calls to synchronized are split into the events LOCK(ob) and UNLOCK(ob)
with the parameter ob for the object on which they are called. When
a thread tries to obtain a lock via LOCK(ob), it has to be signalled by
the system via the event FREE(ob) once it can obtain the lock.
Because synchronized is mutually exclusive by definition, the queuing of
the corresponding events is no more restrictive than Java.
The resulting events are depicted in Table 6.1.
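The intermediate blocking state between LOCK(ob) and FREE(ob) is also visible through the standard Thread.getState() API, which reports Thread.State.BLOCKED for a thread waiting to enter a monitor. The following sketch (class and method names are ours) lets a second thread contend for a lock held by the caller, observes the contender's state, and then releases the lock so that the contender can proceed.

```java
/** Observes the Java-level counterpart of blocking&cleared (illustrative sketch). */
class BlockingDemo {

    /** Returns the state of a thread that tries to enter a monitor held by the caller. */
    static Thread.State stateWhileContending() {
        final Object lock = new Object();
        Thread contender = new Thread(() -> {
            synchronized (lock) {               // LOCK(lock): blocks while the caller holds it
                // FREE(lock) received: lock obtained, the thread runs on
            }
        });
        try {
            Thread.State observed;
            synchronized (lock) {               // the caller holds the lock
                contender.start();
                // Poll until the contender blocks on the monitor (give up after about 5 s)
                long deadline = System.currentTimeMillis() + 5_000;
                while (contender.getState() != Thread.State.BLOCKED
                        && System.currentTimeMillis() < deadline) {
                    Thread.sleep(1);
                }
                observed = contender.getState();
            }                                   // UNLOCK: releasing the lock unblocks the contender
            contender.join();
            return observed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```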
Direct Mapping of System Events
The system events (see item 4.) map directly to asynchronous events. We do
not model the system and therefore we do not model how system events are
generated. Some of the following events have already been introduced above.
•We model timeouts occurring after certain method calls with a time
event, denoted as after d, where d is the duration of the timeout.
•When the target thread of a join() terminates, an event EXIT is generated.
•When a thread calls notify() or notifyAll(), the system determines which
threads have to be notified and sends an event NOTIFICATION.
•After a synchronized-lock is released, one blocking thread has to be
signalled via a FREE event.
•Finally, a thread returns from run() via RUN_EXIT.
See the resulting events in Table 6.1.
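The events of Table 6.1, together with the EXIT event for a terminating join target, can be collected in a Java enum that also records how each Java behaviour is mapped. This is a hypothetical data structure for illustration only: the names ThreadEvent, ThreadEventKind and isSplitCall are ours, and the time event after d is represented by the constant AFTER.

```java
/** How a Java behaviour is mapped to statechart events (rows of Table 6.1). */
enum ThreadEventKind { SINGLE, ENTRY, ENTRY_AND_EXIT, SYSTEM }

/** Statechart events of Table 6.1 plus EXIT; AFTER stands for the time event "after d". */
enum ThreadEvent {
    NEW(ThreadEventKind.SINGLE), START(ThreadEventKind.SINGLE),
    IRPT(ThreadEventKind.SINGLE), INTERRUPTED(ThreadEventKind.SINGLE),
    NOTIFY(ThreadEventKind.SINGLE), NOTIFYALL(ThreadEventKind.SINGLE),
    JOIN(ThreadEventKind.ENTRY), SLEEP(ThreadEventKind.ENTRY), WAIT(ThreadEventKind.ENTRY),
    LOCK(ThreadEventKind.ENTRY_AND_EXIT), UNLOCK(ThreadEventKind.ENTRY_AND_EXIT),
    AFTER(ThreadEventKind.SYSTEM), NOTIFICATION(ThreadEventKind.SYSTEM),
    FREE(ThreadEventKind.SYSTEM), EXIT(ThreadEventKind.SYSTEM), RUN_EXIT(ThreadEventKind.SYSTEM);

    final ThreadEventKind kind;

    ThreadEvent(ThreadEventKind kind) { this.kind = kind; }

    /** True for events that stem from splitting one synchronous call into several events. */
    boolean isSplitCall() {
        return kind == ThreadEventKind.ENTRY || kind == ThreadEventKind.ENTRY_AND_EXIT;
    }
}
```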
6.3.4 The Resulting Statechart
The main goal is to show how the thread changes state due to calls it makes
itself and due to incoming calls from other threads. We have already de-
scribed how we define events for each method, statement, and system event
from the Java thread lifecycle.
Because of the complexity of the thread lifecycle we will introduce the
resulting statechart step by step. Each statechart models the behaviour of
one thread. In the following we will refer to this thread as the observed
thread in order to distinguish it from other threads it interacts with.
We first describe how the thread reacts to calls from other threads. Then
we describe how the thread reacts to self calls, i.e. we show how self calls
change the control flow state of the calling thread.
Note again that we omit all methods which by their nature have no influ-
ence on the lifecycle but provide handling of independent thread properties
such as methods to set/get the thread name, group, priority, or interrupt
flag.
The resulting statechart will describe the part of the behaviour of a Java
thread related to the thread lifecycle, especially including synchronisation.
Therefore, the statechart can be seen as providing operational semantics for
a set of Java method calls with respect to a limited set of thread states. The
statechart describes an interpreter which takes a thread state, an event, and
the state of the queue as input, and generates a (new) state as an output.
The following statecharts were created with the UML CASE tool Together
[Tog].
Reacting to External Calls
After its instantiation via NEW the thread is in the state created. After the
call of START it is in the state runnable (see Fig. 6.2). We have chosen
this label to express that it is up to the scheduler whether the
thread is actually running or not, although this has no effect on the
semantics presented here.
While start() returns immediately after starting the new thread, which then
executes run(), run() itself will not return immediately. Entering run() is
therefore modelled as the transition to a new state. We depict the return
with the event RUN_EXIT, which triggers the transition to state terminated.
This event covers ordinary return from method run() as well as exceptional
return for which no handler is found. For our purpose, it is not necessary to
differentiate between ordinary and exceptional return. When first entering
runnable, the observed thread enters cleared because its interrupt flag is not
set.
In a state like runnable, and also in other states to come, we have to
provide transitions with events for external calls. The absence of an event
means that such a call is illegal. The event NEW does not make sense since
new() can be called only once for one instance. Method start() could be
called again, but this is illegal.
Any thread can send an IRPT. The called thread has to change from
cleared to interrupted. When IRPT is received in state interrupted, the thread
remains in this state. Other effects of IRPT will be explained when sleeping,
joining, waiting, and locking are described.
External calls to synchronized methods or blocks, wait(), notify/All(), and
join() are not considered here, as they do not change the state of the called
thread. Instead, we consider these methods when the thread itself calls them
on other (thread) objects, and we model the effect which they have on
the calling thread.
Reacting to interrupted()
With a static self call interrupted(), the observed thread can change from
interrupted or cleared to cleared, depicted with the event INTERRUPTED
(see Fig. 6.2).
Reacting to sleep()
When the static method sleep(d) is called by the observed thread, the observed
thread sleeps for the specified amount of time, and then the call returns. The
call also returns when another thread calls interrupt() or when the interrupt
flag has already been set.
With the event SLEEP(d) the thread enters sleeping if it was in cleared.
It returns via after d or IRPT (see Fig. 6.2). If the thread was in interrupted,
it changes immediately to cleared.
We do not distinguish between ordinary return and exceptional return. If
a thread returns via an interrupt, the corresponding exception will be caught
by a handler at some point upward in the call stack, or the thread terminates.
If it is caught by a handler, the thread continues normally. Therefore, we
do not introduce a new substate of runnable but show the different events
on the transitions. In any case the thread will be in the state cleared when
it returns (see Fig. 6.2).

[Figure 6.2: Thread Lifecycle Statechart]
Reacting to join()
The non-static method join() can be called by an observed thread on another
thread, but it has an effect only on the calling thread, i.e. the observed
thread. We distinguish the states joining and timed joining (see Fig. 6.2)
because the thread can return from them in different ways.
The observed thread enters joining by JOIN(t), where t stands for the
target. On termination of the target thread the event EXIT is sent to each
joining thread. Also, the observed thread can return if an IRPT is sent. The
timed variant is depicted by JOIN(t, d). From this call the observed thread
can return via EXIT, by IRPT and by timeout after d.
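The different ways of leaving timed joining and joining can be reproduced with java.lang.Thread. In the sketch below (class and method names are ours), join(50) on a target that is still sleeping returns because the timeout expires, the analogue of after d, whereas the subsequent untimed join() returns only once the target has terminated, the analogue of EXIT. The target is interrupted merely so that the example finishes quickly.

```java
/** Contrasts leaving timed joining via timeout with leaving joining via EXIT (sketch). */
class JoinDemo {

    /** Returns {aliveAfterTimedJoin, aliveAfterJoin}. */
    static boolean[] run() {
        Thread target = new Thread(() -> {
            try {
                Thread.sleep(10_000);           // keeps the target alive for a long time
            } catch (InterruptedException e) {
                // interrupted: return from run(), i.e. RUN_EXIT of the target
            }
        });
        target.start();
        try {
            target.join(50);                    // JOIN(t,d): returns via "after d"
            boolean aliveAfterTimedJoin = target.isAlive();
            target.interrupt();                 // let the target terminate
            target.join();                      // JOIN(t): returns via EXIT
            return new boolean[] { aliveAfterTimedJoin, target.isAlive() };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```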
Reacting to wait() and notify/notifyAll()
Similar to joining, we distinguish the states waiting and timed waiting which
are entered via WAIT(ob) or WAIT(ob, d) from state cleared (see Fig. 6.2).
From both states, the observed thread can return via IRPT or via a NO-
TIFICATION generated by the system when another thread calls notify() or
notifyAll() on the corresponding object. From timed waiting, the thread can
also return via timeout after d.
In Fig. 6.2 it is shown that the thread then changes to blocking&cleared
because it first has to re-acquire the lock which it released upon entering
waiting. Locking will be explained in the next paragraph.
A single NOTIFICATION suffices to leave waiting or timed waiting. If there
are more notifications in the queue than waiting threads, the surplus
notifications cannot be matched and therefore have no further effect. This
corresponds to the Java semantics of notification.
If the observed thread receives WAIT(ob) or WAIT(ob,d) in state interrupted,
the wait returns immediately with an exception and the interrupt flag is
cleared, and therefore the thread enters cleared.
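This behaviour can be checked directly in Java. In the following sketch (class and method names are ours) the thread sets its own interrupt flag while holding the lock of ob and then calls ob.wait(); the call returns at once with an InterruptedException, and afterwards the flag is cleared, which corresponds to the transition to cleared.

```java
/** WAIT(ob) received in state interrupted: wait() returns at once and clears the flag. */
class InterruptedWaitDemo {

    /** Returns {waitThrew, flagAfterWait}. */
    static boolean[] waitWhileInterrupted() {
        Object ob = new Object();
        boolean threw = false;
        synchronized (ob) {                        // wait() requires the lock of ob
            Thread.currentThread().interrupt();    // observed thread is in state interrupted
            try {
                ob.wait();                         // would block forever without the interrupt
            } catch (InterruptedException e) {
                threw = true;                      // returns immediately with an exception
            }
        }
        return new boolean[] { threw, Thread.currentThread().isInterrupted() };
    }
}
```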
A thread can issue notifications only in state runnable. It reacts to them
by returning to its previous substate, which is modelled using the history state.
Reacting to synchronized-Calls
A synchronized method or synchronized block claims a lock for the specified
object in order to proceed. It proceeds if the lock is granted or blocks if
the lock is held by another thread. We have to distinguish whether
the thread was in state cleared or interrupted because it has to return to
the previous state when the lock is granted. We cannot use the history
mechanism because the thread can receive an IRPT while blocking on the
lock.
In a simple solution (see Fig. 6.2), the thread changes to state block-
ing&cleared when it receives an event LOCK(ob) in state cleared. This state
has two meanings. It is a state where the thread tests if the object is free.
If the object is free, an event FREE(ob) is sent, and the thread can leave the
state immediately. The state also has the meaning, that the thread is blocked
because the object is already locked when it tries to acquire it. Then it also
leaves via an event FREE(ob) which is sent when another thread releases the
lock and when the observed thread is the one granted the lock.
Analogously, when in state interrupted the thread changes to blocking-
&interrupted. Note that the thread can change from blocking&cleared to
blocking&interrupted due to IRPT.
Locks are released with the event UNLOCK(ob) in state runnable. After
this event, the thread returns to the previously held state, indicated by the
history state.
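The transitions of Fig. 6.2 described so far can be encoded as a transition table. The sketch below is our own reading of the figure under simplifying assumptions: event parameters are dropped, the time event after d is called TIMEOUT, the instantiation via NEW is represented by starting in CREATED, the history transitions for NOTIFY, NOTIFYALL and UNLOCK are modelled as self-loops on the current substate of runnable, and an event without a transition is rejected without changing the state. All type names are hypothetical.

```java
import java.util.EnumMap;
import java.util.Map;

/** Table-driven reading of the thread lifecycle statechart of Fig. 6.2 (sketch). */
final class ThreadLifecycle {
    /** CLEARED and INTERRUPTED are the substates of runnable. */
    enum State { CREATED, CLEARED, INTERRUPTED, SLEEPING, JOINING, TIMED_JOINING,
                 WAITING, TIMED_WAITING, BLOCKING_CLEARED, BLOCKING_INTERRUPTED, TERMINATED }

    /** Events without parameters; TIMEOUT stands for the time event "after d". */
    enum Event { START, RUN_EXIT, IRPT, INTERRUPTED, SLEEP, JOIN, TIMED_JOIN, WAIT, TIMED_WAIT,
                 TIMEOUT, EXIT, NOTIFICATION, LOCK, FREE, UNLOCK, NOTIFY, NOTIFYALL }

    private static final Map<State, Map<Event, State>> T = new EnumMap<>(State.class);

    private static void add(State from, State to, Event... events) {
        Map<Event, State> row = T.computeIfAbsent(from, k -> new EnumMap<>(Event.class));
        for (Event e : events) row.put(e, to);
    }

    static {
        add(State.CREATED, State.CLEARED, Event.START);
        for (State r : new State[] { State.CLEARED, State.INTERRUPTED }) {  // runnable
            add(r, State.INTERRUPTED, Event.IRPT);
            add(r, State.CLEARED, Event.INTERRUPTED);
            add(r, State.TERMINATED, Event.RUN_EXIT);
            add(r, r, Event.NOTIFY, Event.NOTIFYALL, Event.UNLOCK);        // history state
        }
        add(State.CLEARED, State.SLEEPING, Event.SLEEP);
        add(State.CLEARED, State.JOINING, Event.JOIN);
        add(State.CLEARED, State.TIMED_JOINING, Event.TIMED_JOIN);
        add(State.CLEARED, State.WAITING, Event.WAIT);
        add(State.CLEARED, State.TIMED_WAITING, Event.TIMED_WAIT);
        add(State.CLEARED, State.BLOCKING_CLEARED, Event.LOCK);
        // Flag already set: the blocking call returns at once and clears the flag
        add(State.INTERRUPTED, State.CLEARED,
            Event.SLEEP, Event.JOIN, Event.TIMED_JOIN, Event.WAIT, Event.TIMED_WAIT);
        add(State.INTERRUPTED, State.BLOCKING_INTERRUPTED, Event.LOCK);
        add(State.SLEEPING, State.CLEARED, Event.TIMEOUT, Event.IRPT);
        add(State.JOINING, State.CLEARED, Event.EXIT, Event.IRPT);
        add(State.TIMED_JOINING, State.CLEARED, Event.EXIT, Event.TIMEOUT, Event.IRPT);
        add(State.WAITING, State.BLOCKING_CLEARED, Event.NOTIFICATION, Event.IRPT);
        add(State.TIMED_WAITING, State.BLOCKING_CLEARED,
            Event.NOTIFICATION, Event.TIMEOUT, Event.IRPT);
        add(State.BLOCKING_CLEARED, State.CLEARED, Event.FREE);
        add(State.BLOCKING_CLEARED, State.BLOCKING_INTERRUPTED, Event.IRPT);
        add(State.BLOCKING_INTERRUPTED, State.INTERRUPTED, Event.FREE);
    }

    private State state = State.CREATED;    // reached via NEW

    State state() { return state; }

    /** Fires e; returns false and keeps the state if e has no transition here. */
    boolean fire(Event e) {
        Map<Event, State> row = T.get(state);
        State next = (row == null) ? null : row.get(e);
        if (next == null) return false;
        state = next;
        return true;
    }
}
```

A trace of events can be replayed with fire(); a false result marks an event that is illegal in the current state, such as a second START.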
The solution presented so far does not yet handle instance-specific
information such as which locks are held and how often each lock was entered,
i.e. re-entrance. Therefore, the solution does not support testing whether a
lock is held when calling wait() or notify/All(), modelled by the corresponding
events WAIT, NOTIFY, NOTIFYALL. It also does not distinguish in state
runnable whether a thread is running with locks or not. Also, merging
the testing for a free lock and the actual blocking into one state is not
really satisfactory.

[Figure 6.3: Thread Lifecycle Statechart with Guards and Actions]
Based on the statechart in Fig. 6.2 we cannot yet specify failures or
potentials because we cannot yet express which locks a thread is holding
and for which threads or objects it is waiting. The next paragraph describes
how the missing information can be added. We will also provide a more
appropriate solution for describing blocking and locking.
Adding Instance-Specific Information
The thread statechart is augmented by a definition of data fields holding
the instance-specific information needed to express dependencies involved in
failures and potentials.
We first describe the straightforward extension to capture dependencies
involved in joining. We need to keep track of the target of a thread in state
joining or timed joining. Therefore, the thread statechart has a field join :
N of type integer (also depicted in the field declaration below). On entering
the above states, the target thread is stored with the action join:=t. Target t
is obtained from the event parameter. When returning, the target is cleared
with join:=0. This extension is depicted in the statechart in Fig. 6.3.

[Figure 6.4: Object Synchronisation Behaviour Statechart]
The following code snippet declares the data fields used by a thread stat-
echart. Note that for each thread instance, there is a separate instance of all
these fields. The thread identifier is stored in the field self. The remaining
fields will be explained in the following.
// data fields for thread statechart
self : N // unique thread identifier
join : N // target thread of a join call
ob : N // target object of a wait call
acquire : N // target object of a blocking synchronized call
locks : P(N) // set of locks held by the thread
num : N // number of locks to be acquired during re-entrance
Next we deal with blocking and locking. Here we have to keep track of
the following details:
•We have to track the state of a lock object, either locked or unlocked.
•We have to track for each lock the owner and vice versa.
•We have to count re-entrance of locks.
That means, for each LOCK(ob) event we have to check whether the object
is already locked and, if so, by which thread. Depending on the result, we
change either to state runnable or to blocking&cleared. The check must be
mutually exclusive with checks of other concurrent threads for the same lock
object. As threads do not share data, this can only be modelled through a
statechart for lock objects. Also, the checking and the obtaining of a free
lock must be atomic.
As this involves communication with the object's statechart, it cannot
be modelled with a single transition in the thread statechart. In addition,
the object statechart must guarantee that this communication is atomic.
Therefore, we propose a solution where a thread first tries to grab a lock
and in a second step checks whether it was successful or not. Depending on
the result, it enters different states. This is implemented using an additional
state testing (see Fig. 6.3). On the transition from cleared to testing, the
thread sends an event TEST to the object it wants to lock and then waits
for the answer of the object. The thread can receive a NO, meaning that
the object is already locked by another thread. Then it enters blocking&cleared
and stores the object it tries to acquire in the field acquire : N using
acquire:=ob. The thread can receive an OK, meaning that it has locked the
object successfully. Then it adds the lock identifier to the set of its locked
objects held in the field locks : P(N) with the action locks:=locks ∪ {ob}. It
also resets the field acquire with acquire:=0. Note that re-entrance is counted
in the object itself. The thread can leave blocking when it receives a RETRY
generated by the lock object. It then sends a TEST again.
Again we require a separate state testing&interrupted for a thread which
has been interrupted. The transitions are the same and have already been
explained.
The thread releases locks via the event UNLOCK. This event can now be
guarded using [o in locks]. (Note that in the diagrams generated with the
CASE tool Together the symbol ∈ is depicted as "in".) Here, too, communication
with the object statechart is required to indicate the release of the
lock using event RELEASE(ob). The object statechart checks whether the
lock is finally released. If so, the object sends an event RELALL and the
thread removes the lock from its set using locks:=locks - {ob}. Otherwise,
the object sends a REL event. The thread then re-enters via the history state to
return to its state prior to the unlocking.
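The thread-side bookkeeping of this two-step protocol can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not part of the model: the class LockingThread, the enums LockReply and LockingState and all method names are hypothetical, 0 stands for "no object" as in the initialisation of the data fields, and actually sending TEST and RELEASE to the lock object is left to the caller.

```java
import java.util.HashSet;
import java.util.Set;

/** Replies of a lock object to TEST and RELEASE (event names from the statechart). */
enum LockReply { OK, NO, REL, RELALL }

/** The few control flow states this sketch needs. */
enum LockingState { RUNNABLE, TESTING, BLOCKING_CLEARED, UNLOCKING }

/** Thread-side data fields and the transitions that update them (hypothetical sketch). */
class LockingThread {
    final int self;                               // identifier of this thread
    final Set<Integer> locks = new HashSet<>();   // locks : P(N)
    int acquire = 0;                              // acquire : N, 0 means "none"
    LockingState state = LockingState.RUNNABLE;
    private LockingState history;                 // stands in for the history state

    LockingThread(int self) { this.self = self; }

    /** LOCK(ob): enter testing; TEST(ob) would be sent to the object here. */
    void lock(int ob) { state = LockingState.TESTING; }

    /** Answer to TEST(ob) while in testing. */
    void onTestReply(int ob, LockReply reply) {
        if (reply == LockReply.OK) {              // granted: first access or re-entrance
            locks.add(ob);                        // locks := locks ∪ {ob}
            acquire = 0;                          // acquire := 0
            state = LockingState.RUNNABLE;
        } else if (reply == LockReply.NO) {       // held by another thread
            acquire = ob;                         // acquire := ob
            state = LockingState.BLOCKING_CLEARED;
        }
    }

    /** RETRY from the lock object: test again; returns the object to send TEST to. */
    int onRetry() {
        state = LockingState.TESTING;
        return acquire;
    }

    /** UNLOCK(ob) guarded by [ob in locks]; returns false if the guard does not hold. */
    boolean unlock(int ob) {
        if (!locks.contains(ob)) return false;
        history = state;
        state = LockingState.UNLOCKING;           // RELEASE(ob) would be sent here
        return true;
    }

    /** Answer to RELEASE(ob) while in unlocking. */
    void onReleaseReply(int ob, LockReply reply) {
        if (reply == LockReply.RELALL) locks.remove(ob);   // locks := locks - {ob}
        state = history;                                   // return via the history state
    }
}
```

Keeping the re-entrance counter out of the thread mirrors the choice described above: the thread only records which locks it holds, and the object decides between REL and RELALL.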
Note that for a complete model it would be necessary to introduce, for
each of the states unlocking, testing, and testing&interrupted, an additional
transition for receiving IRPT, which could happen while the thread is in these
states. We have omitted this in order to keep the statechart readable. Alternatively,
one could assume that the events from the communication with the
object are prioritised. For convenience we have grouped all states related to
blocking into a super state blocking. This avoids having to distinguish states
when describing failures involving blocking.
The lock object behaviour relevant for the above described communica-
tion with the thread statechart is depicted by the statechart in Fig. 6.4. It
is associated with the following data.
// data fields for object statechart
locker : N // thread currently locking the object
n : N // re-entrance counter, i.e. number of times thread has locked the object
The object has a field for storing the thread which locks the object and
it has a field for counting the re-entrance of that thread. Threads blocking
at the object do not have to be stored because the model does not cover the
generation of notifications. Instead it assumes that NOTIFICATION events
are generated by the system.
The statechart from Fig. 6.4 distinguishes states unlocked and locked. In
unlocked, the object grants access to the first thread whose TEST event it
receives. The object handles TEST events in mutual exclusion and therefore
the object is locked by only one thread at a time. The object stores this thread and
sends an OK to it. When this thread tries a re-entrance, access is granted.
Other threads are sent a NO. When the thread releases the lock, the object
checks for final release and sends messages correspondingly. Note that the
same behaviour as for the object also applies to thread objects as they can
serve as locks, too. Here we have not modelled this aspect of threads as this
would make the resulting statechart too complicated. The missing behaviour,
as we have argued before, is completely orthogonal to the thread behaviour
specified so far.
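The object side can be sketched in the same way. In this hypothetical Java sketch (class LockObject and enum ObjectReply are illustrative names), locker == 0 represents state unlocked, thread identifiers are assumed to be positive, and the synchronized methods stand in for the mutually exclusive handling of TEST and RELEASE events. The sketch does not generate RETRY events.

```java
/** Replies of the lock object statechart (event names from the statechart). */
enum ObjectReply { OK, NO, REL, RELALL }

/** Lock object with states unlocked (locker == 0) and locked (hypothetical sketch). */
class LockObject {
    int locker = 0;   // locker : N, thread currently locking the object, 0 = unlocked
    int n = 0;        // n : N, re-entrance counter

    boolean isLocked() { return locker != 0; }

    /** TEST from thread t; synchronized models mutually exclusive handling of TEST events. */
    synchronized ObjectReply test(int t) {
        if (locker == 0 || locker == t) {  // unlocked, or re-entrance by the owner
            locker = t;
            n++;
            return ObjectReply.OK;
        }
        return ObjectReply.NO;             // locked by another thread
    }

    /** RELEASE from the owner t: RELALL on the final release, REL otherwise. */
    synchronized ObjectReply release(int t) {
        if (locker != t) throw new IllegalStateException("thread " + t + " does not hold the lock");
        if (--n == 0) {
            locker = 0;                    // back to state unlocked
            return ObjectReply.RELALL;
        }
        return ObjectReply.REL;
    }
}
```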
The extensions also require changes to notifications and waiting.
We can use the information to guard the events WAIT, NOTIFY, and NOTIFYALL
with [o in locks]. On entry, the lock has to be completely released by
sending a RELWAIT event to the object. The object has to acknowledge the
release by a RELOK(n) providing the re-entrance number.
In addition, states waiting and timed waiting are linked with the new
state testing (see Fig. 6.3). It is entered via events NOTIFICATION, IRPT, or
after d. On this transition, the event TEST is sent to the acquired object. Therefore,
the object reference has to be stored when entering waiting or timed waiting.
This is done by an assignment to an additional field ob:=ob.
Method wait() or the timed variant wait(d) can only be called when the
corresponding synchronized lock is held by the thread; this is already checked
when the call is generated. We have to keep track of the fact that the thread
has to release the lock completely (and regain the lock after returning from
wait). When a thread releases a lock on entering wait(), the object statechart
sends the re-entrance count to that thread and releases its lock completely.
From state releasing we would also need a transition for an IRPT, which we
omit in analogy to the states unlocking, testing etc. When a thread
returns from waiting it claims the lock as many times as it held it before. Either
these claims are granted or the thread enters blocking&cleared. From there, a thread returns
by claiming the lock a fixed number of times. For a thread which claims the
lock for the first time, the number is set to 1. Leaving waiting we can reuse
the existing state testing. An event TEST with an additional parameter for
the required re-entrance number is sent. Only if the corresponding object
is unlocked does the thread enter runnable, with the action that it acquires the
lock as many times as it held it before. If the corresponding object is locked, the
thread changes to blocking&cleared. Likewise, on the time event after d, a thread
in timed waiting changes to blocking&cleared if the corresponding object is locked. If it is unlocked, the
thread changes to runnable with the same actions taken as for notification.
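The RELWAIT/RELOK(n) exchange and the TEST carrying a re-entrance number can be illustrated with a small hypothetical lock model. The class WaitableLock and its methods are assumptions for illustration only; a boolean result stands for OK (true) or NO (false), and 0 again stands for "unlocked".

```java
/**
 * Hypothetical lock model extended by RELWAIT/RELOK(n) and by a TEST that
 * claims the lock num times at once; thread identifiers are positive.
 */
class WaitableLock {
    int locker = 0;   // thread currently locking the object, 0 = unlocked
    int n = 0;        // re-entrance counter

    /** Ordinary TEST from thread t: a single claim. */
    synchronized boolean test(int t) { return test(t, 1); }

    /** TEST(t, num): grant num claims if unlocked or owned by t (OK), otherwise NO. */
    synchronized boolean test(int t, int num) {
        if (locker != 0 && locker != t) return false;   // NO: locked by another thread
        locker = t;
        n += num;                                       // acquire the lock num times
        return true;                                    // OK
    }

    /** RELWAIT from owner t on entering wait(): release completely, answer RELOK(n). */
    synchronized int relWait(int t) {
        if (locker != t) throw new IllegalStateException("wait() without holding the lock");
        int count = n;
        locker = 0;
        n = 0;
        return count;                                   // re-entrance number sent in RELOK
    }
}
```

A waiting thread would store the returned count and, after NOTIFICATION, IRPT or the timeout, send test(t, count); a false result corresponds to entering blocking&cleared.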
Note that other solutions are also conceivable. For modelling re-entrance,
one could also maintain a re-entrance counter in the thread data, one for
each lock in the set of locks. Here, our goal was to keep the data types as
simple as possible.
Limitations of UML Statecharts
At present, the formal semantics of UML statecharts has not yet been
completely defined. Of course, there will not be only one formal semantics;
rather, different semantics will concretise some of the extension points of the UML
[HK04].
We do not want to contribute to the area of defining formal semantics of
UML statecharts. Instead we are interested in using visual formalisms for
describing models useful for understanding a certain domain. Nevertheless,
we have tried to keep the statechart simple, especially with respect to the
action languages, so that it would be straightforward to translate it into a
simpler state-based formalism such as Labelled Transition Systems as used
in [SHS03] or Kripke structures [LMM99] for which formal semantics ex-
ist. Also, the hierarchical states and the history mechanism could be easily
transformed into non-hierarchical states.
6.3.5 System State and History
The statechart provides the lifecycle for one thread. In order to formally
define failures, we have to formalise the states of all threads involved, as a
failure has been defined as a system state.
We intend to simulate a system of concurrent threads by instantiating
the statechart for each thread of the system.
120 CHAPTER 6. A MODEL OF THREAD SYNCHRONISATION
The set of all objects existing throughout the whole system lifetime is
denoted by the set of natural numbers N. Each number is used only once.
This set includes also thread objects. The set of all control flow states of
threads is denoted by C.
C={created, runnable, cleared, interrupted, terminated, sleeping,
joining, timedjoining, waiting, timedwaiting, blocking,
blocking&cleared, blocking&interrupted, unlocking, releasing,
timedreleasing, testing, testing&interrupted}
We do not need to formalise the object states for the description of failures
and potentials because it is sufficient to refer to objects through the instance-
specific information of each thread. We define the system state as a mapping
of the set of objects to the set of control flow states.
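Definition 6.1 (System State) A system state is a partial function
$$\mathit{sys} : \mathbb{N} \to C \cup \{\bot\}$$
with
$$n \mapsto \begin{cases} c \in C, & \text{if } n \text{ is a thread and the thread is alive,}\\ \bot, & \text{if } n \text{ is a thread that is not alive.} \end{cases}$$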
Note that only objects that are threads are mapped to thread states. The
bottom symbol denotes that a thread does not have a system state, because
it is not yet created or already deleted. The state of a thread also includes
the instance-specific values which are stored in the Java data structure. For
each thread state there is a snapshot of this data. We assume that the data
fields can be accessed in a given system state using the field names.
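As a data structure, the partial function sys can be represented by a map from thread identifiers to snapshots. The following Java sketch is hypothetical: the names ControlFlowState, ThreadSnapshot and SystemState are illustrative, state names are adapted to Java identifiers, only the fields acquire, join and locks used in Sect. 6.4 are kept, and an empty Optional encodes ⊥.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** The set C of control flow states (names adapted to Java identifiers). */
enum ControlFlowState {
    CREATED, RUNNABLE, CLEARED, INTERRUPTED, TERMINATED, SLEEPING,
    JOINING, TIMEDJOINING, WAITING, TIMEDWAITING, BLOCKING,
    BLOCKING_CLEARED, BLOCKING_INTERRUPTED, UNLOCKING, RELEASING,
    TIMEDRELEASING, TESTING, TESTING_INTERRUPTED
}

/** A thread's control flow state plus a snapshot of some instance-specific fields. */
record ThreadSnapshot(ControlFlowState state, int acquire, int join, Set<Integer> locks) {}

/** sys : N -> C ∪ {⊥}; the domain contains only thread identifiers. */
class SystemState {
    private final Set<Integer> threads = new HashSet<>();            // domain of sys
    private final Map<Integer, ThreadSnapshot> alive = new HashMap<>();

    /** A thread that is known but not yet created maps to ⊥. */
    void register(int thread) { threads.add(thread); }

    void update(int thread, ThreadSnapshot s) { threads.add(thread); alive.put(thread, s); }

    /** A deleted thread maps to ⊥ again. */
    void delete(int thread) { alive.remove(thread); }

    /** Optional.empty() encodes ⊥; non-threads lie outside the partial domain. */
    Optional<ThreadSnapshot> sys(int n) {
        if (!threads.contains(n)) throw new IllegalArgumentException(n + " is not a thread");
        return Optional.ofNullable(alive.get(n));
    }
}
```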
The execution history of a system is a sequence of events generated by
the statecharts of all threads of the system. The set of possible event occurrences
is denoted by E. Here, we need occurrences and not types, because
we need access to the actual parameters of events. Set E is infinite because
of the infinite possibilities of actual parameters. Event occurrences are derived
from the event types given in Table 6.1 and by the additional event
types defined while constructing the statechart in Fig. 6.3. The event types
are instantiated with potential parameter values. All parameters are natural
numbers. Depending on the signature of the event defined in our statechart
model, a number may identify an instance or a timeout.
$$E := \{\mathit{IRPT}, \mathit{LOCK}(1), \mathit{UNLOCK}(1), \ldots\}$$
The execution history also keeps track of the threads which generated the
events. An event in the history is therefore a tuple (e, n) where e ∈ E is an
event occurrence and n ∈ N represents the reference to a thread instance.
The chosen definition of an execution history provides a total order. That
means, concurrent events are eventually serialised. The result is an order with
interleaving semantics. This is sufficient for our purpose. The history is
a mapping from a point in time to an event in the history. The time axis is
represented as a set of natural numbers, denoted as I (for index).
Definition 6.2 (History) The history is a partial function
$$\mathit{his} : I \to (E \times \mathbb{N})$$
with
$$i \mapsto (e, n), \text{ also denoted as } e_n, \text{ if event } (e, n) \text{ happens at the point in time represented by } i.$$
Here we have defined a history simply as a trace of events but not as
a sequence of states. For our purpose, this will be sufficient. The above
mapping is partial because a program might generate only a finite number
of events.
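A history in this sense can be sketched as an append-only list whose positions play the role of the index set I. In the following hypothetical Java sketch (EventOccurrence, HistoryEntry and History are illustrative names), the synchronized append serialises events from concurrent producers into one total order, and his(i) returns nothing for indices beyond the events generated so far, reflecting that the mapping is partial.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** An event occurrence: an event type with its actual natural-number parameters. */
record EventOccurrence(String type, List<Integer> args) {}

/** An entry (e, n) of the history: event occurrence e generated by thread n. */
record HistoryEntry(EventOccurrence event, int thread) {}

/** his : I -> E x N as an append-only list; list positions serve as indices. */
class History {
    private final List<HistoryEntry> entries = new ArrayList<>();

    /** Appends an event; synchronized so that concurrent producers are serialised. */
    synchronized int append(EventOccurrence e, int thread) {
        entries.add(new HistoryEntry(e, thread));
        return entries.size() - 1;                      // index i of the new event
    }

    /** Partial: empty for indices beyond the events generated so far. */
    synchronized Optional<HistoryEntry> his(int i) {
        return i >= 0 && i < entries.size() ? Optional.of(entries.get(i)) : Optional.empty();
    }
}
```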
Note that the term trace as used here refers to a different kind of trace
than in the motivation. There, the term stands for a trace
generated from a running Java program, where an event has a different format,
consisting of the method name, parameter names and values, and
much more. Therefore, it should not be confused with the execution history
based on statechart events as defined here. The trace we have defined here
can be seen as an abstract model of the execution of a Java program and
thus as an abstraction of the trace as defined in the motivation.
In Sect. 6.4, the above formalisation will be used to specify failures
and potentials in a declarative way. We assume that for each thread or
object created, an instance of the corresponding statechart exists. We further
assume that the data fields associated with each statechart are initialised.
The self field of the thread-specific data has to be set to the identifier (natural
number) of the corresponding thread. All other fields of type N should be
set to zero. Fields of type P(N) are initialised with empty sets.
6.4 Formalising Concurrent Java Liveness Failures and Potentials
Failures which involve dependencies will be formalised using the system state
formalisation (see Definition 6.1) because dependencies can only be expressed
with this kind of formalisation.
Failures not involving dependencies, as well as potentials, are more conveniently
formalised by the notion of an execution history (see Definition 6.2).
6.4.1 Formal Description of Failures
We formalise the failures which have been informally introduced in Chapter 5.
There, typical failure scenarios were depicted by a sequence diagram. Here,
we describe a formalisation which aims at matching all possible scenarios of
one kind of failure in a running Java program.
In the following we do not consider the timeout variants of events and
states, i.e. the states from which one can return with a timeout, because including
them would make the specifications less understandable. In each of the
following specifications, an untimed variant can be replaced by its timed counterpart.
There are two failures which are not formalised. Indefinite blocking on
synchronized (F3) is not covered because there is no pattern, besides detecting
a blocked thread, that would allow it to be detected. Detecting it can only rely
on a heuristic, namely a timeout after which the blocking is assumed to be a failure.
Missing notification (F5) cannot be covered because it is related to source
code.
Deadlock (F1, F2)
Here we give a deadlock definition for two threads. The definition can be
extended to match three, four, or more threads. Note that the definition
also covers the wait-induced deadlock.
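$$\exists t_i \in \mathbb{N}\; \exists t_j \in \mathbb{N} : t_i \neq t_j \wedge \mathit{sys}(t_i) = \mathit{blocking} \wedge \mathit{sys}(t_j) = \mathit{blocking} \wedge t_i.\mathit{acquire} \in t_j.\mathit{locks} \wedge t_j.\mathit{acquire} \in t_i.\mathit{locks} \tag{6.1}$$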
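Formula (6.1) can be evaluated directly on a snapshot of the system state. The following Java sketch is a hypothetical illustration: the enum St, the record Snap and the map representation of sys are assumptions, BLOCKING stands for the super state blocking, and only the fields acquire and locks are kept. It checks every ordered pair of distinct threads, which is exactly the two-thread case of the definition.

```java
import java.util.Map;
import java.util.Set;

/** Reduced set of control flow states; BLOCKING stands for the super state. */
enum St { RUNNABLE, BLOCKING, WAITING, JOINING }

/** Snapshot of one thread: state plus the fields acquire and locks. */
record Snap(St state, int acquire, Set<Integer> locks) {}

class DeadlockCheck {
    /** Formula (6.1): two distinct blocking threads, each acquiring a lock the other holds. */
    static boolean twoThreadDeadlock(Map<Integer, Snap> sys) {
        for (Map.Entry<Integer, Snap> a : sys.entrySet())
            for (Map.Entry<Integer, Snap> b : sys.entrySet()) {
                if (a.getKey().equals(b.getKey())) continue;     // t_i ≠ t_j
                Snap ti = a.getValue(), tj = b.getValue();
                if (ti.state() == St.BLOCKING && tj.state() == St.BLOCKING
                        && tj.locks().contains(ti.acquire())
                        && ti.locks().contains(tj.acquire()))
                    return true;
            }
        return false;
    }
}
```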
Missed Notification (F4)
This failure involves one thread not being able to make progress because it
is in state waiting. This thread is dependent on other threads because it
is waiting for a notification signal. Assuming single notifications, a missed
notification is a wait() not followed by a notify(). Either there has been
only a notify() before but no other wait(), or there have been two subsequent
occurrences of notify() with no wait() in between.
Because the failure deals with calling notify() on an object and not with
a thread receiving the corresponding notification after the system has determined
the threads to be notified, we describe a formula based on the event
NOTIFY, not on the event NOTIFICATION. We assume a total ordering of the
history trace, with the smallest index denoting the first and hence the
oldest event. We omit the type of the variables in the following formula
to improve readability. Variables s, t, u ∈ N etc. denote threads, o, p ∈ N
etc. denote objects, and g, h, i ∈ I etc. denote indices from the time axis.
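$$\begin{aligned}
&\exists m\, \exists s\, \exists o : \mathit{his}(m) = \mathit{WAIT}(o)_{s}! \wedge \forall t \neq s\; \forall n > m\, \big(\neg(\mathit{his}(n) = \mathit{NOTIFY}(o)_{t}!)\big)\\
&\wedge \Big( \big(\neg(\exists l < m\; \exists t : \mathit{his}(l) = \mathit{WAIT}(o)_{t}!) \wedge \exists k < l\; \exists u \neq s : \mathit{his}(k) = \mathit{NOTIFY}(o)_{u}!\big)\\
&\quad \vee \big(\exists k < m\; \exists t \neq s : \mathit{his}(k) = \mathit{NOTIFY}(o)_{t}!\\
&\qquad \wedge \forall u \neq s\; \forall l : k < l < m \Rightarrow \neg(\mathit{his}(l) = \mathit{NOTIFY}(o)_{t}!)\\
&\qquad \wedge \exists i < k\; \exists v \neq s : \mathit{his}(i) = \mathit{NOTIFY}(o)_{v}!\\
&\qquad \wedge \forall w\; \forall j : i < j < k \Rightarrow \neg(\mathit{his}(j) = \mathit{NOTIFY}(o)_{w}!)\\
&\qquad \wedge \neg(\exists x\; \exists h : i < h < k \Rightarrow \mathit{his}(h) = \mathit{WAIT}(o)_{x}!)\big)\Big)
\end{aligned} \tag{6.2}$$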
In order to identify that the thread has missed a notification we must
deal with different cases. The first line of equation 6.2 states that there is
a wait() not followed by a notify() from a different thread. The second line
states the special case that this wait() is preceded by at least one notify() and
no other wait(). The remaining lines match the two closest occurrences of
notify() which happened before the wait(). The last line states that there has
not been a wait() between the first (in execution order) of the two occurrences
of notify() and the observed wait().
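The case analysis can be turned into a check over a recorded history. The following Java sketch follows the informal reading given in the text (no later notify() from another thread; then either no earlier wait() but an earlier notify(), or the two closest preceding notify() calls with no wait() between them) rather than transcribing (6.2) term by term. The names Kind, Ev and MissedNotification are hypothetical, and only WAIT and NOTIFY events are modelled.

```java
import java.util.List;

enum Kind { WAIT, NOTIFY }

/** One history entry: kind of call, object o, and generating thread. */
record Ev(Kind kind, int obj, int thread) {}

class MissedNotification {
    /** True if the wait() at index m matches the informal pattern described for (6.2). */
    static boolean at(List<Ev> his, int m) {
        Ev w = his.get(m);
        if (w.kind() != Kind.WAIT) return false;
        int o = w.obj(), s = w.thread();
        // First line: no later notify() on o from a different thread.
        for (int n = m + 1; n < his.size(); n++) {
            Ev e = his.get(n);
            if (e.kind() == Kind.NOTIFY && e.obj() == o && e.thread() != s) return false;
        }
        // Case 1: no earlier wait() on o, but an earlier notify() on o by another thread.
        boolean earlierWait = false, earlierNotify = false;
        for (int l = 0; l < m; l++) {
            Ev e = his.get(l);
            if (e.obj() != o) continue;
            if (e.kind() == Kind.WAIT) earlierWait = true;
            else if (e.thread() != s) earlierNotify = true;
        }
        if (!earlierWait && earlierNotify) return true;
        // Case 2: the two closest notify() calls on o before m (indices i < k) ...
        int k = -1, i = -1;
        for (int l = m - 1; l >= 0 && i < 0; l--) {
            Ev e = his.get(l);
            if (e.kind() != Kind.NOTIFY || e.obj() != o) continue;
            if (k < 0) k = l; else i = l;
        }
        if (i < 0 || his.get(k).thread() == s || his.get(i).thread() == s) return false;
        // ... with no wait() on o between them.
        for (int h = i + 1; h < k; h++) {
            Ev e = his.get(h);
            if (e.kind() == Kind.WAIT && e.obj() == o) return false;
        }
        return true;
    }
}
```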
Nested Monitor Lockout (F6)
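$$\exists t_i \in \mathbb{N}\; \exists t_j \in \mathbb{N} : t_i \neq t_j \wedge \mathit{sys}(t_i) = \mathit{waiting} \wedge \mathit{sys}(t_j) = \mathit{blocking} \wedge t_j.\mathit{acquire} \in t_i.\mathit{locks} \tag{6.3}$$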
Circular Join (F7)
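$$\exists t_i \in \mathbb{N}\; \exists t_j \in \mathbb{N} : t_i \neq t_j \wedge \mathit{sys}(t_i) = \mathit{joining} \wedge \mathit{sys}(t_j) = \mathit{joining} \wedge t_i.\mathit{join} = t_j \wedge t_j.\mathit{join} = t_i \tag{6.4}$$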
Self Join (F8)
$$\exists t \in \mathbb{N} : \mathit{sys}(t) = \mathit{joining} \wedge t.\mathit{join} = t \tag{6.5}$$
Join-induced Deadlock (F9)
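$$\exists t_i \in \mathbb{N}\; \exists t_j \in \mathbb{N} : t_i \neq t_j \wedge \mathit{sys}(t_i) = \mathit{joining} \wedge \mathit{sys}(t_j) = \mathit{blocking} \wedge t_i.\mathit{join} = t_j \wedge t_j.\mathit{acquire} \in t_i.\mathit{locks} \tag{6.6}$$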
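The state-based specifications (6.3) to (6.6) differ only in the states required and in which instance-specific fields are compared, so they can share one pair-enumeration helper. The following Java sketch is a hypothetical illustration (CState, TSnap and StateFailures are illustrative names; BLOCKING stands for the super state blocking).

```java
import java.util.Map;
import java.util.Set;
import java.util.function.BiPredicate;

enum CState { RUNNABLE, BLOCKING, WAITING, JOINING }

/** acquire, join and locks are the instance-specific fields of the thread statechart. */
record TSnap(CState state, int acquire, int join, Set<Integer> locks) {}

class StateFailures {
    /** True if some ordered pair of distinct threads (t_i, t_j) satisfies p. */
    private static boolean anyPair(Map<Integer, TSnap> sys, BiPredicate<Integer, Integer> p) {
        for (int i : sys.keySet())
            for (int j : sys.keySet())
                if (i != j && p.test(i, j)) return true;
        return false;
    }

    /** (6.3) nested monitor lockout. */
    static boolean nestedMonitorLockout(Map<Integer, TSnap> sys) {
        return anyPair(sys, (i, j) -> sys.get(i).state() == CState.WAITING
                && sys.get(j).state() == CState.BLOCKING
                && sys.get(i).locks().contains(sys.get(j).acquire()));
    }

    /** (6.4) circular join. */
    static boolean circularJoin(Map<Integer, TSnap> sys) {
        return anyPair(sys, (i, j) -> sys.get(i).state() == CState.JOINING
                && sys.get(j).state() == CState.JOINING
                && sys.get(i).join() == j && sys.get(j).join() == i);
    }

    /** (6.5) self join. */
    static boolean selfJoin(Map<Integer, TSnap> sys) {
        return sys.entrySet().stream().anyMatch(e -> e.getValue().state() == CState.JOINING
                && e.getValue().join() == e.getKey());
    }

    /** (6.6) join-induced deadlock. */
    static boolean joinInducedDeadlock(Map<Integer, TSnap> sys) {
        return anyPair(sys, (i, j) -> sys.get(i).state() == CState.JOINING
                && sys.get(j).state() == CState.BLOCKING
                && sys.get(i).join() == j
                && sys.get(i).locks().contains(sys.get(j).acquire()));
    }
}
```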
6.4.2 Formal Description of Potentials
Here, we formalise the potentials described in chapter 5. Again, we omit the
type of variables to improve readability. Variables s, t, u ∈ N etc. denote
threads, o, p ∈ N etc. denote objects, and g, h, i ∈ I etc. denote indices from
the time axis.
Deadlock Potential (P1)
Here we formalise the deadlock potential for two threads. The deadlock
potential is captured by checking for the existence of two locks which are locked
in opposite orders in different threads. The last two lines check that there
is no so-called gate lock, which is always acquired before the other locks and
therefore prevents a deadlock.
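$$\begin{aligned}
&\exists s\, \exists t\, \exists i\, \exists j\, \exists o\, \exists p : i < j \Rightarrow \big(\mathit{his}(i) = \mathit{LOCK}(o)_{s}! \wedge \mathit{his}(j) = \mathit{LOCK}(p)_{s}!\\
&\quad \wedge o \neq p \wedge s \neq t\\
&\quad \wedge \exists k\, \exists l : k < l \Rightarrow (\mathit{his}(k) = \mathit{LOCK}(p)_{t}! \wedge \mathit{his}(l) = \mathit{LOCK}(o)_{t}!)\big)\\
&\wedge \neg\big(\exists q\, \exists m\, \exists n : m \neq n \wedge \mathit{his}(m) = \mathit{LOCK}(q)_{t}!\\
&\quad \wedge \mathit{his}(n) = \mathit{LOCK}(q)_{s}! \wedge m < i \wedge n < k\big)
\end{aligned} \tag{6.7}$$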
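A brute-force check of this lock-order pattern over a recorded history could look as follows. This Java sketch is hypothetical (Op, HEv and DeadlockPotential are illustrative names) and uses one reading of the gate-lock condition: a lock q, taken to be different from o and p, that s acquires before its first lock of the pair and t acquires before its first lock of the pair. Like (6.7), it compares acquisition order only and ignores whether locks were released in between. The nested loops are quartic in the history length, which is acceptable for a sketch but not for long traces.

```java
import java.util.List;

enum Op { LOCK, UNLOCK }

/** A history entry LOCK(obj) or UNLOCK(obj) generated by a thread. */
record HEv(Op op, int obj, int thread) {}

class DeadlockPotential {
    static boolean isLock(HEv e, int thread, int obj) {
        return e.op() == Op.LOCK && e.thread() == thread && e.obj() == obj;
    }

    /** s locks o then p, another thread t locks p then o, and no gate lock is found. */
    static boolean detect(List<HEv> his) {
        int size = his.size();
        for (int i = 0; i < size; i++)
            for (int j = i + 1; j < size; j++) {
                HEv a = his.get(i), b = his.get(j);
                if (a.op() != Op.LOCK || !isLock(b, a.thread(), b.obj()) || a.obj() == b.obj()) continue;
                int s = a.thread(), o = a.obj(), p = b.obj();
                for (int k = 0; k < size; k++) {
                    HEv c = his.get(k);
                    if (c.thread() == s || !isLock(c, c.thread(), p)) continue;
                    int t = c.thread();
                    for (int l = k + 1; l < size; l++)
                        if (isLock(his.get(l), t, o) && !hasGateLock(his, s, i, t, k, o, p)) return true;
                }
            }
        return false;
    }

    /** A lock q other than o and p, taken by s before index i and by t before index k. */
    static boolean hasGateLock(List<HEv> his, int s, int i, int t, int k, int o, int p) {
        for (int m = 0; m < i; m++) {
            HEv e = his.get(m);
            if (e.op() != Op.LOCK || e.thread() != s || e.obj() == o || e.obj() == p) continue;
            for (int n = 0; n < k; n++)
                if (isLock(his.get(n), t, e.obj())) return true;
        }
        return false;
    }
}
```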
Unused Notification (P2)
An unused notification is one that follows another one. Between the two
notifications there is no wait().
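$$\begin{aligned}
&\exists u\, \exists s\, \exists o\, \exists n : \mathit{his}(n) = \mathit{NOTIFY}(o)_{s}!\\
&\wedge \big(\exists l < n\; \exists t : \mathit{his}(l) = \mathit{NOTIFY}(o)_{t}! \wedge \forall u\; \forall m : l < m < n \Rightarrow \neg(\mathit{his}(m) = \mathit{WAIT}(o)_{u}!)\big)
\end{aligned} \tag{6.8}$$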
Potential for Join-induced Deadlock (P4)
A thread acquires a lock and subsequently releases it. Later, the thread
becomes the target of a join(). The joining thread holds the same lock while
joining.
The following potential can be checked when a join() takes place. This
is feasible because, from the point in time at which the join() happens, the
history is searched backwards for a claim of the lock by the other thread.
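$$\begin{aligned}
&\exists t\, \exists o\, \exists k\, \exists m : \mathit{his}(k) = \mathit{LOCK}(o)_{t}! \wedge \big(\forall l > k : \mathit{his}(l) \neq \mathit{UNLOCK}(o)_{t}!\big)\\
&\wedge m > k \wedge \mathit{his}(m) = \mathit{JOIN}(o)_{t}!\\
&\wedge \exists s\, \exists n : n < k \wedge \mathit{his}(n) = \mathit{LOCK}(o)_{s}!
\end{aligned} \tag{6.9}$$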
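The backward search described above can be sketched in Java as follows, following the informal description of the potential. The names JOp, JEv and JoinPotential are hypothetical, and the argument of a JOIN event is assumed to be the joined thread. The sketch first determines which locks the joining thread still holds at the join (with their outermost acquisition index k, taking re-entrance into account) and then searches backwards from k for a LOCK of the same object by the join target.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

enum JOp { LOCK, UNLOCK, JOIN }

/** For LOCK/UNLOCK, arg is the lock object; for JOIN, arg is the joined thread (assumed). */
record JEv(JOp op, int arg, int thread) {}

class JoinPotential {
    /** Checks the potential at the JOIN with index m by searching the history backwards. */
    static boolean at(List<JEv> his, int m) {
        JEv join = his.get(m);
        if (join.op() != JOp.JOIN) return false;
        int t = join.thread(), target = join.arg();
        Map<Integer, Integer> depth = new HashMap<>();  // re-entrance depth of locks held by t
        Map<Integer, Integer> since = new HashMap<>();  // index k of the outermost LOCK(o) by t
        for (int i = 0; i < m; i++) {
            JEv e = his.get(i);
            if (e.thread() != t || e.op() == JOp.JOIN) continue;
            int delta = e.op() == JOp.LOCK ? 1 : -1;
            int d = depth.merge(e.arg(), delta, Integer::sum);
            if (d == 1 && delta == 1) since.put(e.arg(), i);
            if (d <= 0) { depth.remove(e.arg()); since.remove(e.arg()); }
        }
        // For every lock o still held at the join, look for an earlier LOCK(o) by the target.
        for (Map.Entry<Integer, Integer> held : since.entrySet()) {
            int o = held.getKey(), k = held.getValue();
            for (int n = k - 1; n >= 0; n--) {
                JEv e = his.get(n);
                if (e.op() == JOp.LOCK && e.thread() == target && e.arg() == o) return true;
            }
        }
        return false;
    }
}
```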
Another risky situation is a join() performed while holding locks. This is
not a potential in the previous sense, but it can be described as a system
state.
$$\exists t\, \exists o : \mathit{sys}(t) = \mathit{joining} \wedge o \in t.\mathit{locks} \tag{6.10}$$
6.5 Summary
In this chapter we have provided a statechart model to capture the behaviour
of threads with respect to synchronisation on the type level. Based on this
statechart we have captured individual failures and potentials more formally
than in the previous chapter. This chapter marks the end of the presentation
and analysis of the domain of concurrent Java liveness failures. The forth-
coming chapters will describe support for automated detection of failures.
As a first step towards a model of thread synchronisation, we analysed
the domain of Java concurrent liveness failures and potentials for their scope
and their characteristics. The key results were:
•The scope of Java concurrent liveness failures is covered by Java thread
synchronisation.
•The failures can be classified as specific combinations of control flow
states, a definition we have introduced to identify the states in the
thread lifecycle.
•These specific combinations of states can be further characterised by
dependencies between threads in these states.
In order to cover control flow states and thread dependencies we have
proposed a statechart-based model. This model has captured the thread
synchronisation in a specific way. The key decisions were the use of:
•Intermediate states to model significant behaviour while a method is
executing.
•Asynchronous events to model the entry and exit to intermediate states
of otherwise synchronous method calls.
•Instance-specific information for expressing thread dependencies caused
by waiting, joining, locking, and blocking on a lock.
The use of the statechart imposed trade-offs. The advantage is a visual
formalism depicting the core idea of control flow states explicitly. The
disadvantage is the difficulty of capturing control flows from object-oriented
programming in the world of statecharts, which typically process events using
queues. The conclusion is that for our restricted application area, where
only a restricted part of Java had to be modelled, the use of a statechart
is feasible. The simulation of synchronous communication with mutually
exclusive objects guarantees the correct behaviour also in the presence of
asynchronous events for behaviour which is synchronised. For behaviour
which is not synchronised, like the interrupt, the asynchronous mechanism
is not a problem.
It is cumbersome to model the details involved in thread synchronisation
with a statechart. The mechanisms are complex, involving different pre- and
postconditions and different ways of returning from methods, so solutions
using other formalisms would also become lengthy.
The drawback of the statechart is the mixture of explicit states and implicit
states encoded in associated data fields. Still, we think that making the states
explicit fosters a better understanding of the Java synchronisation mechanism,
e.g. when provided in addition to the purely textual language and library
documentation.
Based on the statechart, we have formalised the notion of system state
and execution history. These formalisations were used to formally specify
failures and potentials. Failures implying dependencies are formalised using
the system state formalisation, which makes the state of each thread and its
instance-specific information explicit. Failures based on history are more
conveniently expressed in terms of an execution trace. The same holds
for potentials.
The developed statechart also helped us in determining failures we had
not thought of before and which have not been documented elsewhere. This
was the case with the join-induced deadlock, its potential, and the circular
join from the previous chapter. Because the statechart makes the states
involved in different kinds of synchronisation explicit, it is easier to check
different configurations, i.e. different combinations of thread instances and
their states, and hence to determine possible failure states.
Chapter 7
Trace-based Data Collection
The previous chapters have presented the domain of concurrent liveness fail-
ures. They concluded with their formal specification as an answer to our first
main requirement (see chapter 3). This chapter and the following two chap-
ters will present concepts for automated detection of these failures, thereby
addressing our second main requirement. These concepts are implemented
in our prototype, the Java Visualisation environment JAVIS.
In chapter 3 we have already structured the requirements for automated
detection of concurrent liveness failures into the tasks data collection, data
analysis and data visualisation (depicted once more in Fig. 7.1). Each task
is influenced by the failure domain, whose characteristics have already been
presented, by task-specific requirements for user control, and by task-specific
standards. Now that we have informally and formally presented the failure
domain in the previous chapters, we are able to refine our requirements for
the three tasks depicted in the centre. For each of them, we will discuss the
refined requirements, related work, and our proposed solution in a separate
chapter.
In the motivation in chapter 2 we have already argued that the failures
are insufficiently covered by analysis tools and, when they are supported, not
adequately visualised. Before developing new concepts, we will present the
state-of-the-art in tracing, dynamic analysis, and visualisation of concurrent
object-oriented programs to determine which concepts may be re-used. Most
of the concepts stem from existing tools. These tools typically cover all three
tasks. However, each tool is only of interest for our goals regarding one or
two of the three tasks. Each tool is therefore discussed in the chapter on the
task to which it contributes.
This chapter deals with data collection in JAVIS, which is indicated by
the highlighted part in the centre of Fig. 7.1. Chapter 8 will deal with data
analysis of the JAVIS environment and chapter 9 will conclude with data
visualisation of the JAVIS environment.
[Figure 7.1: Requirements for Data Collection (tasks: Data Collection, Data Analysis, Data Visualisation; influences: Failure Domain, Usage Context, Standards)]
7.1 Tracing Requirements
In chapter 3 we have already outlined that our approach to collecting data
from a running program will be based on tracing, i.e. we will collect data
about the execution history of a running program.
Although many failures are detected in a state, the state information is
not sufficient to reconstruct the execution order which led into the failure
state. Instead, the execution history can provide better insights into how a
failure developed. At the same time, the execution history can provide the same
information which can be found in the individual states. For the detection
of a few failures and for many potentials, the execution history is needed.
Therefore, tracing is the approach which can provide information for both
state-based and history-based failures and potentials.
We will draw on the failure specifications in order to define the granularity
and level of abstraction of the collected trace data. This is dealt with in the
next section. In the section to follow we will refine requirements for the
tracing method generating the required trace data.
7.1.1 Format, Schema, and Encoding
Here we will determine the requirements for a schema for trace data, i.e. a
definition of the structure and the contents of a trace. We do not impose
specific requirements on the encoding of the trace data, i.e. how the trace
is represented and compressed. Schema and encoding together are called
format. In the remainder of this section we will present our requirements for
the schema.
A trace is a time-ordered sequence of event-records [KRR98]. The order
of the traced events must be kept, either explicitly, e.g. using time stamps,
or implicitly, e.g. by a relative position. Time stamps are needed if events
can happen and be collected in parallel, e.g. in the presence of true con-
currency or in a distributed setting. If the concurrency is implemented by
interleaving, i.e. by executing one thread at a time and changing between
threads, time stamps do not add information about concurrent execution
apart from performance information. Therefore, time stamps are optional
and we do not require a specific format for them. As most JVMs are used
with single processor machines or do not use an additional processor, we
assume interleaving semantics for our trace implementation.
The data to be collected is mainly determined by the failure character-
istics. Concurrent liveness failures are a subset of synchronised behaviour.
For the purpose of failure detection, only method calls with an effect on syn-
chronisation have to be traced, including synchronized-methods and -blocks.
Note that the trace we intend to define here differs from the notion
of trace introduced in our formal model in the previous chapter. This is not
surprising, because now we need to define a trace in terms of an executing
Java program, while in the formal model we defined a trace in terms
of events from a statechart. The statechart had been designed to simulate
a part of the behaviour of Java, thereby mapping Java behaviour to a different
paradigm.
In order to be able to detect the state-based failures in an execution
trace, the trace has to contain an event which indicates a blocking method
call. With such an event, we are prepared for offline analysis of failures
involving blocking. Without such an event, blocking can only be decided
with the following workaround. If a blocking takes place, the trace would
contain the last successful method entry or exit taking place before the call
to synchronized. One would need to look at the source code following the last
traced event to decide whether the blocking is caused by synchronized or by
other code such as a loop or I/O.
For each method call or synchronized-block we require the trace to store
•caller object and class (optional because it can be computed from
previous events)
•callee object and class
•method name
•parameters (optional)
•thread of control object and class
(if necessary for identification also its virtual machine and the host, on
which the virtual machine is running)
•timestamp (optional)
Having discussed in depth the importance of control flow states of threads
during method call execution, we require that for each method call observed
the trace stores
•when it was entered,
•when it was exited,
•and in addition whether it was blocked on a lock before it could be
successfully executed.
A synchronized-block can be treated as a method call except for the
method name that is omitted. Caller, callee, and parameter objects can be
stored by an identifier and class. We do not need to store the complete state,
i.e. fields and values, of each object. Parameters can be stored by their name
and value. Note that here we consider also constructors as method calls.
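One possible in-memory shape for such an event record is sketched below in Java. The record and enum names are hypothetical and do not describe the JAVIS format: nullable components and the OptionalLong stand for the optional parts of the schema, a null method name marks a synchronized-block, and parameter values are rendered as text for simplicity.

```java
import java.util.List;
import java.util.OptionalLong;

/** Entry, exit, or blocking on a lock before the call could be executed. */
enum EntryKind { ENTER, EXIT, BLOCKED }

/** An object stored by identifier and class only, without its fields. */
record ObjRef(long id, String className) {}

/** A parameter stored by name and value (value rendered as text here). */
record Param(String name, String value) {}

/** One event record; nullable components are the optional parts of the schema. */
record TraceEvent(
        EntryKind kind,
        ObjRef caller,          // optional (null): can be computed from previous events
        ObjRef callee,
        String methodName,      // null for a synchronized-block
        List<Param> params,     // optional: may be empty
        ObjRef thread,          // thread of control
        String vmAndHost,       // optional (null): only needed to identify remote threads
        OptionalLong timestamp) {

    boolean isSynchronizedBlock() { return methodName == null; }
}
```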
While it is sufficient for failure detection to store only selected method
calls in the trace (those dealing with synchronisation), with this approach it
is however difficult to provide a coherent picture of the program execution, as
not every call in a sequence and not every nested call is traced. The naviga-
tion paths between objects are incomplete. This inhibits the understanding
of the general behaviour and can be counterproductive when trying to un-
derstand the execution leading into a failure and especially when trying to
understand potentials. Failures and potentials are restricted to synchronised
behaviour but they are embedded in non-synchronised behaviour.
These deficiencies can be easily overcome by tracing all method calls,
not only those involved in synchronisation. Only innermost methods of a
sequence of nested method calls can be omitted without losing coherence
of control flow. Innermost methods are often calls to standard Java packages.
Omitting calls to framework packages, on the other hand, can break the control
flow because a framework is often characterised by the fact that it makes calls
to application code. Other kinds of statements can be neglected. Thereby,
we achieve a general-purpose trace format that includes arbitrary method calls.
Moreover, a general-purpose tracing format is also important to allow the
tracing of method calls to libraries providing synchronisation. Such libraries
can be used in addition to or instead of the Java synchronisation constructs.
Method calls to such libraries have the same effects on threads as already
described. If only the internal calls to synchronisation methods were
traced, the overall behaviour would be difficult to understand. Therefore, it
is desirable that such method calls can be included in detection of failures
and hence in tracing. We have already discussed that future versions of Java
will include synchronisation libraries. This situation makes a trace format
with arbitrary method calls even more important.
From the perspective of failure detection it would make sense to declare
all synchronisation methods as required for the format and all other methods
as optional. But tracing all synchronized-methods may provoke a large space
and time overhead. In order to give the user more flexibility we allow that
only selected parts of a program be traced.
The described format for tracing method calls on objects is almost equiv-
alent to UML interaction diagrams [UMLa].
Although the requirements for our format are fixed, we desire that the
schema is extensible regarding entries for one call, e.g. time stamps or
performance annotations, and regarding additional events, e.g. statements.
We can determine one such extension already from the intended use of
the tracing facility. When an already started program is traced, the trace
will contain method returns for which no entry has been traced. However,
the traced program has the corresponding entries on its call stack. Therefore,
we allow the trace to contain the call stack of each thread. This part of
the trace will therefore contain method entries in our trace format but they
are only ordered with respect to one thread. To avoid misunderstanding, we
need an extra entry for these calls which identifies them as call stack method
entries.
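For a program that is already running, such call stack entries could be obtained from the standard method Thread.getAllStackTraces(), which returns the current stack of every live thread. The following Java sketch is hypothetical (RecordKind, StackRecord and StackSnapshot are illustrative names). It emits one STACK_ENTER record per frame, outermost frame first, so the records are ordered only within each thread. Note that a stack trace yields class and method names but no object identifiers, so callee objects would have to come from another source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Record kinds, including the extra kind for call stack method entries. */
enum RecordKind { ENTER, EXIT, BLOCKED, STACK_ENTER }

record StackRecord(RecordKind kind, String thread, String className, String methodName) {}

class StackSnapshot {
    /** One STACK_ENTER record per frame, outermost first; ordered per thread only. */
    static List<StackRecord> capture() {
        List<StackRecord> out = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            StackTraceElement[] frames = e.getValue();   // index 0 is the innermost frame
            for (int i = frames.length - 1; i >= 0; i--)
                out.add(new StackRecord(RecordKind.STACK_ENTER, e.getKey().getName(),
                        frames[i].getClassName(), frames[i].getMethodName()));
        }
        return out;
    }
}
```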
7.1.2 Trace Generation
In principle, the required trace data format is the essential requirement which
determines the choice of method. Only the optional features can be neglected.
Every tracing method faces an unavoidable problem: it disturbs the running
program, either because the program is transformed and may therefore change
its behaviour, or because the runtime environment executes the program
differently due to its additional task to
134 CHAPTER 7. TRACE-BASED DATA COLLECTION
generate traces. This problem is known as the probe effect or Heisenberg effect.
It is especially severe for concurrent programs [Kra98]. Here, the observed
program can behave differently because a changed program or a runtime
environment with additional tasks leads to different scheduling. This
sometimes provokes failures, so-called "Heisenbugs", but it is likely to
hide as many failures as it provokes. Ultimately, only deterministic testing
can solve this problem.
To use tracing in different scenarios such as testing and debugging, we
have also identified the core requirement of independent trace data
collection facilities. This choice avoids depending on a tool that allows
tracing only in combination with other functionality such as testing or
stepwise debugging. Therefore, our only core requirement is that the
program is executed while it is traced. Tracing should not depend on a
specific tool executing the program.
Tracing should also be loosely coupled with the subsequent tasks of
analysis and visualisation to foster interoperability with different analysis
and visualisation components. The general question in this respect is whether
the data are analysed or visualised online or offline. Online can mean either
that analysis is initiated by the tracing tool or, more loosely coupled, that
the trace is streamed to shared memory or to a communication channel from
which another tool can read it immediately. The highest degree of decoupling
is reached if the trace is stored and subsequent tasks access it post-mortem,
i.e. after its generation. Therefore, we aim at trace file generation.
Usually one is looking only for application-specific method calls, and
typically one can limit the search for errors to specific packages. Unrelated
information should be discarded. As already mentioned, a tracing method
involves time and space overhead. Therefore, the tracing method must be
configurable. It must be possible to select which method calls are traced,
and to omit optional information about events such as parameters. For method
selection, it must be possible either to select individual methods or to
select methods by part of their name, by class or by package. This is also
called filtering. Furthermore, it should be easy to extend the tracing method
in order to extend the trace format.
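Filtering as required above can be sketched as follows. This is an illustrative sketch under our own assumptions: the class TraceFilter and its matching rules are hypothetical, not part of any existing tracer API. A call is traced if it matches an explicitly selected method, a selected class or a selected package, or if the method name contains a selected name fragment; everything else is discarded.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical filter deciding which method calls are traced. */
class TraceFilter {
    private final Set<String> methods = new HashSet<>();   // e.g. "bank.Account.deposit"
    private final Set<String> classes = new HashSet<>();   // e.g. "bank.Account"
    private final Set<String> packages = new HashSet<>();  // e.g. "bank"
    private final Set<String> nameParts = new HashSet<>(); // e.g. "withdraw"

    TraceFilter method(String m)   { methods.add(m);   return this; }
    TraceFilter cls(String c)      { classes.add(c);   return this; }
    TraceFilter pkg(String p)      { packages.add(p);  return this; }
    TraceFilter namePart(String s) { nameParts.add(s); return this; }

    /** Decides whether a call to className.methodName should be traced. */
    boolean accepts(String className, String methodName) {
        if (methods.contains(className + "." + methodName) || classes.contains(className)) {
            return true;
        }
        // A selected package also covers its subpackages: "bank" matches "bank.io.Log".
        for (String p : packages) {
            if (className.startsWith(p + ".")) {
                return true;
            }
        }
        // Selection by part of the method name.
        for (String part : nameParts) {
            if (methodName.contains(part)) {
                return true;
            }
        }
        return false;   // unrelated information is discarded
    }
}
```

Rejecting by default keeps unrelated calls out of the trace; with an empty filter nothing is traced, which a real tool might instead treat as "trace everything".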
Tracing itself should be as flexible as possible regarding when and where
it is connected to the observed program, i.e. regarding time and location. The
user has to control the starting and stopping of the trace. For analysing
problematic program behaviour, it may be necessary to do this after the
program has been started. It may also be necessary to pause the tracing (but
not the program), which is also termed drive-by-analysis [PMR+01]. The user
may also want to specify other kinds of starting and stopping criteria, such
as criteria expressed in terms of the program behaviour. For instance, a trace could be
7.2. RELATED WORK 135
stopped after a failure has occurred and has been detected. With the
internet, distributed debugging scenarios are also conceivable, in which a
non-local program needs to be traced remotely.
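The control requirements above can be illustrated with a small state machine. The class TraceSession, its states and its stopCriterion predicate are hypothetical names chosen for this sketch. It shows that pausing or stopping affects only the recording, never the observed program, and that a behaviour-based criterion such as a detected failure can end the trace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical trace controller: start, pause, resume and stop independently of the program. */
class TraceSession {
    enum State { IDLE, RUNNING, PAUSED, STOPPED }

    private State state = State.IDLE;
    private final List<String> trace = new ArrayList<>();
    private final Predicate<String> stopCriterion;   // e.g. "a failure has been detected"

    TraceSession(Predicate<String> stopCriterion) { this.stopCriterion = stopCriterion; }

    void start()  { if (state == State.IDLE) state = State.RUNNING; }   // may happen after program start
    void pause()  { if (state == State.RUNNING) state = State.PAUSED; } // the program keeps running
    void resume() { if (state == State.PAUSED) state = State.RUNNING; }
    void stop()   { state = State.STOPPED; }

    State state() { return state; }
    List<String> trace() { return trace; }

    /** Called for every observed event; events are recorded only while tracing is running. */
    void onEvent(String event) {
        if (state != State.RUNNING) {
            return;
        }
        trace.add(event);
        if (stopCriterion.test(event)) {
            stop();   // behaviour-based stopping criterion
        }
    }
}
```

The event strings in a real tracer would be full trace entries; here they are arbitrary labels, and the stop criterion simply inspects each recorded event.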
7.2 Related Work
When we started in 2000 to develop concepts for tool-supported automated
detection, tracing was the first component to be designed and built.
At that time there were neither reusable general-purpose tools for tracing
concurrent Java programs nor general-purpose trace formats. Only very few
tools for the dynamic analysis of concurrent Java programs based on tracing
were available.
Since then, research in this area has increased considerably, with almost a
hundred academic and a few commercial tracing-based tools being published.
A few reusable tools and frameworks have been proposed and, recently, the
need for a standard trace format has been expressed more often.
In the following, we discuss related work with respect to trace formats
and trace methods. We also discuss the degree of flexibility in user control.
We start by introducing the basic techniques on which all these approaches
rely. Then we present in detail how different approaches choose among these
techniques to achieve their goals. The presented approaches come from the
areas of Java programming and concurrent programming.
Where performance data for these approaches are available, we discuss them.
7.2.1 Basic Techniques
Schemata and Encoding
It is desirable to reuse trace schemata in order to foster interoperability. At
the lowest level, a schema specifies which bit or byte of a file contains
which information. Schemata may also be defined as the result of streaming
runtime memory objects. In both cases, the schema includes the encoding,
and the distinction between the two is blurred. More advanced schema
definitions contain name-value pairs and thereby impose no order on the
fields within an event. At the highest level, a schema is defined with a data
description language (DDL) that is executable, i.e. supported by a tool that
generates useful components from the data description. Depending on the
requirements, both low-level and high-level techniques can be useful.
Extensibility of schemata can also be important in the fast-evolving
programming world.
Encoding is important and is influenced by factors such as compression,
consistency, or suitability for network streaming. It must also serve purposes
such as fast reading or analysis. Sometimes human legibility is also required.
Since mark-up languages have become more widespread, XML [XML00] is
sometimes proposed as an encoding. XML can be advantageous because it
simplifies exchanging traces as well as storing and retrieving them in XML
databases. However, developers of tracing facilities consider XML too verbose
and therefore too inefficient [BDE+02].
None of our requirements seems to exclude any of these schema or encoding
techniques.
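To make the difference between positional and name-value schemata concrete, the following sketch encodes one hypothetical event both ways. The field names and the ':' and ';' separators are assumptions for illustration only. Values are assumed not to contain ';', since no escaping is implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: the same trace event in a positional and in a name-value encoding. */
class EventEncodings {

    // Positional: the meaning of a field is given by its position, so the order is part of the schema.
    static String positional(String thread, String target, String method, String state) {
        return String.join(":", thread, target, method, state);
    }

    // Name-value: every field carries its name, so the order is irrelevant and new
    // fields (e.g. a time stamp) can be added without breaking readers that ignore them.
    static String nameValue(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 0) {
                sb.append(';');
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    /** Parses a name-value line back into fields, independently of their order. */
    static Map<String, String> parseNameValue(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String pair : line.split(";")) {
            int eq = pair.indexOf('=');
            fields.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return fields;
    }
}
```

A reader of the positional form breaks as soon as a field is inserted in the middle, whereas the name-value reader still recovers the same fields in any order, at the price of a more verbose encoding.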
Generation of Execution Information
In the motivation, we have already coarsely sketched the two main approaches
for collecting information from a running program, namely code instrumentation,
which generates file output either directly or through calls to helper classes,
and instrumentation of the runtime environment. Instrumentation always
requires a modification step. Instrumentation means that this modification
typically augments the program by inserting statements. It differs from
program transformation, which allows more arbitrary modifications in which
existing code can be replaced, re-ordered, etc. Often, instrumentations of the
runtime environment are already part of the environment and can be used
via dedicated APIs of the environment.
In the following, we take a more detailed look at the levels at which source
code instrumentation and runtime instrumentation or interfaces can occur
with respect to Java (see Fig. 7.2). We explain each level and discuss its
advantages and disadvantages in more depth.
Figure 7.2: Levels of Instrumentation (Java Program, Java Compiler, Java
Bytecode, Application Libraries, Runtime Libraries, Runtime Environment
(JVM), Operating System)
•Java source code, i.e. the Java program, is not always available. Therefore,
instrumentation of Java byte code is more general. Java byte code may even
be instrumented only at load time.
The general advantage of source code and byte code instrumentation
is that it is faster than interaction with the runtime environment.
While method entry or exit can easily be instrumented by inserting
code before the first line and before each return or after the last line,
the synchronized keyword is problematic. If code is inserted only after
the lock has been acquired, i.e. inside the synchronized method or block,
the blocking situation can never be traced. If code is inserted before a
synchronized method or block, the attempt appears in the trace, but one
does not know whether the call is executed immediately, later, or ever.
In the byte code, the problem remains the same. For synchronized blocks,
the keyword is mapped to a pair of monitorenter and monitorexit
instructions; synchronized methods are only marked with a flag
(ACC_SYNCHRONIZED), and the JVM acquires and releases the lock implicitly.
Hence, here too, code can only be inserted before or after monitorenter
or, for methods, before the call or at the beginning of the body. Here, the
JVM has the advantage that it knows when a call was made and when it
was blocked.
Another problem is that, when an exception occurs, the synchronized
method or block is left immediately and the release of the lock cannot
be traced via an instrumentation. The same is true for the byte code.
Therefore, [GH03] proposes to add a try-catch block around each
monitorenter/exit.
•Instead of using a separate transformation tool to achieve the instrumentation
at source code level before compilation or at byte code level after
compilation, one could consider modifying the Java compiler to integrate
this task. This is similar to compiling a Java program with a debug option,
which generates debug information otherwise not included in the byte code.
The limitations, however, are the same as for code instrumentation.
•Depending on what kind of information is needed, it could be more
useful to instrument the application libraries, which are usually given as
byte code. The advantage is that only a library has to be modified, not each
program using the library. This is not feasible in our case, as we are not
using a third-party library for synchronisation.
•Solutions for the runtime environment have the advantage that they
can be installed once, so that the transformation step need not be
repeated. Here, one can think of different options, namely instrumenting
runtime libraries or instrumenting the runtime environment (JVM) itself.
The first would of course not yield a general tracing approach but could
be interesting for our synchronisation domain. It is, however, not possible
for Java, as the synchronisation primitives are only partly located in
runtime library classes such as java.lang.Thread (and java.lang.Object,
which declares wait and notify) and partly proper language elements such
as the synchronized keyword.
Therefore, the only Java alternative to code or compiler instrumentation
lies with the Java Virtual Machine (JVM) itself. The JVM can be
instrumented to generate traces, which means that a modified JVM has to
be provided. Fortunately, for most applications that want to observe a
running JVM this is not necessary. The Java runtime environment already
provides a set of runtime application programming interfaces (APIs) for
observation tasks, covering method calls and blocking on monitors. An
approach for deadlock detection in a different setting (pthreads) has also
pointed out that it must be possible to query blocking explicitly. The JVM
can also pause a running program, which cannot be achieved using code
instrumentation. In general, runtime-based solutions have a higher
performance overhead than code instrumentation.
•For special-purpose data collection, one could even consider using the
operating system (OS), e.g. if Java only provides an interface to services
or libraries of the OS. Often, the OS already provides APIs for observing
such behaviour. Instrumenting the OS should be the last option considered.
It is the deepest layer, and modifications will also affect other programs
using it. This could, however, make sense when a JVM makes extensive use
of OS-level threads and their synchronisation concepts.
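The problem with the synchronized keyword, described in the first item of the list above, can be illustrated at source level. In the following sketch, the trace sink TRACE and the state labels try, enter and release are hypothetical. The try/finally block plays the role of the try-catch wrapper proposed by [GH03], applied here at source level rather than at byte code level.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Sketch of source-level instrumentation around a synchronized block. */
class InstrumentedAccount {
    // Hypothetical trace sink; synchronized so that several threads can append concurrently.
    static final List<String> TRACE = Collections.synchronizedList(new ArrayList<>());

    private int balance;

    void deposit(int amount) {
        String t = Thread.currentThread().getName();
        TRACE.add(t + ":deposit:try");           // before the lock: blocked or not is unknown
        synchronized (this) {
            TRACE.add(t + ":deposit:enter");     // after acquisition: the waiting time is invisible
            try {
                if (amount < 0) {
                    throw new IllegalArgumentException("negative amount");
                }
                balance += amount;
            } finally {
                // Also recorded when an exception leaves the block; logged just
                // before the monitor is actually released by the block exit.
                TRACE.add(t + ":deposit:release");
            }
        }
    }

    int balance() {
        synchronized (this) {
            return balance;
        }
    }
}
```

An unknown waiting time separates the try entry from the enter entry. The trace shows neither whether the thread blocked in between nor for how long, which is exactly the information the JVM has but the instrumentation lacks.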
With regard to user control, all approaches that instrument the program
itself need to provide means to insert and remove the instrumentation (or to
keep uninstrumented copies). This might not suffice when tracing is switched
on and off while the program continues to run. The more flexible the tracing
is to be, the more functionality has to be built into the instrumentation.
This is not a problem for approaches that interact directly with the
components having complete control over the execution of a program.
Although some major disadvantages of instrumenting the Java program
itself at different levels have been pointed out, it cannot be said in general
that this approach is infeasible.
7.2.2 Trace Formats
In this section, we look at explicit proposals for an exchangeable format. The
runtime APIs presented in Section 7.2.4 and some of the tools presented in
the subsequent sections also provide formats, but they do not explicitly address
interoperability.
Compact Trace Format
A recent but still very preliminary proposal for standardising object-oriented
traces, the Compact Trace Format (CTF) [HLL03], proposes to base traces on
UML sequence diagrams [UMLa]. The schema contains concepts for eliminating
recurring sequences and loops. It deals explicitly neither with synchronisation
nor with blocking method calls.
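CTF's concrete compression scheme is not reproduced here. As a hypothetical illustration of what eliminating recurring sequences and loops can mean for a flat call sequence, the following sketch greedily collapses immediately repeated blocks of calls into a repetition count. CTF itself works on trees, so this is only an analogy.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: collapse immediately repeated call subsequences (e.g. loop iterations). */
class RepeatCompressor {

    static List<String> compress(List<String> calls) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < calls.size()) {
            int bestLen = 1;
            int bestCount = 1;
            // Try every block length that fits at least twice from position i.
            for (int len = 1; i + 2 * len <= calls.size(); len++) {
                int count = 1;
                while (i + (count + 1) * len <= calls.size()
                        && calls.subList(i, i + len).equals(
                               calls.subList(i + count * len, i + (count + 1) * len))) {
                    count++;
                }
                // Prefer the block that covers the most calls; ties keep the shorter block.
                if (count > 1 && count * len > bestCount * bestLen) {
                    bestLen = len;
                    bestCount = count;
                }
            }
            if (bestCount > 1) {
                out.add(bestCount + "x" + calls.subList(i, i + bestLen));
            } else {
                out.add(calls.get(i));
            }
            i += bestLen * bestCount;
        }
        return out;
    }
}
```

For the sequence a, b, a, b, a, b, c the sketch yields the two items "3x[a, b]" and "c". The greedy search costs quadratic time per position, which is acceptable for illustration but not for large traces.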
STEP Framework
More mature work has recently been provided by the STEP project [BDE+02].
It addresses standardisation at the level of schemata and encoding, and
provides a framework for efficient encoding of trace data. For the definition
of trace schemata, STEP provides a data definition language. It also provides
configurable trace encoding. STEP has been used, for example, for defining a
schema for events from the Java Virtual Machine Profiler Interface (JVMPI),
which will be introduced in Section 7.2.4.
7.2.3 Code Instrumentation
Flexible instrumentation tools are rare. The following approaches do not
provide solutions to the problem posed by the synchronized keyword.
JSpy and JTrek
Only recently, the Java byte code instrumentation tool JSpy [GH03] was
proposed. It was built for the Java PathExplorer JPax [Hav00], a tool for
runtime analysis of Java programs, which is part of the NASA Java Pathfinder
project developing model checking techniques for Java [JPF03]. In JSpy, one
declaratively specifies what should be traced, under what condition, and
with what effect. Among other things, JSpy reports all successful acquisitions
and releases of locks, but not blocking. JSpy supports streaming and file
output.
JSpy is a customised interface to the byte code instrumentation tool JTrek
by Compaq SRC [JTr], which also supports conditional tracing but allows
only restricted kinds of code to be inserted. No performance evaluation is
available.
Aspect-Oriented Programming (AOP)
The aspect-oriented programming (AOP) approach, especially the language
AspectJ [AJ98, KLM+97], can be seen as an approach for instrumenting
method calls and therefore seems to meet the requirements for tracing method
calls. However, at the time we examined it, it was not flexible enough
because it could only instrument method bodies. This did not allow code to
be inserted directly before a call to a synchronized method. This was also
observed by the developers of JSpy [GH03], who additionally noted that not
enough information can be gathered about the running program. More recent
versions also allow method calls to be instrumented, not only method bodies.
Still, this method is not suitable for tracing the locking and releasing of locks.
As for any instrumentation approach, it is cumbersome to implement
interactive control over the instrumentation.
7.2.4 Runtime APIs
The Java Development Kit (JDK) provides several APIs for developing pro-
grams which observe a running Java program and interact with it through
the JVM on which it runs [Jav01].
The observation principle is always the same: the API defines a set of
events that can be observed. Via the API, the observing program can register
for selected events and provide callback functions that are called
when the events occur. Each call also delivers an event record containing
details about the event occurrence. The level of detail is configurable
via the API. We present the different APIs in the following.
Java Virtual Machine Profiler Interface (JVMPI)
The Java Virtual Machine Profiler Interface (JVMPI) is intended to support
the collection of profiling data from which statistics about time and space
consumption and performance bottlenecks can be computed.
The JVMPI is a native interface in C. A profiler has to be developed as
a separate native component written in C.
Specifically for multithreading, the JVMPI allows profilers to observe a
set of predefined events such as the starting and ending of threads, contention
for monitor objects, and waiting of threads. However, the event records are
designed for statistical evaluation and not for reconstructing the execution.
For instance, the event record for method entries contains only the method
name and the identifier of the called object, but not the thread or the caller
identifier.
Java Platform Debugger Architecture (JPDA)
The Java Platform Debugger Architecture (JPDA) (since JDK 1.3) contains
two interfaces and a protocol intended for developing debuggers (see Fig. 7.3).
It allows more detailed observations than the JVMPI.
The Java Virtual Machine Debugger Interface (JVMDI) is a native
interface implemented by each JVM, typically in C.
It provides basic debugging functionality such as inserting breakpoints
and watchpoints, starting and stopping the debugged program, and exhaustive
querying of the state of a debugged program. A component that uses the
JVMDI directly is written in native code and runs in the same JVM as the
observed program.
The Java Debug Interface (JDI) is a Java interface that internally
uses the JVMDI. It provides the same functionality in Java as the JVMDI
but can be used from a different JVM.
This is achieved by the Java Debug Wire Protocol (JDWP), a standard
protocol for using the JVMDI. A back end using the JVMDI runs in the same
JVM as the observed program. The front end, which implements the JDI, runs
in a different JVM, either on the same host or on any other host. In both
cases, the communication follows the JDWP.
The JDI provides event records for method entry and exit that are detailed
enough to reconstruct the program execution. Blocked entries into synchronized
methods do not generate events; only successful method entries do. However,
the JDI provides a method currentContendedMonitor that determines, for a given
thread, the object it is attempting to lock. The JPDA does not provide time
stamps with the events.
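The following sketch shows how a JDI-based tracer can subscribe to method entry and exit events, following the observation principle described above. It is our own minimal example, not the JAVIS implementation. It assumes a current JDK that ships the jdk.jdi module, launches the debuggee through the default launching connector, and uses a simplified, hypothetical output layout. The helper contended only shows where currentContendedMonitor would be queried; a tracer would call it on suspended threads to detect blocking.

```java
import com.sun.jdi.Bootstrap;
import com.sun.jdi.IncompatibleThreadStateException;
import com.sun.jdi.Method;
import com.sun.jdi.ObjectReference;
import com.sun.jdi.ThreadReference;
import com.sun.jdi.VMDisconnectedException;
import com.sun.jdi.VirtualMachine;
import com.sun.jdi.connect.Connector;
import com.sun.jdi.connect.LaunchingConnector;
import com.sun.jdi.event.Event;
import com.sun.jdi.event.EventSet;
import com.sun.jdi.event.MethodEntryEvent;
import com.sun.jdi.event.MethodExitEvent;
import com.sun.jdi.request.EventRequest;
import com.sun.jdi.request.EventRequestManager;
import com.sun.jdi.request.MethodEntryRequest;
import com.sun.jdi.request.MethodExitRequest;
import java.util.Map;

/** Sketch: trace method entries and exits of a launched debuggee via the JDI. */
class JdiTraceSketch {

    /** Simplified, hypothetical line layout (not the JAVIS trace format). */
    static String format(String thread, String method, boolean sync, String state) {
        return thread + ":" + method + ":" + sync + ":" + state;
    }

    static void print(ThreadReference thread, Method m, String state) {
        System.out.println(format(thread.name(), m.declaringType().name() + "." + m.name(),
                m.isSynchronized(), state));
    }

    /** Monitor a suspended thread is trying to lock, or null if none or unsupported. */
    static ObjectReference contended(ThreadReference t) throws IncompatibleThreadStateException {
        return t.virtualMachine().canGetCurrentContendedMonitor() ? t.currentContendedMonitor() : null;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: JdiTraceSketch <debuggee main class> <class filter>");
            return;
        }
        LaunchingConnector connector = Bootstrap.virtualMachineManager().defaultConnector();
        Map<String, Connector.Argument> arguments = connector.defaultArguments();
        arguments.get("main").setValue(args[0]);
        VirtualMachine vm = connector.launch(arguments);   // the debuggee starts suspended

        EventRequestManager erm = vm.eventRequestManager();
        MethodEntryRequest entry = erm.createMethodEntryRequest();
        MethodExitRequest exit = erm.createMethodExitRequest();
        entry.addClassFilter(args[1]);                     // filtering, e.g. "bank.*"
        exit.addClassFilter(args[1]);
        entry.setSuspendPolicy(EventRequest.SUSPEND_EVENT_THREAD);
        exit.setSuspendPolicy(EventRequest.SUSPEND_EVENT_THREAD);
        entry.enable();
        exit.enable();

        vm.resume();
        try {
            while (true) {
                EventSet set = vm.eventQueue().remove();
                for (Event event : set) {
                    if (event instanceof MethodEntryEvent) {
                        MethodEntryEvent en = (MethodEntryEvent) event;
                        print(en.thread(), en.method(), "enter");
                    } else if (event instanceof MethodExitEvent) {
                        MethodExitEvent ex = (MethodExitEvent) event;
                        print(ex.thread(), ex.method(), "exit");
                    }
                }
                set.resume();
            }
        } catch (VMDisconnectedException end) {
            // The debuggee has terminated; the trace is complete.
        }
    }
}
```

A hypothetical invocation would be java JdiTraceSketch bank.Main "bank.*". The debuggee's class path may have to be passed through the connector's "options" argument. A production tracer would also drain the debuggee's output streams (vm.process()) so that the debuggee cannot block. The suspend policy SUSPEND_EVENT_THREAD stops the event's thread for every event, which is needed for stack or monitor queries but adds to the performance cost discussed in this section.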
The JPDA seems to fulfil our trace format requirements. It allows method
entries and exits, including the callee and the thread of control, to be
traced, and it allows the state of objects used as monitors to be traced.
The complete information about entries and exits is not available via the
JVMPI. The JPDA allows debuggers to trace already running programs and also
supports remote tracing. Moreover, the JPDA not only allows trace files to be
generated but also allows arbitrary programs to be coupled to its interfaces,
which are then executed while the observed program runs. This allows the
tracer to provide additional functionality, namely optional online analysis
and optional online visualisation. The JPDA also allows flexible control
during tracing.
Figure 7.3: The Java Platform Debugger Architecture (JPDA) (debuggee JVM with the back end on top of the JVMDI; debugger JVM with the front end implementing the JDI; back end and front end communicate via the JDWP)
These capabilities make the JPDA more flexible than instrumentation.
Therefore, we prefer such a solution to the instrumentation-based approach.
More details on the JPDA will be provided when presenting our solution.
It is noteworthy that the Java runtime environment offers no interface
specifically for tracing; only the JPDA can be used. It was not designed for
tracing, although it is increasingly used for this kind of application. We
have already noted that performance is a problem. As no performance
evaluations are available, we have conducted our own experiments.
Packages Runtime and Log4J
The class Runtime of package java.lang provides a method traceMethodCalls.
When this method is called with the argument true, the JVM is requested to
output all method entries and method exits (a JVM may ignore this request).
Regarding the source and target of a call, only the class names of the called
methods and the thread of control appear in the output. Note that this class
cannot really be considered an API, as it is tightly integrated with the
observed Java program itself.
For very simple tracing, there are also libraries such as Log4J [L4J], but
none of the tools discussed here uses it. With Log4J, it is possible to
enable logging at runtime. Logging behaviour can be controlled by editing
a configuration file. The Log4J package is designed so that the additional
logging statements can remain in shipped code without incurring a heavy
performance cost. However, the logging output is not as detailed as we require,
and it does not support the tracing of locking.
7.2.5 Debuggers
Regarding classical debuggers, we have already shown their deficiencies with
respect to the analysis and visualisation of concurrent program execution.
Here, we briefly discuss their approach to data collection and to user
control. Then we discuss extensions such as history-oriented debuggers
and deterministic debuggers.
Classical Debuggers
Today's Java debuggers, such as JBuilder [JBd01] and the others we have
mentioned, use the Java Platform Debugger Architecture (JPDA).
The JPDA was designed with only debuggers in mind and supports breakpoints,
watchpoints, stepwise execution, complete inspection of the program
state and the like. For this kind of functionality, performance is not a critical
issue. Debuggers demonstrate what information is accessible with the JPDA.
They access more information via the JPDA than is required for our
trace format, e.g. the complete object graph. However, debuggers do not
store any of the displayed information. Because the JPDA provides remote
functionality, debuggers can also attach to remote programs. They can
also attach to already running programs. Modern Java debuggers support
this to a certain extent.
An interesting aspect of data collection in debuggers is the highly user-
interactive process. The user has fine-grained control over starting and stop-
ping the execution with conditional breakpoints and watchpoints.
As stated before, we do not want to depend on the functionality of a given
tool for producing traces. However, it would be feasible to extend a debugger
with tracing as new functionality. One problem is that the source code of
commercial tools is not available.
History-based Debuggers
The key idea of tracing, namely access to execution history, is also pursued
by a set of trace-based tools which see themselves in the tradition of debug-
gers. Their mission is to consider what information will help the programmer
instead of what information can be provided while the program is running
[Lew03a].
The Omniscient Debugger [Lew03a] provides typical debugger functionality
but additionally generates a trace. It still adheres to the stepwise presentation
of the execution in a source code view. The Omniscient Debugger is a
post-mortem tool, which limits its usefulness for interactive program execution.
It first collects the trace and then allows flexible trace navigation.
It does not provide any more analysis capabilities than a normal debugger.
The Omniscient Debugger instruments Java byte code at load time to generate
the trace file. Tracing uses filters, and the level of detail can be configured.
The trace format is not dedicated to locking problems and cannot identify
blocking.
The reported slowdown factor is 10 to 300. It takes 10 µs to store a method
call and 1 µs to store an assignment; the average is assumed to be 2 µs per
event. Within 20 s, a 2 GB trace file is generated.
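Taken at face value, these figures imply an event count, an average event size and a write rate. The following back-of-the-envelope calculation is our own. It assumes that all 20 s are spent recording events at the 2 µs average and that 2 GB means 2·10^9 bytes.

```latex
N \approx \frac{20\,\mathrm{s}}{2\,\mu\mathrm{s}/\text{event}} = 10^{7}\ \text{events},
\qquad
\frac{2 \cdot 10^{9}\,\mathrm{B}}{10^{7}\ \text{events}} = 200\,\mathrm{B/event},
\qquad
\frac{2 \cdot 10^{9}\,\mathrm{B}}{20\,\mathrm{s}} = 10^{8}\,\mathrm{B/s}
```

Under these assumptions, the trace grows by about 100 MB per second, which illustrates why filtering and a configurable level of detail matter for such tools.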
Only an early predecessor, ZStep95, a debugger for LISP [LF98], also
provided a large variety of visualisations of the history itself at different levels
of abstraction. At the time of its publication, the feature of reversibility was
considered to be the most important new feature. Reversibility means that
the trace can be navigated forwards and backwards.
Tools like JProbe Threadalyzer [JPr00] and Assure Threadanalyzer [Ass]
also need to store data about the execution history. However, they
collect data highly selectively, governed by their intended analysis. They do
not store or output the data as traces but use them for statistical results and
for identifying individual failures such as deadlocks. They collect data via the
JVMPI and the JPDA. These tools existed when we started this thesis, but
as they are not open source, they are neither reusable nor extensible.
Deterministic Debuggers
Tools for deterministic tracing of concurrent Java programs can considerably
improve testing and debugging. The Delta Debugging approach
[ZH02] starts from a nondeterministic or user-defined thread execution
schedule, which is then automatically examined and narrowed down to a more
significant thread schedule. The Dejavu tool [CZ02] supports the execution of a
given deterministic thread schedule using a modified Java virtual machine
called Jalapeno. Dejavu was first used to isolate failure-inducing thread
schedules [CZ02].
Deterministic replay tools also have a long tradition in parallel and
distributed computing, for example the collect-and-replay tool of [CT91].
We consider this a feature that is more closely related to the testing area,
and we envision that approaches like ours can be used together with these
kinds of tools.
7.2.6 Trace-based Tools
Java Dynamic Analyzer (JaDA)
The Java Dynamic Analyzer (JaDA) [BT98] is an academic approach to
concurrent Java tracing. It traces only synchronisation events, and it supports
deterministic tracing by prescribing synchronisation sequences or by replaying
already recorded synchronisation sequences. This approach relies on code
instrumentation (it is not stated whether at source code or byte code
level). For tracing synchronisation, the approach assumes either that
specific libraries that can be instrumented are used or that the code is
transformed to use such libraries. This circumvents the problems with
the instrumentation of the keyword synchronized. For deterministic
execution, thread classes are instrumented such that they inherit from a
specific new class. With this technique, the main thread and the GUI event
handling thread cannot be transformed. The approach also lacks flexibility,
as tracing cannot be switched on and off while the program is running.
Jinsight
IBM Jinsight Version 2.1 is an industrial prototype [Jin, PKV98] and the
most prominent example of a tool combining tracing, analysis and visuali-
sation. Its purpose is the detection of anomalous behaviour in Java with a
focus on object-oriented time and space performance. The tool first collects
data and then post-mortem visualises individual method calls of different
threads.
Jinsight provides two ways to collect data: behaviour tracing and object
population snapshots. For tracing, Jinsight uses the JPDA, albeit with an
instrumented JVM for generating trace files. The trace files are in a
proprietary format with a non-legible encoding, and their definition is not
published. This makes it difficult to reason about their suitability. Because
none of the analysis facilities or views allows a deadlock to be detected at
the granularity of blocked method calls, we conclude that the trace format
does not contain such entries. The tool only traces the locking and release
of locks.
Jinsight provides flexible control supporting remote drive-by-analysis, e.g.
in order to trace web servers to identify problematic behaviour.
The trace implementation has changed over time. Jinsight 2.1 provides
instrumentation via a profiling agent that uses the JVMPI. Jinsight
2.0 provided instrumentation in the form of complete replacement JVMs.
Trace files grow at a rate of 15 to 30 MB/minute. A few tens of megabytes
contain over 1 million events, so one event seems to take a few tens of
bytes in a file. Traces are read and visualised at a rate of 12 MB/minute.
Jinsight visualises traces of up to a few tens of megabytes. Filtering while
reading allows the visualisation of traces that would otherwise be too large.
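The Jinsight rates can be related in the same back-of-the-envelope way. This is our own calculation and assumes constant rates. A trace recorded for a time t_record at growth rate r_grow has size r_grow·t_record and therefore needs t_read = r_grow·t_record / r_read to be read.

```latex
\frac{t_{\mathrm{read}}}{t_{\mathrm{record}}} = \frac{r_{\mathrm{grow}}}{r_{\mathrm{read}}},
\qquad
\frac{15\,\mathrm{MB/min}}{12\,\mathrm{MB/min}} = 1.25,
\qquad
\frac{30\,\mathrm{MB/min}}{12\,\mathrm{MB/min}} = 2.5
```

Reading a trace thus takes roughly 1.25 to 2.5 times as long as recording it. At 15 to 30 MB per minute, only one or two minutes of tracing already reach the few tens of megabytes that Jinsight can visualise without filtering.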
Jinsight itself is no longer being developed, but its concepts have now been
integrated into tools such as the commercial IBM WebSphere [WbS] and IBM
Hyades for Eclipse [Hya].
Tracing in Distributed Environments
The Siemens Test and Monitoring Tool (TMT) is a noteworthy approach
aiming at tracing different kinds of middleware in distributed systems,
for instance Java programs with remote method invocation (RMI)
[CGR01, BCH+00]. It collects trace data via a very low-level "runtime
environment", namely the network layer. It listens to the network traffic and
recombines the network packets sent, for instance into remote method
invocations.
It is important to point out that tracing is not a new concept. Tracing
has a long tradition in the area of telecommunication, where it is used with
domain-specific languages. As a consequence, tracing is often a configurable
facility built into domain-specific operating systems. Since the
telecommunication area faces new challenges, such as reverse engineering its
long-grown systems, new concepts are needed. In the E-CARES project [MW03],
reverse engineering tools using tracing are proposed.
7.2.7 Comparison
We summarise the discussion of related work in Table 7.1. For each
approach, we give the year and classify the trace schema and encoding, the
kind of trace output (file or only runtime data), and the tracing method
(instrumentation or runtime modifications).
We did not include performance in the table because it is published for
only a few approaches. At the end of this chapter, we will discuss performance
in more depth for the approaches that provide measurements and for our
proposed solution. We draw the following conclusions from the related work.
Firstly, no tool provides a trace format schema that fulfils our requirements.
Even those schemata that support tracing of synchronized do not distinguish
between blocking and successful locking. At the time we built our tracer,
there were no extensible frameworks. Nowadays, JSpy or STEP provide useful
support.
Secondly, the only higher-level API is the JPDA with its interfaces. It has
gained practical relevance, has already been used for tracing in Jinsight, and
seems to be more flexible than instrumentation and flexible enough for our
tracing goals. In particular, it seems to be more flexible and more usable
than the JaDA architecture.

Approach | Year | Schema | Encoding | Output | Method
---|---|---|---|---|---
CTF | 2003 | method calls, multithreaded | trees | independent | independent
STEP | 2002 | definable | definable | definable | independent
JSpy/JTrek | 2003 | multithreaded, locking and unlocking | unknown | file/socket streaming | byte code instrumentation
AspectJ | 2004 | executes code before and after method call, no blocking | user defined | user defined | byte code instrumentation
JVMPI | - | method calls, multithreaded | independent | independent | JVM API
JPDA (JVMDI, JDI) | - | thread identifier/class, method entry/exit, callee identifier/class, blocking on locks | independent | independent | JVM API
Runtime | - | thread name, method entry or exit, class name | ASCII | file, console | runtime library
Log4J | - | thread name, method entry/exit, class name | ASCII | file, console | runtime library
Omniscient | 2003 | calls and statements, multithreaded, no locking | compressed by coding in bytes | file | byte code instrumentation
ZStep95 | 1995 | LISP paradigm, not multithreaded | unknown | file | reflection
JProbe, Assure | - | multithreaded, partial trace information for profiling | - | none | unknown
Delta Debugging | 2000 | multithreaded | unknown | file | modified JVM
JaDA | 1998 | interface operations of synchronisation concepts, monitor lock and release, time stamps | unknown | file (per synchronisation object) | application library and source code instrumentation
Jinsight | 1998-2001 | multithreaded, method calls, no blocking | unknown | file | JPDA, modified JVM
TMT | 2000 | Java RMI | unknown | unknown | network listener

Table 7.1: Trace Collection Approaches
7.3 JAVIS-Tracer for Concurrent Java Programs
In this section we define the trace format and the tracing method used by
the JAVIS tracer. We conclude by presenting the resulting architecture of
our tracer.
7.3.1 Trace Format
Schema
The schema is a linear list of entries. The traced events are method entries
and method exits, including constructor calls. Each event record contains
the following entries:
1. a unique name of the thread of control in which the event takes
place
2. the class of the thread of control and a unique internal reference
3. the class of the call target and a unique internal reference
4. the name of the called method with optional parameters, given
by class and reference
5. a boolean synchronisation flag indicating whether a method is syn-
chronized
6. a string indicating the state of a method call, i.e. "enter", "exit"
or "acquire" (for a thread being blocked when attempting to acquire a
lock)
Note that for the concrete values we rely on the conventions used by the JPDA. Threads and objects are identified by the references provided by a JVM, which are integer IDs. In addition, each thread has a name string containing an ordinal number in the JVM, except the main thread, which is simply named "main". To make these IDs and strings unique, they have to be qualified by the host and the JVM instance of the running program, because data from different JVMs can be collected in one trace file. The host is given by its name if accessed within a local network or by an IP address if accessed via the Internet.

beethoven_VM#1_main:java.lang.Thread@beethoven_VM#1_ID#1:Employee@beethoven_VM#1_ID#92:<init>(Bank@beethoven_VM#1_ID#74, Terminal@beethoven_VM#1_ID#91):false:enter
beethoven_VM#1_main:java.lang.Thread@beethoven_VM#1_ID#1:Employee@beethoven_VM#1_ID#-1:<init>:false:exit
beethoven_VM#1_Thread-0:Employee@beethoven_VM#1_ID#90:Employee@beethoven_VM#1_ID#90:run():false:enter
Figure 7.4: Trace Format Example
Note that we do not store the caller within an event record, because the caller can be inferred from the preceding event records within the same thread of control: it is the target object of the most recent method entry in that thread which has not yet exited. The method call entry can also be a constructor call, denoted by "<init>", or a synchronized-block, denoted by "<block>". In the latter case, the target denotes the object used as monitor for the block.
A method entry which is not traced but found on the call stack has the same format as a normal method entry, except that the method name is preceded by the label "<stack>".
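A minimal Java sketch of how a trace reader might recover the caller, assuming the records have already been split into thread, target and state. The class CallerInference and its methods are hypothetical names, not part of JAVIS. Exit records are matched to the most recent open entry of the same thread because, as the exit line in Fig. 7.4 shows (target ID #-1), they do not carry the target. Entries labelled "<stack>" would simply be fed in like ordinary entries so that they seed the call stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: infers callers from the per-thread sequence of event records. */
class CallerInference {
    // For each thread, the targets of method entries that have not yet exited (top = innermost).
    private final Map<String, Deque<String>> open = new HashMap<>();

    /** Handles an "enter" record; returns the caller, or null for a thread's outermost call. */
    String onEnter(String thread, String target) {
        Deque<String> stack = open.computeIfAbsent(thread, k -> new ArrayDeque<>());
        String caller = stack.peek(); // target of the innermost open entry, if any
        stack.push(target);
        return caller;
    }

    /** Handles an "exit" record, which closes the innermost open entry of the thread. */
    void onExit(String thread) {
        Deque<String> stack = open.get(thread);
        if (stack != null && !stack.isEmpty()) {
            stack.pop();
        }
    }
}
```

Keeping one stack per thread is what makes the inference work when records of different threads are interleaved in the trace file.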
Encoding
At present, the trace format is encoded in ASCII. This has the advantage that it can be read by humans without any tools. We also chose it because the general discussion on trace encodings is still open-ended, and there are more arguments against an XML-based solution than for one.
An event record identifies its entries by position, not by field names. Entries are separated by ":". Within an entry, IDs are separated from class names by "@". Each line contains one event record. The sequence of event records reflects the temporal order of the events; we do not use time stamps.
Figure 7.4 gives an example; note, however, that this format was designed for completeness of information rather than for ease of reading. The example contains three consecutive lines from a generated trace file. For this trace, parameters have been suppressed except for constructors, which are denoted by <init>. The first line traces the constructor of an Employee object, the second line its return, and the third line the entry of the run()-method on the created object. Note that start() cannot be traced with the JPDA, only run().
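To make the positional encoding concrete, the following Java sketch splits one trace line into its six entries and separates class names from IDs at the first "@". The class TraceRecord and its field names are hypothetical. The sketch assumes that no entry itself contains a ":", which holds for the lines in Fig. 7.4 but would not hold, for example, for hosts given as IPv6 addresses.

```java
/** Hypothetical parser for one line of a JAVIS-style trace file. */
class TraceRecord {
    final String thread;        // qualified thread name, e.g. beethoven_VM#1_main
    final String threadClass;
    final String threadId;
    final String targetClass;
    final String targetId;      // "..._ID#-1" marks a missing target (exit records)
    final String method;        // may include parameters, e.g. <init>(...)
    final boolean synchronizedCall;
    final String state;         // "enter", "exit" or "acquire"

    TraceRecord(String line) {
        // Entries are identified by position and separated by ':'
        String[] f = line.split(":", -1);
        if (f.length != 6) {
            throw new IllegalArgumentException("expected 6 entries: " + line);
        }
        thread = f[0];
        String[] th = splitAt(f[1]);
        threadClass = th[0];
        threadId = th[1];
        String[] tg = splitAt(f[2]);
        targetClass = tg[0];
        targetId = tg[1];
        method = f[3];
        synchronizedCall = Boolean.parseBoolean(f[4]);
        state = f[5];
    }

    // Class names and IDs are separated by the first '@'
    private static String[] splitAt(String entry) {
        int i = entry.indexOf('@');
        if (i < 0) {
            throw new IllegalArgumentException("missing '@' in " + entry);
        }
        return new String[] {entry.substring(0, i), entry.substring(i + 1)};
    }
}
```

Splitting on the first "@" only matters for the class/ID entries; the method entry is kept whole, so constructor parameters such as "Bank@..." remain intact.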
7.3.2 Trace File Generation
Here, we describe how the trace file is generated using the JPDA. We will
describe our choice according to the dimensions of remoteness, execution
time control, event selection and callback behaviour, and filtering options.
Location
Our solution uses the JDI because it provides more convenience and greater flexibility than the JVMDI. This implies that our tracer runs on a different JVM than the traced program, though potentially on the same computer (see also the right-hand side of Fig. 7.3).
For the JDWP we have chosen TCP/IP [JPD] because it is applicable both when the same host is used and when a remote host is used. This allows remote tracing, which is implemented simply by choosing among the different connection capabilities of the JDI API. The chosen connection uses ports [JPD]. Therefore, the only requirement is that the observed program opens a port when started.
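A minimal Java sketch of the attaching side, assuming the standard socket attaching connector of the JDI ("com.sun.jdi.SocketAttach" with its "hostname" and "port" arguments). The class RemoteAttach and its methods are hypothetical names.

```java
import com.sun.jdi.Bootstrap;
import com.sun.jdi.VirtualMachine;
import com.sun.jdi.connect.AttachingConnector;
import com.sun.jdi.connect.Connector;
import com.sun.jdi.connect.IllegalConnectorArgumentsException;
import java.io.IOException;
import java.util.Map;

/** Hypothetical helper: attaches the tracer to a JVM listening on a JDWP socket. */
class RemoteAttach {
    /** Looks up the socket attaching connector among those offered by the JDI. */
    static AttachingConnector socketConnector() {
        for (AttachingConnector c : Bootstrap.virtualMachineManager().attachingConnectors()) {
            if ("com.sun.jdi.SocketAttach".equals(c.name())) {
                return c;
            }
        }
        throw new IllegalStateException("no socket attaching connector available");
    }

    /** host may be a local host name or an IP address; port is opened by the observed program. */
    static VirtualMachine attach(String host, int port)
            throws IOException, IllegalConnectorArgumentsException {
        AttachingConnector c = socketConnector();
        Map<String, Connector.Argument> args = c.defaultArguments();
        args.get("hostname").setValue(host);
        args.get("port").setValue(Integer.toString(port));
        return c.attach(args); // same code path for local and remote tracing
    }
}
```

The observed program would be started with a JDWP socket on the chosen port, e.g. "java -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000 Main" on current JVMs; Java 1.4 used "-Xdebug -Xrunjdwp:" followed by the same options. Here "Main" and port 8000 are placeholders.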
Time
Here we describe how the phase of execution during which a program is traced can be controlled. There are two ways to start the tracing:
• start the observed program from the tracer
• attach the tracer to an already running program (which also allows re-attaching)
Both are implemented straightforwardly, by connecting to an existing port or by starting the observed program with such a port.
We provide different ways to stop the tracing:
• stop the tracing by user choice
• stop the tracing when a failure is detected
• stop the observed program
Conditional stopping exploits the fact that, via the JPDA, the tracer can not only generate output but also carry out arbitrary computations while observing events. We provide the option that the tracing is stopped when a failure is detected. Details on the online detection will be provided in the next chapter on trace analysis.
For convenience, tracing can also be paused during a run; in this case, the tracer suspends the observed program.
Event Selection
Via the JDI API we select the events for which we provide callback methods. The following events are chosen:
• METHOD ENTRY
• METHOD EXIT
With these events, additional information is transmitted to the callbacks. The information provided contains the name and the ID of the thread of control, the class of the call target, the name of the method, and whether the method is synchronised. In the case of METHOD EXIT, the target is not provided but can be inferred from the matching METHOD ENTRY event. Note, however, that the ID of the target object is missing from both events.
To access information which is not part of the generated events, the JDI API allows the JVM to be suspended when an event is generated. The suspend policy is determined when the events are chosen: either all threads or no thread can be suspended, or only the thread in which the event happened.
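import com.sun.jdi.VirtualMachine;
import com.sun.jdi.request.EventRequest;
import com.sun.jdi.request.EventRequestManager;
import com.sun.jdi.request.MethodEntryRequest;
class Requests {
    // vm is a handle to an attached VM
    static MethodEntryRequest requestMethodEntries(VirtualMachine vm) {
        EventRequestManager mgr = vm.eventRequestManager();
        MethodEntryRequest menr = mgr.createMethodEntryRequest();
        menr.setSuspendPolicy(EventRequest.SUSPEND_ALL); // suspend all threads during callbacks
        menr.enable();
        return menr;
    }
}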
When the JVM is suspended, the callback method can use the thread reference provided by the event to examine the thread's top stack frame, where the missing ID of the target object can be found. When the callback has finished, the tracer ends the suspension and the program continues running.
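The callback mechanism might look as follows in a minimal Java sketch, assuming method entry requests created with SUSPEND_ALL as in the listing above. The class EntryTracer, its record helper and the use of standard output are illustrative assumptions, not the JAVIS code; the JDI calls used (EventQueue.remove(), ThreadReference.frame(), StackFrame.thisObject(), EventSet.resume()) belong to the standard com.sun.jdi API.

```java
import com.sun.jdi.IncompatibleThreadStateException;
import com.sun.jdi.Method;
import com.sun.jdi.ObjectReference;
import com.sun.jdi.ThreadReference;
import com.sun.jdi.VirtualMachine;
import com.sun.jdi.event.Event;
import com.sun.jdi.event.EventQueue;
import com.sun.jdi.event.EventSet;
import com.sun.jdi.event.MethodEntryEvent;
import com.sun.jdi.event.VMDisconnectEvent;

/** Hypothetical METHOD ENTRY callback loop writing records in the format of Fig. 7.4. */
class EntryTracer {
    /** Builds an "enter" record; vm is the host/JVM qualifier, e.g. "beethoven_VM#1". */
    static String record(String vm, String thread, String threadClass, long threadId,
                         String targetClass, long targetId, String method, boolean sync) {
        return vm + "_" + thread + ":" + threadClass + "@" + vm + "_ID#" + threadId + ":"
                + targetClass + "@" + vm + "_ID#" + targetId + ":" + method + ":" + sync + ":enter";
    }

    static void loop(VirtualMachine vm, String qualifier) throws InterruptedException {
        EventQueue queue = vm.eventQueue();
        while (true) {
            EventSet set = queue.remove(); // blocks until the observed VM reports events
            for (Event ev : set) {
                if (ev instanceof VMDisconnectEvent) {
                    return; // observed program has terminated
                }
                if (ev instanceof MethodEntryEvent) {
                    MethodEntryEvent entry = (MethodEntryEvent) ev;
                    ThreadReference t = entry.thread();
                    Method m = entry.method();
                    try {
                        // The VM is suspended, so the top frame reveals the target object
                        ObjectReference target = t.frame(0).thisObject(); // null for static methods
                        long targetId = (target == null) ? -1 : target.uniqueID();
                        // parameters are omitted from the method entry in this sketch
                        System.out.println(record(qualifier, t.name(), t.referenceType().name(),
                                t.uniqueID(), m.declaringType().name(), targetId,
                                m.name() + "()", m.isSynchronized()));
                    } catch (IncompatibleThreadStateException e) {
                        // thread not suspended: the target cannot be inspected, skip the event
                    }
                }
            }
            set.resume(); // end the suspension so that the observed program continues
        }
    }
}
```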
There are no events for blocked method calls. Our idea was therefore to use the suspend mode to query the JVM for the missing information. Experiments with different JVMs have shown that the JDI method ThreadReference.currentContendedMonitor(), which is supposed to return the monitor a given thread is trying to acquire, is not supported. The method ThreadReference.status() returns whether a thread is blocked, but not the lock it is waiting for. However, other experiments have shown that some JVMs already provide the stack frame for the method call which is still waiting for the lock. In this stack frame we can find the ID of the call target and hence the ID of the lock. The Solaris JVM for Java 1.4 (see also the description in [Wey01]) provides this kind of information, while the JVM for Java 1.4 from SUN [Jav01] does not.
Combined with querying the thread status, we can hence determine blocked calls. These queries are, of course, triggered by other events. During tracing, references to all threads are stored. Whenever an entry event for a synchronized-method happens, all other threads are queried for their monitor status. For each blocked thread, we determine the object whose lock it is waiting for. We can only do this for threads whose status has changed by the time of such a query. Therefore, if a thread waits for a lock only briefly, we might not be able to trace it. For the purpose of detecting blocking involved in failures, however, this approach is sufficient.
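The status-change logic of this sampling scheme can be modelled in plain Java, leaving out the JDI queries themselves. In this hypothetical sketch (BlockedCallSampler is not a JAVIS class), the map passed in stands for the snapshot taken at one synchronized-method entry, mapping each blocked thread to the object whose lock it waits for. An "acquire" record is reported only for threads that were not already blocked at the previous query, so a thread that blocks and unblocks between two queries is never seen.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of reporting blocked calls from periodic thread-status snapshots. */
class BlockedCallSampler {
    private final Set<String> blockedAtLastQuery = new HashSet<>();

    /**
     * @param blockedOn snapshot at a synchronized-method entry: blocked thread -> lock object
     * @return the threads that became blocked since the previous query, with their lock
     */
    Map<String, String> onSynchronizedEntry(Map<String, String> blockedOn) {
        Map<String, String> acquireRecords = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : blockedOn.entrySet()) {
            // a thread still blocked since the last query is not reported again
            if (!blockedAtLastQuery.contains(e.getKey())) {
                acquireRecords.put(e.getKey(), e.getValue());
            }
        }
        blockedAtLastQuery.clear();
        blockedAtLastQuery.addAll(blockedOn.keySet());
        return acquireRecords;
    }
}
```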
Entry and exit of synchronized-blocks are not predefined events in the JDI. Unfortunately, the aforementioned solution cannot be used with synchronized-blocks because they do not appear on the stack. In general, one could extend the JVMs with the desired functionality, but this was not the focus of this thesis. As Java technology, and especially its APIs, is steadily evolving, we hope that better support will be provided in future. For such problems, feature enhancement requests can be sent to the designers of the Java runtime environments. For now, our tracer cannot cover synchronized-blocks.
Generating Trace Output
We have already described how the callbacks determine the needed information. Each callback writes the information to a file, to a console and to a text view in the tool.
By default, traces obtained from different JVMs are also written to the same file; they can be distinguished by the JVM and host identifiers.
Our solution could easily be extended with a streaming facility, e.g. writing the file output to shared memory or to a socket [JPD].
Filtering
In order to reduce trace data, we provide two kinds of filtering:
• default filtering of all Java packages (java.*, javax.*) and of JVM-related packages (ibm.*, sun.*, com.sun.*)
• customisable filtering of arbitrary packages
Filters are added as exclusion filters to the event requests:
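import com.sun.jdi.request.MethodEntryRequest;
class Filters {
    // exclude JDK and JVM-internal packages from an event request
    static void addDefaultFilters(MethodEntryRequest menr) {
        String[] excludes = {"java.*", "javax.*", "sun.*", "com.sun.*", "ibm.*"};
        menr.disable(); // filters can only be added to a disabled request
        for (String pattern : excludes) {
            menr.addClassExclusionFilter(pattern);
        }
        menr.enable();
    }
}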
In later Java versions [Jav04], it also became possible to define filters dynamically; we were not able to use this feature when we built our prototype.
Performance and Size
As no performance data was available for the JPDA, we conducted a measurement experiment [Meh03]. Note that the following measurements include remote data gathering via the JDI, with both JVMs running on the same machine. The settings for the experiments are the same as for the intended usage scenario of the tracer.
• 1000 method calls take 80 seconds when the full trace information is gathered, i.e. when the JDI suspend mechanism is used.
• 1000 method calls take only 4 seconds when simpler trace information is gathered, i.e. a trace with the same number of calls but without the identifier of the called object and without detection of blocked method calls, which requires no suspension.
• 1000 method calls generate 2000 entries of 85 bytes on average, altogether 170 KBytes.
• With suspension, tracing via the JPDA is a factor of 100,000 slower than the ordinary execution of a Java program; without suspension it is a factor of 4,000 slower.
Other JPDA applications show similar values. The Omniscient debugger is considerably faster because it uses load-time instrumentation, and Jinsight is also faster because it uses an instrumented JVM.
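As a quick consistency check, the measurements above can be converted into per-call costs and a total trace size; the derivation uses only the stated numbers.

```latex
\begin{aligned}
\text{with suspension:}\quad & \frac{80\,\mathrm{s}}{1000\ \text{calls}} = 80\,\mathrm{ms}\ \text{per call},\\
\text{without suspension:}\quad & \frac{4\,\mathrm{s}}{1000\ \text{calls}} = 4\,\mathrm{ms}\ \text{per call},\\
\text{trace size:}\quad & 1000 \times 2\ \text{entries} \times 85\,\mathrm{B} = 170\,000\,\mathrm{B} \approx 170\,\mathrm{KB},\\
\text{implied untraced call:}\quad & \frac{80\,\mathrm{ms}}{100\,000} = 0.8\,\mu\mathrm{s}, \qquad \frac{4\,\mathrm{ms}}{4\,000} = 1\,\mu\mathrm{s}.
\end{aligned}
```

Suspension thus makes each traced call about 20 times more expensive, and both slowdown factors point to an untraced call cost of roughly one microsecond on the measurement machine.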
7.3.3 Tracer Architecture
We conclude this chapter by presenting an overview of the design of the tracer (see Fig. 7.5). The tracer is itself multithreaded.
• The class Trace contains the main functionality of the application. It can open connections to one or more JVMs.
• Class Connect is responsible for setting up and closing down the connection to one JVM. It is also responsible for handling unforeseen connection problems and for taking action if the connection breaks unexpectedly, for instance by closing open files. Therefore, Connect has a link to the Protocoller class. It is useful to have a separate thread for these tasks.
Figure 7.5: Class Diagram of the Tracer
• Class EventThread provides the callbacks. The callbacks are triggered when an event occurs for which the tracer is registered. It is better not to use the main thread for the callback computations, so a separate thread is used. The callbacks send trace data to the Protocoller.
If online failure detection is enabled, an instance of EventThread creates a separate Analyzer thread for failure detection. The results it receives from the Analyzer are also sent to the Protocoller and are hence inserted into the trace file.
• Class Protocoller receives entries from one or more EventThreads. It maintains a buffer of trace file entries, which is flushed to the trace file from time to time. Again, it is useful to have a separate thread.
• The Analyzer class can be used for online detection of failures. It uses a Graph class, which provides a graph data structure with thread dependencies used for detecting failures. The Analyzer sends its results to the EventThread.
The classes Analyzer and Graph will be described in detail in the next chapter, where we present the algorithms for failure and potential detection.
The main behaviour scenario of these classes is as follows. The main class creates the user interface. For each program whose execution is to be observed, it creates a Connect thread. It also creates the Protocoller thread, which is shared by all observed programs. The Connect thread sets up the connection and then creates the EventThread for this connection. If the user has enabled online failure detection, the EventThread creates an Analyzer thread when a blocking method call event is traced.
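The buffering role of the Protocoller can be illustrated with a small Java sketch. This is a hypothetical reconstruction, not the JAVIS class: the queue-based hand-over, the batch size and the 100 ms idle flush are assumptions chosen for the example. Several EventThreads may call log() concurrently, while a single writer thread preserves the order of records per producer and writes one record per line.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical Protocoller-like thread: buffers trace records and flushes them in batches. */
class Protocoller extends Thread {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Writer out;
    private final int batchSize;
    private volatile boolean closing = false;

    Protocoller(Writer out, int batchSize) {
        this.out = out;
        this.batchSize = batchSize;
    }

    /** Called by any number of EventThreads; never blocks the caller. */
    void log(String record) {
        queue.add(record);
    }

    @Override
    public void run() {
        List<String> buffer = new ArrayList<>();
        try {
            while (!closing || !queue.isEmpty()) {
                String r = queue.poll(100, TimeUnit.MILLISECONDS);
                if (r != null) {
                    buffer.add(r);
                }
                // flush when the buffer is full or when the queue has been idle
                if (buffer.size() >= batchSize || (r == null && !buffer.isEmpty())) {
                    flush(buffer);
                }
            }
            flush(buffer);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private void flush(List<String> buffer) throws IOException {
        for (String r : buffer) {
            out.write(r);
            out.write('\n'); // one event record per line
        }
        out.flush();
        buffer.clear();
    }

    /** Lets the thread drain the queue, then waits for the final flush. */
    void shutdown() throws InterruptedException {
        closing = true;
        join();
    }
}
```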
7.4 Summary
In this chapter we have presented an in-depth discussion of tracing facilities.
We have provided a solution using standard technology of the Java platform.
The concepts for tracing have been implemented in the JAVIS prototype.
The tracing facilities lay the ground for the following chapters.
We have presented the JPDA-based solution first in [Wey01] and [Meh02].
We also carried out an experiment to measure the performance of the solu-
tion, first presented in [Meh03].
Chapter 8
Trace-based Failure and Potential Analysis
In this chapter we draw on the precise failure and potential specifications in
order to derive algorithms to detect them in traces collected from a program
execution. We present existing algorithms and describe the algorithms chosen for our
prototype implementation JAVIS.
8.1 Analysis Requirements
Analysis depends on the preceding data collection, which provides the input
to the analysis algorithms. The results of the analysis are in turn the input to
the subsequent failure visualisation. Therefore, analysis is depicted in Fig.
8.1 as a component labelled data analysis layered between data collection
and data visualisation. The goals of the analysis algorithms, i.e. the failures
and potentials which have to be detected, are depicted by the arc labelled
failure domain in Fig. 8.1. The detailed requirements for analysis are given
in the following.
8.1.1 Functional Requirements
The functional requirement is that each occurrence of a failure or a potential in a given trace should be detected. Note that our approach, which analyses traces of program executions, should not be confused with source code based approaches, which aim to detect concurrency failures by analysing the source code.
The input to the algorithms is a trace in the format described in the previous chapter. The trace format is designed such that it contains sufficient information for the desired detection algorithms. Therefore, the communication between data collection and data analysis is straightforward: first the trace data is collected, then the algorithms can be executed. It is not required that the analysis algorithms control what kind of data is collected, although this would be conceivable. For instance, the analysis component could change filters during tracing.
Figure 8.1: Requirements for Data Analysis (diagram elements: Data Collection, Data Analysis, Data Visualisation, Failure Domain, Usage Context, Standards)
As pointed out in the previous chapter, our traces include the information that a thread is blocked on a lock. This is necessary for the detection of failures involving blocking, namely deadlock, lockout, and join-induced deadlock. For the detection of the corresponding potentials, only successful locking is required. This means that potential detection could also work with a trace format containing less information than ours. These details are important if the algorithms are implemented in a different context.
Although a trace is already easier to analyse than source code, algorithms for potential detection can incur intolerable overhead. As a consequence, it might be desirable to use a faster but less accurate algorithm. Where algorithms cannot produce accurate results, it should be indicated what kind of misses or false positives are to be expected.
We do not prescribe a format for the output of analysis results but propose to include them in the trace files. This will prove convenient for the visualisation component described in the next chapter, where we describe our own solution for reading trace file input.
8.1.2 Time and Space Complexity
As already pointed out, detection should compute results in a tolerable time.
Also, space consumption could become an issue. Therefore, it is important
to consider the time and space complexity of the specified algorithms.
Concerning time-efficient computations, reducing trace size already dur-
ing tracing can help. To this end, we have already introduced filters in the
previous chapter.
8.2 Related Work
Here we describe algorithms which have been proposed in the literature for the
failures and potentials we are interested in. We present solutions for Java
but also for other widely used object-oriented programming languages.
8.2.1 Deadlock Detection
Detection of cyclic dependencies in a trace of a concurrent program is not a hard problem. This may explain why there is little literature on deadlock detection for languages like Java, while at the same time many commercial tools implement it.
Operating Systems Support
Deadlock detection algorithms have a long tradition in operating systems, where they are used to detect deadlocks of processes. Most often, the Banker's algorithm and the reduction of resource dependency graphs are cited [Tan97].
These two algorithms can be used for deadlock detection but also for computing deadlock-free resource allocation orders. They rest on the assumptions (i) that the resources a process will need are completely known in advance, and (ii) that a resource allocation system can decide to which process a resource is granted first.
These assumptions do not hold for Java locks granted to threads. A lock is granted when a thread tries to acquire it and it is free. If several threads are claiming a lock, one of them is chosen arbitrarily. Therefore, we cannot simply implement the Banker's algorithm or the resource graph reduction algorithm to detect Java deadlocks or their potentials.
Commercial Tools for Java and Similar Languages
Deadlock detection for Java can be found in many commercial tools, for instance JBuilder [JBd01], JProbe Threadalyzer [JPr00], and Assure Threadanalyzer [Ass] for Java. Note that these tools do not cover failures similar to deadlock, such as lockout or join-induced deadlock. The above-mentioned commercial tools do not publish the concrete implementation of their deadlock detection algorithms. In these tools, deadlock detection is often combined with the detection of deadlock potentials, which will be discussed in the next section.
We have already mentioned in the introduction that COMPAQ observed
that users of the pthreads library did not have appropriate support for con-
currency problems. Therefore, COMPAQ built Visual Threads [Har00] to
provide runtime analysis of deadlocks, deadlock potentials and race
problems. The definition of a deadlock is specific to pthreads. It covers deadlocks
involving mutex locks, joining, and read-write locks as found in the pthread
library. For the deadlock detection, Visual Threads requires that at runtime
blocking can be traced. This is the same observation we have made.
The deadlock detection algorithm identifies cycles in a thread dependency graph. This graph contains thread nodes and directed edges, an edge meaning that one thread depends on another for its progress. The algorithm starts only when a thread changes from running to blocked. This blocked thread is marked with a fresh value, and then the algorithm recursively visits all threads on which the thread depends. For each new thread visited, it is checked whether the thread has already been marked with the same value. If so, a cycle is detected. Otherwise, the thread is marked with the same value. Using 32-bit integers as marks, the marks are unique (until the counter wraps around) and do not have to be cleared. With this algorithm, not only a cycle involving the start thread can be detected but also a cycle which can be reached from the start thread.
The complexity of this algorithm is not given in [Har00]. Assuming that a thread has at most one dependency edge to another thread and that a thread appears at most once in the dependency graph, the complexity of detecting a cycle is linear in the number t of threads, i.e. O(t). The complexity of maintaining the dependency graph cannot be determined without knowing more about its implementation.
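A minimal Java sketch of this marking scheme, assuming, as in the complexity estimate above, that each thread depends on at most one other thread. [Har00] does not publish code, so the class MarkingCycleDetector and its methods are hypothetical. Because every search uses a fresh mark, marks left over from earlier searches never have to be cleared, and a cycle that is merely reachable from the start thread is found as well.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical mark-based cycle detection in a thread dependency graph (out-degree <= 1). */
class MarkingCycleDetector {
    private final Map<String, String> dependsOn = new HashMap<>(); // thread -> thread it waits for
    private final Map<String, Integer> marks = new HashMap<>();
    private int lastMark = 0;

    void addDependency(String from, String to) { dependsOn.put(from, to); }

    void removeDependency(String from) { dependsOn.remove(from); }

    /** Called when 'start' becomes blocked; returns the cycle reached from it, or an empty list. */
    List<String> detect(String start) {
        int mark = ++lastMark; // fresh value per search, so old marks never need clearing
        String t = start;
        while (t != null) {
            Integer m = marks.get(t);
            if (m != null && m == mark) {
                return cycleThrough(t); // visited twice in this search: a cycle
            }
            marks.put(t, mark);
            t = dependsOn.get(t);
        }
        return Collections.emptyList(); // chain ended at a thread that is not blocked
    }

    private List<String> cycleThrough(String entry) {
        List<String> cycle = new ArrayList<>();
        String t = entry;
        do {
            cycle.add(t);
            t = dependsOn.get(t);
        } while (!entry.equals(t));
        return cycle;
    }
}
```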
8.2.2 Deadlock Potential Detection
Here, we describe the solutions found in Visual Threads [Har00] and in Java PathExplorer (JPAX) [Hav00, HR01].
JPAX [Hav00, HR01] has a facility which checks traces against temporal logic formulas and, more interesting in our context, which checks for the potentials of deadlock and data race. The deadlock potential detection algorithm called GoodLock [Hav00] constructs a lock hierarchy for each thread from information about a running program.
Then the lock hierarchies of different threads are compared to find pairs of locks which are held at the same time but were acquired in opposite order and, most importantly, which are not protected by a commonly used lock acquired beforehand. This improves on other algorithms, which only detect such pairs and hence produce false positives. The presented implementation only deals with deadlock potentials between two threads.
Therefore, a subsequent publication described an algorithm which can detect deadlock potentials between many threads [HR01]. Two data structures are maintained by this algorithm. A thread map keeps track of which locks are owned by each thread at any point in time. The second data structure, a lock graph, accumulates all the locks taken by threads during an execution, recording locking orders as edges: an edge is introduced from a lock l1 to a lock l2 if a thread owns l1 while taking l2. If this graph ever becomes cyclic, the program has a deadlock potential. The complexity is not given, and as the algorithm itself is not given precisely, we cannot provide a precise assessment either.
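The thread map and lock graph can be illustrated with a hypothetical Java sketch. It is our own reading of the prose description of [HR01], not the published implementation. Re-entrant acquisitions are counted and add no edges, since re-entering a lock that is already held cannot block, and a depth-first search reports whether the accumulated graph contains a cycle.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of lock-order tracking with a thread map and a lock graph. */
class LockGraph {
    // thread map: locks currently held by each thread, with re-entrance counts
    private final Map<String, Map<String, Integer>> held = new HashMap<>();
    // lock graph: edge l1 -> l2 if some thread held l1 while taking l2
    private final Map<String, Set<String>> edges = new HashMap<>();

    void acquire(String thread, String lock) {
        Map<String, Integer> locks = held.computeIfAbsent(thread, k -> new HashMap<>());
        Integer n = locks.get(lock);
        if (n != null) {
            locks.put(lock, n + 1); // re-entrant acquisition: no new ordering
            return;
        }
        for (String h : locks.keySet()) {
            edges.computeIfAbsent(h, k -> new HashSet<>()).add(lock);
        }
        locks.put(lock, 1);
    }

    void release(String thread, String lock) {
        Map<String, Integer> locks = held.get(thread);
        Integer n = (locks == null) ? null : locks.get(lock);
        if (n == null) {
            return;
        }
        if (n > 1) {
            locks.put(lock, n - 1);
        } else {
            locks.remove(lock);
        }
    }

    /** A cycle in the accumulated lock graph indicates a deadlock potential. */
    boolean hasCycle() {
        Set<String> done = new HashSet<>();
        Set<String> onPath = new HashSet<>();
        for (String lock : edges.keySet()) {
            if (dfs(lock, done, onPath)) {
                return true;
            }
        }
        return false;
    }

    private boolean dfs(String lock, Set<String> done, Set<String> onPath) {
        if (onPath.contains(lock)) return true;  // back edge: cycle
        if (!done.add(lock)) return false;       // already fully explored
        onPath.add(lock);
        for (String next : edges.getOrDefault(lock, Collections.emptySet())) {
            if (dfs(next, done, onPath)) return true;
        }
        onPath.remove(lock);
        return false;
    }
}
```

Note that the potential is reported even though the two threads in the test below never run concurrently; this is exactly what distinguishes potential detection from deadlock detection.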
It has been noted by [Hav00] that deadlock potential detection in traces
using a model checker suffers from the state explosion problem. Local dead-
locks, i.e. deadlocks which are only possible due to a part of the lock hierar-
chy, cannot always be found. Therefore, the detection of potentials using a
dedicated solution is more feasible and to this end the GoodLock algorithm
was proposed.
Visual Threads also provides detection of deadlock potential [Har00]. The
algorithm is only coarsely sketched. It detects inconsistent lock acquisition
order by maintaining a set of must-not-be-locked-before relationship pairs.
When a new lock is acquired a search is performed to find any existing lock
order pairs involving the new lock and each of the other locks already held
by the thread. The must-not-be-locked-before relationship is transitive and
the search recursively follows chains of relationships. If the search fails to
find any inconsistencies with previous execution behaviour, then new must-
not-be-locked-before relationship pairs are created using the new lock and
each lock currently held. This algorithm does not avoid false positives caused by
gate locks. The complexity of the algorithm is not given.
8.2.3 Failures and Potentials Involving wait() or join()
For other failures, such as the ones described by [Lea00] and the ones we have identified, we could not find any algorithms in the literature. The reason might be that they are very specific to the Java monitor concept and that some of them are very similar to deadlock because they involve cycles, so that existing algorithms can easily be adapted to them. We could not find any literature on the corresponding potentials either.
8.3 JAVIS-Algorithms for Liveness Failures and Potentials
In this section, we describe our choice of a cycle detection algorithm, using deadlock as an example, and we propose an algorithm for detecting missed notifications.
8.3.1 Cycle Detection
All failures involving cyclic dependencies based on blocking, joining and waiting can be detected in the same way: a dependency graph is constructed, and then the algorithm detects cycles in the graph. Here we describe the idea using deadlock detection as an example.
Graph Structure
Dependencies exist between threads and monitor objects or between threads. Threads and objects become nodes of the graph; locking and blocking are represented as edges.
The exact definition of the graph is as follows. Nodes are of type thread or object, and edges are of type locking or acquire; edges can only be inserted between a thread and an object. Each locking edge carries a weight which counts the re-entrant acquisitions of the lock by its thread; otherwise one would not know when to delete a locking edge. Note that the graph is not necessarily connected.
A cycle is a sequence of alternating locking and acquire edges which starts and ends at the same node. It may start from a thread which is connected by a locking edge to an object, which is connected by an acquire edge to a (different) thread, which is connected by a locking edge to a (different) object, which is connected by an acquire edge to the first thread. More object-thread pairs may also be involved. This situation is depicted in Fig. 8.2.
Figure 8.2: Cycle in Graph Structure (thread and object nodes connected by alternating locking and acquire edges)
Algorithm
The algorithm has two main tasks.
• It creates and deletes thread and object nodes as well as locking and acquire edges as the program proceeds (or as the trace is read).
• It checks for cycles starting from a thread node whenever a new acquire edge is inserted.
For maintaining the graph, the algorithm inserts an acquire edge for a blocked call. For a synchronised method entry, it inserts a locking edge with weight one if none exists; if one exists, it increments the weight by one. If the thread or the object node does not yet exist, it is created as well. When a locking edge is created, an existing acquire edge of the thread for that object is deleted. For a synchronised method exit, the weight of the locking edge is decremented. If the weight reaches zero, the edge is deleted, and if the object has no other edges, the object is deleted as well. A thread is only deleted after the method exit of run().
The cycle detection is triggered when a new acquire edge is inserted. For each thread it is checked whether it has an acquire edge. The online algorithm can check the monitor status with the JVM, which requires that all thread references provided by the JVM are kept until the thread exits. The offline algorithm has to check whether the thread node has an acquire edge.
1. Determine all threads with an acquire edge.
2. Take one and follow its acquire edge to the object.
3. Repeat the following until there is no such edge or the starting thread is reached again: if the object has a locking edge, follow it to the thread; then, if the thread has an acquire edge, follow it to the object.
4. If the last thread reached is the starting thread, a deadlock is detected.
5. Delete the starting thread from the determined threads.
6. If a deadlock was detected, also delete all other threads involved from the determined threads.
The detected cycles have to be stored, e.g. in the trace file.
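The maintenance operations and steps 1 to 6 above can be sketched in Java as follows. This is a hypothetical illustration, not the Analyzer and Graph classes of JAVIS: threads and objects are plain strings, object nodes exist only implicitly as keys of the edge maps, and deleting a thread after run() is omitted. Beyond the steps above, the walk is also bounded by the number of blocked threads, since a cycle that is merely reachable from the starting thread would otherwise be followed forever.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the deadlock graph with weighted locking edges. */
class DeadlockGraph {
    private final Map<String, String> acquire = new HashMap<>(); // thread -> object it waits for
    private final Map<String, String> owner = new HashMap<>();   // object -> thread (locking edge)
    private final Map<String, Integer> weight = new HashMap<>(); // object -> re-entrance weight

    /** Blocked call: insert an acquire edge, then run the cycle detection. */
    List<List<String>> blocked(String thread, String object) {
        acquire.put(thread, object);
        return detectCycles();
    }

    /** Synchronized method entry: create the locking edge or increment its weight. */
    void enter(String thread, String object) {
        acquire.remove(thread); // a pending acquire edge has been satisfied
        owner.put(object, thread);
        weight.merge(object, 1, Integer::sum);
    }

    /** Synchronized method exit: decrement the weight and delete the edge at zero. */
    void exit(String object) {
        Integer w = weight.get(object);
        if (w == null) {
            return;
        }
        if (w > 1) {
            weight.put(object, w - 1);
        } else {
            weight.remove(object);
            owner.remove(object);
        }
    }

    String ownerOf(String object) {
        return owner.get(object);
    }

    /** Steps 1 to 6: returns the threads of every detected cycle. */
    List<List<String>> detectCycles() {
        List<List<String>> cycles = new ArrayList<>();
        Set<String> candidates = new LinkedHashSet<>(acquire.keySet()); // step 1
        while (!candidates.isEmpty()) {
            String start = candidates.iterator().next(); // step 2
            List<String> path = new ArrayList<>();
            boolean deadlock = false;
            String t = start;
            // step 3, bounded by the number of blocked threads
            for (int i = 0; i < acquire.size() && t != null; i++) {
                path.add(t);
                String object = acquire.get(t);
                t = (object == null) ? null : owner.get(object);
                if (start.equals(t)) { // step 4
                    deadlock = true;
                    break;
                }
            }
            candidates.remove(start); // step 5
            if (deadlock) {
                cycles.add(path);
                candidates.removeAll(path); // step 6
            }
        }
        return cycles;
    }
}
```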
Complexity
Here we determine the time complexity, which depends on the number of threads t and the number of objects o. Space complexity also depends on t and o and is not considered an issue here.
The complexity of a maintenance operation is as follows. Finding the thread in a list can be accomplished in linear time O(t); the worst case is that the thread is not yet part of the list. In the case of a successful locking, the object may not yet be part of the graph. Then the insertion of the object and the locking edge has complexity O(1). If the object exists, its locking edge to the thread can also be updated in O(1). In the case of acquire, an edge has to be inserted to an object which already exists. If the objects are additionally kept in a list, finding the object takes O(o). If the object can only be found via the threads, the complexity is O(t) plus O(o), because each object can be reached through only one thread. Deletions are analogous. With best practice, the overall complexity is therefore O(t).
The complexity of the detection part is as follows. Each thread has to be
checked for the existence of the acquire edge which takes O(t). The fan-out
for leaving a thread by an acquire edge and for leaving an object by a locking
edge is at most one. Therefore, following one edge is O(1). In the worst case
all threads are involved in the deadlock and therefore following the cycle back
to the starting thread takes O(t). Therefore, the overall complexity is O(t).
If used online, the algorithm is triggered each time a new acquire edge is inserted, so many edges are visited again and again. An offline variant could improve on this by marking already visited paths. For a proof-of-concept prototype, this is considered efficient enough.
Implementation
We have implemented the online variant in the Analyzer class of the tracing
system (see Fig. 7.5). When a deadlock is found, the cycle found is added to
the trace file, and the tracing is stopped. The implementation is described
in more detail in [Wey01].
Comparison
Our cycle detection is simpler than the known algorithms for detecting arbitrary cycles, because it only checks whether a given node is involved in a cycle, and each node has at most one outgoing edge which can lead into a cycle. Therefore, we do not need marking as in [Har00]. If the algorithm were also to detect cycles which do not involve the starting thread but can be reached from it, we would need to keep track of each thread visited or to employ a constant marker during one search.
8.3.2 Missed Notification
Recall that a missed notification occurs when a thread cannot return from
wait() because there is no subsequent notification, although there was a
notification before this wait(). Hence, when we detect a thread that has been
waiting for a long time, we can start to search for a missed notification. To
this end, the analysis component should have a configurable timer.
A missed notification is a notification which did not have an effect be-
cause it happened too early. Instead of looking only for one such notification,
we propose an algorithm which also finds other notifications that did not have
an effect. These can also be seen as potentials for missed notifications.
To detect ineffective notifications, we propose counting wait() and
notify() calls for each object. A counter maintains the current balance
between these calls. There is one counter for each Java object.
• The counter is initialised with zero.
• A wait()-call increments the counter by one.
• A notify()-call decrements the counter by one. If the counter drops below
zero, the notify() did not wake up any waiting thread. In this case, a
warning should be issued pointing to the notify()-call causing the warning.
After that, the counter is reset to zero, because notifications cannot be
stored and hence one has to start counting the balance again.
• A notifyAll()-call resets the counter to zero if it was positive, because
all waiting threads are woken up. If the counter was already zero, a warning
is issued because no thread could be notified. The counter remains zero, as
in the previous case.
This algorithm can be used both online and offline. For each call to wait(),
notify(), and notifyAll(), it involves only one integer operation and one
comparison. A counter is only created when the Java object is first used as a lock.
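A minimal Java sketch of the counting scheme follows, under the assumption that the analysis receives wait(), notify(), and notifyAll() events tagged with a string object ID and a call site. The class NotificationBalance and its method names are hypothetical. Counters are created lazily on the first event for an object, which approximates creating them when the object is first used as a lock.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the wait/notify balance counters; event method
 * names and string object IDs are illustrative assumptions.
 */
class NotificationBalance {
    // One counter per object ID, created lazily on the first event (absent means zero).
    private final Map<String, Integer> balance = new HashMap<>();
    private final List<String> warnings = new ArrayList<>();

    void onWait(String obj) {
        balance.merge(obj, 1, Integer::sum);
    }

    void onNotify(String obj, String callSite) {
        int value = balance.getOrDefault(obj, 0) - 1;
        if (value < 0) {
            warnings.add("ineffective notify() at " + callSite);
            value = 0; // notifications are not stored: restart counting
        }
        balance.put(obj, value);
    }

    void onNotifyAll(String obj, String callSite) {
        if (balance.getOrDefault(obj, 0) == 0) {
            warnings.add("ineffective notifyAll() at " + callSite);
        }
        balance.put(obj, 0); // all waiting threads have been woken up
    }

    /** Number of wait() calls on obj not yet matched by a notification. */
    int pendingWaits(String obj) {
        return balance.getOrDefault(obj, 0);
    }

    List<String> warnings() {
        return warnings;
    }

    public static void main(String[] args) {
        NotificationBalance nb = new NotificationBalance();
        nb.onNotify("buffer", "Producer.put"); // nobody waits yet: the notification is lost
        nb.onWait("buffer");                   // a consumer waits with no further notify()
        System.out.println(nb.warnings());     // [ineffective notify() at Producer.put]
        System.out.println(nb.pendingWaits("buffer")); // 1
    }
}
```

In main, the notify() arrives before any thread waits and is reported; the subsequent wait() leaves a pending count of one. Combined with the configurable timer, a pending count that stays positive would be the point at which to search the earlier warnings for the missed notification.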
8.3.3 Potentials for Cyclic Dependencies
For the deadlock potential and other potentials of cyclic failures, we do not
have to provide a new algorithm, as we think that the solution by [Hav00,
HR01] is also feasible for our trace-based approach. The graph used for
deadlock detection can be extended to store lock hierarchies.
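To give an idea of how lock hierarchies could be stored, the following hypothetical sketch records an edge from lock a to lock b whenever a thread acquires b while holding a. It reports a deadlock potential when this lock-order graph contains a cycle. It is a simplification in the spirit of lock-graph approaches, not a reproduction of the algorithm in [Hav00, HR01]. In particular, it may also flag lock orders that cannot deadlock, for example when a common outer lock serialises the conflicting sections.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical lock-order graph; a simplification, not the cited algorithm. */
class LockOrderGraph {
    // edge a -> b: some thread acquired lock b while holding lock a
    private final Map<String, Set<String>> order = new HashMap<>();
    // locks currently held by each thread
    private final Map<String, Deque<String>> held = new HashMap<>();

    void acquired(String thread, String lock) {
        Deque<String> locks = held.computeIfAbsent(thread, k -> new ArrayDeque<>());
        for (String outer : locks) {
            if (!outer.equals(lock)) {
                order.computeIfAbsent(outer, k -> new HashSet<>()).add(lock);
            }
        }
        locks.push(lock);
    }

    void released(String thread, String lock) {
        Deque<String> locks = held.get(thread);
        if (locks != null) {
            locks.remove(lock);
        }
    }

    /** A cycle in the lock order means that two threads could block each other. */
    boolean hasDeadlockPotential() {
        Set<String> finished = new HashSet<>();
        for (String lock : order.keySet()) {
            if (reachesCycle(lock, new HashSet<>(), finished)) {
                return true;
            }
        }
        return false;
    }

    // Depth-first search; 'path' holds the locks on the current search path.
    private boolean reachesCycle(String lock, Set<String> path, Set<String> finished) {
        if (path.contains(lock)) return true;
        if (finished.contains(lock)) return false;
        path.add(lock);
        for (String next : order.getOrDefault(lock, Set.of())) {
            if (reachesCycle(next, path, finished)) return true;
        }
        path.remove(lock);
        finished.add(lock);
        return false;
    }

    public static void main(String[] args) {
        LockOrderGraph g = new LockOrderGraph();
        g.acquired("T1", "A"); g.acquired("T1", "B");  // T1 takes A, then B
        g.released("T1", "B"); g.released("T1", "A");
        System.out.println(g.hasDeadlockPotential()); // false
        g.acquired("T2", "B"); g.acquired("T2", "A");  // T2 takes B, then A
        System.out.println(g.hasDeadlockPotential()); // true
    }
}
```

The example trace never deadlocks, yet the opposite lock orders of T1 and T2 are reported as a potential.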
8.4 Summary
In this chapter we have given an overview of algorithms for detecting concur-
rent liveness failures and potentials in traces. We have adapted a deadlock
detection algorithm which works with our trace format and whose results
serve as input for the visualisation described in the next chapter. We have
also described a new algorithm for detecting missed notifications and poten-
tials for missed notifications.
The deadlock detection has been implemented in the JAVIS prototype.
A first version of the analysis was described in [Wey01] and [Meh02].
Chapter 9
Trace-based Failure and
Potential Visualisation
In this chapter we describe how the traces and the analysis results from
failure detection are visualised in the JAVIS environment. Based on the
trace format from Chapter 7 and based on the detection algorithms from
Chapter 8 we can now derive requirements for the visual language and also
for the tool environment in which the visualisation will be generated.
After refining the requirements we will discuss related work. We will
discuss visual languages which can be found in tools for visualising the ex-
ecution of object-oriented and concurrent programs and we will discuss the
visual modelling language UML (Unified Modeling Language) [UMLa] which
has been introduced to support modelling at the different phases of the soft-
ware lifecycle. We will also discuss visualisation environments and visual
language editors.
We will then present a language for trace and failure visualisation which
extends the UML. Finally, we will present a visualisation environment which
translates traces in our trace format and the results of failure analysis into
this language and displays the visualisation.
9.1 Visualisation Requirements
The visualisation depends primarily on the analysis results which it has to
visualise (see also Fig. 9.1). If the failures are visualised in isolation from
the surrounding execution, they may become less understandable. Moreover,
there is no real reason not to visualise the trace as well, rather than only the
detected failures.
Therefore, we require that the failures can be visualised both embedded into
the trace visualisation and separately in a dedicated view. Also, the trace
view and the separate failure view should use a very similar language in order
not to create a barrier for understanding when switching between the two.
Figure 9.1: Requirements for Data Visualisation (elements: Data Collection,
Data Analysis, Data Visualisation, Failure Domain, Usage Context, Standards)
We detail the requirements for the trace visualisation language in the
next section. In the two subsequent sections, we detail the requirements for
the failure visualisation language and discuss requirements for the environ-
ment which is supposed to generate visualisations in these languages.
9.1.1 Trace Visualisation
The starting point for the trace visualisation is a textual trace such as the
one given in Fig. 7.4. We require accuracy, i.e. the visualisation should
be an equivalent presentation of the textual trace without losing any of the
information from the trace.
The textual presentation itself has drawbacks which can easily be over-
come by graphical presentations. Text is linear and cannot display any de-
pendency dimension other than the sequence of its elements. Other depen-
dencies are only implicit, e.g. in the syntax or in the values. A graphical
visualisation can easily present two dimensions explicitly, e.g. along an x and
a y axis. An axis can also be organised in a spiral shape.
Additional dimensions can be encoded, for instance, by colour or shape.
Colour and shape are graphical patterns which can be easily discriminated.
Adequately designed, a graphical visualisation can convey selected dimensions
better than text. Therefore, we require a graphical visualisation. Viewing the
textual trace alongside can be an optional feature.
Here, we aim at a 2-dimensional graphical visualisation. While 2D viewing
is familiar, e.g. using horizontal and vertical scroll bars, 3D may have
acceptance problems, because it is typically not present in IDEs and requires
specific familiarity and experience with 3D viewing such as in VRML (Virtual
Reality Modeling Language) environments.
A trace contains the dimensions time, objects, links and dependencies,
and method calls. It is not possible to squeeze them all into one visualisation
without the danger of overloading it. Therefore, our requirements for the
visualisation have to prioritise these dimensions.
Traces are collected over time and therefore, time plays an important
role for trace visualisation. It should have an explicit dimension in the vi-
sualisation which supports the notion of happened before or happened after.
This is only possible if time is mapped to an explicit axis, e.g. x or y axis.
We prefer an explicit time axis over animation, because during animation
displayed information may be erased.
The other axis should be taken by the objects and the method calls be-
tween them. By objects we refer to thread objects and passive objects found
in the trace. It is important to show passive objects in addition to threads,
because of their key role in concurrent behaviour when they are shared by dif-
ferent threads: either they are accessed without synchronisation, potentially
leading to safety problems, or they are accessed under mutual exclusion, which
may lead to blocking, or they are used by threads for sending each other sig-
nals, e.g. by notification. For the general understanding of an execution,
method calls are more important than the links and dependencies between
the threads and objects. While we neglect the dimension of object links, we
also have to consider the dimension of concurrency. Concurrency is related to
time and expresses itself in the method calls. Although it is already implicit,
it can be emphasised by colouring the different threads of control.
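The axis assignment required here can be illustrated with a hypothetical text-mode rendering in Java. Time is mapped to the vertical axis (one row per call, so that "happened before" becomes "drawn above"), threads and passive objects share the horizontal axis, and the thread of control is printed as a label where a graphical tool would use colour. The record Call and the column width are assumptions of this sketch, not part of the JAVIS trace format.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical text-mode sketch: time runs downwards, objects run across. */
class TextSequenceDiagram {
    /** One traced method call; field names are illustrative. */
    record Call(String thread, String caller, String callee, String method) {}

    private static final int WIDTH = 10; // characters per lifeline column

    static List<String> render(List<Call> calls) {
        // Threads and passive objects get a column in order of first appearance.
        Map<String, Integer> column = new LinkedHashMap<>();
        for (Call c : calls) {
            column.putIfAbsent(c.caller(), column.size());
            column.putIfAbsent(c.callee(), column.size());
        }
        List<String> rows = new ArrayList<>();
        StringBuilder header = new StringBuilder();
        for (String name : column.keySet()) {
            header.append(String.format("%-" + WIDTH + "s", name)); // long names break alignment
        }
        rows.add(header.toString().stripTrailing());
        // Each call occupies the next row, so earlier calls are drawn above later ones.
        for (Call c : calls) {
            char[] line = " ".repeat(column.size() * WIDTH).toCharArray();
            for (int i = 0; i < column.size(); i++) {
                line[i * WIDTH] = '|';                            // lifelines
            }
            int from = column.get(c.caller()) * WIDTH;
            int to = column.get(c.callee()) * WIDTH;
            if (from != to) {                                     // self-calls get no arrow
                int dir = from < to ? 1 : -1;
                for (int x = from + dir; x != to; x += dir) {
                    line[x] = '-';
                }
                line[to - dir] = dir > 0 ? '>' : '<';             // arrow head at the callee
            }
            // The thread of control is printed as a label instead of a colour.
            rows.add(new String(line).stripTrailing() + "  " + c.method() + "() [" + c.thread() + "]");
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Call> calls = List.of(
                new Call("main", "main", "buffer", "put"),
                new Call("consumer", "consumer", "buffer", "take"));
        render(calls).forEach(System.out::println);
    }
}
```

Running main prints the following; the shared object buffer sits between the two threads that access it:

```
main      buffer    consumer
|-------->|         |  put() [main]
|         |<--------|  take() [consumer]
```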
It is also desirable to have another, optional trace visualisation which
shows the dimension of links and dependencies, although this is difficult
because they change over time.
The visualisation must also take into account that the trace file may contain
a call stack at its beginning, and must provide an adequate visualisation of it
seamlessly integrated into the visualisation of the actual trace.
Also, it is desirable to use an existing visual language, preferably one
which is already used in the software development lifecycle, such as the
Unified Modeling Language (UML) [UMLa]. We have already argued that
this language is known to many programmers, who have to read UML design
documents, and that using UML in debugging extends the seamlessness in
the use of notations. The UML contains interaction diagrams, which seem to
be a candidate matching our requirements.
Using UML may conflict with the requirement that a general-purpose
trace visualisation should be scalable, i.e. it should be possible to visualise
traces with many method calls and/or many different objects. UML diagrams
can become very large, and then the question is how well they still convey
information.
9.1.2 Failure and Potential Visualisation
Generally, the failures involve the same dimensions as the traces. Depending
on the kind of failure or potential, either the time dimension or the depen-
dency dimension is more important. The dimension of objects is always
needed.
Firstly, we look at the failures for which dependencies are important.
The comparison of the different kinds of failures showed that there is a group
of failures, namely deadlock, nested monitor lockout, circular join, self join,
and join-induced deadlock, which have in common that the involved threads
are blocked. The comparison identified three different kinds of blocking:
blocking on a lock, blocking while joining, and blocking while waiting for a
notification. In all these states, the blocked threads depend on a resource
or on another thread. Such dependencies are commonly documented with
the well-known wait-for graphs.
It is therefore straightforward to require that these kinds of dependencies
are visualised in the failure visualisation. That is, we do not require a sepa-
rate visualisation for each kind of failure, but exploit the commonalities to
define a language with which different kinds of failures can be described. The
cycles resulting from the dependencies involved in the failures should be ob-
vious in the visualisation.
However, recall that one of our motivations was to go beyond the usual
visualisation with a wait-for graph, because it informs neither about the exe-
cution history nor about the method calls involved. Therefore, we require that
both the dependencies and the execution order are depicted.
In addition, we have required that, in order to improve understanding,
the trace visualisation and the failure visualisation use the same language
concepts, and that a failure can also be viewed in the original trace. That is,
the visualisation of a detected failure should be embedded in the visualisation
of the overall behaviour.
Secondly, we look at failures and potentials for which time is more im-
portant. This applies to the missed notification and to circular join. For
them, we do not have to derive completely new concepts but we require that
they can be visualised as traces with the involved calls highlighted or with
the other calls excluded. Also for the deadlock potential it is adequate to
show the lock accesses and releases over time.
9.1.3 Visualisation Environment
Here we list the functional and the non-functional requirements for the visu-
alisation environment.
Functional Requirements
The main functionality is to read a trace from file and visualise it, and to
visualise detected failures.
Failure detection is either carried out before the visualisation, online or
in a separate step, in which case the results are part of the trace or are
stored in a separate file; or the analysis is triggered from within the visual-
isation, for example while reading the trace file.
It must be possible to view the failures as part of the trace and it must
be possible to extract them in their own views.
The visualisation should be fully automatic, including the layout.
We require that a trace or a failure can be displayed as a whole or stepwise,
where a step corresponds to a method blocking, a method entry, or a method
exit.
Various focusing techniques are also desirable, especially regarding the
failure, as well as a general zooming facility.
Storing, exporting, and printing the generated visualisations are also
desirable features.
Nonfunctional Requirements
Such an environment can be built in different ways. One can use a framework,
e.g. for a graphical editor, or an extensible graphical editor. Also, the solution
can be integrated into a standard IDE. Ultimately, it depends on what the
developers typically use, which may vary from standard IDEs to UML round-
trip tools. No tool will find high acceptance with all developers. Therefore,
we do not prescribe a particular approach.
9.2 Related Work
As we are interested in the behavioural aspects of the traces rather than in
the structural information they contain, here we discuss only approaches to
dynamic visualisation, i.e. visualisation of the runtime behaviour. We discuss
approaches for concurrent and object-oriented programs.
Figure 9.2: GThread History View
9.2.1 Visualisation of Concurrent Programs
GThreads was one of the first tools for visualising the execution of concur-
rent programs [ZS95]. It instruments source code that uses the pthread library
in order to trace calls to the library. In order to trace arbitrary function calls,
the programmer has to insert tracing methods into the code manually. Traces
are output as files.
The post-mortem visualisation is an animation. It is implemented with
the POLKA [SK93] animation toolkit, which is primarily intended for algo-
rithm animation. Animation is speed-controlled and can be executed stepwise.
Here, we primarily discuss the visual languages used. We do not consider
animation to be a central feature for our goals: it has disadvantages for large
traces, and displayed information can be erased. Therefore, we prefer an
explicit presentation of time.
GThreads contains an animated call graph, a history view, a mutex view,
and a barrier view. For us, the history view and the mutex view are of
interest. The function view is neither object-oriented nor does it incorporate
synchronisation-specific information. This kind of information can be found
in the mutex and barrier views. The barrier is a synchronisation mechanism
not supported by the Java language, and therefore we do not consider it here.
The history view provides an overview of the trace (see Fig. 9.2). It
shows exactly all function calls. A bar describes the execution of a thread;
it contains segments which denote the different, nested function calls. The
shadows of the bars have different colours and denote different threads.
This view is feasible for imperative programs but not for object-oriented
programs.
Figure 9.3: GThread Mutex View
The mutex view depicts a mutex lock as a large die and the threads using
it as smaller dice (see Fig. 9.3). A thread holding the lock is depicted inside
the large die; a thread waiting for the lock is depicted outside it. There is
one such view for each mutex. A deadlock can only be detected by comparing
different views.
9.2.2 Object-Oriented Visualisation
Jinsight
IBM Jinsight is an industrial research prototype for visually exploring the
runtime behaviour of a Java program. It is the most prominent tool for
visualising Java execution. Here we refer to the last published version 2.1
[Jin, PKV94, PKV98, PJM+02]. The concepts of Jinsight have since been
introduced into commercial tools such as IBM WebSphere [WbS] or the IBM
Hyades plugin for Eclipse [Hya].
Jinsight helps in understanding a program's time performance, its memory
usage and anomalies such as memory leaks, and its data structures and bot-
tlenecks, by revealing how objects are created and interrelated. Its scope is
concurrent Java programs. Jinsight traces method calls as well as object cre-
ation and deletion. The tracing approach has been described in depth in
Chapter 7. Jinsight provides post-mortem visualisation.
Jinsight provides three kinds of views:
• views depicting the execution at different levels of detail,
• views depicting statistical summaries of the execution, and
• views depicting patterns which emerge during the execution.
Figure 9.4: Jinsight Execution View
For us, the execution view is the most interesting. It allows the ex-
ploration of the detailed program execution sequence per thread. It starts
with an overview (see Fig. 9.4) depicting threads in columns and the nested
method calls of each thread as vertical coloured stripes stacking to the right.
The colour indicates the object. Time progresses from top to bottom. The
view provides pop-up information about selected calls or threads. It allows
zooming into the picture until the level of individual method calls is reached
(see Fig. 9.5).
At its most detailed level of presentation, the visual language used in the
execution view has similarities with UML sequence diagrams. It is however
possible to compress this view by omitting details and therefore the lan-
guage used is scalable. Jinsight allows the user to customise views by slicing,
grouping, and filtering information.
Although threads are covered, this tool is not thread-specific. Liveness
failures can only be detected heuristically, by observing in the execution view
that a thread fails to make progress.
Jinsight also provides views showing dependencies between objects. The
reference pattern view (see Fig. 9.6) supports the exploration of patterns
of references to or from a set of objects, in varying detail. It is intended
for studying data structures and finding memory leaks. The view shows
references to an object at different points in time.
Figure 9.5: Jinsight Execution View - Zoom
We do not show the views providing summary information and the views
for identifying patterns of interactions emerging as the program executes, as
they cannot help with our purpose of highlighting erroneous behaviour as
detailed as possible.
Jinsight as a visualisation environment is characterised by the fact that
it is implemented for exactly its own trace or dump format. It is not open
source, nor is it intended to be a framework. As for user control, the user
can choose between different views and, in addition, can choose projections
called slices to work with in separate projects called workspaces. We think
that it is important that the views cover the two orthogonal dimensions of
behaviour and structure, i.e. diagrams with a time axis and diagrams for
object structures such as the reference pattern view. We also think that
focusing techniques such as the zooming facility are very important for a
user working with such a tool.
Figure 9.6: Jinsight Reference Pattern
3-Dimensional Visualisation
For instance, [Rei98] uses 3D diagrams for displaying object lifetime, call
graphs, and object interaction. The Software Tomograph [LS02, SOT] uses
the third dimension to map metrics to distances between classes in a 3D
space.
Note that considering 3D does not mean that UML must be excluded
from consideration. The UML specification does not prevent showing several
UML diagrams in a 3D space. There is work on 3-dimensional class diagrams
[MLMD01] but not on 3-dimensional interaction diagrams.
We have not yet considered 3D for visualising single traces. The third
dimension could, for instance, be used to show how links and dependencies
change over time.
Changing dependencies and links can only be observed in the stepwise
replay of the collaboration diagram. An alternative could be interesting, but
with the focus on failures we reckon that the user is only interested in the
final result of the cyclic dependencies. For reconstructing the behaviour over
time, the user can view the execution over time in the sequence diagram,
which is more compact than a sequence of structural snapshots. Also, such
a sequence of snapshots would need to depict the method call sequence, and
ultimately the call sequence seems to be the key to understanding behaviour
over time. Besides this, the third dimension could be used for additional
information, such as for comparing several traces or for visualising the di-
chotomy between structure and behaviour.
9.2.3 UML-based Visualisation
For visualisation of structure and behaviour of software it is also possible
to consider visual modelling languages from software engineering. Today,
the most widely used modelling language is the Unified Modeling Language
(UML) [UMLa].
Typically, UML is supported by UML CASE (Computer Aided Software
Engineering) tools, which provide editors and also code generation facili-
ties. Tools to visualise the structure and behaviour of existing software with
UML come from at least two different areas: reverse engineering and pro-
gram comprehension.
Before we describe the use of UML in these areas we will give a short
overview of the UML.
UML
The Unified Modeling Language (UML) [UMLa] has become a de facto
standard for visually describing artifacts of the software lifecycle, covering
software requirements, analysis and design documents, program documen-
tation, test specification, and also activities and processes of the software
development itself. The UML provides a set of diagrammatic languages and
a constraint language (OCL) which helps in specifying information which
cannot be specified visually.
In the context of this thesis, the diagrams for describing behaviour at in-
stance level are of interest, namely interaction diagrams, which have two
possible representations: sequence diagrams and collaboration diagrams.
While a sequence diagram depicts objects on one axis and method calls along
an explicit time axis, a collaboration diagram depicts objects and their links
in a two-dimensional space and depicts messages on links with a number
indicating their order. What is interesting for concurrency is that interaction
diagrams support the notion of an active object, which owns a thread of
control.
In the context of this thesis, not only the visual languages for describing
software behaviour are of interest but especially the possibilities to extend
them for special domains. The UML is designed to be adapted to domain-
specific requirements by defining profiles. A profile is a set of refinements
of existing modelling elements. The UML already provides a set of standard
profiles, though none for Java execution. We will provide details on the
profiling mechanism when we propose our UML-based visualisation.
UML CASE Tools
The UML is supported today by UML CASE tools such as Borland
Together [Tog]. These tools provide UML editors, code editors, and debug-
gers, and support round-trip engineering or at least code generation. In
round-trip engineering, the model is used to generate corresponding code
fragments in an object-oriented programming language. Typically, only
classes, fields, and method headers are generated, but not method bodies.
Conversely, the tools can generate UML class diagrams from the source code.
Changes to the code are reflected in the model, and vice versa. Existing
round-trip concepts are fairly limited, covering mainly structure but not
behaviour. For instance, Together can generate an interaction diagram from
a method body, but subsequent changes in the code or in the sequence dia-
gram are not reflected. The sequence diagram cannot be used for code generation.
Together can import UML diagrams via XMI, albeit with restrictions,
and also via a proprietary API by which a UML diagram can be generated
and extracted again. Together supports UML extension mechanisms such as
stereotypes and tags.
There are many similar tools, e.g. Rational Rose or ArgoUML. These
tools demonstrate that UML-based modelling and programming are moving
closer together. Programmers often have to know the UML.
UML-based Code Generation
Except for the STATEMATE [HP98] CASE tool and similar approaches
which generate executable code from statecharts, code generation in the pre-
viously mentioned commercial CASE tools is limited to generation of class
fragments containing no behaviour.
An extension to include code generation from behavioural UML diagrams
is for instance proposed in [EHSW99]. In order to generate executable Java
code fragments from collaboration diagrams, the use of standard elements
has been restricted, e.g. destructors are not allowed because they have no
counterpart in Java. To this end, the UML meta-model is refined. The
approach does not cover thread synchronisation.
UML-based Reverse Engineering
Reverse engineering aims at supporting the maintenance of a software system.
It aims at deriving the design of a system from code. In the past, the focus
was mainly on structure but now there exist also dynamic reverse engineering
approaches.
UML has been used in reverse engineering of object-oriented systems since
its inception, mainly for visualising structure with class diagrams, replacing
the predecessors of UML. Today, most UML editors, e.g. Together [Tog],
support the generation of static structure UML diagrams from source code
such as Java. Here, we focus on dynamic reverse engineering of object-
oriented systems. The academic prototype Shimba [Sys99] generates se-
quence diagrams using a proprietary tracing facility. However, they are only
an intermediate step for deriving statecharts from several different traces.
This tool does not support concurrency, and therefore its sequence diagrams
are not suitable for our purpose.
The approach by [KG01] describes a meta-model of Java behaviour in
UML. This model is used to extract facts about behaviour such as a method
invocation from the code. Then rules are used to generate UML-conformant
collaboration diagrams from the extracted information. The proposed Java
meta-model does not support concurrency.
Program Comprehension using UML
Program comprehension aims at supporting the understanding of an existing
program. Tools for program comprehension aim at supporting humans in
building their own mental image of the structure and behaviour of a piece
of code, or of aspects thereof, starting either from the code or from its exe-
cution. Because of the many conceivable purposes, tools differ a lot. Tools
like debuggers or profilers are typically not considered program comprehen-
sion tools, although the borders are not clear. Often, tools are also simply
called software visualisation tools, a term which obscures the purpose of a
tool even further. It is not our aim to give a precise definition of these terms
but merely to indicate areas of related work.
The academic prototype JavaVis [Oec02] aims at improving the under-
standing of small-scale Java programs for educational purposes. It supports
stepwise execution of a program and visualises the execution in a sequence
diagram. The tool supports threads but does not support blocking or lock-
ing. It does not produce a trace but only incrementally extends the displayed
sequence diagram.
The academic prototype Jacot [LRRM03] also generates UML sequence
diagrams from a running Java program and shows different thread states in a
simple statechart. The visualisation is online and does not generate a trace.
The tool aims at supporting the identification of liveness failures through the
visualisation but does not provide automated detection. It supports multi-
threading including locking and unlocking. Blocking is not visualised.
9.2.4 Comparison
Visual Trace and Failure Language
There are many different ways of visualising traces that depict the order of
events explicitly. Object-oriented traces, in particular, have the problem
that some of the dimensions involved in the trace must be abstracted away
in order to achieve a compact visualisation. Only the UML depicts object
interaction explicitly, which is space-consuming. Therefore, UML interaction
diagrams are less compact, and hence less scalable, than other trace visuali-
sations. For us, however, scalability is not an issue, because we primarily aim
at visualising failures, i.e. only parts of a trace. In a concurrency failure, only
a limited number of threads, other objects, and especially method calls are
involved.
We have seen very few failure-specific visualisations. Their drawback is
that they use a very special visual language and are not integrated with the
trace language.
For our purpose, single interactions are important, also in the trace, be-
cause they are the source of the failures we want to depict. We also want
to use the same language for traces and failures. A particular advantage of
UML interaction diagrams is that they have two different representations.
The sequence diagram is more suitable for a trace with many calls, while the
collaboration diagram is more suitable for depicting the few calls which pro-
voke a failure, especially because it provides space for depicting dependencies
between objects, which is currently used for links.
Therefore, the UML seems apt for a unified approach to trace and failure
views. It can also be extended to depict Java-specific behaviour. Another
argument is that developers are often familiar with the UML, since pro-
grammers need to be able to read UML design documents. Using UML also
during debugging takes a further step towards the seamless use of one nota-
tion, and we do not have to introduce a new visual language.
Because UML is not primarily designed for visualising program traces
it has to be adapted, which is made possible through the UML profiling
mechanism. We are of the view that the UML can be extended to cover
traces and failures sufficiently. Existing efforts to adapt UML to describe
Java execution, either for the purpose of forward or reverse engineering, do
not go far enough as they do not yet cover concurrency on the level of detail
we require here.
Tool Support
Most visualisation environments are prototypes. There is no single best way
to integrate a new tool into the development process. Of course, it is advan-
tageous to extend an existing tool, but processes differ a lot, and a tool used
by some programmers may not be used by others. Nor is there a clear answer
as to how tightly a new tool has to interoperate with existing ones.
We favour extending a UML CASE tool. For implementing a prototype
as a proof of concept, this kind of tool provides the highest level of reuse, as
we can use its UML editor, zooming, printing, exporting, and so on.
Note that JavaVis and Jacot were built in parallel with the JAVIS pro-
totype. They have much in common with our approach, both from the UML
and from the technical point of view. Our purpose is more focused, namely
supporting automated debugging, and our traces, and hence our views, can
contain more information.
9.3 JAVIS-Visualisation
In this section, we build on UML sequence and collaboration diagrams for
trace and failure visualisation. Because they need to be adapted we explain
the profiling mechanism in depth before we explain how we use it here. Then
we describe how we extend the UML CASE tool Together [Tog] for visualising
traces and failures.
The idea to visualise concurrent execution and concurrency failures with
the Unified Modeling Language was the beginning of the research described in
this thesis. We published our first version of a UML profile in 2000 [MW00].
Thereafter, we began with the design and the implementation of the tracing,
the analysis, and the visualisation.
9.3.1 UML Profile for Java Traces, Failures, and Potentials
The standard way to extend the UML is by defining a profile which refines
existing UML model elements. In this section we define a UML profile for
concurrent Java traces and for liveness failures and potentials, which can
occur in these traces.
UML Profiles, Stereotypes, and Tags
A UML profile [UMLa] is a stereotyped package that contains model elements that have been customised for a specific domain or purpose by extending the meta-model using stereotypes, tag definitions, and constraints.
182 CHAPTER 9. TRACE-BASED FAILURE AND POTENTIAL VISUALISATION
The principal extension mechanism is the concept of a stereotype. It
provides a way of defining virtual subclasses of UML meta-classes with new
meta-attributes and additional semantics. A stereotype is a model element
that defines additional values (based on tag definitions), additional con-
straints, and optionally a new graphical representation. A stereotyped model
element receives these values and constraints in addition to the attributes,
associations, and superclasses that the element has in the standard UML.
Extensions must be strictly additive to the standard UML semantics, i.e.
they must not conflict with or contradict the standard semantics.
Tag definitions specify new kinds of properties that may be attached
to model elements. The actual properties of individual model elements are
specified using tagged values, either simple data type values or references
to other model elements. A tagged value is a keyword-value pair that may
be attached to any kind of model element. The keyword is called a tag. Both
the tag and the value are encoded as strings. One or more pairs are enclosed in braces {} and separated by commas, e.g. {persistent = true}. For boolean values, the value true may be omitted, e.g. {persistent}.
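As a small illustration of this notation, the sketch below renders a list of tagged values in the brace form shown above. The TaggedValues class is hypothetical and belongs to no UML tool. It assumes, as stated above, that tags and values are plain strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not a UML tool API): renders tagged values as {tag = value, ...}.
class TaggedValues {
    private final Map<String, String> values = new LinkedHashMap<>(); // keeps insertion order

    TaggedValues put(String tag, String value) {
        values.put(tag, value);
        return this;
    }

    // Tag and value are strings; a tag whose value is "true" is shown by its name alone.
    String render() {
        if (values.isEmpty()) {
            return "";
        }
        StringJoiner joiner = new StringJoiner(", ", "{", "}");
        for (Map.Entry<String, String> e : values.entrySet()) {
            joiner.add("true".equals(e.getValue()) ? e.getKey() : e.getKey() + " = " + e.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(new TaggedValues().put("persistent", "true").render());  // {persistent}
        System.out.println(new TaggedValues().put("persistent", "false").render()); // {persistent = false}
    }
}
```

Because the values are untyped strings here, the abbreviation applies to any tag whose value is the string true. A profile-aware tool would consult the type declared in the tag definition instead.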
Constraints can be attached to any model element to refine its semantics
linguistically. Constraints can be specified formally, for example by using
OCL (Object Constraint Language) rules.
Sequence and Collaboration Diagrams
In the comparison, we have already argued that sequence diagrams should be used to depict traces and that collaboration diagrams should be used to depict failures involving dependencies, because they are designed to capture dependencies such as links. Our way of depicting failures has been inspired by the well-known resource allocation graphs and wait-for graphs, which were shown in chapters 3 and 5.
We do not extend the sequence diagram with dependencies because it does not show any kind of dependency, not even links. In our view, the diagram that already shows specific dependencies should be extended for this purpose [MW00].
For visualising traces we rely on the following subset of standard elements
of interaction diagrams:
•synchronous method calls (messages), method call parameters, method
return, active objects, (passive) objects, lifelines, activation bars, shad-
ing of activation bars, multiple activation bars, links, and method call
sequence numbers.
For depicting the complete trace format in interaction diagrams, the following elements are missing:
•synchronized-method calls
•method calls found on the stack
For depicting failures in collaboration diagrams, the following elements are missing:
•dependencies between threads and objects (locking and acquiring, i.e.
blocking)
•dependencies between threads (joining and waiting)
Tag Definitions for Method Calls
Here we provide a solution for distinguishing synchronized-method calls and
method calls found on the stack from other Java method calls in a sequence
or collaboration diagram.
Java methods with the modifier synchronized are particularly interesting for our approach because calling them may result in locking or blocking and, when they return, in releasing a lock. Therefore, we want to be able to distinguish them from other methods. In the trace format, they are identified by the last entry, which is true for a synchronized method.
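Outside a trace, the same property can also be read from compiled classes. The sketch below uses the standard reflection API (Method.getModifiers() together with Modifier.isSynchronized) to list the synchronized methods of a class. The nested Account class is a stand-in for the banking example. This is only an illustration and is not the mechanism used by the JAVIS tracer.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SynchronizedMethods {
    // Stand-in for a traced class: one synchronized and one unsynchronized method.
    static class Account {
        private long value;

        public synchronized void setValue(long v) { value = v; }

        public long peek() { return value; }
    }

    // Returns the sorted names of the methods declared by 'type' that carry the synchronized modifier.
    static List<String> synchronizedMethodNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            if (Modifier.isSynchronized(m.getModifiers())) {
                names.add(m.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(synchronizedMethodNames(Account.class)); // [setValue]
    }
}
```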
As we have already pointed out, the trace file may contain a call stack
for each thread. The method entries and exits in this part have their method
name labelled with <stack>.
For both cases, we will define new tags and tagged values to augment
method calls with the additional semantics. Such a tag is not bound to a
specific model element, e.g. the stack property could also be attached to ob-
jects, and the synchronized property could also apply to blocks of statements.
Therefore, we do not define stereotypes for them. However, in our profile we
limit the use of these tags to method calls.
We need to determine a suitable element to which these tags are added. A synchronized-method call in an interaction diagram maps to the model element Message from the Collaborations package. A message defines a particular communication between instances that is specified in an interaction. For the model element Message, we define two tags, {synchronized} and {stack}, whose tagged values are of type boolean (see Table 9.1).
For illustration, see the sequence diagram in Fig. 9.7. The first two calls were found on the stack, the following return was traced, and then a synchronized call was traced, followed by two returns. So far, we have not yet shown a collaboration diagram. In Fig. 9.7, we also illustrate how the tags are integrated into a collaboration diagram, which is the equivalent of the sequence diagram.
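In the collaboration diagram of Fig. 9.7, the hierarchical sequence numbers (1, 1.1, 1.2) follow from the nesting of calls. The following is a minimal sketch of how such numbers can be computed. It assumes the events of one thread have already been reduced to a balanced list of calls and returns. The Event type and the label format are assumptions and are not the JAVIS trace format.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class SequenceNumbering {
    // Hypothetical, simplified event of one thread: a call with its message label, or a return.
    static final class Event {
        final boolean isCall;
        final String label; // e.g. "{stack} foo()"; empty for returns

        private Event(boolean isCall, String label) {
            this.isCall = isCall;
            this.label = label;
        }

        static Event call(String label) { return new Event(true, label); }

        static Event ret() { return new Event(false, ""); }
    }

    // The n-th call made while call "p" is active gets number "p.n"; top-level calls get "1", "2", ...
    // Calls and returns are assumed to be balanced.
    static List<String> number(List<Event> events) {
        List<String> labels = new ArrayList<>();
        Deque<String> active = new ArrayDeque<>();    // numbers of calls that have not returned yet
        Deque<Integer> counters = new ArrayDeque<>(); // calls made so far at each nesting level
        counters.push(0);
        for (Event e : events) {
            if (e.isCall) {
                int n = counters.pop() + 1;
                counters.push(n);
                String seq = active.isEmpty() ? String.valueOf(n) : active.peek() + "." + n;
                labels.add(seq + ": " + e.label);
                active.push(seq);
                counters.push(0); // fresh counter for calls nested in this one
            } else {
                active.pop();
                counters.pop();
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // The calls of Fig. 9.7: two calls found on the stack, a return, a synchronized call, two returns.
        List<Event> trace = Arrays.asList(
                Event.call("{stack} do()"), Event.call("{stack} foo()"), Event.ret(),
                Event.call("{synchronized} bar()"), Event.ret(), Event.ret());
        number(trace).forEach(System.out::println); // 1: ..., 1.1: ..., 1.2: ...
    }
}
```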
Tag           Stereotype  Type     Multiplicity  Base Class  Description
synchronized  none        boolean  0..1          Message     Attached to a call of a Java synchronized method
stack         none        boolean  0..1          Message     Attached to a call of a Java method that was found on the stack of the JVM when the trace was started
Table 9.1: UML Profile for Concurrent Java Traces: Tags
Stereotyping Dependencies
In collaboration diagrams, we want to show the dependencies between threads
and objects which stem from synchronised interactions. Note that dependencies also occur if threads are not involved in a failure. Therefore, the extensions we describe here are general extensions for describing concurrent Java traces, not merely failure-specific extensions.
For each of the situations where a thread is waiting for an object or
another thread we need an extension. In addition, a thread locking an object
also has to be depicted as a dependency.
We need an additional model element for this because it is not a refine-
ment of an existing element. Take for instance Fig. 9.7. The synchronized-
call uses a link and creates a dependency between the main thread and the
locked object. There is no link that we could tag for this purpose, and tagging a link would not be semantically satisfying either.
Because all dependencies need additional elements, we have defined them as stereotypes. The desired dependencies can occur between different kinds of existing model elements: depending on the kind of dependency, between a thread object and a passive object, or between two thread objects.
In the context of an interaction diagram we consider the following three
model elements as candidates from which we can derive suitable stereotypes.
There is no perfect fit with our requirements, hence we try to determine the
best trade-off.
•A Link is a tuple with a pair of object references. It is an instance of
an association.
We see the following arguments against stereotyping a link. There is
not always a corresponding association and if there is one, it would
have to be instantiated twice.
[Figure: a sequence diagram and the equivalent collaboration diagram involving Main and two :Object instances, with the messages 1: {stack} do(), 1.1: {stack} foo(), and 1.2: {synchronized} bar()]
Figure 9.7: Tagged Values for Messages in Interaction Diagrams
•A Dependency captures a relationship other than association, gen-
eralisation, flow, or a meta-relationship. A Dependency is a directed
relationship from a client (or clients) to a supplier (or suppliers), shown
as a dashed arrow from client to supplier. It states that the client(s)
depend(s) on the supplier(s), i.e. the implementation or functioning of
the client(s) requires the presence or knowledge of the supplier(s). It
indicates a situation in which a change to the target element may re-
quire a change to the source element in the dependency. A dependency
does not require a set of instances for its meaning.
As we are not taking a modelling perspective, our use does not exactly match this description of a dependency. It does, however, match the intuitive notion that one thread depends on another thread or object. It is also useful that the dependency is directed.
•A Derived Element is one that can be computed from another one, but that is shown for clarity or that is included for design purposes even though it adds no semantic information. A derived element is shown by placing a slash (/) in front of the name of the derived element, such as an attribute or a role name. The details of computing a derived element can be specified by a dependency with the stereotype ≪derive≫ or by simply placing a constraint near the derived element.
Locking, acquiring, joining, and waiting dependencies are obviously derived from their corresponding method calls. This, however, does not provide us with a model element for the derived element. Showing from which method call a dependency is derived can be an interesting option, but it can also cause visual overload. Moreover, these kinds of relationships are obvious to the programmer.
[Figure: graphical notation of the stereotypes: ≪lock≫ and ≪acquire≫ each connect a :Thread to an :Object; ≪joining≫ and ≪waiting≫ each connect two :Thread objects]
Figure 9.8: Graphical Notation for Stereotypes
Our conclusion is that we can use the model element Dependency to derive
new stereotypes for locking, acquiring, joining, and waiting. We have used
names slightly different from the corresponding method names.
Table 9.2 defines the corresponding stereotypes. Note that we have omitted the parent and tag columns because none of our stereotypes is derived from another and none defines additional tags.
The graphical notation of each stereotype is given in Fig. 9.8. The
first one has a green colour because it depicts a successful locking. The others
are coloured in red to indicate waiting situations. Note that for locking and
acquiring we draw the arrow from the thread to the lock. This resembles the
direction of the method call. In the case of joining we draw the arrow from
the joined thread to the joining thread because the joining thread depends on
the other thread. In the case of waiting we draw the arrow from the thread
which can send a notification to the waiting thread.
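The colour and arrow-direction conventions just described fit in a small lookup structure. The sketch below encodes them in Java. Each factory method takes the participants by their role and returns the arrow in the drawn direction. The Dependency class, its fields, and the textual arrow format are hypothetical and are not part of the Together extension.

```java
import java.util.Locale;

// Hypothetical encoding of the four dependency stereotypes and their drawing conventions.
final class Dependency {
    enum Kind {
        LOCK("green"), ACQUIRE("red"), JOINING("red"), WAITING("red");

        final String colour; // green: successful locking; red: a waiting situation

        Kind(String colour) {
            this.colour = colour;
        }
    }

    final Kind kind;
    final String source; // tail of the dashed arrow
    final String target; // head of the dashed arrow

    private Dependency(Kind kind, String source, String target) {
        this.kind = kind;
        this.source = source;
        this.target = target;
    }

    // Locking and acquiring: from the thread to the object, like the method call.
    static Dependency lock(String thread, String object) {
        return new Dependency(Kind.LOCK, thread, object);
    }

    static Dependency acquire(String thread, String object) {
        return new Dependency(Kind.ACQUIRE, thread, object);
    }

    // Joining: from the joined thread to the joining thread, which depends on it.
    static Dependency joining(String joiningThread, String joinedThread) {
        return new Dependency(Kind.JOINING, joinedThread, joiningThread);
    }

    // Waiting: from a thread that can send a notification to the waiting thread.
    static Dependency waiting(String waitingThread, String notifyingThread) {
        return new Dependency(Kind.WAITING, notifyingThread, waitingThread);
    }

    @Override
    public String toString() {
        return source + " -<<" + kind.name().toLowerCase(Locale.ROOT) + ">>-> " + target + " [" + kind.colour + "]";
    }

    public static void main(String[] args) {
        System.out.println(joining("main", "worker")); // worker -<<joining>>-> main [red]
    }
}
```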
In Fig. 9.9, we illustrate how the different failures involving cycles are depicted using the stereotypes. Here, we have omitted the corresponding method calls.
Stereotype: Lock ≪lock≫ (base class: Dependency)
Description: The lock dependency states that a thread is holding a lock after having called a synchronized Java method.
Constraints: The dependency source must be an active object. From the active object, there must be at least one synchronized method call entry that is not paired with its exit.

Stereotype: Acquire ≪acquire≫ (base class: Dependency)
Description: The acquire dependency states that a thread is waiting for a lock after having called a synchronized Java method.
Constraints: The dependency source must be an active object. From the active object, there must be a synchronized method call entry that is blocked in the sequence diagram.

Stereotype: Joining ≪joining≫ (base class: Dependency)
Description: The joining dependency states that a thread is waiting for another thread to exit.
Constraints: The source and target must be active objects. The source must not yet have exited.

Stereotype: Waiting ≪waiting≫ (base class: Dependency)
Description: The waiting dependency states that a thread is waiting for another thread to send a notification.
Constraints: The source and target must be active objects.
Table 9.2: UML Profile for Concurrent Java Traces: Stereotypes
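The constraints of Table 9.2 suggest how ≪lock≫ and ≪acquire≫ dependencies can be derived from a trace. A lock is held while a synchronized entry is not yet paired with its exit. A lock is being acquired while a blocked call has not yet been followed by its entry. The following is a minimal sketch under simplifying assumptions: it uses a hypothetical event model (ENTER, EXIT, and BLOCK on synchronized methods, each naming its object), assumes names without spaces, and ignores wait(), which temporarily releases a lock.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DependencyDeriver {
    enum Kind { ENTER, EXIT, BLOCK }

    // Hypothetical event: 'thread' enters, exits, or blocks on a synchronized method of 'object'.
    static final class Event {
        final String thread;
        final String object;
        final Kind kind;

        Event(String thread, String object, Kind kind) {
            this.thread = thread;
            this.object = object;
            this.kind = kind;
        }
    }

    // Derives the <<lock>> and <<acquire>> dependencies that hold at the end of the trace.
    static List<String> derive(List<Event> trace) {
        Map<String, Integer> open = new LinkedHashMap<>();     // "thread object" -> unpaired entries
        Map<String, String> blockedOn = new LinkedHashMap<>(); // thread -> object it is blocked on
        for (Event e : trace) {
            String key = e.thread + " " + e.object; // assumes names contain no spaces
            switch (e.kind) {
                case BLOCK:
                    blockedOn.put(e.thread, e.object);
                    break;
                case ENTER:
                    blockedOn.remove(e.thread);       // the blocked call has now been entered
                    open.merge(key, 1, Integer::sum); // reentrant entries are counted
                    break;
                case EXIT:
                    open.computeIfPresent(key, (k, n) -> n > 1 ? n - 1 : null); // null removes the entry
                    break;
            }
        }
        List<String> deps = new ArrayList<>();
        for (String key : open.keySet()) {
            String[] parts = key.split(" ");
            deps.add(parts[0] + " <<lock>> " + parts[1]);
        }
        for (Map.Entry<String, String> b : blockedOn.entrySet()) {
            deps.add(b.getKey() + " <<acquire>> " + b.getValue());
        }
        return deps;
    }

    public static void main(String[] args) {
        List<Event> trace = Arrays.asList(
                new Event("t1", "a", Kind.ENTER), new Event("t2", "b", Kind.ENTER),
                new Event("t1", "b", Kind.BLOCK), new Event("t2", "a", Kind.BLOCK));
        derive(trace).forEach(System.out::println);
    }
}
```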
[Figure: four panels, Deadlock, Circular Join, Lockout, and Join-induced Deadlock, each showing a cycle formed by ≪lock≫, ≪acquire≫, ≪joining≫, or ≪waiting≫ dependencies between :Thread and :Object instances]
Figure 9.9: Cyclic Failures using Stereotypes
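The four failures in Fig. 9.9 become directed cycles once the arrows are read as "waits for" edges. Under this reading, a thread waits for the object it acquires, and a locked object waits for its holder (the ≪lock≫ arrow reversed). A joining thread waits for the joined thread (≪joining≫ reversed). A waiting thread waits for a thread that could notify it (≪waiting≫ reversed). The following is a minimal sketch of cycle detection by depth-first search over such a wait-for graph. The class and node names are hypothetical. The sketch treats every edge as a hard dependency, which is a simplification for ≪waiting≫ when several threads could send the notification.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class WaitForGraph {
    private static final int ON_PATH = 1;
    private static final int DONE = 2;

    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    // Records that 'from' cannot proceed until 'to' does.
    WaitForGraph waitsFor(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        edges.computeIfAbsent(to, k -> new ArrayList<>());
        return this;
    }

    // Returns the nodes of one cycle in path order, or an empty list if there is none.
    List<String> findCycle() {
        Map<String, Integer> state = new HashMap<>();
        for (String start : edges.keySet()) {
            List<String> cycle = dfs(start, state, new ArrayList<>());
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        return Collections.emptyList();
    }

    // Depth-first search; reaching a node that is still on the current path closes a cycle.
    private List<String> dfs(String node, Map<String, Integer> state, List<String> path) {
        Integer s = state.get(node);
        if (s != null && s == DONE) {
            return Collections.emptyList();
        }
        if (s != null && s == ON_PATH) {
            return new ArrayList<>(path.subList(path.indexOf(node), path.size()));
        }
        state.put(node, ON_PATH);
        path.add(node);
        for (String next : edges.get(node)) {
            List<String> cycle = dfs(next, state, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        path.remove(path.size() - 1);
        state.put(node, DONE);
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        // Deadlock: t1 holds a and acquires b, t2 holds b and acquires a.
        WaitForGraph g = new WaitForGraph()
                .waitsFor("t1", "b").waitsFor("b", "t2")
                .waitsFor("t2", "a").waitsFor("a", "t1");
        System.out.println(g.findCycle()); // [t1, b, t2, a]
    }
}
```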
Dealing with Blocking of Method Calls
As a last issue, we discuss whether we really want to display blocked method calls as in the motivation.
The trace format distinguishes three stages of a method call, namely entry, exit, and blocking, which all have to be depicted consistently. As we are not primarily interested in profiling issues such as how long a thread was blocked before it entered, we choose not to show this explicitly. Moreover, this kind of information would not be exact, as we do not have time stamps; we would only see how many other calls happened during the blocking. Although it does not give exact durations, such a display would still reveal how many threads have been blocked at a lock at the same time, which helps in identifying bottlenecks. We choose to replace blockings by the corresponding entries, where such entries exist, in both kinds of diagrams. As we provide the possibility of stepwise visualisation, we include an option to show the blockings until they are replaced by successful entries.
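One way to realise this display policy is as a filter over a prefix of the event sequence. A blocking event stays visible only until the same thread enters the same object. For the final view, the prefix is the whole trace; for stepwise visualisation, it grows one event at a time. The following is a minimal sketch with a hypothetical event type and string-valued event kinds. It assumes a blocked thread's next entry on that object is the one that ends the blocking.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BlockingFilter {
    // Hypothetical event: kind is "enter", "exit", or "block".
    static final class Event {
        final String thread;
        final String object;
        final String kind;

        Event(String thread, String object, String kind) {
            this.thread = thread;
            this.object = object;
            this.kind = kind;
        }

        @Override
        public String toString() { return thread + ":" + kind + ":" + object; }
    }

    // Events shown after the first 'step' events: a blocking is hidden once the same
    // thread has entered the same object within the shown prefix.
    static List<Event> visible(List<Event> trace, int step) {
        List<Event> prefix = trace.subList(0, Math.min(step, trace.size()));
        List<Event> shown = new ArrayList<>();
        for (int i = 0; i < prefix.size(); i++) {
            Event e = prefix.get(i);
            if (e.kind.equals("block") && enteredLater(prefix, i, e)) {
                continue; // replaced by the successful entry
            }
            shown.add(e);
        }
        return shown;
    }

    private static boolean enteredLater(List<Event> events, int from, Event block) {
        for (int j = from + 1; j < events.size(); j++) {
            Event e = events.get(j);
            if (e.kind.equals("enter") && e.thread.equals(block.thread) && e.object.equals(block.object)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Event> trace = Arrays.asList(
                new Event("t1", "a", "enter"), new Event("t2", "a", "block"),
                new Event("t1", "a", "exit"), new Event("t2", "a", "enter"));
        System.out.println(visible(trace, 2));            // [t1:enter:a, t2:block:a]
        System.out.println(visible(trace, trace.size())); // [t1:enter:a, t1:exit:a, t2:enter:a]
    }
}
```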
9.4. SUMMARY 189
9.3.2 Visualisation Architecture
We implemented the visualisation of the JAVIS prototype as a plug-in to the
UML CASE tool Together [Tog]. A first version of this architecture was
described in [Meh02].
Together provides two ways to import UML diagrams: using the specialised XML format XMI, or via an API. The API is easy to use and supports the construction of the same diagrams as the editor. As our traces are not encoded in an XML format, we saw no need to transform them into XMI.
In order to use the API we developed a set of Java classes (see Fig.
9.10). The PreParser presents summarised information to the user before
displaying a trace. The Parser reads the trace and the Generator generates
a corresponding interaction diagram using the API. Together interacts with
our Java classes by calling a run()-method, which we provided. This method
handles our user interface and controls our parser and diagram generator.
The user interface contains several dialogs for configuring the reading and
analysis of traces and for controlling their display.
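The division of labour between PreParser, Parser, and Generator can be summarised as a pipeline driven by a single run() entry point. The sketch below is written against hypothetical interfaces. It does not use the Together API, whose classes are not reproduced here. The method names other than run(), and the callback that stands in for the user dialogs, are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of the plug-in structure of Fig. 9.10; none of these types belong to the Together API.
class VisualisationPipeline {
    interface PreParser { String summarise(List<String> traceLines); }

    interface Parser { List<String> parse(List<String> traceLines); } // trace lines -> diagram messages

    interface Generator { void addMessage(String message); }         // would call the CASE tool API

    private final PreParser preParser;
    private final Parser parser;
    private final Generator generator;

    VisualisationPipeline(PreParser preParser, Parser parser, Generator generator) {
        this.preParser = preParser;
        this.parser = parser;
        this.generator = generator;
    }

    // Entry point invoked by the host tool: summarise, let the user decide, then parse and generate.
    void run(List<String> traceLines, Predicate<String> userAcceptsSummary) {
        String summary = preParser.summarise(traceLines);
        if (!userAcceptsSummary.test(summary)) {
            return; // stands in for the configuration dialogs
        }
        for (String message : parser.parse(traceLines)) {
            generator.addMessage(message);
        }
    }

    public static void main(String[] args) {
        List<String> diagram = new ArrayList<>();
        Parser enteredCallsOnly = lines -> {
            List<String> messages = new ArrayList<>();
            for (String line : lines) {
                if (line.endsWith(":Enter")) messages.add(line);
            }
            return messages;
        };
        VisualisationPipeline pipeline = new VisualisationPipeline(
                lines -> lines.size() + " trace lines", enteredCallsOnly, diagram::add);
        pipeline.run(Arrays.asList("t1:a::b:m:true:Enter", "t1:a::b:m:true:Exit"), summary -> true);
        System.out.println(diagram); // [t1:a::b:m:true:Enter]
    }
}
```

Keeping the three stages behind small interfaces lets the parser be tested without the CASE tool, which is the part that is hardest to automate.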
Together provides the two different presentations of an interaction dia-
gram automatically. In our prototype we have extended Together with the
stereotypes ≪lock≫ and ≪acquire≫ in the intended colours. Colouring
individual threads of control could not be achieved. Together only supports
colour for stereotypes.
9.4 Summary
This chapter discussed visualisation requirements and existing visualisations, and provided a UML profile for concurrent Java programs and concurrent Java liveness failures [MW00]. The profile was implemented with the UML CASE tool Together and is part of the JAVIS prototype [Meh02].
None of the existing tools using UML for visualisation provided extensions for concurrent Java, let alone the detailed extensions needed for describing thread synchronisation.
In the next chapter, we will provide more insight into the Together plug-in by demonstrating its use with examples. In doing so, we will also demonstrate the use of the UML profile in more depth. The next chapter describes a complete usage scenario, starting with tracing, followed by analysis, and finishing with visualisation.
Figure 9.10: Class Diagram of the Together Visualisation
Chapter 10
Using the JAVIS Prototypes
In the previous three chapters, we have presented the technical details of the
prototypical solutions for tracing, analysis, and visualisation.
In this chapter, we present a visual tour of the prototypes at work. We demonstrate how they are intended to be used and how they work together by illustrating a typical use case with an example Java program. In doing so, we also introduce the user interfaces of the different prototypes by way of example.
After presenting the main example we will present a small experiment
which demonstrates that the general purpose tracing and visualisation func-
tionality also works for real-size software.
10.1 Example for Automated Failure Detection
Our main example use case is a non-deterministic test run. It is based on the
banking application, which was introduced in detail in chapter 4 and parts of which were already used in the motivation.
We will illustrate the following steps.
•Running the application with the Java tracer (see Chapter 7).
•Automatic analysis of deadlocks of the running application with the
option for deadlock detection (see Chapter 8).
•Generating different views of the trace and the deadlocks found with
the Together extension (see Chapter 9).
In the following we will look again at the banking example and focus on
a part of its behaviour.
192 CHAPTER 10. USING THE JAVIS PROTOTYPES
Figure 10.1: Banking Application Class Diagram
Figure 10.2: Sequence Diagram Generated by Together
10.1. EXAMPLE FOR AUTOMATED FAILURE DETECTION 193
Figure 10.3: Trace Generation Dialogue
Then we will trace the behaviour and check for deadlocks. The result
will be visualised in Together using our extension which can read traces and
visualise them with our UML profile.
10.1.1 The Banking Example Revisited
The example is the banking application which has already been used in other
chapters. Fig. 10.1 depicts the complete class diagram including methods.
For illustrating the behaviour of these classes we use the facility of To-
gether to generate a sequence diagram. Note that this facility is not able to
deal with concurrency. It can only generate single-threaded behaviour. The
generated sequence in Fig. 10.2 depicts the behaviour of a thread transfer-
ring money between accounts. It accesses a first account, and from there it
accesses the second account. Then it transfers the money from the second to
the first. Note that Together does not show whether a method is synchronized; in fact, the two methods used to access the accounts are synchronized. In the program, several such threads work concurrently. As the code is a simulation, a delay has been inserted between accessing the first and the second account.
Figure 10.4: Part of the Trace
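The nested access pattern described above is what makes a deadlock possible. When two threads transfer money between the same two accounts in opposite directions, each can hold one account while waiting for the other. The sketch below is independent of JAVIS and reproduces this with plain synchronized blocks. A CountDownLatch replaces the inserted delay so that the deadlock occurs on every run. The program then queries the JDK's ThreadMXBean.findDeadlockedThreads(), which reports monitor deadlocks of the running JVM, whereas JAVIS detects them from the trace. The Account class and the method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

class NestedLockDeadlock {
    static final class Account {
        long value = 100;
    }

    // Nested locking: lock 'from', then, while still holding it, lock 'to'.
    // The latch plays the role of the inserted delay and ensures both threads hold their first lock.
    static Thread transfer(Account from, Account to, CountDownLatch bothHoldFirstLock) {
        Thread t = new Thread(() -> {
            synchronized (from) {
                bothHoldFirstLock.countDown();
                try {
                    bothHoldFirstLock.await();
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (to) {
                    from.value -= 10;
                    to.value += 10;
                }
            }
        });
        t.setDaemon(true); // deadlocked threads never finish; daemon threads let the JVM exit
        return t;
    }

    // Starts two opposite transfers and returns how many threads the JVM reports as deadlocked.
    static int deadlockedThreadCount() {
        Account a = new Account();
        Account b = new Account();
        CountDownLatch latch = new CountDownLatch(2);
        transfer(a, b, latch).start();
        transfer(b, a, latch).start();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {
            long[] ids = threads.findDeadlockedThreads(); // null while no deadlock exists
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + deadlockedThreadCount());
    }
}
```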
10.1.2 Tracing
In order to find out if the behaviour is as desired, we make a non-deterministic
test run with the JAVIS tracer. The banking example has to be started with a debugging port and in suspend mode, as shown below. The last argument is the class that contains the method main().
java -Xrunjdwp:transport=dt_socket,server=y,suspend=y
-Xdebug -Xnoagent -Djava.compiler=none
Main
Figure 10.5: Importing a Trace in Together
When the program is started in this way, the console shows the port number on which the program waits for a debugger. The tracer can connect to a program started in the above way.
It can also directly start a program itself. Fig. 10.3 depicts the main dialog
window of the tracer. The port number of the started banking application
has to be entered.
10.1.3 Automated Deadlock Detection
The tracing stops automatically because a deadlock has been detected. The
resulting trace is depicted in Fig. 10.4. The information about the deadlock
is part of the trace in our trace format.
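Because the deadlock report is part of the trace, a viewer only has to read lines. The following is a minimal sketch of a line parser. It assumes a colon-separated layout like the excerpt in Fig. 10.4: the thread and the calling object, then, after a double colon, the target object, the method name, the synchronized flag, and the event kind (Enter, Exit, or acquire). It also assumes that a line reading DEADLOCK marks a detected deadlock. The field layout, the CallEvent type, and the marker check are assumptions made for illustration. The authoritative format is the one defined for the JAVIS tracer.

```java
import java.util.Locale;
import java.util.Optional;

class TraceLineParser {
    enum Kind { ENTER, EXIT, ACQUIRE }

    static final class CallEvent {
        final String thread, caller, target, method;
        final boolean synchronizedMethod;
        final Kind kind;

        CallEvent(String thread, String caller, String target, String method,
                  boolean synchronizedMethod, Kind kind) {
            this.thread = thread;
            this.caller = caller;
            this.target = target;
            this.method = method;
            this.synchronizedMethod = synchronizedMethod;
            this.kind = kind;
        }
    }

    // Assumed layout: thread:caller::target:method:synchronized:kind, e.g.
    // "Thread-0:Employee112::Bank95:getAccount:true:Enter". Other lines yield Optional.empty().
    static Optional<CallEvent> parse(String line) {
        int sep = line.indexOf("::");
        if (sep < 0) return Optional.empty();
        String left = line.substring(0, sep);
        int colon = left.indexOf(':');
        String[] rest = line.substring(sep + 2).split(":");
        if (colon < 0 || rest.length != 4) return Optional.empty();
        Kind kind;
        switch (rest[3].trim().toLowerCase(Locale.ROOT)) {
            case "enter": kind = Kind.ENTER; break;
            case "exit": kind = Kind.EXIT; break;
            case "acquire": kind = Kind.ACQUIRE; break;
            default: return Optional.empty();
        }
        return Optional.of(new CallEvent(left.substring(0, colon), left.substring(colon + 1),
                rest[0], rest[1], Boolean.parseBoolean(rest[2].trim()), kind));
    }

    // Assumed marker line written when the analysis has found a deadlock.
    static boolean reportsDeadlock(String line) {
        return line.trim().equals("DEADLOCK");
    }

    public static void main(String[] args) {
        parse("Thread-0:Employee112::Bank95:getAccount:true:Enter")
                .ifPresent(e -> System.out.println(e.thread + " " + e.kind + " " + e.target + "." + e.method));
    }
}
```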
10.1.4 Trace and Deadlock Visualisation
The generated trace can be imported into Together using the JAVIS plug-in. In the corresponding dialog (Fig. 10.5), different options can be chosen: the entire trace, the methods involved in the deadlock, or only the deadlock can be visualised. A stepwise visualisation is also offered.
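The three import options can be read as filters over the trace. The sketch below approximates "methods involved in the deadlock" as all events of the deadlocked threads. It approximates "deadlock only" as the last event of each deadlocked thread, which is its pending lock request. The Event type, the Mode names, and these approximations are assumptions, not the plug-in's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ImportFilter {
    enum Mode { ENTIRE_TRACE, INVOLVED_METHODS, DEADLOCK_ONLY }

    // Hypothetical trace event: the executing thread plus a short description of the call.
    static final class Event {
        final String thread;
        final String call;

        Event(String thread, String call) {
            this.thread = thread;
            this.call = call;
        }

        @Override
        public String toString() { return thread + " " + call; }
    }

    static List<Event> select(List<Event> trace, Set<String> deadlockedThreads, Mode mode) {
        switch (mode) {
            case ENTIRE_TRACE:
                return new ArrayList<>(trace);
            case INVOLVED_METHODS: {
                // Approximation: every event of a thread that takes part in the deadlock.
                List<Event> involved = new ArrayList<>();
                for (Event e : trace) {
                    if (deadlockedThreads.contains(e.thread)) involved.add(e);
                }
                return involved;
            }
            default: {
                // DEADLOCK_ONLY: the last event of each deadlocked thread, i.e. its pending lock request.
                Map<String, Event> last = new LinkedHashMap<>();
                for (Event e : trace) {
                    if (deadlockedThreads.contains(e.thread)) last.put(e.thread, e);
                }
                return new ArrayList<>(last.values());
            }
        }
    }

    public static void main(String[] args) {
        List<Event> trace = Arrays.asList(
                new Event("main", "enter Bank"), new Event("t1", "enter a"), new Event("t2", "enter b"),
                new Event("t1", "acquire b"), new Event("t2", "acquire a"));
        Set<String> deadlocked = new HashSet<>(Arrays.asList("t1", "t2"));
        for (Mode m : Mode.values()) {
            System.out.println(m + ": " + select(trace, deadlocked, m));
        }
    }
}
```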
Figure 10.6: Sequence Diagram of a Trace
The complete trace can be visualised as a sequence diagram (see Fig. 10.6). In the sequence diagram, our profile allows methods declared with the synchronized keyword to be identified. The methods involved in the deadlock can also be visualised as a sequence diagram. Here, however, we prefer to show the corresponding collaboration diagram (see Fig. 10.7), which Together can generate automatically. In it, the stereotypes for locking and acquiring are used. It is also possible to visualise the deadlock separately (see Fig. 10.8).
The sequence and collaboration diagrams generated by our extension depict the Java execution more accurately than the sequence diagram generated by Together from the Java source code, because the profile distinguishes mutually exclusive calls on the basis of the synchronized keyword.
It is also possible to visualise the complete trace in Together (see Fig. 10.9). Together is able to display traces with over a hundred calls. It also has zoom-in and zoom-out functionality. However, zooming out does not scale very well: the labels can no longer be read. It would be better to have a way to compact a view while keeping the important information.
Correcting the Java Code
To avoid the deadlock, the code can be corrected such that the accounts are no longer accessed in a nested way but one after the other, so that a thread holds at most one account lock at a time.
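class Account {
    private long value;

    public synchronized void setValue(long newValue) {
        value = newValue;
    }

    public synchronized long getValue() {
        return value;
    }

    // Debit: only the lock of this account is held.
    public synchronized void drawValue(long amount) {
        setValue(getValue() - amount);
    }

    // Credit: only the lock of this account is held.
    public synchronized void drawCheque(long amount) {
        setValue(getValue() + amount);
    }
}

class AccountingTransaction extends Thread {
    private final Account target;
    private final Account receiver;
    private final long amount;

    AccountingTransaction(Account target, Account receiver, long amount) {
        this.target = target;
        this.receiver = receiver;
        this.amount = amount;
    }

    // Overrides Thread.run(); the two locks are acquired one after the other, never nested.
    @Override
    public void run() {
        target.drawValue(amount);
        receiver.drawCheque(amount);
    }
}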
Other recommended strategies are lock ordering, i.e. enforcing that locks are always acquired in the same order (see for instance [Lea00]), and a gate lock, i.e. a lock which guards a set of other locks.
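Lock ordering can be sketched as follows. Each account receives a unique, immutable number, and a transfer always locks the account with the smaller number first. Two opposite transfers therefore can no longer each hold one lock while waiting for the other. The id field, the transfer method, and the test harness are illustrative assumptions and are not part of the banking example.

```java
import java.util.concurrent.atomic.AtomicLong;

class OrderedTransfer {
    static final class Account {
        private static final AtomicLong NEXT_ID = new AtomicLong();
        final long id = NEXT_ID.getAndIncrement(); // position of this account in the global lock order
        private long value;

        Account(long value) { this.value = value; }

        synchronized long getValue() { return value; }
    }

    // Locks both accounts in ascending id order, whatever the direction of the transfer.
    static void transfer(Account from, Account to, long amount) {
        Account first = from.id < to.id ? from : to;
        Account second = (first == from) ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.value -= amount;
                to.value += amount;
            }
        }
    }

    // Runs opposite transfers concurrently; returns the total balance, or -1 if a thread is stuck.
    static long runOppositeTransfers(int rounds) {
        Account a = new Account(1000);
        Account b = new Account(1000);
        Thread t1 = new Thread(() -> { for (int i = 0; i < rounds; i++) transfer(a, b, 1); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < rounds; i++) transfer(b, a, 1); });
        t1.setDaemon(true); // a stuck thread must not keep the JVM alive
        t2.setDaemon(true);
        t1.start();
        t2.start();
        try {
            t1.join(10_000);
            t2.join(10_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
        if (t1.isAlive() || t2.isAlive()) return -1;
        return a.getValue() + b.getValue();
    }

    public static void main(String[] args) {
        System.out.println(runOppositeTransfers(100_000)); // 2000
    }
}
```

Lock ordering keeps the nested structure of the original transfer, whereas the correction above removes the nesting altogether. Which of the two fits better depends on whether the debit and credit must appear atomic to other threads.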
Figure 10.7: Collaboration Diagram with Involved Methods
10.2. EXAMPLE FOR GENERAL PURPOSE TRACING 199
Figure 10.8: Collaboration Diagram with Deadlock
The banking example should be seen as an illustrative example. We do not claim that, in real software, the code to transfer money between accounts looks exactly like this.
10.2 Example for General Purpose Tracing
Here we report on an experiment with remote tracing and visualisation of a
real-size concurrent Java application.
10.2.1 A Simulation Software Example
The JEVOX application [JEV] is a simulation of a factory environment. In
this factory, shuttles run on tracks and stop from time to time at robots, which either load goods onto them or remove goods from them. The tracks and the behaviour of the shuttles are visualised on the screen.
Figure 10.9: Complete Trace Visualisation in Together
The JEVOX simulation has been generated from a high-level executable
specification. During the testing of the simulation, a failure occurred. The
simulation crashed when two shuttles were closely following each other.
10.2.2 Tracing and Visualising JEVOX
We traced this situation in order to capture the exact behaviour before the
crash. The remote tracing facility was very convenient: we neither had to
install the software we wanted to trace on our own machine nor had to
install the tracer on the application machine. Instead, the application only
had to be restarted with a port opened for the tracer. The tracing facility,
running on a different machine, then connected to this port. The only
prerequisite was that both machines were connected via a network.
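Remote tracing of this kind can be built on the Java Debug Interface (JDI), the Java-level API of the JPDA. The following minimal sketch, not the JAVIS tracer itself, attaches to a target JVM over a socket and prints one line per method entry. It assumes the target was started with a JDWP socket listener, e.g. `-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000` (on JDK 9 and later, `address=*:8000` is needed to accept connections from another machine); the class name RemoteTracer and the output format are hypothetical.

```java
import com.sun.jdi.Bootstrap;
import com.sun.jdi.VirtualMachine;
import com.sun.jdi.connect.AttachingConnector;
import com.sun.jdi.connect.Connector;
import com.sun.jdi.event.Event;
import com.sun.jdi.event.EventSet;
import com.sun.jdi.event.MethodEntryEvent;
import com.sun.jdi.event.VMDisconnectEvent;
import com.sun.jdi.request.EventRequest;
import com.sun.jdi.request.MethodEntryRequest;
import java.util.Map;

// Hypothetical minimal remote tracer based on the JDI.
class RemoteTracer {
    // Looks up the standard socket-attaching connector shipped with the JDK.
    static AttachingConnector findSocketAttach() {
        for (AttachingConnector c : Bootstrap.virtualMachineManager().attachingConnectors()) {
            if (c.name().equals("com.sun.jdi.SocketAttach")) {
                return c;
            }
        }
        return null;
    }

    // Attaches to host:port and prints every method entry in classes
    // matching classFilter (e.g. "org.example.*").
    static void trace(String host, String port, String classFilter) throws Exception {
        AttachingConnector connector = findSocketAttach();
        Map<String, Connector.Argument> args = connector.defaultArguments();
        args.get("hostname").setValue(host);
        args.get("port").setValue(port);
        VirtualMachine vm = connector.attach(args);

        MethodEntryRequest request = vm.eventRequestManager().createMethodEntryRequest();
        request.addClassFilter(classFilter);
        request.setSuspendPolicy(EventRequest.SUSPEND_NONE); // keep the target running
        request.enable();

        while (true) {
            EventSet events = vm.eventQueue().remove(); // blocks until events arrive
            for (Event e : events) {
                if (e instanceof MethodEntryEvent) {
                    MethodEntryEvent entry = (MethodEntryEvent) e;
                    System.out.println(entry.thread().name() + " -> "
                            + entry.method().declaringType().name() + "."
                            + entry.method().name());
                } else if (e instanceof VMDisconnectEvent) {
                    return; // target terminated or connection closed
                }
            }
            events.resume();
        }
    }

    public static void main(String[] args) throws Exception {
        trace(args[0], args[1], args[2]);
    }
}
```

For example, `java RemoteTracer apphost 8000 'org.example.*'` would trace calls in a (hypothetical) package org.example on the machine apphost. Method-entry events typically slow the target down considerably, so a narrow class filter is advisable.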
Note that the code we traced was generated code. Figure 10.10 shows
the first part of the trace. Although the code is generated, its main principle
is obvious. The object on the left is the execution engine of the
simulation. It triggers a robot to interact with a shuttle. The robot in turn
makes several calls on the shuttle, and so on.
The trace contained over 500 calls, and the trace file had a size of 173 KByte.
Tracing took a few minutes. With our Together extension for trace visualisation,
we could visualise a little more than one hundred calls; to this end, we had
manually truncated the trace before visualisation. The trace of the calls of
generated methods was easily understandable for the developers of the application.
From the trace, they understood that methods were called in an unexpected
order, which provoked the crash.
Of course, this failure could not be detected automatically, because it was
a failure in the logic of the application. However, capturing a trace
automatically and subsequently viewing it was more convenient than
debugging the application step by step.
10.3 Summary
In this chapter we first described how to work with the JAVIS prototypes.
This was illustrated with a small-scale example, in which a deadlock could
be successfully detected using automated detection.
Then we described how we traced a real-size software project in order
to help the developers understand a failure in the concurrent application
logic. Here, it was the general purpose tracing and visualisation facility that
helped the developers in understanding the failure, which was caused by an
unexpected execution order.
Figure 10.10: Part of the Remote Trace
Chapter 11
Conclusion
This thesis describes an approach for automated detection of concurrent
liveness failures and their potentials in the execution of Java programs. It
contributes new concepts, or adapts existing ones, in four different areas: the
domain of concurrent Java liveness failures, runtime data collection, runtime
analysis, and software visualisation.
The focus of this thesis is on presenting a coherent treatment of these
areas by showing how concepts from the domain can be put to work in the
other areas in order to achieve the overall goal. This results in concepts for
tool support and in a set of prototypes which seamlessly work together.
In the following, the contributions are described in more detail and eval-
uated.
11.1 Contributions
1. Liveness Failure and Potential Classification
For each conceivable failure and potential, a detailed description was
provided and illustrated by an example UML sequence diagram.
The failures and potentials were analysed for the Java synchronisation
mechanisms involved. A set of thread dependencies was identified as a
source of failures and potentials.
2. Formalisation of Liveness Failures and Potentials
The synchronised behaviour of a single thread was modelled using a
UML statechart. Based on the statechart, the different threads in an
executing Java program and the dependencies between them were mod-
elled.
Based on these models, the failures and potentials were specified using
logic formulas.
3. Detection of Liveness Failures and Potentials
Algorithms were given for detecting failures involving cycles and for
detecting their potentials.
4. Tracing of Concurrent Java Programs
A trace format and a tracing method for concurrent Java programs
were defined which are suitable for the implementation of the specified
algorithms.
5. Visualisation of Liveness Failures and Potentials
A UML profile for visualising the execution of concurrent Java pro-
grams was developed.
It allows failures and potentials to be visualised in execution scenarios.
6. Tools and Prototypical Evaluation of Concepts
Prototypes for General Purpose Tracing, Analysis, and UML-based
Visualisation were implemented.
Their intended use was illustrated with examples.
11.2 Evaluation
The above contributions are assessed for their usefulness, i.e. how well they
achieve the goals of this thesis.
Domain of Liveness Failures and Potentials
The analysis of the domain of liveness failures was carried out in depth.
Failures were described at different levels of detail and abstraction including
intuitive visualisations and formalisations.
This fills a gap in the literature. It helped us considerably in understanding
the domain. Such material could have saved the students involved in the
projects related to this thesis a lot of time when learning which pitfalls to
avoid.
We also found that an explicit analysis of how and why failures happen
gives valuable feedback to programmers who have to deal with Java threads
from time to time.
Statechart
In the same vein, the statechart contributed substantially to a better
understanding of the domain and to capturing the analysis.
In fact, the formalisation of thread dependencies helped in two respects.
Firstly, it was more accurate than the informal description, which was only
accompanied by the visualisation with UML sequence diagrams. In particular,
it allowed dependencies such as the locks held by a thread to be specified
explicitly. This accuracy was needed to specify the algorithms.
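To illustrate what making such dependencies explicit means, here is a deliberately simplified model of the synchronisation state of a single thread. It is a hypothetical sketch, not the statechart of the thesis: reentrant locking, timeouts, sleep(), join() and interrupts are ignored, and the class name ThreadSyncState is an assumption. Each event moves the thread between RUNNING, BLOCKED (waiting to enter a monitor) and WAITING (after wait()), while the set of held locks is tracked explicitly:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified synchronisation state of one Java thread.
// Reentrant locking, timeouts and interrupts are deliberately ignored.
class ThreadSyncState {
    enum State { RUNNING, BLOCKED, WAITING }

    State state = State.RUNNING;
    final Set<Object> heldLocks = new HashSet<>();
    Object requestedLock; // non-null while BLOCKED
    Object waitObject;    // non-null while WAITING

    // The thread tries to enter a synchronized block or method on lock.
    void requestLock(Object lock) {
        requestedLock = lock;
        state = State.BLOCKED;
    }

    // The requested monitor has been granted to the thread.
    void acquireLock() {
        heldLocks.add(requestedLock);
        requestedLock = null;
        state = State.RUNNING;
    }

    // The thread leaves the synchronized block or method.
    void releaseLock(Object lock) {
        heldLocks.remove(lock);
    }

    // wait() releases the monitor of obj and suspends the thread.
    void callWait(Object obj) {
        if (!heldLocks.remove(obj)) {
            throw new IllegalMonitorStateException("wait() without holding the monitor");
        }
        waitObject = obj;
        state = State.WAITING;
    }

    // notify()/notifyAll() by another thread: the waiting thread must
    // reacquire the monitor before it can continue.
    void notified() {
        Object obj = waitObject;
        waitObject = null;
        requestLock(obj);
    }
}
```

Note that notified() leads to BLOCKED rather than RUNNING: a notified thread must first reacquire the monitor, which is exactly the kind of dependency that an informal description easily glosses over.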
Secondly, the statechart allowed us to find additional failures that we had
not found described in the literature. It was straightforward to use the
statechart to check different combinations of execution histories of threads.
The use of the statechart also raised further issues that could not be
treated here, such as its formal semantics. Nevertheless, we think that a visual
formalism has advantages, e.g. over textual formalisms, even though other
formalisms might be more mature.
Algorithms
For a prototype, our choice of algorithm is adequate, since we did not need
to be as fast as possible. If algorithms for all kinds of failures and potentials
run in parallel, however, performance becomes an issue. For a commercial tool,
the algorithms should be made as efficient as possible.
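As a generic illustration of a cycle-based detection algorithm (not necessarily the one used in JAVIS), a deadlock among threads blocked on monitors can be found as a cycle in a wait-for graph, where an edge from t to u means that thread t is blocked on a lock held by thread u. Since a Java thread can be blocked on at most one monitor at a time, every node has at most one outgoing edge, so a cycle can be found by simply following the chain. The class WaitForGraph and its methods are hypothetical:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical wait-for graph for monitor deadlocks.
class WaitForGraph {
    private final Map<String, String> lockOwner = new HashMap<>(); // lock -> owning thread
    private final Map<String, String> blockedOn = new HashMap<>(); // thread -> requested lock

    // Trace event: thread has acquired lock (and is therefore no longer blocked).
    void acquired(String thread, String lock) {
        lockOwner.put(lock, thread);
        blockedOn.remove(thread);
    }

    // Trace event: thread is blocked trying to enter the monitor of lock.
    void blocked(String thread, String lock) {
        blockedOn.put(thread, lock);
    }

    // Follows the single outgoing edge of each thread starting at start.
    // Returns the threads forming a cycle reachable from start, or an
    // empty list if the chain ends at a thread that is not blocked.
    List<String> findCycle(String start) {
        List<String> path = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        String current = start;
        while (current != null && visited.add(current)) {
            path.add(current);
            String lock = blockedOn.get(current);
            current = (lock == null) ? null : lockOwner.get(lock);
        }
        if (current == null) {
            return Collections.emptyList();
        }
        // current was visited before: the cycle starts at its first occurrence.
        return new ArrayList<>(path.subList(path.indexOf(current), path.size()));
    }
}
```

Checking one thread takes time linear in the length of its chain; repeating such a check for every blocked thread after every trace event is where performance starts to matter.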
Tracing
The trace format we have specified is a general purpose format for concurrent
Java programs. We expect to be able to use it for experiments with other
kinds of goals as well.
We have discussed tracing methods in depth, including technologies that
we consider state-of-the-art, such as the Java Platform Debugger Architecture
(JPDA). We have contributed to the dissemination of these technologies by
using them and by reporting our experience with them, for example regarding
performance, which we measured experimentally because the provider of the
Java Virtual Machine had not published any performance data.
Note that we proposed to use the JPDA for a purpose for which it was not
originally designed. This also explains the poor performance we measured.
It is an interesting observation that there is no dedicated support for tracing,
although dynamic analysis is increasingly considered an important source of
information when analysing software qualities.
Visualisation
From time to time, general purpose software visualisation prototypes are
proposed without supporting a dedicated goal. As a result, they are typically
not very useful. We have therefore set out to support a dedicated goal, to
increase the chance of providing a useful software visualisation tool.
We have also gone further than others by adapting the UML for the
purpose of dynamic visualisation of existing software. Here, our conclusion
is that the UML does not provide ideal extension mechanisms for describing
the semantics of a programming language exactly. One could reply
that this is not the intention of the UML.
The UML is certainly a debatable candidate for visualisations in the area
of debugging. Nevertheless, the UML is a good fit for traces of object-oriented
languages like Java. Our conclusion is that it is possible to visualise selected
parts of traces with UML. Failures and methods causing potentials are such
selected parts.
There is, however, no evidence that the UML is better than other visual
languages in this respect. We consider it very difficult to assess these
issues empirically, especially if the languages are not very different from each
other. First and foremost, it is the functionality of a software tool that matters.
Prototypes
For the prototypes we used state-of-the-art technology, which is scalable
and turned out to be very stable. This technology was chosen not only from
an academic point of view but also from a practical perspective, so that the
solution can be adapted for commercial tools. Our experiences and results
are of interest to program developers and to developers of tools for program
developers.
For the tracing component, we have tried to provide essential features
such as tracing an already running program remotely. We have also provided
concepts for using tracing within many environments. These features
were easily accomplished using the Java Platform Debugger Architecture.
The user control we provided with the tracer is not very flexible compared
to a debugger. It mainly lacks interactive control.
The UML CASE tool Together is widely used in industrial settings. Together
still has an impact, even though new tools such as Eclipse [Ecl] have appeared.
Its UML modelling facilities are more mature than those of the UML plug-ins
for Eclipse.
It was straightforward to use the Together Open API to write a plug-in
for generating diagrams from the trace file. Once this was done, we benefited
from the standard diagram editor facilities. We acknowledge that this saved
a lot of development time.
For our UML profile it was not a problem that the Together version we
used did not support graphical stereotypes or constraints on stereotypes.
Chapter 12
Outlook
The outlook is concerned with three issues: addressing parts of the thesis
which can be treated in more depth, the evolution of our ideas in the evolving
domain of Java and UML, and the transfer of our ideas to related domains.
12.1 Remaining Issues
Concurrency Libraries
Our approach focuses on tracing concurrency implemented directly with
Java's built-in constructs. Because of the deficiencies of these constructs,
there is a movement to develop suitable Java libraries for concurrency [Lea03].
When such libraries are used, our approach may be of limited usefulness.
Firstly, the user does not want to debug the library. Secondly, liveness failures
in software based on high-level protocols cannot be detected in the same way
as the basic Java failures.
Such protocols use synchronized, wait(), and notify()/notifyAll() to implement
their high-level synchronisation strategies. Therefore, when a thread cannot
make progress, this is still caused by the typical method calls and statements.
But this kind of waiting has a special meaning in terms of the given protocol,
and this meaning cannot be detected by our approach. Our approach would have
to be adapted to each library used.
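As an example of such a protocol, consider a counting semaphore built from the basic constructs. This is a hypothetical sketch; the class SimpleSemaphore is not taken from any particular library. A tracer that only observes synchronized, wait() and notifyAll() sees a thread waiting on the monitor of a SimpleSemaphore object; the protocol-level meaning, namely that the thread is waiting for a permit that no other thread has released yet, is not visible at that level:

```java
// Hypothetical counting semaphore implemented with Java's basic constructs.
class SimpleSemaphore {
    private int permits;

    SimpleSemaphore(int permits) {
        this.permits = permits;
    }

    // Blocks until a permit is available. At the Java level this appears
    // only as wait() on this object's monitor.
    synchronized void acquire() throws InterruptedException {
        while (permits == 0) {
            wait(); // re-check the condition after every wake-up
        }
        permits--;
    }

    // Returns a permit and wakes up all waiting threads.
    synchronized void release() {
        permits++;
        notifyAll();
    }

    synchronized int availablePermits() {
        return permits;
    }
}
```

If no thread ever calls release(), a thread in acquire() waits forever; a generic detector would report this at best as a thread waiting without a matching notification, not as a missing permit.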
Formalisation
We have already pointed out that the formalisation of thread synchronisation
is a field with open issues. Existing formalisations do not yet cover it com-
pletely. Our approach using statecharts still lacks a formal semantics. This
is a prerequisite for the required validation of properties of the statechart
and for the required validation against the Java Language Specification.
Alternatively, one of the other formalisms presented in this thesis could be
used and extended.
Tracing
Tracing of Java still has deficits; in particular, performance has to be improved.
To foster interoperability of our approach with tools from testing, we
think it is worthwhile to consider other standardised formats, such as those
used for testing.
The control over the tracing process could be more flexible. Tracing
could be combined more tightly with debuggers. For instance, debuggers
could automatically generate small traces to improve their visualisation of
the execution history. Conversely, tracing could set breakpoints at interesting
lines of code and start the debugger when these breakpoints are hit. The
debugger is important because it allows better browsing of object structures.
We think it would be a good decision to integrate tracing into Eclipse.
Visualisation
Typically, traces are visualised using 2-dimensional languages such as the
UML. So far, there is no 3-dimensional trace language, although such
languages have already been considered in research.
12.2 Evolution
The area of software development is constantly changing. Therefore, practi-
cal solutions need to be constantly adjusted, too. Software development tools
must continuously be adapted to improvements of programming languages,
middleware, and operating systems.
Despite Java's many benefits and innovations, many of its language
concepts are not new; they had already appeared in other languages. Moreover,
the design of some of its initial features did not even reflect the
state of the art. As a result, Java is undergoing bigger changes than other
programming languages.
Java 1.5
Java 1.5.0, also called 5.0, was released only recently [Jav04]. For concurrent
programming, it provides new classes such as a lock class for mutual exclusion.
This will certainly have an impact on the failures we have to deal with.
In analogy to the above discussion on third-party concurrency libraries, the
tracing, the analysis, and the visualisation will have to cover the additional
classes.
The lock class for mutual exclusion is very similar to synchronized. Its
advantage is that a thread waiting for the lock can be interrupted, or can give
up after a timeout, instead of remaining blocked indefinitely. This can be used
to resolve deadlocks.
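In Java 1.5, this lock class is java.util.concurrent.locks.ReentrantLock. Besides lockInterruptibly(), which lets a blocked thread be interrupted, it offers a timed tryLock(long, TimeUnit). The following minimal sketch uses the latter to acquire two locks or give up; the class TimedTransfer and the method runWithBoth are hypothetical names. Giving up releases the first lock, which breaks the hold-and-wait condition of a deadlock:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper: run an action while holding two locks, or give up.
class TimedTransfer {
    // Returns true if the action ran while both locks were held, false if
    // either lock could not be obtained within timeoutMs milliseconds.
    static boolean runWithBoth(ReentrantLock first, ReentrantLock second,
                               long timeoutMs, Runnable action) throws InterruptedException {
        if (!first.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            return false; // first lock not available in time
        }
        try {
            if (!second.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                return false; // back off; the finally block releases first
            }
            try {
                action.run();
                return true;
            } finally {
                second.unlock();
            }
        } finally {
            first.unlock();
        }
    }
}
```

A caller that receives false can retry later, possibly after a random delay to avoid repeated collisions. This trades the risk of deadlock for the possibility of livelock, which would itself be a liveness failure.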
Nevertheless, the provided new classes are at a similar level of abstraction
and hence our ideas can be adapted.
Java Virtual Machine Tool Interface (JVMTI)
Also, Java 1.5 has deprecated the native profiling and debugging
interfaces JVMPI and JVMDI. They are replaced by the Java Virtual
Machine Tool Interface (JVMTI). In subsequent releases, the deprecated
interfaces will be removed.
Therefore, all existing tools that build on these native interfaces, including
the JVMDI-based back end of the JPDA, have to be ported to the new interface.
The JVMTI is an answer to the many requests for feature enhancements.
Hopefully, the JVMTI will resolve some of the problems of the JPDA, and it
will provide implementations of methods which have already been proposed
in the JPDA but have never been implemented. The JVMTI will support the
deadlock detection that we still had to program ourselves.
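A related facility that did appear in the Java 1.5 standard library is ThreadMXBean.findMonitorDeadlockedThreads() in java.lang.management, a Java-level monitoring API rather than part of the JVMTI. The following sketch, with the hypothetical class DeadlockProbe, deliberately provokes a deadlock between two daemon threads that lock two monitors in opposite order and then polls the JVM until the deadlock is reported:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical demo of the JVM's built-in monitor deadlock detection.
class DeadlockProbe {
    // Provokes a deadlock and returns the ids of the deadlocked threads,
    // or an empty array if none is reported within about five seconds.
    static long[] provokeAndDetect() throws InterruptedException {
        final Object a = new Object();
        final Object b = new Object();
        final CountDownLatch bothHoldFirst = new CountDownLatch(2);
        Thread t1 = new Thread(() -> lockInOrder(a, b, bothHoldFirst));
        Thread t2 = new Thread(() -> lockInOrder(b, a, bothHoldFirst));
        t1.setDaemon(true); // let the JVM exit despite the deadlock
        t2.setDaemon(true);
        t1.start();
        t2.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 50; i++) {
            long[] ids = mx.findMonitorDeadlockedThreads(); // null if none
            if (ids != null) {
                return ids;
            }
            Thread.sleep(100);
        }
        return new long[0];
    }

    private static void lockInOrder(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await(); // make sure both threads hold their first monitor
            } catch (InterruptedException e) {
                return;
            }
            synchronized (second) {
                // never reached: each thread waits for the other's monitor
            }
        }
    }
}
```

findMonitorDeadlockedThreads() only considers object monitors, i.e. synchronized; Java 6 added findDeadlockedThreads(), which also covers the new lock classes.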
UML 2.0
Not only programming languages are evolving, but also modelling languages.
Our approach remains feasible with UML 2.0 [UMLb]. Without going
into details, we conclude that the amendments to interaction diagrams do
not conflict with our purposes.
12.3 Related Domains
In this thesis, the focus was on Java. There are other concurrent object-oriented
languages of industrial strength, such as Ada [Ada95], C♯ [Cs], or
even Modula-3 [Mod]. They also face concurrent liveness failures. Our ideas
could be adapted to them, and a UML-based visualisation could be used as
well. For instance, C♯ has a threading concept very similar to that of Java.
The ideas could also be adapted for languages using thread libraries such as
POSIX threads.
This thesis only looked at concurrency. In distributed systems, support
for failure detection is also an important issue. For instance, the tracing could
be extended to include remote method invocation (RMI). For a distributed
setting, time stamps are required. With these prerequisites, it is possible to
identify deadlocks involving RMI calls [Sch02].
Tracing has a high potential to gain more impact in the future, because
it can be used to guide model checking, as proposed in [Hav00].
Bibliography
[ACM97] The Debugging Scandal and What to Do About It. Special Issue
Communications of the ACM, 40(4), April 1997.
[ACM03] Developer Tools - Proceed with Caution. ACM Queue - Tomor-
row’s computing today, 1(6), September 2003.
[Ada95] Ada95. Ada Home, 1995. http://www.adahome.com.
[AJ98] AspectJ Specification. Technical report, XEROX, Palo Alto
Research Center, http://www.parc.xerox.com/aop/aspectj,
1998.
[Ass] Assure Thread Analyzer. Intel. http://www.intel.com.
[AWY93] G. Agha, P. Wegner, and A. Yonezawa. Research Directions in
Concurrent Object-Oriented Programming. MIT Press, 1993.
[Ban04] The Bandera Project. http://bandera.projects.cis.ksu.edu/,
2004.
[BCH+00] K. Berg, S. Canditt, A. Hennig, A. Hentschel, E. Reyzl, and
J. Schmitz-Foster. Monitoring with TMT - Insight Into Dis-
tributed Systems. In Proceedings of the International Conference
on Parallel and Distributed Processing Techniques and Applica-
tions, PDPTA 2000, 2000.
[BDE+02] R. Brown, K. Driesen, D. Eng, L. Hendren, J. Jorgensen, C. Ver-
brugge, and Q. Wang. STEP: A Framework for the Efficient
Encoding of General Trace Data. In PASTE 02, 2002.
[BS99] E. Boerger and W. Schulte. A Programmer Friendly Modular
Definition of the Semantics of Java. In J. Alves-Foss, editor,
Formal Syntax and Semantics of Java, volume 1523, pages 353–
404. LNCS, 1999.
[BT98] A. Bechini and K. Tai. Design of a Toolset for Dynamic Analysis
of Concurrent Java Programs. In Proc. International Workshop
on Program Comprehension IWPC98, 1998.
[But97] D. Butenhof. Programming with POSIX Threads. Addison-
Wesley, 1997.
[CDH+00] J. Corbett, M. Dwyer, J. Hatcliff, C. Pasareanu, Robby,
S. Laubach, and H. Zheng. Bandera: Extracting Finite-State
Models from Java Source Code. In 22nd International Confer-
ence on Software Engineering, June 2000.
[CES71] E. G. Coffman, M. J. Elphick, and A. Shoshani. System Deadlocks.
ACM Computing Surveys, 3(2):67–78, June 1971.
[CGR01] S. Canditt, K. Grabenweger, and E. Reyzl. Trace-basiertes
Testen von Middleware. Java Spektrum, (3):17–32, May 2001.
(in German).
[CKRW99] P. Cenciarelli, A. Knapp, B. Reus, and M. Wirsing. An Event-
Based Structural Operational Semantics of Multi-threaded Java.
In J. Alves-Foss, editor, Formal Syntax and Semantics of Java,
volume 1523, pages 157–200. LNCS, 1999.
[CLL+02] J.-D. Choi, K. Lee, A. Loginov, R. O’Callahan, V. Sarkar, and
M. Sridharan. Efficient and Precise Datarace Detection for Mul-
tithreaded Object-Oriented Programs. In ACM SIGPLAN 2002
Conference on Programming Language Design and Implementa-
tion (PLDI’02), June 2002.
[Cs] C♯. Microsoft. http://www.microsoft.com.
[CT91] R. Carver and K. Tai. Replay and Testing for Concurrent Pro-
grams. IEEE Software, 8(2):66–74, 1991.
[CZ02] J.-D. Choi and A. Zeller. Isolating Failure-Inducing Thread
Schedules. In International Symposium on Software Testing and
Analysis ISSTA, 2002.
[DD00] E.-E. Doberkat and S. Dissmann. Einführung in die objektorientierte
Programmierung. Oldenbourg, 2000. (in German).
[Ecl] Eclipse. IBM. http://www.eclipse.org.
[EHSW99] G. Engels, R. Hücking, S. Sauer, and A. Wagner. UML Collaboration
Diagrams and their Transformation to Java. In Proceedings
of UML 1999. LNCS, 1999.
[Eng99] J. Engel. Programming for the Java Virtual Machine. Addison-
Wesley, June 1999.
[FR01] M. A. Francel and S. Rugaber. The Value of Slicing While De-
bugging. Science of Computer Programming - Special Issue on
Program Comprehension (Pittsburg, PA, 1999), 40(2-3):151–169,
2001.
[GH03] A. Goldberg and K. Havelund. Instrumentation of Java Bytecode
for Runtime Analysis. In Fifth ECOOP Workshop on Formal
Techniques for Java-like Programs, volume 1885, July 2003.
[GJM91] C. Ghezzi, M. Jazayeri, and D. Mandrioli. Fundamentals of Soft-
ware Engineering. Prentice Hall, 1991.
[GJS97] J. Gosling, B. Joy, and G. Steele. The Java Language Specifica-
tion. Addison-Wesley, 1997.
[GJSB00] J. Gosling, B. Joy, G. Steele, and G. Bracha. The Java Language
Specification, 2nd edition. Addison-Wesley, 2000.
[Har00] J. Harrow. Debugging Multithreaded Applications on Compaq
Tru64 UNIX Operating Systems. Technical report, Compaq
Computer, 2000.
[Hav00] K. Havelund. Using Runtime Analysis to Guide Model Checking
of Java Programs. In Proc. 7th Intl. SPIN Workshop on Model
Checking of Software, volume 1885. Lecture Notes in Computer
Science, August 2000.
[HD01] J. Hatcliff and M. Dwyer. Using the Bandera Tool Set to Model-
Check Properties of Concurrent Java Software. In Proceedings
of CONCUR 2001, volume 2154. Lecture Notes in Computer
Science, June 2001. Invited Tutorial Paper.
[HG97] D. Harel and E. Gery. Executable Object Modeling with State-
charts. IEEE Computer, 30(7):31–42, 1997.
[HK04] D. Harel and H. Kugler. The Rhapsody Semantics of Statecharts
(or, On the Executable Core of the UML). 2004.
[HLL03] A. Hamou-Lhadj and T. Lethbridge. Compact Trace Format
(CTF). In Workshop ATEM2003 co-located with WCRE 2003,
2003.
[Hol71] R. Holt. Some Deadlock Properties of Computer Systems. ACM
Computing Surveys, 3(2), June 1971.
[Hol98] A. Holub. Programming Java Threads in the Real World
- The Perils of Race Conditions, Deadlock, and Other
Threading Problems. Java World, Java Toolbox Column,
http://www.javaworld.com, October 1998.
[Hol00] A. Holub. If I were king: A proposal for fixing the Java program-
ming language’s threading problems. IBM Developer Works,
Java Technology, http://www.holub.com/publications, Octo-
ber 2000.
[HP98] D. Harel and M. Politi. Modeling Reactive Systems with State-
charts: The STATEMATE Approach. McGraw-Hill, 1998.
[HR01] K. Havelund and G. Rosu. Java PathExplorer - A Runtime Ver-
ification Tool. In Proc. The 6th International Symposium on AI,
Robotics and Automation in Space, May 2001.
[Hya] Hyades for Eclipse. IBM. http://www.ibm.com.
[I3E90] Standard Glossary of Software Engineering Terminology Std
610.12-1990. IEEE Computer Society, 1990.
[Jav01] Java 2 Platform, Standard Edition, V1.4.2.
http://java.sun.com/j2se/1.4.2/docs/api/index.html,
2001.
[Jav04] Java 2 Platform, Standard Edition, V1.5.0.
http://java.sun.com/j2se/1.5.0/docs/api/index.html,
2004.
[JBd01] JBuilder 5. Borland Software Corporation, 2001.
http://www.borland.com/jbuilder/.
[JDv01] JDeveloper. Oracle, 2001.
http://www.oracle.com/technology/products/jdev/.
[JEV] JEVOX - Shuttle Simulation Environment.
http://wwwcs.upb.de/cs/jevox/.
[Jin] IBM Jinsight - A Visual Tool for Optimizing and Un-
derstanding Java Programs. W. De Pauw and others.
http://www.research.ibm.com/jinsight.
[JMT] JMThreadUnit. http://www.sourceforge.net.
[Jon97] M. Jones. What Really Happened on Mars.
http://research.microsoft.com/~mbj/Mars_Pathfinder/Authoritative_Account.html,
1997.
[JPD] JDK 1.4 Java Platform Debugger Architecture. JavaSoft.
http://www.javasoft.com/products/jpda/.
[JPF03] Java Pathfinder - A Formal Methods Tool for Java. NASA, 2003.
http://ase.arc.nasa.gov/visser/jpf/.
[JPr00] Threadalyzer. JProbe, 2000. http://www.jprobe.com.
[JTe] JTest. http://www.sourceforge.net.
[JTr] JTrek. COMPAQ/SRC. http://research.compaq.com/SRC/.
[JUn] JUnit. JProbe. http://www.jprobe.com.
[KG01] R. Kollmann and M. Gogolla. Capturing Dynamic Program Be-
haviour with UML Collaboration Diagrams. In 5th European
Conference on Software Maintenance and Reengineering. IEEE,
2001.
[KLM+97] G. Kiczales, J. Lamping, A. Mendhekar, C. Maeda, C.V. Lopes,
J.M. Loingtier, and J. Irwin. Aspect-Oriented Programming. In
Proceedings of ECOOP ‘97, number 1241 in LNCS, pages 220–
243, 1997.
[Kra98] E. Kraemer. Visualizing Concurrent Programs. In Software Visu-
alization: Programming as a Multimedia Experience, pages 237–
256. MIT Press, Cambridge, MA, 1998.
[KRR98] D. Kimelman, B. Rosenburg, and T. Roth. Visualization of Dy-
namics in Real World Systems. In Software Visualization: Pro-
gramming as a Multimedia Experience. MIT Press, Cambridge,
MA, 1998.
[L4J] Log4J. http://logging.apache.org/log4j.
[Lam77] L. Lamport. Proving the Correctness of Multiprocess Pro-
grams. IEEE Transactions on Software Engineering, 3(2):125–
143, March 1977.
[Lea97] D. Lea. Concurrent Programming in Java, Design Principles and
Patterns. Addison-Wesley, 1997.
[Lea00] D. Lea. Concurrent Programming in Java, Design Principles and
Patterns (2nd Ed.). Addison-Wesley, 2000.
[Lea03] D. Lea. Java Concurrency Utility Library (developed under JSR-
166). http://gee.cs.oswego.edu/dl/, 2003.
[Lew03a] B. Lewis. Debugging Backwards in Time. In AADEBUG 2003,
2003.
[Lew03b] B. Lewis. Omniscient Debugging. In ASARTI Workshop
ECOOP, 2003.
[LF98] H. Lieberman and C. Fry. ZStep95: A Reversible, Animated
Source Code Stepper. In Software Visualization: Programming
as a Multimedia Experience. MIT Press, Cambridge, MA, 1998.
[LMM99] D. Latella, I. Majzik, and M. Massink. Towards a formal op-
erational semantics of UML statechart diagrams. In Proceed-
ings of Formal Method for Open Object-based Distributed Systems
FMOODS 1999. Kluwer Academic Publishers, 1999.
[LRRM03] H. Leroux, A. Requile-Romanczuk, and C. Mingins. JACOT: A
Tool to Dynamically Visualise the Execution of Concurrent Java
Programs. In Proc. PPPJ 2003, 2003.
[LS02] C. Lewerentz and F. Simon. Metrics-based 3D Visualization of
Large Object-Oriented Programs. In Proceedings of the First
IEEE International Workshop on Visualizing Software for Un-
derstanding and Analysis (VISSOFT 2002), pages 70 – 77. IEEE
Computer Society Press, 2002.
[Meh02] K. Mehner. JaVis: A UML-Based Visualization and Debugging
Environment for Concurrent Java Programs. In S. Diehl, editor,
Software Visualization, International Seminar, Dagstuhl Castle,
Germany, May 20-25, 2001. Revised Papers, volume 2269, pages
163 – 175. LNCS, 2002.
[Meh03] K. Mehner. Performante Überwachung von Methodenaufrufen
mit JPDA. Java Spektrum, (6):42–46, November 2003. (in German).
[Mit00] W. D. Mitchell. Debugging Java - Troubleshooting for Program-
mers. McGraw-Hill Computing, 2000.
[MK99] J. Magee and J. Kramer. Concurrency: State Models & Java
Programs. John Wiley & Sons, 1999.
[MLMD01] J.I. Maletic, J. Leigh, A. Marcus, and G. Dunlap. Visualizing
Object-Oriented Software in Virtual Reality. In Proceedings of
the 9th IEEE International Workshop on Program Comprehen-
sion (IWPC’01). IEEE Computer Society Press, 2001.
[Mod] Modula-3. Compaq Research.
http://research.compaq.com/SRC/modula-3/html/home.html.
[MW99] K. Mehner and A. Wagner. On the Role of Method Families in
Aspect-Oriented Programming. In Workshop on Aspect-Oriented
Programming, ECOOP 1999, 1999.
[MW00] K. Mehner and A. Wagner. Visualizing the Synchronization
of Java-threads with UML. In Proc. IEEE Symposium on Vi-
sual Languages, Seattle, pages 199–206. IEEE Computer Society
Press, 2000.
[MW03] A. Marburger and B. Westfechtel. Tools for Understanding the
Behavior of Telecommunication Systems. In Proceedings of the
25th International Conference on Software Engineering ICSE03,
2003.
[Nag03] M. Nagl. Softwaretechnik mit Ada 95-Entwicklung großer Sys-
teme (2., durchgesehene Auflage). Vieweg Verlag, 2003. (in Ger-
man).
[Oec02] R. Oechsle. JAVAVIS: Automatic Program Visualization with
Object and Sequence Diagrams Using the Java Debug Interface
(JDI). In S. Diehl, editor, Software Visualization, International
Seminar, Dagstuhl Castle, Germany, May 20-25, 2001. Revised
Papers, volume 2269. LNCS, 2002.
[OL82] S. Owicki and L. Lamport. Proving Liveness Properties of Con-
current Programs. ACM Transactions on Program Languages
and Systems, 4(3):455–495, 1982.
[OW99] S. Oaks and H. Wong. Java Threads (2nd Ed.). O’Reilly, 1999.
[PJM+02] W. De Pauw, E. Jensen, N. Mitchell, G. Sevitsky, J. Vlissides,
and J. Yang. Visualizing the Execution of Java Programs. In
S. Diehl, editor, Software Visualization, International Seminar,
Dagstuhl Castle, Germany, May 20-25, 2001. Revised Papers,
volume 2269. LNCS, 2002.
[PKV94] W. De Pauw, D. Kimelman, and J. Vlissides. Modeling Object-
Oriented Program Execution. In ECOOP 94, 1994.
[PKV98] W. De Pauw, D. Kimelman, and J. Vlissides. Visualizing Object-
Oriented Software Execution. In Software Visualization: Pro-
gramming as a Multimedia Experience. MIT Press, Cambridge,
MA, 1998.
[PMR+01] W. De Pauw, N. Mitchell, M. Robillard, G. Sevitsky, and
H. Srinivasan. Drive-by Analysis of Running Programs. In Pro-
ceedings for Workshop on Software Visualization, International
Conference on Software Engineering 2001, 2001.
[Ree98] G. Reeves. Re: What Really Happened on Mars? Risks-Forum
Digest, January 1998.
[Rei98] S. Reiss. Software Visualization in the Desert Environment. In
Proc. PASTE, 1998.
[RM02] A. Roychoudhury and T. Mitra. Specifying Multithreaded Java
Semantics for Program Verification. In International Conference
on Software Engineering, 2002.
[Sch02] T. Schattkowsky. Ansätze zur objektorientierten Modellierung
nebenläufiger Systeme. Diploma Thesis, University of Paderborn,
2002. (in German).
[SHS03] D. Seifert, S. Helke, and T. Santen. Test Case Generation for
UML Statecharts (Short Paper). In Proc. Perspectives of System
Informatics (PSI 2003). Springer, 2003.
[SK93] J. Stasko and E. Kraemer. A Methodology for Building
Application-specific Visualisations of Parallel Programs. Jour-
nal of Parallel and Distributed Computing, 18(2):258–264, June
1993.
[SOT] The Software Tomograph (Sotograph).
http://www.sotograph.com/.
[SRL90] L. Sha, R. Rajkumar, and J.P. Lehoczky. Priority Inheritance
Protocols: An Approach to Real-Time Synchronization. IEEE
Transactions on Computers, page 1175, September 1990.
[Sys99] T. Systä. On the relationships between static and dynamic models
in reverse engineering Java software. In Proceedings of the 6th
Working Conference on Reverse Engineering. IEEE Computer
Society Press, 1999.
[Tan97] Andrew Tanenbaum. Operating Systems: Design And Implemen-
tation (2nd Ed.). Prentice Hall, 1997.
[Tog] Together Control Center. Borland Software Corporation.
http://www.togethersoft.com/.
[UMLa] UML Specification Version 1.5 (formal/03-03-01). Object Man-
agement Group. http://www.omg.org.
[UMLb] UML Specification Version 2.0 (ptc/03-08-02). Object Manage-
ment Group. http://www.omg.org.
[Vis00] Visual Age. IBM, 2000. http://www.ibm.com.
[vPG01] C. von Praun and T. R. Gross. Object Race Detection. In Proc.
OOPSLA 2001, 2001.
[WbS] WebSphere. IBM. http://www.ibm.com.
[Wey01] B. Weymann. Visualisierung der Synchronisation in Java-
Programmen mit der UML. Diploma Thesis, University of Pader-
born, 2001. (in German).
[WM00] P.H. Welch and J.M.R. Martin. A CSP Model for Java Mul-
tithreading. In P. Nixon and I. Ritchie, editors, International
Symposium on Software Engineering for Parallel and Distributed
Systems, pages 114–122. IEEE, IEEE Computer Society Press,
June 2000.
[XML00] Extensible Markup Language XML Version 1.0 W3C
Recommendation. World Wide Web Consortium, 2000.
http://www.w3c.org.
[ZH02] A. Zeller and R. Hildebrandt. Simplifying and Isolating
Failure-inducing Input. IEEE Transactions on Software Engineering,
28(2):183–200, February 2002.
[ZK00] A. Zeller and J. Krinke. Programmierwerkzeuge:
Versionskontrolle - Konstruktion - Testen - Fehlersuche unter
Linux/Unix [Programming Tools: Version Control, Construction,
Testing, Debugging under Linux/Unix]. dpunkt-Verlag, 2000.
(in German).
[ZS95] Q. Zhao and J. Stasko. Visualizing the Execution of Threads-
based Parallel Programs. Technical Report GIT-GVU-95-01,
Georgia Institute of Technology, Atlanta, GA, January 1995.
Index
Abstract State Machines (ASM), 97
aspect-oriented programming (AOP), 60, 140
banking example, 8, 193
bottleneck, 82
classification
Java concurrency concepts, 58
Java liveness failure potentials, 84
Java liveness failures, 84
Java thread dependencies, 85
thread lifecycle methods, 102
code instrumentation, 136, 139
code-debug-fix cycle, 13
Communicating Sequential Processes (CSP), 98
concurrency, 5
error, 7, 62
failure, 7, 62, 68
failure potential, 10, 64, 82
concurrent programming, 6, 35
conditional synchronisation, 53, 55
control flow state, 93
intermediate state, 94
cycle detection, 162
data analysis, 31, 157
data collection, 29, 129
data race, 70
data visualisation, 32, 167
deadlock detection, 159
deadlock potential detection, 160
debugger, 13, 143
debugging, 13, 141, 143
domain model, 89
error, 62
execution history, 90
execution state, 90
failure, 7, 62, 68
failure potential, 10, 64, 82
false positive, 75
fault, 62
formal semantics, 97
GThreads, 172
Guarded Commands, 98
instrumentation, 136, 139
Java, 6, 35
deprecated, 35
exception, 48
interrupt, 51
Java 1.4, 35, 152
Java 1.5, 210
join, 52
Language Specification (JLS), 35, 96
Memory Model (JMM), 41, 97
notify, 56
notifyAll, 56
priority, 45
Runnable, 42
224 INDEX
run, 42
start, 43
scheduling, 39
sleep, 45
static, 102
synchronisation, 38
synchronized, 53
Thread, 36, 42
Virtual Machine (JVM), 37
wait, 56
Java Libraries, 140
Concurrency, 60, 133, 209
JDI, 141
JPDA, 141
JVMDI, 141
JVMPI, 140
JVMTI, 211
Log4J, 142
Runtime, 142
JAVIS prototype, 129, 191
-Analysis, 162
-Tracer, 148
-Visualisation, 181
overview, 129
usage examples, 191
JEVOX example, 199
Jinsight, 173
liveness, 7, 68
failure, 68
blocking on I/O, 79
circular join, 77, 124
deadlock, 71, 122
dormancy, 81
formalisation, 122
indefinite blocking, 73
join-induced deadlock, 78, 124
livelock, 78
missed notification, 73, 122
missing notification, 75
nested monitor lockout, 75, 124
self join, 77, 124
starvation, 81
termination by exception, 79
wait-induced deadlock, 72, 122
failure potential, 82
deadlock potential, 83, 125
formalisation, 124
join-induced deadlock potential, 84, 126
unused interrupt, 84
unused notification, 83, 125
lost update, 40
mistake, 62
model for thread synchronisation, 98
monitor, 53
mutual exclusion, 52, 53
nondeterminism, 5, 39
potential (see failure potential), 10, 64, 82
preemption, 46
priority inversion, 1, 82
program comprehension, 179
race condition, 69
requirements, 25
resource dependency graph, 10
reverse engineering, 178
safety, 7, 68
semantics, 97
software visualisation, 171, 181
stall, 81
statechart for thread synchronisation, 98
Structured Operational Semantics (SOS), 97
synchronisation, 38
testing, 12
thread, 5
thread dependencies, 85
thread synchronisation, 90
model, 98
statechart, 98
Together, 178, 189
trace, 18, 31, 131, 148
trace visualisation, 171, 181
tracing, 30, 129, 145
Unified Modeling Language (UML), 32, 177
CASE tool, 178
Together, 178, 189
collaboration diagram, 177, 182
interaction diagram, 170, 177
Object Constraint Language (OCL), 182
profile, 177, 181
sequence diagram, 8, 71, 177, 182
statechart, 99
stereotype, 184
tagged value, 183
UML 2.0, 211
visualisation, 32
wait-for graph, 10
wait-queue, 56