A thesis submitted to the
Faculty of Computer Science,
Electrical Engineering and Mathematics
of the
University of Paderborn
in partial fulfillment
of the requirements for the degree of Dr. rer. nat.
submitted by
Dipl.-Inform. Timo Kerstan
Paderborn, 21 March 2011
TOWARDS FULL VIRTUALIZATION OF
EMBEDDED REAL-TIME SYSTEMS
Supervisor:
Prof. Dr. rer. nat. Franz Josef Rammig
Referees:
Prof. Dr. rer. nat. Franz Josef Rammig
Prof. Dr. rer. nat. Marco Platzner
Timo Kerstan: Towards full virtualization of embedded real-time systems
© University of Paderborn, 21 March 2011
ABSTRACT
The growing complexity of embedded hard real-time systems and the
need for high-level functionality in them lead to conflicting goals
for the design of the underlying system software. Adding the
required high-level functionality endangers the properties typically
expected of embedded real-time operating systems: being small,
robust, efficient, safe and secure. Conversely, a general purpose OS
is often unable to handle hard real-time loads because of its
strongly non-deterministic behavior. Top of the line cars already
house more than 70 interconnected embedded control units (ECUs). The
question is how to cope with this additional complexity. Will the
next generation of cars house even more ECUs, or will there be
powerful general purpose ECUs executing more than one dedicated task
to reduce the number of ECUs, as the growing number of ECUs is no
longer adequately manageable? When looking towards the use of more
powerful ECUs, the conflicting goals of general purpose OSes and
real-time OSes arise again. Is it suitable to apply the paradigm of
virtualization to realize this integration while still meeting the
requirements of both types of OSes? Which virtualization paradigm is
suitable for the domain of embedded real-time systems? How can
real-time properties be guaranteed when virtualization is used to
combine a real-time system with other real-time systems or general
purpose systems? All these questions need to be answered when
thinking of virtualization as an upcoming paradigm for designing
complex distributed embedded hard real-time systems.
SUMMARY (ZUSAMMENFASSUNG)
The increasing complexity of embedded hard real-time systems and the
demand for high-level interfaces in them lead to conflicting goals in
the development of the underlying system software. Adding further
high-level functionality endangers the typical properties of embedded
hard real-time systems. On the other hand, functionality for
executing systems under hard real-time constraints cannot be
implemented in operating systems without real-time capabilities,
since their existing implementation often exhibits non-deterministic
behavior. Today's luxury-class vehicles, however, already contain
more than 70 embedded control units (ECUs). The question therefore
arises of how to master this increasing complexity. Will future
vehicles for this reason contain even more ECUs, or will more
powerful ECUs combine the functionality of several ECUs, because the
growing number of ECUs can no longer be handled adequately? Looking
towards the use of more powerful ECUs, the conflicts between the
design goals of embedded hard real-time systems and of operating
systems with high-level functionality reappear. Considering the use
of virtualization raises interesting questions. Is virtualization
able to integrate the conflicting goals into one system while still
preserving the requirements of both systems? A further question is
which virtualization paradigm is best suited for embedded real-time
systems. All of these questions need an answer if virtualization is
to be considered as a possible solution paradigm for mastering the
design of complex distributed embedded hard real-time systems.
On the family rest art, science, human progress, and the state.
Adalbert Stifter (23.10.1805 - 28.01.1868)
To my children, my wife, my parents and my parents in law.
OWN PUBLICATIONS
[BK09] Baldin, Daniel and Timo Kerstan: Proteus, a hybrid vir-
tualization platform for embedded systems. In Rettberg,
Achim and Franz Josef Rammig (editors): Analysis,
Architectures and Modelling of Embedded Systems. IFIP
WG 10.5, Springer-Verlag, September 2009.
[GK08] Groesbrink, Stefan and Timo Kerstan: Modular paging
with dynamic tlb partitioning for embedded real-time sys-
tems. In SIES 08. Third International Symposium on In-
dustrial Embedded Systems, La Grande Motte, France,
2008.
[KBG10] Kerstan, Timo, Daniel Baldin, and Stefan Groesbrink:
Full virtualization of real-time systems by temporal
partitioning. In Petters, Stefan M. and Peter Zijlstra
(editors): Proceedings of the 6th International Workshop
on Operating Systems Platforms for Embedded Real-Time
Applications, pages 24-32. ArtistDesign Network of
Excellence on Embedded Systems Design, July 2010. In
conjunction with the 22nd Euromicro Intl Conference on
Real-Time Systems, Brussels, Belgium, July 7-9, 2010.
[KBS09] Kerstan, Timo, Daniel Baldin, and Gunnar Schomaker:
Formale Bestimmung von Systemparametern zum
transparenten Scheduling virtueller Maschinen unter
Echtzeitbedingungen. In Informatik aktuell (Tagungsband
Echtzeit 2009). Fachausschuß Echtzeitsysteme
der Gesellschaft für Informatik und der VDI/VDE-
Gesellschaft Mess- und Automatisierungstechnik
(GMA), Springer-Verlag, November 2009.
[KDMK10] Klobedanz, Kay, Bertrand Defo, Wolfgang Müller,
and Timo Kerstan: Distributed coordination of task
migration for fault-tolerant flexray networks. In Proceedings
of the fifth IEEE Symposium on Industrial Embedded
Systems (SIES2010). IEEE, July 2010.
[KO10] Kerstan, Timo and Markus Oertel: Design of a real-
time optimized emulation method. In Proceedings of
DATE 2010, Dresden, March 2010. IEEE Computer
Society, IEEE Computer Society Press.
[RDJ+09] Rammig, Franz Josef, Michael Ditze, Peter Janacik,
Tales Heimfarth, Timo Kerstan, Simon Oberthür, and
Katharina Stahl: Hardware-dependent Software: Principles
and Practice, chapter Basic Concepts of Real Time
Operating Systems, pages 15-45. Springer, January
2009.
[SBTKS09] Samara, Sufyan, Fahad Bin Tariq, Timo Kerstan, and
Katharina Stahl: Applications adaptable execution path
for operating system services on a distributed reconfig-
urable system on chip. In Proceedings of International
Conference on Embedded Software and Systems, 2009.
ICESS 09, May 2009.
[SHK+10] Schäfer, Wilhelm, Christian Henke, Lydia Kaiser,
Timo Kerstan, Matthias Tichy, Jan Rieke, and Tobias
Eckardt: Der Softwareentwurf im Entwicklungsprozess
mechatronischer Systeme. In Gausemeier, Jürgen,
Franz Josef Rammig, Wilhelm Schäfer, and Ansgar
Trächtler (editors): 7. Paderborner Workshop Entwurf
mechatronischer Systeme. Heinz Nixdorf Institut,
HNI Verlagsschriftenreihe, Paderborn, March 2010.
CONTENTS
1 Introduction
1.1 Purpose of the Thesis
1.2 Current Status
1.3 Structure of the Thesis
2 Virtualizing Embedded Real-Time Systems
2.1 Real-Time Systems
2.1.1 Task Models
2.1.2 Periodic task scheduling
2.2 The Architecture of VMs
2.2.1 System Virtual Machines
2.2.2 Processor Virtualization
2.2.3 Memory Virtualization
2.2.4 Multiple independent layers of security
2.3 Emulation
2.3.1 Interpretation
2.3.2 Binary translation
2.4 Summary
3 A Virtual Machine Monitor for Embedded Real-Time Systems
3.1 Problem Statement
3.2 Related Work
3.2.1 Academia
3.2.2 Industry
3.2.3 Summary
3.3 Design
3.3.1 Configurability
3.3.2 Architecture
3.3.3 Processor Virtualization
3.3.4 Timer Virtualization
3.3.5 Scheduler
3.3.6 Memory Virtualization
3.3.7 I/O Virtualization
3.3.8 Summary
3.4 Evaluation
3.4.1 PPC405 ISA Analysis
3.4.2 Worst Case Execution Times
3.4.3 Virtualization overhead
3.4.4 Footprint
3.4.5 Performance
3.5 Summary
4 Scheduling of full virtualized hard real-time systems
4.1 Problem Statement
4.2 Related Work
4.2.1 Academia
4.2.2 Industry
4.2.3 Classification
4.2.4 Summary
4.3 Model
4.4 Transformation of real-time systems into real-time virtual machines
4.4.1 Normalization
4.4.2 Scaling the virtual real-time system
4.4.3 Summary
4.5 Partitioning Policy
4.5.1 Activation slots
4.5.2 Period of the static resource partitions
4.5.3 Schedule
4.5.4 Summary
4.6 Evaluation
4.6.1 Algorithmic complexity
4.6.2 Distribution of the GCD
4.6.3 Experimental Setup
4.6.4 Scheduling performance
4.6.5 Context Switching
4.6.6 Case Study
4.6.7 Summary
5 Summary
A Evaluation
A.1 Measurements
A.2 Scenario I: Electrical Drive Engineering - Linear Motor Control
A.3 Scenario II: Industrial - CNC Machine
A.4 Scenario III: Medical - X-ray Machine
A.5 Scenario IV: Automotive - Airbag Control and Driver Assistance
Bibliography
LIST OF FIGURES
1 Operator Controller Module
2 Use Cases for the application of virtualization to embedded systems [Hei08b]
3 Classes of real-time systems [But04]
4 Common parameters of real-time tasks [But04]
5 Classes of real-time tasks [But04]
6 Example of timeline scheduling [But04]
7 Abstraction vs. Virtualization
8 Process and System VMs
9 Different types of System VMs [SN05b]
10 Virtual Machine Monitor Concept
11 Virtual Machine Map
12 Memory Virtualization
13 Virtualizing architected page tables
14 Virtualizing architected TLBs
15 MILS architecture
16 Overview of an interpreter [SN05b]
17 Basic and threaded interpretation
18 Binary Translation
19 Code Location Problem [SN05b]
20 Code Discovery Problem
21 Operator Controller Module
22 OCM virtualized
23 Problem of integrating given real-time systems into a virtual system hosting these real-time systems as RTVMs.
24 Modular Page Tables
25 Arinc653 architecture diagram.
26 Green Hills Secure Virtualization architecture diagram.
27 Lynuxworks Lynxsecure architecture diagram.
28 PikeOS architecture diagram.
29 XEN Architecture
30 Configuration flow of the virtualization platform.
31 The virtual machine monitor allows for the virtualized execution of any kind of guest application. Left: an unmodified application. Middle: a completely paravirtualized application. Right: a partially modified application.
32 Information and control flow of the components used inside the virtualization platform.
33 Flow Chart of the Full Virtualization components.
34 Flow Chart of the Paravirtualization components.
35 Pre-Virtualization illustrated. a) Base source code. b) Pre-Virtualized source code and Pre-Virtualization table. c) The final paravirtualized code.
36 Steps performed upon handling an IRQ [Bal09].
37 When sharing the timer device, the VMs are suffering from blackouts during their phase of inactivity [Kai08a].
38 Timer scaling in case of full virtualization
39 Timer scaling in case of full virtualization
40 Timer Virtualization Flow Chart
41 Interface of the VMM Scheduler component [Bal09].
42 Mixed Priority Paging with Dynamic TLB Partitioning
43 PowerPC 405 register set [Xil09].
44 Virtualization overhead
45 Virtualization overhead of mtevpr in detail [Bal09].
46 Hierarchy of involved schedulers
47 Problem of integrating given real-time systems into a virtual system hosting these real-time systems as RTVMs.
48 VM supply function $C(t)$ vs real time elapsed.
49 Resource Partition Model
50 ARINC653 partition scheduling
51 PikeOS scheduling mechanism. The OR operator realizes a preemptability of the time triggered partitions $\tau_i$ by the event triggered background partition $\tau_0$.
52 Model of a real-time system
53 Model of a virtual real-time system
54 Methodology to realize full virtualization of RTVMs by temporal partitioning.
55 Comparison of switching overhead of STSPPs and MTSPPs
56 Graphical illustration of the required computation time within the allocated interval $E_i S_i$ of $\Pi_i$
57 Example for the choice of an incompatible period length for static resource partition
58 Assigned computation time and utilization over the real time elapsed of schedule 1 of figure 57
59 Example for scheduling activation slots of two periodic resource partitions $\Pi_1$ and $\Pi_2$ with period $P_1 = P_2 = 8$ and $U_1 = U_2 = \frac{1}{2}$.
60 Example for placing the activation slots of a MTSPP
61 Example for placing the activation slots of a STSPP
62 Example for placing the activation slots of three STSPPs to ensure the schedulability of the whole system
63 Example of a valid partitioned schedule with the period being the GCD of all deadlines instead of $P = 8$. The activation slots have been determined according to equation 4.28.
64 Distribution of the GCD for up to 6 tasks with periods up to 10 ms
65 Possible measurement error due to signal propagating time
66 Scaling behavior of sched_init. For measurement values see tables 22, 23 and 26.
67 Scaling behavior of sched_getNextVMIndex
68 VMCS overhead for periods from 1 µs up to 10 ms on PowerPC 405 @ 300 MHz
69 Schedules using OSE and FVBTP
LIST OF TABLES
1 Actions of the VMM on triggered page faults
2 Actions of the VMM on triggered TLB misses
3 Subset of PPC405FX registers available in supervisor mode
4 $WCET_f$ and $WCET_p$ measurement. $WCET_i$ represents the WCET using IRFM (see section 3.3.3) for register access [Bal09].
5 $WCET_f$ and $WCET_p$ measurement. $WCET_i$ represents the WCET using IRFM (see section 3.3.3) for register access [Bal09].
6 $WCET_f$ and $WCET_p$ measurement for interrupt latency overhead induced by the VMM. $WCET_i$ represents the WCET using IRFM (see section 3.3.3) for register access [Bal09].
7 Application of functions $WCET_f(p)$ (3.11) and $WCET_p(p)$ (3.12) compared to the real measured execution times.
8 Binary and memory footprint of the VMM with one VM configured.
9 Performance of three different virtualization scenarios.
10 Comparison of the algorithmic complexity of the open system environment and FVBTP
11 Performance of scheduler interface routine sched_getNextTimerEvent [Grö10].
12 VMCS for Open System Environment for $n = 2$. [Grö10]
13 VMCS Performance for FVBTP. [Grö10]
14 Scenario IV: Electrical Drive Engineering - Linear Motor Control. [Grö10]
15 Electrical Drive Engineering Evaluation. [Grö10]
16 Scenario I: Industrial - CNC Machine. [Grö10]
17 Industrial - CNC Machine Evaluation. [Grö10]
18 Scenario II: Medical - X-ray Machine. [Grö10]
19 Medical Evaluation. [Grö10]
20 Scenario III: Automotive - Airbag Control and Driver Assistance. [Grö10]
21 Automotive Scenario Evaluation. [Grö10]
22 Performance of scheduler interface routine sched_init in case of FVBTP. [Grö10]
23 Performance of scheduler interface routine sched_init in case of FVBTP. [Grö10]
24 Performance of scheduler interface routine sched_init in case of OSE. [Grö10]
25 Performance of scheduler interface routine sched_getNextVMIndex in case of OSE. [Grö10]
26 Performance of scheduler interface routine sched_getNextVMIndex in case of FVBTP. [Grö10]
1
INTRODUCTION
Contents
1.1 Purpose of the Thesis
1.2 Current Status
1.3 Structure of the Thesis
Virtualization has been a key technology in the desktop and
server market for a fairly long time. Numerous products offer
hardware virtualization at the bare metal level or at the host
level. They enable system administrators to consolidate whole
server farms and end users to use different operating systems
concurrently. In the case of server consolidation, virtualization
helps to improve the utilization or load balance which facilitates
a reduction of costs and energy consumption. Distributed em-
bedded systems used in automotive and aeronautical systems
consist of multitudinous microcontrollers, each executing a ded-
icated task to guarantee isolation and to prevent a fault from
spreading over the whole network. In addition, the utilization
of a single microcontroller may be very low. Applying virtual-
ization to distributed embedded systems can help to increase
the scalability while preserving the required isolation, safety,
and reliability. It is not possible to apply the virtualization so-
lutions for server and desktop systems one-to-one to embedded
systems. The inherent timing constraints of embedded systems
preclude this. These timing constraints add temporal isolation
as a requirement for virtualized embedded real-time systems. In
particular, the classical approach to schedulability analysis [LL73]
is no longer applicable to virtualized environments.
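
As a point of reference, the classical analysis cited as [LL73] is commonly understood to be the utilization-bound test of Liu and Layland for rate-monotonic scheduling of independent periodic tasks. The following is a minimal sketch under that assumption; the function names and the task set are hypothetical and not taken from this thesis. It illustrates that the test implicitly assumes a processor that is continuously available to the task set, an assumption that no longer holds once the task set runs inside a VM that is only active during parts of the time.

```python
from typing import List, Tuple

# (worst-case execution time C, period T), both in the same time unit.
# Deadlines are assumed to be equal to the periods.
Task = Tuple[float, float]


def utilization(tasks: List[Task]) -> float:
    """Total processor utilization U = sum of C_i / T_i."""
    return sum(c / t for c, t in tasks)


def rm_utilization_bound(n: int) -> float:
    """Liu and Layland bound n * (2^(1/n) - 1) for n periodic tasks."""
    return n * (2.0 ** (1.0 / n) - 1.0)


def passes_rm_test(tasks: List[Task]) -> bool:
    """Sufficient (not necessary) schedulability test.

    Assumes a dedicated CPU that is always available to the task set.
    """
    return utilization(tasks) <= rm_utilization_bound(len(tasks))


# Hypothetical task set used only for illustration.
example = [(1.0, 4.0), (2.0, 8.0), (1.0, 10.0)]
print(utilization(example), rm_utilization_bound(len(example)), passes_rm_test(example))
```

A task set that passes this test on a dedicated processor is not thereby guaranteed to meet its deadlines inside a VM, because the test does not account for the intervals in which the VM receives no processor time.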
One of the main problems of building complex embedded systems is the
integration of software components into one large integrated system.
The automotive domain can be used as an example of a complex embedded
system. A top of the line car contains about 70 ECUs¹ with different
tasks, ranging from hard real-time tasks for actuating elements,
through soft real-time tasks for multimedia devices, to non real-time
elements like the ECU for the window lifter. These traditionally
quite unrelated tasks start to interact in such complex embedded
systems, and the problem of unintentional feature interaction becomes
an extremely important issue that must be handled safely, as the
verification of a complex integrated system is an extremely hard
task. Thus, there is a trend towards far fewer ECUs in favor of more
centralized multi-functional multipurpose hardware, fewer
communication lines and fewer dedicated sensors and actuators. This
trend aggravates the problem of unintentional interaction of software
components, as they then share the processor, memory and I/O devices.
The task of the system software is therefore to prevent unintentional
interactions that are not based on the communication between the
components, such as the domination of hardware resources like the
processor or memory, or faulty implementations allowing for buffer
overflows, heap overflows, stack overflows, race conditions and so
on, as these unintentional interactions endanger all components
running on the same hardware. This is a typical task for system
software and is normally covered by using virtual memory to isolate
the tasks from each other, but there is a new demand for high-level
functionality in such complex embedded systems. Reconsidering the
automotive domain, one can easily see that modern multimedia,
infotainment and other comfort functions require many high-level
APIs, such as sophisticated GUI libraries. Providing these is
typically not a task of embedded RTOSs². This is the point where
virtualization can show its strength: it allows the high-level tasks
to be isolated in a VM³ together with a GPOS⁴ providing all of the
high-level functionality they need, while the hard real-time tasks
are isolated in RTVMs⁵, which execute an RTOS. Thus, virtualization
helps to simplify the integration of components with different
requirements on their operating systems, as it allows multiple
operating systems to run while spatially isolating them and
preventing unintentional interactions like resource domination or
attacks resulting from faulty implementations. A big advantage of
virtualization over the use of a single operating system providing
all functionality is that VMMs are by design very small and are
easier to verify than a big operating system full of high-level
functionality.
¹ Electronic Control Unit (ECU) is a generic term for any embedded system that
controls one or more of the electrical systems or subsystems in a motor vehicle.
² Real-Time Operating Systems (RTOSs) are a class of operating systems
used in computing systems that must react within precise time constraints to
events in the environment [But04].
³ Virtual Machines (VMs) are containers that provide subsets of the underlying
hardware to the guest operating systems executed in them.
⁴ General Purpose Operating Systems (GPOSs) are the class of operating
systems typically used on desktop computers.
⁵ Real-Time Virtual Machines (RTVMs).
1.1 purpose of the thesis
The growing complexity of embedded real-time systems and their
demand for high-level functionality typically provided by GPOSs like
Linux, Windows and Mac OS X is the main motivation of this thesis.
This growing complexity of distributed embedded real-time systems
needs to be handled. In a top of the line car, more than 70 ECUs are
used to realize safety and comfort functions. Will even more ECUs be
used in the future due to the growing complexity? Prof. Dr. Manfred
Broy of the Technical University of Munich, a leading scientist in
the domain of software engineering, doubts this trend and formulated
the following hypothesis:
“The car of the future will certainly have much less ECUs in
favor of more centralized multi-functional multipurpose hard-
ware, less communication lines and less dedicated sensors and
actuators. Arriving today at more than 70 ECUs in a car, the
further development will rather go back to a small number of
ECUs by keeping only a few dedicated ECUs for highly critical
functions and combining other functions into a small number of
ECUs, which then would be rather not special purpose ECUs,
but very close to general-purpose processors. Such a radically
changed hardware would allow for quite different techniques
and methodologies in software engineering.” [Bro06]
The combined requirement of high-level API functionality for comfort
functions and hard real-time capabilities for safety functions is not
addressed by state of the art operating systems. Real-time operating
systems focus on being small and efficient and do not provide
high-level APIs, while general purpose operating systems provide such
APIs but are not able to handle hard real-time loads. As both domains
have different requirements, virtualization is a promising approach
for integrating both kinds of operating systems into a single
virtualized system. The purpose of this thesis is to provide a
virtualization environment capable of handling multiple real-time
guests together with multiple general purpose guests without making
paravirtualization necessary. The virtualization platform shall be
able to support paravirtualization, but it shall not require it, as
licensing restrictions may rule out paravirtualizing a given
operating system.
1.2 current status
Up to the point in time when the work on this thesis started, there
was no virtualization platform available that supported both
paravirtualization and full virtualization and thereby provided a
hybrid virtualization interface tailored to the application's needs.
The trend towards virtualization has raised the interest of industry,
and some products offering full virtualization for high-level guests
and paravirtualization for real-time guests have been released in the
meantime. Unfortunately, support for multiple real-time VMs has been
covered only sparsely in industry: products either restrict the
execution of real-time virtual machines to a dedicated CPU core or
restrict the number of real-time virtual machines to one, alongside
an arbitrary number of non real-time virtual machines. Academia, in
contrast, offers different hierarchical scheduling approaches that
are especially suited for paravirtualization and allow for the
presence of multiple real-time virtual machines, but approaches using
full virtualization are lacking there as well.
1.3 structure of the thesis
To provide a common understanding of the underlying real-time and
virtualization techniques, chapter 2 gives a short overview of basic
real-time and virtualization terms and techniques. Chapter 3 clearly
defines the problems addressed in detail within this thesis. These
problems are covered in chapters 3 and 4. The problem of providing a
configurable hybrid virtualization interface is covered in chapter 3,
which begins with an overview of the corresponding related work.
Afterwards, the design of such a virtualization platform is presented
and evaluated. In addition, the worst case execution times are
determined in order to quantify the deterministic overhead induced by
the virtualization. Finally, chapter 4 covers the problem of deriving
the CPU requirements and a feasible schedule for given real-time
virtual machines in order to guarantee their execution in a fully
virtualized environment, which eliminates the necessity of
paravirtualization. The related work relevant to this topic is
presented at the beginning of that chapter, before the problems are
modeled and a solution is presented. The final step of the thesis is
an evaluation based on real execution on a PowerPC hardware platform.
Within this evaluation, the approach presented in this thesis is
compared to the state of the art approach, and a concluding
assessment of the evaluation is given. To complete this thesis, a
final conclusion sums up the goals reached and discusses possible
future work.
2
VIRTUALIZING EMBEDDED REAL-TIME
SYSTEMS
Contents
2.1 Real-Time Systems
2.2 The Architecture of VMs
2.3 Emulation
2.4 Summary
Today’s computer systems are extremely complex and are de-
signed as hierarchies with well-defined interfaces that separate
levels of abstraction. This allows the independent development
of subsystems by hardware and software design teams. Low-
level implementation details are hidden by the simplifying ab-
stractions. In contrast to abstraction, virtualization does not nec-
essarily hide or simplify details. Instead, virtualization provides
different resources at the same abstraction level. A simple exam-
ple is providing two virtual network adapters while having only
one physically available.
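
The network adapter example can be made concrete with a small sketch. The class and method names below are hypothetical and chosen only for illustration; real adapter virtualization additionally has to handle receive paths, addressing and interrupts. The point shown is that each guest still sees an adapter with a `send` operation, so the resource is multiplied at the same abstraction level rather than hidden behind a simpler one.

```python
from typing import List, Tuple


class PhysicalAdapter:
    """Stands in for the single network adapter that physically exists."""

    def __init__(self) -> None:
        # Frames actually transmitted, tagged with the virtual adapter of origin.
        self.wire: List[Tuple[str, bytes]] = []

    def send(self, source: str, frame: bytes) -> None:
        self.wire.append((source, frame))


class VirtualAdapter:
    """Offers one guest an adapter of its own, backed by the shared physical one."""

    def __init__(self, name: str, physical: PhysicalAdapter) -> None:
        self.name = name
        self._physical = physical

    def send(self, frame: bytes) -> None:
        # Multiplex the frame onto the shared physical adapter.
        self._physical.send(self.name, frame)


# Two virtual adapters share one physical adapter.
nic = PhysicalAdapter()
vnic0 = VirtualAdapter("vnic0", nic)
vnic1 = VirtualAdapter("vnic1", nic)
vnic0.send(b"hello")
vnic1.send(b"world")
print(nic.wire)
```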
System virtualization has become a key technology in the en-
terprise and personal computing spaces and is recently gaining
significant interest in the domain of embedded systems [Hei08b,
Hei07]. After introducing the key characteristics of enterprise
and embedded systems, the difference in motivation for the use
of virtualization and the resulting differences in requirements
will be presented. These requirements will be used to identify
the flaws of current virtualization technologies in the context of
embedded hard real-time systems under the assumption that the
software of the embedded systems is not modified to support
virtualization as this may be prohibited by licensing restrictions.
When looking at modern data centers today, virtualization is a
hot topic. The decoupling of physical and virtual execution
platforms by System VMs enables a variety of aspects which are of
importance for enterprise data centers.
Service consolidation: Services being executed on single
machines can be integrated into a single virtualized system
using system virtualization to reach a better utilization of
the system.
Load balancing: Through the decoupling of physical and
virtual execution platform, it is possible to migrate VMs
between different virtualization hosts depending on their
load.
Heterogeneity: The use of System VMs enables the execu-
tion of different operating systems (OSs) on a single ma-
chine. This is mostly relevant for personal desktops.
Power management: This is closely related to load balanc-
ing, but with the optimization goal to minimize the power
consumption of the data center.
Spatial Isolation: The criticality of services may differ ex-
tremely, so in general it is necessary to isolate services from
the rest of the system not allowing them to compromise the
whole system when they are compromised or fail.
The main characteristic of these aspects is the fact that the VMs
execute GPOSs, providing roughly the same kind of capabilities and
similar abstraction levels. Another characteristic of those
scenarios is that VM communication is closely related to the
communication of physical machines using network interfaces
[Hei07, Hei08b, SN05a, SN05b].
The characteristics of embedded systems have changed dramatically
over the last two decades. In particular, their complexity has
increased tremendously. They have changed from relatively simple
single-purpose devices to extremely complex distributed systems
with millions of lines of code (MLOC). Top-of-the-line cars
contain approximately 70 embedded control units (ECUs) and
gigabytes of software. A popular rumor has it that downloading
the software takes longer than physically building the vehicle.
Figure 1: Operator Controller Module
When taking a look at modern smart phones like the Apple
iPhone, a new characteristic of modern embedded systems can
be noticed. Increasingly, embedded systems run applications
originally developed for GPOSs like Linux, Windows and Mac OS
X. There is also a trend that programmers without any experience
in this area develop programs for embedded system platforms.
Furthermore there is a strong trend towards openness. The own-
ers of a device want to load their own applications on the embed-
ded systems and run them there. To enable such an openness, it
is necessary to provide an open API introducing all the security
challenges known from GPOSs. However, embedded systems
are still subject to real-time and resource constraints. In addi-
tion they are often used in safety critical mechatronical systems
like planes, cars and trains leading to very high requirements on
safety, reliability and security. The OCM¹ structure (depicted in
figure 1) of the CRC 614² is an example of such a safety-critical
system, in which the demands of safety-critical systems and the
need for High-Level APIs exist side by side [But06, Hei07, Hei08b].
The OCM is divided into three layers:
1. Controller: A closed loop system that controls the sensors
and actuators of the mechatronical system. The sensing,
calculation of the control signals and the output of the con-
trol signal need to be performed in hard real-time.
2. Reflective operator: A monitoring and controlling layer ab-
ove the controller. It has no direct access to the hardware,
but modifies the controller by parameter or structure mod-
ifications. Typically, the reflective operator is time-critical
in terms of soft real-time, but it may also operate in hard
real-time.
3. Cognitive operator: The top layer of the OCM is respon-
sible for using the information of the reflective operator
as input for its cognition to perform self-optimization of
the whole mechatronical system. This is realized by using
a diversity of methods like machine learning, model based
optimization or knowledge based systems. The application
of these methods demands the use of High-Level APIs
implementing those methods. The cognitive operator is not
time critical and thus does not need to be executed under
real-time constraints. Furthermore, the programmers at this
layer do not necessarily need knowledge of programming
embedded systems.
¹ Operator Controller Module
² Collaborative Research Center for Self-Optimizing Mechatronical Systems
Figure 2: Use cases for the application of virtualization to embedded
systems [Hei08b]. (a) Heterogeneous OS environments: a GPOS with
High-Level SW and an RTOS with Access SW run on a hypervisor on top
of the hardware. (b) Security: the same software stack, shown with a
buffer overflow attack.
The new trend for GPOS properties and the demands of complex
embedded real-time systems like the self-optimizing mechatroni-
cal systems developed within the CRC 614 make virtualization a
very promising technique in this area, as virtualization provides
the following interesting aspects:
Heterogeneity: The use of System VMs enables the execution
of different operating systems (OSs) on a single embedded
system, as depicted in figure 2a, to address the conflicting
requirements of high-level APIs, real-time support
and legacy support.
Spatial isolation: When building heterogeneous OS environments
on a single embedded system it is of indispensable
importance to spatially isolate the different OSs from each
other in a manner that it is not possible for a fault or an
attack to spread to the other systems as depicted in fig-
ure 2b. Spatial isolation is typically realized by assigning
different virtual address spaces to the different VMs. Vir-
tualization fulfills this requirement and therefore increases
the security and dependability significantly.
Architectural abstraction: The decoupling of physical and
virtual execution platform enables the possibility to ab-
stract from the instruction set architecture (ISA). This re-
sults in the possibility to migrate VMs unchanged to hosts
with the same ISA and to hosts with a different ISA using
emulation. The same ISA migration covers especially the
case of distributing VMs on multiple processor systems on
chip (MPSoC) which are also an upcoming trend in embed-
ded systems.
Legacy software support: In contrast to personal desktops
and enterprise systems, embedded systems are built on a
broad diversity of different µCs³. This diversity implies
the existence of different ISAs, which makes it necessary
to port embedded system software to a different ISA when
changing the µC of an embedded system. This becomes
problematic if the source code of this software is not
available or accessible due to licensing restrictions. Here
the advantage of virtualization comes into play, as
virtualization allows the use of emulation techniques to make
this software run on an unsupported ISA.
Service consolidation: Services being executed on single
ECUs can be integrated into a single virtualized system us-
ing system virtualization to lower the complexity of large
distributed embedded systems.
These aspects are closely related to the aspects of enterprise
system virtualization. However, the inherent timing requirements
of embedded systems make the application of existing
virtualization techniques difficult, as they have been developed for
enterprise system architectures. Temporal isolation is an essential
requirement of RTVMs: other VMs must not interfere with them, so
that they can respect their own timing behavior. Specifically,
there is temporal isolation among VMs whenever the ability
of a VM to respect its own timing constraints (e.g. terminating
a computation within a specified time, a.k.a. deadline) does not
depend on the temporal behavior of other unrelated VMs running
on the same system and thus sharing a set of resources with it
(e.g. the CPU or devices such as disk and network). Understanding
the issues of temporal isolation requires a fundamental
understanding of real-time systems and their parameters.
Section 2.1 clarifies the common notations and algorithms used for
preserving the timing requirements of real-time systems.
³ microcontrollers
The next essential requirement for virtualizing embedded real-time
systems is a VM architecture that enables spatial isolation; it
is introduced in section 2.2. In particular, the architecture
of the software controlling the virtual machines is addressed in
this section. Besides the architecture of the VMs, a brief
introduction to common implementation methods for processor
virtualization and memory virtualization is given to show the issues
of implementing VMs on specific hardware.
2.1 real-time systems
In the introduction of this chapter, the term “real-time” was
already used without clearly defining it. Most people think of
real-time systems as being extremely fast and performing without
any noticeable lags. This understanding of real-time systems
does not conform to the definition of real-time systems given
by Kopetz:
Definition 2.1. A real-time computer system is a computer system in
which the correctness of the system behavior depends not only on the
logical results of the computation, but also on the physical instant at
which these results are produced.
The definition implies that the value of a logical result depends
on the time it is produced. Thus, a real-time operating system
(RTOS) needs to be able to manage tasks with timing constraints.
In addition, the RTOS may need to ensure the timing constraints
of a task in the peak load (worst-case) situation. When this prop-
erty is of indispensable importance, the real-time system needs
to be entirely predictable to be able to determine the worst-case
situation. Thus, a real-time system does not only need to be fast
to fulfill the timing requirements, it especially has to be pre-
dictable to be able to determine whether it is possible to fulfill
the timing constraints even in the worst-case situation. In a nut-
shell, a real-time system has to fulfill these properties:
Timeliness
    OS has to provide kernel mechanisms for time management
    handling tasks with explicit time constraints
Design for peak load
Predictability
Figure 3: Classes of real-time systems [But04]. The panels plot the
value of the result over time relative to the deadline for (a) soft
real-time, (b) firm real-time and (c) hard real-time.
The value of a logical result in a real-time system decreases once
the deadline is reached. Imagine a video decoder that has to
decode the video frames in time to guarantee lag-free video
display. When the video decoder exceeds the deadline, the result is
not completely useless. The decoded frame can still be displayed
if the deadline is exceeded by no more than a pre-defined time
interval, which prevents the frame from being dropped. This leads
to video jitter, but the video may still be viewed at an acceptable
quality. This application is an example of the class of soft
real-time systems, where the value of the computed result decreases
smoothly after the deadline is exceeded (see figure 3a). The second
class of real-time systems is called firm real-time systems. In a
firm real-time system, the value of the computed result is zero when
the deadline is exceeded (see figure 3b). Examples of such systems
are decision support and value prediction systems such
as stock exchange systems and weather forecast systems. The
third and final class of real-time systems is called hard real-time
systems. In hard real-time systems, the value of the computed
result is negative when the deadline is exceeded (see figure 3c),
because missing the deadline causes catastrophic damage to the
controlled system. Consider an ABS⁴ in a vehicle where the
real-time task fails to compute the force control value of the
brakes in time. Typical hard real-time activities are sensory data
acquisition, actuator servoing and low-level control of critical
system components [But04].
⁴ Anti-lock Braking System
2.1.1 Task Models
Real-time systems in general execute a set of so called real-time
tasks under the constraint of fulfilling the time constraint of each
real-time task. Tasks within a real-time system are characterized
by computational activities within stringent timing constraints
that must be met in order to achieve the desired behavior. A
typical timing constraint on a task is the deadline. If a deadline
is specified with respect to the arrival time, it is called a relative
deadline, whereas if it is specified with respect to time zero, it is
called an absolute deadline.
In general, a real-time task can be characterized by the following
parameters (see figure 4 for a graphical description):
Arrival time $a_i$: is the time at which a task becomes ready
for execution; it is also referred to as request time (or release
time) and indicated by $r_i$;
Computation time $C_i$: is the time necessary to the processor
for executing the task without interruption;
Absolute deadline $d_i$: is the time before which a task should
be completed to avoid damage (if hard) or performance
degradation (if soft);
Relative deadline $D_i$: is the difference between the absolute
deadline and the arrival time: $D_i = d_i - r_i$;
Start time $s_i$: is the time at which a task starts its execution;
Finishing time $f_i$: is the time at which the task finishes its
execution;
Response time $R_i$: is the difference between finishing time
and the arrival time: $R_i = f_i - a_i$;
Criticality: is a parameter related to the consequences of
missing the deadline (typically, it can be hard or soft);
Value $v_i$: represents the relative importance of the task
with respect to other tasks in the system;
Lateness $L_i$: $L_i = f_i - d_i$ represents the delay of a task
completion with respect to its deadline;
Tardiness or Exceeding Time $E_i$: $E_i = \max(0, L_i)$ is the time
a task stays active after its deadline;
Laxity or Slack time $X_i$: $X_i = d_i - a_i - C_i$ is the maximum
time a task can be delayed on its activation to complete
within its deadline.
Figure 4: Common parameters of real-time tasks [But04].
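The derived parameters above follow directly from the arrival time, computation time, absolute deadline and finishing time of a job. A minimal sketch, assuming a hypothetical `Job` record whose field names are chosen here for illustration and do not stem from [But04]:

```python
from dataclasses import dataclass


@dataclass
class Job:
    """One job of a real-time task (hypothetical helper, illustrative only)."""
    arrival: float      # a_i (also called release time r_i)
    computation: float  # C_i
    deadline: float     # absolute deadline d_i
    start: float        # s_i
    finish: float       # f_i

    @property
    def relative_deadline(self):
        return self.deadline - self.arrival                     # D_i = d_i - r_i

    @property
    def response_time(self):
        return self.finish - self.arrival                       # R_i = f_i - a_i

    @property
    def lateness(self):
        return self.finish - self.deadline                      # L_i = f_i - d_i

    @property
    def tardiness(self):
        return max(0, self.lateness)                            # E_i = max(0, L_i)

    @property
    def laxity(self):
        return self.deadline - self.arrival - self.computation  # X_i = d_i - a_i - C_i


# Example values are made up: the job arrives at t=2, needs 3 time units,
# must finish by t=10, starts at t=4 and finishes at t=8.
job = Job(arrival=2, computation=3, deadline=10, start=4, finish=8)
```

A negative lateness, as for this example job, means that the job finished before its deadline, so its tardiness is zero.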
In real-time systems there are different classes of real-time tasks,
namely periodic tasks (see figure 5a), aperiodic tasks (see figure 5b)
and sporadic tasks. The exact definition of these classes is now
given to clearly distinguish them from each other:
Definition 2.2. A periodic task $\tau_i$ has an infinite sequence of
identical activities, called instances or jobs, and is regularly
activated at a constant rate. The activation time of the first periodic
instance is called phase $\Phi_i$. If $\Phi_i$ is the phase of the
periodic task $\tau_i$, the activation time of the $k$-th instance is
given by $\Phi_i + (k-1) \cdot T_i$, where $T_i$ is called period of
the task [But04].
Definition 2.3. An aperiodic task $J_i$ has an infinite sequence of
identical activities, called instances or jobs, and is not regularly
activated at a constant rate [But04].
Definition 2.4. A sporadic task is an aperiodic task where consecutive
jobs are separated by a minimum interarrival time [But04].
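The definitions can be illustrated with two small helper functions. This is only a sketch; the function names and the example values are assumptions made for illustration.

```python
def activation_time(phase, period, k):
    """Activation time of the k-th instance (k = 1, 2, ...) of a periodic task."""
    if k < 1:
        raise ValueError("instances are counted from k = 1")
    return phase + (k - 1) * period


def is_sporadic(arrivals, min_interarrival):
    """Check that consecutive arrival times are at least min_interarrival apart."""
    return all(b - a >= min_interarrival
               for a, b in zip(arrivals, arrivals[1:]))


# A task with phase 5 and period 20 releases its third instance at 5 + 2 * 20.
third_release = activation_time(phase=5, period=20, k=3)
```

For example, the arrival sequence 0, 12, 30 respects a minimum interarrival time of 10, whereas 0, 4, 30 does not.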
2.1.2 Periodic task scheduling
In many real-time control applications, like the controller module
of the OCM described in the introduction of this chapter
(see 2), periodic activities represent the major computational
demand in the system. When a control application consists of $n$
concurrent periodic tasks, building the taskset
$$\Gamma = \{\tau_i(\Phi_i, T_i, C_i) \mid i = 1, \dots, n\} \qquad (2.1)$$
with $\tau_{i,j}$ being the $j$-th instance of task $\tau_i$, and
$a_{i,j}$ or $r_{i,j}$ being the release time of the $j$-th instance of
task $\tau_i$, the operating system has to guarantee that each periodic
instance is regularly activated at its proper rate and is completed
within its deadline. Therefore, a brief introduction to timeline
scheduling (TS), also known as cyclic scheduling (CS) or synchronous
time division multiple access (TDMA), to rate monotonic priority
assignment (RM) and to the earliest deadline first (EDF) scheduling
policy is now given under the assumption of an independent taskset
$\Gamma$ and tasks $\tau_i$ having relative deadlines $D_i$ equal to
their period $T_i$.
Figure 5: Classes of real-time tasks [But04]. (a) Periodic task
$\tau_i$ with period $T_i$, first instance at $\Phi_i$ and $k$-th
instance at $\Phi_i + (k-1) T_i$. (b) Aperiodic task $J_i$.
Timeline scheduling
The main idea of timeline scheduling consists in dividing the
temporal axis into slices of equal length in which one or more
tasks can be allocated offline for execution, in such a way to
respect the frequencies derived from application requirements.
Figure 6 illustrates the timeline scheduling method for a taskset
$A$, $B$ and $C$ with periods $T_A = 25$ ms, $T_B = 50$ ms and
$T_C = 100$ ms. To meet the required periods, the GCD⁵ of the periods
can be used to determine the time slice length, which is also
called minor cycle. The minimum period after which the schedule
repeats itself is called a major cycle or hyperperiod and is in general
equal to the least common multiple (LCM) of the tasks' periods.
In the example shown in figure 6, the minor cycle is 25 ms and
the major cycle is 100 ms. Task A needs to be executed every
minor cycle, while Task B needs to be executed every two minor
cycles and Task C every four minor cycles.
A possible schedule is also depicted in figure 6.
⁵ Greatest Common Divisor
Figure 6: Example of timeline scheduling [But04].
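The minor and major cycle of the example can be computed directly from the periods. The following sketch also builds one simple slot allocation that places each task in the slot in which its period starts. This allocation is an assumption made for illustration and is not necessarily the schedule shown in figure 6. It also does not check that the computation times fit into a minor cycle.

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


periods = {"A": 25, "B": 50, "C": 100}         # periods in ms, from the example

minor_cycle = reduce(gcd, periods.values())    # time slice length
major_cycle = reduce(lcm, periods.values())    # hyperperiod


def allocate_slots(periods, minor, major):
    """List, for every minor cycle, the tasks whose period starts in that slot."""
    return [[name for name, period in periods.items() if start % period == 0]
            for start in range(0, major, minor)]


slots = allocate_slots(periods, minor_cycle, major_cycle)
```

With these periods, task A appears in all four slots, task B in every second slot and task C once per major cycle.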
Timeline scheduling is a very simple approach which can be
easily implemented by programming a timer interrupt with a
period equal to the minor cycle. Another advantage is that the
tasks are not affected by jitter, because task start and response
times are not subject to large variations. Nevertheless, timeline
scheduling has further drawbacks besides being an offline
approach. Handling overload conditions is very problematic:
if a task does not terminate in time, it can either be aborted,
leaving the system in an inconsistent state, or, if the failing task
continues execution, it can cause a domino effect on the other
tasks, breaking the entire schedule.
The timeline scheduling approach is similar to the Time Division
Multiple Access multiplexing used in communications engineer-
ing.
Rate Monotonic Scheduling
The RM⁶ scheduling algorithm, or more precisely the rate monotonic
fixed priority assignment algorithm, is a simple algorithm
assigning priorities to real-time tasks proportional to their
execution rates or, equivalently, inversely proportional to their period $T$.
Thus, tasks with a higher rate (shorter period) will receive higher
priorities. As periods are not subject to change at runtime, RM
is a fixed priority scheduling algorithm where priorities are as-
signed offline to all tasks. The scheduling itself is priority-based
and intrinsically preemptive.
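Since the priorities depend only on the periods, the rate monotonic assignment reduces to sorting the tasks by period. A minimal sketch, assuming that the priority value 0 denotes the highest priority and that ties are broken by the order in which the tasks are listed (both are conventions chosen here, not taken from [LL73]):

```python
def rate_monotonic_priorities(periods):
    """Map each task name to a fixed priority; 0 is the highest priority.

    periods: dict of task name -> period. Shorter period means higher rate
    and therefore higher priority.
    """
    ordered = sorted(periods, key=lambda name: periods[name])  # stable sort
    return {name: priority for priority, name in enumerate(ordered)}


priorities = rate_monotonic_priorities({"C": 100, "A": 25, "B": 50})
```

Because the periods do not change at runtime, this assignment can be computed once offline.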
In 1973, Liu and Layland showed in [LL73] that RM is optimal
among all fixed-priority assignments, in the sense that there is
no other fixed priority algorithm that can schedule a taskset that
cannot be scheduled by RM. Furthermore, a least upper bound
on the processor utilization factor has been derived to test
whether a taskset is schedulable by RM or not.
⁶ Rate Monotonic
Definition 2.5. The processor utilization factor $U$ is the fraction of
processor time spent on executing the tasks of a given taskset $\Gamma$.
The fraction of processor time spent within a task $\tau_i$ is
$C_i / T_i$. Thus the utilization factor $U$ for $n$ tasks is given by:
$$U = \sum_{i=1}^{n} \frac{C_i}{T_i}$$
To decide whether a real-time taskset $\Gamma$ is schedulable
by an arbitrary scheduling algorithm $A$, it has to be checked
whether the processor utilization $U$ is less than or equal to the
upper bound $U_{ub}(\Gamma, A)$, which depends on the taskset and the
applied scheduling algorithm. In the case $U = U_{ub}(\Gamma, A)$ the
processor is said to be fully utilized. To eliminate the dependency
on the specific taskset when deciding whether a taskset is schedulable
by a scheduling algorithm $A$, the minimum of all upper bounds
can be used as a simplified schedulability check for a given taskset
$\Gamma$. The minimum of all upper bounds is called least upper bound
$U_{lub}$:
$$U_{lub}(A) = \min_{\Gamma} U_{ub}(\Gamma, A) \qquad (2.2)$$
In the case of RM, the least upper bound $U_{lub}(n)$⁷ of a taskset
with $n$ tasks can be calculated as:
$$U_{lub}(n) = n \cdot (2^{1/n} - 1) \qquad (2.3)$$
$$\lim_{n \to \infty} U_{lub}(n) = \ln(2) \approx 0.69 = U_{lub}(RM) \qquad (2.4)$$
Theorem 2.1. Let $\Gamma$ be an arbitrary set of periodic tasks with a
processor utilization factor of $U(\Gamma)$. Then $\Gamma$ is schedulable
by RM if
$$U(\Gamma) \le U_{lub}(RM).$$
⁷ The derivation of $U_{lub}(n)$ can be found in [LL73] or [But04].
Theorem 2.2. Let $\Gamma$ be an arbitrary set of $n$ periodic tasks with a
processor utilization factor of $U(\Gamma)$. Then $\Gamma$ is schedulable
by RM if
$$U(\Gamma) \le n \cdot (2^{1/n} - 1).$$
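The utilization-based test of Theorem 2.2 can be sketched in a few lines. The representation of a task as a pair `(C, T)` and the example taskset are assumptions made for illustration.

```python
from math import log


def utilization(tasks):
    """Processor utilization factor U for tasks given as (C_i, T_i) pairs."""
    return sum(c / t for c, t in tasks)


def liu_layland_bound(n):
    """Least upper bound U_lub(n) = n * (2^(1/n) - 1) for RM with n tasks."""
    return n * (2 ** (1 / n) - 1)


def rm_sufficient_test(tasks):
    """Sufficient (not necessary) RM schedulability test of Theorem 2.2."""
    return utilization(tasks) <= liu_layland_bound(len(tasks))


# Hypothetical taskset: three tasks, each with a utilization of 0.2.
tasks = [(5, 25), (10, 50), (20, 100)]
schedulable = rm_sufficient_test(tasks)

# For large n the bound approaches ln(2), in line with equation (2.4).
limit_gap = abs(liu_layland_bound(10**6) - log(2))
```

As the test is only sufficient, a taskset that fails it may still be schedulable by RM.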
Another test with the same complexity of $O(n)$ is the
hyperbolic bound introduced by Bini et al. in [BB01]. The
hyperbolic bound is less pessimistic than the original Liu and
Layland bound.
Theorem 2.3. Let $\Gamma$ be an arbitrary set of $n$ periodic tasks.
Then $\Gamma$ is schedulable by RM if
$$\prod_{i=1}^{n} \left( \frac{C_i}{T_i} + 1 \right) \le 2.$$
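The following sketch compares the hyperbolic bound with the Liu and Layland bound on a small taskset. The taskset is a made-up example, chosen so that the two tests disagree; tasks are again assumed to be given as `(C, T)` pairs.

```python
def liu_layland_test(tasks):
    """Sufficient RM test based on the Liu and Layland bound."""
    n = len(tasks)
    return sum(c / t for c, t in tasks) <= n * (2 ** (1 / n) - 1)


def hyperbolic_test(tasks):
    """Sufficient RM test based on the hyperbolic bound of Theorem 2.3."""
    product = 1.0
    for c, t in tasks:
        product *= c / t + 1
    return product <= 2


# Utilizations 0.5 and 0.33: the sum 0.83 exceeds the bound for n = 2
# (about 0.828), while the product 1.5 * 1.33 = 1.995 stays below 2.
tasks = [(5, 10), (33, 100)]
accepted_by_liu_layland = liu_layland_test(tasks)
accepted_by_hyperbolic = hyperbolic_test(tasks)
```

In this example the hyperbolic bound accepts a taskset that the Liu and Layland bound rejects, which illustrates why it is called less pessimistic.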
All schedulability tests presented up to now are sufficient
tests. Audsley et al. presented in [Aud91] a sufficient and
necessary schedulability test with complexity $O(n^2)$. This test is
called Response Time Analysis. As the name implies, the test determines
for every task the response time at the critical instant as the sum
of its computation time and the interference due to preemption by
higher priority tasks.
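Theorem 2.4. Let $\Gamma$ be an arbitrary set of $n$ periodic tasks,
indexed by decreasing priority. Then $\Gamma$ is schedulable by RM if
and only if
$$R_i \le D_i \quad \text{for all } i = 1, \dots, n,$$
with $R_i = C_i + I_i$, where
$$I_i = \sum_{j=1}^{i-1} \left\lceil \frac{R_i}{T_j} \right\rceil \cdot C_j.$$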
As this is a recurrent equation, no simple closed-form solution
exists. Thus, the test has to be performed by iteratively evaluating
the equation for every task of the taskset $\Gamma$. The ceiling
expression within the equation makes the response time grow in
discrete steps instead of approaching a limit asymptotically, which
ensures the termination of the iteration when checking against the
deadline of each task.
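The iteration can be sketched as a fixed-point computation that starts with the computation time of the task and stops as soon as the response time no longer changes or exceeds the deadline. The sketch assumes integer time units and tasks given as `(C, T, D)` triples sorted by decreasing priority; both the representation and the example tasksets are assumptions made for illustration.

```python
def response_time(tasks, i):
    """Worst-case response time of task i, or None if it exceeds its deadline.

    tasks: list of (C, T, D) triples in integer time units, sorted by
    decreasing priority, so tasks[:i] are the higher priority tasks.
    """
    c_i, _, d_i = tasks[i]
    r = c_i                                    # start with the computation time
    while True:
        # -(-r // t) is the ceiling of r / t in exact integer arithmetic
        interference = sum(-(-r // t_j) * c_j for c_j, t_j, _ in tasks[:i])
        r_next = c_i + interference
        if r_next > d_i:
            return None                        # deadline miss
        if r_next == r:
            return r                           # fixed point reached
        r = r_next


def rm_exact_test(tasks):
    """Response time analysis for the whole taskset."""
    return all(response_time(tasks, i) is not None for i in range(len(tasks)))


feasible = [(1, 4, 4), (2, 6, 6), (3, 12, 12)]   # U is about 0.83
infeasible = [(2, 4, 4), (3, 6, 6)]              # U = 1.0
```

The first taskset has a utilization above the Liu and Layland bound for three tasks, yet the analysis shows that all deadlines are met; this is the benefit of a test that is both sufficient and necessary.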
Earliest Deadline First Scheduling
In contrast to RM, the EDF⁸ scheduling algorithm is a dynamic
scheduling algorithm that selects the next task to execute
according to the absolute deadlines. Thus, tasks with earlier absolute
deadlines have higher priorities, and as the absolute deadline of
a periodic task depends on the current instance, the priority
of a periodic task changes during execution. Like RM, EDF is
intrinsically preemptive, as it preempts the currently running task
when a task arrives with an earlier absolute deadline. EDF is not
only able to handle periodic tasks, but also aperiodic tasks, and is
optimal in minimizing the maximum lateness on a given taskset.
To check whether a given taskset is schedulable by EDF, the
processor utilization factor introduced in definition 2.5 is
checked against the least upper bound of EDF. Fortunately, the
least upper bound of EDF is one, enabling a very simple
schedulability test and allowing tasks to utilize the processor up
to 100%.
Theorem 2.5. Let $\Gamma$ be an arbitrary set of periodic tasks, then
$\Gamma$ is schedulable by EDF if and only if
$$U(\Gamma) = \sum_{i=1}^{n} \frac{C_i}{T_i} \le 1.$$
[But04]
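The EDF test amounts to a single comparison of the utilization factor with one. A minimal sketch, assuming tasks given as `(C, T)` pairs with integer values and relative deadlines equal to the periods; exact fractions are used here so that a utilization of exactly one is not affected by rounding.

```python
from fractions import Fraction


def edf_schedulable(tasks):
    """Necessary and sufficient EDF test: total utilization must not exceed 1."""
    return sum(Fraction(c, t) for c, t in tasks) <= 1


fully_utilized = [(2, 4), (3, 6)]   # hypothetical taskset with U = 1
overloaded = [(3, 4), (2, 6)]       # hypothetical taskset with U = 13/12
```

The taskset `fully_utilized` passes the test although it uses 100% of the processor, whereas `overloaded` is rejected.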
2.2 the architecture of vms
Today’s computer systems are extremely complex and are de-
signed as hierarchies with well-defined interfaces that separate
levels of abstraction. This allows the independent development
of subsystems by hardware and software design teams. In addi-
tion low-level implementation details are hidden by the simpli-
fying abstractions. In contrast to abstraction, virtualization does
not necessarily hide or simplify details. This is depicted in figure
7. Figure 7a shows that the operating system is an abstraction of
the hardware, in this case the CPU, and provides a simplified
interface (Processes) to access the CPU, while figure 7b shows
that virtualization provides different resources, in this case two
virtual CPUs, at the same abstraction level.
⁸ Earliest Deadline First
Figure 7: Abstraction vs. Virtualization. (a) Abstraction: applications
run on an OS on top of the real CPU. (b) Virtualization: two OSs with
their applications run on the virtual CPUs VCPU1 and VCPU2 on top of
the real CPU.
Virtualization can be applied at different interface levels of a
computer system architecture. The Instruction Set Architecture
(ISA) is the interface between soft- and hardware and is divided
into user and system instructions. The system instructions are
privileged instructions that are only accessible by software run-
ning in system mode to prevent user mode software from unau-
thorized access to the hardware.
A process virtual machine is executed in general on top of the OS
and provides a uniform view to application processes indepen-
dent of the underlying OS and hardware. A good example for
this is the Java Virtual Machine (JVM). A system virtual machine
is located directly at the ISA level and creates virtual instances of
the underlying hardware which it can assign to different guest
OSs.
In case of process VMs, depicted in figure 8a, the operating system
runs in system mode and the virtualizing software runs in
user mode, while in case of System VMs the virtualizing software,
depicted in figure 8b, runs in system mode. So in case of system
virtualization, the virtualizing software is put directly at the ISA
level and is therefore in full control of the hardware, while in
case of process VMs the virtualizing software is put at the ABI⁹
level, where the OS is in full control of the hardware. The ABI
gives a program access to the hardware through the system call
interface of the OS and direct access to the user mode ISA. Thus
the difference between System VMs and process VMs is that a
process VM is a virtual platform that executes an individual
process, while a System VM provides a complete persistent system
environment that supports an operating system along with its
many user processes. In addition, it provides access to virtual
hardware resources like networking, I/O, memory and processors
[BDF+03, SN05a, SN05b].
Figure 8: Process and System VMs. (a) Process virtual machine,
(b) System virtual machine. The panels show the application process,
the virtualizing software, the OS and the hardware together with the
ABI and ISA interfaces.
Due to the demands of GPOS properties, openness, High-Level
APIs, real-time behavior and strict spatial and temporal isolation,
system virtualization is the most interesting VM architecture for
embedded real-time systems, because a system virtual machine
is able to provide a complete persistent system environment to
host multiple OSs fulfilling these demands. These properties
inspired the development of the multiple independent layers
of security (MILS) architecture [Obj08] for embedded systems
used in mission critical systems.
⁹ Application Binary Interface
Figure 9: Different types of System VMs [SN05b]. (a) Traditional
uniprocessor system, (b) native VM system, (c) user-mode hosted VM
system, (d) dual-mode hosted VM system. Each panel shows which of the
components (applications, guest OS, VMM, host OS) run in nonprivileged
and which in privileged modes on top of the hardware.
2.2.1 System Virtual Machines
The VMM¹⁰ is the core component in any System VM environment.
It is responsible for scheduling and managing the allocation
of hardware resources to various guest VMs. The VMM
controls the physical resources and makes them available to the
guest VMs by providing them as virtualized resources. The re-
sources can be shared among the guest VMs, they can be par-
titioned with a partition being exclusively accessible to a guest
VM or they can be exclusively assigned to a single guest VM.
Such resources include the CPU registers, the real memory of
the system and the various I/O devices attached to the system.
To realize virtualization in an efficient manner, the VMM at least
needs to be executed in a higher-privileged “real CPU mode”
than the code of the guest VMs. The real CPU mode denotes the
CPU mode currently determined by the hardware. Usually, the
hardware provides a system mode used by the OS and a user
mode used by the applications. The real CPU mode can differ
from the virtual CPU mode, as the CPU mode needs to be virtu-
alized for the guest OS. Thus the guest OS executes virtually in
a higher privileged CPU level than the guest applications, while
the real CPU mode in hardware may not reflect this. With this
knowledge, it is possible to distinguish between different types
of System VMs.
Definition 2.6. A virtual machine system in which the VMM operates
at a privilege level higher than the level of the guest VMs is called
native VM system (see figure 9) [SN05b].
In such a system, the VMM has to be installed first on the
system. The guest OSs are installed on top of the VMM. Thus, the
VMM always keeps full control over the hardware during the
installation process. As the definition says, the guest OS runs at a
lower privilege level than the VMM, and thus the CPU privilege
mode has to be emulated by the VMM. A well known example
of a native VM system is Xen. Sometimes it is advantageous
to run the VMM on top of a host OS for user convenience and
implementation simplicity, as such a System VM can use the
functionality provided by the host OS to implement the VMM
functionality.
¹⁰ Virtual Machine Monitor
Definition 2.7. A virtual machine system in which the VMM is
executed on top of an existing OS is called hosted VM system (see figure
9) [SN05b].
A hosted VM system can be implemented with the VMM run-
ning at a lower privilege level than the guest OS only when it
is possible to modify the host OS. This is not always possible,
as the source code may be unavailable or licensing restrictions
prohibit such modifications. In these cases, the VMM may be
implemented at the same privilege level as the guest OS.
Definition 2.8. A virtual machine system in which the VMM is
executed on top of an existing OS with a privilege level equal to the
privilege level of its guests is called user-mode hosted VM system
(see figure 9) [SN05b].
User-mode hosted System VMs suffer from low efficiency, because most
of the code has to be scanned or emulated. To overcome these
problems, an additional kind of hosted VM system has been
introduced.
Definition 2.9. A virtual machine system in which the VMM is
executed on top of an existing OS with a privilege level lower or equal
to the privilege level of its guests is called dual-mode hosted VM
system (see figure 9) [SN05b].
Those systems can be implemented using well defined interfaces
of the OS such as kernel extensions or device drivers. A well
known example of such a dual-mode hosted system is the classic
VMware for desktops.
To make the physical resources available to the guest VMs the
VMM assigns these resources to the guest VM. It is very impor-
tant for the VMM to be able to get the control of the resources
back, so that they can be assigned to a different VM when the
resource is shared between multiple VMs. Thus, the VMM must
maintain the full control over all hardware resources, even in
the case that they are temporarily assigned and used by the
guest VM currently running. This problem already occurs in
time-sharing systems, where the OS needs to get the processor
back to assign a new task to the processor. In this case, the
resource controlled by the OS is the interval timer. It is not directly
accessible to the tasks of the OS. Thus the OS can ensure its
reactivation by setting a timer value equal to the time assigned
to the task. After this time, the timer causes an interrupt guar-
anteeing that the OS gains back control of the processor. The
situation is similar in a System VM environment. In this case the
VMM controls the sharing of the resources between the differ-
ent VMs. The VMM therefore emulates the access to privileged
resources to prevent the VMs from directly accessing these re-
sources. When considering interrupts as an example, the VMM
would first handle the interrupt itself before it modifies the state
of the guest VMs to emulate the incoming interrupt.
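The way a timer lets the controlling software regain the processor can be illustrated with a toy simulation. This is only a sketch under strong simplifications: guests are modeled as Python generators, one `next()` call stands for one unit of guest execution, and the interval timer is a simple countdown. None of the names stem from a real VMM.

```python
def guest(name, steps):
    """A guest VM modeled as a generator; each yield is one unit of guest work."""
    for n in range(steps):
        yield f"{name}:{n}"


def run_vmm(guests, time_slice):
    """Round-robin toy VMM that regains control after time_slice units.

    guests: dict of name -> generator. Returns the execution trace.
    """
    trace = []
    ready = list(guests.items())
    while ready:
        name, vm = ready.pop(0)
        for _ in range(time_slice):        # simulated interval timer counts down
            try:
                trace.append(next(vm))     # guest runs for one unit
            except StopIteration:
                break                      # guest finished before the timer expired
        else:
            ready.append((name, vm))       # timer expired: VMM is back in control
    return trace


trace = run_vmm({"A": guest("A", 3), "B": guest("B", 2)}, time_slice=2)
```

In the resulting trace, guest A is preempted after two units, guest B runs, and guest A then completes its remaining unit. The guests never manipulate the timer themselves, which mirrors the requirement that the timer is not directly accessible to them.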
In the following section, the process of the control transfer from
the guest VMs to the VMM will be formally derived to ensure
virtualization with one of the main properties being to keep
the VMM in full control of all system resources. Afterwards,
the problems of virtualizing system memory will be introduced.
[SN05b]
2.2.2 Processor Virtualization
Already in the very early stage of third generation computers,
virtualization came up as a technology to realize multiple sub-
systems on a large system like a mainframe computer. In 1974,
G.J. Popek and R.P. Goldberg presented the formal definitions
of VMs and a simple condition which can be tested to determine
whether an architecture can efficiently support virtualization.
After presenting these formal requirements in the following
subsection, the different control transfer approaches for passing
control from a VM to the VMM are briefly introduced.
[Figure 10: Virtual Machine Monitor Concept (components shown: VM, VMM, Hardware)]
Formal Requirements of Virtualizability
In the following section, it will be shown that system virtualiza-
tion software needs to fulfill the three properties efficiency, re-
source control and equivalency to provide a correct and efficient
virtualization of the underlying ISA. This will be described at a
quite formal level which allows the exact classification of ISAs
into two classes. The first class contains all ISAs that fulfill the
formal requirements and are thus efficiently virtualizable, while the second
class contains the ISAs that do not fulfill these requirements. These
ISAs are said to be not efficiently virtualizable.
To realize a virtual machine on top of the real machine, the con-
cept of the virtual machine monitor (VMM), depicted in figure
10, is introduced. The three essential characteristics of a VMM
are:
1. The VMM provides an essentially identical environment
   for programs as the original machine would.
2. Programs running in this environment show only minor
   decreases in speed.
3. The VMM is in complete control of the system resources.
The first property is meant in terms of identical results when
executing an arbitrary program with or without the presence of
the VMM. It does not cover identical timing. The second property
demands that a statistically dominant subset of the instructions
of the virtual machine is executed natively. This rules out
traditional emulators and software interpreters from the virtual
machine umbrella. The third property ensures that an arbitrary
program is not able to access any resource not allocated to it.
Upon every attempt to access resources, the VMM is invoked
to control the access to the resource. Now we can define what
a virtual machine is:
Definition 2.10. A virtual machine is the environment created by the
VMM [PG74].
To understand how the process of virtualization works, it is necessary
to define a model of third generation architectures. The
ISA provides two modes of execution: the user mode $u$ and the
system mode $s$. If the machine is in system mode, the complete
set of instructions of the ISA is available, while in user mode
only a subset is available. A state of a third generation computer
has four elements: executable storage $E$, processor mode $M$, program
counter $P$, and relocation bounds register $R$. The relocation
bounds register is a tuple $(l, b)$, where $l$ defines the relocation
base and $b$ defines the size of the relocated memory area. An
instruction producing an address $a$ which is out of the relocation
bounds is said to memory trap.
$$S = \langle E, M, P, R \rangle \qquad (2.5)$$
The triplet $\langle M, P, R \rangle$ represents the PSW (see footnote 11). The memory
location $E[0]$ is used to store the PSW which was in effect before a
trap, while the location $E[1]$ is used to store the new PSW to be
in effect after the trap. The real machine can exist in any one of
a finite number of states. The set of these states is called $C_r$. The
execution of an instruction transforms the state of the machine into another.
Thus, the definition of an instruction can be formalized as:
Definition 2.11. An instruction $i$ is a function from $C_r$ to $C_r$,
$i: C_r \to C_r$. So, for example, $i(S_1) = S_2$ [PG74].
Now the definition of the action of a trap can be given by:
Definition 2.12. An instruction $i$ is said to trap if $i(E_1, M_1, P_1, R_1) =
(E_2, M_2, P_2, R_2)$, where $E_1[j] = E_2[j]$ for $0 < j < q$, $E_2[0] = (M_1, P_1, R_1)$
and $(M_2, P_2, R_2) = E_1[1]$. Here, $q$ denotes the size of the executable storage.
In addition, the trap must activate the supervisor
mode, thus $M_2 = s$, and the complete memory must be accessible by
the VMM, resulting in $R_2 = (0, q-1)$.
This is the formal description of a context switch from user mode
to system mode caused by an instruction that traps in user mode. The
definition requires the memory to be untouched upon a trap,
except for memory location 0, as the PSW that was in effect before
the trap is stored there.
11 The Program Status Word (PSW) reflects the current state of the processor.
A memory trap is a special trap caused by an instruction that
attempts to access memory outside the bounds specified by the
relocation register $R$. With this knowledge, it is now possible to
classify the instructions of an ISA to determine whether the real
machine is virtualizable.
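The state model and the trap action of Definition 2.12 can be made concrete with a small sketch. The following Python fragment is only an illustration under assumptions: the class name `State`, the helper names and the concrete values are hypothetical, and the bounds check in `memory_traps` is one plausible reading of "out of the relocation bounds".

```python
from typing import NamedTuple, Tuple

class State(NamedTuple):
    E: tuple            # executable storage; E[0] and E[1] hold PSWs
    M: str              # processor mode: "s" (system) or "u" (user)
    P: int              # program counter
    R: Tuple[int, int]  # relocation bounds register (l, b)

def trap(state: State) -> State:
    """Apply the trap action of Definition 2.12 to a state."""
    old_psw = (state.M, state.P, state.R)
    new_mode, new_pc, new_bounds = state.E[1]   # PSW in effect after the trap
    storage = (old_psw,) + state.E[1:]          # only E[0] is overwritten
    return State(storage, new_mode, new_pc, new_bounds)

def memory_traps(state: State, a: int) -> bool:
    """Assumed bounds check: address a is relative to the base l."""
    l, b = state.R
    return a < 0 or a >= b or l + a >= len(state.E)

q = 8                                   # size of the executable storage
handler_psw = ("s", 4, (0, q - 1))      # VMM entry: system mode, full memory
before = State((None, handler_psw, 0, 0, 0, 0, 0, 0), "u", 5, (2, 4))
after = trap(before)
```

In this sketch, `after` is in system mode with the bounds $(0, q-1)$, the old PSW is saved in `after.E[0]`, and all other storage cells are unchanged, as the definition requires.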
Definition 2.13. An instruction $i$ is privileged if and only if for any
pair of states $S_1 = \langle e, s, p, r \rangle$ and $S_2 = \langle e, u, p, r \rangle$ in which $i(S_1)$
and $i(S_2)$ do not memory trap: $i(S_2)$ traps and $i(S_1)$ does not [PG74].
The difference between the two states $S_1$ and $S_2$ is the processor
mode. When executing the instruction $i$ in state $S_1$, the
instruction is executed in system mode and does not trap, as the
execution permissions are not restricted in this mode. When
executing the instruction $i$ in state $S_2$, the instruction is executed
in an otherwise identical state, but in user mode. In user mode,
privileged instructions are not executable and cause the system to trap,
as the execution permissions are restricted to non-privileged
instructions. Thus, an instruction is privileged if it traps in user mode
and does not trap in system mode. Memory traps are
not considered, as their origin lies in accessing unmapped
or protected memory locations.
Two types of sensitive instructions are introduced.
The term sensitive instructions denotes the set of instructions
that are able to modify the state of the real hardware and
instructions whose behavior depends on the processor mode or on the memory
location where they are executed. First, the class of instructions
that are able to modify the state of the real hardware is introduced.
This class is called control sensitive.
Definition 2.14. An instruction $i$ is control sensitive if there exists a
state $S_1 = \langle e_1, m_1, p_1, r_1 \rangle$ with $i(S_1) = S_2 = \langle e_2, m_2, p_2, r_2 \rangle$
such that $r_1 \neq r_2$ or $m_1 \neq m_2$ [PG74].
If an instruction attempts to change the available memory by
modifying the relocation register $R$, or affects the processor mode
without going through the memory trap sequence, it is called
control sensitive. A very simple example of a control sensitive
instruction is the mtmsr instruction (see footnote 12) of the Power ISA.
This instruction directly affects the machine state register (MSR), which
is the PSW of the Power ISA. If this instruction is executed in
user mode, it causes a trap.
The second type of sensitive instructions is called behavior sensitive.
The effect of a behavior sensitive instruction depends on the
current processor mode $m$, on the memory location $p$ where the
instruction is executed, or on the relocation register $r$. Reading a
special instruction cache line is an example of a behavior sensitive
instruction in the Power Architecture. Depending on the actual
position, the returned value can either be an instruction executed
before or the current instruction itself. This kind of behavior
sensitivity is called location sensitivity. If the result of an instruction
depends on the processor mode, the instruction is called mode
sensitive.
To describe behavior sensitivity formally, the operator $\oplus$ is
defined such that the base value of the relocation register $r$ is
shifted by the value of $x$. Thus, the new
value is $r' = r \oplus x = (l + x, b)$. The notation $E|r$ describes the
contents of the part of the memory $E$ restricted by $r$. Combining
these two notations, $E|r = E'|r \oplus x$ means that the memory contents
of $E$ restricted by $r$ are equal to the memory contents of
$E'$ restricted by $r \oplus x$. Now the definition of behavior sensitive
instructions can be given:
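Definition 2.15. An instruction $i$ is behavior sensitive if there exist
an integer $x$ and states $S_1 = \langle e|r, m_1, p, r \rangle$ and
$S_2 = \langle e|r \oplus x, m_2, p, r \oplus x \rangle$, where $i(S_1) = \langle e_1|r, m_1, p_1, r \rangle$,
$i(S_2) = \langle e_2|r \oplus x, m_2, p_2, r \oplus x \rangle$ and neither $i(S_1)$ nor $i(S_2)$
memory traps, while $e_1|r \neq e_2|r \oplus x$ and/or $p_1 \neq p_2$ [PG74].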
Please note that the definition requires the observation of possible
combinations of $S_1$ and $S_2$. This especially includes the case
$m_1 \neq m_2$, which is the case required for mode sensitivity.
Depending on the states $S_1$ and $S_2$, the instruction $i$ shows
different effects on the executable storage ($e_1|r \neq e_2|r \oplus x$)
and/or on the next instruction being executed ($p_1 \neq p_2$). An example
of mode sensitivity is the POPF instruction of the Intel
IA-32 ISA. The instruction pops a word or doubleword
from the current stack into the flags register of the
CPU. The stack pointer is then incremented according to the size of the
word or doubleword data type. The full effect only occurs in system mode:
in user mode, the update of privileged flags such as the interrupt enable
flag is silently ignored instead of causing a trap. Thus, $m_1 \neq m_2$
and $e_1|r \neq e_2|r \oplus x$.
12 Move to MSR
Now it is possible to define two classes of instructions of an ISA:
Definition 2.16. An instruction $i$ is sensitive if it is either control or
behavior sensitive. If $i$ is not sensitive, then it is innocuous [PG74].
The VMM mainly consists of three components: a dispatcher,
an allocator and an interpreter. The dispatcher is the
heart of the VMM and decides which component to execute. The
allocator decides which system resources are provided to
the VMs. It ensures the spatial isolation of the VMs
and is called if a privileged instruction attempts to change the
machine resources. The interpreter is responsible for all other
instructions which trap and provides one interpreter routine per
privileged instruction to emulate the effect of the instruction
which trapped.
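The interplay of the three components can be sketched as follows. This is a toy model under assumptions, not the structure of a particular VMM: the class `ToyVMM`, the instruction names `set_relocation` and `read_mode`, and the dictionary layout of a VM are hypothetical.

```python
class ToyVMM:
    """Dispatcher, allocator and interpreter of a toy VMM (hypothetical names)."""

    RESOURCE_INSTRUCTIONS = {"set_relocation"}

    def __init__(self):
        # One interpreter routine per remaining privileged instruction.
        self.routines = {"read_mode": self._emulate_read_mode}

    def dispatch(self, vm, instruction, operand=None):
        """Entry point after a trap: decide which component handles it."""
        if instruction in self.RESOURCE_INSTRUCTIONS:
            return self.allocate(vm, operand)
        return self.routines[instruction](vm, operand)

    def allocate(self, vm, bounds):
        """Grant a new relocation register only inside the VM's own memory."""
        base, size = bounds
        if base < 0 or size < 0 or base + size > vm["memory_size"]:
            return False                 # request refused, state unchanged
        vm["R"] = (base, size)
        return True

    def _emulate_read_mode(self, vm, _operand):
        """The guest sees its virtual mode, not the real processor mode."""
        return vm["virtual_mode"]

vmm = ToyVMM()
vm = {"memory_size": 100, "R": (0, 100), "virtual_mode": "s"}
```

A trapped `set_relocation` reaches the allocator, which enforces spatial isolation, while every other trapped instruction is handed to its interpreter routine.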
There are three properties of interest when the VMM is execut-
ing an arbitrary VM:
1. Efficiency: All innocuous instructions are natively executed
without intervention of the VMM.
2. Resource Control: It is not possible for an arbitrary VM to
modify the availability of system resources. The allocator
has to be invoked upon any attempt.
3. Equivalency: Any VM executed by a VMM performs in a
manner indistinguishable from the case when the VMM
did not exist, with the exception of a different timing be-
havior.
Now it is possible to state the following Theorem for a VMM:
Theorem 2.6. For any conventional third generation computer, a virtual
machine monitor may be constructed if the set of sensitive instructions
for that computer is a subset of the set of privileged instructions
[PG74].
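The condition of Theorem 2.6 is a plain subset test once the instructions of an ISA have been classified. The sketch below assumes that such a classification is given as two sets; the two excerpts contain only the instructions named in this section and are not complete classifications of the Power ISA or of IA-32.

```python
def violating_instructions(sensitive, privileged):
    """Sensitive instructions that do not trap in user mode."""
    return set(sensitive) - set(privileged)

def satisfies_theorem_2_6(sensitive, privileged):
    """Sufficient condition of Theorem 2.6: sensitive is a subset of privileged."""
    return not violating_instructions(sensitive, privileged)

# Illustrative excerpts only (assumption): mtmsr is sensitive and traps in
# user mode, POPF is sensitive but does not trap in user mode.
power_excerpt = {"sensitive": {"mtmsr"}, "privileged": {"mtmsr"}}
ia32_excerpt = {"sensitive": {"POPF"}, "privileged": set()}
```

The instructions returned by `violating_instructions` are exactly those that need extra treatment, for example by the scan and patch approach of hybrid virtual machine systems.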
To prove this theorem, it has to be shown that a VMM fulfills
the three properties of efficiency, resource control and equivalency.
The first two properties are guaranteed by the definition
of sensitive instructions. The only thing left to show is equivalency.
This is done by defining a virtual machine map (VM map)
$f: C_r \to C_v$, where $C_r$ is the set of machine states without a
VMM being present in the memory and $C_v$ is the set of machine
states where the VMM is present in the memory.
[Figure 11: Virtual Machine Map (states $S_i$, $S_j$, their images $f(S_i)$, $f(S_j)$, and the instruction sequences $e_i$, $e_i'$)]
Definition 2.17. A virtual machine map is a one-one homomorphism with
respect to all the operations $e_i$ in the instruction sequence set $I$. That
is, for any state $S_i \in C_r$ and any instruction sequence $e_i$, there
exists an instruction sequence $e_i'$ such that $f(e_i(S_i)) = e_i'(f(S_i))$. This
corresponds to figure 11 [PG74].
It is shown in [PG74] that the execution of arbitrary instruction
sequences fulfills the equivalence requirement, considering an example
VMM and an example VM map that is a one-one homomorphism
with respect to all operations of the ISA.
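The commuting property $f(e_i(S_i)) = e_i'(f(S_i))$ can be checked on a toy example. The following sketch is purely illustrative and is not the VM map constructed in [PG74]: a state is reduced to a tuple of memory cells, the hypothetical map `f` places a VMM image in front of the guest memory, and the translated instruction simply operates on the shifted cell.

```python
VMM_IMAGE = ("vmm", "vmm")   # assumed: the VMM occupies the first two cells

def f(state):
    """Toy VM map: the guest memory is relocated behind the VMM image."""
    return VMM_IMAGE + state

def inc(j):
    """Toy instruction: increment memory cell j."""
    def e(state):
        return state[:j] + (state[j] + 1,) + state[j + 1:]
    return e

def translated(j):
    """Instruction e' on the machine with VMM: same effect, shifted cell."""
    return inc(j + len(VMM_IMAGE))

def commutes(state, j):
    """Check f(e(S)) == e'(f(S)) for the instruction inc(j)."""
    return f(inc(j)(state)) == translated(j)(f(state))

guest_state = (1, 2, 3)
all_commute = all(commutes(guest_state, j) for j in range(len(guest_state)))
```

Because `f` only prepends a fixed prefix, it is also one-one: two different guest states are never mapped to the same state of the machine with VMM.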
This section defined the formal requirements an ISA has to fulfill
to be efficiently virtualizable by the concept of a VMM. Efficient
emulation of sensitive instructions is necessary to
implement the interpreter of the VMM, which is responsible for
emulating the trapping instructions of the ISA. The key aspect of
efficient emulation is the control transfer from the VM to the
VMM. This property is captured by theorem 2.6, which demands that a
sensitive instruction traps and thereby immediately transfers execution
control to the VMM. The following sections discuss the control
transfer paradigm from the VM to the VMM for the cases
of full virtualization, paravirtualization and hybrid virtual machine
systems.
Full virtualization
Full virtualization is a virtualization technique that provides
a virtual machine environment which
is a complete emulation of the underlying hardware. The executed
VMs, referred to as guests in the following, are not aware of the
virtualization software executed on the host system. In particular,
the guests are not modified to support virtualization.
Thus, any software capable of running on the bare metal, in particular
any operating system, can be run in the virtual machine.
The key challenge of full virtualization is the interception
and emulation of sensitive instructions. The requirements
of full virtualizability have been introduced in detail in section
2.2.2. The interception of sensitive instructions is quite complex,
as they may cause a trap at any time, and the VMM has
to determine which instruction caused the trap and what
the parameters of this instruction are in order to emulate it
correctly.
Paravirtualization
Paravirtualization is a virtualization technique that presents a
software interface to VMs that is similar but not identical to that
of the underlying hardware. The intent of paravirtualization is
to reduce the overhead caused by emulating sensitive
instructions in the domain of the VMM. For this purpose, paravirtualization
provides a so-called “hypercall interface”. Hypercalls are
system calls that the virtual machine has to use and that simplify
virtualization tasks like emulating sensitive instructions. As an example,
the emulation of an access to a sensitive register like the program
status word (PSW) is more complicated in the case of full virtualization
than in the case of paravirtualization. In the case of full
virtualization, the sensitive instruction causes a trap, and
the VMM has to identify the trapping instruction including its
parameters. In the case of paravirtualization, the
hypercall already contains this information and the VMM simply
has to extract it from the hypercall. Besides the hypercall
interface, which performs instruction level modifications, paravirtualizing
a guest OS also means performing structural modifications
which require intimate knowledge of the guest kernel.
Such structural changes may, for example, modify the address
space layout to achieve fast VM communication. Despite these
advantages, paravirtualization has a major drawback: the
usage of hypercalls and/or structural changes requires the modification
of the guest OS. This may be
prohibited by licensing restrictions of the OS developer, making
a paravirtualization of that OS impossible.
Hybrid virtual machine system
Up to now, virtualization has been discussed for the case that
the ISA fulfills the condition of theorem 2.6. The question remains how
to implement system virtualization when this condition does not hold.
The Intel IA-32 instruction POPF is behavior sensitive, but does not
cause a trap in user mode and thus violates the condition of theorem 2.6.
Nevertheless, the ISA is virtualizable, but more implementation effort
is required. The naive solution would be to interpret every
single instruction, but this of course degrades the performance
dramatically. A pragmatic solution to this problem is to scan
for critical instructions in advance and patch them. The patched
instruction transfers the control to the VMM, allowing for the
emulation of the non-trapping sensitive instruction.
Definition 2.18. A virtual machine system that executes the code of the VMs
natively and in which the control transfer for some sensitive instructions
is not caused by a trap of the instruction itself is called a hybrid
virtual machine system.
Depending on the number of necessary control transfers, the ap-
proach of scanning and patching can still degrade the perfor-
mance dramatically. Thus some enhanced methods have to be
used to improve performance. It is possible to apply the scan
and patch methodology to common techniques like binary trans-
lation, which is discussed in general in section 2.3.2. [SN05b]
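The scan and patch idea can be illustrated on a symbolic instruction stream. The sketch below is a simplification under assumptions: real code is binary and has to be decoded before it can be scanned, the mnemonic `CALL_VMM` stands for whatever control transfer the VMM uses, and the set `CRITICAL` contains only the instruction named above.

```python
CRITICAL = {"POPF"}   # sensitive instructions that do not trap (excerpt)

def scan_and_patch(code):
    """Replace critical instructions by an explicit transfer to the VMM.

    Returns the patched code and a table that tells the VMM which original
    instruction it has to emulate for each patched address.
    """
    patched, patch_table = [], {}
    for address, instruction in enumerate(code):
        if instruction in CRITICAL:
            patch_table[address] = instruction
            patched.append("CALL_VMM")
        else:
            patched.append(instruction)
    return patched, patch_table

patched_code, patch_table = scan_and_patch(["ADD", "POPF", "SUB"])
```

All innocuous instructions stay untouched and still execute natively; only the patched addresses cause a control transfer, whose number determines the remaining overhead.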
2.2.3 Memory Virtualization
One of the properties of a VMM is resource control. Thus, the
VMM has to ensure spatial isolation between the different VMs
to prevent them from manipulating the memory of each other.
When considering embedded systems without an MMU (see footnote 13), the
performance of virtualization degrades severely, as every memory
access needs to trap and the VMM has to check whether the accessed
memory location is protected or not. In reality, this situation
is even worse, as in most embedded processors the memory
accessing instructions do not trap. This results in the demand for
MMU support. The concept of virtual memory can be seen as a
special case of virtualization, as it
defines a clear distinction between the logical view of memory
as seen by an application programmer and the actual hardware
memory resource as managed by the operating system. In a System
VM environment, each guest VM has its own set of virtual
memory tables. The MMU translates the virtual addresses to
“real” memory addresses. The “real” memory addresses correspond
to physical addresses in the case of non-virtualized systems.
In a virtualized environment, the mapping of virtual addresses
to “real” addresses is maintained by the guest OS, but
the physical memory is controlled by the VMM. Thus, another
mapping from “real” addresses to physical addresses has
to be performed to guarantee spatial isolation, as different VMs
may want to map a virtual address to the same “real” address.
The process of mapping virtual addresses to “real” addresses
and “real” addresses to physical addresses is depicted in figure
12.
13 Memory Management Unit
[Figure 12: Memory Virtualization. The page tables of programs 1 to 3 map virtual pages to real pages; the real map tables of VM1 and VM2, maintained by the VMM, map real pages to physical pages.]
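The two-stage mapping described above can be sketched with two lookup tables. The page numbers below are illustrative and only modeled on the style of figure 12; the function and variable names are assumptions.

```python
NOT_MAPPED = None

# Maintained by the guest OS: virtual page -> "real" page.
guest_page_tables = {
    "Program 1": {1000: 5000, 2000: 1500},
    "Program 2": {1000: NOT_MAPPED, 4000: 3000},
}

# Maintained by the VMM for one VM: "real" page -> physical page.
real_map_vm1 = {1500: 500, 3000: NOT_MAPPED, 5000: 1000}

def translate(page_table, real_map, virtual_page):
    """Translate virtual -> real -> physical, or return NOT_MAPPED."""
    real_page = page_table.get(virtual_page)
    if real_page is NOT_MAPPED:
        return NOT_MAPPED
    return real_map.get(real_page)

physical = translate(guest_page_tables["Program 1"], real_map_vm1, 1000)
```

A second VM would use its own real map table, so that two guests using the same “real” page number still end up in different physical pages.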
Nowadays, the address translation from virtual to physical addresses
is realized using either architected page tables or architected
TLBs (see footnote 14). The virtualization of these two address translation
architectures requires different approaches, which are introduced in
the following two sections. [SN05b]
[Figure 13: Virtualizing architected page tables. In addition to the guest page tables and the real map tables of figure 12, the VMM maintains one shadow page table per program that maps virtual pages to physical pages; the page table register points to the shadow page table of the active program.]
Table 1: Actions of the VMM on triggered page faults
Mapped in     Mapped in            VMM action
page table    shadow page table
Yes           Yes                  No VMM activation
Yes           No                   Handle page fault silently
No            Yes                  Should not occur
No            No                   Transfer page fault handling to guest OS
Architected Page Tables
When an architecture provides architected page tables, the OS
maintains its own page tables and the hardware is aware of these
page tables through a page table pointer register. An address
space switch is realized by changing the page table pointer
to the page table of the corresponding address space. The page
tables have to be in a format defined by the architecture. When a
virtual address needs to be translated, the MMU walks the page
table using the page table pointer and performs the translation
when the corresponding entry has been found. If no entry is
found, a page fault is triggered by the MMU, notifying the OS
that the address translation cannot be performed for the given
virtual address. In a virtualized environment, the page table
pointer register needs to be virtualized. If a guest tries to
read or write the page table pointer, a trap is triggered
to activate the VMM. The page tables maintained by the
guest OS do not contain virtual to physical mappings but virtual
to real mappings. Thus, the VMM maintains shadow page tables,
which map virtual addresses to physical addresses, as depicted in figure
13.
Because the guest OS maintains the page tables and
the VMM maintains the shadow page tables,
the VMM has to act differently depending on the state
of the mappings in the guest OS page tables and the shadow
page tables. The case of having a page mapped in the shadow
page table but not in the guest OS page table should never occur,
because every modification of the page table needs to trap to the
VMM. Otherwise, the page table and the shadow page table may
get into inconsistent states. The remaining cases are described in
table 1. [SN05b]
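The case distinction of table 1 can be written down as a small decision function. The sketch below is a simplified model under assumptions: tables are dictionaries, the function names are hypothetical, and in the "silent" case the real page is assumed to be already backed by a physical page in the real map table.

```python
def vmm_action(in_guest_table: bool, in_shadow_table: bool) -> str:
    """Return the VMM action of table 1 for a given mapping state."""
    if in_guest_table and in_shadow_table:
        return "no VMM activation"
    if in_guest_table:
        return "handle page fault silently"
    if in_shadow_table:
        return "should not occur"
    return "transfer page fault handling to guest OS"

def on_page_fault(virtual_page, guest_table, real_map, shadow_table):
    """Handle a page fault raised while the shadow page table is active."""
    action = vmm_action(guest_table.get(virtual_page) is not None,
                        shadow_table.get(virtual_page) is not None)
    if action == "handle page fault silently":
        # Fill the missing shadow entry: virtual -> real -> physical.
        shadow_table[virtual_page] = real_map[guest_table[virtual_page]]
    return action

guest_table = {1000: 5000}       # virtual -> real, owned by the guest OS
real_map = {5000: 1000}          # real -> physical, owned by the VMM
shadow_table = {}                # virtual -> physical, owned by the VMM
first = on_page_fault(1000, guest_table, real_map, shadow_table)
second = on_page_fault(2000, guest_table, real_map, shadow_table)
```

After the first fault, the shadow page table contains the entry for page 1000, so further accesses to this page proceed without VMM activation.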
Architected TLBs
In the case of architected TLBs, the TLB itself needs to be
virtualized together with the ASID (see footnote 15) register. The ASID
is a special tag field associated with every entry in the TLB.
It describes the address space the entry belongs to.
Thus, it is possible to keep translations of different virtual
address spaces simultaneously in the TLB, which avoids flushing the
TLB on an address space switch. The ASID register holds the currently
active ASID and enables the translation of all TLB entries
having the same ASID as the ASID register. Thus, an address
space switch is realized by writing to the ASID register. The advantage
of an architected TLB is that the system software is able
to define its own replacement policy for the TLB, which is not
possible in the case of architected page tables. The disadvantage
is the complexity added to the system software to implement
this policy.
14 Translation Look-Aside Buffers
15 Address Space Identifier
[Figure 14: Virtualizing architected TLBs. The virtual TLBs of VM1 and VM2 hold entries tagged with virtual ASIDs; the VMM uses the ASID map table and the real map tables of VM1 and VM2 to fill the real TLB, whose entries are tagged with real ASIDs.]
When virtualizing architected TLBs, the instructions modifying
the ASID register and the TLB entries need to trap to the VMM.
The guest OS operates on virtual TLBs, which can be reconstructed
by the VMM using the ASID map table, the real map table, and
the real TLB maintained by the VMM. The main difference to
architected page tables is the additional ASID register and ASID
tag. The virtual ASIDs need to be mapped to real ASIDs, as
different VMs may use the same virtual ASIDs independently
of each other. As already described in section 2.2.3, the VMM
needs to manage the handling of page faults depending on the
mapping state in the guest OS page tables and the shadow page
tables. This problem is identical for the virtual TLBs of the guest
OS and the real TLB. Table 2 shows the actions the VMM
performs for the different cases. As already described for architected
page tables, the case of having an entry mapped in the
real TLB but not in the virtual TLB of the guest OS should never
occur, because every modification of the virtual TLB needs to trap
to the VMM. [SN05b]
[Figure 15: MILS architecture. Partitions 1 to 3 run on top of the MILS separation kernel, which runs on the processor.]
Table 2: Actions of the VMM on triggered TLB misses
Mapped in     Mapped in    VMM action
virtual TLB   TLB
Yes           Yes          No VMM activation
Yes           No           Handle TLB miss fault silently
No            Yes          Should not occur
No            No           Transfer TLB miss handling to guest OS
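The mapping of virtual ASIDs to real ASIDs can be sketched as a table keyed by the pair of VM and virtual ASID. The following fragment is a minimal illustration under assumptions: the class and function names are hypothetical, the ASID values are made up, and the reclamation of real ASIDs when they run out is not modeled.

```python
class AsidMapTable:
    """Maps (VM, virtual ASID) pairs to real ASIDs (hypothetical sketch)."""

    def __init__(self, real_asids):
        self.free = list(real_asids)     # real ASIDs not yet handed out
        self.table = {}

    def real_asid(self, vm, virtual_asid):
        key = (vm, virtual_asid)
        if key not in self.table:
            if not self.free:
                raise RuntimeError("no free real ASID; reclamation not modeled")
            self.table[key] = self.free.pop(0)
        return self.table[key]

def tlb_lookup(real_tlb, asid_map, vm, virtual_asid, virtual_page):
    """Look up the real TLB with the real ASID of the given VM."""
    return real_tlb.get((asid_map.real_asid(vm, virtual_asid), virtual_page))

asid_map = AsidMapTable(real_asids=[4, 9])
a = asid_map.real_asid("VM1", 3)
b = asid_map.real_asid("VM2", 3)
real_tlb = {(a, 1000): 1000, (b, 1000): 3000}   # (real ASID, page) -> physical
```

Both VMs use the virtual ASID 3 here, but their entries in the real TLB stay separate because they are tagged with different real ASIDs.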
2.2.4 Multiple independent layers of security
The multiple independent layers of security architecture (MILS)
is designed to satisfy the highest security requirements. The main idea
of this approach is to structure the components participating in
a secure system in order to achieve extremely high security. This
structuring is depicted in figure 15. The components of the system,
like processes or applications, are isolated by
partitions that encapsulate these components. A single
partition consists of the executable code, the data and the
system resources used by the component. This partitioning allows
each partition to be verified separately. To
realize this partitioning, a software component called “separation
kernel” is introduced. Its tasks are to isolate the partitions
and to control the information flow between the partitions. The
information flow control mechanism is the main addition that
distinguishes the MILS architecture from general VMM architectures
which are introduced in the next section. All components not
responsible for partitioning and information flow control have
to be placed in a separate partition. Keeping the separation kernel
free from complex tasks makes a
mathematical verification of the separation kernel feasible, which is the
overall goal of the MILS architecture. As an upper bound for the
verifiability of the separation kernel, a total of about 5000
LOC is given nowadays. The verifiability of the separation kernel
is supported by the four aspects ensured by the MILS architecture:
1. Information Flow: The inter-partition communication needs
the authorization of the separation kernel.
2. Data Isolation: No private data of a partition is accessible
by any other partition.
3. Periods Processing: When a switch between partitions occurs,
no information about the existence of other partitions is
available.
4. Damage Limitation: A fault occurring in a partition has no
impact on any other partition.
[Obj08]
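The information flow and data isolation aspects listed above can be illustrated with a toy model. This is not an implementation of a real separation kernel: the class name, the policy format and the mailbox mechanism are assumptions chosen only to show that every inter-partition message has to be authorized.

```python
class ToySeparationKernel:
    """Toy model of authorized inter-partition communication."""

    def __init__(self, allowed_flows):
        self.allowed_flows = set(allowed_flows)   # pairs (source, destination)
        self.mailboxes = {}                       # private to each partition

    def send(self, source, destination, message):
        """Deliver a message only if the flow is authorized by the policy."""
        if (source, destination) not in self.allowed_flows:
            return False
        self.mailboxes.setdefault(destination, []).append(message)
        return True

    def receive(self, partition):
        """A partition can only read and empty its own mailbox."""
        return self.mailboxes.pop(partition, [])

kernel = ToySeparationKernel(allowed_flows={("P1", "P2")})
delivered = kernel.send("P1", "P2", "sensor value")
refused = kernel.send("P2", "P3", "sensor value")
```

Keeping the policy check this small is in line with the goal of a separation kernel that contains no complex tasks and therefore remains verifiable.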
2.3 Emulation
In general, the term emulation is defined as the process of imple-
menting the interface and functionality of one system or subsys-
tem on a system having a different interface and functionality.
Thus, emulation is a key aspect to implement virtualization, as
instructions have to be emulated correctly to ensure the equiv-
alency property of VMs. In case the VM uses the same ISA as
the host system, all trapping instructions have to be emulated
(see section 2.2.2). This is referred to as “same ISA virtualization”.
However, virtualization may also provide an ISA different
from the host's ISA to the VMs. This is referred to as instruction
set emulation allowing a machine implementing one instruction
set, the target instruction set, to reproduce the behavior of soft-
ware compiled to another instruction set, the source instruction
set. To distinguish between instruction set emulation and com-
plete virtual machine environments, the terms guest and host re-
fer to complete virtual machine environments, while the terms
source and target specifically address instruction set emulation.
Instruction set emulation is a key feature for consolidating old
hardware that is not available anymore, and for heterogeneous
environments, as instruction set emulation allows for the migra-
tion of VMs using an ISA different from the host ISA [SN05b].
For many virtual machine applications, it is of great importance
that the emulation of the instruction set is performed efficiently.
This of course holds for embedded real-time systems. In the fol-
lowing, a short and abstract overview of the two basic emula-
tion methods interpretation, covered in section 2.3.1, and binary
translation, covered in section 2.3.2, is given [SN05b].
2.3.1 Interpretation
An interpreter program emulates and operates on the complete
architected state of a machine implementing the source ISA, including
all architected registers and main memory, as depicted
in figure 16. To execute a program of the source ISA, the
interpreter has to manage the code and the program data, which are
kept in the source memory state. To reflect
the current state of the source machine, the interpreter holds a
source context block which contains the various components of
the source’s architected state, such as general-purpose registers,
the program counter, condition codes, and miscellaneous control
registers.
[Figure 16: Overview of an interpreter [SN05b]. The interpreter code operates on the source memory state (code, data, stack) and on the source context block (program counter, condition codes, registers Reg 0 to Reg n-1).]
Basic interpretation is implemented as a fetch and decode loop
(see figure 17a). The program counter stored in the source
context block points to the address in the source memory
state from which the next instruction is fetched by a simple
memory operation. After the instruction has been fetched, it
has to be decoded to decide how to emulate it.
For this purpose, the opcode and the parameters have to
be extracted. This can be very expensive, as the opcode needs to
be identified first to decide how the parameters have to be extracted.
All of this is based on bit masking and shifting operations.
After the instruction and its parameters have been extracted, the
fetch and decode loop dispatches to the responsible emulation
routine. After the emulation routine has finished, a branch back
to the end of the main loop is performed to unroll the call stack.
After this, the program counter is modified and a branch to the
beginning of the loop is performed, where the interpreter
is ready to fetch and decode the next instruction of the
source memory state. To sum up, the fetch and decode loop
introduces three branches, which degrade the pipeline performance
enormously [SN05b]. To address this problem, the fetch
and decode loop can be integrated at the end of every interpretation
routine, increasing the code size but reducing the overhead
to only one branch (see figure 17b) [Bel73, SN05b]. Obviously,
interpretation is a complex process and introduces a lot
of run-time overhead to the execution of a program of the source
ISA [EG01, Bel73, EW01, EGKP02, Pit87, SCEG08, CEG07, PR98,
BVZB05].
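The basic fetch and decode loop can be sketched for a made-up source ISA. Everything in the following fragment is an assumption for illustration: the 16-bit instruction format (4-bit opcode, 4-bit register number, 8-bit immediate), the three opcodes and all names. It only shows the steps named above: fetch via the program counter, decode by shifting and masking, dispatch to an emulation routine, and update of the program counter.

```python
def op_loadi(ctx, reg, imm):
    """LOADI reg, imm: load an immediate value into a register."""
    ctx["regs"][reg] = imm

def op_addi(ctx, reg, imm):
    """ADDI reg, imm: add an immediate value to a register."""
    ctx["regs"][reg] += imm

HANDLERS = {0x1: op_loadi, 0x2: op_addi}   # opcode -> emulation routine
HALT = 0x0

def interpret(source_memory, num_regs=4):
    """Run a program of the toy source ISA until HALT is reached."""
    ctx = {"pc": 0, "regs": [0] * num_regs}   # source context block
    while True:
        word = source_memory[ctx["pc"]]       # fetch
        opcode = word >> 12                   # decode: shifting and masking
        reg = (word >> 8) & 0xF
        imm = word & 0xFF
        if opcode == HALT:
            return ctx
        HANDLERS[opcode](ctx, reg, imm)       # dispatch
        ctx["pc"] += 1                        # advance the program counter

# LOADI r0, 5; ADDI r0, 3; LOADI r1, 2; HALT
program = [0x1005, 0x2003, 0x1102, 0x0000]
final_ctx = interpret(program)
```

Even in this tiny example, every source instruction costs a table lookup, a call and the loop branch in addition to the actual emulation work, which illustrates where the overhead of interpretation comes from.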
2.3.2 Binary translation
As already discussed in section 2.3.1, the run-time overhead of
interpretation techniques is considerable. A promising approach
to reduce this overhead is “binary translation”,
where the “source program” is translated into a program
of the “target ISA”, as depicted in figure 18a. Because
instruction translation is customized, state mapping
can be used to map registers of the source ISA directly to
target ISA registers (see figure 18b). This saves memory accesses to
the context block and accelerates the execution.
A distinction is made between dynamic and static binary trans-
lation. In the case of dynamic binary translation, the translation
of the program depends on the execution flow and is performed
on the fly. If a certain part in the execution flow of the source
program, called dynamic basic block, is not available as trans-
lated code, it will be compiled at runtime. Dynamic basic blocks
are determined by the actual flow of a program as it is executed.
A dynamic basic block always begins at the instruction executed
immediately after a branch or jump, follows the sequential in-
struction stream, and ends with the next branch or jump. Jump
or branch targets within this flow do not terminate the dynamic
basic block. Dynamic binary translation is therefore not well
suited to real-time embedded systems: the number of possible
dynamic basic blocks is very high, so caching all blocks is not
possible within the resource limits. Furthermore, if the blocks
cannot be completely cached, the worst-case execution time of a
dynamic basic block always has to include the time for the
translation from the source ISA to the target ISA, which destroys
the advantage of the translation.
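The interplay of dynamic basic blocks and the translation cache can be sketched as follows. The tuple-based toy instruction set, the unbounded dictionary used as cache and the function names are assumptions for illustration; "translation" is modelled by building a host closure instead of emitting target machine code.

```python
BRANCHES = ("jmp", "bnez", "halt")


def find_dynamic_block(program, start_pc):
    """Follow the sequential stream from start_pc up to the next branch."""
    block, pc = [], start_pc
    while True:
        instr = program[pc]
        block.append(instr)
        pc += 1
        if instr[0] in BRANCHES:
            return block


def translate(block, start_pc):
    """Model translation: build a host function that returns the next source PC."""
    def run(regs):
        pc = start_pc
        for op, a, b in block:
            pc += 1
            if op == "addi":
                regs[a] += b
            elif op == "jmp":
                return a
            elif op == "bnez":
                return b if regs[a] != 0 else pc
            elif op == "halt":
                return None
        return pc
    return run


def execute(program, regs):
    """Run the program block by block; return the number of translations."""
    cache, translations, pc = {}, 0, 0
    while pc is not None:
        if pc not in cache:              # miss: translate at run time
            cache[pc] = translate(find_dynamic_block(program, pc), pc)
            translations += 1
        pc = cache[pc](regs)             # hit: execute the translated block
    return translations


PROGRAM = [
    ("addi", 0, 3),    # 0: r0 += 3
    ("addi", 1, 1),    # 1: r1 += 1   (loop target)
    ("addi", 0, -1),   # 2: r0 -= 1
    ("bnez", 0, 1),    # 3: if r0 != 0 goto 1
    ("halt", 0, 0),    # 4
]
regs = [0, 0]
translations = execute(PROGRAM, regs)
```

In this run, instructions 1 to 3 are translated twice, once as part of the block entered at address 0 and once as the block entered at the branch target 1. This mirrors the statement that branch targets do not terminate a dynamic basic block, and it hints at why the number of blocks, and with it the cache size and the translation time that enters the WCET, is hard to bound.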
Figure 17: Basic and threaded interpretation.
(a) Basic interpretation: a central fetch and decode loop works
on the source memory state and the source context block and calls
the interpreter routines in four steps: 1. fetch, 2. decode,
3. dispatch, 4. set PC.
(b) Threaded interpretation: every emulation routine itself
performs the steps 1. set PC, 2. fetch, 3. decode, 4. dispatch
on the source memory state and the source context block.
Figure 18: Binary translation.
(a) Binary translation [SN05b]: the binary translator turns the
source code into binary translated target code.
(b) State mapping [SN05b]: source register block (program
counter, Reg 1 to Reg n) and source memory image, mapped to the
target registers r1, r2, r3, r4, r5, ..., r n+4.
Figure 19: Code location problem [SN05b]. The binary translator
turns the source code
    movl %eax, 4(%esp)  ; load jump address from memory
    jmp %eax            ; jump indirect through %eax
into the binary translated target code
    addi r16,r11,4      ; compute IA-32 address
    lwzx r4,r2,r16      ; get IA-32 jump address from IA-32
                        ; memory image
    mtctr r4            ; move value of r4 to count register
    bctr                ; jump indirect through count register
%eax = ctr, but source[%eax] ≠ source memory state[ctr]
Figure 20: Code discovery problem.
(a) Finding IA-32 boundaries in an instruction stream [SN05b]:
the byte sequence 31 c0 8b b5 00 00 03 08 8b bd 00 00 03 00 with
the instructions mov %ch, 0 and movl %esi, 0x08030000(%ebp).
(b) Causes of the code discovery problem [SN05b]: an instruction
stream (Instruction 1, Instruction 2, Instruction 3, jump, reg.
data, Instruction 5, Instruction 6, uncond. branch, pad,
Instruction 8) annotated with “Jump indirect to ???”, “Data in
instruction stream” and “Pad for instruction alignment”.
In the case of static binary translation, an attempt is made to
translate the whole program offline from the source ISA to the
target ISA. In this case, the execution flow cannot be taken into
account as in dynamic binary translation. Because of this, the
source code is divided into static basic blocks. In essence,
static basic blocks
begin and end at all branch or jump instructions and all branch
or jump targets. Static basic blocks are the biggest atomic instruc-
tion sequences and can be translated in advance. Now one might
think it is possible to completely translate the program, but the
problem is that the binary code does not contain any informa-
tion on where possible jump targets and instructions are located
within the binary. The first problem, determining jump targets,
arises when a relative jump instruction has to be translated. If
the contents of the registers used to perform the relative jump
are computed at run time, it is in general impossible to perform
the translation offline. The reason is that the result of the
computation is an address within the source program, which has to
be mapped to an address within the target program. This problem
is called the “code location problem” (see figure 19).
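The division into static basic blocks can be sketched for the simple case in which all branch targets are encoded directly in the instructions. The toy instruction format and the function name are assumptions for illustration. The sketch deliberately has no case for a jump through a register: for such an instruction no target can be added to the set of block starts, which is exactly where the code location problem prevents a complete offline translation.

```python
BRANCH_OPS = {"jmp", "bnez", "halt"}


def static_basic_blocks(program):
    """Return (start, end) index pairs, end exclusive, of all static basic blocks."""
    leaders = {0}                            # the program entry starts a block
    for pc, (op, a, b) in enumerate(program):
        if op in BRANCH_OPS:
            if pc + 1 < len(program):
                leaders.add(pc + 1)          # the instruction after a branch
            if op == "jmp":
                leaders.add(a)               # direct jump target
            elif op == "bnez":
                leaders.add(b)               # direct branch target
    ordered = sorted(leaders)
    return list(zip(ordered, ordered[1:] + [len(program)]))


PROGRAM = [
    ("addi", 0, 3),    # 0
    ("addi", 1, 1),    # 1: branch target
    ("addi", 0, -1),   # 2
    ("bnez", 0, 1),    # 3: branch to 1
    ("halt", 0, 0),    # 4
]
blocks = static_basic_blocks(PROGRAM)
```

Unlike a dynamic basic block, a static basic block is also cut at every branch target, so the instruction at index 1 starts a block of its own here.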
The second problem is to identify instructions within the binary,
as compilers may intermix code and data, for example due to
padding for alignment reasons or to provide a mask that specifies
which registers have been saved by a procedure caller at the time
of the call. Whatever the reason for interspersing data in code
sections is, it poses difficulties in statically identifying the
starting points of all instruction sequences in a given region of
memory. It becomes more problematic if the computer architecture
is a CISC¹⁶ architecture, where the length of an instruction
varies, in contrast to RISC¹⁷ architectures, where the length of
an instruction is fixed (see figure 20a). This problem of
identifying instructions within the binary is referred to as the
“code discovery problem”. Horspool and Marovac proved in [HM80]
that the code location and the code discovery problem are not
solvable in general and that, where they are solvable, they are
NP-hard problems.
As stated at the beginning of this section, emulation is a key
aspect of implementing virtualization, as instructions have to be
emulated correctly to ensure the equivalence property of VMs.
Besides same-ISA emulation, emulation offers the possibility to
execute programs written for a source ISA on a different target
ISA. Interpretation is the straightforward approach to emulate a
source program on a target ISA, but it suffers from poor
performance, primarily caused by the overhead of the fetch,
decode and dispatch steps. Nevertheless, interpretation has the
big advantage of being robust against the code location and the
code discovery problem because, in contrast to static binary
translation, it has knowledge of the execution flow. Binary
translation comes in two flavors, namely dynamic and static
binary translation. Dynamic binary translation creates blockwise
translated binary code at runtime depending on the execution flow
and is, like interpretation, robust against the code location and
code discovery problem, as interpretation can be integrated as a
fallback mechanism. The problem of dynamic binary translation is
the huge overhead of caching dynamic basic blocks, which cannot
be carried out completely, so that the translation time has to be
considered in the WCET. Static binary translation tries to
translate the whole source program into a program of the target
ISA, but suffers from the code location and code discovery
problem, which rules out translating the whole program in the
general case [Hei08a,SCK+93,CM96,SBR05,GL94].
16 Complex Instruction Set Computer
17 Reduced Instruction Set Computer
2.4 summary
At the beginning of this chapter, the properties of real-time
systems were introduced together with the common terms used in
this area. These terms will be used throughout this document to
be in line with the common understanding of real-time systems.
To establish the common terms of virtualization, the different
approaches of process VMs and system VMs were introduced. As this
thesis aims at providing a system virtual machine approach for
embedded hard real-time systems based on full virtualization, an
in-depth look at the basics of system VMs was taken. In
particular, the formal requirements were highlighted, as these
formulate the properties a VMM has to fulfill to realize full
virtualization of third generation computer architectures in an
efficient and secure manner. To realize a VMM, a memory
protection mechanism is necessary to isolate the VMM and the VMs
from each other. Today's systems mostly provide either an
architected TLB or a software-managed TLB. For both
architectures, the basic approaches to successfully virtualize a
system were presented. Finally, an overview of emulation
techniques, which are required by the VMM, was given. With this
knowledge in mind, it is now possible to address the problem of
designing a system virtual machine for embedded hard real-time
systems based on full virtualization, which is the topic of the
following chapter.
3
A VIRTUAL MACHINE MONITOR FOR
EMBEDDED REAL-TIME SYSTEMS
Contents
3.1 Problem Statement
3.2 Related Work
3.3 Design
3.4 Evaluation
3.5 Summary
The growing complexity of embedded real-time systems and
their demand for high-level functionality typically provided by
GPOSs like Linux, Windows and Mac OS X is the main motivation
of this chapter. The OCM hierarchy already briefly introduced in
chapter 2 is a perfect example of this demand of modern embedded
systems.
The controller module of the OCM depicted in figure 21 operates
on a closed loop called motor loop which needs to be controlled
in hard real-time, as missing a deadline may destroy the
controlled motor. Thus, the controller module needs an RTOS as
execution platform. The reflective operator module controls and
monitors the controller module. The different configurations for
the controller module encapsulate different control algorithms,
which may differ in quality of control, energy consumption,
fail-safe behavior and so on. These configurations can be
selected by the reflective operator and assigned to the
controller module. This process can be performed either in hard
real-time or in soft real-time depending on the current situation.
The cognitive operator is responsible for using the data
provided by the reflective operator for self-optimization
processes like machine learning approaches, model-based
optimization or knowledge-based systems. Depending on the result
of the self-optimization process, a configuration is proposed to
the reflective operator. When the monitored system is in a state
allowing for a transition into this proposed configuration, the
transition is triggered by the reflective operator. Nevertheless,
the reflective operator is always able to force a switch into a
different configuration when needed. This information is then
forwarded to the cognitive operator, again triggering the
self-optimization process.
Figure 21: Operator Controller Module
The OCM is very complex and developed by different research
teams providing software components for the controller module,
the reflective operator and the cognitive operator.
One of the main problems of building such complex systems is
the integration of software components into a big integrated
system, as described by Broy in [Bro06]. He uses the automotive
domain as an example of a mechatronic system. A top-of-the-line
car contains about 70 ECUs¹ with different tasks, ranging from
hard real-time tasks for actuating elements through soft
real-time tasks for multimedia devices to non real-time elements
like the ECU for the window lifter. Regarding the integration of
such a system, Broy says:
“Traditionally quite unrelated and independent functions (such
as braking, steering, or controlling the engine) that were freely
controlled by the driver get related and start to interact. The car
turns from an assembled device into an integrated system. Phe-
nomena like unintentional feature interaction become issues.”
[Bro06]
This unintentional feature interaction becomes an extremely
important issue that has to be handled safely, as the
verification of a complex integrated system is an extremely hard
task, which is also identified by Broy:
“Since today, by their design, architecture and the interaction be-
tween the sub-systems are not precisely specified, and since the
suppliers realize the sub-systems in a distributed process, it is
not surprising that integration is a major challenge. First of all
a virtual integration and architecture verification is not possible
today, due to the lack of precise specifications. Second, in turn
the sub-systems delivered by the suppliers do not fit together
properly and thus the integration fails. Third when trying to
carry out the error correction due to the missing guidelines of
architecture, there is no guiding blue print to make the design
consistent.” [Bro06]
1 Embedded Control Units
In a distributed system where every task is placed on its own
ECU, unintentional interaction and hardware failure are
essentially the only issues endangering a component’s
functionality. But this kind of feature distribution is currently
up for discussion, as the trend is to have fewer dedicated ECUs
in favor of more centralized multi-functional hardware, which is
also stated by Broy:
“The car of the future will certainly have much less ECUs in
favor of more centralized multi-functional multipurpose hard-
ware, less communication lines and less dedicated sensors and
actuators. Arriving today at more than 70 ECUs in a car, the
further development will rather go back to a small number of
ECUs by keeping only a few dedicated ECUs for highly critical
functions and combining other functions into a small number of
ECUs, which then would be rather not special purpose ECUs,
but very close to general-purpose processors. Such a radically
changed hardware would allow for quite different techniques
and methodologies in software engineering.” [Bro06]
Figure 22: OCM virtualized. A VMM on the Raptor 2000 hosts two
RTVMs, each running an RTOS (motor loop and reflective loop), and
one VM running a GPOS (cognitive loop).
The trend towards more centralized multi-functional hardware
aggravates the problem of unintentional interaction of software
components, as they share the processor, memory and I/O devices
in this case. Thus, the task of the system software is to prevent
unintentional interactions that are not based on the
communication between the components, such as the domination of
hardware resources (for example the processor or memory) or
faulty implementations allowing buffer overflows, heap overflows,
stack overflows, race conditions and so on, as these
unintentional interactions endanger all components running on the
same hardware. This is a typical task for system software and is
normally covered by using virtual memory to isolate the tasks
from each other, but there is a new demand for high-level
functionality in such complex embedded systems. Reconsidering the
OCM, one can easily see that the cognitive operator requires a
lot of high-level APIs to perform the optimization tasks using
machine learning approaches, model-based optimization or
knowledge-based systems. Providing these is typically not a task
of embedded RTOSs. This is the point where virtualization can
show its strength: in the example of the OCM, virtualization
allows isolating the cognitive operator in a virtual machine
together with a GPOS providing all of the high-level
functionality needed by the cognitive operator, while isolating
the reflective operator and the controller module in RTVMs
running an RTOS (see figure 22). Thus, virtualization helps to
simplify the integration of components with different
requirements on their operating systems, as it allows running
multiple operating systems while spatially isolating them and
preventing unintentional interactions like resource domination or
attacks resulting from faulty implementations. A big advantage of
virtualization over the use of a single operating system
providing all functionality is that VMMs are by design very small
and are easier to verify than a big operating system full of
high-level functionality.
3.1 problem statement
Figure 23: Problem of integrating given real-time systems into a
virtual system hosting these real-time systems as RTVMs. Given:
the real-time systems RS1, RS2 and RS3, each with a local
scheduler and its processes on its own CPU (CPU 1, CPU 2, CPU 3),
and a GPOS with its own local scheduler. Constraints: full
virtualization, guarantee of local real-time constraints.
Derivation of: CPU requirements, root-level schedule. Integration:
a VMM with a root scheduler hosts RTVM1, RTVM2, RTVM3 and the
GPOS on one CPU.
The overall goal of this thesis is the integration of software
components into a big integrated system as depicted in figure 23.
Thus, a set of real-time systems is given, which may have been
executed on hardware different from the integrated host system.
To make these real-time systems executable on the integrated
system, every real-time system is encapsulated as an RTVM². These
RTVMs
are then executed on a specially designed VMM, which ensures
temporal and spatial isolation to prevent the described
unintentional interactions. There already exist a few commercial
virtualization platforms or VMMs for a range of embedded
processors, nearly all of them proprietary systems. All of the
available products use only paravirtualization in an attempt to
provide reasonable performance, and support real-time
applications only by the use of dedicated resources. Naturally,
this limits the applicability of virtualization using these
products to a subset of all possible scenarios, as the
paravirtualization interfaces are in general not standardized. In
particular, applications that cannot be paravirtualized because
their source code is not available cannot be virtualized using
the currently available virtualization products, since this would
require a binary analysis of the whole application, which most
often is not completely possible. Thus, within this thesis, a VMM
is designed to overcome this problem by introducing a
configurable hybrid VMM architecture, designed and implemented
for the PowerPC405 processor, which allows for the virtualization
of unmodified applications as well as paravirtualized
applications or even a combination of both. Put more formally,
the ABI is kept configurable. The paravirtualization effort is
thus decreased, as only the required hypercalls need to be
implemented in the guest OS. The support for paravirtualization
is motivated by the integration of open source GPOSs for
high-level API support, like Linux, which already provide
paravirtualization interfaces. Support for real-time applications
was the next major goal of the design, which allows the
integration of any kind of scheduling mechanism for VMs while
being completely deterministic. Furthermore, the high
configurability allows the system to be optimized explicitly for
the intended field of use. This affirmed the research in this
direction, with hybrid virtualization being a relevant topic in
industrial embedded systems. Finally, it is desirable to know in
advance how the WCETs of the executed guests are affected, as
these WCETs are necessary to determine the CPU requirements of
the virtual real-time system.
2 Real-Time Virtual Machine
When supporting multiple VMs on a virtualization platform, the
VMM needs to implement a scheduling algorithm according to which
it switches between the VMs. From the point of view of the VMM,
the executed VMs are just processes, but as they each execute an
OS, they schedule a set of tasks by themselves (see figure 23).
In the case of full virtualization, these VMs appear as black
boxes, while in the case of paravirtualization, the VMs can
communicate with the VMM scheduler. The goal of a VMM is to give
each VM the illusion of having the resources of a complete system
at its disposal, while these resources are only subsets of those
of a physical machine. Thus, the VMM needs to implement a
partitioning policy which ensures the correct timing behavior of
all executed VMs. For RTVMs, this timing behavior includes hard
deadlines the VMM has to cope with. Existing approaches are
either restricted to the application of paravirtualization when
supporting hard real-time, which is not possible in general due
to the already mentioned licensing restrictions, or they are not
able to derive the root-level schedule automatically from a given
set of real-time systems. Thus, the goal of this thesis is to
automatically derive an integrated virtual system hosting the
given real-time systems while guaranteeing the real-time
constraints and avoiding the application of paravirtualization,
as this would require the availability of the guest OS source
code.
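The two scheduling levels can be illustrated with a small simulation. It is only a sketch under strong assumptions: the root scheduler follows a fixed, hand-written cyclic table of time slices, and every VM uses a simple fixed-priority local scheduler. The class and function names, the slice lengths and the task sets are hypothetical; how the root-level schedule is actually derived is a goal of this thesis and is not shown here.

```python
class VM:
    """A guest whose local scheduler is invisible to the root scheduler."""

    def __init__(self, name, tasks):
        self.name = name
        # Local task list: (priority, task name, remaining ticks);
        # a lower priority value means more urgent.
        self.tasks = sorted(tasks)

    def run_tick(self):
        """Local scheduler: run the most urgent unfinished task for one tick."""
        for i, (prio, task, remaining) in enumerate(self.tasks):
            if remaining > 0:
                self.tasks[i] = (prio, task, remaining - 1)
                return task
        return "idle"


def root_schedule(vms, slots, ticks):
    """Root scheduler: cycle through a table of (VM name, slice length)."""
    by_name = {vm.name: vm for vm in vms}
    timeline = []
    while len(timeline) < ticks:
        for name, length in slots:
            for _ in range(length):
                if len(timeline) == ticks:
                    return timeline
                # The root level only picks the VM; the VM picks the task.
                timeline.append((name, by_name[name].run_tick()))
    return timeline


vms = [
    VM("RTVM1", [(1, "P1", 2), (2, "P2", 1)]),
    VM("GPOS", [(1, "P4", 3)]),
]
timeline = root_schedule(vms, slots=[("RTVM1", 2), ("GPOS", 1)], ticks=6)
```

The root scheduler never inspects the task lists, which corresponds to the black-box view under full virtualization. Whether the local deadlines are met therefore depends entirely on the slice lengths in the table, which is why the CPU requirements of each RTVM have to be known before a root-level schedule can be fixed.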
To sum up, the goals of this thesis are described in a
hierarchical manner:
1. Integration of given real-time and general purpose systems
   into an integrated virtual real-time system
   a) Configurable hybrid VMM
      i. Configurable ABI (full virtualization and
         paravirtualization)
      ii. Extensible scheduler interface
      iii. WCET determination of virtualized guests
   b) Hierarchical scheduling of RTVMs
      i. Derivation of CPU requirements
      ii. Derivation of root-level schedule
All these goals share the constraints of full virtualization sup-
port, local real-time constraints and low jitter.
3.2 related work
This section describes the related work which is relevant for the
goal of providing a configurable hybrid VMM (goal 1a of the
problem statement in section 3.1). The goal of hierarchical
scheduling of RTVMs is addressed in the next chapter, as it is
thematically entirely different from the design of a VMM and is
therefore discussed separately.
3.2.1 Academia
The related work from academia mainly addresses technical
details like memory and I/O virtualization, on which a lot of
papers can be found. Unfortunately, only a few papers on
paravirtualization extensions exist, and they do not address the
defined goal of providing a configurable ABI. They are
nevertheless worth mentioning, as especially the
previrtualization approach is interesting for minimizing the
paravirtualization effort when unlinked assembly code is
available as a starting point. Finally, an approach is introduced
to implement virtual memory suitable for mixed criticality
systems like the proposed configurable hybrid VMM, which is able
to host GPOSs besides RTVMs.
GandalfVMM
The GandalfVMM was developed at the University of Tsukuba by
Oikawa et al. [OIN06] and focuses on reducing the effort of
paravirtualization. The authors claim that their approach, called
mesovirtualization [IO07], reduces the effort of paravirtualizing
a guest OS by two orders of magnitude. One significant
characteristic of mesovirtualization is how the VMM handles
sensitive instructions used in guest OSs. They are emulated by
the VMM much as in full virtualization, but GandalfVMM emulates
only the essential ones. In some cases, sensitive instructions
that are not emulated by the VMM produce unexpected results for a
guest OS. Rather than changing every sensitive instruction to
trap to the VMM and handling it there at great expense,
mesovirtualization slightly modifies the guest OS source code so
that no trap into the VMM occurs. Parts of the host machine that
are considered safe to be dedicated or shared are not virtualized
by the VMM, and guest OSs are allowed to access them directly.
These characteristics lead to a lightweight VMM, because there is
no need to virtualize the full functionality of the host machine.
They also reduce the processor time used by the VMM, which makes
it possible to provide higher performance to guest OSs.
Previrtualization
Due to the intrusive nature of paravirtualization, paravirtualized
guest OSs are mostly restricted to one specific VMM. The reason
for this restriction is the instruction-level and structural
modifications that build a very specialized interface to the VMM.
The idea of pre-virtualization is to modularize this specialized
interface and decouple it from the guest OS, so that the guest OS
stays flexible and can still be executed on raw hardware and on
VMMs that lack support for pre-virtualization. To this end,
sensitive instructions are padded with a sequence of no-op
instructions and annotated so that the VMM can determine their
location at runtime. The VMM is able to replace these no-op
instructions with higher-performance alternatives that delegate
the instructions to a so-called “in-place VMM”. The in-place VMM
provides a virtual model of the CPU and the devices and is able
to defer and batch hypercalls depending on the state of the
virtual CPU and device models. The required scratch space has to
be provided by padding the unlinked assembly code of the guest
OS, because at this stage padding is straightforward, as
addresses are still available as symbols [LUC+06].
Memory Virtualization
The design of the memory virtualization approach depends on
the hardware support of the target processor architecture, as
already presented in section 2.2.3. When using a processor
architecture with architected page tables, the VMM manages the
real map tables of every VM and adjusts the PTR³ to the real map
table of the currently active VM. The VMM has no control over the
replacement strategy applied to the TLB and thus cannot organize
the TLB to influence the TLB hit or miss rate for real-time or
non real-time VMs. There are already several
approaches to increase the TLB hit rate and to reduce the up-
per bound for memory access time. To some extent, the results
are remarkable, but they do not solve the main problem. Virtual
memory and real-time constraints cannot be reconciled through
a simple increase of the memory management performance. A
non-deterministic, high-performance TLB miss handling is first
of all still non-deterministic. In this thesis, an extension to the
multiple page table design introduced by Bennett and Audsley
[BA01] is proposed. This extension covers real-time and non real-
time requirements in combination with a dynamic partitioning
of the TLB to improve the performance of soft and non real-
time tasks at the cost of hard real-time tasks as they do not have
any benefit from being faster than required by their WCET. The
main goal is to reduce the negative effect of context switches on
the TLB miss rate of specific soft and non real-time tasks, with-
out losing the ability to handle hard real-time tasks within their
WCET.
Arithmetic-Based Address Translation [ZP05] bypasses the TLB
by replacing most of the virtual address translations with fast
arithmetic add operations. It assumes that the physical
allocation of virtual pages conforms to certain rules. Sequences
of virtual page numbers are identified and mapped to sequences of
consecutive physical page frames. The result is a
virtual-to-physical address translation in constant time for
these virtual addresses, but a default TLB is still used for the
rest.
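The basic idea of translating by addition can be sketched as follows. This is not the mechanism of [ZP05] itself; the page size, the segment list, the fallback dictionary standing in for the default TLB and page table, and the function name are all assumptions chosen for illustration.

```python
PAGE_SIZE = 4096

# Hypothetical contiguous runs:
# (first virtual page, number of pages, first physical frame).
SEGMENTS = [(0x10, 4, 0x80)]

# Stand-in for the default TLB and page table used for all other pages.
PAGE_TABLE = {0x30: 0x55}


def translate(vaddr):
    """Translate by one addition inside a run, by table lookup otherwise."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    for first_vpn, count, first_pfn in SEGMENTS:
        if first_vpn <= vpn < first_vpn + count:
            # Consecutive virtual pages map to consecutive frames, so the
            # frame number is the run's base frame plus the page distance.
            return (first_pfn + (vpn - first_vpn)) * PAGE_SIZE + offset
    return PAGE_TABLE[vpn] * PAGE_SIZE + offset


inside_run = translate(0x12 * PAGE_SIZE + 0x34)
outside_run = translate(0x30 * PAGE_SIZE + 1)
```

With a small, fixed number of runs the comparison and addition take a bounded number of steps and need no memory-resident page table, whereas the fallback path keeps the timing behavior of an ordinary TLB miss.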
3 Page Table Register
After comparing different approaches to improve TLB performance,
Peng et al. [PLW+06] proposed a superpage design which aims to
reduce TLB miss rates by improving TLB coverage. Talluri et al.
[TKHP92,TH94] also investigated the reduction of TLB miss
overhead through page enlargement. Jacob et al. confirmed the
dominant impact of the miss rate on TLB performance in
[JM98b,JM98a]. The two-level TLB is a technique of Peng et al.
[PLW+06] which aims to reconcile high coverage with low latency.
Multi-level TLBs constitute a way to decrease the average
translation time, but do not guarantee deterministic response
times. Intermediate-level Skip Multi-Size Paging by Suzuki and
Shin [SS97] is based on multi-level paging. These approaches do
not take into account that real-time systems may be composed of
hard, soft and even non real-time tasks with differing timing
characteristics. Bennett and Audsley [BA01] recognized this
property and proposed a modular page table design (figure 24).
They suggest the implementation of custom page miss handlers,
depending on the individual real-time constraints. Such an
implementation, which is possible for software-managed TLBs,
provides full virtual memory support for less critical tasks and
a predictable address translation for hard real-time tasks at the
expense of only restricted virtual memory support for them. This
work is based on their findings.
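The dispatching of TLB misses to task-specific handlers can be sketched as follows. The dictionaries standing in for PCBs and for the TLB, the two handler functions and all field names are hypothetical; they only illustrate the structure of a modular page table and are not taken from [BA01] or from the design presented later in this thesis.

```python
PAGE_SIZE = 4096


def hard_rt_handler(task, vpn):
    """Predictable: a single index into a fixed, fully populated table."""
    return task["linear_table"][vpn]


def demand_paging_handler(task, vpn):
    """Flexible: allocate a frame on first touch, so the time varies."""
    table = task["page_table"]
    if vpn not in table:
        table[vpn] = task["free_frames"].pop()
    return table[vpn]


def tlb_miss_dispatcher(active_task, tlb, vaddr):
    """Look up the active task's handler in its PCB and refill the TLB."""
    vpn = vaddr // PAGE_SIZE
    pfn = active_task["miss_handler"](active_task, vpn)
    tlb[vpn] = pfn
    return pfn * PAGE_SIZE + vaddr % PAGE_SIZE


# PCB stand-ins: each task names its own miss handler.
hard_task = {"miss_handler": hard_rt_handler, "linear_table": [7, 8, 9]}
soft_task = {"miss_handler": demand_paging_handler,
             "page_table": {}, "free_frames": [21, 20]}

hard_tlb, soft_tlb = {}, {}
hard_paddr = tlb_miss_dispatcher(hard_task, hard_tlb, 1 * PAGE_SIZE + 5)
soft_paddr = tlb_miss_dispatcher(soft_task, soft_tlb, 3 * PAGE_SIZE)
```

The hard real-time task trades flexibility for a miss path of fixed length, while the less critical task keeps full virtual memory support at the price of a variable miss time.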
Another interesting approach to increase the TLB performance
is the partitioning of the TLB. Channon and Koch [CK97] pro-
posed a partitioning scheme of the TLB based on reference and
ownership characteristics. They divided the TLB into sections
for instruction fetching, data fetching and a section for the op-
erating system kernel. The partition boundaries are adjusted at
runtime to balance the TLB misses within the partitions.
3.2.2 Industry
At the beginning of this section, the software engineering
quality standards of the automotive and avionics industries are
introduced to outline their requirements. The VMMs from Green
Hills, LynuxWorks, Wind River and SYSGO presented afterwards are
based on these engineering standards. Furthermore, an overview of
XEN is given, as XEN is very popular in the server consolidation
domain. All of the presented approaches require
paravirtualization to support hard real-time and do not provide
an extensible scheduler interface to address the problem of
hierarchical scheduling by configuring or exchanging the VMM
scheduler.
Figure 24: Modular Page Tables. On a TLB miss, the TLB miss
handler dispatcher performs a PCB lookup for the active task and
selects one of the custom handlers 1 to n for the TLB.
Engineering standards
In the market of embedded systems, it is common that companies
which are linked via the value chain of one or multiple products
agree on a common standard for their solution. In the area of
real-time operating systems, different standards exist. Some
standards specify the interfaces rather than the functionality
between operating system and applications. Other standards
concentrate on the development process of dependable embedded
systems to certify a maximum of software quality. These standards
can also be applied to the development process of operating
systems to certify their dependability. For example, the OSGi⁴
has created a standard for networked applications. In the
automotive sector, many developers use OSEK⁵, the operating
system specification for car control systems. This standard has
been partly published as an international standard [ISO05].
DO-178B⁶ is a standard for the development of software in the
avionics area. It has been developed by the RTCA⁷ and EUROCAE⁸.
The American authority FAA⁹ employs the standard for the
certification of software and software development processes in
the area of avionics. The FAA certificate is the highest security
certificate issued by the FAA. Integrity Secure Virtualization is
built upon technology taken from Integrity RTOS-178B, which has
been certified, besides FAA DO-178B Level A, with EAL 6+¹⁰ by the
NSA¹¹. The Evaluation Assurance Levels 1-7 are defined in the
Common Criteria for Information Technology Security Evaluation.
Common Criteria is a framework in which computer system users can
specify their security functional and assurance requirements,
vendors can then implement and/or make claims about the security
attributes of their products, and testing laboratories can
evaluate the products to determine if they actually meet the
claims. In other words, Common Criteria provides assurance that
the process of specification, implementation and evaluation of a
computer security product has been conducted in a rigorous and
standard manner. Different automotive manufacturers and suppliers
have formed an international consortium and try to establish an
open standard for the electric and IT architecture in the
automotive field with AUTOSAR¹². The core of the AUTOSAR
architecture is the AUTOSAR RTE¹³, which abstracts from the real
topology of control devices. The programming language Ada is
often used to program dependable embedded real-time systems. The
language is adequate for this purpose because it supports special
features to increase dependability, e.g. type safety, run-time
tests for memory overflow or simplified program verification. An
ISO/ANSI standard for this programming language has existed since
1983.
4 Open Service Gateways initiative
5 German: Offene Systeme und deren Schnittstellen für die
Elektronik in Kraftfahrzeugen; English: Open Systems and their
Interfaces for the Electronics in Motor Vehicles
6 Software Considerations in Airborne Systems and Equipment
Certification
7 Radio Technical Commission for Aeronautics
8 European Organisation for Civil Aviation Equipment
9 Federal Aviation Administration
10 Evaluation Assurance Level 6+: semi-formally verified design
and tested
11 National Security Agency of the United States of America
12 AUTomotive Open System ARchitecture
13 Run-Time Environment
Figure 25: ARINC 653 architecture diagram. Partitions 1 to 3 each
contain a partition OS and an APEX interface and sit on top of
the module OS and the hardware; the diagram also labels the
module and the cabinet.
ARINC 653
The ARINC 653 specification defines the functionality that an
operating system must provide to guarantee robust spatial and
temporal partitioning, together with an avionics application
programming interface. The standard application interface is
called APEX 14
and defines a set of software services an ARINC 653 compliant
OS needs to provide to avionics application developers. The AR-
INC 653 standard only specifies this interface while it leaves the
implementation details to the OS vendors.
14 Application Executive
Figure 26: Green Hills Secure Virtualization architecture diagram.
The architectural design of the ARINC 653 specification is in
general identical to the MILS design (see section 2.2.4). The
separation kernel is
called Module OS and is deployed on a single processing unit
called module and is able to host one or more avionics applica-
tions and to execute them independently. Due to the MILS based
architecture, the Module OS provides the needed spatial isola-
tion for fault containment. A partition in the sense of ARINC
653 encapsulates an avionic program distributed over a number
of processes being executed by a Partition OS.
Greenhills Integrity Secure Virtualization
Integrity Secure Virtualization is a separation kernel based on
the MILS architecture introduced in section 2.2.4 and is able to
host arbitrary guest OSs alongside a comprehensive suite of
real-time applications and middleware. The architecture of
Integrity Secure Virtualization is shown in figure 26. Support
for real-time applications is achieved by integrating real-time
technology taken from Integrity RTOS-178B into the separation
kernel. Thus, real-time applications are executed directly by
the separation kernel and do not require an RTOS within a
partition to host them. This extends the task of the separation
kernel of Integrity Secure Virtualization to real-time
management within real-time partitions. This somewhat
contradicts the general design of separation kernels, as it adds
code to the separation kernel and makes formal verification more
difficult. Nevertheless, Integrity Secure Virtualization has
been certified to FAA DO-178B Level A for avionic systems
controlling passenger and military jets [Gre10].
Figure 27: Lynuxworks Lynxsecure architecture diagram.
Lynuxworks Lynxsecure
The Lynxsecure separation kernel, depicted in figure 27, is also
based on the MILS system architecture with strict adherence to
the data isolation, damage limitation, information flow policies
and periods processing identified in this architecture. Thus,
the different subjects within the figure represent resource
partitions encapsulating an OS or another runtime environment.
The resources assigned to each resource partition are dedicated
exclusively. Lynxsecure thus guarantees resource availability at
any time. The resources are accessible from within the
partitions through the virtual chip support package (CSP) and
virtual board support packages (BSP). The virtualization of
these support packages ensures that the guest OS is virtually
executed on supported hardware.
Lynxsecure offers the possibility to use full virtualization and
paravirtualization side by side on the same hardware. The full
virtualization feature can be used, for example, to host Windows
as a guest OS. However, LynxOS-SE, developed by Lynuxworks, is
the only supported RTOS and is virtualized using
paravirtualization. The Lynxsecure separation kernel has been
designed to be certifiable to Common Criteria EAL-7 15 and FAA
DO-178B level A. Up to now, Lynxsecure has not obtained EAL 7
[Lyn09].
Windriver VMM
The Windriver VMM entered the market after Greenhills Integrity
Secure Virtualization and Lynuxworks Lynxsecure. It is also
based on the MILS architecture. Windriver VMM uses
paravirtualization to host guest OSs. The resource partitions,
which encapsulate the guest OSs, are called Virtual Boards.
Compared to Integrity Secure Virtualization and Lynxsecure, the
Windriver VMM does not offer any distinguishing features
[Win10].
SYSGO PikeOS
In contrast to Greenhills Integrity Secure Virtualization,
Lynuxworks Lynxsecure and the Windriver VMM, Sysgo's PikeOS
follows the architectural design principles of a microkernel.
Figure 28 shows the architectural design of PikeOS. The
separation of the microkernel and the PSSW 16 helps to keep the
microkernel smaller and thus less fault-prone. The PSSW can be
understood as an abstract management layer on top of the
underlying microkernel, while the PikeOS microkernel provides
the main functionality typically provided by an RTOS
microkernel:
Task & Thread Management
IPC
Memory Management
Interrupt Handling
Scheduling
The required temporal and spatial isolation is realized by intro-
ducing two types of partitions:
Resource Partitions
Time Partitions
A resource partition is a set of PikeOS tasks sharing a bounded
set of kernel resources assigned to the resource partition and
exception monitoring handlers. PikeOS handles hardware
interrupts, trace and breakpoint exceptions and system calls
itself and does not forward them to the resource partition
exception monitoring handlers. Nevertheless, the kernel is not
always capable of handling all faults and then passes the fault
information to user mode handlers. The exception monitoring
handlers assigned to the resource partitions are the entry and
exit points for user mode exception handling and are not
exchangeable by the resource partition itself. In between the
exception monitoring handlers, PikeOS offers a short and a full
exception handler, both configurable by the user. The whole
process of exception handling is built on the IPC mechanisms
implemented by the PikeOS microkernel. Thus, upon an exception,
the kernel generates an IPC message on behalf of the faulting
thread, sends it to the exception monitoring handler of the
assigned resource partition and waits for a response of the user
mode handler, which is implemented in either the short or the
full exception handler. Within each resource partition, a PSSW
library is linked in to provide the PikeOS PSSW API to the
software encapsulated in the resource partition.
15 Formally Verified Design and Tested
16 PikeOS System Software
Figure 28: PikeOS architecture diagram. (Labels in the diagram:
Resource Partitions, each with partition specific resources and
a PSSW Library; PSSW Module with System Configuration, File
System Services, Partition Control, Application Process Control,
Interpartition communication, Time partitioning and Health
monitoring; PikeOS Microkernel; Hardware. Legend: Primary
Application Task (PikeOS process created by the PSSW module);
Secondary Application Task (created by another application
task).)
With resource partitions implementing spatial isolation, PikeOS
supports the remaining requirement of temporal isolation for
virtualizing real-time systems by providing time partitions.
Each PikeOS thread is assigned to a specific time partition. The
concept of time partitioning of PikeOS is explained in section
4.2.2. PikeOS is restricted to paravirtualization of guest OSs.
Thus, the guest OSs used need to be adapted to the PSSW library
and the interrupt handling mechanisms provided by the PikeOS
microkernel. As already mentioned, this requires the source code
of the guest OS to be adapted, which can be hard to realize
depending on the licensing restrictions of the guest OS vendor.
[Sys10]
XEN
XEN is an extremely popular open-source VMM for server
consolidation and has been developed at the University of
Cambridge. The approach of XEN is straightforward in the sense
of system virtualization: it multiplexes physical resources at
the granularity of an entire operating system. One of the top
design goals of XEN is to provide high performance
virtualization and strong isolation, especially for machine
architectures that do not fulfill the Popek and Goldberg
criteria (see 2.2.2) for full virtualization, such as the x86
instruction set. Some examples of the non-virtualizability of
the x86 ISA are the instruction popf, which behaves differently
in kernel mode and user mode, the instruction smsw, which stores
the machine status word but does not trap in user mode, and the
instructions sgdt and sldt, which access the descriptor table
registers and also do not trap in user mode. The goal of
virtualizing an uncooperative ISA like the x86 ISA (see 2.2.2)
is achieved in XEN by the use of a high performance
paravirtualization architecture depicted in figure 29. XEN even
goes further by proposing high performance paravirtualization
for machine architectures fulfilling the Popek and Goldberg
criteria, because “completely hiding the effects of resource
virtualization risks both correctness and performance”
[BDF+03]. While this may be true for performance, it is not true
for correctness, as Popek and Goldberg have shown in [PG74]:
when the set of sensitive instructions is a subset of the
privileged instructions, a virtual machine monitor holding the
efficiency, resource control and equivalence properties can be
constructed (see also section 2.2.2). Nevertheless, XEN is very
popular as it provides very good performance on x86
architectures and very inspiring approaches, especially for the
virtualization of devices other than the CPU [BDF+03].
Figure 29: XEN Architecture. (Labels in the diagram: Domain0
with Control Plane Software and Domain0 control interface; User
Software on top of the guest OSs XenoLinux, XenoBSD and XenoXP,
each with Xeno-Aware Device Drivers; XEN with virtual x86 CPU,
virtual phy mem, virtual network and virtual blockdev; HW (SMP
x86, phy mem, enet, SCSI/IDE).)
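The virtualizability condition from [PG74] cited above can be
sketched as a simple set inclusion. The following Python
fragment is only an illustration under stated assumptions: the
function names are hypothetical, and the instruction sets are a
small hand-picked sample (popf, smsw, sgdt and sldt are taken
from the text, lgdt and hlt are added here as examples of
privileged x86 instructions), not a complete model of the x86
ISA.

```python
# Illustrative check of the Popek and Goldberg condition.
# The instruction sets are a hand-picked sample, not a full ISA.

def is_virtualizable(sensitive, privileged):
    """True if every sensitive instruction is also privileged,
    i.e. it traps when executed in user mode."""
    return set(sensitive) <= set(privileged)


def critical_instructions(sensitive, privileged):
    """Sensitive instructions that do not trap in user mode."""
    return sorted(set(sensitive) - set(privileged))


# Sample for x86: popf, smsw, sgdt and sldt come from the text;
# lgdt and hlt are assumptions added as privileged examples.
X86_SENSITIVE = {"popf", "smsw", "sgdt", "sldt", "lgdt", "hlt"}
X86_PRIVILEGED = {"lgdt", "hlt"}
```

For this sample, is_virtualizable(X86_SENSITIVE, X86_PRIVILEGED)
is false, and critical_instructions lists exactly the four
instructions named in the text as the ones a VMM cannot
intercept by trapping.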
As XEN is based on paravirtualization, two mechanisms are
introduced to realize control interactions between XEN and a XEN
Domain:
Hypercalls
Events
The hypercall interface of XEN is responsible for performing a
control transfer from a XEN domain to the XEN VMM. It uses a
synchronous software trap to perform privileged operations,
analogous to the use of system calls in conventional operating
systems.
The event interface provides asynchronous communication from
XEN to a XEN Domain. The event interface replaces the usual
delivery mechanisms for device interrupts and allows the
lightweight notification of important events. Pending events are
stored in a per-domain bitmask, which is updated by XEN before
invoking an event-callback handler specified by the guest OS.
The callback handler is responsible for resetting the set of
pending events and responding to the notifications in an
appropriate manner [BDF+03].
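The event mechanism described above can be illustrated with a
minimal sketch. It is not XEN code: the class Domain and the
functions send_event and make_guest_callback are hypothetical
names chosen for this illustration, and the bitmask is modelled
as a plain Python integer.

```python
class Domain:
    """Hypothetical model of a domain as seen by the VMM."""

    def __init__(self, callback):
        self.pending = 0          # per-domain bitmask of pending events
        self.callback = callback  # event-callback handler of the guest OS


def send_event(domain, event_bit):
    """VMM side: mark the event as pending, then invoke the callback."""
    domain.pending |= 1 << event_bit
    domain.callback(domain)


def make_guest_callback(log):
    """Guest side: build a callback that records handled event numbers."""
    def callback(domain):
        # The guest resets the set of pending events ...
        pending, domain.pending = domain.pending, 0
        # ... and responds to every notification that was set.
        bit = 0
        while pending:
            if pending & 1:
                log.append(bit)
            pending >>= 1
            bit += 1
    return callback
```

A call such as send_event(domain, 3) sets bit 3 of the bitmask
and runs the callback, which clears the mask again and handles
event 3.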
3.2.3 Summary
The related work section showed that the only existing VMMs
supporting hard real-time constraints do so in combination with
paravirtualization of the RTOS, which conflicts with the goal of
this thesis (see 3.1). In particular, the related work on
virtualization architectures for embedded systems lacks existing
academic VMM approaches. Only GandalfVMM represents a VMM
approach targeted at embedded systems; it was developed at the
University of Tsukuba by Oikawa et al. However, the work on
GandalfVMM does not address any architectural decisions, but
focuses on the mesovirtualization approach, which restricts the
paravirtualization overhead to the minimal set of instructions
necessary to execute the guest OS in virtualized environments.
The approach is still restricted to paravirtualization and thus
requires the source code of the guest OS to be applicable.
Besides this, the approach lacks a discussion of how to decide
whether a sensitive instruction may still be executed in the
context of the VM without causing an interrupt, instead of being
emulated in the context of the VMM.
There already exist a few commercial virtualization platforms or
VMMs for a range of embedded processors, nearly all of them
being proprietary systems. All of the available commercial
products use only paravirtualization in an attempt to provide
reasonable performance, and they support real-time applications
only by the use of dedicated resources. Naturally, this limits
the applicability of virtualization using these products to a
subset of all possible scenarios. In particular, applications
that cannot be paravirtualized because their source code is not
available cannot be virtualized using the currently available
virtualization products, since this would require a binary
analysis of the whole application, which most often is not
completely possible.
To sum up, there is no solution from either academia or
industry which fulfills the goal of having a VMM offering full
virtualization to support RTVMs. Furthermore, there is no VMM
allowing an arbitrary combination of full and configurable
paravirtualization. The idea of having an extensible scheduler
interface is also not addressed in the presented related work.
Thus, the following part of this chapter addresses exactly those
problems to be solved by the introduced VMM design.
3.3 design
The development of a configurable hybrid VMM for embedded
real-time systems is the first topic addressed in this thesis. The
task to realize the idea of a configurable hybrid VMM was as-
signed to Daniel Baldin. His diploma thesis [Bal09] covered the
prototypical development. The thesis was supervised by me and
the results have been published in [BK09]. Thus some of the
parts presented in the following sections and not appearing in
[BK09] have first been described in [Bal09]. Sections containing
such parts are marked by a reference to [Bal09].
A broad overview of existing VMMs and sophisticated
virtualization techniques has been given so far, showing that
the existing solutions lack support for hard real-time when
using full virtualization. Furthermore, there is no solution
providing a configurable paravirtualization interface that
reduces the paravirtualization effort to the hypercalls that are
really needed. The need for an extensible scheduler interface is
also addressed by the proposed design.
The approach presented in this thesis overcomes these problems
by introducing a configurable hybrid VMM architecture designed
and implemented for the PowerPC405 processor, which allows for
the virtualization of unmodified applications as well as
paravirtualized applications or even a combination of both.
Support for real-time applications was a major goal of the
design, which allows for the integration of any kind of
scheduling mechanism for VMs while being completely
deterministic. Furthermore, the high configurability allows the
system to be optimized explicitly for the intended field of use.
This affirmed the research in this direction, with hybrid
virtualization being a relevant topic in industrial embedded
systems.
Besides these advantages, virtualization suffers from its
inherent overhead. This overhead is heavily dependent on the ISA
and on the hardware support for the emulation process, context
saving, memory virtualization and privilege management. Hardware
support for virtualization is currently in the focus of several
chip designers, such as Intel with its Intel VT extension and
AMD with its Pacifica extension. They currently provide desktop
and server CPUs with these hardware extensions. With the Atom
processors, Intel has started to introduce its virtualization
technology to the embedded market for netbooks. Nevertheless,
there is a lack of hardware support for virtualization in the
embedded systems domain. Thus, it is very interesting to
identify the bottlenecks of virtualization in embedded systems
and to derive possible cost-efficient hardware extensions for
embedded microcontrollers.
In section 3.3.1, the configurability of the hybrid VMM ABI is
introduced, while section 3.3.2 describes the architecture and
the support of full virtualization and paravirtualization in the
hybrid VMM design. In section 3.3.3, the virtualization concepts
of the processor are introduced, including in particular
interrupt handling and register access. To support multiple VMs
simultaneously, the VMM needs to implement a scheduling policy.
The scheduler interface is described in section 3.3.5. The
design does not rely on a single scheduling approach. Instead,
it leaves the implementation of the scheduling policy up to the
developer by providing an extensible scheduler interface. The
support for RTVMs is discussed in chapter 4, as this is a very
complex problem on its own. Another very important problem,
addressed in section 3.3.6, is the virtual memory management,
because it is essential to guarantee spatial isolation in
memory. The presented memory virtualization approach was
published in [GK08]. Finally, a short overview of the
possibilities to realize I/O virtualization is given in section
3.3.7.
3.3.1 Configurability
Achieving minimal overhead as needed for the intended field of
use is one of the major design goals of this system. This goal
is met by the high flexibility offered through the possibility to
configure a wide range of system components. The idea is based
on the concept of RTOS configurability introduced by Ditze in
[Dit98a,Dit98b].
The configuration of the VMM is realized in a very fine-grained
structure by the extensive use of preprocessor statements
within all VMM components. Thus, the system designer is able to
enable or disable features of the virtual machine monitor or to
specify which parts of the hypercall interface shall be
supported. The principal workflow of the configuration is
depicted in figure 30. Based on the configuration files, the
preprocessor eliminates, adds, or changes code segments inside
the implementation files to create source code which no longer
contains unneeded code parts. This is a very valuable feature if
the platform is to be deployed inside a very memory-limited
environment.
The virtualization platform allows the system designer to
explicitly define whether there is the need for full
virtualization, paravirtualization or both. It is even possible
to configure the support for special parts of the host ISA, for
example support for virtual memory. The hypercall interface may
explicitly be configured by defining whether paravirtualized
drivers are supported, whether inter partition communication is
needed and which scheduler is to be used. Each of the components
can then be configured as well to allow the platform to match
the needs of the target system as closely as possible. An
example of a hybrid configuration supporting full virtualization
and paravirtualization is depicted in figure 31. [BK09]
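The effect of such a configuration can be modelled with a small
sketch. The VMM itself is configured through preprocessor
statements. The Python fragment below only mimics the selection
logic, and all switch and component names in it are hypothetical
and not taken from the actual configuration files. It also
assumes that the IPCM and the paravirtualized drivers are only
reachable through the hypercall interface.

```python
# Hypothetical configuration switches, loosely mirroring
# preprocessor defines; the names are illustrative only.
DEFAULT_CONFIG = {
    "FULL_VIRTUALIZATION": True,
    "PARAVIRTUALIZATION": True,
    "VIRTUAL_MEMORY": False,
    "PARAVIRT_DRIVERS": False,
    "IPC": True,
    "SCHEDULER": "EDF",
}


def select_components(config):
    """Return the components that would end up in the VMM image."""
    components = [
        "irq_handlers",
        "dispatcher",
        config["SCHEDULER"].lower() + "_scheduler",
    ]
    if config["FULL_VIRTUALIZATION"]:
        components.append("isa_emulator")
    if config["PARAVIRTUALIZATION"]:
        components.append("hypercall_handler")
        # Assumption: these parts are only reachable via hypercalls.
        if config["IPC"]:
            components.append("ipcm")
        if config["PARAVIRT_DRIVERS"]:
            components.append("paravirt_drivers")
    if config["VIRTUAL_MEMORY"]:
        components.append("virtual_memory")
    return components
```

Disabling a switch removes the corresponding component from the
result, which corresponds to the preprocessor eliminating the
unneeded code parts before compilation.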
3.3.2 Architecture
In order to meet the requirements of high performance and
configurability, the virtualization platform provides a hybrid
VMM interface as already depicted in figure 31. The VMM is
therefore implemented as a hybrid VMM which uses full
virtualization as the basic virtualization technique with
additional support for paravirtualization. Partially
paravirtualized applications or self-modifying applications are
also supported by a fallback feature which ensures consistency
while executing in and switching between both kinds of states.
The configurability of the VMM and the paravirtualization
interface allows the system designer to create a system which is
tailored to the particular field of application and saves as
much memory as possible.
Figure 30: Configuration flow of the virtualization platform.
(Labels in the diagram: System Designer; Configuration Files:
hypconf.h, vm_phytabs.c, linkerscript.sed; Preprocessor: Replace
Symbols; Compiler and Linker; Configurable Components: VMM, FTS
Scheduler, EDF Scheduler, Dispatcher, IPCM, ISA Emulator,
Hypercall Handler; configured result: FTS Scheduler, Dispatcher,
configured ISA Emulator.)
Figure 31: The virtual machine monitor allows for the
virtualized execution of any kind of guest application. Left: an
unmodified application. Middle: a completely paravirtualized
application. Right: a partially modified application.
In general, the design is based on the multiple independent
levels of security (MILS) architecture [Obj08]. As illustrated
in figure 32, only the minimal set of components needed to
implement secure partitioning, scheduling and communication
between VMs is part of the VMM running in privileged mode. All
other components, called “Untrusted VMP Modules”, are placed
inside a separate partition and are executed in user mode, as
this is one of the fundamental concepts of the MILS
architecture. The communication between partitions is controlled
by the Inter Partition Communication Manager (IPCM), which
creates shared memory tunnels between two partitions on request.
VMs can use these tunnels to communicate with each other. This
feature in particular allows formerly physically distributed
systems that had to use a real-time capable bus system (e.g. a
CAN bus) for communication to enhance the security, robustness
and performance of the information flow between them if they are
virtualized and placed on top of the same VMM.
The fundamental components of the whole system are formed by
the interrupt handlers as seen in figure 32. Since the VMM
executes the guest application in user mode, any interrupt
occurring is delegated to the VMM first, which then has to
analyze the interrupt and forward it to the appropriate
component or to the virtual machine itself. Program interrupts,
raised among others by privileged instructions inside the guest
application, are forwarded to the emulation routine dispatcher,
which determines the corresponding ISA emulation routine.
Whenever the virtualization platform has been configured to
support paravirtualization, applications running inside a
virtual machine can use hypercalls to emulate privileged
instructions, call the IPCM, use paravirtualized I/O drivers or
call scheduler-related functions. The scheduler interface in
particular offers methods to set scheduler-related parameters at
runtime or to yield the CPU directly in order to allow system
designers to incorporate special scheduling mechanisms. [BK09]
Figure 32: Information and control flow of the components used
inside the virtualization platform. (Labels in the diagram: Full
Virtualized Application; Para Virtualized Application; Untrusted
VMP Modules; Hypercall; Hypercall Dispatcher; ISA Emulator;
Emulation Dispatcher; Program IRQ; Syscall IRQ; Timer IRQ;
External IRQ; VM Scheduler; IPCM; Hardware.)
Full Virtualization
The full virtualization components are formed by the already
described program IRQ handler, the emulation dispatcher and the
ISA emulator (see figure 32). A flow chart is depicted in figure
33 to visualize the following description of the involved
components. The emulation dispatcher is responsible for the
analysis of the program IRQ and the dispatching to the
associated emulation routine of the ISA emulator. The set of
emulation routines is configurable by the configuration process
introduced in 3.3.1. Thus, emulation routines that are not
needed can be removed from the final VMM binary. The emulation
process is based on the basic interpretation approach presented
in 2.3.1, as techniques like dynamic binary translation are not
suited for use in real-time systems due to their runtime
overhead. The developer always has to keep in mind that the
emulation of a sensitive instruction needs to be bounded in its
execution time. The first step of the emulation process is to
fetch the instruction to be emulated. This step depends on how
the hardware provides the source instruction that caused the
trap. In case of the Power ISA, the address of the instruction
is stored in a reserved register. Thus, the instruction itself
still needs to be fetched from memory before it can be decoded.
The decoding step identifies the instruction and its parameters
and passes them to the dispatching routine, which calls the
associated emulation routine for the identified instruction with
its parameters. The decoding step could also be performed
offline when the hardware provides the address of the trapping
instruction. This is known as pre-decoding [SN05b] and is
closely related to binary translation (see 2.3.2). A pre-decoded
copy of the binary is saved in memory and has at least the same
size as the original binary. The pre-decoded version of the
binary contains the already extracted instructions and their
parameters in easily accessible fields, which reduces the
decoding step to load instructions only. Nevertheless, the
offline pre-decoding approach suffers from the code-discovery
problem like binary translation, making it usable only in
special cases. To overcome the code-discovery problem,
pre-decoding could also be applied online, which is known as
incremental pre-decoding. In this case, the pre-decoding is
performed with the knowledge of the current program flow, from
the actual program location up to the point where an indirect
branch or jump is performed, because the target location within
the binary code is then not known in advance.
Figure 33: Flow Chart of the Full Virtualization components.
(Steps shown: Program IRQ; save context; Trap? (no: illegal
instruction, terminate VM); fetch instruction using the
instruction address; decode instruction; extract parameters;
dispatch to emulation routine 1, 2, ..., n; restore context. The
steps are grouped into Program IRQ Handler, Emulation Dispatcher
and ISA Emulator.)
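The fetch, decode and dispatch steps described above can be
sketched as follows. This is a simplified model and not the
implementation of the hybrid VMM: the class VM, the dictionary
used as guest memory and all function names are assumptions made
for this illustration. Only two privileged instructions are
covered, mfmsr and mtmsr, using their Power ISA encoding
(primary opcode 31 with extended opcodes 83 and 146).

```python
OPCD_31 = 31     # primary opcode shared by mfmsr and mtmsr
XO_MFMSR = 83    # extended opcode of mfmsr
XO_MTMSR = 146   # extended opcode of mtmsr


class VM:
    """Hypothetical per-VM state kept by the VMM."""

    def __init__(self):
        self.gpr = [0] * 32   # general purpose registers
        self.msr = 0          # virtual machine state register
        self.pc = 0           # address of the trapping instruction
        self.running = True


def decode(instr):
    """Return (primary opcode, extended opcode, register field)."""
    return instr >> 26, (instr >> 1) & 0x3FF, (instr >> 21) & 0x1F


def emulate_mfmsr(vm, reg):
    vm.gpr[reg] = vm.msr      # read the virtual MSR


def emulate_mtmsr(vm, reg):
    vm.msr = vm.gpr[reg]      # write the virtual MSR


EMULATION_ROUTINES = {XO_MFMSR: emulate_mfmsr, XO_MTMSR: emulate_mtmsr}


def handle_program_irq(vm, memory):
    """Fetch, decode and dispatch one trapped instruction."""
    instr = memory[vm.pc]     # fetch via the saved instruction address
    opcd, xo, reg = decode(instr)
    routine = EMULATION_ROUTINES.get(xo) if opcd == OPCD_31 else None
    if routine is None:
        vm.running = False    # illegal instruction: terminate the VM
        return
    routine(vm, reg)
    vm.pc += 4                # resume after the emulated instruction
```

Each step is a table lookup or a constant number of shifts and
masks, so the execution time of the dispatcher itself is
bounded, which matches the requirement stated above. The
dictionary EMULATION_ROUTINES plays the role of the configurable
set of emulation routines.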
Paravirtualization
The syscall IRQ handler and the hypercall dispatcher form the
paravirtualization part of the hybrid VMM design. The first part
of the paravirtualization covers the emulation of sensitive
instructions to speed up the trap-and-emulate approach used in
full virtualization mode. To this end, the hypercall provides
all information needed in easily accessible fields to minimize
the decoding step in the hypercall handler. Thus, implementing
this kind of hypercall in a guest OS can be considered manually
annotated pre-decoding. The second part of the
paravirtualization adds functionality to the system ABI by
providing high-level hypercalls to control the behavior of VMM
components such as the scheduler, the IPCM and other components
that may be added using the configurability of the hybrid VMM
approach. A very important requirement for the hypercalls is
their bounded execution time, which always has to be kept in
mind during implementation. The flow chart of these two
paravirtualization parts is depicted in figure 34. When the
syscall IRQ handler detects a hypercall, the hypercall
dispatcher is invoked. When it instead detects a syscall from a
user program to its guest OS, the syscall is signaled to the
guest OS and the context is restored. To decide whether a
syscall IRQ is a syscall or a hypercall, the virtual CPU mode of
the executed VM is needed (see 2.2.1). The hypercall dispatcher
can easily decode the hypercall as it is passed using easily
accessible fields. The representation of these fields depends on
the hardware architecture, as they may be passed in special
registers, scratchpad memory and so on. After the hypercall has
been decoded, the hypercall dispatcher can call the associated
hypercall handler. As already described, this can be an
emulation routine or a high-level component like the IPCM, the
scheduler or even an untrusted I/O driver encapsulated in user
space.
Figure 34: Flow Chart of the Paravirtualization components.
(Steps shown: Program IRQ; save context; Syscall? (yes/no),
decided using the virtual CPU mode; signal guest OS; extract
Hypercall ID; dispatch to emulation routine x, VMSched_HC or
IPCM_HC; restore context. The steps are grouped into Program IRQ
Handler, Hypercall Dispatcher, ISA Emulator, VM Scheduler and
IPCM.)
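The decision between a syscall and a hypercall can be sketched
as follows. The fragment is a simplified model under stated
assumptions: the VM is represented by a dictionary, the
hypercall IDs and handler names are hypothetical, and it is
assumed that a syscall IRQ raised while the virtual CPU is in
supervisor mode comes from the guest OS and is therefore a
hypercall.

```python
def hc_yield(vm, arg):
    """Hypothetical hypercall: give up the CPU voluntarily."""
    vm["yielded"] = True


def hc_set_sched_param(vm, arg):
    """Hypothetical hypercall: set a scheduler parameter at runtime."""
    vm["sched_param"] = arg


# Hypothetical hypercall IDs mapped to their handlers.
HYPERCALLS = {1: hc_yield, 2: hc_set_sched_param}


def handle_syscall_irq(vm, call_id, arg):
    """Route a syscall IRQ depending on the virtual CPU mode."""
    if not vm["virtual_supervisor"]:
        # Virtual user mode: a guest application calls its guest OS,
        # so the syscall is only signaled to the guest OS.
        vm["pending_guest_syscall"] = (call_id, arg)
        return "syscall"
    handler = HYPERCALLS.get(call_id)
    if handler is None:
        return "unknown hypercall"
    handler(vm, arg)          # dispatch to the hypercall handler
    return "hypercall"
```

Because the hypercall ID and its argument arrive in ready-to-use
fields, the dispatcher needs no instruction decoding, only a
table lookup.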
A major problem of paravirtualization is that, up to now, no method exists to paravirtualize a given binary automatically using special tools. An approach in that direction is called pre-virtualization and has been briefly presented in 3.2.1. Pre-virtualization parses unlinked binaries and replaces sensitive instructions by hypercalls. The technique is based on the idea of appending "no operation" (nop) instructions to every privileged instruction inside the source code of the guest's applications in order to create space for a later replacement of these instructions by hypercall instructions to the virtual machine monitor. The overall process is illustrated by an example in figure 35. To support pre-virtualization, a Python script, which first parses the source code of an application for privileged instructions, has been implemented for the hybrid VMM. Whenever a privileged instruction is found, its address is stored inside a pre-virtualization table together with the binary code of the hypercall instructions. The code itself is only modified by appending nop instructions to the privileged instructions found. Therefore, the intermediate pre-virtualized code can still be executed natively. At load time, the virtual machine monitor finally paravirtualizes the pre-virtualized applications by means of the pre-virtualization table, which is stored within the application's data area. As the hypercalls typically comprise more than one instruction, the addresses within the binary are shifted. This is the reason why pre-virtualization can only be applied to unlinked binaries, as it is then possible to adjust all addresses automatically. Nevertheless, it is worth adding this feature to the hybrid VMM design: it does not increase the complexity of the VMM, since it is based on automated preprocessing steps, and it helps to automate paravirtualization. Manual paravirtualization can be quite hard, because the guest OS has to be adapted completely to match the hypercall ABI. Executing a Linux version with XEN requires the modification of about 3000 lines of code [BDF+03], which is close to the size of a complete VMM like LynxSecure with about 8000 lines of code.

a) Base source code:
    add r3,r4
    mtevpr r3
    li r3,0

b) Pre-virtualized source code (after pre-virtualization):
    add r3,r4
    label_1:
    mtevpr r3
    nop
    nop
    li r3,0
   Pre-virtualization table, file "ab_tab.S":
    .long label_1
    .long 3
    li r13,HC_MTEVPR
    mr r14,r3
    sc

c) Final paravirtualized code (after load-time binary translation):
    add r3,r4
    li r13,HC_MTEVPR
    mr r14,r3
    sc
    li r3,0

Figure 35: Pre-Virtualization illustrated. a) Base source code. b) Pre-Virtualized source code and Pre-Virtualization table. c) The final paravirtualized code.
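The two phases described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it works on a list of assembly text lines instead of unlinked binaries, the set of privileged mnemonics contains only the mtevpr example from figure 35, and the function names pre_virtualize and load_time_patch are hypothetical, not taken from the script implemented for the hybrid VMM.

```python
# Illustrative sketch only: operates on assembly text lines, not on object files.
# Assumed mapping of privileged mnemonic -> hypercall identifier (from figure 35).
PRIVILEGED = {"mtevpr": "HC_MTEVPR"}


def hypercall_for(mnemonic, operand):
    """Hypercall sequence that replaces one privileged instruction (figure 35c)."""
    return [f"li r13,{PRIVILEGED[mnemonic]}", f"mr r14,{operand}", "sc"]


def pre_virtualize(lines):
    """Phase 1: pad privileged instructions with nops and record them in a table."""
    out, table = [], []
    for line in lines:
        parts = line.split()
        mnemonic = parts[0] if parts else ""
        if mnemonic in PRIVILEGED:
            replacement = hypercall_for(mnemonic, parts[1])
            table.append((len(out), replacement))         # slot index + hypercall code
            out.append(line)                              # original instruction is kept
            out.extend(["nop"] * (len(replacement) - 1))  # room for the hypercall
        else:
            out.append(line)
    return out, table


def load_time_patch(code, table):
    """Phase 2: overwrite every padded slot with its hypercall sequence."""
    patched = list(code)
    for index, replacement in table:
        patched[index:index + len(replacement)] = replacement
    return patched


base = ["add r3,r4", "mtevpr r3", "li r3,0"]
padded, table = pre_virtualize(base)   # still natively executable (nops only)
final = load_time_patch(padded, table) # paravirtualized code
```

Because the padding is inserted in the first phase, the load-time step only overwrites slots of equal length and never has to move code, which is why it adds almost no complexity to the VMM.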
3.3.3 Processor Virtualization
This section describes the core components handling the processor virtualization. The processor provides different interrupts which are the entry points to the VMM; thus IRQ handling and dispatching is addressed in section 3.3.3. Besides this, access to the processor registers needs to be controlled by the VMM. In the case of full virtualization, this is handled within the VMM upon an interrupt signaling the access to a register by a sensitive instruction. This causes considerable overhead due to the interrupt handling mechanisms. When paravirtualization can be applied, a more sophisticated approach can be used. This approach is presented in section 3.3.3.
IRQ Handling and Dispatching
The IRQ handlers are the heart of the VMM design. They ensure the proper activation of the VMM in case of an interrupt and form the only entry point to the VMM. The design is based on four types of interrupts.
Program IRQ: A Program IRQ is raised when an illegal instruction is executed or when the attempted execution of a privileged instruction in user mode traps. This IRQ triggers the full virtualization components of the hybrid VMM design.
Syscall IRQ: A Syscall IRQ is raised when a syscall instruction is executed to signal a request to the system software. This IRQ triggers the paravirtualization components of the hybrid VMM design.
Timer IRQ: A Timer IRQ is raised when the timer device of the hardware has reached its requested value.
External IRQ: External IRQs are raised by external devices like I/O devices.
The task of the interrupt handlers is to save the context, analyze the interrupt, dispatch the interrupt to the appropriate component of the VMM, handle the interrupt and return to the preempted program. Figure 36 shows this sequence after a privileged instruction has trapped. The time $t_s$ represents the period of time needed by the context saving routine, $t_a$ the time to identify the source of the interrupt, $t_d$ the time needed by the dispatching routine, $t_h$ the time required to handle the interrupt and $t_r$ the time needed to restore the context of the preempted program.

[Timeline over t, starting at the trap of a privileged instruction and followed by the phases $t_s$, $t_a$, $t_d$, $t_h$ and $t_r$.]
Figure 36: Steps performed upon handling an IRQ [Bal09].

The time $t_s$ heavily depends on the hardware architecture and usually takes most of the time of an interrupt handler. Thus, the efficient implementation of this part of the interrupt handler is crucial for the performance of the whole VMM. The next part of the interrupt handler analyzes the interrupt to find its cause. The analysis heavily depends on how the information on the interrupt cause is represented in hardware. When there is a dedicated interrupt line for a specific event, no time is needed to identify the source of the interrupt, and the time needed by the dispatching routine is reduced to zero, as the associated handler can be called directly. This is the case for the timer IRQ, which is raised by a single possible event only. In the case of a shared interrupt line, the analysis routine can become very complex, which is the case for the Syscall and the Program IRQ.
The program IRQ is the central entry point for the full virtualization part of the hybrid VMM design. Besides the execution of an illegal instruction, it is raised when a privileged instruction traps in user mode. As mentioned before, the dispatching process to the associated emulation routines can become very complex, because the instruction that triggered the program IRQ has to be identified. The complexity heavily depends on the information provided by the hardware when a program IRQ is triggered.
The syscall IRQ is the central entry point of the paravirtualization part of the hybrid VMM design. Whenever a paravirtualized VM wants the VMM to service a hypercall, it triggers a syscall to activate the VMM. As the syscall IRQ is typically used by user-mode programs to activate the operating system, the VMM must distinguish between hypercalls to the VMM and syscalls to the guest OS. This distinction is quite easy, as a hypercall performed by the guest OS is executed as a syscall in virtual supervisor mode, while a regular syscall of a user program is executed as a syscall in virtual user mode (see 2.2.1) [Bal09].

[Panel (a): the VMM holds one processor register file per VM (VM 1 to VM n); the values are copied into the processor when the VM is activated. Panel (b): the VM register file contains MSR, PIT, SRR0, SRR1 and TCR; SRR0 and SRR1 are mapped into the IRF page and MSR and TCR into the IRORF page of the VM user space via memory mappings.]
(a) Processor Virtualization by using Register Files.
(b) Innocuous Register File Mapping. Temporarily innocuous registers are mapped into pages inside the VM memory space.
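The distinction between hypercalls and guest syscalls can be sketched as follows. This is a minimal illustration, assuming that the VMM tracks the virtual privilege mode of the interrupted VM; the names Mode and dispatch_syscall_irq are hypothetical and not part of the VMM described here.

```python
from enum import Enum


class Mode(Enum):
    """Virtual privilege mode of the VM at the time of the syscall (assumed to be tracked by the VMM)."""
    VIRTUAL_USER = 0
    VIRTUAL_SUPERVISOR = 1


def dispatch_syscall_irq(virtual_mode):
    """Decide who has to service a trapped syscall instruction."""
    if virtual_mode is Mode.VIRTUAL_SUPERVISOR:
        # Issued by the guest OS itself: this is a hypercall for the VMM.
        return "hypercall handled by VMM"
    # Issued by a user program: a regular syscall for the guest OS.
    return "syscall forwarded to guest OS"
```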
Innocuous Register File Mapping
Virtualizing the processor is one of the main tasks of the virtual machine monitor. Every virtual machine gets its own set of virtual registers stored inside the register file, as depicted in figure 37a. Whenever the virtual machine is activated, the registers are copied into the registers of the physical processor. During execution, the virtual machine has unrestricted access to the user space registers of the processor. Accesses to registers that require privileged instructions trap to the virtual machine monitor, which then emulates the behavior of that instruction on the virtual register inside the register file. However, there are registers which can be accessed without immediate influence on the state or behavior of the virtual machine. An example is given by the Save/Restore Registers (SRR) of the PowerPC processor. The values of these registers are only used when the processor executes the "return from interrupt" (rfi) instruction. In order to speed up the access to this kind of register, Proteus uses a technique called Innocuous Register File Mapping (IRFM), which allows virtual registers to be mapped to memory pages inside the VM's accessible memory area, as illustrated in figure 37b. Virtual registers which may be read directly without the need for any emulation are mapped into a read-only memory page. Registers that can be either read or written without immediate influence are made accessible inside a readable and writable memory page whose address can be configured to conform to the system's memory map. Access to these registers is therefore possible by using load and store instructions instead of trapping to the virtual machine monitor, which speeds up the virtualized execution of a program dramatically. A similar approach has been used inside the ia64 port of the Xen VMM [BDF+03] for desktop systems to speed up the information flow between the VMM and the VMs. However, unlike in this virtualization platform, the approach has not been applied there to all eligible registers of the ISA.

[Timeline over t [s] showing which VM (1 or 2) is active; the VM that is not active is marked as being in a blackout.]
Figure 37: When sharing the timer device, the VMs suffer from blackouts during their phases of inactivity [Kai08a].
Since the IRFM feature can only be used by paravirtualized applications, partially modified applications may still use privileged instructions to access the virtual registers. In order to ensure the consistency of the mapped registers and the virtual registers inside the register file, the virtual machine monitor updates the mapped registers whenever a value is written to the register file. This feature is called the Innocuous Register File Mapping Fallback Feature and may be enabled if the system designer is unsure whether all privileged instructions are paravirtualized. Especially for systems using third party libraries without access to the source code, this is a very valuable feature [BK09].
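The interplay of the register file, the mapped pages and the fallback feature can be illustrated with a toy model. The assignment of SRR0 and SRR1 to the read/write page and of MSR and TCR to the read-only page follows the figure; the class VirtualCPU, its method names and the trap counter are assumptions made for this sketch and do not describe the Proteus implementation.

```python
class VirtualCPU:
    """Toy model of one VM's register file with Innocuous Register File Mapping."""

    RW_MAPPED = ("SRR0", "SRR1")  # readable and writable page (IRF page)
    RO_MAPPED = ("MSR", "TCR")    # read-only page (IRORF page)

    def __init__(self, fallback=True):
        self.register_file = {r: 0 for r in ("MSR", "PIT", "SRR0", "SRR1", "TCR")}
        self.rw_page = {r: 0 for r in self.RW_MAPPED}
        self.ro_page = {r: 0 for r in self.RO_MAPPED}
        self.fallback = fallback
        self.traps = 0  # number of traps into the VMM

    def guest_load(self, reg):
        """Paravirtualized read: a plain load from a mapped page, no trap."""
        if reg in self.rw_page:
            return self.rw_page[reg]
        if reg in self.ro_page:
            return self.ro_page[reg]
        raise PermissionError(f"{reg} is not mapped into the VM")

    def guest_store(self, reg, value):
        """Paravirtualized write: a plain store into the writable page, no trap."""
        if reg not in self.rw_page:
            raise PermissionError(f"{reg} is not writable through the mapping")
        self.rw_page[reg] = value

    def privileged_write(self, reg, value):
        """Unmodified code path: the instruction traps and the VMM emulates it."""
        self.traps += 1
        self.register_file[reg] = value
        if self.fallback:  # fallback feature: keep the mapped copies consistent
            if reg in self.rw_page:
                self.rw_page[reg] = value
            if reg in self.ro_page:
                self.ro_page[reg] = value
```

With the fallback enabled, a value written by a trapping privileged instruction is immediately visible to paravirtualized code that reads the mapped page; with the fallback disabled, the two copies can diverge.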
3.3.4 Timer Virtualization
In general, today's hardware systems offer a timer register and a so-called programmable interval timer (PIT). Time is represented as system ticks occurring periodically with a specific frequency $f$. Thus, the time $T$ represented by a single system tick is $T = \frac{1}{f}$. When an interrupt is needed after a certain time has expired, the PIT can be used. Upon request, the PIT is loaded with a given value and is decremented by one upon every system tick. When the value reaches zero, a timer interrupt is triggered. The virtualization of a timer device is a quite complex problem, as the timer device represents the notion of time of a system and has to be shared among different VMs. During this sharing, the inactive VMs suffer a blackout (see figure 37), as the timer device continues incrementing at the rate $f$. In the case of paravirtualization, the virtual machine is aware of experiencing blackouts, while in full virtualization, the VM is not aware of these blackouts. This may lead to nondeterministic behavior, although it does not necessarily do so. To ensure the correct timing behavior of fully virtualized VMs, the timer virtualization must hide the blackouts.
As a scheduling mechanism for fully virtualized VMs/RTVMs, a fixed time slice scheduler (FTS scheduler) is assumed, as it allows an easy implementation of proportional share distribution among different VMs and is the base scheduler used in chapter 4. In figure 38, such a schedule is shown. The x-axis denotes the time passed in the hardware timer register, given in host cycles. The y-axis denotes the number of cycles supplied to the VM, given in virtual cycles. The solid line represents the real execution, while the dashed line represents the idealized execution, assuming the supply could be assigned linearly by using infinitely fine time slicing. Considering the example of VM1, depicted in figure 38, being scheduled together with VM2, depicted in figure 39, one can see that VM1 is active periodically from time $S_1 = 0$ ms to time $E_1 = 1$ ms relative to the beginning of the time period $P = 3$ ms, and VM2 is active from time $S_2 = 1$ ms to time $E_2 = 3$ ms relative to the beginning of the time period $P = 3$ ms. Each VM receives a share of $\frac{E_i - S_i}{P}$ of the processor. In the case of VM1, this share is equal to $\frac{1}{3}$ of the processor's capacity, while in the case of VM2, this share is equal to $\frac{2}{3}$.
[Plot for VM1: x-axis t [10^6 cycles], y-axis Virtual Timer Register [10^6 cycles]; curves VT_VM1(t) and IVT_VM1(t); a timer interrupt requested with PIT = 1.33 at real time ~0.167 (virtual time 0.5), marked with the virtual time and the real time at which the timer interrupt occurs.]
Figure 38: Timer scaling in case of full virtualization

[Plot with the same axes and curve labels; a timer interrupt requested with PIT = 1.66, marked with the virtual time and the real time at which the timer interrupt occurs.]
Figure 39: Timer scaling in case of full virtualization
A very simple approach to virtualize the timer without blackouts would be to subtract the cycles passed during the blackout from the virtual timer register of the virtual machine. The virtual timer register may then be multiplied by the inverse of the share, $\frac{P}{E_i - S_i}$, which is 3 for VM1 and 1.5 for VM2, to obtain the real cycles that have to pass in the hardware timer register to achieve this supply. This implies that the virtual cycle supply is ideally given as a linear function called $ICS_{VM_i}(t)$ (Ideal Cycle Supply), where $t$ is given in host cycles. As mentioned before, this function is depicted by the dashed line in figures 38 and 39.

$ICS_{VM_i}(t) = \frac{E_i - S_i}{P} \cdot t$    (3.1)
One could think of also multiplying the virtual timer value by $\frac{P}{E_i - S_i}$ and passing the result to the hardware PIT register upon a PIT request, but this would result in an incorrect value for the PIT. The result of this multiplication is the virtual time expressed in host cycles, called $t_{ICS}$, which can be seen as the virtual time running contiguously in the VM. The real cycle supply to the VM, however, is given by a piecewise function called $CS_{VM_i}(t)$, as the VMs are executed in fixed time slices at the given processor frequency $f$. The time $t$ thus represents the real time that has passed when the virtual timer register has reached the supply $ICS_{VM_i}(t_{ICS})$ at the virtual time $t_{ICS}$. Consequently, the real time $t$ has to be passed to the hardware timer register. The value of $t$ can be determined by intersecting $CS_{VM_i}(t)$ with the constant function equal to the current virtual timer value.

Thus, the time $t$ of $CS_{VM_i}(t)$ is mapped to the time $t_{ICS}$ of $ICS_{VM_i}(t_{ICS})$ at which both supply functions are equal:

$t \mapsto t_{ICS}, \quad ICS_{VM_i}(t_{ICS}) = CS_{VM_i}(t)$    (3.2)
This preserves the correct timing behavior based on the cycle supply given by an FTS schedule. The mapping can be realized by first determining the cycles $\Delta t$ that have passed since the VM became active. This is the current real time $t$ minus the time $a$ at which the VM was last activated.

$\Delta t = t - a$    (3.3)

Then, the time $PIT_{VM_i}$ of the requested PIT interrupt relative to the last activation of the VM is determined by adding the cycles supplied by $CS_{VM_i}(\Delta t)$, as this number of cycles has already been supplied since the last activation of the VM:

$PIT_{VM_i} = PIT + \Delta t$.    (3.4)

Now $PIT_{VM_i}$ contains the cycles to be supplied to the VM. As the PIT register of the system needs to be set to the real time at which the supply of $PIT_{VM_i}$ is finished by $CS_{VM_i}(t)$, the value of $t$ has to be determined by the VMM. This is realized by first determining the number of full periods $x$ with supply $E_i - S_i$ contained in $PIT_{VM_i}$:

$x = \left\lfloor \frac{PIT_{VM_i}}{E_i - S_i} \right\rfloor$.    (3.5)

Then the remaining supply $R$, expressed as a fraction of a time slice and supplied in period $x + 1$, is calculated by:

$R = \frac{PIT_{VM_i}}{E_i - S_i} - x$.    (3.6)

Now it is possible to calculate $t_{CS}$ by:

$t_{CS} = x \cdot P + S_i + R \cdot (E_i - S_i)$.    (3.7)

The value $t_{CS}$ can now be put directly into the PIT hardware register. As can be seen in figures 38 and 39, the real time at which the interrupt occurs differs from the virtual time of the VM. This is due to the mapping introduced in equation 3.2 and has to be considered by the developer when defining the task sets for the given VMs under a given FTS schedule.
An example for this is given in figures 38 and 39. The first example, for VM1, shows a task requesting a PIT interrupt in $1.33 \cdot 10^6$ virtual cycles relative to the current time. The current real time when the request is issued is $t = 0.167 \cdot 10^6$ host cycles, which is equivalent to the virtual time $t_{ICS} = 0.5 \cdot 10^6$ host cycles. The calculation using equations 3.3 to 3.7 results in the real time $t = 3.5 \cdot 10^6$ host cycles. The virtual time $t_{ICS}$ can be calculated using equation 3.1 and results in $t_{ICS} = 4.5 \cdot 10^6$ host cycles, as $ICS_{VM_1}(t_{ICS}) = 1.5 \cdot 10^6$ virtual cycles. The difference between the virtual time at which the interrupt occurs and the virtual time at which the interrupt was requested is equal to $4 \cdot 10^6$ host cycles. This is the expected time passed in host cycles, because VM1 has a share of $\frac{1}{3}$, so that these $4 \cdot 10^6$ host cycles correspond to $1.33 \cdot 10^6$ virtual cycles of a virtual clock ticking with $\frac{1}{3} f$.

The second example, for VM2, shows a task requesting a PIT interrupt in $1.66 \cdot 10^6$ virtual cycles relative to the current time. The current real time when the request is issued is $t = 2.33 \cdot 10^6$ host cycles, which is equivalent to the virtual time $t_{ICS} = 2 \cdot 10^6$ host cycles. The calculation using equations 3.3 to 3.7 results in the real time $t = 5 \cdot 10^6$ host cycles. The virtual time $t_{ICS}$ can be calculated using equation 3.1 and results in $t_{ICS} = 4.5 \cdot 10^6$ host cycles, as $ICS_{VM_2}(t_{ICS}) = 3 \cdot 10^6$ virtual cycles. The difference between the virtual time at which the interrupt occurs and the virtual time at which the interrupt was requested is equal to $2.5 \cdot 10^6$ host cycles. This is the expected time passed in host cycles, because VM2 has a share of $\frac{2}{3}$, so that these $2.5 \cdot 10^6$ host cycles correspond to $1.66 \cdot 10^6$ virtual cycles of a virtual clock ticking with $\frac{2}{3} f$.
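The two examples can be checked with a short calculation that follows equations 3.3 to 3.7 step by step. The sketch below is an illustration only: the function names are hypothetical, all values are given in units of 10^6 cycles, and the exact fractions 4/3, 1/6, 5/3 and 7/3 are assumed in place of the rounded values 1.33, 0.167, 1.66 and 2.33 used in the text.

```python
from fractions import Fraction as F
from math import floor


def pit_real_time(pit, t, a, S, E, P):
    """Map a virtual PIT request to the real time t_CS (equations 3.3 to 3.7)."""
    delta_t = t - a                # (3.3) cycles since the last activation
    pit_vm = pit + delta_t         # (3.4) request relative to the last activation
    supply = E - S                 # cycles supplied per period
    x = floor(pit_vm / supply)     # (3.5) number of full periods
    R = pit_vm / supply - x        # (3.6) remaining fraction of a slice
    return x * P + S + R * supply  # (3.7)


def virtual_time(cycles, S, E, P):
    """Invert equation 3.1: the t_ICS at which the ideal supply reaches `cycles`."""
    return cycles * P / (E - S)


# VM1: S=0, E=1, P=3, last activated at a=0, request of 4/3 issued at t=1/6
t_vm1 = pit_real_time(F(4, 3), F(1, 6), a=0, S=0, E=1, P=3)
# VM2: S=1, E=3, P=3, last activated at a=1, request of 5/3 issued at t=7/3
t_vm2 = pit_real_time(F(5, 3), F(7, 3), a=1, S=1, E=3, P=3)

# Virtual time at which each interrupt occurs (supplies of 1.5 and 3 virtual cycles)
t_ics_vm1 = virtual_time(F(3, 2), S=0, E=1, P=3)
t_ics_vm2 = virtual_time(F(3), S=1, E=3, P=3)
```

Here t_vm1 evaluates to 7/2 and t_vm2 to 5, matching the real times of 3.5 and 5 (times 10^6 host cycles) stated above, and both virtual times evaluate to 9/2.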
With this knowledge, it is possible to design the interrupt handling of the VMM. This design is depicted as a flow chart in figure 40. Directly after the timer interrupt has been triggered, the interrupt is acknowledged by the VMM. Then, the smallest PIT value is determined in order to assign the triggered PIT interrupt to a pending PIT request from the VMM or from one of the VMs. If the interrupt was triggered for the VMM, all virtual timer registers are decremented by the value of the expired PIT interrupt, the context of the active VM is saved and the scheduler is called. Otherwise, the triggered PIT interrupt is assigned to one of the available VMs. Depending on the auto reload flag, the virtual timer register of the corresponding VM is reloaded. Then, the PIT register is set to the minimum value of all available virtual PIT registers. Afterwards, the PIT interrupt is queued for the target VM, because the VMM first has to determine whether the VM has interrupts enabled or not. If the target VM has interrupts disabled, the active VM is resumed directly. If not, the interrupt is flagged in the corresponding virtual register of the target VM. Then, it is checked whether the target VM of the interrupt is equal to the active VM, because otherwise a preemption may be necessary if this feature is enabled.
[Flow chart: Timer Interrupt; Ack Interrupt; min(VMM_PIT, VM1_PIT, ..., VMn_PIT). Branch VMM_PIT: Decrement all Virtual Timer Registers; Save active VM Context; Call Hypervisor Scheduler. Branch VMx_PIT: Decrement all Virtual Timer Registers; Auto reload? (yes: VMx_PIT = ReloadValue); Timer = min(VMM_PIT, VM1_PIT, ..., VMn_PIT); Queue target VM Timer Interrupt; Target VM interrupts enabled? (no: Resume active VM; yes: Flag target VM Timer Interrupt); targetVM = activeVM? (yes: Resume target VM@TimerISR; no: VM preemption enabled? yes: Save active VM Context and Resume target VM@TimerISR; no: Resume active VM).]
Figure 40: Timer Virtualization Flow Chart
3.3.5 Scheduler
The scheduler is the main component of the VMM, as it multiplexes the CPU among the different VMs. Due to the design decision of a configurable hybrid VMM, the scheduler component implements a uniform interface, which is depicted in figure 41. This interface has to be implemented by every scheduler of the VMM. The method sched_init is responsible for the initialization of the scheduler component, for example the initialization of the necessary data structures. The next VM to be executed is determined using the method sched_getNextVMIndex, which implements the scheduling policy. The activation duration of the VM in processor cycles necessary to fulfill the scheduling task is returned by the method sched_getNextTimerEvent. To guarantee that the VMM will regain control of the CPU, the VMM uses this value to program the timer to generate a timer interrupt after this duration. Furthermore, the interface defines two methods useful for paravirtualization. The first method is called sched_yield and allows a VM to give back the control of the CPU to the VMM, which is then able to assign the CPU to another VM. This is not possible in the case of full virtualization, as the VMM is in general not able to notice whether a VM idles or not. The second method is called sched_setParam and allows a paravirtualized guest OS to pass important scheduling parameters, e.g. the next deadline, to the VMM. Figure 41 shows the three basic scheduling approaches currently implemented in the VMM. An in-depth discussion of scheduling VMs will be given in chapter 4, as this is a complex problem especially when applying full virtualization, which is the main focus of this thesis. Nevertheless, a comparison to the paravirtualized scheduling mechanisms will be given to demonstrate the advantages and disadvantages of both approaches [Bal09].
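A minimal sketch of how a fixed time slice scheduler could implement this interface is given below. The method names are those of figure 41; the Python rendering, the slice table passed to the constructor and the internal position counter are assumptions for illustration and are not taken from the VMM sources.

```python
from abc import ABC, abstractmethod


class Scheduler(ABC):
    """Uniform scheduler interface with the method names of figure 41."""

    @abstractmethod
    def sched_init(self): ...

    @abstractmethod
    def sched_getNextVMIndex(self): ...

    @abstractmethod
    def sched_getNextTimerEvent(self): ...

    def sched_yield(self):
        """Optional hook for paravirtualized guests; ignored by default."""

    def sched_setParam(self, vm_id, param, val):
        """Optional hook for paravirtualized guests; ignored by default."""


class FixedTimeSliceScheduler(Scheduler):
    """Cycles through a static table of (vm_index, slice_length_in_cycles)."""

    def __init__(self, slices):
        self.slices = slices
        self.sched_init()

    def sched_init(self):
        self.pos = -1  # no slice selected yet

    def sched_getNextVMIndex(self):
        self.pos = (self.pos + 1) % len(self.slices)
        return self.slices[self.pos][0]

    def sched_getNextTimerEvent(self):
        # Duration after which the VMM must regain control of the CPU.
        return self.slices[self.pos][1]


# Schedule of figures 38 and 39: VM1 for 1 * 10^6 cycles, VM2 for 2 * 10^6 cycles.
fts = FixedTimeSliceScheduler([(1, 1_000_000), (2, 2_000_000)])
```

The VMM would call sched_getNextVMIndex to select the VM and then program the timer with the value returned by sched_getNextTimerEvent before resuming that VM.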
3.3.6 Memory Virtualization
As stated in section 3.2.1, Bennett and Audsley addressed the problem of coexisting virtual memory access time restrictions for real-time and non real-time tasks with their modular page table approach depicted in figure 24. Upon a TLB miss, the TLB Miss Handler Dispatcher performs a look-up on the PCB of the active task to determine the assigned miss handler for this task. The obtained miss handler is called to find the appropriate TLB translation entry within the page table of the process. Because different miss handlers can be used, it is possible to use distinct page table architectures for different tasks.

Scheduler {interface}:
    void sched_init();
    unint2 sched_getNextVMIndex();
    unint4 sched_getNextTimerEvent();
    void sched_yield();
    void sched_setParam(int vm_id, int param, void* val);
implemented by: RoundRobinScheduler, FixedTimeSliceScheduler, globalEDF
Figure 41: Interface of the VMM Scheduler component [Bal09].
To enable deterministic memory access for hard real-time tasks, Bennett and Audsley propose to use a fixed length array indexed by the virtual page address to find the appropriate translation entry in time O(1). This approach is only appropriate for small logical address spaces, as the size of the real map table is proportional to the logical address space. As an example, a 32 bit microprocessor with a minimum page size of 1024 bytes is assumed. In this case, the fixed length array would need at least about 11.5 MB of memory. Thus, a solution consuming less memory would be desirable. Many approaches exist for general purpose OSs, like Bit Maps [Tan07], Segmentation [Tan07], Multi-level Paging [Tan07], Short-Circuit Segment Trees [SH02] and Paged Segmentation [Tan07]. A very interesting extension to short-circuit trees called intermediate-level skip multi size paging (ISMSP) has been proposed by Suzuki et al. in [SS97]. The memory consumption of this approach is lower than that of the other approaches, and the processing overhead is in the same class, namely O(1). Nevertheless, the usage of a fixed array would be faster, since the lookup could be realized in a single step, while the lookup when using ISMSP requires $n$ steps in the worst case, with $n$ being equal to the depth of the tree. Please note that $n$ is user defined and bounded.
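The memory requirement of the fixed length array quoted above can be reproduced with a short calculation. The entry width is not stated in the text; the following assumes that each entry stores only a 22 bit physical page number and that MB denotes 10^6 bytes.

```latex
\frac{2^{32}\ \text{B}}{2^{10}\ \text{B/page}} = 2^{22}\ \text{pages},
\qquad
2^{22} \cdot 22\ \text{bit} = 92\,274\,688\ \text{bit}
= 11\,534\,336\ \text{B} \approx 11.5\ \text{MB}
```

Any additional per-entry information, such as access rights, would increase this figure accordingly.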
The next important step after the translation entry has been found is the replacement of an existing TLB entry with the required entry. There are many approaches to handle this problem, such as FIFO [Tan07], LFU [Tan07], LRU [Tan07], or the clock algorithm [Tan07]. When performing a WCET analysis of hard real-time tasks, an upper bound has to be assumed for every memory access to guarantee that the task will finish before its deadline. In the general case, the number of entries of a mapping table, such as the real map table, exceeds the number of available TLB entries. Thus, a TLB miss has to be assumed for every single look-up, as the TLB does not cover 100% of the entries. As hard real-time tasks do not benefit from the completion of a memory access before this upper bound, they do not benefit from having more than one TLB entry. Additional TLB entries would increase the memory coverage and decrease the probability of a TLB miss, but could not increase the hit rate to 100%. In most instances, the memory requirements of a task are too high to make all addresses available in the TLB, assuming a fine granular memory management.
[Diagram: a TLB miss activates the TLB Miss Handler Dispatcher, which performs a VMCB lookup on the VMCB of the active VM and calls the Hard Real-Time, Soft Real-Time or Non Real-Time Miss Handler; the handler uses the Real Map Table of the active VM to fill one of the TLB entries 1 to n.]
Figure 42: Mixed Priority Paging with Dynamic TLB Partitioning
Since hard real-time tasks do not gain any advantage from more than one TLB entry, a software managed TLB design is used in this thesis to modify the Multiple Page Approach into an approach called Mixed Priority Paging, which is depicted in figure 42. The modification is realized by a static assignment of the first entry of the TLB to all available hard real-time tasks. The remaining space within the TLB is shared between soft and non real-time tasks. These tasks benefit from faster memory access times, as this results in a shorter response time which may be directly visible to the user.
The main problem addressed by this partitioning is the immediate TLB miss caused by the first memory access after a context switch, because the entries within the TLB do not belong to the active task anymore. This effect is known as the cold cache phenomenon. The goal is to prevent TLB misses after a context switch without slowing down the context switch itself, which would be the case when invalidating the whole TLB and prefetching TLB entries. Instead, an exclusive assignment of TLB entries to specific soft and non real-time tasks (locked entries) is introduced. This prevents TLB misses right after a context switch and keeps the context switch fast by using ASIDs as described in section 2.2.3. The number of locked TLB entries for a specific soft or non real-time task should be changeable dynamically, as the memory access behavior may be highly dynamic during the runtime of the task.
To realize Mixed Priority Paging with dynamic TLB partitioning for hard, soft and non real-time tasks, a software managed TLB design as introduced in section 2.2.3 can be used. For every miss handler, a replacement policy has to be implemented. This is very simple for hard real-time tasks: there is only one possible entry for replacement, which makes the replacement possible in O(1).
In the case of soft real-time tasks, a solution for the following aspects has to be provided:
TLB locked entry replacement policy
Overall TLB replacement policy
Partitioning policy
The replacement policy for the locked TLB entries is the most demanding problem. To prevent TLB misses after a context switch with the help of ASIDs, the page table entries with the highest access probability have to be kept in the TLB as locked entries. For this, it is necessary to know the number of accesses over a certain period of time, or at least a good approximation, to get an idea which pages have a high probability of being used again. The execution time before the scheduler preempts the active process is a good choice for the period of time in which the number of accesses is counted. Just before dispatching the next process, the top $n$ entries are placed in the locked area of size $n$, if they are not already located there. The placement is performed at the end of the execution phase, because the locked entries do not take effect before the process gets dispatched again. To prevent the locked area from being blocked by entries that have been accessed very frequently in the past but are not accessed anymore, the access counters of every TLB entry should be reset to 0 before the next process gets dispatched. This ensures that it is possible to adapt to changed memory access behavior.
The global TLB replacement policy is responsible for the entries that are not locked. These entries can be used by all tasks, even the tasks that have exclusively assigned TLB entries. For the area of locked entries, the most frequently used entries are chosen. If the most frequently used entries are known, it is also possible to find the least frequently used entry. Therefore, it is possible to apply least-frequently-used (LFU) replacement to the globally available entries of the TLB. This has the nice side effect that an entry that has been used quite frequently, but has experienced a short period in which it has not been accessed, will not be replaced.
The next step is to find out how to manage the number of locked entries for a specific soft or non real-time task. The best metric to achieve this is the TLB miss ratio $TLB_{MR}$, which represents the percentage of all memory accesses that lead to a TLB miss. Based on the TLB miss ratio, two thresholds $\gamma_{inc}$ and $\gamma_{dec}$ are introduced for each soft and non real-time task. They are checked right after the process is preempted, i.e. at the end of the execution phase. The number of locked entries is incremented by 1 if $TLB_{MR} > \gamma_{inc}$ and decremented if $TLB_{MR} < \gamma_{dec}$. Incrementing by only 1 prevents a task from grabbing all available TLB entries for its locked area without giving other tasks the possibility to increase their number of locked entries.
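The three policies can be condensed into a small sketch. It is illustrative only: the function names, the dictionary of per-page access counters, a decrement step of 1 and the upper bound max_locked are assumptions; the text above only fixes the threshold comparison, the top-n selection and LFU for the entries that are not locked.

```python
def adjust_locked_entries(locked, tlb_misses, accesses, gamma_inc, gamma_dec, max_locked):
    """Partitioning policy: new number of locked entries of a task after preemption."""
    miss_ratio = tlb_misses / accesses if accesses else 0.0
    if miss_ratio > gamma_inc and locked < max_locked:
        return locked + 1  # grow by one entry only
    if miss_ratio < gamma_dec and locked > 0:
        return locked - 1  # assumed decrement step of one entry
    return locked


def select_locked_pages(access_counts, n):
    """Locked entry policy: the n most frequently accessed pages of the last phase."""
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    return ranked[:n]


def lfu_victim(access_counts, locked_pages):
    """Global policy: least frequently used entry among those that are not locked."""
    candidates = [page for page in access_counts if page not in locked_pages]
    return min(candidates, key=access_counts.get)


# Access counters of one execution phase (page -> number of accesses)
counts = {"a": 5, "b": 9, "c": 1, "d": 3}
locked_pages = select_locked_pages(counts, 2)
victim = lfu_victim(counts, locked_pages)
```

After the selection, the counters would be reset to 0 so that pages which are no longer accessed lose their place in the locked area in the next phase.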
Another important question regarding the partitioning policy is which tasks should be able to lock TLB entries. A partitioning scheme in which only small areas (i.e. one or two entries) are locked for a high number of tasks should be prevented. This would lead to a fully partitioned TLB with only a very small number of available slots for each task. Therefore, it is proposed to allow the locking of TLB entries only for tasks which heavily depend on memory accesses, as they would benefit most from having locked entries. Furthermore, it is important to limit the total number of locked entries in order to prevent a heavy performance loss for tasks that are not allowed to lock TLB entries. There always has to be at least one globally available slot; otherwise, address translation would not be possible for tasks without locked entries [GK08].
3.3.7 I/O Virtualization
The IPCM is a very sensitive part of the hybrid VMM architecture, as it allows communication between different spatially isolated VMs. For this purpose, the IPCM provides a hypercall called ipcm_create_tunnel that creates a shared memory region accessible by both VMs, using only the memory virtualization provided by the underlying architecture. As already stated in section 2.2.4, the information flow control policy is very important, as the policy defines which VMs may communicate directly using the shared memory provided by the IPCM. The advantage of using an unprotected shared memory region for the communication of two VMs is that the VMs themselves implement the communication protocol, which reduces the complexity of the VMM. Nevertheless, this is problematic from the point of view of the spatial isolation provided by the VMM, as the VMM is not in control of the performed interactions. Thus, the VMM cannot prevent bad behavior leading to unintended interactions on this communication channel. To secure this channel against bad behavior, the VMM can prescribe a protocol to be used. An interesting approach is to use the I/O ring concept introduced by XEN [BDF+03]. The big advantage of the I/O ring concept is the ability to enqueue and dequeue multiple items at once, which reduces the hypercall overhead.
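The batching idea behind an I/O ring can be sketched as follows. This is a simplified single-producer, single-consumer ring written for illustration; the class IORing and its methods are hypothetical and reproduce neither the ring layout of XEN nor the interface of the IPCM.

```python
class IORing:
    """Fixed-size ring in shared memory; producer and consumer indices only grow."""

    def __init__(self, size):
        self.slots = [None] * size
        self.prod = 0  # total number of items ever enqueued
        self.cons = 0  # total number of items ever dequeued

    def enqueue_many(self, items):
        """Place as many items as fit; one notification then covers the whole batch."""
        free = len(self.slots) - (self.prod - self.cons)
        accepted = items[:free]
        for item in accepted:
            self.slots[self.prod % len(self.slots)] = item
            self.prod += 1
        return len(accepted)

    def dequeue_all(self):
        """Drain every pending item in one pass."""
        out = []
        while self.cons < self.prod:
            out.append(self.slots[self.cons % len(self.slots)])
            self.cons += 1
        return out
```

Because a whole batch is written before the other side is notified, the cost of one hypercall is shared by all items of the batch instead of being paid per item.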
Besides the capability of direct Inter-VM I/O the VMM needs
to support device virtualization. To realize this two concepts are
applied:
Dedicated I/O using memory mapped I/O (MMIO)
Shared I/O using Hypercalls
The concept of dedicated I/O using MMIO is suitable for full vir-
tualization. The memory area of the device is mapped into the
address space of the associated VM. Thus, the VM has full ac-
98 a virtual machine monitor for embedded real-time systems
cess to the device itself. The problem of this approach is that the
sharing of devices between multiple VMs is not possible at all.
Thus, hosting multiple VMs using I/O devices requires a dedi-
cated I/O device for each VM. The advantage of this approach is
that the VMM does not need to handle any trapped instruction
or hypercall, which reduces the virtualization overhead to zero.
Sharing an I/O device between multiple VMs is much more
complex. When MMIO devices are shared
between multiple fully virtualized VMs, this can be very tricky to
handle, because the virtualization needs to be handled at the
I/O operation level itself. The VMM needs to provide a memory
mapping for the MMIO device registers to each VM using the
device. This comes at the cost of extra TLB slot usage and the
problem of monitoring access to these memory mappings. When
providing the memory area as writeable the VMM has no chance
to monitor the state of the device. This can easily result in incon-
sistent states of the device when multiple VMs access the mem-
ory area without synchronization. The synchronization of the
MMIO area is not trivial since the write access has to be granted
to a single VM and revoked after the VM has finished its access
to the device and the device is ready for a new access. The problem
for the VMM is to monitor when the VM has finished its access,
because once the VMM has granted write access to the MMIO
area the VMM is not able to monitor the memory accessing in-
structions of the MMIO area violating the required resource con-
trol property presented in 2.2.2. Thus, the VMM is not able to
notice when it can revoke the access to the device from the VM.
To enable the VMM to monitor the MMIO area, the area needs to be
write protected. In this case every memory access to the MMIO
area traps to the VMM. The VMM is then able to emulate these
memory accesses using a virtual device data structure for each
machine. When the VMM detects that a request is completely
performed on the virtual device it can enqueue this request to
the real device. This requires the integration of virtual devices
into the VMM and makes the VMM much more complex. Be-
sides the complexity of the VMM the virtualization overhead at
the I/O operation level increases dramatically. Current research
focuses more on I/O virtualization at the level of
hardware support [BYMK. . . 06,UNRS05] and the device driver
level [EPF06] or even combinations of both [RS07]. [Bal09]
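The trap-and-emulate handling of a write-protected MMIO area described
above can be sketched as follows. The sketch rests on assumptions that
are not part of the presented design: the register offsets, the class
names VirtualDevice and Vmm, and the rule that a write to a command
register completes a request are all hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Tuple

# Hypothetical register offsets of an imaginary MMIO device.
REG_DATA = 0x0
REG_COMMAND = 0x4


class VirtualDevice:
    """Per-VM shadow copy of the device registers."""

    def __init__(self) -> None:
        self.regs: Dict[int, int] = {REG_DATA: 0, REG_COMMAND: 0}


class Vmm:
    def __init__(self, vm_ids: Iterable[int]) -> None:
        self.vdevs = {vm: VirtualDevice() for vm in vm_ids}
        # Completed requests waiting for the single real device.
        self.device_queue: Deque[Tuple[int, int, int]] = deque()

    def handle_mmio_write_trap(self, vm: int, offset: int, value: int) -> None:
        """Called when a write to the write-protected MMIO area traps."""
        vdev = self.vdevs[vm]
        vdev.regs[offset] = value  # emulate the access on the virtual device
        if offset == REG_COMMAND:
            # Assumption: writing the command register completes a request,
            # which is then enqueued for the real device.
            self.device_queue.append((vm, vdev.regs[REG_DATA], value))
```

Because every VM only modifies its own virtual device, interleaved
accesses of several VMs cannot leave the real device in an inconsistent
state; the price is one trap per register access.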
3.3.8 Summary
The presented design was the first design of a hybrid VMM and
was published in [BK09]. It supports the full virtualization of
unmodified guests, the execution of paravirtualized guests and
a combination of both where the full virtualization is realized
as a fall back mechanism when no hypercall is available for the
required action. Thus, the goals of a configurable ABI and an
extensible scheduler interface, defined in section 3.1, have been
achieved while fulfilling the constraints of providing full virtual-
ization and real-time support.
The configurability is provided by the intensive usage of preprocessor
statements, which allow for a fine-grained configuration
of all VMM components. The configurability also allows
the exchange of scheduling policies by providing a special sched-
uler interface. The need for spatial isolation, which is essential
for preventing unintentional interactions, is guaranteed by the
MILS-based architecture. The core functionality of the VMM is
intentionally kept very small to enable the verification of the
VMM. Additional untrusted functionality can be shifted into
user space.
The need for high level APIs and high performance is addressed
by providing a configurable hypercall interface for the underlying
ISA, which also includes the IRFM feature, allowing access
to most of the ISA registers by simple paravirtualized memory
accesses. To enable semi-automatic paravirtualization,
pre-virtualization is implemented as a preprocessing Python script,
allowing for a completely automatic paravirtualization if the
unlinked object files of the guest are available.
To fulfill real-time requirements, the design always considered
determinism by bounded execution times. The virtual memory
management of architected TLBs has been specially optimized in
this case. The mixed priority paging partitions the TLB to favor
soft and non real-time tasks instead of hard real-time tasks, as it
is in general not possible to guarantee a 100% coverage of TLB
entries to page table entries.
The introduced extensible scheduling interface is defined at a
quite abstract level to keep it generic. For an information flow
from the VM to the VMM, the method sched_setParam has been
introduced, which allows for the passing of arbitrary data by
memory pointers to the VMM. Thus, the scheduler of the VMM
and the scheduler of the VM need to be adapted for this
communication. Unfortunately, an implementation of this method is
not required when focusing on full virtualization. A general dis-
cussion about the schedulability of RTVMs is given in chapter
4.
To sum up, the main goals of providing a configurable ABI and
an extensible scheduler interface with regard to real-time and
full virtualization support have been achieved. The only goal left
to be addressed is the determination of the WCET of the virtual-
ized tasks. This will be addressed in the following evaluation.
3.4 Evaluation
The implementation and evaluation of the design proposed in
section 3.3 was the main part of the diploma thesis of Daniel
Baldin [Bal09]. The implementation was realized on a PowerPC
405 FX microprocessor. Therefore, at the beginning of this section,
the virtualizability of the Power ISA is shown in section
3.4.1. The following section 3.4.2 covers the determination of the
WCETs of the VMM. With the help of the WCETs of the VMM,
it is possible to determine the WCETs of the VMs.
3.4.1 PPC405 ISA Analysis
The basic requirement for supporting efficient full virtualization
is the compliance of the PowerPC 405 ISA with the Popek and
Goldberg theorem. As a classical RISC processor, the PowerPC
offers a set of registers which are only accessible in supervisor
mode through dedicated read and write instructions for these
registers. Figure 43 depicts the register set of the PowerPC 405 FX
and shows the registers accessible in user mode and supervisor
mode. Table 3 shows a short description of a subset of
the registers available only in supervisor mode by the dedicated
read and write instructions mtspr and mfspr (move to/from special
purpose register). When taking an in-depth look at the register
set, one can see that the user model contains only registers
that are read and written by innocuous instructions, which do
not violate the resource control property of a VMM defined
by Popek and Goldberg, as all registers managing resources are
accessible only through sensitive instructions available in
supervisor mode. Thus, it is possible to simplify theorem 2.6 for
the PowerPC 405 FX ISA. This is done by checking whether all
sensitive instructions accessing the registers of the supervisor
model trap in user mode. This can easily be shown, since all
sensitive instructions accessing registers of the supervisor model
are privileged and cause a trap when executed in user mode
according to the PowerPC 405 FX user's manual.
[Figure 43: register diagram with the panels "User Model" and "Supervisor Model", listing each register together with its SPR number.]
Figure 43: PowerPC 405 register set [Xil09].
With this knowledge, it was possible to implement emulation
routines for all sensitive instructions using the approaches
introduced in section 3.3 while preserving the Popek and Goldberg
criteria introduced in section 2.2.2 [BK09].
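The simplified check described in this section amounts to a subset
test: every sensitive instruction has to be privileged. The following
sketch illustrates this test. The function name and the instruction
sets are assumptions made for illustration; the mnemonics are taken
from the measurement tables of this chapter and do not represent a
complete classification of the PowerPC 405 ISA.

```python
from typing import Set


def non_trapping_sensitive(sensitive: Set[str], privileged: Set[str]) -> Set[str]:
    """Return the sensitive instructions that would NOT trap in user mode.

    An empty result means that the set of sensitive instructions is a
    subset of the privileged instructions, which is the condition checked
    for the supervisor model registers.
    """
    return sensitive - privileged


# Illustrative sets only; mtspr/mfspr are meant here for supervisor SPRs.
SENSITIVE = {"mtmsr", "mfmsr", "rfi", "rfci", "wrteei",
             "tlbwe", "tlbre", "mtspr", "mfspr"}
PRIVILEGED = set(SENSITIVE)
```

A hypothetical ISA with a sensitive instruction that executes silently
in user mode would yield a non-empty result and would therefore require
techniques beyond classical trap-and-emulate.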
3.4.2 Worst Case Execution Times
As defined in section 3.1, the knowledge of how the WCET is
affected by the virtualization is essential for RTVMs, both to
determine the upper bound of the virtualization overhead, which
guarantees the determinism needed for real-time applications, and
to determine the CPU requirements of the final virtual real-time
system. The design presented in section 3.3 considered the required
determinism through its offline configurability, which prevents
unnecessary runtime overhead, and through the proposed
virtualization techniques. The determination of the WCET of a VM is
divided into two steps. First, the virtualization complexity at
instruction level is determined. In general, an instruction is
taken from the ISA and can be a privileged or non-privileged
instruction. The set of privileged instructions is called I_p. The
WCET of the privileged instructions needs to be determined, because
the virtualization induces overhead by the emulation of
the VMM, while the WCET of the non-privileged instructions
can be determined without considering this overhead. Additionally,
interrupt events have to be considered as well and are treated
as atomic events with a given WCET, because the VMM
does not allow for nested interrupts.
\[ WCET : I_v \to \mathbb{N} \mid I_v = \{\, i \mid i \in I_p \lor i \in IRQ \,\} \tag{3.8} \]
Register   Register Long Name                Resource
---------  --------------------------------  ------------------------------------------
MSR        Machine State Register            Manages the processor execution modes.
CCR0-1     Cache Control Register            Manages the cache behavior.
EVPR       Exception Vector Prefix Register  Contains the physical address of the offset
                                             for the interrupt handlers.
SRR0-4     Save and Restore Register         Contains the MSR, PC and other important
                                             register values when an interrupt occurs.
                                             When returning from an interrupt, the values
                                             of the register can be restored by setting
                                             the associated SRR.
PID        Process Identification            Contains the Process ID used for selecting
                                             the active TLB entries of the architected
                                             TLB.
ZPR        Zone Protection Register          Contains policies for memory protection.
TBU/TBL    Time Base Upper/Lower             Contains the number of processor cycles
                                             since the register was last reset.
TCR        Timer Control Register            Controls the timer behavior of the process.
TSR        Timer Status Register             Contains information about the status of the
                                             fixed and programmable interval timer
                                             (FIT/PIT).
PIT        Programmable Interval Timer       Counts down the processor cycles to the next
                                             PIT interrupt.
Table 3: Subset of PPC405FX registers available in supervisor mode
The determination of the WCET of a whole VM requires a control
flow graph analysis of the VM to determine the path with
maximum execution time within this graph. One of the biggest
problems in determining this path is posed by loops depending
on input values, which lead to a state explosion when
testing all possible cases. To solve this problem, sophisticated
methods [PB00, FH04, WEE+08] and tools like aiT from AbsInt
have been developed to derive this path efficiently and as
accurately as possible while minimizing the overestimation of these
methods. Once this path, called p_WCET, is known, it can be used
to determine the WCET of the VM including the overhead
induced by the VMM. The path p_WCET with length n is then given
as a sequence of instructions of the underlying ISA and interrupts
occurring during the execution of p_WCET.
\[ p_{WCET} \in P = (ISA \cup IRQ)^n \tag{3.9} \]
To clarify this modelling, an example for a path p_WCET with
length n = 4 is given as
\[ p_{WCET} = (\text{'li r0,1'},\ \text{'pit'},\ \text{'ba 0x2400'},\ \text{'mfsrr0 r3'}) \tag{3.10} \]
The path p_WCET can consist of privileged instructions,
non-privileged instructions and interrupt events. For the
determination of the WCET of the path p_WCET, the functions
WCET_f : P → ℕ and WCET_p : P → ℕ are defined. WCET_f represents
the virtualized execution using full virtualization, while WCET_p
represents the virtualized execution using paravirtualization.
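\[ WCET_f(p) = \sum_{i \in p} WCET(i) + \sum_{i \in p \,\mid\, i \in I_v} \bigl( WCET_f(i) - WCET(i) \bigr) \tag{3.11} \]
and
\[ WCET_p(p) = \sum_{i \in p} WCET(i) + \sum_{i \in p \,\mid\, i \in I_v} \bigl( WCET_p(i) - WCET(i) \bigr) \tag{3.12} \]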
The function WCET_f(p) or WCET_p(p) first calculates the WCET
of p_WCET as if it were executed natively, i.e. not virtualized,
and then adds, for every virtualized instruction or interrupt, the
overhead induced by full or paravirtualization. This is done by
adding the WCET of the virtualization step and subtracting the
native execution time of the virtualized instruction or IRQ, as it
is not executed natively anymore. [Bal09]
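Equations (3.11) and (3.12) share the same structure and can be
expressed by one function that takes the table of virtualized WCETs
as a parameter. The following sketch illustrates this. The function
and variable names are assumptions; the cycle counts for mfsrr, mtsrr
and rfi are taken from tables 4 and 5, while the native cost of the
non-privileged instruction li is a hypothetical placeholder.

```python
from typing import Dict, Sequence


def wcet_virtualized(path: Sequence[str],
                     native: Dict[str, int],
                     virtualized: Dict[str, int]) -> int:
    """Evaluate (3.11) or (3.12), depending on the table passed in.

    native:      WCET(i) of every element of the path when run natively
    virtualized: WCET_f(i) or WCET_p(i) for the elements of I_v
    """
    base = sum(native[i] for i in path)
    overhead = sum(virtualized[i] - native[i] for i in path if i in virtualized)
    return base + overhead


# Native WCETs in cycles; the value for "li" is an assumption.
NATIVE = {"li": 57, "mfsrr": 57, "mtsrr": 57, "rfi": 57}
FULL = {"mfsrr": 2282, "mtsrr": 2218, "rfi": 2656}   # WCET_f per instruction
PARA = {"mfsrr": 1639, "mtsrr": 1468, "rfi": 2463}   # WCET_p per instruction

example_path = ["li", "mfsrr", "mtsrr", "rfi"]
wcet_full = wcet_virtualized(example_path, NATIVE, FULL)
wcet_para = wcet_virtualized(example_path, NATIVE, PARA)
```

Elements of the path that are not contained in the virtualized table
contribute only their native WCET, which corresponds to the first sum
of both equations.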
Measurements
To determine the WCETs of the emulation of privileged instruc-
tions and interrupts as accurately as possible, their execution
has been simulated cycle accurately using ModelSim [Mod09]
together with the VHDL design of a Virtex II Pro FPGA, encap-
sulating a PowerPC405FX as hard wired core. The advantage of
this approach is the cycle accuracy which is not given when us-
ing measurements based on the timer register or I/O Pins due
to their inherent delays. Using ModelSim allows for the monitor-
ing of the write-back pipeline stage of the five stage pipeline of
the PowerPC405FX at any time. Thus, it is possible to determine
the exact number of cycles passed until an instruction passed the
pipeline completely.
In general, the execution time for the virtualization depends on
the parameters of the virtualized instruction. Thus, to determine
the WCET, the parameters creating the longest execution paths
have been chosen. This was possible due to the internal knowl-
edge of the virtualization process and due to the short execu-
tion paths generated by the slim design of the VMM. Other-
wise, a pipeline analysis would have been necessary to deter-
mine the WCET for all possible parameters. The determination
of the WCET of the native execution was realized by disabling
the cache and placing the instruction at the beginning of the
cache line boundary to force a line fill, which is performed in the
PPC405FX to fetch the following instructions even with caching
disabled. In general, with caching enabled and a hot cache, the
PPC405FX is able to execute instructions in one up to five cycles
[IBM05]. Tables 4 and 5 show the results for the measured WCETs,
with caching disabled, of the privileged instructions (see 3.4.1) of
the PowerPC405FX ISA.
The cost of a line fill can be determined by subtracting 5 cycles
from the measured WCET of the native execution, because the
execution of a single instruction in an empty 5-stage pipeline
takes 5 cycles. Thus, the cost of a line fill is 52 cycles.
These 52 cycles are already included in the values of WCET_p in
tables 4 and 5.
                       rfi      rfci     wrteei 1  wrteei 0
WCET_native [cycles]   57       57       57        57
WCET_f [cycles]        2656     2645     2455      2044
WCET_p [cycles]        2463     2452     2054      1687
WCET_i [cycles]        -        -        -         -
Full/native            46.60    46.40    43.07     35.86
Para/native            43.21    43.01    36.03     29.60
IRFM/native            -        -        -         -

                       tlbwehi  tlbwelo  mtmsr     mftcr
WCET_native [cycles]   57       57       57        57
WCET_f [cycles]        3394     2832     2531      57
WCET_p [cycles]        2555     1819     2075      1654
WCET_i [cycles]        -        -        -         180
Full/native            59.54    49.68    44.40     38.75
Para/native            44.82    31.92    36.40     29.02
IRFM/native            -        -        -         3.16

                       mfevpr   mftsr    mtevpr    mttsr
WCET_native [cycles]   57       57       57        57
WCET_f [cycles]        2218     2248     2215      2323
WCET_p [cycles]        1568     1654     1447      1634
WCET_i [cycles]        180      180      180       -
Full/native            38.91    39.44    38.86     40.76
Para/native            27.51    29.02    25.39     28.67
IRFM/native            3.16     3.16     3.16      -

                       mttcr    mttbu    mttbl     mtpit
WCET_native [cycles]   57       57       57        57
WCET_f [cycles]        2266     2266     2419      2389
WCET_p [cycles]        1558     1547     1711      1714
WCET_i [cycles]        -        -        -         -
Full/native            39.75    39.75    42.43     42.07
Para/native            27.34    27.14    30.02     30.07
IRFM/native            -        -        -         -
Table 4: WCET_f and WCET_p measurement. WCET_i represents the
WCET using IRFM (see section 3.3.3) for register access
[Bal09].
                       mtsrr    mttbl    mtpit     mtsprg
WCET_native [cycles]   57       57       57        57
WCET_f [cycles]        2218     2419     2389      2176
WCET_p [cycles]        1468     1711     1714      1433
WCET_i [cycles]        180      -        -         180
Full/native            38.91    42.43    42.07     38.17
Para/native            25.75    30.02    30.07     25.14
IRFM/native            3.16     -        -         3.16

                       mtzpr    mtpid    mfmsr     tlbrehi
WCET_native [cycles]   57       57       57        57
WCET_f [cycles]        2221     2302     2056      2731
WCET_p [cycles]        1519     1655     1612      1879
WCET_i [cycles]        -        -        180       -
Full/native            38.96    40.39    36.07     47.91
Para/native            26.65    29.03    28.28     32.97
IRFM/native            -        -        3.16      -

                       tlbrelo  mfsrr    mfsprg    mfpid
WCET_native [cycles]   57       57       57        57
WCET_f [cycles]        2686     2282     2191      2243
WCET_p [cycles]        1733     1639     1568      1629
WCET_i [cycles]        -        180      180       180
Full/native            47.12    40.03    38.44     39.19
Para/native            30.40    28.75    27.51     28.58
IRFM/native            -        3.16     3.16      3.16
Table 5: WCET_f and WCET_p measurement. WCET_i represents the
WCET using IRFM (see section 3.3.3) for register access
[Bal09].
                   SC IRQ   PIT IRQ   FIT IRQ
WCET_f [cycles]    1206     1978      1140
WCET_p [cycles]    1506     2202      1386
Table 6: WCET_f and WCET_p measurement for interrupt latency overhead
induced by the VMM. WCET_i represents the WCET using
IRFM (see section 3.3.3) for register access [Bal09].
Compared to the native execution, the emulation of sensitive
instructions induces an enormous overhead, from about factor 36 up
to factor 60 in the case of full virtualization and from factor 25
up to factor 45 in the case of paravirtualization. The introduction
of the IRFM (see section 3.3.3) increased the performance of
the paravirtualization dramatically, resulting in a factor of only
about 3 compared to the native execution for the register accesses
it covers. The mean factor between fully virtualized and native
execution is about 42, while the mean factor between the
paravirtualized and the native execution is
about 31 with IRFM disabled and about 22 with IRFM enabled,
assuming a uniform distribution of the instruction occurrences.
Thus, the full virtualization needs on average about 1.35 times
as many cycles as the paravirtualization when the IRFM is not
used. When the IRFM is used, the full virtualization needs about
twice as many cycles as the paravirtualization.
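The factors listed in tables 4 and 5 and the ratios quoted above follow
from simple divisions. The following sketch reproduces a few of them;
the function name is an assumption, and the default native WCET of
57 cycles is the value measured with caching disabled.

```python
def overhead_factor(wcet_cycles: int, native_cycles: int = 57) -> float:
    """Ratio of a virtualized WCET to the native WCET, rounded to two digits."""
    return round(wcet_cycles / native_cycles, 2)


rfi_full = overhead_factor(2656)   # Full/native entry of rfi in table 4
rfi_para = overhead_factor(2463)   # Para/native entry of rfi in table 4
irfm = overhead_factor(180)        # IRFM/native entry of all IRFM accesses

# Ratios between the mean factors quoted in the text.
full_vs_para_without_irfm = round(42 / 31, 2)
full_vs_para_with_irfm = round(42 / 22, 2)
```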
Besides the emulation of native instructions, the VMM is also
responsible for virtualizing occurring interrupts as described in
sections 3.3.3 and 3.3.4. The occurrence of such an interrupt at
first activates the VMM, which decides how to proceed with the
interrupt, because the interrupt may have the VMM or the VM
as target. This creates additional latencies to the handling of in-
terrupts.
The measurements of these latencies are listed in table 6. It is
interesting to see that the paravirtualization using the IRFM
feature increased the interrupt latency, while full virtualization
and paravirtualization without IRFM are equal in latency. This
effect is caused by the mapping of certain registers to the memory
of the guest VM. When an interrupt occurs, some of the registers
are changed to represent the state before the occurrence of the
interrupt. This state has to be forwarded to the guest OS interrupt
handlers. Consequently, when the IRFM feature is enabled, the
state has to be written to the memory location of the IRFM area
of the guest OS to synchronize the machine state with the state
represented in the IRFM. This additional synchronization overhead
increases the interrupt latency, while the runtime access time to
registers mapped in the IRFM decreases dramatically, as shown in
tables 4 and 5. Thus, the IRFM feature is useful as long as the
number of occurring interrupts does not cancel out the advantage of
the faster IRFM register access within the guests. [Bal09]

               Cycles                   Cycles
WCET_p(p)      23563     WCET_f(p)      37794
Measurement    22492     Measurement    37394
Table 7: Application of functions WCET_f(p) (3.11) and WCET_p(p)
(3.12) compared to the real measured execution times.
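The trade-off between faster register access and higher interrupt
latency can be illustrated by a simple break-even calculation. The
sketch below is based on two assumptions: the WCET_p row of table 6 is
read as the latency with IRFM enabled and the WCET_f row as the latency
without IRFM, and the saving per register access is taken from the
mfsrr column of table 5. The function name is hypothetical.

```python
# Cycles saved per register access (mfsrr: WCET_p versus WCET_i, table 5).
SAVING_PER_ACCESS = 1639 - 180
# Additional latency per PIT interrupt with IRFM enabled (table 6).
PENALTY_PER_PIT_IRQ = 2202 - 1978


def irfm_net_gain(register_accesses: int, pit_interrupts: int) -> int:
    """Cycles gained (positive) or lost (negative) by enabling the IRFM."""
    return (register_accesses * SAVING_PER_ACCESS
            - pit_interrupts * PENALTY_PER_PIT_IRQ)
```

Under these assumptions, a single IRFM register access outweighs the
additional latency of up to six PIT interrupts.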
Case Study
In the preceding section, the induced worst case overhead of the
single emulation routines and the IRQ handling of the VMM
have been shown and mean values have been calculated to show
roughly what the mean worst case overhead could look like for
a virtualization of a guest OS on the PowerPC405FX. Nevertheless,
the real worst case overhead depends on the execution trace
generated by the guest OS and its tasks. To show the
application of the functions WCET_f(p) (3.11) and WCET_p(p) (3.12)
to determine the WCET for full and paravirtualization, a case
study of the RTOS ORCOS is performed. ORCOS has been de-
veloped at the University of Paderborn and is designed for the
application in deep-embedded systems. The big advantage of us-
ing ORCOS for this case study was the availability of the source
code and the knowledge of internal details being highly relevant
for the application of the proposed VMM design.
The first part of the case study shows the determination of the
WCET for the RTOS ORCOS when it has to perform process
dispatching after the occurrence of a timer interrupt in the case
of full and paravirtualization with IRFM. The execution trace of
this dispatching process is finite and unique and could thus be
formulated as:
\[ p = (\text{'pit'}, \ldots, \text{'mfsrr0 r27'}, \text{'mfsrr1 r28'}, \ldots, \text{'mfpid r31'}, \ldots, \text{'mttsr r0'}, \ldots, \text{'wrteei 0'}, \ldots, \text{'mttsr r0'}, \text{'mtpit r4'}, \ldots, \text{'mfmsr r2'}, \ldots, \text{'mtsrr0 r9'}, \text{'mtsrr1 r2'}, \text{'rfi'}) \tag{3.13} \]
The trace only lists the execution of sensitive instructions that
triggered the VMM for emulation. First, the execution
time of this trace has been measured for the case of full and
paravirtualization using ModelSim. Then, the execution trace
p has been used to calculate WCET_f(p) and WCET_p(p). Table
7 shows the results of the functions WCET_f(p) and WCET_p(p)
and the result of the measurement by simulating the trace using
ModelSim. In the case of paravirtualization, the approximation
of the WCET by the function WCET_p(p) is about 4.5% higher
than the value of the real execution. This is due to the
assumption of occurring line fill effects when using
paravirtualization (see 3.4.2). The approximation of the fully
virtualized WCET is about 1% higher than the value of the real
execution. Thus, the approximations by the functions WCET_f(p)
(3.11) and WCET_p(p) (3.12) are a good indication for the real
WCET.
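The deviations between the calculated and the measured values of
table 7 can be reproduced as follows. The function name is an
assumption. Note that the percentage depends on the chosen base: the
figure of about 4.5% results when the calculated value is used as base,
while using the measured value as base yields about 4.8%.

```python
def relative_deviation(estimate: int, measured: int, base: int) -> float:
    """Difference between estimate and measurement relative to a base value."""
    return (estimate - measured) / base


# Paravirtualization (table 7).
para_vs_estimate = relative_deviation(23563, 22492, base=23563)
para_vs_measured = relative_deviation(23563, 22492, base=22492)
# Full virtualization (table 7).
full_vs_measured = relative_deviation(37794, 37394, base=37394)
```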
The question of how to determine an approximation of the
virtualization WCET for a given instruction or program trace p has
been addressed in this section, but the question remains
what exactly causes this overhead in virtualization.
Therefore, the overhead induced by the different components of
the VMM design will be determined in detail in the next section.
[Bal09]
3.4.3 Virtualization Overhead
Within this section, the induced overhead of the different VMM
components is analyzed to identify the bottlenecks of the virtu-
alization process. It has to be noted that this heavily depends
on the hardware used. Thus, the result of this section does not
only focus on the improvement of the implementation, but also
on the improvement of the hardware support to boost virtualiza-
tion even on embedded hardware.
[Figure 44: timeline t starting with the trap of a privileged instruction, divided into the phases ts, ta, td, th and tr.]
Figure 44: Virtualization overhead

Component                    Cycles   Share
ts: Context saving              169     14%
ta: Analysis                     64      5%
td: Dispatch                    771     62%
    Instruction fetch           143     11%
    Table lookup / branch       410     33%
    Register moving             218     18%
te: Emulation                     4      0%
tr: Context restore             232     19%
Figure 45: Virtualization overhead of mtevpr in detail [Bal09].
To determine the cost of the different components of the VMM
design, it is helpful to reconsider figure 44. First, the VMM saves
the context upon an interrupt, taking time t_s. Then, the VMM
needs to identify the kind of interrupt and its source, taking time
t_a. Afterwards, the dispatcher is responsible for delegating the
identified interrupt to the associated component, which takes the
time t_d. Then, the handling is performed by this component,
taking the time t_h. Finally, after the handling is finished, the
control is transferred from the VMM to the target VM by restoring
the context of the target VM, taking the time t_r. Figure 45 shows
these values.
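The shares of the individual phases can be recomputed from the cycle
counts of figure 45. The following sketch assumes that the five phases
add up to the complete emulation of mtevpr; the dictionary keys follow
the phase names of figure 44.

```python
# Cycle counts of the phases for mtevpr as given in figure 45.
PHASE_CYCLES = {"ts": 169, "ta": 64, "td": 771, "th": 4, "tr": 232}
# Parts of the dispatch phase td.
DISPATCH_PARTS = {"instruction_fetch": 143, "table_lookup_branch": 410,
                  "register_moving": 218}

total_cycles = sum(PHASE_CYCLES.values())
# Share of each phase in percent, rounded to whole numbers.
shares = {name: round(100 * cycles / total_cycles)
          for name, cycles in PHASE_CYCLES.items()}
```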
The overheads t_s and t_r are constant and depend on the hardware
architecture, while the analysis overhead t_a depends on the source
interrupt. For the instruction mtevpr, this overhead is caused
by the analysis of the program IRQ raised upon execution of
mtevpr in user space. The analysis needs to decide
whether the program IRQ was raised in user mode or in supervisor
mode. As the program IRQ is ambiguous, the action
to be performed needs to be identified. In case of the mtevpr
instruction, this takes 5% of the whole emulation process. When
an emulation is required, the instruction that caused the interrupt
must be loaded from memory, because the Power Architecture
only stores the address of the instruction in a save and
restore register. This is denoted as the instruction fetch part of
the dispatching overhead t_d and requires a memory access of the
VMM to the address where the instruction is stored, contributing
11% to the total overhead. Afterwards, the instruction opcode and
its parameters have to be identified within this instruction word,
which is represented by the table lookup and branch part of t_d.
Due to the instruction encoding of the Power Architecture, this
task requires a two-level indirection analysis in the worst case
(for mtspr and mfspr) to identify the instruction. This task
contributes 33% to the total emulation overhead. When the
instruction has finally been identified, the parameter needs to be
identified and moved from its register to the parameter register
of the emulation method. This can simply be implemented as a
switch statement.
source_reg = extract_source_register(inst);
switch (source_reg) {
    /* copy the source GPR into the parameter register */
    case 0: regs[13] = regs[0]; break;
    case 1: regs[13] = regs[1]; break;
    case 2: regs[13] = regs[2]; break;
    ...
}
Although quite simple in principle, this switch statement induces
two indirect branches, which lead to a stall of the pipeline and
thus contribute a large part of 18% to the whole emulation of
the instruction. Thus, a total of 62% of the emulation overhead
for the instruction mtevpr is induced by the dispatching to the
associated emulation routine. The emulation routine of mtevpr
is very simple, as it only needs to move the parameter value
to the EVPR register of the VM, and contributes only 4 cycles to
the emulation overhead. Finally, the context saving and restoring
routines contribute 14% and 19% to the emulation overhead.
The most costly part of the emulation is thus the dispatching
process, as it includes several indirect branches which
lead to a stall of the pipeline.
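The two-level identification of mtspr and mfspr mentioned above can be
sketched as follows. This is an illustration of the instruction
encoding only, not the dispatcher of the presented VMM: the function
and table names are assumptions. The sketch uses the primary opcode 31
shared by both instructions, their extended opcodes 339 (mfspr) and
467 (mtspr), and the SPR field whose two 5-bit halves are stored in
swapped order.

```python
from typing import Tuple

PRIMARY_OPCODE = 31                       # first level: shared primary opcode
EXTENDED = {339: "mfspr", 467: "mtspr"}   # second level: extended opcode


def decode(word: int) -> Tuple[str, int, int]:
    """Decode a 32-bit instruction word into (mnemonic, GPR, SPR number)."""
    primary = (word >> 26) & 0x3F         # instruction bits 0-5
    if primary != PRIMARY_OPCODE:
        raise ValueError("primary opcode not handled in this sketch")
    extended = (word >> 1) & 0x3FF        # instruction bits 21-30
    name = EXTENDED[extended]
    gpr = (word >> 21) & 0x1F             # source or target GPR
    field = (word >> 11) & 0x3FF          # SPR field with swapped halves
    spr = ((field & 0x1F) << 5) | (field >> 5)
    return name, gpr, spr
```

For example, the word 0x7C7A02A6 decodes to mfspr with GPR 3 and SPR
0x01A, which is SRR0 according to figure 43 and corresponds to the
element 'mfsrr0 r3' of the example path (3.10).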
While the emulation of the instruction mtevpr does not require
a large amount of computation time to emulate the instruction
itself, this cannot be transferred to all instructions of the Power
ISA. A register requiring a much larger emulation overhead is
the machine state register (MSR) being accessed by mtmsr and
mfmsr. When accessing this register, some checks need to be per-
formed in the emulation routine. One example is the activation
and deactivation of interrupts. When interrupts are deactivated
for a specific VM, the interrupts are buffered by the VMM, as
the interrupts are only virtually disabled, because the sharing
of the Timer register requires a permanent activation of inter-
rupts to guarantee the resource control property of the VMM
(see 2.2.2). Upon reactivation of interrupts for a VM, the emula-
tion routine of mtmsr needs to check whether there are buffered
interrupts pending for the VM. When there are pending interrupts,
the handling of these interrupts needs to be invoked. These
interrupt handling mechanisms require about 50% of the whole
emulation routine for the MSR register. [Bal09]
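The buffering of interrupts for a VM with virtually disabled interrupts
can be sketched as follows. The class and method names are assumptions,
and the sketch simplifies the behavior by delivering all buffered
interrupts at once; it only illustrates the check that the emulation of
mtmsr has to perform. MSR_EE denotes the external interrupt enable bit
of the MSR.

```python
from collections import deque
from typing import Deque, List

MSR_EE = 0x8000  # external interrupt enable bit of the MSR


class VirtualCpu:
    """Virtual processor state of one VM (illustrative only)."""

    def __init__(self) -> None:
        self.msr = 0
        self.pending: Deque[str] = deque()   # interrupts buffered by the VMM
        self.delivered: List[str] = []       # interrupts forwarded to the guest

    def raise_interrupt(self, irq: str) -> None:
        if self.msr & MSR_EE:
            self.delivered.append(irq)
        else:
            self.pending.append(irq)  # virtually disabled: buffer only

    def emulate_mtmsr(self, value: int) -> None:
        """Emulate a guest write to the MSR."""
        self.msr = value
        if value & MSR_EE:
            # Interrupts were reactivated: handle buffered interrupts first.
            while self.pending:
                self.delivered.append(self.pending.popleft())
```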
3.4.4 Footprint
In order to be deployed on small scale nodes, the memory over-
head introduced by the virtual machine monitor needs to be as
small as possible. Since the VMM is configurable completely, the
amount of binary and memory footprint occupied by the virtual
machine monitor strongly depends on the features used by the
target system. Thus, it was necessary to build configurations that
allowed for the extraction of the binary and memory footprint
of the configurable features. The results are presented in table 8.
The values have been derived by compiling the different
configurations using the GNU C Compiler 4.5 with optimization level
2. Remember that for additional VMs, the memory footprint of
the IPCM, TLB and the IRFM features need to be multiplied by
the number of available VMs. Furthermore, 300 bytes of the core
memory footprint are allocated for the VM context. Thus, this
number also needs to be multiplied by the number of available
VMs to get the total memory footprint.
Feature               Binary              Memory
                      Footprint [bytes]   Footprint [bytes]
IPCM                  464                 2
TLB                   960                 648
Previrtualization     192                 0
IRFM Fallback         620                 0
IRFM                  552                 92
Paravirtualization    648                 160
Full virtualization   472                 2172
Core                  4072                760
Total                 7980                3834
Table 8: Binary and memory footprint of the VMM with one VM
configured.

In total, the VMM needs approximately 8 KB of memory for its
code and approximately 4 KB of memory for its data structures.
The data structures are statically allocated and thus, there is no
dynamic memory allocation during runtime for the VMM. The
core already includes the emulation routines for full virtualiza-
tion, as full virtualization is always used as a fallback feature.
When enabling the full virtualization feature, mainly the mem-
ory footprint increases due to the dispatching tables used to
speedup the full virtualization process. Nevertheless, full virtu-
alization already works when only using the core configuration.
[BK09]
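The scaling rule for additional VMs stated in this section can be
written down as a small calculation based on the memory column of
table 8. The sketch assumes that only the IPCM, TLB and IRFM features
and the 300 bytes of VM context within the core scale with the number
of VMs, while all other entries are shared; the names used are
hypothetical.

```python
# Memory footprint in bytes from table 8.
PER_VM_FEATURES = {"IPCM": 2, "TLB": 648, "IRFM": 92}
SHARED_FEATURES = {"Previrtualization": 0, "IRFM Fallback": 0,
                   "Paravirtualization": 160, "Full virtualization": 2172}
CORE_TOTAL = 760   # core footprint with one VM configured
VM_CONTEXT = 300   # part of the core footprint allocated per VM


def memory_footprint(num_vms: int) -> int:
    """Total memory footprint of the VMM in bytes for num_vms VMs."""
    per_vm = VM_CONTEXT + sum(PER_VM_FEATURES.values())
    shared = (CORE_TOTAL - VM_CONTEXT) + sum(SHARED_FEATURES.values())
    return shared + num_vms * per_vm
```

For one VM the function reproduces the total of table 8; every further
VM adds 1042 bytes under the stated assumption.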
3.4.5 Performance
The performance overhead introduced by the virtualization software
has been measured for a set of application scenarios. The
real-time operating system ORCOS (University of Paderborn,
2009) running on a PowerPC405 processor has been used to measure
the overhead for two application scenarios. The results are
shown in table 9.
The first scenario used a simple periodic real-time task for the
only purpose of counting a variable to a specific value. The exe-
cution time was measured for an interval of two hundred repeti-
tions to get a reasonable measurement. The overhead produced
here was about 60% compared to the native execution time. The
high overhead in this scenario can be easily explained by the fact
that the amount of time spent inside the operating system com-
pared to the amount of time spent inside the real-time task was
really high. Thus, the relative amount of privileged instructions
used by the operating system which needed to be emulated was
high, contributing to this kind of overhead.
A more realistic scenario was given by another real-time task,
which had to calculate a Fast Fourier Transformation for a set
of input values for a fixed number of repetitions. Since the time
spent on the calculation was much higher than in the former
example, far fewer emulation routines were called relative to the
total execution time, and the overhead dropped to less than one
percent (0.45%) when the application was executed paravirtualized.
Running the unmodified application in full virtualization mode is
only slightly slower: the overhead increases to 0.56% in that case.
For embedded control systems, this makes full virtualization very
attractive. Furthermore, this observation gives reason to believe
that the execution time of applications that cannot be paravirtualized
completely, for example because a part of the application has been
linked against static libraries, will not suffer much from running
inside the VMM introduced in this thesis.
Applications using only the user ISA as given by the third ex-
ample application SimpleFFT, which calculated a Fast Fourier
Transformation without using the support of an operating sys-
tem, can be executed with native performance, because no in-
struction needs to be emulated by the virtual machine monitor.
[BK09]
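The relative overheads quoted in this section can be recomputed from the execution times in table 9. The following is only a small sketch: the dictionary layout and the function names are assumptions made for illustration, and the recomputed values can differ slightly from the tabulated percentages because the published times are rounded.

```python
# Execution times in milliseconds, copied from table 9.
TIMES_MS = {
    "ORCOS + CTask":   {"native": 10.73,   "full": 26.51,   "para": 17.16},
    "ORCOS + FFTTask": {"native": 5094.97, "full": 5123.38, "para": 5117.94},
    "FFT":             {"native": 294.56,  "full": 294.56,  "para": 294.56},
}


def overhead_percent(virtualized_ms, native_ms):
    """Relative overhead of a virtualized run over the native run, in percent."""
    return (virtualized_ms / native_ms - 1.0) * 100.0


def overhead_table(times=TIMES_MS):
    """Overhead of full and paravirtualization for every scenario."""
    return {
        name: {mode: overhead_percent(t[mode], t["native"]) for mode in ("full", "para")}
        for name, t in times.items()
    }


if __name__ == "__main__":
    for scenario, result in overhead_table().items():
        print(f"{scenario}: full {result['full']:.2f}%, para {result['para']:.2f}%")
```

The ORCOS + CTask scenario is dominated by emulated privileged instructions, while the FFT-based scenarios spend most of their time in user-level code, which is why their overhead is close to zero.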
3.5 summary
The overall goal of this thesis is the integration of software components
into one large integrated system. The given real-time systems
may originally have been executed on hardware that differs from
the hardware of the integrated system. To make these real-time
systems executable on the integrated system, every real-time
system is encapsulated as an RTVM. These RTVMs are then
executed on the VMM introduced within this chapter, which
shall ensure temporal and spatial isolation to prevent unintentional
interactions.

Execution mode                     ORCOS + CTask   ORCOS + FFTTask      FFT
Native [ms]                                10.73           5094.97   294.56
Full virtualization [ms]                   26.51           5123.38   294.56
Paravirtualization [ms]                    17.16           5117.94   294.56
Full virtualization / Native [%]          246.95            100.56   100
Paravirtualization / Native [%]           159.92            100.45   100
Full / Para [%]                           148.65            100.11   100
Table 9: Performance of three different virtualization scenarios.

There already exist a few commercial virtualization platforms
or VMMs for a range of embedded processors,
nearly all of them being proprietary systems. All of the available
products use only paravirtualization in order to provide reasonable
performance, and they support real-time applications only through
dedicated resources. Naturally, this limits the applicability
of virtualization to a subset of all possible scenarios, since
paravirtualization interfaces are in general not standardized. In
particular, applications whose source code is not available cannot
be paravirtualized and therefore cannot be virtualized with the
currently available products, since this would require a binary
analysis of the whole application, which is most often not
completely possible.
Thus, a VMM with a configurable hybrid architecture was designed
and implemented for the PowerPC405 processor. It allows for the
virtualization of unmodified applications as well as paravirtualized
applications, or even a combination of both. Put more formally,
the ABI is kept configurable. The paravirtualization effort is thus
decreased, as only the required hypercalls need to be implemented
in the guest OS. The support for paravirtualization was motivated
by the integration of open source GPOSs such as Linux, which
provide high-level API support and already offer paravirtualization
interfaces.
Support for real-time applications was the next major goal of the
design. The VMM allows for the integration of any kind of scheduling
through an extensible scheduler interface while remaining
completely deterministic. Furthermore, the configurability allows
the system to be optimized explicitly for the intended field of use.
This affirms that research in this direction is worthwhile, with
hybrid virtualization being a relevant topic in industrial embedded
systems. Finally, it is desirable to know in advance how the
WCETs of the executed guests are affected, as these WCETs are
necessary to determine the CPU requirements of the virtual real-time
system. Therefore, the WCETs of all available privileged instructions,
IRQ handlers and hypercalls have been determined, so that these
values can be included in the final WCET of a real-time task
executed in an RTVM.
The support for real-time applications requires the temporal iso-
lation of the executed RTVMs, which has not been addressed
up to now, but the interface to provide this has been introduced
by the realized extensible scheduler interface. The next chapter
thus covers the problem of temporal isolation in a virtualized
environment with hierarchical scheduling.
4
SCHEDULING OF FULL VIRTUALIZED HARD
REAL-TIME SYSTEMS
Contents
4.1 Problem Statement
4.2 Related Work
4.3 Model
4.4 Transformation of real-time systems into real-time virtual machines
4.5 Partitioning Policy
4.6 Evaluation
The goal of a VMM is to give each VM the illusion of having
the resources of a complete system at its disposal. Nevertheless,
the resources assigned to a VM are in reality only subsets of
a physical machine. When supporting multiple VMs on a
virtualization platform, the VMM needs to implement a scheduling
algorithm. According to this algorithm, the VMM switches
between the VMs at the root level of the hierarchy. From the
VMM's point of view, the executed VMs are just tasks. The VMs
themselves host different OSs scheduling their own set of tasks at
the local level (see figure 46). In case of full virtualization, these
VMs appear to be black boxes, while in case of paravirtualization,
the VMs can communicate with the VMM scheduler. Thus,
the scheduling question is in general a two-level hierarchical
scheduling problem to be solved under the given communication
constraint of full or paravirtualization. Answering this question
means finding a suitable virtualization host and a valid root-level
schedule that does not violate the real-time constraints of
the given RTVMs.
Figure 46: Hierarchy of involved schedulers
4.1 problem statement
The problem to be addressed in this thesis has already been de-
scribed at the beginning of chapter 3. The overall goal is the
integration of software components into a big integrated system
as depicted in figure 47. The previous chapter laid the
infrastructural foundation for this chapter by providing a highly
deterministic VMM with an extensible scheduler interface and
the possibility to determine the WCETs of the virtualized guests,
which are required by the root-level scheduler.
When reconsidering figure 47 the open questions are the deriva-
tion of the CPU requirements and the derivation of the root-level
schedule.
All existing methods presented in the following section 4.2,
except for the open system environment, lack the possibility of
deriving the root-level schedule from a given set of real-time
systems. At first glance, the open system environment could be
the key to the stated problems, but it requires paravirtualization,
which violates the constraint of full virtualization. The compositional
real-time scheduling framework comes quite close, but it
still requires the definition of static resource partitions at
guest level and valid schedules on these partitions to finally
derive the root-level schedule. Thus, it is not possible to derive the
root-level schedule from only the given real-time systems.
[Figure 47: Problem of integrating given real-time systems into a
virtual system hosting these real-time systems as RTVMs. The figure
contrasts the given real-time systems RS1 to RS3 and a GPOS, each with
a local scheduler, with the integrated system in which a VMM with a
root scheduler hosts RTVM1 to RTVM3. Constraints: full virtualization
and guarantee of local real-time constraints. To be derived: CPU
requirements and root-level schedule.]

The approach presented in this chapter solves these problems
by deriving the CPU requirements necessary to prevent an overload
situation on the host system. This is based on a normalization
and scaling of the task sets to a faster processor, which considers
the specifics of local scheduling policies such as EDF or RM.
Once it is ensured that there is no overload situation, the root-level
schedule is derived from only the given real-time systems
while fulfilling the constraint of full virtualization at runtime.
This is realized on specifically derived resource partitions which
ensure the real-time execution of all real-time tasks in the virtual
real-time system. But first, an in-depth look is taken at the
aforementioned related work, which covers the derivation
of the root-level scheduler under the constraint of full virtualization.
4.2 related work
The problem of hierarchical scheduling has already been addressed
in academia and industry. Section 4.2.1 begins with
the related work originating from academia. Subsequently, the
related work developed in the industrial field is presented in
section 4.2.2. Finally, a classification of the available
related work is given in section 4.2.3.
4.2.1 Academia
The related academic work is presented in this section, beginning
with the general concept of partitioning resources at root level.
The introduced models for partitioning resources do not allow
the derivation of a specific partitioning scheme which guarantees
the schedulability of a real-time task set. Instead, the developer
needs to define such partitioning schemes manually and check
them for schedulability. For this purpose, some of the approaches
provide a schedulability test for a given partition. Thus, when
deriving a root-level schedule based on these partitioning
approaches, a search has to be applied, which can be very complex,
as will be shown later in this chapter. Finally, a
paravirtualization-based approach called open system environment
is presented, which is able to derive the root-level schedule for
the RTVMs at runtime. Since it violates the full virtualization
constraint, however, it is not a solution for the problem
stated in sections 3.1 and 4.1.
Proportional Share Scheduling
The general idea of proportional share scheduling [Tij80] is to
provide a predefined amount of the processor capacity to each
available guest. Let $A(t)$ be the set of all active clients at time $t$
and let $w_i$ be the computational weight assigned to client $i$. The
share $f_i(t)$ of client $i$ at time $t$ is then defined as
\[ f_i(t) = \frac{w_i}{\sum_{j \in A(t)} w_j} \tag{4.1} \]
Assuming a perfectly fair system, the service time $S_i(t_0, t_1)$
assigned to a client $i$ during the time interval $[t_0, t_1]$ is given as
\[ S_i(t_0, t_1) = \int_{t_0}^{t_1} f_i(t)\,dt \tag{4.2} \]
Equation 4.2 represents a system in which it is possible to assign
infinitesimally small intervals of time to each client. The smallest
assignable time interval is called time quantum $q$. Unfortunately,
the assumption of $q$ being infinitesimally small is not practical for
real computing systems due to the overhead induced by context
switching for very small values of $q$. Thus, one problem of
proportional share scheduling is how to choose the granularity
defined by the time quantum $q$, since $q$ has a direct impact on
the overhead induced by the scheduling. The other important
problem is guaranteeing the service time $S_i(t_0, t_1)$ for specific
intervals $[t_0, t_1]$. Due to the discretization using the time
quantum $q$ as the smallest possible time interval, the real service
time $s_i(t_0, t_1)$ may deviate from the idealized service time
$S_i(t_0, t_1)$ based on infinitesimal time slicing.

[Figure 48: VM supply function C(t) vs. real time elapsed for VM1 and
VM2, compared with the idealized supply with infinite time slicing.]
The approximation for a time quantum $q = 1$ is depicted in
figure 48. The dashed line represents the idealized supply function
with infinitesimal time slicing, while the green and the blue lines
show the approximation of the supply function for two virtual
machines sharing the processor equally with a time slice length
of one time unit ($q = 1$). The bars below the function indicate
the activation of the virtual machines.
The deviation of the actually assigned service time $s_i(t_0^i, t)$ from
the ideally assigned service time $S_i(t_0^i, t)$ is called lag:
\[ \mathrm{lag}_i(t) = S_i(t_0^i, t) - s_i(t_0^i, t). \tag{4.3} \]
Here $t_0^i$ denotes the point in time at which client $i$ became active.
The lag needs to be bounded for real-time systems to guarantee
a certain minimal amount of processing capacity during a
specific time interval $[t_0, t_1]$. Thus, the system designer can
determine the minimal guaranteed service time for a given real-time
task $\tau_i$ executed in an interval $[t_0, t_1]$.
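The effect of the time quantum on the lag can be made concrete with a small simulation. This is only a sketch under stated assumptions: all clients are active from time 0, and each quantum is given to the client with the largest current lag. That policy is a hypothetical stand-in and not one of the algorithms cited in this section; the function name is likewise illustrative.

```python
from fractions import Fraction


def max_lag(weights, quantum, num_quanta):
    """Largest lag_i(t) = S_i(0, t) - s_i(0, t) observed at quantum boundaries.

    With all clients active from time 0, the ideal service of client i is
    S_i(0, t) = f_i * t with the constant share f_i = w_i / sum(w).
    """
    total = sum(weights)
    shares = [Fraction(w, total) for w in weights]
    real = [Fraction(0) for _ in weights]   # s_i: service actually received
    t = Fraction(0)
    worst = Fraction(0)
    for _ in range(num_quanta):
        lags = [f * t - s for f, s in zip(shares, real)]
        chosen = lags.index(max(lags))      # assumed policy: largest lag first
        real[chosen] += quantum             # the chosen client runs for one quantum
        t += quantum
        worst = max(worst, max(f * t - s for f, s in zip(shares, real)))
    return worst


if __name__ == "__main__":
    for q in (1, 2, 4):
        print(f"q = {q}: maximal lag = {max_lag([1, 1], q, 12)}")
```

For two equally weighted clients, the maximal lag under this policy grows linearly with the quantum, which illustrates the trade-off between lag and switching overhead discussed here.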
The problems mainly addressed in the literature so far are the
quality and the bound of the lag. Different proportional share
algorithms addressing these aspects evolved in the 1980s
and 1990s. Weighted Fair Queuing (WFQ) [DK89], Packet Fair Queuing
(PFQ) [DK89], Lottery Scheduling [WW94], Stride Scheduling
[Wal95] and Earliest Eligible Deadline First [SAWJ+96, JSMA98]
are popular examples of such algorithms.
An open question all proportional share algorithms have in common
is how to choose the time quantum $q$. A small time quantum
$q$ leads to a massive switching overhead, while a large time
quantum $q$ leads to a larger lag but less switching overhead.
Resource Partition Model
Mok, Feng and Chen introduced in [MFC01,FM02] the concept
of real-time virtual resources that are shared among real-time task
groups. They introduced a formalism for the schedulability anal-
ysis of a task group executed on a single time slot or multiple time
slot periodic partition.
First they introduce the static resource partition model. In a nut-
shell, a (temporal) static partition is simply a collection of time
intervals during which the physical resource is made available
for a set of tasks which is scheduled on this partition. Two types
of periodic resource partitions are introduced:
Single Time Slot Periodic Partitions (STSPP). See figure 49a.
Multiple Time Slot Periodic Partitions (MTSPP). See figure 49b.
In case of an STSPP, only one time interval exists ($N = 1$), while
in case of an MTSPP, more than one time interval is specified
($N > 1$).
Definition 4.1. A resource partition $\Pi$ is a tuple $(\Gamma, P)$, where
$\Gamma$ is an array of $N$ time pairs $((S_1, E_1), \ldots, (S_N, E_N))$ that
satisfies $0 \le S_1 < E_1 < \ldots < S_N < E_N$ for some $N \ge 1$, and $P$
is the partition period. The physical resource is available to the set
of tasks executed on this partition only during intervals
$(S_i + j \cdot P, E_i + j \cdot P)$, $1 \le i \le N$, $j \ge 0$ [MFC01].

[Figure 49: Resource Partition Model.
(a) STSPP examples: Π1 = ({(0,4)}, 8), Π2 = ({(4,8)}, 8), Π3 = ({(0,1)}, 3).
(b) MTSPP examples: Π1 = ({(1,2), (3,4), (5,8)}, 8),
Π2 = ({(0,1), (2,3), (4,5)}, 8), Π3 = ({(2,4), (5,6)}, 6).]
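Definition 4.1 can be illustrated with a few lines of code. This is a minimal sketch, assuming a discrete time grid in which tick t stands for the interval [t, t+1); the function names are hypothetical, and the example partitions are the MTSPP examples of figure 49.

```python
def available(slots, period, t):
    """True if the partition (slots, period) owns the resource during tick t."""
    phase = t % period
    return any(start <= phase < end for start, end in slots)


def supply(slots, period, t0, t1):
    """Number of ticks in [t0, t1) during which the partition owns the resource."""
    return sum(available(slots, period, t) for t in range(t0, t1))


def overlap_free(partition_a, partition_b, horizon):
    """True if the two partitions never own the resource in the same tick."""
    return not any(
        available(*partition_a, t) and available(*partition_b, t)
        for t in range(horizon)
    )


# MTSPP examples from figure 49b, written as (time pairs, period).
PI_1 = ([(1, 2), (3, 4), (5, 8)], 8)
PI_2 = ([(0, 1), (2, 3), (4, 5)], 8)
PI_3 = ([(2, 4), (5, 6)], 6)

if __name__ == "__main__":
    print(overlap_free(PI_1, PI_2, 8))    # both share the period 8
    print(overlap_free(PI_1, PI_3, 24))   # 24 is the least common multiple of 8 and 6
    print(supply(*PI_1, 0, 16))
```

Checking one hyperperiod (the least common multiple of the two periods) is sufficient for the overlap test, since both availability patterns repeat after that.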
The static resource partition model is in general a more formal
definition of the timeline scheduling approach presented in
section 2.1.2. The schedulability of a task set executed on such a
resource partition can no longer be analyzed by comparing the
utilization of the task set to the traditional utilization bounds
introduced by Liu and Layland in [LL73]. To solve this problem, Mok
and Feng introduced the concept of critical instances for resource
partitions and proposed schedulability tests for fixed and dynamic
priority scheduling of the task sets executed on a given
static resource partition. The static resource partition model
of Mok and Feng thus provides a generalized formalization
and schedulability analysis of timeline scheduling.
Mok and Feng extended this static resource partition model to
the bounded-delay resource partition model to provide more
flexibility. They argued that static resource partitions are an
important technique for designing high criticality applications, since
the static model is very amenable to timing correctness certification,
but that in other cases it is desirable to give the root scheduler
more flexibility in granularity. The partition schedule in case
of the static resource partition model is defined directly by the
time pairs $\Gamma$ and the partition period $P$.
Definition 4.2. A bounded-delay resource partition $\Pi$ is a tuple
$(\alpha, \Delta)$, where $\alpha$ is the availability factor of the partition
and $\Delta$ is the partition delay [MFC01].
This definition actually defines a set of partitions, because there
are many different partitions in the static partition model that satisfy
this requirement. The bounded-delay resource partition model
allows the creation of a root-level schedule based on the definition
of an availability factor $\alpha_k$ and a partition delay $\Delta_k$ for
each partition scheduled by the root-level scheduler. The parameter
$\alpha_k$ ensures that $\alpha_k \cdot L$ time units of the resource in any
interval of length $L + \Delta_k$ are assigned to the partition with the
delay bound $\Delta_k$. Starting from any point in time, $\Delta_k$ specifies
the maximum period of time the task group
may have to wait before receiving its fraction of the processor.
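The relation between a static partition and its bounded-delay parameters can be sketched numerically. The code below is an illustration under explicit assumptions: time is discrete (tick t stands for [t, t+1)), windows start and end on tick boundaries, and the delay is taken as the smallest value for which every window of length t supplies at least α·(t − Δ) ticks. The function names are hypothetical, and the formal treatment in [MFC01] is over continuous time.

```python
from fractions import Fraction


def min_supply(slots, period, length):
    """Minimum number of supplied ticks over all windows of the given length."""
    def available(t):
        phase = t % period
        return any(start <= phase < end for start, end in slots)

    return min(
        sum(available(t) for t in range(start, start + length))
        for start in range(period)          # the pattern repeats every period
    )


def bounded_delay_parameters(slots, period):
    """Availability factor alpha and smallest delay bound of a static partition."""
    alpha = Fraction(sum(end - start for start, end in slots), period)
    # t - Z(t) / alpha is periodic in t, so one period of window lengths suffices.
    delta = max(
        Fraction(t) - min_supply(slots, period, t) / alpha
        for t in range(period + 1)
    )
    return alpha, delta


if __name__ == "__main__":
    print(bounded_delay_parameters([(0, 4)], 8))
    print(bounded_delay_parameters([(0, 1), (2, 3), (4, 5)], 8))
```

In this discrete setting, spreading the same kind of supply over several short slots reduces the delay bound, at the price of more switches at root level.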
Nevertheless, the approach requires a valid schedule of the task
set on a given static resource partition in order to transform it
into a feasible schedule for the bounded-delay resource partition
model. It is important to note that the resulting feasible schedule
may not preserve the original scheduling policy.
When this transformation is feasible for all partitions, a root-level
schedule can be derived based on the given resource requirements
$(\alpha_k, \Delta_k)$ for each partition $k$, where $\Delta_k$ must not exceed
the maximal possible delay for the associated partition $k$.
The work of Mok and Feng provides a solid formal foundation
for modeling static resource partitions and testing them for
schedulability of a given task set. Modeling a static resource
partition requires the system designer to define the parameter
$\Gamma$, an array of $N$ time pairs in which the partition is active,
and the period $P$ of the resource partition. The partition can
then be tested for schedulability using fixed or dynamic priority
scheduling algorithms. Mok and Feng extended their work
by the bounded-delay resource partition model, which provides more
flexibility in creating a root-level schedule through the specification
of the resource availability factor $\alpha_k$ and the partition delay
$\Delta_k$. The static resource partition model is applicable to
virtualized environments based on full virtualization, as it is possible
to combine resource partitions in such a way that no two partitions
allocate a resource at the same time while creating feasible
real-time schedules on the partitions. The bounded-delay
resource partition model is only applicable to paravirtualized
environments, because the derived schedule on a specific resource
partition may not preserve the original scheduling policy. Thus,
the local scheduler must be modified to implement the derived
schedule.
Compositional Real-Time Scheduling Framework
Shin and Lee extended the resource partition model in [SL03, SL04,
SL08] by schedulability tests for RM and EDF applied on
a given periodic resource $\Gamma(\Pi, \Theta)$, where $\Pi$ is the period of
the resource and $\Theta$ is the capacity of the resource supplied over
the period $\Pi$. Additionally, they introduced a method to determine
the maximal utilization of a periodic task set that is schedulable on a
given periodic resource $\Gamma(\Pi, \Theta)$ using RM or EDF. Based on
this result, they developed a compositional real-time scheduling
framework in which global (system-level) timing properties are
established by composing independently specified and analyzed
local (component-level) timing properties. This framework derives
the timing requirements of the hypervisor scheduler from
the timing requirements of the guest schedulers in a compositional
manner. That is to say, the timing requirement of the root-level
scheduler is satisfied if and only if the timing requirements of the
guest schedulers are satisfied: the scheduling model on hypervisor
level is schedulable if and only if each scheduling model on guest
level is schedulable.
The approach of Shin and Lee is very interesting when RTVMs are
defined as periodic resources and the designer wants to
know what maximal utilization is schedulable on such a periodic
resource, or whether a given task set is schedulable on that
resource. Once the designer has specified the local periodic
resources with their task sets and tested them for schedulability, it
is possible to derive a global schedule of these local periodic
resources. Thus, a composition of individually specified periodic
resources is achieved using a global scheduling mechanism. It
is not possible to completely derive a valid root-level schedule,
since the partitioning schemes for the different RTVMs have to
be specified manually by the designer. This thesis also covers
the derivation of the partitioning schemes for a valid root-level
schedule.
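A schedulability test of this kind can be sketched as a comparison of the demand of the task set with the worst-case supply of the periodic resource. The following is an illustrative sketch, not the formulation of the thesis: the closed form of the supply bound function is quoted from memory of the periodic resource model literature, tasks are assumed to be periodic with integer parameters and deadlines equal to periods, and the check horizon of twice the hyperperiod is chosen conservatively.

```python
from math import lcm


def sbf(t, period, budget):
    """Assumed worst-case supply of the periodic resource Gamma(period, budget) in t time units."""
    gap = period - budget
    k = max(-(-(t - gap) // period), 1)          # integer ceiling, at least 1
    lower = (k + 1) * period - 2 * budget
    upper = (k + 1) * period - budget
    if lower <= t <= upper:
        return t - (k + 1) * gap                 # supply is currently being delivered
    return (k - 1) * budget                      # supply is currently blacked out


def dbf_edf(t, tasks):
    """EDF demand of tasks given as (period, wcet) pairs within t time units."""
    return sum((t // p) * e for p, e in tasks)


def edf_schedulable(tasks, period, budget):
    """True if the demand never exceeds the worst-case supply up to the horizon."""
    horizon = 2 * lcm(*(p for p, _ in tasks))    # conservative horizon (assumption)
    return all(
        dbf_edf(t, tasks) <= sbf(t, period, budget)
        for t in range(1, horizon + 1)
    )


if __name__ == "__main__":
    guest = [(10, 2), (20, 4)]                   # utilization 0.4
    print(edf_schedulable(guest, 5, 3))
    print(edf_schedulable(guest, 5, 1))
```

Note that the designer still has to propose the pair (Π, Θ); the test only accepts or rejects it, which matches the limitation described above.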
Open System Environment
The open environment for real-time applications developed by
Deng and Liu in [DLS97, DL97] and the BSS-I [LB00] and PShED
[LCB00] frameworks developed by Lipari et al. are hierarchical
scheduling systems designed around the idea of providing a
USP¹ to each real-time application. The USP abstraction guarantees
that any task set (i.e. the set of real-time tasks comprising
a real-time application) that can be scheduled without missing
any deadlines (by a particular scheduler) on a processor of speed
$s$ can also be scheduled by that scheduler if it is given a USP
with rate $s/f$ on a processor of speed $f$. The USP itself is realized
by implementing a CUS². The CUS replenishes its budget to its
capacity $B$ at the end of the server's period, in contrast to the
immediate replenishment upon budget exhaustion performed by
the total bandwidth server (TBS) [But04]. The reason is that
periodic tasks do not benefit from finishing much earlier than their
deadline. The USP guarantee is characterized by a share of the
processor and does not make any assumptions about the granularity
at which this share will be granted. The characterization by the
share is realized in the CUS replenishment policy as:
\[ d_{\mathrm{server}} = t + \frac{B}{U} \tag{4.4} \]
The higher the utilization of a server, the higher its replenish-
ment rate. Compared to proportional share schedulers, this is
the main difference, as proportional share schedulers are based
on the granularity given by the time quantum q. Thus, the gran-
ularity information must be specified dynamically by communi-
cating the next deadlines of all local schedulers at runtime to the
root scheduler. This requires the usage of a paravirtualization in-
terface between the guest OS hosting the local scheduler and the
hypervisor hosting the root scheduler. With the knowledge of
the local schedulers, the root scheduler is able to ensure that the
USP assigned to each guest never uses more than its share of the
processor over any time interval that matters to any USP. The
requirement that all deadlines are specified dynamically adds
run-time overhead and is not a good match for some kinds of
schedulers. For example, a rate monotonic scheduler retains no
information about task deadlines at run time, and consequently
cannot make use of a USP guarantee based on a root-level EDF
scheduler [Reg01].

¹ uniformly slower processor
² Constant Utilization Server
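The quantities used in this subsection can be written down in a few lines. This is only a sketch: the function names are hypothetical, speeds are assumed to be given as integers in a common unit, and the admission check (the rates of all USPs must not exceed the whole processor) is an assumption based on the utilization bound of EDF rather than a statement taken from the cited work.

```python
from fractions import Fraction


def usp_rate(required_speed, host_speed):
    """Rate s/f of a USP emulating a processor of speed s on a host of speed f."""
    return Fraction(required_speed, host_speed)


def cus_deadline(now, budget, utilization):
    """Server deadline d_server = t + B / U as in equation 4.4."""
    return now + Fraction(budget) / utilization


def rates_fit(rates):
    """Assumed admission check: the USP rates must not add up to more than 1."""
    return sum(rates) <= 1


if __name__ == "__main__":
    rate = usp_rate(100, 400)              # e.g. a 100 MHz guest on a 400 MHz host
    print(rate, cus_deadline(10, 2, rate))
    print(rates_fit([rate, Fraction(1, 2)]))
```

A server with a larger utilization U obtains an earlier deadline for the same budget B, which corresponds to the higher replenishment rate mentioned in the text.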
Fixed-Priority-Driven Open Environment Scheduling
The original proposal of the open environment uses an EDF
scheduler at root level and constant bandwidth servers to
encapsulate the guests. Although EDF minimizes the maximum
lateness of its scheduled tasks and can utilize the processor up to
100%, it requires much more implementation and runtime overhead
than fixed-priority schedulers like RM. Thus, Kuo et al. proposed
an approach using RM as root-level scheduler for the open system
environment in [KL99].
The approach of Kuo et al. uses sporadic servers to encapsulate
the guests. To guarantee the real-time execution of a guest, the
period $P_s$ of the associated sporadic server has to fulfill the
following inequality
\[ P_s < \frac{d_i}{2 + \lceil c_i / c \rceil} \tag{4.5} \]
for every task $\tau_i$ of the task set of the guest, with $d_i$ being the
relative deadline of $\tau_i$, $c_i$ being the required computation time
of $\tau_i$, and $c$ being the capacity of the sporadic server.
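Inequality 4.5 has to hold for every task of the guest, so the admissible server period is bounded by the most restrictive task. A minimal sketch, assuming integer task parameters and using hypothetical function names:

```python
from fractions import Fraction
from math import ceil


def max_server_period(tasks, capacity):
    """Strict upper bound on P_s: the minimum of d_i / (2 + ceil(c_i / c)) over all tasks.

    tasks is a list of (relative_deadline, computation_time) pairs.
    """
    return min(
        Fraction(d, 2 + ceil(Fraction(c_i, capacity)))
        for d, c_i in tasks
    )


def period_is_valid(period, tasks, capacity):
    """True if the server period satisfies inequality 4.5 for every task."""
    return period < max_server_period(tasks, capacity)


if __name__ == "__main__":
    guest = [(20, 3), (50, 10)]            # hypothetical (d_i, c_i) values
    print(max_server_period(guest, 4))
    print(period_is_valid(6, guest, 4), period_is_valid(7, guest, 4))
```

In this example the short-deadline task dominates the bound, although the second task needs three server capacities to complete.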
Like the original open system environment, which uses EDF as
root-level scheduler, the extension by Kuo et al. requires a
paravirtualization of the guest OS, because the replenishment of the
sporadic servers is based on the knowledge of their remaining
capacity. Since an RTVM may in general have idle times,
the root-level scheduler cannot derive the remaining
capacity of the scheduled sporadic servers unless the guest
scheduler is modified to communicate the resource consumption
to the root-level scheduler.
Open System Environment Server Parameters
To analyze general research questions which are valid for all
server-based hierarchical scheduling approaches like the Open
System Environment (see 4.2.1), Lipari and Bini [LB05] built an
abstract server model. The servers of this model are characterized
by their maximum computational budget $Q$ and their server
period $P$. Thus, a tuple $(Q, P)$ is assigned to every virtual
machine. The virtual machine then receives $Q$ units of execution
time every $P$ units of time. This is equivalent to the general
periodic task model introduced in section 2.1.1. The problem
addressed by Lipari and Bini is how to choose the
tuple $(Q, P)$ to guarantee the schedulability of the applications
executed within the servers. Two opposing needs have to be
balanced. A large $P$ is desired to avoid wasting time in context
switches, which have to be performed by the global
scheduler and are costly in terms of time. Since a large $\alpha = Q/P$
leads to a high utilization, a small required bandwidth is
desired to avoid wasting total processor capacity. They answered
this question for virtual machines composed of periodic and
sporadic tasks which are scheduled by a fixed priority local
scheduler.
To find the class of tuples of server parameters $(Q, P)$ that make
the task set feasible, all possible tuples are derived. Based on
this result, the system designer can choose the best trade-off
between a large $P$ and a small $\alpha$. An analysis of the temporal
behavior of a server has to precede the schedulability analysis
of the task set. The authors define a function $Z_S(t)$ as the
minimum amount of time provided by server $S$ in any time interval
of length $t \ge 0$. A task is feasible on server $S$ if the time $Z_S(t)$
provided by $S$ is greater than or equal to the worst case execution
time requested by this task and all tasks of higher priority,
since all tasks of higher priority delay the execution of the
analyzed task. The schedulability conditions are expressed as a set
of linear inequalities. By solving these, a class of tuples with
guaranteed schedulability is obtained and the system designer
can select the server parameters.
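The feasibility condition can be sketched for a given list of candidate tuples. The code below is an illustration under explicit assumptions and not the method of [LB05]: the minimum supply is approximated by the conservative linear bound Z_S(t) ≥ α·(t − Δ) with α = Q/P and Δ = 2(P − Q), tasks have deadlines equal to their periods, and the condition is checked at the usual fixed priority scheduling points. All names are hypothetical.

```python
from fractions import Fraction


def min_supply(t, budget, period):
    """Conservative linear lower bound on Z_S(t) (assumption, see lead-in)."""
    alpha = Fraction(budget, period)
    delay = 2 * (period - budget)
    return max(Fraction(0), alpha * (t - delay))


def task_feasible(index, tasks, budget, period):
    """tasks: (wcet, period) pairs sorted by decreasing priority."""
    wcet, deadline = tasks[index]
    higher = tasks[:index]
    # Scheduling points: the deadline and all releases of higher priority tasks.
    points = {deadline} | {
        k * p for _, p in higher for k in range(1, deadline // p + 1)
    }
    for t in sorted(points):
        demand = wcet + sum(-(-t // p) * c for c, p in higher)
        if demand <= min_supply(t, budget, period):
            return True
    return False


def feasible_servers(tasks, candidates):
    """Filter the candidate (Q, P) tuples that keep every task feasible."""
    return [
        (q, p) for q, p in candidates
        if all(task_feasible(i, tasks, q, p) for i in range(len(tasks)))
    ]


if __name__ == "__main__":
    guest = [(1, 10), (2, 20)]
    print(feasible_servers(guest, [(1, 2), (5, 10), (1, 4)]))
```

The two servers (1, 2) and (5, 10) have the same bandwidth, but only the one with the short period is accepted, at the cost of more context switches at root level. This is the trade-off between a large P and a small α described above.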
The work of Lipari and Bini makes it possible to compute the best
server parameters for a two-level hierarchical scheduling system
based on the open system environment. Other hierarchical scheduling
approaches leave this open and just assume that the best server
parameters are known. Their approach is, however, complex to
implement and introduces a high scheduling overhead. The
underlying server abstraction generalizes their findings and can be
used for future research in the field of real-time servers.
4.2.2 Industry
The related work in the industrial field is mainly based on timeline
scheduling approaches (see section 2.1.2), as they guarantee
the strict timeliness which is necessary for many applications
involving sensors and actuators. A very interesting approach to a
hierarchically scheduled system is introduced by ARINC 653,
a standard developed for avionic computing applications.
Unfortunately, the ARINC 653 standard relies on creating the timeline
schedules manually, which violates the goal of deriving
the schedule from the given real-time systems, as described in
section 4.1. Finally, the operating system PikeOS, which implements
the ARINC 653 standard, is presented.
ARINC 653
The architecture of the ARINC 653 standard has already been
described in section 3.2.2. This section presents the approach to
scheduling the partitions. In general, an ARINC 653
partition is equivalent to a program encapsulated in a single
application environment. The partitioning separates applications in
space and time. The time partitioning is realized as a static
configuration, and a set of execution windows is assigned to each
partition (see figure 50). The processes within the scope of a
partition are scheduled by the partition's scheduler. The basic
approach of the ARINC 653 scheduling scheme can easily be
identified as timeline scheduling (see 2.1.2). Thus, it can be modeled
by the static resource partition model introduced by Mok and
Feng (see 4.2.1). Nevertheless, ARINC 653 does not give a solution
to the problem of deriving a feasible root-level schedule either.
The ARINC 653 standard is implemented by most of the available
commercial VMMs, such as Green Hills Integrity Secure
Virtualization (see 3.2.2), LynuxWorks LynxSecure (3.2.2), Wind River
Hypervisor (3.2.2), and Sysgo PikeOS (3.2.2). Thus, all root-level
schedules of these hypervisors can be modeled using the concept
of resource partitions introduced by Mok and Feng (see
4.2.1).
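The static time partitioning described in this subsection amounts to a table lookup. The following sketch assumes a hypothetical window table and a major cycle of 150 ms; neither the window boundaries nor the names are taken from the ARINC 653 standard or from a concrete product.

```python
from bisect import bisect_right

MAJOR_CYCLE_MS = 150  # hypothetical length of the major cycle

# Hypothetical execution windows as (start in ms, partition), sorted by start time.
WINDOWS = [
    (0, "Partition 1"),
    (25, "Partition 2"),
    (75, "Partition 3"),
    (100, "Partition 1"),
    (125, "Partition 2"),
]


def active_partition(t_ms, windows=WINDOWS, major_cycle=MAJOR_CYCLE_MS):
    """Partition owning the processor at time t_ms under the static window table."""
    offset = t_ms % major_cycle                  # position within the major cycle
    starts = [start for start, _ in windows]
    return windows[bisect_right(starts, offset) - 1][1]


if __name__ == "__main__":
    for t in (0, 30, 149, 230):
        print(t, active_partition(t))
```

Because the table is fixed at configuration time, each partition corresponds directly to a multiple time slot periodic partition in the sense of the static resource partition model.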
[Figure 50: ARINC653 partition scheduling. The figure shows partitions
1 to 3, each with a partition OS and an APEX interface on top of the
module OS and the hardware, and a timeline from 0 to 150 ms divided
into major and minor cycles.]
PikeOS
As described in the previous section, PikeOS implements the
ARINC 653 standard and thus provides temporal isolation by timeline
scheduling. Nevertheless, PikeOS makes some small adaptations
in wording and implementation that are worth mentioning.
The first adaptation affects the naming of partitions. PikeOS
introduces two types of partitions:
Resource Partitions
Time Partitions
A resource partition is a set of PikeOS tasks sharing a bounded
set of kernel resources assigned to the resource partition and
exception monitoring handlers. Besides the assignment to a re-
source partition, each PikeOS thread is assigned to a specific
time partition τi. In figure 51,σVM shows the activation of
three different time partitions while σishows the thread acti-
vations within the time partition τi.τ0is a special time domain
in PikeOS which is alway active and is responsible for handling
asynchronous events. Thus, the dispatcher needs to decide be-
tween the highest priority thread of the currently active time
partition τiand the domain τ0. This adaptation is the main dif-
ference to the standard ARINC 653 scheduling approach. Nev-
ertheless, the scheduling in PikeOS can also be modeled by the
4.2 related work 133
Figure 51: PikeOS scheduling mechanism. The OR operator realizes a preemptability of the time triggered partitions $\tau_i$ by the event triggered background partition $\tau_0$ (diagram labels: priority ready queues of $\tau_0$ to $\tau_3$, thread activations $\sigma_1(t)$ to $\sigma_3(t)$, partition activation $\sigma_{VM}(t)$, OR, Dispatcher, Event Triggered Domain, Userspace, Kernelspace)
concept of resource partitions introduced by Mok and Feng (see
4.2.1) when this extension is not implemented.
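The dispatcher decision described above can be illustrated with a minimal sketch. It is not PikeOS code: the function name `dispatch`, the representation of ready threads as `(priority, name)` pairs, the convention that a larger number means a higher priority, and the tie-breaking in favour of the active time partition are all assumptions made for illustration.

```python
def dispatch(active_partition_ready, tau0_ready):
    """Pick the next thread to run (hypothetical model, not the PikeOS API).

    Both arguments are lists of (priority, name) pairs: the ready threads of
    the currently active time partition tau_i and those of the always active
    domain tau_0. A larger number is assumed to mean a higher priority.
    """
    candidates = active_partition_ready + tau0_ready
    if not candidates:
        return None  # nothing is ready, the processor idles
    # max() returns the first maximum, so ties go to the active partition
    # (an arbitrary choice made for this sketch).
    return max(candidates, key=lambda thread: thread[0])[1]


print(dispatch([(3, "control_loop")], [(5, "irq_handler")]))
print(dispatch([(3, "control_loop")], []))
```

The sketch only shows why $\tau_0$ can preempt a time triggered partition: its threads compete with the threads of the active partition at every dispatch decision.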
4.2.3 Classification
In general, it is possible to classify the introduced scheduling
algorithms based on their root-level scheduling behavior into
the two basic classes:
Static scheduling
Dynamic scheduling
Static scheduling:
This class includes the classical approach of static proportional share scheduling and its direct derivatives of the Fair Queueing class. The root-level scheduler statically partitions the shared resource and makes the partition available to the clients in multiples of the time quantum $q$ at a fixed rate. The main problem when applying proportional share schedulers at the root level of a real-time hierarchical scheduling is the choice of the time quantum $q$ and the bound of the lag at an arbitrary time. The time quantum $q$ directly influences the overhead induced by the root-level scheduler, because choosing a large time quantum $q$ leads to less switching overhead, but a larger lag.
The timeline scheduling approaches as used in the ARINC 653 standard also fall into this category. The timeline is partitioned into minor and major cycles, with the minor cycles being the time synchronization points where the scheduler is activated by a timer interrupt. Thus, the minor cycle specifies the rate of the scheduling and influences the switching overhead in a way comparable to the time quantum $q$ of proportional share schedulers. In general, the static proportional share schedulers and the timeline schedulers are equivalent.
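A timeline schedule of this kind can be sketched as a static table that is evaluated modulo the major cycle. The window layout below (three windows of 25 ms in a major cycle of 75 ms) and the function name `active_partition` are hypothetical and only serve as an illustration.

```python
def active_partition(windows, major_cycle, t):
    """Return the partition whose window contains time t, or None if idle.

    `windows` is a list of (start, end, partition) entries relative to the
    start of the major cycle; the table repeats every `major_cycle`.
    """
    offset = t % major_cycle
    for start, end, partition in windows:
        if start <= offset < end:
            return partition
    return None


# Hypothetical major cycle of 75 ms with three windows of 25 ms each.
windows = [(0, 25, "Partition 1"), (25, 50, "Partition 2"), (50, 75, "Partition 3")]
print(active_partition(windows, 75, 110))  # Partition 2
```

Because the table is fixed at design time, the root-level scheduler needs no information from the guests, which is what makes this class compatible with full virtualization.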
The static resource partition model is another approach of this class. Within the major cycle of a resource partition, which is called the partition period $P$, the activations of the resource partition are completely customizable by the system designer by defining the parameter $\Gamma$, an array of $N$ time pairs. This is the main difference to static proportional share and timeline scheduling. Nevertheless, it is possible to model static proportional share schedules and timeline schedules as static resource partitions.
The compositional real-time scheduling framework is also in-
cluded in the class of static scheduling, because it is based on
the composition of static resource partitions based on given valid
schedules on given resource partitions.
Dynamic scheduling:
Besides the static root-level schedulers, some dynamic root-level schedulers have been introduced. One is the Earliest Eligible Virtual Deadline First algorithm, which is the only proportional share based algorithm in this class providing a lag bound of $O(1)$ at the time a deadline expires. This allows for the scheduling of firm real-time tasks using deadline based scheduling at the local scheduler level. Nevertheless, the algorithm also requires the local deadlines to be communicated to the root-level scheduler, which can only be realized by applying paravirtualization.
The original open system environment published by Deng et al. is based upon EDF scheduling of constant bandwidth servers. The decision of the granularity is based on the deadlines of the local schedulers and does not depend on the time quantum $q$ as in proportional share schedulers. Therefore, the root-level scheduler needs to communicate with the local schedulers, resulting in a paravirtualization of the guest OS encapsulated in the server. The RM extension of the open system environment published by Kuo et al. introduces RM as root-level scheduler, but still requires paravirtualization.
Figure 52: Model of a real-time system (diagram labels: RS, Scheduler, tasks $\tau_1$, $\tau_2$, $\tau_3$, CPU providing ISA @ f Hz)
4.2.4 Summary
The presented related work does not give a satisfactory solution to the problem of deriving a root-level schedule from given real-time systems. Instead, there are interesting partitioning models, like the resource partition model, which provides a schedulability test for a given partitioning scheme. The resource partition model will be used in the following to model the root-level schedule, as it can be seen as an abstraction of existing static resource partitioning approaches. The dynamic root-level schedulers all rely on the paradigm of paravirtualization, which violates the full virtualization constraint of the problem statement (see section 4.1).
4.3 Model
The problem stated in section 4.1 first requires the modeling of the given real-time systems and of the virtual system hosting these real-time systems. Thus, the definition of a given real-time system is given first. The definition corresponds to figure 52.
Definition 4.3. A real-time system RS is given as a 4-tuple $RS_i(\Gamma, \Sigma, ISA, f)$. $\Gamma_i = \{\tau_k(T_k, C_k) \mid k = 1 \ldots n_i\}$ represents the taskset being scheduled by the scheduling algorithm $\Sigma_i$. The taskset $\Gamma$ is a periodic taskset (see 2.1.1). The applied scheduling algorithm $\Sigma$ is assumed to be either RM (see 2.1.2) or EDF (see 2.1.2). The parameter $ISA$ represents the instruction set architecture of the hardware platform for which the real-time system is compiled. The parameter $f$ represents the clock rate to which the worst case execution time $C_k$ of the task $\tau_k \in \Gamma$ refers.
Figure 53: Model of a virtual real-time system (diagram labels: VMM with Root Scheduler (RTVM, $\Phi$); RTVM1, RTVM2 and RTVM3, each with a local scheduler and its tasks; CPU providing ISA @ f Hz)
Now the virtual real-time system (see figure 53) hosting multiple
real-time systems is defined:
Definition 4.4. A virtual real-time system $VRS((RTVM_1, RTVM_2, \ldots, RTVM_n), \Phi)$ is given by a 2-tuple consisting of the real-time systems encapsulated as real-time virtual machines (RTVMs) and the time partitioning policy $\Phi$. These RTVMs are spatially and temporally isolated by a VMM, which fulfills the VMM properties of Popek and Goldberg (see 2.2.2). The temporal isolation is ensured by the time partitioning policy $\Phi$.
The RTVMs $RTVM_1, \ldots, RTVM_n$ are either defined directly or can be derived from the given real-time systems $RS_1, \ldots, RS_n$. For the derivation of the RTVMs, it is further assumed that the given real-time systems share a common ISA and that their clock rate is known. This allows the clock frequency of the virtual host system to be determined. In this thesis, this simplification restricts the direct application to identical processor architectures used in the RTVMs, which is not necessary in general since virtualization offers the emulation of different ISAs.
Other effects induced by the processor architecture, like caching, pipelining and so on, are not considered either, since these are covered by the research area of determining worst case execution times and are thus out of the scope of this thesis.
The time partitioning policy $\Phi$ can be realized by applying one of the approaches presented in section 4.2. Since the goal of this thesis is to derive a fully virtualized virtual real-time system from given real-time systems RS or real-time virtual machines RTVM, a method based on single time slot periodic partitions (STSPPs) is presented to derive the partitioning policy $\Phi$ for the FVBTP. As local schedulers $\Sigma_i$, either RM or EDF is assumed.
In the following, the problems of determining the CPU requirements and of deriving a feasible root-level schedule for the integrated virtual system will be discussed. The sequential steps to find a solution for this problem are depicted in figure 54. First, a normalization of all given real-time systems is performed, which allows a scaling factor to be determined in the next step. After the normalization and scaling steps have been performed, a valid root-level schedule is derived based on the concept of resource partitions introduced in section 4.2.1. This methodology will further be called Full Virtualization by Temporal Partitioning (FVBTP). Finally, an evaluation based on hardware developed in CRC614 is presented to show the applicability and the performance of FVBTP. FVBTP has been published in [KBS09, KBG10].
4.4 Transformation of real-time systems into real-time virtual machines
When the set of RTVMs is not explicitly defined by the system designer, this set needs to be derived from the given set of real-time systems $RS_i$. This is the case when performing a consolidation of given real-time systems into an integrated virtual real-time system. As already stated in section 4.3, the given real-time systems share the same ISA and each real-time system $RS_i$ provides the parameter $f_i$ representing its clock rate. This information is necessary to derive the clock rate the virtual real-time system needs in order to handle the load of its RTVMs. This clock rate is considered to be a hint to the system designer of the virtual real-time system, since the possible real clock rates depend on the available hardware.
Figure 54: Methodology to realize full virtualization of RTVMs by temporal partitioning (steps shown: Normalization of RS1 to RS3, each with its own local scheduler and CPU at $f_1$, $f_2$, $f_3$ Hz, to a CPU at $\min(f_1, f_2, f_3)$ Hz; Scaling to RTVM1 to RTVM3 on a CPU at $f_{VRS}$ Hz; Partitioning of the RTVMs under the root scheduler of the VMM on a single CPU at $f_{VRS}$ Hz)
4.4.1 Normalization
To calculate the load of the virtual real-time system VRS, it is first necessary to calculate the load of the given real-time system $RS_i$. According to definition 2.5, the utilization of a given real-time system $RS_i$ is calculated by
\[ U(RS_i) = \sum_{\tau_k \in RS_i(\Gamma)} \frac{C_k}{T_k} \tag{4.6} \]
First of all, it has to be noted that the WCETs $C_k$ need to be the WCETs introduced in section 3.4.2, as the virtualization has an impact on the execution times. Since the given real-time systems $RS_i$ may operate at different clock rates, their utilizations cannot be summed up without normalizing them to a common clock rate. To achieve this, the clock rate $f_i$ of each real-time system $RS_i$ is used. The normalization is realized relative to the slowest real-time system $RS_s$:
\[ RS_s = RS_k \mid \forall i : f_k \le f_i \tag{4.7} \]
Now it is possible to determine the normalization factor $s_i$ to normalize each real-time system $RS_i$ to the real-time system $RS_s$:
\[ s_i = \frac{f_i}{f_s} \tag{4.8} \]
Each normalization factor $s_i$ shows how much faster $RS_i$ is compared to $RS_s$.
Now, it is possible to normalize each real-time system under the assumption that each real-time system is executed on the slowest available hardware platform, that of $RS_s$. Thus, the execution times of the taskset of each $RS_i$ being executed on the hardware platform of $RS_s$ have to be multiplied by its normalization factor $s_i$:
\[ RS_i^s = (\{(T_k, C_k \cdot s_i) \mid \tau_k(T_k, C_k) \in RS_i(\Gamma)\}, RS_i(\Sigma), RS_i(ISA), f_s) \tag{4.9} \]
Obviously, the utilization of each normalized real-time system $RS_i^s$ is also multiplied by its normalization factor $s_i$:
\[ U(RS_i^s) = \sum_{RS_i^s(\Gamma_i)} \frac{C_k \cdot s_i}{T_k} = s_i \cdot U(RS_i) \tag{4.10} \]
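The normalization step can be sketched in a few lines. The class and function names (`Task`, `RealTimeSystem`, `utilization`, `normalize`) and the two example systems are hypothetical; the ISA component of the 4-tuple is omitted because all systems are assumed to share it.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Task:
    period: Fraction  # T_k, also used as the relative deadline
    wcet: Fraction    # C_k, measured at the clock rate of its system


@dataclass(frozen=True)
class RealTimeSystem:
    tasks: tuple      # the taskset Gamma
    scheduler: str    # "EDF" or "RM"
    clock_hz: int     # f_i


def utilization(rs):
    """Equation 4.6: sum of C_k / T_k over the taskset."""
    return sum((t.wcet / t.period for t in rs.tasks), Fraction(0))


def normalize(systems):
    """Equations 4.7 to 4.9: rescale all WCETs to the slowest clock f_s."""
    f_s = min(rs.clock_hz for rs in systems)
    normalized = []
    for rs in systems:
        s_i = Fraction(rs.clock_hz, f_s)  # equation 4.8
        tasks = tuple(Task(t.period, t.wcet * s_i) for t in rs.tasks)
        normalized.append(RealTimeSystem(tasks, rs.scheduler, f_s))
    return normalized


# Hypothetical example: one system at 100 MHz and one at 200 MHz.
rs1 = RealTimeSystem((Task(Fraction(10), Fraction(2)),), "EDF", 100_000_000)
rs2 = RealTimeSystem((Task(Fraction(8), Fraction(1)),), "RM", 200_000_000)
norm1, norm2 = normalize([rs1, rs2])
print(utilization(norm1), utilization(norm2))  # 1/5 1/4
```

The faster system doubles its utilization when moved to the slower clock, which is the statement of equation 4.10 for $s_2 = 2$.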
4.4.2 Scaling the virtual real-time system
When executing all normalized real-time systems $RS_i^s$ encapsulated as RTVMs on the hardware platform of $RS_s$, it is very likely that this hardware platform is overloaded. Thus, it may be necessary to find a suitable hardware for the virtual real-time system to execute all real-time systems as RTVMs. This is achieved by determining the overall utilization of the normalized system and using this utilization as a speedup factor $S$ that is multiplied by the clock rate $f_s$ of the normalized system. This results in the final clock rate $f_{VRS}$ of the virtual real-time system hosting the RTVMs. Thus, the RTVMs can now be derived by applying $S$ to the normalized real-time systems $RS_i^s$:
\[ RTVM_i = (\{(T_k, \tfrac{C_k}{S}) \mid \tau_k(T_k, C_k) \in RS_i^s(\Gamma)\}, RS_i^s(\Sigma), RS_i^s(ISA), f_{VRS}) \tag{4.11} \]
The utilization of the single RTVMs and of the virtual real-time system VRS can easily be calculated by:
\[ U(RTVM_i) = \sum_{RTVM_i(\Gamma)} \frac{C_k}{T_k \cdot S} = \frac{1}{S} \cdot U(RS_i^s) = \frac{s_i}{S} \cdot U(RS_i) \tag{4.12} \]
\[ U(VRS) = \sum_{i=1}^{n} U(RTVM_i) \tag{4.13} \]
The identification of the speedup factor $S$ depends on the applied scheduling algorithm $\Sigma_i$ of each real-time system, because the scheduling algorithms may differ in their utilization bound. In general, it is not possible to use the total utilization $U(VRS_s)$ of the normalized virtual real-time system when there are schedulers having a utilization bound less than one. Using the total utilization $U(VRS_s)$ as speedup may, for example in the case of RM, lead to the preemption of low priority tasks by high priority tasks such that the low priority tasks are not able to finish before their deadline. Thus, the speedup factor $S$ needs to be relative to the utilization bound of the applied scheduling algorithm. For each scheduling algorithm $\sigma$, the utilizations of the normalized real-time systems $U(RS_i^s)$, relative to their utilization bounds, are summed up to build the speedup factor $S_\sigma$.
\[ \alpha_i = \frac{1}{U_{lub}(RS_i^s)} \tag{4.14} \]
\[ S_\sigma = \sum_{i \mid RS_i^s(\Sigma_i) = \sigma} \alpha_i \cdot U(RS_i^s) \tag{4.15} \]
It is assumed that either EDF or RM is used as local scheduler within the scope of this thesis. Thus, the final speedup factor $S$ is given by:
\[ S = S_{EDF} + S_{RM} \tag{4.16} \]
With this speedup factor $S$, it is possible to derive the required clock rate for the VRS hosting the real-time systems $RS_i$ as real-time virtual machines $RTVM_i$. This is simply done by multiplying the clock rate of the slowest system $RS_s$, used for normalization, by the speedup factor $S$:
\[ f_{VRS} = f_s \cdot S \tag{4.17} \]
This clock rate fVRS denotes the minimum required clock-rate
for the VRS. As there is hardly any hardware platform providing
exactly this clock rate, the next faster one is to be chosen.
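A short consistency check, using only equations 4.12 and 4.14 to 4.16, shows what this choice of $S$ achieves. It is assumed here that the utilization bound is unaffected by normalization and scaling, since both steps only rescale the execution times, so that $\alpha_i$ is the same for $RS_i^s$ and $RTVM_i$:

\[ \sum_{i=1}^{n} \alpha_i \cdot U(RTVM_i) = \frac{1}{S} \sum_{i=1}^{n} \alpha_i \cdot U(RS_i^s) = \frac{S_{EDF} + S_{RM}}{S} = 1 \]

In other words, after scaling, the utilizations weighted by their utilization bounds add up to exactly the capacity of one processor running at $f_{VRS}$.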
EDF
The speedup factor $S_{EDF}$ for EDF-based real-time systems is the sum of their normalized utilizations, as the utilization bound of EDF is 1. Thus, $\alpha_i = 1$ and
\[ S_{EDF} = \sum_{i \mid RS_i^s(\Sigma_i) = EDF} U(RS_i^s) \tag{4.18} \]
RM
Since the utilization bound of RM depends on the task set, the speedup factor $S_{RM}$ needs to be defined relative to the least upper bound of the tasksets. Thus, $\alpha_i = \frac{1}{U_{lub}(RS_i^s)}$.
The speedup factor $S_{RM}$ for RM-based real-time systems is:
\[ S_{RM} = \sum_{i \mid RS_i^s(\Sigma_i) = RM} \frac{1}{U_{lub}(RS_i^s)} \cdot U(RS_i^s) \tag{4.19} \]
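The computation of the speedup factor and of the resulting clock rate can be sketched as follows. The sketch assumes that $U_{lub}$ for RM is the classical Liu and Layland bound $n(2^{1/n} - 1)$ for a taskset of $n$ tasks; the function names and the example values are hypothetical.

```python
def rm_least_upper_bound(n):
    """Liu and Layland bound n * (2**(1/n) - 1) for n tasks under RM."""
    return n * (2 ** (1.0 / n) - 1)


def speedup_factor(systems):
    """Equations 4.14 to 4.16.

    `systems` is a list of (scheduler, normalized_utilization, task_count)
    entries, one per normalized real-time system.
    """
    s = 0.0
    for scheduler, u, n in systems:
        # alpha_i is 1 for EDF and 1 / U_lub for RM (equation 4.14).
        alpha = 1.0 if scheduler == "EDF" else 1.0 / rm_least_upper_bound(n)
        s += alpha * u
    return s


# Hypothetical normalized systems on a slowest clock of f_s = 100 MHz.
systems = [("EDF", 0.7, 3), ("RM", 0.5, 2)]
S = speedup_factor(systems)
f_vrs = 100e6 * S  # equation 4.17
print(round(S, 4), round(f_vrs / 1e6, 1), "MHz")
```

For the RM system with two tasks, the bound is roughly 0.83, so its utilization of 0.5 contributes about 0.60 to $S$ instead of 0.5.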
4.4.3 Summary
This section described a methodology for deriving the clock rate of the VRS such that the system is not overloaded by the executed RTVMs. To this end, a normalization to a common base system (namely the slowest) is performed. Afterwards, the utilization of this system, considering the utilization bounds of EDF and RM, is determined. This utilization is chosen as the speedup factor $S$, which is applied to the clock rate of the common base system to determine the final clock rate of the VRS. This methodology only prevents the VRS from being overloaded; it does not ensure that the RTVMs are schedulable by a root-level scheduler with an arbitrary partitioning policy. Thus, the next section covers the problem of ensuring the feasibility of each single RTVM schedule while the RTVMs are scheduled by a root-level scheduler with a specific partitioning policy.
4.5 Partitioning policy
The first question to answer is which kind of partitioning policy is suited for the requirements given in section 4.1. The class of existing dynamic root-level schedulers is not suitable, because the implementation requires paravirtualization and thus violates the requirement of full virtualization. Thus, the remaining class to consider is the class of static root-level schedulers. Within this class, the static resource partition model is a generalization of the static proportional share and timeline scheduling algorithms. This allows the definition of arbitrary static partitions using either STSPPs or MTSPPs. STSPPs have the advantage of a simple partition model: only the partition period $P$ and one time interval during which the partition is active, which will further be called activation slot, need to be defined.
There are two problems to be addressed when using static resource partitions. One is the determination of the activation slots $\gamma_i$ of a static resource partition $\Pi_i$, and the other one is the determination of its period $P_i$. Both problems have to be solved under the real-time constraints given by the taskset $RTVM_i(\Gamma)$ being executed on $\Pi_i$. Thus, the choice of the proper activation slots and of the proper period is essential to fulfill the given real-time constraints.
Another question is whether to choose STSPPs or MTSPPs as static resource partitions $\Pi_i$. Considering the switching overhead, MTSPPs with $N$ activation slots introduce more switching overhead than a STSPP when their period $P_{MTSPP}$ is chosen to be smaller than $N$ times the period $P_{STSPP}$ of a STSPP providing an equivalent resource allocation:
\[ P_{MTSPP} < N \cdot P_{STSPP} \tag{4.20} \]
An example to clarify this is depicted in figure 55. There are three static resource partitions given. The first has a period of 9, the second has a period of 5 and the third has a period of 11. All partitions provide a bandwidth of 40%, but differ in their switching overhead. $\Pi_1$ has a period of 9, which is less than twice the period of the STSPP $\Pi_2$. According to equation 4.20, the expected switching overhead of $\Pi_1$ is higher than in the case of $\Pi_2$. Figure 55 shows this effect. Counting the switches up to time $t = 45$ results in 10 VM context switches caused by $\Pi_1$, while $\Pi_2$ causes 9 VM context switches and $\Pi_3$ causes 8 VM context switches. Thus, the choice of the period is essential for determining whether to use STSPPs or MTSPPs.
4.5.1 Activation slots
The derivation of the RTVMs presented in section 4.4.2 determines which bandwidth a resource partition needs to provide for a given $RTVM_i$. When assuming STSPPs as partitioning policy, with $P$ and $S_1$ being chosen arbitrarily, the activation slot $(S_1, E_1)$ of $\Pi_i$ for $VM_i$ can be calculated by:
\[ E_1 = S_1 + \alpha_i \cdot U(RTVM_i) \cdot P \tag{4.21} \]
Figure 55: Comparison of switching overhead of STSPPs and MTSPPs (time axis $t$ from 0 to 45 with the VM context switches of the three partitions numbered consecutively)
There are $P - (E_1 - S_1)$ possibilities to place this STSPP within the period $P$, because the slot must be placed such that it does not exceed the period $P$.
When assuming MTSPPs as partitioning policy, the activation slots $\{(S_1, E_1), \ldots, (S_N, E_N)\}$ of $\Pi_i$ also need to fulfill the bandwidth requirement of $RTVM_i$:
\[ \frac{\sum_{j=1}^{N} (E_j - S_j)}{P} = \alpha_i \cdot U(RTVM_i) \tag{4.22} \]
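Both equations translate directly into code. The following sketch uses exact fractions to avoid rounding effects; the function names `stspp_slot` and `mtspp_bandwidth` are hypothetical, and the numbers are those of an EDF-scheduled RTVM with utilization 1/2 on a partition of period 8.

```python
from fractions import Fraction


def stspp_slot(start, alpha, utilization, period):
    """Equation 4.21: the single activation slot (S_1, E_1) of an STSPP."""
    end = start + alpha * utilization * period
    if end > period:
        raise ValueError("slot would exceed the partition period")
    return (start, end)


def mtspp_bandwidth(slots, period):
    """Left-hand side of equation 4.22: allocated share of the period."""
    return sum((e - s for s, e in slots), Fraction(0)) / period


# EDF (alpha = 1), U = 1/2, P = 8, slot starting at S_1 = 4.
slot = stspp_slot(Fraction(4), Fraction(1), Fraction(1, 2), Fraction(8))
print(slot[0], slot[1])  # 4 8

# An MTSPP with four unit slots provides the same bandwidth.
print(mtspp_bandwidth([(0, 1), (2, 3), (4, 5), (6, 7)], 8))  # 1/2
```

The check in `stspp_slot` reflects the restriction that the slot must not exceed the period.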
When defining the activation slots using either equation 4.21 or 4.22, it is ensured that the ratio of the required resource bandwidth to the allocated resource bandwidth is equal to the utilization bound of the applied scheduling algorithm $RTVM_i(\Sigma)$:
Theorem 4.1. Let the period $P$ of a static resource partition $\Pi_i$ be arbitrarily chosen. Then the ratio of the required utilization $R = U(RTVM_i)$ to the allocated utilization $A = \frac{\sum_{j=1}^{N} (E_j - S_j)}{P}$ is equal to the utilization bound $U_{lub}(RTVM_i) = \frac{1}{\alpha_i}$ of $RTVM_i(\Sigma)$ scheduling the associated taskset $RTVM_i(\Gamma)$.
Proof. Figure 56 shows how the absolute times for the required computation time and the allocated computation time within the period $P$ are calculated. By dividing the required computation time $R = U(RTVM_i) \cdot P$ by the allocated computation time $A = E_i - S_i = \alpha_i \cdot U(RTVM_i) \cdot P$, we obtain
\[ \frac{R}{A} = \frac{U(RTVM_i) \cdot P}{\alpha_i \cdot U(RTVM_i) \cdot P} = \frac{1}{\alpha_i} = U_{lub}(RTVM_i(\Gamma)) \]
For MTSPPs the theorem directly follows by rewriting equation 4.22.
Figure 56: Graphical illustration of the required computation time within the allocated interval $E_i - S_i$ of $\Pi_i$ (the allocated interval from $S_i$ to $E_i$ has the length $\alpha_i \cdot U(RTVM_i) \cdot P$, of which $U(RTVM_i) \cdot P$ is required)
By applying the methodology presented in section 4.4 to derive the RTVMs and the derivation of the activation slots presented in this section, it is possible to guarantee that the ratio of the required computation time to the effectively allocated computation time over the period $P$ is equal to the utilization bound of the applied local scheduler. The remaining problem to be solved is the choice of the period of the static resource partition. This will be addressed in the following section.
4.5.2 Period of the static resource partitions
When choosing the period for a static resource partition, the
schedulability of the taskset executed on this partition needs
to be ensured. This requirement does not permit to choose the
period arbitrarily. An example for this is depicted in figure 57.
Figure 57: Example for the choice of an incompatible period length for static resource partition (two schedules over $t = 0$ to $12$ with task and VM rows; in the figure notation (C;T): $\Gamma_1 = \Gamma_2 = \{(2;8), (2,5;10)\}$, $\Pi_1 = \{(0;4), 8\}$, $\Pi_2 = \{(4;8), 8\}$)
The example shows the scheduling of two tasksets $RTVM_1(\Gamma) = RTVM_2(\Gamma) = \{(8, 2), (10, 2.5)\}$ being executed on $\Pi_1 = \{(0, 4), 8\}$ and $\Pi_2 = \{(4, 8), 8\}$. The tasks of $RTVM_1(\Gamma)$ are mapped to numbers 1 and 2, while the tasks of $RTVM_2(\Gamma)$ are mapped to numbers 3 and 4. The first schedule shows the execution of $RTVM_1(\Gamma)$ on $\Pi_1$ and $RTVM_2(\Gamma)$ on $\Pi_2$, while the second schedule shows the execution of $RTVM_1(\Gamma)$ on $\Pi_2$ and $RTVM_2(\Gamma)$ on $\Pi_1$. Both tasksets are scheduled by EDF. The utilization of $RTVM_1$ is $U(RTVM_1) = 0.5$, and the utilization of $RTVM_2$ is $U(RTVM_2) = 0.5$. Both resource partitions provide a bandwidth of $\frac{4 - 0}{8} = \frac{8 - 4}{8} = 0.5$. The ratio of the required utilization to the allocated utilization is 1 in all cases. Thus, the partitions are fully utilized and are not overloaded. The utilization of the partitions is equal to the utilization bound of EDF. Thus, a scheduling of the tasksets should in general be possible when considering only the utilization. When taking a look at the schedules of figure 57, one can see that there is a deadline miss at time $t = 10$ in both schedules. Now the choice of the period comes into play, as this is the reason why the deadlines are missed. Remembering the supply function (see figure 48 introduced in section 4.2.1), which shows the impact of quantization, it is possible to find the problem. The supply function
\[ C(\Pi, t) = \sum_{i=0}^{t} active(\Pi(\gamma, P), i) \tag{4.23} \]
\[ active(\Pi(\gamma, P), t) = \begin{cases} 1 & \text{if } (t \bmod P) \in \gamma \\ 0 & \text{else} \end{cases} \tag{4.24} \]
represents the accumulated activation time of $\Pi_i$ up to time $t$, while the assigned utilization function
\[ U_A(\Pi, t) = \frac{C(\Pi, t)}{t} \tag{4.25} \]
denotes the percentage of the processor assigned to $\Pi_i$ up to time $t$.
For each partition of the first schedule, $C(\Pi, t)$ and $U_A(\Pi, t)$ are depicted in figure 58. Due to the quantization introduced by the partitions $\Pi_1$ and $\Pi_2$, the real supply deviates from the idealized supply obtained when assuming infinite time slicing ($P \to 0$). Note that in general, there are individual supply functions for each partition, but the example contains two partitions with identical supply functions. As one can see, the supply function for both VMs deviates from the idealized supply at time $t = 10$. The one with the supply function value below the idealized supply misses its deadline at time $t = 10$. Thus, an arbitrary choice of the period $P$ is not possible when executing hard real-time tasksets on static resource partitions.
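The deviation from the idealized supply can be reproduced numerically. The sketch below evaluates the supply as the accumulated activation time in the continuous interval $[0, t)$, which is one possible reading of the supply function (an assumption of this sketch); the function names are hypothetical and integer time values are assumed.

```python
from fractions import Fraction


def supply(slots, period, t):
    """Accumulated activation time of a partition in the interval [0, t)."""
    full_periods, rest = divmod(t, period)
    per_period = sum(e - s for s, e in slots)
    # Part of each slot that lies before `rest` in the unfinished period.
    partial = sum(max(0, min(e, rest) - s) for s, e in slots)
    return full_periods * per_period + partial


def assigned_utilization(slots, period, t):
    """Share of the processor assigned to the partition up to time t > 0."""
    return Fraction(supply(slots, period, t), t)


pi_1 = [(0, 4)]  # Pi_1 = {(0, 4), 8}
pi_2 = [(4, 8)]  # Pi_2 = {(4, 8), 8}
for t in (8, 10, 16):
    ideal = Fraction(1, 2) * t  # idealized supply for a bandwidth of 1/2
    print(t, supply(pi_1, 8, t), supply(pi_2, 8, t), ideal)
```

At $t = 10$ the idealized supply is 5, while $\Pi_1$ has supplied 6 and $\Pi_2$ only 4 time units; the taskset running on $\Pi_2$ is the one that misses its deadline. At multiples of the period ($t = 8$, $t = 16$) both partitions match the idealized supply.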
Lemma 4.2. When $\frac{\sum_{\gamma_i} (E_j - S_j)}{P_i} = \frac{1}{U_{lub}} \cdot U(RTVM_i)$ holds for $\gamma_i$ of the static resource partition $\Pi_i(\gamma_i, P_i)$ and $P_i$ is chosen as $P_i = \gcd(\{T_k \mid \tau_k(T_k, C_k) \in RTVM_i(\Gamma)\})$, then no task $\tau_k \in RTVM_i(\Gamma)$ will miss its deadline, if $RTVM_i$ is transformed by $RTVM_i = (\{(T_k, \tfrac{C_k}{S}) \mid \tau_k(T_k, C_k) \in RS_i^s(\Gamma)\}, RS_i^s(\Sigma), RS_i^s(ISA), f_{VRS})$ with $S = S_{EDF} + S_{RM}$.
Proof. To show that the assigned allocation is guaranteed at every single deadline of the taskset $RTVM_i(\Gamma)$, we assume that there exists a task $\tau_k \in RTVM_i(\Gamma)$ with deadline $T_k$ where $U_A(\Pi_i, t)$ at time $t = T_k$ is smaller than the required utilization $\alpha_i \cdot U(RTVM_i)$:
\[ \exists \tau_k \in RTVM_i(\Gamma) : U_A(\Pi_i, T_k) < \alpha_i \cdot U(RTVM_i) \]
Since $P_i$ is a divisor of $T_k$, the interval up to $T_k$ consists of complete partition periods and theorem 4.1 can be applied. Thus, $U_A(\Pi_i, T_k) = \alpha_i \cdot U(RTVM_i)$. Inserting this into the claim leads to
\[ \alpha_i \cdot U(RTVM_i) < \alpha_i \cdot U(RTVM_i) \]
at time $t = T_k$, which is a contradiction. Thus, the given taskset $RTVM_i(\Gamma)$ is schedulable on the static resource partition $\Pi_i$ by the scheduling algorithm $RTVM_i(\Sigma)$.
Figure 58: Assigned computation time and utilization over the real time elapsed of schedule 1 of figure 57 (upper plot: VM activation and supply $C$ of $\Pi_1$ and $\Pi_2$ over $t$ together with the idealized supply; lower plot: $U_A$ of $\Pi_1$ and $\Pi_2$ over $t = 0$ to $16$ together with the idealized supply)
Figure 59: Example for scheduling activation slots of two periodic resource partitions $\Pi_1$ and $\Pi_2$ with period $P_1 = P_2 = 8$ and $U_1 = U_2 = \frac{1}{2}$. The depicted schedules are:
$\Pi_1 = (\{(0,1), (2,3), (4,5), (6,7)\}, 8)$, $\Pi_2 = (\{(1,2), (3,4), (5,6), (7,8)\}, 8)$
$\Pi_1 = (\{(1,2), (3,5), (7,8)\}, 8)$, $\Pi_2 = (\{(0,1), (2,3), (5,7)\}, 8)$
$\Pi_1 = (\{(0,2), (4,5), (7,8)\}, 8)$, $\Pi_2 = (\{(2,4), (5,7)\}, 8)$
$\Pi_1 = (\{(0,3), (5,6)\}, 8)$, $\Pi_2 = (\{(3,5), (6,8)\}, 8)$
$\Pi_1 = (\{(0,4)\}, 8)$, $\Pi_2 = (\{(4,8)\}, 8)$
4.5.3 Schedule
Up to now, the questions of how to derive the activation slots and the period of a static resource partition guaranteeing the real-time constraints of a single RTVM have been answered. The remaining question is how to schedule the activation slots of the RTVMs within the virtual real-time system to guarantee that the real-time constraints of all RTVMs are met. Figure 59 depicts multiple examples of scheduling the activation slots of two periodic resource partitions with the same period and equal utilization. Thus, each periodic resource partition needs to provide half of the period as computation time. Each line represents a schedule, which can either be a MTSPP with two, three or four activation slots, or a STSPP with only one activation slot.
As shown in section 4.5.2, the period of a static resource partition is chosen as the greatest common divisor of all deadlines of the taskset executed by the RTVM to ensure the schedulability of a single RTVM when the RTVM is derived by the transformation presented in section 4.4. This period is determined independently of the type of static resource partition. Thus, the partition may either be a STSPP or a MTSPP. Nevertheless, STSPPs offer less switching overhead when having a period $P_{STSPP} > \frac{P_{MTSPP}}{N}$.
Since the derivation of the period is independent of the partition type, the period would be equal for either a STSPP or a MTSPP, such that $P_{STSPP} = P_{MTSPP}$ and $N > 1$ for a MTSPP. Thus, the choice falls on STSPPs for realizing the static resource partitions derived in section 4.5, as they introduce less switching overhead within the same period than a MTSPP.
However, it may be possible to create MTSPPs with longer periods than derived in section 4.5.2 and less switching overhead than a STSPP providing the same bandwidth. The problem then is to find such an appropriate MTSPP. Up to now, there is no better way known than using the schedulability test presented by Mok and Feng (see section 4.2.1) on each possible MTSPP for a RTVM.
The number of possibilities to generate a MTSPP for $RTVM_i$ with a given period $P_i$ is given by
\[ \binom{P_i}{\alpha_i \cdot U(RTVM_i) \cdot P_i} \tag{4.26} \]
The term $\alpha_i \cdot U(RTVM_i) \cdot P_i$ represents the time to be allocated within period $P_i$ according to theorem 4.1. Thus, this would be a factorial complexity of $O(P_i!)$ schedulability tests to be performed for each RTVM. An example for placing the activation slots within the first RTVM is depicted in figure 60. Note that when placing multiple MTSPPs, the already allocated activation slots need to be considered to prevent a multiple allocation of a resource. When allocating the same activation slot to multiple MTSPPs, the number of MTSPPs sharing this slot defines the number of processor cores needed.
Figure 60: Example for placing the activation slots of a MTSPP (time axis from 0 to 8)
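To get a feeling for the size of this search space, the count can be evaluated for a small case. The sketch assumes that equation 4.26 is read as a binomial coefficient and that time is divided into unit slots; the function name `mtspp_candidates` is hypothetical.

```python
from math import comb


def mtspp_candidates(period_slots, allocated_slots):
    """Number of ways to choose the active unit slots within one period."""
    return comb(period_slots, allocated_slots)


# Period of 8 unit slots, of which half have to be allocated.
print(mtspp_candidates(8, 4))    # 70
print(mtspp_candidates(16, 8))   # 12870
```

Doubling the period from 8 to 16 unit slots already raises the number of candidate MTSPPs from 70 to 12870, each of which would need its own schedulability test.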
In the case of STSPPs, the number of possibilities for placing a single STSPP within period $P$ is restricted to $P - (E_1 - S_1)$ (see section 4.5.1). A test of all possibilities for $n$ STSPPs would result in a complexity of $O(n \cdot P)$, as $P_i$ of a real-time virtual machine $RTVM_i$ can be calculated as the gcd of its tasks' deadlines. Figure 61 shows an example for placing the activation slots of a STSPP.
Figure 61: Example for placing the activation slots of a STSPP (time axis from 0 to 8)
In case of a single core system, the resource being partitioned by the STSPPs must be allocated by only one partition at a time. Thus, there might be no solution for STSPP combinations when the periods $P_i$ for the STSPPs are determined independently using the gcd for the local tasksets only. According to Lemma 4.2, it is possible to derive $P_i$ as gcd of all deadlines in the system:
\[ P_i = \gcd\Big(\Big\{T_k \mid \tau_k(T_k, C_k) \in \bigcup_{i=1,\ldots,n} RTVM_i(\Gamma)\Big\}\Big) \tag{4.27} \]
Thus, $\forall i = 1, \ldots, n : P = P_i$ with $n$ being the total number of RTVMs within the VRS. This means each period $P_i$ of each $STSPP_i$ is equal to the period $P$. The period $P$ is a divisor of all deadlines in the system and $\frac{\sum_{\gamma_i} (E_j - S_j)}{P_i} = \alpha_i \cdot U(RTVM_i)$; thus, due to lemma 4.2, all RTVMs are schedulable on the virtual real-time system VRS. Now, the remaining question is how to derive the activation slot lengths of each STSPP and how to schedule them.
This is achieved by starting with the first STSPP at the beginning of the period and determining the length of its activation slot by multiplying the utilization of its RTVM by the length of the period $P$. Additionally, this value needs to be scaled by $\alpha_i$, i.e. relative to the utilization bound of the applied scheduling algorithm. In other words, the activation slots are concatenated one after another with their length being proportional to their utilization relative to the whole system.
\[ S_1 = 0 \tag{4.28} \]
\[ E_i = S_i + \alpha_i \cdot U(RTVM_i) \cdot P \tag{4.29} \]
\[ S_i = E_{i-1} \tag{4.30} \]
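The construction of the root-level schedule can be sketched as follows. The function names `common_period` and `place_stspps` are hypothetical, integer task periods are assumed for the gcd, and the shares passed to `place_stspps` stand for the products $\alpha_i \cdot U(RTVM_i)$. The example values match the three STSPPs of figure 62.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def common_period(tasksets):
    """Equation 4.27: gcd of all task periods (deadlines) in the system."""
    return reduce(gcd, (T for taskset in tasksets for T, _ in taskset))


def place_stspps(shares, period):
    """Equations 4.28 to 4.30: concatenate one activation slot per RTVM."""
    slots, start = [], Fraction(0)   # S_1 = 0
    for share in shares:             # share = alpha_i * U(RTVM_i)
        end = start + share * period  # E_i = S_i + share * P
        slots.append((start, end))
        start = end                   # S_{i+1} = E_i
    if start > period:
        raise ValueError("the shares do not fit into one period")
    return slots


# Tasksets given as (T, C) pairs with integer periods.
print(common_period([[(8, 2), (10, Fraction(5, 2))]] * 2))  # 2

shares = [Fraction(1, 2), Fraction(1, 6), Fraction(1, 3)]
for s, e in place_stspps(shares, 6):
    print(s, e)
```

For $P = 6$ the three slots come out as $(0, 3)$, $(3, 4)$ and $(4, 6)$, so the period is filled completely and no two partitions overlap.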
Figure 62 shows a small example for three STSPPs with $P = \gcd\big(\bigcup_{i=1}^{3} \{T_k \mid \tau_k \in RTVM_i(\Gamma)\}\big) = 6$, $\alpha_1 \cdot U(RTVM_1) = \frac{1}{2}$, $\alpha_2 \cdot U(RTVM_2) = \frac{1}{6}$ and $\alpha_3 \cdot U(RTVM_3) = \frac{1}{3}$. The order in which the STSPPs are placed is not relevant, as for each $RTVM_i$, the required utilization has to be provided not later than $P$.
Figure 62: Example for placing the activation slots of three STSPPs to ensure the schedulability of the whole system ($S_1 = 0$, $E_1 = 3$; $S_2 = E_1$, $E_2 = 4$; $S_3 = E_2$, $E_3 = 6$; resulting in $\Pi_1 = (\{(0,3)\}, 6)$, $\Pi_2 = (\{(3,4)\}, 6)$, $\Pi_3 = (\{(4,6)\}, 6)$)
Reconsidering the example given for the invalid choice of the period $P$ in figure 57 of section 4.5.2, it is now easily possible to derive a solution for this scenario, which is depicted in figure 63.
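The effect of the period choice on this scenario can be checked with a small discrete-time simulation of EDF inside a single STSPP. This is only an illustrative sketch, not the evaluation method of this thesis: the function name, the tick size of half a time unit and the synchronous release of all tasks at $t = 0$ are assumptions.

```python
def first_deadline_miss(tasks, slot, period, horizon, ticks_per_unit=2):
    """Simulate EDF inside one STSPP; return the first miss time or None.

    `tasks` is a list of (T, C) pairs, `slot` the activation slot (S, E)
    within `period`. All values must be multiples of 1 / ticks_per_unit.
    """
    k = ticks_per_unit
    start, end, p = int(slot[0] * k), int(slot[1] * k), int(period * k)
    jobs = []  # each job is [absolute deadline in ticks, remaining ticks]
    for tick in range(int(horizon * k) + 1):
        # A job that is still unfinished at its deadline is a miss.
        for deadline, remaining in jobs:
            if deadline <= tick and remaining > 0:
                return tick / k
        # Release the next job of every task whose period starts now.
        for T, C in tasks:
            if tick % int(T * k) == 0:
                jobs.append([tick + int(T * k), int(C * k)])
        # The partition only executes inside its activation slot.
        if start <= tick % p < end:
            pending = [job for job in jobs if job[1] > 0]
            if pending:
                min(pending, key=lambda job: job[0])[1] -= 1  # EDF choice
    return None


tasks = [(8, 2), (10, 2.5)]
print(first_deadline_miss(tasks, (4, 8), 8, 40))  # 10.0
print(first_deadline_miss(tasks, (1, 2), 2, 40))  # None
```

With the period 8 and the slot $(4, 8)$, the second task has received only 2 of its 2.5 time units when its deadline at $t = 10$ expires. With the period $\gcd(8, 10) = 2$ and the slot $(1, 2)$, no deadline is missed within the hyperperiod of 40 time units.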
4.5.4 Summary
The integration of software components into a big integrated system is the main goal of this thesis. The previous chapter built the infrastructural foundation by providing a highly deterministic VMM with an extensible scheduler interface and the possibility to determine the WCET of the virtualized guests, which is required for the scheduling of the RTVMs.
Within this section, the derivation of the CPU requirements and the derivation of the root-level schedule have been presented under the assumption that only full virtualization is allowed due to licensing restrictions and that all real-time tasks meet their deadlines to realize the temporal isolation property. The related work presented in section 4.2 lacks this complete derivation of the root-level schedule from given real-time systems under the assumption of full virtualization.
First, the derivation of the CPU requirements of the virtual real-time system hosting the RTVMs was realized in section 4.4 in such a way that the system is not overloaded. A general transformation approach based on utilization bounds has been introduced to address this problem. As RTOS schedulers, EDF and RM have been analyzed as examples and a scaling factor to be applied to a normalized system was calculated. With this scaling factor it is possible to derive the minimal CPU speed $f_{VRS}$ of the virtual real-time system. This derived CPU speed can be considered a hint for the system designer, as in general there is hardly any hardware available with the desired clock rate and the assumed linear scaling behavior does not accurately reflect reality³. Nevertheless, the calculated speedup factor is still valid, even when its application as a linear factor to the clock rate is restricted.
Figure 63: Example of a valid partitioned schedule with the period being the GCD of all deadlines instead of $P = 8$. The activation slots have been determined according to equation 4.28. (Three schedules over $t = 0$ to $12$ for VM1 = VM2 = {(8;2), (10;2,5)}: on STSPP1 = {(0;4), 8} and STSPP2 = {(4;8), 8}, on STSPP1 = {(4;8), 8} and STSPP2 = {(0;4), 8}, and on STSPP1 = {(0;1), 2} and STSPP2 = {(1;2), 2}.)
Finally, the derivation of a root-level schedule was presented
in section 4.5 by deriving the parameters of STSPP partitions.
Each partition executes an RTVM and defines an interval within
which this RTVM is executed. This interval is repeated
periodically, with the period of all STSPPs being the GCD of all task
deadlines available in all RTVMs. The schedule of the intervals
within the global period is derived as described in section 4.5.3.
As already mentioned, the derivation of this partitioning is not
covered in the presented related work. So the main contribution
of this section is the possibility of automatically deriving a fully
virtualized virtual real-time system from given real-time systems
without requiring the interaction of the system designer.
4.6 evaluation
The evaluation of FVBTP from given real-time systems presented
in this chapter is first addressed by summarizing the algorithmic
complexity of each step of this approach. This algorithmic
complexity is compared to the current state-of-the-art approach from
academia, namely the open system environment (see 4.2.1). At
first glance this seems to be a comparison of apples and oranges,
since both approaches use different virtualization paradigms. A
valid comparison would be to compare FVBTP with another
methodology that derives the schedule of a fully virtualized
hierarchical real-time system. The related work showed that no
solution exists that completely derives a hierarchical schedule for
fully virtualized systems. Another question may be why this approach
is not compared to existing static schedulers like ARINC 653. The
answer is that FVBTP does not compete with this kind of scheduler,
because it is essentially an extension of this type of scheduler.
In general, FVBTP is an abstraction of this class of schedulers that
additionally provides a methodology to derive the execution
parameters. This has been reasoned in section 4.2.3. The
main difference is the extension allowing the derivation of the
complete virtual real-time system, which has not been addressed
in this field up to now. The result of this approach can thus be
transferred to an ARINC 653 platform. Thus, it is more
interesting to see the strengths and weaknesses of the approach
presented in this thesis compared to the open system environment,
as the latter uses paravirtualization, which is in general
applied for improving virtualization performance (see 3.2.1). This
is realized by evaluating the real execution of tasksets in the
open system environment and with FVBTP. This evaluation was
the focus of the master thesis [Grö10] of Stefan Groesbrink, which
was supervised by me. The results presented in this section are
based on this work. Before presenting the evaluation based on
the real execution hardware, the algorithmic complexity and the
distribution of the GCD are discussed to get a feeling for what to
expect from the real execution results.

3 There are a lot of effects affecting the speed of a program, such as
pipelining and caching. Even when assuming they are not available, a
linear scaling behavior for identical ISAs is restricted by the speed
of the memory bus, which defines the maximum speed of memory accesses.

Task             Complexity
                 Open System Environment   FVBTP
Initialization   O(1)                      O(j)
Runtime          O(n^2)                    O(1)
Table 10: Comparison of the algorithmic complexity of the open system
environment and FVBTP
4.6.1 Algorithmic complexity
The algorithmic complexity is determined for the initialization
process and the runtime overhead of the open system environment
and the approach presented in this thesis. Let n denote the
number of RTVMs being executed and let j denote the number
of all tasks available in the virtual real-time system. The
transformation of the tasksets, namely the adaptation of the worst
case execution times to the underlying hardware (see 4.4), has a
complexity of O(j), as each task needs to be scaled to the target
virtual real-time system. The derivation of the root-level schedule
requires the calculation of j greatest common divisors to calculate
the global period P and the activation slots for each STSPP.
The worst case execution time of determining the gcd of two
k-bit numbers, which can be found in the literature [Sed09], is
$O(k^2)$. Current hardware architectures provide either 32-bit or
64-bit numbers. The gcd of j numbers can be determined by
hierarchical application of the gcd.
$$\gcd(j_1, \ldots, j_n) = \gcd(j_1, \gcd(j_2, \gcd(j_3, \ldots, \gcd(j_{n-1}, j_n)))) \tag{4.31}$$
The number of iterations is bounded by O(j) [Bra70]. Thus, the
total complexity of determining the gcd of j k-bit numbers is
$O(j k^2)$. Since k is constant, being either 32 or 64 bits on
current hardware architectures, the complexity finally
is O(j). Both the transformation and the derivation of the
root-level schedule have a complexity of O(j), and thus the total
complexity of the initialization step is also O(j).
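The nested application of the gcd in equation 4.31 can be illustrated with a short sketch. The function name `global_period` is chosen for this example only. Note that `functools.reduce` folds from the left instead of nesting to the right as in equation 4.31; since the gcd is associative, the result is the same and the number of pairwise gcd computations remains one less than the number of inputs.

```python
from functools import reduce
from math import gcd

def global_period(periods):
    """Determine the gcd of all task periods by repeated pairwise application of gcd."""
    if not periods:
        raise ValueError("at least one task period is required")
    return reduce(gcd, periods)

print(global_period([8, 10]))     # -> 2, periods of the guests shown in figure 63
print(global_period([3, 3, 42]))  # -> 3, periods in ms of the linear motor control scenario
```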
The initialization of the partitioning policy of the open system
environment only requires the initialization of the CBS servers,
which requires constant time for each server. Thus, the complexity
of the initialization process of the open system environment
is O(n), while it is O(j) for FVBTP. Note that in general
there are more tasks than virtual machines ($n \le j$).
Finally, the complexity of the scheduler at runtime is determined.
For the open system environment, this complexity is given by
the maintenance of n servers and the determination of the next
deadline among all servers. The maintenance can be realized in
constant time, while the determination of the next deadline is
given by the complexity of inserting one element into a sorted
list. Using a linear list, this complexity is O(n). Thus, with n
servers to be handled, the total complexity is $O(n^2)$ for the
runtime overhead of the open system environment. In case of FVBTP
this complexity is constant, as the schedule can be stored as an
array of activation slots being repeated after P time units. Thus,
the runtime complexity is O(1).
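The constant runtime complexity of a table-driven schedule can be illustrated as follows. This is a minimal sketch under the assumption that the activation slots are stored as an array of (start, end, VM index) entries; the class `StaticSlotSchedule` and its methods are hypothetical and do not reproduce the scheduler interface of the VMM.

```python
class StaticSlotSchedule:
    """Hypothetical table-driven root-level schedule that repeats every `period` time units."""

    def __init__(self, slots, period):
        # slots: list of (start, end, vm_index), sorted by start within [0, period]
        self.slots = slots
        self.period = period
        self.current = -1  # index of the currently active slot, -1 before the first call

    def next_vm(self):
        """Advance to the next slot; one index increment, independent of the number of VMs."""
        self.current = (self.current + 1) % len(self.slots)
        return self.slots[self.current][2]

    def next_timer_event(self, period_start):
        """Absolute end of the current slot, at which the VMM has to be reactivated."""
        return period_start + self.slots[self.current][1]

schedule = StaticSlotSchedule([(0, 3, 0), (3, 4, 1), (4, 6, 2)], period=6)
print([schedule.next_vm() for _ in range(4)])  # -> [0, 1, 2, 0]
print(schedule.next_timer_event(6))            # -> 9, end of the first slot in the second period
```

Both operations only touch one array entry, whereas an online approach with a sorted server list has to reinsert elements at every scheduling decision.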
[Figure: GCD distribution of n=1..6 tasks with periods of 1 to 10 ms; relative frequency (0 to 1) over the GCD value (1 to 10), one curve each for 2, 3, 4, 5 and 6 tasks]
Figure 64: Distribution of the GCD for up to 6 tasks with periods up
to 10ms
4.6.2 Distribution of the GCD
The performance of FVBTP is heavily dependent on the GCD of all
deadlines available in the system, as small GCDs heavily increase
the switching overhead within the schedule. Intuitively, one expects
a lot of low GCD results for the possible combinations of deadlines
that exist. Unfortunately, this expectation is correct, and figure 64
shows the distribution of the GCD for n = 2, ..., 6 tasks, where all
possible combinations of deadlines for periods T = 1ms, ..., 10ms
have been considered.
As one can see, the distribution has its maximum at a GCD value
of one and decreases rapidly as the GCD gets larger.
When increasing the number of tasks, the probability of getting
a GCD of one increases, while the probability of getting a higher
GCD decreases. So theoretically, the approach of FVBTP seems
to be unsuitable for most cases and seems to work only
for rare special cases. Fortunately, these special cases are not
as rare for hard real-time systems as one would expect. This
will be shown in the following sections, where the evaluation on
the real execution platform with realistic scenarios is presented.
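The shape of this distribution can be reproduced with a small enumeration. The following sketch assumes that "all possible combinations" means all ordered tuples of integer periods from 1 to 10 with repetition; the text does not state how the combinations were enumerated, so the absolute values may differ from figure 64.

```python
from collections import Counter
from functools import reduce
from itertools import product
from math import gcd

def gcd_distribution(num_tasks, max_period=10):
    """Relative frequency of each GCD over all ordered period tuples in 1..max_period."""
    counts = Counter(
        reduce(gcd, combo)
        for combo in product(range(1, max_period + 1), repeat=num_tasks)
    )
    total = max_period ** num_tasks
    return {value: counts[value] / total for value in sorted(counts)}

# Share of period combinations with a GCD of one for 2 to 6 tasks
for n in range(2, 7):
    print(n, round(gcd_distribution(n)[1], 3))
```

Under this assumption, 63 of the 100 period pairs of two tasks already have a GCD of one, and the share grows with every additional task.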
[Figure: timing diagram with the labels S, E, t1, t2, Trigger GPIO High, Trigger GPIO Low, Sig Prop, Time to measure, GPIO Pin High, GPIO Pin Low and Measured time by Logic Analyzer]
Figure 65: Possible measurement error due to signal propagation
time.
4.6.3 Experimental Setup
As execution platform for the real execution, the PowerPC405FX
system as presented in section 3.4 was used, since the designed
VMM was implemented on this platform. In contrast to section 3.4,
the measurements in this section were taken using an Agilent
logic analyzer connected directly to 20 general purpose I/O
pins (GPIOs) of the hardware platform. These GPIOs have been
used to signal the different start and stop times of the evaluated
parts of the VMM. The logic analyzer is able to detect rising
and falling edges on these GPIOs with a sampling frequency
of 800 MHz, which is more than two times higher than the
frequency of the evaluated hardware. This fulfills the requirement
stated by the Nyquist-Shannon sampling theorem [Sha49] that the
sampling frequency is at least two times higher than the frequency
of the sampled signal. Thus, the worst case error induced by the
logic analyzer would be equal to ±1.25ns for each detection of
a falling or rising edge, as the sampled signal may just be hit
shortly before or after the transition occurs. This error is equal to
approximately a third of a processor cycle and is thus negligible
for the following measurements. [Grö10]
Besides the error induced by the logic analyzer, the signal
propagation delay from the triggering instruction to the GPIO needs
to be considered when measuring time intervals. Figure 65 shows
the effects occurring when measuring a specific time interval with
the logic analyzer. [Grö10]
To determine the length of the interval, E and S have to be
determined. In the direct measurement of the GPIO signal, there is
a delay between the finish time of the instruction triggering the
GPIO to high and the time at which the GPIO pin changes its
state, caused by the signal propagation. Thus, the value of the GPIO
pin rises to high when the time to measure may have already
started. The instruction triggering the GPIO to change its state
to low is executed directly after the time to measure, and again
the signal propagation time delays the state change
of the GPIO pin. Thus, the difference between S and E can be
calculated as:
$$E - S = t_2 - SigProp - TriggerInstr - (t_1 - SigProp)$$
$$E - S = t_2 - t_1 - TriggerInstr$$
So the signal propagation time has no influence on measuring
the time interval. The only error is given by the execution time
of the instruction that triggers the GPIO. The worst case would
be the alignment of this instruction at the beginning of a cache
line that needs to be fetched for execution. The WCET of
handling the line fill has already been determined in section 3.4.2
and is equal to 52 cycles. Additionally, the access to the GPIO
is realized by memory mapped I/O, which introduces an
additional overhead of 52 cycles for the memory access. Thus, the
error is 78 cycles ±26 cycles, because the memory access is
always required while the line fill may not occur. This error needs
to be considered for each measurement taken in the
following analysis.
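The error bound and its use for correcting a measured interval can be written down as a short calculation. The sketch below assumes the 300 MHz clock of the evaluation platform for converting cycles to nanoseconds; the function names are chosen for this example only.

```python
LINE_FILL_CYCLES = 52     # WCET of a cache line fill (section 3.4.2)
MMIO_ACCESS_CYCLES = 52   # memory mapped I/O access to the GPIO
CPU_HZ = 300_000_000      # assumption: 300 MHz clock of the evaluation platform

def trigger_error_cycles():
    """Return (best, worst, center, half_width) of the trigger instruction time in cycles."""
    best = MMIO_ACCESS_CYCLES                      # memory access only, no line fill
    worst = MMIO_ACCESS_CYCLES + LINE_FILL_CYCLES  # memory access plus line fill
    return best, worst, (best + worst) / 2, (worst - best) / 2

def cycles_to_ns(cycles, cpu_hz=CPU_HZ):
    """Convert processor cycles to nanoseconds."""
    return cycles * 1e9 / cpu_hz

def corrected_interval_ns(t1_ns, t2_ns, trigger_cycles):
    """E - S = t2 - t1 - TriggerInstr, with all times in nanoseconds."""
    return t2_ns - t1_ns - cycles_to_ns(trigger_cycles)

best, worst, center, half_width = trigger_error_cycles()
print(center, half_width)    # -> 78.0 26.0
print(cycles_to_ns(center))  # -> 260.0
```

At 300 MHz, the center of the error interval corresponds to 260 ns, which has to be subtracted from every interval measured between two GPIO edges.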
4.6.4 Scheduling performance
For determining the scheduler performance, the execution times
of the methods specified in the extensible scheduler interface
(see 3.3.5) have been determined using the experimental setup
described in the previous section. The measurements can be
found in A.1 and are taken from [Grö10].
The method sched_init is responsible for the initialization of the
schedulers. As expected, the overhead induced by determining
the schedule offline in case of FVBTP is significantly larger than
in the initialization phase of the open system environment. The
complexity of determining the root-level schedule is O(j), with j
being the number of available real-time tasks in the system (see
4.6.1). In case of the open system environment, the
initialization step is O(n), with n being the number of RTVMs (see
4.6.1). Figure 66 shows the scaling behavior of sched_init with an
increasing number of tasks/RTVMs. As one can see, the expected
linearity of the initialization phase is also reflected in the
measurements. To give a prediction, a linear regression was applied
to the available measurements, and as a result, a gradient of
a = 42.663 µs/task was determined with a coefficient of
determination of $R^2 = 0.99744$, which implies a very good
predictability.
As expected, the offline overhead of FVBTP is higher than the
overhead of the open system environment. Nevertheless, even
for a large virtual real-time system with 1000 tasks, the FVBTP
offline derivation of the schedule would only need about 42
milliseconds on a 300MHz PowerPC. Thus, even a reset of the
system requiring a recalculation of the schedule seems to be
realizable without violating the timing constraints for
resetting such a system.

Figure 66: Scaling behavior of sched_init. For measurement values see
tables 22, 23 and 26.
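The prediction for 1000 tasks follows directly from the regression gradient. The sketch below assumes an intercept of zero, since the intercept of the regression line is not given in the text; the function name is chosen for this example only.

```python
GRADIENT_US_PER_TASK = 42.663  # gradient of the linear regression for sched_init

def predicted_init_time_ms(num_tasks, intercept_us=0.0):
    """Linear prediction of the sched_init execution time in milliseconds.

    The intercept is an assumption (0 by default), as it is not stated in the text.
    """
    return (GRADIENT_US_PER_TASK * num_tasks + intercept_us) / 1000.0

print(round(predicted_init_time_ms(1000), 3))  # -> 42.663
```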
Scheduler   Min [µs]   Max [µs]   Mean [µs]   Confidence interval (0.95)
FVBTP       0.917      1.240      1.033       [1.017;1.050]
OSE         0.920      1.200      1.037       [1.021;1.054]
Table 11: Performance of scheduler interface routine
sched_getNextTimerEvent [Grö10].
The next method of the extensible scheduler interface to be
analyzed is the method sched_getNextVMIndex, which is the essential
scheduling method that determines the next VM to be executed
out of the ready queue. In the case of FVBTP, this method is
expected to show constant execution time behavior, while in the
case of the OSE this method is expected to show $O(n^2)$ behavior.
Figure 67 shows the measured behavior for a growing number of
virtual machines. As one can see, OSE shows quadratic behavior
for a growing number n of RTVMs with a coefficient of
determination of $R^2 = 0.9996$. The FVBTP sched_getNextVMIndex
shows the expected constant behavior, and even in the case of
a small set of RTVMs (n = 6) its execution time is significantly
lower (by a factor of 10) than that of the sched_getNextVMIndex
of the OSE.
As already stated at the beginning of this evaluation section, this
comparison is quite unfair, as OSE is an online and FVBTP is an
offline scheduler. Nevertheless, FVBTP had to be compared to
an existing approach, and as FVBTP is the only known offline
approach, a comparison to the state of the art online approach
was obvious.
Finally, the performance of the method sched_getNextTimerEvent
had to be analyzed. In both cases, this method returns the point
in time for which a timer interrupt is set to reactivate the VMM for
scheduling reasons. In the case of OSE, this is the next deadline,
which is accessible in O(1) by reading the relative deadline from
the PCB of the first task in the ready queue and adding this value to
the current time. In the case of FVBTP, the next timer event is
determined by the endpoints of the activation slots, which are
also accessible in O(1). Table 11 shows the results of the
measurements for FVBTP and OSE. Both methods show a nearly
identical performance of about 1µs in the mean.
Figure 67: Scaling behavior of sched_getNextVMIndex

As expected, FVBTP shows the better runtime performance for
the scheduling decision and a worse performance for the
initialization phase when compared to OSE. However, there is
still an issue not addressed up to now that has significant
influence on the overhead introduced by those schedulers. This issue
is the amount of overhead caused by VM context switches.
Therefore, two values need to be determined. The first one is the
overhead introduced by a single context switch and the second is the
number of context switches caused by the scheduler. The first
value is determined in the following section, and the second
is addressed by performing a scenario based evaluation
using realistic scenarios.
4.6.5 Context Switching
Like any other system software enabling the use of multipro-
gramming on a single core system, virtualization adds overhead
due to switching between the virtualized environments. This
switching process is in general identical to a classical context
switch known from operating systems and consists of the
following steps:
1. Save context (Hardware Registers, Virtual Registers, Vir-
tual memory data structures)
2. Scheduling decision
3. Restore context of next VM
Such a virtual machine context switch (VMCS) needs to be
performed every time a VM is suspended and another VM becomes
active. The VMCS starts at the point in time where the last
VM was suspended and ends when the next VM starts its
execution. During this time, only the VMM is active. The steps
1 and 3 are independent of the applied root-level scheduling
policy, while step 2 is directly influenced by the scheduling
policy. To quantify the VMCS, the duration of the save and restore
process needs to be determined. With this result, the VMCS can be
derived for a specific scheduling scenario. Table 12 shows the
results of the VMCS measurements performed for the OSE scheduler
with two virtual machines, and table 13 shows the results
for the FVBTP scheduler. As one can easily see, the save and
restore process needs about the same amount of time of about 33µs
in both cases and only differs in the order of 500ns in the mean,
which can occur due to code positioning. The time needed for the
scheduling includes the calls of the extensible scheduler interface
methods sched_getNextVMIndex and sched_getNextTimerEvent. The
difference of about 5µs to the sum of the measured values is in both
cases caused by the code embedding the calls to these methods.

OSE        Min [µs]   Max [µs]   Mean [µs]   Confidence interval (0.95)
Total      129.123    130.042    129.545     [129.512;129.579]
context    33.087     33.969     33.716      [33.687;33.746]
schedule   95.522     96.162     95.823      [95.764;95.852]
Table 12: VMCS for Open System Environment for n=2. [Grö10]

The scheduling decision itself is the dominating part in case of
OSE. Already in the case of two virtual machines, the scheduling
overhead is about three times larger than the overhead
induced by saving and restoring the context. In case of FVBTP, the
scheduling decision takes about a third of the time needed to save
and restore the context. Thus, the complexity of VM switching is in
the same order of magnitude as that of a normal context switch. The
main problem of OSE is the fact that the scheduling decision in
the worst case grows quadratically in the number of virtual
machines, which leads to an even higher impact of the scheduling
decision on a VMCS. The main problem of FVBTP is a small
GCD. To show the impact of the GCD on the VMCS overhead,
figure 68 shows the expected overhead for a GCD ranging from
1µs to 10ms for 2 up to 10 VMs. This overhead was determined
by
$$Overhead(GCD, VMs) = \frac{(VMs - 1) \cdot VMCS_{FVBTP}}{GCD} \tag{4.32}$$
It is easy to see the high impact of GCDs in the magnitude
of µs on the switching overhead for the used evaluation
platform. When restricting the granularity in time to milliseconds,
FVBTP seems to be applicable in most cases. This restriction
also seems to be valid for OSE, as the cost of a single
VMCS is already in the order of 0.13ms in case of two virtual
machines. The final question to answer is how many VMCS are
performed by OSE and FVBTP in realistic scenarios, as this allows a
final evaluation of the overhead introduced by both scheduling
approaches. This discussion will be performed in the following
section using different realistic scenarios.
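Equation 4.32 can be evaluated with a few lines of code. This is a minimal sketch, assuming that the factor in the numerator is (VMs − 1) and that $VMCS_{FVBTP}$ is the mean total VMCS time of FVBTP from table 13; the function name is chosen for this example only.

```python
VMCS_FVBTP_US = 42.751  # mean total VMCS time of FVBTP (table 13), in microseconds

def vmcs_overhead(gcd_us, num_vms, vmcs_us=VMCS_FVBTP_US):
    """Fraction of the CPU time spent on VM context switches following equation 4.32."""
    if gcd_us <= 0 or num_vms < 1:
        raise ValueError("the GCD must be positive and at least one VM is required")
    return (num_vms - 1) * vmcs_us / gcd_us

# Overhead as a fraction of the CPU time for GCDs of 0.1 ms, 1 ms and 10 ms
for gcd_us in (100, 1000, 10000):
    print(gcd_us, [round(vmcs_overhead(gcd_us, vms), 4) for vms in (2, 5, 10)])
```

A result above 1 means that the context switches alone would need more time than the period provides.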
FVBTP      Min [µs]   Max [µs]   Mean [µs]   Confidence interval (0.95)
Total      42.480     42.962     42.751      [42.725;42.776]
context    32.977     33.577     33.281      [33.251;33.311]
schedule   9.360      9.643      9.465       [9.450;9.479]
Table 13: VMCS Performance for FVBTP. [Grö10]

[Figure: FVBTP VMCS overhead [%] (0 to 1) over the GCD [µs] (1000 to 10000), one curve each for 2, 3, 4, 5, 10, 20, 40, 60, 80 and 100 VMs]
Figure 68: VMCS overhead for periods from 1µs up to 10ms on
PowerPC405 @300 MHz
4.6.6 Case Study
To sum up the evaluation up to this point, FVBTP introduces
significantly less runtime overhead than OSE, but suffers from
a very unfavorable GCD distribution (see 4.6.2), which is decisive
for the total number of VMCS needed. So one may expect that OSE
outperforms FVBTP despite the higher runtime overhead caused by
its expensive scheduling decisions. Nevertheless, this section
will show that FVBTP is applicable to realistic scenarios and that
it outperforms OSE even for small values of the GCD.
The scenarios presented in [Grö10] cover varying timing
granularities of embedded systems ranging from microseconds to
seconds. The computation times of the presented scenarios are
adapted to be able to consolidate them on the evaluation
platform presented in 4.6.3, because in some cases a system with
more than 300 MHz would have been necessary due to the
calculated required speedup for FVBTP. In these cases, it has been
assumed that the 300MHz platform is able to execute those
tasksets with the same speed as the required platform.
Electrical Drive Engineering - Linear Motor Control
The first scenario presented in [Grö10] is taken from the CRC 614
(see footnote 4) and covers the control of a linear motor developed
at the University of Paderborn. It includes two parts. The first part
is responsible for controlling the current and the second part is
responsible for controlling the speed of the motor. The current
control involves two tasks, while the speed control only involves one
task. The tasksets are listed in table 14.
When taking a look at the final results for the depicted scenario
shown in table 15, one can see that OSE introduces about three times
more VMM overhead while the numbers of VMCS are of the same
magnitude. This is due to the VMCS of OSE being three times
slower than the VMCS of FVBTP in the case of two virtual
machines. FVBTP assigns 57.46% to the guests in case of EDF and
69.13% in case of RM. This is caused by the pessimistic approach
of scaling based on the least upper bound for the RM taskset.
Here, OSE has an advantage of 11.67% less guest load.
Nevertheless, this advantage is nearly equalized by the overhead
caused by the VMM.

4 The Collaborative Research Centre 614 "Self-Optimizing Concepts and
Structures in Mechanical Engineering"

Guest System          Task            Computation Time [ms] *   Period [ms]
(1) Current Control   Controller C1   0.25                      3
                      Controller C2   0.25                      3
(2) Speed Control     Controller S    0.25                      42
* On a 1GHz processor
Table 14: Scenario IV: Electrical Drive Engineering - Linear Motor
Control. [Grö10]
Figure 69 shows the resulting schedules of OSE and FVBTP. In
case of OSE the speed controller is executed by a single RTVM
colored green. The two current controllers are colored
blue. At the very beginning, the schedule of OSE shows its
property of introducing more context switches than FVBTP due to
its global EDF nature. The higher number of VMCS in case of
OSE is caused by the fact that a task that terminates before
its deadline returns control to the VMM to trigger a new
scheduling decision, because the head of the scheduling queue
has changed. This is not necessary in case of FVBTP, which leads to
a continuous execution within the activation slot even when a
task terminates within this activation slot. Thus, an increasing
number of tasks in a VM causes OSE to perform more context
switches at VMM level. Considering an additional controller
task with the same period and the utilization equally distributed
among these three controller tasks, OSE would introduce 15
additional context switches during the hyperperiod. Considering
the worst case of the GCD being one millisecond, the VMM
overhead of FVBTP would increase to 8.55%, as the number of VMCS
would increase to 84. Nevertheless, the total VMM overhead in
this case is still less than the VMM overhead of OSE. Thus, the
granularity of time used for the tasksets is an essential property
influencing the performance of FVBTP. As a result, one can see
that FVBTP has not been outperformed by OSE in this scenario,
even if in the worst case scenario the GCD is one millisecond.
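The VMM percentages of this scenario can be reproduced from the number of VMCS and the mean VMCS times. The following sketch assumes that the VMM overhead is the number of VMCS multiplied by the mean total VMCS time (tables 12 and 13) and divided by the hyperperiod, and that the hyperperiod is 42 ms, the least common multiple of the periods 3 ms and 42 ms; the function name is chosen for this example only.

```python
def vmm_overhead_percent(num_vmcs, vmcs_us, hyperperiod_ms):
    """Share of the hyperperiod spent in VM context switches, in percent."""
    return 100.0 * num_vmcs * vmcs_us / (hyperperiod_ms * 1000.0)

HYPERPERIOD_MS = 42      # assumption: lcm of the periods 3 ms and 42 ms
VMCS_OSE_US = 129.545    # mean total VMCS time of OSE for n=2 (table 12)
VMCS_FVBTP_US = 42.751   # mean total VMCS time of FVBTP (table 13)

print(round(vmm_overhead_percent(30, VMCS_OSE_US, HYPERPERIOD_MS), 2))    # -> 9.25 (OSE)
print(round(vmm_overhead_percent(28, VMCS_FVBTP_US, HYPERPERIOD_MS), 2))  # -> 2.85 (FVBTP, GCD 3 ms)
print(round(vmm_overhead_percent(84, VMCS_FVBTP_US, HYPERPERIOD_MS), 2))  # -> 8.55 (FVBTP, GCD 1 ms)
```

The 28 and 84 VMCS of FVBTP correspond to two context switches per period for periods of 3 ms and 1 ms within the hyperperiod of 42 ms.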
It has to be noted that the original periods of the current
control tasks were given as 3.051667ms and 3.183013ms, while the
period of the speed control task was given as 43.473157ms by the
mechanical engineers. Now why is it possible to use 3ms and 42ms? The
property of periods being multiples of each other is quite
common in control systems. This is due to the fact that the sampling
and control frequency is freely selectable within certain limits
dictated by the technical restrictions. Usually, the lower limit is
fixed while the upper limit is not. The frequency of the
actuators is then optimally chosen as a multiple of the sampling
frequencies [Lit04, Ric08]. Thus, it was possible to increase the
rates to get a GCD of 3ms in this case for FVBTP. Nevertheless,
FVBTP would even have been capable of handling a GCD of 1ms.
So as a final result, both approaches were able to consolidate two
real-time systems that were previously executed separately on a 1GHz
computer, with FVBTP and OSE causing nearly the same number of
VMCS. The following scenarios will show a different behavior
of OSE, as already mentioned above.

[Figure: schedules of OSE and FVBTP over t = 0 to 10, each showing the tasks C1, C2 and S and the VMs 1 and 2]
Figure 69: Schedules using OSE and FVBTP

                           Illinois (OSE)     Paderborn (FVBTP)
                           Guest Scheduler    Guest Scheduler
Scenario                   EDF      RM        EDF      RM
(IV) Motor   VMM [%]       9.25     9.25      2.85     2.85
Control      Guests [%]    57.46    57.46     57.46    69.13
             Idle [%]      33.29    33.29     39.69    28.02
             U [%]         66.71    66.71     60.31    71.98
             VMCS          30       30        28       28
Table 15: Electrical Drive Engineering Evaluation. [Grö10]
Guest System        Task                   Computation Time [ms] *   Period [ms]
(1) Human-Machine   GUI Control            10                        100
    Interface       GUI Task 1             50                        100
                    GUI Task 2             5                         100
(2) Corporate IT    Industrial Ethernet    1                         10
    Interface       Database Control       50                        100
(3) CNC Control     Servo Control 1        1                         10
                    Servo Control 2        1                         10
                    Servo Control 3        1                         10
                    Sensor Task 1          1                         10
                    Sensor Task 2          1                         10
                    Sensor Task 3          1                         10
                    Emergency Stop Check   1                         10
* On a 100 MHz processor
Table 16: Scenario I: Industrial - CNC Machine. [Grö10]
Industrial - CNC Machine
The second scenario presented in [Grö10] covers an industrial
automation example. For this purpose, three systems of a CNC (see
footnote 5) machine control are consolidated on a single virtualized
system. One of the systems is responsible for handling the GUI-based
(see footnote 6) human machine interface. This GUI requires a latency
of 100ms to be considered as reacting instantaneously [Mil68, Nie93].
Another system is responsible for database access via an industrial
ethernet connection to acquire the necessary information about
the workpiece in production, and another system is responsible
for sensing and controlling the actuators of the CNC machine
[WLM+10]. The tasksets can be found in table 16.
This scenario shows the impact of multiple tasks within one
RTVM on the number of VMCS of OSE. The results are shown
in table 17. In this scenario, OSE needs to perform 129 VMCS
during the hyperperiod of 100ms, which leads to a total VMCS
overhead of 24.58%. In contrast, FVBTP causes an overhead
of only 1.29% for the GCD being 10ms. Even in the worst case of the
GCD being 1ms, the total VMCS overhead of 12.82% would
still be about half of the overhead introduced by OSE. Unfortunately,
this would cause a total utilization of 98.05% when using FVBTP
with RM as guest scheduler due to the pessimistic activation
slot allocation. Thus, the total utilization would be about 8%
more than the utilization of OSE in that case.

5 Computer Numerical Control (CNC) refers to the automation of machine
tools that are operated by programmed commands
6 Graphical User Interface

                           Illinois (OSE)     Paderborn (FVBTP)
                           Guest Scheduler    Guest Scheduler
Scenario                   EDF      RM        EDF      RM
Industrial   VMM [%]       24.58    24.58     1.29     1.29
CNC          Guests [%]    65.00    65.00     65.00    83.93
Machine      Idle [%]      10.12    10.12     33.71    14.78
             U [%]         89.88    89.88     66.29    85.22
             VMCS          129      129       30       30
Table 17: Industrial - CNC Machine Evaluation. [Grö10]
Medical - X-ray Machine
The third scenario shown in [Grö10] covers the medical domain
and includes the subsystems typical for medical applications.
The first subsystem is the human-machine interface and the
second subsystem is responsible for controlling the X-ray unit.
The human-machine interface has three tasks. The first task
controls the GUI. The second task performs some image
processing on the gathered image, and the final task visualizes
the resulting image. The frame rate is about one image every half
a second. The control subsystem needs to perform the sensing and
the control of the actuators to position the X-ray beam as required.
The AEC (see footnote 7) limits the exposure of an object to the
X-ray beam when the human operator did not turn off the X-ray
beam after a specific period of time [VS06]. The corresponding
taskset of this scenario is shown in table 18.
In this scenario, the execution times and periods of all tasks
are quite long. Thus, OSE and FVBTP had no problem coping
with this example. Table 19 shows the results for this scenario.
One can see that there is still a significant difference in overhead
between OSE and FVBTP, which is expected due to the effects
described in the previous scenarios, but due to the long execution
times and periods, this difference did not have a large impact on
the total overhead introduced by OSE. Thus, both approaches
performed very well in this scenario.

7 Automatic Exposure Control

Guest System        Task               Computation Time [ms] *   Period [ms]
(1) Human-Machine   GUI Control        10                        100
    Interface       Image Processing   200                       500
                    Visualization      100                       500
(2) X-Ray Control   AEC                50                        1000
                    Servo Control 1    20                        100
                    Servo Control 2    20                        100
                    Sensor Task 1      10                        100
                    Sensor Task 2      10                        100
* On a 150 MHz processor
Table 18: Scenario II: Medical - X-ray Machine. [Grö10]

                           Illinois (OSE)     Paderborn (FVBTP)
                           Guest Scheduler    Guest Scheduler
Scenario                   EDF      RM        EDF      RM
(II) Medical VMM [%]       0.97     0.97      0.001    0.001
             Guests [%]    67.50    67.50     67.50    88.61
             Idle [%]      31.53    31.53     32.41    11.30
             U [%]         68.47    68.47     67.59    88.70
             VMCS          75       75        20       20
Table 19: Medical Evaluation. [Grö10]
Automotive - Airbag control and Driver Assistance
The fourth scenario presented in [Grö10] covers an automotive
consolidation of an ACU⁸, an ABS⁹ and an ESC¹⁰. The tasksets of this
scenario are listed in table 20. The airbag control unit runs a
detection task that monitors the results of the different sensor
tasks and triggers the gas inflation if a crash is detected.
For each of the eight sensors¹¹ used in the airbag control, a
corresponding sensor task is executed to handle the readout [Rei07].
The ABS works at a rate of 10 Hz. The sensor task monitors the
8 Airbag Control Unit
9 Anti-lock Braking System
10 Electronic Stability Control
11 Accelerometer, Impact Sensor, Pressure Sensors, Wheel Speed Sensor and Gyroscopes
Guest System                       Task                     Computation Time [ms] *   Period [ms]
(1) Airbag Control Unit            Detection Task            1.5                        15
                                   Sensor Task 1             0.3                        15
                                   Sensor Task 2             0.3                        15
                                   Sensor Task 3             0.3                        15
                                   Sensor Task 4             0.3                        15
                                   Sensor Task 5             0.3                        15
                                   Sensor Task 6             0.3                        15
                                   Sensor Task 7             0.6                        15
                                   Sensor Task 8             0.6                        15
(2) Anti-lock Braking System       Brake Pressure Control   20                         100
                                   Sensor Task              10                          50
                                   CAN Communication         1                           5
(3) Electronic Stability Control   Detection Task            0.9                         5
                                   Servo Control             0.3                         5
                                   Sensor Task 1             0.3                         5
                                   Sensor Task 2             0.3                         5
                                   Sensor Task 3             0.3                         5
                                   Sensor Task 4             0.3                         5
                                   CAN Communication         1                           5
* On a 100 MHz processor
Table 20: Scenario III: Automotive - Airbag Control and Driver
Assistance. [Grö10]
rotational speed of the four wheels. This sensor input is used in
the brake pressure control task to calculate the applied pressure
for the target actuator. The ESC compares the driver's steering
commands, based on a steering angle sensor, with the current
driving conditions. When necessary, the ESC intervenes to keep
the car in a stable and safe driving state. The tasks necessary for
the ESC are executed at a rate of 150 Hz [Bos07]. To prevent
contradictory commands of the ABS and the ESC, both components
are interconnected using a CAN¹² interface [Ise06]. Consolidating
the ABS and the ESC into a virtualized system eliminates the need
for a physical CAN interconnection between those systems, which
can be replaced by virtual network interfaces.
This scenario again shows the impact of multiple tasks in an
RTVM, which leads to an overload condition for OSE. Due to the
12 Controller Area Network
                               Illinois              Paderborn
                               Guest Scheduler       Guest Scheduler
Scenario                       EDF       RM          EDF       RM
(III) Automotive  VMM [%]      52.47     52.47        2.57      2.57
                  Guests [%]   52.67     52.67       52.70     70.64
                  Idle [%]     -         -           44.73     26.79
                  U [%]        > 100     > 100       55.27     73.21
                  VMCS         826       826         180       180
Table 21: Automotive Scenario Evaluation. [Grö10]
strong time-shared behavior of the tasksets within short periods,
a large number of VMCS is necessary for OSE, which results in a
required VMM utilization of 52.47%. Here, FVBTP shows its strength
by performing about 4.5 times fewer VMCS than OSE, as it executes
multiple tasks of an RTVM within a single activation slot. In
addition, each VMCS is around five times more expensive in the case
of OSE, which causes the total VMM utilization difference of about
50 percentage points compared to FVBTP. Considering again the worst
case of a GCD of 1 ms, the VMCS overhead would increase to 12.82%,
with the scenario still being schedulable by FVBTP for both EDF
and RM.
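The two factors named above can be cross-checked against the figures of table 21. The following sketch is only a plausibility check: it assumes that the VMM utilization is proportional to the number of VMCS multiplied by the cost of a single VMCS, which is a simplification introduced here and not a model taken from [Grö10]. All variable names are illustrative.

```python
# Figures taken from table 21 (automotive scenario, EDF guests).
vmcs_ose, vmcs_fvbtp = 826, 180        # number of VM context switches
vmm_ose, vmm_fvbtp = 52.47, 2.57       # VMM utilization in percent

# How many times more switches OSE performs.
count_ratio = vmcs_ose / vmcs_fvbtp
# How many times more VMM time OSE consumes.
util_ratio = vmm_ose / vmm_fvbtp
# Implied cost of one switch, OSE relative to FVBTP
# (assumption: utilization ~ count * cost per switch).
cost_ratio = util_ratio / count_ratio

print(f"{count_ratio:.2f} {util_ratio:.2f} {cost_ratio:.2f}")
```

Under this assumption the count ratio is about 4.6 and the implied per-switch cost ratio about 4.4, which is in line with the "4.5 times" and "around five times" stated in the text.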
4.6.7 Summary
The evaluation showed the expected performance advantage of
FVBTP in all cases of pure EDF-based RTVMs, due to the significantly
higher VMCS overhead of OSE. This overhead, caused by the dynamic
scheduling approach, enabled FVBTP to outperform OSE even in its
worst case of the GCD being 1 ms. The reason is that the VMCS
overhead of FVBTP amounts to only 4.28% of an interval of 1 ms
length. In contrast, OSE needs 12.95% of the same interval, even
with only two RTVMs. With an increasing number of RTVMs, the VMCS
overhead of OSE grows quadratically, reaching 23.6% for four RTVMs,
while the VMCS overhead of FVBTP is constant due to its static
behavior. Thus, the dynamic behavior comes at a high price when
dealing with short periods, as the scenarios have shown. Besides the
pure values of the VMCS overhead, the number of performed VMCS has
been evaluated. For FVBTP, a direct derivation of the number of VMCS
was possible, and a performance graph of the resulting overhead for
a given GCD was presented. In the case of OSE, the number of VMCS
depends on the tasks executed in the virtualized system.
The scenario evaluation showed a significantly higher number of
VMCS for OSE. The reason is the preemptive behavior of OSE: it needs
to perform a scheduling decision in the VMM at every task finishing
time and whenever a CUS with higher priority becomes active, causing
the active RTVM to be preempted.
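The direct derivation of the number of VMCS for FVBTP can be illustrated with a short sketch. It assumes that every RTVM receives exactly one activation slot per cycle of length G (the GCD of all task periods) and that the count is taken over one hyperperiod (the least common multiple of the periods). Both assumptions and the function names are introduced here for illustration only. Under these assumptions, the sketch reproduces the FVBTP counts reported in tables 19 and 21.

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def fvbtp_vmcs_count(periods_ms, num_rtvms):
    """VM context switches per hyperperiod, one slot per RTVM and cycle (assumption)."""
    cycle = reduce(gcd, periods_ms)          # length G of one static cycle
    hyperperiod = reduce(lcm, periods_ms)    # observation window
    return num_rtvms * hyperperiod // cycle


# Task periods in ms as listed in tables 18 and 20.
medical = fvbtp_vmcs_count([100, 500, 1000], num_rtvms=2)
automotive = fvbtp_vmcs_count([15, 100, 50, 5], num_rtvms=3)
print(medical, automotive)  # 20 180
```

Because the count depends only on the periods and the number of RTVMs, it is independent of the tasks actually executed, in contrast to OSE.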
When looking at pure RM guests, one notices that FVBTP has a higher
guest utilization. This is due to the pessimistic scaling of the
activation slots based on the least upper bound for the executed
taskset. This choice ensures that the taskset is always schedulable
in the RTVM. The pessimism can be reduced by scaling relative to
utilizations higher than the least upper bound, but then a response
time analysis (see 2.1.2) is necessary in advance to determine
whether the taskset is still schedulable under these conditions.
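The effect of this scaling can be sketched numerically. The example assumes the least upper bound of Liu and Layland [LL73], n(2^(1/n) - 1), and a slot length of U / Ulub times the cycle length. The latter relation is inferred from the values listed in appendix A.3 and is not a formula quoted from the text; the function names are illustrative.

```python
def ulub(n_tasks):
    """Liu and Layland least upper bound for n tasks under RM: n * (2^(1/n) - 1)."""
    return n_tasks * (2 ** (1.0 / n_tasks) - 1)


def edf_slot_length(utilization, cycle_ms):
    """Slot length for an EDF guest (utilization bound of 1)."""
    return utilization * cycle_ms


def rm_slot_length(utilization, n_tasks, cycle_ms):
    """Slot length for an RM guest, scaled pessimistically by the least upper bound."""
    return utilization / ulub(n_tasks) * cycle_ms


# RTVM1 of the CNC scenario (appendix A.3): U = 0.217, three tasks, 10 ms cycle.
edf_len = edf_slot_length(0.217, 10)
rm_len = rm_slot_length(0.217, 3, 10)
print(f"Ulub(3) = {ulub(3):.3f}, EDF slot = {edf_len:.3f} ms, RM slot = {rm_len:.3f} ms")
```

The RM slot comes out close to the 2.782 ms listed in appendix A.3, compared to 2.170 ms for EDF. The difference is the price of guaranteeing schedulability without a response time analysis.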
Summing up, FVBTP performed very well in all scenarios,
demonstrating that the worst-case GCD impact can be kept low
when the time granularity is in the next magnitude above the VMCS
overhead. On the evaluation platform, this was a time granularity
of milliseconds, the next magnitude above the VMCS overhead, which
is in the lower microsecond range. The impact of the dynamic
behavior of OSE is as severe as expected in the case of multiple
tasks per RTVM with short periods, due to its preemptive behavior.
This prevents OSE from being applicable to the automotive scenario.
The possibility of dynamically adding RTVMs to OSE makes it
attractive for dynamic environments, but it has to be noted that the
required server parameter calculation proposed by Bini et al. (see
4.2.1) does not come without cost. Thus, its application is
restricted to hardware architectures more powerful than the
PowerPC405FX used here.
5
SUMMARY
The characteristics of embedded systems have changed dramatically
within the last two decades. In particular, the growing complexity
of embedded real-time systems and their demand for high-level
functionality typically provided by GPOSes create problems that have
to be faced when developing, assembling and deploying them. One of
the main problems of building such complex systems is the
integration of software components into a large integrated system,
as described by Broy in [Bro06]. The integration of such complex
systems can easily lead to unmodelled and thus unintended feature
interactions between the integrated components. In a distributed
system where every task is placed on a dedicated ECU, unintended
interaction and hardware failure are practically the only issues
endangering a component's functionality. But this kind of feature
distribution is currently up for discussion, as the trend is to
have fewer dedicated ECUs in favor of more centralized
multi-functional hardware, due to the increasing need for
high-level APIs such as graphical user interfaces or multimedia
functionality.
The trend towards more centralized multi-functional hardware
aggravates the problem of unintended interaction of software
components, as they share the processor, memory and I/O devices in
this case. Virtualization is a suitable approach to isolate the
different components spatially and temporally. The main problem
currently is the virtualization of multiple systems with hard, soft
or no real-time requirements using full virtualization instead of
the widely applied paravirtualization approach. Paravirtualization
requires access to the source code of the virtualized OS, which is
in general not available due to licensing restrictions of the
different vendors. Thus, a VMM supporting full virtualization and
paravirtualization at the same time is required in order to
virtualize closed source guests as well as paravirtualizable guests
like Linux, which offer high-level functionality for graphical user
interfaces. The presented Proteus VMM was designed and implemented
to address these problems. It can execute full virtualized and
paravirtualized guests at the same time, while the provided ABI is
configurable to keep the code overhead of the VMM as small as
possible. To be able to handle hard real-time guests, the Proteus
VMM provides an extensible scheduler interface and is kept strictly
deterministic. Thus, the WCET of its different virtualization
components can be determined easily. Together with established WCET
analysis approaches, this allows the WCET of the virtualized
execution of a guest application to be approximated. The WCET of
the virtualized guest is required for the schedulability analysis.
Because different systems are integrated into a single virtualized
system, it is very useful to provide a methodology that
automatically derives all execution parameters of the virtual
real-time system that executes the given real-time systems as
real-time virtual machines.
Those parameters are in particular the required performance and
the schedule of the real-time virtual machines. The existing state
of the art lacks the possibility of automatically deriving those
parameters under the constraint of full virtualization. The
presented approach, called FVBTP¹, addresses this problem and allows
those parameters to be derived automatically from a set of given
systems. The derived schedule depends on the correlation of the
deadlines, which in the best case share a large GCD, reducing the
switching overhead of the VMM. This drawback can be mitigated by
restricting the time granularity to the next magnitude. In addition,
hard real-time systems used for controlling sensors and actuators
very often have task periods that are multiples of each other, and
their periods can be adjusted within certain limits, which mostly
leads to good results for the GCD. The evaluation demonstrated the
applicability of the presented approach and showed good results
compared to the open system environment. The open system environment
is based on paravirtualization and is a dynamic scheduling approach,
in contrast to the presented FVBTP, which uses a static scheduling
strategy and is based on full virtualization. The comparison to the
open system environment was performed because it is the state of the
art scheduling approach and the only approach that is able to
schedule multiple hard real-time systems.
1 Full Virtualization by Temporal Partitioning
To conclude, it has been shown that the defined goal of providing
a configurable and deterministic VMM able to handle paravirtualized
and full virtualized guests was reached by implementing the
presented VMM design in the VMM called Proteus. To ensure
schedulability under the constraint of full virtualization, the
FVBTP approach was presented and evaluated. The results showed the
applicability of the approach to representative scenarios. Further
research may focus on support for multicore systems, as those
systems seem to be the trend for future multi-functional hardware,
even in the field of embedded hard real-time systems. This
especially includes the question of how to distribute the real-time
virtual machines in such a system. Another interesting area is the
extension of the presented transformation of the given real-time
system tasksets to derive the parameters of the virtualized
real-time system. Currently, this transformation is kept quite
simple, as it is a proof of concept. Extending it to transform the
taskset between different ISAs and hardware architectures would
allow much more accurate approximations of the execution platform
required for the host system.
A
EVALUATION
A.1 Measurements
Tasks   Min [µs]   Max [µs]   Mean [µs]   Confidence interval (0.95)
2 75.795 75.765 75.825 [75.765;75.825]
3 111.448 111.421 111.476 [111.421;111.476]
4 144.455 144.411 144.498 [144.411;144.498]
5 172.491 172.475 172.507 [172.475;172.507]
6 183.051 183.009 183.092 [183.009;183.092]
7 214.671 214.615 214.727 [214.615;214.727]
8 251.289 251.245 251.334 [251.245;251.334]
9 410.597 410.525 410.669 [410.525;410.669]
10 424.146 424.068 424.225 [424.068;424.225]
11 475.634 475.595 475.673 [475.595;475.673]
12 520.839 520.773 520.905 [520.773;520.905]
13 604.713 604.601 604.825 [604.601;604.825]
14 617.245 617.226 617.264 [617.226;617.264]
15 715.053 715.024 715.081 [715.024;715.081]
16 728.816 728.726 728.906 [728.726;728.906]
17 759.352 759.252 759.452 [759.252;759.452]
18 792.982 792.888 793.076 [792.888;793.076]
19 830.144 830.097 830.190 [830.097;830.190]
Table 22: Performance of scheduler interface routine sched_init in case
of FVBTP. [Grö10]
Tasks   Min [µs]   Max [µs]   Mean [µs]   Confidence interval (0.95)
20 843.301 843.270 843.331 [843.270;843.331]
30 1401.315 1401.276 1401.353 [1401.276;1401.353]
40 1715.308 1715.241 1715.375 [1715.241;1715.375]
50 2145.459 2145.327 2145.591 [2145.327;2145.591]
60 2573.147 2573.017 2573.276 [2573.017;2573.276]
70 3115.312 3115.196 3115.428 [3115.196;3115.428]
80 3429.558 3429.386 3429.731 [3429.386;3429.731]
90 3972.970 3972.840 3973.100 [3972.840;3973.100]
100 4286.520 4286.500 4286.541 [4286.500;4286.541]
Table 23: Performance of scheduler interface routine sched_init in case
of FVBTP. [Grö10]
Tasks   Min [µs]   Max [µs]   Mean [µs]   Confidence interval (0.95)
1 29.523 30.080 29.835 [29.812;29.857]
2 54.680 55.240 55.010 [54.986;55.033]
3 78.840 79.440 79.113 [79.089;79.138]
4 104.600 105.123 104.841 [104.802;104.879]
5 130.083 130.882 130.604 [130.570;130.638]
6 155.040 155.723 155.365 [155.332;155.398]
7 177.160 177.803 177.443 [177.413;177.472]
8 203.400 204.365 203.910 [203.855;203.966]
9 229.363 230.405 230.020 [229.974;230.066]
10 254.403 255.005 254.637 [254.601;254.672]
11 275.603 276.363 276.054 [276.019;276.088]
12 302.082 303.045 302.589 [302.516;302.663]
13 329.402 330.485 329.912 [329.859;329.965]
14 355.725 356.765 356.475 [356.409;356.541]
15 374.202 374.885 374.498 [374.457;374.539]
16 402.285 403.005 402.533 [402.501;402.565]
17 431.405 432.165 431.806 [431.762;431.850]
18 454.725 455.805 455.509 [455.456;455.561]
19 472.645 473.325 472.944 [472.897;472.992]
20 501.365 502.365 501.976 [501.914;502.038]
30 753.648 753.970 753.787 [753.753;753.821]
40 998.133 999.090 998.616 [998.512;998.719]
50 1256.215 1257.135 1256.765 [1256.681;1256.849]
60 1493.180 1494.497 1493.815 [1493.657;1493.973]
70 1756.900 1757.500 1757.169 [1757.072;1757.266]
80 1991.463 1991.985 1991.723 [1991.633;1991.814]
90 2243.425 2247.825 2246.326 [2245.335;2247.317]
100 2488.347 2488.710 2488.491 [2488.412;2488.571]
Table 24: Performance of scheduler interface routine sched_init in case
of OSE. [Grö10]
VMs   Min [µs]   Max [µs]   Mean [µs]   Confidence interval (0.95)
2 85.882 86.882 86.792 [86.763;86.820]
3 154.563 156.882 156.857 [156.812;156.903]
4 199.322 202.245 202.213 [202.156;202.270]
5 306.243 309.565 309.408 [309.342;309.474]
6 365.965 369.885 369.613 [369.532;369.694]
7 435.125 439.527 439.364 [439.277;439.450]
8 498.005 503.408 503.333 [503.226;503.440]
9 669.648 676.170 675.432 [675.290;675.573]
10 749.050 756.570 755.932 [755.760;756.104]
11 839.690 846.410 846.079 [845.945;846.212]
12 921.970 930.693 930.107 [929.934;930.281]
13 1028.172 1037.333 1036.800 [1036.610;1036.990]
14 1118.573 1128.253 1127.983 [1127.797;1128.169]
15 1219.255 1229.295 1228.616 [1228.411;1228.821]
16 1313.375 1324.098 1323.314 [1323.097;1323.531]
17 1603.898 1615.780 1615.260 [1615.027;1615.493]
18 1716.940 1730.860 1729.660 [1729.403;1729.918]
19 1841.223 1853.863 1853.360 [1853.117;1853.603]
20 1958.303 1972.385 1972.223 [1971.948;1972.499]
30 3389.000 3408.600 3407.685 [3407.308;3408.062]
40 5520.302 5545.505 5545.026 [5544.536;5545.517]
50 7722.488 7755.010 7754.684 [7754.050;7755.318]
60 10181.683 10221.400 10220.492 [10219.721;10221.264]
70 13690.518 13739.800 13737.246 [13736.323;13738.168]
80 16896.037 16947.315 16945.127 [16944.159;16946.094]
90 20485.518 20543.520 20542.339 [20541.213;20543.465]
100 24481.280 24485.522 24483.402 [24482.890;24483.913]
Table 25: Performance of scheduler interface routine
sched_getNextVMIndex in case of OSE. [Grö10]
Min [µs]   Max [µs]   Mean [µs]   Confidence interval (0.95)
2.918 3.440 3.054 [3.035;3.073]
Table 26: Performance of scheduler interface routine
sched_getNextVMIndex in case of FVBTP. [Grö10]
A.2 Scenario I: Electrical Drive Engineering - Linear Motor Control
Both subsystems are executed on a dSPACE DS1005 PPC board¹.
The processor of this platform is a PowerPC 750GX with a clock
speed of 1 GHz. If these controller tasks are executed on a
PowerPC running at 300 MHz, the execution times have to be
multiplied by 1000 MHz / 300 MHz ≈ 3.333.
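The scaling step can be written down as a small sketch. It assumes that the execution time is inversely proportional to the clock frequency, as the multiplication factor above implies, and ignores effects such as memory latency or differing pipelines. The input value of 0.25 ms is a hypothetical measurement, chosen only so that the result matches the 0.833 ms entries listed for RTVM1 in this section.

```python
def scale_wcet(wcet_ms, f_measured_mhz, f_target_mhz):
    """Scale an execution time from the measured to the target clock frequency.

    Assumption: execution time is inversely proportional to the clock frequency.
    """
    return wcet_ms * f_measured_mhz / f_target_mhz


# Hypothetical execution time of 0.25 ms measured at 1 GHz, scaled to 300 MHz.
scaled = scale_wcet(0.25, 1000, 300)

# Two such tasks with a period of 3 ms give the utilization of RTVM1.
utilization_rtvm1 = 2 * scaled / 3

print(f"{scaled:.3f} ms, U = {utilization_rtvm1:.3f}")
```

With these inputs the scaled execution time is about 0.833 ms and the utilization about 0.556, in agreement with the values of this scenario.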
RTVM1(Γ) = {(3, 0.833), (3, 0.833)}
RTVM2(Γ) = {(42, 0.833)}
U(RTVM1(Γ)) ≈ 0.556
U(RTVM2(Γ)) ≈ 0.020
(a) EDF
Activation Slots:
Π1 = ((0, 1.668), 3)
Π2 = ((1.668, 1.724), 3)
(b) RM
Ulub(Γ1) ≈ 0.828
Ulub(Γ2) = 1
Activation Slots:
Π1 = ((0, 2.014), 3)
Π2 = ((2.014, 2.074), 3)
1 http://www.dspace.com/
A.3 Scenario II: Industrial - CNC Machine
All three subsystems are assumed to be executed on a 100 MHz
processor, and are now consolidated on a PowerPC with 300
MHz.
RTVM1(Γ) = {(100, 3.333), (100, 16.667), (100, 1.667)}
RTVM2(Γ) = {(10, 0.333), (100, 16.667)}
RTVM3(Γ) = {(10, 0.333), (10, 0.333), (10, 0.333), (10, 0.333),
(10, 0.333), (10, 0.333), (10, 0.333)}
U(RTVM1) ≈ 0.217
U(RTVM2) ≈ 0.2
U(RTVM3) ≈ 0.233
(a) EDF
Activation Slots:
Π1 = ((0, 2.170), 10)
Π2 = ((2.170, 3.870), 10)
Π3 = ((3.870, 6.200), 10)
(b) RM
Ulub(RTVM1(Γ)) ≈ 0.780
Ulub(RTVM2(Γ)) ≈ 0.828
Ulub(RTVM3(Γ)) ≈ 0.729
Activation Slots:
Π1 = ((0, 2.782), 10)
Π2 = ((2.782, 5.197), 10)
Π3 = ((5.197, 8.393), 10)
Utilization G = 83.93 %
A.4 Scenario III: Medical - X-ray Machine
Both subsystems are assumed to be executed on a 150 MHz processor,
and are now consolidated on a PowerPC with 300 MHz.
RTVM1(Γ) = {(100, 5.000), (500, 100.000), (500, 50.000)}
RTVM2(Γ) = {(1000, 25.000), (100, 10.000), (100, 10.000),
(100, 5.000), (100, 5.000)}
U(RTVM1(Γ)) ≈ 0.350
U(RTVM2(Γ)) ≈ 0.275
(a) EDF
Activation Slots:
Π1 = ((0, 35.000), 100)
Π2 = ((35.000, 62.500), 100)
(b) RM
Ulub(RTVM1(Γ)) ≈ 0.780
Ulub(RTVM2(Γ)) ≈ 0.743
Activation Slots:
Π1 = ((0, 44.872), 100)
Π2 = ((44.872, 81.833), 100)
A.5 Scenario IV: Automotive - Airbag Control and Driver Assistance
All three subsystems are assumed to be executed on a 100 MHz
processor, and are now consolidated on a PowerPC with 300
MHz.
RTVM1(Γ) = {(15, 0.500), (15, 0.100), (15, 0.100), (15, 0.100),
(15, 0.100), (15, 0.100), (15, 0.100), (15, 0.200), (15, 0.200)}
RTVM2(Γ) = {(100, 6.667), (50, 3.333), (5, 0.333)}
RTVM3(Γ) = {(5, 0.300), (5, 0.100), (5, 0.100), (5, 0.100),
(5, 0.100), (5, 0.100), (5, 0.333)}
U(RTVM1(Γ)) ≈ 0.100
U(RTVM2(Γ)) ≈ 0.200
U(RTVM3(Γ)) ≈ 0.227
(a) EDF
Activation Slots:
Π1 = ((0, 0.500), 5)
Π2 = ((0.500, 1.500), 5)
Π3 = ((1.500, 2.635), 5)
(b) RM
Ulub(RTVM1) ≈ 0.721
Ulub(RTVM2) ≈ 0.780
Ulub(RTVM3) ≈ 0.729
Activation Slots:
Π1 = ((0, 0.693), 5)
Π2 = ((0.693, 1.975), 5)
Π3 = ((1.975, 3.532), 5)
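The EDF activation slots listed for this scenario can be retraced with a short sketch. It assumes that the cycle length is the GCD of all task periods and that each RTVM receives a slot of length U times the cycle, with the slots placed back to back. This relation is inferred from the listed values; it is not a quotation of the FVBTP algorithm, and the function names are illustrative.

```python
from functools import reduce
from math import gcd


def utilization(taskset):
    """Sum of C/T over all (period, execution time) pairs of one RTVM."""
    return sum(c / t for t, c in taskset)


def edf_activation_slots(rtvms):
    """Return the cycle length and consecutive (start, end) slots, one per RTVM."""
    cycle = reduce(gcd, [t for taskset in rtvms for t, _ in taskset])
    slots, start = [], 0.0
    for taskset in rtvms:
        end = start + utilization(taskset) * cycle
        slots.append((round(start, 3), round(end, 3)))
        start = end
    return cycle, slots


# Tasksets of the automotive scenario as listed above: (period, execution time) in ms.
rtvm1 = [(15, 0.5)] + [(15, 0.1)] * 6 + [(15, 0.2)] * 2
rtvm2 = [(100, 6.667), (50, 3.333), (5, 0.333)]
rtvm3 = [(5, 0.3)] + [(5, 0.1)] * 5 + [(5, 0.333)]

cycle, slots = edf_activation_slots([rtvm1, rtvm2, rtvm3])
print(cycle, slots)
```

The computed cycle is 5 ms, and the slot boundaries agree with the listed values up to the rounding of the utilizations to three digits.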
BIBLIOGRAPHY
[Atm08] Atmel: Atmega8: 8-bit with 8k bytes in-system programmable
flash, July 2008.
http://www.atmel.com/dyn/resources/prod_documents/doc2486.pdf.
[Aud91] Audsley, N: Optimal priority assignment and feasibility of
static priority tasks with arbitrary start times. Real-Time Systems,
Jan 1991.
[BA00] Buttazzo, G and L Abeni: Adaptive rate control through
elastic scheduling. Decision and Control, 2000. Proceedings of the
39th IEEE Conference on, 5:4883–4888 vol.5, Jan 2000.
[BA01] Bennett, M. D. and Neil C. Audsley: Predictable and efficient
virtual addressing for safety-critical real-time systems. In
Euromicro Conference on Real-Time Systems (ECRTS), pages 183–190.
IEEE Computer Society, 2001.
[Bak05] Baker, T: An analysis of EDF schedulability on a
multiprocessor. IEEE Transactions on Parallel and Distributed
Systems, Jan 2005.
http://oz.nthu.edu.tw/~d918323/real-time/01458691.pdf.
[Bal09] Baldin, Daniel: Entwurf und Implementierung einer
komponentenbasierten Virtualisierungsplattform für selbstoptimierende
eingebettete mechatronische Systeme. Diplomarbeit, Universität
Paderborn, March 2009.
[BB01] Bini, E. and G. Buttazzo: A hyperbolic bound for the rate
monotonic algorithm. In Real-Time Systems, 13th Euromicro Conference
on, 2001., pages 59–66, 2001.
[BB08] Baruah, S and T Baker: Schedulability analysis of global edf.
Real-Time Systems, Jan 2008.
[BBB09] Bini, E, M Bertogna, and S Baruah: Virtual multiprocessor
platforms: Specification and use. Real-Time Systems Symposium, 2009,
RTSS 2009. 30th IEEE, pages 437–446, Dec 2009.
http://ieeexplore.ieee.org/search/srchabstract.jsp?arnumber=5368130&isnumber=5368116&punumber=5368115&k2dockey=5368130@ieeecnfs.
[BDF+03] Barham, Paul, Boris Dragovic, Keir Fraser, Steven Hand,
Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, and Andrew
Warfield: Xen and the art of virtualization. SOSP 03: Proceedings of
the nineteenth ACM symposium on Operating systems principles,
Dec 2003. http://portal.acm.org/citation.cfm?id=945445.945462.
[Bel73] Bell, J: Threaded code. Communications, Jan 1973.
http://portal.acm.org/citation.cfm?doid=362248.362270.
[BLA98] Buttazzo, G, G Lipari, and L Abeni: Elastic task model for
adaptive rate control. Real-Time Systems Symposium, 1998.
Proceedings., The 19th IEEE, pages 286–295, Dec 1998.
[BLCA02] Buttazzo, G, G Lipari, M Caccamo, and L Abeni: Elastic
scheduling for flexible workload management. Computers, IEEE
Transactions on, 51(3):289–302, Mar 2002.
[Bos07] Bosch, Robert: Autoelektrik Autoelektronik. Vieweg, 2007.
[Bra70] Bradley, G: Algorithm and bound for the greatest common
divisor of n integers. Communications of the ACM, Jan 1970.
http://portal.acm.org/citation.cfm?id=362694.
[Bro02] Broad, William J.: For parts, NASA boldly goes on ebay,
May 12 2002.
http://www.nytimes.com/2002/05/12/us/for-parts-nasa-boldly-goes-on-ebay.html
(accessed 09/28/2010).
[Bro06] Broy, M: Challenges in automotive software engineering.
Proceedings of the 28th international conference on Software
engineering (ICSE 06), Jan 2006.
http://portal.acm.org/citation.cfm?id=1134292.
[Bry09] Brygier, Jacques: Extending military software life
expectancy through safe and secure virtualization. Military Embedded
Systems, June 2009.
[But04] Buttazzo, Giorgio C.: Hard Real-Time Computing Systems.
Springer, 2004.
[But05] Buttazzo, G: Rate monotonic vs. EDF: Judgment day. Real-Time
Systems, Jan 2005.
http://www.springerlink.com/index/Q210434KLR294437.pdf.
[But06] Buttazzo, Giorgio: Research trends in real-time computing
for embedded systems. SIGBED Review, 3(3), Jul 2006.
http://portal.acm.org/citation.cfm?id=1164050.1164052.
[BVZB05] Berndl, M, B Vitale, M Zaleski, and A Brown: Context
threading: a flexible and efficient dispatch technique for virtual
machine interpreters. Code Generation and Optimization, 2005. CGO
2005. International Symposium on, pages 15–26, Feb 2005.
http://ieeexplore.ieee.org/search/srchabstract.jsp?arnumber=1402073&isnumber=30441&punumber=9631&k2dockey=1402073@ieeecnfs.
[BYMK...06] Ben-Yehuda, M, J Mason, O Krieger, and J Xenidis ...:
Utilizing IOMMUs for virtualization in Linux and Xen. Proceedings of
the Ottawa Linux Symposium (OLS 06), Jan 2006.
http://landley.net/kdocs/ols/2006/ols2006v1-pages-71-86.pdf.
[CEG07] Casey, K, M Ertl, and D Gregg: Optimizing indirect branch
prediction accuracy in virtual machine interpreters. portal.acm.org,
Jan 2007. http://portal.acm.org/citation.cfm?id=1286821.1286828.
[CGV07] Cherkasova, Ludmila, Diwaker Gupta, and Amin Vahdat:
Comparison of the three CPU schedulers in xen. SIGMETRICS
Performance Evaluation Review, 35(2), Sep 2007.
http://portal.acm.org/citation.cfm?id=1330555.1330556.
[CHL06] Chantem, T, Xiaobo Sharon Hu, and M Lemmon: Generalized
elastic scheduling. Real-Time Systems Symposium, 2006. RTSS 06. 27th
IEEE International, pages 236–245, Dec 2006.
[Ciu08] Ciufo, Chris A.: Virtualization yields hardware optimization
and new embedded architectures. Military Embedded Systems, July 2008.
[CK97] Channon, D. and D. Koch: Performance analysis of
reconfigurable partitioned tlbs. In HICSS 97: Proceedings of the
30th Hawaii International Conference on System Sciences, page 168,
Washington, DC, USA, 1997. IEEE Computer Society.
[CM96] Cifuentes, C and V Malhotra: Binary translation: static,
dynamic, retargetable? Software Maintenance 1996, Proceedings.,
International Conference on, pages 340–349, Oct 1996.
[CN01] Chen, Peter M. and Brian D. Noble: When Virtual Is Better
Than Real. 2001.
[DCSH05] D. C. Snowdon, S. Ruocco and G. Heiser: Power management
and dynamic voltage scaling: Myths and facts. In Proceedings of the
2005 Workshop on Power Aware Real-time Computing, 2005.
[DCSH07] D. C. Snowdon, S. M. Petters and G. Heiser: Accurate
on-line prediction of processor and memory energy usage under
voltage scaling:. In Proceedings of the 7th International Conference
on Embedded Software, 2007.
[Dew75] Dewar, Robert: Indirect threaded code. Communications of the
ACM, 18(6), Jun 1975.
http://portal.acm.org/citation.cfm?id=360825.360849.
[Dit98a] Ditze, Carsten: A customizable library to support software
synthesis for embedded applications and micro-kernel systems. EW 8:
Proceedings of the 8th ACM SIGOPS European workshop on Support for
composing distributed applications, Sep 1998.
http://portal.acm.org/citation.cfm?id=319195.319209.
[Dit98b] Ditze, Carsten: A step towards operating system synthesis.
In Proc. of the 5th Annual Australasian Conf. on Parallel And
Real-Time Systems (PART). IFIP, IEEE, 1998.
[DK89] Demers, A and S Keshav: Analysis and simulation of a fair
queueing algorithm. SIGCOMM 1989 Symposium proceedings on
Communications architectures and protocols, Jan 1989.
[DL97] Deng, Z and J Liu: Scheduling real-time applications in an
open environment. Real-Time Systems Symposium, 1997. Proceedings.,
The 18th IEEE, pages 308–319, Nov 1997.
[DLS97] Deng, Z, J Liu, and J Sun: A scheme for scheduling hard
real-time applications in open system environment. Real-Time
Systems, 1997. Proceedings., Ninth Euromicro Workshop on,
pages 191–199, May 1997.
[EG01] Ertl, M and D Gregg: The behavior of efficient virtual
machine interpreters on modern architectures. Lecture Notes in
Computer Science, Jan 2001.
http://www.springerlink.com/index/1EUU8KJJ7TXRR1WE.pdf.
[EGKP02] Ertl, M, D Gregg, A Krall, and B Paysan: Vmgen - a
generator of efficient virtual machine interpreters. Software:
Practice and Experience, Jan 2002.
http://doi.wiley.com/10.1002/spe.434.abs.
[EPF06] EPFL, S: Optimizing network virtualization in Xen. USENIX
Annual Technical Conference, 2006.
http://www.usenix.org/events/usenix06/tech/menon/menon.pdf.
[Ert99] Ertl, M: Optimal code selection in DAGs. POPL 99:
Proceedings of the 26th ACM SIGPLAN-SIGACT symposium on Principles
of programming languages, Jan 1999.
http://portal.acm.org/citation.cfm?id=292540.292562.
[EW01] Ertl, M and T Wien: Threaded code variations and
optimizations. EuroForth 2001 Conference Proceedings, Jan 2001.
[FH04] Ferdinand, Christian and Reinhold Heckmann: aiT: Worst-Case
Execution Time prediction by static program analysis. In Jacquart,
Renè (editor): Building the Information Society, volume 156 of IFIP
International Federation for Information Processing, pages 377–383.
Springer Boston, 2004.
[FM02] Feng, Xiang and A Mok: A model of hierarchical real-time
virtual resources. Real-Time Systems Symposium, 2002. RTSS 2002.
23rd IEEE, pages 26–35, Nov 2002.
[GL94] Greene, Robert and George Lownes: Embedded CPU target
migration, doing more with less. Proceedings of the conference on
TRI-Ada 94, 1994.
[Gre10] Greenhills: Real-Time Operating Systems (RTOS), Embedded
Development Tools, Optimizing Compilers, IDE tools, Debuggers -
Green Hills Software. Website, June 2010. http://www.ghs.com/.
[Grö10] Grösbrink, Stefan: Comparison of alternative hierarchical
scheduling techniques for the virtualization of embedded real-time
systems. Diplomarbeit, Universität Paderborn, Dec 2010.
[Hei07] Heiser, Gernot: Virtualization for embedded systems.
Technical report, Open Kernel Labs, Apr 17 2007.
[Hei08a] Heinz, Thomas: Preserving temporal behaviour of legacy
real-time software across static binary translation. IIES 08:
Proceedings of the 1st workshop on Isolation and integration in
embedded systems, Apr 2008.
http://portal.acm.org/citation.cfm?id=1435458.1435459.
[Hei08b] Heiser, Gernot: The role of virtualization in embedded
systems. IIES 08: Proceedings of the 1st workshop on Isolation and
integration in embedded systems, Apr 2008.
http://portal.acm.org/citation.cfm?id=1435458.1435461.
[HM80] Horspool, R and N Marovac: An approach to the problem of
detranslation of computer programs. The Computer Journal, Jan 1980.
http://comjnl.oxfordjournals.org/cgi/content/abstract/23/3/223.
[IBM05] IBM: PPC405Fx Embedded Processor Core Users Manual,
January 2005.
[Inc08] Inc., Wind River: Arinc 653 - an avionics standard for safe,
partitioned systems. In IEEE-CS Seminar, June 4 2008.
[Int07] Intel: Virtualization brings real benefits. Whitepaper -
www.intel.com, July 2007.
[Int09] Intel: Computing technologies for medical equipment. Medical
Equipment Whitepaper - www.intel.com, 2009.
[IO07] Ito, M and S Oikawa: Mesovirtualization: Lightweight
virtualization technique for embedded systems. Lecture Notes in
Computer Science, Jan 2007.
http://www.springerlink.com/index/y7228126v5933720.pdf.
[IO08] Ito, M and S Oikawa: Improving real-time performance of a
virtual machine monitor based system. Lecture Notes in Computer
Science, Jan 2008.
http://www.springerlink.com/index/10xr208023j16574.pdf.
[Ise06] Isermann, Rolf: Fahrdynamik-Regelung: Modellbildung,
Fahrerassistenzsysteme, Mechatronik. Vieweg+Teubner, 2006.
[JM98a] Jacob, B. L. and T.N. Mudge: Virtual memory in contemporary
microprocessors. IEEE Micro, 18(4):33–43, 1998.
[JM98b] Jacob, Bruce L. and Trevor N. Mudge: A look at several
memory management units, TLB-Refill mechanisms, and page table
organizations. In Proceedings of the Eighth International Conference
on Architectural Support for Programming Languages and Operating
Systems, pages 295–306, 1998.
[JSMA98] Jeffay, K, F Donelson Smith, A Moorthy, and J An-
derson: Proportional share scheduling of operating sys-
tem services for real-time applications. Real-Time Sys-
tems Symposium, 1998. Proceedings., The 19th IEEE,
pages 480 491, Nov 1998.
[Kai08a] Kaiser, Robert: Alternatives for scheduling virtual
machines in real-time embedded systems. IIES
08: Proceedings of the 1st workshop on Isola-
tion and integration in embedded systems, Apr
2008.http://portal.acm.org/citation.cfm?id=
1435458.1435460.
[Kai08b] Kaiser, Robert: Applying virtualization to real-time em-
bedded systems.1. GI/ITG KuVS Fachgespr"ach Vir-
tualisierung, 2008.
[KB09] Kerstan, Timo and Daniel Baldin: ORCOS. Web-
site, 2009.https://orcos.cs.uni-paderborn.de/
orcos/.
[KL99] Kuo, Tei Wei and Ching Hui Li: A fixed-priority-
driven open environment for real-time applications. Real-
Time Systems Symposium, 1999. Proceedings. The
20th IEEE, pages 256 267,1999.
[LB00] Lipari, G and S Baruah: Efficient scheduling of real-
time multi-task applications in dynamic systems. Real-
Time Technology and Applications Symposium,
2000. RTAS 2000. Proceedings. Sixth IEEE, pages 166–175,
Jan 2000.
[LB05] Lipari, G and E Bini: A methodology for designing hi-
erarchical scheduling systems. Journal of Embedded
Computing, Jan 2005.
http://iospress.metapress.com/index/A60KEB3BB9C9CPTN.pdf.
[LCB00] Lipari, G, J Carpenter, and S Baruah: A frame-
work for achieving inter-application isolation in multipro-
grammed, hard real-time environments. Real-Time Sys-
tems Symposium, 2000. Proceedings. The 21st IEEE,
pages 217–226, 2000.
[LCH06] Harris, Laune C. and Barton P. Miller: Practical Analysis
of Stripped Binary Code. Technical report, Computer
Sciences Department, University of Wisconsin, 2006.
[Lit04] Litz, Lothar: Grundlagen der Automatisierungstechnik:
Regelungssysteme - Hybride Systeme, pages 145–146.
Oldenbourg, 2004.
[LL73] Liu, C and J Layland: Scheduling algorithms for
multiprogramming in a hard-real-time environment.
Journal of the Association for Computing Machinery,
Jan 1973. http://portal.acm.org/citation.cfm?doid=321738.321743.
[LSD89] Lehoczky, J, L Sha, and Y Ding: The rate monotonic
scheduling algorithm: exact characterization and average
case behavior. Real Time Systems Symposium, 1989.
Proceedings., The 10th IEEE, pages 166–171, 1989.
[LUC+06] LeVasseur, Joshua, Volkmar Uhlig, Matthew Chap-
man, Peter Chubb, Ben Leslie, and Gernot Heiser:
Pre-Virtualization: soft layering for virtual machines.
Technical report 2006-15, Fakultät für Informatik,
Universität Karlsruhe (TH), July 2006.
[Lyn09] LynuxWorks: Embedded Hypervisor and Separation
Kernel for Operating-system Virtualization: LynxSecure.
Website, February 2009.
[Mai09] Main, Chris: Allowing for GPOS and RTOS: The unique
virtualization needs of mission-critical embedded systems.
Military Embedded Systems, September, 2009.
[MFC01] Mok, A, X Feng, and D Chen: Resource par-
tition for real-time systems. Proc. of IEEE
Real-Time Technology and Applications . . . , Jan
2001.
http://ieeexplore.ieee.org/iel5/7401/20109/00929867.pdf?arnumber=929867.
[Mil68] Miller, Robert: Response time in man-computer conver-
sational transactions. Proc. AFIPS Fall Joint Computer
Conference, 33:267–277, Jan 1968.
http://portal.acm.org/citation.cfm?id=1476589.1476628.
[MMH08] Murray, Derek, Grzegorz Milos, and Steven
Hand: Improving Xen security through disaggregation.
VEE '08: Proceedings of the fourth ACM SIGPLAN/SIGOPS
international conference on Virtual execution
environments, Mar 2008.
http://portal.acm.org/citation.cfm?id=1346256.1346278.
[Mod09] ModelSim - a comprehensive simulation and debug
environment for complex ASIC and FPGA designs, 2009.
http://www.model.com/.
[Nea06] Neumann, D. and D. Kulkari et al.: Intel virtualization
technology in embedded and communication infrastructure
applications. Intel Technology Journal, 10(3),
2006.
[Nie93] Nielsen, J: The usability engineering life cycle.
Computer, 25(3):12–22, 1993.
[Obj08] Objective Interface Systems: MILS Technical Primer.
Website, 2008.
http://www.ois.com/Products/MILS-Technical-Primer.html.
[Oer09] Oertel, Markus: Entwurf und prototypische Implementierung
eines echtzeitfähigen ATmega8-Emulators. Diploma thesis,
University of Paderborn, 2009.
[OIN06] Oikawa, Shuichi, Megumi Ito, and Tatsuo Nakajima:
Linux/RTOS hybrid operating environment on gandalf
virtual machine monitor. 4096, Jan 2006.
http://www.springerlink.com/content/j580077065882221/.
[PB00] Puschner, Peter and Alan Burns: A Review of Worst-
Case Execution-Time Analysis. Journal of Real-Time
Systems, 18(2/3):115–128, May 2000.
[PG74] Popek, G and R Goldberg: Formal require-
ments for virtualizable third generation architec-
tures. Communications of the ACM, Jan 1974.
http://dforeman.cs.binghamton.edu/~foreman/552pages/Readings/popek74formal.pdf.
[Pit87] Pittman, T: Two-level hybrid interpreter/native code exe-
cution for combined space-time program efficiency. ACM
SIGPLAN Notices, Jan 1987.
http://portal.acm.org/citation.cfm?id=960114.29666.
[PLW+06] Peng, Jinzhan, Guei Yuan Lueh, Gansha Wu, Xiao-
gang Gou, and Ryan Rakvic: A comprehensive study
of hardware/software approaches to improve TLB perfor-
mance for java applications on embedded systems. In
MSPC '06: Proceedings of the 2006 workshop on Memory
system performance and correctness, pages 102–111,
New York, NY, USA, 2006. ACM Press.
[PR98] Piumarta, Ian and Fabio Riccardi: Optimizing direct
threaded code by selective inlining. PLDI '98:
Proceedings of the ACM SIGPLAN 1998 conference on
Programming language design and implementation,
May 1998. http://portal.acm.org/citation.cfm?id=277650.277743.
[Reg01] Regehr, John: Using Hierarchical Scheduling to Support
Soft Real-Time Applications on General-Purpose Operat-
ing Systems. PhD thesis, University of Virginia, 2001.
[Rei07] Reif, Konrad: Automobilelektronik. Vieweg, 2007.
[Ric08] Richter, K: Echtzeitfähigkeit im verteilten Regler-Entwurf.
Elektronik Automotive, (9):42–45, 2008.
[RRW+03] Regehr, J, A Reid, K Webb, M Parker, and J Lepreau:
Evolving real-time systems using hierarchical scheduling
and concurrency analysis. Real-Time Systems Symposium,
2003. RTSS 2003. 24th IEEE, pages 25–36, Jan 2003.
[RS07] Raj, H and K Schwan: High performance and scalable
I/O virtualization via self-virtualized devices.
Proceedings of the 16th international symposium on High
performance distributed computing (HPDC '07), Jan 2007.
http://portal.acm.org/citation.cfm?id=1272390.
[SAWJ+96] Stoica, I, H Abdel-Wahab, K Jeffay, S Baruah, J
Gehrke, and C Plaxton: A proportional share resource
allocation algorithm for real-time, time-shared systems.
Real-Time Systems Symposium, 1996., 17th IEEE,
pages 288–299, Nov 1996.
[SB09] Sabeghi, M and K Bertels: Toward a runtime system
for reconfigurable computers: A virtualization approach.
Design, Automation & Test in Europe Conference &
Exhibition, 2009. DATE '09, pages 1576–1579, Apr 2009.
[SBR05] Schnerr, Jurgen, Oliver Bringmann, and Wolfgang
Rosenstiel: Cycle accurate binary translation for simulation
acceleration in rapid prototyping of SoCs. DATE '05:
Proceedings of the conference on Design, Automation
and Test in Europe, 2, Mar 2005.
http://portal.acm.org/citation.cfm?id=1048925.1049217.
[SCEG08] Shi, Y, K Casey, M Ertl, and D Gregg: Virtual ma-
chine showdown: stack versus registers. portal.acm.org,
Jan 2008. http://portal.acm.org/citation.cfm?id=1328197.
[SCK+93] Sites, Richard, Anton Chernoff, Matthew Kirk, Mau-
rice Marks, and Scott Robinson: Binary transla-
tion. Communications of the ACM, 36(2), Feb
1993. http://portal.acm.org/citation.cfm?id=151220.151227.
[SD98] Debray, Saumya, Robert Muth, and Matthew Weippert:
Alias Analysis of Executable Code. Technical report,
Department of Computer Science, University of Ari-
zona, 1998.
[Sed09] Sedjelmaci, S: The mixed binary euclid algorithm.
Electronic Notes in Discrete Mathematics, Jan
2009.
http://linkinghub.elsevier.com/retrieve/pii/S1571065309001826.
[SH02] Shapiro, J and N Hardy: EROS: A principle-driven op-
erating system from the ground up. IEEE software, Jan
2002.
[Sha49] Shannon, C.E.: Communication in the presence of noise.
Proceedings of the IRE, 37(1):10–21, Jan 1949,
ISSN 0096-8390.
[SL03] Shin, I and I Lee: Periodic resource model for compo-
sitional real-time guarantees. Proceedings of the 24th
IEEE International Real-Time . . . , Jan 2003.
[SL04] Shin, I and I Lee: Compositional real-time scheduling
framework. Proc. of IEEE Real-Time Systems Sympo-
sium, Jan 2004.
[SL08] Shin, I and I Lee: Compositional real-time scheduling
framework with periodic model. ACM Transactions on
Embedded Computing Systems . . . , Jan 2008.
http://portal.acm.org/citation.cfm?id=1347383.
[SN05a] Smith, J and R Nair: The architecture of virtual ma-
chines. Computer, Jan 2005.
[SN05b] Smith, James E. and Ravi Nair: Virtual Machines -
Versatile Platforms for Systems and Processes. Morgan
Kaufmann, 2005.
[Sou09] WinAVR, 2009. http://sourceforge.net/projects/winavr/.
[SS97] Suzuki, S. and K. Shin: On memory protection in real-
time os for small embedded systems. In Fourth Interna-
tional Workshop on Real-Time Computing Systems and
Applications (RTCSA '97), page 51, 1997.
[SSF+04] Seyer, Reinhard, Christian Siemers, Rainer Falsett,
Klaus H. Ecker, and Harald Richter: Robust partition-
ing for reliable real-time systems. In Workshop on Paral-
lel & Distributed Real-Time Systems WPDRTS, 18th
International Parallel and Distributed Processing Sympo-
sium (18th IPPDS’04). IEEE Computer Society, 2004.
[Sys10] Sysgo AG: PikeOS manual, January 2010.
[Tan07] Tanenbaum, Andrew S.: Modern Operating Systems.
Prentice Hall Press, Upper Saddle River, NJ, USA,
2007, ISBN 9780136006633.
[TH94] Talluri, Madhusudhan and Mark D. Hill: Surpassing
the TLB performance of superpages with less operating
system support. In Proceedings of the Sixth International
Conference on Architectural Support for Programming
Languages and Operating Systems, pages 171–182, 1994.
[Tij80] Tijdeman, R.: The chairman assignment problem.
Discrete Mathematics, 32(3):323–330, 1980,
ISSN 0012-365X.
http://www.sciencedirect.com/science/article/B6V00-48PVXSR-C/2/9d4146c9c8b36a222cdd992dac3af767.
[TKHP92] Talluri, M., S. Kong, M. Hill, and D. Patterson: Trade-
offs in supporting two page sizes. In Proceedings of the
19th ISCA, pages 415–424, 1992.
[UNRS05] Uhlig, R, G Neiger, D Rodgers, and A Santoni: Intel
virtualization technology. Computer, Jan 2005.
[Vir09] VirtualLogix: VirtualLogix - Real-time Virtualization
for Connected Devices :: Products - VLX for Embedded
Systems. Website, February 2009.
http://www.virtuallogix.com/.
[VmW09] VMware: TRANGO Virtual Processors: Scalable security
for embedded devices. Website, February 2009.
http://www.trango-vp.com/.
[VS06] Varjonen, Mari and Pekka Strömmer: Anatomically
adaptable automatic exposure control (AEC) for amor-
phous selenium (a-Se) full field digital mammography
(FFDM) system. In Flynn, Michael J. and Jiang Hsieh
(editors): Medical Imaging 2006: Physics of Medical
Imaging, 2006.
[Wal95] Waldspurger, C: Stride scheduling: Deterministic
proportional-share resource management. Technical re-
port, 1995.
[WEE+08] Wilhelm, Reinhard, Jakob Engblom, Andreas Er-
medahl, Niklas Holsti, Stephan Thesing, David
Whalley, Guillem Bernat, Christian Ferdinand, Rein-
hold Heckmann, Tulika Mitra, Frank Mueller,
Isabelle Puaut, Peter Puschner, Jan Staschulat,
and Per Stenström: The worst-case execution-time
problem—overview of methods and survey of tools.
Trans. on Embedded Computing Sys., 7(3):1–53, 2008,
ISSN 1539-9087. http://dx.doi.org/10.1145/1347375.1347389.
[Win10] Windriver hypervisor, 2010.
http://www.windriver.com/products/hypervisor/.
[WLM+10] Wan, J. F., D. Li, Y. Ming, F. Ye, Y. Quin Tu, and
C. Hua Zhang: A performance analysis model for real-
time ethernet-based CNC machines. Journal of Central
South University of Technology, 2010.
[WW94] Waldspurger, Carl A. and William E. Weihl: Lot-
tery scheduling: flexible proportional-share resource man-
agement. In Proceedings of the 1st USENIX confer-
ence on Operating Systems Design and Implementation,
OSDI '94, Berkeley, CA, USA, 1994. USENIX Association.
http://portal.acm.org/citation.cfm?id=1267638.1267639.
[Xil09] Xilinx: PowerPC 405 Processor Block Reference Guide,
2009.
[YLLT10] Yang, L., H. Lin, J. Li, and Y. Tao: The architecture
and real-time communication of CNC systems based on
switched ethernet. 2nd International Conference on
Computer Engineering based on switched ethernet,
pages V1169–V1173, Dec 2010.
[ZP05] Zhou, Xiangrong and Peter Petrov: Arithmetic-based
address translation for energy-efficient virtual memory
support in low-power, real-time embedded systems. In
Proceedings of the 18th Annual Symposium on Integrated
Circuits and Systems Design, SBCCI 2005, Florianopolis,
Brazil, September 4-7, 2005, pages 86–91.
ACM, 2005.