Large Deviations of Generalised Jackson
Networks
submitted by Diplom-Wirtschaftsmathematikerin
Silke Meiner
from Berlin
Dissertation approved by Fakultät II - Mathematik und Naturwissenschaften
of the Technische Universität Berlin
for the award of the academic degree
Doktorin der Naturwissenschaften
Dr. rer. nat.
Doctoral committee:
Chair: Prof. Dr. Reinhard Nabben
Reviewer: Prof. Dr. Jean-Dominique Deuschel
Reviewer: Prof. Adam Shwartz, Ph.D.
Date of the scientific defence: 3 September 2008
Berlin, 2008
D 83
This research has been supported in part by the Minerva Foundation of the
Max Planck Society and the Berliner Programm zur Förderung der
Chancengleichheit von Frauen in Forschung und Lehre.
Abstract
In this thesis we develop the local large deviations of generalised Jackson networks. In contrast to the Jackson network, inter arrival and service times follow general distributions and are not restricted to exponential distributions. The resulting stochastic processes are not Markovian, which challenges the available mathematical techniques.
In the first part of the thesis we investigate to what extent, and by which means, the lost Markov property can be compensated for. The generalised processes we consider are renewal processes. We succeed in modifying the processes with which we describe the generalised Jackson network such that they have independent stationary increments and cannot be distinguished from the original processes in the sense of large deviations. We further develop an exponential change of measure for the renewal processes that preserves the renewal property. The resulting change of measure for the network process changes only the rates of the network, not its fundamental properties.
As a result we obtain a local large deviation principle with a rate function that is almost the Fenchel Legendre transform of the logarithmic moment generating function $\Psi$ of the free process associated with the generalised Jackson network:
$$L(x, v) = \sup_{\alpha \in B_{K(x,v)}} \langle \alpha, v \rangle - \Psi(\alpha) \tag{1}$$
The local rate function $L(\cdot,\cdot)$ differs from a Fenchel Legendre transform by the restriction to elements of $B_{K(x,v)}$. This set describes the different behaviours of the network process depending on the current state of the network, represented by $x$, and on its future course, represented by $v$.
If a future evolution of the network in direction $v$ is a rare event and $\alpha$ is the optimiser in (1), then under the change of measure with parameter $\alpha$ the evolution in direction $v$ becomes the expected behaviour of the network.
Contents
1 Introduction
1.1 Queues, networks, and rare events
1.2 Large deviations of Jackson and generalised Jackson networks
2 Inter event times
2.1 Inter event time
2.2 The logarithmic moment generating function
2.3 Exponential twist of inter event times
2.4 Hazard
2.4.1 Introduction of the hazard function
2.4.2 The hazard function and the domain of lmgf
2.4.3 Examples
2.5 Coupling
2.6 Fenchel-Legendre transforms
2.7 Extreme twists
2.8 Generalisations
3 The counting process
3.1 Introducing the counting process
3.1.1 Joint distributions
3.2 Stationary increments
3.3 Lmgf for the undelayed rcp
3.4 Exponential equivalence for cps
3.4.1 Initial inter event time
3.4.2 Independence of increments
3.4.3 Interpolation
3.5 Conclusions from previous proofs
3.5.1 Finite dimensional large deviations
3.5.2 Continuous paths
3.6 Change of measure
3.6.1 Martingale property
3.6.2 The twisted distribution
4 Large deviations of the ren. counting process
4.1 The space
4.1.1 A base of the topology
4.2 Local large deviations
4.2.1 Local large deviations upper bound
4.2.2 Local large deviations lower bound
4.2.3 Generalisation
4.2.4 Piecewise linear functions
4.2.5 Towards linear geodesics
4.2.6 A limit
4.3 The LDP in sample space
4.3.1 The weak large deviation principle
4.3.2 The full large deviation principle
4.3.3 Interpretation
4.4 Split counting processes
4.4.1 Construction of the split process
4.4.2 Change of measure for the split process
4.4.3 Sample path LDP for the split process
5 Stochastic networks and associated processes
5.1 Stochastic networks
5.2 Deterministic descriptions of stochastic networks
5.2.1 Fluid network
5.2.2 Subnetworks
5.3 Processes
5.3.1 The free process
5.3.2 The network process
5.3.3 The local process
6 Local large deviations of the generalised Jackson network
6.1 Local large deviations upper bound
6.1.1 Leaving a boundary
6.2 Existence and uniqueness of an optimiser
6.3 Network drift under the changed measure
6.4 Local large deviations lower bound
6.5 Rate function identification
6.6 Calculating the local rate function
6.6.1 Interpretation and possible improvement
6.6.2 Example
7 Appendix
7.1 Functional strong law of large numbers
7.1.1 Implication for the counting process
7.2 Implications from exponential equivalence
7.3 Fenchel-Legendre transforms
7.4 The shifted inter event time
7.5 Large deviations and other tools
7.6 Assumptions
Notation
$\|\cdot\|$ : for $f \in \mathbb{R}^{[0,T]}$: $\|f\| = \sup_{t \in [0,T]} |f(t)|$; for $f \in (\mathbb{R}^d)^{[0,T]}$: $\|f\| = \max_{i=1,\ldots,d} \sup_{t \in [0,T]} |f_i(t)|$
$\|\cdot\|_{[a,b]}$ : $\|f\|_{[a,b]} = \|f 1_{[a,b]}\|$, $[a,b] \subset [0,T]$
$AC$, $AC([0,T], \mathbb{R}^d)$ : $\{f \in (\mathbb{R}^d)^{[0,T]} \mid f$ is absolutely continuous$\}$
$B_\Lambda$, $B_K$ : set of restrictions, claims 6.1.1, 6.1.4
$C([0,T], \mathbb{R}^d)$ : $\{f \in (\mathbb{R}^d)^{[0,T]} \mid f$ is continuous at each $t \in [0,T]\}$
$D([0,T], \mathbb{R}^d)$ : $\{f \in (\mathbb{R}^d)^{[0,T]} \mid f$ with each $f_i$ right-continuous at each $t \in [0,T)$ and with left limits for each $t \in (0,T]\}$
$d$ : number of nodes of a network / graph
$E^{(\alpha)}$, $E^{[\alpha]}$ : expectation wrt a twisted distribution; $(\alpha)$: inter event time twisted with parameter $\alpha$; $[\alpha]$: counting process twisted with parameter $\alpha$, section 3.6
$F_\beta$ : exponential transform / twist, def 2.3.1
$F^{+a}$ : distribution function of $\tau - a$ conditional on $\tau > a$, def 2.1.7
$g_i(\cdot)$ : restriction, above claim 6.3.2
$g_\gamma(\cdot)$ : def 6.2.2
$K(\cdot)$ : lmgf of splitting probabilities, def 4.4.4
$K(\cdot)$ : set of nodes / indices, claim 6.1.1
$L(X)$ : law / distribution of a random variable $X$
$L(\alpha)$, $L(\alpha, k)$ : a level set, the level set of a function identified through $k$
$\Lambda(\cdot)$ : lmgf of an inter event time, def 2.2.1
$\Lambda(\cdot)$ : set of nodes / indices, proof of 5.2.15, claim 6.1.1
$L(\cdot,\cdot)$ : local rate function
$\lambda \in \mathbb{R}^d$, $\lambda_i$ : vector of arrival rates, arrival rate at node $i$
$\lambda^{M}$ : vector of arrival rates to nodes $i \in M$, claim 5.2.10
$\lambda^{\Lambda} \in \mathbb{R}^{|\Lambda^c|}$ : arrival rates to $\Lambda^c$-nodes in the subnetwork of $\Lambda^c$ nodes with $\Lambda$-nodes free, proof of 5.2.15
$\mu$, $\mu_i$ : (vector of) service rate(s), same indexing as with $\lambda$
$N$ : counting process, def 3.1.1
$N^{\sigma}$, $N^{\tilde{\tau}}$ : cp with indicated initial inter event time, def 3.1.5
$\check{N}$ : counting process as a member of a coupling
$\hat{N}$ : interpolated counting process
$N^{re,s}$, $N^{re,(s_1,\ldots,s_k)}$ : restarted counting process, $s$ (or $s_1, \ldots, s_k$) indicating the time(s) of restarting, defs 3.4.11, 3.4.16
$N^{sp}$ : a split counting process, def 4.4.1
$P$, $p^{(i)}$ : routing matrix, row of the routing matrix $P$
$\pi_k$ : projection onto span $\{e_k\}$
$\pi_M$, $M$ a set : projection onto span $\{e_k \mid k \in M\}$
$\Psi(\cdot)$ : lmgf of the free process, def 5.3.9
$\mathbb{R}_{>0}$, $\mathbb{R}_{\ge 0}$ : $\mathbb{R} \cap (0,\infty)$, $\mathbb{R} \cap [0,\infty)$
$R^{(i)}$ : runtime process of the server at node $i$
$R(\cdot,\cdot)$ : operator to describe / define a runtime process, def 5.3.21
$T$, $T^{(i)}$ : linear transformation, def 4.4.7
$A^{\top}$ : transpose of a matrix $A$
$T$ : often a time-index $\in \mathbb{R}_{>0}$
Chapter 1
Introduction
We develop local large deviations for the generalised Jackson network. We
work with a continuous time model and with light tail distributions for inter
arrival and service times. Using classical large deviation theory of logarithmic
moment generating functions and exponential changes of measure we get a
local rate function that is almost a Fenchel Legendre transform of the free
process’ logarithmic moment generating function Ψ.
$$L(x, v) = \sup_{\alpha \in B_{K(x,v)}} \langle \alpha, v \rangle - \Psi(\alpha) \tag{1.1}$$
What keeps the local rate function from being a full Fenchel Legendre transform are the restrictions $B_{K(x,v)}$, which reflect the nodes that are non-empty when the state of the network is $x$ and the nodes that fill up when the network evolves in direction $v$.
The way we develop the local large deviations makes it quite easy to obtain a weak and a full large deviation principle for the generalised Jackson network.
We also give a representation of the almost Fenchel Legendre transform as a Fenchel Legendre transform in lower dimension.
The approach to apply classical large deviation theory to stochastic networks
is inspired by “Large Deviations of Jackson Networks” of Irina Ignatiouk-
Robert published in 2000 [12].
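To make the supremum in (1.1) concrete, here is a minimal numerical sketch for the simplest case we can think of: a single M/M/1 node away from the boundary, so that no restriction on $\alpha$ is active. It assumes that the free process is the difference of a Poisson arrival stream of rate `lam` and a Poisson departure stream of rate `mu`, so that $\Psi(\alpha) = \lambda(e^{\alpha} - 1) + \mu(e^{-\alpha} - 1)$. The function names and the rates are illustrative and not taken from the thesis.

```python
import math

def psi(alpha, lam, mu):
    """Lmgf of the assumed free process: Poisson arrivals minus Poisson departures."""
    return lam * (math.exp(alpha) - 1.0) + mu * (math.exp(-alpha) - 1.0)

def local_rate_numeric(v, lam, mu, lo=-30.0, hi=30.0, iters=200):
    """Maximise alpha*v - psi(alpha) by ternary search (the objective is concave)."""
    f = lambda a: a * v - psi(a, lam, mu)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return f(0.5 * (lo + hi))

def local_rate_closed(v, lam, mu):
    """Closed form of the same supremum, from the first order condition in alpha."""
    root = math.sqrt(v * v + 4.0 * lam * mu)
    y = (v + root) / (2.0 * lam)          # y = exp(alpha) at the optimiser
    return v * math.log(y) - root + lam + mu
```

In this sketch the rate vanishes at the drift $v = \lambda - \mu$, the regular behaviour of the queue, and is positive for every other direction.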
1.1 Queues, networks, and rare events
A queueing network is a collection of inter connected service stations that
customers arrive to, travel through and leave. At each service station a cus-
tomer occupies the server in order to receive service. Whenever a customer
arriving at a server finds it busy serving another customer, the arriving cus-
tomer will queue.
We like to think about networks in a stochastic way: the time a customer
occupies a server may vary from customer to customer and from server to
server; the travelling through the network may be along different possible
routes and a customer leaving a node may choose between different nodes to
go to next. We perceive these decisions as random.
Assuming that the network has resources enough to finish servicing customers
at each station in a reasonable time we are interested in the rare events of
long queue sizes at some nodes and in probabilities of different evolutions of
large queue sizes over time.
In a network with sufficient resources large queue sizes will occur only rarely
- but they will. A manager of a network will have to find the balance between
increasing resources and the tolerance to rarely occurring large queue sizes
and when necessary come up with actions to reduce large queue sizes quickly.
The present thesis gives a guideline of how often to expect any kind of large
queue sizes for given resources in a network under its regular operating con-
ditions.
Before turning to the main object of interest of this thesis which are net-
works of queues, let us look at a single network node in isolation. The
simplest setting is the single server queue with Poisson arrivals and exponen-
tial service times, the so called M/M/1 queue. A generalised M/M/1 would
be denoted a GI/GI/1 queue where the stream of arriving customers is a
renewal counting process and service times are independent with a general
distribution (M is for Markovian and the Poisson process is Markovian, GI
is for General Independent).
To get a first impression of typical results consider a queue that generally
can serve arriving customers without the queue size becoming too large. Pro-
vided that the queue has been running for a long time, the probability for a
M/M/1 queue to be of size $x \in \mathbb{N}$ or larger is
$$P(Q \ge x) = \rho^{x} \tag{1.2}$$
where ρ < 1 is the traffic intensity, the ratio of arrival rate and service rate.
For the GI/GI/1 queue we can make the following approximation for the
probability of an unusually large size of at least $nx$: for $x \in \mathbb{R}$
$$\lim_{n \to \infty} \frac{1}{n} \log P(Q \ge nx) = -\delta x \tag{1.3}$$
for some $\delta > 0$ (for example [6] in 1994). This can be equivalently expressed as: for any small $\epsilon > 0$ and $n$ large enough
$$e^{n(-\delta x - \epsilon)} \le P(Q \ge nx) \le e^{n(-\delta x + \epsilon)} \tag{1.4}$$
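As a small sanity check of (1.3), the following sketch evaluates the scaled logarithm of the tail probability for the M/M/1 queue, where (1.2) gives the tail in closed form. In this special case the limit is attained for every $n$ and $\delta = -\log \rho$. The value of `rho` and the function names are illustrative assumptions.

```python
import math

def mm1_tail(x, rho):
    """Stationary P(Q >= x) for an M/M/1 queue with traffic intensity rho < 1, as in (1.2)."""
    return rho ** x

def scaled_log_tail(n, x, rho):
    """(1/n) * log P(Q >= n*x); for the M/M/1 queue this equals x * log(rho) for every n."""
    return math.log(mm1_tail(n * x, rho)) / n

rho = 0.8
delta = -math.log(rho)          # decay rate delta > 0 of (1.3) in this special case
values = [scaled_log_tail(n, 2, rho) for n in (1, 10, 100)]
```

For a GI/GI/1 queue the tail is in general not geometric, and (1.3) only holds in the limit.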
The approach via steady state probabilities works well if the network has
been running for a long time under stable conditions and is usually followed
in situations where we have no information other than that. If, however, we are able to monitor the system and observe its state $x_0$ at time $t = 0$, say, then we might want to know how the queue size evolves from then on. In a stable system we will most likely observe the queue size decreasing, but other evolutions may occasionally occur as well. In our approximative setting we do not observe the queue size exactly but, for some $n$, the smoothed version $Q^n$ as a function of time: $Q^n(t) = \frac{1}{n} Q_{nt}$. We are then interested in the following probability
$$P(\|Q^n - \psi\| < \epsilon \mid Q^n(0) = x_0) \tag{1.5}$$
with $\|\cdot\|$ denoting the supremum norm over the interval of interest.
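This and more general questions can be answered with a large deviation result by Anatolii Puhalskii [15] from 1995: under some technical conditions, for a set of functions $A$ with interior $A^{\circ}$ and closure $\bar{A}$,
$$-\inf_{\phi \in A^{\circ}} I(\phi) \le \liminf_{n \to \infty} \frac{1}{n} \log P(Q^n \in A \mid Q^n(0) = x_0) \le \limsup_{n \to \infty} \frac{1}{n} \log P(Q^n \in A \mid Q^n(0) = x_0) \le -\inf_{\phi \in \bar{A}} I(\phi) \tag{1.6}$$
Here the so called rate function $I(\phi)$ at $\phi$ is either infinite or can be calculated from an explicitly known local rate function $L$ by
$$I(\phi) = \int_{s=0}^{t_1} L(\phi(s), \dot{\phi}(s)) \, ds. \tag{1.7}$$
The rate function $I(\cdot)$ allows us to generalise the fixed rate $\delta$ of (1.3), which bounds the steady state probabilities: we get a rate $\inf_{\phi \in A^{\circ}} I(\phi)$ for the lower bound and a rate $\inf_{\phi \in \bar{A}} I(\phi)$ for the upper bound, both of which naturally depend on the event $A$ in which we are interested.
In particular the choice $A = \{\phi \mid \|\phi - \psi\| < \epsilon\}$ is feasible and we can approximate (1.5) by applying (1.6). We rephrase (1.6) in terms of small $\epsilon > 0$ and large $n$:
$$e^{n(-\inf_{\phi \in A^{\circ}} I(\phi) - \epsilon)} \le P(\|Q^n - \psi\| < \epsilon \mid Q^n(0) = x_0) \le e^{n(-\inf_{\phi \in \bar{A}} I(\phi) + \epsilon)} \tag{1.8}$$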
So far we have stated results for the single queue only. We now proceed to
networks of queues and state similar results. We are interested in approxi-
mating probabilities for specified rarely occurring behaviour of queue sizes,
now jointly for the queues at each node of the network. We start with steady
state probabilities for queue sizes at a fixed time and then move to probabil-
ities over intervals of time and with fixed starting positions.
Consider a network of $d$ M/M/1 queues and now let $Q \in \mathbb{N}^d$ be the vector of queue sizes at some fixed time, assuming that the system has been running for a long time. For $d = 1$ we are back with the isolated queue.
This network of M/M/1 queues, the so called Jackson network, was intro-
duced by James R. Jackson in 1957 [14] to model machine shops with goods
to be processed travelling through the network of machines. While a machine is busy processing, newly arriving goods have to wait to be processed later and queue in the meantime. The Jackson network has also been applied to documents travelling through a network of offices of a business or administration where they are processed by clerks [25]; $Q$ is then the height of the stack of documents piling up on each desk. More recently Jackson networks have been applied in the analysis and design of computer networks, where data packets are routed through a local area network or some larger network like the internet. $Q$ is then the number of packets in the buffer of each routing device.
[Figure 1.1: A network with $d = 4$ nodes]
As an example consider the network of $d = 4$ nodes in figure 1.1. Let
• $\lambda \in \mathbb{R}^4$ be non-negative with coordinates $\lambda_1, \lambda_2 > 0$ for the rate at which customers arrive at nodes 1 and 2. There are no customers arriving at nodes 3 and 4, so $\lambda_3 = \lambda_4 = 0$;
• $\mu \in \mathbb{R}^4$ be positive with $\mu_i$ the rate at which a server releases customers while busy;
• $P \in \mathbb{R}^{4 \times 4}$ be a routing matrix: the $i$-th row of $P$ gives the probabilities of which node to go to next and of leaving the network. In the example network leaving the network is possible only after finishing service at nodes 3 and 4.
The result of Jackson published in [14] is the following: if there is a unique $\nu$ solving the traffic equation
$$\nu = \lambda + P^{\top} \nu \tag{1.9}$$
and $\nu_i < \mu_i$ for all nodes $i = 1, \ldots, 4$, then the steady state probability for the queue sizes to be $x \in \mathbb{N}^4$ is
$$P(Q = x) = \prod_{i=1}^{4} \left(1 - \frac{\nu_i}{\mu_i}\right) \left(\frac{\nu_i}{\mu_i}\right)^{x_i}. \tag{1.10}$$
Note that $\nu_i < \mu_i$ makes precise what sufficient resources (all $\mu_i$ large enough) means: the network is generally able to serve all customers in a reasonable time.
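The two steps of Jackson's result can be sketched in a few lines: solve the traffic equation, then evaluate the product form. The routing probabilities and rates below are hypothetical (the arrows of figure 1.1 are not reproduced here); they only respect the stated structure, with arrivals at nodes 1 and 2 and departures after nodes 3 and 4. The sketch reads the traffic equation with the transposed routing matrix, so that row $i$ of `P` holds the probabilities of where a customer leaving node $i$ goes next.

```python
def solve_traffic(lam, P, iters=1000):
    """Fixed-point iteration for nu = lam + P^T nu (converges when customers eventually leave)."""
    d = len(lam)
    nu = list(lam)
    for _ in range(iters):
        nu = [lam[j] + sum(nu[i] * P[i][j] for i in range(d)) for j in range(d)]
    return nu

def product_form(x, nu, mu):
    """Steady state probability P(Q = x) from (1.10), valid when nu[i] < mu[i] for all i."""
    prob = 1.0
    for xi, ni, mi in zip(x, nu, mu):
        rho = ni / mi
        prob *= (1.0 - rho) * rho ** xi
    return prob

# Hypothetical example data, not taken from the thesis.
lam = [1.0, 0.5, 0.0, 0.0]
P = [[0.0, 0.0, 0.5, 0.5],   # node 1 sends customers to nodes 3 and 4
     [0.0, 0.0, 1.0, 0.0],   # node 2 sends customers to node 3
     [0.0, 0.0, 0.0, 0.5],   # node 3: to node 4, or leave with probability 0.5
     [0.0, 0.0, 0.0, 0.0]]   # node 4: always leave
mu = [2.0, 2.0, 2.0, 2.0]
nu = solve_traffic(lam, P)
p_empty = product_form([0, 0, 0, 0], nu, mu)
```

With these numbers every node has $\nu_i < \mu_i$, so the product form applies.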
Applications of Jackson’s result are manifold: it helps us to decide about the design of a network system where the probability of queue sizes exceeding some $x_{\max}$ has to be below some fixed level. Similarly, in a network with given service resources we can decide how to invest additional resources to get a maximum reduction of the probability of exceeding $x_{\max}$. To a network service provider, probabilities of large queue sizes and the empirical occurrence of large queue sizes are a measure of the quality of the provided service, and presumably of customer satisfaction.
In computer networks conditions will vary throughout the day and a network
service provider will observe and thus fix the present state of the network
and ask about probabilities for future evolution starting from the present
state. For this purpose steady state probabilities are not enough and we
turn to sample path large deviations. As before the large deviations concern
the scaled, smoothed process $Q^n$ that for $t$ in some interval is defined as $Q^n(t) = \frac{1}{n} Q(nt) \in \mathbb{R}^d$.
1.2 Large deviations of Jackson and generalised Jackson networks
Large deviations for a wide class of Markovian models including the Jackson network have in principle been obtained by Paul Dupuis and Richard S. Ellis in 1995 [7]. The analysis of stochastic networks has been perceived as difficult due to the discontinuity of their behaviour as queues change from empty to non-empty and back.
A result of Irina Ignatiouk-Robert of 2000 [12] gives the explicit form of the rate function for the Jackson network. We can make the same kind of bounds as in (1.6) and (1.8), now with $A$ a set of $d$-dimensional functions and the rate function $I(\cdot)$ again in integral form over a local rate function $L(\cdot,\cdot)$:
$$I(\phi) = \int_{s=0}^{T} L(\phi(s), \dot{\phi}(s)) \, ds$$
$$L(x, v) = l_{\Lambda(x)}(v) = \sup_{\alpha \in B_{\Lambda(x)}} \langle \alpha, v \rangle - R(\alpha) \tag{1.11}$$
for some explicitly known $R$ and set of restrictions $B_{\Lambda(x)}$. It is quite remarkable that, given the complexity of a network, the result comes in such a handy format.
The next step is of course to generalise the sample path large deviations
from networks of M/M/1 queues to networks of GI/GI/1 queues. This is
what we do in this thesis.
During the work on this thesis, in 2007, Anatolii Puhalskii [16] has proved
existence of a large deviation principle for the generalised Jackson network.
He gives a rate function in integral form with the local rate function a high-
dimensional convex optimisation problem. Complementing this we give the
explicit representation of the local rate function.
Generally our approach is very different from that of Puhalskii. It is closer
to ideas found in the work of Ignatiouk-Robert [12] and highlights the classi-
cal large deviation theory of finding the exponential change of measure that
turns the deviating behaviour into regular behaviour.
As a generalised Jackson network is not a Markov process we cannot profit from the rich theory developed around Markov processes in the last 100 years and we have to develop our own tools. The thesis is structured as follows:
Chapter 2 is on inter event times. Inter event times in a queueing
network are times between arrivals of customers or the service times
of customers at a node. We introduce inter event times and make
assumptions on their distributions. Further, we introduce the hazard
rate function. From inter event times we build
renewal counting processes in chapter 3. In the network setting an
arrival counting process will count the number of arrivals at a node over
an interval of time. We generally work with non-Markovian counting
processes and we prove in this section that some implications of the
Markov property can still be obtained for these non-Markovian count-
ing processes. This will be done through exponential equivalence.
We then introduce a change of measure for the renewal counting pro-
cess such that under the changed measure the process stays a renewal
counting process. Then of course we need to know about the
large deviations of the renewal counting process: We develop them in
chapter 4. We start with local large deviations applying the change
of measure developed in the previous chapter. Building on the local
large deviations we prove weak large deviations and strengthen these
to a full large deviation principle.
The large deviations of renewal counting processes are on the one hand required to develop the local large deviations of the generalised Jackson network. On the other hand we think that, in a similar and relatively easy way, they allow the local large deviations of the generalised Jackson network to be strengthened to a full large deviation principle.
Chapter 5 introduces stochastic networks and stochastic processes
that describe (aspects of) such networks: the free, the network, and
the local process.
We first define drifts for networks based on deterministic rates for the
network starting empty and the network starting with some nodes initially non-empty, and we give formulations of network drifts in terms of the solution to the linear complementarity problem and the Skorohod problem. When introducing the stochastic processes we show that these initially defined drifts are drifts of these processes, and thus describe the regular behaviour of these processes.
We then investigate rare events and develop sample path large devia-
tions for the free process.
In chapter 6 we prove the local large deviations for the generalised
Jackson network. We follow a classical approach here that uses an
exponential change of measure (developed over chapters 3, 4, 5) that
gives the upper bound as the almost Fenchel Legendre transform that
will be the local rate function. The optimiser of the almost Fenchel
Legendre transform corresponds to a change of measure which is then
applied to obtain the lower bound.
We close with identifying the rate function of Puhalskii with our local
large deviation rate function and with an example of how to calculate
the local rate function.
There are also further applications of these sample path large deviations: via the contraction principle they allow approximating probabilities of events that depend continuously on the queue sizes $Q$, see [9], [23]; these applications require an analytical form of the rate function.
As Jackson networks are Markov processes and generalised Jackson networks are not, our technique has to be fundamentally different from that in [7] and [12]. In terms of results the difference between (1.1) and (1.11) is small:
• $\Psi$ versus $R$: both are logarithmic moment generating functions of the associated free process, and $\Psi = R$ if the generalised Jackson network is a Jackson network.
• Optimising over $B_{\Lambda(x)}$ versus $B_{K(x,v)}$: for an absolutely continuous $\phi$ we have $\Lambda(\phi(t)) = K(\phi(t), \dot{\phi}(t))$ for almost all $t$.
We cite related work we are aware of at the beginning of chapters and give
reference to alternative proofs in the main text.
Chapter 2
Inter event times
In a stochastic network the times between arrivals of customers at a node, as well as the time a customer occupies the server at a node, are random.
In this thesis we generalise the sample path large deviation principle for
Jackson networks [12]: In a Jackson network times between arrivals of cus-
tomers at a node, as well as the time a customer occupies the server at a
node are iid exponentially distributed. We generalise this to arbitrary light
tailed distributions. Independence assumptions of the Jackson network are
not challenged. We start - in this chapter - with investigating general inter
event times and comparing them to exponential ones: How they are different
and what angle can be chosen to highlight similarities.
We will see that there are basically two classes of inter event times: those that stay relatively small (we call them LD-bounded) and those that may be large. The exponential distribution produces inter event times that are not LD-bounded, and we will apply general properties of such inter event times to make up for the lost Markov property.
2.1 Inter event time
In a GI/GI/1 queue times between consecutive arrivals are iid and so are the
required lengths of service for each customer. We will refer to times between
consecutive arrivals and to the lengths of service as inter event times.
Definition 2.1.1. An inter event time is a non-negative random variable.
Inter event times are often denoted by $\tau$ and variations of it. The distribution function of the inter event time $\tau$ will be denoted $F$. Throughout this thesis we follow the convention that for a distribution function $F$ we write $F^c = 1 - F$.
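Definition 2.1.2 ($\tilde{\ }$-transform of $F$). For a distribution function $F$ of an inter event time with finite mean define the distribution function $\tilde{F}$ as
$$\tilde{F}(x) := \int_{s=0}^{x} \frac{F^c(s)}{\int_{t=0}^{\infty} F^c(t) \, dt} \, ds$$
Note that $\tilde{F}$ has density
$$\tilde{f}(x) = \frac{F^c(x)}{\int_{s=0}^{\infty} F^c(s) \, ds} = \frac{F^c(x)}{E[\tau]} \tag{2.1}$$
and is the distribution function of an inter event time. The inter event time with distribution function $\tilde{F}$ will be denoted $\tilde{\tau}$. We see how the moments of $\tau$ and $\tilde{\tau}$ relate:
Claim 2.1.3. Let $\tau$ be an inter event time and $\tilde{\tau}$ associated with it. If $E[\tau^{n+1}] < \infty$ then $E[\tilde{\tau}^n] < \infty$ and $E[\tilde{\tau}^n] = \frac{E[\tau^{n+1}]}{(n+1) E[\tau]}$.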
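Proof of 2.1.3:
$$\begin{aligned}
E[\tau^{n+1}] &= \int_{x=0}^{\infty} x^{n+1} \, dF(x) \\
&= \lim_{z \to \infty} \left( z^{n+1} F(z) - \int_{x=0}^{z} F(x) \, dx^{n+1} \right) \\
&= \lim_{z \to \infty} \left( (n+1) \int_{x=0}^{z} x^{n} \, dx \, F(z) - (n+1) \int_{x=0}^{z} F(x) \, x^{n} \, dx \right) \\
&= \lim_{z \to \infty} (n+1) \int_{x=0}^{z} (F(z) - F(x)) \, x^{n} \, dx \\
&= (n+1) \int_{x=0}^{\infty} (1 - F(x)) \, x^{n} \, dx \\
&= (n+1) E[\tau] \int_{x=0}^{\infty} \frac{F^c(x)}{E[\tau]} \, x^{n} \, dx \\
&= (n+1) E[\tau] \, E[\tilde{\tau}^{n}]
\end{aligned}$$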
2.1.3
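Claim 2.1.3 can be checked numerically on a simple example. The sketch below assumes $\tau$ uniform on $[0,1]$, an illustrative choice that is not taken from the thesis, so that $F^c(x) = 1 - x$, $E[\tau] = 1/2$ and $E[\tau^k] = 1/(k+1)$. It integrates $x^n$ against the density (2.1) with a midpoint rule.

```python
def tilde_moment(n, Fc, mean, upper, steps=20000):
    """E[tilde_tau^n]: integral of x^n * Fc(x) / E[tau] over [0, upper], midpoint rule."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += x ** n * Fc(x)
    return total * h / mean

# Illustrative choice: tau uniform on [0, 1].
Fc = lambda x: 1.0 - x
mean = 0.5
n = 3
lhs = tilde_moment(n, Fc, mean, upper=1.0)
rhs = (1.0 / (n + 2)) / ((n + 1) * mean)   # E[tau^(n+1)] / ((n+1) E[tau])
```

Both sides agree up to the discretisation error of the midpoint rule.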
We construct another inter event time from $\tau$:
Definition 2.1.4 ($\bar{\tau}$ associated with $\tau$, $G$). Let $\tau, \tau_1, \tau_2, \ldots$ be iid with distribution function $F$, $p \in (0,1)$ fixed and $G$ the geometrically distributed random variable with mass function $P(G = g) = p^{g-1}(1-p)$ for $g = 1, 2, \ldots$. Let $G, \tau_1, \tau_2, \ldots$ be independent. Define $\bar{\tau}$ as
$$\bar{\tau} = \sum_{k=1}^{G} \tau_k.$$
Claim 2.1.5. If $\tau$ is an inter event time and $\bar{\tau}$ is associated with $\tau$ and $G$ with parameter $p$ then the means relate as $E[\bar{\tau}] = \frac{1}{1-p} E[\tau]$.
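Proof of 2.1.5: From independence of $\{G, \tau_1, \tau_2, \ldots\}$
$$E[\bar{\tau}] = E\Big[\sum_{k=1}^{G} \tau_k\Big] = \sum_{g=1}^{\infty} \underbrace{E\Big[\sum_{k=1}^{g} \tau_k\Big]}_{= g E[\tau]} P(G = g) = E[G \, E[\tau]] = E[G] \, E[\tau] = \frac{1}{1-p} E[\tau]$$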
2.1.5
Generally, $\bar{\tau}$ is not qualitatively different from $\tau$, and whenever we work with $\tau$ it might be of the form $\bar{\tau}$.
Remark 2.1.6. We will later see that $\tilde{\tau}$ as the time to the first event makes the renewal counting process have stationary increments and that $\bar{\tau}$ is the inter event time at a service node in a network that allows the leaving customer to immediately rejoin the queue just left.
A few times in this thesis we will need the distribution of an inter event time $\tau$ conditional on $\tau > a$ for some positive $a$.
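Definition 2.1.7. For a distribution function $F$ of an inter event time and $a \in \mathbb{R}_{\ge 0}$ such that $F(a) \in [0,1)$ define
$$F^{+a} : [0,\infty) \to [0,\infty), \quad x \mapsto \frac{F(x+a) - F(a)}{F^c(a)}$$
Claim 2.1.8. If $\tau$ has distribution function $F$ and $a \in \mathbb{R}$ is such that $F(a) \in [0,1)$ then $F^{+a}$ is the distribution function of $\tau - a$ conditional on $\tau > a$.
Proof of 2.1.8:
$$P(\tau - a > x \mid \tau > a) = \frac{P(\tau - a > x, \, \tau > a)}{P(\tau > a)} = \frac{P(\tau > x + a)}{P(\tau > a)} = \frac{F^c(x+a)}{F^c(a)}$$
$$P(\tau - a \le x \mid \tau > a) = 1 - \frac{F^c(x+a)}{F^c(a)} = \frac{F^c(a) - F^c(x+a)}{F^c(a)} = \frac{F(x+a) - F(a)}{F^c(a)}$$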
2.1.8
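The shifted distribution function of definition 2.1.7 is easy to evaluate. The sketch below builds $F^{+a}$ from a given $F$ and illustrates it on two example distributions chosen here for illustration: an exponential one, where the memoryless property gives $F^{+a} = F$, and a uniform one, where it does not. The function name `shifted_cdf` is hypothetical.

```python
import math

def shifted_cdf(F, a):
    """Return F^{+a}: the distribution function of tau - a conditional on tau > a."""
    Fa = F(a)
    if not 0.0 <= Fa < 1.0:
        raise ValueError("need F(a) in [0, 1)")
    return lambda x: (F(x + a) - Fa) / (1.0 - Fa)

rate = 1.5
F_exp = lambda x: 1.0 - math.exp(-rate * x)     # exponential inter event time
F_uni = lambda x: min(max(x, 0.0), 1.0)         # uniform on [0, 1]

G_exp = shifted_cdf(F_exp, 0.7)   # equals F_exp again (memoryless)
G_uni = shifted_cdf(F_uni, 0.5)   # uniform on [0, 0.5]
```

The exponential case shows the property that is lost when inter event times are allowed to have general distributions.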
2.2 The logarithmic moment generating function
The logarithmic moment generating function is an essential tool in classical large deviation theory.
Definition 2.2.1 ($\Lambda$, $\mathcal{D}(\Lambda)$). For an inter event time $\tau$ the logarithmic moment generating function $\Lambda$ is defined as
$$\Lambda : \mathbb{R} \to \mathbb{R} \cup \{\infty\}, \quad \theta \mapsto \log E[e^{\theta \tau}]$$
and abbreviated as lmgf. The domain of $\Lambda$ is defined as
$$\mathcal{D}(\Lambda) := \{\theta \in \mathbb{R} \mid \Lambda(\theta) < \infty\}.$$
For a distribution function $F$ of an inter event time we may also say that $\Lambda$ is the lmgf of $F$; similarly for a density $f$ of an inter event time.
Throughout this thesis we make the following
Assumption 2.2.2. $\mathcal{D}(\Lambda)$ is open and $\inf_{\theta \in \mathbb{R}} \Lambda(\theta) = -\infty$.
The following two claims help to interpret the assumption:
Claim 2.2.3. $\tau$ has point mass at $0$ iff its lmgf $\Lambda$ is bounded from below.
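Proof of 2.2.3. We first establish $\log F(0)$ as a lower bound of $\Lambda$. Let $\theta < 0$ and $\epsilon > 0$.
$$E[e^{\theta \tau}] = E[e^{\theta \tau} 1_{[0,\epsilon]}(\tau) + e^{\theta \tau} 1_{(\epsilon,\infty)}(\tau)] \ge e^{\theta \epsilon} F(\epsilon)$$
$$E[e^{\theta \tau}] \ge \lim_{\epsilon \searrow 0} e^{\theta \epsilon} F(\epsilon) = F(0) \quad (F \text{ right-continuous})$$
If $F(0) = 0$ we found a trivial lower bound for $\Lambda$ and we have to show that there is no other, non-trivial lower bound. Let $\epsilon > 0$ again and first choose $a$ such that $0 < a \le F^{-1}(\frac{\epsilon}{2})$ and then $\theta$ such that $\theta < \frac{1}{a} \log \frac{\epsilon}{2} < 0$. With these
$$E[e^{\theta \tau}] \le e^{\theta a} F^c(a) + F(a) \le \epsilon.$$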
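Note that by definition of $\Lambda$ we have $\Lambda(0) = 0$, and by the openness of $\mathcal{D}(\Lambda)$ assumed in 2.2.2 there is $\theta > 0$ such that $\Lambda(\theta) < \infty$.
Claim 2.2.4. If $\Lambda(\theta) < \infty$ for some $\theta > 0$ then all moments of $\tau$ exist.
Proof of 2.2.4: For $\theta > 0$ and $x \in \mathbb{R}_{\ge 0}$ the map $(l, x) \mapsto \frac{(\theta x)^l}{l!}$ is non-negative and we are allowed to interchange summation and integration as an application of the Tonelli theorem (cf [21] theorem 20 of chapter 12, p. 270).
$$E[e^{\theta \tau}] = \int_{x=0}^{\infty} e^{\theta x} \, dF(x) = \int_{x=0}^{\infty} \sum_{l=0}^{\infty} \frac{(\theta x)^l}{l!} \, dF(x) = \sum_{l=0}^{\infty} \frac{\theta^l}{l!} \int_{x=0}^{\infty} x^l \, dF(x) = \sum_{l=0}^{\infty} \frac{\theta^l}{l!} E[\tau^l] \tag{2.2}$$
All terms of the series are non-negative and the left-hand side is finite, so each moment $E[\tau^l]$ is finite.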
2.2.4
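We get $\Lambda \in C^{\infty}(\mathcal{D}(\Lambda))$ from $e^{\Lambda(\theta)}$ being a power series. Since $e^{\Lambda}$ is a power series we are also allowed to interchange differentiation and summation in the following
$$\frac{d}{d\theta} E[e^{\theta \tau}] = \sum_{l=0}^{\infty} \frac{d}{d\theta} \frac{\theta^l}{l!} E[\tau^l] = \sum_{l=0}^{\infty} \frac{\theta^l}{l!} E[\tau^{l+1}] = E\Big[\tau \sum_{l=0}^{\infty} \frac{\theta^l}{l!} \tau^l\Big] = E[\tau e^{\theta \tau}]$$
Openness of the domain allows this easy differentiation for all $\theta \in \mathcal{D}(\Lambda)$.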
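Claim 2.2.5. $\Lambda$ is convex and strictly increasing. If $\tau$ is not deterministic we have strict convexity of $\Lambda$ on its domain.
Proof of 2.2.5: Convexity follows from the Hölder inequality. By differentiating in the open domain
$$\frac{d}{d\theta} \Lambda(\theta) = E\left[\tau \frac{e^{\theta \tau}}{e^{\Lambda(\theta)}}\right] > 0, \qquad \frac{d^2}{d\theta^2} \Lambda(\theta) = E\left[\tau^2 \frac{e^{\theta \tau}}{e^{\Lambda(\theta)}}\right] - E\left[\tau \frac{e^{\theta \tau}}{e^{\Lambda(\theta)}}\right]^2$$
While $\frac{d}{d\theta} \Lambda(\theta) > 0$ follows from $P(\tau > 0) > 0$ as implied by assumption 2.2.2, we have $\frac{d^2}{d\theta^2} \Lambda(\theta) > 0$ only for $P(\tau = x) < 1$ for all $x \in \mathbb{R}_{\ge 0}$.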
2.2.5
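Remark 2.2.6. Note that
$$\frac{e^{\theta \tau}}{e^{\Lambda(\theta)}} > 0, \qquad E\left[\frac{e^{\theta \tau}}{e^{\Lambda(\theta)}}\right] = 1$$
and we can write $\Lambda'(\theta)$ and $\Lambda''(\theta)$ as expectation and variance of $\tau$ under a changed measure.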
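The remark can be illustrated on the exponential distribution, an example chosen here for illustration. For rate $r$ the lmgf is $\Lambda(\theta) = \log \frac{r}{r - \theta}$ for $\theta < r$, and weighting with $e^{\theta \tau - \Lambda(\theta)}$ turns $\tau$ into an exponential variable with rate $r - \theta$. The sketch compares finite difference derivatives of $\Lambda$ with the mean and variance of that twisted distribution.

```python
import math

def lmgf_exp(theta, rate):
    """Lmgf of an exponential inter event time; finite only for theta < rate."""
    if theta >= rate:
        return math.inf
    return math.log(rate / (rate - theta))

def derivatives(f, theta, h=1e-4):
    """Central finite differences for the first and second derivative of f at theta."""
    d1 = (f(theta + h) - f(theta - h)) / (2.0 * h)
    d2 = (f(theta + h) - 2.0 * f(theta) + f(theta - h)) / (h * h)
    return d1, d2

rate, theta = 2.0, 0.5
d1, d2 = derivatives(lambda t: lmgf_exp(t, rate), theta)
twisted_mean = 1.0 / (rate - theta)         # mean of Exp(rate - theta)
twisted_var = 1.0 / (rate - theta) ** 2     # variance of Exp(rate - theta)
```

In this example the domain of $\Lambda$ is $(-\infty, r)$, which is open, and $\Lambda(\theta) \to -\infty$ as $\theta \to -\infty$, so assumption 2.2.2 holds.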
Openness of the domain D(Λ) also implies that Λ is not bounded from
above. So 2.2.2 implies Λ(D(Λ)) = Rand the following is a feasible definition.
Definition 2.2.7 associated with Λ).Let Λbe the lmgf of an inter event
time τfor which assumption 2.2.2 holds. Then define Γassociated with Λas
Γ : RR, θ 7→ Λ1(θ)
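Claim 2.2.8. $D(\Gamma) = \mathbb{R}$, $\Gamma$ is strictly increasing, and (if $\tau$ is not deterministic) strictly convex on $\mathbb{R}$.
Proof of 2.2.8: Finiteness of $\Gamma$ on all of $\mathbb{R}$ is immediate from the definition. Strict convexity of $\Lambda$ was argued for in 2.2.5 and implies strict convexity for $\Gamma$:
\[
\frac{d}{d\theta}\Gamma(\theta) = -\frac{d}{d\theta}\Lambda^{-1}(-\theta) = \frac{1}{\Lambda'(-\Gamma(\theta))} > 0
\]
\[
\frac{d^2}{d\theta^2}\Gamma(\theta) = \frac{d}{d\theta}\frac{1}{\Lambda'(-\Gamma(\theta))} = \frac{\Lambda''(-\Gamma(\theta))}{\Lambda'(-\Gamma(\theta))^3} > 0
\]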
2.2.8
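So properties of $\Lambda$ translate into properties of $\Gamma$. For example
• $\lim_{\theta\to-\infty} \Lambda'(\theta) = 0 \Leftrightarrow \lim_{\theta\to\infty} \Gamma'(\theta) = \infty$
• $D(\Lambda) = (-\infty, L_C(h))$ is equivalent to $\Gamma(\mathbb{R}) = (-L_C(h), \infty)$.
• $\Gamma'(\theta) = \frac{1}{E^{(-\Gamma(\theta))}[\tau]}$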
We will now investigate properties of the lmgf of inter event times ˜τand τ.
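Definition 2.2.9. The lmgf of inter event time $\tilde\tau$ with distribution function $\tilde F$ is denoted $\tilde\Lambda$:
\[ \tilde\Lambda(\theta) = \log E[e^{\theta\tilde\tau}] = \log \int_{t=0}^{\infty} e^{\theta t}\, d\tilde F(t) \]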
The lmgf of inter event time τis denoted Λ:
Λ(θ) = log E[eθτ]
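Claim 2.2.10. $D(\Lambda) = D(\tilde\Lambda)$ and
\[
\tilde\Lambda(\theta) =
\begin{cases}
\log \dfrac{e^{\Lambda(\theta)} - 1}{E[\tau]\,\theta}, & \theta \ne 0 \\[1ex]
0, & \theta = 0.
\end{cases}
\tag{2.3}
\]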
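Proof of 2.2.10: If all moments exist for $\tau$, the same holds for $\tilde\tau$ (cf claim 2.1.3), and we can write the lmgf of $\tilde\tau$ in terms of the lmgf of $\tau$. While $\log E[e^{0 \cdot \tilde\tau}] = 0$, for $\theta \ne 0$ we get
\[
\begin{aligned}
E[e^{\theta\tilde\tau}] &= \sum_{l=0}^{\infty} \frac{\theta^l}{l!}\, E[\tilde\tau^l] \\
&= \sum_{l=0}^{\infty} \theta^l \underbrace{\frac{1}{l!}\,\frac{1}{l+1}}_{= \frac{1}{(l+1)!}} \frac{1}{E[\tau]}\, E[\tau^{l+1}] \\
&= \frac{1}{E[\tau]}\,\frac{1}{\theta} \sum_{l=1}^{\infty} \frac{\theta^l E[\tau^l]}{l!} \\
&= \frac{1}{E[\tau]\,\theta} \Big( \sum_{l=0}^{\infty} \frac{\theta^l E[\tau^l]}{l!} - 1 \Big) \\
&= \frac{1}{E[\tau]\,\theta} \big( E[e^{\theta\tau}] - 1 \big)
\end{aligned}
\]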
2.2.10
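As a quick sanity check of (2.3), consider the exponential case. This worked example is added for illustration only; the choice $\tau \sim \mathrm{Exp}(\mu)$ is an assumption of the example, and it uses the standard facts $E[\tau] = 1/\mu$ and $e^{\Lambda(\theta)} = \mu/(\mu-\theta)$ for $\theta < \mu$.

```latex
\[
\tilde\Lambda(\theta)
= \log\frac{e^{\Lambda(\theta)}-1}{E[\tau]\,\theta}
= \log\left(\frac{\mu}{\theta}\Big(\frac{\mu}{\mu-\theta}-1\Big)\right)
= \log\left(\frac{\mu}{\theta}\cdot\frac{\theta}{\mu-\theta}\right)
= \log\frac{\mu}{\mu-\theta}
= \Lambda(\theta),
\qquad \theta<\mu,\ \theta\neq 0 .
\]
```

So in this case $\tilde\Lambda = \Lambda$ and both domains equal $(-\infty, \mu)$, in line with the memorylessness of the exponential distribution.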
We see that ˜τfalls under assumption 2.2.2. ˜
Λ C(D(˜
Λ)) = C(D(˜
Λ))
and we need not worry about continuity of ˜
Λ in θ= 0.
Claim 2.2.11. If τis associated with τand geometric Gof parameter p
(0,1) then
Λ(θ) = Λ(θ) + log 1p
1p eΛ(θ),D) = ,Λ1(log p)
Proof of 2.2.11:
\[ E[e^{\alpha G}] = \frac{1-p}{1 - p\,e^{\alpha}}\, e^{\alpha} \]
is finite only for $\alpha < \log\frac{1}{p}$. For $\theta < \Lambda^{-1}(-\log p)$
E[eθτ] = E[eθPG
k=1 τk] = E[E[eθPG
k=1 τk|G] ]
=E[E[eθτ1|G]G] = E[E[eθτ1]G]
=E[eΛ(θ)G] = E[eΛ(θ)G]
is finite and
log E[eθτ] = log E[eΛ(θ)G] = log 1p
1p eαeαα=Λ(θ)
as claimed. Openness of the domain of Λfollows from openness of the
domain of the lmgf of G.
2.2.11
Corollary 2.2.12. If for τassumption 2.2.2 holds then this assumption holds
for τof definition 2.1.4, too.
Proof of 2.2.12: Openness of the domain of Λcomes from 2.2.11. Un-
boundedness from below is immediate from the explicit form of Λ:
lim
θ→−∞ Λ(θ) = lim
θ→−∞ Λ(θ) + log lim
θ→−∞
1p
1p eΛ(θ)
=−∞ + log(1 p) = −∞
2.2.12
Throughout this thesis we make the following
Assumption 2.2.13. $\tau$ has a density $f$ and $\limsup_{a\to\infty} E[\tau - a \mid \tau > a] < \infty$.
This assumption is mainly for technical convenience and to ease proofs. Existence of the density of course implies the unboundedness from below of $\Lambda$ of assumption 2.2.2 - only $P(\tau = 0) = 0$ was required for this. The boundedness of the conditional expectation is a property of many inter event times. Also, as an interpretation note that $\lim_{a\to\infty} E[\tau - a \mid \tau > a]$ $(= \lim_{a\to\infty} \int_{x=0}^{\infty} F^c_{+a}(x)\,dx) = \infty$ is a harsh case of "used better than new" - excluding it should do no harm.
2.3 Exponential twist of inter event times
Definition 2.3.1 (Twisted distribution, exponential transform of $F$). Let $F$ be the distribution function of an inter event time with lmgf $\Lambda$. Let $\beta \in D(\Lambda)$; then the twisted distribution $F_\beta$ is defined as
\[ F_\beta(x) := \int_{s=0}^{x} e^{\beta s - \Lambda(\beta)}\, dF(s) \]
and $\beta$ will be called the twist parameter.
We can say that $F_\beta$ has density $s \mapsto e^{\beta s - \Lambda(\beta)}$ wrt $F$ and that if $F$ has density $f$ (wrt Lebesgue measure) then $F_\beta$ has density (wrt Lebesgue measure)
\[ f_\beta(x) := f(x)\, e^{\beta x - \Lambda(\beta)}. \tag{2.4} \]
The exponential transform of an exponential distribution is again an exponential distribution; the parameter changes as $\mu \mapsto \mu - \beta$ for $\beta$ the parameter in the exponential transform. In this section we describe properties of exponential transforms applied to general inter event times.
Claim 2.3.2. If $\tau$ falls under assumption 2.2.2 and $\tau_\beta$ is associated with $\tau$ through the exponential transform with parameter $\beta \in D(\Lambda)$ then $\tau_\beta$ falls under assumption 2.2.2, too.
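Proof of 2.3.2: We calculate $\Lambda_\beta$, the lmgf of $\tau_\beta$ with distribution function $F_\beta$.
\[
\begin{aligned}
E[e^{\theta\tau_\beta}] &= \int_{s=0}^{\infty} e^{\theta s}\, dF_\beta(s) = \int_{s=0}^{\infty} e^{\theta s}\, e^{\beta s - \Lambda(\beta)}\, dF(s) \\
&= \int_{s=0}^{\infty} e^{(\theta+\beta)s}\, dF(s)\; e^{-\Lambda(\beta)} = E[e^{(\theta+\beta)\tau}]\, \frac{1}{E[e^{\beta\tau}]}
\end{aligned}
\]
\[ \Lambda_\beta(\theta) = \log E^{(\beta)}[e^{\theta\tau}] = \Lambda(\beta + \theta) - \Lambda(\beta) \tag{2.5} \]
which we might want to write as $D(\Lambda_\beta) = D(\Lambda) - \beta$. So $\beta \in D(\Lambda)$ is equivalent to $0 \in D(\Lambda_\beta)$ and generally openness of the domain is not changed by its translation. Unboundedness from below for $\Lambda_\beta$ is immediate.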
2.3.2
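The remark that twisting an exponential distribution only shifts its parameter can be checked numerically against (2.5). The following is a minimal sketch, assuming an exponential inter event time; the function names and the values of `mu`, `beta` and `theta` are illustrative choices and not part of the thesis.

```python
import math


def lmgf_exponential(theta: float, mu: float) -> float:
    """Lambda(theta) = log E[exp(theta * tau)] for tau ~ Exp(mu); infinite for theta >= mu."""
    if theta >= mu:
        return math.inf
    return math.log(mu / (mu - theta))


def twisted_lmgf(theta: float, beta: float, mu: float) -> float:
    """Lambda_beta(theta) = Lambda(beta + theta) - Lambda(beta), as in (2.5)."""
    return lmgf_exponential(beta + theta, mu) - lmgf_exponential(beta, mu)


mu, beta, theta = 2.0, 0.5, 0.7  # illustrative values with beta < mu
lhs = twisted_lmgf(theta, beta, mu)          # lmgf of the twisted inter event time
rhs = lmgf_exponential(theta, mu - beta)     # lmgf of Exp(mu - beta)
print(abs(lhs - rhs) < 1e-12)
```

The domain shift $D(\Lambda_\beta) = D(\Lambda) - \beta$ is visible as well: `twisted_lmgf` becomes infinite as soon as `beta + theta` reaches `mu`.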
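Lemma 2.3.3. Exponential transforms can be inverted: If $\beta \in D(\Lambda)$ then
\[ (F_\beta)_{-\beta} = F, \qquad (f_\beta)_{-\beta} = f \]
Proof of 2.3.3: The second twist with parameter $-\beta$ has to be relative to $f_\beta$. We write the twists explicitly.
\[
\begin{aligned}
f_\beta(x) &= f(x)\, e^{\beta x - \Lambda(\beta)} \\
(f_\beta)_{-\beta}(x) &= f_\beta(x)\, e^{-\beta x - \Lambda_\beta(-\beta)} \\
e^{\Lambda_\beta(\theta)} &= E^{(\beta)}[e^{\theta\tau}] = \int_{x=0}^{\infty} e^{\theta x} f_\beta(x)\, dx \\
&= \int_{x=0}^{\infty} e^{\theta x} f(x)\, e^{\beta x - \Lambda(\beta)}\, dx = \int_{x=0}^{\infty} e^{(\theta+\beta)x} f(x)\, e^{-\Lambda(\beta)}\, dx \\
e^{\Lambda_\beta(-\beta)} &= \int_{x=0}^{\infty} f(x)\, e^{-\Lambda(\beta)}\, dx = e^{-\Lambda(\beta)} \\
(f_\beta)_{-\beta}(x) &= f(x)\, \underbrace{e^{(\beta-\beta)x}}_{=1}\, \underbrace{e^{-\Lambda(\beta)+\Lambda(\beta)}}_{=1}
\end{aligned}
\]
We check that everything is well defined.
\[ -\beta \in D(\Lambda_\beta) = D(\Lambda) - \beta \;\Leftrightarrow\; 0 \in D(\Lambda) \]
Since we calculated $\Lambda_\beta(-\beta) = -\Lambda(\beta)$, the lhs of this expression is finite whenever $\beta \in D(\Lambda)$, a condition we started with. So all's well.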
2.3.3
We close the section with some remarks.
Remark 2.3.4.
• The $\tilde{\ }$-transform and the exponential transform of $F$ do not generally commute: $(\tilde F)_\beta \ne \widetilde{(F_\beta)}$. Writing down $(\tilde f)_\beta$ and $\widetilde{(f_\beta)}$, equality requires that $x \mapsto e^{\beta x}\, \frac{F^c(x)}{F^c_\beta(x)}$ is constant. This holds for the exponential but not for the uniform distribution.
• Denote expectation and variance wrt the exponentially transformed distributions of 2.3.1 with parameter $\theta$ by indexing with $(\theta)$. Then derivatives of $\Lambda$ can be written as $\Lambda'(\theta) = E^{(\theta)}[\tau]$ and $\Lambda''(\theta) = V^{(\theta)}[\tau]$ (cf 2.2.5, 2.2.6).
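The identification of $\Lambda'$ and $\Lambda''$ with the twisted mean and variance can be illustrated numerically. The sketch below assumes $\tau$ uniform on $[0,1]$, for which $\Lambda(\theta) = \log\frac{e^\theta - 1}{\theta}$ in closed form; the helper names, the midpoint rule and the finite-difference step are choices made for this illustration only.

```python
import math


def lmgf_uniform(theta: float) -> float:
    """Lambda(theta) = log E[exp(theta * tau)] for tau uniform on [0, 1]."""
    if theta == 0.0:
        return 0.0
    return math.log(math.expm1(theta) / theta)


def twisted_moment(theta: float, power: int, n: int = 2000) -> float:
    """Midpoint-rule value of E^(theta)[tau^power] = int_0^1 x^power e^{theta x - Lambda(theta)} dx."""
    norm = lmgf_uniform(theta)
    step = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * step
        total += x ** power * math.exp(theta * x - norm)
    return total * step


def derivatives(theta: float, eps: float = 1e-3):
    """Central finite differences approximating Lambda'(theta) and Lambda''(theta)."""
    up, mid, down = lmgf_uniform(theta + eps), lmgf_uniform(theta), lmgf_uniform(theta - eps)
    return (up - down) / (2 * eps), (up - 2 * mid + down) / eps ** 2


theta = 1.3  # illustrative twist parameter
mean = twisted_moment(theta, 1)
variance = twisted_moment(theta, 2) - mean ** 2
d1, d2 = derivatives(theta)
print(mean, d1)      # twisted mean next to Lambda'(theta)
print(variance, d2)  # twisted variance next to Lambda''(theta)
```

The zeroth twisted moment returned by `twisted_moment(theta, 0)` is approximately 1, which is the normalisation noted in remark 2.2.6.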
2.4 Hazard
We introduce the hazard rate and make some mild technical assumptions on
it. We’ll see how the hazard rate relates to the domain of the lmgf, especially
in terms of boundedness. Also, we give examples for the hazard rate - for
well known distributions and when constructing inter event times from the
hazard rate.
We also investigate how hazard rates change under exponentially twisting
the distribution, and how hazard rates of τand ˜τrelate.
We will again see an analogy to the exponential distribution: When de-
fined the right way the mean rate of the hazard function equals the bound
of the domain.
2.4.1 Introduction of the hazard function
For inter event time $\tau$ with distribution function $F$ define $H := -\log F^c$ on the support of $\tau$. Assuming absolute continuity of $F$, $H$ also has a derivative almost everywhere and $h(x) = \frac{d}{dx} H(x) = \frac{\frac{d}{dx} F(x)}{F^c(x)} = \frac{f(x)}{F^c(x)}$ where it exists.
Interpreting $h$ in terms of probability:
\[
h(x) = \lim_{\epsilon \to 0} \frac{1}{\epsilon}\, \frac{P(\tau \in (x, x+\epsilon])}{P(\tau > x)} = \lim_{\epsilon \to 0} \frac{1}{\epsilon}\, P(\tau \in (x, x+\epsilon] \mid \tau > x)
\]
which is the infinitesimal probability for $\tau$ to take a value in the infinitesimal interval $(x, x + dx]$ given that $\tau > x$. $h$ is called the hazard function of $\tau$.
Definition 2.4.1 (Hazard function, $L_C(h)$). For an inter event time $\tau$ with density $f$ the hazard function $h$ is
\[
h : [0, \infty) \to [0, \infty], \quad x \mapsto h(x) =
\begin{cases}
\dfrac{f(x)}{F^c(x)}, & F^c(x) > 0 \\[1ex]
0, & \text{else}
\end{cases}
\]
and $L_C(h) := \lim_{x\to\infty} \frac{1}{x} \int_{s=0}^{x} h(s)\, ds$ is the Cesaro mean of $h$.
In this thesis we always make the following
Assumption 2.4.2. For an inter event time $\tau$ with hazard function $h$ the Cesaro mean $L_C(h)$ exists and $L_C(h) \in (0, \infty]$.
Note that given a distribution function $F$ the density $f$ and the hazard function $h$ are not uniquely defined. However, we have
• $F^c(x) = \exp\{-\int_{s=0}^{x} h(s)\, ds\}$ holds for all $x$
• $-\frac{d}{dx} \log F^c(x) = h(x)$ holds for almost all $x \in \mathrm{supp}(f)$.
Thus we see that given a hazard function we can construct the distribution function. But what is a hazard function in general?
Claim 2.4.3. Let $h : [0, \infty) \to [0, \infty]$ be measurable and $H = \int h$. If $\lim_{\epsilon \to 0} H(\epsilon) = 0$ and $\lim_{x\to\infty} H(x) = \infty$ then $F = 1 - e^{-H}$ is a distribution function with a density. On the support of $F$ the hazard function is a.s. equal to $h$.
Proof of 2.4.3: We prove that $F := 1 - e^{-H}$ is a distribution function.
• $F$ is non-negative and continuous since $H$ is.
• $\lim_{x\to\infty} F(x) = 1 - e^{-\lim_{x\to\infty} H(x)} = 1 - 0 = 1$
• $F$ is increasing since $h \ge 0$ and $H$ is increasing.
This distribution has a density iff $F$ is an absolutely continuous function. $H$ is absolutely continuous by definition. And $g(z) = 1 - e^{-z}$ is Lipschitz continuous with $g'(z) = e^{-z} \le 1$ for $z \ge 0$. Since $H \ge 0$ we get absolute continuity for $F = g \circ H$. Let $x$ be such that $F(x) < 1$ ($H(x) < \infty$):
\[
\frac{d}{dx} F(x) = \frac{d}{dx}\big(1 - e^{-H(x)}\big) = -e^{-H(x)}(-h(x)) = h(x) \exp\Big\{-\int_{s=0}^{x} h(s)\, ds\Big\}
\]
(with $H' = h$ a.s. where $H$ is finite, from absolute continuity) is the density of $F$.
How does the support of $F$ relate to $H$? If $H(x) = \infty$ for some finite $x$, then $H(z) = \infty$ and $F^c(z) = 0$ for all $z > x$. The hazard of $F$ is thus defined to be $= 0$ on $[x, \infty)$, though $h$ may take any value there (from the claim).
Now on the support of $F$: if $F(x) < 1$ then $H(x) < \infty$ and applying the density $f$ as calculated, $\frac{f}{F^c} = \frac{h\, e^{-H}}{F^c} = h$.
2.4.3
Had we assumed continuity of $h$, all of the above would have been immediate.
How does assumption 2.2.2 translate into $H$? We will discuss this in the following examples section.
For a bounded inter event time $\tau \le b$ it is necessary ($F(b) = 1$) that $\lim_{t \to b} H(t) = \lim_{t \to b} \int_{s=0}^{t} h(s)\, ds = \infty$ and $\limsup_{t \to b} h(t) = \infty$. This agrees with the following interpretation for the hazard: If an event has to happen before $b$ then the force for it to happen increases without bound as time approaches $b$.
2.4.2 The hazard function and the domain of lmgf
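For an exponential distribution with density $f(x) = \mu e^{-\mu x}$ the parameter $\mu$ is the boundary for the domain of its logarithmic moment generating function: $D(\Lambda) = (-\infty, \mu)$ and the constant hazard rate is $h \equiv \mu$. We generalise this.
Claim 2.4.4. If the Cesaro mean of the hazard rate diverges to $\infty$ the lmgf has an unbounded domain: $L_C(h) = \infty \Rightarrow D(\Lambda) = \mathbb{R}$.
Proof of 2.4.4: Let $M > \theta$ and $x_0$ be large enough for $\frac{H(x)}{x} > M$ for all $x \ge x_0$.
\[
\begin{aligned}
E[e^{\theta\tilde\tau}] &= \frac{1}{E[\tau]} \Big( \int_{x=0}^{x_0} e^{\theta x} F^c(x)\, dx + \int_{x=x_0}^{\infty} \underbrace{e^{\theta x} F^c(x)}_{= e^{x(\theta - \frac{H(x)}{x})}}\, dx \Big) \\
&\le \frac{1}{E[\tau]} \Big( \int_{x=0}^{x_0} e^{\theta x} F^c(x)\, dx + \int_{x=x_0}^{\infty} e^{x(\theta - M)}\, dx \Big) < \infty
\end{aligned}
\]
By 2.2.10 this is equivalent to unboundedness of $D(\Lambda)$.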
2.4.4
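Claim 2.4.5. If the Cesaro limit $L_C(h)$ exists in $(0, \infty)$ then the domain of the lmgf of $\tau$ is bounded and $D(\Lambda) = (-\infty, L_C(h))$.
Proof of 2.4.5: For $\theta < L_C(h)$ set $\epsilon := \frac{L_C(h) - \theta}{2} > 0$. Let $x_0$ be large enough for
\[ \frac{H(x)}{x} > L_C(h) - \epsilon \quad \forall x \ge x_0. \]
Then
\[
\theta x - H(x) = x\Big(\theta - \frac{H(x)}{x}\Big) \le x\big(\theta - (L_C(h) - \epsilon)\big) = x(-2\epsilon + \epsilon) = -\epsilon x
\]
and
\[
\int_{x=x_0}^{\infty} e^{\theta x} F^c(x)\, dx \le \int_{x=x_0}^{\infty} e^{-\epsilon x}\, dx \xrightarrow{x_0 \to \infty} 0
\]
and by 2.2.10 also $\theta \in D(\Lambda)$.
Now let $\theta > L_C(h)$ and $\epsilon = \frac{\theta - L_C(h)}{2} > 0$ and $x_0$ large enough for $\frac{H(x)}{x} < L_C(h) + \epsilon$ for all $x \ge x_0$. Then
\[
\theta x - H(x) = x\Big(\theta - \frac{H(x)}{x}\Big) > x\big(\theta - L_C(h) - \epsilon\big) = \epsilon x
\]
and $\int_{x=x_0}^{\infty} e^{\theta x} F^c(x)\, dx = \infty$ and $E[e^{\theta\tilde\tau}] = \infty$, implying $\theta \notin D(\tilde\Lambda)$, and $L_C(h)$ thus has to be on the boundary of $D(\tilde\Lambda)$. Since $D(\Lambda) = D(\tilde\Lambda)$ the claimed statement is justified.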
2.4.5
Corollary 2.4.6. From 2.4.4 and 2.4.5 we can completely generalise the property of the exponential distribution: If $L_C(h)$ exists in $(0, \infty]$ then $D(\Lambda) = (-\infty, L_C(h))$.
Similarly for the exponential distribution $L_C(h) = \mu$ is the exact exponential decay rate of the tail of the distribution function, $t \mapsto e^{-\mu t}$. We generalise this expression to the more general $F$ we allow for inter event time distributions:
Remark 2.4.7. For $L_C(h) = \infty$ we say the tails of $F$ decay superexponentially and for $L_C(h) < \infty$ we'll say that $F^c$ decays exponentially with rate $L_C(h)$.
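Corollary 2.4.8.
• $L_C(h) = L_C(\tilde h)$
due to $(-\infty, L_C(h)) \overset{2.4.6}{=} D(\Lambda) \overset{2.2.10}{=} D(\tilde\Lambda) \overset{2.4.6}{=} (-\infty, L_C(\tilde h))$.
• $L_C(h_\beta) = L_C(h) - \beta$
due to $(-\infty, L_C(h)) \overset{2.4.6}{=} D(\Lambda)$ and $D(\Lambda_\beta) \overset{(2.5)}{=} D(\Lambda) - \beta$.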
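In 2.2.2 we have assumed openness of the domain. The following is a sufficient condition for openness of $D(\Lambda)$.
Lemma 2.4.9. If $L_C(h) < \infty$ and $\limsup_{x\to\infty} H(x) - L_C(h)\,x < \infty$ then the domain of $\Lambda$ is open as assumed in 2.2.2.
Proof of 2.4.9: If $\limsup_{x\to\infty} H(x) - x L_C(h) < \infty$ then $\liminf_{x\to\infty} e^{x L_C(h) - H(x)} > 0$ and
\[
E[e^{L_C(h)\tilde\tau}] = \frac{1}{E[\tau]} \int_{x=0}^{\infty} e^{L_C(h)x} F^c(x)\, dx = \frac{1}{E[\tau]} \int_{x=0}^{\infty} e^{L_C(h)x - H(x)}\, dx = \infty
\]
Then $D(\tilde\Lambda)$ does not contain its right boundary $L_C(\tilde h) = L_C(h)$ and since $D(\tilde\Lambda) = D(\Lambda)$ neither does $D(\Lambda)$.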
2.4.9
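The proof of 2.4.9 also yields an equivalent condition: the domain fails to be open exactly when $E[e^{L_C(h)\tilde\tau}] < \infty$, that is
\[ E[e^{L_C(h)\tilde\tau}] < \infty \;\Leftrightarrow\; e^{L_C(h)x - H(x)} \to 0 \text{ integrably fast.} \]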
2.4.3 Examples
We give several examples of hazard rates with different properties in bound-
edness and monotonicity. We discuss assumptions 2.2.2 and 2.4.2.
Example 2.4.10. The exponential distribution is characterised by its constant hazard rate: $f(x) = \mu e^{-\mu x} \Leftrightarrow h(x) = \mu = L_C(h)$.
Example 2.4.11. The Erlang distribution $E_k(\mu)$ where $k \ge 1$ has a hazard rate that is monotonically increasing to $\mu$.
\[
F^c(x) = e^{-\mu x} \sum_{j=0}^{k-1} \frac{(\mu x)^j}{j!}, \qquad H(x) = \mu x - \log \sum_{j=0}^{k-1} \frac{(\mu x)^j}{j!}
\]
\[
f(x) = \frac{\mu^k x^{k-1}}{(k-1)!}\, e^{-\mu x}, \qquad h(x) \overset{x>0}{=} \mu \Big( 1 + \sum_{j=1}^{k-1} \frac{(\mu x)^{-j} (k-1)!}{(k-1-j)!} \Big)^{-1}
\]
\[ L_C(h) = \mu \]
The domain of the lmgf for a hazard function with $h(x) < L_C(h)$ is generally open as implied by 2.4.9. Erlang distributed inter event times fall under both assumptions.
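Example 2.4.12 (Oscillating hazard, discrete). If $h(x) = \kappa\, \mathbb{1}_{\lfloor x \rfloor \text{ odd}} + \mu\, \mathbb{1}_{\lfloor x \rfloor \text{ even}}$ then $L_C(h) = \frac{1}{2}(\mu + \kappa)$.
We have
\[
H(x) = 2 \Big\lfloor \frac{x}{2} \Big\rfloor \frac{\kappa + \mu}{2} + \int_{s = 2\lfloor \frac{x}{2} \rfloor}^{x} h(s)\, ds
\]
\[
\begin{aligned}
L_C(h)\,x - H(x) &= \int_{s = 2\lfloor \frac{x}{2} \rfloor}^{x} \Big(\frac{\kappa + \mu}{2} - \kappa\Big) \mathbb{1}_{\lfloor s \rfloor \text{ odd}} + \Big(\frac{\kappa + \mu}{2} - \mu\Big) \mathbb{1}_{\lfloor s \rfloor \text{ even}}\, ds \\
&= \int_{s = 2\lfloor \frac{x}{2} \rfloor}^{x} -\frac{\kappa - \mu}{2}\, \mathbb{1}_{\lfloor s \rfloor \text{ odd}} + \frac{\kappa - \mu}{2}\, \mathbb{1}_{\lfloor s \rfloor \text{ even}}\, ds \\
&\not\to -\infty
\end{aligned}
\]
Again from 2.4.9 we get an open $D(\Lambda)$.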
Example 2.4.13 (Oscillating hazard, continuous). If $h(x) = 1 + \sin(x)$ then $L_C(h) = 1$ and $F^c(x) = \exp\{-x - 1 + \cos x\}$. More generally for $a, b > 0$ define $h_{a,b}(x) := a + b(1 + \sin(x))$.
Again $L_C(h)\,x - H(x) \not\to -\infty$ and $\exp\{L_C(h)\,x - H(x)\}$ is not integrable. 2.4.9 applies.
Example 2.4.14. If $h(x) = a + \frac{c}{x + b}$ for $a, b > 0$ and $c \in (0, 1]$ then $L_C(h) = a$.
We have
\[
L_C(h) = a, \qquad H(x) = a x + c \log \frac{x + b}{b}, \qquad e^{L_C(h)x - H(x)} = \Big( \frac{b}{x + b} \Big)^c
\]
and $E[e^{L_C(h)\tilde\tau}] = \infty$ for the parameters given in the example.
$c > 1$ is excluded since otherwise the domain would not be open. Similarly $b > 0$ is necessary since a function $x \mapsto a + \frac{1}{x}$ does not qualify as a hazard function, as $\int_{x=0}^{z} a + \frac{1}{x}\, dx = \infty$ for any $z > 0$ or equivalently $\lim_{\epsilon \to 0} H(\epsilon) = \infty \ne 0$.
Also the case of $a = 0$ does not fall under assumption 2.4.2.
While $h(x) = 2 + \cos \log(1 + x)$ is a hazard function it does not fall under assumption 2.4.2.
Example 2.4.15 (Exponential hazard). If $h(x) = 1 - e^{-x}$ then $L_C(h) = 1$ and $F^c(x) = \exp\{-x - e^{-x} + 1\}$. More generally for $a > 1$ and real $b$ define $h_{a,b}(x) = a + \mathrm{sign}(b)\, e^{bx}$.
For $b < 0$ we have $L_C(h) = a$ and an open domain from 2.4.9. For $b > 0$ we get $L_C(h) = \infty$ and openness is not an issue. Existence of a density is obvious in both cases and so 2.2.2 holds.
Example 2.4.16. The uniformly distributed random variable has
\[ f(x) = \mathbb{1}_{[0,1]}(x), \qquad F(x) = x\, \mathbb{1}_{[0,1]}(x) + \mathbb{1}_{(1,\infty)}(x) \]
\[ h(x) \overset{x \in [0,1)}{=} \frac{1}{1 - x} \xrightarrow{(x \to 1)} \infty = L_C(h) \]
Starting from the hazard function:
Example 2.4.17 (Affine hazard). If $h(x) = x$ then $L_C(h) = \infty$ and $F^c(x) = \exp\{-\frac{x^2}{2}\}$. More generally for $a, b > 0$ define $h_{a,b}(x) := a x + b$.
Generalising this to polynomial hazards $h(x) = (ax)^k + b$ with $k \ge 1$ we arrive at Weibull distributions.
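The Cesaro means claimed in the examples can be approximated numerically at a large but finite argument. The following is a minimal sketch, assuming a midpoint rule and the illustrative parameters shown; `cesaro_mean` and the hazard helpers are hypothetical names introduced for this illustration.

```python
import math


def cesaro_mean(hazard, x: float, n: int = 200_000) -> float:
    """Approximate (1/x) * integral_0^x hazard(s) ds with the midpoint rule."""
    step = x / n
    return sum(hazard((i + 0.5) * step) for i in range(n)) * step / x


def oscillating_hazard(s: float) -> float:
    """h(s) = 1 + sin(s) as in example 2.4.13."""
    return 1.0 + math.sin(s)


def erlang2_hazard(s: float, mu: float = 1.5) -> float:
    """Hazard of the Erlang distribution E_2(mu): f / F^c = mu^2 s / (1 + mu s)."""
    return mu * mu * s / (1.0 + mu * s)


def affine_hazard(s: float) -> float:
    """h(s) = s as in example 2.4.17."""
    return s


print(cesaro_mean(oscillating_hazard, 1000.0))  # close to L_C(h) = 1
print(cesaro_mean(erlang2_hazard, 1000.0))      # approaches mu = 1.5 from below
print(cesaro_mean(affine_hazard, 100.0))        # equals x/2, so it grows without bound in x
```

For the Erlang case the approach to $\mu$ is only logarithmically slow in the integrated hazard, since $H(x) = \mu x - \log(1 + \mu x)$ for $k = 2$.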
2.5 Coupling
This section is on coupling inter event times.
Definition 2.5.1 (Coupled inter event times).Two inter event times are
coupled if they have a joint distribution.
Any two inter event times σ1, σ2have a joint distribution as a tuple of
independent random variables. Another possibility to construct a joint dis-
tribution while keeping individual (=marginal) distributions of σ1, σ2is the
quantile coupling. It is nicely explained in section 3 of chapter 1 in the book
of Hermann Thorisson [24].
Definition 2.5.2 (Quantile coupling). If inter event times $\sigma_1, \sigma_2$ have respective distribution functions $F_1, F_2$ and $U$ is uniformly distributed on $(0, 1)$ then $(F_1^{-1}(U), F_2^{-1}(U))$ is the quantile coupling of $(\sigma_1, \sigma_2)$.
It is obvious that $(F_1^{-1}(U), F_2^{-1}(U))$ are coupled. Furthermore the association of $(F_1^{-1}(U), F_2^{-1}(U))$ with $(\sigma_1, \sigma_2)$ lies in the fact that $\mathcal{L}(F_i^{-1}(U)) = \mathcal{L}(\sigma_i)$ for $i = 1, 2$:
\[ P(F_1^{-1}(U) \le t) = P(U \le F_1(t)) = F_1(t) = P(\sigma_1 \le t). \tag{2.6} \]
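The quantile coupling can be sketched in a few lines: both coupled variables are read off from the same uniform draw. The example assumes exponential marginals, whose inverse distribution function is available in closed form; `exp_quantile`, `quantile_coupling` and the rates 1 and 2 are illustrative choices.

```python
import math
import random


def exp_quantile(rate: float):
    """Return the inverse distribution function u -> F^{-1}(u) of Exp(rate)."""
    return lambda u: -math.log1p(-u) / rate


def quantile_coupling(inv_cdf_1, inv_cdf_2, u: float):
    """Evaluate both inverse distribution functions at the same uniform value u."""
    return inv_cdf_1(u), inv_cdf_2(u)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
pairs = [
    quantile_coupling(exp_quantile(1.0), exp_quantile(2.0), rng.random())
    for _ in range(5)
]
for s1, s2 in pairs:
    print(s1, s2)  # under this coupling s1 is always twice s2
```

Each coordinate keeps its own marginal distribution by (2.6), while the pair is as dependent as possible: a large value of one forces a large value of the other.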
A joint distribution is required if we want to consider a function of two ran-
dom variables, for example their difference. This is required in the following
definition of exponential equivalence.
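Definition 2.5.3 (Exponential equivalence in $\mathbb{R}$). The sequences of real valued random variables $(Y_n; n \in \mathbb{N})$ and $(Z_n; n \in \mathbb{N})$ are exponentially equivalent if for each $n \in \mathbb{N}$ there is a coupling $(\check Y_n, \check Z_n)$ of $(Y_n, Z_n)$ such that the sequence $(|\check Y_n - \check Z_n|; n \in \mathbb{N})$ decays super exponentially: For any $\delta > 0$
\[ \lim_{n\to\infty} \frac{1}{n} \log P(|\check Y_n - \check Z_n| > \delta) = -\infty. \]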
This definition is a special case of the more general definition 4.2.10 of
[5]. And the importance of this definition is in theorem 4.2.13 of [5] which
tells us that if one of two exponentially equivalent sequences satisfies a large
deviation principle with a good rate function then the other does too, and
with the same rate function.
We introduce a new property for the inter event times.
Definition 2.5.4. An inter event time is LD-bounded if its lmgf is finite on all of $\mathbb{R}$.
An alternative definition is that an inter event time is LD-bounded if its hazard function satisfies $L_C(h) = \infty$, by 2.4.6.
Thus any bounded inter event time will be LD-bounded. In this section
we connect LD-boundedness to exponential equivalence.
Very generally we can scale a single inter event time τand investigate its
large deviation behaviour. Doing so, any nice non-negative random variable
falls in one of two classes: It’s either exponentially equivalent to 0 or to an
exponentially distributed random variable with the correct parameter.
The correct parameter for the exponential distribution is the Cesaro mean
of the hazard function LC(h) with hthe hazard function of τ.
In the following we prove the claimed exponential equivalence separately
for the LD-bounded and the non-LD bounded inter event times.
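Claim 2.5.5. If the inter event time $\sigma$ is LD-bounded then the sequences $(\frac{\sigma}{n}; n \in \mathbb{N})$ and $(0; n \in \mathbb{N})$ are exponentially equivalent.
As a shorter way to express 2.5.5 we might say that $\sigma$ and 0 are exponentially equivalent.
Proof of 2.5.5: Let $L_C(h)$ be the Cesaro mean of the hazard rate of $\sigma$. We generally assumed that $L_C(h)$ exists in $(0, \infty]$ (cf. 2.4.2) and for LD-bounded $\sigma$ we have $L_C(h) = \infty$ by 2.4.6. Thus
\[
\begin{aligned}
\frac{1}{n} \log P\Big(\frac{1}{n}|\sigma - 0| > x\Big) &= \frac{1}{n} \log F^c(nx) = \frac{1}{n} \log \exp\Big\{-\int_{s=0}^{nx} h(s)\, ds\Big\} \\
&= -x\, \underbrace{\frac{1}{x n} \int_{s=0}^{nx} h(s)\, ds}_{\to L_C(h) = \infty} \xrightarrow{n \to \infty} -\infty
\end{aligned}
\]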
2.5.5
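The dichotomy behind 2.5.5 is visible directly in the scaled log tail $\frac{1}{n}\log F^c(nx) = -\frac{H(nx)}{n}$. The sketch below compares an LD-bounded example (affine hazard, example 2.4.17) with an exponential one (example 2.4.10); the function names and the rate 2 are illustrative assumptions.

```python
def scaled_log_tail(integrated_hazard, n: int, x: float) -> float:
    """(1/n) * log P(sigma / n > x) = -H(n * x) / n, using F^c = exp(-H)."""
    return -integrated_hazard(n * x) / n


def integrated_affine(s: float) -> float:
    """H(s) = s^2 / 2 for the hazard h(s) = s, where L_C(h) is infinite."""
    return 0.5 * s * s


def integrated_exponential(s: float) -> float:
    """H(s) = 2 s for the constant hazard h = 2, where L_C(h) = 2."""
    return 2.0 * s


for n in (10, 100, 1000):
    # The first value decreases without bound in n, the second stays at -2.
    print(n, scaled_log_tail(integrated_affine, n, 1.0),
          scaled_log_tail(integrated_exponential, n, 1.0))
```

In the LD-bounded case the scaled log tail tends to $-\infty$, which is the super exponential decay required by definition 2.5.3, while in the exponential case it stays at $-L_C(h)\,x$.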
This immediately implies that any two LD-bounded random variables are
exponentially equivalent.
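Claim 2.5.6. If $\sigma$ is an inter event time with distribution function $F$ and $L_C(h) \in (0, \infty)$ and if $X$ is exponentially distributed with parameter $L_C(h)$ then the sequences $(\frac{\sigma}{n}; n \in \mathbb{N})$ and $(\frac{X}{n}; n \in \mathbb{N})$ are exponentially equivalent.
Again, as a shorter way to express 2.5.6 we might say that if $\sigma$ and exponential $X$ have the same finite Cesaro mean for their respective hazard functions then they are exponentially equivalent.
Proof of 2.5.6: Let $G$ be the distribution function of the exponentially distributed $X$: $G(x) = 1 - e^{-L_C(h)x}$, and assume that $X, \sigma$ are already quantile-coupled, that is, there is $U$ uniform on $(0, 1)$ such that $X = G^{-1}(U)$ and $\sigma = F^{-1}(U)$.
We first observe that a large value for $\sigma$ implies a large value for $X$. Let $0 < s < t$.
\[
\begin{aligned}
P(\sigma > nt,\, X \le ns) &= P(U > F(nt),\, U < G(ns)) \\
&= P(U \in [F(nt), G(ns)]) \\
&= \big(G(ns) - F(nt)\big)\, \mathbb{1}_{F(nt) < G(ns)}
\end{aligned}
\]
Let $\epsilon > 0$ be small enough for $\frac{t}{s} > \frac{L_C(h)}{L_C(h) - \epsilon}$ to hold and $n$ large enough for $\frac{H(nt)}{nt} > L_C(h) - \epsilon$ to hold.
\[
\begin{aligned}
\frac{t}{s} > \frac{L_C(h)}{L_C(h) - \epsilon} &\;\Leftrightarrow\; nt\,(L_C(h) - \epsilon) > ns\, L_C(h) \\
&\;\Rightarrow\; H(nt) > ns\, L_C(h) \\
&\;\Leftrightarrow\; e^{-H(nt)} < e^{-ns L_C(h)} \\
&\;\Leftrightarrow\; F^c(nt) < G^c(ns)
\end{aligned}
\]
Thus for $n$ large enough
\[ P(\sigma > nt,\, X \le ns) = 0 \]
and the same works very similarly for $P(X > nt,\, \sigma < ns)$. Put another way we have for $n$ large enough (depending on $t$ and $s$)
\[ P(\sigma > nt) = P(\sigma > nt,\, X \ge ns) \quad \text{or} \quad P(X \ge ns \mid \sigma > nt) = 1. \tag{2.7} \]
We apply (2.7) in the following, which will show the super exponential decay of $\sigma - X$:
\[
\{\sigma - X > na,\ \sigma > 0,\ X > 0\}
\;\Leftrightarrow\;
\{\sigma - X > na,\ \sigma > na,\ X > 0\}
\;\overset{(2.7)}{\Leftrightarrow}\;
\{\sigma - X > na,\ \sigma > na,\ X > n(a - \epsilon)\}
\]
We see that for two positive random variables to be large and their difference to be large, too, the minuend has to be even larger. With the help of $\sigma - X > na \Rightarrow \sigma > na + X > n(2a - \epsilon)$ the equivalence
\[
\{\sigma - X > na,\ \sigma > na,\ X > n(a - \epsilon)\}
\;\Leftrightarrow\;
\{\sigma - X > na,\ \sigma > n(2a - \epsilon),\ X > n(a - \epsilon)\}
\]
is easy to see. We can iterate and again use that given $\sigma$ is large, $X$ will almost surely be similarly large for $n$ large enough.
\[
\{\sigma - X > na,\ \sigma > n(2a - \epsilon),\ X > n(a - \epsilon)\}
\;\Leftrightarrow\; \cdots \;\Leftrightarrow\;
\{\sigma - X > na,\ \sigma > n((k+1)a - k\epsilon),\ X > k\,n\,(a - \epsilon)\}
\]
Thus
\[
\begin{aligned}
P(\sigma - X > na) &= P\big(\sigma - X > na,\ \sigma > n((k+1)a - k\epsilon),\ X > nk(a - \epsilon)\big) \\
&\le P\big(X > nk(a - \epsilon)\big) \\
&= e^{-L_C(h)\, n\, k\, (a - \epsilon)} \\
&\overset{\epsilon < \frac{a}{2}}{\le} e^{-L_C(h)\, n\, k\, \frac{a}{2}}
\end{aligned}
\]
and the last expression has an exponential decay rate in $n$ of $L_C(h)\, k\, \frac{a}{2}$ which can be made arbitrarily large by increasing $k$ (we need our assumption 2.4.2 here: that $L_C(h) > 0$). So the decay is faster than exponential.
Similarly for $X - \sigma$.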
2.5.6
The following is fairly general.
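Claim 2.5.7. If $\sigma_1, \sigma_2$ are inter event times with distribution functions $F_1, F_2$ and integrated hazard functions $H_i = -\log F_i^c$ (for $i = 1, 2$) with a common Cesaro limit $L_C = \lim_{x\to\infty} \frac{H_i(x)}{x}$ (for $i = 1, 2$) then the sequences $(\frac{\sigma_1}{n}; n \in \mathbb{N})$ and $(\frac{\sigma_2}{n}; n \in \mathbb{N})$ are exponentially equivalent.
Proof of 2.5.7: For $L_C < \infty$, as in 2.5.6 let $X$ be exponentially distributed with parameter $L_C$ and distribution function $G$ and assume that $\sigma_1, \sigma_2, X$ are quantile coupled, that is $\sigma_i = F_i^{-1}(U)$ (for $i = 1, 2$) and $X = G^{-1}(U)$ for some $U$ uniformly distributed on $(0, 1)$. Then
\[
\begin{aligned}
&\limsup_{n\to\infty} \frac{1}{n} \log P(|\sigma_1 - \sigma_2| > n\delta) \\
&\quad\le \limsup_{n\to\infty} \frac{1}{n} \log \Big( P\big(|\sigma_1 - X| > \tfrac{n\delta}{2}\big) + P\big(|\sigma_2 - X| > \tfrac{n\delta}{2}\big) \Big) \\
&\quad= \max\Big\{ \limsup_{n\to\infty} \frac{1}{n} \log P\big(|\sigma_1 - X| > \tfrac{n\delta}{2}\big),\ \limsup_{n\to\infty} \frac{1}{n} \log P\big(|X - \sigma_2| > \tfrac{n\delta}{2}\big) \Big\} \\
&\quad= \max\{-\infty, -\infty\} = -\infty
\end{aligned}
\]
While for $L_C = \infty$ we have already argued that exponential equivalence holds. We might as well do the same calculation as above replacing $X$ by 0.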
2.5.7
Corollary 2.5.8. τand ˜τare exponentially equivalent.
So far we have applied the distinction between LD-bounded and not LD-
bounded random variables in the different proofs for exponential equivalence
of inter event times in 2.5.5 and 2.5.6. We now connect them directly:
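Claim 2.5.9. If inter event times $\sigma, \tau$ have hazard functions with the same Cesaro mean then their difference under the quantile coupling is LD-bounded.
Proof of 2.5.9: For $\sigma, \tau$ themselves LD-bounded and $\theta > 0$
\[
E[e^{|\sigma - \tau|(\theta + \epsilon)}] \le E[e^{\sigma(\theta + \epsilon)}\, e^{\tau(\theta + \epsilon)}] \le E[e^{\sigma p (\theta + \epsilon)}]^{\frac{1}{p}}\, E[e^{\tau q (\theta + \epsilon)}]^{\frac{1}{q}} < \infty
\]
And if $\sigma, \tau$ are not LD-bounded: Let uniform $U$ be such that $\sigma_i = F_i^{-1}(U)$ for $i = 1, 2$ and set $X = G^{-1}(U)$ for $G$ the exponential distribution function with parameter $L_C$. We show that quantile coupled $\sigma_i$ and $X$ have an LD-bounded difference. For the difference of $X$ and $\sigma_i$ under the quantile coupling we have bounded
\[ P(\sigma_i - X > na) \le e^{-L_C\, k\, n\, \frac{a}{2}} \]
with $a > 0$ and arbitrary $k \in \mathbb{N}$ for $n$ large enough with reference to $a, k$. Let $F_i$ now be the distribution function of $|\sigma_i - X|$ and $H_i$ the integrated hazard function associated with $F_i$ (that is: $F_i^c = e^{-H_i}$). Then
\[
\log F_i^c(x) \le -L_C\, k\, \frac{x}{2} \quad \text{and} \quad \frac{H_i(x)}{x} = \frac{-\log F_i^c(x)}{x} \ge L_C\, k\, \frac{1}{2}
\]
for $x$ large enough. The limit takes care of $x$ being large enough for any fixed $k$ and
\[
\forall k \in \mathbb{N} : \lim_{x\to\infty} \frac{H_i(x)}{x} \ge L_C\, k\, \frac{1}{2} \;\Rightarrow\; \lim_{x\to\infty} \frac{H_i(x)}{x} = \infty
\]
This tells us that $E[e^{\theta|\sigma_i - X|}] < \infty$ for all $\theta \in \mathbb{R}$. For $p, q > 0$ such that $\frac{1}{p} + \frac{1}{q} = 1$ we get
\[
E[e^{\theta|\sigma_1 - \sigma_2|}] \le E[e^{\theta|\sigma_1 - X|}\, e^{\theta|\sigma_2 - X|}] \le E[e^{\theta p |\sigma_1 - X|}]^{\frac{1}{p}}\, E[e^{\theta q |\sigma_2 - X|}]^{\frac{1}{q}} < \infty
\]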
2.5.9
2.6 Fenchel-Legendre transforms
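Definition 2.6.1. Given a logarithmic moment generating function $H : \mathbb{R}^m \to \mathbb{R}$ its Fenchel-Legendre transform is defined as
\[ H^* : \mathbb{R}^m \to \mathbb{R}, \quad x \mapsto \sup_{\theta \in \mathbb{R}^m} \langle \theta, x \rangle - H(\theta). \]
$H^*$ is again convex and continuous on the interior of its domain.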
Remark 2.6.2. Sufficient criteria for a Fenchel-Legendre transform $H^*$ of $H$ to have compact level sets are
• for $m = 1$: $0 \in D(H)^\circ$, the interior of the domain
• for arbitrary $m \in \mathbb{N}$: $D(H) = \mathbb{R}^m$ or $H$ is essentially smooth and lower semicontinuous.
(cf [5] lemma 2.2.20, theorem 2.3.6). Thus the Fenchel-Legendre transforms $\Lambda^*$ and $\Gamma^*$ of lmgfs $\Lambda$ and $\Gamma$ are good rate functions.
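For an inter event time $\tau$ we have denoted its lmgf $\Lambda$. From $\Lambda$ we defined an associated $\Gamma$ (cf 2.2.7). Their Fenchel-Legendre transforms relate, too.
Claim 2.6.3. $\Gamma^*(x) = x\, \Lambda^*(\frac{1}{x})$ for $x > 0$ holds and $\Gamma^*(0) = \lim_{x \to 0} x\, \Lambda^*(\frac{1}{x})$.
Proof of 2.6.3: First let $x > 0$.
\[
\begin{aligned}
\Gamma^*(x) &= \sup_{\theta}\ \theta x - \Gamma(\theta) = \sup_{\theta}\ \theta x + \Lambda^{-1}(-\theta) \\
&= \sup_{\theta = -\Lambda(\gamma);\ \gamma \in \mathbb{R}}\ \theta x + \Lambda^{-1}(-\theta) \\
&= \sup_{\gamma \in \mathbb{R}}\ -\Lambda(\gamma)\, x + \Lambda^{-1}(\Lambda(\gamma)) \\
&= x \sup_{\gamma \in \mathbb{R}}\ -\Lambda(\gamma) + \frac{1}{x}\gamma \\
&= x\, \Lambda^*\Big(\frac{1}{x}\Big)
\end{aligned}
\]
While for $x = 0$
\[
\Gamma^*(0) = \sup_{\theta \in \mathbb{R}} -\Gamma(\theta) = \sup_{\theta \in \mathbb{R}} \Lambda^{-1}(-\theta) = \lim_{\theta \to \infty} \Lambda^{-1}(\theta) = L_C(h)
\]
If $\Gamma^*(0) = L_C(h) < \infty$ we get continuity of $\Gamma^*$ at 0 from convexity and
\[ \Gamma^*(0) = \lim_{x \to 0} \Gamma^*(x) = \lim_{x \to 0} x\, \Lambda^*\Big(\frac{1}{x}\Big) \]
Otherwise, if $\Gamma^*(0) = L_C(h) = \infty$ we get simultaneous divergence to $\infty$ from lower semicontinuity of $\Gamma^*$:
\[
\infty = \Gamma^*(0) \le \liminf_{x \to 0} \Gamma^*(x) = \liminf_{x \to 0} x\, \Lambda^*\Big(\frac{1}{x}\Big)
\;\Rightarrow\;
\infty = \Gamma^*(0) = \lim_{x \to 0} \Gamma^*(x) = \lim_{x \to 0} x\, \Lambda^*\Big(\frac{1}{x}\Big)
\]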
2.6.3
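The relation $\Gamma^*(x) = x\,\Lambda^*(\frac{1}{x})$ can be checked numerically in the exponential case, where $\Lambda(\theta) = \log\frac{\mu}{\mu - \theta}$ and hence $\Gamma(\theta) = -\Lambda^{-1}(-\theta) = \mu(e^\theta - 1)$. The sketch below is an illustration under that assumption; the ternary search, the search intervals and the names `lam`, `gam` and `legendre` are choices made for this example.

```python
import math

MU = 2.0  # rate of the exponential inter event time (illustrative choice)


def lam(theta: float) -> float:
    """Lambda(theta) for tau ~ Exp(MU); infinite outside the domain (-inf, MU)."""
    return math.log(MU / (MU - theta)) if theta < MU else math.inf


def gam(theta: float) -> float:
    """Gamma(theta) = -Lambda^{-1}(-theta), which equals MU * (e^theta - 1) here."""
    return MU * math.expm1(theta)


def legendre(func, x: float, lo: float, hi: float, iters: int = 200) -> float:
    """sup over theta in [lo, hi] of theta * x - func(theta), by ternary search.

    The objective is concave because func is convex, so ternary search applies.
    """
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if m1 * x - func(m1) < m2 * x - func(m2):
            lo = m1
        else:
            hi = m2
    theta = 0.5 * (lo + hi)
    return theta * x - func(theta)


for x in (0.5, 1.0, 3.0):
    gamma_star = legendre(gam, x, -50.0, 50.0)
    lambda_star = legendre(lam, 1.0 / x, -50.0, MU - 1e-9)
    print(x, gamma_star, x * lambda_star)  # the last two columns should agree
```

For this example both sides equal $x \log\frac{x}{\mu} - x + \mu$, the rate function of a Poisson process with rate $\mu$.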
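Corollary 2.6.4.
• $\Gamma^*(0) = \lim_{x \to 0} \Gamma^*(x)$
• $\Gamma^*(0) = L_C(h)$ and the domain of $\Gamma^*$ contains its left boundary point $\{0\}$ iff $L_C(h) < \infty$, that is iff $\tau$ is not LD-bounded.
Similarly we are interested in how $\Gamma^*$ behaves for large arguments.
Claim 2.6.5. $\Gamma^*$ behaves superlinearly at infinity.
Proof of 2.6.5: We see the correspondence of unboundedness from below of $\Lambda$ (left) and superlinearity of $\Gamma^*$ (right).
\[
-\lim_{\theta \to -\infty} \Lambda(\theta) = \Lambda^*(0) = \lim_{x\to\infty} \Lambda^*\Big(\frac{1}{x}\Big) = \lim_{x\to\infty} \frac{x}{x}\, \Lambda^*\Big(\frac{1}{x}\Big) = \lim_{x\to\infty} \frac{\Gamma^*(x)}{x}
\]
where $\lim_{x\to\infty} \Lambda^*(\frac{1}{x}) = \Lambda^*(0)$ follows from lower semicontinuity and $\Lambda^*(0) = \infty$.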
2.6.5
2.7 Extreme twists
What happens as the twist parameter $\theta$ for an inter event time tends to $\pm\infty$ or, if the domain of $\Lambda$ is bounded, to $L_C(h)$? With the assumption 2.2.2 we have interpreted the derivative of the lmgf as the twisted mean ($\Lambda'(\theta) = E^{(\theta)}[\tau]$) and so the range of $\Lambda'$ is the interval of attainable means under exponential twists. We expect the inter event time under the twisted distribution to tend to the essential infimum and the essential supremum of the untwisted inter event time as the twist parameter tends to $-\infty$ and $+\infty$ / $L_C(h)$ respectively.
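Claim 2.7.1. If $F(0) = 0$ and there is some $k \in \mathbb{N}$ such that $f^{(0)}(0) = \cdots = f^{(k-1)}(0) = 0 < f^{(k)}(0)$ then $\lim_{\theta\to\infty} \Lambda'(-\theta) = 0$.
Proof of 2.7.1: First for $f(0) > 0$.
\[
\theta\, E[e^{-\theta\tau}] = \theta \int_{x=0}^{\infty} e^{-\theta x} f(x)\, dx = \underbrace{\theta^2 \int_{x=0}^{\infty} e^{-\theta x} F(x)\, dx}_{(1)} \overset{y = \theta x}{=} \int_{y=0}^{\infty} e^{-y} F\Big(\frac{y}{\theta}\Big) \frac{\theta^2}{\theta}\, dy
\]
\[
\lim_{\theta\to\infty} \theta\, E[e^{-\theta\tau}] = \lim_{\theta\to\infty} \int_{y=0}^{\infty} e^{-y}\, \theta\, F\Big(\frac{y}{\theta}\Big)\, dy = \int_{y=0}^{\infty} e^{-y}\, y \lim_{\theta\to\infty} \frac{\theta}{y} F\Big(\frac{y}{\theta}\Big)\, dy = f(0)
\]
and
\[
\begin{aligned}
E[\tau e^{-\theta\tau}] &= \int_{x=0}^{\infty} x\, e^{-\theta x} f(x)\, dx \\
&= \Big[ x\, e^{-\theta x} F(x) \Big]_{x=0}^{\infty} - \int_{x=0}^{\infty} \big( e^{-\theta x} - x\, \theta\, e^{-\theta x} \big) F(x)\, dx \\
&= 0 - 0 - \underbrace{\int_{x=0}^{\infty} e^{-\theta x} F(x)\, dx}_{\text{see above}} + \theta \int_{x=0}^{\infty} x\, e^{-\theta x} F(x)\, dx
\end{aligned}
\]
and
\[
\theta^2 \int_{x=0}^{\infty} x\, e^{-\theta x} f(x)\, dx = -\underbrace{\theta^2 \int_{x=0}^{\infty} e^{-\theta x} F(x)\, dx}_{\to f(0) \text{ by (1) above}} + \underbrace{\theta^3 \int_{x=0}^{\infty} x\, e^{-\theta x} F(x)\, dx}_{\to 2 f(0) \text{ by (2) below}} \;\to\; -f(0) + 2 f(0) = f(0)
\]
\[
\begin{aligned}
(2): \quad \lim_{\theta\to\infty} \theta^3 \int_{x=0}^{\infty} x\, e^{-\theta x} F(x)\, dx &\overset{y = \theta x}{=} \lim_{\theta\to\infty} \theta^3 \int_{y=0}^{\infty} \frac{y}{\theta}\, e^{-y} F\Big(\frac{y}{\theta}\Big) \frac{1}{\theta}\, dy \\
&= \int_{y=0}^{\infty} y^2 e^{-y} \lim_{\theta\to\infty} \frac{\theta}{y} F\Big(\frac{y}{\theta}\Big)\, dy \\
&= f(0) \int_{y=0}^{\infty} y^2 e^{-y}\, dy = 2 f(0)
\end{aligned}
\]
All this we apply in
\[
\lim_{\theta\to\infty} \theta\, \Lambda'(-\theta) = \frac{\lim_{\theta\to\infty} \theta^2 E[\tau e^{-\theta\tau}]}{\lim_{\theta\to\infty} \theta\, E[e^{-\theta\tau}]} = \frac{f(0)}{f(0)} = 1
\]
If $f(0) = 0$ but $f'(0) > 0$ then we do a similar calculation with $\lim_{\epsilon \to 0} \frac{F(\epsilon)}{\epsilon^2} = \frac{1}{2} f'(0) > 0$ instead of the above $\lim_{\epsilon \to 0} \frac{F(\epsilon)}{\epsilon} = f(0) > 0$. This works generally as claimed.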
2.7.1
This claim is a generalisation of an assumption in [16], theorem 2.2. It is about being able to twist $\tau$ to have positive mean values as small as we like, since we identified $\Lambda'$ with the expectation under the twisted distribution.
Corollary 2.7.2. If an inter event time $\tau$ is bounded with least upper bound $b > 0$, no point mass on 0 and density $f$ such that there is some $k \in \mathbb{N}$ such that $f^{(0)}(b) = \cdots = f^{(k-1)}(b) = 0 < f^{(k)}(b)$ then $\lim_{\theta\to\infty} \Lambda'(\theta) = b$.
This is by considering $b - \tau$ as an inter event time and applying 2.7.1.
We note that we don't really need the density on the whole support but just in a neighbourhood of 0 and - for bounded $\tau$ - on the least upper bound.
Claim 2.7.3. For $\tau$ with support $[0, b]$ we have $\Lambda'(\mathbb{R}) = (0, b)$ under assumption 2.7.1 for the density at both 0 and $b$.
Proof of 2.7.3: Openness of $\Lambda'(\mathbb{R}) = (0, b)$ remains to be proved. Assume the contrary and let $\theta_0 \in \mathbb{R}$ be such that $\Lambda'(\theta_0) = b$. Then for $\theta > \theta_0$ we'd have $\Lambda'(\theta) = b$ since $\Lambda'$ cannot decrease due to convexity and not increase due to $\lim_{\theta\to\infty} \Lambda'(\theta) = b$.
2.7.3
Also we can argue for unbounded τ.
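Claim 2.7.4. If $\tau$ satisfies the if part of 2.7.1 and is unbounded but LD-bounded, then $\Lambda'(\mathbb{R}) = \mathbb{R}_{>0}$.
Proof of 2.7.4: Fix some $M > 0$ and let $\theta > 0$.
\[
\begin{aligned}
\Lambda'(\theta) > \frac{\Lambda(\theta)}{\theta} &\ge \frac{\log\big( \int_{s=0}^{M} e^{\theta s}\, dF(s) + e^{\theta M} F^c(M) \big)}{\theta} \\
&\ge \frac{\log\big( e^{\theta M} F^c(M) \big)}{\theta} = \frac{\theta M - \int_{s=0}^{M} h(s)\, ds}{\theta} \\
&= M - \frac{1}{\theta} \underbrace{\int_{s=0}^{M} h(s)\, ds}_{= -\log F^c(M)}
\end{aligned}
\]
But for unbounded $\tau$ we have $F^c > 0$ always and thus $-\log F^c(M) < \infty$. Therefore
\[ \lim_{\theta\to\infty} \Lambda'(\theta) \ge M \]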
2.7.4
For $D(\Lambda) = (-\infty, L_C(h))$ with $L_C(h) < \infty$ we have already seen that $\Lambda$ can't be bounded and that $\Lambda$ is essentially smooth / steep. With 2.7.1 we again get $\Lambda'(D(\Lambda)) = \mathbb{R}_{>0}$.
2.8 Generalisations
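Up to now we have assumed that inter event times could be arbitrarily small. We now investigate the more general inter event times $\tau \in (a, b)$ for $a \ge 0$ and $b \le \infty$.
Claim 2.8.1. If $\tau \in (a, b)$ with $a \ge 0$ and $b \le \infty$ and $\Lambda'(D(\Lambda)) = (a, b)$ then $D(\Lambda^*) \subset [a, b]$ and for $x \in (a, b)$ an optimising $\theta = \theta(x) \in \mathbb{R}$ exists such that
\[ \Lambda^*(x) = \theta x - \Lambda(\theta), \qquad \theta = (\Lambda')^{-1}(x). \]
From the assumption of no point mass on boundary points we get $D(\Lambda^*) = (a, b)$.
Proof of 2.8.1:
\[ \log E[e^{\theta\tau}] = \theta a + \log E[e^{\theta(\tau - a)}] \]
\[ \Lambda^*(a) = \sup_{\theta \in \mathbb{R}}\ \theta a - \log E[e^{\theta\tau}] = \sup_{\theta \in \mathbb{R}}\ -\log E[e^{\theta(\tau - a)}] \]
and thus $\Lambda^*(a) = \infty$ if $\tau$ has no point mass on $a$. Similarly $\Lambda^*(b) = \infty$.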
2.8.1
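Remark 2.8.2. This affects the associated $\Gamma^*$ in the following way:
• If $a > 0$, $b < \infty$ then $D(\Gamma^*) = (\frac{1}{b}, \frac{1}{a})$.
• If $a = 0$, $b < \infty$ then $D(\Gamma^*) = (\frac{1}{b}, \infty)$.
• If $a > 0$, $b = \infty$ then $D(\Gamma^*) = (0, \frac{1}{a})$ or $D(\Gamma^*) = [0, \frac{1}{a})$ depending on $L_C(h)$.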
And D) = [0,1
a)has limx0Γ∗′(x) = .
Chapter 3
The counting process
In this chapter we move from inter event times to renewal counting processes
that we will define through these inter event times. For exponentially dis-
tributed inter event times the resulting renewal counting process will be the
Markovian Poisson process. For the more general inter event times of chap-
ter 2 we will obtain non-Markovian processes. In this chapter we will argue
that many implications of the Markov-property that are desirable when de-
veloping a large deviation principle on the process level can be obtained for
renewal counting processes as well.
The Poisson process as a Markovian renewal counting process has station-
ary increments starting at any fixed moment. This does not hold true for
the general renewal counting process. For the general case of a counting
process with iid inter event times we show how to construct the associated
process with stationary increments in section 3.2: Choosing a different first
inter event time will be enough and we will make the definition of renewal
counting process such as to include these kinds of processes. In section 3.3
we calculate the lmgf of the general renewal counting process and we find
that it relates to the lmgf of the inter event times as $\Gamma(\theta) = -\Lambda^{-1}(-\theta)$. This
has been proved in 1994 by Peter Glynn and Ward Whitt for more general
counting processes and by other means (cf [10]).
The fourth section is about exponential equivalence of different kinds of
counting processes: A Markovian renewal counting process has a first in-
ter event time with the same distribution as all following inter event times
and has stationary increments. For a general renewal counting process these
are mutually exclusive properties. Luckily, in terms of large deviations the
process with stationary increments and the process with the first and all
following inter event times identically distributed are indistinguishable. In a
similar way we have independence of increments over disjoint intervals for the
Markovian counting process while this does not hold for the general renewal
counting process. We will construct a process that does have independent
increments when observed over finitely many, fixed, disjoint intervals. We say
the process restarts at the beginning of each interval. The restarted process
will be indistinguishable from the general renewal counting process in terms
of large deviations.
The exponential change of measure is classical in large deviation theory. It
is central to the proof of the large deviation principle for Jackson networks
of Irina Ignatiouk-Robert [12]. Preparing for a similar approach we develop
the exponential change of measure for the single (undelayed) renewal count-
ing process of general inter event times in the sixth section. We derive the
change of measure for the counting process from exponentially twisting its
inter event times. We will see that this does not change the process’ renewal
property and we stay in the same class of processes. This way we know about
the process’ expected behaviour under the changed measure.
Before starting the first section we remind the reader of assumptions and some notation from the previous chapter:
• Inter event times are denoted by $\tau$ and variations of it, with distribution function $F$, density $f$ (wrt Lebesgue measure) and lmgf $\Lambda$ with an open domain (cf 2.2.2, 2.2.13).
• For a non-deterministic inter event time $\tau$ the hazard function is denoted $h$ and existence of a positive (possibly infinite) Cesaro mean $L_C(h)$ is assumed (cf 2.4.2).
• All $\Lambda$ and $\Gamma$ are strictly convex as soon as the associated inter event time is not deterministic.
• For the domain of $\Lambda$ we have $D(\Lambda) = (-\infty, L_C(h))$ (cf 2.4.6) and if $L_C(h) = \infty$ we say that $\tau$ is LD-bounded (cf 2.5.4) and that $F^c$ decays super exponentially (cf 2.4.7).
3.1 Introducing the counting process
With a sequence of inter event times $\tau_1, \tau_2, \ldots$ we associate a process that for every $t \ge 0$ counts how many events have happened in $[0, t]$. This process should at $t = 0$ take the value 0 and increase by 1 at each occurrence of an event. Since events are separated by inter event times $\tau_1, \tau_2, \ldots$ events happen at $\tau_1,\ \tau_1 + \tau_2,\ \sum_{l=1}^{3} \tau_l, \ldots$.
Definition 3.1.1 (Counting process). Let $\tau_1, \tau_2, \ldots$ be inter event times. Then the associated counting process $N$ is defined as
\[ N_t \le k \;\Leftrightarrow\; t < \sum_{l=1}^{k+1} \tau_l \]
for any $t \in \mathbb{R}_{\ge 0}$.
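Claim 3.1.2. $N$ is piecewise constant and jumps at $\sum_{l=1}^{k} \tau_l$; $k \in \mathbb{N}$.
Proof of 3.1.2: Writing $S_k = \sum_{l=1}^{k} \tau_l$ for the partial sums,
\[
N(t) = k \;\Leftrightarrow\; \begin{cases} N(t) \le k \\ N(t) \not\le k - 1 \end{cases} \;\Leftrightarrow\; \begin{cases} S_{k+1} > t \\ S_k \le t \end{cases} \;\Leftrightarrow\; t \in [S_k, S_{k+1})
\]
3.1.2
Claim 3.1.3. If $\tau_1, \tau_2, \ldots$ are independent and fall under assumption 2.2.2 then all jump-sizes of $N$ are equal to 1 and $N(t) < \infty$ for any $t \in \mathbb{R}_{\ge 0}$.
Proof of 3.1.3: Jump sizes are equal to 1 because intervals $[S_k, S_{k+1})$ are a.s. not degenerate due to $P(\tau_k = 0) = 0$ (which is necessary for 2.2.2 to hold). For finiteness:
\[
N(t) = \infty \;\Leftrightarrow\; N(t) \le k \text{ for no } k \in \mathbb{N} \;\Leftrightarrow\; t < S_{k+1} \text{ for no } k \in \mathbb{N} \;\Leftrightarrow\; t \ge S_{k+1}\ \forall k \in \mathbb{N}.
\]
But the partial sums process grows a.s. unboundedly.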
3.1.3
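Definition 3.1.1 and claim 3.1.2 can be illustrated by a minimal sketch that evaluates a sample path from a finite list of inter event times. The function names and the sample values are hypothetical and not taken from the text; the sketch assumes strictly positive inter event times.

```python
from bisect import bisect_right
from itertools import accumulate


def event_times(inter_event_times):
    """Partial sums S_1, S_2, ... of the inter event times."""
    return list(accumulate(inter_event_times))


def counting_process(inter_event_times, t):
    """N(t) = k exactly when S_k <= t < S_{k+1} (definition 3.1.1).

    The list of inter event times is finite, so the value is only
    meaningful for t below the next (unsimulated) event time.
    """
    return bisect_right(event_times(inter_event_times), t)


# Hypothetical inter event times; events happen at 0.5, 1.5, 1.75, 3.75.
taus = [0.5, 1.0, 0.25, 2.0]
values = [counting_process(taus, t) for t in (0.0, 0.5, 1.6, 3.0, 3.75)]
print(values)
```

The path starts at 0, is constant between partial sums and increases by 1 at each partial sum, as stated in claims 3.1.2 and 3.1.3.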
Definition 3.1.4 (Renewal counting process, rcp). If the inter event times $\tau_1, \tau_2, \ldots$
of a counting process $N$ are independent and $\tau_2, \tau_3, \ldots$ are identically
distributed then $N$ is a renewal counting process or, abbreviated, a rcp. If
- $\tau_1, \tau_2$ are identically distributed then $N$ is an undelayed rcp.
- $\tau_1, \tau_2$ are not identically distributed then $N$ is a delayed rcp.
For a rcp we say that $\tau_1$ is the initial inter event time and $\tau_2$ is a typical
inter event time.
Of the delayed renewal counting processes one is particularly important.
Definition 3.1.5 ($\tilde N$). The delayed renewal counting process with typical
inter event time distribution $F$ and initial inter event time distribution $\tilde F$,
the tilde-transform of $F$ defined in 2.1.2, is denoted $\tilde N$.
Figure 3.1: A sequence of inter event times and the associated counting process
The sequence of inter event times and the associated counting process
contain the same information. We give a graphical representation of both in
figure 3.1.
We want to describe the distribution of the discrete valued delayed renewal
counting process $N$. For a convenient way to describe the mass function
we make the following
Definition 3.1.6 (Convolution).
- For two density functions $f$ and $g$ on $\mathbb{R}_{\ge 0}$ their convolution $f * g$ is defined as $f * g(t) = \int_{s=0}^{t} f(t-s)\, g(s)\, ds$.
- For two distribution functions $F$ and $G$ on $\mathbb{R}_{\ge 0}$ their convolution $F * G$ is defined as $F * G(t) = \int_{s=0}^{t} F(t-s)\, dG(s)$.
- For $F : \mathbb{R}_{\ge 0} \to \mathbb{R}_{\ge 0}$ not necessarily a distribution function the convolution is again $F * G(t) = \int_{s=0}^{t} F(t-s)\, dG(s)$.
Convoluting $f$ or $F$ with itself we write $f * f = f^{*2}$ and $F * F = F^{*2}$.
More generally $f^{*k}$ is the $k$-fold convolution recursively defined for $k \ge 2$
and for a uniform notation we write $f^{*1} = f$ and $f^{*0} = \delta_0$. Analogously for
$F$ and $F^{*k}$ with $F^{*0} = 1_{[0,\infty)}$.
Defined this way the convolutions of densities and distributions match:
$\int_{s=0}^{t} f * g(s)\, ds = F * G(t)$.
Remark 3.1.7. Convolutions of densities and distribution functions describe
densities and distribution functions of sums of independent inter event times:
e.g. if $\tau, \sigma$ are independent and have densities $f$ and $g$ then $\tau + \sigma$ has density
$f * g$.
Claim 3.1.8 (Mass function for delayed rcp). If $N$ is a delayed rcp with
initial inter event time distribution $G$ and typical inter event time distribution
$F$ then
\[ P(N_t = k) = \begin{cases} G^c(t) & \text{if } k = 0 \\ F^c * G * F^{*(k-1)}(t) & \text{if } k \ge 1. \end{cases} \]
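Proof of 3.1.8: Let $\tau_1, \tau_2, \ldots$ be the inter event times of $N$, $\tau_1$ with distribution
function $G$ and $\tau_2, \tau_3, \ldots$ each with distribution function $F$. For $k \ge 1$
the distribution function of $S_k = \sum_{l=1}^{k} \tau_l$ is the convolution $G * F^{*(k-1)}$. We
calculate the mass function:
\[
\begin{aligned}
P(N_t = 0) &= P(\tau_1 > t) = G^c(t) \\
P(N_t = k) &= \int_{s=0}^{t} P(N_t = k \mid S_k = s)\, d\,G * F^{*(k-1)}(s) \\
&= \int_{s=0}^{t} P(\tau_{k+1} > t - s)\, d\,G * F^{*(k-1)}(s) \\
&= F^c * G * F^{*(k-1)}(t) \qquad (3.1)
\end{aligned}
\]
and for the last line we need to allow the convolution of $F^c$, which is not a
distribution function.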
3.1.8
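Corollary 3.1.9. For $G = F$ we get the following mass function for the
undelayed counting process
\[ P(N_t = k) = F^c * F^{*k}(t) = E[\tau]\, \tilde f * f^{*k}(t) \qquad (k \in \mathbb{N}) \]
and $\sum_{k=0}^{\infty} F^c * F^{*k}(t) = 1$ or $\sum_{k=0}^{\infty} \tilde f * f^{*k}(t) = \frac{1}{E[\tau]}$.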
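As a sanity check of the mass function in corollary 3.1.9, the following sketch evaluates $F^c * F^{*k}(t)$ numerically for exponential inter event times, where the undelayed rcp is a Poisson process and the mass function must equal the Poisson probabilities. The exponential case, the rate, the time point and the midpoint rule are assumptions chosen for illustration only.

```python
import math


def erlang_density(k, rate, s):
    """Density f^{*k} of the sum of k iid exponential(rate) inter event times."""
    return rate**k * s**(k - 1) * math.exp(-rate * s) / math.factorial(k - 1)


def mass_function(k, rate, t, steps=2000):
    """P(N_t = k) = F^c * F^{*k}(t), integrated with the midpoint rule."""
    if k == 0:
        return math.exp(-rate * t)          # F^c(t)
    h = t / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        total += math.exp(-rate * (t - s)) * erlang_density(k, rate, s)
    return total * h


def poisson_pmf(k, mean):
    return math.exp(-mean) * mean**k / math.factorial(k)


rate, t = 2.0, 1.5                          # hypothetical parameters
max_error = max(
    abs(mass_function(k, rate, t) - poisson_pmf(k, rate * t)) for k in range(8)
)
print(max_error)
```

The maximal deviation is at the level of the quadrature error, which is consistent with the corollary in this special case.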
We have introduced the exponential transformation of distribution functions
in definition 2.3.1 and we want to know how this relates to convolutions.
Claim 3.1.10. Let $\Lambda$ be the lmgf of the inter event time $\tau$ with density $f$ and
distribution function $F$. If $k \ge 1$ then
(1) $\log \int_{x=0}^{\infty} e^{\beta x}\, dF^{*k}(x) = k \Lambda(\beta)$;
(2) $(f^{*k})_\beta = (f_\beta)^{*k}$ or equivalently $(F^{*k})_\beta = (F_\beta)^{*k}$.
Remark 3.1.11. Claim 3.1.10 and the definition of the exponential
transform 2.3.1 imply that $(f^{*k})_\beta(x) = e^{\beta x - k\Lambda(\beta)} f^{*k}(x)$ or equivalently
$(F^{*k})_\beta(x) = \int_{s=0}^{x} e^{\beta s - k\Lambda(\beta)}\, dF^{*k}(s)$.
As a consequence we can omit parentheses: $F^{*k}_\beta = (F_\beta)^{*k} = (F^{*k})_\beta$.
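Proof of 3.1.10: (1) is clear from $F^{*k}$ being the distribution function of
$\sum_{l=1}^{k} \tau_l$ for iid $\tau_1, \ldots, \tau_k$. We prove (2) inductively for the densities. Initially
for $k = 1$
\[ (f_\beta)^{*1}(x) = f_\beta(x) = (f^{*1})_\beta(x). \]
And generally for $k \ge 1$: If (2) holds for $k$ and (1) generally holds then (2)
holds for $k+1$, too.
\[
\begin{aligned}
(f_\beta)^{*(k+1)}(x) = (f_\beta)^{*k} * f_\beta(x) &= \int_{s=0}^{x} (f_\beta)^{*k}(x-s)\, f_\beta(s)\, ds \\
&\overset{(2)}{=} \int_{s=0}^{x} (f^{*k})_\beta(x-s)\, f_\beta(s)\, ds \\
&\overset{(1)}{=} \int_{s=0}^{x} f^{*k}(x-s)\, e^{\beta(x-s) - k\Lambda(\beta)}\, f(s)\, e^{\beta s - \Lambda(\beta)}\, ds \\
&= \int_{s=0}^{x} f^{*k}(x-s)\, f(s)\, ds \;\, e^{\beta x - (k+1)\Lambda(\beta)} \\
&\overset{(1)}{=} (f^{*(k+1)})_\beta(x)
\end{aligned}
\]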
3.1.10
3.1.1 Joint distributions
At any fixed time $t$ it might be interesting to know how much time has passed
since the last event and how far away the next event is.
Definition 3.1.12 (Age, residual lifetime, spread). For a counting process
$N$ with associated partial sums of inter event times $S$ we define
- the age at time $t$ as $B(N, t) = t - S_{N(t)}$.
- the residual lifetime at time $t$ as $C(N, t) = S_{N(t)+1} - t$.
- the spread at time $t$ as $B(N, t) + C(N, t)$. This is the length of the inter event time covering $t$.
We sometimes abbreviate $B(N, t)$ and $C(N, t)$ by omitting $N$ and write
$B(t)$ and $C(t)$ instead.
Figure 3.2 illustrates the definition.
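The three quantities of definition 3.1.12 can be read off a list of partial sums. The following is a minimal sketch under the convention $S_0 = 0$; the function name and the sample values are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def age_residual_spread(inter_event_times, t):
    """Return (B(t), C(t), B(t) + C(t)) for the associated counting process."""
    sums = [0.0] + list(accumulate(inter_event_times))   # S_0 = 0, S_1, S_2, ...
    n = bisect_right(sums, t) - 1                        # N(t)
    if n + 1 >= len(sums):
        raise ValueError("t lies beyond the last simulated event")
    age = t - sums[n]                 # B(t) = t - S_{N(t)}
    residual = sums[n + 1] - t        # C(t) = S_{N(t)+1} - t
    return age, residual, age + residual


# Events at 0.5, 1.5, 1.75, 3.75; t = 1.0 is covered by the second inter event time.
result = age_residual_spread([0.5, 1.0, 0.25, 2.0], 1.0)
print(result)
```

The spread returned for $t = 1.0$ is the length of the second inter event time, the one covering $t$.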
Remark 3.1.13. From construction of the process we immediately get: Given
the age, the residual life is independent of the state.
Figure 3.2: Age, residual lifetime, and spread
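Claim 3.1.14. $P(C(t) \le x \mid B(t)) = F_{+B(t)}(x)$ with $F_{+\cdot}$ of definition 2.1.7.
Proof of 3.1.14: First note that
\[
\begin{aligned}
\{B(t) = a\} &\iff \{\Delta N(t-a) = 1,\; \tau_{N(t)+1} > a\} \\
\{B(t) = a,\; N(t) = k\} &\iff \{S_k = t-a,\; \tau_{k+1} > a\}
\end{aligned}
\]
(here $\Delta N(t-a) = 1$ means that $N$ jumps at $t-a$) and apply this in the following.
\[
\begin{aligned}
& P(\tau_{N(t)+1} > x + a \mid B(t) = a) \\
&= \sum_{k=0}^{\infty} P(\tau_{N(t)+1} > x + a \mid N(t) = k,\; B(t) = a)\, P(N(t) = k) \\
&= \sum_{k=0}^{\infty} P(\tau_{k+1} > x + a \mid N(t) = k,\; S_k = t-a,\; \tau_{k+1} > a)\, P(N(t) = k) \\
&= \sum_{k=0}^{\infty} P(\tau_{k+1} > x + a \mid S_k = t-a,\; \tau_{k+1} > a)\, P(N(t) = k) \\
&\overset{S_k \perp \tau_{k+1}}{=} \sum_{k=0}^{\infty} \underbrace{P(\tau_{k+1} > x + a \mid \tau_{k+1} > a)}_{= P(\tau_1 > x+a \mid \tau_1 > a)}\, P(N(t) = k) \\
&= P(\tau_1 > x + a \mid \tau_1 > a) \sum_{k=0}^{\infty} P(N(t) = k) \\
&= F^c_{+a}(x)
\end{aligned}
\]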
3.1.14
Corollary 3.1.15. By symmetry of age and residual lifetime, similarly to
3.1.14 we have: $P(B(t) \le x \mid C(t)) = F_{+C(t)}(x)$ with $F_{+\cdot}$ of definition 2.1.7.
Let $N$ be a delayed renewal counting process with initial inter event time
distribution function $G$ and typical inter event time distribution function $F$.
We give the joint distribution of state and residual lifetime.
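Claim 3.1.16. The joint mass of $(N(t), C(t))$, the state and residual lifetime
at $t$, on $\{k\} \times (s, \infty)$ for $k \in \mathbb{N}$ and $s \ge 0$ is
\[ P(N(t) = k,\; C(t) > s) = \begin{cases} G^c(t+s) & k = 0 \\ \int_{r=0}^{t} F^c(t+s-r)\, d\,G * F^{*(k-1)}(r) & k \ge 1. \end{cases} \]
The joint mass of $(N(t), B(t))$, the state and the age at $t$, on $\{k\} \times [0, s]$ for
$k \in \mathbb{N}$ and $s \ge 0$ is
\[ P(N(t) = k,\; B(t) \le s) = \begin{cases} 0 & k = 0,\; s < t \\ G^c(t) & k = 0,\; s \ge t \\ F^c * G * F^{*(k-1)}(t) - F^c * G * F^{*(k-1)}(t-s) & k > 0,\; s \le t \\ F^c * G * F^{*(k-1)}(t) & k > 0,\; s > t. \end{cases} \]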
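Proof of 3.1.16: Let $S_k = \sum_{l=1}^{k} \tau_l$.
\[
\begin{aligned}
& P(N(t) = k,\; C(t) > s) \\
&= P(S_k \le t < S_{k+1},\; S_{k+1} > t + s) \\
&= P(S_k \le t,\; \underbrace{S_{k+1}}_{= S_k + \tau_{k+1}} > t + s) \\
&= \int_{r=0}^{t} P(r \le t,\; r + \tau_{k+1} > t + s \mid S_k = r)\, d\,G * F^{*(k-1)}(r) \\
&= \int_{r=0}^{t} P(\tau_{k+1} > t + s - r)\, d\,G * F^{*(k-1)}(r) \\
&= \int_{r=0}^{t} F^c(t+s-r)\, d\,G * F^{*(k-1)}(r)
\end{aligned}
\]
Now the joint mass of state and age, first for $k = 0$:
\[ P(N(t) = 0,\; B(t) \le s) = \begin{cases} 0, & s < t \\ G^c(t), & s \ge t. \end{cases} \]
Now for the general $k \ge 1$ and $s \le t$
\[
\begin{aligned}
& P(N(t) = k,\; B(t) \le s) \\
&= P(\underbrace{N(t) = k}_{t \in [S_k,\, S_k + \tau_{k+1})},\; t - S_{N(t)} \le s) \\
&= P(S_k \in [t-s, t],\; \tau_{k+1} > t - S_k) \\
&= \int_{r=t-s}^{t} P(r \in [t-s, t],\; \tau_{k+1} > t - r \mid S_k = r)\; g * f^{*(k-1)}(r)\, dr \\
&= \int_{r=t-s}^{t} F^c(t-r)\; g * f^{*(k-1)}(r)\, dr \qquad (3.2) \\
&= \int_{r=t-s}^{t} F^c(t-r)\, d\,G * F^{*(k-1)}(r) \\
&= F^c * G * F^{*(k-1)}(t) - F^c * G * F^{*(k-1)}(t-s)
\end{aligned}
\]
whereas for $k \ge 1$, $t < s$
\[
\begin{aligned}
\{\underbrace{N(t) = k}_{\Rightarrow\, B(t) < t},\; B(t) \le s\} &= \{N(t) = k\} \\
P(N(t) = k,\; B(t) \le s) &= F^c * G * F^{*(k-1)}(t) \qquad \text{(cf. 3.1)}
\end{aligned}
\]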
3.1.16
Note that the joint distribution of state and age has a density for $k \ge 1$
fixed and $s \le t$ (easily seen from (3.2)):
\[ s \mapsto F^c(s)\; g * f^{*(k-1)}(t-s) \qquad (3.3) \]
Corollary 3.1.17. The undelayed renewal counting process with inter event
time distribution function $F$ has as joint distribution of state and age
\[ P(N(t) = k,\; B(t) \le s) = \begin{cases} 0 & k = 0,\; s < t \\ F^c(t) & k = 0,\; s \ge t \\ F^c * F^{*k}(t) - F^c * F^{*k}(t-s) & k > 0,\; s \le t \\ F^c * F^{*k}(t) & k > 0,\; s > t \end{cases} \]
with the following density for $k > 0$, $s \in [0, t]$
\[ s \mapsto \frac{d}{ds} P(N(t) = k,\; B(t) \le s) = F^c(s)\, f^{*k}(t-s). \]
3.2 Stationary increments
Consider the delayed renewal counting process $\tilde N$ defined in 3.1.5. In this
section we prove that $\tilde N$ has strictly (strongly) stationary increments.
The mass function of $\tilde N$ at fixed $t$ is derived from claim 3.1.8 as
\[ k \mapsto P(\tilde N(t) = k) = \begin{cases} \tilde F^c(t) & k = 0 \\ F^c * \tilde F * F^{*(k-1)}(t) & k \ge 1 \end{cases} \qquad (3.4) \]
The following is weak stationarity of increments.
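Claim 3.2.1. $t \mapsto E[\tilde N(t)]$ is linear with slope $\frac{1}{E[\tau]}$.
Proof of 3.2.1:
\[ E[\tilde N(t)] = \sum_{k=1}^{\infty} P(\tilde N(t) \ge k) = \sum_{k=1}^{\infty} P\Big(\tilde\tau + \sum_{l=2}^{k} \tau_l \le t\Big) = \sum_{k=1}^{\infty} \tilde F * F^{*(k-1)}(t) \]
and for the derivative
\[ \frac{d}{dt}\, \tilde F * F^{*k}(t) = \underbrace{\tilde F(0)}_{=0} F^{*k}(t) + \int_{s=0}^{t} \tilde f(t-s)\, dF^{*k}(s) = \frac{1}{E[\tau]}\, F^c * F^{*k}(t) \overset{3.1.9}{=} \frac{1}{E[\tau]}\, P(N(t) = k) \]
and thus
\[
\begin{aligned}
\frac{d}{dt} E[\tilde N(t)] &= \frac{d}{dt} \Big( \tilde F(t) + \sum_{k=1}^{\infty} \tilde F * F^{*k}(t) \Big) \\
&= \tilde f(t) + \sum_{k=1}^{\infty} \frac{1}{E[\tau]}\, P(N(t) = k) \\
&= \frac{1}{E[\tau]} \Big( \underbrace{F^c(t)}_{= P(N(t) = 0)} + P(N(t) \ge 1) \Big) = \frac{1}{E[\tau]}
\end{aligned}
\]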
3.2.1
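Claim 3.2.1 can be checked by simulation. The sketch below assumes uniform inter event times on $[0,1]$ (so $E[\tau] = 1/2$ and $\tilde f(x) = F^c(x)/E[\tau] = 2(1-x)$ on $[0,1]$); the distribution, seed, time point and sample size are assumptions made for illustration, not part of the text.

```python
import math
import random


def stationary_count(t, rng):
    """N-tilde(t) for uniform[0,1] typical inter event times.

    The initial time has density 2(1 - x) on [0,1]; it is sampled by
    inverting its distribution function 2x - x^2.
    """
    s = 1.0 - math.sqrt(1.0 - rng.random())   # initial inter event time
    n = 0
    while s <= t:
        n += 1
        s += rng.random()                     # typical inter event time
    return n


def undelayed_count(t, rng):
    """N(t) for the undelayed rcp with uniform[0,1] inter event times."""
    s = rng.random()
    n = 0
    while s <= t:
        n += 1
        s += rng.random()
    return n


rng = random.Random(0)
t, runs = 3.0, 20000
mean_stationary = sum(stationary_count(t, rng) for _ in range(runs)) / runs
mean_undelayed = sum(undelayed_count(t, rng) for _ in range(runs)) / runs
print(mean_stationary, mean_undelayed)
```

The empirical mean of $\tilde N(t)$ should be close to $t / E[\tau] = 6$, while the undelayed process stays visibly below this line, which illustrates why the initial inter event time distribution $\tilde F$ is needed for stationarity.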
We continue by arguing for strong stationarity of increments of the delayed
counting process $\tilde N$.
Lemma 3.2.2. For the delayed renewal counting process $\tilde N$ with initial inter
event time $\tilde\tau$ we have: $\mathcal{L}(C(\tilde N, t)) = \mathcal{L}(\tilde\tau)$ for any $t$.
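Proof of 3.2.2: We work with the joint distribution of $\tilde N(t)$ and $C(\tilde N, t)$
of 3.1.16. We have to set $g = \tilde f$.
\[
\begin{aligned}
P(C(t) > s) &= P(N(t) = 0,\; C(t) > s) + \sum_{k=1}^{\infty} P(N(t) = k,\; C(t) > s) \\
&= \tilde F^c(t+s) + \int_{r=0}^{t} F^c(t+s-r) \sum_{k=1}^{\infty} \tilde f * f^{*(k-1)}(r)\, dr \\
&= \int_{r=t+s}^{\infty} \frac{F^c(r)}{E[\tau]}\, dr + \int_{r=0}^{t} F^c(t+s-r) \underbrace{\sum_{k=0}^{\infty} \tilde f * f^{*k}(r)}_{= \frac{1}{E[\tau]} \text{ by 3.1.9}}\, dr \\
&= \int_{r=t+s}^{\infty} \frac{F^c(r)}{E[\tau]}\, dr + \int_{r=s}^{t+s} \frac{F^c(r)}{E[\tau]}\, dr \\
&= \tilde F^c(s) = P(\tilde\tau > s)
\end{aligned}
\]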
3.2.2
From this, strong stationarity of increments of $\tilde N$ is immediate:
Claim 3.2.3. Let $N' : s \mapsto \tilde N(t+s) - \tilde N(t)$ for $s \ge 0$. Then $\mathcal{L}(\tilde N) = \mathcal{L}(N')$.
Proof of 3.2.3: The process $N'$ is piecewise constant and if it jumps at $s$
then
\[ N'(s) - N'(s-) = \tilde N(t+s) - \tilde N((t+s)-) = 1. \]
Typical inter event times of $N'$ are typical inter event times of $\tilde N$, so they
are independent. Thus $N'$ is a delayed renewal counting process. The initial
inter event time of $N'$ is $\tau'_1 = C(\tilde N, t)$ and we have just seen that
$\mathcal{L}(C(\tilde N, t)) = \mathcal{L}(\tilde\tau)$. So the inter event times of $N'$ have the same distribution as
those of $\tilde N$ and the distributions of the associated counting processes coincide.
3.2.3
As a graphical presentation of the proof of 3.2.3 consider figure 3.3. Inter
event times of $\tilde N$ are on the upper and those of $N'$ on the lower timeline.
For a different kind of proof of strong stationarity of increments of $\tilde N$ see
[26] (ch. 2.16, especially theorem 17 on p. 112).
We can similarly argue for reversibility of the counting process with
stationary increments. The following lemma is a preparation.
Figure 3.3: Inter event times of $\tilde N$ and $N'$ of claim 3.2.3
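Lemma 3.2.4.
\[ P(B(t) \le x) = \begin{cases} 1 & \text{if } x \ge t \\ \tilde F(x) & \text{if } x < t \end{cases} \]
Proof of 3.2.4: We apply the conditional distribution 3.1.15 and the
knowledge of the distribution of $C(\tilde N, t)$ from 3.2.2.
\[
\begin{aligned}
P(B(t) \le x) &= \int_{y=0}^{\infty} \underbrace{P(B(t) \le x \mid C(t) = y)}_{= F_{+y}(x)}\, \tilde f(y)\, dy \\
&= \int_{y=0}^{\infty} \Big(1 - \frac{F^c(x+y)}{F^c(y)}\Big) \frac{F^c(y)}{E[\tau]}\, dy \\
&= 1 - \int_{y=0}^{\infty} \frac{F^c(x+y)}{E[\tau]}\, dy \\
&= 1 - \int_{y=x}^{\infty} \tilde f(y)\, dy \\
&= \tilde F(x)
\end{aligned}
\]
For the point mass: $P(B(\tilde N, t) = t) = P(\tilde\tau > t) = \tilde F^c(t)$, which is consistent with the
jump of the distribution function from $\tilde F(t)$ to 1 at $x = t$.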
3.2.4
Claim 3.2.5 (Reversibility of increments). For $\tilde N$ and fixed $t > 0$ let
$N'' : s \mapsto \tilde N(t) - \tilde N(t-s)$ for $s \in [0, t]$. Then $\mathcal{L}(\tilde N|_{[0,t]}) = \mathcal{L}(N'')$.
Proof of 3.2.5: $B(\tilde N, t)$ has the required distribution by lemma 3.2.4.
Denote the inter event times of $N''$ by $\tau''_1, \tau''_2, \ldots$. The remaining proof is in
figure 3.4.
3.2.5
3.3 Lmgf for the undelayed rcp
The logarithmic moment generating function is an essential tool in classical
large deviation theory.
Figure 3.4: Inter event times of $\tilde N$ and the reversed process $N''$
Claim 3.3.1. Let $N$ be the undelayed rcp with inter event time distribution
$F$ with lmgf $\Lambda$ (cf. 2.2.1) and associated $\Gamma$ (cf. 2.2.7). Then for any $\theta \in \mathbb{R}$
\[ \lim_{t \to \infty} \frac{1}{t} \log E[e^{\theta N_t}] = \Gamma(\theta). \]
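Proof of 3.3.1: We calculate exactly. For $\theta \in \mathbb{R}$ set $\rho = -\Gamma(\theta)$, which is
equivalent to $\Lambda(\rho) = -\theta$ by definition of $\Gamma$.
\[
\begin{aligned}
E[e^{\theta N_t}] &\overset{3.1.9}{=} F^c(t) + \sum_{k=1}^{\infty} e^{\theta k}\, F^c * F^{*k}(t) \\
&= F^c(t) + \sum_{k=1}^{\infty} e^{\theta k} \int_{s=0}^{t} F^c(t-s)\, f^{*k}(s)\, ds \\
&= F^c(t) + \int_{s=0}^{t} F^c(t-s) \sum_{k=1}^{\infty} e^{\theta k} f^{*k}(s)\, ds \\
&\overset{\theta = -\Lambda(\rho)}{=} F^c(t) + \int_{s=0}^{t} F^c(t-s) \sum_{k=1}^{\infty} e^{-\rho s}\, e^{\rho s - k\Lambda(\rho)} f^{*k}(s)\, ds \\
&\overset{3.1.10}{=} F^c(t) + \int_{s=0}^{t} F^c(t-s)\, e^{-\rho s} \sum_{k=1}^{\infty} f^{*k}_\rho(s)\, ds \\
&= F^c(t) + e^{-t\rho} \int_{s=0}^{t} F^c(t-s)\, e^{\rho(t-s)} \underbrace{\Big( \frac{1}{E^{(\rho)}[\tau]} - \widetilde{(f_\rho)}(s) \Big)}_{= \frac{1}{E^{(\rho)}[\tau]} - \frac{F^c_\rho(s)}{E^{(\rho)}[\tau]}}\, ds \\
&= F^c(t) + e^{-t\rho} \int_{s=0}^{t} F^c(t-s)\, e^{\rho(t-s)}\, \frac{F_\rho(s)}{E^{(\rho)}[\tau]}\, ds
\end{aligned}
\]
The integral in the last line converges to some value in $(0, \infty)$. This
implies that there is no exponential decay or growth and the integral does
not contribute to the exponential rate of the expectation.
\[
\begin{aligned}
\int_{s=0}^{t} F^c(t-s)\, e^{\rho(t-s)} F_\rho(s)\, ds &\overset{r = t-s}{=} \int_{r=0}^{t} \underbrace{F^c(r)}_{= E[\tau] \tilde f(r)} e^{r\rho}\, F_\rho(t-r)\, dr \\
&= \int_{r=0}^{t} (\tilde f)_\rho(r)\, e^{\tilde\Lambda(\rho)}\, F_\rho(t-r)\, dr \\
&= e^{\tilde\Lambda(\rho)} \int_{r=0}^{t} F_\rho(t-r)\, d(\tilde F)_\rho(r) \\
&= e^{\tilde\Lambda(\rho)}\, F_\rho * (\tilde F)_\rho(t) \\
&\to e^{\tilde\Lambda(\rho)} \qquad (\text{as } t \to \infty)
\end{aligned}
\]
Thus under the exponential scaling
\[
\begin{aligned}
\limsup_{t \to \infty} \frac{1}{t} \log E[e^{\theta N_t}] &\le \max\Big\{ \lim_{t \to \infty} \frac{1}{t} \log F^c(t),\; -\rho + \limsup_{t \to \infty} \frac{1}{t} \log \frac{e^{\tilde\Lambda(\rho)}}{E^{(\rho)}[\tau]} \Big\} \\
&= \max\{-LC(h),\, -\rho\} = -\rho
\end{aligned}
\]
where we applied 2.4.7 (for the decay rate of $F^c$ as $-LC(h)$) and $\rho \in D(\Lambda) =
(-\infty, LC(h))$, and
\[ \liminf_{t \to \infty} \frac{1}{t} \log E[e^{\theta N_t}] \ge \liminf_{t \to \infty} \frac{1}{t} \log \Big( e^{-t\rho}\, \frac{e^{\tilde\Lambda(\rho)}}{E^{(\rho)}[\tau]} \Big) = -\rho. \]
Since $\rho = -\Gamma(\theta)$ the claim is proved.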
3.3.1
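The relation used in the proof, $\Gamma(\theta) = -\rho$ with $\Lambda(\rho) = -\theta$, can be evaluated numerically. The sketch below assumes exponential inter event times with a hypothetical rate, for which the undelayed rcp is a Poisson process and the limit must equal the Poisson lmgf $\lambda(e^\theta - 1)$; the bisection bracket and the helper names are illustrative assumptions.

```python
import math


def lmgf_exponential(beta, rate):
    """Lambda(beta) = log E[exp(beta * tau)] for tau exponential(rate), beta < rate."""
    return math.log(rate / (rate - beta))


def gamma_from_lmgf(theta, lmgf, lower, upper, iterations=200):
    """Gamma(theta) = -rho where rho solves Lambda(rho) = -theta (bisection).

    Assumes lmgf is increasing on (lower, upper) and the root lies inside.
    """
    target = -theta
    lo, hi = lower, upper
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if lmgf(mid) < target:
            lo = mid
        else:
            hi = mid
    return -0.5 * (lo + hi)


rate = 2.0                                   # hypothetical rate
thetas = (-1.0, 0.0, 0.5, 1.0)
gamma_numeric = [
    gamma_from_lmgf(th, lambda b: lmgf_exponential(b, rate), -50.0, rate - 1e-9)
    for th in thetas
]
gamma_poisson = [rate * (math.exp(th) - 1.0) for th in thetas]
print(gamma_numeric, gamma_poisson)
```

Agreement of the two lists in this special case supports the sign convention $\Gamma(\theta) = -\Lambda^{-1}(-\theta)$ used in claim 3.3.1.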
3.4 Exponential equivalence for cps
When doing large deviations for counting processes it will be convenient to make
some small alterations to the original process from time to time. In this
section we describe these alterations and prove that they do not affect the
large deviation behaviour of the process.
Definition 3.4.1 (Scaling). For $N$ a renewal counting process define
\[ N_n : \mathbb{R}_{\ge 0} \to \mathbb{R},\quad t \mapsto \frac{1}{n} N(nt). \]
For every $n \in \mathbb{N}$ and $T > 0$ the scaled counting processes on $[0, T]$ are
elements of $D([0, T], \mathbb{R})$. The sup-norm of any $N_n$ restricted to $[0, T]$ is finite
and for two counting processes the sup-norm induced distance will be finite:
\[ \|N_n - N'_n\| \le \|N_n\| + \|N'_n\| = \frac{1}{n} \big( N(nT) + N'(nT) \big) < \infty \]
Definition 3.4.2 (Exponential equivalence in $(D([0, T], \mathbb{R}), \|\cdot\|)$). The
sequences of processes $(Y_n; n \in \mathbb{N})$ and $(Z_n; n \in \mathbb{N})$ in $D([0, T], \mathbb{R})$ equipped
with the sup-norm are exponentially equivalent if for each $n \in \mathbb{N}$ there is a
coupling $(\check Y_n, \check Z_n)$ of $(Y_n, Z_n)$ such that the sequence of sup-norm distances
$(\|\check Y_n - \check Z_n\|; n \in \mathbb{N})$ decays super exponentially: For any $\delta > 0$
\[ \lim_{n \to \infty} \frac{1}{n} \log P(\|\check Y_n - \check Z_n\| > \delta) = -\infty. \]
If the sequence of processes used in the exponential equivalence is obvious
and for example relates to a counting process under the scaling 3.4.1, then we
may say that two processes $N, N'$ are exponentially equivalent when strictly
speaking we mean that $(N_n; n \in \mathbb{N})$ and $(N'_n; n \in \mathbb{N})$ are exponentially
equivalent.
3.4.1 Initial inter event time
Here we argue for the exponential equivalence of counting processes that differ
only in the distribution of the time to the first event if these distributions
are exponentially equivalent. This will imply exponential equivalence of the
undelayed rcp and the associated rcp with stationary increments.
Definition 3.4.3 ($N$, $N^\sigma$).
- $N$ is an undelayed renewal counting process and $F$ is the distribution function of each inter event time. $\tau$ is the first inter event time of $N$.
- $N^\sigma$ is a delayed renewal counting process with typical inter event time distribution $F$ and initial inter event time $\sigma$ with distribution function $G$.
Claim 3.4.4. If the initial inter event times $\tau, \sigma$ of $N, N^\sigma$ are exponentially
equivalent then the counting processes $N, N^\sigma$ are exponentially equivalent.
To prove this we have to give a coupling $\check N, \check N^\sigma$ of $N, N^\sigma$ such that the
difference $\|\check N_n - \check N^\sigma_n\|$ decays faster than exponentially in $n$.
Definition 3.4.5 (Coupled $\check N, \check N^\sigma$). Let $\{U, \tau_2, \tau_3, \ldots\}$ be independent random
variables where $U$ is uniform on $[0, 1]$ and $\tau_2, \tau_3, \ldots$ have density $f$ and
distribution function $F$. Define the counting process $\check N$ through its inter event
times $F^{-1}(U), \tau_2, \tau_3, \ldots$ and the counting process $\check N^\sigma$ through its inter event
times $G^{-1}(U), \tau_2, \tau_3, \ldots$ (cf. definition 3.1.1 of a counting process).
Figure 3.5 is a graphical representation of the coupled $\check N$ and $\check N^\sigma$.
Figure 3.5: Inter event times of coupled $\check N$ and $\check N^\sigma$
Claim 3.4.6 (Marginal distributions). $\mathcal{L}(N) = \mathcal{L}(\check N)$ and $\mathcal{L}(N^\sigma) = \mathcal{L}(\check N^\sigma)$.
Proof of 3.4.6: The inter event times $F^{-1}(U), \tau_2, \tau_3, \ldots$ and $G^{-1}(U), \tau_2, \tau_3, \ldots$
are independent because $U, \tau_2, \tau_3, \ldots$ are. Thus $\check N$ and $\check N^\sigma$ are rcps. All inter
event times of $\check N$ have distribution function $F$: the $\tau_2, \tau_3, \ldots$ by definition and
$F^{-1}(U)$ by the quantile coupling and (2.6). Thus $\check N$ is an undelayed renewal
counting process. By the definition 3.1.1 of a counting process its distribution
is determined by the distribution of its inter event times, so the distributions of
$\check N$ and $N$ coincide. Similarly $\check N^\sigma$ is a delayed renewal counting process and
its first inter event time has the required distribution: $\mathcal{L}(\sigma) = \mathcal{L}(G^{-1}(U))$,
again by the quantile coupling and (2.6).
3.4.6
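Lemma 3.4.7. For the coupled $\check N, \check N^\sigma$ of 3.4.5: $\check N^\sigma(t) = \check N(t + \tau - \sigma)$ (with
$\check N(s) = 0$ for $s \le 0$).
Proof of 3.4.7: Set $S_k = \tau + \sum_{l=2}^{k} \tau_l$ and $S^\sigma_k = \sigma + \sum_{l=2}^{k} \tau_l$. For $k \ge 1$
\[ \check N(t + \tau - \sigma) = k \iff S_k \le t + \tau - \sigma < S_{k+1} \overset{-\tau+\sigma}{\iff} S^\sigma_k \le t < S^\sigma_{k+1} \iff \check N^\sigma(t) = k \]
and for $k = 0$
\[ \check N(t + \tau - \sigma) = 0 \iff t + \tau - \sigma < \tau \iff t < \sigma \iff \check N^\sigma(t) = 0. \]
Consider the case $\tau < \sigma$ and let $0 < s < -\tau + \sigma$. Since $s < \sigma$ we have
$N^\sigma(s) = 0$, and $N(s + \tau - \sigma) = 0$ since $s + \tau - \sigma < 0$.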
3.4.7
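The quantile coupling of definition 3.4.5 and the shift identity of lemma 3.4.7 can be illustrated on a single realisation. In the sketch below the distributions $F$ and $G$ are assumed to be exponential with hypothetical rates, and the uniform value and the later inter event times are fixed sample values, all chosen only for illustration.

```python
from bisect import bisect_right
from itertools import accumulate
import math


def count(first, later, t):
    """Counting process with initial inter event time `first`, then `later`."""
    if t < 0:
        return 0
    return bisect_right(list(accumulate([first] + later)), t)


def quantile_exponential(u, rate):
    """F^{-1}(u) for the exponential distribution with the given rate."""
    return -math.log(1.0 - u) / rate


u = 0.3                                   # shared uniform variable U
tau = quantile_exponential(u, 1.0)        # F^{-1}(U), hypothetical F
sigma = quantile_exponential(u, 4.0)      # G^{-1}(U), hypothetical G
later = [0.7, 0.2, 1.1, 0.4]              # shared tau_2, tau_3, ...

# Lemma 3.4.7 on a grid that stays below the last simulated event.
grid = [0.05 * i for i in range(50)]
shift_identity_holds = all(
    count(sigma, later, t) == count(tau, later, t + tau - sigma) for t in grid
)
print(shift_identity_holds)
```

Because both processes share every inter event time except the first, the two paths differ only by the time shift $\tau - \sigma$, which is what the proof of claim 3.4.4 exploits.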
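Proof of 3.4.4: The claim is proved if we can prove
\[ \limsup_{n \to \infty} \frac{1}{n} \log P(\|\check N_n - \check N^\sigma_n\| > a) = -\infty \qquad (3.5) \]
for $\check N, \check N^\sigma$ of definition 3.4.5 and any $a > 0$. We introduce a small parameter
$\delta > 0$.
\[
\begin{aligned}
P\Big(\sup_{t \in [0,T]} |\check N^\sigma_n(t) - \check N_n(t)| > a\Big)
&= P\Big(\sup_{t \in [0,T]} |N(nt + \tau - \sigma) - N(nt)| > na,\; |\tau - \sigma| < \delta n\Big) \\
&\quad + P\Big(\sup_{t \in [0,T]} |N(nt + \tau - \sigma) - N(nt)| > na,\; |\tau - \sigma| \ge \delta n\Big)
\end{aligned}
\]
(We write $N$ instead of $\check N$ again since $\mathcal{L}(N) = \mathcal{L}(\check N)$.) From the quantile
coupling of $\tau$ and $\sigma$ (such that they are exponentially equivalent, cf. claim
2.5.7) we already know that the second probability decays super exponentially.
We further investigate the first event. Under the condition of a small distance
$|\tau - \sigma|$ and in light of the monotonicity of $N$
\[
\begin{aligned}
|N(nt + \tau - \sigma) - N(nt)| &\le \max\{|N(nt + n\delta) - N(nt)|,\; |N((nt - n\delta)^+) - N(nt)|\} \\
&= \max\{N(nt + n\delta) - N(nt),\; N(nt) - N((nt - n\delta)^+)\}
\end{aligned}
\]
and taking the sup over all $t$ makes the $\max\{\ldots\}$ unnecessary.
\[ P\Big(\sup_{t \in [0,T]} |\check N^\sigma(nt) - \check N(nt)| > na,\; |\tau - \sigma| < \delta n\Big) \le P\Big(\sup_{t \in [0,T]} N(nt + n\delta) - N(nt) > na,\; |\tau - \sigma| < \delta n\Big) \]
And
\[ P\Big(\sup_{t \in [0,T]} |\check N^\sigma_n(t) - \check N_n(t)| > a\Big) \le P\Big(\sup_{t \in [0,T]} N(nt + n\delta) - N(nt) > na\Big) + P(|\tau - \sigma| \ge n\delta) \]
Exponential equivalence as stated in (3.5) has now been reduced to
\[ \lim_{\delta \to 0} \limsup_{n \to \infty} \frac{1}{n} \log P\Big(\sup_{t \in [0,T]} N_n(t + \delta) - N_n(t) > a\Big) = -\infty \qquad (3.6) \]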
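Now fix a $\delta < a/\lambda$ (and small relative to $T$) and divide the interval $[0, T]$ into
many intervals of size $\delta$. If there is $t$ such that $N_n(t + \delta) - N_n(t) > a$ then
this $t$ will be in one of these intervals.
\[
\begin{aligned}
& P\Big(\sup_{t \in [0,T]} N_n(t+\delta) - N_n(t) \ge a\Big) \\
&= P\Big(\max_{m = 0, \ldots, T/\delta}\; \sup_{t \in [m\delta, (m+1)\delta]} N_n(t+\delta) - N_n(t) \ge a\Big) \\
&\le \sum_{m=0}^{T/\delta} P\Big(\sup_{t \in [m\delta, (m+1)\delta]} N_n(t+\delta) - N_n(t) \ge a\Big) \\
&= \sum_{m=0}^{T/\delta} P\Big(C(mn\delta) < n\delta,\; \sup_{t \in [m\delta, (m+1)\delta]} N_n(t+\delta) - N_n(t) \ge a\Big) \\
&= \sum_{m=0}^{T/\delta} P\Big(\sup_{t \in [m\delta, (m+1)\delta]} \underbrace{\underbrace{N_n(t+\delta)}_{\le N_n((m+2)\delta)} - \underbrace{N_n(t)}_{\ge N_n(m\delta) = \frac{1}{n}(N(nm\delta + C(nm\delta)) - 1)}}_{\le \frac{1}{n} + N^\tau_n(2\delta)} \ge a \;\Big|\; C(mn\delta) < n\delta\Big)\, \underbrace{P(C(mn\delta) < n\delta)}_{\le 1} \\
&\le \sum_{m=0}^{T/\delta} P\Big(N^\tau_n(2\delta) \ge a - \frac{1}{n}\Big)
\end{aligned}
\]
where we introduced $N^\tau$ as an undelayed rcp to bound increments of $N_n$.
(This should be read as a stochastic domination only, since we do not want to do
another explicit coupling.) Since $N$ defined in 3.4.3 was undelayed too, we
can even omit the $\tau$.
\[ P\Big(\sup_{t \in [0,T]} N_n(t+\delta) - N_n(t) \ge a\Big) \le \Big(\frac{T}{\delta} + 1\Big) P\Big(N_n(2\delta) > a - \frac{1}{n}\Big) \overset{a' < a}{\le} \Big(\frac{T}{\delta} + 1\Big) P(N_n(2\delta) > a') \]
In the last inequality we need $n > \frac{1}{a - a'}$, which is required for $a - \frac{1}{n} > a'$ to
hold for $a' < a$. To apply the scaling we move from counting processes to
partial sums since we already have the large deviations for their mean (cf. [5]
Cramér's theorem 2.2.3).
\[
\begin{aligned}
\frac{1}{n} \log P(N_n(2\delta) \ge a') &= \frac{1}{n} \log P(N(n 2\delta) \ge na') \\
&= \frac{1}{n} \log P(S_{na'} \le n 2\delta) \\
&= a'\, \frac{1}{na'} \log P\Big(\frac{1}{na'} S_{na'} \le \frac{2\delta}{a'}\Big) \xrightarrow{n \to \infty} -a' \Lambda^*\Big(\frac{2\delta}{a'}\Big)
\end{aligned}
\]
As we have arbitrary $a' < a$ and $\Lambda^*$ is continuous we now have checked (3.6):
\[ \lim_{\delta \to 0} \limsup_{n \to \infty} \frac{1}{n} \log P\Big(\sup_{t \in [0,T]} N_n(t+\delta) - N_n(t) > a\Big) \le \lim_{\delta \to 0} -a \Lambda^*\Big(\frac{2\delta}{a}\Big) = -a \Lambda^*(0) = -\infty \]
Note that $\Lambda^*(0) = \infty$ is a consequence of the no-point-mass at zero property
of assumption 2.2.2 and that $\lim_{\delta \to 0} \Lambda^*(\delta) = \infty$ follows from lower semi
continuity of $\Lambda^*$.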
3.4.4
The following generalisation of 3.4.4 is immediate.
Corollary 3.4.8. Any two renewal counting processes with the same typical
inter event time distribution and exponentially equivalent initial inter event
times are exponentially equivalent.
3.4.2 Independence of increments
For the Markovian renewal counting process the state at a fixed time $s \in
[0, T]$ and future increments, $(N(s), N(T) - N(s))$, are independent. This
does not hold for non-deterministic inter event distributions different from
the exponential.
The non-independence of $N(s)$ and $N(T) - N(s)$ for a general rcp $N$ is
through the age $B(s)$ at time $s$ and affects the initial distribution of the
process of increments on $[s, T]$. We have seen in the last section that a single
initial inter event time may be changed without affecting the large deviation
behaviour of the process. This will be a tool when replacing a renewal counting
process by a similar process with a certain independence of increments.
Claim 3.4.9. Let $N$ be a rcp and $N_n$ the associated scaled process. Fix a
$k \in \mathbb{N}$ and fix $0 < s_1 < \cdots < s_k < T$. Then there is a sequence of scaled
counting processes $(N'_n; n \in \mathbb{N})$ such that $(N_n; n \in \mathbb{N})$, $(N'_n; n \in \mathbb{N})$ are
exponentially equivalent in $(D([0, T], \mathbb{R}), \|\cdot\|)$ and the processes of increments
over disjoint intervals for $N'_n$
\[ \big\{ \big( N'_n(t) - N'_n(s_m)\,;\; t \in [s_m, s_{m+1}] \big) \;\big|\; m = 0, \ldots, k \big\} \]
(with $s_0 = 0$, $s_{k+1} = T$) are independent.
Since rcps with the same Cesaro mean for their initial inter event time
distribution are exponentially equivalent we can pick one for the proof of
3.4.9 and we pick $\tilde N$ of definition 3.1.5 and section 3.2. The proof will be by
induction and we start with the base case $k = 1$.
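Claim 3.4.10. Let $\tilde N$ be the delayed rcp with stationary increments and for
some $T > 0$ let $s \in [0, T]$. Then there is a sequence of counting processes
$(N'_n; n \in \mathbb{N})$ such that $(N'_n; n \in \mathbb{N})$ and $(\tilde N_n; n \in \mathbb{N})$ are exponentially
equivalent in $(D([0, T], \mathbb{R}), \|\cdot\|)$ and for each $n \in \mathbb{N}$
\[ \big( N'_n(t)\,;\; t \in [0, s] \big), \quad \big( N'_n(t) - N'_n(s)\,;\; t \in [s, T] \big) \]
are independent.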
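Definition 3.4.11 (Restarted process $N^{re,s}$). Given $s > 0$ and rcps $N, N^{(1)}$
define
\[ N^{re,s} : \mathbb{R}_{\ge 0} \to \mathbb{R},\quad t \mapsto \begin{cases} N(t) & \text{if } t \le s \\ N(s) + N^{(1)}(t - s) & \text{if } t > s. \end{cases} \]
Note that $N^{re,s}$ is a counting process and if $N|_{[0,s]}$, $N^{(1)}$ are independent,
then $N^{re,s}(s)$ and the increments of $N^{re,s}$ after $s$ are independent.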
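Definition 3.4.11 translates directly into a function that glues two counting paths together at $s$. This is a minimal sketch with hypothetical helper names and sample inter event times, not a construction taken from the text.

```python
from bisect import bisect_right
from itertools import accumulate


def counting_function(inter_event_times):
    """Return t -> N(t) for a finite list of inter event times."""
    sums = list(accumulate(inter_event_times))
    return lambda t: bisect_right(sums, t)


def restarted(n, n1, s):
    """Return t -> N^{re,s}(t) as in definition 3.4.11."""
    def n_re(t):
        if t <= s:
            return n(t)
        return n(s) + n1(t - s)
    return n_re


n = counting_function([0.5, 1.0, 0.25, 2.0])   # events at 0.5, 1.5, 1.75, 3.75
n1 = counting_function([0.25, 0.5, 4.0])       # events at 0.25, 0.75, 4.75
n_re = restarted(n, n1, s=1.0)
path = [n_re(t) for t in (0.75, 1.0, 1.25, 1.75, 2.0)]
print(path)
```

Up to $s$ the restarted path follows $N$; after $s$ it only uses $N(s)$ and the fresh process $N^{(1)}$, so the age of $N$ at $s$ no longer influences the future increments.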
We want a scaling for the restarted process $N^{re,s}$ that also scales the
associated epoch $s$.
Definition 3.4.12 (Scaled restarted process $N^{re,s}_n$). Fix $s > 0$. Let $N$ be a
renewal counting process and $(N^{re,ns}; n \in \mathbb{N})$ a sequence of restarted processes
associated with $ns, N, N^{(1)}$. Then define
\[ N^{re,s}_n : \mathbb{R}_{\ge 0} \to \mathbb{R},\quad t \mapsto \frac{1}{n} N^{re,ns}(nt). \]
For some fixed $s$ we now give a coupling of $\tilde N$ and a restarted process
$N^{re,s}$.
Definition 3.4.13 (Coupling $M, M'$). Let $s > 0$ be fixed. Let $U_1, U_2, \tau'_2, \tau''_2,
\tau'_3, \tau''_3, \ldots$ be independent random variables where $U_1, U_2$ are uniform on $[0, 1]$
and all $\tau'_i, \tau''_i$ have distribution function $F$. Let $F_{+\cdot}$ be associated with $F$ (cf.
definition 2.1.7) and define $(B, C) = (\tilde F^{-1}(U_1), F^{-1}_{+B}(U_2))$ through the quantile
coupling (cf. definition 2.5.2).
Finally define $M$ as the counting process associated with the following
sequence of inter event times:
- $B(M, s) = B$
- $C(M, s) = C$
- inter event times after $s + C$ are $\tau'_2, \tau'_3, \ldots$
- inter event times before $s - B$ are $\tau''_2, \tau''_3, \ldots$
And define $M'$ as the counting process with the following inter event times:
- inter event time covering $s$: $C(M', s) = G^{-1}(U_2)$, $B(M', s) = B(M, s)$
- all other inter event times the same as in $M$.
Figure 3.6 is a graphical representation of the definition of inter event
times for the coupled $M, M'$.
Figure 3.6: Inter event times of coupled $M$ (top line) and $M'$ (bottom line)
Claim 3.4.14. $\mathcal{L}(M) = \mathcal{L}(\tilde N)$.
Proof of 3.4.14: The construction of $M$ is a combination of those in section
3.2: We construct the counting process starting in $t$ for $s \ge t$ as in 3.2.3 and
also construct its past as the reverse of a stationary counting process as in 3.2.5.
We make sure that $C = F^{-1}_{+B}(U_2)$ has the correct marginal distribution:
density $\tilde f$.
\[
\begin{aligned}
P(C \le x) &= \int_{b=0}^{\infty} P(\underbrace{C \le x}_{F^{-1}_{+b}(U_2) \le x} \mid B = b)\, \tilde f(b)\, db = \int_{b=0}^{\infty} F_{+b}(x)\, \tilde f(b)\, db \\
&\overset{2.1.7}{=} \int_{b=0}^{\infty} \frac{F(x+b) - F(b)}{F^c(b)}\, \frac{F^c(b)}{E[\tau]}\, db = \int_{b=0}^{\infty} \frac{F(x+b) - F(b)}{E[\tau]}\, db \\
\frac{d}{dx} P(C \le x) &= \frac{\int_{b=0}^{\infty} f(x+b)\, db}{E[\tau]} = \frac{F^c(x)}{E[\tau]} = \tilde f(x)
\end{aligned}
\]
So we can really use $B$ as the age $B(\tilde N, t)$ and $C$ as the residual lifetime
$C(\tilde N, t)$. The inter event times of $M$ and $\tilde N$ that cover $t$ have the same
distribution. Since all other inter event times of $M$ and $\tilde N$ have the same
distribution, too, the distributions of the counting processes coincide.
3.4.14
Claim 3.4.15. $\mathcal{L}(M') = \mathcal{L}(N^{re,s})$
Proof of 3.4.15: We have the process of increments of $M'$ after $s$ with
inter event times $G^{-1}(U_2), \tau'_2, \tau'_3, \ldots$ which is independent of $M'$ on $[0, s]$, and
we can identify the increments of $M'$ after $s$, distribution wise, with $N^{(1)}$ of the
definition of $N^{re,s}$.
3.4.15
Proof of claim 3.4.10 with $N'_n = N^{re,s}_n$: For each $n \in \mathbb{N}$ let $M, M'$ be the
coupled processes defined above associated with $ns$. Then for fixed $n$
\[
\begin{aligned}
\sup_{t \in [0, nT]} |M(t) - M'(t)| &= \sup_{t \in [ns, nT]} |M(t) - M'(t)| \\
&= \sup_{t \in [0, n(T-s)]} \big| N^{F^{-1}_{+B}(U_2)}(t) - N^{G^{-1}(U_2)}(t) \big|
\end{aligned}
\]
where the $N^{\cdot}$ are rcps with typical inter event time distribution function $F$ and
the indicated initial inter event times; $F^{-1}_{+b}(U_2)$ and $G^{-1}(U_2) = \tilde F^{-1}(U_2)$ have
the same Cesaro mean for their hazard functions uniformly in $b \in \mathbb{R}_{\ge 0}$. Thus
by section 3.4.1 and corollary 3.4.8 these processes are exponentially equivalent
and
\[ \lim_{n \to \infty} \frac{1}{n} \log P\Big(\sup_{t \in [0, n(T-s)]} \big| N^{F^{-1}_{+B}(U_2)}(t) - N^{G^{-1}(U_2)}(t) \big| > na\Big) = -\infty \]
3.4.10
We now prepare for the inductive step:
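Definition 3.4.16 (Restarted process $N^{re,(s_1,\ldots,s_k)}$). Given an ordered
sequence $0 < s_1 < \cdots < s_k$ and rcps $N, N^{(1)}, \ldots, N^{(k)}$ define
\[ N^{re,(s_1,\ldots,s_k)} : \mathbb{R}_{\ge 0} \to \mathbb{R},\quad t \mapsto \begin{cases} N(t) & \text{if } t \in [0, s_1] \\ N(s_1) + N^{(1)}(t - s_1) & \text{if } t \in (s_1, s_2] \\ \quad\vdots & \quad\vdots \\ N(s_1) + \sum_{m=1}^{k-1} N^{(m)}(s_{m+1} - s_m) + N^{(k)}(t - s_k) & \text{if } t \in (s_k, \infty). \end{cases} \]
Definition 3.4.16 generalises definition 3.4.11 in the number of restarted
epochs.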
Lemma 3.4.17. The following is an equivalent recursive definition for 3.4.16:
Given an ordered sequence $0 < s_1 < \cdots < s_k$ and rcps $N, N^{(1)}, \ldots, N^{(k)}$ define
- For $k = 1$ apply definition 3.4.11.
- For $k \ge 2$ and an ordered sequence $0 < s_1 < \cdots < s_k < \infty$ and rcps $N, N^{(1)}, \ldots, N^{(k)}$ let $N^{re,(s_1,\ldots,s_{k-1})}$ be the restarted process associated with $N, N^{(1)}, \ldots, N^{(k-1)}$ and the ordered sequence $s_1, \ldots, s_{k-1}$, and set
\[ N^{re,(s_1,\ldots,s_k)}(t) = \begin{cases} N^{re,(s_1,\ldots,s_{k-1})}(t) & \text{if } t \in [0, s_k] \\ N^{re,(s_1,\ldots,s_{k-1})}(s_k) + N^{(k)}(t - s_k) & \text{if } t \in (s_k, \infty) \end{cases} \]
The proof of 3.4.17 is just by inductively spelling out the definition.
3.4.17
Claim 3.4.18. Let $\tilde N$ be the delayed rcp with stationary increments. Fix
$T > 0$ and $k \in \mathbb{N}$ and let $0 < s_1, \ldots, s_{k+1} < T$ be an ordered sequence.
Assume that for any ordered sequence $0 < s'_1, \ldots, s'_k$ the sequences of processes
$(\tilde N_n; n \in \mathbb{N})$ and $(N^{re,(s'_1,\ldots,s'_k)}_n; n \in \mathbb{N})$ are exponentially equivalent in
$(D([0, T], \mathbb{R}), \|\cdot\|)$. Then $N'_n = N^{re,(s_1,\ldots,s_{k+1})}_n$ is such that the sequences
of processes $(\tilde N_n; n \in \mathbb{N})$ and $(N'_n; n \in \mathbb{N})$ are exponentially equivalent in
$(D([0, T], \mathbb{R}), \|\cdot\|)$ and for each $n \in \mathbb{N}$ the processes of increments
\[ \big( N'_n(t)\,;\; t \in [0, s_1] \big),\; \big( N'_n(t) - N'_n(s_1)\,;\; t \in [s_1, s_2] \big),\; \ldots,\; \big( N'_n(t) - N'_n(s_{k+1})\,;\; t \in [s_{k+1}, T] \big) \]
are independent.
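Proof of 3.4.18: For each $n \in \mathbb{N}$ we couple the sequence of $k+2$ processes
\[ \{\tilde N_n, \tilde N^{(1)}_n, \tilde N^{(2)}_n, \ldots, \tilde N^{(k)}_n, \tilde N^{(k+1)}_n\} \qquad (3.7) \]
the following way:
- $\tilde N_n, \tilde N^{(1)}_n, \tilde N^{(2)}_n, \ldots, \tilde N^{(k)}_n$ are coupled such that the exponential equivalence of $(\tilde N_n; n \in \mathbb{N})$ and $(N^{re,(s_1,\ldots,s_k)}_n; n \in \mathbb{N})$ holds when $N^{re,(s_1,\ldots,s_k)}_n$ is built from $\tilde N_n, \tilde N^{(1)}_n, \tilde N^{(2)}_n, \ldots, \tilde N^{(k)}_n$.
- $\tilde N^{(k+1)}_n$ and $\tilde N^{(k)}_n$ are coupled as in definition 3.4.13 for $s = n(s_{k+1} - s_k)$.
We have for any process $N'$
\[
\begin{aligned}
\|\tilde N_n - N^{re,(s_1,\ldots,s_{k+1})}_n\| &\le \|\tilde N_n - N'\| + \|N' - N^{re,(s_1,\ldots,s_{k+1})}_n\| \\
P(\|\tilde N_n - N^{re,(s_1,\ldots,s_{k+1})}_n\| > \delta) &\le P(\|\tilde N_n - N'\| + \|N' - N^{re,(s_1,\ldots,s_{k+1})}_n\| > \delta) \\
&\le P\Big(\|\tilde N_n - N'\| > \frac{\delta}{2}\Big) \qquad (3.8) \\
&\quad + P\Big(\|N' - N^{re,(s_1,\ldots,s_{k+1})}_n\| > \frac{\delta}{2}\Big) \qquad (3.9)
\end{aligned}
\]
For fixed $n \in \mathbb{N}$ choose $N' = N^{re,(s_1,\ldots,s_k)}_n$ associated with the $\tilde N, \tilde N^{(1)}, \ldots,
\tilde N^{(k)}$ of (3.7). Then $(\tilde N_n; n \in \mathbb{N})$ and $(N^{re,(s_1,\ldots,s_k)}_n; n \in \mathbb{N})$ have an LD
bounded sup-norm distance by construction and the induction assumption:
The probability (3.8) decays super exponentially. For (3.9) first notice that
\[ \forall t \in [0, s_{k+1}] :\quad N^{re,(s_1,\ldots,s_k)}_n(t) = N^{re,(s_1,\ldots,s_{k+1})}_n(t) \]
since both processes are associated with the same sequence of (3.7). And on
$[s_{k+1}, T]$ first apply the representation of $N^{re,(s_1,\ldots,s_{k+1})}_n$ of lemma 3.4.17 and
then the plain definition of the restarted process.
\[
\begin{aligned}
\forall t \ge s_{k+1} :\quad & N^{re,(s_1,\ldots,s_k)}_n(t) - N^{re,(s_1,\ldots,s_{k+1})}_n(t) \\
&= N^{re,(s_1,\ldots,s_k)}_n(t) - \big( N^{re,(s_1,\ldots,s_k)}_n(s_{k+1}) + \tilde N^{(k+1)}_n(t - s_{k+1}) \big) \\
&= \tilde N^{(k)}_n(t - s_k) - \tilde N^{(k)}_n(s_{k+1} - s_k) - \tilde N^{(k+1)}_n(t - s_{k+1})
\end{aligned}
\]
and the event of the probability in (3.9) can be simplified:
\[
\begin{aligned}
& \|N^{re,(s_1,\ldots,s_k)}_n - N^{re,(s_1,\ldots,s_{k+1})}_n\| \\
&= \sup_{t \in [s_{k+1}, T]} \big| \tilde N^{(k)}_n(t - s_k) - \tilde N^{(k)}_n(s_{k+1} - s_k) - \tilde N^{(k+1)}_n(t - s_{k+1}) \big| \\
&= \sup_{t \in [s_{k+1} - s_k,\, T - s_k]} \big| \tilde N^{(k)}_n(t) - \tilde N^{(k)}_n(s_{k+1} - s_k) - \tilde N^{(k+1)}_n(t + s_k - s_{k+1}) \big| \\
&= \sup_{t \in [s_{k+1} - s_k,\, T - s_k]} \big| \tilde N^{(k)}_n(t) - \tilde N^{(k)\,re,\,s_{k+1} - s_k}_n(t) \big| \\
&= \|\tilde N^{(k)}_n - \tilde N^{(k)\,re,\,s_{k+1} - s_k}_n\|_{[0,\, T - s_{k+1}]}
\end{aligned}
\]
where $\tilde N^{(k)\,re,\,s_{k+1} - s_k}$ is the once restarted process associated with $\tilde N^{(k)}, \tilde N^{(k+1)}$,
both of (3.7). By the base case 3.4.10 of the once restarted process we
have exponential equivalence and the probability (3.9) decays super
exponentially.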
3.4.18
Proof of 3.4.9: by induction with the base case 3.4.10 and inductive step
3.4.18. $N'$ is the restarted process of definition 3.4.16.
3.4.9
Corollary 3.4.19. Let $N$ be a rcp and $N_n$ the associated scaled process. Fix
$k \in \mathbb{N}$ and fix $0 < s_1 < \cdots < s_k < T$. Then there is a sequence of scaled
counting processes $(N'_n; n \in \mathbb{N})$ such that $(N_n; n \in \mathbb{N})$, $(N'_n; n \in \mathbb{N})$ are
exponentially equivalent in $(D([0, T], \mathbb{R}), \|\cdot\|)$ and for each $n \in \mathbb{N}$ the increments
\[ \{N'_n(s_1),\; N'_n(s_2) - N'_n(s_1),\; \ldots,\; N'_n(T) - N'_n(s_k)\} \]
of $N'_n$ are independent.
3.4.3 Interpolation
The counting process is piecewise constant and each realisation is a
right-continuous function with left limits, an element of $D([0, T], \mathbb{R})$. Interpolating
$N$ and denoting the interpolated counting process $\hat N$ we get an element of
$C([0, T], \mathbb{R})$. For any $t$ we have $\hat N(t) - N(t) \in [0, 1]$, so $\|\hat N - N\|$ is bounded
and the counting process and the interpolated counting process are
exponentially equivalent.
3.5 Conclusions from previous proofs
In this section we apply the exponential equivalences of the last section to
obtain finite dimensional large deviations for the renewal counting process.
Subsection 3.5.2 is on the limiting distribution for the scaled renewal counting
process being concentrated on the space of continuous functions. Technically
it is more a reinterpretation of a statement in the proof of 3.4.4.
3.5.1 Finite dimensional large deviations
We develop finite dimensional large deviation principles for the delayed and
undelayed renewal counting process.
Having calculated the lmgf for the undelayed rcp in 3.3 the following theorem
is now immediately clear.
Theorem 3.5.1. Let $N$ be the undelayed renewal counting process with typical
inter event time density $f$ and lmgf $\Lambda$. Then for $\theta \in \mathbb{R}$
\[ \lim_{t \to \infty} \frac{1}{t} \log E[e^{\theta N_t}] = \Gamma(\theta) \]
with $\Gamma(\theta) = -\Lambda^{-1}(-\theta)$. Furthermore a one-dimensional Large Deviation
principle holds: for any $s > 0$ and open set $G$ and closed set $F$
\[
\begin{aligned}
-s \inf_{x \in G} \Gamma^*(x) &\le \liminf_{n \to \infty} \frac{1}{n} \log P(N_n(s) \in G) \\
\limsup_{n \to \infty} \frac{1}{n} \log P(N_n(s) \in F) &\le -s \inf_{x \in F} \Gamma^*(x)
\end{aligned}
\]
with $\Gamma^*$ the Fenchel-Legendre transform of $\Gamma$.
Our theorem 3.5.1 is theorem 1 in [10].
Proof of 3.5.1: By the Gärtner-Ellis theorem, cf. [5] Theorem 2.3.6 on p.
44 or 7.5.3 of the appendix.
In terms of large deviations we need not distinguish between the delayed
and the undelayed counting process any more. The same theorem holds
for delayed rcps that are exponentially equivalent to the undelayed $N$; we have
proved exponential equivalence for delayed and undelayed rcps in section 3.4.1.
We state and prove the more general finite dimensional large deviations for
the delayed process $\tilde N$. We apply exponential equivalence of $\tilde N$ and the
restarted process. Again the theorem immediately generalises to all other
rcps exponentially equivalent to $\tilde N$.
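Theorem 3.5.2. Let $\tilde N$ be the delayed renewal counting process with stationary
increments and inter event time of density $f$ and lmgf $\Lambda$. Then for
any $k \ge 1$ a $k$-dimensional Large Deviation principle holds: for any ordered
sequence $s_1, \ldots, s_k$ (and for easier notation $s_0 = x_0 = 0$), and open set
$G \subset \mathbb{R}^k$
\[ -\inf_{x \in G} \sum_{r=1}^{k} (s_r - s_{r-1})\, \Gamma^*\Big(\frac{x_r - x_{r-1}}{s_r - s_{r-1}}\Big) \le \liminf_{n \to \infty} \frac{1}{n} \log P(\tilde N_n(s_1, \ldots, s_k) \in G) \]
while for any closed set $F \subset \mathbb{R}^k$
\[ \limsup_{n \to \infty} \frac{1}{n} \log P(\tilde N_n(s_1, \ldots, s_k) \in F) \le -\inf_{x \in F} \sum_{r=1}^{k} (s_r - s_{r-1})\, \Gamma^*\Big(\frac{x_r - x_{r-1}}{s_r - s_{r-1}}\Big) \]
with $\Gamma^*$ the Fenchel-Legendre transform of $\Gamma$.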
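Proof of 3.5.2: Again by the Gärtner-Ellis theorem. In the scope of
this proof abbreviate $N^{re}_n = N^{re,(s_1,\ldots,s_{k-1})}_n$ for the restarted process defined
in 3.4.16 associated with $\tilde N, \tilde N^{(1)}, \ldots, \tilde N^{(k-1)}$. We calculate the following
relatively simple lmgf:
\[
\begin{aligned}
& E\Big[\exp\Big\{\Big\langle \theta,\; \begin{pmatrix} N^{re}_n(s_1) \\ \vdots \\ N^{re}_n(s_k) \end{pmatrix} \Big\rangle\Big\}\Big] \\
&\overset{(1)}{=} E\Big[\exp\Big\{\Big\langle \begin{pmatrix} \theta_1 + \cdots + \theta_k \\ \theta_2 + \cdots + \theta_k \\ \vdots \\ \theta_k \end{pmatrix},\; \begin{pmatrix} N^{re}_n(s_1) \\ N^{re}_n(s_2) - N^{re}_n(s_1) \\ \vdots \\ N^{re}_n(s_k) - N^{re}_n(s_{k-1}) \end{pmatrix} \Big\rangle\Big\}\Big] \\
&= E\Big[\exp\Big\{\Big\langle \begin{pmatrix} \theta_1 + \cdots + \theta_k \\ \theta_2 + \cdots + \theta_k \\ \vdots \\ \theta_k \end{pmatrix},\; \begin{pmatrix} \tilde N_n(s_1) \\ \tilde N^{(1)}_n(s_2 - s_1) \\ \vdots \\ \tilde N^{(k-1)}_n(s_k - s_{k-1}) \end{pmatrix} \Big\rangle\Big\}\Big] \\
&\overset{\tilde N^{(0)} := \tilde N}{=} \prod_{r=1}^{k} E\big[\exp\{(\theta_r + \cdots + \theta_k)\, \tilde N^{(r-1)}_n(s_r - s_{r-1})\}\big]
\end{aligned}
\]
where in (1) we applied
\[ (1):\quad \langle \theta, y \rangle = \langle T^{\top} \theta,\; T^{-1} y \rangle, \qquad T = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 1 & 1 & 0 & \cdots & 0 \\ \vdots & & \ddots & & \vdots \\ 1 & 1 & \cdots & 1 & 0 \\ 1 & 1 & \cdots & 1 & 1 \end{pmatrix} \]
and thus
\[
\begin{aligned}
\lim_{n \to \infty} \frac{1}{n} \log E\Big[\exp\Big\{\Big\langle \theta,\; \begin{pmatrix} N^{re}_n(s_1) \\ \vdots \\ N^{re}_n(s_k) \end{pmatrix} \Big\rangle\Big\}\Big]
&= \sum_{r=1}^{k} \lim_{n \to \infty} \frac{1}{n} \log E\big[\exp\{(\theta_r + \cdots + \theta_k)\, \tilde N^{(r-1)}_n(s_r - s_{r-1})\}\big] \\
&= \sum_{r=1}^{k} (s_r - s_{r-1})\, \Gamma(\theta_r + \cdots + \theta_k)
\end{aligned}
\]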
62
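And the Fenchel-Legendre transform of this lmgf in some $x \in \mathbb{R}^k$ becomes (first apply to the inner product $\langle \theta, x \rangle$ the same transformation as in (1), then change the variable $\theta$ to $\xi_r = \theta_r + \dots + \theta_k$)
\begin{align*}
& \sup_{\theta \in \mathbb{R}^k} \Big[ \langle \theta, x \rangle - \sum_{r=1}^{k} (s_r - s_{r-1})\, \Gamma(\theta_r + \dots + \theta_k) \Big] \\
&= \sup_{\theta \in \mathbb{R}^k} \Big[ \Big\langle \big(\theta_1 + \dots + \theta_k,\ \theta_2 + \dots + \theta_k,\ \dots,\ \theta_k\big)^\top, \big(x_1,\ x_2 - x_1,\ \dots,\ x_k - x_{k-1}\big)^\top \Big\rangle - \sum_{r=1}^{k} (s_r - s_{r-1})\, \Gamma(\theta_r + \dots + \theta_k) \Big] \\
&= \sup_{\xi \in \mathbb{R}^k} \Big[ \Big\langle \xi, \big(x_1,\ x_2 - x_1,\ \dots,\ x_k - x_{k-1}\big)^\top \Big\rangle - \sum_{r=1}^{k} (s_r - s_{r-1})\, \Gamma(\xi_r) \Big] \\
&= \sup_{\xi \in \mathbb{R}^k} \sum_{r=1}^{k} \Big[ \xi_r (x_r - x_{r-1}) - (s_r - s_{r-1})\, \Gamma(\xi_r) \Big] \\
&= \sum_{r=1}^{k} (s_r - s_{r-1}) \sup_{\xi \in \mathbb{R}} \Big[ \frac{x_r - x_{r-1}}{s_r - s_{r-1}}\, \xi - \Gamma(\xi) \Big] \\
&= \sum_{r=1}^{k} (s_r - s_{r-1})\, \Gamma^*\Big(\frac{x_r - x_{r-1}}{s_r - s_{r-1}}\Big)
\end{align*}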
Now the restarted process is exponentially equivalent to $\tilde N$. By 7.2.3 they have the same lmgf. Alternatively, from the large deviation principle proved for the restarted process $N^{re}$ the large deviation principle for $\tilde N$ follows with the same rate function by [5] theorem 4.2.13. Thus the claim is proved.
3.5.2
Corollary 3.5.3. Any delayed renewal counting process exponentially equivalent to $\tilde N$ satisfies a finite dimensional large deviation principle with the rate function of 3.5.2.
3.5.2 Continuous paths
Recall section 3.4.1 and equation (3.6). This equation is really about the modulus of continuity for $N_n$. The modulus of continuity is defined as
\[ \omega_\delta(f) := \sup\{ |f(s) - f(t)| : s, t \in [0, T],\ |s - t| \le \delta \} \]
and for monotonically increasing $f = N_n$ we get
\[ \omega_\delta(N_n) = \sup_{t \in [0, T]} \big( N_n(t + \delta) - N_n(t) \big) \]
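The modulus of continuity of a scaled counting path can be explored numerically. The following is a minimal sketch, not part of the original argument: the helper names, the grid approximation of the supremum (restricted to $[0, T - \delta]$ so that the path is defined), and the choice of exponential inter event times with rate 2 are all illustrative assumptions.

```python
import bisect
import random

def simulate_event_times(horizon, sample_inter_event, rng):
    """Event times of a renewal process on [0, horizon] (hypothetical helper)."""
    times, t = [], 0.0
    while True:
        t += sample_inter_event(rng)
        if t > horizon:
            return times
        times.append(t)

def scaled_count(event_times, n, t):
    """N_n(t) = N(n t) / n for a path given by its sorted event times."""
    return bisect.bisect_right(event_times, n * t) / n

def modulus_of_continuity(event_times, n, T, delta, grid=2000):
    """Grid approximation of sup_t N_n(t + delta) - N_n(t) over [0, T - delta]."""
    best = 0.0
    for i in range(grid + 1):
        t = (T - delta) * i / grid
        inc = scaled_count(event_times, n, t + delta) - scaled_count(event_times, n, t)
        best = max(best, inc)
    return best

rng = random.Random(0)
T, delta = 1.0, 0.05
for n in (10, 100, 1000):
    # Assumption for illustration: Exp(2) inter event times.
    path = simulate_event_times(n * T, lambda r: r.expovariate(2.0), rng)
    print(n, modulus_of_continuity(path, n, T, delta))
```

For large $n$ one would expect the printed values to settle near $\delta$ times the event rate, which is small for small $\delta$.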
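Claim 3.5.4. $\lim_{\delta \to 0} \limsup_{n\to\infty} \omega_\delta(N_n) = 0$ a.s.
Proof of 3.5.4: In the proof of 3.4.4 we showed that (3.6) holds for any fixed $a > 0$. We restate (3.6) in terms of the modulus of continuity:
\[ (3.6) \qquad \lim_{\delta \to 0} \limsup_{n\to\infty} \frac{1}{n} \log P(\omega_\delta(N_n) > a) = -\infty \]
This implies that for any fixed $a > 0$ and $M \in \mathbb{R}$
\[ \exists (\delta_0, n_0) : \forall (\delta < \delta_0,\ n \ge n_0) : \quad P(\omega_\delta(N_n) > a) \le e^{-nM}. \]
Choosing $M > 0$ we get $\sum_{n=1}^{\infty} P(\omega_\delta(N_n) > a) < \infty$, and by the Borel-Cantelli lemma $\omega_\delta(N_n) > a$ happens only for finitely many $n$ for almost any fixed path $N$ in the scaling $(N_n; n \in \mathbb{N})$. Thus $\limsup_{n\to\infty} \omega_\delta(N_n) \le a$ for $\delta < \delta_0$ a.s. Thus
\[ \lim_{\delta \to 0} \limsup_{n\to\infty} \omega_\delta(N_n) \le \sup_{\delta \le \delta_0} \limsup_{n\to\infty} \omega_\delta(N_n) \le a \]
Since $a$ was arbitrary the claim is proved.
3.5.4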
So we know that as $n \to \infty$ the counting process tends to a continuous function. We can look at interpolations $\hat N_n$ of $N_n$ and their distribution on $C([0, T], \mathbb{R})$ for finite $n$. We now know that any limiting distribution will be concentrated on $C([0, T], \mathbb{R})$.
3.6 Change of measure
We start with an intuitive way of defining a change of measure for the process $N$ over compact intervals and then rigorously prove that it is a mean one martingale and has the intuitively clear properties. Since delayed and undelayed counting processes are exponentially equivalent (as long as initial inter event times share the same Cesaro mean of hazard rates) we are free to choose an initial distribution, and we will work with the undelayed counting process throughout this section.
Fix a path of the counting process $N$ over $[0, t]$. The path is defined through the now fixed value at $t$ and the fixed inter event times $\tau_1, \dots, \tau_{N_t}$. The likelihood of the path under the inter event times density $f$ is
\[ f(\tau_1) \cdots f(\tau_{N_t})\, F^c(B(t)) \]
and the likelihood ratio for two different inter event times densities $f$ and $g$ (with distribution function $G$) is
\[ \frac{g(\tau_1) \cdots g(\tau_{N_t})\, G^c(B(t))}{f(\tau_1) \cdots f(\tau_{N_t})\, F^c(B(t))} = \prod_{k=1}^{N_t} \frac{g}{f}(\tau_k)\ \frac{G^c}{F^c}(B(t)) \]
In the case of $g = f_\beta$
\[ \prod_{k=1}^{N_t} \frac{f_\beta}{f}(\tau_k)\ \frac{F^c_\beta}{F^c}(B(t)) = \prod_{k=1}^{N_t} e^{\beta \tau_k - \Lambda(\beta)}\ \frac{F^c_\beta}{F^c}(B(t)) = e^{\beta (t - B(t)) - N_t \Lambda(\beta)}\ \frac{F^c_\beta}{F^c}(B(t)) \]
and this will be our density process.
3.6.1 Martingale property
We prove the martingale property directly, relying only on the renewal property of the counting process.
Claim 3.6.1. For $\beta \in \mathcal{D}(\Lambda)$ and any $T \in \mathbb{R}$ the process
\[ L(\beta, \cdot) : [0, T] \to [0, \infty), \quad t \mapsto e^{\beta (t - B(t)) - N_t \Lambda(\beta)}\ \frac{F^c_\beta}{F^c}(B(t)) \tag{3.10} \]
is a martingale with respect to the natural filtration of $N$.
Proof of 3.6.1: $L(\beta, t) < \infty$ always for unbounded $\tau$. If $\tau$ is bounded and has no point mass on its least upper bound, $b$ say, then $P(F^c(B(t)) = 0) = P(\tau = b) = 0$ and $L(\beta, t) < \infty$ a.s. Changing the density of $\tau$ to $f(b) = 0$ makes $L(\beta, t) < \infty$ always.
Measurability wrt the filtration generated by $N$ is immediate: observing $N$ up to time $t$ we know $N_t$ and $B(t)$, which $L(\beta, t)$ is a function of. We prove integrability in the following claim 3.6.2 by calculating the mean and then innovation in claim 3.6.3.
3.6.1
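Claim 3.6.2 (Integrability). $E[L(\beta, t)] = 1$ for all $\beta$ and $t \ge 0$.
Proof of 3.6.2: We need to know about the distribution of the age when $N_t = k$ is fixed. If $k = 0$ then $B(t) = t$ for sure. For specified $k \ge 1$ we have an explicit density for the age (cf 3.1.17)
\begin{align*}
E[L(\beta, t)]
&= \sum_{k=0}^{\infty} E\Big[ e^{\beta (t - B(t)) - N_t \Lambda(\beta)}\ \frac{F^c_\beta}{F^c}(B(t))\ \mathbf{1}_{N_t = k} \Big] \\
&= \frac{F^c_\beta}{F^c}(t)\, F^c(t) + \sum_{k=1}^{\infty} \int_{x=0}^{t} E\Big[ e^{\beta (t - B(t)) - k \Lambda(\beta)}\ \frac{F^c_\beta}{F^c}(B(t))\ \mathbf{1}_{N_t = k} \,\Big|\, B(t) = x \Big]\, F^c(x)\, f^{*k}(t - x)\, dx \\
&= F^c_\beta(t) + \sum_{k=1}^{\infty} \int_{x=0}^{t} \underbrace{e^{\beta (t - x) - k \Lambda(\beta)}\ \frac{F^c_\beta}{F^c}(x)\, F^c(x)\, f^{*k}(t - x)}_{= F^c_\beta(x)\, f^{*k}_\beta(t - x)}\, dx \\
&= F^c_\beta(t) + \sum_{k=1}^{\infty} F^c_\beta * F^{*k}_\beta(t) \\
&= P^{(\beta)}(N_t = 0) + \sum_{k=1}^{\infty} P^{(\beta)}(N_t = k) = 1
\end{align*}
3.6.2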
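The mean one property can be checked by hand in the simplest special case. The sketch below assumes exponential inter event times with rate `lam` (so $N$ is a Poisson process); this choice and all function names are illustrative assumptions, not taken from the text. In that case $\Lambda(\beta) = \log(\lambda / (\lambda - \beta))$ and $F^c_\beta(x)/F^c(x) = e^{\beta x}$, so the age cancels in (3.10) and $E[L(\beta, t)]$ reduces to a sum over the Poisson law of $N_t$.

```python
import math

def lmgf_exponential(beta, lam):
    """Lambda(beta) = log E[exp(beta * tau)] for tau ~ Exp(lam); needs beta < lam."""
    if beta >= lam:
        raise ValueError("beta must lie in D(Lambda) = (-inf, lam)")
    return math.log(lam / (lam - beta))

def density_process_exponential(beta, lam, t, n_events):
    """L(beta, t) for Exp(lam) inter event times.

    Here F_beta^c(x) / F^c(x) = exp(beta * x), so the age B(t) cancels and
    L(beta, t) = exp(beta * t - N_t * Lambda(beta)).
    """
    return math.exp(beta * t - n_events * lmgf_exponential(beta, lam))

def mean_density_process(beta, lam, t, kmax=400):
    """E[L(beta, t)], summed over the Poisson(lam * t) law of N_t (truncated at kmax)."""
    total = 0.0
    for k in range(kmax):
        # Poisson probabilities in log space to avoid overflow in k!.
        log_pk = -lam * t + k * math.log(lam * t) - math.lgamma(k + 1)
        total += math.exp(log_pk) * density_process_exponential(beta, lam, t, k)
    return total

print(mean_density_process(0.5, 2.0, 3.0))
print(mean_density_process(-1.5, 2.0, 3.0))
```

Both a positive and a negative twist parameter are tried, since the claim is stated for all $\beta$ in the domain of $\Lambda$.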
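Claim 3.6.3 (Innovation). $E[L(\beta, t) \mid (N_r; r \le s)] = L(\beta, s)$ for any $\beta \in \mathcal{D}(\Lambda)$ and all $t, s$ with $s \le t$.
Proof of 3.6.3: Let us first investigate the conditional expectation of the claim.
\begin{align*}
E[L(\beta, t) \mid (N_r; r \le s)]
&= E\Big[ \prod_{k=1}^{N_t} \exp\{\beta \tau_k - \Lambda(\beta)\}\ \frac{F^c_\beta}{F^c}(B(t)) \,\Big|\, (N_r; r \le s) \Big] \\
&= \prod_{k=1}^{N_s} \exp\{\beta \tau_k - \Lambda(\beta)\}\ E\Big[ \prod_{k=N_s+1}^{N_t} \exp\{\beta \tau_k - \Lambda(\beta)\}\ \frac{F^c_\beta}{F^c}(B(t)) \,\Big|\, (N_r; r \le s) \Big]
\end{align*}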
For the remaining conditional expectation we do not need the condition on the whole past of the process: all the information the integrand requires is in the state $N(s)$ at $s$ and the age $B(s)$ at $s$. Claim 3.6.3 is equivalent to the following
\[ E\Big[ \prod_{k=N_s+1}^{N_t} \exp\{\beta \tau_k - \Lambda(\beta)\}\ \frac{F^c_\beta}{F^c}(B_t) \,\Big|\, B_s, N_s \Big] = \frac{F^c_\beta}{F^c}(B_s) \tag{3.11} \]
for any $\beta \in \mathcal{D}(\Lambda)$ and all $t, s$ with $s \le t$. In (3.11) there is $\tau_{N(s)+1}$, the inter event time covering $s$, and we condition on $B(s)$, the age of the process at time $s$. We in fact condition on $\tau_{N(s)+1} > b$ for $b = B(s)$, which allows us to apply the distribution function $F^{+a}$ (cf 2.1.7) with $a = B(s)$ to $\tau_{N(s)+1}$. We denote the density of $F^{+B(s)}$ by $f^{+B(s)}$.
Now introduce the indicator $\mathbf{1}_{N_t = N_s + l}$ with $l \in \mathbb{N}$. For $l = 0$ we have an empty product in the following
\[ E\Big[ \prod_{k=N_s+1}^{N_t} \exp\{\beta \tau_k - \Lambda(\beta)\}\ \frac{F^c_\beta}{F^c}(B_t)\ \mathbf{1}_{N_t = N_s} \,\Big|\, B_s, N_s \Big] = \frac{F^c_\beta(t - s + B_s)}{F^c(B_s)} \tag{3.12} \]
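And we have applied $N_t = N_s \Rightarrow B_t = B_s + t - s$. Now for general $l \ge 1$.
\begin{align*}
& E\Big[ \prod_{k=N_s+1}^{N_t} \exp\{\beta \tau_k - \Lambda(\beta)\}\ \frac{F^c_\beta}{F^c}(B(t))\ \mathbf{1}_{N_t = N_s + l} \,\Big|\, B(s), N_s \Big] \\
&= E\Big[ e^{\beta \tau_{N_s+1} - \Lambda(\beta)} \prod_{k=N_s+2}^{N_t} \exp\{\beta \tau_k - \Lambda(\beta)\}\ \frac{F^c_\beta}{F^c}(B(t))\ \mathbf{1}_{N_t = N_s + l} \,\Big|\, B(s), N_s \Big] \tag{3.13}
\end{align*}
Now $\tau_{N_s+1} = B_s + C_s$ where $B_s$ is known. We solve for the unknown
\[ C_s = \tau_{N_s+1} - B_s \]
which has density $f^{+a}$ of definition 2.1.7 with $a = B_s$ (cf 3.1.14).
\begin{align*}
(3.13) &= \int_{x=0}^{\infty} E\Big[ e^{\beta (B_s + C_s) - \Lambda(\beta)} \prod_{k=N_s+2}^{N_t} \exp\{\beta \tau_k - \Lambda(\beta)\}\ \frac{F^c_\beta}{F^c}(B_t)\ \mathbf{1}_{N_t = N_s + l} \,\Big|\, B_s, N_s, C_s = x \Big]\, f^{+B_s}(x)\, dx \\
&= \int_{x=0}^{t-s} e^{\beta (B_s + x) - \Lambda(\beta)}\ E\Big[ \prod_{k=N_s+2}^{N_t} \exp\{\beta \tau_k - \Lambda(\beta)\}\ \frac{F^c_\beta}{F^c}(B_t)\ \mathbf{1}_{N_t = N_s + l} \,\Big|\, B_s, N_s, C_s = x \Big]\, f^{+B_s}(x)\, dx \tag{3.14}
\end{align*}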
In the remaining conditional expectation we have inter event times $\tau_{N_s+2}, \tau_{N_s+3}, \dots$ that are independent of $N_s$ and identically distributed to $\tau_1, \tau_2, \dots$. The age $B(t) = B(N, t)$ conditional on $N$ having an event at $s + x$ has the same distribution as the age of the renewal counting process $N^{\tau_{N(s)+2}}$ defined through its inter event times $\tau_{N(s)+2}, \tau_{N(s)+3}, \dots$ at time $t - (s + x)$:
\[ \mathcal{L}\big( B(N, t) \mid s, N_s, C_s = x \big) = \mathcal{L}\big( B(N^{\tau_{N_s+2}}, t - (s + x)) \mid s, N_s \big) \]
And from the just mentioned independence
\[ \mathcal{L}\big( B(N^{\tau_{N_s+2}}, t - (s + x)) \mid s, N_s \big) = \mathcal{L}\big( B(N^{\tau_1}, t - (s + x)) \big) \]
The event $N_t = N_s + l$ translates into $1 + N^{\tau_{N_s+2}}(t - (s + x)) = l$ or equivalently $N^{\tau_1}(t - (s + x)) = l - 1$. In the product we had $k = N_s + 2, \dots, N_t$ for $N_t = N_s + l$, so we did take the product over all $l - 1$ inter event times of $N$ within $[s + C(s), t]$. These inter event times now become $\tau_1, \dots, \tau_{l-1}$ or $\tau_1, \dots, \tau_{N^{\tau_1}(t - (s + x))}$.
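We summarise all just mentioned changes:
\[ (3.14) = \int_{x=0}^{t-s} e^{\beta (B(s) + x) - \Lambda(\beta)}\ E\Big[ \prod_{k=1}^{N^{\tau_1}(t - (s+x))} \exp\{\beta \tau_k - \Lambda(\beta)\}\ \frac{F^c_\beta}{F^c}\big(B(N^{\tau_1}, t - (s+x))\big)\ \mathbf{1}_{N^{\tau_1}(t - (s+x)) = l - 1} \Big]\, f^{+B(s)}(x)\, dx \]
and summing expressions (3.13) over $l \ge 1$
\[ \sum_{l=1}^{\infty} (3.13) = \int_{x=0}^{t-s} e^{\beta (B(s) + x) - \Lambda(\beta)}\ E\Big[ \prod_{k=1}^{N^{\tau_1}(t - (s+x))} \exp\{\beta \tau_k - \Lambda(\beta)\}\ \frac{F^c_\beta}{F^c}\big(B(N^{\tau_1}, t - (s+x))\big) \underbrace{\sum_{l=1}^{\infty} \mathbf{1}_{N^{\tau_1}(t - (s+x)) = l - 1}}_{= 1} \Big]\, f^{+B(s)}(x)\, dx \]
and applying 3.6.2 to $N^{\tau_1}$ at $t - (s + x)$
\[ \int_{x=0}^{t-s} e^{\beta (B(s) + x) - \Lambda(\beta)}\ \underbrace{E\Big[ \prod_{k=1}^{N^{\tau_1}(t - (s+x))} \exp\{\beta \tau_k - \Lambda(\beta)\}\ \frac{F^c_\beta}{F^c}\big(B(N^{\tau_1}, t - (s+x))\big) \Big]}_{= 1}\, f^{+B(s)}(x)\, dx \]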
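we can simplify, using $f^{+B(s)}(x) = \frac{f(x + B(s))}{F^c(B(s))}$,
\begin{align*}
\sum_{l=1}^{\infty} (3.13)
&= \int_{x=0}^{t-s} e^{\beta (B(s) + x) - \Lambda(\beta)}\, f^{+B(s)}(x)\, dx \\
&= \frac{1}{F^c(B(s))} \int_{x=0}^{t-s} e^{\beta (B(s) + x) - \Lambda(\beta)}\, f(x + B(s))\, dx \\
&= \frac{1}{F^c(B(s))} \int_{x=B(s)}^{t-s+B(s)} f_\beta(x)\, dx \\
&= \frac{1}{F^c(B(s))} \big( F_\beta(t - s + B(s)) - F_\beta(B(s)) \big) \\
&= \frac{1}{F^c(B(s))} \big( -F^c_\beta(t - s + B(s)) + F^c_\beta(B(s)) \big) \\
&= -\frac{F^c_\beta(t - s + B(s))}{F^c(B(s))} + \frac{F^c_\beta(B(s))}{F^c(B(s))}
\end{align*}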
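which finally results in
\begin{align*}
& E\Big[ \prod_{k=N_s+1}^{N_t} \exp\{\beta \tau_k - \Lambda(\beta)\}\ \frac{F^c_\beta}{F^c}(B(t)) \,\Big|\, B(s), N_s \Big] \\
&= \sum_{l=0}^{\infty} E\Big[ \prod_{k=N_s+1}^{N_t} \exp\{\beta \tau_k - \Lambda(\beta)\}\ \frac{F^c_\beta}{F^c}(B(t))\ \mathbf{1}_{N_t = N_s + l} \,\Big|\, B(s), N_s \Big] \\
&= (3.12) + \sum_{l=1}^{\infty} (3.13) \\
&= \frac{F^c_\beta(t - s + B(s))}{F^c(B(s))} - \frac{F^c_\beta(t - s + B(s))}{F^c(B(s))} + \frac{F^c_\beta(B(s))}{F^c(B(s))} \\
&= \frac{F^c_\beta(B(s))}{F^c(B(s))}
\end{align*}
and we got (3.11).
3.6.3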
For an alternative proof cf. [4] proposition 13.3.V on p. 535.
3.6.2 The twisted distribution
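We rearrange the non-negative mean one martingale $L(\beta, \cdot)$ a little and change the parameter from $\beta$ to $\gamma$ such that $\beta = -\Gamma(\gamma)$. Since $\mathcal{D}(\Gamma) = \mathbb{R}$ and $\Lambda(\mathcal{D}(\Lambda)) = \mathbb{R}$ we can start with any $\gamma$ and find the matching $\beta$. The following is well defined.
Definition 3.6.4. $M(\gamma, t) := L(-\Gamma(\gamma), t)$ for $\gamma \in \mathbb{R}$, $t \in \mathbb{R}_{\ge 0}$.
We get a simplification in $L(\beta, t)$ from $\Lambda(-\Gamma(\gamma)) = \Lambda(\Lambda^{-1}(-\gamma)) = -\gamma$:
\begin{align*}
L(-\Gamma(\gamma), t)
&\overset{(3.10)}{=} e^{-\Gamma(\gamma) (t - B(t)) - N_t \Lambda(-\Gamma(\gamma))}\ \frac{F^c_{-\Gamma(\gamma)}}{F^c}(B(t)) \\
&= e^{\gamma N_t - t \Gamma(\gamma)}\ \frac{F^c_{-\Gamma(\gamma)}}{F^c}(B(t))\ e^{\Gamma(\gamma) B(t)}
\end{align*}
and will write
\[ M(\gamma, t) = \exp\{\gamma N_t - t \Gamma(\gamma)\}\ r(-\Gamma(\gamma), t) \tag{3.15} \]
with
Definition 3.6.5. For $t \ge 0$ and $\beta$ such that $F_\beta$ is well defined (cf 2.3.1) set
\[ r(\beta, t) := \frac{F^c_\beta}{F^c}(B(t))\ e^{-\beta B(t)}. \]
Note that $r$ is measurable by continuity of $(\beta, x) \mapsto \frac{F^c_\beta}{F^c}(x)\, e^{-\beta x}$ and measurability of $t \mapsto B(t)$.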
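Claim 3.6.6. The one dimensional distributions of the counting process under the change of measure $M(\gamma, \cdot)$ coincide with the one dimensional distributions of a renewal counting process with inter event times densities $f_\beta$ for $\beta = -\Gamma(\gamma)$.
Proof of 3.6.6: It holds for $k = 0$ and arbitrary $t \ge 0$:
\[ P^{[\gamma]}(N_t = 0) = E\Big[ \mathbf{1}_{N_t = 0}\ e^{\gamma \cdot 0 - (t - t) \Gamma(\gamma)}\ \frac{F^c_\beta}{F^c}(t) \Big] = E[\mathbf{1}_{N_t = 0}]\ \frac{F^c_\beta}{F^c}(t) = F^c_\beta(t) \]
and for $k \ge 1$ and $t > 0$, applying $\beta = -\Gamma(\gamma) \Leftrightarrow \Lambda(\beta) = -\gamma$:
\begin{align*}
P^{[\gamma]}(N_t = k)
&= E\Big[ \mathbf{1}_{N_t = k}\ e^{\gamma N_t - (t - B(t)) \Gamma(\gamma)}\ \frac{F^c_\beta}{F^c}(B(t)) \Big] \\
&\overset{3.3}{=} \int_{s=0}^{t} E\Big[ \mathbf{1}_{N_t = k}\ e^{\gamma N_t - (t - B(t)) \Gamma(\gamma)}\ \frac{F^c_\beta}{F^c}(B(t)) \,\Big|\, B(t) = s, N_t = k \Big]\, F^c(s)\, f^{*k}(t - s)\, ds \\
&= \int_{s=0}^{t} e^{\gamma k - (t - s) \Gamma(\gamma)}\ \frac{F^c_\beta}{F^c}(s)\, F^c(s)\, f^{*k}(t - s)\, ds \\
&= \int_{s=0}^{t} \underbrace{e^{\gamma k - (t - s) \Gamma(\gamma)}\, f^{*k}(t - s)}_{= f^{*k}_\beta(t - s)}\ F^c_\beta(s)\, ds \\
&= F^c_\beta * F^{*k}_\beta(t)
\end{align*}
3.6.6
Here we did not fix the order of convoluting and exponentially twisting, due to 3.1.10.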
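Claim 3.6.7. The two dimensional distributions of the counting process under the change of measure $M(\gamma, \cdot)$ coincide with the two dimensional distributions of a renewal counting process with inter event times densities $f_\beta$ for $\beta = -\Gamma(\gamma)$.
Proof of 3.6.7: First the untwisted counting process. We apply the joint distribution of state and residual lifetime introduced in 3.1.16 for $k \ge 0$, $l \ge 1$. Remember that $F^{*0} = \mathbf{1}_{(0, \infty)}$.
\begin{align*}
P(N_{t_1} = k, N_{t_2} = k + l)
&= \int_{r=0}^{t_2 - t_1} P(N_{t_1} = k, N_{t_2} = k + l \mid N_{t_1} = k, C(t_1) = r) \int_{p=0}^{t_1} f(t_1 + r - p)\, f^{*k}(p)\, dp\, dr \\
&= \int_{r=0}^{t_2 - t_1} \underbrace{P(N_{t_2} = k + l \mid N_{t_1} = k, C(t_1) = r)}_{= P(N'_{t_2 - t_1 - r} = l - 1)} \int_{p=0}^{t_1} f(t_1 + r - p)\, f^{*k}(p)\, dp\, dr
\end{align*}
with $N'$ counting increments of $N$ after $t_1 + C(t_1)$ and $N'$ an undelayed renewal counting process (cf section 3.2). Now for $k \ge 0$, $l = 0$.
\begin{align*}
P(N_{t_1} = k, N_{t_2} = k) &= P(N_{t_1} = k, C(t_1) > t_2 - t_1) \\
&= \int_{r=0}^{t_1} F^c(t_1 + (t_2 - t_1) - r)\, dF^{*k}(r) \\
&= \int_{r=0}^{t_1} F^c(t_2 - r)\, dF^{*k}(r)
\end{align*}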
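Now we investigate the twisted process. $k \ge 0$, $l \ge 1$
\begin{align*}
& P^{[\gamma]}(N_{t_1} = k, N_{t_2} = k + l) \\
&= \int_{r=0}^{t_2 - t_1} \int_{s=0}^{t_1} E\Big[ \mathbf{1}_{N_{t_1} = k}\, \mathbf{1}_{N_{t_2} = k + l}\ e^{\gamma N_{t_2} - (t_2 - B(t_2)) \Gamma(\gamma)}\ \frac{F^c_\beta}{F^c}(B(t_2)) \,\Big|\, N_{t_1} = k, B(t_1) = s, C(t_1) = r \Big]\, f(t_1 + r - s)\, f^{*k}(s)\, ds\, dr \tag{3.16}
\end{align*}
Let again $N'$ count increments of $N$ after $t_1 + C(t_1)$. Then the notation changes:
\begin{align*}
N_{t_2} - N_{t_1} \mid C(t_1) = r \ &\sim\ 1 + N'_{t_2 - t_1 - r} \\
N_{t_2} = N_{t_1} + N_{t_2} - N_{t_1} &= N_{t_1} + 1 + N'_{t_2 - t_1 - r} \\
B(N, t_2) \ &\sim\ B(N', t_2 - t_1 - r) \\
t_2 - B(t_2) &= t_1 + r - s + t_2 - t_1 - r - B(N, t_2) + s \\
&= t_1 + r - s + t_2 - t_1 - r - B(N', t_2 - t_1 - r) + s
\end{align*}
Continue
\begin{align*}
(3.16)
&= \int_{r=0}^{t_2 - t_1} \int_{s=0}^{t_1} E\Big[ \mathbf{1}_{N'_{t_2 - t_1 - r} = l - 1}\ e^{\gamma N'_{t_2 - t_1 - r} - (t_2 - t_1 - r - B(N', t_2 - t_1 - r)) \Gamma(\gamma)}\ \frac{F^c_\beta}{F^c}\big(B(N', t_2 - t_1 - r)\big) \,\Big|\, N_{t_1} = k, B(t_1) = s, C(t_1) = r \Big] \\
&\qquad\qquad \cdot\ e^{(k + 1) \gamma - (t_1 + r) \Gamma(\gamma)}\, f(t_1 + r - s)\, f^{*k}(s)\, ds\, dr \\
&= \int_{r=0}^{t_2 - t_1} \int_{s=0}^{t_1} E^{[\gamma]}\big[ \mathbf{1}_{N'_{t_2 - t_1 - r} = l - 1} \big]\, f_\beta(t_1 + r - s)\, f^{*k}_\beta(s)\, ds\, dr
\end{align*}
to get the twisted analogue of the untwisted finite dimensional distribution of $t_1, t_2$. The case of $l = 0$ for the twisted process is omitted.
3.6.7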
Iterating this, all finite dimensional distributions of the counting process under the change of measure $M(\gamma, \cdot)$ are those of a renewal counting process with inter event time densities $f_\beta$ (for $\beta = -\Gamma(\gamma)$):
Conclusion 3.6.8. Let $N$ be a renewal counting process with typical inter event time density $f$ and lmgf $\Lambda$. Let $t \mapsto M(\gamma, t)$ be the change of measure process defined in 3.6.4. Under this change of measure the counting process remains renewal and inter event times now have density $f_\beta$ with $\beta = -\Gamma(\gamma) = \Lambda^{-1}(-\gamma)$.
We now investigate the change of measure process further.
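Lemma 3.6.9. Let $\tau$ be an inter event time with density $f$ and distribution function $F$ and $\beta \in \mathcal{D}(\Lambda)$. Then $\inf_{x : F(x) < 1} \frac{F^c_\beta}{F^c}(x)\, e^{-\beta x} > 0$.
Proof of 3.6.9: The function $x \mapsto \frac{F^c_\beta}{F^c}(x)\, e^{-\beta x}$ is continuous on $\{x \mid F(x) < 1\}$ from the existence of the density $f$ for $F$. If $\tau$ is unbounded the infimum is over $\mathbb{R}_{\ge 0}$; otherwise, if $\tau \in (0, b)$ say, it is over some bounded interval. The function is positive for each fixed $x$ and thus has a positive minimum over any compact interval within $\{x \mid F(x) < 1\}$. We still need a positive liminf as $x \to \infty$ or $x \to b$. We do a calculation for $\tau$ unbounded, but the same holds for bounded $\tau$.
\begin{align*}
\frac{F^c_\beta}{F^c}(x)\, e^{-\beta x}
&= \int_{s=x}^{\infty} f_\beta(s)\, ds\ \frac{1}{F^c(x)}\, e^{-\beta x} \\
&= \int_{s=x}^{\infty} e^{\beta s - \Lambda(\beta)}\ \frac{f(s)}{F^c(x)}\, ds\ e^{-\beta x} \\
&= \int_{s=x}^{\infty} e^{\beta s}\ \frac{f(s)}{F^c(x)}\, ds\ e^{-\Lambda(\beta) - \beta x} \\
&= E[e^{\beta (\tau - x)} \mid \tau > x]\ e^{-\Lambda(\beta)}
\end{align*}
Thus for $\beta > 0$
\[ \frac{F^c_\beta}{F^c}(x)\, e^{-\beta x} = E[e^{\beta (\tau - x)} \mid \tau > x]\ e^{-\Lambda(\beta)} > e^{-\Lambda(\beta)} > 0 \]
(the $e^{-\Lambda(\beta)} > 0$ requires $\beta \in \mathcal{D}(\Lambda)$). If on the other hand $\beta < 0$, by an application of Jensen's inequality
\[ \frac{F^c_\beta}{F^c}(x)\, e^{-\beta x} = E[e^{\beta (\tau - x)} \mid \tau > x]\ e^{-\Lambda(\beta)} \ge e^{\beta E[\tau - x \mid \tau > x]}\ e^{-\Lambda(\beta)} \]
and
\begin{align*}
\epsilon > \frac{F^c_\beta}{F^c}(x)\, e^{-\beta x}
&\Rightarrow \epsilon > e^{\beta E[\tau - x \mid \tau > x]}\ e^{-\Lambda(\beta)} \\
&\Leftrightarrow \frac{1}{\beta} \log\big( \epsilon\, e^{\Lambda(\beta)} \big) < E[\tau - x \mid \tau > x]
\end{align*}
and the last inequality holds for arbitrarily small $\epsilon$ only if $\limsup_{x \to \infty} E[\tau - x \mid \tau > x] = \infty$. We have assumed in 2.2.13 that this does not happen.
3.6.9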
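We can often do without the cited assumption 2.2.13 but then have to consider different cases:
For bounded $\tau$ and $\beta < 0$ we could have argued directly:
\[ \tau > x \ \Rightarrow\ \tau \in (x, b) \ \overset{\beta < 0}{\Rightarrow}\ \beta (\tau - x) > \beta (b - x) \]
implying $E[e^{\beta (\tau - x)} \mid \tau > x] > e^{\beta (b - x)} > e^{\beta b} > 0$, which makes $\frac{F^c_\beta}{F^c}(x)\, e^{-\beta x}$ strictly positive for all $x \in (0, b)$.
If for unbounded $\tau$ the limit $\lim_{x \to \infty} h(x)$ exists in $(0, \infty]$ then $\lim_{x \to \infty} E[\tau - x \mid \tau > x] = \frac{1}{\lim_{x \to \infty} h(x)} < \infty$. (cf (7.4) of the appendix with $E[\tau^{+x}]$ the expectation under the distribution $F^{+x}$ (cf definition 2.1.7), so $E[\tau^{+x}] = E[\tau - x \mid \tau > x]$)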
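Lemma 3.6.10. Let $\tau$ be an inter event time with density $f$ and distribution function $F$ and $\gamma \in \mathcal{D}(\Lambda)$. Then $\sup_{x : F(x) < 1} \frac{F^c_\gamma}{F^c}(x)\, e^{-\gamma x} < \infty$.
Proof of 3.6.10: We have seen that $\inf_x \frac{F^c_\beta}{F^c}(x)\, e^{-\beta x} > 0$ for any $\beta \in \mathcal{D}(\Lambda)$.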
Since 0 D(Λ) we also have α D(Λ) + α=Dα) for any αR(cf 2.5).
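For $\alpha \in \mathcal{D}(\Lambda)$ we apply claim 3.6.9 to the inter event time with distribution function $F_\alpha$ (see footnote 1) (and parameter $-\alpha$ in the place of $\beta$):
\begin{align*}
0 < \inf_{x : F_\alpha(x) < 1} \frac{(F_\alpha)^c_{-\alpha}}{F^c_\alpha}(x)\, e^{\alpha x}
&= \inf_{x : F(x) < 1} \frac{F^c}{F^c_\alpha}(x)\, e^{\alpha x} \\
&= \frac{1}{\sup_{x : F(x) < 1} \frac{F^c_\alpha}{F^c}(x)\, e^{-\alpha x}}
\end{align*}
(With $\{x \mid F_\alpha(x) < 1\} = \{x \mid F(x) < 1\}$ from equivalence of measures / distribution functions under the exponential twist; for $(F_\alpha)_{-\alpha} = F$ cf lemma 2.3.3.) So we got the claim with $\gamma = \alpha$ and our only requirement was $\alpha \in \mathcal{D}(\Lambda)$.
3.6.10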
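Claim 3.6.11. $t \mapsto r(\beta, t)$ defined in 3.6.5 is bounded and strictly positive.
Proof of 3.6.11: For unbounded $\tau$, using $B(t) \in [0, t] \subset \mathbb{R}_{\ge 0}$,
\begin{align*}
\inf_{t \in \mathbb{R}_{\ge 0}} r(\beta, t) &= \inf_{t \in \mathbb{R}_{\ge 0}} \frac{F^c_\beta}{F^c}(B(t))\ e^{-\beta B(t)} \ \ge\ \inf_{x \in \mathbb{R}_{\ge 0}} \frac{F^c_\beta}{F^c}(x)\, e^{-\beta x} > 0 \\
\sup_{t \in \mathbb{R}_{\ge 0}} r(\beta, t) &= \sup_{t \in \mathbb{R}_{\ge 0}} \frac{F^c_\beta}{F^c}(B(t))\ e^{-\beta B(t)} \ \le\ \sup_{x \in \mathbb{R}_{\ge 0}} \frac{F^c_\beta}{F^c}(x)\, e^{-\beta x} < \infty
\end{align*}
while for bounded $\tau \in (0, b)$ we have $B(t) \in (0, b)$ and we apply the infimum and supremum over $(0, b) = \{x : F(x) < 1\}$.
Positivity of the infimum was proved in 3.6.9 and finiteness of the supremum in 3.6.10.
3.6.11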
We have now introduced a change of measure $M(\gamma, \cdot)$ for the renewal counting process and have proved how it affects the inter event times of the counting process. In future notation we will distinguish exponentially twisting inter event times with some parameter $\beta$ in parentheses: $E^{(\beta)}[\tau]$. If we explicitly refer to a counting process we write $E^{(\beta)}[e^{\theta N_t}]$ to express the twist of its inter event times with parameter $\beta$; or we might write $E^{[\gamma]}[e^{\theta N_t}]$ to denote the twisting of the counting process with the change of measure process $M(\gamma, \cdot)$. Both are the same as soon as $\beta = -\Gamma(\gamma)$.
Furthermore about notation: under the exponentially transformed inter event densities with parameter $\beta$ the mean changes from $E[\tau] = \Lambda'(0)$ to $E^{(\beta)}[\tau] = \Lambda'(\beta)$ (cf proof of 2.2.8 and 2.3.4). This corresponds to a change of the renewal point process' rate from $\lambda = \frac{1}{E[\tau]}$ to $\frac{1}{E^{(\beta)}[\tau]}$. The following is a translation of this fact into the changed parameters ($\beta = -\Gamma(\gamma)$) and more in terms of the counting process.
Footnote 1: $F_\alpha$ is an exponential transform as defined in 2.1.2 and not of the kind of definition 2.1.7. If it was it would have to be $F^{+(\alpha)}$ for $\alpha \ge 0$.
Definition 3.6.12. The rate of the point process $N$ under the twist $M(\gamma, \cdot)$ is denoted $\lambda(\gamma)$ and is defined as $\lambda(\gamma) := \Gamma'(\gamma)$.
Alternatively we could have defined $\lambda(\gamma) = \frac{1}{E^{(-\Gamma(\gamma))}[\tau]}$.
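The relation between $\Lambda$, $\Gamma$ and the twisted rate can be made concrete in the Poisson case. The following sketch assumes exponential inter event times with rate `lam`, for which $\Lambda(\beta) = \log(\lambda/(\lambda - \beta))$ and the twisted density $f_\beta$ is again exponential with rate $\lambda - \beta$; the function names and the bisection-based inversion are illustrative assumptions. It computes $\Gamma(\gamma) = -\Lambda^{-1}(-\gamma)$ numerically and compares $\Gamma'(\gamma)$ (by a central difference) with $1 / E^{(\beta)}[\tau]$ for $\beta = -\Gamma(\gamma)$.

```python
import math

def lmgf_exponential(beta, lam):
    """Lambda(beta) for tau ~ Exp(lam); defined for beta < lam."""
    return math.log(lam / (lam - beta))

def counting_lmgf(gamma, lam, iters=200):
    """Gamma(gamma) = -Lambda^{-1}(-gamma), with the inverse found by bisection."""
    target = -gamma
    lo = -1.0
    while lmgf_exponential(lo, lam) > target:   # Lambda is increasing in beta
        lo *= 2.0
    hi = lam * (1.0 - 1e-15)                    # just inside D(Lambda)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if lmgf_exponential(mid, lam) < target:
            lo = mid
        else:
            hi = mid
    return -0.5 * (lo + hi)

def twisted_mean_inter_event(beta, lam):
    """E^(beta)[tau] = Lambda'(beta) = 1 / (lam - beta) for Exp(lam)."""
    return 1.0 / (lam - beta)

lam, gamma = 2.0, 0.7
beta = -counting_lmgf(gamma, lam)
h = 1e-5
rate_from_derivative = (counting_lmgf(gamma + h, lam) - counting_lmgf(gamma - h, lam)) / (2 * h)
rate_from_twist = 1.0 / twisted_mean_inter_event(beta, lam)
print(rate_from_derivative, rate_from_twist)
```

In this special case $\Gamma(\gamma) = \lambda (e^{\gamma} - 1)$ in closed form, so both printed numbers should agree with $\lambda e^{\gamma}$ up to numerical error.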
To conclude: We have developed our basic tools for working with renewal counting processes. This allows us to make up for the lost Markov property. We have
• in terms of large deviations identified the undelayed renewal counting process, the renewal counting process with stationary increments, and the counting process with independent increments over disjoint intervals. For the Poisson process all these three properties hold generally and truly, not only in terms of large deviations.
• defined an exponential change of measure for the undelayed renewal counting process such that under the changed distribution the process remains renewal. When changing a Markovian jump process this holds true immediately.
Chapter 4
Large deviations of the ren.
counting process
In this chapter we develop a sample path large deviation principle for the
renewal counting process and we get a rate function in integral form with
the integrand, the so-called local rate function, the Fenchel-Legendre trans-
form of the lmgf of the counting process. The integral form fits the claimed
closeness to the Markov property while the local rate function reflects the
generalised distribution of inter event times.
The importance of this chapter is twofold: On the one hand the sample
path large deviations for a one dimensional counting process are much sim-
pler than the sample path large deviations for a higher dimensional process
with non independent coordinates and discontinuous statistics. A stochastic
process describing a stochastic network will have such undesirable properties.
So aiming at the large deviations for a network process we want to develop
and test our tools by proving the large deviations of the counting process.
On the other hand we will later directly apply the sample path large devi-
ations of the arrival and service process to obtain local large deviations for
the network. Thus it is also a matter of completeness that we include the
sample path large deviations for the counting process.
The large deviation principle for the counting process is not a new result:
it was proved in 1997 by Anatolii Puhalskii and Ward Whitt in the space
D([0,),R). Under our assumption 2.2.2 they equip the space with the
Skorohod J1-topology [17], theorem 6.1. They use weak convergence analogs
in large deviations and apply an extended contraction principle. It was not
immediately clear to us how to apply their techniques in the setting of a
stochastic network.
There may be another interesting point in this chapter: The sample path
large deviations of the partial sums process can be obtained for LD-bounded
iid summands by Mogulskii’s theorem, 5.1.2 in [5]. It may be intuitive that
the partial sums process of inter event times contains the same information
as the counting process constructed from these inter event times and that
rare events of one process can be translated into rare events of the other.
In recent work Raymond Russel [22] and Mark Rodgers-Lee [20] prove how
to obtain a large deviation principle for the counting process from the large
deviation principle of the partial sums process and vice versa. This would
be one approach to develop the large deviations for the counting process of
LD-bounded inter event times.
However, one would still have to develop the large deviations for partial sums
processes with not LD-bounded summands for which the Mogulskii theorem
does not hold - for example for exponentially distributed summands. In this
case, one would rather start from the counting process-side: Sample path
large deviations for the Poisson process and other more general Markovian
jump processes have been developed in the book of Shwartz and Weiss [23].
We decided to directly develop the sample path large deviations for the count-
ing process and will do so for LD-bounded and not LD-bounded inter event
times at the same time. One could then like to transfer the large deviation
principle for the renewal counting process with not LD-bounded inter event
times back to the partial sums process.
Our development of the sample path large deviations for the renewal counting process starts with local large deviations: we calculate exponential decay rates on sup-norm balls with diminishing radii around piecewise linear functions by applying an exponential change of measure. Here we need to find the change of measure that makes the deviating event become the expected behaviour. We get a weak large deviation principle and identify the integral form for the rate function. Then we strengthen the weak to a full large deviation principle by exponential tightness.
Apart from the existence of a large deviation principle and the explicit form
of the rate function this chapter contributes in allowing standard large devi-
ation interpretation of rare events: that the rare event happens like a regular
(common, non-rare) event under a different distribution, the twisted distri-
bution. And this allows for standard applications as in fast simulation.
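As an illustration of the fast simulation idea, the following sketch estimates a rare tail probability by sampling under a twisted distribution and reweighting with the inverse density process. It assumes a Poisson process (exponential inter event times with rate `lam`), for which the factor $r$ equals 1 and the twisted process is again Poisson with the target rate `v`; all names, parameters and the sample size are illustrative assumptions.

```python
import math
import random

def poisson_count(rate, horizon, rng):
    """Number of events in [0, horizon] for Exp(rate) inter event times."""
    count, t = 0, rng.expovariate(rate)
    while t <= horizon:
        count += 1
        t += rng.expovariate(rate)
    return count

def tail_by_twisting(lam, v, n, samples, rng):
    """Estimate P(N(n) >= n v) for a rate-lam Poisson process, v > lam."""
    alpha = math.log(v / lam)                    # twist with Gamma'(alpha) = v
    gamma_alpha = lam * (math.exp(alpha) - 1.0)  # Gamma(alpha) for the Poisson process
    total = 0.0
    for _ in range(samples):
        count = poisson_count(v, n, rng)         # under the twist the rate is v
        if count >= n * v:
            # weight = 1 / M(alpha, n) = exp(-alpha N + n Gamma(alpha)), since r = 1 here
            total += math.exp(-alpha * count + n * gamma_alpha)
    return total / samples

def exact_tail(lam, v, n, kmax=2000):
    """Reference value: Poisson(lam * n) tail, summed in log space."""
    mean = lam * n
    start = math.ceil(n * v)
    return sum(math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))
               for k in range(start, kmax))

rng = random.Random(1)
print(tail_by_twisting(2.0, 3.0, 20, 20000, rng), exact_tail(2.0, 3.0, 20))
```

Under the twisted law the event is typical, so most samples contribute to the estimate; a direct simulation under the original law would see the event only rarely.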
4.1 The space
We have introduced the counting process as a stochastic process with piecewise constant paths. Its paths are elements of $D([0, T], \mathbb{Z})$. As we interpolate, the paths of $\hat N$ become elements of $C([0, T], \mathbb{R}) \subset D([0, T], \mathbb{R})$. By exponential equivalence of $N$ and $\hat N$ we need not distinguish between $N$ and $\hat N$ in terms of large deviations when working in the sup-norm induced topology, cf 3.4.3.
As before we choose the process that is most convenient to work with. In this section, in terms of large deviation theorems it will be the interpolated process $\hat N$ living in the continuous functions $C([0, T], \mathbb{R})$, while for direct calculations of exponentially scaled limits of probabilities it will be the non-interpolated undelayed process $N$ with realisations in $D([0, T], \mathbb{R})$.
The choice of $C([0, T], \mathbb{R})$ equipped with the sup-norm $\|\cdot\|$ may seem natural in that we have already seen that limiting distributions of scaled counting processes will be in the continuous functions, cf 3.5.2. The sup-norm induces a metric on $C([0, T], \mathbb{R})$ and on $D([0, T], \mathbb{R})$ and thus a topology on these sets of functions. We use the same notation for sup-norm balls in $C([0, T], \mathbb{R})$ as on $D([0, T], \mathbb{R})$:
Definition 4.1.1. For $\psi \in C([0, T], \mathbb{R})$, $\epsilon > 0$
\[ U_\epsilon(\psi) = \{ f : \|f - \psi\| < \epsilon \} \]
where $U_\epsilon(\psi)$ is understood to be a subset of $D([0, T], \mathbb{R})$ or $C([0, T], \mathbb{R})$.
Remark 4.1.2. The space $(C([0, T], \mathbb{R}), \|\cdot\|)$ of continuous functions over the compact interval $[0, T]$ with the sup-norm is a complete separable metric space.
While the scaled process $N_n$ (cf definition 3.4.1) is a function on $[0, T]$ it contains information about $N$ over $[0, nT]$.
4.1.1 A base of the topology
In this section we give a base of the sup-norm induced topology of $C([0, T], \mathbb{R})$, the space of continuous functions. It will consist of sup-norm balls around piecewise linear functions.
Definition 4.1.3 (Piecewise linear functions). For $J \in \mathbb{N}$ set
\[ \mathcal{P}_J = \{ f \in C([0, T], \mathbb{R}) \mid f \text{ linear on } [k\, T\, 2^{-J}, (k + 1)\, T\, 2^{-J}] \text{ for } k = 0, \dots, 2^J - 1 \}. \]
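Claim 4.1.4.
\[ \mathcal{J} = \{ U_\epsilon(\psi) \mid \epsilon > 0,\ J \in \mathbb{N},\ \psi \in \mathcal{P}_J \} \]
is a base of the sup-norm induced topology of the real valued continuous functions $C([0, T], \mathbb{R})$.
Proof of 4.1.4: We argue with [21] (chapter 8, section 2, proposition 3, p. 146):
• The base elements cover the set of continuous functions: Let $f$ be a continuous function. Then there is an element of $\mathcal{J}$ that contains $f$: there is for any $\epsilon > 0$ a $J \in \mathbb{N}$ and a piecewise linear function $\psi \in \mathcal{P}_J$ such that $\|f - \psi\| < \epsilon$.
• Let $B_1$ and $B_2$ be in $\mathcal{J}$. There is for any $g \in B_1 \cap B_2$ a $B_3 = B_3(g) \in \mathcal{J}$ such that $g \in B_3$ and $B_3 \subseteq B_1 \cap B_2$.
Nothing to say about the first bullet. About the second: Let $B_1, B_2 \in \mathcal{J}$ be such that they have a non-empty intersection. Let $\psi_i$ be the centre of $B_i$ for $i = 1, 2$ and $\epsilon_i$ be such that $B_i = U_{\epsilon_i}(\psi_i)$ and $J_i$ such that $\psi_i \in \mathcal{P}_{J_i}$. Set $J := \max\{J_1, J_2\}$, then $\psi_i \in \mathcal{P}_J$ for $i = 1, 2$.
Let $g$ be some continuous function (not necessarily piecewise linear!) in $B_1 \cap B_2$. On any of the $2^J$ intervals of $[0, T]$ this $g$ has a positive distance to the boundary of the intersection $B_1 \cap B_2$ restricted to $[T k 2^{-J}, T (k + 1) 2^{-J}]$.
In terms of pointwise restrictions any function in the intersection $B_1 \cap B_2$ maps $x \in [0, T]$ to $y$ with the following properties.
\begin{align*}
x &\in [T k 2^{-J}, T (k + 1) 2^{-J}] \tag{4.1} \\
y &\in \big( \max\{\psi_1(x) - \epsilon_1, \psi_2(x) - \epsilon_2\},\ \min\{\psi_1(x) + \epsilon_1, \psi_2(x) + \epsilon_2\} \big)
\end{align*}
and for fixed $g$ define the strictly positive, continuous functions $h, i$ as
\begin{align*}
h(x) &:= g(x) - \max\{\psi_1(x) - \epsilon_1, \psi_2(x) - \epsilon_2\} \\
i(x) &:= \min\{\psi_1(x) + \epsilon_1, \psi_2(x) + \epsilon_2\} - g(x)
\end{align*}
for $x \in [0, T]$. Each has a strictly positive minimum over $[0, T]$ and we can define $0 < \epsilon := \min_{x \in [0, T]} \{\min\{h(x), i(x)\}\}$, and uniformly on $[0, T]$ the distance of $g(x)$ to the boundary of $B_1 \cap B_2$ is at least $\epsilon$. Now there is $K \in \mathbb{N}$ and a piecewise linear $\varphi \in \mathcal{P}_K$ that is $\frac{\epsilon}{4}$-close to $g$ in the sup-norm. The neighbourhood $U_{\epsilon/2}(\varphi)$ lies within $B_1 \cap B_2$ and contains $g$.
4.1.4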
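The approximation step used in both bullets, that a continuous function can be approximated in the sup-norm by an element of $\mathcal{P}_J$ for $J$ large enough, can be illustrated numerically. The sketch below interpolates a function at the dyadic grid points; the helper names, the test function and the grid used to approximate the sup-norm are illustrative assumptions.

```python
import math

def dyadic_interpolant(f, T, J):
    """Element of P_J that agrees with f at the grid points k * T * 2**-J."""
    m = 2 ** J
    h = T / m
    values = [f(k * h) for k in range(m + 1)]

    def psi(t):
        k = min(int(t / h), m - 1)      # index of the dyadic interval containing t
        w = (t - k * h) / h             # relative position inside that interval
        return (1.0 - w) * values[k] + w * values[k + 1]

    return psi

def sup_distance(f, g, T, grid=10000):
    """Grid approximation of the sup-norm distance of f and g on [0, T]."""
    return max(abs(f(T * i / grid) - g(T * i / grid)) for i in range(grid + 1))

f = lambda t: math.sin(2.0 * math.pi * t)
for J in (2, 4, 6, 8):
    print(J, sup_distance(f, dyadic_interpolant(f, 1.0, J), 1.0))
```

The printed distances should decrease as $J$ grows, which is the behaviour the covering argument relies on.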
[Figure 4.1: Feasible $B_1, B_2 \in \mathcal{J}$ and $g \in B_1 \cap B_2$, drawn over the interval $[T k 2^{-J}, T (k + 1) 2^{-J}]$ with the centres $\psi_1$, $\psi_2$ and the radii $\epsilon_1$, $\epsilon_2$.]
[Figure 4.2: Restrictions (4.1) for functions in $B_1 \cap B_2$, $g$ and $h(x)$, $i(x)$ for a fixed $x$ and the fixed $g$.]
The base $\mathcal{J}$ consists of convex sets (non-compact sup-norm balls). This agrees with $C([0, T], \mathbb{R})$ being locally convex.
We get a similar countable $\mathcal{J}$ if we allow only piecewise linear functions with slopes in $\mathbb{Q}$, starting in $\mathbb{Q}$, and sup-norm balls of rational radii. This agrees with $C([0, T], \mathbb{R})$ being separable. As a complete separable metric space, $(C([0, T], \mathbb{R}), \|\cdot\|)$ is called a Polish space.
If we fix the initial values and work with $(C([0, T], \mathbb{R}) \cap \{f \mid f(0) = x\}, \|\cdot\|)$ we get a base of the induced topology as $\mathcal{J} \cap \{f \mid f(0) = x\}$.
4.2 Local large deviations
We directly compute decay rates for probabilities of the event that the scaled renewal counting process $N_n$ (cf definition 3.4.1) stays close to a piecewise linear function $\psi \in \mathcal{P}_J$ with some $J \in \mathbb{N}$:
\[ \lim_{\epsilon \to 0} \lim_{n\to\infty} \frac{1}{n} \log P(N_n \in U_\epsilon(\psi)) \tag{4.2} \]
We do this in steps, first for linear $\psi \in \mathcal{P}_0$ and then for the general case $\psi \in \mathcal{P}_J$, $J \ge 1$.
Remember that $\Lambda$ is the lmgf of a typical inter event time $\tau$ of $N$ and $\Gamma = -\Lambda^{-1}(-\,\cdot\,)$ has been defined in 2.2.7. We will apply the density process $M$ defined in 3.6.4 (for an explicit form: (3.15)). The $r$ appearing in the density process has been defined in 3.6.5. We repeat the change of measure process with twist parameter $\alpha = -\Lambda(\beta)$ for $\beta \in \mathcal{D}(\Lambda)$:
\[ M(\alpha, t) = \exp\{\alpha N_t - t \Gamma(\alpha)\}\ r(-\Gamma(\alpha), t) \]
This change of measure process applied to the counting process corresponds to the exponential twist of inter event times with parameter $\beta = -\Gamma(\alpha)$ (cf 3.6.8).
4.2.1 Local large deviations upper bound
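We calculate the limsup for the expression (4.2) for $t \mapsto tv \in \mathcal{P}_0$ with some $v \ge 0$.
Claim 4.2.1. For a renewal counting process $N$ with lmgf $\Gamma$ and $\Gamma^*$ the Fenchel-Legendre transform of $\Gamma$ (cf section 2.6)
\[ \lim_{\epsilon \to 0} \limsup_{n\to\infty} \frac{1}{n} \log P(N_n \in U_\epsilon(t \mapsto tv)) \le -T\, \Gamma^*(v) \]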
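Proof of 4.2.1: We start with a change of measure with $M(\alpha, \cdot)$ for some $\alpha \in \mathbb{R}$.
\begin{align*}
P(N_n \in U_\epsilon(t \mapsto tv))
&= E[\mathbf{1}_{N_n \in U_\epsilon(t \mapsto tv)}] = E\Big[ \mathbf{1}_{N_n \in U_\epsilon(t \mapsto tv)}\ \frac{M(\alpha, nT)}{M(\alpha, nT)} \Big] \\
&= E^{[\alpha]}\Big[ \mathbf{1}_{N_n \in U_\epsilon(t \mapsto tv)}\ \frac{1}{M(\alpha, nT)} \Big] \\
&= E^{[\alpha]}\Big[ \mathbf{1}_{N_n \in U_\epsilon(t \mapsto tv)}\ \exp\{-\alpha N(nT) + nT\, \Gamma(\alpha)\}\ \frac{1}{r(-\Gamma(\alpha), nT)} \Big]
\end{align*}
We now add the zero $\alpha nTv - \alpha nTv$ and bound by applying closeness of $N(nT)$ to $nTv$ as enforced by the indicator. For $\alpha > 0$
\begin{align*}
P(N_n \in U_\epsilon(t \mapsto tv))
&= E^{[\alpha]}\Big[ \mathbf{1}_{N_n \in U_\epsilon(t \mapsto tv)}\ \exp\{ \underbrace{-\alpha N(nT) + \alpha nTv}_{= -\alpha nT (N_n(T) - Tv)\ \le\ \alpha nT \epsilon}\ \underbrace{-\ \alpha nTv + nT\, \Gamma(\alpha)}_{= -nT (\alpha v - \Gamma(\alpha))} \}\ \frac{1}{r(-\Gamma(\alpha), nT)} \Big] \\
&\le E^{[\alpha]}\Big[ \mathbf{1}_{N_n \in U_\epsilon(t \mapsto tv)}\ \frac{1}{r(-\Gamma(\alpha), nT)} \Big]\ \exp\{-nT\, [\alpha (v - \epsilon) - \Gamma(\alpha)]\}
\end{align*}
The expectation $E^{[\alpha]}[\mathbf{1}_{N_n \in U_\epsilon(t \mapsto tv)}\, \frac{1}{r(-\Gamma(\alpha), nT)}]$ is finite and uniformly bounded in $n$: bound $\mathbf{1}_{N_n \in U_\epsilon(t \mapsto tv)} \le 1$ and apply boundedness of $\frac{1}{r(-\Gamma(\alpha), \cdot)}$ proven in 3.6.11. There is no exponential growth in the expectation and it vanishes in the exponentially scaled upper bound limit. Thus we finish for $\alpha > 0$
\begin{align*}
\limsup_{n\to\infty} \frac{1}{n} \log P(N_n \in U_\epsilon(t \mapsto tv))
&\le \lim_{n\to\infty} \frac{1}{n} \log \Big( E^{[\alpha]}\Big[ \mathbf{1}_{N_n \in U_\epsilon(t \mapsto tv)}\ \frac{1}{r(-\Gamma(\alpha), nT)} \Big]\ \exp\{-nT\, [\alpha (v - \epsilon) - \Gamma(\alpha)]\} \Big) \\
&\le -T\, [\alpha (v - \epsilon) - \Gamma(\alpha)] \tag{4.3}
\end{align*}
while for $\alpha < 0$ we get
\[ \limsup_{n\to\infty} \frac{1}{n} \log P(N_n \in U_\epsilon(t \mapsto tv)) \le -\alpha T (v + \epsilon) + T\, \Gamma(\alpha) \]
Optimising the bound
\[ \limsup_{n\to\infty} \frac{1}{n} \log P(N_n \in U_\epsilon(t \mapsto tv)) \le -T \max\Big\{ \sup_{\alpha > 0} [\alpha (v - \epsilon) - \Gamma(\alpha)],\ \sup_{\alpha < 0} [\alpha (v + \epsilon) - \Gamma(\alpha)] \Big\} \]
and letting $\epsilon \to 0$
\begin{align*}
\lim_{\epsilon \to 0} \limsup_{n\to\infty} \frac{1}{n} \log P(N_n \in U_\epsilon(t \mapsto tv))
&\le -T \max\Big\{ \sup_{\alpha > 0} [\alpha v - \Gamma(\alpha)],\ \sup_{\alpha < 0} [\alpha v - \Gamma(\alpha)] \Big\} \\
&= -T\, \Gamma^*(v).
\end{align*}
4.2.1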
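The optimisation over the twist parameter in the last step is a Fenchel-Legendre transform and can be carried out numerically. The sketch below assumes the Poisson case, where $\Gamma(\alpha) = \lambda (e^{\alpha} - 1)$ and the transform has the closed form $\Gamma^*(v) = v \log(v/\lambda) - v + \lambda$; the function names, the search interval and the use of a ternary search are illustrative assumptions.

```python
import math

def gamma_poisson(alpha, lam):
    """Gamma(alpha) = lam * (exp(alpha) - 1), the lmgf of a rate-lam Poisson process per unit time."""
    return lam * (math.exp(alpha) - 1.0)

def legendre_numeric(v, lam, lo=-20.0, hi=20.0, iters=200):
    """sup over alpha of alpha * v - Gamma(alpha), by ternary search on a concave objective."""
    objective = lambda a: a * v - gamma_poisson(a, lam)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective(0.5 * (lo + hi))

def legendre_closed(v, lam):
    """Closed form of Gamma^*(v) in the Poisson case, for v >= 0."""
    if v == 0:
        return lam
    return v * math.log(v / lam) - v + lam

lam = 2.0
for v in (0.5, 2.0, 3.0):
    print(v, legendre_numeric(v, lam), legendre_closed(v, lam))
```

The maximising $\alpha$ is positive for $v$ above the rate $\lambda$ and negative below it, which matches the case distinction between $\alpha > 0$ and $\alpha < 0$ in the proof.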
4.2.2 Local large deviations lower bound
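We calculate the liminf for the expression (4.2) for $t \mapsto tv \in \mathcal{P}_0$ and $v \ge 0$.
Claim 4.2.2. For a renewal counting process $N$ with lmgf $\Gamma$ and $\Gamma^*$ the Fenchel-Legendre transform of $\Gamma$ (cf section 2.6)
\[ \lim_{\epsilon \to 0} \liminf_{n\to\infty} \frac{1}{n} \log P(N_n \in U_\epsilon(t \mapsto tv)) \ge -T\, \Gamma^*(v) \]
To prove the claim we will apply the following
Lemma 4.2.3. If $\Gamma'(\alpha) = v$ then
\[ \liminf_{n\to\infty} \frac{1}{n} \log E^{[\alpha]}\Big[ \mathbf{1}_{N_n \in U_\epsilon(t \mapsto tv)}\ \frac{1}{r(-\Gamma(\alpha), nT)} \Big] = 0. \]
Proof of 4.2.3: We have for $\beta = -\Gamma(\alpha)$ bounded $x \mapsto \frac{F^c_\beta}{F^c}(x)\, e^{-\beta x}$ and thus
\[ E^{[\alpha]}\Big[ \mathbf{1}_{N_n \in U_\epsilon(t \mapsto tv)}\ \frac{1}{r(-\Gamma(\alpha), nT)} \Big] \ge \underbrace{E^{[\alpha]}[\mathbf{1}_{N_n \in U_\epsilon(t \mapsto tv)}]}_{\to 1 \text{ by } 7.1.3}\ \inf_{x \in \mathbb{R}} \frac{F^c}{F^c_\beta}(x)\, e^{\beta x} > 0 \]
The convergence in 7.1.3 of the appendix is almost sure. From boundedness the convergence is in the mean, too.
4.2.3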
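Proof of 4.2.2: Consider the case $v > 0$ first. We start similarly as for the upper bound. We again apply that $N(nT)$ is close to $nTv$. Let $\alpha > 0$.
\begin{align*}
P(N_n \in U_\epsilon(t \mapsto tv))
&= E^{[\alpha]}\Big[ \mathbf{1}_{N_n \in U_\epsilon(t \mapsto tv)}\ \exp\{ \underbrace{-\alpha N(nT) + \alpha nTv}_{= -\alpha nT (N_n(T) - Tv)\ \ge\ -\alpha nT \epsilon}\ \underbrace{-\ \alpha nTv + nT\, \Gamma(\alpha)}_{= -nT (\alpha v - \Gamma(\alpha))} \}\ \frac{1}{r(-\Gamma(\alpha), nT)} \Big] \\
&\ge E^{[\alpha]}\Big[ \mathbf{1}_{N_n \in U_\epsilon(t \mapsto tv)}\ \frac{1}{r(-\Gamma(\alpha), nT)} \Big]\ \exp\{-nT\, [\alpha (v + \epsilon) - \Gamma(\alpha)]\}
\end{align*}
Fix $\alpha$ such that $\Gamma'(\alpha) = v$ and lemma 4.2.3 is applicable. For $v > \Gamma'(0)$ we have $\alpha > 0$ (by $\alpha = (\Gamma')^{-1}(v) > (\Gamma')^{-1}(\Gamma'(0)) = 0$) and the lower bound
\begin{align*}
\liminf_{n\to\infty} \frac{1}{n} \log P(N_n \in U_\epsilon(t \mapsto tv))
&\ge \liminf_{n\to\infty} \frac{1}{n} \log \Big( E^{[\alpha]}\Big[ \mathbf{1}_{N_n \in U_\epsilon(t \mapsto tv)}\ \frac{1}{r(-\Gamma(\alpha), nT)} \Big]\ \exp\{-\alpha nT (v + \epsilon) + nT\, \Gamma(\alpha)\} \Big) \\
&= 0 - \alpha T (v + \epsilon) + T\, \Gamma(\alpha).
\end{align*}
This results in the now accurate
\[ \lim_{\epsilon \to 0} \lim_{n\to\infty} \frac{1}{n} \log P(N_n \in U_\epsilon(t \mapsto tv)) \overset{(1)}{=} -T\, [\alpha(v)\, v - \Gamma(\alpha(v))] \overset{(2)}{=} -T\, \Gamma^*(v). \]
where we wrote $\alpha(v)$ since $\alpha$ is the twist with $\Gamma'(\alpha) = v$ from lemma 4.2.3 needed in (1). The same property $\Gamma'(\alpha) = v$ justifies (2).
Similarly for $v \in (0, \Gamma'(0))$ and $\Gamma'(\alpha) = v$, which implies $\alpha \le 0$.
Since $\Gamma' > 0$ always, lemma 4.2.3 does not apply to $v = 0$. However, for $v = 0$ we may write
\[ N_n \in U_\epsilon(t \mapsto 0 \cdot t) \ \Leftrightarrow\ \|N_n\| < \epsilon \]
and calculate the probability of the event directly.
\begin{align*}
\lim_{n\to\infty} \frac{1}{n} \log P\Big( \sup_{t \in [0, T]} |N_n(t)| < \epsilon \Big) &= \lim_{n\to\infty} \frac{1}{n} \log P(N_n(T) < \epsilon) = -T\, \Gamma^*(\epsilon) \\
\lim_{\epsilon \to 0} \lim_{n\to\infty} \frac{1}{n} \log P\Big( \sup_{t \in [0, T]} |N_n(t)| < \epsilon \Big) &= -T\, \Gamma^*(0)
\end{align*}
With $\Gamma^*(0) = LC(h)$ and $= \infty$ for LD-bounded inter event times. We applied continuity of $\Gamma^*$ (or simultaneous divergence to $\infty$) in 0 from the corollary 2.6.4.
4.2.2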
4.2.3 Generalisation
Up to now we have started the scaled counting process as $N_n(0) = 0$ and calculated decay rates for the counting process to stay close to some linear function starting in 0, too. Upper and lower bound work exactly the same way for counting processes starting in $\frac{\lfloor nx \rfloor}{n}$ and an affine function $t \mapsto x + tv$. We make this explicit in the following notation and state the generalised result.
Definition 4.2.4 (Scaling). For a counting process $N$ and a fixed $x > 0$ set
\[ N_n(\cdot, x) : t \mapsto \frac{\lfloor nx \rfloor}{n} + \frac{1}{n} N(nt) \]
Corollary 4.2.5. Let $N$ be a counting process, $x > 0$ and $\psi \in \mathcal{P}_0$, $\psi(t) = x + tv$. Let $N_n(\cdot, x)$ be the scaled process of definition 4.2.4. Then
\[ \lim_{\epsilon \to 0} \lim_{n\to\infty} \frac{1}{n} \log P(N_n(\cdot, x) \in U_\epsilon(t \mapsto x + tv)) = -T\, \Gamma^*(v) \]
4.2.4 Piecewise linear functions
We calculate (4.2) for general $J \ge 1$ applying heuristics learned from [23] (chapter 5, p. 73) developed for Markovian processes.
Claim 4.2.6. Let $\psi \in \mathcal{P}_J$ with non-negative slopes $v_1, v_2, \dots, v_{2^J}$, let $N$ be an undelayed renewal counting process with lmgf $\Gamma$, and let $N_n$ be the scaled process associated with $N$. Then
\[ \lim_{\epsilon \to 0} \lim_{n\to\infty} \frac{1}{n} \log P(N_n \in U_\epsilon(\psi)) = -T\, 2^{-J} \sum_{k=1}^{2^J} \Gamma^*(v_k) \]
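The decay rate in the claim is a Riemann-type sum of the local rate function over the dyadic pieces. The sketch below evaluates it for the Poisson case, where $\Gamma^*(v) = v \log(v/\lambda) - v + \lambda$; the function names and the example slopes are illustrative assumptions.

```python
import math

def local_rate_poisson(v, lam):
    """Gamma^*(v) for a rate-lam Poisson process (closed form, v >= 0)."""
    if v < 0:
        return math.inf
    if v == 0:
        return lam
    return v * math.log(v / lam) - v + lam

def rate_piecewise_linear(slopes, T, lam):
    """T * 2**-J * sum_k Gamma^*(v_k) for slopes v_1, ..., v_{2**J} on equal pieces of [0, T]."""
    pieces = len(slopes)
    if pieces == 0 or pieces & (pieces - 1) != 0:
        raise ValueError("number of slopes must be a power of two")
    return T / pieces * sum(local_rate_poisson(v, lam) for v in slopes)

lam, T = 2.0, 1.0
print(rate_piecewise_linear([2.0, 2.0], T, lam))   # path following the mean rate
print(rate_piecewise_linear([1.0, 3.0], T, lam))   # slow start, fast finish
print(rate_piecewise_linear([2.5, 2.5, 2.5, 2.5], T, lam))
```

By convexity of $\Gamma^*$, a path with varying slopes costs at least as much as the straight line with the same average slope, which is the property exploited later under the name linear geodesics.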
Proof of 4.2.6: Let Nre,(T2J,2T2J,...,(2J1)T2J)
nbe the scaled restarted
process defined in 3.4.16 and 3.4.12 for k= 2J1 with the equidistant
s1=T2J, s2= 2 T2J. . . s2J1=T(2J1) 2J
and identically distributed rcps
N(0) , N(1) , . . . , N(2J1)
We abbreviate Nre
n:= Nre,(T2J,2T2J,...,(2J1)T2J)
nfor the rest of this section
4.2.4.
On the interval [kT2J,(k+1)T2J] the restarted process starts in Nre
n(kT2J)
and increases as the renewal counting process N(k). Note that Nre
nand ψare
defined with respect to the same cases. Let v1,...,v2Jbe the piecewise con-
stant slopes of ψ PJand set s0= 0, s2J=Tfor notational convenience.
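\[
N^{re}_n(t) - \psi(t) =
\begin{cases}
N(t) - v_1 t & \text{if } t \in [0, s_1] \\
N(s_1) + N^{(1)}(t - s_1) - v_1 s_1 - v_2 (t - s_1) & \text{if } t \in (s_1, s_2] \\
\quad\vdots \\
\sum_{j=0}^{2^J-2} \big( N^{(j)}(s_{j+1} - s_j) - v_{j+1}(s_{j+1} - s_j) \big) + N^{(2^J-1)}(t - s_{2^J-1}) - v_{2^J}(t - s_{2^J-1}) & \text{if } t \in (s_{2^J-1}, s_{2^J}] = (s_{2^J-1}, T]
\end{cases}
\]
\[
= \sum_{k=0}^{2^J-1} \mathbb{1}_{t \in (s_k, s_{k+1}]} \Big( \sum_{j=0}^{k-1} \big( N^{(j)}(s_{j+1} - s_j) - v_{j+1}(s_{j+1} - s_j) \big) + N^{(k)}(t - s_k) - v_{k+1}(t - s_k) \Big)
\]
and from the triangle inequality
\[
\|N^{re}_n - \psi\| \le \sum_{k=0}^{2^J-1} \mathbb{1}_{t \in (s_k, s_{k+1}]} \Big( \sum_{j=0}^{k-1} \big| N^{(j)}(s_{j+1} - s_j) - v_{j+1}(s_{j+1} - s_j) \big| + \|N^{(k)} - (t \mapsto t v_{k+1})\|_{[0, s_{k+1} - s_k]} \Big). \tag{4.4}
\]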
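Now if
\[
\|N^{(j)} - (t \mapsto t v_{j+1})\|_{[0, s_{j+1} - s_j]} \le \epsilon\, 2^{-J}
\]
then especially for the end point of the interval
\[
\big| N^{(j)}(s_{j+1} - s_j) - v_{j+1}(s_{j+1} - s_j) \big| \le \epsilon\, 2^{-J}
\]
and
\[
\|N^{re}_n - \psi\| \le \sum_{k=0}^{2^J-1} \mathbb{1}_{t \in (s_k, s_{k+1}]} \Big( \sum_{j=0}^{k-1} \big| N^{(j)}(s_{j+1} - s_j) - v_{j+1}(s_{j+1} - s_j) \big| + \|N^{(k)} - (t \mapsto t v_{k+1})\|_{[0, s_{k+1} - s_k]} \Big)
\]
\[
\le \sum_{k=0}^{2^J-1} \mathbb{1}_{t \in (s_k, s_{k+1}]} \Big( \sum_{j=0}^{k-1} \epsilon\, 2^{-J} + \epsilon\, 2^{-J} \Big) = \sum_{k=0}^{2^J-1} \mathbb{1}_{t \in (s_k, s_{k+1}]}\, (k+1)\, \epsilon\, 2^{-J} \le \epsilon.
\]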
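Thus, by independence of increments
\[
P(N^{re}_n \in \mathcal{U}_\epsilon(\psi)) \ge P\big( \|N^{(k)}_n - (t \mapsto t v_{k+1})\| < \epsilon\, 2^{-J}\ \ \forall k \big) = \prod_{k=0}^{2^J-1} P\big( \|N^{(k)}_n - (t \mapsto t v_{k+1})\| < \epsilon\, 2^{-J} \big)
\]
\[
\lim_{\epsilon\to 0}\liminf_{n\to\infty} \frac{1}{n} \log P(N^{re}_n \in \mathcal{U}_\epsilon(\psi)) \ge \sum_{k=0}^{2^J-1} \lim_{\epsilon\to 0}\lim_{n\to\infty} \frac{1}{n} \log P\big( \|N^{(k)}_n - (t \mapsto t v_{k+1})\| < \epsilon\, 2^{-J} \big)
\]
\[
\ge \sum_{k=0}^{2^J-1} -T 2^{-J}\, \Gamma^*(v_{k+1}) = -T 2^{-J} \sum_{k=1}^{2^J} \Gamma^*(v_k)
\]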
From exponential equivalence of $N$ and the restarted process, limits in the form of the claim are the same for both processes by application of claim 7.2.1 of the appendix. This finishes the lower bound. The upper bound is quite similar, with blowing up radii from constant $\epsilon$ to an increasing sequence of radii $\epsilon, 2\epsilon, \dots, 2^J\epsilon$.
4.2.6
4.2.5 Towards linear geodesics
In sample path large deviations linear geodesics is the property that the rate
function is in integral form and the integrand, the so called local rate func-
tion, is convex (cf [9], definition 6.1). In their book [9] Ayalvadi Ganesh, Neil
O’Connell, and Damon Wischik approach large deviations through proving
a sample path large deviation principle once and then deducing further large
deviation principles by application of the contraction principle. To get an ex-
plicit rate function from the contraction principle a variational problem has
to be solved - and here linear geodesics helps. Technically, linear geodesics is the setting where Jensen's inequality can be applied.
In the queueing setting a nice application of the contraction principle and
linear geodesics is the large deviations of the queue size at some fixed time
t > 0 if the smoothed queue has started in t= 0 with size x0. In this and
many other cases we can observe a most likely path to the event of interest
which is piecewise linear.
While linear geodesics starts with a rate function in integral form and can
prove the implication that the process “likes” to move along piecewise linear
functions, we argue the other way around: Aiming to get a large deviation
principle with a rate function in integral form (like in the Markovian case)
we can already prove that the scaled renewal counting process deliberately
stays close to linear functions if it can.
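Claim 4.2.7. For $\psi \in \mathcal{P}_0$ and $\phi \in \mathcal{P}_1$ with $\phi(0) = \psi(0)$, $\phi(T) = \psi(T)$ and $\|\phi - \psi\| > 0$
\[
\lim_{\epsilon\to 0}\lim_{n\to\infty} \frac{1}{n} \log P(N_n \in \mathcal{U}_\epsilon(\phi)) < \lim_{\epsilon\to 0}\lim_{n\to\infty} \frac{1}{n} \log P(N_n \in \mathcal{U}_\epsilon(\psi))
\]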
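Proof of 4.2.7: We name slopes of $\psi$ and $\phi$ first:
\[
\psi : [0, T] \to [0, \infty),\quad t \mapsto tv
\]
\[
\phi : [0, T] \to [0, \infty),\quad t \mapsto
\begin{cases}
t w_1 & \text{for } t \in [0, \tfrac{T}{2}] \\
\tfrac{T}{2} w_1 + (t - \tfrac{T}{2}) w_2 & \text{for } t \in (\tfrac{T}{2}, T]
\end{cases}
\]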
Figure 4.3: $\psi$ with slope $v$ and $\phi$ with slopes $w_1, w_2$
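The paths' different slopes relate as
\[
Tv = \tfrac{T}{2} w_1 + (T - \tfrac{T}{2}) w_2 = T\big(\tfrac{1}{2} w_1 + \tfrac{1}{2} w_2\big)
\]
And from strict convexity of the Fenchel-Legendre transform $\Gamma^*$
\[
\Gamma^*(v) = \Gamma^*\big(\tfrac{1}{2} w_1 + \tfrac{1}{2} w_2\big) < \tfrac{1}{2}\Gamma^*(w_1) + \tfrac{1}{2}\Gamma^*(w_2)
\]
which in terms of the decay rate for the tubes is
\[
T\,\Gamma^*(v) < \tfrac{T}{2}\Gamma^*(w_1) + \tfrac{T}{2}\Gamma^*(w_2)
\]
\[
\lim_{\epsilon\to 0}\lim_{n\to\infty} \frac{1}{n} \log P(N_n \in \mathcal{U}_\epsilon(\psi)) > \lim_{\epsilon\to 0}\lim_{n\to\infty} \frac{1}{n} \log P(N_n \in \mathcal{U}_\epsilon(\phi))
\]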
4.2.7
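The inequality in the proof of 4.2.7 can be checked numerically. The following is only a sketch under an added assumption: inter event times are exponential with rate `lam` (the Poisson case), for which the Fenchel-Legendre transform has the closed form $\Gamma^*(v) = v\log(v/\lambda) - v + \lambda$. The function names and the chosen numbers are illustrative and not taken from the thesis.

```python
import math

def rate_poisson(v, lam=1.0):
    """Gamma^* for Gamma(theta) = lam*(exp(theta)-1); the Poisson case is an assumption."""
    if v < 0:
        return math.inf
    if v == 0:
        return lam
    return v * math.log(v / lam) - v + lam

def tube_decay_linear(v, T, lam=1.0):
    """Decay rate T*Gamma^*(v) for tubes around the linear path t -> t*v on [0, T]."""
    return T * rate_poisson(v, lam)

def tube_decay_two_slopes(w1, w2, T, lam=1.0):
    """Decay rate (T/2)*(Gamma^*(w1) + Gamma^*(w2)) for a path in P_1 with slopes w1, w2."""
    return 0.5 * T * (rate_poisson(w1, lam) + rate_poisson(w2, lam))

T = 2.0
w1, w2 = 0.5, 2.5
v = 0.5 * (w1 + w2)          # same end point: T*v = (T/2)*w1 + (T/2)*w2
linear = tube_decay_linear(v, T)
kinked = tube_decay_two_slopes(w1, w2, T)
print(linear, kinked)        # the linear path has the smaller decay rate
```

A smaller decay rate means a larger limit of $\frac{1}{n}\log P$, which is the ordering stated in the claim.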
Comparing two elements of P1the one that is close to a linear function is
asymptotically preferred by the scaled process Nn. In preparation for claim
4.2.9 we make the following
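Definition 4.2.8. Let $\psi, \phi \in \mathcal{P}_1$ be defined as
\[
\psi : [0, T] \to [0, \infty),\quad t \mapsto
\begin{cases}
t v_1 & \text{for } t \in [0, \tfrac{T}{2}] \\
\tfrac{T}{2} v_1 + (t - \tfrac{T}{2}) v_2 & \text{for } t \in (\tfrac{T}{2}, T]
\end{cases}
\]
\[
\phi : [0, T] \to [0, \infty),\quad t \mapsto
\begin{cases}
t w_1 & \text{for } t \in [0, \tfrac{T}{2}] \\
\tfrac{T}{2} w_1 + (t - \tfrac{T}{2}) w_2 & \text{for } t \in (\tfrac{T}{2}, T]
\end{cases}
\]
with non-negative $v_1, v_2, w_1, w_2$ and $\phi(0) = \psi(0)$, $\phi(T) = \psi(T)$.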
Figure 4.4: Two sets of $\{\psi, \phi\}$ suiting definition 4.2.8
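Note that $v_1, v_2$ have the same distance to $\frac{\psi(T)}{T}$.
\[
\frac{\psi(T)}{T} = \tfrac{1}{2}(v_1 + v_2) \ \Rightarrow\ \frac{\psi(T)}{T} - v_2 = v_1 - \frac{\psi(T)}{T} \ \Rightarrow\ \Big|\frac{\psi(T)}{T} - v_2\Big| = \Big|\frac{\psi(T)}{T} - v_1\Big|
\]
Writing slopes as a vector $\vec{v} = \binom{v_1}{v_2}$ we get
\[
\frac{\psi(T)}{T} = \tfrac{1}{2}(v_1 + v_2) \ \Rightarrow\ v_2 = 2\frac{\psi(T)}{T} - v_1
\]
\[
\vec{v} = \binom{v_1}{v_2} = \underbrace{\frac{\psi(T)}{T} \binom{1}{1}}_{\text{mean slope}} + \underbrace{\Big(v_1 - \frac{\psi(T)}{T}\Big) \binom{1}{-1}}_{\text{off-set from mean slope}}
\]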
which is visualised in figure 4.5. Since all the above considerations apply to
φand its vector of slopes ~w as well, the figure shows again a ~v corresponding
to ψand a ~w corresponding to φ.
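Claim 4.2.9. For $\phi, \psi$ of definition 4.2.8: if
\[
\Big\| \vec{v} - \frac{\psi(T)}{T} \binom{1}{1} \Big\| < \Big\| \vec{w} - \frac{\psi(T)}{T} \binom{1}{1} \Big\|
\]
then
\[
\lim_{\epsilon\to 0}\lim_{n\to\infty} \frac{1}{n} \log P(N_n \in \mathcal{U}_\epsilon(\phi)) < \lim_{\epsilon\to 0}\lim_{n\to\infty} \frac{1}{n} \log P(N_n \in \mathcal{U}_\epsilon(\psi))
\]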
Figure 4.5: Slopes $\vec{v}$ for $\psi$ and $\vec{w}$ for $\phi$ suiting definition 4.2.8.
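Proof of 4.2.9: Consider the vectors of slopes $\vec{v}, \vec{w}$ in $\mathbb{R}^2_{\ge 0}$. Note that
\[
\Big\| \vec{v} - \frac{\psi(T)}{T} \binom{1}{1} \Big\| = \Big| v_1 - \frac{\psi(T)}{T} \Big| \cdot \Big\| \binom{1}{-1} \Big\|
\]
and thus
\[
\Big\| \vec{v} - \frac{\psi(T)}{T} \binom{1}{1} \Big\| < \Big\| \vec{w} - \frac{\psi(T)}{T} \binom{1}{1} \Big\| \iff \Big| v_1 - \frac{\psi(T)}{T} \Big| < \Big| w_1 - \frac{\psi(T)}{T} \Big|
\]
which, due to symmetry, is equivalent to
\[
\Big| v_2 - \frac{\psi(T)}{T} \Big| < \Big| w_2 - \frac{\psi(T)}{T} \Big|
\]
Now we have an ordering
\[
w_{i_1} < v_{j_1} < \frac{\psi(T)}{T} < v_{j_2} < w_{i_2}
\]
with $i_1, i_2, j_1, j_2 \in \{1, 2\}$ and $i_1 \ne i_2$, $j_1 \ne j_2$. Let us assume that $w_1 < v_1 < \frac{\psi(T)}{T} < v_2 < w_2$. Then each $v_i$ is a convex combination of $w_1$ and $w_2$:
\[
v_i = \frac{w_2 - v_i}{w_2 - w_1}\, w_1 + \frac{v_i - w_1}{w_2 - w_1}\, w_2 \qquad (i = 1, 2)
\]
and by convexity
\[
\Gamma^*(v_i) \le \frac{w_2 - v_i}{w_2 - w_1}\, \Gamma^*(w_1) + \Big(1 - \frac{w_2 - v_i}{w_2 - w_1}\Big) \Gamma^*(w_2) \qquad (i = 1, 2)
\]
\[
\Gamma^*(v_1) + \Gamma^*(v_2) \le \Big( \frac{w_2 - v_1}{w_2 - w_1} + \frac{w_2 - v_2}{w_2 - w_1} \Big) \Gamma^*(w_1) + \Big( \frac{v_1 - w_1}{w_2 - w_1} + \frac{v_2 - w_1}{w_2 - w_1} \Big) \Gamma^*(w_2)
\]
\[
= \frac{2 w_2 - v_1 - v_2}{w_2 - w_1}\, \Gamma^*(w_1) + \frac{v_1 + v_2 - 2 w_1}{w_2 - w_1}\, \Gamma^*(w_2)
\]
\[
= \Gamma^*(w_1) + \Gamma^*(w_2)
\]
where the last step uses $v_1 + v_2 = w_1 + w_2$, which holds since $\phi(T) = \psi(T)$.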
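Thus from
\[
\lim_{\epsilon\to 0}\lim_{n\to\infty} \frac{1}{n} \log P(N_n \in \mathcal{U}_\epsilon(\psi)) = -\tfrac{T}{2}\big(\Gamma^*(v_1) + \Gamma^*(v_2)\big)
\]
\[
\lim_{\epsilon\to 0}\lim_{n\to\infty} \frac{1}{n} \log P(N_n \in \mathcal{U}_\epsilon(\phi)) = -\tfrac{T}{2}\big(\Gamma^*(w_1) + \Gamma^*(w_2)\big)
\]
follows the claim.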
4.2.9
4.2.6 A limit
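We can write the rhs of 4.2.6 in integral form as
\[
\lim_{\epsilon\to 0}\lim_{n\to\infty} \frac{1}{n} \log P(N_n \in \mathcal{U}_\epsilon(\psi)) = -\int_{s=0}^{T} \Gamma^*(\psi'(s))\, ds
\]
While for finite $J$ this integral is artificial, it gives us a uniform expression for finite $J$ and the limit of $J \to \infty$.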
In the following we will apply a standard way of approximating absolutely
continuous functions.
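Definition 4.2.10. For $f \in AC[0, T]$ with $f(s) = f(0) + \int_{r=0}^{s} g(r)\, dr$ for some $g \in L^1$ define its piecewise linear approximation $f_J \in \mathcal{P}_J$ through a piecewise constant approximation of the almost derivative $g$.
\[
g_J(s) = \frac{1}{T 2^{-J}} \int_{r = \lfloor \frac{s}{T 2^{-J}} \rfloor T 2^{-J}}^{\lceil \frac{s}{T 2^{-J}} \rceil T 2^{-J}} g(r)\, dr
\]
\[
f_J = f(0) + \int g_J
\]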
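The construction of definition 4.2.10 can be sketched in a few lines. The code below is an illustration only: it averages $g$ over each of the $2^J$ cells of length $T2^{-J}$ with a midpoint rule (a numerical shortcut that is not part of the definition) and builds $f_J$ from the resulting slopes. The helper names and the test function $f(s) = s^2$ are hypothetical.

```python
def dyadic_slopes(g, T, J, samples_per_cell=200):
    """Average g over each of the 2**J cells of length T*2**-J (midpoint rule)."""
    cells = 2 ** J
    h = T / cells
    step = h / samples_per_cell
    slopes = []
    for k in range(cells):
        total = sum(g(k * h + (i + 0.5) * step) for i in range(samples_per_cell))
        slopes.append(total * step / h)
    return slopes

def piecewise_linear(f0, slopes, T):
    """Return f_J with f_J(0) = f0 and slope slopes[k] on the k-th cell."""
    h = T / len(slopes)
    def f_J(s):
        k = min(int(s / h), len(slopes) - 1)
        return f0 + h * sum(slopes[:k]) + slopes[k] * (s - k * h)
    return f_J

# hypothetical example: f(s) = s**2 on [0, 1], so that g(r) = 2*r
T, J = 1.0, 3
slopes = dyadic_slopes(lambda r: 2.0 * r, T, J)
f_J = piecewise_linear(0.0, slopes, T)
```

By construction $f_J$ agrees with $f$ at the grid points $kT2^{-J}$, since the average of $g$ over a cell equals the increment of $f$ over that cell divided by its length.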
Claim 4.2.11. The $f_J$ of 4.2.10 have a limit "in the rate function":
\[
\lim_{J\to\infty}\lim_{\epsilon\to 0}\lim_{n\to\infty} \frac{1}{n} \log P(N_n \in \mathcal{U}_\epsilon(f_J)) = -\int_{s=0}^{T} \Gamma^*(f'(s))\, ds
\]
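Proof of 4.2.11: The $g_J$ approximate $g$ in a pointwise fashion (cf [5] C.13, a Lebesgue theorem). An application of the pointwise convergence of $g_J \to g$ and the Fatou-lemma tells us
\[
\liminf_{J\to\infty} \int_{s=0}^{T} \Gamma^*(f_J'(s))\, ds = \liminf_{J\to\infty} \sum_{l=0}^{2^J-1} 2^{-J} T\, \Gamma^*(g_J(l 2^{-J} T))
\]
\[
= \liminf_{J\to\infty} \int_{s=0}^{T} \Gamma^*(g_J(s))\, ds
\]
\[
\overset{\text{Fatou}}{\ge} \int_{s=0}^{T} \liminf_{J\to\infty} \Gamma^*(g_J(s))\, ds
\]
\[
\overset{\text{lsc}}{\ge} \int_{s=0}^{T} \Gamma^*(g(s))\, ds
\]
\[
= \int_{s=0}^{T} \Gamma^*(f'(s))\, ds
\]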
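We get the other direction, an upper bound, immediately from Jensen's inequality. We only need this if the lower bound is finite.
\[
\Gamma^*(g_J(l T 2^{-J})) = \Gamma^*\Big( \frac{2^J}{T} \int_{r = lT2^{-J}}^{(l+1)T2^{-J}} g(r)\, dr \Big) \le \frac{2^J}{T} \int_{r = lT2^{-J}}^{(l+1)T2^{-J}} \Gamma^*(g(r))\, dr
\]
thus
\[
\int_{s=0}^{T} \Gamma^*(f_J'(s))\, ds = \sum_{k=0}^{2^J-1} 2^{-J} T\, \Gamma^*(g_J(k T 2^{-J}))
\]
\[
\le \sum_{k=0}^{2^J-1} 2^{-J} T\, \frac{2^J}{T} \int_{r = kT2^{-J}}^{(k+1)T2^{-J}} \Gamma^*(g(r))\, dr
\]
\[
= \int_{r=0}^{T} \Gamma^*(g(r))\, dr = \int_{s=0}^{T} \Gamma^*(f'(s))\, ds
\]
What we have now is a well defined limit for any $f \in AC$.
\[
\lim_{J\to\infty} \int_{s=0}^{T} \Gamma^*(f_J'(s))\, ds = \int_{s=0}^{T} \Gamma^*(f'(s))\, ds \tag{4.5}
\]
So the claimed equality is proved.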
4.2.11
4.3 The LDP in sample space
In this section we develop a full large deviation principle for the counting
process. We will apply the local large deviations, the tube limits, found in
previous sections and repeat some of the techniques already developed. The
main objects of this section are $\epsilon$-neighbourhoods of piecewise linear functions: the technical difference to the tubes in the local large deviations is that we have fixed $\epsilon > 0$ instead of a limit $\epsilon \to 0$.
4.3.1 The weak large deviation principle
The following is an application of theorem 4.1.11 of [5]. We state it here in
the notation fitting our context and as implied by the remark following the
theorem.
Theorem 4.3.1 (Dembo and Zeitouni). In the space of continuous functions over a compact interval equipped with the sup-norm $(C[0, T], \|\cdot\|)$ let $\mathcal{A}$ be a base of the topology. If for every $U \in \mathcal{A}$
\[
\lim_{n\to\infty} \frac{1}{n} \log P(\hat{N}_n \in U) \tag{4.6}
\]
exists in $\mathbb{R} \cup \{-\infty\}$ then $\hat{N}_n$ satisfies the weak LDP with the rate function $I$ defined as
\[
I(f) := \sup_{U \in \mathcal{A} :\, f \in U} - \lim_{n\to\infty} \frac{1}{n} \log P(\hat{N}_n \in U).
\]
The theorem takes place in $(C[0, T], \|\cdot\|)$ and the large deviation object has to be the interpolated counting process $\hat{N}_n \in C[0, T]$. To calculate the limit (4.6) we will be working with the undelayed and not-interpolated counting process $N$ since limits for both processes are the same (cf claim 7.2.1 of the appendix).
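Claim 4.3.2. For $\psi \in \mathcal{P}_0$, $\psi(t) = vt$ for some $v \ge 0$, and $\epsilon > 0$
\[
\lim_{n\to\infty} \frac{1}{n} \log P(N_n \in \mathcal{U}_\epsilon(\psi)) = -T \inf_{w \in [v - \frac{\epsilon}{T},\, v + \frac{\epsilon}{T}]} \Gamma^*(w)
\]
Proof of 4.3.2: For the lower bound we apply our tubes limit. Let $\delta > 0$ and $w \in [v - \frac{\epsilon}{T} + \frac{\delta}{T},\, v + \frac{\epsilon}{T} - \frac{\delta}{T}]$.
\[
P(N_n \in \mathcal{U}_\epsilon(\psi)) \ge P\big(N_n \in \mathcal{U}_\delta(t \mapsto tw)\big) \qquad (\delta \in (0, \epsilon])
\]
\[
\liminf_{n\to\infty} \frac{1}{n} \log P(N_n \in \mathcal{U}_\epsilon(\psi)) \ge -T\,\Gamma^*(w)
\]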
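As we let $\delta \to 0$ the restrictions for $w$ become $w \in (v - \frac{\epsilon}{T},\, v + \frac{\epsilon}{T})$ resulting in
\[
\liminf_{n\to\infty} \frac{1}{n} \log P(N_n \in \mathcal{U}_\epsilon(\psi)) \ge -T \inf_{w \in (v - \frac{\epsilon}{T},\, v + \frac{\epsilon}{T})} \Gamma^*(w) = -T \inf_{w \in [v - \frac{\epsilon}{T},\, v + \frac{\epsilon}{T}]} \Gamma^*(w).
\]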
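The upper bound is simplified to the large deviation of the mean (one-dimensional LDP in claim 3.5.1).
\[
P(N_n \in \mathcal{U}_\epsilon(\psi)) \le P\big(N_n(T) \in [Tv - \epsilon,\, Tv + \epsilon]\big) = P\Big(\frac{1}{nT} N(nT) \in [v - \tfrac{\epsilon}{T},\, v + \tfrac{\epsilon}{T}]\Big)
\]
\[
\limsup_{n\to\infty} \frac{1}{n} \log P(N_n \in \mathcal{U}_\epsilon(\psi)) \le -T \inf_{w \in [v - \frac{\epsilon}{T},\, v + \frac{\epsilon}{T}]} \Gamma^*(w)
\]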
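We can rephrase 4.3.2 the following way: There is a $\phi \in \mathcal{P}_0 \cap \mathcal{U}_\epsilon(\psi)$ such that the probability to stay within $\mathcal{U}_\epsilon(\psi)$ is carried by some $\phi$:
\[
\lim_{n\to\infty} \frac{1}{n} \log P(N_n \in \mathcal{U}_\epsilon(\psi)) = \lim_{\delta\to 0}\lim_{n\to\infty} \frac{1}{n} \log P(N_n \in \mathcal{U}_\delta(\phi))
\]
That $\phi$ is linear agrees with linear geodesics (cf section 4.2.5).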
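Claim 4.3.3. For $\psi \in \mathcal{P}_1$ and $\epsilon > 0$
\[
\lim_{n\to\infty} \frac{1}{n} \log P(N_n \in \mathcal{U}_\epsilon(\psi)) = -\inf_{\phi \in \mathcal{P}_1 \cap \mathcal{U}_\epsilon(\psi)} \int_{t=0}^{T} \Gamma^*(\phi'(t))\, dt
\]
Proof of 4.3.3: Let $v_1, v_2 \ge 0$ be such that
\[
\psi(t) =
\begin{cases}
v_1 t & \text{for } t \in [0, \tfrac{T}{2}] \\
v_1 \tfrac{T}{2} + v_2 (t - \tfrac{T}{2}) & \text{for } t \in (\tfrac{T}{2}, T]
\end{cases}
\]
For a lower bound: if $\phi \in \mathcal{P}_1$ with non-negative slopes $w_1, w_2$ is such that
\[
w_1 \in \big(v_1 - \tfrac{2\epsilon}{T},\, v_1 + \tfrac{2\epsilon}{T}\big) \quad\text{and}\quad w_1 + w_2 \in \big(v_1 + v_2 - \tfrac{2\epsilon}{T},\, v_1 + v_2 + \tfrac{2\epsilon}{T}\big)
\]
then for $\delta > 0$ small enough $\mathcal{U}_\delta(\phi) \subset \mathcal{U}_\epsilon(\psi)$. Figure 4.6 shows such $\psi$ and $\phi$. Now we can bound applying claim 4.2.6 to $J = 1$, $\phi$ and $\mathcal{U}_\delta(\phi)$.
\[
\liminf_{n\to\infty} \frac{1}{n} \log P(N_n \in \mathcal{U}_\epsilon(\psi)) \ge \lim_{\delta\to 0}\lim_{n\to\infty} \frac{1}{n} \log P(N_n \in \mathcal{U}_\delta(\phi)) = -\tfrac{T}{2}\big(\Gamma^*(w_1) + \Gamma^*(w_2)\big)
\]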
Figure 4.6: Solid $\psi$ with slopes $v_1, v_2$ and dashed $\phi$ with slopes $w_1, w_2$, $\phi \in \mathcal{U}_\epsilon(\psi)$
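Let $\vec{w} = \binom{w_1}{w_2}$, $\vec{v} = \binom{v_1}{v_2}$ and to abbreviate the condition on $\vec{w}$ set
\[
S_1(\vec{v}, \epsilon) := \big\{ w \in \mathbb{R}^2 : w_1 \in [v_1 - \tfrac{2\epsilon}{T},\, v_1 + \tfrac{2\epsilon}{T}],\ w_1 + w_2 \in [v_1 + v_2 - \tfrac{2\epsilon}{T},\, v_1 + v_2 + \tfrac{2\epsilon}{T}] \big\}
\]
\[
V_1 : S_1(\vec{v}, \epsilon) \to \mathbb{R},\quad \vec{w} \mapsto \Gamma^*(w_1) + \Gamma^*(w_2)
\]
and optimise the lower bound:
\[
\liminf_{n\to\infty} \frac{1}{n} \log P(N_n \in \mathcal{U}_\epsilon(\psi)) \ge -\tfrac{T}{2} \inf_{\vec{w} \in S_1(\vec{v}, \epsilon)} \big(\Gamma^*(w_1) + \Gamma^*(w_2)\big)
\]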
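To get an upper bound we apply the finite dimensional large deviation principle of 3.5.1: Consider the process only at fixed epochs $\frac{T}{2}$ and $T$:
\[
P(N_n \in \mathcal{U}_\epsilon(\psi)) \le P\Big( N_n(\tfrac{T}{2}, T) \in (\psi(\tfrac{T}{2}) - \epsilon,\, \psi(\tfrac{T}{2}) + \epsilon) \times (\psi(T) - \epsilon,\, \psi(T) + \epsilon) \Big)
\]
\[
\le P\Big( N_n(\tfrac{T}{2}, T) \in [\psi(\tfrac{T}{2}) - \epsilon,\, \psi(\tfrac{T}{2}) + \epsilon] \times [\psi(T) - \epsilon,\, \psi(T) + \epsilon] \Big)
\]
Now apply the upper bound for closed sets of claim 3.5.2:
\[
\limsup_{n\to\infty} \frac{1}{n} \log P(N_n \in \mathcal{U}_\epsilon(\psi)) \le -\tfrac{T}{2} \inf_{\vec{z} \in F} \Big( \Gamma^*\big(\tfrac{2 z_1}{T}\big) + \Gamma^*\big(\tfrac{2 (z_2 - z_1)}{T}\big) \Big) \tag{4.7}
\]
\[
\text{with}\quad F = [\psi(\tfrac{T}{2}) - \epsilon,\, \psi(\tfrac{T}{2}) + \epsilon] \times [\psi(T) - \epsilon,\, \psi(T) + \epsilon] \tag{4.8}
\]
Define $V_2$ the following way.
\[
V_2 : F \to \mathbb{R},\quad \vec{z} \mapsto \Gamma^*\big(\tfrac{2 z_1}{T}\big) + \Gamma^*\big(\tfrac{2 (z_2 - z_1)}{T}\big)
\]
To match lower and upper bound we need
\[
\inf_{\vec{w} \in S_1(\vec{v}, \epsilon)} V_1(\vec{w}) = \inf_{\vec{z} \in F} V_2(\vec{z})
\]
We prove a little more in lemma 4.3.4 which finishes the proof.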
4.3.3
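Lemma 4.3.4. $V_1 = V_2 \circ U$ for a regular transformation $U : S_1(\vec{v}, \epsilon) \to F$.
Proof of 4.3.4: Define $U$ as
\[
U : S_1(\vec{v}, \epsilon) \to F,\quad \vec{w} \mapsto \tfrac{T}{2} \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \vec{w}.
\]
It is well defined since
\[
U(\vec{w}) \in F \iff \tfrac{T}{2} w_1 \in [\psi(\tfrac{T}{2}) - \epsilon,\, \psi(\tfrac{T}{2}) + \epsilon] \ \wedge\ \tfrac{T}{2}(w_1 + w_2) \in [\psi(T) - \epsilon,\, \psi(T) + \epsilon]
\]
\[
\overset{\cdot \frac{2}{T}}{\iff} w_1 \in \Big[ \underbrace{\tfrac{2\psi(T/2)}{T}}_{= v_1} - \tfrac{2\epsilon}{T},\ \tfrac{2\psi(T/2)}{T} + \tfrac{2\epsilon}{T} \Big] \ \wedge\ w_1 + w_2 \in \Big[ \underbrace{\tfrac{2\psi(T)}{T}}_{= v_1 + v_2} - \tfrac{2\epsilon}{T},\ \tfrac{2\psi(T)}{T} + \tfrac{2\epsilon}{T} \Big]
\]
\[
\iff \vec{w} \in S_1(\vec{v}, \epsilon).
\]
Regularity is immediate from $S_1(\vec{v}, \epsilon)$ having relative dimension 2 and the matrix representation. Further note that the mapping of $\vec{z} \in F$ onto the arguments of $\Gamma^*$ in (4.7)
\[
\vec{z} \mapsto \begin{pmatrix} \tfrac{2 z_1}{T} \\ \tfrac{2 (z_2 - z_1)}{T} \end{pmatrix} = \tfrac{2}{T} \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix} \vec{z} = U^{-1}(\vec{z})
\]
is the inverse transformation, which already implies the claim. To be very exact
\[
V_2 \circ U(\vec{w}) = V_2 \begin{pmatrix} \tfrac{T}{2} w_1 \\ \tfrac{T}{2}(w_1 + w_2) \end{pmatrix}
= \Gamma^*\big(\tfrac{2}{T} \tfrac{T}{2} w_1\big) + \Gamma^*\Big(\tfrac{2}{T}\big(\tfrac{T}{2}(w_1 + w_2) - \tfrac{T}{2} w_1\big)\Big)
= \Gamma^*(w_1) + \Gamma^*(w_2)
= V_1(\vec{w})
\]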
4.3.4
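The identity $V_1 = V_2 \circ U$ of lemma 4.3.4 can be verified numerically for a concrete choice of $\Gamma^*$. The sketch below assumes the Poisson case, $\Gamma^*(v) = v\log(v/\lambda) - v + \lambda$, purely for illustration; the lemma itself does not depend on this choice, and the function names and sample values are hypothetical.

```python
import math

def rate(v, lam=1.0):
    """Poisson-case Gamma^* (an assumption made only for this illustration)."""
    if v < 0:
        return math.inf
    return lam if v == 0 else v * math.log(v / lam) - v + lam

def V1(w):
    """V1(w) = Gamma^*(w1) + Gamma^*(w2) on vectors of slopes."""
    return rate(w[0]) + rate(w[1])

def U(w, T):
    """U(w) = (T/2) * [[1, 0], [1, 1]] w maps slopes to path values at T/2 and T."""
    return (0.5 * T * w[0], 0.5 * T * (w[0] + w[1]))

def V2(z, T):
    """V2(z) = Gamma^*(2 z1 / T) + Gamma^*(2 (z2 - z1) / T) on path values."""
    return rate(2.0 * z[0] / T) + rate(2.0 * (z[1] - z[0]) / T)

T = 3.0
w = (0.7, 1.9)
lhs, rhs = V1(w), V2(U(w, T), T)
```

Here `U(w, T)` returns the values of the piecewise linear path at the epochs $T/2$ and $T$, which is why the upper bound formulated in path values and the lower bound formulated in slopes optimise the same quantity.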
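Claim 4.3.3 holds in the general case of $J \in \mathbb{N}$, too. We state it and give the definition of the generalised objects, but do not prove the general case.
Remark 4.3.5. For $\psi \in \mathcal{P}_J$ for some fixed $J \in \mathbb{N}$ and $\epsilon > 0$
\[
\lim_{n\to\infty} \frac{1}{n} \log P(N_n \in \mathcal{U}_\epsilon(\psi)) = -\inf_{\phi \in \mathcal{P}_J \cap \mathcal{U}_\epsilon(\psi)} \int_{t=0}^{T} \Gamma^*(\phi'(t))\, dt
\]
The following is the definition analogous to the case $J = 1$ and an application of this definition.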
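Remark 4.3.6. Given $\psi \in \mathcal{P}_J$ with slopes $v_1, \dots, v_{2^J}$ forming $\vec{v} \in \mathbb{R}^{2^J}$
\[
S_J(\vec{v}, \epsilon) := \Big\{ w \in \mathbb{R}^{2^J}_{\ge 0} : \forall k \in \{1, \dots, 2^J\} : \sum_{l=1}^{k} w_l \in \Big[ \sum_{l=1}^{k} v_l - \tfrac{2^J \epsilon}{T},\ \sum_{l=1}^{k} v_l + \tfrac{2^J \epsilon}{T} \Big] \Big\}
\]
\[
F = \times_{k=1}^{2^J} \Big[ \psi(k T 2^{-J}) - \tfrac{2^J \epsilon}{T},\ \psi(k T 2^{-J}) + \tfrac{2^J \epsilon}{T} \Big]
\]
\[
U : S_J(\vec{v}, \epsilon) \to F,\quad \vec{w} \mapsto T 2^{-J}
\begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
1 & 1 & 0 & \cdots & 0 \\
\vdots & & \ddots & & \vdots \\
1 & 1 & \cdots & 1 & 0 \\
1 & 1 & \cdots & 1 & 1
\end{pmatrix} \vec{w}
\]
If $\psi, \phi \in \mathcal{P}_J$ with slopes $v_i$ (for $\psi$) and $w_i$ (for $\phi$) for $i = 1, \dots, 2^J$ forming $\vec{v}, \vec{w} \in \mathbb{R}^{2^J}$ then
\[
\phi \in \mathcal{U}_\epsilon(\psi) \iff \vec{w} \in S_J(\vec{v}, \epsilon) \iff U(\vec{w}) \in F
\]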
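According to 4.3.1 we have a rate function for the weak LDP. We want to identify the rate function as the decay rate on tubes.
Claim 4.3.7 (Rate function identification). If $\psi \in \mathcal{P}_K$ then
\[
I(\psi) = \int_{s=0}^{T} \Gamma^*(\psi'(s))\, ds.
\]
Proof of 4.3.7: Let $\psi \in \mathcal{P}_K$ and $K$ be minimal in that $\psi \notin \mathcal{P}_{K-1}$. From 4.3.1 we have $I(f) = \sup_{U \in \mathcal{A},\, f \in U} -\lim_{n\to\infty} \frac{1}{n} \log P(N_n \in U)$. Applying the form of our base as $U = \mathcal{U}_\epsilon(\psi)$ for some $\psi \in \mathcal{P}_J$ with $J \in \mathbb{N}$, $\epsilon \in \mathbb{R}$:
\[
I(f) = \sup_{\substack{\epsilon \in \mathbb{R},\, J \in \mathbb{N} \\ \psi \in \mathcal{P}_J,\, f \in \mathcal{U}_\epsilon(\psi)}} \underbrace{-\lim_{n\to\infty} \frac{1}{n} \log P(N_n \in \mathcal{U}_\epsilon(\psi))}_{= \inf_{\xi \in \mathcal{P}_J \cap \mathcal{U}_\epsilon(\psi)} \int \Gamma^* \circ \xi'}
\]
In the infimum we need to consider all $\psi \in \mathcal{U}_\epsilon(f)$, and $\mathcal{U}_\epsilon(\psi) \cap \mathcal{P}_J$ will never be empty since it contains $\psi$ by construction - we will not see the infimum over the empty set.
We immediately have $I(f) \ge \int \Gamma^* \circ f'$ (fix $J = K$ and let $\epsilon \to 0$).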
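To get the opposite inequality fix a feasible combination of $J, \epsilon, \psi$ such that $\psi \in \mathcal{P}_J$ and $f \in \mathcal{U}_\epsilon(\psi)$. If $K \le J$ then $f \in \mathcal{P}_J \cap \mathcal{U}_\epsilon(\psi)$ and
\[
\inf_{\xi \in \mathcal{P}_J \cap \mathcal{U}_\epsilon(\psi)} \int \Gamma^* \circ \xi' \le \int \Gamma^* \circ f'
\]
If $J < K$, optimising over $\mathcal{P}_K \cap \mathcal{U}_\epsilon(\psi)$ instead of $\mathcal{P}_J \cap \mathcal{U}_\epsilon(\psi)$ would generally decrease the infimum. However, from section 4.2.5 on linear geodesics we know that the functions $\xi \in \mathcal{P}_K \cap (\mathcal{P}_J)^c$ we added to the set of restrictions do not decrease but increase the decay rate $\int \Gamma^* \circ \xi'$. Thus
\[
\inf_{\xi \in \mathcal{P}_J \cap \mathcal{U}_\epsilon(\psi)} \int \Gamma^* \circ \xi' = \inf_{\xi \in \mathcal{P}_K \cap \mathcal{U}_\epsilon(\psi)} \int \Gamma^* \circ \xi' \le \int \Gamma^* \circ f'
\]
We have a uniform bound for the infimum (uniform in $\epsilon, J, \psi$) which implies
\[
I(f) = \sup_{\substack{\epsilon \in \mathbb{R},\, J \in \mathbb{N} \\ \psi \in \mathcal{P}_J,\, f \in \mathcal{U}_\epsilon(\psi)}} \inf_{\xi \in \mathcal{P}_J \cap \mathcal{U}_\epsilon(\psi)} \int \Gamma^* \circ \xi' \le \sup_{\substack{\epsilon \in \mathbb{R},\, J \in \mathbb{N} \\ \psi \in \mathcal{P}_J,\, f \in \mathcal{U}_\epsilon(\psi)}} \int \Gamma^* \circ f' \le \int \Gamma^* \circ f'
\]
matching the lower bound of $I(f)$.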
4.3.7
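We want this to be the general form of the rate function.
Claim 4.3.8 (Rate function identification). $I(\psi) = \int_{s=0}^{T} \Gamma^*(\psi'(s))\, ds$ for $\psi \in AC[0, T]$.
Proof of 4.3.8: Let $\psi$ be absolutely continuous and $\psi \notin \bigcup_{J \in \mathbb{N}} \mathcal{P}_J$. We investigate again
\[
I(f) = \sup_{U \in \mathcal{A},\, f \in U} -\lim_{n\to\infty} \frac{1}{n} \log P(N_n \in U)
= \sup_{\substack{\epsilon \in \mathbb{R},\, J \in \mathbb{N} \\ \psi \in \mathcal{P}_J,\, f \in \mathcal{U}_\epsilon(\psi)}} \inf_{\xi \in \mathcal{P}_J \cap \mathcal{U}_\epsilon(\psi)} \int \Gamma^* \circ \xi'
\]
Let $f_J$ be approximations of $f$ in $\mathcal{P}_J$ (cf 4.2.10), assume that $\lim_{J\to\infty} I(f_J) < \infty$ and let $\gamma > 0$ be some small number. Choose $J$ large enough for
\[
\big| I(f_J) - \lim_{K\to\infty} I(f_K) \big| < \tfrac{\gamma}{2}
\]
to hold and $\epsilon > 0$ small enough for
\[
\big| I(f_J) - \inf_{\xi \in \mathcal{P}_J \cap \mathcal{U}_\epsilon(f_J)} I(\xi) \big| < \tfrac{\gamma}{2}
\]
to hold. We then get
\[
\big| \inf_{\xi \in \mathcal{P}_J \cap \mathcal{U}_\epsilon(f_J)} I(\xi) - \lim_{K\to\infty} I(f_K) \big| < \gamma
\]
With $J, \epsilon$ chosen according to $\gamma$ and now fixed we have
\[
I(f) \ge \inf_{\xi \in \mathcal{P}_J \cap \mathcal{U}_\epsilon(f_J)} I(\xi) \ge \lim_{K\to\infty} I(f_K) - \gamma
\]
And since $\gamma$ was arbitrarily small
\[
I(f) \ge \lim_{K\to\infty} I(f_K) \quad (=: I_\infty)
\]
Now let us assume the inequality was strict: that is, the following defines a positive number.
\[
\gamma := \sup_{\substack{\epsilon \in \mathbb{R},\, J \in \mathbb{N} \\ \psi \in \mathcal{P}_J,\, \psi \in \mathcal{U}_\epsilon(f)}} \inf_{\xi \in \mathcal{P}_J \cap \mathcal{U}_\epsilon(\psi)} I(\xi) - I_\infty = \sup_{\substack{\epsilon \in \mathbb{R},\, J \in \mathbb{N} \\ \psi \in \mathcal{P}_J,\, \psi \in \mathcal{U}_\epsilon(f)}} \Big( \inf_{\xi \in \mathcal{P}_J \cap \mathcal{U}_\epsilon(\psi)} I(\xi) - I_\infty \Big)
\]
There are basically two possibilities to choose $\epsilon, J, \psi$ in the supremum. We investigate the argument of the supremum in both cases.
1st case. $\epsilon, J, \psi$ are chosen such that $f_J \in \mathcal{U}_\epsilon(\psi)$. Then $\inf_{\xi \in \mathcal{P}_J \cap \mathcal{U}_\epsilon(\psi)} I(\xi) \le I(f_J)$ and
\[
\inf_{\xi \in \mathcal{P}_J \cap \mathcal{U}_\epsilon(\psi)} I(\xi) - I_\infty = \underbrace{\inf_{\xi \in \mathcal{P}_J \cap \mathcal{U}_\epsilon(\psi)} I(\xi) - I(f_J)}_{\le 0} + \underbrace{I(f_J) - I_\infty}_{\le 0} \le 0
\]
2nd case. If on the other hand $\epsilon, J, \psi$ are such that $f_J \notin \mathcal{U}_\epsilon(\psi)$ then we have $\psi \in \mathcal{P}_J \cap \mathcal{U}_\epsilon(f) \cap \mathcal{U}_\epsilon(f_J)^c$. That is, $\psi$ is an element of $\mathcal{P}_J$ that is closer in the sup-norm to $f$ than $f_J$.
Let $\|f - \psi\| =: \delta < \epsilon$ and $K$ large enough for $\|f_K - f\| < \epsilon - \delta$ to hold. Then $\|f_K - \psi\| < \epsilon$ and
\[
\inf_{\xi \in \mathcal{P}_J \cap \mathcal{U}_\epsilon(\psi)} I(\xi) - I_\infty = \inf_{\xi \in \mathcal{P}_K \cap \mathcal{U}_\epsilon(\psi)} I(\xi) - I_\infty
= \underbrace{\inf_{\xi \in \mathcal{P}_K \cap \mathcal{U}_\epsilon(\psi)} I(\xi) - I(f_K)}_{\le 0} + \underbrace{I(f_K) - I_\infty}_{\le 0} \le 0
\]
So $\gamma \le 0$.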
4.3.8
Since $\Gamma^*$ is convex the rate function $I$ is convex, too.
Looking at [5] and their proof of the sample path LDP for the partial sums process in lemma 5.1.6 (p. 181, top) we see that the rate function $I$ is concentrated on absolutely continuous functions.
4.3.2 The full large deviation principle
We can now strengthen the weak large deviation principle to a full one.
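Claim 4.3.9. In the space of continuous functions $C([0, T], \mathbb{R})$ equipped with the sup-norm induced topology the interpolated renewal counting process $\hat{N}$ (under the scaling $\hat{N}_n : t \mapsto \frac{1}{n} \hat{N}(nt)$) satisfies: for any open set $G$ and any closed set $F$
\[
-\inf_{f \in G} I(f) \le \liminf_{n\to\infty} \frac{1}{n} \log P(\hat{N}_n \in G)
\]
\[
\limsup_{n\to\infty} \frac{1}{n} \log P(\hat{N}_n \in F) \le -\inf_{f \in F} I(f)
\]
with the good convex rate function
\[
I(f) =
\begin{cases}
\int_{t=0}^{T} \Gamma^*(f'(t))\, dt & \text{if } f \in AC([0, T], \mathbb{R}),\ f(0) = 0 \\
\infty & \text{else.}
\end{cases}
\]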
Proof of 4.3.9: From the weak LDP the full LDP follows as soon as we have a good rate function in the weak LDP. As we are in a Polish space goodness of the rate function is the compactness of its level sets.
We apply the Arzela-Ascoli theorem (cf 7.5.1) to identify level sets
\[
L(c) = \{ f \mid I(f) \le c \}
\]
with $c \ge 0$ as compact subsets of $C([0, T], \mathbb{R})$.
Closedness of the level set is given as a property of $I$ being a rate function.
$I(f) \le c$ implies $f(0) = 0$, so initial points of elements of $L(c)$ are bounded.
It remains to be shown that
\[
\forall (t, \epsilon)\ \exists \delta > 0 : |t - s| < \delta \ \Rightarrow\ \sup_{f \in L(c)} |f(t) - f(s)| < \epsilon
\]
Remember 4.2.11 and let $f \in L(c)$, $J \in \mathbb{N}$, and $f_J$ its approximation with piecewise constant derivatives $v^J_1, \dots, v^J_{2^J}$: $v^J_k = \frac{1}{T 2^{-J}} \int_{t = T2^{-J}(k-1)}^{T2^{-J}k} f'(t)\, dt$. (Note that only $2^J$ is a power and the $J$ in $f_J$, $v^J_k$ is an index.)
\[
T 2^{-J} \sum_{k=1}^{2^J} \Gamma^*(v^J_k) = I(f_J) \le I(f) \le c \quad \text{for each } J
\]
\[
\Rightarrow\ \max_{k = 1, \dots, 2^J} T 2^{-J}\, \Gamma^*(v^J_k) \le c \quad \text{for each } J
\]
\[
\Rightarrow\ \sup_{J \in \mathbb{N},\, k = 1, \dots, 2^J} T 2^{-J}\, \Gamma^*(v^J_k) \le c
\]
\[
\Rightarrow\ \sup_{t,\, s < t} (t - s)\, \Gamma^*\Big( \frac{f(t) - f(s)}{t - s} \Big) \le c
\]
Fix t, ǫ > 0 and let mc
ǫ. From limx→∞ Γ(x)
x=let Mbe such that
Γ(x)
x> m for xM. Set δ=c
ǫ
f(t)f(s)
ts< M f(t)f(s)M(ts)ǫfor |ts|< δ
otherwise
f(t)f(s)
tsM
(ts) Γ(f(t)f(s)
ts)> m(f(t)f(s))
f∈L(c)
c(ts) Γ(f(t)f(s)
ts)> m(f(t)f(s))
cm(f(t)f(s))
Γ(f(t)f(s)
ts)> m(f(t)f(s))
ǫ=c
mf(t)f(s)
4.3.9
For an alternative proof see [23], lemma 5.18. There compactness of level
sets is proved for the Poisson process, which is the renewal process with
exponential inter event times. It works exactly the same for all rcp in the
scope of this thesis; especially the no-point mass in {0}-property of assump-
tion 2.2.2 is important since it is equivalent to Λ(0) = which is equivalent
to limx→∞ Γ(x)
x=.
Corollary 4.3.10. In the space $D([0, T], \mathbb{R})$ equipped with the sup-norm induced topology the sequence of scaled renewal counting processes $(N_n;\ n \in \mathbb{N})$ satisfies a sample path large deviation principle with the good, convex rate function $I(\cdot)$ of 4.3.9.
The corollary follows from claim 4.3.9 by an application of lemma 4.1.5 and theorem 4.2.13 of Dembo and Zeitouni [5].
4.3.3 Interpretation
We denote $\Gamma^*$ the local rate function for the large deviation principle of the counting process. Note that with reference to Big Queues [9] (section 6.2, definition 6.1, p. 99) we now do have linear geodesics for the counting process.
Given some fixed $\psi \in AC[0, T]$ with $\psi' \ge 0$ where it exists, what can we do?
Approximate $\psi$ by $\psi_J$ for some large $J$ and denote $v \in \mathbb{R}^{2^J}$ the vector of slopes of $\psi_J$. The rate function $I(\psi)$ is well approximated by the finite sum $T 2^{-J} \sum_{k=1}^{2^J} \Gamma^*(v_k)$. Each summand has $\Gamma^*(v_k) = \theta(v_k) \cdot v_k - \Gamma(\theta(v_k))$ where $\theta(v_k)$ as a twist parameter makes $v_k$ the expectation of $\lim_{n\to\infty} N_n(1)$ under the twisted measure with parameter $\theta(v_k)$.
Simulate a counting process that (over a long time) behaves differently from what would be expected in terms of its empirical mean.
Find the most likely paths for certain events. The most likely path to a "too large" or "too small" value at some fixed time will be along some piecewise linear function with only two different slopes. This is a typical application of linear geodesics.
Apply the contraction principle and solve the associated variational problem.
4.4 Split counting processes
This is a first step towards networks. In a network of $d \in \mathbb{N}$ nodes customers leaving a node $i \in \{1, \dots, d\}$ may be routed to another network node or may leave the network, cf figure 4.7. The queue of customers waiting for service at node $i$ may be non-empty over a period of time. During this time the times between subsequent departures are the customers' service times. Equivalently: if over an interval of time the queue at node $i$ is never empty then increments of the departure process from this queue are increments of the service process at this queue.
When describing the network and how it evolves in time we want to document where customers departing from a node go to next. As customers leaving queue $i$ are routed into different directions, on a technical level we are splitting the service process.
Figure 4.7: Possible routing of customers leaving node $i$
We will then need a linear transformation to describe the number of customers leaving node $i$ and the number of customers being routed to other network nodes as a vector-valued process in time. Customers leaving the network will not be counted; they only appear as leaving node $i$. With an additional condition the linear transformation we apply is a bijection.
In this section we develop the large deviations for a split renewal counting process under a linear transformation. We start in 4.4.1 with constructing the split process and calculating its lmgf. In another subsection we give the explicit linear transformation we will later apply in the generalised Jackson network. We continue in 4.4.2 with an exponential change of measure that transforms the split rcp into another split rcp. We also give the change of measure explicitly for the linearly transformed split process we will apply in the network setting. Finally, in 4.4.3 we develop the full sample path large deviations principle for the split and the linearly transformed split counting process.
The examples we give in this section fit the example-network we will work with in chapter 6.
4.4.1 Construction of the split process
Let us split a counting process $N$ into $m$ processes $N^{(1)}, \dots, N^{(m)}$ with the property that $\sum_{j=1}^{m} N^{(j)}_t = N_t$ for all $t \ge 0$ and that $(N^{(1)}, \dots, N^{(m)})$ as an $m$-tuple changes state iff $N$ does. If it changes state it will be by increasing one coordinate by 1.
Definition 4.4.1 (Split rcp $N^{sp}$ associated with $N, p$). Let $N$ be a rcp and $p = (p_1, \dots, p_m)$ for some $m \in \mathbb{N}$ such that $P(r = e_i) = p_i$ defines the distribution of $r$ on $\mathbb{R}^m$ ($e_i$ is the $i$-th standard base vector in $\mathbb{R}^m$). Let $r, r_1, r_2, \dots$ be iid and define
\[
N^{sp}_t := \sum_{i=1}^{N_t} r_i \quad (\in \mathbb{R}^m).
\]
The coordinates of the split process $N^{sp}$ are renewal counting processes and we will identify $t \mapsto \sum_{i=1}^{N_t} r_i$ and $(N^{(1)}, \dots, N^{(m)})^\top$.
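Definition 4.4.2 (Scaled split process). For the split process $N^{sp}$ defined in 4.4.1 define the scaled split process $N^{sp}_n$ as
\[
N^{sp}_n : t \mapsto \frac{1}{n} N^{sp}_{nt} = \frac{1}{n} \sum_{i=1}^{N_{nt}} r_i
\]
And we can identify the scaled split process with the split process of all its coordinates scaled as in 3.4.1:
\[
N^{sp}_n = \begin{pmatrix} N^{(1)}_n \\ \vdots \\ N^{(m)}_n \end{pmatrix},\qquad
N^{sp}_n(t) = \begin{pmatrix} \frac{1}{n} N^{(1)}(nt) \\ \vdots \\ \frac{1}{n} N^{(m)}(nt) \end{pmatrix}
\]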
Example 4.4.3. Let $m = 5$ and $p = (0, \frac{1}{2}, \frac{1}{2}, 0, 0)$. Then $N^{sp} = (0, N^{(2)}, N^{(3)}, 0, 0)^\top$.
Figure 4.8 shows a realisation of $N$ and the coordinate processes $N^{(2)}$ and $N^{(3)}$ of $N^{sp}$. Note that $N = N^{(2)} + N^{(3)}$ in the figure. Inter event times of $N$ are uniformly distributed on $(0, 2)$.
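A realisation like the one in example 4.4.3 can be generated with a few lines of code. The following is a minimal sketch, not the code used for figure 4.8: inter event times are drawn uniformly from $(0, 2)$ and each event is routed independently according to $p$. The function names, the seed and the horizon of 12 time units are illustrative choices.

```python
import random

def simulate_split(horizon, p, rng):
    """Simulate N on [0, horizon] with Uniform(0, 2) inter event times; route each event by p."""
    t = 0.0
    events = []  # list of (event time, coordinate index 0..m-1)
    while True:
        t += rng.uniform(0.0, 2.0)
        if t > horizon:
            break
        coord = rng.choices(range(len(p)), weights=p)[0]
        events.append((t, coord))
    return events

def counts_at(events, t, m):
    """Return (N_t, N^sp_t) computed from the list of routed events."""
    split = [0] * m
    for time, coord in events:
        if time <= t:
            split[coord] += 1
    return sum(split), split

rng = random.Random(0)
p = (0.0, 0.5, 0.5, 0.0, 0.0)
events = simulate_split(12.0, p, rng)
n_total, n_split = counts_at(events, 12.0, len(p))
```

Coordinates with $p_j = 0$ stay at zero and the remaining coordinates add up to $N$, as noted for the figure.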
We define the lmgf for a discrete probability measure $p$ and then the lmgf for the split counting process.
Definition 4.4.4. The logarithmic moment generating function of $r$ with $P(r = e_i) = p_i$ for $i = 1, \dots, m$ is denoted $K$ (as a capital Greek letter). For $\theta \in \mathbb{R}^m$
\[
K(\theta) = \log E[e^{\langle \theta, r \rangle}] = \log \sum_{j=1}^{m} e^{\langle \theta, e_j \rangle} p_j = \log \sum_{j=1}^{m} e^{\theta_j} p_j.
\]
Figure 4.8: Realisation of $N$ and coordinates of $N^{sp}$ of example 4.4.3. Panels: original process $N$; 2. coordinate of the split process, $N^{(2)}$; 3. coordinate of the split process, $N^{(3)}$.
It is $D(K) = \mathbb{R}^m$ either from the explicit form of the lmgf or from boundedness of $r$, and by 2.6.2 the Fenchel-Legendre transform $K^*$ has compact level sets.
Claim 4.4.5. If the counting process $N$ with lmgf $\Gamma$ is split wrt the probability measure $p$ with lmgf $K$ then the lmgf of the split process $N^{sp}$ associated with $N$ and $p$ is
\[
\lim_{t\to\infty} \frac{1}{t} \log E[e^{\langle \theta, N^{sp}_t \rangle}] = \Gamma \circ K(\theta).
\]
Proof of 4.4.5: The split process is a time change of the partial sum $n \mapsto \sum_{k=1}^{n} r_k$ replacing $n$ by $N_t$ resulting in $t \mapsto \sum_{k=1}^{N_t} r_k$. Note that coordinates of $\sum_{k=1}^{n} r_k$ are binomially distributed.
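We calculate exponential moments for the split process. Let $\theta \in \mathbb{R}^m$.
\[
E[e^{\langle \theta, N^{sp}_t \rangle}] = E\big[e^{\langle \theta, \sum_{k=1}^{N_t} r_k \rangle}\big]
\]
\[
= E\Big[ \sum_{n=0}^{\infty} e^{\langle \theta, \sum_{k=1}^{n} r_k \rangle}\, \mathbb{1}_{N_t = n} \Big]
\]
\[
= \sum_{n=0}^{\infty} \sum_{\substack{i_1, \dots, i_m \\ \sum_{l=1}^{m} i_l = n}} \underbrace{P\Big( N_t = n,\ \sum_{k=1}^{n} r_k = (i_1, \dots, i_m)^\top \Big)}_{= P(N_t = n)\, P(\sum_{k=1}^{n} r_k = \dots)}\ e^{\sum_{j=1}^{m} \theta_j i_j}
\]
\[
= \sum_{n=0}^{\infty} P(N_t = n) \sum_{\substack{i_1, \dots, i_m \\ \sum_{l=1}^{m} i_l = n}} \frac{n!}{i_1!\, i_2! \cdots i_m!} \underbrace{\prod_{j=1}^{m} p_j^{i_j}\ e^{\sum_{j=1}^{m} \theta_j i_j}}_{= \prod_{j=1}^{m} (p_j e^{\theta_j})^{i_j}}
\]
\[
= \sum_{n=0}^{\infty} P(N_t = n) \Big( \sum_{j=1}^{m} p_j e^{\theta_j} \Big)^n
\]
\[
= E\Big[ \exp\Big\{ N_t \log \sum_{j=1}^{m} p_j e^{\theta_j} \Big\} \Big]
\]
We take the scaled limit to obtain the lmgf for the split process.
\[
\lim_{n\to\infty} \frac{1}{n} \log E[e^{\langle \theta, N^{sp}_{nt} \rangle}] = \lim_{n\to\infty} \frac{1}{n} \log E\Big[ \exp\Big\{ N_{nt} \log \sum_{j=1}^{m} p_j e^{\theta_j} \Big\} \Big]
= t\, \Gamma\Big( \log \sum_{j=1}^{m} p_j e^{\theta_j} \Big) = t\, \Gamma \circ K(\theta)
\]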
4.4.5
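Claim 4.4.5 can be sanity-checked in the one case where the answer is known independently. The sketch below assumes, as an illustration only, that $N$ is a Poisson process with rate `lam`, so that $\Gamma(\zeta) = \lambda(e^{\zeta} - 1)$. Splitting a Poisson process yields independent Poisson coordinates with rates $\lambda p_j$, whose joint lmgf is $\sum_j \lambda p_j (e^{\theta_j} - 1)$, and this should coincide with $\Gamma \circ K(\theta)$. The function names and numbers are hypothetical.

```python
import math

def gamma_poisson(zeta, lam):
    """lmgf of a Poisson counting process with rate lam (assumed special case)."""
    return lam * (math.exp(zeta) - 1.0)

def K(theta, p):
    """lmgf of the routing variable r with P(r = e_j) = p[j]."""
    return math.log(sum(pj * math.exp(tj) for pj, tj in zip(p, theta)))

def split_lmgf(theta, p, lam):
    """Gamma o K, the lmgf of the split process as in claim 4.4.5."""
    return gamma_poisson(K(theta, p), lam)

lam = 2.0
p = (0.0, 0.5, 0.5, 0.0, 0.0)
theta = (0.3, -0.2, 0.7, 1.0, -1.0)
composed = split_lmgf(theta, p, lam)
# Poisson thinning: independent Poisson coordinates with rates lam * p[j]
thinned = sum(gamma_poisson(tj, lam * pj) for pj, tj in zip(p, theta))
```

Both expressions reduce to $\lambda(\sum_j p_j e^{\theta_j} - 1)$, so they agree up to rounding.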
We'll later need a linear transformation of the split process $N^{sp}$.
Remark 4.4.6. Generally, given a random variable $X \in \mathbb{R}^m$ with lmgf $K$ and $TX$ a linear transformation of $X$ the lmgf of $TX$ is $\theta \mapsto K(T^\top \theta)$. Especially we might use a linear transformation to only work with a subset $M \subset \{1, \dots, m\}$ of coordinates. Then
\[
T = \mathrm{diag}(\mathbb{1}_{1 \in M}, \dots, \mathbb{1}_{m \in M}) = T^\top
\]
and thus
\[
K \circ T(\theta) = K\Big( \sum_{k \in M} \theta_k e_k \Big) = \log\Big( \sum_{k \in M} p_k e^{\theta_k} + \sum_{k \notin M} p_k \Big)
= \log\Big( \sum_{k \in M} p_k e^{\theta_k} + 1 - \sum_{k \in M} p_k \Big)
\]
which is the $\Xi$-style lmgf of a sub-probability measure $p_M$.
Application to the network
We now give the explicit linear transformation we need in the network setting and calculate the lmgf of the linearly transformed split counting process.
If a vector $y \in \mathbb{R}^{d+1}$ describes the next destinations of customers leaving node $i = 1$ over an interval of time
with $y_{d+1}$ the customers that have left the network at departure from node $i = 1$
and customers cannot immediately join the same queue again
then
\[
\Big( \sum_{k=1}^{d+1} y_k,\ y_2,\ \dots,\ y_d \Big)^\top \in \mathbb{R}^d
\]
has as its first coordinate the number of customers that have left the first node $i = 1$. Remaining coordinates $j = 2, \dots, d$ are the number of customers who have gone from node $i = 1$ to node $j$. This formalises as a linear transformation
\[
T = \begin{pmatrix}
1 & 1 & \cdots & 1 & 1 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & 1 & 0
\end{pmatrix},\qquad
T : \begin{pmatrix} y_1 \\ \vdots \\ y_{d+1} \end{pmatrix} \mapsto \begin{pmatrix} \sum_{k=1}^{d+1} y_k \\ y_2 \\ \vdots \\ y_d \end{pmatrix}
\]
When working with a network of $d$ nodes we need a family $(T^{(i)})_{i = 1, \dots, d}$ of such transformations, one at each node. The $T$ just given would be $T^{(1)}$.
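Definition 4.4.7 (Transformation $T^{(i)}$). Let $d, i \in \mathbb{Z}$, $d \ge 2$, $i \le d$ and $\{e_1, \dots, e_d\}$ the standard base of $\mathbb{R}^d$.
\[
T^{(i)} : \mathbb{R}^{d+1} \to \mathbb{R}^d,\quad y \mapsto \sum_{\substack{k=1 \\ k \ne i}}^{d} y_k e_k + \sum_{k=1}^{d+1} y_k\, e_i
\]
For example $T^{(2)}$ transforms
\[
T^{(2)} : \begin{pmatrix} y_1 \\ \vdots \\ y_{d+1} \end{pmatrix} \mapsto \sum_{\substack{k=1 \\ k \ne 2}}^{d} y_k e_k + \sum_{k=1}^{d+1} y_k\, e_2 = \begin{pmatrix} y_1 \\ \sum_{k=1}^{d+1} y_k \\ y_3 \\ \vdots \\ y_d \end{pmatrix}
\]
and $T^{(2)}$ can be identified with the $d \times (d+1)$ matrix
\[
\begin{pmatrix}
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & 1 & 0
\end{pmatrix}
+
\begin{pmatrix}
0 & 0 & \cdots & 0 & 0 \\
1 & 1 & \cdots & 1 & 1 \\
0 & 0 & \cdots & 0 & 0 \\
\vdots & & & & \vdots \\
0 & 0 & \cdots & 0 & 0
\end{pmatrix}.
\]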
When we split the counting process it was with respect to a probability measure $p$ with lmgf $K$: all departing customers were considered. Now, after the transformation, we only consider customers staying in the network and how they are split onto the other nodes: the measure we use becomes a sub-probability measure. In the following we define a lmgf $\Xi$ for a sub-probability measure.
Definition 4.4.8. Let $p$ be a sub-probability distribution on $\{1, \dots, d\}$, set $p_0 = 1 - \sum_{k=1}^{d} p_k$ and define
\[
\Xi : \mathbb{R}^d \to \mathbb{R},\quad \xi \mapsto \log\Big( \sum_{k=1}^{d} p_k e^{\xi_k} + p_0 \Big)
\]
Since $\Xi(\xi) < \infty$ for all $\xi \in \mathbb{R}^d$ the Fenchel-Legendre transform $\Xi^*$ has compact level sets (cf 2.6.2).
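Claim 4.4.9. Let $\Xi^{(i)}$ be associated with the sub-probability measure $p^{(i)}$ on $\{e_1, \dots, e_d\}$ with $p_{ii} = 0$. Let $N$ be a counting process with lmgf $\Gamma$ and let $N^{sp}$ be the split process associated with $N$ and the unique probability measure on $\{e_1, \dots, e_d, 0\}$ associated with $p^{(i)}$. The lmgf of the linearly transformed split process is
\[
\lim_{t\to\infty} \frac{1}{t} \log E[e^{\langle \xi, T^{(i)} N^{sp}_t \rangle}] = \Gamma(\xi_i + \Xi^{(i)}(\xi))
\]
Proof of 4.4.9: If $p^{(i)} = (p_{i1}, \dots, p_{id})$ is a sub-probability measure then set $p_{i0} = 1 - \sum_{j=1}^{d} p_{ij}$ and let $K$ be associated with this probability measure. Applying $p_{ii} = 0$ we get
\[
\Xi^{(i)}(\xi) = -\xi_i + K(T^{(i)\top} \xi)
\]
from calculating
\[
K(T^{(1)\top} \xi) = K\Big( \xi_1 (1, \dots, 1)^\top + (0, \xi_2, \dots, \xi_d, 0)^\top \Big)
\overset{p_{ii} = p_{11} = 0}{=} \log\Big( \sum_{j=1}^{d} p_{ij} e^{\xi_1 + \xi_j} + p_{i0} e^{\xi_1} \Big)
\]
\[
= \xi_1 + \log\Big( \sum_{j=1}^{d} p_{ij} e^{\xi_j} + p_{i0} \Big)
= \xi_1 + \Xi^{(1)}(\xi)
\]
and apply the adaption of the lmgf to a linear transformation of the split process 4.4.5 to obtain
\[
\lim_{t\to\infty} \frac{1}{t} \log E[e^{\langle \xi, T^{(i)} N^{sp}_t \rangle}] = \lim_{t\to\infty} \frac{1}{t} \log E[e^{\langle T^{(i)\top} \xi, N^{sp}_t \rangle}]
= \Gamma \circ K(T^{(i)\top} \xi) = \Gamma(\xi_i + \Xi^{(i)}(\xi)).
\]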
4.4.9
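The identity $K(T^{(i)\top}\xi) = \xi_i + \Xi^{(i)}(\xi)$ used in the proof can be checked numerically. The following sketch uses hypothetical values for $d = 4$ nodes and $i = 3$ with routing probabilities as in example 4.4.13; the helper names are illustrative and the index `i` is 1-based as in the text.

```python
import math

def K(theta, p_full):
    """lmgf of the routing variable on {e_1, ..., e_(d+1)} with masses p_full."""
    return math.log(sum(pj * math.exp(tj) for pj, tj in zip(p_full, theta)))

def Xi(xi, p_sub):
    """lmgf of the sub-probability measure p_sub with defect p0 = 1 - sum(p_sub)."""
    p0 = 1.0 - sum(p_sub)
    return math.log(sum(pk * math.exp(xk) for pk, xk in zip(p_sub, xi)) + p0)

def T_transpose(xi, i):
    """Apply the transpose of T^(i) (i is 1-based): R^d -> R^(d+1)."""
    d = len(xi)
    theta = [xi[i - 1]] * (d + 1)      # row i of T^(i) is all ones
    for k in range(d):
        if k != i - 1:
            theta[k] += xi[k]          # row k of T^(i) is e_k for k != i
    return theta

p_sub = (0.25, 0.25, 0.0, 0.0)         # p_ii = 0 for i = 3
p_full = p_sub + (1.0 - sum(p_sub),)   # last entry: leaving the network
xi = (0.1, -0.4, 0.8, 0.3)
lhs = K(T_transpose(xi, 3), p_full)
rhs = xi[2] + Xi(xi, p_sub)
```

The condition $p_{ii} = 0$ matters here: without it the $i$-th coordinate would contribute $p_{ii} e^{\xi_i}$ on one side and $p_{ii} e^{2\xi_i}$ on the other.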
Remark 4.4.10. The transformation $T^{(i)}$ defined in 4.4.7 is a regular transformation between $d$ dimensional spaces when defined on
\[
T^{(i)} : \mathbb{R}^{d+1} \cap \{x \mid x_i = 0\} \to \mathbb{R}^d.
\]
The inverse transformation is
RdRd+1 {x|xi= 0}, ξ 7→
ξ1
.
.
.
ξd
Pd
k=1 ξk
ξiei.
Conclusion 4.4.11. If $N$ is split wrt $p^{(i)} = (p_{1i}, \dots, p_{di})$ with $p_{ii} = 0$ then the transformation between $N^{sp}$ and $T^{(i)} N^{sp}$ is a linear bijection.
When splitting a counting process wrt some probability measure $p^{(i)}$ with $p_{ii} = 0$ we'll have $N^{(i)}_t \equiv 0$ and the split process $N^{sp}_t \in \mathbb{R}^{d+1} \cap \{x \mid x_i = 0\}$.
We continue the example 4.4.3 where we had split a counting process: We now apply a linear transformation.
Example 4.4.12. The transformation $T = T^{(1)}$ applied to the split process $(0, N^{(2)}, N^{(3)}, 0, 0)^\top$ results in $(N, N^{(2)}, N^{(3)}, 0)^\top$.
The transformation works a little differently if the last coordinate process is not 0.
Example 4.4.13. Let $m = 5$, $p = (\frac{1}{4}, \frac{1}{4}, 0, 0, \frac{1}{2})$, and the transformation $T = T^{(3)}$
\[
N \mapsto N^{sp} \mapsto T^{(3)} N^{sp} :\quad N \overset{\text{split}}{\mapsto} \begin{pmatrix} N^{(1)} \\ N^{(2)} \\ 0 \\ 0 \\ N^{(5)} \end{pmatrix} \overset{T^{(3)}}{\mapsto} \begin{pmatrix} N^{(1)} \\ N^{(2)} \\ N \\ 0 \end{pmatrix}
\]
All examples will appear in the later example network of d= 4 nodes.
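The action of $T^{(i)}$ on count vectors can be written out directly from definition 4.4.7. The sketch below is illustrative: the function name is hypothetical, the index `i` is 1-based as in the text, and the count vectors are made-up values with the zero pattern of examples 4.4.12 and 4.4.13.

```python
def T_apply(y, i):
    """Apply T^(i) of definition 4.4.7 to y in R^(d+1); i is 1-based."""
    d = len(y) - 1
    out = [y[k] if k != i - 1 else 0 for k in range(d)]  # coordinates k != i are copied
    out[i - 1] = sum(y)                                   # coordinate i counts all departures
    return out

# pattern of example 4.4.12: (0, N2, N3, 0, 0) under T^(1)
ex_12 = T_apply([0, 3, 4, 0, 0], 1)
# pattern of example 4.4.13: (N1, N2, 0, 0, N5) under T^(3)
ex_13 = T_apply([2, 1, 0, 0, 5], 3)
print(ex_12, ex_13)  # [7, 3, 4, 0] [2, 1, 8, 0]
```

In the second case the customers leaving the network (the last input coordinate) are visible only through the total in coordinate 3, which is the difference between the two examples.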
4.4.2 Change of measure for the split process
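Changing the split processes means changing the unsplit process and the Bernoulli random variable governing the routing. Since we constructed the counting process $N$ and the routing variables $r_1, r_2, \dots$ to be independent we multiply mass functions. First each variable is twisted exponentially.
We apply the exponential twist of 2.3.1 to $r$ with the lmgf $K$ defined in 4.4.4. For measurable $A \subset \mathbb{R}^m$
\[
P(r \in A) = \sum_{k :\, e_k \in A} p_k
\]
\[
P^{(\theta)}(r \in A) = \sum_{k :\, e_k \in A} e^{\langle \theta, e_k \rangle - K(\theta)}\, p_k = \sum_{k :\, e_k \in A} \frac{e^{\theta_k}}{E[e^{\langle \theta, r \rangle}]}\, p_k
\]
We may also write
\[
P^{(\theta)}(r \in A) = E\Big[ \mathbb{1}_{r \in A}\, \frac{e^{\langle \theta, r \rangle}}{E[e^{\langle \theta, r \rangle}]} \Big]
\]
and we identify the change of measure
\[
r \mapsto e^{\langle \theta, r \rangle - K(\theta)} = \frac{e^{\langle r, \theta \rangle}}{E[e^{\langle \theta, r \rangle}]}
\]
The sum $\sum_{k=1}^{n} r_k$ has lmgf $nK$ by independence of $r_1, \dots, r_n$ and the density of the twisted distribution wrt the original distribution is
\[
\sum_{k=1}^{n} r_k \mapsto \prod_{k=1}^{n} e^{\langle \theta, r_k \rangle - K(\theta)} = e^{\langle \sum_{k=1}^{n} r_k,\, \theta \rangle - nK(\theta)}
\]
and summands remain independent under the new measure.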
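The exponential twist of the routing variable amounts to reweighting the mass function $p$. The following sketch computes the twisted masses $e^{\theta_k - K(\theta)} p_k$ for a hypothetical $\theta$ and the routing probabilities of example 4.4.13; the function names and the value of $\theta$ are illustrative choices.

```python
import math

def K(theta, p):
    """lmgf of the routing variable r with P(r = e_j) = p[j]."""
    return math.log(sum(pj * math.exp(tj) for pj, tj in zip(p, theta)))

def twisted_pmf(theta, p):
    """Twisted masses p_k * exp(theta_k - K(theta)) of the routing variable."""
    k_val = K(theta, p)
    return [pj * math.exp(tj - k_val) for pj, tj in zip(p, theta)]

p = (0.25, 0.25, 0.0, 0.0, 0.5)
theta = (1.0, 0.0, 2.0, -1.0, -0.5)
q = twisted_pmf(theta, p)
```

Directions with $p_k = 0$ keep mass zero under any twist, so the twisted routing variable is again concentrated on the same base vectors and the twisted masses sum to one.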
The change of measure for the counting process was given in definition 3.6.4 and (3.15). We combine both changes of measure on the product space for $N_t$ and $\sum_{k=1}^{n} r_k$:
Claim 4.4.14. Let $N$ be a counting process with lmgf $\Gamma$ and $r \in \mathbb{R}^m$ a routing variable with mass function $p$ and lmgf $K$. Then for $\theta \in D(K) = \mathbb{R}^m$
\[
(\theta, t) \mapsto \exp\{ \langle N^{sp}_t, \theta \rangle - t\, \Gamma \circ K(\theta) \}\ r(t, \Gamma \circ K(\theta))
\]
is a change of measure process for the split process constructed from $N$ and $p$.
The $r(\cdot, \cdot)$ appearing in the change of measure is a random function defined in (3.6.5) referring to the distribution function $F$ of inter event times of $N$ and the age $B$ (and $r \ne r(\cdot, \cdot)$).
Proof of 4.4.14: Since $N$ and $\sum_{k=1}^{n} r_k$ are independent we twist them individually with $\zeta \in \mathbb{R}$ the twist parameter for $N$ and $\theta \in \mathbb{R}^m$ the twist parameter for $\sum_{k=1}^{n} r_k$. Let $j \in \mathbb{N}$ and $x \in \mathbb{N}^m$.
Pθ[ζ](Nt=j ,
n
X
k=1
rk=x)
=Eθ[ζ][ 11Nt=j11Pn
k=1 rk=x]
=Eθ[ 11Nt=jeζjtΓ(ζ)Fc
Γ(ζ)
Fc(B(t)) eB(t) Γ(ζ)11Pn
k=1 rk=x]
=E[ 11Nt=jeζjtΓ(ζ)Fc
Γ(ζ)
Fc(B(t)) eB(t) Γ(ζ)11Pn
k=1 rk=xehθ,xi−nK(θ)]
The following is for the case of n=jand ζ=K(θ).
Pθ[K(θ)](Nt=n ,
n
X
k=1
rk=x)
=E[ 11Nt=neK(θ)ntΓK(θ)Fc
ΓK(θ)
Fc(B(t)) eB(t) ΓK(θ)11Pn
k=1 rk=xehθ,xi−nK(θ)]
=E[ 11Nt=n11Pn
k=1 rk=xehθ,xi−tΓK(θ)Fc
ΓK(θ)
Fc(B(t)) eB(t) ΓK(θ)]
We apply this to the split counting process Nsp(t) = PN(t)
k=1 rk. Fix an arbi-
trary xNmand n=n(x) = Pm
k=1 xk. Then
P(Nsp(t) = x) = P(Nt=n ,
n
X
k=1
rk=x)
and
Pθ,[K(θ)](Nt=n ,
n
X
k=1
rk=x)
=E[ 11Nsp
t=xehθ,xi−tΓK(θ)Fc
ΓK(θ)
Fc(B(t)) eB(t) ΓK(θ)]
Which identifies the claimed density process.
4.4.14
Application to the network
We rewrite the change of measure process to fit our transformed split counting process. Let $\theta \in \mathbb{R}^{d+1}$ and $T = T^{(1)}$ for notational simplicity.
\[
\langle N^{sp}_t, \theta \rangle = \langle T N^{sp}_t, (T^\top)^{-1} \theta \rangle
\]
ξ:= (T)1θ= (T1)θ=
0
θ2
.
.
.
θd
θd+1
1
1
.
.
.
1
with this definition θ=Tξand K(θ) = Ξ(1)(ξ)ξ1from definition 4.4.8.
We rewrite the change of measure 4.4.14 and denote it G(i).
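Definition 4.4.15 ($G^{(i)}$). For
• a renewal counting process $N$
• a subprobability $p^{(i)} = (p_{i1},\dots,p_{id})$ with $p_{ii} = 0$ and lmgf $\Xi$
• the split process $N^{sp}$ associated with $N$ and $(p_{i1},\dots,p_{id},\, 1-\sum_{j=1}^{d} p_{ij})$
• a linear transformation $T^{(i)}$
we define the change of measure process for $T^{(i)}N^{sp}$ with parameter $\xi \in \mathbb{R}^d$ as
\[
G^{(i)}(\xi, t) = \exp\{\langle T^{(i)}N^{sp}_t, \xi\rangle - t\,\Gamma(-\xi_i + \Xi^{(i)}(\xi))\}\; r(t, \Gamma(-\xi_i + \Xi^{(i)}(\xi))).
\]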
We give a summary of this subsection 4.4.2:
Corollary 4.4.16. Let $N$ be a counting process with inter event times density $f$ and lmgf $\Gamma$ that is split into $d+1$ processes wrt a probability measure $p$. Let $(p_1,\dots,p_d)$ be the sub-probability measure associated with $p$, with lmgf $\Xi$. Then under the change of measure $G^{(i)}$ the new process is distributed like the split process of iid inter event times with density $f_{\Gamma(-\xi_1+\Xi^{(1)}(\xi))}$ and rate $\Gamma'(-\xi_1 + \Xi^{(1)}(\xi))$ and with routing (sub)probabilities $\nabla\Xi^{(i)}(\xi) = p(\xi)$.
Definition 4.4.17. For a rcp $N$ with lmgf $\Gamma$ and fixed $i$ and $\xi \in \mathbb{R}^d$ the function
\[
\xi \mapsto \Gamma'(-\xi_i + \Xi^{(i)}(\xi))
\]
gives the rate of $N$ under the change of measure $G^{(i)}$.
4.4.3 Sample path LDP for the split process
Claim 4.4.18. Let $N^{sp}$ be the split process associated with the counting process $N$ and the probability measure $p$. Let $N$ have lmgf $\Gamma$ and $p$ lmgf $K$. Then under the scaling of 4.4.2 a sample path large deviation principle holds for $N^{sp}$ in $D([0,T],\mathbb{R}^m)$ equipped with the sup-norm induced topology. The rate function is good, convex and
\[
h \mapsto \int_{t=0}^{T} (\Gamma\circ K)^*(\dot h(t))\, dt
\]
for $h \in AC([0,T],\mathbb{R}^m)$, $h(0) = 0$ and $h \mapsto \infty$ otherwise.
Proof of 4.4.18: Set $S_n = \sum_{k=1}^{n} r_k$. We get a sample path LDP for the split process $t \mapsto N^{sp}_t = \sum_{k=1}^{N_t} r_k = S_{N(t)}$ similarly to the partial sums process of iid summands in Mogulskii's theorem (cf [5], theorem 5.1.2 with $K(\theta) < \infty$ for all $\theta \in \mathbb{R}^m$). We only sketch it in the following. We already have the lmgf $\Gamma\circ K$ and from restarting the counting process we easily get the finite dimensional lmgfs for $(\frac{1}{n}S_{N(0)}, \frac{1}{n}S_{N(nt_1)}, \dots, \frac{1}{n}S_{N(nT)})$.
From the Gärtner-Ellis theorem we get the finite dimensional large deviations principle with the rate function as the sum over expressions of $(\Gamma\circ K)^*$. Applying the projective limit theorem we get the large deviation principle in the continuous functions with the rate function in integral form, the local rate function being the Fenchel-Legendre transform $(\Gamma\circ K)^*$. The topology we get is that of pointwise convergence. One can then deduce that the rate function is concentrated on absolutely continuous functions. To obtain the large deviation principle in the sup-norm induced topology we have to prove exponential tightness. This is done in 4.4.19.
□ 4.4.18
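Claim 4.4.19. The distributions of the scaled split counting process $(N^{sp}_n;\ n \in \mathbb{N})$ are exponentially tight under the sup-norm induced topology.
Proof of 4.4.19: The $i$-th coordinate process for the split process has inter event times $\mathring\tau = \sum_{k=1}^{G} \tau_k$ with geometric $G$ with mass function $P(G = g) = p_i(1-p_i)^{g-1}$. To fit our notation: $p = 1 - p_i$.
By 2.2.11 the lmgf is $\mathring\Lambda(\theta) = \Lambda(\theta) + \log\frac{1-p}{1-pe^{\Lambda(\theta)}}$ and we need the associated $\mathring\Gamma$, that is $-\mathring\Lambda^{-1}(-\theta)$. The inverse of $\mathring\Lambda$ is
\[
x \mapsto \Lambda^{-1}\big(-\log(p + (1-p)e^{-x})\big)
\]
and we get on the level of the counting process
\[
\mathring\Gamma(x) = -\mathring\Lambda^{-1}(-x) = -\Lambda^{-1}\big(-\log(p + (1-p)e^{x})\big).
\]
Substituting $p = 1 - p_i$ and applying the definition of $\Gamma$ through $\Lambda$ (cf (2.2.7))
\[
\mathring\Gamma(x) = \Gamma\big(\log(1 - p_i + p_i e^{x})\big).
\]
This perfectly coincides with the coordinate-wise lmgf of the split counting process: Consider $x\,e_i$ for some $x \in \mathbb{R}$
\[
\Gamma\circ K\circ\pi_i(x\,e_i) = \Gamma\circ K(x\,e_i) = \Gamma\Big(\log\Big(p_i e^{x} + \sum_{m\neq i} p_m\Big)\Big)
= \Gamma\big(\log(p_i e^{x} + (1-p_i))\big) = \mathring\Gamma(x)
\]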
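For each coordinate process $N^{(k)}$ (for $k = 1,\dots,d$) a sample path large deviation principle holds (cf. 4.3.9). We apply these to construct a compact set from the level sets of the coordinate functions. Let $\epsilon > 0$.
\[
L(\alpha, k) := \Big\{ f \in C([0,T],\mathbb{R}) \ \Big|\ \int_{t=0}^{T} \mathring\Gamma^{(k)*}(\dot f(t))\, dt < \alpha \Big\}
\]
\[
K(\alpha) := \{ f \in C([0,T],\mathbb{R}^m) \mid f_k \in L(\alpha+\epsilon, k) \text{ for } k = 1,\dots,m \}
\]
Each $L(\alpha+\epsilon, k)$ is a compact set in $C([0,T],\mathbb{R})$ by the large deviation principle 4.3.9 with good rate function. $K(\alpha)$ is a Cartesian product of compact sets and itself compact in the product space $C([0,T],\mathbb{R})^m = C([0,T],\mathbb{R}^m)$.
\[
P(\hat N_n \notin K(\alpha)) = P(\hat N^{(k)}_n \notin L(\alpha+\epsilon, k) \text{ for some } k)
\leq m \max_{k=1,\dots,m} P(\hat N^{(k)}_n \notin L(\alpha+\epsilon, k))
\]
The closure of $L(\alpha+\epsilon, k)^c$ is a subset of $L(\alpha, k)^c$ and we can apply an alternative formulation of the LDP (cf [DZ] (1.2.7), p.6).
\[
\limsup_{n\to\infty} \frac{1}{n}\log P(\hat N^{(k)}_n \notin L(\alpha+\epsilon, k)) \leq -\alpha
\quad\Rightarrow\quad
\limsup_{n\to\infty} \frac{1}{n}\log P(\hat N_n \notin K(\alpha)) \leq -\alpha
\]
which is exponential tightness.
□ 4.4.19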
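The identity $\mathring\Gamma(x) = \Gamma(\log(1-p_i+p_ie^x))$ used in the proof of 4.4.19 can be sanity-checked in the exponential case, where everything is explicit. The following is only a sketch under that assumption: for a Poisson counting process with rate `rate` the lmgf is $\Gamma(x) = \text{rate}\,(e^x-1)$, and thinning with probability `p_i` should again give a Poisson process, now with rate `rate * p_i`. The function names are illustrative and not part of the thesis.

```python
import math

def gamma_poisson(x, rate):
    """lmgf of a Poisson counting process: lim (1/t) log E[exp(x N_t)]."""
    return rate * (math.exp(x) - 1.0)

def gamma_coordinate(x, rate, p_i):
    """Coordinate lmgf via the formula Gamma(log(1 - p_i + p_i e^x))."""
    return gamma_poisson(math.log(1.0 - p_i + p_i * math.exp(x)), rate)

rate, p_i = 3.0, 0.25
# Largest deviation from the lmgf of a Poisson process with rate rate * p_i.
max_gap = max(
    abs(gamma_coordinate(x, rate, p_i) - gamma_poisson(x, rate * p_i))
    for x in (-1.0, 0.0, 0.5, 2.0)
)
```

In this special case the two expressions agree up to rounding, which is the familiar thinning property of the Poisson process.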
Application to the network
We simply restate the claim of the large deviation principle 4.4.18 in terms of the process transformed by $T$. As usual $d = m - 1$.
Corollary 4.4.20. There is a sample path large deviation principle for the transformed split process $t \mapsto TN^{sp}_t$ in $D([0,T],\mathbb{R}^d)$ equipped with the sup-norm induced topology with the good convex rate function
\[
h \mapsto \int_{t=0}^{T} \big(\Gamma\circ(\Xi^{(i)} - \pi_i)\big)^*(\dot h(t))\, dt
\]
for $h \in AC([0,T],\mathbb{R}^d)$, $h(0) = 0$ and $h \mapsto \infty$ otherwise.
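Proof of 4.4.20: From 4.4.18 and from $T^{(i)}$ being a continuous bijection we immediately have the sample path large deviation principle in the same space with rate function
\[
h \mapsto \int_{t=0}^{T} (\Gamma\circ K)^*\circ T^{-1}(\dot h(t))\, dt
\]
and we only have to identify the local rate functions. Abbreviate $T = T^{(i)}$.
\[
\begin{aligned}
(\Gamma\circ K)^*\circ T^{-1}(x)
&= \inf_z\ \Gamma^*(z) + z\,K^*\Big(\frac{T^{-1}x}{z}\Big) \\
&= \inf_z\ \Gamma^*(z) + z\,K^*\Big(T^{-1}\frac{x}{z}\Big) && \text{(linearity of } T\text{)} \\
&= \inf_z\ \Gamma^*(z) + z\,(\Xi^{(i)} - \pi_i)^*\Big(\frac{x}{z}\Big) && \text{(def. 4.4.8)} \\
&= \big(\Gamma\circ(\Xi^{(i)} - \pi_i)\big)^*(x) && (4.9)
\end{aligned}
\]
□ 4.4.20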
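Lemma 4.4.21 (Rate function identification). For the local rate function in 4.4.20 for the transformed split counting process we have
\[
\big(\Gamma\circ(\Xi^{(i)} - \pi_i)\big)^*(x) = \inf_{\gamma,\ r^{(i)}:\ \gamma(r^{(i)} - e_i) = x}\ \Gamma^*(\gamma) + \gamma\,\Xi^{(i)*}(r^{(i)})
\]
Proof of 4.4.21: We make explicit the Fenchel-Legendre transform (4.9).
\[
\begin{aligned}
\big(\Gamma\circ(\Xi^{(i)} - \pi_i)\big)^*(x)
&= \inf_\gamma\ \Gamma^*(\gamma) + \gamma\,(\Xi^{(i)} - \pi_i)^*\Big(\frac{1}{\gamma}x\Big) \\
&= \inf_\gamma\ \Gamma^*(\gamma) + \gamma \inf_{b\in\mathbb{R}^d}\Big[\Xi^{(i)*}\Big(\frac{1}{\gamma}x - b\Big) + (-\pi_i)^*(b)\Big]
\end{aligned}
\]
but for the projection $\pi_i$ and $b \neq -e_i$ we get
\[
(-\pi_i)^*(b) \overset{(7.1)}{=} \pi_i^*(-b) = \infty.
\]
So the projection enforces $b = -e_i$ and then vanishes. We continue
\[
\big(\Gamma\circ(\Xi^{(i)} - \pi_i)\big)^*(x) = \inf_\gamma\ \Gamma^*(\gamma) + \gamma\,\Xi^{(i)*}\Big(\frac{1}{\gamma}x + e_i\Big)
\]
Writing the restriction differently as
\[
r^{(i)} = \frac{x}{\gamma} + e_i \ \Leftrightarrow\ \gamma(r^{(i)} - e_i) = x
\]
we have proved the claim.
□ 4.4.21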
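Claim 4.4.22. If $\Xi$ is associated with the subprobability measure $p$ on $\mathbb{R}^d$ then $\Xi^*(r) < \infty$ for $r$ representing a strictly positive sub-probability measure in $\mathbb{R}^d$ with
\[
p_0 = 0 \Rightarrow r_0 = 0 \quad\text{and}\quad p_i = 0 \Rightarrow r_i = 0.
\]
Proof of 4.4.22: We have for $\theta \in \mathbb{R}^d$
\[
\nabla\Xi(\theta) = \left[\frac{p_i e^{\theta_i}}{\sum_{j=1}^{d} p_j e^{\theta_j} + p_0}\right]_{i=1,\dots,d}
\]
and $\nabla\Xi(\theta)$ is a probability measure iff $p$ is ($p_0 = 0$). For $p_0 > 0$ and $r_i = 0 \Leftrightarrow p_i = 0$ set
\[
\theta_i = \log\Big(\frac{r_i}{r_0}\,\frac{p_0}{p_i}\Big) \qquad (i:\ p_i > 0).
\]
For $i$ such that $p_i = 0$ the $\theta_i$ never shows up in $\nabla\Xi(\theta)$ at all and can be set to an arbitrary value. We get $\nabla\Xi(\theta) = r$ and an optimiser in the transform is found. The coordinates $\theta_i$ with $i \in \mathrm{supp}(p)$ are uniquely defined.
If there are $i$ such that $r_i = 0$, $p_i > 0$ then setting this $\theta_i = -\infty$ also makes $\nabla\Xi(\theta) = r$. Formally an optimiser of the transform does not exist (not in $\mathbb{R}^d$). We nevertheless get $\Xi^*(r) < \infty$ and explain it more formally: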
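Set $z = \sum_{i:\,r_i>0} p_i + p_0$, which will be $z < 1$ if there are $i$ such that $r_i = 0$, $p_i > 0$. If $z = 0$ then $p$, $r$ are orthogonal / their supports have an empty intersection. So we assume $z > 0$ in the following:
\[
\begin{aligned}
\sup_{\theta\in\mathbb{R}^d}\ \langle\theta, r\rangle - \Xi(\theta)
&= \sup_{\theta_i:\,r_i>0} \langle\theta, r\rangle + \sup_{\theta_i:\,r_i=0} -\Xi(\theta) \\
&= \sup_{\theta_i:\,r_i>0} \langle\theta, r\rangle - \lim_{\theta_i\to-\infty:\,r_i=0} \Xi(\theta) \\
&= \sup_{\theta_i:\,r_i>0} \langle\theta, r\rangle - \log\Big(\sum_{j=1,\ r_j>0}^{d} p_j e^{\theta_j} + p_0\Big) \\
&= \sup_{\theta_i:\,r_i>0} \langle\theta, r\rangle - \Big[\log\frac{1}{z} + \log\Big(\sum_{j=1,\ r_j>0}^{d} p_j e^{\theta_j} + p_0\Big)\Big] + \log\frac{1}{z} \\
&= \sup_{\theta_i:\,r_i>0} \langle\theta, r\rangle - \log\Big(\sum_{j=1,\ r_j>0}^{d} \frac{p_j}{z} e^{\theta_j} + \frac{p_0}{z}\Big) + \log\frac{1}{z}
\end{aligned}
\]
which is now of the kind $\tilde\Xi^*(r)$ with $\tilde\Xi$ associated with the submeasure $\tilde p = (\frac{p_i}{z};\ i \in \{1,\dots,d\}\cap\mathrm{supp}(r))$. $r$ and $\tilde p$ are now equivalent and our initial reasoning applies.
□ 4.4.22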
So if $\Xi^*(r) < \infty$ for $r$, $p$ that are not equivalent there is no optimising $\theta \in \mathbb{R}^d$. However, we can still associate with $\Xi^*(r)$ a change of measure for the sub-probability measure $p$ that changes $p$ into $r$. Note that $\Xi$ is different from the inter event times lmgfs in that it comes from a random variable with point mass.
Corollary 4.4.23. If $\Xi$ is associated with the sub-probability measure $p$ on $\mathbb{R}^d$ (cf definition 4.4.8) and $\theta \in \mathbb{R}^d$ then
\[
\nabla\Xi(\theta) = \left[\frac{p_i e^{\theta_i}}{\sum_{j=1}^{d} p_j e^{\theta_j} + p_0}\right]_{i=1,\dots,d}
\]
is a subprobability measure. $\nabla\Xi^{(i)}(\theta)$ is a probability measure if $p$ is, and
\[
\nabla\Xi(\theta)_i > 0 \ \Leftrightarrow\ p_i > 0.
\]
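The tilting in 4.4.22 and 4.4.23 can be illustrated numerically. The sketch below assumes the equivalent case only ($p_i > 0$, $r_i > 0$, $p_0 > 0$, $r_0 > 0$) and uses the twist parameter $\theta_i = \log(\frac{r_i}{r_0}\frac{p_0}{p_i})$ from the proof of 4.4.22; the function names and the numbers are illustrative and not taken from the thesis.

```python
import math

def grad_xi(theta, p, p0):
    """Gradient of Xi(theta) = log(sum_j p_j e^{theta_j} + p0): the tilted sub-probability."""
    weights = [pj * math.exp(tj) for pj, tj in zip(p, theta)]
    total = sum(weights) + p0
    return [w / total for w in weights]

def tilt_parameter(r, p, p0):
    """theta_i = log((r_i / r_0) * (p_0 / p_i)); assumes all quantities strictly positive."""
    r0 = 1.0 - sum(r)
    return [math.log((ri / r0) * (p0 / pi)) for ri, pi in zip(r, p)]

p, p0 = [0.2, 0.3], 0.5   # original routing sub-probability and exit probability
r = [0.5, 0.25]           # target sub-probability, so r_0 = 0.25
theta = tilt_parameter(r, p, p0)
tilted = grad_xi(theta, p, p0)
```

With these inputs `tilted` reproduces `r` up to rounding, which is the statement $\nabla\Xi(\theta) = r$.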
Chapter 5
Stochastic networks and
associated processes
This chapter introduces stochastic networks and the processes we work with
in the next chapter.
We give a formal definition of stochastic networks starting from random
walks on graphs. We will point out how the generalised Jackson network and
the Jackson network are instances of stochastic networks.
As we are concerned with rare events for generalised Jackson networks where
we can observe initial queue sizes we introduce tools to describe the expected
behaviour of such a network with an initial condition. The initial condition
will tell us which queues are initially full with a large queue size and which
are about empty. In order to describe the expected future behaviour of the
network with a given present starting point we will apply the Skorohod map.
Processes we work with are the free, the network, and the local process
and we develop their change of measure. For the free process we will develop
sample path large deviations.
We repeat our notation for counting processes.

inter event time    rcp, renewal counting process    lmgf of rcp                                                     rate
$\tau$              $N: t \mapsto N(t)$              $\Gamma(\theta) = \lim_{t\to\infty}\frac{1}{t}\log E[e^{\theta N_t}]$    $\Gamma'(0) = \frac{1}{E[\tau]}$
chapter 2           definition 3.1.1                 section 3.3                                                     definition 3.6.12

We have further defined the scaled process $N_n$ in 3.4.1, the split process $N^{sp}$ in 4.4.1 with $N^{sp}_n$ the scaled split process, and the restarted process $N^{re,(s_1,\dots,s_k)}$ in 3.4.16 with the scaling specified in 3.4.12.
And we repeat our assumptions:
Assumption 5.0.2. If $\tau < \infty$ is a non-deterministic inter event time then the assumptions on inter event times of chapter 2 should hold: 2.2.2, 2.2.13, 2.4.2.
In this chapter we will have different counting processes in the setting of a stochastic network: processes counting external arrivals at nodes of the network, which we will call arrival processes, and processes counting how many customers can be served over any period of time, which will be called service processes.
5.1 Stochastic networks
Definition 5.1.1 (Graph). A graph is a collection of nodes and edges. For a graph of finitely many nodes we denote them as $\{1,\dots,d\}$ for some $d \in \mathbb{N}$. Edges will be directed and are written
\[
\{(i,j) \mid i,j \in \{1,\dots,d\}\}.
\]
In this thesis we work with finite directed graphs only.
Definition 5.1.2 (Random walk on a graph). Given
• a graph with nodes $\{1,\dots,d\}$ for some $d \in \mathbb{N}$ and edges $Y \subset \{1,\dots,d\}^{\times 2}$
• a substochastic matrix $P \in \mathbb{R}^{d,d}$ with components $p_{ij} > 0 \Leftrightarrow (i,j) \in Y$ and rows $p^{(i)}$
• a fixed node $i_A \in \{1,\dots,d\}$
• a fixed $t_0 \geq 0$
• for each $i \in \{1,\dots,d\}$ a sequence of inter event times $\tau^{(i)S}_1, \tau^{(i)S}_2, \dots$ denoted service times
a random walk $z$ on the graph is a stochastic process giving at each time $t$ the position of a customer who enters the graph at time $t_0$ at node $i_A$ and travels the graph according to the following rules:
• when at node $i$ for the $k$-th time, occupy the server for time $\tau^{(i)S}_k$
• at the end of the server occupation time / service time leave node $i$ immediately and either go to node $j$ with probability $p_{ij}$ or leave the network with probability $1 - \sum_{j=1}^{d} p_{ij}$.
If $t_0 = 0$ the process starts as $z(0) = i_A$; if $t_0 > 0$ the process starts as $z(t) = 0$ for $t \in [0, t_0)$ and $z(t_0) = i_A$. Once the customer leaves the network, the state of the process is fixed at 0.
This random walk on a graph will also be referred to as an isolated random walk on a graph.
Definition 5.1.3 (Stochastic network). A stochastic network is a joint random walk on a graph with interaction: If $d$ is the number of nodes of the network and $A^{(i)}$ are counting processes modelling the arrivals to node $i$ such that
• for every jump time $t$ of $A^{(j)}$ and jump size $\Delta A^{(j)}(t) = k$ of any $j \in \{1,\dots,d\}$ there are $k$ random walks $z^{j,A^{(j)}(t-)+1},\dots,z^{j,A^{(j)}(t)}$ with $i_A = j$ and $t_0 = t$
then define the stochastic network process $Z$ as
\[
Z^{(i)}(t) = \sum_{j=1}^{d}\sum_{k=1}^{\infty} \mathbb{1}_{z^{j,k}(t)=i}, \qquad
Z(t) = \begin{pmatrix} Z^{(1)}(t) \\ \vdots \\ Z^{(d)}(t) \end{pmatrix}.
\]
A selection of typical interactions of randomly walking customers in the network is
• queueing at single server nodes: if at arrival at a node the server is occupied, an arriving customer has to wait until they can occupy the server;
• state dependent routing: the matrix $P^{j,n}$ of the random walk $z^{j,n}$ may be time-inhomogeneous and $P^{j,n}(t)$ may depend on $Z(t)$;
• modified service: for each random walk the service times $\tau^{(i)}_1, \tau^{(i)}_2, \dots$ at node $i$ may depend on the value of $Z$ at the time the customer occupies the server at node $i$.
If the isolated random walks forming the stochastic network are independent (no interaction) the stochastic network represents a network with a pool of infinitely many servers at each node. Generally queueing occurs whenever there are more customers at a node than servers, as a server can only serve one customer at a time. The number of customers exceeding the number of servers forms the queue at that node. Possible permutations of the sequence of customers arriving at a node and of customers starting service are specified in the queueing discipline (most well known are FIFO and LIFO, which are explained in every book on queueing). Examples of state dependent routing are "join the shortest queue" and modifications of it, like "join the shortest queue of a randomly sampled subset". Modified service might be: if queue $i$ is empty over time $[t_1, t_2)$ then the server at node $i$ becomes an additional server at some node $j$ with a non-empty queue ([8] of Robert Foley and David McDonald).
Definition 5.1.4 (Generalised Jackson network). A generalised Jackson network is a stochastic network with all random walks sharing
• the same graph,
• the same time-homogeneous, deterministic routing matrix $P$,
• the same distribution of service times at each fixed node; and service times are independent of how often a customer returns to the node.
Furthermore
• inter arrival times at each node are iid and independent of everything else,
• service times at each node are iid and independent of everything else,
• there is (only) queueing interaction with a single server at each node.
In this thesis we assume that all service times and all inter arrival times satisfy assumption 5.0.2.
Note that under the general assumption 5.0.2 the counting process $A^{(i)}$ associated with inter arrival times at node $i$ is a renewal counting process with all jump sizes equal to unity, and regular (finitely many jumps over finite intervals). The stochastic network process $Z$ for the generalised Jackson network satisfies
\[
Z^{(i)}(t) = \sum_{j=1}^{d}\sum_{k=1}^{A^{(j)}(t)} \mathbb{1}_{z^{j,k}(t)=i} \ \leq\ \sum_{j=1}^{d} A^{(j)}(t) < \infty
\]
which makes $Z: \mathbb{R}_{\geq 0} \to \mathbb{N}^d$.
For the generalised Jackson network we have for each random walk $z^{j,n}$ on the graph a sequence of inter event times at each node. Due to independence of all service times of the isolated random walks we can replace service times at each node $i$ by just a single sequence of iid service times. These service times will form the renewal counting processes $S^{(1)},\dots,S^{(d)}$.
We summarise the counting processes introduced so far:

inter event time    rcp                              lmgf                rate
$\tau^{(i)A}$       $A^{(i)}: t \mapsto A^{(i)}(t)$   $\Gamma^{(i)}_A$    $\lambda_i = \Gamma^{(i)\prime}_A(0) = \frac{1}{E[\tau^{(i)A}]}$
$\tau^{(i)S}$       $S^{(i)}: t \mapsto S^{(i)}(t)$   $\Gamma^{(i)}_S$    $\mu_i = \Gamma^{(i)\prime}_S(0) = \frac{1}{E[\tau^{(i)S}]}$

The following definition names the processes required to describe the generalised Jackson network the network's primitives. We will later, in 5.3.4, give another equivalent definition for the network primitives that will be technically more convenient and will apply results from section 4.4.
Definition 5.1.5 (Network primitives I). For a generalised Jackson network with $d$ nodes the following processes are denoted the network's primitives:
• arrival processes $A^{(1)},\dots,A^{(d)}$ with inter event times $\tau^{(i)A}$ and lmgf $\Gamma^{(i)}_A$ for $A^{(i)}$;
• service processes $S^{(1)},\dots,S^{(d)}$ with inter event times $\tau^{(i)S}$ and lmgf $\Gamma^{(i)}_S$ for $S^{(i)}$;
• the processes of routing decisions
\[
n \mapsto \sum_{k=1}^{n} r^{(i)}_k
\]
with $r^{(i)}, r^{(i)}_1, r^{(i)}_2, \dots$ iid with values in $\{e_1,\dots,e_d,\vec{0}\}$ and $P(r^{(i)} = e_j) = p_{ij}$ for $i = 1,\dots,d$ and $\Xi^{(i)}$ the lmgf of $r^{(i)}$, cf 4.4.8.
Definition 5.1.6 (Rates $(\lambda, \mu, P)$). For a generalised Jackson network with network primitives as in 5.1.5 and typical inter arrival times $\tau^{(i)A}$ for the primitive arrival process $A^{(i)}$ and typical service times $\tau^{(i)S}$ for the primitive service process $S^{(i)}$ (for $i = 1,\dots,d$) let
• $\lambda_i = \Gamma^{(i)\prime}_A(0) = \frac{1}{E[\tau^{(i)A}]}$ and $\lambda \in \mathbb{R}^d_{\geq 0}$ have coordinates $\lambda_i$ for $i = 1,\dots,d$;
• $\mu_i = \Gamma^{(i)\prime}_S(0) = \frac{1}{E[\tau^{(i)S}]}$ and $\mu \in \mathbb{R}^d_{>0}$ have coordinates $\mu_i$ for $i = 1,\dots,d$;
• $p^{(i)} = \nabla\Xi^{(i)}(0) = E[r^{(i)}]$ and $P \in \mathbb{R}^{d\times d}$ has rows $p^{(i)}$ for $i = 1,\dots,d$.
Then we denote by $(\lambda, \mu, P)$ the arrival rates, the service rates, and the routing matrix of the generalised Jackson network. In short we say that $(\lambda, \mu, P)$ are the rates of the network.
In the following section we will investigate stochastic networks through
their rates only. Results we obtain there are applicable to stochastic net-
works that are not generalised Jackson networks: They for example apply to
networks where arrival processes are sums of independent renewal counting
processes.
Definition 5.1.7 (Jackson Network).A Jackson Network is a generalised
Jackson network with all inter arrival and service times being exponentially
distributed.
James Jackson’s definition in [14] of what he then called a network of
waiting lines allowed for finitely many servers at each node and did specify
the queueing discipline as “first come first serve”. In the context of large de-
viation for the queue sizes multiple servers are modeled as a single server and
the service times at that node are decreased correspondingly. Since we do not
distinguish different classes of customers and investigate queue sizes only and
not delay the queueing discipline is not relevant here. Our definition 5.1.7
of Jackson networks corresponds to the definition of Ignatiouk-Robert in [12].
The following definition is reasonable in a network where customers share the same underlying graph, as they do in the generalised Jackson network.
Definition 5.1.8 (Path). In a network with routing matrix $P$ we say there is a path from node $i$ to node $j \neq i$ ($i, j \in \{1,\dots,d\}$) if there is a sequence $k_1,\dots,k_m$ of nodes in $\{1,\dots,d\}$ such that
\[
p_{ik_1}\, p_{k_1k_2} \cdots p_{k_mj} > 0
\]
or equivalently $P^{m+1}(i,j) > 0$ with $P^{m+1}$ the $(m+1)$-fold product of $P$. We say that such a path has length $m+1$ as it moves along $m+1$ (not necessarily different) edges.
We make the following assumption for the rest of this thesis:
Assumption 5.1.9 (Open, no immediate feedback). Networks are open and without immediate feedback:
• For each node $j$ there is a node $i$ with $\lambda_i > 0$ and a path from $i$ to $j$. If $\lambda_i > 0$ node $i$ is called an entry node.
• For each node $i$ there is a node $j$ with $p_{j0} > 0$ and a path from $i$ to $j$. If $p_{i0} > 0$ node $i$ is called an exit node.
• When leaving a node customers are not immediately fed back into the same node.
The assumed properties are reflected in the routing matrix $P$: $P$ is strictly substochastic and $1, -1$ are not eigenvalues. Then $(\mathrm{id} - P)^{-1}$ exists, and $(\mathrm{id} - P^T)^{-1}\lambda > 0$ (coordinate-wise). Also $(\mathrm{id}_M - P^T_M)^{-1}$ exists for any $M \subset \{1,\dots,d\}$. No immediate feedback in the network is equivalent to $p_{ii} = 0$ for $i = 1,\dots,d$.
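The conditions of assumption 5.1.9 depend only on the support of $P$ and on the entry nodes, so they can be checked by a graph search. The following is a minimal sketch, not part of the thesis: nodes are indexed from 0, exit nodes are detected as rows of $P$ summing to less than 1, and a node is treated as trivially reachable from itself (an assumption made here for convenience, since definition 5.1.8 only covers paths between different nodes). The matrix used is the four-node routing matrix of example 5.2.8.

```python
def reachable(P, start):
    """Set of nodes reachable from `start` along edges with P[i][j] > 0 (start included)."""
    seen, stack = {start}, [start]
    while stack:
        i = stack.pop()
        for j, pij in enumerate(P[i]):
            if pij > 0 and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen

def is_open_without_feedback(lam, P):
    d = len(lam)
    exits = {i for i in range(d) if sum(P[i]) < 1}        # nodes with p_i0 > 0
    fed = set()
    for i in range(d):
        if lam[i] > 0:                                    # entry node
            fed |= reachable(P, i)
    every_node_fed = len(fed) == d
    every_node_drains = all(reachable(P, i) & exits for i in range(d))
    no_feedback = all(P[i][i] == 0 for i in range(d))
    return every_node_fed and every_node_drains and no_feedback

P = [[0, 0.5, 0.5, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 0.5],
     [0.25, 0.25, 0, 0]]
open_ok = is_open_without_feedback([1, 1, 0, 0], P)

# Same network with a self-loop at the first node: immediate feedback.
P_loop = [[0.1, 0.4, 0.5, 0]] + P[1:]
loop_ok = is_open_without_feedback([1, 1, 0, 0], P_loop)
```

A depth-first search is used instead of matrix powers only because it avoids floating point products; both express the same path condition.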
Satisfying assumption 5.1.9 is a property of a network that depends on the network topology or the adjacency matrix associated with $P$ and the entry nodes $\{i \mid \lambda_i > 0\}$. Different networks with equivalent distributions for inter arrival times $\tau^{(i)A}$ and inter service times $\tau^{(i)S}$ and equivalent distributions for all routing decisions $r^{(i)}$ are either all open and without feedback or none of them is.
Example 5.1.10. The network of figure 5.1 satisfies 5.1.9. Nodes 1, 2 are entry nodes, nodes 3, 4 are exit nodes.
[Figure 5.1: An open network without immediate feedback for d = 4]
The graph is even strongly connected which is not a general assumption.
Removing immediate feedback
The assumption 5.1.9 of no feedback is no restriction at all. For an open network with feedback we can remove the feedback and remodel the network such that the theory developed in this thesis is applicable.
In a feedback queue customers finishing service have two possibilities to proceed: either leave the network or join the queue again. If decisions to join the queue again are iid with $p$ the probability to do so then a typical total service time is the $\mathring\tau$ defined in 2.1.4. We repeat the definition:
\[
\mathring\tau = \sum_{k=1}^{G} \tau_k \quad\text{with geometrically distributed } G \geq 1 \text{ and iid } \tau, \tau_1, \tau_2, \dots
\]
The ring above the $\tau$ is a reference to the loop and the parameter $p$ is dropped in the notation.
We can now remodel the queue as one without feedback but with inter event times $\mathring\tau_1, \mathring\tau_2, \dots$.
Claim 5.1.11. Queue sizes of the single queue with inter event times $\tau, \tau_1, \tau_2, \dots$ and feedback and the GI/GI/1 with service times $\mathring\tau, \mathring\tau_1, \mathring\tau_2, \dots$ and without feedback have the same distribution.
Proof of 5.1.11 by coupling: Let $\tau_1, \tau_2, \dots$ be the iid sequence of service times and $r, r_1, r_2, \dots$ the sequence of routing decisions with $r \in \{0,1\}$, where $r = 1$ represents a customer who has just finished service rejoining the feedback queue.
We model an associated queue without feedback the following way: With the sequence of routing decisions associate a sequence $G_1, G_2, \dots$ that counts the length of runs of 1-s:
\[
G_1 = 1 + \max\{j \geq 1 \mid 1 = r_1 = \dots = r_j\}
\]
\[
K(i) := \sum_{k=1}^{i-1} G_k, \qquad G_i = 1 + \max\{j \geq 1 \mid 1 = r_{K(i)+1} = \dots = r_{K(i)+j}\}
\]
(with $\max\emptyset = 0$). Then the $G_1, G_2, \dots$ are iid geometric and $\mathring\tau_i = \sum_{g=1}^{G_i} \tau_{K(i)+g}$ defines service times of the associated queue without feedback.
Starting from one set of sequences $\tau_1, \tau_2, \dots$ and $r_1, r_2, \dots$ we have customers leaving the respective queues with or without feedback at exactly the same epochs: at $\sum_{j=1}^{k} \tau_j$ with a $k$ such that $r_k = 0$. With the same arrival process both queues are always of the same size.
□ 5.1.11
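The coupling in the proof of 5.1.11 amounts to summing service times over each run of feedback decisions. A minimal sketch follows; the function name and the sample sequences are illustrative, and integer service times are used only so that the comparison is exact. It ignores idle periods of the server, as the departure epochs in the proof do.

```python
from itertools import accumulate

def merge_feedback(taus, rs):
    """Merged service times: each run r = 1, ..., 1, 0 is collapsed into one service time.

    A trailing run that has not yet ended with r = 0 is dropped.
    """
    merged, current = [], 0
    for tau, r in zip(taus, rs):
        current += tau
        if r == 0:              # customer leaves: the merged service time is complete
            merged.append(current)
            current = 0
    return merged

taus = [2, 1, 3, 1, 4]          # individual service times
rs = [1, 0, 0, 1, 1]            # 1 = rejoin the queue, 0 = leave
merged = merge_feedback(taus, rs)

# Departure epochs with feedback: partial sums of taus at indices k with r_k = 0.
epochs_feedback = [s for s, r in zip(accumulate(taus), rs) if r == 0]
# Departure epochs without feedback: partial sums of the merged service times.
epochs_merged = list(accumulate(merged))
```

Both lists of epochs coincide, which is the pathwise statement behind the equality in distribution.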
The same remodelling of inter event times can be applied to a queue in a network. If at node $i$ customers are routed internally wrt $(\alpha_1,\dots,\alpha_d)^T$ with $\alpha_i > 0$ we can change inter event times at node $i$ from $\tau^{(i)}$ to $\mathring\tau^{(i)}$ (with $p = \alpha_i$) and routing to $p^{(i)} = (p_{i1},\dots,p_{id})^T$ with $p_{ii} = 0$, $p_{ij} = \frac{\alpha_j}{1-\alpha_i}$.
5.2 Deterministic descriptions of stochastic
networks
In this section we focus on the deterministic rates of a stochastic network.
This section generally applies to stochastic networks with deterministic rates,
including the generalised Jackson network. In terms of only the rates the
Jackson and the generalised Jackson network are not distinguishable.
Definition 5.2.1 (Flow). If $(\lambda, \mu, P)$ are the deterministic rates of an open stochastic network then
\[
\nu = \lambda + P^T \min\{\nu, \mu\}
\]
is called the traffic equation (in $\nu$) and its unique solution is denoted the flow in the network.
Uniqueness of $\nu$ is proved in [11]. Given the network flow one can decide whether or not the network is ergodic.
Definition 5.2.2 (Ergodic network). A network with flow $\nu$ with $\nu_i < \mu_i$ for all $i$ is called ergodic.
From ergodicity we get the existence of an equilibrium distribution for the number of customers in each queue of the Jackson network. We have seen this in the introduction. This implies that in the limit of the scaling 3.4.1 all queue sizes will uniformly on compacts stay small: The a.s. limit of the network process is the function $t \mapsto 0$. That is, we know the deterministic limiting behaviour, the expected behaviour. We expect the same for the generalised Jackson network.
An easy to check criterion for ergodicity is: Calculate the flow rates as if all service rates were $= \infty$,
\[
\nu = \lambda + P^T \min\{\nu, \infty\} = \lambda + P^T\nu \ \Leftrightarrow\ \nu = (\mathrm{id} - P^T)^{-1}\lambda,
\]
and check if $\nu_i < \mu_i$ for all nodes $i$ of the network. If not, the network is not ergodic. This can be written as a (coordinate wise) "equilibrium inequality"
\[
(\mathrm{id} - P^T)^{-1}\lambda < \mu \qquad (5.1)
\]
If a network is not ergodic solving the traffic equation is still important. For a nice way to solve the traffic equation see [11] of Jonathan Goodman and William Massey where they give an algorithm to calculate $\nu$ with at most $d$ matrix vector multiplications in $\mathbb{R}^d$.
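For small networks the traffic equation can also be solved by plain fixed-point iteration. The sketch below is not the algorithm of [11]; it simply iterates $\nu \leftarrow \lambda + P^T\min\{\nu,\mu\}$ starting from $\nu = 0$, assuming the routing matrix enters transposed (flow into node $j$ is $\lambda_j + \sum_i p_{ij}\min\{\nu_i,\mu_i\}$), which reproduces the flows stated in example 5.2.8. The function name and tolerances are illustrative.

```python
def solve_traffic(lam, mu, P, tol=1e-12, max_iter=10_000):
    """Iterate nu <- lam + P^T min(nu, mu) from nu = 0 until successive iterates agree."""
    d = len(lam)
    nu = [0.0] * d
    for _ in range(max_iter):
        served = [min(nu[i], mu[i]) for i in range(d)]
        new = [lam[j] + sum(P[i][j] * served[i] for i in range(d)) for j in range(d)]
        if max(abs(a - b) for a, b in zip(new, nu)) < tol:
            return new
        nu = new
    raise RuntimeError("fixed-point iteration did not converge")

# Rates of the four-node network of example 5.2.8.
P = [[0, 0.5, 0.5, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 0.5],
     [0.25, 0.25, 0, 0]]
mu = [3, 4, 4, 3]

nu_ergodic = solve_traffic([1, 1, 0, 0], mu, P)      # approx (4/3, 2, 8/3, 4/3)
nu_overloaded = solve_traffic([2, 2, 0, 0], mu, P)   # approx (2.5, 3.75, 5, 2)
is_ergodic = all(n < m for n, m in zip(nu_ergodic, mu))
```

The iteration is monotone from below and converges geometrically here because the routing matrix is strictly substochastic; for large networks the finite algorithm of [11] is the better choice.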
Definition 5.2.3 (Traffic intensity). In a network with flow $\nu$ and service rates $\mu$ the traffic intensity $\rho_i$ at the $i$-th node is defined as $\rho_i = \frac{\nu_i}{\mu_i}$.
Definition 5.2.4. In a network with traffic intensity $\rho$ we say node $i$ is
• a bottleneck if $\rho_i \geq 1$
• a strict bottleneck if $\rho_i > 1$
• ergodic if $\rho_i < 1$.
Definition 5.2.5 (Loss rate). In a stochastic network with deterministic rates $(\lambda, \mu, P)$ and flow $\nu$ define the loss rate $y \in \mathbb{R}^d_{\geq 0}$
\[
y := \max\{0, \mu - \nu\}. \qquad (5.2)
\]
For ergodic nodes we have $y_i = \mu_i - \nu_i$ and in an ergodic network
\[
y = \mu - (\mathrm{id} - P^T)^{-1}\lambda \quad (\in \mathbb{R}^d_{>0}) \qquad (5.3)
\]
Definition 5.2.6 (Free drift, equilibrium network drift). In a stochastic network with deterministic rates $(\lambda, \mu, P)$ define the free drift of the network as
\[
\lambda + P^T\mu - \mu.
\]
Additionally, for $\nu$ the flow of the network (cf 5.2.1) define the network drift as the coordinate wise maximum
\[
\max\{0, \lambda + P^T\min\{\nu, \mu\} - \mu\}.
\]
Remark 5.2.7.
• The equilibrium drift of an ergodic network is 0.
• The network drift can equivalently be expressed as $\max\{0, \nu - \mu\}$, since $\nu$ solves the traffic equation.
• $\nu_i - \mu_i = \mu_i(\rho_i - 1)$ for each $i = 1,\dots,d$ from expressing $\nu_i$ in terms of $\rho_i$.
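The following is an example on rates and drifts in a network with fixed routing probabilities and service resources at the nodes. The network drift is qualitatively different for arrival processes with different rates $\lambda$.
Example 5.2.8. Consider the network of figure 5.1 with $\lambda, \mu, P$ where
\[
\mu = \begin{pmatrix} 3 \\ 4 \\ 4 \\ 3 \end{pmatrix}, \qquad
P = \begin{pmatrix}
0 & \frac{1}{2} & \frac{1}{2} & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & \frac{1}{2} \\
\frac{1}{4} & \frac{1}{4} & 0 & 0
\end{pmatrix}.
\]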
[Figure 5.2: Realisation of the queue sizes in the networks of example 5.2.8. Panels: "Ergodic network" and "Non-ergodic network, bottleneck node 3 (yellow)".]
Then the free and network drift and the set of bottlenecks depend on the arrival rates:

arrival rates              free drift                         network drift                                                              bottlenecks
$\lambda = (1,1,0,0)^T$    $\frac{1}{4}(-5,-3,6,-4)^T$        $0$, since $(\mathrm{id}-P^T)^{-1}\lambda = \frac{1}{3}(4,6,8,4)^T < \mu$     none
$\lambda = (2,2,0,0)^T$    $\frac{1}{4}(-1,1,6,-4)^T$         $\max\{(2.5,\,3.75,\,5,\,2)^T - \mu,\ 0\} = (0,0,1,0)^T$                     3 (strict)
Figure 5.2 gives two simulations of the network of example 5.2.8 and for
the different arrival rates. Note the different scales for the y-axis: In the
ergodic network all queues stay small, in the non-ergodic network the queue
size at the bottleneck grows.
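The entries of the table in example 5.2.8 can be recomputed directly from definitions 5.2.4 to 5.2.6. The sketch below does this for the second row, $\lambda = (2,2,0,0)$, taking the flow $\nu = (2.5, 3.75, 5, 2)$ from the table as given; exact fractions are used so the comparison is free of rounding. Function and variable names are illustrative, and the routing matrix is assumed to enter transposed.

```python
from fractions import Fraction as Fr

P = [[0, Fr(1, 2), Fr(1, 2), 0],
     [0, 0, 1, 0],
     [0, 0, 0, Fr(1, 2)],
     [Fr(1, 4), Fr(1, 4), 0, 0]]
mu = [3, 4, 4, 3]

def free_drift(lam, mu, P):
    """lambda + P^T mu - mu, coordinate by coordinate."""
    d = len(mu)
    return [lam[j] + sum(P[i][j] * mu[i] for i in range(d)) - mu[j] for j in range(d)]

def network_quantities(nu, mu):
    """Network drift max{0, nu - mu}, loss rate max{0, mu - nu}, bottlenecks (1-based)."""
    drift = [max(0, n - m) for n, m in zip(nu, mu)]
    loss = [max(0, m - n) for n, m in zip(nu, mu)]
    bottlenecks = [i + 1 for i, (n, m) in enumerate(zip(nu, mu)) if n >= m]
    return drift, loss, bottlenecks

lam = [2, 2, 0, 0]
nu = [Fr(5, 2), Fr(15, 4), 5, 2]          # flow from the table of example 5.2.8
drift_free = free_drift(lam, mu, P)
drift_net, loss, bottlenecks = network_quantities(nu, mu)
```

Here `drift_free` equals $\frac14(-1, 1, 6, -4)$, `drift_net` equals $(0,0,1,0)$ and node 3 is the only bottleneck, matching the second row of the table.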
In the following two subsections we interpret the solution of the traffic equation as actual flow, and we will look at non-ergodic networks and investigate ergodic subnetworks.
5.2.1 Fluid network
We generally look at a network that is travelled by customers or packages,
things that can be counted. We will now look at an associated sewer like
network model.
[Figure 5.3: The fluid network of definition 5.2.9 associated with the network of figure 5.1. Pipes are labelled with their capacities $\mu_1p_{12}$, $\mu_1p_{13}$, $\mu_2p_{23}$, $\mu_3p_{34}$, $\mu_3p_{30}$, $\mu_4p_{41}$, $\mu_4p_{42}$, $\mu_4p_{40}$; a dashed arrow at a node represents an outlet.]
Definition 5.2.9 (Fluid model associated with a stochastic network). Given a stochastic network with deterministic rates $(\lambda, \mu, P)$ the associated deterministic fluid model is based on the same graph where edges now represent pipes:
• At node $i$ with $\lambda_i > 0$ some kind of fluid flows into the network at rate $\lambda_i$.
• The edge / pipe $(i,j)$ has capacity $\mu_ip_{ij}$. The joint capacity of edges leaving from node $i$ is $\mu_i = \sum_{j=0}^{d} \mu_ip_{ij}$.
• At each node there is an outlet.
• Propagation: If at node $i$ the sum of incoming flow is strictly less than $\mu_i$ then the incoming flow is divided into outgoing flow to nodes $1,\dots,d$ and the outside world according to $p_{i1},\dots,p_{id}$ and $p_{i0}$. Otherwise all outgoing pipes / edges will get flow equal to their capacity and the non-negative surplus leaves the network through the outlet.
In this setting the solution $\nu$ to the traffic equation is the actually observed equilibrium flow in the network: $\nu_i$ is the total flow into node $i$ and $\min\{\mu_i, \nu_i\}p_{ij}$ is the flow through the edge / pipe connecting nodes $i$ and $j$. In a non-ergodic network there is a node where at rate $\nu_i - \mu_i \geq 0$ fluid leaves the network through the outlet. All nodes that can route all incoming flow to nodes $j \in \mathrm{supp}(p^{(i)})$ and still have some spare capacity left are ergodic nodes.
5.2.2 Subnetworks
Distinguishing bottlenecks and ergodic nodes in a network allows us to find an ergodic subnetwork. We can move the bottleneck nodes to the outside world and calculate flow rates in the network of remaining nodes. In the flow network moving node $i$ "to the outside world" means removing node $i$ from the network and increasing, for all nodes $j$ in the support of $p^{(i)}$, the arrival rate from $\lambda_j$ to $\lambda_j + \mu_ip_{ij}$. From the intuition of the flow network it is clear that flow rates in the ergodic subnetwork remain the same. However, we do the explicit calculations.
The network starting empty
Let $\nu$ be the flow of the network with rates $(\lambda, \mu, P)$ as defined in 5.2.1. Partition nodes into two sets: the ergodic nodes $E$ and the bottleneck nodes $B$ as defined in 5.2.4.
Claim 5.2.10. In a network with rates $(\lambda, \mu, P)$ and flow $\nu$ partition nodes into ergodic nodes $E$ and bottleneck nodes $B$ and consider the subnetwork of nodes $E$ with rates
\[
\lambda_E + (P^T)_{EB}\,\mu_B, \quad \mu_E, \quad P_E \qquad (5.4)
\]
Then $\nu_E = [\nu_i]_{i\in E}$ is the flow in this subnetwork.
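Proof of 5.2.10: For the network of all $d$ nodes the following initial statement is true.
\[
\nu = \lambda + P^T\min\{\nu, \mu\}
\]
For any partition this is equivalent to
\[
\begin{aligned}
\nu_B &= \lambda_B + P^T_B\min\{\nu_B, \mu_B\} + (P^T)_{BE}\min\{\nu_E, \mu_E\} \\
\nu_E &= \lambda_E + (P^T)_{EB}\min\{\nu_B, \mu_B\} + P^T_E\min\{\nu_E, \mu_E\}
\end{aligned}
\]
and for our partition to
\[
\begin{aligned}
\nu_B &= \lambda_B + P^T_B\,\mu_B + (P^T)_{BE}\,\nu_E \\
\nu_E &= \lambda_E + (P^T)_{EB}\,\mu_B + P^T_E\,\nu_E
\end{aligned}
\qquad (5.5)
\]
The second equation of (5.5) can be rearranged:
\[
\nu_E = \lambda_E + (P^T)_{EB}\,\mu_B + P^T_E\,\nu_E
= \underbrace{\lambda_E + (P^T)_{EB}\,\mu_B}_{\text{arrivals to the ergodic subnetwork}} + P^T_E\underbrace{\min\{\nu_E, \mu_E\}}_{=\nu_E}
\]
So the $\nu_E$ we extracted coordinate wise from the flow of the network of all $d$ nodes, bottleneck and ergodic, is the solution of the traffic equation in the remodelled smaller network of ergodic nodes with rates (5.4).
□ 5.2.10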
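Claim 5.2.10 can be checked on the non-ergodic case of example 5.2.8, where node 3 is the only bottleneck. The sketch below evaluates the residual of the subnetwork traffic equation with the augmented arrival rates $\lambda_E + (P^T)_{EB}\mu_B$ at $\nu_E$; nodes are indexed from 0, the flow is taken from the example, and the helper name is illustrative.

```python
from fractions import Fraction as Fr

P = [[0, Fr(1, 2), Fr(1, 2), 0],
     [0, 0, 1, 0],
     [0, 0, 0, Fr(1, 2)],
     [Fr(1, 4), Fr(1, 4), 0, 0]]
mu = [3, 4, 4, 3]
lam = [2, 2, 0, 0]
nu = [Fr(5, 2), Fr(15, 4), 5, 2]      # flow of the full network (example 5.2.8)

E = [0, 1, 3]                         # ergodic nodes 1, 2, 4
B = [2]                               # bottleneck node 3

def subnetwork_residual(lam, mu, P, nu, E, B):
    """nu_E - (lam_E + (P^T)_{EB} mu_B) - P_E^T min(nu_E, mu_E), one entry per node in E."""
    residual = []
    for j in E:
        arrivals = lam[j] + sum(P[b][j] * mu[b] for b in B)       # external plus bottleneck output
        inflow = sum(P[i][j] * min(nu[i], mu[i]) for i in E)      # flow from other ergodic nodes
        residual.append(nu[j] - arrivals - inflow)
    return residual

augmented_arrivals = [lam[j] + sum(P[b][j] * mu[b] for b in B) for j in E]
residual = subnetwork_residual(lam, mu, P, nu, E, B)
```

The residual vanishes in every coordinate, so $\nu_E$ solves the traffic equation of the subnetwork; the bottleneck contributes only to the arrival rate of node 4.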
Let us now turn again to the bottleneck nodes.
Claim 5.2.11. In a network with rates $(\lambda, \mu, P)$ and flow $\nu$ partition nodes into ergodic nodes $E$ and bottleneck nodes $B$. Then the bottleneck nodes $B$ have equilibrium network drift
\[
\lambda_B + (P^T)_{BE}\,\nu_E + (P^T_B - \mathrm{id})\,\mu_B
\]
with $\nu_E = [\nu_i]_{i\in E}$ the flow in the ergodic subnetwork. The equilibrium network drift equals the free drift.
Proof of 5.2.11: We prove that
\[
\nu_B - \mu_B = \lambda_B + (P^T)_{BE}\,\nu_E + (P^T_B - \mathrm{id})\,\mu_B
\]
where the lhs is the network equilibrium drift by definition 5.2.6. We apply the same partitioning $\{1,\dots,d\} = E \cup B$ and solve the second equation of (5.5) for $\nu_E$,
\[
\nu_E = \lambda_E + (P^T)_{EB}\,\mu_B + P^T_E\,\nu_E \ \Leftrightarrow\ \nu_E = (\mathrm{id} - P^T_E)^{-1}\big(\lambda_E + (P^T)_{EB}\,\mu_B\big),
\]
and plug it into the first equation of (5.5). Subtraction of $\mu_B$ gives the claimed equilibrium drift, which is a free drift as in definition 5.2.6.
□ 5.2.11
We can also get an expression for $\nu_E$, $\nu_B$ (and then for the drift of bottleneck nodes) with only network primitives on the rhs:
\[
\nu_E = (\mathrm{id} - P^T_E)^{-1}\big(\lambda_E + (P^T)_{EB}\,\mu_B\big) \qquad (5.6)
\]
\[
\nu_B = \lambda_B + P^T_B\,\mu_B + (P^T)_{BE}(\mathrm{id} - P^T_E)^{-1}\big(\lambda_E + (P^T)_{EB}\,\mu_B\big) \qquad (5.7)
\]
We remember that the partition of nodes into sets $E$ and $B$ was such that $\nu_E < \mu_E$ and $\nu_B \geq \mu_B$ coordinate wise.
Another approach to finding the ergodic subnetwork is in Chen and Mandelbaum, [3] p. 411. LCP is short for "linear complementarity problem".
Definition 5.2.12 (LCP in $\mathbb{R}^d$). Let $x$, $P$ be given with $x \in \mathbb{R}^d$ and $P$ a substochastic matrix in $\mathbb{R}^{d\times d}$ and 1 not an eigenvalue of $P$. The linear complementarity problem is to find $(y, z)$ satisfying
\[
z = x + (\mathrm{id} - P^T)\,y, \qquad y, z \geq 0, \qquad \langle y, z\rangle = 0 \qquad (5.8)
\]
A solution to the LCP exists and is unique for any $x$ if all the principal minors of $\mathrm{id} - P$ are positive (cf [1] p. 271). This is what we get from 1 not being an eigenvalue of $P$. The following claim is from [3] p. 412.
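Claim 5.2.13. In a stochastic network with deterministic rates $(\lambda, \mu, P)$ and free drift $x = \lambda + P^T\mu - \mu$ let $(y, z)$ be the solution to the LCP in $\mathbb{R}^d$ for $(x, P)$. Then $z$ is the equilibrium network drift and $y$ is the loss rate.
Proof of 5.2.13:
• $y = \max\{0, \mu - \nu\} \geq 0$, $z = \max\{0, \nu - \mu\} \geq 0$
• $\langle y, z\rangle = \max\{0, \mu - \nu\}\cdot\max\{0, \nu - \mu\} = 0$
• to show $z = x + (\mathrm{id} - P^T)y$ we start with partitioning nodes $\{1,\dots,d\}$ into ergodic nodes $E$ and bottleneck nodes $B$. The network drift is then $\begin{pmatrix} 0 \\ \nu_B - \mu_B \end{pmatrix}$. We show that the drift equals $x + (\mathrm{id} - P^T)y$.
\[
\begin{aligned}
\big[x + (\mathrm{id} - P^T)y\big]_E
&= \lambda_E + \big[(P^T - \mathrm{id})\mu\big]_E + \big[(\mathrm{id} - P^T)y\big]_E \\
&= \lambda_E + \big[\,P^T_E - \mathrm{id} \ \ (P^T)_{EB}\,\big](\mu - y) \\
&= \lambda_E + (P^T_E - \mathrm{id})\underbrace{(\mu_E - y_E)}_{=\nu_E} + (P^T)_{EB}\,(\mu_B - \underbrace{y_B}_{=0}) \\
&\overset{(5.6)}{=} 0
\end{aligned}
\]
For the bottleneck nodes
\[
\begin{aligned}
\big[x + (\mathrm{id} - P^T)y\big]_B
&= \lambda_B + (P^T_B - \mathrm{id})\,(\mu_B - \underbrace{y_B}_{=0}) + (P^T)_{BE}\underbrace{(\mu_E - y_E)}_{=\nu_E} \\
&\overset{5.2.11}{=} \nu_B - \mu_B
\end{aligned}
\]
So we feed the LCP the free drift and get the loss rate and the equilibrium network drift. We can identify bottlenecks and ergodic nodes from the solution of the LCP by
• $i$ is a bottleneck if $z_i \geq 0$
• a strict bottleneck if $z_i > 0$
• an ergodic node if $z_i = 0$ and $y_i > 0$.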
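As a quick check of claim 5.2.13, the three LCP conditions can be verified for the non-ergodic case of example 5.2.8. The sketch below builds $x$ as the free drift, $y$ as the loss rate and $z$ as the network drift from the flow stated in the example, and then tests (5.8) with the routing matrix transposed; it verifies a candidate solution and is not an LCP solver. Variable names are illustrative.

```python
from fractions import Fraction as Fr

P = [[0, Fr(1, 2), Fr(1, 2), 0],
     [0, 0, 1, 0],
     [0, 0, 0, Fr(1, 2)],
     [Fr(1, 4), Fr(1, 4), 0, 0]]
mu = [3, 4, 4, 3]
lam = [2, 2, 0, 0]
nu = [Fr(5, 2), Fr(15, 4), 5, 2]      # flow from example 5.2.8
d = len(mu)

# Free drift x = lam + P^T mu - mu.
x = [lam[j] + sum(P[i][j] * mu[i] for i in range(d)) - mu[j] for j in range(d)]
# Candidate solution: loss rate y and network drift z.
y = [max(0, mu[i] - nu[i]) for i in range(d)]
z = [max(0, nu[i] - mu[i]) for i in range(d)]

# Right-hand side of z = x + (id - P^T) y.
rhs = [x[j] + y[j] - sum(P[i][j] * y[i] for i in range(d)) for j in range(d)]
complementarity = sum(yi * zi for yi, zi in zip(y, z))
nonnegative = all(v >= 0 for v in y + z)
```

All three conditions of (5.8) hold exactly: `rhs` equals `z`, both vectors are non-negative, and their inner product is zero.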
[Figure 5.4: The ergodic network of 5.2.8 with a queue starting non-empty. This may or may not change ergodic nodes to become bottlenecks.]
The network starting non-empty
Similar to observing the subnetwork of ergodic nodes, we can investigate a network where some nodes are initially non-empty. We would investigate the subnetwork of initially empty nodes and the initially non-empty nodes separately.
The reason why bottleneck nodes and full nodes are treated similarly is that bottlenecks tend to fill up if started empty (positive network drift), so the initial difference vanishes immediately. We see this in figure 5.5, which shows simulations of the non-ergodic network of example 5.2.8: qualitatively, the evolution of queue sizes is the same whether or not the bottleneck node (green) starts empty.
The following is a generalisation of the LCP to function space. It will lead to a definition of a unique network drift and loss rate in a network where some nodes may initially be non-empty. The Skorohod problem is generally investigated in the space $D([0,\infty), \mathbb{R}^d)$ of functions (cf [18], appendix D) but we only need it for (piecewise) linear input functions. In [3], section 5.2 p. 431 the Skorohod problem is stated under the name “oblique reflection mapping”. We give a simplified version:
Figure 5.5: The non-ergodic network of 5.2.8, the bottleneck node starting empty or not does not affect the qualitative behaviour of the remaining ergodic nodes.
Theorem 5.2.14 (Skorohod problem for linear input functions). If
• $P \in \mathbb{R}^{d \times d}$ is substochastic with $\rho(P) < 1$,
• $z_0 \in \mathbb{R}^d_{\ge 0}$, $\theta \in \mathbb{R}^d$
and $X : X(t) = z_0 + t\theta$, then there are unique functions $Y, Z$, continuously depending on $X$, such that
• $Z = X + (\mathrm{id} - P')Y$,
• $Y, Z \ge 0$,
• $Y(0) = 0$; $Z(0) = z_0$,
• $Y$ is coordinate-wise increasing and for each $i \in \{1,\dots,d\}$: $Y_i$ increases only at times $t \ge 0$ when $Z_i(t) = 0$.
We consider the Skorohod problem with the linear input function with the free network drift as slope.
Claim 5.2.15. Consider the Skorohod problem 5.2.14 for $P$, $X(t) = z_0 + t\,(\lambda + (P' - \mathrm{id})\mu)$. Then $Y, Z$ of the solution of the Skorohod problem are linear over $[0, T]$ for some $T > 0$.
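Proof of 5.2.15: Let $\Lambda = \{i \mid z_{0,i} > 0\}$ and $\nu^\Lambda$ the flow in the $\Lambda^c$-subnetwork with rates
$$(\lambda^\Lambda, \mu_{\Lambda^c}, P_{\Lambda^c}), \qquad \lambda^\Lambda = \lambda_{\Lambda^c} + (P')_{\Lambda^c \Lambda}\,\mu_\Lambda.$$
Let $K = \{i \mid z_{0,i} > 0 \text{ or for } i \in \Lambda^c : \nu^\Lambda_i > \mu_i\}$ and consider the subnetwork of $K^c$-nodes with rates
$$(\lambda^K, \mu_{K^c}, P_{K^c}), \qquad \lambda^K = \lambda_{K^c} + (P')_{K^c K}\,\mu_K.$$
Note that $\nu^\Lambda_i = \nu^K_i$ for $i \in K^c$ by 5.2.10 and that $K^c$ nodes are ergodic or non-strict bottlenecks: the flow $\nu^K$ satisfies $\nu^K_i \le \mu_i$ for all $i \in K^c$.
Rearrange nodes such that $\Lambda = \{1,\dots,|\Lambda|\}$ and $K = \{1,\dots,|K|\}$ (this works since $K \supseteq \Lambda$ in the above construction).
Now set
• $z_{K^c}$ to the network drift of the $K^c$ subnetwork and $\tilde y \in \mathbb{R}^{|K^c|}$ to the loss rates:
$$z_{K^c} = \lambda_{K^c} + (P')_{K^c K}\,\mu_K + (P'_{K^c} - \mathrm{id})(\mu_{K^c} - \tilde y),$$
which is $= 0$ since $K^c$ nodes are no strict bottlenecks. Equivalently let $(z_{K^c}, \tilde y)$ be the solution of the LCP in $\mathbb{R}^{|K^c|}$ with $(x, P)$ of definition 5.2.12 as
$$x = \lambda_{K^c} + (P')_{K^c K}\,\mu_K + (P'_{K^c} - \mathrm{id})\,\mu_{K^c}, \qquad P = P_{K^c};$$
• $z_K$ to the drift of $K$-nodes: $z_K = \lambda_K + (P')_{K K^c}\,\nu^K + (P'_K - \mathrm{id})\,\mu_K$.
Note that $z_i$ may be negative for $i \in \Lambda \subseteq K$.
Then let $Z(t) = z_0 + tz$ and $Y(t) = ty$ with $y = \binom{0}{\tilde y}$. Using $\nu^K = \mu_{K^c} - \tilde y$ we rearrange
$$\begin{aligned}
z = \begin{pmatrix} z_K \\ z_{K^c} \end{pmatrix}
&= \lambda + \begin{pmatrix} (P'_K - \mathrm{id})\,\mu_K + (P')_{K K^c}(\mu_{K^c} - \tilde y) \\ (P')_{K^c K}\,\mu_K + (P'_{K^c} - \mathrm{id})(\mu_{K^c} - \tilde y) \end{pmatrix} \\
&= \lambda + (P' - \mathrm{id})(\mu - y) \\
&= \lambda + (P' - \mathrm{id})\,\mu + (\mathrm{id} - P')\,y.
\end{aligned}$$
We have
• $i \in K^c \Rightarrow z_{0,i} = 0,\ z_i = 0$,
• $i \in K \setminus \Lambda \Rightarrow z_{0,i} = 0,\ z_i > 0$,
• $i \in \Lambda \Rightarrow z_{0,i} > 0,\ z_i \in \mathbb{R}$,
which implies that there is $T > 0$ such that $z_0 + Tz \ge 0$ coordinatewise. We now give the explicit solution of the Skorohod problem on $[0, T]$:
$$\begin{aligned}
Y(t) &= t\,y, \\
Z(t) &= z_0 + tz \\
&= z_0 + t\,\big(\lambda + (P' - \mathrm{id})\mu + (\mathrm{id} - P')y\big) \\
&= z_0 + t\,\big(\lambda + (P' - \mathrm{id})\mu\big) + (\mathrm{id} - P')\,t\,y \\
&= X(t) + (\mathrm{id} - P')\,Y(t).
\end{aligned}$$
5.2.15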
Claim 5.2.15 can be generalised to arbitrary $T$ and piecewise linear $Y, Z$.
Definition 5.2.16 (Network drift and loss rate, non-empty network). Consider a stochastic network with deterministic rates $(\lambda, \mu, P)$. Let $z_0 \in \mathbb{R}^d_{\ge 0}$ and consider the Skorohod problem with
• $P$,
• $z_0$,
• $X(t) = z_0 + t\,(\lambda + (P' - \mathrm{id})\mu)$.
If linear processes $Z(t) = z_0 + tz$, $Y(t) = ty$ are the solution of this Skorohod problem then define
• $z$ as the network drift,
• $y$ as the loss rate.
For well-definedness note that from the Skorohod problem $z_{0,i} = 0 \Rightarrow y_i \ge 0$ and $z_{0,i} > 0 \Rightarrow y_i = 0$. The network drift and loss rate are unique by uniqueness of the solution to the Skorohod problem.
Remark 5.2.17. If a network with rates $(\lambda, \mu, P)$ starts in $z_0$ and $\Lambda = \{i \mid z_{0,i} > 0\}$ is the set of indices of queues starting non-empty and $z$ is the network drift then
• $z_\Lambda = [z_i]_{i \in \Lambda}$ is the free drift of the initially non-empty nodes $i \in \Lambda$;
• $z_{\Lambda^c} = [z_i]_{i \in \Lambda^c}$ is the equilibrium drift of the subnetwork of $\Lambda^c$-nodes, the initially empty nodes.
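The description in remark 5.2.17 suggests a direct way to compute the network drift for a given starting point. The sketch below is an illustration under assumptions, not the construction used in the proof of 5.2.15: initially non-empty nodes are treated as serving at full rate $\mu_i$, the remaining nodes satisfy traffic equations of the assumed form $\nu = \lambda + P'(\text{outflow})$, and the horizon $T$ is taken as the first time a free coordinate of $z_0 + tz$ reaches zero. All names are hypothetical.

```python
def subnetwork_flow(lam, mu, P, free, iters=10_000, tol=1e-12):
    """Flow into each node when the nodes in `free` always serve at rate mu."""
    d = len(lam)
    nu = [0.0] * d
    for _ in range(iters):
        out = [mu[j] if j in free else min(nu[j], mu[j]) for j in range(d)]
        new = [lam[i] + sum(P[j][i] * out[j] for j in range(d)) for i in range(d)]
        converged = max(abs(a - b) for a, b in zip(new, nu)) < tol
        nu = new
        if converged:
            break
    return nu


def network_drift(lam, mu, P, z0):
    """Return (z, y, T): drift, loss rate and a horizon on which they are valid."""
    d = len(lam)
    free = {i for i in range(d) if z0[i] > 0}  # the set Lambda
    nu = subnetwork_flow(lam, mu, P, free)
    # Only initially empty nodes can lose service capacity.
    y = [0.0 if i in free else max(0.0, mu[i] - nu[i]) for i in range(d)]
    served = [mu[i] - y[i] for i in range(d)]
    # z = lam + (P' - id)(mu - y)
    z = [lam[i] + sum(P[j][i] * served[j] for j in range(d)) - served[i]
         for i in range(d)]
    horizon = min((z0[i] / -z[i] for i in free if z[i] < 0), default=float("inf"))
    return z, y, horizon


# Hypothetical tandem network, once with node 0 starting non-empty, once empty.
lam, mu, P = [1.0, 0.0], [2.0, 0.5], [[0.0, 1.0], [0.0, 0.0]]
z_full, y_full, T_full = network_drift(lam, mu, P, [1.0, 0.0])
z_empty, y_empty, T_empty = network_drift(lam, mu, P, [0.0, 0.0])
```

With node 0 starting non-empty the drift coincides with the free drift until node 0 drains; with both nodes empty it coincides with the equilibrium drift obtained from the LCP.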
5.3 Processes
We now move from vectors of real numbers to sample paths. This section is about the different kinds of processes we need to describe the behaviour of the network, especially with reference to it starting non-empty and being non-ergodic.
We define stochastic processes to analyse the generalised Jackson network in terms of large deviations. We will emphasise the changes of measure that maintain desirable properties of the processes.
5.3.1 The free process
We describe the free process associated with a generalised Jackson network
and investigate its large deviation behaviour. The free process is interesting
since it behaves like a network in a certain way and will be easy to analyse in
terms of large deviations. We gain valuable insights we will apply to obtain
the local large deviations of the generalised Jackson network.
We apply results from chapter 4 on the sample path large deviations of the network primitives to obtain, relatively easily, a sample path large deviation principle for the free process. We present it in 5.3.13. Similarly, the change of measure for the free process can be given easily, relying on chapter 4 and independence of the network primitives. We define and interpret a change of measure in 5.3.14.
Given a generalised Jackson network of $d$ nodes with primitive processes of 5.1.5 we make some definitions:
Definition 5.3.1 (Vectorial arrival process). In a stochastic network with $d$ nodes and network primitives 5.1.5, where we denote the arrival processes $A^{(1)},\dots,A^{(d)}$, set
$$A = \begin{pmatrix} A^{(1)} \\ \vdots \\ A^{(d)} \end{pmatrix}.$$
Definition 5.3.2. In a stochastic network with $d$ nodes and network primitives 5.1.5 we define, for the $i$-th service process $S^{(i)}$ and the $i$-th process of routing decisions $n \mapsto \sum_{k=1}^n r^{(i)}_k$, the vector-valued process
$$\mathbf{S}^{(i)} : t \mapsto \sum_{k=1}^{S^{(i)}(t)} r^{(i)}_k - e_i\,S^{(i)}(t).$$
Remark 5.3.3. Consider the primitive service process $S^{(i)}$ and the process $S^{(i)sp}$ split wrt $S^{(i)}$ and $(p_{i1},\dots,p_{id}, 1 - \sum_{j=1}^d p_{ij})$ as in 4.4.1. For $T^{(i)}$ of definition 4.4.7 we have
$$T^{(i)} S^{(i)sp} = T^{(i)} \begin{pmatrix} S^{(i1)} \\ \vdots \\ S^{(id)} \\ S^{(i,d+1)} \end{pmatrix} = -e_i\,S^{(i)} + \sum_{k=1}^d e_k\,S^{(ik)} = \mathbf{S}^{(i)}$$
and for this process we have developed the sample path large deviations in section 4.4.3.
We will refer to S(i)as a renewal counting process, too. Its coordinates
count customers leaving node iand customers moving from node ito the
other network nodes and times between changes of state are independent.
We now give a definition for the network primitives that is equivalent to that
in 5.1.5
Definition 5.3.4 (Network primitives II). If for $i = 1,\dots,d$
$$A^{(i)}, \quad S^{(i)}, \quad n \mapsto \sum_{k=1}^n r^{(i)}_k$$
are the network primitives of a generalised Jackson network as defined in 5.1.5, and $A$ is defined as in 5.3.1 and $\mathbf{S}^{(i)}$ for $i = 1,\dots,d$ as in 5.3.2, then
$$A, \ \mathbf{S}^{(1)}, \dots, \mathbf{S}^{(d)}$$
represent the same generalised Jackson network. These processes will be denoted the network's primitives.
In [3], section 2.7 these processes together with the initial starting point $Z(0)$ of the network are denoted the network's primitives. In a generalised Jackson network the primitives are independent. With these we define the free process.
Definition 5.3.5 (Free process). $X = A + \sum_{i=1}^d \mathbf{S}^{(i)}$.
The process $X$ increases in its $i$-th coordinate at the occurrence of an event of the arrival process $A^{(i)}$. When $\mathbf{S}^{(i)}$ changes state we can interpret this as a customer leaving node $i$ and being routed to some other network node or leaving the network. This change of state also happens in the free process $X$. In these aspects $X$ describes the stochastic network well. However, the free process does not suitably describe a stochastic network because coordinates of $X$ may take negative values.
The free process is nice to work with: we will develop its large deviations in the following sections. Before that we want to connect the free process with the free drift defined in section 5.2.
Claim 5.3.6. Network primitives in a generalised Jackson network converge
a.s uniformly on compacts with a linear limit.
Proof of 5.3.6: This follows as a functional strong law of large numbers for
the renewal counting processes, cf 7.1.3 of the appendix, or from the sample
path large deviations proved in chapter 4.
5.3.6
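Definition 5.3.7 (Drift of $\mathbf{S}^{(i)}$). $T^{(i)} \mu_i\,p^{(i)} = \mu_i\,(-e_i + p^{(i)})$.
Claim 5.3.8. The free drift $\lambda + (P' - \mathrm{id})\mu$ defined in 5.2.6 is the drift of the free process.
Proof of 5.3.8: The drift of the process is its a.s. limit under the scaling and wrt the supnorm over some compact interval.
$$\lim_{t\to\infty} \frac{1}{t} X_t = \lambda + \sum_{i=1}^d \lim_{t\to\infty} \frac{1}{t}\,\mathbf{S}^{(i)}_t = \lambda + \sum_{i=1}^d \mu_i\,(-e_i + p^{(i)}) = \lambda + (P' - \mathrm{id})\,\mu$$
5.3.8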
For the network primitives and the free process the rates of section 5.2 de-
scribe the expected behaviour of the process in the limit of the scaling 3.4.1.
Lmgf for free process
We calculate the lmgf $\Psi$ of the free process $X$ and prove its strict convexity.
Definition 5.3.9 (Lmgf $\Psi$ of the free process). If $X$ is the free process built from the network primitives of a generalised Jackson network as defined in 5.1.5, 5.3.4 then the lmgf of $X$ is
$$\Psi(\theta) = \lim_{t\to\infty} \frac{1}{t} \log E\big[e^{\langle\theta, X_t\rangle}\big] = \sum_{i=1}^d \Gamma^{(i)}_A(\theta_i) + \Gamma^{(i)}_S\big(-\theta_i + \Xi^{(i)}(\theta)\big).$$
For the form of the lmgf we have applied independence of network primitives and the form of the lmgf of $\mathbf{S}^{(i)}$, in the representation of 5.3.3, of claim 4.4.9:
$$E\big[e^{\langle\theta, X_t\rangle}\big] = \prod_{i=1}^d E\big[e^{\theta_i A^{(i)}_t}\big]\, E\big[e^{\langle\theta, \mathbf{S}^{(i)}_t\rangle}\big].$$
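Claim 5.3.10. $D(\Psi) = \mathbb{R}^d$ and $\Psi$ is strictly convex.
Proof of 5.3.10: Finiteness of $\Psi$ is a direct consequence of finiteness of all involved $\Gamma$'s and $\Xi$'s. Convexity of $\Psi$ is immediate since it is the limit (in $t$) of strictly convex (in $\theta$) functions (cf 2.2.8). In this approach strictness is lost in the limit, so we choose another: Assume all network primitives are non-deterministic. Let $\alpha, \beta \in \mathbb{R}^d$, $\alpha \ne 0$. We show that along the half line $\{\beta + c\alpha \mid c \ge 0\}$ the function $c \mapsto \Psi(\beta + c\alpha)$ is strictly convex.
We calculate derivatives in direction $\alpha$.
$$\frac{d}{dc}\Psi(\beta + c\alpha) = \sum_{i=1}^d \alpha_i\,\Gamma^{(i)\prime}_A(\beta_i + c\alpha_i) + \Gamma^{(i)\prime}_S\big(-(\beta + c\alpha)_i + \Xi^{(i)}(\beta + c\alpha)\big)\Big(-\alpha_i + \frac{d}{dc}\Xi^{(i)}(\beta + c\alpha)\Big)$$
$$\begin{aligned}
\frac{d^2}{dc^2}\Psi(\beta + c\alpha) = \sum_{i=1}^d\ & \alpha_i^2\,\Gamma^{(i)\prime\prime}_A(\beta_i + c\alpha_i) \\
&+ \Gamma^{(i)\prime\prime}_S\big(-(\beta + c\alpha)_i + \Xi^{(i)}(\beta + c\alpha)\big)\Big(-\alpha_i + \frac{d}{dc}\Xi^{(i)}(\beta + c\alpha)\Big)^2 \\
&+ \Gamma^{(i)\prime}_S\big(-(\beta + c\alpha)_i + \Xi^{(i)}(\beta + c\alpha)\big)\,\frac{d^2}{dc^2}\Xi^{(i)}(\beta + c\alpha)
\end{aligned} \tag{5.9}$$
where the $\Gamma''$ are non-negative (cf 2.2.5). Convexity of the $\Xi^{(i)}$ follows from their definition 4.4.8 (and convexity of $K^{(i)}$ and linearity of $T$) or alternatively from the following application of Jensen's inequality.
$$\begin{aligned}
\frac{d^2}{dc^2}\Xi^{(i)}(\beta + c\alpha) &= \frac{d}{dc}\,\frac{\sum_{j=1}^d \alpha_j\,p_{ij} \exp\{(\beta + c\alpha)_j\}}{\exp\{\Xi^{(i)}(\beta + c\alpha)\}} \\
&= \sum_{j=1}^d \alpha_j^2\,\frac{p_{ij} \exp\{(\beta + c\alpha)_j\}}{\exp\{\Xi^{(i)}(\beta + c\alpha)\}} - \Bigg(\sum_{j=1}^d \alpha_j\,\frac{p_{ij} \exp\{(\beta + c\alpha)_j\}}{\exp\{\Xi^{(i)}(\beta + c\alpha)\}}\Bigg)^2 \ \ge\ 0
\end{aligned} \tag{5.10}$$
We have (5.10) $= 0$ for $p^{(i)}$ a point measure (when routing from node $i$ is deterministic) or when the $\alpha_1,\dots,\alpha_d$ are constant on the support of $p^{(i)}$.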
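The identity (5.10) can be checked numerically for a single routing row. The sketch below assumes that $\exp\{\Xi^{(i)}(\theta)\} = \sum_j p_{ij} e^{\theta_j} + (1 - \sum_j p_{ij})$, which is consistent with the quotient in (5.10) but is not restated in this section; the routing row, $\beta$ and $\alpha$ are made-up values. It compares finite-difference derivatives of $c \mapsto \Xi^{(i)}(\beta + c\alpha)$ with the mean and variance of $\alpha$ under the exponentially twisted routing distribution.

```python
import math


def xi(p_row, theta):
    """Xi(theta) = log(sum_j p_j e^{theta_j} + exit mass); assumed form."""
    exit_mass = 1.0 - sum(p_row)
    return math.log(sum(p * math.exp(t) for p, t in zip(p_row, theta)) + exit_mass)


def twisted_moments(p_row, theta, alpha):
    """Mean and variance of alpha under the twisted routing distribution.

    Leaving the network carries the value 0, so it does not enter the sums.
    """
    norm = math.exp(xi(p_row, theta))
    weights = [p * math.exp(t) / norm for p, t in zip(p_row, theta)]
    mean = sum(w * a for w, a in zip(weights, alpha))
    second = sum(w * a * a for w, a in zip(weights, alpha))
    return mean, second - mean ** 2


def directional_derivatives(p_row, beta, alpha, h=1e-4):
    """Central finite differences of c -> Xi(beta + c alpha) at c = 0."""
    def g(c):
        return xi(p_row, [b + c * a for b, a in zip(beta, alpha)])
    first = (g(h) - g(-h)) / (2 * h)
    second = (g(h) - 2 * g(0.0) + g(-h)) / (h * h)
    return first, second


p_row = [0.3, 0.5]          # hypothetical routing row, exit probability 0.2
beta = [0.2, -0.1]
alpha = [1.0, -2.0]
mean, var = twisted_moments(p_row, beta, alpha)
fd_first, fd_second = directional_derivatives(p_row, beta, alpha)

# Deterministic routing: the variance, and hence (5.10), vanishes.
_, var_deterministic = twisted_moments([0.0, 1.0], beta, alpha)
```

The variance is strictly positive here because $\alpha$ is not constant on the support of the routing row, and it vanishes for the deterministic row, matching the two cases named above.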
We now argue for strict convexity. We show that for $\alpha, \beta \in \mathbb{R}^d$, $\alpha \ne 0$ the second derivative (5.9) is strictly positive.
Strict positivity of (5.9) follows if there is $i$ such that $\lambda_i \alpha_i \ne 0$ since then $\sum_{i=1}^d \alpha_i^2\,\Gamma^{(i)\prime\prime}_A(\beta_i + c\alpha_i) > 0$ from strict convexity of the $\Gamma_A(\cdot)$'s. Otherwise $\sum_{i=1}^d \alpha_i^2\,\Gamma^{(i)\prime\prime}_A(\beta_i + c\alpha_i) = 0$ and we need the $\Gamma_S$'s to argue for strict positivity.
Let us assume that
• $\alpha, \beta \in \mathbb{R}^d$, $\alpha \not\equiv 0$,
• $\forall i : \lambda_i > 0 \Rightarrow \alpha_i = 0$,
• $\frac{d^2}{dc^2}\Psi(\beta + c\alpha) = 0$
and produce a contradiction. The derivatives $\Gamma^{(i)\prime}_S$ and $\Gamma^{(i)\prime\prime}_S$ are strictly positive by 2.2.8. So the other sums in (5.9) are $= 0$ iff
$$-\alpha_i + \frac{d}{dc}\Xi^{(i)}(\beta + c\alpha) = 0, \qquad \frac{d^2}{dc^2}\Xi^{(i)}(\beta + c\alpha) = 0 \qquad \forall i \tag{5.11}$$
First and second derivatives of $\Xi^{(i)}$ on the half line can be written as expectation and variance under the exponentially twisted $p^{(i)}$:
$$\frac{d}{dc}\Xi^{(i)}(\beta + c\alpha) = \sum_{j=1}^d \alpha_j\,\frac{p_{ij}\,e^{(\beta + c\alpha)_j}}{e^{\Xi^{(i)}(\beta + c\alpha)}} = E_{p^{(i)}(\beta + c\alpha)}[\alpha]$$
$$\frac{d^2}{dc^2}\Xi^{(i)}(\beta + c\alpha) \overset{(5.10)}{=} V_{p^{(i)}(\beta + c\alpha)}[\alpha]$$
and (5.11) becomes
$$\alpha_i = E_{p^{(i)}(\beta + c\alpha)}[\alpha], \qquad V_{p^{(i)}(\beta + c\alpha)}[\alpha] = 0 \qquad \forall i \tag{5.12}$$
From openness of our network we know there is at least one $\lambda_i > 0$ and by our assumption (2nd bullet) for this $i$ we have $\alpha_i = 0$. From (5.12) we know that all $\alpha_j$ with $j \in \mathrm{supp}(p^{(i)})$ take the same value (from $V = 0$) and that this value is again $= 0$ (from $0 = \alpha_i = E$). So moving along a directed spanning forest with each tree of the spanning forest rooted at an entry node, we find that $\alpha_i = 0$ at each node we pass. From the spanning property we get $\alpha \equiv 0$, the contradiction.
If in the network there are deterministic primitives and some $\Gamma''_A \equiv 0$ the proof works the same way if there is non-deterministic flow at each node, that is, if there is a spanning forest of the network with only those entry nodes as roots that have non-deterministic arrival processes; all service processes have to be non-deterministic. This condition (non-deterministic flow at each node) compares to condition (P, S) applied in theorem 2.2 in Puhalskii's [16].
5.3.10
Since we did not properly define a spanning forest we at least give examples.
Figure 5.6: Three spanning forests for the network in figure 5.1
The first spanning forest in figure 5.6 is a spanning tree. If the network
is not strongly connected a spanning tree does not necessarily exist.
Large deviations of the free process
We start with a one dimensional large deviation principle and proceed with the sample path large deviation principle.
Claim 5.3.11. For the mean of the free process a large deviation principle holds: For open $G \subset \mathbb{R}^d$ and closed $F \subset \mathbb{R}^d$
$$-\inf_{x \in G} \Psi^*(x) \le \liminf_{n\to\infty} \frac{1}{n} \log P\Big(\frac{1}{n} X_n \in G\Big)$$
$$\limsup_{n\to\infty} \frac{1}{n} \log P\Big(\frac{1}{n} X_n \in F\Big) \le -\inf_{x \in F} \Psi^*(x)$$
with $\Psi$ the lmgf of the free process of definition 5.3.9 and $\Psi^*$ its Fenchel-Legendre transform.
Proof of 5.3.11: By application of the Gärtner-Ellis theorem. Strict convexity of $\Psi$ and $D(\Psi) = \mathbb{R}^d$ are important here (no regularisation in the Gärtner-Ellis theorem needed). The rate function is the convex $\Psi^*$, the Fenchel-Legendre transform of the lmgf.
5.3.11
Claim 5.3.12. $\Psi^*$ is a good rate function and
$$\Psi^*(v) = \inf_{\substack{a, \gamma, r \\ a + (r' - \mathrm{id})\gamma = v}} \ \sum_{i=1}^d \Gamma^{(i)*}_A(a_i) + \Gamma^{(i)*}_S(\gamma_i) + \gamma_i\,\Xi^{(i)*}(r^{(i)}).$$
Proof of 5.3.12: We have one dimensional large deviations with good rate functions for the arrival and split service processes that form the free process, cf 3.5.1. As summing is continuous we can apply the contraction principle (cf 7.5.2 in the appendix) to get a large deviation principle for the mean of the free process. As rate function we get
$$v \mapsto \inf_{\substack{a, s_1,\dots,s_d \in \mathbb{R}^d \\ a + \sum_{i=1}^d s_i = v}} \ \sum_{i=1}^d \Gamma^{(i)*}_A(a_i) + \big(\Gamma^{(i)}_S \circ (-\pi_i + \Xi^{(i)})\big)^*(s_i).$$
Since the rate function of a large deviation object is unique this rate function has to equal $\Psi^*$.
$$\begin{aligned}
\Psi^*(v) &= \inf_{\substack{a, s_1,\dots,s_d \in \mathbb{R}^d \\ a + \sum_{i=1}^d s_i = v}} \ \sum_{i=1}^d \Gamma^{(i)*}_A(a_i) + \big(\Gamma^{(i)}_S \circ (-\pi_i + \Xi^{(i)})\big)^*(s_i) \\
&\overset{4.4.21}{=} \inf_{\substack{a, s_1,\dots,s_d \in \mathbb{R}^d \\ a + \sum_{i=1}^d s_i = v}} \ \sum_{i=1}^d \Gamma^{(i)*}_A(a_i) + \inf_{\substack{r^{(i)}, \gamma_i \\ \gamma_i (r^{(i)} - e_i) = s_i}} \Gamma^{(i)*}_S(\gamma_i) + \gamma_i\,\Xi^{(i)*}(r^{(i)})
\end{aligned}$$
Now in the set of restrictions we write $r$ as the matrix with rows $r^{(i)}$. Note that for $\Psi^*(v)$ to be finite the rows have to be subprobabilities concentrated on the support of $p^{(i)}$, cf 4.4.22. We rewrite the restriction as
$$v = a + \sum_{i=1}^d s_i = a + \sum_{i=1}^d \gamma_i\,(r^{(i)} - e_i) = a + (r' - \mathrm{id})\,\gamma$$
and we have obtained the desired representation. From goodness of the primitives' rate functions (cf 2.6.2) all of the above representations of the rate function are good, too.
5.3.12
Since we do have sample path large deviation principles for the arrival and
split service processes we can apply the contraction principle to obtain sample
path large deviations for the free process.
Claim 5.3.13. The free process $X$ obeys a sample path large deviation principle in $D([0, T], \mathbb{R}^d)$ equipped with the sup-norm induced topology under the scaling 3.4.1. The good rate function is
$$\psi \mapsto \int_{t=0}^T \Psi^*(\dot\psi(t))\,dt$$
for $\psi \in AC[0, T]$, $\psi(0) = 0$. The rate function equals $\infty$ for all other $\psi$.
Proof of 5.3.13: We have sample path large deviation principles for each primitive $A$ and $\mathbf{S}^{(i)}$ (for $i = 1,\dots,d$) in $D([0, T], \mathbb{R}^d)$ equipped with the supremum norm induced topology. For the definition of $A$, $\mathbf{S}^{(i)}$ cf 5.3.1, 5.3.2 and for the sample path large deviation principles 4.3.10, 4.4.20. As primitives are independent we have the joint large deviation principle of
$$(A, \mathbf{S}^{(1)}, \dots, \mathbf{S}^{(d)}) \in D([0, T], \mathbb{R}^d)^{\times(d+1)}$$
with the supremum norm
$$f \mapsto \max_{\substack{i = 1,\dots,d \\ j = 1,\dots,d+1}} \|f_{ij}\| \quad \text{for } f \in D([0, T], \mathbb{R}^d)^{\times(d+1)}, \qquad \|f_{ij}\| = \sup_{t \in [0,T]} |f_{ij}(t)|.$$
The rate function of the joint large deviation principle is the sum of individual rate functions by independence of the primitives. It is infinite for any $f$ with an $f_{ij} \notin AC([0, T], \mathbb{R})$ as each rate function is concentrated on absolutely continuous functions. If all $f_{ij}$ are absolutely continuous then for this element $f \in D([0, T], \mathbb{R}^d)^{\times(d+1)}$ the rate function has the representation
$$f \mapsto \sum_{i=1}^d \int_{t=0}^T \Gamma^{(i)*}_A(\dot f_{i1}(t))\,dt + \sum_{j=2}^{d+1} \int_{t=0}^T \big(\Gamma^{(j)}_S \circ (-\pi_j + \Xi^{(j)})\big)^*(\dot f_{\cdot j}(t))\,dt.$$
Now addition on $D([0, T], \mathbb{R}^d) \times \dots \times D([0, T], \mathbb{R}^d)$ is a continuous map wrt the sup-norm induced topology. Applying the contraction principle we get a large deviation principle for the free process with rate function
$$\psi \mapsto \inf_{\substack{a, s_1,\dots,s_d \\ a + s_1 + \dots + s_d = \psi}} \ \sum_{i=1}^d \int_{t=0}^T \Gamma^{(i)*}_A(\dot a_i(t)) + \big(\Gamma^{(i)}_S \circ (-\pi_i + \Xi^{(i)})\big)^*(\dot s_i(t))\,dt$$
where the infimum can be taken over $a, s_1,\dots,s_d \in AC([0, T], \mathbb{R}^d)$. Note that this makes the rate function infinite as soon as $\psi \notin AC([0, T], \mathbb{R}^d)$.
We get a lower bound for the rate function by interchanging summation and integration and then optimisation and integration. With the infimum inside the integral it changes from an optimisation in function space to an optimisation in $\mathbb{R}^d$.
$$\begin{aligned}
&\inf_{\substack{a, s_1,\dots,s_d \in AC \\ a + s_1 + \dots + s_d = \psi}} \ \sum_{i=1}^d \int_{t=0}^T \Gamma^{(i)*}_A(\dot a_i(t)) + \big(\Gamma^{(i)}_S \circ (-\pi_i + \Xi^{(i)})\big)^*(\dot s_i(t))\,dt \\
&\qquad \ge \int_{t=0}^T \inf_{\substack{a, s_1,\dots,s_d \in \mathbb{R}^d \\ a + s_1 + \dots + s_d = \dot\psi(t)}} \ \sum_{i=1}^d \Gamma^{(i)*}_A(a_i) + \big(\Gamma^{(i)}_S \circ (-\pi_i + \Xi^{(i)})\big)^*(s_i)\,dt \\
&\qquad = \int_{t=0}^T \Psi^*(\dot\psi(t))\,dt
\end{aligned}$$
where the last equality is due to the different representations of $\Psi^*$ we have already seen in the proof of 5.3.12.
Since there are no continuity restrictions for the derivatives and $\psi$ is absolutely continuous, we get absolutely continuous functions from integrating infimisers found inside the integral. The $\ge$ is in fact an equality.
5.3.13
Change of measure
We define a change of measure process for the free process associated with a generalised Jackson network.
Definition 5.3.14. Consider a generalised Jackson network with primitives defined in 5.1.5, 5.3.4. Let $X$ be the free process defined in 5.3.5 with lmgf $\Psi$ defined in 5.3.9. Define for $t \ge 0$ and $\alpha \in \mathbb{R}^d$
$$M(\alpha, t) = \exp\{\langle\alpha, X_t\rangle - t\,\Psi(\alpha)\}\; r(\alpha, t)$$
where
$$r(\alpha, t) = \prod_{i=1}^d r^{(i)S}\big(\Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha)), t\big)\; r^{(i)A}\big(\Gamma^{(i)}_A(\alpha_i), t\big)$$
$$r^{(i)D}(\beta, t) = \frac{F^c_\beta(B(t))}{F^c(B(t))}\, e^{\beta B(t)}$$
for $D \in \{A, S\}$ and $F$ the distribution function of inter event times of the renewal counting process $D^{(i)}$ and $B(t)$ the age of $D^{(i)}$ at time $t$ (for the definition of $r^{(i)D}$ cf 3.6.5).
Claim 5.3.15.
• $t \mapsto M(\alpha, t)$ is a change of measure process for the free process.
• Under the change of measure $M(\alpha, \cdot)$ the process $X = A + \sum_{i=1}^d \mathbf{S}^{(i)}$ is again a free process associated with the primitives of a generalised Jackson network. Each primitive changes its distribution in the following way:
  ◦ if $A^{(i)}$ had inter event density $f$ then under the changed measure $A^{(i)}$ remains a renewal counting process and now has inter event times density $f_{\Gamma^{(i)}_A(\alpha_i)}$, cf 2.3.1, (2.4).
  ◦ if the routing decision $r^{(i)}$ was distributed on $\{e_1,\dots,e_d, 0\}$ with probability measure $(p_{i1},\dots,p_{id}, 1 - \sum_{j=1}^d p_{ij})$ associated with the sub-probability measure $p^{(i)}$ then under the change of measure routing decisions remain iid. $r^{(i)}$ now has the distribution associated with the sub-probability measure $\nabla\Xi^{(i)}(\alpha)$, cf 4.4.23.
  ◦ if $S^{(i)}$ had inter event density $f$ then under the changed measure $S^{(i)}$ remains a renewal counting process and now has inter event times density $f_{\Gamma^{(i)}_S \circ (-\pi_i + \Xi^{(i)})(\alpha)}$, cf 2.3.1, (2.4).
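Corollary 5.3.16. Under the change of measure $M(\alpha, \cdot)$ the rates of the primitives change from $\lambda, \mu, P$ to
$$\begin{aligned}
\lambda_i(\alpha_i) &= \Gamma^{(i)\prime}_A(\alpha_i) && \text{for } i = 1,\dots,d \\
\mu_i(\alpha) &= \Gamma^{(i)\prime}_S\big(-\alpha_i + \Xi^{(i)}(\alpha)\big) && \text{for } i = 1,\dots,d \\
P(\alpha)' &= \big[\,\nabla\Xi^{(1)}(\alpha) \ \cdots \ \nabla\Xi^{(d)}(\alpha)\,\big]
\end{aligned}$$
The corollary fits 5.3.15 and is a summary of definitions 3.6.12, 4.4.17, (4.4.23).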
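Proof of 5.3.15: For the primitives $A^{(i)}$, $\mathbf{S}^{(i)}$ for $i = 1,\dots,d$ we have in previous sections developed individual change of measure processes. For the arrival processes (definition 3.6.4 and representation (3.15)) with $\gamma \in \mathbb{R}$
$$(\gamma, t) \mapsto \exp\{\gamma A^{(i)}_t - t\,\Gamma^{(i)}_A(\gamma)\}\; r^{(i)A}\big(\Gamma^{(i)}_A(\gamma), t\big)$$
and in vectorial notation with $\alpha \in \mathbb{R}^d$
$$(\alpha, t) \mapsto \exp\Big\{\langle\alpha, A_t\rangle - t \sum_{i=1}^d \Gamma^{(i)}_A(\alpha_i)\Big\} \prod_{i=1}^d r^{(i)A}\big(\Gamma^{(i)}_A(\alpha_i), t\big)$$
and for $\mathbf{S}^{(i)}$ (cf definition 4.4.15)
$$(\alpha, t) \mapsto \exp\{\langle \mathbf{S}^{(i)}_t, \alpha\rangle - t\,\Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha))\}\; r^{(i)S}\big(\Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha)), t\big).$$
Since primitives are independent we can multiply change of measure processes.
$$\begin{aligned}
&\prod_{i=1}^d \exp\{\langle\alpha, \mathbf{S}^{(i)}_t\rangle - t\,\Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha))\} \times r^{(i)S}\big(\Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha)), t\big) \\
&\qquad \times \exp\Big\{\langle\alpha, A_t\rangle - t \sum_{i=1}^d \Gamma^{(i)}_A(\alpha_i)\Big\} \prod_{i=1}^d r^{(i)A}\big(\Gamma^{(i)}_A(\alpha_i), t\big) \\
&= \exp\Big\{\Big\langle\alpha, \underbrace{A_t + \sum_{i=1}^d \mathbf{S}^{(i)}_t}_{=X_t}\Big\rangle - t \underbrace{\sum_{i=1}^d \Gamma^{(i)}_A(\alpha_i) + \Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha))}_{=\Psi(\alpha)}\Big\} \\
&\qquad \times \underbrace{\prod_{i=1}^d r^{(i)S}\big(\Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha)), t\big)\; r^{(i)A}\big(\Gamma^{(i)}_A(\alpha_i), t\big)}_{=r(\alpha, t)} \\
&= \exp\{\langle\alpha, X_t\rangle - t\,\Psi(\alpha)\}\; r(\alpha, t).
\end{aligned}$$
So $M(\alpha, \cdot)$ is the compound change of measure we get when changing each primitive process separately. For the translation of the change of measure for $\mathbf{S}^{(i)}$ into that of $S^{(i)}$ and $p^{(i)}$ cf 4.4.15 and 4.4.16.
5.3.15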
Application of the change of measure in the large deviation
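The rate function for the large deviations of the mean of the free process is the Fenchel-Legendre transform of the free process' lmgf.
$$\Psi^*(v) = \sup_{\alpha \in \mathbb{R}^d} \ \langle\alpha, v\rangle - \Psi(\alpha)$$
We want to point out here that the drift of the free process under the change of measure $M(\alpha, \cdot)$ with some $\alpha \in \mathbb{R}^d$ coincides with $\nabla\Psi(\alpha)$:
$$\begin{aligned}
\frac{d}{d\alpha_i}\Psi(\alpha) &= \Gamma^{(i)\prime}_A(\alpha_i) - \Gamma^{(i)\prime}_S\big(-\alpha_i + \Xi^{(i)}(\alpha)\big) + \sum_{j=1}^d \Gamma^{(j)\prime}_S\big(-\alpha_j + \Xi^{(j)}(\alpha)\big)\,\frac{d}{d\alpha_i}\Xi^{(j)}(\alpha) \\
&= \lambda_i(\alpha) - \mu_i(\alpha) + \sum_{j=1}^d \mu_j(\alpha)\,p_{ji}(\alpha)
\end{aligned}$$
where for the last equality we applied corollary 5.3.16. We get
$$\nabla\Psi(\alpha) = \lambda(\alpha) + (P(\alpha)' - \mathrm{id})\,\mu(\alpha) \tag{5.13}$$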
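Relation (5.13) can be verified numerically in the classical Jackson case. This is only a sketch under assumptions that are not part of the general setting: arrivals are Poisson and services exponential, so that $\Gamma^{(i)}_A(\theta) = \lambda_i(e^\theta - 1)$ and $\Gamma^{(i)}_S(\theta) = \mu_i(e^\theta - 1)$, and $\exp\{\Xi^{(i)}(\alpha)\} = \sum_j p_{ij} e^{\alpha_j} + 1 - \sum_j p_{ij}$. The function names and the example network are hypothetical.

```python
import math


def exp_xi(alpha, P, i):
    """exp(Xi^(i)(alpha)) = sum_j p_ij e^{alpha_j} + exit probability (assumed)."""
    return sum(p * math.exp(a) for p, a in zip(P[i], alpha)) + 1.0 - sum(P[i])


def jackson_lmgf(alpha, lam, mu, P):
    """Psi(alpha) of the free process in the exponential special case."""
    total = 0.0
    for i in range(len(lam)):
        total += lam[i] * (math.exp(alpha[i]) - 1.0)
        total += mu[i] * (math.exp(-alpha[i]) * exp_xi(alpha, P, i) - 1.0)
    return total


def twisted_rates(alpha, lam, mu, P):
    """Rates (lambda(alpha), mu(alpha), P(alpha)) as in corollary 5.3.16."""
    d = len(lam)
    lam_a = [lam[i] * math.exp(alpha[i]) for i in range(d)]
    mu_a = [mu[i] * math.exp(-alpha[i]) * exp_xi(alpha, P, i) for i in range(d)]
    P_a = [[P[i][j] * math.exp(alpha[j]) / exp_xi(alpha, P, i) for j in range(d)]
           for i in range(d)]
    return lam_a, mu_a, P_a


def free_drift(lam, mu, P):
    """lambda + (P' - id) mu."""
    d = len(lam)
    return [lam[i] + sum(P[j][i] * mu[j] for j in range(d)) - mu[i] for i in range(d)]


def numerical_gradient(f, alpha, h=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = []
    for i in range(len(alpha)):
        up = list(alpha); up[i] += h
        down = list(alpha); down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


lam, mu, P = [1.0, 0.0], [2.0, 0.5], [[0.0, 1.0], [0.0, 0.0]]
alpha = [0.3, -0.2]
gradient = numerical_gradient(lambda a: jackson_lmgf(a, lam, mu, P), alpha)
drift = free_drift(*twisted_rates(alpha, lam, mu, P))
gradient_at_zero = numerical_gradient(lambda a: jackson_lmgf(a, lam, mu, P), [0.0, 0.0])
```

At $\alpha = 0$ the twisted rates are the original ones, so the gradient reduces to the free drift of claim 5.3.8.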
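Claim 5.3.17. If for $v \in \mathbb{R}^d$ there is $\alpha \in \mathbb{R}^d$ such that under the change of measure $M(\alpha, \cdot)$ the free process has drift $v$ then
$$\Psi^*(v) = \langle\alpha, v\rangle - \Psi(\alpha).$$
Proof of 5.3.17: From the definition of the Fenchel-Legendre transform and the above
$$\Psi^*(v) = \sup_{\theta \in \mathbb{R}^d} \ \langle\theta, v\rangle - \Psi(\theta) = \langle\alpha, v\rangle - \Psi(\alpha) \quad \Leftrightarrow \quad v = \nabla\Psi(\alpha)$$
so $\nabla\Psi(\alpha)$ is the free drift (cf 5.2.6) under the changed measure and if this equals $v$ then $\theta = \alpha$ is the optimiser in the Fenchel-Legendre transform.
5.3.17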
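As a one dimensional illustration of claim 5.3.17, consider a single Poisson arrival stream, which is an assumed special case and not the general renewal setting. There $\Psi(\alpha) = \lambda(e^\alpha - 1)$, the optimiser is $\alpha = \log(v/\lambda)$, the twisted rate $\lambda e^\alpha$ equals the target drift $v$, and $\Psi^*(v) = v\log(v/\lambda) - v + \lambda$. The sketch below finds the supremum by a ternary search (a swapped-in numerical technique chosen for simplicity) and compares it with these closed forms.

```python
import math


def psi(alpha, lam):
    """Lmgf of a Poisson process with rate lam (assumed special case)."""
    return lam * (math.exp(alpha) - 1.0)


def legendre_by_search(v, lam, lo=-20.0, hi=20.0, iters=200):
    """Maximise alpha * v - psi(alpha) by ternary search on a concave function."""
    def objective(a):
        return a * v - psi(a, lam)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    best = 0.5 * (lo + hi)
    return objective(best), best


lam, v = 1.0, 3.0
value, optimiser = legendre_by_search(v, lam)
closed_form = v * math.log(v / lam) - v + lam
twisted_rate = lam * math.exp(optimiser)  # drift under the changed measure
```

The search recovers the closed-form rate, and the drift under the optimising change of measure is the target value $v$, which is the content of the claim in this simple case.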
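It is therefore interesting under which conditions an optimising $\alpha \in \mathbb{R}^d$ in the Fenchel-Legendre transform $\Psi^*(v)$ exists.
Remark 5.3.18.
• From identifying rate functions in 5.3.12 we have finiteness of $\Psi^*(v)$ iff there are $a, r, \gamma$ such that $a + (r' - \mathrm{id})\gamma = v$.
• And from goodness of $\Psi^*$ we have existence of an optimiser whenever $\Psi^*(v) < \infty$:
$$\exists\, a, \gamma, r : a + (r' - \mathrm{id})\gamma = v \quad \Rightarrow \quad \exists\, \alpha : \lambda(\alpha) + (P(\alpha)' - \mathrm{id})\,\mu(\alpha) = v$$
so while $\alpha \in \mathbb{R}^d$ has $d$ coordinates or degrees of freedom and $a, \gamma, r$ have $2d + d^2$, this seems not to be relevant in terms of existence of optimisers.
• From the form of the rate function in 5.3.12 and openness of the domains $D(\Gamma^{(i)*}_D)$ for all $D = S$, $i \in \{1,\dots,d\}$ and $D = A$, $i \in \{1,\dots,d\}$ with $\lambda_i > 0$ (cf 2.8.2): if for $v \in \mathbb{R}^d$ there exists $\alpha$ such that $\nabla\Psi(\alpha) = v$ then for $v'$ close enough to $v$ there is $\alpha'$ such that $\nabla\Psi(\alpha') = v'$. This also implies that the domain $D(\Psi^*)$ is open.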
5.3.2 The network process
This section is on the stochastic process that describes the generalised Jackson network. We will construct it from the network primitives in a similar way as the free process, but in such a way that its coordinates cannot become negative. We give the network process' expected behaviour.
The large deviations of the network process need a more thorough theoretical background. The local large deviations for the network process will be developed in chapter 6. The following theorem 2.1 of Chen and Mandelbaum [3] allows us to move from the network primitives defined in 5.3.4 to the network process that we define in 5.3.20.
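Theorem 5.3.19. Let $z_0 \in \mathbb{R}^d_{\ge 0}$ and consider a network of $d$ nodes and primitives $A, \mathbf{S}^{(1)},\dots,\mathbf{S}^{(d)}$ that satisfy
• $A^{(1)},\dots,A^{(d)}$ are in $D([0,\infty), \mathbb{R})$, non-decreasing and $A^{(i)}(0) = 0$ for $i = 1,\dots,d$;
• $\mathbf{S}^{(1)},\dots,\mathbf{S}^{(d)}$, $\mathbf{S}^{(i)} = -e_i S^{(i)} + \sum_{j=1}^d S^{(ij)} e_j$ with $S^{(i)}, S^{(ij)}$ in $D([0,\infty), \mathbb{R})$, non-decreasing, and $S^{(ij)}(0) = S^{(i)}(0) = 0$.
If further there is $\epsilon > 0$ such that for each $D \in \{A^{(i)}, S^{(i)}, S^{(ij)} \mid i, j = 1,\dots,d\}$
$$D(t) \le D(t-) + \epsilon$$
then there exists a unique pair of $d$-dimensional processes $(Z, R)$ satisfying
• $Z(t) = z_0 + A(t) + \sum_{i=1}^d \mathbf{S}^{(i)} \circ R^{(i)}(t)$,
• $Z^{(j)} \ge 0$,
• $R^{(j)}(t) = \int_0^t 1\!\!1_{Z^{(j)}(s) > 0}\,ds$.
Note that this theorem applies to the unscaled ($\epsilon = 1$) and scaled primitives ($\epsilon = \frac{1}{n}$).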
Definition 5.3.20 (Network process). Given network primitives $A, \mathbf{S}^{(1)},\dots,\mathbf{S}^{(d)}$ and some fixed $z_0 \in \mathbb{N}^d$ such that theorem 5.3.19 applies set
$$Z(t, z_0) = z_0 + A(t) + \sum_{i=1}^d \mathbf{S}^{(i)} \circ R^{(i)}(t)$$
$$R^{(i)}(t) = \int_0^t 1\!\!1_{Z^{(i)}(s, z_0) > 0}\,ds \qquad (i = 1,\dots,d)$$
and denote $(Z, R)$ the network process. When addressed in isolation $Z$ is denoted the queue size process and $R$ the runtime process of the network.
The difference between the free and the network process is in the runtime process $R^{(i)}$ defined at each node. If queue $i$ is empty during $[t_0, t_1]$ then during this time $Z^{(i)} = 0$ and $\dot R^{(i)} = 0$. Thus $\mathbf{S}^{(i)}$ cannot change state in $[t_0, t_1]$ and no customer leaves the empty queue at node $i$. The queue size always stays non-negative.
We have just described the different behaviour of a queue when empty and when non-empty. This is termed “discontinuous statistics” of the network process and it is the reason why the network process is more difficult to work with than the free process.
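The role of the runtime process can be made concrete with a small simulation. The sketch below is restricted to an assumed special case, the classical Jackson network with Poisson arrivals and exponential services, where by memorylessness a service clock that only runs while the queue is non-empty has the same law as the construction $\mathbf{S}^{(i)} \circ R^{(i)}$. It is not a simulation of general renewal primitives, and all names are hypothetical.

```python
import random


def simulate_network(lam, mu, P, z0, horizon, seed=0):
    """Simulate queue sizes and runtimes of a classical Jackson network.

    Returns the final queue sizes, the runtimes R^(i)(horizon) and the
    smallest queue size observed along the path.
    """
    rng = random.Random(seed)
    d = len(lam)
    z, runtime, t = list(z0), [0.0] * d, 0.0
    lowest = min(z)
    while True:
        # Service at node i is only active while queue i is non-empty.
        rates = list(lam) + [mu[i] if z[i] > 0 else 0.0 for i in range(d)]
        total = sum(rates)
        if total == 0.0:
            break
        wait = rng.expovariate(total)
        elapsed = min(wait, horizon - t)
        for i in range(d):
            if z[i] > 0:
                runtime[i] += elapsed  # R^(i) grows only while Z^(i) > 0
        if t + wait >= horizon:
            break
        t += wait
        event = rng.choices(range(2 * d), weights=rates)[0]
        if event < d:
            z[event] += 1                      # external arrival
        else:
            i = event - d
            z[i] -= 1                          # service completion at node i
            exit_mass = max(0.0, 1.0 - sum(P[i]))
            dest = rng.choices(range(d + 1), weights=list(P[i]) + [exit_mass])[0]
            if dest < d:
                z[dest] += 1                   # routed to another node
        lowest = min(lowest, min(z))
    return z, runtime, lowest


lam, mu, P = [1.0, 0.0], [2.0, 0.5], [[0.0, 1.0], [0.0, 0.0]]
final_z, runtimes, lowest = simulate_network(lam, mu, P, [0, 0], horizon=50.0)
```

Whatever the random path, the queue sizes never become negative and each runtime stays below the elapsed time, which is exactly what distinguishes the network process from the free process.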
Definition 5.3.21 ($R(\cdot, \cdot)$). For $t \in \mathbb{R}_{>0}$ and $Z$ a well behaved non-negative process on $[0, t]$ with values in $\mathbb{R}$ let
$$R(Z, t) = \int_{s=0}^t 1\!\!1_{Z(s) > 0}\,ds.$$
Definition 5.3.22 (Scaled network process). For $z_0 \in \mathbb{R}^d_{\ge 0}$ define the processes
$$Z_n : \mathbb{R}_{\ge 0} \to \mathbb{R}^d_{\ge 0}, \qquad t \mapsto \frac{1}{n} Z(nt, nz_0)$$
and $R_n$ with coordinate processes $R^{(i)}_n$ defined as
$$R^{(i)}_n(t) = \frac{1}{n} R\big(Z^{(i)}(\cdot, nz_0), nt\big).$$
Then $(Z_n, R_n)$ is denoted the scaled network process starting in $z_0$. If the starting point is $z_0 = 0$ it may be omitted. When addressed in isolation $Z_n$ may be referred to as the scaled queue size process and $R_n$ as the scaled runtime process of the network.
From Chen and Mandelbaum (part of their theorem 5.1; their assumed uniform convergence on compacts holds for the generalised Jackson network under our assumptions) we get a statement paralleling claims 5.3.6, 5.3.8.
Theorem 5.3.23 (Drift of the network process). Consider the scaled network process $(Z_n, R_n)$ of the generalised Jackson network with rates $(\lambda, \mu, P)$. Consider the Skorohod problem 5.2.14 with
• starting point $z_0$,
• linear input function $X$, $X(t) = z_0 + t\,(\lambda + (P' - \mathrm{id})\mu)$,
• $P$
and let $(Z, Y)$ be the linear solution of the form $Z(t) = z_0 + tz$, $Y(t) = ty$ where $z$ is the network drift and $y$ is the loss rate. Then the network process converges almost surely uniformly on compacts
$$\lim_{n\to\infty} (Z_n, R_n) = \big(t \mapsto z_0 + tz, \ t \mapsto t\,(1 \wedge \rho)\big).$$
Chen and Mandelbaum denote the network drift $z$ the fluid limit of the queue size process. And indeed it can be identified with the deterministic flow in the associated fluid network of section 5.2.1.
5.3.3 The local process
Observe that in the scaled network process $t \mapsto Z_n(t, z_0)$ ($z_0$ fixed) initially non-empty nodes stay non-empty for some positive length of time. Since $R^{(i)}(t) = t$ over this interval, the network process looks like the free process in this $i$-th coordinate and over this interval. Figure 5.7 gives a realisation of the non-ergodic network of example 5.1, 5.2.8 over some short time span $[0, T]$, $T = 0.1$, with a scaling parameter $n = 500$. This motivates the local process, which is somewhere in between the free and the network process. We work with the local process to prove local large deviations in chapter 6.
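Definition 5.3.24 ($W^\Lambda$, $R^\Lambda$). Given network primitives $A, \mathbf{S}^{(1)},\dots,\mathbf{S}^{(d)}$ and some fixed $z_0 \in \mathbb{R}^d_{\ge 0}$ let
$$\Lambda = \Lambda(z_0) = \{i \mid z_{0,i} > 0\}$$
and define the local process $W^\Lambda$ as
$$W^\Lambda : \mathbb{R}_{\ge 0} \times \mathbb{R}^d_{\ge 0} \to \mathbb{Z}^d, \qquad (t, z_0) \mapsto z_0 + A(t) + \sum_{i \in \Lambda} \mathbf{S}^{(i)}(t) + \sum_{i \in \Lambda^c} \mathbf{S}^{(i)} \circ R^{(i),\Lambda}(t)$$
and the local runtime process for $i \in \Lambda^c$
$$R^{(i),\Lambda}(t) = R\big(W^\Lambda_i(\cdot, z_0), t\big) = \int_{s=0}^t 1\!\!1_{W^{(i),\Lambda}(s, z_0) > 0}\,ds.$$
Figure 5.7: Realisation of the non-ergodic network or associated local process with $\Lambda = \{2, 3\}$, cf examples 5.2.8 ($T = 0.1$ and scaling $n = 500$). Legend: Queue 1, Queue 2, Queue 3, Queue 4.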
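The local process $W^\Lambda$ will have non-negative $\Lambda^c$-coordinates: $W^\Lambda_i \ge 0$ for $i \in \Lambda^c$ since the runtime $R^{(i),\Lambda}$ will stop $\mathbf{S}^{(i)}$ if $W^\Lambda_i = 0$. $\Lambda$-coordinates of the local process are allowed to have negative values. We denote $i \in \Lambda$ a free node and $i \in \Lambda^c$ a restricted node.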
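We split the local process into the sum of a free and a network process:
Claim 5.3.25. Let $A, \mathbf{S}^{(1)},\dots,\mathbf{S}^{(d)}$ be network primitives of a generalised Jackson network, $z_0 \in \mathbb{R}^d_{\ge 0}$ and $(Z, R)$ the associated network process starting in $z_0$. Set $\Lambda = \{i \mid z_{0,i} > 0\}$ and let $\pi_M$ be the projection such that $\pi_\Lambda(\alpha) = \sum_{i \in \Lambda} \alpha_i e_i$. Then
$$W^\Lambda = z_0 + X^\Lambda + Z^\Lambda$$
for the following $Z^\Lambda, X^\Lambda$:
• $Z^\Lambda$ is the network process of the $\Lambda^c$-nodes with all nodes starting empty. $Z^\Lambda(t) \in \mathbb{R}^d_{\ge 0}$, $Z^{(i),\Lambda} \equiv 0$ for $i \in \Lambda$,
$$Z^\Lambda = \pi_{\Lambda^c}(W^\Lambda) = \sum_{i \in \Lambda^c} \Big( A^{(i)} + \sum_{k \in \Lambda} S^{(ki)} - S^{(i)} \circ R(S^{(i)}, t) + \sum_{k \in \Lambda^c} S^{(ki)} \circ R(S^{(ki)}, t) \Big)\, e_i$$
• $X^\Lambda$ is the free process of the $\Lambda$-nodes with $X^\Lambda(t) \in \mathbb{R}^d$, $X^\Lambda(0) = 0$,
$$X^\Lambda = \pi_\Lambda(W^\Lambda) - z_0 = \sum_{i \in \Lambda} \Big( A^{(i)} + \sum_{k \in \Lambda^c} S^{(ki)} \circ R(S^{(ki)}, t) - S^{(i)} + \sum_{k \in \Lambda} S^{(ki)} \Big)\, e_i$$
Proof of 5.3.25: We only have to check that the projections $\pi_\Lambda(W^\Lambda)$ and $\pi_{\Lambda^c}(W^\Lambda)$ are correct.
The interpretation as a network process for $Z^\Lambda$ is correct since 5.3.20 applies to the primitives
• arrival processes $A^{(i),\Lambda} = A^{(i)} + \sum_{k \in \Lambda} S^{(ki)}$ for $i \in \Lambda^c$;
• service processes $-e_i S^{(i)} + \sum_{k \in \Lambda^c} S^{(ki)}$.
Similarly $X^\Lambda$ is a free process as in 5.3.5 with
• arrival processes $t \mapsto A^{(i)}(t) + \sum_{k \in \Lambda^c} S^{(ki)} \circ R(S^{(ki)}, t)$ for $i \in \Lambda$, where $R$ makes the distributions of these arrival processes difficult;
• service processes $-e_i S^{(i)} + \sum_{k \in \Lambda} S^{(ki)}$.
5.3.25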
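The new arrival processes $A^{(i),\Lambda}$, $i \in \Lambda^c$, of $Z^\Lambda$ defined in the proof above as
$$A^{(i),\Lambda} = A^{(i)} + \sum_{k \in \Lambda} S^{(ki)}$$
for $i \in \Lambda^c$ are no renewal counting processes but sums of independent renewal counting processes. The rates of the network process $Z^\Lambda$ are
$$\Big( \sum_{i \in \Lambda^c} \Big( \lambda_i + \sum_{k \in \Lambda} \mu_k\,p_{ki} \Big) e_i, \quad \mu, \quad [\,p_{ij}\,1\!\!1_{i,j \in \Lambda^c}\,]_{i,j = 1,\dots,d} \Big).$$
Since $Z^{(i),\Lambda} = 0$ for $i \in \Lambda$, nodes in $\Lambda$ are not relevant in the $Z^\Lambda$ network and we move to $\mathbb{R}^{|\Lambda^c|}$, describing the (sub)network of $\Lambda^c$-nodes as
$$\lambda_{\Lambda^c} + (P')_{\Lambda^c \Lambda}\,\mu_\Lambda = \Big[\lambda_i + \sum_{k \in \Lambda} \mu_k\,p_{ki}\Big]_{i \in \Lambda^c}, \qquad \mu_{\Lambda^c} = [\,\mu_i\,]_{i \in \Lambda^c}, \qquad P_{\Lambda^c} = [\,p_{ij}\,]_{i,j \in \Lambda^c}.$$
These rates are like in (5.4) with $\Lambda$ instead of $B$ and $\Lambda^c$ instead of $E$. $\Lambda^c$ nodes are not necessarily ergodic.
Exit nodes of the $\Lambda^c$-network are $\{i \in \Lambda^c \mid \exists\, j \in \Lambda \cup \{0\} : p_{ij} > 0\}$ and for $p^{(i),\Lambda}$ a row of $P_{\Lambda^c}$ and a sub-probability measure $p^\Lambda_{i0} = p_{i0} + \sum_{k \in \Lambda} p_{ik}$.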
Claim 5.3.26. $Z^\Lambda$ satisfies assumption 5.1.9.
Proof of 5.3.26: If $Z$ is open then $Z^\Lambda$ is, too: In a picture of an open network there is a sequence of arrows from the outside world to an arbitrary node $i$ and from that node to the outside world. By remodelling a subset of nodes as a network we get more “outside world”, which makes sequences from the outside world to node $i$ shorter but never loses the required accessibility. Feedback cannot be created by removing edges from the network.
5.3.26
We can now investigate whether the subnetwork of $\Lambda^c$-nodes described through $Z^\Lambda$ is ergodic, or find the ergodic subnetwork. The network has deterministic rates and the definitions and claims of section 5.2 apply.
Change of measure
We define the change of measure process for the local process associated with a generalised Jackson network for a starting point $z_0 \in \mathbb{R}^d_{\ge 0}$ and the set $\Lambda = \{i \mid z_{0,i} > 0\}$ of initially non-empty nodes.
Observing the local process $W^\Lambda$ up to some fixed $t$ we observe all arrival processes and the service processes of $\Lambda$-nodes up to time $t$. For restricted nodes $i \in \Lambda^c$ we have only observed the service process and the routing decisions up to time $R^{(i),\Lambda}(t) \le t$. If node $i$ is ergodic then with high probability $R^{(i),\Lambda}(t) < t$. We want to argue that the change of measure process for the $i$-th service process we developed in section 4.4.2, esp. definition 4.4.15, is still a change of measure process under the time change $t \mapsto R^{(i),\Lambda}(t)$.
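Definition 5.3.27 (Local filtration).
$$\mathcal{F}^\Lambda_t = \sigma\big(W^\Lambda_s ; s \le t\big), \qquad \mathcal{F}^\Lambda = \big(\mathcal{F}^\Lambda_t ; t \ge 0\big)$$
Then $(W^\Lambda_t ; t \ge 0)$ is adapted to the filtration $(\mathcal{F}^\Lambda_t)_{t \ge 0}$ and the runtime $t \mapsto R^{(i),\Lambda}(t)$ is adapted to this filtration, since $R^{(i),\Lambda}(t) = R(W^\Lambda_i(\cdot, z_0), t)$ is a measurable function of $W^\Lambda$.
Let $G^{(i)}$ for some fixed $i \in \Lambda^c$ be the change of measure process for the counting process $\mathbf{S}^{(i)}$ (cf 5.3.2, 4.4.15):
$$G^{(i)}(\alpha, t) = \exp\{\langle\alpha, \mathbf{S}^{(i)}_t\rangle - t\,\Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha))\}\; r\big(-\alpha_i + \Xi^{(i)}(\alpha), t\big)$$
Claim 5.3.28. $t \mapsto G^{(i)}(\alpha, R^{(i),\Lambda}(t))$ is a martingale wrt $\mathcal{F}^\Lambda$.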
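Proof of 5.3.28: $G^{(i)}(\alpha, \cdot)$ is a martingale wrt the filtration $\big(\sigma(S^{(i)}_s ;\ s \le t)\big)_{t \ge 0}$ generated by $S^{(i)}$, and for fixed $t \ge 0$ the runtime $R^{(i)}(t)$ is a stopping time wrt $\mathcal{F}^\Lambda$. Thus $G^{(i)}(\alpha, R^{(i)}(t))$ is a regular random variable with mean 1 (the same mean as $G^{(i)}(\alpha, t)$) and measurable wrt $\sigma(S^{(i)}_s ;\ s \le R^{(i)}(t)) \subseteq \mathcal{F}^\Lambda_t$.
If $s < t$ we have two ordered, bounded stopping times $R^{(i)}(s) \le R^{(i)}(t)$ and
\[ E[G^{(i)}(\alpha, R^{(i)}(t)) \mid \mathcal{F}^\Lambda_s] = G^{(i)}(\alpha, R^{(i)}(s)) \]
From this we have the martingale property for any $s_1 < \dots < s_n$ or $0 = s_0 < s_1 < \dots < s_{n-1} < s_n = T$ of the following finite dimensional vector:
\[ \big(1,\ G^{(i)}(\alpha, R^{(i)}(s_1)),\ \dots,\ G^{(i)}(\alpha, R^{(i)}(s_{n-1})),\ G^{(i)}(\alpha, T)\big) \]
And $t \mapsto G^{(i)}(\alpha, R^{(i)}(t))$ is a local martingale. We now show that for any $t \ge 0$ we have $E[\sup_{s \in [0,t]} G^{(i)}(\alpha, s)] < \infty$ and then the martingale property follows (cf proposition A.7 in [18]).
\[ \sup_{t \in \mathbb{R}} r(-\alpha_i + \Xi^{(i)}(\alpha), t) < \infty \]
\[ -s\,\Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha)) \le \max\{0,\ -t\,\Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha))\} \quad \text{for } s \in [0, t] \]
\[ E[\exp\{\langle \alpha, S^{(i)}_s \rangle\}] = E[\exp\{\langle \alpha, T^{(i)} S^{(i)sp}_s \rangle\}] \overset{5.3.3}{=} E[\exp\{\langle T^{(i)}\alpha, S^{(i)sp}_s \rangle\}] \overset{\beta = T^{(i)}\alpha}{=} E[\exp\{\langle \beta, S^{(i)sp}_s \rangle\}] \]
Replace $\beta_i$ by $\beta_i^+$; then the last expression on the rhs is monotone increasing in $s$ and
\[ E\big[\sup_{s \in [0,t]} G^{(i)}(\alpha, s)\big] \le E[\exp\{\langle \beta^+, S^{(i)sp}_t \rangle\}] < \infty. \]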
5.3.28
Corollary 5.3.29. $t \mapsto G^{(i)}(\alpha, R^{(i)}(t))$ is the change of measure process for $t \mapsto S^{(i)} \circ R^{(i)}(t)$, the primitive service process at node $i$ in the setting of the local process.
The corollary brings together claim 5.3.28 and the additional property of a mean equal to unity.
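The unit-mean property can be checked numerically in the simplest special case. The following is a minimal sketch, assuming a Poisson counting process (exponential inter event times), for which the classical exponential martingale $\exp\{\theta N_t - t\lambda(e^\theta - 1)\}$ needs no age correction; the general renewal case with the factor $r$ is not modelled here. The function names and parameter values are illustrative assumptions, not taken from the thesis.

```python
import math

def poisson_lmgf(rate: float, theta: float) -> float:
    """lmgf of a Poisson counting process: (1/t) log E[exp(theta * N_t)]."""
    return rate * (math.exp(theta) - 1.0)

def tilted_moments(rate: float, theta: float, t: float, n_max: int = 400):
    """Return (E[G_t], E[N_t * G_t]) for G_t = exp(theta*N_t - t*lmgf(theta)),
    computed by summing the Poisson(rate*t) pmf up to n_max (truncation assumed harmless)."""
    log_norm = t * poisson_lmgf(rate, theta)
    mean_g = 0.0
    mean_ng = 0.0
    for n in range(n_max + 1):
        log_pmf = -rate * t + n * math.log(rate * t) - math.lgamma(n + 1)
        weight = math.exp(log_pmf + theta * n - log_norm)
        mean_g += weight
        mean_ng += n * weight
    return mean_g, mean_ng

mean_g, mean_ng = tilted_moments(rate=2.0, theta=0.5, t=3.0)
print(mean_g)          # close to 1: the density process has unit mean
print(mean_ng / 3.0)   # close to 2 * exp(0.5): the event rate under the twisted measure
```

The second printed value illustrates, again only for the Poisson case, that the twist changes the rate of the counting process while leaving its type unchanged.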
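Definition 5.3.30 ($M^\Lambda(\alpha, \cdot)$). Consider a generalised Jackson network with primitives defined in 5.1.5, 5.3.4.
Let $X$ be the free process defined in 5.3.5 with lmgf $\Psi$ defined in 5.3.9.
Let $(W^\Lambda, R^\Lambda)$ be the local process for some $\Lambda \subseteq \{1, \dots, d\}$.
Define for $t \ge 0$ and $\alpha \in \mathbb{R}^d$
\[ M^\Lambda(\alpha, t) = \exp\Big\{ \langle \alpha, W^\Lambda_t \rangle - \sum_{i=1}^d t\,\Gamma^{(i)}_A(\alpha_i) - \sum_{i \in \Lambda} t\,\Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha)) - \sum_{i \in \Lambda^c} R^{(i)}(t)\,\Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha)) \Big\} \times r(\alpha, t) \]
where
\[ r(\alpha, t) = \prod_{i=1}^d r^{(i)S}\big(\Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha)), t\big)\; r^{(i)A}\big(\Gamma^{(i)}_A(\alpha_i), t\big) \]
\[ r^{(i)D}(\beta, t) = \frac{F^c_\beta}{F^c}(B(t))\; e^{\beta B(t)} \]
for $D \in \{A, S\}$ and $F$ the distribution function of inter event times of the renewal counting process $D^{(i)}$ and $B(t) = B(D^{(i)}, t)$ the age of $D^{(i)}$ at time $t$ (for the definition of $r^{(i)D}$ cf 3.6.5).
Note that the stochasticity of $M^\Lambda(\alpha, t)$ is in the inner product with $W^\Lambda$ and in the runtimes $R^{(i)}$ for $i \in \Lambda^c$.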
Claim 5.3.31. $t \mapsto M^\Lambda(\alpha, t)$ is a change of measure process for the local process.
Under the change of measure $M^\Lambda(\alpha, \cdot)$ the process $W^\Lambda$ is again a local process associated with the primitives of a generalised Jackson network and the same set $\Lambda$. Each primitive changes its distribution in the following way:
• if $A^{(i)}$ had inter event densities $f$ then under the changed measure $A^{(i)}$ remains a renewal counting process and now has inter event times density $f_{\Gamma^{(i)}_A(\alpha_i)}$, cf 2.3.1, (2.4).
• if the routing decision $r^{(i)}$ was distributed on $\{e_1, \dots, e_d, 0\}$ with probability measure $(p_{i1}, \dots, p_{id}, 1 - \sum_{j=1}^d p_{ij})$ associated with the sub-probability measure $p^{(i)}$ then under the change of measure routing decisions remain iid. $r^{(i)}$ now has the distribution associated with the sub-probability measure $\nabla\Xi^{(i)}(\alpha)$, cf 4.4.23.
• if $S^{(i)}$ had inter event densities $f$ then under the changed measure $S^{(i)}$ remains a renewal counting process and now has inter event times density $f_{\Gamma^{(i)}_S((-\pi_i + \Xi^{(i)})(\alpha))}$, cf 2.3.1, (2.4).
Proof of 5.3.31: We can sequentially change the distributions of the primitive processes. We do it over different times: over $[0, t]$ for the arrival processes and the $S^{(i)}$ with $i \in \Lambda$. For the restricted nodes $i \in \Lambda^c$ we change the distribution of $S^{(i)}$ over $[0, R^{(i)}(t)]$. This still works in the new setting due to 5.3.29. Since $R^{(i)}(t)$ is a random variable in our setting and a stopping time, we can work with the stopped martingale as the change of measure.
5.3.31
Since the change of measure $M^\Lambda(\alpha, \cdot)$ has the same effect on the network primitives as in the case of the free process, the rates of the network's primitives change in the same way and corollary 5.3.16 holds under the change $M^\Lambda(\alpha, \cdot)$.
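Definition 5.3.32 ($E^{[\alpha]}$). For any fixed $t \ge 0$ and set $A \in \mathcal{F}^\Lambda_t$
\[ E\big[\mathbf{1}_{(W^\Lambda_s ;\ s \le t) \in A}\; M^\Lambda(\alpha, t)\big] = E^{[\alpha]}\big[\mathbf{1}_{(W^\Lambda_s ;\ s \le t) \in A}\big]. \]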
Remark 5.3.33.
• The local process has been defined with explicit reference to a starting point $z_0$. This suits our following application of the local process associated with a network process $Z = Z(\cdot, z_0)$. However, local processes can be defined more generally, not requiring that $\Lambda = \Lambda(z_0)$.
• If $\Lambda = \emptyset$ then $M^\Lambda(\alpha, \cdot) = M(\alpha, \cdot)$ is a change of measure process for the network process.
Chapter 6
Local large deviations of the
generalised Jackson network
For the local sample path large deviations for a Markovian network we cite the definition of Irina Ignatiouk-Robert in the introduction of [13], "Large Deviations for Processes with Discontinuous Statistics". The paper is concerned with how to develop full large deviations for Markovian processes with discontinuous statistics starting from local large deviations.
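Definition 6.0.2 (Local large deviations [13]). Let $x \in \mathbb{R}^d_{\ge 0}$ and $(X(t, x))$ be a Markov process on $E \subseteq \mathbb{R}^d_{\ge 0}$ with initial state $X(0, x) = x$. For $n \in \mathbb{N}$, $(Z_n(t, z))$ is the rescaled Markov process on $E_n = \frac{1}{n}E$ with initial state $Z_n(0, z) = z \in E_n$:
\[ Z_n(t, z) = \frac{1}{n} X(nt, nz). \]
A local sample path large deviation principle with a rate function $J_{[0,T]}$ is said to hold when the following inequalities are satisfied:
\[ (3):\quad \lim_{\delta \to 0} \lim_{\epsilon \to 0} \liminf_{n \to \infty} \inf_{z \in E_n,\ |z - \psi(0)| < \epsilon} \frac{1}{n} \log P_z(\|\psi - Z_n\| < \delta) \ge -J_{[0,T]}(\psi) \]
\[ (4):\quad \lim_{\delta \to 0} \limsup_{n \to \infty} \sup_{z \in E_n,\ |z - \psi(0)| < \delta} \frac{1}{n} \log P_z(\|\psi - Z_n\| < \delta) \le -J_{[0,T]}(\psi) \]
for every piecewise linear function $\psi : [0, T] \to \mathbb{R}^d_{\ge 0}$.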
From the Markov property it is deduced in [13] that one only needs to consider linear functions. We will work only with linear functions for the non-Markovian generalised Jackson network since we have shown that its primitives are exponentially equivalent to primitive processes that have independent increments over finitely many, deterministic, disjoint intervals, cf section 3.4.2. This can be generalised to independent evolution of the network over finitely many, deterministic, disjoint intervals of time, where the process stays close to some linear function over each interval (we can thereby bound runtimes and will obtain deterministic intervals for the service processes of the restricted nodes).
Since a piecewise linear function over a compact interval will hit and leave boundaries at a finite number of fixed instances of time, independence of the network evolution over such intervals should be enough to move from linear to piecewise linear functions. We work with a slightly weaker definition of local large deviations.
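Definition 6.0.3 (Local large deviations). For each $n \in \mathbb{N}$ and fixed $z_0$ let $Z_n(\cdot, z_0)$ be a scaled network process starting in $z_0$, cf 5.3.22. A local sample path large deviation principle is said to hold when for any $x, v, T$ such that
\[ x_i = 0 \Rightarrow v_i \ge 0 \]
\[ x_i > 0 \Rightarrow x_i + T v_i > 0 \]
the following inequalities are satisfied:
\[ (1):\quad \lim_{\delta \to 0} \lim_{\epsilon \to 0} \liminf_{n \to \infty} \inf_{|z_0 - x| < \epsilon} \frac{1}{n} \log P(\|(t \mapsto x + tv) - Z_n(\cdot, z_0)\| < \delta) \ge -T\,L(x, v) \]
\[ (2):\quad \lim_{\delta \to 0} \lim_{\epsilon \to 0} \limsup_{n \to \infty} \sup_{|z_0 - x| < \epsilon} \frac{1}{n} \log P(\|(t \mapsto x + tv) - Z_n(\cdot, z_0)\| < \delta) \le -T\,L(x, v) \]
with $\|\cdot\|$ the supremum norm over the interval $[0, T]$.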
We prove the following local large deviation principle for the non-Markovian generalised Jackson network. We explicitly allow linear functions that leave a boundary. Such a situation is given if there is $i$ such that $x_i = 0$, $v_i > 0$.
Claim 6.0.4 (Local large deviations for the generalised Jackson network). Consider the generalised Jackson network with $d$ nodes and primitives 5.1.5. Let $\Gamma^{(i)}_A, \Gamma^{(i)}_S, \Xi^{(i)}$ be the lmgfs for the primitives and $\Psi$ the lmgf for the free process. Under assumptions 5.0.2 for the inter event times and 5.1.9 for the network a sample path local large deviation principle holds with rate function
\[ L(x, v) = \sup_{\alpha \in B_K} \{\langle \alpha, v \rangle - \Psi(\alpha)\} \]
with
\[ K = \{i \mid x_i > 0 \text{ or } v_i > 0\} \]
\[ B_K = \{\alpha \in \mathbb{R}^d \mid -\alpha_i + \Xi^{(i)}(\alpha) \ge 0 \ \ \forall i \in K^c\}. \]
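To make the constrained supremum concrete, the following minimal sketch evaluates $L(x, v)$ numerically for a single node ($d = 1$). It assumes Poisson arrivals with rate `lam`, Poisson service completions with rate `mu` and no feedback, so that the routing lmgf vanishes and $\Psi(\alpha) = \lambda(e^{\alpha} - 1) + \mu(e^{-\alpha} - 1)$. These modelling choices, the function names and the search box are illustrative assumptions and not part of the claim.

```python
import math

def free_lmgf(alpha, lam, mu):
    """Psi(alpha) for one node: Poisson arrivals (rate lam), Poisson services (rate mu),
    every served customer leaves, so the routing lmgf Xi is identically 0 (assumption)."""
    return lam * (math.exp(alpha) - 1.0) + mu * (math.exp(-alpha) - 1.0)

def maximise_concave(f, lo, hi, iters=200):
    """Ternary search for the maximiser of a concave function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    best = 0.5 * (lo + hi)
    return best, f(best)

def local_rate(x, v, lam, mu, box=20.0):
    """Return (optimising twist, L(x, v)) with L = sup over B_K of alpha*v - Psi(alpha)."""
    node_in_K = x > 0 or v > 0
    # For a node outside K the constraint -alpha + Xi(alpha) >= 0 reads alpha <= 0.
    upper = box if node_in_K else 0.0
    return maximise_concave(lambda a: a * v - free_lmgf(a, lam, mu), -box, upper)

print(local_rate(x=1.0, v=-1.0, lam=1.0, mu=2.0))  # following the drift lam - mu costs nothing
print(local_rate(x=0.0, v=0.0, lam=1.0, mu=2.0))   # stable node staying empty costs nothing
print(local_rate(x=0.0, v=0.0, lam=2.0, mu=1.0))   # overloaded node staying empty has positive cost
```

In the third call the constraint $\alpha \le 0$ is not binding and the value agrees with the classical $(\sqrt{\lambda} - \sqrt{\mu})^2$; in the second call the constraint is active and forces the optimiser to $\alpha = 0$.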
From the definition of the local large deviation we make the general assumption
Assumption 6.0.5. $x, v, T$ are such that
\[ x_i = 0 \Rightarrow v_i \ge 0 \quad \text{and} \quad x_i > 0 \Rightarrow x_i + T v_i > 0. \]
In the local large deviation we bound the probability that the queue size process stays close to a specified linear function. Since queue sizes will always be non-negative this assumption picks linear functions that a network process may have positive probability of staying close to; it is not restrictive.
In the next section we will prove the upper bound of the local large deviation principle and obtain a candidate for the local rate function $L(\cdot, \cdot)$ as an inequality constrained optimisation problem. We will then investigate existence of an optimiser in this optimisation problem; an optimiser can be interpreted as the parameter of a change of measure for the network process. We will further investigate properties of the network under the change of measure with the optimising $\alpha$ as parameter and finally prove the lower bound. As the lower and upper bound coincide, the candidate local rate function is the local rate function and 6.0.4 will be proved.
6.1 Local large deviations upper bound
We are interested in the event
\[ Z_n(\cdot, z) \in U_\delta(t \mapsto x + tv) \qquad (6.1) \]
over some interval $[0, T]$ for the scaled network process $Z_n$ starting in $z \in \mathbb{R}^d$ and the asymptotic decay of the probability that the process stays in the neighbourhood over an interval of positive length as $n \to \infty$.
We start with linear functions where empty queues stay empty and the scaled network process has the same starting point as the linear function $t \mapsto x + tv$.
[Figure 6.1: $d = 2$, $x_2 = 0$, $v_2 = 0$. The paths $t \mapsto 0$ and $t \mapsto x_1 + tv_1$ over $[0, T]$, each with a $\delta$-neighbourhood.]
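Claim 6.1.1. Consider the generalised Jackson network with $d$ nodes and primitives 5.1.5. Let $\Gamma^{(i)}_A, \Gamma^{(i)}_S, \Xi^{(i)}$ be the lmgfs for the primitives and $\Psi$ the lmgf for the free process. Let $Z_n$ be the scaled queue size process of the generalised Jackson network. Given $x, v, T$ such that assumption 6.0.5 holds and $x_i = 0 \Rightarrow v_i = 0$
\[ \lim_{\delta \to 0} \limsup_{n \to \infty} \frac{1}{n} \log P(Z_n(\cdot, x) \in U_\delta(t \mapsto x + tv)) \le -T \sup_{\alpha \in B_\Lambda} \{\langle \alpha, v \rangle - \Psi(\alpha)\} \]
with the set $B_\Lambda$ defined as
\[ \Lambda = \{i \mid x_i > 0\} \]
\[ B_\Lambda = \{\alpha \in \mathbb{R}^d \mid -\alpha_i + \Xi^{(i)}(\alpha) \ge 0 \ \ \forall i \in \Lambda^c\}. \]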
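Proof of 6.1.1: By the assumption on $x, v, T$ we have $R^{(i)}(t) = t$ for all $i \in \Lambda$ and $t \le nT$ while $Z_n(\cdot, x) \in U_\delta(t \mapsto x + tv)$. Thus we can exchange the network process for the local process:
\[ Z_n(\cdot, x) \in U_\delta(t \mapsto x + tv) \iff W^\Lambda_n(\cdot, x) \in U_\delta(t \mapsto x + tv) \]
We uniformise in $x$:
\[
\begin{aligned}
W^\Lambda_n(t, x) - (x + tv) \qquad (6.2) \\
&= Z_n(0, x) + A_n(t) + \sum_{i \in \Lambda} S^{(i)}_n(t) + \frac{1}{n} \sum_{i \in \Lambda^c} S^{(i)} \circ R^{(i)}(nt) - x - tv \\
&= \underbrace{\frac{\lfloor nx \rfloor}{n} - x}_{\in [-\frac{1}{n},\, 0]} + \underbrace{A_n(t) + \sum_{i \in \Lambda} S^{(i)}_n(t) + \frac{1}{n} \sum_{i \in \Lambda^c} S^{(i)} \circ R^{(i)}(nt) - tv}_{= W^\Lambda_n(t, 0) - tv}
\end{aligned}
\]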
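As we have removed $x$ from (6.2) we remove it in the notation and write $W^\Lambda_n(t)$ instead of $W^\Lambda_n(t, 0)$. The difference $\frac{\lfloor nx \rfloor}{n} - x$ forces us to change the $\delta$ of our neighbourhood to some $\delta'$ with $|\delta - \delta'| \le \frac{1}{n}$ but we choose to ignore this notational nuisance. We are now investigating the event
\[ W^\Lambda_n \in U_\delta(t \mapsto tv) \]
and will apply the change of measure $M^\Lambda$ (cf 5.3.24) with parameter $\alpha \in \mathbb{R}^d$.
\[
\begin{aligned}
P(W^\Lambda_n \in U_\delta(t \mapsto tv)) &= E[\mathbf{1}_{W^\Lambda_n \in U_\delta(t \mapsto tv)}] = E\Big[\mathbf{1}_{W^\Lambda_n \in U_\delta(t \mapsto tv)} \frac{M^\Lambda(\alpha, nT)}{M^\Lambda(\alpha, nT)}\Big] \\
&= E^{[\alpha]}\Big[\mathbf{1}_{W^\Lambda_n \in U_\delta(t \mapsto tv)} \frac{1}{M^\Lambda(\alpha, nT)}\Big] \qquad (6.3)
\end{aligned}
\]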
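We bound
\[
\begin{aligned}
\frac{1}{M^\Lambda(\alpha, nT)} = \exp\Big\{ &-\underbrace{\langle \alpha, W^\Lambda_{nT} \rangle}_{= \langle \alpha, W^\Lambda_{nT} - nTv \rangle + \langle \alpha, nTv \rangle} + nT \Big( \sum_{i=1}^d \Gamma^{(i)}_A(\alpha_i) + \sum_{i \in \Lambda} \Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha)) \\
&+ \sum_{i \in \Lambda^c} \underbrace{\frac{R^{(i)}(nT)}{nT}}_{\le 1} \Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha)) \Big) \Big\} \frac{1}{r(\alpha, R(nT))} \qquad (6.4)
\end{aligned}
\]
To get an upper bound we restrict $\alpha$ to
\[ B_\Lambda := \{\alpha \in \mathbb{R}^d \mid -\alpha_i + \Xi^{(i)}(\alpha) \ge 0 \ \ \forall i \in \Lambda^c\} \qquad (6.5) \]
For such $\alpha$ the values $\Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha))$, $i \in \Lambda^c$, are non-negative, so bounding the relative runtime by 1 can only increase the exponent, resulting in
\[ \frac{1}{M^\Lambda(\alpha, nT)} \le \exp\big\{ -\langle \alpha, W^\Lambda_{nT} - nTv \rangle + nT(\Psi(\alpha) - \langle \alpha, v \rangle) \big\} \frac{1}{r(\alpha, R(nT))} \]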
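We go on with the bound
\[
\begin{aligned}
P(W^\Lambda_n \in U_\delta(t \mapsto tv)) &\overset{(6.4)}{\le} E^{[\alpha]}\Big[\mathbf{1}_{W^\Lambda_n \in U_\delta(t \mapsto tv)} \exp\big\{ \underbrace{\langle \alpha, nTv - W^\Lambda_{nT} \rangle}_{\le \|\alpha\| \cdot \|nTv - W^\Lambda_{nT}\| < \|\alpha\| nT\delta} + nT(\Psi(\alpha) - \langle \alpha, v \rangle) \big\} \frac{1}{r(\alpha, R(nT))}\Big] \qquad (6.6) \\
&\le \underbrace{E^{[\alpha]}\Big[\mathbf{1}_{W^\Lambda_n \in U_\delta(t \mapsto tv)} \frac{1}{r(\alpha, R(nT))}\Big]}_{\le 1 \cdot \sup \frac{1}{r} < \infty} \exp\{\|\alpha\| nT\delta + nT(\Psi(\alpha) - \langle \alpha, v \rangle)\}
\end{aligned}
\]
The expectation is finite uniformly in $n$ and $\delta$, since $\mathbf{1} \le 1$ and $\frac{1}{r(\alpha, R(nT))}$ is a product of bounded terms independent of $\delta$ and $n$, cf claim 3.6.11.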
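For fixed $\alpha \in B_\Lambda$ we have
\[ \lim_{\delta \to 0} \limsup_{n \to \infty} \frac{1}{n} \log P(W^\Lambda_n \in U_\delta(t \mapsto tv)) \le T(\Psi(\alpha) - \langle \alpha, v \rangle) \]
Optimising over $\alpha \in B_\Lambda$ we get the desired upper bound
\[ \inf_{\alpha \in B_\Lambda} T(\Psi(\alpha) - \langle \alpha, v \rangle) = -T \sup_{\alpha \in B_\Lambda} \{\langle \alpha, v \rangle - \Psi(\alpha)\} \]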
6.1.1
The upper bound $-T \sup_{\alpha \in B_\Lambda} \{\langle \alpha, v \rangle - \Psi(\alpha)\}$ in 6.1.1 looks similar to a Fenchel-Legendre transform, the usual candidate for a rate function. We will sometimes refer to this upper bound as an almost Fenchel-Legendre transform.
Interpretation 6.1.2. The $\alpha \in B_\Lambda$ are twist parameters in the change of measure process $M^\Lambda(\alpha, \cdot)$ and by definition 4.4.17 the change of measure for the service process changes the rate of the counting process from $\mu_i$ to
\[ \mu_i(\alpha) = \nabla\Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha)). \]
Thus we can interpret $B_\Lambda$ as the twist parameters that do not allow a decrease of service rates at the $\Lambda^c$-nodes.
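Corollary 6.1.3. It should be immediate that we can similarly bound the event (6.1) with $z \ne x$ if we have $z \to x$ before $\delta \to 0$:
\[ \lim_{\delta \to 0} \lim_{\epsilon \to 0} \limsup_{n \to \infty} \frac{1}{n} \log \sup_{z : |z - x| < \epsilon} P(Z_n(\cdot, z) \in U_\delta(t \mapsto x + tv)) \le -T \sup_{\alpha \in B_\Lambda} \{\langle \alpha, v \rangle - \Psi(\alpha)\} \]
The situation $x \ne z$ affects the proof in (6.2) as we get $\frac{\lfloor nx \rfloor}{n} - z = \frac{\lfloor nx \rfloor}{n} - x + x - z$ instead of $\frac{\lfloor nx \rfloor}{n} - x$. It is again just a matter of changing $\delta$ to some $\delta'$. The order of limits, $\epsilon \to 0$ before $\delta \to 0$, is important.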
6.1.1 Leaving a boundary
In this section we investigate the event that a network process starting in $Z_n(0) = \frac{\lfloor nx \rfloor}{n}$ stays close to some affine function $t \mapsto x + vt$ and we allow $v_i > 0$ for $i \notin \Lambda(x)$. That is: an initially empty node ($x_i = 0$) increases over $[0, T]$ and becomes non-empty. Figure 6.2 is an example of this situation.
[Figure 6.2: $d = 2$, $x_2 = 0$, $v_2 > 0$. The paths $t \mapsto tv_2$ and $t \mapsto x_1 + tv_1$ over $[0, T]$, each with a $\delta$-neighbourhood; the lower path leaves the boundary after time $\delta / v_2$.]
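Claim 6.1.4. Consider the generalised Jackson network with $d$ nodes and primitives 5.1.5. Let $\Gamma^{(i)}_A, \Gamma^{(i)}_S, \Xi^{(i)}$ be the lmgfs for the primitives and $\Psi$ the lmgf for the free process. Let assumption 6.0.5 hold for $x, v, T$. Then
\[ \lim_{\delta \to 0} \lim_{n \to \infty} \frac{1}{n} \log P(Z_n(\cdot, x) \in U_\delta(t \mapsto x + tv)) \le -T \sup_{\alpha \in B_K} \{\langle \alpha, v \rangle - \Psi(\alpha)\} \]
for
\[ K = \{i \mid x_i > 0 \text{ or } v_i > 0\} \]
\[ B_K = \{\alpha \in \mathbb{R}^d \mid -\alpha_i + \Xi^{(i)}(\alpha) \ge 0 \ \ \forall i \in K^c\}. \]
The difference between claims 6.1.1 and 6.1.4 is in $v_{\Lambda^c} = 0$ vs $v_{\Lambda^c} \ge 0$ and the optimisation on the right hand sides over $B_\Lambda$ vs $B_K$. The difference in the result stems only from the additional assumption in 6.1.1.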
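Lemma 6.1.5. For $i$ with $x_i = 0$, $v_i > 0$
\[ Z_n(\cdot, z) \in U_\delta(t \mapsto x + tv) \ \Rightarrow\ R^{(i)}_n(T) \ge T - \frac{\delta}{v_i} \]
Proof of 6.1.5: The claim may be obvious from figure 6.2. Nevertheless, we apply the definition of the runtime and bound. In this proof we abbreviate $Z_n(\cdot, z) \in U_\delta(t \mapsto x + tv)$ as $Z_n \in U$.
\[
\begin{aligned}
\mathbf{1}_{Z_n \in U}\, R^{(i)}(nT) &= \mathbf{1}_{Z_n \in U} \int_{t=0}^{nT} \mathbf{1}_{Z^{(i)}_t > 0}\, dt \\
&\ge \mathbf{1}_{Z_n \in U} \int_{t=0}^{nT} \mathbf{1}_{n x_i + t v_i - n\delta > 0}\, dt \\
&\overset{x_i = 0}{=} \mathbf{1}_{Z_n \in U} \int_{t=0}^{nT} \mathbf{1}_{t > \frac{n\delta}{v_i}}\, dt = \mathbf{1}_{Z_n \in U} \Big(nT - \frac{n\delta}{v_i}\Big)
\end{aligned}
\]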
6.1.5
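The runtime bound can be illustrated on the fluid scale. The sketch below is only an illustration under assumptions: it takes one deterministic path that stays within $\delta$ of $t \mapsto tv$ (the path, the step count and all parameter values are hypothetical choices) and measures the time the path spends away from zero with a midpoint rule.

```python
def runtime(path, horizon, steps=100_000):
    """Approximate the time in [0, horizon] during which path(t) > 0 (midpoint rule)."""
    dt = horizon / steps
    return sum(dt for k in range(steps) if path((k + 0.5) * dt) > 0.0)

v, delta, T = 0.5, 0.1, 2.0

def lagging_path(t):
    """A path that trails the line t -> t*v by 0.09 < delta and is reflected at zero."""
    return max(0.0, t * v - 0.09)

# The path stays strictly inside the delta-neighbourhood of the line on a test grid.
max_gap = max(abs(lagging_path(0.001 * k) - 0.001 * k * v) for k in range(2001))
busy_time = runtime(lagging_path, T)

print(max_gap < delta)             # True: the path lies in U_delta
print(busy_time, T - delta / v)    # measured runtime versus the lower bound of the lemma
```

For this particular path the node is empty only on $[0, 0.18]$, so the runtime is about $1.82$, above the bound $T - \delta / v = 1.8$.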
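Proof of 6.1.4: In this proof we abbreviate $W_n(\cdot, 0) \in U_\delta(t \mapsto tv)$ as $W^\Lambda_n \in U$. The proof goes unchanged up to (6.4), where we bound the summands for $i \in K \cap \Lambda^c$ differently:
\[
\begin{aligned}
&\mathbf{1}_{W^\Lambda_n \in U_\delta(t \mapsto tv)} \frac{R^{(i)}(nT)}{nT}\, \Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha)) \\
&= \mathbf{1}_{W^\Lambda_n \in U_\delta(t \mapsto tv)} \mathbf{1}_{-\alpha_i + \Xi^{(i)}(\alpha) > 0} \underbrace{\frac{R^{(i)}(nT)}{nT}}_{\le 1} \Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha)) \\
&\quad + \mathbf{1}_{-\alpha_i + \Xi^{(i)}(\alpha) \le 0} \mathbf{1}_{W^\Lambda_n \in U_\delta(t \mapsto tv)} \underbrace{\frac{R^{(i)}(nT)}{nT}}_{\ge 1 - \frac{\delta}{T v_i}} \underbrace{\Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha))}_{\le 0} \\
&\le \mathbf{1}_{W^\Lambda_n \in U_\delta(t \mapsto tv)} \Big( \mathbf{1}_{-\alpha_i + \Xi^{(i)}(\alpha) > 0}\, \Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha)) + \mathbf{1}_{-\alpha_i + \Xi^{(i)}(\alpha) \le 0} \Big(1 - \frac{\delta}{T v_i}\Big) \Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha)) \Big) \\
&= \mathbf{1}_{W^\Lambda_n \in U_\delta(t \mapsto tv)}\, \Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha)) \Big(1 - \frac{\delta}{T v_i} \mathbf{1}_{-\alpha_i + \Xi^{(i)}(\alpha) \le 0}\Big)
\end{aligned}
\]
and then, analogous to (6.4), with the runtime bounds that will be applied indicated below the relative runtimes,
\[
\begin{aligned}
\mathbf{1}_{W^\Lambda_n \in U_\delta(t \mapsto tv)} \frac{1}{M^\Lambda(\alpha, nT)} = \mathbf{1}_{W^\Lambda_n \in U_\delta(t \mapsto tv)} \exp\Big\{ &-\langle \alpha, W^\Lambda_{nT} \rangle + nT \Big( \sum_{i=1}^d \Gamma^{(i)}_A(\alpha_i) + \sum_{i \in \Lambda} \Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha)) \\
&+ \sum_{i \in K \cap \Lambda^c} \underbrace{\frac{R^{(i)}(nT)}{nT}}_{\to\ 1 - \frac{\delta}{T v_i} \mathbf{1}_{-\alpha_i + \Xi^{(i)}(\alpha) \le 0}} \Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha)) \\
&+ \sum_{i \in K^c} \underbrace{\frac{R^{(i)}(nT)}{nT}}_{\le 1} \Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha)) \Big) \Big\} \frac{1}{r(\alpha, R(nT))}
\end{aligned}
\]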
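The upper bound then becomes
\[
\begin{aligned}
P(W^\Lambda_n \in U_\delta(t \mapsto tv)) \le\ & E^{[\alpha]}\Big[\mathbf{1}_{W^\Lambda_n \in U_\delta(t \mapsto tv)} \frac{1}{r}\Big] \exp\Big\{ \|\alpha\| nT\delta - nT \langle \alpha, v \rangle + nT \Big( \sum_{i=1}^d \Gamma^{(i)}_A(\alpha_i) \\
&+ \sum_{i \in \Lambda^c \cap K} \Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha)) \Big(1 - \frac{\delta}{T v_i} \mathbf{1}_{-\alpha_i + \Xi^{(i)}(\alpha) \le 0}\Big) \\
&+ \sum_{i \in \Lambda} \Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha)) + \sum_{i \in K^c} \Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha)) \Big) \Big\} \qquad (6.7)
\end{aligned}
\]
where we need the restriction $-\alpha_i + \Xi^{(i)}(\alpha) \ge 0$ only for $i \in K^c$, for bounding the relative runtime in (6.7) by 1. These restrictions define $B_K$ analogously to (6.5). We continue
\[
\begin{aligned}
P(W^\Lambda_n \in U_\delta(t \mapsto tv)) \le\ & E^{[\alpha]}\Big[\underbrace{\mathbf{1}_{W^\Lambda_n \in U_\delta(t \mapsto tv)}}_{\le 1} \underbrace{\frac{1}{r}}_{\text{bounded by 3.6.11}}\Big] \exp\Big\{ \|\alpha\| nT\delta - nT \langle \alpha, v \rangle + nT\,\Psi(\alpha) \\
&- nT \sum_{i \in \Lambda^c \cap K} \Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha)) \underbrace{\frac{\delta}{T v_i}}_{\to 0 \text{ as } \delta \to 0} \mathbf{1}_{-\alpha_i + \Xi^{(i)}(\alpha) \le 0} \Big\}
\end{aligned}
\]
and under the scaling limit
\[ \lim_{\delta \to 0} \lim_{n \to \infty} \frac{1}{n} \log P(W^\Lambda_n \in U_\delta(t \mapsto tv)) \le -T \langle \alpha, v \rangle + T\,\Psi(\alpha) \]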
6.1.4
6.2 Existence and uniqueness of an optimiser
We investigate the optimisation problem found in claim 6.1.4
\[ \sup_{\alpha \in B_K} \{\langle \alpha, v \rangle - \Psi(\alpha)\} \qquad (6.8) \]
with
\[ K \supseteq \{i \mid v_i > 0\} \]
\[ B_K = \{\alpha \in \mathbb{R}^d \mid -\alpha_i + \Xi^{(i)}(\alpha) \ge 0 \ \ \forall i \in K^c\} \]
and we will argue for the existence of a unique optimiser in the following. Uniqueness is not really an issue since we have seen that $\Psi$ is strictly convex (if all inter event times are non-deterministic or if at least there is non-deterministic flow reaching each node).
We start with a simple condition for existence of an optimiser and then develop a second one, more elaborate and less restrictive.
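Claim 6.2.1. If the Fenchel-Legendre transform $\Psi^*$ is finite on all of $\mathbb{R}^d$ then an optimising $\alpha$ in (6.8) exists.
Proof of 6.2.1: We prove that for any $v, K$ such that $v_{K^c} = 0$ the level sets $\{\alpha \in B_K \mid \Psi(\alpha) - \langle \alpha, v \rangle \le c\}$ are compact. Similar to [12] we construct a finite norm-ball including the level set.
Let $\|\cdot\|$ denote some norm in $\mathbb{R}^d$ and define the norm $|\cdot|_1$ as
\[ |\alpha|_1 = \sup_{\|v'\| \le 1} \langle \alpha, v' \rangle \]
We give a finite bound for $|\alpha|_1$ uniform in $\alpha$ of the level set.
\[ \sup_{\alpha \in B_K :\ \Psi(\alpha) - \langle \alpha, v \rangle \le c} |\alpha|_1 = \sup_{\alpha \in B_K :\ \Psi(\alpha) - \langle \alpha, v \rangle \le c}\ \sup_{v' : \|v'\| \le 1} \langle \alpha, v' \rangle \]
For $\alpha$ from the level set $\{\alpha \in B_K \mid \Psi(\alpha) - \langle \alpha, v \rangle \le c\}$
\[
\begin{aligned}
\langle \alpha, v' \rangle &= \langle \alpha, v' + v \rangle - \langle \alpha, v \rangle \\
&= \langle \alpha, v' + v \rangle - \Psi(\alpha) + \underbrace{\Psi(\alpha) - \langle \alpha, v \rangle}_{\le c}
\end{aligned}
\]
and thus
\[
\begin{aligned}
|\alpha|_1 &\le c + \sup_{v' : \|v'\| \le 1} \{\langle \alpha, v' + v \rangle - \Psi(\alpha)\} \\
&\le c + \sup_{v' : \|v'\| \le 1}\ \sup_{\alpha \in \mathbb{R}^d} \{\langle \alpha, v' + v \rangle - \Psi(\alpha)\} \\
&= c + \sup_{v' : \|v'\| \le 1} \Psi^*(v' + v) \qquad (6.9)
\end{aligned}
\]
which is finite if $\Psi^*$ is finite on $\mathbb{R}^d$ and the supremum is over a convex set. The bound is uniform in $\alpha$: the level set is bounded. Closedness is immediate and only needs continuity of $\Psi$. From compactness of level sets and finiteness and continuity of the objective $\alpha \mapsto \Psi(\alpha) - \langle \alpha, v \rangle$ follows existence of an infimiser.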
6.2.1
We formulate our second criterion below in 6.2.7 and we believe that its if-part holds if the rate functions $\Gamma^{(i)*}_D$ for all $i = 1, \dots, d$ and $D \in \{A, S\}$ are open, cf 2.8. Technically we need to replace $\Psi^*$ in the bound (6.9) in the case that (6.8) is finite but there is no neighbourhood of $v$ on which $\Psi^*$ is finite. We will define the replacement $G_K$ in 6.2.4.
We start with basic convex analysis and then give an upper bound for (6.8). We will also argue for finiteness of this upper bound.
Consider the generalised Jackson network with primitives $A^{(i)}, S^{(i)}, n \mapsto \sum_{k=1}^n r^{(i)}_k$ for $i = 1, \dots, d$ (cf 5.1.5) where the arrival and service processes have lmgfs $\Gamma^{(i)}_A, \Gamma^{(i)}_S$ and $r^{(i)}_k$ has lmgf $\Xi^{(i)}$, cf 4.4.8. The $\Gamma^{(i)}_D$ are strictly convex as soon as they are not deterministic, the $\Xi^{(i)}$ are strictly convex if the routing measures they are built from are not point measures.
Let $\gamma \in \mathbb{R}^d_{\ge 0}$ and $w \in \mathbb{R}^d$ be fixed for the moment and denote by $\pi_j$ the projection from $\mathbb{R}^d$ onto $\mathrm{span}\{e_j\}$, that is $\pi_j(\alpha) = \alpha_j e_j$. Then $\alpha \mapsto \sum_{i=1}^d \Gamma^{(i)}_A \circ \pi_i(\alpha) + \gamma_i \Xi^{(i)}(\alpha)$ is a convex function and we investigate its Fenchel-Legendre transform.
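Definition 6.2.2. For $\gamma \in \mathbb{R}^d_{\ge 0}$ define
\[ g_\gamma : \mathbb{R}^d \to \mathbb{R} \cup \{\infty\}, \quad \theta \mapsto \sum_{i=1}^d \big(\Gamma^{(i)}_A \circ \pi_i + \gamma_i \Xi^{(i)}\big)(\theta). \]
Claim 6.2.3. $g_\gamma$ is convex and
\[ g^*_\gamma(w) = \inf_{a \in \mathbb{R}^d,\ r \in \mathbb{R}^{d \times d},\ a + r\gamma = w}\ \sum_{i=1}^d \Gamma^{(i)*}_A(a_i) + \gamma_i \Xi^{(i)*}(r^{(i)}) \]
We interpret $g^*_\gamma(w)$ as the joint decay rate for the probability that at node $i$ an empirical arrival rate $a_i$ can be observed instead of the expected $\lambda_i$ and that routing happens at empirical rates $r^{(i)}$ instead of $p^{(i)}$ over time $\gamma_i$. Additionally there is the condition that $a_i, r^{(i)}$ have to be such as to produce total flow into each node $i$ of rate $w_i = a_i + (r\gamma)_i$ in the associated fluid network. If there are no $a, r$ in the domains of $\Gamma^{(i)*}_A, \Xi^{(i)*}$ such that a flow of $a$ into the fluid network and a splitting of flow at each node $i$ wrt $r^{(i)}$ would produce input flow $w$ into the nodes, then $g^*_\gamma(w) = \infty$.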
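Proof of 6.2.3: We start with some convex analysis (cf (7.2) of the appendix).
\[ \Big(\sum_{i=1}^d \Gamma^{(i)}_A \circ \pi_i + \gamma_i \Xi^{(i)}\Big)^*(w) = \inf_{c_j, d_j \in \mathbb{R}^d,\ \sum_j c_j + d_j = w}\ \sum_{j=1}^d (\Gamma^{(j)}_A \circ \pi_j)^*(c_j) + (\gamma_j \Xi^{(j)})^*(d_j) \]
Due to 7.3.1 we do not increase the infimum when restricting $c_j$ to $\pi_j(\mathbb{R}^d)$. In the following we optimise over $a \in \mathbb{R}^d$ with $a_j = \langle c_j, e_j \rangle$. This is OK since $(c_1, \dots, c_d) \mapsto a$ is a bijection for those $c_j$ for which $(\Gamma^{(j)}_A \circ \pi_j)^*(c_j) < \infty$. We also apply (7.1)
\[ g^*_\gamma(w) = \inf_{a, d_j \in \mathbb{R}^d,\ a + \sum_j d_j = w}\ \sum_{j=1}^d \Gamma^{(j)*}_A(a_j) + \gamma_j \Xi^{(j)*}\Big(\frac{1}{\gamma_j} d_j\Big) \qquad (6.10) \]
Changing variables from $\frac{1}{\gamma_j} d_j$ to $r^{(j)}$ in the argument of $\Xi^{(j)*}$ we need to change the restriction, too.
\[ r^{(j)} = \frac{1}{\gamma_j} d_j \ \Rightarrow\ \sum_{j=1}^d d_j = \sum_{j=1}^d \gamma_j r^{(j)} = r\gamma \]
This completes the proof.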
6.2.3
From the new representation of $g^*_\gamma$ in 6.2.3 we see that $g^*_\gamma$ is finite if there are $(a, r) \in \mathbb{R}^d \times \mathbb{R}^{d \times d}$ with $a_i \ge 0$, $\lambda_i = 0 \Rightarrow a_i = 0$, and $r^{(i)}$ equivalent to $p^{(i)}$. It is also possible that there is $r^{(i)}$ not equivalent to $p^{(i)}$ but with $\mathrm{supp}(r^{(i)}) \subsetneq \mathrm{supp}(p^{(i)})$. In that case there would be no finite optimiser in $\Xi^{(i)*}(r^{(i)})$ (cf 4.4.22).
Note that the Fenchel-Legendre transform in claim 6.2.3 satisfies
• $g^*_\gamma \ge 0$ from $\Gamma^{(i)*}_A, \Xi^{(i)*}, \gamma_i \ge 0$ and thus $g^*_\gamma > -\infty$ always,
• $g^*_\gamma(\lambda + P\mu) < \infty$ since $(\lambda, P) \in \mathcal{X}(\lambda + P\mu, \mu)$. This is also a minimiser of the Fenchel-Legendre transform: $g^*_\gamma(\lambda + P\mu) = 0$.
Thus $g^*_\gamma$ is a proper convex function (cf [19] p. 24, definition of "proper").
As a next preparatory step we define:
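Definition 6.2.4. Let $K \subseteq \{1, \dots, d\}$.
\[
\begin{aligned}
G_K : \mathbb{R}^d &\to \mathbb{R} \cup \{\infty\} \\
v &\mapsto \inf_{a, r, \gamma :\ a + (r - \mathrm{id})\gamma = v}\ \sum_{i=1}^d \Gamma^{(i)*}_A(a_i) + \gamma_i \Xi^{(i)*}(r^{(i)}) + \sum_{j \in K} \Gamma^{(j)*}_S(\gamma_j) + \sum_{j \in K^c} \mathbf{1}_{\gamma_j > \mu^{(j)}}\, \Gamma^{(j)*}_S(\gamma_j)
\end{aligned}
\]
$G_K(v) < \infty$ if there is one set of $\{a, r, \gamma\}$ such that the respective rate functions are finite and the fluid network with
• non-empty nodes $K$
• arrival rate $a_i$ at node $i$
• service rate $\gamma_i$ at each node $i \in K$
• service rate $\max\{\gamma_i, \mu_i\}$ at each node $i \in K^c$
• routing matrix $r$
has network drift $v$ (cf definition 5.2.16). The difference $\mu_i - \gamma_i$ would be the loss rate, usually denoted $y_i$, at the initially empty subnetwork $K^c$.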
Claim 6.2.5. If the set of restrictions $\{a, r, \gamma : a + (r - \mathrm{id})\gamma = v\}$ has a non-empty intersection with the domains of the respective individual rate functions $\Gamma^{(i)*}_A, \Gamma^{(i)*}_S, \Xi^{(i)*}$ then $G_K(v) < \infty$ and the infimum is a minimum (optimiser exists).
Proof of 6.2.5: We argue with compactness of level sets.
\[ \sum_{j=1}^d \Gamma^{(j)*}_A(a_j) + \gamma_j \Xi^{(j)*}(r^{(j)}) + \sum_{i \in \Lambda^c} \mathbf{1}_{\gamma_i > \mu^{(i)}}\, \Gamma^{(i)*}_S(\gamma_i) + \sum_{i \in \Lambda} \Gamma^{(i)*}_S(\gamma_i) \le M \]
\[
\Rightarrow\quad
\begin{cases}
\Gamma^{(j)*}_A(a_j) \le M, & j = 1, \dots, d \\
\Gamma^{(i)*}_S(\gamma_i) \le M, & i \in \Lambda \\
\gamma_i \in [0, \mu^{(i)}] \ \text{or}\ \Gamma^{(i)*}_S(\gamma_i) \le M, & i \in \Lambda^c
\end{cases}
\]
From goodness of the $\Gamma$-rate functions the $a, \gamma$ are in compact sets of $\mathbb{R}^d$, forming a bounded set themselves. Also the $r^{(i)}$ are sub-probability measures and their max-norm is $\le 1$, so each is in a bounded set of $\mathbb{R}^d$. From continuity of all involved rate functions the level set is closed. Thus all parameters are in compact sets forming a compact set in product space. Therefore $G_K$, which was defined as an infimum, is actually a minimum whenever it is finite.
6.2.5
Note that $G_K$ can be written as a composition of $g^*_\gamma$, which is constant in $K$, and the rate functions for the service processes.
\[ G_K(v) = \inf_\gamma\ g^*_\gamma(v + \gamma) + \sum_{i \in K^c} \mathbf{1}_{\gamma_i > \mu^{(i)}}\, \Gamma^{(i)*}_S(\gamma_i) + \sum_{i \in K} \Gamma^{(i)*}_S(\gamma_i). \]
Further note that $G_{\{1, \dots, d\}} = \Psi^*$, so $G_{\{1, \dots, d\}}$ is a tight upper bound for the Fenchel-Legendre transform $\Psi^*$, and this generalises as:
Claim 6.2.6. $G_K$ bounds the almost Fenchel-Legendre transform (6.8).
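Proof of 6.2.6: We start with some transformations that will allow us to apply 6.2.3.
\[
\begin{aligned}
&\sup_{\alpha \in B_K} \langle \alpha, v \rangle - \Psi(\alpha) \\
&= \sup_{\alpha \in B_K} \langle \alpha, v \rangle - \sum_{i=1}^d \Big( \Gamma^{(i)}_A(\alpha_i) + \Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha)) \Big)
\end{aligned}
\]
we add a zero (introducing the non-negative parameters $\gamma_1, \dots, \gamma_d$) and rearrange.
\[
\begin{aligned}
&= \sup_{\alpha \in B_K} \langle \alpha, v \rangle + \sum_{i=1}^d \Big( -\gamma_i(-\alpha_i + \Xi^{(i)}(\alpha)) - \Gamma^{(i)}_A(\alpha_i) + \gamma_i(-\alpha_i + \Xi^{(i)}(\alpha)) - \Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha)) \Big) \\
&= \sup_{\alpha \in B_K} \langle \alpha, v + \gamma \rangle - \sum_{i=1}^d \Big( \gamma_i \Xi^{(i)}(\alpha) + \Gamma^{(i)}_A(\alpha_i) \Big) + \sum_{i=1}^d \Big( \gamma_i(-\alpha_i + \Xi^{(i)}(\alpha)) - \Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha)) \Big)
\end{aligned}
\]
and take suprema separately. We lose the restriction in the first supremum. The expression may increase:
\[
\begin{aligned}
&\le \sup_{\alpha \in \mathbb{R}^d} \Big\{ \langle \alpha, v + \gamma \rangle - \sum_{i=1}^d \Big( \gamma_i \Xi^{(i)}(\alpha) + \Gamma^{(i)}_A(\alpha_i) \Big) \Big\} + \sum_{i=1}^d \sup_{\alpha \in B_K} \Big\{ \gamma_i(-\alpha_i + \Xi^{(i)}(\alpha)) - \Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha)) \Big\} \\
&\le \Big( \sum_{i=1}^d \Gamma^{(i)}_A \circ \pi_i + \gamma_i \Xi^{(i)} \Big)^*(v + \gamma) + \sum_{i \in K^c} \mathbf{1}_{\gamma_i > \mu^{(i)}}\, \Gamma^{(i)*}_S(\gamma_i) + \sum_{i \in K} \Gamma^{(i)*}_S(\gamma_i) \qquad (6.11)
\end{aligned}
\]
Optimise over $\gamma$ and the claim follows: we get
\[ \sup_{\alpha \in B_K} \langle \alpha, v \rangle - \Psi(\alpha) \le \inf_{\gamma \in \mathbb{R}^d} (6.11) = G_K(v) \]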
6.2.6
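In the rest of the subsection we apply $G_K$ to prove existence of an optimiser in the almost Fenchel-Legendre transform (6.8).
Claim 6.2.7. If $v$ lies in the interior of the domain $\mathcal{D}(G_K)$ then for any $v, K$ such that $v_{K^c} = 0$ the level sets $\{\alpha \in B_K \mid \Psi(\alpha) - \langle \alpha, v \rangle \le c\}$ are compact and an optimiser in (6.8) exists.
Proof of 6.2.7: Let $a > 0$ be small enough for $G_K$ to be finite in a $\|\cdot\|$-ball of radius $a$ around $v$ and let $|\alpha|_a = \sup_{v' : \|v'\| \le a} \langle \alpha, v' \rangle$. Then as in the proof of 6.2.1 we obtain
\[
\begin{aligned}
|\alpha|_a &\le c + \sup_{v' : \|v'\| \le a} \{\langle \alpha, v' + v \rangle - \Psi(\alpha)\} \\
&\le c + \sup_{v' : \|v'\| \le a}\ \sup_{\alpha \in B_\Lambda} \{\langle \alpha, v' + v \rangle - \Psi(\alpha)\} \\
&\le c + \sup_{v' : \|v'\| \le a} G_\Lambda(v' + v)
\end{aligned}
\]
which is a finite bound by choice of $a$ and uniform in $\alpha$: the level set is bounded. Closedness is immediate from continuity of $\Psi$.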
6.2.7
6.3 Network drift under the changed measure
From the local large deviations upper bound in 6.1.1 and, more generally, 6.1.4 we got the candidate for the local rate function $L(x, v)$ as the following optimisation problem
\[ \sup_{\alpha \in B_K} \{\langle \alpha, v \rangle - \Psi(\alpha)\}. \qquad (6.12) \]
In section 6.2 we have given conditions under which an optimiser $\alpha$ exists. We will now investigate the behaviour of the network after the change of measure $M(\alpha, \cdot)$.
For any $\alpha$ the change of measure can be translated back into the individual changes of measure for each network primitive, cf 5.3.31. Thus once we have identified an $\alpha$ we are not restricted to work with the local process $W^\Lambda$ we started with; we just switch from $P$ to $P^{[\alpha]}$ and work with the network primitives and the free, the network, and the local process as before.
Claim 6.3.1. Under assumption 6.0.5 for $x, v, T$ and under $P^{[\alpha]}$ for $\alpha$ the optimiser in 6.1.4 the fluid limit of $Z_n(\cdot, z_0)$ is $t \mapsto z_0 + tv$ and
\[ \lim_{n \to \infty} P^{[\alpha]}(Z_n \in U_\delta(t \mapsto z_0 + tv)) = 1. \]
The proof of 6.3.1 requires elements of optimisation theory, which we state before we begin the proof. For this section we rephrase the almost Fenchel-Legendre transform (6.12) as an inequality constrained minimisation problem (ICM).
\[ -\langle \alpha, v \rangle + \Psi(\alpha) \to \min \quad \text{subj. to} \quad \alpha_i - \Xi^{(i)}(\alpha) \le 0 \ \ \forall i \in K^c \qquad \text{(ICM)} \]
For future reference set $g_i = \pi_i - \Xi^{(i)}$.
If for the ICM the Slater condition holds, the optimiser satisfies the Karush-Kuhn-Tucker (KKT) condition. The KKT condition will help us to identify $v$ as the network drift under $P^{[\alpha]}$.
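Claim 6.3.2 (Slater condition). There is $\alpha \in \mathbb{R}^d$ such that $g_i(\alpha) < 0$ for each $i \in \Lambda^c$.
Proof of 6.3.2: We assume that $\Lambda^c \ne \emptyset$. We step by step fix the value of each $\alpha_i$ such that the condition $g_i(\alpha) < 0$ always (and finally) holds for all $i \in \Lambda^c$.
Fix values for $\alpha_\Lambda$ and let $B := \Lambda$ be the set of indices with $\alpha_i$ already fixed. The proof stops as soon as $B = \{1, \dots, d\}$ and $g_i(\alpha) < 0$ has been checked for all $i \in \Lambda^c$.
We define a partition of the $\Lambda^c$-nodes relative to the length of the shortest path from a node $i \in \Lambda^c$ to $\Lambda \cup \{0\}$. From assumption 5.1.9 there is a finite length path from $i$ to $\{0\}$, making the shortest path from $i$ to $\Lambda \cup \{0\}$ have finite length. For fixed $\Lambda$ set
\[ A_0 = \Lambda \cup \{0\} \]
and for $k = 1, 2, \dots$ and while $A_{k-1}$ is not empty set
\[ A_k = \{i \in \{1, \dots, d\} \setminus (A_0 \cup \dots \cup A_{k-1}) \mid \exists j \in A_{k-1} : p_{ij} > 0\} \]
Then $A_k$ is the subset of $\Lambda^c$-nodes with the shortest path to $\Lambda \cup \{0\}$ consisting of $k$ edges. Note that there are at most $d + 1$ such sets.
Set $\alpha_i$ for $i \in \Lambda$ to an arbitrary value in $\mathbb{R}$. In the following we fix the values of $\alpha_i$, $i \in A_1$: We omit the $\alpha$'s not to be fixed now and get an inequality.
\[ \Xi^{(i)}(\alpha) \ge \log\Big( \sum_{j \in A_1} p_{ij} e^{\alpha_j} + \underbrace{\sum_{j \in B} p_{ij} e^{\alpha_j} + p_{i0}}_{=:\, p'_{i0}} \Big) \]
\[ g_i(\alpha) = \alpha_i - \Xi^{(i)}(\alpha) \le \alpha_i - \log\Big( \sum_{j \in A_1} p_{ij} e^{\alpha_j} + p'_{i0} \Big) \]
We now have an upper bound for $g_i(\alpha)$ and we choose $\alpha_{A_1}$ to make the upper bound negative.
\[
\begin{aligned}
0 &> \alpha_i - \log\Big( \sum_{j \in A_1} p_{ij} e^{\alpha_j} + p'_{i0} \Big) \\
\Leftrightarrow\quad e^{\alpha_i} &< \sum_{j \in A_1} p_{ij} e^{\alpha_j} + p'_{i0} \\
\Leftrightarrow\quad \hat\alpha_i &< \sum_{j \in A_1} p_{ij} \hat\alpha_j + p'_{i0} \qquad (6.13)
\end{aligned}
\]
where we set $\hat\alpha_i := e^{\alpha_i}$ and will have to observe the condition $\hat\alpha_i > 0$ to get an $\alpha_i \in \mathbb{R}$. From the construction of $A_1$ we have $p'_{i0} > 0$, either due to $p_{i0} > 0$ or from $p_{ij} > 0$ for some $j \in B = \Lambda$ and the $\alpha_\Lambda$ fixed as some real numbers, so the rhs of (6.13) is positive. In the linear notation $p'_{i0} = \sum_{j \in B} p_{ij} \hat\alpha_j + p_{i0}$. We get a system of $|\Lambda^c|$ linear inequalities.
\[
\begin{aligned}
(6.13)\ \forall i \in \Lambda^c \ \Leftrightarrow\ \hat\alpha_{\Lambda^c} &< P_{\Lambda^c A_1} \hat\alpha_{A_1} + P_{\Lambda^c B} \hat\alpha_B + P_{\Lambda^c \{0\}} \\
\Rightarrow\ \hat\alpha_{A_1} &< P_{A_1} \hat\alpha_{A_1} + P_{A_1 B} \hat\alpha_B + P_{A_1 \{0\}} \\
\Leftrightarrow\ (\mathrm{id} - P_{A_1}) \hat\alpha_{A_1} &< P_{A_1 B} \hat\alpha_B + P_{A_1 \{0\}}
\end{aligned}
\]
The inverse of $\mathrm{id} - P_{A_1}$ exists and is strictly positive. Thus we can multiply with $(\mathrm{id} - P_{A_1})^{-1}$ and keep the coordinate-wise inequality.
\[ \hat\alpha_{A_1} < (\mathrm{id} - P_{A_1})^{-1} \big( P_{A_1 B} \hat\alpha_B + P_{A_1 \{0\}} \big) \]
which leaves a non-degenerate positive interval for each $\hat\alpha_i$, $i \in A_1$. We can fix $\alpha_{A_1}$ and thus update $B := B \cup A_1$ $(= \Lambda \cup A_1)$. If $|B| = d$ we are done. Else:
We can iterate this. For $k \ge 2$ the $k$-th iteration is to be done only if $|B| = |\Lambda \cup A_1 \cup \dots \cup A_{k-1}| < d$, which is equivalent to $A_k \ne \emptyset$. Up to the $(k-1)$-st iteration the $\alpha_i$ are known for all $i \in B$. As before
\[ \Xi^{(i)}(\alpha) \ge \log\Big( \sum_{j \in A_k} p_{ij} e^{\alpha_j} + \underbrace{\sum_{j \in B} p_{ij} e^{\alpha_j} + \underbrace{p_{i0}}_{= 0}}_{=:\, p'_{i0}} \Big) \]
\[ \alpha_i - \Xi^{(i)}(\alpha) \le \alpha_i - \log\Big( \sum_{j \in A_k} p_{ij} e^{\alpha_j} + p'_{i0} \Big) \]
and
\[ g_i(\alpha) < 0 \ \Leftarrow\ \hat\alpha_i < \sum_{j \in A_k} p_{ij} \hat\alpha_j + p'_{i0} \]
with strictly positive $p'_{i0}$ (from $B \cap \mathrm{supp}(p^{(i)}) \ne \emptyset$ by construction of $A_k$). Putting all $i \in A_k$ in one inequality
\[ \hat\alpha_{A_k} < (\mathrm{id} - P_{A_k})^{-1} P_{A_k B} \hat\alpha_B \]
Positivity of the $p'_{i0}$ grants solvability of the inequality and we can fix real coordinates for $\alpha_{A_k}$ and update $B := B \cup A_k$. If necessary iterate again.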
6.3.2
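A Slater point can also be produced numerically. The sketch below does not follow the layer-by-layer construction of the proof; it uses a shortcut that is an assumption of this illustration: with $u = (\mathrm{id} - P)^{-1}\mathbf{1}$ and small $\epsilon > 0$, the choice $\hat\alpha = \mathbf{1} - \epsilon u$ satisfies $\hat\alpha_i < \sum_j p_{ij}\hat\alpha_j + p_{i0}$ at every node, because $(\mathrm{id} - P)u = \mathbf{1} > 0$. It treats the case $\Lambda = \emptyset$, takes `P[i][j]` as the probability of routing from node `i` to node `j`, and assumes every node has a path to the exit so that the fixed-point iteration converges. All names are hypothetical.

```python
import math

def slater_point(P, eps_scale=0.5, sweeps=10_000, tol=1e-13):
    """Return alpha with g_i(alpha) < 0 for all nodes, for a substochastic routing matrix P."""
    d = len(P)
    u = [1.0] * d
    for _ in range(sweeps):
        # Fixed-point iteration u <- 1 + P u, converging to (id - P)^{-1} 1.
        new_u = [1.0 + sum(P[i][j] * u[j] for j in range(d)) for i in range(d)]
        converged = max(abs(a - b) for a, b in zip(new_u, u)) < tol
        u = new_u
        if converged:
            break
    eps = eps_scale / max(u)          # keeps every 1 - eps*u_i at or above 0.5
    return [math.log(1.0 - eps * ui) for ui in u]

def g(P, alpha, i):
    """Constraint function g_i(alpha) = alpha_i - Xi_i(alpha) for routing row P[i]."""
    exit_prob = 1.0 - sum(P[i])
    xi = math.log(sum(P[i][j] * math.exp(alpha[j]) for j in range(len(P))) + exit_prob)
    return alpha[i] - xi

tandem = [[0.0, 1.0], [0.0, 0.0]]        # node 0 feeds node 1, node 1 exits
feedback = [[0.5, 0.25], [0.25, 0.5]]    # each node exits with probability 0.25
for P in (tandem, feedback):
    alpha = slater_point(P)
    print([g(P, alpha, i) for i in range(len(P))])   # all entries negative
```

Since a point that is strictly feasible at every node is in particular strictly feasible on any $\Lambda^c$, this also covers general $\Lambda$.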
Claim 6.3.3 (KKT). In $\alpha$ the Karush-Kuhn-Tucker condition holds:
• $\nabla(-\langle \alpha, v \rangle + \Psi(\alpha)) + \sum_{i \in K^c} \eta_i \nabla g_i(\alpha) = 0$
• $\eta \in \mathbb{R}^{|K^c|}_{\ge 0}$
• $\langle \eta, g(\alpha) \rangle = 0$
with a unique $\eta = \eta(\alpha)$.
Proof of 6.3.3: The KKT condition holds since we proved that the Slater condition holds; existence of the optimiser $\alpha$ was already proved. For uniqueness of $\eta$ it remains to be shown that the gradients $\{\nabla g_i(\alpha) \mid i \in K^c\}$ are linearly independent.
From our general assumption 5.1.9 one is not an eigenvalue of $P$ and $\mathrm{id} - P$ is a regular matrix. Then also the sub-matrix $\mathrm{id}_{K^c} - P_{K^c}$ is regular. Since rows of $P_{K^c}$ and $P_{K^c}(\alpha)$ are both sub-probability measures (cf 4.4.16, 4.4.23) also $\mathrm{id}_{K^c} - P_{K^c}(\alpha)$ is a regular matrix. And from regularity of $\mathrm{id} - P_{K^c}(\alpha)$ follows linear independence of its columns $e_i - p^{(i)}_{K^c}(\alpha) \in \mathbb{R}^{|K^c|}$:
\[ \{e_i - p^{(i)}_{K^c}(\alpha) \mid i \in K^c\} \subset \mathbb{R}^{|K^c|} \]
and thus of the longer columns
\[ \{\nabla g_i(\alpha) \mid i \in K^c\} = \{e_i - p^{(i)}(\alpha) \mid i \in K^c\} \subset \mathbb{R}^d. \]
6.3.3
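We can now prove the statement about the fluid limit under the change of measure.
Proof of 6.3.1: Restate the first bullet from KKT:
\[ \sum_{i \in K^c} \eta_i \nabla g_i(\alpha) = \sum_{i \in K^c} (e_i - \nabla\Xi^{(i)}(\alpha))\, \eta_i \]
Each $\nabla\Xi^{(i)}$ is an exponential twist of the row $p^{(i)}$ of $P$ (cf 4.4.16) and we have under the change of measure with $\alpha$ (from 5.3.16)
\[ P(\alpha) = [\, \nabla\Xi^{(1)}(\alpha), \dots, \nabla\Xi^{(d)}(\alpha) \,] \]
\[ \sum_{i \in K^c} \eta_i \nabla g_i(\alpha) = \big(\mathrm{id}_{\{1, \dots, d\}, K^c} - P(\alpha)_{\{1, \dots, d\}, K^c}\big)\, \eta = (\mathrm{id} - P(\alpha)) \begin{pmatrix} 0 \\ \eta \end{pmatrix} \]
and the KKT first bullet becomes
\[
\begin{aligned}
v &= \nabla\Psi(\alpha) + \sum_{i \in K^c} \eta_i (e_i - \nabla\Xi^{(i)}(\alpha)) \\
&= \lambda(\alpha) + (P(\alpha) - \mathrm{id})\, \mu(\alpha) + (\mathrm{id} - P(\alpha)) \begin{pmatrix} 0 \\ \eta \end{pmatrix} \\
&= \lambda(\alpha) + (P(\alpha) - \mathrm{id}) \Big( \mu(\alpha) - \begin{pmatrix} 0 \\ \eta \end{pmatrix} \Big)
\end{aligned}
\]
and on the right hand side we have the network drift defined in 5.2.16 for the network process starting in some $z_0$ with $z_{0,i} > 0$ for $i \in K$ and $z_{0,i} = 0$ for $i \in K^c$. The vector $\binom{0}{\eta}$ is the loss rate. And by 5.3.23 the network drift is the expected, normal behaviour of the network process, its fluid limit.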
6.3.1
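The identification of $\eta$ as a loss rate can be made explicit for a single node. The following minimal sketch assumes $d = 1$, Poisson arrivals (rate `lam`), Poisson service completions (rate `mu`), no feedback, and the target path $x = 0$, $v = 0$, so that $K^c = \{1\}$, the constraint reads $\alpha \le 0$, and the twisted rates are $\lambda e^{\alpha}$ and $\mu e^{-\alpha}$. These closed forms and all names are assumptions of the illustration.

```python
import math

def empty_node_kkt(lam, mu):
    """KKT point of: minimise Psi(alpha) subject to alpha <= 0, where
    Psi(alpha) = lam*(exp(alpha)-1) + mu*(exp(-alpha)-1).
    Returns (alpha, twisted arrival rate, twisted service rate, multiplier eta)."""
    alpha_free = 0.5 * math.log(mu / lam)   # unconstrained stationary point of Psi
    alpha = min(alpha_free, 0.0)            # project onto the feasible set alpha <= 0
    lam_twisted = lam * math.exp(alpha)
    mu_twisted = mu * math.exp(-alpha)
    # Stationarity with v = 0: 0 = lam_twisted - mu_twisted + eta.
    eta = max(mu_twisted - lam_twisted, 0.0)
    return alpha, lam_twisted, mu_twisted, eta

stable = empty_node_kkt(lam=1.0, mu=2.0)
overloaded = empty_node_kkt(lam=2.0, mu=1.0)
print(stable)       # alpha = 0, rates unchanged, eta = mu - lam = 1 is the lost service capacity
print(overloaded)   # alpha < 0, both rates become sqrt(2), eta is (numerically) 0
```

In both cases $v = \lambda(\alpha) - (\mu(\alpha) - \eta) = 0$ and $\eta \cdot \alpha = 0$, which is the one-dimensional form of the drift identity and of complementarity.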
Interpretation 6.3.4. We have interpreted $B_K$ in 6.1.2 as twist parameters that do not decrease service rates at $K^c$-nodes. From $g_i(\alpha) < 0 \Rightarrow \eta_i = 0$ (complementarity in the KKT condition) and the identification of $\eta$ as the loss rate in 6.3.1 we know that if the service rate of a node is strictly increased under the twist $\alpha$ then this node is a bottleneck.
We have written the local process $W^\Lambda$ as the sum of a free subprocess $X^\Lambda$ and the network process of nodes $\Lambda^c$ denoted $Z^\Lambda$.
Corollary 6.3.5. From 6.3.1 and for $K = \Lambda$: Under $P^{[\alpha]}$ there are no strict bottlenecks in the $\Lambda^c$-nodes. Ergodic nodes in the $\Lambda^c$-subnetwork are identified through the network drift $(v, \eta)$ obtained from the KKT (via $\eta_i > 0$).
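Proof of 6.3.5:
\[
\begin{aligned}
1 &= \lim_{n \to \infty} P^{[\alpha]}(Z_n \in U_\delta(t \mapsto z_0 + tv)) \\
&= \lim_{n \to \infty} P^{[\alpha]}(W^\Lambda_n \in U_\delta(t \mapsto z_0 + tv)) \\
&= \lim_{n \to \infty} P^{[\alpha]}\big(X^\Lambda_n \in U_\delta(t \mapsto z_0 + t\pi_\Lambda(v)),\ Z^\Lambda_n \in U_\delta(t \mapsto t\pi_{\Lambda^c}(v))\big)
\end{aligned}
\]
From $\Lambda = K$ we have $\pi_{\Lambda^c}(v) = 0$ and
\[
\begin{aligned}
1 &= \lim_{n \to \infty} P^{[\alpha]}\big(X^\Lambda_n \in U_\delta(t \mapsto z_0 + t\pi_\Lambda(v)),\ Z^\Lambda_n \in U_\delta(t \mapsto t\pi_{\Lambda^c}(v))\big) \\
&\le \lim_{n \to \infty} P^{[\alpha]}(Z^\Lambda_n \in U_\delta(t \mapsto 0)) = \lim_{n \to \infty} P^{[\alpha]}(\|Z^\Lambda_n\| < \delta)
\end{aligned}
\]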
6.3.5
6.4 Local large deviations lower bound
In this section we give a lower bound for the exponential decay rate of the event that the local process $W^\Lambda$ starting in $W^\Lambda_n(0, x) = \frac{\lfloor nx \rfloor}{n}$ follows the affine function $t \mapsto x + tv$. Let $\Lambda = \{i \mid x_i > 0\}$ and then uniformise over $x$ and only investigate the following event:
\[ \Big\{ \sup_{t \in [0, nT]} |W^\Lambda_t - tv| < n\delta \Big\} = \{W^\Lambda_n \in U_\delta(t \mapsto tv)\}. \]
Further let again $K = \{i \mid x_i > 0 \text{ or } v_i > 0\} \supseteq \Lambda$.
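Claim 6.4.1. If $\alpha$ is the optimiser in the almost Fenchel-Legendre transform
\[ \sup_{\alpha' \in B_K} \{\langle \alpha', v \rangle - \Psi(\alpha')\} = \langle \alpha, v \rangle - \Psi(\alpha) \]
then the lower local large deviation bound holds:
\[ \lim_{\delta \to 0} \liminf_{n \to \infty} \frac{1}{n} \log P(\|Z_n(\cdot, x) - (t \mapsto x + tv)\| < \delta) \ge -T\big(\langle \alpha, v \rangle - \Psi(\alpha)\big). \]
We will apply the change of measure with parameter $\alpha$ that was found as the optimiser in the upper bound 6.1.4. We start the same way as for the upper bound.
\[
\begin{aligned}
P(W^\Lambda_n \in U_\delta(t \mapsto tv)) &= E[\mathbf{1}_{W^\Lambda_n \in U_\delta(t \mapsto tv)}] \\
&= E\Big[\mathbf{1}_{W^\Lambda_n \in U_\delta(t \mapsto tv)} \frac{M^\Lambda(\alpha, nT)}{M^\Lambda(\alpha, nT)}\Big] \\
&= E^{[\alpha]}\Big[\mathbf{1}_{W^\Lambda_n \in U_\delta(t \mapsto tv)} \frac{1}{M^\Lambda(\alpha, nT)}\Big]
\end{aligned}
\]
From the change of measure applied here, which was defined in 5.3.30, 5.3.32:
\[
\begin{aligned}
\frac{1}{M^\Lambda(\alpha, nT)} = \exp\Big\{ &-\langle \alpha, W^\Lambda_{nT} \rangle + nT \Big( \sum_{i=1}^d \Gamma^{(i)}_A(\alpha_i) + \sum_{i \in \Lambda} \Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha)) \\
&+ \sum_{i \in \Lambda^c} \frac{1}{nT} R^{(i)}(nT)\, \Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha)) \Big) \Big\} \frac{1}{r(\alpha, nT)}
\end{aligned}
\]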
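We apply the change of measure to our event (uniformised over $x$ already).
\[
\begin{aligned}
&P(W^\Lambda_n \in U_\delta(t \mapsto tv)) \\
&= E^{[\alpha]}\Big[\mathbf{1}_{W^\Lambda_n \in U_\delta(t \mapsto tv)} \frac{1}{M^\Lambda(\alpha, nT)}\Big] \\
&= E^{[\alpha]}\Big[\mathbf{1}_{W^\Lambda_n \in U_\delta(t \mapsto tv)} \exp\Big\{ -\langle \alpha, W^\Lambda_{nT} \rangle + nT \sum_{i=1}^d \Gamma^{(i)}_A(\alpha_i) + nT \sum_{i \in \Lambda} \Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha)) \\
&\qquad\qquad + nT \sum_{i \in \Lambda^c} \frac{R^{(i)}(nT)}{nT} \Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha)) \Big\} \frac{1}{r(\alpha, nT)}\Big] \\
&\ge \exp\{-\|\alpha\| nT\delta - \langle \alpha, nTv \rangle + nT\,\Psi(\alpha)\} \qquad (6.14) \\
&\quad \times E^{[\alpha]}\Big[\mathbf{1}_{W^\Lambda_n \in U_\delta(t \mapsto tv)} \exp\Big\{ -nT \sum_{i \in \Lambda^c} \Big(1 - \frac{R^{(i)}(nT)}{nT}\Big) \Gamma^{(i)}_S(-\alpha_i + \Xi^{(i)}(\alpha)) \Big\} \frac{1}{r(\alpha, nT)}\Big] \qquad (6.15)
\end{aligned}
\]
The inequality in (6.14) is due only to the minus in $-\|\alpha\| nT\delta$. It applies the definition of $\Psi$ of 5.3.9 from the primitives' lmgfs. The proof of 6.4.1 is thus equivalent to the proof of
Lemma 6.4.2.
\[ \lim_{\delta \to 0} \liminf_{n \to \infty} \frac{1}{n} \log (6.15) = 0 \]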
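Proof of 6.4.2: We have joint uniform convergence of the queue size and
the runtime process.
$$\begin{aligned}
&E^{[\alpha]}\Big[\mathbb{1}_{Z_n(\cdot,x)\in U_\delta(t\mapsto x+tv)}\exp\Big\{nT\sum_{i\in\Lambda^c}\Big(\frac{R^{(i)}(nT)}{nT}-1\Big)\Gamma^{(i)}_S(-\alpha_i+\Xi^{(i)}(\alpha))\Big\}\Big]\\
&\ge E^{[\alpha]}\Big[\mathbb{1}_{Z_n(\cdot,x)\in U_\delta(t\mapsto x+tv)}\,\mathbb{1}_{R_n\in U_\delta(t\mapsto t\rho)}\exp\Big\{nT\sum_{i\in\Lambda^c}\Big(\frac{R^{(i)}(nT)}{nT}-1\Big)\Gamma^{(i)}_S(-\alpha_i+\Xi^{(i)}(\alpha))\Big\}\Big]\\
&= E^{[\alpha]}\Big[\mathbb{1}_{Z_n(\cdot,x)\in U_\delta(t\mapsto x+tv)}\,\mathbb{1}_{R_n\in U_\delta(t\mapsto t\rho)}\exp\Big\{nT\sum_{\substack{i\in\Lambda^c\\ -\alpha_i+\Xi^{(i)}(\alpha)=0}}\Big(\frac{R^{(i)}(nT)}{nT}-1\Big)\underbrace{\Gamma^{(i)}_S(-\alpha_i+\Xi^{(i)}(\alpha))}_{=0}\Big\}\\
&\qquad\cdot\exp\Big\{nT\sum_{\substack{i\in\Lambda^c\\ -\alpha_i+\Xi^{(i)}(\alpha)\ne 0}}\Big(\frac{R^{(i)}(nT)}{nT}-1\Big)\Gamma^{(i)}_S(-\alpha_i+\Xi^{(i)}(\alpha))\Big\}\Big]
\end{aligned}$$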
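We have started with $\alpha\in B_K$, thus for $i\in K^c$
$$-\alpha_i+\Xi^{(i)}(\alpha)\ne 0 \;\Rightarrow\; -\alpha_i+\Xi^{(i)}(\alpha) > 0.$$
For nodes $i\in K^c$ we have found in 6.3.4 that under $\alpha$
$$-\alpha_i+\Xi^{(i)}(\alpha) > 0 \;\Rightarrow\; \rho_i = 1,$$
and from the indicator for the runtime process
$$\frac{R^{(i)}(nT)}{nT} - 1 \ge \rho_i - \delta - 1 = -\delta.$$
$K\cap\Lambda^c$ was the set of nodes with $x_i = 0$, $v_i > 0$, and from the runtime bound
in 6.1.5 we have
$$\frac{R^{(i)}(nT)}{nT} - 1 \ge 1 - \frac{\delta}{T v_i} - 1 = -\frac{\delta}{T v_i}.$$
We finally have
$$\begin{aligned}
&\mathbb{1}_{Z_n(\cdot,x)\in U_\delta(t\mapsto x+tv)}\,\mathbb{1}_{R_n\in U_\delta(t\mapsto t\rho)}\exp\Big\{nT\sum_{\substack{i\in\Lambda^c\\ -\alpha_i+\Xi^{(i)}(\alpha)>0}}\Big(\frac{R^{(i)}(nT)}{nT}-1\Big)\Gamma^{(i)}_S(-\alpha_i+\Xi^{(i)}(\alpha))\Big\}\\
&= \mathbb{1}_{Z_n(\cdot,x)\in U_\delta(t\mapsto x+tv)}\,\mathbb{1}_{R_n\in U_\delta(t\mapsto t\rho)}\exp\Big\{nT\sum_{\substack{i\in\Lambda^c\cap K\\ -\alpha_i+\Xi^{(i)}(\alpha)>0}}\underbrace{\Big(\frac{R^{(i)}(nT)}{nT}-1\Big)}_{\ge -\frac{\delta}{T v_i}}\Gamma^{(i)}_S(-\alpha_i+\Xi^{(i)}(\alpha))\Big\}\\
&\qquad\cdot\exp\Big\{nT\sum_{\substack{i\in K^c\\ -\alpha_i+\Xi^{(i)}(\alpha)>0}}\underbrace{\Big(\frac{R^{(i)}(nT)}{nT}-1\Big)}_{\ge -\delta}\Gamma^{(i)}_S(-\alpha_i+\Xi^{(i)}(\alpha))\Big\}
\end{aligned}$$
After inserting these bounds only deterministic exponential expressions are left, and under the
scaling in 6.4.2 their exponential rates tend to 0 as $\delta\to 0$. The indicators are such that their expectation
tends to 1.
□ 6.4.2
We can also have $Z_n(\cdot, z)$ with $z\ne x$, as soon as $z\to x$ before $\delta\to 0$.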
6.5 Rate function identification
For the generalised Jackson network Anatolii Puhalskii proved a sample path
large deviation principle in [16]. The rate function is infinite on functions that are not
absolutely continuous; for absolutely continuous functions $q$ it is defined as
$$I^Q_{q_0}(q) = \int_{t=0}^{\infty}\sum_{J\subseteq\{1,\dots,d\}}\mathbb{1}_{q(t)\in F_J}\,R_J(\dot q(t))\,dt \quad\Big(= \int_{t=0}^{\infty} L(q(t),\dot q(t))\,dt\Big)$$
$$F_J = \{x\in\mathbb{R}^K_+ \mid x_k = 0,\ k\in J;\ x_k > 0,\ k\notin J\}$$
$$R_J(v) = \inf_{(a,d,r):\,v = a+(r-I)d}\psi_J(a,d,r)$$
$$\psi_J(a,d,r) = \psi^A(a) + \sum_{k\in J^c}\psi^S_k(d_k) + \sum_{k\in J}\psi^S_k(d_k)\,\mathbb{1}_{d_k>\hat\mu_k} + \sum_{k=1}^{d} d_k\,\psi^R_k(r_k)$$
In our notation (definition 6.2.4)
$$R_J(v) = G_{J^c}(v)$$
Claim 6.5.1 (Rate function identification). $R_J(v) = L(x,v)$ for $J^c = \Lambda(x)$.
Proof of 6.5.1: The set $J$ is the set of initially ($t = 0$) empty nodes and
$F_J$ is the face with $J^c$-coordinates strictly positive. This is just the other
way around compared to the definition of $\Lambda$ (and the face $B_\Lambda$ in [12]).
Puhalskii's large deviation principle is on $D([0,\infty),\mathbb{R}^d)$ equipped with the
extended $J_1$-topology of Skorohod. It implies a sample path large deviation
principle on $D([0,T],\mathbb{R}^d)$ equipped with the $J_1$-topology with the same local
rate function.
Skorohod's $J_1$-topology is a metric topology; denote by $d_{J_1}(\cdot,\cdot)$ a metric
inducing this topology (cf $d_d$ in displays (A.2), (A.3) below theorem A.53
of [23]). Convergence in $D([0,T],\mathbb{R}^d)$ to affine functions is equivalent in the metric
induced by the supremum norm and in $d_{J_1}$, since for $\psi, f\in D([0,T],\mathbb{R})$, by
definition of the metrics,
$$d_{J_1}(f,\psi) \le \|f-\psi\|_\infty .$$
And if $\|\dot\psi\|_\infty = \sup_{t\in[0,T]}|\dot\psi(t)| < \infty$ then
$$\|f-\psi\|_\infty \le (\|\dot\psi\|_\infty + 1)\,d_{J_1}(\psi,f).$$
Thus, open balls around affine functions wrt these metrics can be nested.
Define the open ball $U$ around $\psi$ with $\psi(t) = x+tv$ wrt $d_{J_1}$
$$U(\delta) := \{f\in D([0,T],\mathbb{R}^d) \mid d_{J_1}(f, t\mapsto x+tv) < \delta\}$$
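then
$$U_\delta(t\mapsto x+tv) \subseteq U(\delta) \subseteq \overline{U}(\delta) \subseteq U_{\delta(1+|v|)}(t\mapsto x+tv)$$
and
$$\liminf_{n\to\infty}\frac{1}{n}\log P(Z_n\in U_\delta(t\mapsto x+tv)) \le \limsup_{n\to\infty}\frac{1}{n}\log P(Z_n\in\overline{U}(\delta)) \le \limsup_{n\to\infty}\frac{1}{n}\log P(Z_n\in U_{(1+|v|)\delta}(t\mapsto x+tv)).$$
Then let $\delta\to 0$. From goodness of the rate function $I^Q_{q_0}$, closedness of $\{\psi\}$,
and [5], p.119,
$$T\,R_{J(x)}(v) = I^Q_{q_0}(\psi) = \inf_{g\in\{\psi\}} I^Q_{q_0}(g) = \lim_{\delta\to 0}\inf_{g\in\overline{U}(\delta)} I^Q_{q_0}(g)$$
and we finally obtain
$$-T\,L(x,v) \le -T\,R_{J(x)}(v) \le -T\,L(x,v)$$
which identifies the local rate functions.
□ 6.5.1
Corollary 6.5.2. The upper bound for the almost Fenchel-Legendre transform
in 6.2.4 is always a tight bound and
$$G_K(v) = \sup_{\alpha\in B_K}\langle\alpha, v\rangle - \Psi(\alpha)$$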
6.6 Calculating the local rate function
We have identified the local large deviation rate function $L(\cdot,\cdot)$ for the generalised
Jackson network as a restricted optimisation problem 6.0.4, an almost
Fenchel-Legendre transform. For the Jackson network the rate function can
be expressed as a true Fenchel-Legendre transform in a lower dimensional
space. We cite proposition 10.2 of [12] of Ignatiouk-Robert, which we
generalise here, and give it in our notation. For $L(x,v)$, the optimiser $\alpha^*$ in
the restricted optimisation problem is such that under the change of measure
with parameter $\alpha^*$ the network drift becomes $v$, and the deviating event that
the scaled network process stays close to the linear function $t\mapsto x+tv$ becomes
the expected behaviour, the network's fluid limit. Knowing the change
of measure that makes $t\mapsto x+tv$ the fluid limit of the network is knowing
the rate function.
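Claim 6.6.1. For $(x,v)$, $K = \{i \mid x_i > 0 \text{ or } v_i > 0\}$ there is $\Theta\supseteq K$ and
convex $\Psi_\Theta : \mathbb{R}^{|\Theta|}\to\mathbb{R}$ such that
$$\sup_{\alpha\in B_K}\langle\alpha, v\rangle - \Psi(\alpha) = \Psi_\Theta^*(v_\Theta)$$
Let $M\supseteq K = \{i \mid x_i > 0 \text{ or } v_i > 0\}$ and consider the equality constrained
optimisation problem
$$\inf_{\alpha\in D_M} -\langle\alpha, v\rangle + \Psi(\alpha), \qquad D_M := \{\alpha\in\mathbb{R}^d \mid \alpha_i = \Xi^{(i)}(\alpha)\ \forall i\in M^c\} \qquad (6.16)$$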
To better describe elements of $D_M$ we need the following
Definition 6.6.2. For strictly substochastic $P$ and $M\subseteq\{1,\dots,d\}$ define
the matrix $Q\in\mathbb{R}^{d\times|M|}$
$$Q = P_{\{1,\dots,d\}M} + P_{\{1,\dots,d\}M^c}(\mathrm{id}-P_{M^c})^{-1}P_{M^cM}$$
Note that $Q$ simplifies when splitting $\{1,\dots,d\}$ into $M$ and $M^c$.
$$Q_{M^cM} = (\mathrm{id}-P_{M^c})^{-1}P_{M^cM}$$
$$Q_M = P_M + P_{MM^c}(\mathrm{id}-P_{M^c})^{-1}P_{M^cM}$$
Remark 6.6.3. $Q$ is substochastic and 1 is not an eigenvalue of $Q_M$.
The remark was proved in [3] Lemma 4.3. Thus the rows $q^{(i)}$ of $Q$ are
measures with total mass $\le 1$. The $q^{(i)}$ are not generally equivalent to the
$p^{(i)}$ when restricted to $M$: there may be $j\in M$ such that $q_{ij} > 0 = p_{ij}$. We
define the lmgf of $q^{(i)}$ parallel to $\Xi^{(i)}$ for the $p^{(i)}$:
Definition 6.6.4. In parallel to the definition of $\Xi$ in 4.4.8 define for the
sub-probability measure $q^{(j)}$, the $j$-th row of $Q$, and $\beta\in\mathbb{R}^{|M|}$
$$\Upsilon^{(j)}(\beta) = \log\Big(\sum_{k=1}^{|M|} q_{jk}\,e^{\beta_k} + \underbrace{(1 - q_{j1} - \dots - q_{j|M|})}_{=:\,q_{j0}}\Big).$$
If $q^{(j)}$ is a subprobability measure on $\{e_1,\dots,e_d\}$ then $\Upsilon^{(j)}$ is as in 4.4.6,
representing the restriction to $M$.
Lemma 6.6.5. If $P\in\mathbb{R}^{d\times d}$ is a substochastic matrix and $Q$ is associated with
$P$ as in 6.6.2, and for $j\in\{1,\dots,d\}$ $\Xi^{(j)}$ is associated with the $j$-th row $p^{(j)}$
of $P$ and $\Upsilon^{(j)}$ with the $j$-th row $q^{(j)}$ of $Q$, then $\Xi^{(j)}(\alpha) = \Upsilon^{(j)}(\alpha_M)$ for any
$\alpha\in D_M$.
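Proof of 6.6.5: We transform $e^{\alpha_i}$ such that expressions become linear:
$\check\alpha_i = e^{\alpha_i} - 1$. With $\mathbf{1} = (1,\dots,1)^\top$ the definitions of $\Upsilon^{(i)}$ and $\Xi^{(i)}$ then become
$$\big(e^{\Xi^{(i)}(\alpha)}\big)_{i=1,\dots,d} = \Big(\sum_{j=1}^{d} p_{ij}\,e^{\alpha_j} + \Big(1-\sum_{j=1}^{d}p_{ij}\Big)\Big)_{i=1,\dots,d} = P\check\alpha + \mathbf{1}$$
$$\big(e^{\Upsilon^{(i)}(\alpha_M)}\big)_{i=1,\dots,d} = \Big(\sum_{j=1}^{|M|} q_{ij}\,e^{\alpha_j} + \Big(1-\sum_{j=1}^{|M|}q_{ij}\Big)\Big)_{i=1,\dots,d} = Q\check\alpha_M + \mathbf{1}$$
Rewriting the condition $\alpha\in D_M$ in a similar fashion
$$\alpha_i = \Xi^{(i)}(\alpha)\ \forall i\in M^c \;\Leftrightarrow\; \check\alpha_{M^c} = P_{M^c\{1,\dots,d\}}\check\alpha \qquad (6.17)$$
and iterating
$$\begin{aligned}
\check\alpha_{M^c} &= P_{M^c\{1,\dots,d\}}\check\alpha\\
&= P_{M^c}\check\alpha_{M^c} + P_{M^cM}\check\alpha_M\\
&= P_{M^c}\big(P_{M^c}\check\alpha_{M^c} + P_{M^cM}\check\alpha_M\big) + P_{M^cM}\check\alpha_M\\
&= (P_{M^c})^2\check\alpha_{M^c} + (P_{M^c} + \mathrm{id})P_{M^cM}\check\alpha_M\\
&= \dots = (P_{M^c})^{n+1}\check\alpha_{M^c} + \sum_{k=0}^{n}(P_{M^c})^k P_{M^cM}\check\alpha_M
\end{aligned}$$
which converges
$$\check\alpha_{M^c} = \lim_{n\to\infty}\sum_{k=0}^{n}(P_{M^c})^k P_{M^cM}\check\alpha_M = (\mathrm{id}-P_{M^c})^{-1}P_{M^cM}\check\alpha_M$$
Applying this in $P\check\alpha$ split into $M$ and $M^c$ indices
$$\begin{aligned}
P\check\alpha &= P_{\{1,\dots,d\}M}\check\alpha_M + P_{\{1,\dots,d\}M^c}\check\alpha_{M^c}\\
&= P_{\{1,\dots,d\}M}\check\alpha_M + P_{\{1,\dots,d\}M^c}(\mathrm{id}-P_{M^c})^{-1}P_{M^cM}\check\alpha_M\\
&= Q\check\alpha_M
\end{aligned}$$
Thus $P\check\alpha + \mathbf{1} = Q\check\alpha_M + \mathbf{1}$ and we get the claim.
□ 6.6.5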
Corollary 6.6.6. $D_M = \{\alpha \mid \alpha_i = \Upsilon^{(i)}(\alpha_M)\ \forall i\in M^c\}$.
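Definition 6.6.2, Lemma 6.6.5 and Corollary 6.6.6 can be checked numerically on a small case. The following is a minimal sketch, not part of the original text: the function names (`mat_inv`, `reduced_routing`, `log_mgf_row`, `check_lemma`), the zero-based node indices and the 3-node routing matrix are illustrative assumptions. It builds $Q$ from $P$, constructs a point of $D_M$ via the corollary and reports the largest gap between $\Xi^{(j)}(\alpha)$ and $\Upsilon^{(j)}(\alpha_M)$.

```python
import math

def mat_inv(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def reduced_routing(P, M):
    """Q = P_{.,M} + P_{.,Mc} (id - P_{Mc})^{-1} P_{Mc,M} as in Definition 6.6.2."""
    d = len(P)
    Mc = [i for i in range(d) if i not in M]
    inv = mat_inv([[(1.0 if i == j else 0.0) - P[i][j] for j in Mc] for i in Mc])
    # B = (id - P_{Mc})^{-1} P_{Mc,M}
    B = [[sum(inv[a][b] * P[Mc[b]][m] for b in range(len(Mc))) for m in M]
         for a in range(len(Mc))]
    return [[P[i][m] + sum(P[i][Mc[a]] * B[a][k] for a in range(len(Mc)))
             for k, m in enumerate(M)] for i in range(d)]

def log_mgf_row(row, beta):
    """log( sum_k row_k e^{beta_k} + (1 - sum_k row_k) ), the form of Xi and Upsilon."""
    return math.log(sum(r * math.exp(b) for r, b in zip(row, beta)) + 1.0 - sum(row))

def check_lemma(P, M, alpha_M):
    """Largest |Xi^(j)(alpha) - Upsilon^(j)(alpha_M)| over all j for alpha in D_M."""
    d = len(P)
    Q = reduced_routing(P, M)
    alpha = [0.0] * d
    for k, m in enumerate(M):
        alpha[m] = alpha_M[k]
    for i in range(d):
        if i not in M:
            alpha[i] = log_mgf_row(Q[i], alpha_M)  # Corollary 6.6.6
    xi = [log_mgf_row(P[j], alpha) for j in range(d)]
    ups = [log_mgf_row(Q[j], alpha_M) for j in range(d)]
    return max(abs(a - b) for a, b in zip(xi, ups))

# Hypothetical strictly substochastic routing matrix on 3 nodes.
P = [[0.0, 0.5, 0.3], [0.2, 0.0, 0.4], [0.3, 0.3, 0.0]]
print(check_lemma(P, [0], [0.7]))
```

The reported gap should be at the level of floating point rounding for any choice of $M$ and $\alpha_M$, which is what Lemma 6.6.5 asserts.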
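Claim 6.6.7. The equality constrained optimisation problem is a Fenchel-Legendre
transform.
$$\inf_{\alpha\in D_M} -\langle\alpha, v\rangle + \Psi(\alpha) = -\Psi_M^*(v_M)$$
Proof of 6.6.7: For $\alpha\in D_M$ write $\alpha = \binom{\alpha_M}{\alpha_{M^c}}$ and $u_i = \Upsilon^{(i)}(\alpha_M)$ for
$i\in M^c$. Then $\alpha = \binom{\alpha_M}{u}$ by 6.6.6. Thus
$$\inf_{\alpha\in D_M} -\langle\alpha, v\rangle + \Psi(\alpha) = \inf_{\alpha_M\in\mathbb{R}^{|M|}} -\Big\langle v, \binom{\alpha_M}{u}\Big\rangle + \Psi\Big(\binom{\alpha_M}{u}\Big).$$
By choice of $M$ we have $v_{M^c} = 0$ and $\langle v, \binom{\alpha_M}{u}\rangle = \langle\alpha_M, v_M\rangle$. Also $\Psi$
simplifies:
$$\begin{aligned}
\Psi|_{\alpha\in D_M}(\alpha) &= \sum_{j\in M}\Gamma^{(j)}_A(\alpha_j) + \Gamma^{(j)}_S\big(-\alpha_j + \underbrace{\Xi^{(j)}(\alpha)}_{=\Upsilon^{(j)}(\alpha_M)}\big)\\
&\quad + \sum_{j\in M^c}\Gamma^{(j)}_A\big(\underbrace{\alpha_j}_{=\Xi^{(j)}(\alpha)=\Upsilon^{(j)}(\alpha_M)}\big) + \Gamma^{(j)}_S\big(\underbrace{-\alpha_j + \Xi^{(j)}(\alpha)}_{=0}\big)\\
&= \sum_{j\in M}\Gamma^{(j)}_A(\alpha_j) + \Gamma^{(j)}_S\big(-\alpha_j + \Upsilon^{(j)}(\alpha_M)\big) + \sum_{j\in M^c}\Gamma^{(j)}_A\big(\Upsilon^{(j)}(\alpha_M)\big)\\
&=: \Psi_M(\alpha_M)
\end{aligned}$$
and we now have
$$\inf_{\alpha\in D_M} -\langle\alpha, v\rangle + \Psi(\alpha) = \inf_{\alpha_M\in\mathbb{R}^{|M|}} -\langle v_M, \alpha_M\rangle + \Psi_M(\alpha_M)$$
and the claim is proved.
□ 6.6.7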
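Proof of 6.6.1: Let $\alpha^*$ be the optimiser of 6.1.4, (6.8) and
$$A = \{i\in K^c \mid \alpha^*_i = \Xi^{(i)}(\alpha^*)\}$$
the set of indices of restrictions active in $\alpha^*$. Set
$$\Theta = K\cup A^c = \{i \mid x_i > 0 \text{ or } v_i > 0 \text{ or } \alpha^*_i - \Xi^{(i)}(\alpha^*) < 0\}.$$
Then $\alpha^*$ is also the optimiser in the equality constrained optimisation problem
with restrictions only in $M^c = A$ (cf [2] chapter 3 on Lagrange Multiplier
Theory, section 3.3) and
$$\inf_{\alpha\in B_K} -\langle\alpha, v\rangle + \Psi(\alpha) = \inf_{\alpha\in D_\Theta} -\langle\alpha, v\rangle + \Psi(\alpha) = -\Psi_\Theta^*(v_\Theta)$$
□ 6.6.1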
Remark 6.6.8. Let $A^{(1)},\dots,A^{(d)}, S^{(1)},\dots,S^{(d)}$ be primitive processes of a generalised
Jackson network with lmgfs $\Gamma^{(i)}_A$, $\Gamma^{(i)}_S$. Let $P$ be the routing matrix of the
network. Fix $M\subseteq\{1,\dots,d\}$ and define $Q$ as in 6.6.2 and let its $i$-th row
define the lmgf $\Upsilon^{(i)}$ as in 6.6.4. For $i\in M^c$ split the $i$-th arrival process into
$(A^{(ij)})_{j\in M}$ wrt $q^{(i)}$ and define the compound arrival process for $i\in M$
$$\tilde A^{(i)} = A^{(i)} + \sum_{j\in M^c} A^{(ji)}.$$
For $i\in M$ split $S^{(i)}$ wrt $q^{(i)}$ and define the new $\tilde S^{(i)}$
$$\tilde S^{(i)} = \sum_{k\in M} S^{(ik)} - S^{(i)}$$
Then
$$\tilde A + \sum_{j\in M}\tilde S^{(j)}$$
is a free process, its routing matrix is $Q_M$ and it has lmgf $\Psi_M$ as in 6.6.1.
It seems immediate that the network represented by $Q_M$ is open and that
$\Psi_M$ is finite on $\mathbb{R}^{|M|}$. General theory for the large deviations of the free
process applies. We can now look for a suitable superset $\Theta$ in 6.6.1 in the
following way:
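Algorithm 6.6.9. To calculate the local large deviation rate function for
the generalised Jackson network (cf 6.0.4), especially to find the optimiser
$\alpha^*\in B_K$ and a suitable set $\Theta$ of 6.6.1:
1. Set $M := K$, $\mathcal{S} := \emptyset$.
2. Renumber nodes such that $\{1,\dots,d\} = \{1,\dots,|M|,\dots,d\}$.
3. Find the optimiser $\tilde\alpha_M\in\mathbb{R}^{|M|}$ in $\Psi_M^*(v_M)$.
4. Calculate $\tilde\alpha_i = \Upsilon^{(i)}(\tilde\alpha_M)$ for $i\in M^c$ such that $\tilde\alpha = \binom{\tilde\alpha_M}{\tilde\alpha_{M^c}}\in D_M$.
5. If under the change of measure $M(\tilde\alpha,\cdot)$ there are no strict bottlenecks
in the $M^c$-nodes then add $(M,\tilde\alpha)$ to $\mathcal{S}$. Else, for each strict bottleneck
$i\in M^c$ set $M := M\cup\{i\}$ and iterate from 3. on.
6. Choose $(\Theta,\alpha^*)\in\mathcal{S}$ such that $\Psi_\Theta^*(v_\Theta) = \min\{\Psi_M^*(v_M) \mid (M,\tilde\alpha)\in\mathcal{S}\}$.
This compares to theorem 2 of [12].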
6.6.1 Interpretation and possible improvement
We now want to give an interpretation of the local rate function of the generalised
Jackson network in terms of the associated free rate function $\Psi_\Theta^*$,
cf 6.6.1.
From interpretation 6.1.3 we know that any feasible twist (any $\alpha\in B_K$)
does not decrease service rates at $K^c$-nodes. And from 6.3.4 we have that if
the optimal twist parameter strictly increases a service rate of a $K^c$-node then
this node will be a bottleneck. Both interpretations make sense as minimum
cost (in terms of service rates $\Gamma^{(i)}_S$ and routing $\Xi^{(i)}$) to allow a certain flow
through $K^c$-nodes: to reduce flow through a restricted node one only has to
reduce its input; the service rate does not have to change ($\Gamma^{(i)}_S(0) = 0$). And
increasing the service rate of a restricted node is reasonable only if all of the
service capacity is required to get a certain flow through this node.
In steps 3.-5. of algorithm 6.6.9 the network is partitioned into $M$ and
$M^c$. The free process of $M$-nodes as in 6.6.8 is twisted to have drift $v_M$. The
twist $\tilde\alpha_M$ is the optimiser in $\Psi_M^*(v_M)$, thus an optimal twist wrt service at
and routing between $M$-nodes and original arrival processes at $M^c$-nodes.
Service rates at $M^c$-nodes are not considered in the model of the free process
of $M$-nodes of 6.6.8; costs for these service rates $\Gamma^{(i)}_S$, $i\in M^c$, do not appear
in $\Psi_M$. It may now happen that, as the arrivals to $M^c$-nodes are split to become
arrivals at $M$-nodes, the flow through the $M^c$-subnetwork does not go
as smoothly as assumed in 6.6.8: nodes in $M^c$ overflow when they cannot
handle the flow into $M$ and / or out of $M$.
If this happens, then in the network the drift $v$ is not realised under the
changed measure: $\tilde\alpha\ne\alpha^*$ and $M\ne\Theta$. Capacities and the cost of increasing
capacities of $M^c$-nodes have not been considered in the choice of $\tilde\alpha$ but should
have been. In the next iteration of 3.-5. an increased set $M$ and the cost at this
increased set of nodes are considered.
In the proof of 6.6.1 we have characterised $(\Theta,\alpha^*)$ by $\Theta = K\cup\{i\in K^c \mid \mu_i(\alpha^*) > \mu_i\}$.
Thus if $(M,\tilde\alpha)$ has some $i\in M\setminus K$ with $\mu_i(\tilde\alpha) < \mu_i$ then $M\ne\Theta$. This
allows us to remove a node $i\in M\setminus K$ from the set of free nodes. It would
be an advantage if one could change step 5. of algorithm 6.6.9 to become
5'. If under the change of measure $M(\tilde\alpha,\cdot)$ there are no strict bottlenecks
in the $M^c$-subnetwork and $\mu_i(\tilde\alpha)\ge\mu_i$ for all $i\in M\setminus K$ then $\alpha^* = \tilde\alpha$
and $\Theta = M$. Else
– If for some $i\in M\setminus K$: $\mu_i(\tilde\alpha) < \mu_i$ then restrict this node: set
$M := M\setminus\{i\}$ and iterate from 3. on.
– If for all $i\in M\setminus K$: $\mu_i(\tilde\alpha)\ge\mu_i$ and some $i\in M^c$ is a bottleneck in
the subnetwork of $M^c$-nodes then free this node: set $M := M\cup\{i\}$
and iterate from 3. on.
However, it is not obvious that this algorithm terminates or whether the
sequence in which nodes are freed and/or restricted influences the final set
of free nodes. In the best of all cases step 6. of algorithm 6.6.9 could be
omitted.
6.6.2 Example
We give a simple example of calculating the decay rate / local rate function
with algorithm 6.6.9. The result is of course the same as when calculating
it from the restricted optimisation problem of the almost Fenchel-Legendre
transform of 6.0.4. The following example is simple as there will be only one
iteration in the algorithm and only one feasible choice for adding a bottleneck
in 5.
We work with the network of $d = 4$ nodes as introduced in figure 5.1 of
chapter 5. We choose exponential inter event times for all arrival and service
events for simplicity. Let the rates be
$$\lambda = \begin{pmatrix}1\\1\\0\\0\end{pmatrix},\quad \mu = \begin{pmatrix}3\\6\\4\\5\end{pmatrix},\quad P = \begin{pmatrix}0&\frac12&\frac12&0\\0&0&1&0\\0&0&0&\frac12\\\frac{3}{10}&\frac{3}{10}&0&0\end{pmatrix}.$$
Let the network process start in $x$ and investigate the probability that it
evolves in direction $v$
$$x = \begin{pmatrix}0.1\\0.2\\0\\0\end{pmatrix},\quad v = \begin{pmatrix}-1\\-\frac12\\0\\0\end{pmatrix}.$$
$v$ is not the network drift and we will calculate the decay rate for the scaled
network process to stay close to the function $t\mapsto x+tv$.
We have $K = \Lambda = \{1,2\}$ and the local process $W^\Lambda = W^{\{1,2\}}$. Figure
6.3 is an adaption of the network to represent the local process: $\Lambda$-nodes are
circled, indicating the behaviour as always non-empty.
[Figure 6.3: Adaption of the network to represent the local process of $\Lambda = \{1,2\}$.]
For step 3. of algorithm 6.6.9 we transform the network process into the
associated free process in $|\Lambda| = 2$ dimensions. We do this in steps. We
remove nodes 3, 4 but keep the flow through these nodes. Flow indirectly
leaving the network through $\{3,4\}$-nodes is now documented as exits from
the $\{1,2\}$-nodes: both nodes 1 and 2 become exit nodes. Similarly there is
flow coming back to each node (via $p_{13}\,p_{34}\,p_{41} > 0$ and $p_{23}\,p_{34}\,p_{42} > 0$) and
indirect flow from node 1 to node 2 (via $p_{23}\,p_{34}\,p_{41} > 0$).
[Figure 6.4: Some flows through $\{3,4\}$-nodes]
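Since immediate feedback is not allowed in our model we have to remodel
inter event times at these feedback nodes as in section 5.1.
We calculate $Q$ with $\Lambda = \{1,2\}$ and $\Lambda^c = \{3,4\}$.
$$Q_\Lambda = P_\Lambda + P_{\Lambda\Lambda^c}(\mathrm{id}-P_{\Lambda^c})^{-1}P_{\Lambda^c\Lambda} = \begin{pmatrix}\frac{3}{40}&\frac{23}{40}\\\frac{3}{20}&\frac{3}{20}\end{pmatrix}$$
$$Q_{\Lambda^c\Lambda} = \sum_{k=0}^{\infty}P^k_{\Lambda^c}P_{\Lambda^c\Lambda} = \begin{pmatrix}\frac{3}{20}&\frac{3}{20}\\0&0\end{pmatrix}$$
Removing immediate feedback we have to remodel inter event times (and
their lmgfs): in this setting we now have inter event times $\tau^{(i)S_\Lambda} = \tilde\tau^{(i)S}$
with $E[\tilde\tau^{(i)S}] = \frac{1}{\mu_i(1-q_{ii})}$.
[Figure 6.5: Construction of the free subprocess for $M = \Lambda = \{1,2\}$]
Generally the new free process primitives have the rates

                  free process of $\Lambda$-nodes                                              free process remodelled
                  (with immediate feedback)                                             (without immediate feedback, cf 5.1)
  arrival rates   $\lambda_\Lambda + (Q_{\Lambda^c\Lambda})^\top\lambda_{\Lambda^c} = \binom{1}{1}$      $\binom{1}{1}$
  service rates   $\mu_\Lambda = \binom{3}{6}$                                              $[\mu_i(1-q_{ii})]_{i\in\Lambda} = \binom{2.775}{5.1}$
  routing         $Q_\Lambda = \begin{pmatrix}\frac{3}{40}&\frac{23}{40}\\\frac{3}{20}&\frac{3}{20}\end{pmatrix}$      $\begin{pmatrix}0&\frac{23}{37}\\\frac{3}{17}&0\end{pmatrix}$

We now investigate the free process with rates $\Big(\binom{1}{1},\ \binom{2.775}{5.1},\ \begin{pmatrix}0&\frac{23}{37}\\\frac{3}{17}&0\end{pmatrix}\Big)$.
$$\Psi^*_{\{1,2\}}\Big(\binom{-1}{-\frac12}\Big) = 0.2694,\qquad \tilde\alpha_{\{1,2\}} = \binom{0.1457}{0.3028}$$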
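The construction of the free process for $\Lambda = \{1,2\}$ can be reproduced in exact arithmetic. The sketch below is illustrative and not part of the original text: the helper `solve`, the zero-based indices and the variable names are assumptions, and the renormalisation $q_{ij}/(1-q_{ii})$ of the routing after removing immediate feedback is inferred from the tabulated values rather than quoted from section 5.1.

```python
from fractions import Fraction as F

def solve(A, B):
    """Solve A X = B exactly by Gauss-Jordan elimination on Fractions."""
    n = len(A)
    aug = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        piv = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[piv] = aug[piv], aug[c]
        p = aug[c][c]
        aug[c] = [x / p for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

# Routing matrix and service rates of the 4-node example (nodes 1..4 -> indices 0..3).
P = [[F(0), F(1, 2), F(1, 2), F(0)],
     [F(0), F(0), F(1), F(0)],
     [F(0), F(0), F(0), F(1, 2)],
     [F(3, 10), F(3, 10), F(0), F(0)]]
mu = [F(3), F(6), F(4), F(5)]
Lam, Lc = [0, 1], [2, 3]

# B = (id - P_{Lc})^{-1} P_{Lc,Lam}
A = [[F(int(i == j)) - P[i][j] for j in Lc] for i in Lc]
B = solve(A, [[P[i][m] for m in Lam] for i in Lc])

# Q_Lam = P_Lam + P_{Lam,Lc} (id - P_{Lc})^{-1} P_{Lc,Lam}
Q = [[P[i][m] + sum(P[i][Lc[a]] * B[a][k] for a in range(len(Lc)))
      for k, m in enumerate(Lam)] for i in Lam]

# Remove immediate feedback: slow the service down by (1 - q_ii) and
# renormalise the remaining routing probabilities (assumed remodelling).
mu_free = [mu[i] * (1 - Q[k][k]) for k, i in enumerate(Lam)]
R = [[F(0) if k == l else Q[k][l] / (1 - Q[k][k]) for l in range(2)]
     for k in range(2)]

print(Q)        # Q_Lambda
print(mu_free)  # remodelled service rates
print(R)        # remodelled routing
```

Working with `Fraction` keeps the entries in the same form as the table, so they can be compared digit by digit.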
Step 4: We use rows of $Q_{\Lambda^c\Lambda}$ to calculate remaining coordinates of $\tilde\alpha$. For
example $\tilde\alpha_3 = \log(q_{31}e^{\tilde\alpha_1} + q_{32}e^{\tilde\alpha_2} + q_{30})$.
$$\tilde\alpha = \begin{pmatrix}0.1457\\0.3028\\0.0738\\0\end{pmatrix}$$
Step 5: We check if under the twist $\tilde\alpha$ the $\Lambda^c$-nodes have strict bottlenecks.
$$\text{twisted arrival rates}\ \begin{pmatrix}1.2\\1.4\\0\\0\end{pmatrix},\qquad \text{twisted service rates}\ \begin{pmatrix}3.2\\4.8\\3.7\\5.8\end{pmatrix},\qquad \text{twisted routing}\ \begin{pmatrix}0&0.55&0.44&0\\0&0&1&0\\0&0&0&0.5\\0.3&0.35&0&0\end{pmatrix}$$
Node 3 is a strict bottleneck. Also note that with these rates the evolution
at node 1 is less than the required $v_1 = -1$, as flow that was supposed to
reach node 1 via node 3 is held back at the bottleneck node 3. We free the
bottleneck node 3, set $M := \{1,2,3\}$, and continue with step 3.
Step 3: For $M = \{1,2,3\}$ construct the associated free process. We give
flows of node 3 that are affected by removing node 4.
[Figure 6.6: Construction of the free subprocess for $M = \{1,2,3\}$]
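We calculate $Q$ for the free subprocess. We now have $M^c = \{4\}$.
$$Q_M = P_M + P_{M\{4\}}P_{\{4\}M} = \begin{pmatrix}0&\frac12&\frac12\\0&0&1\\\frac{3}{20}&\frac{3}{20}&0\end{pmatrix}$$
$$Q_{\{4\}M} = \underbrace{\sum_{k=0}^{\infty}P^k_{\{4\}}}_{=\mathrm{id}=[1]}P_{\{4\}M} = \begin{pmatrix}\frac{3}{10}&\frac{3}{10}&0\end{pmatrix}$$
Now consider the free process with rates
$$\Big(\begin{pmatrix}1\\1\\0\end{pmatrix},\ \begin{pmatrix}3\\6\\4\end{pmatrix},\ \begin{pmatrix}0&\frac12&\frac12\\0&0&1\\\frac{3}{20}&\frac{3}{20}&0\end{pmatrix}\Big).$$
We get the free rate function and its optimiser.
$$\Psi^*_{\{1,2,3\}}\Big(\begin{pmatrix}-1\\-\frac12\\0\end{pmatrix}\Big) = 0.5754,\qquad \tilde\alpha_{\{1,2,3\}} = \begin{pmatrix}0.0121\\0.1110\\-0.2595\end{pmatrix}$$
Step 4:
$$\tilde\alpha = \begin{pmatrix}0.0121\\0.1110\\-0.2595\\0.0381\end{pmatrix}$$
$$\text{twisted arrival rates}\ \begin{pmatrix}1.0\\1.1\\0\\0\end{pmatrix},\qquad \text{twisted service rates}\ \begin{pmatrix}2.8\\4.1\\5.3\\5\end{pmatrix},\qquad \text{twisted routing}\ \begin{pmatrix}0&0.59&0.41&0\\0&0&1&0\\0&0&0&0.51\\0.29&0.32&0&0\end{pmatrix}$$
Under these rates node 3 is a bottleneck but not a strict bottleneck, and node
4 is in equilibrium. The network drift is $v$.
Step 5: $\Theta = \{1,2,3\}$ and $\alpha^* = \begin{pmatrix}0.0121\\0.1110\\-0.2595\\0.0381\end{pmatrix}$. The algorithm terminates
with $L(x,v) = 0.5754$.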
The following are 4 simulations of the 4-nodes network under the twisted
distribution with $n = 3000$ and $T = 0.05$.
[Figure: 4 simulated paths of the network under the twisted distribution.]
The exponential decay rate (per time) is about $0.5754$: The probability that
the scaled network process follows $t\mapsto x+tv$ for an interval of scaled time
length $0.2$ has a decay rate in $n$ of about $0.2\cdot 0.5754 = 0.1151$.
Chapter 7
Appendix
7.1 Functional strong law of large numbers
Let $X_1, X_2, \dots$ be iid and centred with $E[X_1] = 0$. Let $S_n$ be the $n$-th partial
sum and $0, S_1, S_2, \dots$ a path of partial sums. Let $Z_n$ be the scaled process
under the law of large numbers scaling.
$$Z_n : \mathbb{R}_{\ge 0}\to\mathbb{R},\quad t\mapsto\frac{1}{n}S_{\lfloor nt\rfloor}$$
From the strong law of large numbers for almost every path of partial sums
$$\forall\epsilon > 0\ \exists n_0\in\mathbb{N} : \forall n\ge n_0 : \Big|\frac{1}{n}S_n\Big| < \frac{\epsilon}{T}$$
where the $n_0$ depends on the path. Fix $\epsilon > 0$ and for each path the associated
$n_0$ where it exists. Then for an arbitrary path where such a finite $n_0$
exists, there is also a maximum of the partial sums before $n_0$ attained (again
depending on the path).
$$M := \max_{m\in\{1,\dots,n_0\}}|S_m|$$
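When choosing $n$ large enough the scaled partial sums process $Z_n$ before and
after $\frac{n_0}{n}$ will be arbitrarily small. We do an exact calculation: Fix $\delta > 0$. Fix
for $\epsilon := \frac{\delta}{T}$ for each path the $n_0$ and $M$ as above. Choose $n\ge n_0$ and $n\ge\frac{M}{\delta}$.
Both lower bounds depend on the path. They are finite for almost all paths.
Let $t\in[0,T]$. For $n\ge\max\{n_0, \frac{M}{\delta}, \frac{1}{T}\}$ we either have $\lfloor nt\rfloor\ge n_0$ or not. First
consider the first case.
$$\begin{aligned}
|Z_n(t)| &\le \frac{1}{n}\max\{|S_{\lfloor nt\rfloor}|, |S_{\lfloor nt\rfloor+1}|\}\\
&\le \max\Big\{\frac{\lfloor nt\rfloor}{n}\underbrace{\frac{1}{\lfloor nt\rfloor}|S_{\lfloor nt\rfloor}|}_{<\frac{\delta}{T}},\ \frac{\lfloor nt\rfloor+1}{n}\underbrace{\frac{1}{\lfloor nt\rfloor+1}|S_{\lfloor nt\rfloor+1}|}_{<\frac{\delta}{T}}\Big\}\\
&\le \frac{\lfloor nt\rfloor+1}{n}\frac{\delta}{T} = \frac{\lfloor nt\rfloor}{n}\frac{\delta}{T} + \frac{1}{n}\frac{\delta}{T} \le 2\delta
\end{aligned}$$
And now the second with $\lfloor nt\rfloor < n_0$.
$$|Z_n(t)| \le \frac{1}{n}\max\{|S_{\lfloor nt\rfloor}|, |S_{\lfloor nt\rfloor+1}|\} \le \frac{1}{n}M \le \delta$$
So almost all paths converge under the scaling and in the sup-norm to the
function $t\mapsto 0$.
$$1 = P\Big(\lim_{n\to\infty}\sup_{t\in[0,T]}|Z_n(t)| = 0\Big)$$
In another notation with $X_1 := \tau_1 - E[\tau]$ we have $S_n = \sum_{k=1}^{n}\tau_k - nE[\tau]$
and $Z_n(t) = \frac{1}{n}S_{\lfloor nt\rfloor} = \frac{1}{n}\sum_{k=1}^{\lfloor nt\rfloor}\tau_k - \frac{\lfloor nt\rfloor}{n}E[\tau]$. Since the difference between
$\frac{\lfloor nt\rfloor}{n}E[\tau]$ and $tE[\tau]$ is bounded by $\frac{E[\tau]}{n}$ we get the almost sure convergence:
$$1 = P\Big(\lim_{n\to\infty}\sup_{t\in[0,T]}\Big|\frac{1}{n}\sum_{k=1}^{\lfloor nt\rfloor}\tau_k - tE[\tau]\Big| = 0\Big)$$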
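The uniform convergence of the scaled partial sums can be illustrated by simulation. This is a minimal sketch under assumptions that are not in the text: the steps are taken uniform on $[-1,1]$ (iid and centred), and the function name and the seed are arbitrary. It evaluates $\sup_{t\in[0,T]}|Z_n(t)| = \max_{k\le nT}|S_k|/n$ for growing $n$.

```python
import random

def sup_scaled_partial_sums(n, T, rng):
    """sup over t in [0, T] of |S_floor(nt)| / n for iid uniform(-1, 1) steps."""
    s, worst = 0.0, 0.0
    for _ in range(int(n * T)):
        s += rng.uniform(-1.0, 1.0)
        worst = max(worst, abs(s))
    return worst / n

rng = random.Random(2008)  # fixed seed, only for reproducibility
for n in (100, 10_000, 100_000):
    print(n, sup_scaled_partial_sums(n, 1.0, rng))
```

On a typical run the printed values shrink roughly like $n^{-1/2}$, in line with the almost sure convergence to $t\mapsto 0$.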
7.1.1 Implication for the counting process
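Define the partial sums and the interpolated partial sums process wrt $\tau_1, \tau_2, \dots$
$$Y_n : t\mapsto\frac{1}{n}\sum_{k=1}^{\lfloor nt\rfloor}\tau_k$$
$$\hat Y_n : t\mapsto Y_n(t) + \frac{nt-\lfloor nt\rfloor}{n}\,\tau_{\lfloor nt\rfloor+1}$$
And note that
$Y_n(\frac{k}{n}) = \hat Y_n(\frac{k}{n})$ for all $k\in\mathbb{N}$.
$\hat Y_n(t)\ge Y_n(t)$ and $\hat Y_n(t) > Y_n(t)$ for $t\notin\{\frac{k}{n} \mid k\in\mathbb{N}\}$.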
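Claim 7.1.1. $Y_n\in U_\epsilon(t\mapsto t\lambda) \Rightarrow \hat Y_n\in U_\epsilon(t\mapsto t\lambda)$.
Proof of 7.1.1: Note that if $Y_n$ is close to $t\mapsto t\lambda$ then necessarily $Y_n$ is close
to $t\mapsto t\lambda$ whenever it jumps, that is in $\{\frac{k}{n} \mid k = 0,1,\dots,\lfloor nT\rfloor\}$:
$$Y_n\in U_\epsilon(t\mapsto t\lambda) \;\Rightarrow\; \epsilon > \Big|Y_n\Big(\frac{k}{n}\Big) - \frac{k}{n}\lambda\Big| \quad (k = 1,\dots,\lfloor nT\rfloor)$$
But this is enough for $\hat Y_n$ to be close to $t\mapsto t\lambda$: $(t, \hat Y_n(t))$ is the interpolation
between $(\frac{k}{n}, Y_n(\frac{k}{n}))$ and $(\frac{k+1}{n}, Y_n(\frac{k+1}{n}))$ for $k = \lfloor nt\rfloor$. And since $U_\epsilon(t\mapsto t\lambda)$ is
a convex subset of $\mathbb{R}^2$ it has to contain $(t, \hat Y_n(t))$.
□ 7.1.1
[Figure 7.1: 2 realisations of $Y_n$ at jump times $\{\frac{k}{n} \mid k = 1,\dots,4\}$ and as interpolated functions $\hat Y_n$]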
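Claim 7.1.2. For the interpolated counting process a functional strong law
of large numbers holds.
Proof of 7.1.2: From 7.1.1 the functional strong law of large numbers
holds for the interpolated partial sums process too. $\hat Y^{-1} = \hat N$ so
$$\begin{aligned}
\sup_{t\in[0,T]}|\hat Y(t) - tE[\tau]| < \epsilon \;&\Rightarrow\; \sup_{t\in[0,TE[\tau]-\epsilon]}\Big|\hat Y^{-1}(t) - t\frac{1}{E[\tau]}\Big| < \frac{\epsilon}{E[\tau]}\\
&\Leftrightarrow\; \sup_{t\in[0,TE[\tau]-\epsilon]}\Big|\hat N(t) - t\frac{1}{E[\tau]}\Big| < \frac{\epsilon}{E[\tau]}
\end{aligned}$$
For an $\epsilon'$ smaller than $\epsilon$ we get
$$\begin{aligned}
1 &= P\Big(\lim_{n\to\infty}\sup_{t\in[0,T]}\Big|\frac{1}{n}\hat Y(nt) - tE[\tau]\Big| < \epsilon\Big)\\
&\le P\Big(\lim_{n\to\infty}\sup_{t\in[0,TE[\tau]-\epsilon]}\Big|\frac{1}{n}\hat Y^{-1}(nt) - t\frac{1}{E[\tau]}\Big| < \frac{\epsilon}{E[\tau]}\Big)\\
&\le P\Big(\lim_{n\to\infty}\sup_{t\in[0,TE[\tau]-\epsilon']}\Big|\frac{1}{n}\hat N(nt) - t\frac{1}{E[\tau]}\Big| < \frac{\epsilon}{E[\tau]}\Big)
\end{aligned}$$
Claim 7.1.3. A functional strong law of large numbers holds for the uninterpolated
counting process.
Let $0 < \epsilon' < \epsilon$ and $n\ge\frac{1}{\epsilon-\epsilon'}$.
$$\begin{aligned}
\|N_n - (t\mapsto t\lambda)\| < \epsilon \;&\Leftarrow\; \|\hat N_n - (t\mapsto t\lambda)\| + \underbrace{\|N_n - \hat N_n\|}_{\le\frac{1}{n}} < \epsilon\\
&\Leftarrow\; \|\hat N_n - (t\mapsto t\lambda)\| < \epsilon - \frac{1}{n}\\
&\Leftarrow\; \|\hat N_n - (t\mapsto t\lambda)\| < \epsilon'
\end{aligned}$$
and
$$P\Big(\lim_{n\to\infty}\|N_n - (t\mapsto t\lambda)\| < \epsilon\Big) \ge P\Big(\lim_{n\to\infty}\|\hat N_n - (t\mapsto t\lambda)\| < \epsilon'\Big) = 1$$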
7.2 Implications from exponential equivalence
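Claim 7.2.1. If $N$ and $\tilde N$ are exponentially equivalent then
$$\lim_{\epsilon\to 0}\lim_{n\to\infty}\frac{1}{n}\log P(N_n\in U_\epsilon(\psi)) = \lim_{\epsilon\to 0}\lim_{n\to\infty}\frac{1}{n}\log P(\tilde N_n\in U_\epsilon(\psi)).$$
If $\epsilon > 0$ is fixed and we have
$$\lim_{n\to\infty}\frac{1}{n}\log P(N_n\in U_\epsilon(\psi)) = f(\epsilon)$$
for some $f$ continuous in $\epsilon$ then
$$\lim_{n\to\infty}\frac{1}{n}\log P(N_n\in U_\epsilon(\psi)) = \lim_{n\to\infty}\frac{1}{n}\log P(\tilde N_n\in U_\epsilon(\psi)).$$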
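Proof of 7.2.1: Assume $N$ and $\tilde N$ are coupled such that their difference
decays super exponentially fast as required for exponential equivalence. We
have for any $\delta > 0$
$$\lim_{n\to\infty}\frac{1}{n}\log P(\|N_n - \tilde N_n\| > \delta) = -\infty$$
And we want for $U$ convex and open or closed and $U\ne\emptyset$
$$\lim_{n\to\infty}\frac{1}{n}\log P(N_n\in U) = \lim_{n\to\infty}\frac{1}{n}\log P(\tilde N_n\in U)$$
Let $U^{\delta}$ be the closed blow up of $U$.
$$\begin{aligned}
P(N_n\in U) &= P(N_n\in U, \|N_n - \tilde N_n\|\le\delta) + P(N_n\in U, \|N_n - \tilde N_n\| > \delta)\\
&\le P(\tilde N_n\in U^{\delta}, \|N_n - \tilde N_n\|\le\delta) + P(N_n\in U, \|N_n - \tilde N_n\| > \delta)\\
&\le P(\tilde N_n\in U^{\delta}) + P(\|N_n - \tilde N_n\| > \delta)
\end{aligned}$$
and thus
$$\begin{aligned}
\lim_{n\to\infty}\frac{1}{n}\log P(N_n\in U) &\le \limsup_{n\to\infty}\frac{1}{n}\log\big(P(\tilde N_n\in U^{\delta}) + P(\|N_n - \tilde N_n\| > \delta)\big)\\
&= \max\Big\{\limsup_{n\to\infty}\frac{1}{n}\log P(\tilde N_n\in U^{\delta}),\ -\infty\Big\}\\
&= \limsup_{n\to\infty}\frac{1}{n}\log P(\tilde N_n\in U^{\delta})
\end{aligned}$$
and since $\delta > 0$ was arbitrary, for $U = U_\epsilon(\psi)$ and in the limit $\epsilon\to 0$
$$\begin{aligned}
\lim_{\epsilon\to 0}\lim_{n\to\infty}\frac{1}{n}\log P(N_n\in U_\epsilon(\psi)) &\le \lim_{\epsilon\to 0}\lim_{\delta\to 0}\limsup_{n\to\infty}\frac{1}{n}\log P(\tilde N_n\in U_{\epsilon+\delta}(\psi))\\
&= \lim_{\epsilon\to 0}\lim_{n\to\infty}\frac{1}{n}\log P(\tilde N_n\in U_\epsilon(\psi))
\end{aligned}$$
By symmetry we are done.
In the case of $\epsilon > 0$ fixed with a bound continuous in $\epsilon$
$$\begin{aligned}
\lim_{n\to\infty}\frac{1}{n}\log P(N_n\in U_\epsilon(\psi)) &\le \lim_{\delta\to 0}\limsup_{n\to\infty}\frac{1}{n}\log P(\tilde N_n\in U_{\epsilon+\delta}(\psi))\\
&= \lim_{n\to\infty}\frac{1}{n}\log P(\tilde N_n\in U_\epsilon(\psi))
\end{aligned}$$
For a lower bound we need the $\delta$-interior of $U$, that is a non-empty subset of
$U$ such that its $\delta$-blow up is still contained in $U$. Let this be $U^{-\delta}$.
$$\begin{aligned}
P(N_n\in U) &= P(\tilde N_n + (N_n - \tilde N_n)\in U)\\
&\ge P(\tilde N_n + (N_n - \tilde N_n)\in U, \|N_n - \tilde N_n\|\le\delta)\\
&\ge P(\tilde N_n\in U^{-\delta}, \|N_n - \tilde N_n\|\le\delta)
\end{aligned}$$
and for $U = U_\epsilon(\psi)$, $U^{-\delta} = U_{\epsilon-\delta}(\psi)$
$$\begin{aligned}
\lim_{n\to\infty}\frac{1}{n}\log P(N_n\in U_\epsilon(\psi)) &\ge \lim_{\delta\to 0}\lim_{n\to\infty}\frac{1}{n}\log P(\tilde N_n\in U_{\epsilon-\delta}(\psi), \|N_n - \tilde N_n\|\le\delta)\\
&= \lim_{\delta\to 0}\lim_{n\to\infty}\frac{1}{n}\log P(\tilde N_n\in U_{\epsilon-\delta}(\psi))\\
&= \lim_{n\to\infty}\frac{1}{n}\log P(\tilde N_n\in U_\epsilon(\psi)) \quad\text{(continuity in } \epsilon)
\end{aligned}$$
□ 7.2.1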
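Lemma 7.2.2. For $\theta > 0$ and $\|\cdot\|$ the supremum norm over $[0,1]$:
$$\lim_{n\to\infty}\frac{1}{n}\log E[e^{n\theta\|N_n - \tilde N_n\|}] = 0$$
Proof of 7.2.2: Let $\delta > 0$.
$$E[e^{n\theta\|N_n - \tilde N_n\|}] = E[e^{n\theta\|N_n - \tilde N_n\|}\,\mathbb{1}_{\|N_n - \tilde N_n\| > \delta}] + E[e^{n\theta\|N_n - \tilde N_n\|}\,\mathbb{1}_{\|N_n - \tilde N_n\|\le\delta}]$$
Investigate summands separately, starting with the first. Apply Hölder
$$\begin{aligned}
E[e^{n\theta\|N_n - \tilde N_n\|}\,\mathbb{1}_{\|N_n - \tilde N_n\| > \delta}] &\le E[e^{np\theta\|N_n - \tilde N_n\|}]^{\frac{1}{p}}\,E[\mathbb{1}_{\|N_n - \tilde N_n\| > \delta}]^{\frac{1}{q}}\\
&\le E[e^{np\theta(N_n(1)+\delta)}]^{\frac{1}{p}}\,E[\mathbb{1}_{\|N_n - \tilde N_n\| > \delta}]^{\frac{1}{q}}
\end{aligned}$$
with some $p > 1$ and $\frac{1}{p} + \frac{1}{q} = 1$. Since $\Gamma(\cdot)$ is finite on all of $\mathbb{R}$:
$$\begin{aligned}
\limsup_{n\to\infty}\frac{1}{n}\log E[e^{n\theta\|N_n - \tilde N_n\|}\,\mathbb{1}_{\|N_n - \tilde N_n\| > \delta}]
&\le \limsup_{n\to\infty}\frac{1}{n}\log E[e^{p\theta N(n)}]^{\frac{1}{p}} + \limsup_{n\to\infty}\frac{1}{n}\log E[\mathbb{1}_{\|N_n - \tilde N_n\| > \delta}]^{\frac{1}{q}}\\
&= \frac{1}{p}\Gamma(p\theta) - \infty\\
&= -\infty
\end{aligned}$$
Continuing for the second.
$$\limsup_{n\to\infty}\frac{1}{n}\log E[e^{n\theta\|N_n - \tilde N_n\|}\,\mathbb{1}_{\|N_n - \tilde N_n\|\le\delta}] \le \limsup_{n\to\infty}\frac{1}{n}\log e^{n\theta\delta} = \theta\delta$$
Thus
$$\lim_{\delta\to 0}\limsup_{n\to\infty}\frac{1}{n}\log E[e^{n\theta\|N_n - \tilde N_n\|}] = \lim_{\delta\to 0}\max\{-\infty, \delta\theta\} = 0$$
But $\theta > 0$ and $\|\cdot\|\ge 0$ make $E[e^{n\theta\|N_n - \tilde N_n\|}]\ge 1$ and we cannot have decay.
So the limit has to be $= 0$.
□ 7.2.2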
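Claim 7.2.3. If $N$, $\tilde N$ are exponentially equivalent then they have the same
lmgfs:
$$\lim_{t\to\infty}\frac{1}{t}\log E[e^{\theta N_t}] = \lim_{t\to\infty}\frac{1}{t}\log E[e^{\theta\tilde N_t}]$$
$$\lim_{t\to\infty}\frac{1}{t}\log E[e^{\langle\theta, N_t\rangle}] = \lim_{t\to\infty}\frac{1}{t}\log E[e^{\langle\theta,\tilde N_t\rangle}]$$
Proof of 7.2.3: For $\theta > 0$. Upper bound: Let $p, q > 1$ such that $\frac{1}{p} + \frac{1}{q} = 1$.
$$\begin{aligned}
E[e^{\theta N_t}] = E[e^{\theta\tilde N_t + \theta(N_t - \tilde N_t)}] &\le E[e^{\theta\tilde N_t + \theta\|N - \tilde N\|_{[0,t]}}]\\
&\le E[e^{p\theta\tilde N_t}]^{\frac{1}{p}}\,E[e^{q\theta\|N - \tilde N\|_{[0,t]}}]^{\frac{1}{q}}
\end{aligned}$$
under the exponential scaling
$$\begin{aligned}
\frac{1}{t}\log E[e^{\theta N_t}] &\le \frac{1}{tp}\log E[e^{p\theta\tilde N_t}] + \frac{1}{tq}\log E[e^{tq\theta\frac{1}{t}\|N - \tilde N\|_{[0,t]}}]\\
&\to \frac{1}{p}\Gamma(p\theta) + 0 \quad (t\to\infty)
\end{aligned}$$
by application of 7.2.2. As $p$ is arbitrary with only $p > 1$ we let $p\downarrow 1$. We
have $D(\Gamma) = \mathbb{R}$ and continuity of $\Gamma(\cdot)$ from finiteness and convexity. We get
the upper bound $\Gamma(\theta)$. Lower bound, still $\theta > 0$:
$$\begin{aligned}
E[e^{\theta N_t}] &= E[e^{\theta\tilde N_t + \theta(N_t - \tilde N_t)}]\\
&\ge E[e^{\theta\tilde N_t - \theta\|N - \tilde N\|_{[0,t]}}]\\
&= E[e^{\theta\tilde N_t - \theta\|N - \tilde N\|_{[0,t]}}\,\mathbb{1}_{\|N - \tilde N\|_{[0,t]}\le t\delta}] + E[e^{\theta\tilde N_t - \theta\|N - \tilde N\|_{[0,t]}}\,\mathbb{1}_{\|N - \tilde N\|_{[0,t]} > t\delta}]\\
&\ge E[e^{\theta\tilde N_t - \theta t\delta}\,\mathbb{1}_{\|N - \tilde N\|_{[0,t]}\le t\delta}]
\end{aligned}$$
under the exponential scaling
$$\begin{aligned}
\lim_{t\to\infty}\frac{1}{t}\log E[e^{\theta N_t}] &\ge \lim_{t\to\infty}\frac{1}{t}\log E[e^{\theta\tilde N_t}\,\mathbb{1}_{\|N - \tilde N\|_{[0,t]}\le t\delta}] - \theta\delta\\
&= \Gamma(\theta) - \delta\theta
\end{aligned}$$
as we had $\delta > 0$ arbitrarily small, the lower bound is done.
For $\theta < 0$ we apply basically the same tool and in general dimensions for
$\theta\in\mathbb{R}^d$ and $|\cdot|$ a norm in $\mathbb{R}^d$, $\|\cdot\|_{[0,t]}$ the supremum norm over $[0,t]$. Bounds
work similarly starting from
$$\begin{aligned}
E[e^{\langle\theta, N_t\rangle}] = E[e^{\langle\theta,\tilde N_t\rangle + \langle\theta, N_t - \tilde N_t\rangle}] &\le E[e^{\langle\theta,\tilde N_t\rangle + |\theta|\,|N_t - \tilde N_t|}]\\
&\le E[e^{\langle\theta,\tilde N_t\rangle + |\theta|\,\|N - \tilde N\|_{[0,t]}}]
\end{aligned}$$
□ 7.2.3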
7.3 Fenchel-Legendre transforms
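Some simple transformations. All functions are assumed to be convex.
$$(cf)^*(x) = c\,f^*\Big(\frac{x}{c}\Big) \qquad (7.1)$$
$$(f+g)^*(x) = \inf_{\alpha} f^*(x-\alpha) + g^*(\alpha) \qquad (7.2)$$
$$(f\circ g)^*(x) = \inf_{\alpha\in\mathbb{R}} \alpha\,g^*\Big(\frac{x}{\alpha}\Big) + f^*(\alpha) \qquad (7.3)$$
And as a combination of the above
$$\begin{aligned}
(h + f\circ g)^*(x) &\le \inf_{\alpha\in\mathbb{R}} h^*(x-\alpha) + (f\circ g)^*(\alpha)\\
&\le \inf_{\alpha\in[0,1]}\inf_{\beta} h^*(x-\alpha) + \beta\,g^*\Big(\frac{\alpha}{\beta}\Big) + f^*(\beta)
\end{aligned}$$
Should we prove them?
$$(cf)^*(x) = \sup_\theta\ \theta x - c f(\theta) = c\,\sup_\theta\ \theta\frac{x}{c} - f(\theta) = c\,f^*\Big(\frac{x}{c}\Big)$$
$$\begin{aligned}
(f+g)^*(x) &= \sup_\theta\ \theta x - f(\theta) - g(\theta)\\
&= \sup_\theta\ \theta(x-\alpha) - f(\theta) + \alpha\theta - g(\theta)\\
&\le \sup_\theta\ \big(\theta(x-\alpha) - f(\theta)\big) + \sup_\theta\ \big(\alpha\theta - g(\theta)\big)\\
&= f^*(x-\alpha) + g^*(\alpha)
\end{aligned}$$
With equality if the optimiser is the same in $f^*$ and $g^*$. Also we can optimise
the bound.
$$(f+g)^*(x) \le \inf_\alpha f^*(x-\alpha) + g^*(\alpha)$$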
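If there is an optimal $\theta$ for $(f+g)^*(x)$ then
$$f'(\theta) + g'(\theta) = x \;\Rightarrow\; \begin{cases} f'(\theta) = x - g'(\theta)\\ g'(\theta) = g'(\theta)\end{cases} \;\Rightarrow\; \begin{cases} f'(\theta) = x - \alpha\\ g'(\theta) = \alpha\end{cases}\quad\text{for some } \alpha$$
and the same $\theta$ is the optimiser in $f^*(x-\alpha)$ and $g^*(\alpha)$. We got the equality
of (7.2). If an optimal $\theta$ does not exist for finite $(f+g)^*(x)$ we probably get
equality by approximation. If $(f+g)^*(x) = \infty$ there is nothing to do. The
claim and the proof do not rely on $x\in\mathbb{R}$; it is for general $f, g : \mathbb{R}^m\to\mathbb{R}$ and
$x, \alpha\in\mathbb{R}^m$.
Next one. We have $g : \mathbb{R}^m\to\mathbb{R}$ and $f : \mathbb{R}\to\mathbb{R}$.
$$\begin{aligned}
(f\circ g)^*(x) &= \sup_\theta\ \theta x - f(g(\theta))\\
&= \sup_\theta\ \theta x - \alpha g(\theta) + \alpha g(\theta) - f(g(\theta))\\
&\le \sup_\theta\ \big(\theta x - \alpha g(\theta)\big) + \sup_\theta\ \big(\alpha g(\theta) - f(g(\theta))\big)\\
&= \alpha\,\sup_\theta\ \Big(\theta\frac{x}{\alpha} - g(\theta)\Big) + \sup_{\xi\in g(\mathbb{R}^m)}\ \big(\alpha\xi - f(\xi)\big)\\
&\le \alpha\,g^*\Big(\frac{x}{\alpha}\Big) + f^*(\alpha)
\end{aligned}$$
We argue as before: If there is an optimising $\theta$ for $(f\circ g)^*(x)$ it will satisfy
$(f\circ g)'(\theta) = x$ and if we set $\alpha := f'(g(\theta))$ ($\in\mathbb{R}$) then we get $g'(\theta) = \frac{x}{\alpha}$ and
have obtained optimisers for $f^*(\alpha)$ and $g^*(\frac{x}{\alpha})$. Again we got equality as in
(7.3).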
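The identities (7.1) and (7.2) can be sanity-checked on a grid. The following sketch is illustrative only: the quadratic test functions $f(\theta) = \theta^2/2$ and $g(\theta) = \theta^2$, the grids and the helper name `conj` are assumptions and not taken from the text. For these functions $f^*(x) = x^2/2$, $g^*(x) = x^2/4$ and $(f+g)^*(x) = x^2/6$.

```python
def conj(f, x, grid):
    """Numerical Fenchel-Legendre transform: sup over the grid of theta*x - f(theta)."""
    return max(t * x - f(t) for t in grid)

grid = [k / 100.0 for k in range(-500, 501)]     # theta in [-5, 5]
alphas = [k / 100.0 for k in range(-300, 301)]   # alpha in [-3, 3]

f = lambda t: 0.5 * t * t   # f*(x) = x^2 / 2
g = lambda t: t * t         # g*(x) = x^2 / 4
x, c = 1.5, 3.0

# (7.1): (c f)*(x) = c f*(x / c)
lhs_scale = conj(lambda t: c * f(t), x, grid)
rhs_scale = c * conj(f, x / c, grid)

# (7.2): (f + g)*(x) = inf_alpha f*(x - alpha) + g*(alpha)
lhs_sum = conj(lambda t: f(t) + g(t), x, grid)
rhs_sum = min(conj(f, x - a, grid) + conj(g, a, grid) for a in alphas)

print(lhs_scale, rhs_scale)
print(lhs_sum, rhs_sum, x * x / 6)
```

Both pairs agree up to the grid resolution; the infimum in (7.2) is attained near $\alpha = 2x/3$, where $f^*{}'(x-\alpha) = g^*{}'(\alpha)$.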
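Claim 7.3.1. For $g = \pi_j$ in (7.3):
$$(f\circ\pi_j)^*(\alpha) = \begin{cases} f^*(\pi_j(\alpha)), & \alpha = \pi_j(\alpha)\\ \infty, & \text{else.}\end{cases}$$
Define $\Pi_j^\perp$ as the subspace of $\mathbb{R}^d$ perpendicular to $\pi_j(\mathbb{R}^d)$ such that $\mathbb{R}^d = \pi_j(\mathbb{R}^d)\oplus\Pi_j^\perp$ and write
$$\alpha = \pi_j(\alpha) + \alpha^\perp,\quad \alpha^\perp\in\Pi_j^\perp$$
$$\theta = \pi_j(\theta) + \theta^\perp,\quad \theta^\perp\in\Pi_j^\perp.$$
This implies $\langle\alpha,\theta\rangle = \langle\pi_j(\alpha),\pi_j(\theta)\rangle + \langle\alpha^\perp,\theta^\perp\rangle$ which we apply in the definition
of the F-L transform:
$$\begin{aligned}
(f\circ\pi_j)^*(\alpha) &= \sup_{\theta\in\mathbb{R}^d}\ \langle\pi_j(\alpha),\pi_j(\theta)\rangle - f\circ\pi_j(\theta) + \langle\alpha^\perp,\theta^\perp\rangle\\
&= \underbrace{\sup_{\theta^{(1)}\in\pi_j(\mathbb{R}^d)}\ \langle\pi_j(\alpha),\theta^{(1)}\rangle - f(\theta^{(1)})}_{=f^*(\pi_j(\alpha))} + \underbrace{\sup_{\theta^{(2)}\in\Pi_j^\perp}\ \langle\alpha^\perp,\theta^{(2)}\rangle}_{\in\{0,\infty\}}
\end{aligned}$$
$\alpha = \pi_j(\alpha)$ holds if and only if $\alpha^\perp = 0$, implying the claimed statement.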
7.4 The shifted inter event time
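Let $\tau$ be an inter event time with distribution function $F$ and density $f$. We
denote by $f_{+x}$ the shifted density and by $F_{+x}$ its distribution function.
$$f_{+x}(t) = \frac{f(x+t)}{F^c(x)}$$
$$F^c_{+x}(t) = \int_{s=t}^{\infty} f_{+x}(s)\,ds = \int_{s=t}^{\infty}\frac{f(s+x)}{F^c(x)}\,ds = \int_{s=t+x}^{\infty}\frac{f(s)}{F^c(x)}\,ds = \frac{F^c(t+x)}{F^c(x)}$$
$$h_{+x}(t) = \frac{f_{+x}}{F^c_{+x}}(t) = \frac{f(x+t)}{F^c(x+t)} = h(x+t)$$
$$H_{+x}(t) = \int_{s=0}^{t} h_{+x}(s)\,ds = \int_{s=0}^{t} h(s+x)\,ds = \int_{s=x}^{t+x} h(s)\,ds = H(t+x) - H(x)$$
and $F_{+x}$ matches $H_{+x}$ by $F^c_{+x} = e^{-H_{+x}}$. The Cesaro limit for the shifted
distribution's hazard function does not change:
$$L_C(h_{+x}) = \lim_{t\to\infty}\frac{H_{+x}(t)}{t} = \lim_{t\to\infty}\frac{H(t+x) - H(x)}{t} = \lim_{t\to\infty}\frac{H(t+x)}{t} - 0 = \lim_{t\to\infty}\frac{H(t)}{t}$$
and immediately $L_C(h) = L_C(h_{+x})$ and $D(\Lambda) = D(\Lambda_{+x})$.
We get finiteness of all moments of $\tau_{+x}$. We calculate the mean for unbounded $\tau$
$$E[\tau_{+x}] = \int_{s=0}^{\infty} F^c_{+x}(s)\,ds = \int_{s=0}^{\infty} e^{-H(x+s)+H(x)}\,ds = e^{H(x)}\int_{s=x}^{\infty} e^{-H(s)}\,ds$$
If $\tau$ is bounded by $b$, say, then $x < b$ is required. $\tau_{+x}\in(0, b-x)$
$$E[\tau_{+x}] = \int_{s=0}^{b-x} F^c_{+x}(s)\,ds = \int_{s=0}^{b-x} e^{-H(x+s)+H(x)}\,ds = e^{H(x)}\int_{s=x}^{b} e^{-H(s)}\,ds$$
In both cases ($\tau$ bounded or not) we got a product of something large and
something small. What happens as $x\to\infty$ (or $x\to b$)?
$$\lim_{x\to\infty\,(b)} E[\tau_{+x}] = \lim_{x\to\infty\,(b)}\frac{\int_{s=x}^{b} e^{-H(s)}\,ds}{e^{-H(x)}} = \lim_{x\to\infty\,(b)}\frac{-e^{-H(x)}}{-h(x)\,e^{-H(x)}} = \lim_{x\to\infty\,(b)}\frac{1}{h(x)} \qquad (7.4)$$
which only makes sense if $\lim_{t\to\infty} h(t)$ exists.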
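Claim 7.4.1. If $\lim_{x\to\infty} h(x)$ exists then $\lim_{x\to\infty} E[\tau_{+x}] = \frac{1}{L_C(h)}$.
For $\tau$ unbounded but LD-bounded: $L_C(h) = \infty$. If $\liminf_{x\to\infty} h(x) = \infty$
then $\lim_{x\to\infty} E[\tau_{+x}] = 0$.
If $\tau$ is bounded by $b$: $L_C(h) = \infty$ and if $\liminf_{x\to b} h(x) = \infty$ then
$\lim_{x\to b} E[\tau_{+x}] = 0$.
If $\tau$ is not LD-bounded and $\lim_{x\to\infty} h(x)$ exists in $(0,\infty)$ (and is $= L_C(h)$)
then $\lim_{x\to\infty} E[\tau_{+x}] = \frac{1}{L_C(h)}$.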
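The last case of the claim can be illustrated with a concrete distribution. This is a minimal sketch under an assumption that is not in the text: $\tau$ is taken Gamma distributed with shape 2 and rate 1, so that $F^c(t) = (1+t)e^{-t}$, $h(t) = \frac{t}{1+t}\to 1$ and, in closed form, $E[\tau_{+x}] = \frac{2+x}{1+x}$. The function names and the integration parameters are illustrative.

```python
import math

def survival(t):
    """F^c(t) for a Gamma(shape 2, rate 1) inter event time: (1 + t) e^{-t}."""
    return (1.0 + t) * math.exp(-t)

def hazard(t):
    """h(t) = f(t) / F^c(t) with density f(t) = t e^{-t}."""
    return t / (1.0 + t)

def mean_shifted(x, upper=60.0, steps=60_000):
    """E[tau_{+x}] = int_0^inf F^c(x + s) / F^c(x) ds, by the trapezoidal rule."""
    h = upper / steps
    vals = [survival(x + k * h) / survival(x) for k in range(steps + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

for x in (0.0, 1.0, 10.0, 100.0):
    # numerical mean, closed form (2 + x) / (1 + x), and 1 / h(x)
    print(x, mean_shifted(x), (2.0 + x) / (1.0 + x),
          1.0 / hazard(x) if x > 0 else float("inf"))
```

As $x$ grows the mean of the shifted inter event time approaches $1 = 1/\lim_{x\to\infty}h(x)$, as the claim states for this case.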
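$$\begin{aligned}
E[e^{\theta\tau_{+x}}] &= \int_{s=0}^{\infty} e^{\theta s}\frac{f(s+x)}{F^c(x)}\,ds = \frac{1}{F^c(x)}e^{-\theta x}\int_{s=0}^{\infty} e^{\theta(s+x)}f(s+x)\,ds\\
&= \frac{1}{F^c(x)}e^{-\theta x}\int_{s=x}^{\infty} e^{\theta s}f(s)\,ds\\
&= \frac{1}{F^c(x)}e^{-\theta x+\Lambda(\theta)}\int_{s=x}^{\infty}\underbrace{e^{\theta s-\Lambda(\theta)}f(s)}_{=f_\theta(s)}\,ds\\
&= \frac{1}{F^c(x)}e^{-\theta x+\Lambda(\theta)}F^c_\theta(x) = \frac{F^c_\theta}{F^c}(x)\,e^{-\theta x+\Lambda(\theta)} \qquad (7.5)
\end{aligned}$$
Claim 7.4.2. Shifting and exponentially twisting commute.
$$\begin{aligned}
(f_{+x})_\beta(t) &= f_{+x}(t)\,e^{\beta t-\Lambda_{+x}(\beta)} = \frac{f(t+x)}{F^c(x)}e^{\beta t-\Lambda_{+x}(\beta)}\\
&= \frac{1}{F^c(x)}e^{-\beta x-\Lambda_{+x}(\beta)+\Lambda(\beta)}f(t+x)\,e^{\beta(t+x)-\Lambda(\beta)}\\
&= \frac{1}{F^c(x)}e^{-\beta x+\Lambda(\beta)}e^{-\Lambda_{+x}(\beta)}f_\beta(t+x)\\
&\overset{(7.5)}{=} \frac{1}{F^c(x)}e^{-\beta x+\Lambda(\beta)}\frac{F^c}{F^c_\beta}(x)\,e^{\beta x-\Lambda(\beta)}f_\beta(t+x)\\
&= \frac{f_\beta(x+t)}{F^c_\beta(x)} = (f_\beta)_{+x}(t)
\end{aligned}$$
Claim 7.4.3. If $\lim_{x\to\infty} h(x) = \infty$ then $\lim_{x\to\infty} E[e^{\theta\tau_{+x}}] = 1$.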
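We generally have $e^{\tilde\Lambda(\theta)} = \frac{e^{\Lambda(\theta)}-1}{\Lambda'(0)\,\theta}$ and
$$\begin{aligned}
\lim_{x\to\infty} E[e^{\theta\widetilde{(\tau_{+x})}}]
&= \lim_{x\to\infty}\frac{\int_{s=x}^{\infty} e^{\theta s-H(s)}\,ds}{e^{\theta x}\int_{s=x}^{\infty} e^{-H(s)}\,ds} = \lim_{x\to\infty}\frac{-e^{\theta x-H(x)}}{\theta\,e^{\theta x}\int_{s=x}^{\infty} e^{-H(s)}\,ds - e^{\theta x-H(x)}}\\
&= \lim_{x\to\infty}\frac{-1}{\theta\,e^{H(x)}\int_{s=x}^{\infty} e^{-H(s)}\,ds - 1} = \lim_{x\to\infty}\frac{-1}{\theta E[\tau_{+x}] - 1} \qquad (7.6)
\end{aligned}$$
If $\lim_{x\to\infty} h(x) = \infty$ then $\lim_{x\to\infty} E[\tau_{+x}] = 0$ and $\lim_{x\to\infty} E[e^{\theta\widetilde{(\tau_{+x})}}] \overset{(7.6)}{=} 1$ for
all $\theta\in\mathbb{R}$.
$$E[e^{\theta\tau_{+x}}] = 1 + E[\tau_{+x}]\,\theta\,E[e^{\theta\widetilde{(\tau_{+x})}}]$$
$$\lim_{x\to\infty} E[e^{\theta\tau_{+x}}] = 1 + \lim_{x\to\infty} E[\tau_{+x}]\,\theta\,E[e^{\theta\widetilde{(\tau_{+x})}}] = 1 + 0\cdot\theta\cdot 1 = 1$$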
7.5 Large deviations and other tools
Theorem 7.5.1 (Arzelà-Ascoli, theorem A.51 of [23]). The set $A$ has compact
closure in $C([0,T],\mathbb{R}^d)$ equipped with the sup-norm if and only if
the initial points are bounded: $\sup_{\vec x\in A}|\vec x(0)| < \infty$, and
the functions in $A$ are equicontinuous, that is, for every $t$ and $\epsilon$ there
exists a $\delta$ so that, whenever $|t-s| < \delta$ we have $|\vec x(t) - \vec x(s)| < \epsilon$ for all
$\vec x\in A$.
Theorem 7.5.2 (Contraction principle, theorem 4.2.1 of [5]). Let $\mathcal{X}$ and $\mathcal{Y}$
be Hausdorff topological spaces and $f : \mathcal{X}\to\mathcal{Y}$ a continuous function. Consider
a good rate function $I : \mathcal{X}\to[0,\infty]$.
(a) For each $y\in\mathcal{Y}$, define
$$I'(y) = \inf\{I(x) : x\in\mathcal{X},\ y = f(x)\}.$$
Then $I'$ is a good rate function on $\mathcal{Y}$, where as usual the infimum over
the empty set is taken as $\infty$.
(b) If $I$ controls the LDP associated with a family of probability measures
$\{\mu_\epsilon\}$ on $\mathcal{X}$, then $I'$ controls the LDP associated with the family of probability
measures $\{\mu_\epsilon\circ f^{-1}\}$ on $\mathcal{Y}$.
The following version of the Gärtner-Ellis theorem is equivalent to theorem
2.3.6 of [5].
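Theorem 7.5.3 (Gärtner and Ellis). Let $(Z_n; n\in\mathbb{N})$ be a sequence of
random vectors in $\mathbb{R}^d$ with $\mu_n$ the law of $Z_n$. Assume that the limit
$$\Lambda(\lambda) = \lim_{n\to\infty}\frac{1}{n}\log E[e^{\langle\lambda, Z_n\rangle}]$$
exists as an extended real number. Assume further that $0\in D(\Lambda)^\circ$.
(a) For any closed set $F$,
$$\limsup_{n\to\infty}\frac{1}{n}\log\mu_n(F) \le -\inf_{x\in F}\Lambda^*(x).$$
(b) For any open set $G$,
$$\liminf_{n\to\infty}\frac{1}{n}\log\mu_n(G) \ge -\inf_{x\in G\cap\mathcal{F}}\Lambda^*(x),$$
where $\mathcal{F}$ is the set of exposed points of $\Lambda^*$ whose exposing hyperplane belongs
to $D(\Lambda)^\circ$.
(c) If $\Lambda$ is an essentially smooth, lower semicontinuous function, then a large
deviation principle holds for $(Z_n; n\in\mathbb{N})$ with the good rate function $\Lambda^*(\cdot)$.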
7.6 Assumptions
Chapter 2
2.2.2: Inter event times a.s. strictly positive. Lmgfs open and not bounded
from below.
2.2.13: Existence of density, no harsh used better than new.
2.4.2 : Existence of positive Cesaro mean for the hazard rate.
Chapter 5
5.0.2: Summarises previous assumptions.
5.1.9: Networks are open and have no immediate feedback.
Bibliography
[1] Abraham Berman and Robert J. Plemmons. Non-negative matrices in
the mathematical sciences. Academic Press, 1979.
[2] Dimitri P. Bertsekas. Nonlinear Programming. Athena Scientific, 1995.
[3] Hong Chen and Avishai Mandelbaum. Discrete flow networks: Bot-
tleneck analysis and fluid approximations. Mathematics of Operations
Research, 16:408–446, 1991.
[4] D.J. Daley and D. Vere-Jones. An Introduction to the Theory of Point
Processes. Springer, 1988.
[5] Amir Dembo and Ofer Zeitouni. Large Deviations Techniques and Ap-
plications. Springer, 1993.
[6] Nicholas G. Duffield and Neil O’Connell. Large deviations and overflow
probabilities for the general single-server queue with applications. Math.
Proc. Cam. Phil. Soc., 1995.
[7] Paul Dupuis and Richard S. Ellis. The large deviation principle for a
general class of queueing systems. Transactions of the American Math-
ematical Society, 347(8), 1995.
[8] Robert Foley and David McDonald. Large deviations of a modified Jackson
network: Stability and rough asymptotics. The Annals of Applied
Probability, 15, 2005.
[9] Ayalvadi Ganesh, Neil O’Connell, and Damon Wischik. Big Queues.
Springer, 2004.
[10] Peter Glynn and Ward Whitt. Large deviations behaviour of counting
processes and their inverses. Queueing Systems, 17:107–128, 1994.
[11] J.B. Goodman and William Massey. The non-ergodic Jackson network.
Journal of Applied Probability, 21:860–869, 1984.
[12] Irina Ignatiouk-Robert. Large deviations of Jackson networks. Annals
of Applied Probability, 10:962–1001, 2000.
[13] Irina Ignatiouk-Robert. Large deviations for processes with discontinu-
ous statistics. Annals of Probability, 33:1479–1508, 2005.
[14] James R. Jackson. Networks of waiting lines. Operations Research,
5:518–521, 1957.
[15] Anatolii Puhalskii. Large deviation analysis of the single server queue.
Queueing Systems, 21:5–66, 1995.
[16] Anatolii Puhalskii. The action functional for the Jackson network.
Markov Processes and Related Fields, 13:99–136, 2007.
[17] Anatolii Puhalskii and Ward Whitt. Functional large deviation princi-
ples for first-passage-time processes. The Annals of Applied Probability,
7(2):362–381, 1997.
[18] Philippe Robert. Stochastic Networks and Queues. Springer, 2003.
[19] R. Tyrrell Rockafellar. Convex Analysis. Princeton University Press,
1970.
[20] Mark Rodgers-Lee. The large deviations of random time-changes in a
metric topology, 2003. Master Thesis, University of Dublin.
[21] H. L. Royden. Real Analysis. The Macmillan Company, 1968.
[22] Raymond Russell. The large deviations of random timechanges, 1998.
PhD thesis, Trinity College Dublin.
[23] Adam Shwartz and Alan Weiss. Large Deviation for Performance Anal-
ysis. Chapman and Hall, 1995.
[24] Hermann Thorisson. Coupling, Stationarity, and Regeneration.
Springer, 2000.
[25] James S. Vandergraft. A fluid flow model of networks of queues. Man-
agement Science, 29(10), 1983.
[26] Ronald W. Wolff. Stochastic Modeling and the Theory of Queues.
Prentice-Hall, 1989.