
Large Deviations of Generalised Jackson Networks

submitted by Diplom Wirtschaftsmathematikerin
Silke Meiner
from Berlin

to Faculty II - Mathematics and Natural Sciences
of the Technische Universität Berlin
for the award of the academic degree
Doktorin der Naturwissenschaften
Dr. rer. nat.

approved dissertation

Doctoral committee:
Chair: Prof. Dr. Reinhard Nabben
Reviewer: Prof. Dr. Jean-Dominique Deuschel
Reviewer: Prof. Adam Shwartz, Ph.D.

Date of the scientific defence: 3 September 2008

Berlin, 2008

D 83

This research has been supported in part by the Minerva Foundation of the Max Planck Society and the Berliner Programm zur Förderung der Chancengleichheit von Frauen in Forschung und Lehre.

Abstract

In this thesis we develop the local large deviations of generalised Jackson networks. In contrast to the Jackson network, inter arrival and service times follow general distributions and are not restricted to exponential distributions. The resulting stochastic processes are not Markovian, which is a challenge for the available mathematical techniques.

In the first part of the thesis we investigate to what extent and by what means the lost Markov property can be compensated for. The generalised processes we consider are renewal processes. We succeed in modifying the processes with which we will describe the generalised Jackson network such that they have independent stationary increments and, in the sense of large deviations, cannot be distinguished from the original processes. Further, we develop an exponential change of measure for the renewal processes such that the renewal property is preserved. The resulting change of measure for the network process changes only the rates of the network, not its fundamental properties.

As a result we obtain a local large deviation principle with a rate function that is almost the Fenchel Legendre transform of the logarithmic moment generating function Ψ of the free process associated with the generalised Jackson network:

\[ L(x, v) = \sup_{\alpha \in B_{K(x,v)}} \langle \alpha, v \rangle - \Psi(\alpha) \tag{1} \]

The local rate function L(·,·) differs from a Fenchel Legendre transform by the restriction to elements of $B_{K(x,v)}$. This set describes the different behaviours of the network process depending on the current state of the network, represented by x, and on its future course, represented by v. If a future evolution of the network in direction v is a rare event and α is the optimiser in (1), then under the change of measure with parameter α the evolution in direction v becomes the expected behaviour of the network.

Contents

1 Introduction
1.1 Queues, networks, and rare events
1.2 Large deviations of Jackson and generalised Jackson networks

2 Inter event times
2.1 Inter event time
2.2 The logarithmic moment generating function
2.3 Exponential twist of inter event times
2.4 Hazard
2.4.1 Introduction of the hazard function
2.4.2 The hazard function and the domain of lmgf
2.4.3 Examples
2.5 Coupling
2.6 Fenchel-Legendre transforms
2.7 Extreme twists
2.8 Generalisations

3 The counting process
3.1 Introducing the counting process
3.1.1 Joint distributions
3.2 Stationary increments
3.3 Lmgf for the undelayed rcp
3.4 Exponential equivalence for cps
3.4.1 Initial inter event time
3.4.2 Independence of increments
3.4.3 Interpolation
3.5 Conclusions from previous proofs
3.5.1 Finite dimensional large deviations
3.5.2 Continuous paths
3.6 Change of measure
3.6.1 Martingale property
3.6.2 The twisted distribution

4 Large deviations of the ren. counting process
4.1 The space
4.1.1 A base of the topology
4.2 Local large deviations
4.2.1 Local large deviations upper bound
4.2.2 Local large deviations lower bound
4.2.3 Generalisation
4.2.4 Piecewise linear functions
4.2.5 Towards linear geodesics
4.2.6 A limit
4.3 The LDP in sample space
4.3.1 The weak large deviation principle
4.3.2 The full large deviation principle
4.3.3 Interpretation
4.4 Split counting processes
4.4.1 Construction of the split process
4.4.2 Change of measure for the split process
4.4.3 Sample path LDP for the split process

5 Stochastic networks and associated processes
5.1 Stochastic networks
5.2 Deterministic descriptions of stochastic networks
5.2.1 Fluid network
5.2.2 Subnetworks
5.3 Processes
5.3.1 The free process
5.3.2 The network process
5.3.3 The local process

6 Local large deviations of the generalised Jackson network
6.1 Local large deviations upper bound
6.1.1 Leaving a boundary
6.2 Existence and uniqueness of an optimiser
6.3 Network drift under the changed measure
6.4 Local large deviations lower bound
6.5 Rate function identification
6.6 Calculating the local rate function
6.6.1 Interpretation and possible improvement
6.6.2 Example

7 Appendix
7.1 Functional strong law of large numbers
7.1.1 Implication for the counting process
7.2 Implications from exponential equivalence
7.3 Fenchel-Legendre transforms
7.4 The shifted inter event time
7.5 Large deviations and other tools
7.6 Assumptions

Notation

|| · ||    for $f \in \mathbb{R}^{[0,T]}$: $\|f\| = \sup_{t \in [0,T]} |f(t)|$; for $f \in (\mathbb{R}^d)^{[0,T]}$: $\|f\| = \max_{i=1,\dots,d} \sup_{t \in [0,T]} |f_i(t)|$
$\|\cdot\|_{[a,b]}$    $\|f\|_{[a,b]} = \|f \, 1_{[a,b]}\|$, $[a,b] \subseteq [0,T]$
AC, AC([0, T], R^d)    $\{ f \in (\mathbb{R}^d)^{[0,T]} \mid f \text{ is absolutely continuous} \}$
$B_\Lambda$, $B_K$    set of restrictions, claims 6.1.1, 6.1.4
C([0, T], R^d)    $\{ f \in (\mathbb{R}^d)^{[0,T]} \mid f \text{ is continuous at each } t \in [0,T] \}$
D([0, T], R^d)    $\{ f \in (\mathbb{R}^d)^{[0,T]} \mid$ each $f_i$ is right-continuous at each $t \in [0,T)$ and has left limits at each $t \in (0,T] \}$
d    number of nodes of a network / graph
$E^{(\alpha)}$, $E^{[\alpha]}$    expectation wrt a twisted distribution; (α): inter event time twisted with parameter α; [α]: counting process twisted with parameter α, section 3.6
$F_\beta$    exponential transform / twist, def 2.3.1
$F^{+a}$    distribution function of τ − a conditional on τ > a, def 2.1.7
$g_i(\cdot)$    restriction, above claim 6.3.2
$g_\gamma(\cdot)$    def 6.2.2
K(·)    lmgf of splitting probabilities, def 4.4.4
K(·)    set of nodes / indices, claim 6.1.1
L(X)    law / distribution of a random variable X
L(α), L(α, k)    a level set, the level set of a function identified through k
Λ(·)    lmgf of an inter event time, def 2.2.1
Λ(·)    set of nodes / indices, proof of 5.2.15, claim 6.1.1
L(·,·)    local rate function
$\lambda \in \mathbb{R}^d$, $\lambda_i$    vector of arrival rates, arrival rate at node i
λM    vector of arrival rates to nodes i ∈ M, claim 5.2.10
λΛ ∈ $\mathbb{R}^{|\Lambda^c|}$    arrival rates to $\Lambda^c$-nodes in the subnetwork of $\Lambda^c$-nodes with Λ-nodes free, proof of 5.2.15
µ, $\mu_i$    (vector of) service rate(s), same indexing as with λ
N    counting process, def 3.1.1
Nσ, N˜τ    cp with indicated initial inter event time, def 3.1.5
N    counting process as a member of a coupling
N    interpolated counting process
Nre,s, Nre,(s1,...,sk)    restarted counting process, s (or s1,...,sk) indicating the time(s) of restarting, defs 3.4.11, 3.4.16
Nsp    a split counting process, def 4.4.1
P, p(i)    routing matrix, row of the routing matrix P
$\pi_k$    projection onto span $\{e_k\}$
$\pi_M$, M a set    projection onto span $\{e_k \mid k \in M\}$
Ψ(·)    lmgf of the free process, def 5.3.9
$\mathbb{R}_{>0}$, $\mathbb{R}_{\ge 0}$    $\mathbb{R} \cap (0,\infty)$, $\mathbb{R} \cap [0,\infty)$
R(i)    runtime process of the server at node i
R(·,·)    operator to describe / define a runtime process, def 5.3.21
T, T(i)    linear transformation, def 4.4.7
$A^\top$    transpose of a matrix A
T    often a time-index $\in \mathbb{R}_{>0}$

Chapter 1

Introduction

We develop local large deviations for the generalised Jackson network. We work with a continuous time model and with light tailed distributions for inter arrival and service times. Using the classical large deviation theory of logarithmic moment generating functions and exponential changes of measure, we obtain a local rate function that is almost a Fenchel Legendre transform of the logarithmic moment generating function Ψ of the free process:

\[ L(x, v) = \sup_{\alpha \in B_{K(x,v)}} \langle \alpha, v \rangle - \Psi(\alpha) \tag{1.1} \]

What keeps the local rate function from being a full Fenchel Legendre transform is the set of restrictions $B_{K(x,v)}$, which reflects the nodes that are non-empty when the state of the network is x and the nodes that fill up when the network evolves in direction v.

The way we develop the local large deviations will make it relatively easy to obtain a weak and a full large deviation principle for the generalised Jackson network. We also give a representation of the almost Fenchel Legendre transform as a Fenchel Legendre transform in lower dimension.

The approach to apply classical large deviation theory to stochastic networks

is inspired by “Large Deviations of Jackson Networks” of Irina Ignatiouk-

Robert published in 2000 [12].

1.1 Queues, networks, and rare events

A queueing network is a collection of interconnected service stations that customers arrive at, travel through and leave. At each service station a customer occupies the server in order to receive service. Whenever a customer arriving at a server finds it busy serving another customer, the arriving customer will queue.

We like to think about networks in a stochastic way: the time a customer

occupies a server may vary from customer to customer and from server to

server; the travelling through the network may be along different possible

routes and a customer leaving a node may choose between different nodes to

go to next. We perceive these decisions as random.

Assuming that the network has enough resources to finish servicing customers at each station in a reasonable time, we are interested in the rare events of large queue sizes at some nodes and in the probabilities of different evolutions of large queue sizes over time.

In a network with sufficient resources large queue sizes will occur only rarely

- but they will. A manager of a network will have to find the balance between

increasing resources and the tolerance to rarely occurring large queue sizes

and when necessary come up with actions to reduce large queue sizes quickly.

The present thesis gives a guideline of how often to expect any kind of large

queue sizes for given resources in a network under its regular operating con-

ditions.

Before turning to the main object of interest of this thesis which are net-

works of queues, let us look at a single network node in isolation. The

simplest setting is the single server queue with Poisson arrivals and exponen-

tial service times, the so called M/M/1 queue. A generalised M/M/1 would

be denoted a GI/GI/1 queue where the stream of arriving customers is a

renewal counting process and service times are independent with a general

distribution (M is for Markovian and the Poisson process is Markovian, GI

is for General Independent).

To get a first impression of typical results, consider a queue that can generally serve arriving customers without the queue size becoming too large. Provided that the queue has been running for a long time, the probability for an M/M/1 queue to be of size $x \in \mathbb{N}$ or larger is

\[ P(Q \ge x) = \rho^x \tag{1.2} \]

where ρ < 1 is the traffic intensity, the ratio of arrival rate and service rate. For the GI/GI/1 queue we can make the following approximation for the probability of an unusually large size of at least nx: for $x \in \mathbb{R}$

\[ \lim_{n \to \infty} \frac{1}{n} \log P(Q \ge nx) = -\delta x \tag{1.3} \]

for some δ > 0 (for example [6] in 1994). This can be equivalently expressed as: for any small ǫ > 0 and n large enough

\[ e^{-n(\delta x + \epsilon)} \le P(Q \ge nx) \le e^{-n(\delta x - \epsilon)} \tag{1.4} \]
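For the M/M/1 queue the limit in (1.3) can be checked directly: since P(Q ≥ x) = ρ^x by (1.2), the scaled log-probability equals x log ρ for every n, which gives the decay rate δ = −log ρ. The following Python sketch illustrates this; the function names and the value ρ = 0.5 are illustrative choices, not part of the thesis.

```python
import math

def mm1_tail(rho: float, x: int) -> float:
    """Steady state tail P(Q >= x) = rho**x of an M/M/1 queue, as in (1.2)."""
    if not 0.0 < rho < 1.0:
        raise ValueError("traffic intensity must satisfy 0 < rho < 1")
    return rho ** x

def scaled_log_tail(rho: float, x: int, n: int) -> float:
    """(1/n) * log P(Q >= n*x), the quantity whose limit appears in (1.3)."""
    # Work with logarithms directly so that large n does not underflow.
    return (n * x * math.log(rho)) / n

rho = 0.5                 # illustrative traffic intensity
delta = -math.log(rho)    # decay rate of the M/M/1 tail
for n in (1, 10, 1000):
    print(n, scaled_log_tail(rho, 3, n), -delta * 3)
```

In this special case the limit is attained for every n; for a general GI/GI/1 queue only the asymptotic statement (1.3) is available.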

The approach via steady state probabilities works well if the network has been running for a long time under stable conditions and is usually followed in situations where we have no information other than that. If however we are able to monitor the system and observe its state $x_0$ at time t = 0, say, then we might want to know how the queue size evolves from now on. In a stable system we will most likely observe the queue size decreasing, but other evolutions may occur too, even if rarely. In our approximative setting we do not observe the queue size exactly but, for some n, the smoothed version $Q^n$ as a function of time: $Q^n(t) = \frac{1}{n} Q_{nt}$. We are then interested in the following probability

\[ P(\|Q^n - \psi\| < \epsilon \mid Q^n(0) = x_0) \tag{1.5} \]

with || · || denoting the supremum norm over the interval of interest.

This and more general questions can be answered with a large deviation result by Anatolii Puhalskii [15] from 1995: under some technical conditions, for a set of functions A

\[ -\inf_{\phi \in A^\circ} I(\phi) \le \liminf_{n \to \infty} \frac{1}{n} \log P(Q^n \in A \mid Q^n(0) = x_0) \]
\[ \limsup_{n \to \infty} \frac{1}{n} \log P(Q^n \in A \mid Q^n(0) = x_0) \le -\inf_{\phi \in \overline{A}} I(\phi) \tag{1.6} \]

Here the so called rate function I(φ) at φ is either infinite or can be calculated from an explicitly known local rate function L by

\[ I(\phi) = \int_{s=0}^{t_1} L(\phi(s), \phi'(s)) \, ds. \tag{1.7} \]

The rate function I(·) allows us to generalise the fixed rate δ of (1.3) that bounds the steady state probabilities: we get a lower rate $\inf_{\phi \in A^\circ} I(\phi)$ and an upper rate $\inf_{\phi \in \overline{A}} I(\phi)$ that naturally depend on the event A in which we are interested.

In particular the choice $A = \{\phi \mid \|\phi - \psi\| < \epsilon\}$ is feasible and we can approximate (1.5) by applying (1.6). We rephrase (1.6) in terms of small ǫ > 0 and large n:

\[ e^{-n(\inf_{\phi \in A^\circ} I(\phi) + \epsilon)} \le P(\|Q^n - \psi\| < \epsilon \mid Q^n(0) = x_0) \le e^{-n(\inf_{\phi \in \overline{A}} I(\phi) - \epsilon)} \tag{1.8} \]

So far we have stated results for the single queue only. We now proceed to

networks of queues and state similar results. We are interested in approxi-

mating probabilities for specified rarely occurring behaviour of queue sizes,

now jointly for the queues at each node of the network. We start with steady

state probabilities for queue sizes at a fixed time and then move to probabil-

ities over intervals of time and with fixed starting positions.

Consider a network of d M/M/1 queues and now let $Q \in \mathbb{N}^d$ be the vector of queue sizes at some fixed time, assuming that the system has been running for a long time. For d = 1 we are back with the isolated queue.

This network of M/M/1 queues, the so called Jackson network, was introduced by James R. Jackson in 1957 [14] to model machine shops with goods to be processed travelling through the network of machines. While a machine is busy processing, newly arriving goods have to wait to be processed later and in the meantime queue. The Jackson network has also been applied to documents travelling through a network of offices of a business or administration where they are processed by clerks [25]; Q is then the height of the stack of documents piling up on each desk. More recently Jackson networks have been applied in the analysis and design of computer networks, where data packets are routed through a local area network or some larger network like the internet. Q is then the number of packets in the buffer of each routing device.


Figure 1.1: A network with d= 4 nodes

As an example consider the network of d = 4 nodes in figure 1.1. Let

• $\lambda \in \mathbb{R}^4$ be non-negative with coordinates $\lambda_1, \lambda_2 > 0$ for the rates at which customers arrive at nodes 1 and 2. There are no customers arriving at nodes 3 and 4, so $\lambda_3 = \lambda_4 = 0$;

• $\mu \in \mathbb{R}^4$ be positive with $\mu_i$ the rate at which a server releases customers while busy;

• $P \in \mathbb{R}^{4 \times 4}$ be a routing matrix: the i-th row of P gives the probabilities of which node to go to next and of leaving the network. In the example network, leaving the network is possible only after finishing service at nodes 3 and 4.

The result of Jackson published in [14] is the following: if there is a unique ν solving the traffic equation

\[ \nu = \lambda + P^\top \nu \tag{1.9} \]

and $\nu_i < \mu_i$ for all nodes i = 1,...,4, then the steady state probability for the queue sizes to be of size $x \in \mathbb{N}^4$ is

\[ P(Q = x) = \prod_{i=1}^{4} \Big(1 - \frac{\nu_i}{\mu_i}\Big) \Big(\frac{\nu_i}{\mu_i}\Big)^{x_i}. \tag{1.10} \]

Note that $\nu_i < \mu_i$ makes precise what sufficient resources (all $\mu_i$ large enough) means: the network can generally serve all customers in a reasonable time.
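The traffic equation (1.9) and the product form (1.10) can be evaluated numerically. The following Python sketch solves (1.9) by fixed-point iteration and then evaluates (1.10). The rates and routing probabilities are hypothetical values chosen for illustration (they are not taken from figure 1.1), and the function names are ours.

```python
def solve_traffic(lam, P, iterations=10_000, tol=1e-13):
    """Solve nu = lam + P^T nu by fixed-point iteration.

    The iteration converges when every customer eventually leaves
    the network (spectral radius of P below 1).
    """
    d = len(lam)
    nu = list(lam)
    for _ in range(iterations):
        new = [lam[j] + sum(P[i][j] * nu[i] for i in range(d)) for j in range(d)]
        if max(abs(a - b) for a, b in zip(new, nu)) < tol:
            return new
        nu = new
    raise RuntimeError("traffic equation iteration did not converge")

def product_form(x, nu, mu):
    """P(Q = x) = prod_i (1 - nu_i/mu_i) * (nu_i/mu_i)**x_i, as in (1.10)."""
    prob = 1.0
    for xi, ni, mi in zip(x, nu, mu):
        if ni >= mi:
            raise ValueError("need nu_i < mu_i at every node")
        rho = ni / mi
        prob *= (1.0 - rho) * rho ** xi
    return prob

# Hypothetical 4-node example: arrivals at nodes 1 and 2 only,
# departures from the network only after nodes 3 and 4.
lam = [1.0, 2.0, 0.0, 0.0]
mu = [4.0, 4.0, 4.0, 4.0]
P = [
    [0.0, 0.0, 0.5, 0.5],   # node 1 -> node 3 or node 4
    [0.0, 0.0, 0.5, 0.5],   # node 2 -> node 3 or node 4
    [0.0, 0.0, 0.0, 0.0],   # node 3 -> leaves the network
    [0.0, 0.0, 0.0, 0.0],   # node 4 -> leaves the network
]
nu = solve_traffic(lam, P)
print(nu)                                   # effective arrival rates per node
print(product_form([0, 0, 0, 0], nu, mu))   # probability of an empty network
```

Fixed-point iteration is used here only to keep the sketch free of external libraries; any linear solver applied to (I − Pᵀ)ν = λ would serve equally well.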

Applications of Jackson's result are manifold: it helps us to decide about the design of a network system where the probability of queue sizes exceeding some $x_{\max}$ has to be below some fixed level. Similarly, in a network with given service resources we can now decide how to invest additional resources to get a maximum reduction of the probability of exceeding $x_{\max}$. To a network service provider, probabilities of large queue sizes and the empirical occurrence of large queue sizes are a measure of the quality of the provided service, and presumably of customer satisfaction.

In computer networks conditions will vary throughout the day and a network

service provider will observe and thus fix the present state of the network

and ask about probabilities for future evolution starting from the present

state. For this purpose steady state probabilities are not enough and we

turn to sample path large deviations. As before the large deviations concern the scaled, smoothed process $Q^n$ that for t in some interval is defined as $Q^n(t) = \frac{1}{n} Q(nt) \in \mathbb{R}^d$.

1.2 Large deviations of Jackson and generalised Jackson networks

Large deviations for a wide class of Markovian models including the Jackson network have in principle been obtained by Paul Dupuis and Richard S. Ellis in 1995 [7]. The analysis of stochastic networks has been perceived as difficult due to the discontinuity of their behaviour as queue sizes change from empty to full and back.

A result of Irina Ignatiouk-Robert of 2000 [12] gives the explicit form of the rate function for the Jackson network. We can make the same kind of bounds as in (1.6) and (1.8), now with A a set of d-dimensional functions and the rate function I(·) again in integral form over a local rate function L(·,·):

\[ I(\phi) = \int_{s=0}^{T} L(\phi(s), \phi'(s)) \, ds \]
\[ L(x, v) = l_{\Lambda(x)}(v) = \sup_{\alpha \in B_{\Lambda(x)}} \langle \alpha, v \rangle - R(\alpha) \tag{1.11} \]

for some explicitly known R and set of restrictions $B_{\Lambda(x)}$. It is quite remarkable that, given the complexity of a network, the result comes in such a handy format.

The next step is of course to generalise the sample path large deviations

from networks of M/M/1 queues to networks of GI/GI/1 queues. This is

what we do in this thesis.

During the work on this thesis, in 2007, Anatolii Puhalskii [16] has proved

existence of a large deviation principle for the generalised Jackson network.

He gives a rate function in integral form with the local rate function a high-

dimensional convex optimisation problem. Complementing this we give the

explicit representation of the local rate function.

Generally our approach is very different from that of Puhalskii. It is closer

to ideas found in the work of Ignatiouk-Robert [12] and highlights the classi-

cal large deviation theory of finding the exponential change of measure that

turns the deviating behaviour into regular behaviour.

As a generalised Jackson network is not a Markov process we cannot profit

from the rich theory developed around them in the last 100 years and we

have to develop our own tools. The thesis is structured as follows:


• Chapter 2 is on inter event times. Inter event times in a queueing network are times between arrivals of customers or the service times of customers at a node. We introduce inter event times and make assumptions on their distributions. Further, we introduce the hazard rate function. From inter event times we build

• renewal counting processes in chapter 3. In the network setting an arrival counting process will count the number of arrivals at a node over an interval of time. We generally work with non-Markovian counting processes and we prove in this section that some implications of the Markov property can still be obtained for these non-Markovian counting processes. This will be done through exponential equivalence.
We then introduce a change of measure for the renewal counting process such that under the changed measure the process stays a renewal counting process. Then of course we need to know about the

• large deviations of the renewal counting process: we develop them in chapter 4. We start with local large deviations, applying the change of measure developed in the previous chapter. Building on the local large deviations we prove weak large deviations and strengthen these to a full large deviation principle.
The large deviations of renewal counting processes are on the one hand required to develop the local large deviations of the generalised Jackson network. On the other hand we think that, in a similar and relatively easy way, this allows us to strengthen the local large deviations of the generalised Jackson network to a full large deviation principle for the generalised Jackson network.

• Chapter 5 introduces stochastic networks and stochastic processes that describe (aspects of) such networks: the free, the network, and the local process.
We first define drifts for networks based on deterministic rates for the network starting empty and the network starting with some nodes initially non-empty, and we give formulations of network drifts in terms of the solution to the linear complementarity problem and the Skorohod problem. When introducing the stochastic processes we show that these initially defined drifts are drifts of these processes, and thus describe the regular behaviour of these processes.
We then investigate rare events and develop sample path large deviations for the free process.

• In chapter 6 we prove the local large deviations for the generalised Jackson network. We follow a classical approach here that uses an exponential change of measure (developed over chapters 3, 4, 5) and gives the upper bound as the almost Fenchel Legendre transform that will be the local rate function. The optimiser of the almost Fenchel Legendre transform corresponds to a change of measure which is then applied to obtain the lower bound.
We close with identifying the rate function of Puhalskii with our local large deviation rate function and with an example of how to calculate the local rate function.

There are also further applications of these sample path large deviations via the contraction principle, which allow approximating probabilities for events that depend continuously on the queue sizes Q; see [9], [23]. These applications require an analytical form of the rate function.

As Jackson networks are Markov processes and generalised Jackson networks are not, our technique has to be fundamentally different from that in [7] and [12]. In terms of results the difference between (1.1) and (1.11) is small:

• Ψ versus R: both are logarithmic moment generating functions of the associated free process, and Ψ = R if the generalised Jackson network is a Jackson network.

• optimising over $B_{\Lambda(x)}$ versus $B_{K(x,v)}$: for an absolutely continuous φ we have Λ(φ(t)) = K(φ(t), φ′(t)) for almost all t.

We cite related work we are aware of at the beginning of chapters and give

reference to alternative proofs in the main text.

Chapter 2

Inter event times

In a stochastic network the times between arrivals of customers at a node, as well as the time a customer occupies the server at a node, are random.

In this thesis we generalise the sample path large deviation principle for

Jackson networks [12]: In a Jackson network times between arrivals of cus-

tomers at a node, as well as the time a customer occupies the server at a

node are iid exponentially distributed. We generalise this to arbitrary light

tailed distributions. Independence assumptions of the Jackson network are

not challenged. We start - in this chapter - with investigating general inter

event times and comparing them to exponential ones: How they are different

and what angle can be chosen to highlight similarities.

We will see that there are basically two classes of inter event times: those that stay relatively small (we call them LD-bounded) and those that may be large. The exponential distribution produces inter event times that are not LD-bounded, and here we will apply general properties of inter event times that are not LD-bounded to make up for the lost Markov property.

2.1 Inter event time

In a GI/GI/1 queue times between consecutive arrivals are iid and so are the

required lengths of service for each customer. We will refer to times between

consecutive arrivals and to the lengths of service as inter event times.

Definition 2.1.1. An inter event time is a non-negative random variable.

Inter event times are often denoted by τ and variations of it. The distribution function of the inter event time τ will be denoted F. Throughout this thesis we follow the convention that for a distribution function F we write $F^c = 1 - F$.

Definition 2.1.2 (∼-transform of F). For a distribution function F of an inter event time with finite mean define the distribution function $\tilde{F}$ as

\[ \tilde{F}(x) := \int_{s=0}^{x} \frac{F^c(s)}{\int_{t=0}^{\infty} F^c(t) \, dt} \, ds \]

Note that $\tilde{F}$ has density

\[ \tilde{f}(x) = \frac{F^c(x)}{\int_{s=0}^{\infty} F^c(s) \, ds} = \frac{F^c(x)}{E[\tau]} \tag{2.1} \]

and is the distribution function of an inter event time. The inter event time with distribution function $\tilde{F}$ will be denoted $\tilde{\tau}$. We see how the moments of τ and $\tilde{\tau}$ relate:

Claim 2.1.3. Let τ be an inter event time and $\tilde{\tau}$ associated with it. If $E[\tau^{n+1}] < \infty$ then $E[\tilde{\tau}^n] < \infty$ and $E[\tilde{\tau}^n] = \frac{E[\tau^{n+1}]}{(n+1) E[\tau]}$.

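Proof of 2.1.3:

\begin{align*}
E[\tau^{n+1}] &= \int_{x=0}^{\infty} x^{n+1} \, dF(x) \\
&= \lim_{z \to \infty} \Big( z^{n+1} F(z) - \int_{x=0}^{z} F(x-) \, dx^{n+1} \Big) \\
&= \lim_{z \to \infty} \Big( (n+1) \int_{x=0}^{z} x^n \, dx \, F(z) - (n+1) \int_{x=0}^{z} F(x) \, x^n \, dx \Big) \\
&= \lim_{z \to \infty} (n+1) \int_{x=0}^{z} (F(z) - F(x)) \, x^n \, dx \\
&= (n+1) \int_{x=0}^{\infty} (1 - F(x)) \, x^n \, dx \\
&= (n+1) E[\tau] \int_{x=0}^{\infty} \frac{F^c(x)}{E[\tau]} \, x^n \, dx \\
&= (n+1) E[\tau] \, E[\tilde{\tau}^n]
\end{align*}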



2.1.3
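Claim 2.1.3 can be checked exactly on a simple case. The sketch below takes τ uniform on [0, 1] (an illustrative choice, not an example from the thesis), for which $E[\tau^k] = 1/(k+1)$ and, by (2.1), $\tilde{\tau}$ has density 2(1 − x) on [0, 1]. Rational arithmetic avoids rounding; the function names are ours.

```python
from fractions import Fraction

def moment_uniform(k: int) -> Fraction:
    """E[tau**k] for tau uniform on [0, 1]."""
    return Fraction(1, k + 1)

def moment_tilde_uniform(n: int) -> Fraction:
    """E[tilde_tau**n]: integral of x**n * 2*(1 - x) over [0, 1]."""
    return 2 * (Fraction(1, n + 1) - Fraction(1, n + 2))

def claim_rhs(n: int) -> Fraction:
    """Right-hand side of claim 2.1.3: E[tau**(n+1)] / ((n+1) * E[tau])."""
    return moment_uniform(n + 1) / ((n + 1) * moment_uniform(1))

for n in range(1, 6):
    # Both sides agree exactly for every n tried.
    assert moment_tilde_uniform(n) == claim_rhs(n)
    print(n, moment_tilde_uniform(n))
```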

We construct another inter event time from τ:

Definition 2.1.4 ($\tau^\circ$ associated with τ, G). Let $\tau, \tau_1, \tau_2, \dots$ be iid with distribution function F, p ∈ (0, 1) fixed and G the geometrically distributed random variable with mass function $P(G = g) = p^{g-1}(1 - p)$ for g = 1, 2, .... Let $G, \tau_1, \tau_2, \dots$ be independent. Define $\tau^\circ$ as

\[ \tau^\circ = \sum_{k=1}^{G} \tau_k. \]

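Claim 2.1.5. If τ is an inter event time and $\tau^\circ$ is associated with τ and G with parameter p, then the means relate as: $E[\tau^\circ] = \frac{1}{1-p} E[\tau]$.

Proof of 2.1.5: From independence of $\{G, \tau_1, \tau_2, \dots\}$

\[ E[\tau^\circ] = E\Big[\sum_{k=1}^{G} \tau_k\Big] = \sum_{g=1}^{\infty} \underbrace{E\Big[\sum_{k=1}^{g} \tau_k\Big]}_{= g E[\tau]} P(G = g) = E[G \, E[\tau]] = E[G] \, E[\tau] = \frac{1}{1-p} E[\tau] \]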



2.1.5
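A small simulation illustrates claim 2.1.5. The sketch below draws $\tau^\circ$ as a geometric sum of exponential inter event times and compares the empirical mean with $E[\tau]/(1-p)$. The exponential distribution, the parameter values and all names are illustrative assumptions.

```python
import random

def sample_tau_circ(rng, p, sample_tau):
    """Draw tau_circ = tau_1 + ... + tau_G with P(G = g) = p**(g-1) * (1 - p)."""
    total = sample_tau(rng)
    # After each summand, continue with probability p; this yields the geometric G.
    while rng.random() < p:
        total += sample_tau(rng)
    return total

def estimate_mean(p, sample_tau, runs=200_000, seed=0):
    """Monte Carlo estimate of E[tau_circ] with a fixed seed for reproducibility."""
    rng = random.Random(seed)
    return sum(sample_tau_circ(rng, p, sample_tau) for _ in range(runs)) / runs

mu, p = 2.0, 0.25                                  # illustrative parameters
estimate = estimate_mean(p, lambda rng: rng.expovariate(mu))
exact = (1.0 / (1.0 - p)) * (1.0 / mu)             # claim 2.1.5 with E[tau] = 1/mu
print(estimate, exact)
```

The loop that continues with probability p mirrors the network interpretation in which a customer rejoins the queue just left with probability p.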

Generally, $\tau^\circ$ is not qualitatively different from τ, and whenever we work with τ it might be of the form $\tau^\circ$.

Remark 2.1.6. We will later see that $\tilde{\tau}$ as the time to the first event makes the renewal counting process have stationary increments, and that $\tau^\circ$ is the inter event time at a service node in a network that allows the leaving customer to immediately rejoin the queue just left.

A few times in this thesis we will need the distribution of an inter event time τ conditional on τ > a for some positive a.

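Definition 2.1.7. For a distribution function F of an inter event time and $a \in \mathbb{R}_{\ge 0}$ such that F(a) ∈ [0, 1) define

\[ F^{+a} : [0, \infty) \to [0, \infty), \quad x \mapsto \frac{F(x + a) - F(a)}{F^c(a)} \]

Claim 2.1.8. If τ has distribution function F and $a \in \mathbb{R}$ is such that F(a) ∈ [0, 1), then $F^{+a}$ is the distribution function of τ − a conditional on τ > a.

Proof of 2.1.8:

\begin{align*}
P(\tau - a > x \mid \tau > a) &= \frac{P(\tau - a > x, \, \tau > a)}{P(\tau > a)} = \frac{P(\tau > x + a)}{P(\tau > a)} = \frac{F^c(x + a)}{F^c(a)} \\
P(\tau - a \le x \mid \tau > a) &= 1 - \frac{F^c(x + a)}{F^c(a)} = \frac{F^c(a) - F^c(x + a)}{F^c(a)} = \frac{F(x + a) - F(a)}{F^c(a)}
\end{align*}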



2.1.8
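The shifted distribution function of definition 2.1.7 is easy to compute. The sketch below implements it for an arbitrary distribution function and applies it to two illustrative cases: the exponential distribution, where the shift changes nothing (memorylessness), and the uniform distribution on [0, 1]. The helper names are ours.

```python
import math

def residual_cdf(F, a):
    """Return x -> (F(x + a) - F(a)) / (1 - F(a)), as in definition 2.1.7."""
    Fa = F(a)
    if not 0.0 <= Fa < 1.0:
        raise ValueError("need F(a) in [0, 1)")
    return lambda x: (F(x + a) - Fa) / (1.0 - Fa)

def exp_cdf(mu):
    """Distribution function of an exponential inter event time with rate mu."""
    return lambda x: 1.0 - math.exp(-mu * x) if x >= 0 else 0.0

def uniform_cdf(x):
    """Distribution function of an inter event time uniform on [0, 1]."""
    return min(max(x, 0.0), 1.0)

F = exp_cdf(2.0)
F_shift = residual_cdf(F, 0.7)
print(F(0.3), F_shift(0.3))   # agree up to rounding: the exponential is memoryless

U_shift = residual_cdf(uniform_cdf, 0.5)
print(U_shift(0.25))          # the residual of Uniform[0, 1] past 0.5 is Uniform[0, 0.5]
```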

2.2 The logarithmic moment generating function

The logarithmic moment generating function is an essential tool in classical large deviation theory.

Definition 2.2.1 (Λ, $\mathcal{D}(\Lambda)$). For an inter event time τ the logarithmic moment generating function Λ is defined as

\[ \Lambda : \mathbb{R} \to \mathbb{R} \cup \{\infty\}, \quad \theta \mapsto \log E[e^{\theta \tau}] \]

and abbreviated as lmgf. The domain of Λ is defined as

\[ \mathcal{D}(\Lambda) := \{\theta \in \mathbb{R} \mid \Lambda(\theta) < \infty\}. \]

For a distribution function F of an inter event time we may also say that Λ is the lmgf of F; similarly for a density f of an inter event time.

Throughout this thesis we make the following

Assumption 2.2.2. $\mathcal{D}(\Lambda)$ is open and $\inf_{\theta \in \mathbb{R}} \Lambda(\theta) = -\infty$.
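For orientation, the exponential distribution with rate µ satisfies assumption 2.2.2: its lmgf is log(µ/(µ − θ)) for θ < µ and infinite otherwise, so the domain is the open interval (−∞, µ) and the lmgf tends to −∞ as θ → −∞. The Python sketch below evaluates it; the function name and the value µ = 2 are illustrative.

```python
import math

def lmgf_exponential(theta, mu):
    """log E[exp(theta * tau)] for tau ~ Exp(mu); +inf outside the domain."""
    if theta >= mu:
        return math.inf
    return math.log(mu / (mu - theta))

mu = 2.0
print(lmgf_exponential(0.0, mu))      # 0.0, as for every lmgf
print(lmgf_exponential(1.999, mu))    # large but finite: the domain is (-inf, mu)
print(lmgf_exponential(2.0, mu))      # inf: the boundary point mu is excluded
print(lmgf_exponential(-1e9, mu))     # very negative: the infimum is -inf
```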

The following two claims help to interpret the assumption:

Claim 2.2.3. τ has point mass at 0 iff its lmgf Λ is bounded from below.

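Proof of 2.2.3. We first establish log F(0) as a lower bound of Λ. Let θ < 0 and ǫ > 0.

\begin{align*}
E[e^{\theta \tau}] &= E[e^{\theta \tau} 1_{[0,\epsilon]}(\tau) + e^{\theta \tau} 1_{(\epsilon,\infty)}(\tau)] \ge e^{\theta \epsilon} F(\epsilon) \\
E[e^{\theta \tau}] &\ge \lim_{\epsilon \searrow 0} e^{\theta \epsilon} F(\epsilon) = F(0) \quad \text{(F right-continuous)}
\end{align*}

If F(0) = 0 we have found only a trivial lower bound for Λ and we have to show that there is no other, non-trivial lower bound. Let ǫ > 0 again and first choose a such that $0 < a \le F^{-1}(\frac{\epsilon}{2})$ and then θ such that $\theta < \frac{1}{a} \log \frac{\epsilon}{2} < 0$. With these

\[ E[e^{\theta \tau}] \le e^{\theta a} F^c(a) + F(a) \le \epsilon. \]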



2.2.3
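Claim 2.2.3 can be illustrated with a mixture: let τ equal 0 with probability q and be exponential with rate µ otherwise. Then $E[e^{\theta\tau}] = q + (1-q)\,\mu/(\mu-\theta)$ for θ < µ, which decreases to q as θ → −∞, so the lmgf is bounded below by log q exactly when q > 0. The example distribution and names in this Python sketch are illustrative assumptions.

```python
import math

def lmgf_mixture(theta, q, mu):
    """lmgf of tau that is 0 with probability q and Exp(mu) otherwise (theta < mu)."""
    return math.log(q + (1.0 - q) * mu / (mu - theta))

q, mu = 0.2, 1.0
for theta in (-1.0, -1e3, -1e9):
    # The values decrease towards the lower bound log(q) = log F(0).
    print(theta, lmgf_mixture(theta, q, mu), math.log(q))

# With q = 0 there is no mass at 0 and the lmgf is unbounded from below.
print(lmgf_mixture(-1e9, 0.0, mu))
```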

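Note that by definition of Λ we have Λ(0) = 0 and by the assumed (in 2.2.2) openness of $\mathcal{D}(\Lambda)$ there is θ > 0 such that Λ(θ) < ∞.

Claim 2.2.4. If Λ(θ) < ∞ for some θ > 0 then all moments of τ exist.

Proof of 2.2.4: For θ > 0 and $x \in \mathbb{R}_{\ge 0}$ the map $(l, x) \mapsto (\theta x)^l$ is non-negative and we are allowed to interchange summation and integration as an application of the Tonelli theorem (cf [21] theorem 20 of chapter 12, p. 270).

\begin{align*}
E[e^{\theta \tau}] &= \int_{x=0}^{\infty} e^{\theta x} \, dF(x) = \int_{x=0}^{\infty} \sum_{l=0}^{\infty} \frac{(\theta x)^l}{l!} \, dF(x) = \sum_{l=0}^{\infty} \frac{\theta^l}{l!} \int_{x=0}^{\infty} x^l \, dF(x) \\
&= \sum_{l=0}^{\infty} \frac{\theta^l}{l!} E[\tau^l] \tag{2.2}
\end{align*}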



2.2.4

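We get $\Lambda \in C^\infty(\mathcal{D}(\Lambda)^\circ)$ from $e^{\Lambda(\theta)}$ being a power series. Since $e^{\Lambda}$ is a power series we are also allowed to interchange differentiation and summation in the following

\[ \frac{d}{d\theta} E[e^{\theta \tau}] = \sum_{l=0}^{\infty} \frac{d}{d\theta} \frac{\theta^l}{l!} E[\tau^l] = \sum_{l=0}^{\infty} \frac{\theta^l}{l!} E[\tau^{l+1}] = E\Big[\tau \sum_{l=0}^{\infty} \frac{\theta^l}{l!} \tau^l\Big] = E[\tau e^{\theta \tau}] \]

Openness of the domain allows this easy differentiation for all $\theta \in \mathcal{D}(\Lambda)$.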

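Claim 2.2.5. Λ is convex and strictly increasing. If τ is not deterministic we have strict convexity of Λ on its domain.

Proof of 2.2.5: Convexity follows from the Hölder inequality. By differentiating in the open domain

\[ \frac{d}{d\theta} \Lambda(\theta) = E\Big[\tau \frac{e^{\theta \tau}}{e^{\Lambda(\theta)}}\Big] > 0, \qquad \frac{d^2}{d\theta^2} \Lambda(\theta) = E\Big[\tau^2 \frac{e^{\theta \tau}}{e^{\Lambda(\theta)}}\Big] - E\Big[\tau \frac{e^{\theta \tau}}{e^{\Lambda(\theta)}}\Big]^2 \]

While $\frac{d}{d\theta} \Lambda(\theta) > 0$ follows from P(τ > 0) > 0 as implied by assumption 2.2.2, we have $\frac{d^2}{d\theta^2} \Lambda(\theta) > 0$ only for P(τ = x) < 1 for all $x \in \mathbb{R}_{\ge 0}$.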

2.2.5

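Remark 2.2.6. Note that

\[ \frac{e^{\theta\tau}}{e^{\Lambda(\theta)}} > 0, \qquad E\Big[\frac{e^{\theta\tau}}{e^{\Lambda(\theta)}}\Big] = 1 \]

and we can write $\Lambda'(\theta)$ and $\Lambda''(\theta)$ as expectation and variance of $\tau$ under a changed measure.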

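Openness of the domain $\mathcal{D}(\Lambda)$ also implies that $\Lambda$ is not bounded from above. So 2.2.2 implies $\Lambda(\mathcal{D}(\Lambda)) = \mathbb{R}$ and the following is a feasible definition.

Definition 2.2.7 ($\Gamma$ associated with $\Lambda$). Let $\Lambda$ be the lmgf of an inter event time $\tau$ for which assumption 2.2.2 holds. Then define $\Gamma$ associated with $\Lambda$ as

\[ \Gamma : \mathbb{R} \to \mathbb{R}, \quad \theta \mapsto -\Lambda^{-1}(-\theta) \]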

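Claim 2.2.8. $\mathcal{D}(\Gamma) = \mathbb{R}$, $\Gamma$ is strictly increasing, and (if $\tau$ is not deterministic) strictly convex on $\mathbb{R}$.

Proof of 2.2.8: Finiteness of $\Gamma$ on all of $\mathbb{R}$ is immediate from the definition. Strict convexity of $\Lambda$ was argued for in 2.2.5 and implies strict convexity for $\Gamma$:

\[ \frac{d}{d\theta}\Gamma(\theta) = -\frac{d}{d\theta}\Lambda^{-1}(-\theta) = \frac{1}{\Lambda'(-\Gamma(\theta))} > 0 \]

\[ \frac{d^2}{d\theta^2}\Gamma(\theta) = \frac{d}{d\theta}\frac{1}{\Lambda'(-\Gamma(\theta))} = \frac{\Lambda''(-\Gamma(\theta))}{\Lambda'(-\Gamma(\theta))^3} > 0 \]

□ 2.2.8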

So properties of Λ translate into properties of Γ. For example

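• $\lim_{\theta\to-\infty}\Lambda'(\theta) = 0 \Leftrightarrow \lim_{\theta\to\infty}\Gamma'(\theta) = \infty$

• $\mathcal{D}(\Lambda) = (-\infty, L_C(h))$ is equivalent to $\Gamma(\mathbb{R}) = (-L_C(h), \infty)$.

• $\Gamma'(\theta) = \dfrac{1}{E_{(-\Gamma(\theta))}[\tau]}$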
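For an exponential inter event time the translation between $\Lambda$ and $\Gamma$ can be made fully explicit. The following is a minimal sketch, assuming $\tau$ is exponential with a hypothetical rate `MU = 2`; all function names are illustrative and not part of the thesis. In this special case $\Lambda(\theta) = \log\frac{\mu}{\mu-\theta}$, so $\Gamma(\theta) = \mu(e^\theta - 1)$, and the sketch compares a numerical derivative of $\Gamma$ with the reciprocal of the twisted mean.

```python
import math

MU = 2.0  # hypothetical rate of an exponential inter event time (assumption)


def lmgf(theta, mu=MU):
    """Lambda(theta) = log E[exp(theta * tau)] for tau ~ Exp(mu); finite for theta < mu."""
    if theta >= mu:
        return math.inf
    return math.log(mu / (mu - theta))


def lmgf_inverse(y, mu=MU):
    """Solve Lambda(gamma) = y for gamma, which gives gamma = mu * (1 - exp(-y))."""
    return mu * (1.0 - math.exp(-y))


def gamma_fn(theta, mu=MU):
    """Gamma(theta) = -Lambda^{-1}(-theta) as in Definition 2.2.7."""
    return -lmgf_inverse(-theta, mu)


def twisted_mean(beta, mu=MU):
    """Lambda'(beta) = 1 / (mu - beta), the mean of tau under the twist beta < mu."""
    return 1.0 / (mu - beta)


def gamma_prime_numeric(theta, mu=MU, step=1e-6):
    """Central difference approximation of Gamma'(theta)."""
    return (gamma_fn(theta + step, mu) - gamma_fn(theta - step, mu)) / (2.0 * step)
```

With these definitions `gamma_prime_numeric(theta)` and `1 / twisted_mean(-gamma_fn(theta))` should agree up to discretisation error, which is the third bullet above specialised to the exponential case.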

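We will now investigate properties of the lmgf of inter event times $\tilde\tau$ and $\tau^\circ$.

Definition 2.2.9. The lmgf of inter event time $\tilde\tau$ with distribution function $\tilde F$ is denoted $\tilde\Lambda$:

\[ \tilde\Lambda(\theta) = \log E[e^{\theta\tilde\tau}] = \log \int_{t=0}^{\infty} e^{\theta t}\,d\tilde F(t) \]

The lmgf of inter event time $\tau^\circ$ is denoted $\Lambda^\circ$:

\[ \Lambda^\circ(\theta) = \log E[e^{\theta\tau^\circ}] \]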

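Claim 2.2.10. $\mathcal{D}(\Lambda) = \mathcal{D}(\tilde\Lambda)$ and

\[ \tilde\Lambda(\theta) = \begin{cases} \log\dfrac{e^{\Lambda(\theta)} - 1}{E[\tau]\,\theta}, & \theta \ne 0 \\ 0, & \theta = 0. \end{cases} \qquad (2.3) \]

Proof of 2.2.10: If for $\tau$ all moments exist the same holds for $\tilde\tau$ (cf claim 2.1.3). And we can write the lmgf of $\tilde\tau$ in terms of the lmgf of $\tau$. While $\log E[e^{0\tilde\tau}] = 0$, for $\theta \ne 0$ we get

\[
\begin{aligned}
E[e^{\theta\tilde\tau}] &= \sum_{l=0}^{\infty} \frac{\theta^l}{l!} E[\tilde\tau^l]
 = \sum_{l=0}^{\infty} \theta^l \underbrace{\frac{1}{l!}\frac{1}{l+1}}_{=\frac{1}{(l+1)!}} \frac{E[\tau^{l+1}]}{E[\tau]} \\
&= \frac{1}{E[\tau]\,\theta} \sum_{l=1}^{\infty} \frac{\theta^l E[\tau^l]}{l!}
 = \frac{1}{E[\tau]\,\theta} \Big( \sum_{l=0}^{\infty} \frac{\theta^l E[\tau^l]}{l!} - 1 \Big)
 = \frac{1}{E[\tau]\,\theta} \big( E[e^{\theta\tau}] - 1 \big)
\end{aligned}
\]

□ 2.2.10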

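We see that $\tilde\tau$ falls under assumption 2.2.2. $\tilde\Lambda \in C^\infty(\mathcal{D}(\tilde\Lambda)^\circ) = C^\infty(\mathcal{D}(\tilde\Lambda))$ and we need not worry about continuity of $\tilde\Lambda$ in $\theta = 0$.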

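Claim 2.2.11. If $\tau^\circ$ is associated with $\tau$ and geometric $G$ of parameter $p \in (0,1)$ then

\[ \Lambda^\circ(\theta) = \Lambda(\theta) + \log\frac{1-p}{1 - p\,e^{\Lambda(\theta)}}, \qquad \mathcal{D}(\Lambda^\circ) = \big(-\infty, \Lambda^{-1}(-\log p)\big) \]

Proof of 2.2.11:

\[ E[e^{\alpha G}] = \frac{1-p}{1 - p\,e^{\alpha}}\,e^{\alpha} \]

is finite only for $\alpha < \log\frac{1}{p}$. For $\theta < \Lambda^{-1}(-\log p)$

\[
\begin{aligned}
E[e^{\theta\tau^\circ}] &= E\big[e^{\theta\sum_{k=1}^{G}\tau_k}\big] = E\big[E[e^{\theta\sum_{k=1}^{G}\tau_k} \mid G]\big] \\
&= E\big[E[e^{\theta\tau_1} \mid G]^G\big] = E\big[E[e^{\theta\tau_1}]^G\big] \\
&= E\big[(e^{\Lambda(\theta)})^G\big] = E\big[e^{\Lambda(\theta)G}\big]
\end{aligned}
\]

is finite and

\[ \log E[e^{\theta\tau^\circ}] = \log E[e^{\Lambda(\theta)G}] = \log\Big(\frac{1-p}{1 - p\,e^{\alpha}}\,e^{\alpha}\Big)\Big|_{\alpha = \Lambda(\theta)} \]

as claimed. Openness of the domain of $\Lambda^\circ$ follows from openness of the domain of the lmgf of $G$.

□ 2.2.11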
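The formula of claim 2.2.11 can be checked against a case where the answer is known in closed form. The sketch below assumes $\tau$ is exponential with rate `mu` and uses the standard fact that a geometric sum (with $P(G = k) = (1-p)p^{k-1}$) of iid $\mathrm{Exp}(\mu)$ variables is again exponential, with rate $\mu(1-p)$. The function names and parameter values are hypothetical and chosen only for illustration.

```python
import math


def lmgf_exp(theta, mu):
    """Lambda(theta) for an Exp(mu) inter event time; requires theta < mu."""
    return math.log(mu / (mu - theta))


def lmgf_geometric_sum(theta, mu, p):
    """Lambda°(theta) = Lambda(theta) + log((1 - p) / (1 - p * exp(Lambda(theta))))."""
    lam = lmgf_exp(theta, mu)
    return lam + math.log((1.0 - p) / (1.0 - p * math.exp(lam)))


def domain_bound(mu, p):
    """Right end of D(Lambda°): Lambda^{-1}(-log p), which equals mu * (1 - p) for Exp(mu)."""
    return mu * (1.0 - p)
```

For any `theta < domain_bound(mu, p)` the value `lmgf_geometric_sum(theta, mu, p)` should coincide with `lmgf_exp(theta, mu * (1 - p))`, in line with the claimed domain $(-\infty, \Lambda^{-1}(-\log p))$.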

Corollary 2.2.12. If for $\tau$ assumption 2.2.2 holds then this assumption holds for $\tau^\circ$ of definition 2.1.4, too.

Proof of 2.2.12: Openness of the domain of $\Lambda^\circ$ comes from 2.2.11. Unboundedness from below is immediate from the explicit form of $\Lambda^\circ$:

\[ \lim_{\theta\to-\infty}\Lambda^\circ(\theta) = \lim_{\theta\to-\infty}\Lambda(\theta) + \log\lim_{\theta\to-\infty}\frac{1-p}{1 - p\,e^{\Lambda(\theta)}} = -\infty + \log(1-p) = -\infty \]

□ 2.2.12

Throughout this thesis we make the following

Assumption 2.2.13. $\tau$ has a density $f$ and $\limsup_{a\to\infty} E[\tau - a \mid \tau > a] < \infty$.

This assumption is mainly for technical convenience and to ease proofs. Existence of the density of course implies the unboundedness from below of $\Lambda$ of assumption 2.2.2 - only $P(\tau = 0) = 0$ was required for this. The boundedness of the conditional expectation is a property of many inter event times. Also, as an interpretation note that $\lim_{a\to\infty} E[\tau - a \mid \tau > a] \; (= \lim_{a\to\infty}\int_{x=0}^{\infty} F^c_{+a}(x)\,dx) = \infty$ is a harsh case of "used better than new" - excluding it should do no harm.

2.3 Exponential twist of inter event times

Definition 2.3.1 (Twisted distribution, exponential transform of $F$). Let $F$ be the distribution function of an inter event time with lmgf $\Lambda$. Let $\beta \in \mathcal{D}(\Lambda)$, then the twisted distribution $F_\beta$ is defined as

\[ F_\beta(x) := \int_{s=0}^{x} e^{\beta s - \Lambda(\beta)}\,dF(s) \]

and $\beta$ will be called the twist parameter.

We can say that $F_\beta$ has density $s \mapsto e^{\beta s - \Lambda(\beta)}$ wrt $F$ and that if $F$ has density $f$ (wrt Lebesgue measure) then $F_\beta$ has density (wrt Lebesgue measure)

\[ f_\beta(x) := f(x)\,e^{\beta x - \Lambda(\beta)}. \qquad (2.4) \]

The exponential transform of an exponential distribution is again an exponential distribution; the parameter changes as $\mu \mapsto \mu - \beta$ for $\beta$ the parameter in the exponential transform. In this section we describe properties of exponential transforms applied to general inter event times.

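Claim 2.3.2. If $\tau$ falls under assumption 2.2.2 and $\tau_\beta$ is associated with $\tau$ through the exponential transform with parameter $\beta \in \mathcal{D}(\Lambda)$ then $\tau_\beta$ falls under assumption 2.2.2, too.

Proof of 2.3.2: We calculate $\Lambda_\beta$, the lmgf of $\tau_\beta$ with distribution function $F_\beta$.

\[
\begin{aligned}
E[e^{\theta\tau_\beta}] &= \int_{s=0}^{\infty} e^{\theta s}\,dF_\beta(s) = \int_{s=0}^{\infty} e^{\theta s}\,e^{\beta s - \Lambda(\beta)}\,dF(s) \\
&= \int_{s=0}^{\infty} e^{(\theta+\beta)s}\,dF(s)\;e^{-\Lambda(\beta)} = E[e^{(\theta+\beta)\tau}]\,\frac{1}{E[e^{\beta\tau}]}
\end{aligned}
\]

\[ \Lambda_\beta(\theta) = \log E_{(\beta)}[e^{\theta\tau}] = \Lambda(\beta + \theta) - \Lambda(\beta) \qquad (2.5) \]

from which we read off $\mathcal{D}(\Lambda_\beta) = \mathcal{D}(\Lambda) - \beta$. So $\beta \in \mathcal{D}(\Lambda)$ is equivalent to $0 \in \mathcal{D}(\Lambda_\beta)$ and generally openness of the domain is not changed by its translation. Unboundedness from below for $\Lambda_\beta$ is immediate.

□ 2.3.2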
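The twist formulas (2.4) and (2.5) can be illustrated on the exponential distribution, where the twisted law is known to be exponential with rate $\mu - \beta$. The following is a small sketch under that assumption; the helper names are hypothetical and the parameter values in the usage note are arbitrary.

```python
import math


def density_exp(x, mu):
    """Density of Exp(mu) at x >= 0."""
    return mu * math.exp(-mu * x)


def lmgf_exp(theta, mu):
    """Lambda(theta) for Exp(mu); requires theta < mu."""
    return math.log(mu / (mu - theta))


def twisted_density(x, mu, beta):
    """f_beta(x) = f(x) * exp(beta * x - Lambda(beta)), cf. (2.4); requires beta < mu."""
    return density_exp(x, mu) * math.exp(beta * x - lmgf_exp(beta, mu))


def twisted_lmgf(theta, mu, beta):
    """Lambda_beta(theta) = Lambda(beta + theta) - Lambda(beta), cf. (2.5)."""
    return lmgf_exp(beta + theta, mu) - lmgf_exp(beta, mu)
```

For example, with `mu = 2.0` and `beta = 0.5` the function `twisted_density(x, 2.0, 0.5)` should match `density_exp(x, 1.5)`, and twisting the result once more with `-0.5` should return the original `Exp(2.0)` density.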

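Lemma 2.3.3. Exponential transforms can be inverted: If $\beta \in \mathcal{D}(\Lambda)$ then

\[ (F_\beta)_{-\beta} = F, \qquad (f_\beta)_{-\beta} = f \]

Proof of 2.3.3: The second twist with parameter $-\beta$ has to be relative to $f_\beta$. We write the twists explicitly.

\[
\begin{aligned}
f_\beta(x) &= f(x)\,e^{\beta x - \Lambda(\beta)} \\
(f_\beta)_{-\beta}(x) &= f_\beta(x)\,e^{-\beta x - \Lambda_\beta(-\beta)} \\
e^{\Lambda_\beta(\theta)} &= E_{(\beta)}[e^{\theta\tau}] = \int_{x=0}^{\infty} e^{\theta x} f_\beta(x)\,dx \\
&= \int_{x=0}^{\infty} e^{\theta x} f(x)\,e^{\beta x - \Lambda(\beta)}\,dx = \int_{x=0}^{\infty} e^{(\theta+\beta)x} f(x)\,e^{-\Lambda(\beta)}\,dx \\
e^{\Lambda_\beta(-\beta)} &= \int_{x=0}^{\infty} f(x)\,e^{-\Lambda(\beta)}\,dx = e^{-\Lambda(\beta)} \\
(f_\beta)_{-\beta}(x) &= f(x)\,\underbrace{e^{(\beta-\beta)x}}_{=1}\,\underbrace{e^{-\Lambda(\beta)+\Lambda(\beta)}}_{=1} = f(x)
\end{aligned}
\]

We check that everything is well defined.

\[ -\beta \in \mathcal{D}(\Lambda_\beta) = \mathcal{D}(\Lambda) - \beta \;\Leftrightarrow\; 0 \in \mathcal{D}(\Lambda) \]

Since we calculated $\Lambda_\beta(-\beta) = -\Lambda(\beta)$ the lhs of this expression is finite whenever $\beta \in \mathcal{D}(\Lambda)$, a condition we started with. So all's well.

□ 2.3.3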

We close the section with some remarks

Remark 2.3.4.

• The $\sim$-transform and the exponential transform of $F$ do not generally commute: $(\tilde F)_\beta \ne \widetilde{(F_\beta)}$. Writing down $(\tilde f)_\beta$ and $\widetilde{(f_\beta)}$, equality requires that $x \mapsto e^{\beta x} F^c_\beta(x)$ is constant. This holds for the exponential but not for the uniform distribution.

• Denote expectation and variance wrt the exponentially transformed distributions of 2.3.1 with parameter $\theta$ by indexing with $(\theta)$. Then derivatives of $\Lambda$ can be written as $\Lambda'(\theta) = E_{(\theta)}[\tau]$ and $\Lambda''(\theta) = V_{(\theta)}[\tau]$ (cf 2.2.5, 2.2.6).

2.4 Hazard

We introduce the hazard rate and make some mild technical assumptions on

it. We’ll see how the hazard rate relates to the domain of the lmgf, especially

in terms of boundedness. Also, we give examples for the hazard rate - for

well known distributions and when constructing inter event times from the

hazard rate.

We also investigate how hazard rates change under exponentially twisting

the distribution, and how hazard rates of τand ˜τrelate.

We will again see an analogy to the exponential distribution: When de-

fined the right way the mean rate of the hazard function equals the bound

of the domain.

2.4.1 Introduction of the hazard function

For inter event time $\tau$ with distribution function $F$ define $H := -\log F^c$ on the support of $\tau$. Assuming absolute continuity of $F$, also $H$ has a derivative almost everywhere and

\[ h(x) = \frac{d}{dx}H(x) = \frac{\frac{d}{dx}F(x)}{F^c(x)} = \frac{f}{F^c}(x) \]

where it exists.

Interpreting $h$ in terms of probability:

\[ h(x) = \lim_{\epsilon\to 0} \frac{1}{\epsilon}\,\frac{P(\tau \in (x, x+\epsilon])}{P(\tau > x)} = \lim_{\epsilon\to 0} \frac{1}{\epsilon}\,P(\tau \in (x, x+\epsilon] \mid \tau > x) \]

which is the infinitesimal probability for $\tau$ to take a value in the infinitesimal interval $(x, x+dx]$ given that $\tau > x$. $h$ is called the hazard function of $\tau$.

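Definition 2.4.1 (Hazard function, $L_C(h)$). For an inter event time $\tau$ with density $f$ the hazard function $h$ is

\[ h : [0,\infty) \to [0,\infty], \quad x \mapsto h(x) = \begin{cases} \dfrac{f}{F^c}(x), & F^c(x) > 0 \\ 0, & \text{else} \end{cases} \]

and $L_C(h) := \lim_{x\to\infty} \frac{1}{x}\int_{s=0}^{x} h(s)\,ds$ is the Cesaro mean of $h$.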

In this thesis we always make the following

Assumption 2.4.2. For an inter event time $\tau$ with hazard function $h$ the Cesaro mean $L_C(h)$ exists and $L_C(h) \in (0,\infty]$.

Note that given a distribution function $F$ the density $f$ and the hazard function $h$ are not uniquely defined. However, we have

\[ F^c(x) = \exp\Big\{-\int_{s=0}^{x} h(s)\,ds\Big\} \quad \text{for all } x \]

\[ -\frac{d}{dx}\log F^c(x) = h(x) \quad \text{for almost all } x \in \operatorname{supp}(f). \]

Thus we see that given a hazard function we can construct the distribution function. But, what is a hazard function in general?

Claim 2.4.3. Let $h : [0,\infty) \to [0,\infty]$ be measurable and $H = \int h$. If $\lim_{\epsilon\to 0} H(\epsilon) = 0$ and $\lim_{x\to\infty} H(x) = \infty$ then $F = 1 - e^{-H}$ is a distribution function with a density. On the support of $F$ the hazard function is almost everywhere equal to $h$.

Proof of 2.4.3: We prove that $F := 1 - e^{-H}$ is a distribution function.

• $F$ is non-negative and continuous since $H$ is.

• $\lim_{x\to\infty} F(x) = 1 - e^{-\lim_{x\to\infty} H(x)} = 1 - 0 = 1$

• $F$ is increasing since $h \ge 0$ and $H$ is increasing.

This distribution has a density iff $F$ is an absolutely continuous function. $H$ is absolutely continuous by definition. And $g(z) = 1 - e^{-z}$ is Lipschitz continuous with $g'(z) = e^{-z} \le 1$ for $z \ge 0$. Since $H \ge 0$ we get absolute continuity for $F = g \circ H$. Let $x$ be such that $F(x) < 1$ ($\Leftrightarrow H(x) < \infty$):

\[ \frac{d}{dx}F(x) = \frac{d}{dx}\big(1 - e^{-H(x)}\big) = -e^{-H(x)}(-h(x)) = h(x)\exp\Big\{-\int_{s=0}^{x} h(s)\,ds\Big\} \]

(with $H' = h$ almost everywhere where $H$ is finite, from absolute continuity) is the density of $F$.

How does the support of $F$ relate to $H$? If $H(x) = \infty$ for some finite $x$, then $H(z) = \infty$ and $F^c(z) = 0$ for all $z > x$. The hazard of $F$ is thus defined to be $0$ on $[x,\infty)$, though $h$ (from the claim) may take any value there.

Now on the support of $F$: if $F(x) < 1$ then $H(x) < \infty$ and applying the density $f$ as calculated, $\frac{f}{F^c} = \frac{h\,e^{-H}}{F^c} = h$.

□ 2.4.3
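The construction in claim 2.4.3 can be tried out numerically: given a hazard function, integrate it and exponentiate. The sketch below assumes a hazard that is finite and continuous on the interval of interest and uses a plain trapezoidal rule; the function names and the step count are hypothetical choices, not part of the thesis.

```python
import math


def integrated_hazard(h, x, steps=20000):
    """Trapezoidal approximation of H(x) = int_0^x h(s) ds for a finite hazard h."""
    if x == 0:
        return 0.0
    dx = x / steps
    total = 0.5 * (h(0.0) + h(x))
    for i in range(1, steps):
        total += h(i * dx)
    return total * dx


def survival(h, x):
    """F^c(x) = exp(-H(x))."""
    return math.exp(-integrated_hazard(h, x))


def density(h, x):
    """f(x) = h(x) * exp(-H(x)), the density obtained in the proof of claim 2.4.3."""
    return h(x) * survival(h, x)
```

For the linear hazard `h(s) = s` the result can be compared with the closed form $F^c(x) = \exp\{-x^2/2\}$, and for `h(s) = 1 + sin(s)` with $\exp\{-x - 1 + \cos x\}$.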

Had we assumed continuity of hall of the above would have been imme-

diate.

How does assumption 2.2.2 translate into $H$? We will discuss this in the examples section below.

For a bounded inter event time $\tau \le b$ it is necessary (since $F(b) = 1$) that $\lim_{t\to b} H(t) = \lim_{t\to b}\int_{s=0}^{t} h(s)\,ds = \infty$ and $\limsup_{t\to b} h(t) = \infty$. This agrees with the following interpretation for the hazard: If an event has to happen before $b$ then the force for it to happen increases without bound as time approaches $b$.

2.4.2 The hazard function and the domain of lmgf

For an exponential distribution with density $f(x) = \mu e^{-\mu x}$ the parameter $\mu$ is the boundary for the domain of its logarithmic moment generating function, $\mathcal{D}(\Lambda) = (-\infty, \mu)$, and at the same time the constant hazard rate $h \equiv \mu$. We generalise this.

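Claim 2.4.4. If the Cesaro mean of the hazard rate diverges to $\infty$ the lmgf has an unbounded domain: $L_C(h) = \infty \Rightarrow \mathcal{D}(\Lambda) = \mathbb{R}$.

Proof of 2.4.4: Let $M > \theta$ and $x_0$ be large enough for $\frac{H(x)}{x} > M$ for all $x \ge x_0$.

\[
\begin{aligned}
E[e^{\theta\tilde\tau}] &= \frac{1}{E[\tau]}\Big( \int_{x=0}^{x_0} e^{\theta x}F^c(x)\,dx + \int_{x=x_0}^{\infty} \underbrace{e^{\theta x}F^c(x)}_{= e^{x(\theta - \frac{H(x)}{x})}}\,dx \Big) \\
&\le \frac{1}{E[\tau]}\Big( \int_{x=0}^{x_0} e^{\theta x}F^c(x)\,dx + \int_{x=x_0}^{\infty} e^{x(\theta - M)}\,dx \Big) < \infty
\end{aligned}
\]

By 2.2.10 this is equivalent to unboundedness of $\mathcal{D}(\Lambda)$.

□ 2.4.4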

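Claim 2.4.5. If the Cesaro limit $L_C(h)$ exists in $(0,\infty)$ then the domain of the lmgf of $\tau$ is bounded and $\mathcal{D}(\Lambda) = (-\infty, L_C(h))$.

Proof of 2.4.5: For $\theta < L_C(h)$ set $\epsilon := \frac{L_C(h) - \theta}{2} > 0$. Let $x_0$ be large enough for

\[ \frac{H(x)}{x} > L_C(h) - \epsilon \quad \forall x \ge x_0. \]

Then

\[ \theta x - H(x) = x\Big(\theta - \frac{H(x)}{x}\Big) \le x\big(\theta - (L_C(h) - \epsilon)\big) \le x(-2\epsilon + \epsilon) = -\epsilon x \]

and

\[ \int_{x=x_0}^{\infty} e^{\theta x}F^c(x)\,dx \le \int_{x=x_0}^{\infty} e^{-\epsilon x}\,dx \xrightarrow{x_0\to\infty} 0 \]

and by 2.2.10 also $\theta \in \mathcal{D}(\Lambda)$.

Now let $\theta > L_C(h)$ and $\epsilon = \frac{\theta - L_C(h)}{2} > 0$ and $x_0$ large enough for $\frac{H(x)}{x} < L_C(h) + \epsilon$ for all $x \ge x_0$. Then

\[ \theta x - H(x) = x\Big(\theta - \frac{H(x)}{x}\Big) > x\big(\theta - L_C(h) - \epsilon\big) = \epsilon x \]

and $\int_{x=x_0}^{\infty} e^{\theta x}F^c(x)\,dx = \infty$ and $E[e^{\theta\tilde\tau}] = \infty$, implying $\theta \notin \mathcal{D}(\tilde\Lambda)$, and $L_C(h)$ thus has to be on the boundary of $\mathcal{D}(\tilde\Lambda)$. Since $\mathcal{D}(\Lambda) = \mathcal{D}(\tilde\Lambda)$ the claimed statement is justified.

□ 2.4.5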

Corollary 2.4.6. From 2.4.4 and 2.4.5 we can completely generalise the property of the exponential distribution: If $L_C(h)$ exists in $(0,\infty]$ then $\mathcal{D}(\Lambda) = (-\infty, L_C(h))$.

Similarly for the exponential distribution $L_C(h) = \mu$ is the exact exponential decay rate of the tail of the distribution function $t \mapsto e^{-\mu t}$. We generalise this expression to the more general $F$ we allow for inter event time distributions:

Remark 2.4.7. For $L_C(h) = \infty$ we say the tails of $F$ decay superexponentially and for $L_C(h) < \infty$ we say that $F^c$ decays exponentially with rate $L_C(h)$.

Corollary 2.4.8.

• $L_C(h) = L_C(\tilde h)$ due to $(-\infty, L_C(h)) \overset{2.4.6}{=} \mathcal{D}(\Lambda) \overset{2.2.10}{=} \mathcal{D}(\tilde\Lambda) \overset{2.4.6}{=} (-\infty, L_C(\tilde h))$.

• $L_C(h_\beta) = L_C(h) - \beta$ due to $(-\infty, L_C(h)) \overset{2.4.6}{=} \mathcal{D}(\Lambda)$ and $\mathcal{D}(\Lambda_\beta) \overset{(2.5)}{=} \mathcal{D}(\Lambda) - \beta$.

In 2.2.2 we have assumed openness of the domain. The following is a

sufficient condition for openness of D(Λ).

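Lemma 2.4.9. If $L_C(h) < \infty$ and $\limsup_{x\to\infty} H(x) - L_C(h)\,x < \infty$ then the domain of $\Lambda$ is open as assumed in 2.2.2.

Proof of 2.4.9: If $\limsup_{x\to\infty} H(x) - x\,L_C(h) < \infty$ then $\liminf_{x\to\infty} e^{x L_C(h) - H(x)} > 0$ and

\[ E[e^{L_C(h)\tilde\tau}] = \frac{1}{E[\tau]}\int_{x=0}^{\infty} e^{L_C(h)x}F^c(x)\,dx = \frac{1}{E[\tau]}\int_{x=0}^{\infty} e^{L_C(h)x - H(x)}\,dx = \infty \]

Then $\mathcal{D}(\tilde\Lambda)$ does not contain its right boundary $L_C(\tilde h) = L_C(h)$ and since $\mathcal{D}(\tilde\Lambda) = \mathcal{D}(\Lambda)$ neither does $\mathcal{D}(\Lambda)$.

□ 2.4.9

The equivalent condition for an open domain is obvious from the proof of 2.4.9:

\[ E[e^{L_C(h)\tilde\tau}] < \infty \;\Leftrightarrow\; e^{L_C(h)x - H(x)} \to 0 \text{ integrably fast.} \]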

2.4.3 Examples

We give several examples of hazard rates with different properties in bound-

edness and monotonicity. We discuss assumptions 2.2.2 and 2.4.2.

Example 2.4.10. The exponential distribution is characterised by its constant hazard rate: $f(x) = \mu e^{-\mu x} \Leftrightarrow h(x) = \mu = L_C(h)$.

Example 2.4.11. The Erlang distribution $E_k(\mu)$ where $k \ge 1$ has a hazard rate that is monotonically increasing to $\mu$.

\[ F^c(x) = e^{-\mu x}\sum_{j=0}^{k-1}\frac{(\mu x)^j}{j!}, \qquad H(x) = \mu x - \log\sum_{j=0}^{k-1}\frac{(\mu x)^j}{j!} \]

\[ f(x) = \frac{\mu^k x^{k-1}}{(k-1)!}\,e^{-\mu x}, \qquad h(x) \overset{x>0}{=} \mu\Big(1 + \sum_{j=1}^{k-1}(\mu x)^{-j}\frac{(k-1)!}{(k-1-j)!}\Big)^{-1} \]

\[ L_C(h) = \mu \]

For a hazard function with $h(x) < L_C(h)$ the domain of the lmgf is generally open, as implied by 2.4.9. Erlang distributed inter event times fall under both assumptions.
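The monotone convergence of the Erlang hazard to $\mu$ is easy to inspect numerically. The sketch below computes $h = f / F^c$ directly from the Erlang density and survival function, with the common factor $e^{-\mu x}$ cancelled to avoid floating point underflow for large $x$. The function name and the parameter values in the usage note are hypothetical.

```python
import math


def erlang_hazard(x, k, mu):
    """Hazard h(x) = f(x) / F^c(x) of the Erlang distribution E_k(mu), for x >= 0.

    The factor exp(-mu * x) appears in both f and F^c and is cancelled here.
    """
    numerator = mu ** k * x ** (k - 1) / math.factorial(k - 1)
    denominator = sum((mu * x) ** j / math.factorial(j) for j in range(k))
    return numerator / denominator
```

Evaluating `erlang_hazard(x, 3, 2.0)` on an increasing grid of `x` values should give an increasing sequence that stays below `2.0` and approaches it for large `x`; for `k = 1` the function returns the constant exponential hazard `mu`.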

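Example 2.4.12 (Oscillating hazard, discrete). If $h(x) = \kappa\,\mathbb{1}_{\lfloor x\rfloor\,\text{odd}} + \mu\,\mathbb{1}_{\lfloor x\rfloor\,\text{even}}$ then $L_C(h) = \frac{1}{2}(\mu + \kappa)$.

We have

\[ H(x) = 2\Big\lfloor\frac{x}{2}\Big\rfloor\frac{\kappa+\mu}{2} + \int_{s=2\lfloor\frac{x}{2}\rfloor}^{x} h(s)\,ds \]

\[
\begin{aligned}
L_C(h)\,x - H(x) &= \int_{s=2\lfloor\frac{x}{2}\rfloor}^{x} \Big(\frac{\kappa+\mu}{2} - \kappa\Big)\mathbb{1}_{\lfloor s\rfloor\,\text{odd}} + \Big(\frac{\kappa+\mu}{2} - \mu\Big)\mathbb{1}_{\lfloor s\rfloor\,\text{even}}\,ds \\
&= \int_{s=2\lfloor\frac{x}{2}\rfloor}^{x} -\frac{\kappa-\mu}{2}\,\mathbb{1}_{\lfloor s\rfloor\,\text{odd}} + \frac{\kappa-\mu}{2}\,\mathbb{1}_{\lfloor s\rfloor\,\text{even}}\,ds \\
&\not\to -\infty
\end{aligned}
\]

Again from 2.4.9 we get an open $\mathcal{D}(\Lambda)$.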

Example 2.4.13 (Oscillating hazard, continuous). If $h(x) = 1 + \sin(x)$ then $L_C(h) = 1$ and $F^c(x) = \exp\{-x - 1 + \cos x\}$. More generally for $a, b > 0$ define $h_{a,b}(x) := a + b(1 + \sin(x))$.

Again $L_C(h)\,x - H(x) \not\to -\infty$ and $\exp\{L_C(h)\,x - H(x)\}$ is not integrable. 2.4.9 applies.
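For the two oscillating hazards the integrated hazard $H$ is available in closed form, so the Cesaro mean and the bounded deviation $L_C(h)\,x - H(x)$ can be inspected directly. The following sketch uses hypothetical function names; the values $\kappa = 3$, $\mu = 1$ in the usage note are arbitrary illustrative choices.

```python
import math


def cesaro_mean_estimate(H, x):
    """H(x) / x, which approximates L_C(h) for large x when the Cesaro mean exists."""
    return H(x) / x


def H_sine(x):
    """Integrated hazard of h(x) = 1 + sin(x): H(x) = x + 1 - cos(x)."""
    return x + 1.0 - math.cos(x)


def H_switching(x, kappa, mu):
    """Integrated hazard of h = mu where floor(x) is even and h = kappa where floor(x) is odd."""
    full_pairs = math.floor(x / 2.0)
    rest = x - 2.0 * full_pairs  # position inside the current block of length 2
    return full_pairs * (kappa + mu) + mu * min(rest, 1.0) + kappa * max(rest - 1.0, 0.0)
```

With `kappa = 3.0` and `mu = 1.0` the quantity `2.0 * x - H_switching(x, 3.0, 1.0)` should stay within `[0, 1]` for all `x`, so it does not tend to minus infinity, and `cesaro_mean_estimate(H_sine, x)` should approach `1` as `x` grows.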

Example 2.4.14. If $h(x) = a + \frac{c}{x+b}$ for $a, b > 0$ and $c \in (0,1]$ then $L_C(h) = a$.

We have

\[ L_C(h) = a, \qquad H(x) = ax + c\log\frac{x+b}{b}, \qquad e^{L_C(h)x - H(x)} = \Big(\frac{b}{x+b}\Big)^c \]

and $E[e^{L_C(h)\tilde\tau}] = \infty$ for the parameters given in the example.

$c > 1$ is excluded since otherwise the domain would not be open. Similarly $b > 0$ is necessary since a function $x \mapsto a + \frac{1}{x}$ does not qualify as a hazard function, as $\int_{x=0}^{z} a + \frac{1}{x}\,dx = \infty$ for any $z > 0$ or equivalently $\lim_{\epsilon\to 0} H(\epsilon) = \infty \ne 0$. Also the case of $a = 0$ does not fall under assumption 2.4.2.

While $h(x) = 2 + \cos\log(1+x)$ is a hazard function it does not fall under assumption 2.4.2.

Example 2.4.15 (Exponential hazard). If $h(x) = 1 - e^{-x}$ then $L_C(h) = 1$ and $F^c(x) = \exp\{-x - e^{-x} + 1\}$. More generally for $a > 1$ and $b$ define $h_{a,b}(x) = a + \operatorname{sign}(b)\,e^{bx}$.

For $b < 0$ we have $L_C(h) = a$ and an open domain from 2.4.9. For $b > 0$ we get $L_C(h) = \infty$ and openness is not an issue. Existence of a density is obvious in both cases and so 2.2.2 holds.

Example 2.4.16. The uniformly distributed random variable has

\[ f(x) = \mathbb{1}_{[0,1]}(x), \qquad F(x) = x\,\mathbb{1}_{[0,1]}(x) + \mathbb{1}_{(1,\infty)}(x) \]

\[ h(x) \overset{x\in[0,1]}{=} \frac{1}{1-x} \xrightarrow{(x\to 1)} \infty = L_C(h) \]

Starting from the hazard function:

Example 2.4.17 (Affine hazard). If $h(x) = x$ then $L_C(h) = \infty$ and $F^c(x) = \exp\{-\frac{x^2}{2}\}$. More generally for $a, b > 0$ define $h_{a,b}(x) := ax + b$.

Generalising this to polynomial hazards $h(x) = (ax)^k + b$ with $k \ge 1$ we arrive at Weibull distributions.

2.5 Coupling

This section is on coupling inter event times.

Definition 2.5.1 (Coupled inter event times).Two inter event times are

coupled if they have a joint distribution.

Any two inter event times $\sigma_1, \sigma_2$ have a joint distribution as a tuple of independent random variables. Another possibility to construct a joint distribution while keeping individual (= marginal) distributions of $\sigma_1, \sigma_2$ is the quantile coupling. It is nicely explained in section 3 of chapter 1 in the book of Hermann Thorisson [24].

Definition 2.5.2 (Quantile coupling). If inter event times $\sigma_1, \sigma_2$ have respective distribution functions $F_1, F_2$ and $U$ is uniformly distributed on $(0,1)$ then $(F_1^{-1}(U), F_2^{-1}(U))$ is the quantile coupling of $(\sigma_1, \sigma_2)$.

It is obvious that $(F_1^{-1}(U), F_2^{-1}(U))$ are coupled. Furthermore the association of $(F_1^{-1}(U), F_2^{-1}(U))$ with $(\sigma_1, \sigma_2)$ is in the fact that $\mathcal{L}(F_i^{-1}(U)) = \mathcal{L}(\sigma_i)$ for $i = 1, 2$:

\[ P(F_1^{-1}(U) \le t) = P(U \le F_1(t)) = F_1(t) = P(\sigma_1 \le t). \qquad (2.6) \]
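A quantile coupling is straightforward to simulate: draw one uniform variable and push it through both quantile functions. The sketch below assumes the quantile functions are available in closed form and illustrates this with an exponential and a uniform marginal; the function names, the seed and the sample size are hypothetical choices.

```python
import math
import random


def exp_quantile(u, rate):
    """F^{-1}(u) for the exponential distribution with the given rate, u in [0, 1)."""
    return -math.log(1.0 - u) / rate


def uniform_quantile(u, upper):
    """F^{-1}(u) for the uniform distribution on [0, upper]."""
    return u * upper


def quantile_coupling(inv_cdf_1, inv_cdf_2, size, seed=0):
    """Sample pairs (F_1^{-1}(U), F_2^{-1}(U)) that share the same uniform U."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(size):
        u = rng.random()  # U uniform on [0, 1)
        pairs.append((inv_cdf_1(u), inv_cdf_2(u)))
    return pairs
```

Because both quantile functions are non-decreasing in `u`, the two coordinates of the coupled sample are ordered in the same way, which is the property exploited in the exponential equivalence proofs of this section.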

A joint distribution is required if we want to consider a function of two ran-

dom variables, for example their difference. This is required in the following

definition of exponential equivalence.

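Definition 2.5.3 (Exponential equivalence in $\mathbb{R}$). The sequences of real valued random variables $(Y_n ; n \in \mathbb{N})$ and $(Z_n ; n \in \mathbb{N})$ are exponentially equivalent if for each $n \in \mathbb{N}$ there is a coupling $(\check Y_n, \check Z_n)$ of $(Y_n, Z_n)$ such that the sequence $(|\check Y_n - \check Z_n| ; n \in \mathbb{N})$ decays super exponentially: For any $\delta > 0$

\[ \lim_{n\to\infty} \frac{1}{n}\log P(|\check Y_n - \check Z_n| > \delta) = -\infty. \]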

This definition is a special case of the more general definition 4.2.10 of

[5]. And the importance of this definition is in theorem 4.2.13 of [5] which

tells us that if one of two exponentially equivalent sequences satisfies a large

deviation principle with a good rate function then the other does too, and

with the same rate function.

We introduce a new property for the inter event times.

Definition 2.5.4. An inter event time is LD-bounded if its lmgf is finite on all of $\mathbb{R}$.

An alternative definition is that an inter event time is LD-bounded if its

hazard function satisfies LC(h) = ∞, by 2.4.6.

Thus any bounded inter event time will be LD-bounded. In this section

we connect LD-boundedness to exponential equivalence.

Very generally we can scale a single inter event time τand investigate its

large deviation behaviour. Doing so, any nice non-negative random variable

falls in one of two classes: It’s either exponentially equivalent to 0 or to an

exponentially distributed random variable with the correct parameter.

The correct parameter for the exponential distribution is the Cesaro mean

of the hazard function LC(h) with hthe hazard function of τ.

In the following we prove the claimed exponential equivalence separately

for the LD-bounded and the non-LD bounded inter event times.

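Claim 2.5.5. If the inter event time $\sigma$ is LD-bounded then the sequences $(\frac{\sigma}{n} ; n \in \mathbb{N})$ and $(0 ; n \in \mathbb{N})$ are exponentially equivalent.

As a shorter way to express 2.5.5 we might say that $\sigma$ and $0$ are exponentially equivalent.

Proof of 2.5.5: Let $L_C(h)$ be the Cesaro mean of the hazard rate of $\sigma$. We generally assumed that $L_C(h)$ exists in $(0,\infty]$ (cf. 2.4.2) and for LD-bounded $\sigma$ we have $L_C(h) = \infty$ by 2.4.6. Thus

\[
\begin{aligned}
\frac{1}{n}\log P\Big(\frac{1}{n}|\sigma - 0| > x\Big) &= \frac{1}{n}\log F^c(nx) = \frac{1}{n}\log\exp\Big\{-\int_{s=0}^{nx} h(s)\,ds\Big\} \\
&= -x\,\underbrace{\frac{1}{xn}\int_{s=0}^{nx} h(s)\,ds}_{\to L_C(h) = \infty} \xrightarrow{n\to\infty} -\infty
\end{aligned}
\]

□ 2.5.5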

This immediately implies that any two LD-bounded random variables are

exponentially equivalent.

Claim 2.5.6. If $\sigma$ is an inter event time with distribution function $F$ and $L_C(h) \in (0,\infty)$ and if $X$ is exponentially distributed with parameter $L_C(h)$ then the sequences $(\frac{\sigma}{n} ; n \in \mathbb{N})$ and $(\frac{X}{n} ; n \in \mathbb{N})$ are exponentially equivalent.

Again, as a shorter way to express 2.5.6 we might say that if $\sigma$ and exponential $X$ have the same finite Cesaro mean for their respective hazard functions then they are exponentially equivalent.

Proof of 2.5.6: Let $G$ be the distribution function of exponentially distributed $X$: $G(x) = 1 - e^{-L_C(h)x}$ and assume that $X, \sigma$ are already quantile-coupled: that there is $U$ uniform on $(0,1)$ such that $X = G^{-1}(U)$ and $\sigma = F^{-1}(U)$.

We first observe that a large value for $\sigma$ implies a large value for $X$. Let $0 < s < t$.

\[
\begin{aligned}
P(\sigma > nt,\, X \le ns) &= P(U > F(nt),\, U < G(ns)) \\
&= P(U \in [F(nt), G(ns)]) \\
&= \big(G(ns) - F(nt)\big)\,\mathbb{1}_{F(nt) < G(ns)}
\end{aligned}
\]

Let $\epsilon > 0$ be small enough for $\frac{t}{s} > \frac{L_C(h)}{L_C(h) - \epsilon}$ to hold and $n$ large enough for $\frac{H(nt)}{nt} > L_C(h) - \epsilon$ to hold.

\[
\begin{aligned}
\frac{t}{s} > \frac{L_C(h)}{L_C(h) - \epsilon}
&\Rightarrow nt\,(L_C(h) - \epsilon) > ns\,L_C(h) \\
&\Rightarrow H(nt) > ns\,L_C(h) \\
&\Leftrightarrow e^{-H(nt)} < e^{-ns\,L_C(h)} \\
&\Leftrightarrow F^c(nt) < G^c(ns)
\end{aligned}
\]

Thus for $n$ large enough

\[ P(\sigma > nt,\, X \le ns) = 0 \]

and the same works very similarly for $P(X > nt,\, \sigma < ns)$. Put another way we have for $n$ large enough (depending on $t - s$)

\[ P(\sigma > nt) = P(\sigma > nt,\, X \ge ns) \quad\text{or}\quad P(X \ge ns \mid \sigma > nt) = 1. \qquad (2.7) \]

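We apply (2.7) in the following, which will show the super exponential decay of $\sigma - X$:

\[
\left\{\begin{array}{l} \sigma - X > na \\ \sigma > 0 \\ X > 0 \end{array}\right\}
\Leftrightarrow
\left\{\begin{array}{l} \sigma - X > na \\ \sigma > na \\ X > 0 \end{array}\right\}
\overset{(2.7)}{\Leftrightarrow}
\left\{\begin{array}{l} \sigma - X > na \\ \sigma > na \\ X > n(a-\epsilon) \end{array}\right\}
\]

We see that for two positive random variables to be large and their difference to be large, too, the minuend has to be even larger. With the help of $\sigma - X > na \Rightarrow \sigma > na + X > n(2a - \epsilon)$ the equivalence

\[
\left\{\begin{array}{l} \sigma - X > na \\ \sigma > na \\ X > n(a-\epsilon) \end{array}\right\}
\Leftrightarrow
\left\{\begin{array}{l} \sigma - X > na \\ \sigma > n(2a-\epsilon) \\ X > n(a-\epsilon) \end{array}\right\}
\]

is easy to see. We can iterate and again use that given $\sigma$ is large, $X$ will almost surely be similarly large for $n$ large enough.

\[
\left\{\begin{array}{l} \sigma - X > na \\ \sigma > n(2a-\epsilon) \\ X > n(a-\epsilon) \end{array}\right\}
\Leftrightarrow \cdots \Leftrightarrow
\left\{\begin{array}{l} \sigma - X > na \\ \sigma > n((k+1)a - k\epsilon) \\ X > kn(a-\epsilon) \end{array}\right\}
\]

Thus

\[
\begin{aligned}
P(\sigma - X > na) &= P\big(\sigma - X > na,\; \sigma > n((k+1)a - k\epsilon),\; X > nk(a-\epsilon)\big) \\
&\le P\big(X > nk(a-\epsilon)\big) \\
&= e^{-L_C(h)\,nk(a-\epsilon)} \overset{\epsilon < \frac{a}{2}}{\le} e^{-L_C(h)\,nk\frac{a}{2}}
\end{aligned}
\]

and the last expression has an exponential decay rate in $n$ of $L_C(h)\,k\,\frac{a}{2}$ which can be made arbitrarily large by increasing $k$ (we need our assumption 2.4.2 here: that $L_C(h) > 0$). So the decay is faster than exponential.

Similarly for $X - \sigma$.

□ 2.5.6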

The following is fairly general.

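Claim 2.5.7. If $\sigma_1, \sigma_2$ are inter event times with distribution functions $F_1, F_2$ and integrated hazard functions $H_i = -\log F_i^c$ (for $i = 1, 2$) with a common Cesaro limit $L_C = \lim_{x\to\infty}\frac{H_i(x)}{x}$ (for $i = 1, 2$) then the sequences $(\frac{\sigma_1}{n} ; n \in \mathbb{N})$ and $(\frac{\sigma_2}{n} ; n \in \mathbb{N})$ are exponentially equivalent.

Proof of 2.5.7: For $L_C < \infty$, as in 2.5.6 let $X$ be exponentially distributed with parameter $L_C$ and distribution function $G$ and assume that $\sigma_1, \sigma_2, X$ are quantile coupled, that is $\sigma_i = F_i^{-1}(U)$ (for $i = 1, 2$) and $X = G^{-1}(U)$ for some $U$ uniformly distributed on $(0,1)$. Then

\[
\begin{aligned}
&\limsup_{n\to\infty}\frac{1}{n}\log P(|\sigma_1 - \sigma_2| > n\delta) \\
&\quad\le \limsup_{n\to\infty}\frac{1}{n}\log\Big( P\big(|\sigma_1 - X| > n\tfrac{\delta}{2}\big) + P\big(|\sigma_2 - X| > n\tfrac{\delta}{2}\big) \Big) \\
&\quad= \max\Big\{ \limsup_{n\to\infty}\frac{1}{n}\log P\big(|\sigma_1 - X| > n\tfrac{\delta}{2}\big),\; \limsup_{n\to\infty}\frac{1}{n}\log P\big(|X - \sigma_2| > n\tfrac{\delta}{2}\big) \Big\} \\
&\quad= \max\{-\infty, -\infty\} = -\infty
\end{aligned}
\]

For $L_C = \infty$ we have already argued that exponential equivalence holds. We might as well do the same calculation as above replacing $X$ by $0$.

□ 2.5.7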

Corollary 2.5.8. $\tau$ and $\tilde\tau$ are exponentially equivalent.

So far we have applied the distinction between LD-bounded and not LD-

bounded random variables in the different proofs for exponential equivalence

of inter event times in 2.5.5 and 2.5.6. We now connect them directly:

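Claim 2.5.9. If inter event times $\sigma, \tau$ have hazard functions with the same Cesaro mean then their difference under the quantile coupling is LD-bounded.

Proof of 2.5.9: For $\sigma, \tau$ themselves LD-bounded and $\theta > 0$

\[
\begin{aligned}
E[e^{|\sigma-\tau|(\theta+\epsilon)}] &\le E[e^{\sigma(\theta+\epsilon)}\,e^{\tau(\theta+\epsilon)}] \\
&\le E[e^{\sigma p(\theta+\epsilon)}]^{\frac{1}{p}}\,E[e^{\tau q(\theta+\epsilon)}]^{\frac{1}{q}} < \infty
\end{aligned}
\]

And if $\sigma, \tau$ are not LD-bounded: Let uniform $U$ be such that $\sigma_i = F_i^{-1}(U)$ for $i = 1, 2$ and set $X = G^{-1}(U)$ for $G$ the exponential distribution function with parameter $L_C$. We prove that quantile coupled $\sigma_i$ and $X$ have an LD-bounded difference. For the difference of $X$ and $\sigma_i$ under the quantile coupling we have the bound

\[ P(\sigma_i - X > na) \le e^{-L_C\,k\,n\frac{a}{2}} \]

with $a > 0$ and arbitrary $k \in \mathbb{N}$ for $n$ large enough with reference to $a, k$. Let $F_i$ be the distribution function of $|\sigma_i - X|$ and $H_i$ the hazard function associated with $F_i$ (that is: $F_i^c = e^{-H_i}$). Then

\[ -\log F_i^c(x) \ge L_C\,k\,\frac{x}{2} \quad\text{and}\quad \frac{H_i(x)}{x} = \frac{-\log F_i^c(x)}{x} \ge L_C\,k\,\frac{1}{2} \]

for $x$ large enough. The limit takes care of $x$ being large enough for any fixed $k$ and

\[ \forall k \in \mathbb{N} : \lim_{x\to\infty}\frac{H_i(x)}{x} \ge L_C\,k\,\frac{1}{2} \;\Rightarrow\; \lim_{x\to\infty}\frac{H_i(x)}{x} = \infty \]

This tells us that $E[e^{\theta|\sigma_i - X|}] < \infty$ for all $\theta \in \mathbb{R}$. For $p, q > 0$ such that $\frac{1}{p} + \frac{1}{q} = 1$ we get

\[ E[e^{\theta|\sigma_1 - \sigma_2|}] \le E[e^{\theta|\sigma_1 - X|}\,e^{\theta|\sigma_2 - X|}] \le E[e^{\theta p|\sigma_1 - X|}]^{\frac{1}{p}}\,E[e^{\theta q|\sigma_2 - X|}]^{\frac{1}{q}} < \infty \]

□ 2.5.9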

2.6 Fenchel-Legendre transforms

Definition 2.6.1. Given a logarithmic moment generating function $H : \mathbb{R}^m \to \mathbb{R}$ its Fenchel-Legendre transform is defined as

\[ H^* : \mathbb{R}^m \to \mathbb{R}, \quad x \mapsto \sup_{\theta\in\mathbb{R}^m} \langle\theta, x\rangle - H(\theta). \]

$H^*$ is again convex and continuous on the interior of its domain.

Remark 2.6.2. Sufficient criteria for a Fenchel-Legendre transform $H^*$ of $H$ to have compact level sets are

• for $m = 1$: $0 \in \mathcal{D}(H)^\circ$

• for arbitrary $m \in \mathbb{N}$: $\mathcal{D}(H) = \mathbb{R}^m$ or $H$ is essentially smooth and lower semicontinuous.

(cf [5] lemma 2.2.20, theorem 2.3.6). Thus the Fenchel-Legendre transforms $\Lambda^*$ and $\Gamma^*$ of lmgfs $\Lambda$ and $\Gamma$ are good rate functions.

For an inter event time $\tau$ we have denoted its lmgf $\Lambda$. From $\Lambda$ we defined an associated $\Gamma$ (cf 2.2.7). Their Fenchel-Legendre transforms relate, too.

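Claim 2.6.3. $\Gamma^*(x) = x\,\Lambda^*(\frac{1}{x})$ for $x > 0$ holds and $\Gamma^*(0) = \lim_{x\to 0} x\,\Lambda^*(\frac{1}{x})$.

Proof of 2.6.3: First let $x > 0$.

\[
\begin{aligned}
\Gamma^*(x) &= \sup_{\theta}\; \theta x - \Gamma(\theta) = \sup_{\theta}\; \theta x + \Lambda^{-1}(-\theta) \\
&= \sup_{\theta = -\Lambda(\gamma)\,;\,\gamma\in\mathbb{R}}\; \theta x + \Lambda^{-1}(-\theta) \\
&= \sup_{\gamma\in\mathbb{R}}\; -\Lambda(\gamma)\,x + \Lambda^{-1}\circ\Lambda(\gamma) \\
&= x\,\sup_{\gamma\in\mathbb{R}}\; -\Lambda(\gamma) + \frac{1}{x}\gamma \\
&= x\,\Lambda^*\Big(\frac{1}{x}\Big)
\end{aligned}
\]

While for $x = 0$

\[ \Gamma^*(0) = \sup_{\theta\in\mathbb{R}} -\Gamma(\theta) = \sup_{\theta\in\mathbb{R}} \Lambda^{-1}(-\theta) = \lim_{\theta\to\infty}\Lambda^{-1}(\theta) = L_C(h) \]

If $\Gamma^*(0) = L_C(h) < \infty$ we get continuity of $\Gamma^*$ at $0$ from convexity and

\[ \Gamma^*(0) = \lim_{x\to 0}\Gamma^*(x) = \lim_{x\to 0} x\,\Lambda^*\Big(\frac{1}{x}\Big) \]

Otherwise, if $\Gamma^*(0) = L_C(h) = \infty$ we get simultaneous divergence to $\infty$ from lower semicontinuity of $\Gamma^*$:

\[
\begin{aligned}
&\infty = \Gamma^*(0) \le \liminf_{x\to 0}\Gamma^*(x) = \liminf_{x\to 0} x\,\Lambda^*\Big(\frac{1}{x}\Big) \\
&\Rightarrow\; \infty = \Gamma^*(0) = \lim_{x\to 0}\Gamma^*(x) = \lim_{x\to 0} x\,\Lambda^*\Big(\frac{1}{x}\Big)
\end{aligned}
\]

□ 2.6.3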
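The relation $\Gamma^*(x) = x\,\Lambda^*(\frac{1}{x})$ can be checked in the exponential case, where both transforms have closed forms. The sketch below assumes $\tau$ is exponential with a hypothetical rate `MU = 2`, so that $\Gamma(\theta) = \mu(e^\theta - 1)$; the closed forms used are the standard rate functions of the exponential and the Poisson distribution, and the grid search is only a crude illustration of the supremum in definition 2.6.1.

```python
import math

MU = 2.0  # hypothetical rate of an exponential inter event time (assumption)


def lambda_star(x, mu=MU):
    """Lambda*(x) = mu * x - 1 - log(mu * x) for x > 0."""
    return mu * x - 1.0 - math.log(mu * x)


def gamma_star(x, mu=MU):
    """Gamma*(x) = x * log(x / mu) - x + mu for x > 0 and Gamma*(0) = mu."""
    if x == 0:
        return mu
    return x * math.log(x / mu) - x + mu


def gamma_star_numeric(x, mu=MU, lo=-20.0, hi=20.0, steps=40001):
    """Grid approximation of sup_theta (theta * x - Gamma(theta)), Gamma(theta) = mu * (exp(theta) - 1)."""
    best = -math.inf
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        best = max(best, theta * x - mu * (math.exp(theta) - 1.0))
    return best
```

In this example $\Gamma^*(0) = \mu = L_C(h)$ is finite, and `gamma_star(x)` tends to `MU` as `x` decreases to `0`, in line with the continuity statement in the proof.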

Corollary 2.6.4.

• $\Gamma^*(0) = \lim_{x\to 0}\Gamma^*(x)$

• $\Gamma^*(0) = L_C(h)$ and the domain of $\Gamma^*$ contains its left boundary point $0$ iff $\tau$ is not LD-bounded, that is iff $L_C(h) < \infty$.

Similarly we are interested in how Γ∗behaves for large arguments.

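Claim 2.6.5. $\Gamma^*$ behaves superlinearly at infinity.

Proof of 2.6.5: We see the correspondence of unboundedness from below of $\Lambda$ (left) and superlinearity of $\Gamma^*$ (right).

\[ \lim_{\theta\to-\infty} -\Lambda(\theta) = \Lambda^*(0) = \lim_{x\to\infty}\Lambda^*\Big(\frac{1}{x}\Big) = \lim_{x\to\infty}\frac{1}{x}\,x\,\Lambda^*\Big(\frac{1}{x}\Big) = \lim_{x\to\infty}\frac{\Gamma^*(x)}{x} \]

(where $\lim_{x\to\infty}\Lambda^*(\frac{1}{x}) = \Lambda^*(0)$ follows from lower semicontinuity and $\Lambda^*(0) = \infty$).

□ 2.6.5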

2.7 Extreme twists

What happens as the twist parameter θfor an inter event time tends to

±∞ or, if the domain of Λ is bounded, to LC(h)? With the assumption

2.2.2 we have interpreted the derivative of the lmgf as the twisted mean

(Λ′(θ) = E(θ)[τ]) and so the range of Λ′is the interval of attainable means

under exponential twists. I assume the inter event time under the twisted

distribution tends to the essential infimum and the essential supremum of

the untwisted inter event time as the twist parameter tends to −∞ and

+∞/LC(h) respectively.

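Claim 2.7.1. If $F(0) = 0$ and there is some $k \in \mathbb{N}$ such that $f^{(0)}(0) = \cdots = f^{(k-1)}(0) = 0 < f^{(k)}(0)$ then $\lim_{\theta\to\infty}\Lambda'(-\theta) = 0$.

Proof of 2.7.1: First for $f(0) > 0$.

\[ \theta\,E[e^{-\theta\tau}] = \theta\int_{x=0}^{\infty} e^{-\theta x}f(x)\,dx = \underbrace{\theta^2\int_{x=0}^{\infty} e^{-\theta x}F(x)\,dx}_{(1)} \overset{y=\theta x}{=} \int_{y=0}^{\infty} e^{-y}F\Big(\frac{y}{\theta}\Big)\frac{\theta^2}{\theta}\,dy \]

\[ \lim_{\theta\to\infty}\theta\,E[e^{-\theta\tau}] = \lim_{\theta\to\infty}\int_{y=0}^{\infty} e^{-y}\,\theta\,F\Big(\frac{y}{\theta}\Big)\,dy = \int_{y=0}^{\infty} e^{-y}\,y\,\lim_{\theta\to\infty}\frac{\theta}{y}F\Big(\frac{y}{\theta}\Big)\,dy = f(0) \]

and

\[
\begin{aligned}
E[\tau e^{-\theta\tau}] &= \int_{x=0}^{\infty} x\,e^{-\theta x}f(x)\,dx \\
&= \Big[x\,e^{-\theta x}F(x)\Big]_{x=0}^{\infty} - \int_{x=0}^{\infty}\big(e^{-\theta x} - x\,\theta\,e^{-\theta x}\big)F(x)\,dx \\
&= 0 - 0 - \underbrace{\int_{x=0}^{\infty} e^{-\theta x}F(x)\,dx}_{\text{see above}} + \theta\int_{x=0}^{\infty} x\,e^{-\theta x}F(x)\,dx
\end{aligned}
\]

and

\[ \theta^2\int_{x=0}^{\infty} x\,e^{-\theta x}f(x)\,dx = -\underbrace{\theta^2\int_{x=0}^{\infty} e^{-\theta x}F(x)\,dx}_{\to f(0) \text{ by (1) above}} + \underbrace{\theta^3\int_{x=0}^{\infty} x\,e^{-\theta x}F(x)\,dx}_{\to 2f(0) \text{ by (2) below}} \;\to\; -f(0) + 2f(0) = f(0) \]

\[
\begin{aligned}
(2):\quad \lim_{\theta\to\infty}\theta^3\int_{x=0}^{\infty} x\,e^{-\theta x}F(x)\,dx &\overset{y=\theta x}{=} \lim_{\theta\to\infty}\theta^3\int_{y=0}^{\infty}\frac{y}{\theta}\,e^{-y}F\Big(\frac{y}{\theta}\Big)\frac{1}{\theta}\,dy \\
&= \int_{y=0}^{\infty} y^2 e^{-y}\lim_{\theta\to\infty}\frac{\theta}{y}F\Big(\frac{y}{\theta}\Big)\,dy \\
&= f(0)\int_{y=0}^{\infty} y^2 e^{-y}\,dy = 2f(0)
\end{aligned}
\]

All this we apply in

\[ \lim_{\theta\to\infty}\theta\,\Lambda'(-\theta) = \frac{\lim_{\theta\to\infty}\theta^2 E[\tau e^{-\theta\tau}]}{\lim_{\theta\to\infty}\theta\,E[e^{-\theta\tau}]} = \frac{f(0)}{f(0)} = 1 \]

If $f(0) = 0$ but $f'(0) > 0$ then we do a similar calculation with $\lim_{\epsilon\to 0}\frac{F(\epsilon)}{\epsilon^2} = \frac{1}{2}f'(0) > 0$ instead of the above $\lim_{\epsilon\to 0}\frac{F(\epsilon)}{\epsilon} = f(0) > 0$. This works generally as claimed.

□ 2.7.1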
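The limit $\theta\,\Lambda'(-\theta) \to 1$ from the proof can be observed on a concrete example. The sketch below assumes $\tau$ is uniform on $[0,1]$, so that $f(0) = 1 > 0$ and both $E[e^{-\theta\tau}]$ and $E[\tau e^{-\theta\tau}]$ are elementary integrals; the function name is hypothetical.

```python
import math


def twisted_mean_uniform(theta):
    """Lambda'(-theta) = E[tau * exp(-theta * tau)] / E[exp(-theta * tau)] for tau uniform on [0, 1], theta > 0."""
    mgf = (1.0 - math.exp(-theta)) / theta                          # E[exp(-theta * tau)]
    first = (1.0 - (1.0 + theta) * math.exp(-theta)) / theta ** 2   # E[tau * exp(-theta * tau)]
    return first / mgf
```

For growing `theta` the product `theta * twisted_mean_uniform(theta)` approaches `1`, so the twisted mean itself decays like `1 / theta` towards the essential infimum `0` of the untwisted inter event time.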

This claim is a generalisation of an assumption in [16], theorem 2.2. It is

about being able to twist τto have positive mean values as small as we like

since we identified Λ′with the expectation under the twisted distribution.

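Corollary 2.7.2. If an inter event time $\tau$ is bounded with least upper bound $b > 0$, has no point mass on $0$ and has a density $f$ such that there is some $k \in \mathbb{N}$ with $f^{(0)}(b) = \cdots = f^{(k-1)}(b) = 0 < f^{(k)}(b)$ then $\lim_{\theta\to\infty}\Lambda'(\theta) = b$.

This is by considering $b - \tau$ as an inter event time and applying 2.7.1.

We note that we don't really need the density on the whole support but just in a neighbourhood of $0$ and - for bounded $\tau$ - of the least upper bound.

Claim 2.7.3. For $\tau$ with support $[0, b]$ we have $\Lambda'(\mathbb{R}) = (0, b)$ under the assumption of 2.7.1 for the density at both $0$ and $b$.

Proof of 2.7.3: Openness of $\Lambda'(\mathbb{R}) = (0, b)$ remains to be proved. Assume the contrary and let $\theta_0 \in \mathbb{R}$ be such that $\Lambda'(\theta_0) = b$. Then for $\theta > \theta_0$ we'd have $\Lambda'(\theta) = b$ since $\Lambda'$ cannot decrease due to convexity and cannot increase due to $\lim_{\theta\to\infty}\Lambda'(\theta) = b$. A constant $\Lambda'$ on $(\theta_0, \infty)$ contradicts the strict convexity of $\Lambda$ (cf 2.2.5). The same argument applies at the lower end $0$.

□ 2.7.3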

Also we can argue for unbounded τ.

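Claim 2.7.4. If $\tau$ satisfies the if part of 2.7.1 and is unbounded but LD-bounded, then $\Lambda'(\mathbb{R}) = \mathbb{R}_{>0}$.

Proof of 2.7.4: Fix some $M > 0$ and let $\theta > 0$.

\[
\begin{aligned}
\Lambda'(\theta) > \frac{\Lambda(\theta)}{\theta} &\ge \frac{1}{\theta}\log\Big(\int_{s=0}^{M} e^{\theta s}\,dF(s) + e^{\theta M}F^c(M)\Big) \\
&\ge \frac{\log\big(e^{\theta M}F^c(M)\big)}{\theta} = \frac{\theta M - \int_{s=0}^{M} h(s)\,ds}{\theta} \\
&= M - \frac{1}{\theta}\underbrace{\int_{s=0}^{M} h(s)\,ds}_{= -\log F^c(M)}
\end{aligned}
\]

But for unbounded $\tau$ we have $F^c > 0$ always and thus $-\log F^c(M) < \infty$. Therefore

\[ \lim_{\theta\to\infty}\Lambda'(\theta) > M \]

□ 2.7.4

For $\mathcal{D}(\Lambda) = (-\infty, L_C(h))$ with $L_C(h) < \infty$ we have already seen that $\Lambda'$ can't be bounded and that $\Lambda$ is essentially smooth / steep. With 2.7.1 we again get $\Lambda'(\mathcal{D}(\Lambda)) = \mathbb{R}_{>0}$.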

2.8 Generalisations

Up to now we have assumed that inter event times could be arbitrarily small.

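We now investigate the more general inter event times $\tau \in (a, b)$ for $a \ge 0$ and $b \le \infty$.

Claim 2.8.1. If $\tau \in (a, b)$ with $a \ge 0$ and $b \le \infty$ and $\Lambda'(\mathcal{D}(\Lambda)) = (a, b)$ then $\mathcal{D}(\Lambda^*) \subseteq [a, b]$ and for $x \in (a, b)$ an optimising $\theta = \theta(x) \in \mathbb{R}$ exists such that

\[ \Lambda^*(x) = \theta x - \Lambda(\theta), \qquad \theta = (\Lambda')^{-1}(x). \]

From the assumption of no point mass on boundary points we get $\mathcal{D}(\Lambda^*) = (a, b)$.

Proof of 2.8.1:

\[ \log E[e^{\theta\tau}] = a\theta + \log E[e^{\theta(\tau - a)}] \]

\[ \Lambda^*(a) = \sup_{\theta\in\mathbb{R}}\; \theta a - \log E[e^{\theta\tau}] = \sup_{\theta\in\mathbb{R}}\; -\log E[e^{\theta(\tau - a)}] \]

and thus $\Lambda^*(a) = \infty$ if $\tau$ has no point mass on $a$. Similarly $\Lambda^*(b) = \infty$.

□ 2.8.1

Remark 2.8.2. This affects the associated $\Gamma^*$ in the following way:

• If $a > 0$, $b < \infty$ then $\mathcal{D}(\Gamma^*) = (\frac{1}{b}, \frac{1}{a})$.

• If $a = 0$, $b < \infty$ then $\mathcal{D}(\Gamma^*) = (\frac{1}{b}, \infty)$.

• If $a > 0$, $b = \infty$ then $\mathcal{D}(\Gamma^*) = (0, \frac{1}{a})$ or $\mathcal{D}(\Gamma^*) = [0, \frac{1}{a})$ depending on $L_C(h)$.

And $\mathcal{D}(\Gamma^*) = [0, \frac{1}{a})$ has $\lim_{x\to 0}{\Gamma^*}'(x) = \infty$.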

Chapter 3

The counting process

In this chapter we move from inter event times to renewal counting processes that we will define through these inter event times. For exponentially distributed inter event times the resulting renewal counting process will be the Markovian Poisson process. For the more general inter event times of chapter 2 we will obtain non-Markovian processes. In this chapter we will argue that many implications of the Markov property that are desirable when developing a large deviation principle on the process level can be obtained for renewal counting processes as well.

The Poisson process as a Markovian renewal counting process has stationary increments starting at any fixed moment. This does not hold true for the general renewal counting process. For the general case of a counting process with iid inter event times we show how to construct the associated process with stationary increments in section 3.2: Choosing a different first inter event time will be enough and we will make the definition of renewal counting process such as to include these kinds of processes. In section 3.3 we calculate the lmgf of the general renewal counting process and we find that it relates to the lmgf of the inter event times as $\Gamma(\theta) = -\Lambda^{-1}(-\theta)$. This has been proved in 1994 by Peter Glynn and Ward Whitt for more general counting processes and by other means (cf [10]).

The fourth section is about exponential equivalence of different kinds of counting processes: A Markovian renewal counting process has a first inter event time with the same distribution as all following inter event times and has stationary increments. For a general renewal counting process these are mutually exclusive properties. Luckily, in terms of large deviations the process with stationary increments and the process with the first and all following inter event times identically distributed are indistinguishable. In a similar way we have independence of increments over disjoint intervals for the Markovian counting process while this does not hold for the general renewal counting process. We will construct a process that does have independent increments when observed over finitely many, fixed, disjoint intervals. We say the process restarts at the beginning of each interval. The restarted process will be indistinguishable from the general renewal counting process in terms of large deviations.

The exponential change of measure is classical in large deviation theory. It is central to the proof of the large deviation principle for Jackson networks of Irina Ignatiouk-Robert [12]. Preparing for a similar approach we develop the exponential change of measure for the single (undelayed) renewal counting process of general inter event times in the sixth section. We derive the change of measure for the counting process from exponentially twisting its inter event times. We will see that this does not change the process' renewal property and we stay in the same class of processes. This way we know about the process' expected behaviour under the changed measure.

Before starting the first section we remind the reader of assumptions and

some notation from the previous chapter:

• Inter event times are denoted by $\tau$ and variations of it, with distribution function $F$, density (wrt Lebesgue measure) $f$ and lmgf $\Lambda$ with an open domain (cf 2.2.2, 2.2.13).

• For a non-deterministic inter event time $\tau$ the hazard function is denoted $h$ and existence of a positive (possibly infinite) Cesaro mean $LC(h)$ is assumed (cf 2.4.2).

• All $\Lambda$ and $\Gamma$ are strictly convex as soon as the associated inter event time is not deterministic.

• For the domain of $\Lambda$ we have $D(\Lambda) = (-\infty, LC(h))$ (cf 2.4.6), and if $LC(h) = \infty$ we say that $\tau$ is LD-bounded (cf 2.5.4) and that $F^c$ decays super exponentially (cf 2.4.7).

3.1 Introducing the counting process

With a sequence of inter event times $\tau_1, \tau_2, \ldots$ we associate a process that for every $t \ge 0$ counts how many events have happened in $[0, t]$. This process should take the value 0 at $t = 0$ and increase by 1 at each occurrence of an event. Since events are separated by inter event times $\tau_1, \tau_2, \ldots$, events happen at $\tau_1$, $\tau_1 + \tau_2$, $\sum_{l=1}^{3} \tau_l$, ....

Definition 3.1.1 (Counting process). Let $\tau_1, \tau_2, \ldots$ be inter event times. Then the associated counting process $N$ is defined as
\[ N_t \le k \iff t < \sum_{l=1}^{k+1} \tau_l \]
for any $t \in \mathbb{R}_{\ge 0}$.
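The definition can be made concrete with a small sketch. It assumes a finite list of inter event times, so the count is only meaningful up to the last partial sum; the helper name `counting_process` is hypothetical and not taken from the text.

```python
from bisect import bisect_right
from itertools import accumulate

def counting_process(inter_event_times):
    """Return t -> N(t) for a finite list of positive inter event times."""
    # Partial sums S_1, S_2, ... are the epochs at which events happen.
    partial_sums = list(accumulate(inter_event_times))

    def N(t):
        # N(t) <= k  <=>  t < S_{k+1}, so N(t) counts the S_k with S_k <= t.
        return bisect_right(partial_sums, t)

    return N

N = counting_process([1.0, 2.0, 0.5])   # events at 1.0, 3.0 and 3.5
print([N(t) for t in (0.0, 0.99, 1.0, 3.2, 3.5)])
```

The process starts at 0, is constant between events and is right-continuous at each event epoch, in line with the equivalence in the definition.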

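Claim 3.1.2. $N$ is piecewise constant and jumps at $\sum_{l=1}^{k} \tau_l$; $k \in \mathbb{N}$.

Proof of 3.1.2: Write $S_k = \sum_{l=1}^{k} \tau_l$. Then
\[ N(t) = k \iff \left\{ \begin{array}{l} N(t) \le k \\ N(t) \not\le k-1 \end{array} \right. \iff \left\{ \begin{array}{l} S_{k+1} > t \\ S_k \le t \end{array} \right. \iff t \in [S_k, S_{k+1}). \]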

3.1.2

Claim 3.1.3. If $\tau_1, \tau_2, \ldots$ are independent and fall under assumption 2.2.2 then all jump-sizes of $N$ are equal to 1 and $N(t) < \infty$ for any $t \in \mathbb{R}_{\ge 0}$.

Proof of 3.1.3: Jump sizes are equal to 1 because the intervals $[S_k, S_{k+1})$ are a.s. not degenerate due to $P(\tau_k = 0) = 0$ (which is necessary for 2.2.2 to hold). For finiteness:
\[ N(t) = \infty \iff N(t) \le k \text{ for no } k \in \mathbb{N} \iff t < S_{k+1} \text{ for no } k \in \mathbb{N} \iff t \ge S_{k+1}\ \forall k \in \mathbb{N}. \]
But the partial sums process grows a.s. unboundedly.

3.1.3

Definition 3.1.4 (Renewal counting process, rcp). If the inter event times $\tau_1, \tau_2, \ldots$ of a counting process $N$ are independent and $\tau_2, \tau_3, \ldots$ are identically distributed then $N$ is a renewal counting process or, abbreviated, a rcp. If

• $\tau_1, \tau_2$ are identically distributed then $N$ is an undelayed rcp.

• $\tau_1, \tau_2$ are not identically distributed then $N$ is a delayed rcp.

For a rcp we say that $\tau_1$ is the initial inter event time and $\tau_2$ is a typical inter event time.

Of the delayed renewal counting processes one is particularly important.

Definition 3.1.5 ($\tilde{N}$). The delayed renewal counting process with typical inter event time distribution $F$ and initial inter event time distribution $\tilde{F}$, the $\sim$-transform of $F$ defined in 2.1.2, is denoted $\tilde{N}$.

[Figure 3.1: A sequence of inter event times ($\tau_1, \tau_2, \tau_3$) and the associated counting process]

The sequence of inter event times and the associated counting process

contain the same information. We give a graphical representation of both in

figure 3.1.

We want to describe the distribution of the discrete valued delayed re-

newal counting process N. For a convenient way to describe the mass function

we make the following

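Definition 3.1.6 (Convolution).

• For two density functions $f$ and $g$ on $\mathbb{R}_{\ge 0}$ their convolution $f * g$ is defined as $f * g(t) = \int_{s=0}^{t} f(t-s)\, g(s)\, ds$.

• For two distribution functions $F$ and $G$ on $\mathbb{R}_{\ge 0}$ their convolution $F \circledast G$ is defined as $F \circledast G(t) = \int_{s=0}^{t} F(t-s)\, dG(s)$.

• For $F : \mathbb{R}_{\ge 0} \to \mathbb{R}_{\ge 0}$ not necessarily a distribution function the convolution is again $F \circledast G(t) = \int_{s=0}^{t} F(t-s)\, dG(s)$.

Convolving $f$ or $F$ with itself we write $f * f = f^{*2}$ and $F \circledast F = F^{\circledast 2}$. More generally $f^{*k}$ is the $k$-fold convolution, recursively defined for $k \ge 2$, and for a uniform notation we write $f^{*1} = f$ and $f^{*0} = \delta_0$. Analogously for $F$ and $F^{\circledast k}$ with $F^{\circledast 0} = 1_{[0,\infty)}$.

Defined this way the convolutions of densities and of distribution functions match: $\int_{s=0}^{t} f * g(s)\, ds = F \circledast G(t)$.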

Remark 3.1.7. Convolutions of densities and distribution functions describe

densities and distribution functions of sums of independent inter event times:

e.g. if τ, σ are independent and have densities fand gthen τ+σhas density

f∗g.
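A numerical sketch of the remark, under assumptions chosen only for illustration: $\tau$ is exponential with rate 1 and $\sigma$ is exponential with rate 2, for which the density of $\tau + \sigma$ is $2(e^{-t} - e^{-2t})$. The function name `convolve_densities` and the trapezoidal rule are illustrative choices, not part of the text.

```python
import math

def convolve_densities(f, g, t, steps=2000):
    """Trapezoidal approximation of (f*g)(t) = int_0^t f(t-s) g(s) ds."""
    h = t / steps
    # Endpoint contributions s = 0 and s = t carry weight 1/2.
    total = 0.5 * (f(t) * g(0.0) + f(0.0) * g(t))
    for i in range(1, steps):
        s = i * h
        total += f(t - s) * g(s)
    return total * h

def f(x):
    return math.exp(-x)               # assumed density of tau ~ Exp(1)

def g(x):
    return 2.0 * math.exp(-2.0 * x)   # assumed density of sigma ~ Exp(2)

t = 1.0
numeric = convolve_densities(f, g, t)
closed_form = 2.0 * (math.exp(-t) - math.exp(-2.0 * t))
print(numeric, closed_form)
```

The two printed values should agree up to the discretisation error of the quadrature.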


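Claim 3.1.8 (Mass function for delayed rcp). If $N$ is a delayed rcp with initial inter event time distribution $G$ and typical inter event time distribution $F$ then
\[ P(N_t = k) = \begin{cases} G^c(t) & \text{if } k = 0 \\ F^c \circledast G \circledast F^{\circledast(k-1)}(t) & \text{if } k \ge 1. \end{cases} \]

Proof of 3.1.8: Let $\tau_1, \tau_2, \ldots$ be the inter event times of $N$, $\tau_1$ with distribution function $G$ and $\tau_2, \tau_3, \ldots$ each with distribution function $F$. For $k \ge 1$ the distribution function of $S_k = \sum_{l=1}^{k} \tau_l$ is the convolution $G \circledast F^{\circledast(k-1)}$. We calculate the mass function:
\begin{align*}
P(N_t = 0) &= P(\tau_1 > t) = G^c(t) \\
P(N_t = k) &= \int_{s=0}^{t} P(N_t = k \mid S_k = s)\, dG \circledast F^{\circledast(k-1)}(s) \\
&= \int_{s=0}^{t} P(\tau_{k+1} > t - s)\, dG \circledast F^{\circledast(k-1)}(s) \\
&= F^c \circledast G \circledast F^{\circledast(k-1)}(t) \tag{3.1}
\end{align*}
and for the last line we need to allow the convolution of the not-a-distribution function $F^c$.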

3.1.8

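Corollary 3.1.9. For $G = F$ we get the following mass function for the undelayed counting process
\[ P(N_t = k) = F^c \circledast F^{\circledast k}(t) = E[\tau]\, \tilde{f} * f^{*k}(t) \qquad (k \in \mathbb{N}) \]
and $\sum_{k=0}^{\infty} F^c \circledast F^{\circledast k}(t) = 1$ or $\sum_{k=0}^{\infty} \tilde{f} * f^{*k}(t) = \frac{1}{E[\tau]}$.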

We have introduced exponential transformation of distribution functions

in definition 2.3.1 and we want to know how this relates to convolutions.

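Claim 3.1.10. Let $\Lambda$ be the lmgf of the inter event time $\tau$ with density $f$ and distribution function $F$. If $k \ge 1$ then

(1) $\log \int_{x=0}^{\infty} e^{\beta x}\, dF^{\circledast k}(x) = k\Lambda(\beta)$;

(2) $(f^{*k})_\beta = (f_\beta)^{*k}$ or equivalently $(F^{\circledast k})_\beta = (F_\beta)^{\circledast k}$.

Remark 3.1.11.

• Claim 3.1.10 and the definition of the exponential transform 2.3.1 imply that $(f^{*k})_\beta(x) = e^{\beta x - k\Lambda(\beta)} f^{*k}(x)$ or equivalently $(F^{\circledast k})_\beta(x) = \int_{s=0}^{x} e^{\beta s - k\Lambda(\beta)}\, dF^{\circledast k}(s)$.

• As a consequence we can omit parentheses: $F^{\circledast k}_\beta = (F_\beta)^{\circledast k} = (F^{\circledast k})_\beta$.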

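Proof of 3.1.10: (1) is clear from $F^{\circledast k}$ being the distribution function of $\sum_{l=1}^{k} \tau_l$ for iid $\tau_1, \ldots, \tau_k$. We prove (2) inductively for the densities. Initially for $k = 1$
\[ \text{(2), } k = 1: \qquad (f_\beta)^{*1}(x) = f_\beta(x) = (f^{*1})_\beta(x). \]
And generally for $k \ge 1$: if (2) holds for $k$ and (1) holds generally then (2) holds for $k + 1$, too.
\begin{align*}
(f_\beta)^{*(k+1)}(x) = (f_\beta)^{*k} * f_\beta(x) &= \int_{s=0}^{x} (f_\beta)^{*k}(x-s)\, f_\beta(s)\, ds \\
&\overset{(2)}{=} \int_{s=0}^{x} (f^{*k})_\beta(x-s)\, f_\beta(s)\, ds \\
&\overset{(1)}{=} \int_{s=0}^{x} f^{*k}(x-s)\, e^{\beta(x-s) - k\Lambda(\beta)}\, f(s)\, e^{\beta s - \Lambda(\beta)}\, ds \\
&= \int_{s=0}^{x} f^{*k}(x-s)\, f(s)\, ds\ e^{\beta x - (k+1)\Lambda(\beta)} \\
&\overset{(1)}{=} (f^{*(k+1)})_\beta(x)
\end{align*}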

3.1.10

3.1.1 Joint distributions

At any fixed time tit might be interesting to know how much time passed

since the last event and how far the next event is.

Definition 3.1.12 (Age, residual lifetime, spread). For a counting process $N$ with associated partial sums of inter event times $S$ we define

• the age at time $t$ as $B(N, t) = t - S_{N(t)}$.

• the residual lifetime at time $t$ as $C(N, t) = S_{N(t)+1} - t$.

• the spread at time $t$ as $B(N, t) + C(N, t)$. This is the length of the inter event time covering $t$.

We sometimes abbreviate $B(N, t)$ and $C(N, t)$ by omitting $N$ and write $B(t)$ and $C(t)$ instead.
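The three quantities can be read off the partial sums directly. The following sketch assumes a finite list of inter event times and the convention $S_0 = 0$, so that $B(t) = t$ before the first event; the function name `age_and_residual` is hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate

def age_and_residual(inter_event_times, t):
    """Return (N(t), B(t), C(t)) for a finite list of inter event times."""
    sums = [0.0] + list(accumulate(inter_event_times))  # S_0 = 0, S_1, S_2, ...
    n = bisect_right(sums, t) - 1        # N(t): index of the last epoch <= t
    if n + 1 >= len(sums):
        raise ValueError("t lies beyond the last simulated event")
    age = t - sums[n]                    # B(t) = t - S_{N(t)}
    residual = sums[n + 1] - t           # C(t) = S_{N(t)+1} - t
    return n, age, residual

n, age, residual = age_and_residual([1.0, 2.0, 0.5], 2.5)
print(n, age, residual, age + residual)  # the spread equals tau_2 = 2.0 here
```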

Figure 3.2 illustrates the definition.

Remark 3.1.13. From construction of the process we immediately get: Given

the age, the residual life is independent of the state.

[Figure 3.2: Age, residual lifetime, and spread. The inter event time $\tau_{N(t)+1}$ covering $t$ splits into $B(t)$ before $t$ and $C(t)$ after $t$.]

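Claim 3.1.14. $P(C(t) \le x \mid B(t)) = F_{+B(t)}(x)$ with $F_{+\cdot}$ of definition 2.1.7.

Proof of 3.1.14: First note that
\begin{align*}
B(t) = a &\iff \Delta N(t-a) = 1,\ \tau_{N(t)+1} > a \\
\{B(t) = a,\ N(t) = k\} &\iff \{S_k = t-a,\ \tau_{k+1} > a\}
\end{align*}
and apply this in the following.
\begin{align*}
P(\tau_{N(t)+1} > x+a \mid B(t) = a)
&= \sum_{k=0}^{\infty} P(\tau_{N(t)+1} > x+a \mid N(t) = k,\ B(t) = a)\, P(N(t) = k) \\
&= \sum_{k=0}^{\infty} P(\tau_{k+1} > x+a \mid N(t) = k,\ S_k = t-a,\ \tau_{k+1} > a)\, P(N(t) = k) \\
&= \sum_{k=0}^{\infty} P(\tau_{k+1} > x+a \mid S_k = t-a,\ \tau_{k+1} > a)\, P(N(t) = k) \\
&\overset{S_k \perp\!\!\!\perp \tau_{k+1}}{=} \sum_{k=0}^{\infty} \underbrace{P(\tau_{k+1} > x+a \mid \tau_{k+1} > a)}_{= P(\tau_1 > x+a \mid \tau_1 > a)}\, P(N(t) = k) \\
&= P(\tau_1 > x+a \mid \tau_1 > a) \sum_{k=0}^{\infty} P(N(t) = k) \\
&= F^c_{+a}(x)
\end{align*}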



3.1.14

Corollary 3.1.15. By symmetry of age and residual lifetime, similarly to 3.1.14 we have: $P(B(t) \le x \mid C(t)) = F_{+C(t)}(x)$ with $F_{+\cdot}$ of definition 2.1.7.

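Let $N$ be a delayed renewal counting process with initial inter event time distribution function $G$ and typical inter event time distribution function $F$. We give the joint distribution of state and residual lifetime.

Claim 3.1.16. The joint mass of $(N(t), C(t))$, the state and residual lifetime at $t$, on $\{k\} \times (s, \infty)$ for $k \in \mathbb{N}$ and $s \ge 0$ is
\[ P(N(t) = k,\ C(t) > s) = \begin{cases} G^c(t+s) & k = 0 \\ \int_{r=0}^{t} F^c(t+s-r)\, dG \circledast F^{\circledast(k-1)}(r) & k \ge 1. \end{cases} \]
The joint mass of $(N(t), B(t))$, the state and the age at $t$, on $\{k\} \times [0, s]$ for $k \in \mathbb{N}$ and $s \ge 0$ is
\[ P(N(t) = k,\ B(t) \le s) = \begin{cases} 0 & k = 0,\ s < t \\ G^c(t) & k = 0,\ s \ge t \\ F^c \circledast G \circledast F^{\circledast(k-1)}(t) - F^c \circledast G \circledast F^{\circledast(k-1)}(t-s) & k > 0,\ s \le t \\ F^c \circledast G \circledast F^{\circledast(k-1)}(t) & k > 0,\ s > t. \end{cases} \]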

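Proof of 3.1.16: Let $S_k = \sum_{l=1}^{k} \tau_l$.
\begin{align*}
P(N(t) = k,\ C(t) > s) &= P(S_k \le t < S_{k+1},\ S_{k+1} > t+s) \\
&= P(S_k \le t,\ \underbrace{S_{k+1}}_{= S_k + \tau_{k+1}} > t+s) \\
&= \int_{r=0}^{t} P(r \le t,\ r + \tau_{k+1} > t+s \mid S_k = r)\, dG \circledast F^{\circledast(k-1)}(r) \\
&= \int_{r=0}^{t} P(\tau_{k+1} > t+s-r)\, dG \circledast F^{\circledast(k-1)}(r) \\
&= \int_{r=0}^{t} F^c(t+s-r)\, dG \circledast F^{\circledast(k-1)}(r)
\end{align*}
Now the joint mass of state and age, first for $k = 0$:
\[ P(N(t) = 0,\ B(t) \le s) = \begin{cases} 0, & s < t \\ G^c(t), & s \ge t. \end{cases} \]
Now for general $k \ge 1$ and $s \le t$
\begin{align*}
P(N(t) = k,\ B(t) \le s) &= P(\underbrace{N(t) = k}_{t \in [S_k,\, S_k + \tau_{k+1})},\ t - S_{N(t)} \le s) \\
&= P(S_k \in [t-s, t],\ \tau_{k+1} > t - S_k) \\
&= \int_{r=t-s}^{t} P(r \in [t-s, t],\ \tau_{k+1} > t-r \mid S_k = r)\, g * f^{*(k-1)}(r)\, dr \\
&= \int_{r=t-s}^{t} F^c(t-r)\, g * f^{*(k-1)}(r)\, dr \tag{3.2} \\
&= \int_{r=t-s}^{t} F^c(t-r)\, d\, G \circledast F^{\circledast(k-1)}(r) \\
&= F^c \circledast G \circledast F^{\circledast(k-1)}(t) - F^c \circledast G \circledast F^{\circledast(k-1)}(t-s)
\end{align*}
whereas for $k \ge 1$, $t < s$
\[ \{\underbrace{N(t) = k}_{\Rightarrow\, B(t) < t},\ B(t) \le s\} = \{N(t) = k\} \ \Rightarrow\ P(N(t) = k,\ B(t) \le s) = F^c \circledast G \circledast F^{\circledast(k-1)}(t) \quad \text{(cf. (3.1))}. \]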



3.1.16

Note that the joint distribution of state and age has a density for $k \ge 1$ fixed and $s \le t$ (easily seen from (3.2)):
\[ s \mapsto F^c(s)\, g * f^{*(k-1)}(t-s) \tag{3.3} \]

Corollary 3.1.17. The undelayed renewal counting process with inter event time distribution function $F$ has as joint distribution of state and age
\[ P(N(t) = k,\ B(t) \le s) = \begin{cases} 0 & k = 0,\ s < t \\ F^c(t) & k = 0,\ s \ge t \\ F^c \circledast F^{\circledast k}(t) - F^c \circledast F^{\circledast k}(t-s) & k > 0,\ s \le t \\ F^c \circledast F^{\circledast k}(t) & k > 0,\ s > t \end{cases} \]
with the following density for $k > 0$, $s \in [0, t]$
\[ s \mapsto \frac{d}{ds} P(N(t) = k,\ B(t) \le s) = F^c(s)\, f^{*k}(t-s). \]

3.2 Stationary increments

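Consider the delayed renewal counting process $\tilde{N}$ defined in 3.1.5. In this section we prove that $\tilde{N}$ has strictly (strongly) stationary increments.

The mass function of $\tilde{N}$ at fixed $t$ is derived from claim 3.1.8 as
\[ k \mapsto P(\tilde{N}(t) = k) = \begin{cases} \tilde{F}^c(t) & k = 0 \\ F^c \circledast \tilde{F} \circledast F^{\circledast(k-1)}(t) & k \ge 1 \end{cases} \tag{3.4} \]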

The following is weak stationarity of increments.

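Claim 3.2.1. $t \mapsto E[\tilde{N}(t)]$ is linear with slope $\frac{1}{E[\tau]}$.

Proof of 3.2.1:
\[ E[\tilde{N}(t)] = \sum_{k=1}^{\infty} P(\tilde{N}(t) \ge k) = \sum_{k=1}^{\infty} P\Big(\tilde{\tau} + \sum_{l=2}^{k} \tau_l \le t\Big) = \sum_{k=1}^{\infty} \tilde{F} \circledast F^{\circledast(k-1)}(t) \]
and for the derivative
\[ \frac{d}{dt}\, \tilde{F} \circledast F^{\circledast k}(t) = \underbrace{\tilde{F}(0)}_{=0} F^{\circledast k}(t) + \int_{s=0}^{t} \tilde{f}(t-s)\, dF^{\circledast k}(s) = \frac{1}{E[\tau]} F^c \circledast F^{\circledast k}(t) \overset{3.1.9}{=} \frac{1}{E[\tau]} P(N(t) = k) \]
and thus
\begin{align*}
\frac{d}{dt} E[\tilde{N}(t)] &= \frac{d}{dt} \Big( \tilde{F}(t) + \sum_{k=1}^{\infty} \tilde{F} \circledast F^{\circledast k}(t) \Big) \\
&= \tilde{f}(t) + \sum_{k=1}^{\infty} \frac{1}{E[\tau]} P(N(t) = k) \\
&= \frac{1}{E[\tau]} \Big( \underbrace{F^c(t)}_{= P(N(t) = 0)} + P(N(t) \ge 1) \Big) = \frac{1}{E[\tau]}
\end{align*}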

3.2.1
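A Monte Carlo sketch can illustrate claim 3.2.1. It assumes, purely for illustration, that typical inter event times are uniform on $[0, 2]$ (so $E[\tau] = 1$) and that the initial inter event time has density $F^c(x)/E[\tau] = 1 - x/2$ on $[0, 2]$, as used for $\tilde{f}$ in this section. The function name, sample size and seed are arbitrary choices.

```python
import math
import random

def stationary_count(t, rng):
    """One sample of the count at time t for tau ~ Uniform(0, 2), E[tau] = 1."""
    # Initial inter event time with density 1 - x/2 on [0, 2], sampled by
    # inverting its distribution function x - x^2/4.
    epoch = 2.0 * (1.0 - math.sqrt(1.0 - rng.random()))
    count = 0
    while epoch <= t:
        count += 1
        epoch += rng.uniform(0.0, 2.0)   # typical inter event times
    return count

rng = random.Random(0)
samples = 20000
for t in (0.5, 3.0):
    mean = sum(stationary_count(t, rng) for _ in range(samples)) / samples
    print(t, mean)   # the estimate should be close to t / E[tau] = t
```

For the undelayed process the mean is only asymptotically linear; the modified first inter event time is what makes the relation hold already for small $t$.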

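We will continue by arguing for strong stationarity of increments of the delayed counting process $\tilde{N}$.

Lemma 3.2.2. For the delayed renewal counting process $\tilde{N}$ with initial inter event time $\tilde{\tau}$ holds: $\mathcal{L}(C(\tilde{N}, t)) = \mathcal{L}(\tilde{\tau})$ for any $t$.

Proof of 3.2.2: We work with the joint distribution of $\tilde{N}(t)$ and $C(\tilde{N}, t)$ of 3.1.16. We have to set $g = \tilde{f}$.
\begin{align*}
P(C(t) > s) &= P(N(t) = 0,\ C(t) > s) + \sum_{k=1}^{\infty} P(N(t) = k,\ C(t) > s) \\
&= \tilde{F}^c(t+s) + \int_{r=0}^{t} F^c(t+s-r) \sum_{k=1}^{\infty} \tilde{f} * f^{*(k-1)}(r)\, dr \\
&= \int_{r=t+s}^{\infty} \frac{F^c(r)}{E[\tau]}\, dr + \int_{r=0}^{t} F^c(t+s-r) \underbrace{\sum_{k=0}^{\infty} \tilde{f} * f^{*k}(r)}_{= \frac{1}{E[\tau]} \text{ by 3.1.9}} dr \\
&= \int_{r=t+s}^{\infty} \frac{F^c(r)}{E[\tau]}\, dr + \int_{r=s}^{t+s} \frac{F^c(r)}{E[\tau]}\, dr \\
&= \tilde{F}^c(s) = P(\tilde{\tau} > s)
\end{align*}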

3.2.2

From this, strong stationarity of increments of $\tilde{N}$ is immediate:

Claim 3.2.3. Let $N' : s \mapsto \tilde{N}(t+s) - \tilde{N}(t)$ for $s \ge 0$. Then $\mathcal{L}(\tilde{N}) = \mathcal{L}(N')$.

Proof of 3.2.3: The process $N'$ is piecewise constant and if it jumps at $s$ then
\[ N'(s) - N'(s-) = \tilde{N}(t+s) - \tilde{N}(t+s-) = 1. \]
Typical inter event times of $N'$ are typical inter event times of $\tilde{N}$, so they are independent. Thus $N'$ is a delayed renewal counting process. The initial inter event time of $N'$ is $\tau'_1 = C(\tilde{N}, t)$ and we have just seen that $\mathcal{L}(C(\tilde{N}, t)) = \mathcal{L}(\tilde{\tau})$. So inter event times of $N'$ have the same distribution as those of $\tilde{N}$ and the distributions of the associated counting processes coincide.

3.2.3

As a graphical presentation of the proof of 3.2.3 consider figure 3.3. Inter event times of $\tilde{N}$ are on the upper and those of $N'$ on the lower timeline. For a different kind of proof of strong stationarity of increments of $\tilde{N}$ see [26] (ch. 2.16, especially theorem 17 on p. 112).

We can similarly argue for reversibility of the counting process with stationary increments. The following lemma is a preparation.

[Figure 3.3: Inter event times of $\tilde{N}$ and $N'$ of claim 3.2.3. Upper timeline: $\tau_1 = \tilde{\tau}$, $\tau_2$, $\tau_3$, $\tau_4$; lower timeline: $\tau'_1 = C(t)$, $\tau'_2 = \tau_5$.]

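Lemma 3.2.4.
\[ P(B(t) \le x) = \begin{cases} 1 & \text{if } x \ge t \\ \tilde{F}(x) & \text{if } x < t \end{cases} \]

Proof of 3.2.4: We apply the conditional distribution 3.1.15 and the knowledge of the distribution of $C(\tilde{N}, t)$ from 3.2.2.
\begin{align*}
P(B(t) \le x) &= \int_{y=0}^{\infty} \underbrace{P(B(t) \le x \mid C(t) = y)}_{= F_{+y}(x)}\, \tilde{f}(y)\, dy \\
&= \int_{y=0}^{\infty} \Big( 1 - \frac{F^c(x+y)}{F^c(y)} \Big) \frac{F^c(y)}{E[\tau]}\, dy \\
&= 1 - \int_{y=0}^{\infty} \frac{F^c(x+y)}{E[\tau]}\, dy \\
&= 1 - \int_{y=x}^{\infty} \tilde{f}(y)\, dy \\
&= \tilde{F}(x)
\end{align*}
For the point mass: $P(B(\tilde{N}, t) = t) = P(\tilde{\tau} > t) = \tilde{F}^c(t)$, which is consistent with the above.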



3.2.4

Claim 3.2.5 (Reversibility of increments). For $\tilde{N}$ and fixed $t > 0$ let $N'' : s \mapsto \tilde{N}(t) - \tilde{N}(t-s)$ for $s \in [0, t]$. Then $\mathcal{L}(\tilde{N}_{[0,t]}) = \mathcal{L}(N'')$.

Proof of 3.2.5: $B(\tilde{N}, t)$ has the required distribution by lemma 3.2.4. Denote inter event times of $N''$ by $\tau''_1, \tau''_2, \ldots$. The remaining proof is in figure 3.4.

3.2.5

3.3 Lmgf for the undelayed rcp

The logarithmic moment generating function is an essential tool in classical large deviation theory.

[Figure 3.4: Inter event times of $\tilde{N}$ and the reversed process $N''$. Upper timeline: $\tau_1 = \tilde{\tau}$, $\tau_2$, $\tau_3$, $\tau_4$; reversed timeline: $\tau''_1 = B(\tilde{N}, t)$, $\tau''_2 = \tau_3$, $\tau''_3 = \tau_2$.]

Claim 3.3.1. Let $N$ be the undelayed rcp with inter event time distribution $F$ with lmgf $\Lambda$ (cf 2.2.1) and associated $\Gamma$ (cf 2.2.7). Then for any $\theta \in \mathbb{R}$
\[ \lim_{t \to \infty} \frac{1}{t} \log E[e^{\theta N_t}] = \Gamma(\theta). \]

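Proof of 3.3.1: We calculate exactly. For $\theta \in \mathbb{R}$ set $\rho = -\Gamma(\theta)$, which is equivalent to $-\Lambda(\rho) = \theta$ by definition of $\Gamma$.
\begin{align*}
E[e^{\theta N_t}] &\overset{3.1.9}{=} F^c(t) + \sum_{k=1}^{\infty} e^{\theta k}\, F^c \circledast F^{\circledast k}(t) \\
&= F^c(t) + \sum_{k=1}^{\infty} e^{\theta k} \int_{s=0}^{t} F^c(t-s)\, f^{*k}(s)\, ds \\
&= F^c(t) + \int_{s=0}^{t} F^c(t-s) \sum_{k=1}^{\infty} e^{\theta k} f^{*k}(s)\, ds \\
&\overset{\theta = -\Lambda(\rho)}{=} F^c(t) + \int_{s=0}^{t} F^c(t-s) \sum_{k=1}^{\infty} e^{-\rho s}\, e^{\rho s - k\Lambda(\rho)} f^{*k}(s)\, ds \\
&\overset{3.1.10}{=} F^c(t) + \int_{s=0}^{t} F^c(t-s)\, e^{-\rho s} \sum_{k=1}^{\infty} f^{*k}_\rho(s)\, ds \\
&= F^c(t) + e^{-t\rho} \int_{s=0}^{t} F^c(t-s)\, e^{\rho(t-s)} \underbrace{\Big( \frac{1}{E^{(\rho)}[\tau]} - \widetilde{(f_\rho)}(s) \Big)}_{= \frac{1}{E^{(\rho)}[\tau]} - \frac{F^c_\rho(s)}{E^{(\rho)}[\tau]}} ds \\
&= F^c(t) + e^{-t\rho} \int_{s=0}^{t} F^c(t-s)\, e^{\rho(t-s)}\, \frac{F_\rho(s)}{E^{(\rho)}[\tau]}\, ds
\end{align*}
The integral in the last line converges to some value in $(0, \infty)$, which implies that there is no exponential decay or growth and the integral does not contribute to the exponential rate of the expectation:
\begin{align*}
\int_{s=0}^{t} F^c(t-s)\, e^{\rho(t-s)} F_\rho(s)\, ds &\overset{r = t-s}{=} \int_{r=0}^{t} \underbrace{F^c(r)}_{= E[\tau] \tilde{f}(r)} e^{r\rho}\, F_\rho(t-r)\, dr \\
&= \int_{r=0}^{t} (\tilde{f})_\rho(r)\, e^{\tilde{\Lambda}(\rho)}\, F_\rho(t-r)\, dr \\
&= e^{\tilde{\Lambda}(\rho)} \int_{r=0}^{t} F_\rho(t-r)\, d(\tilde{F})_\rho(r) \\
&= e^{\tilde{\Lambda}(\rho)}\, F_\rho \circledast (\tilde{F})_\rho(t) \\
&\to e^{\tilde{\Lambda}(\rho)} \qquad (\text{as } t \to \infty)
\end{align*}
Thus under the exponential scaling
\begin{align*}
\limsup_{t \to \infty} \frac{1}{t} \log E[e^{\theta N_t}] &\le \max\Big\{ \lim_{t \to \infty} \frac{1}{t} \log F^c(t),\ -\rho + \limsup_{t \to \infty} \frac{1}{t} \log \frac{e^{\tilde{\Lambda}(\rho)}}{E^{(\rho)}[\tau]} \Big\} \\
&= \max\{-LC(h),\ -\rho\} = -\rho
\end{align*}
where we applied 2.4.7 (for the decay rate of $F^c$ as $LC(h)$) and $\rho \in D(\Lambda) = (-\infty, LC(h))$, and
\[ \liminf_{t \to \infty} \frac{1}{t} \log E[e^{\theta N_t}] \ge \liminf_{t \to \infty} \frac{1}{t} \log \Big( e^{-t\rho}\, \frac{e^{\tilde{\Lambda}(\rho)}}{E^{(\rho)}[\tau]} \Big) = -\rho. \]
Since $\rho = -\Gamma(\theta)$ the claim is proved.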

3.3.1
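As a sanity check of claim 3.3.1, consider the Markovian case. The following worked example assumes exponential inter event times with rate $\lambda > 0$ (an assumption made here for illustration only), so that $N$ is a Poisson process.

```latex
% Assumption for illustration: tau ~ Exp(lambda), hence N_t ~ Poisson(lambda t).
\begin{align*}
\Lambda(\beta) &= \log E[e^{\beta\tau}] = \log\frac{\lambda}{\lambda-\beta}
  \qquad (\beta < \lambda), \\
-\Lambda(\rho) = \theta
  &\iff \frac{\lambda-\rho}{\lambda} = e^{\theta}
  \iff \rho = \lambda\,(1 - e^{\theta}), \\
\Gamma(\theta) &= -\rho = \lambda\,(e^{\theta} - 1), \\
\frac{1}{t}\log E[e^{\theta N_t}]
  &= \frac{1}{t}\,\lambda t\,(e^{\theta} - 1) = \lambda\,(e^{\theta} - 1)
  \qquad \text{for every } t > 0 .
\end{align*}
```

In this case the limit is attained for every $t$, and $\rho = \lambda(1 - e^{\theta}) < \lambda$ lies in the domain $(-\infty, \lambda)$ of $\Lambda$ for every $\theta \in \mathbb{R}$.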

3.4 Exponential equivalence for cps

When studying large deviations of counting processes it will be convenient to make small alterations to the original process from time to time. In this section we describe these alterations and prove that they do not affect the large deviation behaviour of the process.

Definition 3.4.1 (Scaling). For $N$ a renewal counting process define
\[ N_n : \mathbb{R}_{\ge 0} \to \mathbb{R}, \quad t \mapsto \frac{1}{n} N(nt). \]

For every $n \in \mathbb{N}$ and $T > 0$ the scaled counting processes on $[0, T]$ are elements of $D([0, T], \mathbb{R})$. The sup-norm of any $N_n$ restricted to $[0, T]$ is finite and for two counting processes the sup-norm induced distance will be finite:
\[ \|N_n - N'_n\| \le \|N_n\| + \|N'_n\| = \frac{1}{n} \big( N(nT) + N'(nT) \big) < \infty \]

Definition 3.4.2 (Exponential equivalence in $(D([0, T], \mathbb{R}), \|\cdot\|)$). The sequences of processes $(Y_n; n \in \mathbb{N})$ and $(Z_n; n \in \mathbb{N})$ in $D([0, T], \mathbb{R})$ equipped with the sup-norm are exponentially equivalent if for each $n \in \mathbb{N}$ there is a coupling $(\check{Y}_n, \check{Z}_n)$ of $(Y_n, Z_n)$ such that the sequence of sup-norm distances $(\|\check{Y}_n - \check{Z}_n\|; n \in \mathbb{N})$ decays super exponentially: for any $\delta > 0$
\[ \lim_{n \to \infty} \frac{1}{n} \log P(\|\check{Y}_n - \check{Z}_n\| > \delta) = -\infty. \]

If the sequence of processes used in the exponential equivalence is obvious and for example relates to a counting process under the scaling 3.4.1, then we may say that two processes $N, N'$ are exponentially equivalent when strictly speaking we mean that $(N_n; n \in \mathbb{N})$ and $(N'_n; n \in \mathbb{N})$ are exponentially equivalent.

3.4.1 Initial inter event time

Here we argue for the exponential equivalence of counting processes that differ

only in the distribution of the time to the first event if these distributions

are exponentially equivalent. This will imply exponential equivalence of the

undelayed rcp and the associated rcp with stationary increments.

Definition 3.4.3 ($N$, $N^\sigma$).

• $N$ is an undelayed renewal counting process and $F$ is the distribution function of each inter event time. $\tau$ is the first inter event time of $N$.

• $N^\sigma$ is a delayed renewal counting process with typical inter event time distribution $F$ and initial inter event time $\sigma$ with distribution function $G$.

Claim 3.4.4. If the initial inter event times $\tau, \sigma$ of $N, N^\sigma$ are exponentially equivalent then the counting processes $N, N^\sigma$ are exponentially equivalent.

To prove this we have to give a coupling $\check{N}, \check{N}^\sigma$ of $N, N^\sigma$ such that the difference $\|\check{N}_n - \check{N}^\sigma_n\|$ decays faster than exponentially in $n$.

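Definition 3.4.5 (Coupled $\check{N}$, $\check{N}^\sigma$). Let $\{U, \tau_2, \tau_3, \ldots\}$ be independent random variables where $U$ is uniform on $[0, 1]$ and $\tau_2, \tau_3, \ldots$ have density $f$ and distribution function $F$. Define the counting process $\check{N}$ through its inter event times $F^{-1}(U), \tau_2, \tau_3, \ldots$ and the counting process $\check{N}^\sigma$ through its inter event times $G^{-1}(U), \tau_2, \tau_3, \ldots$ (cf def 3.1.1 of a counting process).

Figure 3.5 is a graphical representation of the coupled $\check{N}$ and $\check{N}^\sigma$.

[Figure 3.5: Inter event times of coupled $\check{N}$ and $\check{N}^\sigma$. Both timelines start at 0 with first inter event time $\tau_1 = \tau$ and $\tau_1 = \sigma$ respectively, followed by the shared $\tau_2$, $\tau_3$, $\tau_4$.]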

Claim 3.4.6 (Marginal distributions). $\mathcal{L}(N) = \mathcal{L}(\check{N})$ and $\mathcal{L}(N^\sigma) = \mathcal{L}(\check{N}^\sigma)$.

Proof of 3.4.6: The inter event times $F^{-1}(U), \tau_2, \tau_3, \ldots$ and $G^{-1}(U), \tau_2, \tau_3, \ldots$ are independent because $U, \tau_2, \tau_3, \ldots$ are. Thus $\check{N}$ and $\check{N}^\sigma$ are rcps. All inter event times of $\check{N}$ have distribution function $F$: the $\tau_2, \tau_3, \ldots$ by definition and $F^{-1}(U)$ by the quantile coupling and (2.6). Thus $\check{N}$ is an undelayed renewal counting process. By the definition 3.1.1 of a counting process its distribution is determined by the distribution of its inter event times, so the distributions of $N$ and $\check{N}$ coincide. Similarly $\check{N}^\sigma$ is a delayed renewal counting process and its first inter event time has the required distribution: $\mathcal{L}(\sigma) = \mathcal{L}(G^{-1}(U))$, again by the quantile coupling and (2.6).

3.4.6

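Lemma 3.4.7. For coupled $\check{N}, \check{N}^\sigma$ of 3.4.5: $\check{N}^\sigma(t) = \check{N}(t + \tau - \sigma)$ (with $\check{N}(s) = 0$ for $s \le 0$).

Proof of 3.4.7: Set $S_k = \tau + \sum_{l=2}^{k} \tau_l$ and $S^\sigma_k = \sigma + \sum_{l=2}^{k} \tau_l$. For $k \ge 1$
\[ \check{N}(t + \tau - \sigma) = k \iff S_k \le t + \tau - \sigma < S_{k+1} \overset{-\tau+\sigma}{\iff} S^\sigma_k \le t < S^\sigma_{k+1} \iff \check{N}^\sigma(t) = k \]
and for $k = 0$
\[ \check{N}(t + \tau - \sigma) = 0 \iff t + \tau - \sigma < \tau \iff t < \sigma \iff \check{N}^\sigma(t) = 0. \]
Consider the case $\tau < \sigma$ and let $0 < s < -\tau + \sigma$. Since $s < \sigma$ we have $\check{N}^\sigma(s) = 0$, and $\check{N}(s + \tau - \sigma) = 0$ since $s + \tau - \sigma < 0$.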

3.4.7
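The time shift in lemma 3.4.7 can be checked on a concrete path. The sketch below fixes hypothetical values for $\tau$, $\sigma$ and the shared later inter event times (in the coupling they would come from $F^{-1}(U)$, $G^{-1}(U)$ and independent draws) and uses exact rational arithmetic to avoid rounding at event epochs.

```python
from bisect import bisect_right
from fractions import Fraction as Fr
from itertools import accumulate

def counting_process(inter_event_times):
    sums = list(accumulate(inter_event_times))
    # N(t) = number of partial sums <= t; this is 0 for all t < S_1,
    # in particular for t <= 0 when the first inter event time is positive.
    return lambda t: bisect_right(sums, t)

tau, sigma = Fr(3, 2), Fr(1, 4)              # hypothetical first inter event times
shared = [Fr(1), Fr(1, 2), Fr(2), Fr(3, 4)]  # tau_2, tau_3, ... shared by both paths

N = counting_process([tau] + shared)
N_sigma = counting_process([sigma] + shared)

grid = [Fr(i, 8) for i in range(0, 33)]      # t in [0, 4] in steps of 1/8
shift_ok = all(N_sigma(t) == N(t + tau - sigma) for t in grid)
print(shift_ok)
```

Only the first inter event time differs between the two paths, so one path is a time shift of the other by $\tau - \sigma$. This is the observation that the proof of 3.4.4 exploits.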

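Proof of 3.4.4: The claim is proved if we can prove
\[ \limsup_{n \to \infty} \frac{1}{n} \log P(\|\check{N}_n - \check{N}^\sigma_n\| > a) = -\infty \tag{3.5} \]
for $\check{N}, \check{N}^\sigma$ of definition 3.4.5 and any $a > 0$. We introduce a small parameter $\delta > 0$.
\begin{align*}
P\Big( \sup_{t \in [0,T]} |\check{N}^\sigma_n(t) - \check{N}_n(t)| > a \Big)
&= P\Big( \sup_{t \in [0,T]} |N(nt + \tau - \sigma) - N(nt)| > na,\ |\tau - \sigma| < \delta n \Big) \\
&\quad + P\Big( \sup_{t \in [0,T]} |N(nt + \tau - \sigma) - N(nt)| > na,\ |\tau - \sigma| \ge \delta n \Big)
\end{align*}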

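(We write $N$ instead of $\check{N}$ again since $\mathcal{L}(N) = \mathcal{L}(\check{N})$.) From the quantile coupling of $\tau$ and $\sigma$ (such that they are exponentially equivalent, cf claim 2.5.7) we already know that the second probability decays superexponentially. We further investigate the first event. Under the condition of a small distance $|\tau - \sigma|$ and in light of the monotonicity of $N$
\begin{align*}
|N(nt + \tau - \sigma) - N(nt)| &\le \max\{ |N(nt + n\delta) - N(nt)|,\ |N((nt - n\delta)^+) - N(nt)| \} \\
&= \max\{ N(nt + n\delta) - N(nt),\ N(nt) - N((nt - n\delta)^+) \}
\end{align*}
and taking the sup over all $t$ makes the $\max\{\ldots\}$ unnecessary.
\[ P\Big( \sup_{t \in [0,T]} |\check{N}^\sigma(nt) - \check{N}(nt)| > na,\ |\tau - \sigma| < \delta n \Big) \le P\Big( \sup_{t \in [0,T]} N(nt + n\delta) - N(nt) > na,\ |\tau - \sigma| < \delta n \Big) \]
And
\[ P\Big( \sup_{t \in [0,T]} |\check{N}^\sigma_n(t) - \check{N}_n(t)| > a \Big) \le P\Big( \sup_{t \in [0,T]} N(nt + n\delta) - N(nt) > na \Big) + P(|\tau - \sigma| \ge n\delta) \]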

Exponential equivalence as stated in (3.5) has now been reduced to
\[ \lim_{\delta \to 0} \limsup_{n \to \infty} \frac{1}{n} \log P\Big( \sup_{t \in [0,T]} N_n(t + \delta) - N_n(t) > a \Big) = -\infty \tag{3.6} \]
Now fix a $\delta < \frac{a}{\lambda}$ (and small relative to $T$) and divide the interval $[0, T]$ into many intervals of size $\delta$. If there is $t$ such that $N_n(t + \delta) - N_n(t) > a$ then this $t$ will be in one of these intervals.

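\begin{align*}
&P\Big( \sup_{t \in [0,T]} N_n(t+\delta) - N_n(t) \ge a \Big) \\
&= P\Big( \max_{m = 0, \ldots, \lfloor T/\delta \rfloor}\ \sup_{t \in [m\delta, (m+1)\delta]} N_n(t+\delta) - N_n(t) \ge a \Big) \\
&\le \sum_{m=0}^{\lfloor T/\delta \rfloor} P\Big( \sup_{t \in [m\delta, (m+1)\delta]} N_n(t+\delta) - N_n(t) \ge a \Big) \\
&= \sum_{m=0}^{\lfloor T/\delta \rfloor} P\Big( C(mn\delta) < n\delta,\ \sup_{t \in [m\delta, (m+1)\delta]} N_n(t+\delta) - N_n(t) \ge a \Big) \\
&= \sum_{m=0}^{\lfloor T/\delta \rfloor} P\Big( \sup_{t \in [m\delta, (m+1)\delta]} \underbrace{\underbrace{N_n(t+\delta)}_{\le N_n((m+2)\delta)} - \underbrace{N_n(t)}_{\ge N_n(m\delta) = \frac{1}{n}(N(nm\delta + C(nm\delta)) - 1)}}_{\le \frac{1}{n} + N^\tau_n(2\delta)} \ge a \ \Big|\ C(mn\delta) < n\delta \Big) \underbrace{P(C(mn\delta) < n\delta)}_{\le 1} \\
&\le \sum_{m=0}^{\lfloor T/\delta \rfloor} P\Big( N^\tau_n(2\delta) \ge a - \frac{1}{n} \Big)
\end{align*}
where we introduced $N^\tau$ as an undelayed rcp to bound increments of $N_n$. (This is meant as a stochastic domination only, which avoids another explicit coupling.) Since $N$ defined in 3.4.3 was undelayed too, we can even omit the $\tau$.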

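\begin{align*}
P\Big( \sup_{t \in [0,T]} N_n(t+\delta) - N_n(t) \ge a \Big) &\le \Big( \Big\lfloor \frac{T}{\delta} \Big\rfloor + 1 \Big) P\Big( N_n(2\delta) > a - \frac{1}{n} \Big) \\
&\overset{a' < a}{\le} \Big( \Big\lfloor \frac{T}{\delta} \Big\rfloor + 1 \Big) P( N_n(2\delta) > a' )
\end{align*}
In the last inequality we need $n > \frac{1}{a - a'}$, which is required for $a - \frac{1}{n} > a'$ to hold for $a' < a$. To apply the scaling we move from counting processes to partial sums since we already have the large deviations for their mean (cf [5], Cramér's theorem 2.2.3).
\begin{align*}
\frac{1}{n} \log P(N_n(2\delta) \ge a') &= \frac{1}{n} \log P(N(n 2\delta) \ge n a') \\
&= \frac{1}{n} \log P(S_{na'} \le n 2\delta) \\
&= a'\, \frac{1}{na'} \log P\Big( \frac{1}{na'} S_{na'} \le \frac{2\delta}{a'} \Big) \xrightarrow{n \to \infty} -a' \Lambda^*\Big( \frac{2\delta}{a'} \Big)
\end{align*}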

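As we have arbitrary $a' < a$ and $\Lambda^*$ is continuous we now have checked (3.6):
\[ \lim_{\delta \to 0} \limsup_{n \to \infty} \frac{1}{n} \log P\Big( \sup_{t \in [0,T]} N_n(t+\delta) - N_n(t) > a \Big) \le -\lim_{\delta \to 0} a \Lambda^*\Big( \frac{2\delta}{a} \Big) = -a \Lambda^*(0) = -\infty \]
Note that $\Lambda^*(0) = \infty$ is a consequence of the no-point-mass at zero property of assumption 2.2.2 and that $\lim_{\delta \to 0} \Lambda^*(\delta) = \infty$ follows from lower semi continuity of $\Lambda^*$.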

3.4.4
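The final limit in the proof can be made explicit in a simple case. The following worked example assumes exponential inter event times with rate $\lambda > 0$ and takes $\Lambda^*$ to be the Legendre transform of $\Lambda$ as in Cramér's theorem; both are assumptions made here for illustration.

```latex
% Assumption for illustration: tau ~ Exp(lambda), Lambda(beta) = log(lambda/(lambda-beta)).
\begin{align*}
\Lambda^*(x) &= \sup_{\beta < \lambda} \{ \beta x - \Lambda(\beta) \}
  = \lambda x - 1 - \log(\lambda x) \qquad (x > 0), \\
  &\qquad \text{with the supremum attained at } \beta = \lambda - \tfrac{1}{x}, \\
a\,\Lambda^*\!\Big(\frac{2\delta}{a}\Big)
  &= 2\lambda\delta - a - a \log\frac{2\lambda\delta}{a}
  \;\longrightarrow\; \infty \qquad (\delta \to 0).
\end{align*}
```

The logarithmic term drives the divergence, so the probability of at least $na$ events in a window of length $2n\delta$ decays super exponentially as $\delta \to 0$.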

The following generalisation of 3.4.4 is immediate.

Corollary 3.4.8. Any two renewal counting processes with the same typical

inter event time distribution and exponentially equivalent initial inter event

times are exponentially equivalent.

3.4.2 Independence of increments

For the Markovian renewal counting process the state at a fixed time s∈

[0, T] and future increments (N(s), N(T)−N(s)) are independent. This

does not hold for non-deterministic inter event distributions different from

the exponential.

The non-independence of N(s) and N(T)−N(s) for a general rcp Nis

through the age B(s) at time sand affects the initial distribution of the

process of increments on [s, T]. We have seen in the last section that a single

initial inter event time may be changed without affecting the large deviation

behaviour of the process. This will be a tool when replacing a renewal count-

ing process by a similar process with a certain independence of increments.

Claim 3.4.9. Let $N$ be a rcp and $N_n$ the associated scaled process. Fix a $k \in \mathbb{N}$ and fix $0 < s_1 < \cdots < s_k < T$. Then there is a sequence of scaled counting processes $(N'_n; n \in \mathbb{N})$ such that $(N_n; n \in \mathbb{N})$, $(N'_n; n \in \mathbb{N})$ are exponentially equivalent in $(D([0, T], \mathbb{R}), \|\cdot\|)$ and the processes of increments of $N'_n$ over disjoint intervals
\[ \big\{ \big( N'_n(t) - N'_n(s_m)\ ;\ t \in [s_m, s_{m+1}] \big) \ \big|\ m = 0, \ldots, k \big\} \]
(with $s_0 = 0$, $s_{k+1} = T$) are independent.

Since rcps with the same Cesaro mean for their initial inter event time distribution are exponentially equivalent we can pick one for the proof of 3.4.9, and we pick $\tilde{N}$ of definition 3.1.5 and section 3.2. The proof will be by induction and we start with the base case $k = 1$.

Claim 3.4.10. Let $\tilde{N}$ be the delayed rcp with stationary increments and for some $T > 0$ let $s \in [0, T]$. Then there is a sequence of counting processes $(N'_n; n \in \mathbb{N})$ such that $(N'_n; n \in \mathbb{N})$ and $(\tilde{N}_n; n \in \mathbb{N})$ are exponentially equivalent in $(D([0, T], \mathbb{R}), \|\cdot\|)$ and for each $n \in \mathbb{N}$
\[ (N'_n(t)\ ;\ t \in [0, s]), \quad (N'_n(t) - N'_n(s)\ ;\ t \in [s, T]) \]
are independent.

Definition 3.4.11 (Restarted process $N^{re,s}$). Given $s > 0$ and rcps $N, N^{(1)}$ define
\[ N^{re,s} : \mathbb{R}_{\ge 0} \to \mathbb{R}, \quad t \mapsto \begin{cases} N(t) & \text{if } t \le s \\ N(s) + N^{(1)}(t - s) & \text{if } t > s. \end{cases} \]

Note that $N^{re,s}$ is a counting process and if $N_{[0,s]}, N^{(1)}$ are independent, then $N^{re,s}(s)$ and the increments of $N^{re,s}$ after $s$ are independent.
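The restart construction is easy to express on sample paths. The sketch below uses hypothetical, deterministic inter event times for $N$ and $N^{(1)}$ purely to show the mechanics; the helper names `counting_process` and `restarted` are not from the text.

```python
from bisect import bisect_right
from itertools import accumulate

def counting_process(inter_event_times):
    sums = list(accumulate(inter_event_times))
    return lambda t: bisect_right(sums, t)

def restarted(N, N1, s):
    """N^{re,s}: follow N on [0, s], then add a fresh process N1 started at s."""
    return lambda t: N(t) if t <= s else N(s) + N1(t - s)

N = counting_process([1.0, 1.0, 1.0, 1.0])   # events at 1, 2, 3, 4
N1 = counting_process([0.25, 0.5, 0.5])      # events 0.25, 0.75, 1.25 after the restart
N_re = restarted(N, N1, s=1.5)
print([N_re(t) for t in (1.0, 1.5, 1.7, 1.75, 2.5)])
```

After the epoch $s$ the path of $N$ is no longer consulted, so the age of $N$ at $s$ has no influence on later increments. This is the independence noted after the definition.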

We want a scaling for the restarted process $N^{re,s}$ that also scales the associated epoch $s$.

Definition 3.4.12 (Scaled restarted process $N^{re,s}_n$). Fix $s > 0$. Let $N$ be a renewal counting process and $(N^{re,ns}; n \in \mathbb{N})$ a sequence of restarted processes associated with $ns, N, N^{(1)}$. Then define
\[ N^{re,s}_n : \mathbb{R}_{\ge 0} \to \mathbb{R}, \quad t \mapsto \frac{1}{n} N^{re,ns}(nt). \]

For some fixed $s$ we now give a coupling of $\tilde{N}$ and a restarted process $N^{re,s}$.

Definition 3.4.13 (Coupling $M, M'$). Let $s > 0$ be fixed. Let $U_1, U_2, \tau'_2, \tau''_2, \tau'_3, \tau''_3, \ldots$ be independent random variables where $U_1, U_2$ are uniform on $[0, 1]$ and all $\tau'_i, \tau''_i$ have distribution function $F$. Let $F_{+\cdot}$ be associated with $F$ (cf definition 2.1.7) and define $(B, C) = (\tilde{F}^{-1}(U_1), F^{-1}_{+B}(U_2))$ through the quantile coupling (cf definition 2.5.2).

Finally define $M$ as the counting process associated with the following sequence of inter event times:

• $B(M, s) = B$

• $C(M, s) = C$

• inter event times after $s + C$ are $\tau'_2, \tau'_3, \ldots$

• inter event times before $s - B$ are $\tau''_2, \tau''_3, \ldots$

And define $M'$ as the counting process with the following inter event times:

• inter event time covering $s$: $C(M', s) = G^{-1}(U_2)$, $B(M', s) = B(M, s)$

• all other inter event times the same as in $M$.

Figure 3.6 is a graphical representation of the definition of inter event times for the coupled $M, M'$.

[Figure 3.6: Inter event times of coupled $M$ (top line) and $M'$ (bottom line). Top line: $\tau''$, $\tau''$, $B$, $F^{-1}_{+B}(U_2)$, $\tau'_2$, $\tau'$; bottom line: the residual lifetime is replaced by $G^{-1}(U_2)$.]

Claim 3.4.14. $\mathcal{L}(M) = \mathcal{L}(\tilde{N})$.

Proof of 3.4.14: The construction of $M$ is a combination of those in section 3.2: We construct the counting process starting in $t$ for $s \ge t$ as in 3.2.3 and also construct its past as the reverse of a stationary counting process in 3.2.5. We make sure that $C = F^{-1}_{+B}(U_2)$ has the correct marginal distribution, namely density $\tilde{f}$:
\begin{align*}
P(C \le x) &= \int_{b=0}^{\infty} P(\underbrace{C \le x}_{\iff F^{-1}_{+b}(U_2) \le x} \mid B = b)\, \tilde{f}(b)\, db = \int_{b=0}^{\infty} F_{+b}(x)\, \tilde{f}(b)\, db \\
&\overset{2.1.7}{=} \int_{b=0}^{\infty} \frac{F(x+b) - F(b)}{F^c(b)}\, \frac{F^c(b)}{E[\tau]}\, db = \int_{b=0}^{\infty} \frac{F(x+b) - F(b)}{E[\tau]}\, db \\
\frac{d}{dx} P(C \le x) &= \frac{\int_{b=0}^{\infty} f(x+b)\, db}{E[\tau]} = \frac{F^c(x)}{E[\tau]} = \tilde{f}(x)
\end{align*}
So we can really use $B$ as the age $B(\tilde{N}, t)$ and $C$ as the residual lifetime $C(\tilde{N}, t)$. The inter event times of $M$ and $\tilde{N}$ that cover $t$ have the same distribution. Since all other inter event times of $M$ and $\tilde{N}$ have the same distribution, too, the distributions of the counting processes coincide.

3.4.14

Claim 3.4.15. $\mathcal{L}(M') = \mathcal{L}(N^{re,s})$

Proof of 3.4.15: The process of increments of $M'$ after $s$ has inter event times $G^{-1}(U_2), \tau'_2, \tau'_3, \ldots$ and is independent of $M'$ on $[0, s]$, so we can identify the increments of $M'$ after $s$, distribution wise, with $N^{(1)}$ of the definition of $N^{re,s}$.

3.4.15

Proof of claim 3.4.10 with $N'_n = N^{re,s}_n$: For each $n \in \mathbb{N}$ let $M, M'$ be the coupled processes defined above associated with $ns$. Then for fixed $n$
\begin{align*}
\sup_{t \in [0, nT]} |M(t) - M'(t)| &= \sup_{t \in [ns, nT]} |M(t) - M'(t)| \\
&= \sup_{t \in [0, n(T-s)]} \big| N^{F^{-1}_{+B}(U_2)}(t) - N^{G^{-1}(U_2)}(t) \big|
\end{align*}
where $N^{\cdot}$ are rcps with typical inter event time distribution function $F$ and the indicated initial inter event times; $F^{-1}_{+b}(U_2)$ and $G^{-1}(U_2) = \tilde{F}^{-1}(U_2)$ have the same Cesaro mean for their hazard functions uniformly in $b \in \mathbb{R}_{\ge 0}$. Thus by section 3.4.1 / corollary 3.4.8 these processes are exponentially equivalent and
\[ \lim_{n \to \infty} \frac{1}{n} \log P\Big( \sup_{t \in [0, n(T-s)]} \big| N^{F^{-1}_{+B}(U_2)}(t) - N^{G^{-1}(U_2)}(t) \big| > na \Big) = -\infty \]



3.4.10

We now prepare for the inductive step:

Definition 3.4.16 (Restarted process $N^{re,(s_1,\ldots,s_k)}$). Given an ordered sequence $0 < s_1 < \cdots < s_k$ and rcps $N, N^{(1)}, \ldots, N^{(k)}$ define
\[ N^{re,(s_1,\ldots,s_k)} : \mathbb{R}_{\ge 0} \to \mathbb{R}, \quad t \mapsto \begin{cases} N(t) & \text{if } t \in [0, s_1] \\ N(s_1) + N^{(1)}(t - s_1) & \text{if } t \in (s_1, s_2] \\ \quad \vdots \\ N(s_1) + \sum_{m=1}^{k-1} N^{(m)}(s_{m+1} - s_m) + N^{(k)}(t - s_k) & \text{if } t \in (s_k, \infty). \end{cases} \]

Definition 3.4.16 generalises definition 3.4.11 in the number of restarted epochs.

Lemma 3.4.17. The following is an equivalent recursive definition for 3.4.16: Given an ordered sequence $0 < s_1 < \cdots < s_k$ and rcps $N, N^{(1)}, \ldots, N^{(k)}$ define

• For $k = 1$ apply definition 3.4.11.

• For $k \ge 2$ and an ordered sequence $0 < s_1 < \cdots < s_k < \infty$ and rcps $N, N^{(1)}, \ldots, N^{(k)}$ let $N^{re,(s_1,\ldots,s_{k-1})}$ be the restarted process associated with $N, N^{(1)}, \ldots, N^{(k-1)}$ and the ordered sequence $s_1, \ldots, s_{k-1}$, and set
\[ N^{re,(s_1,\ldots,s_k)}(t) = \begin{cases} N^{re,(s_1,\ldots,s_{k-1})}(t) & \text{if } t \in [0, s_k] \\ N^{re,(s_1,\ldots,s_{k-1})}(s_k) + N^{(k)}(t - s_k) & \text{if } t \in (s_k, \infty). \end{cases} \]

The proof of 3.4.17 is just by inductively spelling out the definition.

3.4.17

Claim 3.4.18. Let $\tilde{N}$ be the delayed rcp with stationary increments. Fix $T > 0$ and $k \in \mathbb{N}$ and let $0 < s_1, \ldots, s_{k+1} < T$ be an ordered sequence. Assume that for any ordered sequence $0 < s'_1, \ldots, s'_k$ the sequences of processes $(\tilde{N}_n; n \in \mathbb{N})$ and $(N^{re,(s'_1,\ldots,s'_k)}_n; n \in \mathbb{N})$ are exponentially equivalent in $(D([0, T], \mathbb{R}), \|\cdot\|)$. Then $N'_n = N^{re,(s_1,\ldots,s_{k+1})}_n$ is such that the sequences of processes $(\tilde{N}_n; n \in \mathbb{N})$ and $(N'_n; n \in \mathbb{N})$ are exponentially equivalent in $(D([0, T], \mathbb{R}), \|\cdot\|)$ and for each $n \in \mathbb{N}$ the processes of increments
\[ (N'_n(t)\ ;\ t \in [0, s_1]), \quad (N'_n(t) - N'_n(s_1)\ ;\ t \in [s_1, s_2]), \quad \ldots, \quad (N'_n(t) - N'_n(s_{k+1})\ ;\ t \in [s_{k+1}, T]) \]
are independent.

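Proof of 3.4.18: For each $n \in \mathbb{N}$ we couple the sequence of $k + 2$ processes
\[ \{ \tilde{N}_n, \tilde{N}^{(1)}_n, \tilde{N}^{(2)}_n, \ldots, \tilde{N}^{(k)}_n, \tilde{N}^{(k+1)}_n \} \tag{3.7} \]
the following way:

• $\tilde{N}_n, \tilde{N}^{(1)}_n, \tilde{N}^{(2)}_n, \ldots, \tilde{N}^{(k)}_n$ are coupled such that the exponential equivalence of $(\tilde{N}_n; n \in \mathbb{N})$ and $(N^{re,(s_1,\ldots,s_k)}_n; n \in \mathbb{N})$ holds when $N^{re,(s_1,\ldots,s_k)}$ is built from $\tilde{N}_n, \tilde{N}^{(1)}_n, \tilde{N}^{(2)}_n, \ldots, \tilde{N}^{(k)}_n$.

• $\tilde{N}^{(k+1)}_n$ and $\tilde{N}^{(k)}_n$ are coupled as in definition 3.4.13 for $s = n(s_{k+1} - s_k)$.

We have for any process $N$
\begin{align*}
\|\tilde{N}_n - N^{re,(s_1,\ldots,s_{k+1})}_n\| &\le \|\tilde{N}_n - N\| + \|N - N^{re,(s_1,\ldots,s_{k+1})}_n\| \\
P(\|\tilde{N}_n - N^{re,(s_1,\ldots,s_{k+1})}_n\| > \delta) &\le P(\|\tilde{N}_n - N\| + \|N - N^{re,(s_1,\ldots,s_{k+1})}_n\| > \delta) \\
&\le P\Big( \|\tilde{N}_n - N\| > \frac{\delta}{2} \Big) \tag{3.8} \\
&\quad + P\Big( \|N - N^{re,(s_1,\ldots,s_{k+1})}_n\| > \frac{\delta}{2} \Big) \tag{3.9}
\end{align*}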

For fixed n∈Nchoose N=Nre,(s1,...,sk)

nassociated with the ˜

N,˜

N(1),...,

N(k)of (3.7). Then ( ˜

Nn;n∈N) and ( ˜

Nre,(s1,...,sk)

n;n∈N) have an LD

bounded sup-norm distance by construction and the induction assumption:

The probability (3.8) decays super exponentially. For (3.9) first notice that

t∈[0, sk+1] : Nre,(s1,...,sk)

n(t) = Nre,(s1,...,sk+1)

n(t)

since both processes are associated with the same sequence of (3.7). And on

[sk+1, T] first apply the representation of Nre,(s1,...,sk+1)

nof lemma 3.4.17 and

then the plain definition of the restarted process.

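\begin{align*}
t \ge s_{k+1}:\quad & N^{re,(s_1,\dots,s_k)}_n(t) - N^{re,(s_1,\dots,s_{k+1})}_n(t)\\
&= N^{re,(s_1,\dots,s_k)}_n(t) - \Big(N^{re,(s_1,\dots,s_k)}_n(s_{k+1}) + \tilde N^{(k+1)}_n(t - s_{k+1})\Big)\\
&= \tilde N^{(k)}_n(t - s_k) - \tilde N^{(k)}_n(s_{k+1} - s_k) - \tilde N^{(k+1)}_n(t - s_{k+1})
\end{align*}

and the event of the probability in (3.9) can be simplified:

\begin{align*}
\|N^{re,(s_1,\dots,s_k)}_n - N^{re,(s_1,\dots,s_{k+1})}_n\|
&= \sup_{t \in [s_{k+1}, T]} \big|\tilde N^{(k)}_n(t - s_k) - \tilde N^{(k)}_n(s_{k+1} - s_k) - \tilde N^{(k+1)}_n(t - s_{k+1})\big|\\
&= \sup_{t \in [s_{k+1} - s_k,\, T - s_k]} \big|\tilde N^{(k)}_n(t) - \tilde N^{(k)}_n(s_{k+1} - s_k) - \tilde N^{(k+1)}_n(t + s_k - s_{k+1})\big|\\
&= \sup_{t \in [s_{k+1} - s_k,\, T - s_k]} \big|\tilde N^{(k)}_n(t) - \tilde N^{(k)\,re,\,s_{k+1}-s_k}_n(t)\big|\\
&= \|\tilde N^{(k)}_n - \tilde N^{(k)\,re,\,s_{k+1}-s_k}_n\|_{[0,\,T - s_{k+1}]}
\end{align*}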


where $\tilde N^{(k)\,re,\,s_{k+1}-s_k}$ is the once restarted process associated with $\tilde N^{(k)}, \tilde N^{(k+1)}$, both of (3.7). By the base case 3.4.10 of the once restarted process we have exponential equivalence and the probability (3.9) decays super exponentially.

3.4.18

Proof of 3.4.9: By induction with the base case 3.4.10 and inductive step 3.4.18. $N'$ is the restarted process of definition 3.4.16.

3.4.9

Corollary 3.4.19. Let $N$ be a rcp and $N_n$ the associated scaled process. Fix $k \in \mathbb{N}$ and fix $0 < s_1 < \dots < s_k < T$. Then there is a sequence of scaled counting processes $(N'_n; n \in \mathbb{N})$ such that $(N_n; n \in \mathbb{N})$, $(N'_n; n \in \mathbb{N})$ are exponentially equivalent in $(D([0,T],\mathbb{R}), \|\cdot\|)$ and for each $n \in \mathbb{N}$ the increments

$$\{N'_n(s_1),\ N'_n(s_2) - N'_n(s_1),\ \dots,\ N'_n(T) - N'_n(s_k)\}$$

of $N'_n$ are independent.

3.4.3 Interpolation

The counting process is piecewise constant and each realisation is a right-continuous function with left limits, an element of $D([0,T],\mathbb{R})$. Interpolating $N$ and denoting the interpolated counting process $\hat N$ we get an element of $C([0,T],\mathbb{R})$. For any $t$ we have $\hat N(t) - N(t) \in [0,1]$, so $\|\hat N - N\|$ is bounded and the counting process and the interpolated counting process are exponentially equivalent.
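The bound $\hat N(t) - N(t) \in [0,1]$ can be checked on a concrete path. The following is a minimal sketch, assuming that the interpolation connects the points $(0,0), (T_1,1), (T_2,2), \dots$ linearly, where $T_k$ is the time of the $k$-th event, and that the path stays flat after the last known event. The function names and the sample event times are illustrative and not taken from the text.

```python
import bisect


def counting_path(event_times, t):
    """N(t): number of events in [0, t]; a right-continuous step function."""
    return bisect.bisect_right(event_times, t)


def interpolated_path(event_times, t):
    """Piecewise linear path through (0, 0), (T_1, 1), (T_2, 2), ...

    Assumes strictly increasing, strictly positive event times.
    After the last known event the path is kept flat (an assumption).
    """
    k = bisect.bisect_right(event_times, t)  # number of events up to time t
    if k == len(event_times):
        return float(k)
    left = event_times[k - 1] if k > 0 else 0.0
    right = event_times[k]
    return k + (t - left) / (right - left)


if __name__ == "__main__":
    events = [0.5, 1.2, 3.0]  # illustrative event times
    gaps = [interpolated_path(events, i * 0.01) - counting_path(events, i * 0.01)
            for i in range(301)]
    print(min(gaps), max(gaps))
```

With this choice the interpolated path lies above the step path and below the step path shifted up by one, which is all the exponential equivalence argument uses.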

3.5 Conclusions from previous proofs

In this section we apply the exponential equivalences of the last section to

obtain finite dimensional large deviations for the renewal counting process.

Subsection 3.5.2 is on the limiting distribution for the scaled renewal counting

process being concentrated on the space of continuous functions. Technically

it is more a reinterpretation of a statement in the proof of 3.4.4.

3.5.1 Finite dimensional large deviations

We develop finite dimensional large deviation principles for the delayed and

undelayed renewal counting process.

Having calculated the lmgf for the undelayed rcp in 3.3 the following theorem

is now immediately clear.

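Theorem 3.5.1. Let $N$ be the undelayed renewal counting process with typical inter event time density $f$ and lmgf $\Lambda$. Then for $\theta \in \mathbb{R}$

$$\lim_{t \to \infty} \frac{1}{t} \log E[e^{\theta N_t}] = \Gamma(\theta)$$

with $\Gamma(\theta) = -\Lambda^{-1}(-\theta)$. Furthermore a one-dimensional Large Deviation principle holds: for any $s > 0$ and open set $G$ and closed set $F$

$$-s \inf_{x \in G} \Gamma^*(x) \le \liminf_{n \to \infty} \frac{1}{n} \log P(N_n(s) \in G)$$

$$\limsup_{n \to \infty} \frac{1}{n} \log P(N_n(s) \in F) \le -s \inf_{x \in F} \Gamma^*(x)$$

with $\Gamma^*$ the Fenchel-Legendre transform of $\Gamma$.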

Our theorem 3.5.1 is theorem 1 in [10].

Proof of 3.5.1: By the Gärtner-Ellis theorem, cf. [5] Theorem 2.3.6 on p. 44 or 7.5.3 of the appendix.

In terms of large deviations we need not distinguish between the delayed and the undelayed counting process any more. The same theorem holds for delayed rcp that are exponentially equivalent to the undelayed $N$; we have proved exponential equivalence for delayed and undelayed rcp in section 3.4.1.

We state and prove the more general finite dimensional large deviations for the delayed process $\tilde N$. We apply exponential equivalence of $\tilde N$ and the restarted process. Again the theorem immediately generalises to all other rcp exponentially equivalent to $\tilde N$.

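Theorem 3.5.2. Let $\tilde N$ be the delayed renewal counting process with stationary increments and inter event time of density $f$ and lmgf $\Lambda$. Then for any $k \ge 1$ a $k$-dimensional Large Deviation principle holds: for any ordered sequence $s_1, \dots, s_k$ (and for easier notation $s_0 = x_0 = 0$), and open set $G \subseteq \mathbb{R}^k$

$$-\inf_{x \in G} \sum_{r=1}^{k} (s_r - s_{r-1})\, \Gamma^*\!\left(\frac{x_r - x_{r-1}}{s_r - s_{r-1}}\right) \le \liminf_{n \to \infty} \frac{1}{n} \log P(\tilde N_n(s_1, \dots, s_k) \in G)$$

while for any closed set $F \subseteq \mathbb{R}^k$

$$\limsup_{n \to \infty} \frac{1}{n} \log P(\tilde N_n(s_1, \dots, s_k) \in F) \le -\inf_{x \in F} \sum_{r=1}^{k} (s_r - s_{r-1})\, \Gamma^*\!\left(\frac{x_r - x_{r-1}}{s_r - s_{r-1}}\right)$$

with $\Gamma^*$ the Fenchel-Legendre transform of $\Gamma$.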


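Proof of 3.5.2: Again by the Gärtner-Ellis theorem. In the scope of this proof abbreviate $N^{re}_n = N^{re,(s_1,\dots,s_{k-1})}_n$ for the restarted process defined in 3.4.16 associated with $\tilde N, \tilde N^{(1)}, \dots, \tilde N^{(k-1)}$. We calculate the following relatively simple lmgf:

\begin{align*}
E\left[\exp\left\{\left\langle \theta,
\begin{pmatrix} N^{re}_n(s_1)\\ \vdots\\ N^{re}_n(s_k) \end{pmatrix}
\right\rangle\right\}\right]
&\overset{(1)}{=} E\left[\exp\left\{\left\langle
\begin{pmatrix} \theta_1 + \dots + \theta_k\\ \theta_2 + \dots + \theta_k\\ \vdots\\ \theta_k \end{pmatrix},
\begin{pmatrix} N^{re}_n(s_1)\\ N^{re}_n(s_2) - N^{re}_n(s_1)\\ \vdots\\ N^{re}_n(s_k) - N^{re}_n(s_{k-1}) \end{pmatrix}
\right\rangle\right\}\right]\\
&= E\left[\exp\left\{\left\langle
\begin{pmatrix} \theta_1 + \dots + \theta_k\\ \theta_2 + \dots + \theta_k\\ \vdots\\ \theta_k \end{pmatrix},
\begin{pmatrix} \tilde N_n(s_1)\\ \tilde N^{(1)}_n(s_2 - s_1)\\ \vdots\\ \tilde N^{(k-1)}_n(s_k - s_{k-1}) \end{pmatrix}
\right\rangle\right\}\right]\\
&\overset{\tilde N^{(0)} := \tilde N}{=} \prod_{r=1}^{k} E\left[\exp\{(\theta_r + \dots + \theta_k)\, \tilde N^{(r-1)}_n(s_r - s_{r-1})\}\right]
\end{align*}

where in (1) we applied

$$(1):\quad \langle \theta, y \rangle = \langle T^\top \theta, T^{-1} y \rangle, \qquad
T = \begin{pmatrix}
1 & 0 & 0 & \dots & 0\\
1 & 1 & 0 & \dots & 0\\
\vdots & & \ddots & & \vdots\\
1 & 1 & \dots & 1 & 0\\
1 & 1 & \dots & 1 & 1
\end{pmatrix}$$

and thus

\begin{align*}
\lim_{n \to \infty} \frac{1}{n} \log E\left[\exp\left\{\left\langle \theta,
\begin{pmatrix} N^{re}_n(s_1)\\ \vdots\\ N^{re}_n(s_k) \end{pmatrix}
\right\rangle\right\}\right]
&= \sum_{r=1}^{k} \lim_{n \to \infty} \frac{1}{n} \log E\left[\exp\{(\theta_r + \dots + \theta_k)\, \tilde N^{(r-1)}_n(s_r - s_{r-1})\}\right]\\
&= \sum_{r=1}^{k} (s_r - s_{r-1})\, \Gamma(\theta_r + \dots + \theta_k)
\end{align*}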

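And the Fenchel-Legendre transform of this lmgf in some $x \in \mathbb{R}^k$ becomes (first apply to the inner product $\langle \theta, x \rangle$ the same transformation as in (1), then change the variable $\theta$)

\begin{align*}
\sup_{\theta \in \mathbb{R}^k} \Big[ \langle \theta, x \rangle &- \sum_{r=1}^{k} (s_r - s_{r-1})\, \Gamma(\theta_r + \dots + \theta_k) \Big]\\
&= \sup_{\theta \in \mathbb{R}^k} \left[ \left\langle
\begin{pmatrix} \theta_1 + \dots + \theta_k\\ \theta_2 + \dots + \theta_k\\ \vdots\\ \theta_k \end{pmatrix},
\begin{pmatrix} x_1\\ x_2 - x_1\\ \vdots\\ x_k - x_{k-1} \end{pmatrix}
\right\rangle - \sum_{r=1}^{k} (s_r - s_{r-1})\, \Gamma(\theta_r + \dots + \theta_k) \right]\\
&\overset{\xi_r = \theta_r + \dots + \theta_k}{=} \sup_{\xi \in \mathbb{R}^k} \left[ \left\langle \xi,
\begin{pmatrix} x_1\\ x_2 - x_1\\ \vdots\\ x_k - x_{k-1} \end{pmatrix}
\right\rangle - \sum_{r=1}^{k} (s_r - s_{r-1})\, \Gamma(\xi_r) \right]\\
&= \sup_{\xi \in \mathbb{R}^k} \sum_{r=1}^{k} \Big[ \xi_r (x_r - x_{r-1}) - (s_r - s_{r-1})\, \Gamma(\xi_r) \Big]\\
&= \sum_{r=1}^{k} (s_r - s_{r-1}) \sup_{\xi \in \mathbb{R}} \left[ \frac{x_r - x_{r-1}}{s_r - s_{r-1}}\, \xi - \Gamma(\xi) \right]\\
&= \sum_{r=1}^{k} (s_r - s_{r-1})\, \Gamma^*\!\left(\frac{x_r - x_{r-1}}{s_r - s_{r-1}}\right)
\end{align*}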

Now the restarted process is exponentially equivalent to $\tilde N$. By 7.2.3 they have the same lmgf. Alternatively, from the large deviation principle proved for the restarted process $N^{re}$ follows the large deviation principle for $\tilde N$ with the same rate function by [5] theorem 4.2.13. Thus the claim is proved.



3.5.2

Corollary 3.5.3. Any delayed renewal counting process exponentially equivalent to $\tilde N$ satisfies a finite dimensional large deviation principle with the rate function of 3.5.2.
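To make the rate function of 3.5.2 concrete, the following sketch evaluates it in the one case where everything is available in closed form. It assumes exponential inter event times with rate `lam`, so that $\Gamma(\theta) = \lambda(e^\theta - 1)$ and $\Gamma^*(v) = v \log(v/\lambda) - v + \lambda$; this special case and all function names are illustrative assumptions, not part of the general argument. The sketch compares the sum formula with the $k$-dimensional supremum, evaluated at the maximiser obtained from the change of variable $\xi_r = \theta_r + \dots + \theta_k$.

```python
import math


def gamma(theta, lam):
    """lmgf of the Poisson counting process per unit time (assumed case)."""
    return lam * (math.exp(theta) - 1.0)


def gamma_star(v, lam):
    """Fenchel-Legendre transform of gamma for the Poisson case."""
    if v < 0.0:
        return math.inf
    if v == 0.0:
        return lam
    return v * math.log(v / lam) - v + lam


def fd_rate(s, x, lam):
    """sum_r (s_r - s_{r-1}) * Gamma*((x_r - x_{r-1}) / (s_r - s_{r-1}))."""
    total, s_prev, x_prev = 0.0, 0.0, 0.0
    for s_r, x_r in zip(s, x):
        ds = s_r - s_prev
        total += ds * gamma_star((x_r - x_prev) / ds, lam)
        s_prev, x_prev = s_r, x_r
    return total


def objective(theta, s, x, lam):
    """<theta, x> - sum_r (s_r - s_{r-1}) * Gamma(theta_r + ... + theta_k)."""
    value = sum(th * x_r for th, x_r in zip(theta, x))
    s_prev = 0.0
    for r, s_r in enumerate(s):
        value -= (s_r - s_prev) * gamma(sum(theta[r:]), lam)
        s_prev = s_r
    return value


def optimal_theta(s, x, lam):
    """Maximiser for strictly increasing x: xi_r = log(slope_r / lam)."""
    xi, s_prev, x_prev = [], 0.0, 0.0
    for s_r, x_r in zip(s, x):
        xi.append(math.log((x_r - x_prev) / (s_r - s_prev) / lam))
        s_prev, x_prev = s_r, x_r
    # Invert xi_r = theta_r + ... + theta_k.
    return [xi[r] - (xi[r + 1] if r + 1 < len(xi) else 0.0)
            for r in range(len(xi))]


if __name__ == "__main__":
    s, x, lam = [1.0, 2.5, 4.0], [2.0, 3.0, 7.0], 1.5
    theta = optimal_theta(s, x, lam)
    print(fd_rate(s, x, lam), objective(theta, s, x, lam))
```

The rate vanishes exactly on the mean path $x_r = \lambda s_r$, which matches the interpretation of $\Gamma^*$ as the cost of deviating from the typical slope.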


3.5.2 Continuous paths

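Recall section 3.4.1 and equation (3.6). This equation is really about the modulus of continuity for $N_n$. The modulus of continuity is defined as

$$\omega_\delta(f) := \sup\{|f(s) - f(t)| : s, t \in [0,T],\ |s - t| \le \delta\}$$

and for monotonically increasing $f = N_n$ we get

$$\omega_\delta(N_n) = \sup_{t \in [0,T]} \big(N_n(t + \delta) - N_n(t)\big)$$

Claim 3.5.4. $\lim_{\delta \to 0} \lim_{n \to \infty} \omega_\delta(N_n) = 0$ a.s.

Proof of 3.5.4: In the proof of 3.4.4 we showed that (3.6) holds for any fixed $a > 0$. We restate (3.6) in terms of the modulus of continuity:

$$(3.6) \iff \lim_{\delta \to 0} \limsup_{n \to \infty} \frac{1}{n} \log P(\omega_\delta(N_n) > a) = -\infty$$

This implies that for any fixed $a > 0$, $M \in \mathbb{R}$

$$\exists (\delta_0, n_0) : \forall (\delta < \delta_0,\ n \ge n_0) : P(\omega_\delta(N_n) > a) \le e^{-nM}$$

but then $\sum_{n=1}^{\infty} P(\omega_\delta(N_n) > a) < \infty$ and by the Borel-Cantelli lemma $\omega_\delta(N_n) > a$ happens only for finitely many $n$ for almost any fixed path $N$ in the scaling $(N_n; n \in \mathbb{N})$. Thus $\limsup_{n \to \infty} \omega_\delta(N_n) \le a$ for $\delta < \delta_0$ a.s. Thus

$$\lim_{\delta \to 0} \limsup_{n \to \infty} \omega_\delta(N_n) \le \sup_{\delta < \delta_0} \limsup_{n \to \infty} \omega_\delta(N_n) \le a$$

Since $a$ was arbitrary the claim is proved.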

3.5.4

So we know that as $n \to \infty$ the counting process tends to a continuous function. We can look at interpolations $\hat N_n$ of $N_n$ and their distribution on $C([0,T],\mathbb{R})$ for finite $n$. We now know that any limiting distribution will be concentrated on $C([0,T],\mathbb{R})$.

3.6 Change of measure

We start with an intuitive way of defining a change of measure for the process $N$ over compact intervals and then rigorously prove that it is a mean one martingale and has the intuitively clear properties. Since delayed and undelayed counting processes are exponentially equivalent (as long as initial inter event times share the same Cesaro mean of hazard rates) we are free to choose an initial distribution and we will work with the undelayed counting process throughout this section.

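Fix a path of the counting process $N$ over $[0,t]$. The path is defined through the now fixed value at $t$ and the fixed inter event times $\tau_1, \dots, \tau_{N_t}$. The likelihood of the path under the inter event times density $f$ is

$$f(\tau_1) \cdots f(\tau_{N_t})\, F^c(B(t))$$

and the likelihood ratio for two different inter event times densities $f$ and $g$ (with distribution function $G$) is

$$\frac{g(\tau_1) \cdots g(\tau_{N_t})\, G^c(B(t))}{f(\tau_1) \cdots f(\tau_{N_t})\, F^c(B(t))} = \prod_{k=1}^{N_t} \frac{g}{f}(\tau_k)\; \frac{G^c}{F^c}(B(t))$$

In the case of $g = f_\beta$

$$\prod_{k=1}^{N_t} \frac{f_\beta}{f}(\tau_k)\; \frac{F^c_\beta}{F^c}(B(t)) = \prod_{k=1}^{N_t} e^{\beta \tau_k - \Lambda(\beta)}\; \frac{F^c_\beta}{F^c}(B(t)) = e^{\beta (t - B(t)) - N_t \Lambda(\beta)}\; \frac{F^c_\beta}{F^c}(B(t))$$

and this will be our density process.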
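As a sanity check of this density process, consider the special case of exponential inter event times with rate $\lambda$, which is an assumption made only for this illustration. Then $\Lambda(\beta) = \log\frac{\lambda}{\lambda - \beta}$ for $\beta < \lambda$, $f_\beta$ is again exponential with rate $\lambda - \beta$, and $\frac{F^c_\beta}{F^c}(x) = e^{\beta x}$, so the age $B(t)$ cancels and the density process reduces to $e^{\beta t} \left(\frac{\lambda - \beta}{\lambda}\right)^{N_t}$. The sketch below (with hypothetical function names) sums this against the Poisson distribution of $N_t$ and recovers mean one.

```python
import math


def density_process(beta, t, n_events, lam):
    """L(beta, t) for exponential(lam) inter event times; requires beta < lam.

    Equals exp(beta * t) * ((lam - beta) / lam) ** N_t because the
    age-dependent factors cancel in the exponential case.
    """
    return math.exp(beta * t) * ((lam - beta) / lam) ** n_events


def poisson_pmf(k, mean):
    """P(N_t = k) for a Poisson random variable with the given mean > 0."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def expected_density(beta, t, lam, kmax=200):
    """E[L(beta, t)] as a truncated sum over the values of N_t."""
    return sum(density_process(beta, t, k, lam) * poisson_pmf(k, lam * t)
               for k in range(kmax + 1))


if __name__ == "__main__":
    print(expected_density(0.4, 3.0, 2.0))
    print(expected_density(-1.0, 3.0, 2.0))
```

In this case the formula is the familiar likelihood ratio between two Poisson processes with rates $\lambda - \beta$ and $\lambda$.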

3.6.1 Martingale property

We prove the martingale property directly relying only on the renewal prop-

erty of the counting process.

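Claim 3.6.1. For $\beta \in \mathcal{D}(\Lambda)$ and any $T \in \mathbb{R}$ the process

$$L(\beta, \cdot) : [0,T] \to [0,\infty), \quad t \mapsto e^{\beta (t - B(t)) - N_t \Lambda(\beta)}\; \frac{F^c_\beta}{F^c}(B(t)) \tag{3.10}$$

is a martingale with respect to the natural filtration of $N$.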

Proof of 3.6.1: $L(\beta, t) < \infty$ always for unbounded $\tau$. If $\tau$ is bounded and has no point mass on its least upper bound, $b$ say, then $P(F^c(B(t)) = 0) = P(\tau = b) = 0$ and $L(\beta, t) < \infty$ a.s. Changing the density of $\tau$ to $f(b) = 0$ makes $L(\beta, t) < \infty$ always.

Measurability wrt the filtration generated by $N$ is immediate: observing $N$ up to time $t$ we know $N_t$ and $B(t)$, which $L(\beta, t)$ is a function of. We prove integrability in the following claim 3.6.2 by calculating the mean and then innovation in claim 3.6.3.

3.6.1


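Claim 3.6.2 (Integrability). $E[L(\beta, t)] = 1$ for all $\beta$ and $t \ge 0$.

Proof of 3.6.2: We need to know about the distribution of the age when $N_t = k$ is fixed. If $k = 0$ then $B(t) = t$ for sure. For specified $k \ge 1$ we have an explicit density for the age (cf 3.1.17).

\begin{align*}
E[L(\beta, t)]
&= \sum_{k=0}^{\infty} E\Big[e^{\beta (t - B(t)) - N_t \Lambda(\beta)}\, \frac{F^c_\beta}{F^c}(B(t))\, \mathbf{1}_{N_t = k}\Big]\\
&= \frac{F^c_\beta}{F^c}(t)\, F^c(t) + \sum_{k=1}^{\infty} \int_{x=0}^{t} E\Big[e^{\beta (t - B(t)) - k \Lambda(\beta)}\, \frac{F^c_\beta}{F^c}(B(t))\, \mathbf{1}_{N_t = k} \,\Big|\, B(t) = x\Big]\, F^c(x)\, f^{*k}(t - x)\, dx\\
&= F^c_\beta(t) + \sum_{k=1}^{\infty} \int_{x=0}^{t} \underbrace{e^{\beta (t - x) - k \Lambda(\beta)}\, \frac{F^c_\beta}{F^c}(x)\, F^c(x)\, f^{*k}(t - x)}_{= F^c_\beta(x)\, f^{*k}_\beta(t - x)}\, dx\\
&= F^c_\beta(t) + \sum_{k=1}^{\infty} F^c_\beta \circledast F^{\circledast k}_\beta(t)\\
&= P^{(\beta)}(N_t = 0) + \sum_{k=1}^{\infty} P^{(\beta)}(N_t = k) = 1
\end{align*}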



3.6.2

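Claim 3.6.3 (Innovation). $E[L(\beta, t) \mid (N_r; r \le s)] = L(\beta, s)$ for any $\beta \in \mathcal{D}(\Lambda)$ and all $t, s$ with $s \le t$.

Proof of 3.6.3: Let us first investigate the conditional expectation of the claim.

\begin{align*}
E[L(\beta, t) \mid (N_r; r \le s)]
&= E\Big[\prod_{k=1}^{N_t} \exp\{\beta \tau_k - \Lambda(\beta)\}\, \frac{F^c_\beta}{F^c}(B(t)) \,\Big|\, (N_r; r \le s)\Big]\\
&= \prod_{k=1}^{N_s} \exp\{\beta \tau_k - \Lambda(\beta)\}\; E\Big[\prod_{k=N_s+1}^{N_t} \exp\{\beta \tau_k - \Lambda(\beta)\}\, \frac{F^c_\beta}{F^c}(B(t)) \,\Big|\, (N_r; r \le s)\Big]
\end{align*}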

For the remaining conditional expectation we do not need the condition on the whole past of the process: all the information the integrand requires is in the state $N(s)$ and the age $B(s)$ at time $s$. Claim 3.6.3 is equivalent to the following

$$E\Big[\prod_{k=N_s+1}^{N_t} \exp\{\beta \tau_k - \Lambda(\beta)\}\, \frac{F^c_\beta}{F^c}(B_t) \,\Big|\, B_s, N_s\Big] = \frac{F^c_\beta}{F^c}(B_s) \tag{3.11}$$

for any $\beta \in \mathcal{D}(\Lambda)$ and all $t, s$ with $s \le t$. In (3.11) there is $\tau_{N(s)+1}$, the inter event time covering $s$, and we condition on $B(s)$, the age of the process at time $s$. We in fact condition on $\tau_{N(s)+1} > b$ for $b = B(s)$, which allows us to apply the distribution function $F^{+a}$ (cf 2.1.7) with $a = B(s)$ to $\tau_{N(s)+1}$. We denote the density of $F^{+B(s)}$ by $f^{+B(s)}$.

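Now introduce the indicator $\mathbf{1}_{N_t = N_s + l}$ with $l \in \mathbb{N}$. For $l = 0$ we have an empty product in the following

$$E\Big[\prod_{k=N_s+1}^{N_t} \exp\{\beta \tau_k - \Lambda(\beta)\}\, \frac{F^c_\beta}{F^c}(B_t)\, \mathbf{1}_{N_t = N_s} \,\Big|\, B_s, N_s\Big] = \frac{F^c_\beta(t - s + B_s)}{F^c(B_s)} \tag{3.12}$$

And we have applied $N_t = N_s \Rightarrow B_t = B_s + t - s$. Now for general $l \ge 1$.

\begin{align}
E\Big[\prod_{k=N_s+1}^{N_t} &\exp\{\beta \tau_k - \Lambda(\beta)\}\, \frac{F^c_\beta}{F^c}(B(t))\, \mathbf{1}_{N_t = N_s + l} \,\Big|\, B(s), N_s\Big] \notag\\
&= E\Big[e^{\beta \tau_{N_s+1} - \Lambda(\beta)} \prod_{k=N_s+2}^{N_t} \exp\{\beta \tau_k - \Lambda(\beta)\}\, \frac{F^c_\beta}{F^c}(B(t))\, \mathbf{1}_{N_t = N_s + l} \,\Big|\, B(s), N_s\Big] \tag{3.13}
\end{align}

Now $\tau_{N_s+1} = B_s + C_s$ where $B_s$ is known. We solve for the unknown

$$C_s = \tau_{N_s+1} - B_s$$

which has density $f^{+a}$ of definition 2.1.7 with $a = B_s$ (cf 3.1.14).

\begin{align}
(3.13) &= \int_{x=0}^{\infty} E\Big[e^{\beta (B_s + C_s) - \Lambda(\beta)} \prod_{k=N_s+2}^{N_t} \exp\{\beta \tau_k - \Lambda(\beta)\}\, \frac{F^c_\beta}{F^c}(B_t)\, \mathbf{1}_{N_t = N_s + l} \,\Big|\, B_s, N_s, C_s = x\Big]\, f^{+B_s}(x)\, dx \notag\\
&= \int_{x=0}^{t-s} e^{\beta (B_s + x) - \Lambda(\beta)}\, E\Big[\prod_{k=N_s+2}^{N_t} \exp\{\beta \tau_k - \Lambda(\beta)\}\, \frac{F^c_\beta}{F^c}(B_t)\, \mathbf{1}_{N_t = N_s + l} \,\Big|\, B_s, N_s, C_s = x\Big]\, f^{+B_s}(x)\, dx \tag{3.14}
\end{align}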


In the remaining conditional expectation we have inter event times $\tau_{N_s+2}, \tau_{N_s+3}, \dots$ that are independent of $N_s$ and identically distributed to $\tau_1, \tau_2, \dots$. The age $B(t) = B(N, t)$ conditional on $N$ having an event at $s + x$ has the same distribution as the age of the renewal counting process $N^{\tau_{N(s)+2}}$ defined through its inter event times $\tau_{N(s)+2}, \tau_{N(s)+3}, \dots$ at time $t - (s + x)$:

$$\mathcal{L}\big(B(N, t) \mid s, N_s, C_s = x\big) = \mathcal{L}\big(B(N^{\tau_{N_s+2}}, t - (s + x)) \mid s, N_s\big)$$

And from the just mentioned independence

$$\mathcal{L}\big(B(N^{\tau_{N_s+2}}, t - (s + x)) \mid s, N_s\big) = \mathcal{L}\big(B(N^{\tau_1}, t - (s + x))\big)$$

The event $N_t = N_s + l$ translates into $1 + N^{\tau_{N_s+2}}(t - (s + x)) = l$ or equivalently $N^{\tau_1}(t - (s + x)) = l - 1$. In the product we had $k = N_s, \dots, N_t$ for $N_t = N_s + l$, so we did take the product over all $l - 1$ inter event times of $N$ between $[s + C(s), t]$. These inter event times now become $\tau_1, \dots, \tau_{l-1}$ or $\tau_1, \dots, \tau_{N^{\tau_1}_{t - (s + x)}}$.

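We summarise all just mentioned changes:

$$(3.14) = \int_{x=0}^{t-s} e^{\beta (B(s) + x) - \Lambda(\beta)}\, E\Big[\prod_{k=1}^{N^{\tau_1}(t - (s + x))} \exp\{\beta \tau_k - \Lambda(\beta)\}\, \frac{F^c_\beta}{F^c}\big(B(N^{\tau_1}, t - (s + x))\big)\, \mathbf{1}_{N^{\tau_1}(t - (s + x)) = l - 1}\Big]\, f^{+B(s)}(x)\, dx$$

and summing expressions (3.13) over $l \ge 1$

$$\sum_{l=1}^{\infty} (3.13) = \int_{x=0}^{t-s} e^{\beta (B(s) + x) - \Lambda(\beta)}\, E\Big[\prod_{k=1}^{N^{\tau_1}(t - (s + x))} \exp\{\beta \tau_k - \Lambda(\beta)\}\, \frac{F^c_\beta}{F^c}\big(B(N^{\tau_1}, t - (s + x))\big) \underbrace{\sum_{l=1}^{\infty} \mathbf{1}_{N^{\tau_1}(t - (s + x)) = l - 1}}_{= 1}\Big]\, f^{+B(s)}(x)\, dx$$

and applying 3.6.2 to $N^{\tau_1}$ at $t - (s + x)$

$$\int_{x=0}^{t-s} e^{\beta (B(s) + x) - \Lambda(\beta)}\, \underbrace{E\Big[\prod_{k=1}^{N^{\tau_1}(t - (s + x))} \exp\{\beta \tau_k - \Lambda(\beta)\}\, \frac{F^c_\beta}{F^c}\big(B(N^{\tau_1}, t - (s + x))\big)\Big]}_{= 1}\, f^{+B(s)}(x)\, dx$$

we can simplify

\begin{align*}
\sum_{l=1}^{\infty} (3.13)
&= \int_{x=0}^{t-s} e^{\beta (B(s) + x) - \Lambda(\beta)} \underbrace{f^{+B(s)}(x)}_{= \frac{f(x + B(s))}{F^c(B(s))}}\, dx\\
&= \frac{1}{F^c(B(s))} \int_{x=0}^{t-s} e^{\beta (B(s) + x) - \Lambda(\beta)}\, f(x + B(s))\, dx\\
&= \frac{1}{F^c(B(s))} \int_{x=B(s)}^{t - s + B(s)} f_\beta(x)\, dx\\
&= \frac{1}{F^c(B(s))} \big(F_\beta(t - s + B(s)) - F_\beta(B(s))\big)\\
&= \frac{1}{F^c(B(s))} \big({-F^c_\beta(t - s + B(s))} + F^c_\beta(B(s))\big)\\
&= -\frac{F^c_\beta(t - s + B(s))}{F^c(B(s))} + \frac{F^c_\beta(B(s))}{F^c(B(s))}
\end{align*}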

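which finally results in

\begin{align*}
E\Big[\prod_{k=N_s+1}^{N_t} &\exp\{\beta \tau_k - \Lambda(\beta)\}\, \frac{F^c_\beta}{F^c}(B(t)) \,\Big|\, B(s), N_s\Big]\\
&= \sum_{l=0}^{\infty} E\Big[\prod_{k=N_s+1}^{N_t} \exp\{\beta \tau_k - \Lambda(\beta)\}\, \frac{F^c_\beta}{F^c}(B(t))\, \mathbf{1}_{N_t = N_s + l} \,\Big|\, B(s), N_s\Big]\\
&= (3.12) + \sum_{l=1}^{\infty} (3.13)\\
&= \frac{F^c_\beta(t - s + B(s))}{F^c(B(s))} + \left(-\frac{F^c_\beta(t - s + B(s))}{F^c(B(s))} + \frac{F^c_\beta(B(s))}{F^c(B(s))}\right)\\
&= \frac{F^c_\beta(B(s))}{F^c(B(s))}
\end{align*}

and we got (3.11).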

3.6.3

For an alternative proof cf. [4] proposition 13.3.V on p. 535.

3.6.2 The twisted distribution

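We rearrange the non-negative mean one martingale $L(\beta, \cdot)$ a little and change the parameter from $\beta$ to $\gamma$ such that $\beta = -\Gamma(\gamma)$. Since $\mathcal{D}(\Gamma) = \mathbb{R}$ and $\Lambda(\mathcal{D}(\Lambda)) = \mathbb{R}$ we can start with any $\gamma$ and find the matching $\beta$. The following is well defined.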


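Definition 3.6.4. $M(\gamma, t) := L(-\Gamma(\gamma), t)$ for $\gamma \in \mathbb{R}$, $t \in \mathbb{R}_{\ge 0}$.

We get a simplification in $L(\beta, t)$ from $\Lambda(-\Gamma(\gamma)) = \Lambda(\Lambda^{-1}(-\gamma)) = -\gamma$

\begin{align*}
L(-\Gamma(\gamma), t) &\overset{(3.10)}{=} e^{-\Gamma(\gamma)\,(t - B(t)) - N_t \Lambda(-\Gamma(\gamma))}\; \frac{F^c_{-\Gamma(\gamma)}}{F^c}(B(t))\\
&= e^{\gamma N_t - t \Gamma(\gamma)}\; \frac{F^c_{-\Gamma(\gamma)}}{F^c}(B(t))\, e^{\Gamma(\gamma) B(t)}
\end{align*}

and will write

$$M(\gamma, t) = \exp\{\gamma N_t - t \Gamma(\gamma)\}\; r(-\Gamma(\gamma), t) \tag{3.15}$$

with

Definition 3.6.5. For $t \ge 0$ and $\beta$ such that $F_\beta$ is well defined (cf 2.3.1) set

$$r(\beta, t) := \frac{F^c_\beta}{F^c}(B(t))\, e^{-\beta B(t)}.$$

Note that $r$ is measurable by continuity of $(\beta, x) \mapsto \frac{F^c_\beta}{F^c}(x)\, e^{-\beta x}$ and measurability of $t \mapsto B(t)$.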

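Claim 3.6.6. The one dimensional distributions of the counting process under the change of measure $M(\gamma, \cdot)$ coincide with the one dimensional distributions of a renewal counting process with inter event times densities $f_\beta$ for $\beta = -\Gamma(\gamma)$.

Proof of 3.6.6: It holds for $k = 0$ and arbitrary $t \ge 0$:

$$P^{[\gamma]}(N_t = 0) = E\Big[\mathbf{1}_{N_t = 0}\, e^{\gamma 0 - (t - t) \Gamma(\gamma)}\, \frac{F^c_\beta}{F^c}(t)\Big] = E[\mathbf{1}_{N_t = 0}]\, \frac{F^c_\beta}{F^c}(t) = F^c_\beta(t)$$

and for $k \ge 1$ and $t > 0$, applying $\beta = -\Gamma(\gamma) \Leftrightarrow \Lambda(\beta) = -\gamma$:

\begin{align*}
P^{[\gamma]}(N_t = k)
&= E\Big[\mathbf{1}_{N_t = k}\, e^{\gamma N_t - (t - B(t)) \Gamma(\gamma)}\, \frac{F^c_\beta}{F^c}(B(t))\Big]\\
&\overset{3.3}{=} \int_{s=0}^{t} E\Big[\mathbf{1}_{N_t = k}\, e^{\gamma N_t - (t - B(t)) \Gamma(\gamma)}\, \frac{F^c_\beta}{F^c}(B(t)) \,\Big|\, B(t) = s, N_t = k\Big]\, F^c(s)\, f^{*k}(t - s)\, ds\\
&= \int_{s=0}^{t} e^{\gamma k - (t - s) \Gamma(\gamma)}\, \frac{F^c_\beta}{F^c}(s)\, F^c(s)\, f^{*k}(t - s)\, ds\\
&= \int_{s=0}^{t} \underbrace{e^{\gamma k - (t - s) \Gamma(\gamma)}\, f^{*k}(t - s)}_{= f^{*k}_\beta(t - s)}\, F^c_\beta(s)\, ds\\
&= F^c_\beta \circledast F^{\circledast k}_\beta(t)
\end{align*}

3.6.6

Here we did not fix the order of convoluting and exponentially twisting due to 3.1.10.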

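Claim 3.6.7. The two dimensional distributions of the counting process under the change of measure $M(\gamma, \cdot)$ coincide with two dimensional distributions of a renewal counting process with inter event times densities $f_\beta$ for $\beta = -\Gamma(\gamma)$.

Proof of 3.6.7: First the untwisted counting process. We apply the joint distribution of state and residual lifetime introduced in 3.1.16 for $k \ge 0$, $l \ge 1$. Remember that $F^{\circledast 0} = \mathbf{1}_{(0,\infty)}$.

\begin{align*}
P(N_{t_1} = k, N_{t_2} = k + l)
&= \int_{r=0}^{t_2 - t_1} P(N_{t_1} = k, N_{t_2} = k + l \mid N_{t_1} = k, C(t_1) = r) \int_{p=0}^{t_1} f(t_1 + r - p)\, f^{*k}(p)\, dp\, dr\\
&= \int_{r=0}^{t_2 - t_1} \underbrace{P(N_{t_2} = k + l \mid N_{t_1} = k, C(t_1) = r)}_{= P(N'_{t_2 - t_1 - r} = l - 1)} \int_{p=0}^{t_1} f(t_1 + r - p)\, f^{*k}(p)\, dp\, dr
\end{align*}

with $N'$ counting increments of $N$ after $t_1 + C(t_1)$ and $N'$ an undelayed renewal counting process (cf section 3.2). Now for $k \ge 0$, $l = 0$.

\begin{align*}
P(N_{t_1} = k, N_{t_2} = k) &= P(N_{t_1} = k, C(t_1) > t_2 - t_1)\\
&= \int_{r=0}^{t_1} F^c(t_1 + (t_2 - t_1) - r)\, dF^{\circledast k}(r)\\
&= \int_{r=0}^{t_1} F^c(t_2 - r)\, dF^{\circledast k}(r)
\end{align*}

Now we investigate the twisted process for $k \ge 0$, $l \ge 1$.

\begin{align}
P^{[\gamma]}(N_{t_1} = k, N_{t_2} = k + l)
= \int_{r=0}^{t_2 - t_1} \int_{s=0}^{t_1} &E\Big[\mathbf{1}_{N_{t_1} = k}\, \mathbf{1}_{N_{t_2} = k + l}\, e^{\gamma N_{t_2} - (t_2 - B(t_2)) \Gamma(\gamma)}\, \frac{F^c_\beta}{F^c}(B(t_2)) \notag\\
&\Big|\, N_{t_1} = k, B(t_1) = s, C(t_1) = r\Big]\, f(t_1 + r - s)\, f^{*k}(s)\, ds\, dr \tag{3.16}
\end{align}

Let again $N'$ count increments of $N$ after $t_1 + C(t_1)$. Then the notation changes:

\begin{align*}
N_{t_2} - N_{t_1} \mid C(t_1) = r &\to 1 + N'_{t_2 - t_1 - r}\\
N_{t_2} &= N_{t_1} + N_{t_2} - N_{t_1} = N_{t_1} + 1 + N'_{t_2 - t_1 - r}\\
B(N, t_2) &\to B(N', t_2 - t_1 - r)\\
t_2 - B(t_2) &= t_1 + r - s + t_2 - t_1 - r - B(N, t_2) + s\\
&= t_1 + r - s + t_2 - t_1 - r - B(N', t_2 - t_1 - r) + s
\end{align*}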


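Continue

\begin{align*}
(3.16)
&= \int_{r=0}^{t_2 - t_1} \int_{s=0}^{t_1} E\Big[\mathbf{1}_{N'_{t_2 - t_1 - r} = l - 1}\, e^{\gamma N'_{t_2 - t_1 - r} - (t_2 - t_1 - r - B(N', t_2 - t_1 - r)) \Gamma(\gamma)}\, \frac{F^c_\beta}{F^c}\big(B(N', t_2 - t_1 - r)\big)\\
&\qquad\qquad \Big|\, N_{t_1} = k, B(t_1) = s, C(t_1) = r\Big]\; e^{(k+1) \gamma - (t_1 + r) \Gamma(\gamma)}\, f(t_1 + r - s)\, f^{*k}(s)\, ds\, dr\\
&= \int_{r=0}^{t_2 - t_1} \int_{s=0}^{t_1} E^{[\gamma]}\big[\mathbf{1}_{N'_{t_2 - t_1 - r} = l - 1}\big]\, f_\beta(t_1 + r - s)\, f^{*k}_\beta(s)\, ds\, dr
\end{align*}

to get the twisted analogue of the untwisted finite dimensional distribution of $t_1, t_2$. The case of $l = 0$ for the twisted process is omitted.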

3.6.7

Iterating this, all finite dimensional distributions of the counting process under the change of measure $M(\gamma, \cdot)$ are those of a renewal counting process with inter event time densities $f_\beta$ (for $\beta = -\Gamma(\gamma)$):

Conclusion 3.6.8. Let $N$ be a renewal counting process with typical inter event time density $f$ and lmgf $\Lambda$. Let $t \mapsto M(\gamma, t)$ be the change of measure process defined in 3.6.4. Under this change of measure the counting process remains renewal and inter event times now have density $f_\beta$ with $\beta = -\Gamma(\gamma) = \Lambda^{-1}(-\gamma)$.

We now investigate the change of measure process further.

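Lemma 3.6.9. Let $\tau$ be an inter event time with density $f$ and distribution function $F$ and $\beta \in \mathcal{D}(\Lambda)$. Then $\inf_{x : F(x) < 1} \frac{F^c_\beta}{F^c}(x)\, e^{-\beta x} > 0$.

Proof of 3.6.9: The function $x \mapsto \frac{F^c_\beta}{F^c}(x)\, e^{-\beta x}$ is continuous on $\{x \mid F(x) < 1\}$ from the existence of the density $f$ for $F$. If $\tau$ is unbounded the infimum is over $\mathbb{R}_{\ge 0}$, otherwise (if $\tau \in (0, b)$, say) over some bounded interval. The function is positive for each fixed $x$ and thus has a positive minimum over any compact interval within $\{x \mid F(x) < 1\}$. We still need a positive liminf as $x \to \infty$ or $x \to b$. We do a calculation for $\tau$ unbounded, but the same holds for bounded $\tau$.

\begin{align*}
\frac{F^c_\beta}{F^c}(x)\, e^{-\beta x}
&= \int_{s=x}^{\infty} f_\beta(s)\, ds\; \frac{1}{F^c(x)}\, e^{-\beta x}\\
&= \int_{s=x}^{\infty} e^{\beta s - \Lambda(\beta)}\, \frac{f(s)}{F^c(x)}\, ds\; e^{-\beta x}\\
&= \int_{s=x}^{\infty} e^{\beta s}\, \frac{f(s)}{F^c(x)}\, ds\; e^{-\Lambda(\beta) - \beta x}\\
&= E[e^{\beta (\tau - x)} \mid \tau > x]\; e^{-\Lambda(\beta)}
\end{align*}

Thus for $\beta > 0$

$$\frac{F^c_\beta}{F^c}(x)\, e^{-\beta x} = E[e^{\beta (\tau - x)} \mid \tau > x]\; e^{-\Lambda(\beta)} > e^{-\Lambda(\beta)} > 0$$

(the $e^{-\Lambda(\beta)} > 0$ requires $\beta \in \mathcal{D}(\Lambda)$). If on the other hand $\beta < 0$, by an application of Jensen's inequality

$$\frac{F^c_\beta}{F^c}(x)\, e^{-\beta x} = E[e^{\beta (\tau - x)} \mid \tau > x]\; e^{-\Lambda(\beta)} \ge e^{\beta E[\tau - x \mid \tau > x]}\, e^{-\Lambda(\beta)}$$

and

\begin{align*}
\epsilon > \frac{F^c_\beta}{F^c}(x)\, e^{-\beta x} &\Rightarrow \epsilon > e^{\beta E[\tau - x \mid \tau > x]}\, e^{-\Lambda(\beta)}\\
&\Leftrightarrow \frac{1}{\beta} \log\big(\epsilon\, e^{\Lambda(\beta)}\big) < E[\tau - x \mid \tau > x]
\end{align*}

and the last inequality holds for arbitrarily small $\epsilon$ only if $\limsup_{x \to \infty} E[\tau - x \mid \tau > x] = \infty$. We have assumed in 2.2.13 that this does not happen.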

3.6.9

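We can often do without the cited assumption 2.2.13 but then have to consider different cases:

• For bounded $\tau$ and $\beta < 0$ we could have argued directly:

$$\tau > x \Rightarrow \tau \in (x, b) \overset{\beta < 0}{\Rightarrow} \beta (\tau - x) > \beta (b - x)$$

implying $E[e^{\beta (\tau - x)} \mid \tau > x] > e^{\beta (b - x)} > e^{\beta b} > 0$, which makes $\frac{F^c_\beta}{F^c}(x)\, e^{-\beta x}$ strictly positive for all $x \in (0, b)$.

• If for unbounded $\tau$ the limit $\lim_{x \to \infty} h(x)$ exists in $(0, \infty]$ then $\lim_{x \to \infty} E[\tau - x \mid \tau > x] = \frac{1}{\lim_{x \to \infty} h(x)} < \infty$ (cf (7.4) of the appendix with $E[\tau^{+x}]$ the expectation under the distribution $F^{+x}$ (cf definition 2.1.7), so $E[\tau^{+x}] = E[\tau - x \mid \tau > x]$).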

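Lemma 3.6.10. Let $\tau$ be an inter event time with density $f$ and distribution function $F$ and $\gamma \in \mathcal{D}(\Lambda)$. Then $\sup_{x : F(x) < 1} \frac{F^c_\gamma}{F^c}(x)\, e^{-\gamma x} < \infty$.

Proof of 3.6.10: We have seen that $\inf_x \frac{F^c_\beta}{F^c}(x)\, e^{-\beta x} > 0$ for any $\beta \in \mathcal{D}(\Lambda)$. Since $0 \in \mathcal{D}(\Lambda)$ we also have $\alpha \in \mathcal{D}(\Lambda) + \alpha = \mathcal{D}(\Lambda_{-\alpha})$ for any $\alpha \in \mathbb{R}$ (cf 2.5). For $-\alpha \in \mathcal{D}(\Lambda)$ we apply claim 3.6.9 to the inter event time with distribution function $F_{-\alpha}$¹ (and parameter $\alpha$ in the place of $\beta$):

$$0 < \inf_{x : F_{-\alpha}(x) < 1} \frac{(F_{-\alpha})^c_\alpha}{(F_{-\alpha})^c}(x)\, e^{-\alpha x} = \inf_{x : F(x) < 1} \frac{F^c}{F^c_{-\alpha}}(x)\, e^{-\alpha x} = \frac{1}{\sup_{x : F(x) < 1} \frac{F^c_{-\alpha}}{F^c}(x)\, e^{\alpha x}}$$

(With $\{x \mid F_{-\alpha}(x) < 1\} = \{x \mid F(x) < 1\}$ from equivalence of measures / distribution functions under the exponential twist; for $(F_{-\alpha})_\alpha = F$ cf lemma 2.3.3.) So we got the claim with $\gamma = -\alpha$ and our only requirement was $-\alpha \in \mathcal{D}(\Lambda)$.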

3.6.10

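Claim 3.6.11. $t \mapsto r(\beta, t)$ defined in 3.6.5 is bounded and strictly positive.

Proof of 3.6.11: For unbounded $\tau$

$$\inf_{t \in \mathbb{R}_{\ge 0}} r(\beta, t) = \inf_{t \in \mathbb{R}_{\ge 0}} \frac{F^c_\beta}{F^c}(B(t))\, e^{-\beta B(t)} \overset{B(t) \in [0,t] \subseteq \mathbb{R}_{\ge 0}}{\ge} \inf_{x \in \mathbb{R}_{\ge 0}} \frac{F^c_\beta}{F^c}(x)\, e^{-\beta x} > 0$$

$$\sup_{t \in \mathbb{R}_{\ge 0}} r(\beta, t) = \sup_{t \in \mathbb{R}_{\ge 0}} \frac{F^c_\beta}{F^c}(B(t))\, e^{-\beta B(t)} \overset{B(t) \in [0,t] \subseteq \mathbb{R}_{\ge 0}}{\le} \sup_{x \in \mathbb{R}_{\ge 0}} \frac{F^c_\beta}{F^c}(x)\, e^{-\beta x} < \infty$$

while for bounded $\tau \in (0, b)$ we have $B(t) \in (0, b)$ and we apply the infimum and supremum over $(0, b) = \{x : F(x) < 1\}$.

Positivity of the infimum was proved in 3.6.9 and finiteness of the supremum in 3.6.10.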

3.6.11

We have now introduced a change of measure $M(\gamma, \cdot)$ for the renewal counting process and have proved how it affects the inter event times of the counting process. In future notation we will distinguish exponentially twisting inter event times with some parameter $\beta$ in parentheses: $E^{(\beta)}[\tau]$. If we explicitly refer to a counting process we write $E^{(\beta)}[e^{\theta N_t}]$ to express the twist of its inter event times with parameter $\beta$; or we might write $E^{[\gamma]}[e^{\theta N_t}]$ to denote the twisting of the counting process with the change of measure process $M(\gamma, \cdot)$. Both are the same as soon as $\beta = -\Gamma(\gamma)$.

Furthermore, about notation: under the exponentially transformed inter event densities with parameter $\beta$ the mean changes from $E[\tau] = \Lambda'(0)$ to $E^{(\beta)}[\tau] = \Lambda'(\beta)$ (cf proof of 2.2.8 and 2.3.4). This corresponds to a change of the renewal point process' rate from $\lambda = \frac{1}{E[\tau]}$ to $\frac{1}{E^{(\beta)}[\tau]}$. The following is a translation of this fact into the changed parameters ($\beta = -\Gamma(\gamma)$) and more in terms of the counting process.

¹ $F_{-\alpha}$ is an exponential transform as defined in 2.1.2 and not of the kind of definition 2.1.7. If it was it would have to be $F^{+(-\alpha)}$ for $\alpha \le 0$.

Definition 3.6.12. The rate of the point process $N$ under the twist $M(\gamma, \cdot)$ is denoted $\lambda(\gamma)$ and is defined as $\lambda(\gamma) := \Gamma'(\gamma)$.

Alternatively we could have defined $\lambda(\gamma) = \frac{1}{E^{(-\Gamma(\gamma))}[\tau]}$.
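The two descriptions of the twisted rate can be compared in the exponential special case, which we assume here only for illustration: with rate $\lambda$ we have $\Lambda(\beta) = \log\frac{\lambda}{\lambda - \beta}$, $\Gamma(\gamma) = \lambda (e^\gamma - 1)$, $\Lambda'(\beta) = \frac{1}{\lambda - \beta}$ and $\Gamma'(\gamma) = \lambda e^\gamma$. The sketch below, whose function names are hypothetical, checks that $\Lambda(-\Gamma(\gamma)) = -\gamma$ and that $\Gamma'(\gamma)$ agrees with $1 / E^{(-\Gamma(\gamma))}[\tau]$.

```python
import math


def lmgf_tau(beta, lam):
    """Lambda(beta) for exponential(lam) inter event times; needs beta < lam."""
    return math.log(lam / (lam - beta))


def lmgf_count(gamma_, lam):
    """Gamma(gamma) = -Lambda^{-1}(-gamma) in the exponential case."""
    return lam * (math.exp(gamma_) - 1.0)


def beta_of_gamma(gamma_, lam):
    """Twist parameter of the inter event times: beta = -Gamma(gamma)."""
    return -lmgf_count(gamma_, lam)


def twisted_rate(gamma_, lam):
    """lambda(gamma) := Gamma'(gamma) = lam * exp(gamma)."""
    return lam * math.exp(gamma_)


def twisted_mean(beta, lam):
    """E^(beta)[tau] = Lambda'(beta) = 1 / (lam - beta)."""
    return 1.0 / (lam - beta)


if __name__ == "__main__":
    lam, g = 2.0, 0.7
    b = beta_of_gamma(g, lam)
    print(lmgf_tau(b, lam), -g)
    print(twisted_rate(g, lam), 1.0 / twisted_mean(b, lam))
```

In this case a positive $\gamma$ speeds the process up by the factor $e^\gamma$ and a negative $\gamma$ slows it down.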

To conclude: we have developed our basic tools for working with renewal counting processes. This allows us to make up for the lost Markov property. We have

• in terms of large deviations identified the undelayed renewal counting process, the renewal counting process with stationary increments, and the counting process with independent increments over disjoint intervals. For the Poisson process all these three properties hold generally and truly, not only in terms of large deviations.

• defined an exponential change of measure for the undelayed renewal counting process such that under the changed distribution the process remains renewal. For a change of measure of a Markovian jump process this holds true immediately.

Chapter 4

Large deviations of the ren.

counting process

In this chapter we develop a sample path large deviation principle for the

renewal counting process and we get a rate function in integral form with

the integrand, the so-called local rate function, the Fenchel-Legendre trans-

form of the lmgf of the counting process. The integral form fits the claimed

closeness to the Markov property while the local rate function reflects the

generalised distribution of inter event times.

The importance of this chapter is twofold: On the one hand the sample

path large deviations for a one dimensional counting process are much sim-

pler than the sample path large deviations for a higher dimensional process

with non independent coordinates and discontinuous statistics. A stochastic

process describing a stochastic network will have such undesirable properties.

So aiming at the large deviations for a network process we want to develop

and test our tools by proving the large deviations of the counting process.

On the other hand we will later directly apply the sample path large devi-

ations of the arrival and service process to obtain local large deviations for

the network. Thus it is also a matter of completeness that we include the

sample path large deviations for the counting process.

The large deviation principle for the counting process is not a new result:

it was proved in 1997 by Anatolii Puhalskii and Ward Whitt in the space

D([0,∞),R). Under our assumption 2.2.2 they equip the space with the

Skorohod J1-topology [17], theorem 6.1. They use weak convergence analogs

in large deviations and apply an extended contraction principle. It was not

immediately clear to us how to apply their techniques in the setting of a

stochastic network.

There may be another interesting point in this chapter: The sample path

large deviations of the partial sums process can be obtained for LD-bounded

iid summands by Mogulskii’s theorem, 5.1.2 in [5]. It may be intuitive that

the partial sums process of inter event times contains the same information

as the counting process constructed from these inter event times and that

rare events of one process can be translated into rare events of the other.

In recent work Raymond Russel [22] and Mark Rodgers-Lee [20] prove how

to obtain a large deviation principle for the counting process from the large

deviation principle of the partial sums process and vice versa. This would

be one approach to develop the large deviations for the counting process of

LD-bounded inter event times.

However, one would still have to develop the large deviations for partial sums

processes with not LD-bounded summands for which the Mogulskii theorem

does not hold - for example for exponentially distributed summands. In this

case, one would rather start from the counting process-side: Sample path

large deviations for the Poisson process and other more general Markovian

jump processes have been developed in the book of Shwartz and Weiss [23].

We decided to develop the sample path large deviations for the counting process directly and will do so for LD-bounded and not LD-bounded inter event times at the same time. One might then want to transfer the large deviation principle for the renewal counting process with not LD-bounded inter event times back to the partial sums process.

Our approach to the sample path large deviations for the renewal counting process starts with local large deviations: we calculate exponential decay rates on sup-norm balls with diminishing radii around piecewise linear functions by applying an exponential change of measure. Here we need to find the suitable change of measure that makes the deviating event become the expected behaviour. We get a weak large deviation principle and identify the integral form of the rate function. Then we strengthen the weak to a full large deviation principle by exponential tightness.

Apart from the existence of a large deviation principle and the explicit form

of the rate function this chapter contributes in allowing standard large devi-

ation interpretation of rare events: that the rare event happens like a regular

(common, non-rare) event under a different distribution, the twisted distri-

bution. And this allows for standard applications as in fast simulation.


4.1 The space

We have introduced the counting process as a stochastic process with piecewise constant paths. Its paths are elements of $D([0,T],\mathbb{Z})$. As we interpolate $N$ its paths become elements of $C([0,T],\mathbb{R}) \subseteq D([0,T],\mathbb{R})$. By exponential equivalence of $N$ and $\hat N$ we need not distinguish between $N$ and $\hat N$ in terms of large deviations when working in the sup-norm induced topology, cf 3.4.3. As before we choose the process that is most convenient to work with. In this section, in terms of large deviation theorems it will be the interpolated process $\hat N$ living in the continuous functions $C([0,T],\mathbb{R})$, while for direct calculations of exponentially scaled limits of probabilities it will be the not-interpolated undelayed process $N$ with realisations in $D([0,T],\mathbb{R})$.

The choice of $C([0,T],\mathbb{R})$ equipped with the sup-norm $\|\cdot\|$ may seem natural in that we have already seen that limiting distributions of scaled counting processes will be in the continuous functions, cf 3.5.2. The sup-norm induces a metric on $C([0,T],\mathbb{R})$ and on $D([0,T],\mathbb{R})$ and thus a topology on these sets of functions. We use the same notation for sup-norm balls in $C([0,T],\mathbb{R})$ as on $D([0,T],\mathbb{R})$:

Definition 4.1.1. For $\psi \in C([0,T],\mathbb{R})$, $\epsilon > 0$

$$U_\epsilon(\psi) = \{f : \|f - \psi\| < \epsilon\}$$

where $U_\epsilon(\psi)$ is understood to be a subset of $D([0,T],\mathbb{R})$ or $C([0,T],\mathbb{R})$.

Remark 4.1.2. The space $(C([0,T],\mathbb{R}), \|\cdot\|)$ of continuous functions over the compact interval $[0,T]$ with the sup-norm is a complete separable metric space.

While the scaled process $N_n$ (cf definition 3.4.1) is a function on $[0,T]$ it contains information about $N$ over $[0, nT]$.

4.1.1 A base of the topology

In this section we give a base of the sup-norm induced topology of $C([0,T],\mathbb{R})$, the space of continuous functions. It will consist of sup-norm balls around piecewise linear functions.

Definition 4.1.3 (Piecewise linear functions). For $J \in \mathbb{N}$ set

$$\mathcal{P}_J = \{f \in C([0,T],\mathbb{R}) \mid f \text{ linear on } [k T 2^{-J}, (k+1) T 2^{-J}] \text{ for } k = 0, \dots, 2^J - 1\}.$$

Claim 4.1.4.

$$\mathcal{J} = \{U_\epsilon(\psi) \mid \epsilon > 0,\ J \in \mathbb{N},\ \psi \in \mathcal{P}_J\}$$

is a base of the sup-norm induced topology of the real valued continuous functions $C([0,T],\mathbb{R})$.

Proof of 4.1.4: We argue with [21] (chapter 8, section 2, proposition 3, p. 146)

• The base elements cover the set of continuous functions: let $f$ be a continuous function. Then there is an element of $\mathcal{J}$ that contains $f$: there is for any $\epsilon > 0$ a $J \in \mathbb{N}$ and a piecewise linear function $\psi \in \mathcal{P}_J$ such that $\|f - \psi\| < \epsilon$.

• Let $B_1$ and $B_2$ be in $\mathcal{J}$. There is for any $g \in B_1 \cap B_2$ a $B_3 = B_3(g) \in \mathcal{J}$ such that $g \in B_3$ and $B_3 \subseteq B_1 \cap B_2$.

Nothing to say about the first bullet. About the second: let $B_1, B_2 \in \mathcal{J}$ be such that they have a non-empty intersection. Let $\psi_i$ be the centre of $B_i$ for $i = 1, 2$ and $\epsilon_i$ be such that $B_i = U_{\epsilon_i}(\psi_i)$ and $J_i$ such that $\psi_i \in \mathcal{P}_{J_i}$. Set $J := \max\{J_1, J_2\}$, then $\psi_i \in \mathcal{P}_J$ for $i = 1, 2$.

Let $g$ be some continuous function (not necessarily piecewise linear!) in $B_1 \cap B_2$. On any of the $2^J$ intervals $[T k 2^{-J}, T (k+1) 2^{-J}]$ of $[0,T]$ this $g$ has a positive distance to the boundary of the intersection $B_1 \cap B_2$.

In terms of pointwise restrictions any function in the intersection $B_1 \cap B_2$ maps $x \in [0,T]$ to $y$ with the following properties.

\begin{align}
x &\in [T k 2^{-J}, T (k+1) 2^{-J}] \tag{4.1}\\
y &\in \big(\max\{\psi_1(x) - \epsilon_1, \psi_2(x) - \epsilon_2\},\ \min\{\psi_1(x) + \epsilon_1, \psi_2(x) + \epsilon_2\}\big) \notag
\end{align}

and for fixed $g$ define the strictly positive, continuous functions $h, i$ as

\begin{align*}
h(x) &:= g(x) - \max\{\psi_1(x) - \epsilon_1, \psi_2(x) - \epsilon_2\}\\
i(x) &:= \min\{\psi_1(x) + \epsilon_1, \psi_2(x) + \epsilon_2\} - g(x)
\end{align*}

for $x \in [0,T]$. Each has a strictly positive minimum over $[0,T]$ and we can define $0 < \epsilon := \min_{x \in [0,T]} \{\min\{h(x), i(x)\}\}$, and uniformly on $[0,T]$ the distance of $g(x)$ to the boundary of $B_1 \cap B_2$ is at least $\epsilon$. Now there is $K \in \mathbb{N}$ and a piecewise linear $\varphi \in \mathcal{P}_K$ that is $\frac{\epsilon}{4}$-close to $g$ in the sup-norm. The neighbourhood $U_{\epsilon/2}(\varphi)$ lies within $B_1 \cap B_2$ and contains $g$.

4.1.4
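The covering step of this proof rests on approximating a continuous function by an element of $\mathcal{P}_J$ in the sup-norm. The following sketch builds the piecewise linear interpolant on the dyadic grid $kT2^{-J}$ and measures the distance on a fine evaluation grid. The test function (the sine), the grid size and the function names are illustrative assumptions, and the grid maximum only approximates the true sup-norm from below.

```python
import math


def dyadic_interpolant(f, T, J):
    """Element of P_J that agrees with f at the nodes k * T * 2**-J."""
    m = 2 ** J
    nodes = [k * T / m for k in range(m + 1)]
    values = [f(x) for x in nodes]

    def psi(x):
        k = min(int(x * m / T), m - 1)  # index of the dyadic interval holding x
        w = (x - nodes[k]) / (nodes[k + 1] - nodes[k])
        return (1.0 - w) * values[k] + w * values[k + 1]

    return psi


def sup_distance(f, g, T, grid=10_000):
    """Approximate sup-norm distance on [0, T] using an evaluation grid."""
    return max(abs(f(i * T / grid) - g(i * T / grid)) for i in range(grid + 1))


if __name__ == "__main__":
    for J in (2, 4, 6):
        psi = dyadic_interpolant(math.sin, 1.0, J)
        print(J, sup_distance(math.sin, psi, 1.0))
```

For a uniformly continuous function the distance tends to zero as $J$ grows, so for every $\epsilon > 0$ some $\mathcal{P}_J$ contains an $\epsilon$-close function.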


[Figure 4.1: Feasible $B_1, B_2 \in \mathcal{J}$ and $g \in B_1 \cap B_2$, drawn over one interval $[T k 2^{-J}, T (k+1) 2^{-J}]$ with centres $\psi_1, \psi_2$ and radii $\epsilon_1, \epsilon_2$.]

[Figure 4.2: Restrictions (4.1) for functions in $B_1 \cap B_2$, $g$ and $h(x), i(x)$ for a fixed $x$ and the fixed $g$.]

The base $\mathcal{J}$ consists of convex sets (non-compact sup-norm balls). This agrees with $C([0,T],\mathbb{R})$ being locally convex.

We get a similar countable $\mathcal{J}$ if we allow only piecewise linear functions with slopes in $\mathbb{Q}$, starting in $\mathbb{Q}$, and sup-norm balls of rational radii. This agrees with $C([0,T],\mathbb{R})$ being separable. As a complete separable metric space $(C([0,T],\mathbb{R}), \|\cdot\|)$ is called a Polish space.

If we fix the initial values and work with $(C([0,T],\mathbb{R}) \cap \{f \mid f(0) = x\}, \|\cdot\|)$ we get a base of the induced topology as $\mathcal{J} \cap \{f \mid f(0) = x\}$.

4.2 Local large deviations

We directly compute decay rates for probabilities of the event that the scaled renewal counting process $N_n$ (cf definition 3.4.1) stays close to a piecewise linear function $\psi \in \mathcal{P}_J$ with some $J \in \mathbb{N}$:

$$\lim_{\epsilon \to 0} \lim_{n \to \infty} \frac{1}{n} \log P(N_n \in U_\epsilon(\psi)) \tag{4.2}$$

We do this in steps, first for linear $\psi \in \mathcal{P}_0$ and then for the general case $\psi \in \mathcal{P}_J$, $J \ge 1$.

Remember that $\Lambda$ is the lmgf of a typical inter event time $\tau$ of $N$ and $\Gamma = -\Lambda^{-1}(-\,\cdot)$ has been defined in 2.2.7. We will apply the density process $M$ defined in 3.6.4 (for an explicit form: (3.15)). The $r$ appearing in the density process has been defined in 3.6.5. We repeat the change of measure process with twist parameter $\alpha = -\Lambda(\beta)$ for $\beta \in \mathcal{D}(\Lambda)$:

$$M(\alpha, t) = \exp\{\alpha N_t - t \Gamma(\alpha)\}\; r(-\Gamma(\alpha), t)$$

This change of measure process applied to the counting process corresponds to the exponential twist of inter event times with parameter $\beta = -\Gamma(\alpha)$ (cf 3.6.8).

4.2.1 Local large deviations upper bound

We calculate the limsup for the expression (4.2) for $(t\mapsto tv)\in\mathcal{P}_0$ with some $v\ge0$.

Claim 4.2.1. For a renewal counting process $N$ with lmgf $\Gamma$ and $\Gamma^*$ the Fenchel-Legendre transform of $\Gamma$ (cf section 2.6)

$$\lim_{\epsilon\to0}\limsup_{n\to\infty}\frac1n\log P(N_n\in\mathcal{U}_\epsilon(t\mapsto tv))\le-T\,\Gamma^*(v)$$

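Proof of 4.2.1: We start with a change of measure with $M(\alpha,\cdot)$ for some $\alpha\in\mathbb{R}$.

$$\begin{aligned}
P(N_n\in\mathcal{U}_\epsilon(t\mapsto tv))
&=E\big[\mathbb{1}_{\{N_n\in\mathcal{U}_\epsilon(t\mapsto tv)\}}\big]
=E\Big[\mathbb{1}_{\{N_n\in\mathcal{U}_\epsilon(t\mapsto tv)\}}\frac{M(\alpha,nT)}{M(\alpha,nT)}\Big]\\
&=E^{[\alpha]}\Big[\mathbb{1}_{\{N_n\in\mathcal{U}_\epsilon(t\mapsto tv)\}}\frac{1}{M(\alpha,nT)}\Big]\\
&=E^{[\alpha]}\Big[\mathbb{1}_{\{N_n\in\mathcal{U}_\epsilon(t\mapsto tv)\}}\exp\{-\alpha N(nT)+nT\,\Gamma(\alpha)\}\frac{1}{r(-\Gamma(\alpha),nT)}\Big]
\end{aligned}$$

We now add the zero $\alpha nTv-\alpha nTv$ and bound by applying closeness of $N(nT)$ to $nTv$ as enforced by the indicator. For $\alpha>0$

$$\begin{aligned}
P(N_n\in\mathcal{U}_\epsilon(t\mapsto tv))
&=E^{[\alpha]}\Big[\mathbb{1}_{\{N_n\in\mathcal{U}_\epsilon(t\mapsto tv)\}}\exp\{\underbrace{-\alpha N(nT)+\alpha nTv}_{=-\alpha nT(N_n(T)-Tv)\,\le\,\alpha nT\epsilon}\ \underbrace{-\alpha nTv+nT\,\Gamma(\alpha)}_{=-nT(\alpha v-\Gamma(\alpha))}\}\frac{1}{r(-\Gamma(\alpha),nT)}\Big]\\
&\le E^{[\alpha]}\Big[\mathbb{1}_{\{N_n\in\mathcal{U}_\epsilon(t\mapsto tv)\}}\frac{1}{r(-\Gamma(\alpha),nT)}\Big]\exp\{-nT\,(\alpha(v-\epsilon)-\Gamma(\alpha))\}
\end{aligned}$$

The expectation $E^{[\alpha]}\big[\mathbb{1}_{\{N_n\in\mathcal{U}_\epsilon(t\mapsto tv)\}}\frac{1}{r(-\Gamma(\alpha),nT)}\big]$ is finite and uniformly bounded in $n$: bound $\mathbb{1}_{\{N_n\in\mathcal{U}_\epsilon(t\mapsto tv)\}}\le1$ and apply boundedness of $\frac{1}{r(-\Gamma(\alpha),\cdot)}$ proven in 3.6.11. There is no exponential growth in the expectation and it vanishes in the exponentially scaled upper bound limit. Thus we finish for $\alpha>0$

$$\begin{aligned}
\limsup_{n\to\infty}\frac1n\log P(N_n\in\mathcal{U}_\epsilon(t\mapsto tv))
&\le\lim_{n\to\infty}\frac1n\log\Big(E^{[\alpha]}\Big[\mathbb{1}_{\{N_n\in\mathcal{U}_\epsilon(t\mapsto tv)\}}\frac{1}{r(-\Gamma(\alpha),nT)}\Big]\exp\{-nT\,(\alpha(v-\epsilon)-\Gamma(\alpha))\}\Big)\\
&\le-T\,(\alpha(v-\epsilon)-\Gamma(\alpha))\qquad(4.3)
\end{aligned}$$

while for $\alpha<0$ we get

$$\limsup_{n\to\infty}\frac1n\log P(N_n\in\mathcal{U}_\epsilon(t\mapsto tv))\le-\alpha T(v+\epsilon)+T\,\Gamma(\alpha)$$

Optimising the bound

$$\limsup_{n\to\infty}\frac1n\log P(N_n\in\mathcal{U}_\epsilon(t\mapsto tv))\le-T\max\Big\{\sup_{\alpha>0}\big(\alpha(v-\epsilon)-\Gamma(\alpha)\big),\ \sup_{\alpha<0}\big(\alpha(v+\epsilon)-\Gamma(\alpha)\big)\Big\}$$

and letting $\epsilon\to0$

$$\lim_{\epsilon\to0}\limsup_{n\to\infty}\frac1n\log P(N_n\in\mathcal{U}_\epsilon(t\mapsto tv))\le-T\max\Big\{\sup_{\alpha>0}\big(\alpha v-\Gamma(\alpha)\big),\ \sup_{\alpha<0}\big(\alpha v-\Gamma(\alpha)\big)\Big\}=-T\,\Gamma^*(v).$$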



4.2.1

4.2.2 Local large deviations lower bound

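We calculate the liminf for the expression (4.2) for $(t\mapsto tv)\in\mathcal{P}_0$ and $v\ge0$.

Claim 4.2.2. For a renewal counting process $N$ with lmgf $\Gamma$ and $\Gamma^*$ the Fenchel-Legendre transform of $\Gamma$ (cf section 2.6)

$$\lim_{\epsilon\to0}\liminf_{n\to\infty}\frac1n\log P(N_n\in\mathcal{U}_\epsilon(t\mapsto tv))\ge-T\,\Gamma^*(v)$$

To prove the claim we will apply the following

Lemma 4.2.3. If $\Gamma'(\alpha)=v$ then

$$\liminf_{n\to\infty}\frac1n\log E^{[\alpha]}\Big[\mathbb{1}_{\{N_n\in\mathcal{U}_\epsilon(t\mapsto tv)\}}\frac{1}{r(-\Gamma(\alpha),nT)}\Big]=0.$$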

Proof of 4.2.3: We have for β=−Γ(α) bounded x7→ Fc

Fc(x)e−βx and thus

E[α][11Nn∈Uǫ(t7→tv)

r(−Γ(α), nT)]≥E[α][11Nn∈Uǫ(t7→tv)]

|{z }

→1 by 7.1.3

inf

x∈R

(x)eβx >0

The convergence in 7.1.3 of the appendix is almost sure. By boundedness of the indicator the convergence holds in the mean, too.

4.2.3

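Proof of 4.2.2: Consider the case $v>0$ first. We start similarly as for the upper bound and again apply that $N(nT)$ is close to $nTv$. Let $\alpha>0$.

$$\begin{aligned}
P(N_n\in\mathcal{U}_\epsilon(t\mapsto tv))
&=E^{[\alpha]}\Big[\mathbb{1}_{\{N_n\in\mathcal{U}_\epsilon(t\mapsto tv)\}}\exp\{\underbrace{-\alpha N(nT)+\alpha nTv}_{=-\alpha nT(N_n(T)-Tv)\,\ge\,-\alpha nT\epsilon}\ \underbrace{-\alpha nTv+nT\,\Gamma(\alpha)}_{=-nT(\alpha v-\Gamma(\alpha))}\}\frac{1}{r(-\Gamma(\alpha),nT)}\Big]\\
&\ge E^{[\alpha]}\Big[\mathbb{1}_{\{N_n\in\mathcal{U}_\epsilon(t\mapsto tv)\}}\frac{1}{r(-\Gamma(\alpha),nT)}\Big]\exp\{-nT\,(\alpha(v+\epsilon)-\Gamma(\alpha))\}
\end{aligned}$$

Fix $\alpha$ such that $\Gamma'(\alpha)=v$, so that lemma 4.2.3 is applicable. For $v>\Gamma'(0)$ we have $\alpha>0$ (by $\alpha=(\Gamma')^{-1}(v)>(\Gamma')^{-1}(\Gamma'(0))=0$) and the lower bound

$$\begin{aligned}
\liminf_{n\to\infty}\frac1n\log P(N_n\in\mathcal{U}_\epsilon(t\mapsto tv))
&\ge\liminf_{n\to\infty}\frac1n\log\Big(E^{[\alpha]}\Big[\mathbb{1}_{\{N_n\in\mathcal{U}_\epsilon(t\mapsto tv)\}}\frac{1}{r(-\Gamma(\alpha),nT)}\Big]\exp\{-\alpha nT(v+\epsilon)+nT\,\Gamma(\alpha)\}\Big)\\
&=0-\alpha T(v+\epsilon)+T\,\Gamma(\alpha).
\end{aligned}$$

This results in the now accurate

$$\lim_{\epsilon\to0}\lim_{n\to\infty}\frac1n\log P(N_n\in\mathcal{U}_\epsilon(t\mapsto tv))\overset{(1)}{=}-T\,\big(\alpha(v)\,v-\Gamma(\alpha(v))\big)\overset{(2)}{=}-T\,\Gamma^*(v),$$

where we wrote $\alpha(v)$ since $\alpha$ is the twist with $\Gamma'(\alpha)=v$ from lemma 4.2.3 needed in (1). The same property $\Gamma'(\alpha)=v$ justifies (2).

Similarly for $v\in(0,\Gamma'(0))$ and $\Gamma'(\alpha)=v$, which implies $\alpha\le0$.

Since $\Gamma'>0$ always, lemma 4.2.3 does not apply to $v=0$. However, for $v=0$ we may write

$$N_n\in\mathcal{U}_\epsilon(t\mapsto0\cdot t)\ \Leftrightarrow\ N_n<\epsilon$$

and calculate the probability of the event directly.

$$\lim_{n\to\infty}\frac1n\log P\Big(\sup_{t\in[0,T]}|N_n(t)|<\epsilon\Big)=\lim_{n\to\infty}\frac1n\log P(N_n(T)<\epsilon)=-T\,\Gamma^*(\epsilon)$$

$$\lim_{\epsilon\to0}\lim_{n\to\infty}\frac1n\log P\Big(\sup_{t\in[0,T]}|N_n(t)|<\epsilon\Big)=-T\,\Gamma^*(0)$$

Here $\Gamma^*(0)=LC(h)$, and $\Gamma^*(0)=\infty$ for LD-bounded inter event times. We applied continuity of $\Gamma^*$ (or simultaneous divergence to $\infty$) in $0$ from corollary 2.6.4.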

4.2.2
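As a sanity check of claims 4.2.1 and 4.2.2, the following sketch evaluates the tube decay rate $-T\,\Gamma^*(v)$ in one special case. It assumes exponential inter event times with rate `lam` (an assumption made only for this illustration; the claims cover general inter event times). In that case $\Lambda(\beta)=\log\frac{\lambda}{\lambda-\beta}$, so $\Gamma(\alpha)=-\Lambda^{-1}(-\alpha)=\lambda(e^\alpha-1)$ and $\Gamma^*(v)=v\log\frac{v}{\lambda}-v+\lambda$. All function names are hypothetical.

```python
import math

def gamma(alpha, lam=1.0):
    """Gamma(alpha) = lam * (exp(alpha) - 1): lmgf of the counting process
    when inter event times are exponential(lam) (illustrative assumption)."""
    return lam * (math.exp(alpha) - 1.0)

def gamma_star(v, lam=1.0):
    """Closed-form Fenchel-Legendre transform of gamma, Poisson case only."""
    if v < 0:
        return math.inf
    if v == 0:
        return lam
    return v * math.log(v / lam) - v + lam

def gamma_star_grid(v, lam=1.0, lo=-10.0, hi=10.0, steps=20001):
    """Approximate sup_alpha (alpha * v - gamma(alpha)) on a grid of twists."""
    grid = (lo + (hi - lo) * i / (steps - 1) for i in range(steps))
    return max(a * v - gamma(a, lam) for a in grid)

def tube_decay_rate(v, T=1.0, lam=1.0):
    """Limit in claims 4.2.1 and 4.2.2 for the linear path t -> t*v."""
    return -T * gamma_star(v, lam)
```

The grid maximisation mirrors the optimisation over the twist $\alpha$ in the proof of the upper bound; the maximiser is the $\alpha$ with $\Gamma'(\alpha)=v$ used in the lower bound.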

4.2.3 Generalisation

Up to now we have started the scaled counting process as $N_n(0)=0$ and calculated decay rates for the counting process to stay close to some linear function starting in $0$, too. Upper and lower bound work exactly the same way for counting processes starting in $\frac{\lfloor nx\rfloor}{n}$ and an affine function $t\mapsto x+tv$. We make this explicit in the following notation and state the generalised result.

Definition 4.2.4 (Scaling). For a counting process $N$ and a fixed $x>0$ set

$$N_n(\cdot,x):\ t\mapsto\frac{\lfloor nx\rfloor}{n}+\frac1nN(nt)$$

Corollary 4.2.5. Let $N$ be a counting process, $x>0$ and $\psi\in\mathcal{P}_0$, $\psi(t)=x+tv$. Let $N_n(\cdot,x)$ be the scaled process of definition 4.2.4. Then

$$\lim_{\epsilon\to0}\lim_{n\to\infty}\frac1n\log P(N_n(\cdot,x)\in\mathcal{U}_\epsilon(t\mapsto x+tv))=-T\,\Gamma^*(v)$$

4.2.4 Piecewise linear functions

We calculate (4.2) for general $J\ge1$ applying heuristics learned from [23] (chapter 5, p. 73) developed for Markovian processes.

Claim 4.2.6. Let $\psi\in\mathcal{P}_J$ with non-negative slopes $v_1,v_2,\dots,v_{2^J}$, let $N$ be an undelayed renewal counting process with lmgf $\Gamma$, and let $N_n$ be the scaled process associated with $N$. Then

$$\lim_{\epsilon\to0}\lim_{n\to\infty}\frac1n\log P(N_n\in\mathcal{U}_\epsilon(\psi))=-T\,2^{-J}\sum_{k=1}^{2^J}\Gamma^*(v_k)$$

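Proof of 4.2.6: Let $N_n^{re,(T2^{-J},\,2T2^{-J},\dots,(2^J-1)T2^{-J})}$ be the scaled restarted process defined in 3.4.16 and 3.4.12 for $k=2^J-1$ with the equidistant

$$s_1=T2^{-J},\quad s_2=2\,T2^{-J},\quad\dots,\quad s_{2^J-1}=T(2^J-1)\,2^{-J}$$

and identically distributed rcps

$$N^{(0)},N^{(1)},\dots,N^{(2^J-1)}.$$

We abbreviate $N_n^{re}:=N_n^{re,(T2^{-J},\,2T2^{-J},\dots,(2^J-1)T2^{-J})}$ for the rest of this section 4.2.4.

On the interval $[kT2^{-J},(k+1)T2^{-J}]$ the restarted process starts in $N_n^{re}(kT2^{-J})$ and increases as the renewal counting process $N^{(k)}$. Note that $N_n^{re}$ and $\psi$ are defined with respect to the same cases. Let $v_1,\dots,v_{2^J}$ be the piecewise constant slopes of $\psi\in\mathcal{P}_J$ and set $s_0=0$, $s_{2^J}=T$ for notational convenience.

$$\begin{aligned}
N_n^{re}(t)-\psi(t)
&=\begin{cases}
N(t)-v_1t&\text{if }t\in[0,s_1]\\
N(s_1)+N^{(1)}(t-s_1)-v_1s_1-v_2(t-s_1)&\text{if }t\in(s_1,s_2]\\
\quad\vdots\\
\sum_{j=0}^{2^J-2}\big(N^{(j)}(s_{j+1}-s_j)-v_{j+1}(s_{j+1}-s_j)\big)\\
\qquad+N^{(2^J-1)}(t-s_{2^J-1})-v_{2^J}(t-s_{2^J-1})&\text{if }t\in(s_{2^J-1},s_{2^J}]=(s_{2^J-1},T]
\end{cases}\\
&=\sum_{k=0}^{2^J-1}\mathbb{1}_{\{t\in(s_k,s_{k+1}]\}}\Big(\sum_{j=0}^{k-1}\big(N^{(j)}(s_{j+1}-s_j)-v_{j+1}(s_{j+1}-s_j)\big)+N^{(k)}(t-s_k)-v_{k+1}(t-s_k)\Big)
\end{aligned}$$

and from the triangle inequality

$$\|N_n^{re}-\psi\|\le\sum_{k=0}^{2^J-1}\mathbb{1}_{\{t\in(s_k,s_{k+1}]\}}\Big(\sum_{j=0}^{k-1}\big|N^{(j)}(s_{j+1}-s_j)-v_{j+1}(s_{j+1}-s_j)\big|+\|N^{(k)}-(t\mapsto tv_{k+1})\|_{[0,s_{k+1}-s_k]}\Big).\qquad(4.4)$$

Now if, for every $j$,

$$\|N^{(j)}-(t\mapsto tv_{j+1})\|_{[0,s_{j+1}-s_j]}\le\epsilon\,2^{-J}$$

then especially for the end point of the interval

$$\big|N^{(j)}(s_{j+1}-s_j)-v_{j+1}(s_{j+1}-s_j)\big|\le\epsilon\,2^{-J}$$

and

$$\begin{aligned}
\|N_n^{re}-\psi\|
&\le\sum_{k=0}^{2^J-1}\mathbb{1}_{\{t\in(s_k,s_{k+1}]\}}\Big(\sum_{j=0}^{k-1}\big|N^{(j)}(s_{j+1}-s_j)-v_{j+1}(s_{j+1}-s_j)\big|+\|N^{(k)}-(t\mapsto tv_{k+1})\|_{[0,s_{k+1}-s_k]}\Big)\\
&\le\sum_{k=0}^{2^J-1}\mathbb{1}_{\{t\in(s_k,s_{k+1}]\}}\Big(\sum_{j=0}^{k-1}\frac{\epsilon}{2^J}+\frac{\epsilon}{2^J}\Big)
=\sum_{k=0}^{2^J-1}\mathbb{1}_{\{t\in(s_k,s_{k+1}]\}}(k+1)\frac{\epsilon}{2^J}\\
&\le\epsilon.
\end{aligned}$$

Thus, by independence of increments

$$P(N_n^{re}\in\mathcal{U}_\epsilon(\psi))\ge P\big(\|N_n^{(k)}-(t\mapsto tv_{k+1})\|<\epsilon\,2^{-J}\ \forall k\big)=\prod_{k=0}^{2^J-1}P\big(\|N_n^{(k)}-(t\mapsto tv_{k+1})\|<\epsilon\,2^{-J}\big)$$

$$\begin{aligned}
\lim_{\epsilon\to0}\liminf_{n\to\infty}\frac1n\log P(N_n^{re}\in\mathcal{U}_\epsilon(\psi))
&\ge\sum_{k=0}^{2^J-1}\lim_{\epsilon\to0}\lim_{n\to\infty}\frac1n\log P\big(\|N_n^{(k)}-(t\mapsto tv_{k+1})\|<\epsilon\,2^{-J}\big)\\
&\ge-\sum_{k=0}^{2^J-1}T2^{-J}\,\Gamma^*(v_{k+1})=-T2^{-J}\sum_{k=1}^{2^J}\Gamma^*(v_k)
\end{aligned}$$

By exponential equivalence of $N$ and the restarted process, limits in the form of the claim are the same for both processes by application of claim 7.2.1 of the appendix. This finishes the lower bound. The upper bound is quite similar, with radii blown up from the constant $\epsilon$ to an increasing sequence of radii $\epsilon,2\epsilon,\dots,2^J\epsilon$.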

4.2.6
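The limit of claim 4.2.6 is a finite sum and easy to evaluate once $\Gamma^*$ is known. The sketch below does this for a list of $2^J$ slopes. It assumes exponential inter event times with rate `lam`, so that $\Gamma^*(v)=v\log\frac{v}{\lambda}-v+\lambda$; this choice and the function names are illustrative assumptions, not part of the claim.

```python
import math

def gamma_star(v, lam=1.0):
    """Gamma^*(v) for exponential(lam) inter event times (illustrative case)."""
    if v < 0:
        return math.inf
    if v == 0:
        return lam
    return v * math.log(v / lam) - v + lam

def piecewise_tube_rate(slopes, T=1.0, lam=1.0):
    """Return -T * 2^{-J} * sum_k Gamma^*(v_k) for the 2^J slopes of psi."""
    n = len(slopes)
    if n == 0 or n & (n - 1):
        raise ValueError("expected 2**J slopes for some J >= 0")
    return -T / n * sum(gamma_star(v, lam) for v in slopes)
```

For example, `piecewise_tube_rate([0.5, 1.5])` describes a path in $\mathcal{P}_1$ that first runs slower and then faster than the mean rate 1; its decay rate is strictly negative, while the path with slopes `[1.0, 1.0]` has rate 0.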

4.2.5 Towards linear geodesics

In sample path large deviations, linear geodesics is the property that the rate function is in integral form and the integrand, the so-called local rate function, is convex (cf [9], definition 6.1). In their book [9] Ayalvadi Ganesh, Neil O’Connell, and Damon Wischik approach large deviations by proving a sample path large deviation principle once and then deducing further large deviation principles by application of the contraction principle. To get an explicit rate function from the contraction principle a variational problem has to be solved, and here linear geodesics helps. Technically, linear geodesics is the setting where Jensen’s inequality can be applied.

In the queueing setting a nice application of the contraction principle and linear geodesics is the large deviations of the queue size at some fixed time $t>0$ if the smoothed queue has started at time $0$ with size $x_0$. In this and many other cases we can observe a most likely path to the event of interest which is piecewise linear.

While linear geodesics starts with a rate function in integral form and proves the implication that the process “likes” to move along piecewise linear functions, we argue the other way around: aiming to get a large deviation principle with a rate function in integral form (as in the Markovian case), we can already prove that the scaled renewal counting process prefers to stay close to linear functions if it can.

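Claim 4.2.7. For $\psi\in\mathcal{P}_0$ and $\phi\in\mathcal{P}_1$ with $\phi(0)=\psi(0)$, $\phi(T)=\psi(T)$ and $\|\phi-\psi\|>0$

$$\lim_{\epsilon\to0}\lim_{n\to\infty}\frac1n\log P(N_n\in\mathcal{U}_\epsilon(\phi))<\lim_{\epsilon\to0}\lim_{n\to\infty}\frac1n\log P(N_n\in\mathcal{U}_\epsilon(\psi))$$

Proof of 4.2.7: We name slopes of $\psi$ and $\phi$ first:

$$\psi:[0,T]\to[0,\infty),\quad t\mapsto tv$$

$$\phi:[0,T]\to[0,\infty),\quad t\mapsto\begin{cases}t\,w_1&\text{for }t\in[0,\frac T2]\\[2pt]\frac T2w_1+(t-\frac T2)\,w_2&\text{for }t\in(\frac T2,T]\end{cases}$$

Figure 4.3: $\psi$ with slope $v$ and $\phi$ with slopes $w_1,w_2$; both paths end in $\phi(T)=\psi(T)$.

The paths' different slopes relate as

$$Tv=\frac T2w_1+\Big(T-\frac T2\Big)w_2=T\Big(\frac12w_1+\frac12w_2\Big)$$

And from strict convexity of the Fenchel-Legendre transform $\Gamma^*$

$$\Gamma^*(v)=\Gamma^*\Big(\frac12w_1+\frac12w_2\Big)<\frac12\Gamma^*(w_1)+\frac12\Gamma^*(w_2)$$

which in terms of the decay rate for the tubes is

$$T\,\Gamma^*(v)<\frac T2\Gamma^*(w_1)+\frac T2\Gamma^*(w_2)$$

$$\Leftrightarrow\quad\lim_{\epsilon\to0}\lim_{n\to\infty}\frac1n\log P(N_n\in\mathcal{U}_\epsilon(\psi))>\lim_{\epsilon\to0}\lim_{n\to\infty}\frac1n\log P(N_n\in\mathcal{U}_\epsilon(\phi))$$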

4.2.7

Comparing two elements of $\mathcal{P}_1$, the one that is closer to a linear function is asymptotically preferred by the scaled process $N_n$. In preparation for claim 4.2.9 we make the following

Definition 4.2.8. Let $\psi,\phi\in\mathcal{P}_1$ be defined as

$$\psi:[0,T]\to[0,\infty),\quad t\mapsto\begin{cases}t\,v_1&\text{for }t\in[0,\frac T2]\\[2pt]\frac T2v_1+(t-\frac T2)\,v_2&\text{for }t\in(\frac T2,T]\end{cases}$$

$$\phi:[0,T]\to[0,\infty),\quad t\mapsto\begin{cases}t\,w_1&\text{for }t\in[0,\frac T2]\\[2pt]\frac T2w_1+(t-\frac T2)\,w_2&\text{for }t\in(\frac T2,T]\end{cases}$$

with non-negative $v_1,v_2,w_1,w_2$ and $\phi(0)=\psi(0)$, $\phi(T)=\psi(T)$.

Figure 4.4: Two sets of $\{\psi,\phi\}$ suiting definition 4.2.8.

Note that $v_1,v_2$ have the same distance to $\frac{\psi(T)}{T}$:

$$\frac{\psi(T)}{T}=\frac12(v_1+v_2)\ \Leftrightarrow\ \frac{\psi(T)}{T}-v_2=v_1-\frac{\psi(T)}{T}\ \Rightarrow\ \Big|\frac{\psi(T)}{T}-v_2\Big|=\Big|\frac{\psi(T)}{T}-v_1\Big|$$

Writing slopes as a vector $\vec v=\binom{v_1}{v_2}$ we get

$$\frac{\psi(T)}{T}=\frac12(v_1+v_2)\ \Leftrightarrow\ v_2=2\frac{\psi(T)}{T}-v_1$$

$$\Leftrightarrow\ \vec v=\binom{v_1}{v_2}=\underbrace{\frac{\psi(T)}{T}}_{\text{mean slope}}\binom{1}{1}+\underbrace{\Big(v_1-\frac{\psi(T)}{T}\Big)\binom{1}{-1}}_{\text{off-set from mean slope}}$$

which is visualised in figure 4.5. Since all the above considerations apply to $\phi$ and its vector of slopes $\vec w$ as well, the figure shows again a $\vec v$ corresponding to $\psi$ and a $\vec w$ corresponding to $\phi$.

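Claim 4.2.9. For $\phi,\psi$ of definition 4.2.8: if

$$\Big\|\vec v-\frac{\psi(T)}{T}\binom11\Big\|<\Big\|\vec w-\frac{\psi(T)}{T}\binom11\Big\|$$

then

$$\lim_{\epsilon\to0}\lim_{n\to\infty}\frac1n\log P(N_n\in\mathcal{U}_\epsilon(\phi))<\lim_{\epsilon\to0}\lim_{n\to\infty}\frac1n\log P(N_n\in\mathcal{U}_\epsilon(\psi))$$

Figure 4.5: Slopes $\vec v$ for $\psi$ and $\vec w$ for $\phi$ suiting definition 4.2.8 (both lie in $\mathbb{R}^2_{\ge0}$ on the line through $\frac{\psi(T)}{T}\binom11$ that meets the axes at $2\frac{\psi(T)}{T}$).

Proof of 4.2.9: Consider the vectors of slopes $\vec v,\vec w$ in $\mathbb{R}^2_{\ge0}$. Note that

$$\Big\|\vec v-\frac{\psi(T)}{T}\binom11\Big\|=\Big|v_1-\frac{\psi(T)}{T}\Big|\cdot\Big\|\binom{1}{-1}\Big\|$$

and thus

$$\Big\|\vec v-\frac{\psi(T)}{T}\binom11\Big\|<\Big\|\vec w-\frac{\psi(T)}{T}\binom11\Big\|\ \Leftrightarrow\ \Big|v_1-\frac{\psi(T)}{T}\Big|<\Big|w_1-\frac{\psi(T)}{T}\Big|$$

which, due to symmetry, is equivalent to

$$\Big|v_2-\frac{\psi(T)}{T}\Big|<\Big|w_2-\frac{\psi(T)}{T}\Big|$$

Now we have an ordering

$$w_{i_1}<v_{j_1}<\frac{\psi(T)}{T}<v_{j_2}<w_{i_2}$$

with $i_1,i_2,j_1,j_2\in\{1,2\}$ and $i_1\ne i_2$, $j_1\ne j_2$. Let us assume that $w_1<v_1<\frac{\psi(T)}{T}<v_2<w_2$. Then each $v_i$ is a convex combination of $w_1$ and $w_2$,

$$v_i=\frac{w_2-v_i}{w_2-w_1}\,w_1+\frac{v_i-w_1}{w_2-w_1}\,w_2\qquad(i=1,2)$$

and by convexity

$$\Gamma^*(v_i)\le\frac{w_2-v_i}{w_2-w_1}\,\Gamma^*(w_1)+\Big(1-\frac{w_2-v_i}{w_2-w_1}\Big)\Gamma^*(w_2)\qquad(i=1,2)$$

$$\begin{aligned}
\Rightarrow\ \Gamma^*(v_1)+\Gamma^*(v_2)
&\le\Big(\frac{w_2-v_1}{w_2-w_1}+\frac{w_2-v_2}{w_2-w_1}\Big)\Gamma^*(w_1)+\Big(\frac{v_1-w_1}{w_2-w_1}+\frac{v_2-w_1}{w_2-w_1}\Big)\Gamma^*(w_2)\\
&=\frac{2w_2-v_1-v_2}{w_2-w_1}\,\Gamma^*(w_1)+\frac{v_1+v_2-2w_1}{w_2-w_1}\,\Gamma^*(w_2)\\
&=\Gamma^*(w_1)+\Gamma^*(w_2)
\end{aligned}$$

where the last step uses $v_1+v_2=w_1+w_2=2\frac{\psi(T)}{T}$. Since $\Gamma^*$ is strictly convex (as used in the proof of 4.2.7) and each $v_i$ lies strictly between $w_1$ and $w_2$, the inequality is strict. Thus from

$$\lim_{\epsilon\to0}\lim_{n\to\infty}\frac1n\log P(N_n\in\mathcal{U}_\epsilon(\psi))=-\frac T2\big(\Gamma^*(v_1)+\Gamma^*(v_2)\big)$$

$$\lim_{\epsilon\to0}\lim_{n\to\infty}\frac1n\log P(N_n\in\mathcal{U}_\epsilon(\phi))=-\frac T2\big(\Gamma^*(w_1)+\Gamma^*(w_2)\big)$$

follows the claim.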

4.2.9
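Claims 4.2.7 and 4.2.9 say that, for a fixed end point, the cost of a two-slope path grows with the off-set of its slopes from the mean slope. The following sketch tabulates the cost $\frac T2(\Gamma^*(\bar v-d)+\Gamma^*(\bar v+d))$ as a function of the off-set $d$. It assumes exponential inter event times with rate `lam` (an illustrative special case); the helper names are hypothetical.

```python
import math

def gamma_star(v, lam=1.0):
    """Gamma^*(v) for exponential(lam) inter event times (illustrative case)."""
    if v < 0:
        return math.inf
    if v == 0:
        return lam
    return v * math.log(v / lam) - v + lam

def kink_cost(mean_slope, offset, T=1.0, lam=1.0):
    """Cost (T/2) * (Gamma^*(mean - offset) + Gamma^*(mean + offset)) of the
    path in P_1 with slopes mean - offset on [0, T/2] and mean + offset after."""
    return T / 2.0 * (gamma_star(mean_slope - offset, lam)
                      + gamma_star(mean_slope + offset, lam))

# Off-set 0 is the linear path of claim 4.2.7; larger off-sets cost more.
costs = [kink_cost(1.0, d) for d in (0.0, 0.2, 0.4, 0.6, 0.8)]
```

The negative of each cost is the tube decay rate of the corresponding path, so the list being increasing is exactly the ordering stated in claim 4.2.9 for this special case.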

4.2.6 A limit

We can write the rhs of 4.2.6 in integral form as

$$\lim_{\epsilon\to0}\lim_{n\to\infty}\frac1n\log P(N_n\in\mathcal{U}_\epsilon(\psi))=-\int_{s=0}^{T}\Gamma^*(\psi'(s))\,ds$$

While for finite $J$ this integral is artificial, it gives us a uniform expression for finite $J$ and the limit $J\to\infty$.

In the following we will apply a standard way of approximating absolutely continuous functions.

Definition 4.2.10. For $f\in AC[0,T]$ with $f(s)=f(0)+\int_{r=0}^{s}g(r)\,dr$ for some $g\in L^1$ define its piecewise linear approximation $f_J\in\mathcal{P}_J$ through a piecewise constant approximation of the almost everywhere derivative $g$:

$$g_J(s)=\frac{2^J}{T}\int_{r=\lfloor\frac sT2^J\rfloor T2^{-J}}^{\lceil\frac sT2^J\rceil T2^{-J}}g(r)\,dr$$

$$f_J=f(0)+\int g_J$$

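Claim 4.2.11. The $f_J$ of 4.2.10 have a limit “in the rate function”:

$$\lim_{J\to\infty}\lim_{\epsilon\to0}\lim_{n\to\infty}\frac1n\log P(N_n\in\mathcal{U}_\epsilon(f_J))=-\int_{s=0}^{T}\Gamma^*(f'(s))\,ds$$

Proof of 4.2.11: By claim 4.2.6 the inner limits equal $-\int_{s=0}^T\Gamma^*(f_J'(s))\,ds$, so it suffices to prove (4.5) below. The $g_J$ approximate $g$ in a pointwise fashion (cf [5] C.13, a Lebesgue theorem). An application of the pointwise convergence $g_J\to g$ and the Fatou lemma tells us

$$\begin{aligned}
\liminf_{J\to\infty}\int_{s=0}^{T}\Gamma^*(f_J'(s))\,ds
&=\liminf_{J\to\infty}\sum_{l=0}^{2^J-1}2^{-J}T\,\Gamma^*(g_J(l\,2^{-J}T))\\
&=\liminf_{J\to\infty}\int_{s=0}^{T}\Gamma^*(g_J(s))\,ds\\
&\overset{\text{Fatou}}{\ge}\int_{s=0}^{T}\liminf_{J\to\infty}\Gamma^*(g_J(s))\,ds\\
&\overset{\text{lsc}}{\ge}\int_{s=0}^{T}\Gamma^*(g(s))\,ds\\
&=\int_{s=0}^{T}\Gamma^*(f'(s))\,ds
\end{aligned}$$

We get the other direction, an upper bound, immediately from Jensen's inequality. We only need this if the lower bound is finite.

$$\Gamma^*(g_J(lT2^{-J}))=\Gamma^*\Big(\frac{2^J}{T}\int_{r=lT2^{-J}}^{(l+1)T2^{-J}}g(r)\,dr\Big)\le\frac{2^J}{T}\int_{r=lT2^{-J}}^{(l+1)T2^{-J}}\Gamma^*(g(r))\,dr$$

thus

$$\begin{aligned}
\int_{s=0}^{T}\Gamma^*(f_J'(s))\,ds
&=\sum_{k=0}^{2^J-1}2^{-J}T\,\Gamma^*(g_J(kT2^{-J}))
\le\sum_{k=0}^{2^J-1}2^{-J}T\,\frac{2^J}{T}\int_{r=kT2^{-J}}^{(k+1)T2^{-J}}\Gamma^*(g(r))\,dr\\
&=\int_{r=0}^{T}\Gamma^*(g(r))\,dr=\int_{s=0}^{T}\Gamma^*(f'(s))\,ds
\end{aligned}$$

What we have now is a well defined limit for any $f\in AC$:

$$\lim_{J\to\infty}\int_{s=0}^{T}\Gamma^*(f_J'(s))\,ds=\int_{s=0}^{T}\Gamma^*(f'(s))\,ds\qquad(4.5)$$

So the claimed equality is proved.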

4.2.11
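The approximation of definition 4.2.10 and the limit (4.5) can be observed numerically. On each dyadic cell the slope of $f_J$ is the average of $g=f'$ over the cell, which equals the secant slope of $f$; the sketch below uses this. It assumes exponential inter event times with rate 1, so $\Gamma^*(v)=v\log v-v+1$, and the test path $f(s)=s^2$ on $[0,1]$, for which $\int_0^1\Gamma^*(2s)\,ds=\log2-\frac12$. Both choices and all names are illustrative assumptions.

```python
import math

def gamma_star(v, lam=1.0):
    """Gamma^*(v) for exponential(lam) inter event times (illustrative case)."""
    if v < 0:
        return math.inf
    if v == 0:
        return lam
    return v * math.log(v / lam) - v + lam

def dyadic_slopes(f, J, T=1.0):
    """Slopes of f_J: cell averages of f', i.e. secant slopes of f."""
    h = T / 2 ** J
    return [(f((k + 1) * h) - f(k * h)) / h for k in range(2 ** J)]

def rate_of_approximation(f, J, T=1.0, lam=1.0):
    """Integral of Gamma^*(f_J') over [0, T], a finite sum for f_J in P_J."""
    h = T / 2 ** J
    return h * sum(gamma_star(v, lam) for v in dyadic_slopes(f, J, T))

def square(s):
    return s * s

values = [rate_of_approximation(square, J) for J in range(9)]
limit = math.log(2.0) - 0.5
```

In this example the values increase with $J$ and stay below the limit, in line with the Jensen upper bound and the Fatou lower bound of the proof.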

4.3 The LDP in sample space

In this section we develop a full large deviation principle for the counting process. We will apply the local large deviations, the tube limits, found in previous sections and repeat some of the techniques already developed. The main objects of this section are $\epsilon$-neighbourhoods of piecewise linear functions: the technical difference to the tubes in the local large deviations is that we have fixed $\epsilon>0$ instead of a limit $\epsilon\to0$.

4.3.1 The weak large deviation principle

The following is an application of theorem 4.1.11 of [5]. We state it here in the notation fitting our context and as implied by the remark following the theorem.

Theorem 4.3.1 (Dembo and Zeitouni). In the space of continuous functions over a compact interval equipped with the sup-norm, $(C[0,T],\|\cdot\|)$, let $\mathcal{A}$ be a base of the topology. If for every $U\in\mathcal{A}$

$$\lim_{n\to\infty}\frac1n\log P(\hat N_n\in U)\qquad(4.6)$$

exists in $\mathbb{R}\cup\{-\infty\}$ then $\hat N_n$ satisfies the weak LDP with the rate function $I$ defined as

$$I(f):=\sup_{U\in\mathcal{A}:\,f\in U}-\lim_{n\to\infty}\frac1n\log P(\hat N_n\in U).$$

The theorem takes place in $(C[0,T],\|\cdot\|)$ and the large deviation object has to be the interpolated counting process $\hat N_n\in C[0,T]$. To calculate the limit (4.6) we will be working with the undelayed and non-interpolated counting process $N$, since limits for both processes are the same (cf claim 7.2.1 of the appendix).

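Claim 4.3.2. For $\psi\in\mathcal{P}_0$, $\psi(t)=vt$ for some $v\ge0$, and $\epsilon>0$

$$\lim_{n\to\infty}\frac1n\log P(N_n\in\mathcal{U}_\epsilon(\psi))=-T\inf_{w\in[v-\frac\epsilon T,\,v+\frac\epsilon T]}\Gamma^*(w)$$

Proof of 4.3.2: For the lower bound we apply our tubes limit. Let $\delta>0$ and $w\in[v-\frac\epsilon T+\frac\delta T,\ v+\frac\epsilon T-\frac\delta T]$.

$$P(N_n\in\mathcal{U}_\epsilon(\psi))\ge P\big(N_n\in\mathcal{U}_\delta(t\mapsto tw)\big)\qquad(\delta\in(0,\epsilon])$$

$$\liminf_{n\to\infty}\frac1n\log P(N_n\in\mathcal{U}_\epsilon(\psi))\ge-T\,\Gamma^*(w)$$

As we let $\delta\to0$ the restrictions for $w$ become $w\in(v-\frac\epsilon T,\ v+\frac\epsilon T)$, resulting in

$$\liminf_{n\to\infty}\frac1n\log P(N_n\in\mathcal{U}_\epsilon(\psi))\ge-T\inf_{w\in(v-\frac\epsilon T,\,v+\frac\epsilon T)}\Gamma^*(w)\ge-T\inf_{w\in[v-\frac\epsilon T,\,v+\frac\epsilon T]}\Gamma^*(w).$$

The upper bound is simplified to the large deviation of the mean (one-dimensional LDP in claim 3.5.1).

$$P(N_n\in\mathcal{U}_\epsilon(\psi))\le P\big(N_n(T)\in[Tv-\epsilon,\ Tv+\epsilon]\big)=P\Big(\frac{1}{nT}N(nT)\in\Big[v-\frac\epsilon T,\ v+\frac\epsilon T\Big]\Big)$$

$$\limsup_{n\to\infty}\frac1n\log P(N_n\in\mathcal{U}_\epsilon(\psi))\le-T\inf_{w\in[v-\frac\epsilon T,\,v+\frac\epsilon T]}\Gamma^*(w)$$

We can rephrase 4.3.2 the following way: there is a $\phi\in\mathcal{P}_0\cap\mathcal{U}_\epsilon(\psi)$ such that the probability to stay within $\mathcal{U}_\epsilon(\psi)$ is carried by $\phi$:

$$\lim_{n\to\infty}\frac1n\log P(N_n\in\mathcal{U}_\epsilon(\psi))=\lim_{\delta\to0}\lim_{n\to\infty}\frac1n\log P(N_n\in\mathcal{U}_\delta(\phi))$$

That $\phi$ is linear agrees with linear geodesics (cf section 4.2.5).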

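Claim 4.3.3. For $\psi\in\mathcal{P}_1$ and $\epsilon>0$

$$\lim_{n\to\infty}\frac1n\log P(N_n\in\mathcal{U}_\epsilon(\psi))=-\inf_{\phi\in\mathcal{P}_1\cap\mathcal{U}_\epsilon(\psi)}\int_{t=0}^{T}\Gamma^*(\phi'(t))\,dt$$

Proof of 4.3.3: Let $v_1,v_2\ge0$ be such that

$$\psi(t)=\begin{cases}v_1t&\text{for }t\in[0,\frac T2]\\[2pt]v_1\frac T2+v_2(t-\frac T2)&\text{for }t\in(\frac T2,T]\end{cases}$$

For a lower bound: if $\phi\in\mathcal{P}_1$ with non-negative slopes $w_1,w_2$ is such that

$$w_1\in\Big(v_1-\frac{2\epsilon}{T},\ v_1+\frac{2\epsilon}{T}\Big)\quad\text{and}\quad w_1+w_2\in\Big(v_1+v_2-\frac{2\epsilon}{T},\ v_1+v_2+\frac{2\epsilon}{T}\Big)$$

then for $\delta>0$ small enough $\mathcal{U}_\delta(\phi)\subseteq\mathcal{U}_\epsilon(\psi)$. Figure 4.6 shows such $\psi$ and $\phi$. Now we can bound applying claim 4.2.6 to $J=1$, $\phi$ and $\mathcal{U}_\delta(\phi)$.

$$\liminf_{n\to\infty}\frac1n\log P(N_n\in\mathcal{U}_\epsilon(\psi))\ge\lim_{\delta\to0}\lim_{n\to\infty}\frac1n\log P(N_n\in\mathcal{U}_\delta(\phi))=-\frac T2\big(\Gamma^*(w_1)+\Gamma^*(w_2)\big)$$

Figure 4.6: Solid $\psi$ with slopes $v_1,v_2$ and dashed $\phi$ with slopes $w_1,w_2$, $\phi\in\mathcal{U}_\epsilon(\psi)$.

Let $\vec w=\binom{w_1}{w_2}$, $\vec v=\binom{v_1}{v_2}$ and, to abbreviate the condition on $\vec w$, set

$$S_1(\vec v,\epsilon):=\Big\{w\in\mathbb{R}^2:\ w_1\in\Big[v_1-\frac{2\epsilon}{T},\ v_1+\frac{2\epsilon}{T}\Big],\ w_1+w_2\in\Big[v_1+v_2-\frac{2\epsilon}{T},\ v_1+v_2+\frac{2\epsilon}{T}\Big]\Big\}$$

$$V_1:S_1(\vec v,\epsilon)\to\mathbb{R},\quad\vec w\mapsto\Gamma^*(w_1)+\Gamma^*(w_2)$$

and optimise the lower bound:

$$\liminf_{n\to\infty}\frac1n\log P(N_n\in\mathcal{U}_\epsilon(\psi))\ge-\frac T2\inf_{\vec w\in S_1(\vec v,\epsilon)}\big(\Gamma^*(w_1)+\Gamma^*(w_2)\big)$$

To get an upper bound we apply the finite dimensional large deviation principle of 3.5.1: consider the process only at the fixed epochs $\frac T2$ and $T$:

$$\begin{aligned}
P(N_n\in\mathcal{U}_\epsilon(\psi))
&\le P\Big(N_n(\tfrac T2,T)\in\big(\psi(\tfrac T2)-\epsilon,\ \psi(\tfrac T2)+\epsilon\big)\times\big(\psi(T)-\epsilon,\ \psi(T)+\epsilon\big)\Big)\\
&\le P\Big(N_n(\tfrac T2,T)\in\big[\psi(\tfrac T2)-\epsilon,\ \psi(\tfrac T2)+\epsilon\big]\times\big[\psi(T)-\epsilon,\ \psi(T)+\epsilon\big]\Big)
\end{aligned}$$

Now apply the upper bound for closed sets of claim 3.5.2:

$$\limsup_{n\to\infty}\frac1n\log P(N_n\in\mathcal{U}_\epsilon(\psi))\le-\frac T2\inf_{\vec z\in F}\Big(\Gamma^*\Big(\frac{2z_1}{T}\Big)+\Gamma^*\Big(\frac{2(z_2-z_1)}{T}\Big)\Big)\qquad(4.7)$$

$$\text{with }F=\big[\psi(\tfrac T2)-\epsilon,\ \psi(\tfrac T2)+\epsilon\big]\times\big[\psi(T)-\epsilon,\ \psi(T)+\epsilon\big]\qquad(4.8)$$

Define $V_2$ the following way.

$$V_2:F\to\mathbb{R},\quad\vec z\mapsto\Gamma^*\Big(\frac{2z_1}{T}\Big)+\Gamma^*\Big(\frac{2(z_2-z_1)}{T}\Big)$$

To match lower and upper bound we need

$$\inf_{\vec w\in S_1(\vec v,\epsilon)}V_1(\vec w)=\inf_{\vec z\in F}V_2(\vec z)$$

We prove a little more in lemma 4.3.4, which finishes the proof.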

4.3.3

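Lemma 4.3.4. $V_1=V_2\circ U$ for a regular transformation $U:S_1(\vec v,\epsilon)\to F$.

Proof of 4.3.4: Define $U$ as

$$U:S_1(\vec v,\epsilon)\to F,\quad\vec w\mapsto\frac T2\begin{pmatrix}1&0\\1&1\end{pmatrix}\vec w.$$

It is well defined since

$$\begin{aligned}
U(\vec w)\in F
&\Leftrightarrow\begin{cases}\frac T2w_1\in[\psi(\frac T2)-\epsilon,\ \psi(\frac T2)+\epsilon]\\[2pt]\frac T2(w_1+w_2)\in[\psi(T)-\epsilon,\ \psi(T)+\epsilon]\end{cases}\\
&\overset{\cdot\frac2T}{\Leftrightarrow}\begin{cases}w_1\in\Big[\underbrace{\tfrac{2\psi(T/2)}{T}}_{=v_1}-\tfrac{2\epsilon}{T},\ \tfrac{2\psi(T/2)}{T}+\tfrac{2\epsilon}{T}\Big]\\[4pt]w_1+w_2\in\Big[\underbrace{\tfrac{2\psi(T)}{T}}_{=v_1+v_2}-\tfrac{2\epsilon}{T},\ \tfrac{2\psi(T)}{T}+\tfrac{2\epsilon}{T}\Big]\end{cases}\\
&\Leftrightarrow\vec w\in S_1(\vec v,\epsilon).
\end{aligned}$$

Regularity is immediate from $S_1(\vec v,\epsilon)$ having relative dimension 2 and the matrix representation. Further note that the mapping of $\vec z\in F$ onto the arguments of $\Gamma^*$ in (4.7),

$$\vec z\mapsto\begin{pmatrix}\frac{2z_1}{T}\\[2pt]\frac{2(z_2-z_1)}{T}\end{pmatrix}=\frac2T\begin{pmatrix}1&0\\-1&1\end{pmatrix}\vec z=U^{-1}(\vec z),$$

is the inverse transformation, which already implies the claim. To be very exact

$$\begin{aligned}
V_2\circ U(\vec w)
&=V_2\begin{pmatrix}\frac T2w_1\\[2pt]\frac T2(w_1+w_2)\end{pmatrix}\\
&=\Gamma^*\Big(\frac2T\,\frac T2w_1\Big)+\Gamma^*\Big(\frac2T\Big(\frac T2(w_1+w_2)-\frac T2w_1\Big)\Big)\\
&=\Gamma^*(w_1)+\Gamma^*(w_2)\\
&=V_1(\vec w)
\end{aligned}$$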



4.3.4
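Lemma 4.3.4 is a change of coordinates between slopes $\vec w$ and path values $\vec z$ at the epochs $\frac T2$ and $T$. The sketch below writes out $U$, its inverse, $V_1$ and $V_2$ and lets one check $V_1=V_2\circ U$ at sample points. The concrete $\Gamma^*$ (exponential inter event times with rate `lam`) and the function names are assumptions for illustration only.

```python
import math

def gamma_star(v, lam=1.0):
    """Gamma^*(v) for exponential(lam) inter event times (illustrative case)."""
    if v < 0:
        return math.inf
    if v == 0:
        return lam
    return v * math.log(v / lam) - v + lam

def U(w, T=1.0):
    """Slopes (w1, w2) -> path values (z1, z2) at times T/2 and T."""
    return (T / 2.0 * w[0], T / 2.0 * (w[0] + w[1]))

def U_inv(z, T=1.0):
    """Path values (z1, z2) -> slopes (2 z1 / T, 2 (z2 - z1) / T)."""
    return (2.0 / T * z[0], 2.0 / T * (z[1] - z[0]))

def V1(w, lam=1.0):
    return gamma_star(w[0], lam) + gamma_star(w[1], lam)

def V2(z, T=1.0, lam=1.0):
    return (gamma_star(2.0 * z[0] / T, lam)
            + gamma_star(2.0 * (z[1] - z[0]) / T, lam))
```

For instance, with `T = 3.0` and `w = (0.7, 1.6)` the value `V2(U(w, T), T)` agrees with `V1(w)` up to rounding, and `U_inv(U(w, T), T)` returns `w`.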

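Claim 4.3.3 holds in the general case of $J\in\mathbb{N}$, too. We state it and give the definition of the generalised objects, but do not prove the general case.

Remark 4.3.5. For $\psi\in\mathcal{P}_J$ for some fixed $J\in\mathbb{N}$ and $\epsilon>0$

$$\lim_{n\to\infty}\frac1n\log P(N_n\in\mathcal{U}_\epsilon(\psi))=-\inf_{\phi\in\mathcal{P}_J\cap\mathcal{U}_\epsilon(\psi)}\int_{t=0}^{T}\Gamma^*(\phi'(t))\,dt$$

The following is the definition analogous to the case $J=1$ and an application of this definition.

Remark 4.3.6. Given $\psi\in\mathcal{P}_J$ with slopes $v_1,\dots,v_{2^J}$ forming $\vec v\in\mathbb{R}^{2^J}$

$$S_J(\vec v,\epsilon):=\Big\{w\in\mathbb{R}^{2^J}_{\ge0}:\ \forall k\in\{1,\dots,2^J\}:\ \sum_{l=1}^{k}w_l\in\Big[\sum_{l=1}^{k}v_l-\frac{2^J\epsilon}{T},\ \sum_{l=1}^{k}v_l+\frac{2^J\epsilon}{T}\Big]\Big\}$$

$$F=\times_{k=1}^{2^J}\Big[\psi(k\,T\,2^{-J})-\frac{2^J\epsilon}{T},\ \psi(k\,T\,2^{-J})+\frac{2^J\epsilon}{T}\Big]$$

$$U:S_J(\vec v,\epsilon)\to F,\quad\vec w\mapsto T2^{-J}\begin{pmatrix}1&0&0&\dots&0\\1&1&0&\dots&0\\\vdots&&\ddots&&\vdots\\1&1&\dots&1&0\\1&1&\dots&1&1\end{pmatrix}\vec w$$

If $\psi,\phi\in\mathcal{P}_J$ with slopes $v_i$ (for $\psi$) and $w_i$ (for $\phi$) for $i=1,\dots,2^J$ forming $\vec v,\vec w\in\mathbb{R}^{2^J}$ then

$$\phi\in\mathcal{U}_\epsilon(\psi)\ \Leftrightarrow\ \vec w\in S_J(\vec v,\epsilon)\ \Leftrightarrow\ U(\vec w)\in F$$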

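According to 4.3.1 we have a rate function for the weak LDP. We want to identify the rate function as the decay rate on tubes.

Claim 4.3.7 (Rate function identification). If $\psi\in\mathcal{P}_K$ then

$$I(\psi)=\int_{s=0}^{T}\Gamma^*(\psi'(s))\,ds.$$

Proof of 4.3.7: Write $f$ for the function $\psi$ of the claim, so $f\in\mathcal{P}_K$ with $K$ minimal in that $f\notin\mathcal{P}_{K-1}$; the symbol $\psi$ is reused below for the centres of base sets. From 4.3.1 we have $I(f)=\sup_{U\in\mathcal{A},\,f\in U}-\lim_{n\to\infty}\frac1n\log P(N_n\in U)$. We apply the form of our base, $U=\mathcal{U}_\epsilon(\psi)$ for some $\psi\in\mathcal{P}_J$ with $J\in\mathbb{N}$, $\epsilon\in\mathbb{R}$:

$$I(f)=\sup_{\substack{\epsilon\in\mathbb{R},\ J\in\mathbb{N}\\\psi\in\mathcal{P}_J,\ f\in\mathcal{U}_\epsilon(\psi)}}-\underbrace{\lim_{n\to\infty}\frac1n\log P(N_n\in\mathcal{U}_\epsilon(\psi))}_{=-\inf_{\xi\in\mathcal{P}_J\cap\mathcal{U}_\epsilon(\psi)}\int\Gamma^*\circ\xi'}$$

In the supremum we need to consider all $\psi\in\mathcal{U}_\epsilon(f)$, and $\mathcal{U}_\epsilon(\psi)\cap\mathcal{P}_J$ will never be empty since it contains $\psi$ by construction, so we never take the infimum over the empty set.

We immediately have $I(f)\ge\int\Gamma^*\circ f'$ (fix $J=K$ and let $\epsilon\to0$).

To get the opposite inequality fix a feasible combination of $J,\epsilon,\psi$, that is, such that $\psi\in\mathcal{P}_J$ and $f\in\mathcal{U}_\epsilon(\psi)$. If $K\le J$ then $f\in\mathcal{P}_J\cap\mathcal{U}_\epsilon(\psi)$ and

$$\inf_{\xi\in\mathcal{P}_J\cap\mathcal{U}_\epsilon(\psi)}\int\Gamma^*\circ\xi'\le\int\Gamma^*\circ f'$$

If $J<K$, optimising over $\mathcal{P}_K\cap\mathcal{U}_\epsilon(\psi)$ instead of $\mathcal{P}_J\cap\mathcal{U}_\epsilon(\psi)$ would generally decrease the infimum. However, from section 4.2.5 on linear geodesics we know that the functions $\xi\in\mathcal{P}_K\cap(\mathcal{P}_J)^c$ we added to the set of restrictions do not decrease but increase the decay rate $\int\Gamma^*\circ\xi'$. Thus

$$\inf_{\xi\in\mathcal{P}_J\cap\mathcal{U}_\epsilon(\psi)}\int\Gamma^*\circ\xi'=\inf_{\xi\in\mathcal{P}_K\cap\mathcal{U}_\epsilon(\psi)}\int\Gamma^*\circ\xi'\le\int\Gamma^*\circ f'$$

We have a uniform bound for the infimum (uniform in $\epsilon,J,\psi$) which implies

$$I(f)=\sup_{\substack{\epsilon\in\mathbb{R},\ J\in\mathbb{N}\\\psi\in\mathcal{P}_J,\ f\in\mathcal{U}_\epsilon(\psi)}}\ \inf_{\xi\in\mathcal{P}_J\cap\mathcal{U}_\epsilon(\psi)}\int\Gamma^*\circ\xi'\le\sup_{\substack{\epsilon\in\mathbb{R},\ J\in\mathbb{N}\\\psi\in\mathcal{P}_J,\ f\in\mathcal{U}_\epsilon(\psi)}}\int\Gamma^*\circ f'\le\int\Gamma^*\circ f'$$

matching the lower bound of $I(f)$.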

4.3.7

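We want this to be the general form of the rate function.

Claim 4.3.8 (Rate function identification). $I(\psi)=\int_{s=0}^{T}\Gamma^*(\psi'(s))\,ds$ for $\psi\in AC[0,T]$.

Proof of 4.3.8: Write $f$ for the function $\psi$ of the claim and let $f$ be absolutely continuous with $f\notin\bigcup_{J\in\mathbb{N}}\mathcal{P}_J$. We investigate again

$$I(f)=\sup_{U\in\mathcal{A},\,f\in U}-\lim_{n\to\infty}\frac1n\log P(N_n\in U)=\sup_{\substack{\epsilon\in\mathbb{R},\ J\in\mathbb{N}\\\psi\in\mathcal{P}_J,\ f\in\mathcal{U}_\epsilon(\psi)}}\ \inf_{\xi\in\mathcal{P}_J\cap\mathcal{U}_\epsilon(\psi)}\int\Gamma^*\circ\xi'$$

Let $f_J$ be approximations of $f$ in $\mathcal{P}_J$ (cf 4.2.10), assume that $\lim_{J\to\infty}I(f_J)<\infty$ and let $\gamma>0$ be some small number. Choose $J$ large enough for

$$\big|I(f_J)-\lim_{K\to\infty}I(f_K)\big|<\gamma$$

to hold and $\epsilon>0$ small enough for

$$\Big|I(f_J)-\inf_{\xi\in\mathcal{P}_J\cap\mathcal{U}_\epsilon(f_J)}I(\xi)\Big|<\gamma$$

to hold. By the triangle inequality we then get

$$\Big|\inf_{\xi\in\mathcal{P}_J\cap\mathcal{U}_\epsilon(f_J)}I(\xi)-\lim_{K\to\infty}I(f_K)\Big|<2\gamma$$

With $J,\epsilon$ chosen according to $\gamma$ and now fixed we have

$$I(f)\ge\inf_{\xi\in\mathcal{P}_J\cap\mathcal{U}_\epsilon(f_J)}I(\xi)\ge\lim_{K\to\infty}I(f_K)-2\gamma$$

And since $\gamma$ was arbitrarily small

$$I(f)\ge\lim_{K\to\infty}I(f_K)\quad(=:I)$$

Now let us assume the inequality was strict, that is, the following defines a positive number.

$$\gamma':=\sup_{\substack{\epsilon\in\mathbb{R},\ J\in\mathbb{N}\\\psi\in\mathcal{P}_J,\ \psi\in\mathcal{U}_\epsilon(f)}}\ \inf_{\xi\in\mathcal{P}_J\cap\mathcal{U}_\epsilon(\psi)}I(\xi)-I=\sup_{\substack{\epsilon\in\mathbb{R},\ J\in\mathbb{N}\\\psi\in\mathcal{P}_J,\ \psi\in\mathcal{U}_\epsilon(f)}}\Big(\inf_{\xi\in\mathcal{P}_J\cap\mathcal{U}_\epsilon(\psi)}I(\xi)-I\Big)$$

There are basically two possibilities to choose $\epsilon,J,\psi$ in the supremum. We investigate the argument of the supremum in both cases.

1st case. $\epsilon,J,\psi$ are chosen such that $f_J\in\mathcal{U}_\epsilon(\psi)$. Then $\inf_{\xi\in\mathcal{P}_J\cap\mathcal{U}_\epsilon(\psi)}I(\xi)\le I(f_J)$ and

$$\inf_{\xi\in\mathcal{P}_J\cap\mathcal{U}_\epsilon(\psi)}I(\xi)-I=\underbrace{\inf_{\xi\in\mathcal{P}_J\cap\mathcal{U}_\epsilon(\psi)}I(\xi)-I(f_J)}_{\le0}+\underbrace{I(f_J)-I}_{\le0}$$

2nd case. If on the other hand $\epsilon,J,\psi$ are such that $f_J\notin\mathcal{U}_\epsilon(\psi)$ then we have $\psi\in\mathcal{P}_J\cap\mathcal{U}_\epsilon(f)\cap\mathcal{U}_\epsilon(f_J)^c$. That is, $\psi$ is an element of $\mathcal{P}_J$ that is closer in the sup-norm to $f$ than to $f_J$.

Let $\|f-\psi\|=:\delta<\epsilon$ and $K$ large enough for $\|f_K-f\|<\epsilon-\delta$ to hold. Then $\|f_K-\psi\|<\epsilon$ and

$$\begin{aligned}
\inf_{\xi\in\mathcal{P}_J\cap\mathcal{U}_\epsilon(\psi)}I(\xi)-I
&=\inf_{\xi\in\mathcal{P}_K\cap\mathcal{U}_\epsilon(\psi)}I(\xi)-I\\
&=\underbrace{\inf_{\xi\in\mathcal{P}_K\cap\mathcal{U}_\epsilon(\psi)}I(\xi)-I(f_K)}_{\le0}+\underbrace{I(f_K)-I}_{\le0}
\end{aligned}$$

So $\gamma'\le0$.

4.3.8

Since $\Gamma^*$ is convex the rate function $I$ is convex, too.

Looking at [5] and their proof of the sample path LDP for the partial sums process in lemma 5.1.6 (p. 181, top) we see that the rate function $I$ is concentrated on absolutely continuous functions.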


4.3.2 The full large deviation principle

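We can now strengthen the weak large deviation principle to a full one.

Claim 4.3.9. In the space of continuous functions $C([0,T],\mathbb{R})$ equipped with the sup-norm induced topology the interpolated renewal counting process $\hat N$ (under the scaling $\hat N_n:t\mapsto\frac1n\hat N(nt)$) satisfies: for any open set $G$ and any closed set $F$

$$-\inf_{f\in G}I(f)\le\liminf_{n\to\infty}\frac1n\log P(\hat N_n\in G)$$

$$\limsup_{n\to\infty}\frac1n\log P(\hat N_n\in F)\le-\inf_{f\in F}I(f)$$

with the good convex rate function

$$I(f)=\begin{cases}\int_{t=0}^{T}\Gamma^*\circ f'(t)\,dt&\text{if }f\in AC([0,T],\mathbb{R}),\ f(0)=0\\[2pt]\infty&\text{else.}\end{cases}$$

Proof of 4.3.9: From the weak LDP the full LDP follows as soon as we have a good rate function in the weak LDP. As we are in a Polish space, goodness of the rate function is the compactness of its level sets.

We apply the Arzela-Ascoli theorem (cf 7.5.1) to identify level sets

$$L(c)=\{f\,|\,I(f)\le c\}$$

with $c\ge0$ as compact subsets of $C([0,T],\mathbb{R})$.

• Closedness of the level set is given as a property of $I$ being a rate function.

• $I(f)\le c$ implies $f(0)=0$, so initial points of elements of $L(c)$ are bounded.

• It remains to be shown that

$$\forall(t,\epsilon)\ \exists\delta>0:\ |t-s|<\delta\Rightarrow\sup_{f\in L(c)}|f(t)-f(s)|<\epsilon$$

Remember 4.2.11 and let $f\in L(c)$, $J\in\mathbb{N}$, and $f_J$ its approximation with piecewise constant derivatives $v^J_1,\dots,v^J_{2^J}$: $v^J_k=\frac{2^J}{T}\int_{t=T2^{-J}(k-1)}^{T2^{-J}k}f'(t)\,dt$. (Note that only $2^{-J}$ is a power and the $J$ in $f_J$, $v^J_k$ is an index.)

$$\begin{aligned}
&T2^{-J}\sum_{k=1}^{2^J}\Gamma^*(v^J_k)=I(f_J)\le I(f)\le c\quad\text{for each }J\\
\Rightarrow\ &\max_{k=1,\dots,2^J}T2^{-J}\,\Gamma^*(v^J_k)\le c\quad\text{for each }J\\
\Rightarrow\ &\sup_{J\in\mathbb{N},\ k=1,\dots,2^J}T2^{-J}\,\Gamma^*(v^J_k)\le c\\
\Leftrightarrow\ &\sup_{t,\ s<t}(t-s)\,\Gamma^*\Big(\frac{f(t)-f(s)}{t-s}\Big)\le c
\end{aligned}$$

Fix $t$, $\epsilon>0$ and let $m\ge\frac c\epsilon$. From $\lim_{x\to\infty}\frac{\Gamma^*(x)}{x}=\infty$ let $M$ be such that $\frac{\Gamma^*(x)}{x}>m$ for $x\ge M$. Set $\delta>0$ such that $M\delta\le\epsilon$ (for example $\delta=\frac{c}{mM}$). If

$$\frac{f(t)-f(s)}{t-s}<M\ \Rightarrow\ f(t)-f(s)\le M(t-s)\le\epsilon\quad\text{for }|t-s|<\delta$$

otherwise

$$\begin{aligned}
&\frac{f(t)-f(s)}{t-s}\ge M\\
\Rightarrow\ &(t-s)\,\Gamma^*\Big(\frac{f(t)-f(s)}{t-s}\Big)>m\,(f(t)-f(s))\\
\overset{f\in L(c)}{\Rightarrow}\ &c\ge(t-s)\,\Gamma^*\Big(\frac{f(t)-f(s)}{t-s}\Big)>m\,(f(t)-f(s))\\
\Rightarrow\ &c\ge m\,(f(t)-f(s))\\
\Rightarrow\ &\epsilon\ge\frac cm\ge f(t)-f(s)
\end{aligned}$$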

4.3.9

For an alternative proof see [23], lemma 5.18. There compactness of level sets is proved for the Poisson process, which is the renewal process with exponential inter event times. The proof works exactly the same way for all rcps in the scope of this thesis; in particular the property of assumption 2.2.2 that there is no point mass in $\{0\}$ is important, since it is equivalent to $\Lambda^*(0)=\infty$, which is equivalent to $\lim_{x\to\infty}\frac{\Gamma^*(x)}{x}=\infty$.

Corollary 4.3.10. In the space $D([0,T],\mathbb{R})$ equipped with the sup-norm induced topology the sequence of scaled renewal counting processes $(N_n;n\in\mathbb{N})$ satisfies a sample path large deviation principle with the good, convex rate function $I(\cdot)$ of 4.3.9.

The corollary follows from claim 4.3.9 by an application of lemma 4.1.5 and theorem 4.2.13 of Dembo and Zeitouni [5].

4.3.3 Interpretation

We call $\Gamma^*$ the local rate function for the large deviation principle of the counting process. Note that with reference to Big Queues [9] (section 6.2, definition 6.1, p. 99) we now do have linear geodesics for the counting process.

Given some fixed $\psi\in AC[0,T]$ with $\psi'\ge0$ where it exists, what can we do?

• Approximate $\psi$ by $\psi_J$ for some large $J$ and denote by $v\in\mathbb{R}^{2^J}$ the vector of slopes of $\psi_J$. The rate function $I(\psi)$ is well approximated by the finite sum $T2^{-J}\sum_{k=1}^{2^J}\Gamma^*(v_k)$. Each summand has $\Gamma^*(v_k)=\theta(v_k)\cdot v_k-\Gamma(\theta(v_k))$, where $\theta(v_k)$ as a twist parameter makes $v_k$ the expectation of $\lim_{n\to\infty}N_n(1)$ under the twisted measure $\theta(v_k)$.

• Simulate a counting process that (over a long time) behaves differently from what would be expected in terms of its empirical mean.

• Find the most likely paths for certain events. The most likely path to a “too large” or “too small” value at some fixed time will be along some piecewise linear function with only two different slopes. This is a typical application of linear geodesics.

• Apply the contraction principle and solve the associated variational problem.
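The first two items above can be sketched in code for one special case. The example assumes exponential inter event times with rate `lam`, so that $\Gamma(\theta)=\lambda(e^\theta-1)$ and the twist with $\Gamma'(\theta)=v$ is $\theta(v)=\log\frac v\lambda$. In this Poisson case the twisted process is again a Poisson process, now with rate $\lambda e^{\theta(v)}=v$; this is specific to the illustrative case and is not a statement about general renewal processes. All names are hypothetical.

```python
import math
import random

def gamma(theta, lam=1.0):
    """Gamma(theta) = lam * (exp(theta) - 1), Poisson case (assumption)."""
    return lam * (math.exp(theta) - 1.0)

def twist_parameter(v, lam=1.0):
    """theta(v) with Gamma'(theta) = lam * exp(theta) = v."""
    return math.log(v / lam)

def local_rate(v, lam=1.0):
    """Gamma^*(v) = theta(v) * v - Gamma(theta(v))."""
    theta = twist_parameter(v, lam)
    return theta * v - gamma(theta, lam)

def empirical_slope(rate, horizon, rng):
    """N(horizon) / horizon for a Poisson process with the given rate."""
    t, n = 0.0, 0
    while True:
        t += rng.expovariate(rate)
        if t > horizon:
            return n / horizon
        n += 1

# Simulate under the twisted measure: original rate 1, target slope v = 2.
rng = random.Random(0)
twisted_rate = 1.0 * math.exp(twist_parameter(2.0, lam=1.0))
slope = empirical_slope(twisted_rate, 2000.0, rng)
```

Under the twisted measure the process runs with empirical slope close to $v=2$ although its original mean rate is 1, which is the atypical long-run behaviour the second item refers to.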

4.4 Split counting processes

This is a first step towards networks. In a network of $d\in\mathbb{N}$ nodes, customers leaving a node $i\in\{1,\dots,d\}$ may be routed to another network node or may leave the network, cf figure 4.7. The queue of customers waiting for service at node $i$ may be non-empty over a period of time. During this time the times between subsequent departures are the customers' service times. Equivalently: if over an interval of time the queue at node $i$ is never empty then increments of the departure process from this queue are increments of the service process at this queue.

When describing the network and how it evolves in time we want to document where customers departing from a node go next. As customers leaving queue $i$ are routed into different directions, on a technical level we are splitting the service process.

Figure 4.7: Possible routing of customers leaving node $i$ (to other nodes or leaving the network).

We will then need a linear transformation to describe the number of customers leaving node $i$ and the number of customers being routed to other network nodes as a vector-valued process in time. Customers leaving the network will not be counted; they only appear as leaving node $i$. With an additional condition the linear transformation we apply is a bijection.

In this section we develop the large deviations for a split renewal counting process under a linear transformation. We start in 4.4.1 with constructing the split process and calculating its lmgf. In another subsection we give the explicit linear transformation we will later apply in the generalised Jackson network. We continue in 4.4.2 with an exponential change of measure that transforms the split rcp into another split rcp. We also give the change of measure explicitly for the linearly transformed split process we will apply in the network setting. Finally, in 4.4.3 we develop the full sample path large deviations principle for the split and the linearly transformed split counting process.

The examples we give in this section fit the example network we will work with in chapter 6.

4.4.1 Construction of the split process

Let us split a counting process $N$ into $m$ processes $N^{(1)},\dots,N^{(m)}$ with the property that $\sum_{j=1}^{m}N^{(j)}_t=N_t$ for all $t\ge0$ and that $(N^{(1)},\dots,N^{(m)})$ as an $m$-tuple changes state iff $N$ does. If it changes state it will be by increasing one coordinate by 1.

Definition 4.4.1 (Split rcp $N^{sp}$ associated with $N$, $p$). Let $N$ be a rcp and $p=(p_1,\dots,p_m)$ for some $m\in\mathbb{N}$ such that $P(r=e_i)=p_i$ defines the distribution of $r$ on $\mathbb{R}^m$ ($e_i$ is the $i$-th standard base vector in $\mathbb{R}^m$). Let $r,r_1,r_2,\dots$ be iid and define

$$N^{sp}_t:=\sum_{i=1}^{N_t}r_i\quad(\in\mathbb{R}^m).$$

The coordinates of the split process $N^{sp}$ are renewal counting processes and we will identify $t\mapsto\sum_{i=1}^{N_t}r_i$ and $\begin{pmatrix}N^{(1)}\\\vdots\\N^{(m)}\end{pmatrix}$.

Definition 4.4.2 (Scaled split process). For the split process $N^{sp}$ defined in 4.4.1 define the scaled split process $N^{sp}_n$ as

$$N^{sp}_n:\ t\mapsto\frac1nN^{sp}_{nt}=\frac1n\sum_{i=1}^{N_{nt}}r_i$$

And we can identify the scaled split process with the split process of all its coordinates scaled as in 3.4.1:

$$N^{sp}_n=\begin{pmatrix}N^{(1)}_n\\\vdots\\N^{(m)}_n\end{pmatrix},\qquad N^{sp}_n(t)=\begin{pmatrix}\frac1nN^{(1)}(nt)\\\vdots\\\frac1nN^{(m)}(nt)\end{pmatrix}$$

Example 4.4.3. Let $m=5$ and $p=(0,\frac12,\frac12,0,0)$. Then $N^{sp}=(0,N^{(2)},N^{(3)},0,0)^\top$. Figure 4.8 shows a realisation of $N$ and the coordinate processes $N^{(2)}$ and $N^{(3)}$ of $N^{sp}$. Note that $N=N^{(2)}+N^{(3)}$ in the figure. Inter event times of $N$ are uniformly distributed on $(0,2)$.
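A realisation like the one of example 4.4.3 can be generated with a few lines of code. The sketch below follows definition 4.4.1 with inter event times uniform on $(0,2)$ and $p=(0,\frac12,\frac12,0,0)$: each event of $N$ is routed to one coordinate drawn from $p$. The function name, the seed and the time horizon are arbitrary choices for illustration and do not reproduce the realisation shown in figure 4.8.

```python
import random

def simulate_split(p, horizon, rng):
    """Return (N_t, N^sp_t) at t = horizon.

    N has iid inter event times uniform on (0, 2); every event increases
    exactly one coordinate of the split process, chosen with probabilities p.
    """
    m = len(p)
    counts = [0] * m          # coordinates N^(1), ..., N^(m) at the current time
    t, n = 0.0, 0
    while True:
        t += rng.uniform(0.0, 2.0)
        if t > horizon:
            return n, counts
        j = rng.choices(range(m), weights=p)[0]   # r_i = e_{j+1}
        counts[j] += 1
        n += 1

rng = random.Random(1)
n_events, split = simulate_split([0.0, 0.5, 0.5, 0.0, 0.0], 12.0, rng)
```

By construction the coordinates sum to $N$, and the coordinates with $p_j=0$ stay at 0, so only $N^{(2)}$ and $N^{(3)}$ move.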

We define the lmgf for a discrete probability measure $p$ and then the lmgf for the split counting process.

Definition 4.4.4. The logarithmic moment generating function of $r$ with $P(r=e_i)=p_i$ for $i=1,\dots,m$ is denoted $K$ (as a capital Greek letter). For $\theta\in\mathbb{R}^m$

$$K(\theta)=\log E[e^{\langle\theta,r\rangle}]=\log\sum_{j=1}^{m}e^{\langle\theta,e_j\rangle}p_j=\log\sum_{j=1}^{m}e^{\theta_j}p_j.$$

Figure 4.8: Realisation of $N$ and coordinates of $N^{sp}$ of example 4.4.3 (panels: original process; 2nd coordinate of the split process, $N^{(2)}$; 3rd coordinate of the split process, $N^{(3)}$; time axis from 0 to 12).

It is $\mathcal{D}(K)=\mathbb{R}^m$, either from the explicit form of the lmgf or from boundedness of $r$, and by 2.6.2 the Fenchel-Legendre transform $K^*$ has compact level sets.

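Claim 4.4.5. If the counting process $N$ with lmgf $\Gamma$ is split wrt the probability measure $p$ with lmgf $K$ then the lmgf of the split process $N^{sp}$ associated with $N$ and $p$ is

$$\lim_{t\to\infty}\frac1t\log E[e^{\langle\theta,N^{sp}_t\rangle}]=\Gamma\circ K(\theta).$$

Proof of 4.4.5: The split process is a time change of the partial sum $n\mapsto\sum_{k=1}^{n}r_k$, replacing $n$ by $N_t$ and resulting in $t\mapsto\sum_{k=1}^{N_t}r_k$. Note that coordinates of $\sum_{k=1}^{n}r_k$ are binomially distributed.

We calculate exponential moments for the split process. Let $\theta\in\mathbb{R}^m$.

$$\begin{aligned}
E[e^{\langle\theta,N^{sp}_t\rangle}]
&=E\big[e^{\langle\theta,\sum_{k=1}^{N_t}r_k\rangle}\big]\\
&=E\Big[\sum_{n=0}^{\infty}e^{\langle\theta,\sum_{k=1}^{n}r_k\rangle}\,\mathbb{1}_{\{N_t=n\}}\Big]\\
&=\sum_{n=0}^{\infty}\ \sum_{\substack{i_1,\dots,i_m\\\sum_{l=1}^{m}i_l=n}}\underbrace{P\Big(N_t=n,\ \sum_{k=1}^{n}r_k=(i_1,\dots,i_m)^\top\Big)}_{=P(N_t=n)\,P(\sum_{k=1}^{n}r_k=\dots)}\,e^{\sum_{j=1}^{m}\theta_ji_j}\\
&=\sum_{n=0}^{\infty}P(N_t=n)\sum_{\substack{i_1,\dots,i_m\\\sum_{l=1}^{m}i_l=n}}\frac{n!}{i_1!\,i_2!\cdots i_m!}\underbrace{\prod_{j=1}^{m}p_j^{i_j}\,e^{\sum_{j=1}^{m}\theta_ji_j}}_{=\prod_{j=1}^{m}(p_je^{\theta_j})^{i_j}}\\
&=\sum_{n=0}^{\infty}P(N_t=n)\Big(\sum_{j=1}^{m}p_je^{\theta_j}\Big)^n\\
&=E\Big[\exp\Big\{N_t\log\sum_{j=1}^{m}p_je^{\theta_j}\Big\}\Big]
\end{aligned}$$

We take the scaled limit to obtain the lmgf for the split process.

$$\begin{aligned}
\lim_{n\to\infty}\frac1n\log E[e^{\langle\theta,N^{sp}_{nt}\rangle}]
&=\lim_{n\to\infty}\frac1n\log E\Big[\exp\Big\{N_{nt}\log\sum_{j=1}^{m}p_je^{\theta_j}\Big\}\Big]\\
&=t\,\Gamma\Big(\log\sum_{j=1}^{m}p_je^{\theta_j}\Big)=t\,\Gamma\circ K(\theta)
\end{aligned}$$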

4.4.5
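The multinomial step in the proof can be checked numerically. The following is a minimal sketch, not part of the thesis: the function names `lmgf_K` and `exp_moment_partial_sum` are hypothetical, and the check only covers a fixed number $n$ of routing decisions (the inner sum of the proof), not the random time change by $N_t$.

```python
import math
from itertools import product


def lmgf_K(theta, p):
    """K(theta) = log sum_j p_j * exp(theta_j) for a routing distribution p."""
    return math.log(sum(pj * math.exp(tj) for pj, tj in zip(p, theta)))


def exp_moment_partial_sum(theta, p, n):
    """E[exp<theta, r_1 + ... + r_n>] by brute-force summation over all
    count vectors (i_1, ..., i_m) with i_1 + ... + i_m = n."""
    m = len(p)
    total = 0.0
    for counts in product(range(n + 1), repeat=m):
        if sum(counts) != n:
            continue
        # multinomial coefficient n! / (i_1! ... i_m!)
        coeff = math.factorial(n)
        for c in counts:
            coeff //= math.factorial(c)
        prob = coeff * math.prod(pj ** c for pj, c in zip(p, counts))
        total += prob * math.exp(sum(tj * c for tj, c in zip(theta, counts)))
    return total


# Routing distribution of example 4.4.3; theta is an arbitrary test point.
p = (0.0, 0.5, 0.5, 0.0, 0.0)
theta = (0.3, -0.7, 1.1, 0.2, -0.4)
lhs = exp_moment_partial_sum(theta, p, 4)
rhs = math.exp(4 * lmgf_K(theta, p))
```

Under these assumptions `lhs` and `rhs` agree up to floating point error, which is the identity $\sum (\dots) = (\sum_j p_j e^{\theta_j})^n$ used in the proof.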

We will later need a linear transformation of the split process $N^{sp}$.

Remark 4.4.6. Generally, given a random variable $X \in \mathbb{R}^m$ with lmgf $K$ and a linear transformation $TX$ of $X$, the lmgf of $TX$ is $\theta \mapsto K(T^\top \theta)$. In particular we might use a linear transformation to work only with a subset $M \subseteq \{1,\dots,m\}$ of coordinates. Then

\[
T = \operatorname{diag}(\mathbf{1}_{1 \in M}, \dots, \mathbf{1}_{m \in M}) = T^\top
\]

and thus

\begin{align*}
K \circ T^\top(\theta) = K\Bigl(\sum_{k \in M} \theta_k e_k\Bigr)
&= \log\Bigl(\sum_{k \in M} p_k e^{\theta_k} + \sum_{k \notin M} p_k\Bigr) \\
&= \log\Bigl(\sum_{k \in M} p_k e^{\theta_k} + 1 - \sum_{k \in M} p_k\Bigr)
\end{align*}

which is the $\Xi$-style lmgf of a sub-probability measure $p_M$.

Application to the network

We now give the explicit linear transformation we need in the network setting and calculate the lmgf of the linearly transformed split counting process.

If a vector $y \in \mathbb{R}^{d+1}$ describes the next destinations of customers leaving node $i = 1$ over an interval of time

• with $y_{d+1}$ the customers that have left the network at departure from node $i = 1$

• and customers cannot immediately join the same queue again

then

\[
\Bigl(-\sum_{k=1}^{d+1} y_k,\; y_2,\; \dots,\; y_d\Bigr)^\top \in \mathbb{R}^d
\]

has as its first coordinate the negative of the number of customers that have left the first node $i = 1$. The remaining coordinates $j = 2,\dots,d$ are the numbers of customers who have gone from node $i = 1$ to node $j$. This formalises as a linear transformation

\[
T = \begin{pmatrix}
-1 & -1 & \cdots & -1 & -1 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & 1 & 0
\end{pmatrix},
\qquad
T : \begin{pmatrix} y_1 \\ \vdots \\ y_{d+1} \end{pmatrix}
\mapsto
\begin{pmatrix} -\sum_{k=1}^{d+1} y_k \\ y_2 \\ \vdots \\ y_d \end{pmatrix}.
\]

When working with a network of $d$ nodes we need a family $(T^{(i)})_{i=1,\dots,d}$ of such transformations, one at each node. The $T$ just given would be $T^{(1)}$.

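Definition 4.4.7 (Transformation $T^{(i)}$). Let $d, i \in \mathbb{Z}$, $d \ge 2$, $1 \le i \le d$ and $\{e_1,\dots,e_d\}$ the standard base of $\mathbb{R}^d$.

\[
T^{(i)} : \mathbb{R}^{d+1} \to \mathbb{R}^d, \qquad
y \mapsto \sum_{\substack{k=1 \\ k \ne i}}^{d} y_k e_k - \sum_{k=1}^{d+1} y_k\, e_i
\]

For example $T^{(2)}$ transforms

\[
T^{(2)} : \begin{pmatrix} y_1 \\ \vdots \\ y_{d+1} \end{pmatrix}
\mapsto \sum_{\substack{k=1 \\ k \ne 2}}^{d} y_k e_k - \sum_{k=1}^{d+1} y_k\, e_2
= \begin{pmatrix} y_1 \\ -\sum_{k=1}^{d+1} y_k \\ y_3 \\ \vdots \\ y_d \end{pmatrix}
\]

and $T^{(2)}$ can be identified with the $d \times (d+1)$ matrix

\[
\begin{pmatrix}
1 & 0 & \cdots & 0 & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & & \ddots & & & \vdots \\
0 & 0 & \cdots & 1 & 0 & 0 \\
0 & 0 & \cdots & 0 & 1 & 0
\end{pmatrix}
+
\begin{pmatrix}
0 & 0 & \cdots & 0 & 0 \\
-1 & -1 & \cdots & -1 & -1 \\
\vdots & & & & \vdots \\
0 & 0 & \cdots & 0 & 0
\end{pmatrix}
\]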
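As an illustration, the transformation can be sketched in a few lines of Python. This is a hedged sketch following the sum formula of definition 4.4.7; the function names `transform` and `inverse_transform` are hypothetical, node indices are 1-based as in the text, and the inverse is the one stated in remark 4.4.10 for vectors with $y_i = 0$.

```python
def transform(i, y):
    """Apply T^(i): R^(d+1) -> R^d (node index i is 1-based).

    Coordinates j != i keep y_j; coordinate i becomes minus the total
    number of departures y_1 + ... + y_(d+1).
    """
    d = len(y) - 1
    out = [y[k] for k in range(d)]
    out[i - 1] = -sum(y)
    return out


def inverse_transform(i, xi):
    """Inverse of T^(i) on the subspace {y : y_i = 0}.

    Maps xi to (xi_1, ..., xi_d, -sum_k xi_k) - xi_i * e_i.
    """
    y = list(xi) + [-sum(xi)]
    y[i - 1] -= xi[i - 1]
    return y


# d = 3, node i = 1: 3 customers to node 2, 2 to node 3, 1 leaves the network.
y = [0, 3, 2, 1]
image = transform(1, y)
back = inverse_transform(1, image)
```

Here `image` has the negative total number of departures in its first coordinate, and `back` recovers `y` because `y[0] == 0` (no immediate feedback).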







When we split the counting process, it was with respect to a probability measure $p$ with lmgf $K$: all departing customers were considered. Now, after the transformation, we only consider customers staying in the network and how they are split onto the other nodes: the measure we use becomes a sub-probability measure. In the following we define an lmgf $\Xi$ for a sub-probability measure.

Definition 4.4.8. Let $p$ be a sub-probability distribution on $\{1,\dots,d\}$, set $p_0 = 1 - \sum_{k=1}^d p_k$ and define

\[
\Xi : \mathbb{R}^d \to \mathbb{R}, \qquad \theta \mapsto \log\Bigl(\sum_{k=1}^d p_k e^{\theta_k} + p_0\Bigr).
\]

Since $\Xi(\theta) < \infty$ for all $\theta \in \mathbb{R}^d$ the Fenchel-Legendre transform $\Xi^*$ has compact level sets (cf 2.6.2).

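Claim 4.4.9. Let $\Xi^{(i)}$ be associated with the sub-probability measure $p^{(i)}$ on $\{e_1,\dots,e_d\}$ with $p_{ii} = 0$. Let $N$ be a counting process with lmgf $\Gamma$ and let $N^{sp}$ be the split process associated with $N$ and the unique probability measure on $\{e_1,\dots,e_d,0\}$ associated with $p^{(i)}$. The lmgf of the linearly transformed split process is

\[
\lim_{t\to\infty} \frac{1}{t} \log E\bigl[e^{\langle \xi, T^{(i)} N^{sp}_t\rangle}\bigr] = \Gamma\bigl(-\xi_i + \Xi^{(i)}(\xi)\bigr).
\]

Proof of 4.4.9: If $p^{(i)} = (p_{i1},\dots,p_{id})$ is a sub-probability measure then set $p_{i0} = 1 - \sum_{j=1}^d p_{ij}$ and let $K$ be associated with this probability measure. Applying $p_{ii} = 0$ we get

\[
\Xi^{(i)}(\xi) = \xi_i + K(T^{(i)\top} \xi)
\]

from calculating (for $i = 1$)

\begin{align*}
K(T^{(1)\top} \xi)
&= K\left( \begin{pmatrix} -\xi_1 \\ \vdots \\ -\xi_1 \end{pmatrix}
+ \begin{pmatrix} 0 \\ \xi_2 \\ \vdots \\ \xi_d \\ 0 \end{pmatrix} \right) \\
&\overset{p_{ii} = p_{11} = 0}{=} \log\Bigl(\sum_{j=1}^d p_{1j}\, e^{-\xi_1 + \xi_j} + p_{10}\, e^{-\xi_1}\Bigr) \\
&= -\xi_1 + \log\Bigl(\sum_{j=1}^d p_{1j}\, e^{\xi_j} + p_{10}\Bigr) \\
&= -\xi_1 + \Xi^{(1)}(\xi)
\end{align*}

and apply the adaptation of the lmgf to a linear transformation of the split process 4.4.5 to obtain

\begin{align*}
\lim_{t\to\infty} \frac{1}{t} \log E\bigl[e^{\langle \xi, T^{(i)} N^{sp}_t\rangle}\bigr]
&= \lim_{t\to\infty} \frac{1}{t} \log E\bigl[e^{\langle T^{(i)\top} \xi, N^{sp}_t\rangle}\bigr] \\
&= \Gamma \circ K(T^{(i)\top} \xi) = \Gamma\bigl(-\xi_i + \Xi^{(i)}(\xi)\bigr).
\end{align*}

□ 4.4.9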
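The identity $K(T^{(i)\top}\xi) = -\xi_i + \Xi^{(i)}(\xi)$ at the heart of the proof can be checked numerically. This is only a sketch under assumptions: the helper names are hypothetical, and `transpose_apply` hard-codes $T^{(i)\top}$ as it follows from definition 4.4.7 (row $i$ of $T^{(i)}$ consists of $-1$ entries, the other rows are unit vectors).

```python
import math


def lmgf_K(theta, p):
    """K(theta) = log sum_j p_j exp(theta_j) for a probability vector p."""
    return math.log(sum(pj * math.exp(tj) for pj, tj in zip(p, theta)))


def lmgf_Xi(xi, sub_p):
    """Xi(xi) = log(sum_k p_k exp(xi_k) + p_0) with p_0 = 1 - sum_k p_k."""
    p0 = 1.0 - sum(sub_p)
    return math.log(sum(pk * math.exp(x) for pk, x in zip(sub_p, xi)) + p0)


def transpose_apply(i, xi):
    """theta = T^(i)^T xi in R^(d+1), node index i is 1-based.

    theta_k = xi_k - xi_i for k != i (k <= d), theta_i = -xi_i,
    and theta_(d+1) = -xi_i.
    """
    d = len(xi)
    theta = [xi[k] - xi[i - 1] for k in range(d)]
    theta[i - 1] = -xi[i - 1]
    theta.append(-xi[i - 1])
    return theta


# Sub-probability row of node i = 1 with p_11 = 0; values are illustrative.
sub_p = [0.0, 0.3, 0.5]
full_p = sub_p + [1.0 - sum(sub_p)]   # append the exit probability p_10
xi = [0.4, -0.2, 1.1]
lhs = lmgf_K(transpose_apply(1, xi), full_p)
rhs = -xi[0] + lmgf_Xi(xi, sub_p)
```

With $p_{11} = 0$ the two values `lhs` and `rhs` coincide up to rounding; with $p_{11} > 0$ they would in general differ, which is where the assumption of no immediate feedback enters.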

Remark 4.4.10. The transformation $T^{(i)}$ defined in 4.4.7 is a regular transformation between $d$-dimensional spaces when defined on

\[
T^{(i)} : \mathbb{R}^{d+1} \cap \{x \mid x_i = 0\} \to \mathbb{R}^d .
\]

The inverse transformation is

\[
\mathbb{R}^d \to \mathbb{R}^{d+1} \cap \{x \mid x_i = 0\}, \qquad
\xi \mapsto \begin{pmatrix} \xi_1 \\ \vdots \\ \xi_d \\ -\sum_{k=1}^d \xi_k \end{pmatrix} - \xi_i e_i .
\]

Conclusion 4.4.11. If $N$ is split wrt $p^{(i)} = (p_{i1},\dots,p_{id})$ with $p_{ii} = 0$ then the transformation between $N^{sp}$ and $T^{(i)} N^{sp}$ is a linear bijection.

When splitting a counting process wrt some probability measure $p^{(i)}$ with $p_{ii} = 0$ we will have $N^{(i)}_t \equiv 0$ and the split process $N^{sp}_t \in \mathbb{R}^{d+1} \cap \{x \mid x_i = 0\}$.

We continue example 4.4.3, where we had split a counting process, and now apply a linear transformation.

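Example 4.4.12. The transformation $T = T^{(1)}$ applied to the split process

\[
\bigl(0,\; N^{(2)},\; N^{(3)},\; 0,\; 0\bigr)^\top
\quad\text{results in}\quad
\bigl(-N,\; N^{(2)},\; N^{(3)},\; 0\bigr)^\top .
\]

The transformation works a little differently if the last coordinate process is not $\equiv 0$.

Example 4.4.13. Let $m = 5$, $p = (\tfrac{1}{4}, \tfrac{1}{4}, 0, 0, \tfrac{1}{2})$, and the transformation $T = T^{(3)}$.

\[
N \mapsto N^{sp} \mapsto T^{(3)} N^{sp} : \qquad
N \overset{\text{split}}{\mapsto}
\bigl(N^{(1)},\; N^{(2)},\; 0,\; 0,\; N^{(5)}\bigr)^\top
\overset{T^{(3)}}{\mapsto}
\bigl(N^{(1)},\; N^{(2)},\; -N,\; 0\bigr)^\top
\]

All examples will appear in the later example network of $d = 4$ nodes.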

4.4.2 Change of measure for the split process

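Changing the split process means changing the unsplit process and the Bernoulli random variable governing the routing. Since we constructed the counting process $N$ and the routing variables $r_1, r_2, \dots$ to be independent we multiply mass functions. First each variable is twisted exponentially.

We apply the exponential twist of 2.3.1 to $r$ with the lmgf $K$ defined in 4.4.4. For measurable $A \subset \mathbb{R}^m$

\begin{align*}
P(r \in A) &= \sum_{k : e_k \in A} p_k, \\
P^{(\theta)}(r \in A) &= \sum_{k : e_k \in A} e^{\langle \theta, e_k\rangle - K(\theta)}\, p_k
= \sum_{k : e_k \in A} \frac{e^{\theta_k}}{E[e^{\langle \theta, r\rangle}]}\, p_k .
\end{align*}

We may also write

\[
P^{(\theta)}(r \in A) = E\Bigl[\mathbf{1}_{r \in A}\, \frac{e^{\langle r, \theta\rangle}}{E[e^{\langle \theta, r\rangle}]}\Bigr]
\]

and we identify the change of measure

\[
r \mapsto e^{\langle r, \theta\rangle - K(\theta)} = \frac{e^{\langle r, \theta\rangle}}{E[e^{\langle \theta, r\rangle}]} .
\]

The sum $\sum_{k=1}^n r_k$ has lmgf $nK$ by independence of $r_1,\dots,r_n$ and the density of the twisted distribution wrt the original distribution is

\[
\sum_{k=1}^n r_k \mapsto \prod_{k=1}^n e^{\langle r_k, \theta\rangle - K(\theta)} = e^{\langle \sum_{k=1}^n r_k,\, \theta\rangle - nK(\theta)}
\]

and summands remain independent under the new measure.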
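For a concrete feel of the exponential twist of the routing variable, here is a minimal sketch. The function name `twist_routing` and the numbers are illustrative assumptions; only the formula $p_k e^{\theta_k - K(\theta)}$ is taken from the text.

```python
import math


def twist_routing(p, theta):
    """Twisted mass function p_k * exp(theta_k - K(theta)), k = 1, ..., m."""
    weights = [pk * math.exp(tk) for pk, tk in zip(p, theta)]
    normaliser = sum(weights)  # equals exp(K(theta)) = E[exp<theta, r>]
    return [w / normaliser for w in weights]


# Routing distribution of example 4.4.13 and an arbitrary twist parameter.
p = [0.25, 0.25, 0.0, 0.0, 0.5]
theta = [1.0, 0.0, 2.0, -1.0, 0.0]
twisted = twist_routing(p, theta)
```

The twisted vector is again a probability vector, destinations with $p_k = 0$ stay unreachable whatever $\theta_k$ is, and $\theta = 0$ leaves $p$ unchanged.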

The change of measure for the counting process was given in definition 3.6.4 and (3.15). We combine both changes of measure on the product space for $N_t$ and $\sum_{k=1}^n r_k$:

Claim 4.4.14. Let $N$ be a counting process with lmgf $\Gamma$ and $r \in \mathbb{R}^m$ a routing variable with mass function $p$ and lmgf $K$. Then for $\theta \in \mathcal{D}(K) = \mathbb{R}^m$

\[
(\theta, t) \mapsto \exp\bigl\{\langle N^{sp}_t, \theta\rangle - t\,\Gamma \circ K(\theta)\bigr\}\; r\bigl(t, -\Gamma \circ K(\theta)\bigr)
\]

is a change of measure process for the split process constructed from $N$ and $p$. The $r(\cdot,\cdot)$ appearing in the change of measure is a random function defined in (3.6.5) referring to the distribution function $F$ of inter event times of $N$ and the age $B$ (and $r \ne r(\cdot,\cdot)$).

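Proof of 4.4.14: Since $N$ and $\sum_{k=1}^n r_k$ are independent we twist them individually, with $\zeta \in \mathbb{R}$ the twist parameter for $N$ and $\theta \in \mathbb{R}^m$ the twist parameter for $\sum_{k=1}^n r_k$. Let $j \in \mathbb{N}$ and $x \in \mathbb{N}^m$.

\begin{align*}
P_{\theta,[\zeta]}\Bigl(N_t = j,\ \sum_{k=1}^n r_k = x\Bigr)
&= E_{\theta,[\zeta]}\bigl[\mathbf{1}_{N_t = j}\, \mathbf{1}_{\sum_{k=1}^n r_k = x}\bigr] \\
&= E_{\theta}\Bigl[\mathbf{1}_{N_t = j}\, e^{\zeta j - t\Gamma(\zeta)}\, \frac{F^c_{-\Gamma(\zeta)}}{F^c}(B(t))\, e^{B(t)\Gamma(\zeta)}\, \mathbf{1}_{\sum_{k=1}^n r_k = x}\Bigr] \\
&= E\Bigl[\mathbf{1}_{N_t = j}\, e^{\zeta j - t\Gamma(\zeta)}\, \frac{F^c_{-\Gamma(\zeta)}}{F^c}(B(t))\, e^{B(t)\Gamma(\zeta)}\, \mathbf{1}_{\sum_{k=1}^n r_k = x}\, e^{\langle \theta, x\rangle - nK(\theta)}\Bigr]
\end{align*}

The following is for the case of $n = j$ and $\zeta = K(\theta)$.

\begin{align*}
P_{\theta,[K(\theta)]}\Bigl(N_t = n,\ \sum_{k=1}^n r_k = x\Bigr)
&= E\Bigl[\mathbf{1}_{N_t = n}\, e^{K(\theta) n - t\,\Gamma \circ K(\theta)}\, \frac{F^c_{-\Gamma \circ K(\theta)}}{F^c}(B(t))\, e^{B(t)\,\Gamma \circ K(\theta)}\, \mathbf{1}_{\sum_{k=1}^n r_k = x}\, e^{\langle \theta, x\rangle - nK(\theta)}\Bigr] \\
&= E\Bigl[\mathbf{1}_{N_t = n}\, \mathbf{1}_{\sum_{k=1}^n r_k = x}\, e^{\langle \theta, x\rangle - t\,\Gamma \circ K(\theta)}\, \frac{F^c_{-\Gamma \circ K(\theta)}}{F^c}(B(t))\, e^{B(t)\,\Gamma \circ K(\theta)}\Bigr]
\end{align*}

We apply this to the split counting process $N^{sp}(t) = \sum_{k=1}^{N(t)} r_k$. Fix an arbitrary $x \in \mathbb{N}^m$ and $n = n(x) = \sum_{k=1}^m x_k$. Then

\[
P(N^{sp}(t) = x) = P\Bigl(N_t = n,\ \sum_{k=1}^n r_k = x\Bigr)
\]

and

\[
P_{\theta,[K(\theta)]}\Bigl(N_t = n,\ \sum_{k=1}^n r_k = x\Bigr)
= E\Bigl[\mathbf{1}_{N^{sp}_t = x}\, e^{\langle \theta, x\rangle - t\,\Gamma \circ K(\theta)}\, \frac{F^c_{-\Gamma \circ K(\theta)}}{F^c}(B(t))\, e^{B(t)\,\Gamma \circ K(\theta)}\Bigr],
\]

which identifies the claimed density process.

□ 4.4.14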

Application to the network

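We rewrite the change of measure process to fit our transformed split counting process. Let $\theta \in \mathbb{R}^{d+1}$ and $T = T^{(1)}$ for notational simplicity.

\[
\langle N^{sp}_t, \theta\rangle = \langle T N^{sp}_t, (T^\top)^{-1} \theta\rangle
\]

\[
\xi := (T^\top)^{-1} \theta = (T^{-1})^\top \theta
= \begin{pmatrix} 0 \\ \theta_2 \\ \vdots \\ \theta_d \end{pmatrix}
- \theta_{d+1} \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}
\]

With this definition $\theta = T^\top \xi$ and $K(\theta) = \Xi^{(1)}(\xi) - \xi_1$ from definition 4.4.8. We rewrite the change of measure 4.4.14 and denote it $G^{(i)}$.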

Definition 4.4.15 ($G^{(i)}$). For

• a renewal counting process $N$

• a sub-probability $p^{(i)} = (p_{i1},\dots,p_{id})$ with $p_{ii} = 0$ and lmgf $\Xi^{(i)}$

• the split process $N^{sp}$ associated with $N$ and $(p_{i1},\dots,p_{id},\, 1 - \sum_{j=1}^d p_{ij})$

• a linear transformation $T^{(i)}$

we define the change of measure process for $T^{(i)} N^{sp}$ with parameter $\xi \in \mathbb{R}^d$

\[
G^{(i)}(\xi, t) = \exp\bigl\{\langle T^{(i)} N^{sp}_t, \xi\rangle - t\,\Gamma(-\xi_i + \Xi^{(i)}(\xi))\bigr\}\; r\bigl(t, -\Gamma(-\xi_i + \Xi^{(i)}(\xi))\bigr).
\]

We give a summary of this subsection 4.4.2:

Corollary 4.4.16. Let $N$ be a counting process with inter event times density $f$ and lmgf $\Gamma$ that is split into $d + 1$ processes wrt a probability measure $p$. Let $(p_1,\dots,p_d)$ be the sub-probability measure associated with $p$ with lmgf $\Xi$. Then under the change of measure $G^{(i)}$ the new process is distributed like the split process of iid inter event times with density $f_{-\Gamma(-\xi_1 + \Xi^{(1)}(\xi))}$ and mean $\Gamma'(-\xi_1 + \Xi^{(1)}(\xi))$ and with routing (sub)probabilities $\nabla \Xi^{(i)}(\xi) = p(\xi)$.

Definition 4.4.17. For a rcp $N$ with lmgf $\Gamma$ and fixed $i$ and $\xi \in \mathbb{R}^d$ the function

\[
\xi \mapsto \Gamma'\bigl(-\xi_i + \Xi^{(i)}(\xi)\bigr)
\]

gives the rate of $N$ under the change of measure $G^{(i)}$.

4.4.3 Sample path LDP for the split process

Claim 4.4.18. Let $N^{sp}$ be the split process associated with the counting process $N$ and the probability measure $p$. Let $N$ have lmgf $\Gamma$ and $p$ lmgf $K$. Then under the scaling of 4.4.2 a sample path large deviation principle holds for $N^{sp}$ in $D([0,T], \mathbb{R}^m)$ equipped with the sup-norm induced topology. The rate function is good, convex and

\[
h \mapsto \int_{t=0}^{T} (\Gamma \circ K)^*(h'(t))\, dt
\]

for $h \in AC([0,T], \mathbb{R}^m)$, $h(0) = 0$ and $h \mapsto \infty$ otherwise.

Proof of 4.4.18: Set $S_n = \sum_{k=1}^n r_k$. We get a sample path LDP for the split process $t \mapsto N^{sp}_t = \sum_{k=1}^{N_t} r_k = S \circ N(t)$ similarly to the partial sums process of iid summands in Mogulskii's theorem (cf [5], theorem 5.1.2 with $K(\theta) < \infty$ for all $\theta \in \mathbb{R}^m$). We only sketch it in the following. We already have the lmgf $\Gamma \circ K$ and from restarting the counting process we easily get the finite dimensional lmgfs for $\bigl(\tfrac{1}{n} S \circ N(0), \tfrac{1}{n} S \circ N(nt_1), \dots, \tfrac{1}{n} S \circ N(nT)\bigr)$. From the Gärtner-Ellis theorem we get the finite dimensional large deviations principle with the rate function as the sum over expressions of $(\Gamma \circ K)^*$. Applying the projective limit theorem we get the large deviation principle in the continuous functions with the rate function in integral form, the local rate function being the Fenchel-Legendre transform $(\Gamma \circ K)^*$. The topology we get is that of pointwise convergence. One can then deduce that the rate function is concentrated on absolutely continuous functions. To obtain the large deviation principle in the sup-norm induced topology we have to prove exponential tightness. This is done in 4.4.19.

□ 4.4.18

Claim 4.4.19. The distributions of the scaled split counting process $(N^{sp}_n;\ n \in \mathbb{N})$ are exponentially tight under the sup-norm induced topology.

Proof of 4.4.19: The $i$-th coordinate process for the split process has inter event times $\tau^\circ = \sum_{k=1}^{G} \tau_k$ with geometric $G$ with mass function $P(G = g) = p_i (1 - p_i)^{g-1}$. To fit our notation: $p = 1 - p_i$.

By 2.2.11 the lmgf is $\Lambda^\circ(\theta) = \Lambda(\theta) + \log \frac{1 - p}{1 - p\, e^{\Lambda(\theta)}}$ and we need the associated $\Gamma$, that is $-\Lambda^{\circ -1}(-\theta)$. The inverse of $\Lambda^\circ$ is

\[
x \mapsto \Lambda^{-1}\bigl(-\log(p + (1 - p) e^{-x})\bigr)
\]

and we get on the level of the counting process

\[
\Gamma^\circ(x) = -\Lambda^{\circ -1}(-x) = -\Lambda^{-1}\bigl(-\log(p + (1 - p) e^{x})\bigr).
\]

Substituting $p = 1 - p_i$ and applying the definition of $\Gamma$ through $\Lambda$ (cf (2.2.7))

\[
\Gamma^\circ(x) = \Gamma\bigl(\log(1 - p_i + p_i e^{x})\bigr).
\]

This coincides with the coordinate-wise lmgf of the split counting process: Consider $x e_i$ for some $x \in \mathbb{R}$

\begin{align*}
\Gamma \circ K \circ \pi_i(x e_i) = \Gamma \circ K(x e_i)
&= \Gamma\Bigl(\log\Bigl(p_i e^{x} + \sum_{m \ne i} p_m\Bigr)\Bigr) \\
&= \Gamma\bigl(\log(p_i e^{x} + (1 - p_i))\bigr) = \Gamma^\circ(x)
\end{align*}
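As a sanity check of the formula $\Gamma^\circ(x) = \Gamma(\log(1 - p_i + p_i e^x))$, consider the special case where $N$ is a Poisson process. This special case is an assumption of the sketch (it corresponds to the Jackson network, not the general renewal setting), with $\Gamma(\theta) = \lambda(e^\theta - 1)$; the function names are hypothetical. Thinning a Poisson process of rate $\lambda$ with probability $p_i$ should give a Poisson process of rate $\lambda p_i$.

```python
import math


def gamma_poisson(theta, rate):
    """lmgf of a Poisson counting process: lim (1/t) log E[exp(theta * N_t)]."""
    return rate * (math.exp(theta) - 1.0)


def gamma_coordinate(x, p_i, gamma):
    """Gamma°(x) = Gamma(log(1 - p_i + p_i * e^x)) for the i-th split coordinate."""
    return gamma(math.log(1.0 - p_i + p_i * math.exp(x)))


rate, p_i, x = 2.0, 0.3, 0.8
split_lmgf = gamma_coordinate(x, p_i, lambda th: gamma_poisson(th, rate))
thinned_lmgf = gamma_poisson(x, rate * p_i)
```

Both values agree, since $\lambda\bigl(e^{\log(1 - p_i + p_i e^x)} - 1\bigr) = \lambda p_i (e^x - 1)$.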

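For each coordinate process $N^{(k)}$ (for $k = 1,\dots,m$) a sample path large deviation principle holds (cf. 4.3.9). We apply these to construct a compact set from the level sets of the coordinate functions. Let $\epsilon > 0$.

\begin{align*}
\mathcal{L}(\alpha, k) &:= \Bigl\{f \in C([0,T], \mathbb{R}) \;\Big|\; \int_{t=0}^{T} \Gamma^{\circ(k)*}(f'(t))\, dt < \alpha\Bigr\} \\
\mathcal{K}(\alpha) &:= \bigl\{f \in C([0,T], \mathbb{R}^m) \;\big|\; f_k \in \mathcal{L}(\alpha + \epsilon, k) \text{ for } k = 1,\dots,m\bigr\}
\end{align*}

Each $\mathcal{L}(\alpha + \epsilon, k)$ is a compact set in $C([0,T], \mathbb{R})$ by the large deviation principle 4.3.9 with good rate function. $\mathcal{K}(\alpha)$ is a Cartesian product of compact sets and itself compact in the product space $C([0,T], \mathbb{R})^m = C([0,T], \mathbb{R}^m)$.

\begin{align*}
P(\hat{N}_n \notin \mathcal{K}(\alpha)) &= P\bigl(\hat{N}^{(k)}_n \notin \mathcal{L}(\alpha + \epsilon, k) \text{ for some } k\bigr) \\
&\le m \max_{k = 1,\dots,m} P\bigl(\hat{N}^{(k)}_n \notin \mathcal{L}(\alpha + \epsilon, k)\bigr)
\end{align*}

The closure of $\mathcal{L}(\alpha + \epsilon, k)^c$ is a subset of $\mathcal{L}(\alpha, k)^c$ and we can apply an alternative formulation of the LDP (cf [DZ] (1.2.7), p.6).

\begin{align*}
&\limsup_{n\to\infty} \frac{1}{n} \log P\bigl(\hat{N}^{(k)}_n \notin \mathcal{L}(\alpha + \epsilon, k)\bigr) \le -\alpha \\
\Rightarrow\ &\limsup_{n\to\infty} \frac{1}{n} \log P\bigl(\hat{N}_n \notin \mathcal{K}(\alpha)\bigr) \le -\alpha
\end{align*}

which is exponential tightness.

□ 4.4.19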

Application to the network

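We simply restate the claim of the large deviation principle 4.4.18 in terms of the process transformed by $T$. As usual $d = m - 1$.

Corollary 4.4.20. There is a sample path large deviation principle for the transformed split process $t \mapsto T N^{sp}_t$ in $D([0,T], \mathbb{R}^d)$ equipped with the sup-norm induced topology with the good convex rate function

\[
h \mapsto \int_{t=0}^{T} \bigl(\Gamma \circ (\Xi^{(i)} - \pi_i)\bigr)^*(h'(t))\, dt
\]

for $h \in AC([0,T], \mathbb{R}^d)$, $h(0) = 0$ and $h \mapsto \infty$ otherwise.

Proof of 4.4.20: From 4.4.18 and from $T^{(i)}$ being a continuous bijection we immediately have the sample path large deviation principle in the same space with rate function

\[
h \mapsto \int_{t=0}^{T} (\Gamma \circ K)^* \circ T^{-1}(h'(t))\, dt
\]

and we only have to identify the local rate functions. Abbreviate $T = T^{(i)}$.

\begin{align*}
(\Gamma \circ K)^* \circ T^{-1}(x)
&= \inf_{z} \Bigl[\Gamma^*(z) + z\, K^*\Bigl(\frac{T^{-1} x}{z}\Bigr)\Bigr] \\
&\overset{\text{linearity of } T}{=} \inf_{z} \Bigl[\Gamma^*(z) + z\, K^*\Bigl(T^{-1} \frac{x}{z}\Bigr)\Bigr] \\
&\overset{\text{def. 4.4.8}}{=} \inf_{z} \Bigl[\Gamma^*(z) + z\, (\Xi^{(i)} - \pi_i)^*\Bigl(\frac{x}{z}\Bigr)\Bigr] \\
&= \bigl(\Gamma \circ (\Xi^{(i)} - \pi_i)\bigr)^*(x) \tag{4.9}
\end{align*}

□ 4.4.20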

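Lemma 4.4.21 (Rate function identification). For the local rate function in 4.4.20 for the transformed split counting process we have

\[
\bigl(\Gamma \circ (\Xi^{(i)} - \pi_i)\bigr)^*(x)
= \inf_{\substack{r^{(i)},\, \gamma \\ \gamma (r^{(i)} - e_i) = x}} \Bigl[\Gamma^*(\gamma) + \gamma\, \Xi^{(i)*}(r^{(i)})\Bigr].
\]

Proof of 4.4.21: We make explicit the Fenchel-Legendre transform (4.9).

\begin{align*}
\bigl(\Gamma \circ (\Xi^{(i)} - \pi_i)\bigr)^*(x)
&= \inf_{\gamma} \Bigl[\Gamma^*(\gamma) + \gamma\, (\Xi^{(i)} - \pi_i)^*\Bigl(\frac{1}{\gamma} x\Bigr)\Bigr] \\
&= \inf_{\gamma} \Bigl[\Gamma^*(\gamma) + \gamma \inf_{b \in \mathbb{R}^d} \Bigl(\Xi^{(i)*}\Bigl(\frac{1}{\gamma} x - b\Bigr) + (-\pi_i)^*(b)\Bigr)\Bigr]
\end{align*}

but for the projection $\pi_i$ and $b \ne -e_i$ we get

\[
(-\pi_i)^*(b) \overset{(7.1)}{=} -\pi_i^*(-b) = \infty .
\]

So the projection enforces $b = -e_i$ and then vanishes. We continue

\[
\bigl(\Gamma \circ (\Xi^{(i)} - \pi_i)\bigr)^*(x)
= \inf_{\gamma} \Bigl[\Gamma^*(\gamma) + \gamma\, \Xi^{(i)*}\Bigl(\frac{1}{\gamma} x + e_i\Bigr)\Bigr].
\]

Writing the restriction differently as

\[
r^{(i)} = \frac{x}{\gamma} + e_i \iff \gamma (r^{(i)} - e_i) = x
\]

we have proved the claim.

□ 4.4.21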

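Claim 4.4.22. If $\Xi$ is associated with the sub-probability measure $p$ on $\mathbb{R}^d$ then $\Xi^*(r) < \infty$ for $r$ representing a strictly positive sub-probability measure in $\mathbb{R}^d$ with

\[
p_0 = 0 \iff r_0 = 0 \quad\text{and}\quad p_i = 0 \Rightarrow r_i = 0 .
\]

Proof of 4.4.22: We have for $\theta \in \mathbb{R}^d$

\[
\nabla \Xi(\theta) = \left[\frac{p_i e^{\theta_i}}{\sum_{j=1}^d p_j e^{\theta_j} + p_0}\right]_{i = 1,\dots,d}
\]

and $\nabla \Xi(\theta)$ is a probability measure iff $p$ is ($p_0 = 0$). For $p_0 > 0$ and $r_i = 0 \Rightarrow p_i = 0$ set

\[
\theta_i = \log \frac{r_i}{p_i} \qquad (i : p_i > 0).
\]

For $i$ such that $p_i = 0$ the $\theta_i$ never shows up in $\Xi(\theta)$ at all and can be set to an arbitrary value. We get $\nabla \Xi(\theta) = r$ and an optimiser in the transform is found. The coordinates $\theta_i$ with $i \in \operatorname{supp}(p)$ are uniquely defined.

If there are $i$ such that $r_i = 0$, $p_i > 0$ then setting this $\theta_i = -\infty$ also makes $\nabla \Xi(\theta) = r$. Formally an optimiser of the transform does not exist (not in $\mathbb{R}^d$). We nevertheless get $\Xi^*(r) < \infty$ and explain it more formally: Set $z = \sum_{i : r_i > 0} p_i + p_0$, which will be $z < 1$ if there are $i$ such that $r_i = 0$, $p_i > 0$. If $z = 0$ then $p$ and $r$ are orthogonal, that is their supports have an empty intersection. So we assume $z > 0$ in the following:

\begin{align*}
\sup_{\theta \in \mathbb{R}^d} \langle \theta, r\rangle - \Xi(\theta)
&= \sup_{\theta_i : r_i > 0} \Bigl[\langle \theta, r\rangle + \sup_{\theta_i : r_i = 0} \bigl(-\Xi(\theta)\bigr)\Bigr] \\
&= \sup_{\theta_i : r_i > 0} \Bigl[\langle \theta, r\rangle - \lim_{\theta_i \to -\infty : r_i = 0} \Xi(\theta)\Bigr] \\
&= \sup_{\theta_i : r_i > 0} \Bigl[\langle \theta, r\rangle - \log\Bigl(\sum_{\substack{j=1 \\ r_j > 0}}^{d} p_j e^{\theta_j} + p_0\Bigr)\Bigr] \\
&= \sup_{\theta_i : r_i > 0} \Bigl[\langle \theta, r\rangle - \Bigl(\log \frac{1}{z} + \log\Bigl(\sum_{\substack{j=1 \\ r_j > 0}}^{d} p_j e^{\theta_j} + p_0\Bigr) - \log \frac{1}{z}\Bigr)\Bigr] \\
&= \sup_{\theta_i : r_i > 0} \Bigl[\langle \theta, r\rangle - \log\Bigl(\sum_{\substack{j=1 \\ r_j > 0}}^{d} \frac{p_j}{z} e^{\theta_j} + \frac{p_0}{z}\Bigr) + \log \frac{1}{z}\Bigr]
\end{align*}

which is now of the kind $\Xi^*(r)$ with $\Xi$ associated with the submeasure $p' = (\tfrac{p_i}{z};\ i \in \{1,\dots,d\} \cap \operatorname{supp}(r))$. $r$ and $p'$ are now equivalent and our initial reasoning applies.

□ 4.4.22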

So if Ξ∗(r)<∞for r, p that are not equivalent there is no optimising θ∈Rd.

However, we can still associate with Ξ∗(r) a change of measure for the sub-

probability measure pthat changes pinto r. Note that Ξ is different from the

inter event times lmgfs in that it comes from a random variable with point

mass.

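Corollary 4.4.23. If $\Xi$ is associated with the sub-probability measure $p$ on $\mathbb{R}^d$ (cf definition 4.4.8) and $\theta \in \mathbb{R}^d$ then

\[
\nabla \Xi(\theta) = \left[\frac{p_i e^{\theta_i}}{\sum_{j=1}^d p_j e^{\theta_j} + p_0}\right]_{i = 1,\dots,d}
\]

is a sub-probability measure. $\nabla \Xi^{(i)}(\theta)$ is a probability measure if $p$ is, and

\[
\nabla \Xi(\theta)_i > 0 \iff p_i > 0 .
\]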

Chapter 5

Stochastic networks and associated processes

This chapter introduces stochastic networks and the processes we work with

in the next chapter.

We give a formal definition of stochastic networks starting from random

walks on graphs. We will point out how the generalised Jackson network and

the Jackson network are instances of stochastic networks.

As we are concerned with rare events for generalised Jackson networks where

we can observe initial queue sizes we introduce tools to describe the expected

behaviour of such a network with an initial condition. The initial condition

will tell us which queues are initially full with a large queue size and which

are about empty. In order to describe the expected future behaviour of the

network with a given present starting point we will apply the Skorohod map.

Processes we work with are the free, the network, and the local process

and we develop their change of measure. For the free process we will develop

sample path large deviations.

We repeat our notation for counting processes.

\begin{tabular}{llll}
inter event time & rcp, renewal counting process & lmgf of rcp & rate \\
$\tau$ & $N : t \mapsto N(t)$ & $\Gamma(\theta) = \lim_{t\to\infty} \frac{1}{t} \log E[e^{\theta N_t}]$ & $\Gamma'(0) = \frac{1}{E[\tau]}$ \\
chapter 2 & definition 3.1.1 & section 3.3 & definition 3.6.12
\end{tabular}

We have further defined the scaled process $N_n$ in 3.4.1, the split process $N^{sp}$ in 4.4.1 with $N^{sp}_n$ the scaled split process, and the restarted process $N^{re,(s_1,\dots,s_k)}$ in 3.4.16 with the scaling specified in 3.4.12.

And we repeat our assumptions

Assumption 5.0.2. If τ < ∞is a non-deterministic inter event time then

the assumptions on inter event times of chapter 2 should hold: 2.2.2, 2.2.13,

2.4.2.

In this chapter we will have different counting processes in the setting of a stochastic network: processes counting external arrivals at nodes of the network, which we will call arrival processes, and processes counting how many customers can be served over any period of time, which will be denoted service processes.

5.1 Stochastic networks

Definition 5.1.1 (Graph).A graph is a collection of nodes and edges. For

a graph of finitely many nodes we denote them as {1,...,d}for some d∈N.

Edges will be directed and are written

{(i, j)|i, j ∈ {1,...,d}}.

In this thesis we work with finite directed graphs only.

Definition 5.1.2 (Random walk on a graph). Given

• a graph with nodes $\{1,\dots,d\}$ for some $d \in \mathbb{N}$ and edges $Y \subseteq \{1,\dots,d\}^2$

• a substochastic matrix $P \in \mathbb{R}^{d,d}$ with components $p_{ij} > 0 \iff (i,j) \in Y$ and rows $p^{(i)}$

• a fixed node $i_A \in \{1,\dots,d\}$

• a fixed $t_0 \ge 0$

• for each $i \in \{1,\dots,d\}$ a sequence of inter event times $\tau^{(i)S}_1, \tau^{(i)S}_2, \dots$ denoted service times

a random walk $z$ on the graph is a stochastic process giving at each time $t$ the position of a customer who enters the graph at time $t_0$ at node $i_A$ and travels the graph according to the following rules:

• when at node $i$ for the $k$-th time, occupy the server for time $\tau^{(i)S}_k$

• at the end of the server occupation time / service time leave node $i$ immediately and either go to node $j$ with probability $p_{ij}$ or leave the network with probability $1 - \sum_{j=1}^d p_{ij}$.

If $t_0 = 0$ the process starts as $z(0) = i_A$; if $t_0 > 0$ the process starts as $z(t) = 0$ for $t \in [0, t_0)$ and $z(t_0) = i_A$. Once the customer leaves the network, the state of the process is fixed at 0.

This random walk on a graph will also be referred to as isolated random

walk on a graph.

Definition 5.1.3 (Stochastic network). A stochastic network is a joint random walk on a graph with interaction: If $d$ is the number of nodes of the network and $A^{(i)}$ are counting processes modelling the arrivals to node $i$ such that

• for every jump time $t'$ of $A^{(j)}$ and jump size $\Delta A^{(j)}(t') = k$ of any $j \in \{1,\dots,d\}$ there are $k$ random walks $z_{j, A^{(j)}(t'-)+1}, \dots, z_{j, A^{(j)}(t')}$ with $i_A = j$ and $t_0 = t'$

then define the stochastic network process $Z$ as

\[
Z^{(i)}(t) = \sum_{j=1}^{d} \sum_{k=1}^{\infty} \mathbf{1}_{z_{j,k}(t) = i}, \qquad
Z(t) = \begin{pmatrix} Z^{(1)}(t) \\ \vdots \\ Z^{(d)}(t) \end{pmatrix}.
\]

A selection of typical interaction of randomly walking customers in the

network is

•queueing at single server nodes: if at arrival at a node the server is

occupied, an arriving customer has to wait until they can occupy the

server;

•state dependent routing: the matrix Pj,n of the random walk zj,n may

be time-inhomogeneous and Pj,n(t) may depend on Z(t−);

•modified service: for each random walk the service times τ(i)

1, τ(i)

2,...at

node imay depend on the value of Zat the time the customer occupies

the server at node i.

If the isolated random walks forming the stochastic network are independent (no interaction) the stochastic network represents a network with a pool of infinitely many servers at each node. Generally queueing occurs whenever there are more customers at a node than servers, as a server can only serve one customer at a time. The number of customers exceeding the number of servers forms the queue at that node. Possible permutations of the sequence of customers arriving at a node and of customers starting service are specified in the queueing discipline (the best known are FIFO and LIFO, which are explained in every book on queueing). Examples of state dependent routing are “join the shortest queue” and modifications of it, like “join the shortest queue of a randomly sampled subset”. Modified service might be: if queue $i$ is empty over time $[t_1, t_2)$ then the server at node $i$ becomes an additional server at some node $j$ with a non-empty queue ([8] of Robert Foley and David McDonald).

Definition 5.1.4 (Generalised Jackson network).A generalised Jackson net-

work is a stochastic network with all random walks sharing

•the same graph,

•the same time-homogeneous, deterministic routing matrix P,

•the same distribution of service times at each fixed node; and service

times are independent of how often a customer returns to the node.

Furthermore

•inter arrival times at each node are iid and independent of everything

else,

•service times at each node are iid and independent of everything else,

•there is (only) queueing interaction with a single server at each node.

In this thesis we assume that all service times and all inter arrival times

satisfy assumption 5.0.2.

Note that under the general assumption 5.0.2 the counting process $A^{(i)}$ associated with inter arrival times at node $i$ is a renewal counting process with all jump sizes equal to unity, and regular (finitely many jumps over finite intervals). The stochastic network process $Z$ for the generalised Jackson network satisfies

\[
Z^{(i)}(t) = \sum_{j=1}^{d} \sum_{k=1}^{A^{(j)}(t)} \mathbf{1}_{z_{j,k}(t) = i} \le \sum_{j=1}^{d} A^{(j)}(t) < \infty
\]

which makes $Z : \mathbb{R}_{\ge 0} \to \mathbb{N}^d$.

For the generalised Jackson network we have for each random walk $z_{j,n}$ on the graph a sequence of inter event times at each node. Due to independence of all service times of the isolated random walks we can replace service times at each node $i$ by just a single sequence of iid service times. These service times will form the renewal counting processes $S^{(1)},\dots,S^{(d)}$.

We summarise the counting processes introduced so far:

\begin{tabular}{llll}
inter event time & rcp & lmgf & rate \\
$\tau^{(i)A}$ & $A^{(i)} : t \mapsto A^{(i)}(t)$ & $\Gamma^{(i)}_A$ & $\lambda_i = \Gamma^{(i)\prime}_A(0) = \frac{1}{E[\tau^{(i)A}]}$ \\
$\tau^{(i)S}$ & $S^{(i)} : t \mapsto S^{(i)}(t)$ & $\Gamma^{(i)}_S$ & $\mu_i = \Gamma^{(i)\prime}_S(0) = \frac{1}{E[\tau^{(i)S}]}$
\end{tabular}

The following definition names the processes required to describe the generalised Jackson network the network's primitives. We will later, in 5.3.4, give another equivalent definition for the network primitives that will be technically more convenient and will apply results from section 4.4.

Definition 5.1.5 (Network primitives I). For a generalised Jackson network with $d$ nodes the following processes are denoted the network's primitives:

• arrival processes $A^{(1)},\dots,A^{(d)}$ with inter event times $\tau^{(i)A}$ and lmgf $\Gamma^{(i)}_A$ for $A^{(i)}$;

• service processes $S^{(1)},\dots,S^{(d)}$ with inter event times $\tau^{(i)S}$ and lmgf $\Gamma^{(i)}_S$ for $S^{(i)}$;

• the processes of routing decisions

\[
n \mapsto \sum_{k=1}^{n} r^{(i)}_k
\]

with $r^{(i)}, r^{(i)}_1, r^{(i)}_2, \dots$ iid with values in $\{e_1,\dots,e_d,\vec{0}\}$ and $P(r^{(i)} = e_j) = p_{ij}$ for $i = 1,\dots,d$ and $\Xi^{(i)}$ the lmgf of $r^{(i)}$, cf 4.4.8.

Definition 5.1.6 (Rates $(\lambda, \mu, P)$). For a generalised Jackson network with network primitives as in 5.1.5 and typical inter arrival times $\tau^{(i)A}$ for the primitive arrival process $A^{(i)}$ and typical service times $\tau^{(i)S}$ for the primitive service process $S^{(i)}$ (for $i = 1,\dots,d$) let

• $\lambda_i = \Gamma^{(i)\prime}_A(0) = \frac{1}{E[\tau^{(i)A}]}$ and $\lambda \in \mathbb{R}^d_{\ge 0}$ have coordinates $\lambda_i$ for $i = 1,\dots,d$;

• $\mu_i = \Gamma^{(i)\prime}_S(0) = \frac{1}{E[\tau^{(i)S}]}$ and $\mu \in \mathbb{R}^d_{> 0}$ have coordinates $\mu_i$ for $i = 1,\dots,d$;

• $p^{(i)} = \nabla \Xi^{(i)}(0) = E[r^{(i)}]$ and $P \in \mathbb{R}^{d \times d}$ has rows $p^{(i)}$ for $i = 1,\dots,d$.

Then we denote by $(\lambda, \mu, P)$ the arrival rates, the service rates, and the routing matrix of the generalised Jackson network. In short we say that $(\lambda, \mu, P)$ are the rates of the network.

In the following section we will investigate stochastic networks through

their rates only. Results we obtain there are applicable to stochastic net-

works that are not generalised Jackson networks: They for example apply to

networks where arrival processes are sums of independent renewal counting

processes.

Definition 5.1.7 (Jackson Network).A Jackson Network is a generalised

Jackson network with all inter arrival and service times being exponentially

distributed.

James Jackson’s definition in [14] of what he then called a network of

waiting lines allowed for finitely many servers at each node and did specify

the queueing discipline as “first come first serve”. In the context of large de-

viation for the queue sizes multiple servers are modeled as a single server and

the service times at that node are decreased correspondingly. Since we do not

distinguish different classes of customers and investigate queue sizes only and

not delay the queueing discipline is not relevant here. Our definition 5.1.7

of Jackson networks corresponds to the definition of Ignatiouk-Robert in [12].

The following definition is reasonable in a network where customers share the same underlying graph, as they do in the generalised Jackson network.

Definition 5.1.8 (Path). In a network with routing matrix $P$ we say there is a path from node $i$ to node $j \ne i$ ($i, j \in \{1,\dots,d\}$) if there is a sequence $k_1,\dots,k_m$ of nodes in $\{1,\dots,d\}$ such that

\[
p_{i k_1}\, p_{k_1 k_2} \cdots p_{k_m j} > 0
\]

or equivalently $P^{m+1}(i,j) > 0$ with $P^{m+1}$ the $(m+1)$-fold product of $P$. We say that such a path has length $m + 1$ as it moves along $m + 1$ (not necessarily different) edges.

We make the following assumption for the rest of this thesis:

Assumption 5.1.9 (Open, no immediate feedback). Networks are open and without immediate feedback:

• For each node $j$ there is a node $i$ with $\lambda_i > 0$ and a path from $i$ to $j$. If $\lambda_i > 0$ node $i$ is called an entry node.

• For each node $i$ there is a node $j$ with $p_{j0} > 0$ and a path from $i$ to $j$. If $p_{i0} > 0$ node $i$ is called an exit node.

• When leaving a node customers are not immediately fed back into the same node.

The assumed properties are reflected in the routing matrix $P$: $P$ is strictly substochastic and $-1$, $1$ are not eigenvalues. Then $(\mathrm{id} - P^\top)^{-1}$ exists, and $(\mathrm{id} - P^\top)^{-1} \lambda > 0$ (coordinate-wise). Also $(\mathrm{id}_M - P^\top_M)^{-1}$ exists for any $M \subseteq \{1,\dots,d\}$. No immediate feedback in the network is equivalent to $p_{ii} = 0$ for $i = 1,\dots,d$.
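Assumption 5.1.9 depends only on which entries of $\lambda$ and $P$ are positive, so it can be checked by a plain graph search. The following is a minimal sketch under stated assumptions: nodes are indexed from 0 (unlike the text), the names `reachable` and `is_open_without_feedback` are hypothetical, an entry or exit node is treated as trivially connected to itself, and exit probabilities below a small tolerance are treated as zero.

```python
def reachable(P, start):
    """Nodes reachable from `start` along edges with p_ij > 0 (start included)."""
    d = len(P)
    seen, stack = {start}, [start]
    while stack:
        i = stack.pop()
        for j in range(d):
            if P[i][j] > 0 and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen


def is_open_without_feedback(lam, P, tol=1e-12):
    """Check the three bullet points of assumption 5.1.9 for rates (lam, P)."""
    d = len(P)
    exit_prob = [1.0 - sum(row) for row in P]          # p_i0
    no_feedback = all(P[i][i] == 0 for i in range(d))
    # every node is fed by some entry node
    fed = set()
    for i in range(d):
        if lam[i] > 0:
            fed |= reachable(P, i)
    # every node can reach some exit node
    drains = all(
        any(exit_prob[j] > tol for j in reachable(P, i)) for i in range(d)
    )
    return no_feedback and len(fed) == d and drains


# Illustrative 4-node routing matrix (not the one of figure 5.1).
P = [
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.5],
    [0.0, 0.0, 0.5, 0.0],
]
lam = [1.0, 1.0, 0.0, 0.0]
ok = is_open_without_feedback(lam, P)
```

A graph search is used here instead of matrix powers $P^{m+1}$ because only the positivity pattern matters; both approaches decide the same question.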

Whether assumption 5.1.9 is satisfied is a property of a network that depends only on the network topology, that is the adjacency matrix associated with $P$, and on the entry nodes $\{i \mid \lambda_i > 0\}$. Different networks with equivalent distributions for inter arrival times $\tau^{(i)A}$ and service times $\tau^{(i)S}$ and equivalent distributions for all routing decisions $r^{(i)}$ are either all open and without feedback or none of them is.

Example 5.1.10. The network of figure 5.1 satisfies 5.1.9. Nodes 1, 2 are entry nodes, nodes 3, 4 are exit nodes.

[Figure 5.1: An open network without immediate feedback for $d = 4$]

The graph is even strongly connected, which is not a general assumption.

Removing immediate feedback

The assumption 5.1.9 of no feedback is no restriction at all. For an open network with feedback we can remove the feedback and remodel the network such that the theory developed in this thesis is applicable.

In a feedback queue customers finishing service have two possibilities to proceed: either leave the network or join the queue again. If decisions to join the queue again are iid with $p$ the probability to do so then a typical total service time is the $\tau^\circ$ defined in 2.1.4. We repeat the definition:

\[
\tau^\circ = \sum_{k=1}^{G} \tau_k \quad\text{with geometrically distributed } G \ge 1 \text{ and iid } \tau, \tau_1, \tau_2, \dots
\]

The $\circ$ above the $\tau$ is a reference to the loop and the parameter $p$ is dropped in the notation.

We can now remodel the queue as one without feedback but with inter event times $\tau^\circ_1, \tau^\circ_2, \dots$.

Claim 5.1.11. Queue sizes of the single queue with inter event times $\tau, \tau_1, \tau_2, \dots$ and feedback and the GI/GI/1 with service times $\tau^\circ, \tau^\circ_1, \tau^\circ_2, \dots$ and without feedback have the same distribution.

Proof of 5.1.11 by coupling: Let $\tau_1, \tau_2, \dots$ be the iid sequence of service times and $r, r_1, r_2, \dots$ the sequence of routing decisions, with $r \in \{0, 1\}$ and $r = 1$ representing that a customer who has just finished service rejoins the feedback queue.

We model an associated queue without feedback the following way: With the sequence of routing decisions associate a sequence $G_1, G_2, \dots$ that counts the length of runs of 1-s:

• $G_1 = 1 + \max\{j \ge 1 \mid 1 = r_1 = \dots = r_j\}$

• $K(i) := \sum_{k=1}^{i-1} G_k$, $\; G_i = 1 + \max\{j \ge 1 \mid 1 = r_{K(i)+1} = \dots = r_{K(i)+j}\}$

(with $\max \emptyset = 0$). Then the $G_1, G_2, \dots$ are iid geometric and $\tau^\circ_i = \sum_{g=1}^{G_i} \tau_{K(i)+g}$ defines service times of the associated queue without feedback.

Starting from one set of sequences $\tau_1, \tau_2, \dots$ and $r_1, r_2, \dots$ we have customers leaving the respective queues with or without feedback at exactly the same epochs: at $\sum_{j=1}^{k} \tau_j$ with a $k$ such that $r_k = 0$. With the same arrival process both queues are always of the same size.

□ 5.1.11
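The coupling can be made concrete with a short sketch. It is an illustration only, with hypothetical function names: it works on the service-time clock of a server that never idles (arrival processes and idle periods are left out), and it uses integer service times so that the comparison is exact.

```python
from itertools import accumulate


def run_lengths(r):
    """G_1, G_2, ...: one plus the number of 1-s preceding each 0 in r.
    Only runs that are completed by a 0 are returned."""
    runs, current = [], 1
    for decision in r:
        if decision == 1:
            current += 1
        else:
            runs.append(current)
            current = 1
    return runs


def merged_service_times(tau, r):
    """tau°_i = sum of the G_i consecutive service times of the i-th customer."""
    merged, start = [], 0
    for g in run_lengths(r):
        merged.append(sum(tau[start:start + g]))
        start += g
    return merged


def departures_with_feedback(tau, r):
    """Epochs sum_{j<=k} tau_j at which a service ends with decision r_k = 0."""
    partial = list(accumulate(tau))
    return [partial[k] for k, decision in enumerate(r) if decision == 0]


def departures_without_feedback(tau, r):
    """Departure epochs of the remodelled queue with service times tau°_i."""
    return list(accumulate(merged_service_times(tau, r)))


tau = [3, 1, 4, 1, 5, 9, 2, 6]
r = [1, 0, 0, 1, 1, 0, 1, 0]
same = departures_with_feedback(tau, r) == departures_without_feedback(tau, r)
```

Both constructions produce the same list of departure epochs from the same pair of sequences, which is the pathwise statement behind the claim.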

The same remodelling of inter event times can be applied to a queue in a network. If at node $i$ customers are routed internally wrt $(\alpha_1,\dots,\alpha_d)^\top$ with $\alpha_i > 0$ we can change inter event times at node $i$ from $\tau^{(i)}$ to $\tau^{(i)\circ}$ (with $p = \alpha_i$) and routing to $p^{(i)} = (p_{i1},\dots,p_{id})^\top$ with $p_{ii} = 0$ and $p_{ij} = \frac{\alpha_j}{1 - \alpha_i}$ for $j \ne i$.
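A small sketch of the routing part of this remodelling, with a hypothetical function name and 0-based node indices (an assumption of the sketch):

```python
def remove_feedback(alpha, i):
    """Return (row, loop): the routing row with p_ii = 0 and
    p_ij = alpha_j / (1 - alpha_i) for j != i, plus the loop probability
    alpha_i that parametrises the geometric number of service rounds."""
    loop = alpha[i]
    if not 0 <= loop < 1:
        raise ValueError("feedback probability must lie in [0, 1)")
    row = [a / (1.0 - loop) for a in alpha]
    row[i] = 0.0
    return row, loop


alpha = [0.2, 0.3, 0.1]          # node 0 feeds back to itself with probability 0.2
row, loop = remove_feedback(alpha, 0)
exit_before = 1.0 - sum(alpha)   # probability of leaving the network per service
exit_after = 1.0 - sum(row)      # probability of leaving per merged service
```

The exit probability per merged service is the original exit probability conditioned on not looping, `exit_before / (1 - loop)`.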


5.2 Deterministic descriptions of stochastic networks

In this section we focus on the deterministic rates of a stochastic network.

This section generally applies to stochastic networks with deterministic rates,

including the generalised Jackson network. In terms of only the rates the

Jackson and the generalised Jackson network are not distinguishable.

Definition 5.2.1 (Flow).If (λ, µ, P)are the deterministic rates of an open

stochastic network then

ν=λ+P⊤min{ν, µ}

is called the traffic equation (in ν) and its unique solution is denoted the flow

in the network.

Uniqueness of νis proved in [11]. Given the network flow one can decide

if or if not the network is ergodic.

Definition 5.2.2 (Ergodic network).A network with flow νwith νi< µifor

all iis called ergodic.

From ergodicity we get the existence of an equilibrium distribution for

the number of customers in each queue of the Jackson network. We have

seen this in the introduction. This implies that in the limit of the scaling

3.4.1 all queue sizes will uniformly on compacts stay small: The a.s. limit of

the network process is the function t7→ 0. That is we know the determinis-

tic limiting behaviour, the expected behaviour. We expect the same for the

generalised Jackson network.

An easy to check criterion for ergodicity is: calculate the flow rates as if all service rates were $= \infty$,

\[ \nu = \lambda + P^\top \min\{\nu, \infty\} = \lambda + P^\top \nu \;\Leftrightarrow\; \nu = (\mathrm{id} - P^\top)^{-1} \lambda \]

and check if $\nu_i < \mu_i$ for all nodes $i$ of the network. If not, the network is not ergodic. This can be written as a (coordinate wise) "equilibrium inequality"

\[ (\mathrm{id} - P^\top)^{-1} \lambda < \mu \qquad (5.1) \]

If a network is not ergodic, solving the traffic equation is still important. For a nice way to solve the traffic equation see [11] by Jonathan Goodman and William Massey, where they give an algorithm to calculate $\nu$ with at most $d$ matrix vector multiplications in $\mathbb{R}^d$.
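The traffic equation can also be explored numerically. The following is a minimal sketch, not the algorithm of [11]: it iterates the map $\nu \mapsto \lambda + P^\top \min\{\nu, \mu\}$ starting from $\nu = 0$, which is monotone and therefore increases towards the (unique) solution. The function names and the two-node example rates are illustrative assumptions.

```python
def solve_traffic_equation(lam, mu, P, tol=1e-12, max_iter=100_000):
    """Iterate nu <- lam + P^T min(nu, mu), starting from nu = 0."""
    d = len(lam)
    nu = [0.0] * d
    for _ in range(max_iter):
        served = [min(nu[i], mu[i]) for i in range(d)]
        new = [lam[j] + sum(P[i][j] * served[i] for i in range(d))
               for j in range(d)]
        if max(abs(new[j] - nu[j]) for j in range(d)) < tol:
            return new
        nu = new
    raise RuntimeError("fixed-point iteration did not converge")


def is_ergodic(lam, mu, P):
    """Definition 5.2.2: nu_i < mu_i for every node i."""
    nu = solve_traffic_equation(lam, mu, P)
    return all(nu_i < mu_i for nu_i, mu_i in zip(nu, mu))


# Hypothetical example: node 0 feeds node 1, half of node 1's output returns.
lam = [1.0, 0.0]
P = [[0.0, 1.0], [0.5, 0.0]]
nu = solve_traffic_equation(lam, [3.0, 3.0], P)
print(nu)                                  # close to [2, 2]
print(is_ergodic(lam, [3.0, 3.0], P))      # enough service capacity
print(is_ergodic(lam, [3.0, 1.5], P))      # node 1 too slow
```

In the first case no node saturates, so the result agrees with $(\mathrm{id} - P^\top)^{-1}\lambda$ from (5.1).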


Definition 5.2.3 (Traffic intensity). In a network with flow $\nu$ and service rates $\mu$ the traffic intensity $\rho_i$ at the $i$-th node is defined as $\rho_i = \frac{\nu_i}{\mu_i}$.

Definition 5.2.4. In a network with traffic intensity $\rho$ we say node $i$ is

• a bottleneck if $\rho_i \ge 1$

• a strict bottleneck if $\rho_i > 1$

• ergodic if $\rho_i < 1$.

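Definition 5.2.5 (Loss rate). In a stochastic network with deterministic rates $(\lambda, \mu, P)$ and flow $\nu$ define the loss rate $y \in \mathbb{R}^d_{\ge 0}$

\[ y := \max\{0, \mu - \nu\}. \qquad (5.2) \]

For ergodic nodes we have $y_i = \mu_i - \nu_i$ and in an ergodic network

\[ y = \mu - (\mathrm{id} - P^\top)^{-1} \lambda \quad (\in \mathbb{R}^d_{>0}) \qquad (5.3) \]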

Definition 5.2.6 (Free drift, equilibrium network drift). In a stochastic network with deterministic rates $(\lambda, \mu, P)$ define the free drift of the network

\[ \lambda + P^\top \mu - \mu. \]

Additionally, for $\nu$ the flow of the network (cf 5.2.1) define the network drift as the coordinate wise maximum

\[ \max\{0, \lambda + P^\top \min\{\nu, \mu\} - \mu\} \]

Remark 5.2.7. • The equilibrium drift of an ergodic network is 0.

• Since $\nu = \lambda + P^\top \min\{\nu, \mu\}$, the network drift can equivalently be expressed as $\max\{0, \nu - \mu\}$.

• $\nu_i - \mu_i = \mu_i(\rho_i - 1)$ for each $i = 1, \dots, d$ from expressing $\nu_i$ in terms of $\rho_i$.
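The quantities of definitions 5.2.3 to 5.2.6 are simple coordinate wise expressions once the flow is known. The sketch below evaluates them for a hypothetical two-node tandem network; the rates and the flow `nu = [1, 1]` (which solves the traffic equation for these rates) are assumptions chosen for illustration.

```python
def free_drift(lam, mu, P):
    """lambda + P^T mu - mu, cf definition 5.2.6."""
    d = len(lam)
    return [lam[j] + sum(P[i][j] * mu[i] for i in range(d)) - mu[j]
            for j in range(d)]


def classify(nu, mu):
    """Definition 5.2.4 in terms of the traffic intensity rho_i = nu_i / mu_i."""
    labels = []
    for nu_i, mu_i in zip(nu, mu):
        rho = nu_i / mu_i
        if rho > 1:
            labels.append("strict bottleneck")
        elif rho == 1:
            labels.append("bottleneck")
        else:
            labels.append("ergodic")
    return labels


# Hypothetical tandem: arrivals at node 0, node 0 -> node 1 -> outside world.
lam, mu = [1.0, 0.0], [2.0, 0.5]
P = [[0.0, 1.0], [0.0, 0.0]]
nu = [1.0, 1.0]  # assumed flow; it satisfies nu = lam + P^T min(nu, mu)

network_drift = [max(0.0, n - m) for n, m in zip(nu, mu)]  # max{0, nu - mu}
loss_rate = [max(0.0, m - n) for n, m in zip(nu, mu)]      # (5.2)

print(free_drift(lam, mu, P), network_drift, loss_rate, classify(nu, mu))
```

Node 1 receives flow 1 but serves only at rate 0.5, so it is a strict bottleneck with network drift 0.5, while node 0 is ergodic with loss rate 1.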

The following is an example on rates and drifts in a network with fixed

routing probabilities and service resources at the nodes. The network drift

is qualitatively different wrt arrival processes with the different rates λ.

Example 5.2.8. Consider the network 5.1 with λ, µ, P where

µ=









, P =





0 0 1 0

0 0 0 1

40 0





.


Figure 5.2: Realisation of the queue sizes in the networks of example 5.2.8. Left panel: ergodic network. Right panel: non-ergodic network, bottleneck node 3 (yellow).

Then the free and network drift and the set of bottlenecks depend on the ar-

rival rates

arrival rates free drift network drift bottlenecks

λ=









1

4





−5

−3

−4





0since (id −P⊤)−1λ=1

3









< µ none

λ=









1

4





−1

−4





max{





2.5

3.75





−µ , 0}=









3 (strict)

Figure 5.2 gives two simulations of the network of example 5.2.8 and for

the different arrival rates. Note the different scales for the y-axis: In the

ergodic network all queues stay small, in the non-ergodic network the queue

size at the bottleneck grows.

In the following two subsections we interpret the solution of the traffic equation as actual flow and we will look at non-ergodic networks and investigate ergodic subnetworks.


5.2.1 Fluid network

We generally look at a network that is travelled by customers or packages,

things that can be counted. We will now look at an associated sewer like

network model.

Figure 5.3: The fluid network of definition 5.2.9 associated with the network of figure 5.1. Edge capacities: $\mu_1 p_{12}$, $\mu_1 p_{13}$, $\mu_2 p_{23}$, $\mu_3 p_{34}$, $\mu_4 p_{41}$, $\mu_4 p_{42}$, $\mu_3 p_{30}$, $\mu_4 p_{40}$. A dashed arrow represents an outlet.

Definition 5.2.9 (Fluid model associated with a stochastic network). Given a stochastic network with deterministic rates $(\lambda, \mu, P)$ the associated deterministic fluid model is based on the same graph where edges now represent pipes:

• At node $i$ with $\lambda_i > 0$ some kind of fluid flows into the network at rate $\lambda_i$.

• The edge / pipe $(i, j)$ has capacity $\mu_i p_{ij}$. The joint capacity of edges leaving from node $i$ is $\mu_i = \sum_{j=0}^d \mu_i p_{ij}$.

• At each node there is an outlet.

• Propagation: If at node $i$ the sum of incoming flow is strictly less than $\mu_i$ then the incoming flow is divided into outgoing flow to nodes $1, \dots, d$ and the outside world according to $p_{i1}, \dots, p_{id}$ and $p_{i0}$. Otherwise all outgoing pipes / edges will get flow equal to their capacity and the non-negative surplus leaves the network through the outlet.

In this setting the solution $\nu$ to the traffic equation is the actually observed equilibrium flow in the network: $\nu_i$ is the total flow into node $i$ and $\min\{\mu_i, \nu_i\}\, p_{ij}$ is the flow through the edge / pipe connecting nodes $i$ and $j$. In a non-ergodic network there is a node where at rate $\nu_i - \mu_i \ge 0$ fluid leaves the network through the outlet. All nodes that can route all incoming flow to nodes $j \in \mathrm{supp}(p^{(i)})$ and still have some spare capacity left are ergodic nodes.

5.2.2 Subnetworks

Distinguishing bottlenecks and ergodic nodes in a network allows us to find an ergodic subnetwork. We can move the bottleneck nodes to the outside world and calculate flow rates in the network of remaining nodes. In the flow network moving node $i$ "to the outside world" means removing node $i$ from the network and increasing, for all nodes $j$ in the support of $p^{(i)}$, the arrival rate from $\lambda_j$ to $\lambda_j + \mu_i p_{ij}$. From the intuition of the flow network it is clear that flow rates in the ergodic subnetwork remain the same. Nevertheless, we do the explicit calculations.

The network starting empty

Let $\nu$ be the flow of the network with rates $(\lambda, \mu, P)$ as defined in 5.2.1. Partition nodes into two sets: the ergodic nodes $E$ and the bottleneck nodes $B$ as defined in 5.2.4.

Claim 5.2.10. In a network with rates $(\lambda, \mu, P)$ and flow $\nu$ partition nodes into ergodic nodes $E$ and bottleneck nodes $B$ and consider the subnetwork of nodes $E$ with rates

\[ \lambda_E + (P^\top)_{EB}\, \mu_B, \quad \mu_E, \quad P_E \qquad (5.4) \]

Then $\nu_E = [\nu_i]_{i \in E}$ is the flow in this subnetwork.

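Proof of 5.2.10: For the network of all $d$ nodes the following initial statement is true.

\[ \nu = \lambda + P^\top \min\{\nu, \mu\} \]

For any partition this is equivalent to

\[ \nu_B = \lambda_B + P_B^\top \min\{\nu_B, \mu_B\} + (P^\top)_{BE} \min\{\nu_E, \mu_E\} \]
\[ \nu_E = \lambda_E + (P^\top)_{EB} \min\{\nu_B, \mu_B\} + P_E^\top \min\{\nu_E, \mu_E\} \]

and for our partition to

\[ \nu_B = \lambda_B + P_B^\top \mu_B + (P^\top)_{BE}\, \nu_E \]
\[ \nu_E = \lambda_E + (P^\top)_{EB}\, \mu_B + P_E^\top \nu_E \qquad (5.5) \]

The second equation of (5.5) can be rearranged:

\[ \nu_E = \lambda_E + (P^\top)_{EB}\, \mu_B + P_E^\top \nu_E = \underbrace{\lambda_E + (P^\top)_{EB}\, \mu_B}_{\text{arrivals to the ergodic subnetwork}} + P_E^\top \underbrace{\min\{\nu_E, \mu_E\}}_{= \nu_E} \]

So the $\nu_E$ we extracted coordinate wise from the flow of the network of all $d$, bottleneck and ergodic, nodes is the solution for the traffic equation in the remodelled smaller network of ergodic nodes with rates (5.4). □ 5.2.10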
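Claim 5.2.10 can be checked on a small instance. The sketch below uses a hypothetical three-node network whose flow `nu` is written down by hand (the first assertion confirms it solves the traffic equation), builds the subnetwork rates (5.4) and verifies that $\nu_E$ solves the traffic equation of the subnetwork. All names and numbers are illustrative assumptions.

```python
def traffic_residual(nu, lam, mu, P):
    """Largest violation of nu = lam + P^T min(nu, mu) over all nodes."""
    d = len(lam)
    return max(
        abs(nu[j] - lam[j] - sum(P[i][j] * min(nu[i], mu[i]) for i in range(d)))
        for j in range(d)
    )


# Hypothetical network: 0 -> 1 -> 2, a quarter of node 2's output returns to 0.
lam = [3.0, 0.0, 0.0]
mu = [4.0, 2.0, 5.0]
P = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.25, 0.0, 0.0]]
nu = [3.5, 3.5, 2.0]
assert traffic_residual(nu, lam, mu, P) < 1e-12

E = [i for i in range(3) if nu[i] < mu[i]]    # ergodic nodes
B = [i for i in range(3) if nu[i] >= mu[i]]   # bottleneck nodes

# Rates (5.4): bottlenecks are moved to the outside world and feed the
# ergodic nodes at their full service rate.
lam_sub = [lam[j] + sum(P[b][j] * mu[b] for b in B) for j in E]
mu_sub = [mu[j] for j in E]
P_sub = [[P[i][j] for j in E] for i in E]
nu_E = [nu[j] for j in E]

print(E, B, lam_sub, traffic_residual(nu_E, lam_sub, mu_sub, P_sub))
```

Here node 1 is the only bottleneck, and the residual of $\nu_E$ in the subnetwork is zero, as the claim states.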

Let us now turn again to the bottleneck nodes.

Claim 5.2.11. In a network with rates $(\lambda, \mu, P)$ and flow $\nu$ partition nodes into ergodic nodes $E$ and bottleneck nodes $B$. Then the bottleneck nodes $B$ have equilibrium network drift

\[ \lambda_B + (P^\top)_{BE}\, \nu_E + (P_B^\top - \mathrm{id})\, \mu_B \]

with $\nu_E = [\nu_i]_{i \in E}$ the flow in the ergodic subnetwork. The equilibrium network drift equals the free drift.

Proof of 5.2.11: We prove that

\[ \nu_B - \mu_B = \lambda_B + (P^\top)_{BE}\, \nu_E + (P_B^\top - \mathrm{id})\, \mu_B \]

where the lhs is the network equilibrium drift by definition 5.2.6. Applying the same partitioning $\{1, \dots, d\} = E \cup B$ we solve the second equation of (5.5) for $\nu_E$

\[ \nu_E = \lambda_E + (P^\top)_{EB}\, \mu_B + P_E^\top \nu_E \;\Leftrightarrow\; \nu_E = (\mathrm{id} - P_E^\top)^{-1} \big( \lambda_E + (P^\top)_{EB}\, \mu_B \big) \]

and plug it into the first equation of (5.5). Subtraction of $\mu_B$ gives the claimed equilibrium drift, which is a free drift as in definition 5.2.6. □ 5.2.11

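We can also get an expression for $\nu_E$, $\nu_B$ (and then for the drift of bottleneck nodes) with only network primitives on the rhs:

\[ \nu_E = (\mathrm{id} - P_E^\top)^{-1} \big( \lambda_E + (P^\top)_{EB}\, \mu_B \big) \qquad (5.6) \]
\[ \nu_B = \lambda_B + P_B^\top \mu_B + (P^\top)_{BE}\, (\mathrm{id} - P_E^\top)^{-1} \big( \lambda_E + (P^\top)_{EB}\, \mu_B \big) \qquad (5.7) \]

We remember that the partition of nodes into sets $E$ and $B$ was such that $\nu_E < \mu_E$ and $\nu_B \ge \mu_B$ coordinate wise.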

Another approach to finding the ergodic subnetwork is in Chen and Mandelbaum, [3] p. 411. LCP is short for "linear complementarity problem".

Definition 5.2.12 (LCP in $\mathbb{R}^d$). Let $x, P$ be given with $x \in \mathbb{R}^d$ and $P$ a substochastic matrix in $\mathbb{R}^{d \times d}$ and 1 not an eigenvalue of $P$. The linear complementarity problem is to find $(y, z)$ satisfying

\[ z = x + (\mathrm{id} - P^\top)\, y, \qquad y, z \ge 0, \qquad \langle y, z \rangle = 0 \qquad (5.8) \]


A solution to the LCP exists and is unique for any $x$ if all the principal minors of $\mathrm{id} - P$ are positive (cf [1] p. 271). This is what we get from 1 not being an eigenvalue of $P$. The following claim is from [3] p. 412.

Claim 5.2.13. In a stochastic network with deterministic rates $(\lambda, \mu, P)$ and free drift $x = \lambda + P^\top \mu - \mu$ let $(y, z)$ be the solution to the LCP in $\mathbb{R}^d$ for $(x, P)$. Then $z$ is the equilibrium network drift and $y$ is the loss rate.

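Proof of 5.2.13:

• $y = \max\{0, \mu - \nu\} \ge 0$, $z = \max\{0, \nu - \mu\} \ge 0$

• $\langle \max\{0, \mu - \nu\}, \max\{0, \nu - \mu\} \rangle = 0$

• To show $z = x + (\mathrm{id} - P^\top)\, y$ we start with partitioning nodes $\{1, \dots, d\}$ into ergodic nodes $E$ and bottleneck nodes $B$. The network drift is then $\binom{0}{\nu_B - \mu_B}$. We show that the drift equals $x + (\mathrm{id} - P^\top)\, y$. For the ergodic nodes

\[ \big[ x + (\mathrm{id} - P^\top)\, y \big]_E = \big[ \lambda + (P^\top - \mathrm{id})\, \mu \big]_E + \big[ (\mathrm{id} - P^\top)\, y \big]_E \]
\[ = \lambda_E + \big[\, P_E^\top - \mathrm{id} \quad (P^\top)_{EB} \,\big] (\mu - y) \]
\[ = \lambda_E + (P_E^\top - \mathrm{id}) \underbrace{(\mu_E - y_E)}_{= \nu_E} + (P^\top)_{EB} \underbrace{(\mu_B - y_B)}_{= \mu_B} \overset{(5.6)}{=} 0 \]

For the bottleneck nodes

\[ \big[ x + (\mathrm{id} - P^\top)\, y \big]_B = \lambda_B + (P_B^\top - \mathrm{id}) \underbrace{(\mu_B - y_B)}_{= \mu_B} + (P^\top)_{BE} \underbrace{(\mu_E - y_E)}_{= \nu_E} \overset{5.2.11}{=} \nu_B - \mu_B \]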

So we feed the LCP the free drift and get the loss rate and the equilibrium network drift. We can identify bottlenecks and ergodic nodes from the solution of the LCP. Node $i$ is

• a bottleneck if $y_i = 0$ (and then $z_i \ge 0$)

• a strict bottleneck if $z_i > 0$

• an ergodic node if $z_i = 0$ and $y_i > 0$.
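The statement of claim 5.2.13 can be illustrated by checking the three conditions of (5.8) directly. The sketch below takes a hypothetical three-node network together with its flow `nu` (assumed here to solve the traffic equation for these rates), forms $y$ and $z$ from the flow and verifies that they solve the LCP for the free drift. It is a verification of a given candidate, not an LCP solver.

```python
# Hypothetical network: 0 -> 1 -> 2, a quarter of node 2's output returns to 0.
lam = [3.0, 0.0, 0.0]
mu = [4.0, 2.0, 5.0]
P = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.25, 0.0, 0.0]]
nu = [3.5, 3.5, 2.0]  # assumed flow of this network
d = len(lam)


def transpose_times(P, v):
    """Return P^T v."""
    return [sum(P[i][j] * v[i] for i in range(len(v))) for j in range(len(v))]


# Free drift x = lambda + P^T mu - mu.
Pt_mu = transpose_times(P, mu)
x = [lam[j] + Pt_mu[j] - mu[j] for j in range(d)]

# Candidate solution built from the flow: loss rate and network drift.
y = [max(0.0, mu[i] - nu[i]) for i in range(d)]
z = [max(0.0, nu[i] - mu[i]) for i in range(d)]

# Right hand side of (5.8): x + (id - P^T) y.
Pt_y = transpose_times(P, y)
rhs = [x[j] + y[j] - Pt_y[j] for j in range(d)]

complementarity = sum(y[i] * z[i] for i in range(d))
print(x, y, z, rhs, complementarity)
```

For these rates the only node with $z_i > 0$ is node 1, which is the strict bottleneck, and $y_i > 0$ exactly at the two ergodic nodes.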

Figure 5.4: The ergodic network of 5.2.8 with a queue starting non-empty. This may or may not change ergodic nodes to become bottlenecks.

The network starting non-empty

Similar to observing the subnetwork of ergodic nodes we can investigate a

network where some nodes are initially non-empty. We would investigate

the subnetwork of initially empty nodes and the initially non-empty nodes

separately.

The reason why bottleneck nodes and full nodes are treated similarly is that bottlenecks tend to fill up if started empty (positive network drift), so the initial difference vanishes immediately. We see this in figure 5.5, which shows simulations of the non-ergodic network of example 5.2.8: qualitatively the evolution of queue sizes is the same whether or not the bottleneck node (green) starts empty.

The following is a generalisation of the LCP to function space. It will lead to a definition of a unique network drift and loss rate in a network where nodes may initially be non-empty. The Skorohod problem is generally investigated in the space $D([0, \infty), \mathbb{R}^d)$ of functions (cf [18], appendix D) but we only need it for (piecewise) linear input functions. In [3], section 5.2 p. 431 the Skorohod problem is stated under the name "oblique reflection mapping". We give a simplified version:

Theorem 5.2.14 (Skorohod problem for linear input functions). If

• $P \in \mathbb{R}^{d \times d}$ is substochastic with $\rho(P) < 1$

• $z_0 \in \mathbb{R}^d_{\ge 0}$, $\theta \in \mathbb{R}^d$

and $X : X(t) = z_0 + t\, \theta$ then there are unique functions $Y, Z$ continuously depending on $X$ such that

• $Z = X + (\mathrm{id} - P^\top)\, Y$

• $Y, Z \ge 0$

• $Y(0) = 0$; $Z(0) = z_0$

• $Y$ is coordinate wise increasing and for each $i \in \{1, \dots, d\}$: $Y_i$ increases at times $t \ge 0$ when $Z_i(t) = 0$.

Figure 5.5: The non-ergodic network of 5.2.8, the bottleneck node starting empty or not does not affect the qualitative behaviour of the remaining ergodic nodes.

We consider the Skorohod problem with the linear input function with the free network drift as slope.

Claim 5.2.15. Consider the Skorohod problem 5.2.14 for $P$, $X(t) = z_0 + t\,(\lambda + (P^\top - \mathrm{id})\mu)$. Then $Y, Z$ of the solution of the Skorohod problem are linear over $[0, T]$ for some $T > 0$.

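Proof of 5.2.15: Let $\Lambda = \{i \mid z_{0,i} > 0\}$ and $\nu^\Lambda$ the flow in the $\Lambda^c$-subnetwork with rates

\[ (\lambda^\Lambda, \mu_{\Lambda^c}, P_{\Lambda^c}), \qquad \lambda^\Lambda = \lambda_{\Lambda^c} + (P^\top)_{\Lambda^c \Lambda}\, \mu_\Lambda \]

Let $K = \{i \mid z_{0,i} > 0 \text{ or for } i \in \Lambda^c : \nu^\Lambda_i > \mu_i\}$ and consider the subnetwork of $K^c$-nodes with rates

\[ (\lambda^K, \mu_{K^c}, P_{K^c}), \qquad \lambda^K = \lambda_{K^c} + (P^\top)_{K^c K}\, \mu_K \]

Note that $\nu^\Lambda_i = \nu^K_i$ for $i \in K^c$ by 5.2.10 and that $K^c$ nodes are ergodic or non-strict bottlenecks: the flow $\nu^K$ satisfies $\nu^K_i \le \mu_i$ for all $i \in K^c$.

Rearrange nodes such that $\Lambda = \{1, \dots, |\Lambda|\}$ and $K = \{1, \dots, |K|\}$ (this works since $K \supseteq \Lambda$ in the above construction).

Now set

• $z_{K^c}$ to the network drift of the $K^c$ subnetwork and $y' \in \mathbb{R}^{|K^c|}$ to the loss rates:

\[ z_{K^c} = \lambda_{K^c} + (P^\top)_{K^c K}\, \mu_K + (P_{K^c}^\top - \mathrm{id})\, (\mu_{K^c} - y') \]

which is $= 0$ since $K^c$ nodes are no strict bottlenecks. Equivalently let $(z_{K^c}, y')$ be the solution of the LCP in $\mathbb{R}^{|K^c|}$ with $(x, P)$ of definition 5.2.12 as

\[ x = \lambda_{K^c} + (P^\top)_{K^c K}\, \mu_K + (P_{K^c}^\top - \mathrm{id})\, \mu_{K^c}, \qquad P = P_{K^c} \]

• $z_K$ to the drift of $K$-nodes: $z_K = \lambda_K + (P^\top)_{K K^c}\, \nu^K + (P_K^\top - \mathrm{id})\, \mu_K$. Note that $z_i$ may be negative for $i \in \Lambda \subseteq K$.

Then let $Z(t) = z_0 + t z$ and $Y(t) = t y$ with $y = \binom{0}{y'}$. We rearrange

\[ z = \binom{z_K}{z_{K^c}} = \lambda + \binom{(P_K^\top - \mathrm{id})\, \mu_K + (P^\top)_{K K^c}\, (\mu_{K^c} - y')}{(P^\top)_{K^c K}\, \mu_K + (P_{K^c}^\top - \mathrm{id})\, (\mu_{K^c} - y')} \]
\[ = \lambda + (P^\top - \mathrm{id})\, (\mu - y) \]
\[ = \lambda + (P^\top - \mathrm{id})\, \mu + (\mathrm{id} - P^\top)\, y \]

We have

• $i \in K^c \Rightarrow z_{0,i} = 0,\ z_i = 0$

• $i \in K \setminus \Lambda \Rightarrow z_{0,i} = 0,\ z_i > 0$

• $i \in \Lambda \Rightarrow z_{0,i} > 0,\ z_i \in \mathbb{R}$

which implies that there is $T > 0$ such that $z_0 + T z \ge 0$ coordinatewise. We now give the explicit solution of the Skorohod problem on $[0, T]$:

\[ Y(t) = t\, y \]
\[ Z(t) = z_0 + t z \]
\[ = z_0 + t \big( \lambda + (P^\top - \mathrm{id})\, \mu + (\mathrm{id} - P^\top)\, y \big) \]
\[ = z_0 + t \big( \lambda + (P^\top - \mathrm{id})\, \mu \big) + (\mathrm{id} - P^\top)\, t\, y \]
\[ = X(t) + (\mathrm{id} - P^\top)\, Y(t) \]

□ 5.2.15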

Claim 5.2.15 can be generalised to arbitrary $T$ and piecewise linear $Y, Z$.

Definition 5.2.16 (Network drift and loss rate, non-empty network). Consider a stochastic network with deterministic rates $(\lambda, \mu, P)$. Let $z_0 \in \mathbb{R}^d_{\ge 0}$ and consider the Skorohod problem with

• $P$,

• $z_0$,

• $X(t) = z_0 + t\,(\lambda + (P^\top - \mathrm{id})\mu)$.

If linear processes $Z(t) = z_0 + t z$, $Y(t) = t y$ are the solution of this Skorohod problem then define

• $z$ as the network drift

• $y$ as the loss rate.

For well-definedness note that from the Skorohod problem $z_{0,i} = 0 \Rightarrow y_i \ge 0$ and $z_{0,i} > 0 \Rightarrow y_i = 0$. The network drift and loss rate are unique from uniqueness of the solution to the Skorohod problem.

Remark 5.2.17. If a network with rates $(\lambda, \mu, P)$ starts in $z_0$ and $\Lambda = \{i \mid z_{0,i} > 0\}$ is the set of indices of queues starting nonempty and $z$ is the network drift then

• $z_\Lambda = [z_i]_{i \in \Lambda}$ is the free drift of the initially non-empty nodes $i \in \Lambda$;

• $z_{\Lambda^c} = [z_i]_{i \in \Lambda^c}$ is the equilibrium drift of the subnetwork of $\Lambda^c$-nodes, the initially empty nodes.


5.3 Processes

We now move from vectors of real numbers to sample paths. This section is about the different kinds of processes we need to describe the behaviour of the network, especially with reference to it starting non-empty and being non-ergodic.

We define stochastic processes to analyse the generalised Jackson network

in terms of large deviations. We will emphasise the changes of measure that

maintain desirable properties of the processes.

5.3.1 The free process

We describe the free process associated with a generalised Jackson network

and investigate its large deviation behaviour. The free process is interesting

since it behaves like a network in a certain way and will be easy to analyse in

terms of large deviations. We gain valuable insights we will apply to obtain

the local large deviations of the generalised Jackson network.

We apply results from chapter 4 on the sample path large deviations of the network primitives to obtain relatively easily a sample path large deviation principle for the free process. We present it in 5.3.13. Similarly the change of measure for the free process can be given easily relying on chapter 4 and independence of the network primitives. We define and interpret a change of measure in 5.3.14.

Given a generalised Jackson network of $d$ nodes with primitive processes of 5.1.5 we make some definitions:

Definition 5.3.1 (Vectorial arrival process). In a stochastic network with $d$ nodes, network primitives 5.1.5 where we denote the arrival processes $A^{(1)}, \dots, A^{(d)}$ set

\[ A = \big( A^{(1)}, \dots, A^{(d)} \big)^\top. \]

Definition 5.3.2. In a stochastic network with $d$ nodes, network primitives 5.1.5 we define for the $i$-th service process $S^{(i)}$ and the $i$-th process of routing decisions $n \mapsto \sum_{k=1}^n r_k^{(i)}$

\[ \mathbf{S}^{(i)} : t \mapsto \sum_{k=1}^{S^{(i)}(t)} r_k^{(i)} - e_i\, S^{(i)}(t). \]

Remark 5.3.3. Let $S^{(i)sp}$ be the process split wrt the primitive service process $S^{(i)}$ and $(p_{i1}, \dots, p_{id}, 1 - \sum_{j=1}^d p_{ij})$ as in 4.4.1. For $T^{(i)}$ of definition 4.4.7 we have

\[ T^{(i)} S^{(i)sp} = T^{(i)} \big( S^{(i1)}, \dots, S^{(id)}, S^{(i,d+1)} \big)^\top = -e_i\, S^{(i)} + \sum_{k=1}^d e_k\, S^{(ik)} = \mathbf{S}^{(i)} \]

and for this process we have developed the sample path large deviations in section 4.4.3.

We will refer to S(i)as a renewal counting process, too. Its coordinates

count customers leaving node iand customers moving from node ito the

other network nodes and times between changes of state are independent.

We now give a definition for the network primitives that is equivalent to that

in 5.1.5

Definition 5.3.4 (Network primitives II). If for $i = 1, \dots, d$

\[ A^{(i)}, \quad S^{(i)}, \quad n \mapsto \sum_{k=1}^n r_k^{(i)} \]

are the network primitives of a generalised Jackson network as defined in 5.1.5 and $A$ is defined as in 5.3.1 and $\mathbf{S}^{(i)}$ for $i = 1, \dots, d$ as in 5.3.2 then

\[ A, \mathbf{S}^{(1)}, \dots, \mathbf{S}^{(d)} \]

represent the same generalised Jackson network. These processes will be called the network primitives.

In [3], section 2.7 these processes together with the initial starting point $Z(0)$ of the network are called the network primitives. In a generalised Jackson network the primitives are independent. With these we define the free process.

Definition 5.3.5 (Free process). $X = A + \sum_{i=1}^d \mathbf{S}^{(i)}$.

The process $X$ increases in its $i$-th coordinate at the occurrence of an event of the arrival process $A^{(i)}$. When $\mathbf{S}^{(i)}$ changes state we can interpret this as a customer leaving node $i$ and being routed to some other network node or leaving the network. This change of state also happens in the free process $X$. In these aspects $X$ describes the stochastic network well. However, the free process does not suitably describe a stochastic network because coordinates of $X$ may take negative values.

The free process is nice to work with: we will develop its large deviations in the following sections. Before that we want to connect the free process with the free drift defined in section 5.2.

Claim 5.3.6. Network primitives in a generalised Jackson network converge a.s. uniformly on compacts with a linear limit.

Proof of 5.3.6: This follows as a functional strong law of large numbers for the renewal counting processes, cf 7.1.3 of the appendix, or from the sample path large deviations proved in chapter 4. □ 5.3.6

Definition 5.3.7 (Drift of $\mathbf{S}^{(i)}$). $T^{(i)} \mu_i\, p^{(i)} = \mu_i\, (-e_i + p^{(i)})$

Claim 5.3.8. The free drift $\lambda + (P^\top - \mathrm{id})\mu$ defined in 5.2.6 is the drift of the free process.

Proof of 5.3.8: The drift of the process is its a.s. limit under the scaling, wrt the sup-norm over some compact interval.

\[ \lim_{t \to \infty} \frac{1}{t} X_t = \lambda + \sum_{i=1}^d \lim_{t \to \infty} \frac{1}{t} \mathbf{S}^{(i)}_t = \lambda + \sum_{i=1}^d \mu_i\, (-e_i + p^{(i)}) = \lambda + (P^\top - \mathrm{id})\, \mu \]

□ 5.3.8

For the network primitives and the free process the rates of section 5.2 de-

scribe the expected behaviour of the process in the limit of the scaling 3.4.1.
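Claim 5.3.8 can be illustrated by simulation in the simplest special case. The sketch below assumes exponential inter event times for all primitives (the classical Jackson case, chosen only because it is easy to sample) and a hypothetical two-node tandem network. It evaluates the free process at a large time $T$ and compares $X_T / T$ with the free drift; it also shows that coordinates of $X$ become negative.

```python
import random


def poisson_count(rate, horizon, rng):
    """Number of events in [0, horizon] of a renewal counting process
    with exponential(rate) inter event times."""
    n, t = 0, rng.expovariate(rate)
    while t <= horizon:
        n += 1
        t += rng.expovariate(rate)
    return n


def free_process_at(horizon, lam, mu, P, rng):
    """Sample X_T = A_T + sum_i S^(i)_T for independent primitives."""
    d = len(lam)
    x = [0] * d
    for i in range(d):
        if lam[i] > 0:
            x[i] += poisson_count(lam[i], horizon, rng)   # arrivals at node i
        for _ in range(poisson_count(mu[i], horizon, rng)):
            x[i] -= 1                                      # service event at i
            u, acc = rng.random(), 0.0
            for j in range(d):                             # iid routing decision
                acc += P[i][j]
                if u < acc:
                    x[j] += 1
                    break
    return x


# Hypothetical rates: arrivals at node 0, node 0 -> node 1 -> outside world.
lam, mu = [1.0, 0.0], [2.0, 3.0]
P = [[0.0, 1.0], [0.0, 0.0]]
T = 5000.0
rng = random.Random(0)

x_T = free_process_at(T, lam, mu, P, rng)
estimate = [x / T for x in x_T]
free_drift = [lam[j] + sum(P[i][j] * mu[i] for i in range(2)) - mu[j]
              for j in range(2)]
print(estimate, free_drift)
```

The free drift of this network is $(-1, -1)$, so both coordinates of the free process drift to $-\infty$, which a queue length cannot do.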

Lmgf for free process

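We calculate the lmgf $\Psi$ of the free process $X$ and prove its strict convexity.

Definition 5.3.9 (Lmgf $\Psi$ of the free process). If $X$ is the free process built from the network primitives of a generalised Jackson network as defined in 5.1.5, 5.3.4 then the lmgf of $X$ is

\[ \Psi(\theta) = \lim_{t \to \infty} \frac{1}{t} \log E\big[ e^{\langle \theta, X_t \rangle} \big] = \sum_{i=1}^d \Gamma_A^{(i)}(\theta_i) + \Gamma_S^{(i)}\big( -\theta_i + \Xi^{(i)}(\theta) \big). \]

For the form of the lmgf we have applied independence of network primitives and the form of the lmgf of $\mathbf{S}^{(i)}$, in the representation of 5.3.3, of claim 4.4.9:

\[ E\big[ e^{\langle \theta, X_t \rangle} \big] = \prod_{i=1}^d E\big[ e^{\theta_i A^{(i)}_t} \big]\, E\big[ e^{\langle \theta, \mathbf{S}^{(i)}_t \rangle} \big]. \]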


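Claim 5.3.10. $D(\Psi) = \mathbb{R}^d$ and $\Psi$ is strictly convex.

Proof of 5.3.10: Finiteness of $\Psi$ is a direct consequence of finiteness of all involved $\Gamma$'s and $\Xi$'s. Convexity of $\Psi$ is immediate since it is the limit (in $t$) of strictly convex (in $\theta$) functions (cf 2.2.8). In this approach strictness is lost in the limit, so we choose another: Assume all network primitives are non-deterministic. Let $\alpha, \beta \in \mathbb{R}^d$, $\alpha \neq 0$. We show that along the half line $\{\beta + c\alpha \mid c \ge 0\}$ the function $c \mapsto \Psi(\beta + c\alpha)$ is strictly convex.

We calculate derivatives in direction $\alpha$.

\[ \frac{d}{dc} \Psi(\beta + c\alpha) = \sum_{i=1}^d \alpha_i\, \Gamma_A^{(i)\prime}(\beta_i + c\alpha_i) + \Gamma_S^{(i)\prime}\big( -(\beta + c\alpha)_i + \Xi^{(i)}(\beta + c\alpha) \big) \Big( -\alpha_i + \frac{d}{dc} \Xi^{(i)}(\beta + c\alpha) \Big) \]

\[ \frac{d^2}{dc^2} \Psi(\beta + c\alpha) = \sum_{i=1}^d \alpha_i^2\, \Gamma_A^{(i)\prime\prime}(\beta_i + c\alpha_i) + \Gamma_S^{(i)\prime\prime}\big( -(\beta + c\alpha)_i + \Xi^{(i)}(\beta + c\alpha) \big) \Big( -\alpha_i + \frac{d}{dc} \Xi^{(i)}(\beta + c\alpha) \Big)^2 + \Gamma_S^{(i)\prime}\big( -(\beta + c\alpha)_i + \Xi^{(i)}(\beta + c\alpha) \big)\, \frac{d^2}{dc^2} \Xi^{(i)}(\beta + c\alpha) \qquad (5.9) \]

where the $\Gamma''$ are non-negative (cf 2.2.5). Convexity of the $\Xi^{(i)}$ follows from their definition 4.4.8 (and convexity of $K^{(i)}$ and linearity of $T$) or alternatively from the following application of Jensen's inequality.

\[ \frac{d^2}{dc^2} \Xi^{(i)}(\beta + c\alpha) = \frac{d}{dc}\, \frac{\sum_{j=1}^d \alpha_j\, p_{ij} \exp\{(\beta + c\alpha)_j\}}{\exp\{\Xi^{(i)}(\beta + c\alpha)\}} = \sum_{j=1}^d \alpha_j^2\, \frac{p_{ij} \exp\{(\beta + c\alpha)_j\}}{\exp\{\Xi^{(i)}(\beta + c\alpha)\}} - \Big( \sum_{j=1}^d \alpha_j\, \frac{p_{ij} \exp\{(\beta + c\alpha)_j\}}{\exp\{\Xi^{(i)}(\beta + c\alpha)\}} \Big)^2 \ge 0 \qquad (5.10) \]

We have (5.10) $= 0$ for $p^{(i)}$ a point measure (when routing from node $i$ is deterministic) or when the $\alpha_1, \dots, \alpha_d$ are constant on the support of $p^{(i)}$.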


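We now argue for strict convexity. We show that for $\alpha, \beta \in \mathbb{R}^d$, $\alpha \neq 0$ the second derivative (5.9) is strictly positive.

Strict positivity of (5.9) follows if there is $i$ such that $\lambda_i \alpha_i \neq 0$ since then $\sum_{i=1}^d \alpha_i^2\, \Gamma_A^{(i)\prime\prime}(\beta_i + c\alpha_i) > 0$ from strict convexity of the $\Gamma_A(\cdot)$'s. Otherwise $\sum_{i=1}^d \alpha_i^2\, \Gamma_A^{(i)\prime\prime}(\beta_i + c\alpha_i) = 0$ and we need the $\Gamma_S$'s to argue for strict positivity.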

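Let us assume that

• $\alpha, \beta \in \mathbb{R}^d$, $\alpha \not\equiv 0$

• $\forall i : \lambda_i > 0 \Rightarrow \alpha_i = 0$

• $\frac{d^2}{dc^2} \Psi(\beta + c\alpha) = 0$

and produce a contradiction. The derivatives $\Gamma_S^{(i)\prime}$ and $\Gamma_S^{(i)\prime\prime}$ are strictly positive by 2.2.8. So the other sums in (5.9) are $= 0$ iff

\[ -\alpha_i + \frac{d}{dc} \Xi^{(i)}(\beta + c\alpha) = 0, \qquad \frac{d^2}{dc^2} \Xi^{(i)}(\beta + c\alpha) = 0 \qquad \forall i \qquad (5.11) \]

First and second derivatives of $\Xi^{(i)}$ on the half line can be written as expectation and variance under exponentially twisted $p^{(i)}$:

\[ \frac{d}{dc} \Xi^{(i)}(\beta + c\alpha) = \sum_{j=1}^d \alpha_j\, \frac{p_{ij}\, e^{(\beta + c\alpha)_j}}{e^{\Xi^{(i)}(\beta + c\alpha)}} = E_{p^{(i)}(\beta + c\alpha)}[\alpha] \]

\[ \frac{d^2}{dc^2} \Xi^{(i)}(\beta + c\alpha) \overset{(5.10)}{=} V_{p^{(i)}(\beta + c\alpha)}[\alpha] \]

and (5.11) becomes

\[ \alpha_i = E_{p^{(i)}(\beta + c\alpha)}[\alpha], \qquad V_{p^{(i)}(\beta + c\alpha)}[\alpha] = 0 \qquad \forall i \qquad (5.12) \]

From openness of our network we know there is at least one $\lambda_i > 0$ and by our assumption (2nd bullet) for this $i$ we have $\alpha_i = 0$. From (5.11) we know that all $\alpha_j$ with $j \in \mathrm{supp}(p^{(i)})$ take the same value (from $V = 0$) and that this value is again $= 0$ (from $0 = \alpha_i = E$). So moving along a directed spanning forest with each tree of the spanning forest rooted at an entry node, we find that $\alpha_i = 0$ at each node we pass. From the spanning property we get $\alpha \equiv 0$, the contradiction.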

If in the network there are deterministic primitives and some $\Gamma_A'' \equiv 0$ the proof works the same way if there is non-deterministic flow at each node, that is, if there is a spanning forest of the network with only those entry nodes as roots that have non-deterministic arrival processes; all service processes have to be non-deterministic. This condition (non-deterministic flow at each node) compares to condition (P, S) applied in theorem 2.2 in Puhalskii's [16]. □ 5.3.10
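The representation of the derivatives of $\Xi^{(i)}$ as mean and variance under the exponentially twisted routing measure can be checked numerically. The sketch below assumes the form $\Xi^{(i)}(\theta) = \log\big( \sum_j p_{ij} e^{\theta_j} + 1 - \sum_j p_{ij} \big)$, which is consistent with the first derivative displayed in (5.10) but is not restated in this section; the routing row and the vectors $\alpha$, $\beta$ are hypothetical.

```python
import math


def xi(theta, p_row):
    """Assumed form of Xi: log mgf of one routing decision on {e_1..e_d, 0}."""
    exit_prob = 1.0 - sum(p_row)
    return math.log(sum(p * math.exp(t) for p, t in zip(p_row, theta)) + exit_prob)


def twisted_row(theta, p_row):
    """Exponentially twisted sub-probability row p(theta)."""
    norm = math.exp(xi(theta, p_row))
    return [p * math.exp(t) / norm for p, t in zip(p_row, theta)]


def mean_and_variance(alpha, q_row):
    """Mean and variance of alpha under q_row (exit mass carries value 0)."""
    mean = sum(a * q for a, q in zip(alpha, q_row))
    second = sum(a * a * q for a, q in zip(alpha, q_row))
    return mean, second - mean * mean


p_row = [0.3, 0.5]                 # leaves the network with probability 0.2
beta, alpha, c = [0.1, -0.2], [1.0, -2.0], 0.7


def g(s):
    """c -> Xi(beta + c alpha) along the half line."""
    return xi([b + s * a for a, b in zip(alpha, beta)], p_row)


point = [b + c * a for a, b in zip(alpha, beta)]
mean, var = mean_and_variance(alpha, twisted_row(point, p_row))

h1, h2 = 1e-6, 1e-4
d1 = (g(c + h1) - g(c - h1)) / (2 * h1)            # central difference
d2 = (g(c + h2) - 2 * g(c) + g(c - h2)) / h2 ** 2  # second difference
print(mean, d1, var, d2)

# Degenerate case: alpha constant on the support of a full probability row.
_, var0 = mean_and_variance([2.0, 2.0], twisted_row([0.4, -0.1], [0.5, 0.5]))
print(var0)
```

The last value is zero up to rounding, which is the case where (5.10) vanishes.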

Since we did not properly define a spanning forest we at least give examples.

Figure 5.6: Three spanning forests for the network in figure 5.1

The first spanning forest in figure 5.6 is a spanning tree. If the network is not strongly connected a spanning tree does not necessarily exist.

Large deviations of the free process

We start with a one dimensional large deviation principle and proceed with

the sample path large deviation principle.

Claim 5.3.11. For the mean of the free process a large deviation principle holds: For open $G \subseteq \mathbb{R}$ and closed $F \subseteq \mathbb{R}$

\[ -\inf_{x \in G} \Psi^*(x) \le \liminf_{n \to \infty} \frac{1}{n} \log P\Big( \frac{1}{n} X_n \in G \Big) \]
\[ \limsup_{n \to \infty} \frac{1}{n} \log P\Big( \frac{1}{n} X_n \in F \Big) \le -\inf_{x \in F} \Psi^*(x) \]

with $\Psi$ the lmgf of the free process of definition 5.3.9 and $\Psi^*$ its Fenchel-Legendre transform.

Proof of 5.3.11: By application of the Gärtner-Ellis theorem. Strict convexity of $\Psi$ and $D(\Psi) = \mathbb{R}^d$ are important here (no regularisation in the Gärtner-Ellis theorem needed). The rate function is the convex $\Psi^*$, the Fenchel-Legendre transform of the lmgf. □ 5.3.11

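Claim 5.3.12. $\Psi^*$ is a good rate function and

\[ \Psi^*(v) = \inf_{\substack{a, r, \gamma \\ a + (r^\top - \mathrm{id})\gamma = v}} \sum_{i=1}^d \Gamma_A^{(i)*}(a_i) + \Gamma_S^{(i)*}(\gamma_i) + \gamma_i\, \Xi^{(i)*}(r^{(i)}). \]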

Proof of 5.3.12: We have one dimensional large deviations with good rate functions for the arrival and split service processes that form the free process, cf 3.5.1. As summing is continuous in $\mathbb{R}$ we can apply the contraction principle (cf 7.5.2 in the appendix) to get a large deviation principle for the mean of the free process. As rate function we get

\[ v \mapsto \inf_{\substack{a, s_1, \dots, s_d \in \mathbb{R}^d \\ a + \sum_{i=1}^d s_i = v}} \sum_{i=1}^d \Gamma_A^{(i)*}(a_i) + \big( \Gamma_S^{(i)} \circ (\Xi^{(i)} - \pi_i) \big)^*(s_i) \]

Since the rate function of a large deviation object is unique this rate function has to equal $\Psi^*$.

\[ \Psi^*(v) = \inf_{\substack{a, s_1, \dots, s_d \in \mathbb{R}^d \\ a + \sum_{i=1}^d s_i = v}} \sum_{i=1}^d \Gamma_A^{(i)*}(a_i) + \big( \Gamma_S^{(i)} \circ (\Xi^{(i)} - \pi_i) \big)^*(s_i) \]
\[ \overset{4.4.21}{=} \inf_{\substack{a, s_1, \dots, s_d \in \mathbb{R}^d \\ a + \sum_{i=1}^d s_i = v}} \sum_{i=1}^d \Gamma_A^{(i)*}(a_i) + \inf_{\substack{r^{(i)}, \gamma_i \\ \gamma_i (r^{(i)} - e_i) = s_i}} \Gamma_S^{(i)*}(\gamma_i) + \gamma_i\, \Xi^{(i)*}(r^{(i)}) \]

Now in the set of restrictions we write $r$ as the matrix with rows $r^{(i)}$. Note that for $\Psi^*(v)$ to be finite rows have to be subprobabilities concentrated on the support of $p^{(i)}$, cf 4.4.22. We rewrite the restriction as

\[ v = a + \sum_{i=1}^d s_i = a + \sum_{i=1}^d \gamma_i (r^{(i)} - e_i) = a + (r^\top - \mathrm{id})\gamma \]

and we have obtained the desired representation. From goodness of the primitives' rate functions (cf 2.6.2) all of the above representations of the rate function are good, too. □ 5.3.12

Since we do have sample path large deviation principles for the arrival and

split service processes we can apply the contraction principle to obtain sample

path large deviations for the free process.

Claim 5.3.13. The free process $X$ obeys a sample path large deviation principle in $D([0, T], \mathbb{R}^d)$ equipped with the sup-norm induced topology under the scaling 3.4.1. The good rate function is

\[ \psi \mapsto \int_{t=0}^T \Psi^*(\psi'(t))\, dt \]

for $\psi \in AC[0, T]$, $\psi(0) = 0$. The rate function equals $\infty$ for all other $\psi$.

Proof of 5.3.13: We have sample path large deviation principles for each primitive $A$ and $\mathbf{S}^{(i)}$ (for $i = 1, \dots, d$) in $D([0, T], \mathbb{R}^d)$ equipped with the supremum norm induced topology. For the definition of $A$, $\mathbf{S}^{(i)}$ cf 5.3.1, 5.3.2 and for the sample path large deviation principles 4.3.10, 4.4.20. As primitives are independent we have the joint large deviation principle of

\[ (A, \mathbf{S}^{(1)}, \dots, \mathbf{S}^{(d)}) \in D([0, T], \mathbb{R}^d)^{\times (d+1)} \]

with the supremum norm

\[ f \mapsto \max_{\substack{i = 1, \dots, d \\ j = 1, \dots, d+1}} \|f_{i,j}\| \quad \text{for } f \in D([0, T], \mathbb{R}^d)^{\times (d+1)}, \qquad \|f_{ij}\| = \sup_{t \in [0, T]} |f_{ij}(t)| \]

The rate function of the joint large deviation principle is the sum of individual rate functions by independence of the primitives. It is infinite for any $f$ with an $f_{ij} \notin AC([0, T], \mathbb{R})$ as each rate function is concentrated on absolutely continuous functions. If all $f_{ij}$ are absolutely continuous then for this element $f \in D([0, T], \mathbb{R}^d)^{\times (d+1)}$ the rate function has the representation

\[ f \mapsto \sum_{i=1}^d \int_{t=0}^T \Gamma_A^{(i)*}(f'_{i1}(t))\, dt + \sum_{j=2}^{d+1} \int_{t=0}^T \big( \Gamma_S^{(j)} \circ (-\pi_j + \Xi^{(j)}) \big)^*(f'_{\cdot j}(t))\, dt \]

Now addition on $D([0, T], \mathbb{R}^d) \times \cdots \times D([0, T], \mathbb{R}^d)$ is a continuous map wrt the sup-norm induced topology. Applying the contraction principle we get a large deviation principle for the free process with rate function

\[ \psi \mapsto \inf_{\substack{a, s_1, \dots, s_d \\ a + s_1 + \cdots + s_d = \psi}} \sum_{i=1}^d \int_{t=0}^T \Gamma_A^{(i)*}(a'_i(t)) + \big( \Gamma_S^{(i)} \circ (-\pi_i + \Xi^{(i)}) \big)^*(s'_i(t))\, dt \]

where the infimum can be taken over $a, s_1, \dots, s_d \in AC([0, T], \mathbb{R}^d)$. Note that this makes the rate function infinite as soon as $\psi \notin AC([0, T], \mathbb{R}^d)$.

We get a lower bound for the rate function by interchanging summation and integration and then optimisation and integration. With the infimum inside the integral it changes from an optimisation in function space to an optimisation in $\mathbb{R}^d$.

\[ \inf_{\substack{a, s_1, \dots, s_d \in AC \\ a + s_1 + \cdots + s_d = \psi}} \sum_{i=1}^d \int_{t=0}^T \Gamma_A^{(i)*}(a'_i(t)) + \big( \Gamma_S^{(i)} \circ (-\pi_i + \Xi^{(i)}) \big)^*(s'_i(t))\, dt \]
\[ \ge \int_{t=0}^T \inf_{\substack{a, s_1, \dots, s_d \in \mathbb{R}^d \\ a + s_1 + \cdots + s_d = \psi'(t)}} \sum_{i=1}^d \Gamma_A^{(i)*}(a_i) + \big( \Gamma_S^{(i)} \circ (-\pi_i + \Xi^{(i)}) \big)^*(s_i)\, dt \]
\[ = \int_{t=0}^T \Psi^*(\psi'(t))\, dt \]

where the last equality is due to the different representations of $\Psi^*$ we have already seen in the proof of 5.3.12.

Since there are no continuity restrictions for the derivatives and $\psi$ is absolutely continuous, we get absolutely continuous functions from integrating infimisers found inside the integral. The $\ge$ is in fact an equality. □ 5.3.13

Change of measure

We define a change of measure process for the free process associated with a

generalised Jackson network.

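Definition 5.3.14. Consider a generalised Jackson network with

• primitives defined in 5.1.5, 5.3.4.

• Let $X$ be the free process defined in 5.3.5

• with lmgf $\Psi$ defined in 5.3.9.

Define for $t \ge 0$ and $\alpha \in \mathbb{R}^d$

\[ M(\alpha, t) = \exp\{ \langle \alpha, X_t \rangle - t\, \Psi(\alpha) \}\, r(\alpha, t) \]

where

\[ r(\alpha, t) = \prod_{i=1}^d r^{(i)S}\big( \Gamma_S^{(i)}(-\alpha_i + \Xi^{(i)}(\alpha)), t \big)\, r^{(i)A}\big( -\Gamma_A^{(i)}(\alpha_i), t \big) \]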

r(i)D(β, t) = Fc

Fc(B(t)) e−βB(t)


for $D \in \{A, S\}$ and $F$ the distribution function of inter event times of the renewal counting process $D^{(i)}$ and $B(t)$ the age of $D^{(i)}$ at time $t$ (for the definition of $r^{(i)D}$ cf 3.6.5).

Claim 5.3.15. • $t \mapsto M(\alpha, t)$ is a change of measure process for the free process.

• Under the change of measure $M(\alpha, \cdot)$ the process $X = A + \sum_{i=1}^d \mathbf{S}^{(i)}$ is again a free process associated with the primitives of a generalised Jackson network. Each primitive changes its distribution in the following way:

– if $A^{(i)}$ had inter event densities $f$ then under the changed measure $A^{(i)}$ remains a renewal counting process and now has inter event times density $f_{-\Gamma_A^{(i)}(\alpha_i)}$, cf 2.3.1, (2.4).

– if the routing decision $r^{(i)}$ was distributed on $\{e_1, \dots, e_d, 0\}$ with probability measure $(p_{i1}, \dots, p_{id}, 1 - \sum_{j=1}^d p_{ij})$ associated with the sub-probability measure $p^{(i)}$ then under the change of measure routing decisions remain iid. $r^{(i)}$ now has the distribution associated with the sub-probability measure $\nabla \Xi^{(i)}(\alpha)$, cf 4.4.23.

– if $S^{(i)}$ had inter event densities $f$ then under the changed measure $S^{(i)}$ remains a renewal counting process and now has inter event times density $f_{-\Gamma_S^{(i)} \circ (-\pi_i + \Xi^{(i)})(\alpha)}$, cf 2.3.1, (2.4).

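Corollary 5.3.16. Under the change of measure $M(\alpha, \cdot)$ the rates of the primitives change from $\lambda, \mu, P$ to

• $\lambda_i(\alpha_i) = \Gamma_A^{(i)\prime}(\alpha_i)$ for $i = 1, \dots, d$

• $\mu_i(\alpha) = \Gamma_S^{(i)\prime}(-\alpha_i + \Xi^{(i)}(\alpha))$ for $i = 1, \dots, d$

• $P(\alpha)^\top = [\, \nabla \Xi^{(1)}(\alpha) \; \cdots \; \nabla \Xi^{(d)}(\alpha) \,]$

The Corollary fits 5.3.15 and is a summary of definitions 3.6.12, 4.4.17, (4.4.23).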

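Proof of 5.3.15: For the primitives $A^{(i)}$, $\mathbf{S}^{(i)}$ for $i = 1, \dots, d$ we have in previous sections developed individual change of measure processes. For the arrival processes (definition 3.6.4 and representation (3.15)) with $\gamma \in \mathbb{R}$

\[ (\gamma, t) \mapsto \exp\{ \gamma A^{(i)}_t - t\, \Gamma_A^{(i)}(\gamma) \}\, r^{(i)A}\big( -\Gamma_A^{(i)}(\gamma), t \big) \]

and in vectorial notation with $\alpha \in \mathbb{R}^d$

\[ (\alpha, t) \mapsto \exp\Big\{ \langle \alpha, A_t \rangle - t \sum_{i=1}^d \Gamma_A^{(i)}(\alpha_i) \Big\} \prod_{i=1}^d r^{(i)A}\big( -\Gamma_A^{(i)}(\alpha_i), t \big) \]

and for $\mathbf{S}^{(i)}$ (cf definition 4.4.15)

\[ (\alpha, t) \mapsto \exp\{ \langle \mathbf{S}^{(i)}_t, \alpha \rangle - t\, \Gamma_S^{(i)}(-\alpha_i + \Xi^{(i)}(\alpha)) \}\, r^{(i)S}\big( t, -\Gamma_S^{(i)}(-\alpha_i + \Xi^{(i)}(\alpha)) \big) \]

Since primitives are independent we can multiply change of measure processes.

\[ \prod_{i=1}^d \exp\{ \langle \alpha, \mathbf{S}^{(i)}_t \rangle - t\, \Gamma_S^{(i)}(-\alpha_i + \Xi^{(i)}(\alpha)) \} \times r^{(i)S}\big( \Gamma_S^{(i)}(-\alpha_i + \Xi^{(i)}(\alpha)), t \big) \times \exp\Big\{ \langle \alpha, A_t \rangle - t \sum_{i=1}^d \Gamma_A^{(i)}(\alpha_i) \Big\} \prod_{i=1}^d r^{(i)A}\big( -\Gamma_A^{(i)}(\alpha_i), t \big) \]
\[ = \exp\Big\{ \Big\langle \alpha, \underbrace{A_t + \sum_{i=1}^d \mathbf{S}^{(i)}_t}_{= X_t} \Big\rangle - t \underbrace{\sum_{i=1}^d \Gamma_A^{(i)}(\alpha_i) + \Gamma_S^{(i)}(-\alpha_i + \Xi^{(i)}(\alpha))}_{= \Psi(\alpha)} \Big\} \underbrace{\prod_{i=1}^d r^{(i)S}\big( \Gamma_S^{(i)}(-\alpha_i + \Xi^{(i)}(\alpha)), t \big)\, r^{(i)A}\big( -\Gamma_A^{(i)}(\alpha_i), t \big)}_{= r(\alpha, t)} \]
\[ = \exp\{ \langle \alpha, X_t \rangle - t\, \Psi(\alpha) \}\, r(\alpha, t). \]

So $M(\alpha, \cdot)$ is the compound change of measure we get when changing each primitive process separately. For the translation of the change of measure for $\mathbf{S}^{(i)}$ into that of $S^{(i)}$ and $p^{(i)}$ cf 4.4.15 and 4.4.16. □ 5.3.15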

Application of the change of measure in the large deviation

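The rate function for the large deviation for the mean of the free process is the Fenchel-Legendre transform of the free process' lmgf.

\[ \Psi^*(v) = \sup_{\alpha \in \mathbb{R}^d} \langle \alpha, v \rangle - \Psi(\alpha) \]

We want to point out here that the drift of the free process under the change of measure $M(\alpha, \cdot)$ with some $\alpha \in \mathbb{R}^d$ coincides with $\nabla \Psi(\alpha)$:

\[ \frac{d}{d\theta_i} \Psi(\alpha) = \Gamma_A^{(i)\prime}(\alpha_i) - \Gamma_S^{(i)\prime}(-\alpha_i + \Xi^{(i)}(\alpha)) + \sum_{j=1}^d \Gamma_S^{(j)\prime}(-\alpha_j + \Xi^{(j)}(\alpha))\, \frac{d}{d\alpha_i} \Xi^{(j)}(\alpha) \]
\[ = \lambda_i(\alpha) - \mu_i(\alpha) + \sum_{j=1}^d \mu_j(\alpha)\, p_{ji}(\alpha) \]

where for the last equality we applied corollary 5.3.16. We get

\[ \nabla \Psi(\alpha) = \lambda(\alpha) + (P(\alpha)^\top - \mathrm{id})\, \mu(\alpha) \qquad (5.13) \]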
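Identity (5.13) can be checked numerically in a special case. The sketch below assumes Poisson primitives, for which the lmgf of a counting process with rate $\kappa$ is $\Gamma(u) = \kappa (e^u - 1)$, and the form $\Xi^{(i)}(\theta) = \log\big( \sum_j p_{ij} e^{\theta_j} + 1 - \sum_j p_{ij} \big)$; both are assumptions made for this illustration, as are the rates. It compares a finite difference gradient of $\Psi$ with the free drift built from the twisted rates of corollary 5.3.16.

```python
import math

# Hypothetical rates of a two-node network.
lam = [1.0, 0.5]
mu = [2.0, 3.0]
P = [[0.0, 0.5], [0.25, 0.0]]
d = len(lam)


def xi(i, theta):
    """Assumed log mgf of a routing decision from node i."""
    return math.log(sum(P[i][j] * math.exp(theta[j]) for j in range(d))
                    + 1.0 - sum(P[i]))


def psi(theta):
    """Lmgf of the free process for Poisson primitives (assumption)."""
    return sum(lam[i] * (math.exp(theta[i]) - 1.0)
               + mu[i] * (math.exp(-theta[i] + xi(i, theta)) - 1.0)
               for i in range(d))


def twisted_free_drift(alpha):
    """lambda(alpha) + (P(alpha)^T - id) mu(alpha), cf (5.13)."""
    lam_a = [lam[i] * math.exp(alpha[i]) for i in range(d)]
    mu_a = [mu[i] * math.exp(-alpha[i] + xi(i, alpha)) for i in range(d)]
    P_a = [[P[i][j] * math.exp(alpha[j]) / math.exp(xi(i, alpha))
            for j in range(d)] for i in range(d)]
    return [lam_a[j] - mu_a[j] + sum(mu_a[i] * P_a[i][j] for i in range(d))
            for j in range(d)]


def numerical_gradient(f, x, h=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = []
    for k in range(len(x)):
        up = list(x); up[k] += h
        down = list(x); down[k] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


alpha = [0.3, -0.4]
print(numerical_gradient(psi, alpha), twisted_free_drift(alpha))
print(twisted_free_drift([0.0, 0.0]))  # untwisted: the free drift of 5.2.6
```

At $\alpha = 0$ nothing is twisted and the result is the free drift $\lambda + (P^\top - \mathrm{id})\mu$ of the original rates.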

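Claim 5.3.17. If for $v \in \mathbb{R}^d$ there is $\alpha \in \mathbb{R}^d$ such that under the change of measure $M(\alpha, \cdot)$ the free process has drift $v$ then

\[ \Psi^*(v) = \langle \alpha, v \rangle - \Psi(\alpha). \]

Proof of 5.3.17: From the definition of the Fenchel-Legendre transform and the above

\[ \Psi^*(v) = \sup_{\theta \in \mathbb{R}^d} \langle \theta, v \rangle - \Psi(\theta) = \langle \alpha, v \rangle - \Psi(\alpha) \;\Leftrightarrow\; v = \nabla \Psi(\alpha) \]

so $\nabla \Psi(\alpha)$ is the free drift (cf 5.2.6) under the changed measure and if this equals $v$ then $\theta = \alpha$ is the optimiser in the Fenchel-Legendre transform. □ 5.3.17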
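For a single node the optimising $\alpha$ of claim 5.3.17 can be written down explicitly under simplifying assumptions. The sketch below assumes $d = 1$, Poisson arrivals and services with rates $\lambda$ and $\mu$, and that every served customer leaves the network, so that $\Psi(\theta) = \lambda (e^\theta - 1) + \mu (e^{-\theta} - 1)$. Solving $\Psi'(\alpha) = v$ is then a quadratic equation in $e^\alpha$. The closed form is compared with a brute force supremum over a grid.

```python
import math


def psi(theta, lam, mu):
    """Lmgf of the one-dimensional free process (Poisson assumption)."""
    return lam * (math.exp(theta) - 1.0) + mu * (math.exp(-theta) - 1.0)


def optimal_twist(v, lam, mu):
    """Solve lam * e^a - mu * e^(-a) = v for a."""
    u = (v + math.sqrt(v * v + 4.0 * lam * mu)) / (2.0 * lam)
    return math.log(u)


def rate(v, lam, mu):
    """Psi^*(v) = a v - Psi(a) at the optimising twist a, cf claim 5.3.17."""
    a = optimal_twist(v, lam, mu)
    return a * v - psi(a, lam, mu)


def rate_by_grid(v, lam, mu, lo=-5.0, hi=5.0, n=20001):
    """Brute force approximation of sup_theta theta v - Psi(theta)."""
    step = (hi - lo) / (n - 1)
    return max((lo + k * step) * v - psi(lo + k * step, lam, mu)
               for k in range(n))


lam, mu = 1.0, 2.0
for v in (-1.0, 0.0, 0.5):
    print(v, optimal_twist(v, lam, mu), rate(v, lam, mu), rate_by_grid(v, lam, mu))
```

For $v = \lambda - \mu$, the free drift, the optimising twist is $\alpha = 0$ and the rate is zero: the expected behaviour costs nothing.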

It is therefore interesting under which conditions an optimising $\alpha \in \mathbb{R}^d$ in the Fenchel-Legendre transform $\Psi^*(v)$ exists.

Remark 5.3.18. • From identifying rate functions in 5.3.12 we have finiteness of $\Psi^*(v)$ iff there are $a, r, \gamma$ such that $a + (r^\top - \mathrm{id})\gamma = v$. And from goodness of $\Psi^*$ we have existence of an optimiser whenever $\Psi^*(v) < \infty$:

\[ \exists\, a, \gamma, r : a + (r^\top - \mathrm{id})\gamma = v \;\Leftrightarrow\; \exists\, \alpha : \lambda(\alpha) + (P(\alpha)^\top - \mathrm{id})\, \mu(\alpha) = v \]

so while $\alpha \in \mathbb{R}^d$ has $d$ coordinates or degrees of freedom and $a, \gamma, r$ have $2d + d^2$, this seems not to be relevant in terms of existence of optimisers.

• From the form of the rate function in 5.3.12 and openness of the domains $D(\Gamma_D^{(i)*})$ for all $D = S$, $i \in \{1, \dots, d\}$ and $D = A$, $i \in \{1, \dots, d\}$ with $\lambda_i > 0$ (cf 2.8.2): if for $v \in \mathbb{R}^d$ there exists $\alpha$ such that $\nabla \Psi(\alpha) = v$ then for $v'$ close enough to $v$ there is $\alpha'$ such that $\nabla \Psi(\alpha') = v'$. This also implies that the domain $D(\Psi^*)$ is open.

5.3.2 The network process

This section is on the stochastic process that describes the generalised Jack-

son network. We will construct it from the network primitives in a similar

way as the free process in such a way that its coordinates cannot become

negative. We give the network process’ expected behaviour.

The large deviations of the network process need a more thorough theoretical background. The local large deviations for the network process will be developed in chapter 6. The following theorem 2.1 of Chen and Mandelbaum [3] allows us to move from the network primitives defined in 5.3.4 to the network process that we define in 5.3.20.

Theorem 5.3.19. Let $z_0\in\mathbb{R}^d$ and consider a network of $d$ nodes and primitives $A,\mathbf{S}^{(1)},\dots,\mathbf{S}^{(d)}$ that satisfy

• $A^{(1)},\dots,A^{(d)}$ are in $D([0,\infty),\mathbb{R})$, non-decreasing and $A^{(i)}(0)=0$ for $i=1,\dots,d$;

• $\mathbf{S}^{(1)},\dots,\mathbf{S}^{(d)}$ with $\mathbf{S}^{(i)}=-e_iS^{(i)}+\sum_{j=1}^{d}S^{(ij)}e_j$, where $S^{(i)},S^{(ij)}$ are in $D([0,\infty),\mathbb{R})$, non-decreasing, and $S^{(ij)}(0)=S^{(i)}(0)=0$.

If further there is $\epsilon>0$ such that for each $D\in\{A^{(i)},S^{(i)},S^{(ij)}\mid i,j=1,\dots,d\}$
\[
D(t)\le D(t-)+\epsilon
\]
then there exists a unique pair of $d$-dimensional processes $(Z,R)$ satisfying

• $Z(t)=z_0+A(t)+\sum_{i=1}^{d}\mathbf{S}^{(i)}\circ R^{(i)}(t)$

• $Z^{(j)}\ge 0$

• $R^{(j)}(t)=\int_0^t \mathbf{1}_{Z^{(j)}(s)>0}\,ds$

Note that this theorem applies to the unscaled ($\epsilon=1$) and scaled primitives ($\epsilon=\frac{1}{n}$).

Definition 5.3.20 (Network process). Given network primitives $A,\mathbf{S}^{(1)},\dots,\mathbf{S}^{(d)}$ and some fixed $z_0\in\mathbb{N}^d$ such that theorem 5.3.19 applies set
\[
Z(t,z_0)=z_0+A(t)+\sum_{i=1}^{d}\mathbf{S}^{(i)}\circ R^{(i)}(t)
\]
\[
R^{(i)}(t)=\int_0^t \mathbf{1}_{Z^{(i)}(s,z_0)>0}\,ds \qquad (i=1,\dots,d)
\]
and denote $(Z,R)$ the network process. When addressed in isolation $Z$ is denoted the queue size process and $R$ the runtime process of the network.

The difference of the free and the network process is in the runtime pro-

cess R(i)defined at each node. If queue iis empty during [t0, t1] then during

this time Z(i)= 0 and R(i)′= 0. Thus S(i)cannot change state in [t0, t1] and

no customer leaves the empty queue at node i. The queue size always stays

non-negative.
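The role of the runtime process can be made concrete for a single node. The following sketch assumes d = 1, no routing, an initially empty queue and deterministic lists of arrival epochs and service requirements (the function name and the inputs are hypothetical, chosen for illustration). The service clock advances only while the queue is non-empty, which is exactly the time change $S\circ R$ in the definition of the network process.

```python
def single_queue(arrival_times, service_times, horizon):
    """Return (Z, R) at time `horizon` for one node that starts empty.

    arrival_times: sorted epochs at which the arrival process A jumps
    service_times: service requirements; the service process S jumps when
                   the runtime reaches their cumulative sums
    """
    inf = float("inf")
    t = 0.0      # real time
    r = 0.0      # runtime R(t): time spent with a non-empty queue
    z = 0        # queue size Z(t)
    a_idx = 0
    s_idx = 0
    next_done = service_times[0] if service_times else inf
    while True:
        next_arrival = arrival_times[a_idx] if a_idx < len(arrival_times) else inf
        # A departure can only happen while the queue is non-empty.
        next_departure = t + (next_done - r) if z > 0 else inf
        t_event = min(next_arrival, next_departure)
        if t_event > horizon:
            if z > 0:
                r += horizon - t
            return z, r
        if z > 0:
            r += t_event - t
        t = t_event
        if next_departure <= next_arrival:
            z -= 1
            r = next_done          # avoid accumulating rounding error
            s_idx += 1
            next_done = next_done + service_times[s_idx] if s_idx < len(service_times) else inf
        else:
            z += 1
            a_idx += 1

print(single_queue([1.0, 2.0], [0.5, 0.5], 3.0))
```

In this run the node is busy during [1, 1.5] and [2, 2.5], so the runtime at the horizon is shorter than the horizon itself, while a node that is never empty has R(t) = t and behaves like the free process.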

We have just described the different behaviour of a queue when empty and

when nonempty. This is termed “discontinuous statistics” of the network

process and it is the reason why the network process is more difficult to work

with than the free process.

Definition 5.3.21 (R(·,·)). For $t\in\mathbb{R}_{>0}$ and $Z$ a well behaved non-negative process on $[0,t]$ with values in $\mathbb{R}$ let
\[
R(Z,t)=\int_{s=0}^{t}\mathbf{1}_{Z(s)>0}\,ds.
\]
Definition 5.3.22 (Scaled network process). For $z_0\in\mathbb{R}^d_{\ge 0}$ define the processes
\[
Z_n:\mathbb{R}_{\ge 0}\to\mathbb{R}^d_{\ge 0},\qquad t\mapsto \frac{1}{n}Z(nt,\lfloor nz_0\rfloor)
\]
and $R_n$ with coordinate processes $R^{(i)}_n$ defined as
\[
R^{(i)}_n(t)=\frac{1}{n}R(Z^{(i)}(\cdot,\lfloor nz_0\rfloor),nt).
\]
Then $(Z_n,R_n)$ is denoted the scaled network process starting in $z_0$. If the starting point is $z_0=0$ it may be omitted. When addressed in isolation $Z_n$ may be referred to as the scaled queue size process and $R_n$ as the scaled runtime process of the network.

From part of theorem 5.1 of Chen and Mandelbaum we get a statement paralleling claims 5.3.6 and 5.3.8. The uniform convergence on compacts that they assume holds for the generalised Jackson network under our assumptions.

Theorem 5.3.23 (Drift of the network process). Consider the scaled network process $(Z_n,R_n)$ of the generalised Jackson network with rates $(\lambda,\mu,P)$. Consider the Skorohod problem 5.2.14 with

• starting point $z_0$

• linear input function $X$, $X(t)=z_0+t\,(\lambda+(P^\top-id)\mu)$

• $P$

and let $(Z,Y)$ be the linear solution of the form $Z(t)=z_0+tz$, $Y(t)=ty$ where $z$ is the network drift and $y$ is the loss rate. Then the network process converges almost surely uniformly on compacts
\[
\lim_{n\to\infty}(Z_n,R_n)=\big(t\mapsto z_0+tz\,,\ t\mapsto t(1-\rho)\big).
\]

Chen and Mandelbaum denote the network drift zthe fluid limit of the

queue size process. And indeed it can be identified with the deterministic

flow in the associated fluid network of section 5.2.1.

5.3.3 The local process

Observe that in the scaled network process $t\mapsto Z_n(t,z_0)$ ($z_0$ fixed) initially non-empty nodes stay non-empty for some positive length of time. Since $R^{(i)}(t)=t$ over this interval, the network process looks like the free process in this $i$-th coordinate and over this interval. Figure 5.7 gives a realisation of the non-ergodic network of example 5.1, 5.2.8 over some short time span $[0,T]$, $T=0.1$, with a scaling parameter $n=500$. This motivates the local process, which is somewhere in between the free and the network process. We work with the local process to prove local large deviations in chapter 6.

Definition 5.3.24 ($W^\Lambda$, $R^\Lambda$). Given network primitives $A,\mathbf{S}^{(1)},\dots,\mathbf{S}^{(d)}$ and some fixed $z_0\in\mathbb{R}^d_{\ge 0}$ let
\[
\Lambda=\Lambda(z_0)=\{i\mid z_{0,i}>0\}
\]
and define the local process $W^\Lambda$ as
\[
W^\Lambda:\mathbb{R}_{\ge 0}\times\mathbb{R}^d_{\ge 0}\to\mathbb{Z}^d,\qquad
(t,z_0)\mapsto z_0+A(t)+\sum_{i\in\Lambda}\mathbf{S}^{(i)}(t)+\sum_{i\in\Lambda^c}\mathbf{S}^{(i)}\circ R^{(i)\Lambda}(t)
\]
and the local runtime process for $i\in\Lambda^c$
\[
R^{(i)\Lambda}(t)=R(W^\Lambda_i(\cdot,z_0),t)=\int_{s=0}^{t}\mathbf{1}_{W^{(i)\Lambda}(s,z_0)>0}\,ds.
\]

[Figure 5.7: Realisation of the non-ergodic network or associated local process with Λ = {2,3}, cf examples 5.2.8 (T = 0.1 and scaling n = 500). Curves: Queue 1, Queue 2, Queue 3, Queue 4.]

The local process $W^\Lambda$ will have non-negative $\Lambda^c$-coordinates: $W^\Lambda_i\ge 0$ for $i\in\Lambda^c$ since the runtime $R^{(i)\Lambda}$ will stop $\mathbf{S}^{(i)}$ if $W^\Lambda_i=0$. $\Lambda$-coordinates of the local process are allowed to have negative values. We denote $i\in\Lambda$ a free node and $i\in\Lambda^c$ a restricted node.

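We split the local process into the sum of a free and a network process:

Claim 5.3.25. Let $A,\mathbf{S}^{(1)},\dots,\mathbf{S}^{(d)}$ be network primitives of a generalised Jackson network, $z_0\in\mathbb{R}^d_{\ge 0}$ and $(Z,R)$ the associated network process starting in $z_0$. Set $\Lambda=\{i\mid z_{0,i}>0\}$ and let $\pi_\Lambda$ be the projection such that $\pi_\Lambda(\alpha)=\sum_{i\in\Lambda}\alpha_ie_i$. Then
\[
W^\Lambda=z_0+X^\Lambda+Z^\Lambda
\]
for the following $Z^\Lambda$, $X^\Lambda$:

• $Z^\Lambda$ is the network process of the $\Lambda^c$-nodes with all nodes starting empty: $Z^\Lambda(t)\in\mathbb{R}^d_{\ge 0}$, $Z^{(i)\Lambda}\equiv 0$ for $i\in\Lambda$,
\[
Z^\Lambda=\pi_{\Lambda^c}(W^\Lambda)
=\sum_{i\in\Lambda^c}\Big(A^{(i)}+\sum_{k\in\Lambda}S^{(ki)}-S^{(i)}\circ R(S^{(i)},t)+\sum_{k\in\Lambda^c}S^{(ki)}\circ R(S^{(ki)},t)\Big)e_i
\]

• $X^\Lambda$ is the free process of the $\Lambda$-nodes with $X^\Lambda(t)\in\mathbb{R}^d$, $X^\Lambda(0)=0$,
\[
X^\Lambda=\pi_\Lambda(W^\Lambda)-z_0
=\sum_{i\in\Lambda}\Big(A^{(i)}+\sum_{k\in\Lambda^c}S^{(ki)}\circ R(S^{(ki)},t)-S^{(i)}+\sum_{k\in\Lambda}S^{(ki)}\Big)e_i
\]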

Proof of 5.3.25: We only have to check that the projections $\pi_\Lambda(W^\Lambda)$ and $\pi_{\Lambda^c}(W^\Lambda)$ are correct.

The interpretation as a network process for $Z^\Lambda$ is correct since 5.3.20 applies to the primitives

• arrival processes $A^{(i)\Lambda}=A^{(i)}+\sum_{k\in\Lambda}S^{(ki)}$ for $i\in\Lambda^c$;

• service processes $-e_iS^{(i)}+\sum_{k\in\Lambda^c}S^{(ki)}$.

Similarly $X^\Lambda$ is a free process as in 5.3.5 with

• arrival processes $t\mapsto A^{(i)}(t)+\sum_{k\in\Lambda^c}S^{(ki)}\circ R(S^{(ki)},t)$ for $i\in\Lambda$, where $R$ makes the distributions of these arrival processes difficult;

• service processes $-e_iS^{(i)}+\sum_{k\in\Lambda}S^{(ki)}$.



5.3.25

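The new arrival processes $A^{(i)\Lambda}$, $i\in\Lambda^c$, of $Z^\Lambda$ defined in the proof above,
\[
A^{(i)\Lambda}=A^{(i)}+\sum_{k\in\Lambda}S^{(ki)}
\]
for $i\in\Lambda^c$, are not renewal counting processes but sums of independent renewal counting processes. The rates of the network process $Z^\Lambda$ are
\[
\Big(\sum_{i\in\Lambda^c}\Big(\lambda_i+\sum_{k\in\Lambda}\mu_kp_{ki}\Big)e_i\,,\ \mu\,,\ [\,p_{ij}\mathbf{1}_{i,j\in\Lambda^c}]_{i,j=1,\dots,d}\Big).
\]
Since $Z^{(i)\Lambda}=0$ for $i\in\Lambda$, nodes in $\Lambda$ are not relevant in the $Z^\Lambda$ network and we move to $\mathbb{R}^{|\Lambda^c|}$, describing the (sub)network of $\Lambda^c$-nodes as
\[
\lambda_{\Lambda^c}+(P^\top)_{\Lambda^c\Lambda}\,\mu_\Lambda=\Big[\lambda_i+\sum_{k\in\Lambda}\mu_kp_{ki}\Big]_{i\in\Lambda^c},\qquad
\mu_{\Lambda^c}=[\,\mu_i]_{i\in\Lambda^c},\qquad
P_{\Lambda^c}=[\,p_{ij}]_{i,j\in\Lambda^c}.
\]
These rates are like in (5.4) with $\Lambda$ instead of $B$ and $\Lambda^c$ instead of $E$. $\Lambda^c$-nodes are not necessarily ergodic.

Exit nodes of the $\Lambda^c$-network are $\{i\in\Lambda^c\mid\exists j\in\Lambda\cup\{0\}:p_{ij}>0\}$, and for $p^{(i)\Lambda}$ a row of $P_{\Lambda^c}$ and a sub-probability measure, $p^\Lambda_{i0}=p_{i0}+\sum_{k\in\Lambda}p_{ik}$.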

Claim 5.3.26. $Z^\Lambda$ satisfies assumption 5.1.9.

Proof of 5.3.26: If $Z$ is open then $Z^\Lambda$ is, too: In a picture of an open network there is a sequence of arrows from the outside world to an arbitrary node $i$ and from that node to the outside world. By remodelling a subset of nodes as a network we get “more outside world”, which makes sequences from the outside world to node $i$ shorter but never loses the required accessibility. Feedback cannot be created by removing edges from the network.

5.3.26

We can now investigate if the subnetwork of Λc-nodes described through

ZΛis ergodic or find the ergodic subnetwork. The network has deterministic

rates and definitions and claims of section 5.2 apply.

Change of measure

We define the change of measure process for the local process associated

with a generalised Jackson network for a starting point z0∈Rd

≥0and set

Λ = {i|z0,i >0}of initially non-empty nodes.

Observing the local process $W^\Lambda$ up to some fixed $t$ we observe all arrival processes and the service processes of $\Lambda$-nodes up to time $t$. For restricted nodes $i\in\Lambda^c$ we have only observed the service process and the routing decisions up to time $R^{(i)\Lambda}(t)\le t$. If node $i$ is ergodic then, with high probability, $R^{(i)\Lambda}(t)<t$. We want to argue that the change of measure process for the $i$-th service process we developed in section 4.4.2, esp. definition 4.4.15, is still a change of measure process under the time change $t\mapsto R^{(i)\Lambda}(t)$.

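Definition 5.3.27 (Local filtration).
\[
\mathcal{F}^\Lambda_t=\sigma\big(W^\Lambda_s;\,s\le t\big),\qquad \mathcal{F}^\Lambda=\big(\mathcal{F}^\Lambda_t;\,t\ge 0\big)
\]
Then $(W^\Lambda_t;\,t\ge 0)$ is adapted to the filtration $(\mathcal{F}^\Lambda_t)_{t\ge 0}$ and the runtime $t\mapsto R^{(i)\Lambda}(t)$ is adapted to this filtration, since $R^{(i)\Lambda}(t)=R(W^\Lambda(\cdot,z_0),t)$ is a measurable function of $W^\Lambda$.

Let $G^{(i)}$ for some fixed $i\in\Lambda^c$ be the change of measure process for the counting process $\mathbf{S}^{(i)}$ (cf 5.3.2, 4.4.15):
\[
G^{(i)}(\alpha,t)=\exp\{\langle\alpha,\mathbf{S}^{(i)}_t\rangle-t\,\Gamma^{(i)}_S(-\alpha_i+\Xi^{(i)}(\alpha))\}\;r(-\alpha_i+\Xi^{(i)}(\alpha),t)
\]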

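Claim 5.3.28. $t\mapsto G^{(i)}(\alpha,R^{(i)\Lambda}(t))$ is a martingale wrt $\mathcal{F}^\Lambda$.

Proof of 5.3.28: $G^{(i)}(\alpha,\cdot)$ is a martingale wrt the filtration
\[
\big(\sigma(\mathbf{S}^{(i)}_s;\,s\le t)\big)_{t\ge 0}
\]
generated by $\mathbf{S}^{(i)}$, and for fixed $t\ge 0$ the runtime $R^{(i)\Lambda}(t)$ is a stopping time wrt $\mathcal{F}^\Lambda$. Thus $G^{(i)}(\alpha,R^{(i)\Lambda}(t))$ is a regular random variable with mean 1 (the same mean as $G^{(i)}(\alpha,t)$) and measurable wrt $\sigma(\mathbf{S}^{(i)}_s;\,s\le R^{(i)\Lambda}(t))\subseteq\mathcal{F}^\Lambda$.

If $s<t$ we have two ordered, bounded stopping times $R^{(i)\Lambda}(s)\le R^{(i)\Lambda}(t)$ and
\[
E[G^{(i)}(\alpha,R^{(i)\Lambda}(t))\mid\mathcal{F}^\Lambda_s]=G^{(i)}(\alpha,R^{(i)\Lambda}(s)).
\]
From this we have the martingale property for any $s_1<\dots<s_n$ or $0=s_0<s_1<\dots<s_{n-1}<s_n=T$ of the following finite dimensional vector:
\[
\big(1,\ G^{(i)}(\alpha,R^{(i)\Lambda}(s_1)),\dots,G^{(i)}(\alpha,R^{(i)\Lambda}(s_{n-1})),\ G^{(i)}(\alpha,T)\big).
\]
And $t\mapsto G^{(i)}(\alpha,R^{(i)\Lambda}(t))$ is a local martingale. We now show that for any $t\ge 0$ we have $E[\sup_{s\in[0,t]}G^{(i)}(\alpha,s)]<\infty$ and then the martingale property follows (cf proposition A.7 in [18]).
\[
\sup_{t\in\mathbb{R}}r(-\alpha_i+\Xi^{(i)}(\alpha),t)<\infty
\]
\[
-s\,\Gamma^{(i)}_S(-\alpha_i+\Xi^{(i)}(\alpha))\le\max\{0,\,-t\,\Gamma^{(i)}_S(-\alpha_i+\Xi^{(i)}(\alpha))\}
\]
\[
\begin{aligned}
E[\exp\{\langle\alpha,\mathbf{S}^{(i)}_s\rangle\}]
&=E[\exp\{\langle\alpha,T^{(i)}S^{(i)sp}_s\rangle\}] \\
&\overset{5.3.3}{=}E[\exp\{\langle T^{(i)\top}\alpha,S^{(i)sp}_s\rangle\}] \\
&\overset{\beta=T^{(i)\top}\alpha}{=}E[\exp\{\langle\beta,S^{(i)sp}_s\rangle\}]
\end{aligned}
\]
Replace $\beta_i$ by $\beta_i^+$; then the last expression on the rhs is monotone increasing in $s$ and
\[
E\big[\sup_{s\in[0,t]}G^{(i)}(\alpha,s)\big]\le E[\exp\{\langle\beta^+,S^{(i)sp}_t\rangle\}]<\infty.
\]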



5.3.28

Corollary 5.3.29. t7→ G(i)(α, R(i)Λ(t)) is the change of measure process for

t7→ S(i)◦R(i)Λ(t), the primitive service process at node iin the setting of the

local process.

The corollary brings together claim 5.3.28 and the additional property of

a mean equal to unity.

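Definition 5.3.30 ($M^\Lambda(\alpha,\cdot)$). Consider a generalised Jackson network with

• primitives defined in 5.1.5, 5.3.4.

• Let $X$ be the free process defined in 5.3.5

• with lmgf $\Psi$ defined in 5.3.9.

• Let $(W^\Lambda,R^\Lambda)$ be the local process for some $\Lambda\subseteq\{1,\dots,d\}$.

Define for $t\ge 0$ and $\alpha\in\mathbb{R}^d$
\[
\begin{aligned}
M^\Lambda(\alpha,t)=\exp\Big\{&\langle\alpha,W^\Lambda_t\rangle-\sum_{i=1}^{d}t\,\Gamma^{(i)}_A(\alpha_i)
-\sum_{i\in\Lambda}t\,\Gamma^{(i)}_S(-\alpha_i+\Xi^{(i)}(\alpha)) \\
&-\sum_{i\in\Lambda^c}R^{(i)\Lambda}(t)\,\Gamma^{(i)}_S(-\alpha_i+\Xi^{(i)}(\alpha))\Big\}\times r(\alpha,t)
\end{aligned}
\]
where
\[
r(\alpha,t)=\prod_{i=1}^{d}r^{(i)}_S(\Gamma^{(i)}_S(-\alpha_i+\Xi^{(i)}(\alpha)),t)\;r^{(i)}_A(-\Gamma^{(i)}_A(\alpha_i),t)
\]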

r(i)D(β, t) = Fc

Fc(B(t)) e−βB(t)

for D∈ {A, S}and Fthe distribution function of inter event times of the

renewal counting process D(i)and B(t) = B(D(i), t)the age of D(i)at time t

(for the definition of r(i)Dcf 3.6.5).

Note that the stochasticity of $M^\Lambda(\alpha,t)$ is in the inner product with $W^\Lambda$ and in the runtimes $R^{(i)\Lambda}$ for $i\in\Lambda^c$.

Claim 5.3.31. • $t\mapsto M^\Lambda(\alpha,t)$ is a change of measure process for the local process.

• Under the change of measure $M^\Lambda(\alpha,\cdot)$ the process $W^\Lambda$ is again a local process associated with the primitives of a generalised Jackson network and the same set $\Lambda$. Each primitive changes its distribution in the following way:

– if $A^{(i)}$ had inter event densities $f$ then under the changed measure $A^{(i)}$ remains a renewal counting process and now has inter event times density $f_{-\Gamma^{(i)}_A(\alpha_i)}$, cf 2.3.1, (2.4).

– if the routing decision $r^{(i)}$ was distributed on $\{e_1,\dots,e_d,0\}$ with probability measure $(p_{i1},\dots,p_{id},1-\sum_{j=1}^{d}p_{ij})$ associated with the sub-probability measure $p^{(i)}$ then under the change of measure routing decisions remain iid. $r^{(i)}$ now has the distribution associated with the sub-probability measure $\nabla\Xi^{(i)}(\alpha)$, cf 4.4.23.

– if $S^{(i)}$ had inter event densities $f$ then under the changed measure $S^{(i)}$ remains a renewal counting process and now has inter event times density $f_{-\Gamma^{(i)}_S\circ(-\pi_i+\Xi^{(i)})(\alpha)}$, cf 2.3.1, (2.4).

Proof of 5.3.31: We can sequentially change the distributions of the primitive processes. We do it over different times: over $[0,t]$ for the arrival processes and the $\mathbf{S}^{(i)}$ with $i\in\Lambda$. For the restricted nodes $i\in\Lambda^c$ we change the distribution of $\mathbf{S}^{(i)}$ over $[0,R^{(i)}(t)]$. This still works in the new setting due to 5.3.29. Since $R^{(i)}(t)$ is a random variable in our setting and a stopping time, we can work with the stopped martingale as the change of measure.

5.3.31

Since the change of measure MΛ(α, ·) has the same effect on the network

primitives as in the case of the free process, the rates of the network’s prim-

itives change in the same way and corollary 5.3.16 holds under the change

MΛ(α, ·).

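Definition 5.3.32 ($E^{[\alpha]}$). For any fixed $t\ge 0$ and set $A\in\mathcal{F}^\Lambda$
\[
E[\mathbf{1}_{(W^\Lambda_s;\,s\le t)\in A}\,M^\Lambda(\alpha,t)]=E^{[\alpha]}[\mathbf{1}_{(W^\Lambda_s;\,s\le t)\in A}].
\]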

Remark 5.3.33. •The local process has been defined with explicit ref-

erence to a starting point z0. This suits our following application of

the local process associated with a network process Z=Z(·, z0). How-

ever, local processes can be defined more generally, not requiring that

Λ = Λ(z0).

•If Λ = ∅then MΛ(α, ·) = M∅(α, ·)is a change of measure process for

the network process.

Chapter 6

Local large deviations of the

generalised Jackson network

For the local sample path large deviations for a Markovian network we cite

the definition of Irina Ignatiouk-Robert in the introduction of [13], “Large

Deviations for Processes with Discontinuous Statistics”. The paper is con-

cerned with how to develop full large deviations for Markovian processes with

discontinuous statistics starting from local large deviations.

Definition 6.0.2 (Local large deviations [13]). Let $x\in\mathbb{R}^d_{\ge 0}$ and $(X(t,x))$ be a Markov process on $E\subseteq\mathbb{R}^d_{\ge 0}$ with initial state $X(0,x)=x$. For $n\in\mathbb{N}$, $(Z_n(t,z))$ is the rescaled Markov process on $E_n=\frac{1}{n}E$ having initial state $Z_n(0,z)=z\in E_n$:
\[
Z_n(t,z)=\frac{1}{n}X(nt,nz).
\]
A local sample path large deviation principle with a rate function $J_{[0,T]}$ is said to hold when the following inequalities are satisfied:
\[
(3):\quad \lim_{\delta\to 0}\lim_{\epsilon\to 0}\liminf_{n\to\infty}\inf_{\substack{z\in E_n\\ |z-\psi(0)|<\epsilon}}\frac{1}{n}\log P_z(\|\psi-Z_n\|_\infty<\delta)\ge -J_{[0,T]}(\psi)
\]
\[
(4):\quad \lim_{\delta\to 0}\limsup_{n\to\infty}\sup_{\substack{z\in E_n\\ |z-\psi(0)|<\delta}}\frac{1}{n}\log P_z(\|\psi-Z_n\|_\infty<\delta)\le -J_{[0,T]}(\psi)
\]
for every piecewise linear function $\psi:[0,T]\to\mathbb{R}^d_{\ge 0}$.

From the Markov property it is deduced in [13] that one only needs to consider linear functions. We will work only with linear functions for the non-Markovian generalised Jackson network since we have shown that its primitives are exponentially equivalent to primitive processes that have independent increments over finitely many, deterministic, disjoint intervals, cf section 3.4.2. This can be generalised to independent evolution of the network over finitely many, deterministic, disjoint intervals of time, over each of which the process stays close to some linear function (we can thereby bound runtimes and will obtain deterministic intervals for the service processes of the restricted nodes).

Since a piecewise linear function over a compact interval will hit and leave boundaries at a finite number of fixed instances of time, independence of the network evolution over such intervals should be enough to move from linear to piecewise linear functions. We work with a slightly weaker definition of local large deviations.

Definition 6.0.3 (Local large deviations). For each $n\in\mathbb{N}$ and fixed $z_0$ let $Z_n(\cdot,z_0)$ be a scaled network process starting in $z_0$, cf 5.3.22. A local sample path large deviation principle is said to hold when for any $x,v,T$ such that

• $x_i=0\Rightarrow v_i\ge 0$

• $x_i>0\Rightarrow x_i+Tv_i>0$

the following inequalities are satisfied:
\[
(1):\quad \lim_{\delta\to 0}\lim_{\epsilon\to 0}\liminf_{n\to\infty}\inf_{|z_0-x|<\epsilon}\frac{1}{n}\log P(\|(t\mapsto x+tv)-Z_n(\cdot,z_0)\|<\delta)\ge -T\,L(x,v)
\]
\[
(2):\quad \lim_{\delta\to 0}\lim_{\epsilon\to 0}\limsup_{n\to\infty}\sup_{|z_0-x|<\epsilon}\frac{1}{n}\log P(\|(t\mapsto x+tv)-Z_n(\cdot,z_0)\|<\delta)\le -T\,L(x,v)
\]
with $\|\cdot\|$ the supremum norm over the interval $[0,T]$.

We prove the following local large deviation principle for the non-Markovian generalised Jackson network. We explicitly allow linear functions that leave a boundary. Such a situation is given if there is $i$ such that $x_i=0$, $v_i>0$.

Claim 6.0.4 (Local large deviations for the generalised Jackson network). Consider the generalised Jackson network with $d$ nodes and primitives 5.1.5. Let $\Gamma^{(i)}_A,\Gamma^{(i)}_S,\Xi^{(i)}$ be the lmgfs for the primitives and $\Psi$ the lmgf for the free process. Under assumptions 5.0.2 for the inter event times and 5.1.9 for the network a sample path local large deviation principle holds with rate function
\[
L(x,v)=\sup_{\alpha\in B_K}\langle\alpha,v\rangle-\Psi(\alpha)
\]
with
\[
K=\{i\mid x_i>0\ \text{or}\ v_i>0\},\qquad
B_K=\{\alpha\in\mathbb{R}^d\mid -\alpha_i+\Xi^{(i)}(\alpha)\ge 0\ \ \forall i\in K^c\}.
\]
From the definition of the local large deviation we make the general assumption

Assumption 6.0.5. $x,v,T$ are such that

• $x_i=0\Rightarrow v_i\ge 0$ and

• $x_i>0\Rightarrow x_i+Tv_i>0$.

In the local large deviation we bound the probability that the queue size

process stays close to a specified linear function. Since queue sizes will al-

ways be non-negative this assumption picks linear functions that a network

process may have positive probability of staying close to; it is not restrictive.

In the next section we will prove the upper bound of the local large de-

viation principle and obtain a candidate for the local rate function L(·,·)

as an inequality constrained optimisation problem. We will then investigate

existence of an optimiser in this optimisation problem; an optimiser can be

interpreted as the parameter of a change of measure for the network pro-

cess. We will further investigate properties of the network under the change

of measure with the optimising αas parameter and finally prove the lower

bound. As the lower and upper bound coincide the candidate local rate

function is the local rate function and 6.0.4 will be proved.

6.1 Local large deviations upper bound

We are interested in the event
\[
Z_n(\cdot,z)\in U_\delta(t\mapsto x+tv) \qquad (6.1)
\]
over some interval $[0,T]$ for the scaled network process $Z_n$ starting in $z\in\mathbb{R}^d$ and the asymptotic decay of the probability that the process stays in the neighbourhood over an interval of positive length as $n\to\infty$.

We start with linear functions where empty queues stay empty and the scaled network process has the same starting point as the linear function $t\mapsto x+tv$.

[Figure 6.1: d = 2, x_2 = 0, v_2 = 0. Shown: the paths t ↦ 0 and t ↦ x_1 + t v_1.]

Claim 6.1.1. Consider the generalised Jackson network with $d$ nodes and primitives 5.1.5. Let $\Gamma^{(i)}_A,\Gamma^{(i)}_S,\Xi^{(i)}$ be the lmgfs for the primitives and $\Psi$ the lmgf for the free process. Let $Z_n$ be the scaled queue size process of the generalised Jackson network. Given $x,v,T$ such that assumption 6.0.5 holds and $x_i=0\Rightarrow v_i=0$
\[
\lim_{\delta\to 0}\limsup_{n\to\infty}\frac{1}{n}\log P(Z_n(\cdot,x)\in U_\delta(t\mapsto x+tv))\le -T\sup_{\alpha\in B_\Lambda}\langle\alpha,v\rangle-\Psi(\alpha)
\]
with the set $B_\Lambda$ defined as
\[
\Lambda=\{i\mid x_i>0\},\qquad
B_\Lambda=\{\alpha\in\mathbb{R}^d\mid -\alpha_i+\Xi^{(i)}(\alpha)\ge 0\ \ \forall i\in\Lambda^c\}.
\]

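Proof of 6.1.1: By the assumption on $x,v,T$ we have $R^{(i)}(t)=t$ for all $i\in\Lambda$ and $t\le nT$ while $Z_n(\cdot,x)\in U_\delta(t\mapsto x+tv)$. Thus we can exchange the network process for the local process:
\[
Z_n(\cdot,x)\in U_\delta(t\mapsto x+tv)\;\Leftrightarrow\;W^\Lambda_n(\cdot,x)\in U_\delta(t\mapsto x+tv)
\]
We uniformise in $x$:
\[
\begin{aligned}
W^\Lambda_n(t,x)-(x+tv)
&=Z_n(0,x)+A_n(t)+\sum_{i\in\Lambda}\mathbf{S}^{(i)}_n(t)+\frac{1}{n}\sum_{i\in\Lambda^c}\mathbf{S}^{(i)}\circ R^{(i)}(nt)-x-tv \\
&=\underbrace{\frac{\lfloor nx\rfloor}{n}-x}_{\in[-\frac{1}{n},0]}
+\underbrace{A_n(t)+\sum_{i\in\Lambda}\mathbf{S}^{(i)}_n(t)+\frac{1}{n}\sum_{i\in\Lambda^c}\mathbf{S}^{(i)}\circ R^{(i)}(nt)-tv}_{=W^\Lambda_n(t,0)-tv}
\end{aligned}
\qquad (6.2)
\]
As we have removed $x$ from (6.2) we remove it in the notation and write $W^\Lambda_n(t)$ instead of $W^\Lambda_n(t,0)$. The difference $\frac{\lfloor nx\rfloor}{n}-x$ forces us to change the $\delta$ of our neighbourhood to some $\delta'$ with $|\delta-\delta'|\le\frac{1}{n}$ but we choose to ignore this notational nuisance. We are now investigating the event
\[
W^\Lambda_n\in U_\delta(t\mapsto tv)
\]
and will apply the change of measure $M^\Lambda$ (cf 5.3.30) with parameter $\alpha\in\mathbb{R}^d$.
\[
\begin{aligned}
P(W^\Lambda_n\in U_\delta(t\mapsto tv))
&=E[\mathbf{1}_{W^\Lambda_n\in U_\delta(t\mapsto tv)}]
=E\Big[\mathbf{1}_{W^\Lambda_n\in U_\delta(t\mapsto tv)}\frac{M^\Lambda(\alpha,nT)}{M^\Lambda(\alpha,nT)}\Big] \\
&=E^{[\alpha]}\Big[\mathbf{1}_{W^\Lambda_n\in U_\delta(t\mapsto tv)}\frac{1}{M^\Lambda(\alpha,nT)}\Big]
\end{aligned}
\qquad (6.3)
\]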

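We bound
\[
\begin{aligned}
\frac{1}{M^\Lambda(\alpha,nT)}=\exp\Big\{&-\underbrace{\langle\alpha,W^\Lambda_{nT}\rangle}_{=\langle\alpha,W^\Lambda_{nT}-nTv\rangle+\langle\alpha,nTv\rangle}
+nT\Big(\sum_{i=1}^{d}\Gamma^{(i)}_A(\alpha_i)+\sum_{i\in\Lambda}\Gamma^{(i)}_S(-\alpha_i+\Xi^{(i)}(\alpha)) \\
&+\sum_{i\in\Lambda^c}\underbrace{\frac{R^{(i)}(nT)}{nT}}_{\le 1}\Gamma^{(i)}_S(-\alpha_i+\Xi^{(i)}(\alpha))\Big)\Big\}\frac{1}{r(\alpha,R(nT))}
\end{aligned}
\qquad (6.4)
\]
To get an upper bound we restrict $\alpha$ to
\[
B_\Lambda:=\{\alpha\in\mathbb{R}^d\mid -\alpha_i+\Xi^{(i)}(\alpha)\ge 0\ \ \forall i\in\Lambda^c\} \qquad (6.5)
\]
resulting in
\[
\frac{1}{M^\Lambda(\alpha,nT)}\le\exp\big\{-\langle\alpha,W^\Lambda_{nT}-nTv\rangle+nT\,(\Psi(\alpha)-\langle\alpha,v\rangle)\big\}\frac{1}{r(\alpha,R(nT))}.
\]
We go on with the bound
\[
\begin{aligned}
P(W^\Lambda_n\in U_\delta(t\mapsto tv))
&\overset{(6.4)}{\le}E^{[\alpha]}\Big[\mathbf{1}_{W^\Lambda_n\in U_\delta(t\mapsto tv)}\frac{\exp\{\overbrace{\langle\alpha,nTv-W^\Lambda_{nT}\rangle}^{\le\|\alpha\|\cdot\|nTv-W^\Lambda_{nT}\|<\|\alpha\|nT\delta}+nT\,(\Psi(\alpha)-\langle\alpha,v\rangle)\}}{r(\alpha,R(nT))}\Big] \qquad (6.6)\\
&\le\underbrace{E^{[\alpha]}\Big[\mathbf{1}_{W^\Lambda_n\in U_\delta(t\mapsto tv)}\frac{1}{r(\alpha,R(nT))}\Big]}_{\le\,1\cdot\sup\frac{1}{r}<\infty}\exp\{\|\alpha\|\,nT\delta+nT\,(\Psi(\alpha)-\langle\alpha,v\rangle)\}
\end{aligned}
\]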

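The expectation is finite uniformly in $n$ and $\delta$, since $\mathbf{1}\le 1$ and $\frac{1}{r(\alpha,R(nT))}$ is a product of bounded terms independent of $\delta$ and $n$, cf claim 3.6.11.

For fixed $\alpha\in B_\Lambda$ we have
\[
\lim_{\delta\to 0}\limsup_{n\to\infty}\frac{1}{n}\log P(W^\Lambda_n\in U_\delta(t\mapsto tv))\le T\,(\Psi(\alpha)-\langle\alpha,v\rangle).
\]
Optimising over $\alpha\in B_\Lambda$ we get the desired upper bound
\[
\inf_{\alpha\in B_\Lambda}T\,(\Psi(\alpha)-\langle\alpha,v\rangle)=-T\sup_{\alpha\in B_\Lambda}\langle\alpha,v\rangle-\Psi(\alpha).
\]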



6.1.1

The upper bound $-T\sup_{\alpha\in B_\Lambda}\langle\alpha,v\rangle-\Psi(\alpha)$ in 6.1.1 looks similar to a Fenchel-Legendre transform, the usual candidate for a rate function. We will sometimes refer to this upper bound as an almost Fenchel-Legendre transform.

Interpretation 6.1.2. The $\alpha\in B_\Lambda$ are twist parameters in the change of measure process $M^\Lambda(\alpha,\cdot)$, and by definition 4.4.17 the change of measure for the service process changes the rate of the counting process from $\mu_i$ to
\[
\mu_i(\alpha)=\Gamma^{(i)\prime}_S(-\alpha_i+\Xi^{(i)}(\alpha)).
\]
Thus we can interpret $B_\Lambda$ as the set of twist parameters that do not allow a decrease of service rates at the $\Lambda^c$-nodes.
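The effect of the restriction to $B_\Lambda$ can be seen in the simplest case. The sketch below assumes d = 1, no feedback (so that $\Xi\equiv 0$ and the constraint reads α ≤ 0), Poisson arrivals and exponential services, and the linear function $x=0$, $v=0$; all of these are illustrative assumptions, and the function names are hypothetical. It evaluates $\sup_{\alpha\le 0}-\Psi(\alpha)$ on a grid.

```python
import math

def psi(a, lam, mu):
    """Assumed lmgf of the free process for d = 1 in the Poisson case."""
    return lam * (math.exp(a) - 1.0) + mu * (math.exp(-a) - 1.0)

def rate_at_boundary(lam, mu, lo=20.0, n=400001):
    """Grid approximation of sup over alpha in [-lo, 0] of -psi(alpha).

    With x = 0 and v = 0 this is the almost Fenchel-Legendre transform
    sup_{alpha in B} <alpha, v> - Psi(alpha) for B = {alpha <= 0}.
    """
    return max(-psi(-lo * k / (n - 1), lam, mu) for k in range(n))

# Stable node (lam < mu): staying empty is typical behaviour.
print(rate_at_boundary(1.0, 2.0))
# Unstable node (lam > mu): staying empty is a rare event.
print(rate_at_boundary(2.0, 1.0))
```

For λ < µ the unconstrained minimiser of Ψ is positive and therefore excluded, so the supremum is attained at α = 0 and the rate is zero. For λ > µ the minimiser $\frac{1}{2}\log(\mu/\lambda)$ is admissible and the value is $(\sqrt{\lambda}-\sqrt{\mu})^2$, with service twisted to a higher rate, in line with the interpretation above.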

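Corollary 6.1.3. We can similarly bound the event (6.1) with $z\ne x$ if we let $z\to x$ before $\delta\to 0$:
\[
\lim_{\delta\to 0}\lim_{\epsilon\to 0}\limsup_{n\to\infty}\frac{1}{n}\log\sup_{z:|z-x|<\epsilon}P(Z_n(\cdot,z)\in U_\delta(t\mapsto x+tv))
\le -T\sup_{\alpha\in B_\Lambda}\langle\alpha,v\rangle-\Psi(\alpha)
\]
The situation $x\ne z$ affects the proof in (6.2) as we get $\frac{\lfloor nx\rfloor}{n}-z=\frac{\lfloor nx\rfloor}{n}-x+x-z$ instead of $\frac{\lfloor nx\rfloor}{n}-x$. It is again just a matter of changing $\delta$ to some $\delta'$. The order of limits, $\epsilon\to 0$ before $\delta\to 0$, is important.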

6.1.1 Leaving a boundary

In this section we investigate the event that a network process starting in $Z_n(0)=\frac{\lfloor nx\rfloor}{n}$ stays close to some affine function $t\mapsto x+vt$ and we allow $v_i>0$ for $i\notin\Lambda(x)$. That is: an initially empty node, $x_i=0$, increases over $[0,T]$ and becomes non-empty. Figure 6.2 is an example of this situation.

[Figure 6.2: d = 2, x_2 = 0, v_2 > 0. Shown: the paths t ↦ t v_2 and t ↦ x_1 + t v_1 with a δ-neighbourhood.]

Claim 6.1.4. Consider the generalised Jackson network with $d$ nodes and primitives 5.1.5. Let $\Gamma^{(i)}_A,\Gamma^{(i)}_S,\Xi^{(i)}$ be the lmgfs for the primitives and $\Psi$ the lmgf for the free process. Let assumption 6.0.5 hold for $x,v,T$. Then
\[
\lim_{\delta\to 0}\lim_{n\to\infty}\frac{1}{n}\log P(Z_n(\cdot,x)\in U_\delta(t\mapsto x+tv))\le -T\sup_{\alpha\in B_K}\langle\alpha,v\rangle-\Psi(\alpha)
\]
for
\[
K=\{i\mid x_i>0\ \text{or}\ v_i>0\},\qquad
B_K=\{\alpha\in\mathbb{R}^d\mid -\alpha_i+\Xi^{(i)}(\alpha)\ge 0\ \ \forall i\in K^c\}.
\]
The difference between claims 6.1.1 and 6.1.4 is in $v_{\Lambda^c}=0$ vs $v_{\Lambda^c}\ge 0$ and in the optimisation on the right hand sides over $B_\Lambda$ vs $B_K$. The difference in the result stems only from the additional assumption in 6.1.1.

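Lemma 6.1.5. For $i$ with $x_i=0$, $v_i>0$
\[
Z_n(\cdot,z)\in U_\delta(t\mapsto x+tv)\;\Rightarrow\;R^{(i)}_n(T)\ge T-\frac{\delta}{v_i}
\]
Proof of 6.1.5: The claim may be obvious from figure 6.2. Nevertheless, we apply the definition of the runtime and bound. In this proof we abbreviate $Z_n(\cdot,z)\in U_\delta(t\mapsto x+tv)$ as $Z_n\in U$.
\[
\begin{aligned}
\mathbf{1}_{Z_n\in U}\,R^{(i)}(nT)
&=\mathbf{1}_{Z_n\in U}\int_{t=0}^{nT}\mathbf{1}_{Z^{(i)}_t>0}\,dt \\
&\ge\mathbf{1}_{Z_n\in U}\int_{t=0}^{nT}\mathbf{1}_{nx_i+tv_i-n\delta>0}\,dt \\
&\overset{x_i=0}{=}\mathbf{1}_{Z_n\in U}\int_{t=0}^{nT}\mathbf{1}_{t>\frac{n\delta}{v_i}}\,dt
=\mathbf{1}_{Z_n\in U}\Big(nT-\frac{n\delta}{v_i}\Big)
\end{aligned}
\]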



6.1.5

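Proof of 6.1.4: In this proof we abbreviate $W_n(\cdot,0)\in U_\delta(t\mapsto tv)$ as $W^\Lambda_n\in U$. The proof goes unchanged up to (6.4), where we bound differently the summands for $i\in K\cap\Lambda^c$:
\[
\begin{aligned}
&\mathbf{1}_{W^\Lambda_n\in U_\delta(t\mapsto tv)}\frac{R^{(i)}(nT)}{nT}\Gamma^{(i)}_S(-\alpha_i+\Xi^{(i)}(\alpha)) \\
&\quad=\mathbf{1}_{W^\Lambda_n\in U_\delta(t\mapsto tv)}\mathbf{1}_{-\alpha_i+\Xi^{(i)}(\alpha)>0}\underbrace{\frac{R^{(i)}(nT)}{nT}}_{\le 1}\Gamma^{(i)}_S(-\alpha_i+\Xi^{(i)}(\alpha)) \\
&\qquad+\mathbf{1}_{-\alpha_i+\Xi^{(i)}(\alpha)\le 0}\mathbf{1}_{W^\Lambda_n\in U_\delta(t\mapsto tv)}\underbrace{\frac{R^{(i)}(nT)}{nT}}_{\ge 1-\frac{\delta}{Tv_i}}\underbrace{\Gamma^{(i)}_S(-\alpha_i+\Xi^{(i)}(\alpha))}_{\le 0} \\
&\quad\le\mathbf{1}_{W^\Lambda_n\in U_\delta(t\mapsto tv)}\Big[\mathbf{1}_{-\alpha_i+\Xi^{(i)}(\alpha)>0}\Gamma^{(i)}_S(-\alpha_i+\Xi^{(i)}(\alpha))
+\mathbf{1}_{-\alpha_i+\Xi^{(i)}(\alpha)\le 0}\Big(1-\frac{\delta}{Tv_i}\Big)\Gamma^{(i)}_S(-\alpha_i+\Xi^{(i)}(\alpha))\Big] \\
&\quad=\mathbf{1}_{W^\Lambda_n\in U_\delta(t\mapsto tv)}\Gamma^{(i)}_S(-\alpha_i+\Xi^{(i)}(\alpha))\Big(1-\frac{\delta}{Tv_i}\mathbf{1}_{-\alpha_i+\Xi^{(i)}(\alpha)\le 0}\Big)
\end{aligned}
\]
and then, analogously to (6.4),
\[
\begin{aligned}
\mathbf{1}_{W^\Lambda_n\in U_\delta(t\mapsto tv)}\frac{1}{M^\Lambda(\alpha,nT)}
\le\mathbf{1}_{W^\Lambda_n\in U_\delta(t\mapsto tv)}\exp\Big\{&-\langle\alpha,W^\Lambda_{nT}\rangle+nT\Big(\sum_{i=1}^{d}\Gamma^{(i)}_A(\alpha_i)+\sum_{i\in\Lambda}\Gamma^{(i)}_S(-\alpha_i+\Xi^{(i)}(\alpha)) \\
&+\sum_{i\in K\cap\Lambda^c}\underbrace{\frac{R^{(i)}(nT)}{nT}\Gamma^{(i)}_S(-\alpha_i+\Xi^{(i)}(\alpha))}_{\le\,\Gamma^{(i)}_S(-\alpha_i+\Xi^{(i)}(\alpha))\,(1-\frac{\delta}{Tv_i}\mathbf{1}_{-\alpha_i+\Xi^{(i)}(\alpha)\le 0})} \\
&+\sum_{i\in K^c}\underbrace{\frac{R^{(i)}(nT)}{nT}}_{\le 1}\Gamma^{(i)}_S(-\alpha_i+\Xi^{(i)}(\alpha))\Big)\Big\}\frac{1}{r(\alpha,R(nT))}
\end{aligned}
\]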

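The upper bound then becomes
\[
\begin{aligned}
P(W^\Lambda_n\in U_\delta(t\mapsto tv))\le E^{[\alpha]}\Big[&\mathbf{1}_{W^\Lambda_n\in U_\delta(t\mapsto tv)}\frac{1}{r(\alpha,R(nT))}
\exp\Big\{\|\alpha\|nT\delta-nT\langle\alpha,v\rangle+nT\Big(\sum_{i=1}^{d}\Gamma^{(i)}_A(\alpha_i) \\
&+\sum_{i\in\Lambda^c\cap K}\Gamma^{(i)}_S(-\alpha_i+\Xi^{(i)}(\alpha))\Big(1-\frac{\delta}{Tv_i}\mathbf{1}_{-\alpha_i+\Xi^{(i)}(\alpha)\le 0}\Big) \\
&+\sum_{i\in\Lambda}\Gamma^{(i)}_S(-\alpha_i+\Xi^{(i)}(\alpha))+\sum_{i\in K^c}\Gamma^{(i)}_S(-\alpha_i+\Xi^{(i)}(\alpha))\Big)\Big\}\Big]
\end{aligned}
\qquad (6.7)
\]
where we need the restriction $-\alpha_i+\Xi^{(i)}(\alpha)\ge 0$ only for $i\in K^c$, for bounding the relative runtime in (6.7) by 1. These restrictions define $B_K$ analogously to (6.5). We continue
\[
\begin{aligned}
P(W^\Lambda_n\in U_\delta(t\mapsto tv))\le{}&E^{[\alpha]}\Big[\underbrace{\mathbf{1}_{W^\Lambda_n\in U_\delta(t\mapsto tv)}}_{\le 1}\underbrace{\frac{1}{r(\alpha,R(nT))}}_{\text{bounded by 3.6.11}}\Big] \\
&\times\exp\Big\{\|\alpha\|nT\delta-nT\langle\alpha,v\rangle+nT\,\Psi(\alpha)
-nT\sum_{i\in\Lambda^c\cap K}\Gamma^{(i)}_S(-\alpha_i+\Xi^{(i)}(\alpha))\underbrace{\frac{\delta}{Tv_i}}_{\to 0\text{ as }\delta\to 0}\mathbf{1}_{-\alpha_i+\Xi^{(i)}(\alpha)\le 0}\Big\}
\end{aligned}
\]
and under the scaling limit
\[
\lim_{\delta\to 0}\lim_{n\to\infty}\frac{1}{n}\log P(W^\Lambda_n\in U_\delta(t\mapsto tv))\le -T\langle\alpha,v\rangle+T\,\Psi(\alpha).
\]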



6.1.4

6.2 Existence and uniqueness of an optimiser

We investigate the optimisation problem found in claim 6.1.4
\[
\sup_{\alpha\in B_K}\langle\alpha,v\rangle-\Psi(\alpha) \qquad (6.8)
\]
with $K\supseteq\{i\mid v_i>0\}$,
\[
B_K=\{\alpha\in\mathbb{R}^d\mid -\alpha_i+\Xi^{(i)}(\alpha)\ge 0\ \ \forall i\in K^c\},
\]
and we will argue for the existence of a unique optimiser in the following. Uniqueness is not really an issue since we have seen that $\Psi$ is strictly convex (if all inter event times are non-deterministic or if at least there is non-deterministic flow reaching each node).

We start with a simple condition for existence of an optimiser and then develop a second one, more elaborate and less restrictive.

Claim 6.2.1. If the Fenchel-Legendre transform $\Psi^*$ is finite on all of $\mathbb{R}^d$ then an optimising $\alpha$ in (6.8) exists.

Proof of 6.2.1: We prove that for any $v,K$ such that $v_{K^c}=0$ the level sets $\{\alpha\in B_K\mid\Psi(\alpha)-\langle\alpha,v\rangle\le c\}$ are compact. Similar to [12] we construct a finite norm-ball including the level set.

Let $\|\cdot\|$ denote some norm in $\mathbb{R}^d$ and define the norm $|\cdot|_1$ as
\[
|\alpha|_1=\sup_{\|v'\|\le 1}\langle\alpha,v'\rangle.
\]
We give a finite bound for $|\alpha|_1$ uniform in $\alpha$ of the level set.
\[
\sup_{\substack{\alpha\in B_K:\\ \Psi(\alpha)-\langle v,\alpha\rangle\le c}}|\alpha|_1
=\sup_{\substack{\alpha\in B_K:\\ \Psi(\alpha)-\langle v,\alpha\rangle\le c}}\ \sup_{v':\|v'\|\le 1}\langle\alpha,v'\rangle
\]
For $\alpha$ from the level set $\{\alpha\in B_K\mid\Psi(\alpha)-\langle\alpha,v\rangle\le c\}$
\[
\begin{aligned}
\langle\alpha,v'\rangle&=\langle\alpha,v'+v\rangle-\langle\alpha,v\rangle \\
&=\langle\alpha,v'+v\rangle-\Psi(\alpha)+\underbrace{\Psi(\alpha)-\langle\alpha,v\rangle}_{\le c}
\end{aligned}
\]
and thus
\[
\begin{aligned}
|\alpha|_1&\le c+\sup_{v':\|v'\|\le 1}\langle\alpha,v'+v\rangle-\Psi(\alpha) \\
&\le c+\sup_{v':\|v'\|\le 1}\ \sup_{\alpha\in\mathbb{R}^d}\langle\alpha,v'+v\rangle-\Psi(\alpha) \\
&\le c+\sup_{v':\|v'\|\le 1}\Psi^*(v'+v)
\end{aligned}
\qquad (6.9)
\]
which is finite if $\Psi^*$ is finite on $\mathbb{R}^d$ and the supremum is over a convex set. The bound is uniform in $\alpha$, so the level set is bounded. Closedness is immediate and only needs continuity of $\Psi$. From compactness of level sets and finiteness and continuity of the objective $\alpha\mapsto\Psi(\alpha)-\langle\alpha,v\rangle$ follows existence of an infimiser.

6.2.1

We formulate our second criterion below in 6.2.7 and we believe that its if-part holds if the rate functions $\Gamma^{(i)*}_D$ for all $i=1,\dots,d$ and $D\in\{A,S\}$ are open, cf 2.8. Technically we need to replace $\Psi^*$ in the bound (6.9) in the case that (6.8) is finite but there is no neighbourhood of $v$ on which $\Psi^*$ is finite. We will define the replacement $G_K$ in 6.2.4.

We start with basic convex analysis and then give an upper bound for (6.8). We will also argue for finiteness of this upper bound.

Consider the generalised Jackson network with primitives $A^{(i)}$, $S^{(i)}$, $n\mapsto\sum_{k=1}^{n}r^{(i)}_k$ for $i=1,\dots,d$ (cf 5.1.5) where the arrival and service processes have lmgfs $\Gamma^{(i)}_A,\Gamma^{(i)}_S$ and $r^{(i)}_k$ has lmgf $\Xi^{(i)}$, cf 4.4.8. The $\Gamma^{(i)}_D$ are strictly convex as soon as they are not deterministic, the $\Xi^{(i)}$ are strictly convex if the routing measures they are built from are not point measures.

Let $\gamma\in\mathbb{R}^d_{\ge 0}$ and $w\in\mathbb{R}^d$ be fixed for the moment and denote by $\pi_j$ the projection from $\mathbb{R}^d$ onto $\mathrm{span}\{e_j\}$, that is $\pi_j(\alpha)=\alpha_je_j$. Then $\alpha\mapsto\sum_{i=1}^{d}\Gamma^{(i)}_A\circ\pi_i(\alpha)+\gamma_i\Xi^{(i)}(\alpha)$ is a convex function and we investigate its Fenchel-Legendre transform.

Definition 6.2.2. For $\gamma\in\mathbb{R}^d_{\ge 0}$ define
\[
g_\gamma:\mathbb{R}^d\to\mathbb{R}\cup\{\infty\},\qquad\theta\mapsto\sum_{i=1}^{d}\Gamma^{(i)}_A\circ\pi_i(\theta)+\gamma_i\Xi^{(i)}(\theta).
\]

Claim 6.2.3. $g_\gamma$ is convex and
\[
g^*_\gamma(w)=\inf_{\substack{a\in\mathbb{R}^d,\ r\in\mathbb{R}^{d\times d}\\ a+r^\top\gamma=w}}\ \sum_{i=1}^{d}\Gamma^{(i)*}_A(a_i)+\gamma_i\Xi^{(i)*}(r^{(i)})
\]
We interpret $g^*_\gamma(w)$ as the joint decay rate for the probability that at node $i$ an empirical arrival rate $a_i$ can be observed instead of the expected $\lambda_i$ and that routing happens at empirical rates $r^{(i)}$ instead of $p^{(i)}$ over time $\gamma_i$. Additionally there is the condition that $a_i$, $r^{(i)}$ have to be such as to produce total flow into each node $i$ of rate $w_i=a_i+(r^\top\gamma)_i$ in the associated fluid network. If there are no $a,r$ in the domains of $\Gamma^{(i)*}_A$, $\Xi^{(i)*}$ such that a flow of $a$ into the fluid network and a splitting of flow at each node $i$ wrt $r^{(i)}$ would produce input flow $w$ into the nodes, then $g^*_\gamma(w)=\infty$.

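Proof of 6.2.3: We start with some convex analysis (cf (7.2) of the appendix).
\[
\Big(\sum_{i=1}^{d}\Gamma^{(i)}_A\circ\pi_i+\gamma_i\Xi^{(i)}\Big)^*(w)
=\inf_{\substack{c_j,d_j\in\mathbb{R}^d\\ \sum_jc_j+d_j=w}}\ \sum_{j=1}^{d}(\Gamma^{(j)}_A\circ\pi_j)^*(c_j)+(\gamma_j\Xi^{(j)})^*(d_j)
\]
Due to 7.3.1 we do not increase the infimum when restricting $c_j$ to $\pi_j(\mathbb{R}^d)$. In the following we optimise over $a\in\mathbb{R}^d$ with $a_j=\langle c_j,e_j\rangle$. This is admissible since $(c_1,\dots,c_d)\mapsto a$ is a bijection for those $c_j$ for which $(\Gamma^{(j)}_A\circ\pi_j)^*(c_j)<\infty$. We also apply (7.1)
\[
g^*_\gamma(w)=\inf_{\substack{a,d_j\in\mathbb{R}^d\\ a+\sum_jd_j=w}}\ \sum_{j=1}^{d}\Gamma^{(j)*}_A(a_j)+\gamma_j\Xi^{(j)*}\Big(\frac{1}{\gamma_j}d_j\Big) \qquad (6.10)
\]
Changing variables from $\frac{1}{\gamma_j}d_j$ to $r^{(j)}$ in the argument of $\Xi^{(j)*}$ we need to change the restriction, too.
\[
r^{(j)}=\frac{1}{\gamma_j}d_j\;\Rightarrow\;\sum_{j=1}^{d}d_j=\sum_{j=1}^{d}\gamma_jr^{(j)}=r^\top\gamma
\]
This completes the proof.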

6.2.3

From the new representation of $g^*_\gamma$ in 6.2.3 we see that $g^*_\gamma$ is finite if there are $(a,r)\in\mathbb{R}^d\times\mathbb{R}^{d\times d}$ with $a_i\ge 0$, $\lambda_i=0\Rightarrow a_i=0$, and $r^{(i)}$ equivalent to $p^{(i)}$. It is also possible that there is $r^{(i)}$ not equivalent to $p^{(i)}$ but with $\mathrm{supp}(r^{(i)})\subsetneq\mathrm{supp}(p^{(i)})$. In that case there would be no finite optimiser in $\Xi^{(i)*}(r^{(i)})$ (cf 4.4.22).

Note that the Fenchel-Legendre transform in claim 6.2.3 satisfies

• $g^*_\gamma\ge 0$ from $\Gamma^{(i)*}_A,\Xi^{(i)*},\gamma_i\ge 0$ and thus $g^*_\gamma>-\infty$ always,

• $g^*_\gamma(\lambda+P^\top\mu)<\infty$ since $(\lambda,P)\in\mathcal{X}(\lambda+P^\top\mu,\mu)$. This is also a minimiser of the Fenchel-Legendre transform: $g^*_\gamma(\lambda+P^\top\mu)=0$.

Thus $g^*_\gamma$ is a proper convex function (cf [19] p. 24, definition of “proper”).

As a next preparatory step we define:

Definition 6.2.4. Let $K\subseteq\{1,\dots,d\}$.
\[
G_K:\mathbb{R}^d\to\mathbb{R}\cup\{\infty\},\qquad
v\mapsto\inf_{\substack{a,r,\gamma:\\ a+(r^\top-id)\gamma=v}}\ \sum_{i=1}^{d}\Gamma^{(i)*}_A(a_i)+\gamma_i\Xi^{(i)*}(r^{(i)})
+\sum_{j\in K}\Gamma^{(j)*}_S(\gamma_j)+\sum_{j\in K^c}\mathbf{1}_{\gamma_j>\mu^{(j)}}\Gamma^{(j)*}_S(\gamma_j)
\]
$G_K(v)<\infty$ if there is one set of $\{a,r,\gamma\}$ such that the respective rate functions are finite and the fluid network with

• non-empty nodes $K$

• arrival rate $a_i$ at node $i$

• service rate $\gamma_i$ at each node $i\in K$

• service rate $\max\{\gamma_i,\mu_i\}$ at each node $i\in K^c$

• routing matrix $r$

has network drift $v$ (cf definition 5.2.16). The difference $\mu_i-\gamma_i$ would be the loss rate, usually denoted $y_i$, at the initially empty subnetwork $K^c$.

Claim 6.2.5. If the set of restrictions $\{a,r,\gamma: a+(r^{\top}-\mathrm{id})\gamma=v\}$ has a non-empty intersection with the domains of the respective individual rate functions $\Gamma_A^{(i)*},\Gamma_S^{(i)*},\Xi^{(i)*}$ then $G_K(v)<\infty$ and the infimum is a minimum (optimiser exists).

Proof of 6.2.5: We argue with compactness of level sets.

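\[
\sum_{j=1}^{d}\Big[\Gamma_A^{(j)*}(a_j)+\gamma_j\,\Xi^{(j)*}(r^{(j)})\Big]+\sum_{i\in\Lambda^c}\mathbb{1}_{\gamma_i>\mu^{(i)}}\,\Gamma_S^{(i)*}(\gamma_i)+\sum_{i\in\Lambda}\Gamma_S^{(i)*}(\gamma_i)\le M
\]
\[
\Rightarrow\ \begin{cases}
\Gamma_A^{(j)*}(a_j)\le M, & j=1,\dots,d\\[2pt]
\Gamma_S^{(i)*}(\gamma_i)\le M, & i\in\Lambda\\[2pt]
\gamma_i\in[0,\mu^{(i)}]\ \text{ or }\ \Gamma_S^{(i)*}(\gamma_i)\le M, & i\in\Lambda^c
\end{cases}
\]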

From goodness of the $\Gamma^{*}$-rate functions the $a,\gamma$ are in compact sets of $\mathbb{R}^d$, forming a bounded set themselves. Also the $r^{(i)}$ are sub-probability measures and their max-norm is $\le 1$ so each is in a bounded set of $\mathbb{R}^d$. From continuity of all involved rate functions the level set is closed. Thus all parameters are in compact sets forming a compact set in product space. Therefore $G_K$ that was defined as an infimum is actually a minimum whenever it is finite.



6.2.5

Note that $G_K$ can be written as a composition of $g_\gamma^{*}$ that is constant in $K$ and the rate functions for the service processes.

\[
G_K(v)=\inf_{\gamma}\ g_\gamma^{*}(v+\gamma)+\sum_{i\in K^c}\mathbb{1}_{\gamma_i>\mu^{(i)}}\,\Gamma_S^{(i)*}(\gamma_i)+\sum_{i\in K}\Gamma_S^{(i)*}(\gamma_i).
\]

Further note that $G_{\{1,\dots,d\}}=\Psi^{*}$ so $G_{\{1,\dots,d\}}$ is a tight upper bound for the Fenchel-Legendre transform $\Psi^{*}$ and this generalises as:

Claim 6.2.6. $G_K$ bounds the almost Fenchel-Legendre transform (6.8).

Proof of 6.2.6: We start with some transformations that will allow us to

apply 6.2.3.

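\[
\begin{aligned}
\sup_{\alpha\in\mathcal{B}_K}\ &\langle\alpha,v\rangle-\Psi(\alpha)\\
&=\sup_{\alpha\in\mathcal{B}_K}\ \langle\alpha,v\rangle-\sum_{i=1}^{d}\Big[\Gamma_A^{(i)}(\alpha_i)+\Gamma_S^{(i)}\big(-\alpha_i+\Xi^{(i)}(\alpha)\big)\Big]
\end{aligned}
\]

we add a zero (introducing the non-negative parameters $\gamma_1,\dots,\gamma_d$) and rearrange.

\[
\begin{aligned}
&=\sup_{\alpha\in\mathcal{B}_K}\ \langle\alpha,v\rangle+\sum_{i=1}^{d}\Big[-\gamma_i\big(-\alpha_i+\Xi^{(i)}(\alpha)\big)-\Gamma_A^{(i)}(\alpha_i)
+\gamma_i\big(-\alpha_i+\Xi^{(i)}(\alpha)\big)-\Gamma_S^{(i)}\big(-\alpha_i+\Xi^{(i)}(\alpha)\big)\Big]\\
&=\sup_{\alpha\in\mathcal{B}_K}\ \langle\alpha,v+\gamma\rangle-\sum_{i=1}^{d}\Big[\gamma_i\,\Xi^{(i)}(\alpha)+\Gamma_A^{(i)}(\alpha_i)\Big]
+\sum_{i=1}^{d}\Big[\gamma_i\big(-\alpha_i+\Xi^{(i)}(\alpha)\big)-\Gamma_S^{(i)}\big(-\alpha_i+\Xi^{(i)}(\alpha)\big)\Big]
\end{aligned}
\]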

and take suprema separately. We lose the restriction in the first supremum. The expression may increase:

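\[
\begin{aligned}
&\le\sup_{\alpha\in\mathbb{R}^d}\Big(\langle\alpha,v+\gamma\rangle-\sum_{i=1}^{d}\Big[\gamma_i\,\Xi^{(i)}(\alpha)+\Gamma_A^{(i)}(\alpha_i)\Big]\Big)
+\sum_{i=1}^{d}\ \sup_{\alpha\in\mathcal{B}_K}\Big[\gamma_i\big(-\alpha_i+\Xi^{(i)}(\alpha)\big)-\Gamma_S^{(i)}\big(-\alpha_i+\Xi^{(i)}(\alpha)\big)\Big]\\
&\le\Big(\sum_{i=1}^{d}\Gamma_A^{(i)}\circ\pi_i+\gamma_i\,\Xi^{(i)}\Big)^{*}(v+\gamma)
+\sum_{i\in K^c}\mathbb{1}_{\gamma_i>\mu^{(i)}}\,\Gamma_S^{(i)*}(\gamma_i)+\sum_{i\in K}\Gamma_S^{(i)*}(\gamma_i)\qquad (6.11)
\end{aligned}
\]

Optimise over $\gamma$ and the claim follows: we got

\[
\sup_{\alpha\in\mathcal{B}_K}\ \langle\alpha,v\rangle-\Psi(\alpha)\ \le\ \inf_{\gamma\in\mathbb{R}^d}(6.11)=G_K(v)
\]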



6.2.6

In the rest of the subsection we apply GKto prove existence of an optimiser

in the almost Fenchel-Legendre transform (6.8).

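Claim 6.2.7. If $v\in\mathcal{D}^{\circ}(G_K)$ then for any $v,K$ such that $v_{K^c}=0$ the level sets $\{\alpha\in\mathcal{B}_K\mid\Psi(\alpha)-\langle\alpha,v\rangle\le c\}$ are compact and an optimiser in (6.8) exists.

Proof of 6.2.7: Let $a>0$ be small enough for $G_K$ to be finite in an $\|\cdot\|$-ball of radius $a$ around $v$ and let $|\alpha|_a=\sup_{v':\|v'\|\le a}\langle\alpha,v'\rangle$. Then as in the proof of 6.2.1 we obtain

\[
\begin{aligned}
|\alpha|_a&\le c+\sup_{v':\|v'\|\le a}\ \langle\alpha,v'+v\rangle-\Psi(\alpha)\\
&\le c+\sup_{v':\|v'\|\le a}\ \sup_{\alpha\in\mathcal{B}_\Lambda}\ \langle\alpha,v'+v\rangle-\Psi(\alpha)\\
&\le c+\sup_{v':\|v'\|\le a}\ G_\Lambda(v'+v)
\end{aligned}
\]

which is a finite bound by choice of $a$ and uniform in $\alpha$: The level set is bounded. Closedness is immediate from continuity of $\Psi$.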

6.2.7


6.3 Network drift under the changed measure

From the local large deviations upper bound in 6.1.1 and more general 6.1.4 we got the candidate for the local rate function $L(x,v)$ as the following optimisation problem

\[
\sup_{\alpha\in\mathcal{B}_K}\ \langle\alpha,v\rangle-\Psi(\alpha).\qquad (6.12)
\]

In section 6.2 we have given conditions under which an optimiser $\alpha$ exists. We will now investigate the behaviour of the network after the change of measure $M^{\emptyset}(\alpha,\cdot)$.

For any $\alpha$ the change of measure can be translated back into the individual changes of measure for each network primitive, cf 5.3.31. Thus once we have identified an $\alpha$ we are not restricted to work with the local process $W^{\Lambda}$ we started with, we just switch from $P$ to $P^{[\alpha]}$ and work with the network primitives and the free, the network, and the local process as before.

Claim 6.3.1. Under assumption 6.0.5 for $x,v,T$ and under $P^{[\alpha]}$ for $\alpha$ the optimiser in 6.1.4 the fluid limit of $Z_n(\cdot,z_0)$ is $t\mapsto z_0+tv$ and

\[
\lim_{n\to\infty}P^{[\alpha]}\big(Z_n\in\mathcal{U}_\delta(t\mapsto z_0+tv)\big)=1.
\]

The proof of 6.3.1 requires elements of optimisation theory we state before we begin the proof. For this section we rephrase the almost Fenchel-Legendre transform (6.12) as an inequality constrained minimisation problem (ICM).

\[
-\langle\alpha,v\rangle+\Psi(\alpha)\to\min\quad\text{subj. to}\quad\alpha_i-\Xi^{(i)}(\alpha)\le 0\ \ \forall i\in K^c\qquad\text{(ICM)}
\]

For future reference set $g_i=\pi_i-\Xi^{(i)}$.

If for the ICM the Slater condition holds the optimiser satisfies the Karush-Kuhn-Tucker (KKT) condition. The KKT condition will help us to identify $v$ as the network drift under $P^{[\alpha]}$.

Claim 6.3.2 (Slater condition). There is $\alpha\in\mathbb{R}^d$ such that $g_i(\alpha)<0$ for each $i\in\Lambda^c$.

Proof of 6.3.2: We assume that $\Lambda^c\ne\emptyset$. We step by step fix the value of each $\alpha_i$ such that the condition $g_i(\alpha)<0$ always (and finally) holds for all $i\in\Lambda^c$.

Fix values for $\alpha_\Lambda$ and let $B:=\Lambda$ be the set of indices with $\alpha_i$ already fixed. The proof stops as $B=\{1,\dots,d\}$ and $g_i(\alpha)<0$ has been checked for all $i\in\Lambda^c$.

We define a partition of the $\Lambda^c$-nodes relative to the length of the shortest path from a node $i\in\Lambda^c$ to $\Lambda\cup\{0\}$. From assumption 5.1.9 there is a finite length path from $i$ to $\{0\}$ making the shortest path from $i$ to $\Lambda\cup\{0\}$ have finite length. For fixed $\Lambda$ set

\[
A_0=\Lambda\cup\{0\}
\]

and for $k=1,2,\dots$ and while $A_{k-1}$ is not empty set

\[
A_k=\{i\in\{1,\dots,d\}\setminus(A_0\cup\dots\cup A_{k-1})\mid\exists j\in A_{k-1}:p_{ij}>0\}
\]

Then $A_k$ is the subset of $\Lambda^c$ nodes with the shortest path to $\Lambda\cup\{0\}$ consisting of $k$ edges. Note that there are at most $d+1$ such sets.
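The layering $A_1,A_2,\dots$ is a breadth-first search towards $\Lambda\cup\{0\}$. The following is a minimal sketch of it, not part of the original argument. The function name `shortest_path_layers`, the 0-based node indices and the marker `EXIT` for the outside node $0$ are assumptions made for illustration; the exit probability of node $i$ is taken to be $p_{i0}=1-\sum_j p_{ij}$.

```python
def shortest_path_layers(P, Lam, tol=1e-12):
    """Return [A_1, A_2, ...] for a routing matrix P (list of rows) and index set Lam.

    Nodes are 0-based here; the outside of the network (node 0 in the text)
    is represented by the hypothetical marker EXIT.
    """
    EXIT = "exit"
    d = len(P)
    exit_prob = [1.0 - sum(row) for row in P]   # p_{i0}
    previous = set(Lam) | {EXIT}                # A_0 = Lambda united with {0}
    assigned = set(Lam)
    layers = []
    while True:
        current = set()
        for i in range(d):
            if i in assigned:
                continue
            to_exit = EXIT in previous and exit_prob[i] > tol
            to_node = any(P[i][j] > tol for j in previous if j != EXIT)
            if to_exit or to_node:
                current.add(i)
        if not current:                         # A_k empty: stop
            break
        layers.append(sorted(current))
        assigned |= current
        previous = current
    return layers


# A three-node tandem line 0 -> 1 -> 2 -> exit with Lambda empty.
chain = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
print(shortest_path_layers(chain, set()))
```

For the tandem line only the last node can leave directly, so the layers are peeled off from the exit backwards.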

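Set $\alpha_i$ for $i\in\Lambda$ to an arbitrary value in $\mathbb{R}$. In the following we fix the values of $\alpha_i$, $i\in A_1$: We omit the $\alpha$'s not to be fixed now and get an inequality.

\[
\Xi^{(i)}(\alpha)\ge\log\Big(\sum_{j\in A_1}p_{ij}e^{\alpha_j}+\underbrace{\sum_{j\in B}p_{ij}e^{\alpha_j}+p_{i0}}_{=:p'_{i0}}\Big)
\]
\[
\Rightarrow\ g_i(\alpha)=\alpha_i-\Xi^{(i)}(\alpha)\le\alpha_i-\log\Big(\sum_{j\in A_1}p_{ij}e^{\alpha_j}+p'_{i0}\Big)
\]

We now have an upper bound for $g_i(\alpha)$ and we choose $\alpha_{A_1}$ to make the upper bound negative.

\[
\begin{aligned}
0>\alpha_i-\log\Big(\sum_{j\in A_1}p_{ij}e^{\alpha_j}+p'_{i0}\Big)
&\Leftrightarrow e^{\alpha_i}<\sum_{j\in A_1}p_{ij}e^{\alpha_j}+p'_{i0}\\
&\Leftrightarrow\hat\alpha_i<\sum_{j\in A_1}p_{ij}\hat\alpha_j+p'_{i0}\qquad (6.13)
\end{aligned}
\]

where we set $\hat\alpha_i:=e^{\alpha_i}$ and will have to observe the condition $\hat\alpha_i>0$ to get an $\alpha_i\in\mathbb{R}$. From construction of $A_1$ we have $p'_{i0}>0$, either due to $p_{i0}>0$ or from $p_{ij}>0$ for some $j\in B=\Lambda$ and the $\alpha_\Lambda$ fixed as some real numbers, so the rhs of (6.13) is positive. In the linear notation $p'_{i0}=\sum_{j\in B}p_{ij}\hat\alpha_j+p_{i0}$.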

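We get a system of $|\Lambda^c|$ linear inequalities.

\[
\begin{aligned}
(6.13)\ \forall i\in\Lambda^c&\Leftrightarrow\hat\alpha_{\Lambda^c}<P_{\Lambda^cA_1}\hat\alpha_{A_1}+P_{\Lambda^cB}\hat\alpha_B+P_{\Lambda^c\{0\}}\\
&\Rightarrow\hat\alpha_{A_1}<P_{A_1}\hat\alpha_{A_1}+P_{A_1B}\hat\alpha_B+P_{A_1\{0\}}\\
&\Leftrightarrow(\mathrm{id}-P_{A_1})\hat\alpha_{A_1}<P_{A_1B}\hat\alpha_B+P_{A_1\{0\}}
\end{aligned}
\]

The inverse of $\mathrm{id}-P_{A_1}$ exists and is strictly positive. Thus we can multiply with $(\mathrm{id}-P_{A_1})^{-1}$ and keep the coordinate wise inequality.

\[
\hat\alpha_{A_1}<(\mathrm{id}-P_{A_1})^{-1}\big(P_{A_1B}\hat\alpha_B+P_{A_1\{0\}}\big)
\]

which leaves a non-degenerate positive interval for each $\hat\alpha_i$, $i\in A_1$. We can fix $\alpha_{A_1}$ and thus update $B:=B\cup A_1\,(=\Lambda\cup A_1)$. If $|B|=d$ we are done. Else:

We can iterate this. For $k\ge 2$ the $k$-th iteration is to be done only if $|B|=|\Lambda\cup A_1\cup\dots\cup A_{k-1}|<d$ which is equivalent to $A_k\ne\emptyset$. Up to the $(k-1)$-st iteration $\alpha_i$ are known for all $i\in B$. As before

\[
\Xi^{(i)}(\alpha)\ge\log\Big(\sum_{j\in A_k}p_{ij}e^{\alpha_j}+\underbrace{\sum_{j\in B}p_{ij}e^{\alpha_j}+p_{i0}}_{=:p'_{i0}}\Big)
\ \Leftrightarrow\ \alpha_i-\Xi^{(i)}(\alpha)\le\alpha_i-\log\Big(\sum_{j\in A_k}p_{ij}e^{\alpha_j}+p'_{i0}\Big)
\]

and

\[
g_i(\alpha)<0\ \Leftarrow\ \hat\alpha_i<\sum_{j\in A_k}p_{ij}\hat\alpha_j+p'_{i0}
\]

with strictly positive $p'_{i0}$ (from $B\cap\mathrm{supp}(p^{(i)})\ne\emptyset$ by construction of $A_k$). Putting all $i\in A_k$ in one inequality

\[
\hat\alpha_{A_k}<(\mathrm{id}-P_{A_k})^{-1}P_{A_kB}\hat\alpha_B
\]

Positivity of the $p'_{i0}$ grants solvability of the inequality and we can fix real coordinates for $\alpha_{A_k}$ and update $B:=B\cup A_k$. If necessary iterate again.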



6.3.2

Claim 6.3.3 (KKT). In $\alpha$ the Karush-Kuhn-Tucker condition holds:

• $\nabla\big(-\langle\alpha,v\rangle+\Psi(\alpha)\big)+\sum_{i\in K^c}\eta_i\nabla g_i(\alpha)=0$

• $\eta\in\mathbb{R}^{|K^c|}_{\ge 0}$

• $\langle\eta,g(\alpha)\rangle=0$

with a unique $\eta=\eta(\alpha)$.

Proof of 6.3.3: The KKT condition holds since we proved that the Slater condition holds; existence of the optimiser $\alpha$ was already proved. For uniqueness of $\eta$ it remains to be shown that the gradients $\{\nabla g_i(\alpha)\mid i\in K^c\}$ are linearly independent.

From our general assumption 5.1.9 one is not an eigenvalue of $P$ and $\mathrm{id}-P^{\top}$ is a regular matrix. Then also the sub-matrix $\mathrm{id}_{K^c}-P^{\top}_{K^c}$ is regular. Since rows of $P_{K^c}$ and $P_{K^c}(\alpha)$ are both sub-probability measures (cf 4.4.16, 4.4.23) also $\mathrm{id}_{K^c}-P^{\top}_{K^c}(\alpha)$ is a regular matrix. And from regularity of $\mathrm{id}-P^{\top}_{K^c}(\alpha)$ follows linear independence of their columns $e_i-p_{K^c}(\alpha)^{(i)}\in\mathbb{R}^{|K^c|}$:

\[
\{e_i-p^{(i)}_{K^c}(\alpha)\mid i\in K^c\}\subseteq\mathbb{R}^{|K^c|}
\]

and thus of the longer columns

\[
\{\nabla g_i(\alpha)\mid i\in K^c\}=\{e_i-p^{(i)}(\alpha)\mid i\in K^c\}\subseteq\mathbb{R}^{d}.
\]



6.3.3

We can now prove the statement about the fluid limit under the change

of measure.

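Proof of 6.3.1: Restate the first bullet from KKT:

\[
\sum_{i\in K^c}\eta_i\nabla g_i(\alpha)=\sum_{i\in K^c}\big(e_i-\nabla\Xi^{(i)}(\alpha)\big)\eta_i
\]

each $\nabla\Xi^{(i)}$ is an exponential twist of the row $p^{(i)}$ of $P$ (cf 4.4.16) and we have under the change of measure with $\alpha$ (from 5.3.16)

\[
P^{\top}(\alpha)=[\,\nabla\Xi^{(1)}(\alpha),\dots,\nabla\Xi^{(d)}(\alpha)\,]
\]
\[
\sum_{i\in K^c}\eta_i\nabla g_i(\alpha)=\big(\mathrm{id}_{\{1,\dots,d\},K^c}-P^{\top}(\alpha)_{\{1,\dots,d\}K^c}\big)\eta=\big(\mathrm{id}-P^{\top}(\alpha)\big)\binom{0}{\eta}
\]

and the KKT first bullet becomes

\[
\begin{aligned}
v&=\nabla\Psi(\alpha)+\sum_{i\in K^c}\eta_i\big(e_i-\nabla\Xi^{(i)}(\alpha)\big)\\
&=\lambda(\alpha)+\big(P^{\top}(\alpha)-\mathrm{id}\big)\mu(\alpha)+\big(\mathrm{id}-P^{\top}(\alpha)\big)\binom{0}{\eta}\\
&=\lambda(\alpha)+\big(P^{\top}(\alpha)-\mathrm{id}\big)\Big(\mu(\alpha)-\binom{0}{\eta}\Big)
\end{aligned}
\]

and on the right hand side we have the network drift defined in 5.2.16 for the network process starting in some $z_0$ with $z_{0,i}>0$ for $i\in K$ and $z_{0,i}=0$ for $i\in K^c$. The $\binom{0}{\eta}$ is the loss rate. And by 5.3.23 the network drift is the expected, normal behaviour of the network process, its fluid limit.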
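The drift formula $\lambda+(P^{\top}-\mathrm{id})(\mu-\binom{0}{\eta})$ is easy to evaluate numerically. The sketch below is only an illustration: the function name `network_drift` and the two-node tandem network are assumptions, and `eta` is passed as a full-length vector with zeros on the non-empty nodes.

```python
def network_drift(lam, mu, P, eta):
    """Evaluate lam + (P^T - id)(mu - eta) for lists lam, mu, eta and row-list P.

    eta is the loss-rate vector, with 0 in the coordinates of non-empty nodes.
    """
    d = len(lam)
    eff = [mu[i] - eta[i] for i in range(d)]                  # effective departure rates
    inflow = [sum(P[i][j] * eff[i] for i in range(d)) for j in range(d)]
    return [lam[j] + inflow[j] - eff[j] for j in range(d)]


# Hypothetical tandem network: arrivals at node 0, node 0 feeds node 1.
P = [[0, 1], [0, 0]]
print(network_drift([1, 0], [2, 3], P, [0, 0]))   # both nodes non-empty
print(network_drift([1, 0], [2, 3], P, [0, 1]))   # node 1 empty, loss rate 1
```

In the second call the loss rate at the empty node exactly cancels its surplus service capacity, so its drift coordinate is zero, which is the role $\eta$ plays in the proof.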

6.3.1

Interpretation 6.3.4. We have interpreted $\mathcal{B}_K$ in 6.1.2 as twist parameters that do not decrease service rates at $K^c$-nodes. From $g_i(\alpha)<0\Rightarrow\eta_i=0$ (complementarity in the KKT condition) and the identification of $\eta$ as the loss rate in 6.3.1 we know that if the service rate of a node is strictly increased under the twist $\alpha$ then this node is a bottleneck.

We have written the local process $W^{\Lambda}$ as the sum of a free subprocess $X^{\Lambda}$ and the network process of nodes $\Lambda^c$ denoted $Z^{\Lambda}$.

Corollary 6.3.5. From 6.3.1 and for $K=\Lambda$: Under $P^{[\alpha]}$ there are no strict bottlenecks in the $\Lambda^c$-nodes. Ergodic nodes in the $\Lambda^c$-subnetwork are identified through the network drift $(v,\eta)$ obtained from the KKT (via $\eta_i>0$).

Proof of 6.3.5:

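\[
\begin{aligned}
1&=\lim_{n\to\infty}P^{[\alpha]}\big(Z_n\in\mathcal{U}_\delta(t\mapsto z_0+tv)\big)\\
&=\lim_{n\to\infty}P^{[\alpha]}\big(W^{\Lambda}_n\in\mathcal{U}_\delta(t\mapsto z_0+tv)\big)\\
&=\lim_{n\to\infty}P^{[\alpha]}\big(X^{\Lambda}_n\in\mathcal{U}_\delta(t\mapsto z_0+t\pi_\Lambda(v)),\ Z^{\Lambda}_n\in\mathcal{U}_\delta(t\mapsto t\pi_{\Lambda^c}(v))\big)
\end{aligned}
\]

From $\Lambda=K$ we have $\pi_{\Lambda^c}(v)=0$ and

\[
\begin{aligned}
1&=\lim_{n\to\infty}P^{[\alpha]}\big(X^{\Lambda}_n\in\mathcal{U}_\delta(t\mapsto z_0+t\pi_\Lambda(v)),\ Z^{\Lambda}_n\in\mathcal{U}_\delta(t\mapsto t\pi_{\Lambda^c}(v))\big)\\
&\le\lim_{n\to\infty}P^{[\alpha]}\big(Z^{\Lambda}_n\in\mathcal{U}_\delta(t\mapsto 0)\big)=\lim_{n\to\infty}P^{[\alpha]}\big(\|Z^{\Lambda}_n\|<\delta\big)
\end{aligned}
\]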



6.3.5

6.4 Local large deviations lower bound

In this section we give a lower bound for the exponential decay rate of the event that the local process $W^{\Lambda}$ starting in $W^{\Lambda}_n(0,x)=\frac{\lfloor nx\rfloor}{n}$ follows the affine function $t\mapsto x+tv$. Let $\Lambda=\{i\mid x_i>0\}$ and then uniformise over $x$ and only investigate the following event:

\[
\Big\{\sup_{t\in[0,nT]}|W^{\Lambda}_t-tv|<n\delta\Big\}=\{W^{\Lambda}\in\mathcal{U}_\delta(t\mapsto tv)\}.
\]

Further let again $K=\{i\mid x_i>0\text{ or }v_i>0\}\supseteq\Lambda$.


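Claim 6.4.1. If $\bar\alpha$ is the optimiser in the almost Fenchel-Legendre transform

\[
\sup_{\alpha\in\mathcal{B}_K}\ \langle\alpha,v\rangle-\Psi(\alpha)=\langle\bar\alpha,v\rangle-\Psi(\bar\alpha)
\]

then the lower local large deviation bound holds:

\[
\lim_{\delta\to 0}\liminf_{n\to\infty}\frac{1}{n}\log P\big(\|Z_n(\cdot,x)-(t\mapsto x+tv)\|<\delta\big)\ \ge\ -T\big(\langle\bar\alpha,v\rangle-\Psi(\bar\alpha)\big).
\]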

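We will apply the change of measure with parameter $\alpha$ that was found as the optimiser in the upper bound 6.1.4. We start the same way as for the upper bound.

\[
\begin{aligned}
P\big(W^{\Lambda}_n\in\mathcal{U}_\delta(t\mapsto tv)\big)&=E\big[\mathbb{1}_{W^{\Lambda}_n\in\mathcal{U}_\delta(t\mapsto tv)}\big]\\
&=E\Big[\mathbb{1}_{W^{\Lambda}_n\in\mathcal{U}_\delta(t\mapsto tv)}\frac{M^{\Lambda}(\alpha,nT)}{M^{\Lambda}(\alpha,nT)}\Big]\\
&=E^{[\alpha]}\big[\mathbb{1}_{W^{\Lambda}_n\in\mathcal{U}_\delta(t\mapsto tv)}\,M^{\Lambda}(\alpha,nT)\big]
\end{aligned}
\]

From the change of measure applied here that was defined in 5.3.30, 5.3.32:

\[
M^{\Lambda}(\alpha,nT)=\exp\Big\{-\langle\alpha,W^{\Lambda}_{nT}\rangle+nT\sum_{i=1}^{d}\Gamma_A^{(i)}(\alpha_i)
+nT\sum_{i\in\Lambda}\Gamma_S^{(i)}\big(-\alpha_i+\Xi^{(i)}(\alpha)\big)
+nT\sum_{i\in\Lambda^c}\frac{R^{(i)}(nT)}{nT}\,\Gamma_S^{(i)}\big(-\alpha_i+\Xi^{(i)}(\alpha)\big)\Big\}\frac{1}{r(\alpha,nT)}
\]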


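We apply the change of measure to our event (uniformise over $x$ already).

\[
\begin{aligned}
P\big(&W^{\Lambda}_n\in\mathcal{U}_\delta(t\mapsto tv)\big)\\
&=E^{[\alpha]}\big[\mathbb{1}_{W^{\Lambda}_n\in\mathcal{U}_\delta(t\mapsto tv)}\,M^{\Lambda}(\alpha,nT)\big]\\
&=E^{[\alpha]}\Big[\mathbb{1}_{W^{\Lambda}_n\in\mathcal{U}_\delta(t\mapsto tv)}\exp\Big\{-\langle\alpha,W^{\Lambda}_{nT}\rangle+nT\sum_{i=1}^{d}\Gamma_A^{(i)}(\alpha_i)
+nT\sum_{i\in\Lambda}\Gamma_S^{(i)}\big(-\alpha_i+\Xi^{(i)}(\alpha)\big)\\
&\qquad\qquad+nT\sum_{i\in\Lambda^c}\frac{R^{(i)}(nT)}{nT}\,\Gamma_S^{(i)}\big(-\alpha_i+\Xi^{(i)}(\alpha)\big)\Big\}\frac{1}{r(\alpha,nT)}\Big]\\
&\ge\exp\{-\|\alpha\|nT\delta-\langle\alpha,nTv\rangle+nT\Psi(\alpha)\}\qquad (6.14)\\
&\qquad\cdot E^{[\alpha]}\Big[\mathbb{1}_{W^{\Lambda}\in\mathcal{U}_\delta(t\mapsto tv)}\exp\Big\{-nT\sum_{i\in\Lambda^c}\Big(1-\frac{R^{(i)}(nT)}{nT}\Big)\Gamma_S^{(i)}\big(-\alpha_i+\Xi^{(i)}(\alpha)\big)\Big\}\frac{1}{r(\alpha,nT)}\Big]\qquad (6.15)
\end{aligned}
\]

Inequality in (6.14) is due only to the minus in $-\|\alpha\|nT\delta$. It applies the definition of $\Psi$ of 5.3.9 from the primitives' lmgfs. The proof of 6.4.1 is thus equivalent to the proof of

Lemma 6.4.2.

\[
\lim_{\delta\to 0}\liminf_{n\to\infty}\frac{1}{n}\log(6.15)=0
\]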

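Proof of 6.4.2: We have joint uniform convergence of the queue size and the runtime process.

\[
\begin{aligned}
E^{[\alpha]}&\Big[\mathbb{1}_{Z_n(\cdot,x)\in\mathcal{U}_\delta(t\mapsto x+tv)}\exp\Big\{nT\sum_{i\in\Lambda^c}\Big(\frac{R^{(i)}(nT)}{nT}-1\Big)\Gamma_S^{(i)}\big(-\alpha_i+\Xi^{(i)}(\alpha)\big)\Big\}\Big]\\
&\ge E^{[\alpha]}\Big[\mathbb{1}_{Z_n(\cdot,x)\in\mathcal{U}_\delta(t\mapsto x+tv)}\mathbb{1}_{R_n\in\mathcal{U}_\delta(t\mapsto t\rho)}
\exp\Big\{nT\sum_{i\in\Lambda^c}\Big(\frac{R^{(i)}(nT)}{nT}-1\Big)\Gamma_S^{(i)}\big(-\alpha_i+\Xi^{(i)}(\alpha)\big)\Big\}\Big]\\
&=E^{[\alpha]}\Big[\mathbb{1}_{Z_n(\cdot,x)\in\mathcal{U}_\delta(t\mapsto x+tv)}\mathbb{1}_{R_n\in\mathcal{U}_\delta(t\mapsto t\rho)}
\exp\Big\{nT\sum_{\substack{i\in\Lambda^c\\ -\alpha_i+\Xi^{(i)}(\alpha)=0}}\Big(\frac{R^{(i)}(nT)}{nT}-1\Big)\underbrace{\Gamma_S^{(i)}\big(-\alpha_i+\Xi^{(i)}(\alpha)\big)}_{=0}\Big\}\\
&\qquad\qquad\cdot\exp\Big\{nT\sum_{\substack{i\in\Lambda^c\\ -\alpha_i+\Xi^{(i)}(\alpha)\ne 0}}\Big(\frac{R^{(i)}(nT)}{nT}-1\Big)\Gamma_S^{(i)}\big(-\alpha_i+\Xi^{(i)}(\alpha)\big)\Big\}\Big]
\end{aligned}
\]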


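We have started with $\alpha\in\mathcal{B}_K$, thus for $i\in K^c$

\[
-\alpha_i+\Xi^{(i)}(\alpha)\ne 0\ \Rightarrow\ -\alpha_i+\Xi^{(i)}(\alpha)>0
\]

For nodes $i\in K^c$ we have found in 6.3.4 that under $\alpha$

\[
-\alpha_i+\Xi^{(i)}(\alpha)>0\ \Rightarrow\ \rho_i=1
\]

and from the indicator for the runtime process

\[
\frac{R^{(i)}(nT)}{nT}-1\ge\rho_i-\delta-1=-\delta
\]

$K\cap\Lambda^c$ was the set of nodes with $x_i=0$, $v_i>0$ and from the runtime bound in 6.1.5 we have

\[
\frac{R^{(i)}(nT)}{nT}-1\ge 1-\frac{\delta}{tv_i}-1=-\frac{\delta}{tv_i}
\]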

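We finally have

\[
\begin{aligned}
&\mathbb{1}_{Z_n(\cdot,x)\in\mathcal{U}_\delta(t\mapsto x+tv)}\mathbb{1}_{R_n\in\mathcal{U}_\delta(t\mapsto t\rho)}
\exp\Big\{nT\sum_{\substack{i\in\Lambda^c\\ -\alpha_i+\Xi^{(i)}(\alpha)>0}}\Big(\frac{R^{(i)}(nT)}{nT}-1\Big)\Gamma_S^{(i)}\big(-\alpha_i+\Xi^{(i)}(\alpha)\big)\Big\}\\
&\ \ge\mathbb{1}_{Z_n(\cdot,x)\in\mathcal{U}_\delta(t\mapsto x+tv)}\mathbb{1}_{R_n\in\mathcal{U}_\delta(t\mapsto t\rho)}
\exp\Big\{nT\sum_{\substack{i\in\Lambda^c\cap K\\ -\alpha_i+\Xi^{(i)}(\alpha)>0}}\underbrace{\Big(\frac{R^{(i)}(nT)}{nT}-1\Big)}_{\ge-\frac{\delta}{Tv_i}}\Gamma_S^{(i)}\big(-\alpha_i+\Xi^{(i)}(\alpha)\big)\Big\}\\
&\qquad\cdot\exp\Big\{nT\sum_{\substack{i\in K^c\\ -\alpha_i+\Xi^{(i)}(\alpha)>0}}\underbrace{\Big(\frac{R^{(i)}(nT)}{nT}-1\Big)}_{\ge-\delta}\Gamma_S^{(i)}\big(-\alpha_i+\Xi^{(i)}(\alpha)\big)\Big\}
\end{aligned}
\]

So we only have deterministic exponential expressions left, and under the scaling in 6.4.2 they tend to 0. The indicators are such that their expectation tends to 1.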

6.4.2

We can also have $Z_n(\cdot,z)$ with $z\ne x$, as soon as $z\to x$ before $\delta\to 0$.

6.5 Rate function identification

For the generalised Jackson network Anatolii Puhalskii proved a sample path large deviation principle in [16]. The rate function is infinite on functions that are not absolutely continuous; for absolutely continuous functions $q$ it is defined as

\[
\begin{aligned}
I^{Q}_{q_0}(q)&=\int_{t=0}^{\infty}\sum_{J\subseteq\{1,\dots,d\}}\mathbb{1}_{q(t)\in F_J}\,R_J(\dot q(t))\,dt\quad\Big(=\int_{t=0}^{\infty}L(q(t),\dot q(t))\,dt\Big)\\
F_J&=\{x\in\mathbb{R}^{K}_{+}\mid x_k=0,\ k\in J;\ x_k>0,\ k\notin J\}\\
R_J(v)&=\inf_{(a,d,r):\,v=a+(r^{\top}-I)d}\psi_J(a,d,r)\\
\psi_J(a,d,r)&=\psi^{A}(a)+\sum_{k\in J^c}\psi^{S}_k(d_k)+\sum_{k\in J}\psi^{S}_k(d_k)\,\mathbb{1}_{d_k>\hat\mu_k}+\sum_{k=1}^{K}d_k\,\psi^{R}_k(r_k)
\end{aligned}
\]

In our notation (definition 6.2.4)

\[
R_J(v)=G_{J^c}(v)
\]

Claim 6.5.1 (Rate function identification). $R_J(v)=L(x,v)$ for $J^c=\Lambda(x)$.

Proof of 6.5.1: The set $J$ is the set of initially ($t=0$) empty nodes and $F_J$ is the face with $J^c$-coordinates strictly positive. This is just the other way around compared to the definition of $\Lambda$ (and the face $B_\Lambda$ in [12]).

Puhalskii's large deviation principle is on $D([0,\infty),\mathbb{R}^d)$ equipped with the extended $J_1$-topology of Skorohod. It implies a sample path large deviation principle on $D([0,T],\mathbb{R}^d)$ equipped with the $J_1$-topology with the same local rate function.

Skorohod's $J_1$-topology is a metric topology, denote by $d_{J_1}(\cdot,\cdot)$ a metric inducing this topology (cf $d_d$ in display (A.2), (A.3) below theorem A.53 of [23]). Convergence in $D([0,T],\mathbb{R}^d)$ to affine functions in the supremum norm induced metric and $d_{J_1}$ is equivalent, since: for $\psi,f\in D([0,T],\mathbb{R})$ by definition of the metrics

\[
d_{J_1}(f,\psi)\le\|f-\psi\|
\]

And if $\|\psi'\|=\sup_{t\in[0,T]}|\psi'(t)|<\infty$ then

\[
\|f-\psi\|\le(\|\psi'\|+1)\,d_{J_1}(\psi,f)
\]

Thus, open balls around affine functions wrt these metrics can be nested. Define the open ball $U$ around $\psi$ with $\psi(t)=x+tv$ wrt $d_{J_1}$

\[
U(\delta):=\{f\in D([0,T],\mathbb{R}^d)\mid d_{J_1}(f,t\mapsto x+tv)<\delta\}
\]

then

\[
\mathcal{U}_\delta(t\mapsto x+tv)\subseteq U(\delta)\subseteq\overline{U(\delta)}\subseteq\mathcal{U}_{\delta(1+|v|)}(t\mapsto x+tv)
\]

and

\[
\begin{aligned}
\liminf_{n\to\infty}\frac{1}{n}\log P\big(Z_n\in\mathcal{U}_\delta(t\mapsto x+tv)\big)&\le\limsup_{n\to\infty}\frac{1}{n}\log P\big(Z_n\in U(\delta)\big)\\
&\le\limsup_{n\to\infty}\frac{1}{n}\log P\big(Z_n\in\mathcal{U}_{(1+|v|)\delta}(t\mapsto x+tv)\big)
\end{aligned}
\]

Then let $\delta\to 0$. From goodness of the rate function $I^{Q}_{q_0}$, closedness of $\{\psi\}$, and [5], p.119

\[
T\,R_{J(x)}(v)=I^{Q}_{q_0}(\psi)=\inf_{g\in\{\psi\}}I^{Q}_{q_0}(g)=\lim_{\delta\to 0}\inf_{g\in\overline{U(\delta)}}I^{Q}_{q_0}(g)
\]

and we finally obtain

\[
-T\,L(x,v)\le-T\,R_{J(x)}(v)\le-T\,L(x,v)
\]

which identifies the local rate functions.

6.5.1

Corollary 6.5.2. The upper bound for the almost Fenchel-Legendre transform in 6.2.4 is always a tight bound and

\[
G_K(v)=\sup_{\alpha\in\mathcal{B}_K}\ \langle\alpha,v\rangle-\Psi(\alpha)
\]

6.6 Calculating the local rate function

We have identified the local large deviation rate function $L(\cdot,\cdot)$ for the generalised Jackson network as a restricted optimisation problem 6.0.4, an almost Fenchel-Legendre transform. For the Jackson network the rate function can be expressed as a (real, true) Fenchel-Legendre transform in lower dimensional space. We cite proposition 10.2 of [12] of Ignatiouk-Robert that we generalise here. We give it in our notation. For $L(x,v)$ the optimiser $\alpha$ in the restricted optimisation problem is such that under the change of measure with parameter $\alpha$ the network drift becomes $v$ and the deviating event that the scaled network process stays close to the linear function $t\mapsto x+tv$ becomes the expected behaviour, the network's fluid limit. Knowing the change of measure that makes $t\mapsto x+tv$ the fluid limit of the network is knowing the rate function.


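Claim 6.6.1. For $(x,v)$, $K=\{i\mid x_i>0\text{ or }v_i>0\}$ there is $\Theta\supseteq K$ and convex $\Psi^{\Theta}:\mathbb{R}^{|\Theta|}\to\mathbb{R}$ such that

\[
\sup_{\alpha\in\mathcal{B}_K}\ \langle\alpha,v\rangle-\Psi(\alpha)=\Psi^{\Theta*}(v_\Theta)
\]

Let $M\supset K=\{i\mid x_i>0\text{ or }v_i>0\}$ and consider the equality constrained optimisation problem.

\[
\inf_{\alpha\in\mathcal{D}_M}\ -\langle\alpha,v\rangle+\Psi(\alpha),\qquad\mathcal{D}_M:=\{\alpha\in\mathbb{R}^d\mid\alpha_i=\Xi^{(i)}(\alpha),\ i\in M^c\}\qquad (6.16)
\]

To better describe elements of $\mathcal{D}_M$ we need the following

Definition 6.6.2. For strictly substochastic $P$ and $M\subseteq\{1,\dots,d\}$ define the matrix $Q\in\mathbb{R}^{d\times|M|}$

\[
Q=P_{\{1,\dots,d\}M}+P_{\{1,\dots,d\}M^c}(\mathrm{id}-P_{M^c})^{-1}P_{M^cM}
\]

Note that $Q$ simplifies when splitting $\{1,\dots,d\}$ into $M$ and $M^c$.

\[
\begin{aligned}
Q_{M^cM}&=(\mathrm{id}-P_{M^c})^{-1}P_{M^cM}\\
Q_M&=P_M+P_{MM^c}(\mathrm{id}-P_{M^c})^{-1}P_{M^cM}
\end{aligned}
\]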
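Definition 6.6.2 can be evaluated exactly with rational arithmetic. The sketch below is illustrative only: the helper names (`q_matrix`, `inverse`, `sub`) are hypothetical, nodes are 0-based, and the routing matrix `P` is an assumption, namely our reading of the four-node network of the example in section 6.6.2, where part of the display is illegible. With service rates $3$ and $6$ at the first two nodes (also an assumption) it reproduces the remodelled service rates $2.775$ and $5.1$ quoted in that example.

```python
from fractions import Fraction as F


def sub(P, rows, cols):
    """Sub-matrix of P with the given row and column index lists."""
    return [[P[i][j] for j in cols] for i in rows]


def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inverse(A):
    """Gauss-Jordan inverse of a small square matrix of Fractions."""
    n = len(A)
    aug = [list(row) + [F(int(i == j)) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        piv = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[piv] = aug[piv], aug[c]
        pv = aug[c][c]
        aug[c] = [x / pv for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def q_matrix(P, M):
    """Q = P_{.M} + P_{.Mc} (id - P_Mc)^{-1} P_{McM} as in Definition 6.6.2."""
    nodes = list(range(len(P)))
    Mc = [i for i in nodes if i not in M]
    if not Mc:
        return sub(P, nodes, M)
    id_minus = [[F(int(a == b)) - P[a][b] for b in Mc] for a in Mc]
    corr = mat_mul(mat_mul(sub(P, nodes, Mc), inverse(id_minus)), sub(P, Mc, M))
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(sub(P, nodes, M), corr)]


# Assumed routing matrix (rows sum to at most 1; the remainder is the exit probability).
P = [[F(0), F(1, 2), F(1, 2), F(0)],
     [F(0), F(0), F(1), F(0)],
     [F(0), F(0), F(0), F(1, 2)],
     [F(3, 10), F(3, 10), F(0), F(0)]]
Q = q_matrix(P, [0, 1])
remodelled = [float(mu * (1 - Q[i][i])) for i, mu in enumerate([3, 6])]
print(Q[:2], remodelled)
```

The first two rows of `Q` are $Q_M$, the last two are $Q_{M^cM}$. Dividing the off-diagonal entries of $Q_M$ by $1-q_{ii}$ gives $23/37$ and $3/17$, which is the routing of the remodelled free process in the example.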

Remark 6.6.3. $Q$ is substochastic and $1$ is not an eigenvalue of $Q_M$.

The remark was proved in [3] Lemma 4.3. Thus the rows $q^{(i)}$ of $Q$ are measures with total mass $\le 1$. The $q^{(i)}$ are not generally equivalent to the $p^{(i)}$ when restricted to $M$: there may be $j\in M$ such that $q_{ij}>0=p_{ij}$. We define the lmgf of $q^{(i)}$ parallel to $\Xi^{(i)}$ for the $p^{(i)}$:

Definition 6.6.4. In parallel to the definition of $\Xi$ in 4.4.8 define for the sub-probability measure $q^{(j)}$, the $j$-th row of $Q$, and $\beta\in\mathbb{R}^{|M|}$

\[
\Upsilon^{(j)}(\beta)=\log\Big(\sum_{k=1}^{|M|}q_{jk}e^{\beta_k}+\underbrace{(1-q_{j1}-\dots-q_{j|M|})}_{=:q_{j0}}\Big).
\]

If $q^{(j)}$ is a subprobability measure on $\{e_1,\dots,e_d\}$ then $\Upsilon^{(j)}$ is as in 4.4.6, representing the restriction to $M$.

Lemma 6.6.5. If $P\in\mathbb{R}^{d\times d}$ is a substochastic matrix and $Q$ associated with $P$ as in 6.6.2 and for $j\in\{1,\dots,d\}$ $\Xi^{(j)}$ is associated with the $j$-th row $p^{(j)}$ of $P$ and $\Upsilon^{(j)}$ with the $j$-th row $q^{(j)}$ of $Q$ then $\Xi^{(j)}(\alpha)=\Upsilon^{(j)}(\alpha_M)$ for any $\alpha\in\mathcal{D}_M$.


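Proof of 6.6.5: We transform $e^{\alpha_i}$ such that expressions become linear: $\check\alpha_i=e^{\alpha_i}-1$. Definition of $\Upsilon^{(i)}$ and $\Xi^{(i)}$ then become

\[
\big(e^{\Xi^{(i)}(\alpha)}\big)_{i=1,\dots,d}=\Big(\sum_{j=1}^{d}p_{ij}e^{\alpha_j}+\big(1-\sum_{j=1}^{d}p_{ij}\big)\Big)_{i=1,\dots,d}=P\check\alpha+\mathbf{1}
\]
\[
\big(e^{\Upsilon^{(i)}(\alpha_M)}\big)_{i=1,\dots,d}=\Big(\sum_{j=1}^{|M|}q_{ij}e^{\alpha_j}+\big(1-\sum_{j=1}^{|M|}q_{ij}\big)\Big)_{i=1,\dots,d}=Q\check\alpha_M+\mathbf{1}
\]

where $\mathbf{1}=(1,\dots,1)^{\top}$. Rewriting the condition $\alpha\in\mathcal{D}_M$ in a similar fashion

\[
\alpha_i=\Xi^{(i)}(\alpha)\ \forall i\in M^c\ \Leftrightarrow\ \check\alpha_{M^c}=P_{M^c\{1,\dots,d\}}\check\alpha\qquad (6.17)
\]

and iterating

\[
\begin{aligned}
\check\alpha_{M^c}&=P_{M^c\{1,\dots,d\}}\check\alpha\\
&=P_{M^c}\check\alpha_{M^c}+P_{M^cM}\check\alpha_M\\
&=P_{M^c}\big(P_{M^c}\check\alpha_{M^c}+P_{M^cM}\check\alpha_M\big)+P_{M^cM}\check\alpha_M\\
&=(P_{M^c})^{2}\check\alpha_{M^c}+(P_{M^c}+\mathrm{id})P_{M^cM}\check\alpha_M\\
&=\dots=(P_{M^c})^{n+1}\check\alpha_{M^c}+\sum_{k=0}^{n}(P_{M^c})^{k}P_{M^cM}\check\alpha_M
\end{aligned}
\]

which converges

\[
\check\alpha_{M^c}=\lim_{n\to\infty}\sum_{k=0}^{n}(P_{M^c})^{k}P_{M^cM}\check\alpha_M=(\mathrm{id}-P_{M^c})^{-1}P_{M^cM}\check\alpha_M
\]

Applying this in $P$ split into $M$ and $M^c$ indices

\[
\begin{aligned}
P\check\alpha&=P_{\{1,\dots,d\}M}\check\alpha_M+P_{\{1,\dots,d\}M^c}\check\alpha_{M^c}\\
&=P_{\{1,\dots,d\}M}\check\alpha_M+P_{\{1,\dots,d\}M^c}(\mathrm{id}-P_{M^c})^{-1}P_{M^cM}\check\alpha_M\\
&=Q\check\alpha_M
\end{aligned}
\]

Thus $P\check\alpha+\mathbf{1}=Q\check\alpha_M+\mathbf{1}$ and we get the claim.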

6.6.5
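Lemma 6.6.5 can be checked numerically by running the fixed-point iteration of the proof. The following is a sketch under assumptions: the routing matrix `P` is our reading of the four-node example network of section 6.6.2, the function names are hypothetical, nodes are 0-based, and `alpha_M` is the twist $(0.1457, 0.3028)$ reported in that example.

```python
import math

# Assumed routing matrix of the example network; M = nodes 1,2 of the text.
P = [[0, .5, .5, 0], [0, 0, 1, 0], [0, 0, 0, .5], [.3, .3, 0, 0]]
M, Mc = [0, 1], [2, 3]


def solve_Mc(rhs, iters=100):
    """Solve y = P_Mc y + rhs by the iteration used in the proof of 6.6.5."""
    y = [0.0] * len(Mc)
    for _ in range(iters):
        y = [sum(P[i][j] * y[b] for b, j in enumerate(Mc)) + rhs[a]
             for a, i in enumerate(Mc)]
    return y


def xi(i, alpha):
    """lmgf of row i of P, including the exit mass 1 - sum_j p_ij."""
    return math.log(sum(p * math.exp(a) for p, a in zip(P[i], alpha)) + 1 - sum(P[i]))


# Q of Definition 6.6.2, built column by column from Q_McM = (id - P_Mc)^{-1} P_McM.
Q_McM_cols = [solve_Mc([P[i][m] for i in Mc]) for m in M]
Q = [[P[j][m] + sum(P[j][i] * Q_McM_cols[k][b] for b, i in enumerate(Mc))
      for k, m in enumerate(M)] for j in range(len(P))]


def upsilon(j, beta):
    """lmgf of row j of Q as in Definition 6.6.4."""
    return math.log(sum(q * math.exp(b) for q, b in zip(Q[j], beta)) + 1 - sum(Q[j]))


alpha_M = [0.1457, 0.3028]
check_M = [math.exp(a) - 1 for a in alpha_M]                    # the transformed alpha
rhs = [sum(P[i][m] * check_M[k] for k, m in enumerate(M)) for i in Mc]
alpha = alpha_M + [math.log(1 + c) for c in solve_Mc(rhs)]     # a point of D_M

gaps = [abs(xi(j, alpha) - upsilon(j, alpha_M)) for j in range(len(P))]
print(alpha, max(gaps))
```

The third coordinate of `alpha` comes out close to the value $0.0738$ that the example reports for $\tilde\alpha_3$, which supports the assumed matrix, and the two lmgfs agree on all four rows up to rounding.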

Corollary 6.6.6. $\mathcal{D}_M=\{\alpha\mid\alpha_i=\Upsilon^{(i)}(\alpha_M)\ \forall i\in M^c\}$.


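Claim 6.6.7. The equality constrained optimisation problem is a Fenchel-Legendre transform.

\[
\inf_{\alpha\in\mathcal{D}_M}\ -\langle\alpha,v\rangle+\Psi(\alpha)=-\Psi^{M*}(v_M)
\]

Proof of 6.6.7: For $\alpha\in\mathcal{D}_M$ write $\alpha=\binom{\alpha_M}{\alpha_{M^c}}$ and $u_i=\Upsilon^{(i)}(\alpha_M)$ for $i\in M^c$. Then $\alpha=\binom{\alpha_M}{u}$ by 6.6.6. Thus

\[
\inf_{\alpha\in\mathcal{D}_M}\ -\langle\alpha,v\rangle+\Psi(\alpha)=\inf_{\alpha_M\in\mathbb{R}^{|M|}}\ -\Big\langle v,\binom{\alpha_M}{u}\Big\rangle+\Psi\Big(\binom{\alpha_M}{u}\Big).
\]

By choice of $M$ we have $v_{M^c}=0$ and $\langle v,\binom{\alpha_M}{u}\rangle=\langle\alpha_M,v_M\rangle$. Also $\Psi$ simplifies:

\[
\begin{aligned}
\Psi\big|_{\alpha\in\mathcal{D}_M}(\alpha)&=\sum_{j\in M}\Big[\Gamma_A^{(j)}(\alpha_j)+\Gamma_S^{(j)}\big(-\alpha_j+\underbrace{\Xi^{(j)}(\alpha)}_{=\Upsilon^{(j)}(\alpha_M)}\big)\Big]
+\sum_{j\in M^c}\Big[\Gamma_A^{(j)}\big(\underbrace{\alpha_j}_{=\Xi^{(j)}(\alpha)=\Upsilon^{(j)}(\alpha_M)}\big)+\Gamma_S^{(j)}\big(\underbrace{-\alpha_j+\Xi^{(j)}(\alpha)}_{=0}\big)\Big]\\
&=\sum_{j\in M}\Big[\Gamma_A^{(j)}(\alpha_j)+\Gamma_S^{(j)}\big(-\alpha_j+\Upsilon^{(j)}(\alpha_M)\big)\Big]+\sum_{j\in M^c}\Gamma_A^{(j)}\big(\Upsilon^{(j)}(\alpha_M)\big)\\
&=:\Psi^{M}(\alpha_M)
\end{aligned}
\]

and we now have

\[
\inf_{\alpha\in\mathcal{D}_M}\ -\langle\alpha,v\rangle+\Psi(\alpha)=\inf_{\alpha_M\in\mathbb{R}^{|M|}}\ -\langle v_M,\alpha_M\rangle+\Psi^{M}(\alpha_M)
\]

and the claim is proved.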

6.6.7

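Proof of 6.6.1: Let $\alpha$ be the optimiser of 6.1.4, (6.8) and

\[
A=\{i\in K^c\mid\alpha_i=\Xi^{(i)}(\alpha)\}
\]

the set of indices of restrictions active in $\alpha$. Set

\[
\Theta=K\cup A^c=\{i\mid x_i>0\text{ or }v_i>0\text{ or }\alpha_i-\Xi^{(i)}(\alpha)<0\}.
\]

Then $\alpha$ is also the optimiser in the equality constrained optimisation problem with restrictions only in $M^c=A$ (cf [2] chapter 3 on Lagrange Multiplier Theory, section 3.3) and

\[
\inf_{\alpha\in\mathcal{B}_K}\ -\langle\alpha,v\rangle+\Psi(\alpha)=\inf_{\alpha\in\mathcal{D}_\Theta}\ -\langle\alpha,v\rangle+\Psi(\alpha)=-\Psi^{\Theta*}(v_\Theta)
\]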

6.6.1


Remark 6.6.8. Let A(i), S(i),...,S(d)be primitive processes of a generalised

Jackson network with lmgfs Γ(i)

A,Γ(i)

S. Let Pbe the routing matrix of the

network. Fix M⊆ {1, . . . , d}and define Qas in 6.6.2 and let its i-th row

define the lmgf Υ(i)as in 6.6.4. For i∈Mcsplit the i-th arrival process into

A(ij)j∈Mwrt q(i)and define the compound arrival process for i∈M

A(i)=A(i)+X

j∈Mc

A(ji).

For i∈Msplit S(i)wrt q(i)and define the new S(i)

S(i)=X

k∈M

S(ik)−S(i)

Then

A+X

j∈M

S(j)

is a free process, its routing matrix is QMand it has lmgf ΨMas in 6.6.1.

It seems immediate that the network represented by QMis open and that

ΨMis finite on R|M|. General theory for the large deviations of the free

process apply. We can now look for a suitable superset Θ in 6.6.1 in the

following way:

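Algorithm 6.6.9. To calculate the local large deviation rate function for the generalised Jackson network (cf 6.0.4), especially to find the optimiser $\alpha\in\mathcal{B}_K$ and a suitable set $\Theta$ of 6.6.1

1. Set $M:=K$, $\mathcal{S}:=\emptyset$.

2. Renumber nodes such that $\{1,\dots,d\}=\{1,\dots,|M|,\dots,d\}$.

3. Find the optimiser $\tilde\alpha_M\in\mathbb{R}^{|M|}$ in $\Psi^{M*}(v_M)$.

4. Calculate $\tilde\alpha_i=\Upsilon^{(i)}(\tilde\alpha_M)$ for $i\in M^c$ such that $\tilde\alpha=\binom{\tilde\alpha_M}{\tilde\alpha_{M^c}}\in\mathcal{D}_M$.

5. If under the change of measure $M^{\emptyset}(\tilde\alpha,\cdot)$ there are no strict bottlenecks in the $M^c$-nodes then add $(M,\tilde\alpha)$ to $\mathcal{S}$. Else, for each strict bottleneck $i\in M^c$ set $M:=M\cup\{i\}$ and iterate from 3. on.

6. $(\Theta,\alpha)\in\mathcal{S}$ such that $\Psi^{\Theta*}(v_\Theta)=\min\{\Psi^{M*}(v_M)\mid(M,\tilde\alpha)\in\mathcal{S}\}$.

This compares to theorem 2 of [12].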
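The control flow of algorithm 6.6.9 can be sketched as a small search over supersets of $K$. This is only an outline under assumptions: the two callbacks `solve_free` (steps 3 and 4: returns the value $\Psi^{M*}(v_M)$ and the full twist $\tilde\alpha$) and `strict_bottlenecks` (step 5: returns the strict bottlenecks in $M^c$ under $\tilde\alpha$) are hypothetical placeholders for computations that are not spelled out here, and the stub values below are taken from the example of section 6.6.2.

```python
def local_rate(K, solve_free, strict_bottlenecks):
    """Outline of algorithm 6.6.9; returns (Theta, alpha, rate).

    solve_free(M) -> (value, alpha) and strict_bottlenecks(M, alpha) -> set
    are caller-supplied, hypothetical callbacks.
    """
    candidates = []                      # the set S of step 1
    todo, seen = [frozenset(K)], set()
    while todo:
        M = todo.pop()
        if M in seen:
            continue
        seen.add(M)
        value, alpha = solve_free(M)                 # steps 3 and 4
        bad = strict_bottlenecks(M, alpha)           # step 5
        if not bad:
            candidates.append((value, M, alpha))
        else:
            todo.extend(M | {i} for i in bad)        # free each bottleneck in turn
    value, Theta, alpha = min(candidates, key=lambda c: c[0])   # step 6
    return Theta, alpha, value


# Stubs mimicking the worked example: node 3 is a strict bottleneck for M = {1, 2}.
values = {frozenset({1, 2}): 0.2694, frozenset({1, 2, 3}): 0.5754}
stub_solve = lambda M: (values[M], None)
stub_bottlenecks = lambda M, alpha: {3} if M == frozenset({1, 2}) else set()
print(local_rate({1, 2}, stub_solve, stub_bottlenecks))
```

The sketch raises an error if no candidate is ever collected; whether that can happen is not addressed here.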


6.6.1 Interpretation and possible improvement

We now want to give an interpretation of the local rate function of the gen-

eralised Jackson network in terms of the associated free rate function ΨΘ∗,

cf 6.6.1.

From interpretation 6.1.3 we know that any feasible twist (any $\alpha\in\mathcal{B}_K$) does not decrease service rates at $K^c$ nodes. And from 6.3.4 we have that if the optimal twist parameter strictly increases a service rate of a $K^c$-node then this node will be a bottleneck. Both interpretations make sense as minimum cost (in terms of service rates $\Gamma_S^{(i)*}$ and routing $\Xi^{(i)*}$) to allow a certain flow through $K^c$-nodes: to reduce flow through a restricted node one only has to reduce its input, the service rate does not have to change ($\Gamma_S^{(i)*}(0)=0$). And increasing the service rate of a restricted node is reasonable only if all of the service capacity is required to get a certain flow through this node.

In steps 3.-5. of the algorithm 6.6.9 the network is partitioned into $M$ and $M^c$. The free process of $M$-nodes as in 6.6.8 is twisted to have drift $v_M$. The twist $\tilde\alpha_M$ is the optimiser in $\Psi^{M*}(v_M)$, thus an optimal twist wrt service at and routing between $M$-nodes and original arrival processes at $M^c$-nodes. Service rates at $M^c$ nodes are not considered in the model of the free process of $M$-nodes of 6.6.8; costs for these service rates $\Gamma_S^{(i)*}$, $i\in M^c$ do not appear in $\Psi^{M*}$. It may now happen that as the arrival to $M^c$ nodes is split to become arrivals at $M$ nodes the flow through the $M^c$-subnetwork does not go as smoothly as assumed in 6.6.8: Nodes in $M^c$ overflow when they cannot handle the flow into $M$ and / or out of $M$.

If this happens, then in the network the drift $v$ is not realised under the changed measure: $\tilde\alpha\ne\alpha$ and $M\ne\Theta$. Capacities and the cost for increasing capacities of $M^c$ nodes have not been considered in the choice of $\tilde\alpha$ but should have been. In the next iteration 3.-5. an increased set $M$ and cost at this increased set of nodes is considered.

In the proof of 6.6.1 we have characterised $(\Theta,\alpha)$ as $\Theta=K\cup\{i\in K^c\mid\mu_i(\alpha)>\mu_i\}$. Thus if $(M,\tilde\alpha)$ with $i\in M\setminus K$ and $\mu_i(\tilde\alpha)<\mu_i$ then $M\ne\Theta$. This allows us to remove a node $i\in M\setminus K$ from the set of free nodes. It would be an advantage if one could change step 5. of algorithm 6.6.9 to become

5'. If under the change of measure $M^{\emptyset}(\tilde\alpha,\cdot)$ there are no strict bottlenecks in the $M^c$ subnetwork and $\mu_i(\tilde\alpha)\ge\mu_i$ for all $i\in M\setminus K$ then $\alpha=\tilde\alpha$ and $\Theta=M$. Else

- If for some $i\in M\setminus K$: $\mu_i(\tilde\alpha)<\mu_i$ then restrict this node: set $M:=M\setminus\{i\}$ and iterate from 3. on.

- If for all $i\in M\setminus K$: $\mu_i(\tilde\alpha)\ge\mu_i$ and some $i\in M^c$ is a bottleneck in the subnetwork of $M^c$-nodes then free this node: set $M:=M\cup\{i\}$ and iterate from 3. on.

However, it is not obvious that this algorithm terminates or whether the sequence in which nodes are freed and/or restricted influences the final set of free nodes. In the best of all cases step 6. of algorithm 6.6.9 could be omitted.

6.6.2 Example

We give a simple example of calculating the decay rate / local rate function

with algorithm 6.6.9. The result is of course the same as when calculating

it from the restricted optimisation problem of the almost Fenchel-Legendre

transform of 6.0.4. The following example is simple as there will be only one

iteration in the algorithm and only one feasible choice for adding a bottleneck

in 5.

We work with the network of d= 4 nodes as introduced in figure 5.1 of

chapter 5. We chose exponential inter event times for all arrival and service

events for simplicity. Let the rates be

λ=









, µ =









, P =





0 0 1 0

0 0 0 1

10 0 0





.

Let the network process start in $x$ and investigate the probability that it evolves in direction $v$

\[
x=\begin{pmatrix}0.1\\0.2\\0\\0\end{pmatrix},\qquad v=\begin{pmatrix}-1\\-\tfrac{1}{2}\\0\\0\end{pmatrix}
\]

$v$ is not the network drift and we will calculate the decay rate for the scaled network process to stay close to the function $t\mapsto x+tv$.

We have $K=\Lambda=\{1,2\}$ and the local process $W^{\Lambda}=W^{\{1,2\}}$. Figure 6.3 is an adaptation of the network to represent the local process: $\Lambda$-nodes are circled, indicating the behaviour as always non-empty.

Figure 6.3: Adaptation of the network to represent the local process of $\Lambda=\{1,2\}$.

For step 3. of algorithm 6.6.9 we transform the network process into the associated free process in $|\Lambda|=2$ dimensions. We do this in steps. We remove nodes 3, 4 but keep the flow through these nodes. Flow indirectly leaving the network through $\{3,4\}$-nodes is now documented as exits from the $\{1,2\}$ nodes: both nodes 1 and 2 become exit nodes. Similarly there is flow coming back to each node (via $p_{13}p_{34}p_{41}>0$ and $p_{23}p_{34}p_{42}>0$) and indirect flow from node 1 to node 2 (via $p_{23}p_{34}p_{41}>0$).

Figure 6.4: Some flows through $\{3,4\}$-nodes

Since immediate feedback is not allowed in our model we have to remodel

inter event times at these feedback nodes as in section 5.1.

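We calculate $Q$ with $\Lambda=\{1,2\}$ and $\Lambda^c=\{3,4\}$.

\[
Q_\Lambda=P_\Lambda+P_{\Lambda\Lambda^c}(\mathrm{id}-P_{\Lambda^c})^{-1}P_{\Lambda^c\Lambda}=\begin{pmatrix}\tfrac{3}{40}&\tfrac{23}{40}\\[2pt]\tfrac{3}{20}&\tfrac{3}{20}\end{pmatrix}
\]
\[
Q_{\Lambda^c\Lambda}=\sum_{k=0}^{\infty}P_{\Lambda^c}^{k}P_{\Lambda^c\Lambda}=\begin{pmatrix}\tfrac{3}{20}&\tfrac{3}{20}\\[2pt]\tfrac{3}{10}&\tfrac{3}{10}\end{pmatrix}
\]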

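Removing immediate feedback we have to remodel inter event times (and their lmgfs): In this setting we now have inter event times $\tau^{(i)S_\Lambda}=\tau^{\circ(i)S}$ with $E[\tau^{\circ(i)S}]=\frac{1}{\mu_i(1-q_{ii})}$.

Figure 6.5: Construction of the free subprocess for $M=\Lambda=\{1,2\}$

Generally the new free processes primitives have the rates

\begin{tabular}{lll}
 & free process of $\Lambda$-nodes (with immediate feedback) & free process remodelled (without immediate feedback, cf 5.1)\\
arrival rates & $\lambda_\Lambda+(Q_{\Lambda^c\Lambda})^{\top}\lambda_{\Lambda^c}=\binom{1}{1}$ & $\binom{1}{1}$\\
service rates & $\mu_\Lambda=\binom{3}{6}$ & $[\mu_i(1-q_{ii})]_{i\in\Lambda}=\binom{2.775}{5.1}$\\
routing & $Q_\Lambda=\begin{pmatrix}\tfrac{3}{40}&\tfrac{23}{40}\\ \tfrac{3}{20}&\tfrac{3}{20}\end{pmatrix}$ & $\begin{pmatrix}0&\tfrac{23}{37}\\ \tfrac{3}{17}&0\end{pmatrix}$
\end{tabular}

We now investigate the free process with rates $\Big(\binom{1}{1},\binom{2.775}{5.1},\begin{pmatrix}0&\tfrac{23}{37}\\ \tfrac{3}{17}&0\end{pmatrix}\Big)$.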

Ψ{1,2}∗(−1

−1

2) = 0.2694 ,˜α{1,2}=0.1457

0.3028 

Step 4: We use rows of QΛcΛto calculate remaining coordinates of ˜α. For

192

example ˜α3= log(q31 e˜α1+q32 e˜α2+q30).

˜α=





0.1457

0.3028

0.0738







Step 5: We check if under the twist ˜αthe Λc-nodes have strict bottlenecks.

twisted twisted twisted

arrival rates service rates routing







1.2

1.4











3.2

4.8

3.7

5.8











0 0.55 0.44 0

0 0 1 0

0 0 0 0.5

0.3 0.35 0 0







Node 3 is a strict bottleneck. Also note that with these rates the evolution

at node 1 is less than the required v1=−1 as flow that was supposed to

reach node 1 via node 3 is held back at the bottleneck node 3. We free the

bottleneck node 3, set M:= {1,2,3}, and continue with step 3.

Step 3: For M={1,2,3}construct the associated free process. We give

flows of node 3 that are affected by removing node 4.

Figure 6.6: Construction of the free subprocess for $M=\{1,2,3\}$


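We calculate $Q$ for the free subprocess. We now have $M^c=\{4\}$.

\[
Q_M=P_M+P_{M\{4\}}P_{\{4\}M}=\begin{pmatrix}0&\tfrac{1}{2}&\tfrac{1}{2}\\0&0&1\\ \tfrac{3}{20}&\tfrac{3}{20}&0\end{pmatrix}
\]
\[
Q_{\{4\}M}=\underbrace{\sum_{k=0}^{\infty}P_{\{4\}}^{k}}_{=\mathrm{id}=[1]}P_{\{4\}M}=\begin{pmatrix}\tfrac{3}{10}&\tfrac{3}{10}&0\end{pmatrix}
\]

Now consider the free process with rates $\Big(\begin{pmatrix}1\\1\\0\end{pmatrix},\begin{pmatrix}3\\6\\4\end{pmatrix},\begin{pmatrix}0&\tfrac{1}{2}&\tfrac{1}{2}\\0&0&1\\ \tfrac{3}{20}&\tfrac{3}{20}&0\end{pmatrix}\Big)$. We get the free rate function and its optimiser.

\[
\Psi^{\{1,2,3\}*}\Big(\begin{pmatrix}-1\\-\tfrac{1}{2}\\0\end{pmatrix}\Big)=0.5754,\qquad\tilde\alpha_{\{1,2,3\}}=\begin{pmatrix}0.0121\\0.1110\\-0.2595\end{pmatrix}
\]

Step 4: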

\[
\tilde\alpha=\begin{pmatrix}0.0121\\0.1110\\-0.2595\\0.0381\end{pmatrix}
\]

twisted twisted twisted

arrival rates service rates routing







1.0

1.1











2.8

4.1

5.3











0 0.59 0.41 0

0010

0 0 0 0.51

0.29 0.32 0 0







With these rates node 3 is a bottleneck but not a strict bottleneck, and node 4 is in equilibrium. The network drift is $v$.
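The twisted routing in the table above can be reproduced from $\tilde\alpha$ by exponentially tilting each row of $P$. The sketch below is illustrative: the function name `twisted_row` is hypothetical and the untwisted rows of `P` are an assumption, namely our reading of the partly illegible routing matrix of this example.

```python
import math


def twisted_row(p_row, alpha):
    """Exponential twist of a sub-probability row: p_ij e^{alpha_j} / (sum_k p_ik e^{alpha_k} + p_i0)."""
    exit_mass = 1 - sum(p_row)
    weights = [p * math.exp(a) for p, a in zip(p_row, alpha)]
    total = sum(weights) + exit_mass
    return [w / total for w in weights]


alpha = [0.0121, 0.1110, -0.2595, 0.0381]     # optimiser from step 4 above
P = [[0, .5, .5, 0], [0, 0, 1, 0], [0, 0, 0, .5], [.3, .3, 0, 0]]   # assumed
twisted = [twisted_row(row, alpha) for row in P]
for row in twisted:
    print([round(x, 2) for x in row])
```

Up to rounding this gives the rows $(0, 0.59, 0.41, 0)$, $(0,0,1,0)$, $(0,0,0,0.51)$ and $(0.29, 0.32, 0, 0)$ of the twisted routing table.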

Step 5: $\Theta=\{1,2,3\}$ and $\alpha=\begin{pmatrix}0.0121\\0.1110\\-0.2595\\0.0381\end{pmatrix}$. The algorithm terminates with $L(x,v)=0.5754$.


The following are 4 simulations of the 4-nodes network under the twisted

distribution with n= 3000 and T=.05.


The exponential decay rate (per time) is about $0.5754$: The probability that the scaled network process follows $t\mapsto x+tv$ for an interval of scaled time length $0.2$ has a decay rate in $n$ of about $0.2\cdot 0.5754=0.1151$.
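As a quick check of this arithmetic, and to indicate the order of magnitude it implies, the following sketch multiplies the rate by the time length and evaluates the leading-order exponent $-n\,T\,L(x,v)$. The name `approx_log_probability` is hypothetical, and the exponent is only the large deviations approximation, not an exact probability.

```python
rate_per_time = 0.5754      # L(x, v) from the algorithm above
time_length = 0.2           # scaled time length
decay_rate = time_length * rate_per_time


def approx_log_probability(n):
    """Leading-order exponent -n * T * L(x, v) of the log-probability."""
    return -n * decay_rate


print(round(decay_rate, 4), approx_log_probability(100))
```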

Chapter 7

Appendix

7.1 Functional strong law of large numbers

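Let $X_1,X_2,\dots$ be iid and centred with $E[X_1]=0$. Let $S_n$ be the $n$-th partial sum and $0,S_1,S_2,\dots$ a path of partial sums. Let $Z_n$ be the scaled process under the law of large numbers scaling.

\[
Z_n:\mathbb{R}_{\ge 0}\to\mathbb{R},\quad t\mapsto\frac{1}{n}S_{\lfloor nt\rfloor}
\]

From the strong law of large numbers for almost every path of partial sums

\[
\forall\epsilon>0\ \exists n_0\in\mathbb{N}:\ \forall n\ge n_0:\ \Big|\frac{1}{n}S_n\Big|<\epsilon
\]

where the $n_0$ depends on the path. Fix $\epsilon>0$ and for each path the associated $n_0$ where it exists. Then for an arbitrary path where such a finite $n_0$ exists, there is also a maximum of the partial sums before $n_0$ attained (again depending on the path).

\[
M:=\max_{m\in\{1,\dots,n_0\}}|S_m|
\]

When choosing $n$ large enough the scaled partial sums process $Z_n$ before and after $\frac{n_0}{n}$ will be arbitrarily small. We do an exact calculation: Fix $\delta>0$. Fix for $\epsilon:=\frac{\delta}{T}$ for each path the $n_0$ and $M$ as above. Choose $n\ge n_0$ and $n\ge\frac{M}{\delta}$. Both lower bounds depend on the path. They are finite for almost all paths. Let $t\in[0,T]$. For $n\ge\max\{n_0,\frac{M}{\delta},\frac{1}{T}\}$ we either have $nt\ge n_0$ or not. First consider the first case.

\[
\begin{aligned}
|Z_n(t)|&\le\frac{1}{n}\max\{|S_{\lfloor nt\rfloor}|,|S_{\lfloor nt\rfloor+1}|\}\\
&\le\max\Big\{\frac{\lfloor nt\rfloor}{n}\underbrace{\frac{1}{\lfloor nt\rfloor}|S_{\lfloor nt\rfloor}|}_{<\frac{\delta}{T}},\ \frac{\lfloor nt\rfloor+1}{n}\underbrace{\frac{1}{\lfloor nt\rfloor+1}|S_{\lfloor nt\rfloor+1}|}_{<\frac{\delta}{T}}\Big\}\\
&\le\frac{\lfloor nt\rfloor+1}{n}\frac{\delta}{T}=\frac{\lfloor nt\rfloor}{n}\frac{\delta}{T}+\frac{1}{n}\frac{\delta}{T}\le 2\delta
\end{aligned}
\]

And now the second with $nt<n_0$.

\[
|Z_n(t)|\le\frac{1}{n}\max\{|S_{\lfloor nt\rfloor}|,|S_{\lfloor nt\rfloor+1}|\}\le\frac{1}{n}M\le\delta
\]

So almost all paths converge under the scaling and in the sup-norm to the function $t\mapsto 0$.

\[
1=P\Big(\lim_{n\to\infty}\sup_{t\in[0,T]}|Z_n(t)|=0\Big)
\]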
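The uniform convergence can be illustrated by simulation. The sketch below is not part of the argument: the function name `sup_scaled_walk`, the choice of centred uniform increments on $[-1,1]$ and the seed are assumptions. It evaluates $\sup_{t\in[0,T]}|Z_n(t)|=\frac{1}{n}\max_{k\le\lfloor nT\rfloor}|S_k|$ for one simulated path.

```python
import random


def sup_scaled_walk(n, T=1.0, seed=0):
    """sup over t in [0, T] of |S_floor(nt)| / n for one path of centred uniform steps."""
    rng = random.Random(seed)
    s, worst = 0.0, 0.0
    for _ in range(int(n * T)):
        s += rng.uniform(-1.0, 1.0)     # centred increment X_k
        worst = max(worst, abs(s))
    return worst / n


for n in (100, 1000, 20000):
    print(n, sup_scaled_walk(n))
```

Since the running maximum of the walk grows only like $\sqrt{n}$, the printed values are typically of order $n^{-1/2}$.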

In another notation with X1:= τ1−E[τ] we have Sn=Pn

k=1 τk−nE[τ]

and Zn(t) = 1

nS⌊nt⌋=1

nP⌊nt⌋

k=1 τk−⌊nt⌋

nE[τ]. Since the difference between

⌊nt⌋

nE[τ] and tE[τ] is bounded by E[τ]

nwe get the almost sure convergence:

1 = P( lim

n→∞ sup

t∈[0,T ]

⌊nt⌋

k=1

τk−tE[τ]|= 0)
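The convergence can be observed numerically. The following is a minimal sketch, assuming exponentially distributed $\tau_k$ with mean 1 and $T = 1$ (both are illustrative choices, not requirements of the argument); the function name `sup_deviation` is hypothetical. It evaluates $\sup_{t \in [0,T]} |\frac{1}{n} \sum_{k=1}^{\lfloor nt \rfloor} \tau_k - tE[\tau]|$ exactly, using that the scaled sum is constant on each interval $[\frac{k}{n}, \frac{k+1}{n})$ while $tE[\tau]$ is linear.

```python
import random

def sup_deviation(n, mean=1.0, T=1.0, seed=0):
    """Sup over [0, T] of |(1/n) * sum_{k <= floor(nt)} tau_k - t * mean| for one sampled path."""
    rng = random.Random(seed)
    partial = 0.0            # running sum tau_1 + ... + tau_k
    worst = 0.0
    steps = int(n * T)
    for k in range(steps + 1):
        # On [k/n, (k+1)/n) the scaled sum is constant while t * mean grows linearly,
        # so the supremum on this piece is attained at one of its end points.
        left = abs(partial / n - (k / n) * mean)
        right = abs(partial / n - min((k + 1) / n, T) * mean)
        worst = max(worst, left, right)
        partial += rng.expovariate(1.0 / mean)
    return worst

for n in (10, 100, 1000, 10000):
    print(n, sup_deviation(n))
```

For a single path the deviation need not decrease monotonically in $n$, but it is expected to be of order $n^{-1/2}$ for large $n$.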

7.1.1 Implication for the counting process

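Define the partial sums process and the interpolated partial sums process with respect to $\tau_1, \tau_2, \dots$

\[ Y_n : t \mapsto \frac{1}{n} \sum_{k=1}^{\lfloor nt \rfloor} \tau_k \]

\[ \hat{Y}_n : t \mapsto Y_n(t) + \frac{nt - \lfloor nt \rfloor}{n} \tau_{\lfloor nt \rfloor + 1} \]

And note that

• $Y_n(\frac{k}{n}) = \hat{Y}_n(\frac{k}{n})$ for all $k \in \mathbb{N}$.

• $\hat{Y}_n(t) \ge Y_n(t)$ and $\hat{Y}_n(t) > Y_n(t)$ for $t \notin \{\frac{k}{n} \mid k \in \mathbb{N}\}$.

Claim 7.1.1. $Y_n \in \mathcal{U}_\epsilon(t \mapsto t\lambda) \Rightarrow \hat{Y}_n \in \mathcal{U}_\epsilon(t \mapsto t\lambda)$.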

Proof of 7.1.1: Note that if $Y_n$ is close to $t \mapsto t\lambda$ then necessarily $Y_n$ is close to $t \mapsto t\lambda$ whenever it jumps, that is in $\{\frac{k}{n} \mid k = 0, 1, \dots, nT\}$:

\[ Y_n \in \mathcal{U}_\epsilon(t \mapsto t\lambda) \Rightarrow \epsilon > \Big| Y_n\big(\tfrac{k}{n}\big) - \tfrac{k}{n}\lambda \Big| \quad (k = 1, \dots, nT) \]

But this is enough for $\hat{Y}_n$ to be close to $t \mapsto t\lambda$: $(t, \hat{Y}_n(t))$ is the interpolation between $(\frac{k}{n}, Y_n(\frac{k}{n}))$ and $(\frac{k+1}{n}, Y_n(\frac{k+1}{n}))$ for $k = \lfloor nt \rfloor$. And since $\mathcal{U}_\epsilon(t \mapsto t\lambda)$ is a convex subset of $\mathbb{R}^2$ it has to contain $(t, \hat{Y}_n(t))$. $\square_{7.1.1}$

Figure 7.1: Two realisations of $Y_n$ at the jump times $\{\frac{k}{n} \mid k = 1, \dots, 4\}$ and as interpolated functions $\hat{Y}_n$.

Claim 7.1.2. For the interpolated counting process a functional strong law

of large numbers holds.

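Proof of 7.1.2: From 7.1.1 the functional strong law of large numbers holds for the interpolated partial sums process too. $\hat{Y}^{-1} = \hat{N}$ so

\begin{align*}
\sup_{t \in [0,T]} |\hat{Y}(t) - tE[\tau]| < \epsilon &\Rightarrow \sup_{t \in [0, TE[\tau] - \epsilon]} \Big| \hat{Y}^{-1}(t) - t\frac{1}{E[\tau]} \Big| < \frac{\epsilon}{E[\tau]} \\
&\Leftrightarrow \sup_{t \in [0, TE[\tau] - \epsilon]} \Big| \hat{N}(t) - t\frac{1}{E[\tau]} \Big| < \frac{\epsilon}{E[\tau]}
\end{align*}

For an $\epsilon_0$ smaller than $\epsilon$ we get

\begin{align*}
1 &= P\Big( \lim_{n \to \infty} \sup_{t \in [0,T]} \Big| \frac{1}{n} \hat{Y}(nt) - tE[\tau] \Big| < \epsilon \Big) \\
&\le P\Big( \lim_{n \to \infty} \sup_{t \in [0, TE[\tau] - \epsilon]} \Big| \frac{1}{n} \hat{Y}^{-1}(nt) - t\frac{1}{E[\tau]} \Big| < \frac{\epsilon}{E[\tau]} \Big) \\
&\le P\Big( \lim_{n \to \infty} \sup_{t \in [0, TE[\tau] - \epsilon_0]} \Big| \frac{1}{n} \hat{N}(nt) - t\frac{1}{E[\tau]} \Big| < \frac{\epsilon}{E[\tau]} \Big)
\end{align*}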

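Claim 7.1.3. A functional strong law of large numbers holds for the uninterpolated counting process.

Let $0 < \epsilon' < \epsilon$ and $n \ge \frac{1}{\epsilon - \epsilon'}$.

\begin{align*}
\|N_n - (t \mapsto t\lambda)\| < \epsilon &\Leftarrow \|\hat{N}_n - (t \mapsto t\lambda)\| + \underbrace{\|N_n - \hat{N}_n\|}_{\le \frac{1}{n}} < \epsilon \\
&\Leftarrow \|\hat{N}_n - (t \mapsto t\lambda)\| < \epsilon - \frac{1}{n} \\
&\Leftarrow \|\hat{N}_n - (t \mapsto t\lambda)\| < \epsilon'
\end{align*}

and

\[ P\Big( \lim_{n \to \infty} \|N_n - (t \mapsto t\lambda)\| < \epsilon \Big) \ge P\Big( \lim_{n \to \infty} \|\hat{N}_n - (t \mapsto t\lambda)\| < \epsilon' \Big) = 1 \]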
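The bound $\|N_n - \hat{N}_n\| \le \frac{1}{n}$ used above can be checked on a sampled path. The following is a minimal sketch, assuming exponential inter event times with mean 1 and reading $\hat{N}$ as the inverse of the linearly interpolated (unscaled) partial sums process; the helper name `counting_processes` and the sample sizes are hypothetical choices for illustration.

```python
import random

def counting_processes(taus, t):
    """Return (N(t), N_hat(t)) for the inter event times taus.

    N(t) counts the events up to time t; N_hat(t) is the inverse of the
    linearly interpolated partial sums process, evaluated at t.
    """
    elapsed = 0.0
    for k, tau in enumerate(taus):
        if elapsed + tau > t:
            # k events have happened; N_hat adds the elapsed fraction of the next inter event time
            return k, k + (t - elapsed) / tau
        elapsed += tau
    raise ValueError("not enough inter event times to reach t")

rng = random.Random(1)
n = 200
taus = [rng.expovariate(1.0) for _ in range(20 * n)]
gaps = []
for i in range(1, 101):
    t = i / 100                                   # scaled time in (0, 1]
    count, interpolated = counting_processes(taus, n * t)
    gaps.append((interpolated - count) / n)       # N_hat_n(t) - N_n(t) after scaling by 1/n
print(max(gaps), 1 / n)
```

The gap is the fraction of the current inter event time that has already elapsed, divided by $n$, which is why it never exceeds $\frac{1}{n}$.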

7.2 Implications from exponential equivalence

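Claim 7.2.1. If $N$ and $N'$ are exponentially equivalent then

\[ \lim_{\epsilon \to 0} \lim_{n \to \infty} \frac{1}{n} \log P(N_n \in \mathcal{U}_\epsilon(\psi)) = \lim_{\epsilon \to 0} \lim_{n \to \infty} \frac{1}{n} \log P(N'_n \in \mathcal{U}_\epsilon(\psi)). \]

If $\epsilon > 0$ is fixed and we have

\[ \lim_{n \to \infty} \frac{1}{n} \log P(N_n \in \mathcal{U}_\epsilon(\psi)) = f(\epsilon) \]

for some $f$ continuous in $\epsilon$ then

\[ \lim_{n \to \infty} \frac{1}{n} \log P(N_n \in \mathcal{U}_\epsilon(\psi)) = \lim_{n \to \infty} \frac{1}{n} \log P(N'_n \in \mathcal{U}_\epsilon(\psi)). \]

Proof of 7.2.1: Assume $N$ and $N'$ are coupled such that their difference decays super exponentially fast as required for exponential equivalence. We have for any $\delta > 0$

\[ \lim_{n \to \infty} \frac{1}{n} \log P(\|N_n - N'_n\| > \delta) = -\infty \]

And we want for $U$ convex and open or closed and $U^\circ \ne \emptyset$

\[ \lim_{n \to \infty} \frac{1}{n} \log P(N_n \in U) = \lim_{n \to \infty} \frac{1}{n} \log P(N'_n \in U) \]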

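Let's try. Let $U^\delta$ be the closed blow up of $U$.

\begin{align*}
P(N_n \in U) &= P(N_n \in U, \|N_n - N'_n\| \le \delta) + P(N_n \in U, \|N_n - N'_n\| > \delta) \\
&\le P(N'_n \in U^\delta, \|N_n - N'_n\| \le \delta) + P(N_n \in U, \|N_n - N'_n\| > \delta) \\
&\le P(N'_n \in U^\delta) + P(\|N_n - N'_n\| > \delta)
\end{align*}

and thus

\begin{align*}
\lim_{n \to \infty} \frac{1}{n} \log P(N_n \in U) &\le \limsup_{n \to \infty} \frac{1}{n} \log \big( P(N'_n \in U^\delta) + P(\|N_n - N'_n\| > \delta) \big) \\
&= \max\Big\{ \limsup_{n \to \infty} \frac{1}{n} \log P(N'_n \in U^\delta), -\infty \Big\} \\
&= \limsup_{n \to \infty} \frac{1}{n} \log P(N'_n \in U^\delta)
\end{align*}

and since $\delta > 0$ was arbitrary, for $U = \mathcal{U}_\epsilon(\psi)$ and in the limit $\epsilon \to 0$:

\begin{align*}
\lim_{\epsilon \to 0} \lim_{n \to \infty} \frac{1}{n} \log P(N_n \in \mathcal{U}_\epsilon(\psi)) &\le \lim_{\epsilon \to 0} \lim_{\delta \to 0} \limsup_{n \to \infty} \frac{1}{n} \log P(N'_n \in \mathcal{U}_{\epsilon + \delta}(\psi)) \\
&= \lim_{\epsilon \to 0} \lim_{n \to \infty} \frac{1}{n} \log P(N'_n \in \mathcal{U}_\epsilon(\psi))
\end{align*}

By symmetry we are done.

In the case of $\epsilon > 0$ fixed with a bound continuous in $\epsilon$

\begin{align*}
\lim_{n \to \infty} \frac{1}{n} \log P(N_n \in \mathcal{U}_\epsilon(\psi)) &\le \lim_{\delta \to 0} \limsup_{n \to \infty} \frac{1}{n} \log P(N'_n \in \mathcal{U}_{\epsilon + \delta}(\psi)) \\
&= \lim_{n \to \infty} \frac{1}{n} \log P(N'_n \in \mathcal{U}_\epsilon(\psi))
\end{align*}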

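For a lower bound we need the $\delta$-interior of $U$, that is a non-empty subset of $U$ such that its $\delta$-blow up is still contained in $U$. Let this be $U^{-\delta}$.

\begin{align*}
P(N_n \in U) &= P(N'_n + (N_n - N'_n) \in U) \\
&\ge P(N'_n + (N_n - N'_n) \in U, \|N_n - N'_n\| \le \delta) \\
&\ge P(N'_n \in U^{-\delta}, \|N_n - N'_n\| \le \delta)
\end{align*}

and for $U = \mathcal{U}_\epsilon(\psi)$, $U^{-\delta} = \mathcal{U}_{\epsilon - \delta}(\psi)$

\begin{align*}
\lim_{n \to \infty} \frac{1}{n} \log P(N_n \in \mathcal{U}_\epsilon(\psi)) &\ge \lim_{\delta \to 0} \lim_{n \to \infty} \frac{1}{n} \log P(N'_n \in \mathcal{U}_{\epsilon - \delta}(\psi), \|N_n - N'_n\| \le \delta) \\
&= \lim_{\delta \to 0} \lim_{n \to \infty} \frac{1}{n} \log P(N'_n \in \mathcal{U}_{\epsilon - \delta}(\psi)) \\
&\overset{\text{continuity in } \epsilon}{=} \lim_{n \to \infty} \frac{1}{n} \log P(N'_n \in \mathcal{U}_\epsilon(\psi))
\end{align*}

$\square_{7.2.1}$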

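Lemma 7.2.2. For $\theta > 0$ and $\|\cdot\|$ the supremum norm over $[0,1]$:

\[ \lim_{n \to \infty} \frac{1}{n} \log E[e^{n\theta \|N_n - N'_n\|}] = 0 \]

Proof of 7.2.2: Let $\delta > 0$.

\[ E[e^{n\theta \|N_n - N'_n\|}] = E[e^{n\theta \|N_n - N'_n\|} \mathbb{1}_{\|N_n - N'_n\| > \delta}] + E[e^{n\theta \|N_n - N'_n\|} \mathbb{1}_{\|N_n - N'_n\| \le \delta}] \]

Investigate the summands separately, starting with the first. Apply Hölder:

\begin{align*}
E[e^{n\theta \|N_n - N'_n\|} \mathbb{1}_{\|N_n - N'_n\| > \delta}] &\le E[e^{np\theta \|N_n - N'_n\|}]^{\frac{1}{p}} \, E[\mathbb{1}_{\|N_n - N'_n\| > \delta}]^{\frac{1}{q}} \\
&\le E[e^{np\theta (N_n(1) + \delta)}]^{\frac{1}{p}} \, E[\mathbb{1}_{\|N_n - N'_n\| > \delta}]^{\frac{1}{q}}
\end{align*}

with some $p \ge 1$ and $\frac{1}{p} + \frac{1}{q} = 1$. Since $\Gamma(\cdot)$ is finite on all of $\mathbb{R}$:

\begin{align*}
\limsup_{n \to \infty} \frac{1}{n} \log E[e^{n\theta \|N_n - N'_n\|} \mathbb{1}_{\|N_n - N'_n\| > \delta}] &\le \limsup_{n \to \infty} \frac{1}{n} \log E[e^{p\theta N(n)}]^{\frac{1}{p}} + \limsup_{n \to \infty} \frac{1}{n} \log E[\mathbb{1}_{\|N_n - N'_n\| > \delta}]^{\frac{1}{q}} \\
&= \frac{1}{p} \Gamma(p\theta) - \infty \\
&= -\infty
\end{align*}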

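Continuing for the second.

\[ \limsup_{n \to \infty} \frac{1}{n} \log E[e^{n\theta \|N_n - N'_n\|} \mathbb{1}_{\|N_n - N'_n\| \le \delta}] \le \limsup_{n \to \infty} \frac{1}{n} \log e^{n\theta\delta} = \theta\delta \]

Thus

\[ \lim_{\delta \to 0} \limsup_{n \to \infty} \frac{1}{n} \log E[e^{n\theta \|N_n - N'_n\|}] \le \lim_{\delta \to 0} \max\{-\infty, \delta\theta\} = 0 \]

But $\theta > 0$ and $\|\cdot\| \ge 0$ makes $E[e^{n\theta \|N_n - N'_n\|}] \ge 1$ and we cannot have decay. So the limit has to be $= 0$. $\square_{7.2.2}$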

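Claim 7.2.3. If $N, N'$ are exponentially equivalent then they have the same lmgfs:

\[ \lim_{t \to \infty} \frac{1}{t} \log E[e^{\theta N_t}] = \lim_{t \to \infty} \frac{1}{t} \log E[e^{\theta N'_t}] \]

\[ \lim_{t \to \infty} \frac{1}{t} \log E[e^{\langle \theta, N_t \rangle}] = \lim_{t \to \infty} \frac{1}{t} \log E[e^{\langle \theta, N'_t \rangle}] \]

Proof of 7.2.3: For $\theta > 0$. Upper bound: Let $p, q > 1$ such that $\frac{1}{p} + \frac{1}{q} = 1$.

\begin{align*}
E[e^{\theta N_t}] = E[e^{\theta N'_t + \theta (N_t - N'_t)}] &\le E[e^{\theta N'_t + \theta \|N - N'\|_{[0,t]}}] \\
&\le E[e^{p\theta N'_t}]^{\frac{1}{p}} \, E[e^{q\theta \|N - N'\|_{[0,t]}}]^{\frac{1}{q}}
\end{align*}

Under the exponential scaling

\begin{align*}
\frac{1}{t} \log E[e^{\theta N_t}] &\le \frac{1}{tp} \log E[e^{p\theta N'_t}] + \frac{1}{tq} \log E[e^{tq\theta \frac{1}{t} \|N - N'\|_{[0,t]}}] \\
&\to \frac{1}{p} \Gamma(p\theta) + 0 \quad (t \to \infty)
\end{align*}

by application of 7.2.2. As $p$ is arbitrary with only $p > 1$ we let $p \to 1$. We have $D(\Gamma) = \mathbb{R}$ and continuity of $\Gamma(\cdot)$ from finiteness and convexity. We get the upper bound $\Gamma(\theta)$. Lower bound, still $\theta > 0$: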

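\begin{align*}
E[e^{\theta N_t}] &= E[e^{\theta N'_t + \theta (N_t - N'_t)}] \\
&\ge E[e^{\theta N'_t - \theta \|N - N'\|_{[0,t]}}] \\
&\ge E[e^{\theta N'_t - \theta \|N - N'\|_{[0,t]}} \mathbb{1}_{\|N - N'\| \le t\delta}] + E[e^{\theta N'_t - \theta \|N - N'\|_{[0,t]}} \mathbb{1}_{\|N - N'\| > t\delta}] \\
&\ge E[e^{\theta N'_t - \theta t\delta} \mathbb{1}_{\|N - N'\| \le t\delta}]
\end{align*}

Under the exponential scaling

\begin{align*}
\lim_{t \to \infty} \frac{1}{t} \log E[e^{\theta N_t}] &\ge \lim_{t \to \infty} \frac{1}{t} \log E[e^{\theta N'_t} \mathbb{1}_{\|N - N'\| \le t\delta}] - \theta\delta \\
&= \Gamma(\theta) - \delta\theta
\end{align*}

As we had $\delta > 0$ arbitrarily small, the lower bound is done.

For $\theta < 0$ we apply basically the same tool, and in general dimensions for $\theta \in \mathbb{R}^d$ with $|\cdot|$ a norm in $\mathbb{R}^d$ and $\|\cdot\|_{[0,t]}$ the supremum norm over $[0, t]$. Bounds work similarly starting from

\begin{align*}
E[e^{\langle \theta, N_t \rangle}] = E[e^{\langle \theta, N'_t \rangle + \langle \theta, N_t - N'_t \rangle}] &\le E[e^{\langle \theta, N'_t \rangle + |\theta| \, |N_t - N'_t|}] \\
&\le E[e^{\langle \theta, N'_t \rangle + |\theta| \, \|N - N'\|_{[0,t]}}]
\end{align*}

$\square_{7.2.3}$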

7.3 Fenchel-Legendre transforms

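Some simple transformations. All functions are assumed to be convex.

\[ (cf)^*(x) = c f^*\Big(\frac{x}{c}\Big) \tag{7.1} \]

\[ (f + g)^*(x) = \inf_{\alpha} f^*(x - \alpha) + g^*(\alpha) \tag{7.2} \]

\[ (f \circ g)^*(x) = \inf_{\alpha \in \mathbb{R}} \alpha \, g^*\Big(\frac{x}{\alpha}\Big) + f^*(\alpha) \tag{7.3} \]

And as a combination of the above

\begin{align*}
(h + f \circ g)^*(x) &\le \inf_{\alpha \in \mathbb{R}} h^*(x - \alpha) + (f \circ g)^*(\alpha) \\
&\le \inf_{\alpha \in [0,1]} \inf_{\beta} h^*(x - \alpha) + \beta \, g^*\Big(\frac{\alpha}{\beta}\Big) + f^*(\beta)
\end{align*}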

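Should we prove them?

\[ (cf)^*(x) = \sup_{\theta} \theta x - c f(\theta) = c \sup_{\theta} \theta \frac{x}{c} - f(\theta) = c f^*\Big(\frac{x}{c}\Big) \]

\begin{align*}
(f + g)^*(x) &= \sup_{\theta} \theta x - f(\theta) - g(\theta) \\
&= \sup_{\theta} \theta (x - \alpha) - f(\theta) + \alpha\theta - g(\theta) \\
&\le \sup_{\theta} \theta (x - \alpha) - f(\theta) + \sup_{\theta} \alpha\theta - g(\theta) \\
&= f^*(x - \alpha) + g^*(\alpha)
\end{align*}

With equality if the optimiser is the same in $f^*$ and $g^*$. Also we can optimise the bound.

\[ (f + g)^*(x) \le \inf_{\alpha} f^*(x - \alpha) + g^*(\alpha) \]

If there is an optimal $\theta$ for $(f + g)^*(x)$ then

\[ f'(\theta) + g'(\theta) = x \;\Leftrightarrow\; \begin{cases} f'(\theta) = x - g'(\theta) \\ g'(\theta) = g'(\theta) \end{cases} \;\Leftrightarrow\; \begin{cases} f'(\theta) = x - \alpha \\ g'(\theta) = \alpha \end{cases} \text{ for some } \alpha \]

and the same $\theta$ is the optimiser in $f^*(x - \alpha)$ and $g^*(\alpha)$. We got the equality of (7.2). If an optimal $\theta$ does not exist for finite $(f + g)^*(x)$ we probably get equality by approximation. If $(f + g)^*(x) = \infty$ there is nothing to do. The claim and the proof do not rely on $x \in \mathbb{R}$; they hold for general $f, g : \mathbb{R}^m \to \mathbb{R}$ and $x, \alpha \in \mathbb{R}^m$.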

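Next one. We have $g : \mathbb{R}^m \to \mathbb{R}$ and $f : \mathbb{R} \to \mathbb{R}$.

\begin{align*}
(f \circ g)^*(x) &= \sup_{\theta} \theta x - f(g(\theta)) \\
&= \sup_{\theta} \theta x - \alpha g(\theta) + \alpha g(\theta) - f(g(\theta)) \\
&\le \sup_{\theta} \theta x - \alpha g(\theta) + \sup_{\theta} \alpha g(\theta) - f(g(\theta)) \\
&= \alpha \sup_{\theta} \theta \frac{x}{\alpha} - g(\theta) + \sup_{\xi \in g(\mathbb{R})} \alpha\xi - f(\xi) \\
&\le \alpha \, g^*\Big(\frac{x}{\alpha}\Big) + f^*(\alpha)
\end{align*}

We argue as before: If there is an optimising $\theta$ for $(f \circ g)^*(x)$ it will satisfy $(f \circ g)'(\theta) = x$, and if we set $\alpha := f'(g(\theta))$ ($\in \mathbb{R}$) then we get $g'(\theta) = \frac{x}{\alpha}$ and have obtained optimisers for $f^*(\alpha)$ and $g^*(\frac{x}{\alpha})$. Again we got equality as in (7.3). $\square$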
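Identities (7.1) and (7.2) can be sanity-checked numerically. The following is a minimal sketch, assuming the illustrative choices $f(\theta) = \theta^2/2$, $g(\theta) = \theta^2$ and $c = 3 > 0$, for which $f^*(x) = x^2/2$ and $g^*(x) = x^2/4$ are known in closed form; the helper names `conjugate` and `inf_convolution` and the grid bounds are our own. The transform on the left-hand sides is approximated by a supremum over a grid.

```python
def conjugate(func, x, lo=-10.0, hi=10.0, steps=4000):
    """Grid approximation of the Fenchel-Legendre transform sup_theta theta * x - func(theta)."""
    best = float("-inf")
    for i in range(steps + 1):
        theta = lo + (hi - lo) * i / steps
        best = max(best, theta * x - func(theta))
    return best

def f(theta):
    return 0.5 * theta * theta      # f*(x) = x^2 / 2

def g(theta):
    return theta * theta            # g*(x) = x^2 / 4

def f_star(x):
    return 0.5 * x * x

def g_star(x):
    return 0.25 * x * x

def inf_convolution(x, lo=-10.0, hi=10.0, steps=4000):
    """Grid approximation of inf_alpha f*(x - alpha) + g*(alpha), the right-hand side of (7.2)."""
    best = float("inf")
    for i in range(steps + 1):
        alpha = lo + (hi - lo) * i / steps
        best = min(best, f_star(x - alpha) + g_star(alpha))
    return best

x, c = 1.5, 3.0
lhs_scale = conjugate(lambda theta: c * f(theta), x)      # (c f)*(x)
rhs_scale = c * f_star(x / c)                             # c f*(x / c)
lhs_sum = conjugate(lambda theta: f(theta) + g(theta), x) # (f + g)*(x)
rhs_sum = inf_convolution(x)                              # inf-convolution of f* and g*
print(lhs_scale, rhs_scale, lhs_sum, rhs_sum)
```

For these quadratics both sides of (7.1) equal $x^2/(2c)$ and both sides of (7.2) equal $x^2/6$, so at $x = 1.5$ all four printed values should be close to 0.375. The check covers one smooth example only and does not replace the argument above.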

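Claim 7.3.1. For $g = \pi_j$ in (7.3):

\[ (f \circ \pi_j)^*(\alpha) = \begin{cases} f^* \circ \pi_j(\alpha), & \alpha = \pi_j(\alpha) \\ \infty, & \text{else.} \end{cases} \]

Define $\Pi_j^\perp$ as the subspace of $\mathbb{R}^d$ perpendicular to $\pi_j(\mathbb{R}^d)$ such that $\mathbb{R}^d = \pi_j(\mathbb{R}^d) \oplus \Pi_j^\perp$ and write

\[ \alpha = \pi_j(\alpha) + \alpha^\perp, \quad \alpha^\perp \in \Pi_j^\perp \]

\[ \theta = \pi_j(\theta) + \theta^\perp, \quad \theta^\perp \in \Pi_j^\perp \]

This implies $\langle \alpha, \theta \rangle = \langle \pi_j(\alpha), \pi_j(\theta) \rangle + \langle \alpha^\perp, \theta^\perp \rangle$ which we apply in the definition of the F-L transform:

\begin{align*}
(f \circ \pi_j)^*(\alpha) &= \sup_{\theta \in \mathbb{R}^d} \langle \pi_j(\alpha), \pi_j(\theta) \rangle - f \circ \pi_j(\theta) + \langle \alpha^\perp, \theta^\perp \rangle \\
&= \underbrace{\sup_{\theta^{(1)} \in \pi_j(\mathbb{R}^d)} \langle \pi_j(\alpha), \theta^{(1)} \rangle - f(\theta^{(1)})}_{= f^*(\pi_j(\alpha))} + \underbrace{\sup_{\theta^{(2)} \in \Pi_j^\perp} \langle \alpha^\perp, \theta^{(2)} \rangle}_{\in \{0, \infty\}}
\end{align*}

Iff $\alpha = \pi_j(\alpha)$ then $\alpha^\perp = 0$, implying the claimed statement. $\square$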

7.4 The shifted inter event time

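Let $\tau$ be an inter event time with distribution function $F$ and density $f$. We denote by $f_{+x}$ the shifted density and by $F_{+x}$ its distribution function.

\[ f_{+x}(t) = \frac{f(x + t)}{F^c(x)} \]

\[ F^c_{+x}(t) = \int_{s=t}^{\infty} f_{+x}(s) \, ds = \int_{s=t}^{\infty} \frac{f(s + x)}{F^c(x)} \, ds = \int_{s=t+x}^{\infty} \frac{f(s)}{F^c(x)} \, ds = \frac{F^c(t + x)}{F^c(x)} \]

\[ h_{+x}(t) = \frac{f_{+x}(t)}{F^c_{+x}(t)} = \frac{f(x + t)}{F^c(x + t)} = h(x + t) \]

\[ H_{+x}(t) = \int_{s=0}^{t} h_{+x}(s) \, ds = \int_{s=0}^{t} h(s + x) \, ds = \int_{s=x}^{t+x} h(s) \, ds = H(t + x) - H(x) \]

and $F_{+x}$ matches $H_{+x}$ by $F^c_{+x} = e^{-H_{+x}}$. The Cesaro limit for the hazard function of the shifted distribution does not change:

\[ LC(h_{+x}) = \lim_{t \to \infty} \frac{H_{+x}(t)}{t} = \lim_{t \to \infty} \frac{H(t + x) - H(x)}{t} = \lim_{t \to \infty} \frac{H(t + x)}{t} - 0 = \lim_{t \to \infty} \frac{H(t)}{t} \]

and immediately $LC(h) = LC(h_{+x})$ and $D(\Lambda) = D(\Lambda_{+x})$.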

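We get finiteness of all moments of $\tau_{+x}$. We calculate the mean for unbounded $\tau$:

\[ E[\tau_{+x}] = \int_{s=0}^{\infty} F^c_{+x}(s) \, ds = \int_{s=0}^{\infty} e^{-H(x + s) + H(x)} \, ds = e^{H(x)} \int_{s=x}^{\infty} e^{-H(s)} \, ds \]

If $\tau$ is bounded by $b$, say, then $x < b$ is required and $\tau_{+x} \in (0, b - x)$:

\[ E[\tau_{+x}] = \int_{s=0}^{b-x} F^c_{+x}(s) \, ds = \int_{s=0}^{b-x} e^{-H(x + s) + H(x)} \, ds = e^{H(x)} \int_{s=x}^{b} e^{-H(s)} \, ds \]

In both cases ($\tau$ bounded or not) we got a product of something large and something small. What happens as $x \to \infty$ (or $x \to b$)? Writing the limit for either case, with $b = \infty$ for unbounded $\tau$,

\[ \lim_{x \to b} E[\tau_{+x}] = \lim_{x \to b} \frac{\int_{s=x}^{b} e^{-H(s)} \, ds}{e^{-H(x)}} = \lim_{x \to b} \frac{-e^{-H(x)}}{-h(x) e^{-H(x)}} = \lim_{x \to b} \frac{1}{h(x)} \tag{7.4} \]

which only makes sense if $\lim_{t \to \infty} h(t)$ exists.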
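The behaviour in (7.4) can be illustrated on two explicit hazard functions. The following is a minimal sketch, assuming the illustrative choices $H(t) = t^2$ (hazard $h(t) = 2t$, so $h(x) \to \infty$) and $H(t) = 2t$ (constant hazard, the exponential case); the function names are hypothetical. For $H(t) = t^2$ the tail integral is available through the complementary error function, $\int_x^\infty e^{-s^2} ds = \frac{\sqrt{\pi}}{2} \operatorname{erfc}(x)$.

```python
import math

def mean_shifted_weibull(x):
    """E[tau_{+x}] = e^{H(x)} * int_x^inf e^{-H(s)} ds for H(t) = t^2, i.e. hazard h(t) = 2t."""
    # int_x^inf e^{-s^2} ds = (sqrt(pi) / 2) * erfc(x)
    return math.exp(x * x) * 0.5 * math.sqrt(math.pi) * math.erfc(x)

def mean_shifted_exponential(x, rate=2.0):
    """Same formula for H(t) = rate * t; here int_x^inf e^{-rate s} ds = e^{-rate x} / rate."""
    return math.exp(rate * x) * math.exp(-rate * x) / rate

for x in (1.0, 3.0, 5.0):
    # columns: x, E[tau_{+x}] for h(t) = 2t, the comparison value 1 / h(x), E[tau_{+x}] for h = 2
    print(x, mean_shifted_weibull(x), 1.0 / (2.0 * x), mean_shifted_exponential(x))
```

In the first example the mean of the shifted time approaches $1/h(x)$ and tends to 0, while in the exponential example it stays at $1/h = 0.5$ for every shift $x$, which is the memoryless property.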

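Claim 7.4.1. If $\lim_{x \to \infty} h(x)$ exists then $\lim_{x \to \infty} E[\tau_{+x}] = \frac{1}{LC(h)}$.

• For $\tau$ unbounded but LD-bounded: $LC(h) = \infty$. If $\liminf_{x \to \infty} h(x) = \infty$ then $\lim_{x \to \infty} E[\tau_{+x}] = 0$.

• If $\tau$ is bounded by $b$: $LC(h) = \infty$ and if $\liminf_{x \to b} h(x) = \infty$ then $\lim_{x \to b} E[\tau_{+x}] = 0$.

• If $\tau$ is not LD-bounded and $\lim_{x \to \infty} h(x)$ exists in $(0, \infty)$ (and is $= LC(h)$) then $\lim_{x \to \infty} E[\tau_{+x}] = \frac{1}{LC(h)}$.

\begin{align*}
E[e^{\theta \tau_{+x}}] &= \int_{s=0}^{\infty} e^{\theta s} \frac{f(s + x)}{F^c(x)} \, ds = \frac{1}{F^c(x)} e^{-\theta x} \int_{s=0}^{\infty} e^{\theta (s + x)} f(s + x) \, ds \\
&= \frac{1}{F^c(x)} e^{-\theta x} \int_{s=x}^{\infty} e^{\theta s} f(s) \, ds = \frac{1}{F^c(x) e^{\theta x}} \int_{s=x}^{\infty} e^{\theta s} f(s) \, ds \\
&= \frac{1}{F^c(x) e^{\theta x - \Lambda(\theta)}} \int_{s=x}^{\infty} \underbrace{e^{\theta s - \Lambda(\theta)} f(s)}_{= f_\theta(s)} \, ds \\
&= \frac{1}{F^c(x) e^{\theta x - \Lambda(\theta)}} F^c_\theta(x) = \frac{F^c_\theta(x)}{F^c(x)} e^{-\theta x + \Lambda(\theta)} \tag{7.5}
\end{align*}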
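Formula (7.5) can be checked on the exponential distribution, where every ingredient is explicit. The following is a minimal sketch, assuming $f(s) = \lambda e^{-\lambda s}$ with $\lambda = 2$, $\theta = 0.5 < \lambda$ and $x = 1.3$ (all illustrative choices); the function names, the truncation point and the step count of the trapezoid rule are hypothetical. Here $\Lambda(\theta) = \log \frac{\lambda}{\lambda - \theta}$ and the twisted law $f_\theta$ is exponential with rate $\lambda - \theta$.

```python
import math

def lhs_mgf_shifted(theta, x, rate, upper=40.0, steps=40000):
    """Trapezoid approximation of int_0^inf e^{theta s} f(s + x) / F^c(x) ds for exponential f."""
    def integrand(s):
        density = rate * math.exp(-rate * (s + x))   # f(s + x)
        tail = math.exp(-rate * x)                   # F^c(x)
        return math.exp(theta * s) * density / tail
    h = upper / steps
    total = 0.5 * (integrand(0.0) + integrand(upper))
    for i in range(1, steps):
        total += integrand(i * h)
    return total * h

def rhs_formula(theta, x, rate):
    """Right-hand side of (7.5): F^c_theta(x) / F^c(x) * e^{-theta x + Lambda(theta)}."""
    log_mgf = math.log(rate / (rate - theta))        # Lambda(theta), finite for theta < rate
    twisted_tail = math.exp(-(rate - theta) * x)     # F^c_theta(x), twisted law has rate - theta
    tail = math.exp(-rate * x)                       # F^c(x)
    return twisted_tail / tail * math.exp(-theta * x + log_mgf)

lhs = lhs_mgf_shifted(0.5, 1.3, 2.0)
rhs = rhs_formula(0.5, 1.3, 2.0)
print(lhs, rhs)
```

Both values should be close to $\frac{\lambda}{\lambda - \theta} = \frac{4}{3}$, independently of $x$, since the shifted exponential time is again exponential with the same rate.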

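Claim 7.4.2. Shifting and exponentially twisting commute.

\begin{align*}
(f_{+x})_\beta(t) &= f_{+x}(t) e^{\beta t - \Lambda_{+x}(\beta)} = \frac{f(t + x)}{F^c(x)} e^{\beta t - \Lambda_{+x}(\beta)} \\
&= \frac{1}{F^c(x)} e^{-\beta x - \Lambda_{+x}(\beta) + \Lambda(\beta)} f(t + x) e^{\beta (t + x) - \Lambda(\beta)} \\
&= \frac{1}{F^c(x) e^{\beta x - \Lambda(\beta)}} e^{-\Lambda_{+x}(\beta)} f_\beta(t + x) \\
&\overset{(7.5)}{=} \frac{1}{F^c(x) e^{\beta x - \Lambda(\beta)}} \cdot \frac{F^c(x) e^{\beta x - \Lambda(\beta)}}{F^c_\beta(x)} f_\beta(t + x) \\
&= \frac{f_\beta(x + t)}{F^c_\beta(x)} = (f_\beta)_{+x}(t)
\end{align*}

Claim 7.4.3. If $\lim_{x \to \infty} h(x) = \infty$ then $\lim_{x \to \infty} E[e^{\theta \tau_{+x}}] = 1$.

We generally have $e^{\tilde{\Lambda}(\theta)} = \frac{e^{\Lambda(\theta)} - 1}{\Lambda'(0) \theta}$, where $\Lambda'(0) = E[\tau]$, and

\begin{align*}
\lim_{x \to \infty} E[e^{\theta \widetilde{(\tau_{+x})}}] &= \lim_{x \to \infty} \frac{\int_{s=x}^{\infty} e^{\theta s - H(s)} \, ds}{e^{\theta x} \int_{s=x}^{\infty} e^{-H(s)} \, ds} = \lim_{x \to \infty} \frac{-e^{\theta x - H(x)}}{\theta e^{\theta x} \int_{s=x}^{\infty} e^{-H(s)} \, ds - e^{\theta x - H(x)}} \\
&= \lim_{x \to \infty} \frac{-1}{\theta e^{H(x)} \int_{s=x}^{\infty} e^{-H(s)} \, ds - 1} = \lim_{x \to \infty} \frac{-1}{\theta E[\tau_{+x}] - 1} \tag{7.6}
\end{align*}

If $\lim_{x \to \infty} h(x) = \infty$ then $\lim_{x \to \infty} E[\tau_{+x}] = 0$ and $\lim_{x \to \infty} E[e^{\theta \widetilde{(\tau_{+x})}}] \overset{(7.6)}{=} 1$ for all $\theta \in \mathbb{R}$.

\[ E[e^{\theta \tau_{+x}}] = 1 + E[\tau_{+x}] \, \theta \, E[e^{\theta \widetilde{(\tau_{+x})}}] \]

\[ \lim_{x \to \infty} E[e^{\theta \tau_{+x}}] = 1 + \lim_{x \to \infty} E[\tau_{+x}] \, \theta \, E[e^{\theta \widetilde{(\tau_{+x})}}] = 1 + 0 \cdot \theta \cdot 1 = 1 \qquad \square \]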

7.5 Large deviations and other tools

Theorem 7.5.1 (Arzelà-Ascoli, theorem A.51 of [23]). The set $A$ has compact closure in $C([0, T], \mathbb{R}^d)$ equipped with the sup-norm if and only if

• The initial points are bounded: $\sup_{\vec{x} \in A} |\vec{x}(0)| < \infty$, and

• The functions in $A$ are equicontinuous, that is, for every $t$ and $\epsilon$ there exists a $\delta$ so that, whenever $|t - s| < \delta$ we have $|\vec{x}(t) - \vec{x}(s)| < \epsilon$ for all $\vec{x} \in A$.

Theorem 7.5.2 (Contraction principle, theorem 4.2.1 of [5]). Let $\mathcal{X}$ and $\mathcal{Y}$ be Hausdorff topological spaces and $f : \mathcal{X} \to \mathcal{Y}$ a continuous function. Consider a good rate function $I : \mathcal{X} \to [0, \infty]$.

(a) For each $y \in \mathcal{Y}$, define

\[ I'(y) = \inf\{ I(x) : x \in \mathcal{X}, \, y = f(x) \}. \]

Then $I'$ is a good rate function on $\mathcal{Y}$, where as usual the infimum over the empty set is taken as $\infty$.

(b) If $I$ controls the LDP associated with a family of probability measures $\{\mu_\epsilon\}$ on $\mathcal{X}$, then $I'$ controls the LDP associated with the family of probability measures $\{\mu_\epsilon \circ f^{-1}\}$ on $\mathcal{Y}$.

The following version of the Gärtner-Ellis theorem is equivalent to theorem 2.3.6 of [5].

Theorem 7.5.3 (Gärtner and Ellis). Let $(Z_n; n \in \mathbb{N})$ be a sequence of random vectors in $\mathbb{R}^d$ with $\mu_n$ the law of $Z_n$. Assume that the limit

\[ \Lambda(\lambda) = \lim_{n \to \infty} \frac{1}{n} \log E[e^{\langle \lambda, Z_n \rangle}] \]

exists as an extended real number. Assume further that $0 \in D(\Lambda)^\circ$.

(a) For any closed set $F$,

\[ \limsup_{n \to \infty} \frac{1}{n} \log \mu_n(F) \le -\inf_{x \in F} \Lambda^*(x). \]

(b) For any open set $G$,

\[ \liminf_{n \to \infty} \frac{1}{n} \log \mu_n(G) \ge -\inf_{x \in G \cap \mathcal{F}} \Lambda^*(x), \]

where $\mathcal{F}$ is the set of exposed points of $\Lambda^*$ whose exposing hyperplane belongs to $D(\Lambda)^\circ$.

(c) If $\Lambda$ is an essentially smooth, lower semicontinuous function, then the large deviation principle holds for $(Z_n; n \in \mathbb{N})$ with the good rate function $\Lambda^*(\cdot)$.

7.6 Assumptions

Chapter 2

2.2.2: Inter event times a.s. strictly positive. Lmgfs open and not bounded

from below.

2.2.13: Existence of density, no harsh used better than new.

2.4.2 : Existence of positive Cesaro mean for the hazard rate.

Chapter 5

5.0.2: Summarises previous assumptions.

5.1.9: Networks are open and have no immediate feedback.


Bibliography

[1] Abraham Berman and Robert J. Plemmons. Non-negative matrices in the mathematical sciences. Academic Press, 1979.

[2] Dimitri P. Bertsekas. Nonlinear Programming. Athena Scientific, 1995.

[3] Hong Chen and Avishai Mandelbaum. Discrete flow networks: Bottleneck analysis and fluid approximations. Mathematics of Operations Research, 16:408–446, 1991.

[4] D.J. Daley and D. Vere-Jones. An Introduction to the Theory of Point Processes. Springer, 1988.

[5] Amir Dembo and Ofer Zeitouni. Large Deviations Techniques and Applications. Springer, 1993.

[6] Nicholas G. Duffield and Neil O'Connell. Large deviations and overflow probabilities for the general single-server queue with applications. Math. Proc. Cam. Phil. Soc., 1995.

[7] Paul Dupuis and Richard S. Ellis. The large deviation principle for a general class of queueing systems. Transactions of the American Mathematical Society, 347(8), 1995.

[8] Robert Foley and David McDonald. Large deviations of a modified Jackson network: Stability and rough asymptotics. The Annals of Applied Probability, 15, 2005.

[9] Ayalvadi Ganesh, Neil O'Connell, and Damon Wischik. Big Queues. Springer, 2004.

[10] Peter Glynn and Ward Whitt. Large deviations behaviour of counting processes and their inverses. Queueing Systems, 17:107–128, 1994.

[11] J.B. Goodman and William Massey. The non-ergodic Jackson network. Journal of Applied Probability, 21:860–869, 1984.

[12] Irina Ignatiouk-Robert. Large deviations of Jackson networks. Annals of Applied Probability, 10:962–1001, 2000.

[13] Irina Ignatiouk-Robert. Large deviations for processes with discontinuous statistics. Annals of Probability, 33:1479–1508, 2005.

[14] James R. Jackson. Networks of waiting lines. Operations Research, 5:518–521, 1957.

[15] Anatolii Puhalskii. Large deviation analysis of the single server queue. Queueing Systems, 21:5–66, 1995.

[16] Anatolii Puhalskii. The action functional for the Jackson network. Markov Processes and Related Fields, 13:99–136, 2007.

[17] Anatolii Puhalskii and Ward Whitt. Functional large deviation principles for first-passage-time processes. The Annals of Applied Probability, 7(2):362–381, 1997.

[18] Philippe Robert. Stochastic Networks and Queues. Springer, 2003.

[19] R. Tyrrell Rockafellar. Convex Analysis. Princeton University Press, 1970.

[20] Mark Rodgers-Lee. The large deviations of random time-changes in a metric topology, 2003. Master Thesis, University of Dublin.

[21] H. L. Royden. Real Analysis. The Macmillan Company, 1968.

[22] Raymond Russell. The large deviations of random timechanges, 1998. PhD thesis, Trinity College Dublin.

[23] Adam Shwartz and Alan Weiss. Large Deviation for Performance Analysis. Chapman and Hall, 1995.

[24] Hermann Thorisson. Coupling, Stationarity, and Regeneration. Springer, 2000.

[25] James S. Vandergraft. A fluid flow model of networks of queues. Management Science, 29(10), 1983.

[26] Ronald W. Wolff. Stochastic Modeling and the Theory of Queues. Prentice-Hall, 1989.