Document [original]

Caching in Networks:

Non-Uniform Algorithms

and

Memory Capacity Constraints

Dissertation

Matthias Westermann

Acknowledgments

First of all, I would like to thank my advisor Prof. Dr. Friedhelm Meyer auf der Heide

for his great support. The atmosphere in his research group at Paderborn University

was very creative. In particular, Friedhelm always left me the freedom to do the work

in my own style and time. Furthermore, I would like to thank the other reviewers of

my thesis, Prof. Dr. Susanne Albers from Dortmund University and Prof. Dr. Burkhard

Monien from Paderborn University. I would also like to thank Prof. Dr. Johannes

Bl¨omer and Dr. Rainer Feldmann for being additional members of the examination

board.

I wouldliketo thankmy “co-advisor” Berthold V¨ockingfor a very valuable collab-

oration. It was a lot of fun to master several problems in theoretical computer science

and other major spheres of life with Berthold, such as solving a traveling salesperson

problem given by the touristic highlights in California and changing several car tires

with unconventional tools inthe forests of Massachusetts during one night. Also, many

thanksto ChristianSohler, Klaus Schr¨oder, Harald R¨acke, and Christof Krickfor many

great discussions about computer science and other significant aspects of society.

Matthias Westermann

Paderborn, Germany

November 2000

This research was supported by the DFG-Sonderforschungsbereich 376

“Massively Parallel Computing: Algorithms – Design Methods – Applications”

Contents

1 Introduction 1

1.1 Our contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.2 Cost model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.3 Competitive analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 6

1.4 Previous work on the access tree strategy . . . . . . . . . . . . . . . . 8

1.5 Other previous work . . . . . . . . . . . . . . . . . . . . . . . . . . 10

1.6 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2 Caching without Memory Capacity Constraints 15

2.1 The access tree strategy . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.2 Caching on trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.2.1 Uniform model . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.2.2 Non-uniform model . . . . . . . . . . . . . . . . . . . . . . 19

2.3 Applications of the access tree strategy . . . . . . . . . . . . . . . . . 26

2.3.1 Meshes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

2.3.2 Fat-trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

2.3.3 Complete networks . . . . . . . . . . . . . . . . . . . . . . . 39

2.3.4 The universal caching strategy . . . . . . . . . . . . . . . . . 42

2.3.5 Clustered Networks . . . . . . . . . . . . . . . . . . . . . . . 45

2.4 Extending the results to data-race free applications . . . . . . . . . . 52

3 Caching with Memory Capacity Constraints 55

3.1 The general framework . . . . . . . . . . . . . . . . . . . . . . . . . 55

3.2 The caching strategy for the memory tree . . . . . . . . . . . . . . . 57

3.2.1 Uniform model . . . . . . . . . . . . . . . . . . . . . . . . . 57

3.2.2 Non-uniform model . . . . . . . . . . . . . . . . . . . . . . 64

3.3 Applications of the general framework . . . . . . . . . . . . . . . . . 67

3.3.1 Meshes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

3.3.2 Fat-trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

3.3.3 Complete networks . . . . . . . . . . . . . . . . . . . . . . . 71

3.3.4 The universal caching strategy . . . . . . . . . . . . . . . . . 72

4 Summary and Discussion 75

Chapter 1

Introduction

In recent years, large computer systems connected by networks have become part of

our everyday live. A good example is the widespread use of the Internet and Internet-

related applications such as the World Wide Web (WWW). A basic functionality in

these systems is the interactive use of shared data objects that can be accessed from

each computer in the system. Examples for these objects are files in distributed file

systems for Ethernet-connected workstations, cache lines in virtual shared memory

systems for massively parallel computers, or pages in the WWW.

The dramatic growth of computer systems necessitates more and more an intel-

ligent management of shared data objects. The daily congestion in the Internet is a

clear evidence that the network becomes more and more a bottleneck as the size of

the system increases. The same effect can be observed in other distributed systems,

like massively parallel processors (MPPs) or networks of workstations (NOWs). The

traditional management of shared data objects based on centralized mega-servers with

special hardware meets its technical and economical limits. Thus, it is mandatory to

design new distributed managementstrategies thatguarantee thefree flow of theshared

data objects in the systems.

In general, large computer systems connected by networks consist of a set of com-

puters each having its own processor and memory module. These computers are usu-

ally connected by a relatively sparse network constructed out of links, i.e., point-to-

point connections, or buses, i.e., connections between two or more processors. The

performance of these systems depends on a number of parameters, including proces-

sor speed, memory capacity, networktopology, bandwidths, and latencies. Usually, the

links or buses are the bottleneck in these systems because improving communication

bandwidth and latency is often more expensive or more difficult than increasing pro-

cessor speed and memory capacity. But whereas several standard methods are known

for hiding latency, e.g., pipelined routing (see, e.g., [CMS96, CMSV96]), redundant

computation (see, e.g., [ALMZ96a, ALMZ96b, KLM



97, Mey83, Mey86, MW89])

or slackness (see, e.g., [Val90]), the only way to bypass the bandwidth bottleneck is to

reduce the communication load by exploiting locality.

2 Chapter 1. Introduction

The principle of locality is already known from sequential computation. Two kinds

of locality are usually distinguished: temporal and spatial locality (see, e.g., [Goo94]).

Temporal locality means that in the near future, a program is more likely to reference

those data objects that have been referenced in the recent past. This locality can be

due to instruction references in program loops, or data references in working stacks.

Spatial locality means that in the near future, a program is more likely to reference

those data objects that have addresses close to the recent references. This is caused,

e.g., by the traversal of data structures such as arrays. A further kind of locality is

specific to computer networks: topological locality. Topological locality means that

processors that are close together with respect to the structure of the interconnection

network are likely to be interested in the same data objects. This locality can be due to

a communication sensitive mapping of processes to the nodes in the network.

Usually, traditional management of shared data objects uses either the concept of

caching or hashing. Caching exploits the temporal and topological locality. The com-

munication load for accessing the shared data objects can be reduced by storing copies

of data objects in memory modules of some processors. This phenomenon is well

known and often explored in practice, e.g., in the WWW, where pages are cached in

order to reduce network congestion. When a user issues a read request for a page, the

requested page is transferred from a computer holding a copy of the page through the

network to the computer of the user. Usually, the page is cached, i.e., a copy of the

accessed page is kept in the local memory module of the user or in another memory

module lying on the routing path used by the page. In this way, subsequent requests

addressed to this page can be served locally. Similar caching mechanisms are used to

speed up parallel and distributed computations using virtual shared memory on mas-

sively parallel processors (MPPs) or networks of workstations (NOWs). The common

idea of all of these approaches is to use memory resources in order to reduce the band-

width utilization.

Caching is used for speeding up single-computers, too. Here a small, fast memory

module, the cache, is added to the processor in order to reduce the communication

on the bus connecting the processor with a much larger main memory module. Data

is partitioned into small blocks, called cache lines. Whenever the processor accesses

a data item that is not already stored in the cache, the whole respective cache line is

moved from the main memory into the cache. The major problem is to decide which

cache line is ejected, i.e., removed from the cache, when the cache is full. This prob-

lem is usually referred to as paging as it was investigated first in the context of virtual

memory where the data is usually divided into so-called pages. Paging has been in-

vestigated intensively as well in practice (see, e.g., [Bel66, CK99, FW74, Spi77]) as

in theory (see, e.g., [CK99, FKL



91, Ira97, IKP96, MS91, RS94, ST85]) in the past.

In all of these analyses it turned out that, although the idea of paging is very simple,

its analysis is not an easy task. Developing and analyzing strategies for caching in

networks is even more complicated than for single-computers because several caches

interact. Sometimes, creating a copy of a data object induces additional communica-

tion instead of reducing it, as this copy may need to be updated or invalidated later.

While caching aims to reduce the communication load by exploiting locality, hash-

ing tries, to distribute the communication load evenly among the network resources,

in order to avoid bottlenecks and thus congestion in the network. This is done by a

randomized distribution of the shared data objects to all memory modules. But, this

randomized distribution usually destroys the locality and hence the caching becomes

inefficient.

This thesis deals with distributed caching strategies that combine the two appar-

ently opposed concepts of caching and hashing. For this purpose, we reduce the com-

plex interconnection network to a simple structure, the so-called access tree. This

logical structure represents the communication potential of the whole network, but it

abstracts from single interconnections. In particular, the routing paths in this structure

are unique. Hence, the following questions can be answered efficiently by a distributed

caching strategy for access trees.



How many copies of a shared data object should be created?



On which memory modules should these copies be placed?



Which requests should be served by which copies?



How should the copies of a shared data object be located?

Due to the simple structure of access trees, the caching strategy can abandon on addi-

tional communication for the management and localization of the created copies. Now,

the logical access trees are embedded into the network via hashing. This is done in a

random but locality preserving fashion. With this concept, the so-called access tree

strategy, it is possible to use the effects of caching and hashing simultaneously, i.e.,

on one hand to reduce the communication load via caching, and on the other hand to

distribute the remaining load evenly, to avoid network congestion.

In order to provide efficient access to shared data objects, the communication over-

head caused by the caching strategy should be as small as possible. However, sim-

ply reducing the total communication load, i.e., the sum, taken over all messages,

of the size of the messages multiplied with the length of their routing paths, can re-

sult in bottlenecks. In addition, the load has to be distributed evenly among all net-

work resources. This corresponds to minimizing the congestion, i.e., the maximum,

taken over all links, of the amount of data transmitted by the link. Known results

for store-and-forward routing [LMRR94, MV95, OR97, SV96] and wormhole routing

[CMS96, CMSV96, SV96] show that reducing the congestion is most important in or-

der to get a good network throughput. For this reason, we believe that minimizing the

congestion is a promising approach in order to develop caching strategies that work

efficiently in theory and practice.

4 Chapter 1. Introduction

1.1 Our contribution

This thesis is based on the access tree strategy that we have originally introduced in

[MMVW97a]. The access tree strategy aims to minimize the congestion for trees,

meshes, and Internet-like clustered networks. For example, we achieve competitive

ratio 3 for trees and competitive ratio O





logn



, w.h.p., for d-dimensional meshes

with nnodes. Further, we present an Ω



logn





lower bound for the competitive

ratio for on-line routing in meshes which implies that our upper bound on the com-

petitive ratio for meshes of constant dimension is optimal. All strategies presented in

[MMVW97a], however, work only in a uniform cost model in which migrating a data

object has the same cost as accessing the object. Furthermore, memory capacity con-

straints are neglected. These assumptions simplify the problem significantly. In this

thesis we extend the access tree strategy to



a non-uniform cost model,



memory capacity constraints, and



other classes of networks.

Typically, words or other small blocks of data that can be accessed in a computer

system are much smaller than the data objects. For example, large files usually consist

of several small records and pages of virtual shared memory consist of cache lines that

can be accessed individually. Thus, migrating a data object can be much more expen-

sive than accessing only a small piece of data from the object. However, the data of

an object should be kept together in order to reduce the bookkeeping overhead. We

present the first deterministic and distributed caching strategy that achieves competi-

tive ratio 3 for trees in this non-uniform cost model. This competitive ratio is optimal

because of the lower bound shown by Black and Sleator [BS89]. Our strategy does

not only minimize the congestion but minimizes simultaneously the load on each indi-

vidual edge up to an optimal factor of 3. Strategies for trees are of special interest as

they can be used as subroutines in strategies for other networks, e.g., in the access tree

strategy. However, our strategy still neglects memory capacity constraints. This result

has been previously published as joint work in [MVW99].

Furthermore, we present a general framework for the development of caching

strategies for networks with memory capacity constraints. This framework is based

on the access tree strategy. Our strategies aim to minimize the congestion. As in the

model without memory capacity constraints, our framework yields a caching strategy

for d-dimensional meshes with nnodes that is O





logn



-competitive, w.h.p. In ad-

dition, we give several examples of networks for which our framework yields efficient

strategies, including fat-trees, complete networks, Cayley networks, edge symmetric

networks, hypercubes, cube-connected-cycles, de Bruijn networks, shuffle-exchange

networks, and butterflies. For example, our framework yields a caching strategy for

complete networks that is O





-competitive, w.h.p, with respect to the congestion at

1.2. Cost model 5

the memory modules due to remote accesses. Most of the results have been previously

published as joint work in [MVW00].

We believe that the access tree strategy is well suited for the application in prac-

tice because the congestion is provably small. Besides, the strategy is simple and

produces only very small overhead for bookkeeping. In order to illustrate the practi-

cal usability, we have implemented the dynamic access tree strategy on a massively

parallel mesh-connected computer system and tested it for three standard applications

of parallel computing in [KMR



99]. The experimental results show that execution

time and congestion are closely related. The access tree strategy achieves execution

times that are close to the times of hand-optimized message passing strategies using

full knowledge of the access pattern of the application. Furthermore, all experiments

show that the access tree strategy clearly outperforms a standard caching strategy in

which a fixed home processor is assigned to each data object that keeps track of the

object’s copies. In particular, the larger the network is, the more superior the access

tree strategy becomes.

1.2 Cost model

The computer system is modeled by an undirected graph G









with node set

Vand edge set Esuch that the nodes represent the processors with their memory

modules, and the edges represent the links. The capacities of the memory modules are

described by a function m:V



and the bandwidths of the links are described by a

function b:E



. An application consists of read and write requests that are issued

by the nodes. Each of these requests is directed to a shared data object from a finite set

Xof shared data objects.

The shared data objects in Xare stored in the local memory modules possibly

with redundancy. At each time step, for each object x



X, let the residence set R





represent the set of nodes holding a copy of x. Initially, only a single copy of each

data object is stored somewhere in the computer system. This initial configuration is

known by all nodes in the system. During the execution of an application, copies may

be deleted, copies may be migrated, or new copies may be created as long as each

node obeys its memory capacity constraints. But always, at least one copy of each

data object must be stored in some of the nodes, i.e., at each time step, for each object



X,R





A read request for a data object xrequires access to a copy of x, a write request

requires updating all copies of x. After a request is served, the multiple copies can be

reallocated. Any communication that proceeds along an edge increases the (communi-

cation) load on that edge by some amount which depends on whether a read, write, or

migration operation is performed. The increase is defined as follows.



Read operation: A read request for xissued by a node vcan be served by any

node uholding a copy of x. A path has to be allocated through the network from

vto u. The communication load on each edge eon this path increases by 1.

6 Chapter 1. Introduction



Write operation: A write request for xissued by a node vrequires to update all

copies of x. A multicast tree connecting vwith all nodes holding a copy of x,

i.e., a Steiner tree, has to be allocated. The communication load on each edge e

in the Steiner tree increases by 1.



Object migration: The strategy can replicate a copy of xfrom one node to an-

other along an arbitrary path. The communication load on each edge eon this

path increases by D, where D



1 is an arbitrary integer representing the ratio

between the load induced by the migration of the complete data object xand the

load induced by accessing only a relative small piece of x, e.g., a single byte.

Efficient strategies for distributed caching have to work in a distributed fashion.

In particular, the nodes do not have knowledge about the global state of the system,

that is, each node notices only the read and write accesses and the copy migrations that

pass the node. In order to accumulate additional knowledge a node has tocommunicate

with other nodes, which also increases the load on the involved edges.



Information exchange: Information about the global state of the system, e.g. the

actual residence set, can be exchanged by sending messages along a path from

one node to another. It is assumed that the messages have small size, e.g., they

include an ident-number of a data object and a tag for an action that should be

performed on the receiving node. The communication load on each edge on this

path increases by 1.

For a given application, let the relative load on an edge be its load divided by its

bandwidth. Finally, define the congestion to be the maximum over the relative loads

of all edges in the network.

We use caching strategies for single-computer systems as a subroutine. This prob-

lem is modeled by a graph of two nodes uand vconnected by a single edge e. The

memory module of node vcan hold all data objects simultaneously. The capacity of

the memory module of u, however, is limited. Only uis allowed to issue requests. Note

that in this case there is no significant difference between read and write requests. The

cost considered in the single-computer case is simply the load on edge e.

1.3 Competitive analysis

Caching strategies can be viewed as on-line strategies that are typically evaluated in

a competitive analysis. In this kind of analysis which was introduced by Sleator and

Tarjan [ST85] the costs of the on-line strategies are compared with the costs of an

optimal off-line strategy.

In order to obtain a simple, unambiguous model, we assume that an adversary

initiates an application, i.e., a sequence σ



σ1σ2σ3



of read and write requests each

of which is issued by one of the nodes in the network. The strategies have to serve

1.3. Competitive analysis 7

these requests one after the other, that is, it is assumed that σi



1is not issued before

the service of σiis finished. This assumption, however, does not mean that we want

to model sequential applications. The assumption of a sequence is the simplest way

to determine that the on-line and the off-line strategy have to serve the requests in the

same order. Note that allowing the off-line strategy to use a different order than the

on-line strategy would enable it to reduce its congestion almost arbitrarily.

First, we investigate caching strategies for networks without memory capacity con-

straints. For a given sequence σ, letCopt





denote the minimum congestion produced

by an optimal off-line strategy. An on-line strategy is said to be c-competitive if it has

congestion at most c



Copt





a, for each sequence σ, where ais a term that does not

depend on σ. The value cis also called the competitive ratio of the on-line strategy.

It is reasonable, to pay attention to the additive term a. This term must be in-

dependent on the sequence σbut it can depend, e.g., on the number of nodes in the

network. Thus, the additive term acan dominate the expression c



Copt





afor a

long time, until the sequence σreaches the necessary length. Therefore, it is interest-

ing to have strictly c-competitive on-line strategies i.e., strategies having congestion at

most c



Copt





, i.e., a



0, for each sequence σ. In this case, the value cis also called

the strict competitive ratio of the on-line strategy.

In the model without memory capacity constraints we will show that all presented

strategies are also able to handle parallel and overlapping requests, too. We restrict the

class of allowed applications specified by the adversary to be data-race free programs,

i.e., a write request to an object is not allowed to overlap with other requests to the

same object, and there is some order among the requests to the same object such that,

for each read and write request, there is a unique least recent write. Note that this still

allows arbitrary concurrent requests to different objects and concurrent read requests

to the same object. Without the restriction to data-race free programs, the optimal

off-line strategy could, e.g., defer all write requests to an object xto the end of the

execution and keep a valid copy of xat each node that reads xat some time. This “bad

trick” would give load 1 on each edge for serving all read requests to x. The example

illustrates that the restriction to the class of data-race free programs is necessary in

order to allow a fair comparison of on-line against optimal off-line strategies.

When the on-line strategy serves a request, it does not know future requests. In

contrast, the optimal off-line strategy is assumed to have full knowledge about the

whole sequence including future requests. This assumption gives much more power to

the optimal off-line strategy than to the on-line strategy. In fact, it turns out that this

advantage is too large in the case of bounded memory size: Sleator and Tarjan show in

[ST85] that the worst-case competitive ratio between the cost of a deterministic paging

strategy and an optimal off-line strategy grows linearly with the memory size. In the

case of randomized strategies this ratio grows logarithmically with the memory size

[FKL



91]. In order to compensate the advantage of knowing the future Sleator and

Tarjan suggest to restrict the optimal off-line strategy by reducing its memory capacity.

They show that, if the memory capacity of the optimal off-line strategy is reduced by

a factor of two, then a constant competitive ratio of two can be achieved, e.g., for the

8 Chapter 1. Introduction

LRU (least recently used) paging strategy, which is the most used paging strategy in

practice.

We use a similar approach for the evaluation of caching strategies for networks

with memory capacity constraints. We compare an on-line strategy whose individ-

ual memory capacities have been increased by a factor of m, for some fixed m, with

an optimal off-line strategy obeying the given memory capacities. For simplicity in

notation, we increase the capacities of the on-line strategy rather than decreasing the

capacities of the optimal off-line strategy. For a given sequence σ, let Copt





denote

the minimum congestion produced by an optimal off-line strategy, obeying the mem-

ory bound m





, for each node v. An on-line strategy is said to be







-competitive if

it obeys the memory bound m







, for each node v, and if it has congestion at most



Copt





a, for each sequence σ, where ais a term that does not depend on σ. The

value mis also called the memory ratio and the value cis also called the congestion

ratio of the strategy.

In the case with memory capacity constraints, it is again interesting to have

strictly







-competitive on-line strategies, i.e., strategies having congestion at most



Copt





, i.e., a



0, for each sequence σ, while obeying the memory bound m







for each node v. In this case, the value mis also called the strict memory ratio and the

value cis also called the strict congestion ratio of the on-line strategy.

If the on-line strategy uses randomization we have to describe the power given

to the adversary more precisely. This is studied intensively in [BDBK



90]. The ad-

versary specifying the sequence is assumed to be oblivious, i.e., the requests do not

depend on the decisions of the on-line strategies. A randomized strategy has to sat-

isfy the congestion bound with high probability (w.h.p.), that is, the probability that

the congestion exceeds α







Copt







is at most n



α, for every α



1, where n

denotes the number of nodes in the network. Note that this bound implies that the

expected congestion is O





Copt







1.4 Previous work on the access tree strategy

This thesis is built upon joint work with Christof Krick, Bruce Maggs, Friedhelm

Meyer auf der Heide, Harald R¨acke, and Berthold V¨ocking on data management in

networks without memory capacity constraints in a uniform cost model. This work

is described in [MMVW97a, MMVW97b, KMR



99] and Berthold V¨ocking’s thesis

[V¨oc98]. In the following, these results will be sketched briefly.

We introduce new static and dynamic data management strategies for trees,

meshes, and Internet-like clustered networks in [MMVW97a, MMVW97b]. Note

that all presented strategies work only in a uniform cost model and that they do not

obey memory capacity constraints. These assumptions simplify the problem signifi-

cantly.

In the static model, we assume that we are given an application for which the

the rates of read and write requests for all node-object pairs are known. The goal is to

1.4. Previous work on the access tree strategy 9

calculate a static placement of the objects to the nodes in the network and to specify the

routing such that the congestion is minimized. In the dynamic model, we assume no

knowledge about the access pattern. An adversary specifies accesses at runtime. Here

we present caching strategies that also aim to minimize the congestion by exploiting

locality. These strategies are investigated in a competitive model.

We start by investigating static data management on tree-connected networks mod-

eled by connected (hyper)graphs without cycles. We show the surprising result that a

static placement exists that minimizes the communication load on all edges (or hyper-

edges) simultaneously. Obviously, this yields minimum congestion as well as mini-

mum total communication load. We describe a deterministic strategy that calculates

this placement. The sequential running time of this strategy is O





for each object,

where ndenotes the size of the network. Moreover, the placement can be computed

efficiently in a distributed fashion by the processors of the underlying tree network.

Besides we present the first deterministic dynamic strategy for tree-connected net-

works that works in a distributed fashion and achieves a competitive ratio of 3. This

caching strategy is surprisingly simple, and the competitive ratio is optimal because of

the lower bound shown by Black and Sleator [BS89].

As our results for static and dynamic data management on trees hold for trees with

arbitrary link or bus connections having arbitrary bandwidths, our strategies are well

suited in particular for Ethernet-connected NOWs. For example, the static strategy can

be used for optimal static file allocation and the dynamic strategy for the implementa-

tion of efficient distributed shared memory systems in these networks.

The situation on meshes is much more complicated because there are several pos-

sible routing paths between every pair of nodes. In fact, we show by a reduction from

PARTITION that the static problem is NP-hard already on a 3



3 mesh. The dynamic

problem is even more complicated, since we have to solve an on-line routing problem

and a data tracking problem in order to locate a suitable copy in case of a read access

and all copies of an object in case of a write request.

We introduce a simulation approach that solves the static and the dynamic problem

on meshes simultaneously. The strategy is based on a randomized but locality preserv-

ing embedding of access trees into the mesh, and it is called the access tree strategy.

On the access trees, the static or dynamic strategy for trees is executed. Consider an

arbitrary mesh of dimension dwith nnodes. Here the access tree strategy yields an ef-

ficient strategy for static data placement on meshes with arbitrary side length achieving

optimal congestion up to a factor of O





logn



, w.h.p. The placement of each object

only takes time O





. In the dynamic model, the access tree strategy achieves compet-

itive ratio O





logn



, w.h.p. We give a corresponding Ω



logn





lower bound on the

competitive ratio for on-line routing, which implies that the competitive ratio achieved

by the dynamic access tree strategy is optimal for meshes of constant dimension.

Furthermore, we show how the access tree strategy can be adapted to Internet-

like clustered networks. A clustered network is a network that consists of several

small subnetworks, i.e., clusters, that are organized hierarchically. Communication

between nodes of the same cluster is not as expensive as communication between

10 Chapter 1. Introduction

nodes of different clusters. The access trees are embedded into the clustered network

in a preprocessing step. We show that this preprocessing can be done efficiently and

locally for each participating cluster. The static variant of the access tree strategy

allows to calculate a static placement with close-to-optimal congestion for clustered

networks. The dynamic variant yields an efficient caching strategy.

We believe that the access tree strategy is well suited for the application in practice

because the congestion is provably small. Besides the strategy is simple and produces

only very small overhead for bookkeeping. In order to illustrate the practical usabil-

ity, we have implemented variants of the dynamic access tree strategy in the DIVA

(Distributed Variables) library [KRVW98] that provides direct access to global vari-

ables, i.e., shared data objects, from the individual nodes in the network. The current

implementations are based on a massively parallel mesh-connected computer system.

We have evaluated the dynamic access tree strategies implemented in the DIVA

library experimentally in [KMR



99]. We test several variations of this strategy on

three different applications of parallel computing, which are matrix multiplication,

bitonic sorting, and Barnes-Hut N-body simulation. The implemented algorithms for

matrix multiplication and sorting are oblivious, i.e., their access or communication

patterns do not depend on the input. The reason why we have decided to include

these algorithms in our small benchmark suite is that they allow us to compare the

dynamic data management strategies with hand-optimized message passing strategies.

The third application, the Barnes-Hut N-body simulation, is non-oblivious. We believe

that a communication mechanism that uses shared data objects is the best solution

for this application, in contrast to the other two applications. However, we cannot

construct a hand-optimized message passing strategy achieving minimum congestion

for this application. Therefore, we concentrate on the comparison of different caching

strategies.

Our experiments show that the access tree strategy produces much less congestion

thanthestandardfixedhomestrategy(see, e.g., [Goo94]) inwhich afixedhome proces-

sor is assigned to each data object that keeps track of the object’s copies. Furthermore,

we observe that the execution time of the investigated application depends heavily

on the congestion produced by the different data management strategies. Therefore,

the access tree strategy outperforms the fixed home strategy clearly. In particular, the

larger the network is, the more superior the access tree strategy becomes. While the

fixed home strategy scales poorly, the access tree strategy comes reasonably close to

the performance of the hand-optimized message passing strategies even in the case of

large networks.

1.5 Other previous work

Competitive analysis of caching strategies in computer systemsconnected by networks

begins with Karlin et al. [KMRS88] who analyze strategies for snoopy caching on

buses. Bartal et al. [BFR92] introduce the file allocation problem in a competitive

1.5. Other previous work 11

model similar to our non-uniform model. The major difference to our model is that

they consider a different cost measure than we do, that is, they consider the total com-

munication load, i.e., the sum over the relative load on all edges, rather than the con-

gestion, i.e., the maximum over the relative load on all edges.

The most comparative work is done byAwerbuch et al. [ABF93, ABF98] who con-

sider the file allocation problem on arbitrary networks. In [ABF93] they present a cen-

tralized strategy that achieves an optimal competitive ratio O



logn



and a distributed

strategy that achieves competitive ratio O



logn



for the file allocation problem on

an arbitrary n-node network without memory capacity constraints. In [ABF98], the

distributed strategy is adapted to networks with memory capacity constraints result-

ing in an strategy that is



polylog n



polylog n



-competitive. All competitive ratios are

with respect to the total communication load instead of the congestion. We believe that

the congestion measure is more appropriate for the design of strategies than the total

communication load as it prevents some of the links from becoming bottlenecks.

We give a brief description of the very intriguing distributed strategy of Awerbuch

et al. because this will illustrate the influence of the cost measure on the design of

caching strategies. Their strategy uses a hierarchical network decomposition, intro-

duced in [AP90], that decomposes the network in a hierarchy of clusters with geomet-

rically decreasing diameter. If a node accesses a file xthat has no copy in a cluster of

some hierarchy level, then the cluster leader is informed and increases a counter for

the file. When the counter reaches D, the requesting node gets a copy of x, where D

denotes the ratio between the cost for migrating and the cost for accessing a file. This

strategy ensures that the total communication load is minimized up to a some loga-

rithmic factors in the size of the network, which gives a good competitive factor with

respect to the total communication load. The cluster leaders, however, can become

bottlenecks. In particular, the edges incident to the leader of the top level clusters may

become very congested. As a consequence, the competitive ratio in our congestion

based cost model becomes bad.

Caching strategies for tree-like topologies are of special interest since many net-

works have such topology, e.g., Ethernet-connected NOWs. Besides, strategies for

trees can be used as a subroutine for caching strategies on other networks. Therefore,

much research deals with caching on trees. For example, Bartal et al. [BFR92] de-

scribe a randomized strategy for the file allocation problem on trees without memory

capacity constraints that is 3-competitive with respect to the expected total load and a

deterministic strategy that is 9-competitive with respect to the total load. The conges-

tion is not considered, but it is easy to check that both strategies are ω





-competitive

with respect to the congestion.

Lund et al. [LRWY94] describe a 3-competitive deterministic but centralized strat-

egy for the same problem. The advantage of their strategy is that it minimizes not

only the total communication load but also the load on each edge and therefore also

the congestion. Unfortunately, the strategy makes use of global knowledge about a

work function that is influenced by each request issued in the network. In particular,

read requests issued by a node can induce the invalidation of copies in a completely

12 Chapter 1. Introduction

different region of the network without sending invalidation messages. This illustrates

that their strategy is inherently centralized, and hence yields no suitable solution in our

distributed setting.

A lower bound that holds for each network including at least one edge is shown by

Black and Sleator [BS89]. They give a lower bound of 3 on the competitive ratio of

each deterministic caching strategy on two nodes connected by a single edge. Bartal

et al. [BFR92] show that this bound holds also for randomized strategies.

Several recent papers deal with the distribution of pages in the World Wide Web.

Plaxton and Rajaraman [PR96] show how to balance the pages among several caches

by embedding a random cache tree for each page into the network. This balances the

load well and ensures fast responses even for popular pages. However, the strategy

uses a uniform embedding of the tree nodes onto the nodes in the Internet, which

destroys topological locality. Karger et al. [KLL



97] use a similar technique to relieve

hot spots in the Internet but they pay attention to topological locality. In contrast

to our assumption, they assume the latencies instead of the bandwidths to be the main

problem for datatransmissionin the Internet. Further, they consideronly read requests.

A difficult problem that has to be solved by each caching strategy is the data track-

ing problem, i.e., the problem of how to locate the copies of a particular object. To our

knowledge, data tracking mechanisms that aim to minimize the congestion have not

been investigated previously. Bartal et al. [BFR92] describe a data tracking mechanism

for arbitrary networks that minimizes the total communication load. This mechanism

is based on a distributed data structure that keeps track of all copies. Whenever a copy

is created or deleted on a node, this data structure is updated. The mechanism enables

each node to locate a copy of a particular object having approximately the distance of

the closest copy. The total communication load induced by each of the create, delete,

or locate operations are shown to be in O





log2n



, where



denotes the length of the

shortest path to a copy of the respective object and nis the size of the network. On

each node, the memory overhead due to the distributed data structure is at most O







with Xdenoting the set of shared objects. Plaxton et al. [PRR97] describe a similar

mechanism which reduces the memory overhead for the data structure on each node

to O





log2n



with kdenoting the number of copies on a node. However, this mecha-

nism works only for networks that fulfill a low-expansion property, which is satisfied,

e.g., by meshes of constant dimension.

1.6 Outline

The remainder of this thesis is organized as follows. In Chapter 2, caching strategies

for networks without memory capacity constraints are investigated in a non-uniform

cost model. First, the access tree strategy is introduced, and as an example the ap-

plication of the access tree strategy to 2-dimensional meshes is sketched. Then, a

deterministic and distributed caching strategy for trees is presented. This strategy is

used as a subroutine in the access tree strategy. Thereafter, the access tree strategy is

1.6. Outline 13

applied to several classes of networks, including meshes, fat-trees, complete networks,

Cayley networks, edge symmetric networks, hypercubic networks, and Internet-like

clustered networks. Further, we present the universal caching strategy working on ar-

bitrary networks for whichan oblivious routingstrategy isgiven. Finally, we will show

that all presented strategies are also able to handle parallel and overlapping requests,

too.In Chapter 3, caching strategies for networks with memory capacity constraints

are investigate in a non-uniform cost model. First, the general framework for the de-

velopment of caching strategies with memory capacity constraints is introduced. This

framework is based on the access tree strategy. Then, a caching strategy for trees

with memory capacity constraints is presented. This strategy is used as a subroutine

in the general framework. Thereafter, the general framework is applied to several

classes of networks, including meshes, fat-trees, complete networks, Cayley networks,

edge symmetric networks, and hypercubic networks. Further, we present the universal

caching strategy working on arbitrary networks for which an oblivious routing strategy

is given.

In Chapter 4, a detailed summary and discussion of the results of this thesis is

presented. Further, some concluding remarks are given.

Chapter 2

Caching without Memory Capacity

Constraints

In this chapter, we investigate caching strategies for networks without memory capac-

ity constraints in a non-uniform cost model. The remainder of the chapter is organized

as follows. In Section 2.1, the access tree strategy is introduced, and as an example the

application of the access tree strategy to 2-dimensional meshes is sketched. In Section

2.2, a deterministic and distributed caching for trees is presented. This strategy is used

as a subroutine in the access tree strategy. In Section 2.3, the access tree strategy is

applied to several classes of networks, including meshes, fat-trees, complete networks,

Cayley networks, edge symmetric networks, hypercubic networks, and Internet-like

clustered networks. Further, we present the universal caching strategy working on

arbitrary networks for which an oblivious routing strategy is given. Finally, in Sec-

tion 2.4, we will show that all presented strategies are also able to handle parallel and

overlapping requests, too.

2.1 The access tree strategy

The access tree strategy is based on a hierarchical decomposition of the network.

This hierarchical decomposition allows to exploit topological locality in an efficient

way. As an example, we describe recursively the hierarchical decomposition of a 2-

dimensional mesh Mwith nnodes. Let m1and m2denote the side lengths of the mesh,

i.e., n





m2. We assume m1



m2. If m1



1 then we have reached the end of

the recursion. Otherwise, we partition Minto two non-overlapping submeshes of size





 

m2and



"#

m2. These submeshes are then decomposed recursively

according to the same rules. Figure 2.1 gives an example for this decomposition.

The hierarchical decomposition has associated with it a decomposition tree T





in which each node corresponds to one of the submeshes, i.e., the root of T





cor-

responds to Mitself, and the children of a node vin the tree correspond to the two

submeshes into which the submesh corresponding to vis divided. In this way, the leaf

16 Chapter 2. Caching without Memory Capacity Constraints

level 0 level 1

k l m

jhg

d fe

cba

level 2 level 3 level 4

Figure 2.1: The hierarchical decomposition of M







nodes of T





correspond to submeshes of size one and thus each leaf node represents

a node of M.

We interpret T





as a virtual network that we want to simulate on M. In order to

compare the congestion in both networks we define bandwidths for the edges in T





i.e., for a tree edge e









, with udenoting the parent node, define the bandwidth

of eto be the number of edges leaving the submesh corresponding to v. Note that this

bandwidth definition is only required for the description of the theoretical results and

does not affect the access tree strategy in any way. Figure 2.2 gives an example of a

decomposition tree with bandwidths.

For each global variable, define an access tree to be a copy of the decomposition

tree. We embed the access trees randomly into the mesh, i.e., for each variable, each

node of the access tree is mapped at random to one of the nodes in the corresponding

submesh. On each of the access trees, we execute a simple 3-competitive caching

strategy. All messages that should be sent between neighboring nodes in the access

trees are sent along the dimension-by-dimension order path between the associated

nodes in the mesh, i.e., the unique shortest path between the two nodes using first

edges of dimension 1 and then edges of dimension 2.

The analysis of the access tree strategy uses a bi-simulation between the mesh M

and the decomposition tree T





. At first, it is shown that the decomposition tree





can simulate the mesh M. In this simulation, each leaf node of T





issues the

same read and write requests as its counterpart in M. It is shown that any application

that produces congestion Cwhen it is executed on Mcan be executed on T





with

congestion C, too. At second, it is shown that the mesh Mcan simulate the decompo-

sition tree T





in such a way that the congestion in Mis only O



logn



times larger

than the congestion in T





From the results on the congestion of these two simulations we can obtain the

competitive ratio of the access tree strategy. The simulation of Mby T





looses a

factor of at most c1



1; the on-line strategy looses a factor of c2



3 against the off-

line strategy on T





; finally, the simulation of T





by Mlooses a factor of at most





logn



. Altogether, we achieve a competitive ratio of c1







logn



2.2. Caching on trees 17

gkhl

fceb

3 4

2 3

3 323

3 4 2 3

2 55

level 2

level 3

level 4

level 0

level 1

Figure 2.2: The decomposition tree T







. The node labels correspond to the

labels in Figure 2.1. The edge labels indicate the bandwidths of the respective edges.

2.2 Caching on trees

In this section, deterministic caching strategies for a tree Tare presented. The advan-

tage of trees is that there is only one simple path between any pair of nodes. Thus,

the placement of the objects automatically defines the routing paths from the access-

ing node to the respective copies. In particular, this means that the congestion is fixed

as soon as the placement is specified, which makes the analysis for trees much easier

than the one for networks including cycles. The edges in the tree Tare allowed to have

arbitrary bandwidth.

First, we present a very simple caching strategy for trees in a uniform cost model,

i.e. D



1. It is shown that this deterministic strategy is 3-competitive. Then, we

generalize this strategy to a non-uniform cost model. Again, it is proved that this

deterministic strategy is still 3-competitive. This competitive ratio is optimal because

of the lower bound shown by Black and Sleator [BS89].

2.2.1 Uniform model

In the uniform cost model, i.e. D



1, we assume that each object fits into one routing

packet such that each migration of a copy along an edge increases the load on this edge

by 1. Recall that each read or write request increases the load of each edge involved

in this request by 1, and that each information message increases the load of each

traversed edge by 1.

18 Chapter 2. Caching without Memory Capacity Constraints



Node vissues a read request for object x:

The read request is served by u, the node holding a copy of xthat is

nearest to v.

A new copy of xis created on each node incident to an edge on the path

from uto v.



Node vissues a write request for object x:

vsends the update information, i.e., the value that should be written, to

u, the node holding a copy of xthat is nearest to v.

ustarts an invalidation multicast to all other nodes holding a copy of x,

and modifies its own copy of x.

A new copy of xis created on each node incident to an edge on the path

from uto v.

Figure 2.3: The tree strategy in the uniform model for object x.

Our caching strategy manages all objects independently from each other. For each

object x, the nodes that hold a copy of xalways build a connected component. In

Figure 2.3 the tree strategy in the uniform model is described for object x. Note that in

case of a write request the invalidation multicast can be viewed as following: First, the

copies are updated, and after this update the copies are deleted.

It remains to describe how the nodes locate the copies. As the nodes holding the

copies of xalways build a connected component, the data tracking problem can be

solved very easily. Each node is attached a signpost pointing to the node that has is-

sued the least recent write request. Initially, this signpost points to the only copy of

x. Whenever xis updated the signposts are redirected to the node that has issued the

corresponding write request. Note that this mechanism does not require extra com-

munication, because only the signposts on nodes involved in the invalidation multicast

have to be redirected. The number of signposts can be reduced by defining a root of

the tree and omitting all those signposts that are directed to the root. Hence, we need

signposts only to mark the trail from the root to the node that issued the least recent

write request such that diameter of Tsignposts for each object are sufficient.

Besides, we attach markers to the copies that indicate the boundaries of the con-

nected componentbuiltby thenodes that holda copy. These markersare needed for the

invalidation multicast. They are updated whenever the connected component changes.

Also this mechanism does not require extra communication.

The following theorem shows that the tree strategy in the uniform model is strictly

3-competitive on trees with edges of arbitrary bandwidth.

2.2. Caching on trees 19

Theorem 2.1 The tree strategy in the uniform model minimizes the load on any edge

up to a factor of 3.

Proof. Fix an object x



X, and a sequence σ



σ1σ2



of read and write requests

for x. If σstarts with a read request, then we add a write request issued by the node

holding the initial copy of xat the beginning of the sequence. Obviously, this causes

no extra communication for an optimal strategy. We divide the sequence of requests

into phases such that the first request in each phase is a write and each phase includes

only one write.

For a phase t, let vtdenote the node issuing the write request at the beginning of t,

and letUtdenote the connected subgraph induced by vtand the nodes issuing the read

requests in this phase. Further, for t



1, let ptdenote the unique shortest path leading

from Ut



1to vt.

Each caching strategy has to send at least one message along each edge in Utin

phase tbecause all of the nodes that issue read requests have to receive the value

written by vt. Besides, for t



1, a message has to be sent along ptbecause the new

value written by vthas to be “merged” with the value written by vt



1. Consequently,

any strategy has to sent one message along each edge inUt

ptfor each phase t.

Now, we consider the load induced by our strategy. All messages induced by read

and write requests of phase texcept for the invalidation messages are routed along

edges in Ut

pt. In particular, the load of each of the edges in Ut

ptis increased by

exactly two: by the service of a read or write request and the corresponding migration

of a copy of x.

Finally, we consider the load induced by the invalidation multicast. The invalida-

tion multicast for the write in phase tsends exactly one message along each edge in



1. Consequently, our strategy sends at most three times as many messages

along any edge as any other strategy does.

2.2.2 Non-uniform model

First, we describe a deterministic caching strategy for two nodes connected by a single

edge. This strategyis 3-competitive, and it can be computed in a distributed fashion by

the two connected nodes. The key feature of this strategy is that it can be extended to

work on trees just by simulating it on any edge in the tree. The result of this approach

is a simple, deterministic, and distributed caching strategy for trees which minimizes

the load on any edge up to a factor of 3. Obviously, the bound on the load of any edge

induces that our tree strategy is 3-competitive with respect to the total communication

load and the congestion simultaneously.

Caching on a single edge

We describe a deterministic and distributed caching strategy for a single edge econ-

necting two nodes aand b. The strategy uses a simple counting mechanism which

20 Chapter 2. Caching without Memory Capacity Constraints



Node aissues a read request for object x:

If ca

Dthen ca:





If ca



Dand aholds no copy of xthen

Move a new copy of xonto a;

If cb



0 then delete the copy of xon b;



Node aissues a write request for object x:

If cb

0 then

cb:





else If ca

Dthen ca:





If ca



Dand aholds no copy of xthen move a new copy of xonto a;

If cb



0 and aholds a copy of xthen delete the copy of xon b;

Figure 2.4: The edge strategy for requests issued by node a.

records read and write requests that are issued by the two nodes. Later on we will use

this strategy for building our tree strategy. Then the nodes aand bcorrespond to the

two connected components in which the tree is divided if eis removed.

Initially, we do not care about how the nodes aand bexchange information about

the residence set, and how the counters are distributed among them. We assume that

node aalways knows whether or not node bholds a copy and vice versa. Afterwards

we show how our strategy can be adapted to the distributed setting.

Each object xis handled independently from the other objects. Let us fix an object

x. Concerning this object there are two counters caand cb. Informally, these counters

represent saving accounts for costs referring to x.

Initially, one copy of xis placed on one of the two nodes. W.l.o.g., assume that it

is placed on a. Then cais set to D, and cbis set to 0. In Figure 2.4 the edge strategy is

described for requests issued by node a. The strategy works analogously for requests

issued by node b. Note that the edge strategy always keeps one copy of x, since it only

deletes a copy on a node if the other node also holds a copy.

The following lemma shows that the centralized edge strategy is strictly 3-

competitive.

Lemma 2.2 The centralized edge strategy minimizes the load up to a factor of 3.

Proof. We use a potential function argument (cf. [ST85]). First, let us fix an optimal

off-line strategy, which is denoted the optimal strategy in the following. W.l.o.g., the

optimal strategy fulfills the following properties.

2.2. Caching on trees 21



If a node vissues a read request for an object x, then the optimal strategy does

not delete a copy, that is, the only possible change of the residence set is that a

new copy of xis moved to v.



If a node vissues a write request for an object x, then the only possible changes

of the residence set are that a new copy of xis moved to vand/or a copy of xis

deleted on the neighbor of v.

Fix an object x



X, and a sequence σ



σ1σ2



of read and write requests for

x. Let Le





and Lopt





denote the load of the edge strategy and the optimal strategy,

respectively, after serving σt, and let Φ





denote the value of a potential function after

serving σt, which is defined in detail later. In order to prove the lemma, we show the

following invariant.

(a) Le



'



)(



Lopt





and

(b) Φ



*

Let ca





and cb





denote the value of the counters caand cb, respectively, after

serving σt, and let Re





and Ropt





denote the residence set of the edge strategy and

the optimal strategy, respectively, after serving σt. We define



+

Φa





Φb







where, for v

-,



Φv



/

2cv





, if v







and v



Ropt











, if v







and v



Ropt







2cv





, if v







and v



Ropt









, if v







and v



Ropt





First, we prove invariant (b). The optimal strategy always holds a copy of xon

a node. Hence, v



Ropt





, for some node v

4,



, and consequently Φv



5

which implies Φa





Φb



)

D. Hence, invariant (b) is shown.

Now we prove invariant (a) by induction on the length of σ. Obviously, (a) holds

for the initial setting. For the induction step suppose that Le



6



/(



Lopt





. Let

∆Le













,∆Lopt



Lopt







Lopt





,∆Φa



Φa







Φa





, and

∆Φb



Φb







Φb





. In order to prove the induction step, we show that

∆Le



∆Φa



∆Φb

(



∆Lopt

(2.1)

We distinguish between read and write requests.



Suppose σt



1is a read request issued by node a. In this case, Equation (2.1) can

be checked with Table 2.1 containing all possible changes of configuration. Note

that, if aissues a read request, the only possible changes of the residence sets

are that one of the strategies moves a copy to a, or cb



0 and the edge strategy

deletes the copy on b. In both cases, ∆Φb



22 Chapter 2. Caching without Memory Capacity Constraints









Ropt















Ropt







∆Le∆Φa

(

∆Lopt

no no no no 1 2 1

no no no yes 1 3D



no yes no yes 1



1 0

no no yes no 1





no no yes yes 1





D D

no yes yes yes 1





yes no yes no 0 1 1

yes no yes yes 0 3D



yes yes yes yes 0 0 0

Table 2.1: Possible changes of configuration if node aissues a read request.



Suppose σt



1is a write request issued by node a. We distinguish between the

cases cb





0 and cb



8

0. Note that, if aissues a write request, the only

possible changes of the residence set of the optimal strategy is that a new copy of

xis moved to aand/or a copy of xis deleted on b.

–Suppose cb





0. In this case, Equation (2.1) can be checked with Table

2.2 containing all possible changes of configuration. Note that, if aissues a

write request and cb





0, the only possible transitions of the edge strategy

are from

, from

, or from



–Suppose cb



)

0. In this case, Equation (2.1) can be checked with Table

2.3 containing all possible changes of configuration. Note that, if aissues a

write request and cb



+

0, the only possible transitions of the edge strategy

are from

or from

This completes the proof of Lemma 2.2.

Next, we describe how the edge strategy can be adapted from the centralized to the

distributed setting. Each node keeps always the current value of both counters caand

cb. It is obvious that this assignment makes it possible for the nodes to make the right

decisions according to the edge strategy. Now we specify how each node keeps track

of both counters caand cb. W.l.o.g., consider node a.



A read request for xis issued by b: If bholds no copy of xthen ais able to

update its counters. In the other case, bsends an information message along eif

and only if bhas increased its counter cb.



A write request for xis issued by b: If aholds a copy of xthen ais able to update

its counters. In the other case, bsends an information message along eif and

only if bhas decreased its counter caor increased its counter cb.

2.2. Caching on trees 23





Ropt











Ropt







∆Le∆Φa

(

∆Φb

(

∆Lopt

a a a a 0 0



2 0

a a



b a a



b0 0 1 1

a a



b a a 0 0



2 0

a b a b 0 0 1 1

a b a a



b0 3D1 1



a b a a 0 3D



b a b a 1 0



1 0

b a



b b a



b1 0 2 1

b a



b b a 1 0



1 0

b b b b 1 0 2 1

b b b a



b1 3D2 1



b b b a 1 3D





b a a



b a 1 0



1 0



b a



b a



b a



b1 0 2 1



b a



b a



b a 1 0



1 0



b b a



b b 1 0 2 1



b b a



b a



b1 3D2 1





b b a



b a 1 3D





b a a a 1 0



1 0



b a



b a a



b1 0 2 1



b a



b a a 1 0 2



3D0



b b a b 1 0 2 1



b b a a



b1 3D2 1





b b a a 1 3D2



3D D

Table 2.2: Possible changes of configuration if node aissues a write request and





In this way, each node is able to keep both of its counters up-to-date. The following

lemma shows that the additional information messages are sent very rarely.

Lemma 2.3 The distributed edge strategy minimizes the load up to a factor of 3.

Proof. We adopt the notations and definitions of Lemma 2.2. An information message

is only sent if a request changes the value of a counter. Thus, the tables used for

proving Lemma 2.2 change only slightly.



Suppose σt



1is a read request issued by node a. Then, an additional message

is sent only, if a







. In this case, ∆Leequals 1 rather than 0. Further, ∆Φa

equals



2 rather than 0, if a



Ropt





and ca





D. It is easy to check that

Equation (2.1) is still satisfied by applying these changes to Table 2.1.

24 Chapter 2. Caching without Memory Capacity Constraints





Ropt











Ropt







∆Le∆Φa

(

∆Φb

(

∆Lopt

a a a a 0 0 0 0

a a



b a a



b0 0 0 1

a a



b a a 0 0



3D0

a b a b 0 1 0 1

a b a a



b0 1 0 1



a b a a 0 1



3D D

b a b a 1



1 0 0

b a



b b a





1 0 1

b a



b b a 1



3D0

b b b b 1 2 0 1

b b b a



b1 3D



1 0 1



b b b a 1 3D



3D D

b a a a 1





D0 0

b a



b a a







D0 1

b a



b a a 1





3D0

b b a b 1





D0 1

b b a a







D0 1



b b a a 1





3D D

Table 2.3: Possible changes of configuration if node aissues a write request and



+



Suppose σt



1is a write request issued by node a. Then, an additional message

is sent only, if b







. In this case, ∆Leequals 1 rather than 0. Further, ∆Φa

equals



2 rather than 0, if a



Ropt





and ca





D. It is easy to check that

Equation (2.1) is still satisfied by applying these changes to the Tables 2.2 and

2.3.

Hence, Lemma 2.3 is proved.

Caching on trees

The distributed edge strategy can be extended to a tree T









. The edges in the

tree Tare allowed to have arbitrary bandwidths.

The tree strategy is composed out of



individual edge strategies. The idea is

to simulate the distributed edge strategy on each edge. Consider an arbitrary edge









. The removal of edivides Tinto two subtrees Taand Tb, containing aand

b, respectively. The two nodes aand bexecute the algorithm described in Figure 2.4.

The phrases “if aholds a (no) copy of x” and “node aissues a read (write) request for

object x” are just replaced by “if a node in Taholds a (no) copy of x” and “a node in Ta

issues a read (write) request for object x”, respectively.

2.2. Caching on trees 25

The simulation works properly as long as the nodes in the residence set R





build

a connected component in the tree. A key feature of our edge strategy is that it fulfills

this condition. This is shown in the following lemma.

Lemma 2.4 The graph induced by the residence set R





is always a connected com-

ponent.

Proof. Via induction on the length of the sequence of requests it can be shown that

those counters on any simple path in the tree that are responsible for moving a copy

along an edge towards the first node of the path are non-decreasing from the first to

the last node on the path. Hence, all these counters “agree” about the distribution of

copies. This ensures that all copies stay in a connected component.

Fix an object x, and a sequence σ



σ1σ2



of read and write requests for x. Fix

a node vand two different edges x









and y









that are incident on v. The

removal of xand ydivides Tinto three subtrees Tu,Tv, and Tw, containing u,vand

w, respectively. For each edge







, there are the two counters c

;

aand c

;

bin the

distributed edge strategy. Then, let cx





and cy





denote the values of the counters cx

and cy

v, respectively, after serving σt.

In order to show the lemma, we prove by induction that cx



<(





. Obviously

this is true for the initial setting. For the induction step suppose that cx



*(





. In

the following we have to consider the three cases that σt



1is a request from a node in

Tu,Tv, or Tw.



Suppose σt



1is a request from a node in Tu. Then, cx





8-,



=





and cy





>4,









. If cx





*



'

(

Dthen cy





5



?



. Thus, cx





)(









Suppose σt



1is a request from a node in Tv. Then, cx





*@,



=





and cy





8-,



=





. Thus, cx





*(









Suppose σt



1is a request from a node in Tw. Then, cx





*-,



A





and cy





54,



B





. If cy





*







0 then cx





5



C



. Thus, cx





)(







This completes the proof of Lemma 2.4.

Note that the tree strategy also does not need any additional information exchange

apart from the one done by the distributed edge strategy. Therefore, the following

theorem follows immediatelyfrom Lemma 2.3. It showsthat the tree strategy isstrictly

3-competitive for trees with edges of arbitrary bandwidth.

Theorem 2.5 The tree strategy minimizes the load on any edge up to a factor of 3.

26 Chapter 2. Caching without Memory Capacity Constraints

2.3 Applications of the access tree strategy

The access tree strategy can be applied to several classes of networks. Given any net-

work G, the decomposition tree T





can be constructed by a hierarchical decomposi-

tion of G. The quality of the resulting caching algorithm depends on some properties

of this decompositions. Although it is not clear which properties can be obtained for

general networks, applying the access tree strategy to almost any standard network

yields interesting results. In this section, we give some examples, including meshes,

fat-trees, complete networks, Cayley networks, edge symmetric networks, hypercu-

bic networks, and Internet-like clustered networks. Further, we present the universal

caching strategy working on arbitrary networks for which an oblivious routing strategy

is given.

2.3.1 Meshes

In this section, we consider caching strategies for the mesh M







777





, i.e.,

the d-dimensional mesh-connected network with side length mi



2 in dimension i.

The number of nodes is denoted by n, i.e., n





md, the number of edges is

∑d

1m1

























. Each edge has bandwidth 1. Thus,

the relative and absolute load of an edge are identical.

We show that the access tree strategy is strictly O





logn



-competitive, w.h.p., for

M. In order toprovethisresult, we givean upper boundon themaximum expectedload

in Mdue to the access tree strategy. To complete the proof, we show with a Chernoff

bound (see, e.g., [HR90]) that the maximum load over all edges does not deviate to

much from the expected load of an arbitrary edge. In order to apply a Chernoff bound

reasonably, we have to make the access tree strategy more fine granularly, i.e, the

access tree nodes have to be remapped dynamically in Mwhen too many messages

that simulate messages of the tree strategy traverse a node. The remapping is done by

the general remapping scheme. We present this scheme in a more general fashion such

that it can be used for all networks, not onlyfor meshes. Finally, we give a lower bound

for on-line routing on meshes showing that the above competitive ratio is optimal up

to a factor Θ





Upper bound

The access tree strategy uses a locality preserving embedding of access trees. It is

based on a hierarchical decomposition of M, which we describe recursively. Let ibe

the smallest indexsuch that mi



max



777



. If mi



1 then we have reached the

end of the recursion. Otherwise, we partition Minto two non-overlapping submeshes





777







E

777





and M





777

F!



"6

77G7





. These submeshes are then

decomposed recursively according to the same rules. Figure 2.1 on Page 16 gives an

example for this decomposition.

2.3. Applications of the access tree strategy 27

The hierarchical decomposition has associated with it a decomposition tree T





in which each node corresponds to one of the submeshes, i.e., the root of T





cor-

responds to Mitself, and the children of a node vin the tree correspond to the two

submeshes into which the submesh corresponding to vis divided. Thus, T





is a

binary tree of height O



logn



in which the leaves correspond to submeshes of size

one, i.e., to the nodes of M. We define the root to be on level 0 of this tree, and all

nodes whose parents are on level iare defined to be on level i



1. For each node vin





, let M





denote the corresponding submesh. Furthermore, each edge eof T





connecting a level inode uwith a level i



1 node vis defined to be on level i



1 and



H





We interpret T





as a virtual network that we want to simulate on M. In order to

compare the congestion in both networks we define bandwidths for the edges in T





i.e., for an edge eof T





, define the bandwidthof eto be the number of edges leaving

submesh M





. Figure 2.2 on Page 17 gives an example of a decomposition tree with

bandwidths.

For each object x



X, define an access tree Tx





to be a copy of the decompo-

sition tree T





. We embed the access trees randomly into M, i.e., for each x



each interior node vof Tx





is mapped by a random hash function h







to one of

the processors in M





, and each leaf vof Tx





is mapped onto the only processor in





. For simplicity, we assume that the hash functions map in a truly random fashion,

i.e., uniformly and independently.

The remaining description of our data management strategy is very simple: For

object x



X, we execute the tree strategy on the access tree Tx





. All messages

that should be sent between neighboring nodes in the access trees are sent along the

dimension-by-dimension order path between the associated nodes in the mesh, i.e.,

the unique shortest path between the two nodes using first edges of dimension 1, then

edges of dimension 2, and so on.

The following lemma gives an upper bound on the maximum expected load in M

due to the access tree strategy.

Theorem 2.6 For each application on the d-dimensional mesh M with n nodes, the

access tree strategy achieves maximum expected load O





logn



Copt





, where

Copt





denotes the optimal congestion for the application.

Proof. In order to prove the above result, we require a lower bound on the congestion

of an optimal strategy and an upper bound on the expected load of the access tree

strategy. Define Copt





to be the optimal congestion for the application when it

is executed on the binary tree T





, under the assumption that each processor of Mis

simulated by its counterpart in T





, which is one of the leaf nodes. Note that T





is used only as a tool in the proof; in our caching strategy, we actually use a separate

tree Tx





for each data object x. We give a lower bound that relates the optimal

congestion on the mesh to Copt





, and an upper bound that relates the expected

load of the access tree strategy to Copt





. We start with the lower bound.

28 Chapter 2. Caching without Memory Capacity Constraints

Lemma 2.7 Copt



)(

Copt





Proof. For a given strategy on Mwith congestion Cwe have to describe a strategy on





with congestion at mostC.

We simulate the strategy for Mon T





, except for the routing. Instead, for the

routing paths in the mesh we use the unique shortest paths between the respective

nodes, that is, whenever a message is to be routed between two mesh nodes, we instead

route the same message along the unique shortest path in the tree between the leaf

nodes corresponding to these mesh nodes. Let C

denote the congestion for a given

application with the above strategy on T





. Let edenote an edge of T





with

relative loadC

. Then the absolute load of eisC

I





Now consider the same application on M. Any message that crosses ein T





has

either to leave or to enter the submesh M





in M. The number of edges leaving M





is b





. Thus, the load on one of these edges is at least C

IJ







/

, and hence,



The following lemma gives the upper bound on the expected load of the access tree

strategy.

Lemma 2.8 The maximum expected load in M due to the access tree strategy is



logn



Copt





Proof. Fix an arbitrary edge ein M. Let hdenote the height of T





, and let L





denote the load on edue to the simulation of edges on level



of T





, for 1

(



(

We show that E



GM





Copt





, for 1

(



(

h, which yields the lemma as





logn



Fix a level



. Let vbe a node of T





on level





1 such that M





includes the

edge e. If such a node does not exist then E



GMN

0. Let v

be one of the two

children of v. We bound the expected load on edue to the simulation of the tree edge

O,



IP.

on level



of T





First, we give a boundon the probability P





that eis traversed by a dimension-by-

dimension order pathconnecting themesh nodesthat simulate vandv

. ThemeshM





is partitioned by the hierarchical decomposition into two submeshes, one of which is



I

. Let qidenote the side length of M



I

in dimension i. Let kdenote the dimen-

sion of edge e. The nodes vand v

are embedded randomly into M





and M



IQ





is bounded by the probability that the dimension-by-dimension order path between the

host of vand the host of v

traverses the row of e, i.e., the maximum set of edges of

dimension kthat build a linear array that includes e. Regardless of whether the path

starts at vor v

, the number of reachable rows in dimension kwith respect to the ran-

dom embedding is at least ∏d

kqi. As each of the these rows is equally likely to be

traversed,



8(

∏d

kqi

(

∏d

1qi

(

qmax

∏d

1qi

with qmax



max1

2.3. Applications of the access tree strategy 29

Next we relate this probability to the bandwidth b





of the tree edge eT. This

bandwidth is defined to be the number of edges leaving M



IT

. As a consequence,



U(

∑

V8W

qmax



∏d

1qi

(

∑

V8W

qmax



∏d

1qi

qmax



(



∏d

1qi

qmax

(





Note that we are allowed to exclude any dimension jwith qj

qmax



from the

above sum because the hierarchical mesh decomposition ensures that the mesh is not

divided along these dimensions before the decomposition of M





on level





1 such

that M



I

, which results from this decomposition, has no outgoing edges in one of

these dimensions.

The maximum number of messages that are transmitted along the tree edge eTis



Copt



N





, since the tree strategy is strictly 3-competitive. Consequently,

the expected load on edge efor simulating eTis at most



Copt











)(

18d



Copt





The samebound holdsfor theedge connecting vwithits otherchild. Hence, E



GMZ





Copt





, which yields the lemma.

Applying Lemma 2.7 and Lemma 2.8 yields Theorem 2.6.

The general remapping scheme

In this section, we introduce the general remapping scheme. This scheme is presented

in a more general fashion such that itcan be used for all networks, not only for meshes.

Consider a network Gand a hierarchical decomposition of G. The associated

decomposition tree T





is the tree in which each node corresponds to one of the

subgraphs, i.e., the root corresponds to Gitself, the children of a node vcorrespond

to the subgraph into which the subgraph corresponding to vis divided, and the leaves

correspond to subgraphs of size one, i.e., to the nodes of G. We define the root of





to be on level 0, and all nodes whose parents are on level iare defined to be

on level i



1. For each node vin T





, let G





denote the corresponding subgraph.

Furthermore, each edge ein T





connecting a level inode uwith a level i



1 node v

is defined to be on level i



1 and G



H





. Finally, we define the bandwidths, for

each edges ein T





, to be the number of edges leaving subgraph G





30 Chapter 2. Caching without Memory Capacity Constraints

For each objectx



X, define an access tree Tx





to bea copy of thedecomposition

tree T





. We embed the access trees randomly into G, i.e., for each x



X, each

interior node vof Tx





is mapped uniformly at random to one of the nodes in G





and each leaf vof Tx





is mapped onto the only node in G





The remaining description of our caching strategy is very simple: For each object



X, we execute the tree strategy on the access tree Tx





. All messages that should

be sent between neighboring nodes in the access trees are sent along a path in G.

The access tree nodes have to be remapped when too many access messages, i.e.,

messages that simulate messages of the tree strategy, traverse a node. The remapping is

done as follows. For every object x, and every node vof the access tree Tx





we add a

counterr







. Initially, thiscounter isset to 0. Whenever an access messagefor object

xtraverses node v, starts at node v, or arrives at node v, the counter r







is increased

by D, if this message is a migration message, and the counter r







is increased by

1, otherwise. When the counter r







reaches Kthe node vis remapped randomly to

another node in G





, where Kis some integer with K





deg





∆



[



where deg





denotes the degree of T





, and ∆





denotes the maximum

factor by which the bandwidths of two edges from T





incident on the same node

deviate. Remapping vto a new host means that we have to send a remapping message

that informs the new host about the migration and, if the old host holds a copy of x,

moves the copy to the new host. Remapping messages reset the counter r







to 0.

Furthermore, we have to send notification messages including information about the

new host to the nodes in Gthat hold the access tree neighbors of v. These notification

messages also increase the counters at their destination nodes, i.e., the counter r





IQ

for each neighbor v

of vin the access tree, by 1. The counter mechanism ensures

that the load directed to a copy on a randomly selected host is O





rather than Θ





where κdenotes the maximum number of write requests directed to the same object.

Note that this bound would fail if the notification messages would not increase the

counters.

For a given application on Glet Copt





denote the optimal congestion for the ap-

plication. Further, let Copt





denote the optimal congestion for the application

when it is executed on the tree T





, under the assumption that each node of Gis sim-

ulated by its counterpart in T





, which is one of the leaf nodes. The decomposition

tree T





is called ΨG-useful if, for every application, the following two conditions

are fulfilled.



Copt



)(

Copt







The maximum expected relative load in Gdue to access messages is ΨG



Copt





The load on the edges in Gis increased by the notification and remapping mes-

sages. However, the following lemma shows that the impact of these messages on the

load on the edges is relatively small.

2.3. Applications of the access tree strategy 31

Lemma 2.9 Consider a network G with n nodes, and a ΨG-useful decomposition tree





of G. Then, for each application, the access tree strategy achieves congestion



ΨG



Copt





min

deg



N

∆



[



.8

deg



N

logn



, w.h.p., where

κdenotes the maximum number of write requests directed to the same object.

Before we prove Lemma 2.9, we apply it to the mesh M. We can conclude with

Lemma 2.7 and Lemma 2.8 that the decomposition tree T





is O





logn



-useful.

In addition, note that deg



H

3 and ∆



H

2. Then, we can conclude with

Lemma 2.9 that the access tree strategy achieves congestion O





logn



Copt





min



.\

logn



, w.h.p., where κdenotes the maximum number of write requests

directed to the same object.

To yield the following theorem we have to relate the optimal congestion Copt





and min



. Either an object xis migrated at least once or each request message

to xis directed to the node where the initial copy of xis placed. As the nodes in

the mesh have maximum degree 2d, it follows 2d



Copt



\

min



. Thus, the

access tree strategy achieves congestion O





logn



Copt





, w.h.p., i.e., it is strictly





logn



-competitive, w.h.p., for the d-dimensional mesh Mwith nnodes.

Theorem 2.10 For every application on the d-dimensional mesh M with n nodes, the

access tree strategy achieves congestion O





logn



Copt





, w.h.p., where Copt





denotes the optimal congestion for the application.

Proof (of Lemma 2.9). First, we consider the notification messages. We show that

the congestion due to notification messages in T





is not larger than the congestion

due to access messages. Note that T





is used as a tool in the proof and actually a

separate tree Tx





is used for each data object.

Let Ndenote the maximum number of notification messages passing a single edge

of T





. Suppose eTis an edge of T





that transmits Nnotification messages. Each

of the notification messages traversing eTwas caused by at least load Kdue to access

or notification messages that pass one of the two nodes incident on eT. Therefore, at

least one of the nodes incident on eTis touched by at least load N





2 due to access

or notification messages. Hence, the load on at least one of the at most deg





tree

edges incident to this node is at least N









deg





. Let e

Tdenote one of the

incident edges fulfilling this property. For K





deg



N

∆





, the load on

Tdue to access or notification messages is at least N



∆





. Then the load on

Tdue to access messages is at least N



∆





because the maximum number of

notification messages that cross e

Tis N.

By definition is b



>





∆





. Hence, N





<(



∆









i.e., the relative load on eTdue to notification messages is not larger than the relative

load on e

Tdue to access messages. As we execute the strictly 3-competitive tree

strategy on each access tree, we can conclude that the expected relative load due to

access and notification messages on every edge in T





is in O



Copt





. Thus,

the expected load due to access and notification messages on an arbitrary edge of Gis

bounded by O



ΨG



Copt





since T





is a ΨG-useful decomposition tree of G.

32 Chapter 2. Caching without Memory Capacity Constraints

Until now, we have not considered the remapping messages. These messages have

no direct counterpart in the decomposition tree T





. However, for each remapping

message we can specify some related access or notification messages that are sent

along the edges of the access trees: Consider a node vof the access tree Tx





of an

object x



X. Every time the counter r







reaches Kthe node vis remapped and a

remapping message is sent from the old host to the new host of v.r







is increased

whenever an access or notification message traverses v. Hence, each remapping mes-

sage is caused by at least load Ktouching node v.

Again, we merge together the access trees of all objects to form the virtual de-

composition tree T





. Fix a node vfrom this tree. Let H





denote the number of

remapping messages sent for all access tree nodes corresponding to v. The remapping

messages sent for vonly influence the load on edges in G





. In detail, a remap-

ping message for vis sent between two randomly chosen nodes in the subgraph G





Hence, for the expected load on an edge eof G





this has the same effect as sending a

remapping message along an edge in T





leading from vto an uniformly at random

chosen child of v. As we are primarily interested in the expected load on the edges

in G, the remapping messages for vcan be related to additional load on the edges in





leading from vto the children of v. Let eTdenote one of these edges. Then the

additional load on eTdue to remapping messages is at most H





The total load due to access and notification messages that are sent along the at

most deg





tree edges incident on vis at least H



[

K. Thus, one of these edges

has at least load H







deg





. Let e

Tdenote one of these edges fulfilling this

property. For K



deg



'

∆



'

D, the load on e

Tdue to access or notification

messages is at least H





∆





By definition is b



]





∆





. Hence, H



^





](



^



∆









, i.e., the relative load on eTdue to remapping messages is not larger

than the relative load on e

Tdue to access and notification messages. As we execute the

strictly 3-competitive tree strategy on each access tree, we can conclude that the ex-

pected relative load due to access, notification and remapping messages on every edge

in T





is in O



Copt





. Thus, the expected load due to access, notification, and

remapping messages on an arbitrary edge of Gis bounded by O



ΨG



Copt





since





is a ΨG-useful decomposition tree of G.

Finally, let L





denote the load on an arbitrary edge eof Gdue to all kind of

messages. We have shown that



GM



ΨG



Copt





In order to complete the proof of this lemma, we have to show that the maximum load

over all edges does not deviate too much from the expected load of an arbitrary edge.

We show that L





can be decomposed into deg





parts such that L





∑deg

;_;

1Li, and each Liis a sum of independent random variables. Then, we can

apply a Chernoff bound to each Li. We color the edges of the access trees with the

colors



777



deg



.

such that all edges incident on the same node have different

colors.

2.3. Applications of the access tree strategy 33

Fix a color j

`,



777



deg



=.

. Let Exbe the set of tree edges of Tx





with

this color. For eT



Ex, let A







be a random variable that is 1







if the path

that simulates eTincludes ewhen the ith message corresponding to object xtraverses

eT. Here a migration or remapping message of an object xis interpreted as Dsmall

messages each of which having unit size. Now define

Lj:



∑

X∑

Ex∑







Obviously, all of the random variables A







for different xare stochastically

independent. The coloringyieldsthatall randomvariablesfor differenteTare indepen-

dent, too. Further, the remapping ensures that the set of all random variables with fixed

eTand xcan be partitioned into subsets of maximum sizeW, withW





min



.b

such that the values of all variables in the same subset are equal and any collection

of variables from different subsets includes only independent variables. Recall that κ

denotes the maximum number of write requests directed to the same object.

Hence, the term Ljcan be viewed as a sum of independent random variables of

maximumweightW. Applying a Chernoff bound (see, e.g., [HR90]) to this sum yields

that Lj





'



logn



, w.h.p., and therefore,



c

deg

;P;

∑

1Li







deg







logn







ΨG



Copt



'

deg







logn

4

w.h.p. This completes the proof of Lemma 2.9.

Lower bound

The theorem in the previous section shows that the access tree strategy is strictly O





logn



-competitive. In the following, we give a lower bound for on-line routing which

shows that this factor is nearly optimal.

We consider the following on-line routing problem: An adversary specifies a se-

quence of ∆routing requests, i.e., pairs rt









of source and destination nodes,

for 1

(

∆. An on-line routing algorithm must assign a routing path connecting st

and dt, for 1

(

∆, without knowing future requests, i.e., requests rt

witht

t. The

goal is to minimize the congestion.

The following theorem gives a lower bound on the competitive ratio for on-line

routing on meshes. The bound holds for each deterministic or randomized on-line

algorithm. A similar bound, but only for two-dimensional meshes, has been derived

independently also by Bartal and Leonardi in the context of routing in “all-optical

networks” [BL97].

Theorem 2.11 Any on-line routing algorithm for the mesh of dimension d



2and

side length m has competitive ratio Ω



logm



34 Chapter 2. Caching without Memory Capacity Constraints

Caching in networks includes the problem of on-line routing, which can be shown

as follows. Forevery messagein the routingproblem, we define acorresponding object

xtthat is placed initially on node st. At timet, the node stwrites object xt; immediately

afterwards, node dtreads object xt. Then some data must be routed from stto dt. From

this observation, we can conclude the following corollary.

Corollary 2.12 Any caching strategy for the mesh of dimension d



2and side length

m has competitive ratio Ω



logm



Proof (of Theorem 2.11). We show that for each C,d



2, and m



4 being a power

of 2, there is a random routing problem Pd







for which the minimum congestion

is Cwhereas the expected congestion achieved by any on-line routing algorithm is

Ω





logm



Let Md





denote the d-dimensional mesh of side length m. We start by proving a

lower bound for M2





. First, we describe a random on-line routing problem P2







on M2





, for which the minimum off-line congestion is C. Then we show that the

expected congestion of any on-line strategy is Ω





logm









is defined as follows. Let









denote the k-th node in the



-th row

of M2





, for 1

(





(

m. The adversary starts by specifying m



2 pairs of source

and destination nodes each of which should be connected by Crouting paths. The

pairs are









B















, for 1

(



(



2. Further requests are described

recursively: The mesh M2





can be partitioned into four m







2 submeshes. If





4, then the adversary selects one from these submeshes at random and specifies

routing requests in this submesh according to P2









Altogether, this gives logm



1 batches of routing requests of size C













77G7





2. For 1

(

logm



1, the routing batch specified in stage iis denoted

by Riand the submesh considered in this stage is denoted by Sisuch that S1







It is easy to check that, for 1

(

logm



2, there exists an off-line schedule that

routes the requests of batch Riwith congestion Cthrough mesh Siwithout using any

edge of mesh Si



1. Thus, all requests can be routed off-line with congestionC.

It remains to be shown that the expected congestion of any on-line strategy is Ω





logm



. The mesh M2





consists of m



1 rows of vertical edges and m



1 columns

of horizontal edges, each containing medges. We number the rows and columns from

1 to m



1, respectively. In the following, we only consider the edges in odd rows

and odd columns. These edges are called odd edges. Each routing path connecting a

source and a destination node of a request in batch Rihas to traverse at least m





odd edges of submesh Si. Note that this bound holds even if a path leaves the submesh

Si. Hence, if one chooses randomly and uniformly an edge from the odd edges in Si,

then the expected number of paths connecting two nodes of batch Riusing this edge is

at least











Eodd























22i





with Eodd





denoting the set of odd edges in Si, for 1

(

logm



1. Choosing a

random odd edge from Slogm



1rather than from Siyields the same bound, because

2.3. Applications of the access tree strategy 35

the selection of the submeshes by the adversary corresponds to random selections of

subsets of odd edges. Note that this holds only for the odd edges because if m



2 is

even, then all of the odd edges in an m



mmesh will be contained in the m







submeshes, while all of the edges going between different submeshes are even. Hence,

the expected congestion in Slogm



1due to requests from batch Riis C



8, for 1

(

logm



1. Summing over all batches, yields that the expected congestion of P2







isC





logm





8, which completes the proof for the 2-dimensional case.

We now describe the on-line routing problem Pd







for the mesh Md





with



3. Md





can be partitioned into J





2two-dimensional m



msubmeshes



777



MJ, each of which consists only of edges of dimension 1 and 2. Pd







defined as follows. The adversary specifies the routing requests in each submesh Mj

according to P2







, for 1

(

J. In each submesh Mjit uses the same random

bits, which means that it specifies exactly the same routing problem in each Mj, for

(

J. We have already seen that all requests can be routed off-line with congestion

Cinside the respective 2-dimensional mesh Mj. Hence, it remains to be shown that

the expected congestion of any on-line strategy on this routing problem is Ω





logm



This we do by contradiction.

Suppose an on-line routing strategy exists for which the expected congestion on







is smaller than C





logm





8. Then this routing strategy can be simulated

on the m



mmesh M2





for P2











. In this simulation, the nodes and the

edges of M2





simulate their respective counterparts in the meshes M1



777



MJ. The

simulation yields congestion smaller than md







logm





8, since each edge of





has to simulate only md



2edges of Md





. This contradicts the above result

for P2











. Consequently, the expected congestion of any on-line strategy on







is at least C





logm





2.3.2 Fat-trees

In this section, we consider caching strategies for the fat-tree Fof height Hwith n

nodes. Fhas thetopologyof a symmetric tree, that is, for each inner-node, the subtrees

rooted at the children of the node are isomorph. The fat-tree represents an indirect

network, that is, only the leaf nodes are processors with memory modules, the inner

nodes are only routing switches. We define the root of Fto be on level 0, and all

nodes whose parents are on level iare defined to be on level i



1. Furthermore, each

edge eof Fconnecting a level inode with a level i



1 node is defined to be on level



1. Thus, F







B

777









, i.e., for every level iin the tree F, each node

on level ihas di



1children and each edge on level ihas bandwidth bi. We make the

reasonable assumption, for every level i,bi





The following lemma shows that the bandwidths of a fat-tree can be restricted

without increasing the congestion.

Lemma 2.13 W.l.o.g. we can assume, for every level i, bi

(







36 Chapter 2. Caching without Memory Capacity Constraints

Proof. We have to show that a given strategy with congestion Con the fat-tree F

has on the fat-tree F

Ie





B

777





K

min

K







.eB

777









at most

congestionC.

Fix an edge e

on level



in F

. Let e1



777



fQg

1denote the edge on level





1 in F

that are incident to e, and, for an edge e, let L





denote the load on e.

Suppose that b



min









. Then the lemma holds obviously. Now,

suppose that d









min

K







. Any message that crosses e

has to

cross one of the edges e1



777



fQg

1, too. Thus, L



IT\(

∑d

fQg





. The relative load

of an edge is defined to be the load divided by the bandwidth of the edge. Hence, the

relative load on e



I







(

∑d

fQg











(

max



B

777





fQg

.



(



which yields the lemma.

The fat-tree Fis denoted realistic if the bandwidths of the levels decrease geomet-

rically in direction to the root, i.e., if, for every level i,bi

(









1, for some

constant α

The access tree strategy uses a locality preserving embedding of access trees. It

is based on a hierarchical decomposition of F, which we describe recursively. If F

has size one then we have reached the end of the recursion. Otherwise, we partition F

into d1non-overlapping subtrees F1



777



Fd1with Fi







B

777









. These

subtrees are then decomposed recursively according to the same rules.

The associated decomposition tree T





is isomorph to F. Since we obtain, for

fat-trees, a tradeoff between the height of the decomposition tree and the achieved

congestion, we need a decomposition tree of arbitrary height 1

(

H. Thus, some

levels of T





have to be deleted possibly. This is done with the following deletion

algorithm. While height





h, delete a level 0



height





in T





whose nodes have minimum degree. Deleting level



means that, first, all nodes and

edgesonlevel



are deleted, then theedges on level





1are connected to the respective

nodes on level





1, and finally the numbering of the levels is adjusted.

Now, the decomposition tree T





has height h. The root of T





corresponds

still to Fitself, and the children of a node vin the tree correspond to the subtrees into

which the subtree corresponding to vis divided, and the leaves correspond to subtrees

of size one, i.e., to the leaf nodes of F. Note that T





is still isomorph to a fat-tree.

The remaining description of our caching strategy is analogous to the caching strat-

egy for meshes in Section 2.3.1. Note that all access tree nodes are mapped only to

leaf nodes of the fat-tree, since the inner nodes are only routing switches, and that all

messages that should be sent between neighboring nodes in the access trees are sent

along the unique shortest path between the associated nodes in the fat-tree.

2.3. Applications of the access tree strategy 37

The following theorem shows that we obtain, for h



min



logn



loglogn

an access tree strategy that is O



min



logn



loglogn

.e

-competitive, w.h.p., for

fat-trees of height Hwith nnodes. For realistic fat-trees this result improves

to O





-competitive, w.h.p. Further, we obtain a caching strategy that is strictly



deg







logn



-competitive, w.h.p. To yield the latter result we have to relate the

optimal congestion Copt





and min



. Either an object xis migrated at least

once or each request message to xis directed to the node where the initial copy of

xis placed. As the nodes in the fat-tree have maximum degree deg





, it follows

Copt



)

min





deg



.

Theorem 2.14 Consider an application on the fat-tree F of height H with n nodes.

Let Copt





denote the optimal congestion for the application, and let κdenote the

maximum number of write requests directed to the same object.



For each 1

(

H, we obtain an access tree strategy with congestion O





N

Copt







, w.h.p., and, if h



H, with congestion O





Copt







w.h.p., with R



min



deg











.*



deg



'

[

logn.



For realistic fat-trees this improves to O





Copt



e



, w.h.p., and, if h



to O



Copt



'



, w.h.p.

Proof. In order to prove the above result we apply Lemma 2.9. Fix an 1

(

DefineCopt





to be the optimal congestion for the application when it is executed

on the tree T





, under the assumption that each leaf node of Fis simulated by its

counterpart in T





, which is one of the leaf nodes. Note that T





is used only

as a tool in the proof and that we actually use a separate tree for each data object.

According to Lemma 2.13 we can assume w.l.o.g., for each level i,bi

(









for some constant α

(

To apply effectively Lemma 2.9, we have to show that the decomposition tree T





is ΨF-useful for a certain ΨF, i.e., we have to prove the following two conditions.



Copt



)(

Copt







Themaximumexpectedrelativeload in Fdue to access messages isO



∑h

1αi





Copt





, and, if h



H,O



∑H

1αi



Copt





In addition, note that ∆



<

deg



>

max

deg



B

. Then, Theorem

2.14 can be yield with Lemma 2.9.

The next lemma shows the lower bound for the optimal strategy.

Lemma 2.15 Copt



)(

Copt





Proof. For a given strategy on Fwith congestion Cwe have to describe a strategy on





with congestion at mostC.

We simulate the strategy for Fon T





. For the routing paths in T





we use the

unique shortest paths between the respective nodes, that is, whenever a message is to

38 Chapter 2. Caching without Memory Capacity Constraints

be routed between two fat-tree nodes, we instead route the same message along the

unique shortest path in the tree between the leaf nodes corresponding to these fat-tree

nodes. Let C

denote the congestion for a given application with the above strategy on





. Let edenote an edge of T





with relative load C

. Then the absolute load of e

isC

Ih





Now consider the same application on F. Any message that crosses ein T





has

either to leave or to enter the subtree F





. The bandwidth of the edge leaving F









. Thus, the load on this edge is at leastC





G



H

, and hence, C



The following lemma gives the upper bound on the expected load of the access tree

strategy.

Lemma 2.16 The maximum expected relative load in F due to access messages is



∑h

1αi





Copt





, and, if h



H, O



∑H

1αi



Copt





Proof. Fix an arbitrary edge ein Fand a level



in T





. Let L





denote the relative

load on edue to the simulation of access messages passing edges on level



of T





Let vbe a node of T





on level





1 such that F





includes the edge e. If such

a node does not exist then E



iM+

0. For each node uin T





, let the root of

subtree F





in Fdenote Fr





. Let



vdenote the level of Fr





in F. For all the

children v1



777



vdof vin T





, the nodes Fr



B

777







are on the same level in

F. Thus, let



cdenote this level. Further, Let



edenote the level of ein F. We show

that E



iM?



αH





Copt





, if







c, and E



GM?





Copt







v, which yields the lemma, since the latter case can only occur for at most

one level in T





First, suppose that







c. Then, a child v

of vexists such that F



I

includes

the edge e.eis traversed by paths connecting the fat-tree nodes that simulate vand

its children v1



777



vd, if the host of vor the host of v

is mapped on a leaf node in





, the respective subtree of the original decomposition tree of height H. The nodes

vand v

are mapped randomly to one of the leaf nodes in F





and F



I

. Thus, for the

probability P





that the host of vis mapped on a leaf node in F







*(

∏H



1di

∏H



1di

(

∏



1di



and for the probability P



Ij

that the host of v

is mapped on a leaf node in F







8(

∏H



1di

∏H



1di

(

∏



1di

Next, we relate thebandwidth b





of an edge eTconnectingthe node vwitha child in





to the bandwidth b



/

e. According to Lemma 2.13 we can assume w.l.o.g.,

2.3. Applications of the access tree strategy 39

for each level i,bi

(









1, for some constant α

(

1. As a consequence,



H

(







∏



The maximum load on the edge eTis Copt









. Consequently, the expected

load on edge efor simulating edges on level



of T





is at most

Copt



[



[







'



I

(

Copt





2α



(

Copt





2αH





with d



∏



1di.

Finally, suppose that



v. Then, the levels







777







1 of Fmust

have been deleted in T





. We assume that eis traversed by all paths connecting the

fat-tree nodes that simulate vand its children v1



777



vd. Now, we relate the bandwidth





of an edge eTconnecting the node vwith a child in T





to the bandwidth



N

e. According to the definition of the fat-trees, for each level i,bi





1. As a

consequence, b



H

(

The maximum load on the edge eTis Copt









. Consequently, the expected

load on edge efor simulating edges on level



of T





is at most

Copt









(

Copt





(

Copt



[

2n1

Note that the deletion algorithm ensures that d

(

2n1

hbecause this algorithm deletes

always a level in T





whose nodes have minimum degree.

This completes the proof of Theorem 2.14.

2.3.3 Complete networks

Some massively parallel computers, e.g., Cray T3E and T3D or Intel Paragon, have

a network with very high bandwidth, so that not the network is the bottleneck but

the individual memory modules. In particular, remote accesses to these modules are

expensive, as local accesses are supported by additional local caches. These systems

are well modeled by a complete network Gnwith nnodes. We aim to minimize the

congestion at the nodes due to remote accesses, that is, remote accesses increase the

load at any node whereas local accesses are free. Each node has bandwidth 1. Thus,

the relative and absolute load of a node are identical.

The access tree strategy uses a locality preserving embedding of access trees. It

is based on a hierarchical decomposition of Gn, which we describe recursively. If Gn

40 Chapter 2. Caching without Memory Capacity Constraints

has size one, i.e., n



1, then we have reached the end of the recursion. Otherwise,

we partition Gninto 2 non-overlapping complete networks G

and G

. These

complete networks are then decomposed recursively according to the same rules.

The associated decomposition tree T





is a binary tree of height O



logn



. Note

that the bandwidth of a node vin T





is the number of the nodes in Gn





. As for

fat-trees, we need a decomposition tree of arbitrary height 1

(

logn. Thus, some

levels of T





have to be deleted possibly. This is done with the deletion algorithm.

Now, the decomposition tree T





has height hand degree O





. Note that

the root of T





corresponds still to Gnitself, the children of a node vin the tree

correspond to the complete networks into which the complete network corresponding

to vis divided, and the leaves correspond still to complete networks of size one, i.e., to

the nodes of Gn.

The remaining description of our caching strategy is analogous to the caching strat-

egy for meshes in Section 2.3.1. Note that all messages that should be sent between

neighboring nodes in the access trees are sent along the edge between the associated

nodes in the complete network.

The following theorem shows that we obtain, for h



1, an access tree strategy for

complete networks with nnodes that is O





-competitive, w.h.p., with respect to the

congestion at the nodes. Further, we obtain, for h



logn, an access tree strategy that

is strictly O



logn



-competitive, w.h.p., with respect to the congestion at the nodes. To

yield the latter result we have to relate the optimalcongestionCopt





and min



Either an object xis migrated at least once or each request message to xis directed to

the node where the initial copy of xis placed. Thus,Copt



*

min



Theorem 2.17 Consider an application on the complete network Gnwith n nodes. For

each 1

(

logn, we obtain an access tree strategy with congestion O





Copt





min





.8



logn



at the nodes, w.h.p., where Copt





denotes the optimal

congestion at the nodes for the application, and κdenotes the maximum number of

write requests directed to the same object.

Proof. Inorder to provethe aboveresultweapplyLemma2.9which holds analogously

for the congestion at the nodes. Fix an 1

(

logn. Define Copt





to be the

optimal congestion at the nodes for the application when it is executed on the tree





, under the assumption that each node of Gnis simulated by its counterpart in





, which is one of the leaf nodes. Note that T





is used only as a tool in the

proof and that we actually use a separate tree for each data object.

To apply effectively Lemma 2.9, we have to show that the decomposition tree





is ΨGn-useful for a certain ΨGn, i.e., we have to prove the following two condi-

tions.



Copt



)(

Copt







The maximum expected relative load at the nodes in Gndue to access messages

is O





Copt





2.3. Applications of the access tree strategy 41

In addition, note that ∆



+

deg



+





. Then, Theorem 2.17 can be

yield with Lemma 2.9.

The next lemma shows the lower bound for the optimal strategy.

Lemma 2.18 Copt



)(

Copt





Proof. For a given strategy on Gnwith congestion Cat the nodes we have to describe

a strategy on T





with congestion at mostCat the nodes.

We simulate the strategy for Gnon T





. For the routing paths in T





we use

the unique shortest paths between the respective nodes, that is, whenever a message is

to be routed between two nodes in Gn, we instead route the same message along the

unique shortest path in the tree between the leaf nodes corresponding to these nodes.

LetC

denotethecongestionatthenodesforagivenapplicationwiththe abovestrategy

on T





. Let vdenote a node of T





with relative load C

. Then the absolute load

of visC

Ih





Now consider the same application on Gn. Any message that crosses vin T





has either to leave or to enter the complete network Gn





. The number of nodes in





is b





. Thus, the load on one of these nodes is at least C

Ih







m

, and

hence, C



The following lemma gives the upper bound on the expected load of the access tree

strategy.

Lemma 2.19 The maximum expected relative load at the nodes in Gndue to access

messages is O





Copt





Proof. Fix an arbitrary node vin Gnand a level



in T





. Let L





denote the load

on vdue to the simulation of access messages passing nodes on level



of T





. Let

vTbe a node of T





on level



such that Gn





includes the node v. If such a node

does not exist then E



iM

0. We show that E



iM



Copt





, which

yields the lemma.

First, we give a bound for the probability P





that the host of vTis mapped onto v.

Let









denote the number of processors in Gn





. As vTis mapped uniformly

at random to one of the nodes in Gn







)(









Next, we relate this probability to the bandwidth b





of vT. This bandwidth is

defined to be the number of the nodes in Gn





. As a consequence,



H









(





The maximum load of vTis Copt









. Consequently, the expected load

on node vfor simulating vTis at most

Copt







[



8(

Copt





Hence, E



GM



Copt





, which yields the lemma.

42 Chapter 2. Caching without Memory Capacity Constraints

This completes the proof of Theorem 2.17.

2.3.4 The universal caching strategy

In this section, we present the universal caching strategy working on arbitrary net-

works for which an oblivious routing strategy is given. A routing strategy is called

oblivious if the path traveled by each packet depends only on the origin and destina-

tion of the packet and not on the origins and destinations of the other packets nor on

congestion encountered during the routing. Note that each oblivious routing strategy

is also an on-line routing strategy. Finally, we apply the universal caching strategy to

several classes of networks.

Suppose we are given an arbitrary networkGwith nnodes and an oblivious routing

strategy for G. Each edge is assumed to have bandwidth 1. Thus, the relative and abso-

lute load of an edge are identical. The universal caching strategy for Gis a simulation

of the access tree strategy for the complete network Gnon G. Note that both Gand Gn

have nnodes such that for each node in Ga counterpart in Gncan be fixed. All mes-

sages that are sent between adjacent nodes in Gnare sent along the paths, determined

by the oblivious routing strategy, between the associated nodes in G.

The following theorem shows that, for a network Gin which a permutation can be

routed obliviously with maximum expected load Lπ





, the universal caching strat-

egy is O



Lπ



+

deg





-competitive, w.h.p., and strictly O



Lπ



+

deg



/

logn



competitive, w.h.p. To yield the latter result we have to relate the optimal congestion

Copt





and min



. Either an object xis migrated at least once or each request

message to xis directed to the node where the initial copy of xis placed. As the nodes

in Ghave maximum degree deg





, it follows deg



[

Copt



*

min



Theorem 2.20 Consider an application on a network G with n nodes in which a per-

mutation can be routed obliviously with maximum expected load Lπ





. For each

(

logn, we obtain a caching strategy with congestion O



Lπ



/

deg



/



Copt





min





.m



logn



, w.h.p., where deg





denotes the degree of G,

Copt





denotes the optimal congestion for the application, and κdenotes the maxi-

mum number of write requests directed to the same object.

Proof. In order to prove the above result we require a lower bound on the congestion

of an optimal strategy and an upper bound on the congestion of the universal caching

strategy. Define Copt





to be the optimal congestion at the nodes for the application

when it is executed on the complete network Gn, under the assumption that each node

of Gis simulated by its counterpart in Gn. Fix an 1

(

logn. Define T





to be the

decomposition tree of Gnwith height hand degree O





, and defineCopt





be the optimal congestion at the nodes for the application when it is executed on the

tree T





, under the assumption that each node of Gnis simulated by its counterpart

in T





, which is one of the leaf nodes.

2.3. Applications of the access tree strategy 43

We give a lower bound that relates Copt





to Copt





, and an upper bound

that relates the congestion of the universal caching strategy to Copt





. We start

with the lower bound.

Lemma 2.21 Copt



)(

deg



[

Copt





Proof. Lemma 2.18 shows that Copt



<(

Copt





. Hence, it remains to prove

Copt



*(

deg



[

Copt





For a given strategy on Gwith congestion Cwe have to describe a strategy on Gn

with congestion at most deg



C

Cat the nodes. We simulate the strategy for Gon Gn,

except for the routing. Instead, for a routing path in G, we use the edge between the

respective nodes in Gn.

Each edge incident on a node vGin Ghas at most load C, since each edge in Ghas

bandwidth 1. Thus, the node vGnin Gnbeing the counterpart of vGin Ghas at most

relative load deg





C, since each node in Gnhas bandwidth 1.

The following lemma gives an upper bound on the congestion of the universal

caching strategy.

Lemma 2.22 The maximum expected relative load in G due to access messages is



Lπ







Copt





Proof. Lemma 2.19 shows that the maximum expected relative load at the nodes in

Gndue to access messages is O





Copt





. Hence, it remains to prove that,

for a given strategy with maximum expected relative load Lat the nodes in Gn, the

simulation of this strategy has at most maximum expected relative load O



Lπ



N



at the edges of G.

Recall that, in the simulation of the strategy for Gnon G, all messages that are

sent between adjacent nodes in Gnare sent along the paths, determined by the oblivi-

ous routing strategy, between the associated nodes in G. Obviously, all these routing

requests can be partitioned into Lpartial permutations. Since a permutation can be

routed in Gwith maximum expected (relative) load Lπ





, the simulation has at most

maximum expected relative load O



Lπ



[



. Note that the routing strategy is obliv-

ious, i.e., the path traveled by each packet depends only on the origin and destination

of the packet.

Applying Lemma 2.21 and Lemma 2.22yields that the maximum expected relative

load in Gdue to access messages is O



Lπ





deg







Copt





Finally, let L





denote the load on an arbitrary edge eof Gdue to all kind of

messages. Analogously to the proof of Lemma 2.9, we can conclude that E



GM[



Lπ





deg



[



Copt





. In order to complete the proof of the lemma, we have

to show that the maximum load over all edges does not deviate too much from the

expected load of an arbitrary edge.

Analogously to the proof of Lemma2.9, we decompose L





into deg





parts

such that L



H

∑deg

;_;

1Li, and each Liis a sum of independent random variables

44 Chapter 2. Caching without Memory Capacity Constraints

of maximum weight W





min



.e

with K









and κdenoting the

maximum number of write requests directed to the same object. Applying a Chernoff

bound (see, e.g., [HR90]) to the sums yields that Lj









logn



, w.h.p.,

and therefore,



c

deg

;P;

∑

1Li





'

deg



[



logn







Lπ



[

deg



[



Copt





min





.^



logn

4

w.h.p. This completes the proof of Lemma 2.20.

Cayley networks

Meyer auf der Heide and V¨ocking have presented in [MV95] an oblivious routing

strategy for Cayley networks that achieves maximum expected load O





diam





for

routing ppermutations in a Cayley network Gwith nnodes, where diam





denotes

the diameter of G. Cayley networks are defined as follows. Let Γbe a finite algebraic

group withidentity 1, and suppose Σis a setgenerator of Γwith 1



Σ. Then the Cayley

network GΓ









is defined byV



Γand E

n,













. Note that many

standard networks belong to this class, e.g., tori, cube-connected-cycles, and wrapped

butterfly networks. Then, the following corollary can be concluded with Theorem

2.20. This corollary shows that, for a Cayley network Gwith nnodes, the universal

caching strategy is O



diam



[

deg





-competitive, w.h.p., and strictly O



diam



[

deg





logn



-competitive, w.h.p.

Corollary 2.23 Consider an application on a Cayley network G with n nodes. For

each 1

(

logn, we obtain a caching strategy with congestion O



diam



'

deg



'



Copt





min





.8



logn



, w.h.p., where Copt





denotes the optimal

congestion for the application, and κdenotes the maximum number of write requests

directed to the same object.

Edge symmetric networks

Further, Meyer auf der Heide and V¨ocking have given in [MV99] an oblivious rout-

ing strategy for edge symmetric networks achieving maximum expected load O





diam





deg





for routing ppermutations in an edge symmetric network Gwith

nnodes. Note that, e.g., all equal-sided tori belong to this class. Then, the following

corollary can be concluded with Theorem 2.20. This corollary shows that, for an edge

symmetric network Gwith nnodes, the universal caching strategy is O



diam





competitive, w.h.p., and strictly O



diam



[

logn



-competitive, w.h.p.

Corollary 2.24 Consider an application on an edge symmetric network G with

n nodes. For each 1

(

logn, we obtain a caching strategy with congestion

2.3. Applications of the access tree strategy 45



diam







Copt



'

min





.8



logn



, w.h.p., where Copt





denotes

the optimal congestion for the application, and κdenotes the maximum number of

write requests directed to the same object.

Hypercubic networks

In addition, Meyer auf der Heide and V¨ocking have introduced in [MV99] an obliv-

ious routing strategy for de Bruijn networks that achieves maximum expected load





logn



for routing ppermutations in the de Bruijn network with nnodes. It is

obvious that a bi-simulation between a de Bruijn network and the shuffle-exchange

network of the same size can be done with congestion 2. Then, the following corol-

lary can be concluded with Theorem 2.20. This corollary shows that, for the wrapped

butterfly, cube-connected-cycles, hypercube, de Bruijn, and shuffle-exchange network

withnnodes, the universal caching strategyis O



logn



-competitive,w.h.p., and strictly



logn



-competitive, w.h.p.

Corollary 2.25 Consider an application on the wrapped butterfly, cube-connected-

cycles, hypercube, de Bruijn, or shuffle-exchange network G with n nodes. For each

(

logn, we obtain a caching strategy with congestion O



logn



Copt





min





.)



logn



, w.h.p., whereCopt





denotes the optimal congestion for

the application, and κdenotes the maximum number of write requests directed to the

same object.

Leighton has presented in [Lei92] an oblivious routing strategy for indirect butter-

fly networks that achieves maximum expected load O





for routing ppermutations

in the indirect butterfly network with nnodes. Hence, the following corollary can be

concluded with Theorem 2.20. This corollary shows that, for the indirect butterfly

network with nnodes, the universal caching strategy is O





-competitive, w.h.p., and

strictly O



logn



-competitive, w.h.p.

Corollary 2.26 Consider an application on the indirect butterfly network G with n

nodes. For each 1

(

logn, we obtain a caching strategy with congestion O





Copt



N

min





.



logn



, w.h.p., where Copt





denotes the optimal

congestion for the application, and κdenotes the maximum number of write requests

directed to the same object.

2.3.5 Clustered Networks

AclusterednetworkG









is a network thatconsists of several smallsubnetworks,

i.e., clusters, that are arranged hierarchically. The cluster-tree T





describes this

hierarchical structure. The internal nodes of T





correspond to the clusters of G, and

the leaves correspond to the terminal user nodes. In the following, we interpret the

terminal user nodes as clusters of size 1. Any two clusters that are connected by one

46 Chapter 2. Caching without Memory Capacity Constraints



































¡¢

£¤

¥¦

§¨

©ª

«¬

®

¯°

±²

³´

µ¶

·¸

¹º

»¼

½¾

¿À

Á ÁÂ

Ã ÃÄ

ÅÆ

ÇÈ

ÉÊ

ËÌ

ÍÎ

ÏÐ

ÑÒ

ÓÔ

ÕÖ

×Ø

ÙÚ

ÛÜ

ÝÞ

ßà

áâ

ãä

åæ

ç ç

è è

éê

ë ë

ì ì

ííí

îîî

ïïïï

ððð

ñ ñ

ò ò

óóó

ôôô

õö

÷ ÷

ø ø

ù ù

ú ú

û û

û ûü

ýþ

ÿÿÿ







 

 



 

 

 









 

 

 

 



 

 

 

 





 

 











 

 



!!!

" "

# #

# #$

%%%%%

&&&&

'''

(((

)))

***

+++++

,,,,

- -

////

000

3 34

5 5

6 6

7 7

8 8

9 9

9 9:

; ;

; ;<

===

> >

Regional Network

Backbone Network

Regional Network

Figure 2.5: The topology of a wide area NOW.

or more edges in Gare also connected by an edge in T

. The bandwidth of an edge

in T

is the sum of the bandwidths of the corresponding edges in G.

For instance, networks of workstations (NOWs) are usually organized as clustered

networks. Figure 2.5 depicts a possible topology for a wide-area NOW. Note that our

definition of clustered networks is slightly more general than the depicted network as

it allows that user terminal nodes are attached also to the internal clusters.

The nodes inside the clusters are connected in an arbitrary fashion. However, we

assume that communication between nodes in the same cluster is less expensive than

communication between nodes in different clusters, which is just the basic idea behind

any kind of clustering. This property can be formalized as follows. Consider a cluster

K. Let VKdenote the set of nodes in the cluster. Define the weight w

of a node

VKto be the sum of the bandwidths of the edges incident on vthat leave K, and the

weight w

of U

VKby w

@DC

∑v

. A cut U for U

VKis a partition of

Kinto two subgraphs induced by Uand VK

U. The capacity of a cut Uis the sum of

the bandwidths of the edges connecting the nodes in Uwith the nodes in VK

U. We

define the cross flux by

@GC

min

capacity

@JI

@KI

min

capacity

min

@ON

@OPRQ

We assume that α

@SC

Ω

, for each cluster K.

The following example illustrates that the cross flux is a good measure for the

bandwidth bottleneck of a cluster. We define the fully loaded random routing problem

2.3. Applications of the access tree strategy 47

as follows. Suppose b

@UT

2 messages arrive along each edge ethat connects cluster K

with another cluster, where b

denotes the bandwidth of e. For simplicity, we assume

thatb

@UT

2 isan integer. For each ofthe arrivingmessages, we choose a departure edge

eat random such that the message leaves the cluster along edge ewith probability

@VT

. Note that the expected number of messages arriving or leaving the cluster

via an edge eis b

, which corresponds to the maximum number of messages that can

pass the edge ein one time step.

Define the minimum cut ratio S of the fully loaded random routing problem to

be the minimum, over all cuts U, of the capacity of the cut divided by the expected

number of messages that have to pass the cut. If Sis smaller than 1 than there is at

least one cutUsuch that the expected number of messages cannot pass this cut in one

time step. The cross flux α

is equal to the minimum cut ratio Sof the fully loaded

random routing problem. Therefore, if α

@XW

1 then there is a bandwidth bottleneck

inside the cluster K.

We consider a distributed application or network computation in which the user

terminal nodes issue read and write requests. The shared objects can be placed on any

node of the network, but only the terminal user nodes are allowed to issue requests to

the shared objects. Network computations in which also the internal nodes issue re-

quests to the shared objects can be modeled by attaching a user terminal node to each

of the internal nodes. The two nodes are connected by an edge, which virtual band-

width corresponds to the maximum injection rate for requests issued by the original

node.

The access tree strategy for clustered networks works as follows. For each object

X, define the access tree Tx

to be a copy of the cluster-tree T

. Each interior

node iof Tx

is mapped at random to one of the nodes in the associated cluster K. In

particular, iis mapped with probability w

@UT

onto node v

VK. On each access

tree we execute the tree strategy.

The caching strategy requires a cluster initialization for the selection of the routing

paths inside the clusters. For this path selection, we solve a multicommodity flow

problem for each cluster K. The multicommodity flow problem is solved locally

for each cluster K, e.g., with the randomized approximation scheme introduced by

Leighton et al. in [LMP

95]. Let n

denote the number of nodes in the cluster and

the number of edges connecting these nodes. Then the initialization takes time

@ZI

log4n

@V@

, for each cluster K. Alternatively, the multicommodity

flow problem can be calculated deterministically and exactly by an algorithm based

on linear programming, which also can be done in time polynomial in the size of the

clusters. Note that, independent from the number of shared objects, the initialization

of a cluster needs to be done only once. A detailed description of the solution to the

routing problem is given in the proof for the following theorem.

Theorem 2.27 Consider an application running on the terminal user nodes of a clus-

tered network G of size n. Suppose the cross flux of each cluster is in Ω

. Let

δdenote the maximum number of nodes in a cluster that are adjacent to nodes in

48 Chapter 2. Caching without Memory Capacity Constraints

other clusters, and let γdenote the maximum degree in the cluster tree T

, i.e., γ

is the maximum number of tree nodes adjacent to any single tree node. Then the ac-

cess tree strategy achieves congestion O

logδ

Copt

@J[

logn

, w.h.p., where

Copt

denotes the optimal congestion for the application, and κdenotes the maxi-

mum number of write requests directed to the same object. If the clusters in G can be

represented by planar graphs or by constant genus graphs, then this result improves to

Copt

@\[

logn

, w.h.p.

Proof. Consider a cluster K. Let VKdenote the set of nodes in the cluster. Let α

denote the cross flux of K. Let δ

denote the number of nodes in Kthat are adjacent

tonodesin other clusters. Let γ

denote thedegree ofKin thecluster-tree. We define

the congestion of Kto be the maximum relative load of all edges adjacent to nodes in K

includingedges connecting nodes in Kwith nodes inother clusters. LetCopt

denote

the optimal congestion of K. We show, for cluster K, that the access tree strategy

achieves congestion O

logδ

@]I

Copt

@UT

max

@^P_[

@KI

logn

, w.h.p. If K

is planar or of constant genus, then this result improves to O

Copt

@UT

max

@^P`[

@`I

logn

, w.h.p. This yields the above theorem. Furthermore, it shows that the

influence of a “bad” cross flux is only small and local to the respective cluster.

In the following, we describe how the routing paths that simulate the edges of

the access trees are selected. According to this description, any message that is sent

between two nodes simulating adjacent access tree nodes chooses a routing path at

random. We assume that the path selection for each message is independent from the

selection of other messages.

First, we fix the course of the routing paths along the edges that connect different

clusters. Suppose K

is a cluster neighboring to K. Let uand u

be the cluster-tree

nodes that represent the access tree nodes corresponding to Kand K

, respectively. Let

eTbe the edge that connects the two clusters in the cluster-tree T

, and let e1

QVQUQ

denote the corresponding edges that connect Kand K

in G. We send a message that

traverses eTin T

along edge eiwith probability p

@DC

@UT

, where b

is the bandwidth of edge eiin G, and b

@bC

∑k

is the bandwidth of the edge

eTin T

. The following lemma gives an upper bound on the expected relative load

on the edges between different clusters.

Lemma 2.28 Let e be an edge of G that connects a node in cluster K with a node in

one of the neighboring clusters. Then the expected load on e is O

Copt

@U@

Proof. Let eTdenote the edge in T

that connects the nodes representing the two

adjacent clusters. Our strategy executes the strictly 3-competitive tree strategy on the

access trees. Thus, the relative load on eTis O

Copt

@V@

, and hence the absolute load

is O

Copt

@SI

@U@

. As a consequence, the expected absolute load for edge eis

Copt

@KI

@JI

@U@SC

Copt

@KI

@U@

. Therefore, the expected relative load on

eis at most O

Copt

@U@

Next we describe the path selection strategy for the routing inside the cluster K.

Consider the following multicommodity flow problem. Let V

CeM

QUQUQ

vδ

PhB

2.3. Applications of the access tree strategy 49

denote the set of nodes that are adjacent to nodes outside the cluster. We define δ

commodities

jwith 1

. The source of commodity

jis vi, its sink is

vj, and its demand is w

@`I

@UT

. We solve the commodity flow problem on

Kwhile respecting the capacities, i.e., the bandwidths of the edges in K.

We are going to use the solution to the multicommodity flow problem to help guide

us in selecting the routingpaths inside the clusters. For a solutionof the multicommod-

ity problem, define the throughput fraction q as the minimum, over all commodities,

of the fraction of the commodity’s demand that is met, that is, for each commodity

there is a flow of size q

@JI

@VT

from vito vj. An optimal solution maxi-

mizes the value of q. The following lemma relates the result of an optimal solution to

the cross flux α

Lemma 2.29 In the general case, there is a solution to the multicommodity with q

Ω

@lT

logδ

@V@

. If K is a planar graph or a constant genus graph then there is a

solution with q

Ω

@l@

Proof. First we consider the general case. Define the minimum cut ratio of a multi-

commodity flow problem to be the minimum, over all cuts, of the capacity of the cut

divided by the flow demand across the cut. In [AR98] it is shown that any multicom-

modity flow problem can be satisfied up to a factor q

Ω

logk

with Sdenoting

the minimum cut ratio and kdenoting the number of commodities. The minimum cut

ratio of our multicommodity flow problem is α

@VT

2. Note that the “load” induced

by the multicommodity flow problem on an edge is twice the load of the fully loaded

random routing problem, which has minimum cut ratio α

. As the number of com-

modities in our problem is δ

2, the maximum flow can be satisfied up to a factor

Ω

@UT

log

@U@V@

, which yields the result for general networks.

A better bound is known for the optimal throughput fraction of uniform multicom-

modity flow problems on planar graphs or graphs of constant genus. In a uniform

multicommodity flow problem there is a unit-demand commodity between every pair

of nodes. In this case, the optimal throughput fraction is shown to be in Ω

am@

, where

denotes the cut ratio of the uniform problem (see [KPR93]). We transform our

multicommodity flow problem into a uniform problem such that the cut ratio S

the uniform problem is in Ω

and the optimal throughput fraction q

for the uni-

form problem is in O

, where Sand qdenote the cut ratio and the optimal through-

put fraction of the original problem, respectively. This yields the desired result as

Ω

a@_C

Ω

a@bC

Ω

Interestingly, the bound on the optimal throughput fraction for uniform multicom-

modity problems is independent from the “size” of the respective problem. Therefore,

we can arbitrarily blow up our multicommodity flow problem in the transformation

without decreasing the optimal throughput fraction. Note that we do not aim to solve

the resulting multicommodity flow problem but only aim to show the existence of a

“good” throughput fraction.

A first transformation increases all weights such that each node, including the

nodes not in V

K, have an integral weight not smaller than 1. This is done as follows.

50 Chapter 2. Caching without Memory Capacity Constraints

We multiply all weights and capacities with a suitable large factor F

such that

each of the weights of the nodes in V

Kand each of the capacities of the edges are not

smaller than n2, where ndenotes the number of nodes in the cluster. Afterwards, the

weights of all nodes are rounded up to the next integer, and the weights of all nodes

with weight 0 in the original problem, i.e., the nodes not in V

K, are defined to have

weight 1 in the new problem. Let w

an?

denote the new weight of a node. We define

n2commodities

ioap?

with u

VT. The source of commodity

iqar?

is u, its sink

is v, and its demand is w

@KI

@UT

WwithW

@_C

∑v

VKw

Let S

denote the minimum cut ratio and q

the optimal throughput fraction of the

multicommodity flow problem after the transformation. As all weights and capacities

are multiplied with the same factor F

and the sum of the added demands is smaller

than n, which is a fraction of 1

nof the smallest capacity, S

and q

A second transformation yields a uniform problem in which each commodity has

demand 1

W. For each node vwith w

@Xs

1, we add w

@Jt

1 nodes, each of which

is connected by an edge of unbounded capacity with v. Next, the weights of all nodes

are defined to be 1. The resulting graph is still planar or of constant genus and has W

nodes. We define W2commodities, one for each tuple of nodes. The demand of each

commodity is defined to be 1

The multicommodity problem derived by the second transformation corresponds

to the previous problem in the following way: Consider two nodes uand vfrom the

original cluster graph. uand vare represented by a set of nodes V

and V

in the

new graph. The demand of the commodities that should be routed from V

to V

∑

uvE

∑

uvE

Cxw

@JI

which corresponds to the demand that should be routed between uand vin the previous

problem.

Therefore, the second transformation does not change the minimum cut ratio nor

the optimal throughput fraction. Hence, we have shown that the original problem can

be transformed into a uniform problem with minimum cut ratio S

and optimal

throughput fraction q

, which completes the proof of Lemma 2.29.

In the cluster initialization phase, we solve the maximum flow problem for the

cluster K. For each node u, for each edge eincident on u, and each commodity

let f

denote the size of the flow of commodity

jthat leaves node uacross

edge eaccording to the solution of the multicommodity flow problem. As a result of

the initialization, each node uholds a table including its respective f

values.

We use the solution of the multicommodity flow problem to choose the routing

paths in the cluster. Note that none of the request messages has its source and its

destination in the same cluster because messages are routed only between nodes that

simulate neighboring access tree nodes, and each cluster hosts at most one node of

each access tree. The hosts of the access tree nodes are chosen randomly from V

QUQlQ

vδ

, which is theset of nodes adjacent tonodes in otherclusters. Hence, any

2.3. Applications of the access tree strategy 51

message coming from outside of Karrives at a node from V

Kand has to be forwarded

to a node also from V

K. Also a message that leaves the cluster starts at a node from

Kand has to be routed through the cluster to a node from V

K, i.e., to that node which

is incident to the edge along which the message aims to leave the cluster. Hence, we

have to describe only how to choose the routing paths between pairs of nodes fromV

Consider a message that traverses or starts at a node u

VK. Let vi

Kand vj

denote the local source and the local destination of the message, respectively.

Let E

denote the set of edges that are incident to uand do not leave the cluster.

Then uchooses an edge from E

at random according to the following distribution:

eis selected with probability

∑e

The message is forwarded along the randomly selected edge. Intuitively, the message

follows the suggestions of the multicommodity flow.

The following lemma relates the expected relative load of the edges inside cluster

Kto the optimal congestionCopt

, which is defined to be the maximum relative load

over all edges incident on the nodes inVK, and hence includes also the relative load on

edges that leave the cluster. Note that the bound on the expected relative load given in

the lemma becomes o

Copt

@U@

if qis in ω

, which means that the bandwidth inside

the cluster is much higher than the bandwidth of the external links. In this case the

congestion is dominated by the load on the edges connecting different clusters.

Lemma 2.30 For each edge connecting two nodes of the cluster, the expected relative

load on e is at most O

Copt

@UT

, where q denotes the throughput fraction computed

by the multicommodity flow program.

Proof. The only difference between arriving and leaving messages is the direction

in which the messages travel along the randomly selected routing paths. As this is

irrelevant for the expected load on any edge, we only consider arriving messages.

For a data object x, let v

@yA

Kdenote the node that simulates the access tree

node of x. Any message for xthat arrives at cluster Kis directed to node v

. Recall

that v

is chosen randomly fromV

CzM

QUQUQ

vδ

. The probability that v

@{C

vj,

for 1

, is w

@UT

@_C

@UT

Suppose each node viinjects an expected number of q

messages into the

cluster and each message is directed to the host v

of some data object x. Then the

expected number of messages sent from node vito node vjis q

@`I

@VT

which corresponds to the amount of flow sent from vito vjin the solution for the

multicommodity flow problem. The random path selection inside the cluster is guided

by the solution for the multicommodity flow in such a way that the expected number

of messages traversing an edge is equivalent to the amount of flow passing the edge.

Hence, in our example, the expected load on an edge eis not larger than the capacity

of the edge, which is defined to be b

52 Chapter 2. Caching without Memory Capacity Constraints

Now, consider the messages arriving during the execution of the application. By

Lemma 2.28 the expected number of messages that arrive via an external edge e

con-

necting a node vifrom the cluster Kwith a node from a neighboring cluster K

Copt

@KI

a@U@

. As w

is equivalent to the sum of the bandwidths of the external

edges e

incident on vi, the expected number of messages injected via node viinto the

cluster is O

Copt

@ZI

@V@

. As a consequence, the expected relative load on each

edge is at most O

Copt

@VT

Combining the results of the Lemmata 2.28, 2.29, and 2.30 yields that the expected

load for each edge in Kis O

logδ

@|I

Copt

@UT

min

@UP}@

in the general case, and

Copt

@VT

min

@^P}@

, if Kis planar or of constant genus. In order to complete

the proof of Theorem 2.27, we have to show that the maximum relative load over all

edges in Kdoes not deviate too much from the expected load.

Consider edge eof the cluster K. Let L

denote the load on e. Then we have

to show that L

@XC

@lq[

@`I

logn

, w.h.p. Let Lx

denote the load on

edue to the accesses to data object x. Our tree strategy ensures that the load on each

edge of the access tree due to request messages for xis O

. Further, each edge e

is involved in the simulation of at most γ

different access tree edges. Therefore,

@DC

@KI

. This means that Lx

is the sum of O

@{I

not necessarily

independent 0-1-random variables, each of which representing the load due to one

request message. We add some dummy variables so that Lx

is the sum of exactly

@KI

random variables A1

@N

QUQUQ

Aτ

, for every x

X. Then

@C

∑

1Ai

@bC

∑

1∑

XAi

ThevariablesAi

inthesumSi

∑x

XAi

are independent, for 1

τ. Applying

a Chernoff bound(see, e.g., [HR90]) to thissumyields thatitsvaluedeviatesbyat most

an additive term of O

logn

from O

r@

, w.h.p., for 1

τ. Since L

@ZC

∑τ

1Si,

it follows L

@C

@lq[

logn

@XC

@l[

@{I

logn

, w.h.p., which

completes the proof of Theorem 2.27.

2.4 Extending the results to data-race free applications

An important class of applications that allows parallel and overlapping requests is the

class of data-race free applications, which is defined as follows. We assume that an

adversary specifies a parallel application running on the nodes of the network, i.e.,

the adversary initiates read and write requests at the nodes of the network. A write

request for an object is not allowed to overlap with other requests to the same object,

and there is some order among the requests to the same object such that, for each

read and write request, there is a unique least recent write. Note that this still allows

arbitrary concurrent requests to different objects and concurrent read requests to the

same object. An execution using a caching strategy is called consistent if it ensures

2.4. Extending the results to data-race free applications 53

that a read request directed to an object always returns the value of the most recent

write request to the same object.

We have to describe how parallel requests are handled by the presented strategy

such that the execution is consistent. On trees this works as follows. Since we con-

sider onlydata-race free applications, write requestsdo not overlapwith other requests.

Overlapping read requests are handled in the following way. Consider a request mes-

sage marriving on a node uthat does not hold a copy of x. Let edenote the next

edge on the path to the nearest copy. Suppose another request message m

directed to

xhas been sent already along ebut a data message has not yet been sent back. Then

the request message mis blocked on node uuntil the data message corresponding to

passes e. When this message arrives, either a new copy is created on node u, and

userves the request message m, or mcontinues its path to the connected component

of copies. As the other networks execute the tree strategy on access trees that are

embedded in the network, they can follow the same approach.

In data-race free applications, the following problem can occur during the remap-

ping of copies: remapping or notification messages overlap with access messages or

other remapping or notification messages. For example, an access message arrives

at an old host of an access tree node but the tree node has been remapped while the

access message was in transit. We solve this problem as follows. All messages are

acknowledged such that we can ensure that at most two messages are pending between

two access tree nodes. If a message arrives at an abandoned host the sender is in-

formed about this and resends the message to the new host. We define that neither the

acknowledgment messages nor the overlapping messages influence the counters. Ob-

viously, these messages have no influence on our asymptotic bounds as each of them

corresponds to another message that we take into account.

We can conclude that all results given in this chapter hold also for data-race free

applications, which indicates that the introduced strategies are well suited for practical

usage.

Chapter 3

Caching with Memory Capacity

Constraints

In this chapter, we investigate caching strategies for networks with memory capacity

constraints in a non-uniform cost model. The remainder of the chapter is organized as

follows. In Section 3.1, the general framework for the development of caching strate-

gies with memory capacity constraints is introduced. This framework is based on the

access tree strategy. In Section 3.2, a caching strategy for trees with memory capacity

constraints is presented. This strategy is used as a subroutine in the general framework.

In Section 3.3, the general framework is applied to several classes of networks, includ-

ing meshes, fat-trees, complete networks, Cayley networks, edge symmetric networks,

and hypercubic networks. Further, we present the universal caching strategy working

on arbitrary networks for which an oblivious routing strategy is given.

3.1 The general framework

In chapter 2, we have presented the access tree strategy that is a caching strategy ne-

glecting memory capacity constraints. The general framework is an extension of the

access tree strategy to memory capacity constraints.

Basically, the access tree strategy simulates a simple 3-competitive caching strat-

egy for a tree T

on the original network G. This tree T

is a tree whose leaf

nodes correspond to the nodes of the network G. The analysis of the access tree strat-

egy uses a bi-simulation between the network Gand the tree T

. At first, it is shown

that the tree T

can simulate the network G. In this simulation, each leaf node of

issues the same read and write requests as its counterpart in G. It is shown that

any application that produces congestion Cwhen it is executed on Gcan be executed

on T

with congestion C, too. At second, it is shown that the network Gcan simu-

late the tree T

in such a way that the congestion in Gis only c3times larger than

the congestion in the tree T

56 Chapter 3. Caching with Memory Capacity Constraints

From the results on the congestion of these two simulations we can obtain the

competitive ratio of the access tree strategy. The simulation of Gby T

looses a

factor of at most c1

1; the on-line strategy looses a factor of c2

3 against the

optimal off-line strategy on T

; finally, the simulation of T

by Glooses a factor

of at most c3. Altogether, we achieve a competitive ratio of c1

The problem with the adaption of this approach to the case of memory capacity

constraints is the lack of an efficient caching strategy for trees with memory capacity

constraints. It is also not clear how the access trees should look like. One possibility

is to allow that only the leaf nodes, which correspond to the nodes in G, have memory

modules. Another possibility is that all nodes have memory modules which then have

to be simulated by the nodes of the network G. For neither of these alternatives, how-

ever, we know an efficient caching strategy, and we will see that we do not need such

a strategy.

Define the bandwidth tree Tb

to be a tree of maximum height h, which leaf

nodes correspond to the nodes in G. Each of the leaf nodes in Tb

has a memory

module of the same size as the respective node in G. The inner nodes do not have

memory modules. Define the memory tree Tm

to be a tree that is isomorph to

. Each of the leaf nodes in Tm

has a memory module that is ktimes larger

than the one of its counterpart in Tb

. Each non-leaf node in Tm

, except for the

root, has a memory module, which capacity is equal to the sum of the capacities of

its children. The memory capacity of the root is equal to the sum of the capacities

of all nodes in G. Thus, it is guaranteed that the root node can hold all data objects

simultaneously. The exact topology of these trees depends on the network Gand will

be defined later.

Suppose the bandwidth tree Tb

can simulate the optimal off-line strategy run-

ning on Gwhile increasing the congestion only by a factor of c1; suppose the on-line

strategy on Tm

produces only a congestion that is a factor of c2larger than the con-

gestion of the optimal off-line strategy on Tb

; and suppose the on-line strategy on

can be simulated on Gso that the congestion in Gis only a factor of c3larger

than the congestion in Tm

while each node in Grequires a memory capacity that

is htimes the memory capacity of the corresponding leaf node in Tm

. Then the

resulting strategy is

[

-competitive.

The major challenge in the construction of the general framework is the analysis of

the factor between the on-line strategy running on Tm

and the optimal off-line strat-

egy running on Tb

. We use a strategy for single-computer caching as a subroutine

in the on-line strategy on Tm

. We will show that the factor between the on-line and

the optimal off-line strategy depends on the competitive ratio of the single-computer

caching strategy.

Such a result as itself does not seem to be very interesting since it considers the

congestion on two different networks. However, we will illuminate its significance in

several examples that describe how to apply the general framework to several classes

of networks.

3.2. The caching strategy for the memory tree 57

3.2 The caching strategy for the memory tree

In this section, we present a caching strategy for the memory tree Tm. This strategy is

used as a subroutine in the general framework.

The memory tree Tmis a virtual network whose topology is an arbitrary rooted

tree. Each node in Tmhas a memory module. The capacity of the modules at the leaf

nodes is arbitrary. The capacity of a module at an inner node of the tree is defined to

be the sum of the capacities of its children. The edges are allowed to have arbitrary

bandwidths. Initially, each data object is stored in one of the leaf nodes.

First, we present a caching strategy for the memory tree in a uniform cost model,

i.e. D

1. Then, we generalize this strategy to a non-uniform cost model.

3.2.1 Uniform model

In the uniform cost model, i.e. D

1, we assume that each object fits into one routing

packet such that each migration of a copy along an edge increases the load of this edge

by 1. Recall that each read or write request increases the load of each edge involved

in this request by 1, and that each information message increases the load of each

traversed edge by 1.

The caching strategy

An adversaryspecifies read and writerequests thatoccur solelyat the leafnodes. These

requests have to be served by the caching strategy. This strategy may create copies on

all nodes and may delete these copies afterwards in order to reduce the congestion.

Essentially, many copies mean cheap read requests but expensive write requests. Even

in case of only read requests, however, the considered caching problem is not trivial

as the capacities of the memory modules usually do not allow to give a copy of every

object to every node.

The caching strategy for the memory tree consists of the following sub-strategies.



Basic strategy: This strategy manages the shared objects as if the memory mod-

ules in the tree would have unlimited capacities. It decides on which nodes to

place copies so that the costs for read and write requests are balanced. It simply

neglects the memory capacity constraints.



Local strategy: This strategy corresponds to a caching strategy for single-

computer systems. It runs on every node of the tree except for the root, which

is sufficiently large to hold copies of all objects simultaneously, by definition.

The local strategy decides which object in a memory module is ejected when

the basic strategy suggests to insert a new object.



Rescue-and-help strategy: This strategy preserves at least one copy of a data

object at any time. Besides it helps the basic strategy when it wants to access a

copy that has been ejected by the local strategy.

58 Chapter 3. Caching with Memory Capacity Constraints

The coordination between these three strategies is not an easy task. For this pur-

pose, our caching strategy uses dummy copies, that is, in addition to the copies created

by the basic strategy, the algorithmcreates copies that are justplaceholders so that they

cannot be accessed in case of a read request and need not to be updated in case of a

write request. Before we explain this mechanism in detail, we turn to the basic and the

local strategy.

Nearly any known strategy for trees without memory capacity constraints can be

used as basic strategy. The only restrictionis that the strategy has toensure that, for any

object x, the nodes holding a copy of xbuild a connected component in the memory

tree Tm. The overall competitive ratio of our algorithm depends on the competitive

ratio of the basic strategy used.

In order to keep the description and the analysis of the overall strategy as simple as

possible, we refer to a concrete example of a basic strategy in the following. We use

the tree strategy in the uniform model introduced in Section 2.2.1: The nodes holding

a copy of an object xbuild a connected component. Suppose node uissues a read or

write request to object x. Let vdenote the node of the connected component that is

closest to u. Reads are served by establishing the unique shortest path from uto v.

Writes are served by establishing the minimal multicast tree from uto all nodes in the

connected component. The connected component is changed in the following way. In

case of a write all copies in the connected component, except the one on node u, are

deleted. In both the read and the write case, we create new copies on the path from uto

v. In Theorem 2.1 it is shown that this simple caching strategy does not only minimize

the congestion, it minimizes the load on any individual edge in the tree up to a factor

of 3. Note that this result is optimal because of the lower bound shown by Black and

Sleator [BS89].

When the basic strategy wants to store a copy of an object at a node, it calls the

local strategy that decides which copy of another object is ejected eventually. For the

local strategy, we can choose any caching strategy for single-computers, e.g., LRU or

RANDOM (see, e.g., [FKL

91, ST85]). The competitiveness of the overall strategy

depends on the competitiveness of the caching strategy used. In the following we

assume that the local strategy is

Nnio@

-competitive. For example, LRU is known to be

-competitive [ST85].

The description so far leaves us with the following two problems.



How to preserve at least one copy for each object?



How to realize accesses to virtual copies, i.e., copies that have been ejected by

the local strategy?

The rescue-and-help strategy is an extension to the basic strategy that solves these

problems by using dummy copies. The dummy copies of an object xare used as

placeholders in order to ensure that always one copy of xcan be preserved without

having to eject any other copy or dummy copy of another object for that purpose.

Whenever the basic strategy invalidates copies in case of a write, we do not delete

3.2. The caching strategy for the memory tree 59

them but change these copies to dummy copies instead. We assume that the local

strategy does not take notice of this change. It treats these copies as usual copies, and

may eject them later. As described before, each data object xhas only one initial copy,

and this copy is stored at a leaf node v

. We add dummy copies of xto all inner

nodes on the path leading from v

to the root. Note that the memory capacities of

all inner nodes are large enough to hold all these initial dummy copies. Adding these

initial dummy copies does not induce any cost since the inner nodes know the initial

placement of the copies.

The rescue-and-help mechanism works as follows. At any time during the execu-

tion, let V

denote the set of nodes that are included in the connected component of

nodes that should hold a copy of xaccording to the basic strategy. Sometimes, some of

the nodes inV

may not hold a copy of xbecause it was ejected by the local strategy.

Let top

denote the unique node in V

that is closest to the root. The path from

top

to the root is called the rescue path of x. Let c

denote the (virtual) copy of x

which the basic strategy assumes to be stored on top

. The rescue-and-help strategy

always keeps one copy c

of x. Initially, c

@ZC

is stored on top

. Whenever

d?

is ejected because of the insertion of another object, c

d}?

is migrated upwards

along the rescue path until it reaches a node that holds a dummy copy of x. Here it can

be stored without having to eject any other object. At any time during the execution,

the following invariant is maintained: c

is stored on a node top

on the rescue

path.

The existence of c

d?

allows us to define an assistance node a

for every node

. The assistance node a

is that node on the path from vto the root that

is closest to vand holds a copy of x. Whenever the basic strategy wants to access a

virtual copy of xon node vthe rescue-and-help strategy uses the copy of the assistance

node a

instead. That is, any path or multicast tree established by the basic strategy

between a node uthat issued a read or write request and the respective node(s) inV

is redirected such that it connects uwith the corresponding assistance node(s) instead

of the node(s) in V

The migration of copies is handled in the following way. Let vdenote the node

fromV

that is closest to the node u. When the basic strategy establishes a path from

vto uand migrates copies along that path, the rescue-and-help strategy simply estab-

lishes a path from a

to u, and migrates copies from a

to u. We distinguish

three cases.

1. Suppose a

@C

top

. Figure 3.1 illustrates this situation. Then, the rescue-

and-help strategy creates a copy on each node on the path from a

to u.

Note that in this case all migrations follow a path that leads downwards, i.e., in

direction to the leaves.

2. Suppose a

@C

top

dq?

, and the path from top

d?

to uis going downwards

only. Figure 3.1 illustrates this situation. In this case, the path includes top

The rescue-and-help strategy migrates c

to top

, and creates a dummy

copy on each node on the migration path from top

dq?

to top

, excluding

60 Chapter 3. Caching with Memory Capacity Constraints











top



top



root



Figure 3.1: Case 1 and 2: a

lies on the path from vto top

d?

top

. Then a copy on each node on the path from top

to uis created. The

migration of c

d?

to top

effectively re-unions c

dq?

and c

3. Suppose a

@C

top

, and the path from top

to uis going first upwards

to a node wand then downwards to u. Figure 3.2 illustrates this situation. In this

case, the rescue-and-help strategy first migrates c

d}?

to w, thereby replacing

all dummy copies on the path from top

to wby real copies, excluding w.

Starting from node w, which now holds c

d?

, a copy on each node on the path

from wto uis created, which re-unions c

d}?

and c

as wis the new top

node after serving the request.

We assume that almost all accesses to copies and all insertions of copies are done

by using the local strategy on the respective node. There is only one exception: In case

3 the copyon the node a

@`C

top

d?

is accessed directly, that is, the access does not

influence the local strategy. When using a memory-less local strategy, i.e., a strategy

that ejects copies independent of previous accesses, like RANDOM, we do not have to

care about this topic. In this way, the local strategy on each node vis influenced only

by requests issued in the subtree rooted at v.

It remains todescribe howour caching strategyon the memorytree can be executed

in the distributed setting in which all decisions are based solely on local information.

The tree strategy in the uniform model introduced in Section 2.2.1 is used as basic

strategy. This tree strategy can be executed in a distributed fashion. Then, it is obvious

that the rescue-and-help strategy can also be executed in the distributed setting. This

completes the description of the caching strategy on the memory tree Tm.

Let e

C?

denote an arbitrary edge in the tree so that vis closer to the root than

u. We can observe, that the load on edge ecorresponds to the load induced by the basic

strategy plus some load L1

induced by serving requests that are redirected to assis-

tant nodes plus some load L2

induced by upward migrations of c

-copies. Note that

3.2. The caching strategy for the memory tree 61











top



top

d?



root

Figure 3.2: Case 3: a

@SC

top

d?

is increased by 2 due to serving a request that is redirected to an assistant node,

since first the request is served and thereafter the residence set is modified. Whenever

or L2

are increased due to an object x, then some copy or dummy copy of x

has been ejected from node urecently, i.e., after the last time when these values have

been increased due to object x. Let eject

denote the number of ejections from node

u. Then we can conclude the following observation.

Observation 3.1 For each edge e of Tm, L1

@Dk

eject

and L2

@k

eject

The analysis of the caching strategy

We compare the load of the caching strategy on the memory tree Tmto the optimal

load on the corresponding bandwidth tree Tb. This tree Tbhas the same topology as

Tm, the same bandwidths on all edges, but different memory capacities. The inner

nodes on Tbdo not have any memory modules, i.e., memory capacity null, and the

memory modules at the leaves are smaller than the corresponding memory modules

of Tm. These capacities are chosen in dependence on the choice for the local strategy

on Tm: Suppose the single-computer caching strategy used as local strategy is

Nnio@

competitive. Then we define that the capacity of a leaf node in Tmis ktimes the

capacity of the corresponding leaf node in Tb. We assume that the capacities of Tbare

large enough so that all objects can be stored in the network. This assumption allows

us to decrease the capacity of the root of the memory tree Tmby a factor of kas it is

sufficient if the root can store all data objects simultaneously. Hence, the ratio between

the total memory capacity of Tmand the total memory capacity of Tbis k

[

1, where

hdenotes the height of Tmand Tb.

The following lemma compares the load induced by our caching strategy running

on Tmwith the load of an optimal off-line strategy running on Tb, both of them ap-

plied to the same, arbitrary sequence of requests issued by the leaf nodes of the cor-

62 Chapter 3. Caching with Memory Capacity Constraints

responding tree. The single-computer caching strategy used as local strategy may be

randomized. In this case, the competitive ratio

gives only an upper bound on the

expected load of the single-computer caching strategy, and the following lemma gives

only an upper bound on the expected load of the strategy on Tm. The lemma shows

that our caching strategy minimizes the load on all edges simultaneously. Hence, the

congestion produced by the on-line strategy executed on Tmis at most 3

[

times the

congestion produced by an optimal off-line strategy executed on Tb.

Lemma 3.2 Let eband emdenote a pair of corresponding edges in Tband Tm, respec-

tively. Then the (expected) load on emis at most 3

[

times the optimal load on eb,

where

denotes the competitive ratio of the local strategy.

Proof. Unfortunately, the behavior of an optimal off-line strategy on the bandwidth

tree Tbis difficult to understand. In order to get a more obvious behavior, we allow

the optimal strategy to place copies on internal nodes, too. Let T

bdenote a tree that is

isomorph to Tb, whose leaf nodes have the same memory capacity as the corresponding

leaf nodes in Tb, and whose inner nodes have a memory capacity that is equal to the

sum of the capacities of their respective children. We restrict, however, the placement

ofcopiesonT

bin thefollowing way: Fora node v, let T

denotethesubtreerooted at

node v. Anytime, for each node v, the set of objects that are stored in a memory module

in the subtree T

, must be a subset of the objects that have a copy or dummy copy

in the memory module of v. Recall that dummy copies are placeholders that cannot be

used to serve a read request and need not to be updated in case of a write request. We

assume that these copies can be created from scratch, that is, they do not require any

kind of migration. The following lemma shows that, despite of this restrictions, T

bis

more powerful than Tb, so that any lower bound on the load on the edges of T

bholds

also for Tb.

Lemma 3.3 The execution of any off-line caching strategy serving an arbitrary se-

quence of requests on Tb, can be simulated on T

bwithout increasing the load on any

edge.

Proof. We simulate the strategyfor TbonT

b. The managementof the memory modules

of the leaf nodes and the routing between these nodes, i.e., the selection of the paths

and multicast trees, is done in T

bexactly as in Tb. At any time during the simulation,

an inner node vof T

bholds a dummy copy of all those objects that are stored in the

memorymodulesinT

. InthiswaytherestrictionsonT

bare satisfied. Since dummy

copies can be created from scratch, this simulation does not require to increase the load

on any edge.

In contrast to the bandwidth tree Tb, the behavior of an optimal off-line strategy

on T

bis relatively clear. W.l.o.g., any optimal off-line strategy fulfills the following

properties for each node vin T

3.2. The caching strategy for the memory tree 63



If a leaf node in the subtree T

issues a read or write request to object x, then

the only possible modification of the memory module of vis that a new copy of

xis stored in this module and that copies of other objects are ejected out of this

module.



If a leaf node outside the subtree T

issues a read or write request to object x,

then the only possible modification of the memory module of vis that copies of

objects are ejected out of this module.

Any optimal off-line strategy that does not possess these two properties can be easily

changed so that the properties are fulfilled without increasing the load on any edge.

Now, fix a sequence σ

σ1σ2

IUIUI

of requests issued by the leaf nodes of the corre-

sponding tree. Let em

C?

denote an arbitrary edge of Tmand let e

C?

denote the corresponding edge of T

b. We assume that vmand v

bare closer to the root

than umand u

b, respectively. Further, let us fix an optimal off-line strategy on T

which is denoted the optimal strategy in the following. Let L

denote the minimum

load of the optimal strategy on the edge e

b. The basic strategy on Tmproduces at most

load 3

on the edge em. Therefore, it remains to prove that the additional load

due to the rescue-and-help strategy, i.e., L1

@[

, is not larger than 3

iXI

(on expectation).

First, we consider the on-line strategy on Tm. From Observation 3.1 we can con-

clude that L1

@K[

@k

eject

. Let ρdenote the subsequence of σthat

includes all requests issued in T

and that are noticed by the local strategy on um,

i.e., ρincludes any request influencing the behavior of the local strategy on um. Let

opt

denote the minimum load of an optimal off-line single-computer caching strat-

egy running on ρand having a memory module whose capacity is ktimes smaller than

the one of um. Thus, the memory capacity of the single-computer caching strategy

corresponds to the capacity of u

b. As the local strategy is

Nni@

-competitive, we can

conclude E

@\[

@]k

eject

@l]k

iI

opt

Next we consider the optimal strategy on T

b. Any request in ρissued to an object

xincreases the load on the edge e

bif the memory module of u

bdoes not include a

copy of x. Hence, the load on e

bis larger or equal to the load of an optimal off-line

single-computer caching strategy running on ρand having a memory module of the

same capacity as u

b. As a consequence,

@D

opt

These two bounds give us the required result, i.e., E

@[

@lk

i_I

Therefore, the expected load on emis at most 3

[

times the optimal load on e

band

hence at most 3

[

times the optimal load on eb, too. This completes the proof of

Lemma 3.2.

64 Chapter 3. Caching with Memory Capacity Constraints

3.2.2 Non-uniform model

In thissection, we generalize thecaching strategyfor thememory tree toa non-uniform

cost model. Now, we describe how the three sub-strategies, the basic strategy, the

local strategy, and the rescue-and-help strategy, have to be adapted to the non-uniform

model.

For the basic strategy, we can use nearly any known strategy for trees without

memory capacity constraints in the non-uniform cost model. The only restriction is

that the strategy has to ensure that, for any object x, the nodes holding a copy of x

build a connected component in the memory tree. The overall competitive ratio of our

algorithm depends on the competitive ratio of the used basic strategy. In order to keep

the description and the analysis of the overall strategy as simple as possible, we refer

to a concrete example of a basic strategy in the following. We use the tree strategy

introduced in Section 2.2.2. In Theorem 2.5 it is shown that this caching strategy does

not only minimize the congestion, it minimizes the load on any individual edge in the

tree up to a factor of 3. Note that this result is optimal because of the lower bound

shown by Black and Sleator [BS89].

For the local strategy, any caching strategy for single-computers, e.g., LRU or

RANDOM (see, e.g., [FKL

91, ST85]), can be used. The overall competitive ratio of

our algorithm depends on the used caching strategy. In order to keep the description

and the analysis of the overall strategy as simple as possible, we refer to a concrete

example of a local strategy in the following. We use the RANDOM caching strategy

for single-computers. This strategy works as follows. When a data object should

be loaded into a memory module, one of the slots in the memory module is selected

uniformly at random. The object that may be stored in that slot is ejected.

The rescue-and-help strategy presented in Section 3.2.1 has to be adapted as fol-

lows to the non-uniform cost model. Suppose a leaf node uhad issued a read or write

request for object x. Then, the migration of copies is handled in the following way. Let

vdenote the node fromV

that is closest to the node u. The basic strategy establishes

a path pfrom vto uand may migrate copies along that path. Suppose that the basic

strategy creates new copies on each node on the sub-path of pfrom vto v

, excluding

v. Note that, possibly, v

. Then the rescue-and-help strategy simply establishes a

path from a

to u, and migrates possibly copies along that path. We distinguish

three cases.

1. Suppose a

@C

top

dq?

. Figure 3.1 on Page 60 illustrates this situation. Then,

the rescue-and-help strategy creates with probability 1

Da copy on each node

on the path from a

to v

. Note that in this case all migrations follow a path

that leads downwards, i.e., in direction of the leaves.

2. Suppose a

@XC

top

, and the path from top

to uis going downwards

only. Figure 3.1 on Page 60 illustrates this situation. In this case, the path

includes top

. With probability 1

Dthe rescue-and-help strategy does the

following. c

d}?

is migrated to top

, and dummy copies are added to each

3.2. The caching strategy for the memory tree 65

node on the migration path from top

d?

to top

, excluding top

. Then a

copy on each node on the path from top

to v

is created. The migration of

to top

effectively re-unions c

and c

3. Suppose a

@C

top

, and the path from top

to uis going first upwards

to a node wand then downwards to u. Figure 3.2 on Page 61 illustrates this

situation. In this case, the rescue-and-help strategy first migrates c

d?

to w,

thereby replacing all dummy copies on the path from top

dq?

to wby real copies,

excluding w. Starting from node w, which now holds c

d?

, a copy on each node

on the path from wto v

is created, which re-unions c

d?

and c

as wis the

new top

node after serving the request.

It remains todescribe howour caching strategyon the memorytree can be executed

in the distributed setting in which all decisions are based solely on local information.

The tree strategyintroduced in Section 2.2.2 is used as basic strategy. This tree strategy

can be executed in a distributed fashion. Then it is obvious that the rescue-and-help

strategy can also be executed in the distributed setting. This completes the description

of the caching strategy on the memory tree Tm.

As in the previous section, we compare the load of the caching strategy on the

memory tree Tmto the optimal load on the corresponding bandwidth tree Tb. Recall

that theratio between the totalmemory capacity ofTmand the total memory capacity of

Tbis k

[

1, where hdenotes the height of Tmand Tb. The following lemma compares

the load induced by our caching strategy running on Tmwith the load of an optimal

off-line strategy running on Tb, both of them applied to the same, arbitrary sequence

of requests issued by the leaf nodes of the corresponding tree. The lemma shows

that our caching strategy minimizes the load on all edges simultaneously. Hence, the

congestion produced by the on-line strategy executed on Tmis at most 3

[

T]?

times the congestion produced by an optimal off-line strategy executed on Tb.

Lemma 3.4 Let eband emdenote a pair of corresponding edges in Tband Tm, respec-

tively. Then the expected load on emis at most 3

[

T]?

times the optimal load

on eb.

Proof. Unfortunately, the behavior of an optimal off-line strategy on the bandwidth

tree Tbis difficult to understand. In order to get a more obvious behavior, we allow

the optimal strategy to place copies on internal nodes, too. Let T

bdenote a tree that is

isomorph to Tb, whose leaf nodes have the same memory capacity as the corresponding

leaf nodes in Tb, and whose inner nodes have a memory capacity that is equal to the

sum of the capacities of their respective children. We restrict, however, the placement

ofcopiesonT

bin thefollowing way: Fora node v, let T

denotethesubtreerooted at

node v. Anytime, for each node v, the set of objects that are stored in a memory module

in the subtree T

, must be a subset of the objects that have a copy or dummy copy

in the memory module of v. Recall that dummy copies are placeholders that cannot

be used to serve a read request and need not to be updated in case of a write request.

66 Chapter 3. Caching with Memory Capacity Constraints

We assume that these copies can be created from scratch, that is, they do not require

any kind of migration. Lemma 3.3 shows that, despite of this restrictions, T

bis more

powerful than Tb, so that any lower bound on the load on the edges of T

bholds also for

Tb.In contrast to the bandwidth tree Tb, the behavior of an optimal off-line strategy

on T

bis relatively clear. W.l.o.g., any optimal off-line strategy fulfills the following

properties for each node vin T



If a leaf node in the subtree T

issues a read or write request to object x, then

the only possible modification of the memory module of vis that a new copy of

xis stored in this module and that copies of other objects are ejected out of this

module.



If a leaf node outside the subtree T

issues a read or write request to object x,

then the only possible modification of the memory module of vis that copies of

objects are ejected out of this module.

Any optimal off-line strategy that does not possess these two properties can be easily

changed so that the properties are fulfilled without increasing the load on any edge.

Now, fix a sequence σ

σ1σ2

IUIUI

of requests issued by the leaf nodes of the corre-

sponding tree. Let em

C?

denote an arbitrary edge of Tmand let e

C?

denote the corresponding edge of T

b. We assume that vmand v

bare closer to the root

than umand u

b, respectively. Further, let us fix an optimal off-line strategy on T

which is denoted the optimal strategy in the following. Let Lopt denote the minimum

load of the optimal strategy on edge e

b. The basic strategy on Tmproduces at most load

Lopt on edge em. Therefore, it remains to prove that the additional load due to the

rescue-and-help strategy on edge emis not larger than 3k

T]?

@]I

Lopt on expectation.

We use a potential function argument (see, e.g., [ST85]). Let Lr+h

denote the

additional load due to the rescue-and-help strategy on edge em, and let Lopt

denote

the minimum load of the optimal strategy on edge e

b, after serving σt. Let Mon

and Mopt

denote the set of objects stored in the memory module of umand u

respectively, after serving σt. We define

@{C

T]?

@KI

Mopt

Mon

In order to prove the lemma, we show the invariant

Lr+h

@[

@k

T]?

@JI

Lopt

This invariant can be shown by an induction on the length of σ. Obviously, this

invariant holds initially. Assume that E

Lr+h

@J[

@Dk

T]?

@SI

Lopt

. We

have to show that E

Lr+h

[

@K[

[

@_k

T]?

@_I

Lopt

[

. Let ∆Lr+h

Lr+h

[

@Jt

Lr+h

,∆Lopt

Lopt

[

@Jt

Lopt

, and ∆Φ

[

@Jt

We distinguish between requests issued inside or outside the subtree Tm



Suppose σt

1is a read or write request for object xissued inside the subtree

. In this case, Table 3.1 contains all possible changes of configuration.

3.3. Applications of the general framework 67

Mon

Mopt

[

∆Lr+h

Kk

∆Φ

]k

∆Lopt



yes no no 0 0 1

yes no yes 0 0 D

yes yes yes 0 0 0

no no no 3 1

T]?

@I

no no yes 3

[

@UT

@KI

T]?

@I

no yes yes 3 1

?t

[

@JI

T]?

@I

Table 3.1: Possible changes of configuration if σt

1is a read or write request for object

xissued inside the subtree Tm

Note that in this case E

∆Lr+h

C

0, if x

Mon

, and E

∆Lr+h

]k

[

3, otherwise. In addition, notethat theprobability thatRANDOMejects an object

being in Mopt

Mopt

capacity

Mon

@U@bk

because the capacity of RANDOM is ktimes the capacity of the optimal off-line

algorithm. The table implies that E

∆Lr+h

[

∆Φ

Kk

T]?

@JI

∆Lopt.



Suppose σt

1is a read or write request for object xissued outside the subtree

. In this case, E

∆Lr+h

_C

0 and E

∆Φ

Dk

0. Thus, E

∆Lr+h

[

∆Φ

Xk

T]?

@]I

∆Lopt.

Hence, E

Lr+h

[

@S[

[

@lk

T]?

@DI

Lopt

[

, which implies the

lemma.

3.3 Applications of the general framework

The general framework can be applied to several classes of networks. Given any net-

work G, the bandwidth tree Tb

and the memory tree Tm

can be constructed by a

hierarchical decomposition of G. The quality of the resulting caching strategy depends

on some properties of this decompositions and especially on the height of Tb

and

. Although it is not clear which properties can be obtained for general networks,

applying the framework to almost any standard network yields interesting results. In

this section we give some examples, including meshes, fat-trees, complete networks,

Cayley networks, edge symmetric networks, and hypercubic networks. Further, we

present the universal caching strategy working on arbitrary networks for which an

oblivious routing strategy is given.

68 Chapter 3. Caching with Memory Capacity Constraints

3.3.1 Meshes

In this section, we consider caching strategies for the mesh M

QVQUQ

, i.e.,

the d-dimensional mesh-connected network with side length mi



2 in dimension i.

The number of nodes is denoted by n, i.e., n

IUIUI

md, the number of edges is

∑d

1m1

IUIUI



Iq?

@{I

IUIUI

. Each node has a memory module

of uniform capacity. Each edge has bandwidth 1. Thus, the relative and absolute load

of an edge are identical.

The general framework is based on the access tree strategy. Thus, we adopt the

hierarchical decomposition of Mand theassociated decomposition tree T

of height

logn

from Section 2.3.1.

Since we obtain, for meshes, a tradeoff between the height of the decomposition

tree and the achieved congestion, we need a decomposition tree of arbitrary height

logn. Thus, some levels of T

have to be deleted possibly. This is done

with the following deletion algorithm. While height

@V@s

h, delete a level 0

iW

height

@U@

in T

having minimum degree, where the degree of level

is the

maximum degree over all nodes on level

. Deleting level

means that, first, all nodes

and edges on level

are deleted, then the edges on level

i[

1 are connected to the

respective nodes on level

it

1, and finally the numbering of the levels is adjusted.

Now, the decomposition tree T

has height hand degree O



. Note that

the root of T

corresponds still to Mitself, and the children of a node vin the tree

correspond to the submeshes into which the submesh corresponding to vis divided,

and the leaves correspond to subtrees of size one, i.e., to the nodes of M.

Define the bandwidth tree Tb

and the memory tree Tm

to be a copy of the

decomposition tree T

. Each of the leaf nodes in Tb

has a memory module

of the same capacity as the respective node in M. The inner nodes in Tb

do not

have memory modules. Each of the leaf nodes in Tm

has a memory module whose

capacity is 2 times the capacity of the respective node in M. Furthermore, each non-

leaf node in Tm

, except for the root, is assumed to have a memory module whose

capacity is equal to the sum of the capacities of its children. The memory capacity of

the root in Tm

is equal to the sum of the capacities of all mesh nodes. Thus, it is

guaranteed that the root node can hold all data objects simultaneously.

Our caching strategy for meshes simulates the strategy for the memory tree Tm

on M. The virtual memory module of an inner node vof the memory tree Tm

simulated jointly by all the nodes in the submesh M

. The assignment of the slots in

the memory module of vto the nodes in the submesh M

is done in random fashion

as follows. We permute uniformly at random all slots in the memory module of v.

Then the memory module is partitioned into equally sized, contiguous parts, one for

each of the nodes in M

. If vholds a copy of object x, let s

denote the memory

slot in the memory module of vholding the copy of x. Summing over all levels in the

memory tree Tm

, we obtain that the on-line strategy needs, for each node of M,

[

1 times the memory capacities of the off-line strategy.

3.3. Applications of the general framework 69

Now, we have to describe, how, in case of an access, a copy of an object is located

in a submesh. For each object x, and each node vof the memory tree Tm

, we

add a signpost p

. If a copy of xis stored in the memory module of v,p

points to the mesh node in M

holding the memory slot s

, and vice versa. In

addition, p

points to each signpost p

, with uis adjacent to vin the memory

tree Tm

. We embed the signposts randomly into M, i.e., for each object x, and each

node vof Tm

is mapped uniformly at random to one of the nodes in M

Note that each signpost contains no data of an object. Thus, the memory requirements

of the signposts can be neglected.

The remaining description of our caching strategy is simple: Suppose a message m,

concerning an object x, that is sent from a node vto an adjacent node uin the memory

tree Tm

. In addition, suppose that vand uhold a copy of x. In the mesh Mthis

message mis sent along the dimension-by-dimension order path from the mesh node

holding the memory slot s

to the mesh node holding the signpost p

, then to

the mesh node holding the signpost p

, and finally to the mesh node holding the

slot s

. The dimension-by-dimension order path between two nodes is the unique

shortest path between the two nodes using first edges of dimension 1, then edges of

dimension 2, and so on. If vor uholds no copy of x, the path in the mesh Mis

appropriately shortened.

Finally, we require the general remapping scheme for frequently accessed mem-

ory slots and signposts. In case of the remapping of a memory slot s

swapped with a uniform at random chosen memory slot in the submesh M

. For

more details see Section 2.3.1.

Note that the distributed execution of the local strategy based on LRU is a major

problem since the virtual memory modules of the memory tree may be distributed

among several nodes in the mesh. Therefore, we prefer to use RANDOM instead of

LRU.

The following theorem shows that, for each 1

logn, we obtain a caching

strategy for the d-dimensional mesh Mwith nnodes that is

[



@U@

competitive, w.h.p. Further, setting h

logn, we obtain a caching strategy that is

strictly

2logn

[

logn

@U@

-competitive, w.h.p. Recall for the latter result that

Copt

@D

min

Theorem 3.5 Consider an application on the d-dimensional mesh M with n nodes.

For each 1

logn, we obtain a caching strategy with congestion O



Copt

@K[

min



PyI



logn

, w.h.p., where Copt

denotes the optimal

congestion for the application, and κdenotes the maximum number of write requests

directed to the same object.

Proof. Inorder to provethe aboveresultweapplyLemma2.9which holds analogously

in this setting. Define Copt

@U@

to be the optimal congestion for the application

when it is executed on the bandwidth tree Tb

, under the assumption that each node

of Mis simulated by its counterpart in Tb

, which is one of the leaf nodes.

70 Chapter 3. Caching with Memory Capacity Constraints

To apply effectively Lemma 2.9, we have to show that the bandwidth tree Tb

ΨM-useful for a certain ΨM, i.e., we have to prove the following two conditions.



Copt

@U@Dk

Copt



The maximum expected relative load in Mdue to access messages is O



Copt

@U@V@

In addition, note that ∆

@U@C

deg

@U@bC



. Then Theorem 3.5 can be

yield with Lemma 2.9.

The lower boundCopt

@U@bk

Copt

for the optimal strategy can be yield with

Lemma 2.7. The following lemma gives the upper bound on the expected load of our

caching strategy.

Lemma 3.6 The maximum expected relative load in M due to access messages is O



Copt

@U@V@

Proof. Let L

?

denote the load on edue to the simulation of access messages passing

edges on level

of the memory tree Tm

, for 1

kihk

h. We show that E

?

@]C



Copt

@U@U@

, for 1

kik

h, which yields the lemma.

Fix a level

. Let vbe a node of Tm

on level

iDt

1 such that M

includes the

edge e. If such a node does not exist then E

?

@l\C

0. Let v

be one of the children

of v. From the proof of Lemma 2.8 we can obtain that the expected load on edue to

the simulation of the memory tree edge eT

CM

on level

of Tm

is at most

times the relative load on the memory tree edge eT. Thus, the expected load on e

due to the simulation of the memory tree edge eTis O

Copt

@U@V@

, since Lemma

3.4 yields with k

2 that the congestion produced by the on-line strategy executed

on Tm

is at most 9 times the congestion produced by an off-line strategy executed

on Tb

. The same bound holds for the edges connecting vwith its other O



children. Hence, E

?

@lC



Copt

@U@V@

, which yields the lemma.

This completes the proof of Theorem 3.5.

3.3.2 Fat-trees

In this section, we consider caching strategies for the fat-tree Fof height Hwith n

nodes. Fhas thetopologyof a symmetric tree, that is, for each inner-node, the subtrees

rooted at the children of the node are isomorph. The fat-tree represents an indirect

network, that is, only the leaf nodes are processors with memory modules of uniform

capacity, the inner nodes are only routing switches. We define the root of Fto be on

level 0, and all nodes whose parents are on level iare defined to be on level i

[

Furthermore, each edge eof Fconnecting a level inode with a level i

[

1 node is

defined to be on level i

[

1. Thus, F

?U?

@N

QUQUQ

N?

@V@

, i.e., for every level

iin the tree Feach node on level ihas di

1children and each edge on level ihas

bandwidth bi. We make the reasonable assumption, for each level i,bi



3.3. Applications of the general framework 71

The fat-tree Fis called realistic if the bandwidths of the levels decrease geomet-

rically in direction to the root, i.e., if, for each level i,bi

1, for some

constant α

The general framework is based on the access tree strategy. Thus, we adopt the

hierarchical decomposition of Fand the associated decomposition tree T

of arbi-

trary height 1

Hfrom Section 2.3.2. The remaining description of our caching

strategy is analogously to the caching strategy for meshes in Section 3.3.1.

The followingtheorem shows that, foreach 1

H, we obtaina cachingstrategy

for fat-trees of height Hwith nnodes that is

[



@U@

-competitive, w.h.p.,

and, if h

[

@U@

-competitive, w.h.p. For realistic fat-trees this result

improves to

[



@U@

-competitive, w.h.p., and, if h

H, to

[

@@

competitive, w.h.p. Further, we obtain, for each 1

H, a caching strategy that is

strictly

[

?U?

deg

@][



logn

@V@

-competitive, w.h.p. Recall for the latter

result that Copt

@X

min

deg

@^P

Theorem 3.7 Consider an application on the fat-tree F of height H with n nodes.

Let Copt

denote the optimal congestion for the application, and let κdenote the

maximum number of write requests directed to the same object.



For each 1

H, we obtain a caching strategy with congestion O

?V?

[



@\I

Copt

@[

, w.h.p., and, if h

H, with congestion O

Copt

@[

, w.h.p.,

with R

min

M|?

max

deg

@N



P}@

PI

max

deg

@N



PI

logn.



For realistic fat-trees this improves to O



Copt

@}[

, w.h.p., and, if h

to O

Copt

@\[

, w.h.p.

Proof. This proof is analogously to the proof of Theorem 2.14 concerning caching on

fat-trees without memory capacity constraints.

3.3.3 Complete networks

Some massively parallel computers, e.g., Cray T3E and T3D or Intel Paragon, have

a network with very high bandwidth, so that not the network is the bottleneck but

the individual memory modules. In particular, remote accesses to these modules are

expensive, as local accesses are supported by additional local caches. These systems

are well modeled by a complete network Gnwith nnodes. We aim to minimize the

congestion at the nodes due to remote accesses, that is, remote accesses increase the

load at any node whereas local accesses are free. Each node has a memory module of

uniform capacity and bandwidth 1. Thus, the relative and absolute load of a node are

identical.

The general framework is based on the access tree strategy. Thus, we adopt the

hierarchical decomposition of Gnand the associated decomposition tree T

of ar-

bitrary height 1

lognfrom Section 2.3.3. The remaining description of our

caching strategy is analogously to the caching strategy for meshes in Section 3.3.1.

72 Chapter 3. Caching with Memory Capacity Constraints

The following theorem shows that we obtain, for h

1, a caching strategy for

complete networks with nnodes that is

@U@

-competitive, w.h.p., with respect to

the congestion at the nodes. Further, we obtain, for h

logn, a caching strategy that

is strictly

2logn

[

logn

@U@

-competitive, w.h.p., with respect to the congestion at

the nodes. Recall for the latter result thatCopt

@X

min

Theorem 3.8 Consider an application on the complete network Gnwith n nodes. for

each 1

logn, we obtain a caching strategy with congestion O

Copt

@`[

min



PI



logn

at the nodes, w.h.p., where Copt

denotes the optimal

congestion at the nodes for the application, and κdenotes the maximum number of

write requests directed to the same object.

Proof. This proof is analogously to the proof of Theorem 2.17 concerning caching on

complete networks without memory capacity constraints.

3.3.4 The universal caching strategy

In thissection, we present the universal caching strategyworking onarbitrary networks

for which an oblivious routing strategy is given. A routing strategy is called oblivious

if the path traveled by each packet depends only on the origin and destination of the

packet and not on the origins and destinations of the other packets nor on congestion

encountered during the routing. Note that each oblivious routing strategy is also an

on-line routing strategy. Finally, we apply the universal caching strategy to several

classes of networks.

Suppose we are given an arbitrary networkGwith nnodes and an oblivious routing

strategy for G. Each node is assumed to have a memory module of uniform capacity,

and each edge is assumed to have bandwidth 1. Thus, the relative and absolute load

of an edge are identical. The universal caching strategy for Gis a simulation of the

access tree strategy for the complete network Gnon G. Note that both Gand Gnhave

nnodes such that for each node in Ga counterpart in Gncan be fixed. All messages

that are sent between adjacent nodes in Gnare sent along the paths, determined by the

oblivious routing strategy, between the associated nodes in G.

The following theorem shows that, for a network Gin which a permutation can be

routed obliviously with maximum expected load Lπ

, the universal caching strat-

egy is

Lπ

@{I

deg

@U@@

-competitive, w.h.p., and strictly

2logn

[

Lπ

@{I

deg

@JI

logn

@U@

-competitive, w.h.p. Recall for the latter result that deg

@JI

Copt

@

min

Theorem 3.9 Consider an application on a network G with n nodes in which a per-

mutation can be routed obliviously with maximum expected load Lπ

. For each

logn, we obtain a caching strategy with congestion O

Lπ

@SI

deg

@SI

Copt

@[

min



PbI



logn

, w.h.p., where deg

denotes the degree of G,

Copt

denotes the optimal congestion for the application, and κdenotes the maxi-

mum number of write requests directed to the same object.

3.3. Applications of the general framework 73

Proof. The proof is analogouslyto the proof of Theorem 2.20 concerning the universal

caching strategy without memory capacity constraints.

Cayley networks

The following corollarycan be concluded withTheorem 3.9 and the remarksin Section

2.3.4. This corollary shows that, for a Cayley network Gwith nnodes, the universal

caching strategy is

diam

@KI

deg

@U@

-competitive, w.h.p., and strictly

2logn

[

diam

@KI

deg

@KI

logn

@U@

-competitive, w.h.p.

Corollary 3.10 Consider an application on a Cayley network G with n nodes. For

each 1

logn, we obtain a caching strategy with congestion O

diam

@\I

deg

@\I

Copt

@][

min



PI



logn

, w.h.p., where Copt

denotes the optimal

congestion for the application, and κdenotes the maximum number of write requests

directed to the same object.

Edge symmetric networks

The following corollarycan be concluded withTheorem 3.9 and the remarksin Section

2.3.4. This corollary shows that, for an edge symmetric network Gwith nnodes, the

universal caching strategyis

diam

@U@

-competitive,w.h.p., and strictly

2logn

[

diam

@KI

logn

@U@

-competitive, w.h.p.

Corollary 3.11 Consider an application on an edge symmetric network G with

n nodes. For each 1

logn, we obtain a caching strategy with congestion

diam

@JI

Copt

@\[

min



PI



logn

, w.h.p., where Copt

denotes

the optimal congestion for the application, and κdenotes the maximum number of

write requests directed to the same object.

Hypercubic networks

The following corollarycan be concluded withTheorem 3.9 and the remarksin Section

2.3.4. This corollary shows that, for the wrapped butterfly, cube-connected-cycles, hy-

percube, de Bruijn, and shuffle-exchange network with nnodes, the universal caching

strategy is

logn

@U@

-competitive, w.h.p., and strictly

2logn

[

?U?

logn

@U@

competitive, w.h.p.

Corollary 3.12 Consider an application on the wrapped butterfly, cube-connected-

cycles, hypercube, de Bruijn, or shuffle-exchange network G with n nodes. For each

logn, we obtain a caching strategy with congestion O

logn

Copt

@`[

min



PDI



logn

, w.h.p., whereCopt

denotes the optimal congestion for

the application, and κdenotes the maximum number of write requests directed to the

same object.

74 Chapter 3. Caching with Memory Capacity Constraints

The following corollary can be concluded with Theorem 3.9 and the remarks

in Section 2.3.4. This corollary shows that, for the indirect butterfly network with

nnodes, the universal caching strategy is

@V@

-competitive, w.h.p., and strictly

2logn

[

logn

@U@

-competitive, w.h.p.

Corollary 3.13 Consider an application on the indirect butterfly network G with n

nodes. For each 1

logn, we obtain a caching strategy with congestion O

Copt

@{[

min



PI



logn

, w.h.p., where Copt

denotes the optimal

congestion for the application, and κdenotes the maximum number of write requests

directed to the same object.

Chapter 4

Summary and Discussion

We have presented and analyzed a general framework for the development of dis-

tributed caching strategies for networks with memory capacity constraints in a non-

uniform cost model. The strategies aim to minimize the network congestion in order

to minimize the communication overhead and avoid that some of the links become

a bottleneck. We have shown that this framework yields efficient strategies that are

able to exploit the locality included in an arbitrary application for several examples

of networks, including meshes, fat-trees, complete networks, Cayley networks, edge

symmetric networks, and hypercubic networks. In contrast to previously known strate-

gies, our strategies give an integrated solution for the problem of data placement, data

tracking, and routing. In the following we present a detailed summary of the results

of this thesis. Table 4.1 and 4.2 give a survey over the results for caching without and

with memory capacity constraints, respectively.

Trees. We present the first deterministic and distributed caching strategy that

achieves competitive ratio 3 for trees in a non-uniform cost model. This competi-

tive ratio is optimal because of the lower lower bound shown by Black and Sleator

[BS89]. Our tree strategy minimizes not only the congestion but minimizes simulta-

neously the load on each individual edge up to a optimal factor of 3. Strategies for

trees are of special interest as they can be used as subroutines in strategies for other

networks, e.g., in the access tree strategy. However, our strategy neglects memory

capacity constraints.

Meshes. The situation on mesh-connected networks is much more complicated than

the one on trees because in meshes there are several possible routing paths between

every pair of nodes. Let Mdenote the d-dimensional mesh with side length ni



2 in

dimension i. This graph represents a network of n

IUIUI

ndprocessors, all of them

with uniform memory capacity, and ∑d

1n1

IVIUI



Iq?

@{I

IUIUI

ndcommunica-

tion links, all of them with uniform bandwidth.

Constructing bandwidth trees of different heights, we obtain a tradeoff between

the bandwidth and memory utilization: For each 1

logn, we obtain a caching

76 Chapter 4. Summary and Discussion

Caching without memory capacity constraints

non-strict strict

network competitive ratio competitive ratio

tree 3 3

d-dimensional mesh O

logn

fat-tree Fof height H O

min

logn

loglogn

deg

logn

realistic fat-tree F

of height HO

deg

logn

complete network

(congestion at the

nodes) O

logn

Cayley network GO

diam

@KI

deg

@U@

diam

@KI

deg

@KI

logn

edge symmetric

network GO

diam

@V@

diam

@JI

logn

hypercube

cube-connected-cycles

de Bruijn

shuffle-exchange

wrapped butterfly

logn

?U?

logn

indirect butterfly O

logn

Table 4.1: Survey over the results for caching without memory capacity constraints.

Recall that ndenotes the number of nodes in the networks and that all results hold

w.h.p.

strategy for the d-dimensional mesh Mwith nnodes that is

[



@U@

competitive, w.h.p. Setting h

logn, we obtain the strict congestion ratio O

logn

w.h.p. Further, we present an Ω

logn

lower bound for the competitive ratio for

on-line routing in meshes, which implies that our upper bound on the congestion ratio

for meshes of constant dimension is optimal.

Fat-trees. Define a fat-tree Fof height Hwith nnodes to be a graph that has the

topology of a symmetric tree, that is, for each inner-node, the subtrees rooted at the

children of the node are isomorph. The fat-tree represents an indirect network, that

is, only the leaf nodes are processors with memory modules of uniform capacity, the

inner nodes are only routing switches.

Constructing bandwidth trees of different heights, we obtain a better trade-

off between the bandwidth and memory utilization than for the mesh: For each

H, we obtain a caching strategy for a fat-tree Fof height Hwith nnodes

that is

[



@U@

-competitive, w.h.p., and, if h

[

@U@

Caching with memory capacity constraints

non-strict strict

network memory congestion memory congestion

ratio ratio ratio ratio

d-dimensional

mesh 2h

[



2logn

[

logn

fat-tree F

of height H2h

[



[

?U?

deg

@\[



logn

realistic fat-tree F

of height H2h

[



[

?U?

deg

@\[



logn

complete network

(congestion at the

nodes) 3O

2logn

[

logn

Cayley network G3O

diam

@KI

deg

@U@

2logn

[

diam

@JI

deg

@JI

logn

edge symmetric

network G3O

diam

@V@

2logn

[

diam

@JI

logn

hypercube

cube-connected-

cycles

de Bruijn

shuffle-exchange

wrapped butterfly

logn

2logn

[

?U?

logn

indirect butterfly 3 O

2logn

[

logn

Table 4.2: Survey over the results for caching with memory capacity constraints. Note

that his variable with 1

lognfor the d-dimensional mesh and 1

Hfor

the (realistic) fat-tree of height H. Recall that ndenotes the number of nodes in the

networks and that all results hold w.h.p.

competitive, w.h.p. Further, we obtain, for each 1

H, a caching strategy

that is strictly

[

?U?

deg

@J[



logn

@U@

-competitive, w.h.p. Setting h

min

logn

loglogn

, we obtain the congestion ratio O

min

logn

loglogn

P}@

w.h.p., and the strict congestion ratio O

deg

logn

, w.h.p.

Usually, it is assumed that the bandwidths of the levels decrease geometrically in

direction to the root. Let Bidenote the sum of the bandwidths of all edges on level i

with level Hedges being incident on the leaf nodes. A fat-tree is called realistic, if, for

each level i,Bi

1, for some constant α

1. For realistic fat-trees, we obtain,

for each 1

H, a caching strategy that is

[



@U@

-competitive, w.h.p.,

and, if h

[

@U@

-competitive, w.h.p.

78 Chapter 4. Summary and Discussion

Complete networks. Some massively parallel computers, e.g., Cray T3E and T3D

or Intel Paragon, have a network with very high bandwidth, so that not the network

is the bottleneck but the individual memory modules. In particular, remote accesses

to these modules are expensive, as local accesses are supported by additional local

caches. These systems are well modeledby a complete network of nodes withmemory

modules of uniform capacity. We aim to minimize the congestion at the nodes due to

remote accesses, that is, remote accesses increase the load at any node whereas local

accesses are free. Each node is assumed to have uniform bandwidth. We obtain a

caching strategy for complete networks with nnodes that is

@U@

-competitive,

w.h.p., and a caching strategy that is strictly

2logn

[

logn

@U@

-competitive, w.h.p.,

with respect to the congestion at the nodes.

The universal caching strategy. We present the universal caching strategy work-

ing on arbitrary networks for which an oblivious routing strategy is given. Each

node is assumed to have a memory module of uniform capacity, and each edge is

assumed to have bandwidth 1. For a network Gin which a permutation can be routed

obliviously with maximum expected load Lπ

, the universal caching strategy is

Lπ

@\I

deg

@V@U@

-competitive, w.h.p., and strictly

2logn

[

Lπ

@\I

deg

@\I

logn

@V@

-competitive, w.h.p. Thismeans that the universal caching strategy iswell suited

for networks having a degree at most logarithmic in the network size and in which a

permutation can be routed obliviously with maximum expected load at most logarith-

mic in the network size. We apply the universal caching strategy to several classes of

networks.



Cayley networks. For a Cayley network Gwith nnodes, the universal caching

strategy is

diam

@SI

deg

@U@

-competitive, w.h.p., and strictly

2logn

[

diam

@JI

deg

@JI

logn

@U@

-competitive, w.h.p.



Edge symmetric networks. For an edge symmetric network Gwith nnodes,

the universal caching strategy is

diam

@U@

-competitive, w.h.p., and strictly

2logn

[

diam

@KI

logn

@U@

-competitive, w.h.p.



Hypercubic networks. For the wrapped butterfly, cube-connected-cycles, hy-

bercube, de Bruijn, and shuffle-exchange network with nnodes, the univer-

sal caching strategy is

logn

@U@

-competitive, w.h.p., and strictly

2logn

[

?V?

logn

@U@

-competitive, w.h.p. Further, for the indirect butterfly network

with nnodes, the universal caching strategy is

@l@

-competitive, w.h.p., and

strictly

2logn

[

logn

@U@

-competitive, w.h.p.

Clustered networks. We show how the access tree strategy can be applied to

Internet-like clustered networks. A clustered network is a network that consists of

several small subnetworks, i.e., clusters, that are organized hierarchically. Commu-

nication between nodes of the same cluster is not as expensive as communication

between nodes of different clusters. The access trees are embedded into the clustered

network in a preprocessing step. We show that this preprocessing can be done effi-

ciently and locally for each participating cluster. Thus, we yield an efficient caching

strategy that is suitable, e.g., for managing WWW pages. In contrast to the caching

strategies currently used in the Internet, our strategy keeps all copies of a data object

consistent such that, e.g., modifications of WWW pages, are propagated to all copies

of the page. However, our strategy neglects memory capacity constraints.

Previous models for caching in networks simply aim to minimize the total com-

munication load, i.e., the sum, taken over all messages, of the size of the messages

multiplied with the length of their routing paths. This can result in bottlenecks. We

have improved upon such models as our model aims to minimize the congestion, i.e.,

the maximum, taken overall links, of the amount of data transmittedby the link, which

corresponds to distributing the load evenly among all network resources. However, in

contrast to other computational models for parallel computers, e.g., the BSP [Val90]

and the LogP model [CKP

96], our congestion based model does not have any no-

tion of time. We still simply sum up the load over all time steps. Thus, our models

avoids bottlenecks in space but not in time. The BSP model, for example, assumes

that a parallel algorithm proceeds in synchronized rounds. An adequate congestion

measure for this model considers the sum, over all rounds, of the maximum edge load

in the respective round. Unfortunately, investigating caching in networks in this model

is much more complicated since shifting communication load on some of the edges

from one round to another possibly has crucial effects. Therefore, we conclude that

developing caching strategies in a cost model that incorporates some notion of time is

a challenging task.

Further, our experimental results in [KMR

99] show that there is a further im-

portant cost factor in sending messages apart from the congestion. The sending of a

message by a processor is called a startup. The overhead induced by the startup pro-

cedure, inclusive the overhead of the receiving processor, is called startup cost. Since

we ignore the startup costs in our model it is an interesting question how to incorporate

startup costs in an adequate way in our congestion based model.

We have shown that our general framework yields efficient caching strategies for

several classes of networks. It can also be efficiently applied to other networks. All

that is needed is a hierarchical network decomposition that possesses similar proper-

ties to the decompositions we have described. Basically, such a hierarchical network

decomposition can be obtained by a recursively decomposition of the network. In or-

der to achieve efficient caching strategies, in each decomposition step the following

two rules have to be regarded.



Decompose the network almost along the bisection cut into two subnetworks,

such that communication between nodes of the same subnetwork is not as ex-

pensive as communication between nodes of different subnetworks.

80 Chapter 4. Summary and Discussion



Decompose the network into two almost equally sized subnetworks, such that

the associated decomposition tree is balanced.

Unfortunately, there could be a tradeoff between the two rules in arbitrary networks.

Thus, it is an interesting question of whether or not there is a variation of the recur-

sive decomposition described above that yields a caching strategy achieving close to

optimal congestion on arbitrary networks?

Bibliography

[ABF93] B. Awerbuch, Y. Bartal, and A. Fiat. Competitive distributed file al-

location. In Proceedings of the 25th ACM Symposium on Theory of

Computing (STOC), pages 164–173, 1993.

[ABF98] B. Awerbuch, Y. Bartal, and A. Fiat. Distributed paging for general

networks. Journal of Algorithms, 28:67–104, 1998.

[ALMZ96a] M. Andrews, F. T. Leighton, P. T. Metaxas, and L. Zhang. Automatic

methodsfor hidinglatencyin highbandwidth networks. In Proceedings

of the 28th ACM Symposium on Theory of Computing (STOC), pages

257–265, 1996.

[ALMZ96b] M. Andrews, F. T. Leighton, P. T. Metaxas, and L. Zhang. Improved

methodsfor hidinglatencyin highbandwidth networks. In Proceedings

of the 8th ACM Symposium on Parallel Algorithms and Architectures

(SPAA), pages 52–61, 1996.

[AP90] B. Awerbuch and D. Peleg. Sparse partitions. In Proceedings of the

31th IEEE Symposium on Foundations of Computer Science (FOCS),

pages 503–513, 1990.

[AR98] Y. Aumann and R. Rabani. An O

logk

approximate min-cut max-flow

theorem and approximation algorithm. SIAM Journal on Computing,

27(1):291–301, 1998.

[BDBK

90] S. Ben-David, A. Borodin, R. M. Karp, G. Tardos, and A. Widgerson.

On the power of randomization in online algorithms. In Proceedings

of the 22th ACM Symposium on Theory of Computing (STOC), pages

386–379, 1990.

[Bel66] L. A. Belady. A study of replacement algorithms. IBM Systems Jour-

nal, 5:78–101, 1966.

[BFR92] Y. Bartal, A. Fiat, and Y. Rabani. Competitive algorithms for dis-

tributed data management. In Proceedings of the 24th ACM Symposium

on Theory of Computing (STOC), pages 39–50, 1992.

82 Bibliography

[BL97] Y. Bartal and S. Leonardi. On-line routing in all-optical networks. In

Proceedings of the 24th International Colloquium on Automata, Lan-

guages and Programming (ICALP), pages 516–526, 1997.

[BS89] D. L. Black and D. D. Sleator. Competitive algorithms for replication

and migration problems. Technical Report CMU-CS-89-201, Depart-

ment of Computer Science, Carnegie-Mellon University, 1989.

[CK99] E. Cohen and H. Kaplan. Exploiting regularities in web traffic patterns

for cache replacement. In Proceedings of the 31th ACM Symposium on

Theory of Computing (STOC), 1999.

[CKP

96] D. E. Culler, R. M Karp, D. Patterson, A. Sahay, K. E. Schauser, E. E.

Santos, R. Subramonian, and T. von Eicken. LogP: A practical model

of parallel computation. Communications of the ACM, 39(11):78–85,

1996.

[CMS96] R. J. Cole, B. M. Maggs, and R. K. Sitaraman. On the benefit of sup-

porting virtual channels in wormhole routers. In Proceedings of the

8th ACM Symposium on Parallel Algorithms and Architectures (SPAA),

pages 131–141, 1996.

[CMSV96] R. Cypher, F. Meyer aufder Heide, C. Scheideler, and B. V¨ocking. Uni-

versal algorithms for store-and-forward and wormhole routing. In Pro-

ceedingsof the 28thACM SymposiumonTheoryofComputing(STOC),

pages 356–365, 1996.

[FKL

91] A. Fiat, R. M. Karp., M. Luby, L. A. McGeoch, D. D. Sleator, and

N. E. Young. Competitive paging algorithms. Journal of Algorithms,

12(2):685–699, 1991.

[FW74] P. A. Franaszek and T. J. Wagner. Some distribution-free aspects of

paging performace. Journal of the ACM, 21:31–39, 1974.

[Goo94] A. J. van de Goor. Computer Architecture and Design. Addison-

Wesley, 1994.

[HR90] T. Hagerup and C. R¨ub. A guided tour of Chernoff bounds. Information

Processing Letters, 33:305–308, 1990.

[IKP96] S. Irani, A. R. Karlin, and S. Phillips. Strongly competitive algorithms

for paging with locality of reference. SIAM Journal on Computing,

25(3):477–497, 1996.

[Ira97] S. Irani. Page replacement with multi-size pages and applications to

web caching. In Proceedings of the 29th ACM Symposium on Theory

of Computing (STOC), pages 701–710, 1997.

Bibliography 83

[KLL

97] D. Karger, E. Lehman, T. Leighton, M. Levine, D. Lewin, and R. Pan-

igrahy. Consistent hashing and random trees: Distributed caching pro-

tocols for relieving hot spots on the World Wide Web. In Proceedings

of the 29th ACM Symposium on Theory of Computing (STOC), pages

654–655, 1997.

[KLM

97] R. R. Koch, F. T. Leighton, B. M. Maggs, S. B. Rao, A. L. Rosenberg,

and E. J. Schwabe. Work-preserving emulations of fixed-connection

networks. Journal of the ACM, 44(1):104–147, 1997.

[KMR

99] C. Krick, F. Meyer auf der Heide, H. R¨acke, B. V¨ocking, and M. West-

ermann. Data management in networks: Experimental evaluation of a

provably good strategy. In Proceedings of the 11th ACM Symposium on

Parallel Algorithms and Architectures (SPAA), pages 165–174, 1999.

[KMRS88] A. R. Karlin, M. S. Manasse, L. Rudolph, and D. D. Sleator. Competi-

tive snoopy caching. Algorithmica, 3(1):79–119, 1988.

[KPR93] P. Klein, S. A. Plotkin, and S. Rao. Excluded minors, network decom-

position, and multicommodity flow. In Proceedings of the 25th ACM

Symposium on Theory of Computing (STOC), pages 682–690, 1993.

[KRVW98] C. Krick, H. R¨acke, B. V¨ocking, and M. Westermann. The DIVA (dis-

tributedvariables)library. www.uni-paderborn.de/sfb376/a2/diva.html,

Paderborn University, 1998.

[Lei92] F. T. Leighton. Introduction to Parallel Algorithms and Architectures:

Arrays



Trees



Hypercubes. Morgan Kaufmann, San Mateo, CA,

1992.

[LMP

95] F. T. Leighton, F. Makedon, S. Plotkin, C. Stein, E. Tardos, and

S. Tragoudas. Fast approximation algorithms for multicommodity flow

problems. Journal of Computer and System Science, 50:228–243,

1995.

[LMRR94] F. T. Leighton, B. M. Maggs, A. G. Ranade, and S. B. Rao. Ran-

domized routing and sorting on fixed–connection networks. Journal of

Algorithms, 17:157–205, 1994.

[LRWY94] C. Lund, N. Reingold, J. Westbrook, and D. Yan. On-line distributed

data management. In Proceedings of the 2nd European Symposium on

Algorithms (ESA), 1994.

[Mey83] F. Meyer auf der Heide. Efficiency of universal parallel computers.

Acta Informatica, 19:269–296, 1983.

84 Bibliography

[Mey86] F. Meyer auf der Heide. Efficient simulations among several models

of parallel computers. SIAM Journal on Computing, 15(1):106–119,

1986.

[MMVW97a] B. M. Maggs, F. Meyer auf der Heide, B. V¨ocking, and M. Wester-

mann. Exploiting locality for networks of limited bandwidth. In Pro-

ceedings of the 38th IEEE Symposium on Foundations of Computer

Science (FOCS), pages 284–293, 1997.

[MMVW97b] B. M. Maggs, F. Meyer auf der Heide, B. V¨ocking, and M. Wester-

mann. Exploiting localityfor networks oflimited bandwidth. Technical

Report tr-rsfb-97-042, Paderborn University, 1997.

[MS91] L. A. McGeoch and D. D. Sleator. A strongly competitive randomized

paging algorithm. Algorithmica, 6(6):816–825, 1991.

[MV95] F. Meyer auf der Heide and B. V¨ocking. A packet routing protocol for

arbitrary networks. In Proceedings of the 12th Symposium on Theoret-

ical Aspects of Computer Science (STACS), pages 291–302, 1995.

[MV99] F. Meyer auf der Heide and B. V¨ocking. Shortest paths routing in arbi-

trary networks. Journal of Algorithms, 31(1):105–131, 1999.

[MVW99] F. Meyer auf der Heide, B. V¨ocking, and M. Westermann. Provably

good and practical strategies for non-uniform data management in net-

works. In Proceedings of the 7th European Symposium on Algorithms

(ESA), pages 89–100, 1999.

[MVW00] F. Meyer auf der Heide, B. V¨ocking, and M. Westermann. Caching

in networks. In Proceedings of the 11th ACM-SIAM Symposium on

Discrete Algorithms (SODA), pages 430–439, 2000.

[MW89] F.Meyerauf der Heide and R. Wanka. Time-optimalsimulationsof net-

works by universal parallel computers. In Proceedings of the 6th Sym-

posium on Theoretical Aspects of Computer Science (STACS), pages

120–131, 1989.

[OR97] R. Ostrovsky and Y. Rabani. Universal O

congestion

[

dilation

[

log1

εn

local control packet switching algorithms. In Proceedings

of the 29th ACM Symposium on Theory of Computing (STOC), pages

644–653, 1997.

[PR96] C. G. Plaxton and R. Rajaraman. Fast fault-tolerant concurrent ac-

cess to shared objects. In Proceedings of the 37th IEEE Symposium on

Foundations of Computer Science (FOCS), pages 570–579, 1996.

Bibliography 85

[PRR97] C. G. Plaxton, R. Rajaraman, and A. W. Richa. Accessing nearby

copies of replicated objects in a distributed environment. In Proceed-

ings of the 9th ACM Symposium on Parallel Algorithms and Architec-

tures (SPAA), pages 311–320, 1997.

[RS94] P. Raghavan and M. Snir. Memory versus randomization in on-line

algorithms. IBM Journal of Research and Development, 38(6):683–

707, 1994.

[Spi77] J. R. Spirn. Program Behavior: Models and Measurements. Elsevier

Computer Science Library. Elsevier, Amsterdam, 1977.

[ST85] D. D. Sleator and R. E. Tarjan. Amortized efficiency of list update and

paging rules. Communications of the ACM, 28(2):202–208, 1985.

[SV96] C. Scheideler and B. V¨ocking. Universal continuous routing strategies.

In Proceedings of the 8th ACM Symposium on Parallel Algorithms and

Architectures (SPAA), pages 142–151, 1996.

[Val90] L. G. Valiant. A bridging model for parallel computation. Communi-

cations of the ACM, 33, 1990.

[V¨oc98] B. V¨ocking. Static and Dynamic Data Management in Networks. PhD

thesis, Paderborn University, Germany, December 1998.