scieee Science in your language
[en] (orig)
Caching in Networks:
Non-Uniform Algorithms
and
Memory Capacity Constraints
Dissertation
of
Matthias Westermann
Acknowledgments
First of all, I would like to thank my advisor Prof. Dr. Friedhelm Meyer auf der Heide
for his great support. The atmosphere in his research group at Paderborn University
was very creative. In particular, Friedhelm always left me the freedom to do the work
in my own style and time. Furthermore, I would like to thank the other reviewers of
my thesis, Prof. Dr. Susanne Albers from Dortmund University and Prof. Dr. Burkhard
Monien from Paderborn University. I would also like to thank Prof. Dr. Johannes
Bl¨omer and Dr. Rainer Feldmann for being additional members of the examination
board.
I wouldliketo thankmy “co-advisor” Berthold V¨ockingfor a very valuable collab-
oration. It was a lot of fun to master several problems in theoretical computer science
and other major spheres of life with Berthold, such as solving a traveling salesperson
problem given by the touristic highlights in California and changing several car tires
with unconventional tools inthe forests of Massachusetts during one night. Also, many
thanksto ChristianSohler, Klaus Schr¨oder, Harald R¨acke, and Christof Krickfor many
great discussions about computer science and other significant aspects of society.
Matthias Westermann
Paderborn, Germany
November 2000
This research was supported by the DFG-Sonderforschungsbereich 376
“Massively Parallel Computing: Algorithms Design Methods Applications”
Contents
1 Introduction 1
1.1 Our contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Cost model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Competitive analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Previous work on the access tree strategy . . . . . . . . . . . . . . . . 8
1.5 Other previous work . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.6 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2 Caching without Memory Capacity Constraints 15
2.1 The access tree strategy . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2 Caching on trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2.1 Uniform model . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2.2 Non-uniform model . . . . . . . . . . . . . . . . . . . . . . 19
2.3 Applications of the access tree strategy . . . . . . . . . . . . . . . . . 26
2.3.1 Meshes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3.2 Fat-trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.3.3 Complete networks . . . . . . . . . . . . . . . . . . . . . . . 39
2.3.4 The universal caching strategy . . . . . . . . . . . . . . . . . 42
2.3.5 Clustered Networks . . . . . . . . . . . . . . . . . . . . . . . 45
2.4 Extending the results to data-race free applications . . . . . . . . . . 52
3 Caching with Memory Capacity Constraints 55
3.1 The general framework . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.2 The caching strategy for the memory tree . . . . . . . . . . . . . . . 57
3.2.1 Uniform model . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.2.2 Non-uniform model . . . . . . . . . . . . . . . . . . . . . . 64
3.3 Applications of the general framework . . . . . . . . . . . . . . . . . 67
3.3.1 Meshes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.3.2 Fat-trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.3.3 Complete networks . . . . . . . . . . . . . . . . . . . . . . . 71
3.3.4 The universal caching strategy . . . . . . . . . . . . . . . . . 72
4 Summary and Discussion 75
Chapter 1
Introduction
In recent years, large computer systems connected by networks have become part of
our everyday live. A good example is the widespread use of the Internet and Internet-
related applications such as the World Wide Web (WWW). A basic functionality in
these systems is the interactive use of shared data objects that can be accessed from
each computer in the system. Examples for these objects are files in distributed file
systems for Ethernet-connected workstations, cache lines in virtual shared memory
systems for massively parallel computers, or pages in the WWW.
The dramatic growth of computer systems necessitates more and more an intel-
ligent management of shared data objects. The daily congestion in the Internet is a
clear evidence that the network becomes more and more a bottleneck as the size of
the system increases. The same effect can be observed in other distributed systems,
like massively parallel processors (MPPs) or networks of workstations (NOWs). The
traditional management of shared data objects based on centralized mega-servers with
special hardware meets its technical and economical limits. Thus, it is mandatory to
design new distributed managementstrategies thatguarantee thefree flow of theshared
data objects in the systems.
In general, large computer systems connected by networks consist of a set of com-
puters each having its own processor and memory module. These computers are usu-
ally connected by a relatively sparse network constructed out of links, i.e., point-to-
point connections, or buses, i.e., connections between two or more processors. The
performance of these systems depends on a number of parameters, including proces-
sor speed, memory capacity, networktopology, bandwidths, and latencies. Usually, the
links or buses are the bottleneck in these systems because improving communication
bandwidth and latency is often more expensive or more difficult than increasing pro-
cessor speed and memory capacity. But whereas several standard methods are known
for hiding latency, e.g., pipelined routing (see, e.g., [CMS96, CMSV96]), redundant
computation (see, e.g., [ALMZ96a, ALMZ96b, KLM
97, Mey83, Mey86, MW89])
or slackness (see, e.g., [Val90]), the only way to bypass the bandwidth bottleneck is to
reduce the communication load by exploiting locality.
2 Chapter 1. Introduction
The principle of locality is already known from sequential computation. Two kinds
of locality are usually distinguished: temporal and spatial locality (see, e.g., [Goo94]).
Temporal locality means that in the near future, a program is more likely to reference
those data objects that have been referenced in the recent past. This locality can be
due to instruction references in program loops, or data references in working stacks.
Spatial locality means that in the near future, a program is more likely to reference
those data objects that have addresses close to the recent references. This is caused,
e.g., by the traversal of data structures such as arrays. A further kind of locality is
specific to computer networks: topological locality. Topological locality means that
processors that are close together with respect to the structure of the interconnection
network are likely to be interested in the same data objects. This locality can be due to
a communication sensitive mapping of processes to the nodes in the network.
Usually, traditional management of shared data objects uses either the concept of
caching or hashing. Caching exploits the temporal and topological locality. The com-
munication load for accessing the shared data objects can be reduced by storing copies
of data objects in memory modules of some processors. This phenomenon is well
known and often explored in practice, e.g., in the WWW, where pages are cached in
order to reduce network congestion. When a user issues a read request for a page, the
requested page is transferred from a computer holding a copy of the page through the
network to the computer of the user. Usually, the page is cached, i.e., a copy of the
accessed page is kept in the local memory module of the user or in another memory
module lying on the routing path used by the page. In this way, subsequent requests
addressed to this page can be served locally. Similar caching mechanisms are used to
speed up parallel and distributed computations using virtual shared memory on mas-
sively parallel processors (MPPs) or networks of workstations (NOWs). The common
idea of all of these approaches is to use memory resources in order to reduce the band-
width utilization.
Caching is used for speeding up single-computers, too. Here a small, fast memory
module, the cache, is added to the processor in order to reduce the communication
on the bus connecting the processor with a much larger main memory module. Data
is partitioned into small blocks, called cache lines. Whenever the processor accesses
a data item that is not already stored in the cache, the whole respective cache line is
moved from the main memory into the cache. The major problem is to decide which
cache line is ejected, i.e., removed from the cache, when the cache is full. This prob-
lem is usually referred to as paging as it was investigated first in the context of virtual
memory where the data is usually divided into so-called pages. Paging has been in-
vestigated intensively as well in practice (see, e.g., [Bel66, CK99, FW74, Spi77]) as
in theory (see, e.g., [CK99, FKL
91, Ira97, IKP96, MS91, RS94, ST85]) in the past.
In all of these analyses it turned out that, although the idea of paging is very simple,
its analysis is not an easy task. Developing and analyzing strategies for caching in
networks is even more complicated than for single-computers because several caches
interact. Sometimes, creating a copy of a data object induces additional communica-
tion instead of reducing it, as this copy may need to be updated or invalidated later.
3
While caching aims to reduce the communication load by exploiting locality, hash-
ing tries, to distribute the communication load evenly among the network resources,
in order to avoid bottlenecks and thus congestion in the network. This is done by a
randomized distribution of the shared data objects to all memory modules. But, this
randomized distribution usually destroys the locality and hence the caching becomes
inefficient.
This thesis deals with distributed caching strategies that combine the two appar-
ently opposed concepts of caching and hashing. For this purpose, we reduce the com-
plex interconnection network to a simple structure, the so-called access tree. This
logical structure represents the communication potential of the whole network, but it
abstracts from single interconnections. In particular, the routing paths in this structure
are unique. Hence, the following questions can be answered efficiently by a distributed
caching strategy for access trees.
How many copies of a shared data object should be created?
On which memory modules should these copies be placed?
Which requests should be served by which copies?
How should the copies of a shared data object be located?
Due to the simple structure of access trees, the caching strategy can abandon on addi-
tional communication for the management and localization of the created copies. Now,
the logical access trees are embedded into the network via hashing. This is done in a
random but locality preserving fashion. With this concept, the so-called access tree
strategy, it is possible to use the effects of caching and hashing simultaneously, i.e.,
on one hand to reduce the communication load via caching, and on the other hand to
distribute the remaining load evenly, to avoid network congestion.
In order to provide efficient access to shared data objects, the communication over-
head caused by the caching strategy should be as small as possible. However, sim-
ply reducing the total communication load, i.e., the sum, taken over all messages,
of the size of the messages multiplied with the length of their routing paths, can re-
sult in bottlenecks. In addition, the load has to be distributed evenly among all net-
work resources. This corresponds to minimizing the congestion, i.e., the maximum,
taken over all links, of the amount of data transmitted by the link. Known results
for store-and-forward routing [LMRR94, MV95, OR97, SV96] and wormhole routing
[CMS96, CMSV96, SV96] show that reducing the congestion is most important in or-
der to get a good network throughput. For this reason, we believe that minimizing the
congestion is a promising approach in order to develop caching strategies that work
efficiently in theory and practice.
4 Chapter 1. Introduction
1.1 Our contribution
This thesis is based on the access tree strategy that we have originally introduced in
[MMVW97a]. The access tree strategy aims to minimize the congestion for trees,
meshes, and Internet-like clustered networks. For example, we achieve competitive
ratio 3 for trees and competitive ratio O
d
logn
, w.h.p., for d-dimensional meshes
with nnodes. Further, we present an
logn
d
lower bound for the competitive
ratio for on-line routing in meshes which implies that our upper bound on the com-
petitive ratio for meshes of constant dimension is optimal. All strategies presented in
[MMVW97a], however, work only in a uniform cost model in which migrating a data
object has the same cost as accessing the object. Furthermore, memory capacity con-
straints are neglected. These assumptions simplify the problem significantly. In this
thesis we extend the access tree strategy to
a non-uniform cost model,
memory capacity constraints, and
other classes of networks.
Typically, words or other small blocks of data that can be accessed in a computer
system are much smaller than the data objects. For example, large files usually consist
of several small records and pages of virtual shared memory consist of cache lines that
can be accessed individually. Thus, migrating a data object can be much more expen-
sive than accessing only a small piece of data from the object. However, the data of
an object should be kept together in order to reduce the bookkeeping overhead. We
present the first deterministic and distributed caching strategy that achieves competi-
tive ratio 3 for trees in this non-uniform cost model. This competitive ratio is optimal
because of the lower bound shown by Black and Sleator [BS89]. Our strategy does
not only minimize the congestion but minimizes simultaneously the load on each indi-
vidual edge up to an optimal factor of 3. Strategies for trees are of special interest as
they can be used as subroutines in strategies for other networks, e.g., in the access tree
strategy. However, our strategy still neglects memory capacity constraints. This result
has been previously published as joint work in [MVW99].
Furthermore, we present a general framework for the development of caching
strategies for networks with memory capacity constraints. This framework is based
on the access tree strategy. Our strategies aim to minimize the congestion. As in the
model without memory capacity constraints, our framework yields a caching strategy
for d-dimensional meshes with nnodes that is O
d
logn
-competitive, w.h.p. In ad-
dition, we give several examples of networks for which our framework yields efficient
strategies, including fat-trees, complete networks, Cayley networks, edge symmetric
networks, hypercubes, cube-connected-cycles, de Bruijn networks, shuffle-exchange
networks, and butterflies. For example, our framework yields a caching strategy for
complete networks that is O
1
-competitive, w.h.p, with respect to the congestion at
1.2. Cost model 5
the memory modules due to remote accesses. Most of the results have been previously
published as joint work in [MVW00].
We believe that the access tree strategy is well suited for the application in prac-
tice because the congestion is provably small. Besides, the strategy is simple and
produces only very small overhead for bookkeeping. In order to illustrate the practi-
cal usability, we have implemented the dynamic access tree strategy on a massively
parallel mesh-connected computer system and tested it for three standard applications
of parallel computing in [KMR
99]. The experimental results show that execution
time and congestion are closely related. The access tree strategy achieves execution
times that are close to the times of hand-optimized message passing strategies using
full knowledge of the access pattern of the application. Furthermore, all experiments
show that the access tree strategy clearly outperforms a standard caching strategy in
which a fixed home processor is assigned to each data object that keeps track of the
object’s copies. In particular, the larger the network is, the more superior the access
tree strategy becomes.
1.2 Cost model
The computer system is modeled by an undirected graph G
V
E
with node set
Vand edge set Esuch that the nodes represent the processors with their memory
modules, and the edges represent the links. The capacities of the memory modules are
described by a function m:V

and the bandwidths of the links are described by a
function b:E

. An application consists of read and write requests that are issued
by the nodes. Each of these requests is directed to a shared data object from a finite set
Xof shared data objects.
The shared data objects in Xare stored in the local memory modules possibly
with redundancy. At each time step, for each object x
X, let the residence set R
x
represent the set of nodes holding a copy of x. Initially, only a single copy of each
data object is stored somewhere in the computer system. This initial configuration is
known by all nodes in the system. During the execution of an application, copies may
be deleted, copies may be migrated, or new copies may be created as long as each
node obeys its memory capacity constraints. But always, at least one copy of each
data object must be stored in some of the nodes, i.e., at each time step, for each object
x
X,R
x

/
0.
A read request for a data object xrequires access to a copy of x, a write request
requires updating all copies of x. After a request is served, the multiple copies can be
reallocated. Any communication that proceeds along an edge increases the (communi-
cation) load on that edge by some amount which depends on whether a read, write, or
migration operation is performed. The increase is defined as follows.
Read operation: A read request for xissued by a node vcan be served by any
node uholding a copy of x. A path has to be allocated through the network from
vto u. The communication load on each edge eon this path increases by 1.
6 Chapter 1. Introduction
Write operation: A write request for xissued by a node vrequires to update all
copies of x. A multicast tree connecting vwith all nodes holding a copy of x,
i.e., a Steiner tree, has to be allocated. The communication load on each edge e
in the Steiner tree increases by 1.
Object migration: The strategy can replicate a copy of xfrom one node to an-
other along an arbitrary path. The communication load on each edge eon this
path increases by D, where D
1 is an arbitrary integer representing the ratio
between the load induced by the migration of the complete data object xand the
load induced by accessing only a relative small piece of x, e.g., a single byte.
Efficient strategies for distributed caching have to work in a distributed fashion.
In particular, the nodes do not have knowledge about the global state of the system,
that is, each node notices only the read and write accesses and the copy migrations that
pass the node. In order to accumulate additional knowledge a node has tocommunicate
with other nodes, which also increases the load on the involved edges.
Information exchange: Information about the global state of the system, e.g. the
actual residence set, can be exchanged by sending messages along a path from
one node to another. It is assumed that the messages have small size, e.g., they
include an ident-number of a data object and a tag for an action that should be
performed on the receiving node. The communication load on each edge on this
path increases by 1.
For a given application, let the relative load on an edge be its load divided by its
bandwidth. Finally, define the congestion to be the maximum over the relative loads
of all edges in the network.
We use caching strategies for single-computer systems as a subroutine. This prob-
lem is modeled by a graph of two nodes uand vconnected by a single edge e. The
memory module of node vcan hold all data objects simultaneously. The capacity of
the memory module of u, however, is limited. Only uis allowed to issue requests. Note
that in this case there is no significant difference between read and write requests. The
cost considered in the single-computer case is simply the load on edge e.
1.3 Competitive analysis
Caching strategies can be viewed as on-line strategies that are typically evaluated in
a competitive analysis. In this kind of analysis which was introduced by Sleator and
Tarjan [ST85] the costs of the on-line strategies are compared with the costs of an
optimal off-line strategy.
In order to obtain a simple, unambiguous model, we assume that an adversary
initiates an application, i.e., a sequence σ
σ1σ2σ3

of read and write requests each
of which is issued by one of the nodes in the network. The strategies have to serve
1.3. Competitive analysis 7
these requests one after the other, that is, it is assumed that σi
1is not issued before
the service of σiis finished. This assumption, however, does not mean that we want
to model sequential applications. The assumption of a sequence is the simplest way
to determine that the on-line and the off-line strategy have to serve the requests in the
same order. Note that allowing the off-line strategy to use a different order than the
on-line strategy would enable it to reduce its congestion almost arbitrarily.
First, we investigate caching strategies for networks without memory capacity con-
straints. For a given sequence σ, letCopt
σ
denote the minimum congestion produced
by an optimal off-line strategy. An on-line strategy is said to be c-competitive if it has
congestion at most c
Copt
σ

a, for each sequence σ, where ais a term that does not
depend on σ. The value cis also called the competitive ratio of the on-line strategy.
It is reasonable, to pay attention to the additive term a. This term must be in-
dependent on the sequence σbut it can depend, e.g., on the number of nodes in the
network. Thus, the additive term acan dominate the expression c
Copt
σ

afor a
long time, until the sequence σreaches the necessary length. Therefore, it is interest-
ing to have strictly c-competitive on-line strategies i.e., strategies having congestion at
most c
Copt
σ
, i.e., a
0, for each sequence σ. In this case, the value cis also called
the strict competitive ratio of the on-line strategy.
In the model without memory capacity constraints we will show that all presented
strategies are also able to handle parallel and overlapping requests, too. We restrict the
class of allowed applications specified by the adversary to be data-race free programs,
i.e., a write request to an object is not allowed to overlap with other requests to the
same object, and there is some order among the requests to the same object such that,
for each read and write request, there is a unique least recent write. Note that this still
allows arbitrary concurrent requests to different objects and concurrent read requests
to the same object. Without the restriction to data-race free programs, the optimal
off-line strategy could, e.g., defer all write requests to an object xto the end of the
execution and keep a valid copy of xat each node that reads xat some time. This “bad
trick” would give load 1 on each edge for serving all read requests to x. The example
illustrates that the restriction to the class of data-race free programs is necessary in
order to allow a fair comparison of on-line against optimal off-line strategies.
When the on-line strategy serves a request, it does not know future requests. In
contrast, the optimal off-line strategy is assumed to have full knowledge about the
whole sequence including future requests. This assumption gives much more power to
the optimal off-line strategy than to the on-line strategy. In fact, it turns out that this
advantage is too large in the case of bounded memory size: Sleator and Tarjan show in
[ST85] that the worst-case competitive ratio between the cost of a deterministic paging
strategy and an optimal off-line strategy grows linearly with the memory size. In the
case of randomized strategies this ratio grows logarithmically with the memory size
[FKL
91]. In order to compensate the advantage of knowing the future Sleator and
Tarjan suggest to restrict the optimal off-line strategy by reducing its memory capacity.
They show that, if the memory capacity of the optimal off-line strategy is reduced by
a factor of two, then a constant competitive ratio of two can be achieved, e.g., for the
8 Chapter 1. Introduction
LRU (least recently used) paging strategy, which is the most used paging strategy in
practice.
We use a similar approach for the evaluation of caching strategies for networks
with memory capacity constraints. We compare an on-line strategy whose individ-
ual memory capacities have been increased by a factor of m, for some fixed m, with
an optimal off-line strategy obeying the given memory capacities. For simplicity in
notation, we increase the capacities of the on-line strategy rather than decreasing the
capacities of the optimal off-line strategy. For a given sequence σ, let Copt
σ
denote
the minimum congestion produced by an optimal off-line strategy, obeying the mem-
ory bound m
v
, for each node v. An on-line strategy is said to be
m
c
-competitive if
it obeys the memory bound m
m
v
, for each node v, and if it has congestion at most
c
Copt
σ

a, for each sequence σ, where ais a term that does not depend on σ. The
value mis also called the memory ratio and the value cis also called the congestion
ratio of the strategy.
In the case with memory capacity constraints, it is again interesting to have
strictly
m
c
-competitive on-line strategies, i.e., strategies having congestion at most
c
Copt
σ
, i.e., a
0, for each sequence σ, while obeying the memory bound m
m
v
,
for each node v. In this case, the value mis also called the strict memory ratio and the
value cis also called the strict congestion ratio of the on-line strategy.
If the on-line strategy uses randomization we have to describe the power given
to the adversary more precisely. This is studied intensively in [BDBK
90]. The ad-
versary specifying the sequence is assumed to be oblivious, i.e., the requests do not
depend on the decisions of the on-line strategies. A randomized strategy has to sat-
isfy the congestion bound with high probability (w.h.p.), that is, the probability that
the congestion exceeds α
c
Copt
σ

a
is at most n
α, for every α
1, where n
denotes the number of nodes in the network. Note that this bound implies that the
expected congestion is O
c
Copt
σ

a
.
1.4 Previous work on the access tree strategy
This thesis is built upon joint work with Christof Krick, Bruce Maggs, Friedhelm
Meyer auf der Heide, Harald R¨acke, and Berthold V¨ocking on data management in
networks without memory capacity constraints in a uniform cost model. This work
is described in [MMVW97a, MMVW97b, KMR
99] and Berthold V¨ocking’s thesis
[V¨oc98]. In the following, these results will be sketched briefly.
We introduce new static and dynamic data management strategies for trees,
meshes, and Internet-like clustered networks in [MMVW97a, MMVW97b]. Note
that all presented strategies work only in a uniform cost model and that they do not
obey memory capacity constraints. These assumptions simplify the problem signifi-
cantly.
In the static model, we assume that we are given an application for which the
the rates of read and write requests for all node-object pairs are known. The goal is to
1.4. Previous work on the access tree strategy 9
calculate a static placement of the objects to the nodes in the network and to specify the
routing such that the congestion is minimized. In the dynamic model, we assume no
knowledge about the access pattern. An adversary specifies accesses at runtime. Here
we present caching strategies that also aim to minimize the congestion by exploiting
locality. These strategies are investigated in a competitive model.
We start by investigating static data management on tree-connected networks mod-
eled by connected (hyper)graphs without cycles. We show the surprising result that a
static placement exists that minimizes the communication load on all edges (or hyper-
edges) simultaneously. Obviously, this yields minimum congestion as well as mini-
mum total communication load. We describe a deterministic strategy that calculates
this placement. The sequential running time of this strategy is O
n
for each object,
where ndenotes the size of the network. Moreover, the placement can be computed
efficiently in a distributed fashion by the processors of the underlying tree network.
Besides we present the first deterministic dynamic strategy for tree-connected net-
works that works in a distributed fashion and achieves a competitive ratio of 3. This
caching strategy is surprisingly simple, and the competitive ratio is optimal because of
the lower bound shown by Black and Sleator [BS89].
As our results for static and dynamic data management on trees hold for trees with
arbitrary link or bus connections having arbitrary bandwidths, our strategies are well
suited in particular for Ethernet-connected NOWs. For example, the static strategy can
be used for optimal static file allocation and the dynamic strategy for the implementa-
tion of efficient distributed shared memory systems in these networks.
The situation on meshes is much more complicated because there are several pos-
sible routing paths between every pair of nodes. In fact, we show by a reduction from
PARTITION that the static problem is NP-hard already on a 3
3 mesh. The dynamic
problem is even more complicated, since we have to solve an on-line routing problem
and a data tracking problem in order to locate a suitable copy in case of a read access
and all copies of an object in case of a write request.
We introduce a simulation approach that solves the static and the dynamic problem
on meshes simultaneously. The strategy is based on a randomized but locality preserv-
ing embedding of access trees into the mesh, and it is called the access tree strategy.
On the access trees, the static or dynamic strategy for trees is executed. Consider an
arbitrary mesh of dimension dwith nnodes. Here the access tree strategy yields an ef-
ficient strategy for static data placement on meshes with arbitrary side length achieving
optimal congestion up to a factor of O
d
logn
, w.h.p. The placement of each object
only takes time O
n
. In the dynamic model, the access tree strategy achieves compet-
itive ratio O
d
logn
, w.h.p. We give a corresponding
logn
d
lower bound on the
competitive ratio for on-line routing, which implies that the competitive ratio achieved
by the dynamic access tree strategy is optimal for meshes of constant dimension.
Furthermore, we show how the access tree strategy can be adapted to Internet-
like clustered networks. A clustered network is a network that consists of several
small subnetworks, i.e., clusters, that are organized hierarchically. Communication
between nodes of the same cluster is not as expensive as communication between
10 Chapter 1. Introduction
nodes of different clusters. The access trees are embedded into the clustered network
in a preprocessing step. We show that this preprocessing can be done efficiently and
locally for each participating cluster. The static variant of the access tree strategy
allows to calculate a static placement with close-to-optimal congestion for clustered
networks. The dynamic variant yields an efficient caching strategy.
We believe that the access tree strategy is well suited for the application in practice
because the congestion is provably small. Besides the strategy is simple and produces
only very small overhead for bookkeeping. In order to illustrate the practical usabil-
ity, we have implemented variants of the dynamic access tree strategy in the DIVA
(Distributed Variables) library [KRVW98] that provides direct access to global vari-
ables, i.e., shared data objects, from the individual nodes in the network. The current
implementations are based on a massively parallel mesh-connected computer system.
We have evaluated the dynamic access tree strategies implemented in the DIVA
library experimentally in [KMR
99]. We test several variations of this strategy on
three different applications of parallel computing, which are matrix multiplication,
bitonic sorting, and Barnes-Hut N-body simulation. The implemented algorithms for
matrix multiplication and sorting are oblivious, i.e., their access or communication
patterns do not depend on the input. The reason why we have decided to include
these algorithms in our small benchmark suite is that they allow us to compare the
dynamic data management strategies with hand-optimized message passing strategies.
The third application, the Barnes-Hut N-body simulation, is non-oblivious. We believe
that a communication mechanism that uses shared data objects is the best solution
for this application, in contrast to the other two applications. However, we cannot
construct a hand-optimized message passing strategy achieving minimum congestion
for this application. Therefore, we concentrate on the comparison of different caching
strategies.
Our experiments show that the access tree strategy produces much less congestion
thanthestandardfixedhomestrategy(see, e.g., [Goo94]) inwhich afixedhome proces-
sor is assigned to each data object that keeps track of the object’s copies. Furthermore,
we observe that the execution time of the investigated application depends heavily
on the congestion produced by the different data management strategies. Therefore,
the access tree strategy outperforms the fixed home strategy clearly. In particular, the
larger the network is, the more superior the access tree strategy becomes. While the
fixed home strategy scales poorly, the access tree strategy comes reasonably close to
the performance of the hand-optimized message passing strategies even in the case of
large networks.
1.5 Other previous work
Competitive analysis of caching strategies in computer systemsconnected by networks
begins with Karlin et al. [KMRS88] who analyze strategies for snoopy caching on
buses. Bartal et al. [BFR92] introduce the file allocation problem in a competitive
1.5. Other previous work 11
model similar to our non-uniform model. The major difference to our model is that
they consider a different cost measure than we do, that is, they consider the total com-
munication load, i.e., the sum over the relative load on all edges, rather than the con-
gestion, i.e., the maximum over the relative load on all edges.
The most comparative work is done byAwerbuch et al. [ABF93, ABF98] who con-
sider the file allocation problem on arbitrary networks. In [ABF93] they present a cen-
tralized strategy that achieves an optimal competitive ratio O
logn
and a distributed
strategy that achieves competitive ratio O

logn
4
for the file allocation problem on
an arbitrary n-node network without memory capacity constraints. In [ABF98], the
distributed strategy is adapted to networks with memory capacity constraints result-
ing in an strategy that is
polylog n
polylog n
-competitive. All competitive ratios are
with respect to the total communication load instead of the congestion. We believe that
the congestion measure is more appropriate for the design of strategies than the total
communication load as it prevents some of the links from becoming bottlenecks.
We give a brief description of the very intriguing distributed strategy of Awerbuch
et al. because this will illustrate the influence of the cost measure on the design of
caching strategies. Their strategy uses a hierarchical network decomposition, intro-
duced in [AP90], that decomposes the network in a hierarchy of clusters with geomet-
rically decreasing diameter. If a node accesses a file xthat has no copy in a cluster of
some hierarchy level, then the cluster leader is informed and increases a counter for
the file. When the counter reaches D, the requesting node gets a copy of x, where D
denotes the ratio between the cost for migrating and the cost for accessing a file. This
strategy ensures that the total communication load is minimized up to a some loga-
rithmic factors in the size of the network, which gives a good competitive factor with
respect to the total communication load. The cluster leaders, however, can become
bottlenecks. In particular, the edges incident to the leader of the top level clusters may
become very congested. As a consequence, the competitive ratio in our congestion
based cost model becomes bad.
Caching strategies for tree-like topologies are of special interest since many net-
works have such topology, e.g., Ethernet-connected NOWs. Besides, strategies for
trees can be used as a subroutine for caching strategies on other networks. Therefore,
much research deals with caching on trees. For example, Bartal et al. [BFR92] de-
scribe a randomized strategy for the file allocation problem on trees without memory
capacity constraints that is 3-competitive with respect to the expected total load and a
deterministic strategy that is 9-competitive with respect to the total load. The conges-
tion is not considered, but it is easy to check that both strategies are ω
1
-competitive
with respect to the congestion.
Lund et al. [LRWY94] describe a 3-competitive deterministic but centralized strat-
egy for the same problem. The advantage of their strategy is that it minimizes not
only the total communication load but also the load on each edge and therefore also
the congestion. Unfortunately, the strategy makes use of global knowledge about a
work function that is influenced by each request issued in the network. In particular,
read requests issued by a node can induce the invalidation of copies in a completely
12 Chapter 1. Introduction
different region of the network without sending invalidation messages. This illustrates
that their strategy is inherently centralized, and hence yields no suitable solution in our
distributed setting.
A lower bound that holds for each network including at least one edge is shown by
Black and Sleator [BS89]. They give a lower bound of 3 on the competitive ratio of
each deterministic caching strategy on two nodes connected by a single edge. Bartal
et al. [BFR92] show that this bound holds also for randomized strategies.
Several recent papers deal with the distribution of pages in the World Wide Web.
Plaxton and Rajaraman [PR96] show how to balance the pages among several caches
by embedding a random cache tree for each page into the network. This balances the
load well and ensures fast responses even for popular pages. However, the strategy
uses a uniform embedding of the tree nodes onto the nodes in the Internet, which
destroys topological locality. Karger et al. [KLL
97] use a similar technique to relieve
hot spots in the Internet but they pay attention to topological locality. In contrast
to our assumption, they assume the latencies instead of the bandwidths to be the main
problem for datatransmissionin the Internet. Further, they consideronly read requests.
A difficult problem that has to be solved by each caching strategy is the data track-
ing problem, i.e., the problem of how to locate the copies of a particular object. To our
knowledge, data tracking mechanisms that aim to minimize the congestion have not
been investigated previously. Bartal et al. [BFR92] describe a data tracking mechanism
for arbitrary networks that minimizes the total communication load. This mechanism
is based on a distributed data structure that keeps track of all copies. Whenever a copy
is created or deleted on a node, this data structure is updated. The mechanism enables
each node to locate a copy of a particular object having approximately the distance of
the closest copy. The total communication load induced by each of the create, delete,
or locate operations are shown to be in O

log2n
, where
denotes the length of the
shortest path to a copy of the respective object and nis the size of the network. On
each node, the memory overhead due to the distributed data structure is at most O

X
with Xdenoting the set of shared objects. Plaxton et al. [PRR97] describe a similar
mechanism which reduces the memory overhead for the data structure on each node
to O
k
log2n
with kdenoting the number of copies on a node. However, this mecha-
nism works only for networks that fulfill a low-expansion property, which is satisfied,
e.g., by meshes of constant dimension.
1.6 Outline
The remainder of this thesis is organized as follows. In Chapter 2, caching strategies
for networks without memory capacity constraints are investigated in a non-uniform
cost model. First, the access tree strategy is introduced, and as an example the ap-
plication of the access tree strategy to 2-dimensional meshes is sketched. Then, a
deterministic and distributed caching strategy for trees is presented. This strategy is
used as a subroutine in the access tree strategy. Thereafter, the access tree strategy is
1.6. Outline 13
applied to several classes of networks, including meshes, fat-trees, complete networks,
Cayley networks, edge symmetric networks, hypercubic networks, and Internet-like
clustered networks. Further, we present the universal caching strategy working on ar-
bitrary networks for whichan oblivious routingstrategy isgiven. Finally, we will show
that all presented strategies are also able to handle parallel and overlapping requests,
too.In Chapter 3, caching strategies for networks with memory capacity constraints
are investigate in a non-uniform cost model. First, the general framework for the de-
velopment of caching strategies with memory capacity constraints is introduced. This
framework is based on the access tree strategy. Then, a caching strategy for trees
with memory capacity constraints is presented. This strategy is used as a subroutine
in the general framework. Thereafter, the general framework is applied to several
classes of networks, including meshes, fat-trees, complete networks, Cayley networks,
edge symmetric networks, and hypercubic networks. Further, we present the universal
caching strategy working on arbitrary networks for which an oblivious routing strategy
is given.
In Chapter 4, a detailed summary and discussion of the results of this thesis is
presented. Further, some concluding remarks are given.
Chapter 2
Caching without Memory Capacity
Constraints
In this chapter, we investigate caching strategies for networks without memory capac-
ity constraints in a non-uniform cost model. The remainder of the chapter is organized
as follows. In Section 2.1, the access tree strategy is introduced, and as an example the
application of the access tree strategy to 2-dimensional meshes is sketched. In Section
2.2, a deterministic and distributed caching for trees is presented. This strategy is used
as a subroutine in the access tree strategy. In Section 2.3, the access tree strategy is
applied to several classes of networks, including meshes, fat-trees, complete networks,
Cayley networks, edge symmetric networks, hypercubic networks, and Internet-like
clustered networks. Further, we present the universal caching strategy working on
arbitrary networks for which an oblivious routing strategy is given. Finally, in Sec-
tion 2.4, we will show that all presented strategies are also able to handle parallel and
overlapping requests, too.
2.1 The access tree strategy
The access tree strategy is based on a hierarchical decomposition of the network.
This hierarchical decomposition allows to exploit topological locality in an efficient
way. As an example, we describe recursively the hierarchical decomposition of a 2-
dimensional mesh Mwith nnodes. Let m1and m2denote the side lengths of the mesh,
i.e., n
m1
m2. We assume m1
m2. If m1
1 then we have reached the end of
the recursion. Otherwise, we partition Minto two non-overlapping submeshes of size
m1
2
 
m2and
!
m1
2
"#
m2. These submeshes are then decomposed recursively
according to the same rules. Figure 2.1 gives an example for this decomposition.
The hierarchical decomposition has associated with it a decomposition tree T
M
,
in which each node corresponds to one of the submeshes, i.e., the root of T
M
cor-
responds to Mitself, and the children of a node vin the tree correspond to the two
submeshes into which the submesh corresponding to vis divided. In this way, the leaf
16 Chapter 2. Caching without Memory Capacity Constraints
level 0 level 1
k l m
jhg
d fe
cba
level 2 level 3 level 4
Figure 2.1: The hierarchical decomposition of M
4
3
.
nodes of T
M
correspond to submeshes of size one and thus each leaf node represents
a node of M.
We interpret T
M
as a virtual network that we want to simulate on M. In order to
compare the congestion in both networks we define bandwidths for the edges in T
M
,
i.e., for a tree edge e
u
v
, with udenoting the parent node, define the bandwidth
of eto be the number of edges leaving the submesh corresponding to v. Note that this
bandwidth definition is only required for the description of the theoretical results and
does not affect the access tree strategy in any way. Figure 2.2 gives an example of a
decomposition tree with bandwidths.
For each global variable, define an access tree to be a copy of the decomposition
tree. We embed the access trees randomly into the mesh, i.e., for each variable, each
node of the access tree is mapped at random to one of the nodes in the corresponding
submesh. On each of the access trees, we execute a simple 3-competitive caching
strategy. All messages that should be sent between neighboring nodes in the access
trees are sent along the dimension-by-dimension order path between the associated
nodes in the mesh, i.e., the unique shortest path between the two nodes using first
edges of dimension 1 and then edges of dimension 2.
The analysis of the access tree strategy uses a bi-simulation between the mesh M
and the decomposition tree T
M
. At first, it is shown that the decomposition tree
T
M
can simulate the mesh M. In this simulation, each leaf node of T
M
issues the
same read and write requests as its counterpart in M. It is shown that any application
that produces congestion Cwhen it is executed on Mcan be executed on T
M
with
congestion C, too. At second, it is shown that the mesh Mcan simulate the decompo-
sition tree T
M
in such a way that the congestion in Mis only O
logn
times larger
than the congestion in T
M
.
From the results on the congestion of these two simulations we can obtain the
competitive ratio of the access tree strategy. The simulation of Mby T
M
looses a
factor of at most c1
1; the on-line strategy looses a factor of c2
3 against the off-
line strategy on T
M
; finally, the simulation of T
M
by Mlooses a factor of at most
c3
O
logn
. Altogether, we achieve a competitive ratio of c1
c2
c3
O
logn
.
2.2. Caching on trees 17
jm
gkhl
da
fceb
3 4
3
2 3
3 323
3 4 2 3
34
33
34
2 55
level 2
level 3
level 4
level 0
level 1
Figure 2.2: The decomposition tree T
M
4
3

. The node labels correspond to the
labels in Figure 2.1. The edge labels indicate the bandwidths of the respective edges.
2.2 Caching on trees
In this section, deterministic caching strategies for a tree Tare presented. The advan-
tage of trees is that there is only one simple path between any pair of nodes. Thus,
the placement of the objects automatically defines the routing paths from the access-
ing node to the respective copies. In particular, this means that the congestion is fixed
as soon as the placement is specified, which makes the analysis for trees much easier
than the one for networks including cycles. The edges in the tree Tare allowed to have
arbitrary bandwidth.
First, we present a very simple caching strategy for trees in a uniform cost model,
i.e. D
1. It is shown that this deterministic strategy is 3-competitive. Then, we
generalize this strategy to a non-uniform cost model. Again, it is proved that this
deterministic strategy is still 3-competitive. This competitive ratio is optimal because
of the lower bound shown by Black and Sleator [BS89].
2.2.1 Uniform model
In the uniform cost model, i.e. D
1, we assume that each object fits into one routing
packet such that each migration of a copy along an edge increases the load on this edge
by 1. Recall that each read or write request increases the load of each edge involved
in this request by 1, and that each information message increases the load of each
traversed edge by 1.
18 Chapter 2. Caching without Memory Capacity Constraints
Node vissues a read request for object x:
The read request is served by u, the node holding a copy of xthat is
nearest to v.
A new copy of xis created on each node incident to an edge on the path
from uto v.
Node vissues a write request for object x:
vsends the update information, i.e., the value that should be written, to
u, the node holding a copy of xthat is nearest to v.
ustarts an invalidation multicast to all other nodes holding a copy of x,
and modifies its own copy of x.
A new copy of xis created on each node incident to an edge on the path
from uto v.
Figure 2.3: The tree strategy in the uniform model for object x.
Our caching strategy manages all objects independently from each other. For each
object x, the nodes that hold a copy of xalways build a connected component. In
Figure 2.3 the tree strategy in the uniform model is described for object x. Note that in
case of a write request the invalidation multicast can be viewed as following: First, the
copies are updated, and after this update the copies are deleted.
It remains to describe how the nodes locate the copies. As the nodes holding the
copies of xalways build a connected component, the data tracking problem can be
solved very easily. Each node is attached a signpost pointing to the node that has is-
sued the least recent write request. Initially, this signpost points to the only copy of
x. Whenever xis updated the signposts are redirected to the node that has issued the
corresponding write request. Note that this mechanism does not require extra com-
munication, because only the signposts on nodes involved in the invalidation multicast
have to be redirected. The number of signposts can be reduced by defining a root of
the tree and omitting all those signposts that are directed to the root. Hence, we need
signposts only to mark the trail from the root to the node that issued the least recent
write request such that diameter of Tsignposts for each object are sufficient.
Besides, we attach markers to the copies that indicate the boundaries of the con-
nected componentbuiltby thenodes that holda copy. These markersare needed for the
invalidation multicast. They are updated whenever the connected component changes.
Also this mechanism does not require extra communication.
The following theorem shows that the tree strategy in the uniform model is strictly
3-competitive on trees with edges of arbitrary bandwidth.
2.2. Caching on trees 19
Theorem 2.1 The tree strategy in the uniform model minimizes the load on any edge
up to a factor of 3.
Proof. Fix an object x
X, and a sequence σ
σ1σ2

of read and write requests
for x. If σstarts with a read request, then we add a write request issued by the node
holding the initial copy of xat the beginning of the sequence. Obviously, this causes
no extra communication for an optimal strategy. We divide the sequence of requests
into phases such that the first request in each phase is a write and each phase includes
only one write.
For a phase t, let vtdenote the node issuing the write request at the beginning of t,
and letUtdenote the connected subgraph induced by vtand the nodes issuing the read
requests in this phase. Further, for t
1, let ptdenote the unique shortest path leading
from Ut
1to vt.
Each caching strategy has to send at least one message along each edge in Utin
phase tbecause all of the nodes that issue read requests have to receive the value
written by vt. Besides, for t
1, a message has to be sent along ptbecause the new
value written by vthas to be “merged” with the value written by vt
1. Consequently,
any strategy has to sent one message along each edge inUt
$
ptfor each phase t.
Now, we consider the load induced by our strategy. All messages induced by read
and write requests of phase texcept for the invalidation messages are routed along
edges in Ut
$
pt. In particular, the load of each of the edges in Ut
$
ptis increased by
exactly two: by the service of a read or write request and the corresponding migration
of a copy of x.
Finally, we consider the load induced by the invalidation multicast. The invalida-
tion multicast for the write in phase tsends exactly one message along each edge in
Ut
1
$
pt
1. Consequently, our strategy sends at most three times as many messages
along any edge as any other strategy does.
2.2.2 Non-uniform model
First, we describe a deterministic caching strategy for two nodes connected by a single
edge. This strategyis 3-competitive, and it can be computed in a distributed fashion by
the two connected nodes. The key feature of this strategy is that it can be extended to
work on trees just by simulating it on any edge in the tree. The result of this approach
is a simple, deterministic, and distributed caching strategy for trees which minimizes
the load on any edge up to a factor of 3. Obviously, the bound on the load of any edge
induces that our tree strategy is 3-competitive with respect to the total communication
load and the congestion simultaneously.
Caching on a single edge
We describe a deterministic and distributed caching strategy for a single edge econ-
necting two nodes aand b. The strategy uses a simple counting mechanism which
20 Chapter 2. Caching without Memory Capacity Constraints
Node aissues a read request for object x:
If ca
%
Dthen ca:
ca
1;
If ca
Dand aholds no copy of xthen
Move a new copy of xonto a;
If cb
0 then delete the copy of xon b;
Node aissues a write request for object x:
If cb
&
0 then
cb:
cb
1;
else If ca
%
Dthen ca:
ca
1;
If ca
Dand aholds no copy of xthen move a new copy of xonto a;
If cb
0 and aholds a copy of xthen delete the copy of xon b;
Figure 2.4: The edge strategy for requests issued by node a.
records read and write requests that are issued by the two nodes. Later on we will use
this strategy for building our tree strategy. Then the nodes aand bcorrespond to the
two connected components in which the tree is divided if eis removed.
Initially, we do not care about how the nodes aand bexchange information about
the residence set, and how the counters are distributed among them. We assume that
node aalways knows whether or not node bholds a copy and vice versa. Afterwards
we show how our strategy can be adapted to the distributed setting.
Each object xis handled independently from the other objects. Let us fix an object
x. Concerning this object there are two counters caand cb. Informally, these counters
represent saving accounts for costs referring to x.
Initially, one copy of xis placed on one of the two nodes. W.l.o.g., assume that it
is placed on a. Then cais set to D, and cbis set to 0. In Figure 2.4 the edge strategy is
described for requests issued by node a. The strategy works analogously for requests
issued by node b. Note that the edge strategy always keeps one copy of x, since it only
deletes a copy on a node if the other node also holds a copy.
The following lemma shows that the centralized edge strategy is strictly 3-
competitive.
Lemma 2.2 The centralized edge strategy minimizes the load up to a factor of 3.
Proof. We use a potential function argument (cf. [ST85]). First, let us fix an optimal
off-line strategy, which is denoted the optimal strategy in the following. W.l.o.g., the
optimal strategy fulfills the following properties.
2.2. Caching on trees 21
If a node vissues a read request for an object x, then the optimal strategy does
not delete a copy, that is, the only possible change of the residence set is that a
new copy of xis moved to v.
If a node vissues a write request for an object x, then the only possible changes
of the residence set are that a new copy of xis moved to vand/or a copy of xis
deleted on the neighbor of v.
Fix an object x
X, and a sequence σ
σ1σ2

of read and write requests for
x. Let Le
t
and Lopt
t
denote the load of the edge strategy and the optimal strategy,
respectively, after serving σt, and let Φ
t
denote the value of a potential function after
serving σt, which is defined in detail later. In order to prove the lemma, we show the
following invariant.
(a) Le
t
'
Φ
t
)(
3
Lopt
t
and
(b) Φ
t
*
0.
Let ca
t
and cb
t
denote the value of the counters caand cb, respectively, after
serving σt, and let Re
t
and Ropt
t
denote the residence set of the edge strategy and
the optimal strategy, respectively, after serving σt. We define
Φ
t
+
Φa
t

Φb
t

D
where, for v
-,
a
b
.
,
Φv
t
/
01
1
2
1
13
2cv
t
, if v
Re
t
and v
Ropt
t
,
3D
cv
t
, if v
Re
t
and v
Ropt
t
,
3D
2cv
t
, if v
Re
t
and v
Ropt
t
,
cv
t
, if v
Re
t
and v
Ropt
t
.
First, we prove invariant (b). The optimal strategy always holds a copy of xon
a node. Hence, v
Ropt
t
, for some node v
4,
a
b
.
, and consequently Φv
t
5
D,
which implies Φa
t

Φb
t
)
D. Hence, invariant (b) is shown.
Now we prove invariant (a) by induction on the length of σ. Obviously, (a) holds
for the initial setting. For the induction step suppose that Le
t
6
Φ
t
/(
3
Lopt
t
. Let
Le
Le
t
1

Le
t
,Lopt
Lopt
t
1

Lopt
t
,∆Φa
Φa
t
1

Φa
t
, and
∆Φb
Φb
t
1

Φb
t
. In order to prove the induction step, we show that
Le
∆Φa
∆Φb
(
3
Lopt
7
(2.1)
We distinguish between read and write requests.
Suppose σt
1is a read request issued by node a. In this case, Equation (2.1) can
be checked with Table 2.1 containing all possible changes of configuration. Note
that, if aissues a read request, the only possible changes of the residence sets
are that one of the strategies moves a copy to a, or cb
0 and the edge strategy
deletes the copy on b. In both cases, ∆Φb
0.
22 Chapter 2. Caching without Memory Capacity Constraints
a
Re
t
a
Ropt
t
a
Re
t
1
a
Ropt
t
1
Le∆Φa
(
Lopt
no no no no 1 2 1
no no no yes 1 3D
1D
no yes no yes 1
1 0
no no yes no 1
D2
D1
no no yes yes 1
D2
D D
no yes yes yes 1
D
1
D0
yes no yes no 0 1 1
yes no yes yes 0 3D
2D
yes yes yes yes 0 0 0
Table 2.1: Possible changes of configuration if node aissues a read request.
Suppose σt
1is a write request issued by node a. We distinguish between the
cases cb
t
&
0 and cb
t
8
0. Note that, if aissues a write request, the only
possible changes of the residence set of the optimal strategy is that a new copy of
xis moved to aand/or a copy of xis deleted on b.
Suppose cb
t
&
0. In this case, Equation (2.1) can be checked with Table
2.2 containing all possible changes of configuration. Note that, if aissues a
write request and cb
t
&
0, the only possible transitions of the edge strategy
are from
,
a
.
to
,
a
.
, from
,
b
.
to
,
b
.
, or from
,
a
b
.
to
,
a
b
.
or
,
a
.
.
Suppose cb
t
)
0. In this case, Equation (2.1) can be checked with Table
2.3 containing all possible changes of configuration. Note that, if aissues a
write request and cb
t
+
0, the only possible transitions of the edge strategy
are from
,
a
.
to
,
a
.
or from
,
b
.
to
,
b
.
or
,
a
.
.
This completes the proof of Lemma 2.2.
Next, we describe how the edge strategy can be adapted from the centralized to the
distributed setting. Each node keeps always the current value of both counters caand
cb. It is obvious that this assignment makes it possible for the nodes to make the right
decisions according to the edge strategy. Now we specify how each node keeps track
of both counters caand cb. W.l.o.g., consider node a.
A read request for xis issued by b: If bholds no copy of xthen ais able to
update its counters. In the other case, bsends an information message along eif
and only if bhas increased its counter cb.
A write request for xis issued by b: If aholds a copy of xthen ais able to update
its counters. In the other case, bsends an information message along eif and
only if bhas decreased its counter caor increased its counter cb.
2.2. Caching on trees 23
Re
t
Ropt
t
Re
t
1
Ropt
t
1
Le∆Φa
(
∆Φb
(
Lopt
a a a a 0 0
2 0
a a
b a a
b0 0 1 1
a a
b a a 0 0
2 0
a b a b 0 0 1 1
a b a a
b0 3D1 1
D
a b a a 0 3D
2D
b a b a 1 0
1 0
b a
b b a
b1 0 2 1
b a
b b a 1 0
1 0
b b b b 1 0 2 1
b b b a
b1 3D2 1
D
b b b a 1 3D
1D
a
b a a
b a 1 0
1 0
a
b a
b a
b a
b1 0 2 1
a
b a
b a
b a 1 0
1 0
a
b b a
b b 1 0 2 1
a
b b a
b a
b1 3D2 1
D
a
b b a
b a 1 3D
1D
a
b a a a 1 0
1 0
a
b a
b a a
b1 0 2 1
a
b a
b a a 1 0 2
3D0
a
b b a b 1 0 2 1
a
b b a a
b1 3D2 1
D
a
b b a a 1 3D2
3D D
Table 2.2: Possible changes of configuration if node aissues a write request and
cb
t
&
0.
In this way, each node is able to keep both of its counters up-to-date. The following
lemma shows that the additional information messages are sent very rarely.
Lemma 2.3 The distributed edge strategy minimizes the load up to a factor of 3.
Proof. We adopt the notations and definitions of Lemma 2.2. An information message
is only sent if a request changes the value of a counter. Thus, the tables used for
proving Lemma 2.2 change only slightly.
Suppose σt
1is a read request issued by node a. Then, an additional message
is sent only, if a
Re
t
. In this case, Leequals 1 rather than 0. Further, ∆Φa
equals
2 rather than 0, if a
Ropt
t
and ca
t
%
D. It is easy to check that
Equation (2.1) is still satisfied by applying these changes to Table 2.1.
24 Chapter 2. Caching without Memory Capacity Constraints
Re
t
Ropt
t
Re
t
1
Ropt
t
1
Le∆Φa
(
∆Φb
(
Lopt
a a a a 0 0 0 0
a a
b a a
b0 0 0 1
a a
b a a 0 0
3D0
a b a b 0 1 0 1
a b a a
b0 1 0 1
D
a b a a 0 1
3D D
b a b a 1
1 0 0
b a
b b a
b1
1 0 1
b a
b b a 1
1
3D0
b b b b 1 2 0 1
b b b a
b1 3D
1 0 1
D
b b b a 1 3D
1
3D D
b a a a 1
D
1
D0 0
b a
b a a
b1
D
1
D0 1
b a
b a a 1
D
1
D
3D0
b b a b 1
D2
D0 1
b b a a
b1
D2
D0 1
D
b b a a 1
D2
D
3D D
Table 2.3: Possible changes of configuration if node aissues a write request and
cb
t
+
0.
Suppose σt
1is a write request issued by node a. Then, an additional message
is sent only, if b
Re
t
. In this case, Leequals 1 rather than 0. Further, ∆Φa
equals
2 rather than 0, if a
Ropt
t
and ca
t
%
D. It is easy to check that
Equation (2.1) is still satisfied by applying these changes to the Tables 2.2 and
2.3.
Hence, Lemma 2.3 is proved.
Caching on trees
The distributed edge strategy can be extended to a tree T
V
E
. The edges in the
tree Tare allowed to have arbitrary bandwidths.
The tree strategy is composed out of
E
individual edge strategies. The idea is
to simulate the distributed edge strategy on each edge. Consider an arbitrary edge
e
a
b
. The removal of edivides Tinto two subtrees Taand Tb, containing aand
b, respectively. The two nodes aand bexecute the algorithm described in Figure 2.4.
The phrases “if aholds a (no) copy of x and “node aissues a read (write) request for
object x are just replaced by “if a node in Taholds a (no) copy of x and “a node in Ta
issues a read (write) request for object x”, respectively.
2.2. Caching on trees 25
The simulation works properly as long as the nodes in the residence set R
x
build
a connected component in the tree. A key feature of our edge strategy is that it fulfills
this condition. This is shown in the following lemma.
Lemma 2.4 The graph induced by the residence set R
x
is always a connected com-
ponent.
Proof. Via induction on the length of the sequence of requests it can be shown that
those counters on any simple path in the tree that are responsible for moving a copy
along an edge towards the first node of the path are non-decreasing from the first to
the last node on the path. Hence, all these counters “agree” about the distribution of
copies. This ensures that all copies stay in a connected component.
Fix an object x, and a sequence σ
σ1σ2

of read and write requests for x. Fix
a node vand two different edges x
u
v
and y
v
w
that are incident on v. The
removal of xand ydivides Tinto three subtrees Tu,Tv, and Tw, containing u,vand
w, respectively. For each edge
a
b
, there are the two counters c
9
a
:
b
;
aand c
9
a
:
b
;
bin the
distributed edge strategy. Then, let cx
u
t
and cy
v
t
denote the values of the counters cx
u
and cy
v, respectively, after serving σt.
In order to show the lemma, we prove by induction that cx
u
t
<(
cy
v
t
. Obviously
this is true for the initial setting. For the induction step suppose that cx
u
t
*(
cy
v
t
. In
the following we have to consider the three cases that σt
1is a request from a node in
Tu,Tv, or Tw.
Suppose σt
1is a request from a node in Tu. Then, cx
u
t
1
8-,
cx
u
t
=
cx
u
t

1
.
and cy
v
t
1
>4,
cy
v
t

cy
v
t

1
.
. If cx
u
t
1
*
cx
u
t
'
1
(
Dthen cy
v
t
1
5
,
cy
v
t
?
1
D
.
. Thus, cx
u
t
1
)(
cy
v
t
1
.
Suppose σt
1is a request from a node in Tv. Then, cx
u
t
1
*@,
cx
u
t
=
cx
u
t

1
.
and cy
v
t
1
8-,
cy
v
t
=
cy
v
t

1
.
. Thus, cx
u
t
1
*(
cy
v
t
1
.
Suppose σt
1is a request from a node in Tw. Then, cx
u
t
1
*-,
cx
u
t
A
cx
u
t

1
.
and cy
v
t
1
54,
cy
v
t
B
cy
v
t

1
.
. If cy
v
t
1
*
cx
u
t

1
0 then cx
u
t
1
5
,
cy
v
t
C
1
0
.
. Thus, cx
u
t
1
)(
cy
v
t
1
.
This completes the proof of Lemma 2.4.
Note that the tree strategy also does not need any additional information exchange
apart from the one done by the distributed edge strategy. Therefore, the following
theorem follows immediatelyfrom Lemma 2.3. It showsthat the tree strategy isstrictly
3-competitive for trees with edges of arbitrary bandwidth.
Theorem 2.5 The tree strategy minimizes the load on any edge up to a factor of 3.
26 Chapter 2. Caching without Memory Capacity Constraints
2.3 Applications of the access tree strategy
The access tree strategy can be applied to several classes of networks. Given any net-
work G, the decomposition tree T
G
can be constructed by a hierarchical decomposi-
tion of G. The quality of the resulting caching algorithm depends on some properties
of this decompositions. Although it is not clear which properties can be obtained for
general networks, applying the access tree strategy to almost any standard network
yields interesting results. In this section, we give some examples, including meshes,
fat-trees, complete networks, Cayley networks, edge symmetric networks, hypercu-
bic networks, and Internet-like clustered networks. Further, we present the universal
caching strategy working on arbitrary networks for which an oblivious routing strategy
is given.
2.3.1 Meshes
In this section, we consider caching strategies for the mesh M
M
m1
777
md
, i.e.,
the d-dimensional mesh-connected network with side length mi
2 in dimension i.
The number of nodes is denoted by n, i.e., n
m1

md, the number of edges is
d
i
D
1m1

mi
1
mi
1

mi
1

md
Θ
d
n
. Each edge has bandwidth 1. Thus,
the relative and absolute load of an edge are identical.
We show that the access tree strategy is strictly O
d
logn
-competitive, w.h.p., for
M. In order toprovethisresult, we givean upper boundon themaximum expectedload
in Mdue to the access tree strategy. To complete the proof, we show with a Chernoff
bound (see, e.g., [HR90]) that the maximum load over all edges does not deviate to
much from the expected load of an arbitrary edge. In order to apply a Chernoff bound
reasonably, we have to make the access tree strategy more fine granularly, i.e, the
access tree nodes have to be remapped dynamically in Mwhen too many messages
that simulate messages of the tree strategy traverse a node. The remapping is done by
the general remapping scheme. We present this scheme in a more general fashion such
that it can be used for all networks, not onlyfor meshes. Finally, we give a lower bound
for on-line routing on meshes showing that the above competitive ratio is optimal up
to a factor Θ
d2
.
Upper bound
The access tree strategy uses a locality preserving embedding of access trees. It is
based on a hierarchical decomposition of M, which we describe recursively. Let ibe
the smallest indexsuch that mi
max
,
m1
777
md
.
. If mi
1 then we have reached the
end of the recursion. Otherwise, we partition Minto two non-overlapping submeshes
M
m1
777
mi
2
E
777
md
and M
m1
777
F!
mi
2
"6
77G7
md
. These submeshes are then
decomposed recursively according to the same rules. Figure 2.1 on Page 16 gives an
example for this decomposition.
2.3. Applications of the access tree strategy 27
The hierarchical decomposition has associated with it a decomposition tree T
M
,
in which each node corresponds to one of the submeshes, i.e., the root of T
M
cor-
responds to Mitself, and the children of a node vin the tree correspond to the two
submeshes into which the submesh corresponding to vis divided. Thus, T
M
is a
binary tree of height O
logn
in which the leaves correspond to submeshes of size
one, i.e., to the nodes of M. We define the root to be on level 0 of this tree, and all
nodes whose parents are on level iare defined to be on level i
1. For each node vin
T
M
, let M
v
denote the corresponding submesh. Furthermore, each edge eof T
M
connecting a level inode uwith a level i
1 node vis defined to be on level i
1 and
M
e
H
M
v
.
We interpret T
M
as a virtual network that we want to simulate on M. In order to
compare the congestion in both networks we define bandwidths for the edges in T
M
,
i.e., for an edge eof T
M
, define the bandwidthof eto be the number of edges leaving
submesh M
e
. Figure 2.2 on Page 17 gives an example of a decomposition tree with
bandwidths.
For each object x
X, define an access tree Tx
M
to be a copy of the decompo-
sition tree T
M
. We embed the access trees randomly into M, i.e., for each x
X,
each interior node vof Tx
M
is mapped by a random hash function h
x
v
to one of
the processors in M
v
, and each leaf vof Tx
M
is mapped onto the only processor in
M
v
. For simplicity, we assume that the hash functions map in a truly random fashion,
i.e., uniformly and independently.
The remaining description of our data management strategy is very simple: For
object x
X, we execute the tree strategy on the access tree Tx
M
. All messages
that should be sent between neighboring nodes in the access trees are sent along the
dimension-by-dimension order path between the associated nodes in the mesh, i.e.,
the unique shortest path between the two nodes using first edges of dimension 1, then
edges of dimension 2, and so on.
The following lemma gives an upper bound on the maximum expected load in M
due to the access tree strategy.
Theorem 2.6 For each application on the d-dimensional mesh M with n nodes, the
access tree strategy achieves maximum expected load O
d
logn
Copt
M

, where
Copt
M
denotes the optimal congestion for the application.
Proof. In order to prove the above result, we require a lower bound on the congestion
of an optimal strategy and an upper bound on the expected load of the access tree
strategy. Define Copt
T
M

to be the optimal congestion for the application when it
is executed on the binary tree T
M
, under the assumption that each processor of Mis
simulated by its counterpart in T
M
, which is one of the leaf nodes. Note that T
M
is used only as a tool in the proof; in our caching strategy, we actually use a separate
tree Tx
M
for each data object x. We give a lower bound that relates the optimal
congestion on the mesh to Copt
T
M

, and an upper bound that relates the expected
load of the access tree strategy to Copt
T
M

. We start with the lower bound.
28 Chapter 2. Caching without Memory Capacity Constraints
Lemma 2.7 Copt
T
M
)(
Copt
M
.
Proof. For a given strategy on Mwith congestion Cwe have to describe a strategy on
T
M
with congestion at mostC.
We simulate the strategy for Mon T
M
, except for the routing. Instead, for the
routing paths in the mesh we use the unique shortest paths between the respective
nodes, that is, whenever a message is to be routed between two mesh nodes, we instead
route the same message along the unique shortest path in the tree between the leaf
nodes corresponding to these mesh nodes. Let C
I
denote the congestion for a given
application with the above strategy on T
M
. Let edenote an edge of T
M
with
relative loadC
I
. Then the absolute load of eisC
I
b
e
.
Now consider the same application on M. Any message that crosses ein T
M
has
either to leave or to enter the submesh M
e
in M. The number of edges leaving M
e
is b
e
. Thus, the load on one of these edges is at least C
IJ
b
e

b
e
/
C
I
, and hence,
C
C
I
.
The following lemma gives the upper bound on the expected load of the access tree
strategy.
Lemma 2.8 The maximum expected load in M due to the access tree strategy is
O
logn
d
Copt
T
M

.
Proof. Fix an arbitrary edge ein M. Let hdenote the height of T
M
, and let L
K
e
denote the load on edue to the simulation of edges on level
of T
M
, for 1
(
(
h.
We show that E
L
L
K
e
GM
O
d
Copt
T
M

, for 1
(
(
h, which yields the lemma as
h
O
logn
.
Fix a level
. Let vbe a node of T
M
on level
1 such that M
v
includes the
edge e. If such a node does not exist then E
L
L
K
e
GMN
0. Let v
I
be one of the two
children of v. We bound the expected load on edue to the simulation of the tree edge
eT
O,
v
v
IP.
on level
of T
M
.
First, we give a boundon the probability P
e
that eis traversed by a dimension-by-
dimension order pathconnecting themesh nodesthat simulate vandv
I
. ThemeshM
v
is partitioned by the hierarchical decomposition into two submeshes, one of which is
M
v
I
. Let qidenote the side length of M
v
I
in dimension i. Let kdenote the dimen-
sion of edge e. The nodes vand v
I
are embedded randomly into M
v
and M
v
IQ
.P
v
is bounded by the probability that the dimension-by-dimension order path between the
host of vand the host of v
I
traverses the row of e, i.e., the maximum set of edges of
dimension kthat build a linear array that includes e. Regardless of whether the path
starts at vor v
I
, the number of reachable rows in dimension kwith respect to the ran-
dom embedding is at least d
i
D
1
:
i
RD
kqi. As each of the these rows is equally likely to be
traversed,
P
e
8(
1
d
i
D
1
:
i
RD
kqi
(
qk
d
i
D
1qi
(
qmax
d
i
D
1qi
with qmax
max1
S
i
S
d
,
qi
.
.
2.3. Applications of the access tree strategy 29
Next we relate this probability to the bandwidth b
eT
of the tree edge eT. This
bandwidth is defined to be the number of edges leaving M
v
IT
. As a consequence,
b
eT
U(
d
j
D
1
qj
V8W
qmax
X
2
Y
2
d
i
D
1qi
qj
(
d
j
D
1
qj
V8W
qmax
X
2
Y
2
d
i
D
1qi
!
qmax
2
"
(
6d
d
i
D
1qi
qmax
(
6d
P
e
7
Note that we are allowed to exclude any dimension jwith qj
%
!
qmax
2
"
from the
above sum because the hierarchical mesh decomposition ensures that the mesh is not
divided along these dimensions before the decomposition of M
v
on level
1 such
that M
v
I
, which results from this decomposition, has no outgoing edges in one of
these dimensions.
The maximum number of messages that are transmitted along the tree edge eTis
3
Copt
T
M
N
b
eT
, since the tree strategy is strictly 3-competitive. Consequently,
the expected load on edge efor simulating eTis at most
3
Copt
T
M

b
eT

P
e
)(
18d
Copt
T
M

7
The samebound holdsfor theedge connecting vwithits otherchild. Hence, E
L
L
K
e
GMZ
O
d
Copt
T
M

, which yields the lemma.
Applying Lemma 2.7 and Lemma 2.8 yields Theorem 2.6.
The general remapping scheme
In this section, we introduce the general remapping scheme. This scheme is presented
in a more general fashion such that itcan be used for all networks, not only for meshes.
Consider a network Gand a hierarchical decomposition of G. The associated
decomposition tree T
G
is the tree in which each node corresponds to one of the
subgraphs, i.e., the root corresponds to Gitself, the children of a node vcorrespond
to the subgraph into which the subgraph corresponding to vis divided, and the leaves
correspond to subgraphs of size one, i.e., to the nodes of G. We define the root of
T
G
to be on level 0, and all nodes whose parents are on level iare defined to be
on level i
1. For each node vin T
G
, let G
v
denote the corresponding subgraph.
Furthermore, each edge ein T
G
connecting a level inode uwith a level i
1 node v
is defined to be on level i
1 and G
e
H
G
v
. Finally, we define the bandwidths, for
each edges ein T
G
, to be the number of edges leaving subgraph G
e
.
30 Chapter 2. Caching without Memory Capacity Constraints
For each objectx
X, define an access tree Tx
G
to bea copy of thedecomposition
tree T
G
. We embed the access trees randomly into G, i.e., for each x
X, each
interior node vof Tx
G
is mapped uniformly at random to one of the nodes in G
v
,
and each leaf vof Tx
G
is mapped onto the only node in G
v
.
The remaining description of our caching strategy is very simple: For each object
x
X, we execute the tree strategy on the access tree Tx
G
. All messages that should
be sent between neighboring nodes in the access trees are sent along a path in G.
The access tree nodes have to be remapped when too many access messages, i.e.,
messages that simulate messages of the tree strategy, traverse a node. The remapping is
done as follows. For every object x, and every node vof the access tree Tx
G
we add a
counterr
x
v
. Initially, thiscounter isset to 0. Whenever an access messagefor object
xtraverses node v, starts at node v, or arrives at node v, the counter r
x
v
is increased
by D, if this message is a migration message, and the counter r
x
v
is increased by
1, otherwise. When the counter r
x
v
reaches Kthe node vis remapped randomly to
another node in G
v
, where Kis some integer with K
Θ
deg
T
G

T
G
[
D
,
where deg
T
G

denotes the degree of T
G
, and
T
G

denotes the maximum
factor by which the bandwidths of two edges from T
G
incident on the same node
deviate. Remapping vto a new host means that we have to send a remapping message
that informs the new host about the migration and, if the old host holds a copy of x,
moves the copy to the new host. Remapping messages reset the counter r
x
v
to 0.
Furthermore, we have to send notification messages including information about the
new host to the nodes in Gthat hold the access tree neighbors of v. These notification
messages also increase the counters at their destination nodes, i.e., the counter r
x
v
IQ
,
for each neighbor v
I
of vin the access tree, by 1. The counter mechanism ensures
that the load directed to a copy on a randomly selected host is O
K
rather than Θ
κ
,
where κdenotes the maximum number of write requests directed to the same object.
Note that this bound would fail if the notification messages would not increase the
counters.
For a given application on Glet Copt
G
denote the optimal congestion for the ap-
plication. Further, let Copt
T
G

denote the optimal congestion for the application
when it is executed on the tree T
G
, under the assumption that each node of Gis sim-
ulated by its counterpart in T
G
, which is one of the leaf nodes. The decomposition
tree T
G
is called ΨG-useful if, for every application, the following two conditions
are fulfilled.
Copt
T
G
)(
Copt
G
.
The maximum expected relative load in Gdue to access messages is ΨG
Copt
T
G

.
The load on the edges in Gis increased by the notification and remapping mes-
sages. However, the following lemma shows that the impact of these messages on the
load on the edges is relatively small.
2.3. Applications of the access tree strategy 31
Lemma 2.9 Consider a network G with n nodes, and a ΨG-useful decomposition tree
T
G
of G. Then, for each application, the access tree strategy achieves congestion
O
ΨG
Copt
G

min
,
deg
T
G
N
T
G
[
D
κ
.8
deg
T
G
N
logn
, w.h.p., where
κdenotes the maximum number of write requests directed to the same object.
Before we prove Lemma 2.9, we apply it to the mesh M. We can conclude with
Lemma 2.7 and Lemma 2.8 that the decomposition tree T
M
is O
d
logn
-useful.
In addition, note that deg
T
M
H
3 and
T
M
H
2. Then, we can conclude with
Lemma 2.9 that the access tree strategy achieves congestion O
d
logn
Copt
M

min
,
D
κ
.\
logn
, w.h.p., where κdenotes the maximum number of write requests
directed to the same object.
To yield the following theorem we have to relate the optimal congestion Copt
M
and min
,
D
κ
.
. Either an object xis migrated at least once or each request message
to xis directed to the node where the initial copy of xis placed. As the nodes in
the mesh have maximum degree 2d, it follows 2d
Copt
M
\
min
,
D
κ
.
. Thus, the
access tree strategy achieves congestion O
d
logn
Copt
M

, w.h.p., i.e., it is strictly
O
d
logn
-competitive, w.h.p., for the d-dimensional mesh Mwith nnodes.
Theorem 2.10 For every application on the d-dimensional mesh M with n nodes, the
access tree strategy achieves congestion O
d
logn
Copt
M

, w.h.p., where Copt
M
denotes the optimal congestion for the application.
Proof (of Lemma 2.9). First, we consider the notification messages. We show that
the congestion due to notification messages in T
G
is not larger than the congestion
due to access messages. Note that T
G
is used as a tool in the proof and actually a
separate tree Tx
G
is used for each data object.
Let Ndenote the maximum number of notification messages passing a single edge
of T
G
. Suppose eTis an edge of T
G
that transmits Nnotification messages. Each
of the notification messages traversing eTwas caused by at least load Kdue to access
or notification messages that pass one of the two nodes incident on eT. Therefore, at
least one of the nodes incident on eTis touched by at least load N
K
2 due to access
or notification messages. Hence, the load on at least one of the at most deg
T
G

tree
edges incident to this node is at least N
K
2
deg
T
G

. Let e
I
Tdenote one of the
incident edges fulfilling this property. For K
4
deg
T
G
N
T
G

, the load on
e
I
Tdue to access or notification messages is at least N
2
T
G

. Then the load on
e
I
Tdue to access messages is at least N
T
G

because the maximum number of
notification messages that cross e
I
Tis N.
By definition is b
eT
>
b
e
I
T

T
G

. Hence, N
b
eT
<(
N
T
G

b
e
I
T
,
i.e., the relative load on eTdue to notification messages is not larger than the relative
load on e
I
Tdue to access messages. As we execute the strictly 3-competitive tree
strategy on each access tree, we can conclude that the expected relative load due to
access and notification messages on every edge in T
G
is in O
Copt
T
G

. Thus,
the expected load due to access and notification messages on an arbitrary edge of Gis
bounded by O
ΨG
Copt
G

since T
G
is a ΨG-useful decomposition tree of G.
32 Chapter 2. Caching without Memory Capacity Constraints
Until now, we have not considered the remapping messages. These messages have
no direct counterpart in the decomposition tree T
G
. However, for each remapping
message we can specify some related access or notification messages that are sent
along the edges of the access trees: Consider a node vof the access tree Tx
G
of an
object x
X. Every time the counter r
v
x
reaches Kthe node vis remapped and a
remapping message is sent from the old host to the new host of v.r
v
x
is increased
whenever an access or notification message traverses v. Hence, each remapping mes-
sage is caused by at least load Ktouching node v.
Again, we merge together the access trees of all objects to form the virtual de-
composition tree T
G
. Fix a node vfrom this tree. Let H
v
denote the number of
remapping messages sent for all access tree nodes corresponding to v. The remapping
messages sent for vonly influence the load on edges in G
v
. In detail, a remap-
ping message for vis sent between two randomly chosen nodes in the subgraph G
v
.
Hence, for the expected load on an edge eof G
v
this has the same effect as sending a
remapping message along an edge in T
G
leading from vto an uniformly at random
chosen child of v. As we are primarily interested in the expected load on the edges
in G, the remapping messages for vcan be related to additional load on the edges in
T
G
leading from vto the children of v. Let eTdenote one of these edges. Then the
additional load on eTdue to remapping messages is at most H
v

D.
The total load due to access and notification messages that are sent along the at
most deg
T
G

tree edges incident on vis at least H
v
[
K. Thus, one of these edges
has at least load H
v

K
deg
T
G

. Let e
I
Tdenote one of these edges fulfilling this
property. For K
deg
T
G
'
T
G
'
D, the load on e
I
Tdue to access or notification
messages is at least H
v

T
G

D.
By definition is b
eT
]
b
e
I
T

T
G

. Hence, H
v
^
D
b
eT
](
H
v
^
D
T
G

b
e
I
T
, i.e., the relative load on eTdue to remapping messages is not larger
than the relative load on e
I
Tdue to access and notification messages. As we execute the
strictly 3-competitive tree strategy on each access tree, we can conclude that the ex-
pected relative load due to access, notification and remapping messages on every edge
in T
G
is in O
Copt
T
G

. Thus, the expected load due to access, notification, and
remapping messages on an arbitrary edge of Gis bounded by O
ΨG
Copt
G

since
T
G
is a ΨG-useful decomposition tree of G.
Finally, let L
e
denote the load on an arbitrary edge eof Gdue to all kind of
messages. We have shown that
E
L
L
e
GM
O
ΨG
Copt
M

7
In order to complete the proof of this lemma, we have to show that the maximum load
over all edges does not deviate too much from the expected load of an arbitrary edge.
We show that L
e
can be decomposed into deg
T
G

parts such that L
e

deg
9
T
9
G
;_;
i
D
1Li, and each Liis a sum of independent random variables. Then, we can
apply a Chernoff bound to each Li. We color the edges of the access trees with the
colors
,
1
777
deg
T
G
.
such that all edges incident on the same node have different
colors.
2.3. Applications of the access tree strategy 33
Fix a color j
`,
1
777
deg
T
G
=.
. Let Exbe the set of tree edges of Tx
G
with
this color. For eT
Ex, let A
eT
x
i
be a random variable that is 1
b
e
if the path
that simulates eTincludes ewhen the ith message corresponding to object xtraverses
eT. Here a migration or remapping message of an object xis interpreted as Dsmall
messages each of which having unit size. Now define
Lj:
x
a
X
eT
a
Ex
iA
eT
x
i
7
Obviously, all of the random variables A
eT
x
i
for different xare stochastically
independent. The coloringyieldsthatall randomvariablesfor differenteTare indepen-
dent, too. Further, the remapping ensures that the set of all random variables with fixed
eTand xcan be partitioned into subsets of maximum sizeW, withW
O
min
,
K
κ
.b
,
such that the values of all variables in the same subset are equal and any collection
of variables from different subsets includes only independent variables. Recall that κ
denotes the maximum number of write requests directed to the same object.
Hence, the term Ljcan be viewed as a sum of independent random variables of
maximumweightW. Applying a Chernoff bound (see, e.g., [HR90]) to this sum yields
that Lj
O
E
Lj
'
W
logn
, w.h.p., and therefore,
L
e
c
deg
9
T
9
G
;P;
i
D
1Li
O
E
L

deg
T
G

W
logn
O
ΨG
Copt
M
'
deg
T
G

W
logn
4
w.h.p. This completes the proof of Lemma 2.9.
Lower bound
The theorem in the previous section shows that the access tree strategy is strictly O
d
logn
-competitive. In the following, we give a lower bound for on-line routing which
shows that this factor is nearly optimal.
We consider the following on-line routing problem: An adversary specifies a se-
quence of routing requests, i.e., pairs rt
st
dt
of source and destination nodes,
for 1
(
t
(
. An on-line routing algorithm must assign a routing path connecting st
and dt, for 1
(
t
(
, without knowing future requests, i.e., requests rt
d
witht
I
&
t. The
goal is to minimize the congestion.
The following theorem gives a lower bound on the competitive ratio for on-line
routing on meshes. The bound holds for each deterministic or randomized on-line
algorithm. A similar bound, but only for two-dimensional meshes, has been derived
independently also by Bartal and Leonardi in the context of routing in “all-optical
networks” [BL97].
Theorem 2.11 Any on-line routing algorithm for the mesh of dimension d
2and
side length m has competitive ratio
logm
.
34 Chapter 2. Caching without Memory Capacity Constraints
Caching in networks includes the problem of on-line routing, which can be shown
as follows. Forevery messagein the routingproblem, we define acorresponding object
xtthat is placed initially on node st. At timet, the node stwrites object xt; immediately
afterwards, node dtreads object xt. Then some data must be routed from stto dt. From
this observation, we can conclude the following corollary.
Corollary 2.12 Any caching strategy for the mesh of dimension d
2and side length
m has competitive ratio
logm
.
Proof (of Theorem 2.11). We show that for each C,d
2, and m
4 being a power
of 2, there is a random routing problem Pd
m
C
for which the minimum congestion
is Cwhereas the expected congestion achieved by any on-line routing algorithm is
C
logm
.
Let Md
m
denote the d-dimensional mesh of side length m. We start by proving a
lower bound for M2
m
. First, we describe a random on-line routing problem P2
m
C
on M2
m
, for which the minimum off-line congestion is C. Then we show that the
expected congestion of any on-line strategy is
C
logm
.
P2
m
C
is defined as follows. Let
k
denote the k-th node in the
-th row
of M2
m
, for 1
(
k
(
m. The adversary starts by specifying m
2 pairs of source
and destination nodes each of which should be connected by Crouting paths. The
pairs are

m
2
B
m
2
m
2

, for 1
(
(
m
2. Further requests are described
recursively: The mesh M2
m
can be partitioned into four m
2
m
2 submeshes. If
m
2
4, then the adversary selects one from these submeshes at random and specifies
routing requests in this submesh according to P2
m
2
C
.
Altogether, this gives logm
1 batches of routing requests of size C
m
2
C
m
4
77G7
C
2. For 1
(
i
(
logm
1, the routing batch specified in stage iis denoted
by Riand the submesh considered in this stage is denoted by Sisuch that S1
M2
m
.
It is easy to check that, for 1
(
i
(
logm
2, there exists an off-line schedule that
routes the requests of batch Riwith congestion Cthrough mesh Siwithout using any
edge of mesh Si
1. Thus, all requests can be routed off-line with congestionC.
It remains to be shown that the expected congestion of any on-line strategy is
C
logm
. The mesh M2
m
consists of m
1 rows of vertical edges and m
1 columns
of horizontal edges, each containing medges. We number the rows and columns from
1 to m
1, respectively. In the following, we only consider the edges in odd rows
and odd columns. These edges are called odd edges. Each routing path connecting a
source and a destination node of a request in batch Rihas to traverse at least m
2i
1
odd edges of submesh Si. Note that this bound holds even if a path leaves the submesh
Si. Hence, if one chooses randomly and uniformly an edge from the odd edges in Si,
then the expected number of paths connecting two nodes of batch Riusing this edge is
at least
Ri
m
2i
1
Eodd
Si
C
m
2i

m
2i
1
m2
22i
2
C
8
with Eodd
Si
denoting the set of odd edges in Si, for 1
(
i
(
logm
1. Choosing a
random odd edge from Slogm
1rather than from Siyields the same bound, because
2.3. Applications of the access tree strategy 35
the selection of the submeshes by the adversary corresponds to random selections of
subsets of odd edges. Note that this holds only for the odd edges because if m
2 is
even, then all of the odd edges in an m
mmesh will be contained in the m
2
m
2
submeshes, while all of the edges going between different submeshes are even. Hence,
the expected congestion in Slogm
1due to requests from batch Riis C
8, for 1
(
i
(
logm
1. Summing over all batches, yields that the expected congestion of P2
m
C
isC
logm
1

8, which completes the proof for the 2-dimensional case.
We now describe the on-line routing problem Pd
m
C
for the mesh Md
m
with
d
3. Md
m
can be partitioned into J
md
2two-dimensional m
msubmeshes
M1
777
MJ, each of which consists only of edges of dimension 1 and 2. Pd
m
C
is
defined as follows. The adversary specifies the routing requests in each submesh Mj
according to P2
m
C
, for 1
(
j
(
J. In each submesh Mjit uses the same random
bits, which means that it specifies exactly the same routing problem in each Mj, for
1
(
j
(
J. We have already seen that all requests can be routed off-line with congestion
Cinside the respective 2-dimensional mesh Mj. Hence, it remains to be shown that
the expected congestion of any on-line strategy on this routing problem is
C
logm
.
This we do by contradiction.
Suppose an on-line routing strategy exists for which the expected congestion on
Pd
m
C
is smaller than C
logm
1

8. Then this routing strategy can be simulated
on the m
mmesh M2
m
for P2
m
md
2
C
. In this simulation, the nodes and the
edges of M2
m
simulate their respective counterparts in the meshes M1
777
MJ. The
simulation yields congestion smaller than md
2
C
logm
1

8, since each edge of
M2
m
has to simulate only md
2edges of Md
m
. This contradicts the above result
for P2
m
md
2
C
. Consequently, the expected congestion of any on-line strategy on
Pd
m
C
is at least C
logm
1

8.
2.3.2 Fat-trees
In this section, we consider caching strategies for the fat-tree Fof height Hwith n
nodes. Fhas thetopologyof a symmetric tree, that is, for each inner-node, the subtrees
rooted at the children of the node are isomorph. The fat-tree represents an indirect
network, that is, only the leaf nodes are processors with memory modules, the inner
nodes are only routing switches. We define the root of Fto be on level 0, and all
nodes whose parents are on level iare defined to be on level i
1. Furthermore, each
edge eof Fconnecting a level inode with a level i
1 node is defined to be on level
i
1. Thus, F
F

d1
b1
B
777
dH
bH

, i.e., for every level iin the tree F, each node
on level ihas di
1children and each edge on level ihas bandwidth bi. We make the
reasonable assumption, for every level i,bi
bi
1.
The following lemma shows that the bandwidths of a fat-tree can be restricted
without increasing the congestion.
Lemma 2.13 W.l.o.g. we can assume, for every level i, bi
(
di
1
bi
1.
36 Chapter 2. Caching without Memory Capacity Constraints
Proof. We have to show that a given strategy with congestion Con the fat-tree F
has on the fat-tree F
Ie
F

d1
b1
B
777
d
K
min
,
b
K
d
K
1
b
K
1
.eB
777
dH
bH

at most
congestionC.
Fix an edge e
I
on level
in F
I
. Let e1
777
ed
fQg
1denote the edge on level
1 in F
I
that are incident to e, and, for an edge e, let L
e
denote the load on e.
Suppose that b
K
min
,
b
K
d
K
1
b
K
1
.
. Then the lemma holds obviously. Now,
suppose that d
K
1
b
K
1
min
,
b
K
d
K
1
b
K
1
.
. Any message that crosses e
I
has to
cross one of the edges e1
777
ed
fQg
1, too. Thus, L
e
IT\(
d
fQg
1
i
D
1L
ei
. The relative load
of an edge is defined to be the load divided by the bandwidth of the edge. Hence, the
relative load on e
I
is
L
e
I
d
K
1
b
K
1
(
d
fQg
1
i
D
1L
ei
d
K
1
b
K
1
(
max
,
L
e1
B
777
L
ed
fQg
1
.
b
K
1
(
C
which yields the lemma.
The fat-tree Fis denoted realistic if the bandwidths of the levels decrease geomet-
rically in direction to the root, i.e., if, for every level i,bi
(
α
di
1
bi
1, for some
constant α
%
1.
The access tree strategy uses a locality preserving embedding of access trees. It
is based on a hierarchical decomposition of F, which we describe recursively. If F
has size one then we have reached the end of the recursion. Otherwise, we partition F
into d1non-overlapping subtrees F1
777
Fd1with Fi
F

d2
b2
B
777
dH
bH

. These
subtrees are then decomposed recursively according to the same rules.
The associated decomposition tree T
F
is isomorph to F. Since we obtain, for
fat-trees, a tradeoff between the height of the decomposition tree and the achieved
congestion, we need a decomposition tree of arbitrary height 1
(
h
(
H. Thus, some
levels of T
F
have to be deleted possibly. This is done with the following deletion
algorithm. While height
T
F

&
h, delete a level 0
%
%
height
T
F

in T
F
whose nodes have minimum degree. Deleting level
means that, first, all nodes and
edgesonlevel
are deleted, then theedges on level
1are connected to the respective
nodes on level
1, and finally the numbering of the levels is adjusted.
Now, the decomposition tree T
F
has height h. The root of T
F
corresponds
still to Fitself, and the children of a node vin the tree correspond to the subtrees into
which the subtree corresponding to vis divided, and the leaves correspond to subtrees
of size one, i.e., to the leaf nodes of F. Note that T
F
is still isomorph to a fat-tree.
The remaining description of our caching strategy is analogous to the caching strat-
egy for meshes in Section 2.3.1. Note that all access tree nodes are mapped only to
leaf nodes of the fat-tree, since the inner nodes are only routing switches, and that all
messages that should be sent between neighboring nodes in the access trees are sent
along the unique shortest path between the associated nodes in the fat-tree.
2.3. Applications of the access tree strategy 37
The following theorem shows that we obtain, for h
min
,
H
logn
loglogn
.
,
an access tree strategy that is O
min
,
H
logn
loglogn
.e
-competitive, w.h.p., for
fat-trees of height Hwith nnodes. For realistic fat-trees this result improves
to O
1
-competitive, w.h.p. Further, we obtain a caching strategy that is strictly
O
deg
F
3
logn
-competitive, w.h.p. To yield the latter result we have to relate the
optimal congestion Copt
F
and min
,
D
κ
.
. Either an object xis migrated at least
once or each request message to xis directed to the node where the initial copy of
xis placed. As the nodes in the fat-tree have maximum degree deg
F
, it follows
Copt
F
)
min
,
D
κ
deg
F
.
.
Theorem 2.14 Consider an application on the fat-tree F of height H with n nodes.
Let Copt
F
denote the optimal congestion for the application, and let κdenote the
maximum number of write requests directed to the same object.
For each 1
(
h
(
H, we obtain an access tree strategy with congestion O

h
n1
X
h
N
Copt
F

R
, w.h.p., and, if h
H, with congestion O
H
Copt
F

R
,
w.h.p., with R
min
,
deg
F

n1
X
h
2
D
κ
.*
deg
F
'
n1
X
h
[
logn.
For realistic fat-trees this improves to O
n1
X
h
Copt
F
e
R
, w.h.p., and, if h
H,
to O
Copt
F
'
R
, w.h.p.
Proof. In order to prove the above result we apply Lemma 2.9. Fix an 1
(
h
(
H.
DefineCopt
T
F

to be the optimal congestion for the application when it is executed
on the tree T
F
, under the assumption that each leaf node of Fis simulated by its
counterpart in T
F
, which is one of the leaf nodes. Note that T
F
is used only
as a tool in the proof and that we actually use a separate tree for each data object.
According to Lemma 2.13 we can assume w.l.o.g., for each level i,bi
(
α
di
1
bi
1,
for some constant α
(
1.
To apply effectively Lemma 2.9, we have to show that the decomposition tree T
F
is ΨF-useful for a certain ΨF, i.e., we have to prove the following two conditions.
Copt
T
F
)(
Copt
F
.
Themaximumexpectedrelativeload in Fdue to access messages isO

h
i
D
1αi
n1
X
h

Copt
T
F

, and, if h
H,O
H
i
D
1αi
Copt
T
F

.
In addition, note that
T
F
<
deg
T
F
>
max
,
deg
F
B
n1
X
h
.
. Then, Theorem
2.14 can be yield with Lemma 2.9.
The next lemma shows the lower bound for the optimal strategy.
Lemma 2.15 Copt
T
F
)(
Copt
F
.
Proof. For a given strategy on Fwith congestion Cwe have to describe a strategy on
T
F
with congestion at mostC.
We simulate the strategy for Fon T
F
. For the routing paths in T
F
we use the
unique shortest paths between the respective nodes, that is, whenever a message is to
38 Chapter 2. Caching without Memory Capacity Constraints
be routed between two fat-tree nodes, we instead route the same message along the
unique shortest path in the tree between the leaf nodes corresponding to these fat-tree
nodes. Let C
I
denote the congestion for a given application with the above strategy on
T
F
. Let edenote an edge of T
F
with relative load C
I
. Then the absolute load of e
isC
Ih
b
e
.
Now consider the same application on F. Any message that crosses ein T
F
has
either to leave or to enter the subtree F
e
. The bandwidth of the edge leaving F
e
is
b
e
. Thus, the load on this edge is at leastC
I
b
e
G
b
e
H
C
I
, and hence, C
C
I
.
The following lemma gives the upper bound on the expected load of the access tree
strategy.
Lemma 2.16 The maximum expected relative load in F due to access messages is
O

h
i
D
1αi
n1
X
h

Copt
T
F

, and, if h
H, O
H
i
D
1αi
Copt
T
F

.
Proof. Fix an arbitrary edge ein Fand a level
in T
F
. Let L
K
e
denote the relative
load on edue to the simulation of access messages passing edges on level
of T
F
.
Let vbe a node of T
F
on level
1 such that F
v
includes the edge e. If such
a node does not exist then E
L
L
K
e
iM+
0. For each node uin T
F
, let the root of
subtree F
v
in Fdenote Fr
v
. Let
vdenote the level of Fr
v
in F. For all the
children v1
777
vdof vin T
F
, the nodes Fr
v1
B
777
Fr
vd
are on the same level in
F. Thus, let
cdenote this level. Further, Let
edenote the level of ein F. We show
that E
L
L
K
e
iM?
O
αH
K
Copt
T
F

, if
e
c, and E
L
L
K
e
GM?
O
n1
X
h
Copt
T
F

,
if
c
&
e
&
v, which yields the lemma, since the latter case can only occur for at most
one level in T
F
.
First, suppose that
e
c. Then, a child v
I
of vexists such that F
v
I
includes
the edge e.eis traversed by paths connecting the fat-tree nodes that simulate vand
its children v1
777
vd, if the host of vor the host of v
I
is mapped on a leaf node in
F
e
, the respective subtree of the original decomposition tree of height H. The nodes
vand v
I
are mapped randomly to one of the leaf nodes in F
v
and F
v
I
. Thus, for the
probability P
v
that the host of vis mapped on a leaf node in F
e
is
P
v
*(
H
i
D
K
e
1di
H
i
D
K
v
1di
(
1
K
e
i
D
K
v
1di
and for the probability P
v
Ij
that the host of v
I
is mapped on a leaf node in F
e
is
P
v
I
8(
H
i
D
K
e
1di
H
i
D
K
c
1di
(
1
K
e
i
D
K
c
1di
7
Next, we relate thebandwidth b
eT
of an edge eTconnectingthe node vwitha child in
T
F
to the bandwidth b
e
/
b
K
e. According to Lemma 2.13 we can assume w.l.o.g.,
2.3. Applications of the access tree strategy 39
for each level i,bi
(
α
di
1
bi
1, for some constant α
(
1. As a consequence,
b
eT
H
b
K
c
(
b
K
e
α
K
e
K
c
K
e
i
D
K
c
1
di
7
The maximum load on the edge eTis Copt
T
F

b
eT
. Consequently, the expected
load on edge efor simulating edges on level
of T
F
is at most
Copt
T
F
[
b
eT
[
d
P
v
'
P
v
I
b
K
e
(
Copt
T
F

2α
K
e
K
c
(
Copt
T
F

2αH
K
with d
K
v
i
D
K
c
1di.
Finally, suppose that
c
&
e
&
v. Then, the levels
c
1
777
v
1 of Fmust
have been deleted in T
F
. We assume that eis traversed by all paths connecting the
fat-tree nodes that simulate vand its children v1
777
vd. Now, we relate the bandwidth
b
eT
of an edge eTconnecting the node vwith a child in T
F
to the bandwidth
b
e
N
b
K
e. According to the definition of the fat-trees, for each level i,bi
bi
1. As a
consequence, b
eT
H
b
K
c
(
b
K
e
7
The maximum load on the edge eTis Copt
T
F

b
eT
. Consequently, the expected
load on edge efor simulating edges on level
of T
F
is at most
Copt
T
F

b
eT

d
b
K
e
(
Copt
T
F

d
(
Copt
T
F
[
2n1
X
h
7
Note that the deletion algorithm ensures that d
(
2n1
X
hbecause this algorithm deletes
always a level in T
F
whose nodes have minimum degree.
This completes the proof of Theorem 2.14.
2.3.3 Complete networks
Some massively parallel computers, e.g., Cray T3E and T3D or Intel Paragon, have
a network with very high bandwidth, so that not the network is the bottleneck but
the individual memory modules. In particular, remote accesses to these modules are
expensive, as local accesses are supported by additional local caches. These systems
are well modeled by a complete network Gnwith nnodes. We aim to minimize the
congestion at the nodes due to remote accesses, that is, remote accesses increase the
load at any node whereas local accesses are free. Each node has bandwidth 1. Thus,
the relative and absolute load of a node are identical.
The access tree strategy uses a locality preserving embedding of access trees. It
is based on a hierarchical decomposition of Gn, which we describe recursively. If Gn
40 Chapter 2. Caching without Memory Capacity Constraints
has size one, i.e., n
1, then we have reached the end of the recursion. Otherwise,
we partition Gninto 2 non-overlapping complete networks G
k
n
X
2
l
and G
W
n
X
2
Y
. These
complete networks are then decomposed recursively according to the same rules.
The associated decomposition tree T
Gn
is a binary tree of height O
logn
. Note
that the bandwidth of a node vin T
Gn
is the number of the nodes in Gn
v
. As for
fat-trees, we need a decomposition tree of arbitrary height 1
(
h
(
logn. Thus, some
levels of T
Gn
have to be deleted possibly. This is done with the deletion algorithm.
Now, the decomposition tree T
Gn
has height hand degree O
n1
X
h
. Note that
the root of T
Gn
corresponds still to Gnitself, the children of a node vin the tree
correspond to the complete networks into which the complete network corresponding
to vis divided, and the leaves correspond still to complete networks of size one, i.e., to
the nodes of Gn.
The remaining description of our caching strategy is analogous to the caching strat-
egy for meshes in Section 2.3.1. Note that all messages that should be sent between
neighboring nodes in the access trees are sent along the edge between the associated
nodes in the complete network.
The following theorem shows that we obtain, for h
1, an access tree strategy for
complete networks with nnodes that is O
1
-competitive, w.h.p., with respect to the
congestion at the nodes. Further, we obtain, for h
logn, an access tree strategy that
is strictly O
logn
-competitive, w.h.p., with respect to the congestion at the nodes. To
yield the latter result we have to relate the optimalcongestionCopt
Gn
and min
,
D
κ
.
.
Either an object xis migrated at least once or each request message to xis directed to
the node where the initial copy of xis placed. Thus,Copt
Gn
*
min
,
D
κ
.
.
Theorem 2.17 Consider an application on the complete network Gnwith n nodes. For
each 1
(
h
(
logn, we obtain an access tree strategy with congestion O
h
Copt
Gn

min
,
n2
X
h
D
κ
.8
n1
X
h
logn
at the nodes, w.h.p., where Copt
Gn
denotes the optimal
congestion at the nodes for the application, and κdenotes the maximum number of
write requests directed to the same object.
Proof. Inorder to provethe aboveresultweapplyLemma2.9which holds analogously
for the congestion at the nodes. Fix an 1
(
h
(
logn. Define Copt
T
Gn

to be the
optimal congestion at the nodes for the application when it is executed on the tree
T
Gn
, under the assumption that each node of Gnis simulated by its counterpart in
T
Gn
, which is one of the leaf nodes. Note that T
Gn
is used only as a tool in the
proof and that we actually use a separate tree for each data object.
To apply effectively Lemma 2.9, we have to show that the decomposition tree
T
Gn
is ΨGn-useful for a certain ΨGn, i.e., we have to prove the following two condi-
tions.
Copt
T
Gn
)(
Copt
Gn
.
The maximum expected relative load at the nodes in Gndue to access messages
is O
h
Copt
T
Gn

.
2.3. Applications of the access tree strategy 41
In addition, note that
T
Gn
+
deg
T
Gn
+
O
n1
X
h
. Then, Theorem 2.17 can be
yield with Lemma 2.9.
The next lemma shows the lower bound for the optimal strategy.
Lemma 2.18 Copt
T
Gn
)(
Copt
Gn
.
Proof. For a given strategy on Gnwith congestion Cat the nodes we have to describe
a strategy on T
Gn
with congestion at mostCat the nodes.
We simulate the strategy for Gnon T
Gn
. For the routing paths in T
Gn
we use
the unique shortest paths between the respective nodes, that is, whenever a message is
to be routed between two nodes in Gn, we instead route the same message along the
unique shortest path in the tree between the leaf nodes corresponding to these nodes.
LetC
I
denotethecongestionatthenodesforagivenapplicationwiththe abovestrategy
on T
Gn
. Let vdenote a node of T
Gn
with relative load C
I
. Then the absolute load
of visC
Ih
b
v
.
Now consider the same application on Gn. Any message that crosses vin T
Gn
has either to leave or to enter the complete network Gn
v
. The number of nodes in
Gn
v
is b
v
. Thus, the load on one of these nodes is at least C
Ih
b
e

b
e
m
C
I
, and
hence, C
C
I
.
The following lemma gives the upper bound on the expected load of the access tree
strategy.
Lemma 2.19 The maximum expected relative load at the nodes in Gndue to access
messages is O
h
Copt
T
Gn

.
Proof. Fix an arbitrary node vin Gnand a level
in T
Gn
. Let L
K
v
denote the load
on vdue to the simulation of access messages passing nodes on level
of T
Gn
. Let
vTbe a node of T
Gn
on level
such that Gn
vT
includes the node v. If such a node
does not exist then E
L
L
K
v
iM
0. We show that E
L
L
K
v
iM
O
Copt
T
Gn

, which
yields the lemma.
First, we give a bound for the probability P
v
that the host of vTis mapped onto v.
Let
Gn
vT
denote the number of processors in Gn
vT
. As vTis mapped uniformly
at random to one of the nodes in Gn
v
,
P
v
)(
1
Gn
vT
7
Next, we relate this probability to the bandwidth b
vT
of vT. This bandwidth is
defined to be the number of the nodes in Gn
vT
. As a consequence,
b
vT
H
Gn
vT
(
1
P
v
7
The maximum load of vTis Copt
T
Gn

b
eT

. Consequently, the expected load
on node vfor simulating vTis at most
Copt
T
Gn

b
vT
[
P
v
8(
Copt
T
Gn

7
Hence, E
L
L
K
v
GM
O
Copt
T
Gn

, which yields the lemma.
42 Chapter 2. Caching without Memory Capacity Constraints
This completes the proof of Theorem 2.17.
2.3.4 The universal caching strategy
In this section, we present the universal caching strategy working on arbitrary net-
works for which an oblivious routing strategy is given. A routing strategy is called
oblivious if the path traveled by each packet depends only on the origin and destina-
tion of the packet and not on the origins and destinations of the other packets nor on
congestion encountered during the routing. Note that each oblivious routing strategy
is also an on-line routing strategy. Finally, we apply the universal caching strategy to
several classes of networks.
Suppose we are given an arbitrary networkGwith nnodes and an oblivious routing
strategy for G. Each edge is assumed to have bandwidth 1. Thus, the relative and abso-
lute load of an edge are identical. The universal caching strategy for Gis a simulation
of the access tree strategy for the complete network Gnon G. Note that both Gand Gn
have nnodes such that for each node in Ga counterpart in Gncan be fixed. All mes-
sages that are sent between adjacent nodes in Gnare sent along the paths, determined
by the oblivious routing strategy, between the associated nodes in G.
The following theorem shows that, for a network Gin which a permutation can be
routed obliviously with maximum expected load Lπ
G
, the universal caching strat-
egy is O
Lπ
G
+
deg
G

-competitive, w.h.p., and strictly O
Lπ
G
+
deg
G
/
logn
-
competitive, w.h.p. To yield the latter result we have to relate the optimal congestion
Copt
M
and min
,
D
κ
.
. Either an object xis migrated at least once or each request
message to xis directed to the node where the initial copy of xis placed. As the nodes
in Ghave maximum degree deg
G
, it follows deg
G
[
Copt
G
*
min
,
D
κ
.
.
Theorem 2.20 Consider an application on a network G with n nodes in which a per-
mutation can be routed obliviously with maximum expected load Lπ
G
. For each
1
(
h
(
logn, we obtain a caching strategy with congestion O
Lπ
G
/
deg
G
/
h
Copt
G

min
,
n2
X
h
D
κ
.m
n1
X
h
logn
, w.h.p., where deg
G
denotes the degree of G,
Copt
G
denotes the optimal congestion for the application, and κdenotes the maxi-
mum number of write requests directed to the same object.
Proof. In order to prove the above result we require a lower bound on the congestion
of an optimal strategy and an upper bound on the congestion of the universal caching
strategy. Define Copt
Gn
to be the optimal congestion at the nodes for the application
when it is executed on the complete network Gn, under the assumption that each node
of Gis simulated by its counterpart in Gn. Fix an 1
(
h
(
logn. Define T
Gn
to be the
decomposition tree of Gnwith height hand degree O
n1
X
h
, and defineCopt
T
Gn

to
be the optimal congestion at the nodes for the application when it is executed on the
tree T
Gn
, under the assumption that each node of Gnis simulated by its counterpart
in T
Gn
, which is one of the leaf nodes.
2.3. Applications of the access tree strategy 43
We give a lower bound that relates Copt
G
to Copt
T
Gn

, and an upper bound
that relates the congestion of the universal caching strategy to Copt
T
Gn

. We start
with the lower bound.
Lemma 2.21 Copt
T
Gn
)(
deg
G
[
Copt
G
.
Proof. Lemma 2.18 shows that Copt
T
Gn
<(
Copt
Gn
. Hence, it remains to prove
Copt
Gn
*(
deg
G
[
Copt
G
.
For a given strategy on Gwith congestion Cwe have to describe a strategy on Gn
with congestion at most deg
G
C
Cat the nodes. We simulate the strategy for Gon Gn,
except for the routing. Instead, for a routing path in G, we use the edge between the
respective nodes in Gn.
Each edge incident on a node vGin Ghas at most load C, since each edge in Ghas
bandwidth 1. Thus, the node vGnin Gnbeing the counterpart of vGin Ghas at most
relative load deg
G

C, since each node in Gnhas bandwidth 1.
The following lemma gives an upper bound on the congestion of the universal
caching strategy.
Lemma 2.22 The maximum expected relative load in G due to access messages is
O
Lπ
G

h
Copt
T
Gn

.
Proof. Lemma 2.19 shows that the maximum expected relative load at the nodes in
Gndue to access messages is O
h
Copt
T
Gn

. Hence, it remains to prove that,
for a given strategy with maximum expected relative load Lat the nodes in Gn, the
simulation of this strategy has at most maximum expected relative load O
Lπ
G
N
L
at the edges of G.
Recall that, in the simulation of the strategy for Gnon G, all messages that are
sent between adjacent nodes in Gnare sent along the paths, determined by the oblivi-
ous routing strategy, between the associated nodes in G. Obviously, all these routing
requests can be partitioned into Lpartial permutations. Since a permutation can be
routed in Gwith maximum expected (relative) load Lπ
G
, the simulation has at most
maximum expected relative load O
Lπ
G
[
L
. Note that the routing strategy is obliv-
ious, i.e., the path traveled by each packet depends only on the origin and destination
of the packet.
Applying Lemma 2.21 and Lemma 2.22yields that the maximum expected relative
load in Gdue to access messages is O
Lπ
G

deg
G

h
Copt
G

.
Finally, let L
e
denote the load on an arbitrary edge eof Gdue to all kind of
messages. Analogously to the proof of Lemma 2.9, we can conclude that E
L
L
e
GM[
O
Lπ
G

deg
G
[
h
Copt
G

. In order to complete the proof of the lemma, we have
to show that the maximum load over all edges does not deviate too much from the
expected load of an arbitrary edge.
Analogously to the proof of Lemma2.9, we decompose L
e
into deg
T
Gn

parts
such that L
E
H
deg
9
T
9
Gn
;_;
i
D
1Li, and each Liis a sum of independent random variables
44 Chapter 2. Caching without Memory Capacity Constraints
of maximum weight W
O
min
,
K
κ
.e
with K
Θ
n2
X
h
D
and κdenoting the
maximum number of write requests directed to the same object. Applying a Chernoff
bound (see, e.g., [HR90]) to the sums yields that Lj
O
E
Lj

W
logn
, w.h.p.,
and therefore,
L
e
c
deg
9
T
9
Gn
;P;
i
D
1Li
O
E
L
'
deg
T
Gn
[
W
logn
O
Lπ
G
[
deg
G
[
h
Copt
M

min
,
n2
X
h
D
κ
.^
n1
X
h
logn
4
w.h.p. This completes the proof of Lemma 2.20.
Cayley networks
Meyer auf der Heide and V¨ocking have presented in [MV95] an oblivious routing
strategy for Cayley networks that achieves maximum expected load O
p
diam
G

for
routing ppermutations in a Cayley network Gwith nnodes, where diam
G
denotes
the diameter of G. Cayley networks are defined as follows. Let Γbe a finite algebraic
group withidentity 1, and suppose Σis a setgenerator of Γwith 1
Σ. Then the Cayley
network GΓ
:
Σ
V
E
is defined byV
Γand E
n,
a
b
a
1b
Σ
.
. Note that many
standard networks belong to this class, e.g., tori, cube-connected-cycles, and wrapped
butterfly networks. Then, the following corollary can be concluded with Theorem
2.20. This corollary shows that, for a Cayley network Gwith nnodes, the universal
caching strategy is O
diam
G
[
deg
G

-competitive, w.h.p., and strictly O
diam
G
[
deg
G

logn
-competitive, w.h.p.
Corollary 2.23 Consider an application on a Cayley network G with n nodes. For
each 1
(
h
(
logn, we obtain a caching strategy with congestion O
diam
G
'
deg
G
'
h
Copt
G

min
,
n2
X
h
D
κ
.8
n1
X
h
logn
, w.h.p., where Copt
G
denotes the optimal
congestion for the application, and κdenotes the maximum number of write requests
directed to the same object.
Edge symmetric networks
Further, Meyer auf der Heide and V¨ocking have given in [MV99] an oblivious rout-
ing strategy for edge symmetric networks achieving maximum expected load O
p
diam
G

deg
G

for routing ppermutations in an edge symmetric network Gwith
nnodes. Note that, e.g., all equal-sided tori belong to this class. Then, the following
corollary can be concluded with Theorem 2.20. This corollary shows that, for an edge
symmetric network Gwith nnodes, the universal caching strategy is O
diam
G

-
competitive, w.h.p., and strictly O
diam
G
[
logn
-competitive, w.h.p.
Corollary 2.24 Consider an application on an edge symmetric network G with
n nodes. For each 1
(
h
(
logn, we obtain a caching strategy with congestion
2.3. Applications of the access tree strategy 45
O
diam
G

h
Copt
G
'
min
,
n2
X
h
D
κ
.8
n1
X
h
logn
, w.h.p., where Copt
G
denotes
the optimal congestion for the application, and κdenotes the maximum number of
write requests directed to the same object.
Hypercubic networks
In addition, Meyer auf der Heide and V¨ocking have introduced in [MV99] an obliv-
ious routing strategy for de Bruijn networks that achieves maximum expected load
O
p
logn
for routing ppermutations in the de Bruijn network with nnodes. It is
obvious that a bi-simulation between a de Bruijn network and the shuffle-exchange
network of the same size can be done with congestion 2. Then, the following corol-
lary can be concluded with Theorem 2.20. This corollary shows that, for the wrapped
butterfly, cube-connected-cycles, hypercube, de Bruijn, and shuffle-exchange network
withnnodes, the universal caching strategyis O
logn
-competitive,w.h.p., and strictly
O

logn
2
-competitive, w.h.p.
Corollary 2.25 Consider an application on the wrapped butterfly, cube-connected-
cycles, hypercube, de Bruijn, or shuffle-exchange network G with n nodes. For each
1
(
h
(
logn, we obtain a caching strategy with congestion O
logn
h
Copt
G

min
,
n2
X
h
D
κ
.)
n1
X
h
logn
, w.h.p., whereCopt
G
denotes the optimal congestion for
the application, and κdenotes the maximum number of write requests directed to the
same object.
Leighton has presented in [Lei92] an oblivious routing strategy for indirect butter-
fly networks that achieves maximum expected load O
p
for routing ppermutations
in the indirect butterfly network with nnodes. Hence, the following corollary can be
concluded with Theorem 2.20. This corollary shows that, for the indirect butterfly
network with nnodes, the universal caching strategy is O
1
-competitive, w.h.p., and
strictly O
logn
-competitive, w.h.p.
Corollary 2.26 Consider an application on the indirect butterfly network G with n
nodes. For each 1
(
h
(
logn, we obtain a caching strategy with congestion O
h
Copt
G
N
min
,
n2
X
h
D
κ
.
n1
X
h
logn
, w.h.p., where Copt
G
denotes the optimal
congestion for the application, and κdenotes the maximum number of write requests
directed to the same object.
2.3.5 Clustered Networks
AclusterednetworkG
V
E
is a network thatconsists of several smallsubnetworks,
i.e., clusters, that are arranged hierarchically. The cluster-tree T
G
describes this
hierarchical structure. The internal nodes of T
G
correspond to the clusters of G, and
the leaves correspond to the terminal user nodes. In the following, we interpret the
terminal user nodes as clusters of size 1. Any two clusters that are connected by one
46 Chapter 2. Caching without Memory Capacity Constraints
op
qr
st
uv
wx
yz
{|
}~
¡¢
£¤
¥¦
§¨
©ª
«¬
®
¯°
±²
³´
µ
·¸
¹º
»¼
½¾
¿À
Á ÁÂ
à ÃÄ
ÅÆ
ÇÈ
ÉÊ
ËÌ
ÍÎ
ÏÐ
ÑÒ
ÓÔ
ÕÖ
ר
ÙÚ
ÛÜ
ÝÞ
ßà
áâ
ãä
åæ
ç ç
è è
éê
ë ë
ì ì
ííí
ííí
ííí
ííí
îîî
îîî
îîî
îîî
ïïïï
ïïïï
ððð
ððð
ñ ñ
ñ ñ
ò ò
ò ò
óóó
óóó
ôôô
ôôô
õ
õö
ö
÷ ÷
ø ø
ù ù
ú ú
û û
û ûü
ü
ýþ
ÿÿÿ













!!!
!!!
" "
" "
# #
# #$
$
%%%%%
&&&&
'''
'''
(((
(((
)))
)))
***
***
+++++
+++++
,,,,
,,,,
- -
- -
- -
.
.
.
////
////
000
000
12
3 34
5 5
5 5
6 6
6 6
7 7
7 7
7 7
8 8
8 8
8 8
9 9
9 9:
:
; ;
; ;<
<
===
===
===
> >
> >
> >
Regional Network
Regional Network
Backbone Network
Regional Network
Figure 2.5: The topology of a wide area NOW.
or more edges in Gare also connected by an edge in T
?
G
@
. The bandwidth of an edge
in T
?
G
@
is the sum of the bandwidths of the corresponding edges in G.
For instance, networks of workstations (NOWs) are usually organized as clustered
networks. Figure 2.5 depicts a possible topology for a wide-area NOW. Note that our
definition of clustered networks is slightly more general than the depicted network as
it allows that user terminal nodes are attached also to the internal clusters.
The nodes inside the clusters are connected in an arbitrary fashion. However, we
assume that communication between nodes in the same cluster is less expensive than
communication between nodes in different clusters, which is just the basic idea behind
any kind of clustering. This property can be formalized as follows. Consider a cluster
K. Let VKdenote the set of nodes in the cluster. Define the weight w
?
v
@
of a node
v
A
VKto be the sum of the bandwidths of the edges incident on vthat leave K, and the
weight w
?
U
@
of U
B
VKby w
?
U
@DC
v
E
Uw
?
v
@
. A cut U for U
B
VKis a partition of
Kinto two subgraphs induced by Uand VK
F
U. The capacity of a cut Uis the sum of
the bandwidths of the edges connecting the nodes in Uwith the nodes in VK
F
U. We
define the cross flux by
α
?
K
@GC
min
U
H
VK
capacity
?
U
@JI
w
?
VK
@
w
?
U
@KI
w
?
VK
F
U
@
L
min
U
H
VK
capacity
?
U
@
min
M
w
?
U
@ON
w
?
VK
F
U
@OPRQ
We assume that α
?
K
@SC
?
1
@
, for each cluster K.
The following example illustrates that the cross flux is a good measure for the
bandwidth bottleneck of a cluster. We define the fully loaded random routing problem
2.3. Applications of the access tree strategy 47
as follows. Suppose b
?
e
@UT
2 messages arrive along each edge ethat connects cluster K
with another cluster, where b
?
e
@
denotes the bandwidth of e. For simplicity, we assume
thatb
?
e
@UT
2 isan integer. For each ofthe arrivingmessages, we choose a departure edge
eat random such that the message leaves the cluster along edge ewith probability
b
?
e
@VT
w
?
Vk
@
. Note that the expected number of messages arriving or leaving the cluster
via an edge eis b
?
v
@
, which corresponds to the maximum number of messages that can
pass the edge ein one time step.
Define the minimum cut ratio S of the fully loaded random routing problem to
be the minimum, over all cuts U, of the capacity of the cut divided by the expected
number of messages that have to pass the cut. If Sis smaller than 1 than there is at
least one cutUsuch that the expected number of messages cannot pass this cut in one
time step. The cross flux α
?
K
@
is equal to the minimum cut ratio Sof the fully loaded
random routing problem. Therefore, if α
?
K
@XW
1 then there is a bandwidth bottleneck
inside the cluster K.
We consider a distributed application or network computation in which the user
terminal nodes issue read and write requests. The shared objects can be placed on any
node of the network, but only the terminal user nodes are allowed to issue requests to
the shared objects. Network computations in which also the internal nodes issue re-
quests to the shared objects can be modeled by attaching a user terminal node to each
of the internal nodes. The two nodes are connected by an edge, which virtual band-
width corresponds to the maximum injection rate for requests issued by the original
node.
The access tree strategy for clustered networks works as follows. For each object
x
A
X, define the access tree Tx
?
G
@
to be a copy of the cluster-tree T
?
G
@
. Each interior
node iof Tx
?
G
@
is mapped at random to one of the nodes in the associated cluster K. In
particular, iis mapped with probability w
?
v
@UT
w
?
VK
@
onto node v
A
VK. On each access
tree we execute the tree strategy.
The caching strategy requires a cluster initialization for the selection of the routing
paths inside the clusters. For this path selection, we solve a multicommodity flow
problem for each cluster K. The multicommodity flow problem is solved locally
for each cluster K, e.g., with the randomized approximation scheme introduced by
Leighton et al. in [LMP
Y
95]. Let n
?
K
@
denote the number of nodes in the cluster and
m
?
K
@
the number of edges connecting these nodes. Then the initialization takes time
O
?
n
?
K
@
3
I
m
?
K
@ZI
log4n
?
K
@V@
, for each cluster K. Alternatively, the multicommodity
flow problem can be calculated deterministically and exactly by an algorithm based
on linear programming, which also can be done in time polynomial in the size of the
clusters. Note that, independent from the number of shared objects, the initialization
of a cluster needs to be done only once. A detailed description of the solution to the
routing problem is given in the proof for the following theorem.
Theorem 2.27 Consider an application running on the terminal user nodes of a clus-
tered network G of size n. Suppose the cross flux of each cluster is in
?
1
@
. Let
δdenote the maximum number of nodes in a cluster that are adjacent to nodes in
48 Chapter 2. Caching without Memory Capacity Constraints
other clusters, and let γdenote the maximum degree in the cluster tree T
?
G
@
, i.e., γ
is the maximum number of tree nodes adjacent to any single tree node. Then the ac-
cess tree strategy achieves congestion O
?
logδ
I
Copt
?
G
@J[
γ
I
κ
I
logn
@
, w.h.p., where
Copt
?
G
@
denotes the optimal congestion for the application, and κdenotes the maxi-
mum number of write requests directed to the same object. If the clusters in G can be
represented by planar graphs or by constant genus graphs, then this result improves to
O
?
Copt
?
G
@\[
γ
I
κ
I
logn
@
, w.h.p.
Proof. Consider a cluster K. Let VKdenote the set of nodes in the cluster. Let α
?
K
@
denote the cross flux of K. Let δ
?
K
@
denote the number of nodes in Kthat are adjacent
tonodesin other clusters. Let γ
?
K
@
denote thedegree ofKin thecluster-tree. We define
the congestion of Kto be the maximum relative load of all edges adjacent to nodes in K
includingedges connecting nodes in Kwith nodes inother clusters. LetCopt
?
K
@
denote
the optimal congestion of K. We show, for cluster K, that the access tree strategy
achieves congestion O
?
logδ
?
K
@]I
Copt
?
K
@UT
max
M
1
N
α
?
K
@^P_[
γ
?
K
@KI
κ
I
logn
@
, w.h.p. If K
is planar or of constant genus, then this result improves to O
?
Copt
?
K
@UT
max
M
1
N
α
?
K
@^P`[
γ
?
K
@`I
κ
I
logn
@
, w.h.p. This yields the above theorem. Furthermore, it shows that the
influence of a “bad” cross flux is only small and local to the respective cluster.
In the following, we describe how the routing paths that simulate the edges of
the access trees are selected. According to this description, any message that is sent
between two nodes simulating adjacent access tree nodes chooses a routing path at
random. We assume that the path selection for each message is independent from the
selection of other messages.
First, we fix the course of the routing paths along the edges that connect different
clusters. Suppose K
a
is a cluster neighboring to K. Let uand u
a
be the cluster-tree
nodes that represent the access tree nodes corresponding to Kand K
a
, respectively. Let
eTbe the edge that connects the two clusters in the cluster-tree T
?
G
@
, and let e1
N
QVQUQ
N
ek
denote the corresponding edges that connect Kand K
a
in G. We send a message that
traverses eTin T
?
G
@
along edge eiwith probability p
?
ei
@DC
b
?
ei
@UT
b
?
eT
@
, where b
?
ei
@
is the bandwidth of edge eiin G, and b
?
eT
@bC
k
i
c
1b
?
ei
@
is the bandwidth of the edge
eTin T
?
G
@
. The following lemma gives an upper bound on the expected relative load
on the edges between different clusters.
Lemma 2.28 Let e be an edge of G that connects a node in cluster K with a node in
one of the neighboring clusters. Then the expected load on e is O
?
Copt
?
K
@U@
.
Proof. Let eTdenote the edge in T
?
G
@
that connects the nodes representing the two
adjacent clusters. Our strategy executes the strictly 3-competitive tree strategy on the
access trees. Thus, the relative load on eTis O
?
Copt
?
K
@V@
, and hence the absolute load
is O
?
Copt
?
K
@SI
b
?
eT
@U@
. As a consequence, the expected absolute load for edge eis
O
?
Copt
?
K
@KI
b
?
eT
@JI
p
?
e
@U@SC
O
?
Copt
?
K
@KI
b
?
e
@U@
. Therefore, the expected relative load on
eis at most O
?
Copt
?
K
@U@
.
Next we describe the path selection strategy for the routing inside the cluster K.
Consider the following multicommodity flow problem. Let V
d
K
CeM
v1
N
QUQUQ
N
vδ
f
K
g
PhB
VK
2.3. Applications of the access tree strategy 49
denote the set of nodes that are adjacent to nodes outside the cluster. We define δ
?
K
@
2
commodities
i
i
j
jwith 1
k
i
N
j
k
δ
?
K
@
. The source of commodity
i
i
j
jis vi, its sink is
vj, and its demand is w
?
vi
@`I
w
?
vj
@UT
w
?
V
d
K
@
. We solve the commodity flow problem on
Kwhile respecting the capacities, i.e., the bandwidths of the edges in K.
We are going to use the solution to the multicommodity flow problem to help guide
us in selecting the routingpaths inside the clusters. For a solutionof the multicommod-
ity problem, define the throughput fraction q as the minimum, over all commodities,
of the fraction of the commodity’s demand that is met, that is, for each commodity
i
i
j
j,
there is a flow of size q
I
w
?
vi
@JI
w
?
vj
@VT
w
?
V
d
K
@
from vito vj. An optimal solution maxi-
mizes the value of q. The following lemma relates the result of an optimal solution to
the cross flux α
?
K
@
.
Lemma 2.29 In the general case, there is a solution to the multicommodity with q
C
?
α
?
K
@lT
logδ
?
K
@V@
. If K is a planar graph or a constant genus graph then there is a
solution with q
C
?
α
?
K
@l@
.
Proof. First we consider the general case. Define the minimum cut ratio of a multi-
commodity flow problem to be the minimum, over all cuts, of the capacity of the cut
divided by the flow demand across the cut. In [AR98] it is shown that any multicom-
modity flow problem can be satisfied up to a factor q
C
?
S
T
logk
@
with Sdenoting
the minimum cut ratio and kdenoting the number of commodities. The minimum cut
ratio of our multicommodity flow problem is α
?
K
@VT
2. Note that the “load” induced
by the multicommodity flow problem on an edge is twice the load of the fully loaded
random routing problem, which has minimum cut ratio α
?
K
@
. As the number of com-
modities in our problem is δ
?
K
@
2, the maximum flow can be satisfied up to a factor
q
C
?
α
?
K
@UT
log
?
δ
?
K
@U@V@
, which yields the result for general networks.
A better bound is known for the optimal throughput fraction of uniform multicom-
modity flow problems on planar graphs or graphs of constant genus. In a uniform
multicommodity flow problem there is a unit-demand commodity between every pair
of nodes. In this case, the optimal throughput fraction is shown to be in
?
S
am@
, where
S
a
denotes the cut ratio of the uniform problem (see [KPR93]). We transform our
multicommodity flow problem into a uniform problem such that the cut ratio S
a
of
the uniform problem is in
?
S
@
and the optimal throughput fraction q
a
for the uni-
form problem is in O
?
q
@
, where Sand qdenote the cut ratio and the optimal through-
put fraction of the original problem, respectively. This yields the desired result as
q
C
?
q
a@_C
?
S
a@bC
?
S
@
.
Interestingly, the bound on the optimal throughput fraction for uniform multicom-
modity problems is independent from the “size” of the respective problem. Therefore,
we can arbitrarily blow up our multicommodity flow problem in the transformation
without decreasing the optimal throughput fraction. Note that we do not aim to solve
the resulting multicommodity flow problem but only aim to show the existence of a
“good” throughput fraction.
A first transformation increases all weights such that each node, including the
nodes not in V
d
K, have an integral weight not smaller than 1. This is done as follows.
50 Chapter 2. Caching without Memory Capacity Constraints
We multiply all weights and capacities with a suitable large factor F
?
eT
@
such that
each of the weights of the nodes in V
d
Kand each of the capacities of the edges are not
smaller than n2, where ndenotes the number of nodes in the cluster. Afterwards, the
weights of all nodes are rounded up to the next integer, and the weights of all nodes
with weight 0 in the original problem, i.e., the nodes not in V
d
K, are defined to have
weight 1 in the new problem. Let w
an?
v
@
denote the new weight of a node. We define
n2commodities
ioap?
u
N
v
@
with u
N
v
A
VT. The source of commodity
iqar?
u
N
v
@
is u, its sink
is v, and its demand is w
a?
u
@KI
w
a?
v
@UT
WwithW
C
w
a?
VK
@_C
v
E
VKw
a?
v
@
.
Let S
a
denote the minimum cut ratio and q
a
the optimal throughput fraction of the
multicommodity flow problem after the transformation. As all weights and capacities
are multiplied with the same factor F
?
eT
@
and the sum of the added demands is smaller
than n, which is a fraction of 1
T
nof the smallest capacity, S
aC
Θ
?
S
@
and q
aC
Θ
?
q
@
.
A second transformation yields a uniform problem in which each commodity has
demand 1
T
W. For each node vwith w
a?
v
@Xs
1, we add w
a?
v
@Jt
1 nodes, each of which
is connected by an edge of unbounded capacity with v. Next, the weights of all nodes
are defined to be 1. The resulting graph is still planar or of constant genus and has W
nodes. We define W2commodities, one for each tuple of nodes. The demand of each
commodity is defined to be 1
T
W.
The multicommodity problem derived by the second transformation corresponds
to the previous problem in the following way: Consider two nodes uand vfrom the
original cluster graph. uand vare represented by a set of nodes V
?
u
@
and V
?
v
@
in the
new graph. The demand of the commodities that should be routed from V
?
u
@
to V
?
v
@
is
u
uvE
V
f
u
g
v
uvE
V
f
v
g
1
W
Cxw
V
?
u
@
w
I
w
V
?
v
@
w
W
C
w
a?
u
@JI
w
a?
v
@
W
N
which corresponds to the demand that should be routed between uand vin the previous
problem.
Therefore, the second transformation does not change the minimum cut ratio nor
the optimal throughput fraction. Hence, we have shown that the original problem can
be transformed into a uniform problem with minimum cut ratio S
aC
Θ
?
S
@
and optimal
throughput fraction q
aC
Θ
?
q
@
, which completes the proof of Lemma 2.29.
In the cluster initialization phase, we solve the maximum flow problem for the
cluster K. For each node u, for each edge eincident on u, and each commodity
i
i
j
j,
let f
?
u
N
e
N
i
N
j
@
denote the size of the flow of commodity
i
i
j
jthat leaves node uacross
edge eaccording to the solution of the multicommodity flow problem. As a result of
the initialization, each node uholds a table including its respective f
?
u
N
e
N
i
N
j
@
values.
We use the solution of the multicommodity flow problem to choose the routing
paths in the cluster. Note that none of the request messages has its source and its
destination in the same cluster because messages are routed only between nodes that
simulate neighboring access tree nodes, and each cluster hosts at most one node of
each access tree. The hosts of the access tree nodes are chosen randomly from V
d
K
C
M
v1
N
QUQlQ
N
vδ
f
K
g
P
, which is theset of nodes adjacent tonodes in otherclusters. Hence, any
2.3. Applications of the access tree strategy 51
message coming from outside of Karrives at a node from V
d
Kand has to be forwarded
to a node also from V
d
K. Also a message that leaves the cluster starts at a node from
V
d
Kand has to be routed through the cluster to a node from V
d
K, i.e., to that node which
is incident to the edge along which the message aims to leave the cluster. Hence, we
have to describe only how to choose the routing paths between pairs of nodes fromV
d
K.
Consider a message that traverses or starts at a node u
A
VK. Let vi
A
V
d
Kand vj
A
V
d
K
F
M
u
P
denote the local source and the local destination of the message, respectively.
Let E
?
u
@
denote the set of edges that are incident to uand do not leave the cluster.
Then uchooses an edge from E
?
u
@
at random according to the following distribution:
eis selected with probability
f
?
u
N
e
N
i
N
j
@
e
u
E
E
f
u
g
f
?
u
N
e
a
N
i
N
j
@Q
The message is forwarded along the randomly selected edge. Intuitively, the message
follows the suggestions of the multicommodity flow.
The following lemma relates the expected relative load of the edges inside cluster
Kto the optimal congestionCopt
?
K
@
, which is defined to be the maximum relative load
over all edges incident on the nodes inVK, and hence includes also the relative load on
edges that leave the cluster. Note that the bound on the expected relative load given in
the lemma becomes o
?
Copt
?
K
@U@
if qis in ω
?
1
@
, which means that the bandwidth inside
the cluster is much higher than the bandwidth of the external links. In this case the
congestion is dominated by the load on the edges connecting different clusters.
Lemma 2.30 For each edge connecting two nodes of the cluster, the expected relative
load on e is at most O
?
Copt
?
K
@UT
q
@
, where q denotes the throughput fraction computed
by the multicommodity flow program.
Proof. The only difference between arriving and leaving messages is the direction
in which the messages travel along the randomly selected routing paths. As this is
irrelevant for the expected load on any edge, we only consider arriving messages.
For a data object x, let v
?
x
@yA
V
d
Kdenote the node that simulates the access tree
node of x. Any message for xthat arrives at cluster Kis directed to node v
?
x
@
. Recall
that v
?
x
@
is chosen randomly fromV
d
K
CzM
v1
N
QUQUQ
N
vδ
f
K
g
P
. The probability that v
?
x
@{C
vj,
for 1
k
j
k
δ
?
K
@
, is w
?
vj
@UT
w
?
VK
@_C
w
?
vj
@UT
w
?
V
d
K
@
.
Suppose each node viinjects an expected number of q
I
w
?
vi
@
messages into the
cluster and each message is directed to the host v
?
x
@
of some data object x. Then the
expected number of messages sent from node vito node vjis q
I
w
?
vi
@`I
w
?
vj
@VT
w
?
V
d
K
@
,
which corresponds to the amount of flow sent from vito vjin the solution for the
multicommodity flow problem. The random path selection inside the cluster is guided
by the solution for the multicommodity flow in such a way that the expected number
of messages traversing an edge is equivalent to the amount of flow passing the edge.
Hence, in our example, the expected load on an edge eis not larger than the capacity
of the edge, which is defined to be b
?
e
@
.
52 Chapter 2. Caching without Memory Capacity Constraints
Now, consider the messages arriving during the execution of the application. By
Lemma 2.28 the expected number of messages that arrive via an external edge e
a
con-
necting a node vifrom the cluster Kwith a node from a neighboring cluster K
a
is
O
?
Copt
?
K
@KI
b
?
e
a@U@
. As w
?
vi
@
is equivalent to the sum of the bandwidths of the external
edges e
a
incident on vi, the expected number of messages injected via node viinto the
cluster is O
?
Copt
?
K
@ZI
w
?
vi
@V@
. As a consequence, the expected relative load on each
edge is at most O
?
Copt
?
K
@VT
q
@
.
Combining the results of the Lemmata 2.28, 2.29, and 2.30 yields that the expected
load for each edge in Kis O
?
logδ
?
K
@|I
Copt
?
K
@UT
min
M
1
N
α
?
K
@UP}@
in the general case, and
O
?
Copt
?
K
@VT
min
M
1
N
α
?
K
@^P}@
, if Kis planar or of constant genus. In order to complete
the proof of Theorem 2.27, we have to show that the maximum relative load over all
edges in Kdoes not deviate too much from the expected load.
Consider edge eof the cluster K. Let L
?
e
@
denote the load on e. Then we have
to show that L
?
e
@XC
O
?
E
~
L
?
e
@lq[
γ
?
K
@`I
κ
I
logn
@
, w.h.p. Let Lx
?
e
@
denote the load on
edue to the accesses to data object x. Our tree strategy ensures that the load on each
edge of the access tree due to request messages for xis O
?
κ
@
. Further, each edge e
is involved in the simulation of at most γ
?
K
@
different access tree edges. Therefore,
Lx
?
e
@DC
O
?
γ
?
K
@KI
κ
@
. This means that Lx
?
e
@
is the sum of O
?
γ
?
K
@{I
κ
@
not necessarily
independent 0-1-random variables, each of which representing the load due to one
request message. We add some dummy variables so that Lx
?
e
@
is the sum of exactly
τ
C
O
?
γ
?
K
@KI
κ
@
random variables A1
?
x
@N
QUQUQ
N
Aτ
?
x
@
, for every x
A
X. Then
L
?
e
@C
x
E
X
τ
i
c
1Ai
?
x
@bC
τ
i
c
1
x
E
XAi
?
x
@
Q
ThevariablesAi
?
x
@
inthesumSi
C
x
E
XAi
?
x
@
are independent, for 1
k
i
k
τ. Applying
a Chernoff bound(see, e.g., [HR90]) to thissumyields thatitsvaluedeviatesbyat most
an additive term of O
?
logn
@
from O
?
E
~
Si
r@
, w.h.p., for 1
k
i
k
τ. Since L
?
e
@ZC
τ
i
c
1Si,
it follows L
?
e
@C
O
?
E
~
L
?
e
@lq[
τ
I
logn
@XC
O
?
E
~
L
?
e
@l[
γ
?
K
@{I
κ
I
logn
@
, w.h.p., which
completes the proof of Theorem 2.27.
2.4 Extending the results to data-race free applications
An important class of applications that allows parallel and overlapping requests is the
class of data-race free applications, which is defined as follows. We assume that an
adversary specifies a parallel application running on the nodes of the network, i.e.,
the adversary initiates read and write requests at the nodes of the network. A write
request for an object is not allowed to overlap with other requests to the same object,
and there is some order among the requests to the same object such that, for each
read and write request, there is a unique least recent write. Note that this still allows
arbitrary concurrent requests to different objects and concurrent read requests to the
same object. An execution using a caching strategy is called consistent if it ensures
2.4. Extending the results to data-race free applications 53
that a read request directed to an object always returns the value of the most recent
write request to the same object.
We have to describe how parallel requests are handled by the presented strategy
such that the execution is consistent. On trees this works as follows. Since we con-
sider onlydata-race free applications, write requestsdo not overlapwith other requests.
Overlapping read requests are handled in the following way. Consider a request mes-
sage marriving on a node uthat does not hold a copy of x. Let edenote the next
edge on the path to the nearest copy. Suppose another request message m
a
directed to
xhas been sent already along ebut a data message has not yet been sent back. Then
the request message mis blocked on node uuntil the data message corresponding to
m
a
passes e. When this message arrives, either a new copy is created on node u, and
userves the request message m, or mcontinues its path to the connected component
of copies. As the other networks execute the tree strategy on access trees that are
embedded in the network, they can follow the same approach.
In data-race free applications, the following problem can occur during the remap-
ping of copies: remapping or notification messages overlap with access messages or
other remapping or notification messages. For example, an access message arrives
at an old host of an access tree node but the tree node has been remapped while the
access message was in transit. We solve this problem as follows. All messages are
acknowledged such that we can ensure that at most two messages are pending between
two access tree nodes. If a message arrives at an abandoned host the sender is in-
formed about this and resends the message to the new host. We define that neither the
acknowledgment messages nor the overlapping messages influence the counters. Ob-
viously, these messages have no influence on our asymptotic bounds as each of them
corresponds to another message that we take into account.
We can conclude that all results given in this chapter hold also for data-race free
applications, which indicates that the introduced strategies are well suited for practical
usage.
Chapter 3
Caching with Memory Capacity
Constraints
In this chapter, we investigate caching strategies for networks with memory capacity
constraints in a non-uniform cost model. The remainder of the chapter is organized as
follows. In Section 3.1, the general framework for the development of caching strate-
gies with memory capacity constraints is introduced. This framework is based on the
access tree strategy. In Section 3.2, a caching strategy for trees with memory capacity
constraints is presented. This strategy is used as a subroutine in the general framework.
In Section 3.3, the general framework is applied to several classes of networks, includ-
ing meshes, fat-trees, complete networks, Cayley networks, edge symmetric networks,
and hypercubic networks. Further, we present the universal caching strategy working
on arbitrary networks for which an oblivious routing strategy is given.
3.1 The general framework
In chapter 2, we have presented the access tree strategy that is a caching strategy ne-
glecting memory capacity constraints. The general framework is an extension of the
access tree strategy to memory capacity constraints.
Basically, the access tree strategy simulates a simple 3-competitive caching strat-
egy for a tree T
?
G
@
on the original network G. This tree T
?
G
@
is a tree whose leaf
nodes correspond to the nodes of the network G. The analysis of the access tree strat-
egy uses a bi-simulation between the network Gand the tree T
?
G
@
. At first, it is shown
that the tree T
?
G
@
can simulate the network G. In this simulation, each leaf node of
T
?
G
@
issues the same read and write requests as its counterpart in G. It is shown that
any application that produces congestion Cwhen it is executed on Gcan be executed
on T
?
G
@
with congestion C, too. At second, it is shown that the network Gcan simu-
late the tree T
?
G
@
in such a way that the congestion in Gis only c3times larger than
the congestion in the tree T
?
G
@
.
56 Chapter 3. Caching with Memory Capacity Constraints
From the results on the congestion of these two simulations we can obtain the
competitive ratio of the access tree strategy. The simulation of Gby T
?
G
@
looses a
factor of at most c1
C
1; the on-line strategy looses a factor of c2
C
3 against the
optimal off-line strategy on T
?
G
@
; finally, the simulation of T
?
G
@
by Glooses a factor
of at most c3. Altogether, we achieve a competitive ratio of c1
I
c2
I
c3
C
O
?
c3
@
.
The problem with the adaption of this approach to the case of memory capacity
constraints is the lack of an efficient caching strategy for trees with memory capacity
constraints. It is also not clear how the access trees should look like. One possibility
is to allow that only the leaf nodes, which correspond to the nodes in G, have memory
modules. Another possibility is that all nodes have memory modules which then have
to be simulated by the nodes of the network G. For neither of these alternatives, how-
ever, we know an efficient caching strategy, and we will see that we do not need such
a strategy.
Define the bandwidth tree Tb
?
G
@
to be a tree of maximum height h, which leaf
nodes correspond to the nodes in G. Each of the leaf nodes in Tb
?
G
@
has a memory
module of the same size as the respective node in G. The inner nodes do not have
memory modules. Define the memory tree Tm
?
G
@
to be a tree that is isomorph to
Tb
?
G
@
. Each of the leaf nodes in Tm
?
G
@
has a memory module that is ktimes larger
than the one of its counterpart in Tb
?
G
@
. Each non-leaf node in Tm
?
G
@
, except for the
root, has a memory module, which capacity is equal to the sum of the capacities of
its children. The memory capacity of the root is equal to the sum of the capacities
of all nodes in G. Thus, it is guaranteed that the root node can hold all data objects
simultaneously. The exact topology of these trees depends on the network Gand will
be defined later.
Suppose the bandwidth tree Tb
?
G
@
can simulate the optimal off-line strategy run-
ning on Gwhile increasing the congestion only by a factor of c1; suppose the on-line
strategy on Tm
?
G
@
produces only a congestion that is a factor of c2larger than the con-
gestion of the optimal off-line strategy on Tb
?
G
@
; and suppose the on-line strategy on
Tm
?
G
@
can be simulated on Gso that the congestion in Gis only a factor of c3larger
than the congestion in Tm
?
G
@
while each node in Grequires a memory capacity that
is htimes the memory capacity of the corresponding leaf node in Tm
?
G
@
. Then the
resulting strategy is
?
k
I
h
[
1
N
c1
I
c2
I
c3
@
-competitive.
The major challenge in the construction of the general framework is the analysis of
the factor between the on-line strategy running on Tm
?
G
@
and the optimal off-line strat-
egy running on Tb
?
G
@
. We use a strategy for single-computer caching as a subroutine
in the on-line strategy on Tm
?
G
@
. We will show that the factor between the on-line and
the optimal off-line strategy depends on the competitive ratio of the single-computer
caching strategy.
Such a result as itself does not seem to be very interesting since it considers the
congestion on two different networks. However, we will illuminate its significance in
several examples that describe how to apply the general framework to several classes
of networks.
3.2. The caching strategy for the memory tree 57
3.2 The caching strategy for the memory tree
In this section, we present a caching strategy for the memory tree Tm. This strategy is
used as a subroutine in the general framework.
The memory tree Tmis a virtual network whose topology is an arbitrary rooted
tree. Each node in Tmhas a memory module. The capacity of the modules at the leaf
nodes is arbitrary. The capacity of a module at an inner node of the tree is defined to
be the sum of the capacities of its children. The edges are allowed to have arbitrary
bandwidths. Initially, each data object is stored in one of the leaf nodes.
First, we present a caching strategy for the memory tree in a uniform cost model,
i.e. D
C
1. Then, we generalize this strategy to a non-uniform cost model.
3.2.1 Uniform model
In the uniform cost model, i.e. D
C
1, we assume that each object fits into one routing
packet such that each migration of a copy along an edge increases the load of this edge
by 1. Recall that each read or write request increases the load of each edge involved
in this request by 1, and that each information message increases the load of each
traversed edge by 1.
The caching strategy
An adversaryspecifies read and writerequests thatoccur solelyat the leafnodes. These
requests have to be served by the caching strategy. This strategy may create copies on
all nodes and may delete these copies afterwards in order to reduce the congestion.
Essentially, many copies mean cheap read requests but expensive write requests. Even
in case of only read requests, however, the considered caching problem is not trivial
as the capacities of the memory modules usually do not allow to give a copy of every
object to every node.
The caching strategy for the memory tree consists of the following sub-strategies.
Basic strategy: This strategy manages the shared objects as if the memory mod-
ules in the tree would have unlimited capacities. It decides on which nodes to
place copies so that the costs for read and write requests are balanced. It simply
neglects the memory capacity constraints.
Local strategy: This strategy corresponds to a caching strategy for single-
computer systems. It runs on every node of the tree except for the root, which
is sufficiently large to hold copies of all objects simultaneously, by definition.
The local strategy decides which object in a memory module is ejected when
the basic strategy suggests to insert a new object.
Rescue-and-help strategy: This strategy preserves at least one copy of a data
object at any time. Besides it helps the basic strategy when it wants to access a
copy that has been ejected by the local strategy.
58 Chapter 3. Caching with Memory Capacity Constraints
The coordination between these three strategies is not an easy task. For this pur-
pose, our caching strategy uses dummy copies, that is, in addition to the copies created
by the basic strategy, the algorithmcreates copies that are justplaceholders so that they
cannot be accessed in case of a read request and need not to be updated in case of a
write request. Before we explain this mechanism in detail, we turn to the basic and the
local strategy.
Nearly any known strategy for trees without memory capacity constraints can be
used as basic strategy. The only restrictionis that the strategy has toensure that, for any
object x, the nodes holding a copy of xbuild a connected component in the memory
tree Tm. The overall competitive ratio of our algorithm depends on the competitive
ratio of the basic strategy used.
In order to keep the description and the analysis of the overall strategy as simple as
possible, we refer to a concrete example of a basic strategy in the following. We use
the tree strategy in the uniform model introduced in Section 2.2.1: The nodes holding
a copy of an object xbuild a connected component. Suppose node uissues a read or
write request to object x. Let vdenote the node of the connected component that is
closest to u. Reads are served by establishing the unique shortest path from uto v.
Writes are served by establishing the minimal multicast tree from uto all nodes in the
connected component. The connected component is changed in the following way. In
case of a write all copies in the connected component, except the one on node u, are
deleted. In both the read and the write case, we create new copies on the path from uto
v. In Theorem 2.1 it is shown that this simple caching strategy does not only minimize
the congestion, it minimizes the load on any individual edge in the tree up to a factor
of 3. Note that this result is optimal because of the lower bound shown by Black and
Sleator [BS89].
When the basic strategy wants to store a copy of an object at a node, it calls the
local strategy that decides which copy of another object is ejected eventually. For the
local strategy, we can choose any caching strategy for single-computers, e.g., LRU or
RANDOM (see, e.g., [FKL
Y
91, ST85]). The competitiveness of the overall strategy
depends on the competitiveness of the caching strategy used. In the following we
assume that the local strategy is
?
k
Nnio@
-competitive. For example, LRU is known to be
?
2
N
2
@
-competitive [ST85].
The description so far leaves us with the following two problems.
How to preserve at least one copy for each object?
How to realize accesses to virtual copies, i.e., copies that have been ejected by
the local strategy?
The rescue-and-help strategy is an extension to the basic strategy that solves these
problems by using dummy copies. The dummy copies of an object xare used as
placeholders in order to ensure that always one copy of xcan be preserved without
having to eject any other copy or dummy copy of another object for that purpose.
Whenever the basic strategy invalidates copies in case of a write, we do not delete
3.2. The caching strategy for the memory tree 59
them but change these copies to dummy copies instead. We assume that the local
strategy does not take notice of this change. It treats these copies as usual copies, and
may eject them later. As described before, each data object xhas only one initial copy,
and this copy is stored at a leaf node v
?
x
@
. We add dummy copies of xto all inner
nodes on the path leading from v
?
x
@
to the root. Note that the memory capacities of
all inner nodes are large enough to hold all these initial dummy copies. Adding these
initial dummy copies does not induce any cost since the inner nodes know the initial
placement of the copies.
The rescue-and-help mechanism works as follows. At any time during the execu-
tion, let V
?
x
@
denote the set of nodes that are included in the connected component of
nodes that should hold a copy of xaccording to the basic strategy. Sometimes, some of
the nodes inV
?
x
@
may not hold a copy of xbecause it was ejected by the local strategy.
Let top
?
x
@
denote the unique node in V
?
x
@
that is closest to the root. The path from
top
?
x
@
to the root is called the rescue path of x. Let c
?
x
@
denote the (virtual) copy of x
which the basic strategy assumes to be stored on top
?
x
@
. The rescue-and-help strategy
always keeps one copy c
d?
x
@
of x. Initially, c
d?
x
@ZC
c
?
x
@
is stored on top
?
x
@
. Whenever
c
d?
x
@
is ejected because of the insertion of another object, c
d}?
x
@
is migrated upwards
along the rescue path until it reaches a node that holds a dummy copy of x. Here it can
be stored without having to eject any other object. At any time during the execution,
the following invariant is maintained: c
d?
x
@
is stored on a node top
d?
x
@
on the rescue
path.
The existence of c
d?
x
@
allows us to define an assistance node a
?
x
N
v
@
for every node
v
A
V
?
x
@
. The assistance node a
?
x
N
v
@
is that node on the path from vto the root that
is closest to vand holds a copy of x. Whenever the basic strategy wants to access a
virtual copy of xon node vthe rescue-and-help strategy uses the copy of the assistance
node a
?
x
N
v
@
instead. That is, any path or multicast tree established by the basic strategy
between a node uthat issued a read or write request and the respective node(s) inV
?
x
@
is redirected such that it connects uwith the corresponding assistance node(s) instead
of the node(s) in V
?
x
@
.
The migration of copies is handled in the following way. Let vdenote the node
fromV
?
x
@
that is closest to the node u. When the basic strategy establishes a path from
vto uand migrates copies along that path, the rescue-and-help strategy simply estab-
lishes a path from a
?
x
N
v
@
to u, and migrates copies from a
?
x
N
v
@
to u. We distinguish
three cases.
1. Suppose a
?
x
N
v
@C
top
d
?
x
@
. Figure 3.1 illustrates this situation. Then, the rescue-
and-help strategy creates a copy on each node on the path from a
?
x
N
v
@
to u.
Note that in this case all migrations follow a path that leads downwards, i.e., in
direction to the leaves.
2. Suppose a
?
x
N
v
@C
top
dq?
x
@
, and the path from top
d?
x
@
to uis going downwards
only. Figure 3.1 illustrates this situation. In this case, the path includes top
?
x
@
.
The rescue-and-help strategy migrates c
d?
x
@
to top
?
x
@
, and creates a dummy
copy on each node on the migration path from top
dq?
x
@
to top
?
x
@
, excluding
60 Chapter 3. Caching with Memory Capacity Constraints
V
?
x
@
u
v
top
?
x
@
top
d
?
x
@
root
Figure 3.1: Case 1 and 2: a
?
x
N
v
@
lies on the path from vto top
d?
x
@
.
top
?
x
@
. Then a copy on each node on the path from top
?
x
@
to uis created. The
migration of c
d?
x
@
to top
?
x
@
effectively re-unions c
dq?
x
@
and c
?
x
@
.
3. Suppose a
?
x
N
v
@C
top
d?
x
@
, and the path from top
d?
x
@
to uis going first upwards
to a node wand then downwards to u. Figure 3.2 illustrates this situation. In this
case, the rescue-and-help strategy first migrates c
d}?
x
@
to w, thereby replacing
all dummy copies on the path from top
d?
x
@
to wby real copies, excluding w.
Starting from node w, which now holds c
d?
x
@
, a copy on each node on the path
from wto uis created, which re-unions c
d}?
x
@
and c
?
x
@
as wis the new top
?
x
@
node after serving the request.
We assume that almost all accesses to copies and all insertions of copies are done
by using the local strategy on the respective node. There is only one exception: In case
3 the copyon the node a
?
x
N
v
@`C
top
d?
x
@
is accessed directly, that is, the access does not
influence the local strategy. When using a memory-less local strategy, i.e., a strategy
that ejects copies independent of previous accesses, like RANDOM, we do not have to
care about this topic. In this way, the local strategy on each node vis influenced only
by requests issued in the subtree rooted at v.
It remains todescribe howour caching strategyon the memorytree can be executed
in the distributed setting in which all decisions are based solely on local information.
The tree strategy in the uniform model introduced in Section 2.2.1 is used as basic
strategy. This tree strategy can be executed in a distributed fashion. Then, it is obvious
that the rescue-and-help strategy can also be executed in the distributed setting. This
completes the description of the caching strategy on the memory tree Tm.
Let e
C?
u
N
v
@
denote an arbitrary edge in the tree so that vis closer to the root than
u. We can observe, that the load on edge ecorresponds to the load induced by the basic
strategy plus some load L1
?
e
@
induced by serving requests that are redirected to assis-
tant nodes plus some load L2
?
e
@
induced by upward migrations of c
d
-copies. Note that
3.2. The caching strategy for the memory tree 61
V
?
x
@
u
top
?
x
@
top
d?
x
@
w
root
Figure 3.2: Case 3: a
?
x
N
v
@SC
top
d?
x
@
.
L1
?
e
@
is increased by 2 due to serving a request that is redirected to an assistant node,
since first the request is served and thereafter the residence set is modified. Whenever
L1
?
e
@
or L2
?
e
@
are increased due to an object x, then some copy or dummy copy of x
has been ejected from node urecently, i.e., after the last time when these values have
been increased due to object x. Let eject
?
e
@
denote the number of ejections from node
u. Then we can conclude the following observation.
Observation 3.1 For each edge e of Tm, L1
?
e
@Dk
2
I
eject
?
e
@
and L2
?
e
@k
eject
?
e
@
.
The analysis of the caching strategy
We compare the load of the caching strategy on the memory tree Tmto the optimal
load on the corresponding bandwidth tree Tb. This tree Tbhas the same topology as
Tm, the same bandwidths on all edges, but different memory capacities. The inner
nodes on Tbdo not have any memory modules, i.e., memory capacity null, and the
memory modules at the leaves are smaller than the corresponding memory modules
of Tm. These capacities are chosen in dependence on the choice for the local strategy
on Tm: Suppose the single-computer caching strategy used as local strategy is
?
k
Nnio@
-
competitive. Then we define that the capacity of a leaf node in Tmis ktimes the
capacity of the corresponding leaf node in Tb. We assume that the capacities of Tbare
large enough so that all objects can be stored in the network. This assumption allows
us to decrease the capacity of the root of the memory tree Tmby a factor of kas it is
sufficient if the root can store all data objects simultaneously. Hence, the ratio between
the total memory capacity of Tmand the total memory capacity of Tbis k
I
h
[
1, where
hdenotes the height of Tmand Tb.
The following lemma compares the load induced by our caching strategy running
on Tmwith the load of an optimal off-line strategy running on Tb, both of them ap-
plied to the same, arbitrary sequence of requests issued by the leaf nodes of the cor-
62 Chapter 3. Caching with Memory Capacity Constraints
responding tree. The single-computer caching strategy used as local strategy may be
randomized. In this case, the competitive ratio
i
gives only an upper bound on the
expected load of the single-computer caching strategy, and the following lemma gives
only an upper bound on the expected load of the strategy on Tm. The lemma shows
that our caching strategy minimizes the load on all edges simultaneously. Hence, the
congestion produced by the on-line strategy executed on Tmis at most 3
[
3
i
times the
congestion produced by an optimal off-line strategy executed on Tb.
Lemma 3.2 Let eband emdenote a pair of corresponding edges in Tband Tm, respec-
tively. Then the (expected) load on emis at most 3
[
3
i
times the optimal load on eb,
where
i
denotes the competitive ratio of the local strategy.
Proof. Unfortunately, the behavior of an optimal off-line strategy on the bandwidth
tree Tbis difficult to understand. In order to get a more obvious behavior, we allow
the optimal strategy to place copies on internal nodes, too. Let T
a
bdenote a tree that is
isomorph to Tb, whose leaf nodes have the same memory capacity as the corresponding
leaf nodes in Tb, and whose inner nodes have a memory capacity that is equal to the
sum of the capacities of their respective children. We restrict, however, the placement
ofcopiesonT
a
bin thefollowing way: Fora node v, let T
a
b
?
v
@
denotethesubtreerooted at
node v. Anytime, for each node v, the set of objects that are stored in a memory module
in the subtree T
a
b
?
v
@
, must be a subset of the objects that have a copy or dummy copy
in the memory module of v. Recall that dummy copies are placeholders that cannot be
used to serve a read request and need not to be updated in case of a write request. We
assume that these copies can be created from scratch, that is, they do not require any
kind of migration. The following lemma shows that, despite of this restrictions, T
a
bis
more powerful than Tb, so that any lower bound on the load on the edges of T
a
bholds
also for Tb.
Lemma 3.3 The execution of any off-line caching strategy serving an arbitrary se-
quence of requests on Tb, can be simulated on T
a
bwithout increasing the load on any
edge.
Proof. We simulate the strategyfor TbonT
a
b. The managementof the memory modules
of the leaf nodes and the routing between these nodes, i.e., the selection of the paths
and multicast trees, is done in T
a
bexactly as in Tb. At any time during the simulation,
an inner node vof T
a
bholds a dummy copy of all those objects that are stored in the
memorymodulesinT
a
b
?
v
@
. InthiswaytherestrictionsonT
a
bare satisfied. Since dummy
copies can be created from scratch, this simulation does not require to increase the load
on any edge.
In contrast to the bandwidth tree Tb, the behavior of an optimal off-line strategy
on T
a
bis relatively clear. W.l.o.g., any optimal off-line strategy fulfills the following
properties for each node vin T
a
b.
3.2. The caching strategy for the memory tree 63
If a leaf node in the subtree T
a
b
?
v
@
issues a read or write request to object x, then
the only possible modification of the memory module of vis that a new copy of
xis stored in this module and that copies of other objects are ejected out of this
module.
If a leaf node outside the subtree T
a
b
?
v
@
issues a read or write request to object x,
then the only possible modification of the memory module of vis that copies of
objects are ejected out of this module.
Any optimal off-line strategy that does not possess these two properties can be easily
changed so that the properties are fulfilled without increasing the load on any edge.
Now, fix a sequence σ
C
σ1σ2
IUIUI
of requests issued by the leaf nodes of the corre-
sponding tree. Let em
C?
um
N
vm
@
denote an arbitrary edge of Tmand let e
a
b
C?
u
a
b
N
v
a
b
@
denote the corresponding edge of T
a
b. We assume that vmand v
a
bare closer to the root
than umand u
a
b, respectively. Further, let us fix an optimal off-line strategy on T
a
b,
which is denoted the optimal strategy in the following. Let L
?
e
a
b
@
denote the minimum
load of the optimal strategy on the edge e
a
b. The basic strategy on Tmproduces at most
load 3
I
L
?
e
a
b
@
on the edge em. Therefore, it remains to prove that the additional load
due to the rescue-and-help strategy, i.e., L1
?
em
@[
L2
?
em
@
, is not larger than 3
iXI
L
?
e
a
b
@
(on expectation).
First, we consider the on-line strategy on Tm. From Observation 3.1 we can con-
clude that L1
?
em
@K[
L2
?
em
@k
3
I
eject
?
em
@
. Let ρdenote the subsequence of σthat
includes all requests issued in T
?
um
@
and that are noticed by the local strategy on um,
i.e., ρincludes any request influencing the behavior of the local strategy on um. Let
opt
?
ρ
@
denote the minimum load of an optimal off-line single-computer caching strat-
egy running on ρand having a memory module whose capacity is ktimes smaller than
the one of um. Thus, the memory capacity of the single-computer caching strategy
corresponds to the capacity of u
a
b. As the local strategy is
?
k
Nni@
-competitive, we can
conclude E
~
L1
?
em
@\[
L2
?
em
@]k
E
~
3
I
eject
?
em
@l]k
3
iI
opt
?
ρ
@
Q
Next we consider the optimal strategy on T
a
b. Any request in ρissued to an object
xincreases the load on the edge e
a
bif the memory module of u
a
bdoes not include a
copy of x. Hence, the load on e
a
bis larger or equal to the load of an optimal off-line
single-computer caching strategy running on ρand having a memory module of the
same capacity as u
a
b. As a consequence,
L
?
e
a
b
@D
opt
?
ρ
@
Q
These two bounds give us the required result, i.e., E
~
L1
?
em
@[
L2
?
em
@lk
3
i_I
L
?
e
a
b
@
.
Therefore, the expected load on emis at most 3
[
3
i
times the optimal load on e
a
band
hence at most 3
[
3
i
times the optimal load on eb, too. This completes the proof of
Lemma 3.2.
64 Chapter 3. Caching with Memory Capacity Constraints
3.2.2 Non-uniform model
In thissection, we generalize thecaching strategyfor thememory tree toa non-uniform
cost model. Now, we describe how the three sub-strategies, the basic strategy, the
local strategy, and the rescue-and-help strategy, have to be adapted to the non-uniform
model.
For the basic strategy, we can use nearly any known strategy for trees without
memory capacity constraints in the non-uniform cost model. The only restriction is
that the strategy has to ensure that, for any object x, the nodes holding a copy of x
build a connected component in the memory tree. The overall competitive ratio of our
algorithm depends on the competitive ratio of the used basic strategy. In order to keep
the description and the analysis of the overall strategy as simple as possible, we refer
to a concrete example of a basic strategy in the following. We use the tree strategy
introduced in Section 2.2.2. In Theorem 2.5 it is shown that this caching strategy does
not only minimize the congestion, it minimizes the load on any individual edge in the
tree up to a factor of 3. Note that this result is optimal because of the lower bound
shown by Black and Sleator [BS89].
For the local strategy, any caching strategy for single-computers, e.g., LRU or
RANDOM (see, e.g., [FKL
Y
91, ST85]), can be used. The overall competitive ratio of
our algorithm depends on the used caching strategy. In order to keep the description
and the analysis of the overall strategy as simple as possible, we refer to a concrete
example of a local strategy in the following. We use the RANDOM caching strategy
for single-computers. This strategy works as follows. When a data object should
be loaded into a memory module, one of the slots in the memory module is selected
uniformly at random. The object that may be stored in that slot is ejected.
The rescue-and-help strategy presented in Section 3.2.1 has to be adapted as fol-
lows to the non-uniform cost model. Suppose a leaf node uhad issued a read or write
request for object x. Then, the migration of copies is handled in the following way. Let
vdenote the node fromV
?
x
@
that is closest to the node u. The basic strategy establishes
a path pfrom vto uand may migrate copies along that path. Suppose that the basic
strategy creates new copies on each node on the sub-path of pfrom vto v
a
, excluding
v. Note that, possibly, v
C
v
a
. Then the rescue-and-help strategy simply establishes a
path from a
?
x
N
v
@
to u, and migrates possibly copies along that path. We distinguish
three cases.
1. Suppose a
?
x
N
v
@C
top
dq?
x
@
. Figure 3.1 on Page 60 illustrates this situation. Then,
the rescue-and-help strategy creates with probability 1
T
Da copy on each node
on the path from a
?
x
N
v
@
to v
a
. Note that in this case all migrations follow a path
that leads downwards, i.e., in direction of the leaves.
2. Suppose a
?
x
N
v
@XC
top
d?
x
@
, and the path from top
d?
x
@
to uis going downwards
only. Figure 3.1 on Page 60 illustrates this situation. In this case, the path
includes top
?
x
@
. With probability 1
T
Dthe rescue-and-help strategy does the
following. c
d}?
x
@
is migrated to top
?
x
@
, and dummy copies are added to each
3.2. The caching strategy for the memory tree 65
node on the migration path from top
d?
x
@
to top
?
x
@
, excluding top
?
x
@
. Then a
copy on each node on the path from top
?
x
@
to v
a
is created. The migration of
c
d?
x
@
to top
?
x
@
effectively re-unions c
d?
x
@
and c
?
x
@
.
3. Suppose a
?
x
N
v
@C
top
d?
x
@
, and the path from top
d?
x
@
to uis going first upwards
to a node wand then downwards to u. Figure 3.2 on Page 61 illustrates this
situation. In this case, the rescue-and-help strategy first migrates c
d?
x
@
to w,
thereby replacing all dummy copies on the path from top
dq?
x
@
to wby real copies,
excluding w. Starting from node w, which now holds c
d?
x
@
, a copy on each node
on the path from wto v
a
is created, which re-unions c
d?
x
@
and c
?
x
@
as wis the
new top
?
x
@
node after serving the request.
It remains todescribe howour caching strategyon the memorytree can be executed
in the distributed setting in which all decisions are based solely on local information.
The tree strategyintroduced in Section 2.2.2 is used as basic strategy. This tree strategy
can be executed in a distributed fashion. Then it is obvious that the rescue-and-help
strategy can also be executed in the distributed setting. This completes the description
of the caching strategy on the memory tree Tm.
As in the previous section, we compare the load of the caching strategy on the
memory tree Tmto the optimal load on the corresponding bandwidth tree Tb. Recall
that theratio between the totalmemory capacity ofTmand the total memory capacity of
Tbis k
I
h
[
1, where hdenotes the height of Tmand Tb. The following lemma compares
the load induced by our caching strategy running on Tmwith the load of an optimal
off-line strategy running on Tb, both of them applied to the same, arbitrary sequence
of requests issued by the leaf nodes of the corresponding tree. The lemma shows
that our caching strategy minimizes the load on all edges simultaneously. Hence, the
congestion produced by the on-line strategy executed on Tmis at most 3
[
3k
T]?
k
t
1
@
times the congestion produced by an optimal off-line strategy executed on Tb.
Lemma 3.4 Let eband emdenote a pair of corresponding edges in Tband Tm, respec-
tively. Then the expected load on emis at most 3
[
3k
T]?
k
t
1
@
times the optimal load
on eb.
Proof. Unfortunately, the behavior of an optimal off-line strategy on the bandwidth
tree Tbis difficult to understand. In order to get a more obvious behavior, we allow
the optimal strategy to place copies on internal nodes, too. Let T
a
bdenote a tree that is
isomorph to Tb, whose leaf nodes have the same memory capacity as the corresponding
leaf nodes in Tb, and whose inner nodes have a memory capacity that is equal to the
sum of the capacities of their respective children. We restrict, however, the placement
ofcopiesonT
a
bin thefollowing way: Fora node v, let T
a
b
?
v
@
denotethesubtreerooted at
node v. Anytime, for each node v, the set of objects that are stored in a memory module
in the subtree T
a
b
?
v
@
, must be a subset of the objects that have a copy or dummy copy
in the memory module of v. Recall that dummy copies are placeholders that cannot
be used to serve a read request and need not to be updated in case of a write request.
66 Chapter 3. Caching with Memory Capacity Constraints
We assume that these copies can be created from scratch, that is, they do not require
any kind of migration. Lemma 3.3 shows that, despite of this restrictions, T
a
bis more
powerful than Tb, so that any lower bound on the load on the edges of T
a
bholds also for
Tb.In contrast to the bandwidth tree Tb, the behavior of an optimal off-line strategy
on T
a
bis relatively clear. W.l.o.g., any optimal off-line strategy fulfills the following
properties for each node vin T
a
b.
If a leaf node in the subtree T
a
b
?
v
@
issues a read or write request to object x, then
the only possible modification of the memory module of vis that a new copy of
xis stored in this module and that copies of other objects are ejected out of this
module.
If a leaf node outside the subtree T
a
b
?
v
@
issues a read or write request to object x,
then the only possible modification of the memory module of vis that copies of
objects are ejected out of this module.
Any optimal off-line strategy that does not possess these two properties can be easily
changed so that the properties are fulfilled without increasing the load on any edge.
Now, fix a sequence σ
C
σ1σ2
IUIUI
of requests issued by the leaf nodes of the corre-
sponding tree. Let em
C?
um
N
vm
@
denote an arbitrary edge of Tmand let e
a
b
C?
u
a
b
N
v
a
b
@
denote the corresponding edge of T
a
b. We assume that vmand v
a
bare closer to the root
than umand u
a
b, respectively. Further, let us fix an optimal off-line strategy on T
a
b,
which is denoted the optimal strategy in the following. Let Lopt denote the minimum
load of the optimal strategy on edge e
a
b. The basic strategy on Tmproduces at most load
3
I
Lopt on edge em. Therefore, it remains to prove that the additional load due to the
rescue-and-help strategy on edge emis not larger than 3k
T]?
k
t
1
@]I
Lopt on expectation.
We use a potential function argument (see, e.g., [ST85]). Let Lr+h
?
t
@
denote the
additional load due to the rescue-and-help strategy on edge em, and let Lopt
?
t
@
denote
the minimum load of the optimal strategy on edge e
a
b, after serving σt. Let Mon
?
t
@
and Mopt
?
t
@
denote the set of objects stored in the memory module of umand u
a
b,
respectively, after serving σt. We define
Φ
?
t
@{C
3k
T]?
k
t
1
@KI
D
I
w
Mopt
?
t
@
F
Mon
?
t
@
wQ
In order to prove the lemma, we show the invariant
E
~
Lr+h
?
t
@[
Φ
?
t
@k
3k
T]?
k
t
1
@JI
Lopt
?
t
@
Q
This invariant can be shown by an induction on the length of σ. Obviously, this
invariant holds initially. Assume that E
~
Lr+h
?
t
@J[
Φ
?
t
@Dk
3k
T]?
k
t
1
@SI
Lopt
?
t
@
. We
have to show that E
~
Lr+h
?
t
[
1
@K[
Φ
?
t
[
1
@_k
3k
T]?
k
t
1
@_I
Lopt
?
t
[
1
@
. Let Lr+h
C
Lr+h
?
t
[
1
@Jt
Lr+h
?
t
@
,Lopt
C
Lopt
?
t
[
1
@Jt
Lopt
?
t
@
, and ∆Φ
C
Φ
?
t
[
1
@Jt
Φ
?
t
@
.
We distinguish between requests issued inside or outside the subtree Tm
?
um
@
.
Suppose σt
Y
1is a read or write request for object xissued inside the subtree
Tm
?
um
@
. In this case, Table 3.1 contains all possible changes of configuration.
3.3. Applications of the general framework 67
x
A
Mon
?
t
@
x
A
Mopt
?
t
@
x
A
Mopt
?
t
[
1
@
E
~
Lr+h
Kk
E
~
∆Φ
]k
Lopt
yes no no 0 0 1
yes no yes 0 0 D
yes yes yes 0 0 0
no no no 3 1
T
D
I
1
T
k
I
3k
T]?
k
t
1
@I
D1
no no yes 3
?
1
T
D
I
1
T
k
[
?
D
t
1
@UT
D
@KI
3k
T]?
k
t
1
@I
DD
no yes yes 3 1
T
D
I
?t
1
[
1
T
k
@JI
3k
T]?
k
t
1
@I
D0
Table 3.1: Possible changes of configuration if σt
Y
1is a read or write request for object
xissued inside the subtree Tm
?
um
@
.
Note that in this case E
~
Lr+h
C
0, if x
A
Mon
?
t
@
, and E
~
Lr+h
]k
1
[
1
T
D
I
2D
C
3, otherwise. In addition, notethat theprobability thatRANDOMejects an object
being in Mopt
?
t
@
is
w
Mopt
?
t
@
w
T
capacity
?
Mon
?
t
@U@bk
1
T
k
because the capacity of RANDOM is ktimes the capacity of the optimal off-line
algorithm. The table implies that E
~
Lr+h
[
∆Φ
Kk
3k
T]?
k
t
1
@JI
Lopt.
Suppose σt
Y
1is a read or write request for object xissued outside the subtree
Tm
?
um
@
. In this case, E
~
Lr+h
_C
0 and E
~
∆Φ
Dk
0. Thus, E
~
Lr+h
[
∆Φ
Xk
3k
T]?
k
t
1
@]I
Lopt.
Hence, E
~
Lr+h
?
t
[
1
@S[
Φ
?
t
[
1
@lk
3k
T]?
k
t
1
@DI
Lopt
?
t
[
1
@
, which implies the
lemma.
3.3 Applications of the general framework
The general framework can be applied to several classes of networks. Given any net-
work G, the bandwidth tree Tb
?
G
@
and the memory tree Tm
?
G
@
can be constructed by a
hierarchical decomposition of G. The quality of the resulting caching strategy depends
on some properties of this decompositions and especially on the height of Tb
?
G
@
and
Tm
?
G
@
. Although it is not clear which properties can be obtained for general networks,
applying the framework to almost any standard network yields interesting results. In
this section we give some examples, including meshes, fat-trees, complete networks,
Cayley networks, edge symmetric networks, and hypercubic networks. Further, we
present the universal caching strategy working on arbitrary networks for which an
oblivious routing strategy is given.
68 Chapter 3. Caching with Memory Capacity Constraints
3.3.1 Meshes
In this section, we consider caching strategies for the mesh M
C
M
?
m1
N
QVQUQ
N
md
@
, i.e.,
the d-dimensional mesh-connected network with side length mi
2 in dimension i.
The number of nodes is denoted by n, i.e., n
C
m1
IUIUI
md, the number of edges is
d
i
c
1m1
IUIUI
mi
1
Iq?
mi
t
1
@{I
mi
Y
1
IUIUI
md
C
Θ
?
d
I
n
@
. Each node has a memory module
of uniform capacity. Each edge has bandwidth 1. Thus, the relative and absolute load
of an edge are identical.
The general framework is based on the access tree strategy. Thus, we adopt the
hierarchical decomposition of Mand theassociated decomposition tree T
?
M
@
of height
O
?
logn
@
from Section 2.3.1.
Since we obtain, for meshes, a tradeoff between the height of the decomposition
tree and the achieved congestion, we need a decomposition tree of arbitrary height
1
k
h
k
logn. Thus, some levels of T
?
M
@
have to be deleted possibly. This is done
with the following deletion algorithm. While height
?
T
?
M
@V@s
h, delete a level 0
W
iW
height
?
T
?
M
@U@
in T
?
M
@
having minimum degree, where the degree of level
i
is the
maximum degree over all nodes on level
i
. Deleting level
i
means that, first, all nodes
and edges on level
i
are deleted, then the edges on level
i[
1 are connected to the
respective nodes on level
it
1, and finally the numbering of the levels is adjusted.
Now, the decomposition tree T
?
M
@
has height hand degree O
?
n1
h
@
. Note that
the root of T
?
M
@
corresponds still to Mitself, and the children of a node vin the tree
correspond to the submeshes into which the submesh corresponding to vis divided,
and the leaves correspond to subtrees of size one, i.e., to the nodes of M.
Define the bandwidth tree Tb
?
M
@
and the memory tree Tm
?
M
@
to be a copy of the
decomposition tree T
?
M
@
. Each of the leaf nodes in Tb
?
M
@
has a memory module
of the same capacity as the respective node in M. The inner nodes in Tb
?
M
@
do not
have memory modules. Each of the leaf nodes in Tm
?
M
@
has a memory module whose
capacity is 2 times the capacity of the respective node in M. Furthermore, each non-
leaf node in Tm
?
M
@
, except for the root, is assumed to have a memory module whose
capacity is equal to the sum of the capacities of its children. The memory capacity of
the root in Tm
?
M
@
is equal to the sum of the capacities of all mesh nodes. Thus, it is
guaranteed that the root node can hold all data objects simultaneously.
Our caching strategy for meshes simulates the strategy for the memory tree Tm
?
M
@
on M. The virtual memory module of an inner node vof the memory tree Tm
?
M
@
is
simulated jointly by all the nodes in the submesh M
?
v
@
. The assignment of the slots in
the memory module of vto the nodes in the submesh M
?
v
@
is done in random fashion
as follows. We permute uniformly at random all slots in the memory module of v.
Then the memory module is partitioned into equally sized, contiguous parts, one for
each of the nodes in M
?
v
@
. If vholds a copy of object x, let s
?
x
N
v
@
denote the memory
slot in the memory module of vholding the copy of x. Summing over all levels in the
memory tree Tm
?
M
@
, we obtain that the on-line strategy needs, for each node of M,
2
I
h
[
1 times the memory capacities of the off-line strategy.
3.3. Applications of the general framework 69
Now, we have to describe, how, in case of an access, a copy of an object is located
in a submesh. For each object x, and each node vof the memory tree Tm
?
M
@
, we
add a signpost p
?
x
N
v
@
. If a copy of xis stored in the memory module of v,p
?
x
N
v
@
points to the mesh node in M
?
v
@
holding the memory slot s
?
x
N
v
@
, and vice versa. In
addition, p
?
x
N
v
@
points to each signpost p
?
x
N
u
@
, with uis adjacent to vin the memory
tree Tm
?
M
@
. We embed the signposts randomly into M, i.e., for each object x, and each
node vof Tm
?
M
@
,p
?
x
N
v
@
is mapped uniformly at random to one of the nodes in M
?
v
@
.
Note that each signpost contains no data of an object. Thus, the memory requirements
of the signposts can be neglected.
The remaining description of our caching strategy is simple: Suppose a message m,
concerning an object x, that is sent from a node vto an adjacent node uin the memory
tree Tm
?
M
@
. In addition, suppose that vand uhold a copy of x. In the mesh Mthis
message mis sent along the dimension-by-dimension order path from the mesh node
holding the memory slot s
?
x
N
v
@
to the mesh node holding the signpost p
?
x
N
v
@
, then to
the mesh node holding the signpost p
?
x
N
u
@
, and finally to the mesh node holding the
slot s
?
x
N
u
@
. The dimension-by-dimension order path between two nodes is the unique
shortest path between the two nodes using first edges of dimension 1, then edges of
dimension 2, and so on. If vor uholds no copy of x, the path in the mesh Mis
appropriately shortened.
Finally, we require the general remapping scheme for frequently accessed mem-
ory slots and signposts. In case of the remapping of a memory slot s
?
x
N
v
@
,s
?
x
N
v
@
is
swapped with a uniform at random chosen memory slot in the submesh M
?
v
@
. For
more details see Section 2.3.1.
Note that the distributed execution of the local strategy based on LRU is a major
problem since the virtual memory modules of the memory tree may be distributed
among several nodes in the mesh. Therefore, we prefer to use RANDOM instead of
LRU.
The following theorem shows that, for each 1
k
h
k
logn, we obtain a caching
strategy for the d-dimensional mesh Mwith nnodes that is
?
2h
[
1
N
O
?
d
I
h
I
n1
h
@U@
-
competitive, w.h.p. Further, setting h
C
logn, we obtain a caching strategy that is
strictly
?
2logn
[
1
N
O
?
d
I
logn
@U@
-competitive, w.h.p. Recall for the latter result that
2d
I
Copt
?
M
@D
min
M
D
N
κ
P
.
Theorem 3.5 Consider an application on the d-dimensional mesh M with n nodes.
For each 1
k
h
k
logn, we obtain a caching strategy with congestion O
?
d
I
h
I
n1
h
I
Copt
?
M
@K[
min
M
n2
h
I
D
N
κ
PyI
n1
h
I
logn
@
, w.h.p., where Copt
?
M
@
denotes the optimal
congestion for the application, and κdenotes the maximum number of write requests
directed to the same object.
Proof. Inorder to provethe aboveresultweapplyLemma2.9which holds analogously
in this setting. Define Copt
?
Tb
?
M
@U@
to be the optimal congestion for the application
when it is executed on the bandwidth tree Tb
?
M
@
, under the assumption that each node
of Mis simulated by its counterpart in Tb
?
M
@
, which is one of the leaf nodes.
70 Chapter 3. Caching with Memory Capacity Constraints
To apply effectively Lemma 2.9, we have to show that the bandwidth tree Tb
?
M
@
is
ΨM-useful for a certain ΨM, i.e., we have to prove the following two conditions.
Copt
?
Tb
?
M
@U@Dk
Copt
?
M
@
.
The maximum expected relative load in Mdue to access messages is O
?
d
I
h
I
n1
h
I
Copt
?
Tb
?
M
@U@V@
.
In addition, note that
?
Tb
?
M
@U@C
deg
?
Tb
?
M
@U@bC
O
?
n1
h
@
. Then Theorem 3.5 can be
yield with Lemma 2.9.
The lower boundCopt
?
Tb
?
M
@U@bk
Copt
?
M
@
for the optimal strategy can be yield with
Lemma 2.7. The following lemma gives the upper bound on the expected load of our
caching strategy.
Lemma 3.6 The maximum expected relative load in M due to access messages is O
?
d
I
h
I
n1
h
I
Copt
?
Tb
?
M
@U@V@
.
Proof. Let L
?
e
@
denote the load on edue to the simulation of access messages passing
edges on level
i
of the memory tree Tm
?
M
@
, for 1
kihk
h. We show that E
~
L
?
e
@]C
O
?
d
I
n1
h
I
Copt
?
Tb
?
M
@U@U@
, for 1
kik
h, which yields the lemma.
Fix a level
i
. Let vbe a node of Tm
?
M
@
on level
iDt
1 such that M
?
v
@
includes the
edge e. If such a node does not exist then E
~
L
?
e
@l\C
0. Let v
a
be one of the children
of v. From the proof of Lemma 2.8 we can obtain that the expected load on edue to
the simulation of the memory tree edge eT
CM
v
N
v
aP
on level
i
of Tm
?
M
@
is at most
O
?
d
@
times the relative load on the memory tree edge eT. Thus, the expected load on e
due to the simulation of the memory tree edge eTis O
?
d
I
Copt
?
Tb
?
M
@U@V@
, since Lemma
3.4 yields with k
C
2 that the congestion produced by the on-line strategy executed
on Tm
?
M
@
is at most 9 times the congestion produced by an off-line strategy executed
on Tb
?
M
@
. The same bound holds for the edges connecting vwith its other O
?
n1
h
@
children. Hence, E
~
L
?
e
@lC
O
?
d
I
n1
h
I
Copt
?
Tb
?
M
@U@V@
, which yields the lemma.
This completes the proof of Theorem 3.5.
3.3.2 Fat-trees
In this section, we consider caching strategies for the fat-tree Fof height Hwith n
nodes. Fhas thetopologyof a symmetric tree, that is, for each inner-node, the subtrees
rooted at the children of the node are isomorph. The fat-tree represents an indirect
network, that is, only the leaf nodes are processors with memory modules of uniform
capacity, the inner nodes are only routing switches. We define the root of Fto be on
level 0, and all nodes whose parents are on level iare defined to be on level i
[
1.
Furthermore, each edge eof Fconnecting a level inode with a level i
[
1 node is
defined to be on level i
[
1. Thus, F
C
F
?U?
d1
N
b1
@N
QUQUQ
N?
dH
N
bH
@V@
, i.e., for every level
iin the tree Feach node on level ihas di
Y
1children and each edge on level ihas
bandwidth bi. We make the reasonable assumption, for each level i,bi
bi
Y
1.
3.3. Applications of the general framework 71
The fat-tree Fis called realistic if the bandwidths of the levels decrease geomet-
rically in direction to the root, i.e., if, for each level i,bi
k
α
I
di
Y
1
I
bi
Y
1, for some
constant α
W
1.
The general framework is based on the access tree strategy. Thus, we adopt the
hierarchical decomposition of Fand the associated decomposition tree T
?
F
@
of arbi-
trary height 1
k
h
k
Hfrom Section 2.3.2. The remaining description of our caching
strategy is analogously to the caching strategy for meshes in Section 3.3.1.
The followingtheorem shows that, foreach 1
k
h
k
H, we obtaina cachingstrategy
for fat-trees of height Hwith nnodes that is
?
2h
[
1
N
O
?
h
[
n1
h
@U@
-competitive, w.h.p.,
and, if h
C
H,
?
2H
[
1
N
O
?
H
@U@
-competitive, w.h.p. For realistic fat-trees this result
improves to
?
2h
[
1
N
O
?
n1
h
@U@
-competitive, w.h.p., and, if h
C
H, to
?
2H
[
1
N
O
?
1
@@
-
competitive, w.h.p. Further, we obtain, for each 1
k
h
k
H, a caching strategy that is
strictly
?
2h
[
1
N
O
?U?
deg
?
F
@][
n1
h
@
3
I
logn
@V@
-competitive, w.h.p. Recall for the latter
result that Copt
?
F
@X
min
M
D
N
κ
T
deg
?
F
@^P
.
Theorem 3.7 Consider an application on the fat-tree F of height H with n nodes.
Let Copt
?
F
@
denote the optimal congestion for the application, and let κdenote the
maximum number of write requests directed to the same object.
For each 1
k
h
k
H, we obtain a caching strategy with congestion O
?V?
h
[
n1
h
@\I
Copt
?
F
@[
R
@
, w.h.p., and, if h
C
H, with congestion O
?
H
I
Copt
?
F
@[
R
@
, w.h.p.,
with R
C
min
M|?
max
M
deg
?
F
@N
n1
h
P}@
2
I
D
N
κ
PI
max
M
deg
?
F
@N
n1
h
PI
logn.
For realistic fat-trees this improves to O
?
n1
h
I
Copt
?
F
@}[
R
@
, w.h.p., and, if h
C
H,
to O
?
Copt
?
F
@\[
R
@
, w.h.p.
Proof. This proof is analogously to the proof of Theorem 2.14 concerning caching on
fat-trees without memory capacity constraints.
3.3.3 Complete networks
Some massively parallel computers, e.g., Cray T3E and T3D or Intel Paragon, have
a network with very high bandwidth, so that not the network is the bottleneck but
the individual memory modules. In particular, remote accesses to these modules are
expensive, as local accesses are supported by additional local caches. These systems
are well modeled by a complete network Gnwith nnodes. We aim to minimize the
congestion at the nodes due to remote accesses, that is, remote accesses increase the
load at any node whereas local accesses are free. Each node has a memory module of
uniform capacity and bandwidth 1. Thus, the relative and absolute load of a node are
identical.
The general framework is based on the access tree strategy. Thus, we adopt the
hierarchical decomposition of Gnand the associated decomposition tree T
?
Gn
@
of ar-
bitrary height 1
k
h
k
lognfrom Section 2.3.3. The remaining description of our
caching strategy is analogously to the caching strategy for meshes in Section 3.3.1.
72 Chapter 3. Caching with Memory Capacity Constraints
The following theorem shows that we obtain, for h
C
1, a caching strategy for
complete networks with nnodes that is
?
3
N
O
?
1
@U@
-competitive, w.h.p., with respect to
the congestion at the nodes. Further, we obtain, for h
C
logn, a caching strategy that
is strictly
?
2logn
[
1
N
O
?
logn
@U@
-competitive, w.h.p., with respect to the congestion at
the nodes. Recall for the latter result thatCopt
?
Gn
@X
min
M
D
N
κ
P
.
Theorem 3.8 Consider an application on the complete network Gnwith n nodes. for
each 1
k
h
k
logn, we obtain a caching strategy with congestion O
?
h
I
Copt
?
Gn
@`[
min
M
n2
h
I
D
N
κ
PI
n1
h
I
logn
@
at the nodes, w.h.p., where Copt
?
Gn
@
denotes the optimal
congestion at the nodes for the application, and κdenotes the maximum number of
write requests directed to the same object.
Proof. This proof is analogously to the proof of Theorem 2.17 concerning caching on
complete networks without memory capacity constraints.
3.3.4 The universal caching strategy
In thissection, we present the universal caching strategyworking onarbitrary networks
for which an oblivious routing strategy is given. A routing strategy is called oblivious
if the path traveled by each packet depends only on the origin and destination of the
packet and not on the origins and destinations of the other packets nor on congestion
encountered during the routing. Note that each oblivious routing strategy is also an
on-line routing strategy. Finally, we apply the universal caching strategy to several
classes of networks.
Suppose we are given an arbitrary networkGwith nnodes and an oblivious routing
strategy for G. Each node is assumed to have a memory module of uniform capacity,
and each edge is assumed to have bandwidth 1. Thus, the relative and absolute load
of an edge are identical. The universal caching strategy for Gis a simulation of the
access tree strategy for the complete network Gnon G. Note that both Gand Gnhave
nnodes such that for each node in Ga counterpart in Gncan be fixed. All messages
that are sent between adjacent nodes in Gnare sent along the paths, determined by the
oblivious routing strategy, between the associated nodes in G.
The following theorem shows that, for a network Gin which a permutation can be
routed obliviously with maximum expected load Lπ
?
G
@
, the universal caching strat-
egy is
?
3
N
O
?
Lπ
?
G
@{I
deg
?
G
@U@@
-competitive, w.h.p., and strictly
?
2logn
[
1
N
O
?
Lπ
?
G
@{I
deg
?
G
@JI
logn
@U@
-competitive, w.h.p. Recall for the latter result that deg
?
G
@JI
Copt
?
G
@
min
M
κ
N
D
P
.
Theorem 3.9 Consider an application on a network G with n nodes in which a per-
mutation can be routed obliviously with maximum expected load Lπ
?
G
@
. For each
1
k
h
k
logn, we obtain a caching strategy with congestion O
?
Lπ
?
G
@SI
deg
?
G
@SI
h
I
Copt
?
G
@[
min
M
n2
h
I
D
N
κ
PbI
n1
h
I
logn
@
, w.h.p., where deg
?
G
@
denotes the degree of G,
Copt
?
G
@
denotes the optimal congestion for the application, and κdenotes the maxi-
mum number of write requests directed to the same object.
3.3. Applications of the general framework 73
Proof. The proof is analogouslyto the proof of Theorem 2.20 concerning the universal
caching strategy without memory capacity constraints.
Cayley networks
The following corollarycan be concluded withTheorem 3.9 and the remarksin Section
2.3.4. This corollary shows that, for a Cayley network Gwith nnodes, the universal
caching strategy is
?
3
N
O
?
diam
?
G
@KI
deg
?
G
@U@
-competitive, w.h.p., and strictly
?
2logn
[
1
N
O
?
diam
?
G
@KI
deg
?
G
@KI
logn
@U@
-competitive, w.h.p.
Corollary 3.10 Consider an application on a Cayley network G with n nodes. For
each 1
k
h
k
logn, we obtain a caching strategy with congestion O
?
diam
?
G
@\I
deg
?
G
@\I
h
I
Copt
?
G
@][
min
M
n2
h
I
D
N
κ
PI
n1
h
I
logn
@
, w.h.p., where Copt
?
G
@
denotes the optimal
congestion for the application, and κdenotes the maximum number of write requests
directed to the same object.
Edge symmetric networks
The following corollarycan be concluded withTheorem 3.9 and the remarksin Section
2.3.4. This corollary shows that, for an edge symmetric network Gwith nnodes, the
universal caching strategyis
?
3
N
O
?
diam
?
G
@U@
-competitive,w.h.p., and strictly
?
2logn
[
1
N
O
?
diam
?
G
@KI
logn
@U@
-competitive, w.h.p.
Corollary 3.11 Consider an application on an edge symmetric network G with
n nodes. For each 1
k
h
k
logn, we obtain a caching strategy with congestion
O
?
diam
?
G
@JI
h
I
Copt
?
G
@\[
min
M
n2
h
I
D
N
κ
PI
n1
h
I
logn
@
, w.h.p., where Copt
?
G
@
denotes
the optimal congestion for the application, and κdenotes the maximum number of
write requests directed to the same object.
Hypercubic networks
The following corollarycan be concluded withTheorem 3.9 and the remarksin Section
2.3.4. This corollary shows that, for the wrapped butterfly, cube-connected-cycles, hy-
percube, de Bruijn, and shuffle-exchange network with nnodes, the universal caching
strategy is
?
3
N
O
?
logn
@U@
-competitive, w.h.p., and strictly
?
2logn
[
1
N
O
?U?
logn
@
2
@U@
-
competitive, w.h.p.
Corollary 3.12 Consider an application on the wrapped butterfly, cube-connected-
cycles, hypercube, de Bruijn, or shuffle-exchange network G with n nodes. For each
1
k
h
k
logn, we obtain a caching strategy with congestion O
?
logn
I
h
I
Copt
?
G
@`[
min
M
n2
h
I
D
N
κ
PDI
n1
h
I
logn
@
, w.h.p., whereCopt
?
G
@
denotes the optimal congestion for
the application, and κdenotes the maximum number of write requests directed to the
same object.
74 Chapter 3. Caching with Memory Capacity Constraints
The following corollary can be concluded with Theorem 3.9 and the remarks
in Section 2.3.4. This corollary shows that, for the indirect butterfly network with
nnodes, the universal caching strategy is
?
3
N
O
?
1
@V@
-competitive, w.h.p., and strictly
?
2logn
[
1
N
O
?
logn
@U@
-competitive, w.h.p.
Corollary 3.13 Consider an application on the indirect butterfly network G with n
nodes. For each 1
k
h
k
logn, we obtain a caching strategy with congestion O
?
h
I
Copt
?
G
@{[
min
M
n2
h
I
D
N
κ
PI
n1
h
I
logn
@
, w.h.p., where Copt
?
G
@
denotes the optimal
congestion for the application, and κdenotes the maximum number of write requests
directed to the same object.
Chapter 4
Summary and Discussion
We have presented and analyzed a general framework for the development of dis-
tributed caching strategies for networks with memory capacity constraints in a non-
uniform cost model. The strategies aim to minimize the network congestion in order
to minimize the communication overhead and avoid that some of the links become
a bottleneck. We have shown that this framework yields efficient strategies that are
able to exploit the locality included in an arbitrary application for several examples
of networks, including meshes, fat-trees, complete networks, Cayley networks, edge
symmetric networks, and hypercubic networks. In contrast to previously known strate-
gies, our strategies give an integrated solution for the problem of data placement, data
tracking, and routing. In the following we present a detailed summary of the results
of this thesis. Table 4.1 and 4.2 give a survey over the results for caching without and
with memory capacity constraints, respectively.
Trees. We present the first deterministic and distributed caching strategy that
achieves competitive ratio 3 for trees in a non-uniform cost model. This competi-
tive ratio is optimal because of the lower lower bound shown by Black and Sleator
[BS89]. Our tree strategy minimizes not only the congestion but minimizes simulta-
neously the load on each individual edge up to a optimal factor of 3. Strategies for
trees are of special interest as they can be used as subroutines in strategies for other
networks, e.g., in the access tree strategy. However, our strategy neglects memory
capacity constraints.
Meshes. The situation on mesh-connected networks is much more complicated than
the one on trees because in meshes there are several possible routing paths between
every pair of nodes. Let Mdenote the d-dimensional mesh with side length ni
2 in
dimension i. This graph represents a network of n
C
n1
IUIUI
ndprocessors, all of them
with uniform memory capacity, and d
i
c
1n1
IVIUI
ni
1
Iq?
ni
t
1
@{I
ni
Y
1
IUIUI
ndcommunica-
tion links, all of them with uniform bandwidth.
Constructing bandwidth trees of different heights, we obtain a tradeoff between
the bandwidth and memory utilization: For each 1
k
h
k
logn, we obtain a caching
76 Chapter 4. Summary and Discussion
Caching without memory capacity constraints
non-strict strict
network competitive ratio competitive ratio
tree 3 3
d-dimensional mesh O
?
d
I
logn
@
O
?
d
I
logn
@
fat-tree Fof height H O
?
min
M
H
N
logn
T
loglogn
@
O
?
deg
?
F
@
3
I
logn
@
realistic fat-tree F
of height HO
?
1
@
O
?
deg
?
F
@
3
I
logn
@
complete network
(congestion at the
nodes) O
?
1
@
O
?
logn
@
Cayley network GO
?
diam
?
G
@KI
deg
?
G
@U@
O
?
diam
?
G
@KI
deg
?
G
@KI
logn
@
edge symmetric
network GO
?
diam
?
G
@V@
O
?
diam
?
G
@JI
logn
@
hypercube
cube-connected-cycles
de Bruijn
shuffle-exchange
wrapped butterfly
O
?
logn
@
O
?U?
logn
@
2
@
indirect butterfly O
?
1
@
O
?
logn
@
Table 4.1: Survey over the results for caching without memory capacity constraints.
Recall that ndenotes the number of nodes in the networks and that all results hold
w.h.p.
strategy for the d-dimensional mesh Mwith nnodes that is
?
2h
[
1
N
O
?
d
I
h
I
n1
h
@U@
-
competitive, w.h.p. Setting h
C
logn, we obtain the strict congestion ratio O
?
d
I
logn
@
,
w.h.p. Further, we present an
?
logn
T
d
@
lower bound for the competitive ratio for
on-line routing in meshes, which implies that our upper bound on the congestion ratio
for meshes of constant dimension is optimal.
Fat-trees. Define a fat-tree Fof height Hwith nnodes to be a graph that has the
topology of a symmetric tree, that is, for each inner-node, the subtrees rooted at the
children of the node are isomorph. The fat-tree represents an indirect network, that
is, only the leaf nodes are processors with memory modules of uniform capacity, the
inner nodes are only routing switches.
Constructing bandwidth trees of different heights, we obtain a better trade-
off between the bandwidth and memory utilization than for the mesh: For each
1
k
h
k
H, we obtain a caching strategy for a fat-tree Fof height Hwith nnodes
that is
?
2h
[
1
N
O
?
h
[
n1
h
@U@
-competitive, w.h.p., and, if h
C
H,
?
2H
[
1
N
O
?
H
@U@
-
77
Caching with memory capacity constraints
non-strict strict
network memory congestion memory congestion
ratio ratio ratio ratio
d-dimensional
mesh 2h
[
1O
?
d
I
h
I
n1
h
@
2logn
[
1O
?
d
I
logn
@
fat-tree F
of height H2h
[
1
2H
[
1O
?
h
[
n1
h
@
O
?
H
@
2h
[
1O
?U?
deg
?
F
@\[
n1
h
@
3
I
logn
@
realistic fat-tree F
of height H2h
[
1
2H
[
1O
?
n1
h
@
O
?
1
@
2h
[
1O
?U?
deg
?
F
@\[
n1
h
@
3
I
logn
@
complete network
(congestion at the
nodes) 3O
?
1
@
2logn
[
1O
?
logn
@
Cayley network G3O
?
diam
?
G
@KI
deg
?
G
@U@
2logn
[
1O
?
diam
?
G
@JI
deg
?
G
@JI
logn
@
edge symmetric
network G3O
?
diam
?
G
@V@
2logn
[
1O
?
diam
?
G
@JI
logn
@
hypercube
cube-connected-
cycles
de Bruijn
shuffle-exchange
wrapped butterfly
3O
?
logn
@
2logn
[
1O
?U?
logn
@
2
@
indirect butterfly 3 O
?
1
@
2logn
[
1O
?
logn
@
Table 4.2: Survey over the results for caching with memory capacity constraints. Note
that his variable with 1
k
h
k
lognfor the d-dimensional mesh and 1
k
h
k
Hfor
the (realistic) fat-tree of height H. Recall that ndenotes the number of nodes in the
networks and that all results hold w.h.p.
competitive, w.h.p. Further, we obtain, for each 1
k
h
k
H, a caching strategy
that is strictly
?
2h
[
1
N
O
?U?
deg
?
F
@J[
n1
h
@
3
I
logn
@U@
-competitive, w.h.p. Setting h
C
min
M
H
N
logn
T
loglogn
P
, we obtain the congestion ratio O
?
min
M
H
N
logn
T
loglogn
P}@
,
w.h.p., and the strict congestion ratio O
?
deg
?
F
@
3
I
logn
@
, w.h.p.
Usually, it is assumed that the bandwidths of the levels decrease geometrically in
direction to the root. Let Bidenote the sum of the bandwidths of all edges on level i
with level Hedges being incident on the leaf nodes. A fat-tree is called realistic, if, for
each level i,Bi
k
α
I
Bi
Y
1, for some constant α
W
1. For realistic fat-trees, we obtain,
for each 1
k
h
k
H, a caching strategy that is
?
2h
[
1
N
O
?
n1
h
@U@
-competitive, w.h.p.,
and, if h
C
H,
?
2H
[
1
N
O
?
1
@U@
-competitive, w.h.p.
78 Chapter 4. Summary and Discussion
Complete networks. Some massively parallel computers, e.g., Cray T3E and T3D
or Intel Paragon, have a network with very high bandwidth, so that not the network
is the bottleneck but the individual memory modules. In particular, remote accesses
to these modules are expensive, as local accesses are supported by additional local
caches. These systems are well modeledby a complete network of nodes withmemory
modules of uniform capacity. We aim to minimize the congestion at the nodes due to
remote accesses, that is, remote accesses increase the load at any node whereas local
accesses are free. Each node is assumed to have uniform bandwidth. We obtain a
caching strategy for complete networks with nnodes that is
?
3
N
O
?
1
@U@
-competitive,
w.h.p., and a caching strategy that is strictly
?
2logn
[
1
N
O
?
logn
@U@
-competitive, w.h.p.,
with respect to the congestion at the nodes.
The universal caching strategy. We present the universal caching strategy work-
ing on arbitrary networks for which an oblivious routing strategy is given. Each
node is assumed to have a memory module of uniform capacity, and each edge is
assumed to have bandwidth 1. For a network Gin which a permutation can be routed
obliviously with maximum expected load Lπ
?
G
@
, the universal caching strategy is
?
3
N
O
?
Lπ
?
G
@\I
deg
?
G
@V@U@
-competitive, w.h.p., and strictly
?
2logn
[
1
N
O
?
Lπ
?
G
@\I
deg
?
G
@\I
logn
@V@
-competitive, w.h.p. Thismeans that the universal caching strategy iswell suited
for networks having a degree at most logarithmic in the network size and in which a
permutation can be routed obliviously with maximum expected load at most logarith-
mic in the network size. We apply the universal caching strategy to several classes of
networks.
Cayley networks. For a Cayley network Gwith nnodes, the universal caching
strategy is
?
3
N
O
?
diam
?
G
@SI
deg
?
G
@U@
-competitive, w.h.p., and strictly
?
2logn
[
1
N
O
?
diam
?
G
@JI
deg
?
G
@JI
logn
@U@
-competitive, w.h.p.
Edge symmetric networks. For an edge symmetric network Gwith nnodes,
the universal caching strategy is
?
3
N
O
?
diam
?
G
@U@
-competitive, w.h.p., and strictly
?
2logn
[
1
N
O
?
diam
?
G
@KI
logn
@U@
-competitive, w.h.p.
Hypercubic networks. For the wrapped butterfly, cube-connected-cycles, hy-
bercube, de Bruijn, and shuffle-exchange network with nnodes, the univer-
sal caching strategy is
?
3
N
O
?
logn
@U@
-competitive, w.h.p., and strictly
?
2logn
[
1
N
O
?V?
logn
@
2
@U@
-competitive, w.h.p. Further, for the indirect butterfly network
with nnodes, the universal caching strategy is
?
3
N
O
?
1
@l@
-competitive, w.h.p., and
strictly
?
2logn
[
1
N
O
?
logn
@U@
-competitive, w.h.p.
Clustered networks. We show how the access tree strategy can be applied to
Internet-like clustered networks. A clustered network is a network that consists of
several small subnetworks, i.e., clusters, that are organized hierarchically. Commu-
nication between nodes of the same cluster is not as expensive as communication
79
between nodes of different clusters. The access trees are embedded into the clustered
network in a preprocessing step. We show that this preprocessing can be done effi-
ciently and locally for each participating cluster. Thus, we yield an efficient caching
strategy that is suitable, e.g., for managing WWW pages. In contrast to the caching
strategies currently used in the Internet, our strategy keeps all copies of a data object
consistent such that, e.g., modifications of WWW pages, are propagated to all copies
of the page. However, our strategy neglects memory capacity constraints.
Previous models for caching in networks simply aim to minimize the total com-
munication load, i.e., the sum, taken over all messages, of the size of the messages
multiplied with the length of their routing paths. This can result in bottlenecks. We
have improved upon such models as our model aims to minimize the congestion, i.e.,
the maximum, taken overall links, of the amount of data transmittedby the link, which
corresponds to distributing the load evenly among all network resources. However, in
contrast to other computational models for parallel computers, e.g., the BSP [Val90]
and the LogP model [CKP
Y
96], our congestion based model does not have any no-
tion of time. We still simply sum up the load over all time steps. Thus, our models
avoids bottlenecks in space but not in time. The BSP model, for example, assumes
that a parallel algorithm proceeds in synchronized rounds. An adequate congestion
measure for this model considers the sum, over all rounds, of the maximum edge load
in the respective round. Unfortunately, investigating caching in networks in this model
is much more complicated since shifting communication load on some of the edges
from one round to another possibly has crucial effects. Therefore, we conclude that
developing caching strategies in a cost model that incorporates some notion of time is
a challenging task.
Further, our experimental results in [KMR
Y
99] show that there is a further im-
portant cost factor in sending messages apart from the congestion. The sending of a
message by a processor is called a startup. The overhead induced by the startup pro-
cedure, inclusive the overhead of the receiving processor, is called startup cost. Since
we ignore the startup costs in our model it is an interesting question how to incorporate
startup costs in an adequate way in our congestion based model.
We have shown that our general framework yields efficient caching strategies for
several classes of networks. It can also be efficiently applied to other networks. All
that is needed is a hierarchical network decomposition that possesses similar proper-
ties to the decompositions we have described. Basically, such a hierarchical network
decomposition can be obtained by a recursively decomposition of the network. In or-
der to achieve efficient caching strategies, in each decomposition step the following
two rules have to be regarded.
Decompose the network almost along the bisection cut into two subnetworks,
such that communication between nodes of the same subnetwork is not as ex-
pensive as communication between nodes of different subnetworks.
80 Chapter 4. Summary and Discussion
Decompose the network into two almost equally sized subnetworks, such that
the associated decomposition tree is balanced.
Unfortunately, there could be a tradeoff between the two rules in arbitrary networks.
Thus, it is an interesting question of whether or not there is a variation of the recur-
sive decomposition described above that yields a caching strategy achieving close to
optimal congestion on arbitrary networks?
Bibliography
[ABF93] B. Awerbuch, Y. Bartal, and A. Fiat. Competitive distributed file al-
location. In Proceedings of the 25th ACM Symposium on Theory of
Computing (STOC), pages 164–173, 1993.
[ABF98] B. Awerbuch, Y. Bartal, and A. Fiat. Distributed paging for general
networks. Journal of Algorithms, 28:67–104, 1998.
[ALMZ96a] M. Andrews, F. T. Leighton, P. T. Metaxas, and L. Zhang. Automatic
methodsfor hidinglatencyin highbandwidth networks. In Proceedings
of the 28th ACM Symposium on Theory of Computing (STOC), pages
257–265, 1996.
[ALMZ96b] M. Andrews, F. T. Leighton, P. T. Metaxas, and L. Zhang. Improved
methodsfor hidinglatencyin highbandwidth networks. In Proceedings
of the 8th ACM Symposium on Parallel Algorithms and Architectures
(SPAA), pages 52–61, 1996.
[AP90] B. Awerbuch and D. Peleg. Sparse partitions. In Proceedings of the
31th IEEE Symposium on Foundations of Computer Science (FOCS),
pages 503–513, 1990.
[AR98] Y. Aumann and R. Rabani. An O
?
logk
@
approximate min-cut max-flow
theorem and approximation algorithm. SIAM Journal on Computing,
27(1):291–301, 1998.
[BDBK
Y
90] S. Ben-David, A. Borodin, R. M. Karp, G. Tardos, and A. Widgerson.
On the power of randomization in online algorithms. In Proceedings
of the 22th ACM Symposium on Theory of Computing (STOC), pages
386–379, 1990.
[Bel66] L. A. Belady. A study of replacement algorithms. IBM Systems Jour-
nal, 5:78–101, 1966.
[BFR92] Y. Bartal, A. Fiat, and Y. Rabani. Competitive algorithms for dis-
tributed data management. In Proceedings of the 24th ACM Symposium
on Theory of Computing (STOC), pages 39–50, 1992.
82 Bibliography
[BL97] Y. Bartal and S. Leonardi. On-line routing in all-optical networks. In
Proceedings of the 24th International Colloquium on Automata, Lan-
guages and Programming (ICALP), pages 516–526, 1997.
[BS89] D. L. Black and D. D. Sleator. Competitive algorithms for replication
and migration problems. Technical Report CMU-CS-89-201, Depart-
ment of Computer Science, Carnegie-Mellon University, 1989.
[CK99] E. Cohen and H. Kaplan. Exploiting regularities in web traffic patterns
for cache replacement. In Proceedings of the 31th ACM Symposium on
Theory of Computing (STOC), 1999.
[CKP
Y
96] D. E. Culler, R. M Karp, D. Patterson, A. Sahay, K. E. Schauser, E. E.
Santos, R. Subramonian, and T. von Eicken. LogP: A practical model
of parallel computation. Communications of the ACM, 39(11):78–85,
1996.
[CMS96] R. J. Cole, B. M. Maggs, and R. K. Sitaraman. On the benefit of sup-
porting virtual channels in wormhole routers. In Proceedings of the
8th ACM Symposium on Parallel Algorithms and Architectures (SPAA),
pages 131–141, 1996.
[CMSV96] R. Cypher, F. Meyer aufder Heide, C. Scheideler, and B. V¨ocking. Uni-
versal algorithms for store-and-forward and wormhole routing. In Pro-
ceedingsof the 28thACM SymposiumonTheoryofComputing(STOC),
pages 356–365, 1996.
[FKL
Y
91] A. Fiat, R. M. Karp., M. Luby, L. A. McGeoch, D. D. Sleator, and
N. E. Young. Competitive paging algorithms. Journal of Algorithms,
12(2):685–699, 1991.
[FW74] P. A. Franaszek and T. J. Wagner. Some distribution-free aspects of
paging performace. Journal of the ACM, 21:31–39, 1974.
[Goo94] A. J. van de Goor. Computer Architecture and Design. Addison-
Wesley, 1994.
[HR90] T. Hagerup and C. R¨ub. A guided tour of Chernoff bounds. Information
Processing Letters, 33:305–308, 1990.
[IKP96] S. Irani, A. R. Karlin, and S. Phillips. Strongly competitive algorithms
for paging with locality of reference. SIAM Journal on Computing,
25(3):477–497, 1996.
[Ira97] S. Irani. Page replacement with multi-size pages and applications to
web caching. In Proceedings of the 29th ACM Symposium on Theory
of Computing (STOC), pages 701–710, 1997.
Bibliography 83
[KLL
Y
97] D. Karger, E. Lehman, T. Leighton, M. Levine, D. Lewin, and R. Pan-
igrahy. Consistent hashing and random trees: Distributed caching pro-
tocols for relieving hot spots on the World Wide Web. In Proceedings
of the 29th ACM Symposium on Theory of Computing (STOC), pages
654–655, 1997.
[KLM
Y
97] R. R. Koch, F. T. Leighton, B. M. Maggs, S. B. Rao, A. L. Rosenberg,
and E. J. Schwabe. Work-preserving emulations of fixed-connection
networks. Journal of the ACM, 44(1):104–147, 1997.
[KMR
Y
99] C. Krick, F. Meyer auf der Heide, H. R¨acke, B. V¨ocking, and M. West-
ermann. Data management in networks: Experimental evaluation of a
provably good strategy. In Proceedings of the 11th ACM Symposium on
Parallel Algorithms and Architectures (SPAA), pages 165–174, 1999.
[KMRS88] A. R. Karlin, M. S. Manasse, L. Rudolph, and D. D. Sleator. Competi-
tive snoopy caching. Algorithmica, 3(1):79–119, 1988.
[KPR93] P. Klein, S. A. Plotkin, and S. Rao. Excluded minors, network decom-
position, and multicommodity flow. In Proceedings of the 25th ACM
Symposium on Theory of Computing (STOC), pages 682–690, 1993.
[KRVW98] C. Krick, H. R¨acke, B. V¨ocking, and M. Westermann. The DIVA (dis-
tributedvariables)library. www.uni-paderborn.de/sfb376/a2/diva.html,
Paderborn University, 1998.
[Lei92] F. T. Leighton. Introduction to Parallel Algorithms and Architectures:
Arrays
Trees
Hypercubes. Morgan Kaufmann, San Mateo, CA,
1992.
[LMP
Y
95] F. T. Leighton, F. Makedon, S. Plotkin, C. Stein, E. Tardos, and
S. Tragoudas. Fast approximation algorithms for multicommodity flow
problems. Journal of Computer and System Science, 50:228–243,
1995.
[LMRR94] F. T. Leighton, B. M. Maggs, A. G. Ranade, and S. B. Rao. Ran-
domized routing and sorting on fixed–connection networks. Journal of
Algorithms, 17:157–205, 1994.
[LRWY94] C. Lund, N. Reingold, J. Westbrook, and D. Yan. On-line distributed
data management. In Proceedings of the 2nd European Symposium on
Algorithms (ESA), 1994.
[Mey83] F. Meyer auf der Heide. Efficiency of universal parallel computers.
Acta Informatica, 19:269–296, 1983.
84 Bibliography
[Mey86] F. Meyer auf der Heide. Efficient simulations among several models
of parallel computers. SIAM Journal on Computing, 15(1):106–119,
1986.
[MMVW97a] B. M. Maggs, F. Meyer auf der Heide, B. V¨ocking, and M. Wester-
mann. Exploiting locality for networks of limited bandwidth. In Pro-
ceedings of the 38th IEEE Symposium on Foundations of Computer
Science (FOCS), pages 284–293, 1997.
[MMVW97b] B. M. Maggs, F. Meyer auf der Heide, B. V¨ocking, and M. Wester-
mann. Exploiting localityfor networks oflimited bandwidth. Technical
Report tr-rsfb-97-042, Paderborn University, 1997.
[MS91] L. A. McGeoch and D. D. Sleator. A strongly competitive randomized
paging algorithm. Algorithmica, 6(6):816–825, 1991.
[MV95] F. Meyer auf der Heide and B. V¨ocking. A packet routing protocol for
arbitrary networks. In Proceedings of the 12th Symposium on Theoret-
ical Aspects of Computer Science (STACS), pages 291–302, 1995.
[MV99] F. Meyer auf der Heide and B. V¨ocking. Shortest paths routing in arbi-
trary networks. Journal of Algorithms, 31(1):105–131, 1999.
[MVW99] F. Meyer auf der Heide, B. V¨ocking, and M. Westermann. Provably
good and practical strategies for non-uniform data management in net-
works. In Proceedings of the 7th European Symposium on Algorithms
(ESA), pages 89–100, 1999.
[MVW00] F. Meyer auf der Heide, B. V¨ocking, and M. Westermann. Caching
in networks. In Proceedings of the 11th ACM-SIAM Symposium on
Discrete Algorithms (SODA), pages 430–439, 2000.
[MW89] F.Meyerauf der Heide and R. Wanka. Time-optimalsimulationsof net-
works by universal parallel computers. In Proceedings of the 6th Sym-
posium on Theoretical Aspects of Computer Science (STACS), pages
120–131, 1989.
[OR97] R. Ostrovsky and Y. Rabani. Universal O
?
congestion
[
dilation
[
log1
Y
εn
@
local control packet switching algorithms. In Proceedings
of the 29th ACM Symposium on Theory of Computing (STOC), pages
644–653, 1997.
[PR96] C. G. Plaxton and R. Rajaraman. Fast fault-tolerant concurrent ac-
cess to shared objects. In Proceedings of the 37th IEEE Symposium on
Foundations of Computer Science (FOCS), pages 570–579, 1996.
Bibliography 85
[PRR97] C. G. Plaxton, R. Rajaraman, and A. W. Richa. Accessing nearby
copies of replicated objects in a distributed environment. In Proceed-
ings of the 9th ACM Symposium on Parallel Algorithms and Architec-
tures (SPAA), pages 311–320, 1997.
[RS94] P. Raghavan and M. Snir. Memory versus randomization in on-line
algorithms. IBM Journal of Research and Development, 38(6):683–
707, 1994.
[Spi77] J. R. Spirn. Program Behavior: Models and Measurements. Elsevier
Computer Science Library. Elsevier, Amsterdam, 1977.
[ST85] D. D. Sleator and R. E. Tarjan. Amortized efficiency of list update and
paging rules. Communications of the ACM, 28(2):202–208, 1985.
[SV96] C. Scheideler and B. V¨ocking. Universal continuous routing strategies.
In Proceedings of the 8th ACM Symposium on Parallel Algorithms and
Architectures (SPAA), pages 142–151, 1996.
[Val90] L. G. Valiant. A bridging model for parallel computation. Communi-
cations of the ACM, 33, 1990.
[V¨oc98] B. V¨ocking. Static and Dynamic Data Management in Networks. PhD
thesis, Paderborn University, Germany, December 1998.