Intelligent Prefetching and Buffering for Interactive Streaming of MPEG Videos [original]

Intelligent Prefetching and Buffering for Interactive

Streaming of MPEG Videos

Susanne Boll, Christian Heinlein, Wolfgang Klas, Jochen Wandel

Databases and Information Systems (DBIS)

Computer Science Department, University of Ulm, Germany

b oll,heinlein,klas,wandel

@informatik.uni-ulm.de

ABSTRACT

Continuous delivery of media streams like video over IP net-

works so far is mainly handled by commercial approaches

that deliver the stream forward-oriented in their own pro-

prietary format. Though some existing streaming technolo-

gies are able to adapt to varying bandwidths, they do not

provide smo oth reactions to user interactions with the con-

tinuous stream.

Wehave develop ed the

MPEG-L/MRP

strategy, an adap-

tive prefetching algorithm for the MPEG-1 video format in

combination with an intelligent buering technique that al-

lows for smo oth and quick reactions to user interactions with

the stream. With L/MRP [12] an approach already has

b een presented to deliver and buer homogeneous continu-

ous data streams like Motion-JPEG with special fo cus on

fast reaction to user interactions. In contrast, the MPEG-1

enco ding with its dierent frame typ es and the dep endencies

between frames op ens the do or to a more ne-grained adap-

tation of the continous stream. However, the complexity

of MPEG-1 calls for comprehensive adaptation and special

amendments of the L/MRP algorithm to make it an eÆcient

preloading and buering technique for MPEG-1 videos.

With the realization of

MPEG-L/MRP

in the context of

a multimedia presentation engine on top of a multimedia

rep ository we have an eÆcient means to deliver continu-

ous streams of interactivemultimedia presentations over ex-

isting IP infrastructure trying to minimize interaction re-

sp onse time and optimize loading/reloading p ortions of a

video stream.

1. INTRODUCTION

In future, users of multimedia applications will no longer b e

satised with pre-packed presentations on stand-alone sys-

tems or proprietary comp ositions embedded in Web pages

and rendered by browser plug-ins. Rather, p ersonalized in-

teractivemultimedia presentations are needed, delivered on-

demand from a multimedia server over an IP network to a

user's exible presentation environment. In this context, the

delivery of continuous multimedia data as well as its presen-

tation must be tailored to the sp ecic requirements of this

environment, i. e., the varying bandwidth, resp onse time of

the server, and the like.

The motivation of our work in the area of continuous de-

livery of interactive multimedia presentations over a net-

work stems from our research pro ject \Gallery of Cardiac

Surgery" (Cardio-OP

) [8] which aims at developing an In-

ternet-based and database-driven multimedia information

system in the domain of cardiac surgery. The users of the

system request multimedia content from dierent platforms

over dierent network connections. Video streams are of

high imp ortance in this educational environment. During

the learning pro cess, it is indisp ensable for the user to inter-

act on the stream so as to watch a scene again or jump to

another interesting part of the video. Therefore the system

must supp ort interactions and, to be user-friendly, should

react in a very responsiveway. Hence, the presentation en-

vironment demands for streaming support for continuous

media with suitable handling of user interactions.

In the pro ject context, we developed a multimedia presen-

tation engine which includes support for continuous MPEG

video streams. For this, we develop ed the

MPEG-L/MRP

algorithm to continuously deliver MPEG-1 video streams

over an IP network whichwe present in this paper.

Compared with, e. g., Motion-JPEG, the enco ding of con-

tinuous video streams with MPEG-1 oers a signicantly

higher compression rate whichis very imp ortant for a de-

livery over a network with p otentially low bandwidth. We

aim at continuously delivering the MPEG-1 stream in small

units and at buering these units in an intelligent wayat

the client such that the user is provided with a smooth and

continous presentation though the user can p ossibly carry

out VCR-likeinteractions on the stream like fast forward,

reverse, or jumping to a bo okmark in the video. The buer-

ing technique should hide the request and buering of units

and rather deliver a continuous MPEG-stream of the b est

quality that can b e currently provided to the application.

With

L/MRP

[12] we nd a preloading and buering strat-

egy for continuous streams supp orting interactions that has

proven to p erform b etter than \traditional" strategies like,

e. g., LRU, FIFO, LFU, etc.This approach, however, aims

at delivering and buering

homogeneous

continuous data

streams like Motion-JPEG with sp ecial fo cus on fast re-

action to user interactions. The complexity of MPEG-1

Partially funded by the German Ministry of Research and

Education, grant number 08C58456. Our pro ject partners

are the University Hospital of Ulm, Dept. of Cardiac Surgery

and Dept. of Cardiology, the University Hospital of Heidel-

b erg, Dept. of Cardiac Surgery, an asso ciated Rehabilitation

Hospital, the publisher Huthig-Verlag, Heidelb erg, FAW Ulm,

and ENTEC GmbH, St. Augustin. For details see also URL

http://www.informatik.uni-ulm.de/dbis/Cardio-OP/

Ulmer Informatikberichte Nr. 2000-05, April 2000

with its heterogeneous frame types of dierent imp ortance,

varying frame sizes, and inter-frame dependencies calls for

comprehensive adaptation and sp ecial amendments of the

original L/MRP algorithm to make it an eÆcient preload-

ing and buering technique for MPEG-1 videos. This pap er

presents our MPEG-1 sp ecic preloading and buer man-

agement strategy

MPEG-L/MRP

for MPEG-1 videos.

The remainder of this pap er is organized as follows: Sec-

tion 2 discusses related work. Section 3 revisits the original

L/MRP algorithm and gives a short overview of the parts

of MPEG-1 relevant to our approach. In Section 4, our

new MPEG-L/MRP approach is presented which consists

of a formal mo del and a corresp onding algorithm. Section 5

sketches the implementation of the approach and Section 6

concludes the pap er.

2. RELATED WORK

Related work, concerned with the delivery of multimedia

contentover the Internet, covers several research approaches

dealing with the adaptive streaming of MPEG videos. As

a part of the QUASAR pro ject at the Oregon Graduate

Institute [19] an MPEG player for adaptive MPEG stream-

ing over the Internet has been develop ed which addresses

resource scarceness in the end-to-end delivery. The fo cus

lies on a quality of service (QoS) mo del and an adaptation

mechanism of the player. To facilitate adaptive streaming,

the MPEG video is provided by the server in dierent qual-

ities. The stream is adapted in the temp oral dimension by

dropping B frames rst, then P frames, and nally I frames.

In addition, dierent spatial resolutions are provided as a

second variable quality dimension. Buering is applied to

comp ensate network jitter but do es not supp ort fast reac-

tions to user interactions. Another approach, the Media

Streaming Proto col [4] developed at the Universityof Illi-

nois, provides adaptive streaming of MPEG movies, to o. On

congestion, the proto col considers the dierent frame typ es

of MPEG with their frame interdependencies and, similar

to our approach, drops less imp ortant MPEG frames rst.

The client side buer is employed only to smooth the jitter

of arriving data but do es not allow for minimizing inter-

action resp onse time and reload of data after p ossible user

interactions.

In the commercial area, many approaches can b e found that

deal very well with the streaming of videos, e. g., Quicktime

[1] or Emblaze [2]. With VDOLive [17] and Real [14] ap-

proaches exist, that are furthermore able to adapt the video

stream to uctuations of the available bandwidth. For in-

stance, with the intro duction of the SureStream technology

[15], Real allows to enco de a video clip that serves for up to

six dierent bandwidths. This stream can automatically b e

adjusted to compensate for network congestions. However,

as this technique encodes multiple disjoint streams into one

le, it leads to an ination of the storage size and to redun-

dancy. However, all the commercial approaches mentioned

have in common that they operate on proprietary video for-

mats and are neither designed to support minimization of

the interaction resp onse time nor to optimize the eort for

reloading p ortions of a video stream.

With Q-L/MRP [3] an interesting application of L/MRP has

evolved. Q-L/MRP extends L/MRP with additional inter-

action sets in order to supp ort the sp ecic QoS requirements

of certain users. However, the approach do es not deal with

MPEG sp ecic preloading and replacement strategies.

3. L/MRP AND MPEG-1 REVISITED

3.1 L/MRP

L/MRP (

east/

ost

elevant for

resentation) [12] is a

buer management strategy for interactive continuous data

ows in a client/server environment. The client requests

and receives a continuous medium in small units and buers

that part of the stream that is relevant for the current and

future presentation. The main idea is to request, preload,

and buer those units that are

most relevant

to b e presented

in the near future. The sp eciality of the L/MRP strategy

here is that the preloading and buering takes into account

the interactions a user p ossibly carries out on the stream,

e. g., switch to fast forward playback or jump to a bo okmark.

By that means, the interaction resp onse time compared to

common buer management and replacement strategies is

reduced (cf. [12]).

Preloading

and

replacement

are the two

tasks the buer management strategy has to master. During

preloading the next most relevant units of the continuous

stream are determined, whereas the replacement strategy

must decide which are the least relevant units as these are

removed from the buer to free space for more relevant units.

The L/MRP buer management strategy treats the stream

as a sequence of so called

Continuous Object Presentation

Units (COPUs)

with an ascending numbering of the units.

Lo oking at a sequence of COPUs from a specic presentation

p oint

in time, the single COPUs are dierently relevant

for the current presentation which is expressed by assigning

relevance values to each COPU. Consider Figure 1 for an

illustration: The current presentation p ointis

= 43 and

the user is watching the stream at double sp eed in forward

direction. Then, every other COPU in forward direction

close to the current presentation p oint is absolutely relevant

for the up coming presentation. These COPUs form the so

called

referenced

set, as they are likely to b e referenced in

the near future. However, there are COPUs that already

have b een viewed. These b elong to the

history

set of CO-

PUs of the stream. As a user could change the direction

of the playout at any time, these COPUs are still relevant

for the presentation. Finally, the frames in forward direc-

tion which are

skipped

due to the double sp eed playout, are

relevant, to o, as the user could switch to normal sp eed play-

backat any time. These considerations can b e continued for

further interaction types such as fast backward, jumping to

b o okmarks, and the like.

The relevance of a COPU with resp ect to one of these sets is

determined by a so called

distancerelevance function

which

expresses a COPU's relevance as a function of the distance of

the COPU to the current presentation p oint

. For the ref-

erenced set, the distance relevance function is monotonously

decreasing with value 1 for the next few COPUs to b e pre-

sented. As the frames of the history and skipp ed sets are

less likely to b e presented, their distance relevance functions

are decreasing more rapidly. Given one or more relevance

functions for each COPU, an overall relevance function can

b e calculated, e. g., by taking the maximum relevance value

for each COPU. This global relevance function is then used

by the preloading and replacement of the buer. The rel-

evance value expresses which COPUs are likely to b e pre-

sented when taking into account the dierent interactions

a user could p erform on the stream. L/MRP tries to keep

those most relevant COPUs in the client buer to achievea

quick and smooth reaction to the user interaction. Dep end-

ing on the buer size those COPUs ab ove a certain relevance

value are kept in the buer and those b elow the threshold

value are not loaded/are removed from the buer to make

ro om for the more/most relevant COPUs. Whenever the

presentation p oint

pro ceeds, the relevance values are re-

calculated, the COPUs to be preloaded are determined and

the COPU(s) with the least relevance value in the buer are

replaced.

Referenced Skip

0.5

45 46 47444342414039383736 48 49 50 51 52 53 54 55 56 57 58 59

Relevance

COPU Indices

HistoryInteraction Sets:

pPresentation Point

Figure 1: L/MRP: interaction sets and relevance

values

3.2 MPEG-1

The MPEG-1 standard [6] is a coding format for audio and

video streams. In this pap er, we are concerned with video

streams only [7]. The main feature of MPEG-1 that is inter-

esting in this pap er is that frames are no longer indep endent

of each other as is the case with, e. g., Motion-JPEG which

is a series of single JPEG [18] images. Figure 2 shows a se-

quence of MPEG frames and their interdep endencies which

are relevant for deco ding the stream. An MPEG-1 video se-

quence in general consists of three dierent frame typ es,

and

. Usually, the frames from one I frame up to the frame

b efore the next I frame form a so called

Group of Pictures

(GoP). Since I frames (intra-coded pictures) are enco ded

similarily to JPEG images, their deco ding is indep endentof

other frames. The deco ding of P frames (predictive co ded

pictures) dep ends on the preceding I or P frame of the same

GoP. For B frames (bidirectionally co ded pictures) deco d-

ing dep ends on b oth the preceding and the succeeding I or

P frame. P and B frames allowamuch higher compression

rate than I frames by exploiting temp oral prediction using

motion vectors. It is imp ortant to note that the display

order in which the frames are presented is dierent from

the bitstream order in which the frames are decoded due

to inter-frame dep endencies. Figure 2 illustrates b oth the

display order and the bitstream order of a stream. The or-

der for deco ding is very imp ortant as a preloading strategy

must of course consider the order of deco ding and not only

of displaying the frames.

A preloading and buer management strategy for MPEG-1

video must pay attention to the dierent frame typ es and

Display order:

Bitstream order by frame number:

264 7850 9 ...

BB BB BBPPII

12345 78960

Figure 2: MPEG frame types and their interdepen-

dencies

their inter-frame dependencies, the bitstream order for de-

co ding the stream, and the fact that the bitrate/data rate

of the video and the size of the frames can heavily vary.

4. MPEG-L/MRP MODEL

4.1 Overview of MPEG-L/MRP

Basic idea

So far the L/MRP approach has proven [12] to b e sup erior

to traditional preloading and buering strategies esp ecially

when it comes to fast reaction to user interactions. The ba-

sic idea of

MPEG

-L/MRP is to provide the same interaction

resp onsiveness as achieved with L/MRP but in particular to

takeinto account the specic features of the MPEG video

stream. The dierent frame typ es with their inter-frame

dep endencies and their dierent imp ortance for the presen-

tation are the main issue when adapting L/MRP to MPEG.

The MPEG-L/MRP strategy exploits the knowledge ab out

the imp ortance and dep endencies of the frames such that

the video can be optimally presented under the available

network bandwidth. Therefore, the interaction sets and the

asso ciated relevance functions of the L/MRP strategy are

adapted such that they reect this specic importance of

frames for the presentation. When frames do not arrivein

time at the client, temp oral adaptation is used in order to

maintain a continuous presentation.

Choosing the appropriate COPU size

The rst issue of adapting L/MRP to MPEG streams is the

kind and size of the data that forms a COPU. The COPUs

are the basic units for transp ortation of the stream. Lo ok-

ing at MPEG-1 there are dierent possibilities to dene a

COPU:

A COPU corresponds to a GoP.

The rather big size

of the COPU might be a problem. If such a COPU cannot

be delivered to the client, in average half a second of the

video is missing. This size is also unsuitable for, e. g., a fast

forward presentation of the video, since all frames had to b e

loaded to the client though only a subset of them would b e

needed.

A COPU corresp onds to a part of a GoP.

[5] prop osed

to use IBB or PBB groups. However, the groups and there-

fore the COPUs are then dep endent on each other. And this

restricts the supp orted co ding scheme of the MPEG stream

to IBBPBB...PBB patterns.

A COPU corresp onds to a frame.

Here, still the

COPUs are dep endent on each other like the frames of the

MPEG stream are. However, this granularity allows for fast

and targeted reaction to varying network bandwidth and

user interactions.

We decided to use the third alternative as it oers the most

appropriate p ossibility to comp ensate uctuations in the

available network bandwidth and, at the same time, oers

supp ort for fast and smo oth reactions to user interactions on

the stream. This decision serves as the basis for the formal

mo del to follow.

4.2 The MPEG-L/MRP Model

Overview

In this subsection, the MPEG-L/MRP mo del will b e devel-

op ed step by step. Following some preliminary denitions,

weintro duce

presentation sets

as a means to collect those

frames whichhave to b e displayed for a particular kind of

presentation of a video, such as normal playback, double

sp eed presentation, and so on. Since P and B frames cannot

b e deco ded indep endently, additional Ior P frames might

b e necessary to actually deco de and display the frames of

a sp ecic presentation set. These inter-frame dep endencies

are captured by

dependency sets

, leading to the notion of

closedpresentation sets

Afterwards, static and dynamic

relevance functions

are de-

ned as a means to quantify the relevance of frames con-

tained in a particular presentation set. While static rele-

vance functions are used to assign relevance values to frames

surrounding a static

reference frame

(representing, e. g., a

b o okmark), dynamic relevance functions are needed to com-

pute the relevance values of frames surrounding the

current

presentation point

which is constantly moving in time during

a normal presentation of the video. Both static and dynamic

relevance functions are based on

generic relevance functions

which dene relevance values indep endent of a particular

reference frame or the current presentation p oint.

Finally,a

global relevance function

is intro duced which com-

bines the relevance values of static and dynamic relevance

functions into a single overall relevance value for each frame

of the video which will b e used by the MPEG-L/MRP algo-

rithm to determine preloading candidates and replacement

victims.

Remark:

For readers familiar with the details of the origi-

nal L/MRP mo del [12], it should b e noted that the formal

mo del evolved in several asp ects in order to adapt it to the

sp ecial requirements of the MPEG video format. In particu-

lar, the notion of

interaction sets

containing

pairs

of frames

(or COPUs) and relevance values (determined by so called

distancerelevance functions

) has b een split into two orthog-

onal concepts:

presentation sets

containing frames only on

the one hand, and

relevance functions

assigning relevance

values to frames on the other hand. By that means, inter-

frame dep endencies can b e captured quite easily byintro-

ducing dep endency sets which are completely indep endent

of the concept of relevance values. Furthermore,

generic rel-

evance functions

, which are translated to a particular frame

and restricted to a particular presentation set in order to ob-

tain static and dynamic relevance functions, are somewhat

easier to use than the corresp onding

distancerelevance func-

tions

of the original mo del, esp ecially when frames are not

equidistantly distributed within a presentation set.

Preliminary Definitions

Let, as usual, IN=

;

;:::

b e the set of natural num-

b ers and

Z =

:::;



;



;

;:::

the set of integer

numb ers. For

Z, let

denote the set of integers from

0to

, i. e.,



;

;:::;k



for



;

k; k

;:::;



;

for

For a subset



Z of integers, let



! f

;

the

characteristic function

assigning a value of 1 to all

memb ers of

and 0 to all other numb ers:



(



1 for

0 otherwise

Presentation Sets

For a particular video comprising

IN frames, let

;

;:::;n



b e the set of its

frame numbers

in display order. Further-

more, let

, and

be pairwise disjoint subsets of

representing the set of all I, P, and B frames of the video,

resp ectively. Assuming that a video do es not contain other

frame types (in particular, D frames), it holds:

[

presentation set

is a subset



of frames whichhave

to b e displayed for a particular kind of presentation of the

video. For instance, the presentation set



s; i

;s;

s;:: :

sp ecies the set of all frames

whichhave to b e displayed

for a forward or backward presentation of the video with a

relative sp eed (or

skip factor

)of

IN.

Dependency Sets

Due to inter-frame dependencies, in order to b e able to de-

co de and display the frames of a particular presentation

set

, it might be necessary,however, to deco de additional

frames. These inter-frame dependencies are captured by the

dependency set

(

)



containing all frames

which

are directly or transitively needed to deco de and display

frame

. Using the auxiliary denitions

(

) = max



and

(

) = min

[



sp ecifying the

closest preceding I frame

of frame

and the

closest succeeding I or P frame

of frame

, resp ectively,

(

)

can b e dened as follows:

(

g[f

[

(

)



(

)

Since

(

) =

(

) for an I frame

, it holds

(

) =

in that case, which means that no additional

frame is needed to deco de an I frame. For a P frame

it holds

(

)

(

), and thus

(

) contains

and all

preceding P frames up to and including the closest preced-

ing I frame. The same holds for a B frame

, but since

(

)

< f < P

(

) in that case,

(

) contains the closest

succeeding I or P frame of

,too.

Remark:

For

(

) and

(

)tobewell-dened for all frames

, the rst frame of a video must be an I frame and its

last frame must b e an I or P frame. Without these restric-

tions, the video would not be standard-conforming, however.

Closed Presentation Sets

The

closure



of a presentation set



can b e dened as

the set



[

(

)

containing all frames which are actually needed for a partic-

ular kind of presentation, either directly b ecause they have

to b e displayed or indirectly due to inter-frame dependen-

cies. A presentation set

is called

closed

,if



holds.

Given the denition of

ab ove, the closure



comprises,

for example, all frames which are actually needed for a pre-

sentation of the video with a relative sp eed of

. Intersecting



with one of the sets

,or

, yields the pairwise dis-

joint sets



, and



containing all I, P, or B frames, resp ectively, necessary for

such a presentation.

If the co ding scheme of a video is a constant rep etition of

the pattern illustrated in Figure 3 (i) (followed by a nal I

frame), the presentation set

contains all frames depicted

as shaded b oxes. Since this set comprises all I and P frames

of the video, it is already closed, i. e., it holds



in that case. The presentation set

on the other hand,

illustrated in Figure 3 (ii) is not closed, since additional

P frames identied by black arrows are needed to deco de the

B frames that are to b e presented at a skip factor of 3. That

means, that the closure



contains 6 instead of 4 frames out

of each 12-frame pattern IBBBPBBBPBBB resulting in an

overhead of roughly 50%.

Bframes to be

presented

frames needed

for decoding

IBB

additional

BBBBBPP

IPPBBB BBB BBB

(i) 2

F :

(ii) F :

Figure 3: Presentation sets at dierent skip rates

Relevance Functions

generic relevance function

is a function



;

1] as-

signing a

relevance value



(

)

;

1] to eachinteger num-

ber

Z. Typically, a generic relevance function is either

monotonously increasing

for



0 and zero-valued for

or zero-valued for

0 and

monotonously decreasing

for



0. For instance, the linear functions



(



max(





x=a

)

;

0) for

;

0 otherwise

;

with a p eak value of

;

1] for

= 0 and positivevalues

for

(

Z) are typical examples of generic

Since the sizes of I, P, and B frames are usually quite dierent,

this is indeed only a rough estimation.

relevance functions (cf. Figure 4 (i) for



0 and (ii) for



0).

(i)

(ii)

Figure 4: Typical generic relevance functions

static relevance function

is a function



;

1] as-

signing a relevance value



(

)

;

1] to each frame

Typically, a static relevance function



is constructed by

translating

the peak of a generic relevance function



to a

sp ecic

referenceframe

and

restricting

its domain to

a particular presentation set





(



(



)





(

)

dynamic relevance function

is a function



;

assigning a relevance value

(

f; p

)

;

1] to each frame

taking into account the current presentation p oint

. Similar to a static relevance function, a dynamic

relevance function is usually constructed by translating and

restricting a generic relevance function



, where the current

presentation point

replaces the static reference frame

(

f; p



(



)





(

) for some



As noted ab ove, static relevance functions are typically used

to describ e b o okmarks where the reference frame

sp ecies

the p osition of the b o okmark in the video, while dynamic

relevance functions are needed to mo del dynamic presenta-

tions of the video like normal playback, reverse playback,

fast forward, etc., where the current presentation p oint

constantly moving in time.

Global Relevance Function

Given a set of static relevance functions



;:::;

and a

set of dynamic relevance functions

;:::;Æ

, the

global rel-

evance function





;

1] is dened as an appro-

priate combination of these functions, e. g., by computing a

weighted maximum value

for each frame



(

f; p

) = max



max

;:::;k





(

)

;

max

;:::;m





(

f; p

)



Here, the weighting factors

;:::;!

;

1] and



;:::;

;

1] can b e used as global regulators similar to the slide

controls of a sound mixer.

Examples

Example 1 – Forward Presentations

For a skip factor

IN, let



, and



be generic

relevance functions

which are zero-valued for

x <

0 and

monotonously decreasing for



0. Since I frames are gen-

erally more imp ortant than P frames which are in turn more

imp ortant than B frames, the relationship



(

)





(

)





(

) shall hold for all

Z (cf. Figure 5 for a typical

example using a skip factor

= 1).

Relevance

0.25

0.50

0.75

1.00

3 6 9 12 15 18



Figure 5: Generic relevance functions



Based on these generic relevance functions, dynamic rele-

vance functions

, and

can be dened using the

presentation sets

, and

, resp ectively, dened ear-

lier (cf. Figure 6, again using

= 1):

(

f; p

) =



(



)





(

)

;

(

f; p



(



)





(

)

;

(

f; p



(



)





(

)

Relevance

0.25

0.50

0.75

1.00

p p

+12

+15

+18

Figure 6: Dynamic relevance functions

Dep ending on the current presentation p oint

, these func-

tions express the relevance values of those I, P, and B frames,

resp ectively, which will b e needed in the near future for a for-

ward presentation of the video with a skip factor of

. Since

the presentation sets

, and

are pairwise disjoint, at

most one of the values

(

f; p

(

f; p

), and

(

f; p

) will

b e p ositive for each frame

, while the others are zero.

Combining the functions

, and

into a global rel-

evance function using weight factors



=1,

yields the function



shown in Figure 7:



(

f; p

) = max



(

f; p

)

;

(

f; p

)

;

(

f; p

)



the

denotes the direction of the playout

Relevance

0.25

0.50

0.75

1.00

p p

+12

+15

+18

Figure 7: Global relevance function



Example 2 – Backward Presentations

Replacing the generic relevance functions



, and



with their reected counterparts



, and



, resp ec-

tively, which are monotonously increasing for



0 and

zero-valued for

0, yields analogous dynamic relevance

functions

(

f; p



(



)





(

) for

I; P; B

expressing the relevance values of those

frames which will

b e needed for a backward presentation of the video in the

near future (cf. Figure 8 for

= 1).

Relevance

0.25

0.50

0.75

1.00



Figure 8: Dynamic relevance functions

Remark:

Since the frames

(

)

are necessary to de-

co de frame

, their relevance values should b e greater than

that of

. For the \forward functions"

describ ed ab ove,

this requirement is normally fullled due to the monotony

of the generic functions



and the global relationship











. The only critical case here is a P frame

(

)

succeeding

a B frame

. By choosing functions



and



whose dierence is large enough, it can b e guaran-

teed, however, that the desired condition

(

)

(

)is

always satised.

For the \backward function"

, however, the monotony

of the generic function



is partially counter-pro ductive,

since it yields the undesired relationship

(

)



(

) for

a P frame

and all frames

(

)



. To remedy

this aw, a correction term can b e included in the denition

which reduces the relevance value of a frame

by a small amount

for every frame

(

)

that is necessary to deco de

(cf. Figure 9):

(

f; p





(



)







(

)

where

(

)

j

Relevance

0.25

0.50

0.75

1.00



Figure 9: Dynamic relevance functions

with

mo died

Combining the forward functions

, and

and the

backward functions

, and

into a globale relevance

function using weight factors



= 1 and



= 0

75 expressing that the overall relevance of

backward playing should be weighted 0.75 compared with

the overall relevance of forward playing, yields the function



shown in Figure 10:



(

f; p

) = max(



(

f; p

)

;



(

f; p

)

;



(

f; p

)

;



(

f; p

))

Relevance

0.25

0.50

0.75

1.00



Figure 10: Global relevance function



Example 3 – Bookmarks

Setting a b o okmark at a sp ecic frame

should al-

low a user to jump to that frame at any time and con-

tinue the presentation of the video at that p oint. There-

fore, frames surrounding a b o okmark frame

should have

the same relevance values as those surrounding the current

presentation p oint

. Thus, dep ending on the presentation

directions (forward/backward) and the skip factors

which should be allowed after jumping to a b o okmark, static

relevance functions



and p ossibly



should b e dened

for each b o okmark frame

based on the generic relevance

functions



and



, resp ectively,intro duced ab ove:



(



(



)





(

) for

I; P; B;



(



(



)





(

) for

I; P; B:

In order to include these functions into a global relevance

function like, e. g.,



, their weighting factors

and

should dep end on factors like the frequency b o okmarks are

jump ed to, the numb er of b o okmarks whichhave b een set,

and the available buer size. If, for instance, the number

of b o okmarks is small with respect to the buer size and

b o okmarks are referenced frequently, factors

might be sensible in order to give bo okmarks the same over-

all relevance as the current presentation point. If, on the

other hand, the numb er of bo okmarks is very large, smaller

weighting factors havetobechosen in order to avoid that

b o okmark frames completely push out frames which are rel-

evant for the current presentation.

4.3 MPEG-L/MRP Algorithm

Preloading and Replacement of Frames

The MPEG-L/MRP algorithm, which is based on the buer

management algorithm presented in [12], integrates preload-

ing and replacement of frames. Using the global relevance

function



, those frames of an MPEG video are determined

that are to be loaded next as they are most relevant for

the presentation. If the buer is full, the global relevance

function is also used to determine those frames that are to

be removed to make ro om for more relevant ones. List-

ing 4.1 shows the essential parts of the algorithm which are

explained in the following.

The buer containing the currently loaded frames is rep-

resented by an ob ject

of type

Buffer

supplying metho ds

full()

load()

toss()

, and

undo()

explained b elow. To

simplify notations,

is also used as a set comprising all cur-

rently loaded frames.

The function

LoadMostRelevantFrames()

, which is called

whenever the current presentation p oint

changes, main-

tains two frames,

and

, with corresponding relevance val-

ues,

and

(cf. Figure 11). The

load threshold

rep-

resents the maximum relevance value max



(

f; p

) of all

frames

of the movie which are currently

not

loaded, while the

toss threshold

represents the minimum

relevance value min



(

f; p

) of all frames

which

are

currently loaded. Thus it is known, that all frames with

a global relevance value greater than

are already loaded,

while those having a global relevance value lower than

are

not loaded. The

load candidate

and the

toss victim

are frames with global relevance values



(

l; p

) =

and



(

t; p

) =

, resp ectively. In the snapshot shown in

Figure 11, it holds for example

+ 9 and



Relevance

0.25

0.50

0.75

1.00



Figure 11: Load and toss thresholds

and

The function rep eatedly selects a load candidate

(line 5)

and loads it into the buer (line 11), causing the load thresh-

old

= max



(

f; p

) to gradually decrease. As long as there

is not enough buer space for the load candidate

,however

(line 6), a toss victim

is selected (line 7) and tossed out of

the buer (line 9), causing the toss threshold

= min



(

f; p

)

to gradually increase. Since frame sizes mightvary heavily,

it might indeed b e necessary to toss several frames

out of

the buer before b eing able to load the load candidate

Listing 4.1

LoadMostRelevantFrames

1 Buffer

;

3 LoadMostRelevantFrames(int

)

4 while (true)

5 let

where



(

l; p

)= max



(

f; p

)

;

6 while (

.full(

))

7 let

where



(

t; p

) = min



(

f; p

)

;

8 if (



(

t; p

)

>

(

l; p

)

.undo(); return;

.toss(

);

.load(

);

If the toss threshold



(

t; p

) becomes greater than the

load threshold



(

l; p

) (line 8), i. e., load and toss thresh-

olds meet each other, a stable state is reached causing the

function to return without loading the current load candi-

date

anymore. In that case, one or more toss victims

might already have b een tossed out of the buer in order to

make ro om for

. Since the available buer space is still not

suÆcient for

,however, these victims might b e put backinto

the buer in order to avoid unused buer space. For that

reason, the metho d call

.toss(

)

(line 9) only

tentatively

tosses the frame

, e. g., by

marking

it for deletion, while

the actual tossing is p erformed implicitly by the next call

.load(

)

(line 11). Alternatively, the call to

.undo()

(line 8) undo es all tentative tossing operations, e. g., by un-

marking all marked frames, causing them to remain in the

buer.

Remark:

Even though

.toss(

)

do es not actually toss

frame

out of the buer, it will b e treated as removed by

subsequent calls to

.full()

, of course. Furthermore,

virtually removed from the set

to avoid that it b ecomes

the toss victim in line 7 again and again.

Optimizations

The algorithm presented so far, esp ecially the selection of

the load candiate

and the toss victim

(line 5 and 7, re-

sp ectively), is describ ed in a rather abstract way to makeit

more comprehensible. Hence, it is not directly usable for a

real-world implementation, since a straight-forward imple-

mentation of the \where clauses"



(

l; p

)= max



(

f; p

) and



(

t; p

) = min



(

f; p

) would require a lo op over all frames

and

, resp ectively, and a computation of the

global relevance value



(

f; p

) for each such frame

. Given a

typical frame rate of 20 to 30 frames p er second, a 10 minute

movie comprises 12,000 to 18,000 frames for which



(

f; p

)

would have to b e computed. Fortunately, the typical prop-

erties of generic relevance functions mentioned b efore, allow

a signicant reduction of the numb er of necessary computa-

tions, as will b e briey explained in the following.

Consider, for example, a generic relevance function



(

)

which is zero-valued for

0 and monotonously decreasing

for



0, and an arbitrary set



Z of integers which

is partitioned into two subsets



x <

and



containing negative and non-

negative memb ers of

, resp ectively. If

= min

and

= max

denote the borders of the subset

, then

the extreme values min



(

) and max



(

) of the generic

relevance function



can b e computed as follows:

min



(



;



;



(

)

;



;

max



(



(

)

;

That means, that at most one function value



(

) has to b e

computed in order to determine the minimum or maximum

value of



over an arbitrarily large set

Applied to a dynamic relevance function

(

f; p

) which is con-

structed by translating and restricting a generic relevance

function



to a presentation set



, this means, that at

most one function value

(

f; p

) has to b e computed in order

to determine the minimum or maximum value of

over an

arbitrary subset of

. In particular, if



is any subset

of frames, min

(

f; p

) and max

(

f; p

) can b e determined

by computing at most one function value

(

f; p

Finally, if a global relevance function



is dened as

the weighted maximum of a set of dynamic relevance

functions

;:::;Æ

with corresp onding presentation sets

;:::;S

and weighting factors



;:::;

, its maximum

value max



(

f; p

) can be determined by simply compar-

ing the weighted maximum values





max

(

f; p

) of the

dynamic functions

(

= 1

;:::;m

). Since each of these

maximum values, as explained above, requires at most one

function evaluation, the global maximum value needed in

line 5 of the algorithm (Listing 4.1) can b e computed very

eÆciently,too.

If the presentation sets

;:::;S

are pairwise disjoint, it

can be shown that the minimum value min



(

f; p

) of the

global relevance function



, needed in line 7 of the algorithm,

can also be determined by comparing the weighted mini-

mum values





min

(

f; p

) of the dynamic functions

(

= 1

;:::;m

) and thus can b e computed very eÆciently,

Due to symmetry prop erties, the same holds for a generic rele-

vance function



which is monotonously increasing for



0 and

zero-valued for

If a dynamic relevance function

is modied as shown in Fig-

ure 9, this statement do es no longer hold: Since the function's

monotony is partially violated, its extreme values over a set of

frames might not o ccur

exactly

at, but only

near

the borders

of the set. Nevertheless, they can b e determined by computing

a few

function values near the b orders, where the exact number

dep ends on the maximum numb er of consecutive P frames in the

video.

to o. Otherwise, things b ecome a bit more diÆcult, since a

frame

might belong to more than one presentation

set

causing it to possibly p ossess multiple p ositive rele-

vance values

(

f; p

). If one of them represents the global

minimum value over all frames

and all relevance func-

tions

,itwould b e selected as the toss victim, despite the

fact that it might possess another, greater dynamic rele-

vance, to o. In order to avoid this problem,

virtual frames

are intro duced as pairs (

f; j

)

f

;:::;m

b elonging

to at most one \virtual presentation set"

f

(which

is isomorphic to the real presentation set

). Technically

sp eaking that means, that the buer

might contain mul-

tiple

references

(representing virtual frames) to the same

physical frame

, and that a physical frame is only

tossed out of the buer if the last virtual frame which refer-

ences it is tossed.

In principle, static relevance functions needed to supp ort

b o okmarks can be treated completely analogously to dy-

namic functions. If their numb er becomes large, however,

even more sophisticated optimizations can be employed.

Since static relevance values do not change with the cur-

rent presentation p oint, it is p ossible to precompute and

store them in a global sorted list which can be treated like

single

monotonous function.

Video Stream Construction

In parallel with the preload and replacement function

just describ ed, a video stream construction function

ConstructVideoStream()

is executed which re-assembles

the individual frames contained in the buer into a coher-

ent MPEG video stream which can b e handed to a player.

Typically, this function is called by the player (via an appro-

priate interface) whenever it needs another group of pictures

to display, i. e., when the current presentation point

has

changed. Therefore,

ConstructVideoStream()

in turn calls

LoadMostRelevantFrames()

in a parallel thread in order to

up date the buer contents. If

LoadMostRelevantFrames()

has not nished when

ConstructVideoStream()

is called the

next time, it is interrupted and called again using the new

current presentation p oint

4.4 Temporal Adaptation

When applying the MPEG-L/MRP buering technique for

transmitting an MPEG video over a network connection,

it might happen that some frames cannot be used by an

MPEG-player, b e it b ecause they are not delivered in time or

b ecause they are broken due to transmission failures. Many

existing MPEG players are not able to cop e with missing

frames or at least cannot maintain a timely or continuous

presentation, resp ectively. Therefore, the player must be

provided with an MPEG stream in which the missing frames

are replaced by corresp onding surrogate frames. The struc-

ture of the MPEG format allows to dene a frame that is

able to reference the whole content of a preceding I or P

frame. By providing such a frame as a surrogate for a miss-

ing one, a correct stream can b e constructed that maintains

the time synchronous playout. However, to maintain the

inter-frame dependencies of the original stream, it must b e

assured that each of the surrogate frames is inserted at the

right p osition in the stream. By constructing a syntacti-

cally correct MPEG video stream with surrogate frames,

anyavailable MPEG player can present the modied video.

In our implementation, this kind of temp oral adaptation is

applied to complete the MPEG stream when frames do not

arrive in time and deliver the adapted stream to the MPEG

player.

4.5 Physical Storage Management

In order to eÆciently realize the buering of frames in

the MPEG-L/MRP algorithm, an adequate physical storage

management has to b e employed. Due to the high variability

of frame sizes in MPEG videos, this is a non-trivial task. We

analyzed several memory management strategies in search

for an appropriate storage management of MPEG frames

in the client's memory. The requirements a go o d physical

storage manager must meet are eÆcient management and

go o d utilization of the buer and a high degree of reusage

of freed memory. Therefore, a dynamic management of the

buer is indisp ensable. Awell-known technique for dynamic

buer management is the binary buddy algorithm [9]. Pe-

terson and Norman [13] analyzed dierentvariants of buddy

systems (binary, Fib onacci, and weighted) and showed that

they all suer from a high fragmentation. Due to this rea-

son, a more suitable memory management algorithm had to

b e found. In a simulation, we compared a bunch of buddy

systems, namely DTSS buddy [10], Starburst buddy [11],

exact buddy, and as a reference system the binary buddy.

The simulation evaluated the internal and external fragmen-

tation, reusability of the memory, and the numb er of used

buddy segments. We made diverse test runs with dierent

MPEG videos and varying parameters. In a nutshell one can

say, that the Starburst buddy was the most suitable algo-

rithm for storing MPEG frames in a client buer. Although

it p erforms slightly worse than the genuine buddy system, it

is more eÆcient in the utilization and the reusabilityofthe

memory. So we decided to implement the physical storage

management of the MPEG-L/MRP algorithm based on the

Starburst buddy strategy.

5. IMPLEMENTATION

Since L/MRP has b een develop ed as a typical client/server

architecture, we have implemented the MPEG-L/MRP

strategy employing a client/server architecture, to o. The

client buer following the MPEG-L/MRP buer manage-

ment strategy has b een implemented using Java and the

Java Media Framework (JMF) [16]. The JMF oers an easy

to use and extendable API for the presentation of continuous

media. We realized the client buer providing an interface

that meets the requirements of the JMF players such that

the standard MPEG player of JMF can b e connected. The

client buer requests the data from a server corresp onding

to our strategy and delivers a correct MPEG-1 data stream

to the player. On the server side we employ the Informix

Dynamic Server / Universal Data Option (IDS/UD) to man-

age { among media of other typ es { the MPEG videos. Due

to its extensibility, manifesting in so called DataBlades, the

IDS/UD can b e extended to supp ort frame-based access to

MPEG videos. On top of the IDS/UD we implemented a

server in Java to p erform the task of transfering the re-

quested frames from the media server to the client buer.

6. CONCLUSION

In this pap er, we presented the MPEG-L/MRP buering

technique providing

interactive

and

adaptive

streaming of

MPEG videos that exploits MPEG-specic features suchas

dierent frame typ es with dierent imp ortance and their

inter-frame dep endencies.

The ma jor fo cus of the strategy is the suitable supp ort for

interactive

video streaming. With this, applications like our

multimedia information system in cardiac surgery, that de-

livers interactivemultimedia learning material over the In-

ternet, can b e supp orted in suchaway that a user feels much

more comfortable with the system. With MPEG-L/MRP,

we deliveramuch b etter

Quality of Presentation

due to the

smo oth reactions to the exp ected frequent user interactions.

Consequently,we deliver a much b etter

Quality of Informa-

tion

to the user who { not to be forgotten { might have

to pay for the multimedia material presented. Additionally,

the dierent frame typ es of MPEG op en a new p ossibility,

compared to homogeneous streams like Motion-JPEG, to

adapt the stream to dierent bandwidths. Besides the in-

teraction, we presented how the MPEG-L/MRP algorithm

exploits the dierent frame typ es and the dierent impor-

tance of the frames for an

adaptive

delivery of frames to the

presentation client. Another advantage of the employment

of a standard video format like MPEG-1 is, that one is no

longer b ound to the proprietary video formats of commer-

cial applications. Since MPEG-2 oers further p ossibilities

for adaptation, we consider applying an adapted version of

MPEG-L/MRP to this format.

As we are concerned with the construction of multimedia

rep ositories, MPEG-L/MRP forms one building blo ck of

such a rep ository that provides comprehensive support for

dierent kinds of multimedia information systems. On each

architectural level and on each granularity of the comp o-

nents of the rep ository envisioned, we try to build mo dules

that follow the overall goal to supp ort multimedia applica-

tions that, with regard to the quality of information and

with regard to the qualityof delivery and presentation, is

optimally adapted to the user context and system environ-

ment. In this context, to please a user with a continuous,

smo oth, and b est quality presentation, MPEG-L/MRP is an

imp ortant mo dule to provide smooth interactive delivery of

MPEG-1 videos.

Acknowledgements

We would like to thank Joachim

Wiedmann for his contributions to the design and implemen-

tation of the MPEG-L/MRP strategy and Michael Menth

for his valuable comments on the topic.

7. REFERENCES

[1] Apple. Quicktime streaming.

http://develop er.apple.com/techpubs/quicktime/

qtdevdo cs/RM/p drame.html.

[2] Emblaze. Streaming Audio and Video { Solutions over

IP Networks. http://www.emblaze.com.

[3] P. Halvorsen, V. Goeb el, and T. Plagemann.

Q-L/MRP: A Buer Management Mechanism for QoS

Supp ort in a Multimedia DBMS. In

International

Workshop on Multimedia Database Management

Systems

,Dayton, 1998.

[4] C. K. Hess and R. H. Campb ell. Media Streaming

Proto col: An Adaptive Proto col for the Delivery of

Audio and Video over the Internet. In

Proc. of IEEE

International Conference on Multimedia Computing

and Systems (ICMCS'99)

, Florence, Italy, June 1999.

[5] S. Hollfelder and H.-J. Lee. Data Abstractions for

Multimedia Database Systems. Technical Rep ort 1075,

GMD, Darmstadt, Germany, 1997.

[6] ISO.

Information Technology { Coding of Moving

Pictures and AssociatedAudio for Digital Storage

Media at up to about 1,5 Mbit/s { Part 1: Systems

ISO/IEC 11172-1, 1993.

[7] ISO.

Information Technology { Coding of Moving

Pictures and AssociatedAudio for Digital Storage

Media at up to about 1,5 Mbit/s { Part 2: Video

ISO/IEC 11172-2, 1993.

[8] W. Klas, C. Greiner, and R. Friedl. Cardio-OP |

Gallery of Cardiac Surgery. In

Proc. of IEEE

International Conference on Multimedia Computing

and Systems (ICMCS'99)

, Florence, Italy, June 1999.

[9] D. E. Knuth.

The Art Of Computer Programming

volume 1: Fundamental Algorithms. Addison-Wesley,

Reading, Massachusetts, second edition, 1973.

[10] P.D.L.Koch. Disk File Allocation Based on the

Buddy System.

ACM Transactions on Computer

Systems

, 5(4), 1987.

[11] T. J. Lehman and B. G. Lindsay. The Starburst Long

Field Manager. In

Proceedings of the 15th VLDB

Conference

, Amsterdam, 1989.

[12] F. Moser, A. Krai, and W. Klas. L/MRP: A Buer

Management Strategy for Interactive Continuous Data

Flows in a Multimedia DBMS. In

Proceedings of the

21st VLDB Conference

,Zurich, Switzerland, 1995.

[13] J. L. Peterson and T. A. Norman. Buddy Systems.

Communications of the ACM

, 20(6):421{431, 1977.

[14] Real Networks. Real Video Technical White Paper,

Feb. 1997.

http://www.real.com/devzone/library/whitepap ers/

overview.html.

[15] Real Networks. SureStream - Delivering Sup erior

Quality and Reliability, Feb. 2000.

http://www.realnetworks.com/devzone/library/

whitepap ers/surestrm.html.

[16] SUN Microsystems. Java Media Framework 2.0, 1999.

http://java.sun.com/pro ducts/java-media/jmf/2.0.

[17] VDOnet. VDOLive. http://www.vdo.net.

[18] G. K. Wallace. The JPEG Still Picture Compression

Standard.

Communications of the ACM

, 34(4):30{44,

1991.

[19] J. Walp ole, R. Koster, S. Cen, C. Cowan, D. Maier,

D. McNamee, C. Pu, and D. Steere. A Player for

Adaptive MPEG Video Streaming over the Internet.

Proceedings of the 26th Applied Imagery Pattern

Recognition Workshop (AIPR-97), SPIE

,Washington

DC, USA, Octob er 1997.