A Comparison of Multimedia Document Models Concerning Advanced Requirements

Technical Report - Ulmer Informatik-Berichte No. 99-01

Department of Computer Science, University of Ulm, Germany.


Susanne Boll, Wolfgang Klas, Utz Westermann

Databases and Information Systems (DBIS),

Computer Science Department, University of Ulm, Germany

{boll, klas, westermann}@informatik.uni-ulm.de

Abstract

Existing multimedia document models like HTML, MHEG, SMIL, and HyTime lack appropriate modeling primitives to meet the specific requirements of advanced multimedia information system applications. In traditional multimedia applications, multimedia document models only had to cope with modeling the temporal, spatial, and interactive course of a multimedia presentation. However, we seriously question whether existing models fit the needs of next generation multimedia applications, which bring up requirements like the reusability of multimedia content in different presentations and contexts, and adaptation to user preferences. In this paper, we motivate and present new requirements stemming from advanced multimedia applications and the resulting consequences for multimedia document models. Along these requirements, we discuss HTML, HyTime, MHEG, SMIL, and ZYX, a new model that has been developed with special focus on reusability and adaptation. The analysis and comparison of the models show the limitations of existing models, point the way to the need for new flexible multimedia document models, and throw light on the many implications for authoring systems, multimedia content management, and presentation.

Keywords:

Multimedia document model, multimedia databases, educational medical applications.

1 Introduction

The initial requirements for multimedia documents were the modeling of the temporal and spatial course of a multimedia presentation. Soon the importance of interactivity for multimedia applications was understood, and interaction modeling became an additional requirement. To offer suitable support for multimedia applications, the development of multimedia document models began. On the one hand, standardization activities started; on the other hand, commercial tools evolved. The development and passing of standards took quite a long time, while in the meantime very sophisticated commercial multimedia authoring tools came to the market that support their own proprietary formats and are only now starting to bridge the gap to document standards.

However, we think that the standards and the commercial tools developed so far only partially offer the necessary prerequisites for multimedia document modeling, as "next generation" multimedia applications extend the requirements given above by far: they demand reusability of the media, including entire documents and parts of documents, modeling of adaptation to user-specific needs, context-dependent and fine-grained reuse of the multimedia material, and widespread use on the Internet.

Why do we consider these to be the new requirements for multimedia documents? As authoring multimedia information is a very time-consuming and costly task, the reuse of material is definitely of high interest from an economic point of view alone. But reuse by means of "cut and paste" obviously cannot be a solution; rather, distinct and fine-grained reuse of multimedia content is in high demand. Personalization and adaptation of information systems to personal needs and interests are becoming more and more important (e.g., [Bul98]). The trend to offer users the most suitable, narrowed-down multimedia information can be seen in research prototypes from different research areas, e.g., a user-adaptive diagram assistant [C-L98], an adaptive tutorial agent [SSW98], adaptive textbooks on the WWW [EBS97], a personalized newspaper [KBA93], personalized delivery of news [KLAV98], etc. Consequently, personalization and adaptation call for enhancing multimedia content with metadata to allow for the targeted, context-specific selection of multimedia content. Another new requirement for a document model is its Internet applicability, i.e., how well it can cope with the demands of the heterogeneous environment of the Internet.

Within our project "Gallery of Cardiac Surgery" (Cardio-OP), which aims at the development of an Internet-based and database-driven multimedia information system in the domain of cardiac surgery, we find a representative application that explicitly requires a model for multimedia material that can be extensively reused in different contexts. Based on a multimedia content repository, the system is going to serve as a common information and education base for its different types of users (physicians, medical lecturers, students, and patients), who are provided with multimedia information according to their user-specific requests to the multimedia information system, their different understanding of the selected subject, and their location and technical infrastructure. For example, a high-quality multimedia presentation dynamically composed during a lecture on the university campus should be available to students at home for revision, although they do not need the high quality of videos or images at home. Therefore, either the multimedia material must be delivered to the student in lower quality and at a lower bitrate, or high-data-volume parts like video must be replaced by "comparable" but less voluminous parts like a slide show.

To achieve this kind of functionality, a suitable multimedia document model is needed that allows for flexible and context-dependent reuse of a multimedia document and its parts. On our way to such a "next generation" multimedia document model, we first tracked down and identified the advanced requirements by looking at upcoming advanced applications like Cardio-OP. When investigating the applicability of the existing models HTML, HyTime, MHEG, and SMIL, we unfortunately found serious limitations and drawbacks. It is quite a challenge to push these limits and to define a multimedia document model providing modeling primitives that go beyond those found in existing models. An example of such a model is ZYX, presented in [BK99].

The implications of approaches trying to resolve the shortcomings of existing models are manifold: there is an urgent need for suitable authoring systems that support fine-grained reuse of multimedia content, adaptability of content to user needs and individual interests, and, as a direct consequence, the presentation-neutral representation of material, e.g., in a database. The latter is analogous to the principle of data independence of applications, well known from database systems. What is known as data independence for "traditional" applications must be enhanced by presentation independence for multimedia applications. In addition, the presentation-neutral representation of multimedia content directly impacts the design of presentation tools, since these have to "deliver" flexibility and adaptability to end users.

In this paper, we motivate the development of next generation multimedia document models like ZYX [BK99] by identifying advanced application requirements, analyzing existing document models to show the limits of current approaches, and calling for a concerted effort to develop next generation authoring and management tools for advanced multimedia applications.

The remainder of the paper is organized as follows: Section 2 presents the new requirements for multimedia document models. Section 3 introduces the reader to the different models for multimedia documents we compare in this paper: HTML, SMIL, MHEG-5, HyTime, and ZYX. Section 4 presents the comparison of the models along the requirements identified in Section 2. The paper concludes with a reflection on the analysis and points the way to the future of multimedia document models.

2 Requirements for Next Generation Multimedia Document Models

In this section, we identify requirements for multimedia document models. These can be divided into traditional requirements, which we consider to be imperative for any multimedia document model, and advanced requirements, which we expect to be demanded more and more by future multimedia applications. The availability of a temporal model and a spatial model, as well as support for the modeling of interaction, are traditional requirements, while reusability of multimedia document content, adaptation to user-specific needs, and presentation-neutral representation of multimedia document content are advanced requirements. Each of these requirements is motivated and illustrated in its different facets in the following subsections. The requirements form a metric along which selected multimedia document models are analyzed in Section 4.

Footnote: Cardio-OP (Gallery of Cardiac Surgery) is partially funded by the German Ministry of Research and Education, grant number 08C58456. Our project partners are the University Hospital of Ulm, Dept. of Cardiac Surgery and Dept. of Cardiology, the University Hospital of Heidelberg, Dept. of Cardiac Surgery, an associated Rehabilitation Hospital, the publishers Barth-Verlag and dpunkt-Verlag, Heidelberg, FAW Ulm, and ENTEC GmbH, St. Augustin. For details, see also URL www.informatik.uni-ulm.de/dbis/Cardio-OP/

2.1 Temporal Model

As the presentation of multimedia documents is time-dependent, one of the basic requirements for a multimedia document model is the modeling of the temporal course of the presentation. Thus, a temporal model must be provided to describe temporal dependencies between the media elements that a multimedia document comprises. We find three types of temporal models: point-based temporal models, interval-based temporal models, and event-based temporal models.

In the point-based model, the temporal extent of each media element in the multimedia document is modeled by points in time. These determine at which point on the time axis the presentation of a media element starts and ends, respectively. For any two points in time, one of the relationships before (<), after (>), or equals (=) holds. This is a simple representation of time with a small number of temporal relationships.
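As a minimal sketch of the point-based view (in Python, assuming a hypothetical `MediaElement` record that reduces each media element to a start and an end point in seconds), the only relations that can be derived between two points are the three listed above:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaElement:
    """Hypothetical media element reduced to two points on the time axis."""
    name: str
    start: float  # point in time at which its presentation starts (seconds)
    end: float    # point in time at which its presentation ends (seconds)


def point_relation(p: float, q: float) -> str:
    """Return which of the three point relations holds between p and q."""
    if p < q:
        return "before"
    if p > q:
        return "after"
    return "equals"


video = MediaElement("video", start=0.0, end=30.0)
caption = MediaElement("caption", start=30.0, end=45.0)

print(point_relation(video.end, caption.start))    # equals: caption starts as video ends
print(point_relation(video.start, caption.start))  # before
```

Any richer statement, such as "the caption follows the video", has to be expressed as a comparison of such points.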

Existing representations of temporal aspects in the context of multimedia presentations are mainly based on some or all of the 13 binary temporal relations between time intervals as defined by Allen [All83].

These models (e.g., Object Composition Petri Nets (OCPN) [LG93]), however, do not support time intervals of unknown duration, which occur, for instance, in the context of user interaction in multimedia presentations. Therefore, enhanced interval-based temporal models have been proposed to handle open time intervals and indefinite interval relationships [DK95, HFK95, WR94].

In an event-based model of time, events determine the temporal course of the presentation. An event is connected to actions, and when an event occurs, e.g., a video reaches a certain point in time, the corresponding actions, typically the start and stop of the presentation of other media elements, are carried out.
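The mechanism can be sketched in Python as a small registry that connects events to actions. The `Presentation` class and the event name `video@20s` are hypothetical; they only illustrate how an event, here a video reaching second 20, triggers the start and stop of other media elements.

```python
from collections import defaultdict
from typing import Callable


class Presentation:
    """Minimal event-based presentation: events trigger registered actions."""

    def __init__(self) -> None:
        self.actions: dict[str, list[Callable[[], None]]] = defaultdict(list)
        self.running: set[str] = set()
        self.log: list[str] = []

    def on(self, event: str, action: Callable[[], None]) -> None:
        """Connect an action to an event."""
        self.actions[event].append(action)

    def start(self, element: str) -> None:
        self.running.add(element)
        self.log.append(f"start {element}")

    def stop(self, element: str) -> None:
        self.running.discard(element)
        self.log.append(f"stop {element}")

    def fire(self, event: str) -> None:
        """Carry out all actions connected to the event, in registration order."""
        for action in self.actions[event]:
            action()


p = Presentation()
# When the video reaches second 20, stop the intro audio and start a slide.
p.on("video@20s", lambda: p.stop("intro-audio"))
p.on("video@20s", lambda: p.start("slide-1"))

p.start("video")
p.start("intro-audio")
p.fire("video@20s")
print(p.log)  # ['start video', 'start intro-audio', 'stop intro-audio', 'start slide-1']
```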

Another way to specify temporal relations between media elements is the use of scripts, i.e., programs written in a scripting language that can comprise temporal operations. If the scripting language is a complete programming language, this mechanism allows for very complex and powerful specifications of temporal dependencies between media elements.

2.2 Spatial Model

If a presentation consists of visual media elements, not only the temporal synchronization of these elements is of interest but also their spatial positioning on the presentation medium (e.g., a window). This positioning can be specified by the use of a spatial model. In general, three approaches to spatial models can be distinguished: absolute positioning, directional relations, and topological relations.

With absolute positioning, the media element is placed on the presentation area at a fixed absolute position specified by a coordinate pair. To handle overlapping, a third value may be introduced that defines the ordering of overlapping media elements.
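A Python sketch of absolute positioning, assuming a hypothetical `Placement` record that holds the coordinate pair and the optional third value (called `z` here) for ordering overlapping elements; a renderer that paints in ascending `z` order puts elements with higher values on top:

```python
from dataclasses import dataclass


@dataclass
class Placement:
    element: str
    x: int      # absolute horizontal position in pixels
    y: int      # absolute vertical position in pixels
    z: int = 0  # stacking order: higher values are painted later, i.e., on top


def drawing_order(placements: list[Placement]) -> list[str]:
    """Return element names in the order a renderer would paint them."""
    # sorted() is stable, so elements with equal z keep their document order.
    return [p.element for p in sorted(placements, key=lambda p: p.z)]


layout = [
    Placement("caption", x=20, y=300, z=2),
    Placement("video", x=0, y=0, z=1),
    Placement("background", x=0, y=0),
]
print(drawing_order(layout))  # ['background', 'video', 'caption']
```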

A more exible way to dene the spatial p ositioning of visual media elements is the sp ecication of

directional

relations [PTSE95, PS94], like

north

north-west

etc. At a ner granularity,byintro ducing re-

lations like

strong-north

and

weak-north

to sp ecify overlapping, 169 dierent directional relations b etween

two rectangles in 2D space can b e distinguished [PTSE95].
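The following Python sketch derives a coarse directional relation between two axis-aligned rectangles from their projections onto the x- and y-axis. It is a deliberate simplification: it yields only the eight compass directions plus `overlapping`, whereas the finer model cited above also distinguishes relations such as strong-north and weak-north. (Note that 169 = 13 × 13, the number of combinations obtained by applying Allen's 13 interval relations to each of the two projections.) The `Rect` type is hypothetical, and the y-axis is assumed to grow upwards.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x1: float  # left
    y1: float  # bottom
    x2: float  # right
    y2: float  # top


def direction(a: Rect, b: Rect) -> str:
    """Coarse direction of rectangle a as seen from rectangle b (y grows upwards)."""
    vertical = "north" if a.y1 >= b.y2 else "south" if a.y2 <= b.y1 else ""
    horizontal = "east" if a.x1 >= b.x2 else "west" if a.x2 <= b.x1 else ""
    if vertical and horizontal:
        return f"{vertical}-{horizontal}"
    # Projections overlap on both axes: the rectangles themselves overlap.
    return vertical or horizontal or "overlapping"


title = Rect(0, 80, 100, 100)
video = Rect(0, 0, 100, 70)
logo = Rect(110, 80, 130, 100)

print(direction(title, video))  # north
print(direction(logo, video))   # north-east
```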

Another way to dene spatial relationships is by the use of top ological relations [EF91]. Between any

two continuous region ob jects, the following eight top ological relations can b e distinguished:

disjoint

meet

overlap

covers

covered-by

contains

inside

, and

equal
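For the special case of axis-aligned rectangles (a simplification of the general region objects in [EF91]), the eight relations can be computed directly. The following Python sketch treats rectangles as closed point sets, so rectangles that touch only at their boundaries are classified as meet, and containment with boundary contact as covers or covered-by; the `Rect` type and function names are hypothetical.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x1: float
    y1: float
    x2: float
    y2: float


def _within(a: Rect, b: Rect, strict: bool) -> bool:
    """True if a lies in b; with strict=True, a must not touch b's boundary."""
    if strict:
        return b.x1 < a.x1 and a.x2 < b.x2 and b.y1 < a.y1 and a.y2 < b.y2
    return b.x1 <= a.x1 and a.x2 <= b.x2 and b.y1 <= a.y1 and a.y2 <= b.y2


def topological_relation(a: Rect, b: Rect) -> str:
    """Return the topological relation of rectangle a with respect to rectangle b."""
    if a == b:
        return "equal"
    ix1, ix2 = max(a.x1, b.x1), min(a.x2, b.x2)
    iy1, iy2 = max(a.y1, b.y1), min(a.y2, b.y2)
    if ix1 > ix2 or iy1 > iy2:
        return "disjoint"   # no common point at all
    if ix1 == ix2 or iy1 == iy2:
        return "meet"       # only the boundaries touch
    if _within(b, a, strict=True):
        return "contains"
    if _within(b, a, strict=False):
        return "covers"
    if _within(a, b, strict=True):
        return "inside"
    if _within(a, b, strict=False):
        return "covered-by"
    return "overlap"


print(topological_relation(Rect(0, 0, 4, 4), Rect(0, 0, 2, 2)))  # covers
```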

2.3 Interaction

A distinctive feature of a multimedia document model is the ability to specify user interaction in order to let a user choose between different presentation paths. Multimedia documents without user interaction are not very interesting, as the course of their presentation is exactly known in advance and, hence, could be recorded as a movie. For the modeling of user interaction, one can identify at least two basic types of interaction: navigational interactions and design interactions.

With navigational interactions, a user can determine the flow of a multimedia presentation. An example is the selection of a link or an item from a menu to decide which presentation path is to be followed.

Design interactions influence the visual and audible layout of a presentation. Examples are the adjustment of speaker volume, fonts, scaling of images, and the like.

2.4 Reusability

As motivated in the introduction, reusability of multimedia content is a desired feature of a next generation multimedia document model. Reusability of document content can be characterized along three dimensions: the granularity of reuse, the kind of reuse, and the selection and identification of reusable components.

Granularity: The granularity of reuse determines what can be reused. Regarding multimedia document models, we can distinguish at least three levels of granularity of reusable components: reuse of complete multimedia documents, reuse of fragments of multimedia documents like single scenes or chapters, and reuse of individual atomic media elements such as a video or an audio clip.

Kind of re-usage: For all three levels of granularity, we distinguish between different ways in which material is reused for the composition of new documents: identical re-usage, i.e., the components are reused including all temporal, spatial, design, and interaction relationships and constraints as originally specified by the author(s), and structural re-usage, i.e., we separate the layout from the structure of components and reuse only the structural parts.

Selection and identication:

Before we can reuse comp onents wehaveto

identify

and

select

them

within an information system. This calls for metadata and a mechanism for classifying, indexing, and

querying comp onents. Hence, a do cument mo del should provide supp ort for annotation of reusable

comp onents with metadata.
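The three dimensions can be illustrated with a small Python sketch: components of any granularity carry metadata, a query over that metadata selects candidates, and structural re-usage drops the layout while keeping the structure. The `Component` class, its metadata keys, and the `find` and `structural_copy` helpers are hypothetical names chosen for this illustration; "CABG" abbreviates coronary artery bypass grafting.

```python
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass
class Component:
    """Hypothetical reusable component of any granularity."""
    name: str
    granularity: str                # "document", "fragment", or "media element"
    metadata: dict[str, str]        # annotations used for selection
    children: list["Component"] = field(default_factory=list)
    layout: Optional[dict] = None   # author's spatial/design settings


def find(components: list[Component], **criteria: str) -> list[Component]:
    """Selection: components whose metadata match all given criteria."""
    return [c for c in components
            if all(c.metadata.get(key) == value for key, value in criteria.items())]


def structural_copy(c: Component) -> Component:
    """Structural re-usage: keep structure and metadata, drop the layout."""
    return replace(c, layout=None,
                   children=[structural_copy(child) for child in c.children])


clip = Component("bypass-video", "media element",
                 {"topic": "CABG", "level": "student"}, layout={"x": 0, "y": 0})
slides = Component("bypass-slides", "media element",
                   {"topic": "CABG", "level": "expert"})
scene = Component("surgery-scene", "fragment", {"topic": "CABG"},
                  [clip, slides], layout={"background": "black"})

hits = find([clip, slides, scene], topic="CABG", level="student")
print([c.name for c in hits])  # ['bypass-video']
reused = structural_copy(scene)  # identical re-usage would simply reuse `scene`
print(reused.layout, reused.children[0].layout)  # None None
```

The original components stay untouched, so the same material can be reused identically in one document and structurally in another.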

2.5 Adaptation

The presentation of multimedia documents preferably depends on the user context; hence, the multimedia presentation needs to be adapted to this user context. It is also of interest, however, whether all possible adaptation alternatives have to be known and modeled at the authoring time of a multimedia document or whether they are left for evaluation at the actual presentation time.

Parameters of adaptability: For the user context, we distinguish between adaptation to personal interest and adaptation to technical infrastructure. Consider a professor on campus who is interested in seeing in-depth multimedia material on coronary artery bypass grafting, and an undergraduate student at home who needs only an abstract of the same material. In this example, the presentation needs to be adapted to the personal interest, here identified by the subject "coronary artery bypass grafting" and the professional levels "professor" and "student". In addition to this kind of semantic adaptation of multimedia documents, the multimedia presentation can be adapted according to the technical capabilities of the environment a user is working in, i.e., "on campus" or "at home". The professor may run a high-quality presentation on the university campus, which provides excellent network bandwidth and computing power, whereas the student views the presentation at home without the same excellent technical prerequisites. A document model supports these types of adaptation if it sufficiently supports the modeling of user-specific and system-specific parameters as "input parameters" for adaptation.

Denition of presentation alternatives:

Dep ending on

when

the dierent \alternatives" are dened

that can b e exploited for adaptation, we distinguish b etween

static adaptation

and

dynamic adaptation

With static adaptation the adaptable alternatives must b e known and included in the do cumentat

authoring time. Whereas for dynamic adaptation the available alternatives are determined due to the

sp ecic context at presentation time. One could therefore say that mo dels that allow static and/or

dynamic adaptation allows for \early and/or late adaptation binding".
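The difference between early and late adaptation binding can be sketched in Python. A hypothetical `UserContext` carries the two kinds of input parameters discussed above (professional level and available bandwidth). In the static case, the alternatives are enumerated in the document at authoring time and only the choice happens at presentation time; in the dynamic case, the alternatives themselves are looked up in a repository when the presentation is requested. All names, thresholds, and repository entries are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class UserContext:
    level: str           # personal interest, e.g., "professor" or "student"
    bandwidth_kbps: int  # technical infrastructure


# Static adaptation: alternatives fixed in the document at authoring time,
# each guarded by a condition on the user context (first match wins).
# This particular document only distinguishes by bandwidth.
STATIC_ALTERNATIVES = [
    (lambda ctx: ctx.bandwidth_kbps >= 1500, "bypass-video-high.mpg"),
    (lambda ctx: True, "bypass-slideshow"),
]


def adapt_static(ctx: UserContext) -> str:
    for condition, media in STATIC_ALTERNATIVES:
        if condition(ctx):
            return media
    raise LookupError("no alternative applies")


# Dynamic adaptation: candidates are determined at presentation time,
# e.g., by querying a content repository that may have changed since authoring.
def adapt_dynamic(ctx: UserContext, repository: list[dict]) -> str:
    candidates = [m for m in repository
                  if m["level"] in (ctx.level, "any")
                  and m["min_kbps"] <= ctx.bandwidth_kbps]
    if not candidates:
        raise LookupError("no suitable media in repository")
    # Prefer the richest medium the infrastructure can carry.
    return max(candidates, key=lambda m: m["min_kbps"])["name"]


repo = [
    {"name": "bypass-video-high.mpg", "level": "professor", "min_kbps": 1500},
    {"name": "bypass-video-low.mpg", "level": "any", "min_kbps": 300},
    {"name": "bypass-slideshow", "level": "any", "min_kbps": 0},
]
professor = UserContext("professor", bandwidth_kbps=10_000)
student = UserContext("student", bandwidth_kbps=56)
print(adapt_static(professor), adapt_static(student))
print(adapt_dynamic(professor, repo), adapt_dynamic(student, repo))
```

Adding a new entry to `repo` changes the outcome of `adapt_dynamic` without touching the document, which is the point of late binding.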

2.6 Presentation-neutral Representation

Reuse of multimedia content in dierent context do es not mean that the material is presented always

identically. Rather reuse of contentmay require structural reuse of material and assignment of dierent

visual and audible layout according to the context. In addition, advanced distributed multimedia applica-

tions often face a heterogeneous environment with regard to op erating systems and hardware platforms.

It is desirable that the multimedia material of such an application can b e presented within this hetero-

geneous environment with minimal implementation eort. Thus, it makes p erfect sense to try to reuse

existing presentation software, e.g., HTML browsers, MHEG engines, on these systems.

As a consequence, the multimedia material has to be modeled in a presentation-neutral way, i.e., independent of the actual realization (layout) of a presentation. This is a challenging problem, as it calls for the automatic conversion of the multimedia document model used for the presentation-neutral description of multimedia content into the multimedia document model used for the presentation of the multimedia content. In general, two major characteristics influence the convertibility between multimedia document models: the multimedia functionality and the semantic level of a model [RvOB97].

Multimedia functionality: The multimedia functionality of a multimedia document model describes the expressiveness of its modeling primitives. In a conversion process, this means that if the target document model does not offer a multimedia functionality equivalent to that of the source model, the conversion will be lossy.

Semantic level: The semantic level of a multimedia document model plays an important role in the automatic conversion for presentation. If the target document model of such a conversion describes multimedia content on a higher semantic level than the source document model, i.e., it describes structure rather than presentation, the conversion requires analyzing the document specified in the source model and deriving its semantics for encoding in the target model. In general, this requires knowledge about the multimedia content that often only the author has. In these cases, automatic conversion will not be possible. However, the automatic conversion of a multimedia document represented on a high semantic level into a model based on a comprehensive set of low-level semantic constructs can be performed much more easily. In order to avoid the problems of automatic conversion, the presentation-neutral representation of multimedia content should, besides covering rich multimedia functionality, take place on a high semantic level.
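Why the direction of conversion matters can be seen in a small Python sketch. A description on a high semantic level (nested sequential and parallel composition, encoded here as hypothetical tuples) can be converted mechanically into a low-level, point-based timeline of absolute start and end times. The reverse direction would have to guess from bare numbers whether two elements were meant to run in parallel or merely happen to start at the same moment, which is exactly the author knowledge mentioned above.

```python
# A node is either a media element (name, duration) or a composition
# ("seq" | "par", [children]). The names "seq" and "par" are reserved.
# This encoding is hypothetical.

def duration(node) -> float:
    kind, payload = node
    if kind == "seq":
        return sum(duration(child) for child in payload)
    if kind == "par":
        return max((duration(child) for child in payload), default=0.0)
    return payload  # atomic media element: payload is its duration


def to_timeline(node, start: float = 0.0) -> list[tuple[str, float, float]]:
    """Flatten a seq/par tree into (element, start, end) points in time."""
    kind, payload = node
    if kind == "seq":
        timeline, t = [], start
        for child in payload:
            timeline += to_timeline(child, t)
            t += duration(child)  # the next child starts when this one ends
        return timeline
    if kind == "par":
        return [entry for child in payload for entry in to_timeline(child, start)]
    return [(kind, start, start + payload)]


lecture = ("seq", [
    ("intro-slide", 5.0),
    ("par", [("surgery-video", 30.0), ("narration", 25.0)]),
    ("summary-slide", 10.0),
])
for name, begin, end in to_timeline(lecture):
    print(f"{name:14} {begin:5.1f} {end:5.1f}")
```

The output contains only points in time; the information that the video and the narration belong together is gone, which is why the opposite conversion cannot be automated in general.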

3 Existing Multimedia Document Models

In this section, we briey present the most imp ortant and relevant existing standards and data mo dels for

multimedia do cuments. We giveanintro duction to the existing do cument mo dels HTML, SMIL, MHEG-

5, HyTime, and also to Z

X, an example for a mo del oering more advanced mo deling primitives. These

mo dels will b e analyzed and compared along the requirements of the previous section in Section 4.

3.1 HTML

The Hypertext Markup Language (HTML) [RLJ98] is based on SGML [ISO86] and defines a syntax to enrich text pages with structural information using SGML elements. For instance, elements can be inserted into the text to organize it into paragraphs, to mark headings of different levels, to define tables, and to define quotations. Furthermore, it is possible to include various kinds of objects like media elements (e.g., images, videos, and audio tracks), Java applets, ActiveX components, and scripts. In addition, HTML allows for the definition of hyperlinks between documents. These hyperlinks are a means to define interactions, i.e., an interaction (e.g., a mouse click) with the link anchor results in the presentation of the document specified by the link target. Scripts, applets, and ActiveX components included with a document are executed at presentation time by the presentation environment, the so-called HTML browser software. However, the HTML standard defines neither the syntax nor the semantics of the scripting languages, so the presentation behaviour of an HTML page that includes scripts depends on the employed browser software.

There are eorts of the large HTML browser software vendors Netscap e and Microsoft to allow for the

manipulation of the structure, layout, and content of a HTML do cument with scripting languages. Thus,

scripts can dynamically manipulate HTML do cuments, a technique which is also called

Dynamic HTML

(DHTML)

. The price for this increased exibility is that p ortability problems arise due to dierences

between the scripting languages employed by Microsoft and Netscap e.

3.2 SMIL

The Synchronized Multimedia Integration Language (SMIL) [HBB+98] is a W3C standard that aims at synchronized multimedia presentations on the web. A SMIL document provides synchronization of continuous media elements and constitutes an integrated presentation. SMIL is defined by an XML DTD [BPSM98] and, hence, the language can be understood as a set of element definitions specified in terms of XML. SMIL defines schedule elements to describe temporal synchronization between media elements. Furthermore, the spatial layout of the media elements can be defined. SMIL also allows the specification of links between documents or parts of documents, which are equivalent to HTML links. An interesting feature of SMIL is the switch element, which is a simple means of modeling alternatives in the course and quality of a presentation. With the help of switch elements, an author can specify different presentation alternatives, one of which is chosen at presentation time based on external parameters.
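The selection rule of the switch element can be sketched in Python with the standard `xml.etree.ElementTree` module. In SMIL 1.0, the player evaluates the children of a switch in document order and picks the first one whose test attributes all evaluate to true. The sketch handles only the `system-bitrate` and `system-language` test attributes, and its language test is simplified to an exact match against one preferred language; the media file names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

SMIL_SWITCH = """
<switch>
  <video src="bypass-high.mpg" system-bitrate="1500000"/>
  <video src="bypass-low.mpg" system-bitrate="56000" system-language="de"/>
  <img src="bypass-still.gif"/>
</switch>
"""


def acceptable(element: ET.Element, bitrate: int, language: str) -> bool:
    """Evaluate the supported test attributes of one switch child."""
    required = element.get("system-bitrate")
    if required is not None and bitrate < int(required):
        return False
    languages = element.get("system-language")
    if languages is not None and language not in [l.strip() for l in languages.split(",")]:
        return False
    return True  # attributes that are absent do not restrict the choice


def select(switch: ET.Element, bitrate: int, language: str) -> Optional[ET.Element]:
    """Return the first acceptable child, or None if no child is acceptable."""
    for child in switch:
        if acceptable(child, bitrate, language):
            return child
    return None


switch = ET.fromstring(SMIL_SWITCH.strip())
print(select(switch, bitrate=2_000_000, language="en").get("src"))  # bypass-high.mpg
print(select(switch, bitrate=64_000, language="de").get("src"))     # bypass-low.mpg
print(select(switch, bitrate=64_000, language="en").get("src"))     # bypass-still.gif
```

Because the first acceptable child wins, authors typically order alternatives from most to least demanding and end with an unconditional fallback, as the `img` element does here.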

3.3 MHEG-5 and MHEG-6

MHEG-5 [ISO95, JR95] is an adaptation of the MHEG-1 standard [MBE95] to the needs of video-on-demand and kiosk applications for set-top boxes and low-end PCs. MHEG-5 encodes applications in their final form and aims at an efficient realization of MHEG-1, attracting the interest of the telecommunication and entertainment industries in this standard. MHEG-5 provides an object-oriented data model for multimedia documents. The standard defines a hierarchy of MHEG-5 classes. This hierarchy comprises classes for various uses. For example, there are classes that represent media elements like videos and audios, classes that represent interaction elements like buttons, and even classes that provide the variable functionality of programming languages. Classes possess attributes, can perform actions (which closely resemble methods in object-oriented programming languages), and fire events. An MHEG-5 document is a collection of instances of these classes organized in scenes, which are the main structural primitives.

A scene corresponds to a "page" on a screen and, hence, only one scene can be presented at a time. In addition, each MHEG-5 document features one instance of the class Application defining the entry point for the document's presentation. Moreover, this application object can contain objects that are global to every scene. The presentation behaviour of an MHEG-5 document is defined by means of links, which resemble event-condition-action rules.
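Since MHEG-5 links resemble event-condition-action rules, their evaluation can be sketched as follows. This is a Python illustration of the idea, not the MHEG-5 encoding: a hypothetical `Link` names an event source and an event type, optionally requires specific event data (the condition), and carries a list of actions (its effect). Object names and event names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Link:
    source: str                       # object that fires the event
    event_type: str                   # kind of event, e.g., a timer or user input
    event_data: Optional[Any] = None  # condition: required event data (None = any)
    effect: list[Callable[[], None]] = field(default_factory=list)

    def matches(self, source: str, event_type: str, data: Any) -> bool:
        return (self.source == source and self.event_type == event_type
                and (self.event_data is None or self.event_data == data))


class Engine:
    """Dispatches fired events to all links whose condition holds."""

    def __init__(self, links: list[Link]) -> None:
        self.links = links

    def fire(self, source: str, event_type: str, data: Any = None) -> int:
        triggered = [link for link in self.links if link.matches(source, event_type, data)]
        for link in triggered:
            for action in link.effect:
                action()
        return len(triggered)  # number of links whose effect was executed


shown: list[str] = []
engine = Engine([
    # Event: the "next" button receives "select" -> action: show scene 2.
    Link("next-button", "UserInput", "select", [lambda: shown.append("scene-2")]),
    # Event: timer 1 of the video fires -> action: show a caption.
    Link("video", "TimerFired", 1, [lambda: shown.append("caption")]),
])
engine.fire("video", "TimerFired", 2)  # condition fails, nothing happens
engine.fire("video", "TimerFired", 1)
engine.fire("next-button", "UserInput", "select")
print(shown)  # ['caption', 'scene-2']
```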

MHEG-6 [ISO96, Hof96] is an extension of MHEG-5 that introduces an interface between an MHEG-5 engine and a Java Virtual Machine. With MHEG-6, it is possible to include Java programs in an MHEG-5 document. Such a program has access to the objects of the document and, hence, can influence its presentation behaviour.

3.4 HyTime

HyTime [ISO92, DD94, NKN91] is a standard that allows for the description of the structure of multimedia documents. Based on SGML [ISO86], HyTime provides a well-defined set of primitives that allows for the interlinking of media objects without specifying the encoding of the media objects. The primitives provided by HyTime are offered by means of architectural forms and are organized in terms of modules. Architectural forms (AF) are HyTime elements with predefined multimedia semantics and attributes. An AF can be used in any SGML DTD by extending an SGML element type with an attribute HyTime bearing the name of the AF to be used. In that way, the element type inherits the semantics and attributes of the AF.
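The architectural-form mechanism can be illustrated on an XML rendering of an SGML fragment. In the sketch below, the element type names (`lecture`, `slide`, `lecture-ref`) are hypothetical, while the value of the `HyTime` attribute names the architectural form whose semantics the element inherits; `clink` is HyTime's contextual link form, assumed here to take its target from a `linkend` attribute. In an actual SGML DTD, the `HyTime` attribute would typically be declared with a fixed value so that document instances need not repeat it; it is written inline here for brevity. The Python code, using the standard `xml.etree.ElementTree` module, simply groups elements by the form they conform to, roughly what an engine must do before it can apply each form's semantics.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DOCUMENT = """
<lecture>
  <slide id="s1">Anatomy of the heart</slide>
  <slide id="s2">Bypass procedure</slide>
  <lecture-ref HyTime="clink" linkend="s2">see the procedure</lecture-ref>
  <note>plain element without an architectural form</note>
</lecture>
"""


def elements_by_form(root: ET.Element) -> dict[str, list[ET.Element]]:
    """Group element instances by the architectural form named in their HyTime attribute."""
    forms: dict[str, list[ET.Element]] = defaultdict(list)
    for element in root.iter():
        form = element.get("HyTime")
        if form is not None:
            forms[form].append(element)
    return dict(forms)


root = ET.fromstring(DOCUMENT.strip())
links = elements_by_form(root)["clink"]
targets = {el.get("id"): el.text for el in root.iter("slide")}
for link in links:
    print(link.tag, "->", targets[link.get("linkend")])  # lecture-ref -> Bypass procedure
```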

The modules of HyTime are the Base-Module, which defines the basic concepts of HyTime; the Location-Address-Module, which implements the powerful construct of locators, providing an abstract mechanism for addressing external document objects; the Hyperlink-Module, which implements the concept of links; the Finite-Coordinate-Space-Module, which provides means for the synchronized presentation of media objects based on n-dimensional coordinate spaces; the Event-Projection-Module, which allows the transformation of event schedules defining the temporal execution of a presentation; and the Object-Modification-Module, which allows the transformation of presentation objects, e.g., fading.

3.5 ZYX

The ZYX multimedia document model [BK99] has been developed by our group in the context of the Cardio-OP project, which aims at the development of a database-driven multimedia information system with special needs for reusability, adaptation, interaction, and presentation-neutral description of multimedia content. ZYX describes complete multimedia documents or fragments of them by means of a tree (for an illustration, see Figure 1). The nodes of the tree are called presentation elements. Each presentation element has a binding point associated with it. Such a binding point can be bound to one variable of another presentation element, thus creating the edges of the tree. The presentation elements are the generic elements of the model. They can represent atomic media elements (e.g., videos, images, and text) or more complex compositions of media elements. Another group of presentation elements, the operator elements, combine presentation elements with certain semantics. There are operator elements that allow for temporal synchronization, the definition of interaction, adaptation, and the spatial, audible, and visible layout of the document (the latter are the so-called projector elements).

It is possible to delay the process of variable binding by leaving variables unbound. This allows for the definition of templates that can be customized to a specific problem at a later point in time.

Furthermore, a tree can be encapsulated by a complex media element, which can then be used in other trees (see Figure 2) like any other presentation element. Unbound variables of an encapsulated tree are exported by the complex media element, allowing for the encapsulation of templates. Thus, a complex media element is essentially a black-box view of a ZYX tree.
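The binding mechanism can be sketched in Python under simplifying assumptions: every presentation element has one binding point and a number of named variables (assumed unique within a tree), binding a child's binding point to a parent's variable creates a tree edge, and a variable left unbound turns the tree into a template. The class and method names (`PresentationElement`, `bind`, `unbound`, `ComplexMediaElement`) are hypothetical and do not reproduce the actual ZYX implementation [BK99].

```python
from typing import Optional


class PresentationElement:
    def __init__(self, kind: str, variables: tuple[str, ...] = ()) -> None:
        self.kind = kind  # e.g., "seq", "video", "slide-1"
        self.variables: dict[str, Optional["PresentationElement"]] = {v: None for v in variables}
        self.bound_to: Optional[tuple["PresentationElement", str]] = None  # binding point

    def bind(self, variable: str, child: "PresentationElement") -> None:
        """Bind the child's binding point to one of this element's variables (a tree edge)."""
        if self.variables[variable] is not None:
            raise ValueError(f"variable {variable} is already bound")
        if child.bound_to is not None:
            raise ValueError("a binding point can be bound to only one variable")
        self.variables[variable] = child
        child.bound_to = (self, variable)

    def unbound(self) -> list[tuple["PresentationElement", str]]:
        """Collect the unbound variables of the whole subtree (template parameters)."""
        result = []
        for name, child in self.variables.items():
            if child is None:
                result.append((self, name))
            else:
                result += child.unbound()
        return result


class ComplexMediaElement(PresentationElement):
    """Black-box view of a tree; the tree's unbound variables are exported."""

    def __init__(self, tree: PresentationElement) -> None:
        super().__init__("complex")
        self.tree = tree
        # Exported variable name -> (element inside the tree, its variable)
        self.exports = {name: (owner, name) for owner, name in tree.unbound()}

    def bind(self, variable: str, child: PresentationElement) -> None:
        owner, name = self.exports[variable]
        owner.bind(name, child)  # delegate to the encapsulated template

    def unbound(self) -> list[tuple[PresentationElement, str]]:
        return self.tree.unbound()


seq = PresentationElement("seq", ("v1", "v2", "v3"))
seq.bind("v1", PresentationElement("slide-1"))
seq.bind("v3", PresentationElement("slide-3"))
print([name for _, name in seq.unbound()])  # ['v2'], so seq is a template

cme = ComplexMediaElement(seq)
print(list(cme.exports))                      # ['v2']
cme.bind("v2", PresentationElement("video"))  # customize the template later
print(cme.unbound())                          # []
```

The complex media element can itself be bound into another tree through its own binding point, which is how an encapsulated template is reused like any other presentation element.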

[Figure 1: A sample ZYX tree in graphical notation (a seq element combining Slide 1, Slide 2, and Slide 3; legend: presentation element, binding point, binding edge, variable).]

[Figure 2: Template encapsulated by a complex media element in graphical notation (elements shown: complex media element, template, par, seq, Video, Audio, Text).]

4 Analysis

In this section, we analyze how the multimedia document models introduced in the previous section fulfill the requirements outlined in Section 2. First, we investigate the document models HTML, MHEG-5, HyTime, SMIL, and ZYX with respect to the traditional requirements, i.e., their temporal and spatial models and their interaction modeling. Then, we examine how these models behave concerning the advanced requirements of reusability, adaptation, and presentation-neutral description of document content. Figure 4 summarizes this analysis.

4.1 Temporal Model

HTML: As HTML has been developed for hypertext, the standard itself does not offer constructs to specify temporal synchronization between the media elements included in an HTML document. However, by the use of DHTML, the temporal course of a presentation can be programmed in a scripting language. Recently, Microsoft has proposed adding temporal synchronization support to HTML under the name HTML+TIME [SYS98]. This approach integrates HTML with the temporal modeling mechanisms introduced by SMIL, but it does not yet seem to be very mature.

MHEG-5: MHEG-5 specifies the temporal course of a presentation by means of its link concept. Presentable MHEG classes (e.g., the video and audio classes) define a variety of events relating to time. These events can be associated via links with actions that are performed when the corresponding event occurs. Thus, the temporal model of MHEG-5 is event-based.

HyTime: In HyTime, the coordinated presentation of media elements can be specified using the Finite-Coordinate-Space module. This module provides a means to define n-dimensional coordinate spaces, which can include time dimensions. Media elements can be placed into such a coordinate space. This placement is called an event and exactly defines the point in the coordinate space where the media element associated with the event will be presented. Hence, the temporal model of HyTime is point-based. Several events can be grouped into an event schedule, which describes the course of the presentation of a HyTime document.
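A finite coordinate space with events can be sketched in Python as follows. Here an event places a media element by giving, for every axis of the space, a first quantum and a number of quanta, and an event schedule is a list of such events. The class names, axis names, and units are assumptions made for illustration, not HyTime's SGML syntax. Because every event is pinned to explicit positions on the time axis, the temporal model is point-based, as stated above.

```python
from dataclasses import dataclass


@dataclass
class Axis:
    name: str
    length: int  # number of quanta on the axis, e.g., seconds or pixels


@dataclass
class Event:
    media: str
    extents: dict[str, tuple[int, int]]  # axis name -> (first quantum, quantum count)


@dataclass
class CoordinateSpace:
    axes: list[Axis]

    def check(self, event: Event) -> None:
        """Reject events that do not lie completely inside the finite space."""
        for axis in self.axes:
            first, count = event.extents[axis.name]
            if first < 0 or count <= 0 or first + count > axis.length:
                raise ValueError(f"{event.media} exceeds axis {axis.name!r}")

    def active_at(self, schedule: list[Event], t: int, time_axis: str = "time") -> list[str]:
        """Media elements whose extent on the time axis covers quantum t."""
        active = []
        for event in schedule:
            self.check(event)
            first, count = event.extents[time_axis]
            if first <= t < first + count:
                active.append(event.media)
        return active


fcs = CoordinateSpace([Axis("time", 60), Axis("x", 640), Axis("y", 480)])
schedule = [
    Event("video", {"time": (0, 40), "x": (0, 320), "y": (0, 240)}),
    Event("slide", {"time": (30, 30), "x": (320, 320), "y": (0, 240)}),
]
print(fcs.active_at(schedule, 10))  # ['video']
print(fcs.active_at(schedule, 35))  # ['video', 'slide']
```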

SMIL:

SMIL follows an interval-based approach to temporal synchronization. Each media element has an associated presentation interval. These intervals can be coordinated by means of schedule elements. In general, SMIL defines two kinds of schedule elements. On the one hand, there is the parallel element (par), which defines the parallel presentation of intervals. Using attributes, a parallel presentation can be specified in more detail; for instance, time delays, lip-sync synchronization, and loops can be specified. On the other hand, SMIL provides the sequential element (seq), which allows for the sequential presentation of intervals. Again, this presentation can be specified in more detail with the help of element attributes. The different schedule elements can be nested, thus allowing for the modeling of complex temporal relations.
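The nesting of parallel and sequential elements can be illustrated by computing the duration of a schedule. The following is a simplified model, not a SMIL parser: the fields `begin` and `repeat` loosely stand in for SMIL's delay and loop attributes, a par is assumed to end when its last child ends, and lip-sync is ignored.

```python
from dataclasses import dataclass, field
from typing import List, Union


# Simplified SMIL-like timing model: delays and loops are attributes
# (begin, repeat) attached to every element.
@dataclass
class Media:
    src: str
    dur: float
    begin: float = 0.0
    repeat: int = 1


@dataclass
class Par:
    children: List["Node"] = field(default_factory=list)
    begin: float = 0.0
    repeat: int = 1


@dataclass
class Seq:
    children: List["Node"] = field(default_factory=list)
    begin: float = 0.0
    repeat: int = 1


Node = Union[Media, Par, Seq]


def active_duration(node: Node) -> float:
    """Duration of one element including repeats, excluding its own begin offset."""
    if isinstance(node, Media):
        once = node.dur
    elif isinstance(node, Par):
        # A par ends when its last child ends; offsets count from the par's start.
        once = max((c.begin + active_duration(c) for c in node.children), default=0.0)
    else:
        # In a seq, each child's begin offset is measured from the previous child's end.
        once = sum(c.begin + active_duration(c) for c in node.children)
    return once * node.repeat


slide_show = Seq([
    Par([Media("intro.mpg", 10), Media("intro.wav", 9, begin=2)]),
    Media("slide.png", 5, repeat=3),
])
print(active_duration(slide_show))  # 26.0
```

The delayed audio (2 + 9) outlasts the video, so the par lasts 11 units, and the looped slide adds 15.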

The temporal model of ZYX is closely related to that of SMIL. ZYX defines the temporal operator elements seq and par, which resemble the sequential and parallel elements of SMIL. In contrast to SMIL, loops and time delays are not specified using attributes. Instead, they are handled by temporal operator elements of their own, the loop and the delay temporal operator elements.

Due to its close resemblance to SMIL, the ZYX temporal model must be considered interval-based.
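The following hypothetical Python model treats loop and delay as operator elements of their own, as described above for ZYX, rather than as attributes of seq and par. The class names mirror the operator names in the text, but their fields (`dur`, `count`) and the duration rules are assumptions made for illustration, not the ZYX definition.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical ZYX-style operator tree: loop and delay are element types
# of their own instead of attributes attached to par/seq.
@dataclass
class Media:
    src: str
    dur: int


@dataclass
class Delay:
    dur: int


@dataclass
class Loop:
    body: "Node"
    count: int


@dataclass
class Seq:
    children: List["Node"]


@dataclass
class Par:
    children: List["Node"]


Node = Union[Media, Delay, Loop, Seq, Par]


def duration(node: Node) -> int:
    if isinstance(node, (Media, Delay)):
        return node.dur
    if isinstance(node, Loop):
        return node.count * duration(node.body)
    if isinstance(node, Seq):
        return sum(duration(c) for c in node.children)
    if isinstance(node, Par):
        return max((duration(c) for c in node.children), default=0)
    raise TypeError(f"not a temporal operator element: {node!r}")


scene = Seq([
    # A delay is simply the first item of a nested seq.
    Par([Media("intro.mpg", 10), Seq([Delay(2), Media("intro.wav", 9)])]),
    Loop(Media("slide.png", 5), count=3),
])
print(duration(scene))  # 26
```

Because delays and loops are ordinary nodes, they can wrap or be wrapped by any other operator, which keeps the element set small and uniform.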

4.2 Spatial Model

HTML:

The control of the spatial layout of the media elements included in an HTML document is very limited. However, the concept of framesets allows the presentation area of an HTML document (i.e., the HTML browser window) to be partitioned into rectangular regions, so-called frames. Another HTML document, which can itself define further frames, can be displayed in such a frame. This allows for frame nesting. As frames are specified by their size and position, this constitutes a kind of absolute positioning.

Exact positioning of media elements is possible by means of DHTML: scripts can set and modify the coordinates at which a media element is presented in the browser window. Hence, (D)HTML offers absolute positioning.

MHEG-5:

Each MHEG-5 class that represents visual media elements provides attributes defining the coordinates of the presentation area at which the visual media element is to be presented. By means of the link concept, these coordinates can be set and changed as the result of events. This is a kind of absolute positioning.

HyTime:

As mentioned above, the Finite-Coordinate-Space module provides means to specify the course of the presentation of a HyTime document by means of event schedules referring to n-dimensional coordinate spaces. Besides a temporal dimension, such a coordinate space can include one or more spatial dimensions. Thus, an event describes not only the point in time at which the associated media element will be presented but also its spatial position. Therefore, HyTime allows for absolute positioning.

SMIL:

SMIL provides a mechanism for the absolute spatial positioning of media elements. In the head of a SMIL document, rectangular regions of the presentation area, called channels, can be specified. Each channel is defined by its position, its size, and a value that determines the order of overlapping channels. Each media element in the document body can reference a channel, thereby specifying its spatial position on the presentation area.
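A minimal sketch of channel-based layout follows, under assumptions: the `Channel` fields and the name `z_index` for the ordering value are illustrative (the text does not name them), and the media element visible at a point is taken to be the one whose covering channel has the largest ordering value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Channel:
    """Rectangular region declared once in the document head (illustrative fields)."""
    left: int
    top: int
    width: int
    height: int
    z_index: int = 0  # assumed name for the value ordering overlapping channels

    def covers(self, x: int, y: int) -> bool:
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)


# "Head": channel declarations; "body": media elements referencing a channel by id.
head = {
    "video": Channel(0, 0, 320, 240),
    "caption": Channel(0, 200, 320, 40, z_index=1),
}
body = [("heart.mpg", "video"), ("caption.txt", "caption")]


def topmost_at(x: int, y: int) -> Optional[str]:
    """Media element visible at (x, y): the covering channel with the highest z_index wins."""
    hits = [(head[ch].z_index, media) for media, ch in body if head[ch].covers(x, y)]
    return max(hits)[1] if hits else None


print(topmost_at(10, 10), topmost_at(10, 220))  # heart.mpg caption.txt
```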

Spatial layout in ZYX is defined by means of spatial projector elements. A spatial projector element defines the rectangular region of the presentation area in which the subtree below the spatial projector element is presented. Like channels in SMIL, such a region is defined by its position, its size, and a value to resolve overlapping regions. Spatial projector elements can be nested: a spatial projector element in the subtree of another spatial projector element is interpreted relative to the region of that enclosing element, not to the entire presentation area. All in all, ZYX employs absolute positioning to specify the spatial layout of a document.
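Nested spatial projectors can be sketched as regions given relative to their enclosing projector and resolved to absolute coordinates on the presentation area. `SpatialProjector` and `absolute_regions` are hypothetical names; the overlap-ordering value and any clipping of a child to its parent's region are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Region = Tuple[int, int, int, int]  # (x, y, width, height) on the presentation area


@dataclass
class SpatialProjector:
    name: str
    x: int  # offset relative to the enclosing projector's region
    y: int
    width: int
    height: int
    children: List["SpatialProjector"] = field(default_factory=list)


def absolute_regions(p: SpatialProjector, origin: Tuple[int, int] = (0, 0)) -> Dict[str, Region]:
    ax, ay = origin[0] + p.x, origin[1] + p.y
    regions = {p.name: (ax, ay, p.width, p.height)}
    for child in p.children:
        # A nested projector is positioned inside its parent, not the whole screen.
        regions.update(absolute_regions(child, (ax, ay)))
    return regions


screen = SpatialProjector("slide", 50, 20, 600, 400, children=[
    SpatialProjector("figure", 10, 10, 300, 200, children=[
        SpatialProjector("label", 5, 180, 290, 20),
    ]),
])
print(absolute_regions(screen))
# {'slide': (50, 20, 600, 400), 'figure': (60, 30, 300, 200), 'label': (65, 210, 290, 20)}
```

Moving the outer "slide" projector moves everything below it, which is what makes a nested subtree reusable as a unit.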

4.3 Interaction

HTML:

HTML provides the concept of links, which allows for navigational interaction. Moreover, HTML allows for the definition of data entry forms consisting of different controls such as buttons and text-input fields. The results of an interaction with a form cannot be specified in HTML itself; this is left to CGI scripts running on a web server. However, if the employed browser software supports DHTML, more sophisticated interactions such as design interactions can be programmed using scripts.

MHEG-5:

MHEG-5 provides a small set of basic interaction classes for modeling user interaction. MHEG-5 separates the element that initiates an interaction from the effect of the interaction. By defining links, the interaction with an interaction element such as a button can trigger an action resulting in a navigational or a design interaction. Since MHEG-6 allows for the integration of Java programs, it can support additional, sophisticated user interactions.

HyTime:

As explained above, the Finite-Coordinate-Space module of HyTime provides mechanisms to define the spatial and temporal coordination of a presentation. This is done by event schedules, which require all spatial and temporal positions of media objects to be known in advance. This excludes ad-hoc navigational interaction by the user. HyTime does not support design interactions either, as the primitives provided by the Object-Modification-Module and the Event-Projection-Module do not include interaction semantics.

SMIL:

The concept of links in SMIL provides for navigational interaction, but no support is given for the specification of design interactions.

The requirement to support the modeling of interactive multimedia presentations is met by ZYX's interaction elements. There are two types of interaction elements: navigational interaction elements and design interaction elements. Examples of navigational interaction elements are the link element, which allows hypertext structure to be specified as in SMIL or HTML, and a further element with which the user can interactively follow one path out of a set of presentation paths. The design interaction elements are interactive versions of the projector elements. For example, for the typographic projector, which specifies the font, size, and style of a text, the interactive typographic projector element specifies that these settings can be changed interactively while the document is presented.
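The difference between a projector and its interactive variant can be sketched in Python as follows. The class names follow the text, but the default values, the `on_user_input` method, and its validation are assumptions for illustration, not part of the ZYX model.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class TypographicProjector:
    """Layout fixed at authoring time: font, size, and style of a text."""
    font: str = "Times"
    size: int = 12
    style: str = "normal"


@dataclass(frozen=True)
class InteractiveTypographicProjector(TypographicProjector):
    """Same settings, which the viewer may change while the document is presented."""

    def on_user_input(self, **changes: object) -> "InteractiveTypographicProjector":
        known = {f.name for f in fields(self)}
        unknown = set(changes) - known
        if unknown:
            raise ValueError(f"unknown typographic setting(s): {sorted(unknown)}")
        # The projector is immutable, so the viewer's choice yields an updated copy.
        return replace(self, **changes)


author_default = InteractiveTypographicProjector(font="Helvetica")
viewer_choice = author_default.on_user_input(size=18, style="bold")
print(viewer_choice)
# prints InteractiveTypographicProjector(font='Helvetica', size=18, style='bold')
```

The author's defaults stay intact; only the presented copy reflects the viewer's changes.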

4.4 Reusability

HTML:

HTML allows whole documents and single media elements to be referenced via uniform resource locators (URLs). However, it is not possible to reference just a fragment of a document. Thus, reusability is only supported at the highest and lowest levels of granularity identified in Section 2.

As there is no clear distinction between the structure and the layout of an HTML document, reuse can only be identical. Although HTML 4.0 tries to separate structure from layout more rigidly and promotes the use of external cascading style sheets, it is still possible to mix layout and structure for reasons of backward compatibility.

In order to supp ort classication and identication of do cuments, HTML allows for the sp ecication

meta attributes by means of attribute-value pairs in the head of a do cument.

MHEG-5:

Considering the granularity of reuse, it is important to note that MHEG-5 structures the media elements of an application into groups, which can be addressed globally. Hence, groups constitute the units that can be reused. As an application object is a group, it is possible to reuse entire MHEG-5 documents within an MHEG-5 document. Likewise, scenes are MHEG-5 groups and, hence, could be reused in principle. However, since scenes can refer to objects that are global to an MHEG-5 document and contained in the application object, scenes that depend on such global objects cannot be reused. Only fully independent and isolated scenes are candidates for reuse in other documents. Therefore, there is no general support for reusability at the level of document fragments. Regarding reusability at the level of mere media elements (such as videos and audios), it is important to know that a media element must be associated with exactly one group and is addressed through this group. Thus, it is not possible to use one and the same media element in two different scenes. However, as media elements need not include the underlying data but can also just refer to it, groups can at least share the data of media elements.

Since MHEG-5 aims less at modeling the structure of a multimedia application than at representing its final presentation form, which includes the layout, groups can only be reused identically.

The identication and selection of groups to b e reused is a serious problem in MHEG-5 as no in-

formation can b e assigned to MHEG ob jects. One can use neither annotations, keywords, or any other

kind of metadata for classication of and search for media ob jects, nor semantically useful names for the

identication of parts to b e reused.

HyTime:

HyTime allows for reusability at all levels of granularity identified in the requirements section. As HyTime is built on SGML, single media elements and complete documents can be referenced as entities and therefore be reused. Moreover, the Location-Address-Module provides powerful mechanisms to locate and address fragments of HyTime documents. Using locators, parts of a HyTime document can be referenced by name, by position, or even by means of a powerful query language.

As any SGML DTD can be made HyTime-compliant, HyTime documents describe the structure of a document rather than its presentation semantics. Thus, reuse in HyTime is semantic reuse.

Moreover, because HyTime is independent of any particular DTD, a DTD can be provided with support for the classification of (parts of) documents, e.g., by means of attribute-value pairs. Hence, HyTime offers support for the classification and identification of reusable components.

SMIL:

As SMIL can reference complete documents and single media elements by means of URLs, SMIL allows for reuse at the corresponding levels of granularity defined in Section 2. However, SMIL does not support the reuse of document fragments.

SMIL separates layout sp ecications, whichhave togointo the head of the do cument, from the

structural sp ecications given in the b o dy. But as b oth kinds of sp ecications are closely interrelated,

SMIL provides only for identical reuse.

Like HTML, SMIL allows to dene meta-attributes within the head element of a do cument. Such meta-

attributes can b e used to classify and retrieve do cuments providing supp ort for selection and identication.

The ZYX document model has been designed with all levels of granularity of reuse in mind. To support the reusability of media elements, atomic media elements are provided, which can be reused in any ZYX specification. Likewise, complex media elements, which encapsulate specifications, can be reused in any other specification. As the encapsulated specifications can range smoothly from small logical parts of a document to entire documents, ZYX supports reuse both at the level of entire documents and at the level of fine-grained document fragments. Moreover, the ability to encapsulate templates in complex media elements provides for the reuse of document templates.

The ability to delay the process of variable binding, especially the binding of projector variables, allows for a clear separation of the presentation elements building the structure of a document from the projector elements determining its layout. This allows for structural reuse of ZYX specifications. As ZYX complex media elements may include projector elements defining visual and audible layout, ZYX also provides for identical reuse of components.
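Delayed binding of projector variables can be sketched as follows: the structure refers to layout only through named projector variables, and different layouts are bound later. `Present`, `bind`, and the `$name` convention are hypothetical stand-ins for ZYX presentation and projector elements, chosen only to illustrate structural reuse.

```python
from dataclasses import dataclass
from typing import Dict, List, Mapping, Tuple

Layout = Dict[str, object]


@dataclass(frozen=True)
class Present:
    """Hypothetical presentation element: refers to its layout only by variable name."""
    media: str
    projector_var: str


def bind(structure: List[Present], projectors: Mapping[str, Layout]) -> List[Tuple[str, Layout]]:
    # Binding happens late: the structure itself never contains layout values.
    unbound = {p.projector_var for p in structure} - set(projectors)
    if unbound:
        raise KeyError(f"unbound projector variables: {sorted(unbound)}")
    return [(p.media, projectors[p.projector_var]) for p in structure]


lesson = [Present("heart.mpg", "$main"), Present("caption.txt", "$caption")]

# Structural reuse: one structure, two independently authored layouts.
desktop = bind(lesson, {"$main": {"x": 0, "y": 0},
                        "$caption": {"font": "Times", "size": 10}})
kiosk = bind(lesson, {"$main": {"x": 0, "y": 0, "scale": 2},
                      "$caption": {"font": "Helvetica", "size": 24}})
print(desktop[1], kiosk[1])
```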

Concerning selection and identication of reusable elements, Z

X allows media elements, either com-

plex or atomic, to b e annotated with key-value pairs.

4.5 Adaptation

HTML:

Since HTML do es not oer any mechanism to sp ecify adaptation of a do cument to user interest

or to technical infrastructure, we consider only DHMTL here. DHTML oers to dynamically manipulate

the structure and content of HTML do cuments. Therefore, adaptation to user interest or technical

infrastructure can b e implemented by the use of scripts. In a rst step, such a script has to determine

the user or system prole, for example by a database query. In the second step, the script has to change

the structure of the HTML do cument according to the prole. As the author must co de and thus know

at authoring time all adaptation alternatives inside scripts, this kind of adaptation must b e considered

as static.

MHEG-5:

MHEG-5 denes classes for variables whose contents can b e tested. Hence, variables can

b e used to cho ose b etween dierent branches of a presentation. Thus, a prole dening user interest and

technical infrastructure could b e mo deled using variables. However, the problem is how such a prole

is set. MHEG-5 allows to set variables only from within a do cument. User-sp ecic adaptation would

require to make the determination of the prole a part of the MHEG-5 do cument. In MHEG-6, the

MHEG engine could call a Java program which retrieves the actual values for a given prole and then

sets the variables of the do cument. So, with the use of MHEG-6, adaptation of a presentation to user

interest or technical infrastructure is p ossible. Since all adaptation alternatives must b e sp ecied within

a do cument at authoring time, this is static adaptation.

HyTime:

Since HyTime can be used with any concrete DTD, it is always possible to define specific attributes for the elements of a DTD that characterize (parts of) documents or media elements in terms of user interest or technical properties such as the bandwidth needed or the resolution or frame rate required. It is also possible to check the values of such element attributes using the Query-Locator provided by the Location-Address-Module. However, the results of such queries are fully determined by the concrete document content and cannot be modified by external parameters such as those in a user or system profile. Hence, it is not possible to adapt a HyTime document according to external parameters like a profile.

SMIL:

SMIL oers the

switch

element to mo del alternative presentation variants. Using this element,

dierent adaptation alternatives can b e sp ecied inside the do cument at authoring time. Thus, the switch

element allows for static adaptation. The selection of the alternatives is guided by simple predicates which

include parameters set outside the SMIL do cument. These parameters are predened by the standard and

describ e mainly technical features like the available bandwidth. This allows to adapt a SMIL do cument

to technical infrastructure.
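Selection within a switch can be sketched with Python's `xml.etree.ElementTree`. The sketch models only two of SMIL 1.0's test attributes, `system-bitrate` and `system-language`, and assumes the rule that the first child whose test attributes all evaluate to true is chosen. Language matching is simplified to exact comparison, and the player settings are hypothetical; in a real player they are fixed outside the document.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Union

SWITCH = """
<switch>
  <video src="surgery-high.mpg" system-bitrate="1000000"/>
  <video src="surgery-low.mpg" system-bitrate="56000"/>
  <img src="surgery-still.jpg"/>
</switch>
"""

Player = Dict[str, Union[int, str]]


def acceptable(child: ET.Element, player: Player) -> bool:
    """True if every modeled test attribute on the child evaluates to true."""
    bitrate = child.get("system-bitrate")
    if bitrate is not None and int(player["bitrate"]) < int(bitrate):
        return False
    languages = child.get("system-language")
    if languages is not None and player["language"] not in [
            lang.strip() for lang in languages.split(",")]:
        return False
    return True  # children without test attributes are always acceptable


def select(switch: ET.Element, player: Player) -> Optional[ET.Element]:
    # Document order encodes the author's preference; the first acceptable child wins.
    return next((child for child in switch if acceptable(child, player)), None)


root = ET.fromstring(SWITCH.strip())
choice = select(root, {"bitrate": 128000, "language": "de"})
print(choice.get("src") if choice is not None else None)  # surgery-low.mpg
```

Every alternative is written into the document in advance, which is why this adaptation is static.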

As mentioned above, each media element of a ZYX document can be annotated with a set of key-value pairs that describes its content. In addition, a user profile, also consisting of key-value pairs, can be defined to capture values that describe a user's topics of interest, presentation system environment, network connection characteristics, and the like. The ZYX model offers operator elements to support adaptation to a user's profile by means of switch elements and query elements.

As in SMIL, a switch element allows different presentation alternatives to be specified for a part of the document, which allows for static adaptation. One of the alternatives is selected according to the user profile. In contrast to SMIL, the scope of a switch element is not limited to predefined parameters. A switch element is used if all adaptation alternatives are known to the author of a document. In order to allow for dynamic adaptation, the query element is provided. This element is a placeholder for a media element or fragment that is described by means of a query. The query is represented by a set of key-value pairs. When the document is selected for presentation, the query element is evaluated and replaced by the complex or atomic media element that best matches the set of key-value pairs with regard to the user profile. Thus, ZYX allows for adaptation to user interest and technical infrastructure.
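The two adaptation mechanisms can be contrasted in a short sketch. The repository, the annotation keys, and especially the scoring rule in `resolve_query` (query pairs first, profile pairs as tie-breaker) are assumptions: the text states only that the best-matching element is chosen, not how matching is measured. `zyx_switch` illustrates that switch conditions may test arbitrary profile keys.

```python
from typing import Dict, List, Mapping, Optional, Tuple

Annotations = Mapping[str, str]

# Hypothetical repository of annotated media elements (atomic or complex).
REPOSITORY: Dict[str, Annotations] = {
    "bypass-expert.mpg": {"topic": "bypass", "level": "expert", "format": "video"},
    "bypass-novice.mpg": {"topic": "bypass", "level": "novice", "format": "video"},
    "bypass-slides": {"topic": "bypass", "level": "novice", "format": "slides"},
    "valve-novice.mpg": {"topic": "valve", "level": "novice", "format": "video"},
}


def overlap(annotations: Annotations, pairs: Annotations) -> int:
    """Number of key-value pairs in `pairs` that the annotations match exactly."""
    return sum(annotations.get(k) == v for k, v in pairs.items())


def resolve_query(query: Annotations, profile: Annotations) -> Optional[str]:
    # Dynamic adaptation: the candidate set is the repository at presentation time.
    # Assumed scoring: query hits first, profile hits as tie-breaker;
    # max() keeps the first of several equally good candidates.
    scored = [((overlap(ann, query), overlap(ann, profile)), name)
              for name, ann in REPOSITORY.items()]
    (query_hits, _), best = max(scored, key=lambda item: item[0])
    return best if query_hits > 0 else None


def zyx_switch(alternatives: List[Tuple[Annotations, str]],
               profile: Annotations) -> Optional[str]:
    # Static adaptation: alternatives are fixed by the author, but unlike
    # SMIL's predefined test attributes a condition may test any profile key.
    for condition, element in alternatives:
        if all(profile.get(k) == v for k, v in condition.items()):
            return element
    return None


novice = {"level": "novice", "format": "video"}
print(resolve_query({"topic": "bypass"}, novice))  # bypass-novice.mpg
print(zyx_switch([({"level": "expert"}, "bypass-expert.mpg"),
                  ({}, "bypass-novice.mpg")], novice))  # bypass-novice.mpg
```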

4.6 Presentation-neutral Representation

Figure 3 shows the relationships between the various formats and models with respect to their support for multimedia functionality and for presentation-neutral representation.

Figure 3: Presentation-neutral representation and multimedia support of the different formats and models (HTML, DHTML, MHEG-5, SMIL, HyTime, and ZYX placed by semantic level against multimedia functionality)

HTML:

As HTML does not clearly separate the layout of a document from its structure, the semantic level of an HTML document description is not as high as that of HyTime, though it is comparable to that of SMIL. However, HTML offers only extremely limited multimedia functionality (even simple temporal synchronization is not possible). To offer more multimedia functionality, DHTML must be employed. Since DHTML scripts must implement multimedia functionality imperatively, their use drastically reduces the semantic level of a document description. Furthermore, DHTML introduces portability problems between browser vendors. Thus, neither HTML nor DHTML is well suited for the presentation-neutral representation of multimedia document content.

MHEG-5:

MHEG-5 primarily aims at a detailed and platform-independent description of the presentation, i.e., the layout, of a document. To achieve this goal, the standard provides MHEG-5 with rich multimedia functionality. However, the description of the structure of an MHEG-5 document is very poor. Hence, the level of semantic modeling is very low, and compared to the semantics of the structure of multimedia documents, MHEG-5 might be viewed as a "multimedia assembler". Consequently, MHEG-5 cannot support presentation-neutral representation of multimedia documents.

HyTime:

Since HyTime mainly sp ecies the structure and semantics of a multimedia do cumentitis

quite well-suited for presentation-neutral representation of multimedia do cuments. HyTime oers sp eci-

cation of do cument content at a high semantic level though it lacks multimedia functionality (esp ecially

in the area of interaction).

SMIL:

In contrast to a HyTime document, a SMIL document describes the presentation of the document in detail but its structure in less detail. However, SMIL offers more multimedia functionality than HyTime. Compared to MHEG-5, SMIL documents are described at a higher semantic level, though SMIL lacks the multimedia functionality of MHEG-5. Hence, SMIL ranks between HyTime and MHEG-5 with respect to its support for presentation-neutral representation.

As ZYX makes it possible to separate the structure and layout of a document, owing to its ability to delay the process of variable binding and to encapsulate templates in complex media elements, the semantic level of a ZYX document description is quite high and thus suited for presentation-neutral representation of multimedia document content. The amount of multimedia functionality offered by ZYX exceeds that of SMIL but ranks below that of MHEG-5.

Figure 4: Summary of the support of the requirements by (D)HTML, SMIL, MHEG-5, HyTime, and ZYX (+ support, - no support). The table compares the models with respect to: temporal model; spatial model; interaction (navigational, design); reusability (granularity: media elements, documents, fragments; kind of reuse: identical, structural; identification/selection); adaptation (parameters of adaptability: user interest, technical infrastructure; definition of alternatives: static, dynamic); and presentation-neutral representation (semantic level, multimedia functionality).

4.7 Summary

Summarizing (see also Figure 4), none of the examined document models HTML, MHEG-5, HyTime, and SMIL offers sufficient support for all requirements arising from advanced multimedia applications. HTML can hardly be characterized as a multimedia document model because it lacks support for even the most basic multimedia requirement, a temporal model. Though HTML can become a quite powerful multimedia document model through the DHTML extension, it still does not support reuse at all levels of granularity and suffers from a low semantic level of content description, which leaves DHTML unsuitable for presentation-neutral description of multimedia content. This is also the case with MHEG-5. Although MHEG-5 offers rich multimedia functionality, it mainly describes the presentation and not the structure of a multimedia document and, therefore, cannot be employed for presentation-neutral modeling of multimedia document content. Furthermore, reuse at the level of fragments is severely hampered by the inflexible scene-based document structure.

Powerful support for reuse is the strength of HyTime. Moreover, HyTime describes document content at a very high semantic level and, thus, is perfectly suited for presentation-neutral modeling of document content. However, its lack of support for modeling interaction and adaptation is a serious drawback. In contrast to HyTime, SMIL supports the modeling of static adaptation to technical infrastructure and of navigational interaction. Furthermore, the semantic level of a SMIL document description ranks between MHEG-5 and HyTime and, hence, is quite well suited for the presentation-neutral description of multimedia document content. However, neither reuse at the level of fragments nor the modeling of design interactions is possible.

Since ZYX has been designed with the fulfillment of the advanced requirements in mind, it offers reuse at all three levels of granularity, static and dynamic adaptation to user-specific needs, a quite high semantic level of document description, and presentation-neutral representation of multimedia content. Regarding the traditional requirements, enough multimedia functionality has been provided to allow for interesting multimedia presentations, including design interactions.

5 Conclusion and Future Work

Driven by our advanced multimedia information system application Cardio-OP, we first identified a new set of requirements for multimedia document models: reusability of multimedia content, adaptation of multimedia content to user needs and interests, and presentation-neutral description of the structure and content of multimedia documents. These requirements complement the well-known traditional requirements for multimedia document models, i.e., temporal, spatial, and interaction modeling.

We then presented an analysis of the relevant standard formats and models, i.e., HTML, SMIL, MHEG-5, and HyTime, as well as the ZYX model, which has been designed to meet the advanced requirements.

We have presented the capabilities and identified the limitations of these models. The shortcomings of the standards call for a new initiative for next-generation multimedia document models. As illustrated by ZYX [BK99], it is very well possible to push the limits of existing approaches and to meet the new requirements.

We would like to point out that the implications of our analysis, and of approaches trying to resolve the shortcomings of existing models, are significant: there is an urgent need for appropriate authoring tools that support fine-grained reuse of multimedia content, adaptability of content to user needs and individual interests, and, as a direct consequence, the presentation-neutral representation of material, e.g., in a database. We had this painful experience when developing the ZYX-based multimedia content repository of Cardio-OP. Our group has already developed a DataBlade module for the object-relational database system Informix Dynamic Server / Universal Data Option that is capable of managing ZYX documents and fragments [BKW99]. We are currently developing an authoring tool and a presentation engine for ZYX, since presentation-neutral representation of multimedia content as well as adaptation support directly impact the design of authoring and presentation tools.

References

[All83] J. F. Allen. Maintaining Knowledge about Temporal Intervals. Communications of the ACM, 26(11):832-843, November 1983.

[BK99] S. Boll and W. Klas. ZYX - A Semantic Model for Multimedia Documents and Presentations. To be published in: Proceedings of the 8th IFIP Conference on Data Semantics (DS-8): "Semantic Issues in Multimedia Systems". Kluwer Academic Publishers, Rotorua, New Zealand, 5-8 January 1999.

[BKW99] S. Boll, W. Klas, and U. Westermann. Exploiting OR-DBMS Technology to Implement the ZYX Data Model for Multimedia Documents and Presentations. In A. Buchmann, editor, Submitted to: Datenbanksysteme in Büro, Technik und Wissenschaft (BTW). GI-Fachtagung, March 1999.

[BPSM98] T. Bray, J. Paoli, and C. M. Sperberg-McQueen. Extensible Markup Language (XML) 1.0 - W3C Recommendation 10-February-1998. W3C, URL: http://www.w3.org/TR/1998/REC-xml-19980210, February 1998.

[Bul98] D. C. A. Bulterman. User-centered Abstractions for Adaptive Hypermedia Presentations. In Proc. of the 6th ACM Multimedia Conference, Bristol, UK, September 1998.

[C-L98] C-LAB - Research Institute of the University of Paderborn and Siemens AG. IDIAS - An Intelligent Diagram Assistant, 1998. URL http://www.c-lab.de/ ucmm/idias/root.html.

[DD94] S. DeRose and D. G. Durand. Making Hypermedia Work: A User's Guide to HyTime. Kluwer Academic Publishers, Dordrecht, 1994.

[DK95] A. Duda and C. Keramane. Structured temporal composition of multimedia data. In Proc. IEEE International Workshop on Multimedia-Database-Management Systems, Blue Mountain Lake, August 1995.

[EBS97] J. Eklund, P. Brusilovsky, and E. Schwarz. Adaptive textbooks on the WWW. In H. Ashman, P. Thistewaite, R. Debreceny, and A. Ellis, editors, Proceedings of AUSWEB97, The Third Australian Conference on the World Wide Web, Queensland, Australia, pages 186-192. Southern Cross University Press, July 5-9, 1997.

[EF91] M. J. Egenhofer and R. Franzosa. Point-Set Topological Spatial Relations. Int. Journal of Geographic Information Systems, 5(2), March 1991.

[HBB+98] P. Hoschka, S. Bugaj, D. Bulterman, et al. Synchronized Multimedia Integration Language - W3C Working Draft 2-February-98. W3C, URL: http://www.w3.org/TR/1998/WD-smil-0202, February 1998.

[HFK95] N. Hirzalla, B. Falchuk, and A. Karmouch. A Temporal Model for Interactive Multimedia Scenarios. IEEE Multimedia, 2(3):24-31, Fall 1995.

[Hof96] P. Hofmann. MHEG-5 and MHEG-6: Multimedia Standards for Minimal Resource Systems. Technical Report, Technische Universität Berlin, April 1996.

[ISO86] ISO. Information processing - Text and Office Systems - Standard Generalized Markup Language (SGML), 1986. ISO-IS 8879.

[ISO92] ISO/IEC. Information Technology - Hypermedia/Time-based Structuring Language (HyTime), 1992. ISO/IEC IS 10744.

[ISO95] ISO/IEC JTC1/SC29/WG12. Information Technology - Coding of Multimedia and Hypermedia Information - Part 5: Support for Base-Level Interactive Applications, ISO/IEC IS 13522-5. ISO/IEC, 1995.

[ISO96] ISO/IEC JTC1/SC29/WG12. Information Technology - Coding of Multimedia and Hypermedia Information - Part 6: Support for Enhanced Interactive Applications, ISO/IEC IS 13522-6. ISO/IEC, 1996.

[JR95] R. Joseph and J. Rosengren. MHEG-5: An Overview. Technical Report, GMD-FOKUS, Berlin, URL://www.fokus.gmd.de/ovma/mug/archives/doc/mheg-reader/rd1206.html, December 1995.

[KBA93] T. Kamba, K. Bharat, and M. C. Albers. The Krakatoa Chronicle - An Interactive, Personalized Newspaper on the Web. Page http://www.w3.org/Conferences/WWW4/Papers/93/, 1993.

[KLAV98] W. Klippgen, T. D. C. Little, G. Ahanger, and D. Venkatesh. The Use of Metadata for the Rendering of Personalized Video Delivery. In [SK98], New York, 1998. McGraw-Hill.

[LG93] T. D. C. Little and A. Ghafoor. Interval-Based Conceptual Models for Time-Dependent Multimedia Data. IEEE Transactions on Knowledge and Data Engineering, 5(4), August 1993.

[MBE95] T. Meyer-Boudnik and W. Effelsberg. MHEG Explained. IEEE Multimedia, 2(1), Spring 1995.

[NKN91] S. R. Newcomb, N. A. Kipp, and V. T. Newcomb. "HyTime" - The Hypermedia/Time-Based Document Structuring Language. Communications of the ACM, 34(11), November 1991.

[PS94] D. Papadias and T. Sellis. Qualitative Representation of Spatial Knowledge in Two-Dimensional Space. VLDB Journal, 3(4), October 1994.

[PTSE95] D. Papadias, Y. Theodoridis, T. Sellis, and M. J. Egenhofer. Topological Relations in the World of Minimum Bounding Rectangles: A Study with R-Trees. In Proceedings of the ACM SIGMOD Conference on Management of Data, San Jose, May 1995.

[RLJ98] D. Raggett, A. Le Hors, and I. Jacobs. HTML 4.0 Specification - W3C Recommendation, revised on 24-April-1998. W3C, URL: http://www.w3.org/TR/1998/REC-html40-19980424, April 1998.

[RvOB97] L. Rutledge, J. van Ossenbruggen, and D. C. A. Bulterman. A Framework for Generating Adaptable Hypermedia Documents. In Proc. ACM Multimedia Conference, Seattle, November 1997.

[SK98] A. Sheth and W. Klas. Multimedia Data Management - Using Metadata to Integrate and Apply Digital Media. McGraw-Hill, New York, 1998.

[SSW98] V. Schoch, M. Sp echt, and G. Web er. ADI | An Empirical Evaluation of a Tutorial Agent.

In T. Ottmann and I. Tomek, editors,

Proceedings of the ED-Media and ED-TELECOM 1998,

Freiburg, Germany

. Asso ciation for the Advancement of Computing in Education, June 1998. URL

http://apsymac33.uni-tri er.de:8080/AD I.html.

[SYS98] D. Schmitz, J. Yu, and P. Satangeli. Timed Interactive Multimedia Extensions for HTML (HTML+TIME). W3C, URL: http://www.w3.org/TR/1998/NOTE-HTMLplusTIME-19980918, September 1998.

[WR94] T. Wahl and K. Rothermel. Representing Time in Multimedia Systems. In Proc. IEEE International Conference on Multimedia Computing and Systems, pages 538-543, Boston, MA, May 1994.