Record Subtyping in Flexible Relations by means of Attribute Dependencies
Christian Kalus
Peter Dadam
Universit~t Ulm
Fakult~t for Informatik
Abteflung Datenbanken und Informationssysteme
89069 Ulm, Germany
Abstract
The model of flexible relations supports hetero-
geneous sets of tuples in a strongly typed way. The
elegance of the standard relational model is preserved
by using a single, generic scheme constructor.In each
model supporting structural variants the shape of some
part of a heterogeneous scheme may be determined by
the contents of some other part of the scheme. We
formalize this relationship by a certain kind of integrity
constraint we have called "attribute dependency" (AD).
We motivate how ADs can be used, besides their
application in type and integrity checking, to incorporate
record subtyping into our extended relational model
Moreover, we show that ADs yield a stronger assertion
than the traditional record subtyping rule as they
consider interdependencies among refinements. We
discuss how ADs are related to query processing and how
they may help to identify redundant operations.
I. Introduction
The relational model as defined by Codd [6] forms the
base of most contemporary data models. It constructs a
database as a set of relations, a relation being defined
over a set of aUributes called its scheme. The instance of
a relation is a set of tuples where each tuple is a mapping
from the scheme attributes to values of given (atomic)
domains.
The constraint of homogeneity, i.e. the fact that all
tuples of a relation are defined over the same set of
attributes, does often not meet the intuition of a relation
as a container of
related
entities. Take an address (e.g. of
a person's record) as a simple example. Each address
comprises a zip code and a town. The town-local part of
the address may be either a post-office box number or a
street and, if it is a street, it is sometimes followed by a
house number, sometimes not. This little example already
exhibits many facets: ZipCode and Town are
unconditioned
or
homogeneous
components as they are
always present. The town-local part is a
disjoint union
of
PostOfficeBoxNumber and Street while Street may be
accompanied by the
optional attribute
HouseNumber.
Another form of attribute relationship can be
motivated by the "electronic communication part" of an
address which is composed of a telephone-number, a
FAX-number, and an electronic-mail-address. At least one
of these three attributes should be present to constitute an
electronic communication address, but two or all three of
the attributes are allowed, too. This
non-disjoint union
is
another relationship type between attributes, and many
other forms of relationships can be found in addition.
The relationships mentioned above are based on the
pure existence of attributes. I.e. from the sole
existence/absence of a certain attribute we draw
conclusions about the existence or absence of other
attributes. In addition to these
existence-based attribute
relationships
there occur
value-based attribute
relationships
which take the influence of values (in some
attributes) on the existence of (other) attributes into
account.
Take an employee entity possessing the attributes
salary and jobtype as an example. In addition,
- if the value of jobtypo is 'secretary' then the attributes
typing-speed and foreign-languages are present.
- if the jobtype is 'software engineer' the employee is
further described by the products he is in charge of and
the programming-languages he knows.
-if
the jobtype is 'salesman' be possesses a sales-
commission and again the produets he is in charge of.
Another example is that the existence of a maiden-
name is determined by the appropriate values in the
attributes sex and marital-status.
The overall aim of the model of flexible relations is to
bridge the gap between semantic data models and
operational data models. Therefore it captures the
attribute relationships described above, yet it utilizes a
single, generic scheme construct to preserve the elegance
of the relational model.
1063-6382/95 $4.00 © 1995 IEEE
383
11th ICDE 1995, Taipei, Taiwan
The paper is focussing on value-based attribute
relationships. We will introduce a notation we have
called attribute dependencies (ADs) that enables us to
model these relationships as integrity constraints. We will
motivate the usage of ADs, particularly their ability to
model a strong notion of subtyping. Besides this
employment their connection to constructs of semantic
data models, the benefit of attribute dependencies in type
checking and in host language coupling is discussed. The
behaviour of ADs, especially in connection with a query
language, will be formally described by an axiom system.
The rest of the paper is structured as follows: Section
2 introduces some basic notations of the model of flexibe
relations needed as an environment in which attribute
dependencies are to be integrated. The different purposes
which ADs serve in our model, including subtyping, are
discussed in section 3. In section 4 an axiom system for
the derivation of attribute dependencies is developed and
shown to be sound and complete. Section 5 compares our
approach to related ones, while section 6 concludes with a
summary and an outlook.
2. Attribute dependencies
2.1 Basic notations of the model of flexible
relations
The elegance of the relational model is mainly due to
the fact that it gets along with a single type constructor.
To preserve this elegance as much as possible, we looked
for an extended type constructor enabling us to describe
the various variant (and also non-varian0 structures in a
single, generic fashion. It is obvious that specifying a
scheme as a set of attributes, as the relational model does,
is not expressive enough to model arbitrary attribute
relationships. To achieve this goal we enhanced the
scheme notation in the following way: a scheme is now
composed of a set of attributes accompanied by a
cardinality constraint in the form of two integer values
determining how many components of the set have at
least to be taken and how many components of the set are
allowed at most. If we describe this construct as a three-
tuple
< at-least value, at-most value, set of attributes >
then the various constructs introduced in section 1 can be
expressed in the following way 1
-
a traditional relational scheme with attributes At .....
An is denoted by < n, n, { A 1 ..... An } >, i.e. at least
n and at most n (and therefore exactly n ) of the
attributes have to be present.
1 Let as usual be 1~ the universe of attributes, A, B ... and A i be single
atlributes, and V ..... Z be atlribute sets. Let XY denote the union of the
attribute sets X and Y and txeat attributes as singleton attribute sets when
sets of attributes are expected. Tuples will be denoted by < ... >.
-
a disjoint union of attributes At ..... An is modeled by
< 1, 1, { At ..... An } > telling that exactly one of the
attributes may appear.
-a
non-disjoint union of attributes A1 ..... An is
described by < 1, n, { A1 ..... A n } >, i.e. the electronic
communication address of section 1 is expressed by
< 1, 3, { tel-number, FAX-number, email-address } >.
The above notation is not completely satisfying yet: a
union might appear as only a part of a scheme, the
variants in a union do not need to be single attributes but
can be relational schemes, variants again, and so on.
Therefore we have to extend our notation allowing the set
components to be either single attributes or again three-
tuples of our notation. This final version of a flexible
scheme is presented by a more abstract example.
Example 1 An application demanding tuples with
attributes A and B (unconditioned), either attribute C
or D (i.e. a disjoint variant between C and D)and
"some" of E, F and G (a non-disjoint union of E, F and
G) yields the following flexible scheme FS
FS = <4,4,{A,B, < 1, 1,{ C,D}>,
< 1,3,{E,F,G }> }> []
A flexible scheme is a very compact notation. For the
purpose of a basic understanding one can unfold a
flexible scheme yielding the allowed attribute
combinations. As this unfolding can be interpreted as
building the disjunctive normal form of a flexible scheme
FS, we will refer to it as dnf(FS). Forming the DNF of
the scheme of example 1 yields
dnJ(FS)= {ABCE, ABDE, ABCF, ABDF, ABCG,
ABDG, ABCEF, ABDEF, ABCEG, ABDEG,
ABCFG, ABDFG, ABCEFG, ABDEFG }
Note that this unfolded version of a flexible scheme
corresponds to the "set of objects" idea of Ed Sciore [13]
(see also [11], chapter 12). The little example above
should suffice as a motivation to find a compact
description for variant schemes.
Now it is easy to define the domain of a flexible
scheme. If Tup(X) denotes the set of tuples for a given
attribute set X, then dom(FS) = I.Jx ~ dnf(FS) Tup(X). A
flexible relation FR can then be defined as a two-tuple
FR = < FS, inst > with scheme(FR) = FS being a flexible
scheme and inst(FR) = inst being the instance of the
relation, a finite set of tuples satisfying inst(FR) c
dom(scheme(FR)). As a flexible scheme does not
uniquely determine the shape of its tuples we assume the
existence of a function attr(O yielding the attribute set X
, tuple t is defined on (of course attr(O ~ dnf(FS) iff t
dom(FS)).
384
2.2 Definition of attribute dependencies
Up to now a flexible scheme considers existential
relationships of attributes and determines thus the basic
shape of tuples and instances. Value-based constraints are
not yet taken into account, a flexible scheme is therefore,
following the notation of [12], a primitive scheme and a
flexible relation only a possible relation (instance). The
examples in section 1 have shown that flexible relations
demand, besides the known types of constraints, a certain
class of constraints concerning the variant structure of
tuples. Referring to the jobtype-example of section 1 we
may say that the value of the attribute jobtypa determines
the existence of the attributes in Y = { typing-speed,
foreign-languages, products, programming-languages,
sales-commission
} in the way that
(1) t(jobtype) = 'secretary'
attr(t) n Y = { typing-speed, foreign-languages }
(2) t(jobtype) = 'software engineer'
attr(t) n Y = {products, programming-languages }
(3) t(jobtype) = 'salesman'
attr(t) n Y = { products, sales-commission }
To prepare the formal definition of an attribute
dependency consider the following points. While in the
example above there is only one determining attribute
(iobtype), in general there may be several ones (take
sex
and marital-status determining the existence of a
maiden-name). Therefore we should say that the contents
in the attribute set X determines which attributes in the
attribute set Y exist. This general assertion can be refined
by considering the legal variants explicitely. Each variant
consists of an attribute set Yi c y (i=l..n, n being the
number of variants) and is determined by a set of values
Vi c Tup(X) with the obvious meaning that the attribute
set Yi occurs in a tuple t whenever t[X] e V i . When
there is no V i such that t[X] e V i then it is intuitive to
demand that tuple t does not possess any attribute of Y.
Considering this we obtain the definition of an attribute
dependency 2
Def'mition 2.1 "explicit attribute dependency"
An explicit attribute dependency EAD has the syntactical
form
EAD = <X e~p. am, y,{v1 exp.attr> yl,
.... Vn exp.am) Yn } >
where X c ~, V i c Tup(X) (i = 1 .. n),
Y c ~t,Yi c_
y (i =l..n),
i~j --> Vi nVj = O (i,j=l..n)
2 In section 4 we will use a slightly modified definition. To distinguish
both we call the following definition an explicit attribute dependency.
A flexible relation FR is said to satisfy the explicit
attribute dependency EAD ff Vt e inst(FR) :
( 3i : t[X] E Vi ) ---> attr(O n Y = Yi
^ ( Vi : t[X] ~ Vi ) --> attr(t) n Y = 0 []
Example
2 The jobtype-example is formulated in EAD-
notation by
< {jobtype}
exp. am,
{ typing-speed, foreign-languages,
products, programming-languages, sales-commission },
{ < jobtype : 'secretary' > e~p..m ,
{ typing-speed, foreign-languages },
< jobtype : 'software engineer' > exp. ,m ,
{ products, programming-languages },
< jobtype : 'salesman' > exp.,m ,
{ products, sales-commission } } > []
3. Usage of attribute dependencies
There are several streams motivating the use of ADs.
The first one is to integrate semantic type constructs into
an operational data model, thus bridging the gap between
semantic and operational data models. Secondly we show
that ADs may be used to incorporate subtyping into a
relational data model. In addition, we discuss how ADs
may be applied in decomposition and query evaluation.
3.1 Mapping of entity-relationship concepts onto
flexible relations
Specialization is one of the enhanced entity
relationship concepts ([7], chapter 15). A specialization
that is encoded in the entity itself is called a predicate
defined specialization. If one replaces the predicate Pi of
the i-th specialization by its extension Vi, i.e. Vi = { v I
pi(v) is true }, then an attribute dependency is a one-to-
one mapping of a predicate defined specialization. ER
models further divide specialization into disjoint versus
overlapping subclasses and total versus partial
subclasses. This classification can be inferred from ADs
as well: the variants of an AD are disjoint if Yi n Yj = O
(i~j) and they are total if Ui=LnVi = Tup(X). The benefit
of mapping these ER constructs onto the model of flexible
relations is that they can now be exploited operationally.
The most important operational use of ADs is their
application in type-checking, which is a central point of
our model. Flexible schemes do serve this purpose
already better than relational schemes as existential
attribute relationships are already captured by them (see
section 2). However, value-based dependencies cannot be
type-checked by flexible schemes. For example, there is
no scheme which would reject the tuple
385
< .. jobtype : 'salesman', typing-speed : high, foreign-
languages : { french, russian } >
as { .... jobtype, typing-speed, foreign-languages } is a
valid attribute combination. The fact that jobtype =
'salesman' requires different attributes to be present has
to be checked with the AD of example 2. Type checking
based on ADs is initiated dunng insertion, update 3, and
data retrieval, which will be discussed in more detail.
3.1.1
Decomposition along
ADs In [7] four translation
methods for predicate defined specializations into
relations are described. Two of these methods result in a
single relation with plenty of null values. Moreover,
artificial attributes indicating the current variant have to
be introduced - and have to be interpreted by the user.
The benefit of our model is obvious: flexible schemes
together with attribute dependencies relieve the user of
the burden to set and control the correct variant.
The third and fourth translation method horizontally/
vertically decompose the entity along the specialization.
These techniques can be regarded as extensions of the
traditional horizontal/vertical decomposition [5] to
support structural variants. To restore the entity, an outer
union instead of a simple union (horizontal
decomposition) and a mulfiway join instead of a natural
join (vertical decomposition) have to be performed.
3.1.2 Query optimization by
the aid
of ADs Qualified
relations are used to extend algebraic equivalences to
deecoiaposed relations [5]. Again we may say that a
relation together with an AD is an extension of a
qualified relation to support structural variants. As for
qualified relations we can exploit each selection
concerning the determining attributes of an AD to draw
conclusion about redundant operations, e.g. unnecessary
joins with variants that are known to be excluded. See
[5,p. 103ff] for a list of query rewrite rules.
Another potential for optimization are type guards.
Each model supporting heterogeneous collections posses-
ses operations do not preserve the most specific type of
an entity (see e.g. [3] among others). Type guards restore
the lost type information by checking if an entity has a
certain type or if certain attributes are available. ADs
cannot only be used to implement type guards but can
also recognize redundant type guards when additional
information is sufficient to determine a more specific
type.
3.2 Semantic preserving subtyping through ADs
At first glance a predicate defined specialization can
as well be described by the traditional record subtyping
rule (see [4] among many others)
3 While there axe no further tyl:~-related consequences when the salary of
an employee is updated, the change of his jobtype causes a type change,
too.
ti -< ui (i=l..n)
< a! : tl , --. , an : tn , ... , am :
tm
> -< < al :
Ill , ...
, an : tln >
Let us first show that this inclusion rule can be
expressed with an attribute dependency. Therefore,
consider a flexible scheme FS with attr(FS) = W and let
EAD = < X exp. at~, Y , { V1 e~p. at~, YI .....
Vn exp. attr ~ Yn } > be an attribute dependency for FS.
Then the corresponding supertype contains the attributes
W - Y and the domain of X consists of Tup(X), i.e. the
domain of X is unrestricted in the supertype. Further we
can derive from EAD that there are n subtypes
possessing the attributes ( W -
Y ) u Yi,
having the
domain of X restricted to Vi (i= 1..n). We may therefore
say that attribute dependencies incorporate record
subtyping into the model of flexible relations.
Now, what is the benefit of using attribute
dependencies instead of the traditional subtyping rule?
This rule is obviously sufficient to state that secretary,
salesman and software engineer type are subtypes of a
more general employee type.
Example
3 The following employee type and its
predicate defined subtypes could have been inferred from
thejobtype-EAD of example 2:
employee_type = < .... salary : float, jobtype •
{ 'secretary', 'software engineer', 'salesman' } >
secretary_type = < .... salary : float, jobtype :
{ 'secretary' }, typing-speed : .... foreign-lang : ... >
salesman_type = < .... salary : float, jobtype :
{'salesman' }, products : .... sales-commission : ... >
softw_eng_type= < .... salary : float, jobtype :
{'software eng' }, products : .... programming-
languages
:
...
> []
Note that for each of the three subtypes there are two
type changes causing the subtype relation: The domain of
jobtype is restricted and some attributes are added to the
subtypes. These simultaneous type changes are
considered to be purely accidental by the record subtyping
rule. The type < .... salary : float > (without attribute
jobtype) is therefore treated as a valid supertype of the
subtypes presented above, although the connection
between the determining attribute
jobtype
and the
subtypes is destroyed. To prevent this from happening or
at least to notify the loss of this connection, it is necessary
to treat these type changes as causal related, like attribute
dependencies do.
3.3 Further usage of attribute dependencies
A last usage, which shall only be sketched here, is
that ADs are an encoding of general sums (see [10] as an
entry point). This equivalence can be exploited when
embedding of flexible relations into programming
languages is discussed. It can be shown that a flexible
386
scheme can be translated into an appropriate program-
ming language type (e.g. a variant record in PASCAL) if
each existential attribute relationship is accompanied by
an AD. If necessary, this can be obtained by introducing
artificial ADs with artificial determining attributes.
The applications discussed in this section pose
different requirements on ADs. In some cases, like
insertion, whole tuples of a flexible relation are
considered. Here, it is sufficient to apply the ADs
specified in the scheme. But most applications, like
update, retrieval or programming language embedding,
are referring only to parts of a tuple or to tuples which
may even have been transformed by (query language)
operations. Therefore it is also necessary to know how
ADs behave under transformations. This question is also
the central point when ADs are exploited for (semantic
preserving) subtyping. To answer this question we will
develop an axiom system for the implication of attribute
dependencies. This is done in the next section.
4. Axiom systems for attribute dependencies
4.1 An axiom system regarding ADs separately
Before we define the axiom system we slightly modify
the definition of an AD. This is only done for the sake of
readability and to better illustrate the similarity to other
forms of dependencies, but does not change its intention.
From definition 2.1 we can derive that, given an
explicit attribute dependency < X exp. ~t~, y ... >,
whenever two tuples h, t2 agree on X, then they possess
the same subset of Y as attributes. So we can define
Dermltion 4.1 "attribute dependency"
Let X, Y c ~/. A flexible relation FR is said to satisfy
the attribute dependency X ,t~ , y if Vtl, t2 ~
inst(FR) :
X c attr(tl) ^ X ~attr(t2) ^
tl[X] = t2[X]
attr(q) n V = attr(t2) n V []
The axiom system i that manages attribute
dependencies consists of the following four rules:
(A1) X ,t~, YZ I- X ~t~, y (projectivity)
(A2) {x "~, Y,X "~, z} ~-x "~, YZ
(additivity)
(A3) O I-X ,t~, y if Y _c X(reflexivity)
(A4) X ,t~ , y F-XZ ~ , Y (left augmentation)
A remarkable point about this rule system is that
transitivity is not valid for attribute dependencies. This
stems from the fact that we do not draw any conclusion
about the contents of the determined attributes. The
correctness of the axiom system is stated by the following
theorem 4
Theorem 4.1 ~ is a sound, complete and non-redundant
system of axioms for the implication of attribute
dependencies. []
Note that all rules could have been defined for explicit
attribute dependencies as well. For example, the
addidvity rule would be
{<X exp. aOr > y,{Vil e.xp. attr ) yl 1 ..... Vl n exp. attr ) ln}> '
<X exp'a~)Z,{V21 cxp'at~r)Z h ..... V~ exp'a~>zzm}>}
I-<X exp. attr ~ YZ,{ VllAV21 exp. attr ) yliz2! .....
VlnnVz m exp. am, ylnz~a}>
This lengthy definition hampers of course the
readability, thus making the abbreviated definition of
attribute dependencies more favorable for our purpose.
Nevertheless we stress again that the presented axiom
system works for explicit attribute dependencies as well.
The following example motivates the arguments of
section 3.1.2 bydemonstrating how a type guard can be
recognized being redundant.
Example 4 Imagine a query containing a selection with
the formula "salary > 5000 AND jobtype = 'secretary' "
followed by a type guard checking for the presence of the
attribute typing-speed. The redundancy of the type guard
can be shown by the following derivation based on the
jobtype-EAD
defined in example 2:
Projecting the right side of the
jobtype-EAD
onto
{ typing-speed } yields (cf. rule (A1))
<{jobtype} exp.,~ {typing-speed} , { < jobtype :
'secretary' > ~xp. at~ , {typing-speed} } >
Augmenting the left side of this EAD with the
attribute salary yields (cf. rule (A4))
< { jobtype, salary } exp. attr , {typing-speed},
{ <
jobtype
: 'secretary', salary : s >
exp. a~ > {typing-speed} } >
where s is an arbitrary value of
dom(salary).
We may
conclude that the presence of the attribute typing-speed
can be deduced from the selection formula and that the
type guard is therefore redundant. []
4 Due to space limitations we skip the proof of soundness and non-
redundancy. The completeness is subsumed by the completeness of the
extended axiom system &* (see theorem 4.2) which is proved in the
appendix. The complete proofs can be found in [8].
387
4.2 An extended axiom system capturing
functional and attribute dependencies
There are several reasons why one should regard FDs
together with ADs. The first and most practical reason
origins from the discussion how to embed flexible
relations into programming languages. Take PASCAL as
an example: Although its variant record type resembles
ADs, there are some syntactic restrictions that have to be
obeyed. One of them is that only a single attribute may
appear as determinant of a variant record. Iamgine an
attribute dependency X am, y with X consisting of at
least two attributes. There is an intuitive way to
circumvent the aforementioned syntactic restriction:
Introduce an artificial attribute A, replace X am , y by
A am , y and make the value of A dependent on the
value of X, i.e. extend the constraints by X runt, A.
The validity of this and other replacements may be
verified by the aid of a rule system combining functional
and attribute dependencies.
Secondly, the assertion of the reflexivity rule for ADs
is too weak as X does not only determine the existence of
Y ff Y is a subset of X but also its value (see reflexivity
rule (F1) below). In a combined system we can sharpen
this assertion, making the axiom system more expressive.
As a preliminary we have to adapt the notion of FDs
to fit into our model. The adaption simply consists of the
adding of a type guard "X ~
attr(t)" as the
access of
values must be preceded by a type guard when structural
variants are allowed. The axiom system, consisting of the
reflexivity rule, the transitivity rule, and the augmenta-
tion rule (see (F1), (F2) and (F3) below), is borrowed
from [16,p.384ff]. Its soundness and completeness is not
affected by the adaption to our model.
Definition 4.2 "FD (adapted to flexible relations)"
Let X,Y c ~t. A flexible relation FR is said to satisfy the
functional dependency X ~ , Y ff ~'tl, t2 ~
inst(FR):
X c_ attr(tl) ^ X ~
attr(t2) ^
tl[X] = t2[X] --->
Y c attr(h ) ^ y c attr(t2) ^
tl[Y] = t2[Y] []
The combined axiom system ~ for functional and
attribute dependencies consists of the following seven
rules:
(AF1) X ~c , y I- X am , y (subsumption)
(AF2){X ~o y,y am Z} I'-X am Z
(combined transitivity)
(A1) X am ~ YZ I'-X am, y (projectivity)
(A2) {X am Y,X am Z} t-X am YZ
(additivity)
(F1) O I- X runt, y if Y c X (reflexivity)
(F2) X f~c ~ y I- XZ f~o, YZ (augmentation)
(F3) {X f~c y,y ~, Z} I-X rune Z
(transitivity)
Theorem 4.2 ~ is a sound, complete and non-redundant
system of rules for the implication of functional and
attribute dependencies. []
Referring to the motivation for this combined rule
system, one can see that the pragmatic "work-around" for
PASCAL's variant record type is valid (see the combined
transitivity rule (AF2)).
In addition we could save two rules, thus making the
axiom system more compact. The reflexivity rule (A3)
and the left augmentation rule (A4), still needed in ~1 to
produce a complete axiom system, can now be inferred
from g with rather simple derivation sequences.
4.3 Attribute dependencies versus record
subtyping - the impact of transformations
At the end of this section we want to discuss the
differences and similarities of attribute dependencies and
traditional record subtyping. To do so, we sketch, with
the aid of the developed axiom system, how
transformations affect attribute dependencies. As the
formal description of the algebra for the model of flexible
relations is beyond the scope of this paper, we will rely
upon well-known algebraic operator, providing the
intuitive meaning in our model, too.
The most remarkable difference between record
subtyping and attribute dependencies shows the project
operator. While record subtyping tells us that any
projection yields a valid supertype [14], two cases have to
be discriminated for attribute dependencies: Suppose that
a flexible relation is to be projected onto the attribute set
X. As there is no rule telling us that an attribute
dependency may hold ff attributes at its left side are
omitted, all V am W with V ~ X are invalidated. If
on the other hand V c X then the projection rule tells us
that V am , W n X holds in the projection.
The two notions of subtyping perform similar when
the result "enlarges" the input relation(s). This holds e.g.
for the extension operator and the cartesian product. The
behaviour of attribute dependencies under algebraic
transformations can be summarized as follows:
Theorem 4.3 Let ads(FR) be the set of attribute
dependencies that hold in the flexible relation FR. The
388
following rules describe the propagation of attribute
dependencies:
(1) ads(FR1 x FR2) = ads(FR1) w ads(FR2)
(2) ads(~x(FR)) = {V ate, WnX I V ~t~,
W • ads(FR) ^ V_cX }
(3) ads( t~ F (FR)) = ads(FR)
(4)
ads(FR 1 u FR2) = O
(5) ads(FR1 - FR2) = ads(FR1) []
The theorem shows that besides the projection the
union operator causes a problem, too. First of all note that
without appropriate precautions no dependency at all
holds in the result of a union, as one cannot decide from
which input relation the tuples do come from, i.e. this is
not a special problem of attribute dependencies. To make
dependencies hold, one has to tag both input relations
before performing the union. The tagging can be realized
with the extension operator eA:a(FR), which extends each
tuple of FR by attribute A with value 'a'.
The left augmentation rule allows us to replace any
X a~, y occuring in one of the input relations by
AX at~ y in its extended counterpart. The extended
attribute dependencies now remain valid in the result
relation, i.e. we obtain
(6) ads( ( ea:al (FR1)) u ( CA: ~ (FR9) ) =
{ AX a~, y I X a~, y • ads(FR1) v
X ~r y • ads(FR2) }
5. Related work
The concept of subtyping is present in each object-
oriented data model [2]. To obey the desirable closure
property, these data models should discuss how the
subtype relation is affected by algebraic tranformations.
This has been done e.g. for the COCOON model in [14]
and for the ENCORE model in [15]. Differences to our
notion of record subtyping have been discussed in section
3.2, the comparison of the behaviour under
transformations is contained in section 4.3.
From the viewpoint of data dependencies, several
attempts have been made to consider value-oriented
dependencies in the presence of null values ([9], [17]),
but considering a merely existential consequence without
any value-oriented assertion seems to be a novelty in the
context of an operational data model.
An approach which pursuits the idea of [7] to
decompose an entity subtree into a master relation and
depending relations containing the variant information is
the "multirelation" model of Ahad and Basu [1]. They
improve the decomposition by keeping track of the
connection between the master relation and the
depending relations so that the restoration of the
complete information can be automated. The recording of
the connection between master and depending relation is
done via so-caUed "image attributes", attributes
possessing relation names as their domain. Image
attributes can be regarded as a special case of an attribute
dependency using a single artificial attribute as
determinant, this approach is therefore completely
covered by our results.
6. Summary and outlook
In this paper we have motivated attribute
dependencies as constraints naturally arising when
variant structures are considered. It turned out that they
can be used to incorporate record subtyping in a
relational model, yielding an even stronger notion of
subtyping, as attribute dependencies consider causal
connections between type refinements. In addition we
could show that the several forms of specialization
arising in (enhanced) entity relationship models can be
one-to-one mapped onto attribute dependencies, with the
benefit that they can be operationally employed in type
and integrity checking.
Our approach to model these features as a dependency
allowed us develop an axiom system for their derivation.
The axiom system has been shown to be sound, non-
redundant and complete, which enables us to precisely
predict the effect of arbitrary transformations (like query
language operations) on attribute dependencies.
Furthermore the connection between attribute and
functional dependencies has beeen discussed and an
extended axiom system capturing both forms of
dependencies has been evaluated.
Although attribute dependencies were regarded in the
context of the model of flexible relations they are rather
loosely fled to particularities of this model (see section 2).
Thus there seem to be no major problems to integrate
attribute dependencies into other data models supporting
variant structures or appropriate null values.
In this paper the connection between attribute
dependencies and subtyping has been shown. However,
the second axiom system presented here, capturing both
functional and attribute dependencies, has been motivated
differently (see section 4.2). Additional work should be
put on the question if this combined rule system can be
exploited to put further semantics into the subtype
relationship.
Acknowledgements
The authors would like to thank Marc Scholl, Franz
Schweiggert and Klaus Gassner for valuable suggestions
on a previous version of the paper. Also thanks go to the
anonymous referees for their valuable comments.
389
References
[1] R. Ahad, A. Basu, "ESQL: A Query Language for the
Relation Model Supporting Image Domains", 7th Int.
Conf. on Data Engineering, April 1991, pp. 550 - 559
[2] M. Atkinson, F. Bancilhon et al., "The Object-Oriented
Database System Manifesto", 1st Int. Conf. on
Deductive and Object-Oriented Databases, Kyoto,
Japan, Dec. 1989, pp. 40 - 57
[3] P. Buneman, A. Ohori, "A Type System that reconciles
Classes and Extents", 3rd Int. Workshop on Database
Programming Languages, Greece, 1991, pp. 191 - 202
[4] L. Cardelli, P. Wegner, "On Understanding Types, Data
Abstraction, and Polymorphism", ACM Computing
Surveys, vol. 17, no. 4, Dec. 1985, pp. 471 - 522
[5] S. Ceri, G. Pelagetti, "Distributes Databases, Principles
and Systems", McGraw-Hill Book Company, 1984
[6] E.F. Codd, "A Relational Model for Large Shared Data
Bases", Comm. of the ACM, vol. 13, no. 6, June 1970
[7] R. Elmasri, S.B. Navathe, "Fundamentals of Database
Systems", Benjamin/Cummings Publ. Comp., 1989
[8] C. Kalus, P. Dadam, "Incorporating record subtyping
into a relational data model", Tech. Report Nr. 94-06,
University of Ulm, Fac. of Computer Science, 1994
[9] Y.E. Lien, "Multivalued Dependencies with Null
Values in Relational Databases", 5th Int. Conf. on Very
Large DataBases, Brazil, 1979, pp. 155 - 168
[10] D. MacQueen, "Using Dependent Types to Express
Modular Structure", 13th ACM Syrup. on Principles of
Programming Languages, 1986, pp. 277 - 286
[11] D. Maier, "The Theory of Relational Databases",
Computer Science Press, 1983
[12] J. Paredaens, P. DeBra, M.Gyssens, D. VanGucht, "The
Structure of the Relational Database Model", Springer-
Verlag, 1989
[13] E. Sciore, "The Universal Instance and Database
Design", Doctoral diss., Princeton Univ.,, Oct. 1980
[14] M.H. Scholl, H.-J. Schek, "A Relational Object Model",
Third Int. Conf. on Database Theory, LNCS 470, Paris,
Dec. 1990, pp. 89 - 105
[15] G.M. Shaw, S.B. Zdonik, "A Query Algebra for Object-
Oriented Databases", 6th Int. Conf. on Data
Engineering, Los Angeles, Feb. 1990, pp. 154 - 162
[16] J.D. Ullman, "Principles of Database and Knowledge-
Base Systems, Volume 1", Computer Sci. Press, 1988
[17] Y. Vassilou, "Functional Dependencies and Incomplete
Information", 6th Int. Conf. on Very Large Data Bases,
Montreal, Canada, 1980, pp. 260 - 269
Appendix
Proof of completeness of the axiom sytem ~.
Let AF be a set ~ ADs and FDs and let AF+IAF" be the set of all
dependencies that can/cannot be derived from AF by the axioms in &r.
X+~nc, the closure of FDs for an attribute set X is known from literature
(see e.g. [16]). Let X+anr be the corresponding closure for ADs. The
relationship between both is X~ttr ~ X~anc as any FD implies an AD due to
the subsumption rule (AF1).
To prove completeness, for each X ~ Y e AF- we have to
construct a flexible relation FR that satisfies all dependencies in AF +, but
not X ~ Y. Analogous to the proof for FDs we construct a two-tuple
(flexible) relation with the following specification (independent on if we
regardX
-attr ) YeAF-orX rune ) YeAF-)
attr(t I) = 1~
VA~ g:tl(A) = 1
attr(t9 = X~t~
VA ~ X~m c : t2(A ) = 1
VA e X~t~- X~¢ : h(A) = 0
This relation can be visualized as follows (with Ill/symboli:fing non-
existent attributes)
attributes of X ~me attributes of X a+~ - X~ta¢ attributes of B - Xatlr+
A A A.
11...1 1 1... 1 11...1
1 1 ... 1 0 0 ... 0 ///////
Suppose we have to show that X attr > y is not satisfied by this
flexible relation. By reflexivity, X g~ X+func , so by construction q[X] =
t2[X]. Y cannot be a subset of X~t ~ , otherwise it would have been inferred
by the closure property. So by construction attr(tl) n Y ~ attr(t 2) n Y, i.e.
X at~ ) y
is not satisfied.
Suppose at the other hand that we have to show that X rune ) y is
not satisfied by this relation. By the closure property Y cannot be a subset
of X~aac, so by construction either Y ~ attr(t2) or at least q[Y] ~ tz[Y ]. In
both cases X time ) y is not satisfied.
In addition we have to show that FR is a legal relation, i.e. that all
dependencies in AF + are satisfied. Let W lane Z ~ AF +.
If
W g X~mc, then t 1 and t 2 disagree on W, and the dependency is trivially
satisfied by FR. Let on the other hand W ~
X+func.
Then by the closure
property X time ) W and by transitivity X rune ) Z. Using the closure
property again we get Z K X~m c and now, by construction, tl[Z ] = t2[Z ].
Hence W fune > Zis satisfied by FR.
Take now W attr
) Z ¢ AF +. Again, ifW g£ X~a c,
then the
dependency is trivially satisfied by FR. Assume on the other hand
W ~ X+func . From the functional closure property we can infer that
fune
X ) W. Now the combined transitivity rule applies and yields that
X attr ) Z holds. The attribute closure property asserts that Z ~ X+attr
and, by construction, attr(tl) n Z = attr(t2) n Z. Hence W attr ) Z is
satisfied by FR. That is, the axioms are complete.
390