scieee Science in your language
[en] (orig)
Property Testing and Geometry
Dissertation of Christian Sohler
September 30, 2002
Acknowledgments
First of all, I would like to thank my thesis advisor Professor Friedhelm Meyer auf der
Heide for his great support. He showed me in many helpful discussions how to improve
my work and suggested in which direction my research should go. But he also left me the
opportunity to go my own way.
I would also like to thank my ’co-advisor’ Professor Artur Czumaj for doing an excel-
lent job. Artur introduced me to property testing and suggested to consider it in the context
of geometry. His encouragement and ideas were always helpful and it was a pleasure for
me to work with him.
Then I would like to thank Valentina Damerow, Alexander May, Thomas M¨
uller, Har-
ald R¨
acke, and Martin Ziegler for carefully reading parts of this thesis and their helpful
suggestions to improve its readability.
Finally, a special thanks to the anonymous person(s) who cheered me up with a gift in
form of a ’SchlagerBild’ music CD during the last phase of writing.
iii
Advertisement
iv
Contents
1 Introduction 1
1.1 Motivation.................................. 4
1.2 RelatedWork ................................ 5
2 Preliminaries 9
2.1 The Standard Testing Model . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Combinatorial Properties . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3 The Power of Uniform Sampling . . . . . . . . . . . . . . . . . . . . . . 12
2.4 Two Probability Lemmas . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3 Testing Algorithms for Geometric Properties 17
3.1 Intersection of Geometric Objects . . . . . . . . . . . . . . . . . . . . . 17
3.2 ConvexPosition............................... 20
3.3 Euclidean Minimum Spanning Tree . . . . . . . . . . . . . . . . . . . . 28
3.3.1 Basic Definitions and Input Representation . . . . . . . . . . . . 29
3.3.2 Testing EMSTs in Well-Shaped Graphs . . . . . . . . . . . . . . 31
3.3.3 A Simple Property Tester in Well-Shaped Graphs . . . . . . . . . 34
3.3.4 An Improved Property Tester in Well-Shaped Graphs . . . . . . . 37
3.3.5 A Property Tester in Graphs with Maximum Degree 5 . . . . . . 43
3.3.6 A Property Tester in General Graphs . . . . . . . . . . . . . . . . 48
4 Efficient Property Testers 51
4.1 Abstract Combinatorial Programs . . . . . . . . . . . . . . . . . . . . . 51
4.1.1 Testing Abstract Combinatorial Programs . . . . . . . . . . . . . 53
4.2 Property Testing vs. Testing Abstract Combinatorial Programs . . . . . . 56
4.3 ClusteringProblems............................. 57
4.3.1 Radiusclustering.......................... 58
4.3.2 Diameter clustering . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.4 ReversalDistance.............................. 64
4.5 Property Testing vs. Testing Abstract Combinatorial Programs (continued) 68
4.6 GraphColoring............................... 70
4.7 HypergraphColoring............................ 73
4.8 Testable Hereditary Graph Properties . . . . . . . . . . . . . . . . . . . . 78
v
Advertisement
Contents
5 Testing Algorithms with Geometric Queries 83
5.1 Property Testing with Range Queries . . . . . . . . . . . . . . . . . . . . 83
5.2 Testing Convex Position with Range Queries . . . . . . . . . . . . . . . . 85
5.3 MapLabeling................................ 89
5.4 ClusteringProblems............................. 93
6 Property Testing and Moving Data 97
6.1 Soft Kinetic Data Structures . . . . . . . . . . . . . . . . . . . . . . . . 98
6.1.1 Analysis of Soft Kinetic Data Structures . . . . . . . . . . . . . . 100
6.1.2 Discussion of The Model . . . . . . . . . . . . . . . . . . . . . . 101
6.2 Basic Soft Kinetic Data Structures . . . . . . . . . . . . . . . . . . . . . 101
6.2.1 SortedSequences.......................... 101
6.2.2 Balanced Search Trees . . . . . . . . . . . . . . . . . . . . . . . 105
6.2.3 RangeTrees............................. 108
6.2.4 Euclidean Minimum Spanning Trees . . . . . . . . . . . . . . . . 111
7 Conclusions 113
vi
1 Introduction
Property Testing is the computational task to decide whether a given object (e.g., a graph,
a function, or a point set) has a certain predetermined property (e.g., bipartiteness, mono-
tonicity, or convex position) or is far away from every object having this property. If the
input object neither has the property nor is far away from it, the algorithm may answer
arbitrarily. In contrast to traditional algorithms a property testing algorithm does not get a
complete description of the object under consideration as input. Instead, it has the ability
to perform queries about the (local) structure of the object. This way it is possible that a
property testing algorithm achieves its goal by looking only at a small (usually randomly
selected) part of the whole object.
The type of query used by a property testing algorithm depends on the object under
consideration. For example, if the object is a (dense) graph Gthen the algorithm may
query for entries in the adjacency matrix of G. If the object is a function f, it may ask
queries of the form: ’What is the value of f(x)?’ where xis an element of the domain of
f. For point sets the algorithm may ask for the position of the i-th point of the set (given
an arbitrary, fixed, but unknown ordering among the points).
To specify if an object is far away from a property we need a distance measure between
objects. The distance between two objects O1and O2is typically given by the ’fraction
of object O1 that has to be changed in order to obtain object O2. For example, in the
adjacency matrix model for graphs the distance between two graphs is the fraction of
entries in the adjacency matrix on which the two graphs differ. The distance between
functions is the fraction of domain elements on which the value of the two functions differs
and the distance of points sets is the fraction of points that are contained in one set but not
in the other. To determine when an object is far away from a property a distance parameter
is introduced. Given a distance measure and a distance parameter we say that an object
is -far from a property, if the distance to every object having the property is more than .
The quality of a property testing algorithm is measured by its query complexity and its
running time. The query complexity is the number of queries the algorithm asks about the
input. The running time is the time required for additional computations.
The concept of property testing was first explicitly formulated by Rubinfeld and Sudan
mainly in the context of program checking [75]. The study of property testing for com-
binatorial objects with focus on graphs was initiated by Goldreich et al. [51]. In many
follow-up papers the concept of property testing has been applied to different classes of
objects including graphs, hypergraphs, matrices, formal languages, Boolean expressions,
and point sets.
In this thesis we study geometric problems in the context of property testing and we
1
Advertisement
1 Introduction
apply concepts from computational geometry to property testing problems. We start our
investigations in Chapter 3 with the development of property testing algorithms for some
fundamental geometric properties [A. Czumaj, C. Sohler, and M. Ziegler, Property testing
in computational geometry, In: Proceedings of the 8th Annual European Symposium on
Algorithms (ESA), pages 155 - 166, 2000]. We show in particular, that disjointness of a set
of ngeometric objects can be tested with O(pn/)query complexity. Next we develop a
property testing algorithm for the convex position property of point sets in the Rd. A point
set is in convex position if every point of the set is a vertex of its convex hull. Our testing
algorithm has a query complexity of Od+1
pnd/. For both problems we show that our
property testing algorithm is optimal with respect to its asymptotic query complexity. Next
we consider the property that a geometric graph in R2(a graph with vertex set equal to a
point set in the R2)is a Euclidean minimum spanning tree. We develop a property testing
algorithm with query complexity O(pn/ log(n/)) for this problem. Our algorithm
is designed in the adjacency list model for graphs and it uses an interesting non uniform
sampling technique.
In the next chapter we present a general framework for property testing algorithms with
one-sided error [A. Czumaj and C. Sohler, Abstract combinatorial programs and efficient
property testers, In: Proceedings of the 43rd Annual IEEE Symposium on Foundations of
Computer Science (FOCS), to appear, 2002]. Our framework is based on a connection of
property testing with a new class of problems which we call abstract combinatorial pro-
grams. Informally, we show that a property can be tested efficiently, if for every problem
instance (no matter, if the instance has the property or not) there is an abstract combinato-
rial program (of small dimension and width) that satisfies a Feasibility Preserving property
and a Distance Preserving property. We apply our framework to a variety of classical com-
binatorial problems. We prove the testability of some clustering properties of point sets,
of the reversal distance property of permutations, and the k-coloring property of graphs
and hypergraphs [A. Czumaj and C. Sohler, Testing hypergraph coloring, In: Proceedings
of the 28th Annual International Colloquium on Automata, Languages and Programming
(ICALP), pages 493 - 505, 2001]. Although for most of these properties a property testing
algorithm has been known before, our framework can be used to simplify the correctness
proofs and in most cases it slightly improves the analyzed query complexity. More im-
portant, it shows that our framework is fairly general and can be applied to a couple of
different problems. In the last section of Chapter 4 we continue our investigations in the
generality of the framework and show: If a hereditary graph property can be tested in
time independent of the size of the graph (for constant ), then there exists a proof for its
testability in our framework. A graph property is called hereditary, if it is closed under
taking induced subgraphs, i.e., if G= (V, E)is a graph with a hereditary graph property
then every subgraph of Ginduced by a set SVhas the property, as well. Although the
current formulation of the framework is purely combinatorial and there is no trace of the
geometric origin, its development has been influenced by concepts from geometry such as
the geometry of linear programming and LP-type problems [76].
In Chapter 5 we return to the development of specific property testing algorithms;
this time under a different model of computation [A. Czumaj and C. Sohler, Property
testing with geometric queries, In: Proceedings of the 9th Annual European Symposium on
2
Algorithms (ESA), pages 266 - 277, 2001]. While in the standard testing model considered
in Chapter 3 a testing algorithm for properties of point sets is only allowed to query for
the position of the i-th point of the set we now allow a certain kind of geometric queries
when accessing the input point set. In this new model a testing algorithm may specify a
query range Rand ask for the i-th point within R. Depending on the type of query range
allowed we have different models of computation. The queries we consider are supported
by standard spatial data structures as they appear in many applications. We show that in
the new model it is possible to design more efficient testing algorithms than in the model
considered so far. In particular, we reconsider the convex position property and show that
it can be tested with O(logn/)triangular range queries. Then we show that there is a
property testing algorithm for a basic map labeling property. This algorithm uses O(1/3)
rectangular range queries. Finally, we visit the clustering properties from Chapter 4 again.
In Chapter 6 we apply property testing in the context of moving objects [A. Czumaj,
C. Sohler, Soft kinetic data structures, In: Proceedings of the 12th ACM-SIAM Symposium
on Discrete Algorithms (SODA), pages 865 -872, 2001]. When we consider moving object
like cars, robots, and mobile phones we do not know the future position of an object.
In many situations an object is controlled and moved by a third party which is located
outside of our system. Basically, the only query a system can guarantee to answer correctly
is for the current position of an object. In Chapter 6 we develop a theoretical model
for this kind of motion. We are interested in combinatorial structures that are uniquely
defined by the current position of all objects. We assume the following: Queries about
the structures we maintain occur subsequently in time. Between two queries the objects
may move arbitrarily. This movement induces combinatorial changes in the maintained
structure because the structure is uniquely defined by the objects’ positions. Before each
query a data structure may spend some time to ’adjust’ its data. The time spent is compared
to the number of ’combinatorial changes’ that occurred in the data structure since the
most recent query. More precisely, we assume that the objects moved on the shortest’ (in
terms of combinatorial changes) route to their current position. If we do not want to ask
for the position of every single object before each query, there is no hope to maintain a
combinatorial structure correctly. That is, where property testing can be applied. We only
maintain approximately correct structures (in the property testing sense). If a structure
is far from correct, then the property testing algorithm usually rejects and returns a small
counter example. This counter example is then locally corrected. This procedure is applied
until the property tester accepts. Using this approach we can show that we can maintain
approximately correct data structures spending only slightly more time than the number
of combinatorial changes in the structure during the shortest motion from configuration to
configuration. We illustrate our approach on sorted sequences, binary search trees, range
trees, and the Euclidean minimum spanning tree.
In the remainder of the Introduction we discuss our motivation to consider property
testing in general, and in particular in the geometric context. Lateron we give a brief
summary of the state of the art in property testing.
3
Advertisement
1 Introduction
1.1 Motivation
Property testing can be viewed as a relaxation of a standard decision procedure: Instead
of deciding whether an object has a given property or not, we only have to distinguish
between the case when the object has the property and the case when the object is far
away from it. This simpler problem can often be solved much faster and by looking only
at a small - sometimes constant - part of the input. For this reason the running time of a
property testing algorithm is usually much smaller than that of the corresponding decision
procedure. Problems that are infeasible in their original formulation as a decision problem
(e.g., NP-complete problems) are tractable in the property testing setting. The standard
way to deal with intractable problems would be, of course, to apply an approximation
algorithm (in the classical sense) to the problem. But sometimes polynomial or even linear
time approximation algorithm may also be too slow. In telecommunications, web traffic
analysis, and data mining we have massive data sets that cannot be processed by classical
algorithms. In such cases the only possibility is to use a sublinear time and space algorithm
as provided by property testing.
We also observe that property testing is a natural form of approximation. Whether or
not the approximation provided by a standard approximation algorithm is more useful than
that of property testing depends on the problem at hand. For example, in the case of graph
and hypergraph coloring it might be more useful to obtain a coloring that violates at most
an -fraction of the edges than a coloring that might use many more colors than necessary.
We remark that property testing can be very useful in the field of program verification.
In software development people often specify interfaces between different program parts.
Even with a correct and complete interface specification it may happen (more often than
not ?) that data is passed through the interface that is not consistent with the specification.
Typically, data is passed internally using pointers and so it is possible to pass huge objects
in constant time. Hence it is infeasible to check the whole object for its correctness without
heavy impact on the running time of the system. In this case a property testing algorithm
provides a useful alternative: If we cannot give a full guarantee that the input is correct
then we can at least detect the case when it is far from correct.
A further reason for the research in property testing algorithms is the fact that in some
situations being close to a property is almost as good as having the property. This can be
seen in the following example from Computer Graphics: One of the major problems in
Computer Graphics is to design efficient rendering algorithms. Rendering is the computa-
tional task to display a virtual scene on the screen. Such a virtual scene is typically com-
posed of several millions of polygons. Using nowadays graphics hardware and a standard
z-buffer algorithm it is not possible to render all polygons in a reasonable time. Therefore,
it is necessary to determine in advance if certain parts of the scene cannot be seen from
the current point of view. This process is called occlusion culling. Unfortunately, it is very
difficult to compute exactly those polygons that cannot be seen from a certain point of
view. Many approaches have been developed to deal with this problem and some of them
work only for objects that have (roughly) a convex shape (see, e.g., the survey [26]). If we
could determine quickly in advance whether an object is convex or far away from convex
then we could quickly decide if it is useful for the purpose of (online) occlusion culling.
4
1.2 Related Work
Such a service can be provided by a property testing algorithm. Therefore we consider the
closely related problem, if a point set (possibly the set of vertices of an object used for
culling purposes) is in convex position, in Chapter 3 and 5 under different property testing
models.
One particular situation when property testing can be useful in Computational Geom-
etry is the exact computation of geometric structures. It is well-known that fixed precision
arithmetic leads to serious problems in the design of geometric algorithms. Therefore,
software packages provide efficient algorithms to evaluate geometric predicates exactly.
Unfortunately, precision takes its time and so the exact evaluation of geometric predicates
is usually much slower than what we could achieve by using fixed precision arithmetic.
One approach to deal with these problems is the following: We first compute the geomet-
ric structure using fixed precision arithmetic and then verify the final structure using exact
arithmetic. If we find errors we try to fix them locally. Unfortunately, such a heuristic
has the drawback that an early error in the computation may cause damage to the whole
part of the structure computed later. Therefore, we should make sure that our structure is
not too far away’ from the correct structure during the whole computation. This could
be achieved by a property testing algorithm. If the property testing algorithm rejects the
current structure, then we can correct it before we finish the computation.
Another situation when property testing is useful is in the case of continuously chang-
ing data. Continuously changing data arises in applications that deal with moving objects,
e.g., when we want to maintain a mobile ad-hoc network for moving robots. In such a
situation we can query each robot for its current position but such a query may be slow
and each query increases the network load. It also does not make sense to update the posi-
tion of each robot before a transmission using the network is performed - such an update
would often create much more network load than the transmission itself. Instead we may
use property testing techniques to check the structure of the current network. If we find
errors in the structure we then correct them locally. In Chapter 6 we develop a theoretical
framework for mobile data and we show how to maintain a certain type of approximate
range trees. These range trees may be useful for the problem of maintaining such a com-
munication network.
1.2 Related Work
We now give a brief summary of the state of the art in property testing. Property testing
was first explicitly formulated by Rubinfeld and Sudan in the context of program checking
[75]. In [51] Goldreich, Goldwasser, and Ron initiated the study of property testing for
combinatorial objects. Since then, people designed property testing algorithm for a wide
variety of problems and tried to classify large classes of testable properties. In the follow-
ing we try to summarize the known results in the different areas where property testing
has been applied. We say that a property can be tested efficiently, if there is a property
testing algorithm with a query complexity that does not depend on the size of the input (is
constant, if the distance parameter is a constant).
5
Related document tools
Tools for careful academic writing
Plag helps identify passages that may need closer source checking. Identific can support academic and institutional document processes. They help keep academic document workflows clearer.
1 Introduction
Graph, Hypergraph, and Matrix Properties. Graph properties have been exten-
sively studied in the context of property testing. There are essentially two models for
testing graph properties. The adjacency matrix model for dense graphs and the adjacency
list model for sparse graphs. In the adjacency matrix model introduced in [51] it is known
that properties that can be formulated as a certain class of graph partitioning properties
can be tested efficiently [51]. Among these properties are clique size, cut size, bisection,
and k-colorability properties of graphs. Colorability properties have received much atten-
tion in subsequent work. The analysis from [51] for the bipartiteness and the (standard)
k-colorability property has been improved in [6]. In [5] Alon et al. used a variant of Sze-
mer´
edi’s Regularity Lemma to show that a very general graph coloring property can be
tested efficiently. They use this result to prove that every first order logic graph property
without ∀∃ quantification is efficiently testable. Though the query complexity of their al-
gorithm is independent of the size of the graph its dependency on is enormous (i.e., a
tower of towers of a polynomial in ). The notion of coloring from [5] has been further
generalized in various ways in [42] and shown to be efficiently testable.
Another general property considered in the research is the property of H-freeness
(graphs not containing a fixed graph Has an induced subgraph). Alon proved that the
query complexity of a one-sided error property tester is polynomial, if and only if His
bipartite [3]. The property of H-freeness can also be tested efficiently in 3-uniform hyper-
graphs [62].
In the adjacency matrix model there is always a canonical property testing algorithm as
the following result of Goldreich and Trevisan shows: A property in the adjacency matrix
model can be tested with query complexity independent of the size of the graph, if and
only if there exists a property testing algorithm that samples a set Sof vertices of size
independent of the graph uniformly at random and that accepts if and only if the subgraph
induced by Shas some fixed graph property (possibly different from the tested one) [54]. In
this work the authors also characterize graph partition problems that are efficiently testable
with one-sided error.
The adjacency list model for graphs has been introduced in [52] by Goldreich and Ron.
They show that some graph properties like connectivity, planarity, and cycle-freeness can
be tested efficiently. In contrast to the adjacency matrix model in the adjacency list model
there is no canonical property testing algorithm known. Typically, non-uniform sampling
strategies that depend on the problem at hand have been applied. For this reason the prop-
erties considered in the adjacency list model seem to be structurally simplier than the ones
analyzed in the adjacency matrix model. Colorability properties have also been considered
in the adjacency list model: In [53] Goldreich and Ron showed that bipartiteness can be
tested with a query complexity of Opoly((logn)/)pn/using a non-uniform sam-
pling strategy based on random walks. The authors also present an almost matching lower
bound on the query complexity. Very recently, Bogdanov et al. [19] presented a lower
bound of (n)(for sufficiently small ) on the query complexity of 3-colorability in the
adjacency list model. It is also known that the diameter of graphs [67] and acyclicity of
directed graphs is testable in sublinear time [15].
The popularity of the adjacency matrix model for graphs lead to the question which
matrix properties can be tested efficiently. Fischer and Newman showed that matrix prop-
6
1.2 Related Work
erties defined by a finite collection of forbidden induced partially ordered sets (where we
view a matrix as a partially ordered set in the standard way, that is, as a product order of
total orders) are efficiently testable [45]. Parnas and Ron developed property testing algo-
rithms for the problem to test whether a given matrix is a Euclidean metric, a tree metric,
or an ultrametric [68].
Properties of Functions. Monotonicity properties of functions have also received a
lot of attention. Monotonicity of binary functions f:{0, 1}n{0, 1}can be tested with
a query complexity of O(n/)[50]. This result has been generalized to functions over
arbitrary finite ordered sets in [37]. Sortedness of a sequence of nnumbers can be tested
in O(logn/)time [40]. Lower bounds on monotonicity testing have been given in [44].
Properties of Boolean functions that can be tested efficiently are singletons, monomi-
als, and the property that fis a monotone DNF formula with at most `terms. [70]. In [43]
Fischer et al. proved that the property that a Boolean function over nvariables depends
only on kof them can be tested with a number of queries depending only on kand .
Language Properties. In order to understand the nature of efficiently testable prop-
erties people considered property testing in the context of formal languages. Alon et al.
showed that a regular language seen as a property can be tested efficiently [7] and there
exists a context-free language that cannot be efficiently tested. Further, there is a property
testing algorithm for testing Dyck languages (languages containing strings of balanced
parenthesis) with query complexity e
O(n2/3/)[69]. Newman proved that constant width
oblivious read-once branching programs viewed as properties can be tested efficiently [66].
Recently, Fischer and Newman showed that read-twice constant width branching programs
are not necessarily testable [46].
Other Results. There are a couple of results on specific topics that do not fit into one
of the three categories above. We summarize them in this paragraph.
Alon and Shapira showed that a general satisfiability problem which includes hyper-
graph k-coloring can be tested efficiently [8].
In the context of clustering Alon et al. considered the radius and diameter clustering
problem, i.e. the problem if a given point set can be partitioned into ksubsets such that
the radius of the smallest sphere enclosing each set (the maximum distance between two
points within each set, respectively) is at most b. If we view this problem as a property of
point sets then it can be tested efficiently as proved in [4].
Some results related to statistics have been obtained: In [14] Batu et al. gave a prop-
erty testing algorithm with query complexity O(n2/3/4logn)for closeness of distri-
butions. Given a distribution Aover [n]×[m]it can be tested with query complexity
O(n2/3m1/3poly(1)) if the distributions obtained by projecting Ato each coordinate
are independent [13].
Recently, Buhrman et al. introduced the notion of property testing to quantum com-
puting [21].
7
Advertisement
1 Introduction
Surveys. There are three surveys that summarize some of the work done in the field of
property testing [42, 74, 49].
8
2 Preliminaries
In this chapter we introduce some notation used throughout this thesis. In particular, we
introduce a formulation of the standard property testing model with one sided error. We
introduce a special class of properties which we call combinatorial properties.
Then we show that if there exists a property tester for a combinatorial property of point
sets then there exists a property tester with the same query complexity that picks a sample
set uniformly at random and decides based on the sample and its internal coin flips. Such a
result is used for lower bound constructions for the query complexity of a property testing
algorithm.
Finally, we prove two auxiliary lemmas. The first lemma gives a lower bound on the
probability that a sample taken uniformly at random from a set contains one of kdisjoint
sets of size l. The second lemma gives an upper bound on that probability.
2.1 The Standard Testing Model
We begin with some basic notation and definitions. We use the e
O-notation to hide poly-
logarithmic factors, i.e. we have O(npolylog(n)) = e
O(n). We write [n] := {1,...,n}
for the set of integer numbers between 1and n. Throughout this thesis, let Ddenote a
finite set called domain and let Rdenote a (possibly infinite) set called range. Further let
Fdenote a set of functions from Dto R. For a subset S D and a function f F let
f|Sdenote the restriction of fto S. That is, f|S:SRwith f|S(x) = f(x)for all xS.
Now we define a property of Fas a set of functions from F:
Definition 2.1.1 A set Π F is called a property.
As already mentioned in the introduction we also need a distance measure between
functions in Fto define a property testing problem. In general, such a distance measure
can be an arbitrary function σ:F ×F [0, 1]:
Definition 2.1.2 Given a distance measure σbetween functions in Fand a real number
,0<1, we say a function f F is -far from (having a property) Πif σ(f, g)>
for every function gΠ.
Typically, we define the distance between two functions f, g F as the fraction of
domain elements on which the two functions differ (see, for example, [51]). Therefore, we
denote this distance measure as the standard distance measure:
9
Advertisement
2 Preliminaries
Definition 2.1.3 Given two functions f, g F we define the standard distance measure σ1
for functions in Fas:
σ1(f, g) = |{x D :f(x)6=g(x)}|
|D|.
The goal of property testing is to develop efficient property testers. A property tester
for Πis an algorithm that gets a distance parameter and a (possibly implicit) description
of D(for example, when Dis the set of numbers from [n]it gets n). The property tester
has oracle access to the input function f(for each x D it may ask queries of the form:
’What is the value of f(x)?’). A property tester must
accept every function fΠ, and
reject every function fthat is -far from Πwith probability at least 2
3.
1Notice that if f /Πand fis not -far from Π, then the outcome of the algorithm can go
either way.
Complexity of Property Testers. There are two types of possible complexity mea-
sures for property testers: The query complexity and the running time. The query com-
plexity measures the number of queries asked by a property testing algorithm:
Definition 2.1.4 The number of queries to the oracle is the query complexity of the prop-
erty tester.
If one counts also the time the algorithm needs to perform other tasks than querying
the input function values (for example, to compute certain combinatorial structures or to
compute a coloring of a graph), then the obtained complexity is called the running time of
the property tester. Now we present two examples how objects other than functions can be
represented in our model.
Point Sets. When we consider point sets we set D= [n]. Then we can represent a
point set P={p1,··· , pn}in Rdby a function f: [n]Rdwith f(i) = pifor 1in.
Throughout the thesis we assume that the input point set is in general position when we
deal with property testing algorithms for point sets. General position means that our point
set is not in a degenerate configuration. Roughly speaking, we call a configuration of
points degenerate, if it occurs with probability 0when the points are subject to small
random perturbations.
To be more precise, when we consider the convex position property as defined in Chap-
ter 3 then a point set Pin the Rdis degenerate, if there are more than k+1points in a
k-dimensional affine subspace with k<d. When we consider the Euclidean minimum
spanning tree, then a point set is degenerate, if the point set does not have a unique min-
imum spanning tree or if it has a subset that does not have a unique minimum spanning
tree.
1We consider a one-sided error model, though in the literature also a two-sided error model has been
considered. In the two-sided error model the goal is to distinguish with probability at least 2
3between
the case fΠand the case of fbeing -far from Π.
10
2.2 Combinatorial Properties
Graphs. In the literature one can find three models for property testing in graphs. The
first (and most widely considered) model is called the adjacency matrix model [51]. Here
one assumes that a graph G= (V, E)is represented as an adjacency matrix, i.e. it is
possible to query for each pair (v, u)with v, u V, if (v, u)Eor not. Similarly, one
can assume that the graph is represented by a function f:V×V{0, 1}that encodes the
adjacency matrix. W.l.o.g., we assume that V= [n]. Hence when we set D= [n]×[n]
and R={0, 1}then we can represent graphs as functions from Dto Ras required.
The second model is the bounded length adjacency list model [52]. Here one assumes
that a degree bound don the maximum degree of the vertices in the input graph is known.
In this case a graph is represented as a function f: [n]×[d][n]{0}where f(i, j)
denotes the j-th neighbor of vertex iand where f(i, j) = 0, if such a neighbor does not
exist.
The third model is the unbounded length adjacency list model [67]. In this model, the
input is not represented as a function. The model is specified by the different queries that
may be asked about the input graph G= (V, E). First of all, the algorithm may ask for the
degree deg(v)of each vertex vV. Further it may ask for the i-th neighbor of vertex v
for each ideg(v).
2.2 Combinatorial Properties
We now want to introduce a special class of properties that plays an important role in
property testing. We refer to these properties as combinatorial properties. Combinatorial
properties are only defined for functions whose domain Dis (isomorphic to) a set of the
form [n]dfor some constant d>1. They are properties that are closed under permutations
(isomorphisms) of the domain. This is typically the case when the function is used to
represent a set of objects or a combinatorial structure such like a graph or hypergraph.
Definition 2.2.1 Let Fbe the set of functions with domain D= [n]dand arbitrary range
R. Let Sndenote the set of permutations of [n]. A property Πof Fis called combinatorial,
if it is closed under permutations of [n]. That is, if π Snis a permutation of [n]then
for every fΠit holds: π(f)Πwhere π(f)(i1, . . . , id) := f(π(i1), . . . , π(id)) for
i1,··· , id[n].
Let us consider a few examples for combinatorial properties. If the considered object
is a set of basic objects (that is, there is no ordering among the objects) then every property
must be combinatorial. This is because the property does not depend on the way the object
is represented as a function. Examples for combinatorial properties of sets of objects
are the disjointness property considered in Chapter 3 or the clustering properties of point
sets in Chapter 4. Other examples are all graph and hypergraph properties (if viewed as
properties of an n-vertex graph or hypergraph).
Examples for properties that are not combinatorial are properties like sortedness,con-
vex polygons2, and the reversal distance property considered in Chapter 4. These prop-
erties typically depend heavily on the ordering given implicitly by the representation of
2The property that a sequence of points forms a simple convex polygon.
11
Advertisement
2 Preliminaries
the input object as a function. When we design property testers for these properties we
sometimes require non-uniform sampling techniques.
For combinatorial properties of point sets we show in the following section that there is
no chance to improve the query complexity of a property tester by using adaptive sampling.
If we consider combinatorial properties of point sets and we have an arbitrary property
tester with query complexity q(n, )then there is a property tester that samples a set
S[n]of size q(n, )uniformly at random and decides based on Sand its internal coin
flips.
2.3 The Power of Uniform Sampling
We now prove a fundamental lemma that is needed for lower bound constructions. We
show that if there is a property tester for a combinatorial property Πof point sets with
query complexity q(, n)then there is a property tester for Πthat picks a sample set
SPof size q(, n)uniformly at random and decides based on Sand its internal coin
flips. The proof is almost similar to the proof of Lemma 4.1 in the paper [54] by Goldreich
and Trevisan which is based on an analogous statement proven by Bar-Yossef et al. in [11].
The idea of the proof is simple: There are n!different representations of the same
point set, one for each permutation of [n]. When we start our algorithm on a random
representation of the same set then choosing the next index adaptively simply means to
pick another element uniformly at random. Indeed we cannot gain anything using adaptive
sampling:
Lemma 2.3.1 Let Πbe an arbitrary combinatorial property of point sets and let Abe an
arbitrary property tester for Πwith query complexity q(, n). Then there exists a property
tester for property Πthat selects a set Sof q(, n)points uniformly at random and decides
based on Sand its internal coin flips.
Proof : Recall that a point set P={p1,··· , pn}in the Rdis represented by a function
f: [n]Rdwith f(i) = pi. Since Πis a combinatorial property we know that we can
talk about properties of Prather than properties of the function representing P. That is, if
one representation of Phas property Πthen every other representation also has property
Π. We use this fact in the following to construct a property tester that samples uniformly at
random from an arbitrary property tester for Π. Both property testers have the same query
complexity.
Let Abe an arbitrary property tester for Π. Wlog., we assume that algorithm Aoperates
in iterations. In each iteration, depending on its internal coin flips and the answers obtained
in the past iterations, the algorithm selects a new index iand makes a query for the position
of the point pi.
Now we obtain a new algorithm A0from Ain the following way: When A0is started
with input point set Pof size n(given as an oracle) it first choses a permutation πof [n]
uniformly at random. Then algorithm Ais invoked with oracle access to the point set π(P)
that is represented by the function fπ: [n]Rddefined by fπ(i) = f(π(i)). We observe
12
2.3 The Power of Uniform Sampling
that π(P)has property Π, if and only if Phas property Πsince π(P) = Pand since Πis
combinatorial. Further we know that Ais a property tester for property Π. Thus it must
accept, if π(P)has property Π. It follows that A0accepts, if Phas property Π.
If Pis -far from Πthen so is π(P). By the fact that Ais a property tester it follows
that π(P)is rejected by Awith probability at least 2/3. It follows that A0also rejects P
with probability at least 2/3. Thus we know that A0is a property tester for property Π.
Our next step is to show that in each iteration of algorithm A0the next chosen point
is uniformly distributed among all possible choices (among all points that have not been
chosen in a previous iteration). For every sequence rof coin flips let Ardenote the deter-
ministic algorithm obtained from Aby fixing the outcome of the coin flips according to r.
Further let A0
rdenote the algorithm similar to A0with the exception that Aris invoked in-
stead of A. We show that for every sequence rof coin flips and for every possible sequence
of queries and answers the choice of the point selected next by algorithm A0is uniformly
distributed among the points not selected so far.
Let iπ
1,r,··· , iπ
j,r denote the indices selected by A0
rin iterations 1to j, i.e., the indices se-
lected when Aris invoked with oracle access to the point set π(P). Further let qπ
1,r,··· , qπ
j,r
denote the points selected by A0
rin iterations 1to j, i.e., qπ
`,r =fπ(iπ
`,r)for 1`j. The
choice of the index iπ
j+1,r selected by A0
rin iteration j+1depends only on the previously
selected points qπ
1,r, ..., qπ
j,r . Thus when we condition πby fπ(iπ
`,r) = qπ
`,r,1`j,
the choice of index iπ
j+1,r is deterministic. Further we prove in the following claim that
for every fixed index iπ
j+1,r under the same conditioning the point fπ(iπ
j+1,r)is uniformly
distributed among the points not selected so far:
Claim 2.3.2 Let i1,··· , ij[n]be distinct indices and let q1,··· , qjPbe the distinct
points. Then for each i[n]\ {i1,··· , ij}and each pP\ {q1,··· , qj}:
Pr[fπ(i) = pfπ(i`) = q`, 1 `j] = 1
nj
.
Proof : The number of permutations of [n]with fπ(i`) = q`for 1`jis (nj)!.
For each i[n]\ {i1,··· , ij}and each pP\ {q1,··· , qj}the number of permutations
of [n]with fπ(i) = pand fπ(i`) = q`for 1`jis (nj1)!. Thus
Pr[fπ(i) = pfπ(i`) = q`, 1 `j] = (nj1)!
(nj)! =1
nj
.2
Thus we know that in each iteration of algorithm A0
rthe next chosen point is uniformly
distributed among the points not chosen so far. We use this fact to build an algorithm
A00 that has the same output distribution as A0. Then we design an algorithm A000 that
samples a set SPof q(, n)points uniformly at random and then decides based on
internal coin flips and the point positions. We observe that we can execute algorithm A0by
first choosing a sequence rof coin flips uniformly at random and then invoking algorithm
A0
r. Algorithm A00 works in a similar manner. It first chooses a sequence rof coin flips
13
Advertisement
2 Preliminaries
uniformly at random. Then it invokes algorithm A00
r. Algorithm A00
rbehaves similarly to
algorithm A0
rwith the exception that in each iteration when A0
rrequests the position of
the point with index ialgorithm A00
rselects uniformly at random an index not selected so
far and returns the corresponding point. Let us formalize this point and look at iteration
jof both algorithms. Let ijdenote the index selected by algorithm A0
rin iteration jand
let i0
jdenote the index selected by algorithm A00
r. When during iteration j+1algorithm
A0
rrequests the position of the point stored at index ij+1then A00
rselects an index i0
j+1
uniformly at random from the set [n]\ {i0
1,··· , i0
j}and answers the query with the position
of point i0
j+1.
Claim 2.3.3 Let Pbe any point set in the Rd. If algorithms A00 and A0are invoked with
oracle access to the same function representing Pthen their output distributions are iden-
tical.
Proof : Let us fix the function representing Pand r(for both algorithms). We show by
induction on the number of iterations that the output distribution of algorithms A00
rand A0
r
are identical. We already proved for a fixed selection of points in iterations 1to jthat the
next point chosen by algorithm A0
ris uniformly distributed among all points not selected so
far. Clearly, the same is true for algorithm A00
r(if the distribution of the indices is uniform
then so is the distribution of the corresponding points). Thus the induction step holds and
thus the output distributions of both algorithms are identical. 2
We have already proven that A0is a property tester for property Π. Since the output
distributions of A00 and A0are identical for every input point set we have shown that algo-
rithm A00 is also a property tester. But A00 uses a non-adaptive sampling. Finally, we show
that there is an algorithm A000 that samples a set Sof q(, n)points uniformly at random
and decides based on Sand its internal coin flips.
Algorithm A000 first samples a set Sof q(, n)indices uniformly at random and then
chooses a permutation (i1,··· , iq(,n))of Suniformly at random. It then runs algorithm
A00 and answers the j-th query of algorithm A00 by returning the point stored at index ij.
Claim 2.3.4 Let Pbe any point set in Rd. If algorithms A000 and A00 are invoked with oracle
access to the same function representing Pthen their output distributions are identical.
Proof : Again we fix the function representing point set Pand the random choice of r. We
show by induction on the number of iterations that both algorithms have the same output
distribution. It suffices to show that the choice of the index in iteration j+1is uniformly
distributed among the indices not chosen so far. For every index i[n]\ {i1,··· , ij}it
holds that
Pr[ij+1=ii`6=ifor 1`j] = q(, n) j
nj·(q(, n) j1)!
(q(, n) j)! =1
nj
Thus the index is chosen uniformly at random from the set of remaining indices. Hence,
the output distributions of algorithms A000 and A00 are identical. 2
Algorithm A000 satisfies the statement of Lemma 2.3.1 and thus the proof is completed.
2
14
2.4 Two Probability Lemmas
2.4 Two Probability Lemmas
The first lemma shows that if we take a sample of size sfrom a set of nelements, then
with good probability we have at least one of kdisjoint sets (each of size l) in our sample.
Lemma 2.4.1 Let be an arbitrary set of nelements. Let kand lbe arbitrary integers
(possibly dependent on n) and let sbe an arbitrary integer such that s2 n
(2 k)1/l . Let
W1, . . . , Wkbe arbitrary disjoint subsets of each of size l. Let Sbe a subset of of
size swhich is chosen independently and uniformly at random. Then
Prj[k] : (WjS)1
4.
Proof : We first observe that we can focus on the case kn
2, because if k > n
2, then
every set Wicontains exactly one element and then we immediately get Pr[j[k] :
(WjS)] 1
2. Further, since kn
2yields l+nl
(2 k)1/l 2 n
(2 k)1/l , it is sufficient to
consider only the case s=l+nl
(2 k)1/l . Next, by Boole-Benferoni inequality (see, e.g., [65,
Proposition C.2]) we obtain
Prj[k] : (WjS)
k
X
j=1
Pr[WjS] X
1i<jk
Pr[(WiWj)S].
Further, we observe that for every j[k]it holds that
Pr[WjS] = nl
sl
n
s=(nl)!
(sl)! (ns)! ·s! (ns)!
n!=(nl)! s!
n! (sl)! =
l1
Y
r=0
sr
nr.
Similar arguments can be used to show that for every i, j [k], if i6=j, then it holds that
Pr[(WiWj)S] = n2 l
s2 l
n
s=
2 l1
Y
r=0
sr
nr=
l1
Y
r=0
sr
nr·
l1
Y
r=0
(sl) r
(nl) r.
Hence, Boole-Benferoni inequality implies that
Prj[k] : (WjS)k·
l1
Y
r=0
sr
nrk
2·
l1
Y
r=0
sr
nr·
l1
Y
r=0
(sl) r
(nl) r
=k·
l1
Y
r=0
sr
nr· 1k1
2·
l1
Y
r=0
(sl) r
(nl) r!
k·
l1
Y
r=0
sl
nl· 1k·
l1
Y
r=0
sl
nl!
=k·sl
nll
· 1k·sl
nll!.
15
Advertisement
2 Preliminaries
Now, since we assumed that s=l+nl
(2 k)1/l , our calculations above yield Pr[j[k] :
(WjS)] 1
4, and thus the lemma follows. 2
The next lemma is useful for certain lower bound constructions. The lemma gives an
upper bound on the probability that a sample set of size schosen uniformly at random
contains one of ksets Wicompletely.
Lemma 2.4.2 Let be an arbitrary set of nelements. Let k,l, and sbe arbitrary integers
(possibly dependent on n). Let W1, . . . , Wkbe any disjoint subsets of of size leach.
Let Sbe a subset of of size swhich is chosen independently and uniformly at random.
For every real p,0 < p < 1, (possibly depending on other parameters), if s(n (l
1)) ·(p/k)1/l, then Prj[k] : (WjS)p .
Proof : We use the union bound to obtain that
Prj[k] : (WjS)
k
X
j=1
Pr[WjS] =
k
X
j=1
l1
Y
r=0
sr
nrk·s
n (l1)l
.
Therefore, if s(n (l1)) ·(p/k)1/l, then the above inequality is upper bounded by
p.2
16
3 Testing Algorithms for Geometric
Properties
In this chapter we design and analyze testing algorithms for fundamental problems from
the area of Computational Geometry. The problems we consider deal with global proper-
ties of geometric objects such as point sets or geometric graphs. We can formulate property
testing in the standard model for geometric objects as follows:
Given a geometric object O(for example, a set of points or a geometric graph) and the
ability to perform local queries about the object (e.g. queries of the form: ’What is the
position of the i-th point of the collection ?’, ’What is the i-th edge incident to vertex p
?’, or ’What is the position of vertex p?’) decide whether Ohas a certain (predetermined)
property (e.g., whether the set of points is in convex position or whether the geometric
graph is a Euclidean minimum spanning tree) or is far away from every object having this
property.
We assume that our object representation is in the standard testing model. Point sets are
represented in the same way as it has been described in the previous chapter. Other objects
we consider are sets of geometric objects and geometric graphs. The object representation
is again in the spirit of the standard testing model. We allow only the most basic queries
about the object. It is only required that it is possible to determine the whole structure of
the object using these queries. By allowing only basic queries we ensure that our testing
algorithm is almost independent of the application (almost every structure supports these
basic queries). Later, in Chapter 5 we consider a model where we allow more powerful
queries.
The first problem we consider is disjointness of geometric objects. Then we design
a property tester for the convex position property of point sets. And finally, we give a
property tester for Euclidean minimum spanning trees.
3.1 Intersection of Geometric Objects
The first problem we consider is to test whether a set O={O1, . . . , On}of ngeometric
objects is pairwise disjoint. A geometric object is an (implicitly represented) subset of
Rd. Two geometric objects O1and O2are said to be disjoint, if O1O2=. A set
O={O1, . . . , On}of ngeometric objects is pairwise disjoint, if each pair of objects Oi
and Oj,1i, j n, is disjoint. An equivilant formulation of this problem is to test
17
Advertisement
3 Testing Algorithms for Geometric Properties
whether the intersection graph of the objects contains no edges.1
We represent sets of geometric objects by a function f: [n]Rwhere Rcontains
implicit representations of all geometric objects of a certain class of objects, e.g. all line
segments, points, or rectangle. For example, when we consider sets of line segments in the
Rdthen R=Rd×Rd. The property tester for disjointness of geometric objects is used
later in this thesis when we design a property tester for the Euclidean minimum spanning
tree. We use the standard distance measure from Definition 2.1.3 which has the following
equivalent formulation:
Definition 3.1.1 A set Oof nobjects in the Rdis -far from being pairwise disjoint, if one
has to remove more than n objects from Oto obtain a disjoint set of objects.
The Testing Algorithm. Like many other property testers the tester for disjointness of
geometric objects is very simple. It picks a set Sof 8pn/ objects uniformly at random
and computes whether these objects are pairwise disjoint. If they are, then the algorithm
concludes that in this case either the objects are disjoint or at least ’most of the objects of
the input set are disjoint’. Thus in such a case the algorithm can accept the input. If the
sample set contains two or more objects that intersect, then we know for sure that the set of
objects is not disjoint and the algorithm can reject the input. We prove that the following
algorithm is a property tester for disjointness of geometric objects.
DISJOINTNESSTESTER(set of objects O)
Choose a set S O of size 8pn/ uniformly at random
if Sis disjoint then accept
else reject
Lemma 3.1.2 Algorithm DISJOINTNESSTESTER is a property tester for disjointness of
geometric objects. Its query complexity is O(pn/)and its running time is T(8pn/)+
O(1), where T(m)is the time to decide whether a set of minput objects is disjoint.
Proof : We have to prove that (1) algorithm DISJOINTNESSTESTER accepts every set of
disjoint geometric objects, and (2) that it rejects every set of geometric objects that is -far
from disjoint with probability at least 2/3.
Part (1) is immediate: If Ois pairwise disjoint, the every subset of Ois also pairwise
disjoint and so algorithm DISJOINTNESSTESTER accepts O.
Thus let us suppose that Ois -far from disjoint and prove part (2). It is easy to see that
we can apply k=n/2 times the following procedure to O: pick a pair of intersecting
objects Wi,i[k], and remove it from O. In order to prove that DISJOINTNESSTESTER
is a property tester it is sufficient to show that with probability at least 2/3 at least one
of these pairs is in the sample set S. We apply Lemma 2.4.1 and standard amplification
1We do not use the formulation as a graph problem here because our distance measure would be based on
vertex deletion rather than edge deletion as it is usually in the property testing literature (see Section 2.1).
18
3.1 Intersection of Geometric Objects
arguments. We consider the sample set Sas four independently selected sets S
1, S
2, S
3, S
4,
each of size 2n
(2k)1/2 and then apply Lemma 2.4.1 to obtain:
Prj[k] : (WjS)1Y
1i41Prj[k] : (WjS
i)1(3/4)42/3
2
A Lower Bound on the Query Complexity. We now want to show that the algo-
rithm DISJOINTNESSTESTER in the general setting of arbitrary geometric objects is op-
timal with respect to its asymptotic query complexity. For this purpose we consider the
problem if a set of unit disks is disjoint. We represent a set of nunit disks by their center
points, that is, by a function f: [n]Rd. We observe that the disjointness property for
unit disks is equivalent to the following sparseness property of point sets. A point set Pin
the Rdis called sparse, if for each pair of points p, q we have dist(p, q)> 1. In order to
prove that every property tester for disjointness of geometric objects has query complexity
(pn/)we show that there is no property tester for sparseness of point sets with query
complexity o(pn/).
Lemma 3.1.3 Every property tester for disjointness of sets of geometric objects must have
query complexity (pn/).
Proof : As already mentioned above sparseness of point sets is a special case of dis-
jointness of geometric objects. Hence, it suffices to show a lower bound for the sparseness
property of point sets. Let us assume that there is a property tester Afor sparseness of
point sets with query complexity q(, n) = o(pn/).
Claim 3.1.4 Let Abe a property tester for sparseness of point sets with query complexity
q(, n). Then there is a property tester A00 for sparseness of point sets that samples a set S
of q(, n)points uniformly at random and accepts if and only if the points in Sare sparse.
Such a property tester is called canonical.
Proof : By Lemma 2.3.1 we know that there is a property tester A0that samples a set of
q(, n)points uniformly at random and decides based on Sand its internal coin flips. Let
A00 denote an algorithm that samples a set Sof q(, n)points uniformly at random and that
accepts if and only if Sis sparse. We want to prove that A00 is a property tester. Clearly,
algorithm A00 accepts every sparse point set. Thus we have to prove that it rejects every
point set that is -far from being sparse (a point set is -far from sparse if the corresponding
set of unit disks is -far from disjoint). We show this indirectly by proving that if algorithm
A0rejects a sample set Sthen so does algorithm A00. By definition a property tester has
one-sided error and so A0must accept if the sample is sparse. But if the sample Sis not
sparse then A00 rejects and thus it follows that if A0rejects then so does A00. Since A0is a
property tester it rejects every point set Pthat is -far from sparse with probability at least
2/3. Hence algorithm A00 also rejects such a set Pwith probability at least 2/3. Since A00
19
Advertisement
3 Testing Algorithms for Geometric Properties
accepts every sparse point set we have shown that A00 is a property tester for sparseness.
This proves Claim 3.1.4. 2
We construct a set Pof npoints in Rthat is -far from sparse but is not rejected by a
canonical property tester with query complexity q(, n)if nis sufficiently large. Pis the
union of bnc+1pairs Wiof points and a set Uof n2(bnc+1)points such that the
following condition is satisfied for all points p, q Pwith p6=q: If there is a Wiwith
Wi={p, q}then dist(p, q)1. Otherwise, we have dist(p, q)> 1. We observe that
algorithm A00 rejects Pif and only if the sample contains one of the sets Wi. We apply
Lemma 2.4.2 with p=1/2,k=n +1,
s=n1
q1
2(n +1)
=(pn/)
and l=2; thus:
Prj[k] : (WjS)1/2
Since q(, n) = o(pn/)for sufficiently large nwe have
q(, n)n1
q1
2(n +1)
.
This proves that there is a point set that is -far from sparse and that is rejected by algorithm
A00 with probability at most 1/2. Since 1/2 < 2/3 we conclude that A00 is not a property
tester. Contradiction. 2
We also note that the proof of the lower bound for disjointness of unit disks can be
modified to prove lower bounds for other classes of objects including line segments, rect-
angles, etc. This means that we cannot asymptotically improve the query complexity of a
property tester when we use the geometry of the special class of objects.
Summary. We summarize the results from this section in the following theorem:
Theorem 1 There is a property tester for pairwise disjointness of a set of geometric ob-
jects Owith query complexity O(pn/)and running time T(8pn/) + O(1)where
T(m)denotes the time needed to compute if a set of mobjects is disjoint. Every property
tester for disjointness of geometric objects has query complexity (pn/).
3.2 Convex Position
In this section we consider a classical property of point sets: Being in convex position. Let
Pdenote a set of npoints in general position in the Rd. We say that a point set Pis in
convex position, if all points in Pare vertices of the convex hull of P:
20
3.2 Convex Position
Definition 3.2.1 Let Pbe a point set in the Rdand let CH(P)denote the convex hull of P.
A point pPis called an extreme point of P, if pis a vertex of CH(P). A point set Pis in
convex position if each point in Pis an extreme point.
Using our standard representation of point sets we assume that the input point set Pis
represented by a function f: [n]Rd. We also use the standard distance measure from
Definition 2.1.3. In terms of the problem under consideration this can be stated as follows:
Definition 3.2.2 A set Pof npoints is -far from convex position, if no set Qof size (at
most) n exists such that P\Qis in convex position.
In the remainder of this section we prove that the following algorithm is a property
tester for convex position:
'
&
$
%
CONVEXTESTER (P, )
let s=164d+1
pnd/ +2d +2
Choose a set SPof size suniformly at random
if Sis in convex position then accept
else reject
In order to show that CONVEXTESTER is a property tester we have to prove that (1)
it accepts every point set in convex position and (2) it rejects every point set that is -far
from convex position with probability at least 2/3.
We observe that CONVEXTESTER accepts every point set in convex position because
every subset of a set in convex position is in convex position, as well.
Thus we only have to prove that a point set that is -far from convex position is rejected
with high probability. We make use of the following theorem by Carath´
eodory [23]:
Theorem 2 (Carath´
eodory’s Theorem) If PRdand p CH(P)thenp CH{p0, . . . , pd}
for certain affinely independent vectors {p0, . . . , pd}in P.
Now let us assume that Pis -far from convex position. We want to prove that algo-
rithm CONVEXTESTER rejects Pwith probability at least 2/3. It follows from Theorem 2
that every point set that is not in convex position can be rejected by finding a small set of
d+2points that is not in convex position. We refer to such a set as a counter example.
The idea of the analysis is now to consider sets Wi,1in
2(d+1), of d+1points
that can be easily extended to a set of d+2points that is not in convex position. Easily
extendible means that there is a large set Uiof points such that for each pUithe set
Wi{p}is a counter example. Intuitively, the existence of such a large set means that we
get one of the d+2points of the counter example for free because our sample set contains
at least one point of Uiwith very high probability.
21
Advertisement
3 Testing Algorithms for Geometric Properties
p
12
3
p
pp
Figure 3.1: Illustration for the proof of Lemma 3.2.3. One of the cones Ci(this one is
defined by p1and p2).
Finding Counter Examples. We now want to formalize this idea and prove two tech-
nical lemmas. First we prove that for a given point pin the interior of the convex hull of
a point set Pthere exists a set WPof dpoints such that there are at least n
d+1points in
Psuch that any of them can be added to W{p}to obtain a point set that is not in convex
position.
Lemma 3.2.3 Let Pbe a set of npoints in Rdand let pRdbe a point with p
CHint(P)and let P{p}be in general position. Then there exist sets WPand U
P\Wwith |W|=dand |U|n
d+1such that p CHint(W{q})for each qU.
Proof : By Theorem 2 and the general position assumption it follows that there is a set
WPof size d+1such that p CHint(W). Let us denote by W
i,1id+1,
the subsets of Wof size d. We show that one of the subsets W
iof Wsatisfies Lemma
3.2.3. W.l.o.g., let us assume that p= (0,...,0). We now want to partition the Rd
into d+1cones. Let W
i,1id+1, denote the set of points {(−x1,...,xd) :
(x1, . . . , xd)W
i}. The conic combination of the point vectors in the set W
idefines a
cone Ci,1id+1. The union of these cones Cicovers the Rd. Thus there is a cone Cj,
1jd+1that contains at least n
d+1points of P(and also of P\W
jsince W
jCj=).
We further observe that for each qPCjit holds that p CHint(W
j{q}). We
conclude that the sets W=W
jand U=PCjsatisfy the lemma. 2
The second technical lemma shows that if a point set is -far from convex position
then there are many sets Wiand Uisuch that Wi{q}is not in convex position for every
qUi.
Lemma 3.2.4 Let Pbe a set of npoints in the Rdthat is -far from convex position and
let k=n
d+1. Then there exist sets WiP,i[k], and sets UiP,i[k], such that the
following properties are fulfilled:
(1) |Wi|=d+1,i[k],
(2) WiWj=for all i, j [k]with i6=j,
22
3.2 Convex Position
(3) Wi{q}is not in convex position for every qUi, and
(4) |Ui|1
d+1n.
Proof : We construct point sets P1=P, P2, . . . , Pkwith WiPiand Pi+1=Pi\Wi.
Clearly, our construction satisfies that the sets Wiare disjoint. We show how to construct
Wifrom the set Pi: First of all, observe that for every i[k]it holds that |Pi|=n (d+
1)(i1)and thus |Pi|n (d+1)( n
d+11). This implies that |P\Pi|< n and so the
set Picannot be in convex position (by the definition of -far we can delete every set of
size at most n and do not obtain a point set in convex position). Since Piis not in convex
position there is a point p CHint(Pi). We apply Lemma 3.2.3 with pand Pi\ {p}and
obtain the sets Wand U. We choose Wi=W{p}and Ui=U. By Lemma 3.2.3 we get
|Wi|=d+1and Wi{q}is not in convex position for each qUi. We already observed
that |Pi|n (d+1)( n
d+11). Thus we know that |Pi\ {p}| nn which means
that |Ui|nn
d+1=1
d+1n. All properties in Lemma 3.2.4 are satisfied and the lemma is
proven. 2
Lemma 3.2.5 Algorithm CONVEXTESTER is a property tester for the convex position
property. Its query complexity is O(d+1
pnd/).
Proof : We have to prove that CONVEXTESTER accepts every point set in convex position
and rejects every point set that is -far from convex position with probability 2/3.
We already observed that algorithm CONVEXTESTER accepts every point set in convex
position.
Thus we have to show that every point set that is -far from convex position is rejected
with probability at least 2/3. Let Pbe a point set that is -far from convex position and let
k=n
d+1. Then there exist sets Wiand Ui,i[k], with the properties stated in Lemma
3.2.4. Now let SPbe a point set of size 4d+1
pnd/ +2(d+1)chosen uniformly at
random from P. W.l.o.g., let us assume that < 1/2. We view Sas the union of two sets
Wand Uof size 4d+1
pnd/ and 2(d+1), respectively. Clearly, we can assume that each
of them is chosen uniformly (and independently) at random from P. We first observe that
for a fixed Uiof size 1
d+1n:
PrUiU6=11|Ui|
n|U|11/e 1/2
We further know that:
PrPis rejectedPri[k] : WiSand UiS6=
Pri[k] : WiWand UiU6=
Pri[k] : WiW·1/2 .
To derive a bound on Pri[k] : WiWwe apply Lemma 2.4.1 with k=n
d+1,
l=d+1, and
s=4d+1
pnd/ d+1
p(2n)d·(d+1)/ 2n
(2k)1/l
23
Advertisement
3 Testing Algorithms for Geometric Properties
and obtain: Pri[k] : WiW1/4 .
We conclude that PrPis rejected1/8 .
If algorithm CONVEXTESTER chooses a set Sof size 16 ·4d+1
pnd/ +2(d+1)then
PrPis rejected1 (11/8)16 2/3
which proves the query complexity stated in Lemma 3.2.5. 2
A Lower Bound on the Query Complexity. We now want to prove that our prop-
erty tester is optimal with respect to the asymptotic query complexity. The proof follows
the same line of thought as the proof of Lemma 3.1.3. We first show that the existence of
a property tester with query complexity q(, n)implies that there is a canonical property
tester with the same query complexity, i.e. a property tester that samples a set Sof q(, n)
points uniformly at random and accepts, if and only if the sample set Sis in convex po-
sition. Then we show that for sufficiently large nthere is a point set Pthat is -far from
convex position such that
Pr[Sis in convex position ]> 1/3
holds, if SPis a set of q(, n) = o(d+1
pnd/)points chosen uniformly at random
from P. This implies that there is no property tester with query complexity q(, n).
Lemma 3.2.6 Every property tester (with one sided error) for convex position has query
complexity (d+1
pnd/).
Proof : The proof is by contradiction. Assume there is a property tester Afor convex
position with query complexity q(, n) = o(d+1
pnd/). We first want to prove that there
is a canonical tester with the same query complexity:
Claim 3.2.7 Let Abe a property tester for convex position of point sets with query com-
plexity q(, n). Then there is a property tester A00 for convex position of point sets that
samples a set Sof q(, n)points uniformly at random and accepts if and only if the points
in Sare in convex position.
Proof : The proof is similar to the proof of Claim 3.1.4. By Lemma 2.3.1 we know that
there is a property tester A0that samples a set of q(, n)points uniformly at random and
decides based on Sand its internal coin flips. Let A00 be an algorithm that samples a set Sof
q(, n)points uniformly at random and that accepts if and only if Sis in convex position.
We want to prove that A00 is a property tester. Clearly, algorithm A00 accepts every point set
in convex position. Thus we have to prove that it rejects every point set that is -far from
convex position. We show this indirectly by proving that if algorithm A0rejects a sample
set Sthen so does algorithm A00. By definition a property tester has one-sided error and
24
3.2 Convex Position
so A0must accept if the sample is in convex position. But if the sample Sis not in convex
position then A00 rejects. It follows that A00 always rejects when A0rejects. Since A0is
a property tester it rejects every point set Pthat is -far from disjoint with probability at
least 2/3. Hence algorithm A00 also rejects such a set Pwith probability at least 2/3. Since
A00 accepts every point set in convex position we have shown that A00 is a property tester
for convex position. This proves Claim 3.2.7. 2
Let us denote by A00 the property tester as obtained in Claim 3.2.7. In the remainder of
the proof we construct a point set Pof npoints that is -far from convex position and show
that this point set is accepted by A00 (observe that A00 is uniquely defined by q(, n)) with
probability at least 1/2. Since Pis -far from convex position this is a contradiction to the
fact that A00 is a property tester (which would require that Pis rejected with probability at
least 2/3 > 1 1/2 =1/2).
We start with an overview of our construction and then present it in details. The idea is
to construct a point set Pthat consists of n+1sets Wiof size d+1(and some additional
points). Each set Wiconsists of a set Fiof dextreme points that form a facet of the convex
hull and one point qithat is located infinitesimally close to the facet but in the interior of
the convex hull. The point set has the property that every subset SPof size more than
d+1is in convex position, if and only if there is a set WiS.
Figure 3.2: An example for the lower bound construction in 2D.
Now let Sbe a point set of size schosen uniformly at random from P. We apply
Lemma 2.4.2 with p=1/2 and obtain that for
s(nd)·1
2(n +1)
1/(d+1)
=Θ(d+1
pnd/)
our sample set Sdoes not contain one of the sets Wiwith probability at least 1/2. Hence S
is in convex position with probability at least 1/2 and thus algorithm A00 is not a property
tester, if q(, n)(nd)·1
2(n+1
1/(d+1).
TheDetailedConstruction. For a set Xanda pointpin Rdletdist(X, p) = minxX||x
p||. Let P1be a set of nd·(n +1)points in general and convex position and let
< 1/d 1/n. Further let P2={p1, . . . , pk}P1be a subset of size k=n +1.
25
Advertisement
3 Testing Algorithms for Geometric Properties
For each point pP1let hpdenote a hyperplane (a tangent to CH(P1)) such that
hpCH(P1) = p. Further let H+
p,H
pdenote the closed halfspaces induced by hpsuch
that H+
pdenotes the halfspace containing P1(Figure 3.2). Let
p
h
H
pHp
+
p
Figure 3.3: Step 1: We start with a point set in convex position.
β=min
p,qP1,p6=qdist(hp, q).
Observation 3.2.8 Every point set e
P1obtained from P1by perturbating each of the points
in P1by a distance of less than β/2 is in convex position.
Instead of perturbating the point set we replace each point piP2by a set Fi,i[k], of d
points (Figure 3.4). We do this replacement in such a way that we obtain a point set P3in
general and convex position. For a point pRdlet B(p, β/3)denote the d-dimensional
ball with center pand radius β/3. We choose each Fisuch that
qhpB(p, β/3)for each qFi.
P3=P1\P2Fis in general position, where F=Si[k]Fi
Clearly, we can choose the Fisuch that P3is in general position because we can move the
points within B(pi, β/3)hpito destroy every degenerate cases. We observe that P3is in
convex position since hpiis a witness for the fact that the points in Fiare extreme points
in P3. We obtain the set Pfrom P3by adding a set Q={q1, . . . , qk}of kpoints to P3. Let
Wi=Fi{qi}for each qiQ. We want to choose Qin such a way that:
P=P3Qis -far from convex position,
If a subset SPcontains no set Wi,i[k], completely, then Sis in convex
position.
We now give a construction that satisfies these properties. For each i[k]let q0
idenote
a point in the interior of the ((d1)-dimensional) convex hull of Fiand let videnote
a vector pointing from q0
iin the interior of CH(P3)(Figure 3.5). Let qi(β)denote the
point q0
i+β·vi. For i[k]and pFilet hFi,p(β)denote the hyperplane induced by
Fi\ {p}{qi(β)}. Further let H+
Fi,p(β)and H
Fi,p(β)denote the halfspaces induced by
26
3.2 Convex Position
p
hp
h
p
Figure 3.4: Step 2: We replace some points by a set Fiof d=2points.
hFi,p(β)such that H+
Fi,p(β)contains Pfor β=0. Since P3is in general position there is
a value βsuch that for every i[k]and every pFithe halfspace H+
Fi,p(β)contains
P3\ {p}and such that P3Qis in general position (where Qis defined as follows). We
choose qi=qi(β)and Q=Si[k]qi. We now have to prove that Qhas the required
properties:
Claim 3.2.9 The set P=P3Qis -far from convex position. If a subset SPdoes not
contain a set Wicompletely, then Sis in convex position.
Proof : Assume Pis -close to convex position. Then there is a set Rof at most n
points such that P\Ris in convex position. Since |R|n there is a set Wicompletely
contained in P\R. Further we know that by the size of Pand Rthere must be at least
one further point qin P\R. By the choice of Qwe know that qis in H+
Fi,p(β)for each
pFi. Hence qiis in the interior of CH(Fi{q})which contradict the fact that P\Ris
in convex position.
It remains to prove the second statement. Let SPa subset that contains no set Wi
completely. W.l.o.g., we assume that it contains exactly dpoints of each set Wi. If these
points are the points in Fithen by our construction the hyperplane hpiis a witness that the
points in Wi=Fiare extreme. Otherwise, one of the points in Fiis not in S. Let is call the
missing point p. Then hFi,p(β)is a witness that the points are extreme points. Thus for
every point pSwe have a witness that pis extreme. Hence Sis in convex position. 2
p
hp
h
q’
i
vii
q
Figure 3.5: Step 3: We place another point in the interior of CH(Fi)and move it slightly
along viinto the interior of the convex hull.
27
Advertisement
3 Testing Algorithms for Geometric Properties
Now we can apply Lemma 2.4.2 with p=1/2 and obtain that for
s(nd)·1
2(n +1)
1/(d+1)
=(d+1
pn/)
a sample set Sof size schosen uniformly at random from Pdoes not contain one of the
Wiwith probability at least 1/2. Since q(, n) = o(d+1
pnd/ we have for sufficiently
large n
q(, n)(nd)·1
2(n +1)
1/(d+1)
.
It follows that there exists a point set Pthat is -far from convex position and that is ac-
cepted by algorithm A00 with probability at least 1/2. Since the existence of a property
tester for convex position with query complexity q(, n)implies that algorithm A00 is a
property tester, it followsthat thereis no propertytester withquery complexity o(d+1
pnd/).
2
Summary. We summarize the results of this section in the following theorem:
Theorem 3 Let Pbe a point set of npoints in the Rd. Then there is a property tester
for the convex position property with query complexity O(d+1
pnd/)and running time
O(T(d+1
pnd/)) where T(n) = O(n·logO(1)h+ (nh)bd/2c
bd/2c+1·logO(1)n)is the running
time of the fastest known algorithm [24] to decide if a point set is in convex position (h
denotes the number of extreme points of the set). Every property tester for convex position
has a query complexity of (d+1
pnd/).
Proof : Follows from Lemma 3.2.5 and Lemma 3.2.6. 2
3.3 Euclidean Minimum Spanning Tree
We are given a point set Pin the R2and a geometric graph G= (P, E)with vertex set
Pand edge set Eand we want to design a property tester for the property that Gis a
Euclidean Minimum Spanning Tree (EMST) of the point set P. A graph G= (P, E)is
called a Euclidean minimum spanning tree of point set P, if Gis a minimum spanning
tree of the complete Euclidean graph of P. The complete Euclidean graph is a complete
weighted graph where each edge e= (p, q)P×Phas weight equal to the Euclidean
distance between pand q.
In this section we make slightly different assumptions on the input representation. We
assume that our property tester has oracle access to point set Pand graph Geach repre-
sented by a separate function. The point set is represented in a similar way as in the pre-
vious subsection. The graph is represented in the unbounded length adjacency list model
[67]. Details on the input representation follow in Subsection 3.3.1.
For simplicity we assume throughout this section that Pis in general position, i.e., all
edge weights are distinct and hence the EMST is unique and its maximum degree is five.
28
3.3 Euclidean Minimum Spanning Tree
3.3.1 Basic Definitions and Input Representation
Let Pdenote a set of npoints in general position in R2(we consider only the problem in
two dimensional space) and let T= (P, E)denote the Euclidean minimum spanning tree
of P. We start with some basic definitions needed in this section before we discuss the
input representation:
Definition 3.3.1 Ageometric graph for Pis a weighted graph G= (P, E)with vertex set P
and edge set EP×P(the edges can be interpreted as straight-line segments connecting
the endpoints). The weight of an edge (p, q)is implicitly given by the Euclidean distance
between pand qin R2.
Definition 3.3.2 A geometric graph for Pthat is the minimum spanning tree of the com-
plete geometric graph for Pis called the Euclidean minimum spanning tree (EMST) of
P.
In this section we do not use the standard distance measure (though the differences are
insignificant), because the input object consists of two parts, the point set and the graph.
Further we use the unbounded length adjacency list model which uses a non-functional
representation of the graph. Our results also hold in the (weaker) bounded degree adja-
cency list model.
Typically, a distance measure for graph properties depends on the number of entries
in the graph representation that must be changed to obtain a graph that has the tested
property. In our case, the size of the graph representation depends on the number of edges
in the graph. Since every EMST has n1edges we let our distance measure depend on
the number of vertices instead of the number of edges:
Definition 3.3.3 Let G= (P, E)be a geometric graph for Pand let T= (P, E)denote
the Euclidean minimum spanning tree of P. We say Gis -far from being the Euclidean
minimum spanning tree of P(or, in short, -far from EMST), if one has to modify (insert
or delete) more than n edges in Gto obtain T, that is :
|E\E|+|E\E|> n .
We want to design a property tester for the property of being a Euclidean minimum
spanning tree of a point set in the plane. Such a property tester takes as input a point set P
in general position in R2and a geometric graph Gfor P, and accepts the input if Gis the
EMST for Pand rejects the input with probability at least 2
3if Gis -far from being the
Euclidean minimum spanning tree of P.
Input Representation. Our property tester has to verify whether a geometric graph is
the EMST of a point set P. Hence, it must have access to the graph and the point set. Both
the point set and the graph are given as an oracle. Similar to the previous section the point
set is represented by a function f: [n]R2. Thus the algorithm may query the oracle for
the value of f(i)for some i[n]and gets in return the position of the i-th point of P.
29
Advertisement
3 Testing Algorithms for Geometric Properties
The geometric graph is given in the unbounded length adjacency list representation
introduced in [67]. Let us briefly recall the description of the model as given in Chapter
2: The unbounded length adjacency list model is a general model for sparse graphs. The
graph structure is represented by adjacency lists of varying length. Our property tester may
query for the degree deg(p)of a vertex pand for each ideg(p)it may query for the
i-th neighbor of p. We represent the vertex set of our graph by the set of numbers [n].
Thus we can easily obtain the position of a vertex pfrom the point set representation by
querying for the value of f(p).
BasicPropertiesof EMSTs. The followingclaim statessome basic(and well-known)
properties about Euclidean minimum spanning trees:
Claim 3.3.4 Every Euclidean minimum spanning tree of a point set in general position
in R2has maximum degree less than or equal to five, is connected, and its straight-line
embedding is crossing-free.
Proof : The EMST is connected since it is a spanning tree of a complete graph. Its
embedding is crossing-free because it is a subgraph of the Delaunay triangulation; see,
e.g., [61]. Assume the EMST has a vertex with degree 6 or larger. Then there are two
edges with an angle less than 60 degree since the point set is in general position. But this
means that at most one of these edges can be in the EMST. Contradiction. 2
Now we want to introduce some additional notation that will be useful to simplify the
description of the algorithm and its analysis. Let Gdenote a geometric graph for P. The
EMST-completion of Gis a geometric graph G0over the same point set that contains all
edges from Gand all edges of the EMST (that are missing in G).
Definition 3.3.5 For a given point set P, a geometric graph G= (P, E), and the Euclidean
minimum spanning tree T= (P, E)of P, the EMST-completion of Gis the geometric graph
GC= (P, E E)that contains all edges that are either in Gor in T.
In the next subsection we present a property tester for EMST that works for a special
class of input graphs which we call well-shaped. A well-shaped graph is a connected graph
with maximum degree 5 that has crossing-free EMST-completion. The restriction to well-
shaped graphs simplifies the analysis of the algorithm and it allows a clear view on the
important features of the property tester.
Definition 3.3.6 Let Gbe a geometric graph for point set P. We call Gwell-shaped, if
it has maximum degree of 5,
it is connected,
and the straight-line embedding of its EMST-completion is crossing-free.
30
3.3 Euclidean Minimum Spanning Tree
Later in this section we present a general testing algorithm for EMST. This algorithm first
tests if the input graph is far from a well-shaped graph. If this is the case then we can
reject the graph by Claim 3.3.4. If the input graph passes the test, then we know that with
good probability it is either close to a well-shaped graph or it is well-shaped. If the graph
is well-shaped we can use the testing algorithm for the special case of well-shaped graphs.
At the end of this section we show that for this algorithm it does not matter if the graph is
well-shaped or close to well-shaped. Hence, we can also use the algorithm if the graph is
close to a well-shaped graph.
3.3.2 Testing EMSTs in Well-Shaped Graphs
We now design a property tester for EMST for the case that our input graph is well-shaped.
First, we give an overview of the algorithm:
Let Gdenote a well-shaped geometric graph with vertex set P. Our property tester uses
the following procedure: We first pick a sample set SPusing some randomized scheme
to be described later. Next, we find the subgraph GSof Gthat is induced by the vertex set
S. Then we compute the EMST-completion of GS. If the EMST-completion has a cycle
then we reject the input, otherwise we accept. Computing the EMST-completion can be
done in time O(|S|·log|S|). We just have to compute the EMST of Sand insert the missing
edges in GS. The query complexity of the tester is O(|S|).
We first show that we can always reject the input graph, if the EMST-completion of GS
contains a cycle. We use the following lemma which follows easily from standard theory
of minimum spanning trees (see, e.g., [77, Chapter 6]). It implies that if we sample a set
of vertices SPthen an edge ecannot be in the EMST of Pif it is not in the EMST of S.
Lemma 3.3.7 Let SPbe a subset of Pand let p, q S. If the edge e= (p, q)does not
belong to the EMST of S, then edoes not belong to the EMST of P.
Proof : Our proof is by contradiction. Let us suppose that edoes not belong to the EMST
of Sand ebelongs to the EMST Tof P. The removal of ein Tcuts Tinto two trees. These
two trees induce a partition of Pinto two subsets P1and P2. Since ebelongs to the EMST
of P,emust also be the shortest edge between these two subsets. Let S1=P1Sand
S2=P2S.P1and P2are not empty since one vertex of eis in each of the sets. Then
eis also the shortest edge between S1and S2and therefore it belongs to the EMST of S;
contradiction. 2
Lemma 3.3.7 is of major importance for the way we proceed. Therefore we state its
consequences in Corollary 3.3.8 below. First of all, it shows that our property tester always
accepts the EMST of P(it rejects only, if there is a cycle in the EMST-completion and such
a cycle provides a counter example for the EMST property). Further it can be used to show
that we can reject the input graph, if all vertices of a cycle in the EMST-completion of G
are contained in S:
Corollary 3.3.8 Let Gbe a geometric graph for P. Let SPbe a subset of Pand let GS
be the subgraph induced by S.
31
Advertisement
3 Testing Algorithms for Geometric Properties
If the EMST-completion G0= (P, E0)of Gcontains a cycle C= (p0, . . . , pk)2of
length k3and piSfor all 0ik, then there is a cycle in the EMST-
completion of GS.
If the EMST-completion of GScontains a cycle C= (p0, . . . , pk)of length k3,
then Gis not the EMST of P.
Proof : We have to prove that Cis a cycle in the EMST-completion of GS, i.e., that every
edge (pi1, pi+1)for all i[k]is in the EMST-completion G0
Sof GS. Every such edge
is either an edge of the EMST or an edge of the input graph. Obviously, the edges of the
input graph are in G0
Sand by Lemma 3.3.8 all EMST edges are also in G0
S.
Now we prove the second part: Since the EMST-completion of GScontains a cycle
there is an edge ein GSthat does not belong to the EMST of S. It follows by Lemma 3.3.7
that eis not in the EMST of P. Hence Gis not the EMST of P.2
Now we consider the case when the input graph G= (P, E)is -far from EMST. Our
goal is to design a randomized sampling scheme such that the EMST-completion of the
subgraph induced by the sample set contains a cycle with high probability. Let T= (P, E)
denote the EMST of Pand let GC= (P, EE)denote the EMST-completion of G. In the
following we refer with red edges to the edges in E\Eand with blue edges to the edges
in E. It turns out that it is sufficient to focus on “short” cycles that contain at most two red
edges:
Definition 3.3.9 Let Gbe a geometric graph for Pand let Cbe a cycle of length kin the
EMST-completion of G. We call C -short if (1) its length kis smaller than or equal to 72
and (2) it contains at most two red edges.
In our algorithm we try to find -short cycles that satisfy some additional “topological”
properties. We want to exploit the fact that Gis well-shaped, in particular, that the EMST-
completion of Ghas a crossing-free straight-line embedding. Hence we use a topological-
like representation of the geometric graph Gto exploit the fact that every minimal cycle
in a (well-shaped) planar geometric graph corresponds to a face in its straight-line embed-
ding. In order to use this approach in a formal framework we will consider the geometric
graph Gnot only as an undirected graph, but at the same time also using its “directed” rep-
resentation by “replacing” each undirected edge (p, q)by two directed edges [p, qiand
[q, pi.
For every vertex pin Gwe (cyclically) sort incident outgoing edges in clockwise order
around the vertex pwith respect to the Euclidean positions of the edges’ endpoints. (This
sorting is done only implicitly, but since we assume that each vertex has a constant degree,
each time we consider a vertex we can in constant time sort its incident edges.) The
successor of a directed edge [p, qiis the edge [q, riwhere ris the vertex adjacent to q
that precedes pin the adjacency list of q(sorted in clockwise order around q). With this
definition cycles of succeeding edges correspond to faces of the straight-line embedding.
2Here, C= (p0,...,pk)is a cycle (of length k) if piPfor all i[k],p0=pk,(pi1, pi)E0for all
i[k]and pi6=pjfor all i, j [k],i6=j.
32
3.3 Euclidean Minimum Spanning Tree
Figure 3.6: A straight-line embedding and its planar map representation.
Such a representation of Gis called a planar map for the straight-line embedding of G(see
figure 3.6). We denote it e
G.
Definition 3.3.10 Let Gbe a well-shaped geometric graph for Pand let e
Gbe the corre-
sponding planar map. For each edge e= [p, qiin e
Gthe successor of eis the edge [q, ri
in e
Gwith rbeing the vertex adjacent to qthat precedes pin the adjacency list of q(sorted
in clockwise order around q); if qhas degree one, then r=p.
For edge e= [p, qiin Gthe kth successor,k0, is defined recursively as follows:
the 0th successor of [p, qiis [p, qiitself, and for k > 0, the kth successor of [p, qiis the
successor of the (k1)st successor of [p, qi.
Similarly, edge eis the predecessor of edge e0if edge e0is the successor of edge e, and
eis the kth predecessor of e0if e0is the kth successor of e.
We have introduced the planar map representation of a graph Gbecause it describes the
faces of the corresponding embedding in a simple way (using succeeding edges in e
G). We
observe that the correspondence between the faces in the embedding of Gand the cycles
of succeeding edges in e
Gis one to one. We also note that each (directed) edge is contained
in exactly one cycle of succeeding edges.
Definition 3.3.11 Let Gbe a well-shaped geometric graph for Pand let C= (p0, . . . , pk)
be a cycle in the planar map of the EMST-completion of G. Then Cis called topological,
if for every two consecutive edges on the cycle [pi, pi+1iand [pi+1, pi+2i,[pi+1, pi+2iis
the successor of [pi, pi+1i. We also call the corresponding cycle in Gtopological.
The following key lemma shows that every well-shaped geometric graph that is far
from EMST must contain many short topological cycles in its EMST-completion.
Lemma 3.3.12 Let G= (P, E)be a well-shaped geometric graph for P. If Gis -far from
EMST, then there are at least n
100 -short topological cycles in the EMST-completion of G.
Proof : Let T= (P, E)denote the EMST of Pand let EB=E\Edenote the blue edges of
the EMST-completion of Gthat are not in E. Further let ER=E\Edenote the red edges
33
Advertisement
3 Testing Algorithms for Geometric Properties
of the EMST-completion of G. Since Gis -far from EMST we have |EB|+|ER|> n by
definition of -far.
Now let Hdenote the EMST-completion of Gand let e
Hdenote the planar map of
its straight-line embedding. Let us denote by ρthe number of faces in the straight-line
embedding of H. Hence e
Hhas ρ(disjoint) topological cycles since each face is bounded
by a unique cycle of succeeding edges (which by definition is called topological). Since
His planar and connected and has more than n1+n/2 edges we can apply Euler’s
formula to deduce that ρn/2.
Now let s(f)denote the number of (directed) edges in the topological cycle bounding
face f. Since Pfs(f)6n by Euler’s formula there can be at most ρ
4faces fwith
s(f)48
. Thus
4faces fhave s(f)<48
.
Since |ER|ρthe number of directed red edges is at most . Hence the number of
topological cycles with 3 or more red edges can be at most
3. Since we have shown that
there are at least
4topological cycles having less than 48
edges, at least ρ
12 n
100 of them
have at most two red edges. 2
Definition 3.3.13 Let Gbe a well-shaped geometric graph for P. For every vertex pP,
we define its topological k-neighborhood as the set of vertices that are the endpoints of the
edges that are either the ith successor, 0ik, of any edge incident to p, or are the jth
predecessor, 0jk, of any edge incident to p. The topological k-neighborhood of a
vertex pis denoted Ntop
G(p, k).
The following claim follows easily from the fact that our input graph has maximum
degree of 5.
Claim 3.3.14 Let Gbe a well-shaped geometric graph for P. For every vertex pP, we
can find its topological k-neighborhood in time O(k).2
Now, provided that Gis -far from EMST (but well-shaped), our first approach is to
sample uniformly at random a sufficiently large set Qof points in P. Then we add for
every point in Qits topological 72
-neighborhood to the sample set. Provided that the set
Qis sufficiently large, we prove in Lemma 3.3.18 that if Gis -far from EMST, then
the so obtained set will contain all vertices from a certain -short topological cycle in the
EMST-completion of Gwith probability at least 2/3. Therefore, this will certify that G
is not an EMST by Corollary 3.3.8. In Lemma 3.3.24, we tune the sketched algorithm to
slightly improve the complexity bound.
3.3.3 A Simple Property Tester in Well-Shaped Graphs
Now we discuss our first property tester for EMST for well-shaped input graphs. We follow
the approach sketched in the beginning of this subsection. According to that description
the key issue is to describe the algorithm that finds the sample set Sand to prove that
with probability at least 2
3the input is rejected if Gis -far from EMST. In particular, by
our discussion above we know from Lemma 3.3.12 that if a well-shaped graph Gis -far
34
3.3 Euclidean Minimum Spanning Tree
from EMST, then there are many -short topological cycles in the EMST-completion of G.
Further, by Corollary 3.3.8, in order to reject Git is sufficient to prove that with probability
at least 2
3the sample Scontains all vertices from a certain -short topological cycle in the
EMST-completion of G. We provide a precise description of this algorithm and its analysis
below. We assume that Gis well-shaped. Every -short topological cycle either
1. is a cycle consisting of at most 72
blue edges, or
2. is a path consisting of at most 72
blue edges having the endpoints connected by a red
edge, or
3. is a path consisting of at most 72
blue edges having the endpoints connected by a
path of two red edges, or
4. consists of two paths containing at most 72
blue edges that are connected to each
other by two red edges.
Our first observation is that if there are many -short topological cycles of type (1) or (2),
then we can easily spot them. Indeed, if there are at least n
200 -short topological cycles of
type (1) or (2) in the EMST-completion of G, then it is enough to take a random subset Q
of Pof size Θ(1/)to ensure that at least one vertex from any -short topological cycle
will be in Qwith probability at least 2
3.
Lemma 3.3.15 Let G= (P, E)be awell-shaped geometricgraph. Ifthe EMST-completion
of Gcontains at least n
200 -short topological cycles of type (1) or (2), then a set QP
of size 4000/ chosen uniformly at random contains at least one vertex from any -short
topological cycle.
Proof : Since Gis well-shaped it has a maximum degree of 5and so the EMST-
completion of Ghas a maximum degree of 10. We conclude that every vertex pP
is contained in at most 10 -short topological cycles. Furthermore, the set PCof all ver-
tices that are contained in at least one -short topological cycle (of type (1) or (2)) has
cardinality at least n
2000 . Now let QPdenote a set of size 4000
taken uniformly at
random from P. Then
PrQPC=(1|PC|
n)|Q|(1
2000)|Q|1/3 .
Therefore, PrQPC6=2/3 .
2
Now, we explore the key feature of -short topological cycles of type (1) or (2): For
every vertex vfrom such a cycle all other vertices from the cycle belong to the topological
72
-neighborhood of v. This motivates us to define the sample set Sas the topological 72
-
neighborhood of all vertices in Q. Since the set Qcontains at least one vertex from any
-short topological cycle of type (1) or (2) with probability at least 2/3, we can conclude
that Scontains all vertices from a particular -short topological cycle of type (1) or (2)
35
Advertisement
3 Testing Algorithms for Geometric Properties
with probability at least 2
3. Since every vertex of the topological cycle is contained in our
sample set we know by Corollary 3.3.8 that the EMST-completion of the subgraph induced
by our sample contains a cycle. Thus our property tester rejects the input with probability
2
3if it is -far from EMST.
It is easy to see that the procedure for cycles of type (1) and (2) described above has a
query complexity of O(|Q|
) = O(1
2).
Lemma 3.3.16 Let G= (P, E)be a well-shaped geometric graph and let QPbe a set
of size 4000/ chosen uniformly at random from P. If the EMST-completion of Gcontains
at least n
200 -short topological cycles of type (1) or (2), then the set
S=[
pQNtop
G(p, 72
)
contains all vertices of at least one -short topological cycle with probability at least 2/3.
Proof : Follows from Lemma 3.3.15 and the observation that a cycle of type (1) and (2)
is completely contained in S, if one of its vertices is contained in Q.2
The -short topological cycles of type (3) and (4) are a little bit more difficult to detect.
However, we can still use a very similar approach as for cycles of type (1) or (2), but
this time we must find two vertices that belong to the same -short topological cycle.
Suppose that there are at least n
200 -short topological cycles of type (3) or (4) in the
EMST-completion of G. As before, we first take a random subset Qof P, but this time
the size of Qis Θ(pn/). Then, we define the sample set Sto be the union of the
topological 72
-neighborhood of all vertices in Q. We show now that the so defined sample
set is sufficient to certify that G(if it is -far from EMST) is not an EMST by proving an
analogous statement to Lemma 3.3.16 for cycles of type (3) and (4):
Lemma 3.3.17 Let G= (P, E)be a well-shaped geometric graph and let QPbe a
set of size 80pn/ chosen uniformly at random from P. If the EMST-completion of G
contains at least n
200 -short topological cycles of type (3) or (4), then the set
S=[
pQNtop
G(p, 72
)
contains all vertices of at least one -short topological cycle with probability at least 2/3.
Proof : For every -short topological cycle Cof type (3) let us define the set WCto
contain two vertices: one vertex on the blue path in Cand the vertex incident to the two
red edges in C. Similarly, for every -short topological cycle Cof type (4) let us define the
set WCto contain one vertex from the first blue path in Cand one vertex from the second
blue path in C.
Since each vertex pPbelongs to at most 10 -short topological cycles we can select
from the sets WCthe sets Wi,1in
2000 , such that the sets Wiare disjoint and for each
36
3.3 Euclidean Minimum Spanning Tree
i,1in
2000 , there is an -short topological cycle Cwith Wi=WC. Then we apply
Lemma 2.4.1 with k=n
2000 ,l=2, and s=20n
n =20pn/ and obtain
Prj[k] : (WjQ)1 (11/4)42/3 .
Now observe that all vertices of a cycle Care in S, if WCQ. Therefore, the lemma
follows. 2
We prove that the following algorithm is a property tester for EMST:
'
&
$
%
EMST-TEST-SIMPLE(G, )
s=80pn/ +4000/
choose a set QPof size suniformly at random
S=SqQNtop
G(q, 72
)
compute the subgraph GSinduced by S
compute the EMST-completion GCof GS
if GCcontains a cycle then reject
else accept
Lemma 3.3.18 Let Gbe a well-shaped geometric graph for P. Then there is a property
tester that in time Opn/3·log(n/)and with query complexity O(pn/3)accepts
the input if Gis an EMST of Pand rejects the input with probability at least 2
3, if Gis -far
from EMST.
Proof : We claim the algorithm EMST-TEST-SIMPLE is a property tester for EMST. By
Corollary 3.3.8 we know that EMST-TEST-SIMPLE accepts, if the input graph G= (P, E)
is the EMST.
Now let us consider the case when Gis -far from EMST. Then by Lemma 3.3.12 we
know that there are n/100 -short topological cycles in the EMST completion of G. It
follows that there are n/200 cycles of type (1) and (2) or n/200 cycles of type (3) or
(4). By Lemma 3.3.16 and 3.3.17 we know that the sample taken by EMST-TEST-SIMPLE
contains an -short topological cycle with probability at least 2/3. By Corollary 3.3.8 we
know that then there is a cycle in the EMST-completion of the subgraph induced by our
sample. Hence the algorithm rejects in such a case.
The query complexity of the algorithm is immediate. Its running time follows from
Claim 3.3.14 and the fact that the EMST completion of a graph with mvertices can be
computed in time O(mlogm).2
3.3.4 An Improved Property Tester in Well-Shaped Graphs
Now we slightly improve the complexity of the property tester described in Lemma 3.3.18.
In our analysis above we were always trying to catch one initially fixed single vertex from
each blue path although an -short topological cycle can contain as many as 72
vertices.
37
Advertisement
3 Testing Algorithms for Geometric Properties
We now want to take the length of the topological cycles into consideration. Further, we
were always taking topological 72
-neighborhoods of all vertices. This strategy should be
applied to the cycles that haveas many as 72
edges, but it is not necessary for shorter cycles.
Our approach now is to improve the complexity of the property tester by combining these
two observations. We want to show that the following algorithm is a property tester for
EMST, if the input graph is well-shaped:
'
&
$
%
EMSTTEST(G, )
s=1700pn/ +192000/ +4000/
S=FINDCYCLE(G, s, )
compute the subgraph GSinduced by S
compute the EMST-completion GCof GS
if GCcontains a cycle then reject
else accept
Where the procedure FINDCYCLE is the following:
'
&
$
%
FINDCYCLE(G, s, )
S(0)=
for i=1to 2s do
j=0
pick a vertex p(i)Puniformly at random
while jlog 72
do
j=j+1
flip a coin
if head then exit
S(i)=S(i1)Ntop
G(p(i), 2j)
return S(2s)
First of all, we observe that algorithm EMSTTEST accepts every Euclidean minimum
spanning tree by Corollary 3.3.8.
Thus we have to prove that the algorithm rejects, if the input graph is -far from EMST.
Let us assume that Gis well-shaped and -far from EMST. Then, by Lemma 3.3.12, there
are at least n
100 -short topological cycles in the EMST-completion of G. Let Cj,j=
1, 2, 3, 4, denote the set of all -short topological cycles of type (j) in the EMST-completion
of G. Now we consider separately cycles in C1C2and cycles in C3C4. By our discussion
above we have either |C1C2| n
200 or |C3C4|n
200 .
Cycles of type (1) and (2). Suppose that Gis a geometric graph for Pwith maxi-
mum degree 5and there are at least εn -short topological cycles of type (1) or (2) in the
EMST-completion of Gfor ε=
200 . We first consider the probability that a fixed -short
topological cycle CC1C2is contained in the sample set. Let `denote the number
of vertices in cycle C. Then the probability that in round iof the FINDCYCLE procedure
38
3.3 Euclidean Minimum Spanning Tree
vertex p(i)is one of the `vertices of cycle Cis `
n. Further the probability that the topolog-
ical neighborhood of p(i)is chosen large enough to contain all vertices of Cis at least 1
2` .
Overall, for a fixed cycle Cthe probability that a vertex of Cis chosen in round iand that
the topological neighborhood of the vertex is large enough is at least 1
2n . If the cycles are
vertex disjoint then it is simple to prove that after O(1
)rounds at least one cycle is com-
pletely contained in the sample set with constant probability. Unfortunately, in the general
case the cycles are not vertex disjoint. To overcome this technical problem we use the
planar map representation of Gand the following trick for the analysis: Instead of taking
the whole topological 2j-neighborhood of vertex p(i)we assume that our algorithm selects
only one of the outgoing edges (in its planar map representation) uniformly at random.
Then it includes only the 2jsuccessors and predecessors of the chosen edge in the planar
map representation of G. Clearly, this procedure considers only a subset of the vertices
considered in the original procedure. Nevertheless, we can show that the set of vertices
we pick using this procedure is still sufficiently large. We can now use the fact that the
(directed) cycles are edge disjoint. Assume that we pick a vertex that belongs to a cycle C.
Provided that the chosen neighborhood is large enough we still have to choose the correct
outgoing edge to have all vertices of Cin the sample set. Since our graph has a degree
bound of 5 the probability that this edge is chosen is at least 1/5. Since type (2) cycles
consist of a path of `1(directed) blue edges the probability that p(i)is one of the `1
origins of these edges is `1
n`
2n (a directed edge points from its origin to its destina-
tion). Hence the probability that a cycle Cof type (1) or (2) is completely contained in the
sample set is at least 1
20n . We know that the cycles are disjoint and so the probability that at
least one cycle is completely contained in the sample set in round iis at least εn
20n =ε
20 . Let
XCdenote the indicator random variable for the event that cycle CC1C2is completely
contained in the sample set. Then we have for s20/ε =4000/:
PrCC1C2:XC=01ε
202s 1/3
and hence PrCC1C2:XC=12/3
and so we have just proved:
Lemma 3.3.19 Let Gbe a geometric graph for Pwith maximum degree 5that has at least
εn =n
200 topological -short cycles of type (1) or (2). If algorithm FINDCYCLE(G, s, )
is invoked with s4000/ then the set S(2s)returned by the algorithm contains an -short
topological cycle with probability at least 2/3.2
Cycles of type (3) and (4). Let Gbe a geometric graph with maximum degree 5. Let
us further assume that there are at least εn topological -short cycles of type (3) and (4) in
the EMST-completion of G, for ε=
200 . We want to show that the sample set computed
by algorithm FINDCYCLE contains every vertex of at least one -short topological cycle
with good probability.
Recall that cycles of type (4) consist of 2 paths of blue edges connected by two red
edges. Cycles of type (3) are a special case of type (4) cycles: The shorter path has length
39
Advertisement
3 Testing Algorithms for Geometric Properties
0. For each cycle CC3C4let X(i)
Cdenote the indicator random variable for the event
that all vertices of the longer (blue) path of cycle Care in S(i). Let Y(i)
Cdenote the indicator
random variable for the event that all vertices of the shorter (blue) path of cycle Care in
S(i). Further let (i+1)denote the indicator random variable for the event that there is a
cycle C0C3C4with X(i)
C0=0and X(i+1)
C0=1. We say that a cycle CC3C4is
half-contained in S(i), if X(i)
C=1. Cycle Cis contained in S(i), if X(i)
C=1and Y(i)
C=1.
We analyze the algorithm in two steps. We first show that with high probability many
(at least εs/80) topological -short cycles are half-contained in the set S(s). Then we show
that the set S(2s)contains at least one cycle CC3C4with high probability.
Claim 3.3.20 Let the outcome of the random choices in round 1to iof the for-loop of
FINDCYCLE be fixed. Then it holds that, if
X
CC3C4
X(i)
C<εs
2(3.1)
then
Pr(i+1)=1ε/40 . (3.2)
Proof : Let us assume that Equation (3.1) is true. Then we observe that:
X
CC3C4
X(i)
C<εs
2εn
2
since sn. We conclude that we have more than εn/2 cycles in C3C4that are not
half-contained in S(i). If p(i+1)is one of the vertices of the longer path of one of these
cycles and if the topological neighborhood included in FINDCYCLE is large enough then
we have (i+1)=1. To estimate the probability for (i+1)=1we apply the same trick as
in the analysis for the case of type (1) and (2) cycles. This yields immediately (observing
the fact that we have εn/2 cycles instead of εn):
Pr(i+1)=11
2` ·`
2n ·1
5·εn
2=ε/40
.2
Our next goal is to show that there are at least εs/80 cycles that are half-contained in
S(s).
Claim 3.3.21
PrX
CC3C4
X(s)
Cεs/80eεs/300 .
40
3.3 Euclidean Minimum Spanning Tree
Proof :
PrX
CC4C4
X(s)
Cεs
80
PrX
1is
(i)εs
80
PrX
1is
B(i)εs
80
PrX
1is
B(i)(11
2)·εs
40
where B(i)are independent 01variables with Pr[B(i)=1] = ε/40. The latter inequality
follows from Claim 3.3.20. We apply a Chernoff bound [56, Inequality (7)] and it follows
that
e−( 1
2)2εs
40·2=eεs/320 .
2
Let W(i+1)denote the indicator random variable for the event that there exists C
C3C4with X(i)
C=1and Y(i)
C=0and Y(i+1)
C=1.
Claim 3.3.22 Let the outcome of the random choices in round 1to iof the procedure
FINDCYCLE be fixed. If X
CC3C4
X(i)
C> εs/80
then PrW(i+1)εs
1600 ·n.
Proof :
We assume that there are more than εs/80 cycles that are half-contained in S(i). Again
we use essentially the same trick as in the case of type (1) and (2) cycles. We observe that
there is a little problem with cycles of type (3). Since the length of the shorter path is 0there
is no directed edge in this path. Thus we have to modify our approach slightly. We use
the following sampling scheme for the analysis: Instead of taking the whole topological
2j-neighborhood of p(i)we choose a number kbetween 1and 6uniformly distributed. If k
is between 1and 5we include the 2jpredecessors and successors of the k-th edge incident
to p(i). In the case k=6we only include the vertex p(i)in the sample. Then we get that
the probability that a cycle Cis contained in the sample is at least 1
2` ·`
2n ·1
6=1
24n . We
have more than εs/80 cycles that are half-contained in S(i). Therefore we obtain that:
PrW(i+1)εs
1920 ·n.
2
41
Advertisement
3 Testing Algorithms for Geometric Properties
Lemma 3.3.23 Let Gbe a geometric graph for Pwith maximum degree 5that has at
least εn =n
200 topological -short cycles of type (3) or (4). Then FINDCYCLE is an
algorithm with (expected) query complexity O(pn/ log(n/)) that samples a set SP,
|S|1700pn/ +192000/, such that the EMST-completion of the subgraph GS(2s)
induced by S(2s)has an -short topological cycle.
Proof : Let ε=/200 and let Gbe a geometric graph for Pwith maximum degree 5
that has at least εn topological -short cycles of type (3) or (4).
PrThere is a cycle in the EMST-completion of the subgraph induced by S(2s)
1Pr1
sX
CC3C4
X(s)
Cε
80+1εs
1920ns
by Claim 3.3.22. Choosing s1700pn/ +192000/ it follows together with Claim
3.3.21 for n4:
1 (e2+e5)4
5.
2
Obtaining Deterministic Query Complexity. It is easy to modify the algorithm
such that the query complexity has an upper bound of O(pn/ log(n/)) by insignifi-
cantly increasing the error of the algorithm. We can do this in the following way: We run
algorithm EMSTTEST and stop, if it either accepts or rejects or if the sample size becomes
too large. Let XSdenote the random variable for the size of S(2s). We stop the algorithm
and accept the input, if we find out that the size of S(2s)becomes larger than 10 ·E[XS]. By
Markov inequality we have:
PrXS10 E[XS]1/10 .
Hence it follows that our new algorithm rejects a geometric graph that is -far from EMST
with probability 4/5 1/10 2/3. Thus it is a property tester with a deterministic bound
of O(log(n/)·pn/)on the query complexity of our algorithm (rather than expected
query complexity).
Lemma 3.3.24 Let Gbe a well-shaped geometric graph for P. Then there is a property
testerthat in time O(log2(n/)·pn/)and withquery complexityof O(log(n/)·pn/)
accepts the input G, if Gis an EMST of Pand that rejects the input with probability at least
2
3if Gis -far from EMST.
Proof : Follows from Lemma 3.3.12, Lemma 3.3.19 and Lemma 3.3.23. 2
42
3.3 Euclidean Minimum Spanning Tree
3.3.5 A Property Tester in Graphs with Maximum Degree 5
Now we want to remove the well-shaped condition for input graphs. In this subsection
we do the first step towards that goal. We develop property tester for connectivity and
crossing-free EMST-completions. Then we replace the well-shaped condition for EM-
STTEST by the assumption that the input graph has maximum degree 5. Before EM-
STTEST is invoked we test if the input graph is /200-far from connected and if its
EMST-completion has a crossing-free straight line embedding. Thus we may assume that
EMSTTEST gets an input graph that is /200-close to connected and /200-close to hav-
ing an EMST-completion with a crossing-free straight line embedding (if this is not the
case, the property testers for connectivity and crossing-free EMST-completions reject).
This way we develop a property tester for graphs with maximum degree 5. The degree
bound is then removed in the last subsection.
Testing Connectivity. In the first phase we test whether the input graph is connected.
Definition 3.3.25 A geometric graph Gfor Pis -far from connected if one has to add
more than ·nedges to Gto obtain a connected graph. If Gis not -far from connected,
then we call it -close to connected.
Remark 1 Let us notice here the equivalent characterization of geometric graphs for P
that are -far from connected these are geometric graphs for Phaving more than ·n+1
connected components.
Since the property of being connected does not depend on the positions of the input
points in P, we can use a property tester for connectivity in graphs. In [52] it has been
proven that connectivity of bounded degree graphs can be tested efficiently:
Lemma 3.3.26 [52] Let Gbe a graph with degree bound d. Connectivity of Gcan be
tested with O(log2(1/d)
d )time and query complexity in the bounded length adjacency list
model (see Chapter 2) 3.
We can immediately apply this result to geometric graphs:
Corollary 3.3.27 Let Gbe a geometric graph for Pwith maximum degree 5. There is a
property tester that in time O(log2(1/)
)and with a query complexity of O(log2(1/)
)accepts
the input if Gis connected and rejects the input with probability at least 2
3if Gis -far
from connected.
Proof : We run the tester from [52] with d=5and 0=/5.2
3In the bounded degree graph model a graph is -far from connected if one has to add more than dn edges
to obtain a connected graph.
43
Advertisement
3 Testing Algorithms for Geometric Properties
Testing Crossing-Free EMST-Completions. Next, we design a tester that accepts
the input graph, if it is the EMST and rejects it if the straight-line embedding of its EMST-
completion is -far from crossing-free. We proceed in two steps. First, our property tester
checks for pairs of intersecting blue edges and then it tries to find intersections between
blue and red edges; red edges cannot intersect because they are edges of an EMST. We use
the following distance measure.
Definition 3.3.28 The straight-line embedding of a geometric graph Gfor Pis -far from
crossing-free if one has to remove more than ·nedges in Gto obtain a crossing-free
straight-line embedding. If the straight-line embedding of Gis not -far from crossing-
free, then we call it -close to crossing-free.
We first use the tester DISJOINTNESS developed in Section 3.1 to find intersections be-
tween blue segments (induced by blue edges). Since Ghas maximum degree 5it has at
most 5n edges. Therefore, since one can verify in time O(nlogn)if a geometric graph
with nvertices has crossing-free straight-line embedding [16] (the number of edges must
be O(n); otherwise there must be a crossing), Theorem 1 implies the following result.
Lemma 3.3.29 Let Gbe a geometric graph with maximum degree 5. There is a property
tester that in time O(pn/ log(n/)) and with the query complexity of O(pn/)ac-
cepts the input if the straight-line embedding of Gis crossing-free and rejects the input
with probability at least 2
3if the straight-line embedding of Gis -far from crossing-free.
2
It remains to design a property tester for red-blue intersections in the EMST-completion
of G4. A geometric graph with red and blue edges has a straight-line embedding without
red-blue intersections, if there is no intersection between the corresponding red and blue
segments. Similarly, the straight-line embedding of a geometric graph whose edges are
colored blue and red is -far from having no red-blue intersections, if one has to delete
more than an -fraction of its edges to remove all red-blue intersections.
The main difficulty with testing for red-blue intersections in the EMST-completion of
Gis that the red edges are only defined implicitly. The following lemma shows a relation
between the endpoints of red and blue edge intersecting each other.
Lemma 3.3.30 Let pq be a red and xy be a blue segment in the EMST-completion of G.
If pq and xy intersect each other, then either pq is not in the EMST of every set containing
{p, q, x}or it is not in the EMST of every set containing {p, q, y}.
Proof : The points p, q, x, y are in convex position because the segments pq and xy
intersect. We consider the quadrilateral pxqy (see figure 3.7). Let us call the inner angles
in the quadrilateral at vertices p, q, x, y to be α,β,γ, and δ, respectively. Let us recall that
the longest edge of a triangle is opposite of the largest angle.
4More precisely, we do not design a property tester for the property of having no red-blue intersections. Our
algorithm might reject an input graph if its EMST-completion has no red-blue intersections. However, if
the input graph is the EMST then it is always accepted by our algorithm.
44
3.3 Euclidean Minimum Spanning Tree
p
q
y
x
α
β
δ
γ
Figure 3.7: A red blue intersection. The red edge is dotted.
If α < π
2and β < π
2then γor δis larger than π
2because α+β+γ+δ=2 π. Without
loss of generality, let γ > π
2. Then, segment pq is the longest edge of triangle pqx and
thus is cannot be the EMST of {p, q, x}. By Lemma 3.3.7 this is a contradiction to the fact
that (p, q)is an EMST edge. Hence we must have either απ
2or βπ
2.
If απ
2then segment xy is the longest edge in triangle pxy. Hence edge (x, y)is not
contained in the EMST of p, x, y. By Lemma 3.3.7 it is also not contained in every EMST
of a subset of Pthat contains p, x, y.
If βπ
2then segment xy is the longest edge in triangle qxy. Hence edge (x, y)is not
contained in the EMST of q, x, y. By Lemma 3.3.7 it is also not contained in every EMST
of a subset of Pthat contains q, x, y.2
This lemma shows that each red-blue intersection has a “witness” consisting of one
point and one edge of the input graph. Our property tester for red-blue intersections is
similar to the DISJOINTNESS tester. We modify the disjointness property in the following
way: We say that two points p, q Pintersect, if there is an (blue) edge e= (p, x)(or
e= (q, x)) incident to p(or q) such that the other point is a witness that eis not in the
EMST of P, that is eis not in the EMST of p, q, x. The following algorithm is a property
tester for red-blue intersections:
'
&
$
%
REDBLUETEST(G, )
Choose a set S0Pof size 16p5n/ uniformly at random
Let S=S0N(S0)where N(S0)denotes the set of neighbors of points in S0
Let GSdenote the subgraph induced by S
if the EMST-completion of GShas a cycle then reject
else accept
The analysis of the algorithm is similar to the analysis of the algorithm DISJOINTNESS.
If the input graph is the EMST then there are no intersections among the points in P.
If the EMST-completion of the input graph is -far from having no red-blue intersections
then by Lemma 3.3.30 it follows that the point set Pis -far from being disjoint, if the
EMST-completion is -far from having no red-blue intersections. Thus we have:
45
Advertisement
3 Testing Algorithms for Geometric Properties
Lemma 3.3.31 Let Gbe a geometric graph for Pwith maximum degree 5. Algorithm
REDBLUETEST runs in time O(pn/ logn)and with the query complexity of O(pn/)
and accepts the input graph Gif it is the Euclidean Minimum Spanning Tree and rejects
the input with probability at least 2
3if the straight-line embedding of the EMST-completion
of Gis -far from having no red-blue intersection.
Proof : If G= (P, E)is the EMST then its EMST-completion is crossing-free. Thus
algorithm REDBLUETEST accepts G.
Let GCdenote the EMST-completion of Gand let us assume that GCis -far from
having no red-blue intersections. By Lemma 3.3.30 we can apply the following procedure
k=n/20 times: pick a pair of intersecting (with the definition from above) points
{p, q}=Wi,i[k], and remove all edges incident to pand qfrom GC. By the degree
bound, we have to remove at most 10 edges for each of the two vertices. Therefore, we
can apply this procedure at least ktimes.
In order to prove that REDBLUETEST rejects Gwith probability at least 2/3 if GCis
-far from having no red-blue intersections we show that with probability at least 2/3 one
of the pairs Wi,i[k], is in S0. We apply Lemma 2.4.1 and obtain:
Prj[k] : (WjS0)1 (13/4)42/3
It remains to show that the algorithm rejects, if there is a pair WiS0. If Wi={p, q}S0
then there exists e= (p, x)(or e= (q, x)) such that eis not in the EMST of p, q, x. By
Lemma 3.3.7 eis not in the EMST of S. Hence Smust have a cycle and REDBLUETEST
rejects. 2
Finally, we can combine Lemma 3.3.29 and Lemma 3.3.31 to obtain.
Lemma 3.3.32 Let Gbe a geometric graph for Pwith maximum degree 5. There is an
algorithm that in time O(pn/ log(n/)) and with a query complexity of O(pn/)
accepts G, if Gis the EMST of Pand rejects Gwith probability at least 2
3if the straight-
line embedding of its EMST-completion is -far from crossing-free.
Proof : Let Gbe -far from having a crossing-free EMST-completion. Then, either the
straight-line embedding of Gis ε
2-far from crossing-free or the EMST-completion of Gis
ε
2-far from having no red-blue intersections. Applying Lemma 3.3.29 and Lemma 3.3.31
with =ε
2shows that Gis rejected with probability at least 2
3. Since the tester for blue-
blue intersection and the tester for red-blue intersections both accept the EMST we have
completed the proof of Lemma 3.3.32. 2
Extension to Degree Bounded Graphs. Now we want to design a property tester
for arbitrary degree bounded input graphs. We first extend the planar map representa-
tion for well-shaped graphs to a planar map like representation for general graphs in the
following straightforward way:
For every vertex pin Gwe (cyclically) sort incident outgoing edges in clockwise order
around the vertex pwith respect to the Euclidean positions of the edges endpoints. For
sake of completeness we restate Definitions 3.3.10, 3.3.11, and 3.3.13 for general graphs
(these generalizations are immediate):
46
3.3 Euclidean Minimum Spanning Tree
Definition 3.3.33 Let Gbe a geometric graph for Pand let e
Gbe the corresponding planar
map like representation. For each edge e= [pqiin e
Gthe successor of eis the edge [q, ri
in e
Gwith rbeing the vertex incident to qthat precedes pin the adjacency list of q(sorted
in clockwise order around q); if qhas degree one, then r=p.
For edge e= [p, qiin Gthe kth successor,k0, is defined recursively as follows:
the 0th successor of [p, qiis [p, qiitself, and for k > 0, the kth successor of [p, qiis the
successor of the (k1)st successor of [p, qi.
Similarly, edge eis the predecessor of edge e0if edge e0is the successor of edge e, and
eis the kth predecessor of e0if e0is the kth successor of e.
Definition 3.3.34 Let Gbe a geometric graph for Pand let C= (p0, p1, . . . , pk)be a
cycle in the planar map of the EMST-completion of G. Then Cis called topological, if
for any two consecutive edges on the cycle [pi, pi+1iand [pi+1, pi+2i,[pi+1, pi+2iis the
successor of [pi, pi+1i. We also call the corresponding cycle in Gtopological.
Definition 3.3.35 Let Gbe a geometric graph for P. For any vertex pP, we define
its topological k-neighborhood as the set of vertices that are the endpoints of the edges
that are either the ith successor, 0ik, of any edge incident to p, or are the jth
predecessor, 0jk, of any edge incident to p. The topological k-neighborhood of a
vertex vis denoted Ntop
G(v, k).
Then we recall that Lemmas 3.3.19 and 3.3.23 do not require that Gis connected not
that the EMST-completion. Hence it suffices to show that the bound of Lemma 3.3.12
carries over to general graphs:
Lemma 3.3.36 Let G= (P, E)be a geometric graph for Pthat is /200-close to con-
nected and /200-close to having a crossing-free straight-line embedding. If Gis -far
from EMST, then there are at least n
100 -short topological cycles in the EMST-completion
of G.
Proof : Let GCdenote the EMST-completion of G= (V, E). Since GCis /200-close
to crossing-free we can delete a set EDof at most n/200 edges from GCto make it
crossing-free. Then we can insert a set EIof at most n/100 edges to make Gconnected
(and still keep its EMST-completion crossing-free). This is always possible since we can
insert EMST edges to connect the disconnected components. Let us denote the resulting
graph G0
C= (V, E \EDEI).
Using the same arguments as in the proof of Lemma 3.3.12 we obtain that there are at
least n/24 -short topological cycles in G0
C. Now we reverse our modifications of the
EMST-completion and we remove the edges EIfrom G0
C. We observe that the removal of
a single edge can destroy at most two cycles since the cycles are disjoint (in their planar
map representation). Hence the removal of EDdestroys at most n/50 -short topological
cycles. Then we reinsert the removed edges ED. Now each edge can destroy at most 2
cycles. Hence the re-insertion of EDdestroys at most n/100 -short topological cycles.
Counting the remaining cycles we get that at least n/100 -short topological cycles are
in GC.2
We conclude:
47
Advertisement
3 Testing Algorithms for Geometric Properties
Lemma 3.3.37 Let Gbe a geometric graph for Pwith maximum degree 5. Then there
is a property tester that in time O(log2(n/)·pn/)and with query complexity of
O(log(n/)·pn/)accepts the input if Gis an EMST of Pand rejects the input with
probability at least 2
3if Gis -far from EMST.
Proof : Follows from Lemma 3.3.19, Lemma 3.3.23, and Lemma 3.3.36. 2
3.3.6 A Property Tester in General Graphs
It remains to remove the degree bound condition for the input graph. In order to do this,
we first describe a low degree tester:
A property tester for low degree. We first define when a graph is far from having
low degree vertices:
Definition 3.3.38 A geometric graph G= (P, E)for Pis -far from having low degree if
one has to remove more than ·nedges in Gto obtain a graph having maximum degree
smaller than or equal to five.
If Gis not -far from having low degree, then we call it -close to having small degree.
It is easy to see that if Gis -far from having small degree, then there are at least
·nvertices in Geither having degree greater than five or having an adjacent vertex
with degree greater than five. Therefore the simple algorithm that picks a random set Sof
4pn/ points in Pand tests if every point pShas the degree smaller than or equal to
5and if so then it tests if every neighbor of pShas degree smaller than or equal to 55,
will detect with probability greater than or equal to 2
3every geometric graph Gthat is -far
from having small degree.
Lemma 3.3.39 Let Gbe a geometric graph for P. There is a property tester that in time
O(pn/)and with the query complexity of O(pn/)accepts the input if Ghas the
maximum degree smaller than or equal to 5and rejects the input with probability at least
2
3if Gis -far from having small degree.
Proof : Clearly, our algorithm accepts every graph having maximum degree of five. Let
us assume that Gis -far from having small degree. A vertex that has degree more than
five or that is adjacent to a vertex with degree more than 5is called a heavy vertex. Let S
denote a sample of size 4n chosen uniformly at random from P.
PrScontains no heavy vertex(11/pn/)4n/ 1/3 .
It follows that PrScontains a heavy vertex2/3
Hence our algorithm rejects every graph that is -far from having small degree with prob-
ability at least 2/3. Thus it is a property tester. 2
5Since the degree of every pSis less than or equal to 5, such a test can be performed in a constant time
per vertex p.
48
3.3 Euclidean Minimum Spanning Tree
The Tester for General Graphs. To obtain a tester for general graphs we have to
modify the tester for graph with degree bound of 5in the following way: We first test
whether the input graph is /2-far from having low degree (using the tester described be-
low). Then we run the property tester for graphs with maximum degree of 5after applying
the following modifications and with distance parameter ε=/2:
If during the course of the algorithm we encounter a vertex with degree more than 5,
we immediately reject.
For each vertex vSwe also include every neighbor of vin Ginto the sample
set. This can be done without asymptotically increasing the running time of the
algorithm (because we reject, if we encounter a vertex with degree more than 5).
Clearly, the above modifications do not affect the case when the input graph is the
EMST of the point set: The algorithm will still accept the input graph. Thus let us consider
the case when the input graph Gis -far from EMST. If the low degree tester rejects the
input graph, we are done. Thus let us assume that the input graph passes this test (and thus
is /2-close to having low degree):
Let us assume that Gis /2-close to having low degree but -far from EMST. We call
a vertex of Gadistinguisher, if it either has degree more than 5 or if it has a neighbor
whose degree is greater than 5. Now we define the graph G0to be a graph obtained from
Gby deleting a minimal set of edges such that G0has a maximum degree of 5. Since we
deleted less than n/2 edges from Gto obtain G0we conclude that G0is /2-far from
EMST.
Inorder to analyze the behaviorof themodified algorithm weconsider the(unmodified)
algorithm for graphs with maximum degree 5. First of all, we observe that the modified
algorithm always rejects, if there is a distinguisher in the sample chosen by the unmod-
ified algorithm. But if there is no distinguisher in the sample chosen by the unmodified
algorithm then the graph ’looks’ like the graph G0which has maximum degree 5 and is
/2-far from EMST. If we run the unmodified algorithm on input G0it rejects with prob-
ability 2/3. Thus the modified algorithm always rejects when the unmodified algorithm
rejects G0. We conclude that the modified algorithm rejects Gwith probability at least 2/3
because it either rejects or it behaves like the unmodified algorithm with input G0. Hence
we proved:
Theorem 4 There is a property tester for the EMST property with a running time of
O( n)·log2(n/)) and with a query complexity of O( n ·log(n/)).2
49
Advertisement
3 Testing Algorithms for Geometric Properties
50
4 Efficient Property Testers
In the previous chapter we have seen some property testers and their analyses. This leads
to the general question if it is possible to characterize all properties of a certain class of
objects (e.g., all properties of point sets) that can be tested efficiently. First of all, it is
necessary to specify what is meant by ”efficiently testable”: A property Πis efficiently
testable, if there is a property tester for Πthat has query complexity independent of the
size of the input (for constant distance parameter ). One of the major open questions in the
field is to characterize for certain classes of objects all properties that are efficiently testable
and there has been some effort to achieve this goal: For graph properties it is known that
all first order graph properties that do not have a ∀∃ quantification are efficiently testable
in the adjacency matrix model [5]. In the same model it has also been proven that a large
class of graph partitioning problems can be tested efficiently [51]. For matrix properties it
is known that certain monotonicity properties can be tested efficiently [45].
In this chapter we try to find a characterization of efficiently testable properties for
functions from an arbitrary finite domain Dinto an arbitrary range R. Using the very
general formulation of a property (see Definition 2.1.1) we can use our framework for a
large class of objects and - as shown in this chapter - also for a large class of properties. In
particular, we show that our framework can be used to prove that certain clustering prop-
erties over point sets, a reversal distance property over permutations, and the k-coloring
property over `-uniform hypergraphs can be tested efficiently. For some of these problems
a property testing algorithm has been analyzed before [4]. Our goal is here to show that
our framework is fairly general and that it leads to elegant proofs that highlight the combi-
natorial features of the problems. We can also show that every hereditary graph property
(every graph property that is closed under taking induced subgraphs) is efficiently testable,
if and only if there is a proof of its testability using our framework.
We emphasize that our results only hold for property testers with one-sided error. Our
result for hereditary graph properties is existential. It shows that our framework can be
applied to a large variety of problems. It would be very interesting to show that our charac-
terization can be used to show that certain large classes of functions are efficiently testable.
This is one of the major open questions regarding our framework.
4.1 Abstract Combinatorial Programs
In this section we introduce abstract combinatorial programs. An abstract combinatorial
program (ACP) is a combinatorial structure defined over a ground set of atom items. In
51
Advertisement
4 Efficient Property Testers
this combinatorial structure atom items may be arranged into sets. These sets describe
possible ”basic” configurations and they are called the bases of the abstract combinatorial
program. Further there is a relationship between atom items and bases: Each atom item
is either consistent with a given basis or it violates it. This relationship is described by a
violation function.
ACPs can be used to highlight combinatorial features of property testing problems.
In typical applications the ground set depends on the problem under consideration and
corresponds in a natural way to the basic items of the considered problem. When we want
to apply ACPs to graph problems then the ground set may be the set of vertices of the graph
and when we consider properties of point sets the ground set may be the set of points.
The set of bases describes possible “basic” configurations of the corresponding prob-
lem. If we consider a graph coloring problem then a basis may correspond to a subset of
vertices Wtogether with an associated coloring of W. For technical reasons we define
every basis as a pair (W, `), where Wis a subset of the ground set and `is an index de-
scribing a configuration of W(for example, in the graph-coloring example above, it is a
coloring of vertices in W).
The violation function describes for each basis band each atom item xif xis consistent
with bor not. If xis not consistent with bthen we say xviolates b. If every atom item
is consistent with a basis then we call this basis feasible. Normally, the violation function
depends on the input instance. For the graph-coloring example above one could define the
violation function such that a vertex vviolates a basis (colored vertex set W) if and only
if in the input graph the k-coloring of Wcannot be extended to a proper k-coloring of
W{v}.
Formally, we define an abstract combinatorial program in the following way.
Definition 4.1.1 An abstract combinatorial program (ACP) is a triple (C,B, $), where
C is a finite set called ground set,
B {(W, `) : W C, ` N}is a set of bases, and
$:B2Cis a violation function. For each basis b B the set $(b)is the set of
atom items violating b.
A basis bis feasible if $(b) = . An abstract combinatorial program is feasible if it has
a feasible basis.
We also would like to remark here that in the context of randomized incremental algorithms
configuration spaces have been introduced as an analysis tool (e.g., see [35]). Although
the structure of a configuration space is similiar to that of an ACP it would be misleading
to describe our framework in terms of configuration spaces as the intuitive meaning of the
corresponding components of the definitions differs widely.
We are now interested in the problem of testing the feasibility of ACPs. In order to do
so we need some further definitions. The first definition introduces the term dimension to
denote the maximum number of atom items involved in a basis. Then we denote by the
width of an ACP the maximum number of different bases with the same set of atom items:
52
4.1 Abstract Combinatorial Programs
Definition 4.1.2 (ACP Dimension) Let Pbe an abstract combinatorial program. The
dimension of P(denoted dim(P)) is defined as max{|W|: (W, `) B}. The width of P
denoted (width(P)) is defined as max{`: (W, `) B}.
Since we are interested in property testing we want to investigate in ”local” properties
of ACPs to conclude if the ACP as a whole is feasible or not. For this purpose we need
some further definitions. We say that a basis is feasible for a set of atom items, if none
of these items violates the basis. A basis is covered by a set of atom items, if the basis
contains only atom items from this set. And a set of atom items contains a self-feasible
basis, if there is a feasible basis that is covered by the set:
Definition 4.1.3 (Self-feasible bases) Let P= (C,B, $)be an abstract combinatorial
program. We say a basis b= (W, `) B is covered by a subset C C if WC. We
say that a basis bis feasible for a subset C C, if no cCviolates b. We say a subset
C C contains a self-feasible basis if there is a basis bthat is covered by Cand that is
feasible for C.
In the next section we want to design a property tester for feasibility of ACPs. We
assume that the algorithm has the possibility to determine for a set S C if Shas a self-
feasible basis or not. The size of the set Sshould be as small as possible (the size of this
set could be seen as the query complexity of the algorithm). Since a property tester has
1-sided error it is necessary that we can always determine if the input ACP is feasible.
But then this would require that an ACP that consists of a single feasible basis is always
accepted. The only chance to ensure this is to set S=C. Therefore, we consider only
monotone ACPs. If a monotone ACP Pwith dimension dim(P)is feasible then every
S C with |S|dim(P)has a self-feasible basis.
Definition 4.1.4 (Monotonicity) Let P= (C,B, $)be an abstract combinatorial pro-
gram with dimension dim(P).Pis called monotone if it is either not feasible, or if (it is
feasible and) every subset S C with |S|dim(P)contains a self-feasible basis.
4.1.1 Testing Abstract Combinatorial Programs
In this section we design a property tester for monotone ACPs. We first have to define when
an ACP is far from feasible. We do this directly without specifying a distance measure
between ACPs:
Definition 4.1.5 An abstract combinatorial program is -far from feasible if every basis
is violated by more than ·|C|objects from the ground set C.
A property tester for ACPs is an algorithm that (i) accepts every feasible ACP and (ii)
rejects with probability at least 2
3every ACP that is -far from feasible. In the following
theorem we analyze a property tester for certain ACPs:
53
Advertisement
4 Efficient Property Testers
Theorem 5 (Testing ACPs) Let P= (C,B, $)be an abstract combinatorial program
with dimension at most δand width at most ρ. Then there exists swith
s=Θ(1·(δ·ln(δ/) + lnρ))
such that the following algorithm
ACP-TESTER(P, )
Sample a set Sof sobjects from Cuniformly at random
if Scontains a self-feasible basis then accept P
else reject P
1. accepts P, if Pis monotone and feasible,
2. and rejects Pwith probability at least 2/3, if Pis -far from feasible .
Proof : Let P= (C,B, $)be an ACP that is -far from feasible. Further let Phave
dimension at most δand width at most ρ. For a basis b= (W, `)let Ebbe the random
event (with respect to the random choice of S) that WSand that none of the elements
from $(b)is in S. Now, in order to prove the theorem it is sufficient to show that with the
probability larger than or equal to 2
3for none of b B the event Ebholds.
For every r,0rδ, let rbe the set of all b= (W, `) B with |W|=r. Let us fix
an arbitrary br. Then we have
Pr[Eb](1)nr
sr
n
s.
Since Phas dimension at most δand width at most ρ, we have |r|ρ·n
rfor every
r0. Furthermore, we have |r|=0for all r > δ. Therefore, by the union bound we
obtain
Pr[b B :Eb]X
b∈B
Pr[Eb] =
δ
X
r=0X
br
Pr[Eb]
ρ·
δ
X
r=0n
r·(1)nr
sr
n
s
=ρ·
δ
X
r=0n
r
n
s·(1)nr
sr
=ρ·
δ
X
r=0s
r(ns)!
(nr)! ·((1)nr)!
((1)ns)!
54
4.1 Abstract Combinatorial Programs
=ρ·
δ
X
r=0s
r·((1)nr)···((1)ns+1)
(nr)···(ns+1)
ρ·
δ
X
r=0
sr·(1)srρ·
δ
X
r=0
sr·e(sr)
ρ·δ·sδ·e(sδ)ρ·δ·sδ·e(ss/2),
where we assume in the last inequality that s. Then we set s0:= (δ1ln( ·ρ))3
and
s=21(δlns0+ln( ·ρ)) .
With these choices we have
s=21(δlns0+ln( ·ρ))
21·δ·lns0·ln( ·ρ)
2(1δln( ·ρ))2
(δ1ln( ·ρ))3=s0
We further conclude
ρ·δ·sδ·e(ss/2)ρ·δ·sδ·(s0)δ·( ·ρ)1
1/3
Hence, with probability at least 2
3all b= (W, `) B with WSare violated by S, which
completes the proof of the first part of the theorem. If Pis monotone and feasible then
every set X C of size at least dim(P)contains a self-feasible basis. Therefore, Smust
contain a self-feasible basis because sdim(P). Hence the tester accepts the input. It
remains to show that
s=Θ(1·(δ·ln(δ/) + lnρ))
We have
s=21(δlns0+ln( ·ρ))
=Θ(1(δ(ln(δ1) + lnln(δρ)) + ln(δρ))
=Θ(1(δ(ln(δ1) + lnlnρ) + lnρ))
=Θ(1·(δ·ln(δ/) + lnρ))
by the observation that for δlnρwe have δlnlnρ=O(δln(δ/)) and for δ < lnρ
we have δlnlnρ=o(lnρ).2
Corollary 4.1.6 Algorithm ACP-TESTER is a property tester for monotone abstract com-
binatorial programs.
Proof : Follows immediately from Theorem 5. 2
55
Advertisement
4 Efficient Property Testers
4.2 Property Testing vs. Testing Abstract
Combinatorial Programs
Our motivation to introduce abstract combinatorial programs was to study its relation to
property testing problems. We now prove a theorem that shows how we can use abstract
combinatorial programs to prove for certain properties that there is an efficient property
tester. Roughly speaking, a property can be tested with small query complexity, if for
every problem instance there is an equivalent (in the sense of property testing) abstract
combinatorial program of small dimension and width.
We now present a first (simple) variant of the main theorem of this chapter. Then
we give some examples and discuss in detail how our theorem can be used to prove the
existence of a property tester with small query complexity. In most examples the obtained
algorithm has also a small running time.
Our approach of using the framework of abstract combinatorial programs to study
property testers of functions f F is to reduce testing of fto testing certain ACPs. In
this section we consider only ACPs whose ground set Cis the domain Dof the function
f. Later in Section 4.5 we show how to deal with other cases. In order to show that a
property Πcan be tested with low query complexity we construct for every f F an ACP
Pf.Pfmust satisfy the following constraints: If fis -far from Πthen Pfmust be -far
from feasible. The second constraint requires that if the function values of fon a set X
can be extended to a function in Π(and if Xhas a certain size) then Xcontains a feasible
basis. We now want to prove a special case of the main theorem of this chapter. We only
consider ACPs whose ground set Cis the domain Dof the tested function f.
Theorem 6 Let Fbe a set of functions from a finite set Dto a set Rand let Πbe a
property of F. Let 0<<1and let δ, ρ N. If for every f F there exists an ACP Pf
with dim(Pf)δand width(Pf)ρsuch that the following two conditions are satisfied:
(Distance Preserving) if fis -far from Πthen Pfis -far from feasible and
(Feasibility Preserving) for every X C with |X|δ: If there exists gΠwith
f|X=g|X, then Xcontains a self-feasible basis,
then there exists s=Θ(1·(δ·ln(δ/) + lnρ))) such that the following algorithm is a
property tester for property Π:
TESTER(f, )
Sample a set Sof selements in Duniformly at random
if f|S=g|Sfor some gΠthen accept f
else reject f
.
Proof : In order to show that TESTER(f, ) is a property tester for Π, we have to prove
that every function having property Πis accepted by the tester, and every function that is
56
4.3 Clustering Problems
-far from having property Πis rejected with probability at least 2
3. If fΠthen for every
X C we have f|X=g|Xwith g=fΠ. This immediately implies that every fΠ
is accepted by TESTER(f, ). Therefore, it remains to prove that if fis -far from Π, then
the algorithm rejects the input with probability greater than or equal to 2
3. We prove this
by relating ACP-TESTER(P, ) to TESTER(f, ) and by applying Theorem 5.
By the Distance Preserving property, if fis -far from Πthen Pfis -far from feasible.
Furthermore, by Theorem 5, if Pfis -far from feasible then ACP-TESTER(Pf, ) rejects
Pfwith probability greater than or equal to 2
3.Pfis rejected by ACP-TESTER(Pf, )
only if the chosen sample set Scontains no self-feasible basis. But now the Feasibility
Preserving property implies that if there is gΠthat agrees with fon the sample set then
every set X C with |X|δdim(Pf)contains a self-feasible basis. By the fact that
|S|δwe can conclude that Scontains a self-feasible basis, if there exists a gΠthat
agrees with fon the sample set S. Therefore, we can conclude that if fis -far from Π
then with probability at least 2
3there is no such gΠwith f|S=g|S. Hence, fis rejected
by TESTER(f, ) with probability at least 2
3. This implies that TESTER(f, ) is a property
tester for Π.2
4.3 Clustering Problems
So far we have introduced a framework that can be used to prove that certain properties can
be tested efficiently. Although in principle we can model many different property testing
problems in terms of our framework, it is not clear that we can use our framework to
prove that these properties have efficient property testers. For this reason we now consider
problems from different fields and show that our framework can be used to design and
analyze efficient property testers.
The first class of problems we consider are clustering problems. Clustering deals with
the problem to partition a set of items into different groups called clusters such that a given
optimization function is minimized. If we consider the corresponding decision problem we
want to know if a clustering with a given optimization value exists.
We consider two clustering problems of point sets in the Rd. The first problem is called
radius k-clustering. Here the goal is to partition a point set in the Rdinto kdifferent
clusters such that the maximum cost of the kclusters is minimized. The cost of a cluster is
given by the radius of the smallest ball enclosing the cluster. The second problem is called
diameter k-clustering. Again the goal is to minimize the maximum cost among a set of k
clusters but this time the cost of a cluster is given by the maximum distance between any
two points within a cluster.
Both problems have been analyzed in the context of property testing and it is known
that there are efficient property testers [4]. We merely want to show that these results
can also be achieved using our framework. The proofs we present are in the spirit of the
proofs from [4]. Yet we think that our proofs might give a clearer view of the important
combinatorial aspects that make these problems testable. In case of the diameter clustering
problem we also slightly improve the query complexity.
57
Advertisement
4 Efficient Property Testers
4.3.1 Radius clustering
The decision version of the radius k-clustering problem in the Euclidean space Rd[4, 41]
[57, p. 325] (sometimes also called the Euclidean k-center problem) is to verify whether a
given set Pof npoints in Rdcan be partitioned into ksets such that the points in each set
are contained in a unit ball. If such a partition exists, we say that Pis k-clusterable. We
assume that Pis in general position and, as usual, that the point set Pis represented as a
function f: [n]Rd. Let Fdenote the set of functions representing point sets of size
nand let Π F denote the set of functions representing point sets that are k-clusterable.
The distance between two point sets is given by the standard distance measure between
functions (see Definition 2.1.3) which is consistent with the following definition:
Definition 4.3.1 A set Pof npoints in Rdis -far from being k-clusterable if more than
n points must be deleted from Pto obtain a point set that is k-clusterable.
Now, in orderto useour framework fromTheorem 6we haveto describefor everyinput
point set Pin Rdan ACP P= (C,B, $)with C= [n]that satisfies the two conditions
of the theorem. Before we start thinking about the construction of Pwe observe that the
radius k-clustering property is a combinatorial property. This means that the ’number’
of a point in the representation is irrelevant for the property. Thus we can identify the
ground set Cof the ACP with the point set P. Hence a basis of such an ACP consists of a
small set of points from P(formally, of their corresponding indices) and some additional
information (formally encoded as an integer number).
Now we are ready to talk about how to construct the ACPs. Usually, the hard part is
to find the right set of bases. Typically, a basis is used to represent implicitly a ”possible
solution” to the problem. Usually, it should be the case that the set of bases corresponds
to the set of possible solutions of the problem. Additionally, it is required that this implicit
representation can be done in the form of some atom items from the ground set of the ACP
(in our case these items correspond to points from P) and some ”additional information”.
For simplicity, let us first consider the radius 1-clustering problem. If the point set P
is 1-clusterable then by definition it is contained in some unit ball. Vice versa, we can
describe every ’possible solution’ of the problem implicitly by the position of such a unit
ball. In a similar way we can consider every ball with radius at most 1as a possible
solution. Our goal is to describe a subset of these balls implicitly in terms of atom items
(points). If Pis 1-clusterable then this subset must contain a ball that contains every point
in P.
In the following we use the fact that every finite point set is contained in a unique
(closed) ball of smallest radius (see, e.g., [78]). We denote this ball by sball(P). With
every subset WPwe associate its smallest enclosing ball. If the radius of sball(W)
is at most 1then this ball can be interpreted as a ’possible solution’ of the problem and
W(formally, the pair (W, 1)) as a basis. So we could say that for each WPthe pair
(W, 1)is a basis, if and only if the radius of sball(W)is at most 1. But to get a query
complexity independent of nfrom our framework we need to reduce the number of atom
items involved in a basis. To do this we use the fact that there is always a subset WP
of cardinality at most d+1such that sball(W) = sball(P)[78].
58
4.3 Clustering Problems
We say that (W, 1)is a basis, if the following two conditions are satisfied:
|W|d+1,
the radius of sball(W)is at most 1.
We can now define a natural violation function in the following form: A basis (W, 1)
is violated by all points that are not contained in sball(W). Using this definition we know
(a) that Phas a feasible basis if Pis 1-clusterable and (b) that every basis in Pis violated
by more than n points, if Pis -far from 1-clusterable.
We can easily extend this definition of a basis to kclusters: A basis for the radius
k-clustering problem consists of kbases for single clusters. Formally, a basis consists
of at most k(d+1)points from Pand an integer number that encodes a partition of the
k(d+1)points into ksets (of size at most d+1). The violation function is defined in the
straightforward way: A point violates a basis, if it is not contained in any of the smallest
enclosing balls defined by the bases. We have now done the ”tricky part” of the proof -
the design of the bases - what remains is straightforward verification of the requirements
of our framework.
Bases for radius clustering: We define the bases to specify all possible representa-
tions of feasible k-clusterings. Formally, we define the set of bases B={(W, `) : W
[n],|W|(d+1)k, 1 `k(d+1)k}, where the pair (W, `) B should be interpreted
as follows:
Wis the set of points defining ksmallest enclosing balls (clusters), and
`is represented as a vector hν1, . . . , ν(d+1)kiof length (d+1)ksuch that for 1
i|W|, the ith point in Wdefines the smallest enclosing ball containing cluster
number νi[k].
We say a basis b B is valid if every set W0of points defining one of the ksmallest
enclosing balls in bsatisfies that the radius of sball(W0)is at most 1.
It is easy to see that with such a definition of the bases, the abstract combinatorial
program designed for the radius clustering problem has dim(P)=(d+1)k=: δand
width(P) = k(d+1)k=: ρ.
Violation Function for Radius Clustering. We say p C = [n]violates a basis
b B if bis not valid or the point p(located at f(p)) is not contained in any of the kballs
defined by b. Furthermore, all non-valid bases are violated by all ground set elements.
Now, once we have defined formally the abstract combinatorial program Pfor every
instance of the radius clustering problem, we have to verify the prerequisites of Theorem 6:
The Distance Preserving and the Feasibility Preserving properties.
59
Advertisement
4 Efficient Property Testers
Distance Preserving Property. If Pis -far from being k-clusterable, then for every
set of kballs in Rdof radius at most one there is always a set of more than n points in
Pthat are not contained in any of the balls. By definition every basis corresponds to such
a set of smallest enclosing balls and the violation function is defined according to these
balls. Thus every basis b B must be violated by more than n elements from C. This
implies the Distance Preserving property.
Feasibility Preserving Property. Every set S,|S|δ, that is k-clusterable is con-
tained in kunit balls. Thus we can partition Sinto kclusters S1, . . . , Skeach of which is
contained in a unit ball. Further there exists sets WiSiwith sball(Wi) = sball(Si)and
|Wi|d+1. The sets Widefine a certain basis in Bwhich is covered by Sand is feasible
for S. This implies the Feasibility Preserving property. Therefore, we can apply Theorem
6 to obtain a property tester for the radius clustering problem with a query complexity of
e
O(d k/).
Implementation. We also observe that we can implement the second statement of the
algorithm TESTER(f, ) efficiently in the following way: We compute whether the sample
set Sis k-clusterable. If it is, we accept the input. If it is not we reject. The correctness
follows immediately from the fact that if a point set Sis k-clusterable then there exists a
superset of Swith size nthat is also k-clusterable. We summarize our discussion in this
section with the following theorem.
Theorem 7 There is a property tester for the radius clustering problem with query com-
plexity of
O(d k 1ln(d k/)) = e
O(d k/).2
Finally, let us discuss what we learned from this example with respect to our frame-
work. We have seen that the bases of the constructed ACP correspond to possible solutions
of the problem. If we generalize this observation we can say, roughly speaking, that prop-
erties are testable if every possible (global) solution to the problem can be encoded using
a few ”atom items” (of the problem instance) and some additional information. Unfor-
tunately, things are not always that simple. In this first example we have a one-to-one
correspondence between the clustering problem and the ACP, i.e., a sample S C has a
self-feasible basis if and only if the corresponding point set is k-clusterable. This is not
always the case as we see in the next section.
4.3.2 Diameter clustering
The decision version of the diameter k-clustering problem ([4], [9, Problem ND54], [48,
Problem MS9], [57, p.326]) is defined as follows: Given a point set Pin Rdand a positive
integer k, can Pbe partitioned into kdisjoint sets (clusters) C1, . . . , Cksuch that for every
i,1ik, and every x, y Ciit holds that dist(x, y)1. If such a partition exists,
we say that Pis k-clusterable. As always, we assume in the property testing setting that the
60
4.3 Clustering Problems
point set is represented by a function f: [n]Rd. In this case we do not use the standard
distance measure. It has been shown in [4] that under the standard distance measure every
property tester must have a query complexity of (n). Instead we use the bicriteria
distance measure proposed in [4] which is defined in the following way:
Definition 4.3.2 [4] Let Pbe a point set in Rdand kbe a positive integer. We say P
is (, β)-far from being k-clusterable if for every partition of Pinto sets C0, C1, . . . , Ck
satisfying dist(x, y)1+βfor all 1ikand x, y Ci, it holds that |C0|> ·|P|.
It is known that under this distance measure there is a property tester with query com-
plexity e
Ok2
·2
β2dfor the diameter k-clustering problem [4]. Again our goal in this
section is to show that we can prove this result using our framework and improve the query
complexity of e
Ok
·2
βd. Our proof uses combinatorial arguments that appear implicitly
in the proof presented in [4]. Our goal is to design an efficient property tester that for given
k,and β>0(i) accepts every point set that is k-clusterable and (ii) rejects with proba-
bility at least 2
3every input that is (, β)-far from being k-clusterable. Again we denote by
Fall functions representing point sets of size nand by Π F all functions representing
point sets that are k-clusterable.
As in the case of the radius clustering problem we begin our discussion with the con-
struction of bases for a single cluster. We will see that we can generalize this definition
to the case of k-clusters in a similar way as in the radius clustering problem. We use the
following notation: A cluster Cis a non-empty set of points in Rdwith dist(x, y)1for
every x, y C.
Definition 4.3.3 Let Cbe a cluster. The kernel kern(C)of Cis defined as the intersection
of unit balls with centers at the points in C.
We use the following simple properties of a kernel:
Claim 4.3.4 Let Cbe a cluster. Then we have:
1. Ckern(C).
2. There exists a unit ball containing all the points in C.
3. If pkern(C)then dist(x, y)1for every x, y C{p}.
Proof : (1) Since dist(x, y)1for every x, y C, each point xCis contained
in every unit ball with the center at any other point yC, and hence, xis contained in
kern(C). (2) Since C6=, from the previous property we get kern(C)6=. Let us pick
any point xkern(C). Since xis contained in all unit balls with centers at the points
in C, it is at the distance at most 1from every point in C. Therefore all points in Care
contained in the unit ball with the center at x. (3) If pkern(C), then pis contained in
all unit balls with centers at the points in Cand thus its distance to every point in Xis at
most 1.2
61
Advertisement
4 Efficient Property Testers
Now, in order to use our framework from Theorem 6 we have to describe for every
input set Pof npoints in Rdan ACP P= (C,B, $)with C= [n]that satisfies the
preconditions of the theorem. Following the arguments from the radius clustering case we
can again identify the ground set Cof the ACP with the input point set P. Thus we can
again construct our bases using points from Pas atom items. We first consider the case
k=1:
We start by making an observation about the kernel of a cluster C: If we add a point
pto Cthen Cremains a cluster (i.e., the pairwise distance between points from Cis at
most 1) if and only if pis in the kernel of C. Thus a good basis would contain exactly
those elements from Cthat define the boundary of kern(C). Unfortunately, it might be
the case that almost every element of Cdefines the boundary of C. So this approach is
doomed to failure. But we can do the following: We can find a small set of points Win C
that approximates the kernel of C. We choose a basis for a cluster Cto be every maximal
subset WCwith the property that all points in Whave a mutual pairwise distance of
more than β. This way, we ensure that (a) the kernel is well approximated and (b) that the
number of points defining the kernel of a cluster is small. We now prove that every basis
for a single cluster is defined by no more than (1+2
β)dpoints.
Lemma 4.3.5 Let Cbe a cluster and let Wbe a subset of Csuch that for every p, q W
dist(p, q)> β > 0. Then |W|(1+2/β)d.
Proof : Let Cand Wbe as stated in the lemma. If we draw balls of radius β/2 centered
at every point in W, then all these balls are pairwise disjoint. By Property 2 in Claim
4.3.4, all points in Ware contained in some unit ball because WCand Cis a cluster.
Therefore, if we draw balls of radius β/2 centered at every point in W, then all these balls
are contained in a ball of radius 1+β/2. Since all these small balls are disjoint, we have
the following upper bound for the size of W:
|W|·Vol(ball of radius β/2)Vol(ball of radius 1+β/2),
where Vol() denotes the volume of the object. Since the volume of a d-dimensional ball of
radius ris equal to1rd·πd/2
Γ(1+d/2), we obtain an upper bound for the size of Wof (1+ (2/β))d.
2
To formalize our definition of bases we need:
Definition 4.3.6 Let Pbe a point set in Rd, and βa positive real. Let Cbe a cluster.
We say a point pPis β-covered by Cif pkern(C)and there is qCsuch that
dist(p, q)β.
Similar to the radius clustering problem we want to represent every possible solution
to the diameter 1-clustering problem by a basis. As already mentioned we can do this
1Here, Γ() is Euler’s Gamma (factorial) function, that is formally defined as Γ(x) = R
0tx1etdt for all
positive x. It is well known that Γ(x+1) = x Γ(x)and that for integer x0we have Γ(x+1) = x!and
Γ(x+1
2)! = π·((2 x)!)/(x!·4x).
62
4.3 Clustering Problems
only approximately. Therefore we say that a pair b= (W, `)is a basis for the diameter
1-clustering problem if Wis a subset of Pof size at most (1+2
β)dand `=1. We say
that a basis is valid, if for all p, q Wwe have βdist(p, q)1. It remains to define
the violation function. If a basis is not valid it is violated by every point in P. If a basis
is valid, we have two different types of violation: First of all a point pviolates a basis
b= (W, 1), if p /kern(W). This is the straightforward type of violation. If a point
is not in the kernel of Wthen it cannot belong to the same cluster. The second type of
violation is different. A point pkern(W)violates b, if pis not β-covered by W. Here
a point violates a basis if it is consistent with the current basis of the cluster, but it changes
the kernel significantly. But if the kernel is changed significantly then the basis bis no
longer ”a good implicit description of the solution of the clustering problem”. In this case,
we can obtain a new basis b0= (W{p}, 1)from bthat approximates the new kernel.
Bases for diameter clustering. We can extend our definition of bases for the diam-
eter 1-clustering problem to arbitrary kin the following way: A basis for the diameter k-
clustering problem is an encoding of ksets W1, . . . , WkPeach of size at most (1+2
β)d.
A basis is valid if for every p, q Wi,1ik, it holds that βdist(p, q)1.
Formally, a basis is a set W=SiWiwith an integer encoding the partition of Win the
sets Wi.
Lemma 4.3.7 Any ACP Pfor the diameter k-clustering problem has dimension at most
k·(1+2/β)dand width at most kk·(1+2/β)d.
Proof : The dimension of the ACP follows immediately from the definition of the bases.
The width follows from the fact that every point of a basis can belong to one of ksets, that
is, we have (at most) kchoices for each point of the basis. 2
Violation function for diameter clustering. A basis bthat is an encoding of the
sets W1, . . . , Wkis violated by a point pP, if bis not valid or if pviolates every Wi
(seen as a basis for the 1-clustering problem), 1ik.
Feasibility Preserving property. In order to show the Feasibility preserving prop-
erty we have to show that every k-clusterable set SPof size at least δ:= k·(1+2/β)d
dim(P)has a self-feasible basis. If Sis k-clusterable then there exists a partition of Sinto
kclusters C1, . . . , Ck. Since for every WiCiit holds that kern(Ci)kern(Wi)
we know that there exists sets Wiwith the property that for each p, q Wiwe have
βdist(p, q)1and for each pCiand qWiwe have dist(p, q)β. By
Lemma 4.3.5 and the size of the bases the Feasibility Preserving property follows.
Distance Preserving property. We prove the Distance Preserving property by con-
tradiction. Let us assume Pis (, β)-far from being k-clusterable and suppose there is a
basis bencoding the sets W1, . . . , Wkthat is violated by less than n points. We delete
all points in Pthat violate band let Pbe the remaining point set. Since all the points in
63
Advertisement
4 Efficient Property Testers
Pare β-covered by some Wi, for each point pPthere is a Wiwith pkern(Wi)
and for which there exists qpWiwith dist(p, qp)β. We assign each such a point
pto the cluster corresponding to Wi. Observe that all points in the cluster are contained
in kern(Wi). Furthermore, for every point rkern(Wi)the distance between pand
ris not larger than the distance from pto qpplus the distance from qpto r. Hence, we
can conclude that the distance between two points in the cluster (both of which must be
contained in kern(Wi)) is at most 1+β. This implies that Pcan be partitioned into k
clusters of diameter at most 1+βeach, which is a contradiction.
Implementation. Similar to the radius clustering problem we can implement the sec-
ondstatement ofthe algorithm TESTER(f, ) bychecking ifthe sampleset Sis k-clusterable.
This follows from the fact that every set k-clusterable set Sof size at most ncan be ex-
tended to a set Pof size nthat is clusterable. On the other hand, every set Sthat is not
k-clusterable cannot be extended to a set Pthat is k-clusterable. Now, by our discussion
above, we can apply Theorem 6 to obtain the following result.
Theorem 8 There is a property tester for the diameter k-clustering problem with a query
complexity of
e
O(k·1·(1+ (2/β))d)
4.4 Reversal Distance
In the previous section we have seen how to apply our framework to certain clustering
problems. We have seen that a clustering problem is efficiently testable, if we have an im-
plicit characterization of the cluster consisting of a small number of input objects and some
additional information. We now want to consider a different problem which is called the
reversal distance problem. Determining the reversal distance between two permutations is
a fundamental problem in computational biology. It has been introduced in the pioneering
works by Sankoff and later on many researchers have investigated in this problem (see,
e.g., the survey in [71, Chapter 10]). The problem to compute the reversal distance be-
tween two permutations can be reduced to the problem of computing the distance between
one permutation and the identity permutation (details below). Therefore, the problem is
also called sorting by reversals’.
In sorting by reversals one is asked to compute the shortest sequence of (interval)
reversals that transforms a given permutation πinto the identity permutation. The number
of reversals that are necessary is called the reversal distance between πand the identity
permutation. Because of its applications in computational biology, sorting by reversals has
been widely studied in the recent years (see, e.g., [10, 17, 22, 59, 71, 72]). It is known that
sorting by reversals is NP-hard [22], its optimization version is MAX-SNP-hard [18], and
that there exits a polynomial-time 1.375-approximation algorithm [17] (see also [10, 59]).
We now introduce the reversal distance problem, formally. Let Sndenote the set of all
permutations of [n].
64
4.4 Reversal Distance
Definition 4.4.1 A reversal ρhi, jiof an interval [i, j],1ijn, is the permutation
1 2 . . . i 1ii+1 . . . j 1 j j +1 . . . n
1 2 . . . i 1 j j 1 . . . i +1ij+1 . . . n .
That is, for a permutation π= (π1, . . . , πn) Sn,ρhi, jihas the effect of reversing
the order of (πi, πi+1, . . . , πj)and transforming πinto
π·ρhi, ji= (π1, . . . , πi1, πj, πj1, . . . , πi, πj+1, . . . , πn)
.
The reversal distance between two permutations πand σis the minimum number of
reversals that is necessary to transform πinto σ.
Definition 4.4.2 Given a pair of permutations π= (π1, . . . , πn), σ = (σ1, . . . , σn) Sn,
the reversal distance drev(π, σ)between πand σis the minimum number of reversals
needed to transform πinto σ(that is, the minimum number ksuch that there exists a
sequence of reversals ρ1, ρ2, . . . , ρkwith π·ρ1·ρ2···ρk=σ).
Equivalently, we can compute the number of reversals necessary to transform σ1πinto
the identity permutation id. Therefore, we are interested in the reversal distance between
a given permutation πand the identity permutation. We want to consider the decision
version of the reversal distance problem.
Definition 4.4.3 The reversal distance problem is to decide for a given permutation π
Snand an integer kif the reversal distance drev(π, id)between πand the identity permu-
tation id is at most k.
We now want to formulate the problem as a property testing problem that fits into our
framework. The set of functions Fwe want to consider is the set of permutations of [n],
that is, F=Sn. We are interested in all permutations that have a reversal distance of at
most kto the identity permutation. We can write this as a property Πas follows:
Π={π Sn:drev(π, id)k}.
Now that we have defined the property we need a distance measure between permuta-
tions. Here we can use the standard distance measure from Definition 2.1.3. Equivalently,
we can use the following definition:
Definition 4.4.4 A permutation π Snis -far from having reversal distance smaller
than or equal to kif for every sequence of kreversals ρ1, ρ2, . . . , ρkthe permutation
π·ρ1·ρ2···ρkdisagrees with the identity permutation on more than ·nplaces, that is,
if π·ρ1·ρ2···ρk= (σ1, . . . , σn)then |{i{1,2,...,n}:σi6=i}| > ·n.
65
Advertisement
4 Efficient Property Testers
In contrast to the clustering problems the reversal distance property is obviously not
combinatorial. We conclude that we have to take the domain of the function into account.
For a permutation π= (π1,··· , πn)we denote atom items by πi. This notion covers the
fact that the value of domain element iis πi=π(i). Hence it captures also the domain of
a value of f.
Let us notice that we can encode an interval [i, j]by the two domain elements πiand
πj(using the fact that π1(πi) = iand π1(πj) = j). If we apply a reversal ρto πthen
πiand πjinduce the interval ((π·ρ)1)(πi),((π·ρ)1)(πj)(for this reason we want to
work with πirather than with i). We denote the interval induced by two elements πiand
πjby [πi, πj].
We say a reversal ρhr, sisplits an interval [πi, πj], if i<rjor is<j(or both).
Definition 4.4.5 Let π= (π1, . . . , πn)be a permutation and let [πi, πj]denote an interval.
We say that a reversal ρhk, `isplits an interval [πi, πj], if i<kjor if i`<j.
We generalize this notion to k-reversals:
Definition 4.4.6 Let π= (π1, . . . , πn)be a permutation. A k-reversal ρ=ρ1·ρ2···ρk
splits an interval [πi, πj], if there exists `,0`<k, such that ρ`+1splits
[(π·ρ1···ρ`)1(πi),(π·ρ1···ρ`)1(πj)] .
If ρdoes not split [πi, πj]then we say ρis safe for [πi, πj].
Notice that if ρ1, . . . , ρkis safe for [πi, πj]then each of the reversals ρ1, . . . , ρkeither
entirely contains [πi, πj]or it does not contain any π`[πi, πj]. Therefore, in this case,
after applying ρ1, . . . , ρkthe positions of πi+1, . . . , πj1are determined by the position of
πiand πj.
Bases for the k-reversal problem: The idea is now to define a basis as a set of
2k +1intervals induced by pairs of the atom items of the basis. For each such set we then
consider only reversal that are safe for these intervals. We say that a set of 2k +1intervals
is a basis if a sequence of kreversals ρ1,...ρkexists such that for each atom item πiof
the basis it holds that (π·ρ1···ρk)1(πi) = πiand if ρ1,...ρkis safe for all intervals
induced by the elements involved in the basis.
Definition 4.4.7 (Bases for k-reversal distance) Let π= (π1,··· , πn) Sn. A set Iof
2 k +1intervals is a valid basis for the reversal distance problem if there is a sequence
ρ1, . . . , ρkof kreversals such that
(π·ρ1···ρk)1(πi) = πiand (π·ρ1···ρk)1(πj) = πjfor each interval [πi, πj]
I, and
no interval [πi, πj]Iis split by ρ1, . . . , ρk.
If the set of intervals is a basis b, then we associate with it the k-reversal ρb=ρ1···ρk
(in case that there are different sequences that witness the basis property we choose an
arbitrary one).
66
4.4 Reversal Distance
Formally, a basis consists of 4k +2atom items and an integer number encoding the
2k +1pairs of atom items (an atom item may be paired with itself). The integer number
can be seen as a vector of length 2k +1having entries with values from [4k +2]×[4k +2]
to specify each of the 2k +1pairs.
Lemma 4.4.8 For each instance of the reversal distance problem the corresponding ACP
Phas dim(P)4k +2and width(P)(4k +2)4k+2.
Proof : By definition a basis consists of 4k +2atom items. Thus we have dim(P)
4k+2. Each vector of length 2k+1with values from [4k+2]×[4k+2]can be encoded as
an integer number between 1and (4k +2)4k+2. Thus we have width(P)(4k +2)4k+2.
2
Violation function for k-reversal distance: Let bbe a basis and let ρb=ρ1···ρk
bethe k-reversal associated with basis b. We say bisviolated by πi C, if (π·ρb)1(πi)6=
πi, that is, πiis not moved to position πiwhen ρbis applied to π.
Distance Preserving Property: We have associated a k-reversal ρbto each basis.
An atom item violates a basis, if it is not at the correct position when ρbis applied to π. If
a permutation is -far from having reversal distance smaller than or equal to kthen every
k-reversal puts more than n elements to the wrong position. Hence every basis is violated
by more than n elements. Therefore, the distance preserving property is satisfied.
Feasibility Preserving Property: The hard part in this case is to prove the Feasibility
Preserving property. Let S C be a set of atom items and let ρ=ρ1···ρkbe a k-reversal
with (π·ρ)1(πi) = πifor each πiS. We show that in this case Shas a self-feasible
basis. First we want to construct a set of 2k +1intervals that are safe for ρ1,...ρk. We
start with the set of s1intervals induced by S. now we observe that each reversal ρican
split at most 2of these intervals. We conclude that at most 2k of these intervals are split by
ρ1,...ρk. We can merge adjacent intervals not split by ρ1,...ρkand obtain a set of 2k+1
intervals that are not split by ρ1,...ρk. Hence there exists a sequence of kreversals that
is safe for each of our intervals. Thus these intervals form a basis b. It remains to prove
that this basis is not violated (the k-reversal associated with the basis must not be the k-
reversal ρ). By our construction of the intervals each πiSis contained in a safe interval.
Therefore, its position after applying the reversal is uniquely determined by the positions
of the endpoints of the interval. Let SISdenote the set of endpoints of intervals of the
basis b. Since bis a basis there is a k-reversal ρbwith
(π·ρb)1(πi) = πi= (π·ρ)1(πi)for each πiSI
Since the endpoints are mapped to the same places when ρband ρare applied to πwe can
conclude that each other point in Sis also mapped to the same place. Hence no πiS
violates band we have shown the Feasibility Preserving property.
Once we have proven the Distance and the Feasibility preserving property, Lemma
4.4.8 and Theorem 6 allow us to conclude with the following theorem.
67
Advertisement
4 Efficient Property Testers
Theorem 9 There exists a property tester for the k-reversal distance property with query
complexity e
O(k/).2
4.5 Property Testing vs. Testing Abstract
Combinatorial Programs (continued)
In Section 4.2, we described a framework for testing problems via testing abstract combi-
natorial programs. We only considered ACPs whose ground set is equivalent to the domain
of the tested function. Although we showed that this framework can be applied to differ-
ent problems, it is not always powerful enough. In some cases it is necessary to consider
ground sets different from the domain of the tested function. For example, when we con-
sider graph problems we want to identify the ground set with the set of vertices of the
graph. If we consider the adjacency matrix model using the approach from Section 4.2 and
we represent a graph by a function f:V×V{0, 1}then the ground set would be a set
of entries in the adjacency matrix. But sampling entries of the adjacency matrix results in
a disconnected set of edges, if the size of the sample set is o(n)[51]. In order to have a
more flexible model we introduce interpretations.
Interpretations. Interpretations are functions that map each subset of the ground set of
the ACP to a subset of the domain of the tested function. Given a sample set S of ground
set items we use the interpretation to determine a set of domain elements DS. Then we
query for the value f(x)for each x DS.
Definition 4.5.1 An interpretation of Cin Dis a function I:2C2D.
To investigate quantitative properties of the reduction we need the following definition.
Definition 4.5.2 For a function h:NN, we say an interpretation Iof Cin Dis h-
bounded if for every X C it holds |I(X)|h(|X|). (We write in that case that Iis
h(N)-bounded, with Nbeing the formal input variable.)
The main idea behind introducing these notions is to allow a more general analysis of
algorithm TESTER(f, )from Section 4.2. Similarly to the proof of Theorem 6, we want
to test an input function f F via testing a related ACP P= (C,B, $). Since Pis now
allowed to be an ACP with arbitrary ground set C, we use the interpretation Iof Cin D
to link the domains of fand Pin the reduction. The notion of h-bounded functions in
Definition 4.5.2 is used to describe the size of the random sample in the tester. That is, if
the interpretation Iis h(N)-bounded and if our algorithm samples a set S C then we
query for the value of f(x)for every xI(S) D and the restriction |I(S)|h(s)yields
an upper bound on the query complexity.
68
4.5 Property Testing vs. Testing Abstract Combinatorial Programs (continued)
Distance Preserving Property. In Theorem 6 we used the Distance Preserving prop-
erty that requires that if a function fis -far from property Πthen the ACP is -far from
feasible. In general, however, one can parameterize this property and require the (, λ)-
Distance Preserving property: if fis -far from property Πthen the ACP is λ-far from
feasible.
Now, in the framework defined above, it is easy to see that Theorem 6 can be general-
ized to the following theorem, which describes our framework in its full generality.
Theorem 10 Let Fbe a set of functions from a finite set Dto a set R, and let Πbe a
property of F. Let 0 < , λ < 1 and let I:2C2Dbe an h-bounded interpretation of
Cin D. If for every f F there exists an ACP Pfwith dim(Pf)δand width(Pf)ρ
such that:
((, λ)-Distance Preserving) if fis -far from Πthen every basis in Pfis λ-far from
feasible, and
(Feasibility Preserving) For every X C with |X|δ: If there exists gΠwith
f|I(X)=g|I(X)then Xcontains a self-feasible basis,
then there exists s=Θ(λ1·(δ·ln(δ/λ) + lnρ)) such that the following algorithm is a
property tester for Πwith query complexity h(s):
TESTER(f, )
Sample a set Sof selements in Cuniformly at random
if f|I(S)=gI(S)for some gΠthen accept f
else reject f
Proof : In order to show that TESTER(f, ) is a property tester for Π, we have to prove
that every function having property Πis accepted by the tester, and every function that is
-far from having property Πis rejected with probability at least 2
3. If fΠthen for every
X C we have f|I(X)=g|I(X)with g=fΠ. This immediately implies that every fΠ
is accepted by TESTER(f, ). Therefore, it remains to prove that if fis -far from Π, then
the algorithm rejects the input with probability greater than or equal to 2
3. We prove this
by relating ACP-TESTER(P, ) to TESTER(f, ) and by applying Theorem 5.
By the Distance Preserving property, if fis -far from Πthen Pfis -far from feasible.
Furthermore, by Theorem 5, if Pfis -far from feasible then ACP-TESTER(Pf, ) rejects
Pfwith probability greater than or equal to 2
3.Pfis rejected by ACP-TESTER(Pf, )
only if the chosen sample set Scontains no self-feasible basis. But now the Feasibility
Preserving property implies that if there is gΠthat agrees with fon the interpretation of
the sample set then every set X C with |X|δdim(Pf)contains a self-feasible basis.
By the fact that |S|δwe can conclude that Scontains a self-feasible basis, if there exists
agΠthat agrees with fon the sample set S. Therefore, we can conclude that if fis -far
from Πthen with probability at least 2
3there is no such gΠwith f|I(S)=g|I(S). Hence, f
is rejected by TESTER(f, ) with probability at least 2
3. This implies that TESTER(f, ) is
a property tester for Π.2
69
Advertisement
4 Efficient Property Testers
4.6 Graph Coloring
In this section we apply Theorem 10 to graph coloring. A k-coloring of a graph G= (V, E)
is an assignment χ:V{1,...,k}of colors to the vertices of the graph. A coloring is
proper if there is no edge e= (v, u)Esuch that χ(v) = χ(u). If Ghas a proper k-
coloring, then Gis k-colorable. The graph k-coloring problem is to decide whether a given
graph is k-colorable. It is a classical problem in algorithmic graph theory. It is known that
for k3the problem of verifying if an input graph is k-colorable is NP-complete (see,
e.g., [48, Problem GT4] or [9, Problem GT5]). It is also well known that this problem is
very hard to approximate, and so, for example, it is NP-hard to 4-color 3-colorable graphs
and it is hard to color k-colorable graphs with approximation within n1, and even within
n1O(1/loglogn)[38] where ndenotes the number of vertices in the graph. The best known
approximation bound for arbitrary kis of O(n(loglogn)2/log3n)[9, Problem GT5].
We want to consider the graph coloring problem in the adjacency matrix model. That
is, the input graph G= (V, E)is given as a function V×V{0, 1}representing the
adjacency matrix of the graph. Thus we have f(u, v) = 1, if and only if (u, v)E.
W.l.o.g. we assume that V= [n]. Let Πdenote all n-vertex graphs that have a proper
k-coloring. We use the standard distance measure between graphs (see Definition 2.1.3)
which is equivalent to the following definition:
Definition 4.6.1 A graph Gis -far from being k-colorable if in order to transform Ginto
ak-colorable graph one has to modify more than n2entries in the adjacency matrix of
G.
It is known that graph coloring in the adjacency model can be tested efficiently [51] and
the proof we present uses combinatorial arguments that essentially appear in the proof
presented in [51]. The achievement of our framework is again a clear and elegant proof
that highlights the combinatorial aspects of the problem and a slight improvement in the
query complexity that matches for k>2the bounds from [6] in the e
O-notation.
For the k-coloring problem, we identify the ground set Cwith the set of vertices V
of the input graph G= (V, E). Since in our framework Gcan be viewed as given by its
adjacency matrix representation, we define the interpretation Ito map each set of vertices
to the submatrix induced by these vertices. That is, for every WV, we have I(W) =
W×W. Clearly, the interpretation is N2-bounded.
The bases of the coloring problem are formed by some properly colored sets of vertices.
That is, everybasis correspondsto apair (W, χ), where WV=Candχisan encoding
of a proper k-coloring of W(here, one can think that a k-coloring of Wis represented by
a vector of length |W|with values in kW). Notice that with this definition, if for each
(W, χ) B we have |W|δ, then so defined abstract combinatorial program Phas
dim(P)δand width(P)kδ.
Before we define the bases formally, we need some more definitions:
Definition 4.6.2 Let SVbe a set of vertices and let χbe a proper k-coloring of S. Let
Vi=vV:uSwith χ(u) = iand (v, u)E
70
4.6 Graph Coloring
be the set of vertices that cannot be properly colored in color iusing any extension of χ
to V. We call a vertex vV\Sheavy for hS, χiif there is a proper extension of χto
S{v}, but every extension increases the number of vertices in certain Vi,1ik, by
at least n/3 (that is, (i) there exists 1jk, with v /Vjand (ii) 1jkif v /Vjthen
{w /Vj: (v, w)E} n/3).
A vertex vV\Sthat cannot be properly colored by any extension of the coloring χ
(that is, vTk
i=1Vi) is called a conflict vertex for hS, χi.
Bases for k-coloring: For the graph coloring problem we define the bases inductively:
{, 1}is a basis (where 1is the encoding of the coloring of the empty set of vertices)
and
if b= (K, χ)is a basis, vis a heavy vertex for band χis an encoding of the
previous coloring χof Kextended by a proper coloring of v, then (K{v}, χ)is a
basis.
Our next step is to show that the ACP we define this way has small dimension and
width:
Lemma 4.6.3 For every input instance of the graph coloring problem the corresponding
abstract combinatorial program Phas dim(P)3k/ and width(P)k3 k/.
Proof : Since every k-coloring of a set of rvertices can be encoded using an integer
number between 1 and kr, it is enough to show that |K|3 k/ for every basis b= (K, χ).
Let b= (K, χ)be a basis with |K|=r. Since bis a basis, there must exist a sequence
of bases (K0, χ0),(K1, χ1),. . .,(Kr, χr)with (K0, χ0)=(, 1),(Kr, χr)=(K, χ), and
such that for every 1irwe have |Ki|=iand the only vertex in Ki\Ki1is heavy
for hKi1, χi1i.
Let V(i)
1, V(i)
2, . . . , V(i)
k,0ir, be the sets of vertices such that each vertex vV(i)
j
cannot be properly colored in color jif we want to extend coloring χito the set Ki{v}.
That is,
V(i)
j=vV:uKiwith χi(u) = jand (v, u)E.
It is easy to see that for every i, j we have V(i)
jV(i+1)
j. Furthermore, by the definition of
heavy vertices, we know that for every i,1ir, there is certain j,1jk, such
that |V(i)
j||V(i1)
j|+ n/3. Therefore, since we have V(0)
j=for every j,1jk, it
must hold that Pk
j=1|V(i)
j|i n/3 for every i,1ir. Finally, since |V(r)
j|nfor
every j,1jk, we conclude that r3 k/.2
Violation function for k-coloring: A basis b= (K, χ)is violated by a vertex vV
if either (i) vis a heavy vertex for hK, χior (ii) vis a conflict vertex for hK, χi.
Once we have described k-coloring in our framework, in order to apply Theorem 10,
we must show that the (, λ)-Distance Preserving and the Feasibility Preserving properties
hold. We begin with the proof of the following lemma that implies the (, λ)-Distance
Preserving property with λ=/3.
71
Advertisement
4 Efficient Property Testers
Lemma 4.6.4 ((, /3)-Distance Preserving property) Let G= (V, E)be a graph that
is -far from being k-colorable and let SVbe any set of properly k-colored vertices
with a proper coloring χ. Then, Vcontains more than n/3 conflict vertices for hS, χior
Vhas more than n/3 heavy vertices for hS, χi.
Proof : Our proof is by contradiction. Assume Gis -far from having a proper k-coloring
and there are less than n/3 conflict vertices for hS, χiand less than n/3 heavy vertices
for hS, χi. Then, we show that there is a k-colorable graph Gthat is obtained from Gby
removing less than n2edges. This is a contradiction.
Let Xbe the set of all conflict vertices for hS, χi, let Ybe the set of all heavy vertices
for hS, χi, and let Zbe the set of remaining uncolored vertices (i.e., Z=V\(SXY)).
For every i,1ik, let Vibe the set of vertices in Vsuch that for every vVithe
extension of χby coloring vwith color iis not a proper coloring (cf. Definition 4.6.2).
We first construct a graph G0by removing all edges incident to the vertices in XY
and extend the coloring χof Sto a k-coloring of SXYby coloring the vertices in
XYarbitrarily. Since |XY|< 2 n/3, less than 2 n2/3 edges are removed from G
in this way. Furthermore, since all vertices in XYare isolated, the obtained coloring χ0
is a proper k-coloring of SXY.
Now, we modify G0to extend the coloring χ0to all vertices in Z. For each vZ, let
τ(v)be a color that satisfies the following two constraints:
If we extend χ0to S{v}and we color vwith τ(v)then the resulting k-coloring is
proper, and
If we extend χ0to S{v}and we color vwith τ(v)then the absolute increase in the
size of Vτ(v)is minimal (among all possible choices that satisfy the first constraint).
In other words, v /Vτ(v)and for every i,1ik, if v /Vithen |{w /Vi: (v, w)
E}| |{w /Vτ(v): (v, w)E}|.) Since vis not a conflict vertex for hS, χi, such a color
τ(v)always exists (but is possibly not unique). By the fact that vis not a heavy vertex for
hS, χi, we know that |{w /Vτ(v): (v, w)E}| < n/3. Therefore, if we remove from
G0for every vZall edges (v, w)Ewith w /Vτ(v), then the resulting graph Gis
obtained from Gby removal of less than n2edges. It remains to show that the following
coloring of Gis proper: We color each vertex vSXYwith color χ0(v). Each
vertex vZis colored with color τ(v). Now assume that this coloring is not proper. Then
there must be an edge (v, u)such that vand uare colored in the same color. Since χis
proper, all vertices in Xand Yare isolated, and by the definition of τ(v)it is immediate
that such an edge can only be between vertices in Z. But then we have that u /Vτ(v)and
this implies that (v, u)has been removed from G0. Contradiction. 2
It remains to prove the Feasibility Preserving property:
Lemma 4.6.5 (Feasibility Preserving property) If for some SVevery basis covered
by Sis violated by certain vSthen the subgraph of Ginduced by vertices in Scannot
be properly k-colored.
72
4.7 Hypergraph Coloring
Proof : The proof is by contradiction. Let us suppose there is a proper k-coloring χof the
subgraph of Ginduced by the vertices in S. For every US, let χUdenote the coloring χ
restricted to vertex set U.
Let us observe that the set of bases covered by Sis not empty, because it contains
the “empty set” basis (, χ). Therefore, there exists a basis (possibly one of many) b=
(U, χU)covered by Shaving the maximum size of set U. Since bis violated by certain v
S, either vis a conflict vertex for hU, χUior vis a heavy vertex for hU, χUi. Furthermore,
since χwas assumed to be a proper k-coloring of S,vcannot be a conflict vertex for
hU, χUiand therefore it must be a heavy vertex for hU, χUi. But this implies that (U
{v}, χU{v})is a basis that is moreover covered by S. This yields a contradiction, because
we assumed that there is no basis (K, χK)covered by Shaving |K|>|U|.2
Therefore, we summarize our discussion in this section with the following theorem that
follows directly from Theorem 10 and Lemmas 4.6.4 and 4.6.5.
Theorem 11 There is a property tester for the graph k-coloring property with a query
complexity of
O(k 2ln(k/2))2=e
O(k2/4).2
4.7 Hypergraph Coloring
We now want to extend the analysis from Section 4.6 to obtain an efficient property tester
for hypergraph coloring. A hypergraph is a pair H= (V, E)with a finite vertex set V
and the edge set E2V. A hypergraph His `-uniform if |e|=`for all eE. (Notice
that a 2-uniform hypergraph is a graph.) A k-coloring of a hypergraph His an assignment
χ:V{1,...,k}of colors to the vertices of the hypergraph. A coloring is proper if
no edge in Eis monochromatic, that is, if for every edge eEthere are v, u ewith
χ(v)6=χ(u). If Hhas a proper k-coloring, then His k-colorable. The k-coloring problem
for hypergraphs is to decide whether a given hypergraph is k-colorable.
Hypergraph coloring is a well studied problem in the literature in discrete mathemat-
ics, combinatorics, and computer science. In contrast to graphs, where one can decide in
linear time if a graph is 2-colorable (or equivalently, bipartite), deciding if a given hyper-
graph is 2-colorable is NP-complete even for 3-uniform hypergraphs [64]. In [63], it is
shown that unless NP ZPP, for every fixed `3, it is impossible to approximate in
polynomial time the chromatic number of `-uniform hypergraphs within a factor n1εfor
every constant ε>0. See also [55, 60] for further inapproximability results. The property
of hypergraph 2-colorability has been also extensively studied in combinatorics (see, e.g.,
[28, 29, 39, 73]), and for example, the study of this problem led to the discovery of the
celebrated Lov´
asz Local Lemma [39].
We want to design a property tester for the k-coloring problem in `-uniform hyper-
graphs. An `-uniform hypergraph can be represented by a function f:V`{0, 1}that
encodes its adjacency matrix. We use the standard distance measure (Definition 2.1.3) to
measure the distance between hypergraphs. In terms of hypergraphs we can express this
distance measure as follows:
73
Advertisement
4 Efficient Property Testers
Definition 4.7.1 An `-uniform hypergraph H= (V, E)is -far from having a proper k-
coloring, if we have to remove more than n`edges from Hto obtain a hypergraph that
has a proper k-coloring.
Now, we discuss how to apply our framework to hypergraph coloring. Similarly to
the graph coloring problem, we identify the ground set Cwith the set of vertices Vof
the input hypergraph H= (V, E). Since in our representation Hcan be viewed as given
by its adjacency matrix representation, we define the interpretation Ito map each set of
vertices to the submatrix induced by these vertices. That is, for every SV, we have
I(S) = S×S×···×S. Clearly, the interpretation is N`-bounded. Let hS, χibe a pair with
SVand χa proper k-coloring of vertices in S.
Definition 4.7.2 We say a vertex vis i-colorable with respect to hS, χiif for every eE
with ve, either (i) there exists a vertex u(Se)with χ(u)6=ior (ii) there exists a
vertex we\(S{v}).
We want to extend our approach for graph coloring to hypergraphs. Again we use a
recursive definition of bases that is based on heavy vertices and conflict vertices. In order
to define heavy vertices we introduce a potential function. A vertex is a heavy vertex if
its coloring increases the potential significantly (very much in the spirit of graph coloring;
further motivation behind the definition of the potential function can be found in the proof
of Lemma 4.7.7.)
Definition 4.7.3 Let H= (V, E)be a hypergraph. Let SVand let χbe a proper
k-coloring of vertices in S. The potential of hS, χiis defined as
ΦH(hS, χi) :=
k
X
i=1
`1
X
j=1
nj1·|ϕ(hS, χi, i, j)|
where
ϕ(hS, χi, i, j) = WV:|W|=`j&eE(We&ve\Wχ(v) = i).
Hence if Wϕ(hS, χi, i, j), then coloring all vertices in Xwith color icreates a
monochromatic edge. Our next step is to extend the notion of heavy and conflict vertices
to hypergraphs:
Definition 4.7.4 A vertex vV\Sis heavy with respect to hS, χiif (i) there is an i,
1ik, such that vis i-colorable and (ii) for every i,1ik, if vis i-colorable and
χ0is the extension of χto S{v}by coloring vwith color ithen
∆ΦH(hS, χi, v, i) := ΦH(hS{v}, χ0i) ΦH(hS, χi)>n`1
3.
Definition 4.7.5 A vertex vV\Sis a conflict vertex with respect to hS, χiif for every
i,1ik,vis not i-colorable.
The bases and the violation function are defined in the same way as for graph coloring:
74
4.7 Hypergraph Coloring
Bases for k-coloring:
{, 1}is a basis (where 1is the encoding of the coloring of the empty set of vertices)
and
if b= (K, χ)is a basis, vis a heavy vertex for band χ0is an encoding of the previous
coloring χof Kextended by a proper coloring of v, then (K{v}, χ0)is a basis.
Violation function for k-coloring: A basis b= (K, χ)is violated by a vertex vV
if either (i) vis a heavy vertex for hK, χior (ii) vis a conflict vertex for hK, χi.
In a similar way as for the graph coloring problem we can give an upper bound for the
dimension of the constructed ACP:
Lemma 4.7.6 For every problem instance of the hypergraph k-coloring problem the cor-
responding abstract combinatorial program Phas dim(P)3k`/ and width(P)
k3k`/.
Proof : Since every k-coloring of a set of rvertices can be encoded using an integer
in the range between 1 and kr, it is enough to show that |K|3 k `/ for every basis
b= (K, χ). To show this, let us recall that the bases are defined inductively by adding a
heavy vertex to another basis. By definition, a heavy vertex increases the potential of the
corresponding basis by more than 1
3 n`1. The maximum potential of every basis is less
than k ` n`1and the starting potential is 0. Thus, it follows for every basis b= (K, χ)that
|K|3 k `/.2
Our next step is to prove the Distance Preserving property.
Lemma 4.7.7 ((, /3)-Distance Preserving property) Let H= (V, E)be a hypergraph
that is -far from being k-colorable and let SVbe an arbitrary set of vertices colored
according to a proper k-coloring χ. Then, either Vcontains more than n/3 conflict
vertices with respect to hS, χior Vhas more than n/3 heavy vertices for hS, χi.
Proof : Our arguments are similar to those used in the proof of Lemma 4.6.4. The proof
is by contradiction. Let us assume there are less than or equal to n/3 heavy vertices
and less than or equal to n/3 conflict vertices with respect to hS, χi. Then, we show
that it is possible to extend coloring χof Sto a coloring χof Vthat has at most n`
monochromatic (violating) edges in H. This will yield contradiction.
We define χas follows:
χ(v) =
χ(v)for every vS
1if vV\Sand vis a heavy vertex or a conflict vertex with respect to
hS, χi
iif vV\Sis i-colorable with respect to hS, χiand iminimizes (over
all possible choices of proper coloring i) the increase in the
potential, that is, ∆ΦH(hS, χi, v, i)∆ΦH(hS, χi, v, j)for every
proper coloring jof v
75
Advertisement
4 Efficient Property Testers
Now, we give an upper bound on the number of monochromatic edges in coloring χ
of H. Let us first consider heavy and conflict vertices. By our assumption, the number of
such vertices is upper bounded by 2
3 n. Therefore, the number of edges incident to these
vertices is upper bounded by 2
3 n`. Hence, it is sufficient to show that there are at most
1
3 n`monochromatic edges in Hthat are not incident to heavy or conflict vertices. We
show this indirectly by defining a set EDEof at most 1
3 n`edges. Then we prove that
this set contains all monochromatic edges for the coloring χ. Let Vlight denote the set of
all vertices in V\Sthat are neither heavy nor conflict vertices for hS, χi.
For a vertex vVlight let us define
∆ϕ(hS, χi, v, i, j) := ϕ(hS{v}, χ0i, i, j)\ϕ(hS, χi, i, j)
where χ0is the extension of χto S{v}by coloring vertex vwith color i. We further
denote by E(X, v) = {eE:ve&Xe}all edges of the hypergraph that contain
X{v}. Now we make a simple but important observation:
Claim 4.7.8 If E(X, v) = then X /∆ϕ(hS, χi, v, χ(v), j).
Proof : Follows immediately from the definition of ∆ϕ(hS, χi, v, χ(v), j).2
We want to define the set
ED:= [
vVlight [
j[`1]
E(v)
D,j
using sets E(v)
D,j that determine for each vertex va set of edges that is responsible for the
sets in ∆ϕ(hS, χi, v, χ(v), j). As we will see later these edges are the only edges that may
possibly be monochromatic in χ. We define the set E(v)
D,j as follows:
E(v)
D,j := [
X∆ϕ(hS,χi,v,χ(v),j)
E(X, v)
Claim 4.7.9 Let eEbe a monochromatic edge in the coloring hV, χi. Then eED.
Proof : If eis monochromatic then there is i[k]such that χ(u) = ifor all u
e. Further we conclude that for each uewe have {u}ϕ(hV, χi, i, ` 1). Now
we distinguish between two cases. First let us consider the case when there is a ue
with {u}ϕ(hS, χi, i, ` 1). In this case χ(u)6=iby the definition of χ. In the
second case such a udoes not exists but we have {u}ϕ(hV, χi, i, ` 1). But - as
can be seen from Claim 4.7.8 - we defined EDin such a way that there cannot be an
Xϕ(hV, χi, i, ` 1)that is not contained in ϕ(hS, χi, i, ` 1). Hence, if eis not in ED
it cannot be monochromatic. 2
It remains to show that the size of EDis small enough. By definition we have
ED=X
vVlight
`1
X
j=1E(v)
D,j
76
4.7 Hypergraph Coloring
and it is easy to see that for the E(v)
D,j it holds
E(v)
D,j∆ϕ(hS, χi, v, χ(v), j)·nj1
because of the fact that |E(X, v)|n`−(|X|+1). We conclude that we have
EDX
vVlight
`1
X
j=1∆ϕ(hS, χi, v, χ(v), j)·nj1
Since all considered vertices are light we also have for every vVlight
∆ΦH(hS, χi, v, χ(v)) =
`1
X
j=1∆ϕ(hS, χi, v, χ(v), j)·nj11
3n`1.
This finally gives us ED1
3n`.
Together with Claim 4.7.9 we get a contradiction to the fact that His -far from being
k-colorable. This proves the lemma. 2
The proof of the following lemma is essentially the same as the proof of Lemma 4.6.5
and we present it here for the sake of completeness only.
Lemma 4.7.10 (Feasibility Preserving property) If for some SVevery basis covered
by Sis violated by certain vS, then the sub-hypergraph of Hinduced by vertices in S
cannot be properly k-colored.
Proof : The proof is by contradiction. Let us suppose there is a proper k-coloring χof the
subgraph of Hinduced by the vertices in S. For every US, let χ|Udenote the coloring
χrestricted to vertex set U.
Let us observe that the set of bases covered by Sis not empty, because it contains
the “empty set” basis (, χ). Therefore, there exists a basis (possibly one of many) b=
(U, χ|U)covered by Shaving the maximum size of set U. Since bis violated by certain v
S, either vis a conflict vertex for hU, χ|Uior vis a heavy vertex for hU, χ|Ui. Further, since
χwas assumed to be a proper k-coloring of S,vcannot be a conflict vertex for hU, χUi
and therefore it must be a heavy vertex for hU, χ|Ui. But this implies that (U{v}, χ|U{v})
is a basis that is moreover covered by S. This yields a contradiction, because we assumed
that there is no basis (K, χ|K)covered by Shaving the size of Kgreater than |U|.2
The three lemmas above combined with our framework from Theorem 10 imply the
following result.
Theorem 12 There is a property tester for the hypergraph k-colorability with the query
complexity
O(k`2ln(k/))`=e
O((k `/2)`).2
77
Advertisement
4 Efficient Property Testers
4.8 Testable Hereditary Graph Properties
In this section we consider arbitrary hereditary graph properties. A graph property Π
is a family of graphs that is preserved under graph isomorphism (that is, if Gsatisfies
property Πand G0is a graph isomorphic to Gthen G0has property Π, too). A graph
property Πis hereditary if it is closed under taking induced subgraphs, that is, if graph
G= (V, E)has Πthen every subgraph GSinduced by a set SVhas property Π(see,
e.g., [20]). Similarly as in Section 4.6, we consider undirected graphs that are represented
by a function f:V×V{0, 1}that encodes the adjacency matrix of the graph. W.l.o.g.
we assume V= [n]. When we talk about graph properties we have to observe that in
our framework a property is defined as a subset of the set of n-vertex graphs. When no
confusion can arise we use Πto denote the graph property Πas well as the corresponding
set of n-vertex graphs (which is the formal definition of a property in this thesis). Our
main result is that a hereditary graph property is efficiently testable if and only if it can be
reduced to an abstract combinatorial program of dimension that is independent of the size
of the input graph.
For every graph G= (V, E)and every subset UVwe denote by GUthe subgraph of
Ginduced by U.
We use the standard distance measure from Definition 2.1.3 for testing graph proper-
ties:
Definition 4.8.1 Let Πbe an arbitrary graph property. A graph Gis -far from (satisfy-
ing) Πif in order to transform Ginto a graph satisfying Πone has to modify more than
n2entries in the adjacency matrix of G.
Now, we can formally state the main result of this section.
Theorem 13 Let Πbe a hereditary graph property. Let 0<<1. Let Gbe the set of all
graphs on the vertex set V= [n]. Then, Πis efficiently testable if and only if there are
δ=δ(),ρ=ρ()and λ=λ(), such that every G G can be mapped to an abstract
combinatorial program PG= ([n],B, $)with dim(PG)δ()and width(PG)ρ()
satisfying the following two properties:
((, λ)-Distance Preserving) if Gis -far from Πthen every basis in PGis λ-far from
feasible, and
(Feasibility Preserving) for every SVwith |S|δ(): If there is G0Πwith
GS=G0
Sthen there is a self-feasible basis for Sin PG.
The remaining part of this section is devoted to the proof of Theorem 13. We begin with
the following lemma that simplifies the analysis of property testers for graph properties. By
this lemma, if a hereditary graph property Πhas a property tester with a query complexity
of q()then it has property tester that samples a set Sof r(q()) vertices and accepts, if
and only if the subgraph induced by Shas Π.
78
4.8 Testable Hereditary Graph Properties
Lemma 4.8.2 (Alon, In Appendix E of [54]) Let Πbe a hereditary graph property.
Suppose there is a property tester for property Πwith query complexity q(). Then, there
is a property tester for property Πthat selects uniformly at random a set of r(q()) vertices
and accepts the input graph if and only if the corresponding induced subgraph satisfies
property Π.2
In order to prove Theorem 13, we must prove that our condition is both necessary and
sufficient for efficient testability of any property Π. We first observe that our proof of The-
orem 10 with the interpretation Ias defined in Section 4.6 implies directly the sufficiency
of our conditions. Now, we prove the necessity of our condition.
Lemma 4.8.3 (Necessary Condition) Let Πbe a hereditary graph property. Let 0 < <
1. Suppose there is a property tester for property Πwith query complexity q()1
4nand
let us set r=r(q()). Let Gbe the set of all graphs on the vertex set V={1,...,n}.
Let λ=λ() = 1
3·2r. Then, for every G G there exists an abstract combinatorial
program PG= ([n],B, $)with dim(PG)2r =: δand width(PG) = 1satisfying the
(, λ)-Distance Preserving and the Feasibility Preserving properties.
Proof : Let us fix an arbitrary graph G= (V, E)for which we describe ACP PGsatisfying
the required properties.
Bases for graph properties: We define the bases of PGto be of the form (K, 1)where
KV. We first give a set of basis candidates BC and then show how to obtain the set of
bases from this set of candidates.
We define the set of basis candidates BC as follows:
(, 1)is a basis candidate,
if Kis a set of vertices of size rand GK(the subgraph induced by K) does not satisfies
Π, then (K, 1)is a basis candidate, and
if (K, 1)is a basis and K0K, then (K0, 1)is a basis candidate.
We also define the notion of violators and light bases:
Definition 4.8.4 Let BC be a set of basis candidates and let b= (K, 1) BC. The set of
violators ViolBC(b)of bwith respect to BC is defined as follows:
if |K|=rthen ViolBC(b) = V\K, and
if |K|< r then ViolBC(b) = {vV\K:there exists a basis candidate b0=
(K{v}, 1)}.
Definition 4.8.5 Let BC be a set of basis candidates and let b BC. We call blight (with
respect to BC) if |ViolBC(b)|3
2λ n. If bis not light then it is heavy.
79
Advertisement
4 Efficient Property Testers
Now, we define the set of bases Bas the output of the following algorithm COMPUTE-
BASES:
COMPUTEBASES(Set of candidates BC)
while there is a light basis b(with respect to the current BC)do
BC =BC \ {b}
return B:= BC {(, 1)}
Violation function for graph properties: A basis b Bis violated by its violators, that
is, by all vertices contained in ViolB(b).
Notice that our construction of bases and the definition of the violation function implies
that every basis b B is heavy, that is, it is violated by more than 3
2λ n vertices in V.
Now, the following claim follows trivially from our construction.
Claim 4.8.6 PGhas dimension (r, 1).2
(, λ)-Distance Preserving Property: We prove now that the ACP PGdefined above
satisfies the (, λ)-Distance Preserving Property, that is, that if Gis -far from Πthen
every basis in PGis λ-far from feasible, where λ=λ() = 1
3·2r. Notice that by the
definition of bases, every basis except for (, 1)is violated by more than λ n ground set
elements. Therefore, what remains to be proven is that also the basis (, 1)is violated by
more than λ n ground set elements. Our proof of this fact is via a sequence of claims that
top-down establish lower bounds for the number of bases of given size.
For a basis b= (K, 1), let |K|be called its size. We begin with the following simple
claim about bases of size r.
Claim 4.8.7 Suppose r < (12
3λ)n. If Gis -far from Π, then the number of bases of
size rin Bis bigger than 2
3·n
r.
Proof : The proof follows directly from the definition of the bases and from Lemma
4.8.2. Indeed, Lemma 4.8.2 implies that for a graph that is -far from Π, if one picks at
random a set of rvertices in V, then with probability at least 2
3the subgraph induced by
these vertices does not satisfy Π. This is equivalent to say that the number of subsets of V
of size rfor which the induced subgraph does not satisfy Πis bigger than 2
3·n
r. By the
definition of the basis candidates, for every set KVof size r,(K, 1) BC. Moreover,
the set V\Kis the set of violators of K. Hence (K, 1)is a basis that is violated by nr
violators, and thus (K, 1)is heavy. Thus, (K, 1)belongs to B. Therefore, the number of
bases of size rin Bis bigger than 2
3·n
r.2
The next claim deals with the relation between the number of bases of size kand of
size k1.
Claim 4.8.8 Suppose that n2 r. Let ζ,0ζ1, and let k,1kr, be an integer.
If there are more than ζ·n
kbases of size kin Bthen the number of bases in Bof size
k1is bigger than ζ3·λ
2·n
k1.
80
4.8 Testable Hereditary Graph Properties
Proof : Recall that Bcontains only heavy bases. Furthermore, if (K, 1)is a basis then our
construction of bases ensures that for every uK,(K\ {u}, 1)is a basis (in the following
we refer with basis also to basis candidates) and that this basis is violated by u. Thus,
every basis of size kdefines kviolators for bases for size k1. We conclude that overall,
there are more than
k·ζ·n
k= (nk+1)·ζ·n
k1>1
2·n·ζ·n
k1
violators for bases of size k1. Observe that every light basis has at most 3
2λ n violators.
Therefore, the number of violators of all light bases of size k1is at most 3
2λ n n
k1. It
follows that the number of violators of heavy bases of that size is bigger than
1
2·n·ζ·n
k13
2·λ·n·n
k1=1
2·(ζ3 λ)·n·n
k1.
Since each basis has at most nviolators it follows that the number of heavy bases is larger
than 1
2·(ζ3 λ)·n
k1,
which completes the proof of our claim. 2
The next claim gives a lower bound for the number of bases of given size (bigger than
0).
Claim 4.8.9 Let 1krbe an integer. Then the number of bases of size rkis bigger
than 1
3·2k13
2λ21
2k1·n
rk.
Proof : The proof is by induction on k. For k=1the claim follows directly from Claims
4.8.7 and 4.8.8. Now, let us assume the claim is true for k=k0, for certain 1k0< r,
and we show that this implies that the claim is true also for k=k0+1. By induction, this
will yield the claim.
By the induction hypothesis, the number of bases in Bof size rk0is bigger than
1
3·2k013
2λ(21
2k01)·n
rk0
=1
3·2k23
2λ21
2k2·n
rk+1.
Therefore, by Claim 4.8.8, the number of bases in Bof size rk= (rk0) 1is bigger
than
1
3·2k23
2λ21
2k23 λ
2·n
rk=1
3·2k13
2λ21
2k1·n
rk
81
Advertisement
4 Efficient Property Testers
and the claim follows. 2
Now, we are ready to complete the proof of the (, λ)-Distance Preserving Property for
PG. By our discussion above, we only must show that the basis (, 1)is violated by more
than λ n ground set elements. By Claim 4.8.9, the number of bases in Bof size 1is bigger
than 1
3·2r23·λ·n
1=4
3·2r3·λ·nλ·n ,
by our assumption that λ=1
3·2r. Each of the bases of size 1is of the form ({u}, 1)for some
uVand by the definition of violation, such a vertex uviolates (, 1). Since the number
of such vertices uis bigger than λ·n, this completes the proof of the (, λ)-Distance
Preserving Property for PG.
Feasibility Preserving Property: We prove now that the ACP PGsatisfies the Feasi-
bility Preserving Property. We prove that for every SV,|S|δ, if the subgraph GS
satisfies Πthen there is a self-feasible basis for Sin PG. This implies the Feasibility Pre-
serving Property because Πis hereditary and so: If GSdoes not have property Πthen G
cannot have Π, as well.
Our arguments are roughly the same as in the proof of Lemma 4.6.5. The proof is
by contradiction. Let us suppose that the subgraph GSsatisfies Π, but every basis in S
is violated. Observe that the set of bases covered by Sis not empty, because it contains
the “empty set” basis (, 1). Therefore, there exists a basis b= (K, 1)covered by S
maximizing the size of set K. Since bis violated by certain vSand since Kis of
maximum size, we conclude that |K|=r. It follows that the subgraph GKinduced by
Kdoes not have property Π. Since Πis hereditary, it is closed under taking induced
subgraphs, and hence we conclude that GSdoes not have property Π.
If n < 2r then we use the following simple ACP: The ACP has a single basis (V, 1)
which is violated by all ground set elements, if Gis -far from Πand which is not violated
otherwise. This completes the proof of Lemma 4.8.3. 2
Now, we can conclude our discussion to complete the proof of Theorem 13.
Proof : The “only if part follows from Lemma 4.8.3 and the “if part follows from
Theorem 10 with the interpretation Ias defined in Section 4.6. 2
82
5 Testing Algorithms with
Geometric Queries
In Chapter 3 we considered Property Testing in the standard model’ for geometric objects
such as point sets. This means that the property testing algorithm is only allowed to ask
basic queries about the given object. In the case of point sets, our algorithm was allowed
to ask queries of the form ’What is the position of the i-th point of the point set ?’.
In this section we want to introduce a new type of query for point sets: Range queries.
Range queries are of the form: ’What is the i-th point in a query range R?’ where R
is a range in the Rd(typically, an axis parallel rectangle or a simplex). Such queries are
efficiently supported by basic spatial data structures, e.g. range trees and partition trees
[1].Later in this section we show that the property of being in convex position (in the R2)
can be tested with O(logn/)triangular range queries. This stands in contrast to the lower
boundof (d+1
pnd/)onthe query complexityin thestandard model fromChapter 3. We
also consider two other examples, labeling and clustering, and show that we can achieve
much better results in the new model than it would be possible in the standard model.
5.1 Property Testing with Range Queries
Many fundamental data structures for point sets are designed to efficiently answer range
queries (cf. [1]), i.e. queries that ask to return all points (or just their number) located in
a given query range. Usually, the shape of a query range is restricted to a certain class of
ranges, e.g. axis parallel rectangles or simplices. This way it is possible to answer queries
much more efficient than it would be possible for arbitrary query ranges.
In this section we introduce models of computation whose basic operations are range
queries. All models are defined for point sets in the Rd. Let Pdenote the point set under
consideration.
The Simplex Range Query Model.In the Simplex Range Query Model a Property
Testing algorithm is allowed to ask the following types of queries:
’What is the number of points of Pcontained in a (possibly degenerated) simplex S
?’.
83
Advertisement
5 Testing Algorithms with Geometric Queries
’What is the position of the i-th point of Pcontained in a (possibly degenerated)
simplex S?’ The ordering of the points of Pcontained in simplex Sis fixed but
unknown to the property testing algorithm. If iis bigger than the number of points
contained in Sthen the symbol is returned to denote that such a point does not
exist.
’What is the position of the i-th point of P?’ The ordering of the points of Pis fixed
but unknown. If iis bigger than |P|then the symbol is returned. Such a query
could be realized using the second type of query with a sufficiently large simplex.
For convenience in notation we introduce this as a separate type of query.
Of course, the simplex Sspecified by the property testing algorithm may differ from
query to query. This model is motivated by the large number of efficient data structures
that support triangular range queries such as partition trees and cutting trees. The most
efficient data structure for simplex range counting queries requires O(nd+)space (for
any >0) and supports queries in time O(logdn)[1].
The Rectangle Range Query Model.In the Rectangle Range Query Model a prop-
erty testing algorithm is allowed to ask the following types of queries:
What is the number of points of Pcontained in a (possibly unbounded) axis parallel
rectangle R?’
’What is the position of the i-th point of Pcontained in a (possibly unbounded) axis
parallel rectangle R?’ The ordering of the points of Pcontained in rectangle Ris
fixed but unknown to the Property Testing algorithm. If iis bigger than the number
of points contained in Rthen the symbol is returned to denote that such a point
does not exist.
’What is the position of the i-th point of P?’ The ordering of the points of Pis fixed
but unknown. If iis bigger than |P|then the symbol is returned. Similar to the
Simplex Range Query Model such a query could be realized using the second type
of query with a sufficiently large rectangle.
Orthogonal range queries are supported by many fundamental data structures such
as range trees, R-tree, and quad trees. The most efficient data structure for orthogonal
range counting queries in the RAM model supports queries in O(logd1n)time and uses
O(nlogd2n)space [1].
The Mixed Range Query Model. In the Mixed Range Query Model the property
testing algorithm may ask every query that is allowed in the Simplex Range Query Model
as well as every query that is allowed in the Rectangle Range Query Model.
Some further definitions. In all three models the query complexity of a property
testing algorithm is the number of queries asked about the point set. The size n=|P|of
the point set is known to the testing algorithm.
84
5.2 Testing Convex Position with Range Queries
5.2 Testing Convex Position with Range Queries
In Chapter 3 we designed a property tester for convex position in the standard property
testing model. Its query complexity was O(d+1
pnd/). We also proved a matching
lower bound on the query complexity of every testing algorithm in the standard testing
model. Obviously, the simplex range query model is more powerful than the standard test-
ing model. But how much does the possibility to use simplex range queries help us when
we want to design property testing algorithms ?
We first design and analyze a property tester for convex position in the simplex range
query model. We only consider the 2-dimensional case when the point set Pis in the
R2. Again we assume that the input point set is in general position. We prove that in
the simplex range query model there is a property tester for convex position with query
complexity O(logn/). Hence it is possible to achieve an exponential improvement in
the number of queries using this new model.
Our algorithm is based on a randomized test for extremality of points. Hence, we first
have to prove that many points of a point set are not extreme, if the set is -far from convex
position. This is done in the following Proposition:
Proposition 5.2.1 Let Pbe a point set that is -far from convex position. Then more than
|P|points in Pare not extreme.
Proof : Let Pbe -far from convex position. Assume less than |P|points in Pare not
extreme. Then we can delete these points from Pto obtain a point set in convex position.
Hence the point set Pcannot be -far from convex position. This is a contradiction. 2
Our next step is to show that the simple algorithm ISFACET decides if two points are
the vertices of the same facet (segment) of the convex hull.
p
q
Figure 5.1: If two points pand qbelong to the same facet of the convex hull then one of
the halfspaces induced by a line through pand qcontains exactly 2 points.
'
&
$
%
ISFACET(p, q)
Let H+and Hdenote the two halfspaces induced by the line through pand q
if H+contains exactly 2 points from Pthen accept
if Hcontains exactly 2 points from Pthen accept
reject
85
Advertisement
5 Testing Algorithms with Geometric Queries
p
r
q
Figure 5.2: A circular ordering for all points in a halfspace induced by a line through p
and r.
We show the following lemma:
Lemma 5.2.2 Let p, q Pbe two points from a point set PR2. Then algorithm
IsFacet(p, q)decides whether pand qare the vertices of the same facet of the convex hull
of P. The algorithm uses 2simplex range queries.
Proof : First of all, let us observe that two points p, q Pare the vertices of the
same facet of the convex hull of P, if and only if one of the closed halfspaces induced
by a line through pand qcontains exactly the points pand q. A halfspace range query
is a degenerated triangular range query and so we are allowed to use such a query in the
simplex range query model. Hence the algorithm is correct and obviously uses at most 2
queries. 2
So far, we can only find out whether two points belong to the same facet of the convex
hull of P. This would imply that both of them are extreme points. We use this possibility
to design an algorithm that finds out whether a single point pis extreme. If pis extreme
(and if |P|> 1) then there is a point qPsuch that pand qare vertices of the same facet
of the convex hull of P. Thus we have to design a procedure that finds such a point q, if p
is extreme.
The idea of our procedure is simple: Given an extreme point pwe pick another point
runiformly at random from P. If pr is a facet of the convex hull of Pthen we have a
witness that pis extreme. If not we select one of the halfspaces induced by a line through
pand r. Let us call the selected halfspace H(see Figure 5.2). Then we observe that the
point pinduces a circular ordering among the points in halfspace H. The ’last’ point in
this ordering is the point qwe are looking for. We can find this point efficiently using a
randomized binary search-like procedure on the points within the halfspace.
86
5.2 Testing Convex Position with Range Queries
'
&
$
%
EXTREMECHECK (P, p)
Choose a point rPuniformly at random
if ISFACET(p, r)then return pis extreme
Let Hdenote a closed halfspace induced by the line through pand r
Let τ1denote the (degenerated) triangle H
i=1
while τicontains more than 2points and i100 logndo
Choose a point siτiuniformly at random
if si6=pand si6=rthen
Let H(i)
sdenote the halfspace induced by a line through pand si
that does not contain r
Let τi+1denote the intersection of Hand H(i)
s
i=i+1
if τicontains more than 2 points then return don’t know
else if ISFACET(p, si)then return pis extreme
else return pis not extreme
We first show that the algorithm is correct, if it returns an answer other than ”don’t
know”.
Lemma 5.2.3 Let pbe an extreme point in P. If algorithm EXTREMECHECK is started
with input pthen the algorithm returns either pis extreme” or ”don’t know”.
Proof : We first observe that pis incident to two facets of the convex hull of Psince p
is an extreme point. The segment pr is not a facet of the convex hull of P, since otherwise
the algorithm had accepted the input in the first if-statement by Lemma 5.2.2. Now let
q1and q2denote the points of pthat share a facet with pin the convex hull of P. Then
either one of the points must be in halfspace H. If τi+1contains more than 2 points then
we know that sicannot share a facet with pbecause both closed halfspaces induced by a
line through pand sicontain at least 3 points. One of the halfspaces contains p,si, and
rand the other one contains all points in τi+1. Hence sican only share a facet with p, if
τi+1contains exactly 2points. If the algorithm does not answer ”don’t know” it computes
a triangle that contains exactly 2points. As we already observed there is a point in Hthat
shares a facet with pit must be the computed point. 2
Lemma 5.2.4 Let pbe a point in Pthat is not extreme. If algorithm EXTREMECHECK is
started with input pthen the algorithm returns either pis not extreme” or ”don’t know”.
Proof : By Lemma 5.2.2 we know that the algorithm only return pis extreme” if pis a
vertex of a facet of the convex hull of P. The lemma follows since pcannot be a vertex of
a facet of the convex hull, if it is not extreme. 2
Then we show that the algorithm typically returns an answer other than ”don’t know”.
87
Advertisement
5 Testing Algorithms with Geometric Queries
Lemma 5.2.5 Let Pbe a point set of npoints in the R2. Let Xp
idenote the random variable
for the number of points in triangle τiwhen running algorithm EXTREMECHECK with
input pP. Let Zidenote the indicator random variable for the event that Xp
i+12
3Xp
i.
Then we have
Pr[Zi=1]1
4
Proof : Let Qidenote the set of points that are contained in triangle τi. We observe
that Pr[si=q] = 1
|Qi|for each point qQi. For the sake of analysis we enumerate all
points in Qibut paccording to their circular ordering around p. Then it is immediate that
Xp
i+12
3Xp
iholds, if sibelongs to the last b2
3|Qi|c1points in this particular ordering. It
follows for |Qi|> 2 that
Pr[Zi=1]b2/3|Qi|c1
|Qi|1
4.
2
Lemma 5.2.6 Algorithm EXTREMECHECK returns the answer ”don’t know” with prob-
ability less than 1/3.
Proof : We first observe that
Pr[EXTREMECHECK returns ”don’t know”]PrX
1i100 logn
Zilog3/2 n
Now let Yibe a 0-1 random variable with PrYi=1=1/4. Then by Lemma 5.2.5 we
observe that
PrX
1i100 logn
Zilog3/2 nPrX
1i100 logn
Yilog3/2 n
=PrX
1i100 logn
Yi4logn
log(3/2)E[Yi]
PrX
1i100 logn
Yi8lognE[Yi]
PrX
1i100 logn
Yi(19
10)100 logn·E[Yi]
We apply a Chernoff bound [56] to the last inequality and obtain:
PrX
1i100 logn
Yi(19
10)100 logn·E[Yi]e−(9/10)225 logn/2 e10 logn1/3
2
Now we can come up with the testing algorithm:
88
5.3 Map Labeling
'
&
$
%
CONVEXTESTER-C(P, )
for i=1to 61do
choose a point pfrom Puniformly at random
if EXTREMECHECK(P, p)returns pis not extreme” then reject
accept
Theorem 14 Algorithm CONVEXTESTER-Cis a property tester for convex position in the
plane in the simplex range query model. Its query complexity is O(logn/).
Proof : We observe that the algorithm only rejects a point set, if algorithm EXTREME-
CHECK returns that the point is not an extreme point. By Lemma 5.2.3 algorithm EX-
TREMECHECK always returns pis extreme” or ”don’t know” if pis extreme. Hence
every point set in convex position is accepted because every point is an extreme point.
It remains to prove that a point set Pis rejected, if Pis -far from convex position.
When a point pis chosen uniformly at random from Pthen by Proposition 5.2.1 the
probability that pis not extreme is at least . By Lemma 5.2.6 we have that the prob-
ability that EXTREMECHECK answers ”don’t know” is at most 1/3. We conclude that the
probability that a point chosen uniformly at random by EXTREMECHECK is not extreme
and EXTREMECHECK rejects it, is at least 2/3. Hence the probability that algorithm
CONVEXTESTER-Cdoes not reject a point set that is -far from convex position is at least
PrCONVEXTESTER-Caccepts(12/3)61e41/3
and hence, PrCONVEXTESTER-Crejects2/3
2
5.3 Map Labeling
We continue the investigations in the power of our new model. We now consider a basic
map labeling problem. Map labeling in general deals with problems where to place labels,
e.g. names of towns, rivers, mountains, and oceans, on a geographic map. The constraints
for such a placement of labels are fairly obvious: Labels should be close to the labeled
feature and labels should not intersect. A basic map labeling problem can be formulated
in the following way [47]:
Let Pbe a set of npoints in the R2. Decide whether it is possible to place naxis-
parallel unit squares such that
all squares are pairwise disjoint (labels do not overlap),
each point is a corner of exactly one square (each point is labeled), and
each square has exactly one point on its corners (each point has a unique label).
89
Advertisement
5 Testing Algorithms with Geometric Queries
If a set Sof nsquares satisfies the conditions above, then Sis called a valid labeling
for P. The map labeling problem is known to be NP-complete and the corresponding
optimization problem (with the goal to maximize the label size) is known to have no ap-
proximation algorithm with ratio better than 2, unless P=NP [47].
Our goal is to design a property testing algorithm for the above labeling problem. We
use the standard distance measure from Definition 2.1.3. In terms of map labeling this can
be formulated as follows:
Definition 5.3.1 A set Pof npoints in the plane is -far from having a valid labeling, if
we have to delete more than n points to obtain a set of points that has a valid labeling.
Before we consider the map labeling problem in the orthogonal range query model, we
prove a strong lower bound on the query complexity in the standard testing model. This
bound shows that we cannot hope for a property testing algorithm that has o(n1δ)query
complexity for every constant δ>0.
Theorem 15 For every constant δ,0<δ<1, there is a positive constant such that
there is no property tester for the labeling problem with o(n1δ)query complexity in the
standard testing model.
Proof : For a given δlet us define c=d1δ
eand let =1
2(2c+1). We observe that by the
chosen value of cit holds that δ1
2c+1. We construct a point set that is -far from having
a valid labeling. The set consists of n/(2c +1)copies of a set Qcof 2c +1points. Qchas
the property that it does not have a valid labeling but if we delete one point from it, then it
always has a valid labeling. A property tester with query complexity o(n1δ)that samples
a subset of points uniformly at random cannot reject such a point set with probability 2/3.
Therefore, we first show that the existence of a property tester for a point set Pimplies
the existence of a property tester with similar query complexity that selects a sample set S
from Puniformly at random and accepts if and only if Shas a valid labeling:
Claim 5.3.2 Let Abe a property tester for the map labeling problem with query complexity
q(n, ). Then there exists a property tester A00 for the map labeling problem that sample a
set Sof q(, n)points uniformly at random and accepts if and only if the points in Shave
a valid labeling. Such a property tester is called canonical.
Proof : By Lemma 2.3.1 we know that there is a property tester A0that samples a set Sof
q(, n)points uniformly at random and decides based on Sand and its internal coin flips.
Let A00 be an algorithm that samples a set Sof q(, n)points uniformly at random and that
accepts if and only if Shas a valid labeling. We want to prove that A00 is a property tester.
Clearly, algorithm A00 accepts every point set having a valid labeling. Thus it remains to
prove that it rejects every point set that is -far from having a valid labeling. We show this
indirectly by proving the following statement: If algorithm A0rejects a sample set Sthen
so does algorithm A00. By definition a property tester has one-sided error and so A0must
accept if the sample set has a valid labeling. But if the sample Sdoes not have a valid
90
5.3 Map Labeling
labeling we know that A00 rejects the input. Hence, A00 always rejects, if A0rejects. Since
A0is a property tester it rejects every point set Pthat is -far from having a valid labeling
with probability at least 2/3. Thus algorithm A00 is a property tester. This proves Claim
5.3.2. 2
Now we show that there exists a point set that is -far from having a valid labeling
and that is not rejected by a canonical property tester with query complexity o(n1δ)with
probability more than 1/3. Our construction is based on a point set Qcof size 2c +1with
the following two properties:
Qcdoes not have a valid labeling,
for every pQcthe set Qc\ {p}has a valid labeling.
Label:
Does not have a valid labeling
Figure 5.3: The lower bound construction. If one point is missing then the set has a valid
labeling (right figure).
Claim 5.3.3 For every c2there exists a set Qcof 2c +1points such that the set Qc
does not have a valid labeling and such that Qc\{p}has a valid labeling for every pQc.
Proof : For a given c2we construct Qc. W.l.o.g. we assume that two squares intersect
if and only if their interiors intersect. We place points lcand rcat position (1, 0)and (c, 1),
respectively, a set Ucof cpoints at the position (1, 1),...,(c, 1), and a set Lcof c1points
at positions (1.5, 0.5),...,(c0.5, 0.5). We claim that the set Qc=UcLc{lc, rc}
does not have a valid labeling. This is because we have only |Lc|1label positions for the
points in Lcthat do not necessarily intersect with points from Uc{lc, rc}. Hence there
cannot be a valid labeling for Qc. On the other hand, for every qQcthe set Qc\ {q}has
a valid labeling as one easily verifies by a case distinction (see also the figure above):
In the first case we either have q=lcor q=rc(these cases are symmetric). Wlog.
let q=lc. Then we choose the upper left square for all points in Uc, the lower left
square for all points in Lcand the lower right square for rc.
Now let qbe in Lcand let qhave the position (i, 0.5). We choose the upper left
square for all points in Uc, the lower right square for all points r= (j, 0.5)Lc
with j<iand the the lower left square for the remaining points in Lc. For the points
lcand rcwe choose the lower left and lower right square, respectively.
91
Advertisement
5 Testing Algorithms with Geometric Queries
It remains to consider the case qUc. Let us assume that qhas position (i, 1).
Then we pick the upper left square for all points r= (j, 1)Ucwith j<iand the
upper right square for the remaining points in Uc. For all points r= (j, 0.5)Lc
with j<i1we choose the lower right square. For (i0.5, 0.5)Lcwe choose
the upper right square, and for the remaining points from Lcwe choose the lower
left square. For the points lcand rcwe choose the lower left and lower right square,
respectively.
In all cases we easily verify that the labels do not intersect. Hence Claim 5.3.3 follows.
2
Now we can construct a set Pof npoints that consists of k=bn
2c+1ctranslated copies
W1, . . . , Wkof the set Qc(such that their possible labelings do not intersect). In order to
obtain a set with a valid labeling from Pwe must delete at least one point from each Wi,
1ik. Thus we have that Pis -far from having a valid labeling. Further a canonical
property tester can reject Ponly if the sample set Scontains at least one of the sets Wi. If
we now apply Lemma 2.4.2 with l=2c +1and p=1/2 then we obtain that
Prj[k] : (WjS)1/2
if the sample size is less than (n2c)·1
2n 1
2c+1=(n1δ). Hence there is no canonical
property tester with query complexity o(n1δ)and so there cannot be any property tester
with that query complexity. 2
We show now that if we use the computational model that allows range queries we can
design a tester with O(1/3)query complexity. It is based on the approach developed in
[36] and [58].
'
&
$
%
LABELTEST(P):
choose a sample set Sof size 4/ uniformly at random from P
for each pSdo
i=0,T=
Let Rbe the axis parallel square with center pand side length 20d1/e
while i(20d1/e+2)2do
Let qbe the i-th point in the query range R
if q6=then T=T {q}
i=i+1
if Tdoes not have a valid labeling then reject
accept
Theorem 16 Algorithm LABELTEST is a tester for the labeling problem that has query
complexity O(1/3)and running time exp(O(1/2)).
Proof : First of all, let us observe that algorithm LABELTEST accepts every input point
set that has a valid labeling. Thus we have to show that every instance Pthat is -far
92
5.4 Clustering Problems
from having a valid labeling is rejected with probability at least 2/3. Let us assume that
Pis -far from having a valid labeling. Now we partition the plane into grid cells of side
length 10d1/esuch that at most n/2 points of Pare within a distance of 1or less to the
boundary of the grid cell they are contained in. Such a partition can be constructed in the
following way: We start by partitioning the plane into vertical stripes Ai={(x, y)R2:
2.5i x < 2.5(i+1)}. For j[k], with k=4d1
e, let Rj=Si:imodk=jAi. Then we
have that there is m[k]with |RmP|n/4. If we put the vertical lines of the grid
cells in the middle of the stripes in Rmthen at most n/4 points are within a distance of 1
to these vertical lines. A similar argument applied to horizontal lines gives us the desired
grid partition.
For the analysis we delete all points that are within a distance of 1or less to the bound-
ary of their grid cell from the point set P. This way we obtain the point set P0. Let
Xconfl P0be a minimal-size set of points such that P0\Xconfl has a valid labeling. Since
Pis -far from having a valid labeling we know that Xconfl > n/2. Now we consider a
single grid cell C. By the construction of P0we know that the labeling of the points in C
cannot intersect with labels from points outside of C.
Claim 5.3.4 If SXconfl 6=then algorithm LABELTEST rejects.
Proof : Let xbe a point from Xconfl that is contained in Sand let Cdenote the grid cell
containing x. The while-loop either adds all points in Rto Tor it finds out that more than
(20d1/e+2)2points are contained in R. In the case that Tcontains all points in Rit must
also contain all points from C. Thus the points in Tdo not have a valid labeling. In the
case when we have more than (20d1/e+2)2points in Twe would have to fit more than
(20d1/e+2)2unit square labels on an area of size (20d1/e+2)2. Thus there cannot be
a labeling in this case, either. Hence the algorithm rejects, if SXconfl 6=.2
By Claim 5.3.4 the theorem follows with
PrSXconfl =(1/2)4/ 2/3
and so PrSXconfl 6=2/3 .
2
5.4 Clustering Problems
In this section we revisit the radius and diameter clustering problem as defined in Chapter
4. Recall that the goal of a clustering problem is to decide whether a set of npoints Pin
Rdcan be partitioned into ksubsets (called clusters)S1, . . . , Sksuch that the cost of each
cluster is at most b. There are several different ways to define the cost of a cluster. Let S
be a set of points in Rd. In the radius clustering problem the cost costR(S)of a cluster S
is twice the minimum radius of a ball containing all points of the cluster. In the diameter
93
Advertisement
5 Testing Algorithms with Geometric Queries
clustering problem the cost costD(S)of a cluster Sis the maximum distance between a
pair of points of the cluster.
The goal of our property tester is to accept all instances that admit a clustering into k
subsets of cost band to reject with high probability those instances that cannot be clustered
into ksubsets of cost (1+)b. Note that this distance measure is different from the one
used in Chapter 4.
Definition 5.4.1 A point set Pis (b, k)-clusterable for a cost measure cost(), if there is a
partition of Pinto sets S1,...Sksuch that cost(Si)bfor all 1ik. A point set P
is -far from being (b, k)-clusterable, if for every partition of Pinto sets S1,...Skat least
one set Sihas cost larger than (1+)b.
Let us assume, without loss of generality, that b=1and thus we want to design a
tester for the problem whether a point set Pis (1, k)-clusterable for the two cost measures
above. We partition Rdinto grid cells of side length /(3d). For each cell containing an
input point, we choose an arbitrary input point from the cell as its representative. Then, we
compute whether the set of representatives is (1, k)-clusterable. If it is so, then we accept
it, if it is not so, then we reject it. Clearly, every set of points that is (1, k)-clusterable
is accepted by the algorithm. On the other hand, every instance that is -far from (1, k)-
clusterable will be rejected. (This approach has been introduced in [2] to obtain a (1+)-
approximation algorithm for the radius clustering problem.)
Our algorithms starts with an empty box with vertices at infinity. Then we query for
a point in this box. This point is located in a grid cell C. We will use the point as repre-
sentative for this cell. The bounding hyperplanes of Cinduce a partition of the space in
3dboxes (one of them being the grid cell). We mark all empty boxes and the grid cell.
Then we continue this process with an unmarked box. If there are only marked boxes the
algorithm terminates.
So far, our partition into grid cells works fine, only if there are many points in a single
cell. On the other hand, if no two points are in the same grid cell, the algorithm has (n)
query complexity. Thus we need an upper bound on the number of representatives that are
(1, k)-clusterable. Similar to Chapter 4 we use a volume argument to show that we can
stop the process when we find a set of representatives of size k·(6d/)d.
Lemma 5.4.2 Let Sbe a set of points in Rdno two points of which belonging to the same
cell of a grid of size /(3d)< 1. If cost(S)1for any of the two cost measures
described above, then |S|(6d/)d, where cost(·){costR, costD}.
Proof : We first observe that if cost(S)1then Sis contained in a hypercube with
side length 1. We show that a no set Sof size more than (6d/)dthat satisfies the
precondition of Lemma 5.4.2 is contained in a hypercube with side length 1. A hypercube
with side length 1 can contain at most (1
/(3d)+2)d(6d/)dgrid cells. This proves
Lemma 5.4.2. 2
Let V=k·(6d/)dthe maximum number of cells that can contain points that
belong to one of the kclusters. We observe that we can stop our procedure if the number
94
5.4 Clustering Problems
of representatives is more than V. Thus, we can guarantee that the algorithm requires at
most V·3drange queries.
Theorem 17 There is an property tester for the radius clustering and diameter clustering
problem that uses at most k·(18 d/)dorthogonal range queries.
Proof : We first describe the algorithm:
'
&
$
%
CLUSTERTEST(P, k, b, ):
B= infinite box
S=
Initialize empty queue Q
insert Binto Q
while Qis not empty do
B=Q.head()
if Bcontains a point pthen
find the grid cell Cthat contains p
S=S{p}
split Binto 3dsub-boxes using the bounding hyperplanes of C
insert all sub-boxes (except for C) into Q
if there are more than Vcells with representatives then
reject
if Sis (k, b)-clusterable then accept
else reject
We charge to each point pSthe 3d1range queries that have to be done to check
the 3d1new boxes (other than the grid cell containing p) that are created when pis
selected to be the representative for a grid cell. Thus the overall number of range queries
is at most V·3d=k·(18 d/)d.
Now we have to prove that our algorithm is a property tester. Clearly, if Pis (k, b)-
clusterable then so is every subset of Pand so the algorithm accepts. If the algorithm
rejects because there are too many cells with representatives then by Lemma 5.4.2 the
input points set is not (k, b)-clusterable. Thus it remains to show that the algorithm rejects
every point set that is -far from (k, b)-clusterable.
Let us assume that Pis -far from (k, b)-clusterable. Then for every partition into sets
C1, . . . , Ckthere is a set Ciwith cost(Ci)>(1+)b. By definition of the diameter
clustering problem this means that there are two points p, q Cisuch that dist(x, y)>
(1+)b. Since the diagonal of a grid cell is at most b/3 the distance between the
representatives of the cells of pand qis at least (1+/3)b. Thus the set Scannot be
clustered and the algorithm rejects.
Similar arguments show that every -far instance of the radius clustering problem is
rejected. 2
95
Advertisement
5 Testing Algorithms with Geometric Queries
96
6 Property Testing and Moving Data
In this chapter we apply the concept of property testing in the context of moving data. We
develop a theoretical model for the analysis of approximate combinatorial structures under
a very general type of motion. Then we illustrate our model on some examples. Among
these examples are range trees and the Euclidean minimum spanning tree of objects mov-
ing in the R2. We consider the following general scenario: We are given a set Oof n
objects and these objects are moving in the Rd. We assume that the objects are moved by
a third party. We have no information about the future movement of the objects. The only
thing we are allowed to do is to update the current position of an object at certain points of
time and at unit cost per update. Such an update is also called an update query. Our goal
is to approximately maintain a combinatorial structure defined uniquely by the positions
of these objects. At certain points of time there are queries to the combinatorial structure
we maintain. Before we answer such a query we may update some (possibly all) object
positions and perform some additional computations. After these updates we require that
our structure (which we call hypothesis) is close to correct with probability at least 2/3.
Then the query is answered. The combinatorial structure we maintain together with the
update scheme we use is called a soft kinetic data structure. The quality of a soft kinetic
data structure is measured by a form of competitive analysis. For every possible (non de-
generate) continuous motion of the objects we compare the number of updates made by
our algorithm against the dynamics of the motion. The dynamics is measured in terms
of combinatorial changes that occur if the combinatorial structure is correctly maintained
for the complete motion. The worst case ratio (over all continuous motions) between the
number of updates and the dynamics is called the dynamic efficiency of a soft kinetic data
structure.
Our strategy to design a soft kinetic data structure is the following: We first design a
property tester for the combinatorial structure we want to maintain. If this property tester
rejects a hypothesis it should provide a witness that this hypothesis is not correct. Then we
try to find a strategy that fixes such a witnessed error by applying some local changes in
the structure.
Each time before we have to answer a query we run the property tester. If the property
tester rejects, it provides a witness. The witnessed error is then corrected and we invoke
the property tester again. In order to show that this process terminates we have to prove
that every error correction brings the combinatorial structure closer to the correct structure.
In order to analyze the dynamic efficiency of such a soft kinetic data structure we have to
relate the number of corrections we do to the number of combinatorial changes induced by
the underlying motion.
97
Advertisement
6 Property Testing and Moving Data
Our model is partly inspired by the framework of kinetic data structures by Basch et
al. [12]. In their framework Basch et al. make different assumptions on the motion of the
objects than we do. In particular, they assume that there is a flight plan that describes the
near future motion of every object. This flight plan is known to the algorithm and changes
in the flight plan are passed to the algorithm. The information provided by the flight plan
results in a completely different model of computation. The analysis of a kinetic data
structure is in terms of combinatorial changes of the maintained structured and inspired
our way to analyze the dynamics.
6.1 Soft Kinetic Data Structures
In this section we describe the framework of soft kinetic data structures.
Moving Objects. We first introduce some terminology used to describe objects and
their motion. We are given a set O={O1, . . . , On}of nobjects (e.g., points, balls, cubes)
that move in the Rd. The motion of the objects is described by a motion plan. A motion
plan contains the position pi(t)of each object Oi O at every point of time t0. The
entity of all positions of objects P(t) = {p1(t), . . . , pn(t)}is called the configuration of
Oat time t. With this definition a motion plan is a function that assigns a configuration to
each point of time tR,t0. We assume that the motion plan is fixed but unknown to
what we later define as a soft kinetic data structure. The way we can access the positions
of our objects is similar to property testing: If we fix a point of time tthen the current
configuration of the objects can be seen as a point set. We may then update the position of
every point by specifying its index. This is similar to our representation of point sets used
in the context of property testing. We may update the position of an object certain points
of time.
We say that a motion plan is continuous if for each t, δ R, t 0, there exists an ε0
such that |pi(t) pi(t+ε)|δholds for each i, 1 in, and every εε0.
Hypotheses. In our model each of the nobjects is represented by a unique object
identifier which is a number from [n]. At every point of time tthe soft kinetic data struc-
ture may update the current position pi(t)of object Oiusing the object identifier i. We
introduce object identifier to syntactically distinguish between objects and their current
position. That is, we may, for example, compute a combinatorial structure defined over
the objects without knowing their current positions. In our framework we are interested in
combinatorial structures that are uniquely defined by the objects’ positions. When the set
of objects is a point set then some examples for such structures could be sorted sequences
in 1D, Voronoi diagrams, or the Euclidean minimum spanning tree. We use the fact that
we can build such a structure over a given set of objects even if we do not know the current
object configuration. In such a case we make a guess about the position of each object
and then we compute the structure. These guesses are usually based on the past config-
urations of the point set and on additional information obtained by update queries. The
computed structure may or may not be consistent with the current object configuration.
98
6.1 Soft Kinetic Data Structures
Therefore, we call such a structure a hypothesis. Given a distance measure σa hypothesis
His called -close to correct if the distance σ(H, H)to the correct hypothesis is at most
. Otherwise, we say that His -far from correct.
Now we want to illustrate our definitions on the example of a point set in 1D and the
combinatorial structure sorted sequence’. When we consider point sets then at every fixed
point of time tthe current configuration P(t)is a point set represented essentially in the
same way as in the context of property testing in the previous chapters. In particular, we
have that the object identifiers are the numbers from [n]and the current position of each
point can be accessed by an update query for the value of pi(t). Since we consider a point
set in one dimension we have a total ordering defined by the positions of the points. This
total ordering corresponds to a permutation of [n]which in turn can be viewed as a sorted
sequence. Now assume we do not know the positions of the points at a fixed point of time
t. Then we can still make some guesses (e.g. based on the past configurations we know)
about the positions of the points and compute a permutation πof [n]. Such a permutation
is a hypothesis for the sorted sequence defined by the configuration P(t).
The Objective of Soft Kinetic Data Structures. The objective of a soft kinetic
data structure is to maintain a hypothesis for a predetermined combinatorial structure un-
der an unknown motion plan. At certain points of time the soft kinetic data structure has
to answer a query about the hypothesis it maintains. Such a query is usually the standard
query supported by the combinatorial structure (typically, some kind of access or search
query). In our example of the sorted sequence a query may ask for the i-th item in the
sorted sequence. Each time before a query is processed, an algorithm called the reorga-
nizer is invoked. This algorithm may perform update queries about the current position
of some objects (we assume that the time does not proceed as long as the reorganizer is
working) and perform changes in the hypothesis. Technically, a reorganizer consists of a
property tester for the combinatorial structure and a procedure that corrects all errors in the
hypothesis that are detected by the property tester. After the reorganizer finished its work
the hypothesis should be -close to correct with probability at least 2/3. Then the query
algorithm is invoked and answers the query.
We assume that the soft kinetic data structure has to process a sequence of mchrono-
logical queries hQi= (Q1, . . . , Qm). Query Qitakes place at time tiand we associate
with each query the point of time when it takes place.
Soft Kinetic Data Structures. Now we can say that a soft kinetic data structure con-
sists of
ahypothesis for a predetermined combinatorial structure of a set of npoints,
areorganizer that is invoked each time before a query to the hypothesis has to be
processed,
aquery algorithm that processes the queries to a hypothesis.
99
Advertisement
6 Property Testing and Moving Data
6.1.1 Analysis of Soft Kinetic Data Structures
In this section we explain how to evaluate the quality of a soft kinetic data structure. Our
concept is to apply competitive analysis. That is, for every sequence of queries and every
continuous motion plan Mwe compare the number of update queries made by our soft
kinetic data structure against the number of combinatorial changes that occur in a correct
hypothesis when the points move according to M.
Let Mdenote an arbitrary motion plan describing a continuous motion and let CS
denote a combinatorial structure uniquely defined by a configuration of O. The dynamics
dynCS(M)of Mw.r.t. CS is defined as the number of combinatorial changes in CS when
we move the objects according to Mfrom time 0to . The term combinatorial change
depends on the problem at hand. For example, when the objects are points moving in the
Rand the combinatorial structure CS is a sorted sequence then a combinatorial change
takes places at each point of time twhen the ordering in the sorted sequence changes.
When the objects are points and the structure is a Euclidean minimum spanning tree then
a combinatorial change takes place when the graph structure changes. We always assume
that the motion plan is non degenerate, that is, at a fixed point of time there can be at most
one combinatorial change in the structure.
Now let us considered a given soft kinetic data structure DS. For a given sequence
hQiof mqueries Q1, . . . , Qmtaking place at time t1, . . . , tiand a motion plan Mwe
denote the update cost cost(hQi,M)of DS to be the number of update queries done
by the reorganizer when the objects move according to M. The dynamic efficiency of a
soft kinetic data structure is roughly the worst case ratio between the update cost and the
dynamics.
Definition 6.1.1 The dynamic efficiency deff of a soft kinetic data structure is defined as
follows:
deff := sup
hQi,M
costhQi,M
dynCSM+m
where the supremum is taking over all non-degenerate motion plans Mdescribing a con-
tinuous motion and all possible sequences of mqueries hQi.
A second quality measure for soft kinetic data structures is the update time. The up-
date time is the worst case ratio between overall running time of the reorganizer and the
number of update queries. We denote by time(hQi,M)the overall running time used by
the reorganizer for sequence hQiunder a motion plan M.
Definition 6.1.2 The update time of a soft kinetic data structure is defined as
tu:= sup
hQi,M
timehQi,M
costhQi,M+m
where the supremum is taking over all non-degenerate motion plans Mdescribing a con-
tinuous motion and all possible sequences of mqueries hQi.
100
6.2 Basic Soft Kinetic Data Structures
6.1.2 Discussion of The Model
There are a few points in our model that require a further discussion. First of all, let us
make some notes on the assumption that the points do not move while the reorganizer
works. We made this assumption because we have no other restrictions on the motion. As
soon as we allow the points to move during this process in a theoretical model there is no
real chance to make reasonable statements about the quality of the structure. In an im-
plementation the reorganizer is a continuous process typically running in the background.
When a query arrives we stop the reorganizer and answer the query. Typically, the changes
that occur during the time when the query is processed are neglectable.
A second point that requires discussion is the distance measure needed to evaluate the
quality of a hypothesis. Unlike in the case of property testing we want to use our soft
kinetic data structure to answer queries. Therefore, a distance measure should reflect the
’correctness of a hypothesis w.r.t. the query algorithm’ rather than the pure combinatorial
distance to a correct combinatorial structure. For example, if we consider soft kinetic
search trees then a query could be an access operation to an item stored in the tree. In such
a case a single error at the root of the tree can make almost every access operation fail.
Although the combinatorial distance to a correct structure would be rather small in such
a case, a hypothesis for a balanced search tree having such an error should be -far from
correct. In general, we want to have a distance measure that guarantees that, typically, an
access operation works correctly.
6.2 Basic Soft Kinetic Data Structures
In this section we describe how to apply our framework to some combinatorial structures
like sorted sequences, balanced search trees, range trees and Euclidean minimum spanning
trees.
6.2.1 Sorted Sequences
The first example for a soft kinetic data structure we consider is that of a sorted sequence.
W.l.o.g., we assume that our set of objects Ois a set of npoints moving in R. We assume
that the total order induced by the positions of the objects at a fixed point of time is always
unique. That is, we never have two objects with the same position. We can achieve this
by using an arbitrary tie-breaker (e.g., the object number). The object identifiers of O
are stored in an array A[1...n]. When we consider continuous motion the topological
structure of a sorted sequence changes when the positions v1, v2of two points change from
v1< v2to v1> v2. We say that a permutation A(t) = (A[1], . . . , A[n]) of [n]stored in
Aat time tis a hypothesis that is -close to correct, if the sequence pA[1](t), . . . , pA[n](t)
has edit distance (see Definition 6.2.1 for an equivalent definition) at most n to a sorted
sequence. We use the following definition:
Definition 6.2.1 Let pA[1](t), . . . , pA[n](t)denote a sequence stored in an array A[1...n]
at time t. We say that pA[1](t), . . . , pA[n](t)is -close to sorted, if the longest increasing
101
Advertisement
6 Property Testing and Moving Data
subsequence in pA[1](t), . . . , pA[n](t)has length at least (1)n. Otherwise, we say that
pA[1](t), . . . , pA[n](t)is -far from sorted.
Now we are given a sequence of maccess queries hQi= (Q1, . . . , Qm)to our array
and we assume that query Qitakes place at time ti. Before the query is processed the
reorganizer is invoked. This algorithm checks whether the current hypothesis is -close to
correct. In order to do so, it invokes a property tester SORTEDTEST for sorted sequences
from [40]. This property tester checks, if a sequence of number pA[1](t), . . . , pA[n](t)is a
sorted sequence or -far from sorted (where the distance is measured according to the edit
distance).
If this algorithm rejects it returns two indices k, l with k<land pA[k](t)> pA[l](t).
We then use a binary search like procedure to find an index k0such that pA[k0](t)>
pA[k0+1](t)and swap the object identifier stored in A[k0]and A[k0+1]. Then we make
two recursive calls to the reorganizer:
'
&
$
%
ARRAYREORGANIZER(A, t, )
if SORTEDTESTpA[1](t), . . . , pA[n](t), rejects then
Let (k, l)be the pair returned by SORTEDTEST pA[1](t), . . . , pA[n](t),
k0=FINDINVERSION(A, k, l)
swap(A[k0], A[k0+1])
ARRAYREORGANIZER(A, t, )
ARRAYREORGANIZER(A, t, )
From the pair of indices (k, l)returned by the property tester SORTEDTEST from [40]
algorithm FINDINVERSION can compute a pair (k0, k0+1)with pA[k0](t)> pA[k0+1](t)
by a binary search like procedure in the following way: As long k6=l1we compute
m=dk+l
2eand proceed either with (k, m)or with (m, l)depending on the value of
pA[m](t).
We now want to prove that algorithm ARRAYREORGANIZER computes a hypothesis
that is -close to correct.
Lemma 6.2.2 Let A(t) = A[1], . . . , A[n]be -far from a correct hypothesis for a
sorted sequence. Then algorithm ARRAYREORGANIZER computes with probability at
least 2/3 a hypothesis that is -close to correct.
Proof : We start with a simple claim about the edit distance:
Claim 6.2.3 Let k0denote an index such that pA[k0](t)> pA[k0+1](t). If we swap A[k0]
and A[k0+1]then the edit distance of pA[1](t), . . . , pA[n](t)to a sorted sequence does not
increase.
Proof : pA[1](t), . . . , pA[n](t)has edit distance k, if and only if the longest increasing
subsequence of pA[1](t), . . . , pA[n](t)has length nk. The length of the longest increasing
subsequence cannot decrease when we swap A[k0]and A[k0+1]and hence the edit distance
cannot increase. 2
102
6.2 Basic Soft Kinetic Data Structures
Now we want to analyze the probability that ARRAYREORGANIZER stops when the
hypothesis is -far from correct. We assume that the probability that SORTEDTEST accepts
a sequence that is -far from sorted is less than 1/10. If the probability is larger then we can
use standard amplification arguments to get the desired probability. Now we observe that
for the case that ARRAYREORGANIZER terminates when the input is -far from correct
there is a witness in form of a binary tree. Each node of this tree corresponds to a call
of ARRAYREORGANIZER that rejects the current hypothesis and it has a left (right) child
if the first (second) recursive call rejects the hypothesis. For example, if the first call to
ARRAYREORGANIZER accepts the hypothesis, then the corresponding tree is the empty
tree and it appears with probability 1
10 . It is well known that the number of binary trees
with knodes is less than 4k.
Now let us fix a binary tree Twith knodes and analyze the probability that the be-
havior of the algorithm is according to the witness T. We observe that the probability that
ARRAYREORGANIZER behaves according to Tis at most (1
10 )k+1because each binary tree
with knodes has k+1empty leafs and each empty leaf corresponds to a call to the reor-
ganizer that accepted a sequence that is -far from sorted. The overall probability that the
process stops before the hypothesis is -close to correct is bounded from the above by
X
0i
4i(1
10)i+11
10 X
0i
(2
5)i1
3
This proves Lemma 6.2.2. 2
Let Mdenote an arbitrary non-degenerate continuous motion plan. In the following we
want to make some observations w.r.t. the combinatorial changes induced by the motion
plan. First let us observe that any combinatorial change in the sorted sequence can be
written as a reversal σhi, i +1ifor some i[n1](unless we have a degenerate case
wheremore than 2 points areat thesame position). Such a reversal is called atransposition.
Now let hQi= (Q1, . . . , Qm)denote a sequence of mqueries and let us denote by tithe
time when query Qitakes place and t0=0. Now we derive a lower bound on the number
of combinatorial changes that can be expressed in terms of the configurations of the point
set at times ti,1im. For this purpose let us introduce some notation. For a number
πiin a permutation π= (π1, . . . , πn)of [n]let us define the rank(πi)as the number of
elements πjwith πj< πiand j>i. Then we define the weight w(π)of a permutation π
as Pi[n]rank(πi). It is well known that we can write every permutation πas a sequence
of w(π)transpositions and that every sequence of transpositions equal to πhas length at
least w(π).
We further observe that at every point of time tthere is a unique permutation π=
(π1, . . . , πn)such that pπ1(t)<··· < pπn(t). Thus we can write πi= (πi
1, . . . , πi
n)to
denote the permutation that satisfies pπi
1(ti)<··· < pπi
n(ti), that is, πidenotes the sorted
sequence at the time when the i-th query takes place. Then we can define permutations
σ1,...σmby the equation:
πi·σi=πi+1.
103
Advertisement
6 Property Testing and Moving Data
Now we can give a lower bound on the dynamics dyn(M)of the motion plan Mw.r.t. the
combinatorial structure sorted sequence:
dyn(M)X
1im
w(σi).
We use this lower bound to show that the number of errors corrected by the reorganizer is
at most the dynamics of the motion plan M.
Lemma 6.2.4 Let hQi= (Q1, . . . , Qm)be an arbitrary sequence of mqueries and let M
be an arbitrary non-degenerate continuous motion plan. If query Qitakes place at time ti
and the reorganizer corrects err(i)errors at time ti, then we have:
X
1im
err(i)dyn(M)
where dyn(M)denotes the dynamics of the motion plan Mw.r.t. the combinatorial struc-
ture sorted sequence.
Proof : For a given point of time tilet A(ti) = (A[1], . . . , A[n]) denote the permutation
defined by the elements of array Aat time ti. Further let us define the permutation σC(ti)
in the following way:
A(ti)·σC(ti) = πi.
Hence the permutation σC(ti)denotes the changes that must be performed to transform
the array at time tiinto a sorted sequence. The weight w(σC(ti)) denotes the minimum
number of transpositions necessary to transform the array into a sorted sequence. We prove
by induction on the number of queries that for kmthe following inequality holds:
X
1ik
err(i) + w(σC(tk)) X
1ik
w(σi).
Since every permutation can be expressed as a sequence of transpositions we know that
we can apply the triangle inequality to the weight of permutations. This way, we observe
that the number of errors corrected in the (k+1)-st call of the reorganizer is at most
err(k+1)w(σC(tk)·σk+1) w(σC(tk+1)) w(σC(tk))+ w(σk+1) w(σC(tk+1)) .
And hence err(k+1) w(σC(tk)) + w(σC(tk+1)) w(σk+1).
This proves the induction hypothesis for every kmand we get
X
1im
err(i)X
1im
err(i) + w(σC(tm)) X
1im
w(σi)dyn(M)
which completes the proof of Lemma 6.2.4. 2
Now we are ready to prove:
104
6.2 Basic Soft Kinetic Data Structures
Theorem 18 There is a soft kinetic sorted array with dynamic efficiency O(logn
)and up-
date time O(1). It answers queries about the i-th object in the soft kinetic sorted array in
time O(1).
Proof : By Lemma 6.2.2 we know that our reorganizer satisfies the requirements of a soft
kinetic data structure, that is, with probability at least 2/3 the sorted sequence maintained
in the array is -close to correct (after the reorganizer has been invoked). Thus we only
have to analyze the dynamic efficiency and update time of the data structure. The property
tester SORTEDTEST makes Θ(logn/)(update) queries and has the same running time.
The procedure FINDINVERSION makes O(logn)update queries and its running time is
proportional to the number of performed update queries. For a sequence of queries hQi=
(Q1, . . . , Qm)the number of calls to the reorganizer is m+twice the number of errors
corrected by the reorganizer’. Thus, together with Lemma 6.2.4 we get that the dynamic
efficiency of the data structure is O(logn/). The update time and the query time of O(1)
are immediate because for each update we do only a constant amount of work and the
query algorithm just has to return one entry in the array. 2
6.2.2 Balanced Search Trees
The next structure we want to look at is a binary search tree. A (balanced) binary search
tree is a rooted tree. At each node vof the tree a key KEY(v)is stored. For every node vin
the tree and every node uin the left (right, respectively) subtree of vit holds that KEY(v)
KEY(u)(KEY(v)<KEY(u)). In the case of soft kinetic data structures each node stores
an object identifier. The key of a node is the position of the corresponding object (w.l.o.g.,
we assume that the objects are points). Again we assume that every configuration of points
induces a unique ordering among the objects (that is, the case ’=’ does not occur). We want
to consider standard access (search) queries in binary trees. At first glance the problem
seems to be similar to the case of soft kinetic arrays. The combinatorial structure we use
for the analysis is (essentially) a sorted sequence. Thus we have a combinatorial change in
the search tree, if there is a combinatorial change in the sorted sequence. The difference
lies in the distance function used for soft kinetic balanced search trees. In the case of
binary trees the edit distance to a sorted sequence is not a good distance measure w.r.t. the
functionality of the trees. If only a single object is not at its correct position in the sorted
sequence and this object is stored at the root of the tree then it might be the case that a huge
fraction of access operations fails. Therefore, we consider a different distance measure:
Definition 6.2.5 A search tree is -far from correct, if more than n access operations
fail.
With this new distance measure we also need a new property tester for the invariant
of the search tree. But this is simple for balanced search trees with the distance measure
from above. We store all nodes of the search tree in an array such that we can pick a
node uniformly at random in constant time. Then for s2/ the following algorithm
is a property tester for the invariant of a search tree Tw.r.t. the distance measure from
Definition 6.2.5:
105
Advertisement
6 Property Testing and Moving Data
'
&
$
%
TREETEST(T, s, )
for i=1to s
pick a node vof the search tree uniformly at random
if node vcannot be accessed reject
accept
The test whether a node can be accessed can be implemented by walking from node v
to the root and checking the invariant at each node of the search path. We show that for
s2/ algorithm TREETEST is a property tester.
Lemma 6.2.6 For s2/ algorithm TREETEST is a property tester for correctness of
balanced binary search trees. Its query complexity and running time is O(logn/).
Proof : Clearly, the algorithm accepts every correct search tree. Thus we assume that the
tree is -far from correct. Then the probability that we find a node that cannot be accessed
is more than in each pass of the for-loop. Hence the overall probability that the algorithm
reject is
PrAlgorithm TREETEST rejects1 (1)2/ 1e22/3 .
The query complexity and running time follow immediately from the fact that the search
tree is balanced. 2
If an object in the search tree cannot be accessed there must be an error on the search
path (the path from the object’s node to the root). We conclude that algorithm TREETEST
rejects, only if it finds two objects whose current position is inconsistent with the tree
order. Thus we are basically in the same situation as in the case of the sorted arrays. We
therefore maintain an array A[1...n]where the items are stored in the ordering induced
by the search tree (this can be the same array that is used for the sampling). Then we
can proceed similarly to the sorted arrays. We use a binary search to find two adjacent
array entries whose ordering is inconsistent with the corresponding object positions. Then
we swap these entries in the array and we swap the entries in the corresponding nodes in
the tree (these nodes must not be adjacent in the tree). Unfortunately, the change of the
distance function makes the proof for the sorted arrays invalid for the search trees. This is
because of the fact that reducing errors in the tree order by swapping two objects (in the
way just explained) does not necessarily decrease the distance to a correct data structure
in terms of the distance measure from Definition 6.2.5. In fact, in situations it can even
increase this distance. We must apply a different technique. We therefore change the
sample size of algorithm TREETEST in such a way that it accepts an input that is -far
from correct with probability at most 1
3n2:
Lemma 6.2.7 Let s2ln(3n)/. Then the probability that algorithm TREETEST ac-
cepts an input that is -far from correct is at most 1
3n2.
Proof : For the case s2ln(3n)/ we get:
PrAlgorithm TREETEST accepts(1)2ln(3n)/ 1
3n2.
106
6.2 Basic Soft Kinetic Data Structures
2
Then we use the following reorganizer:
'
&
$
%
TREEREORGANIZER(T, t, )
if TREETEST(T, 2 ln(3n)/, )rejects then
Let (k, l)be the pair returned by TREETEST(T, 2 ln(3n)/, )
k0=FINDINVERSION(A, k, l)
swap(A[k0], A[k0+1])
swap the corresponding tree nodes
TREEREORGANIZER(T, t, )
Now we prove that the probability that the reorganizer terminates when the tree is -far
from correct is at most 1/3.
Lemma 6.2.8 The probability that TREEREORGANIZER stops when the hypothesis Tis
-far from correct is at most 1/3.
Proof : Let Adenote the array in which the object identifiers are stored with respect
to the tree order. Let A(t) = (A[1], . . . , A[n]) denote the permutation of [n]induced
by array Aat time t. Further let π= (π1, . . . , πn)denote the permutation such that
pπ1(t)<··· < pπn(t). Let
A(t)·σ=π .
Then we have that w(σ)n23by the definition of the weight of a permutation. We
further observe that each call of the reorganizer decreases the weight of σby 1. In the case
that Tis -far from correct we know that the reorganizer has to reject at most n2times. Let
Xdenote the indicator random variable for the event that algorithm REORGANIZER does
not stop when Tis -far from correct. Then we have:
PrX=1(11
3n2)n231
3
e2/3
since
PrAlgorithm TREETEST accepts1
3n2.
2
Lemma 6.2.4 holds also for trees because it does only assume that errors are corrected
by swapping two adjacent objects in the maintained sorted sequence (which is equivalent
to the tree order and is maintained in array A). Therefore we can conclude:
Theorem 19 There isa soft kinetic balanced searchtree with dynamicefficiencyO(log2n/)
and update time O(1). It answers access queries in time O(logn).
107
Advertisement
6 Property Testing and Moving Data
Proof : By Lemma 6.2.8 we know that our reorganizer satisfies the requirements of a
soft kinetic data structure. The number of update queries made by the property tester is
again proportional to its running time of O(log2n/). For a sequence of queries hQi=
(Q1, . . . , Qm)the number of calls to the reorganizer is m+the number of errors corrected
by the reorganizer’. Together with Lemma 6.2.4 we get that the dynamic efficiency of the
data structure is O(log2n/). The update time of O(1)is immediate. The query algorithm
needs time O(logn)for an access query because the tree is balanced. 2
6.2.3 Range Trees
We now want to focus on structures from computational geometry. We first consider range
trees in one and two dimensions. Then we design soft kinetic Euclidean minimum span-
ning trees.
1D Range Trees. 1D range trees are balanced binary search trees (see [1] for a formal
definition of range trees). The only difference to a standard search tree lies in the type of
query. A query to a range tree specifies a query range [x, y]and the answer should return
all points within the interval [x, y]. The standard way to answer such queries is to search
for xand yin the range tree and then to report all points on the search paths that are within
the query range as well as all points between the two search paths. Obviously, we can
apply the analysis of soft kinetic search trees to range trees. So the only question is what
time is needed to answer a query. A query to a soft kinetic range tree works in a similar
way as a standard query to a range tree. But in soft kinetic range trees it may be the case
that a point whose position is not within the query interval is stored between the two search
paths (because there is an error in the structure). In such a case we do not report this point.
For each point between the two search paths that is not contained in the query interval
we can find an error in the data structure. This way we can amortize the time needed to
process those points against the dynamics of the system.
We can use soft kinetic data structures for binary search trees to obtain the following
result (see [1] for a formal definition of range trees):
Theorem 20 There is a soft kinetic 1D-range tree with dynamic efficiency O(log2n/)
and update time O(1). The soft kinetic range trees supports 1D-range queries in O(logn+
k)time, where kis the number of reported points (a point is reported, only if it is within
the query range). If kdenotes the number of points in the query range (at the time the
query is processed) then it holds with probability at least 2/3 that kk n.
Proof : We have to show that kk n holds with probability 2/3 and we have
to prove the running time for the range queries. The other results follow from the binary
search trees. First of all, we can assume that the data structure is -close to correct with
probability 2/3 when the query is processed. This follows from Lemma 6.2.8 because the
reorganizer is called before a query is answered.
A range query for a one dimensional range tree is answered in the following standard
way: we search for the right and the left end of the query interval and output all nodes
108
6.2 Basic Soft Kinetic Data Structures
that are between the search paths. Before we report a point we check if it is contained
inside the query interval. If it is, we output it. If it is not we have found a conflict with
some node on the search paths. After all nodes inside the query interval have been reported
we partition the set of points Perr whose position is not in the query interval but that are
located between the two search paths into two sets. The first set P<x contains all points
of Perr whose position is smaller than x. The second set P>y contains all points of Perr
whose position is larger than y. We pick the larger set of these two and correct an error for
each point in this set. W.l.o.g. let us assume that P<x is the larger set. In the following we
identify the point with the node where it is stored. We observe that there are two different
orders among the points. The standard <’-relation and the tree order T. We observe that
each point pin P<x is either stored in a right subtree of a node of the left search path or
in a left subtree of a node of the right search path. These are the two cases we consider.
Let us start with the case when pis stored in a right subtree of a point qon the left search
path. Then we observe that qmust be larger than x(because the search path took the left
edge). Therefore, we have p < q but qTp. This is a conflict. In the second case, the
point pis stored in a left subtree of a point on the right search path. Then we know that the
point qstored at the split node (the node where the two search paths split) lies within the
search interval [x, y]. Since pis stored in a left subtree of the right path it must be larger
than q. But this is not the case. Hence there is a conflict because qTp. For each point
we use the standard procedure to find two points that are adjacent in the tree order and we
swap them. Since we process the points in increasing order none of the conflicts of the
remaining points is resolved (though the ’conflict partner of a point may be moved in the
tree order).
The number of inversions corrected in this process is at least half the number of points
in the search interval that are not reported. Therefore we can amortize the update queries
and the time needed to deal with these points against the dynamics of the motion.
Now assume that less than k n points are reported. Then there are more than n
nodes that cannot be accessed, because every access operation to one of the missing nodes
will end either on the search paths or between them. But the nodes that can be accessed
this way are the nodes reported by the query algorithm. Thus we have that the tree is -far
which occurs with probability at most 1/3.2
2D-Range Trees. In this section we describe and analyze 2-dimensional soft kinetic
range trees (see, e.g., [1] for a formal definition). We consider a standard implementation
of 2D-range trees. A 2D-range tree for a set of npoints in R2is a two level data structure.
The first level consists of a binary search tree (a 1D-range tree) sorted according to the
x-coordinate. The second level consists of ntrees (one for each node in the first level tree)
sorted according to the y-coordinates of the points. (see, e.g., [1] for more details).
Similar to the binary search trees we want to have a data structure that supports range
queries in a reasonable way. That is, we want to make sure that, if a structure is -close to
correct, it cannot happen that almost every range query fails.
We say that a point is accessible in a 2D range tree, if it can be accessed in the first
level tree using its x-coordinate as key and it can be accessed in all second level trees on
109
Advertisement
6 Property Testing and Moving Data
the search path using its y-coordinate.
Definition 6.2.9 A 2D range tree is -far from correct, if there are more than n points
that are not accessible.
Similar to the binary search trees we can design a simple property tester. We sample a set
of O(1/)nodes and check whether they are accessible by following the parents pointers
in the trees. We also use two arrays as auxiliary data structures. Array AXis used to define
the order of objects in the first level of the range tree and array AYdefines the order in
the second level. We maintain the invariant that at any time the order of the objects in
both levels is according to the order of the corresponding array. A combinatorial change
in a range tree takes places when the there is a change in the sorted sequence w.r.t. the
x-coordinate or in the sorted sequence w.r.t. the y-coordinate.
When the property tester rejects, it provides a proof that one of these arrays is not in
the correct order and we can find an adjacent pair that is in the wrong order (using the
same procedure as in the case of arrays and trees). Then we swap these two objects in the
corresponding array and we adjust the range tree to the changes we made. The adjustment
in the second level is done by deleting and reinserting the objects. This can be done in
O(log2n)time.
If we swap two objects in the first level tree, we delete all occurrences in the second
level trees. For this purpose we maintain pointers to all occurrences of an object in the
secondleveltrees. Then we swapthetwonodes inthe first level tree andreinsert theobjects
into the second level trees (using the ordering maintained in array AYfor comparisons).
This procedure can be done in O(log2n)time.
Now we can proceed in a similar way as for the search trees. We first show that the
following algorithm is a property tester for correctness of range trees that accepts a range
tree with probability at most 1
6n2if it is -far from correct.
'
&
$
%
RANGETREETEST(T, s, )
for i=1to s
pick a node vof the search tree uniformly at random
if node vis not accessible then reject
accept
Lemma 6.2.10 Let s2ln(6n)/. Then algorithm RANGETREETEST is a property
tester. If an input tree Tis -far from correct then it is accepted with probability at most
1
6n2.
Proof : Since algorithm RANGETREETEST accepts every correct tree we only have to
analyse the probability of rejectance. Let Tbe a range tree that is -far from correct. Then
we have
Pralgorithm RANGETREETEST accepts(1)2ln(6n)/ 1
6n2
2
110
6.2 Basic Soft Kinetic Data Structures
We already explained the way the errors are corrected by the reorganizer. The remain-
ing part of the reorganizer is similar to the one for search trees.
RANGETREEREORGANIZER(T, t, )
if RANGETREETEST(T, 2 ln(3n)/, )rejects then
correct the detected error
RANGETREEREORGANIZER(T, t, )
Lemma 6.2.11 The probability that RANGETREEREORGANIZER stops when the hypoth-
esis Tis -far from correct is at most 1/3.
Proof : The distance between the hypothesis and a correct range tree is at most two times
the distance in the one dimensional case (because we have two sorted sequences instead
of one). In the case that Tis -far from correct we know that the reorganizer has to reject
at most 2(n23)times. Let Xdenote the indicator random variable for the event that
algorithm REORGANIZER does not stop when Tis -far from correct. Then we have:
PrX=1(11
6n2)2(n23)1
3
e2/3
since
PrAlgorithm RANGETREETEST accepts1
6n2
2
Checking whether a point is accessible takes O(log2n)time. We conclude:
Theorem 21 There is a soft kinetic 2D-range trees with dynamic efficiency O(log3n/)
and update time O(1). The soft kinetic range tree supports 2D-range queries in O(log2n+
k)time where kis the number of reported points in the given query range. If kdenotes
the number of points in the given query range then it holds with probability 2/3 than
kkn.
Proof : The analysis of the running time of the queries is similar to the 1D case. And
again, if there is a query that returns less than k n points, then the structure must be
-far which is a contradiction. 2
6.2.4 Euclidean Minimum Spanning Trees
The last structure we consider is a Euclidean minimum spanning tree of a point set in the
R2. In order to design a soft kinetic Euclidean minimum spanning tree we use the property
tester and the distance measure from Section 3.3. Recall that this property tester only
rejects, if it finds a cycle in the EMST-completion of the input graph. The longest edge
of such a cycle cannot be in the EMST of the underlying point set. We use the following
reorganizer:
111
Advertisement
6 Property Testing and Moving Data
'
&
$
%
EMSTREORGANIZER(T, t, )
if EMSTTESTER (T, )rejects then
Let edenote the edge that is not in the EMST
mark eas deleted
increase counter for deleted edges by 1
if n/200 edges have been deleted then
recompute the EMST from scratch (query all point positions)
set counter to 0
EMSTREORGANIZER(T, t, )
Here EMSTTESTER(T, )denotes the property tester for EMSTs in degree bounded
graphs from Section 3.3 with the exception that we do not test the connectivity of the
tree. In the algorithm EMSTTESTER marked edges will be treated as missing edges. Since
the number of marked edges is at most n/200 our graph is always /200-close to con-
nected. By the analysis from Section 3.3 it follows that algorithm EMSTTESTER rejects
the hypothesis with probability at least 2/3, if it is -far from EMST.
Theorem 22 There is a soft kinetic Euclidean minimum spanning tree with dynamic effi-
ciency O(pn/ ·log(n/)) and update time O(logn/). Queries about incident edges
to a vertex vcan be answered in time O(1).
Proof : We first observe that the number of calls to the reorganizer is at most m+
’number of detected errors’. If we found n/200 errors then we correct all of them at
once by recomputing the EMST from the scratch. This requires nupdate queries and
O(nlogn)time. We observe that for each of the n/200 errors there must be a change
in the combinatorial structure of the Euclidean minimum spanning tree. Therefore, we
can amortize the number of update queries and the running time over the n/200 errors.
We obtain that the dynamic efficiency is dominated by the property tester and therefore
O(pn/ ·log(n/)). The update time is dominated by the time needed to recompute the
EMST and therefore O(logn/).2
112
7 Conclusions
In this thesis we studied geometric problems in the context of property testing and we
applied concepts from geometry to property testing. In Chapter 3 we designed property
testing algorithms for specific problems including disjointness of geometric objects, con-
vex position, and the Euclidean minimum spanning tree. Our algorithms have sublinear
query complexity and for some of them we give matching lower bounds. We also believe
that every property tester for the Euclidean minimum spanning tree in 2D has a query
complexity of (n)though we do not have a proof of that.
We think that property testing is an interesting concept, i.e., if applied to geometric
problems. We hope that further research in this area will provide new insights into the
structure of geometric properties. Besides the development of new property testing algo-
rithms it could be also an interesting problem to use existing property testers in the de-
sign of sublinear algorithms for geometric problems. For example, in the recent sublinear
approximation algorithm for the weight of a minimum spanning tree in degree bounded
graphs by Chazelle, Rubinfeld, and Trevisan [25] a property tester for connectivity in
graphs has been used as a subroutine.
In Chapter 4 we designed a framework for property testing algorithms with one-sided
error. The achievement of our framework is a nice decoupling of the probabilistic and
the combinatorial aspects of property testing. As a consequence of this, a proof using our
framework is usually elegant and gives a clear view of the important combinatorial features
of the problem. We applied our framework to analyze a couple of different property testers
and we showed that a large class of graph properties can be tested efficiently, if and only
if there is a proof using our framework. There are several open problems regarding our
framework and we want to mention two of them here:
The first problem is to extend the framework to algorithms with 2-sided error. The
question is here, if it is possible to formulate a framework for a 2-sided error model and if
such a model is useful for the design and analysis of property testing algorithms. Another
interesting problem is to prove in our framework that triangle-free graphs can be tested
efficiently. Although it is known that triangle-free graphs can be tested efficiently, the only
known proof uses Szemer´
edi’s Regularity Lemma. A proof using other techniques might
provide new insights into property testing and the structure of dense graphs.
Chapter 5 and 6 were devoted to the design of new models that involve property test-
ing. In Chapter 5 we introduced the possibility of range queries to property testing and
showed that the query complexity for some problems can be dramatically improved, if
range queries are allowed. Very recently, in a related model a sublinear time approxima-
tion algorithm for the weight of the Euclidean minimum spanning tree of a point set in
113
Advertisement
7 Conclusions
the Rdhas been developed [27]. Is it possible to find other sublinear time approximation
algorithms in such a model ?
In Chapter 6 we designed a model (called soft kinetic data structures) to analyze the
maintainance of combinatorial structures under a very general model of motion. We ana-
lyzed some basic soft kinetic structures. Besides the obvious problem to design other soft
kinetic data structures it would be very interesting to further develop models for moving
data. Keeping the results from Chapter 5 in mind an interesting direction of research is to
find out which queries should be supported by mobile objects in order to design efficient
algorithms for mobile data.
114
Bibliography
[1] P. Agarwal. Range searching. Handbook of Discrete and Computational Geometry,
1997.
[2] P. Agarwal and C. Procopiuc. Exact and approximation algorithms for clustering.
In Proceedings of the 9th Annual ACM-SIAM Symposium on Discrete Algorithms
(SODA), pages 658–667, 1998.
[3] N. Alon. Testing subgraphs in large graphs. In Proccedings of the 42nd Annual IEEE
Symposium on Foundations of Computer Science (FOCS), pages 434 441, 2001.
[4] N. Alon, S. Dar, M. Parnas, and D. Ron. Testing of clustering. In Proceedings of the
41st Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages
240–250, 2000.
[5] N. Alon, E. Fischer, M. Krivelevich, and M. Szegedy. Efficient testing of large
graphs. Combinatorica, 20(4):451 476, 2000.
[6] N. Alon and M. Krivelevich. Testing k-colorability. SIAM Journal on Discrete Math-
ematics, 15:211 227, 2002.
[7] N. Alon, M. Krivelevich, I. Newman, and M. Szegedy. Regular languages are testable
with a constant number of queries. SIAM Journal on Computing, 30(6):1842 1862,
2001.
[8] N. Alon and A. Shapira. Testing satisfiability. In Proceedings of the 13th Annual
ACM-SIAM Symposium on Discrete Algorithms (SODA), 2002.
[9] G. Ausiello, P. Crescenzi, G. Gambosi, V. Kann, A. Marchetti-Spaccamela, and
M. Protasi. Complexity and approximation. combinatorial optimization problems
and their approximability properties. Springer, New York, NY, 1998.
[10] V. Bafna and P. A. Pevzner. Genome rearrangements and sorting by reversals. SIAM
Journal on Computing, 25(2):272 289, 1996.
[11] Z. Bar-Yossef, R. Kumar, and D. Sivakumar. Sampling algorithms: Lower bounds
and applications. In Proceedings of the 33th Annual ACM Symposium on Theory of
Computing (STOC), pages 266–275, 2001.
115
Advertisement
Bibliography
[12] J. Basch, L. Guibas, and J. Hershberger. Data structures for mobile data. Journal of
Algorithms, 31(1):1 28, 1999.
[13] T. Batu, E. Fischer, L. Fortnow, R. Kumar, R. Rubinfeld, and P. White. Testing
random variables for independence and identity. In Proceedings of the 42nd Annual
IEEE Symposium on Foundations of Computer Science (FOCS), pages 442 451,
2001.
[14] T. Batu, L. Fortnow, R. Rubinfeld, W. Smith, and P. White. Testing closeness of
distributions. In Proceedings of the 41st Annual IEEE Symposium on Foundations of
Computer Science (FOCS), pages 259 269, 2001.
[15] M. A. Bender and D. Ron. Testing acyclicity of directed graphs in sublinear time. In
Proceedings of the Annual Interrnational Colloquium on Automata, Languages and
Programming (ICALP), pages 809–820, 2000.
[16] J. L. Bentley and T. A. Ottmann. Algorithms for reporting and counting geometric
intersections. IEEE Trans. Comput., C-28:643 647, 1979.
[17] P. Berman, S. Hannenhall, and M. Karpinski. 1.375-approximation algorithm for
sorting by reversals. Electronic Colloquium on Computational Complexity, Technical
Report Series, 8(47), 2001.
[18] P. Berman and M. Karpinski. On some tighter inapproximability results, further
improvements. In Proceedings of the 26th Annual Interrnational Colloquium on
Automata, Languages and Programming (ICALP), pages 200 209, 1999.
[19] A. Bogdanov, K. Obata, and L. Trevisan. A lower bound for testing 3-colorability
in bounded-degree graphs. In Proceedings of the 43th Annual IEEE Symposium on
Foundations of Computer Science (FOCS), pages 93–102, 2002.
[20] B. Bollob´
as. Hereditary properties of graphs: Asymptotic enumeration, global struc-
ture, and colouring. Documenta Mathematica, III:333 342, 1998.
[21] H. Buhrman, L. Fortnow, I. Newman, and H. R¨
ohrig. Quantum property testing.
In Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms
(SODA), 2003.
[22] A. Caprara. Sorting permutations by reversals and Eulerian cycle decompositions.
SIAM Journal on Discrete Mathematics, 12(1):91 110, 1999.
[23] C. Carath´
eodory. ¨
Uber den Variabilit¨
atsbereich der Fourierschen Konstanten von
positiven harmonischen Funktionen. Rendiconto del Circolo Matematico di Palermo,
32:193–217, 1911.
[24] T. M. Chan. Output-sensitive results on convex hulls, extreme points, and related
problems. Discrete & Computational Geometry, 16(3):369–387, 1996.
116
Bibliography
[25] B. Chazelle, R. Rubinfeld, and L. Trevisan. Approximating the minimum spanning
tree weight in sublinear time. In Proceedings of the 28th Annual Interrnational Collo-
quium on Automata, Languages and Programming (ICALP), pages 190 200, 2001.
[26] D. Cohen-Or, Y. Chrysanthou, C. T. Silva, and F. Durand. A survey of visibility
for walkthrough applications. to appear in: IEEE Transactions on Visualization &
Computer Graphics.
[27] A. Czumaj, F. Erg¨
un, L. Fortnow, A. Magen, I. Newman, R. Rubinfeld, and C. Sohler.
Sublinear approximation of Euclidean minimum spanning tree. In Proceedings of the
14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 813–
822, 2003.
[28] A. Czumaj and C. Scheideler. An algorithmic approach to the general Lov´
asz local
lemma with applications to scheduling and satisfiability problems. In Proceedings of
the 32th Annual ACM Symposium on Theory of Computing (STOC), pages 38 47,
2000.
[29] A. Czumaj and C. Scheideler. Coloring non-uniform hypergraphs: A new algorithmic
approach to the general Lov´
asz Local Lemma. Random Structures and Algorithms,
17(3 - 4):213–237, 2000.
[30] A. Czumaj and C. Sohler. Property testing with geometric queries. In Proceedings of
the 9th Annual European Symposium on Algorithms (ESA), pages 266 277, 2001.
[31] A. Czumaj and C. Sohler. Soft kinetic data structures. In Proceedings of the 12th
Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 865 872,
2001.
[32] A. Czumaj and C. Sohler. Testing hypergraph coloring. In Proceedings of the
28th Annual Interrnational Colloquium on Automata, Languages and Programming
(ICALP), pages 493 505, 2001.
[33] A. Czumaj and C. Sohler. Abstract combinatorial programs and efficient property
testers. In Proceedings of the 43rd Annual IEEE Symposium on Foundations of Com-
puter Science (FOCS), pages 83–92, 2002.
[34] A. Czumaj, C. Sohler, and M. Ziegler. Property testing in computational geometry.
In Proceedings of the 8th Annual European Symposium on Algorithms (ESA), pages
155 166, 2000.
[35] M. de Berg, M. van Kreveld, M. Overmars, and O. Schwarzkopf. Computational
Geometry: Algorithms and Applications. Springer-Verlag Berlin Heidelberg, 1997.
[36] S. Doddi, M. Marathe, A. Mirzaian, B. Moret, and B. Zhu. Map labeling and its gen-
eralizations. In Proceedings of the 8th Annual ACM-SIAM Symposium on Discrete
Algorithms (SODA), pages 148 157, 1997.
117
Advertisement
Bibliography
[37] Y. Dodis, O. Goldreich, E. Lehman, S. Raskhodnikova, D. Ron, and A. Samorod-
nitsky. Improved testing algorithms for monotonicity. In Proceedings of the 3rd In-
ternation Workshop on Randomization and Approximation Techniques in Computer
Science, pages 97 108, 1999.
[38] L. Engebretsen and J. Holmerin. Clique is hard to approximate within n1o(1). In
Proceedings of the 27th Annual Interrnational Colloquium on Automata, Languages
and Programming (ICALP), pages 2 12, 2000.
[39] P. Erd˝
os and L. Lov´
asz. Problems and results on 3-chromatic hypergraphs and some
related questions. Infinite and Finite Sets (to Paul Erd˝
os on his 60th birthday), II:609
627, 1975.
[40] F. Erg¨
un, S. Kannan, S.R. Kumar, R. Rubinfeld, andM. Viswanathan. Spot-checkers.
J. Comput. Syst. Sci., 60:717–751, 2000.
[41] T. Feder and D. H. Greene. Optimal algorithms for approximate clustering. In Pro-
ceedingsof the 20th Annual ACM Symposium on Theory ofComputing (STOC), pages
434 444, 1998.
[42] E. Fischer. The art of uniformed decisions. a primer to property testing. Bulletin of
the European Association for Theoretical Computer Science, 75:97 126, 2001.
[43] E. Fischer, G. Kindler, D. Ron, S. Safra, and A. Samorodnitsky. Testing juntas.
In Proceedings of the 43th Annual IEEE Symposium on Foundations of Computer
Science (FOCS), pages 103–112, 2002.
[44] E. Fischer, E. Lehman, I. Newman, S. Rashkodnikova, R. Rubinfeld, and
A. Samorodnitsky. Monotonicity testing over general poset domains. In Proceed-
ings of the 34th Annual ACM Symposium on Theory of Computing (STOC), 2002.
[45] E. Fischer and I. Newman. Testing of matrix properties. In Proceedings of the 33rd
Annual ACM Symposium on Theory of Computing (STOC), pages 286 295, 2001.
[46] E. Fischer and I. Newman. Functions that have read-twice constant width branching
programs are not necessarily testable. In Proceedings of the 17th Conference on
Computational Complexity, 2002.
[47] M. Formann and F. Wagner. A packing problem with applications to lettering of
maps. In Proceedings of the 7th Annual ACM Symposium on Computational Geom-
etry (SoCG), pages 281 288, 1991.
[48] M. R. Garey and D. S. Johnson. Computers and intractability: A guide to the theory
of NP-completeness. Freeman, New York, NY, 1979.
[49] O. Goldreich. Combinatorial property testing (a survey). In Proceedings of DIMACS
Workshop in Randomization Methods in Algorithm Design, pages 45 59, 1997.
118
Bibliography
[50] O. Goldreich, S. Goldwasser, E. Lehman, D. Ron, and A. Samorodnitsky. Testing
monotonicity. Combinatorica, 20(3):301 337, 2000.
[51] O. Goldreich, S. Goldwasser, and D. Ron. Property testing and its connection to
learning and approximation. Journal of the ACM, 45(4):653 750, 1998.
[52] O. Goldreich and D. Ron. Property testing in bounded degree graphs. In Proceedings
of the Annual ACM Symposium on Theory of Computing (STOC), pages 406–415,
1997.
[53] O. Goldreich and D. Ron. A sublinear bipartiteness tester for bounded degree graphs.
Combinatorica, 19(3):335–373, 1999.
[54] O. Goldreich and L. Trevisan. Three theorems regarding testing graph properties.
In Proceedings of the 42nd Annual IEEE Symposium on Foundations of Computer
Science (FOCS), pages 460–469, 2001.
[55] V. Guruswami, J. H˚
astad, and M. Sudan. Hardness of approximate hypergraph color-
ing. In Proceedings of the 41st Annual IEEE Symposium on Foundations of Computer
Science (FOCS), pages 149 158, 2000.
[56] T. Hagerub and C. Rub. A guided tour to chernoff bounds. Information Processing
Letters, 33:305 308, 1989.
[57] D. S. Hochbaum. Approximation algorithms for NP-hard problems. PWS Publish-
ing Company, Boston, MA, 1996.
[58] H. H. III, M. Marathe, V. Radhakrishnan, D. R. S. Ravi, and R. Stears. Nc-
approximation for np- and pspace-hard problems for geometric graphs. Journal of
Algorithms, 26(2):238 274, 1998.
[59] J. D. Kececioglu and D. Sankoff. Exact and approximation algorithms for sorting by
reversals, with application to genome rearrangement. Algorithmica, 13(1/2):180
210, 1995.
[60] S. Khot. Hardness results for approximate hypergraph coloring. In Proceedings of
the 34th Annual ACM Symposium on Theory of Computing (STOC), 2002.
[61] D. Kirkpatrick and J. Radke. A framework for computational morphology. Compu-
tational Geometry, pages 217 248, 1985.
[62] Y. Kohayakawa, B. Nagle, and V. R¨
odl. Efficient testing of hypergraphs. In Pro-
ceedings of the Annual Interrnational Colloquium on Automata, Languages and Pro-
gramming (ICALP), pages 1017–1028, 2002.
[63] M. Krivelevich and B. Sudakov. Approximate coloring of uniform hypergraphs. In
Proceedings of the 6th Annual European Symposium on Algorithms (ESA), pages 477
489, 1998.
119
Advertisement
Bibliography
[64] L. Lov´
asz. Coverings and colorings of hypergraphs. In Proceedings of the 4th South-
eastern Conference on Combinatorics, Graph Theory and Computing, pages 3 12,
1973.
[65] R. Motwani and P. Raghavan. Randomized algorithms. Cambridge University Press,
New York, NY, 1995.
[66] I. Newman. Testing of function that have small width branching programs. In Pro-
ceedings of the 41st Annual IEEE Symposium on Foundations of Computer Science
(FOCS), pages 251 258, 2000.
[67] M. Parnas and D. Ron. Testing the diameter of graphs. In Proceedings of the
RANDOM-APPROX’99, pages 85–96, 1999.
[68] M. Parnas and D. Ron. Testing metric properties. In Proceedings of the 33rd Annual
ACM Symposium on Theory of Computing (STOC), pages 276 285, 2001.
[69] M. Parnas, D. Ron, and R. Rubinfeld. Testing parenthesis languages. In Proceedings
of the 5th International Workshop on Randomization and Approximation Techniques
in Computer Science, pages 261 272, 2001.
[70] M. Parnas, D. Ron, and A. Samorodnitsky. Proclaiming dictators and juntas or testing
boolean formulae. In Proceedings of the 5th International Workshop on Randomiza-
tion and Approximation Techniques in Computer Science, pages 273 284, 2001.
[71] P. A. Pevzner. Computational Molecular Biology. MIT Press, 2000.
[72] P. A. Pevzner and M. S. Waterman. Open combinatorial problems in computational
molecular biology. In Proceedings of the 3rd Israel Symposium on Theory of Com-
puting and Systems, pages 158 173, 1995.
[73] J. Radhakrishnan and A. Srinivasan. Improved bounds and algorithms for hypergraph
two-coloring. In Proceedings of the 39th Annual IEEE Symposium on Foundations
of Computer Science (FOCS), pages 684 693, 1998.
[74] D. Ron. Property testing. In Handbook of Randomized Algorithms. Kluwer Academic
Publishers, 2001.
[75] R. Rubinfeld and M. Sudan. Robust characterization of polynomials with applica-
tions to program testing. SIAM Journal on Computing, 25(2):252 271, 1996.
[76] M. Sharir and E. Welzl. A combinatorial bound for linear programming and related
problems. In Proceedings of the 9th Annual Symposium on Theoretical Aspects of
Computer Science (STACS), pages 569 579, 1992.
[77] R. E. Tarjan. Data structures and network algorithms. In CBMS-NSF Regional Con-
ference Series in Applied Mathematics, volume 44. Society for Industrial and Applied
Mathematics (SIAM), 1983.
120
Bibliography
[78] E. Welzl. Smallest enclosing disks(balls and ellipsoids. In H. Maurer, editor, New
Results and New Trends in Computer Science, number 555 in LNCS. Springer, 1991.
121
Advertisement